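Fugu-MT: arXiv Paper Translations

This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body is not CC-licensed, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use (including all information and translations).

Papers with a publication date of 2024-02-18.

Title	Authors	Abstract	Publication date / Translation date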
# An International and Multidisciplinary Teaching Experience with Real Industrial Team Project Development ( http://arxiv.org/abs/2403.15398v1 ) License: see linked page	Martin Mellado, Eduardo Vendrell, Filomena Ferrucci, Andrea Abate, Detlef Zuhlke, Bernard Riera	This paper presents the design, objectives, experiences, and results of an international cooperation project funded by the European Commission in the context of the Erasmus Intensive Programme (IP, for short) designed to improve students' curricula. An IP is a short programme of study (minimum 2 weeks) that brings together university students and staff from at least three countries in order to encourage efficient and multinational teaching of specialist topics, which might otherwise not be taught at all. This project lasted for 6 years, covering two different editions, each lasting three years. The first edition, named SAVRO (Simulation and Virtual Reality in Robotics for Industrial Assembly Processes), was held in the period 2008-2010, with the participation of three universities, namely the Universitat Politecnica de Valencia (Spain), acting as IP coordinator, the Technische Universitat Kaiserslautern (Germany), and the Universita degli Studi di Salerno (Italy). The Universite de Reims Champagne-Ardenne (France) participated as a new partner in the subsequent edition (2011-2013) of the IP, renamed HUMAIN (Human-Machine Interaction). Both editions of the teaching project were characterized by the same objectives and organizational aspects, aiming to provide educational initiatives based on active teaching through collaborative work between international institutions, also involving industrial partners. The aim of the paper is to illustrate the best practices that characterized the organization of our experience as well as to present some general recommendations and suggestions on how to devise computing academic curricula.	Translated: 2024-04-01 03:13:49 Published: 2024-02-18
# ChatGPT in Linear Algebra: Strides Forward, Steps to Go ( http://arxiv.org/abs/2403.15399v1 ) License: see linked page	Eli Bagno, Thierry Dana-Picard, Shulamit Reches	As soon as a new technology emerges, the education community explores its affordances and the possibilities to apply it in education. In this paper, we analyze sessions with ChatGPT around topics in basic Linear Algebra. We reflect on the process ChatGPT has undergone over the past year in our area of interest, emphasising the vast improvement that has been made in grappling with Linear Algebra problems. In particular, we address the question of whether this software can be a teaching assistant or even somehow replace the human teacher. As of the time this paper is written, the answer is generally negative. For the small part where the answer can be positive, some reflections about an original instrumental genesis are given. Communication with the software gives the impression of talking to a human, and sometimes the question is whether the software understands the question or not. Therefore, the reader's attention is drawn to the fact that ChatGPT works on a statistical basis and not according to reflection and understanding.	Translated: 2024-04-01 03:13:49 Published: 2024-02-18
# Efficient Weighting Schemes for Auditing Instant-Runoff Voting Elections ( http://arxiv.org/abs/2403.15400v1 ) License: see linked page	Alexander Ek, Philip B. Stark, Peter J. Stuckey, Damjan Vukcevic	Various risk-limiting audit (RLA) methods have been developed for instant-runoff voting (IRV) elections. A recent method, AWAIRE, is the first efficient approach that does not require cast vote records (CVRs). AWAIRE involves adaptively weighted averages of test statistics, essentially "learning" an effective set of hypotheses to test. However, the initial paper on AWAIRE only examined a few weighting schemes and parameter settings. We provide an extensive exploration of schemes and settings, to identify and recommend efficient choices for practical use. We focus only on the (hardest) case where CVRs are not available, using simulations based on real election data to assess performance. Across our comparisons, the most effective schemes are often those that place most or all of the weight on the apparent "best" hypotheses based on already seen data. Conversely, the optimal tuning parameters tended to vary based on the election margin. Nonetheless, we quantify the performance trade-offs for different choices across varying election margins, aiding in selecting the most desirable trade-off if a default option is needed. A limitation of the current AWAIRE implementation is its restriction to handling a small number of candidates (previously demonstrated up to six candidates). One path to a more computationally efficient implementation would be to use lazy evaluation and avoid considering all possible hypotheses. Our findings suggest that such an approach could be taken without substantially compromising statistical performance.	Translated: 2024-04-01 03:13:49 Published: 2024-02-18
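The entry above describes adaptively weighted averages of test statistics. As a rough, generic sketch of that idea (not the AWAIRE implementation), the Python below combines several base test statistics, each given as a sequence of nonnegative multiplicative increments, using weights that depend only on data seen before the current step. The function name `combined_statistic`, the `power` parameter, and the power-based weighting rule are illustrative assumptions, not names or settings taken from the paper.

```python
from typing import List, Sequence


def combined_statistic(increments: Sequence[Sequence[float]],
                       power: float = 2.0) -> List[float]:
    """Return the running value of an adaptively weighted combination.

    increments[t][i] is the nonnegative multiplicative increment of base
    statistic i at step t. The weighting rule (past value raised to
    `power`) is a hypothetical choice made for illustration.
    """
    if not increments:
        return []
    n = len(increments[0])
    base = [1.0] * n      # running product of each base statistic
    combined = 1.0        # running value of the weighted combination
    path: List[float] = []
    for step in increments:
        # Weights are computed from values observed BEFORE this step,
        # so they cannot peek at the increment they are applied to.
        raw = [m ** power for m in base]
        total = sum(raw)
        weights = [r / total for r in raw] if total > 0 else [1.0 / n] * n
        # Weighted average of this step's increments.
        combined *= sum(w * x for w, x in zip(weights, step))
        # Update the base statistics after the combination step.
        base = [m * x for m, x in zip(base, step)]
        path.append(combined)
    return path
```

With a large `power`, nearly all of the weight goes to whichever base statistic has grown the most so far, which loosely mirrors the abstract's remark about placing most of the weight on the apparent "best" hypotheses. In sequential tests of this general kind, the null is typically rejected once the statistic reaches 1/alpha for risk limit alpha; whether that matches the paper's exact procedure is not stated in the abstract.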
# virtCCA: Virtualized Arm Confidential Compute Architecture with TrustZone ( http://arxiv.org/abs/2306.11011v2 ) License: see linked page	Xiangyi Xu, Wenhao Wang, Yongzheng Wu, Chenyu Wang, Huifeng Zhu, Haocheng Ma, Zhennan Min, Zixuan Pang, Rui Hou, Yier Jin	ARM recently introduced the Confidential Compute Architecture (CCA) as part of the upcoming ARMv9-A architecture. CCA enables the support of confidential virtual machines (cVMs) within a separate world called the Realm world, providing protection from the untrusted normal world. While CCA offers a promising future for confidential computing, the widespread availability of CCA hardware is not expected in the near future, according to ARM's roadmap. To address this gap, we present virtCCA, an architecture that facilitates virtualized CCA using TrustZone, a mature hardware feature available on existing ARM platforms. Notably, virtCCA can be implemented on platforms equipped with the Secure EL2 (S-EL2) extension available from ARMv8.4 onwards, as well as on earlier platforms that lack S-EL2 support. virtCCA is fully compatible with the CCA specifications at the API level. We have developed the entire CCA software and firmware stack on top of virtCCA, including the enhancements to the normal world's KVM to support cVMs, and the TrustZone Management Monitor (TMM) that enforces isolation among cVMs and provides cVM life-cycle management. We have implemented virtCCA on real ARM servers, with and without S-EL2 support. Our evaluation, conducted on micro-benchmarks and macro-benchmarks, demonstrates that the overhead of running cVMs is acceptable compared to running normal-world VMs. Specifically, in a set of real-world workloads, the overhead of virtCCA-SEL2 is less than 29.5% for I/O intensive workloads, while virtCCA-EL3 outperforms the baseline in most cases.	Translated: 2024-03-25 23:38:51 Published: 2024-02-18
# VoltSchemer: Use Voltage Noise to Manipulate Your Wireless Charger ( http://arxiv.org/abs/2402.11423v1 ) License: see linked page	Zihao Zhan, Yirui Yang, Haoqi Shan, Hanqiu Wang, Yier Jin, Shuo Wang	Wireless charging is becoming an increasingly popular charging solution in portable electronic products for a more convenient and safer charging experience than conventional wired charging. However, our research identified new vulnerabilities in wireless charging systems, making them susceptible to intentional electromagnetic interference. These vulnerabilities facilitate a set of novel attack vectors, enabling adversaries to manipulate the charger and perform a series of attacks. In this paper, we propose VoltSchemer, a set of innovative attacks that grant attackers control over commercial-off-the-shelf (COTS) wireless chargers merely by modulating the voltage from the power supply. These attacks are the first of their kind, exploiting voltage noise from the power supply to manipulate wireless chargers without necessitating any malicious modifications to the chargers themselves. The significant threats imposed by VoltSchemer are substantiated by three practical attacks, where a charger can be manipulated to: control voice assistants via inaudible voice commands, damage devices being charged through overcharging or overheating, and bypass the foreign-object-detection mechanism specified by the Qi standard to damage valuable items exposed to intense magnetic fields. We demonstrate the effectiveness and practicality of the VoltSchemer attacks with successful attacks on 9 top-selling COTS wireless chargers. Furthermore, we discuss the security implications of our findings and suggest possible countermeasures to mitigate potential threats.	Translated: 2024-03-25 09:06:20 Published: 2024-02-18
# NestedSGX: Bootstrapping Trust to Enclaves within Confidential VMs ( http://arxiv.org/abs/2402.11438v1 ) License: see linked page	Wenhao Wang, Linke Song, Benshan Mei, Shuang Liu, Shijun Zhao, Shoumeng Yan, XiaoFeng Wang, Dan Meng, Rui Hou	Integrity is critical for maintaining system security, as it ensures that only genuine software is loaded onto a machine. Although confidential virtual machines (CVMs) function within isolated environments separate from the host, it is important to recognize that users still encounter challenges in maintaining control over the integrity of the code running within the trusted execution environments (TEEs). The presence of a sophisticated operating system (OS) raises the possibility of dynamically creating and executing any code, making user applications within TEEs vulnerable to interference or tampering if the guest OS is compromised. This paper introduces NestedSGX, which leverages virtual machine privilege level (VMPL), a recent hardware feature available on AMD SEV-SNP, to enable the creation of hardware enclaves within the guest VM. Similar to Intel SGX, NestedSGX considers the guest OS untrusted for loading potentially malicious code. It ensures that only trusted and measured code executed within the enclave can be remotely attested. To seamlessly protect existing applications, NestedSGX aims for compatibility with Intel SGX by simulating SGX leaf functions. We have also ported the SGX SDK to NestedSGX, enabling the use of existing SGX toolchains and applications in the system. Performance evaluations show that context switches in NestedSGX take about 35,000-37,000 cycles, approximately 2-3 times that of Intel SGX. NestedSGX incurs minimal overhead in most real-world applications, with an average overhead below 5% for most workloads and 22.7% for I/O intensive workloads.	Translated: 2024-03-25 09:06:20 Published: 2024-02-18
# Measuring Privacy Loss in Distributed Spatio-Temporal Data ( http://arxiv.org/abs/2402.11526v1 ) License: see linked page	Tatsuki Koga, Casey Meehan, Kamalika Chaudhuri	Statistics about traffic flow and people's movement gathered from multiple geographical locations in a distributed manner are the driving force powering many applications, such as traffic prediction, demand prediction, and restaurant occupancy reports. However, these statistics are often based on sensitive location data of people, and hence privacy has to be preserved while releasing them. The standard way to do this is via differential privacy, which guarantees a form of rigorous, worst-case, person-level privacy. In this work, motivated by several counter-intuitive features of differential privacy in distributed location applications, we propose an alternative privacy loss against location reconstruction attacks by an informed adversary. Our experiments on real and synthetic data demonstrate that our privacy loss better reflects our intuitions on individual privacy violation in the distributed spatio-temporal setting.	Translated: 2024-03-25 08:56:22 Published: 2024-02-18
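The entry above names differential privacy as the standard way to release location statistics. For orientation only, here is a minimal sketch of the textbook Laplace mechanism applied to per-location counts. It illustrates standard differential privacy, not the alternative privacy loss the paper proposes. The function names, the example location keys, and the default sensitivity of 1 (which assumes each person changes exactly one count by at most 1) are assumptions made for this sketch.

```python
import random
from typing import Dict, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_counts(counts: Dict[str, int],
                   epsilon: float,
                   sensitivity: float = 1.0,
                   seed: Optional[int] = None) -> Dict[str, float]:
    """Add Laplace noise with scale sensitivity / epsilon to every count."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return {loc: c + laplace_noise(scale, rng) for loc, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical occupancy counts for three locations.
    observed = {"station_a": 120, "station_b": 45, "cafe_c": 8}
    print(private_counts(observed, epsilon=0.5, seed=0))
```

A smaller `epsilon` gives stronger privacy and noisier counts. If one person can affect several counts over time, as in spatio-temporal data, the sensitivity has to be raised accordingly.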
# Enhancing Energy Sector Resilience: Integrating Security by Design Principles ( http://arxiv.org/abs/2402.11543v1 ) License: see linked page	Dov Shirtz, Inna Koberman, Aviad Elyashar, Rami Puzis, Yuval Elovici	Security by design (SbD) is a concept for developing and maintaining systems that are, to the greatest extent possible, free from security vulnerabilities and impervious to security attacks. In addition to technical aspects, such as how to develop robust industrial control system hardware, software, and communication products, SbD also includes soft aspects, such as organizational managerial attitude and behavior, and employee awareness. Under the SbD concept, systems (industrial control systems, ICS, in our context) will be considered more trustworthy by users. Users' trust in the systems will be derived from meticulous adherence to the SbD processes and policies. In accordance with the SbD concept, security is considered, and security measures are implemented, at every stage of the product and system development life cycle rather than afterwards. This document presents the security requirements for the implementation of SbD in industrial control systems. The information presented does not negate any existing security and cybersecurity standards. Instead, we strongly recommend that organizations implement and comply with those standards and best practices. Security by design is not a one-time process. It starts at the very beginning of the product or system design and continues throughout its lifecycle. Due to the benefits of SbD, namely a higher level of security and robustness to cyber attacks, all organizations associated with the energy sector should strive to establish an SbD-driven ecosystem. The requirements presented in this document may be perceived as burdensome by organizations. However, strict compliance with the requirements and existing security standards and best practices, including continuous monitoring, as specified in this document, is essential to realize an ecosystem driven and protected by SbD.	Translated: 2024-03-25 08:56:22 Published: 2024-02-18
# On efficient normal bases over binary fields ( http://arxiv.org/abs/2402.11544v1 ) License: see linked page	Mohamadou Sall, M. Anwar Hasan	Binary field extensions are fundamental to many applications, such as multivariate public key cryptography, code-based cryptography, and error-correcting codes. Their implementation requires a foundation in number theory and algebraic geometry and necessitates the utilization of efficient bases. The continuous increase in the power of computation, and the design of new (quantum) computers, increase the threat to the security of systems and impose increasingly demanding encryption standards with huge polynomial or extension degrees. For cryptographic purposes or other common implementations of finite-field arithmetic, it is essential to explore a wide range of implementations with diverse bases. Unlike some bases, polynomial and Gaussian normal bases are well-documented and widely employed. In this paper, we explore other forms of bases of $\mathbb{F}_{2^n}$ over $\mathbb{F}_2$ to demonstrate efficient implementation of operations within different ranges. To achieve this, we leverage results on fast computations and elliptic periods introduced by Couveignes and Lercier, and subsequently expanded upon by Ezome and Sall. This leads to the establishment of new tables for efficient computation over binary fields.	Translated: 2024-03-25 08:56:22 Published: 2024-02-18
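One reason normal bases are attractive for arithmetic in $\mathbb{F}_{2^n}$, the topic of the entry above, is that squaring becomes a cyclic shift of the coordinate vector. The sketch below illustrates that general property only; it does not reproduce the bases or tables from the paper. It assumes an element is stored as a list `coeffs` in which `coeffs[i]` is the coefficient of $\beta^{2^i}$ for a normal element $\beta$. The function names are hypothetical.

```python
from typing import List


def add_normal_basis(a: List[int], b: List[int]) -> List[int]:
    """Add two elements of F_{2^n}: coordinate-wise XOR in any basis."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return [x ^ y for x, y in zip(a, b)]


def square_normal_basis(coeffs: List[int]) -> List[int]:
    """Square an element given in a normal basis.

    Squaring sends beta^(2^i) to beta^(2^(i+1)), and beta^(2^n) = beta,
    so the coefficient at index i moves to index (i + 1) mod n.
    """
    if not coeffs:
        return []
    return [coeffs[-1]] + coeffs[:-1]


if __name__ == "__main__":
    x = [1, 0, 1, 1, 0]           # an element of F_{2^5}
    y = x
    for _ in range(len(x)):       # squaring n times is the identity map
        y = square_normal_basis(y)
    print(y == x)
```

Multiplication is the expensive operation in a normal basis, and its cost depends on the particular basis chosen. That cost is what motivates the search for efficient bases.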
# Fight Hardware with Hardware: System-wide Detection and Mitigation of Side-Channel Attacks using Performance Counters ( http://arxiv.org/abs/2402.13281v1 ) License: see linked page	Stefano Carnà, Serena Ferracci, Francesco Quaglia, Alessandro Pellegrini	We present a kernel-level infrastructure that allows system-wide detection of malicious applications attempting to exploit cache-based side-channel attacks to break the process confinement enforced by standard operating systems. This infrastructure relies on hardware performance counters to collect information at runtime from all applications running on the machine. High-level detection metrics are derived from these measurements to maximize the likelihood of promptly detecting a malicious application. Our experimental assessment shows that we can catch a large family of side-channel attacks with a significantly reduced overhead. We also discuss countermeasures that can be enacted once a process is suspected of carrying out a side-channel attack, to improve the overall trade-off between the system's security level and the delivered performance under non-suspected process executions.	Translated: 2024-03-25 08:56:22 Published: 2024-02-18
# PassViz: A Visualisation System for Analysing Leaked Passwords ( http://arxiv.org/abs/2309.12968v3 ) License: see linked page	Sam Parker, Haiyue Yuan, Shujun Li	Passwords remain the most widely used form of user authentication, despite advancements in other methods. However, their limitations, such as susceptibility to attacks, especially weak passwords defined by human users, are well-documented. The existence of weak human-defined passwords has led to repeated password leaks from websites, many of which are of large scale. While such password leaks are unfortunate security incidents, they provide security researchers and practitioners with good opportunities to learn valuable insights from such leaked passwords, in order to identify ways to improve password policies and other security controls on passwords. Researchers have proposed different data visualisation techniques to help analyse leaked passwords. However, many approaches rely solely on frequency analysis, with limited exploration of distance-based graphs. This paper reports PassViz, a novel method that combines the edit distance with the t-SNE (t-distributed stochastic neighbour embedding) dimensionality reduction algorithm for visualising and analysing leaked passwords in a 2-D space. We implemented PassViz as an easy-to-use command-line tool for visualising large-scale password databases, and also as a graphical user interface (GUI) to support interactive visual analytics of small password databases. Using the "000webhost" leaked database as an example, we show how PassViz can be used to visually analyse different aspects of leaked passwords and to facilitate the discovery of previously unknown password patterns. Overall, our approach empowers researchers and practitioners to gain valuable insights and improve password security through effective data visualisation and analysis.	Translated: 2024-03-19 04:01:03 Published: 2024-02-18
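The entry above combines edit distance with t-SNE. As a minimal sketch of the first ingredient only, the code below computes the Levenshtein edit distance between passwords and builds a pairwise distance matrix. It is not the PassViz implementation, and the abstract does not say which edit-distance variant PassViz uses, so plain Levenshtein distance is an assumption. The function names and sample passwords are made up for illustration.

```python
from typing import List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))   # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                   # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_matrix(passwords: Sequence[str]) -> List[List[int]]:
    """Symmetric matrix of pairwise edit distances."""
    n = len(passwords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(passwords[i], passwords[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


if __name__ == "__main__":
    sample = ["password", "passw0rd", "password1", "qwerty"]
    for row in distance_matrix(sample):
        print(row)
```

A matrix like this could then be handed to a t-SNE implementation that accepts precomputed distances to obtain 2-D coordinates. The full pairwise matrix grows quadratically with the number of passwords, so a tool aimed at large leaks would presumably need sampling or another shortcut.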
# Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges ( http://arxiv.org/abs/2403.00784v1 ) License: see linked page	Jiajia Wang, Jimmy X. Huang, Xinhui Tu, Junmei Wang, Angela J. Huang, Md Tahmid Rahman Laskar, Amran Bhuiyan	Recent years have witnessed a substantial increase in the use of deep learning to solve various natural language processing (NLP) problems. Early deep learning models were constrained by their sequential or unidirectional nature, such that they struggled to capture the contextual relationships across text inputs. The introduction of bidirectional encoder representations from transformers (BERT) led to a robust encoder for the transformer model that can understand the broader context and deliver state-of-the-art performance across various NLP tasks. This has inspired researchers and practitioners to apply BERT to practical problems, such as information retrieval (IR). A survey that focuses on a comprehensive analysis of prevalent approaches that apply pretrained transformer encoders like BERT to IR can thus be useful for academia and the industry. In light of this, we revisit a variety of BERT-based methods in this survey, cover a wide range of techniques of IR, and group them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion. We also provide links to resources, including datasets and toolkits, for BERT-based IR systems. A key highlight of our survey is the comparison between BERT's encoder-based models and the latest generative Large Language Models (LLMs), such as ChatGPT, which rely on decoders. Despite the popularity of LLMs, we find that for specific tasks, finely tuned BERT encoders still outperform them, and at a lower deployment cost. Finally, we summarize the comprehensive outcomes of the survey and suggest directions for future research in the area.	Translated: 2024-03-11 00:09:22 Published: 2024-02-18
# On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs ( http://arxiv.org/abs/2403.00783v1 ) License: see linked page	Hankz Hankui Zhuo, Xin Chen, Rong Pan	Plan synthesis aims to generate a course of actions or policies to transition from given initial states to goal states, provided domain models that could be designed by experts or learnt from training data or interactions with the world. Intrigued by the claims of emergent planning capabilities in large language models (LLMs), works have been proposed to investigate the planning effectiveness of LLMs, without considering any utilization of off-the-shelf planning techniques in LLMs. In this paper, we aim to further study the planning capability of LLMs by investigating the roles of LLMs in off-the-shelf planning frameworks. To do this, we investigate the effectiveness of embedding LLMs into one of the well-known planning frameworks, graph-based planning, proposing a novel LLM-based planning framework with LLMs embedded in two levels of planning graphs, i.e., the mutual constraints generation level and the constraints solving level. We empirically demonstrate the effectiveness of our proposed framework in various planning domains.	Translated: 2024-03-11 00:08:53 Published: 2024-02-18
# Ploutos: Towards interpretable stock movement prediction with financial large language model ( http://arxiv.org/abs/2403.00782v1 ) License: see link	Hanshuang Tong, Jun Li, Ning Wu, Ming Gong, Dongmei Zhang, Qi Zhang	Recent advancements in large language models (LLMs) have opened new pathways for many domains. However, the full potential of LLMs in financial investments remains largely untapped. There are two main challenges for typical deep learning-based methods for quantitative finance. First, they struggle to fuse textual and numerical information flexibly for stock movement prediction. Second, traditional methods lack clarity and interpretability, which impedes their application in scenarios where the justification for predictions is essential. To solve the above challenges, we propose Ploutos, a novel financial LLM framework that consists of PloutosGen and PloutosGPT. PloutosGen contains multiple primary experts that can analyze different modal data, such as text and numbers, and provide quantitative strategies from different perspectives. PloutosGPT then combines their insights and predictions and generates interpretable rationales. To generate accurate and faithful rationales, the training strategy of PloutosGPT leverages a rearview-mirror prompting mechanism to guide GPT-4 to generate rationales, and a dynamic token weighting mechanism to fine-tune the LLM by increasing the weight of key tokens. Extensive experiments show our framework outperforms the state-of-the-art methods on both prediction accuracy and interpretability.	Translated: 2024-03-11 00:08:33 Published: 2024-02-18
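As a rough illustration of the token weighting idea in the Ploutos abstract above, the sketch below computes a weighted negative log-likelihood in which tokens flagged as "key" count more. The abstract does not say how key tokens are identified or how the weights change during training, so the flags, the `key_weight` parameter, and the function name are all hypothetical.

```python
import math

def weighted_token_loss(token_probs, is_key_token, key_weight=2.0):
    """Weighted negative log-likelihood over one target sequence.

    token_probs  -- probability the model assigned to each reference token
    is_key_token -- parallel booleans marking "key" tokens (hypothetical labels)
    key_weight   -- hypothetical multiplier (> 1) applied to key tokens
    """
    weights = [key_weight if key else 1.0 for key in is_key_token]
    losses = [-math.log(p) for p in token_probs]
    # Normalise by the total weight so the loss stays on the same scale.
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)
```

With this form, a poorly predicted key token pulls the loss up more than the same error on an ordinary token would.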
# ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework ( http://arxiv.org/abs/2403.00781v1 ) License: see link	Zhongqi Yang, Elahe Khatibi, Nitish Nagesh, Mahyar Abbasian, Iman Azimi, Ramesh Jain, Amir M. Rahmani	The profound impact of food on health necessitates advanced nutrition-oriented food recommendation services. Conventional methods often lack the crucial elements of personalization, explainability, and interactivity. While Large Language Models (LLMs) bring interpretability and explainability, their standalone use falls short of achieving true personalization. In this paper, we introduce ChatDiet, a novel LLM-powered framework designed specifically for personalized nutrition-oriented food recommendation chatbots. ChatDiet integrates personal and population models, complemented by an orchestrator, to seamlessly retrieve and process pertinent information. The result is a dynamic delivery of personalized and explainable food recommendations, tailored to individual user preferences. Our evaluation of ChatDiet includes a compelling case study, where we establish a causal personal model to estimate individual nutrition effects. Our assessments, including a food recommendation test showcasing a 92% effectiveness rate, coupled with illustrative dialogue examples, underscore ChatDiet's strengths in explainability, personalization, and interactivity.	Translated: 2024-03-11 00:08:11 Published: 2024-02-18
# Entanglement: Balancing Punishment and Compensation, Repeated Dilemma Game-Theoretic Analysis of Maximum Compensation Problem for Bypass and Least Cost Paths in Fact-Checking, Case of Fake News with Weak Wallace's Law ( http://arxiv.org/abs/2403.02342v1 ) License: see link	Yasuko Kawahata	This research note presents a novel approach to solving problems related to the spread of fake news and effective fact-checking. Focusing on the least-cost routing problem, the discussion uses Metzler functions and Metzler matrices to model the dynamics of information propagation among news providers. With this approach, we designed a strategy to minimize the spread of fake news, which is detrimental to informational health, while at the same time maximizing the spread of credible information. In particular, through the punitive dominance problem and the maximum compensation problem, we developed and examined a path to reassess the incentives of news providers to act and to analyze their impact on the equilibrium of the information market. By applying the concept of entanglement to the context of information propagation, we shed light on the complexity of interactions among news providers and contribute to the formulation of more effective information management strategies. This study provides new theoretical and practical insights into issues related to fake news and fact-checking, and examines them with a view to improving informational health and public digital health.	Translated: 2024-03-10 23:51:35 Published: 2024-02-18
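For readers unfamiliar with the term used in the abstract above: a Metzler matrix is a real square matrix whose off-diagonal entries are all non-negative. The check below only illustrates that standard definition; how the paper builds such matrices for news providers is not stated in the abstract, and the function name is hypothetical.

```python
def is_metzler(matrix):
    """Return True if every off-diagonal entry of a square matrix is non-negative."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return all(matrix[i][j] >= 0 for i in range(n) for j in range(n) if i != j)
```

The diagonal entries are unconstrained, so a matrix such as `[[-2, 1], [0.5, -3]]` qualifies.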
# An Empirical Categorization of Prompting Techniques for Large Language Models: A Practitioner's Guide ( http://arxiv.org/abs/2402.14837v1 ) License: see link	Oluwole Fagbohun, Rachel M. Harrison, Anton Dereventsov	Due to rapid advancements in the development of Large Language Models (LLMs), programming these models with prompts has recently gained significant attention. However, the sheer number of available prompt engineering techniques creates an overwhelming landscape for practitioners looking to utilize these tools. For the most efficient and effective use of LLMs, it is important to compile a comprehensive list of prompting techniques and establish a standardized, interdisciplinary categorization framework. In this survey, we examine some of the most well-known prompting techniques from both academic and practical viewpoints and classify them into seven distinct categories. We present an overview of each category, aiming to clarify their unique contributions and showcase their practical applications in real-world examples in order to equip fellow practitioners with a structured framework for understanding and categorizing prompting techniques tailored to their specific domains. We believe that this approach will help simplify the complex landscape of prompt engineering and enable more effective utilization of LLMs in various applications. By providing practitioners with a systematic approach to prompt categorization, we aim to assist in navigating the intricacies of effective prompt design for conversational pre-trained LLMs and inspire new possibilities in their respective fields.	Translated: 2024-03-03 19:38:28 Published: 2024-02-18
# Stealthy Attack on Large Language Model based Recommendation ( http://arxiv.org/abs/2402.14836v1 ) License: see link	Jinghao Zhang, Yuting Liu, Qiang Liu, Shu Wu, Guibing Guo, Liang Wang	Recently, powerful large language models (LLMs) have been instrumental in propelling the progress of recommender systems (RS). However, while these systems have flourished, their susceptibility to security threats has been largely overlooked. In this work, we reveal that the introduction of LLMs into recommendation models presents new security vulnerabilities due to their emphasis on the textual content of items. We demonstrate that attackers can significantly boost an item's exposure by merely altering its textual content during the testing phase, without requiring direct interference with the model's training process. Additionally, the attack is notably stealthy, as it does not affect the overall recommendation performance and the modifications to the text are subtle, making it difficult for users and platforms to detect. Our comprehensive experiments across four mainstream LLM-based recommendation models demonstrate the superior efficacy and stealthiness of our approach. Our work unveils a significant security gap in LLM-based recommendation systems and paves the way for future research on protecting these systems.	Translated: 2024-03-03 19:38:06 Published: 2024-02-18
# MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing ( http://arxiv.org/abs/2402.14835v1 ) License: see link	Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi, Haiyun Jiang, Siyuan Cheng, Bozhong Tian	Multimodal knowledge editing represents a critical advancement in enhancing the capabilities of Multimodal Large Language Models (MLLMs). Despite its potential, current benchmarks predominantly focus on coarse-grained knowledge, leaving the intricacies of fine-grained (FG) multimodal entity knowledge largely unexplored. This gap presents a notable challenge, as FG entity recognition is pivotal for the practical deployment and effectiveness of MLLMs in diverse real-world scenarios. To bridge this gap, we introduce MIKE, a comprehensive benchmark and dataset specifically designed for FG multimodal entity knowledge editing. MIKE encompasses a suite of tasks tailored to assess different perspectives, including Vanilla Name Answering, Entity-Level Caption, and Complex-Scenario Recognition. In addition, a new form of knowledge editing, Multi-step Editing, is introduced to evaluate the editing efficiency. Through our extensive evaluations, we demonstrate that the current state-of-the-art methods face significant challenges in tackling our proposed benchmark, underscoring the complexity of FG knowledge editing in MLLMs. Our findings spotlight the urgent need for novel approaches in this domain, setting a clear agenda for future research and development efforts within the community.	Translated: 2024-03-03 19:37:51 Published: 2024-02-18
# MSynFD: Multi-hop Syntax aware Fake News Detection ( http://arxiv.org/abs/2402.14834v1 ) License: see link	Liang Xiao, Qi Zhang, Chongyang Shi, Shoujin Wang, Usman Naseem, Liang Hu	The proliferation of social media platforms has fueled the rapid dissemination of fake news, posing threats to our real-life society. Existing methods use multimodal data or contextual information to enhance the detection of fake news by analyzing news content and/or its social context. However, these methods often overlook essential textual news content (articles) and heavily rely on sequential modeling and global attention to extract semantic information. These existing methods fail to handle the complex, subtle twists in news articles, such as syntax-semantics mismatches and prior biases, leading to lower performance and potential failure when modalities or social context are missing. To bridge these significant gaps, we propose a novel multi-hop syntax aware fake news detection (MSynFD) method, which incorporates complementary syntax information to deal with subtle twists in fake news. Specifically, we introduce a syntactical dependency graph and design a multi-hop subgraph aggregation mechanism to capture multi-hop syntax. It extends the effect of word perception, leading to effective noise filtering and adjacent relation enhancement. Subsequently, a sequential relative position-aware Transformer is designed to capture the sequential information, together with an elaborate keyword debiasing module to mitigate the prior bias. Extensive experimental results on two public benchmark datasets verify the effectiveness and superior performance of our proposed MSynFD over state-of-the-art detection models.	Translated: 2024-03-03 19:37:27 Published: 2024-02-18
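To make "multi-hop" in the MSynFD abstract above concrete, the sketch below collects the words reachable within a given number of hops of a start word in a dependency graph. It shows only the neighbourhood lookup and treats edges as undirected, which is an assumption. The aggregation mechanism itself is not described in the abstract, and the function name and example sentence are hypothetical.

```python
from collections import deque

def k_hop_neighbors(edges, start, max_hops):
    """Return {word: hop distance} for words within max_hops of start."""
    adjacency = {}
    for head, dependent in edges:
        # Treat each dependency edge as undirected for the neighbourhood lookup.
        adjacency.setdefault(head, set()).add(dependent)
        adjacency.setdefault(dependent, set()).add(head)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance

# Hypothetical (head, dependent) pairs for "Officials deny the false report".
edges = [("deny", "Officials"), ("deny", "report"), ("report", "the"), ("report", "false")]
two_hop = k_hop_neighbors(edges, "Officials", 2)
```

Here "report" is two hops from "Officials" even though the words are not adjacent in the sentence, which is the kind of relation a purely sequential model does not see directly.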
# A tripartite entanglement in de Sitter spacetime ( http://arxiv.org/abs/1909.13454v4 ) License: see link	Sang-Eon Bak, Paul M. Alsing, Warner A. Miller, Shahabeddin M. Aslmarand, Doyeol Ahn	We investigate the quantum correlation for tripartite entangled states in de Sitter space. First, we adopt the noisy quantum channel model. In this model, the expansion effect is represented by an operator sum representation with its corresponding Kraus operators. This map is shown to be trace-preserving and completely positive. Second, we analyze the quantum correlation by using the channel-state correspondence. For a large expansion rate, the tripartite mutual information has a large negative value, which corresponds to a small magnitude of bipartite mutual information. We relate this result to the challenge of recovering information from local measurements.	Translated: 2024-03-03 19:35:18 Published: 2024-02-18
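For reference, the quantities named in the abstract above have standard textbook forms. The notation below (the channel $\mathcal{E}$, Kraus operators $K_k$, subsystems $A$, $B$, $C$, and von Neumann entropy $S$) is assumed here and may differ from the paper's own conventions.

```latex
% Operator-sum (Kraus) representation of a trace-preserving, completely positive map
\mathcal{E}(\rho) = \sum_k K_k \,\rho\, K_k^{\dagger},
\qquad \sum_k K_k^{\dagger} K_k = I

% Bipartite mutual information in terms of von Neumann entropies
I(A{:}B) = S(\rho_A) + S(\rho_B) - S(\rho_{AB})

% Tripartite mutual information
I_3(A{:}B{:}C) = I(A{:}B) + I(A{:}C) - I(A{:}BC)
```

With this definition, a negative $I_3$ means that $I(A{:}BC)$ exceeds $I(A{:}B) + I(A{:}C)$: the information about $A$ is held jointly by $B$ and $C$ rather than by either alone, which fits the abstract's remark about small bipartite mutual information.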
# Frequency-Dependent Vibronic Effects in Steady State Energy Transport ( http://arxiv.org/abs/2402.16881v1 ) License: see link	Leonardo F. Calderón, Paul Brumer	The interplay between electronic and intramolecular high-frequency vibrational degrees of freedom is ubiquitous in natural light-harvesting systems. Recent studies have indicated that an intramolecular vibrational donor-acceptor frequency difference can enhance energy transport. Here, we analyze the extent to which different intramolecular donor-acceptor vibrational frequencies affect excitation energy transport in equilibrium (coherent light excitation) and the more natural nonequilibrium steady state (incoherent light excitation) configurations. It is found that if the Huang-Rhys factors remain constant, the acceptor population increases when the intramolecular vibrational frequency of the acceptor exceeds that of the donor. The increase in the acceptor population due to the vibrational frequency difference is higher for higher values of the Huang-Rhys factors or the vibronic coupling strengths. However, the nonequilibrium steady state results show that the vibrational donor-acceptor frequency difference does not significantly enhance energy transport in the natural scenario of incoherent light excitation and under biologically relevant parameters. A potential mechanism to optimize energy transfer in the nonequilibrium steady state (NESS), based on increasing the harvesting time at the reaction center, is analyzed.	Translated: 2024-03-03 19:06:47 Published: 2024-02-18
# BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation ( http://arxiv.org/abs/2402.16880v1 ) License: see link	Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo	Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question-answering. While their performance is impressive, the computational footprint due to their vast number of parameters can be prohibitive. Existing solutions such as SparseGPT and Wanda attempt to alleviate this issue through weight pruning. However, their layer-wise approach results in significant perturbation to the model's output and requires meticulous hyperparameter tuning, such as the pruning rate, which can adversely affect overall model performance. To address this, this paper introduces a novel LLM pruning technique dubbed blockwise parameter-efficient sparsity allocation (BESA) by applying a blockwise reconstruction loss. In contrast to the typical layer-wise pruning techniques, BESA is characterized by two distinctive attributes: i) it targets the overall pruning error with respect to individual transformer blocks, and ii) it allocates layer-specific sparsity in a differentiable manner, both of which ensure reduced performance degradation after pruning. Our experiments show that BESA achieves state-of-the-art performance, efficiently pruning LLMs like LLaMA1 and LLaMA2 with 7B to 70B parameters on a single A100 GPU in just five hours. Code is available at https://github.com/OpenGVLab/LLMPrune-BESA.	Translated: 2024-03-03 19:06:24 Published: 2024-02-18
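The phrase "layer-specific sparsity" in the BESA abstract above can be pictured with the toy sketch below, which prunes each layer of one block at its own ratio. Two simplifications are assumed: weights are ranked by plain magnitude, and the per-layer ratios are set by hand, whereas the abstract says BESA allocates them in a differentiable manner against a blockwise reconstruction loss. The function names and layer names are hypothetical.

```python
def prune_layer(weights, sparsity):
    """Zero the smallest-magnitude fraction `sparsity` of a flat weight list."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]

def prune_block(layers, sparsities):
    """Apply a separate sparsity ratio to each named layer of one block."""
    return {name: prune_layer(w, sparsities[name]) for name, w in layers.items()}

# Hypothetical block with two layers and hand-picked ratios.
block = {"attn": [0.1, -0.9, 0.05, 0.7], "mlp": [0.3, -0.2, 0.8, -0.6]}
result = prune_block(block, {"attn": 0.5, "mlp": 0.25})
```

Letting the ratios differ per layer, while judging the error at the level of the whole block, is the part of the design that the abstract contrasts with uniform layer-wise pruning.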
# RadarScenes: A Real-World Radar Point Cloud Data Set for Automotive Applications ( http://arxiv.org/abs/2104.02493v2 ) License: see link	Ole Schumann, Markus Hahn, Nicolas Scheiner, Fabio Weishaupt, Julius F. Tilly, Jürgen Dickmann, Christian Wöhler	A new automotive radar data set with measurements and point-wise annotations from more than four hours of driving is presented. Data provided by four series radar sensors mounted on one test vehicle were recorded, and the individual detections of dynamic objects were manually grouped into clusters and labeled afterwards. The purpose of this data set is to enable the development of novel (machine learning-based) radar perception algorithms with a focus on moving road users. Images of the recorded sequences were captured using a documentary camera. For the evaluation of future object detection and classification algorithms, proposals for score calculation are made so that researchers can evaluate their algorithms on a common basis. Additional information as well as download instructions can be found on the website of the data set: www.radar-scenes.com.	Translated: 2024-02-22 22:05:58 Published: 2024-02-18
# GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning ( http://arxiv.org/abs/2311.12631v2 ) License: see link	Jiaxi Lv, Yi Huang, Mingfu Yan, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen	Recent advances in text-to-video generation have harnessed the power of diffusion models to create visually compelling content conditioned on text prompts. However, they usually encounter high computational costs and often struggle to produce videos with coherent physical motions. To tackle these issues, we propose GPT4Motion, a training-free framework that leverages the planning capability of large language models such as GPT, the physical simulation strength of Blender, and the excellent image generation ability of text-to-image diffusion models to enhance the quality of video synthesis. Specifically, GPT4Motion employs GPT-4 to generate a Blender script based on a user textual prompt, which commands Blender's built-in physics engine to craft fundamental scene components that encapsulate coherent physical motions across frames. These components are then fed into Stable Diffusion to generate a video aligned with the textual prompt. Experimental results on three basic physical motion scenarios, including rigid object drop and collision, cloth draping and swinging, and liquid flow, demonstrate that GPT4Motion can generate high-quality videos efficiently while maintaining motion coherency and entity consistency. GPT4Motion offers new insights into text-to-video research, enhancing its quality and broadening its horizon for further explorations.	Translated: 2024-02-21 20:15:37 Published: 2024-02-18
# ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation ( http://arxiv.org/abs/2402.12408v1 ) License: see link	Zihao Tang, Zheqi Lv, Shengyu Zhang, Fei Wu, Kun Kuang	The rapid advancement of Large Language Models (LLMs) has revolutionized various sectors by automating routine tasks, marking a step toward the realization of Artificial General Intelligence (AGI). However, they still struggle to accommodate the diverse and specific needs of users and to simplify the utilization of AI models for the average user. In response, we propose ModelGPT, a novel framework designed to determine and generate AI models specifically tailored to the data or task descriptions provided by the user, leveraging the capabilities of LLMs. Given user requirements, ModelGPT is able to provide tailored models at most 270x faster than the previous paradigms (e.g. all-parameter or LoRA finetuning). Comprehensive experiments on NLP, CV, and Tabular datasets attest to the effectiveness of our framework in making AI models more accessible and user-friendly. Our code is available at https://github.com/IshiKura-a/ModelGPT.	Translated: 2024-02-21 18:50:45 Published: 2024-02-18
# Accelerating local laplacian filters on FPGAs ( http://arxiv.org/abs/2402.12407v1 ) License: see link	Shashwat Khandelwal, Ziaul Choudhury, Shashwat Shrivastava, Suresh Purini	Processing images with various enhancement techniques often leads to edge degradation and other unwanted artifacts such as halos. These artifacts pose a major problem for photographic applications, where they can degrade the quality of an image. There is a plethora of edge-aware techniques proposed in the field of image processing. However, these require the application of complex optimization or post-processing methods. Local Laplacian Filtering is an edge-aware image processing technique that involves the construction of simple Gaussian and Laplacian pyramids. This technique can be successfully applied for detail smoothing, detail enhancement, tone mapping and inverse tone mapping of an image while keeping it artifact-free. The problem with this approach, though, is that it is computationally expensive. Hence, parallelization schemes using multi-core CPUs and GPUs have been proposed. As is well known, they are not power-efficient, and a well-designed hardware architecture on an FPGA can do better on the performance per watt metric. In this paper, we propose a hardware accelerator which fully exploits the available parallelism in the Local Laplacian Filtering algorithm, while minimizing the utilization of on-chip FPGA resources. On a Virtex-7 FPGA, we obtain a 7.5x speed-up to process a 1 MB image when compared to an optimized baseline CPU implementation. To the best of our knowledge, no other hardware accelerators have been proposed in the research literature for the Local Laplacian Filtering problem.	Translated: 2024-02-21 18:50:28 Published: 2024-02-18
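The Gaussian and Laplacian pyramids mentioned in the abstract above can be sketched in a few lines of plain Python. This shows only the pyramid building block, not the local Laplacian filter itself or anything about the FPGA design. A 3-tap binomial kernel stands in for a Gaussian and nearest-neighbour upsampling is used; both are assumptions chosen for brevity, and all function names are hypothetical.

```python
def blur_rows(img):
    """Blur each row with a [1, 2, 1] / 4 kernel, replicating edge pixels."""
    out = []
    for row in img:
        n = len(row)
        out.append([(row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4.0
                    for i in range(n)])
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def downsample(img):
    """Blur in both directions, then keep every second row and column."""
    blurred = transpose(blur_rows(transpose(blur_rows(img))))
    return [row[::2] for row in blurred[::2]]

def upsample(img, height, width):
    """Nearest-neighbour upsampling to the requested size."""
    return [[img[y // 2][x // 2] for x in range(width)] for y in range(height)]

def laplacian_pyramid(img, levels):
    gaussian = [[[float(v) for v in row] for row in img]]
    for _ in range(levels):
        gaussian.append(downsample(gaussian[-1]))
    pyramid = []
    for fine, coarse in zip(gaussian[:-1], gaussian[1:]):
        up = upsample(coarse, len(fine), len(fine[0]))
        # Each band holds the detail lost between two Gaussian levels.
        pyramid.append([[f - u for f, u in zip(fr, ur)] for fr, ur in zip(fine, up)])
    pyramid.append(gaussian[-1])  # coarsest level is kept as the residual
    return pyramid

def reconstruct(pyramid):
    img = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        up = upsample(img, len(band), len(band[0]))
        img = [[u + b for u, b in zip(ur, br)] for ur, br in zip(up, band)]
    return img
```

Every output pixel of `blur_rows` depends only on a small fixed neighbourhood, so the per-pixel work is independent across the image. That independence is the kind of parallelism an accelerator can exploit.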
# Teacher as a Lenient Expert: Teacher-Agnostic Data-Free Knowledge Distillation ( http://arxiv.org/abs/2402.12406v1 ) License: see link	Hyunjune Shin, Dong-Wan Choi	Data-free knowledge distillation (DFKD) aims to distill pretrained knowledge to a student model with the help of a generator without using original data. In such data-free scenarios, achieving stable performance of DFKD is essential due to the unavailability of validation data. Unfortunately, we have discovered that existing DFKD methods are quite sensitive to different teacher models, occasionally showing catastrophic failures of distillation, even when using well-trained teacher models. Our observation is that the generator in DFKD is not always guaranteed to produce precise yet diverse samples using the existing representative strategy of minimizing both class-prior and adversarial losses. Through our empirical study, we focus on the fact that class-prior not only decreases the diversity of generated samples, but also cannot completely address the problem of generating unexpectedly low-quality samples depending on teacher models. In this paper, we propose the teacher-agnostic data-free knowledge distillation (TA-DFKD) method, with the goal of more robust and stable performance regardless of teacher models. Our basic idea is to assign the teacher model a lenient expert role for evaluating samples, rather than a strict supervisor that enforces its class-prior on the generator. Specifically, we design a sample selection approach that takes only clean samples verified by the teacher model without imposing restrictions on the power of generating diverse samples. Through extensive experiments, we show that our method successfully achieves both robustness and training stability across various teacher models, while outperforming the existing DFKD methods.	Translated: 2024-02-21 18:50:08 Published: 2024-02-18
# scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type Annotation ( http://arxiv.org/abs/2402.12405v1 ) License: see link	Cong Li, Meng Xiao, Pengfei Wang, Guihai Feng, Xin Li, Yuanchun Zhou	Despite the inherent limitations of existing Large Language Models in directly reading and interpreting single-cell omics data, they demonstrate significant potential and flexibility as foundation models. This research focuses on how to train and adapt a Large Language Model so that it can interpret and distinguish cell types in single-cell RNA sequencing data. Our preliminary research results indicate that these foundation models excel in accurately categorizing known cell types, demonstrating the potential of Large Language Models as effective tools for uncovering new biological insights.	Translated: 2024-02-21 18:49:39 Published: 2024-02-18
# Deep-Lock: Secure Authorization for Deep Neural Networks ( http://arxiv.org/abs/2008.05966v2 ) License: see link	Manaar Alam, Sayandeep Saha, Debdeep Mukhopadhyay, Sandip Kundu	Trained Deep Neural Network (DNN) models are considered valuable Intellectual Properties (IP) in several business models. Prevention of IP theft and unauthorized usage of such DNN models has been raised as a significant concern by industry. In this paper, we address the problem of preventing unauthorized usage of DNN models by proposing a generic and lightweight key-based model-locking scheme, which ensures that a locked model functions correctly only upon applying the correct secret key. The proposed scheme, known as Deep-Lock, utilizes S-Boxes with good security properties to encrypt each parameter of a trained DNN model with secret keys generated from a master key via a key scheduling algorithm. The resulting dense network of encrypted weights is found to be robust against model fine-tuning attacks. Finally, Deep-Lock does not require any intervention in the structure and training of the DNN models, making it applicable to all existing software and hardware implementations of DNNs.	Translated: 2024-02-21 07:53:58 Published: 2024-02-18
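The key-based locking idea in the Deep-Lock abstract above can be illustrated with a toy sketch in which every weight gets its own subkey derived from a master key. Two stand-ins are assumed and should not be read as the paper's construction: SHA-256 over the master key and parameter index replaces the key scheduling algorithm, and a plain XOR replaces the S-Box based encryption. All function names are hypothetical.

```python
import hashlib
import struct

def derive_key(master_key: bytes, index: int) -> bytes:
    """Hypothetical key schedule: a 4-byte subkey for parameter `index`."""
    return hashlib.sha256(master_key + index.to_bytes(8, "big")).digest()[:4]

def lock(weights, master_key: bytes):
    """Encrypt each weight as float32; returns a list of 4-byte ciphertexts."""
    locked = []
    for i, w in enumerate(weights):
        plain = struct.pack("<f", w)
        key = derive_key(master_key, i)
        # XOR is a stand-in for the S-Box based cipher named in the abstract.
        locked.append(bytes(p ^ k for p, k in zip(plain, key)))
    return locked

def unlock(locked, master_key: bytes):
    """Recover the float32 weights; only the correct master key gives them back."""
    weights = []
    for i, cipher in enumerate(locked):
        key = derive_key(master_key, i)
        plain = bytes(c ^ k for c, k in zip(cipher, key))
        weights.append(struct.unpack("<f", plain)[0])
    return weights
```

The locked values are kept as raw bytes rather than floats so that no bit pattern is altered in storage. Unlocking with a wrong key still returns numbers, just not the trained ones, which matches the abstract's description of a model that functions correctly only with the right key.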
# 3次元VRスケッチから3次元形状検索へ Towards 3D VR-Sketch to 3D Shape Retrieval ( http://arxiv.org/abs/2209.10020v2 ) ライセンス: Link先を確認	Ling Luo, Yulia Gryaditskaya, Yongxin Yang, Tao Xiang, Yi-Zhe Song	(参考訳) 無料のオンライン3D形状コレクションの拡大が、3D検索の研究を牽引してきた。しかし、 (i)検索をトリガーする最良の入力モダリティ、及び (ii)そのような検索の究極の使用シナリオについては、活発な議論が続いている。本稿では,これらの問いに対して異なる視点を提供する。すなわち,3次元スケッチを入力モダリティとして用いることを検討し,検索を行うVRシナリオを提唱する。したがって、究極のビジョンは、ユーザーがVR環境で空中に落書き(エアドゥードリング)することで3Dモデルを自由に検索できることだ。この新しい3D VRスケッチから3D形状検索への問題に対する最初の試みとして、私たちは4つの貢献をした。まず、3D VRスケッチを収集し検索を行うためのVRユーティリティを実装する。第二に、ModelNetの2つの形状カテゴリーについて、最初の$167$個の3D VRスケッチを収集する。第3に,深層ネットワークを学習するために,抽象レベルが異なる人間らしい3Dスケッチの合成データセットを生成する手法を提案する。最後に,一般的なマルチビュー手法とボリューメトリック手法を比較し,3D形状から3D形状への検索とは対照的に,3D VRスケッチのスパースで抽象的な性質により,3Dスケッチから3D形状への検索ではボリューメトリックな点ベース手法が優れた性能を示すことを示す。これらの貢献が、この課題に対する今後の試みの実現に一役買うと私たちは信じています。 VRインターフェース、コード、データセットはhttps://tinyurl.com/3DSketch3DVで入手できる。 Growing free online 3D shape collections have driven research on 3D retrieval. There has, however, been active debate on (i) what the best input modality is to trigger retrieval, and (ii) the ultimate usage scenario for such retrieval. In this paper, we offer a different perspective towards answering these questions -- we study the use of 3D sketches as an input modality and advocate a VR-scenario where retrieval is conducted. Thus, the ultimate vision is that users can freely retrieve a 3D model by air-doodling in a VR environment. As a first stab at this new 3D VR-sketch to 3D shape retrieval problem, we make four contributions. First, we code a VR utility to collect 3D VR-sketches and conduct retrieval. Second, we collect the first set of $167$ 3D VR-sketches on two shape categories from ModelNet. Third, we propose a novel approach to generate a synthetic dataset of human-like 3D sketches of different abstract levels to train deep networks. Finally, we compare the common multi-view and volumetric approaches: we show that, in contrast to 3D shape to 3D shape retrieval, volumetric point-based approaches exhibit superior performance on 3D sketch to 3D shape retrieval due to the sparse and abstract nature of 3D VR-sketches. We believe these contributions will collectively serve as enablers for future attempts at this problem. The VR interface, code and datasets are available at https://tinyurl.com/3DSketch3DV.	翻訳日:2024-02-21 07:50:26 公開日:2024-02-18
# ログデータを用いた半教師付きバッチ学習 Semi-supervised Batch Learning From Logged Data ( http://arxiv.org/abs/2209.07148v3 ) ライセンス: Link先を確認	Gholamali Aminian, Armin Behnamnia, Roberto Vega, Laura Toni, Chengchun Shi, Hamid R. Rabiee, Omar Rivasplata, Miguel R. D. Rodrigues	(参考訳) オフポリシー学習法は、各サンプルポイントのコンテキスト、アクション、フィードバック(コストまたは報酬)を含むログデータからポリシーを学ぶことを意図している。本研究は, リスク最小化フレームワークの構築であり, また, 妥当性スコアへのアクセスも想定している。本稿では,いくつかのサンプルに対してフィードバックが欠落している問題に対する学習方法を提案する。我々は、このタイプの学習を、ログデータから半教師付きバッチ学習と呼び、広範囲のアプリケーションドメインで発生する。このような学習問題に対処するために、逆確率スコア推定器の下で真リスクの新たな上限を導出する。このバウンダリを用いて、正規化項がフィードバックに依存しないログデータを用いた半教師付きバッチ学習手法を提案し、その結果、ログ化された不足フィードバックデータを用いて評価できる。その結果、フィードバックは一部のサンプルにのみ存在するが、不足したフィードバックサンプルを活用することで学習ポリシーを学ぶことができる。ベンチマークデータセットから得られた実験の結果は、これらのアルゴリズムがロギングポリシーよりも優れたパフォーマンスでポリシーを達成することを示している。 Off-policy learning methods are intended to learn a policy from logged data, which includes context, action, and feedback (cost or reward) for each sample point. In this work, we build on the counterfactual risk minimization framework, which also assumes access to propensity scores. We propose learning methods for problems where feedback is missing for some samples, so there are samples with feedback and samples missing-feedback in the logged data. We refer to this type of learning as semi-supervised batch learning from logged data, which arises in a wide range of application domains. We derive a novel upper bound for the true risk under the inverse propensity score estimator to address this kind of learning problem. Using this bound, we propose a regularized semi-supervised batch learning method with logged data where the regularization term is feedback-independent and, as a result, can be evaluated using the logged missing-feedback data. Consequently, even though feedback is only present for some samples, a learning policy can be learned by leveraging the missing-feedback samples. 
The results of experiments derived from benchmark datasets indicate that these algorithms achieve policies with better performance in comparison with logging policies.	翻訳日:2024-02-21 07:49:49 公開日:2024-02-18
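The inverse propensity score (IPS) estimator mentioned in the abstract above can be sketched as follows. This is only the standard IPS risk estimate on fully labelled logged data, not the paper's regularized semi-supervised method; the function name `ips_risk`, the tuple layout, and the toy policies are illustrative assumptions.

```python
def ips_risk(logged, target_policy):
    """Inverse propensity score estimate of a target policy's expected cost.

    logged: list of (context, action, cost, propensity) tuples, where
            `propensity` is the logging policy's probability of `action`.
    target_policy: function (context, action) -> probability under the
            policy being evaluated (hypothetical interface for this sketch).
    """
    total = 0.0
    for context, action, cost, propensity in logged:
        # Reweight each logged cost by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy.
        weight = target_policy(context, action) / propensity
        total += weight * cost
    return total / len(logged)


# Toy logged data: a uniform logging policy over two actions.
logged = [
    ("x1", 0, 1.0, 0.5),
    ("x1", 1, 0.0, 0.5),
    ("x2", 0, 0.0, 0.5),
    ("x2", 1, 1.0, 0.5),
]


def uniform_policy(context, action):
    return 0.5


def greedy_policy(context, action):
    # Deterministically picks the zero-cost action for each context.
    best = {"x1": 1, "x2": 0}
    return 1.0 if action == best[context] else 0.0


uniform_estimate = ips_risk(logged, uniform_policy)
greedy_estimate = ips_risk(logged, greedy_policy)
```

Samples with missing feedback cannot enter this sum because their cost is unknown, which is why the abstract stresses a regularization term that does not depend on feedback.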
# テクスチャ・サリエンシー適応型注意を画像の漫画化に組み込む学習 Learning to Incorporate Texture Saliency Adaptive Attention to Image Cartoonization ( http://arxiv.org/abs/2208.01587v4 ) ライセンス: Link先を確認	Xiang Gao, Yuqi Zhang, and Yingjie Tian	(参考訳) 画像の漫画化は、近ごろ、教師なしのイメージ・ツー・イメージ翻訳の観点から、特徴ある漫画スタイル(クリアエッジ、スムーズなカラーシェーディング、抽象的な微細構造など)を正確に捉え、十分に伝達することが固有の課題である、生成的敵ネットワーク(GAN)に支配されている。既存の高度なモデルは、エッジを逆方向に推進する学習、スタイル伝達損失の導入、あるいは複数の表現空間からスタイルを整合させる学習により、漫画化効果を高めようとする。本稿では,より鮮明かつ鮮明なマンガ化効果が,基本的対向損失のみで容易に達成できることを実証する。漫画のスタイルが漫画のテクスチャ・サレントなローカル画像領域でより明確であることを示すため,通常の画像レベルと平行して,漫画のテクスチャの特徴をよりよく認識し伝達するために,漫画のテクスチャ・サレントなローカルパッチに対する逆学習を制限する領域レベルの逆学習ブランチを構築した。そこで, マンガ・テクスチュア・サリエンシ・サンプラー (CTSS) モジュールを提案し, トレーニングデータからマンガ・テクスチュア・サリエントパッチを動的にサンプリングする。広範な実験により,画像マンガ化における関連する手法の欠如成分として,敵対的学習におけるテクスチャ・サリエンシー適応的注意が,特に高分解能入力画像において,画像マンガのスタイライゼーションの促進と向上に重要であることを実証した。 Image cartoonization is recently dominated by generative adversarial networks (GANs) from the perspective of unsupervised image-to-image translation, in which an inherent challenge is to precisely capture and sufficiently transfer characteristic cartoon styles (e.g., clear edges, smooth color shading, abstract fine structures, etc.). Existing advanced models try to enhance cartoonization effect by learning to promote edges adversarially, introducing style transfer loss, or learning to align style from multiple representation space. This paper demonstrates that more distinct and vivid cartoonization effect could be easily achieved with only basic adversarial loss. Observing that cartoon style is more evident in cartoon-texture-salient local image regions, we build a region-level adversarial learning branch in parallel with the normal image-level one, which constrains adversarial learning on cartoon-texture-salient local patches for better perceiving and transferring cartoon texture features. To this end, a novel cartoon-texture-saliency-sampler (CTSS) module is proposed to dynamically sample cartoon-texture-salient patches from training data. 
With extensive experiments, we demonstrate that texture saliency adaptive attention in adversarial learning, as a missing ingredient of related methods in image cartoonization, is of significant importance in facilitating and enhancing image cartoon stylization, especially for high-resolution input pictures.	翻訳日:2024-02-21 07:49:30 公開日:2024-02-18
# 融合ラッソによるグラフ上の分散推定 Variance estimation in graphs with the fused lasso ( http://arxiv.org/abs/2207.12638v3 ) ライセンス: Link先を確認	Oscar Hernan Madrid Padilla	(参考訳) 一般グラフ構造問題における分散推定の問題について検討する。まず、一般グラフの分散を一貫して推定できる、等分散の場合に対する線形時間推定器を開発する。我々の推定器は,平均信号が正準スケーリングの全変動を持つ場合,チェーングラフと2次元グリッドグラフでミニマックスレートを達成することを示す。さらに、モーメント条件と誤差の裾挙動に関する上界のもとで、一般グラフにおける融合ラッソ推定器の平均二乗誤差性能に関する一般的な上界を与える。これらの上界により、誤差が劣ガウス確率変数であるという仮定のもとでのみ成り立つことが知られていた融合ラッソに関する多くの既存結果を、劣指数(sub-exponential)分布のような、より広い分布のクラスへ一般化できる。これらの上界を利用して、不均一分散の場合に分散の信号を推定する単純な全変動正則化推定器を研究する。また,我々の不均一分散推定器が,グリッドグラフおよび$K$近傍グラフにおける有界変動信号の推定でミニマックスレートを達成することを示す下界を与え,さらにこの推定器が任意の連結グラフの分散推定に対して一致性を持つことを示す。 We study the problem of variance estimation in general graph-structured problems. First, we develop a linear time estimator for the homoscedastic case that can consistently estimate the variance in general graphs. We show that our estimator attains minimax rates for the chain and 2D grid graphs when the mean signal has total variation with canonical scaling. Furthermore, we provide general upper bounds on the mean squared error performance of the fused lasso estimator in general graphs under a moment condition and a bound on the tail behavior of the errors. These upper bounds allow us to generalize for broader classes of distributions, such as sub-exponential, many existing results on the fused lasso that are only known to hold with the assumption that errors are sub-Gaussian random variables. Exploiting our upper bounds, we then study a simple total variation regularization estimator for estimating the signal of variances in the heteroscedastic case. We also provide lower bounds showing that our heteroscedastic variance estimator attains minimax rates for estimating signals of bounded variation in grid graphs, and $K$-nearest neighbor graphs, and the estimator is consistent for estimating the variances in any connected graph.	翻訳日:2024-02-21 07:48:25 公開日:2024-02-18
# 視線シフトの本質的コストを用いたサリエンシモデルの次の注視点予測の改善 Improving saliency models' predictions of the next fixation with humans' intrinsic cost of gaze shifts ( http://arxiv.org/abs/2207.04250v3 ) ライセンス: Link先を確認	Florian Kadner, Tobias Thomas, David Hoppe and Constantin A. Rothkopf	(参考訳) 画像領域に対する人間の優先順位付けは、サリエンシマップを用いて時間不変の方法で、あるいはスキャンパスモデルを用いて逐次的にモデル化できる。しかしながら、どちらのモデルもいくつかのベンチマークやデータセットで着実に改善されているものの、人間の視線の予測には依然として大きなギャップがある。本稿では,このギャップを減らすために,次の視線目標を予測するための原理的枠組みを確立した理論的解析と,画像の内容とは独立に視線シフトの人的コストを実証的に測定した結果という,2つの最近の進展を活用する。本稿では,任意の静的サリエンシマップを,視線シフトのたびに再計算される動的な履歴依存の価値マップの列に変換するアルゴリズムを,逐次意思決定の枠組みで導入する。これらのマップは、 1) 任意のサリエンシモデルによって提供されるサリエンシマップ、 2) 眼球運動の大きさと方向の嗜好を定量化する、最近測定された人的コスト関数、 3) 視線シフトのたびに変化する逐次的探索ボーナスに基づく。この探索ボーナスの空間的範囲と時間的減衰のパラメータは、人間の視線データから推定される。これら3つの構成要素の相対的な寄与はMIT1003データセット上でNSSスコアに対して最適化されており、3つの画像データセット上の5つの最先端サリエンシモデルに対して、次の視線目標の予測をNSSおよびAUCスコアで有意に上回るのに十分である。そこで我々は、人間の視線嗜好の実装を提供し、これは人間の次の視線目標に対する任意のサリエンシモデルの予測を改善するために使用できる。 The human prioritization of image regions can be modeled in a time invariant fashion with saliency maps or sequentially with scanpath models. However, while both types of models have steadily improved on several benchmarks and datasets, there is still a considerable gap in predicting human gaze. Here, we leverage two recent developments to reduce this gap: theoretical analyses establishing a principled framework for predicting the next gaze target and the empirical measurement of the human cost for gaze switches independently of image content. We introduce an algorithm in the framework of sequential decision making, which converts any static saliency map into a sequence of dynamic history-dependent value maps, which are recomputed after each gaze shift. These maps are based on 1) a saliency map provided by an arbitrary saliency model, 2) the recently measured human cost function quantifying preferences in magnitude and direction of eye movements, and 3) a sequential exploration bonus, which changes with each subsequent gaze shift. The parameters of the spatial extent and temporal decay of this exploration bonus are estimated from human gaze data. The relative contributions of these three components were optimized on the MIT1003 dataset for the NSS score and are sufficient to significantly outperform predictions of the next gaze target on NSS and AUC scores for five state-of-the-art saliency models on three image data sets. Thus, we provide an implementation of human gaze preferences, which can be used to improve arbitrary saliency models' predictions of humans' next gaze targets.	翻訳日:2024-02-21 07:48:01 公開日:2024-02-18
# 適応型クラスアクティベーションマッピングによるマルチビュー機能拡張 Multi-view Feature Augmentation with Adaptive Class Activation Mapping ( http://arxiv.org/abs/2206.12943v4 ) ライセンス: Link先を確認	Xiang Gao, Yingjie Tian, and Zhiquan Qi	(参考訳) モデル性能を向上させるために,複数ビューの局所的特徴を抽出し,活用する画像分類のためのエンドツーエンド・トレーニング可能な機能拡張モジュールを提案する。グローバル平均プーリング(GAP)を用いて,グローバルビューのみからベクトル化された特徴を抽出するのと異なり,モデルロバスト性を改善するため,多様な多視点局所特徴をサンプリング・アンサンブルすることを提案する。今回提案したAdaCAM (Adaptive Class Activation Mapping, 適応型クラス活性化マッピング) を通じて, 特徴マップのクラス識別ローカル領域に効率よく適応的に対応できる, 単純な補助的分類器ヘッド(1$\times$1畳み込み層を含む)を組み込んだ。広範な実験は、マルチビュー機能拡張モジュールによって達成された一貫性と注目すべきパフォーマンスの向上を示しています。 We propose an end-to-end-trainable feature augmentation module built for image classification that extracts and exploits multi-view local features to boost model performance. Different from using global average pooling (GAP) to extract vectorized features from only the global view, we propose to sample and ensemble diverse multi-view local features to improve model robustness. To sample class-representative local features, we incorporate a simple auxiliary classifier head (comprising only one 1$\times$1 convolutional layer) which efficiently and adaptively attends to class-discriminative local regions of feature maps via our proposed AdaCAM (Adaptive Class Activation Mapping). Extensive experiments demonstrate consistent and noticeable performance gains achieved by our multi-view feature augmentation module.	翻訳日:2024-02-21 07:46:35 公開日:2024-02-18
# SMEMO: 軌道予測のためのソーシャルメモリ SMEMO: Social Memory for Trajectory Forecasting ( http://arxiv.org/abs/2203.12446v2 ) ライセンス: Link先を確認	Francesco Marchetti, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo	(参考訳) 人間の相互作用の効果的なモデリングは、将来の軌跡のような行動を予測する際に最も重要である。それぞれの個人は、その動きによって周囲のエージェントに影響を与え、全員が衝突回避やグループフォローのような社会的に記述されていない規則に従う。本稿では,アルゴリズム的な観点から,すなわちデータ操作タスクとして問題を見ることにより,時間を通じて常に進化するそのようなインタラクションをモデル化する。本稿では,各エージェントに関する情報の連続書き込み,更新,リコールが可能な外部ストレージとして機能する,エンドツーエンドのトレーニング可能な作業メモリに基づくニューラルネットワークを提案する。提案手法は,異なるエージェントの動き間の説明可能な因果関係を学習し,複数の軌道予測データセットの最先端結果を得る。 Effective modeling of human interactions is of utmost importance when forecasting behaviors such as future trajectories. Each individual, with its motion, influences surrounding agents since everyone obeys to social non-written rules such as collision avoidance or group following. In this paper we model such interactions, which constantly evolve through time, by looking at the problem from an algorithmic point of view, i.e. as a data manipulation task. We present a neural network based on an end-to-end trainable working memory, which acts as an external storage where information about each agent can be continuously written, updated and recalled. We show that our method is capable of learning explainable cause-effect relationships between motions of different agents, obtaining state-of-the-art results on multiple trajectory forecasting datasets.	翻訳日:2024-02-21 07:44:34 公開日:2024-02-18
# 離散力学系における非自明な最小固定点の探索 Finding Nontrivial Minimum Fixed Points in Discrete Dynamical Systems ( http://arxiv.org/abs/2301.04090v4 ) ライセンス: Link先を確認	Zirou Qiu, Chen Chen, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti	(参考訳) ネットワーク化された離散力学システムは、伝染の拡散や、協調ゲームにおけるエージェントの意思決定をモデル化するためにしばしば用いられる。このような力学系の固定点は、システムが収束する構成を表す。望ましくない伝染(噂や誤報など)の拡散においては、影響を受けるノードが少数である固定点への収束が望ましい目標である。このような考慮により、影響を受けるノード数が最小となるシステムの非自明な固定点を見つけるという、新しい最適化問題を定式化する。 P = NP でない限り、任意の定数 $\epsilon > 0$ に対して、この問題の解を係数 $n^{1-\epsilon}$ 以内で近似する多項式時間アルゴリズムは存在しないことを示す。この計算困難性に対処するため,この問題を効率的に解決できる特別な場合をいくつか特定する。さらに,適切な大きさのネットワークに対する問題に対処する整数線形プログラムを提案する。大規模ネットワーク上での問題を解くために、貪欲選択法とともに一般的なヒューリスティックな枠組みを提案する。実世界のネットワークにおける広範囲な実験結果から,提案するヒューリスティックスの有効性が示された。 Networked discrete dynamical systems are often used to model the spread of contagions and decision-making by agents in coordination games. Fixed points of such dynamical systems represent configurations to which the system converges. In the dissemination of undesirable contagions (such as rumors and misinformation), convergence to fixed points with a small number of affected nodes is a desirable goal. Motivated by such considerations, we formulate a novel optimization problem of finding a nontrivial fixed point of the system with the minimum number of affected nodes. We establish that, unless P = NP, there is no polynomial time algorithm for approximating a solution to this problem to within a factor of $n^{1-\epsilon}$ for any constant $\epsilon > 0$. To cope with this computational intractability, we identify several special cases for which the problem can be solved efficiently. Further, we introduce an integer linear program to address the problem for networks of reasonable sizes. For solving the problem on larger networks, we propose a general heuristic framework along with greedy selection methods. Extensive experimental results on real-world networks demonstrate the effectiveness of the proposed heuristics.	翻訳日:2024-02-21 07:36:58 公開日:2024-02-18
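The optimization problem in the abstract above can be made concrete with a brute-force sketch on a tiny graph. The abstract does not specify the local update rule; the threshold rule over closed neighbourhoods used here is an assumption, as are all function names. The exhaustive search is exponential in the number of nodes and is only meant to illustrate the definition, not the paper's integer linear program or heuristics.

```python
from itertools import product


def step(adj, thresholds, state):
    """One synchronous update (assumed threshold rule): node v becomes 1 iff
    at least thresholds[v] nodes in its closed neighbourhood (v itself plus
    its neighbours) are currently 1."""
    new_state = []
    for v, neighbours in enumerate(adj):
        active = state[v] + sum(state[u] for u in neighbours)
        new_state.append(1 if active >= thresholds[v] else 0)
    return tuple(new_state)


def is_fixed_point(adj, thresholds, state):
    """A configuration is a fixed point if one update leaves it unchanged."""
    return step(adj, thresholds, state) == tuple(state)


def min_nontrivial_fixed_point(adj, thresholds):
    """Exhaustively search all 2^n configurations for a fixed point that is
    not all-zero and has the fewest affected (state 1) nodes."""
    best = None
    for state in product((0, 1), repeat=len(adj)):
        if sum(state) == 0:
            continue  # the all-zero configuration is the trivial fixed point
        if is_fixed_point(adj, thresholds, state):
            if best is None or sum(state) < sum(best):
                best = state
    return best


# Path graph 0 - 1 - 2 - 3, every node with threshold 2.
path_adj = [[1], [0, 2], [1, 3], [2]]
path_thresholds = [2, 2, 2, 2]
smallest = min_nontrivial_fixed_point(path_adj, path_thresholds)
```

On this path, a single affected node always reverts to 0 in the next step, so any nontrivial fixed point needs at least two adjacent affected nodes.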
# 時間系の絡み合いと特殊相対性 Time-System Entanglement and Special Relativity ( http://arxiv.org/abs/2212.13348v3 ) ライセンス: Link先を確認	Ngo Phuc Duc Loc	(参考訳) 空間と時間は古典物理学ではほぼ等しく扱われるが、量子力学ではそうではないことも分かっている。空間と時間の両方の量子記述は、現実の量子性を理解する上で重要である。量子時間のページ・ウーター機構は、量子系の進化と量子時間自由度の間の絡み合いによって記述される、有望な出発点である。本稿では,ローレンツ変換によって誘起されるウィグナー回転により量子系と絡み合う量子ビット時計を考える。この時間系の絡み合いがローレンツ加速の速さに依存するかを研究する。実例として、ガウス運動量分布を持つスピン-1/2粒子の場合を考える。また、時間系の絡み合いエントロピーとスピン運動量絡みエントロピーを比較し、前者が後者より小さいことを発見した。 We know that space and time are treated almost equally in classical physics, but we also know that this is not the case for quantum mechanics. A quantum description of both space and time is important to really understand the quantum nature of reality. The Page-Wootters mechanism of quantum time is a promising starting point, according to which the evolution of the quantum system is described by the entanglement between it and quantum temporal degrees of freedom. In this paper, we consider a qubit clock that is entangled with a quantum system due to the Wigner rotation induced by Lorentz transformation. We study how this time-system entanglement depends on the rapidity of the Lorentz boost. We consider the case of a spin-1/2 particle with Gaussian momentum distribution as a concrete example. We also compare the time-system entanglement entropy with the spin-momentum entanglement entropy and find that the former is smaller than the latter.	翻訳日:2024-02-21 07:36:42 公開日:2024-02-18
# 分割統治カーネル関数線形回帰の統計的最適性 Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression ( http://arxiv.org/abs/2211.10968v3 ) ライセンス: Link先を確認	Jiading Liu and Lei Shi	(参考訳) 再生核ヒルベルト空間(RKHS)における正則化関数線形回帰の従来の解析では、通常、対象関数がこの核空間に含まれることが必要である。本稿では, 対象関数が基礎となるRKHSに必ずしも属さないシナリオにおいて, 分割統治推定器の収束性能について検討する。分解に基づくスケーラブルなアプローチとして、関数線形回帰の分割統治推定器は、時間とメモリにおけるアルゴリズムの複雑さを大幅に減らすことができる。我々は、説明変数と対象関数の様々な正則性条件下で、分割統治推定器を用いた予測に対するシャープな有限標本上界を確立するための積分作用素アプローチを開発する。また、ミニマックス下界を構築することによって、導出したレートの漸近的最適性を証明する。最後に,ノイズのない推定器の収束について考察し,穏やかな条件下ではレートが任意に速くなりうることを示す。 Previous analysis of regularized functional linear regression in a reproducing kernel Hilbert space (RKHS) typically requires the target function to be contained in this kernel space. This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not necessarily reside in the underlying RKHS. As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory. We develop an integral operator approach to establish sharp finite sample upper bounds for prediction with divide-and-conquer estimators under various regularity conditions of explanatory variables and target function. We also prove the asymptotic optimality of the derived rates by building the minimax lower bounds. Finally, we consider the convergence of noiseless estimators and show that the rates can be arbitrarily fast under mild conditions.	翻訳日:2024-02-21 07:34:35 公開日:2024-02-18
# データフローエンジンによる高最適化量子回路 Highly optimized quantum circuits synthesized via data-flow engines ( http://arxiv.org/abs/2211.07685v3 ) ライセンス: Link先を確認	Peter Rakyta, Gregory Morse, Jakab Nádori, Zita Majnay-Takács, Oskar Mencer, Zoltán Zimborás	(参考訳) 最少数のゲート演算による量子プログラムの定式化は、近年アクセス可能なノイズのある量子プロセッサから有意義な結果を得るために重要である。本研究では、FPGA(Field Programmable Gate Array)ベースのデータフローエンジン(DFE)を用いて変分量子コンパイラをスケールアップし、最大$9$量子ビットのプログラムまで回路を合成するユースケースを示す。このゲート分解器は、単一量子ビット回転と制御2量子ビットゲートからなる任意の量子回路をFPGAチップ上でシミュレートするように設計された、新たに開発されたDFE量子コンピュータシミュレータを利用する。 QISKITパッケージとのベンチマークでは,SQUANDERパッケージ(DFEアクセラレータサポート付き)が生成する回路の深さは平均で$97\%$小さく,回路の忠実度は$\sim10^{-4}$の誤差の範囲で1に近かった。 The formulation of quantum programs in terms of the fewest number of gate operations is crucial to retrieve meaningful results from the noisy quantum processors accessible these days. In this work, we demonstrate a use-case for Field Programmable Gate Array (FPGA) based data-flow engines (DFEs) to scale up variational quantum compilers to synthesize circuits up to $9$-qubit programs. This gate decomposer utilizes a newly developed DFE quantum computer simulator that is designed to simulate arbitrary quantum circuits consisting of single qubit rotations and controlled two-qubit gates on FPGA chips. In our benchmark with the QISKIT package, the depth of the circuits produced by the SQUANDER package (with the DFE accelerator support) was $97\%$ smaller on average, while the fidelity of the circuits was still close to unity up to an error of $\sim10^{-4}$.	翻訳日:2024-02-21 07:34:18 公開日:2024-02-18
# 減数化拡散から減数化マルコフモデルへ From Denoising Diffusions to Denoising Markov Models ( http://arxiv.org/abs/2211.03595v3 ) ライセンス: Link先を確認	Joe Benton, Yuyang Shi, Valentin De Bortoli, George Deligiannidis, Arnaud Doucet	(参考訳) ノイズ拡散は、驚くべき経験的性能を示す最先端の生成モデルである。それらは、データ分布をガウス分布に拡散し、このノミネーションプロセスを逆転して合成データポイントを得るように学習することで機能する。ノイズ拡散は、スコアマッチングを用いたノイズデータ密度の対数微分の近似に依存する。このようなモデルは、事前および可能性からのみサンプリングできる場合、近似後続シミュレーションの実行にも使用できる。本稿では,このアプローチを広い範囲に一般化した統一フレームワークを提案し,スコアマッチングを独自に拡張する。様々なアプリケーションで得られたモデルを説明します。 Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. They work by diffusing the data distribution into a Gaussian distribution and then learning to reverse this noising process to obtain synthetic datapoints. The denoising diffusion relies on approximations of the logarithmic derivatives of the noised data densities using score matching. Such models can also be used to perform approximate posterior simulation when one can only sample from the prior and likelihood. We propose a unifying framework generalising this approach to a wide class of spaces and leading to an original extension of score matching. We illustrate the resulting models on various applications.	翻訳日:2024-02-21 07:33:45 公開日:2024-02-18
# 未分離調理映像からのレシピ生成 Recipe Generation from Unsegmented Cooking Videos ( http://arxiv.org/abs/2209.10134v2 ) ライセンス: Link先を確認	Taichi Nishimura and Atsushi Hashimoto and Yoshitaka Ushiku and Hirotaka Kameko and Shinsuke Mori	(参考訳) 本稿では,(1)調理完了時に重要なイベントを抽出し,(2)抽出したイベントの文を生成することをエージェントに要求する,無節の調理ビデオからのレシピ生成に取り組む。我々の課題は、出来事を徹底的に検出し、それらに対する文を生成することを目的とした高密度ビデオキャプション(DVC)と似ている。しかし、レシピ生成においては、DVCとは異なり、レシピストーリーの認識が不可欠であり、モデルが正しい順序で適切な回数のイベントを抽出し、それらに基づいて正確な文章を生成する必要がある。 dvcモデルの出力を分析し、(1)いくつかのイベントをレシピストーリーとして採用できるが、(2)生成された文が視覚的な内容に基づかないことを確認した。これに基づいて,出力イベントからoracleイベントを選択し,文章を再生成することで,適切なレシピを得るという目標を設定しました。そこで本研究では,DVCのイベントからオラクルイベントを選択して文を生成するイベントセレクタと文生成器をトレーニングする,トランスフォーマーに基づくマルチモーダルリカレントアプローチを提案する。さらに、より正確なレシピを生成するために材料を含めることでモデルを拡張する。実験の結果,提案手法は最先端DVCモデルよりも優れていた。また,本モデルでは,レシピをストーリーアウェアな方法でモデル化することにより,適切なイベント数を正しい順序で出力することを確認した。 This paper tackles recipe generation from unsegmented cooking videos, a task that requires agents to (1) extract key events in completing the dish and (2) generate sentences for the extracted events. Our task is similar to dense video captioning (DVC), which aims at detecting events thoroughly and generating sentences for them. However, unlike DVC, in recipe generation, recipe story awareness is crucial, and a model should extract an appropriate number of events in the correct order and generate accurate sentences based on them. We analyze the output of the DVC model and confirm that although (1) several events are adoptable as a recipe story, (2) the generated sentences for such events are not grounded in the visual content. Based on this, we set our goal to obtain correct recipes by selecting oracle events from the output events and re-generating sentences for them. To achieve this, we propose a transformer-based multimodal recurrent approach of training an event selector and sentence generator for selecting oracle events from the DVC's events and generating sentences for them. 
In addition, we extend the model by including ingredients to generate more accurate recipes. The experimental results show that the proposed method outperforms state-of-the-art DVC models. We also confirm that, by modeling the recipe in a story-aware manner, the proposed model outputs the appropriate number of events in the correct order.	翻訳日:2024-02-21 07:32:28 公開日:2024-02-18
# UK Biobankの眼底画像からの深層学習による既存および新規発症パーキンソン病の予測 Deep Learning Predicts Prevalent and Incident Parkinson's Disease From UK Biobank Fundus Imaging ( http://arxiv.org/abs/2302.06727v3 ) ライセンス: Link先を確認	Charlie Tran, Kai Shen, Kang Liu, Akshay Ashok, Adolfo Ramirez-Zamora, Jinghua Chen, Yulin Li, and Ruogu Fang	(参考訳) パーキンソン病は世界で最も急速に増加している神経疾患である。パーキンソン病のメカニズムを解明し、診断を自動化する研究は、パーキンソン病患者の治療を大幅に改善する。現在の診断方法は高価であり、利用可能性が限られている。本疾患が潜行性で前臨床的に発症・進行することを考慮すれば, 望ましいスクリーニングは, 医療的介入を可能にするために, 症状の発症前であっても診断的に正確であるべきである。我々は、しばしば脳への窓と呼ばれる網膜眼底画像を、パーキンソン病の診断スクリーニング手段として取り上げる。UK Biobankの眼底画像からパーキンソン病を分類するための従来の機械学習とディープラーニングの手法を体系的に評価した。以上の結果から,パーキンソン病患者は年齢と性別をマッチさせた健常者から,AUC (Area Under the Curve) 0.77で識別できることが示された。この精度は、既存(prevalent)または新規発症(incident)のいずれのパーキンソン病を予測する場合にも維持される。説明可能性と信頼性は、局所的なバイオマーカーの視覚的属性マップと、データ摂動に対するモデルロバスト性の定量的指標によって向上する。 Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable screening should be diagnostically accurate even before the onset of symptoms to allow medical interventions. We highlight retinal fundus imaging, often termed a window to the brain, as a diagnostic screening modality for Parkinson's disease. We conducted a systematic evaluation of conventional machine learning and deep learning techniques to classify Parkinson's disease from UK Biobank fundus imaging. Our results show that Parkinson's disease individuals can be differentiated from age and gender-matched healthy subjects with an Area Under the Curve (AUC) of 0.77. This accuracy is maintained when predicting either prevalent or incident Parkinson's disease. Explainability and trustworthiness are enhanced by visual attribution maps of localized biomarkers and quantified metrics of model robustness to data perturbations.	翻訳日:2024-02-21 07:22:51 公開日:2024-02-18
# 近似輸送写像を用いたサンプリングについて On Sampling with Approximate Transport Maps ( http://arxiv.org/abs/2302.04763v3 ) ライセンス: Link先を確認	Louis Grenioux, Alain Durmus, Éric Moulines, Marylou Gabrié	(参考訳) 輸送写像は、扱いやすい分布に変換することで、非自明な幾何を持つ分布のサンプリングを容易にすることができる。このアプローチの可能性は、参照分布をターゲットに向かって押し出すように訓練された、ディープニューラルネットワークでパラメータ化された写像である正規化フロー(NF)の開発によって高まっている。最近提案されたNF強化サンプラーは、(マルコフ連鎖)モンテカルロ法を (i)フローから引いた提案,又は (ii)フローベースの再パラメータ化のいずれかと組み合わせる。いずれの場合も、学習した輸送の品質が性能を左右する。本研究は,これら2つのアプローチの相対的な強みと弱みを初めて明らかにする。本研究は,マルチモーダルなターゲットは、中程度の高次元までならフローベースの提案で確実に処理できると結論づける。対照的に、再パラメータ化に依存する手法はマルチモダリティに苦しむが、それ以外では高次元の設定や訓練が不十分な場合においてより堅牢である。さらに, ターゲットと提案分布の適合性の影響を明らかにするために, 独立メトロポリス・ヘイスティングスサンプラーの混合時間に対する新しい定量的境界を導出する。 Transport maps can ease the sampling of distributions with non-trivial geometries by transforming them into distributions that are easier to handle. The potential of this approach has risen with the development of Normalizing Flows (NF) which are maps parameterized with deep neural networks trained to push a reference distribution towards a target. Recently proposed NF-enhanced samplers blend (Markov chain) Monte Carlo methods with either (i) proposal draws from the flow or (ii) a flow-based reparametrization. In both cases, the quality of the learned transport determines performance. The present work clarifies for the first time the relative strengths and weaknesses of these two approaches. Our study concludes that multimodal targets can be reliably handled with flow-based proposals up to moderately high dimensions. In contrast, methods relying on reparametrization struggle with multimodality but are more robust otherwise in high-dimensional settings and under poor training. To further illustrate the influence of target-proposal adequacy, we also derive a new quantitative bound for the mixing time of the Independent Metropolis-Hastings sampler.	翻訳日:2024-02-21 07:22:18 公開日:2024-02-18
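The Independent Metropolis-Hastings sampler named in the abstract above can be sketched in one dimension as follows. A fixed Gaussian stands in for the flow-based proposal, which is an assumption of this sketch; the function and argument names are illustrative and not taken from the paper.

```python
import math
import random


def independent_mh(log_target, sample_proposal, log_proposal, n_steps, x0, rng):
    """Independent Metropolis-Hastings: each proposal is drawn without looking
    at the current state, and is accepted with probability
    min(1, w(y) / w(x)), where w = target / proposal is the importance weight.
    Densities may be unnormalized because constants cancel in the ratio."""
    x = x0
    chain = []
    accepted = 0
    for _ in range(n_steps):
        y = sample_proposal(rng)
        log_alpha = (log_target(y) - log_proposal(y)) - (
            log_target(x) - log_proposal(x)
        )
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps


def log_std_normal(x):
    return -0.5 * x * x


def log_shifted_normal(x):
    # Unnormalized log density of N(3, 1), used as a mismatched target.
    return -0.5 * (x - 3.0) ** 2


def draw_std_normal(rng):
    return rng.gauss(0.0, 1.0)


# Perfect proposal (proposal equals target): every draw is accepted.
_, rate_matched = independent_mh(
    log_std_normal, draw_std_normal, log_std_normal, 500, 0.0, random.Random(0)
)

# Mismatched proposal: acceptance drops as the target moves away.
chain_mismatched, rate_mismatched = independent_mh(
    log_shifted_normal, draw_std_normal, log_std_normal, 500, 0.0, random.Random(0)
)
```

The acceptance rate falls as the proposal drifts from the target, which is one simple way to see why the abstract ties performance to the quality of the learned transport.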
# WOMD-LiDAR:モーション予測のための生センサデータセットベンチマーク WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting ( http://arxiv.org/abs/2304.03834v2 ) ライセンス: Link先を確認	Kan Chen, Runzhou Ge, Hang Qiu, Rami AI-Rfou, Charles R. Qi, Xuanyu Zhou, Zoey Yang, Scott Ettinger, Pei Sun, Zhaoqi Leng, Mustafa Baniodeh, Ivan Bogun, Weiyue Wang, Mingxing Tan, Dragomir Anguelov	(参考訳) 広く採用されている動き予測データセットは、観測された感覚入力を3Dボックスやポリラインのような高レベルの抽象化で置き換える。これらのスパースな形状は、知覚システムの予測で元のシーンに注釈を付けて推測される。このような中間表現は、動き予測モデルの品質とコンピュータビジョンモデルの性能を結びつける。さらに、人間によって設計された知覚と動き予測の明確なインターフェースは、通常、元の感覚入力に存在する意味情報のサブセットを通り過ぎます。これらのモジュラーアプローチの効果について検討し、これらの制約を緩和する新しいパラダイムを設計し、エンドツーエンドのモーション予測モデルの開発を加速するために、大規模かつ高品質で多様なLiDARデータを用いて、Waymo Open Motion Dataset(WOMD)を拡張した。新しい拡張現実データセットWOMD-LiDARは、それぞれ20秒にまたがる10000以上のシーンで構成され、高度に同期化され、校正された高品質のLiDAR点雲が、都市や郊外の地理的に捕獲される(https://waymo.com/open/data/motion/)。 Waymo Open Dataset (WOD)と比較して、WOMD-LiDARデータセットには100倍以上のシーンが含まれている。さらに,lidarデータをモーション予測モデルのトレーニングに統合し,強力なベースラインを提供する。実験の結果,LiDARデータは動き予測タスクの改善をもたらすことがわかった。我々は、WOMD-LiDARがエンドツーエンドのモーション予測モデルを強化する新たな機会を提供することを期待している。 Widely adopted motion forecasting datasets substitute the observed sensory inputs with higher-level abstractions such as 3D boxes and polylines. These sparse shapes are inferred through annotating the original scenes with perception systems' predictions. Such intermediate representations tie the quality of the motion forecasting models to the performance of computer vision models. Moreover, the human-designed explicit interfaces between perception and motion forecasting typically pass only a subset of the semantic information present in the original sensory input. To study the effect of these modular approaches, design new paradigms that mitigate these limitations, and accelerate the development of end-to-end motion forecasting models, we augment the Waymo Open Motion Dataset (WOMD) with large-scale, high-quality, diverse LiDAR data for the motion forecasting task. 
The new augmented dataset WOMD-LiDAR consists of over 100,000 scenes that each spans 20 seconds, consisting of well-synchronized and calibrated high quality LiDAR point clouds captured across a range of urban and suburban geographies (https://waymo.com/open/data/motion/). Compared to Waymo Open Dataset (WOD), WOMD-LiDAR dataset contains 100x more scenes. Furthermore, we integrate the LiDAR data into the motion forecasting model training and provide a strong baseline. Experiments show that the LiDAR data brings improvement in the motion forecasting task. We hope that WOMD-LiDAR will provide new opportunities for boosting end-to-end motion forecasting models.	翻訳日:2024-02-21 07:12:37 公開日:2024-02-18
# 確率制御とゲームのための機械学習手法の最近の進歩 Recent Developments in Machine Learning Methods for Stochastic Control and Games ( http://arxiv.org/abs/2303.10257v2 ) ライセンス: Link先を確認	Ruimeng Hu, Mathieu Laurière	(参考訳) 確率的最適制御とゲームは、金融や経済学から社会科学、ロボット工学、エネルギー管理まで幅広い応用がある。多くの実世界の応用は、洗練された数値手法の開発を駆動する複雑なモデルを含んでいる。近年,確率制御問題やゲームを解くために機械学習に基づく計算手法が開発されている。本稿では,高次元でも,あるいは構造が非常に複雑であっても,従来の数値的手法が達成できる範囲を超えて,そのような問題を解決する可能性を解いた深層学習手法に注目する。主に連続時間と連続空間の設定を考える。新しいアプローチの多くは、高次元偏微分方程式や後方確率微分方程式を解くための最近のニューラル・ネットワークに基づく手法、またはマルコフ決定過程のモデルなし強化学習に基づいて構築され、画期的な結果をもたらした。本稿では,これらの手法を紹介するとともに,機械学習と確率制御とゲームにおける最先端の成果を概説する。 Stochastic optimal control and games have a wide range of applications, from finance and economics to social sciences, robotics, and energy management. Many real-world applications involve complex models that have driven the development of sophisticated numerical methods. Recently, computational methods based on machine learning have been developed for solving stochastic control problems and games. In this review, we focus on deep learning methods that have unlocked the possibility of solving such problems, even in high dimensions or when the structure is very complex, beyond what traditional numerical methods can achieve. We consider mostly the continuous time and continuous space setting. Many of the new approaches build on recent neural-network-based methods for solving high-dimensional partial differential equations or backward stochastic differential equations, or on model-free reinforcement learning for Markov decision processes that have led to breakthrough results. This paper provides an introduction to these methods and summarizes the state-of-the-art works at the crossroad of machine learning and stochastic control and games.	翻訳日:2024-02-21 07:09:31 公開日:2024-02-18
# Multi-task Meta Label Correction for Time Series Prediction ( http://arxiv.org/abs/2303.08103v3 ) License: see linked page	Luxuan Yang, Ting Gao, Wei Wei, Min Dai, Cheng Fang, Jinqiao Duan	Time series classification faces two unavoidable problems. One is partial feature information and the other is poor label quality, both of which may affect model performance. To address these issues, we develop a label correction method for time series data with meta-learning under a multi-task framework. There are three main contributions. First, we train the label correction model with a two-branch neural network in the outer loop. In the model-agnostic inner loop, we use pre-existing classification models in a multi-task way and jointly update the meta-knowledge, which helps us achieve adaptive labeling on complex time series. Second, we devise new data visualization methods for both image patterns of the historical data and data in the prediction horizon. Finally, we test our method with various financial datasets, including XOM, S&P500, and SZ50. Results show that our method is more effective and accurate than some existing label correction techniques.	Translated: 2024-02-21 07:09:16 Published: 2024-02-18
# EventNet-ITA: Italian Frame Parsing for Events ( http://arxiv.org/abs/2305.10892v2 ) License: see linked page	Marco Rovera	This paper introduces EventNet-ITA, a large, multi-domain corpus annotated full-text with event frames for Italian. Moreover, we present and thoroughly evaluate an efficient multi-label sequence labeling approach for Frame Parsing. Covering a wide range of individual, social and historical phenomena, with more than 53,000 annotated sentences and over 200 modeled frames, EventNet-ITA constitutes the first systematic attempt to provide the Italian language with a publicly available resource for Frame Parsing of events, useful for a broad spectrum of research and application tasks. Our approach achieves a promising 0.9 strict F1-score for frame classification and 0.72 for frame element classification, in addition to minimizing computational requirements. The annotated corpus and the frame parsing model are released under an open license.	Translated: 2024-02-21 06:58:55 Published: 2024-02-18
# Visualizing Entanglement in multi-Qubit Systems ( http://arxiv.org/abs/2305.07596v4 ) License: see linked page	Jonas Bley, Eva Rexigel, Alda Arias, Nikolas Longen, Lars Krupp, Maximilian Kiefer-Emmanouilidis, Paul Lukowicz, Anna Donhauser, Stefan Küchemann, Jochen Kuhn, and Artur Widera	In the field of quantum information science and technology, the representation and visualization of quantum states and related processes are essential for both research and education. In this context, the focus lies especially on ensembles of few qubits. There exist many powerful representations for single-qubit and multi-qubit systems, such as the famous Bloch sphere and its generalizations. Here, we utilize the dimensional circle notation as a representation of such ensembles, adapting the so-called circle notation of qubits and the idea of representing the n-particle system in an n-dimensional space. We show that the mathematical conditions for separability lead to symmetry conditions of the visualized quantum state, offering a new perspective on entanglement in few-qubit systems and therefore on various quantum algorithms. In this way, dimensional notations promise significant potential for conveying nontrivial quantum entanglement properties and processes in few-qubit systems to a broader audience, and could enhance understanding of these concepts as a bridge between intuitive quantum insight and formal mathematical descriptions.	Translated: 2024-02-21 06:58:12 Published: 2024-02-18
# MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks ( http://arxiv.org/abs/2304.14979v2 ) License: see linked page	Lei Zhang, Yuge Zhang, Kan Ren, Dongsheng Li, Yuqing Yang	The field of machine learning (ML) has gained widespread adoption, leading to significant demand for adapting ML to specific scenarios, which remains expensive and non-trivial. The predominant approaches towards the automation of solving ML tasks (e.g., AutoML) are often time-consuming and hard for human developers to understand. In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult to utilize by quantitative approaches. In this paper, we aim to bridge the gap between machine intelligence and human knowledge by introducing a novel framework, which leverages state-of-the-art large language models (LLMs) to develop ML solutions for novel tasks. We showcase the possibility of extending the capability of LLMs to comprehend structured inputs and perform thorough reasoning for solving novel ML tasks. We find that, after some dedicated design, the LLM can (i) observe from the existing experiences of ML tasks and (ii) reason effectively to deliver promising results for new tasks. The solution generated can be used directly to achieve high levels of competitiveness. Examples and code available at https://github.com/microsoft/CoML.	Translated: 2024-02-21 06:56:30 Published: 2024-02-18
# A Study on Chinese Social Perspective regarding ChatGPT for Education and Beyond ( http://arxiv.org/abs/2306.04325v3 ) License: see linked page	Yao Tian, Chengwei Tong, Lik-Hang Lee, Reza Hadi Mogavi, Yong Liao, Pengyuan Zhou	ChatGPT has piqued the interest of many fields, particularly the academic community. GPT-4, the latest version, has started to support multimodal input and output. This study examines social media posts to analyze how the Chinese public perceives the potential of ChatGPT for educational and general purposes. The study also serves as the first effort to investigate the changes in public opinion since the release of GPT-4. According to the analysis results, prior to GPT-4, some social media users believed that AI advancements would benefit education and society, while others believed that advanced AI, such as ChatGPT, would make humans feel inferior and lead to problems such as cheating and a decline in moral principles; the majority remained neutral. Interestingly, public attitudes have tended to shift in a positive direction since the release of GPT-4. We present a thorough analysis of this shift in trend and a roadmap to ensure the ethical application of ChatGPT-like models in education and beyond.	Translated: 2024-02-21 06:48:46 Published: 2024-02-18
# Consistency-guided Prompt Learning for Vision-Language Models ( http://arxiv.org/abs/2306.01195v2 ) License: see linked page	Shuvendu Roy, Ali Etemad	We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models. Our approach improves the generalization of large foundation models when fine-tuned on downstream tasks in a few-shot setting. The basic idea of CoPrompt is to enforce a consistency constraint in the prediction of the trainable and pre-trained models to prevent overfitting on the downstream task. Additionally, we introduce the following two components into our consistency constraint to further boost the performance: enforcing consistency on two perturbed inputs and combining the two dominant tuning paradigms, prompting and adapters. Enforcing consistency on perturbed input serves to further regularize the consistency constraint, thereby improving generalization. Moreover, the integration of adapters and prompts not only enhances performance on downstream tasks but also offers increased tuning flexibility in both input and output spaces. This facilitates more effective adaptation to downstream tasks in a few-shot learning setting. Experiments show that CoPrompt outperforms existing methods on a range of evaluation suites, including base-to-novel generalization, domain generalization, and cross-dataset evaluation. On generalization, CoPrompt improves the state-of-the-art on zero-shot tasks and the overall harmonic mean over 11 datasets. Detailed ablation studies show the effectiveness of each of the components in CoPrompt.	Translated: 2024-02-21 06:48:04 Published: 2024-02-18
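The consistency constraint in the CoPrompt abstract above can be illustrated with a small sketch. The abstract does not state the distance used, so the cosine-distance form below, the function name `cosine_consistency_loss`, and the toy embeddings are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def cosine_consistency_loss(trainable_emb, frozen_emb, eps=1e-12):
    """Mean cosine distance between two batches of embeddings.

    Assumed form of a consistency penalty: it is near 0 when the trainable
    model's embeddings point in the same direction as the frozen pre-trained
    model's embeddings, and grows as they drift apart.
    """
    a = np.asarray(trainable_emb, dtype=float)
    b = np.asarray(frozen_emb, dtype=float)
    # Normalize each row to unit length (eps guards against division by zero).
    a_unit = a / (np.linalg.norm(a, axis=-1, keepdims=True) + eps)
    b_unit = b / (np.linalg.norm(b, axis=-1, keepdims=True) + eps)
    cosine = np.sum(a_unit * b_unit, axis=-1)
    return float(np.mean(1.0 - cosine))

# Hypothetical embeddings: the second row has drifted away from the frozen model.
frozen = np.array([[1.0, 0.0], [0.0, 1.0]])
tuned = np.array([[2.0, 0.0], [1.0, 0.0]])
loss = cosine_consistency_loss(tuned, frozen)
```

In a training loop such a term would typically be added to the task loss with a weighting coefficient, which is likewise a design choice not specified in the abstract.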
# Balanced Training of Energy-Based Models with Adaptive Flow Sampling ( http://arxiv.org/abs/2306.00684v4 ) License: see linked page	Louis Grenioux, Éric Moulines, Marylou Gabrié	Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density. Although very flexible, EBMs lack a specified normalization constant of the model, making the likelihood of the model computationally intractable. Several approximate samplers and variational inference techniques have been proposed to estimate the likelihood gradients for training. These techniques have shown promising results in generating samples, but little attention has been paid to the statistical accuracy of the estimated density, such as determining the relative importance of different classes in a dataset. In this work, we propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF), which have recently been proposed to facilitate sampling. Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times, ultimately leading to a fast sampler for generating new data.	Translated: 2024-02-21 06:47:21 Published: 2024-02-18
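To make the "unnormalized log density" point in the abstract above concrete, the following sketch normalizes a toy one-dimensional energy function by brute-force summation on a grid. The quadratic `energy` function, the grid bounds, and the helper name `log_partition_on_grid` are hypothetical choices for illustration only; the paper's method is aimed at the high-dimensional case where no such grid is feasible.

```python
import numpy as np

def energy(x):
    """Hypothetical 1-D energy; exp(-energy(x)) is an unnormalized Gaussian."""
    return 0.5 * x ** 2

def log_partition_on_grid(energy_fn, low=-10.0, high=10.0, num_points=20001):
    """Approximate log Z = log of the integral of exp(-E(x)) with a Riemann sum."""
    grid = np.linspace(low, high, num_points)
    step = grid[1] - grid[0]
    log_unnorm = -energy_fn(grid)
    # Subtract the maximum before exponentiating for numerical stability.
    shift = log_unnorm.max()
    return float(shift + np.log(np.sum(np.exp(log_unnorm - shift)) * step))

log_z = log_partition_on_grid(energy)

def log_density(x):
    """Normalized log density, available only because log Z was computable here."""
    return -energy(x) - log_z
```

A grid with `num_points` nodes per axis needs `num_points ** d` evaluations in `d` dimensions, which is why the likelihood is described as intractable and sampling-based gradient estimates are used instead.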
# Can We Evaluate Domain Adaptation Models Without Target-Domain Labels? ( http://arxiv.org/abs/2305.18712v3 ) License: see linked page	Jianfei Yang, Hanjie Qian, Yuecong Xu, Kai Wang, Lihua Xie	Unsupervised domain adaptation (UDA) involves adapting a model trained on a label-rich source domain to an unlabeled target domain. However, in real-world scenarios, the absence of target-domain labels makes it challenging to evaluate the performance of UDA models. Furthermore, prevailing UDA methods relying on adversarial training and self-training could lead to model degeneration and negative transfer, further exacerbating the evaluation problem. In this paper, we propose a novel metric called the Transfer Score to address these issues. The proposed metric enables the unsupervised evaluation of UDA models by assessing the spatial uniformity of the classifier via model parameters, as well as the transferability and discriminability of deep representations. Based on the metric, we achieve three novel objectives without target-domain labels: (1) selecting the best UDA method from a range of available options, (2) optimizing hyperparameters of UDA models to prevent model degeneration, and (3) identifying which checkpoint of a UDA model performs optimally. Our work bridges the gap between data-level UDA research and practical UDA scenarios, enabling a realistic assessment of UDA model performance. We validate the effectiveness of our metric through extensive empirical studies on UDA datasets of different scales and imbalanced distributions. The results demonstrate that our metric robustly achieves the aforementioned goals.	Translated: 2024-02-21 06:45:16 Published: 2024-02-18
# Leaving the Nest: Going Beyond Local Loss Functions for Predict-Then-Optimize ( http://arxiv.org/abs/2305.16830v2 ) License: see linked page	Sanket Shah, Andrew Perrault, Bryan Wilder, Milind Tambe	Predict-then-Optimize is a framework for using machine learning to perform decision-making under uncertainty. The central research question it asks is, "How can the structure of a decision-making task be used to tailor ML models for that specific task?" To this end, recent work has proposed learning task-specific loss functions that capture this underlying structure. However, current approaches make restrictive assumptions about the form of these losses and their impact on ML model behavior. These assumptions lead both to approaches with high computational cost and, when they are violated in practice, to poor performance. In this paper, we propose solutions to these issues, avoiding the aforementioned assumptions and utilizing the ML model's features to increase the sample efficiency of learning loss functions. We empirically show that our method achieves state-of-the-art results in four domains from the literature, often requiring an order of magnitude fewer samples than comparable methods from past work. Moreover, our approach outperforms the best existing method by nearly 200% when the localness assumption is broken.	Translated: 2024-02-21 06:44:14 Published: 2024-02-18
# A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency ( http://arxiv.org/abs/2307.10655v2 ) License: see linked page	Jiawei Shao, Zijian Li, Wenqiang Sun, Tailin Zhou, Yuchang Sun, Lumin Liu, Zehong Lin, Yuyi Mao, Jun Zhang	Federated learning (FL) has emerged as a secure paradigm for collaborative training among clients. Without data centralization, FL allows clients to share local information in a privacy-preserving manner. This approach has gained considerable attention, prompting numerous surveys to summarize the related works. However, the majority of these surveys concentrate on FL methods that share model parameters during the training process, while overlooking the possibility of sharing local information in other forms. In this paper, we present a systematic survey from a new perspective of what to share in FL, with an emphasis on model utility, privacy leakage, and communication efficiency. First, we present a new taxonomy of FL methods in terms of three sharing methods, which respectively share models, synthetic data, and knowledge. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms. Third, we conduct extensive experiments to compare the learning performance and communication overhead of various sharing methods in FL. In addition, we assess the potential privacy leakage through model inversion and membership inference attacks, while comparing the effectiveness of various defense approaches. Finally, we identify future research directions and conclude the survey.	Translated: 2024-02-21 06:37:41 Published: 2024-02-18
# Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks ( http://arxiv.org/abs/2307.01649v2 ) License: see linked page	Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang	Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.	Translated: 2024-02-21 06:37:23 Published: 2024-02-18
# ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining ( http://arxiv.org/abs/2306.12106v2 ) License: see linked page	Dezhi Peng, Chongyu Liu, Yuliang Liu, Lianwen Jin	Scene text removal (STR) aims at replacing text strokes in natural scenes with visually coherent backgrounds. Recent STR approaches rely on iterative refinements or explicit text masks, resulting in high complexity and sensitivity to the accuracy of text localization. Moreover, most existing STR methods adopt convolutional architectures while the potential of vision Transformers (ViTs) remains largely unexplored. In this paper, we propose a simple-yet-effective ViT-based text eraser, dubbed ViTEraser. Following a concise encoder-decoder framework, ViTEraser can easily incorporate various ViTs to enhance long-range modeling. Specifically, the encoder hierarchically maps the input image into the hidden space through ViT blocks and patch embedding layers, while the decoder gradually upsamples the hidden features to the text-erased image with ViT blocks and patch splitting layers. As ViTEraser implicitly integrates text localization and inpainting, we propose a novel end-to-end pretraining method, termed SegMIM, which focuses the encoder and decoder on the text box segmentation and masked image modeling tasks, respectively. Experimental results demonstrate that ViTEraser with SegMIM achieves state-of-the-art performance on STR by a substantial margin and exhibits strong generalization ability when extended to other tasks, e.g., tampered scene text detection. Furthermore, we comprehensively explore the architecture, pretraining, and scalability of the ViT-based encoder-decoder for STR, which provides deep insights into the application of ViT to the STR field. Code is available at https://github.com/shannanyinxiang/ViTEraser.	Translated: 2024-02-21 06:35:09 Published: 2024-02-18
# Group Orthogonalization Regularization For Vision Models Adaptation and Robustness ( http://arxiv.org/abs/2306.10001v2 ) License: see linked page	Yoav Kurtz, Noga Bar, Raja Giryes	As neural networks become deeper, the redundancy within their parameters increases. This phenomenon has led to several methods that attempt to reduce the correlation between convolutional filters. We propose a computationally efficient regularization technique that encourages orthonormality between groups of filters within the same layer. Our experiments show that when incorporated into recent adaptation methods for diffusion models and vision transformers (ViTs), this regularization improves performance on downstream tasks. We further show improved robustness when group orthogonality is enforced during adversarial training. Our code is available at https://github.com/YoavKurtz/GOR.	Translated: 2024-02-21 06:33:53 Published: 2024-02-18
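One way to picture the group-wise orthonormality idea in the abstract above is the sketch below. The abstract does not give the exact regularizer, so the squared Frobenius-norm form, the contiguous grouping of filters, and the function name `group_orthogonality_penalty` are assumptions for illustration rather than the authors' code.

```python
import numpy as np

def group_orthogonality_penalty(weight, num_groups):
    """Sum over filter groups of ||W_g W_g^T - I||_F^2 (assumed formulation).

    `weight` has shape (out_channels, ...); every filter is flattened to a row.
    Rows are split into `num_groups` contiguous groups, and only filters in the
    same group are pushed towards being orthonormal to each other.
    """
    filters = np.asarray(weight, dtype=float)
    filters = filters.reshape(filters.shape[0], -1)  # (out_channels, fan_in)
    if filters.shape[0] % num_groups != 0:
        raise ValueError("out_channels must be divisible by num_groups")
    groups = filters.reshape(num_groups, -1, filters.shape[1])  # (G, k, fan_in)
    gram = groups @ groups.transpose(0, 2, 1)  # (G, k, k) Gram matrix per group
    identity = np.eye(gram.shape[1])
    return float(np.sum((gram - identity) ** 2))

# Two identical unit-norm filters: penalized in one group, free in separate groups.
duplicated = np.array([[1.0, 0.0], [1.0, 0.0]])
same_group = group_orthogonality_penalty(duplicated, num_groups=1)
separate_groups = group_orthogonality_penalty(duplicated, num_groups=2)
```

Each Gram matrix is only `k x k` for a group of `k` filters, so splitting a layer into groups keeps the penalty cheaper than one Gram matrix over all filters, which is one plausible reading of "computationally efficient" here.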
# Quantum State Tomography for Matrix Product Density Operators ( http://arxiv.org/abs/2306.09432v4 ) License: see linked page	Zhen Qin, Casey Jameson, Zhexuan Gong, Michael B. Wakin and Zhihui Zhu	The reconstruction of quantum states from experimental measurements, often achieved using quantum state tomography (QST), is crucial for the verification and benchmarking of quantum devices. However, performing QST for a generic unstructured quantum state requires an enormous number of state copies that grows exponentially with the number of individual quanta in the system, even for the most optimal measurement settings. Fortunately, many physical quantum states, such as states generated by noisy, intermediate-scale quantum computers, are usually structured. In one dimension, such states are expected to be well approximated by matrix product operators (MPOs) with a finite matrix/bond dimension independent of the number of qubits, therefore enabling efficient state representation. Nevertheless, it is still unclear whether efficient QST can be performed for these states in general. In this paper, we attempt to bridge this gap and establish theoretical guarantees for the stable recovery of MPOs using tools from compressive sensing and the theory of empirical processes. We begin by studying two types of random measurement settings: Gaussian measurements and Haar random rank-one Positive Operator Valued Measures (POVMs). We show that the information contained in an MPO with a finite bond dimension can be preserved using a number of random measurements that depends only linearly on the number of qubits, assuming no statistical error of the measurements. We then study MPO-based QST with physical quantum measurements through Haar random rank-one POVMs that can be implemented on quantum computers. We prove that only a polynomial number of state copies in the number of qubits is required to guarantee bounded recovery error of an MPO state.	Translated: 2024-02-21 06:33:44 Published: 2024-02-18
# CoRe Optimizer: An All-in-One Solution for Machine Learning ( http://arxiv.org/abs/2307.15663v2 ) License: see linked page	Marco Eckhoff and Markus Reiher	The optimization algorithm and its hyperparameters can significantly affect the training speed and resulting model accuracy in machine learning applications. The wish list for an ideal optimizer includes fast and smooth convergence to low error, low computational demand, and general applicability. Our recently introduced continual resilient (CoRe) optimizer has shown superior performance compared to other state-of-the-art first-order gradient-based optimizers for training lifelong machine learning potentials. In this work, we provide an extensive performance comparison of the CoRe optimizer and nine other optimization algorithms, including the Adam optimizer and resilient backpropagation (RPROP), for diverse machine learning tasks. We analyze the influence of different hyperparameters and provide generally applicable values. The CoRe optimizer yields best or competitive performance in every investigated application, while only one hyperparameter needs to be changed depending on mini-batch or batch learning.	Translated: 2024-02-21 06:25:30 Published: 2024-02-18
# OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples ( http://arxiv.org/abs/2307.11729v3 ) License: see linked page	Ryuto Koike, Masahiro Kaneko, Naoaki Okazaki	Large Language Models (LLMs) have achieved human-level fluency in text generation, making it difficult to distinguish between human-written and LLM-generated texts. This poses a growing risk of misuse of LLMs and demands the development of detectors to identify LLM-generated texts. However, existing detectors lack robustness against attacks: their detection accuracy degrades when LLM-generated texts are simply paraphrased. Furthermore, a malicious user might attempt to deliberately evade the detectors based on detection results, but this has not been assumed in previous studies. In this paper, we propose OUTFOX, a framework that improves the robustness of LLM-generated-text detectors by allowing both the detector and the attacker to consider each other's output. In this framework, the attacker uses the detector's prediction labels as examples for in-context learning and adversarially generates essays that are harder to detect, while the detector uses the adversarially generated essays as examples for in-context learning to learn to detect essays from a strong attacker. Experiments in the domain of student essays show that the proposed detector improves the detection performance on the attacker-generated texts by up to +41.3 points F1-score. Furthermore, the proposed detector shows state-of-the-art detection performance: up to 96.9 points F1-score, beating existing detectors on non-attacked texts. Finally, the proposed attacker drastically degrades the performance of detectors by up to -57.0 points F1-score, massively outperforming the baseline paraphrasing method for evading detection.	Translated: 2024-02-21 06:25:16 Published: 2024-02-18
# Robust Visual Question Answering: Datasets, Methods, and Future Challenges ( http://arxiv.org/abs/2307.11471v2 ) License: see linked page	Jie Ma, Pinghui Wang, Dechen Kong, Zewei Wang, Jun Liu, Hongbin Pei, Junzhou Zhao	Visual question answering (VQA) requires a system to provide an accurate natural language answer given an image and a natural language question. However, it is widely recognized that previous generic VQA methods often exhibit a tendency to memorize biases present in the training data rather than learning proper behaviors, such as grounding images before predicting answers. Therefore, these methods usually achieve high in-distribution but poor out-of-distribution performance. In recent years, various datasets and debiasing methods have been proposed to evaluate and enhance VQA robustness, respectively. This paper provides the first comprehensive survey focused on this emerging topic. Specifically, we first provide an overview of the development process of datasets from in-distribution and out-of-distribution perspectives. Then, we examine the evaluation metrics employed by these datasets. Thirdly, we propose a typology that presents the development process, similarities and differences, robustness comparison, and technical features of existing debiasing methods. Furthermore, we analyze and discuss the robustness of representative vision-and-language pre-training models on VQA. Finally, through a thorough review of the available literature and experimental analysis, we discuss the key areas for future research from various viewpoints.	Translated: 2024-02-21 06:24:16 Published: 2024-02-18
# TALL:ディープフェイクビデオ検出のためのThumbnailレイアウト TALL: Thumbnail Layout for Deepfake Video Detection ( http://arxiv.org/abs/2307.07494v3 ) ライセンス: Link先を確認	Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, Ran He	(参考訳) 社会やサイバーセキュリティに対するディープフェイクの脅威が高まり、公衆の懸念が高まり、ディープフェイクビデオ検出のこの重要な話題に努力が注がれている。既存のビデオ手法は優れた性能を発揮するが、計算量が多い。本稿では,ビデオクリップを予め定義されたレイアウトに変換することで,空間的および時間的依存関係の保存を実現する,Thumbnail Layout (TALL) というシンプルな手法を提案する。具体的には、連続したフレームを各フレーム内の一定の位置にマスクして一般化を改善し、サブイメージにリサイズし、サムネイルとして予め定義されたレイアウトに再構成する。 TALLは、数行のコードだけを変更することで、モデルに依存しない、非常に単純です。視覚変換器の成功に触発されて,我々はTALLをSwin Transformerに組み込み,効率的かつ効果的なTALL-Swin法を構築した。 TALLとSOTA TALL-Swinの有効性と優位性を検証した。 TALL-Swinは、挑戦的なクロスデータセットタスク、FaceForensics++ $\to$ Celeb-DFで90.79$\%$AUCを達成した。コードはhttps://github.com/rainy-xu/TALL4Deepfakeで入手できる。 The growing threats of deepfakes to society and cybersecurity have raised enormous public concerns, and increasing efforts have been devoted to this critical topic of deepfake video detection. Existing video methods achieve good performance but are computationally intensive. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout to realize the preservation of spatial and temporal dependencies. Specifically, consecutive frames are masked in a fixed position in each frame to improve generalization, then resized to sub-images and rearranged into a pre-defined layout as the thumbnail. TALL is model-agnostic and extremely simple by only modifying a few lines of code. Inspired by the success of vision transformers, we incorporate TALL into Swin Transformer, forming an efficient and effective method TALL-Swin. Extensive experiments on intra-dataset and cross-dataset validate the validity and superiority of TALL and SOTA TALL-Swin. TALL-Swin achieves 90.79$\%$ AUC on the challenging cross-dataset task, FaceForensics++ $\to$ Celeb-DF. The code is available at https://github.com/rainy-xu/TALL4Deepfake.	
翻訳日:2024-02-21 06:22:09 公開日:2024-02-18
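The mask, resize, and rearrange steps described in the TALL abstract can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names (`mask_frame`, `resize_nearest`, `thumbnail_layout`), the square mask, the nearest-neighbour resize, and the row-major grid are all assumptions chosen to keep the example dependency-free.

```python
from typing import List

Frame = List[List[int]]  # one grayscale frame as rows of pixel values


def mask_frame(frame: Frame, top: int, left: int, size: int) -> Frame:
    """Return a copy of the frame with a size x size square set to 0 (assumed mask shape)."""
    out = [row[:] for row in frame]
    for y in range(top, min(top + size, len(out))):
        for x in range(left, min(left + size, len(out[y]))):
            out[y][x] = 0
    return out


def resize_nearest(frame: Frame, new_h: int, new_w: int) -> Frame:
    """Nearest-neighbour resize (a stand-in for whatever interpolation is really used)."""
    h, w = len(frame), len(frame[0])
    return [[frame[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


def thumbnail_layout(frames: List[Frame], grid: int, sub: int,
                     mask_top: int = 0, mask_left: int = 0,
                     mask_size: int = 1) -> Frame:
    """Mask each frame at the same fixed position, shrink it to sub x sub,
    and tile the results row by row into one (grid*sub) x (grid*sub) image."""
    if len(frames) != grid * grid:
        raise ValueError("need exactly grid * grid consecutive frames")
    tiles = [resize_nearest(mask_frame(f, mask_top, mask_left, mask_size), sub, sub)
             for f in frames]
    canvas: Frame = []
    for gy in range(grid):
        for y in range(sub):
            row: List[int] = []
            for gx in range(grid):
                row.extend(tiles[gy * grid + gx][y])
            canvas.append(row)
    return canvas
```

The resulting single image keeps spatial content inside each tile and temporal order across tiles, so an ordinary image backbone can consume it.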
# 深層強化学習における報酬機械抽象化の文脈的事前計画 Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning ( http://arxiv.org/abs/2307.05209v3 ) ライセンス: Link先を確認	Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren	(参考訳) 近年の研究では、深層強化学習(DRL)エージェントは、訓練されたタスクに過度に適合し、小さな環境変化に適応できない傾向が示されている。未知のタスクに移行する際の学習の迅速化を目的として,現在のタスクを,現在のタスクの報酬やダイナミクスに基づいてサブタスクを誘導する状態マシン抽象化を用いて表現する手法を提案する。本手法は,現在の抽象状態からの最適遷移の象徴表現をエージェントに与え,それらの遷移を達成するための報酬を与える。これらの表現はタスク間で共有され、エージェントは以前に遭遇したシンボルや遷移の知識を活用できるため、転送が促進される。実験結果から, 種々の領域におけるサンプル効率と少数ショット転送の改善が示された。 Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains.	翻訳日:2024-02-21 06:21:29 公開日:2024-02-18
# 知識グラフ補完のための大規模言語モデル探索 Exploring Large Language Models for Knowledge Graph Completion ( http://arxiv.org/abs/2308.13916v4 ) ライセンス: Link先を確認	Liang Yao, Jiazhen Peng, Chengsheng Mao, Yuan Luo	(参考訳) 知識グラフは多くの人工知能タスクにおいて重要な役割を果たすが、不完全性の問題にしばしば直面する。本研究では,Large Language Models (LLM) を用いて知識グラフの補完を行う。我々は知識グラフのトリプルをテキストシーケンスとみなし、これらのトリプルをモデル化するための知識グラフ LLM (KG-LLM) と呼ばれる革新的なフレームワークを導入する。提案手法では,三重項の実体記述と関係記述を用いて,その応答を予測に利用する。ベンチマークナレッジグラフを用いた実験により,トリプル分類や関係予測などのタスクにおいて,最先端の性能が得られることが示された。また、微調整モデル(LLaMA-7B、ChatGLM-6B)が最近のChatGPTおよびGPT-4より優れていることも見出した。 Knowledge graphs play a vital role in numerous artificial intelligence tasks, yet they frequently face the issue of incompleteness. In this study, we explore utilizing Large Language Models (LLM) for knowledge graph completion. We consider triples in knowledge graphs as text sequences and introduce an innovative framework called Knowledge Graph LLM (KG-LLM) to model these triples. Our technique employs entity and relation descriptions of a triple as prompts and utilizes the response for predictions. Experiments on various benchmark knowledge graphs demonstrate that our method attains state-of-the-art performance in tasks such as triple classification and relation prediction. We also find that fine-tuning relatively smaller models (e.g., LLaMA-7B, ChatGLM-6B) outperforms recent ChatGPT and GPT-4.	翻訳日:2024-02-21 06:12:36 公開日:2024-02-18
# SpikingBERT:暗黙的微分を用いたスパイキング言語モデルのトレーニングのためのBERTの蒸留 SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation ( http://arxiv.org/abs/2308.10873v3 ) ライセンス: Link先を確認	Malyaban Bal, Abhronil Sengupta	(参考訳) 大規模言語モデル(LLM)は非常に強力に成長しているが、人間の脳よりもニューロンやシナプスは桁違いに少ない。しかし、運用にはかなり多くの電力・エネルギーが必要である。本研究では,脳内のシナプス情報の流れからモチベーションを引き出すことにより,従来のLMの計算コストを削減することを目的とした,バイオインスピレーションスパイキング言語モデルを提案する。本稿では,ニューロンの平衡における平均スパイク速度を利用して,暗黙の微分法を用いてニューロモルフィックスパイキングLMを訓練し,サロゲート勾配を使わずにスパイキングニューラルネットワーク(SNN)に基づくアルゴリズムの非微分可能性問題を克服する枠組みを示す。スパイキングニューロンの定常収束はまた、スケーラブルなスパイキングLMの開発において重要なスパイキングアテンション機構を設計することができる。さらに、平衡時のニューロンの平均スパイク速度の収束を利用して、トレーニング済みBERTモデルを「教師」として使用し、「学生」スパイクアーキテクチャを訓練する新しいANN-SNN知識蒸留技術を開発した。本論文で提案するアーキテクチャはBERTをモチベーションとしているが,多種多様な LLM に拡張できる可能性がある。我々の研究は、GLUEベンチマークで複数の異なるタスクにおいて、運用上のスパイクするLMアーキテクチャのパフォーマンスを実証する最初のものである。 Large Language Models (LLMs), though growing exceedingly powerful, comprise orders of magnitude fewer neurons and synapses than the human brain. However, they require significantly more power/energy to operate. In this work, we propose a novel bio-inspired spiking language model (LM) which aims to reduce the computational cost of conventional LMs by drawing motivation from the synaptic information flow in the brain. In this paper, we demonstrate a framework that leverages the average spiking rate of neurons at equilibrium to train a neuromorphic spiking LM using an implicit differentiation technique, thereby overcoming the non-differentiability problem of spiking neural network (SNN) based algorithms without using any type of surrogate gradient. The steady-state convergence of the spiking neurons also allows us to design a spiking attention mechanism, which is critical in developing a scalable spiking LM. 
Moreover, the convergence of average spiking rate of neurons at equilibrium is utilized to develop a novel ANN-SNN knowledge distillation based technique wherein we use a pre-trained BERT model as "teacher" to train our "student" spiking architecture. While the primary architecture proposed in this paper is motivated by BERT, the technique can be potentially extended to different kinds of LLMs. Our work is the first one to demonstrate the performance of an operational spiking LM architecture on multiple different tasks in the GLUE benchmark.	翻訳日:2024-02-21 06:10:51 公開日:2024-02-18
# OctoPack: コード大言語モデルをチューニングするインストラクション OctoPack: Instruction Tuning Code Large Language Models ( http://arxiv.org/abs/2308.07124v2 ) ライセンス: Link先を確認	Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre	(参考訳) 命令で大きな言語モデル(LLM)を微調整すると、自然言語タスクのパフォーマンスが大幅に向上する。我々は、コード変更とヒューマンインストラクションを組み合わせるgitコミットの自然な構造を活用して、コードを使った命令チューニングを適用する。 CommitPack:350のプログラミング言語で4テラバイトのGitコミットをコンパイルします。我々は、HumanEval Pythonベンチマーク(46.2% pass@1)で、CommitPackを16BパラメータStarCoderモデル上の他の自然および合成コード命令(xP3x、Self-Instruct、OASST)と比較し、OpenAI出力でトレーニングされていないモデル間で最先端のパフォーマンスを達成する。さらに、HumanEvalPackを導入し、HumanEvalベンチマークを6つの言語(Python、JavaScript、Java、Go、C++、Rust)で合計3つのコーディングタスク(コード補完、コード説明、コード合成)に拡張しました。私たちのモデルであるOctoCoderとOctoGeeXは、すべての許容モデルの中でHumanEvalPackで最高のパフォーマンスを実現し、CommitPackがより広範な言語や自然なコーディングタスクに一般化する利点を実証しています。コード、モデル、データはhttps://github.com/bigcode-project/octopackで無料で利用できる。 Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B parameter StarCoder model, and achieve state-of-the-art performance among models not trained on OpenAI outputs, on the HumanEval Python benchmark (46.2% pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks. 
Code, models and data are freely available at https://github.com/bigcode-project/octopack.	翻訳日:2024-02-21 06:09:06 公開日:2024-02-18
# 動的活性化関数によるフィードフォワードと畳み込みニューラルネットワークの性能最適化 Optimizing Performance of Feedforward and Convolutional Neural Networks through Dynamic Activation Functions ( http://arxiv.org/abs/2308.05724v2 ) ライセンス: Link先を確認	Chinmay Rane, Kanishka Tyagi, Michael Manry	(参考訳) ディープラーニングのトレーニングアルゴリズムは、音声、テキスト、画像、ビデオなど、多くの分野において、近年で大きな成功を収めています。より深い層と深い層が提案され、152層ほどのResNet構造で大きな成功を収めた。浅層畳み込みニューラルネットワーク(CNN)はまだ活発な研究分野であり、いくつかの現象はまだ説明されていない。ネットワークで使用されるアクティベーション機能は、ネットワークに非線型性を提供するため、最も重要である。 ReLUは最もよく使われる活性化関数である。我々は隠れ層における複雑な区分線形(PWL)活性化を示す。これらのPWL活性化は、畳み込みニューラルネットワークと多層パーセプトロンのためのネットワークのReLU活性化よりもはるかに優れた働きを示す。浅部および深部CNNに対するPyTorchの結果の比較を行い,本症例をさらに強化した。 Deep learning training algorithms have been a huge success in recent years in many fields, including speech, text, image, and video. Deeper and deeper networks have been proposed with huge success, with ResNet structures having around 152 layers. Shallow convolutional neural networks (CNNs) are still an active research area, where some phenomena remain unexplained. Activation functions used in the network are of utmost importance, as they provide nonlinearity to the networks. ReLUs are the most commonly used activation function. We show a complex piece-wise linear (PWL) activation in the hidden layer. We show that these PWL activations work much better than ReLU activations in our networks for convolutional neural networks and multilayer perceptrons. Result comparisons in PyTorch for shallow and deep CNNs are given to further strengthen our case.	翻訳日:2024-02-21 06:08:38 公開日:2024-02-18
# 高絡み合いトーリックxyモデルにおけるエニオン Anyons in a highly-entangled toric xy model ( http://arxiv.org/abs/2308.01765v2 ) ライセンス: Link先を確認	Milo Moses, Konrad Deka	(参考訳) 「トポロジカル秩序 (topological order)」という用語は、表向きには1989年にXiao-Gang Wenによって造られたが、1972年から古典的xyモデルの振る舞いを記述するために用いられてきた。 xyモデルは、非位相的 U(1) ゲージ作用の対象にもなるため、Wenのトポロジカル秩序を持たないことが指摘されている。私たちはある意味でこれが唯一の障害であることを示している。すなわち、ゲージ不変性がエネルギー的に強制されると、$xy$モデルは純粋にトポロジカル秩序を持つようになる。実際、量子$xy$トポロジカル秩序は、群 G=Z に適用されたKitaevの量子二重模型の無限格子極限であることを示す。 While ostensibly coined in 1989 by Xiao-Gang Wen, the term "topological order" has been in use since 1972 to describe the behavior of the classical xy model. It has been noted that the xy model does not have Wen's topological order since it is also subject to a non-topological U(1) gauge action. We show that, in a sense, this is the only obstruction. That is, if gauge invariance is enforced energetically then the $xy$ model becomes purely topologically ordered. In fact, we show that the quantum $xy$ topological order is an infinite lattice limit of Kitaev's quantum double model applied to the group G=Z.	翻訳日:2024-02-21 06:08:17 公開日:2024-02-18
# Decoupled Training: フラストレーションに易しいマルチドメイン学習の復活 Decoupled Training: Return of Frustratingly Easy Multi-Domain Learning ( http://arxiv.org/abs/2309.10302v2 ) ライセンス: Link先を確認	Ximei Wang, Junwei Pan, Xingzhuo Guo, Dapeng Liu, Jie Jiang	(参考訳) マルチドメイン学習(mdl)は、重複する複数のドメインに対して、最小平均リスクでモデルをトレーニングすることを目的としている。データセットバイアスとドメイン支配の課題に対処するために、分布を整列してドメインギャップを減らしたり、ドメイン固有のタワーやゲート、さらには専門家による差異を保ったりすることで共通性を求める多くのMDLアプローチが提案されている。 MDLモデルは、高度なネットワークアーキテクチャや損失関数によってますます複雑になり、余分なパラメータを導入し、計算コストを増大させています。本稿では,Decoupled Training (D-Train) という名前のマルチドメイン学習手法を提案する。 d-trainは、まずすべてのドメインを事前トレーニングしてルートモデルをウォームアップし、次にマルチヘッドに分割して各ドメインをポストトレーニングし、最終的にバックボーンを固定することでヘッドを微調整し、トレーニングを分離してドメイン独立を達成する3段階のトレーニング戦略である。 d-trainは単純さと効率性にも拘わらず、標準的なベンチマークから衛星画像やレコメンデーションシステムの応用に至るまで、さまざまなデータセットの広範な評価において非常に優れた性能を発揮している。 Multi-domain learning (MDL) aims to train a model with minimal average risk across multiple overlapping but non-identical domains. To tackle the challenges of dataset bias and domain domination, numerous MDL approaches have been proposed from the perspectives of seeking commonalities by aligning distributions to reduce domain gap or reserving differences by implementing domain-specific towers, gates, and even experts. MDL models are becoming more and more complex with sophisticated network architectures or loss functions, introducing extra parameters and enlarging computation costs. In this paper, we propose a frustratingly easy and hyperparameter-free multi-domain learning method named Decoupled Training (D-Train). D-Train is a tri-phase general-to-specific training strategy that first pre-trains on all domains to warm up a root model, then post-trains on each domain by splitting into multi-heads, and finally fine-tunes the heads by fixing the backbone, enabling decouple training to achieve domain independence. 
Despite its extraordinary simplicity and efficiency, D-Train performs remarkably well in extensive evaluations of various datasets from standard benchmarks to applications of satellite imagery and recommender systems.	翻訳日:2024-02-21 06:00:21 公開日:2024-02-18
# アンカーポイント: 少ない例でベンチマークモデル Anchor Points: Benchmarking Models with Much Fewer Examples ( http://arxiv.org/abs/2309.08638v2 ) ライセンス: Link先を確認	Rajan Vivek, Kawin Ethayarajh, Diyi Yang, Douwe Kiela	(参考訳) 現代の言語モデルは、しばしば強力だが不安定な振る舞いを示し、その振る舞いを確実に評価するより大きく、より多様なベンチマークの開発につながる。ここでは,モデルの性能を,より小さな評価セットでベンチマークし,解くことを提案する。まず,6つの人気言語分類ベンチマークにおいて,多くの点に対する正しいクラスに対するモデル信頼度は,モデル間で強く相関していることを示す。 Anchor Point Selectionは、データセット全体のモデル挙動をキャプチャするデータセットの小さなサブセットを選択するテクニックである。 1-30アンカーポイントを用いたモデルの評価は、正確なランキングモデルにおける一様サンプリングやその他のベースラインよりも優れています。さらに、いくつかのアンカーポイントを使用して、低平均の絶対誤差を持つデータセット内の他のすべてのポイントにおけるクラス毎のモデル予測を見積もることができる。最後に,これらの知見を可視化し,データセット分布内の様々な領域における異なるモデルの性能比較を容易にするアンカーポイントマップを提案する。 Modern language models often exhibit powerful but brittle behavior, leading to the development of larger and more diverse benchmarks to reliably assess their behavior. Here, we suggest that model performance can be benchmarked and elucidated with much smaller evaluation sets. We first show that in six popular language classification benchmarks, model confidence in the correct class on many pairs of points is strongly correlated across models. We build upon this phenomenon to propose Anchor Point Selection, a technique to select small subsets of datasets that capture model behavior across the entire dataset. Anchor points reliably rank models: across 87 diverse language model-prompt pairs, evaluating models using 1-30 anchor points outperforms uniform sampling and other baselines at accurately ranking models. Moreover, just several anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error, sufficient for gauging where the model is likely to fail. Lastly, we present Anchor Point Maps for visualizing these insights and facilitating comparisons of the performance of different models on various regions within the dataset distribution.	翻訳日:2024-02-21 05:58:54 公開日:2024-02-18
# prograsp: 物体把握のための実用的ヒューマンロボットコミュニケーション PROGrasp: Pragmatic Human-Robot Communication for Object Grasping ( http://arxiv.org/abs/2309.07759v2 ) ライセンス: Link先を確認	Gi-Cheon Kang, Junghyun Kim, Jaein Kim, Byoung-Tak Zhang	(参考訳) 対話型オブジェクトグラスピング(IOG)は、人間とロボットの自然言語による対話を通じて、望ましいオブジェクトを識別し、把握するタスクである。現在のIOGシステムは、人間が最初に対象のオブジェクトのカテゴリ(例えばボトル)を指定すると仮定している。目的達成のためにコンテキストに依存して意図を伝達する実践的手法に触発されて,新たなIOGタスクであるPragmatic-IOGと,それに対応するデータセットであるIntention-oriented Multi-modal Dialogue (IM-Dial)を導入する。提案するタスクシナリオでは、まず、意図指向の発話(例えば「喉が渇いている」など)がロボットに与えられる。ロボットは、人間のユーザと対話することで、対象物を識別する。タスク設定に基づいて,ユーザの意図を解釈し,対象物であるPROGrasp(Pragmatic Object Grasping)をピックアップするロボットシステムを提案する。 PROGraspは、視覚的なグラウンドニング、質問、オブジェクトの把握、そして最も重要なのは、実用的推論の解答解釈のモジュールを組み込むことで、Pragmatic-IOGを実行する。 ProGraspはオフライン(ターゲットオブジェクト発見)やオンライン(物理ロボットアーム付きIOG)の設定で有効であることを示す実験結果が得られた。コードとデータはhttps://github.com/gicheonkang/prograspで入手できる。 Interactive Object Grasping (IOG) is the task of identifying and grasping the desired object via human-robot natural language interaction. Current IOG systems assume that a human user initially specifies the target object's category (e.g., bottle). Inspired by pragmatics, where humans often convey their intentions by relying on context to achieve goals, we introduce a new IOG task, Pragmatic-IOG, and the corresponding dataset, Intention-oriented Multi-modal Dialogue (IM-Dial). In our proposed task scenario, an intention-oriented utterance (e.g., "I am thirsty") is initially given to the robot. The robot should then identify the target object by interacting with a human user. Based on the task setup, we propose a new robotic system that can interpret the user's intention and pick up the target object, Pragmatic Object Grasping (PROGrasp). PROGrasp performs Pragmatic-IOG by incorporating modules for visual grounding, question asking, object grasping, and most importantly, answer interpretation for pragmatic inference. 
Experimental results show that PROGrasp is effective in offline (i.e., target object discovery) and online (i.e., IOG with a physical robot arm) settings. Code and data are available at https://github.com/gicheonkang/prograsp.	翻訳日:2024-02-21 05:58:19 公開日:2024-02-18
# Contrast-Phys+:時空間コントラストによる教師なし・弱教師付き遠隔生理計測 Contrast-Phys+: Unsupervised and Weakly-supervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast ( http://arxiv.org/abs/2309.06924v3 ) ライセンス: Link先を確認	Zhaodong Sun and Xiaobai Li	(参考訳) ビデオベースの遠隔生理計測は、顔の映像を利用して、遠隔光電容積脈波(rPPG)とも呼ばれる血液量変化信号を測定する。 rPPG測定の教師あり手法は優れた性能を発揮することが示されている。しかし、これらの手法の欠点は、しばしばコストがかかり入手が困難である、地上の真実(GT)生理学的信号を持つ顔ビデオを必要とすることである。本稿では,教師なし設定と弱教師あり設定の両方で訓練できる方法であるContrast-Phys+を提案する。我々は3DCNNモデルを用いて、複数の時空間rPPG信号を生成し、rPPGの事前知識を対照的な損失関数に組み込む。さらに、GT信号をコントラスト学習に組み込んで、部分的または不正なラベルに適応させる。対照的な損失は、同じビデオからのrPPG/GT信号をグループ化し、異なるビデオからそれらを分離させる。 RGBおよび近赤外ビデオを含む5つの公開データセットに対して,本手法の評価を行った。 Contrast-Phys+は、部分的に利用可能または不一致のGT信号を使用する場合やラベルが全くない場合でも、最先端の教師付き手法よりも優れている。さらに,計算効率,雑音頑健性,一般化の観点から,本手法の利点を強調した。私たちのコードはhttps://github.com/zhaodongsun/contrast-physで利用可能です。 Video-based remote physiological measurement utilizes facial videos to measure the blood volume change signal, which is also called remote photoplethysmography (rPPG). Supervised methods for rPPG measurements have been shown to achieve good performance. However, the drawback of these methods is that they require facial videos with ground truth (GT) physiological signals, which are often costly and difficult to obtain. In this paper, we propose Contrast-Phys+, a method that can be trained in both unsupervised and weakly-supervised settings. We employ a 3DCNN model to generate multiple spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a contrastive loss function. We further incorporate the GT signals into contrastive learning to adapt to partial or misaligned labels. The contrastive loss encourages rPPG/GT signals from the same video to be grouped together, while pushing those from different videos apart. We evaluate our methods on five publicly available datasets that include both RGB and Near-infrared videos. 
Contrast-Phys+ outperforms the state-of-the-art supervised methods, even when using partially available or misaligned GT signals, or no labels at all. Additionally, we highlight the advantages of our methods in terms of computational efficiency, noise robustness, and generalization. Our code is available at https://github.com/zhaodongsun/contrast-phys.	翻訳日:2024-02-21 05:57:48 公開日:2024-02-18
# DePT:パラメータ効率の良い微調整のための分解プロンプトチューニング DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning ( http://arxiv.org/abs/2309.05173v5 ) ライセンス: Link先を確認	Zhengxiang Shi, Aldo Lipani	(参考訳) 言語モデル(LM)の入力に少量の訓練可能なソフト(連続)プロンプトベクトルが固定されるプロンプトチューニング(PT)は、パラメータ効率の良い微調整(PEFT)のための様々なタスクやモデルに対して有望な結果を示している。 PTは、トレーニング可能なパラメータが少なくて競合性能を保ち、モデルのサイズが拡大するにつれてパラメータを劇的にスケールアップしないため、他のPEFTアプローチと際立っている。しかし、PTはソフトプロンプトトークンを導入し、入力シーケンスが長くなり、Transformerの2次複雑さによるトレーニングや推論時間、メモリ使用量に大きな影響を及ぼす。これは、日々の大量のクエリに直面する大規模言語モデル(LLM)では特に懸念される。この問題に対処するために,ソフトプロンプトを短いソフトプロンプトと2つの異なる学習率で最適化された2つの低ランク行列に分解するDecomposed Prompt Tuning (DePT)を提案する。これにより、トレーニング可能なパラメータサイズを変更することなく、バニラPTとその変種と比較してメモリと時間コストを大幅に削減しながら、パフォーマンスが向上する。 23の自然言語処理(NLP)と視覚言語(VL)タスクに関する広範な実験を通じて、DePTが、いくつかのシナリオでは完全な微調整ベースラインを含む最先端のPEFTアプローチより優れていることを示す。さらに,モデルサイズが大きくなるにつれてDePTがより効率的になることを示す。さらに,DePTは数ショットの学習環境においてパラメータ効率のよい伝達学習とシームレスに統合され,様々なモデルアーキテクチャやサイズへの適応性を強調している。 Prompt tuning (PT), where a small amount of trainable soft (continuous) prompt vectors is affixed to the input of language models (LM), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences, which significantly impacts training and inference time and memory usage due to the Transformer's quadratic complexity. This is particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. 
This allows DePT to achieve better performance while saving substantial memory and time costs compared to vanilla PT and its variants, without changing trainable parameter sizes. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline, in some scenarios. Additionally, we empirically show that DePT grows more efficient as the model size increases. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.	翻訳日:2024-02-21 05:57:24 公開日:2024-02-18
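The claim that DePT shortens the input "without changing trainable parameter sizes" can be checked with simple arithmetic. The sketch below is an assumption-laden illustration, not the authors' code: the abstract gives no shapes, so it assumes the low-rank pair has shapes (seq_len x rank) and (rank x hidden), and the concrete numbers (hidden size 768, sequence length 256, prompt lengths 100 and 40, rank 45) are hypothetical values picked so that the two budgets match.

```python
def vanilla_pt_params(prompt_len: int, hidden: int) -> int:
    """Trainable parameters of a plain soft prompt: one vector per prompt token."""
    return prompt_len * hidden


def decomposed_params(short_len: int, hidden: int, seq_len: int, rank: int) -> int:
    """Trainable parameters of a shorter soft prompt plus a low-rank pair.

    Assumed shapes (not stated in the abstract):
      short prompt: short_len x hidden
      low-rank A:   seq_len x rank
      low-rank B:   rank x hidden
    """
    return short_len * hidden + seq_len * rank + rank * hidden


def input_length(prompt_len: int, seq_len: int) -> int:
    """Tokens the Transformer attends over: prompt tokens plus the actual input."""
    return prompt_len + seq_len
```

Under these assumed numbers, `decomposed_params(40, 768, 256, 45)` equals `vanilla_pt_params(100, 768)`, while the attended sequence drops from 356 to 296 tokens. Because attention cost grows quadratically with sequence length, that shorter input is where the memory and time savings would come from.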
# ローエンド32ビットIoTデバイス上での高速KyberのためのPlantard Arithmeticの改良 Yet another Improvement of Plantard Arithmetic for Faster Kyber on Low-end 32-bit IoT Devices ( http://arxiv.org/abs/2309.00440v3 ) ライセンス: Link先を確認	Junhao Huang, Haosong Zhao, Jipeng Zhang, Wangchen Dai, Lu Zhou, Ray C.C. Cheung, Cetin Kaya Koc, Donglong Chen	(参考訳) 本稿では、SIMD拡張のない2つのローエンド32ビットIoTプラットフォーム(ARM Cortex-M3とRISC-V)上でKyberの実装を高速化するPlantard演算の別の改良版を提案する。具体的には、計算ステップを変更することなく、Plantard演算の入力範囲をさらに拡大する。 Kyberの法(modulus)に合わせてPlantard演算を調整した後、定数によるPlantard乗算の入力範囲は、TCHES2022 の元の設計よりも少なくとも2.14倍大きいことを示す。次に, Cortex-M3 と RISC-V 上での効率的なPlantard演算のための2つの最適化手法を提案する。 Plantard演算はローエンド32ビットプラットフォーム上でMontgomery演算とBarrett演算の両方に取って代わることを示す。これらのプラットフォーム上での入力範囲の拡大とPlantard演算の効率的な実装により,NTT/INTTの最適化手法を提案する。ローエンド32ビットプラットフォーム上で提案したPlantard演算の入力範囲を大きくすることで,NTT/INTTにおける係数のモジュラー化を最小化あるいは完全に排除する。さらに,2つのメモリ最適化手法を提案し,Cortex-M4上の対応する実装と比較して,速度優先版Kyber実装のスタック使用率を23.50%から28.31%削減した。提案した最適化により、ローエンドIoTデバイス上での速度優先版実装がより実現可能になった。上記の最適化のおかげで、NTT/INTTの実装は最先端の作業と比べてかなりスピードアップしている。全体として、メモリ制限されたIoTプラットフォーム上での速度優先版Kyberの実装の適用性を示し、これらのプラットフォーム上でKyberの新しい速度記録を設定します。 This paper presents another improved version of Plantard arithmetic that could speed up Kyber implementations on two low-end 32-bit IoT platforms (ARM Cortex-M3 and RISC-V) without SIMD extensions. Specifically, we further enlarge the input range of the Plantard arithmetic without modifying its computation steps. After tailoring the Plantard arithmetic for Kyber's modulus, we show that the input range of the Plantard multiplication by a constant is at least 2.14 times larger than the original design in TCHES2022. Then, two optimization techniques for efficient Plantard arithmetic on Cortex-M3 and RISC-V are presented. We show that the Plantard arithmetic supersedes both Montgomery and Barrett arithmetic on low-end 32-bit platforms. With the enlarged input range and the efficient implementation of the Plantard arithmetic on these platforms, we propose various optimization strategies for NTT/INTT. 
We minimize or entirely eliminate the modular reduction of coefficients in NTT/INTT by taking advantage of the larger input range of the proposed Plantard arithmetic on low-end 32-bit platforms. Furthermore, we propose two memory optimization strategies that reduce 23.50% to 28.31% stack usage for the speed-version Kyber implementation when compared to its counterpart on Cortex-M4. The proposed optimizations make the speed-version implementation more feasible on low-end IoT devices. Thanks to the aforementioned optimizations, our NTT/INTT implementation shows considerable speedups compared to the state-of-the-art work. Overall, we demonstrate the applicability of the speed-version Kyber implementation on memory-constrained IoT platforms and set new speed records for Kyber on these platforms.	翻訳日:2024-02-21 05:56:41 公開日:2024-02-18
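For readers unfamiliar with Plantard arithmetic, the following minimal sketch shows the basic unsigned Plantard modular multiplication for Kyber's modulus q = 3329 with word size l = 16. It is not the enlarged-input-range variant proposed in the paper, nor the signed version used in optimized implementations; the function names `plantard_mul` and `to_plantard` are hypothetical, and Python's big integers stand in for the 32-bit registers of a real target.

```python
Q = 3329                           # Kyber's modulus
L = 16                             # word size l; intermediate products live modulo 2^(2l)
MASK = (1 << (2 * L)) - 1          # bit mask for reduction modulo 2^32
Q_INV = pow(Q, -1, 1 << (2 * L))   # q^(-1) mod 2^32 (negative exponent needs Python 3.8+)


def plantard_mul(a: int, b: int) -> int:
    """Return r in [0, Q) with r = -a*b*2^(-2l) mod Q, for inputs 0 <= a, b <= Q."""
    p = (a * b * Q_INV) & MASK         # a*b*q^(-1) mod 2^(2l)
    return (((p >> L) + 1) * Q) >> L   # keep the high word, add 1, multiply by q, shift


def to_plantard(x: int) -> int:
    """Map x to x * (-2^(2l)) mod Q so that plantard_mul(to_plantard(x), y) = x*y mod Q."""
    return (-x << (2 * L)) % Q
```

When one operand is a known constant (such as an NTT twiddle factor), the product of that constant with `Q_INV` can be precomputed, which is the "multiplication by a constant" setting the abstract refers to. The reduction then costs two multiplications and no conditional correction for inputs in the stated range.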
# 多数の権限を与え、バイアスを負う: 大規模言語モデルによるジェネラリストクレジットスコアリング Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models ( http://arxiv.org/abs/2310.00566v3 ) ライセンス: Link先を確認	Duanyu Feng, Yongfu Dai, Jimin Huang, Yifang Zhang, Qianqian Xie, Weiguang Han, Zhengyu Chen, Alejandro Lopez-Lira, Hao Wang	(参考訳) 金融業界では、クレジットスコアリングが基本的な要素であり、クレジットへのアクセスを形成し、個人やビジネスのローン条件を決定する。しかし、伝統的なクレジットスコアリング手法は、狭い知識範囲や独立したクレジットタスクの評価といった課題にしばしば対処している。我々の研究は、Large Language Models (LLM) が複数のタスクにまたがる強力な一般化能力を持つ信用スコアリングタスクに大きな可能性を持っていることを示唆している。クレジットスコアリングのためのLCMを体系的に探索するために,我々は,最初のオープンソース包括的フレームワークを提案する。筆者らは,14Kサンプルを用いた9つのデータセットを対象とし,LLM内の潜在的なバイアスに対する評価と評価を行うとともに,45k以上のサンプルを用いた新しいインストラクションチューニングデータについて検証した。そこで我々は,各種金融リスク評価タスクの煩雑な要求に合わせて,指導チューニングによる最初の信用リスク評価大言語モデル(CALM)を提案する。ビルドベンチマークでは,CALM,既存の最先端(SOTA)メソッド,オープンソースおよびクローズドソースのLCMを評価した。我々の経験的結果は、LLMが従来のモデルに適合するだけでなく、信用スコアがより包括的で包括的で偏見のない未来へ向けて、従来のモデルを上回る能力を示す。我々は、先駆的なインストラクションチューニングデータセット、信用とリスクアセスメントLLM、および研究コミュニティと金融業界とのベンチマークを共有することで、業界変革に貢献する。 In the financial industry, credit scoring is a fundamental element, shaping access to credit and determining the terms of loans for individuals and businesses alike. Traditional credit scoring methods, however, often grapple with challenges such as narrow knowledge scope and isolated evaluation of credit tasks. Our work posits that Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks. To systematically explore LLMs for credit scoring, we propose the first open-source comprehensive framework. We curate a novel benchmark covering 9 datasets with 14K samples, tailored for credit assessment and a critical examination of potential biases within LLMs, and the novel instruction tuning data with over 45k samples. We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks. 
We evaluate CALM, existing state-of-the-art (SOTA) methods, and open-source and closed-source LLMs on the benchmark we built. Our empirical results illuminate the capability of LLMs to not only match but surpass conventional models, pointing towards a future where credit scoring can be more inclusive, comprehensive, and unbiased. We contribute to the industry's transformation by sharing our pioneering instruction-tuning datasets, credit and risk assessment LLM, and benchmarks with the research community and the financial industry.	翻訳日:2024-02-21 05:47:51 公開日:2024-02-18
# スケールでの粒度:高解像度オーソグラフィー画像とハイブリッド学習による近隣社会経済指標の推定 Granularity at Scale: Estimating Neighborhood Socioeconomic Indicators from High-Resolution Orthographic Imagery and Hybrid Learning ( http://arxiv.org/abs/2309.16808v3 ) ライセンス: Link先を確認	Ethan Brewer, Giovani Valdrighi, Parikshit Solunke, Joao Rulff, Yurii Piadyk, Zhonghui Lv, Jorge Poco, and Claudio Silva	(参考訳) 世界の多くの地域は、既存のデータ収集方法の限界のために、人口の社会経済的幸福に関する基本的な情報を持っていない。衛星や航空機などの遠隔地から得られたオーバーヘッド画像は、地上の生命状態の窓として機能し、より高解像度のセンサーを必要とするより小さなスケールでの推定で、コミュニティ情報が不足している「ギャップに埋める」のに役立つ。センサーの解像度の改善と並行して、機械学習とコンピュータビジョンの最近の進歩により、これらの特徴を他の情報と関連付けるプロセスにおいて、画像データのパターンから素早く特徴を抽出し、検出することが可能になった。本研究は, 教師付き畳み込みニューラルネットワークと半教師付きクラスタリングという2つのアプローチが, 人口密度, 中央値の世帯所得, および全米の都市の高解像度画像から各地区の教育的到達度を推定するものである。その結果、画像から抽出された特徴は、近隣の人口密度 (r$^2$- 0.81) を正確に推定でき、教師付きアプローチにより、人口の所得と教育の変動の約半分を説明できることがわかった。地理的一般化の基盤となる提示されたアプローチに加えて、新しい半教師付きアプローチは、ラベルデータを必要としない航空画像から微細な情報を推定する将来の研究の基盤を提供する。 Many areas of the world are without basic information on the socioeconomic well-being of the residing population due to limitations in existing data collection methods. Overhead images obtained remotely, such as from satellite or aircraft, can help serve as windows into the state of life on the ground and help "fill in the gaps" where community information is sparse, with estimates at smaller geographic scales requiring higher resolution sensors. Concurrent with improved sensor resolutions, recent advancements in machine learning and computer vision have made it possible to quickly extract features from and detect patterns in image data, in the process correlating these features with other information. In this work, we explore how well two approaches, a supervised convolutional neural network and semi-supervised clustering based on bag-of-visual-words, estimate population density, median household income, and educational attainment of individual neighborhoods from publicly available high-resolution imagery of cities throughout the United States. 
Results and analyses indicate that features extracted from the imagery can accurately estimate the density (R$^2$ up to 0.81) of neighborhoods, with the supervised approach able to explain about half the variation in a population's income and education. In addition to the presented approaches serving as a basis for further geographic generalization, the novel semi-supervised approach provides a foundation for future work seeking to estimate fine-scale information from aerial imagery without the need for label data.	翻訳日:2024-02-21 05:46:37 公開日:2024-02-18
# LSTDとランダム特徴を用いた強化学習における二重降下について On Double Descent in Reinforcement Learning with LSTD and Random Features ( http://arxiv.org/abs/2310.05518v4 ) ライセンス: Link先を確認	David Brellmann, Elo\"ise Berthier, David Filliat and Goran Frehse	(参考訳) 時間差分法(TD)アルゴリズムは深層強化学習(RL)において広く用いられている。その性能はニューラルネットワークのサイズに大きく影響されている。教師付き学習では、過度パラメータ化の体制とその利点はよく理解されているが、RLの状況は明らかになっていない。本稿では,ネットワークサイズと$l_2$-regularizationが性能に与える影響を理論的に分析する。パラメータ数と訪問状態数との比率を重要な要因として同定し,1以上の場合の過剰パラメータ化をレジームとして定義する。さらに,二重降下現象,すなわち1のパラメータ/状態比付近で突然性能が低下する現象を観測した。ランダムな特徴と遅延学習体制を生かし、パラメータ数と状態が無限に近づき、一定比を維持するため、漸近的条件下での正則化LSTD(Least-Squares Temporal Difference)アルゴリズムについて検討する。二重降下の原因となる補正項を特徴とする経験的および真のベルマン誤差(MSBE)の決定論的限界を導出する。補正項は、$l_2$-レギュライゼーションが増加したり、未訪問状態の数がゼロになったときに消滅する。合成環境と小さな実環境における数値実験は、理論的な予測と密接に一致する。 Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning, the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and $l_2$-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime when it is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around the parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Squares Temporal Difference (LSTD) algorithm in an asymptotic regime, as both the number of parameters and states go to infinity, maintaining a constant ratio. We derive deterministic limits of both the empirical and the true Mean-Squared Bellman Error (MSBE) that feature correction terms responsible for the double descent. 
Correction terms vanish when the $l_2$-regularization is increased or the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.	翻訳日:2024-02-21 05:36:22 公開日:2024-02-18
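For orientation, the regularized LSTD estimate discussed in this abstract can be written in its standard textbook form as below. This is only a sketch: the symbols ($\Phi$, $\Phi'$, $r$, $\gamma$, $\lambda$, $N$, $m$) are notation assumed here, and the exact scaling of the regularizer used in the paper is not given in the abstract.

```latex
% Assumed notation (not taken from the abstract):
%   \Phi  : n x N matrix of random features of the visited states
%   \Phi' : n x N matrix of features of the corresponding next states
%   r     : vector of n observed rewards,  \gamma : discount factor
%   \lambda > 0 : l_2-regularization strength (scaling is an assumption)
\[
  \hat{\theta}
  = \bigl[\Phi^{\top}(\Phi - \gamma \Phi') + \lambda I_N\bigr]^{-1} \Phi^{\top} r ,
  \qquad
  \text{over-parameterized regime: } \frac{N}{m} > 1 ,
\]
% where N is the number of parameters and m the number of distinct visited states.
```

In this notation the double descent described above would appear near $N/m = 1$, and a larger $\lambda$ corresponds to the regime in which the abstract says the correction terms vanish.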
# Hieros: 構造化状態空間シーケンスワールドモデルに関する階層的イマジネーション Hieros: Hierarchical Imagination on Structured State Space Sequence World Models ( http://arxiv.org/abs/2310.05167v3 ) ライセンス: Link先を確認	Paul Mattes, Rainer Schlosser, Ralf Herbrich	(参考訳) 現代的深層強化学習(drl)アルゴリズムの最大の課題の1つはサンプル効率である。多くのアプローチは、エージェントを完全に想像力で訓練するために世界モデルを学び、トレーニング中に直接環境相互作用の必要性をなくす。しかし、これらの方法はしばしば想像力の正確さ、探索能力、実行時の効率の欠如に苦しむ。本研究では,時間的抽象世界表現を学習し,複数の時間的空間における軌跡を推定する階層的ポリシーであるHierosを提案する。 hierosはs5レイヤベースの世界モデルを使用して、トレーニング中と環境相互作用中の反復的に次の世界状態を並列に予測する。 s5層の特殊性により,並列に学習し,イマジネーション中に次世界の状態を反復的に予測できる。これにより、rnnベースのワールドモデルよりも効率的なトレーニングと、トランスフォーマーベースのワールドモデルよりも効率的なイマジネーションが可能になる。このアプローチはatari 100kベンチマークで平均値と平均値の正規化人間のスコアの点でアートの状態を上回っており、提案する世界モデルは複雑なダイナミクスを非常に正確に予測できることを示した。また、hierosは既存のアプローチよりも優れた探索能力を示している。 One of the biggest challenges to modern deep reinforcement learning (DRL) algorithms is sample efficiency. Many approaches learn a world model in order to train an agent entirely in imagination, eliminating the need for direct environment interaction during training. However, these methods often suffer from either a lack of imagination accuracy, exploration capabilities, or runtime efficiency. We propose Hieros, a hierarchical policy that learns time abstracted world representations and imagines trajectories at multiple time scales in latent space. Hieros uses an S5 layer-based world model, which predicts next world states in parallel during training and iteratively during environment interaction. Due to the special properties of S5 layers, our method can train in parallel and predict next world states iteratively during imagination. This allows for more efficient training than RNN-based world models and more efficient imagination than Transformer-based world models. We show that our approach outperforms the state of the art in terms of mean and median normalized human score on the Atari 100k benchmark, and that our proposed world model is able to predict complex dynamics very accurately. 
We also show that Hieros displays superior exploration capabilities compared to existing approaches.	翻訳日:2024-02-21 05:35:01 公開日:2024-02-18
# 不均衡階層型最適トランスポートフレームワークを用いたロバストグラフマッチング Robust Graph Matching Using An Unbalanced Hierarchical Optimal Transport Framework ( http://arxiv.org/abs/2310.12081v4 ) ライセンス: Link先を確認	Haoran Cheng, Dixin Luo, Hongteng Xu	(参考訳) グラフマッチングは、異なるグラフ間のノード対応を見つけることを目的とした、最も重要なグラフ解析タスクの1つである。既存のグラフマッチングアプローチの多くは、ノード属性やサブグラフ構造など、グラフに隠されているマルチモーダル情報を十分に活用していないため、パフォーマンスが最適でデータノイズに敏感なトポロジ情報に依存している。本研究では,不均衡な階層的最適輸送(UHOT)フレームワークに基づく新しい頑健なグラフマッチング手法を提案する。原則として、多層メッセージパッシングを適用して、各グラフを異なるモードに対応する層ワイドノード埋め込みとして表現する。 2つのグラフが与えられたとき、それぞれのノードの埋め込みをそれぞれ同じモダリティと異なるモダリティに並べる。そして、全てのアライメント結果の重み付き平均によりノード対応を推定する。この方法は、2つのグラフ間のUHOT距離を計算するために実装され、各アライメントは2つのノード埋め込み間のノードレベル最適トランスポート計画によって達成され、全てのアライメント結果の重みは不均衡なモダリティレベル最適トランスポート計画に対応する。様々なグラフマッチングタスクにおける実験は、最先端のアプローチと比較して、提案手法の優越性と頑健性を示している。実装はhttps://github.com/Dixin-Lab/UHOT-GMで公開しています。 Graph matching is one of the most significant graph analytic tasks, which aims to find the node correspondence across different graphs. Most existing graph matching approaches mainly rely on topological information, whose performances are often sub-optimal and sensitive to data noise because of not fully leveraging the multi-modal information hidden in graphs, such as node attributes, subgraph structures, etc. In this study, we propose a novel and robust graph matching method based on an unbalanced hierarchical optimal transport (UHOT) framework, which, to our knowledge, makes the first attempt to exploit cross-modal alignment in graph matching. In principle, applying multi-layer message passing, we represent each graph as layer-wise node embeddings corresponding to different modalities. Given two graphs, we align their node embeddings within the same modality and across different modalities, respectively. Then, we infer the node correspondence by the weighted average of all the alignment results. 
This method is implemented as computing the UHOT distance between the two graphs -- each alignment is achieved by a node-level optimal transport plan between two sets of node embeddings, and the weights of all alignment results correspond to an unbalanced modality-level optimal transport plan. Experiments on various graph matching tasks demonstrate the superiority and robustness of our method compared to state-of-the-art approaches. Our implementation is available at https://github.com/Dixin-Lab/UHOT-GM.	翻訳日:2024-02-21 05:23:47 公開日:2024-02-18
# 複素量子系における遷移状態理論の微視的導出 Microscopic derivation of transition-state theory for complex quantum systems ( http://arxiv.org/abs/2310.09537v2 ) ライセンス: Link先を確認	K. Hagino and G.F. Bertsch	(参考訳) ポテンシャル障壁による量子複雑系の崩壊は、化学においてRRKM理論として知られる遷移状態理論でしばしば説明される。ここでは、構成-相互作用基底で構築されるようなジェネリックハミルトニアンに基づく遷移状態理論の基本公式を導出する。ガウス直交アンサンブルからのランダムなハミルトニアンの2つの貯水池は、障壁における遷移状態を表す中間状態と結合される。貯水池の開水路への崩壊が大きい条件下では、反応速度の解析式が導出される。遷移状態は、総遷移確率に付加的に寄与する独立したブライト・ウィグナー共鳴として作用し、共鳴トンネル状態による電子伝導で知られている。また, 遷移確率は, 広範囲の崩壊幅にわたって第2貯留層における状態の崩壊特性とは無関係であることが判明した。 The decay of quantum complex systems through a potential barrier is often described with transition-state theory, also known as RRKM theory in chemistry. Here we derive the basic formula for transition-state theory based on a generic Hamiltonian as might be constructed in a configuration-interaction basis. Two reservoirs of random Hamiltonians from Gaussian orthogonal ensembles are coupled to intermediate states representing the transition states at a barrier. Under the condition that the decay of the reservoirs to open channels is large, an analytic formula for reaction rates is derived. The transition states act as independent Breit-Wigner resonances which contribute additively to the total transition probability, as is well known for electronic conductance through resonant tunneling states. It is also found that the transition probability is independent of the decay properties of the states in the second reservoir over a wide range of decay widths.	翻訳日:2024-02-21 05:21:05 公開日:2024-02-18
# LAiW: 中国の法律大言語モデルベンチマーク LAiW: A Chinese Legal Large Language Models Benchmark ( http://arxiv.org/abs/2310.05620v2 ) ライセンス: Link先を確認	Yongfu Dai, Duanyu Feng, Jimin Huang, Haochen Jia, Qianqian Xie, Yifang Zhang, Weiguang Han, Wei Tian, Hao Wang	(参考訳) 一般および法的ドメイン LLM は LegalAI の様々なタスクにおいて高いパフォーマンスを示している。しかし、これらのLLMの現在の評価は、コンピュータサイエンスの専門家によって定義されており、法的な実践の論理と整合性に欠けており、実用能力の判断が困難である。この課題に対処するため、我々はまず、法的実践の論理に基づいて、中国の法的LLMベンチマークLAiWを構築しました。法律専門家の思考プロセスや法的実践(シロジズム)に合わせるために,LLMの法的能力は,基本的な情報検索,法的基礎推論,複雑な法的応用の3つのレベルに分割する。各レベルは総合的な評価を保証するために複数のタスクを含んでいる。本ベンチマークでは,現在の一般領域と法域のLLMを自動評価することにより,これらのLLMは法的な実践の論理と一致しない可能性が示唆された。 llmは、複雑な法的応用能力を直接獲得できるが、いくつかの基本的なタスクでは性能が悪く、その実用的適用や法の専門家の受け入れに支障を来す可能性がある。法律適用シナリオにおける現在のLLMの複雑な法的な適用能力をさらに確認するために、人間の評価を法の専門家に取り入れる。その結果, LLMは高い性能を示すが, 法論理の強化が必要であることが示唆された。 General and legal domain LLMs have demonstrated strong performance in various tasks of LegalAI. However, the current evaluations of these LLMs in LegalAI are defined by the experts of computer science, lacking consistency with the logic of legal practice, making it difficult to judge their practical capabilities. To address this challenge, we are the first to build the Chinese legal LLMs benchmark LAiW, based on the logic of legal practice. To align with the thinking process of legal experts and legal practice (syllogism), we divide the legal capabilities of LLMs from easy to difficult into three levels: basic information retrieval, legal foundation inference, and complex legal application. Each level contains multiple tasks to ensure a comprehensive evaluation. Through automated evaluation of current general and legal domain LLMs on our benchmark, we indicate that these LLMs may not align with the logic of legal practice. LLMs seem to be able to directly acquire complex legal application capabilities but perform poorly in some basic tasks, which may pose obstacles to their practical application and acceptance by legal experts. 
To further confirm the complex legal application capabilities of current LLMs in legal application scenarios, we also incorporate human evaluation with legal experts. The results indicate that while LLMs may demonstrate strong performance, they still require reinforcement of legal logic.	翻訳日:2024-02-21 05:19:24 公開日:2024-02-18
# DreamSmooth: Reward Smoothingによるモデルベース強化学習の改善 DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing ( http://arxiv.org/abs/2311.01450v2 ) ライセンス: Link先を確認	Vint Lee, Pieter Abbeel, Youngwoon Lee	(参考訳) モデルベース強化学習(MBRL)は、複雑な振る舞いをサンプル効率のよい方法で学習する能力で注目を集めている。その成功にもかかわらず、驚くべきことに、報酬予測はMBRLのボトルネックとなることが多い。人間が大まかな報酬推定から学べる直感に触発され、与えられた報酬の正確な報酬ではなく、時間的に滑らかな報酬を予測することを学ぶ、単純で効果的な報酬平滑化アプローチDreamSmoothを提案する。 dreamsmoothはdeepmind control suiteやatari benchmarksといった一般的なベンチマークのパフォーマンスを損なうことなく、サンプル効率と最終パフォーマンスの両方において、長時間ホリゾンスパースリワードタスクで最先端のパフォーマンスを達成している。 Model-based reinforcement learning (MBRL) has gained much attention for its ability to learn complex behaviors in a sample-efficient way: planning actions by generating imaginary trajectories with predicted rewards. Despite its success, we found that surprisingly, reward prediction is often a bottleneck of MBRL, especially for sparse rewards that are challenging (or even ambiguous) to predict. Motivated by the intuition that humans can learn from rough reward estimates, we propose a simple yet effective reward smoothing approach, DreamSmooth, which learns to predict a temporally-smoothed reward, instead of the exact reward at the given timestep. We empirically show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks both in sample efficiency and final performance without losing performance on common benchmarks, such as Deepmind Control Suite and Atari benchmarks.	翻訳日:2024-02-21 05:11:36 公開日:2024-02-18
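The idea of predicting a temporally smoothed reward can be illustrated with a small sketch. The abstract does not say which smoothing kernel is used, so the Gaussian kernel, the `sigma` value, and the function name below are assumptions made for illustration only.

```python
import math


def gaussian_smooth_rewards(rewards, sigma=2.0):
    """Spread each reward over neighbouring timesteps of the same episode.

    The kernel is renormalized inside the episode boundaries, so the total
    reward of the episode is preserved.
    """
    horizon = len(rewards)
    smoothed = [0.0] * horizon
    for t, r in enumerate(rewards):
        if r == 0.0:
            continue  # nothing to spread for zero-reward steps
        weights = [math.exp(-((k - t) ** 2) / (2.0 * sigma ** 2))
                   for k in range(horizon)]
        total = sum(weights)
        for k in range(horizon):
            smoothed[k] += r * weights[k] / total
    return smoothed


# A sparse-reward episode: a single reward at timestep 6 (hypothetical data).
episode = [0.0] * 10
episode[6] = 1.0
smoothed = gaussian_smooth_rewards(episode)
print([round(x, 3) for x in smoothed])
```

A reward model trained on `smoothed` rather than `episode` only has to be roughly right about when the reward occurs, which is the intuition the abstract gives for sparse rewards.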
# MCE: 広東語と英語の混合オーディオデータセット MCE: Mixed Cantonese and English Audio Dataset ( http://arxiv.org/abs/2310.17953v2 ) ライセンス: Link先を確認	Peng Xie, Zihao Xin, Yang Wang, Shengjun Huang, Tsz Wai Chan, Kani Chen	(参考訳) 近年、Whisperは英語音声認識において人間レベルのロバスト性と正確性に近づいているが、マイナー言語と混合言語の音声認識では、さらなる改善が必要である。本研究では、自ら収集したMixed Cantonese and English (MCE)オーディオデータセットで微調整したWhisper-MCEの結果を示す。 Whisper-MCEは14.28%のMER(Mix Error Rate)を達成し、これはオリジナルのモデルよりも35.13%低い。また、Common Voice zh-HKでは12.61%の文字誤り率(CER)を達成した。しかし、MERとCERは、混合言語とマイナー言語での有効性を評価する上で、課題となる。そこで我々は,FALと呼ばれる新しい評価基準を提案し,元の音声に対する忠実度,精度,レイテンシに基づいて自動音声認識(ASR)システムを評価する。 Whisper-MCEは、この評価基準で他のモデルよりも優れ、90.91 FALのスコアを得た。 MCEデータセットとコードはhttps://github.com/Shelton1013/Whisper MCEで見ることができる。 Recently, Whisper has approached human-level robustness and accuracy in English speech recognition, while in minor language and mixed language speech recognition, there remains a compelling need for further improvement. In this work, we present the impressive results of Whisper-MCE, our fine-tuned Whisper, which was trained using our self-collected dataset, the Mixed Cantonese and English (MCE) audio dataset. Whisper-MCE achieved an impressive Mix Error Rate (MER) of 14.28%, which is 35.13% lower than the original model. It also achieved 12.61% Character Error Rate (CER) in Common Voice zh-HK, positioning it as state-of-the-art. However, MER and CER pose challenges when it comes to evaluating its effectiveness in mixed-language and minor language contexts. We proposed a novel evaluation metric called FAL, which assesses an Automatic Speech Recognition (ASR) system based on fidelity to the original audio, accuracy, and latency. Whisper-MCE outperformed other models in this evaluation metric, achieving a score of 90.91 FAL, further highlighting its exceptional performance. The MCE dataset and code can be found at https://github.com/Shelton1013/Whisper MCE.	翻訳日:2024-02-21 05:09:02 公開日:2024-02-18
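The abstract reports a Mix Error Rate (MER) without defining it. The sketch below shows one common convention for mixed Chinese-English text: each Chinese character and each English word counts as one token, and the rate is the token-level edit distance divided by the reference length. Whether the paper computes MER exactly this way is an assumption, and the tokenizer, function names, and example sentences are hypothetical.

```python
import re


def tokenize_mixed(text):
    """Split text into single CJK characters and lower-cased Latin/digit words."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9']+", text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def mix_error_rate(reference, hypothesis):
    """Token-level error rate; tokens are CJK characters or Latin words."""
    ref = tokenize_mixed(reference)
    hyp = tokenize_mixed(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


reference = "我今日要去 meeting room 開會"
hypothesis = "我今日要去 meeting 開會"  # the word "room" was dropped
print(mix_error_rate(reference, hypothesis))
```

Here the reference has nine tokens (seven characters and two words) and the hypothesis drops one of them, so the rate is one ninth.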
# 超伝導量子ビットをカオスに駆動する Driving superconducting qubits into chaos ( http://arxiv.org/abs/2310.17698v2 ) ライセンス: Link先を確認	Jorge Chávez-Carlos, Miguel A. Prado Reynoso, Ignacio García-Mata, Victor S. Batista, Francisco Pérez-Bernal, Diego A. Wisniacki, Lea F. Santos	(参考訳) カーパラメトリック発振器は、フォールトトレラント量子コンピュータのためのビルディングブロックである。それらはKerr-cat量子ビットを安定化し、エラー保護された量子情報のエンコーディングと操作の利点を提供する。Kerr-cat量子ビットの最近の実現は、SNAILトランスモン超伝導回路とスクイーズ駆動の非線形性を生かした。非線形性の増大はゲート時間の短縮を可能にするが、ここで示すようにカオスを引き起こして量子ビットを溶かすこともできる。我々は,Kerr-cat量子ビットの有効領域を決定し,その崩壊を実験的に検出する方法について検討した。パラメトリック量子計算の危険領域は、駆動超伝導回路による量子カオスの研究の場でもある。 Kerr parametric oscillators are potential building blocks for fault-tolerant quantum computers. They can stabilize Kerr-cat qubits, which offer advantages toward the encoding and manipulation of error-protected quantum information. The recent realization of Kerr-cat qubits made use of the nonlinearity of the SNAIL transmon superconducting circuit and a squeezing drive. Increasing nonlinearities can enable faster gate times, but, as shown here, can also induce chaos and melt the qubit away. We determine the region of validity of the Kerr-cat qubit and discuss how its disintegration could be experimentally detected. The danger zone for parametric quantum computation is also a potential playground for investigating quantum chaos with driven superconducting circuits.	翻訳日:2024-02-21 05:08:39 公開日:2024-02-18
# ヒルベルト空間固有値問題によって生成される総和公式 Summation formulas generated by Hilbert space eigenproblem ( http://arxiv.org/abs/2310.17210v3 ) ライセンス: Link先を確認	Petar Mali, Sonja Gombar, Slobodan Radošević, Milica Rutonjski, Milan Pantić, Milica Pavkov-Hrvojević	(参考訳) Schlömilch型の無限級数、および一般化超幾何関数を含む級数のあるクラスは、無限ポテンシャル井戸内に閉じ込められた粒子の単純な量子モデルと量子力学の原理から、閉じた形で計算できることを実証する。我々は、ヒルベルト空間の固有値問題に基づく一般的なフレームワークを提供し、これは異なる厳密可解な量子モデルに適用できる。明確に定義された量子問題における正規化条件から級数を取得することは、それらの収束を保証する。 We demonstrate that certain classes of Schlömilch-like infinite series and series that include generalized hypergeometric functions can be calculated in closed form starting from a simple quantum model of a particle trapped inside an infinite potential well and using principles of quantum mechanics. We provide a general framework based on the Hilbert space eigenproblem that can be applied to different exactly solvable quantum models. Obtaining series from normalization conditions in well-defined quantum problems secures their convergence.	翻訳日:2024-02-21 05:08:28 公開日:2024-02-18
# 雑音Werner-Holevoチャネルとその特性 The noisy Werner-Holevo channel and its properties ( http://arxiv.org/abs/2310.15353v6 ) ライセンス: Link先を確認	Shayan Roofeh, Vahid Karimipour	(参考訳) Werner-Holevo チャネル $\Lambda_{1} (\rho)=\frac{1}{2}(\text{tr}(\rho)I-\rho^T)$ への関心は主に、その抽象的な数学的性質に起因する。三次元およびわずかな修正により、このチャネルはランダムな角度でランダムな方向における量子状態の回転として実現できることを示した。我々の修正は $\Lambda_x(\rho)=(1-x)\rho+x\Lambda_1(\rho)$ の形を取る。したがって、量子処理タスクにおけるクトリットの潜在的利用や、様々なプラットフォームにおけるそれらの実現を考えると、修正されたwerner-holevoチャネルは、量子ビットに対する脱分極チャネルと同様に、非常に単純で現実的なノイズモデルとして使用できる。我々は、このチャネルを詳細に研究し、その様々な特性を導き出す。特に、最近提案されたフラグ拡張や他の手法を用いて、このチャネルの異なる容量に対する解析的表現と境界を導出する。これらの導出において対称性の役割が明らかになる。また、チャネル $\Lambda_x$ が反分解可能であり、したがって領域 $\frac{4}{7}\leq x\leq 1.$ において量子容量がゼロであることを厳格に証明する。 The interest in the Werner-Holevo channel $\Lambda_{1} (\rho)=\frac{1}{2}(\text{tr}(\rho)I-\rho^T)$ has been mainly due to its abstract mathematical properties. We show that in three dimensions and with a slight modification, this channel can be realized as the rotation of qutrit states in random directions by random angles. Our modification takes the form $\Lambda_x(\rho)=(1-x)\rho+x\Lambda_1(\rho)$. Therefore and in view of the potential use of qutrits in quantum processing tasks and their realization in many different platforms, the modified Werner-Holevo channel can be used as a very simple and realistic noise model, in the same way that the depolarizing channel is for qubits. We will make a detailed study of this channel and derive its various properties. In particular, we will use the recently proposed flag extension and other techniques to derive analytical expressions and bounds for the different capacities of this channel. The role of symmetry is revealed in these derivations. We also rigorously prove that the channel $\Lambda_x$ is anti-degradable and hence has zero quantum capacity, in the region $\frac{4}{7}\leq x\leq 1.$	翻訳日:2024-02-21 05:08:16 公開日:2024-02-18
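The two channel formulas quoted in this abstract can be checked numerically with a short sketch. It uses plain nested lists for a 3x3 density matrix; the function names and the test state are assumptions, while the formulas themselves are the ones given above.

```python
def transpose(m):
    """Plain transpose (no complex conjugation)."""
    return [list(row) for row in zip(*m)]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def werner_holevo(rho):
    """Lambda_1(rho) = (tr(rho) * I - rho^T) / 2.

    The factor 1/2 makes the map trace preserving only for 3x3 inputs.
    """
    d = len(rho)
    tr = trace(rho)
    rho_t = transpose(rho)
    return [[0.5 * ((tr if i == j else 0.0) - rho_t[i][j]) for j in range(d)]
            for i in range(d)]


def noisy_werner_holevo(rho, x):
    """Lambda_x(rho) = (1 - x) * rho + x * Lambda_1(rho)."""
    d = len(rho)
    wh = werner_holevo(rho)
    return [[(1.0 - x) * rho[i][j] + x * wh[i][j] for j in range(d)]
            for i in range(d)]


# Qutrit basis state |0><0| (hypothetical test input).
rho = [[1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0]]
out = noisy_werner_holevo(rho, 0.5)
print(out)
```

For this input the output is diagonal with entries $1-x$, $x/2$, $x/2$, so at $x = 0.5$ half of the population stays in the original state and the rest is split evenly over the other two levels.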
# 羊の服を着たオオカミ:ネストした脱獄プロンプトは大きな言語モデルを簡単に騙す A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily ( http://arxiv.org/abs/2311.08268v2 ) ライセンス: Link先を確認	Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, Shujian Huang	(参考訳) ChatGPTやGPT-4のような大規模言語モデル(LLM)は、有用で安全な応答を提供するように設計されている。しかし、"jailbreaks"と呼ばれる敵のプロンプトは、LLMが潜在的に有害な内容を生成するため、保護を回避することができる。ジェイルブレイクのプロンプトを探索することは、LSMの弱点を明らかにするのに役立ちます。残念ながら、既存のjailbreakメソッドは複雑な手動設計に悩まされるか、他のホワイトボックスモデルの最適化が必要であり、一般化や効率を損なう。本稿では,(1)プロンプトリライトと(2)シナリオネスティングの2つの側面にジェイルブレイク即時攻撃を一般化する。そこで本研究では,LDM自体を利用して効果的なジェイルブレイクプロンプトを生成する自動フレームワークReNeLLMを提案する。大規模な実験により、ReNeLLMは攻撃成功率を大幅に改善し、既存のベースラインと比較して時間コストを大幅に削減することが示された。また,LLMの保護における現在の防御方法の欠如も明らかにした。最後に,迅速な実行優先度の観点からllms防御の失敗を分析し,対応する防衛戦略を提案する。我々は,学術コミュニティとLLM開発者の両方に,より安全で規制の厳しいLLMの提供を促すことを願っている。コードはhttps://github.com/NJUNLP/ReNeLLMで入手できる。 Large Language Models (LLMs), such as ChatGPT and GPT-4, are designed to provide useful and safe responses. However, adversarial prompts known as 'jailbreaks' can circumvent safeguards, leading LLMs to generate potentially harmful content. Exploring jailbreak prompts can help to better reveal the weaknesses of LLMs and further steer us to secure them. Unfortunately, existing jailbreak methods either suffer from intricate manual design or require optimization on other white-box models, compromising generalization or efficiency. In this paper, we generalize jailbreak prompt attacks into two aspects: (1) Prompt Rewriting and (2) Scenario Nesting. Based on this, we propose ReNeLLM, an automatic framework that leverages LLMs themselves to generate effective jailbreak prompts. Extensive experiments demonstrate that ReNeLLM significantly improves the attack success rate while greatly reducing the time cost compared to existing baselines. Our study also reveals the inadequacy of current defense methods in safeguarding LLMs. 
Finally, we analyze the failure of LLMs defense from the perspective of prompt execution priority, and propose corresponding defense strategies. We hope that our research can catalyze both the academic community and LLMs developers towards the provision of safer and more regulated LLMs. The code is available at https://github.com/NJUNLP/ReNeLLM.	翻訳日:2024-02-21 04:59:34 公開日:2024-02-18
# もう一度質問する:(ほとんど)すべてのシナリオで、セルフアグリメントが言語モデルの推論を改善する Ask One More Time: Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios ( http://arxiv.org/abs/2311.08154v2 ) ライセンス: Link先を確認	Lei Lin, Jiayi Fu, Pengli Liu, Qingyang Li, Yan Gong, Junchen Wan, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai	(参考訳) チェーン・オブ・ソート(CoT)プロンプトと言語モデルの組み合わせは複雑な推論タスクにおいて有望な結果をもたらすが、CoTプロンプトで使用される単純なグリーディ・デコードは通常、反復性と局所最適性を引き起こす。この欠点に対処するため、アンサンブル最適化は複数の推論経路を取得し、それらを集約して最終的な回答を得ようとする。しかし、現在のアンサンブル最適化手法は、単にself-consistencyのようなルールベースの後処理を用いるか、タスク関連のヒューマンアノテーションに基づいて複数の推論パスの中から最良のものを選択する追加モデルを訓練するものであり、入力された質問の種類や推論パスの回答形式が不明な現実的な設定に一般化できない。その限界を避けるために,入力質問のタイプや推論パスの回答形式が既知であっても不明であっても、ほぼすべてのシナリオに適用可能な,一般化されたアンサンブル最適化手法であるSelf-Agreementを提案する。まず、言語モデルのデコーダからサンプルを取得して、推論パスの多様な集合を生成し、その後、サンプルされた推論パスの中から最も合意された回答を選択することで最適な回答を決定するよう、言語モデルにもう一度促す。Self-Agreementは、6つの公開推論ベンチマークで顕著な性能を達成すると同時に、優れた一般化能力を示す。 Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding used in CoT prompting usually causes repetitiveness and local optimality. To address this shortcoming, ensemble-optimization tries to obtain multiple reasoning paths to get the final answer assembly. However, current ensemble-optimization methods either simply employ rule-based post-processing such as self-consistency, or train an additional model based on several task-related human annotations to select the best one among multiple reasoning paths, yet fail to generalize to realistic settings where the type of input questions is unknown or the answer format of reasoning paths is unknown. To avoid their limitations, we propose Self-Agreement, a generalizable ensemble-optimization method applicable in almost all scenarios where the type of input questions and the answer format of reasoning paths may be known or unknown.
Self-agreement first samples from the language model's decoder to generate a diverse set of reasoning paths, and subsequently prompts the language model one more time to determine the optimal answer by selecting the most agreed answer among the sampled reasoning paths. Self-agreement simultaneously achieves remarkable performance on six public reasoning benchmarks and superior generalization capabilities.	翻訳日:2024-02-21 04:59:10 公開日:2024-02-18
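The two steps described above (sample several reasoning paths, then ask the model one more time which answer they agree on) can be sketched as follows. No real model is called: `sample_path` and `ask_once` are hypothetical callables standing in for a language model, the prompt wording is invented for illustration, and the toy stand-ins at the bottom exist only to make the sketch runnable.

```python
import re
from collections import Counter


def build_agreement_prompt(question, reasoning_paths):
    """Assemble the follow-up prompt that lists all sampled reasoning paths."""
    lines = [f"Question: {question}", "Candidate reasoning paths:"]
    for i, path in enumerate(reasoning_paths, start=1):
        lines.append(f"[{i}] {path}")
    lines.append("Which final result do most of the paths agree on? "
                 "Reply with that result only.")
    return "\n".join(lines)


def self_agreement(question, sample_path, ask_once, n_samples=5):
    """Sample n reasoning paths, then query the model one more time."""
    paths = [sample_path(question) for _ in range(n_samples)]
    return ask_once(build_agreement_prompt(question, paths))


# Toy stand-ins for a language model (assumptions for this sketch only).
_canned_paths = iter([
    "6 * 7 = 42, so the answer is 42",
    "7 + 7 + 7 + 7 + 7 + 7 = 42, so the answer is 42",
    "6 * 7 = 48, so the answer is 48",
    "Six sevens make 42; the answer is 42",
    "6 * 6 = 36, so the answer is 36",
])


def fake_sampler(question):
    return next(_canned_paths)


def fake_judge(prompt):
    # Stand-in for the second model call: pick the most frequent stated answer.
    answers = re.findall(r"answer is (\d+)", prompt)
    return Counter(answers).most_common(1)[0][0]


result = self_agreement("What is 6 * 7?", fake_sampler, fake_judge, n_samples=5)
print(result)
```

Unlike rule-based voting, the method as described leaves the final selection to the language model itself, which is why no answer parser is needed in `self_agreement`; the regular expression lives only inside the toy judge.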
# ResMGCN: 高速バイオメディカルインタラクションのための残留メッセージグラフ畳み込みネットワーク ResMGCN: Residual Message Graph Convolution Network for Fast Biomedical Interactions Discovering ( http://arxiv.org/abs/2311.07632v2 ) ライセンス: Link先を確認	Zecheng Yin	(参考訳) バイオメディカル情報グラフは、生物医療、バイオインフォマティクス、ヒトの医療コミュニティの関心を惹きつける多種多様な分子相互作用の同定や薬物発見など、現代におけるバイオメディカル情報の発見に不可欠である。今日では、バイオメディカル情報の実体を学習し、最先端の結果と生体分子の相互作用を正確に明らかにするために、グラフニューラルネットワークがますます多く提案されている。これらの手法は、遠方から特徴の消失を防ぎつつ、冗長なメモリと時間を犠牲にしてそのような問題を治療する。本稿では,異なる考え方で高速かつ正確な生体医学的相互作用予測を行うための,新しい残差メッセージグラフ畳み込みネットワーク (resmgcn) を提案する。具体的には、遠くのノードからメッセージを拡張する代わりに、ResMGCNは下位情報を次のラウンドの上位情報と集約してノード更新をガイドし、より意味のあるノード表現を得る。 resmgcnは、前層からの様々なメッセージと現在の層内の高次情報を最小のメモリと時間コストで認識・保存することができ、生体医学的実体の情報表現を得ることができる。タンパク質・タンパク質・薬物・薬物・ターゲット・遺伝子・疾患の相互作用を含む4つのバイオメディカル相互作用ネットワークデータセットについて実験を行い、ResMGCNが従来の最先端モデルより優れており、記憶と時間の両方において非常に有効であることを示した。 Biomedical information graphs are crucial for interaction discovering of biomedical information in modern age, such as identification of multifarious molecular interactions and drug discovery, which attracts increasing interests in biomedicine, bioinformatics, and human healthcare communities. Nowadays, more and more graph neural networks have been proposed to learn the entities of biomedical information and precisely reveal biomedical molecule interactions with state-of-the-art results. These methods remedy the fading of features from a far distance but suffer from remedying such problem at the expensive cost of redundant memory and time. In our paper, we propose a novel Residual Message Graph Convolution Network (ResMGCN) for fast and precise biomedical interaction prediction in a different idea. Specifically, instead of enhancing the message from far nodes, ResMGCN aggregates lower-order information with the next round higher information to guide the node update to obtain a more meaningful node representation. 
ResMGCN is able to perceive and preserve various messages from the previous layer and high-order information in the current layer with least memory and time cost to obtain informative representations of biomedical entities. We conduct experiments on four biomedical interaction network datasets, including protein-protein, drug-drug, drug-target, and gene-disease interactions, which demonstrates that ResMGCN outperforms previous state-of-the-art models while achieving superb effectiveness on both storage and time.	翻訳日:2024-02-21 04:58:22 公開日:2024-02-18
# sac3:semantic-aware cross-check consistencyによるブラックボックス言語モデルの信頼性の高い幻覚検出 SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency ( http://arxiv.org/abs/2311.01740v2 ) ライセンス: Link先を確認	Jiaxin Zhang, Zhuohang Li, Kamalika Das, Bradley A. Malin, Sricharan Kumar	(参考訳) 幻覚検出は、現代言語モデル(LM)の信頼性を理解するための重要なステップである。この目的を達成するために,lmsの自己矛盾に基づく既存の検出アプローチを再検討し,その結果生じる2種類の幻覚を明らかにする。 1)質問レベルと回答 2)自己整合性チェックのみでは効果的に識別できないモデルレベル。この発見に基づいて, 自己一貫性検査の原理に基づいて拡張した新しいサンプリングベース手法,すなわちsemantic-aware cross-check consistency (sac3)を提案する。我々のSAC3アプローチは、意味論的に等価な質問摂動やモデル間の応答整合性チェックなどの進歩を活用することで、質問レベルとモデルレベルの幻覚の両方を検出するための追加メカニズムを組み込んでいる。広範かつ体系的な実証分析を通じて、SAC3は複数の質問応答およびオープンドメイン生成ベンチマークにおいて、非実例と実例の両方の検出において、技術の現状より優れていることを示す。 Hallucination detection is a critical step toward understanding the trustworthiness of modern language models (LMs). To achieve this goal, we re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations resulting from 1) question-level and 2) model-level, which cannot be effectively identified through self-consistency check alone. Building upon this discovery, we propose a novel sampling-based method, i.e., semantic-aware cross-check consistency (SAC3) that expands on the principle of self-consistency checking. Our SAC3 approach incorporates additional mechanisms to detect both question-level and model-level hallucinations by leveraging advances including semantically equivalent question perturbation and cross-model response consistency checking. Through extensive and systematic empirical analysis, we demonstrate that SAC3 outperforms the state of the art in detecting both non-factual and factual statements across multiple question-answering and open-domain generation benchmarks.	翻訳日:2024-02-21 04:55:13 公開日:2024-02-18
# 大規模言語モデルからどこまで様々な視点を抽出できるか? How Far Can We Extract Diverse Perspectives from Large Language Models? ( http://arxiv.org/abs/2311.09799v2 ) ライセンス: Link先を確認	Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang	(参考訳) 多様な人間の意見を集めるのは費用がかかり難い。これは、さまざまなデータを生成し、潜在的にスケーラブルで効率的なソリューションを提供するために、人間と大規模言語モデル(LLM)の協調作業の最近の傾向につながります。しかしながら、主観的話題に対する多様な視点を生み出すllmsの能力は、未解決の疑問である。本研究では,社会規範や論証文などの主観的話題に多様な視点と理性をもたらすLLMの能力について検討する。 LLMから最大多様性抽出の新しい問題を定式化する。本研究は, 人間の価値観を生かし, 多様な意見の基盤となる基準に基づく促進手法を提案する。 LLMからどの程度多様な視点を抽出できるか、あるいは多様性カバレッジと呼ばれるかを調べるため、反復的な方法でモデルからより多くの出力を生成するためにステップバイステップのリコールプロンプトを採用している。様々なタスクにメソッドを適用すると、実際にLLMはタスク主観性の度合いに応じて多様な意見を生成できることがわかった。 Collecting diverse human opinions is costly and challenging. This leads to a recent trend in collaborative efforts between humans and Large Language Models (LLMs) for generating diverse data, offering potential scalable and efficient solutions. However, the extent of LLMs' capability to generate diverse perspectives on subjective topics remains an unexplored question. In this study, we investigate LLMs' capacity for generating diverse perspectives and rationales on subjective topics, such as social norms and argumentative texts. We formulate a new problem of maximum diversity extraction from LLMs. Motivated by how humans develop their opinions through their values, we propose a criteria-based prompting technique to ground diverse opinions. To see how far we can extract diverse perspectives from LLMs, or called diversity coverage, we employ a step-by-step recall prompting for generating more outputs from the model in an iterative manner. As we apply our methods to various tasks, indeed we find that LLMs can generate diverse opinions according to the degree of task subjectivity	翻訳日:2024-02-21 04:45:51 公開日:2024-02-18
# DocLens:医療用テキスト生成のための多面的きめ細かい評価 DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation ( http://arxiv.org/abs/2311.09581v2 ) ライセンス: Link先を確認	Yiqing Xie, Sheng Zhang, Hao Cheng, Pengfei Liu, Zelalem Gero, Cliff Wong, Tristan Naumann, Hoifung Poon, Carolyn Rose	(参考訳) 医療用テキスト生成は、行政業務の支援と意思決定を支援するための健全な情報強調を目的としている。医療用テキストの具体的な要件を反映するため,本論文では,生成したテキストの完全性,簡潔性,属性をきめ細かなレベルで評価するための指標セットを提案する。メトリクスは、インストラクションフォロー(プロプライエタリとオープンソースの両方)や教師付きエンテーメントモデルなど、さまざまなタイプの評価者によって計算できる。臨床ノート作成,放射線報告書要約,患者の質問要約の3つのタスクにおいて,doclensが3つの評価器で有効性を示す。総合的な人間の研究によると、DocLensは既存の指標よりも医療専門家の判断とかなり高い一致を示している。結果はまた、オープンソースの評価ツールの改善の必要性を強調し、潜在的な方向性を提案する。 Medical text generation aims to assist with administrative work and highlight salient information to support decision-making. To reflect the specific requirements of medical text, in this paper, we propose a set of metrics to evaluate the completeness, conciseness, and attribution of the generated text at a fine-grained level. The metrics can be computed by various types of evaluators including instruction-following (both proprietary and open-source) and supervised entailment models. We demonstrate the effectiveness of the resulting framework, DocLens, with three evaluators on three tasks: clinical note generation, radiology report summarization, and patient question summarization. A comprehensive human study shows that DocLens exhibits substantially higher agreement with the judgments of medical experts than existing metrics. The results also highlight the need to improve open-source evaluators and suggest potential directions.	翻訳日:2024-02-21 04:44:52 公開日:2024-02-18
# symbol-llm: 大規模言語モデルのための基本記号中心インタフェースに向けて Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models ( http://arxiv.org/abs/2311.09278v2 ) ライセンス: Link先を確認	Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu	(参考訳) 大規模言語モデル(llm)は、人間に似たテキストの処理と生成において顕著な能力を示すが、自然言語の境界を超えて広がる世界知識の理解と表現(例えば化学分子公式)に関して制限がある。 LLMのトレーニングに直接シンボリックデータのコレクションを注入することは、異なるシンボリックファミリー間のシナジーを無視し、自然なデータとシンボリックデータのバランスの取れた混合の必要性を見落としているため、問題となる。本研究では、データとフレームワークの観点からこれらの課題に取り組み、Symbol-LLMシリーズモデルを導入する。まず、34のタスクからなるデータコレクションをキュレーションし、約20の異なるシンボリックファミリーを組み込んで、相互関係を捉え、シンボル間の相乗効果を育む。そして、2段階のチューニングフレームワークは、一般化能力を失うことなく記号的知識を注入することに成功した。シンボル中心タスクとNL中心タスクの広範な実験は、Symbol-LLMシリーズモデルのバランスと優れた性能を示している。プロジェクトページはhttps://xufangzhi.github.io/symbol-llm-page/。 Although Large Language Models (LLMs) demonstrate remarkable ability in processing and generating human-like text, they do have limitations when it comes to comprehending and expressing world knowledge that extends beyond the boundaries of natural language(e.g., chemical molecular formula). Injecting a collection of symbolic data directly into the training of LLMs can be problematic, as it disregards the synergies among different symbolic families and overlooks the need for a balanced mixture of natural and symbolic data. In this work, we tackle these challenges from both a data and framework perspective and introduce Symbol-LLM series models. First, we curated a data collection consisting of 34 tasks and incorporating approximately 20 distinct symbolic families, intending to capture the interrelations and foster synergies between symbols. Then, a two-stage tuning framework succeeds in injecting symbolic knowledge without loss of the generality ability. Extensive experiments on both symbol- and NL-centric tasks demonstrate the balanced and superior performances of Symbol-LLM series models. The project page is https://xufangzhi.github.io/symbol-llm-page/.	翻訳日:2024-02-21 04:44:10 公開日:2024-02-18
# タスク特化知識のない自己強化学習のための自己監督型カリキュラム生成 Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge ( http://arxiv.org/abs/2311.09195v2 ) ライセンス: Link先を確認	Sang-Hyun Lee and Seung-Woo Seo	(参考訳) 現在の強化学習アルゴリズムを現実世界のシナリオに適用する際の大きなボトルネックは、各エピソード間の環境をリセットする必要があることである。このリセットプロセスは人間の介入を必要とするため、エージェントが継続的に自律的に学習することは困難である。いくつかの最近の研究は、リセットとフォワードを共同でトレーニングするためのカリキュラムを生成する自律強化学習(ARL)アルゴリズムを導入している。彼らのカリキュラムは、エージェントの学習の進捗を考慮して、必要な手動リセットの数を減らすことができるが、事前定義された初期状態やリセット報酬関数のようなタスク固有の知識に依存している。本稿では,タスク固有の知識を使わずに,エージェントの学習進捗に適応したカリキュラムを生成する新しいARLアルゴリズムを提案する。我々のカリキュラムは、エージェントが多様かつ情報的な初期状態に自律的にリセットする権限を与えます。これを実現するために,エージェントがフォワードポリシーに従うと,各初期状態から成功確率を推定する成功判別器を導入する。成功判別器は自己監督的な方法で可逆遷移で訓練される。実験の結果, arlアルゴリズムは適応型カリキュラムを生成でき, エージェントのブートストラップにより, スパース・リワードの迷路ナビゲーションや操作タスクを効率的に解くことができ, 手動リセットの少ないベースラインよりも優れていた。 A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent's learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. 
The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation and manipulation tasks, outperforming baselines with significantly fewer manual resets.	翻訳日:2024-02-21 04:43:51 公開日:2024-02-18
# 時間的相関、コヒーレンス、ポスト選択が2光子干渉に及ぼす影響 Impact of temporal correlations, coherence, and postselection on two-photon interference ( http://arxiv.org/abs/2312.01503v2 ) ライセンス: Link先を確認	Fernando Redivo Cardoso, Jaewon Lee, Riccardo Checchinato, Jan-Heinrich Littmann, Marco De Gregorio, Sven Höfling, Christian Schneider, Celso J. Villas-Boas, Ana Predojević	(参考訳) 2光子干渉は量子フォトニクスにおいて必須の資源であるが、達成は容易ではない。光子対のカスケード生成は、2光子干渉を行う能力に悪影響を及ぼす固有の時間的相関を含むため、応用を妨げる。このような相関関係がデコヒーレンスや時間的ポストセレクションとどのように相互作用し、時間的ポストセレクションが2光子干渉の可視性を改善するかについて報告する。本研究は重要なパラメータを特定し,最適性能のソースへの道を示す。 Two-photon interference is an indispensable resource in quantum photonics, but it is not straightforward to achieve. The cascaded generation of photon pairs contains intrinsic temporal correlations that negatively affect the ability of such sources to perform two-photon interference, thus hindering applications. We report on how such correlation interplays with decoherence and temporal postselection, and under which conditions temporal postselection could improve two-photon interference visibility. Our study identifies crucial parameters and points the way to a source with optimal performance.	翻訳日:2024-02-21 04:35:21 公開日:2024-02-18
# ブロック圧縮特徴を用いたリアルタイム神経材料 Real-Time Neural Materials using Block-Compressed Features ( http://arxiv.org/abs/2311.16121v2 ) ライセンス: Link先を確認	Clément Weinreich, Louis de Oliveira, Antoine Houdard, Georges Nader	(参考訳) 神経材料は典型的にはデコーダネットワークと共に神経特徴の集合から成る。このようなモデルをリアルタイムレンダリングパイプラインに統合する上での大きな課題は、GPUメモリに機能を格納するために必要な大きなサイズと、ネットワークを効率的に評価する複雑性にある。本稿では,機能とデコーダをリアルタイムレンダリングパイプライン用に特別に設計したニューラルマテリアルモデルを提案する。我々のフレームワークはハードウェアベースのブロック圧縮(BC)テクスチャフォーマットを利用して学習した特徴を記憶し、そのモデルに空間と規模で連続的に材料情報を出力するように訓練する。これを実現するため、ブロックベースで特徴を整理し、トレーニング中にBC6の圧縮をエミュレートし、通常のBC6テクスチャとしてエクスポートする。この構造により、メモリフットプリントを低く保ちながら高解像度の機能を利用することができます。これにより、モデル全体の能力が向上し、シェーダ内で直接評価可能な軽量でシンプルなデコーダアーキテクチャが利用可能になります。さらに、学習した機能は継続的に復号化できるため、ランダムuvサンプリングとスケール間のスムーズな遷移を、その後のフィルタリングを必要とせずに実現することができる。その結果、我々の神経材料はメモリフットプリントが小さく、非常に高速にデコードでき、レンダリングパイプラインに最小の計算オーバーヘッドを加えることができる。 Neural materials typically consist of a collection of neural features along with a decoder network. The main challenge in integrating such models in real-time rendering pipelines lies in the large size required to store their features in GPU memory and the complexity of evaluating the network efficiently. We present a neural material model whose features and decoder are specifically designed to be used in real-time rendering pipelines. Our framework leverages hardware-based block compression (BC) texture formats to store the learned features and trains the model to output the material information continuously in space and scale. To achieve this, we organize the features in a block-based manner and emulate BC6 decompression during training, making it possible to export them as regular BC6 textures. This structure allows us to use high resolution features while maintaining a low memory footprint. Consequently, this enhances our model's overall capability, enabling the use of a lightweight and simple decoder architecture that can be evaluated directly in a shader. 
Furthermore, since the learned features can be decoded continuously, the model allows for random UV sampling and smooth transitions between scales without needing any subsequent filtering. As a result, our neural material has a small memory footprint and can be decoded extremely fast, adding minimal computational overhead to the rendering pipeline.	翻訳日:2024-02-21 04:35:05 公開日:2024-02-18
# 物理学におけるAlpha Zero:Alpha Zeroを用いたシンボリック回帰の物理解析への応用 Alpha Zero for Physics: Application of Symbolic Regression with Alpha Zero to find the analytical methods in physics ( http://arxiv.org/abs/2311.12713v3 ) ライセンス: Link先を確認	Yoshihiro Michishita	(参考訳) ニューラルネットワークによる機械学習は、自然言語処理、画像認識、ゲーム勝利、さらには物理学の問題など、さまざまなタスクのための、ますます強力なツールになりつつある。機械学習を数値計算や実験の支援に応用する研究は数多く存在するが、解析方法を見つけるために機械学習を適用する方法はあまり研究されていない。本稿では、アルファゼロアルゴリズム(α zero for physics (azfp))を用いた記号回帰を用いて、物理学における解析手法を開発する枠組みを提案する。実演として、AZfPはFloquetシステムの高周波展開を導出できることを示す。 AZfPは物理学の新しい理論フレームワークを開発する可能性がある。 Machine learning with neural networks is now becoming a more and more powerful tool for various tasks, such as natural language processing, image recognition, winning the game, and even for the issues of physics. Although there are many studies on the application of machine learning to numerical calculation and assistance of experiments, the methods of applying machine learning to find the analytical method are poorly studied. In this paper, we propose the frameworks of developing analytical methods in physics by using the symbolic regression with the Alpha Zero algorithm, that is Alpha Zero for physics (AZfP). As a demonstration, we show that AZfP can derive the high-frequency expansion in the Floquet systems. AZfP may have the possibility of developing a new theoretical framework in physics.	翻訳日:2024-02-21 04:32:34 公開日:2024-02-18
# 難易度対策と文脈情報に基づくToken-Level Adversarial Prompt Detection Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information ( http://arxiv.org/abs/2311.11509v3 ) ライセンス: Link先を確認	Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, and Viswanathan Swaminathan	(参考訳) 近年,様々なアプリケーションにおいて,Large Language Models (LLM) が重要なツールとして登場している。しかし、これらのモデルは敵のプロンプト攻撃の影響を受けやすいため、攻撃者はLSMを誤る入力文字列を慎重にキュレートし、誤った出力や望ましくない出力を生成することができる。従来の研究によると、離散最適化に基づく比較的単純な効果的な攻撃では、モデルのモデレーションやアライメントをバイパスする逆のプロンプトを生成することができる。敵に対するこの脆弱性は、LSMの堅牢性と信頼性に関する重要な懸念を浮き彫りにする。本研究の目的は,次のトークンの確率を予測するLLMの能力を活用して,トークンレベルでの敵対的プロンプトの検出に新たなアプローチを導入することである。本研究では,高い確率で予測されるトークンが正規であり,高いパープレキシティを示すトークンが逆数としてフラグ付けされるような,モデルのパープレキシティの度合いを測定する。さらに,提案手法では,隣接トークン情報を組み込んだコンテキスト理解も統合し,連続した敵のプロンプトシーケンスの検出を促進する。この目的のために、最適化手法に基づく2つのアルゴリズムと確率的グラフィカルモデル(PGM)に基づく2つのアルゴリズムを設計する。どちらの手法も効率的な解法を備えており、効率のよい逆数検出が可能である。トークンレベルの検出結果は、テキストシーケンス上のヒートマップオーバーレイとして可視化でき、テキストのどの部分が逆プロンプトを含んでいるかを明確により直感的に表現することができます。 In recent years, Large Language Models (LLM) have emerged as pivotal tools in various applications. However, these models are susceptible to adversarial prompt attacks, where attackers can carefully curate input strings that mislead LLMs into generating incorrect or undesired outputs. Previous work has revealed that with relatively simple yet effective attacks based on discrete optimization, it is possible to generate adversarial prompts that bypass moderation and alignment of the models. This vulnerability to adversarial prompts underscores a significant concern regarding the robustness and reliability of LLMs. Our work aims to address this concern by introducing a novel approach to detecting adversarial prompts at a token level, leveraging the LLM's capability to predict the next token's probability. We measure the degree of the model's perplexity, where tokens predicted with high probability are considered normal, and those exhibiting high perplexity are flagged as adversarial. 
Additionally, our method integrates context understanding by incorporating neighboring token information to encourage the detection of contiguous adversarial prompt sequences. To this end, we design two algorithms for adversarial prompt detection: one based on optimization techniques and another on Probabilistic Graphical Models (PGM). Both methods are equipped with efficient solving methods, ensuring efficient adversarial prompt detection. Our token-level detection result can be visualized as heatmap overlays on the text sequence, allowing for a clearer and more intuitive representation of which part of the text may contain adversarial prompts.	翻訳日:2024-02-21 04:32:04 公開日:2024-02-18
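
The abstract above describes flagging high-perplexity tokens while using neighboring tokens to favor contiguous flagged spans. The following is a minimal sketch of that idea, not the authors' algorithm: it assumes per-token surprisals (negative log-probabilities) are already available from some language model, and the function name `detect_adversarial_tokens`, the flat cost `mu` for flagging a token, and the switching penalty `lam` are all hypothetical choices made for illustration. It finds the 0/1 labelling that minimizes the total cost by dynamic programming.

```python
from typing import List


def detect_adversarial_tokens(surprisals: List[float], mu: float, lam: float) -> List[int]:
    """Return 0/1 labels (1 = flagged as adversarial) minimising

        sum_i (surprisals[i] if label_i == 0 else mu) + lam * (number of label switches)

    surprisals[i] is assumed to be -log p(token_i | previous tokens).
    mu and lam are illustrative hyperparameters, not values from the paper.
    """
    n = len(surprisals)
    if n == 0:
        return []
    # best[c]: minimal cost of labelling tokens 0..i with token i labelled c
    best = [surprisals[0], mu]
    back = []  # back[i - 1][c]: best label of token i - 1 given token i has label c
    for i in range(1, n):
        unary = (surprisals[i], mu)
        new_best = [0.0, 0.0]
        pointers = [0, 0]
        for c in (0, 1):
            stay = best[c]
            switch = best[1 - c] + lam
            if stay <= switch:
                new_best[c] = stay + unary[c]
                pointers[c] = c
            else:
                new_best[c] = switch + unary[c]
                pointers[c] = 1 - c
        best = new_best
        back.append(pointers)
    # Trace the optimal labelling back from the last token.
    label = 0 if best[0] <= best[1] else 1
    labels = [label]
    for pointers in reversed(back):
        label = pointers[label]
        labels.append(label)
    labels.reverse()
    return labels


if __name__ == "__main__":
    scores = [1.0, 1.0, 9.0, 2.0, 9.0, 1.0, 1.0]
    print(detect_adversarial_tokens(scores, mu=5.0, lam=0.0))  # independent thresholding
    print(detect_adversarial_tokens(scores, mu=5.0, lam=2.0))  # neighbours pull the gap in
```

With `lam=0` each token is judged on its own, so the low-surprisal token between the two high-surprisal ones is left unflagged; with a positive `lam` the three tokens are flagged as one contiguous span, which is the behavior the contextual term is meant to encourage.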
# 地平線からのデコヒーレンス:一般定式化と回転ブラックホール Decoherence from Horizons: General Formulation and Rotating Black Holes ( http://arxiv.org/abs/2311.11461v2 ) ライセンス: Link先を確認	Samuel E. Gralla and Hongji Wei	(参考訳) Danielson, Satishchandran, and Wald (DSW) による最近の研究は、ブラックホール ― そして実際、キリング地平線はより一般的に ― が、近くの全ての量子スーパーポジションに基本的なデコヒーレンスの割合を与えることを示した。ブラックホールの観測者(bob)は、重ねられた重力場を測定することによって、量子重ね合わせの外側を乱すことができるはずであるが、その作用は(因果性によって)この効果を持つことができないため、重ね合わせは自動的に妨害されなければならない。 DSWは、シュワルツシルト時空における遠い観測者、平時時におけるリンドラー観測者、デ・シッター時空における静的観測者に対して、デコヒーレンス率を未知の数値要因まで計算した。電磁的およびクライン=ゴードンアナログで作業し、それらの計算を一般化し、バイフルケートキリング地平線近傍のキリング観測者に対する正確なデコヒーレンス率の一般的な公式を導出する。カーブラックホールの対称性軸上の任意の位置における観測者に対する閉形式の速度を評価する。これにより、遠方のオブザーバーであるシュワルツシルトの結果における数値的要因が修正され、また近接ホリゾンおよび/または極端に近い振る舞いの新たな探索が可能になる。電磁界の場合、クーロン場がブラックホールに入るのを遮蔽する「ブラックホールマイスナー効果」のため、デコヒーレンスは極端に完全に消滅する。ボブは外側の重ね合わせの場を測定することができないので、非一貫性は必要ありません。 Recent work by Danielson, Satishchandran, and Wald (DSW) has shown that black holes -- and, in fact, Killing horizons more generally -- impart a fundamental rate of decoherence on all nearby quantum superpositions. The effect can be understood from measurement and causality: An observer (Bob) in the black hole should be able to disturb outside quantum superpositions by measuring their superposed gravitational fields, but since his actions cannot (by causality) have this effect, the superpositions must automatically disturb themselves. DSW calculated the rate of decoherence up to an unknown numerical factor for distant observers in Schwarzschild spacetime, Rindler observers in flat spacetime, and static observers in de Sitter spacetime. Working in electromagnetic and Klein-Gordon analogs, we flesh out and generalize their calculation to derive a general formula for the precise decoherence rate for Killing observers near bifurcate Killing horizons. We evaluate the rate in closed form for an observer at an arbitrary location on the symmetry axis of a Kerr black hole. 
This fixes the numerical factor in the distant-observer Schwarzschild result, while allowing new exploration of near-horizon and/or near-extremal behavior. In the electromagnetic case we find that the decoherence vanishes entirely in the extremal limit, due to the "Black hole Meissner effect" screening the Coulomb field from entering the black hole. This supports the causality picture: Since Bob is unable to measure the field of the outside superposition, no decoherence is necessary -- and indeed none occurs.	翻訳日:2024-02-21 04:31:39 公開日:2024-02-18
# 構造認識型スパースビューX線3次元再構成 Structure-Aware Sparse-View X-ray 3D Reconstruction ( http://arxiv.org/abs/2311.10959v2 ) ライセンス: Link先を確認	Yuanhao Cai, Jiahao Wang, Zongwei Zhou, Angtian Wang, Alan Yuille	(参考訳) 物体の内部構造を明らかにする能力で知られているx線は、可視光よりもリッチな3d再構成情報を提供することが期待されている。しかし、既存のニューラル放射場(NeRF)アルゴリズムは、X線の重要な性質を無視し、画像化された物体の構造的内容の取得に制限をもたらす。本稿では, スパースビューX線3次元再構成のための構造対応X線ニューラルラジオ密度場(SAX-NeRF)を提案する。まず,SAX-NeRFのバックボーンとしてLine Segment-based Transformer(Lineformer)を設計する。 Lineformerは、X線の各線分内の依存関係をモデル化することで、3D空間内のオブジェクトの内部構造をキャプチャする。次に,2次元投影における文脈的および幾何学的情報を抽出するためのマスキング局所グローバル(mlg)レイサンプリング戦略を提案する。さらに、より広いX線アプリケーションをカバーする大規模なデータセットX3Dを収集する。 X3Dの実験では、SAX-NeRFは、新しいビュー合成とCT再構成において、従来のNeRF法を12.56と2.49dBで上回っている。コード、モデル、データはhttps://github.com/caiyuanhao1998/SAX-NeRFで公開される。 X-ray, known for its ability to reveal internal structures of objects, is expected to provide richer information for 3D reconstruction than visible light. Yet, existing neural radiance fields (NeRF) algorithms overlook this important nature of X-ray, leading to their limitations in capturing structural contents of imaged objects. In this paper, we propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF), for sparse-view X-ray 3D reconstruction. Firstly, we design a Line Segment-based Transformer (Lineformer) as the backbone of SAX-NeRF. Lineformer captures internal structures of objects in 3D space by modeling the dependencies within each line segment of an X-ray. Secondly, we present a Masked Local-Global (MLG) ray sampling strategy to extract contextual and geometric information in 2D projection. Plus, we collect a larger-scale dataset X3D covering wider X-ray applications. Experiments on X3D show that SAX-NeRF surpasses previous NeRF-based methods by 12.56 and 2.49 dB on novel view synthesis and CT reconstruction. Code, models, and data will be released at https://github.com/caiyuanhao1998/SAX-NeRF	翻訳日:2024-02-21 04:31:05 公開日:2024-02-18
# 微調整型大言語モデルのためのデミスティファイション命令混合 Demystifying Instruction Mixing for Fine-tuning Large Language Models ( http://arxiv.org/abs/2312.10793v3 ) ライセンス: Link先を確認	Renxi Wang, Haonan Li, Minghao Wu, Yuxia Wang, Xudong Han, Chiyu Zhang, Timothy Baldwin	(参考訳) インストラクションチューニングは、様々なタスクにわたる大規模言語モデル(LLM)の性能を大幅に向上させる。しかし、LLM微調整のための命令データセットの混合を最適化する手順はまだ理解されていない。本研究は,命令をNLPダウンストリームタスク,コーディング,一般的なチャットの3種類に分類する。提案手法は,LLMの性能に異なるデータセットの組み合わせが与える影響について検討し,特定の命令型が特定のアプリケーションに有利であるが,他の領域に悪影響を及ぼす可能性があることを示す。この研究は、命令の混合に関する洞察を与え、将来の研究の基礎を築いた。 Instruction tuning significantly enhances the performance of large language models (LLMs) across various tasks. However, the procedure for optimizing the mixing of instruction datasets for LLM fine-tuning is still poorly understood. This study categorizes instructions into three primary types: NLP downstream tasks, coding, and general chat. We explore the effects of instruction tuning on different combinations of datasets on LLM performance, and find that certain instruction types are more advantageous for specific applications but can negatively impact other areas. This work provides insights into instruction mixtures, laying the foundations for future research.	翻訳日:2024-02-21 04:10:40 公開日:2024-02-18
# 大規模言語モデルアライメントの多様な選好について On Diversified Preferences of Large Language Model Alignment ( http://arxiv.org/abs/2312.07401v3 ) ライセンス: Link先を確認	Dun Zeng, Yong Dai, Pengyu Cheng, Tianhao Hu, Wanshun Chen, Nan Du, Zenglin Xu	(参考訳) 大規模言語モデル(LLM)を人間の好みに合わせることが,LLMのインタラクション品質向上の鍵であると認識されている。しかし、この多元的世界では、アノテータの異なる嗜好によって人間の嗜好が多様化し、LCMアライメント手法の有効性を阻害する。本稿では,ヒトのフィードバックデータセットを定量的に分析し,様々な好みが報酬モデルに与える影響について検討する。本研究では,報酬モデル(RM)の校正性能とLLMのアライメント性能の相関関係を明らかにする。その結果,様々な選好データが,例えば Harmless&Helpful などの人為的選好に対するRMの校正性能に悪影響を及ぼし,LCM のアライメント性能を損なうことがわかった。そこで本研究では, RMの校正性能を向上するMORE(Multi-Objective Reward Learning Method)を提案する。 3つのモデルと5つの人間好みデータセットで実験を行い,結果の検証を行った。提案手法はRMの予測キャリブレーションを大幅に改善し,Alpaca-7B モデルと Harmless&Helpful モデルのアライメントを向上させる。さらに,報奨校正性能と選好アライメント性能の関連性から,キャリブレーション誤差がRM評価の指標となることが示唆された。オープンソースのコードとデータは、 https://github.com/dunzeng/more で入手できる。 Aligning large language models (LLMs) with human preferences has been recognized as the key to improving LLMs' interaction quality. However, in this pluralistic world, human preferences can be diversified due to annotators' different tastes, which hinders the effectiveness of LLM alignment methods. This paper presents the first quantitative analysis of commonly used human feedback datasets to investigate the impact of diversified preferences on reward modeling. Our analysis reveals a correlation between the calibration performance of reward models (RMs) and the alignment performance of LLMs. We find that diversified preference data negatively affect the calibration performance of RMs on human-shared preferences, such as Harmless&Helpful, thereby impairing the alignment performance of LLMs. To address the ineffectiveness, we propose a novel Multi-Objective Reward learning method (MORE) to enhance the calibration performance of RMs on shared preferences. We validate our findings by experiments on three models and five human preference datasets. 
Our method significantly improves the prediction calibration of RMs, leading to better alignment of the Alpaca-7B model with Harmless&Helpful preferences. Furthermore, the connection between reward calibration and preference alignment performance suggests that calibration error can be adopted as a key metric for evaluating RMs. The open-source code and data are available at https://github.com/dunzeng/MORE.	翻訳日:2024-02-21 04:08:32 公開日:2024-02-18
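
The abstract above proposes calibration error as a metric for reward models but does not say which measure is meant. As a hedged illustration only, the sketch below computes expected calibration error (ECE) with equal-width bins, which is one common choice; whether the paper uses this exact definition is an assumption, and the function name `expected_calibration_error` and the `n_bins` parameter are hypothetical. Here `confidences[i]` stands for the probability a reward model assigns to its preferred response, and `correct[i]` records whether that preference matched the human label.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted average gap between mean confidence and accuracy per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    conf_sum = [0.0] * n_bins  # sum of confidences falling in each bin
    hit_sum = [0] * n_bins     # number of correct predictions in each bin
    count = [0] * n_bins       # number of predictions in each bin
    for p, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; p == 1.0 goes into the last bin.
        b = min(max(int(p * n_bins), 0), n_bins - 1)
        conf_sum[b] += p
        hit_sum[b] += 1 if ok else 0
        count[b] += 1
    ece = 0.0
    for b in range(n_bins):
        if count[b]:
            avg_conf = conf_sum[b] / count[b]
            accuracy = hit_sum[b] / count[b]
            ece += (count[b] / n) * abs(avg_conf - accuracy)
    return ece
```

A perfectly calibrated model, whose confidence in each bin equals its accuracy there, scores 0; larger values mean the stated confidence is a worse guide to how often the model agrees with human preferences.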
# 効率的なニューラルネットワークのためのクラスアウェアプルーニング Class-Aware Pruning for Efficient Neural Networks ( http://arxiv.org/abs/2312.05875v2 ) ライセンス: Link先を確認	Mengnan Jiang, Jingcun Wang, Amro Eldebiky, Xunzhao Yin, Cheng Zhuo, Ing-Chao Lin, Grace Li Zhang	(参考訳) ディープニューラルネットワーク(DNN)は様々な分野で顕著な成功を収めている。しかし、DNNにおける多数の浮動小数点演算(FLOP)は、エッジデバイスのようなリソース制約のアプリケーションに展開する上での課題となっている。この問題に対処するため、DNNの実行における計算コストを削減するためにプルーニングが導入された。従来のプルーニング戦略は、重量値、勾配値、アクティベーション出力に基づいている。本稿では,DNNを圧縮するクラスアウェアプルーニング手法を提案し,DNNの計算コストを削減するための新しい視点を提供する。各イテレーションで、ニューラルネットワークのトレーニングが変更され、クラス認識の刈り込みが容易になる。その後、クラス数に関するフィルタの重要性が評価される。いくつかのクラスでのみ重要なフィルタは削除される。ニューラルネットワークは、発生した精度の損失を補償するために再トレーニングされる。プルーニングのイテレーションは、削除できるフィルタがなくなるまで繰り返され、これは残りのフィルタが多くのクラスにとって非常に重要であることを示す。このプルーニング法は, 従来のプルーニング法よりも精度, プルーニング率, FLOPsの低減に優れていた。実験の結果, このクラスアウェアプルーニング手法は, 高い推論精度を維持しつつ, 重みとFLOPsの数を大幅に削減できることがわかった。 Deep neural networks (DNNs) have demonstrated remarkable success in various fields. However, the large number of floating-point operations (FLOPs) in DNNs poses challenges for their deployment in resource-constrained applications, e.g., edge devices. To address the problem, pruning has been introduced to reduce the computational cost in executing DNNs. Previous pruning strategies are based on weight values, gradient values and activation outputs. Different from previous pruning solutions, in this paper, we propose a class-aware pruning technique to compress DNNs, which provides a novel perspective to reduce the computational cost of DNNs. In each iteration, the neural network training is modified to facilitate the class-aware pruning. Afterwards, the importance of filters with respect to the number of classes is evaluated. Filters that are important for only a few classes are removed. The neural network is then retrained to compensate for the incurred accuracy loss. The pruning iterations continue until no more filters can be removed, indicating that the remaining filters are very important for many classes. 
This pruning technique outperforms previous pruning solutions in terms of accuracy, pruning ratio and the reduction of FLOPs. Experimental results confirm that this class-aware pruning technique can significantly reduce the number of weights and FLOPs, while maintaining a high inference accuracy.	翻訳日:2024-02-21 04:07:22 公開日:2024-02-18
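
The selection step described in the abstract above (remove filters that matter for only a few classes) can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the abstract does not say how per-class importance is computed, so the `importance` matrix, the `score_threshold` above which a filter counts as important for a class, and the `min_classes` cutoff are all hypothetical inputs, as is the function name `select_filters_to_prune`.

```python
from typing import List


def select_filters_to_prune(importance: List[List[float]],
                            score_threshold: float,
                            min_classes: int) -> List[int]:
    """Return indices of filters that are important for fewer than min_classes classes.

    importance[f][c] is an assumed importance score of filter f for class c.
    """
    to_prune = []
    for f, per_class in enumerate(importance):
        # Count the classes for which this filter clears the importance threshold.
        n_important = sum(1 for score in per_class if score >= score_threshold)
        if n_important < min_classes:
            to_prune.append(f)
    return to_prune
```

In the iterative scheme the abstract describes, a step like this would alternate with retraining, and the loop would stop once the returned list is empty.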
# NeRFをベースとした色とオパクティを持つガウススメッティング Gaussian Splatting with NeRF-based Color and Opacity ( http://arxiv.org/abs/2312.13729v3 ) ライセンス: Link先を確認	Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek	(参考訳) neural radiance fields (nerfs) は、3dオブジェクトの複雑さを捉えるためのニューラルネットワークの驚くべき可能性を実証している。ニューラルネットワークの重みの中に形状と色情報をエンコードすることで、NeRFは3Dオブジェクトの驚くほどシャープな新しいビューを生み出すのに優れています。近年, 生成モデルを用いたNeRFの一般化が数多く現れ, その汎用性が高まっている。対照的に、gaussian splatting (gs) はニューラルネットワークを必要とせず、より高速なトレーニングと推論で同様のレンダリング品質を提供する。ガウス分布の集合に3Dオブジェクトに関する情報をエンコードし、古典的メッシュと同様に3Dで描画できる。残念ながら、GSは通常数十万のガウス成分を必要とするため、条件付けが難しい。両モデルの注意点を緩和するため,3dオブジェクトの形状のgs表現と,nerfに基づく色と不透明のエンコーディングを用いたハイブリッドモデル視聴方向ガウススプレーティング(vdgs)を提案する。我々のモデルは、ガウス分布とトレーニング可能な位置(すなわちガウスの手段)、形状(ガウスの共分散)、色と不透明度、ニューラルネットワークを用いており、ガウス分布と視方向のパラメータを使って色と不透明度の変化を生成する。その結果、3dオブジェクトのシャドウ、光反射、透明性をよりよく記述した。 Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding its versatility. In contrast, Gaussian Splatting (GS) offers a similar render quality with faster training and inference as it does not need neural networks to work. We encode information about the 3D objects in the set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS is difficult to condition since it usually requires around a hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model Viewing Direction Gaussian Splatting (VDGS) that uses GS representation of the 3D object's shape and NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e. means of Gaussian), shape (i.e. 
covariance of Gaussian), color and opacity, and neural network, which takes parameters of Gaussian and viewing direction to produce changes in color and opacity. Consequently, our model better describes shadows, light reflections, and transparency of 3D objects.	翻訳日:2024-02-21 03:55:00 公開日:2024-02-18
# 擬似ブールモデルカウンタの工学 Engineering an Exact Pseudo-Boolean Model Counter ( http://arxiv.org/abs/2312.12341v2 ) ライセンス: Link先を確認	Suwei Yang and Kuldeep S. Meel	(参考訳) モデルカウント(英: model counting)とは、コンピュータ科学における基本的なタスクであり、結合正規形(CNF)で表されるブール公式の割り当て数を決定することを含む。 CNF式に対するモデルカウントは幅広い用途で広く注目されているが、Pseudo-Boolean(PB)式に対するモデルカウントの研究は比較的見過ごされている。擬ブール公式は命題のブール公式よりも簡潔であり、現実世界の問題を表現できる柔軟性を提供する。その結果,PB式に対するモデルカウントの効率的な手法を検討する必要がある。本研究では,代数的決定図による知識コンパイルアプローチに依拠する,最初の厳密な擬似ブールモデルカウンタPBCountを提案する。 PBCountは1513インスタンスのカウントを計算できるが、現在の最先端のアプローチでは1013インスタンスしか処理できない。私たちの研究は,事前処理手法の開発や知識コンパイル以外のアプローチの探求など,PB式のモデルカウントという文脈で,今後の作業へのいくつかの道を開いた。 Model counting, a fundamental task in computer science, involves determining the number of satisfying assignments to a Boolean formula, typically represented in conjunctive normal form (CNF). While model counting for CNF formulas has received extensive attention with a broad range of applications, the study of model counting for Pseudo-Boolean (PB) formulas has been relatively overlooked. Pseudo-Boolean formulas, being more succinct than propositional Boolean formulas, offer greater flexibility in representing real-world problems. Consequently, there is a crucial need to investigate efficient techniques for model counting for PB formulas. In this work, we propose the first exact Pseudo-Boolean model counter, PBCount, that relies on a knowledge compilation approach via algebraic decision diagrams. Our extensive empirical evaluation shows that PBCount can compute counts for 1513 instances while the current state-of-the-art approach could only handle 1013 instances. Our work opens up several avenues for future work in the context of model counting for PB formulas, such as the development of preprocessing techniques and exploration of approaches other than knowledge compilation.	翻訳日:2024-02-21 03:53:31 publishedplaceholder
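
To make the task in the abstract above concrete, the sketch below counts the satisfying assignments of a small pseudo-Boolean formula by brute-force enumeration. It only illustrates what an exact PB model count is; it is not PBCount's method, which the abstract says relies on knowledge compilation via algebraic decision diagrams, and it scales exponentially in the number of variables. The representation of a constraint as a pair of integer coefficients and a lower bound, and the names `Constraint` and `count_models`, are assumptions made for this example.

```python
from itertools import product
from typing import List, Tuple

# A constraint (coeffs, bound) means: sum(coeffs[i] * x[i]) >= bound, with x[i] in {0, 1}.
Constraint = Tuple[List[int], int]


def count_models(n_vars: int, constraints: List[Constraint]) -> int:
    """Count 0/1 assignments to n_vars variables that satisfy every constraint."""
    count = 0
    for assignment in product((0, 1), repeat=n_vars):
        if all(sum(a * x for a, x in zip(coeffs, assignment)) >= bound
               for coeffs, bound in constraints):
            count += 1
    return count


if __name__ == "__main__":
    # 2*x1 + 3*x2 + x3 >= 3
    print(count_models(3, [([2, 3, 1], 3)]))
    # ... additionally x1 + x2 <= 1, written as -x1 - x2 >= -1
    print(count_models(3, [([2, 3, 1], 3), ([-1, -1, 0], -1)]))
```

A single linear constraint such as `2*x1 + 3*x2 + x3 >= 3` would need several clauses in CNF, which is the succinctness advantage the abstract attributes to PB formulas.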
# EASYTOOL:簡潔ツール指導によるLCMエージェントの強化 EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction ( http://arxiv.org/abs/2401.06201v2 ) ライセンス: Link先を確認	Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Ren Kan, Dongsheng Li, Deqing Yang	(参考訳) 現実世界の複雑なタスクに対処するため、大規模言語モデル(LLM)の応用におけるツール利用への関心が高まっている。 LLMベースのエージェントを開発するには、通常、異なるツールドキュメントから多くのツール機能を理解する必要がある。しかし、これらのドキュメンテーションは多様で冗長で不完全で、ツールを使用する際のllmの能力に大きな影響を与えます。そこで本稿では,多種多様なツールドキュメントを統一的かつ簡潔なツール命令に変換するためのフレームワークであるEASYTOOLを紹介する。 EasyToolは、異なるソースの広範なツールドキュメントから必須情報を浄化し、標準化されたツール記述とLLMベースのエージェントの機能を提供する統一されたインターフェース(ツールインストラクション)を精査する。複数のタスクに関する大規模な実験は、EasyToolがトークン消費を大幅に削減し、現実のシナリオにおけるツール利用のパフォーマンスを向上させることを実証している。私たちのコードは将来的には https://github.com/microsoft/JARVIS/ で利用可能になります。 To address intricate real-world tasks, there has been a rising interest in tool utilization in applications of large language models (LLMs). To develop LLM-based agents, it usually requires LLMs to understand many tool functions from different tool documentation. But these documentations could be diverse, redundant or incomplete, which immensely affects the capability of LLMs in using tools. To solve this, we introduce EASYTOOL, a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction for easier tool usage. EasyTool purifies essential information from extensive tool documentation of different sources, and elaborates a unified interface (i.e., tool instruction) to offer standardized tool descriptions and functionalities for LLM-based agents. Extensive experiments on multiple different tasks demonstrate that EasyTool can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios. Our code will be available at https://github.com/microsoft/JARVIS/ in the future.	翻訳日:2024-02-21 03:42:56 公開日:2024-02-18
# GeoDecoder: マルチモーダルマップ理解の強化 GeoDecoder: Empowering Multimodal Map Understanding ( http://arxiv.org/abs/2401.15118v2 ) ライセンス: Link先を確認	Feng Qi, Mian Dai, Zixian Zheng, Chao Wang	(参考訳) 本稿では,地理空間情報を処理するための専用マルチモーダルモデルgeodecoderを提案する。 GeoDecoderはBeitGPTアーキテクチャに基づいて構築されており、画像やテキスト処理の専門的なモジュールが組み込まれている。画像側では、GeoDecoderはGaoDe Amapを基盤となるベースマップとして使用しています。レンダリング技術の利用により、モデルは外部データとシンボルマーカー、ドライブ軌道、ヒートマップ、ユーザ定義マーカーなどの機能をシームレスに統合し、追加の機能エンジニアリングの必要性をなくす。 geodecoderのテキストモジュールは、さまざまなコンテキストテキストと質問プロンプトを受け付け、gptのスタイルでテキスト出力を生成する。さらに、GPTベースのモデルは、エンドツーエンドで同じモデル内で複数のタスクのトレーニングと実行を可能にする。北京の地理空間の分布に関する知識をジオデコーダが取得できるようにするため,8つの基本的な地理空間課題を考案し,大規模テキスト画像サンプルを用いてモデルの事前学習を行った。その後、3つの下流タスクで迅速な微調整が行われ、パフォーマンスが大幅に向上した。 geodecoderモデルは、マップ要素とその関連操作の包括的理解を示し、異なるビジネスシナリオにおける多様な地理空間タスクの効率的かつ高品質な適用を可能にする。 This paper presents GeoDecoder, a dedicated multimodal model designed for processing geospatial information in maps. Built on the BeitGPT architecture, GeoDecoder incorporates specialized expert modules for image and text processing. On the image side, GeoDecoder utilizes GaoDe Amap as the underlying base map, which inherently encompasses essential details about road and building shapes, relative positions, and other attributes. Through the utilization of rendering techniques, the model seamlessly integrates external data and features such as symbol markers, drive trajectories, heatmaps, and user-defined markers, eliminating the need for extra feature engineering. The text module of GeoDecoder accepts various context texts and question prompts, generating text outputs in the style of GPT. Furthermore, the GPT-based model allows for the training and execution of multiple tasks within the same model in an end-to-end manner. To enhance map cognition and enable GeoDecoder to acquire knowledge about the distribution of geographic entities in Beijing, we devised eight fundamental geospatial tasks and conducted pretraining of the model using large-scale text-image samples. 
Subsequently, rapid fine-tuning was performed on three downstream tasks, resulting in significant performance improvements. The GeoDecoder model demonstrates a comprehensive understanding of map elements and their associated operations, enabling efficient and high-quality application of diverse geospatial tasks in different business scenarios.	翻訳日:2024-02-21 03:34:19 公開日:2024-02-18
# PsySafe: 多エージェントシステム安全の心理的攻撃・防衛・評価のための総合的枠組み PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety ( http://arxiv.org/abs/2401.11880v2 ) ライセンス: Link先を確認	Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao	(参考訳) 大規模言語モデル(llm)で拡張されたマルチエージェントシステムは、集団知性において深い能力を発揮する。しかし、悪意のある目的のためにこのインテリジェンスの潜在的誤用は重大なリスクをもたらす。現在,マルチエージェントシステムの安全性に関する総合的な研究は限られている。本稿では,エージェント心理学の革新的なレンズを通して,エージェントの暗黒心理状態が安全性に対する重大な脅威となることを明らかにする。これらの問題に対処するために,エージェント心理学を基盤とした包括的枠組み(PsySafe)を提案する。まず,エージェントのダークパーソナリティ特性がいかに危険行動を引き起こすか,次に心理的・行動的観点からマルチエージェントシステムの安全性を評価すること,そしてリスクを軽減する効果的な戦略を考案することである。実験により,エージェント間の集団的危険行動,危険行動に関わるエージェントの自己反射,エージェントの心理的評価と危険行動の相関など,いくつかの興味深い現象が明らかになった。我々は,マルチエージェントシステムの安全性に関するさらなる研究に,我々のフレームワークと観測が貴重な洞察を提供することを期待している。データとコードをhttps://github.com/AI4Good24/PsySafeで公開します。 Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence. However, the potential misuse of this intelligence for malicious purposes presents significant risks. To date, comprehensive research on the safety issues associated with multi-agent systems remains limited. In this paper, we explore these concerns through the innovative lens of agent psychology, revealing that the dark psychological states of agents constitute a significant threat to safety. To tackle these concerns, we propose a comprehensive framework (PsySafe) grounded in agent psychology, focusing on three key areas: firstly, identifying how dark personality traits in agents can lead to risky behaviors; secondly, evaluating the safety of multi-agent systems from the psychological and behavioral perspectives, and thirdly, devising effective strategies to mitigate these risks. 
Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents' self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and dangerous behaviors. We anticipate that our framework and observations will provide valuable insights for further research into the safety of multi-agent systems. We will make our data and code publicly accessible at https://github.com/AI4Good24/PsySafe.	翻訳日:2024-02-21 03:33:29 公開日:2024-02-18
# LightDiC: 大規模図形表現学習におけるシンプルかつ効果的なアプローチ LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation Learning ( http://arxiv.org/abs/2401.11772v2 ) ライセンス: Link先を確認	Xunkai Li, Meihao Liao, Zhengyu Wu, Daohan Su, Wentao Zhang, Rong-Hua Li, Guoren Wang	(参考訳) 既存のグラフニューラルネットワーク(GNN)のほとんどは、キャプチャされたリレーショナル情報の制限範囲が、実世界のシナリオにおける表現能力とデプロイメントを妨げる、非ダイレクトグラフに限られている。非有向グラフと比較して、有向グラフ (digraphs) は、輸送や金融ネットワークなどのノード間のより複雑な関係を捉えることにより、より複雑なトポロジーシステムのモデリングの要求に合致する。いくつかの指向型GNNが導入されたが、そのインスピレーションは主にディープラーニングアーキテクチャによるもので、冗長な複雑性と計算をもたらし、大規模データベースには適用できない。これらの問題に対処するために、磁気ラプラシアンに基づくダイグラフ畳み込みのスケーラブルな変種であるLightDiCを提案する。トポロジ関連の計算はオフライン前処理でのみ実行されるため、lightdicは例外的なスケーラビリティを実現し、再帰的な計算コストを伴わずに下流の予測を個別に訓練することができる。理論的解析により、lightdicはディリクレエネルギー最適化関数の近位勾配降下過程に対応する複素場に基づくメッセージパッシングを達成するために、ディグラフ信号のデノイジングの観点から有向情報を利用することが示され、その表現性が保証される。実験の結果、LightDiCは様々な下流タスクにおいて、学習可能なパラメータが少なく、訓練効率も高く、他のSOTAメソッドよりも優れていた。特に、LightDiCは最も代表的な大規模データベース(ogbn-papers100M)で満足できる結果を提供する最初のDiGNNである。 Most existing graph neural networks (GNNs) are limited to undirected graphs, whose restricted scope of the captured relational information hinders their expressive capabilities and deployments in real-world scenarios. Compared with undirected graphs, directed graphs (digraphs) fit the demand for modeling more complex topological systems by capturing more intricate relationships between nodes, such as formulating transportation and financial networks. While some directed GNNs have been introduced, their inspiration mainly comes from deep learning architectures, which lead to redundant complexity and computation, making them inapplicable to large-scale databases. To address these issues, we propose LightDiC, a scalable variant of the digraph convolution based on the magnetic Laplacian. 
Since topology-related computations are conducted solely during offline pre-processing, LightDiC achieves exceptional scalability, enabling downstream predictions to be trained separately without incurring recursive computational costs. Theoretical analysis shows that LightDiC utilizes directed information to achieve message passing based on the complex field, which corresponds to the proximal gradient descent process of the Dirichlet energy optimization function from the perspective of digraph signal denoising, ensuring its expressiveness. Experimental results demonstrate that LightDiC performs comparably well or even outperforms other SOTA methods in various downstream tasks, with fewer learnable parameters and higher training efficiency. Notably, LightDiC is the first DiGNN to provide satisfactory results in the most representative large-scale database (ogbn-papers100M).	翻訳日:2024-02-21 03:33:08 公開日:2024-02-18
# R-Judge: LLMエージェントの安全リスク意識のベンチマーク R-Judge: Benchmarking Safety Risk Awareness for LLM Agents ( http://arxiv.org/abs/2401.10019v2 ) ライセンス: Link先を確認	Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, Gongshen Liu	(参考訳) 大規模言語モデル(LLM)は、現実世界のアプリケーション間で自律的にタスクを完了させる大きな可能性を示している。それにもかかわらず、これらのllmエージェントは、対話環境での運用において予期せぬ安全性リスクをもたらす。本研究は, LLM生成コンテンツの安全性を従来の研究で重視する代わりに, 多様な環境下でのLCMエージェントの行動安全のベンチマークの必要性に対処する。 r-judgeは,エージェントインタラクション記録による安全リスクの判定と同定において,llmの熟練度を評価するためのベンチマークである。 r-judgeはマルチターンエージェントインタラクションの162レコードで構成され、7つのアプリケーションカテゴリと10のリスクタイプのうち27の重要なリスクシナリオを包含する。安全に関する人間のコンセンサスと、注釈付き安全ラベルと高品質のリスク記述が組み込まれている。 r-judge における 9 llm の評価は llm のリスク意識を高める余地がある: ベストパフォーマンスモデル gpt-4 は 89.07% の人間のスコアに対して 72.52% を達成し、他の全てのモデルはランダムより少ない。さらに,環境フィードバックとしてリスク記述を活用することで,大幅な性能向上が期待できることを示す。事例研究では,オープンエージェントシナリオにおけるリスク認識は,知識と推論を伴う多次元的能力であり,現在のllmでは困難であることを明らかにした。 R-Judgeはhttps://github.com/Lordog/R-Judgeで公開されている。 Large language models (LLMs) have exhibited great potential in autonomously completing tasks across real-world applications. Despite this, these LLM agents introduce unexpected safety risks when operating in interactive environments. Instead of centering on LLM-generated content safety in most prior studies, this work addresses the imperative need for benchmarking the behavioral safety of LLM agents within diverse environments. We introduce R-Judge, a benchmark crafted to evaluate the proficiency of LLMs in judging and identifying safety risks given agent interaction records. R-Judge comprises 162 records of multi-turn agent interaction, encompassing 27 key risk scenarios among 7 application categories and 10 risk types. It incorporates human consensus on safety with annotated safety labels and high-quality risk descriptions. 
Evaluation of 9 LLMs on R-Judge shows considerable room for enhancing the risk awareness of LLMs: the best-performing model, GPT-4, achieves 72.52% against a human score of 89.07%, while all other models score below the random baseline. Moreover, further experiments demonstrate that leveraging risk descriptions as environment feedback achieves substantial performance gains. Through case studies, we reveal that risk awareness in open agent scenarios, which correlates with parameter count, is a multi-dimensional capability involving knowledge and reasoning and thus remains challenging for current LLMs. R-Judge is publicly available at https://github.com/Lordog/R-Judge.	翻訳日:2024-02-21 03:32:24 公開日:2024-02-18
# SymTC: 腰椎MRIのインスタンス分割のための共生トランスフォーマー-CNNネット SymTC: A Symbiotic Transformer-CNN Net for Instance Segmentation of Lumbar Spine MRI ( http://arxiv.org/abs/2401.09627v3 ) ライセンス: Link先を確認	Jiasong Chen, Linchen Qian, Linhai Ma, Timur Urakov, Weiyong Gu, Liang Liang	(参考訳) 椎間板疾患は一般的な疾患であり、しばしば間欠的または持続的な腰痛につながり、この疾患の診断と評価は腰椎MR画像から椎骨と椎間板の形状を正確に測定することに依存している。ディープニューラルネットワーク(DNN)モデルは、腰椎の個々のインスタンス(椎間板と椎骨)のより効率的なイメージセグメンテーションを自動化された方法で行い、臨床医を支援する可能性がある。これはインスタンスセグメンテーションと呼ばれる。本研究では,トランスフォーマーと畳み込みニューラルネットワーク(CNN)の強みを組み合わせた,革新的な腰椎MR画像分割モデルであるSymTCを提案する。具体的には、CNN層とTransformer層をマージする並列なデュアルパスアーキテクチャを設計し、Transformerのセルフアテンションモジュールに新しい位置埋め込みを組み込むことにより、より正確なセグメンテーションのための位置情報の利用を強化した。モデル性能をさらに向上させるため,SSMSpineと呼ばれる合成的で現実的なMR画像データセットを作成するための新しいデータ拡張技術を導入した。 SSMSpineデータセットとプライベートデータセットでSymTCおよび既存の15のイメージセグメンテーションモデルを,Dice類似度係数と95%ハウスドルフ距離の2つの指標を用いて評価した。その結果,SymTCは腰椎MR画像における椎骨と椎間板のセグメンテーションにおいて最も高い性能を示した。 SymTCコードとSSMSpineデータセットはhttps://github.com/jiasongchen/SymTCで公開されている。 Intervertebral disc disease, a prevalent ailment, frequently leads to intermittent or persistent low back pain, and the diagnosis and assessment of this disease rely on accurate measurement of vertebral bone and intervertebral disc geometries from lumbar MR images. Deep neural network (DNN) models may assist clinicians with more efficient image segmentation of individual instances (discs and vertebrae) of the lumbar spine in an automated way, which is termed instance image segmentation. In this work, we proposed SymTC, an innovative lumbar spine MR image segmentation model that combines the strengths of Transformer and Convolutional Neural Network (CNN). Specifically, we designed a parallel dual-path architecture to merge CNN layers and Transformer layers, and we integrated a novel position embedding into the self-attention module of Transformer, enhancing the utilization of positional information for more accurate segmentation. 
To further improve model performance, we introduced a new data augmentation technique to create a synthetic yet realistic MR image dataset, named SSMSpine, which is made publicly available. We evaluated our SymTC and 15 other existing image segmentation models on our private in-house dataset and the public SSMSpine dataset, using two metrics: Dice Similarity Coefficient and 95% Hausdorff Distance. The results show that our SymTC has the best performance for segmenting vertebral bones and intervertebral discs in lumbar spine MR images. The SymTC code and SSMSpine dataset are available at https://github.com/jiasongchen/SymTC.	翻訳日:2024-02-21 03:31:58 公開日:2024-02-18
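The Dice Similarity Coefficient named in the SymTC abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name, the flat 0/1 mask representation, and the convention for two empty masks are assumptions made for illustration, and the 95% Hausdorff Distance is not covered here.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 6-pixel masks: 2 overlapping pixels, 3 foreground pixels each.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 0, 1]
print(dice_coefficient(pred, target))
```

For these toy masks the score is 2·2 / (3 + 3), about 0.667. In instance segmentation the score would typically be computed per instance (each disc or vertebra) and then averaged.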
# 技術報告:ノードの到達不能性を考慮したゴシップ学習の収束について Technical Report: On the Convergence of Gossip Learning in the Presence of Node Inaccessibility ( http://arxiv.org/abs/2401.09498v2 ) ライセンス: Link先を確認	Tian Liu, Yue Cui, Xueyang Hu, Yecheng Xu, Bo Liu	(参考訳) Gossip learning(GL)は、連合学習(federated learning: FL)の分散型の代替として、無人航空機(UAV)によって形成される空飛ぶアドホックネットワーク(FANET)のようなリソース制約された無線ネットワークにより適している。 GLは、UAVネットワークの効率を大幅に向上し、バッテリー寿命を延長することができる。この利点にもかかわらず、GLの性能はデータ分布、通信速度、ネットワーク接続に強く影響されている。しかし、これらの因子がGL収束にどのように影響するかはいまだ不明である。既存の研究は、便宜上、仮想量に基づくGLの収束を研究したが、いくつかのノードがアクセスできない場合、ネットワークの実際の状態を反映しなかった。本稿では,動的ネットワークトポロジの下でGLに対するアクセス不能ノードの影響を定式化し,検討する。まず、ノードがアクセス可能かどうかによって重みの乖離(weight divergence)を分解する。そこで我々は,ノードアクセシビリティの動的条件下でのGL収束について検討し,到達不能ノード数,データの非i.i.d.性,到達不能期間が収束に与える影響を理論的に示す。理論的な結果の正しさを包括的に検証するために,実践的な実験を行った。 Gossip learning (GL), as a decentralized alternative to federated learning (FL), is more suitable for resource-constrained wireless networks, such as Flying Ad-Hoc Networks (FANETs) that are formed by unmanned aerial vehicles (UAVs). GL can significantly enhance the efficiency and extend the battery life of UAV networks. Despite the advantages, the performance of GL is strongly affected by data distribution, communication speed, and network connectivity. However, how these factors influence the GL convergence is still unclear. Existing work studied the convergence of GL based on a virtual quantity for the sake of convenience, which failed to reflect the real state of the network when some nodes are inaccessible. In this paper, we formulate and investigate the impact of inaccessible nodes on GL under a dynamic network topology. We first decompose the weight divergence by whether the node is accessible or not. Then, we investigate the GL convergence under dynamic node accessibility and theoretically show how the number of inaccessible nodes, data non-i.i.d.-ness, and duration of inaccessibility affect the convergence. 
Extensive experiments are carried out in practical settings to comprehensively verify the correctness of our theoretical findings.	翻訳日:2024-02-21 03:31:28 公開日:2024-02-18
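To make the role of inaccessible nodes concrete, the following toy sketch shows only the model-merging step of gossip-style learning, using pairwise averaging of scalar "models". It is an illustrative assumption, not the algorithm analyzed in the report: the function name, the scalar models, the random pairing rule, and the fixed accessibility mask are all hypothetical, and local training steps are omitted.

```python
import random


def gossip_round(models, accessible, rng):
    """One pairwise-averaging step among currently accessible nodes.

    models: list of scalar "model parameters", one per node (toy stand-in).
    accessible: list of bools; inaccessible nodes neither send nor receive.
    """
    active = [i for i, ok in enumerate(accessible) if ok]
    if len(active) < 2:
        return models[:]  # nobody to exchange with this round
    i, j = rng.sample(active, 2)  # pick two distinct accessible nodes
    merged = (models[i] + models[j]) / 2.0
    updated = models[:]
    updated[i] = updated[j] = merged
    return updated


rng = random.Random(0)
models = [0.0, 4.0, 8.0, 12.0]
accessible = [True, True, True, False]  # node 3 is unreachable throughout
for _ in range(50):
    models = gossip_round(models, accessible, rng)
print(models)
```

In this sketch the three reachable nodes drift toward their own average (4.0), while the unreachable node keeps its stale value of 12.0. That is the kind of gap between the real network state and an idealized global average that the abstract describes.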
# 部分観測による空間・時間連続物理シミュレーション Space and Time Continuous Physics Simulation From Partial Observations ( http://arxiv.org/abs/2401.09198v2 ) ライセンス: Link先を確認	Janny Steeven, Nadri Madiha, Digne Julie, Wolf Christian	(参考訳) 物理シミュレーションの最新の技術は、精度と複雑性のトレードオフに対処する数値スキームとメッシュリファインメント法に依存しているが、これらの手作りのソリューションは面倒で高い計算力を必要とする。大規模機械学習に基づくデータ駆動方式は、より直接的かつ効率的に長距離依存関係を統合することにより、高い適応性を実現する。本研究では,流体力学に焦点をあて,正則あるいは不規則な格子の形での計算と予測の固定的なサポートに基づく,文献の大部分の欠点に対処した。本研究では,スパースな観測で学習しながら,空間的・時間的領域の連続的な予測を行うための新しい手法を提案する。本稿では,この課題を二重観測問題として定式化し,それぞれスパース位置と連続領域の2つの相互結合力学系を持つ解を提案し,初期状態からの解の予測と補間を可能にする。我々の実践的な実装は、繰り返しGNNと任意の場所で解を補間できる時空間注意オブザーバを含む。我々のモデルは(標準の自己回帰モデルのように)新しい初期条件に一般化するだけでなく、任意の空間と時間の位置で評価を行う。流体力学の標準データセットを3つ評価し、古典的設定と連続予測を必要とする拡張された新しいタスクの両方において優れたベースラインと比較した。 Modern techniques for physical simulations rely on numerical schemes and mesh-refinement methods to address trade-offs between precision and complexity, but these handcrafted solutions are tedious and require high computational power. Data-driven methods based on large-scale machine learning promise high adaptivity by integrating long-range dependencies more directly and efficiently. In this work, we focus on fluid dynamics and address the shortcomings of a large part of the literature, which is based on a fixed support for computations and predictions in the form of regular or irregular grids. We propose a novel setup to perform predictions in a continuous spatial and temporal domain while being trained on sparse observations. We formulate the task as a double observation problem and propose a solution with two interlinked dynamical systems defined on, respectively, the sparse positions and the continuous domain, which allows forecasting and interpolating a solution from the initial condition. Our practical implementation involves recurrent GNNs and a spatio-temporal attention observer capable of interpolating the solution at arbitrary locations. 
Our model not only generalizes to new initial conditions (as standard auto-regressive models do) but also performs evaluation at arbitrary space and time locations. We evaluate on three standard datasets in fluid dynamics and compare to strong baselines, which are outperformed both in classical settings and in the extended new task requiring continuous predictions.	翻訳日:2024-02-21 03:30:44 公開日:2024-02-18
# AI助言型画像ラベリングにおける共形予測セットの有用性の評価 Evaluating the Utility of Conformal Prediction Sets for AI-Advised Image Labeling ( http://arxiv.org/abs/2401.08876v3 ) ライセンス: Link先を確認	Dongping Zhang, Angelos Chatzimparmpas, Negar Kamali, and Jessica Hullman	(参考訳) ディープニューラルネットワークはハイステークスな領域に展開されることが増えており、その解釈可能性の欠如は不確実性定量化を難しくする。本研究では、共形予測セット(分布によらない不確実性定量化において有効な信頼集合を生成する手法)を提示することが、AIが助言する意思決定における不確実性の表現としてどのような効果を持つかを検証する。大規模なオンライン実験を通じて、共形予測セットの有用性を、AIが助言する画像ラベリングのためのTop-$1$とTop-$k$の予測表示と比較する。事前登録された分析では,精度に対する予測セットの有用性はタスクの難易度に応じて変化し,容易な画像ではTop-$1$やTop-$k$の表示と同等以下の精度にとどまるのに対し,アウト・オブ・ディストリビューション(OOD)画像のラベル付けにおいては,特にセットサイズが小さい場合に,予測セットが人を支援する上で優れていることがわかった。本研究は,共形予測セットの実際的課題を実証的に特定し,実世界の意思決定にどのように組み込むかについての示唆を与える。 As deep neural networks are more commonly deployed in high-stakes domains, their lack of interpretability makes uncertainty quantification challenging. We investigate the effects of presenting conformal prediction sets (a method for generating valid confidence sets in distribution-free uncertainty quantification) to express uncertainty in AI-advised decision-making. Through a large online experiment, we compare the utility of conformal prediction sets to displays of Top-$1$ and Top-$k$ predictions for AI-advised image labeling. In a pre-registered analysis, we find that the utility of prediction sets for accuracy varies with the difficulty of the task: while they result in accuracy on par with or less than Top-$1$ and Top-$k$ displays for easy images, prediction sets excel at assisting humans in labeling out-of-distribution (OOD) images, especially when the set size is small. Our results empirically pinpoint the practical challenges of conformal prediction sets and provide implications on how to incorporate them for real-world decision-making.	翻訳日:2024-02-21 03:30:21 公開日:2024-02-18
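As background for the conformal prediction sets discussed above, here is a minimal sketch of standard split conformal prediction for classification. It is not the study's implementation: the nonconformity score (one minus the probability of the true label), the calibration values, the class names, and the miscoverage level are all assumptions chosen for illustration.

```python
import math


def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank in sorted order
    if rank > n:
        return float("inf")  # too few calibration points: include every label
    return sorted(cal_scores)[rank - 1]


def prediction_set(probs, qhat):
    """Labels whose nonconformity score 1 - p(label) is at most qhat."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


# Hypothetical calibration scores: 1 - (probability assigned to the true label).
cal_scores = [0.05, 0.20, 0.30, 0.75, 0.10, 0.55, 0.90]
qhat = conformal_quantile(cal_scores, alpha=0.25)

easy = {"cat": 0.90, "dog": 0.06, "fox": 0.04}
ambiguous = {"cat": 0.50, "dog": 0.30, "fox": 0.20}
print(prediction_set(easy, qhat), prediction_set(ambiguous, qhat))
```

With these toy numbers the confident image gets a single-label set and the ambiguous image gets a two-label set. Set size therefore acts as the uncertainty signal shown to the human labeler, in contrast to a fixed-size Top-$k$ display.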
# AI信頼度測定のための統計フレームワーク A Statistical Framework for Measuring AI Reliance ( http://arxiv.org/abs/2401.15356v2 ) ライセンス: Link先を確認	Ziyang Guo, Yifan Wu, Jason Hartline and Jessica Hullman	(参考訳) 人間はしばしば人工知能(AI)システムの助けを借りて意思決定をする。一般的なパターンは、最終決定をコントロールしている人間に対して、AIがアクションを推奨することである。研究者は、補完的なパフォーマンスを達成する上で重要な要素として、人間がAIに適切に依存していることを確認する。このような研究で使われる適切な信頼度の定義は、形式的な統計的根拠が欠如しており、矛盾を招く可能性がある。統計的決定理論に基づき,AIの予測に従う確率として信頼の概念を,人間が信号の識別や状況に関する正確な信念形成に直面する可能性のある課題から分離する形式的信頼の定義を提案する。私たちの定義は、人間とAIの相補性と信頼に関する研究の設計と解釈を導くのに使用できるフレームワークを生み出します。近年のaiによる意思決定研究を文献から活用し,信号の正確な区別ができないことによる損失と,誤依存による損失を分離するために,我々のフレームワークがいかに利用できるかを実証する。これらの損失を,行動意思決定者と同じ意思決定課題に直面した合理的な意思決定者によって達成される期待された報酬によって定義される相補的性能の基準とベンチマークと比較することにより評価する。 Humans frequently make decisions with the aid of artificially intelligent (AI) systems. A common pattern is for the AI to recommend an action to the human who retains control over the final decision. Researchers have identified ensuring that a human has appropriate reliance on an AI as a critical component of achieving complementary performance. We argue that the current definition of appropriate reliance used in such research lacks formal statistical grounding and can lead to contradictions. We propose a formal definition of reliance, based on statistical decision theory, which separates the concepts of reliance as the probability the decision-maker follows the AI's prediction from challenges a human may face in differentiating the signals and forming accurate beliefs about the situation. Our definition gives rise to a framework that can be used to guide the design and interpretation of studies on human-AI complementarity and reliance. Using recent AI-advised decision making studies from literature, we demonstrate how our framework can be used to separate the loss due to mis-reliance from the loss due to not accurately differentiating the signals. 
We evaluate these losses by comparing to a baseline and a benchmark for complementary performance defined by the expected payoff achieved by a rational decision-maker facing the same decision task as the behavioral decision-makers.	翻訳日:2024-02-21 03:19:59 公開日:2024-02-18
# インストラクションファインチューニング: プロンプト損失は重要か? Instruction Fine-Tuning: Does Prompt Loss Matter? ( http://arxiv.org/abs/2401.13586v2 ) ライセンス: Link先を確認	Mathew Huerta-Enochian	(参考訳) 本稿では,教師付き命令微調整におけるプロンプト損失重み付け(PLW)の効果について検討する。 LLaMA 1とLLaMA 2の両方と複数の命令データセットを用いて、スタンフォード大学のAlpaca実験を再現した。補完(completion)が短いデータセットで微調整したモデルの性能はPLWと統計的に有意な負の二次関係を示したが,補完が中程度または長いデータで微調整したモデルの性能はPLWとは何の関係も示さなかった。すなわち、プロンプト損失は多くのデータセットに対して安全に無視できる。補完が短いデータの場合,PLWの小さな値 (0.01-0.1) は複数選択および短文生成タスクに最適であり,PLWの大きな値 (~1.0) は長文生成タスクに最適であった。その結果、低い非ゼロのPLWはトレーニング中にモデルが事前学習済みの重みから逸脱しないよう促し、高いPLWは過学習を減少させると結論付けた。最後に、微調整データの補完とプロンプトの長さ比に基づいてPLW値を選択するための大まかなガイドを示す。 We present a study analyzing the effects of prompt loss weighting (PLW) on supervised instruction fine-tuning. We recreated Stanford's Alpaca experiment with both LLaMA 1 and LLaMA 2 and multiple instruction datasets. We found that performance of models fine-tuned on our short-completion dataset had a statistically significant negative quadratic relationship with PLW, but performance of models fine-tuned on medium- and long-completion data did not show any relationship with PLW. That is, prompt loss can be safely ignored for many datasets. For short-completion data, small values (0.01-0.1) of PLW were optimal for multiple-choice and short-generation tasks, while large values (~1.0) of PLW were optimal for long-generation tasks. We concluded that low non-zero PLW encourages models not to diverge from pre-trained model weights during training and that high PLW reduces overfitting. Finally, we present a rough guide for selecting PLW values based on the completion-prompt length ratio of fine-tuning data.	翻訳日:2024-02-21 03:18:31 公開日:2024-02-18
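One way to read prompt loss weighting is as a per-token weight applied to the training loss: prompt tokens are scaled by the PLW and completion tokens keep weight 1. The sketch below assumes that reading and a weight-normalized average. The function name, the normalization, and the toy per-token losses are hypothetical and may differ from the paper's exact formulation.

```python
def weighted_sequence_loss(token_losses, is_prompt, plw):
    """Average per-token loss with prompt tokens down-weighted by `plw`.

    token_losses: per-token cross-entropy values (toy numbers here).
    is_prompt: True for prompt tokens, False for completion tokens.
    plw: prompt loss weight in [0, 1]; 0 ignores the prompt entirely.
    """
    weights = [plw if prompt else 1.0 for prompt in is_prompt]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no tokens contribute to the loss")
    # Assumed normalization: divide by the sum of weights.
    return sum(w * l for w, l in zip(weights, token_losses)) / total_weight


token_losses = [2.0, 2.0, 1.0, 1.0]      # two prompt tokens, two completion tokens
is_prompt = [True, True, False, False]

for plw in (0.0, 0.1, 1.0):
    print(plw, weighted_sequence_loss(token_losses, is_prompt, plw))
```

With `plw = 0.0` only the completion tokens are scored (loss 1.0 here), while `plw = 1.0` treats every token equally (loss 1.5). The small non-zero values the abstract reports as useful for short-completion data sit between those two extremes.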
# 進化的アルゴリズムと強化学習の橋渡し:包括的調査 Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey ( http://arxiv.org/abs/2401.11963v2 ) ライセンス: Link先を確認	Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng, Ke Tang	(参考訳) 進化的アルゴリズム(EA)と強化学習(RL)を統合した進化的強化学習(ERL)は,優れた性能向上を示す。両アプローチの強みを融合させることで、ERLは有望な研究方向として現れている。本調査では,ERLの多様な研究分野について概観する。具体的には、関連アルゴリズムの最近の進歩を体系的に要約し、RLのEA支援最適化、EAのRL支援最適化、EAとRLの相乗的最適化の3つの研究方向を特定する。その後、各研究の方向性を詳細に分析し、複数の研究部門を編成する。それぞれのブランチが取り組もうとしている問題と、EAとRLの統合がこれらの課題にどのように対処するかを明らかにする。結論として,様々な研究方向の潜在的な課題と今後の研究方向性について論じる。研究者によるERLの探究を容易にするため, https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learningに関するアルゴリズムとコードを整理した。 Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for optimization, has demonstrated remarkable performance advancements. By fusing the strengths of both approaches, ERL has emerged as a promising research direction. This survey offers a comprehensive overview of the diverse research branches in ERL. Specifically, we systematically summarize recent advancements in relevant algorithms and identify three primary research directions: EA-assisted optimization of RL, RL-assisted optimization of EA, and synergistic optimization of EA and RL. Following that, we conduct an in-depth analysis of each research direction, organizing multiple research branches. We elucidate the problems that each branch aims to tackle and how the integration of EA and RL addresses these challenges. In conclusion, we discuss potential challenges and prospective future research directions across various research directions. To facilitate researchers in delving into ERL, we organize the algorithms and codes involved on https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learning.	翻訳日:2024-02-21 03:17:08 公開日:2024-02-18
# TrustAgent:エージェント・コンスティチューションによる安全で信頼できるLLMエージェントを目指して TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution ( http://arxiv.org/abs/2402.01586v2 ) ライセンス: Link先を確認	Wenyue Hua, Xianjun Yang, Zelong Li, Wei Cheng, Yongfeng Zhang	(参考訳) LLMに基づくエージェントの出現は、かなりの注目を集めているが、その信頼性は未調査領域である。エージェントは物理的な環境と直接対話できるので、信頼性と安全性は重要である。本稿では,エージェント・コンスティチューションをベースとしたエージェント・フレームワークであるTrustAgentについて述べる。本枠組みは, 計画作成前のモデルに安全知識を注入する事前計画戦略, 計画作成時の安全性を高める内計画戦略, 計画後検査による安全性を確保する後計画戦略からなる。実験により,これらの手法がLLMエージェントの安全性を効果的に高め,潜在的な危険を識別し,防止する方法を実証する。さらに, 安全性と有用性の複雑な関係, モデルの推論能力と安全エージェントとしての有効性について検討した。本稿では,LLMをベースとしたエージェントの設計と展開に安全意識と信頼性を組み込むことが,その性能向上だけでなく,人間中心環境への責任ある統合を確実にするためにも不可欠であることを示す。データとコードはhttps://github.com/agiresearch/TrustAgentで入手できる。 The emergence of LLM-based agents has garnered considerable attention, yet their trustworthiness remains an under-explored area. As agents can directly interact with the physical environment, their reliability and safety are critical. This paper presents an Agent-Constitution-based agent framework, TrustAgent, an initial investigation into improving the safety dimension of trustworthiness in LLM-based agents. This framework consists of three strategies: a pre-planning strategy that injects safety knowledge into the model prior to plan generation, an in-planning strategy that bolsters safety during plan generation, and a post-planning strategy that ensures safety through post-planning inspection. Through experimental analysis, we demonstrate how these approaches can effectively elevate an LLM agent's safety by identifying and preventing potential dangers. Furthermore, we explore the intricate relationships between safety and helpfulness, and between the model's reasoning ability and its efficacy as a safe agent. 
This paper underscores the imperative of integrating safety awareness and trustworthiness into the design and deployment of LLM-based agents, not only to enhance their performance but also to ensure their responsible integration into human-centric environments. Data and code are available at https://github.com/agiresearch/TrustAgent.	翻訳日:2024-02-21 03:08:39 公開日:2024-02-18
# Vaccine: 大規模言語モデルのための摂動認識アライメント Vaccine: Perturbation-aware Alignment for Large Language Model ( http://arxiv.org/abs/2402.01109v2 ) ライセンス: Link先を確認	Tiansheng Huang, Sihao Hu, Ling Liu	(参考訳) finetuning-as-a-serviceという新しいパラダイムは、大規模言語モデル(LLM)に新たな攻撃面をもたらす。ユーザがアップロードした少数の有害なデータは、微調整を簡単に騙してアライメントが破壊されたモデルを生成させることができる。我々は経験的解析を行い,アライメント破壊効果の原因と考えられる現象である harmful embedding drift を明らかにする。この知見に基づき,ユーザによる微調整のセキュリティリスクを軽減するために,摂動認識アライメント技術であるVaccineを提案する。 Vaccineの中核となる考え方は、アライメントフェーズにおいて、作成した摂動を徐々に加えることで、不変な隠れ埋め込みを作り出すことである。これにより、埋め込みは、微調整フェーズにおけるサニタイズされていないユーザデータからの有害な摂動に耐えることができる。オープンソースの主流LLM(例えばLlama2, Opt, Vicuna)における結果から,Vaccineは有害なプロンプトが引き起こす埋め込みドリフトに対するアライメントの頑健性を高めつつ,良性プロンプトに対する推論能力を維持できることが示されている。私たちのコードは https://github.com/git-disl/Vaccine で利用可能です。 The new paradigm of finetuning-as-a-service introduces a new attack surface for Large Language Models (LLMs): a few harmful data samples uploaded by users can easily trick the finetuning to produce an alignment-broken model. We conduct an empirical analysis and uncover a harmful embedding drift phenomenon, showing a probable cause of the alignment-broken effect. Inspired by our findings, we propose Vaccine, a perturbation-aware alignment technique to mitigate the security risk of user finetuning. The core idea of Vaccine is to produce invariant hidden embeddings by progressively adding crafted perturbation to them in the alignment phase. This enables the embeddings to withstand harmful perturbation from un-sanitized user data in the finetuning phase. Our results on open source mainstream LLMs (e.g., Llama2, Opt, Vicuna) demonstrate that Vaccine can boost the robustness of alignment against embedding drift induced by harmful prompts while preserving reasoning ability towards benign prompts. Our code is available at https://github.com/git-disl/Vaccine.	翻訳日:2024-02-21 03:08:01 公開日:2024-02-18
# ultralink: オープンソースの知識エンハンスド多言語教師付き微調整データセット UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset ( http://arxiv.org/abs/2402.04588v2 ) ライセンス: Link先を確認	Haoyu Wang, Shuo Wang, Yukun Yan, Xujia Wang, Zhiyu Yang, Yuzhuang Xu, Zhenghao Liu, Liner Yang, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun	(参考訳) オープンソースの大規模言語モデル(llm)は、さまざまな分野で大きな力を得ています。それにもかかわらず、ほとんどの研究は主に英語に集中し、多言語能力の領域への探索は限られていた。そこで本研究では,オープンソースの多言語教師付き微調整データセットを構築する。英語の指示を単純に翻訳する以前の研究と異なり、LLMの言語固有の能力と言語に依存しない能力の両方を考慮する。まず,LLMの言語固有の知識を引き出すための知識基盤型データ拡張手法を導入し,各国のユーザに提供する能力を向上させる。さらに,現代のLLMは言語間移動能力が強いため,様々な言語で同一の内容を繰り返し学習する必要はない。その結果、言語に依存しない微調整(SFT)データを性能劣化なく実質的に作成することができ、多言語SFTをより効率的にすることができる。得られたUltraLinkデータセットは、5つの言語(En, Zh, Ru, Fr, Es)にまたがる約100万のサンプルからなり、提案したデータ構築法は他の言語にも容易に拡張できる。 UltraLink-LMはUltraLinkでトレーニングされており、多くのタスクで代表的ベースラインを上回っている。 Open-source large language models (LLMs) have gained significant strength across diverse fields. Nevertheless, the majority of studies primarily concentrate on English, with only limited exploration into the realm of multilingual abilities. In this work, we therefore construct an open-source multilingual supervised fine-tuning dataset. Different from previous works that simply translate English instructions, we consider both the language-specific and language-agnostic abilities of LLMs. Firstly, we introduce a knowledge-grounded data augmentation approach to elicit more language-specific knowledge of LLMs, improving their ability to serve users from different countries. Moreover, we find modern LLMs possess strong cross-lingual transfer capabilities, thus repeatedly learning identical content in various languages is not necessary. Consequently, we can substantially prune the language-agnostic supervised fine-tuning (SFT) data without any performance degradation, making multilingual SFT more efficient. 
The resulting UltraLink dataset comprises approximately 1 million samples across five languages (i.e., En, Zh, Ru, Fr, Es), and the proposed data construction method can be easily extended to other languages. UltraLink-LM, which is trained on UltraLink, outperforms several representative baselines across many tasks.	翻訳日:2024-02-21 02:57:49 公開日:2024-02-18
# LightHGNN: $100\times$高速な推論のためにハイパーグラフニューラルネットワークをMLPに蒸留する LightHGNN: Distilling Hypergraph Neural Networks into MLPs for $100\times$ Faster Inference ( http://arxiv.org/abs/2402.04296v2 ) ライセンス: Link先を確認	Yifan Feng, Yihe Luo, Shihui Ying, Yue Gao	(参考訳) ハイパーグラフニューラルネットワーク(HGNN)は近年注目され,高次相関モデルにおける優位性から良好な性能を示した。しかし、ハイパーグラフの高次モデリング能力は計算の複雑さを増大させ、実用的な産業展開を妨げることにも注目される。実際、HGNNの効率的なデプロイにおける重要な障壁は、推論中の高次構造的依存関係である。本稿では,HGNNのハイパーグラフ依存性を解消し,計算複雑性を低減し,推論速度の向上を図るため,HGNNと推論効率のよいMulti-Layer Perceptron(MLP)のギャップを埋めることを提案する。具体的には、複雑性の低い高速推論のために、LightHGNNとLightHGNN$^+$を導入する。 LightHGNNは教師HGNNから生徒MLPへの知識をソフトラベルを通じて直接蒸留し、LightHGNN$^+$は生徒MLPに信頼性の高い高次相関関係を明示的に注入し、トポロジーを考慮した蒸留と過平滑化(over-smoothing)に対する耐性を達成する。 8つのハイパーグラフデータセットの実験では、ハイパーグラフの依存関係がなくても、提案されたLightHGNNはHGNNと同等以上の性能を達成し、バニラMLPを平均$16.3$上回った。 3つのグラフデータセットに関する広範な実験は、他のすべての方法と比較して、我々のLightHGNNが平均で最良の性能であることを示している。 5.5wの頂点を持つ合成ハイパーグラフの実験は、LightHGNNがHGNNよりも$100\times$高速に動作可能であることを示している。 Hypergraph Neural Networks (HGNNs) have recently attracted much attention and exhibited satisfactory performance due to their superiority in high-order correlation modeling. However, it is noticed that the high-order modeling capability of hypergraph also brings increased computational complexity, which hinders its practical industrial deployment. In practice, we find that one key barrier to the efficient deployment of HGNNs is the high-order structural dependencies during inference. In this paper, we propose to bridge the gap between the HGNNs and inference-efficient Multi-Layer Perceptrons (MLPs) to eliminate the hypergraph dependency of HGNNs and thus reduce computational complexity as well as improve inference speed. Specifically, we introduce LightHGNN and LightHGNN$^+$ for fast inference with low complexity. 
LightHGNN directly distills the knowledge from teacher HGNNs to student MLPs via soft labels, and LightHGNN$^+$ further explicitly injects reliable high-order correlations into the student MLPs to achieve topology-aware distillation and resistance to over-smoothing. Experiments on eight hypergraph datasets demonstrate that even without hypergraph dependency, the proposed LightHGNNs can still achieve competitive or even better performance than HGNNs and outperform vanilla MLPs by $16.3$ on average. Extensive experiments on three graph datasets further show that our LightHGNNs achieve the best average performance among all compared methods. Experiments on synthetic hypergraphs with 5.5w (55,000) vertices indicate LightHGNNs can run $100\times$ faster than HGNNs, showcasing their ability for latency-sensitive deployments.	翻訳日:2024-02-21 02:57:30 公開日:2024-02-18
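Distillation "via soft labels" is commonly implemented as a KL divergence between the teacher's and the student's softened class distributions. The sketch below shows that generic loss in plain Python. It is an assumption about the general technique, not the LightHGNN code: the function names, the temperature parameter, and the toy logits are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_kd_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between softened class distributions."""
    p = softmax(teacher_logits, temperature)   # teacher soft labels
    q = softmax(student_logits, temperature)   # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [2.0, 0.5, -1.0]    # e.g. HGNN output for one vertex (toy values)
student = [1.0, 1.0, 0.0]     # e.g. MLP output for the same vertex
print(soft_label_kd_loss(teacher, student, temperature=2.0))
```

The loss is zero when the student reproduces the teacher's distribution and positive otherwise. Because the student MLP sees only vertex features, inference no longer needs the hypergraph structure, which is the source of the speed-up the abstract describes.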
# エンタングルメント強化量子計測:標準量子限界からハイゼンベルク限界へ Entanglement-enhanced quantum metrology: from standard quantum limit to Heisenberg limit ( http://arxiv.org/abs/2402.03572v2 ) ライセンス: Link先を確認	Jiahao Huang, Min Zhuang, Chaohong Lee	(参考訳) エンタングルメント強化量子計測は、測定精度を高めるために量子エンタングルメントの利用を探求する。プローブ内の粒子を量子もつれ状態に準備すると、測定対象の物理量に関する情報を集団的に蓄積し、標準量子限界を超えてハイゼンベルク限界に近づく測定精度の向上につながる。量子操作と検出技術の急速な進歩により、冷却原子や捕捉イオンのような合成量子システムにおける多粒子もつれ状態の生成、操作、検出が可能になった。本稿では,量子計測における多粒子エンタングルメントを実証する基本原理と実験の進展を概観し,エンタングルメント強化量子センサの応用可能性について考察する。 Entanglement-enhanced quantum metrology explores the utilization of quantum entanglement to enhance measurement precision. When particles in a probe are prepared into a quantum entangled state, they collectively accumulate information about the physical quantity to be measured, leading to an improvement in measurement precision beyond the standard quantum limit and approaching the Heisenberg limit. The rapid advancement of techniques for quantum manipulation and detection has enabled the generation, manipulation, and detection of multi-particle entangled states in synthetic quantum systems such as cold atoms and trapped ions. This article aims to review and illustrate the fundamental principles and experimental progress that demonstrate multi-particle entanglement for quantum metrology, as well as discuss the potential applications of entanglement-enhanced quantum sensors.	翻訳日:2024-02-21 02:56:06 公開日:2024-02-18
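The two limits named in the title differ in how the measurement uncertainty scales with the number of probe particles $N$. The standard textbook scalings are written out below; the symbol $\Delta\theta$ for the uncertainty of the estimated parameter is an assumed notation, since the abstract itself gives no formulas.

```latex
% Uncertainty of an estimated parameter \theta with N probe particles
\Delta\theta_{\mathrm{SQL}} \propto \frac{1}{\sqrt{N}}, \qquad
\Delta\theta_{\mathrm{HL}} \propto \frac{1}{N}, \qquad
\frac{\Delta\theta_{\mathrm{SQL}}}{\Delta\theta_{\mathrm{HL}}} \propto \sqrt{N}
% Example: N = 10^{4} gives 1/\sqrt{N} = 10^{-2} and 1/N = 10^{-4}.
```

For $N = 10^{4}$ particles, an ideal entangled probe at the Heisenberg limit would therefore be about 100 times more precise than uncorrelated particles at the standard quantum limit, up to constant prefactors.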
# ニューラル密度比推定のための$\alpha$-divergence損失関数 $\alpha$-Divergence Loss Function for Neural Density Ratio Estimation ( http://arxiv.org/abs/2402.02041v2 ) ライセンス: Link先を確認	Yoshiaki Kitazawa	(参考訳) 近年、ニューラルネットワークは、機械学習の基本技術である密度比推定(DRE)の最先端の結果を生み出している。しかしながら、既存の手法では、Kullback-Leibler (KL)-divergenceの大きなサンプル要件、訓練損失の勾配消失、損失関数の偏り勾配といったDREの損失関数から生じる最適化の問題がある。そこで本稿では,簡潔な実装と安定な最適化を提供する$\alpha$-divergence損失関数($\alpha$-Div)を提案する。さらに,提案した損失関数の技術的正当性を示す。提案した損失関数の安定性を実証的に検証し,DREタスクの推定精度を検討した。さらに,提案した損失関数を用いたDREのサンプル要件を,$L_1$誤差の上限という観点から提示する。これは高次元DREタスクにおける一般的な問題である次元の呪いと関連している。 Recently, neural networks have produced state-of-the-art results for density-ratio estimation (DRE), a fundamental technique in machine learning. However, existing methods suffer from optimization issues that arise from the loss functions of DRE: a large sample requirement of Kullback-Leibler (KL)-divergence, vanishing training-loss gradients, and biased gradients of the loss functions. Thus, an $\alpha$-divergence loss function ($\alpha$-Div) that offers concise implementation and stable optimization is proposed in this paper. Furthermore, technical justifications for the proposed loss function are presented. The stability of the proposed loss function is empirically demonstrated and the estimation accuracy of DRE tasks is investigated. Additionally, this study presents a sample requirement for DRE using the proposed loss function in terms of the upper bound of $L_1$ error, which relates to the curse of dimensionality, a common problem in high-dimensional DRE tasks.	翻訳日:2024-02-21 02:52:52 公開日:2024-02-18
# マルチモーダルヘイトスピーチイベント検出2024におけるMasonPerplexity:トランスフォーマーアンサンブルを用いたヘイトスピーチとターゲット検出 MasonPerplexity at Multimodal Hate Speech Event Detection 2024: Hate Speech and Target Detection Using Transformer Ensembles ( http://arxiv.org/abs/2402.01967v2 ) ライセンス: Link先を確認	Amrita Ganguly, Al Nahian Bin Emran, Sadiya Sayara Chowdhury Puspo, Md Nishat Raihan, Dhiman Goswami, Marcos Zampieri	(参考訳) ヘイトスピーチのような攻撃的言語の自動識別は、オンラインコミュニティにおける議論を礼節あるものに保つ上で重要である。マルチモーダルコンテンツにおけるヘイトスピーチの識別は、攻撃性が言葉または画像のいずれか、あるいは両者の並置によって現れうるため、特に難しい課題である。本稿では,EACL 2024のCASE 2024におけるマルチモーダルヘイトスピーチイベント検出の共有タスクに対するMasonPerplexityの提出について述べる。タスクは2つのサブタスクに分けられる: サブタスクAはヘイトスピーチの識別に焦点を当て、サブタスクBは政治イベント中のテキスト埋め込み画像におけるターゲットの識別に焦点を当てる。我々は,サブタスクAにXLM-roBERTa-largeモデル,サブタスクBにXLM-roBERTa-base,BERTweet-large,BERT-baseを組み合わせたアンサンブルアプローチを用い,サブタスクAで0.8347,サブタスクBで0.6741のF1スコアを得て,両サブタスクで3位となった。 The automatic identification of offensive language such as hate speech is important to keep discussions civil in online communities. Identifying hate speech in multimodal content is a particularly challenging task because offensiveness can be manifested in either words or images or a juxtaposition of the two. This paper presents the MasonPerplexity submission for the Shared Task on Multimodal Hate Speech Event Detection at CASE 2024 at EACL 2024. The task is divided into two sub-tasks: sub-task A focuses on the identification of hate speech and sub-task B focuses on the identification of targets in text-embedded images during political events. We use an XLM-roBERTa-large model for sub-task A and an ensemble approach combining XLM-roBERTa-base, BERTweet-large, and BERT-base for sub-task B. Our approach obtained an F1-score of 0.8347 in sub-task A and 0.6741 in sub-task B, ranking 3rd in both sub-tasks.	翻訳日:2024-02-21 02:52:34 公開日:2024-02-18
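The abstract does not say how the three sub-task B models are combined. One common option is hard majority voting over per-model labels, sketched below. The voting rule, the tie-breaking choice, and the label strings are assumptions for illustration, not the submission's documented method.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent label; ties go to the earliest model in the list."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    # Walk the predictions in model order so that ties are broken deterministically.
    for label in predictions:
        if counts[label] == top:
            return label


# Hypothetical per-model outputs for one text-embedded image, in model order.
votes = ["community", "individual", "community"]
print(majority_vote(votes))
```

Ordering the list so that the strongest single model comes first makes the tie-break fall back to that model. Averaging class probabilities (soft voting) would be an equally plausible alternative.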
# データ保護がMLサーベイランスのアーキテクチャをどのようにサポートするか You Still See Me: How Data Protection Supports the Architecture of ML Surveillance ( http://arxiv.org/abs/2402.06609v2 ) ライセンス: Link先を確認	Rui-Jie Yew, Lucy Qin, Suresh Venkatasubramanian	(参考訳) 人間のデータは機械学習のバックボーンを形成する。したがって、データ保護法はMLシステムの管理方法に強く依存する。個人データの処理に伴うデータ保護法の要件がほとんどであることを踏まえると、組織には、データを法的スコープから遠ざけるインセンティブがある。これにより、特定のプライバシー保護技術(データ保護技術)の開発と応用が、ml準拠の重要な戦略となる。本稿では,これらの手法でラップされたデータを「良い」データとみなす修辞学の影響について検討する。モデル計算の一部として、データセットキュレーションの一部としてのプライベートセットの交差から、同型暗号化やフェデレーション学習に至るまで、MLシステムの開発におけるそれらの応用が、個別の監視とデータ統合をさらに支援できることを示す。 mlパイプラインの構成方法のコアにデータ蓄積があるため、データ保護技術は、データに関連する個人を保護する方法ではなく、監視のインフラストラクチャをサポートする方法で具現化されることが多い、と私たちは主張する。最後に,データ保護技術を評価するための技術と政策戦略を提案する。我々は、監視機械学習技術と戦う政策を策定する上で、技術者が果たす役割を強調して締めくくります。 Human data forms the backbone of machine learning. Data protection laws thus have strong bearing on how ML systems are governed. Given that most requirements in data protection laws accompany the processing of personal data, organizations have an incentive to keep their data out of legal scope. This makes the development and application of certain privacy-preserving techniques--data protection techniques--an important strategy for ML compliance. In this paper, we examine the impact of a rhetoric that deems data wrapped in these techniques as data that is "good-to-go". We show how their application in the development of ML systems--from private set intersection as part of dataset curation to homomorphic encryption and federated learning as part of model computation--can further support individual monitoring and data consolidation. With data accumulation at the core of how the ML pipeline is configured, we argue that data protection techniques are often instrumentalized in ways that support infrastructures of surveillance, rather than in ways that protect individuals associated with data. 
Finally, we propose technology and policy strategies to evaluate data protection techniques in light of the protections they actually confer. We conclude by highlighting the role that technologists might play in devising policies that combat surveillance ML technologies.	翻訳日:2024-02-21 01:08:06 公開日:2024-02-18
# イントロスペクティブプランニング:言語対応エージェントが自身の不確かさを補う Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty ( http://arxiv.org/abs/2402.06529v2 ) ライセンス: Link先を確認	Kaiqu Liang, Zixu Zhang, Jaime Fernández Fisac	(参考訳) 大きな言語モデル(LLM)は高度な推論スキルを示し、ロボットが自然言語命令を理解し、適切な接地を通じて高度なアクションを戦略的に計画できる。しかし、LLMの幻覚により、ロボットはユーザーの目標と不一致の計画を、あるいは極端な場合には安全でない計画を、自信を持って実行してしまう可能性がある。さらに、自然言語命令に固有の曖昧さは、特に複数の有効な選択肢が存在する状況において、タスクの不確実性を引き起こす可能性がある。この問題に対処するには、LLMはそのような不確実性を特定し、積極的に明確化を求める必要がある。本稿では,微調整を必要とせずに,ロボットタスク実行のための不確実性を考慮した計画の形成においてLLMを導く体系的手法として,イントロスペクティブ・プランニングの概念について検討する。タスクレベルのロボット計画における不確実性定量化を調査し,イントロスペクションが成功率と安全性の両方を,最先端のLLMベースの計画手法と比較して著しく改善することを示す。さらに,共形予測(conformal prediction)と連動してイントロスペクティブプランニングの有効性を評価し,この組み合わせにより信頼性境界がより強くなり,過剰なユーザ明確化クエリが少ない統計的成功保証が維持されることを示した。 Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions through proper grounding. However, LLM hallucination may result in robots confidently executing plans that are misaligned with user goals or, in extreme cases, unsafe. Additionally, inherent ambiguity in natural language instructions can induce task uncertainty, particularly in situations where multiple valid options exist. To address this issue, LLMs must identify such uncertainty and proactively seek clarification. This paper explores the concept of introspective planning as a systematic method for guiding LLMs in forming uncertainty-aware plans for robotic task execution without the need for fine-tuning. We investigate uncertainty quantification in task-level robot planning and demonstrate that introspection significantly improves both success rates and safety compared to state-of-the-art LLM-based planning approaches. 
Furthermore, we assess the effectiveness of introspective planning in conjunction with conformal prediction, revealing that this combination yields tighter confidence bounds, thereby maintaining statistical success guarantees with fewer superfluous user clarification queries.	翻訳日:2024-02-21 01:07:47 公開日:2024-02-18
# 大規模言語モデルを用いたマルチモーダル臨床試験結果予測 Multimodal Clinical Trial Outcome Prediction with Large Language Models ( http://arxiv.org/abs/2402.06512v2 ) ライセンス: Link先を確認	Wenhao Zheng, Dongsheng Peng, Hongxia Xu, Hongtu Zhu, Tianfan Fu, Huaxiu Yao	(参考訳) 臨床試験は重要かつ費用のかかるプロセスであり、しばしば数年にわたって、かなりの資金を必要とする。したがって、臨床試験結果予測モデルの開発は、失敗しそうな薬物を除外することを目的としており、大幅なコスト削減の可能性を秘めている。近年のデータ駆動型試みは、臨床治験結果を予測するために、深層学習を利用してマルチモーダルデータを統合している。しかし、これらのアプローチは手動で設計されたモーダル固有エンコーダに依存しており、新しいモーダルに適応する拡張性と、異なるモーダルにまたがる類似した情報パターンを識別する能力の両方を制限する。そこで本研究では, 臨床結果予測のためのマルチモーダル・ミックス・オブ・エキスパート(lifted)アプローチを提案する。具体的には、LIFTEDは異なるモダリティデータを自然言語記述に変換することで統一する。そして、LIFTEDは統合ノイズ耐性エンコーダを構築し、モーダル固有の言語記述から情報を抽出する。その後、sparse mixture-of-expertsフレームワークを使用して表現をさらに洗練し、liftedは異なるモダリティにまたがる類似情報パターンを特定し、同じエキスパートモデルを使用してそれらのパターンからより一貫性のある表現を抽出することができる。最後に、様々なモダリティ表現を動的に統合して予測することで、LIFTEDは異なるモダリティを自動で測定し、重要な情報により多くの注意を払うことができる。実験の結果, LIFTEDは, 3段階の治験成績を予測する上で, 最良基準に比べて有意に向上し, キーコンポーネントの有効性が示された。 The clinical trial is a pivotal and costly process, often spanning multiple years and requiring substantial financial resources. Therefore, the development of clinical trial outcome prediction models aims to exclude drugs likely to fail and holds the potential for significant cost savings. Recent data-driven attempts leverage deep learning methods to integrate multimodal data for predicting clinical trial outcomes. However, these approaches rely on manually designed modal-specific encoders, which limits both the extensibility to adapt new modalities and the ability to discern similar information patterns across different modalities. To address these issues, we propose a multimodal mixture-of-experts (LIFTED) approach for clinical trial outcome prediction. Specifically, LIFTED unifies different modality data by transforming them into natural language descriptions. Then, LIFTED constructs unified noise-resilient encoders to extract information from modal-specific language descriptions. 
Subsequently, a sparse Mixture-of-Experts framework is employed to further refine the representations, enabling LIFTED to identify similar information patterns across different modalities and extract more consistent representations from those patterns using the same expert model. Finally, a mixture-of-experts module is further employed to dynamically integrate different modality representations for prediction, which gives LIFTED the ability to automatically weigh different modalities and pay more attention to critical information. The experiments demonstrate that LIFTED significantly enhances performance in predicting clinical trial outcomes across all three phases compared to the best baseline, showcasing the effectiveness of our proposed key components.	翻訳日:2024-02-21 01:07:28 公開日:2024-02-18
# taser: 高速かつ高精度な動的グラフ表現学習のための時間適応サンプリング TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning ( http://arxiv.org/abs/2402.05396v2 ) ライセンス: Link先を確認	Gangda Deng, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li, Rajgopal Kannan, Viktor Prasanna	(参考訳) 近年,tgnn(temporal graph neural network)が,不正検出やコンテンツ推薦など,さまざまなハイインパクトアプリケーションにおいて最先端のパフォーマンスを示している。 TGNNの成功にもかかわらず、タイムデプリケートリンクや歪んだ相互作用分布のような現実の動的グラフに見られる一般的なノイズに傾向がある。ノイズはTGNNの精度を著しく損なう2つの重要な問題を引き起こす:(1)モデルが劣る相互作用によって制御され、(2)ノイズ入力は集約されたメッセージに高いばらつきをもたらす。しかし、現在のTGNN復調技術は各ノードの多様かつ動的ノイズパターンを考慮していない。さらに、より多くの隣人をトラバースすることで発生する、超過度のミニバッチ生成オーバーヘッドにも悩まされる。高速かつ正確なTGNNの治療法は、時間適応サンプリングにあると考えています。本研究では,TGNNの精度,効率,スケーラビリティに最適化された最初の適応サンプリング手法であるTASERを提案する。 TASERは、過去の相互作用の文脈的、構造的、時間的特性に基づいて、トレーニングダイナミクスと時間的隣人選択に基づいてミニバッチ選択を適用する。ミニバッチ生成のボトルネックを軽減するため、TASERは純粋なGPUベースの時間的隣のファインダと専用のGPU機能キャッシュを実装している。 2つの最先端のバックボーンTGNNを用いたTASERの性能評価を行った。 5つの一般的なデータセットにおいて、TASERは平均相反ランク(MRR)で平均2.3%のベースラインを上回り、トレーニング時間で平均5.1倍のスピードアップを達成する。 Recently, Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite the success of TGNNs, they are prone to the prevalent noise found in real-world dynamic graphs like time-deprecated links and skewed interaction distribution. The noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise pattern of each node. In addition, they also suffer from the excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. 
In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics and temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average of 5.1x speedup in training time.	翻訳日:2024-02-21 01:04:34 公開日:2024-02-18
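The TASER entry above reports its accuracy gain in Mean Reciprocal Rank (MRR). As a reading aid, here is a minimal Python sketch of the standard MRR definition. The abstract does not say how TASER ranks candidates or breaks ties, so the function names and the optimistic tie handling are assumptions made only for illustration.

```python
def rank_of_positive(pos_score, neg_scores):
    """1-based rank of the true item among sampled negatives.

    Assumption: ties are broken optimistically, so only negatives that
    score strictly higher than the positive push its rank down.
    """
    return 1 + sum(1 for s in neg_scores if s > pos_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


if __name__ == "__main__":
    # Three hypothetical link-prediction queries with made-up scores.
    ranks = [
        rank_of_positive(0.9, [0.1, 0.95, 0.3]),  # one negative scores higher
        rank_of_positive(0.8, [0.2, 0.1]),        # positive is ranked first
        rank_of_positive(0.4, [0.5, 0.6, 0.7]),   # three negatives score higher
    ]
    print(ranks, mean_reciprocal_rank(ranks))
```

MRR rewards placing the true item near the top of the ranking, so small absolute gains at already high values can still reflect a noticeable change in ranking quality.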
# デザートをめぐる決闘: 繰り返しケーキ分割の技術を極める Dueling Over Dessert, Mastering the Art of Repeated Cake Cutting ( http://arxiv.org/abs/2402.08547v2 ) ライセンス: Link先を確認	Simina Brânzei and MohammadTaghi Hajiaghayi and Reed Phillips and Suho Shin and Kun Wang	(参考訳) 我々は、アリスとボブという2人のプレイヤーが、ケーキに対する私的な評価を持ちながら公平分割を繰り返す設定を考える。各ラウンドで、前のラウンドと同一の新しいケーキが届く。アリスは自分の選んだ点でケーキを切り、ボブは左のピースか右のピースを選び、残りがアリスのものになる。我々は2つのバージョンを考える: ボブが左右を選ぶ前にアリスのカット点を観察する逐次版と、選択した後にのみカット点を観察する同時版である。同時版は Aumann and Maschler (1995) によって最初に検討された。ボブがほぼ近視眼的で、好みのピースを選ぶ頻度が高すぎる場合、アリスは二分探索に似た戦略によってボブを体系的に搾取できる。この戦略により、アリスはボブの選好を次第に高い精度で近似でき、時間とともに資源の不均衡な取り分を確保できる。我々は、一方のプレイヤーが他方をどこまで搾取できるかの限界を分析し、公平な効用プロファイルが実際に達成可能であることを示す。具体的には、各プレイヤーは、相手の効用を平均で約$1/2$に抑えつつ、自身も平均で少なくとも約$1/2$を得ることを保証することで、プレイのあらゆる軌跡において極限で公平な効用プロファイル$(1/2, 1/2)$を強制できる。この定理はブラックウェルの接近可能性との関連を用いて示される。最後に、各プレイヤーが相手の経験分布に対して最適反応する、仮想プレイ(fictitious play)として知られる自然なダイナミクスを分析する。仮想プレイは公平な効用プロファイル$(1/2, 1/2)$に$O(1/\sqrt{T})$の速度で収束することを示す。 We consider the setting of repeated fair division between two players, denoted Alice and Bob, with private valuations over a cake. In each round, a new cake arrives, which is identical to the ones in previous rounds. Alice cuts the cake at a point of her choice, while Bob chooses the left piece or the right piece, leaving the remainder for Alice. We consider two versions: sequential, where Bob observes Alice's cut point before choosing left/right, and simultaneous, where he only observes her cut point after making his choice. The simultaneous version was first considered by Aumann and Maschler (1995). We observe that if Bob is almost myopic and chooses his favorite piece too often, then he can be systematically exploited by Alice through a strategy akin to a binary search. This strategy allows Alice to approximate Bob's preferences with increasing precision, thereby securing a disproportionate share of the resource over time. 
We analyze the limits of how much a player can exploit the other one and show that fair utility profiles are in fact achievable. Specifically, the players can enforce the equitable utility profile of $(1/2, 1/2)$ in the limit on every trajectory of play, by keeping the other player's utility to approximately $1/2$ on average while guaranteeing they themselves get at least approximately $1/2$ on average. We show this theorem using a connection with Blackwell approachability. Finally, we analyze a natural dynamic known as fictitious play, where players best respond to the empirical distribution of the other player. We show that fictitious play converges to the equitable utility profile of $(1/2, 1/2)$ at a rate of $O(1/\sqrt{T})$.	翻訳日:2024-02-21 00:53:32 公開日:2024-02-18
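The cake-cutting entry above notes that a nearly myopic Bob can be exploited by a strategy "akin to a binary search". The following Python sketch shows that idea in an idealized sequential setting. It assumes Bob is fully myopic (he always takes the piece he values more) and that his valuation is given by a strictly increasing cumulative function on [0, 1]. The names `myopic_bob`, `estimate_bob_midpoint`, and `bob_cdf` are hypothetical, and the paper's actual strategy and guarantees may differ.

```python
def myopic_bob(cut, bob_cdf):
    """Return 'L' if Bob values [0, cut] at least as much as [cut, 1], else 'R'.

    bob_cdf(x) is Bob's value for the piece [0, x], normalised so that
    bob_cdf(0) == 0 and bob_cdf(1) == 1 (an assumption of this sketch).
    """
    return "L" if bob_cdf(cut) >= 0.5 else "R"


def estimate_bob_midpoint(bob_cdf, rounds):
    """Alice's binary search for the cut point where Bob is indifferent.

    Each round Alice cuts in the middle of her current interval and sees
    which side Bob takes, which halves her uncertainty about his midpoint.
    """
    lo, hi = 0.0, 1.0
    for _ in range(rounds):
        cut = (lo + hi) / 2
        if myopic_bob(cut, bob_cdf) == "L":
            hi = cut  # Bob's midpoint lies at or to the left of the cut
        else:
            lo = cut  # Bob's midpoint lies to the right of the cut
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical Bob who values the right part of the cake more.
    bob = lambda x: x ** 2
    print(estimate_bob_midpoint(bob, rounds=30))  # close to sqrt(0.5)
```

Once Alice knows Bob's indifference point, she can cut just beside it so that Bob's preferred piece is worth barely more than half to him while she keeps whatever surplus her own valuation assigns to the rest. This is the kind of exploitation the entry says a non-myopic Bob can prevent.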
# 一般値関数をもつ文脈多項ロジット帯域 Contextual Multinomial Logit Bandits with General Value Functions ( http://arxiv.org/abs/2402.08126v2 ) ライセンス: Link先を確認	Mengxiao Zhang, Haipeng Luo	(参考訳) MNL(Contextual multinomial logit)は、オンライン小売や広告など、現実のアソシエーションレコメンデーション問題の多くを捉えている。しかしながら、以前の研究は線形値関数のみを考慮(一般化)しており、適用可能性を大幅に制限している。この事実に動機づけられた本研究では、文脈的帯域幅の研究の最近の動向からアイデアを借り、基礎的真実を含む一般値関数クラスを持つ文脈的MNL帯域幅を考える。具体的には,確率的および対数的設定の両方を考慮し,それぞれ異なる計算-回帰トレードオフを持つアルゴリズム一式を提案する。線形の場合に適用した場合、この結果は指数関数的に大きい問題依存定数に依存しない最初のものであるだけでなく、計算効率、次元自由後悔境界、完全に対向する文脈や報酬を扱う能力などの他の利点も享受する。 Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits its applicability. Motivated by this fact, in this work, we consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent trend of studies on contextual bandits. Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with different computation-regret trade-off. When applied to the linear case, our results not only are the first ones with no dependence on a certain problem-dependent constant that can be exponentially large, but also enjoy other advantages such as computational efficiency, dimension-free regret bounds, or the ability to handle completely adversarial contexts and rewards.	翻訳日:2024-02-21 00:52:43 公開日:2024-02-18
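For readers unfamiliar with the multinomial logit (MNL) choice model behind the entry above, here is a small Python sketch of how choice probabilities are commonly defined for an offered assortment. The abstract does not spell out the model, so the normalisation (a no-purchase option with weight 1) and the function name are assumptions based on the usual MNL bandit setup.

```python
def mnl_choice_probabilities(values):
    """Choice probabilities under a standard MNL model.

    values: dict mapping each offered item to a positive value v_i.
    Returns a dict with one probability per item plus the key None for
    the no-purchase outcome, whose weight is fixed to 1 (an assumption).
    """
    if any(v <= 0 for v in values.values()):
        raise ValueError("item values must be positive")
    total = 1.0 + sum(values.values())
    probs = {item: v / total for item, v in values.items()}
    probs[None] = 1.0 / total
    return probs


if __name__ == "__main__":
    # Hypothetical assortment of two items with made-up values.
    print(mnl_choice_probabilities({"a": 1.0, "b": 2.0}))
```

In the contextual setting, each v_i would be produced by a value function applied to the current context and item. The entry's contribution is to allow that function to come from a general class instead of a (generalized) linear one.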
# 平均場定常分布からのサンプリング Sampling from the Mean-Field Stationary Distribution ( http://arxiv.org/abs/2402.07355v3 ) ライセンス: Link先を確認	Yunbum Kook, Matthew S. Zhang, Sinho Chewi, Murat A. Erdogdu, Mufan Bill Li	(参考訳) 本研究では,平均場SDEの定常分布からのサンプリングの複雑さ,すなわち相互作用項を含む確率測度空間上の汎関数の最小化の複雑さについて検討する。本研究の主な知見は,(1)カオスの時間一様伝播を用いた有限粒子系による平均場SDEの近似,(2)標準的な対数凹サンプラーによる有限粒子定常分布からのサンプリング,という2つの重要な側面を分離することである。我々のアプローチは概念的により単純であり、その柔軟性により、アルゴリズムと理論の両面で最先端の成果を取り入れることができる。これにより、平均場領域における特定の2層ニューラルネットワークの最適化に対する保証の改善など、多数の設定で保証が改善される。 We study the complexity of sampling from the stationary distribution of a mean-field SDE, or equivalently, the complexity of minimizing a functional over the space of probability measures which includes an interaction term. Our main insight is to decouple the two key aspects of this problem: (1) approximation of the mean-field SDE via a finite-particle system, via uniform-in-time propagation of chaos, and (2) sampling from the finite-particle stationary distribution, via standard log-concave samplers. Our approach is conceptually simpler and its flexibility allows for incorporating the state-of-the-art for both algorithms and theory. This leads to improved guarantees in numerous settings, including better guarantees for optimizing certain two-layer neural networks in the mean-field regime.	翻訳日:2024-02-21 00:51:50 公開日:2024-02-18
# 知的分子特性予測におけるドメイン知識とマルチモダリティの影響--体系的調査 The Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey ( http://arxiv.org/abs/2402.07249v2 ) ライセンス: Link先を確認	Taojie Kuang, Pengfei Liu, Zhixiang Ren	(参考訳) 分子特性の正確な予測は薬物開発、特に仮想スクリーニングや化合物最適化の進歩に不可欠である。近年導入された多くの深層学習手法は、分子特性予測(MPP)の強化、特に精度の向上と分子構造に対する洞察の向上に顕著な可能性を示している。しかし、2つの重要な疑問が生じる: ドメイン知識の統合は分子特性予測の精度を高めるのか、そしてマルチモーダルデータ融合は単一データソースの手法よりも正確な結果をもたらすのか? これらを探るため、本研究では様々なベンチマークに基づいて近年の深層学習法を総合的に検討し,定量的に分析する。分子情報の統合はMPPの回帰と分類のタスクをそれぞれ最大3.98%と1.72%改善することを発見した。また,3次元情報を1次元情報・2次元情報と同時に利用することにより,MPPを最大4.2%向上できることがわかった。これら2つの知見は、薬物発見の将来の進歩に重要な指針を提供する。 The precise prediction of molecular properties is essential for advancements in drug development, particularly in virtual screening and compound optimization. The recent introduction of numerous deep learning-based methods has shown remarkable potential in enhancing molecular property prediction (MPP), especially improving accuracy and insights into molecular structures. Yet, two critical questions arise: does the integration of domain knowledge augment the accuracy of molecular property prediction and does employing multi-modal data fusion yield more precise results than unique data source methods? To explore these matters, we comprehensively review and quantitatively analyze recent deep learning methods based on various benchmarks. We discover that integrating molecular information will improve both MPP regression and classification tasks by up to 3.98% and 1.72%, respectively. We also discover that utilizing 3-dimensional information with 1-dimensional and 2-dimensional information simultaneously can substantially enhance MPP by up to 4.2%. The two consolidated insights offer crucial guidance for future advancements in drug discovery.	翻訳日:2024-02-21 00:51:35 公開日:2024-02-18
# 金融におけるネットワークレジリエンス向上のためのディープラーニング Utilizing Deep Learning for Enhancing Network Resilience in Finance ( http://arxiv.org/abs/2402.09820v2 ) ライセンス: Link先を確認	Yulu Gong, Mengran Zhu, Shuning Huo, Yafei Xiang, Hanyi Yu	(参考訳) インターネットの時代において、人々の生活はますます今日のネットワーク技術に依存している。ネットワークの完全性を維持し、ユーザーの正当な利益を守ることは、ネットワーク構築の核心である。脅威検出は、完全かつ効果的な防衛システムの重要な部分である。未知の脅威を効果的に検出する方法は、ネットワーク保護の懸念のひとつだ。現在、ネットワーク脅威検出は、通常、人工的なルールを作成したり、大規模なデータアプリケーションに適用できない時空間的特徴を抽出したりするルールや従来の機械学習手法に基づいており、未知のリスクの発生によって元のモデルの検出精度が低下する。このことを念頭に置いて,金融業界の保護対策を改善するために,高度な脅威検出にディープラーニングを用いる。多くのネットワーク研究者は例外ベースの侵入検知技術に焦点を移した。検出技術は主に、通常のプログラムとネットワークの振る舞いデータを収集し、多次元の特徴を抽出し、このベースで決定機械学習モデルを訓練する統計機械学習手法を使用する(一般的には、ベイズ、決定木、サポートベクターマシン、ランダムフォレストなどを含む)。 In the age of the Internet, people's lives are increasingly dependent on today's network technology. Maintaining network integrity and protecting the legitimate interests of users is at the heart of network construction. Threat detection is an important part of a complete and effective defense system. How to effectively detect unknown threats is one of the concerns of network protection. Currently, network threat detection is usually based on rules and traditional machine learning methods, which create artificial rules or extract common spatiotemporal features, which cannot be applied to large-scale data applications, and the emergence of unknown risks causes the detection accuracy of the original model to decline. With this in mind, this paper uses deep learning for advanced threat detection to improve protective measures in the financial industry. Many network researchers have shifted their focus to exception-based intrusion detection techniques. The detection technology mainly uses statistical machine learning methods - collecting normal program and network behavior data, extracting multidimensional features, and training decision machine learning models on this basis (commonly used include naive Bayes, decision trees, support vector machines, random forests, etc.).	
翻訳日:2024-02-21 00:26:16 公開日:2024-02-18
# コントラスト学習とセルフアテンションを用いた時間軸の逐次推薦 Sequential Recommendation on Temporal Proximities with Contrastive Learning and Self-Attention ( http://arxiv.org/abs/2402.09784v2 ) ライセンス: Link先を確認	Hansol Jung, Hyunwoo Seo and Chiehyeon Lim	(参考訳) 逐次リコメンデータシステムは、過去のインタラクションからユーザの好みを識別し、後続の項目を最適に予測する。従来のディープラーニングモデルと最新のトランスフォーマーモデルでは、ユーザとテーマのインタラクションにおける一方向および双方向のパターンが捉えられているが、個人の行動パターンや社会的傾向パターンといった時間的文脈の重要性は未検討のままである。特に最近のモデルは、類似した時間枠の間、ユーザ間で暗黙的に発生するユーザのアクションの類似性を無視することが多い。これらのモデルは主に変換器の自己認識機構を適用し、個々のユーザアクションの時間的コンテキストを考慮する。一方、この適応は、アイテム間の相互作用における水平時間的近接性、例えば1週間以内のアイテム購入と1ヶ月以内のアイテム購入の区別を考慮しても依然として限定的である。これらのギャップに対処するため,ユーザ間相互作用の時間的近さを考慮し,コントラスト学習と自己注意法を含む,TemProxRecというシーケンシャルレコメンデーションモデルを提案する。提案するコントラスト学習法は,ユーザ間の密接な時間間隔で選択された項目の表現を学習する。同時に,提案手法は,絶対埋め込みと相対埋め込みの両方を用いて,ユーザシーケンス内の時間的および位置的コンテキストを符号化する。このようにして、私たちのTemProxRecは、特定の時間枠内のユーザとイテムのインタラクションに基づいて、関連するアイテムを正確に予測します。 temproxrecに関する包括的実験によって検証し、ベンチマークデータセットで既存のモデルと一貫して比較し、垂直および水平の時間軸を逐次レコメンデーションとして考慮することの重要性を示す。 Sequential recommender systems identify user preferences from their past interactions to predict subsequent items optimally. Although traditional deep-learning-based models and modern transformer-based models in previous studies capture unidirectional and bidirectional patterns within user-item interactions, the importance of temporal contexts, such as individual behavioral and societal trend patterns, remains underexplored. Notably, recent models often neglect similarities in users' actions that occur implicitly among users during analogous timeframes-a concept we term vertical temporal proximity. These models primarily adapt the self-attention mechanisms of the transformer to consider the temporal context in individual user actions. Meanwhile, this adaptation still remains limited in considering the horizontal temporal proximity within item interactions, like distinguishing between subsequent item purchases within a week versus a month. 
To address these gaps, we propose a sequential recommendation model called TemProxRec, which includes contrastive learning and self-attention methods to consider temporal proximities both across and within user-item interactions. The proposed contrastive learning method learns representations of items selected in close temporal periods across different users to be close. Simultaneously, the proposed self-attention mechanism encodes temporal and positional contexts in a user sequence using both absolute and relative embeddings. This way, our TemProxRec accurately predicts the relevant items based on the user-item interactions within a specific timeframe. We validate this work through comprehensive experiments on TemProxRec, consistently outperforming existing models on benchmark datasets as well as showing the significance of considering the vertical and horizontal temporal proximities into sequential recommendation.	翻訳日:2024-02-21 00:25:57 公開日:2024-02-18
# 高アフィン変換に適応した領域特徴記述子 Region Feature Descriptor Adapted to High Affine Transformations ( http://arxiv.org/abs/2402.09724v2 ) ライセンス: Link先を確認	Shaojie Zhang, Yinghui Wang, Bin Nan, Jinlong Yang, Tao Yan, Liangyi Huang, and Mingfeng Wang	(参考訳) 画像が高アフィン変換を行う場合のグレースケール特徴情報の表現に効果のない特徴ディスクリプタの問題に対処するため,分類を用いてアフィン変換をシミュレートした領域特徴ディスクリプタを提案する。提案手法は当初,異なるアフィン次数を持つ画像を分類し,アフィン変換をシミュレートし,新たな画像群を生成する。その後、この新しい画像集合上の特徴点の近傍情報を算出する。最後に、特徴点が属する最大安定極端領域のグレースケールヒストグラムと特徴点領域のグレイスケールセントロイドに対する正規化位置とを組み合わせて記述子を生成する。アフィン変換のシナリオで特徴マッチングメトリクスを比較した実験の結果,提案する記述器は従来の記述器と比較して高い精度と頑健性を示すことがわかった。さらに、他のディスクリプタと統合すると堅牢性を示す。 To address the issue of feature descriptors being ineffective in representing grayscale feature information when images undergo high affine transformations, leading to a rapid decline in feature matching accuracy, this paper proposes a region feature descriptor based on simulating affine transformations using classification. The proposed method initially categorizes images with different affine degrees to simulate affine transformations and generate a new set of images. Subsequently, it calculates neighborhood information for feature points on this new image set. Finally, the descriptor is generated by combining the grayscale histogram of the maximum stable extremal region to which the feature point belongs and the normalized position relative to the grayscale centroid of the feature point's region. Experimental results, comparing feature matching metrics under affine transformation scenarios, demonstrate that the proposed descriptor exhibits higher precision and robustness compared to existing classical descriptors. Additionally, it shows robustness when integrated with other descriptors.	翻訳日:2024-02-21 00:25:23 公開日:2024-02-18
# モデル編集のバタフライ効果:わずかな編集が大規模言語モデルの崩壊を引き起こしうる The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse ( http://arxiv.org/abs/2402.09656v2 ) ライセンス: Link先を確認	Wanli Yang, Fei Sun, Xinyu Ma, Xun Liu, Dawei Yin, Xueqi Cheng	(参考訳) モデル編集は、Large Language Models (LLM) における知識の改訂に有望であるが、LLMの本質的な能力への影響はしばしば見過ごされている。本研究では、一つの編集でもモデル崩壊を引き起こし、様々なベンチマークタスクで大幅な性能低下として現れるという重要な現象を明らかにする。しかし、このような崩壊を防ぐために必要ではあるものの、編集のたびにLLMをベンチマークすることは、時間と資源の面で非現実的である。そこで本研究では,代理メトリックとしてパープレキシティを用いることを提案し,ダウンストリームタスクの性能と強く相関することを広範な実験により検証する。さらに,従来の単一編集の研究で得られたハードケースに焦点をあて,実世界のシナリオに即した実践的設定であるシーケンシャル編集について,様々な編集手法やLLMを対象に詳細な研究を行う。その結果, 検証したほぼすべての編集手法が, ほんの数回の編集でモデル崩壊をもたらすことがわかった。さらなる研究を容易にするため,我々はGPT-3.5を用いて,これらのハードケースに基づいた新しいデータセットであるHardEditを開発した。このデータセットは、信頼性のあるモデル編集と、編集によるモデル崩壊のメカニズムに関する先駆的研究の基盤を確立することを目的としている。この研究が,モデル編集の実践に内在する潜在的なリスクに対して,コミュニティの注意を引き付けることを願っている。 Although model editing has shown promise in revising knowledge in Large Language Models (LLMs), its impact on the inherent capabilities of LLMs is often overlooked. In this work, we reveal a critical phenomenon: even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks. However, benchmarking LLMs after each edit, while necessary to prevent such collapses, is impractically time-consuming and resource-intensive. To mitigate this, we propose using perplexity as a surrogate metric, validated by extensive experiments demonstrating its strong correlation with downstream task performance. We further conduct an in-depth study on sequential editing, a practical setting for real-world scenarios, across various editing methods and LLMs, focusing on hard cases from our previous single edit studies. The results indicate that nearly all examined editing methods result in model collapse after only a few edits. To facilitate further research, we have utilized GPT-3.5 to develop a new dataset, HardEdit, based on those hard cases. 
This dataset aims to establish the foundation for pioneering research in reliable model editing and the mechanisms underlying editing-induced model collapse. We hope this work can draw the community's attention to the potential risks inherent in model editing practices.	翻訳日:2024-02-21 00:25:07 公開日:2024-02-18
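The model-editing entry above proposes perplexity as a cheap surrogate for full benchmarking after each edit. As a rough illustration, the Python sketch below computes perplexity from per-token log-probabilities using the standard definition. How the authors obtain those log-probabilities, which reference text they use, and what level counts as "collapse" are not stated in the abstract, so the sample numbers are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


if __name__ == "__main__":
    # Hypothetical per-token log-probabilities for the same text, scored by
    # a model before and after an edit (values invented for illustration).
    before_edit = [math.log(0.25)] * 5   # fairly confident predictions
    after_edit = [math.log(0.001)] * 5   # near-random predictions
    print(perplexity(before_edit), perplexity(after_edit))
```

A sharp rise in perplexity on ordinary text after an edit is the kind of signal the entry suggests monitoring. It needs only one forward pass over a small reference corpus, which is far cheaper than rerunning downstream benchmarks.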
# EventRL: 大規模言語モデルのアウトカムスーパービジョンによるイベント抽出の強化 EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models ( http://arxiv.org/abs/2402.11430v1 ) ライセンス: Link先を確認	Jun Gao, Huan Zhao, Wei Wang, Changlong Yu, Ruifeng Xu	(参考訳) 本研究では,大規模言語モデル(LLM)におけるイベント抽出の強化を目的とした強化学習手法であるEventRLを提案する。 EventRLは、結果の監視と特定の報酬関数を利用して、イベント構造のミスマッチや未定義のイベントタイプの生成として表される命令の追従や幻覚といった、LLMの一般的な課題に取り組む。我々は,Few-Shot Prompting (FSP) や Supervised Fine-Tuning (SFT) といった既存手法に対して,GPT-4, LLaMa, CodeLLaMa モデルを含む様々な LLM に対して EventRL を評価する。以上の結果から,EventRLはイベントの識別や構造化,特に新しいイベントタイプへの対応において,従来の手法よりも優れていた。この研究は、報酬関数の選択の重要な役割を強調し、より良いイベント抽出のためにコードデータを統合する利点を示す。モデルサイズの増加は高い精度をもたらすが、オーバーフィットを避けるには一般化する能力を維持することが不可欠である。 In this study, we present EventRL, a reinforcement learning approach developed to enhance event extraction for large language models (LLMs). EventRL utilizes outcome supervision with specific reward functions to tackle prevalent challenges in LLMs, such as instruction following and hallucination, manifested as the mismatch of event structure and the generation of undefined event types. We evaluate EventRL against existing methods like Few-Shot Prompting (FSP) (based on GPT4) and Supervised Fine-Tuning (SFT) across various LLMs, including GPT-4, LLaMa, and CodeLLaMa models. Our findings show that EventRL significantly outperforms these conventional approaches by improving the performance in identifying and structuring events, particularly in handling novel event types. The study emphasizes the critical role of reward function selection and demonstrates the benefits of incorporating code data for better event extraction. While increasing model size leads to higher accuracy, maintaining the ability to generalize is essential to avoid overfitting.	翻訳日:2024-02-20 21:37:29 公開日:2024-02-18
# 選好微調整による視覚大言語モデルにおけるモダリティの調整 Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ( http://arxiv.org/abs/2402.11411v1 ) ライセンス: Link先を確認	Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao	(参考訳) VLLM(Instruction-following Vision Large Language Models)は近年,様々なタスクにおいて大きな進歩を遂げている。これらのアプローチは、強力な事前学習済み視覚モデルと大規模言語モデル(LLM)を統合する。これらのコンポーネントは別々にトレーニングされているため、学習された表現は、追加の画像と言語のペアを用いた共同学習によって整合させる必要がある。この手順は完全ではなく、コアLLMが極めて事実に忠実で、ビジョンバックボーンが十分に完全な表現を持っている場合でも、モデルが幻覚を起こし、画像を正確に反映しない回答を返すことがある。本研究では,幻覚問題をアライメント問題として位置づけ,選好チューニングによって対処する。具体的には,AIモデルを用いてフィードバックデータを生成するPOVIDを提案する。正解の指示を好ましい応答として用い,好ましくないデータの生成には2段階のアプローチを用いる。まず,GPT-4Vに対して,正解にもっともらしい幻覚を注入するよう促す。第2に、VLLMに固有の幻覚行動を引き起こすために、画像を歪ませる。これは自動化されたアプローチであり、人間によるデータ生成に依存せず、完璧な専門家も必要としないため、容易にスケールできる。最後に、これら2つの生成戦略は、Direct Preference Optimizationを通じてRLHFパイプラインに統合される。幅広いベンチマークを用いた実験では、幻覚を減らすだけでなく、標準ベンチマークでのモデル性能も向上し、従来の手法を上回ることを示した。データとコードは https://github.com/YiyangZhou/POVID で公開されている。 Instruction-following Vision Large Language Models (VLLMs) have achieved significant progress recently on a variety of tasks. These approaches merge strong pre-trained vision models and large language models (LLMs). Since these components are trained separately, the learned representations need to be aligned with joint training on additional image-language pairs. This procedure is not perfect and can cause the model to hallucinate - provide answers that do not accurately reflect the image, even when the core LLM is highly factual and the vision backbone has sufficiently complete representations. In this work, we frame the hallucination problem as an alignment issue and tackle it with preference tuning. Specifically, we propose POVID to generate feedback data with AI models. We use ground-truth instructions as the preferred response and a two-stage approach to generate dispreferred data. First, we prompt GPT-4V to inject plausible hallucinations into the correct answer. 
Second, we distort the image to trigger the inherent hallucination behavior of the VLLM. This is an automated approach, which does not rely on human data generation or require a perfect expert, which makes it easily scalable. Finally, both of these generation strategies are integrated into an RLHF pipeline via Direct Preference Optimization. In experiments across broad benchmarks, we show that we can not only reduce hallucinations, but improve model performance across standard benchmarks, outperforming prior approaches. Our data and code are available at https://github.com/YiyangZhou/POVID.	翻訳日:2024-02-20 21:37:09 公開日:2024-02-18
# キャリブレーションまでの距離$2\sqrt{T}$を得る初等的な予測器 An Elementary Predictor Obtaining $2\sqrt{T}$ Distance to Calibration ( http://arxiv.org/abs/2402.11410v1 ) ライセンス: Link先を確認	Eshwar Ram Arunachaleswaran, Natalie Collina, Aaron Roth, Mirah Shi	(参考訳) Blasiokら[2023]は、期待校正誤差(ECE)とは異なり連続である校正誤差の自然な尺度として、校正までの距離を提案した。最近、QiaoとZheng [2024]は、敵対的設定において$O(\sqrt{T})$の校正までの距離を達成するオンライン予測器の存在を示す非構成的な議論を与えた。これはECEでは不可能であることが知られている。彼らは、明示的で効率的なアルゴリズムを見つけることを未解決問題として残している。我々はこの問題を解決し、校正までの距離を高々$2\sqrt{T}$に抑える、極めて単純で効率的な決定論的アルゴリズムを与える。 Blasiok et al. [2023] proposed distance to calibration as a natural measure of calibration error that, unlike expected calibration error (ECE), is continuous. Recently, Qiao and Zheng [2024] gave a non-constructive argument establishing the existence of an online predictor that can obtain $O(\sqrt{T})$ distance to calibration in the adversarial setting, which is known to be impossible for ECE. They leave as an open problem finding an explicit, efficient algorithm. We resolve this problem and give an extremely simple, efficient, deterministic algorithm that obtains distance to calibration error at most $2\sqrt{T}$.	翻訳日:2024-02-20 21:36:46 公開日:2024-02-18
# 共感的対話応答の多次元評価 Multi-dimensional Evaluation of Empathetic Dialog Responses ( http://arxiv.org/abs/2402.11409v1 ) ライセンス: Link先を確認	Zhichao Xu, Jiepu Jiang	(参考訳) 共感は効果的な会話コミュニケーションの重要な要素であるが、会話の共感を測定する以前の研究は、主に表現されたコミュニケーションの意図に焦点を当てている。対照的に,話者の視点から表現された意図と聞き手の視点から知覚された共感の両方を測定するために,既存の作業を拡張する多次元共感評価フレームワークを提案する。提案手法を適用して顧客・サービス対話の分析を行ったところ,2次元(表現意図型と知覚共感)は相互に関連しており,共感感は対話セッションの満足度と高い相関関係にあることがわかった。このフレームワークでは、トレーニングされたアノテータからの主観的な評価が必要である。そこで我々は,(1)凍結した大言語モデル(LLM)と(2)学習言語モデルに基づく分類器を用いて,対話的共感を自動的に計測する様々なモデリングオプションについて検討した。 GPT-4およびFlanファミリーモデルの性能の低下を反映して、内部および外部の対話データセットの広範な実験により、会話の共感を測定することは、凍結LDMの促進に依然として困難な課題であることが示された。一方,sequence-to-sequence (seq2seq) 言語モデルに基づく提案手法は,先行研究や競合ベースラインと比較して最高の性能を実現することができる。最後に,提案する命令精細分類器の性能に関する包括的アブレーション研究を行い,自動会話共感評価指標として採用する可能性について推奨する。 Empathy is a critical element of effective and satisfactory conversational communication, yet previous studies in measuring conversational empathy mostly focus on expressed communicative intents -- in which way empathy is expressed, ignoring the fact that conversation is also a collaborative practice involving both speakers and listeners. In contrast, we propose a multi-dimensional empathy evaluation framework that extends upon existing work to measure both expressed intents from the speaker's perspective and perceived empathy from the listener's perspective. Applying the proposed framework to analyzing our internal customer-service dialogue shows that the two dimensions (expressed intent types and perceived empathy) are inter-connected, while perceived empathy has high correlation with the satisfactory level of dialogue sessions. This proposed framework still requires subjective assessments from trained annotators, which can be non-trivial to collect. 
To scale up evaluation without excessive reliance on carefully annotated data, we explore different modeling options to automatically measure conversational empathy with (1) prompting frozen large language models (LLMs) and (2) training language model-based classifiers. Extensive experiments on both internal and external dialogue datasets show that measuring conversational empathy remains a challenging task for prompting frozen LLMs, reflected by less satisfying performance of GPT-4 and Flan family models. On the other hand, our proposed instruction-finetuned classifiers based on sequence-to-sequence (Seq2Seq) language models is able to achieve the best performance compared to prior works and competitive baselines. Finally, we perform comprehensive ablation studies on the performance of proposed instruction-finetuned classifiers and give recommendations on potentially adopting them as automatic conversational empathy evaluation metrics.	翻訳日:2024-02-20 21:36:27 公開日:2024-02-18
# 極端に走るな - 暗黙のヘイトスピーチ検出におけるLLMの過度の感度とキャリブレーションの限界を明らかにする Don't Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection ( http://arxiv.org/abs/2402.11406v1 ) ライセンス: Link先を確認	Min Zhang, Jianfeng He, Taoran Ji, Chang-Tien Lu	(参考訳) 大規模言語モデル(LLM)の公平性と信頼性はますます注目されている。憎悪の意図を伝えるために間接的な言語を用いる暗黙のヘイトスピーチは、実際のヘイトスピーチのかなりの部分を占める。しかし、LLMがこの問題にどの程度効果的に対処できるかは、まだ十分に検証されていない。本稿では,LLMが暗黙のヘイトスピーチを検出する能力(分類タスク)と,その応答に対する確信度を表現する能力(キャリブレーションタスク)を掘り下げる。本評価では,様々なプロンプトパターンと主流の不確実性推定手法を綿密に検討する。その結果、LLMは2つの極端を示すことがわかった。 (1) LLMは, 公平性の問題を引き起こしうるグループやトピックに対して過度な感度を示し, 無害な発言をヘイトスピーチとして誤分類する。 (2) 各手法におけるLLMの信頼度スコアは固定範囲に過度に集中し、データセットの複雑さにかかわらず変化しない。その結果、キャリブレーション性能は一次分類精度に大きく依存する。これらの発見はLLMの新たな限界を明らかにし、モデルが極端に偏らないよう最適化する際の注意の必要性を強調している。これは、モデルの公平性を追求する際に感度と信頼度を慎重に考慮すべきであることを思い出させるものである。 The fairness and trustworthiness of Large Language Models (LLMs) are receiving increasing attention. Implicit hate speech, which employs indirect language to convey hateful intentions, occupies a significant portion of practice. However, the extent to which LLMs effectively address this issue remains insufficiently examined. This paper delves into the capability of LLMs to detect implicit hate speech (Classification Task) and express confidence in their responses (Calibration Task). Our evaluation meticulously considers various prompt patterns and mainstream uncertainty estimation methods. Our findings highlight that LLMs exhibit two extremes: (1) LLMs display excessive sensitivity towards groups or topics that may cause fairness issues, resulting in misclassifying benign statements as hate speech. (2) LLMs' confidence scores for each method excessively concentrate on a fixed range, remaining unchanged regardless of the dataset's complexity. Consequently, the calibration performance is heavily reliant on primary classification accuracy. These discoveries unveil new limitations of LLMs, underscoring the need for caution when optimizing models to ensure they do not veer towards extremes. 
This serves as a reminder to carefully consider sensitivity and confidence in the pursuit of model fairness.	翻訳日:2024-02-20 21:35:57 公開日:2024-02-18
# AutoPRM: 制御可能な質問分解による多段階推論のための手続き的監督の自動化 AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition ( http://arxiv.org/abs/2402.11452v1 ) ライセンス: Link先を確認	Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj and Huaxiu Yao	(参考訳) 大規模言語モデル(LLM)の最近の進歩は、多段階推論タスクにおいて有望であるが、手続き的フィードバックを提供するための広範な手動ラベリングに依存していることは、依然として大きな障害である。本稿では,複雑な推論課題に対して,LLMの微調整をより効率的に行うための,自己教師付きフレームワークAutoPRMを提案する。具体的には、AutoPRMは、まず複雑な問題を制御可能な粒度スイッチでより管理可能なサブクエストに分解し、その後順次強化学習を適用してサブクエストソルバを反復的に改善する。さらに,報酬の改ざんを回避するための文脈誘導復号法を提案し,従属問題の解法を導出する。大規模な実験により、AutoPRMはSOTA上の数学的および常識推論タスクの性能を著しく向上することが示された。さらに奨励的に、AutoPRMは他の直交推論パイプラインと簡単に統合できる。 Recent advancements in large language models (LLMs) have shown promise in multi-step reasoning tasks, yet their reliance on extensive manual labeling to provide procedural feedback remains a significant impediment. To address this challenge, in this paper, we propose a novel self-supervised framework AutoPRM that efficiently enhances the fine-tuning of LLMs for intricate reasoning challenges. Specifically, AutoPRM first decomposes complex problems into more manageable subquestions with a controllable granularity switch, then sequentially applies reinforcement learning to iteratively improve the subquestion solver. Additionally, we propose context-guided-decoding to avoid reward tampering and guide the subquestion solver towards the solution of the holistic problem. Extensive experiments show that AutoPRM significantly improves performance on mathematical and commonsense reasoning tasks over SOTA. More encouragingly, AutoPRM can be easily integrated with other orthogonal reasoning pipelines.	翻訳日:2024-02-20 21:26:47 公開日:2024-02-18
# Momentor: 微粒な時間推論によるビデオ大言語モデルの改善 Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning ( http://arxiv.org/abs/2402.11435v1 ) ライセンス: Link先を確認	Long Qian, Juncheng Li, Yu Wu, Yaobo Ye, Hao Fei, Tat-Seng Chua, Yueting Zhuang, Siliang Tang	(参考訳) 大規模言語モデル(LLM)は、テキストベースのタスクの理解と処理において顕著な熟練度を示す。これらの属性をビデオLLMと呼ばれるビデオモダリティに転送するために、多くの努力がなされている。しかし、既存のVideo-LLMは粗いセマンティクスのみをキャプチャすることができ、特定のビデオセグメントの理解やローカライゼーションに関連するタスクを効果的に処理できない。これらの課題を踏まえ、細かな時間的理解タスクを実現できるビデオLLMであるMomentorを提案する。 Momentorのトレーニングを支援するために,セグメントレベルの命令データを持つ大規模ビデオ命令データセットであるMoment-10Mを構築するための自動データ生成エンジンを設計する。 moment-10mでmomentorをトレーニングし,セグメントレベルの推論とローカライズを可能にした。いくつかのタスクにおけるゼロショット評価は、モーメントアが微粒な時間的基底の理解と局所化において優れていることを示す。 Large Language Models (LLMs) demonstrate remarkable proficiency in comprehending and handling text-based tasks. Many efforts are being made to transfer these attributes to video modality, which are termed Video-LLMs. However, existing Video-LLMs can only capture the coarse-grained semantics and are unable to effectively handle tasks related to comprehension or localization of specific video segments. In light of these challenges, we propose Momentor, a Video-LLM capable of accomplishing fine-grained temporal understanding tasks. To support the training of Momentor, we design an automatic data generation engine to construct Moment-10M, a large-scale video instruction dataset with segment-level instruction data. We train Momentor on Moment-10M, enabling it to perform segment-level reasoning and localization. Zero-shot evaluations on several tasks demonstrate that Momentor excels in fine-grained temporally grounded comprehension and localization.	翻訳日:2024-02-20 21:26:30 公開日:2024-02-18
# IoTアプリケーションのための機械学習技術による屋内ローカライゼーションの改善 Improved Indoor Localization with Machine Learning Techniques for IoT applications ( http://arxiv.org/abs/2402.11433v1 ) ライセンス: Link先を確認	M.W.P. Maduranga	(参考訳) IoT(Internet of Things)とモバイルインターネットアプリケーションの台頭は、商業、軍事、社会アプリケーションのための位置情報サービス(LBS)への関心を喚起している。世界測位システム(GPS)が屋外の局地化を支配する一方で、その効果は信号の問題により屋内で弱まる。屋内ローカライゼーションシステムは、Wi-Fi、ZigBee、Bluetooth、UWBなどの無線技術を活用し、コンテキストに基づいて選択する。受信信号強度インジケータ(RSSI)技術は、その精度と単純さで広く採用されている。本研究は,rssiに基づく屋内定位のための教師付き回帰器,教師付き分類器,アンサンブル方式の3段階の機械学習アルゴリズムを用いる。さらに、重み付き最小二乗法と擬似線形解法を導入し、非線形rssi測定方程式に線形方程式を近似することで対処する。さまざまな無線技術とアンカーノードを利用する実験的なテストベッドは、IoTクラウドアーキテクチャを使用したデータ収集用に設計されている。事前処理には、アルゴリズムトレーニング前のデータ精錬のためのフィルタの調査が含まれる。この研究は、線形回帰、多項式回帰、支持ベクトル回帰、ランダム森林回帰、および様々な無線技術における決定木回帰といった機械学習モデルを用いている。これらのモデルは移動対象ノードの地理的座標を推定し、その性能を精度、根平均二乗誤差、精度、リコール、感度、行列式係数、f1-scoreなどの指標を用いて評価する。実験の結果は、屋内環境におけるローカライズ精度とロバスト性の観点から、異なる教師付き機械学習技術の有効性に関する洞察を与える。 The rise of the Internet of Things (IoT) and mobile internet applications has spurred interest in location-based services (LBS) for commercial, military, and social applications. While the global positioning system (GPS) dominates outdoor localization, its efficacy wanes indoors due to signal challenges. Indoor localization systems leverage wireless technologies like Wi-Fi, ZigBee, Bluetooth, UWB, selecting based on context. Received signal strength indicator (RSSI) technology, known for its accuracy and simplicity, is widely adopted. This study employs machine learning algorithms in three phases: supervised regressors, supervised classifiers, and ensemble methods for RSSI-based indoor localization. Additionally, it introduces a weighted least squares technique and pseudo-linear solution approach to address non-linear RSSI measurement equations by approximating them with linear equations. An experimental testbed, utilizing diverse wireless technologies and anchor nodes, is designed for data collection, employing IoT cloud architectures. 
Pre-processing involves investigating filters for data refinement before algorithm training. The study employs machine learning models like linear regression, polynomial regression, support vector regression, random forest regression, and decision tree regressor across various wireless technologies. These models estimate the geographical coordinates of a moving target node, and their performance is evaluated using metrics such as accuracy, root mean square errors, precision, recall, sensitivity, coefficient of determinant, and the f1-score. The experiment's outcomes provide insights into the effectiveness of different supervised machine learning techniques in terms of localization accuracy and robustness in indoor environments.	翻訳日:2024-02-20 21:26:14 公開日:2024-02-18
# 偽造検出はより深くできるか? 認識推論のためのデータセット, 評価, ベンチマーク Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning ( http://arxiv.org/abs/2402.11432v1 ) ライセンス: Link先を確認	Kang Chen, Zheng Lian, Haiyang Sun, Bin Liu, Jianhua Tao	(参考訳) 偽造検出は、多くの実践シナリオにおいてその重要性から注目を集めている。現在、データ不足はこの分野の発展に悪影響を及ぼす。一方、虚偽のシナリオをシミュレートするために参加者を雇うのはコストがかかる。一方,インターネット上での偽装行動を含む動画の収集は困難である。本稿では,データ不足に対処するため,新しいデータ収集パイプラインを提案する。具体的には、GPT-4を用いて被疑者と警察官のロールプレイをシミュレートする。尋問中、容疑者は犯罪の責任を逃れるために警察官に嘘をつき、警察官は真実を知り、証拠を収集する。以前のデータセットと比較して、この戦略はデータ収集コストを削減し、データセットのサイズを増加させる有望な方法を提供する。一方,従来の偽装検出タスクを偽装推論に拡張し,さらに偽装部品のエビデンスを提供する。このデータセットは、現在の大規模言語モデルの複雑な推論能力を評価するためにも使用でき、さらなる研究のための推論ベンチマークとして役立ちます。 Deception detection has attracted increasing attention due to its importance in many practical scenarios. Currently, data scarcity harms the development of this field. On the one hand, it is costly to hire participants to simulate deception scenarios. On the other hand, it is difficult to collect videos containing deceptive behaviors on the Internet. To address data scarcity, this paper proposes a new data collection pipeline. Specifically, we use GPT-4 to simulate a role-play between a suspect and a police officer. During interrogation, the suspect lies to the police officer to evade responsibility for the crime, while the police officer uncovers the truth and gathers evidence. Compared with previous datasets, this strategy reduces data collection costs, providing a promising way to increase the dataset size. Meanwhile, we extend the traditional deception detection task to deception reasoning, further providing evidence for deceptive parts. This dataset can also be used to evaluate the complex reasoning capability of current large language models and serve as a reasoning benchmark for further research.	翻訳日:2024-02-20 21:25:45 公開日:2024-02-18
# 3次元再構成のためのロバストなエラー耐性ビュー選択法 A Robust Error-Resistant View Selection Method for 3D Reconstruction ( http://arxiv.org/abs/2402.11431v1 ) ライセンス: Link先を確認	Shaojie Zhang, Yinghui Wang, Bin Nan, Jinlong Yang, Tao Yan, Liangyi Huang, and Mingfeng Wang	(参考訳) 本研究では,SFM(Structure from Motion)ビュー選択におけるカメラベースラインの小さいビューの選択による三角測量の不確実性の増加に対処するため,ロバストなエラー耐性ビュー選択法を提案する。この手法は三角法に基づく計算を用いて誤り耐性モデルを求め、エラー耐性行列を構築するのに使用される。エラー耐性行列の各行のソート結果は、各ビューの候補ビューセットを決定する。全ビューの候補ビューセットをトラバースし、エラー耐性行列に基づいて欠落ビューを完遂することにより、3D再構成の整合性を確保する。本手法とCOLMAPプログラムにおいて最も精度の高い排他的手法との実験的比較を行い, 復元結果における平均再投影誤差と絶対軌道誤差について検討した。提案手法は,TUMデータセットとDTUデータセットにおいて,再投影誤差を平均29.40%,絶対軌道誤差を平均5.07%低減した。 To address the issue of increased triangulation uncertainty caused by selecting views with small camera baselines in Structure from Motion (SFM) view selection, this paper proposes a robust error-resistant view selection method. The method utilizes a triangulation-based computation to obtain an error-resistant model, which is then used to construct an error-resistant matrix. The sorting results of each row in the error-resistant matrix determine the candidate view set for each view. By traversing the candidate view sets of all views and completing the missing views based on the error-resistant matrix, the integrity of 3D reconstruction is ensured. Experimental comparisons between this method and the exhaustive method with the highest accuracy in the COLMAP program are conducted in terms of average reprojection error and absolute trajectory error in the reconstruction results. The proposed method demonstrates an average reduction of 29.40% in reprojection error and 5.07% in absolute trajectory error on the TUM dataset and DTU dataset.	翻訳日:2024-02-20 21:25:31 公開日:2024-02-18
# OptEx: ほぼ並列化されたイテレーションによる一階最適化の高速化 OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations ( http://arxiv.org/abs/2402.11427v1 ) ライセンス: Link先を確認	Yao Shu, Jiongfeng Fang, Ying Tiffany He, Fei Richard Yu	(参考訳) 第一次最適化(foo)アルゴリズムは、機械学習や信号デノイジングといった多くの計算領域において重要である。しかしながら、ニューラルネットワークトレーニングのような複雑なタスクへの適用は、収束のために多くの逐次イテレーションを必要とするため、大きな非効率性を必要とすることが多い。これに対して,並列計算を利用して並列化ボトルネックを緩和し,FOOの効率を向上する第1のフレームワークであるOptExを,ほぼ並列化イテレーションで高速化する一階最適化を導入する。 optexでは、将来の勾配予測に勾配履歴を使用するために、カーネル化された勾配推定を採用しており、イテレーションの並列化を可能にしている。 First-order optimization (FOO) algorithms are pivotal in numerous computational domains such as machine learning and signal denoising. However, their application to complex tasks like neural network training often entails significant inefficiencies due to the need for many sequential iterations for convergence. In response, we introduce first-order optimization expedited with approximately parallelized iterations (OptEx), the first framework that enhances the efficiency of FOO by leveraging parallel computing to mitigate its iterative bottleneck. OptEx employs kernelized gradient estimation to make use of gradient history for future gradient prediction, enabling parallelization of iterations -- a strategy once considered impractical because of the inherent iterative dependency in FOO. 
We provide theoretical guarantees for the reliability of our kernelized gradient estimation and the iteration complexity of SGD-based OptEx, confirming that estimation errors diminish to zero as historical gradients accumulate and that SGD-based OptEx enjoys an effective acceleration rate of $\Omega(\sqrt{N})$ over standard SGD given parallelism of N. We also use extensive empirical studies, including synthetic functions, reinforcement learning tasks, and neural network training across various datasets, to underscore the substantial efficiency improvements achieved by OptEx.	翻訳日:2024-02-20 21:25:12 公開日:2024-02-18
# オンラインローカル偽発見率制御:資源配分アプローチ Online Local False Discovery Rate Control: A Resource Allocation Approach ( http://arxiv.org/abs/2402.11425v1 ) ライセンス: Link先を確認	Ruicheng Ao, Hongyu Chen, David Simchi-Levi, Feng Zhu	(参考訳) オンライン局所的偽発見率(fdr: online local false discovery rate)制御では,複数のテストが順次実施され,期待される発見回数を最大化することが課題である。本稿では,オンラインの資源配分問題としてこの問題を定式化し,高いレベルからネット上のknapsack問題とみなすことが可能であり,さらにランダムな予算補充の不確実性が生じる。一般到着分布から始めて、$O(\sqrt{T})$ regret を達成するための簡単なポリシーを提案する。このような後悔率は一般的には実現不可能であることを示すことで結果を補完する。その後、焦点を離散的な到着分布に移す。オンラインリソース割り当て文献における多くの既存の再解決ヒューリスティックは、標準設定における有界損失を達成したとしても、$\Omega(\sqrt{T})$あるいは$\Omega(T)$後悔を引き起こす可能性がある。標準政策があまりに楽観的になりすぎ、到着を過度に受け入れる傾向があるという観測から、予算バッファーを組み込んだ新しい政策を提案する。我々は、小さな対数バッファが$\Omega(\sqrt{T})$または$\Omega(T)$から$O(\ln^2T)$への後悔を減らすのに十分であることを示す。理論的結果を検証するため, 数値実験を行った。本論文では, 不正な受理の回避と不確実な予算を伴うオンライン資源配分問題における不正な拒否のバランスを保ちつつ, 効果的な政策がいかに設計されるべきかを強調した。 We consider the problem of online local false discovery rate (FDR) control where multiple tests are conducted sequentially, with the goal of maximizing the total expected number of discoveries. We formulate the problem as an online resource allocation problem with accept/reject decisions, which from a high level can be viewed as an online knapsack problem, with the additional uncertainty of random budget replenishment. We start with general arrival distributions and propose a simple policy that achieves a $O(\sqrt{T})$ regret. We complement the result by showing that such regret rate is in general not improvable. We then shift our focus to discrete arrival distributions. We find that many existing re-solving heuristics in the online resource allocation literature, albeit achieve bounded loss in canonical settings, may incur a $\Omega(\sqrt{T})$ or even a $\Omega(T)$ regret. With the observation that canonical policies tend to be too optimistic and over accept arrivals, we propose a novel policy that incorporates budget buffers. 
We show that small additional logarithmic buffers suffice to reduce the regret from $\Omega(\sqrt{T})$ or even $\Omega(T)$ to $O(\ln^2 T)$. Numerical experiments are conducted to validate our theoretical findings. Our formulation may have wider applications beyond the problem considered in this paper, and our results emphasize how effective policies should be designed to balance avoiding wrong acceptances against reducing wrong rejections in online resource allocation problems with uncertain budgets.	翻訳日:2024-02-20 21:24:55 公開日:2024-02-18
# 一般化ゼロショット認識のためのデータ分布蒸留生成モデル Data Distribution Distilled Generative Model for Generalized Zero-Shot Recognition ( http://arxiv.org/abs/2402.11424v1 ) ライセンス: Link先を確認	Yijie Wang and Mingjian Hong and Luwen Huangfu and Sheng Huang	(参考訳) ゼロショット学習(ZSL)の領域では,参照データを好む一般化ゼロショット学習(GZSL)モデルのバイアスに対処する。これに対応するために、D$^3$GZSLと呼ばれるエンドツーエンド生成GZSLフレームワークを導入する。このフレームワークは、よりバランスの取れたモデルに対して、目に見えないデータと合成されたデータを、それぞれ分布外データとみなす。 D$^3$GZSLは2つのコアモジュールから成り、in-distribution dual space distillation (ID$^2$SD)とout-of-distribution batch distillation (O$^2$DBD)である。 ID$^2$SDは、埋め込みやラベル空間における教師の学習結果と整合し、学習コヒーレンスを高める。 O$^2$DBDは、バッチサンプル毎に低次元の分散表現を導入し、目に見えるカテゴリと目に見えないカテゴリ間の共有構造をキャプチャする。提案手法は,確立されたGZSLベンチマーク間で有効性を示し,主要な生成フレームワークにシームレスに統合する。 D$^3$GZSLは既存の生成GZSLメソッドの性能を高め、ゼロショット学習プラクティスを洗練させる可能性を示している。 In the realm of Zero-Shot Learning (ZSL), we address biases in Generalized Zero-Shot Learning (GZSL) models, which favor seen data. To counter this, we introduce an end-to-end generative GZSL framework called D$^3$GZSL. This framework regards seen and synthesized unseen data as in-distribution and out-of-distribution data, respectively, for a more balanced model. D$^3$GZSL comprises two core modules: in-distribution dual space distillation (ID$^2$SD) and out-of-distribution batch distillation (O$^2$DBD). ID$^2$SD aligns teacher-student outcomes in embedding and label spaces, enhancing learning coherence. O$^2$DBD introduces low-dimensional out-of-distribution representations per batch sample, capturing shared structures between seen and unseen categories. Our approach demonstrates its effectiveness across established GZSL benchmarks, seamlessly integrating into mainstream generative frameworks. Extensive experiments consistently showcase that D$^3$GZSL elevates the performance of existing generative GZSL methods, underscoring its potential to refine zero-shot learning practices. The code is available at: https://github.com/PJBQ/D3GZSL.git	翻訳日:2024-02-20 21:24:20 公開日:2024-02-18
# 多段階知識伝達フレームワークによる中国語綴り誤りの軽減 Mitigating Catastrophic Forgetting in Multi-domain Chinese Spelling Correction by Multi-stage Knowledge Transfer Framework ( http://arxiv.org/abs/2402.11422v1 ) ライセンス: Link先を確認	Peng Xing, Yinghui Li, Shirong Ma, Xinnian Liang, Haojing Huang, Yangning Li, Hai-Tao Zheng, Wenhao Jiang, Ying Shen	(参考訳) Chinese Spelling Correction (CSC)は、与えられた文中のスペルエラーを検出し、修正することを目的としている。近年、マルチドメインCSCはより実践的であるため、研究者の注目を集めている。本稿では,複数のドメインシナリオに適応する際のCSCモデルの重要な欠陥,すなわち,新たなドメイン固有の知識(破滅的な忘れ事)を学習する際に獲得した知識を忘れる傾向に注目した。そこで本研究では,新しいドメイン知識にのみ焦点をあてるのではなく,各ドメインにおける知識伝達に継続的に進化する教師モデルを利用する,モデルに依存しない多段階知識伝達(MKT)フレームワークを提案する。マルチドメインCSCタスクに継続的学習メソッドを適用するのは,私たちが初めてです。提案手法の有効性を実証する実験を行い,さらなる解析によりモデル性能を向上させるために壊滅的な忘れを克服することの重要性を実証した。 Chinese Spelling Correction (CSC) aims to detect and correct spelling errors in given sentences. Recently, multi-domain CSC has gradually attracted the attention of researchers because it is more practicable. In this paper, we focus on the key flaw of the CSC model when adapting to multi-domain scenarios: the tendency to forget previously acquired knowledge upon learning new domain-specific knowledge (i.e., catastrophic forgetting). To address this, we propose a novel model-agnostic Multi-stage Knowledge Transfer (MKT) framework, which utilizes a continuously evolving teacher model for knowledge transfer in each domain, rather than focusing solely on new domain knowledge. It deserves to be mentioned that we are the first to apply continual learning methods to the multi-domain CSC task. Experiments prove the effectiveness of our proposed method, and further analyses demonstrate the importance of overcoming catastrophic forgetting for improving the model performance.	翻訳日:2024-02-20 21:23:45 公開日:2024-02-18
# 中国語文法誤り訂正における大規模言語モデルの役割の再考 Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction ( http://arxiv.org/abs/2402.11420v1 ) ライセンス: Link先を確認	Yinghui Li, Shang Qin, Jingheng Ye, Shirong Ma, Yangning Li, Libo Qin, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu	(参考訳) 近年,Large Language Models (LLMs) は下流NLPタスクにおける役割について研究者によって広く研究されている。 NLP分野における基本的な課題として、中国語文法誤り訂正(CGEC)は、入力文中のすべての文法的誤りを修正することを目的としている。これまでの研究では、LLMのCGECの修正子としての性能は、課題の焦点が難しいため、未だに満足できないことが示されている。 CGEC における LLM の役割を再考し, CGEC における LLM の活用と探索について検討した。 LLMに格納されている豊富な文法知識とその強力な意味理解能力を考慮すると、LLMを説明者として利用し、エラー修正時にCGEC小モデルの説明情報を提供し、性能を向上させる。また,LLMを評価者として,より合理的なCGEC評価を実現し,CGECタスクの主観性による問題を軽減する。特に私たちの仕事は、下流のタスクにおいてLLMと小さなモデルがどのように協調するかを積極的に探究するものです。広く使われているデータセットに関する広範な実験と詳細な分析は、我々の思考直感と提案手法の有効性を検証する。 Recently, Large Language Models (LLMs) have been widely studied by researchers for their roles in various downstream NLP tasks. As a fundamental task in the NLP field, Chinese Grammatical Error Correction (CGEC) aims to correct all potential grammatical errors in the input sentences. Previous studies have shown that LLMs' performance as correctors on CGEC remains unsatisfactory due to its challenging task focus. To promote the CGEC field to better adapt to the era of LLMs, we rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored in CGEC. Considering the rich grammatical knowledge stored in LLMs and their powerful semantic understanding capabilities, we utilize LLMs as explainers to provide explanation information for the CGEC small models during error correction to enhance performance. We also use LLMs as evaluators to bring more reasonable CGEC evaluations, thus alleviating the troubles caused by the subjectivity of the CGEC task. In particular, our work is also an active exploration of how LLMs and small models better collaborate in downstream tasks. 
Extensive experiments and detailed analyses on widely used datasets verify the effectiveness of our thinking intuition and the proposed methods.	翻訳日:2024-02-20 21:23:17 公開日:2024-02-18
# 量子および古典計算による多体相関効果のキャプチャ Capturing many-body correlation effects with quantum and classical computing ( http://arxiv.org/abs/2402.11418v1 ) ライセンス: Link先を確認	Karol Kowalski, Nicholas P. Bauman, Guang Hao Low, Martin Roetteler, John J. Rehr, Fernando D. Vila	(参考訳) 高エネルギー状態における分子系の励起状態の理論的記述は、光源施設における多くの実験活動を支援し推進するために重要である。しかし、それらの複雑な相関効果を捉えるには、近似の階層的インフラストラクチャを提供する形式を必要とする。これらの近似は古典的な計算手法のオーバーヘッドを増大させるため、近似のランク付けと結果の質に関する決定は純粋に数値的な根拠で行う必要がある。量子コンピューティング手法の出現は、この状況を変える可能性がある。本研究では、X線光電子分光に関連するコアレベル状態の同定における量子位相推定器(QPE)の効率を実証する。集団相関効果が支配する状態に対する最も正確な方法の1つとして,qpe予測と正確な対角化および実時間運動連成クラスター式を比較し,検証する。 Theoretical descriptions of excited states of molecular systems in high-energy regimes are crucial for supporting and driving many experimental efforts at light source facilities. However, capturing their complicated correlation effects requires formalisms that provide a hierarchical infrastructure of approximations. These approximations lead to an increased overhead in classical computing methods, and therefore, decisions regarding the ranking of approximations and the quality of results must be made on purely numerical grounds. The emergence of quantum computing methods has the potential to change this situation. In this study, we demonstrate the efficiency of Quantum Phase Estimator (QPE) in identifying core-level states relevant to x-ray photoelectron spectroscopy. We compare and validate the QPE predictions with exact diagonalization and real-time equation-of-motion coupled cluster formulations, which are some of the most accurate methods for states dominated by collective correlation effects.	翻訳日:2024-02-20 21:22:42 公開日:2024-02-18
# LoRETTA:大規模言語モデルの超低パラメータ微調整のための低レベル経済テンソルトレイン適応 LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models ( http://arxiv.org/abs/2402.11417v1 ) ライセンス: Link先を確認	Yifan Yang, Jiajun Zhou, Ngai Wong, Zheng Zhang	(参考訳) モデル性能を維持しながら計算効率のよい微調整を実現するために,様々なパラメータ効率の微調整技術が提案されている。しかし、既存のPEFTメソッドは、LLM(Large Language Models)の迅速な展開に伴うトレーニング可能なパラメータの増加によって、依然として制限されている。この課題に対処するため、テンソル-トレイン分解によりトレーニング可能なパラメータを著しく削減する超パラメータ効率のフレームワークであるLoRETTAを提案する。具体的には, {LoRETTA}$_{adp}$と {LoRETTA}$_{rep}$という2つの方法を提案する。前者はテンソル化アダプタを採用し、LLMの微調整に高性能で軽量なアプローチを提供する。後者は、小さなテンソル因子のセットによる重量パラメータ化による微調整を強調する。 LoRETTAは、LLaMA-2-7Bモデルで最大$100\times$少ないパラメータを持つ、最も広く使われているPEFTメソッドと同等または優れたパフォーマンスを達成する。さらに,提案手法はトレーニング効率を効果的に向上し,マルチタスク学習性能を向上し,反オーバーフィッティング能力を向上することを示した。 HuggingfaceフレームワークとPEFTライブラリ上に構築されたプラグイン・アンド・プレイコードがリリースされる。 Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance. However, existing PEFT methods are still limited by the growing number of trainable parameters with the rapid deployment of Large Language Models (LLMs). To address this challenge, we present LoRETTA, an ultra-parameter-efficient framework that significantly reduces trainable parameters through tensor-train decomposition. Specifically, we propose two methods, named {LoRETTA}$_{adp}$ and {LoRETTA}$_{rep}$. The former employs tensorized adapters, offering a high-performance yet lightweight approach for the fine-tuning of LLMs. The latter emphasizes fine-tuning via weight parameterization with a set of small tensor factors. LoRETTA achieves comparable or better performance than most widely used PEFT methods with up to $100\times$ fewer parameters on the LLaMA-2-7B models. Furthermore, empirical results demonstrate that the proposed method effectively improves training efficiency, enjoys better multi-task learning performance, and enhances the anti-overfitting capability. 
Plug-and-play codes built upon the Huggingface framework and PEFT library will be released.	翻訳日:2024-02-20 21:21:49 公開日:2024-02-18
# マルチモーダル要約のためのきめ細かな説明可能なファクタリティ評価 Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization ( http://arxiv.org/abs/2402.11414v1 ) ライセンス: Link先を確認	Liqiang Jing, Jingxuan Zuo, Yue Zhang	(参考訳) マルチモーダル要約は入力テキストと画像に基づいて簡潔な要約を生成することを目的としている。しかし、既存の手法は非実効的な出力に悩まされる可能性がある。マルチモーダル要約モデルの事実性を評価するため、異なるアプリケーションシナリオ、すなわち参照ベース事実性評価フレームワークと参照フリー事実性評価フレームワークに対して、細粒度で説明可能な2つの評価フレームワーク(FALLACIOUS)を提案する。特に、参照フリーの事実性評価フレームワークは、基礎的な真実を必要としないため、より広いアプリケーションシナリオを持つ。提案フレームワークの有効性を評価するために,フレームワークと他のメトリクスの相関度を計算する。実験の結果,提案手法の有効性が示された。コードとデータセットをgithub経由でリリースします。 Multimodal summarization aims to generate a concise summary based on the input text and image. However, the existing methods potentially suffer from unfactual output. To evaluate the factuality of multimodal summarization models, we propose two fine-grained and explainable evaluation frameworks (FALLACIOUS) for different application scenarios, i.e. reference-based factuality evaluation framework and reference-free factuality evaluation framework. Notably, the reference-free factuality evaluation framework doesn't need ground truth and hence it has a wider application scenario. To evaluate the effectiveness of the proposed frameworks, we compute the correlation between our frameworks and the other metrics. The experimental results show the effectiveness of our proposed method. We will release our code and dataset via github.	翻訳日:2024-02-20 21:21:28 公開日:2024-02-18
# Segment Anything Model(SAM)を用いた機械駆動画像ラベリングのためのマルチスペクトル自動転送技術(MATT) A Multispectral Automated Transfer Technique (MATT) for machine-driven image labeling utilizing the Segment Anything Model (SAM) ( http://arxiv.org/abs/2402.11413v1 ) ライセンス: Link先を確認	James E. Gallagher, Aryav Gogia, Edward J. Oughton	(参考訳) Segment Anything Model (SAM)は、大規模なRed-Green-Blue (RGB)イメージデータセットの自動セグメンテーションとラベル付けのスピードと正確性を大幅に加速している。しかし、サムは、例えばマルチスペクトルやハイパースペクトル画像など、可視光スペクトルの外側の画像をセグメンテーションしたりラベル付けしたりできない。そこで本稿では,MATT(Multispectral Automated Transfer Technique)と呼ぶ手法について概説する。 RGB画像からSAMセグメンテーションマスクを変換することで、高精度で効率よくマルチスペクトル画像のセグメンテーションとラベルを自動で行うことができる。例えば、mattを用いた2,400画像データセットのセグメンテーションとラベリングは、トレーニングモデルの開発において87.8%の時間短縮を達成し、およそ20時間の手動ラベリングを2.4時間に短縮した。この効率向上は、MATTによるマルチスペクトルモデルのトレーニングにおいて、手動でラベル付けされたデータセットと比較して、全体の平均平均精度(mAP)が6.7%減少することと関連付けられている。本研究では,訓練中に省いた時間を考慮した場合の精度の低下を許容できるレベルとみなす。本研究は,人間のインタラクションを最小限に抑えたマルチスペクトル物体検出モデルを高速に分割,ラベル付け,訓練するための,新しいオープンソース手法を提供することにより,マルチスペクトル物体検出の研究に大きく貢献する。今後の研究はこれらの手法を応用することに集中する必要がある (i)空間ベースのマルチスペクトル、及び (ii) ドローンによるハイパースペクトル画像。 Segment Anything Model (SAM) is drastically accelerating the speed and accuracy of automatically segmenting and labeling large Red-Green-Blue (RGB) imagery datasets. However, SAM is unable to segment and label images outside of the visible light spectrum, for example, for multispectral or hyperspectral imagery. Therefore, this paper outlines a method we call the Multispectral Automated Transfer Technique (MATT). By transposing SAM segmentation masks from RGB images we can automatically segment and label multispectral imagery with high precision and efficiency. For example, the results demonstrate that segmenting and labeling a 2,400-image dataset utilizing MATT achieves a time reduction of 87.8% in developing a trained model, reducing roughly 20 hours of manual labeling, to only 2.4 hours. 
This efficiency gain is associated with only a 6.7% decrease in overall mean average precision (mAP) when training multispectral models via MATT, compared to a manually labeled dataset. We consider this an acceptable level of precision loss when considering the time saved during training, especially for rapidly prototyping experimental modeling methods. This research greatly contributes to the study of multispectral object detection by providing a novel and open-source method to rapidly segment, label, and train multispectral object detection models with minimal human interaction. Future research needs to focus on applying these methods to (i) space-based multispectral, and (ii) drone-based hyperspectral imagery.	翻訳日:2024-02-20 21:21:00 公開日:2024-02-18
# 弱値による最適量子状態トモグラフィ Optimal Quantum State Tomography via Weak Value ( http://arxiv.org/abs/2402.11484v1 ) ライセンス: Link先を確認	Xuanmin Zhu, Dezheng Zhang, Runping Gao, Qun Wei, Lixia Liu, and Zijiang Luo	(参考訳) 弱値による状態トモグラフィー戦略の効率を向上させるため,システムと測定装置の最適結合強度を探索した。任意のd次元量子系に対して、密度行列の実部と虚部を測定するのに使用される最適な強度を求める。状態トモグラフィーの最適効率についても平均二乗誤差を用いて検討した。再構成密度行列における最小平均二乗誤差が導出された。本論文で研究されている状態トモグラフィー戦略は、未知の量子状態の測定に有用である。 To improve the efficiency of the state tomography strategy via weak value, we have searched for the optimal coupling strength between the system and the measuring device. For an arbitrary d-dimensional quantum system, the optimal strengths used in measuring the real and imaginary parts of the density matrix are obtained. The optimal efficiency of the state tomography has also been studied by using the mean square error. The minimal mean square errors in the reconstructed density matrices have been derived. The state tomography strategy studied in this article may be useful in the measurement of unknown quantum states.	翻訳日:2024-02-20 21:13:27 公開日:2024-02-18
# Re-Dock: 拡散ブリッジによるフレキシブルでリアルな分子ドッキングを目指して Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge ( http://arxiv.org/abs/2402.11459v1 ) ライセンス: Link先を確認	Yufei Huang, Odin Zhang, Lirong Wu, Cheng Tan, Haitao Lin, Zhangyang Gao, Siyuan Li and Stan.Z. Li	(参考訳) タンパク質-リガンド結合構造の正確な予測は、分子ドッキングとして知られるタスクが薬物設計に不可欠であるが、依然として困難である。ディープラーニングは期待されているが、既存の手法はホロタンパク質の構造(ドッキングされ、現実的なタスクでは利用できない)やポケットサイドチェーンのコンフォーメーションに依存し、実用性や非現実的なコンフォーメーション予測に限定される。これらのギャップを埋めるために,リガンドとポケット側鎖のポーズを同時予測するフレキシブルドッキングと呼ばれる未熟なタスクを導入し,幾何多様体に拡張した新しい拡散橋生成モデルであるre-dockを導入する。具体的には, ニュートン・オイラー方程式に触発されたエネルギー対ジオメトリマッピングを提案し, エネルギー制約ドッキング生成過程を反映する結合エネルギーと配座を共モデル化する。 apo-dockやcross-dockを含む設計ベンチマークデータセットに関する包括的な実験は、現在の手法よりも優れた効果と効率を示している。 Accurate prediction of protein-ligand binding structures, a task known as molecular docking is crucial for drug design but remains challenging. While deep learning has shown promise, existing methods often depend on holo-protein structures (docked, and not accessible in realistic tasks) or neglect pocket sidechain conformations, leading to limited practical utility and unrealistic conformation predictions. To fill these gaps, we introduce an under-explored task, named flexible docking to predict poses of ligand and pocket sidechains simultaneously and introduce Re-Dock, a novel diffusion bridge generative model extended to geometric manifolds. Specifically, we propose energy-to-geometry mapping inspired by the Newton-Euler equation to co-model the binding energy and conformations for reflecting the energy-constrained docking generative process. Comprehensive experiments on designed benchmark datasets including apo-dock and cross-dock demonstrate our model's superior effectiveness and efficiency over current methods.	翻訳日:2024-02-20 21:13:19 公開日:2024-02-18
# Key Patch Proposer: リッチ情報を含むキーパッチ Key Patch Proposer: Key Patches Contain Rich Information ( http://arxiv.org/abs/2402.11458v1 ) ライセンス: Link先を確認	Jing Xu, Beiwen Tian, Hao Zhao	(参考訳) 本稿では,新たなアルゴリズムであるkpp(key patch proposalr)を提案する。本実験では,KPP のセマンティック情報を再構築作業と分類作業の両方で捉える能力を示す。 KPPの有効性は、セマンティックセグメンテーションのためのアクティブラーニングにその可能性を示している。ソースコードはhttps://github.com/ca-tt-ac/key-patch-proposerで公開しています。 In this paper, we introduce a novel algorithm named Key Patch Proposer (KPP) designed to select key patches in an image without additional training. Our experiments showcase KPP's robust capacity to capture semantic information by both reconstruction and classification tasks. The efficacy of KPP suggests its potential application in active learning for semantic segmentation. Our source code is publicly available at https://github.com/CA-TT-AC/key-patch-proposer.	翻訳日:2024-02-20 21:13:02 公開日:2024-02-18
# LLMはいつ検索強化が必要なのか? LLMの過信の緩和は検索の強化に役立つ When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation ( http://arxiv.org/abs/2402.11457v1 ) ライセンス: Link先を確認	Shiyu Ni, Keping Bi, Jiafeng Guo, Xueqi Cheng	(参考訳) 大きな言語モデル(LLM)は、特定の知識を持っていないことや、そのようなケースで明らかな答えを提供する傾向があることを知るのが困難である。 Retrieval Augmentation (RA)はLLMの幻覚を緩和するために広く研究されている。しかし、余分なオーバーヘッドと保証されていない検索品質のため、RAを常に実行するのが最適ではないかもしれない。簡単な考え方は、LLMが質問に対して不確実である場合にのみ検索を行うことである。このことは、LLMが知識境界を知覚しRAを支援する能力を高める動機となります。本稿ではまず,LSMのそのような能力を定量的に測定し,その過信を確かめる。次に,質問に対するllmsの確信度と,外部検索情報への依存度との関係について検討した。本稿では,LLMの知識境界に対する認識を高めるためのいくつかの手法を提案する。さらに、これらの手法により、LLMはより少ない検索呼び出しでRAの同等またはそれ以上の性能を達成することができる。 Large Language Models (LLMs) have been found to have difficulty knowing they do not possess certain knowledge and tend to provide specious answers in such cases. Retrieval Augmentation (RA) has been extensively studied to mitigate LLMs' hallucinations. However, due to the extra overhead and unassured quality of retrieval, it may not be optimal to conduct RA all the time. A straightforward idea is to only conduct retrieval when LLMs are uncertain about a question. This motivates us to enhance the LLMs' ability to perceive their knowledge boundaries to help RA. In this paper, we first quantitatively measure LLMs' such ability and confirm their overconfidence. Then, we study how LLMs' certainty about a question correlates with their dependence on external retrieved information. We propose several methods to enhance LLMs' perception of knowledge boundaries and show that they are effective in reducing overconfidence. Additionally, equipped with these methods, LLMs can achieve comparable or even better performance of RA with much fewer retrieval calls.	翻訳日:2024-02-20 21:12:54 公開日:2024-02-18
# FactPICO:医学的証拠の平易な要約のためのファクチュアリティ評価 FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence ( http://arxiv.org/abs/2402.11456v1 ) ライセンス: Link先を確認	Sebastian Antony Joseph, Lily Chen, Jan Trienes, Hannah Louisa Göke, Monika Coers, Wei Xu, Byron C Wallace, Junyi Jessy Li	(参考訳) LLMを用いた平易な言語要約は、技術的コンテンツのテキストアクセシビリティを向上させるのに有用である。しかし、これらの要約は薬のような高リスク領域において、どの程度事実か? 本稿では,無作為化対照治験(RCT)を記述した医学文献の原文要約のための事実度ベンチマークであるFactPICO(ファクトピコ)について述べる。 FactPICOは、3つのLLM(GPT-4、Llama-2、Alpaca)から生成された345のプレーン言語要約と、専門家によるきめ細かい評価と自然言語の有理性からなる。本研究は,これらの要約におけるRCTの重要な要素である集団,介入,比較者,成果(PICO)の事実性,およびそれらに関する報告結果を評価する。また,LLMに付加された追加情報(説明など)の正確性を評価する。 FactPICOを用いて, LLMをベースとした新たなファクトリティー指標を含む, 既存のファクトリティー指標をベンチマークする。医学的証拠の平易な言語要約は、特に単純さと事実性のバランスをとる場合、依然として困難であり、既存のメトリクスは、インスタンスレベルの専門家の判断とあまり相関しない。 Plain language summarization with LLMs can be useful for improving textual accessibility of technical content. But how factual are these summaries in a high-stakes domain like medicine? This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts describing randomized controlled trials (RCTs), which are the basis of evidence-based medicine and can directly inform patient treatment. FactPICO consists of 345 plain language summaries of RCT abstracts generated from three LLMs (i.e., GPT-4, Llama-2, and Alpaca), with fine-grained evaluation and natural language rationales from experts. We assess the factuality of critical elements of RCTs in those summaries: Populations, Interventions, Comparators, Outcomes (PICO), as well as the reported findings concerning these. We also evaluate the correctness of the extra information (e.g., explanations) added by LLMs. Using FactPICO, we benchmark a range of existing factuality metrics, including the newly devised ones based on LLMs. We find that plain language summarization of medical evidence is still challenging, especially when balancing between simplicity and factuality, and that existing metrics correlate poorly with expert judgments on the instance level.	翻訳日:2024-02-20 21:12:38 公開日:2024-02-18
# LoRA-Flow: 生成タスクにおける大規模言語モデルのための動的LoRA融合 LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks ( http://arxiv.org/abs/2402.11455v1 ) ライセンス: Link先を確認	Hanqing Wang, Bowen Ping, Shuo Wang, Xu Han, Yun Chen, Zhiyuan Liu, Maosong Sun	(参考訳) LoRAは軽量モジュールを使用して、ダウンストリームタスクやドメイン毎に大きな言語モデル(LLM)をカスタマイズする。新しいタスクに対処するために既存のLoRAを組み合わせることで、学習したLoRAの再利用性を高めることができる。 LoRAの組み合わせに関する以前のほとんどの研究は、主に関連するLoRAごとにタスクレベルの重みに依存しており、異なる例とトークンが同じLoRA重みを共有する。しかし、生成タスクでは、異なるトークンは管理する様々なスキルを必要とする。中国の数学タスクを例にとると、問題記述の理解は中国のLoRAに依存し、計算部は数学のLoRAに依存している可能性がある。そこで本稿では,異なるLoRAの影響を動的重み付けを用いて調整するLoRA-Flowを提案する。各ステップの重みは、非常に少ないパラメータを持つ融合ゲートによって決定され、200のトレーニング例で学習できる。 6つの生成タスクに対する実験により、我々の手法はタスクレベルの融合重みでベースラインを一貫して上回ることを示した。これはLoRA結合に動的融合重みを導入する必要性を強調する。 LoRA employs lightweight modules to customize large language models (LLMs) for each downstream task or domain, where different learned additional modules represent diverse skills. Combining existing LoRAs to address new tasks can enhance the reusability of learned LoRAs, particularly beneficial for tasks with limited annotated data. Most prior works on LoRA combination primarily rely on task-level weights for each involved LoRA, making different examples and tokens share the same LoRA weights. However, in generative tasks, different tokens may necessitate diverse skills to manage. Taking the Chinese math task as an example, understanding the problem description may depend more on the Chinese LoRA, while the calculation part may rely more on the math LoRA. To this end, we propose LoRA-Flow, which utilizes dynamic weights to adjust the impact of different LoRAs. The weights at each step are determined by a fusion gate with extremely few parameters, which can be learned with only 200 training examples. Experiments across six generative tasks demonstrate that our method consistently outperforms baselines with task-level fusion weights. This underscores the necessity of introducing dynamic fusion weights for LoRA combination.	翻訳日:2024-02-20 21:12:16 公開日:2024-02-18
# MatPlotAgent: LLMに基づくエージェント科学データの可視化手法と評価 MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization ( http://arxiv.org/abs/2402.11453v1 ) ライセンス: Link先を確認	Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun	(参考訳) 科学データ可視化は、複雑な情報の直接表示を可能にし、暗黙のパターンを識別する研究者を支援することによって、研究において重要な役割を果たす。その重要性にもかかわらず、科学的データの可視化にLarge Language Models (LLMs) を用いることは、まだ明らかになっていない。本研究では,科学的データ可視化タスクの自動化を目的とした,効率的なモデルに依存しないLLMエージェントフレームワークであるMatPlotAgentを紹介する。 MatPlotAgentは,コードLLMとマルチモーダルLLMの両方の機能を活用することで,クエリ理解,反復デバッグによるコード生成,エラー修正のための視覚的フィードバック機構という3つのコアモジュールで構成される。この分野でのベンチマークの欠如に対処するため、100の人間検証テストケースからなる高品質なベンチマークであるMatPlotBenchを紹介した。さらに, GPT-4V を用いた自動評価手法を提案する。実験の結果,MatPlotAgentは商用モデルとオープンソースモデルの両方を含む様々なLLMの性能を向上させることができた。さらに,提案手法は,人間の注記スコアと強い相関関係を示す。 Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns. Despite its importance, the use of Large Language Models (LLMs) for scientific data visualization remains rather unexplored. In this study, we introduce MatPlotAgent, an efficient model-agnostic LLM agent framework designed to automate scientific data visualization tasks. Leveraging the capabilities of both code LLMs and multi-modal LLMs, MatPlotAgent consists of three core modules: query understanding, code generation with iterative debugging, and a visual feedback mechanism for error correction. To address the lack of benchmarks in this field, we present MatPlotBench, a high-quality benchmark consisting of 100 human-verified test cases. Additionally, we introduce a scoring approach that utilizes GPT-4V for automatic evaluation. Experimental results demonstrate that MatPlotAgent can improve the performance of various LLMs, including both commercial and open-source models. Furthermore, the proposed evaluation method shows a strong correlation with human-annotated scores.	翻訳日:2024-02-20 21:11:55 公開日:2024-02-18
# SciAgent: 科学的推論のためのツール強化言語モデル SciAgent: Tool-augmented Language Models for Scientific Reasoning ( http://arxiv.org/abs/2402.11451v1 ) ライセンス: Link先を確認	Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao and Aixin Sun	(参考訳) 科学的推論は、最も先進的な大規模言語モデル(LLM)でさえも過度に挑戦する。このタスクをより実用的で解き易くするために,ツール強化科学推論という新しいタスク設定を導入する。この設定は、スケーラブルなツールセットでLLMを補完し、全能的な問題解決者から熟練したツールユーザへと焦点を移す。そこで我々は,3万以上のサンプルと約6,000のツールを含むツール強化学習コーパスMathFuncを構築した。 MathFunc上に構築したSciAgentは,科学的な問題解決のためのツールを検索し,理解し,必要に応じて利用する。さらに、私たちは5つの科学的領域にまたがるベンチマークSciToolBenchを作成し、ツールアシストによるLSMの能力を評価する。 SciToolBenchの大規模な実験により、SciAgentの有効性が確認された。特に、SciAgent-Mistral-7Bは、同じ大きさの他のLLMを13%以上、絶対精度で上回る。さらに、SciAgent-DeepMath-7BはChatGPTよりも優れた性能を示している。 Scientific reasoning poses an excessive challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate the research of such setting, we construct a tool-augmented training corpus named MathFunc which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs with the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B shows much superior performance than ChatGPT.	翻訳日:2024-02-20 21:11:38 公開日:2024-02-18
# ラベル分布による文脈内サンプル順序付け In-Context Example Ordering Guided by Label Distributions ( http://arxiv.org/abs/2402.11447v1 ) ライセンス: Link先を確認	Zhichao Xu, Daniel Cohen, Bei Wang, Vivek Srikumar	(参考訳) タスク固有のトレーニングなしでモデルを予測できるようにすることで、事前訓練されたLLMを用いたインコンテキスト学習(ICL)は、NLPにおいて大きな可能性を秘めている。しかし、iclでは多くの問題が続いている。特に、そのパフォーマンスは、コンテキスト内例の選択と順序に敏感である。異なる順序を持つ同じコンテキストの例が与えられた場合、モデルの性能は、ほぼランダムからほぼ最先端まで様々である。本研究では,最適化問題としてコンテキスト内注文を定式化する。本研究は,課題の既知点に関する仮定が異なる3つの問題設定について検討する。ラベル比率から学習するという考えに触発されて,モデルの確率予測に導かれる文脈内サンプル順序付けの2つの原則を提案する。提案手法をテキスト分類データセット13と,700Mから13Bパラメータを持つ9種類の自己回帰LDMに適用した。提案手法は, 分類精度の向上, モデルの誤校正の低減, 文脈内事例の選択により, ベースラインよりも優れていることを示す。 By allowing models to predict without task-specific training, in-context learning (ICL) with pretrained LLMs has enormous potential in NLP. However, a number of problems persist in ICL. In particular, its performance is sensitive to the choice and order of in-context examples. Given the same set of in-context examples with different orderings, model performance may vary between near random to near state-of-the-art. In this work, we formulate in-context example ordering as an optimization problem. We examine three problem settings that differ in the assumptions they make about what is known about the task. Inspired by the idea of learning from label proportions, we propose two principles for in-context example ordering guided by model's probability predictions. We apply our proposed principles to thirteen text classification datasets and nine different autoregressive LLMs with 700M to 13B parameters. We demonstrate that our approach outperforms the baselines by improving the classification accuracy, reducing model miscalibration, and also by selecting better in-context examples.	翻訳日:2024-02-20 21:11:22 公開日:2024-02-18
# 米国における条件付き自動走行車の受容状況 Gauging Public Acceptance of Conditionally Automated Cars in the United States ( http://arxiv.org/abs/2402.11444v1 ) ライセンス: Link先を確認	Antonios Saravanos (1) ((1) New York University)	(参考訳) この研究では、スマートシティの要素である条件付き自動走行車(SAEレベル3)を調べ、米国における公共の受容に影響を与える要因を調査します。 UTAUT2モデルの適応を適用した。米国の358名の被験者を対象に,L3技術を概説したvignetteと,条件付き自動走行車の認識を捉えた一連の質問を行った。 PLS-SEMは収集データの解析に使用された。その結果, 社会的影響, パフォーマンス期待度, ヘドニックモチベーション, ファシリテーション条件, 努力期待度によって, 技術の受容が決定された。さらに、ヘドニックモチベーション、社会的影響、ファシリテーション条件、努力期待度は、テクノロジがいかに有用であるかの認識に肯定的な影響を与え、ファシリテーション条件、ヘドニックモチベーション、社会的影響は、努力期待度に肯定的な影響を与え、社会的影響とファシリテーション条件はヘドニックモチベーションに肯定的な影響を与え、社会的影響は、ファシリテーション条件に肯定的な影響を与える。男女差の緩和効果がみられ, 採用意図に影響を与えるヘドニックモチベーションの影響が男性にとって顕著であった。 In this work we look at an element of smart cities, conditionally automated cars (SAE Level 3), investigating the factors influencing public acceptance in the United States. We apply an adaptation of the UTAUT2 model. Taking an experimental approach, 358 participants in the US were presented with a vignette outlining the L3 technology followed by a series of questions to capture their perceptions of conditionally automated cars. PLS-SEM was used to analyze the collected data. The results reveal that the acceptance of the technology, in order of decreasing importance, was determined by social influence, performance expectancy, hedonic motivation, facilitating conditions, and effort expectancy. Furthermore, hedonic motivation, social influence, facilitating conditions and effort expectancy all have a positive influence on the perception of how useful the technology is; facilitating conditions, hedonic motivation, and social influence all have a positive influence on effort expectancy; social influence and facilitating conditions positively influence hedonic motivation; and social influence positively influences facilitating conditions. A moderating effect for gender was found, with the effect of hedonic motivation on intention to adopt being more prominent for men.	翻訳日:2024-02-20 21:11:07 公開日:2024-02-18
# ベンチマーク自己進化:動的LLM評価のためのマルチエージェントフレームワーク Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation ( http://arxiv.org/abs/2402.11443v1 ) ライセンス: Link先を確認	Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei, Xuanjing Huang	(参考訳) 本稿では,高速に進行する大規模言語モデル(LLM)を動的に評価するためのベンチマーク自己進化フレームワークを提案する。マルチエージェントシステムを使用して、元のインスタンスのコンテキストや質問を操作し、既存のベンチマークを動的に拡張する信頼性の高い新しいインスタンスをフレーミングする。よりスケーラブルでロバストできめ細かい評価を行うため、様々なクエリやデータノイズに対してLLMをテストする進化するインスタンスを構築するために、6つのリフレーミング操作を実装し、問題解決するサブアビリティを探索します。このフレームワークでは、4つのタスクのベンチマークデータセットを拡張する。実験結果から, LLMの当初の結果に対する性能低下が認められた。スケーラブルで堅牢な評価の下でのこの低下は、より正確にモデルの能力を反映する、きめ細かい評価と並んでいます。さらに、当社のフレームワークは、異なるモデルとさまざまなタスクにおける同一モデル間のパフォーマンスの相違を拡大し、特定のタスクに対するより情報のあるモデル選択を容易にします(コードとデータはhttps://github.com/NanshineLoong/Self-Evolving-Benchmarkで利用可能です)。 This paper presents a benchmark self-evolving framework to dynamically evaluate rapidly advancing Large Language Models (LLMs), aiming for a more accurate assessment of their capabilities and limitations. We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence that dynamically extend existing benchmarks. Towards a more scalable, robust and fine-grained evaluation, we implement six reframing operations to construct evolving instances testing LLMs against diverse queries, data noise and probing their problem-solving sub-abilities. With this framework, we extend benchmark datasets of four tasks. Experimental results show a general performance decline in most LLMs against their original results. This decline under our scalable and robust evaluations, alongside our fine-grained evaluation, more accurately reflect models' capabilities. Besides, our framework widens performance discrepancies both between different models and within the same model across various tasks, facilitating more informed model selection for specific tasks (Code and data are available at https://github.com/NanshineLoong/Self-Evolving-Benchmark).	翻訳日:2024-02-20 21:09:59 公開日:2024-02-18
# LLMはルールで理にかなっているか? ストレス試験とLLM改善のための論理スカッホールディング Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs ( http://arxiv.org/abs/2402.11442v1 ) ライセンス: Link先を確認	Siyuan Wang, Zhongyu Wei, Yejin Choi, Xiang Ren	(参考訳) 大規模言語モデル(LLM)は様々な推論タスクで印象的な人間的なパフォーマンスを達成している。しかし、その根底にある推論規則の熟達性は、人間の能力に欠ける。そこで本研究では,5つの領域にまたがるプリミティブルールとコンポジションルールを組み合わせた推論ルールベースであるULogicを構築するための,推論ルール生成フレームワークを提案する。ルールサブセット上でのGPT系列モデルの解析は,LLMの論理的理解において,特に特定のバイアスパターンを持つ構成的・構造的複雑な規則において,人的性能と比較して大きなギャップを生じさせる。さらにこれらのルールを,よりフレキシブルなルール生成と下流推論の強化のために,より小型な推論エンジンに蒸留する。提案する推論エンジンは, 精度, 複雑, 抽象的な結論と前提を生成するのに有効であることを証明し, 各種常識推論タスクを改良する。全体として、我々の研究は、推論ルールの把握における LLM の限界に光を当て、論理的推論能力を向上する方法を示しています(コードとデータは https://github.com/SiyuanWangw/ULogic で公開されています)。 Large language models (LLMs) have achieved impressive human-like performance across various reasoning tasks. However, their mastery of underlying inferential rules still falls short of human capabilities. To investigate this, we propose a logic scaffolding inferential rule generation framework, to construct an inferential rule base, ULogic, comprising both primitive and compositional rules across five domains. Our analysis of GPT-series models over a rule subset reveals significant gaps in LLMs' logic understanding compared to human performance, especially in compositional and structural complex rules with certain bias patterns. We further distill these rules into a smaller-scale inference engine for flexible rule generation and enhancing downstream reasoning. Through a multi-judger evaluation, our inference engine proves effective in generating accurate, complex and abstract conclusions and premises, and improve various commonsense reasoning tasks. Overall, our work sheds light on LLMs' limitations in grasping inferential rule and suggests ways to enhance their logical reasoning abilities (code and data are available at https://github.com/SiyuanWangw/ULogic).	翻訳日:2024-02-20 21:09:34 公開日:2024-02-18
# InfuserKI:Infuser-Guided Knowledge Integrationによる知識グラフによる大規模言語モデルの強化 InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge Integration ( http://arxiv.org/abs/2402.11441v1 ) ライセンス: Link先を確認	Fali Wang, Runxue Bao, Suhang Wang, Wenchao Yu, Yanchi Liu, Wei Cheng, Haifeng Chen	(参考訳) 大規模言語モデル(LLM)は、様々な領域にまたがる顕著なオープンジェネレーション能力を示しているが、彼らは知識集約的なタスクに苦労している。この問題を軽減するため、外部モジュールを用いたドメイン固有の知識グラフでLLMを強化するための知識統合手法が提案されている。しかし、微調整には既知の知識と未知の知識の両方を必要とするため、データの非効率に苦しむ。そこで本研究では,未知の知識をLLMに効率的に統合する新たな課題について検討する。新しい知識を注入すると、以前に獲得した知識を忘れるリスクが生じる。そこで本研究では,トランスフォーマー内部状態を利用した新しいInfuser-Guided Knowledge Integration (InfuserKI) フレームワークを提案する。 UMLS-2.5k と MetaQA ドメイン知識グラフの評価は、InfuserKI が知識の忘れを減らすために、新しい知識を効果的に獲得し、最先端のベースラインを9% と 6% に向上させることができることを示している。 Though Large Language Models (LLMs) have shown remarkable open-generation capabilities across diverse domains, they struggle with knowledge-intensive tasks. To alleviate this issue, knowledge integration methods have been proposed to enhance LLMs with domain-specific knowledge graphs using external modules. However, they suffer from data inefficiency as they require both known and unknown knowledge for fine-tuning. Thus, we study a novel problem of integrating unknown knowledge into LLMs efficiently without unnecessary overlap of known knowledge. Injecting new knowledge poses the risk of forgetting previously acquired knowledge. To tackle this, we propose a novel Infuser-Guided Knowledge Integration (InfuserKI) framework that utilizes transformer internal states to determine whether to enhance the original LLM output with additional information, thereby effectively mitigating knowledge forgetting. Evaluations on the UMLS-2.5k and MetaQA domain knowledge graphs demonstrate that InfuserKI can effectively acquire new knowledge and outperform state-of-the-art baselines by 9% and 6%, respectively, in reducing knowledge forgetting.	翻訳日:2024-02-20 21:09:11 公開日:2024-02-18
# 自己フィードバックのペリル--大規模言語モデルにおける自己バイアスの増幅 Perils of Self-Feedback: Self-Bias Amplifies in Large Language Models ( http://arxiv.org/abs/2402.11436v1 ) ライセンス: Link先を確認	Wenda Xu, Guanglei Zhu, Xuandong Zhao, Liangming Pan, Lei Li, William Yang Wang	(参考訳) 最近の研究によると、自己フィードバックは特定のタスクにおいて大きな言語モデル(LLM)を改善し、他のタスクを悪化させる。このような逆は、LLMが自身の出力に偏りがあることが判明した。本稿では, LLMの自己バイアス(自称世代を好む傾向)を2つの統計値を用いて正式に定義する。我々は、翻訳、制約付きテキスト生成、数学的推論の6つのLCMを解析する。自己バイアスは、複数の言語やタスクにまたがる全てのLLMで顕著である。分析の結果,自己定義パイプラインはモデル出力の流速と理解性を向上するが,さらに自己バイアスを増幅することがわかった。このようなバイアスを軽減するために,モデルサイズと正確な評価による外部からのフィードバックが,自己定義パイプラインのバイアスを著しく低減し,下流タスクのパフォーマンス向上につながることを見出した。 Recent studies show that self-feedback improves large language models (LLMs) on certain tasks while worsens other tasks. We discovered that such a contrary is due to LLM's bias towards their own output. In this paper, we formally define LLM's self-bias -- the tendency to favor its own generation -- using two statistics. We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks.	翻訳日:2024-02-20 21:08:49 公開日:2024-02-18
# 大規模言語モデルのための知識境界のベンチマーク:モデル評価の異なる視点 Benchmarking Knowledge Boundary for Large Language Model: A Different Perspective on Model Evaluation ( http://arxiv.org/abs/2402.11493v1 ) ライセンス: Link先を確認	Xunjian Yin and Xu Zhang and Jie Ruan and Xiaojun Wan	(参考訳) 近年,多種多様なタスクにおいて顕著な性能を達成し,大規模言語モデルの開発において大きな進歩を遂げている。言語モデルの知識能力を評価するため,従来の研究では,質問応答ペアに基づくベンチマークが多数提案されている。我々は,言語モデルがアクティベートに敏感であるため,一定の質問や限定的なパラフレーズで言語モデルを評価することは信頼性が高く,包括的ではないと主張している。そこで本研究では,言語モデルにおいて,知識境界という新しい概念を導入する。知識境界は言語モデル評価の迅速な感度を回避し、より信頼性と堅牢性を高める。与えられたモデルの知識境界を探索するために,各知識に対して最適なプロンプトを識別する新しいアルゴリズムである,セマンティック制約付き予測勾配降下法を提案する。実験により,既存の手法と比較して知識境界の計算において,アルゴリズムの優れた性能を示す。さらに,知識境界を持つ複数の領域における複数の言語モデルの能力を評価する。 In recent years, substantial advancements have been made in the development of large language models, achieving remarkable performance across diverse tasks. To evaluate the knowledge ability of language models, previous studies have proposed lots of benchmarks based on question-answering pairs. We argue that it is not reliable and comprehensive to evaluate language models with a fixed question or limited paraphrases as the query, since language models are sensitive to prompt. Therefore, we introduce a novel concept named knowledge boundary to encompass both prompt-agnostic and prompt-sensitive knowledge within language models. Knowledge boundary avoids prompt sensitivity in language model evaluations, rendering them more dependable and robust. To explore the knowledge boundary for a given model, we propose projected gradient descent method with semantic constraints, a new algorithm designed to identify the optimal prompt for each piece of knowledge. Experiments demonstrate a superior performance of our algorithm in computing the knowledge boundary compared to existing methods. Furthermore, we evaluate the ability of multiple language models in several domains with knowledge boundary.	翻訳日:2024-02-20 21:02:27 公開日:2024-02-18
# 何の計画だ? LLMのためのプランニングアウェア技術の評価と開発 What's the Plan? Evaluating and Developing Planning-Aware Techniques for LLMs ( http://arxiv.org/abs/2402.11489v1 ) ライセンス: Link先を確認	Eran Hirsch, Guy Uziel, Ateret Anaby-Tavor	(参考訳) 計画は、特定の環境で特定の目標を達成する一連の行動を見つけることを含む、人工知能の基本的なタスクである。大規模言語モデル(LLM)は、Webやエンボディエージェントのような計画機能を必要とするアプリケーションにますます使われています。近年の研究では,LSMには計画に必要なスキルが欠けていることが実証されている。これらの観測に基づいて,LLMと古典的計画手法を組み合わせたハイブリッドアプローチの可能性を提唱する。次に,新しいハイブリッド手法であるSimPlanを紹介し,その性能を新たな挑戦的な設定で評価する。様々な計画領域にわたる広範な実験により、SimPlanは既存のLLMベースのプランナーよりも大幅に優れていることが示された。 Planning is a fundamental task in artificial intelligence that involves finding a sequence of actions that achieve a specified goal in a given environment. Large language models (LLMs) are increasingly used for applications that require planning capabilities, such as web or embodied agents. In line with recent studies, we demonstrate through experimentation that LLMs lack necessary skills required for planning. Based on these observations, we advocate for the potential of a hybrid approach that combines LLMs with classical planning methodology. Then, we introduce SimPlan, a novel hybrid-method, and evaluate its performance in a new challenging setup. Our extensive experiments across various planning domains demonstrate that SimPlan significantly outperforms existing LLM-based planners.	翻訳日:2024-02-20 21:02:08 公開日:2024-02-18
# IRFundusSet:調和した健康ラベルを持つ統合網膜眼底データセット IRFundusSet: An Integrated Retinal Fundus Dataset with a Harmonized Healthy Label ( http://arxiv.org/abs/2402.11488v1 ) ライセンス: Link先を確認	P. Bilha Githinji, Keming Zhao, Jiantao Wang, Peiwu Qin	(参考訳) 眼の条件は世界的関心事であり、網膜底色写真を利用した計算ツールは定期的なスクリーニングと管理に役立つ。しかし、包括的かつ十分な大きさのデータセットを持つことは、人口統計学や取得のバリエーションに加えて、病理学における異質性を示す複雑な網膜基底体にとって自明ではない。さらに、公共空間における網膜眼底データセットは、データの組織化と健全な観察の定義において断片化に苦しむ。本稿では,複数の公開データセットを統合し,調和させ,キュレーションするデータセットである統合網膜底セット(IRFundusSet)を提案する。 IRFundusSetはPythonパッケージで構成されており、調和を自動化し、PyTorchアプローチに従ってデータセットオブジェクトを活用する。さらに、画像が物理的にレビューされ、健康観察の一貫した定義のために新しいis_normalラベルが注釈付けされる。 10の公開データセットが46064の画像で検討され、そのうち25406が新しいis_normalラベルのためにキュレートされ、3515はソース全体で健全であると考えられている。 Ocular conditions are a global concern and computational tools utilizing retinal fundus color photographs can aid in routine screening and management. Obtaining comprehensive and sufficiently sized datasets, however, is non-trivial for the intricate retinal fundus, which exhibits heterogeneities within pathologies, in addition to variations from demographics and acquisition. Moreover, retinal fundus datasets in the public space suffer fragmentation in the organization of data and definition of a healthy observation. We present Integrated Retinal Fundus Set (IRFundusSet), a dataset that consolidates, harmonizes and curates several public datasets, facilitating their consumption as a unified whole and with a consistent is_normal label. IRFundusSet comprises a Python package that automates harmonization and avails a dataset object in line with the PyTorch approach. Moreover, images are physically reviewed and a new is_normal label is annotated for a consistent definition of a healthy observation. Ten public datasets are initially considered with a total of 46064 images, of which 25406 are curated for a new is_normal label and 3515 are deemed healthy across the sources.	翻訳日:2024-02-20 21:01:59 公開日:2024-02-18
# テキスト-画像拡散モデルを用いた視覚概念駆動画像生成 Visual Concept-driven Image Generation with Text-to-Image Diffusion Model ( http://arxiv.org/abs/2402.11487v1 ) ライセンス: Link先を確認	Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal	(参考訳) テキスト・ツー・イメージ(TTI)拡散モデルは、複雑なシーンや想像上のシーンの高解像度画像を生成する素晴らしい結果を示している。近年のアプローチでは、これらの手法をパーソナライズ技術でさらに拡張し、いくつかのサンプル画像のイラストを使ってユーザイリュートされた概念(例えば、ユーザ自身)を統合できるようになった。しかし、人間の主題など、複数の相互作用する概念を持つ画像を生成する能力や、1つあるいは複数の画像イラストに絡み合っているかもしれない概念は、いまだに想像に難くない。本研究では,これらの課題に対処する概念駆動型TTIパーソナライズフレームワークを提案する。ユーザ認証概念のカスタムトークンを学習し、TTIモデルで既存のテキストトークンと対話可能な既存の作業に基づいて構築する。しかし,問題となっている概念を解き散らし,よりよく学習するために,ユーザが提供するイメージイラストでこれらの概念を解き散らした(相対的な)セグメンテーションマスクを共同で学習する。我々は,カスタムトークンの学習と,ユーザ提供画像中の対応する概念を包含するマスクの推定を交互に行う,期待最大化(EM)ライクな最適化手順を導入する。我々は,U-Netパラメータ化潜在拡散モデルとそれに続く高密度CRF最適化から,クロスアテンションに基づくマスクを得る。このような共同改良が、概念のより良いトークンの学習につながり、また、副産物として、潜伏したマスクであることを示す。提案手法の利点を(ユーザスタディを通して)質的かつ定量的に説明し,3つの概念を結合できる例とユースケースをいくつか紹介する。 Text-to-image (TTI) diffusion models have demonstrated impressive results in generating high-resolution images of complex and imaginative scenes. Recent approaches have further extended these methods with personalization techniques that allow them to integrate user-illustrated concepts (e.g., the user him/herself) using a few sample image illustrations. However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations remains elusive. In this work, we propose a concept-driven TTI personalization framework that addresses these core challenges. We build on existing works that learn custom tokens for user-illustrated concepts, allowing those to interact with existing text tokens in the TTI model. However, importantly, to disentangle and better learn the concepts in question, we jointly learn (latent) segmentation masks that disentangle these concepts in user-provided image illustrations. We do so by introducing an Expectation Maximization (EM)-like optimization procedure where we alternate between learning the custom tokens and estimating masks encompassing corresponding concepts in user-supplied images. We obtain these masks based on cross-attention, from within the U-Net parameterized latent diffusion model and subsequent Dense CRF optimization. We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a by-product, latent masks. We illustrate the benefits of the proposed approach qualitatively and quantitatively (through user studies) with a number of examples and use cases that can combine up to three entangled concepts.	翻訳日:2024-02-20 21:01:36 公開日:2024-02-18
# 回転加速基準フレームにおける軌道角運動量スペクトルと絡み合い Orbital angular momentum spectrum and entanglement in rotating accelerated reference frame ( http://arxiv.org/abs/2402.11486v1 ) ライセンス: Link先を確認	Haorong Wu, Xilong Fan, and Lixiang Chen	(参考訳) 粒子の定義は異なる理論によって異なる。曲線時空における場の量子論は、線形加速された観測者の視点からすると、慣性空空間は熱粒子で満たされている可能性があることを示している。この効果はUnruh効果として知られている。軌道角運動量(OAM)の自由度が考慮されると、全てのOAMモードは同じ期待粒子数を共有する。本稿では, 回転加速基準フレームにおけるOAMスペクトルについて検討し, 線形加速の場合とスペクトルの相違について検討する。観測者が回転し始めると、全てのOAMモードが許されず、負のエネルギーモードが現れる。回転加速オブザーバーが実際にこれらの粒子をどう知覚するかを理解するために、ウンルー・デウィット検出器とその詳細バランスを研究した。この関係は、共動慣性フレームと残りフレームの両方で研究される。これらの結果に基づいて, OAMエンタングルメント劣化を2次元および高次元ケースでそれぞれ検討した。その結果, OAMモードのエンタングルメント次元と最高次数は, それぞれ加速度と回転に関係していることが示唆された。すると、これらの結果が全ての定常軌道に一般化できることが示される。 The particle definition varies across different theories. The quantum field theory in curved spacetime shows that from the perspective of a linearly accelerated observer, an inertial empty space may be full of thermal particles. This effect is known as the Unruh effect. When the degrees of freedom of orbital angular momentum (OAM) are considered, all OAM modes share the same expected particle number. Here, we examine the OAM spectrum in a rotating accelerated reference frame to see how the spectrum differs from the linear accelerated case. When the observer starts to rotate, not all OAM modes are allowed and some negative energy modes show up. To understand how a rotating accelerated observer actually perceives these particles, the Unruh-DeWitt detector and its detailed balance are studied. This relation is studied both in the comoving inertial frame and in the rest frame. Based on these results, the OAM entanglement degradation is explored in two-dimensional and high-dimensional cases, respectively. The results indicate that the entanglement dimension and the highest order of OAM modes are mainly related to the acceleration and the rotation, respectively. It is then demonstrated that these results can be generalized to all stationary trajectories.	翻訳日:2024-02-20 21:01:06 公開日:2024-02-18
# LEIA:エンティティベースのデータ拡張による言語モデルにおける言語間知識伝達の実現 LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation ( http://arxiv.org/abs/2402.11485v1 ) ライセンス: Link先を確認	Ikuya Yamada and Ryokan Ri	(参考訳) 英語に基づく大規模言語モデル(LLM)を他の言語に適応させることは、言語間移動の効率性と可能性から、ますます人気が高まっている。しかし、既存の言語適応手法はしばしば言語間監督の利点を見落としている。本研究では,言語間で一致したウィキペディアのエンティティ名を利用する言語適応チューニング手法であるLEIAを紹介する。この方法は、対象言語コーパスを英語のエンティティ名で拡張し、左右の言語モデルを用いてモデルをトレーニングすることを含む。 7Bパラメータ LLM を用いて多様な質問応答データセット上でLEIAを評価し,英語以外の言語で顕著な性能向上を示した。ソースコードはhttps://github.com/studio-ousia/leiaで入手できる。 Adapting English-based large language models (LLMs) to other languages has become increasingly popular due to the efficiency and potential of cross-lingual transfer. However, existing language adaptation methods often overlook the benefits of cross-lingual supervision. In this study, we introduce LEIA, a language adaptation tuning method that utilizes Wikipedia entity names aligned across languages. This method involves augmenting the target language corpus with English entity names and training the model using left-to-right language modeling. We assess LEIA on diverse question answering datasets using 7B-parameter LLMs, demonstrating significant performance gains across various non-English languages. The source code is available at https://github.com/studio-ousia/leia.	翻訳日:2024-02-20 21:00:47 公開日:2024-02-18
# DictLLM:医学診断のための大規模言語モデルを用いたキーバリューデータ構造 DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics ( http://arxiv.org/abs/2402.11481v1 ) ライセンス: Link先を確認	YiQiu Guo, Yuchen Yang, Ya Zhang, Yu Wang, Yanfeng Wang	(参考訳) 構造化データは、情報の組織化のための洗練されたメカニズムを提供する。大規模言語モデルの文脈における構造化データのテキストシリアライズのための既存の手法は、キー値構造化データに固有の不均一性に適切に対処できない。これらの手法は理想的ではなく、しばしば入力サイズが大きくなり、入力変更への適応性が低い。本稿では,医学検査報告などのキーバリュー構造化データのモデリングの改善を目的とした,医療診断のための革新的なフレームワークであるDictLLMを紹介する。 DictLLMは,(1)置換不変性を維持するためのグループ位置符号化,(2)構造化データの固有バイアスを捉える階層的注意バイアス,(3)辞書エンコーダが生成する埋め込みをLCMに整列させる最適な輸送アライメント層,の3つの重要な構成要素を統合し,固定長仮想トークンのシーケンスを生成する。診断自動生成のための総合的実世界医学実験室レポートデータセット上で,様々なllmモデルを用いた実験を行い,この結果から,ディクセルムはルージュlと知識f1得点の両方において,確立されたベースライン法と少数ショットgpt-4実装を有意に上回っていることが示された。さらに,このフレームワークのスケーラビリティとロバスト性の評価は,医用辞書データの複雑なキー・バリューデータ構造を正確にモデル化する上で,その例外的な能力を強調する。 Structured data offers a sophisticated mechanism for the organization of information. Existing methodologies for the text-serialization of structured data in the context of large language models fail to adequately address the heterogeneity inherent in key-value structured data. These methods are not ideal and frequently result in larger input sizes and poor adaptability to input changes. In this paper, we introduce DictLLM, an innovative framework designed to improve the modeling of key-value structured data, like medical laboratory reports, for generating medical diagnoses. DictLLM integrates three key components: (1) group positional encoding to maintain permutation invariance, (2) hierarchical attention bias to capture the inherent bias in structured data, and (3) an optimal transport alignment layer that aligns the embedding generated by the dictionary encoder with the LLM, thereby producing a sequence of fixed-length virtual tokens. 
We carry out experiments using various LLM models on a comprehensive real-world medical laboratory report dataset for automatic diagnosis generation. Our findings illustrate that DictLLM significantly outperforms established baseline methods and few-shot GPT-4 implementations in terms of both Rouge-L and Knowledge F1 scores. Furthermore, our evaluation of the framework's scalability and robustness, through a series of experiments, underscores its exceptional capability in accurately modeling the complex key-value data structure of medical dictionary data.	翻訳日:2024-02-20 21:00:13 公開日:2024-02-18
# インドにおける異なるメンタルヘルス表現に関する研究 Studying Differential Mental Health Expressions in India ( http://arxiv.org/abs/2402.11477v1 ) ライセンス: Link先を確認	Khushi Shelat, Sunny Rai, Devansh R Jain, Young Min Cho, Maitreyi Redkar, Samindara Sawant, Sharath Chandra Guntuku	(参考訳) 精神社会的ストレスと精神障害の症状学は、社会文化的環境によって異なることが知られている。しかし、ソーシャルメディア上でのメンタルヘルスの表現は、主にWEIRD(Western, Educated, Industrial, Rich, Democratic)の文脈で研究されている。本稿では,インドにおける個人によるRedditのメンタルヘルス投稿を分析し,Rest of the World (ROW) のユーザと比較して,インドの文脈に特有なオンラインうつ病言語の変化を明らかにする。西洋のサンプルとは異なり、インドにおけるメンタルヘルスの議論は、悲しみ、否定の使用、現在に焦点を当てており、仕事と達成に関連している。「Illness」はインドとのみ関連しており、インドの患者の体症状と精神障害の関連を再確認している。 2人の臨床心理学者がソーシャルメディア投稿の調査結果を検証し、メンタルヘルスに関する議論に関連するトップ20のトピックの95%がインド人における「流行」であると判明した。インドにおけるオンラインメンタルヘルス関連言語における重要な言語的変化は、ROWと比較して、文化的に認識されたメンタルヘルスモデルの必要性を強調している。これらの知見は、インドにおける精神疾患の診断と治療のギャップを減少させるために文化的に適切な介入を設計する上で重要な意味を持つ。 Psychosocial stressors and the symptomatology of mental disorders are known to vary with socio-cultural environment. Mental health expressions on social media, however, are primarily informed by studies in the WEIRD (Western, Educated, Industrial, Rich, and Democratic) contexts. In this paper, we analyze mental health posts on Reddit made by individuals in India, to identify variations in online depression language specific to the Indian context compared to users from the Rest of the World (ROW). Unlike in Western samples, mental health discussions in India additionally express sadness, use negation, are present-focused, and are related to work and achievement. "Illness" is exclusively correlated to India, reaffirming the link between somatic symptoms and mental disorders in Indian patients. Two clinical psychologists validated the findings from social media posts and found 95% of the top-20 topics associated with mental health discussions as prevalent in Indians. Significant linguistic variations in online mental health-related language in India compared to ROW highlight the need for precision culturally-aware mental health models. 
These findings have important implications for designing culturally appropriate interventions to reduce the growing diagnosis and treatment gap for mental disorders in India.	翻訳日:2024-02-20 20:59:44 公開日:2024-02-18
# endoood : カプセル内視鏡診断における不確実性認識 EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis ( http://arxiv.org/abs/2402.11476v1 ) ライセンス: Link先を確認	Qiaozhi Tan, Long Bai, Guankun Wang, Mobarakol Islam, Hongliang Ren	(参考訳) wireless capsule endoscopy (wce) は消化管の可視化を可能にする非侵襲的な診断方法である。深層学習に基づく手法は、WCEデータを用いた疾患スクリーニングの有効性を示し、医療専門家の負担を軽減する。しかしながら、既存のカプセル内視鏡分類法は、主に予め定義されたカテゴリに依存しており、未定義のカテゴリや解剖学的ランドマークなど、分布外データ(ood)の識別と分類が困難である。この問題に対処するために,WCE 診断における OOD 検出課題を効果的に扱うことを目的としたEndoOOD (EndoOOD) フレームワークを提案する。提案フレームワークは,不確実性を考慮した混合訓練と長期分布データキャリブレーションを取り入れたWCE診断機能の堅牢性と信頼性の向上に重点を置いている。さらに、情報損失を最小限に抑えつつ、OODとIDデータを正確に識別するために仮想ロジットマッチングを用いる。提案手法の性能を評価するために,2つの公開データセットを用いた12の最先端(SOTA)手法の評価と比較を行った。以上の結果から,診断精度の向上と臨床意思決定支援におけるフレームワークの有効性が示された。 Wireless capsule endoscopy (WCE) is a non-invasive diagnostic procedure that enables visualization of the gastrointestinal (GI) tract. Deep learning-based methods have shown effectiveness in disease screening using WCE data, alleviating the burden on healthcare professionals. However, existing capsule endoscopy classification methods mostly rely on pre-defined categories, making it challenging to identify and classify out-of-distribution (OOD) data, such as undefined categories or anatomical landmarks. To address this issue, we propose the Endoscopy Out-of-Distribution (EndoOOD) framework, which aims to effectively handle the OOD detection challenge in WCE diagnosis. The proposed framework focuses on improving the robustness and reliability of WCE diagnostic capabilities by incorporating uncertainty-aware mixup training and long-tailed in-distribution (ID) data calibration techniques. Additionally, virtual-logit matching is employed to accurately distinguish between OOD and ID data while minimizing information loss. To assess the performance of our proposed solution, we conduct evaluations and comparisons with 12 state-of-the-art (SOTA) methods using two publicly available datasets. 
The results demonstrate the effectiveness of the proposed framework in enhancing diagnostic accuracy and supporting clinical decision-making.	翻訳日:2024-02-20 20:59:24 公開日:2024-02-18
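The abstract above names mixup training as one ingredient. The sketch below shows plain mixup only: a convex combination of two samples and their one-hot labels with a Beta-distributed coefficient. The uncertainty-aware variant used in the paper is not described in the abstract and is not reproduced here. The function names and the default `alpha` are illustrative assumptions.

```python
import random


def sample_lambda(alpha=0.4, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha); alpha is a free hyperparameter."""
    return rng.betavariate(alpha, alpha)


def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two feature vectors and their one-hot labels with weight lam in [0, 1]."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


if __name__ == "__main__":
    lam = sample_lambda(rng=random.Random(0))
    mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam)
    print(lam, mixed_x, mixed_y)
```

Because the labels are blended with the same coefficient as the inputs, the network is trained toward softer targets between classes. That is one common motivation for using mixup where calibrated confidence matters, as in OOD detection.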
# 有毒偽造顔:顔偽造検出におけるバックドア攻撃へ向けて Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery Detection ( http://arxiv.org/abs/2402.11473v1 ) ライセンス: Link先を確認	Jiawei Liang, Siyuan Liang, Aishan Liu, Xiaojun Jia, Junhao Kuang, Xiaochun Cao	(参考訳) 顔偽造技術の普及は社会に重大な関心を喚起し、顔偽造検出法の開発を動機付けている。これらの方法は、偽造された顔と本物の顔とを区別することを目的としており、実用的に有効であることが証明されている。しかし,本論文では,バックドア攻撃による顔偽造検出の新たな脅威について紹介する。バックドアをモデルに埋め込み、特定のトリガーパターンを入力に組み込むことで、攻撃者は検出器を欺き、偽造された顔に対する誤った予測を生成することができる。この目的を達成するために,顔偽造検知器に対するクリーンラベルバックドア攻撃を可能にする Poisoned Forgery Face フレームワークを提案する。提案手法は,スケーラブルなトリガジェネレータの構築と,翻訳に敏感なトリガパターンを生成するための新しいコンボルディングプロセスの利用である。また, 被毒試料のステルス性を高めるために, ランドマークベース領域に基づく相対的埋込法を適用した。その結果、有毒サンプルで訓練された検出器にはバックドアが埋め込まれる。特に,攻撃成功率 (+16.39% BD-AUC) と可視性 (-12.65% $L_\infty$) の低下により,SoTAのバックドアベースラインを超えている。さらに,攻撃は後方防御に対する有望な性能を示す。本論文は, 偽造検出シナリオにおけるバックドア攻撃による潜在的な脅威に, より注意を向けることが期待できる。我々のコードは https://github.com/JWLiang007/PFF で利用可能になる。 The proliferation of face forgery techniques has raised significant concerns within society, thereby motivating the development of face forgery detection methods. These methods aim to distinguish forged faces from genuine ones and have proven effective in practical applications. However, this paper introduces a novel and previously unrecognized threat in face forgery detection scenarios caused by backdoor attacks. By embedding backdoors into models and incorporating specific trigger patterns into the input, attackers can deceive detectors into producing erroneous predictions for forged faces. To achieve this goal, this paper proposes the Poisoned Forgery Face framework, which enables clean-label backdoor attacks on face forgery detectors. Our approach involves constructing a scalable trigger generator and utilizing a novel convolving process to generate translation-sensitive trigger patterns. Moreover, we employ a relative embedding method based on landmark-based regions to enhance the stealthiness of the poisoned samples. 
Consequently, detectors trained on our poisoned samples are embedded with backdoors. Notably, our approach surpasses SoTA backdoor baselines with a significant improvement in attack success rate (+16.39% BD-AUC) and reduction in visibility (-12.65% $L_\infty$). Furthermore, our attack exhibits promising performance against backdoor defenses. We anticipate that this paper will draw greater attention to the potential threats posed by backdoor attacks in face forgery detection scenarios. Our codes will be made available at https://github.com/JWLiang007/PFF.	翻訳日:2024-02-20 20:58:59 公開日:2024-02-18
# ddiprompt: グラフプロンプト学習に基づく薬物と薬物の相互作用イベント予測 DDIPrompt: Drug-Drug Interaction Event Prediction based on Graph Prompt Learning ( http://arxiv.org/abs/2402.11472v1 ) ライセンス: Link先を確認	Yingying Wang, Yun Xiong, Xixi Wu, Xiangguo Sun, and Jiawei Zhang	(参考訳) 近年、グラフニューラルネットワークは、薬物分子内の原子と官能基間の複雑な関連をモデル化する能力があるため、有害薬物-薬物相互作用(ddi)を予測するためにますます普及している。しかし、(1)特定の相互作用が過小評価されている医療データセットでは一般的なが重要な問題である、高度に不均衡な事象分散の問題である。この不均衡は、正確で信頼性の高いDDI予測を達成する上で大きな障壁となる。 2) まれな事象のラベル付きデータの不足は, 稀かつ潜在的に重要な相互作用が限られたデータによって見過ごされ, 過小評価される場合が多い医療分野において, 広範な問題である。これに対し、グラフプロンプトの最近の進歩に触発された革新的なパナセアであるDDIPromptを提供する。我々のフレームワークは、トレーニング済みのモデルから本質的な知識を活用することで、これらの問題に対処することを目的としており、最小限の下流データで効率的にデプロイできる。特に、最初の課題を解決するために、DDIPromptは、構造的および対話的な近接性の両方を考慮して、薬物間のリンクを増設する。分子内構造と分子間相互作用を理解する階層的な事前学習戦略を特徴とし、薬物特性の包括的で偏見のない理解を促進する。第2の課題として,推論中にprototype-enhanced prompting機構を実装した。このメカニズムは、各カテゴリの数少ない例によって洗練され、リッチな事前学習知識を効果的に活用し、予測精度を高める。 2つのベンチマークデータセットの総合評価は、DDIPromptの優位性を示し、特に稀なDDIイベントを予測する。 Recently, Graph Neural Networks have become increasingly prevalent in predicting adverse drug-drug interactions (DDI) due to their proficiency in modeling the intricate associations between atoms and functional groups within and across drug molecules. However, they are still hindered by two significant challenges: (1) the issue of highly imbalanced event distribution, which is a common but critical problem in medical datasets where certain interactions are vastly underrepresented. This imbalance poses a substantial barrier to achieving accurate and reliable DDI predictions. (2) the scarcity of labeled data for rare events, which is a pervasive issue in the medical field where rare yet potentially critical interactions are often overlooked or under-studied due to limited available data. In response, we offer DDIPrompt, an innovative panacea inspired by the recent advancements in graph prompting. 
Our framework aims to address these issues by leveraging the intrinsic knowledge from pre-trained models, which can be efficiently deployed with minimal downstream data. Specifically, to solve the first challenge, DDIPrompt employs augmented links between drugs, considering both structural and interactive proximity. It features a hierarchical pre-training strategy that comprehends intra-molecular structures and inter-molecular interactions, fostering a comprehensive and unbiased understanding of drug properties. For the second challenge, we implement a prototype-enhanced prompting mechanism during inference. This mechanism, refined by few-shot examples from each category, effectively harnesses the rich pre-training knowledge to enhance prediction accuracy, particularly for these rare but crucial interactions. Comprehensive evaluations on two benchmark datasets demonstrate the superiority of DDIPrompt, particularly in predicting rare DDI events.	翻訳日:2024-02-20 20:58:34 公開日:2024-02-18
# 変圧器テクスチャモデルにおけるトレーニングデータと対向ロバスト性との相関関係の探索 A Curious Case of Searching for the Correlation between Training Data and Adversarial Robustness of Transformer Textual Models ( http://arxiv.org/abs/2402.11469v1 ) ライセンス: Link先を確認	Cuong Dang, Dung D. Le, Thai Le	(参考訳) 既存の研究によると、微調整されたテキスト変換モデルは最先端の予測性能を実現するが、敵対的なテキスト摂動にも弱い。従来の敵対的評価は、モデルを微調整した後にのみ行われ、トレーニングデータを無視することが多い。本稿では,トレーニングデータとモデルロバスト性との間にも強い相関関係があることを証明したい。この目的のために,入力の微調整コーパス特性を表す13の異なる特徴を抽出し,それらを用いて微調整モデルの敵対的ロバスト性を予測する。主にエンコーダのみのトランスモデル BERT と RoBERTa に着目し,BART,ELECTRA,GPT2 のさらなる結果を得た上で,この議論を裏付けるさまざまな証拠を提供する。まず実証的な分析から (a)抽出した特徴をランダムフォレストなどの軽量分類器を用いて効果的に攻撃成功率を予測することができる。 (b)モデルのロバスト性に最も影響を及ぼす特徴は、ロバスト性と明確に相関する。第2に、このフレームワークは堅牢性評価のための高速かつ効果的な追加ツールとして使用できる。 (a)従来の手法と比較して30x-193xのランタイムを節約する。 (b)モデル間で転送可能である。 (c) 敵対的訓練で使用することができ、 (d)統計的ランダム性に頑健である。私たちのコードは公開されます。 Existing works have shown that fine-tuned textual transformer models achieve state-of-the-art prediction performances but are also vulnerable to adversarial text perturbations. Traditional adversarial evaluation is often done only after fine-tuning the models and ignoring the training data. In this paper, we want to prove that there is also a strong correlation between training data and model robustness. To this end, we extract 13 different features representing a wide range of input fine-tuning corpora properties and use them to predict the adversarial robustness of the fine-tuned models. Focusing mostly on encoder-only transformer models BERT and RoBERTa with additional results for BART, ELECTRA and GPT2, we provide diverse evidence to support our argument. First, empirical analyses show that (a) extracted features can be used with a lightweight classifier such as Random Forest to effectively predict the attack success rate and (b) features with the most influence on the model robustness have a clear correlation with the robustness. 
Second, our framework can be used as a fast and effective additional tool for robustness evaluation since it (a) saves 30x-193x runtime compared to the traditional technique, (b) is transferable across models, (c) can be used under adversarial training, and (d) is robust to statistical randomness. Our code will be publicly available.	翻訳日:2024-02-20 20:58:04 公開日:2024-02-18
# 長期時系列予測のためのアトラクタメモリ:カオスの視点から Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective ( http://arxiv.org/abs/2402.11463v1 ) ライセンス: Link先を確認	Jiaxi Hu, Yuehong Hu, Wei Chen, Ming Jin, Shirui Pan, Qingsong Wen, Yuxuan Liang	(参考訳) 長期時系列予測(LTSF)タスクでは、既存のディープラーニングモデルは、離散時系列が基礎となる連続力学系から生じる決定的な特性を見落とし、外挿と進化能力の欠如をもたらす。実世界のデータのカオス性を認識するモデルである Attraos は、カオス理論をLTSFに取り入れ、未知の高次元カオス力学系からの観測として実世界の時系列を知覚する。誘引的不変性の概念の下で、Attraosは提案したマルチスケール動的メモリユニットを使用して、歴史的動的構造を記憶し、周波数強調ローカル進化戦略によって予測する。詳細な理論的分析と豊富な経験的証拠は、アトラオスが主流のLTSFデータセットやカオスデータセット上で様々なLTSFメソッドより優れていることを一貫して示している。 In long-term time series forecasting (LTSF) tasks, existing deep learning models overlook the crucial characteristic that discrete time series originate from underlying continuous dynamic systems, resulting in a lack of extrapolation and evolution capabilities. Recognizing the chaotic nature of real-world data, our model, Attraos, incorporates chaos theory into LTSF, perceiving real-world time series as observations from unknown high-dimensional chaotic dynamic systems. Under the concept of attractor invariance, Attraos utilizes the proposed multi-scale dynamic memory unit to memorize historical dynamics structure and predicts by a frequency-enhanced local evolution strategy. Detailed theoretical analysis and abundant empirical evidence consistently show that Attraos outperforms various LTSF methods on mainstream LTSF datasets and chaotic datasets.	翻訳日:2024-02-20 20:57:40 公開日:2024-02-18
# FGeo-HyperGNet:形式的シンボリックシステムとハイパーグラフニューラルネットワークの統合による幾何問題 FGeo-HyperGNet: Geometry Problem Solving Integrating Formal Symbolic System and Hypergraph Neural Network ( http://arxiv.org/abs/2402.11461v1 ) ライセンス: Link先を確認	Xiaokai Zhang, Na Zhu, Yiming He, Jia Zou, Cheng Qin, Yang Li, Zhenbing Zeng, Tuo Leng	(参考訳) 幾何学的問題解決は、自動推論と人工知能の分野における長年にわたる課題である。我々は、人間のような幾何学的推論を自動実行するためのニューラルシンボリックシステムを構築しました。シンボリック部分はFormalGeo上に構築された形式的システムであり、幾何学的な関係推論と代数的計算を自動実行し、解の過程をハイパーノードとして条件とハイパーエッジとして定理を持つ解超木に整理することができる。神経部分はHyperGNetと呼ばれ、注意機構に基づくハイパーグラフニューラルネットワークであり、ハイパーツリーの構造および意味情報を効果的にエンコードするエンコーダと、問題解決ガイダンスを提供するソルバが含まれている。神経部はハイパーツリーに従って定理を予測し、記号部は定理を適用し、ハイパーツリーを更新する。実験は、このニューラルシンボリックアーキテクチャの正確性と有効性を示す。フォルマジオ7kデータセットでは、ステップワイズ精度87.65%、全体的な精度85.53%を達成した。コードとデータはhttps://github.com/BitSecret/HyperGNetで入手できる。 Geometry problem solving has always been a long-standing challenge in the fields of automated reasoning and artificial intelligence. This is the fifth article in a series of our works, in which we build a neural-symbolic system to automatically perform human-like geometric deductive reasoning. The symbolic part is a formal system built on FormalGeo, which can automatically perform geometric relational reasoning and algebraic calculations and organize the solving process into a solution hypertree with conditions as hypernodes and theorems as hyperedges. The neural part, called HyperGNet, is a hypergraph neural network based on the attention mechanism, including an encoder to effectively encode the structural and semantic information of the hypertree, and a solver to provide problem-solving guidance. The neural part predicts theorems according to the hypertree, and the symbolic part applies theorems and updates the hypertree, thus forming a Predict-Apply Cycle to ultimately achieve readable and traceable automatic solving of geometric problems. Experiments demonstrate the correctness and effectiveness of this neural-symbolic architecture. 
We achieved a step-wise accuracy of 87.65% and an overall accuracy of 85.53% on the formalgeo7k datasets. The code and data are available at https://github.com/BitSecret/HyperGNet.	翻訳日:2024-02-20 20:57:13 公開日:2024-02-18
# 時空間知識グラフに関する質問応答 Question Answering Over Spatio-Temporal Knowledge Graph ( http://arxiv.org/abs/2402.11542v1 ) ライセンス: Link先を確認	Xinbang Dai, Huiying Li, Guilin Qi	(参考訳) 時空間知識グラフ(STKG)は、時間と位置情報を組み込んだ知識グラフ(KG)の概念を拡張している。知識グラフ質問応答(KGQA)を研究コミュニティが重視する一方で、STKGに基づく空間情報と時間情報の両方を取り入れた質問への回答の分野は、ほとんど未開拓である。さらに、包括的なデータセットの欠如は、この分野の進歩を妨げている。この問題に対処するために、時空間知識グラフ質問応答(STKGQA)のための1万の自然言語質問からなるデータセットSTQADを提案する。残念なことに、最先端のKGQAアプローチは、我々のデータセットで十分なパフォーマンスを達成するには程遠い。そこで本研究では,STComplExという新しいSTKG埋め込み手法を用いた時空間KGQA手法であるSTCQAを提案する。質問から時間的・空間的な情報を抽出することにより、質問をよりよく理解し、STKGから正確な回答を得ることができる。大規模な実験を通じて、データセットの品質とSTKGQA法の有効性を実証した。 Spatio-temporal knowledge graphs (STKGs) extend the concept of knowledge graphs (KGs) by incorporating time and location information. While the research community has focused on Knowledge Graph Question Answering (KGQA), the field of answering questions that incorporate both spatial and temporal information based on STKGs remains largely unexplored. Furthermore, a lack of comprehensive datasets has also hindered progress in this area. To address this issue, we present STQAD, a dataset comprising 10,000 natural language questions for spatio-temporal knowledge graph question answering (STKGQA). Unfortunately, various state-of-the-art KGQA approaches fall far short of achieving satisfactory performance on our dataset. In response, we propose STCQA, a new spatio-temporal KGQA approach that utilizes a novel STKG embedding method named STComplEx. By extracting temporal and spatial information from a question, our QA model can better comprehend the question and retrieve accurate answers from the STKG. Through extensive experiments, we demonstrate the quality of our dataset and the effectiveness of our STKGQA method.	翻訳日:2024-02-20 20:48:42 公開日:2024-02-18
# 大語彙アラビアリブディングのための視覚的特徴と幾何学的特徴の相互注意融合 Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading ( http://arxiv.org/abs/2402.11520v1 ) ライセンス: Link先を確認	Samar Daou, Ahmed Rekik, Achraf Ben-Hamadou, Abdelaziz Kallel	(参考訳) リップリーディングは、唇とその周辺領域の動きを分析することによって、音声の認識に視覚データを使用する。これは、人間と機械の相互作用や音声認識の強化など、多くの潜在的な応用に関する熱い研究トピックである。近年の深層学習に基づく研究は,口域から抽出した視覚的特徴を唇輪郭の目印点と統合することを目的としている。しかし、結合のような単純な組み合わせ法は最適な特徴ベクトルを得るための最も効果的なアプローチではないかもしれない。まず,この課題に対処するために,大語彙レキシコン語彙によるビデオ中の発話単語の予測のためのクロス・アテンション・フュージョンに基づくアプローチを提案する。本手法は,視覚的特徴と幾何学的特徴を効率的に統合するために,クロスアテンションネットワークのパワーを利用する。第二に, アラビア語 (lrw-ar) 用に, 36名の話者が発話する100語クラスの2万本のビデオを含む大規模リップリーディングを初めて紹介する。 lrw-ar と arabic visual database で得られた実験結果は,提案手法の有効性と頑健性を示した。私たちの研究は、アラビア語にリップリード技術を適用する可能性と有効性について洞察を与え、この分野におけるさらなる研究の扉を開く。プロジェクトページへのリンク: https://crns-smartvision.github.io/lrwar Lipreading involves using visual data to recognize spoken words by analyzing the movements of the lips and surrounding area. It is a hot research topic with many potential applications, such as human-machine interaction and enhancing audio speech recognition. Recent deep-learning based works aim to integrate visual features extracted from the mouth region with landmark points on the lip contours. However, employing a simple combination method such as concatenation may not be the most effective approach to get the optimal feature vector. To address this challenge, firstly, we propose a cross-attention fusion-based approach for large lexicon Arabic vocabulary to predict spoken words in videos. Our method leverages the power of cross-attention networks to efficiently integrate visual and geometric features computed on the mouth region. Secondly, we introduce the first large-scale Lip Reading in the Wild for Arabic (LRW-AR) dataset containing 20,000 videos for 100-word classes, uttered by 36 speakers. 
The experimental results obtained on LRW-AR and ArabicVisual databases showed the effectiveness and robustness of the proposed approach in recognizing Arabic words. Our work provides insights into the feasibility and effectiveness of applying lipreading techniques to the Arabic language, opening doors for further research in this field. Link to the project page: https://crns-smartvision.github.io/lrwar	翻訳日:2024-02-20 20:48:24 公開日:2024-02-18
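The fusion step described above can be illustrated with a bare scaled dot-product cross-attention, in which one feature stream supplies the queries and the other supplies the keys and values. The sketch is generic: it omits learned projections, multiple heads, and everything specific to the paper's network. Treating visual features as queries and landmark (geometric) features as keys and values is an assumption made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """For each query vector, return an attention-weighted sum of the value vectors.

    queries : list of vectors from stream A (assumed here: visual features)
    keys    : list of vectors from stream B (assumed here: geometric features)
    values  : list of vectors from stream B, one per key
    """
    dim = len(keys[0])
    fused = []
    for q in queries:
        # Scaled dot-product score between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


if __name__ == "__main__":
    visual = [[0.0, 0.0], [1.0, 0.0]]
    geometric_keys = [[1.0, 0.0], [0.0, 1.0]]
    geometric_values = [[2.0, 0.0], [0.0, 4.0]]
    print(cross_attention(visual, geometric_keys, geometric_values))
```

Unlike concatenation, each output vector here depends on how strongly a feature in one stream matches each feature in the other. This is the property the abstract contrasts with simple combination methods.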
# 異種情報ネットワークにおける大規模言語モデル駆動型メタ構造発見 Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network ( http://arxiv.org/abs/2402.11518v1 ) ライセンス: Link先を確認	Lin Chen, Fengli Xu, Nian Li, Zhenyu Han, Meng Wang, Yong Li, Pan Hui	(参考訳) 不均一情報ネットワーク(HIN)は、多様なタイプのノード間の複雑な関係を捉えることができることで人気が高まっている。メタ構造は、豊かな意味情報を抽出し、グラフニューラルネットワークが表現表現を学ぶのに有効であることが証明されたHINに関する重要な関係パターンを特定するために提案された。しかし,手作りのメタ構造はスケールアップの難しさを招き,自動メタ構造探索アルゴリズムの開発に広く研究されている。以前の取り組みは、説明可能性を見越して、経験的予測性能の優れたメタ構造を探索することに集中していた。したがって、それらはしばしば、過度に適合し、人間には理解できないメタ構造を生み出す。これに対処するため、私たちは大言語モデル(llm)の創発的な推論能力からインスピレーションを得ます。本稿では,LLM推論を進化過程に統合したReasoning meta-STRUCTure search(ReStruct)フレームワークを提案する。 ReStructは文法トランスレータを使用して、メタ構造を自然言語文にエンコードし、LLMの推論能力を利用して意味論的に可能なメタ構造を評価する。 ReStructはパフォーマンス指向の進化操作も採用している。これら2つの競合する力は、メタ構造の意味的説明可能性と経験的性能を共同で最適化する。また,発見したメタ構造を自然言語で説明できる差分LLM説明器を設計し,検索履歴を通した推論により説明を洗練する。 5つのデータセットの実験では、ノード分類とリンクレコメンデーションタスクにおいて、ReStructがSOTAのパフォーマンスを達成することを示した。さらに、73人の大学院生を対象にした調査では、ReStructが生み出したメタ構造や自然言語の説明が理解しやすくなっている。 Heterogeneous information networks (HIN) have gained increasing popularity for being able to capture complex relations between nodes of diverse types. Meta-structure was proposed to identify important patterns of relations on HIN, which has been proven effective for extracting rich semantic information and facilitating graph neural networks to learn expressive representations. However, hand-crafted meta-structures pose challenges for scaling up, which draws wide research attention for developing automatic meta-structure search algorithms. Previous efforts concentrate on searching for meta-structures with good empirical prediction performance, overlooking explainability. Thus, they often produce meta-structures prone to overfitting and incomprehensible to humans. To address this, we draw inspiration from the emergent reasoning abilities of large language models (LLMs). 
We propose a novel REasoning meta-STRUCTure search (ReStruct) framework that integrates LLM reasoning into the evolutionary procedure. ReStruct uses a grammar translator to encode meta-structures into natural language sentences, and leverages the reasoning power of LLMs to evaluate semantically feasible meta-structures. ReStruct also employs performance-oriented evolutionary operations. These two competing forces jointly optimize for semantic explainability and empirical performance of meta-structures. We also design a differential LLM explainer that can produce natural language explanations for the discovered meta-structures, and refine the explanation by reasoning through the search history. Experiments on five datasets demonstrate that ReStruct achieves SOTA performance in node classification and link recommendation tasks. Additionally, a survey study involving 73 graduate students shows that the meta-structures and natural language explanations generated by ReStruct are substantially more comprehensible.	翻訳日:2024-02-20 20:48:01 公開日:2024-02-18
# Knowledge-to-SQL: データエキスパートLLMによるSQL生成の強化 Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM ( http://arxiv.org/abs/2402.11517v1 ) ライセンス: Link先を確認	Zijin Hong, Zheng Yuan, Hao Chen, Qinggang Zhang, Feiran Huang, Xiao Huang	(参考訳) ユーザクエリ(text-to-SQL)に対する正確なSQLの生成は、SQLの生成がクエリとデータベースを解釈し、データベースから正確なデータを取得する必要があるため、長年にわたる問題である。既存のモデルはデータベーススキーマに従ってSQLを生成するためのLLM(Large Language Models)の包括的な能力に依存している。しかし、データベーススキーマに明示的に含まれていない、あるいはllmsによって学習された必要な知識がある。したがって、生成した知識不足クエリのsqlは不正確であり、テキスト対sqlモデルのロバスト性に悪影響を及ぼす可能性がある。この状況に対処するため,データエキスパートのLLM(DELLM)を用いて,すべてのタイプのテキスト・トゥ・SQLモデルに有用な知識を提供するKnowledge-to-SQLフレームワークを提案する。具体的には,DELLMの詳細設計とテーブル読解,および基礎的な微調整プロセスについて述べる。さらに、データベースフィードバックによる強化学習(RLDBF)のトレーニング戦略を提供し、DELLMを誘導し、LLMのより有用な知識を生成する。大規模な実験により、DELLMはテキストからSQLタスクにおける最先端のLLMを強化することができる。 DELLMのモデル構造とパラメータ重量は、さらなる研究のために公表される。 Generating accurate SQL for user queries (text-to-SQL) is a long-standing problem since the generation of the SQL requires comprehending the query and database and retrieving the accurate data from the database accordingly. Existing models rely on the comprehensive ability of Large Language Models (LLMs) to generate the SQL according to the database schema. However, there is some necessary knowledge that is not explicitly included in the database schema or has been learned by LLMs. Thus, the generated SQL of the knowledge-insufficient queries may be inaccurate, which negatively impacts the robustness of the text-to-SQL models. To deal with this situation, we propose the Knowledge-to-SQL framework, which employs tailored Data Expert LLM (DELLM) to provide helpful knowledge for all types of text-to-SQL models. Specifically, we provide the detailed design of DELLM, in terms of table reading, and the basic fine-tuning process. We further provide a Reinforcement Learning via Database Feedback (RLDBF) training strategy to guide the DELLM to generate more helpful knowledge for LLMs. 
Extensive experiments verify that DELLM can enhance the state-of-the-art LLMs on text-to-SQL tasks. The model structure and the parameter weight of DELLM are released for further research.	翻訳日:2024-02-20 20:47:34 公開日:2024-02-18
# 深層強化学習に基づく計算流体力学におけるアクティブフロー制御のための最適並列化戦略 Optimal Parallelization Strategies for Active Flow Control in Deep Reinforcement Learning-Based Computational Fluid Dynamics ( http://arxiv.org/abs/2402.11515v1 ) ライセンス: Link先を確認	Wang Jia and Hang Xu	(参考訳) Deep Reinforcement Learning (DRL) は、高ダイナミックかつ非線形なアクティブフロー制御(AFC)問題を扱うための有望なアプローチとして登場した。しかし、DRLモデルのトレーニングに伴う計算コストは、大きなパフォーマンスボトルネックとなる。この課題に対処し、高性能コンピューティングアーキテクチャの効率的なスケーリングを実現するために、DRLベースのアルゴリズムを並列設定で最適化することに焦点を当てた。我々は、AFC問題に使用される既存の最先端DRLフレームワークを検証し、その効率ボトルネックについて議論する。その後、フレームワーク全体を分解し、個々のコンポーネントの広範なスケーラビリティベンチマークを行うことで、様々なハイブリッド並列化構成を調査し、効率的な並列化戦略を提案する。さらに,複数環境drlトレーニングにおける入出力(i/o)操作を洗練し,データ移動に伴う重要なオーバーヘッドに対処する。最後に,一般的なafc問題に対して最適化されたフレームワークを示し,そのフレームワーク全体に対してニアリニアスケーリングを求める。並列効率を約49%から約78%に大幅に向上させ,60cpuコアを用いて約47倍高速化した。これらの知見は、DRLに基づくAFC研究のさらなる進歩に有用な知見をもたらすことが期待されている。 Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems. However, the computational cost associated with training DRL models presents a significant performance bottleneck. To address this challenge and enable efficient scaling on high-performance computing architectures, this study focuses on optimizing DRL-based algorithms in parallel settings. We validate an existing state-of-the-art DRL framework used for AFC problems and discuss its efficiency bottlenecks. Subsequently, by deconstructing the overall framework and conducting extensive scalability benchmarks for individual components, we investigate various hybrid parallelization configurations and propose efficient parallelization strategies. Moreover, we refine input/output (I/O) operations in multi-environment DRL training to tackle critical overhead associated with data movement. Finally, we demonstrate the optimized framework for a typical AFC problem where near-linear scaling can be obtained for the overall framework. 
We achieve a significant boost in parallel efficiency from around 49% to approximately 78%, and the training process is accelerated by approximately 47 times using 60 CPU cores. These findings are expected to provide valuable insights for further advancements in DRL-based AFC studies.	翻訳日:2024-02-20 20:47:12 公開日:2024-02-18
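The two headline numbers in the abstract above are linked by the usual definition of parallel efficiency, speedup divided by the number of cores. The short sketch below checks that a 47x speedup on 60 CPU cores corresponds to roughly 78% efficiency. The function name is an illustrative assumption, and the abstract does not state the core count behind the 49% baseline figure, so only the optimized case is computed.

```python
def parallel_efficiency(speedup, cores):
    """Parallel efficiency = achieved speedup / number of cores (1.0 means ideal linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


if __name__ == "__main__":
    # Figures quoted in the abstract: about 47x speedup on 60 CPU cores.
    print(f"{parallel_efficiency(47, 60):.1%}")  # prints 78.3%
```

An efficiency of 1.0 would mean perfectly linear scaling. The gap to 1.0 is the fraction of core time lost to communication, I/O, and other serial overhead.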
# 偏見からパリティへ: 大きな言語モデルによる単語埋め込みのデバイアスに対する新しいアプローチ From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings ( http://arxiv.org/abs/2402.11512v1 ) ライセンス: Link先を確認	Aishik Rakshit, Smriti Singh, Shuvam Keshari, Arijit Ghosh Chowdhury, Vinija Jain, Aman Chadha	(参考訳) 埋め込みは、大規模言語モデルの有効性において重要な役割を果たす。これらのモデルが文脈的関係を把握し、言語に対するよりニュアンス的な理解を育み、その結果、人間言語の基本的な理解を必要とする多くの複雑なタスクにおいて、著しく機能する基盤となる。これらの埋め込み自体がしばしばバイアスを反映または表象していることを考えると、これらのモデルが必然的にこのバイアスを学習する理由である。本研究では,これまでの精巧な研究に基づいて,ニューラルネットワークを用いて「ソフトデバイアス」を行うアルゴリズムであるDeepSoftDebiasを提案する。我々はこのアルゴリズムを様々なSOTAデータセット、精度メトリクス、難解なNLPタスクで徹底的に評価する。 DeepSoftDebiasは、性別、人種、宗教の偏見を減らし、最先端の手法よりも優れています。 Embeddings play a pivotal role in the efficacy of Large Language Models. They are the bedrock on which these models grasp contextual relationships and foster a more nuanced understanding of language and consequently perform remarkably on a plethora of complex tasks that require a fundamental understanding of human language. Given that these embeddings themselves often reflect or exhibit bias, it stands to reason that these models may also inadvertently learn this bias. In this work, we build on the seminal previous work and propose DeepSoftDebias, an algorithm that uses a neural network to perform "soft debiasing". We exhaustively evaluate this algorithm across a variety of SOTA datasets, accuracy metrics, and challenging NLP tasks. We find that DeepSoftDebias outperforms the current state-of-the-art methods at reducing bias across gender, race, and religion.	翻訳日:2024-02-20 20:46:50 公開日:2024-02-18
# 胸部X線セグメンテーションマスクにおける肺領域の過小評価 : CTによる肺総容積評価との比較 Underestimation of lung regions on chest X-ray segmentation masks assessed by comparison with total lung volume evaluated on computed tomography ( http://arxiv.org/abs/2402.11510v1 ) ライセンス: Link先を確認	Przemysław Bombiński, Patryk Szatkowski, Bartłomiej Sobieski, Tymoteusz Kwieciński, Szymon Płotka, Mariusz Adamek, Marcin Banasiuk, Mariusz I. Furmanek, Przemysław Biecek	(参考訳) 肺マスクの作成には明確な基準や基準が欠如しており、アノテータ間の主観性が高い。本研究では, 胸部x線分画マスクにおける肺領域の過小評価を, 胸部x線分画マスクの現況とctで評価した肺総量との比較により検討した。肺x線マスクは, 心臓, 縦隔, 横隔膜の輪郭を追尾し, 肺領域を著しく過小評価し, 肺のかなりの部分をさらなる評価から排除し, 臨床上の誤りを多数生じさせる可能性がある。 Lung mask creation lacks well-defined criteria and standardized guidelines, leading to a high degree of subjectivity between annotators. In this study, we assess the underestimation of lung regions on chest X-ray segmentation masks created according to the current state-of-the-art method, by comparison with total lung volume evaluated on computed tomography (CT). We show that lung X-ray masks created by following the contours of the heart, mediastinum, and diaphragm significantly underestimate lung regions and exclude substantial portions of the lungs from further assessment, which may result in numerous clinical errors.	翻訳日:2024-02-20 20:46:34 公開日:2024-02-18
# MAL:自己監督深度推定のための時間・蒸留ヒント付き運動認識損失 MAL: Motion-Aware Loss with Temporal and Distillation Hints for Self-Supervised Depth Estimation ( http://arxiv.org/abs/2402.11507v1 ) ライセンス: Link先を確認	Yup-Jiang Dong, Fang-Lue Zhang, Song-Hai Zhang	(参考訳) 深度知覚は、幅広いロボット応用に不可欠である。大規模でラベルのない実世界のデータを活用できるため,多フレーム自己監督深度推定手法が研究の関心を集めている。しかし、自己教師型手法は静的シーンの仮定に依存し、動的環境において性能が劣化する傾向がある。そこで本研究では,連続した入力フレーム間の時間的関係と,教師と生徒ネットワーク間の新たな蒸留方式を活用し,マルチフレーム自己教師奥行き推定手法を提案する。具体的には,移動物体の空間位置と入力フレームの時間順序を関連付け,物体の動きによる誤差を解消する。一方,マルチフレーム方式では元の蒸留スキームを強化し,教師ネットワークからの知識をより活用する。 MALはマルチフレームの自己監督型単眼深度推定手法にシームレスに統合するために設計された新しいプラグアンドプレイモジュールである。従来の最先端手法にMALを追加すると、KITTIとCityScapesベンチマークでそれぞれ4.2%と10.8%の深さ推定誤差が減少する。 Depth perception is crucial for a wide range of robotic applications. Multi-frame self-supervised depth estimation methods have gained research interest due to their ability to leverage large-scale, unlabeled real-world data. However, the self-supervised methods often rely on the assumption of a static scene and their performance tends to degrade in dynamic environments. To address this issue, we present Motion-Aware Loss, which leverages the temporal relation among consecutive input frames and a novel distillation scheme between the teacher and student networks in the multi-frame self-supervised depth estimation methods. Specifically, we associate the spatial locations of moving objects with the temporal order of input frames to eliminate errors induced by object motion. Meanwhile, we enhance the original distillation scheme in multi-frame methods to better exploit the knowledge from a teacher network. MAL is a novel, plug-and-play module designed for seamless integration into multi-frame self-supervised monocular depth estimation methods. Adding MAL into previous state-of-the-art methods leads to a reduction in depth estimation errors by up to 4.2% and 10.8% on KITTI and CityScapes benchmarks, respectively.	翻訳日:2024-02-20 20:46:20 公開日:2024-02-18
# 不均質な言語課題とクライアントリソースに基づく大規模言語モデルのフェデレーション微調整 Federated Fine-tuning of Large Language Models under Heterogeneous Language Tasks and Client Resources ( http://arxiv.org/abs/2402.11505v1 ) ライセンス: Link先を確認	Jiamu Bai, Daoyuan Chen, Bingchen Qian, Liuyi Yao, Yaliang Li	(参考訳) Federated Learning (FL) は、最近、LLM(Large Language Models)のパラメータ効率の高い微調整に応用されている。将来性はあるものの、クライアントの異種リソースとデータ分散による大きな課題を提起する。本研究では、LLMの微調整のためのシンプルで効果的な集約スキームFlexLoRAを紹介している。これは、リソース不足の参加者の能力に結びつけることで、豊富なリソースを持つクライアントの可能性を制限する従来のFLの「バケット効果」を緩和する。 FlexLoRAはローカルなLoRAランクの動的調整を可能にし、より広範でタスク固有の知識の少ないグローバルモデルの開発を促進する。個々のクライアントからのコントリビューションからフルサイズのLoRA重みを合成し、重量再分配にSingular Value Decomposition(SVD)を採用することで、FlexLoRAは異種クライアントリソースを完全に活用する。 1,600以上のクライアントが多様なNLPタスクを担い、この実験はFlexLoRAの有効性を検証し、フェデレートされたグローバルモデルにより、下流のNLPタスクのパフォーマンスが3.1%向上した。 FlexLoRAの実用性は、既存のLoRAベースのFLメソッドとシームレスに統合され、LLMのスケーラブルでプライバシ保護されたフェデレーションチューニングへの道を提供する理論解析によってさらに強調されている。 Federated Learning (FL) has recently been applied to the parameter-efficient fine-tuning of Large Language Models (LLMs). While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning, which mitigates the "buckets effect" in traditional FL that restricts the potential of clients with ample resources by tying them to the capabilities of the least-resourced participants. FlexLoRA allows for dynamic adjustment of local LoRA ranks, fostering the development of a global model imbued with broader, less task-specific knowledge. By synthesizing a full-size LoRA weight from individual client contributions and employing Singular Value Decomposition (SVD) for weight redistribution, FlexLoRA fully leverages heterogeneous client resources. 
Involving over 1,600 clients performing diverse NLP tasks, our experiments validate the efficacy of FlexLoRA, with the federated global model achieving up to a 3.1% average improvement in downstream NLP task performance. FlexLoRA's practicality is further underscored by its seamless integration with existing LoRA-based FL methods and theoretical analysis, offering a path toward scalable, privacy-preserving federated tuning for LLMs.	翻訳日:2024-02-20 20:45:59 公開日:2024-02-18
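One possible reading of the FlexLoRA aggregation step described in the abstract above, written out as equations. The abstract only states that a full-size LoRA weight is synthesized from client contributions and redistributed via SVD; the data-size weights $n_i/N$ and the choice to fold the singular values into the left factor are assumptions made here for illustration and may differ from the paper.

```latex
% Client i holds LoRA factors B_i (d x r_i) and A_i (r_i x k) with its own rank r_i.
\begin{align}
  W_i &= B_i A_i
      && \text{full-size update of client } i \\
  W_g &= \sum_{i=1}^{M} \frac{n_i}{N}\, W_i, \qquad N = \sum_{i=1}^{M} n_i
      && \text{weighted aggregation (assumed weights)} \\
  W_g &= U \Sigma V^{\top}
      && \text{singular value decomposition} \\
  B_i' &= U_{[:,\,1:r_i]}\, \Sigma_{[1:r_i,\,1:r_i]}, \qquad
  A_i' = V^{\top}_{[1:r_i,\,:]}
      && \text{rank-}r_i \text{ truncation sent back to client } i
\end{align}
```

Under this sketch, a client with a larger rank $r_i$ receives a closer approximation of $W_g$, which is how heterogeneous ranks can coexist in one federation.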
# プロプライエタリなストリートビュー画像を(健康と場所)調査で使うか、使わないか? それが問題です To use or not to use proprietary street view images in (health and place) research? That is the question ( http://arxiv.org/abs/2402.11504v1 ) ライセンス: Link先を確認	Marco Helbich, Matthew Danish, SM Labib, Britta Ricker	(参考訳) コンピュータビジョンに基づくストリートビュー画像の解析は環境アセスメントに変化をもたらす。インタラクティブなWebサービス、特にGoogleストリートビューは、画像データをユビキタスにするための重要な役割を果たす。何百万ものGoogleストリートビュー画像を利用する技術的容易さにもかかわらず、この記事は、このプロプライエタリなデータソースを使用する際の現在のプラクティスに疑問を投げかける。画像の大量ダウンロードやストリートビュー画像ベースのインデックスの生成を禁止しているGoogleのサービス規約に懸念があります。データライセンス契約と法的整合性を維持しつつ研究を画期的に進めることによる社会の発展の課題を解決するためには、オープンデータ原則に固執し、将来の研究にオープンイメージソースを活用することが不可欠である。 Computer vision-based analysis of street view imagery has transformative impacts on environmental assessments. Interactive web services, particularly Google Street View, play an ever-important role in making imagery data ubiquitous. Despite the technical ease of harnessing millions of Google Street View images, this article questions the current practices in using this proprietary data source. Our concern lies with Google's terms of service, which prohibit bulk image downloads and the generation of street view image-based indices. To reconcile the challenge of advancing society through groundbreaking research while maintaining data license agreements and legal integrity, it is crucial to adhere to open data principles and utilize open image sources for future research.	翻訳日:2024-02-20 20:45:33 公開日:2024-02-18
# GenAD: 次世代のエンドツーエンド自動運転 GenAD: Generative End-to-End Autonomous Driving ( http://arxiv.org/abs/2402.11502v1 ) ライセンス: Link先を確認	Wenzhao Zheng, Ruiqi Song, Xianda Guo, Long Chen	(参考訳) 生センサによる計画結果を直接生成することは、自動運転の長年望まれてきたソリューションであり、近年注目を集めている。既存のエンドツーエンドの自動運転手法の多くは、この問題を知覚、運動予測、計画に分解している。しかし、従来のプログレッシブパイプラインは、例えば、エゴカーと他の交通参加者と、それ以前の構造軌道との間の将来の相互作用など、交通進化過程全体を包括的にモデル化することはできない。本稿では,エゴカーと周辺環境が過去の場面でどのように進化するかを予測するために,エンド・ツー・エンドの自動運転の新しいパラダイムを探求する。我々は、自律運転を生成モデル問題に投入する生成フレームワークGenADを提案する。まず,周辺シーンをmap-awareインスタンストークンに変換するインスタンス中心のシーントークン化器を提案する。次に、変動オートエンコーダを用いて、軌道先行モデリングのための構造潜在空間における将来の軌道分布を学習する。さらに, 潜伏空間におけるエージェントとエゴの動きを捉えるための時間モデルを採用し, より効果的な将来の軌跡を生成する。最後にgenadは、インスタンストークンに条件付けされた学習構造潜在空間の分布をサンプリングし、学習時間モデルを使用して未来を生成することで、動作予測と計画を同時に行う。広く使用されているnuScenesベンチマークの大規模な実験により、提案されたGenADは、高効率でビジョン中心のエンドツーエンド自動運転における最先端のパフォーマンスを達成することが示された。 Directly producing planning results from raw sensors has been a long-desired solution for autonomous driving and has attracted increasing attention recently. Most existing end-to-end autonomous driving methods factorize this problem into perception, motion prediction, and planning. However, we argue that the conventional progressive pipeline still cannot comprehensively model the entire traffic evolution process, e.g., the future interaction between the ego car and other traffic participants and the structural trajectory prior. In this paper, we explore a new paradigm for end-to-end autonomous driving, where the key is to predict how the ego car and the surroundings evolve given past scenes. We propose GenAD, a generative framework that casts autonomous driving into a generative modeling problem. We propose an instance-centric scene tokenizer that first transforms the surrounding scenes into map-aware instance tokens. We then employ a variational autoencoder to learn the future trajectory distribution in a structural latent space for trajectory prior modeling. 
We further adopt a temporal model to capture the agent and ego movements in the latent space to generate more effective future trajectories. GenAD finally simultaneously performs motion prediction and planning by sampling distributions in the learned structural latent space conditioned on the instance tokens and using the learned temporal model to generate futures. Extensive experiments on the widely used nuScenes benchmark show that the proposed GenAD achieves state-of-the-art performance on vision-centric end-to-end autonomous driving with high efficiency.	翻訳日:2024-02-20 20:45:19 公開日:2024-02-18
# 基礎モデルを用いた複雑なロボット指導の検証 Verifiably Following Complex Robot Instructions with Foundation Models ( http://arxiv.org/abs/2402.11498v1 ) ライセンス: Link先を確認	Benedict Quartey, Eric Rosen, Stefanie Tellex, George Konidaris	(参考訳) 複雑な自然言語命令に従うロボットの開発は、重要な課題である。人々は柔軟に制約を表現し、任意のランドマークを参照し、ロボットに指示するときの行動を検証することを望んでいます。逆に、ロボットは人間の指示を、現実世界の仕様や地上の指示にあいまいにする必要がある。動作計画のための言語指導基盤(LIMP: Language Instruction Grounding for Motion Planning)を提案する。これは、基本モデルと時間論理を利用して、ロボットがオープンな語彙参照と複雑な時空間制約を持つ表現的・長期的指示を確実に追従できるように、指示条件付きセマンティックマップを生成するシステムである。ロボットタスクの実行において基礎モデルを使用する従来の方法とは対照的に、LIMPは、インストラクターの意図する動機とロボットのアライメントを明らかにする説明可能な指示表現を構築し、正しいロボット動作の合成を行う。 LIMPは,35の複雑な時空間命令からなる実世界の3つの環境において,我々のアプローチの一般化と新規な非構造ドメインへの展開の容易さを示す。実験では,オープンボキャブラリーレファレンスを空間的に接地し,対象方向ナビゲーションの90%と移動操作命令の71%で制約満足プランを合成する。補足ビデオはhttps://robotlimp.github.io Enabling robots to follow complex natural language instructions is an important yet challenging problem. People want to flexibly express constraints, refer to arbitrary landmarks and verify behavior when instructing robots. Conversely, robots must disambiguate human instructions into specifications and ground instruction referents in the real world. We propose Language Instruction grounding for Motion Planning (LIMP), a system that leverages foundation models and temporal logics to generate instruction-conditioned semantic maps that enable robots to verifiably follow expressive and long-horizon instructions with open vocabulary referents and complex spatiotemporal constraints. In contrast to prior methods for using foundation models in robot task execution, LIMP constructs an explainable instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of robot behaviors that are correct-by-construction. We demonstrate LIMP in three real-world environments, across a set of 35 complex spatiotemporal instructions, showing the generality of our approach and the ease of deployment in novel unstructured domains. 
In our experiments, LIMP can spatially ground open-vocabulary referents and synthesize constraint-satisfying plans in 90% of object-goal navigation and 71% of mobile manipulation instructions. See supplementary videos at https://robotlimp.github.io	翻訳日:2024-02-20 20:44:54 公開日:2024-02-18
# 多視点自己教師付き学習と2段階前訓練による甲状腺超音波診断の改善 Thyroid ultrasound diagnosis improvement via multi-view self-supervised learning and two-stage pre-training ( http://arxiv.org/abs/2402.11497v1 ) ライセンス: Link先を確認	Jian Wang, Xin Yang, Xiaohong Jia, Wufeng Xue, Rusi Chen, Yanlin Chen, Xiliang Zhu, Lian Liu, Yan Cao, Jianqiao Zhou, Dong Ni, Ning Gu	(参考訳) 超音波画像の甲状腺結節分類とセグメンテーションはコンピュータ支援診断において重要であるが,ラベル付きデータ不足による限界に直面している。そこで本研究では, 甲状腺結節分類と分節性能を改善するための多視点コントラスト型自己教師あり方式を提案する。本手法は,同一結節の横断的および縦方向の視野を整合させ,結節領域に焦点を合わせることを可能にする。我々は、ペアデータの制限を取り除く適応的損失関数を設計した。さらに,imagenetおよび甲状腺超音波画像の事前訓練を活用すべく,2段階の事前訓練を行った。複数のセンターから収集した大規模データセット上で大規模な実験を行った。提案手法は,手動ラベルの限定による結節分類とセグメンテーション性能を著しく向上し,最先端の自己管理手法よりも優れていた。 2段階の事前トレーニングもImageNetの事前トレーニングをはるかに上回った。 Thyroid nodule classification and segmentation in ultrasound images are crucial for computer-aided diagnosis; however, they face limitations owing to insufficient labeled data. In this study, we proposed a multi-view contrastive self-supervised method to improve thyroid nodule classification and segmentation performance with limited manual labels. Our method aligns the transverse and longitudinal views of the same nodule, thereby enabling the model to focus more on the nodule area. We designed an adaptive loss function that eliminates the limitations of the paired data. Additionally, we adopted a two-stage pre-training to exploit the pre-training on ImageNet and thyroid ultrasound images. Extensive experiments were conducted on a large-scale dataset collected from multiple centers. The results showed that the proposed method significantly improves nodule classification and segmentation performance with limited manual labels and outperforms state-of-the-art self-supervised methods. The two-stage pre-training also significantly exceeded ImageNet pre-training.	翻訳日:2024-02-20 20:44:30 公開日:2024-02-18
# URLBERT:URL分類のためのコントラストおよび逆順事前学習モデル URLBERT:A Contrastive and Adversarial Pre-trained Model for URL Classification ( http://arxiv.org/abs/2402.11495v1 ) ライセンス: Link先を確認	Yujie Li, Yanbin Wang, Haitao Xu, Zhenhao Guo, Zheng Cao, Lun Zhang	(参考訳) URLは、特にセキュリティ管理やオンラインレコメンデーションに関連するタスクにおいて、Webコンテンツの理解と分類において重要な役割を果たす。現在、事前訓練されたモデルは様々な分野を支配しているが、URL分析の領域には特別な事前訓練されたモデルがない。このギャップに対処するために、様々なURL分類や検出タスクに適用された最初の事前学習型表現学習モデルであるURLBERTを紹介する。私たちはまず、URLデータのトークン化に対処するために、数十億のURLのコーパスでURLトークナイザをトレーニングします。さらに,(1)同一URLの異なる変種を識別することで,モデルのURL構造理解とカテゴリー差の捕捉を強化する自己教師型コントラスト学習タスク,(2)URLから意味的特徴を抽出する際のモデルの堅牢性向上を目的とした仮想対人訓練,の2つの新しい事前学習タスクを提案する。最後に,提案手法をフィッシングURL検出,Webページ分類,広告フィルタリングなどのタスクで評価し,最先端のパフォーマンスを実現する。また, URLBERTを用いたマルチタスク学習についても検討し, 複雑なタスク要求の処理における URLBERT の単純さを示すために, URLBERT に基づくマルチタスク学習モデルが独立に調整されたモデルと同等の有効性を示した。私たちの仕事のコードは https://github.com/Davidup1/URLBERT で利用可能です。 URLs play a crucial role in understanding and categorizing web content, particularly in tasks related to security control and online recommendations. While pre-trained models are currently dominating various fields, the domain of URL analysis still lacks specialized pre-trained models. To address this gap, this paper introduces URLBERT, the first pre-trained representation learning model applied to a variety of URL classification or detection tasks. We first train a URL tokenizer on a corpus of billions of URLs to address URL data tokenization. Additionally, we propose two novel pre-training tasks: (1) self-supervised contrastive learning tasks, which strengthen the model's understanding of URL structure and the capture of category differences by distinguishing different variants of the same URL; (2) virtual adversarial training, aimed at improving the model's robustness in extracting semantic features from URLs. Finally, our proposed methods are evaluated on tasks including phishing URL detection, web page classification, and ad filtering, achieving state-of-the-art performance. 
Importantly, we also explore multi-task learning with URLBERT, and experimental results demonstrate that a multi-task learning model based on URLBERT is as effective as independently fine-tuned models, showing the simplicity of URLBERT in handling complex task requirements. The code for our work is available at https://github.com/Davidup1/URLBERT.	翻訳日:2024-02-20 20:44:14 公開日:2024-02-18
# 因果干渉によるグラフアウトオブディストリビューション一般化 Graph Out-of-Distribution Generalization via Causal Intervention ( http://arxiv.org/abs/2402.11494v1 ) ライセンス: Link先を確認	Qitian Wu, Fan Nie, Chenxiao Yang, Tianyi Bao, Junchi Yan	(参考訳) グラフニューラルネットワーク(GNN)は、分散シフトに伴うパフォーマンス劣化を示すことが多いため、アウト・オブ・ディストリビューション(OOD)の一般化は、グラフの学習に注目が集まっている。課題は、グラフ上の分散シフトがノード間の複雑な相互接続を伴い、環境ラベルがしばしばデータに欠落することである。本稿では,ボトムアップなデータ生成的視点を採用し,因果分析による重要な観察を明らかにする。後者は、egoグラフの特徴とターゲットノードのラベルの間の環境に敏感な相関を利用するようにモデルを誤解し、新たな未知ノードに対する望ましくない一般化をもたらす。この分析に基づいて,環境ラベルの事前知識を必要とせず,ノードレベルの分散シフトの下で堅牢なGNNをトレーニングするための,概念的に単純だが原則化されたアプローチを導入する。本手法は,環境推定器と熟練GNN予測器を協調する因果推論に基づく新たな学習目標を提案する。この新しいアプローチは、トレーニングデータの偏りを克服し、一般化可能な予測関係の学習を容易にする。広範な実験により,本モデルは様々な分布シフトによる一般化を効果的に促進し,グラフOOD一般化ベンチマークにおける最先端よりも27.4%の精度向上が得られることを示した。ソースコードは https://github.com/fannie1208/CaNet で入手できる。 Out-of-distribution (OOD) generalization has gained increasing attention for learning on graphs, as graph neural networks (GNNs) often exhibit performance degradation with distribution shifts. The challenge is that distribution shifts on graphs involve intricate interconnections between nodes, and the environment labels are often absent in data. In this paper, we adopt a bottom-up data-generative perspective and reveal a key observation through causal analysis: the crux of GNNs' failure in OOD generalization lies in the latent confounding bias from the environment. The latter misguides the model to leverage environment-sensitive correlations between ego-graph features and target nodes' labels, resulting in undesirable generalization on new unseen nodes. Built upon this analysis, we introduce a conceptually simple yet principled approach for training robust GNNs under node-level distribution shifts, without prior knowledge of environment labels. Our method resorts to a new learning objective derived from causal inference that coordinates an environment estimator and a mixture-of-expert GNN predictor. 
The new approach can counteract the confounding bias in training data and facilitate learning generalizable predictive relations. Extensive experiments demonstrate that our model can effectively enhance generalization under various types of distribution shifts and yield up to 27.4% accuracy improvement over state-of-the-art methods on graph OOD generalization benchmarks. Source code is available at https://github.com/fannie1208/CaNet.	翻訳日:2024-02-20 20:43:48 公開日:2024-02-18
# BGEランドマーク埋め込み:長期拡張大言語モデル検索のためのチャンキングフリー埋め込み手法 BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models ( http://arxiv.org/abs/2402.11573v1 ) ライセンス: Link先を確認	Kun Luo and Zheng Liu and Shitao Xiao and Kang Liu	(参考訳) 大規模言語モデル(LLM)は、多くの重要なアプリケーションを扱うためにコンテキストの拡張を要求する。しかし、既存のアプローチはコストがかかり、コンテキスト拡張の品質が劣る傾向にある。本研究では,LLMのコンテキストを高精細に拡張し,柔軟性とコスト効率を向上するExtensible Embeddingを提案する。拡張可能な埋め込みは、単一のトークンではなく、拡張可能なコンテキストのスコープの情報を表す典型的なトークン埋め込みの拡張である。情報密度の高いそのようなコンパクトな入力ユニットを利用することで、LLMは小さなコンテキストウィンドウでも広い範囲のコンテキストにアクセスできる。拡張可能な埋め込みは、アーキテクチャとトレーニングメソッドに体系的に最適化され、複数の利点をもたらす。 1) 多様なコンテキスト長のアドホック拡張を柔軟にサポートするコンテキスト拡張の柔軟性が高い。 2) 組込みモデルを費用対効果で学習する訓練の強いサンプル効率について検討した。 3) プラグインコンポーネントとして拡張可能な埋め込みをシームレスに導入可能な既存のLLMとの互換性。長文言語モデリングおよび理解タスクに関する包括的な評価は、LLMのコンテキストを拡張するために、効果的で効率的で柔軟で互換性のある方法として拡張可能な埋め込みを検証する。 Large language models (LLMs) call for extension of context to handle many critical applications. However, the existing approaches are prone to expensive costs and inferior quality of context extension. In this work, we propose Extensible Embedding, which realizes high-quality extension of LLM's context with strong flexibility and cost-effectiveness. Extensible embedding stands as an enhancement of typical token embedding, which represents the information for an extensible scope of context instead of a single token. By leveraging such compact input units of higher information density, the LLM can access a vast scope of context even with a small context window. Extensible embedding is systematically optimized in architecture and training method, which leads to multiple advantages. 1) High flexibility of context extension, which flexibly supports ad-hoc extension of diverse context lengths. 2) Strong sample efficiency of training, which enables the embedding model to be learned in a cost-effective way. 3) Superior compatibility with the existing LLMs, where the extensible embedding can be seamlessly introduced as a plug-in component. 
Comprehensive evaluations on long-context language modeling and understanding tasks verify extensible embedding as an effective, efficient, flexible, and compatible method to extend the LLM's context.	翻訳日:2024-02-20 20:36:19 公開日:2024-02-18
# LongAgent: マルチエージェントコラボレーションによる言語モデルから128kコンテキストへのスケーリング LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration ( http://arxiv.org/abs/2402.11550v1 ) ライセンス: Link先を確認	Jun Zhao, Can Zu, Hao Xu, Yi Lu, Wei He, Yiwen Ding, Tao Gui, Qi Zhang, Xuanjing Huang	(参考訳) 大規模言語モデル(LLM)は、言語理解と複雑な推論タスクの実行において優れたパフォーマンスを示している。しかし、長いコンテキストウィンドウを持つLLMは、高価なトレーニングコストと高い推論遅延で有名である。 GPT-4やClaude2のような最も先進的なモデルでさえ、100kトークンを超える入力を処理するときにしばしば間違いを犯す。本稿では、128Kの文脈にLLM(LLaMA)を拡大し、GPT-4と比較して長文処理において潜在的優位性を示すマルチエージェント協調に基づく手法である LongAgent を提案する。 LongAgentでは、リーダーはユーザーの意図を理解し、チームメンバーに文書から情報を取得するよう指示する責任がある。メンバーの幻覚のため、リーダーが数十人から数百人のメンバーの反応から正確な情報を得るのは自明ではない。これに対処するために,情報共有による幻覚による応答競合を解決するためのメンバー間コミュニケーション(inter-member communication)メカニズムを開発した。実験結果から, LongAgent が長文処理の代替となる可能性が示唆された。 LLaMA-7Bでインスタンス化したエージェントチームは、128k長のテキスト検索やマルチホップ質問応答といったタスクを、GPT-4と比べて大幅に改善した。 Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows have been notorious for their expensive training costs and high inference latency. Even the most advanced models such as GPT-4 and Claude2 often make mistakes when processing inputs of over 100k tokens, a phenomenon also known as "lost in the middle". In this paper, we propose LongAgent, a method based on multi-agent collaboration, which scales LLMs (e.g., LLaMA) to a context of 128K and demonstrates potential superiority in long-text processing compared to GPT-4. In LongAgent, a leader is responsible for understanding user intent and directing team members to acquire information from documents. Due to members' hallucinations, it is non-trivial for a leader to obtain accurate information from the responses of dozens to hundreds of members. 
To address this, we develop an inter-member communication mechanism to resolve response conflicts caused by hallucinations through information sharing. Our experimental results indicate that LongAgent offers a promising alternative for long-text processing. The agent team instantiated with LLaMA-7B achieves significant improvements in tasks such as 128k-long text retrieval and multi-hop question answering, compared to GPT-4.	翻訳日:2024-02-20 20:35:58 公開日:2024-02-18
# 英語とドイツ語における構文的言語変化:メトリクス、パーサー、収束 Syntactic Language Change in English and German: Metrics, Parsers, and Convergences ( http://arxiv.org/abs/2402.11549v1 ) ライセンス: Link先を確認	Yanran Chen, Wei Zhao, Anne Breitbarth, Manuel Stoeckel, Alexander Mehler, Steffen Eger	(参考訳) 多くの研究が、人間の言語は複雑さの低減と通信効率の向上のために最適化される傾向があることを示した。依存語間の線形距離を測定する構文依存距離は、しばしば言語処理の困難さと作業記憶負荷の重要な指標と見なされる。本論文は,前回のC言語論争のコーパスを用いて,英語とドイツ語の統語的言語変化のダイアクロニックな傾向を考察する。 160年。私たちは、広く使われているStanford CoreNLPと4つの新しい代替品を含む5つの依存性パーサをベースとしています。構文言語の変化の分析は, 線形依存性距離を超えるもので, 依存性距離最小化(DDM)および/または木の高さや次数分散といった木グラフ特性に基づく15の指標を探索する。最近の木バンクで訓練されたパーサーは,スペル変化やOCRエラーなどのデータ「ノイズ」の影響を受けない証拠があるが,構文言語変化の結果は関連するパーサーに敏感であり,構文言語変化を評価するために単一のパーサーを使用することに注意が必要である。また、調査期間中の構文言語の変化は、調査対象の異なる指標間で英語とドイツ語にほぼ類似していることを示し、調査対象のケースのわずか4%が、ドイツ語と英語の統語的指標の上昇と下降に関する反対の結論を得た。また,文長分布の尾部では,統語的尺度の変化がより頻繁であることを示す。我々の知る限りでは、近年の英語とドイツ語のコーパスにおいて、現代のNLP技術を用いた構文言語の最も包括的な分析である。 Many studies have shown that human languages tend to optimize for lower complexity and increased communication efficiency. Syntactic dependency distance, which measures the linear distance between dependent words, is often considered a key indicator of language processing difficulty and working memory load. The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years. We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as 4 newer alternatives. Our analysis of syntactic language change goes beyond linear dependency distance and explores 15 metrics relevant to dependency distance minimization (DDM) and/or based on tree graph properties, such as the tree height and degree variance. 
Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historic data, we find that results of syntactic language change are sensitive to the parsers involved, which is a caution against using a single parser for evaluating syntactic language change as done in previous work. We also show that syntactic language change over the time period investigated is largely similar between English and German across the different metrics explored: only 4% of cases we examine yield opposite conclusions regarding upward and downward trends of syntactic metrics across German and English. We also show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions. To the best of our knowledge, ours is the most comprehensive analysis of syntactic language change using modern NLP technology in recent corpora of English and German.	翻訳日:2024-02-20 20:35:34 公開日:2024-02-18
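The dependency-distance and tree-property metrics mentioned in the abstract above can be illustrated with a minimal sketch. The input format (a list of 1-based head indices with 0 for the root, as in the HEAD column of CoNLL-U) and the function names are assumptions chosen for this example; the paper's own 15 metrics and parser pipeline are not reproduced here.

```python
def dependency_distances(heads):
    """Linear distance between each token and its head.

    heads[i] is the 1-based position of the head of token i+1;
    0 marks the root, which has no incoming dependency.
    """
    return [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]


def mean_dependency_distance(heads):
    """Average dependency distance of one sentence (0.0 if it has no arcs)."""
    dists = dependency_distances(heads)
    return sum(dists) / len(dists) if dists else 0.0


def tree_height(heads):
    """Length of the longest path from the root down to any token."""
    def depth(tok):
        d = 0
        while heads[tok - 1] != 0:  # climb towards the root
            tok = heads[tok - 1]
            d += 1
        return d
    return max(depth(t) for t in range(1, len(heads) + 1))


# "The dog chased the cat": The->dog, dog->chased, chased=root, the->cat, cat->chased
heads = [2, 3, 0, 5, 3]
print(mean_dependency_distance(heads))  # 1.25
print(tree_height(heads))               # 2
```

Averaging such per-sentence values by decade and by parser is one way to see why results can shift when a different parser supplies the head indices.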
# KMMLU:韓国における大規模マルチタスク言語理解の測定 KMMLU: Measuring Massive Multitask Language Understanding in Korean ( http://arxiv.org/abs/2402.11548v1 ) ライセンス: Link先を確認	Guijin Son and Hanwool Lee and Sungdong Kim and Seungone Kim and Niklas Muennighoff and Taekyoon Choi and Cheonbok Park and Kang Min Yoo and Stella Biderman	(参考訳) 人文科学からSTEMまでの45科目にわたる,35,030問の専門家レベルの多肢選択問題からなる韓国語の新しいベンチマークKMMLUを提案する。既存の英語のベンチマークから翻訳された以前の韓国のベンチマークとは異なり、KMMLUは韓国語の言語的・文化的側面を捉えたオリジナルの韓国語試験から収集される。公開およびプロプライエタリな26のLLMをテストし、大きな改善の余地を特定した。最高の一般公開モデルはKMMLUで50.54%であり、人間の平均成績62.6%よりもはるかに低い。このモデルは韓国語ではなく、主に英語と中国語で訓練された。 Polyglot-Koのような韓国語向けの現在のLLMは、はるかに悪化している。驚くべきことに、GPT-4やHyperCLOVA Xのような最も有能なプロプライエタリLLMでさえそれぞれ59.95%と53.40%を達成している。これは韓国のLLMを改善するためにさらなる作業が必要であることを示唆しており、KMMLUはこの進捗を追跡できる適切なツールを提供している。私たちはデータセットをHugging Face Hubで公開し、ベンチマークをEleutherAIのLanguage Model Evaluation Harnessに統合します。 We propose KMMLU, a new Korean benchmark with 35,030 expert-level multiple-choice questions across 45 subjects ranging from humanities to STEM. Unlike previous Korean benchmarks that are translated from existing English benchmarks, KMMLU is collected from original Korean exams, capturing linguistic and cultural aspects of the Korean language. We test 26 publicly available and proprietary LLMs, identifying significant room for improvement. The best publicly available model achieves 50.54% on KMMLU, far below the average human performance of 62.6%. This model was primarily trained for English and Chinese, not Korean. Current LLMs tailored to Korean, such as Polyglot-Ko, perform far worse. Surprisingly, even the most capable proprietary LLMs, e.g., GPT-4 and HyperCLOVA X, achieve 59.95% and 53.40%, respectively. This suggests that further work is needed to improve Korean LLMs, and KMMLU offers the right tool to track this progress. We make our dataset publicly available on the Hugging Face Hub and integrate the benchmark into EleutherAI's Language Model Evaluation Harness.	翻訳日:2024-02-20 20:35:05 公開日:2024-02-18
# 逆直感的:大きな言語モデルは、思った以上に知識グラフを理解できる Counter-intuitive: Large Language Models Can Better Understand Knowledge Graphs Than We Thought ( http://arxiv.org/abs/2402.11541v1 ) ライセンス: Link先を確認	Xinbang Dai, Yuncheng Hua, Tongtong Wu, Yang Sheng, Guilin Qi	(参考訳) 大規模言語モデル(LLM)の推論能力の向上と知識グラフ(KG)の利用による幻覚の低減は広く注目されているが、LLMがKGのオンザフライでの構造化知識を統合する方法の探求はいまだ不十分である。研究者はしばしば、KGの知識を理解する能力を備えたLLMに、KG埋め込みとLLMパラメータを併用する。しかし、このリソースハーグリートレーニングパラダイムはモデル学習コストを大幅に向上させ、非オープンソースでブラックボックスのLCMにも適さない。本稿では,複雑な質問応答(CQA)を用いて,KG知識を解釈するLLMの能力を評価する。我々は,KG知識をLLMに供給する最適なプロンプト法を検討することを目的として,KG知識注入法(トリプルから自然言語テキストまで)の総合的な比較を行った。初期の期待とは対照的に,llmは乱雑でうるさく,線形化されたkg知識を効果的に処理し,高度に設計された自然言語(nl)テキストプロンプトを用いた手法よりも優れていた。この反直感的な発見は、LLMの構造化知識の理解に関する将来の研究に重要な洞察を与える。 Although the method of enhancing large language models' (LLMs') reasoning ability and reducing their hallucinations through the use of knowledge graphs (KGs) has received widespread attention, the exploration of how to enable LLMs to integrate the structured knowledge in KGs on-the-fly remains inadequate. Researchers often co-train KG embeddings and LLM parameters to equip LLMs with the ability of comprehending KG knowledge. However, this resource-hungry training paradigm significantly increases the model learning cost and is also unsuitable for non-open-source, black-box LLMs. In this paper, we employ complex question answering (CQA) as a task to assess the LLM's ability of comprehending KG knowledge. We conducted a comprehensive comparison of KG knowledge injection methods (from triples to natural language text), aiming to explore the optimal prompting method for supplying KG knowledge to LLMs, thereby enhancing their comprehension of KG. Contrary to our initial expectations, our analysis revealed that LLMs effectively handle messy, noisy, and linearized KG knowledge, outperforming methods that employ well-designed natural language (NL) textual prompts. 
This counter-intuitive finding provides substantial insights for future research on LLMs' comprehension of structured knowledge.	翻訳日:2024-02-20 20:34:48 公開日:2024-02-18
# CPN:制約なしテキスト検出のための補完提案ネットワーク CPN: Complementary Proposal Network for Unconstrained Text Detection ( http://arxiv.org/abs/2402.11540v1 ) ライセンス: Link先を確認	Longhuang Wu, Shangxuan Tian, Youxin Wang, Pengfei Xiong	(参考訳) 既存のテキスト検出方法は、セグメンテーションベースとアンカーベースという2つのパラダイムに分けられる。セグメンテーションベースの手法は不規則な形状に適しているが、コンパクトもしくは重なり合うレイアウトに苦労する。逆に、アンカーベースのアプローチは複雑なレイアウトでは優れているが、不規則な形状に苦しむ。それらのメリットを強化し,それぞれのデメリットを克服するために,意味的および幾何学的情報をシームレスに統合し,優れた性能を実現する補完的提案ネットワーク(CPN)を提案する。 CPNは、革新的な変形可能な形態素演算子を用いた意味的提案を生成する変形可能形態素意味ネットワークと、事前定義されたアンカーを用いた幾何学的提案を生成するバランスド領域提案ネットワークの2つの効率的なネットワークで構成される。補間性をさらに向上するため,提案生成前に意味的および幾何学的特徴を深く相互作用させるインターリーブド・フィーチャー・アテンション・モジュールを導入する。補完的な提案と特徴の両方を活用することで、CPNは同等の計算コストで最先端のアプローチよりも優れたマージンを持つ。具体的には, ICDAR19-ArT, IC15, MSRA-TD500をそれぞれ3.6%, 1.3%, 1.0%改善した。私たちのメソッドのコードはリリースされます。 Existing methods for scene text detection can be divided into two paradigms: segmentation-based and anchor-based. While segmentation-based methods are well-suited for irregular shapes, they struggle with compact or overlapping layouts. Conversely, anchor-based approaches excel at complex layouts but suffer from irregular shapes. To strengthen their merits and overcome their respective demerits, we propose a Complementary Proposal Network (CPN) that seamlessly integrates semantic and geometric information in parallel for superior performance. The CPN comprises two efficient networks for proposal generation: the Deformable Morphology Semantic Network, which generates semantic proposals employing an innovative deformable morphological operator, and the Balanced Region Proposal Network, which produces geometric proposals with pre-defined anchors. To further enhance the complementarity, we introduce an Interleaved Feature Attention module that enables semantic and geometric features to interact deeply before proposal generation. By leveraging both complementary proposals and features, CPN outperforms state-of-the-art approaches with significant margins under comparable computation cost. 
Specifically, our approach achieves improvements of 3.6%, 1.3% and 1.0% on challenging benchmarks ICDAR19-ArT, IC15, and MSRA-TD500, respectively. Code for our method will be released.	翻訳日:2024-02-20 20:34:23 公開日:2024-02-18
# PASCL:パーティクルデバイ再建のための摂動増強によるコントラスト学習の促進 PASCL: Supervised Contrastive Learning with Perturbative Augmentation for Particle Decay Reconstruction ( http://arxiv.org/abs/2402.11538v1 ) ライセンス: Link先を確認	Junjian Lu, Siwei Liu, Dmitrii Kobylianski, Etienne Dreyer, Eilam Gross, Shangsong Liang	(参考訳) 高エネルギー物理学では、衝突で生じる粒子は階層木構造の形で崩壊し、最終崩壊生成物のみが検出器を用いて観測される。しかし、可能な木構造の大規模な組合せ空間は、最終粒子の集合から実際の崩壊過程の回復を困難にしている。階層木構造をよりよく解析するために,木構造を推論して衝突イベントを再構成するグラフベースディープラーニングモデルを提案する。特に、最小共通祖先世代(LCAG)行列と呼ばれるコンパクトな行列表現を用いて、粒子崩壊木構造を符号化する。次に,実験的な不確かさを模倣し,データの多様性を高めることを目的として,ノード特徴に適用する摂動的拡張手法を提案する。さらに,複数の崩壊過程から粒子間関係の情報を利用する教師付きグラフコントラスト学習アルゴリズムを提案する。広汎な実験により,提案手法による教師付きグラフコントラスト学習は,既存の物理ベースデータセット上での最先端のベースラインモデルよりも優れ,再構成精度が大幅に向上した。この方法は、同じパラメータを持つモデルに対してより効率的なトレーニング戦略を提供し、より正確で効率的な高エネルギー粒子物理データ解析を実現する。 In high-energy physics, particles produced in collision events decay in a format of a hierarchical tree structure, where only the final decay products can be observed using detectors. However, the large combinatorial space of possible tree structures makes it challenging to recover the actual decay process given a set of final particles. To better analyse the hierarchical tree structure, we propose a graph-based deep learning model to infer the tree structure to reconstruct collision events. In particular, we use a compact matrix representation termed as lowest common ancestor generations (LCAG) matrix, to encode the particle decay tree structure. Then, we introduce a perturbative augmentation technique applied to node features, aiming to mimic experimental uncertainties and increase data diversity. We further propose a supervised graph contrastive learning algorithm to utilize the information of inter-particle relations from multiple decay processes. 
Extensive experiments show that our proposed supervised graph contrastive learning with perturbative augmentation (PASCL) method outperforms state-of-the-art baseline models on an existing physics-based dataset, significantly improving the reconstruction accuracy. This method provides a more effective training strategy for models with the same parameters and makes way for more accurate and efficient high-energy particle physics data analysis.	翻訳日:2024-02-20 20:34:00 公開日:2024-02-18
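The LCAG encoding mentioned in the PASCL abstract can be illustrated with a small sketch. It assumes that the "generation" of an ancestor is its height above the observed leaves; the paper's exact convention may differ. The decay tree, the node names, and the helper `lcag_matrix` are hypothetical and are not taken from the authors' code.

```python
from itertools import combinations


def lcag_matrix(parent, leaves):
    """Return a symmetric matrix whose (i, j) entry is the height of the
    lowest common ancestor of leaves[i] and leaves[j].

    `parent` maps each non-root node to its parent (an assumed input format).
    """
    # Invert the child -> parent map so that node heights can be computed.
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    def height(node):
        # Leaves have height 0; an internal node sits one above its tallest child.
        kids = children.get(node, [])
        return 0 if not kids else 1 + max(height(k) for k in kids)

    def ancestors(node):
        # Path from the node up to the root, including the node itself.
        path = [node]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    n = len(leaves)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        seen = set(ancestors(leaves[i]))
        # Walking up from leaf j, the first node shared with leaf i is the LCA.
        lca = next(a for a in ancestors(leaves[j]) if a in seen)
        matrix[i][j] = matrix[j][i] = height(lca)
    return matrix


# Hypothetical decay: root -> (X, c), X -> (a, b); only a, b, c are observed.
parent = {"X": "root", "c": "root", "a": "X", "b": "X"}
leaves = ["a", "b", "c"]
print(lcag_matrix(parent, leaves))  # [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
```

Under this assumption the matrix is defined purely over the observed final-state particles, which is what makes it a possible prediction target when the intermediate particles are not detected.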
# 機械アンラーニングによる大規模言語モデルにおける事前学習データの影響の解読 Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning ( http://arxiv.org/abs/2402.11537v1 ) ライセンス: Link先を確認	Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Zhouhao Sun, Jun Shi, Ting Liu and Bing Qin	(参考訳) 様々なソースを持つコーパスでの事前トレーニングを通じて、Large Language Models (LLM) は印象的なパフォーマンスを得た。しかし,プレトレーニングコーパスの各成分の影響はいまだに不透明である。結果として、プレトレーニングコーパスの組織は、まだ経験的であり、最適から逸脱する可能性がある。この問題に対処するために, LLMの事前学習データ5つの主要なカテゴリから48のデータセットが与える影響を系統的に分析し, モデル能力の9つの主要なカテゴリに関するベンチマークを用いてLLMへの影響を測定する。本研究は, 複数コーパスがLLMの性能に与える影響と, 相補関係, 直交関係, 相関関係など, 共同的な影響パターンについて実験的に検討した。また、モデル能力のセットと著しく関連のある書籍のような "high-impact data" のセットも識別します。これらの知見は、LLMのより効率的な事前トレーニングを支援するために、データの組織化に関する洞察を提供する。 Through pretraining on a corpus with various sources, Large Language Models (LLMs) have gained impressive performance. However, the impact of each component of the pretraining corpus remains opaque. As a result, the organization of the pretraining corpus is still empirical and may deviate from the optimum. To address this issue, we systematically analyze the impact of 48 datasets from 5 major categories of LLM pretraining data and measure their impact on LLMs using benchmarks covering nine major categories of model capabilities. Our analyses provide empirical results about the contribution of multiple corpora to the performance of LLMs, along with their joint impact patterns, including complementary, orthogonal, and correlational relationships. We also identify a set of "high-impact data", such as Books, that is significantly related to a set of model capabilities. These findings provide insights into the organization of data to support more efficient pretraining of LLMs.	翻訳日:2024-02-20 20:33:39 公開日:2024-02-18
# preact:reactの将来予測はエージェントの計画能力を高める PreAct: Predicting Future in ReAct Enhances Agent's Planning Ability ( http://arxiv.org/abs/2402.11534v1 ) ライセンス: Link先を確認	Dayuan Fu, Jianzhao Huang, Siyuan Lu, Guanting Dong, Yejie Wang, Keqing He, Weiran Xu	(参考訳) 予測と実際の結果の相違に対処することは、思考プロセスを拡大し、リフレクションに関わり、正しい方向への推論を促進するのに役立つ。本稿では、$\textbf{pre}$dictionと$\textbf{rea}$soningと$\textbf{act}$ionを統合したエージェントフレームワークである$\textbf{PreAct}$を紹介します。予測によって提供される情報を活用することで、大きな言語モデル(LLM)ベースのエージェントは、より多様化し、戦略的に指向した推論を提供することができる。実験により,PreActは複雑なタスクを遂行する上でReActアプローチよりも優れており,Reflexion法と組み合わせることでPreActを協調的に実現できることが実証された。我々は,そのモデルに異なる数の歴史的予測を推奨し,過去の予測がllm計画に継続的なプラス効果をもたらすことを見出した。 PreActとReActの単一ステップ推論の違いは、PreActがReActよりも多様性と戦略的指向性という面で、確かに有利であることを示している。 Addressing the discrepancies between predictions and actual outcomes often aids individuals in expanding their thought processes and engaging in reflection, thereby facilitating reasoning in the correct direction. In this paper, we introduce $\textbf{PreAct}$, an agent framework that integrates $\textbf{pre}$diction with $\textbf{rea}$soning and $\textbf{act}$ion. Leveraging the information provided by predictions, a large language model (LLM) based agent can offer more diversified and strategically oriented reasoning, which in turn leads to more effective actions that help the agent complete complex tasks. Our experiments demonstrate that PreAct outperforms the ReAct approach in accomplishing complex tasks and that PreAct can be co-enhanced when combined with Reflexion methods. We prompt the model with different numbers of historical predictions and find that historical predictions have a sustained positive effect on LLM planning. The differences in single-step reasoning between PreAct and ReAct show that PreAct indeed offers advantages in terms of diversity and strategic directivity over ReAct.	翻訳日:2024-02-20 20:33:24 公開日:2024-02-18
# chain-of-instruction:大規模言語モデルにおける合成命令チューニング Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models ( http://arxiv.org/abs/2402.11532v1 ) ライセンス: Link先を確認	Shirley Anugrah Hayati, Taehee Jung, Tristan Bodding-Long, Sudipta Kar, Abhinav Sethy, Joo-Kyung Kim, Dongyeop Kang	(参考訳) 大型言語モデル(llm)を大規模で多様な命令の集合で微調整することで、モデルの異なるタスクへの一般化が改善される。しかし、既存の命令データセットの多くは単一の命令のみを含み、複数のサブタスク(Wang et al., 2023a)からなる複雑な命令に従うのに苦労している。本稿では、1つの命令の出力がチェーンのように次の命令の入力となるような合成命令の新たな概念であるchain-of-instructions(coi)を提案する。従来の単一命令タスクの解法とは異なり,提案手法では各サブタスクを段階的に解き,最終的な解答に到達するまで解き明かす。 CoIチューニング(CoI命令による微調整)は、複数のサブタスクからなる命令を処理するモデルの能力を向上させる。 coi調律モデルはまた、多言語要約のベースラインモデルよりも優れており、非知覚の複合下流タスクにおけるcoiモデルの一般化性を示している。 Fine-tuning large language models (LLMs) with a collection of large and diverse instructions has improved the model's generalization to different tasks, even for unseen tasks. However, most existing instruction datasets include only single instructions, and they struggle to follow complex instructions composed of multiple subtasks (Wang et al., 2023a). In this work, we propose a novel concept of compositional instructions called chain-of-instructions (CoI), where the output of one instruction becomes an input for the next like a chain. Unlike the conventional practice of solving single instruction tasks, our proposed method encourages a model to solve each subtask step by step until the final answer is reached. CoI-tuning (i.e., fine-tuning with CoI instructions) improves the model's ability to handle instructions composed of multiple subtasks. CoI-tuned models also outperformed baseline models on multilingual summarization, demonstrating the generalizability of CoI models on unseen composite downstream tasks.	翻訳日:2024-02-20 20:33:01 公開日:2024-02-18
# データ中心の観点からの効率的なマルチモーダル学習 Efficient Multimodal Learning from Data-centric Perspective ( http://arxiv.org/abs/2402.11530v1 ) ライセンス: Link先を確認	Muyang He, Yexin Liu, Boya Wu, Jianhao Yuan, Yueze Wang, Tiejun Huang, Bo Zhao	(参考訳) MLLM(Multimodal Large Language Models)は、一般的な視覚的理解と推論タスクにおいて顕著な機能を示す。しかし、それらのデプロイメントは、トレーニングと推論の両方において相当な計算コストによって妨げられ、より広範な研究とユーザコミュニティへのアクセシビリティを制限する。簡単な解決策は、より小さな事前学習されたビジョンと言語モデルを活用することだが、これは必然的に大幅な性能低下を招く。本稿では,より情報的なトレーニングデータを探索することにより,スケーリング法を破り,より小さいが優れたMLLMを訓練する可能性を実証する。具体的には、フレキシブルビジョンと言語バックボーンを備えた軽量MLLMのファミリであるBunnyを紹介し、凝縮学習データから効率的なマルチモーダル学習を実現する。注目すべきは、Bunny-3Bは最先端の大規模なMLLM、特にLLaVA-v1.5-13Bを複数のベンチマークで上回ることです。コード、モデル、データは https://github.com/BAAI-DCAI/Bunny にある。 Multimodal Large Language Models (MLLMs) have demonstrated notable capabilities in general visual understanding and reasoning tasks. However, their deployment is hindered by substantial computational costs in both training and inference, limiting accessibility to the broader research and user communities. A straightforward solution is to leverage smaller pre-trained vision and language models, which inevitably causes a significant performance drop. In this paper, we demonstrate that it is possible to beat the scaling law and train a smaller but better MLLM by exploring more informative training data. Specifically, we introduce Bunny, a family of lightweight MLLMs with flexible vision and language backbones for efficient multimodal learning from condensed training data. Remarkably, our Bunny-3B outperforms the state-of-the-art large MLLMs, especially LLaVA-v1.5-13B, on multiple benchmarks. The code, models and data can be found at https://github.com/BAAI-DCAI/Bunny.	翻訳日:2024-02-20 20:32:44 公開日:2024-02-18
# RLHFを用いた翻訳選好モデルの改良:コスト効果ソリューションへの一歩 Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution ( http://arxiv.org/abs/2402.11525v1 ) ライセンス: Link先を確認	Nuo Xu, Jun Zhao, Can Zu, Tao Gui, Qi Zhang, Xuanjing Huang	(参考訳) 忠実さ、表現力、優雅さは機械翻訳における絶え間ない追求である。しかし、BLEU のような伝統的なメトリクスは、翻訳品質の人間の好みと厳密に一致しない。本稿では,人間のフィードバックによる強化学習(RLHF)の活用による翻訳品質の向上について検討する。特に低リソース言語において、翻訳間の人的比較の大規模な高品質データセットを収集するのは自明ではない。この問題に対処するために,人間と機械の翻訳を区別して報酬モデルを最適化する,費用対効果の高い選好学習戦略を提案する。このようにして、報酬モデルは人間に比べて機械翻訳の欠陥を学習し、その後の機械翻訳の改善を導く。実験により, RLHF は翻訳品質を効果的に向上し, この改善は, RLHF で訓練されていない他の翻訳方向にも有効であることが示された。さらなる分析は、モデルの言語能力が嗜好学習において重要な役割を果たすことを示している。強力な言語能力を持つ報酬モデルは、翻訳品質の微妙な違いをよりセンシティブに学習し、実際の人間の翻訳好みに合致することができる。 Faithfulness, expressiveness, and elegance are constant pursuits in machine translation. However, traditional metrics like BLEU do not strictly align with human preferences for translation quality. In this paper, we explore leveraging reinforcement learning with human feedback (RLHF) to improve translation quality. It is non-trivial to collect a large high-quality dataset of human comparisons between translations, especially for low-resource languages. To address this issue, we propose a cost-effective preference learning strategy, optimizing reward models by distinguishing between human and machine translations. In this manner, the reward model learns the deficiencies of machine translation compared to human translation and guides subsequent improvements in machine translation. Experimental results demonstrate that RLHF can effectively enhance translation quality and that this improvement benefits other translation directions not trained with RLHF. Further analysis indicates that the model's language capabilities play a crucial role in preference learning. 
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality and align better with real human translation preferences.	翻訳日:2024-02-20 20:32:29 公開日:2024-02-18
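The idea of training a reward model to prefer human over machine translations can be sketched with the pairwise logistic (Bradley-Terry style) loss that is commonly used for RLHF reward models. This is an illustrative assumption: the abstract does not state the exact objective used by the authors, and `pairwise_preference_loss` and the example scores are hypothetical.

```python
import math


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """Return -log(sigmoid(r_preferred - r_rejected)).

    Here the preferred item stands for a human translation and the rejected
    item for a machine translation of the same source sentence.
    """
    margin = reward_preferred - reward_rejected
    # Evaluate softplus(-margin) in a numerically stable way for either sign.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss is small when the reward model already ranks the human translation higher,
# and large when it ranks the machine translation higher.
print(pairwise_preference_loss(2.0, 0.0))
print(pairwise_preference_loss(0.0, 2.0))
```

Minimizing this loss over many (human, machine) pairs pushes the reward model to score human translations above machine ones, without requiring explicit human comparisons between translations.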
# 協調フィルタリングのための周辺環境改善型コントラスト学習 Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering ( http://arxiv.org/abs/2402.11523v1 ) ライセンス: Link先を確認	Peijie Sun, Le Wu, Kun Zhang, Xiangzhi Chen, and Meng Wang	(参考訳) レコメンデーションタスクでは有効だが、コラボレーティブフィルタリング(CF)技術はデータの分散性の課題に直面している。研究者は、これに対処するために、コントラスト学習を利用して、追加の自己監督信号を導入し始めた。しかし、このアプローチは意図せずターゲットのユーザ/テーマを隣人から遠ざけ、その効果を制限していることが多い。そこで本研究では,アンカーノードの協調近傍を最終目的損失関数内の正のサンプルとして扱う手法を提案する。本稿では,監督信号とコントラスト損失を効果的に結合する2つの一意な教師付きコントラスト損失関数の開発に着目する。提案する損失関数を勾配レンズを通して解析し,異なる正のサンプルがアンカーノードの埋め込みの更新に同時に影響を与えることを示した。これらのサンプルの影響は、アンカーノードと負のサンプルとの類似性に依存する。グラフベースの協調フィルタリングモデルを我々のバックボーンとし、既存のコントラスト学習モデルSGLと同じデータ拡張手法に従うことにより、推奨モデルの性能を効果的に向上する。提案したNorborhood-Enhanced Supervised Convistive Loss (NESCL) モデルは,SGLのコントラスト損失関数を新たな損失関数に置き換え,性能改善を示す。 Yelp2018、Gowalla、Amazon-Bookの3つの実世界のデータセットでは、当社のモデルは、それぞれ10.09%、7.09%、35.36%をNDCG@20で上回っている。 While effective in recommendation tasks, collaborative filtering (CF) techniques face the challenge of data sparsity. Researchers have begun leveraging contrastive learning to introduce additional self-supervised signals to address this. However, this approach often unintentionally distances the target user/item from their collaborative neighbors, limiting its efficacy. In response, we propose a solution that treats the collaborative neighbors of the anchor node as positive samples within the final objective loss function. This paper focuses on developing two unique supervised contrastive loss functions that effectively combine supervision signals with contrastive loss. We analyze our proposed loss functions through the gradient lens, demonstrating that different positive samples simultaneously influence updating the anchor node's embeddings. These samples' impact depends on their similarities to the anchor node and the negative samples. 
Using the graph-based collaborative filtering model as our backbone and following the same data augmentation methods as the existing contrastive learning model SGL, we effectively enhance the performance of the recommendation model. Our proposed Neighborhood-Enhanced Supervised Contrastive Loss (NESCL) model substitutes the contrastive loss function in SGL with our novel loss function, showing marked performance improvement. On three real-world datasets, Yelp2018, Gowalla, and Amazon-Book, our model surpasses the original SGL by 10.09%, 7.09%, and 35.36% on NDCG@20, respectively.	翻訳日:2024-02-20 20:32:08 公開日:2024-02-18
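Treating collaborative neighbors as additional positives can be sketched with a generic supervised contrastive loss. The code below is a simplified stand-in and not either of the two NESCL loss functions, whose exact forms are not given in the abstract; the embeddings, the temperature value, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def neighbor_contrastive_loss(anchor, positives, negatives, temperature=0.2):
    """Average over positives of -log softmax(sim(anchor, positive) / temperature).

    `positives` would hold the augmented view of the anchor plus its
    collaborative neighbors; `negatives` holds the other sampled nodes.
    """
    pos_logits = [cosine(anchor, p) / temperature for p in positives]
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    # All candidates share one softmax denominator.
    log_denominator = math.log(sum(math.exp(x) for x in pos_logits + neg_logits))
    return -sum(x - log_denominator for x in pos_logits) / len(pos_logits)


anchor = [1.0, 0.0]
near = [[1.0, 0.1], [0.9, 0.2]]   # augmented view and one neighbor
far = [[-1.0, 0.0], [0.0, 1.0]]   # unrelated nodes
loss_close = neighbor_contrastive_loss(anchor, near, far)
loss_far = neighbor_contrastive_loss(anchor, far, near)
print(loss_close < loss_far)  # True
```

Because every positive shares the same softmax denominator, the gradient on the anchor depends on its similarity to all positives and negatives at once, which is in the spirit of the gradient analysis the abstract describes.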
# 関与する会話の秘密を明らかにする: ロールプレイングダイアログエージェントにユーザをつなげる要因 Unveiling the Secrets of Engaging Conversations: Factors that Keep Users Hooked on Role-Playing Dialog Agents ( http://arxiv.org/abs/2402.11522v1 ) ライセンス: Link先を確認	Shuai Zhang, Yu Lu, Junwen Liu, Jia Yu, Huachuan Qiu, Yuming Yan, Zhenzhong Lan	(参考訳) 対話エージェントの人間的な性質が高まるにつれて、人々は、短い瞬間からかなりの時間に及ぶ、拡張された会話に従事しています。これらの相互作用の持続に寄与する要因を理解することは重要であるが、既存の研究は主にこのような長く実際の会話をほとんど探索しない短期的なシミュレーションに焦点を当てている。本稿では,ロールプレイングモデルとの実際の相互作用における保持率に影響を与える要因について検討する。実ユーザと数千文字のインタラクションの大規模なデータセットを分析することで,複数の要因を体系的に検討し,ユーザ保持率への影響を評価する。驚くべきことに、ボットが果たす役割を具現化する程度は保持率に限られた影響を与え、各ターンの長さは保持率に大きな影響を及ぼす。本研究は,ロールプレイングモデルによるユーザエンゲージメントの重要な側面を明らかにし,ロールプレイング目的の大規模言語モデルの開発において,今後の改善に向けた貴重な洞察を提供する。 With the growing humanlike nature of dialog agents, people are now engaging in extended conversations that can stretch from brief moments to substantial periods of time. Understanding the factors that contribute to sustaining these interactions is crucial, yet existing studies primarily focus on short-term simulations that rarely explore such prolonged and real conversations. In this paper, we investigate the factors influencing retention rates in real interactions with role-playing models. By analyzing a large dataset of interactions between real users and thousands of characters, we systematically examine multiple factors and assess their impact on user retention rate. Surprisingly, we find that the degree to which the bot embodies the roles it plays has limited influence on retention rates, while the length of each turn it speaks significantly affects retention rates. This study sheds light on the critical aspects of user engagement with role-playing models and provides valuable insights for future improvements in the development of large language models for role-playing purposes.	翻訳日:2024-02-20 20:31:43 公開日:2024-02-18
# 未知の絡み合った状態のシミュレーションにおける通信コスト Communication Cost in Simulating Unknown Entangled States ( http://arxiv.org/abs/2402.11610v1 ) ライセンス: Link先を確認	Kelvin Onggadinata, Pawel Kurzynski, Dagomir Kaszlikowski	(参考訳) 我々は,n$オブザーバ間で共有されるn$-qubit状態における投影局所測定のアンサンブル統計を,古典的コミュニケーションと共有ランダム性で普遍的にシミュレートする方法を示す。本手法は, [in horizons of the mind, springer, cham (2014)] 量子非局所性をシミュレートするプロトコルと, [phys. rev. lett. 115, 070501 (2015)] 量子回路の古典シミュレーションから生まれたものである。このプロトコルは、他のアプローチとは対照的に、シミュレーションされた量子シナリオの3つの重要な側面を保存している。 We demonstrate how to universally simulate ensemble statistics of projective local measurements on any $n$-qubit state shared among $n$ observers with classical communication and shared randomness. Our technique originates from protocols designed to simulate quantum non-locality [in Horizons of the Mind, Springer, Cham (2014)] and classical simulation of quantum circuits [Phys. Rev. Lett. 115, 070501 (2015)]. The protocol preserves three crucial aspects of the simulated quantum scenario in contrast to other approaches: no involvement of additional parties, none of the observers knows the global state of the system, and local measurement settings remain undisclosed.	翻訳日:2024-02-20 20:22:52 公開日:2024-02-18
# 自己修復システムにおけるルールエンジンの利用とMAPEモデル Using rule engine in self-healing systems and MAPE model ( http://arxiv.org/abs/2402.11581v1 ) ライセンス: Link先を確認	Zahra Yazdanparast	(参考訳) ソフトウェア機能障害はコンピューティング領域において大きなハードルとなり、システム、企業、ユーザに対して大きなリスクをもたらす。信頼性と品質の高いソフトウェアを作成するには、効果的なデバッグが不可欠である。プログラムデバッグは、ソフトウェアのメンテナンスコストを削減する活動です。本研究では,ルールエンジンを用いた故障修復手法を提案する。 mRUBISのシミュレーションにより,本手法は運用環境において効率がよいことを示した。ソフトウェアの失敗と効率的な緩和戦略の採用を徹底的に把握することで、ステークホルダーはソフトウェアシステムの信頼性、セキュリティ、適応性を高めることができる。これにより、失敗による影響を低減し、デジタル技術への信頼を高めることができる。 Software malfunction presents a significant hurdle within the computing domain, carrying substantial risks for systems, enterprises, and users universally. To produce software with high reliability and quality, effective debugging is essential. Program debugging is an activity to reduce software maintenance costs. In this study, a failure repair method that uses a rule engine is presented. The simulation on mRUBIS showed that the proposed method could be efficient in the operational environment. Through a thorough grasp of software failure and the adoption of efficient mitigation strategies, stakeholders can bolster the dependability, security, and adaptability of software systems. This, in turn, reduces the repercussions of failures and cultivates increased confidence in digital technologies.	翻訳日:2024-02-20 20:22:39 公開日:2024-02-18
# 拡張可能な埋め込み: LLMのコンテキスト長のための柔軟な多重化 Extensible Embedding: A Flexible Multipler For LLM's Context Length ( http://arxiv.org/abs/2402.11577v1 ) ライセンス: Link先を確認	Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang	(参考訳) 大規模言語モデル(LLM)は、多くの重要なアプリケーションを扱うためにコンテキストの拡張を要求する。しかし、既存のアプローチはコストがかかり、コンテキスト拡張の品質が劣る傾向にある。本研究では,LLMのコンテキストを高精細に拡張し,柔軟性とコスト効率を両立させる拡張可能な埋め込みを提案する。拡張可能な埋め込みは、単一のトークンではなく、拡張可能なコンテキストのスコープの情報を表す典型的なトークン埋め込みの拡張である。情報密度の高いそのようなコンパクトな入力ユニットを利用することで、LLMは小さなコンテキストウィンドウでも広い範囲のコンテキストにアクセスできる。拡張可能な埋め込みは、アーキテクチャとトレーニングメソッドに体系的に最適化され、複数の利点をもたらす。 1) 多様なコンテキスト長のアドホック拡張を柔軟にサポートするコンテキスト拡張の柔軟性が高い。 2) 組込みモデルを費用対効果で学習する訓練の強いサンプル効率について検討した。 3) プラグインコンポーネントとして拡張可能な埋め込みをシームレスに導入可能な既存のLLMとの互換性。長文言語モデリングおよび理解タスクに関する包括的な評価は、LLMのコンテキストを拡張するために、効果的で効率的で柔軟で互換性のある方法として拡張可能な埋め込みを検証する。 Large language models (LLMs) call for extension of context to handle many critical applications. However, the existing approaches are prone to high cost and inferior quality of context extension. In this work, we propose Extensible Embedding, which realizes high-quality extension of an LLM's context with strong flexibility and cost-effectiveness. Extensible embedding stands as an enhancement of typical token embedding: it represents the information for an extensible scope of context instead of a single token. By leveraging such compact input units of higher information density, the LLM can access a vast scope of context even with a small context window. Extensible embedding is systematically optimized in architecture and training method, which leads to multiple advantages. 1) High flexibility of context extension, which flexibly supports ad-hoc extension of diverse context lengths. 2) Strong sample efficiency of training, which enables the embedding model to be learned in a cost-effective way. 3) Superior compatibility with the existing LLMs, where the extensible embedding can be seamlessly introduced as a plug-in component. 
Comprehensive evaluations on long-context language modeling and understanding tasks verify extensible embedding as an effective, efficient, flexible, and compatible method to extend the LLM's context.	翻訳日:2024-02-20 20:22:28 公開日:2024-02-18
# 大規模視覚言語モデルのための視覚内コンテキスト学習 Visual In-Context Learning for Large Vision-Language Models ( http://arxiv.org/abs/2402.11574v1 ) ライセンス: Link先を確認	Yucheng Zhou, Xiang Li, Qianning Wang, Jianbing Shen	(参考訳) 大規模視覚言語モデル(LVLM)では、言語間相互作用や表現格差の課題により、ICL(In-Context Learning)の有効性が制限されている。これらの課題を克服するために,視覚デモンストレーション検索,意図指向画像要約,意図指向デモンストレーション合成を含む新しい視覚インコンテキスト学習(vicl)手法を提案する。提案手法では,'retrieval & rerank'のパラダイムで画像を検索し,タスク意図とタスク特有の視覚的解析で画像を要約し,トークン数を削減し,クロスモーダルインタラクション問題を緩和する言語ベースのデモンストレーションを構成する。 5つの視覚的推論データセットの実験的評価により,本手法の有効性が示された。さらに,本手法の有効性を解明するために情報フロー解析を活用し,LVLMにおける実演の長さと位置の影響について検討した。コンテキスト内アンラーニングの使用はさらに、リトレーニングせずに特定のモデル知識をリセットする可能性を示しています。 In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via ''Retrieval & Rerank'' paradigm, summarises images with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce token count and alleviate cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments leverage information flow analysis to elucidate the effectiveness of our method, and investigate the impact of length and position of demonstrations for LVLM. The use of in-context unlearning further shows promise in resetting specific model knowledge without retraining.	翻訳日:2024-02-20 20:22:08 公開日:2024-02-18
# 参照フリー画像キャプションにおけるコブラ効果 Cobra Effect in Reference-Free Image Captioning Metrics ( http://arxiv.org/abs/2402.11572v1 ) ライセンス: Link先を確認	Zheng Ma, Changxin Wang, Yawen Ouyang, Fei Zhao, Jianbing Zhang, Shujian Huang, Jiajun Chen	(参考訳) テキスト記述と対応する画像の互換性を評価することは、マルチモーダル研究における中核的な取り組みである。近年,視覚言語事前学習モデル(VLM)を活用した参照フリー手法の普及が進んでいる。実証的な証拠は、これらの革新的なアプローチが人間の判断と高い相関関係を示し、この分野の大きな進歩を示していることを裏付けている。しかし、人間の評価とより高い相関関係だけで、指標の完全性を示すのに十分だろうか? そこで本稿では,本質問に対する回答として,参照フリーメトリクスに欠陥があるかどうかについて検討する。特に,コブラ効果に触発されて,指標スコアを報酬として,指標の基準と密接に一致する記述を生成するためにキャプションモデルを指示する。ある計量に欠陥がある場合、モデルによって利用され、生成された文に反映される。以上の結果から,これらの指標による記述には,一貫性のない文や過度な繰り返しなど,重大な欠陥が含まれていることが明らかとなった。次に,これらの指標の問題点を解消するために,自己改善という新しい手法を提案する。 GPT-4Vは生成した文を評価するための評価ツールであり,提案手法がSOTA(State-of-the-art)の性能を達成することを示す。また,参照のない画像キャプション指標を包括的に評価するために,欠陥キャプションと呼ばれる難易度評価ベンチマークも導入する。私たちのコードはhttps://github.com/aaronma2020/robust_captioning_metricで利用可能です。 Evaluating the compatibility between textual descriptions and corresponding images represents a core endeavor within multi-modal research. In recent years, a proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged. Empirical evidence has substantiated that these innovative approaches exhibit a higher correlation with human judgment, marking a significant advancement in the field. However, does a higher correlation with human evaluations alone sufficiently denote the completeness of a metric? In response to this question, in this paper, we study whether there are any deficiencies in reference-free metrics. Specifically, inspired by the Cobra Effect, we utilize metric scores as rewards to direct the captioning model toward generating descriptions that closely align with the metric's criteria. If a certain metric has flaws, it will be exploited by the model and reflected in the generated sentences. Our findings reveal that descriptions guided by these metrics contain significant flaws, e.g., incoherent statements and excessive repetition. 
Subsequently, we propose a novel method termed Self-Improving to rectify the identified shortcomings within these metrics. We employ GPT-4V as an evaluative tool to assess generated sentences and the result reveals that our approach achieves state-of-the-art (SOTA) performance. In addition, we also introduce a challenging evaluation benchmark called Flaws Caption to evaluate reference-free image captioning metrics comprehensively. Our code is available at https://github.com/aaronma2020/robust_captioning_metric	翻訳日:2024-02-20 20:21:53 公開日:2024-02-18
# テーブルトップロボット「ハル」との会話で表現力のあるロボットの振る舞いをllmで生成する Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru ( http://arxiv.org/abs/2402.11571v1 ) ライセンス: Link先を確認	Zining Wang and Paul Reisert and Eric Nichols and Randy Gomez	(参考訳) ソーシャルロボットは、対話を通じて人間と長期の結びつきを確立することを目的としている。しかし、従来の会話のアプローチは、スクリプト化された対話に依存しており、しばしば対話を維持するのに不足する。本稿では,よりダイナミックで表現豊かな会話を実現するために,大規模言語モデル(llm)をソーシャルロボットに統合することで,この制限に対処する。ロボットの性格に相反する表現行動を伴うロボット応答を生成するために,LLMを利用した完全自動会話システムを提案する。ロボットの動作を2つのモードで組み込む。 1)様々な配送スタイルが可能なtts(text-to-speech)エンジン 2)ロボットの身体動作のライブラリ。ロボットの音声のトーンを動的に選択し,LLM出力の絵文字をロボット行動生成の手がかりとして利用する,カスタムな最先端の感情認識モデルを開発した。私たちのシステムのデモはここにある。そこで,提案するシステムを用いて,ボランティアがソーシャルロボットとチャットする実験を行い,そのフィードバックを分析し,チャットの書き起こしを厳格にエラー解析する。フィードバックは圧倒的に肯定的であり、参加者はロボットの共感、役立ち、自然性、娯楽についてコメントした。最も否定的なフィードバックは、会話に限られた影響を及ぼす自動音声認識(ASR)エラーによるものだった。しかし,LLM自体の繰り返しや幻覚的情報や人間の反応など,会話を損なう可能性があり,LLMアプリケーションにとって重要な問題を提起する,小さな誤りが見られた。 Social robots aim to establish long-term bonds with humans through engaging conversation. However, traditional conversational approaches, reliant on scripted interactions, often fall short in maintaining engaging conversations. This paper addresses this limitation by integrating large language models (LLMs) into social robots to achieve more dynamic and expressive conversations. We introduce a fully-automated conversation system that leverages LLMs to generate robot responses with expressive behaviors, congruent with the robot's personality. We incorporate robot behavior with two modalities: 1) a text-to-speech (TTS) engine capable of various delivery styles, and 2) a library of physical actions for the robot. We develop a custom, state-of-the-art emotion recognition model to dynamically select the robot's tone of voice and utilize emojis from LLM output as cues for generating robot actions. A demo of our system is available here. 
To illuminate design and implementation issues, we conduct a pilot study where volunteers chat with a social robot using our proposed system, and we analyze their feedback, conducting a rigorous error analysis of chat transcripts. Feedback was overwhelmingly positive, with participants commenting on the robot's empathy, helpfulness, naturalness, and entertainment. Most negative feedback was due to automatic speech recognition (ASR) errors which had limited impact on conversations. However, we observed a small class of errors, such as the LLM repeating itself or hallucinating fictitious information and human responses, that have the potential to derail conversations, raising important issues for LLM application.	翻訳日:2024-02-20 20:21:27 公開日:2024-02-18
# 自律型ロボットによる行動コーチングセッションの開発 Developing Autonomous Robot-Mediated Behavior Coaching Sessions with Haru ( http://arxiv.org/abs/2402.11569v1 ) ライセンス: Link先を確認	Matouš Jelínek and Eric Nichols and Randy Gomez	(参考訳) 本研究では,行動変化コーチングにおける人間とロボットの対話における自律対話の設計と影響に関する実証的研究を行う。テーブルトップ型ソーシャルロボット「Haru」の利用に注目し,ポジティブな行動変化を促すための「Tiny Habits」手法の実装を検討する。本研究の核心は、Haruの感情表現力と独特な性格を最大化する完全自律的な対話システムを開発することである。本手法では,対話システムの反復設計と広範囲なテストを行い,Tiny Habits法の原則を効果的に具現化し,信頼向上と信頼抑制の戦略を取り入れた。対話の最終版の有効性を被験者実験で評価した(N=12)。その結果, Haruの活力, 相互作用性, 中立性に対する認識は著しく改善した。さらに,本研究は,社会ロボティクスにおける対話設計のより広範な理解に寄与し,今後の発展に向けた実践的な洞察を提供する。 This study presents an empirical investigation into the design and impact of autonomous dialogues in human-robot interaction for behavior change coaching. We focus on the use of Haru, a tabletop social robot, and explore the implementation of the Tiny Habits method for fostering positive behavior change. The core of our study lies in developing a fully autonomous dialogue system that maximizes Haru's emotional expressiveness and unique personality. Our methodology involved iterative design and extensive testing of the dialogue system, ensuring it effectively embodied the principles of the Tiny Habits method while also incorporating strategies for trust-raising and trust-dampening. The effectiveness of the final version of the dialogue was evaluated in an experimental study with human participants (N=12). The results indicated a significant improvement in perceptions of Haru's liveliness, interactivity, and neutrality. Additionally, our study contributes to the broader understanding of dialogue design in social robotics, offering practical insights for future developments in the field.	翻訳日:2024-02-20 20:21:00 公開日:2024-02-18
# 多次元画像の分類のための新しいフーリエニューラルオペレーターフレームワーク:3次元デジタル多孔質メディアへの応用 A novel Fourier neural operator framework for classification of multi-sized images: Application to 3D digital porous media ( http://arxiv.org/abs/2402.11568v1 ) ライセンス: Link先を確認	Ali Kashefi, Tapan Mukerji	(参考訳) フーリエニューラル演算子(FNO)は入力画像のサイズに関して不変であるため、従来の畳み込みニューラルネットワーク(CNN)とは対照的に、任意の大きさの画像をネットワークアーキテクチャの変更なしにFNOベースのフレームワークに入力することができる。 FNOの利点を生かして,様々な大きさの画像を分類する新しいディープラーニングフレームワークを提案する。特に,提案するネットワークを多次元画像上で同時にトレーニングする。実用的応用として,3次元ディジタル多孔質媒体のラベル(透過性など)の予測の問題を考える。このフレームワークを構築するための直感的なアプローチは、適応的な最大プーリングを用いてFNO層を分類器に接続することである。まず, 一定サイズを有する多孔質媒体に対してのみ有効であり, 異なるサイズを有する多孔質媒体に対して有効であることを示す。この制限を克服するため,我々は適応的な最大プールを使用する代わりに,FNO層のチャネル幅の大きい静的最大プールを使用する。 FNO層のチャネル幅は入力画像サイズとは無関係であるため、導入したフレームワークはトレーニング中にマルチサイズの画像を処理できる。導入したフレームワークの有効性を示し、様々な大きさの3次元デジタル多孔質媒体の分類例を例に、直感的な手法と比較する。 Fourier neural operators (FNOs) are invariant with respect to the size of input images, and thus images with any size can be fed into FNO-based frameworks without any modification of network architectures, in contrast to traditional convolutional neural networks (CNNs). Leveraging the advantage of FNOs, we propose a novel deep-learning framework for classifying images with varying sizes. Particularly, we simultaneously train the proposed network on multi-sized images. As a practical application, we consider the problem of predicting the label (e.g., permeability) of three-dimensional digital porous media. To construct the framework, an intuitive approach is to connect FNO layers to a classifier using adaptive max pooling. First, we show that this approach is only effective for porous media with fixed sizes, whereas it fails for porous media of varying sizes. To overcome this limitation, we introduce our approach: instead of using adaptive max pooling, we use static max pooling with the size of channel width of FNO layers. 
Since the channel width of the FNO layers is independent of input image size, the introduced framework can handle multi-sized images during training. We show the effectiveness of the introduced framework and compare its performance with the intuitive approach through the example of the classification of three-dimensional digital porous media of varying sizes.	翻訳日:2024-02-20 20:20:44 公開日:2024-02-18
# 単一コピーレベルでのマルチパラメータ量子推定における量子Cramér-Rao境界の飽和性 Saturability of the Quantum Cramér-Rao Bound in Multiparameter Quantum Estimation at the Single-Copy Level ( http://arxiv.org/abs/2402.11567v1 ) ライセンス: Link先を確認	Hendra I. Nurdin	(参考訳) 量子パラメータ推定における精度の究極の下界としての量子Cramér-Rao境界(QCRB)は、パラメータに付随する対称対数微分(SLD)の完全または平均可換性のような条件下では、特別な場合において、マルチパラメータ設定において飽和可能であることが知られている。さらに、一般の混合状態の場合、量子状態の無限に多くの同一のコピーに対する集合的測定は一般にQCRBを達成するために必要となる。重要かつ実験的な単一コピーシナリオでは、一般混合状態のマルチパラメータ設定においてQCRBを飽和させるために必要な条件は、SLDにおけるいわゆる部分可換性条件である。しかし、この条件が十分かどうかは不明である。本稿では, 部分可換性を含意し, ほぼ十分である新しい必要条件を導出する。マルチパラメータ単一コピーの場合,これらは別の条件と合わせることでQCRBの飽和に十分となることを示す。また、十分な条件が満たされると、QCRBを飽和させる最適な測定を投影的かつ明示的に特徴付けることができる。例として、この条件が満たされ、明確に検証できるマルチパラメータ量子状態の例を示す。 The quantum Cramér-Rao bound (QCRB) as the ultimate lower bound for precision in quantum parameter estimation is only known to be saturable in the multiparameter setting in special cases and under conditions such as full or average commutativity of the symmetric logarithmic derivatives (SLDs) associated with the parameters. Moreover, for general mixed states, collective measurements over infinitely many identical copies of the quantum state are generally required to attain the QCRB. In the important and experimentally relevant single-copy scenario, a necessary condition for saturating the QCRB in the multiparameter setting for general mixed states is the so-called partial commutativity condition on the SLDs. However, it is not known if this condition is also sufficient. This paper derives new necessary conditions that imply partial commutativity and are almost sufficient. It is shown that together with another condition they become sufficient for saturability of the QCRB in the multiparameter single-copy case. Moreover, when the sufficient conditions are satisfied an optimal measurement saturating the QCRB can be chosen to be projective and explicitly characterized. 
An example is developed to illustrate the case of a multiparameter quantum state where the conditions derived herein are satisfied and can be explicitly verified.	翻訳日:2024-02-20 20:20:21 公開日:2024-02-18
# データ拡張と一貫性トレーニングの再検討による半教師付き2次元ポーズ推定の促進 Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training ( http://arxiv.org/abs/2402.11566v1 ) ライセンス: Link先を確認	Huayi Zhou, Mukun Luo, Fei Jiang, Yue Ding, Hongtao Lu	(参考訳) 2次元のポーズ推定は基本的な視覚問題である。しかし、モデルの教師付き学習には大量のラベル付き画像が必要である。本稿では,半教師付き学習(SSL)方式でラベルのない余分な画像を抽出することにより,ポーズ推定器の精度を高めることを目的とする。従来の一貫性ベースのSSLメソッドは、異なる拡張イメージに対して一貫性のある結果を予測するためにモデルを制約しようと努力した。この合意に従い、高度なデータ拡張手法と簡潔な一貫性トレーニングフレームワークを含む2つのコア側面を再検討する。具体的には、既存のデータ拡張の様々な組み合わせをヒューリスティックに掘り下げ、新しい優れたデータ拡張スキームを発見し、ラベルのないサンプルにより効果的にノイズを加える。一貫性ベースのSSLにおいて重要な役割を果たす、変換の難しさのギャップを大きくした、簡単なハードな拡張ペアを構成することができる。さらに,多彩な拡張によるラベルなし画像の繰り返しの強化,マルチパス予測の逐次生成,および1つのネットワークを用いた教師なし一貫性損失の最適化を提案する。このシンプルでコンパクトな設計は、以前の2重または3重ネットワークからなる手法と同等である。さらに、パフォーマンスを向上させるために複数のネットワークと統合することもできる。最先端のSSLアプローチと比較して、我々の手法はパブリックデータセットに大幅な改善をもたらす。コードは https://github.com/hnuzhy/MultiAugs で学術利用向けに公開されている。 2D human pose estimation is a fundamental vision problem. However, supervised learning of a model requires massive labeled images, which is expensive and labor-intensive. In this paper, we aim at boosting the accuracy of a pose estimator by excavating extra unlabeled images in a semi-supervised learning (SSL) way. Most previous consistency-based SSL methods strive to constrain the model to predict consistent results for differently augmented images. Following this consensus, we revisit two core aspects including advanced data augmentation methods and concise consistency training frameworks. Specifically, we heuristically dig various collaborative combinations of existing data augmentations, and discover novel superior data augmentation schemes to more effectively add noise on unlabeled samples. They can compose easy-hard augmentation pairs with larger transformation difficulty gaps, which play a crucial role in consistency-based SSL. 
Moreover, we propose to strongly augment unlabeled images repeatedly with diverse augmentations, generate multi-path predictions sequentially, and optimize the corresponding unsupervised consistency losses using a single network. This simple and compact design is on a par with previous methods consisting of dual or triple networks. Furthermore, it can also be integrated with multiple networks to produce better performance. Compared to state-of-the-art SSL approaches, our method brings substantial improvements on public datasets. Code is released for academic use at https://github.com/hnuzhy/MultiAugs.	翻訳日:2024-02-20 20:19:57 公開日:2024-02-18
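The single-network, multi-path consistency idea in the entry above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the use of mean squared error between heatmaps, the flattened-list heatmap type, and the choice to treat the easy-view prediction as a fixed pseudo-target are all assumptions made here. A real pose pipeline would also have to undo geometric augmentations before comparing heatmaps, which this toy version ignores.

```python
from typing import Callable, List, Sequence

Heatmap = List[float]  # flattened keypoint heatmap (toy stand-in, assumed here)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized heatmaps."""
    if len(a) != len(b):
        raise ValueError("heatmaps must have the same size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_path_consistency_loss(
    model: Callable[[Heatmap], Heatmap],
    image: Heatmap,
    easy_aug: Callable[[Heatmap], Heatmap],
    hard_augs: Sequence[Callable[[Heatmap], Heatmap]],
) -> float:
    """Sum of consistency losses between one easy view and several hard views.

    The same `model` is used on every path. The prediction on the easy view
    serves as the pseudo-target (assumption: it is treated as a constant).
    """
    target = model(easy_aug(image))
    return sum(mse(model(aug(image)), target) for aug in hard_augs)
```

Each extra hard augmentation adds one more forward pass through the same network and one more loss term, which is the sense in which a single network can stand in for dual or triple networks in this sketch.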
# グラフ上での継続的学習:挑戦、解決策、機会 Continual Learning on Graphs: Challenges, Solutions, and Opportunities ( http://arxiv.org/abs/2402.11565v1 ) ライセンス: Link先を確認	Xikun Zhang, Dongjin Song, Dacheng Tao	(参考訳) グラフデータに対する連続学習は,新たに出現したグラフタスクに逐次更新されたモデルを適用しつつ,既存のタスクにおける破滅的な忘れの問題を解決することを目的として,近年注目されている。ユークリッドデータの連続学習研究(画像やテキストなど)の進展を要約する努力が続けられているが、連続グラフ学習(CGL)や生涯グラフ学習といった連続学習の体系的レビューは依然として求められている。グラフデータは、データ構造やアプリケーションのシナリオに関してはるかに複雑で、CGLタスクの設定、モデル設計、アプリケーションは非常に困難です。このギャップを埋めるために,既存の連続グラフ学習(CGL)アルゴリズムを網羅的にレビューし,その特徴に基づいてタスク設定を解明し,既存の手法を分類する。 CGL手法を従来の連続学習手法と比較し、従来の連続学習手法をCGLタスクに適用可能であるか分析する。さらに、我々はCGL研究に不可欠なベンチマーク作業についてレビューする。最後に,残る課題を議論し,今後の方向性を提案する。 CGLアルゴリズムの包括的なリストは、https://github.com/UConn-DSIS/Survey-of-Continual-Learning-on-Graphsで参照できます。 Continual learning on graph data has recently attracted paramount attention for its aim to resolve the catastrophic forgetting problem on existing tasks while adapting the sequentially updated model to newly emerged graph tasks. While there have been efforts to summarize progress on continual learning research over Euclidean data, e.g., images and texts, a systematic review of progress in continual learning on graphs, a.k.a. continual graph learning (CGL) or lifelong graph learning, is still in demand. Graph data are far more complex in terms of data structures and application scenarios, making CGL task settings, model designs, and applications extremely challenging. To bridge the gap, we provide a comprehensive review of existing continual graph learning (CGL) algorithms by elucidating the different task settings and categorizing the existing methods based on their characteristics. We compare the CGL methods with traditional continual learning techniques and analyze the applicability of the traditional continual learning techniques to CGL tasks. Additionally, we review the benchmark works that are crucial to CGL research. Finally, we discuss the remaining challenges and propose several future directions.
We will maintain an up-to-date GitHub repository featuring a comprehensive list of CGL algorithms, accessible at https://github.com/UConn-DSIS/Survey-of-Continual-Learning-on-Graphs.	翻訳日:2024-02-20 20:19:35 公開日:2024-02-18
# 時空間インプットのための時間的遠方性コントラスト拡散モデル Temporal Disentangled Contrastive Diffusion Model for Spatiotemporal Imputation ( http://arxiv.org/abs/2402.11558v1 ) ライセンス: Link先を確認	Yakun Chen, Kaize Shi, Zhangkai Wu, Juan Chen, Xianzhi Wang, Julian McAuley, Guandong Xu, Shui Yu	(参考訳) 時空間データ分析は、輸送、気象学、医療など、さまざまな分野において重要である。しかし、実際のシナリオで収集されたデータは、センサーの故障やネットワークの伝送エラーによって不完全性に悩まされることが多い。時空間計算は、観測データに存在する空間的および時間的依存関係を利用して、欠落した値を予測する。古典的な統計学や機械学習技術に依存した従来の手法は、特にデータが厳密な分布仮定を満たさない場合、しばしば不十分である。対照的に、グラフとリカレントニューラルネットワークを利用する最近のディープラーニングベースの手法では、効果が向上している。しかし、これらのアプローチはエラーの蓄積を招きやすい。生成モデルは、将来の予測のために、潜在的に不正確な歴史的暗示値への依存を避けるために、ますます採用されている。これらのモデルは、拡散モデルにおいて特に問題となる不安定な結果を生み出すという課題に対処する。我々は,生成過程と迅速なトレーニングを導く条件的特徴を設計することで,これらの課題に対処することを目的とする。具体的にはc$^2$tsdという,トレンド情報と季節情報を条件特徴として取り入れ,コントラスト学習を用いてモデルの一般化性を向上させる新しい手法を導入する。 3つの実世界のデータセットに関する広範な実験は、様々な最先端のベースラインよりもC$^2$TSDの方が優れた性能を示している。 Spatiotemporal data analysis is pivotal across various domains, including transportation, meteorology, and healthcare. However, the data collected in real-world scenarios often suffers incompleteness due to sensor malfunctions and network transmission errors. Spatiotemporal imputation endeavours to predict missing values by exploiting the inherent spatial and temporal dependencies present in the observed data. Traditional approaches, which rely on classical statistical and machine learning techniques, are often inadequate, particularly when the data fails to meet strict distributional assumptions. In contrast, recent deep learning-based methods, leveraging graph and recurrent neural networks, have demonstrated enhanced efficacy. Nonetheless, these approaches are prone to error accumulation. Generative models have been increasingly adopted to circumvent the reliance on potentially inaccurate historical imputed values for future predictions. These models grapple with the challenge of producing unstable results, a particular issue in diffusion-based models. 
We aim to address these challenges by designing conditional features to guide the generative process and expedite training. Specifically, we introduce C$^2$TSD, a novel approach incorporating trend and seasonal information as conditional features and employing contrastive learning to improve model generalizability. Extensive experiments on three real-world datasets demonstrate the superior performance of C$^2$TSD over various state-of-the-art baselines.	翻訳日:2024-02-20 20:19:09 公開日:2024-02-18
# 低線量ctリカバリの対向的ロバスト性評価 Evaluating Adversarial Robustness of Low dose CT Recovery ( http://arxiv.org/abs/2402.11557v1 ) ライセンス: Link先を確認	Kanchana Vaishnavi Gandikota, Paramanand Chandramouli, Hannah Droege, Michael Moeller	(参考訳) 低線量CT (low dose Computed tomography) 取得は, X線照射による有害な影響を低減するために推奨される。最近の研究は、ベンチマークデータセットの低線量CT回復問題にディープネットワークをうまく応用している。しかし、その堅牢性は臨床での使用前に徹底的な評価が必要である。本研究では,異なる深層学習手法と古典的CT回復手法の堅牢性を評価する。我々は,データ一貫性を促進するモデルベースネットワークを含むディープネットワークが,非標的攻撃の影響を受けやすいことを示した。驚いたことに、これらの品質の悪い再構築であっても、データの一貫性は大きな影響を受けず、ネットワークのより優れた正規化の必要性を動機付けている。ユニバーサルアタックの実現可能性を示し、異なる手法による攻撃伝達性について検討する。臨床領域の局所的な変化を引き起こす攻撃に対するロバスト性を分析した。古典的アプローチとディープネットワークの両方がそのような攻撃の影響を受け、局所的な病変の視覚的外観が変化し、非常に小さな摂動が生じる。結果として得られた再構成は、元の測定値と高いデータ整合性を持つため、これらの局所攻撃は、CT回復問題の解空間を探索するために使用できる。 Low dose computed tomography (CT) acquisition using reduced radiation or sparse angle measurements is recommended to decrease the harmful effects of X-ray radiation. Recent works successfully apply deep networks to the problem of low dose CT recovery on bench-mark datasets. However, their robustness needs a thorough evaluation before use in clinical settings. In this work, we evaluate the robustness of different deep learning approaches and classical methods for CT recovery. We show that deep networks, including model-based networks encouraging data consistency, are more susceptible to untargeted attacks. Surprisingly, we observe that data consistency is not heavily affected even for these poor quality reconstructions, motivating the need for better regularization for the networks. We demonstrate the feasibility of universal attacks and study attack transferability across different methods. We analyze robustness to attacks causing localized changes in clinically relevant regions. Both classical approaches and deep networks are affected by such attacks leading to changes in the visual appearance of localized lesions, for extremely small perturbations. 
As the resulting reconstructions have high data consistency with the original measurements, these localized attacks can be used to explore the solution space of the CT recovery problem.	翻訳日:2024-02-20 20:18:50 公開日:2024-02-18
# スプライン準補間に基づく経験的密度推定とCopulasクラスタリングモデルへの応用 Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling ( http://arxiv.org/abs/2402.11552v1 ) ライセンス: Link先を確認	Cristiano Tamborrino, Antonella Falini, Francesca Mazzia	(参考訳) 密度推定は、様々な分野において、基礎となるデータの分布をモデル化し理解するための基礎技術である。密度推定の主な目的は、確率変数の確率密度関数を推定することである。このプロセスは、単変量データや多変量データを扱う際に特に有用であり、クラスタリング、異常検出、生成モデリングといったタスクに必須である。本稿では,スプライン準補間を用いた密度の単変量近似を提案し,クラスタリングモデリングの文脈で適用した。クラスタリング手法は, 単変量経験密度 (marginals) の推定に依存する適切な多変量分布の構築に基づいている。このような近似は、提案したスプライン準補間を用いて達成され、探索されたクラスタリング分割をモデル化する結合分布はコプラ関数を用いて構成される。特に、コプラは限界分布とは独立にデータの特徴間の依存性を捉えることができるため、有限混合コプラモデルが提案されている。提案アルゴリズムは人工データセットと実データセットで検証される。 Density estimation is a fundamental technique employed in various fields to model and to understand the underlying distribution of data. The primary objective of density estimation is to estimate the probability density function of a random variable. This process is particularly valuable when dealing with univariate or multivariate data and is essential for tasks such as clustering, anomaly detection, and generative modeling. In this paper we propose a monovariate approximation of the density using spline quasi-interpolation and apply it in the context of clustering modeling. The clustering technique used is based on the construction of suitable multivariate distributions which rely on the estimation of the monovariate empirical densities (marginals). Such an approximation is achieved by using the proposed spline quasi-interpolation, while the joint distribution that models the sought clustering partition is constructed using copula functions. In particular, since copulas can capture the dependence between the features of the data independently of the marginal distributions, a finite mixture copula model is proposed. The presented algorithm is validated on artificial and real datasets.	翻訳日:2024-02-20 20:18:32 公開日:2024-02-18
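For orientation, the standard copula factorization behind this kind of model can be written out. By Sklar's theorem, a joint density with marginal densities $f_j$ and marginal distribution functions $F_j$ factors through a copula density $c$; a finite mixture version is shown next to it. The notation ($K$ components, weights $\pi_k$, component copula densities $c_k$ with parameters $\theta_k$) and the choice of marginals shared across components are assumptions made here for illustration and may differ from the paper's formulation.

```latex
f(x_1,\dots,x_d) = c\bigl(F_1(x_1),\dots,F_d(x_d)\bigr)\,\prod_{j=1}^{d} f_j(x_j),
\qquad
f_{\mathrm{mix}}(x_1,\dots,x_d) = \sum_{k=1}^{K} \pi_k\, c_k\bigl(F_1(x_1),\dots,F_d(x_d);\theta_k\bigr)\,\prod_{j=1}^{d} f_j(x_j),
\quad \pi_k \ge 0,\ \sum_{k=1}^{K} \pi_k = 1.
```

In this reading, the spline quasi-interpolant would supply the estimates of $f_j$ (and, by integration, $F_j$), while the copula terms carry the dependence between features.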
# トランスフォーマーを用いたインコンテキスト学習:リップシッツネスに適応したソフトマックスアテンション In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness ( http://arxiv.org/abs/2402.11639v1 ) ライセンス: Link先を確認	Liam Collins, Advait Parulekar, Aryan Mokhtari, Sujay Sanghavi, Sanjay Shakkottai	(参考訳) トランスフォーマーの驚くべき特性は、あるデータを通じて暗黙的に推論中に学習者が新しいコンテキストを提示し、そのコンテキストで予測を行うことを任務とする機械学習フレームワーク、in-context learning(ICL)を実行する能力である。そのため、学習者は追加のトレーニングなしに文脈に適応しなければならない。各コンテキストが回帰タスクを符号化するICL設定におけるソフトマックスアテンションの役割について検討する。注意ユニットは、事前学習タスクのランドスケープに適応した最寄りの予測器を実装するために使用するウィンドウを学習する。具体的には,このウィンドウがリプシッツ性の減少とラベルノイズの増加によって拡大することを示す。また,低ランク線形問題において,注意ユニットは推論前に適切な部分空間に投影することを学習することを示した。さらに, この適応性はソフトマックス活性化に大きく依存しており, 先行理論解析でよく研究される線形活性化では再現できないことを示した。 A striking property of transformers is their ability to perform in-context learning (ICL), a machine learning framework in which the learner is presented with a novel context during inference implicitly through some data, and tasked with making a prediction in that context. As such, the learner must adapt to the context without additional training. We explore the role of softmax attention in an ICL setting where each context encodes a regression task. We show that an attention unit learns a window that it uses to implement a nearest-neighbors predictor adapted to the landscape of the pretraining tasks. Specifically, we show that this window widens with decreasing Lipschitzness and increasing label noise in the pretraining tasks. We also show that on low-rank, linear problems, the attention unit learns to project onto the appropriate subspace before inference. Further, we show that this adaptivity relies crucially on the softmax activation and thus cannot be replicated by the linear activation often studied in prior theoretical analyses.	翻訳日:2024-02-20 20:10:53 公開日:2024-02-18
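The "window" picture in the entry above can be illustrated with a toy softmax-attention regressor. This is a sketch under stated assumptions, not the paper's construction: the score is taken to be the negative squared distance between the query and each context input, multiplied by a hypothetical `scale` parameter, so that a larger `scale` gives a narrower window; the function name and the one-dimensional inputs are also assumptions.

```python
import math
from typing import Sequence


def softmax_attention_predict(
    query: float,
    keys: Sequence[float],
    values: Sequence[float],
    scale: float,
) -> float:
    """Predict f(query) as a softmax-weighted average of context labels.

    Scores are -scale * (query - key)^2, so a larger `scale` concentrates
    the weights on the context points nearest to the query.
    """
    scores = [-scale * (query - k) ** 2 for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Context: noiseless samples of a fairly wiggly function.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.sin(3.0 * x) for x in xs]

narrow = softmax_attention_predict(1.0, xs, ys, scale=200.0)  # near nearest neighbor
wide = softmax_attention_predict(1.0, xs, ys, scale=0.0)      # plain average
```

With a narrow window the prediction tracks the label of the closest context point, which suits rapidly varying functions; with a wide window it averages over many labels, which suppresses label noise when the function varies slowly. That is the trade-off the abstract describes.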
# 流れ速度の古典性から導かれる量子粒子の位相的挙動 A topological behavior of quantum particles originated from the classicality of their flow velocity ( http://arxiv.org/abs/2402.11624v1 ) ライセンス: Link先を確認	Tomer Shushi	(参考訳) この手紙では、量子粒子を古典流体として記述することから自然に生じる新しい量子効果を提案する。有限凸領域における粒子の量子力学の流体力学的定式化に続いて、波動関数の振幅の最大値は、消滅した量子ポテンシャルを示唆する領域の境界に沿ってどのようにあるかを示し、粒子の古典的な流れ速度を示唆する。この効果は、リーマン構造によって記述された曲線空間の粒子に対して得られる。さらに、平面時空や曲線時空の量子粒子を扱う場合、相対論的状態においてそのような効果は達成できないことを示す。 In this letter, we propose a new quantum effect that naturally emerges from describing the quantum particle as a classical fluid. Following the hydrodynamical formulation of quantum mechanics for a particle in a finite convex region, we show how the maximum values of the wavefunction's amplitude lie along the boundaries of the region when imposing a vanished quantum potential, implying a classical flow velocity of the particle. The effect is obtained for the case of particles in curved space, described by Riemannian structures. We further show that such an effect cannot be achieved in the relativistic regime when dealing with quantum particles in flat or curved spacetime.	翻訳日:2024-02-20 20:10:35 公開日:2024-02-18
# ファブリペロマイクロキャビティにおける量子ドットからのフィルタフリー高性能単一光子放出 Filter-free high-performance single photon emission from a quantum dot in a Fabry-Perot microcavity ( http://arxiv.org/abs/2402.11623v1 ) ライセンス: Link先を確認	Zhixuan Rao, Jiawei Yang, Changkun Song, Mujie Rao, Ziyang Zheng, Luyu Liu, Xuebin Peng, Ying Yu and Siyuan Yu	(参考訳) 共鳴励起とPurcell-enhanced single quantum dots(QD)を組み合わせることは、高性能な固体単一光子源を実現するための重要な戦略である。しかし、光子効率の最適化には、励起レーザーとqdsの発光を効果的に分離する問題に対処する必要がある。伝統的に、これは偏光フィルタリングであり、達成可能な偏光方向とフォトニック状態のスケーラビリティを制限する。本研究では, モノリシックファブリペロマイクロキャビティと決定的に結合したQDの空間直交共振励起を用いて, この問題に対処した。膜キャビティ構造を利用して, フィルタのない単一光子共鳴蛍光を実現した。得られた光源は、高い抽出効率が0.87、純度が0.9045(4)、識別性が0.963(4)である単一光子を生成する。 Combining resonant excitation with Purcell-enhanced single quantum dots (QDs) stands out as a prominent strategy for realizing high performance solid-state single photon sources. However, optimizing photon efficiency requires addressing challenges associated with effectively separating the excitation laser from QDs' emission. Traditionally, this involves polarization filtering, which limits the achievable polarization directions and the scalability of photonic states. In this study, we have successfully tackled this challenge by employing spatially-orthogonal resonant excitation of QDs, deterministically coupled to monolithic Fabry-Perot microcavities. Leveraging the membrane cavity structures, we have achieved filter-free single photon resonant fluorescence. The resulting source produces single photons with a simultaneous high extraction efficiency of 0.87, purity of 0.9045(4), and indistinguishability of 0.963(4).	翻訳日:2024-02-20 20:10:23 公開日:2024-02-18
# 論理閉ループ:大規模視覚言語モデルにおける物体幻覚の発見 Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models ( http://arxiv.org/abs/2402.11622v1 ) ライセンス: Link先を確認	Junfei Wu, Qiang Liu, Ding Wang, Jinghao Zhang, Shu Wu, Liang Wang, Tieniu Tan	(参考訳) 物体幻覚は、大きな視覚言語モデル(LVLM)の幅広い応用を妨げるアキレス腱である。オブジェクト幻覚(Object Hallucination)とは、LVLMが画像に存在しない物体を主張する現象である。対象幻覚を緩和するために,大規模計算資源を必要とするか,あるいは外部モデルの検出結果に依存する命令チューニングや外部モデルに基づく検出手法が提案されている。しかし、LVLM自体を物体幻覚の緩和に利用する未熟な分野は残されている。本研究では、LVLMは存在物体に対して論理的に一貫して応答するが、幻覚対象には一貫性がないという直観を取り入れている。そこで我々は,物体の幻覚検出と緩和のための論理閉ループベースのフレームワーク,LogicCheckGPTを提案する。具体的には、論理的整合性探索を考案し、論理的相関による質問を提起し、オブジェクトの属性を問う。それらの反応が論理閉ループを形成するか否かは、対象幻覚の指標となる。プラグアンドプレイ法として、既存のすべてのLVLMにシームレスに適用することができる。 4つのLVLMにまたがる3つのベンチマークで実施した総合的な実験により,本手法による大幅な改善が示された。 Object hallucination has been an Achilles' heel which hinders the broader applications of large vision-language models (LVLMs). Object hallucination refers to the phenomenon that the LVLMs claim non-existent objects in the image. To mitigate the object hallucinations, instruction tuning and external model-based detection methods have been proposed, which either require large-scale computational resources or depend on the detection result of external models. However, there remains an under-explored field to utilize the LVLM itself to alleviate object hallucinations. In this work, we adopt the intuition that the LVLM tends to respond logically consistently for existent objects but inconsistently for hallucinated objects. Therefore, we propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT. Specifically, we devise logical consistency probing to raise questions with logical correlations, inquiring about attributes from objects and vice versa. Whether their responses can form a logical closed loop serves as an indicator of object hallucination. As a plug-and-play method, it can be seamlessly applied to all existing LVLMs.
Comprehensive experiments conducted on three benchmarks across four LVLMs have demonstrated significant improvements brought by our method, indicating its effectiveness and generality.	翻訳日:2024-02-20 20:10:05 公開日:2024-02-18
# Decoding News Narratives: Framing Bias Detectionにおける大規模言語モデルの批判的分析 Decoding News Narratives: A Critical Analysis of Large Language Models in Framing Bias Detection ( http://arxiv.org/abs/2402.11621v1 ) ライセンス: Link先を確認	Valeria Pastorino, Jasivan A. Sivakumar, Nafise Sadat Moosavi	(参考訳) 本研究は,GPT-3.5 Turbo, GPT-4, Flan-T5モデルを用いて,ゼロショット, 少数ショット, 説明可能なプロンプト手法によるニュース見出しのフレーミングバイアスを検出することにより, 社会科学におけるLLMの適用性の向上に寄与する。評価から得られた重要な知見は、これらのモデルの信頼性を高めるための説明可能な効果が顕著であり、フレーミングバイアスに関する社会科学研究における説明可能な設定の重要性を強調している。特にGPT-4は、関連するドメイン内の様々な例を示す場合、いくつかのシナリオでパフォーマンスが向上した。 FLAN-T5の貧弱な性能は、より小さなモデルではフレーミングバイアスの検出にタスク固有の微調整が必要になることを示している。また、モデル、特にGPT-4は、しばしば感情言語をフレーミングバイアスの指標として誤解し、真の感情表現を報告することと、意図的にニュース見出しでフレーミングバイアスを使用することを区別することの難しさを強調している。さらに,フレーミングバイアスの有無が明確か,あるいはより議論された見出しの2つの部分集合について評価を行い,既存のデータセットや新しいデータセット内の潜在的なアノテーション不正確性をフラグ付けする上で,これらのモデルが有効であることを示唆した。最後に、この研究は、実際の状況(野における)におけるモデルを評価し、米国銃暴力に焦点を当てた最初のデータセットを超えて、幅広いトピックをカバーするフレーム付き見出しでモデルのパフォーマンスを評価する。 This work contributes to the expanding research on the applicability of LLMs in social sciences by examining the performance of GPT-3.5 Turbo, GPT-4, and Flan-T5 models in detecting framing bias in news headlines through zero-shot, few-shot, and explainable prompting methods. A key insight from our evaluation is the notable efficacy of explainable prompting in enhancing the reliability of these models, highlighting the importance of explainable settings for social science research on framing bias. GPT-4, in particular, demonstrated enhanced performance in few-shot scenarios when presented with a range of relevant, in-domain examples. FLAN-T5's poor performance indicates that smaller models may require additional task-specific fine-tuning for identifying framing bias.
Our study also found that models, particularly GPT-4, often misinterpret emotional language as an indicator of framing bias, underscoring the challenge of distinguishing between reporting genuine emotional expression and intentionally using framing bias in news headlines. We further evaluated the models on two subsets of headlines where the presence or absence of framing bias was either clear-cut or more contested, with the results suggesting that these models can be useful in flagging potential annotation inaccuracies within existing or new datasets. Finally, the study evaluates the models in real-world conditions ("in the wild"), moving beyond the initial dataset focused on U.S. Gun Violence and assessing the models' performance on framed headlines covering a broad range of topics.	翻訳日:2024-02-20 20:09:42 公開日:2024-02-18
# BERT表現における言語特徴の処理プロファイルを識別するメトリック学習符号化モデル Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations ( http://arxiv.org/abs/2402.11608v1 ) ライセンス: Link先を確認	Louis Jalouzot, Robin Sobczyk, Bastien Lhopitallier, Jeanne Salle, Nur Lan, Emmanuel Chemla, Yair Lakretz	(参考訳) 我々は、ニューラルネットワークが処理対象の理論的特徴をどのように表現するかを理解するための新しいアプローチとして、Metric-Learning Encoding Models (MLEMs)を紹介した。概念実証として,BERTから抽出した神経表現にMLEMを適用し,多種多様な言語的特徴(時制,主観的人格,節型,節の埋め込みなど)を追跡する。 1) 言語的特徴は順序づけられる: 異なる層で異なる程度に異なる文の表現を分離する; 2) 神経的表現は階層的に整理される: いくつかの層では、より大きなクラスターの中に入れ替わる表現の集合体が、連続して重要な言語的特徴に従って見つかる; (3) 言語的特徴は中間層で不連続である: 区別的、選択的単位は異なる言語的特徴によって活性化される。メソジカルには、MLEMは多変量復号法よりも優れ、型Iエラーに対してより堅牢であり、(5)局所表現と分散表現の両方を予測することができる。これは、言語モデルにおける言語的特徴のニューラルエンコード方法の研究におけるメトリックラーニング符号化法の有用性と、従来の手法よりもMLEMの利点を示すものである。 MLEMは、他のドメイン(例えば視覚)や人間の脳などの他の神経系に拡張することができる。 We introduce Metric-Learning Encoding Models (MLEMs) as a new approach to understand how neural systems represent the theoretical features of the objects they process. As a proof-of-concept, we apply MLEMs to neural representations extracted from BERT, and track a wide variety of linguistic features (e.g., tense, subject person, clause type, clause embedding). We find that: (1) linguistic features are ordered: they separate representations of sentences to different degrees in different layers; (2) neural representations are organized hierarchically: in some layers, we find clusters of representations nested within larger clusters, following successively important linguistic features; (3) linguistic features are disentangled in middle layers: distinct, selective units are activated by distinct linguistic features. Methodologically, MLEMs are superior (4) to multivariate decoding methods, being more robust to type-I errors, and (5) to univariate encoding methods, in being able to predict both local and distributed representations. 
Together, this demonstrates the utility of Metric-Learning Encoding Methods for studying how linguistic features are neurally encoded in language models and the advantage of MLEMs over traditional methods. MLEMs can be extended to other domains (e.g. vision) and to other neural systems, such as the human brain.	翻訳日:2024-02-20 20:09:12 公開日:2024-02-18
# 準確率的トイモデルにおける非古典性原始 Non-classicality Primitive in a Quasi-probabilistic Toy Model ( http://arxiv.org/abs/2402.11607v1 ) ライセンス: Link先を確認	Kelvin Onggadinata, Pawel Kurzynski, Dagomir Kaszlikowski	(参考訳) 局所アリスとボブが古典的ランダム性を共有する準確率的玩具モデルにおいて,基本的な非古典的効果を示す。我々のシナリオは、ベル不等式違反などの非古典性の正統的な実証と異なり、両方の局地観察者が自由意志を持ち、ランダムに測定設定を選択する。議論の中核は、Abramsky と Brandenburger [Horizons of the Mind, Springer, Cham (2014)] および Pashayan ら [Phys. Rev. Lett. 115, 070501 (2015)] のアルゴリズムを修正したものである。これを用いて、Bobが決定論的に準確率演算を行うなら、AliceとBobがそれをシミュレートするには古典的な通信が必要であることを示す。 We demonstrate a basic non-classical effect in a quasi-probabilistic toy model with local Alice and Bob who share classical randomness. Our scenario differs from the orthodox demonstrations of non-classicality such as violations of Bell inequalities where both local observers have a free will and randomly choose their measurement settings. The core of the argument consists of modified algorithms by Abramsky and Brandenburger [in Horizons of the Mind, Springer, Cham (2014)] and Pashayan et al. [Phys. Rev. Lett. 115, 070501 (2015)], which we use to show that if Bob deterministically performs a quasi-stochastic operation, Alice and Bob require classical communication to simulate it.	翻訳日:2024-02-20 20:08:42 公開日:2024-02-18
# 自己進化型オートエンコーダ組み込みQネットワーク Self-evolving Autoencoder Embedded Q-Network ( http://arxiv.org/abs/2402.11604v1 ) ライセンス: Link先を確認	J. Senthilnath, Bangjian Zhou, Zhen Wei Ng, Deeksha Aggarwal, Rajdeep Dutta, Ji Wei Yoon, Aye Phyu Phyu Aung, Keyu Wu, Min Wu, Xiaoli Li	(参考訳) 逐次的意思決定タスクの分野では,強化学習(rl)エージェントの探索能力は,環境とのインタラクションを通じて高い報酬を得る上で最重要となる。そこで本研究では,自己進化型オートエンコーダ(SA)をQ-Network(QN)に組み込む新しい手法であるSAQNを提案する。 SAQNでは、自己進化型オートエンコーダアーキテクチャは、エージェントが環境を探索する際に適応して進化する。この進化により、オートエンコーダは様々な生の観測を捉え、潜在空間において効果的に表現することができる。エンコーダ生成した潜在空間から抽出された不連続状態を利用して、qnを訓練し、報酬を改善する最適なアクションを決定する。オートエンコーダアーキテクチャの進化において、rlエージェントからの最適な応答を導出するためにバイアス分散規制戦略が用いられる。この戦略には2つの重要な要素があります (i)事前に獲得した知識を保持するためのノードの成長の促進、環境の豊かな表現の確保、 (ii)より管理可能でトラクタブルな潜在空間を維持するために、最小の寄与ノードをプルーニングすること。 3つの異なるベンチマーク環境と実世界の分子環境で行った大規模な実験により、提案したSAQNは最先端の環境よりも大幅に優れていることが示された。その結果、自己進化型オートエンコーダの有効性と、シーケンシャルな意思決定タスクに取り組む上でのQ-Networkとの協調性を強調した。 In the realm of sequential decision-making tasks, the exploration capability of a reinforcement learning (RL) agent is paramount for achieving high rewards through interactions with the environment. To enhance this crucial ability, we propose SAQN, a novel approach wherein a self-evolving autoencoder (SA) is embedded with a Q-Network (QN). In SAQN, the self-evolving autoencoder architecture adapts and evolves as the agent explores the environment. This evolution enables the autoencoder to capture a diverse range of raw observations and represent them effectively in its latent space. By leveraging the disentangled states extracted from the encoder generated latent space, the QN is trained to determine optimal actions that improve rewards. During the evolution of the autoencoder architecture, a bias-variance regulatory strategy is employed to elicit the optimal response from the RL agent. 
This strategy involves two key components: (i) fostering the growth of nodes to retain previously acquired knowledge, ensuring a rich representation of the environment, and (ii) pruning the least contributing nodes to maintain a more manageable and tractable latent space. Extensive experimental evaluations conducted on three distinct benchmark environments and a real-world molecular environment demonstrate that the proposed SAQN significantly outperforms state-of-the-art counterparts. The results highlight the effectiveness of the self-evolving autoencoder and its collaboration with the Q-Network in tackling sequential decision-making tasks.	翻訳日:2024-02-20 20:08:26 公開日:2024-02-18
# マルチタスク推論: 大規模言語モデルは一度に複数の命令を追えるか? Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once? ( http://arxiv.org/abs/2402.11597v1 ) ライセンス: Link先を確認	Guijin Son and Sangwon Baek and Sangdae Nam and Ilgyun Jeong and Seungone Kim	(参考訳) 大規模言語モデル(LLM)は通常、推論呼び出し毎に単一の命令に従うように促される。本研究では、LLMが複数の命令を同時に処理できるかどうかを、マルチタスク推論として分析する。 MTI Bench(Multi-Task Inference Benchmark)は,25タスクにわたる5,000インスタンスを対象とした総合評価ベンチマークである。 MTIベンチの各タスクは2～3つのサブタスクを含む。予想通り、マルチタスク推論は複数の推論呼び出しを必要としないため、平均で1.46倍の推論時間を削減できることを最初に実証した。興味深いことに、タスク分割時のLLMの性能は向上すると期待されているのに対して、Llama-2-Chat-70BやGPT-4のような最先端のLLMは、MTI Benchのシングルタスク推論と比較して最大7.3%、12.4%向上している。 MTI Benchデータセットとコードをこのリンクでリリースします。 Large language models (LLMs) are typically prompted to follow a single instruction per inference call. In this work, we analyze whether LLMs also hold the capability to handle multiple instructions simultaneously, denoted as Multi-Task Inference. For this purpose, we introduce the MTI Bench (Multi-Task Inference Benchmark), a comprehensive evaluation benchmark encompassing 5,000 instances across 25 tasks. Each task in the MTI Bench involves 2 to 3 sub-tasks. As expected, we first demonstrate that Multi-Task Inference reduces the total inference time by a factor of 1.46 on average since it does not require multiple inference calls. Interestingly, contrary to the expectation that LLMs would perform better when tasks are divided, we find that state-of-the-art LLMs, such as Llama-2-Chat-70B and GPT-4, show up to 7.3% and 12.4% improved performance with Multi-Task Inference compared to Single-Task Inference on the MTI Bench. We release the MTI Bench dataset and our code at https://github.com/guijinSON/MTI-Bench.	翻訳日:2024-02-20 20:08:02 公開日:2024-02-18
# オンライン機械学習におけるハイパーパラメータチューニングの簡略化 -- spotRiverGUI Simplifying Hyperparameter Tuning in Online Machine Learning -- The spotRiverGUI ( http://arxiv.org/abs/2402.11594v1 ) ライセンス: Link先を確認	Thomas Bartz-Beielstein	(参考訳) Batch Machine Learning(BML)は非常に大量のストリーミングデータを扱う場合、その限界に達する。これは、利用可能なメモリ、データストリームのドリフト処理、新しい未知のデータ処理に特に当てはまる。 Online Machine Learning (OML)は、BMLの制限を克服するBMLに代わるものだ。 OMLはシーケンシャルな方法でデータを処理することができ、特にデータストリームに役立ちます。 River`パッケージはPython OMLライブラリであり、分類、回帰、クラスタリング、異常検出など、さまざまなオンライン学習アルゴリズムを提供する。パッケージは、OMLモデルのハイパーパラメータチューニングのためのフレームワークを提供する。 spotRiverGUI`は、‘spotRiver`パッケージのグラフィカルユーザインターフェースである。 spotrivergui`は、最適なハイパーパラメータの設定を手動で検索する負担からユーザーを解放する。データが提供されると、ユーザは強力な‘River’パッケージから異なるOMLアルゴリズムを比較して、選択したアルゴリズムを非常に効率的にチューニングできる。 Batch Machine Learning (BML) reaches its limits when dealing with very large amounts of streaming data. This is especially true for available memory, handling drift in data streams, and processing new, unknown data. Online Machine Learning (OML) is an alternative to BML that overcomes the limitations of BML. OML is able to process data in a sequential manner, which is especially useful for data streams. The `river` package is a Python OML-library, which provides a variety of online learning algorithms for classification, regression, clustering, anomaly detection, and more. The `spotRiver` package provides a framework for hyperparameter tuning of OML models. The `spotRiverGUI` is a graphical user interface for the `spotRiver` package. The `spotRiverGUI` releases the user from the burden of manually searching for the optimal hyperparameter setting. After the data is provided, users can compare different OML algorithms from the powerful `river` package in a convenient way and tune the selected algorithms very efficiently.	翻訳日:2024-02-20 20:07:44 公開日:2024-02-18
# メモリ効率の良いLLMファインチューニングのためのゼロ階最適化の再検討:ベンチマーク Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark ( http://arxiv.org/abs/2402.11592v1 ) ライセンス: Link先を確認	Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen	(参考訳) 自然言語処理(NLP)の進化途上において、SGDやAdamのような一階最適化(FO)を備えた微調整済みの大規模言語モデル(LLM)が標準となっている。しかし, LLMのサイズが大きくなるにつれて, FO勾配計算のバックプロパゲーション(BP)によるメモリオーバーヘッドが大幅に増大する。メモリ効率が最重要となるオンデバイストレーニングのようなアプリケーションでは、この問題に対処することが特に重要です。本稿では, BPフリーなゼロオーダー最適化(ZO)へのシフトを, MeZO による初期概念に基づく LLM 微調整時のメモリコスト削減ソリューションとして提案する。従来のZO-SGD法とは異なり、我々の研究はより広範なZO最適化手法に拡張され、5つのLLMファミリー(Roberta, OPT, LLaMA, Vicuna, Mistral)、3つのタスク複雑度、5つの微調整スキームにまたがる総合的なベンチマーク研究が実施されている。本研究は,これまで見過ごされていた最適化原理を明らかにし,タスクアライメントの重要性,前傾勾配法の役割,アルゴリズムの複雑さと微調整性能のバランスを強調する。さらに,ブロックワイド降下,ハイブリッドトレーニング,勾配間隔など,ZO最適化の新たな拡張も導入する。本研究は、さらなるメモリ効率のLLM微調整を実現するための有望な方向性を提供する。すべての実験を再現するためのコードはhttps://github.com/ZO-Bench/ZO-LLM にある。 In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by MeZO. Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques, through a comprehensive, first-of-its-kind benchmarking study across five LLM families (Roberta, OPT, LLaMA, Vicuna, Mistral), three task complexities, and five fine-tuning schemes.
Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction for achieving further memory-efficient LLM fine-tuning. Code to reproduce all our experiments is available at https://github.com/ZO-Bench/ZO-LLM.	翻訳日:2024-02-20 20:07:30 公開日:2024-02-18
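To make the BP-free idea in the entry above concrete, here is a minimal sketch of a generic zeroth-order SGD step with a two-point random-direction gradient estimate. It is not the benchmark's code and not MeZO's memory-saving implementation: the function names, the toy quadratic loss, and the hyperparameters (`lr`, `eps`, 500 steps) are assumptions chosen for illustration.

```python
import random
from typing import Callable, List


def zo_sgd_step(
    loss: Callable[[List[float]], float],
    params: List[float],
    lr: float,
    eps: float,
    rng: random.Random,
) -> List[float]:
    """One zeroth-order SGD step using a two-point random-direction estimate.

    Only two forward evaluations of `loss` are needed; no back-propagation.
    """
    z = [rng.gauss(0.0, 1.0) for _ in params]  # random perturbation direction
    plus = loss([p + eps * zi for p, zi in zip(params, z)])
    minus = loss([p - eps * zi for p, zi in zip(params, z)])
    directional = (plus - minus) / (2.0 * eps)  # estimates (gradient . z)
    return [p - lr * directional * zi for p, zi in zip(params, z)]


def quadratic(x: List[float]) -> float:
    """Toy loss standing in for a fine-tuning objective."""
    return sum(v * v for v in x)


rng = random.Random(0)
x = [1.0, -2.0, 0.5, 3.0, -1.5]
initial = quadratic(x)
for _ in range(500):
    x = zo_sgd_step(quadratic, x, lr=0.05, eps=1e-3, rng=rng)
final = quadratic(x)
```

The memory argument is visible even in this toy: the step stores only the parameters and one random direction, and never builds a computation graph. The price is a noisier gradient estimate whose variance grows with the number of parameters, which is one reason the choice of ZO method and fine-tuning scheme matters.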
# SDiT:トランスを用いたスパイキング拡散モデル SDiT: Spiking Diffusion Model with Transformer ( http://arxiv.org/abs/2402.11588v1 ) ライセンス: Link先を確認	Shu Yang, Hanzhi Ma, Chengting Yu, Aili Wang, Er-Ping Li	(参考訳) スパイキングニューラルネットワーク (snn) は消費電力が低く, バイオコンタプリタブルな特性を有しており, エネルギー効率の高いコンピューティングの可能性を秘めていると考えられている。しかし、画像生成タスクにおけるSNNの探索は非常に限定的であり、SNNベースの生成モデルに対する統一的で効果的な構造はまだ提案されていない。本稿では,スパイクニューラルネットワークにおける新しい拡散モデルアーキテクチャについて検討する。我々は、主流拡散モデルにおいてよく使われるU-net構造を置き換えるためにトランスフォーマーを利用する。比較的低い計算コストと短いサンプリング時間で高品質な画像を生成することができる。 SNNに基づく生成モデルの研究のための経験的ベースラインの提供を目的としている。 MNIST、Fashion-MNIST、CIFAR-10データセットの実験は、既存のSNN生成モデルと比較して、我々の研究が非常に競合していることを示している。 Spiking neural networks (SNNs) have low power consumption and bio-interpretable characteristics, and are considered to have tremendous potential for energy-efficient computing. However, the exploration of SNNs on image generation tasks remains very limited, and a unified and effective structure for SNN-based generative models has yet to be proposed. In this paper, we explore a novel diffusion model architecture within spiking neural networks. We utilize transformer to replace the commonly used U-net structure in mainstream diffusion models. It can generate higher quality images with relatively lower computational cost and shorter sampling time. It aims to provide an empirical baseline for research of generative models based on SNNs. Experiments on MNIST, Fashion-MNIST, and CIFAR-10 datasets demonstrate that our work is highly competitive compared to existing SNN generative models.	翻訳日:2024-02-20 20:07:01 公開日:2024-02-18
# PolypNextLSTM:ConvNextとConvLSTMを用いた軽量かつ高速なPolypビデオセグメンテーションネットワーク PolypNextLSTM: A lightweight and fast polyp video segmentation network using ConvNext and ConvLSTM ( http://arxiv.org/abs/2402.11585v1 ) ライセンス: Link先を確認	Debayan Bhattacharya, Konrad Reuter, Finn Behrendnt, Lennart Maack, Sarah Grube, Alexander Schlaefer	(参考訳) ポリプセグメンテーションで一般的に用いられる単一の画像unetアーキテクチャは、ポリープの診断においてビデオデータから得られる時間的洞察が欠如している。臨床実践をより忠実に反映するために,提案手法であるPolypNextLSTMは,映像に基づく深層学習を活用し,時間的情報を利用して,最小パラメータオーバーヘッドでセグメンテーション性能を向上させる。 PolypNextLSTMは、UNetライクな構造で、ConvNext-Tinyをバックボーンとして、パラメータオーバーヘッドを減らすために、最後の2つのレイヤを戦略的に省略する。我々の時間融合モジュールであるConvLSTM(Convolutional Long Short Term Memory)は、時間的特徴を効果的に活用する。我々の主な特徴はPolypNextLSTMであり、パラメータの最もリーンで最速のモデルであり、5つの最先端の画像モデルとビデオベースのディープラーニングモデルの性能を上回っている。 sun-segデータセットの評価は、高速モーションやオクルージョンのような挑戦的なアーティファクトを含むビデオとともに、検出が容易で検出が難しいポリプシナリオにまたがる。 Commonly employed in polyp segmentation, single image UNet architectures lack the temporal insight clinicians gain from video data in diagnosing polyps. To mirror clinical practices more faithfully, our proposed solution, PolypNextLSTM, leverages video-based deep learning, harnessing temporal information for superior segmentation performance with the least parameter overhead, making it possibly suitable for edge devices. PolypNextLSTM employs a UNet-like structure with ConvNext-Tiny as its backbone, strategically omitting the last two layers to reduce parameter overhead. Our temporal fusion module, a Convolutional Long Short Term Memory (ConvLSTM), effectively exploits temporal features. Our primary novelty lies in PolypNextLSTM, which stands out as the leanest in parameters and the fastest model, surpassing the performance of five state-of-the-art image and video-based deep learning models. The evaluation of the SUN-SEG dataset spans easy-to-detect and hard-to-detect polyp scenarios, along with videos containing challenging artefacts like fast motion and occlusion.	翻訳日:2024-02-20 20:06:49 公開日:2024-02-18
# 公に監査可能なプライバシー保護型選挙人名簿 Publicly auditable privacy-preserving electoral rolls ( http://arxiv.org/abs/2402.11582v1 ) ライセンス: Link先を確認	Prashant Agrawal, Mahabir Prasad Jhanwar, Subodh Vishnu Sharma, Subhashis Banerjee	(参考訳) 電子投票に関する既存の文献は、投票プロトコルの検証可能性を広く取り上げているが、大規模な公的選挙における選挙人名簿の脆弱性は依然として重要な懸念となっている。選挙人名簿の完全性を確保するために、現在の慣習は選挙人名簿を公開するか、政党と共有することである。しかし、これは詳細な有権者プロファイルの構築と、有権者の選択的ターゲティングと操作を可能にし、自由かつ公正な選挙の基本原則を損なう。本稿では,公的に監査可能かつプライバシ保護型の選挙人名簿の設計問題について検討する。まず脅威モデルを定式化し、形式的なセキュリティ定義を提供する。次に,これらの脅威を軽減する選挙人名簿の作成と維持のためのプロトコルを提案する。有権者は自身が名簿に含まれていることを検証でき、政党や監査人は選挙人名簿を統計的に監査することができる。選挙人名簿全体が明かされることはなく、大規模で組織的な有権者のターゲティングや操作を防ぐ。 While existing literature on electronic voting has extensively addressed verifiability of voting protocols, the vulnerability of electoral rolls in large public elections remains a critical concern. To ensure integrity of electoral rolls, the current practice is to either make electoral rolls public or share them with the political parties. However, this enables construction of detailed voter profiles and selective targeting and manipulation of voters, thereby undermining the fundamental principle of free and fair elections. In this paper, we study the problem of designing publicly auditable yet privacy-preserving electoral rolls. We first formulate a threat model and provide formal security definitions. We then present a protocol for creation and maintenance of electoral rolls that mitigates the threats. Eligible voters can verify their inclusion, whereas political parties and auditors can statistically audit the electoral roll. The entire electoral roll is never revealed, which prevents any large-scale systematic voter targeting and manipulation.	翻訳日:2024-02-20 20:06:24 公開日:2024-02-18
# 因果潜在因子モデルにおける二重ロバスト推論 Doubly Robust Inference in Causal Latent Factor Models ( http://arxiv.org/abs/2402.11652v1 ) ライセンス: Link先を確認	Alberto Abadie, Anish Agarwal, Raaz Dwivedi, Abhin Shah	(参考訳) 本稿では、多数の単位と結果を含む現代のデータ豊富な環境において、未観測の交絡の下で平均処理効果を推定するための新しいフレームワークを紹介する。提案した推定器は二重に頑健であり,結果の補完,逆確率重み付け,行列補完のための新しいクロスフィット手法を組み合わせている。有限サンプルと漸近保証を導出し、新しい推定器の誤差がパラメトリックレートで平均零ガウス分布に収束することを示す。シミュレーション結果は,本論文で分析した推定器の形式的特性の実用的妥当性を示す。 This article introduces a new framework for estimating average treatment effects under unobserved confounding in modern data-rich environments featuring large numbers of units and outcomes. The proposed estimator is doubly robust, combining outcome imputation, inverse probability weighting, and a novel cross-fitting procedure for matrix completion. We derive finite-sample and asymptotic guarantees, and show that the error of the new estimator converges to a mean-zero Gaussian distribution at a parametric rate. Simulation results demonstrate the practical relevance of the formal properties of the estimators analyzed in this article.	翻訳日:2024-02-20 19:59:13 公開日:2024-02-18
# 失敗から学ぶ: 大きな言語モデルをエージェントとして微調整するとき、否定的な例を統合する Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents ( http://arxiv.org/abs/2402.11651v1 ) ライセンス: Link先を確認	Renxi Wang, Haonan Li, Xudong Han, Yixuan Zhang, Timothy Baldwin	(参考訳) 大規模言語モデル(LLM)は、検索エンジンのようなツールを通じて環境と対話するエージェントとして機能することに成功した。しかし、LLMはトレーニングやアライメントにおいてツールの使用に特化して最適化されておらず、エージェントとしての有効性が制限されている。この問題を解決するために、従来の研究はGPT-4と環境の間の相互作用軌跡を収集し、それらを用いてより小さなモデルを微調整した。その際、標準的なアプローチでは、タスクを正常に終了しない軌跡を単に破棄しており、これは一方ではデータやリソースのかなりの無駄を招き、他方では微調整時に可能な最適化パスを制限する可能性がある。本稿では,大規模言語モデルが適切なデータクリーニングと微調整戦略によって失敗から学習できることを論じる。数学的推論,マルチホップ質問応答,戦略的質問応答タスクについて実験を行う。実験結果から, 正の例のみを用いた場合と比較して, 負の例を取り入れた場合, モデル性能が大きく向上することが示された。 Large language models (LLMs) have achieved success in acting as agents, which interact with environments through tools like search engines. However, LLMs are not optimized specifically for tool use during training or alignment, limiting their effectiveness as agents. To resolve this problem, previous work has collected interaction trajectories between GPT-4 and environments, and fine-tuned smaller models with them. As part of this, the standard approach has been to simply discard trajectories that do not finish the task successfully, which, on the one hand, leads to a significant waste of data and resources, and on the other hand, has the potential to limit the possible optimization paths during fine-tuning. In this paper, we contend that large language models can learn from failures through appropriate data cleaning and fine-tuning strategies. We conduct experiments on mathematical reasoning, multi-hop question answering, and strategic question answering tasks. Experimental results demonstrate that compared to solely using positive examples, incorporating negative examples enhances model performance by a large margin.	翻訳日:2024-02-20 19:59:03 公開日:2024-02-18
# プログラム強化学習のための理論的基礎 Theoretical foundations for programmatic reinforcement learning ( http://arxiv.org/abs/2402.11650v1 ) ライセンス: Link先を確認	Guruprerana Shabadi, Nathanaël Fijalkow, Théo Matricon	(参考訳) 強化学習(RL)の分野は、未知の確率環境において最適方針を学習するためのアルゴリズムに関するものである。プログラム的RLは、制御ループのような高次構造を含むプログラムとしてのポリシーの表現を研究する。機械学習と形式手法のコミュニティの交差点で多くの注目を集めているにもかかわらず、プログラム的RLに関する理論的側面についてはほとんど知られていない。プログラム的ポリシーの良いクラスとは何か? 最適なプログラム的ポリシーはどのくらい大きいか? どうやって学習できるのか? 本論文の目的は,プログラム的RLの理論研究を始めながら,これらの質問に対する最初の回答を与えることである。 The field of Reinforcement Learning (RL) is concerned with algorithms for learning optimal policies in unknown stochastic environments. Programmatic RL studies representations of policies as programs, that is, involving higher-order constructs such as control loops. Despite attracting a lot of attention at the intersection of the machine learning and formal methods communities, very little is known on the theoretical front about programmatic RL: what are good classes of programmatic policies? How large are optimal programmatic policies? How can we learn them? The goal of this paper is to give first answers to these questions, initiating a theoretical study of programmatic RL.	翻訳日:2024-02-20 19:58:31 公開日:2024-02-18
# 機械学習による量子画像処理: 量子画像処理の品質と信頼性を改善する新しいアプローチ Quantum Image Denoising with Machine Learning: A Novel Approach to Improve Quantum Image Processing Quality and Reliability ( http://arxiv.org/abs/2402.11645v1 ) ライセンス: Link先を確認	Yew Kee Wonga, Yifan Zhou, Yan Shing Liang	(参考訳) 量子画像処理(QIP)は、画像の操作と解析に量子コンピューティングの利点を活用することを目的とした分野である。しかし、QIPは量子ビットの制限と量子マシン内のノイズの存在という2つの課題に直面している。本研究では,QIPにおけるノイズ問題に対処する新しい手法を提案する。量子処理画像のノイズを識別し補正する機械学習モデルを訓練し活用することにより、機械に起因するノイズを補償し、古典的コンピュータによるものと類似した処理結果を高い効率で得ることができる。このモデルは、オープンアクセスデータセットから得た既存の処理済み画像と量子処理された画像の両方からなるデータセットを学習することでトレーニングされる。このモデルは、各ピクセルの信頼度とその推定される元の値を提供できる。 QIPにおける損失とデコヒーレンスを補正するモデルの精度を評価するために,Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM), Mean Opinion Score (MOS)の3つの指標を用いて評価を行った。さらに、ドメイン間のモデルの適用性や、代替手法と比較した費用対効果についても論じる。 Quantum Image Processing (QIP) is a field that aims to utilize the benefits of quantum computing for manipulating and analyzing images. However, QIP faces two challenges: the limitation of qubits and the presence of noise in a quantum machine. In this research we propose a novel approach to address the issue of noise in QIP. By training and employing a machine learning model that identifies and corrects the noise in quantum processed images, we can compensate for the noisiness caused by the machine and retrieve a processing result similar to that performed by a classical computer with higher efficiency. The model is trained by learning a dataset consisting of both existing processed images and quantum processed images from open access datasets. This model will be capable of providing us with the confidence level for each pixel and its potential original value. To assess the model's accuracy in compensating for loss and decoherence in QIP, we evaluate it using three metrics: Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mean Opinion Score (MOS). 
Additionally, we discuss the applicability of our model across domains as well as its cost-effectiveness compared to alternative methods.	翻訳日:2024-02-20 19:58:15 公開日:2024-02-18
# 汎用グラフ学習へのアプローチ : 大規模言語モデルの観点から Towards Versatile Graph Learning Approach: from the Perspective of Large Language Models ( http://arxiv.org/abs/2402.11641v1 ) ライセンス: Link先を確認	Lanning Wei, Jun Gao, Huan Zhao	(参考訳) グラフ構造化データは一般的に使われ、現実世界で幅広いアプリケーションシナリオを持つ。これらの多様なアプリケーションに対して、多種多様な学習タスク、グラフドメイン、複雑なグラフ学習手順は、汎用的なグラフ学習アプローチを設計する際に、人間の専門家に課題を突きつける。これらの課題に対し、大規模言語モデル(LLM)は、広範な知識と人間のような知性により潜在的な解決策を提供する。本稿では, LLMを用いた汎用グラフ学習手法を設計するための新しい概念的プロトタイプを提案し, 特に 'where' と 'how' の視点に着目する。 'where'の観点では,タスク定義,グラフデータの特徴量エンジニアリング,モデル選択と最適化,デプロイと提供という4つの重要なグラフ学習手順を要約する。次に、これらの手順におけるLLMの応用シナリオをより幅広く検討する。 'how'の観点では、LLMの能力と各手順の要件を対応づける。最後に,LLMの強みを汎用グラフ学習法に活用する上で有望な方向性を指摘する。 Graph-structured data are commonly used and have wide application scenarios in the real world. For these diverse applications, the vast variety of learning tasks, graph domains, and complex graph learning procedures present challenges for human experts when designing versatile graph learning approaches. Facing these challenges, large language models (LLMs) offer a potential solution due to the extensive knowledge and the human-like intelligence. This paper proposes a novel conceptual prototype for designing versatile graph learning methods with LLMs, with a particular focus on the "where" and "how" perspectives. From the "where" perspective, we summarize four key graph learning procedures, including task definition, graph data feature engineering, model selection and optimization, deployment and serving. We then explore the application scenarios of LLMs in these procedures across a wider spectrum. In the "how" perspective, we align the abilities of LLMs with the requirements of each procedure. Finally, we point out the promising directions that could better leverage the strength of LLMs towards versatile graph learning methods.	翻訳日:2024-02-20 19:57:36 公開日:2024-02-18
# 妨害ブロック: 攻撃下の機械生成テキスト検出器のロバスト性に関するストレステスト Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks ( http://arxiv.org/abs/2402.11638v1 ) ライセンス: Link先を確認	Yichen Wang, Shangbin Feng, Abe Bohan Hou, Xiao Pu, Chao Shen, Xiaoming Liu, Yulia Tsvetkov, Tianxing He	(参考訳) 大規模言語モデル(LLM)の普及により、誤用を防ぐために機械生成テキストを検出する手法の需要が高まっている。本研究の目的は,現実的なシナリオにおいて,悪意のある攻撃に対する検出器の頑健性をストレステストすることである。我々は,一般的な機械生成テキスト検出器の堅牢性について,編集,パラフレージング,プロンプト,コジェネレーションの様々なカテゴリの攻撃下で総合的に研究する。我々の攻撃はジェネレータLLMへの限られたアクセスを前提としており、異なる予算レベルで異なる攻撃に対する検出器の性能を比較する。実験の結果、すべての攻撃の下で堅牢性を保つ既存の検出器はほとんどなく、すべての検出器は異なる抜け穴を示すことがわかった。全ての検出器を平均すると、全ての攻撃で性能は35%低下する。さらに,これらの欠陥の原因を調査し,堅牢性を改善するための初期パッチを提案する。 The widespread use of large language models (LLMs) is increasing the demand for methods that detect machine-generated text to prevent misuse. The goal of our study is to stress test the detectors' robustness to malicious attacks under realistic scenarios. We comprehensively study the robustness of popular machine-generated text detectors under attacks from diverse categories: editing, paraphrasing, prompting, and co-generating. Our attacks assume limited access to the generator LLMs, and we compare the performance of detectors on different attacks under different budget levels. Our experiments reveal that almost none of the existing detectors remain robust under all the attacks, and all detectors exhibit different loopholes. Averaging all detectors, the performance drops by 35% across all attacks. Further, we investigate the reasons behind these defects and propose initial out-of-the-box patches to improve robustness.	翻訳日:2024-02-20 19:56:18 公開日:2024-02-18
# フェイクユーザによるフェデレーションレコメンダシステム Poisoning Federated Recommender Systems with Fake Users ( http://arxiv.org/abs/2402.11637v1 ) ライセンス: Link先を確認	Ming Yin, Yichang Xu, Minghong Fang, and Neil Zhenqiang Gong	(参考訳) フェデレーションレコメンデーション(federated recommendation)は、フェデレーション学習における重要なユースケースだが、ユーザからサーバ側の脆弱性など、さまざまな攻撃に影響を受けやすい。毒殺攻撃は、参加者が悪質なモデルアップデートをアップロードしてグローバルモデルを欺き、特定のターゲットアイテムの宣伝や取り下げを意図しているため、ユーザー側の攻撃で特に顕著である。本研究では,フェデレーションレコメンデータシステムにおけるプロモーションアタック実行戦略について検討する。フェデレートされたレコメンダシステムに対する現在の中毒攻撃は、実際のユーザやアイテムの人気に関するローカルトレーニングデータなどの追加情報に依存することが多い。しかし、そのような情報は潜在的な攻撃者が得るのに困難である。したがって、サーバから取得したアイテムの埋め込み以外に余分な情報を必要としない攻撃を開発する必要がある。本稿では,ユーザ評価データやユーザ属性,サーバが使用するアグリゲーションルールなどの知識を必要とせずに,フェデレーションレコメンデータシステムにおいて,攻撃対象の項目をプロモートするための,新たな偽ユーザベース中毒攻撃であるPoisonFRSを導入する。複数の実世界のデータセットに対する大規模な実験により、PoisonFRSは攻撃対象のアイテムを真のユーザの大部分に効果的にプロモートし、システムに関する追加情報に依存する現在のベンチマークを上回ります。さらに,実際のユーザと偽ユーザの両方によるモデル更新は,潜在領域では区別がつかないことも確認した。 Federated recommendation is a prominent use case within federated learning, yet it remains susceptible to various attacks, from user to server-side vulnerabilities. Poisoning attacks are particularly notable among user-side attacks, as participants upload malicious model updates to deceive the global model, often intending to promote or demote specific targeted items. This study investigates strategies for executing promotion attacks in federated recommender systems. Current poisoning attacks on federated recommender systems often rely on additional information, such as the local training data of genuine users or item popularity. However, such information is challenging for the potential attacker to obtain. Thus, there is a need to develop an attack that requires no extra information apart from item embeddings obtained from the server. 
In this paper, we introduce a novel fake user based poisoning attack named PoisonFRS to promote the attacker-chosen targeted item in federated recommender systems without requiring knowledge about user-item rating data, user attributes, or the aggregation rule used by the server. Extensive experiments on multiple real-world datasets demonstrate that PoisonFRS can effectively promote the attacker-chosen targeted item to a large portion of genuine users and outperform current benchmarks that rely on additional information about the system. We further observe that the model updates from both genuine and fake users are indistinguishable within the latent space.	翻訳日:2024-02-20 19:55:42 公開日:2024-02-18
# IDEのユニバーサルインターフェースとしてのツール拡張LLM Tool-Augmented LLMs as a Universal Interface for IDEs ( http://arxiv.org/abs/2402.11635v1 ) ライセンス: Link先を確認	Yaroslav Zharov, Yury Khudyakov, Evgeniia Fedotova, Evgeny Grigorenko, Egor Bogomolov	(参考訳) 現在の統合開発環境(IDE)は、初期のテキスト編集ユーティリティから、開発者を支援する数千の機能を含む複雑なプログラムまで、長い道のりをたどってきた。しかし、効率向上ツールが数多く組み込まれるにつれ、IDEは徐々に学習曲線の急な高度なソフトウェアとなった。自然言語対話とコード生成の両方が可能なLarge Language Models(LLM)の台頭は、IDEという概念の陳腐化に関する議論につながっている。本研究では,IDEの機能を包むユニバーサルインターフェースとして,IDE における LLM の位置づけについて考察する。ユーザコマンドで複数のIDE機能を含む複雑なアクションを実行でき、オプションやアクションを探す面倒な作業をユーザ体験から取り除けるモデルを構想する。実践的な部分については、あるタスクの実行を迅速化するために外部ツールを呼び出すLLMの能力を探究する研究を踏まえる。このようなツールの概念実証を紹介する。 Modern-day Integrated Development Environments (IDEs) have come a long way from the early text editing utilities to the complex programs encompassing thousands of functions to help developers. However, with the increasing number of efficiency-enhancing tools incorporated, IDEs gradually became sophisticated software with a steep learning curve. The rise of the Large Language Models (LLMs) capable of both natural language dialogue and code generation leads to a discourse on the obsolescence of the concept of IDE. In this work, we offer a view on the place of the LLMs in the IDEs as the universal interface wrapping the IDE facilities. We envision a model that is able to perform complex actions involving multiple IDE features upon user command, stripping the user experience of the tedious work involved in searching through options and actions. For the practical part of the work, we engage with the works exploring the ability of LLMs to call for external tools to expedite a given task execution. We showcase a proof-of-concept of such a tool.	翻訳日:2024-02-20 19:55:15 公開日:2024-02-18
# インテント認識情報参照ダイアログ生成のためのセルフシーディングおよびマルチインテント自己指示llm Self-seeding and Multi-intent Self-instructing LLMs for Generating Intent-aware Information-Seeking dialogs ( http://arxiv.org/abs/2402.11633v1 ) ライセンス: Link先を確認	Arian Askari, Roxana Petcu, Chuan Meng, Mohammad Aliannejadi, Amin Abolghasemi, Evangelos Kanoulas, Suzan Verberne	(参考訳) 情報検索ダイアログにおけるユーザ意図の特定は,ユーザの情報ニーズを満たすシステムにとって極めて重要である。意図予測(ip)は困難であり、トレーニングのための人間ラベルの意図と十分な対話を要求する。しかし、手動でアノテートするインテントはリソース集約である。大規模言語モデル(llm)は合成データの生成に有効であることが示されているが、意図認識情報参照ダイアログを生成するためにllmを使用する研究はない。本稿では,大規模・オープンドメイン・インテント対応情報検索ダイアログのゼロショット生成にLLMを活用することに焦点を当てる。本稿では,新しいセルフシーディングとマルチインテント・セルフインストラクションスキームを持つsolidを提案する。前者は、LLM自身の知識スコープを用いてダイアログ生成を開始し、後者は、LLMに順次発声を発生させるよう促し、複雑な多言語発声を発生させる際に、LLMにそのプロンプト命令を自律的に適応させることで、手動のプロンプト設計の必要性を緩和する。さらに,solidが生成するデータに対して1ステップでダイアログを生成するように訓練したsolid-rlを提案する。そこで本研究では,SOLID-RLの学習過程において,SOLID生成ダイアログに様々な重みを割り当てる長さに基づく品質推定機構を提案する。我々は、SOLIDとSOLID-RLを使用して300万以上の意図認識ダイアログを生成し、既存のデータセットのサイズを超える。実験により、SOLIDとSOLID-RLによって生成されたダイアログに基づいて訓練されたIPメソッドは、人為的なダイアログよりも優れたIP品質を実現することが示された。 Identifying user intents in information-seeking dialogs is crucial for a system to meet user's information needs. Intent prediction (IP) is challenging and demands sufficient dialogs with human-labeled intents for training. However, manually annotating intents is resource-intensive. While large language models (LLMs) have been shown to be effective in generating synthetic data, there is no study on using LLMs to generate intent-aware information-seeking dialogs. In this paper, we focus on leveraging LLMs for zero-shot generation of large-scale, open-domain, and intent-aware information-seeking dialogs. We propose SOLID, which has novel self-seeding and multi-intent self-instructing schemes. 
The former improves the generation quality by using the LLM's own knowledge scope to initiate dialog generation; the latter prompts the LLM to generate utterances sequentially, and mitigates the need for manual prompt design by asking the LLM to autonomously adapt its prompt instruction when generating complex multi-intent utterances. Furthermore, we propose SOLID-RL, which is further trained to generate a dialog in one step on the data generated by SOLID. We propose a length-based quality estimation mechanism to assign varying weights to SOLID-generated dialogs based on their quality during the training process of SOLID-RL. We use SOLID and SOLID-RL to generate more than 300k intent-aware dialogs, surpassing the size of existing datasets. Experiments show that IP methods trained on dialogs generated by SOLID and SOLID-RL achieve better IP quality than ones trained on human-generated dialogs.	翻訳日:2024-02-20 19:55:00 公開日:2024-02-18
# ニューロモルフィックな顔分析:調査 Neuromorphic Face Analysis: a Survey ( http://arxiv.org/abs/2402.11631v1 ) ライセンス: Link先を確認	Federico Becattini, Lorenzo Berlincioni, Luca Cultrera, Alberto Del Bimbo	(参考訳) イベントカメラ(英: event camera)またはニューロモルフィックセンサー(英: Neuromorphic sensor)は、生物学的視覚系の機能を模倣する撮像装置の一種。異なる間隔で固定画像をキャプチャする従来のフレームベースのカメラとは異なり、ニューロモルフィックセンサーは、高時間分解能と低レイテンシで視野内の光強度や動きの変化を表すイベントを連続的に生成する。これらの特性は、有効性とプライバシー保護の観点から、人間の顔のモデリングにおいて興味深いことが証明されている。しかし、ニューロモルフィック顔分析は依然として生で非構造的な研究分野であり、明確な基準やベンチマークを持たない様々なタスクに対処しようとする試みがいくつかある。本稿では,ニューロモルフィック顔分析の領域における機能,課題,新たな応用について概説し,将来性のある方向性と課題を概説する。ニューロモルフィック・ビジョンの基本的な動作原理を議論し、関連する研究の詳細な概要を提示した後、利用可能なデータ、標準データ表現、新たな課題、さらなる調査を必要とする限界について検討する。本稿では,この発展分野における最近のプロセスに注目し,経験豊富な研究者と新参研究者の双方に,その問題点と欠点を総合的に分析することを目的とする。 Neuromorphic sensors, also known as event cameras, are a class of imaging devices mimicking the function of biological visual systems. Unlike traditional frame-based cameras, which capture fixed images at discrete intervals, neuromorphic sensors continuously generate events that represent changes in light intensity or motion in the visual field with high temporal resolution and low latency. These properties have proven to be interesting in modeling human faces, both from an effectiveness and a privacy-preserving point of view. Neuromorphic face analysis however is still a raw and unstructured field of research, with several attempts at addressing different tasks with no clear standard or benchmark. This survey paper presents a comprehensive overview of capabilities, challenges and emerging applications in the domain of neuromorphic face analysis, to outline promising directions and open issues. After discussing the fundamental working principles of neuromorphic vision and presenting an in-depth overview of the related research, we explore the current state of available data, standard data representations, emerging challenges, and limitations that require further investigation. 
This paper aims to highlight recent progress in this evolving field and to provide both experienced and new researchers with an all-encompassing analysis of the state of the art, along with its problems and shortcomings.	翻訳日:2024-02-20 19:54:31 公開日:2024-02-18
# 離散ニューラルアルゴリズムによる推論 Discrete Neural Algorithmic Reasoning ( http://arxiv.org/abs/2402.11628v1 ) ライセンス: Link先を確認	Gleb Rodionov, Liudmila Prokhorenkova	(参考訳) ニューラルアルゴリズム推論は、古典的なアルゴリズムの実行を模倣するようにモデルを学習させることで、ニューラルネットワークによって計算を捉えることを目的としている。一般的なアーキテクチャは重み空間に正しいモデルを含むのに十分な表現力を持っているが、現在のニューラル推論器は分布外データへの汎化に苦戦している。一方、古典計算は、離散的な計算状態間の遷移として記述できるため、分布シフトの影響を受けない。本研究では,あらかじめ定義された有限個の状態の組合せとして実行軌道を維持するよう,ニューラル推論器に強制することを提案する。アルゴリズムの状態遷移を教師信号として訓練されたそのようなモデルは、元のアルゴリズムと完全に整合することができる。これを示すために、SALSA-CLRSベンチマークで我々のアプローチを評価し、全てのタスクに対して完璧なテストスコアを得る。さらに,提案するアーキテクチャの選択により,任意のテストデータに対する学習アルゴリズムの正しさを証明できる。 Neural algorithmic reasoning aims to capture computations with neural networks via learning the models to imitate the execution of classical algorithms. While common architectures are expressive enough to contain the correct model in the weights space, current neural reasoners are struggling to generalize well on out-of-distribution data. On the other hand, classical computations are not affected by distribution shifts as they can be described as transitions between discrete computational states. In this work, we propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states. Trained with supervision on the algorithm's state transitions, such models are able to perfectly align with the original algorithm. To show this, we evaluate our approach on the SALSA-CLRS benchmark, where we get perfect test scores for all tasks. Moreover, the proposed architectural choice allows us to prove the correctness of the learned algorithms for any test data.	翻訳日:2024-02-20 19:54:09 公開日:2024-02-18
# ループ内のユーザによるインタラクティブな服装推薦 Interactive Garment Recommendation with User in the Loop ( http://arxiv.org/abs/2402.11627v1 ) ライセンス: Link先を確認	Federico Becattini, Xiaolin Chen, Andrea Puccia, Haokun Wen, Xuemeng Song, Liqiang Nie, Alberto Del Bimbo	(参考訳) ファッションアイテムのリコメンデーションは、しばしばリッチなユーザープロファイルを活用し、過去の履歴と過去の購入に基づいてターゲットとなる提案を行う。本稿では,ユーザの事前知識が与えられていないことを前提として作業を行う。我々は,着物を構成するための補完アイテムを推奨するため,ユーザの反応を統合することで,ユーザプロファイルをオンザフライで構築することを提案する。本稿では,適切な衣服を提案し,ユーザのフィードバックを取り込み,その推奨を改善し,ユーザ満足度を最大化する強化学習エージェントを提案する。このようなモデルをトレーニングするために、私たちは、トレーニングループ内のユーザフィードバックをシミュレートできるプロキシモデルを活用します。我々はIQON3000のファッションデータセットを実験し、強化学習に基づくエージェントが個人の好みを考慮し、推薦を改善することができることを示した。さらに、そのような作業は、訓練中の探索を活用できない非強化モデルにとって困難であることが証明された。 Recommending fashion items often leverages rich user profiles and makes targeted suggestions based on past history and previous purchases. In this paper, we work under the assumption that no prior knowledge is given about a user. We propose to build a user profile on the fly by integrating user reactions as we recommend complementary items to compose an outfit. We present a reinforcement learning agent capable of suggesting appropriate garments and ingesting user feedback so to improve its recommendations and maximize user satisfaction. To train such a model, we resort to a proxy model to be able to simulate having user feedback in the training loop. We experiment on the IQON3000 fashion dataset and we find that a reinforcement learning-based agent becomes capable of improving its recommendations by taking into account personal preferences. Furthermore, such task demonstrated to be hard for non-reinforcement models, that cannot exploit exploration during training.	翻訳日:2024-02-20 19:53:53 公開日:2024-02-18
# メタ認知検索型大規模言語モデル Metacognitive Retrieval-Augmented Large Language Models ( http://arxiv.org/abs/2402.11626v1 ) ライセンス: Link先を確認	Yujia Zhou, Zheng Liu, Jiajie Jin, Jian-Yun Nie, Zhicheng Dou	(参考訳) 検索拡張生成は、事実に基づくコンテンツの生成に効果があるため、自然言語処理の中心となっている。従来の方法では単一時間検索を用いるが、近年ではマルチホップ推論タスクのためのマルチ時間検索に移行している。しかし、これらの戦略は事前定義された推論ステップに縛られ、応答生成の不正確性に繋がる可能性がある。本稿では,検索拡張生成プロセスとメタ認知を組み合わせた手法であるMetaRAGを提案する。認知心理学によれば、メタ認知は主体が自己を省み、その認知過程を批判的に評価することを可能にする。これを統合することで、MetaRAGはモデルが応答戦略を監視し、評価し、計画することを可能にし、内省的な推論能力を高める。 3段階のメタ認知制御パイプラインを通じて、モデルは初期の認知応答の不備を識別し、修正することができる。経験的評価は、MetaRAGが既存の手法よりも著しく優れていることを示している。 Retrieval-augmented generation has become central in natural language processing due to its efficacy in generating factual content. While traditional methods employ single-time retrieval, more recent approaches have shifted towards multi-time retrieval for multi-hop reasoning tasks. However, these strategies are bound by predefined reasoning steps, potentially leading to inaccuracies in response generation. This paper introduces MetaRAG, an approach that combines the retrieval-augmented generation process with metacognition. Drawing from cognitive psychology, metacognition allows an entity to self-reflect and critically evaluate its cognitive processes. By integrating this, MetaRAG enables the model to monitor, evaluate, and plan its response strategies, enhancing its introspective reasoning abilities. Through a three-step metacognitive regulation pipeline, the model can identify inadequacies in initial cognitive responses and fix them. Empirical evaluations show that MetaRAG significantly outperforms existing methods.	翻訳日:2024-02-20 19:53:38 公開日:2024-02-18
# SpeCrawler: 大規模言語モデルを使用したAPIドキュメンテーションからOpenAPI仕様を生成する SpeCrawler: Generating OpenAPI Specifications from API Documentation Using Large Language Models ( http://arxiv.org/abs/2402.11625v1 ) ライセンス: Link先を確認	Koren Lazar, Matan Vetzler, Guy Uziel, David Boaz, Esther Goldbraich, David Amid, Ateret Anaby-Tavor	(参考訳) デジタル時代には、広く使われているAPIが明らかである。しかし、スケーラブルなAPIの利用は、オンラインAPIドキュメンテーションで見られる構造的なばらつきのため、課題となる。これにより、api使用を容易にする自動ツールの必要性が高まる。実行可能なアプローチには、ドキュメントをAPI仕様フォーマットに変換することが含まれる。ルールベースのメソッドを使った以前の試みはあったが、これらのアプローチは様々なドキュメントにまたがる一般化の困難に遭遇した。本稿では,大規模言語モデル(LLM)を利用して,多種多様なAPIドキュメントから,慎重に構築されたパイプラインを通じてOpenAPI仕様を生成する総合システムであるSpeCrawlerを紹介する。多数のAPIの標準化フォーマットを作成することにより、SpeCrawlerは、APIオーケストレーションシステム内の統合プロセスの合理化と、ツールのLLMへの組み込みを容易にする。本稿では,SpeCrawlerの方法論を実証的エビデンスとケーススタディで実証し,LLM機能による有効性を示す。 In the digital era, the widespread use of APIs is evident. However, scalable utilization of APIs poses a challenge due to structure divergence observed in online API documentation. This underscores the need for automatic tools to facilitate API consumption. A viable approach involves the conversion of documentation into an API Specification format. While previous attempts have been made using rule-based methods, these approaches encountered difficulties in generalizing across diverse documentation. In this paper we introduce SpeCrawler, a comprehensive system that utilizes large language models (LLMs) to generate OpenAPI Specifications from diverse API documentation through a carefully crafted pipeline. By creating a standardized format for numerous APIs, SpeCrawler aids in streamlining integration processes within API orchestrating systems and facilitating the incorporation of tools into LLMs. The paper explores SpeCrawler's methodology, supported by empirical evidence and case studies, demonstrating its efficacy through LLM capabilities.	翻訳日:2024-02-20 19:53:26 公開日:2024-02-18
# なぜそんなに重いの? 層を切り離して大きな言語モデルをスリム化する Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers ( http://arxiv.org/abs/2402.11700v1 ) ライセンス: Link先を確認	Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Färber	(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理(NLP)タスクに対処する際、優れた能力を持っている。しかし、これらのモデルは層の積み重ねによって数十億のパラメータを含むため、その大きさがストレージ、トレーニング、推論の点で問題となる。モデルプルーニングや蒸留のような伝統的なアプローチは、モデルサイズを減らす方法を提供しているが、しばしば性能の維持が犠牲になる。本研究では,LLMにおけるレイヤ数を削減する手法を体系的に検討する。驚くことに、少ないレイヤでもLLMは、特にテキスト分類タスクのプロンプトベースの微調整において、同等あるいはより優れたパフォーマンスレベルを維持している。注目すべきは、あるケースでは、単一の層を持つモデルが、すべての層を持つモデルよりも優れていることである。これらの知見は, 性能を保ちながらLLMのサイズ制約を緩和することを目指す今後の研究に有用な洞察を与え, LLMをはるかに効率的に活用する道を開く。 Large Language Models (LLMs) possess outstanding capabilities in addressing various natural language processing (NLP) tasks. However, the sheer size of these models poses challenges in terms of storage, training and inference due to the inclusion of billions of parameters through layer stacking. While traditional approaches such as model pruning or distillation offer ways for reducing model size, they often come at the expense of performance retention. In our investigation, we systematically explore the approach of reducing the number of layers in LLMs. Surprisingly, we observe that even with fewer layers, LLMs maintain similar or better performance levels, particularly in prompt-based fine-tuning for text classification tasks. Remarkably, in certain cases, models with a single layer outperform their fully layered counterparts. These findings offer valuable insights for future work aimed at mitigating the size constraints of LLMs while preserving their performance, thereby opening avenues for significantly more efficient use of LLMs.	翻訳日:2024-02-20 19:46:15 公開日:2024-02-18
# 大規模言語モデルを用いた対談評価のためのマルチアスペクトフレームワーク A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models ( http://arxiv.org/abs/2402.11676v1 ) ライセンス: Link先を確認	Jaylen Jones, Lingbo Mo, Eric Fosler-Lussier, Huan Sun	(参考訳) ヘイトスピーチの介入戦略として、ヘイトフルな主張を否定し、遭遇を非エスカレートするために設計されたヘイトスピーチの文脈に対する情報的な反応が現れた。先行研究では手作業による介入を支援する自動カウンターナラティブ生成手法が提案されているが,これらの手法の評価は未定である。対談的評価のための従来の自動指標は、対談的品質の重要側面を評価基準として組み込むのではなく、表面的参照比較に依存するため、人間の判断と一致しない。先行評価の限界に対処するために, 対談専門ngoのガイドラインから導かれた5つの特徴を用いて, llmが生成した対談候補に対してスコアとフィードバックを提供する新しい評価フレームワークを提案する。 LLM評価器は人手による注釈付きスコアやフィードバックに強く対応し,多視点・参照なし・解釈可能な評価器としての可能性を示した。 Counter narratives - informed responses to hate speech contexts designed to refute hateful claims and de-escalate encounters - have emerged as an effective hate speech intervention strategy. While previous work has proposed automatic counter narrative generation methods to aid manual interventions, the evaluation of these approaches remains underdeveloped. Previous automatic metrics for counter narrative evaluation lack alignment with human judgment as they rely on superficial reference comparisons instead of incorporating key aspects of counter narrative quality as evaluation criteria. To address prior evaluation limitations, we propose a novel evaluation framework prompting LLMs to provide scores and feedback for generated counter narrative candidates using 5 defined aspects derived from guidelines from counter narrative specialized NGOs. We found that LLM evaluators achieve strong alignment to human-annotated scores and feedback and outperform alternative metrics, indicating their potential as multi-aspect, reference-free and interpretable evaluators for counter narrative evaluation.	翻訳日:2024-02-20 19:46:00 公開日:2024-02-18
# decoy状態を持つ単一光子を用いたセキュア量子イメージング Secure quantum imaging with decoy state heralded single photons ( http://arxiv.org/abs/2402.11675v1 ) ライセンス: Link先を確認	Siddhant Vernekar and Jolly Xavier	(参考訳) 弱コヒーレント源(WCS)と自発パラメトリックダウン変換された単光子対は量子鍵分布(QKD)および量子イメージング(QI)実験に応用されている。デコイ状態法はQKDとQIのセキュリティを高めるためにも使われている。我々は,decoy state heralded single photon source (HSPS)を用いて,量子セキュアイメージングの研究を行った。低光子数状態におけるHSPSの優れた性能は、測定の不確実性を低減し、セキュアなQIを確保するために量子鍵分布プロトコルを統合する理想的な候補となる。さらに,デコイ状態WCSはデコイ状態HSPSよりも動作速度が高いため,量子セキュアイメージングにおいて平均光子数を高くできる条件下では有効であることが示唆された。 Weak coherent source (WCS) and spontaneous parametric down converted heralded single photon pairs have found applications in quantum key distribution (QKD) and quantum imaging (QI) experiments. Decoy state methods have also been used to enhance the security of QKD and QI. We study quantum secured imaging with the decoy state heralded single photon source (HSPS). The HSPS's superior performance in low photon number regimes makes it an ideal candidate for integrating quantum key distribution protocols to reduce measurement uncertainty and ensure secure QI. Furthermore, our results also suggest that the decoy state WCS, owing to its higher operating speed than the decoy state HSPS, would be effective in conditions that allow higher mean photon numbers for quantum secured imaging.	翻訳日:2024-02-20 19:45:40 公開日:2024-02-18
# 非線形抵抗ネットワークをシミュレートする高速アルゴリズム A Fast Algorithm to Simulate Nonlinear Resistive Networks ( http://arxiv.org/abs/2402.11674v1 ) ライセンス: Link先を確認	Benjamin Scellier	(参考訳) エネルギー効率の高い人工知能システムを求めて、抵抗ネットワークは従来のgpuベースのニューラルネットワークに代わるものとして注目を集めている。これらのネットワークは電気回路の物理を利用して推論し、平衡伝播のような局所的な訓練手法で最適化することができる。電力消費の観点からは潜在的な優位性にもかかわらず、これらの抵抗ネットワークを効率的にシミュレーションすることはスケーラビリティを評価する上で重要なボトルネックであり、現在の手法は線形ネットワークに限られるか、SPICEのような現実的で遅い回路シミュレータに依存している。理想回路要素を仮定し,線形不等式制約を持つ二次計画問題として構成する非線形抵抗ネットワークのシミュレーション手法を提案し,高速で正確な座標降下アルゴリズムを用いて解く。シミュレーション手法は,従来のスパイスベースのシミュレーションを著しく上回り,最大325倍の速度でネットワークのトレーニングが可能となり,ネットワークサイズとエポック期間の比率が5万倍に向上した。我々のアプローチは他の電気部品にも適用可能であり、非線形電気ネットワークのシミュレーションの急速な進歩を促すことができる。 In the quest for energy-efficient artificial intelligence systems, resistor networks are attracting interest as an alternative to conventional GPU-based neural networks. These networks leverage the physics of electrical circuits for inference and can be optimized with local training techniques such as equilibrium propagation. Despite their potential advantage in terms of power consumption, the challenge of efficiently simulating these resistor networks has been a significant bottleneck to assess their scalability, with current methods either being limited to linear networks or relying on realistic, yet slow circuit simulators like SPICE. Assuming ideal circuit elements, we introduce a novel approach for the simulation of nonlinear resistive networks, which we frame as a quadratic programming problem with linear inequality constraints, and which we solve using a fast, exact coordinate descent algorithm. Our simulation methodology significantly outperforms existing SPICE-based simulations, enabling the training of networks up to 325 times larger at speeds 150 times faster, resulting in a 50,000-fold improvement in the ratio of network size to epoch duration. Our approach, adaptable to other electrical components, can foster more rapid progress in the simulations of nonlinear electrical networks.	
翻訳日:2024-02-20 19:45:25 公開日:2024-02-18
# エストニア語テキストの自動修正:EKTB25プロジェクトの最終報告 Autocorrect for Estonian texts: final report from project EKTB25 ( http://arxiv.org/abs/2402.11671v1 ) ライセンス: Link先を確認	Agnes Luhtaru, Martin Vainikko, Krista Liin, Kais Allkivi-Metsoja, Jaagup Kippar, Pille Eslon, Mark Fishel	(参考訳) このプロジェクトは2021-2023年にエストニア語技術プログラムによって資金提供された。その主な目的はエストニア語の綴りと文法の修正ツールを開発することだった。主な課題は、そのような開発に必要なごく少量のエラー訂正データであった。これを緩和するために,(1)モデルトレーニングとテストのためにより多くの補正データをアノテートし,(2)他のタスク用に作成された機械学習モデルをリトレーニングするトランスファーラーニングをテストし,(3)大規模言語モデルを含む代替手法と比較した。また,誤差カテゴリによる補正の精度と収率を算出し,異なる手法の有効性を詳細に比較できる自動評価法を開発した。プロジェクトの間に大きな言語モデルにブレークスルーがあった。エストニア語をサポートする商用言語モデルであるGPT4が作成された。本報告では,計画調整時のモデルの存在を考慮し,エストニア語テキスト改善のためのgpt4の機能との比較を行った。最終結果は、GPT4よりも優れたスコアを提供し、その結果は有用であるが、完全には信頼できないことを示している。レポートにはまた、オープンソースソリューションに焦点を当てたGPT4や他の主要言語モデルの実装方法に関するアイデアも含まれている。このプロジェクトの結果はすべてオープンソース/オープンソースで、商用ライセンスを含む目的で使用することができる。 The project was funded in 2021-2023 by the National Programme of Estonian Language Technology. Its main aim was to develop spelling and grammar correction tools for the Estonian language. The main challenge was the very small amount of available error correction data needed for such development. To mitigate this, (1) we annotated more correction data for model training and testing, (2) we tested transfer-learning, i.e. retraining machine learning models created for other tasks, so as not to depend solely on correction data, (3) we compared the developed method and model with alternatives, including large language models. We also developed automatic evaluation, which can calculate the accuracy and yield of corrections by error category, so that the effectiveness of different methods can be compared in detail. There has been a breakthrough in large language models during the project: GPT4, a commercial language model with Estonian-language support, has been created. 
We took into account the existence of the model when adjusting plans and in the report we present a comparison with the ability of GPT4 to improve the Estonian language text. The final results show that the approach we have developed provides better scores than GPT4 and the result is usable but not entirely reliable yet. The report also contains ideas on how GPT4 and other major language models can be implemented in the future, focusing on open-source solutions. All results of this project are open-data/open-source, with licenses that allow them to be used for purposes including commercial ones.	翻訳日:2024-02-20 19:45:08 公開日:2024-02-18
# ブラックボックスへの挑戦:農業と林業におけるCNN応用の帰属マップの包括的評価 Challenging the Black Box: A Comprehensive Evaluation of Attribution Maps of CNN Applications in Agriculture and Forestry ( http://arxiv.org/abs/2402.11670v1 ) ライセンス: Link先を確認	Lars Nieradzik, Henrike Stephani, Jördis Sieburg-Rockel, Stephanie Helmling, Andrea Olbrich, Janis Keuper	(参考訳) 本研究では,農業・林業におけるニューラルネットワークの説明可能性,特に肥料処理の分類と木材識別について検討する。しばしば「ブラックボックス」と見なされるこれらのモデルの不透明な性質は、クラスアクティベーションマップ(CAMs)またはサリエンシーマップ(saliency maps)として知られる最先端のアトリビューションマップ(AMs)の広範な評価を通じて解決される。これらのAMの包括的質的および定量的分析により、重要な実用的限界が明らかになった。発見によると、AMは重要な機能を一貫して強調しておらず、ドメインの専門家が重要とみなす機能と誤認することが多い。これらの相違は、ニューラルネットワークの意思決定プロセスを理解する上でのAMの有用性に関する重大な疑問を引き起こす。本研究は,農業・林業分野におけるAMsの信頼性と実用性に関する重要な知見を提供し,これらの応用分野におけるニューラルネットワークの理解を深める。 In this study, we explore the explainability of neural networks in agriculture and forestry, specifically in fertilizer treatment classification and wood identification. The opaque nature of these models, often considered 'black boxes', is addressed through an extensive evaluation of state-of-the-art Attribution Maps (AMs), also known as class activation maps (CAMs) or saliency maps. Our comprehensive qualitative and quantitative analysis of these AMs uncovers critical practical limitations. Findings reveal that AMs frequently fail to consistently highlight crucial features and often misalign with the features considered important by domain experts. These discrepancies raise substantial questions about the utility of AMs in understanding the decision-making process of neural networks. Our study provides critical insights into the trustworthiness and practicality of AMs within the agriculture and forestry sectors, thus facilitating a better understanding of neural networks in these application areas.	翻訳日:2024-02-20 19:44:45 公開日:2024-02-18
# アナログ量子シミュレータの最適制御による高速フォワード分子基底状態生成 Fast-forwarding molecular ground state preparation with optimal control on analog quantum simulators ( http://arxiv.org/abs/2402.11667v1 ) ライセンス: Link先を確認	Davide Castaldo, Marta Rosa, Stefano Corni	(参考訳) 電子力学の最適制御は、量子力学によって課される境界に近づく進化時間とともに、化学的精度で分子基底状態を作成することができることを示す。我々は、分子ハミルトニアンにすでに存在する相互作用の観点からのみ、分子進化の特定のパラメータ化を提案する。したがって,提案手法は量子シミュレーションルーチンのみを使用し,好適なスケーリングを維持している。変動量子アルゴリズムと最適制御の親密な関係により、可能であれば、文献における最先端の手法と比較する。化学精度とアルゴリズムスケーリングを達成するために必要なパラメータの数は、変分アンサーゼを構築するためのコンパクトな適応戦略と一致していることがわかった。このアルゴリズムは量子シミュレータにも適しており、デジタル量子プロセッサ(最大16量子ビット)をエミュレートして実装され、異なる電子相関度にまたがる異なる分子やジオメトリでテストされている。 We show that optimal control of the electron dynamics is able to prepare molecular ground states, within chemical accuracy, with evolution times approaching the bounds imposed by quantum mechanics. We propose a specific parameterization of the molecular evolution only in terms of interaction already present in the molecular Hamiltonian. Thus, the proposed method solely utilizes quantum simulation routines, retaining their favourable scalings. Due to the intimate relationships between variational quantum algorithms and optimal control we compare, when possible, our results with state-of-the-art methods in literature. We found that the number of parameters needed to reach chemical accuracy and algorithmic scaling are in line with compact adaptive strategies to build variational ansatze. The algorithm, which is also suitable for quantum simulators, is implemented emulating a digital quantum processor (up to 16 qubits) and tested on different molecules and geometries spanning different degrees of electron correlation.	翻訳日:2024-02-20 19:44:27 公開日:2024-02-18
# マルチスケール時間分解による短期負荷予測 Interpretable Short-Term Load Forecasting via Multi-Scale Temporal Decomposition ( http://arxiv.org/abs/2402.11664v1 ) ライセンス: Link先を確認	Yuqi Jiang, Yan Li, and Yize Chen	(参考訳) 機械学習とディープラーニングの急速な進歩により、電力系統の電力負荷予測、例えば単変量および多変量短期負荷予測における幅広い応用が可能となった。負荷パターンの非線形性や高い予測精度の学習能力は高いが、電力負荷予測のための典型的なディープラーニングモデルの解釈可能性はあまり研究されていない。本稿では,各ニューラルネットワークの線形結合を学習し,入力時間特徴を学習する,解釈可能な深層学習手法を提案する。また,複雑な時系列パターンに対処するマルチスケール時系列分解法を提案する。ケーススタディはベルギー中央グリッド負荷データセットで行われており、提案モデルは頻繁に適用されるベースラインモデルよりも精度が高かった。具体的には,MSE,MAE,RMSEはそれぞれ0.52,0.57,0.72である。解釈可能性については,提案手法では一般化能力を示す。一方,他の基本手法と比較して,特徴だけでなく時間的解釈可能性も示すことができる。また、グローバルタイム特徴の解釈性も得られる。グローバルな特徴の解釈性を得ることで、負荷データの全体的なパターン、傾向、循環性を把握でき、最終出力の形成における様々な時間関連特徴の重要性も明らかにできる。 Rapid progress in machine learning and deep learning has enabled a wide range of applications in the electricity load forecasting of power systems, for instance, univariate and multivariate short-term load forecasting. Though the strong capabilities of learning the non-linearity of the load patterns and the high prediction accuracy have been achieved, the interpretability of typical deep learning models for electricity load forecasting is less studied. This paper proposes an interpretable deep learning method, which learns a linear combination of neural networks that each attends to an input time feature. We also proposed a multi-scale time series decomposition method to deal with the complex time patterns. Case studies have been carried out on the Belgium central grid load dataset and the proposed model demonstrated better accuracy compared to the frequently applied baseline model. Specifically, the proposed multi-scale temporal decomposition achieves the best MSE, MAE and RMSE of 0.52, 0.57 and 0.72 respectively. As for interpretability, on one hand, the proposed method displays generalization capability. On the other hand, it can demonstrate not only the feature but also the temporal interpretability compared to other baseline methods. 
Besides, the global time feature interpretabilities are also obtained. Obtaining global feature interpretabilities allows us to catch the overall patterns, trends, and cyclicality in load data while also revealing the significance of various time-related features in forming the final outputs.	翻訳日:2024-02-20 19:44:13 公開日:2024-02-18
# 重力による脱コヒーレンス Gravity-mediated decoherence ( http://arxiv.org/abs/2402.11663v1 ) ライセンス: Link先を確認	Dimitris Moustos, Charis Anastopoulos	(参考訳) 大質量物体の重力場内の小さな量子系は、後者の量子自由度と絡み合う。したがって、大質量物体は環境として機能し、量子系への非ユニタリ力学、ノイズ、デコヒーレンスを誘導する。この重力によるデコヒーレンスから地球上のシステムを保護することは不可能であり、これはマクロな量子システムによる全ての実験に深刻な影響を及ぼす可能性がある。我々は,この効果の第一原理解析を行い,対応するオープンシステムのダイナミクスを導出する。近未来の量子実験は影響を受けないが、人間のスケールでは強いデコヒーレンス効果がある。 1メートル分離された人間の2つの局所状態の重ね合わせのデコヒーレンス時間は1秒程度である。 A small quantum system within the gravitational field of a massive body will be entangled with the quantum degrees of freedom of the latter. Hence, the massive body acts as an environment, and it induces non-unitary dynamics, noise, and decoherence to the quantum system. It is impossible to shield systems on Earth from this gravity-mediated decoherence, which could severely affect all experiments with macroscopic quantum systems. We undertake a first-principles analysis of this effect, by deriving the corresponding open system dynamics. We find that near-future quantum experiments are not affected, but there is a strong decoherence effect at the human scale. The decoherence time for a superposition of two localized states of a human with a one-meter separation is of the order of one second.	翻訳日:2024-02-20 19:43:53 公開日:2024-02-18
# TDE-3:スパイクニューラルネットワークにおける光フロー計算の事前改善 TDE-3: An improved prior for optical flow computation in spiking neural networks ( http://arxiv.org/abs/2402.11662v1 ) ライセンス: Link先を確認	Matthew Yedutenko, Federico Paredes-Valles, Lyes Khacef and Guido C.H.E. De Croon	(参考訳) モーション検出は、ロボットシステムが環境を知覚し、ナビゲートするために必要な主要なタスクである。文献で提案されているバイオインスパイアされたニューロモルフィック時間差エンコーダ(TDE-2)は、イベントベースのセンサーとプロセッサをスパイクニューラルネットワークと組み合わせ、空間内の2つの点間の時間的相関を抽出することでリアルタイムかつエネルギー効率の高い運動検出を提供する。しかし、アルゴリズムレベルでは、この設計はテクスチャ環境における個々のTDEの方向選択性を失う。本稿では, テクスチャ環境下でのTDE-3の方向選択性を高めるために, さらなる抑制入力を付加した3点TDE(TDE-3)を提案する。我々は,入力速度を出力スパイク数やISI(Inter-Spike Interval)に線形にマッピングするために,時間的バックプロパゲーションとサロゲート勾配を用いて新しいTDE-3を訓練する手法を開発した。私たちの研究は、特定のISIを持つためにスパイクニューロンを訓練する最初の例です。合成データを用いて,刺激のダイナミックレンジ,空間周波数,ノイズレベルの変化について,スパイク数とISIのトレーニングと推論を比較した。 ISIは空間周波数の変化に対してより頑健であるのに対し、スパイク数はノイズの存在下でより信頼性の高い訓練信号である。我々は,TDEによる光フロー符号化の詳細な定量的検討を行い,TDE-2とTDE-3を比較した。その結果,両検出器のネットワークレベルでも同様の精度(20度角誤差,88%の相関)を示した。しかし、個々のTDEのより堅牢な方向選択性のため、TDE-3ベースのネットワークはスパイクが少なく、エネルギー効率が良い。報告された精度はモデルベースの手法と同等であるが、TDEのスパイクベースの処理により、ニューロモルフィックハードウェアによるよりエネルギー効率の高い推論が可能になる。 Motion detection is a primary task required for robotic systems to perceive and navigate in their environment. The bioinspired neuromorphic Time-Difference Encoder (TDE-2) proposed in the literature combines event-based sensors and processors with spiking neural networks to provide real-time and energy-efficient motion detection by extracting temporal correlations between two points in space. However, on the algorithmic level, this design leads to loss of direction-selectivity of individual TDEs in textured environments. Here we propose an augmented 3-point TDE (TDE-3) with additional inhibitory input that makes TDE-3 direction-selectivity robust in textured environments. We developed a procedure to train the new TDE-3 using backpropagation through time and surrogate gradients to linearly map input velocities into an output spike count or an Inter-Spike Interval (ISI). 
Our work is the first instance of training a spiking neuron to have a specific ISI. Using synthetic data, we compared training and inference with spike count and ISI with respect to changes in stimulus dynamic range, spatial frequency, and noise level. ISI turns out to be more robust to variation in spatial frequency, whereas the spike count is a more reliable training signal in the presence of noise. We performed the first in-depth quantitative investigation of optical flow coding with TDEs and compared TDE-2 and TDE-3 in terms of energy efficiency and coding precision. Results show that, on the network level, both detectors achieve similar precision (20 degree angular error, 88% correlation with ground truth). Yet, due to the more robust direction-selectivity of individual TDEs, TDE-3-based networks spike less and hence are more energy-efficient. The reported precision is on par with model-based methods, but the spike-based processing of the TDEs allows more energy-efficient inference with neuromorphic hardware.	翻訳日:2024-02-20 19:43:43 公開日:2024-02-18
# 階層型アクティブ推論における動的計画法 Dynamic planning in hierarchical active inference ( http://arxiv.org/abs/2402.11658v1 ) ライセンス: Link先を確認	Matteo Priorelli and Ivilin Peev Stoianov	(参考訳) 動的計画法により、人間の脳が認知決定に関連する運動軌跡を推論し、導入する能力について述べる。最近のパラダイムであるアクティブ推論(active inference)は、生物の適応に関する基本的な洞察をもたらし、予測誤差を最小化し、生命に適合する状態に制限する。過去数年間、多くの研究が、ロボットと人工知能の革新的な解決策を刺激する、個別の意思決定や継続的なモーター制御といった、アクティブな推論プロセスの観点から、人間と動物の行動がどのように説明できるかを示してきた。しかし、この文献には、変化する環境におけるアクションを効果的に計画する方法に関する包括的な見通しが欠けている。モデリングツールの使用の目標を設定し、アクティブな推論における動的計画の話題を掘り下げ、生物学的目標指向行動の2つの重要な側面を念頭に置いて、オブジェクト操作の余裕を理解し活用する能力、そして他のエージェントを含む自己と環境の間の階層的相互作用を学ぶ。単純な単位から始めて、より高度な構造を徐々に記述し、最近提案された設計選択を比較し、各セクションの基本的な例を提供する。この研究は、ニューラルネットワークと強化学習を中心とする従来の見解とは距離を置き、階層モデルにおけるハイブリッド表現という、アクティブ推論の未検討の方向に向かっている。 By dynamic planning, we refer to the ability of the human brain to infer and impose motor trajectories related to cognitive decisions. A recent paradigm, active inference, brings fundamental insights into the adaptation of biological organisms, constantly striving to minimize prediction errors to restrict themselves to life-compatible states. Over the past years, many studies have shown how human and animal behavior could be explained in terms of an active inferential process -- either as discrete decision-making or continuous motor control -- inspiring innovative solutions in robotics and artificial intelligence. Still, the literature lacks a comprehensive outlook on how to effectively plan actions in changing environments. Setting ourselves the goal of modeling tool use, we delve into the topic of dynamic planning in active inference, keeping in mind two crucial aspects of biological goal-directed behavior: the capacity to understand and exploit affordances for object manipulation, and to learn the hierarchical interactions between the self and the environment, including other agents. 
We start from a simple unit and gradually describe more advanced structures, comparing recently proposed design choices and providing basic examples for each section. This study distances itself from traditional views centered on neural networks and reinforcement learning, and points toward a yet unexplored direction in active inference: hybrid representations in hierarchical models.	翻訳日:2024-02-20 19:43:07 公開日:2024-02-18
# 物理層通信による事前学習言語モデルの統合 Integrating Pre-Trained Language Model with Physical Layer Communications ( http://arxiv.org/abs/2402.11656v1 ) ライセンス: Link先を確認	Ju-Hyung Lee and Dong-Ho Lee and Joohan Lee and Jay Pujara	(参考訳) デバイスが言語モデル(lms)などの組み込み基盤モデルを通じて情報を直接交換するオンデバイスai通信の分野は、堅牢で効率的で汎用的な通信フレームワークを必要としている。しかし、これらのフレームワークを既存の無線システムに統合し、ノイズやビットエラーを効果的に管理することは大きな課題となる。本研究では,物理層(PHY)通信機能と統合されたデバイス上での実用的なAI通信フレームワークを提案する。本フレームワークは,チャネルノイズを用いたエンドツーエンドトレーニングを取り入れ,レジリエンスを高めるとともに,ベクトル量子化変分オートエンコーダ(vq-vae)を効率良くロバストな通信に活用し,プリトレーニングエンコーダ・デコーダトランスフォーマを一般化能力向上に活用する。各種通信シナリオにまたがるシミュレーションにより,我々のフレームワークは,標準化された3GPPチャネルモデルにおいて,相当な一般化能力とノイズロバスト性を示しながら,送信サイズを50%削減できることが判明した。 The burgeoning field of on-device AI communication, where devices exchange information directly through embedded foundation models, such as language models (LMs), requires robust, efficient, and generalizable communication frameworks. However, integrating these frameworks with existing wireless systems and effectively managing noise and bit errors pose significant challenges. In this work, we introduce a practical on-device AI communication framework, integrated with physical layer (PHY) communication functions, demonstrated through its performance on a link-level simulator. Our framework incorporates end-to-end training with channel noise to enhance resilience, incorporates vector quantized variational autoencoders (VQ-VAE) for efficient and robust communication, and utilizes pre-trained encoder-decoder transformers for improved generalization capabilities. Simulations, across various communication scenarios, reveal that our framework achieves a 50% reduction in transmission size while demonstrating substantial generalization ability and noise robustness under standardized 3GPP channel models.	翻訳日:2024-02-20 19:42:41 公開日:2024-02-18
# メカニズムの競合:言語モデルがファクトやカウンターファクトをどう扱うかの追跡 Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals ( http://arxiv.org/abs/2402.11655v1 ) ライセンス: Link先を確認	Francesco Ortu, Zhijing Jin, Diego Doimo, Mrinmaya Sachan, Alberto Cazzaniga, Bernhard Schölkopf	(参考訳) 解釈可能性の研究は、経験的成功と大規模言語モデル(LLM)の内部動作に関する科学的理解のギャップを埋めることを目的としている。しかし、この分野の既存の研究のほとんどは、モデルが事実の知識をコピーまたはリコールする方法のような単一のメカニズムの分析に焦点を当てている。本研究では,個々のメカニズムではなく,複数のメカニズムの相互作用に着目したメカニズムの競合の定式化を提案し,そのひとつが最終予測において支配的になることを示す。我々は,ロジット検査と注意修正という2つの解釈方法を用いて,LLM内の機構の競合がどのようにして起こるかを明らかにする。本研究は,様々なモデル成分間の機構とその競合の痕跡を示し,特定の機構の強度を効果的に制御する注意位置を明らかにした。私たちのコードとデータはhttps://github.com/francescortu/Competition_of_Mechanismsにあります。 Interpretability research aims to bridge the gap between the empirical success and our scientific understanding of the inner workings of large language models (LLMs). However, most existing research in this area focused on analyzing a single mechanism, such as how models copy or recall factual knowledge. In this work, we propose the formulation of competition of mechanisms, which instead of individual mechanisms focuses on the interplay of multiple mechanisms, and traces how one of them becomes dominant in the final prediction. We uncover how and where the competition of mechanisms happens within LLMs using two interpretability methods, logit inspection and attention modification. Our findings show traces of the mechanisms and their competition across various model components, and reveal attention positions that effectively control the strength of certain mechanisms. Our code and data are at https://github.com/francescortu/Competition_of_Mechanisms.	翻訳日:2024-02-20 19:42:22 公開日:2024-02-18
# モデルフリーな$\mu$-synthesis:非滑らかな最適化の観点から Model-Free $\mu$-Synthesis: A Nonsmooth Optimization Perspective ( http://arxiv.org/abs/2402.11654v1 ) ライセンス: Link先を確認	Darioush Keivan, Xingang Guo, Peter Seiler, Geir Dullerud, Bin Hu	(参考訳) 本稿では,モデルフリーポリシーサーチを重要なロバスト制御ベンチマーク,すなわち$\mu$-synthesisで再検討する。一般的な出力フィードバック設定では、この問題に対する凸定式化は存在しないため、大域的最適性保証は期待できない。 Apkarian (2011) は、この問題に対して非凸な非滑らかなポリシー最適化手法を提案し、モデルベースの方法で更新方向を生成する劣勾配ベースのポリシー探索アルゴリズムを用いて最先端の設計結果を達成した。凸性や大域的最適性保証の欠如にもかかわらず、これらの劣勾配ベースのポリシー探索手法は、実際は驚くべき数値的な結果をもたらしている。このようなポリシー最適化の観点に基づき,これらの劣勾配ベースの探索手法をモデルフリーな設定に拡張する。具体的には,モデルフリーの微分を用いないサンプリング法と一様平滑化を伴うゼロ次ポリシー探索法という2つのモデルフリーポリシー最適化手法の有効性について検討する。両手法がモデルベースで達成した設計成果を一貫して再現することを示すため,広範な数値実験を行った。さらに, 定常点への収束保証が, コスト関数の強制性に関連するいくつかの仮定の下で, モデルフリーな$\mu$-synthesisに対して確立されることを示す理論的正当性を示す。総じて,微分フリーのポリシー最適化は,モデルフリー設定における一般出力フィードバック$\mu$-synthesis問題を解くための競争的かつ実行可能なアプローチであることを示す。 In this paper, we revisit model-free policy search on an important robust control benchmark, namely $\mu$-synthesis. In the general output-feedback setting, there do not exist convex formulations for this problem, and hence global optimality guarantees are not expected. Apkarian (2011) presented a nonconvex nonsmooth policy optimization approach for this problem, and achieved state-of-the-art design results by using subgradient-based policy search algorithms which generate update directions in a model-based manner. Despite the lack of convexity and global optimality guarantees, these subgradient-based policy search methods have led to impressive numerical results in practice. Built upon such a policy optimization perspective, our paper extends these subgradient-based search methods to a model-free setting. Specifically, we examine the effectiveness of two model-free policy optimization strategies: the model-free non-derivative sampling method and the zeroth-order policy search with uniform smoothing. 
We performed an extensive numerical study to demonstrate that both methods consistently replicate the design outcomes achieved by their model-based counterparts. Additionally, we provide some theoretical justifications showing that convergence guarantees to stationary points can be established for our model-free $\mu$-synthesis under some assumptions related to the coerciveness of the cost function. Overall, our results demonstrate that derivative-free policy optimization offers a competitive and viable approach for solving general output-feedback $\mu$-synthesis problems in the model-free setting.	翻訳日:2024-02-20 19:42:06 公開日:2024-02-18
# モバイルエッジコンピューティングにおけるタスクオフロードのための組合せクライアントマスタマルチエージェント深層強化学習 Combinatorial Client-Master Multiagent Deep Reinforcement Learning for Task Offloading in Mobile Edge Computing ( http://arxiv.org/abs/2402.11653v1 ) ライセンス: Link先を確認	Tesfay Zemuy Gebrekidan, Sebastian Stein, Timothy J. Norman	(参考訳) 近年,ビデオストリーミング,データマイニング,仮想現実,拡張現実,画像処理,ビデオ処理,顔認識,オンラインゲームなど,計算集約的なタスクを行うモバイルアプリケーションが急増している。しかし、タブレットやスマートフォンのようなユーザデバイス(UD)は、タスクの計算要求を実行する能力が限られている。モバイルエッジコンピューティング(MEC)は、UDのコンピューティング需要の増加に対応するための有望な技術として登場した。 MECのタスクオフロードは、UDとMECサーバ間でタスクを分散することでUDの要求を満たす戦略である。動的変化に適応し、オンライン計算複雑性を最小限に抑えることができるため、タスクオフロード問題において深層強化学習(DRL)が注目されている。しかし、UDやMECサーバにおける各種のリソース制約は、効率的なDRLベースのタスクオフロード戦略の設計に困難をもたらす。既存のDRLベースのタスクオフロードアルゴリズムは、サーバに十分なストレージリソースが利用できることを前提として、UDの制約に焦点を当てている。さらに、既存のマルチエージェントDRL(MADRL)ベースのタスクオフロードアルゴリズムは、均質なエージェントであり、均質な制約を報酬関数のペナルティとみなす。我々は,タスクオフロードをMEC (CCM_MADRL_MEC) で行うための新しい組合せクライアントマスターMADRL (CCM_MADRL) アルゴリズムを提案し,UDがリソース要求を判断し,サーバがUDの要求に基づいて組合せ決定を行えるようにした。 CCM_MADRL_MECは、UDの制約に加えてサーバストレージ容量を考慮するタスクオフロードにおける最初のMADRLである。 CCM_MADRL_MECは組合せ行動選択を利用して既存のMADDPGおよびヒューリスティックアルゴリズムよりも優れた収束性を示した。 Recently, there has been an explosion of mobile applications that perform computationally intensive tasks such as video streaming, data mining, virtual reality, augmented reality, image processing, video processing, face recognition, and online gaming. However, user devices (UDs), such as tablets and smartphones, have a limited ability to perform the computation needs of the tasks. Mobile edge computing (MEC) has emerged as a promising technology to meet the increasing computing demands of UDs. Task offloading in MEC is a strategy that meets the demands of UDs by distributing tasks between UDs and MEC servers. Deep reinforcement learning (DRL) is gaining attention in task-offloading problems because it can adapt to dynamic changes and minimize online computational complexity. 
However, the various types of continuous and discrete resource constraints on UDs and MEC servers pose challenges to the design of an efficient DRL-based task-offloading strategy. Existing DRL-based task-offloading algorithms focus on the constraints of the UDs, assuming the availability of enough storage resources on the server. Moreover, existing multiagent DRL (MADRL)-based task-offloading algorithms use homogeneous agents and consider homogeneous constraints as a penalty in their reward function. We propose a novel combinatorial client-master MADRL (CCM_MADRL) algorithm for task offloading in MEC (CCM_MADRL_MEC) that enables UDs to decide their resource requirements and the server to make a combinatorial decision based on the requirements of the UDs. CCM_MADRL_MEC is the first MADRL in task offloading to consider server storage capacity in addition to the constraints in the UDs. By taking advantage of the combinatorial action selection, CCM_MADRL_MEC has shown superior convergence over existing MADDPG and heuristic algorithms.	翻訳日:2024-02-20 19:41:44 公開日:2024-02-18
# 大規模言語モデルはイデオロギー操作にどの程度影響されやすいか How Susceptible are Large Language Models to Ideological Manipulation? ( http://arxiv.org/abs/2402.11725v1 ) ライセンス: Link先を確認	Kai Chen, Zihao He, Jun Yan, Taiwei Shi, Kristina Lerman	(参考訳) 大規模言語モデル(LLM)は、大衆の認識や情報との相互作用に大きな影響を与える可能性がある。これは、これらのモデル内のイデオロギーを容易に操作できる場合に生じる社会的な影響に関する懸念を引き起こす。本研究では,LLMがインストラクションチューニングデータからいかに効果的にイデオロギーバイアスを学習し,一般化できるかを検討する。少量のイデオロギー駆動サンプルへの曝露は,LLMのイデオロギーを著しく変化させる。特に、LLMは、あるトピックからイデオロギーを吸収し、それとは無関係なトピックに一般化する能力を示す。 LLMのイデオロギーが歪められることの容易さは、悪意あるアクターによる故意に有害なトレーニングデータや、データアノテータによる不注意に導入されたバイアスに関連するリスクを浮き彫りにする。また、LLMに対するイデオロギー操作の影響を軽減するための堅牢なセーフガードの必要性も強調している。 Large Language Models (LLMs) possess the potential to exert substantial influence on public perceptions and interactions with information. This raises concerns about the societal impact that could arise if the ideologies within these models can be easily manipulated. In this work, we investigate how effectively LLMs can learn and generalize ideological biases from their instruction-tuning data. Our findings reveal a concerning vulnerability: exposure to only a small amount of ideologically driven samples significantly alters the ideology of LLMs. Notably, LLMs demonstrate a startling ability to absorb ideology from one topic and generalize it to even unrelated ones. The ease with which LLMs' ideologies can be skewed underscores the risks associated with intentionally poisoned training data by malicious actors or inadvertently introduced biases by data annotators. It also emphasizes the imperative for robust safeguards to mitigate the influence of ideological manipulations on LLMs.	翻訳日:2024-02-20 19:35:12 公開日:2024-02-18
# ChatGPTは開発者をサポートできるか? コード生成のための大規模言語モデルの実証評価 Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation ( http://arxiv.org/abs/2402.11702v1 ) ライセンス: Link先を確認	Kailun Jin, Chung-Yu Wang, Hung Viet Pham, Hadi Hemmati	(参考訳) 大規模言語モデル(LLM)は、様々な開発シナリオで有望な能力を示す多くの先行研究とともに、コード生成において顕著な熟練度を示している。しかし、これらの研究は主に研究環境での評価を提供しており、LLMが現実世界の開発者をいかに効果的に支援できるかを理解するための大きなギャップを残している。これを解決するために、私たちは、開発者とChatGPT(GitHubなどのプラットフォーム上のShare Link機能でキャプチャされた)の会話から収集されたデータセットであるDevGPTで会話を経験的に分析しました。実証的な知見から,LLM生成コードを使用する現在のプラクティスは,一般的には,本番利用可能なコードとして使うのではなく,高レベルな概念のデモやドキュメントの例の提供に限られています。これらの結果は、現代のソフトウェア開発において不可欠な部分になる前に、コード生成におけるLLMを改善するには、将来的な作業が必要であることを示している。 Large language models (LLMs) have demonstrated notable proficiency in code generation, with numerous prior studies showing their promising capabilities in various development scenarios. However, these studies mainly provide evaluations in research settings, which leaves a significant gap in understanding how effectively LLMs can support developers in the real world. To address this, we conducted an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT (captured with the Share Link feature on platforms such as GitHub). Our empirical findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation, rather than being used as production-ready code. These findings indicate that there is much future work needed to improve LLMs in code generation before they can be integral parts of modern software development.	翻訳日:2024-02-20 19:34:51 公開日:2024-02-18
# イジングモデルの機械学習解法を説明する Explaining the Machine Learning Solution of the Ising Model ( http://arxiv.org/abs/2402.11701v1 ) ライセンス: Link先を確認	Roberto C. Alamino	(参考訳) 機械学習(ML)技術は高次元データに関わる問題を解く上で強力であるが、適合したパラメータから結果を説明することは、特に物理学への応用において、依然として極めて重要かつ困難な課題である。ここでは、近年の多くのML研究のターゲットである強磁性イジングモデルに対して、これがどのように達成できるかを示す。隠れた層を持たないニューラルネットワーク(NN)とハミルトニアン対称性を用いてモデルの連続相転移の臨界温度を求めることにより、その戦略を説明することができる。これにより、対称性が分かっていない場合に問題を解くために必要なNNの最小限の拡張を予測でき、その拡張もまた説明可能である。 As powerful as machine learning (ML) techniques are in solving problems involving data with large dimensionality, explaining the results from the fitted parameters remains a challenging task of utmost importance, especially in physics applications. Here it is shown how this can be accomplished for the ferromagnetic Ising model, the target of many ML studies in recent years. By using a neural network (NN) without any hidden layers and the symmetry of the Hamiltonian to find the critical temperature for the continuous phase transition of the model, an explanation of its strategy is found. This allows the prediction of the minimal extension of the NN to solve the problem when the symmetry is not known, which is also explainable.	翻訳日:2024-02-20 19:34:33 公開日:2024-02-18
# 5Gセルラー --エネルギー効率の観点から 5G Cellular -- An Energy Efficiency Perspective ( http://arxiv.org/abs/2402.11698v1 ) ライセンス: Link先を確認	Deven Panchal	(参考訳) セルラー通信の5G技術は、いつでもどこでも情報にアクセスするための大きな容量と範囲を約束するが、膨大な電力消費を持つ恐れがある。加入者側と運用者側の両方に存在するこの問題の解決に向けた重要な研究が進められている。トラフィックの予測、物理層の変更、そして5G技術をよりエネルギー効率良くするための取り組みなどがあった。本研究の目的は,エネルギー効率の観点から5Gを支える要素技術を検討することである。改良や修正によって5Gセルラーのエネルギー効率が向上する5Gセルラー内の特定の領域を指摘する努力がなされる。 While the 5G technology of cellular communications promises great capacity and coverage to access information anywhere and anytime, it is feared to have huge power consumption. Significant research has been directed towards solving this problem, which exists on both the subscriber side and the operator side. There have been efforts such as predicting traffic and modifying the physical layer towards making the 5G technology more energy efficient. The aim of this study is to examine the technology enablers for 5G from an energy efficiency perspective. Efforts will be made to point out specific areas in 5G cellular where improvements or modifications could make 5G cellular more energy efficient.	翻訳日:2024-02-20 19:34:22 公開日:2024-02-18
# ソフトウェア定義光ネットワークの実現 Enabling Software Defined Optical Networks ( http://arxiv.org/abs/2402.11695v1 ) ライセンス: Link先を確認	Deven Panchal	(参考訳) 本稿では,Software Defined Optical Networks(SDON)の概要と実装方法について述べる。光ネットワークの進化をGMPLSまで辿り、SDNのアイデアを辿ってOpenFlowに至るまでを説明する。論文では、SDONの必要性を調査し、ハードウェアを含むSDONソリューションがどのようなものかを説明する。また、GMPLSの制限を克服するために、このソリューションの一部としてOpenFlowをどのように使用できるかについても説明している。 This paper gives an overview of Software Defined Optical Networks or SDONs and how they can be implemented. It traces the evolution of optical networks up to GMPLS and traces the idea of SDN and builds up to OpenFlow. The paper explores the need for SDONs and explains what an SDON solution could look like, including the hardware. It also seeks to explain how OpenFlow could be used as a part of this solution to overcome the limitations of GMPLS.	翻訳日:2024-02-20 19:34:11 公開日:2024-02-18
# Vision-Flan: ビジュアルインストラクションチューニングにおけるヒューマンラベルタスクのスケーリング Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning ( http://arxiv.org/abs/2402.11690v1 ) ライセンス: Link先を確認	Zhiyang Xu, Chao Feng, Rulin Shao, Trevor Ashby, Ying Shen, Di Jin, Yu Cheng, Qifan Wang, Lifu Huang	(参考訳) 視覚言語モデル(VLM)は、多目的視覚アシスタントとして優れた機能を持つが、既存のVLMフレームワークには、(1)事前学習と視覚指導のタスク多様性の欠如、(2)GPT-4合成命令チューニングデータにおけるアノテーションエラーとバイアスの2つの大きな課題がある。どちらの課題も、ジェネラビリティの低下、幻覚、破滅的な忘れるといった問題を引き起こす。これらの課題に対処するため,我々は187の多様なタスクと1,664,261のインスタンスからなる,これまでに利用可能な最も多様な視覚インストラクションチューニングデータセットであるvision-flanを構築し,各タスクに専門家による命令を添付する。さらに,VLMをまずVision-Flan上で微調整し,さらにGPT-4合成データに基づいて調整する2段階の命令チューニングフレームワークを提案する。この2段階のチューニングフレームワークは、従来の1段階のビジュアル命令チューニングフレームワークを著しく上回り、幅広いマルチモーダル評価ベンチマークで最先端のパフォーマンスを実現しています。その結果,(1) GPT-4 合成データは VLM の能力を大幅に向上させるものではなく,むしろ人間の嗜好形式に対するモデル応答を変調する。(2) GPT-4 合成データの最小量 (例: 1000) は VLM 応答を人間の嗜好と効果的に整合させることができる;(3) 視覚的指示チューニングは主に大言語モデル(LLM)の視覚的特徴の理解を支援する。 Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within the existing VLM frameworks: (1) lacking task diversity in pretraining and visual instruction tuning, and (2) annotation error and bias in GPT-4 synthesized instruction tuning data. Both challenges lead to issues such as poor generalizability, hallucination, and catastrophic forgetting. To address these challenges, we construct Vision-Flan, the most diverse publicly available visual instruction tuning dataset to date, comprising 187 diverse tasks and 1,664,261 instances sourced from academic datasets, and each task is accompanied by an expert-written instruction. In addition, we propose a two-stage instruction tuning framework, in which VLMs are firstly finetuned on Vision-Flan and further tuned on GPT-4 synthesized data. 
We find this two-stage tuning framework significantly outperforms the traditional single-stage visual instruction tuning framework and achieves the state-of-the-art performance across a wide range of multi-modal evaluation benchmarks. Finally, we conduct in-depth analyses to understand visual instruction tuning and our findings reveal that: (1) GPT-4 synthesized data does not substantially enhance VLMs' capabilities but rather modulates the model's responses to human-preferred formats; (2) A minimal quantity (e.g., 1,000) of GPT-4 synthesized data can effectively align VLM responses with human-preference; (3) Visual instruction tuning mainly helps large-language models (LLMs) to understand visual features.	翻訳日:2024-02-20 19:34:04 公開日:2024-02-18
# 量子ニューラルネットワークにおけるモデル盗み攻撃と防御の効果評価 Evaluating Efficacy of Model Stealing Attacks and Defenses on Quantum Neural Networks ( http://arxiv.org/abs/2402.11687v1 ) ライセンス: Link先を確認	Satwik Kundu, Debarshi Kundu and Swaroop Ghosh	(参考訳) 量子機械学習(QML)モデルのクラウドホスティングは、モデルをさまざまな脆弱性にさらし、その中で最も重大なものがモデル盗み攻撃である。本研究では,量子コンピューティングの領域におけるそのような攻撃の有効性を評価する。複数のQMLモデルアーキテクチャを用いた各種データセットの総合的な実験を行った。その結果、モデル盗み攻撃は、Top-$1$とTop-$k$のラベル($k:$ num\_classes)で訓練すると、それぞれ最大$0.9\times$と$0.99\times$のクローンテスト精度を達成するクローンモデルを生成できることが判明した。これらの攻撃から防御するために、我々は現在のノイズの多いハードウェアのユニークな特性を利用し、被害者モデルの出力を摂動させ、攻撃者のトレーニングプロセスを妨げる。特に,我々は次のように提案する。 1)ハードウェア変動誘発摂動(HVIP)と 2)ハードウェアとアーキテクチャの変動誘発摂動(HAVIP)。ノイズとアーキテクチャのばらつきは最大$\sim16\%$の出力難読化を実現することができるが, 包括的解析により, ノイズ条件下でクローンされたモデルは耐障害性が高く, 難読化による性能劣化がほとんどないことがわかった。ノイズの多いハードウェアでトレーニングされたQMLモデルは、摂動や難読化に基づく防御や攻撃に自然に抵抗する。 Cloud hosting of quantum machine learning (QML) models exposes them to a range of vulnerabilities, the most significant of which is the model stealing attack. In this study, we assess the efficacy of such attacks in the realm of quantum computing. We conducted comprehensive experiments on various datasets with multiple QML model architectures. Our findings revealed that model stealing attacks can produce clone models achieving up to $0.9\times$ and $0.99\times$ clone test accuracy when trained using Top-$1$ and Top-$k$ labels, respectively ($k:$ num\_classes). To defend against these attacks, we leverage the unique properties of current noisy hardware to perturb the victim model outputs and hinder the attacker's training process. In particular, we propose: 1) hardware variation-induced perturbation (HVIP) and 2) hardware and architecture variation-induced perturbation (HAVIP). Although noise and architectural variability can provide up to $\sim16\%$ output obfuscation, our comprehensive analysis revealed that models cloned under noisy conditions tend to be resilient, suffering little to no performance degradation due to such obfuscations. 
Despite limited success with our defense techniques, this outcome has led to an important discovery: QML models trained on noisy hardware are naturally resistant to perturbation or obfuscation-based defenses or attacks.	翻訳日:2024-02-20 19:33:32 公開日:2024-02-18
# 離散力学系のトポロジーと挙動の学習 Learning the Topology and Behavior of Discrete Dynamical Systems ( http://arxiv.org/abs/2402.11686v1 ) ライセンス: Link先を確認	Zirou Qiu, Abhijin Adiga, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti	(参考訳) 離散力学系は、現実世界のネットワーク上での感染拡大をモデル化するために一般的に用いられる。 PACフレームワークの下では、基礎となるネットワークが知られていると仮定して、システムの振る舞いを学習する問題を研究している。本研究では、ブラックボックスシステムの振る舞いと基盤となるトポロジの両方を学習する、より困難な設定に焦点を当てる。一般に、この学習問題は計算的に難解であることを示す。正の面では、動的システムの基盤となるグラフがいくつかのクラスに属する場合、PACモデルの下で効率的な学習方法を示す。さらに,未知系のトポロジーが部分的に観測される緩和設定について検討する。そこで本研究では,システムの推論とサンプルの複雑さの確立に有効なPAC学習者を提案する。最後に、ナタラジャン次元のよく知られた形式主義を用いて、トポロジーと振舞いの両方が未知である力学系の仮説クラスの表現力の形式的解析を行う。本研究は離散力学系の挙動とトポロジーを学習するための理論的基礎を提供する。 Discrete dynamical systems are commonly used to model the spread of contagions on real-world networks. Under the PAC framework, existing research has studied the problem of learning the behavior of a system, assuming that the underlying network is known. In this work, we focus on a more challenging setting: to learn both the behavior and the underlying topology of a black-box system. We show that, in general, this learning problem is computationally intractable. On the positive side, we present efficient learning methods under the PAC model when the underlying graph of the dynamical system belongs to some classes. Further, we examine a relaxed setting where the topology of an unknown system is partially observed. For this case, we develop an efficient PAC learner to infer the system and establish the sample complexity. Lastly, we present a formal analysis of the expressive power of the hypothesis class of dynamical systems where both the topology and behavior are unknown, using the well-known formalism of the Natarajan dimension. Our results provide a theoretical foundation for learning both the behavior and topology of discrete dynamical systems.	翻訳日:2024-02-20 19:33:07 公開日:2024-02-18
# ALLaVA:ライトなビジョン言語モデルのためのGPT4V合成データの活用 ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model ( http://arxiv.org/abs/2402.11684v1 ) ライセンス: Link先を確認	Guiming Hardy Chen, Shunian Chen, Ruifei Zhang, Junying Chen, Xiangbo Wu, Zhiyi Zhang, Zhihong Chen, Jianquan Li, Xiang Wan, Benyou Wang	(参考訳) 近年の大規模視覚言語モデル(LVLM)の進歩により、言語モデルにおけるマルチモーダル入力の処理が可能となったが、特にエッジデバイスへの展開には多大な計算資源を必要とする。本研究では,従来規模のLVLMとリソースフレンドリなライトバージョンのパフォーマンスギャップを,高品質なトレーニングデータを用いて橋渡しすることを目的とする。これを実現するために、GPT-4Vが画像から詳細なキャプション、複雑な推論命令、詳細な回答を生成する能力を利用して合成データセットを作成する。得られたモデルであるALLaVAは、3BまでのLVLMの中で12のベンチマークにおいて競合性能を達成する。この研究は、より効率的なLVLMを作成する際に高品質なデータを採用する可能性を強調している。オンラインデモは \url{https://allava.freedomai.cn} で公開しています。 Recent advancements in Large Vision-Language Models (LVLMs) have enabled processing of multimodal inputs in language models but require significant computational resources for deployment, especially in edge devices. This study aims to bridge the performance gap between traditional-scale LVLMs and resource-friendly lite versions by adopting high-quality training data. To do this, a synthetic dataset is created by leveraging GPT-4V's ability to generate detailed captions, complex reasoning instructions and detailed answers from images. The resulting model trained with our data, ALLaVA, achieves competitive performance on 12 benchmarks up to 3B LVLMs. This work highlights the feasibility of adopting high-quality data in crafting more efficient LVLMs. Our online demo is available at \url{https://allava.freedomai.cn}.	翻訳日:2024-02-20 19:32:50 公開日:2024-02-18
# すべてを支配するための1つのプロンプト: 意見要約のためのllm One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation ( http://arxiv.org/abs/2402.11683v1 ) ライセンス: Link先を確認	Tejpalsingh Siledar, Swaroop Nath, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, Swaprava Nath, Pushpak Bhattacharyya, Suman Banerjee, Amey Patil, Sudhanshu Shekhar Singh, Muthusamy Chelliah, Nikesh Garera	(参考訳) 従来の基準に基づく指標を用いた意見要約の評価は、概観的な評価を提供することは稀であり、人間の判断との相関が比較的低いことが示されている。近年,NLG評価のための基準フリー指標としてLarge Language Models (LLMs) が提案されているが,意見要約評価には未検討である。さらに、限定的な意見要約評価データセットは進捗を阻害する。これに対処するため、私たちはsummeval-opデータセットをリリースします。このデータセットは、意見要約の評価に関連する7つの側面をカバーする: フルエンシ、コヒーレンス、妥当性、忠実性、アスペクトカバレッジ、感情一貫性、特異性。本稿では,Op-I-Promptを次元に依存しないプロンプト,Op-Promptsについて考察する。実験の結果、Op-I-Promptは、人間と平均で0.70のスピアマン相関を達成し、これまでのすべてのアプローチよりも優れているという意見の要約を評価するための優れた代替手段として現れている。我々の知る限り、我々は、意見要約領域において、クローズドソースモデルとオープンソースモデルの両方において、LCMを評価対象として初めて調査する。 Evaluation of opinion summaries using conventional reference-based metrics rarely provides a holistic evaluation and has been shown to have a relatively low correlation with human judgments. Recent studies suggest using Large Language Models (LLMs) as reference-free metrics for NLG evaluation, however, they remain unexplored for opinion summary evaluation. Moreover, limited opinion summary evaluation datasets inhibit progress. To address this, we release the SUMMEVAL-OP dataset covering 7 dimensions related to the evaluation of opinion summaries: fluency, coherence, relevance, faithfulness, aspect coverage, sentiment consistency, and specificity. We investigate Op-I-Prompt a dimension-independent prompt, and Op-Prompts, a dimension-dependent set of prompts for opinion summary evaluation. Experiments indicate that Op-I-Prompt emerges as a good alternative for evaluating opinion summaries achieving an average Spearman correlation of 0.70 with humans, outperforming all previous approaches. 
To the best of our knowledge, we are the first to investigate LLMs as evaluators on both closed-source and open-source models in the opinion summarization domain.	翻訳日:2024-02-20 19:32:35 公開日:2024-02-18
# 非可換性による条件付き不変性の学習 Learning Conditional Invariances through Non-Commutativity ( http://arxiv.org/abs/2402.11682v1 ) ライセンス: Link先を確認	Abhra Chaudhuri, Serban Georgescu, Anjan Dutta	(参考訳) ドメイン固有の確率変数を障害として条件付きフィルタリングする不変性学習アルゴリズムは、評価対象のドメインではなく、データセマンティクスのみに基づいて行う。目的領域に非可換的に向くように不変性の基準を緩和することが, 条件付き不変性の学習において証明可能な意味で最適かつサンプル効率のよい方法であることを示す。ドメイン非対称性の下では、すなわちターゲットドメインがソースに存在しない意味的関連情報を含んでいる場合、ドメインの平均で最適であるエンコーダ$\varphi^*$のリスクは、ターゲット固有の最適エンコーダ$\Phi^*_\tau$のリスクによって厳密に下界づけられる。非可換性が最適化を$\varphi^*$ではなく$\Phi^*_\tau$へ導くことを証明し、ドメイン間の$\mathcal{H}$-divergence をゼロにすることで、ターゲットのリスクに対するより厳密な境界を与える。我々の理論と実験は、NCI(Non-commutative invariance)が、$\Phi^*_\tau$を学習する際のサンプル複雑性の要求を満たすためにソースドメインサンプルを活用でき、ドメイン適応のためのSOTA不変性学習アルゴリズムを上回ることを実証している。実装はhttps://github.com/abhrac/nciで利用可能である。 Invariance learning algorithms that conditionally filter out domain-specific random variables as distractors, do so based only on the data semantics, and not the target domain under evaluation. We show that a provably optimal and sample-efficient way of learning conditional invariances is by relaxing the invariance criterion to be non-commutatively directed towards the target domain. Under domain asymmetry, i.e., when the target domain contains semantically relevant information absent in the source, the risk of the encoder $\varphi^*$ that is optimal on average across domains is strictly lower-bounded by the risk of the target-specific optimal encoder $\Phi^*_\tau$. We prove that non-commutativity steers the optimization towards $\Phi^*_\tau$ instead of $\varphi^*$, bringing the $\mathcal{H}$-divergence between domains down to zero, leading to a stricter bound on the target risk. Both our theory and experiments demonstrate that non-commutative invariance (NCI) can leverage source domain samples to meet the sample complexity needs of learning $\Phi^*_\tau$, surpassing SOTA invariance learning algorithms for domain adaptation, at times by over $2\%$, approaching the performance of an oracle. 
Implementation is available at https://github.com/abhrac/nci.	翻訳日:2024-02-20 19:32:01 公開日:2024-02-18
# 言語習得のブラックボックスを開く Opening the black box of language acquisition ( http://arxiv.org/abs/2402.11681v1 ) ライセンス: Link先を確認	J\'er\^ome Michaud and Anna Jon-and	(参考訳) ディープラーニング技術を用いた大規模言語モデルの最近の進歩は、データから言語を学習する方法に新たな関心を寄せている。しかし、これらのモデルが学習言語からの文法情報をどう表現するかは不明である。加えて、モデルは使用前に大きなコーパスで事前訓練されなければならない。本研究では,学習言語のための代替的,より透明で認知的に妥当なアーキテクチャを提案する。ディープラーニングの代わりに、シーケンスメモリとチャンキングに基づいた最小限の認知アーキテクチャを使用します。学習メカニズムは強化学習の原理に基づいている。私たちは、多くの自然のおもちゃの言語でアーキテクチャをテストします。その結果,モデルがこれらの人工言語をゼロから学習し,学習を支援する文法情報を抽出できることが示唆された。本研究は,このシンプルなアーキテクチャのパワーを実証し,言語学習プロセスの重要な要素としてシーケンスメモリの重要性を強調した。他の動物は忠実なシーケンス記憶を持っていないように見えるため、なぜ人間だけが複雑な言語を発達させたのかを説明することができる。 Recent advances in large language models using deep learning techniques have renewed interest on how languages can be learned from data. However, it is unclear whether or how these models represent grammatical information from the learned languages. In addition, the models must be pre-trained on large corpora before they can be used. In this work, we propose an alternative, more transparent and cognitively plausible architecture for learning language. Instead of using deep learning, our approach uses a minimal cognitive architecture based on sequence memory and chunking. The learning mechanism is based on the principles of reinforcement learning. We test our architecture on a number of natural-like toy languages. Results show that the model can learn these artificial languages from scratch and extract grammatical information that supports learning. Our study demonstrates the power of this simple architecture and stresses the importance of sequence memory as a key component of the language learning process. Since other animals do not seem to have a faithful sequence memory, this may explain why only humans have developed complex languages.	翻訳日:2024-02-20 19:31:18 公開日:2024-02-18
# リカレントニューラルネットワークと画像圧縮法による3次元点クラウド圧縮 3D Point Cloud Compression with Recurrent Neural Network and Image Compression Methods ( http://arxiv.org/abs/2402.11680v1 ) ライセンス: Link先を確認	Till Beemelmanns, Yuchen Tao, Bastian Lampe, Lennart Reiher, Raphael van Kempen, Timo Woopen, and Lutz Eckstein	(参考訳) LiDARポイントクラウドデータの保存と送信は、トレーニングデータ収集、リモートコントロール、クラウドサービス、SLAMなど、多くのAVアプリケーションにとって不可欠である。しかし,データの大きさや秩序のない構造のため,ポイントクラウドデータを低容量に圧縮することは困難である。原点雲データを密度の高い2次元行列構造に変換することは、圧縮アルゴリズムを適用する上で有望な方法である。本研究では,2次元表現における空間相関を効率的に利用するための圧縮アルゴリズムを提案する。構造化表現の圧縮には,一般的な画像圧縮法と,再帰的ニューラルネットワークを用いた自己教師あり深層圧縮法を用いる。また,LiDARの強度測定を密度2D表現に再構成し,その強度の圧縮性能を評価するための新しい指標を提案する。一般的なoctreeポイントクラウド圧縮や生のポイントクラウドデータ圧縮に基づくアプローチと比較すると、このアプローチは最良の定量的かつ視覚的なパフォーマンスを達成します。ソースコードとデータセットはhttps://github.com/ika-rwth-aachen/point-cloud-compressionで入手できる。 Storing and transmitting LiDAR point cloud data is essential for many AV applications, such as training data collection, remote control, cloud services or SLAM. However, due to the sparsity and unordered structure of the data, it is difficult to compress point cloud data to a low volume. Transforming the raw point cloud data into a dense 2D matrix structure is a promising way for applying compression algorithms. We propose a new lossless and calibrated 3D-to-2D transformation which allows compression algorithms to efficiently exploit spatial correlations within the 2D representation. To compress the structured representation, we use common image compression methods and also a self-supervised deep compression approach using a recurrent neural network. We also rearrange the LiDAR's intensity measurements to a dense 2D representation and propose a new metric to evaluate the compression performance of the intensity. Compared to approaches that are based on generic octree point cloud compression or based on raw point cloud data compression, our approach achieves the best quantitative and visual performance. 
Source code and dataset are available at https://github.com/ika-rwth-aachen/Point-Cloud-Compression.	翻訳日:2024-02-20 19:30:54 公開日:2024-02-18
# MultiCorrupt: マルチモードロバストネスデータセットと3次元物体検出のためのLiDAR-Camera Fusionのベンチマーク MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection ( http://arxiv.org/abs/2402.11677v1 ) ライセンス: Link先を確認	Till Beemelmanns, Quan Zhang, and Lutz Eckstein	(参考訳) 自動走行のためのマルチモーダル3Dオブジェクト検出モデルは、nuScenesのようなコンピュータビジョンベンチマークでは例外的な性能を示した。しかし、密集したLiDAR点雲や精密に校正されたセンサーアレイへの依存は、現実世界のアプリケーションに課題をもたらす。センサの誤用、ミスキャリブレーション、異なるサンプリング周波数などの問題は、lidarやカメラからのデータの空間的および時間的不均衡につながる。加えて、LiDARとカメラデータの完全性は、インクリメント気象などの有害な環境条件によってしばしば損なわれ、閉塞やノイズ干渉を引き起こす。この課題に対処するため,我々は,マルチモーダル3次元物体検出器のロバスト性を評価するための総合ベンチマークであるmulticorruptを導入する。マルチコラプトにおける5つの最先端マルチモーダル検出器を評価し,その耐性について検討した。以上の結果から, 既存手法では腐敗の種類や融解戦略によってロバスト性が異なっていた。マルチモーダルな設計選択が、そのようなモデルをある種の摂動に対して堅牢にするための洞察を提供する。データセット生成コードとベンチマークはhttps://github.com/ika-rwth-aachen/MultiCorruptで公開されている。 Multi-modal 3D object detection models for automated driving have demonstrated exceptional performance on computer vision benchmarks like nuScenes. However, their reliance on densely sampled LiDAR point clouds and meticulously calibrated sensor arrays poses challenges for real-world applications. Issues such as sensor misalignment, miscalibration, and disparate sampling frequencies lead to spatial and temporal misalignment in data from LiDAR and cameras. Additionally, the integrity of LiDAR and camera data is often compromised by adverse environmental conditions such as inclement weather, leading to occlusions and noise interference. To address this challenge, we introduce MultiCorrupt, a comprehensive benchmark designed to evaluate the robustness of multi-modal 3D object detectors against ten distinct types of corruptions. We evaluate five state-of-the-art multi-modal detectors on MultiCorrupt and analyze their performance in terms of their resistance ability. Our results show that existing methods exhibit varying degrees of robustness depending on the type of corruption and their fusion strategy. 
We provide insights into which multi-modal design choices make such models robust against certain perturbations. The dataset generation code and benchmark are open-sourced at https://github.com/ika-rwth-aachen/MultiCorrupt.	翻訳日:2024-02-20 19:29:51 公開日:2024-02-18
# LiRaFusion:3次元物体検出のための深層適応LiDAR-レーダ融合 LiRaFusion: Deep Adaptive LiDAR-Radar Fusion for 3D Object Detection ( http://arxiv.org/abs/2402.11735v1 ) ライセンス: Link先を確認	Jingyu Song, Lingjun Zhao, Katherine A. Skinner	(参考訳) 既存のLiDAR-レーダ検出器の性能ギャップを埋めるために,3次元物体検出のためのLiDAR-レーダ融合に取り組むLiRaFusionを提案する。これら2つのモダリティからの特徴抽出能力を向上させるために,ジョイントボクセル特徴符号化のための早期融合モジュールと,ゲートネットワークを介して特徴マップを適応的に融合する中間融合モジュールを設計した。nuScenesで広範な評価を行い、LiRaFusionがLiDARとレーダーの補完情報を効果的に活用し、既存の手法よりも顕著な改善を実現していることを示す。 We propose LiRaFusion to tackle LiDAR-radar fusion for 3D object detection to fill the performance gap of existing LiDAR-radar detectors. To improve the feature extraction capabilities from these two modalities, we design an early fusion module for joint voxel feature encoding, and a middle fusion module to adaptively fuse feature maps via a gated network. We perform extensive evaluation on nuScenes to demonstrate that LiRaFusion leverages the complementary information of LiDAR and radar effectively and achieves notable improvement over existing methods.	翻訳日:2024-02-20 19:20:57 公開日:2024-02-18
# 大規模言語モデルを用いたデータ中心タスクの解決 Solving Data-centric Tasks using Large Language Models ( http://arxiv.org/abs/2402.11734v1 ) ライセンス: Link先を確認	Shraddha Barke, Christian Poelitz, Carina Suzana Negreanu, Benjamin Zorn, Jos\'e Cambronero, Andrew D. Gordon, Vu Le, Elnaz Nouri, Nadia Polikarpova, Advait Sarkar, Brian Slininger, Neil Toronto, Jack Williams	(参考訳) 大規模言語モデル(llm)はstackoverflowのようなヘルプフォーラムを急速に置き換えている。これらのユーザは、スプレッドシート操作やデータラングといったデータ中心のタスクに関心を持っていることが多い。しかし、どのデータとどのデータをプロンプトに含めるかをどのように決めるのか? 本稿では,この問題への回答に2つの貢献をする。まず,StackOverflowの投稿から抽出した表データを操作する実世界のNL-to-codeタスクのデータセットを作成する。次に,LLMプロンプトに入力データから最も代表的な行を追加するクラスタ列選択プロンプト手法を提案する。実験の結果,LLMの性能はプロンプトに渡されるデータ量に非常に敏感であり,入力テーブルに多くの構文変化があるタスクの場合,クラスタ列選択手法はランダム選択ベースラインよりも優れていた。 Large language models (LLMs) are rapidly replacing help forums like StackOverflow, and are especially helpful for non-professional programmers and end users. These users are often interested in data-centric tasks, such as spreadsheet manipulation and data wrangling, which are hard to solve if the intent is only communicated using a natural-language description, without including the data. But how do we decide how much data and which data to include in the prompt? This paper makes two contributions towards answering this question. First, we create a dataset of real-world NL-to-code tasks manipulating tabular data, mined from StackOverflow posts. Second, we introduce a cluster-then-select prompting technique, which adds the most representative rows from the input data to the LLM prompt. Our experiments show that LLM performance is indeed sensitive to the amount of data passed in the prompt, and that for tasks with a lot of syntactic variation in the input table, our cluster-then-select technique outperforms a random selection baseline.	翻訳日:2024-02-20 19:20:45 公開日:2024-02-18
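The cluster-then-select idea in the entry above (put the most representative input rows into the prompt) can be illustrated with a minimal sketch. The abstract does not say how rows are clustered, so everything here is an assumption made for illustration: the `cell_shape` abstraction, grouping rows by identical shape signature, and taking the first row of each group are hypothetical choices, not the authors' method.

```python
import re


def cell_shape(cell):
    """Reduce a cell to a coarse syntactic pattern (hypothetical helper).

    Runs of letters become 'A' and runs of digits become '9',
    so 'AB-12' becomes 'A-9'.
    """
    shape = re.sub(r"[A-Za-z]+", "A", str(cell))
    shape = re.sub(r"[0-9]+", "9", shape)
    return shape


def cluster_then_select(rows, max_rows):
    """Group rows by syntactic shape, then keep one row per group.

    Larger groups are preferred; ties keep first-seen order because
    sorted() is stable and dicts preserve insertion order.
    """
    clusters = {}
    for row in rows:
        key = tuple(cell_shape(cell) for cell in row)
        clusters.setdefault(key, []).append(row)
    ordered = sorted(clusters.values(), key=len, reverse=True)
    # The first row of each cluster stands in as its representative.
    return [cluster[0] for cluster in ordered[:max_rows]]


rows = [
    ["2024-02-18", "Alice"],
    ["2024-02-19", "Bob"],
    ["18/02/2024", "Carol"],
    ["N/A", "Dave"],
]
prompt_rows = cluster_then_select(rows, max_rows=2)
```

With a budget of two rows, the sketch keeps one ISO-style date row and one slash-style date row, so the prompt shows both formats instead of two near-identical rows.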
# ロバスト一般化におけるランダム忘却の有効性 The Effectiveness of Random Forgetting for Robust Generalization ( http://arxiv.org/abs/2402.11733v1 ) ライセンス: Link先を確認	Vijaya Raghavan T Ramkumar, Bahram Zonooz and Elahe Arani	(参考訳) ディープニューラルネットワークは、敵対的攻撃の影響を受けやすいため、パフォーマンスと精度を損なう可能性がある。敵対的訓練(AT)は、そのような攻撃からニューラルネットワークを保護する一般的なアプローチとして現れている。しかし、ATの重要な課題は、テストデータに対するネットワークのロバスト性能がさらなるトレーニングで悪化し、一般化を阻害する、ロバストオーバーフィッティングである。脳における能動的忘却の概念に動機づけられ、我々は新しい学習パラダイム"Forget to Mitigate Overfitting (FOMO)"を導入した。 FOMOは、重みのサブセットをランダムに忘却し、重みの再初期化を通じてモデルの情報を調整する忘却フェーズと、一般化可能な特徴の学習を強調する再学習フェーズとを交互に行う。ベンチマークデータセットと敵対的攻撃による実験により、FOMOは、最先端のロバスト性を改善しつつ、最良のロバストテスト精度と最後のロバストテスト精度のギャップを大幅に減らし、ロバストオーバーフィッティングを緩和することが示された。さらに、FOMOは標準精度とロバスト精度のトレードオフを向上し、ベースラインの敵対的手法よりも優れている。最後に、我々のフレームワークはAutoAttackに対して堅牢であり、多くの実世界のシナリオにおける一般化を高めます。 Deep neural networks are susceptible to adversarial attacks, which can compromise their performance and accuracy. Adversarial Training (AT) has emerged as a popular approach for protecting neural networks against such attacks. However, a key challenge of AT is robust overfitting, where the network's robust performance on test data deteriorates with further training, thus hindering generalization. Motivated by the concept of active forgetting in the brain, we introduce a novel learning paradigm called "Forget to Mitigate Overfitting (FOMO)". FOMO alternates between the forgetting phase, which randomly forgets a subset of weights and regulates the model's information through weight reinitialization, and the relearning phase, which emphasizes learning generalizable features. Our experiments on benchmark datasets and adversarial attacks show that FOMO alleviates robust overfitting by significantly reducing the gap between the best and last robust test accuracy while improving the state-of-the-art robustness. Furthermore, FOMO provides a better trade-off between standard and robust accuracy, outperforming baseline adversarial methods. 
Finally, our framework is robust to AutoAttacks and increases generalization in many real-world scenarios.	翻訳日:2024-02-20 19:20:27 公開日:2024-02-18
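The alternation described in the entry above (a forgetting phase that reinitializes a random subset of weights, followed by a relearning phase) can be sketched on a flat list of weights. This is a toy illustration under stated assumptions, not the FOMO implementation: the uniform reinitialization, the `forget_fraction` and `init_scale` parameters, and the caller-supplied `relearn_step` are all hypothetical, and the abstract does not specify which layers are affected or how relearning is scheduled.

```python
import random


def forget_step(weights, forget_fraction, rng, init_scale=0.1):
    """Reinitialize a random subset of weights.

    Returns the new weight list and the sorted indices that were reset.
    """
    n_forget = int(len(weights) * forget_fraction)
    forgotten = sorted(rng.sample(range(len(weights)), n_forget))
    new_weights = list(weights)
    for i in forgotten:
        # Assumed reinitialization scheme: small uniform noise around zero.
        new_weights[i] = rng.uniform(-init_scale, init_scale)
    return new_weights, forgotten


def train_with_forgetting(weights, relearn_step, n_cycles, forget_fraction, seed=0):
    """Alternate forgetting and relearning for a fixed number of cycles.

    relearn_step is any function mapping a weight list to an updated one,
    for example a few epochs of adversarial training in a real setup.
    """
    rng = random.Random(seed)
    for _ in range(n_cycles):
        weights, _ = forget_step(weights, forget_fraction, rng)
        weights = relearn_step(weights)
    return weights
```

In this sketch the untouched weights carry over unchanged between phases, so only the reset subset has to be relearned in each cycle.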
# プロスペクタヘッド:大規模モデルとデータに対する一般的な特徴属性 Prospector Heads: Generalized Feature Attribution for Large Models & Data ( http://arxiv.org/abs/2402.11729v1 ) ライセンス: Link先を確認	Gautam Machiraju, Alexander Derry, Arjun Desai, Neel Guha, Amir-Hossein Karimi, James Zou, Russ Altman, Christopher R\'e, Parag Mallick	(参考訳) 特徴帰属(feature attribution)は、分類に関連する入力データの領域をローカライズする能力であり、科学的および生物医学領域の機械学習モデルにとって重要な機能である。エンド・ツー・エンドの分類器の予測を「説明」する現在の特徴帰属法は、不正確な特徴の局在化に苦しめられ、計算上の課題のために小さなサンプルサイズと高次元データセットでの使用には不十分である。我々は,任意のエンコーダおよび任意のデータモダリティに適用可能な特徴帰属のための説明ベース手法の効率的かつ解釈可能な代替手段であるprospector headを提案する。プロスペクタヘッドは、シーケンス(テキスト)、画像(病理)、およびグラフ(タンパク質構造)の実験を通じてモダリティを一般化し、平均局在auprcにおけるベースラインアトリビューション法を最大49ポイント上回った。また、入力データ中のクラス固有のパターンの解釈と発見を改善する方法を示す。ハイパフォーマンス、柔軟性、一般化性を通じて、複雑なドメインにおける機械学習モデルの信頼性と透明性を改善するためのフレームワークを提供する。 Feature attribution, the ability to localize regions of the input data that are relevant for classification, is an important capability for machine learning models in scientific and biomedical domains. Current methods for feature attribution, which rely on "explaining" the predictions of end-to-end classifiers, suffer from imprecise feature localization and are inadequate for use with small sample sizes and high-dimensional datasets due to computational challenges. We introduce prospector heads, an efficient and interpretable alternative to explanation-based methods for feature attribution that can be applied to any encoder and any data modality. Prospector heads generalize across modalities through experiments on sequences (text), images (pathology), and graphs (protein structures), outperforming baseline attribution methods by up to 49 points in mean localization AUPRC. We also demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in the input data. Through their high performance, flexibility, and generalizability, prospectors provide a framework for improving trust and transparency for machine learning models in complex domains.	
翻訳日:2024-02-20 19:20:04 公開日:2024-02-18
# 金融における数値的クレーム検出:新しい金融データセット、弱いスーパービジョンモデル、市場分析 Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis ( http://arxiv.org/abs/2402.11728v1 ) ライセンス: Link先を確認	Agam Shah, Arnav Hiray, Pratvi Shah, Arkaprabha Banerjee, Anushka Singh, Dheeraj Eidnani, Bhaskar Chaudhury, Sudheer Chava	(参考訳) 本稿では、アナリストレポートや決算説明会におけるクレームを上場企業にとって重要な四半期イベントとみなし、それらが金融市場リターンに与える影響を検討する。包括的分析を容易にするために,金融領域におけるクレーム検出タスクのための新たな財務データセットを構築する。我々は,本データセット上で様々な言語モデルをベンチマークし,主題専門家(SME)の知識を集約関数に取り入れた,既存のアプローチよりも優れた新しい弱スーパービジョンモデルを提案する。さらに,「楽観性(optimism)」という新しい尺度を構築することで,提案モデルの実用性を実証する。また、アーニングサプライズとリターンがこの楽観性尺度に依存することも観察した。私たちのデータセット、モデル、コードは(CC BY 4.0ライセンスの下で)GitHubとHugging Faceで公開されます。 In this paper, we investigate the influence of claims in analyst reports and earnings calls on financial market returns, considering them as significant quarterly events for publicly traded companies. To facilitate a comprehensive analysis, we construct a new financial dataset for the claim detection task in the financial domain. We benchmark various language models on this dataset and propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function, outperforming existing approaches. Furthermore, we demonstrate the practical utility of our proposed model by constructing a novel measure, "optimism". We also observe the dependence of earnings surprise and return on our optimism measure. Our dataset, models, and code will be made publicly available (under the CC BY 4.0 license) on GitHub and Hugging Face.	翻訳日:2024-02-20 19:19:41 公開日:2024-02-18
# 人間とaiのコラボレーションを形作る:言語モデルとの共著における様々な足場レベル Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models ( http://arxiv.org/abs/2402.11723v1 ) ライセンス: Link先を確認	Paramveer S. Dhillon, Somayeh Molaei, Jiaqi Li, Maximilian Golub, Shaochun Zheng, Lionel P. Robert	(参考訳) 言語モデリングの進歩は、新しい人間-ai共著体験への道を開いた。本稿では,大規模言語モデル(llm)からのスキャフォールディングの多種多様なレベルについて検討する。ラテン四角形設計を用いて、被験者(N=131)に、AIアシスト(制御)なし、次文提案(低足場化)、次パラグラフ提案(高足場化)の3つのランダムな条件下での議論的記述プロンプトへの対応を依頼した。以上の結果から,足場が文字品質と生産性(単語/時間)に与える影響が明らかとなった。低いスキャフォールディングは書き込みの品質や生産性を著しく改善しなかったが、高いスキャフォールディングは大きな改善をもたらし、特に非正規のライターや技術に精通していないユーザーにとって恩恵となった。足場作成ツールを用いた場合,認知的負担は認められなかったが,テキストの所有と満足度は適度に低下した。我々の結果は、パーソナライズされたスキャフォールディング機構の必要性を含む、AIを活用した書込みツールの設計に幅広い影響を及ぼす。 Advances in language modeling have paved the way for novel human-AI co-writing experiences. This paper explores how varying levels of scaffolding from large language models (LLMs) shape the co-writing process. Employing a within-subjects field experiment with a Latin square design, we asked participants (N=131) to respond to argumentative writing prompts under three randomly sequenced conditions: no AI assistance (control), next-sentence suggestions (low scaffolding), and next-paragraph suggestions (high scaffolding). Our findings reveal a U-shaped impact of scaffolding on writing quality and productivity (words/time). While low scaffolding did not significantly improve writing quality or productivity, high scaffolding led to significant improvements, especially benefiting non-regular writers and less tech-savvy users. No significant cognitive burden was observed while using the scaffolded writing tools, but a moderate decrease in text ownership and satisfaction was noted. Our results have broad implications for the design of AI-powered writing tools, including the need for personalized scaffolding mechanisms.	翻訳日:2024-02-20 19:19:28 公開日:2024-02-18
# 順問題と逆問題の両方に対処する可逆フーリエニューラル演算子 Invertible Fourier Neural Operators for Tackling Both Forward and Inverse Problems ( http://arxiv.org/abs/2402.11722v1 ) ライセンス: Link先を確認	Da Long and Shandian Zhe	(参考訳) Fourier Neural Operator (FNO)は、多くのタスクで最先端のパフォーマンスを実証した、人気のある演算子学習手法である。しかし、FNOは主に順方向の予測に使われているが、多くのアプリケーションは逆問題の解決に頼っている。本稿では,順問題と逆問題の両方に対処する可逆フーリエニューラル演算子 (iFNO) を提案する。潜在チャネル空間における一連の可逆フーリエブロックを設計し,モデルパラメータを共有し,情報を効率的に交換し,双方向タスクの学習を相互に正則化する。入力空間内の固有構造を捉え,不良設定性やデータ不足,ノイズなどの課題を克服するための事後推論を可能にするために,変分オートエンコーダを統合した。効率的なトレーニングのために,事前学習と微調整のための3段階のプロセスを開発した。 5つのベンチマーク問題に対する評価は,本手法の有効性を示した。 Fourier Neural Operator (FNO) is a popular operator learning method, which has demonstrated state-of-the-art performance across many tasks. However, FNO is mainly used in forward prediction, yet a large family of applications rely on solving inverse problems. In this paper, we propose an invertible Fourier Neural Operator (iFNO) that tackles both the forward and inverse problems. We designed a series of invertible Fourier blocks in the latent channel space to share the model parameters, efficiently exchange the information, and mutually regularize the learning for the bi-directional tasks. We integrated a variational auto-encoder to capture the intrinsic structures within the input space and to enable posterior inference so as to overcome challenges of ill-posedness, data shortage, noise, etc. We developed a three-step process for pre-training and fine-tuning for efficient training. The evaluations on five benchmark problems have demonstrated the effectiveness of our approach.	翻訳日:2024-02-20 19:19:04 公開日:2024-02-18
# LLMエージェントを用いた政治連携交渉のモデル化 Modelling Political Coalition Negotiations Using LLM-based Agents ( http://arxiv.org/abs/2402.11712v1 ) ライセンス: Link先を確認	Farhad Moghimifar, Yuan-Fang Li, Robert Thomson, Gholamreza Haffari	(参考訳) 連立交渉は議会の民主主義の基礎であり、複雑な相互作用と政党間の戦略的コミュニケーションが特徴である。その重要性にもかかわらず、これらの交渉のモデル化は、主に適切なデータがないために、自然言語処理(NLP)の領域で未検討のままである。本稿では,新しいnlpタスクとして連立交渉を導入し,大規模言語モデルに基づくエージェント間の交渉としてモデル化する。我々は、欧州政党の宣言とこれらの国における多数の選挙に関する連立協定を含む多言語データセット POLCA を導入する。このデータセットは、様々な実世界のシミュレーション基盤を提供することによって、政治交渉モデリングにおける現在の範囲制限の課題に対処する。さらに,政党間の連立交渉の過程をシミュレートし,結果を予測する階層的マルコフ決定プロセスを提案する。我々は,現在最先端の大規模言語モデル(LLM)の性能を,連立交渉に対処するエージェントとして評価し,その能力に関する洞察を提供し,今後の政治モデリングの発展への道を開く。 Coalition negotiations are a cornerstone of parliamentary democracies, characterised by complex interactions and strategic communications among political parties. Despite its significance, the modelling of these negotiations has remained unexplored with the domain of Natural Language Processing (NLP), mostly due to lack of proper data. In this paper, we introduce coalition negotiations as a novel NLP task, and model it as a negotiation between large language model-based agents. We introduce a multilingual dataset, POLCA, comprising manifestos of European political parties and coalition agreements over a number of elections in these countries. This dataset addresses the challenge of the current scope limitations in political negotiation modelling by providing a diverse, real-world basis for simulation. Additionally, we propose a hierarchical Markov decision process designed to simulate the process of coalition negotiation between political parties and predict the outcomes. We evaluate the performance of state-of-the-art large language models (LLMs) as agents in handling coalition negotiations, offering insights into their capabilities and paving the way for future advancements in political modelling.	翻訳日:2024-02-20 19:18:50 公開日:2024-02-18
# MORL-Prompt:離散プロンプト最適化のための多目的強化学習の実証分析 MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization ( http://arxiv.org/abs/2402.11711v1 ) ライセンス: Link先を確認	Yasaman Jafari, Dheeraj Mekala, Rose Yu, Taylor Berg-Kirkpatrick	(参考訳) RLに基づく手法は、ターゲット言語モデルに入力された場合に、ユーザーが指定した報酬関数の集合を最大化するプロンプトを探索するために用いることができる。しかし、多くのターゲットアプリケーションでは、自然な報酬関数は、例えばスタイル転送タスクにおけるコンテンツ保存対スタイルマッチングのように、互いに緊張関係にある。現在の技術は報酬関数の平均を最大化することに焦点を当てているが、これは必ずしも報酬間のバランスを達成するプロンプトにつながるわけではない。これは、多目的最適化やロバスト最適化の文献でよく研究されている問題である。本稿では,多目的最適化のための複数の手法をRLベースの離散プロンプト最適化に適用する。そのうち2つはパレート報酬面の体積を考慮し,もう1つは全ての報酬に同時に利益をもたらす更新方向を選択する。これらの手法について，2つのNLPタスク(スタイル転送と機械翻訳)で，それぞれ3つの競合する報酬関数を用いて実証分析を行う。実験により,体積を直接最適化する多目的手法は,単調な更新方向を見つけようとする手法よりも性能が高く，すべての報酬のバランスも良好であることを示す。 RL-based techniques can be used to search for prompts that when fed into a target language model maximize a set of user-specified reward functions. However, in many target applications, the natural reward functions are in tension with one another -- for example, content preservation vs. style matching in style transfer tasks. Current techniques focus on maximizing the average of reward functions, which does not necessarily lead to prompts that achieve balance across rewards -- an issue that has been well-studied in the multi-objective and robust optimization literature. In this paper, we adapt several techniques for multi-objective optimization to RL-based discrete prompt optimization -- two that consider volume of the Pareto reward surface, and another that chooses an update direction that benefits all rewards simultaneously. We conduct an empirical analysis of these methods on two NLP tasks: style transfer and machine translation, each using three competing reward functions. Our experiments demonstrate that multi-objective methods that directly optimize volume perform better and achieve a better balance of all rewards than those that attempt to find monotonic update directions.	翻訳日:2024-02-20 19:18:32 公開日:2024-02-18
# 完成へのバイアスについての一考察 A Note on Bias to Complete ( http://arxiv.org/abs/2402.11710v1 ) ライセンス: Link先を確認	Jia Xu and Mona Diab	(参考訳) 社会バイアスの最小化は社会的な結合を強化し、共有理解を促進し、意思決定を改善する。動的環境における新しいバイアスタイプ(例えば社会的地位)を発見してバイアスの定義を再考し、文化、地域、時間、個人的背景といった文脈に関連してそれらを記述する。本フレームワークは,各仮定に対するバイアスに関する8つの仮説と最小化バイアス戦略と,LLMで提案された解として提案される5つの方法を含む。フレームワークの実現はまだ完了していない。 Minimizing social bias strengthens societal bonds, promoting shared understanding and better decision-making. We revisit the definition of bias by discovering new bias types (e.g., societal status) in dynamic environments and describe them relative to context, such as culture, region, time, and personal background. Our framework includes eight hypotheses about bias and a minimizing bias strategy for each assumption as well as five methods as proposed solutions in LLM. The realization of the framework is yet to be completed.	翻訳日:2024-02-20 19:18:10 公開日:2024-02-18
# GNNavi: グラフニューラルネットワークによる大規模言語モデルの情報フローのナビゲート GNNavi: Navigating the Information Flow in Large Language Models by Graph Neural Network ( http://arxiv.org/abs/2402.11709v1 ) ライセンス: Link先を確認	Shuzhou Yuan, Ercong Nie, Michael Färber, Helmut Schmid, Hinrich Schütze	(参考訳) 大規模言語モデル(LLM)は、デモンストレーション付きのプロンプトが与えられると、強力なインコンテキスト学習(ICL)能力を示す。しかし、さらに適応性を高めるためには微調整が依然として不可欠である。プロンプトベースの微調整は、低データシナリオにおいて効果的な微調整方法であることが示されているが、計算リソースへの高い要求がその実用性を制限する。本稿では,プロンプトベースのパラメータ効率的微調整(PEFT)手法を導入することでこの問題に対処する。 GNNaviはICLの情報フローダイナミクスに関する知見を活用する。この知見は、ラベル語がプロンプト内で情報伝播のアンカーとして働くことを示している。 GNNaviはグラフニューラルネットワーク(GNN)レイヤを使用し、望ましい情報フローをGNNにハードワイヤすることで、プロンプト処理中の情報フローの集約と分配を正確に導く。 GPT-2とLlama2を用いたテキスト分類タスクの実験では、GNNaviはパラメータのわずか0.2%から0.5%を更新するだけで、数ショット設定において標準的なプロンプトベースの微調整手法を上回る。性能と効率の観点から、GNNaviをプレフィックスチューニング、LoRA、Adapterなどの一般的なPEFT手法と比較する。分析の結果,GNNaviは情報フローを強化し,明確な集約プロセスを保証することが明らかになった。 Large Language Models (LLMs) exhibit strong In-Context Learning (ICL) capabilities when prompts with demonstrations are applied to them. However, fine-tuning still remains crucial to further enhance their adaptability. Prompt-based fine-tuning proves to be an effective fine-tuning method in low-data scenarios, but high demands on computing resources limit its practicality. We address this issue by introducing a prompt-based parameter-efficient fine-tuning (PEFT) approach. GNNavi leverages insights into ICL's information flow dynamics, which indicates that label words act in prompts as anchors for information propagation. GNNavi employs a Graph Neural Network (GNN) layer to precisely guide the aggregation and distribution of information flow during the processing of prompts by hardwiring the desired information flow into the GNN. Our experiments on text classification tasks with GPT-2 and Llama2 show GNNavi surpasses standard prompt-based fine-tuning methods in few-shot settings by updating just 0.2% to 0.5% of parameters. We compare GNNavi with prevalent PEFT approaches, such as prefix tuning, LoRA and Adapter in terms of performance and efficiency. Our analysis reveals that GNNavi enhances information flow and ensures a clear aggregation process.	翻訳日:2024-02-20 19:18:04 公開日:2024-02-18
# ChatGPT以後の検索エンジン: 生成AIは検索の信頼性をいかに低下させうるか Search Engines Post-ChatGPT: How Generative Artificial Intelligence Could Make Search Less Reliable ( http://arxiv.org/abs/2402.11707v1 ) ライセンス: Link先を確認	Shahan Ali Memon, Jevin D. West	(参考訳) 本論評では,検索エンジンが生成人工知能(GenAI)によって作成されたコンテンツを生成,インデックス化,配信し始める中で,その進化する性質について論じる。我々の議論は、GenAI統合の初期段階における課題、特に事実の不整合とバイアスに関する課題を強調する。GenAIの出力が、透明性と出典を確認する能力を低下させる一方で、いかに根拠のない信頼感を帯びるかを論じる。さらに、検索エンジンはすでに誤りを多く含む生成コンテンツでクエリに回答しており、情報の出所をさらに曖昧にし、情報エコシステムの完全性に影響を与えている。これらすべての要因が検索エンジンの信頼性をいかに低下させうるかを論じる。最後に、活発な研究の方向性と未解決の問題をいくつか要約する。 In this commentary, we discuss the evolving nature of search engines, as they begin to generate, index, and distribute content created by generative artificial intelligence (GenAI). Our discussion highlights challenges in the early stages of GenAI integration, particularly around factual inconsistencies and biases. We discuss how output from GenAI carries an unwarranted sense of credibility, while decreasing transparency and sourcing ability. Furthermore, search engines are already answering queries with error-laden, generated content, further blurring the provenance of information and impacting the integrity of the information ecosystem. We argue how all these factors could reduce the reliability of search engines. Finally, we summarize some of the active research directions and open questions.	翻訳日:2024-02-20 19:17:43 公開日:2024-02-18
# 一般化ランゲヴィン方程式におけるメモリカーネルの学習 Learning Memory Kernels in Generalized Langevin Equations ( http://arxiv.org/abs/2402.11705v1 ) ライセンス: Link先を確認	Quanjun Lang, Jianfeng Lu	(参考訳) 一般化ランゲヴィン方程式におけるメモリカーネル学習のための新しい手法を提案する。このアプローチは最初、軌道データから相関関数を推定するために正則化Prony法を使用し、続いてRKHS正則化を伴うソボレフノルムに基づく損失関数上で回帰を行う。提案手法では,推定相関関数の誤差によってカーネル推定誤差が制御され,指数重み付き$L^2$空間内での性能向上が保証される。 $L^2$損失関数に依存する他の回帰推定器や、逆ラプラス変換に由来する推定器と比較し、様々な重みパラメータの選択において一貫した利点を示す数値例を用いて推定器の優位性を示す。さらに、方程式における力およびドリフト項の適用を含む例を示す。 We introduce a novel approach for learning memory kernels in Generalized Langevin Equations. This approach initially utilizes a regularized Prony method to estimate correlation functions from trajectory data, followed by regression over a Sobolev norm-based loss function with RKHS regularization. Our approach guarantees improved performance within an exponentially weighted $L^2$ space, with the kernel estimation error controlled by the error in estimated correlation functions. We demonstrate the superiority of our estimator compared to other regression estimators that rely on $L^2$ loss functions and also an estimator derived from the inverse Laplace transform, using numerical examples that highlight its consistent advantage across various weight parameter selections. Additionally, we provide examples that include the application of force and drift terms in the equation.	翻訳日:2024-02-20 19:17:28 公開日:2024-02-18
# バランスデータ, 不均衡スペクトル:スペクトル不均衡によるクラス格差の解明 Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance ( http://arxiv.org/abs/2402.11742v1 ) ライセンス: Link先を確認	Chiraag Kaushik, Ran Liu, Chi-Heng Lin, Amrit Khera, Matthew Y Jin, Wenrui Ma, Vidya Muthukumar, Eva L Dyer	(参考訳) 分類モデルは、異なるクラスで等しく機能することが期待されているが、実際には、しばしばその性能に大きなギャップがある。このクラスバイアスの問題はサンプル不均衡のデータセットで広く研究されているが、バランスのとれたデータセットでは見過ごされている。本研究では,特徴のスペクトル不均衡をクラス格差の潜在的源として導入し,理論と実践の両方におけるスペクトル不均衡とクラスバイアスの関係について検討する。スペクトル不均衡とクラスギャップの関連性を構築するため,高次元混合モデルにおけるクラス間誤差の正確な表現を導出する理論的枠組みを構築した。次に,11種類の事前学習済みエンコーダでこの現象を解析し,提案手法を用いてエンコーダの品質比較を行い,データ拡張戦略の評価と統合を行い,この問題を軽減した。私たちの研究は、学習のクラス依存の影響に光を当て、そのスペクトルを通じて診断できる未知のバイアスを持つ、最先端の事前学習機能に関する新たな洞察を与えています。 Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source for class disparities and study the connections between spectral imbalance and class bias in both theory and practice. To build the connection between spectral imbalance and class gap, we develop a theoretical framework for studying class disparities and derive exact expressions for the per-class error in a high-dimensional mixture model setting. We then study this phenomenon in 11 different state-of-the-art pretrained encoders and show how our proposed framework can be used to compare the quality of encoders, as well as evaluate and combine data augmentation strategies to mitigate the issue. Our work sheds light on the class-dependent effects of learning, and provides new insights into how state-of-the-art pretrained features may have unknown biases that can be diagnosed through their spectra.	翻訳日:2024-02-20 19:07:23 公開日:2024-02-18
# ニューラルネットにおける非線形性の抽出とkoopman演算子によるモデル圧縮 Extraction of nonlinearity in neural networks and model compression with Koopman operator ( http://arxiv.org/abs/2402.11740v1 ) ライセンス: Link先を確認	Naoki Sugishita, Kayo Kinjo, Jun Ohkubo	(参考訳) 非線形性はディープニューラルネットワークにおいて重要な役割を果たす。本稿では,まず,ニューラルネットワークの非線形性が不可欠である程度について検討する。この目的のために、koopman演算子、拡張動的モード分解、テンソル-トレイン形式を用いる。結果は、制限された非線形性は手書き数字の分類に十分であることを示している。そこで本研究では,資源制約環境下での大規模ネットワーク処理に有用なディープニューラルネットワークのモデル圧縮手法を提案する。提案手法は,クープマン演算子を利用して,ニューラルネットワークの内部処理における線形代数の利用を可能にする。提案手法は,手書き数認識タスクの高度圧縮モデル設定において,従来手法と同等かそれ以上の性能を示す。 Nonlinearity plays a crucial role in deep neural networks. In this paper, we first investigate the degree to which the nonlinearity of the neural network is essential. For this purpose, we employ the Koopman operator, extended dynamic mode decomposition, and the tensor-train format. The results imply that restricted nonlinearity is enough for the classification of handwritten numbers. Then, we propose a model compression method for deep neural networks, which could be beneficial to handling large networks in resource-constrained environments. Leveraging the Koopman operator, the proposed method enables us to use linear algebra in the internal processing of neural networks. We numerically show that the proposed method performs comparably or better than conventional methods in highly compressed model settings for the handwritten number recognition task.	翻訳日:2024-02-20 19:07:03 公開日:2024-02-18
# ニューラルネットワーク力学系モデルのための遷移系抽象化フレームワーク A Transition System Abstraction Framework for Neural Network Dynamical System Models ( http://arxiv.org/abs/2402.11739v1 ) ライセンス: Link先を確認	Yejiang Yang, Zihao Mo, Hoang-Dung Tran, and Weiming Xiang	(参考訳) 本稿では,人間の行動学習や検証といった複雑な力学系への応用により,モデル解釈性を高めるために,ニューラルネットワーク力学系モデルのためのトランジッションシステム抽象化フレームワークを提案する。まず、ローカライズされた作業ゾーンは、データ駆動の最大エントロピー(ME)パーティショニング法の下で、複数のローカライズされたパーティショニングに分割される。次に、ニューラルネットワークのセット値到達可能性解析に基づいて遷移行列を求める。最後に、人間の手書きのダイナミクス学習および検証への応用により、提案する抽象化フレームワークを検証し、ブラックボックスモデルの解釈性を向上させる利点を実証する。つまり、提案フレームワークは、データ駆動ニューラルネットワークモデルをトランジッションシステムに抽象化することができ、計算木論理(ctl)言語で記述された仕様の検証を通じてニューラルネットワークモデルを解釈可能とする。 This paper proposes a transition system abstraction framework for neural network dynamical system models to enhance the model interpretability, with applications to complex dynamical systems such as human behavior learning and verification. To begin with, the localized working zone will be segmented into multiple localized partitions under the data-driven Maximum Entropy (ME) partitioning method. Then, the transition matrix will be obtained based on the set-valued reachability analysis of neural networks. Finally, applications to human handwriting dynamics learning and verification are given to validate our proposed abstraction framework, which demonstrates the advantages of enhancing the interpretability of the black-box model, i.e., our proposed framework is able to abstract a data-driven neural network model into a transition system, making the neural network model interpretable through verifying specifications described in Computational Tree Logic (CTL) languages.	翻訳日:2024-02-20 19:06:51 公開日:2024-02-18
# 射影ゲージ-ヒッグスモデルにおけるバルクおよび境界絡み合い遷移 Bulk and boundary entanglement transitions in the projective gauge-Higgs model ( http://arxiv.org/abs/2402.11738v1 ) ライセンス: Link先を確認	Hiroki Sukeno, Kazuki Ikeda, Tzu-Chieh Wei	(参考訳) 量子多体スピン系では、多量子ビットパウリ測定のエンタングル効果と単一量子ビットパウリ測定のディスエンタングル効果との相互作用が、2つの競合する効果をもたらしうる。このような基底を用いたランダムな測定パターンを導入し、その比率を変えることで相転移を誘起できる。本研究では,非閉じ込め相,閉じ込め相,ヒッグス相を含む$(2+1)$d $\mathbb{Z}_2$ Fradkin-Shenkerハミルトニアン模型に対応する測定ベースの模型を数値的に調べる。エンタングルメント尺度を用いて, 測定のみの模型における相図を決定する。バルクのトポロジカル秩序に対しては、トポロジカルエンタングルメントエントロピーを用いる。また, 分離された境界領域間の相互情報量を用いて, ヒッグス相またはバルクSPT相に関連する境界相転移を診断する。我々の相図と、開いた粗い境界を持つFradkin-Shenker模型の標準的な量子ハミルトニアン定式化における相図との間に、構造的類似性を観察した。まず、非ゼロで一定のトポロジカルエンタングルメントエントロピーにより非閉じ込め相が検出される。第二に、ヒッグス=SPT相を残りの相から分離する(境界)相転移曲線が見つかる。ある極限では、トポロジカル相転移は、バルクの3次元時空格子における巨大ホモロジーサイクル形成の臨界点、および境界の2次元時空格子がバルクから実効的に分離されたときのボンドパーコレーション閾値に位置する。さらに、相図の特定の領域には、測定ベースの手続きの終了方法に起因する、類似した混合相的性質が存在する。我々の結果は、近い将来に量子デバイス上でヒッグス=SPT相の物理を研究するための代替経路を開拓する。 In quantum many-body spin systems, the interplay between the entangling effect of multi-qubit Pauli measurements and the disentangling effect of single-qubit Pauli measurements may give rise to two competing effects. By introducing a randomized measurement pattern with such bases, a phase transition can be induced by altering the ratio between them. In this work, we numerically investigate a measurement-based model associated with the $(2+1)$d $\mathbb{Z}_2$ Fradkin-Shenker Hamiltonian model, encompassing the deconfining, confining, and Higgs phases. We determine the phase diagram in our measurement-only model by employing entanglement measures. For the bulk topological order, we use the topological entanglement entropy. We also use the mutual information between separated boundary regions to diagnose the boundary phase transition associated with the Higgs or the bulk SPT phase. We observe the structural similarity between our phase diagram and the one in the standard quantum Hamiltonian formulation of the Fradkin-Shenker model with the open rough boundary. First, a deconfining phase is detected by nonzero and constant topological entanglement entropy. Second, we find a (boundary) phase transition curve separating the Higgs=SPT phase from the rest. In certain limits, the topological phase transitions reside at the critical point of the formation of giant homological cycles in the bulk 3d spacetime lattice, as well as the bond percolation threshold of the boundary 2d spacetime lattice when it is effectively decoupled from the bulk. Additionally, there are analogous mixed-phase properties at a certain region of the phase diagram, emerging from how we terminate the measurement-based procedure. Our findings pave an alternative pathway to study the physics of Higgs=SPT phases on quantum devices in the near future.	翻訳日:2024-02-20 19:06:34 公開日:2024-02-18
# モデル等価性評価に基づくフィードフォワードニューラルネットワークの圧縮修復 Compression Repair for Feedforward Neural Networks Based on Model Equivalence Evaluation ( http://arxiv.org/abs/2402.11737v1 ) ライセンス: Link先を確認	Zihao Mo, Yejiang Yang, Shuaizheng Lu, and Weiming Xiang	(参考訳) 本稿では,2つのニューラルネットワークの等価性評価に基づいて,圧縮フィードフォワードニューラルネットワーク(FNN)の修復手法を提案する。修復フレームワークにおいて、2つのニューラルネットワーク間の出力差を計算するために、新しいニューラルネットワーク等価性評価法を開発した。出力不一致は、圧縮手順によって生じる出力差を定量的に特徴付けることができる。この計算出力不一致に基づいて、まず、圧縮ネットワークのための新しいトレーニングセットを初期化し、2つのニューラルネットワーク間の不一致を狭め、圧縮ネットワークの性能を向上させる。そして, トレーニングセットに基づいて再訓練を行い, 圧縮FNNを修復する。提案手法の有効性と利点を示すため,本手法をMNISTデータセットに適用した。 In this paper, we propose a method of repairing compressed Feedforward Neural Networks (FNNs) based on equivalence evaluation of two neural networks. In the repairing framework, a novel neural network equivalence evaluation method is developed to compute the output discrepancy between two neural networks. The output discrepancy can quantitatively characterize the output difference produced by compression procedures. Based on the computed output discrepancy, the repairing method first initializes a new training set for the compressed networks to narrow down the discrepancy between the two neural networks and improve the performance of the compressed network. Then, we repair the compressed FNN by re-training based on the training set. We apply our developed method to the MNIST dataset to demonstrate the effectiveness and advantages of our proposed repair method.	翻訳日:2024-02-20 19:05:56 公開日:2024-02-18
# カーネルベースのGibbs測度を持つMonte Carlo:確率的ハーディングの保証 Monte Carlo with kernel-based Gibbs measures: Guarantees for probabilistic herding ( http://arxiv.org/abs/2402.11736v1 ) ライセンス: Link先を確認	Martin Rouault, Rémi Bardenet, Mylène Maïda	(参考訳) カーネルハーディングは、再生核ヒルベルト空間(RKHS)上の最悪ケース積分誤差を最小化しようとする決定論的求積法の族に属する。強い実験的裏付けがあるにもかかわらず、少なくともRKHSが無限次元である通常の場合において、この最悪ケース誤差が求積ノード数の標準的な平方根よりも速い速度で減少することを証明するのは困難であることがわかっている。本理論論文では,その台がカーネルハーディングと同じ最悪ケース誤差を最小化する傾向にある,求積ノード上の同時確率分布について検討する。この分布が、最悪ケース積分誤差に対してより鋭い集中不等式を持つという意味で、i.i.d.モンテカルロよりも優れていることを証明する。収束速度の改善にはまだ至っていないが、これは、ギブス測度の研究で用いられる数学的ツールが、カーネルハーディングとその変種が計算量的に安価な手法をどの程度改善するかを理解するのに役立ちうることを示している。さらに, 最悪ケースではないものの、より速い収束速度が得られる可能性が高いことを示す初期の実験的証拠を提供する。 Kernel herding belongs to a family of deterministic quadratures that seek to minimize the worst-case integration error over a reproducing kernel Hilbert space (RKHS). In spite of strong experimental support, it has proven difficult to prove that this worst-case error decreases at a faster rate than the standard square root of the number of quadrature nodes, at least in the usual case where the RKHS is infinite-dimensional. In this theoretical paper, we study a joint probability distribution over quadrature nodes, whose support tends to minimize the same worst-case error as kernel herding. We prove that it does outperform i.i.d. Monte Carlo, in the sense of coming with a tighter concentration inequality on the worst-case integration error. While not improving the rate yet, this demonstrates that the mathematical tools of the study of Gibbs measures can help understand to what extent kernel herding and its variants improve on computationally cheaper methods. Moreover, we provide early experimental evidence that a faster rate of convergence, though not worst-case, is likely.	翻訳日:2024-02-20 19:05:36 公開日:2024-02-18


# 実際の産業チームプロジェクト開発を伴う国際的・学際的な教育経験

An International and Multidisciplinary Teaching Experience with Real Industrial Team Project Development ( http://arxiv.org/abs/2403.15398v1 )

ライセンス: Link先を確認

Martin Mellado, Eduardo Vendrell, Filomena Ferrucci, Andrea Abate, Detlef Zuhlke, Bernard Riera,

(参考訳) 本稿では,学生のカリキュラム改善を目的としたErasmus Intensive Programme (略してIP) の文脈において,欧州委員会が資金提供した国際協力プロジェクトの設計,目標,経験,成果について述べる。 IPとは、少なくとも3カ国の大学の学生とスタッフを集め、他では教えられないかもしれない専門分野の効率的かつ多国籍的な教育を奨励する短期学習プログラム(最短2週間)である。このプロジェクトは6年間続き、それぞれ3年間の2つの異なるエディションにわたって実施された。 最初のエディションであるSAVRO (Simulation and Virtual Reality in Robotics for Industrial Assembly Processes) は2008年から2010年にかけて開催され、IPコーディネーターを務めたバレンシア工科大学 (Universitat Politecnica de Valencia, スペイン)、カイザースラウテルン工科大学 (Technische Universitat Kaiserslautern, ドイツ)、サレルノ大学 (Universita degli Studi di Salerno, イタリア) の3大学が参加した。ランス・シャンパーニュ＝アルデンヌ大学 (Universite de Reims Champagne-Ardenne, フランス) は、HUMAIN (Human-Machine Interaction) と改名されたIPの次のエディション(2011-2013年)に新たなパートナーとして参加した。教育プロジェクトの両エディションは同じ目的と組織的側面を特徴とし、産業パートナーも関与する国際機関間の協働作業を通じて、アクティブな教育に基づく教育的取り組みを提供することを目的としていた。本稿の目的は,我々の経験の運営を特徴づけたベストプラクティスを示すとともに,コンピューティング分野の学術カリキュラムの設計方法に関する一般的な勧告や提案を提示することである。

This paper presents the design, objectives, experiences, and results of an international cooperation project funded by the European Commission in the context of the Erasmus Intensive Programme (IP, for short) designed to improve students' curricula. An IP is a short programme of study (minimum 2 weeks) that brings together university students and staff from at least three countries in order to encourage efficient and multinational teaching of specialist topics, which might otherwise not be taught at all. This project lasted for 6 years, covering two different editions, each one with three year duration. The first edition, named SAVRO (Simulation and Virtual Reality in Robotics for Industrial Assembly Processes) was held in the period 2008-2010, with the participation of three Universities, namely the Universitat Politecnica de Valencia (Spain), acting as IP coordinator, the Technische Universitat Kaiserslautern (Germany), and the Universita degli Studi di Salerno (Italy). The Universite de Reims Champagne-Ardenne (France) participated as a new partner in the subsequent edition (2011-2013) of the IP, renamed as HUMAIN (Human-Machine Interaction). Both editions of the teaching project were characterized by the same objectives and organizational aspects, aiming to provide educational initiatives based on active teaching through collaborative works between international institutions, involving industrial partners too. The aim of the paper is to illustrate the best practices that characterized the organization of our experience as well as to present some general recommendations and suggestions on how to devise computing academic curricula.

翻訳日:2024-04-01 03:13:49 公開日:2024-02-18

# 線形代数におけるChatGPT：前進と残された課題

ChatGPT in Linear Algebra: Strides Forward, Steps to Go ( http://arxiv.org/abs/2403.15399v1 )

ライセンス: Link先を確認

Eli Bagno, Thierry Dana-Picard, Shulamit Reches,

(参考訳) 新たな技術が出現するとすぐに、教育コミュニティは、そのアフォーダンスと、それを教育に適用する可能性を探る。本稿では,基礎的な線形代数のトピックに関するChatGPTとのセッションを分析する。我々は,この1年間にChatGPTが我々の関心分野でたどった過程を振り返り,線形代数の問題への取り組みにおける大幅な改善を強調する。特に、このソフトウェアがティーチングアシスタントになりうるか、あるいは何らかの形で人間の教師に取って代わりうるかという問題を扱う。この論文の執筆時点では、答えは概して否定的である。答えが肯定的になりうるわずかな部分については、独自の道具的生成(instrumental genesis)に関する考察を示す。ソフトウェアとのコミュニケーションは人間と話しているかのような印象を与え、ソフトウェアが質問を理解しているのかどうかが問われることもある。したがって、ChatGPTは統計的な基盤の上で動作しており、熟考や理解に基づいて動作しているわけではないという事実に、読者の注意を促す。

As soon as a new technology emerges, the education community explores its affordances and the possibilities to apply it in education. In this paper, we analyze sessions with ChatGPT around topics in basic Linear Algebra. We reflect the process undertaken by the ChatGPT along the recent year in our area of interest, emphasising the vast improvement that has been done in grappling with Linear Algebra problems. In particular, the question whether this software can be a teaching assistant or even somehow replace the human teacher, is addressed. As of the time this paper is written, the answer is generally negative. For the small part where the answer can be positive, some reflections about an original instrumental genesis are given. Communication with the software gives the impression to talk to a human, and sometimes the question is whether the software understands the question or not. Therefore, the reader's attention is drawn to the fact that ChatGPT works on a statistical basis and not according to reflection and understanding.

翻訳日:2024-04-01 03:13:49 公開日:2024-02-18

# 即時決選投票 (IRV) 選挙を監査するための効率的な重み付け方式

Efficient Weighting Schemes for Auditing Instant-Runoff Voting Elections ( http://arxiv.org/abs/2403.15400v1 )

ライセンス: Link先を確認

Alexander Ek, Philip B. Stark, Peter J. Stuckey, Damjan Vukcevic,

(参考訳) 即時決選投票 (IRV) 選挙のために、さまざまなリスク制限監査 (RLA) 手法が開発されている。最近の手法であるAWAIREは、投票記録(CVR: cast vote records)を必要としない最初の効率的なアプローチである。 AWAIREは検定統計量の適応的な重み付き平均を用い、本質的には、検定すべき有効な仮説の集合を「学習」する。しかし、AWAIREの最初の論文では、少数の重み付け方式とパラメータ設定しか検討されていなかった。我々は,方式と設定を幅広く探索し,実用のための効率的な選択を特定して推奨する。我々は、CVRが利用できない(最も困難な)ケースのみに焦点を当て、実際の選挙データに基づくシミュレーションを用いて性能を評価する。比較全体を通じて、最も効果的な方式は、多くの場合、既に観測されたデータに基づいて見かけ上「最良」の仮説に重みの大部分または全てを配分するものである。一方、最適なチューニングパラメータは選挙の得票差(マージン)によって異なる傾向にあった。それでも、さまざまな得票差における各選択の性能トレードオフを定量化しており、デフォルトの選択肢が必要な場合に最も望ましいトレードオフを選ぶ助けとなる。現在のAWAIRE実装の制限は、少数の候補者(これまでに実証されたのは最大6人)しか扱えないことである。より計算効率の高い実装への道の1つは、遅延評価を用いて、可能なすべての仮説を考慮することを避けることである。我々の結果は、そのようなアプローチが統計的性能を大きく損なうことなく実現できることを示唆している。

Various risk-limiting audit (RLA) methods have been developed for instant-runoff voting (IRV) elections. A recent method, AWAIRE, is the first efficient approach that does not require cast vote records (CVRs). AWAIRE involves adaptively weighted averages of test statistics, essentially "learning" an effective set of hypotheses to test. However, the initial paper on AWAIRE only examined a few weighting schemes and parameter settings. We provide an extensive exploration of schemes and settings, to identify and recommend efficient choices for practical use. We focus only on the (hardest) case where CVRs are not available, using simulations based on real election data to assess performance. Across our comparisons, the most effective schemes are often those that place most or all of the weight on the apparent "best" hypotheses based on already seen data. Conversely, the optimal tuning parameters tended to vary based on the election margin. Nonetheless, we quantify the performance trade-offs for different choices across varying election margins, aiding in selecting the most desirable trade-off if a default option is needed. A limitation of the current AWAIRE implementation is its restriction to handling a small number of candidates (previously demonstrated up to six candidates). One path to a more computationally efficient implementation would be to use lazy evaluation and avoid considering all possible hypotheses. Our findings suggest that such an approach could be done without substantially compromising statistical performance.

翻訳日:2024-04-01 03:13:49 公開日:2024-02-18
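
The abstract above describes "adaptively weighted averages of test statistics" whose weights are learned from data already seen. The following is a minimal Python sketch of that general idea only; it is not the AWAIRE implementation. It assumes each hypothesis has a test statistic that is updated multiplicatively per observation, and that weights at each step depend only on earlier observations. The function names (`linear_weights`, `largest_weights`, `combined_statistic`) and the toy numbers are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence

WeightFn = Callable[[Sequence[float]], List[float]]


def linear_weights(base: Sequence[float]) -> List[float]:
    """Weight each hypothesis in proportion to its statistic so far."""
    return list(base)


def largest_weights(base: Sequence[float]) -> List[float]:
    """Put all weight on the hypothesis that currently looks best."""
    best = max(range(len(base)), key=lambda k: base[k])
    return [1.0 if k == best else 0.0 for k in range(len(base))]


def combined_statistic(increments: Sequence[Sequence[float]],
                       weight_fn: WeightFn) -> float:
    """Combine per-hypothesis multiplicative updates into one statistic.

    increments[t][k] is the (assumed non-negative) factor by which the
    statistic of hypothesis k is multiplied at step t.
    """
    n_hyp = len(increments[0])
    base = [1.0] * n_hyp          # per-hypothesis statistics so far
    combined = 1.0
    for step in increments:
        weights = weight_fn(base)  # computed from past data only
        total = sum(weights)
        if total == 0.0:           # fall back to equal weights
            weights, total = [1.0] * n_hyp, float(n_hyp)
        combined *= sum(w * e for w, e in zip(weights, step)) / total
        base = [b * e for b, e in zip(base, step)]
    return combined


if __name__ == "__main__":
    # Toy data: hypothesis 0 gains evidence each step, hypothesis 1 loses it.
    toy = [[1.2, 0.9]] * 3
    print(combined_statistic(toy, linear_weights))
    print(combined_statistic(toy, largest_weights))
```

On this toy input the scheme that puts all weight on the current best hypothesis grows faster (about 1.73 versus about 1.23), which is consistent in spirit with the abstract's observation, although real audits depend on the ballot data and the election margin.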

# virtCCA:TrustZoneでArm Confidential Compute Architectureを仮想化

virtCCA: Virtualized Arm Confidential Compute Architecture with TrustZone ( http://arxiv.org/abs/2306.11011v2 )

ライセンス: Link先を確認

Xiangyi Xu, Wenhao Wang, Yongzheng Wu, Chenyu Wang, Huifeng Zhu, Haocheng Ma, Zhennan Min, Zixuan Pang, Rui Hou, Yier Jin,

(参考訳) ARMは近日中に予定されているARMv9-Aアーキテクチャの一部として、Confidential Compute Architecture (CCA)を導入した。 CCAは、Realm Worldと呼ばれる別の世界における機密仮想マシン(cVM)のサポートを可能にし、信頼できない通常の世界から保護を提供する。 CCAは機密コンピューティングの有望な未来を提供するが、ARMのロードマップによると、CCAハードウェアの広範な利用は近い将来は期待されない。このギャップに対処するために、既存のARMプラットフォームで利用可能な成熟したハードウェア機能であるTrustZoneを使用して仮想化CCAを容易にするアーキテクチャであるvirtCCAを提案する。特に、virtCCAはARMv8.4以降のSecure EL2 (S-EL2)拡張とS-EL2をサポートしていない初期のプラットフォームで実装できる。 virtCCAはAPIレベルでのCCA仕様と完全に互換性がある。我々はCCAソフトウェアとファームウェアスタック全体をvirtCCA上に開発し、通常の世界のKVMがcVMをサポートするように拡張され、TrustZone Management Monitor(TMM)はcVM間の分離を強制し、cVMライフサイクル管理を提供する。我々は,S-EL2サポートの有無にかかわらず,実際のARMサーバにvirtCCAを実装した。マイクロベンチマークとマクロベンチマークを用いて評価した結果,通常のVMと比較して,cVMの実行のオーバーヘッドは許容できることがわかった。具体的には、現実世界のワークロードセットでは、I/O集約ワークロードでは、virtCCA-SEL2のオーバーヘッドが29.5%未満であるのに対して、virtCCA-EL3は、ほとんどの場合、ベースラインを上回っている。

ARM recently introduced the Confidential Compute Architecture (CCA) as part of the upcoming ARMv9-A architecture. CCA enables the support of confidential virtual machines (cVMs) within a separate world called the Realm world, providing protection from the untrusted normal world. While CCA offers a promising future for confidential computing, the widespread availability of CCA hardware is not expected in the near future, according to ARM's roadmap. To address this gap, we present virtCCA, an architecture that facilitates virtualized CCA using TrustZone, a mature hardware feature available on existing ARM platforms. Notably, virtCCA can be implemented on platforms equipped with the Secure EL2 (S-EL2) extension available from ARMv8.4 onwards, as well as on earlier platforms that lack S-EL2 support. virtCCA is fully compatible with the CCA specifications at the API level. We have developed the entire CCA software and firmware stack on top of virtCCA, including the enhancements to the normal world's KVM to support cVMs, and the TrustZone Management Monitor (TMM) that enforces isolation among cVMs and provides cVM life-cycle management. We have implemented virtCCA on real ARM servers, with and without S-EL2 support. Our evaluation, conducted on micro-benchmarks and macro-benchmarks, demonstrates that the overhead of running cVMs is acceptable compared to running normal-world VMs. Specifically, in a set of real-world workloads, the overhead of virtCCA-SEL2 is less than 29.5% for I/O intensive workloads, while virtCCA-EL3 outperforms the baseline in most cases.

翻訳日:2024-03-25 23:38:51 公開日:2024-02-18

# VoltSchemer:ワイヤレス充電器を操作するために電圧ノイズを使う

VoltSchemer: Use Voltage Noise to Manipulate Your Wireless Charger ( http://arxiv.org/abs/2402.11423v1 )

ライセンス: Link先を確認

Zihao Zhan, Yirui Yang, Haoqi Shan, Hanqiu Wang, Yier Jin, Shuo Wang,

(参考訳) ワイヤレス充電は、従来の有線充電よりも便利で安全な充電体験を提供することから、携帯型電子機器の充電ソリューションとしてますます普及している。しかし、我々の研究はワイヤレス充電システムの新たな脆弱性を特定し、これらのシステムが意図的な電磁干渉の影響を受けやすいことを明らかにした。これらの脆弱性は一連の新しい攻撃ベクトルを可能にし、攻撃者が充電器を操作して一連の攻撃を行うことを可能にする。本稿では,電源からの電圧を変調するだけで,攻撃者が市販(COTS)のワイヤレス充電器を制御できる革新的な攻撃群であるVoltSchemerを提案する。これらの攻撃は、充電器自体に悪意のある改変を加えることなく、電源からの電圧ノイズを利用してワイヤレス充電器を操作する初めてのものである。 VoltSchemerがもたらす重大な脅威は、3つの実用的な攻撃によって裏付けられる。すなわち、充電器を操作することで、不可聴な音声コマンドによって音声アシスタントを制御し、過充電または過熱によって充電中のデバイスを損傷させ、Qi規格で規定された異物検出機構を回避して、強い磁場にさらされた貴重品を損傷させることができる。売れ筋上位のCOTSワイヤレス充電器9機種に対する攻撃の成功により、VoltSchemer攻撃の有効性と実用性を示す。さらに,本研究結果のセキュリティ上の含意について考察し,潜在的な脅威を軽減するための対策を提案する。

Wireless charging is becoming an increasingly popular charging solution in portable electronic products for a more convenient and safer charging experience than conventional wired charging. However, our research identified new vulnerabilities in wireless charging systems, making them susceptible to intentional electromagnetic interference. These vulnerabilities facilitate a set of novel attack vectors, enabling adversaries to manipulate the charger and perform a series of attacks. In this paper, we propose VoltSchemer, a set of innovative attacks that grant attackers control over commercial-off-the-shelf wireless chargers merely by modulating the voltage from the power supply. These attacks represent the first of its kind, exploiting voltage noises from the power supply to manipulate wireless chargers without necessitating any malicious modifications to the chargers themselves. The significant threats imposed by VoltSchemer are substantiated by three practical attacks, where a charger can be manipulated to: control voice assistants via inaudible voice commands, damage devices being charged through overcharging or overheating, and bypass Qi-standard specified foreign-object-detection mechanism to damage valuable items exposed to intense magnetic fields. We demonstrate the effectiveness and practicality of the VoltSchemer attacks with successful attacks on 9 top-selling COTS wireless chargers. Furthermore, we discuss the security implications of our findings and suggest possible countermeasures to mitigate potential threats.

翻訳日:2024-03-25 09:06:20 公開日:2024-02-18

# NestedSGX: 機密VM内のエンクレーブへの信頼のブートストラップ

NestedSGX: Bootstrapping Trust to Enclaves within Confidential VMs ( http://arxiv.org/abs/2402.11438v1 )

ライセンス: Link先を確認

Wenhao Wang, Linke Song, Benshan Mei, Shuang Liu, Shijun Zhao, Shoumeng Yan, XiaoFeng Wang, Dan Meng, Rui Hou,

(参考訳) 完全性(integrity)は、正規のソフトウェアのみがマシンにロードされることを保証するため、システムセキュリティの維持に不可欠である。機密仮想マシン(CVM)はホストから分離された環境内で動作するが、信頼された実行環境(TEE)内で実行されるコードの完全性を管理する上で、ユーザが依然として課題に直面していることを認識することが重要である。高機能なオペレーティングシステム(OS)が存在すると、任意のコードを動的に生成して実行できる可能性が生じ、ゲストOSが侵害された場合、TEE内のユーザアプリケーションは干渉や改ざんに対して脆弱になる。本稿では、AMD SEV-SNPで利用可能な最近のハードウェア機能である仮想マシン特権レベル(VMPL)を活用し、ゲストVM内でのハードウェアエンクレーブの作成を可能にするNestedSGXを紹介する。 Intel SGXと同様、NestedSGXはゲストOSを、悪意のある可能性のあるコードをロードしうる信頼できないものとみなす。 NestedSGXは、エンクレーブ内で実行される、信頼され測定されたコードのみがリモートアテステーションの対象となることを保証する。既存のアプリケーションをシームレスに保護するために、NestedSGXはSGXリーフ関数をシミュレートすることで、Intel SGXとの互換性を目指している。また、SGX SDKをNestedSGXに移植し、既存のSGXツールチェーンとアプリケーションをシステム内で利用できるようにした。性能評価によると、NestedSGXのコンテキストスイッチには約35,000～37,000サイクルを要し、これはIntel SGXの約2～3倍である。 NestedSGXが現実世界のアプリケーションの大部分にもたらすオーバーヘッドは最小限であり、ほとんどのワークロードで平均5%未満、I/O集約型ワークロードで22.7%である。

Integrity is critical for maintaining system security, as it ensures that only genuine software is loaded onto a machine. Although confidential virtual machines (CVMs) function within isolated environments separate from the host, it is important to recognize that users still encounter challenges in maintaining control over the integrity of the code running within the trusted execution environments (TEEs). The presence of a sophisticated operating system (OS) raises the possibility of dynamically creating and executing any code, making user applications within TEEs vulnerable to interference or tampering if the guest OS is compromised. This paper introduces NestedSGX, which leverages virtual machine privilege level (VMPL), a recent hardware feature available on AMD SEV-SNP to enable the creation of hardware enclaves within the guest VM. Similar to Intel SGX, NestedSGX considers the guest OS untrusted for loading potentially malicious code. It ensures that only trusted and measured code executed within the enclave can be remotely attested. To seamlessly protect existing applications, NestedSGX aims for compatibility with Intel SGX by simulating SGX leaf functions. We have also ported the SGX SDK to NestedSGX, enabling the use of existing SGX toolchains and applications in the system. Performance evaluations show that context switches in NestedSGX take about 35,000-37,000 cycles, approximately 2-3 times that of Intel SGX. NestedSGX incurs minimal overhead in most real-world applications, with an average overhead below 5% for most workloads and 22.7% for I/O intensive workloads.

翻訳日:2024-03-25 09:06:20 公開日:2024-02-18

# 分散時空間データにおけるプライバシ損失の測定

Measuring Privacy Loss in Distributed Spatio-Temporal Data ( http://arxiv.org/abs/2402.11526v1 )

ライセンス: Link先を確認

Tatsuki Koga, Casey Meehan, Kamalika Chaudhuri,

(参考訳) 複数の地理的な場所から分散的に収集された交通の流れや人々の移動に関する統計は、交通予測、需要予測、レストラン占領報告など、多くのアプリケーションを動かす原動力である。しかし、これらの統計は、しばしば人々のセンシティブな位置情報に基づいており、したがって、そのデータを公開している間にプライバシーを保持する必要がある。差分プライバシーは、厳格で最悪の人格レベルのプライバシーを保証します。本研究は,分散位置情報アプリケーションにおける差分プライバシーの非直感的特徴を動機として,情報提供者による位置復元攻撃に対する代替的プライバシー損失を提案する。実データと合成データを用いた実験により、分散時空間設定における個人のプライバシー侵害に対する直感を、プライバシーの損失がより良く反映していることが示される。

Statistics about traffic flow and people's movement gathered from multiple geographical locations in a distributed manner are the driving force powering many applications, such as traffic prediction, demand prediction, and restaurant occupancy reports. However, these statistics are often based on sensitive location data of people, and hence privacy has to be preserved while releasing them. The standard way to do this is via differential privacy, which guarantees a form of rigorous, worst-case, person-level privacy. In this work, motivated by several counter-intuitive features of differential privacy in distributed location applications, we propose an alternative privacy loss against location reconstruction attacks by an informed adversary. Our experiments on real and synthetic data demonstrate that our privacy loss better reflects our intuitions on individual privacy violation in the distributed spatio-temporal setting.
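As a point of reference for the differential-privacy baseline mentioned above, the following is a minimal sketch of the Laplace mechanism applied to per-location counts. It is not the alternative privacy loss proposed in the paper. The function names, the example counts, and the default `sensitivity=1.0` (each person changes one count by at most one) are assumptions made for illustration; in a distributed spatio-temporal setting one person may affect many counts, so the sensitivity would need to be set accordingly.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Return the counts with Laplace noise of scale sensitivity/epsilon added.

    `sensitivity` is the assumed maximum L1 change one person can cause
    across all released counts (hypothetical default of 1.0).
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]
```

For example, `private_counts([120, 45, 300], epsilon=0.5)` would release three noisy counts; a smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of accuracy.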

翻訳日:2024-03-25 08:56:22 公開日:2024-02-18

# エネルギーセクターレジリエンスの強化:設計原則によるセキュリティの統合

Enhancing Energy Sector Resilience: Integrating Security by Design Principles ( http://arxiv.org/abs/2402.11543v1 )

ライセンス: Link先を確認

Dov Shirtz, Inna Koberman, Aviad Elyashar, Rami Puzis, Yuval Elovici,

(参考訳) 設計によるセキュリティ、Sbdは、可能な限り、セキュリティ上の脆弱性がなく、セキュリティ攻撃に不注意なシステムの開発とメンテナンスのための概念である。堅牢な産業制御システムを開発する方法、ソフトウェア、通信製品など、技術的な側面に加えて、SbDには組織管理の態度や行動、従業員の意識といったソフトな側面も含まれている。 Sbdのコンセプトの下では、ICS(ICS)はユーザにとってより信頼に値するものとみなされるでしょう。システムに対するユーザの信頼は、SbDプロセスとポリシーの厳密な遵守から導き出されます。 SbDの概念に従って、セキュリティが検討されている。セキュリティ対策は、その後ではなく、製品やシステム開発ライフサイクルの各段階で実施されます。本報告では,産業用制御システムにおけるSbDの実装に関するセキュリティ要件について述べる。提示された情報は、既存のセキュリティやサイバーセキュリティの基準を無効にするものではありません。その代わり、私たちは組織がそれらの標準とベストプラクティスを実装し、遵守することを強く推奨します。設計によるセキュリティは、一度限りのプロセスではありません。システム設計のプロダクトの始まりから始まり、ライフサイクル全体を通して継続します。 SbDの利点、より高いレベルのセキュリティ、サイバー攻撃に対する堅牢性により、エネルギーセクターに関連するすべての組織は、エコシステムを確立する努力をすべきである。この文書に記載されている要件は、組織によって負担のかかるものとみなすことができる。しかしながら、この文書に記載されているように、要求と既存のセキュリティ標準とベストプラクティスへの厳格なコンプライアンスは、SbDが推進し保護するエコシステムを実現する上で不可欠である。

Security by design (SbD) is a concept for developing and maintaining systems that are, to the greatest extent possible, free from security vulnerabilities and impervious to security attacks. In addition to technical aspects, such as how to develop robust industrial control system hardware, software, and communication products, SbD also includes soft aspects, such as organizational managerial attitude and behavior, and employee awareness. Under the SbD concept, systems (industrial control systems, ICS, in our context) will be considered more trustworthy by users. Users' trust in the systems will derive from meticulous adherence to the SbD processes and policies. In accordance with the SbD concept, security is considered, and security measures are implemented, at every stage of the product and system development life cycle, rather than afterwards. This document presents the security requirements for the implementation of SbD in industrial control systems. The information presented does not negate any existing security and cybersecurity standards. Instead, we strongly recommend that organizations implement and comply with those standards and best practices. Security by design is not a one-time process. It starts at the very beginning of the product or system design and continues throughout its lifecycle. Given the benefits of SbD, such as a higher level of security and robustness to cyber attacks, all organizations associated with the energy sector should strive to establish an SbD-driven ecosystem. The requirements presented in this document may be perceived as burdensome by organizations. However, strict compliance with the requirements and with existing security standards and best practices, including continuous monitoring, as specified in this document, is essential to realize an ecosystem driven and protected by SbD.

翻訳日:2024-03-25 08:56:22 公開日:2024-02-18

# 二元体上の効率的な正規基底について

On efficient normal bases over binary fields ( http://arxiv.org/abs/2402.11544v1 )

ライセンス: Link先を確認

Mohamadou Sall, M. Anwar Hasan,

(参考訳) バイナリフィールド拡張は、多変量公開鍵暗号、コードベースの暗号、エラー訂正コードなど、多くのアプリケーションに基本的なものである。それらの実装は数論と代数幾何学の基礎を必要とし、効率的な基底の利用を必要とする。計算能力の継続的な増加と新しい(量子)コンピュータの設計により、システムのセキュリティに対する脅威が増大し、膨大な多項式や拡張度の暗号化標準が要求されるようになる。暗号的な目的や有限場演算の一般的な実装のためには、多様な基礎を持つ幅広い実装を検討することが不可欠である。いくつかの基底とは異なり、多項式とガウス正規基底は十分に文書化され広く使われている。本稿では、異なる範囲における演算の効率的な実装を示すために、$\mathbb{F}_{2^n}$ over $\mathbb{F}_2$の他の形式の基底について検討する。これを実現するために、Couveignes と Lercier が導入した高速計算と楕円周期の結果を活用し、その後 Ezome と Sall によって拡張した。これにより、二進体上の効率的な計算のための新しいテーブルが確立される。

Binary field extensions are fundamental to many applications, such as multivariate public key cryptography, code-based cryptography, and error-correcting codes. Their implementation requires a foundation in number theory and algebraic geometry and necessitates the use of efficient bases. The continuous increase in computing power and the design of new (quantum) computers increase the threat to the security of systems and impose increasingly demanding encryption standards with huge polynomial or extension degrees. For cryptographic purposes or other common implementations of finite field arithmetic, it is essential to explore a wide range of implementations with diverse bases. Unlike some bases, polynomial and Gaussian normal bases are well-documented and widely employed. In this paper, we explore other forms of bases of $\mathbb{F}_{2^n}$ over $\mathbb{F}_2$ to demonstrate efficient implementation of operations within different ranges. To achieve this, we leverage results on fast computations and elliptic periods introduced by Couveignes and Lercier, and subsequently expanded upon by Ezome and Sall. This leads to the establishment of new tables for efficient computation over binary fields.
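To make the notion of a normal basis concrete, here is a small sketch that tests whether an element $\alpha$ of $\mathbb{F}_{2^n}$ generates a normal basis over $\mathbb{F}_2$, i.e. whether its conjugates $\alpha, \alpha^2, \alpha^4, \dots, \alpha^{2^{n-1}}$ are linearly independent. It uses a plain polynomial-basis representation and does not implement the elliptic-period construction the abstract refers to. The bit-mask encoding, the function names, and the example modulus $x^3 + x + 1$ are assumptions chosen for illustration.

```python
def gf_mul(a, b, modulus, n):
    """Multiply two elements of F_{2^n} given as bit masks of polynomial
    coefficients, reducing modulo the degree-n polynomial `modulus`."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:  # degree reached n: subtract (xor) the modulus
            a ^= modulus
    return result


def conjugates(alpha, modulus, n):
    """Return [alpha, alpha^2, alpha^4, ..., alpha^(2^(n-1))]."""
    out = []
    x = alpha
    for _ in range(n):
        out.append(x)
        x = gf_mul(x, x, modulus, n)  # Frobenius map: squaring
    return out


def rank_gf2(vectors):
    """Rank over F_2 of vectors encoded as bit masks (Gaussian elimination)."""
    pivots = {}  # leading-bit position -> reduced vector
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]
    return len(pivots)


def generates_normal_basis(alpha, modulus, n):
    """True if the n conjugates of alpha are linearly independent over F_2."""
    return rank_gf2(conjugates(alpha, modulus, n)) == n
```

With the modulus `0b1011` ($x^3 + x + 1$) and `n = 3`, the element `0b011` ($x + 1$) generates a normal basis while `0b010` ($x$) does not. Because squaring maps each conjugate to the next one, squaring an element written in a normal basis is a cyclic shift of its coordinates, which is one reason such bases are attractive for binary-field arithmetic.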

翻訳日:2024-03-25 08:56:22 公開日:2024-02-18

# ハードウェアで戦うハードウェア:性能カウンタを用いたサイドチャネル攻撃の検出と軽減

Fight Hardware with Hardware: System-wide Detection and Mitigation of Side-Channel Attacks using Performance Counters ( http://arxiv.org/abs/2402.13281v1 )

ライセンス: Link先を確認

Stefano Carnà, Serena Ferracci, Francesco Quaglia, Alessandro Pellegrini,

(参考訳) 本稿では,キャッシュベースのサイドチャネル攻撃を利用して,標準的なオペレーティングシステムによるプロセス制限を破ろうとする悪意のあるアプリケーションに対して,システム全体の検出を可能にするカーネルレベルのインフラストラクチャを提案する。このインフラストラクチャは、マシン上で動作するすべてのアプリケーションから実行時に情報を集めるために、ハードウェアパフォーマンスカウンタに依存している。これらの測定から高レベルの検出指標が導出され、悪意のあるアプリケーションを迅速に検出する可能性の最大化が図られる。実験により, オーバーヘッドを著しく低減して, サイドチャネル攻撃の大規模なファミリーを捕捉できることが示唆された。また,非監視プロセス実行時のシステムセキュリティレベルと納品性能の全体的なトレードオフを増大させるため,プロセスがサイドチャネルアタックを実行した疑いのある場合に実施可能な対策についても論じる。

We present a kernel-level infrastructure that allows system-wide detection of malicious applications attempting to exploit cache-based side-channel attacks to break the process confinement enforced by standard operating systems. This infrastructure relies on hardware performance counters to collect information at runtime from all applications running on the machine. High-level detection metrics are derived from these measurements to maximize the likelihood of promptly detecting a malicious application. Our experimental assessment shows that we can catch a large family of side-channel attacks with a significantly reduced overhead. We also discuss countermeasures that can be enacted once a process is suspected of carrying out a side-channel attack to improve the overall trade-off between the system's security level and the delivered performance under non-suspected process executions.

翻訳日:2024-03-25 08:56:22 公開日:2024-02-18

# PassViz:漏洩したパスワードを可視化するシステム

PassViz: A Visualisation System for Analysing Leaked Passwords ( http://arxiv.org/abs/2309.12968v3 )

ライセンス: Link先を確認

Sam Parker, Haiyue Yuan, Shujun Li,

(参考訳) 他の手法の進歩にもかかわらず、パスワードは依然として最も広く使われているユーザー認証形式である。しかしながら、攻撃に対する感受性、特に人間のユーザによって定義された弱いパスワードなど、それらの制限は文書化されている。弱い人間が定義したパスワードの存在は、ウェブサイトから繰り返しパスワードのリークを引き起こし、その多くが大規模である。このようなパスワードリークは不運なセキュリティインシデントであるが、パスワードポリシーやパスワードの他のセキュリティコントロールを改善する方法を見つけるために、セキュリティ研究者や専門家に、そのようなリークパスワードから貴重な洞察を得る機会を提供する。研究者たちは、漏洩したパスワードを分析するために、さまざまなデータ可視化技術を提案している。しかし、多くのアプローチは周波数解析にのみ依存しており、距離ベースグラフの探索は限られている。本稿では,2次元空間における漏洩パスワードの可視化と解析を行うため,編集距離をt-SNE(t-disdistributed stochastic embedded)次元削減アルゴリズムと組み合わせた新しい手法であるPassVizについて報告する。我々はPassVizを大規模なパスワードデータベースを視覚化するための使いやすいコマンドラインツールとして実装し、また小さなパスワードデータベースのインタラクティブなビジュアル分析をサポートするグラフィカルユーザインタフェース(GUI)として実装した。リークした“000webhost”データベースを例として、PassVizを使って、漏洩したパスワードのさまざまな側面を視覚的に分析し、これまで知らなかったパスワードパターンの発見を容易にする方法を示す。全体として、我々のアプローチは、研究者や実践者が有効なデータ可視化と分析を通じて、貴重な洞察を得てパスワードセキュリティを改善するのに役立ちます。

Passwords remain the most widely used form of user authentication, despite advancements in other methods. However, their limitations, such as susceptibility to attacks, especially weak passwords defined by human users, are well-documented. The existence of weak human-defined passwords has led to repeated password leaks from websites, many of which are of large scale. While such password leaks are unfortunate security incidents, they provide security researchers and practitioners with good opportunities to learn valuable insights from such leaked passwords, in order to identify ways to improve password policies and other security controls on passwords. Researchers have proposed different data visualisation techniques to help analyse leaked passwords. However, many approaches rely solely on frequency analysis, with limited exploration of distance-based graphs. This paper reports PassViz, a novel method that combines the edit distance with the t-SNE (t-distributed stochastic neighbour embedding) dimensionality reduction algorithm for visualising and analysing leaked passwords in a 2-D space. We implemented PassViz as an easy-to-use command-line tool for visualising large-scale password databases, and also as a graphical user interface (GUI) to support interactive visual analytics of small password databases. Using the "000webhost" leaked database as an example, we show how PassViz can be used to visually analyse different aspects of leaked passwords and to facilitate the discovery of previously unknown password patterns. Overall, our approach empowers researchers and practitioners to gain valuable insights and improve password security through effective data visualisation and analysis.
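The distance computation that such a visualisation depends on can be sketched as follows. This is an illustration only, assuming the Levenshtein variant of edit distance with unit costs; the abstract does not say which variant or implementation PassViz uses, and the function names and sample passwords are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b
    (unit cost for insertion, deletion, and substitution)."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_matrix(passwords):
    """Symmetric matrix of pairwise edit distances."""
    n = len(passwords)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(passwords[i], passwords[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

A matrix such as `distance_matrix(["password", "passw0rd", "123456"])` could then be handed to a t-SNE implementation that accepts precomputed distances in order to obtain 2-D coordinates. The quadratic number of pairs is the main cost, so a large leak would need sampling or batching.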

翻訳日:2024-03-19 04:01:03 公開日:2024-02-18

# 情報検索におけるBERTの利用:調査,応用,資源,課題

Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges ( http://arxiv.org/abs/2403.00784v1 )

ライセンス: Link先を確認

Jiajia Wang, Jimmy X. Huang, Xinhui Tu, Junmei Wang, Angela J. Huang, Md Tahmid Rahman Laskar, Amran Bhuiyan

(参考訳) 近年では、さまざまな自然言語処理(nlp)問題を解決するためのディープラーニングの利用が大幅に増加している。初期のディープラーニングモデルは、テキスト入力間の文脈的関係を捉えるのに苦労するなど、逐次的あるいは一方向的な性質によって制約されていた。変換器(BERT)からの双方向エンコーダ表現の導入は、より広いコンテキストを理解し、様々なNLPタスクに対して最先端のパフォーマンスを提供することができるトランスフォーマーモデルの堅牢なエンコーダをもたらす。これは、研究者や実践者が情報検索(IR)のような実践的な問題にBERTを適用するきっかけとなった。 BERTのような事前訓練されたトランスフォーマーエンコーダをIRに適用する一般的なアプローチの包括的分析に焦点を当てた調査は、学術や産業にとって有用である。これを踏まえ、この調査では様々なBERTベースの手法を再検討し、IRの幅広い手法を網羅し、それらを6つのハイレベルカテゴリに分類する。 (i)長い文書を扱うこと。 (ii)意味情報の統合 (iii)有効性と効率のバランスをとること。 (四)項の重みを予測すること。 (v)クエリ拡張、および (vi)文書拡張。また、BERTベースのIRシステムのためのデータセットやツールキットを含むリソースへのリンクも提供します。この調査の重要な点は、bertのエンコーダベースのモデルと、デコーダに依存するchatgptのような最新の生成型大規模言語モデル(llm)の比較である。 LLMの人気にもかかわらず、特定のタスクに対して細調整されたBERTエンコーダは依然として性能が良く、デプロイコストも低い。最後に,調査の総合的な成果を要約し,今後の研究の方向性を提案する。

Recent years have witnessed a substantial increase in the use of deep learning to solve various natural language processing (NLP) problems. Early deep learning models were constrained by their sequential or unidirectional nature, such that they struggled to capture the contextual relationships across text inputs. The introduction of bidirectional encoder representations from transformers (BERT) led to a robust encoder for the transformer model that can understand the broader context and deliver state-of-the-art performance across various NLP tasks. This has inspired researchers and practitioners to apply BERT to practical problems, such as information retrieval (IR). A survey that focuses on a comprehensive analysis of prevalent approaches that apply pretrained transformer encoders like BERT to IR can thus be useful for academia and the industry. In light of this, we revisit a variety of BERT-based methods in this survey, cover a wide range of IR techniques, and group them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion. We also provide links to resources, including datasets and toolkits, for BERT-based IR systems. A key highlight of our survey is the comparison between BERT's encoder-based models and the latest generative Large Language Models (LLMs), such as ChatGPT, which rely on decoders. Despite the popularity of LLMs, we find that for specific tasks, fine-tuned BERT encoders still outperform them, and at a lower deployment cost. Finally, we summarize the comprehensive outcomes of the survey and suggest directions for future research in the area.

翻訳日:2024-03-11 00:09:22 公開日:2024-02-18

# 計画における LLM の役割--計画図への LLM の埋め込み

On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs ( http://arxiv.org/abs/2403.00783v1 )

ライセンス: Link先を確認

Hankz Hankui Zhuo and Xin Chen and Rong Pan

(参考訳) プラン合成は、与えられた初期状態から目標状態へ移行するための一連のアクションやポリシーを生成することを目的としており、専門家が設計したり、データや世界との対話から学ぶことのできるドメインモデルを提供する。大規模言語モデル (LLM) における創発的計画能力の主張により, LLM における既成計画技術の利用を考慮せずに, LLM の計画効率を検討する作業が提案されている。本稿では,既成の計画フレームワークにおけるLCMの役割を解明し,LCMの計画能力に関する知見をさらに研究することを目的とする。そこで本研究では,LLMをグラフベースの計画フレームワークに組み込むことの有効性について検討し,LLMを2段階の計画グラフ,すなわち相互制約生成レベルと制約解決レベルに組み込んだ新しいLLMベースの計画フレームワークを提案する。様々な計画領域において提案手法の有効性を実証的に示す。

Plan synthesis aims to generate a course of actions or policies to transition from given initial states to goal states, given domain models that can be designed by experts or learnt from training data or interactions with the world. Intrigued by claims of emergent planning capabilities in large language models (LLMs), researchers have investigated the planning effectiveness of LLMs, but without considering the use of off-the-shelf planning techniques together with LLMs. In this paper, we aim to gain further insight into the planning capability of LLMs by investigating the roles of LLMs in off-the-shelf planning frameworks. To do this, we investigate the effectiveness of embedding LLMs into one well-known planning framework, graph-based planning, and propose a novel LLM-based planning framework with LLMs embedded at two levels of planning graphs, i.e., the mutual constraints generation level and the constraints solving level. We empirically demonstrate the effectiveness of our proposed framework in various planning domains.

翻訳日:2024-03-11 00:08:53 公開日:2024-02-18

# Ploutos:金融大言語モデルによる株価変動予測に向けて

Ploutos: Towards interpretable stock movement prediction with financial large language model ( http://arxiv.org/abs/2403.00782v1 )

ライセンス: Link先を確認

Hanshuang Tong, Jun Li, Ning Wu, Ming Gong, Dongmei Zhang, Qi Zhang

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、多くの領域で新しい経路を開拓している。しかし、金融投資におけるLLMのポテンシャルは、ほとんど未完成のままである。一般的なディープラーニングベースの定量的ファイナンスには,2つの大きな課題がある。まず、株価移動予測のためにテキスト情報と数値情報を柔軟に融合するのに苦労する。第二に、従来の手法には明確さと解釈性が欠けており、予測の正当化が不可欠であるシナリオでその応用を妨げる。以上の課題を解決するために,PloutosGenとPloutosGPTで構成される新しい金融LLMフレームワークであるPloutosを提案する。 ploutosgenには、テキストや数値などの異なるモーダルデータを分析し、異なる観点から定量的な戦略を提供する複数の主要な専門家が含まれている。そして、PloutosGPTは彼らの洞察と予測を組み合わせて解釈可能な理性を生成する。正確で忠実な合理性を生成するために、PloutosGPTのトレーニング戦略は、GPT-4を誘導して合理性を生成するリアビューミラープロンプト機構と、キートークンの重みを増大させることによりLDMを微調整する動的トークン重み付け機構を利用する。我々のフレームワークは予測精度と解釈可能性の両方において最先端の手法より優れていることを示す。

Recent advancements in large language models (LLMs) have opened new pathways for many domains. However, the full potential of LLMs in financial investments remains largely untapped. There are two main challenges for typical deep learning-based methods for quantitative finance. First, they struggle to fuse textual and numerical information flexibly for stock movement prediction. Second, traditional methods lack clarity and interpretability, which impedes their application in scenarios where the justification for predictions is essential. To solve the above challenges, we propose Ploutos, a novel financial LLM framework that consists of PloutosGen and PloutosGPT. PloutosGen contains multiple primary experts that can analyze data of different modalities, such as text and numbers, and provide quantitative strategies from different perspectives. PloutosGPT then combines their insights and predictions and generates interpretable rationales. To generate accurate and faithful rationales, the training strategy of PloutosGPT leverages a rearview-mirror prompting mechanism to guide GPT-4 to generate rationales, and a dynamic token weighting mechanism to fine-tune the LLM by increasing the weight of key tokens. Extensive experiments show that our framework outperforms the state-of-the-art methods on both prediction accuracy and interpretability.

翻訳日:2024-03-11 00:08:33 公開日:2024-02-18

# ChatDiet: LLM拡張フレームワークによるパーソナライズされた栄養指向食品レコメンダチャットボットの活用

ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework ( http://arxiv.org/abs/2403.00781v1 )

ライセンス: Link先を確認

Zhongqi Yang, Elahe Khatibi, Nitish Nagesh, Mahyar Abbasian, Iman Azimi, Ramesh Jain, Amir M. Rahmani

(参考訳) 食品が健康に与える影響は、高度な栄養指向の食品レコメンデーションサービスを必要とする。従来の手法は、パーソナライゼーション、説明可能性、対話性の重要な要素を欠いていることが多い。大きな言語モデル(LLM)は解釈可能性と説明可能性をもたらすが、彼らのスタンドアロンの使用は真のパーソナライゼーションを達成するには不十分である。本稿では、栄養指向食品レコメンデーションチャットボットに特化して設計された、新しいLLMフレームワークChatDietを紹介する。 ChatDietは、オーケストラが補完する個人モデルと人口モデルを統合し、シームレスに関連する情報を検索し、処理する。その結果、個人の好みに合わせて、パーソナライズされた説明可能な食品レコメンデーションが動的に配信される。 chatdietの評価には、個々の栄養効果を推定するための因果的個人モデルを確立する、説得力のあるケーススタディが含まれています。食事のレコメンデーションテストを含む評価は,説明可能性,パーソナライゼーション,対話性におけるチャットの強みを図示的対話例と組み合わせて,有効率92\%を示した。

The profound impact of food on health necessitates advanced nutrition-oriented food recommendation services. Conventional methods often lack the crucial elements of personalization, explainability, and interactivity. While Large Language Models (LLMs) bring interpretability and explainability, their standalone use falls short of achieving true personalization. In this paper, we introduce ChatDiet, a novel LLM-powered framework designed specifically for personalized nutrition-oriented food recommendation chatbots. ChatDiet integrates personal and population models, complemented by an orchestrator, to seamlessly retrieve and process pertinent information. The result is a dynamic delivery of personalized and explainable food recommendations, tailored to individual user preferences. Our evaluation of ChatDiet includes a compelling case study, where we establish a causal personal model to estimate individual nutrition effects. Our assessments, including a food recommendation test showcasing a 92% effectiveness rate, coupled with illustrative dialogue examples, underscore ChatDiet's strengths in explainability, personalization, and interactivity.

翻訳日:2024-03-11 00:08:11 公開日:2024-02-18

# 絡み合い:罰と補償のバランス、繰り返しジレンマゲーム--ウォーレスの法則の弱い偽ニュースの場合、バイパスの最大補償問題とファクトチェックの最小コストパスの理論的解析

Entanglement: Balancing Punishment and Compensation, Repeated Dilemma Game-Theoretic Analysis of Maximum Compensation Problem for Bypass and Least Cost Paths in Fact-Checking, Case of Fake News with Weak Wallace's Law ( http://arxiv.org/abs/2403.02342v1 )

ライセンス: Link先を確認

Yasuko Kawahata

(参考訳) 本研究ノートは,偽ニュースの拡散と効果的な事実確認に関連する問題を解決するための新しいアプローチについて整理したものである。最小コストのルーティング問題に着目し,ニュース提供者間の情報伝達のダイナミクスをモデル化するために,メッツラー関数とメッツラー行列を用いて議論を行った。このアプローチでは,情報健康に有害な偽ニュースの拡散を最小限に抑えるとともに,信頼性の高い情報の拡散を最大化する戦略を考案した。特に, 懲罰的支配問題と最大補償問題を通じて, 情報提供者が行動すべきインセンティブを再評価し, それらの情報市場の均衡への影響を分析する方法を開発し検討した。情報伝達の文脈に絡み合いの概念を適用することで、ニュース提供者間の相互作用の複雑さに光を当て、より効果的な情報管理戦略の策定に寄与する。本研究は,偽ニュースとファクトチェックに関する新たな理論的,実践的な知見を提供し,情報健康と公衆デジタル健康の改善について検討する。

This research note presents a novel approach to solving problems related to the spread of fake news and effective fact-checking. Focusing on the least-cost routing problem, the discussion uses Metzler functions and Metzler matrices to model the dynamics of information propagation among news providers. With this approach, we designed a strategy to minimize the spread of fake news, which is detrimental to informational health, while at the same time maximizing the spread of credible information. In particular, through the punitive dominance problem and the maximum compensation problem, we developed and examined a path to reassess the incentives of news providers to act and to analyze their impact on the equilibrium of the information market. By applying the concept of entanglement to the context of information propagation, we shed light on the complexity of interactions among news providers and contribute to the formulation of more effective information management strategies. This study provides new theoretical and practical insights into issues related to fake news and fact-checking, and examines how informational health and public digital health can be improved.

翻訳日:2024-03-10 23:51:35 公開日:2024-02-18

# 大規模言語モデルのためのプロンプト手法の実証的分類:実践者ガイド

An Empirical Categorization of Prompting Techniques for Large Language Models: A Practitioner's Guide ( http://arxiv.org/abs/2402.14837v1 )

ライセンス: Link先を確認

Oluwole Fagbohun, Rachel M. Harrison, Anton Dereventsov

(参考訳) 大規模言語モデル(llm)の開発が急速に進んでいるため、これらのモデルをプロンプトでプログラミングすることが最近大きな注目を集めている。しかし、利用可能なプロンプトエンジニアリングテクニックの数が多く、これらのツールを使いたい実践者にとって圧倒的な景観を生み出します。 LLMの最も効率的かつ効果的な利用のためには、プロンプト技術の包括的なリストをコンパイルし、標準化された学際分類フレームワークを確立することが重要である。本調査では,学術的,実践的両面から最もよく知られたプロンプト技術について検討し,それらを7つのカテゴリーに分類する。本稿では,それぞれのカテゴリについて概説し,それぞれの分野に合わせたプロンプト技術を理解し,分類するための構造的枠組みを,実践者の実例で示すことを目的とする。このアプローチは、迅速なエンジニアリングの複雑な景観を単純化し、様々なアプリケーションにおけるLLMのより効率的な利用を可能にする。実践者に分類を急ぐための体系的なアプローチを提供することにより,対話型事前学習 LLM の効果的なプロンプト設計の複雑化を支援し,それぞれの分野に新たな可能性をもたらすことを目指す。

Due to rapid advancements in the development of Large Language Models (LLMs), programming these models with prompts has recently gained significant attention. However, the sheer number of available prompt engineering techniques creates an overwhelming landscape for practitioners looking to utilize these tools. For the most efficient and effective use of LLMs, it is important to compile a comprehensive list of prompting techniques and establish a standardized, interdisciplinary categorization framework. In this survey, we examine some of the most well-known prompting techniques from both academic and practical viewpoints and classify them into seven distinct categories. We present an overview of each category, aiming to clarify their unique contributions and showcase their practical applications in real-world examples in order to equip fellow practitioners with a structured framework for understanding and categorizing prompting techniques tailored to their specific domains. We believe that this approach will help simplify the complex landscape of prompt engineering and enable more effective utilization of LLMs in various applications. By providing practitioners with a systematic approach to prompt categorization, we aim to assist in navigating the intricacies of effective prompt design for conversational pre-trained LLMs and inspire new possibilities in their respective fields.

翻訳日:2024-03-03 19:38:28 公開日:2024-02-18

# 大規模言語モデルに基づくレコメンデーションのステルス攻撃

Stealthy Attack on Large Language Model based Recommendation ( http://arxiv.org/abs/2402.14836v1 )

ライセンス: Link先を確認

Jinghao Zhang, Yuting Liu, Qiang Liu, Shu Wu, Guibing Guo and Liang Wang

(参考訳) 近年、強力な大規模言語モデル(llms)は、レコメンダシステム(rs)の進歩を促進するのに役立っている。しかし、これらのシステムは繁栄しているが、セキュリティの脅威に対する感受性はほとんど見過ごされている。本稿では,推奨モデルへのllmの導入が,項目のテキストコンテンツを重視した新たなセキュリティ脆弱性をもたらすことを明らかにした。攻撃者は、モデルのトレーニングプロセスに直接干渉することなく、テストフェーズ中にテキストの内容を変更するだけで、アイテムの露出を大幅に向上できることを示す。さらにこの攻撃は、全体的なレコメンデーション性能に影響を与えず、テキストの変更が微妙であるため、ユーザやプラットフォームが検出することが難しいため、特にステルス性が強い。 4つの主要なLCMベースレコメンデーションモデルに対する総合的な実験は、我々のアプローチの優れた有効性とステルス性を示している。我々の研究は、LLMベースのレコメンデーションシステムにおいて重大なセキュリティギャップを明らかにし、これらのシステムを保護するための将来の研究の道を開く。

Recently, the powerful large language models (LLMs) have been instrumental in propelling the progress of recommender systems (RS). However, while these systems have flourished, their susceptibility to security threats has been largely overlooked. In this work, we reveal that the introduction of LLMs into recommendation models presents new security vulnerabilities due to their emphasis on the textual content of items. We demonstrate that attackers can significantly boost an item's exposure by merely altering its textual content during the testing phase, without requiring direct interference with the model's training process. Additionally, the attack is notably stealthy, as it does not affect the overall recommendation performance and the modifications to the text are subtle, making it difficult for users and platforms to detect. Our comprehensive experiments across four mainstream LLM-based recommendation models demonstrate the superior efficacy and stealthiness of our approach. Our work unveils a significant security gap in LLM-based recommendation systems and paves the way for future research on protecting these systems.

翻訳日:2024-03-03 19:38:06 公開日:2024-02-18

# MIKE: きめ細かいマルチモーダルエンティティ知識編集のためのベンチマーク

MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing ( http://arxiv.org/abs/2402.14835v1 )

ライセンス: Link先を確認

Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi, Haiyun Jiang, Siyuan Cheng, Bozhong Tian

(参考訳) マルチモーダル知識編集は,MLLM(Multimodal Large Language Models)の能力向上における重要な進歩である。その可能性にもかかわらず、現在のベンチマークは主に粗粒度知識に重点を置いており、細粒度(FG)マルチモーダルエンティティ知識の複雑さはほとんど解明されていない。このギャップは、さまざまな実世界のシナリオにおけるMLLMの実践的展開と有効性において、FGエンティティ認識が重要な課題であることを示している。このギャップを埋めるために、我々はFGマルチモーダルエンティティ知識編集用に設計された包括的なベンチマークとデータセットであるMIKEを紹介する。 MIKEには、Vanilla Name Answering、Entity-Level Caption、Complex-Scenario Recognitionなど、さまざまな視点を評価するための一連のタスクが含まれている。また,新たな知識編集形式であるマルチステップ編集を導入し,編集効率を評価する。本研究では, MLLMにおけるFG知識編集の複雑さを浮き彫りにして, 提案したベンチマークに対処する上で, 現在の最先端手法が重大な課題に直面していることを示す。本研究は,この領域における新たなアプローチの急激なニーズを浮き彫りにして,コミュニティにおける今後の研究・開発活動に向けた明確な議題を定めている。

Multimodal knowledge editing represents a critical advancement in enhancing the capabilities of Multimodal Large Language Models (MLLMs). Despite its potential, current benchmarks predominantly focus on coarse-grained knowledge, leaving the intricacies of fine-grained (FG) multimodal entity knowledge largely unexplored. This gap presents a notable challenge, as FG entity recognition is pivotal for the practical deployment and effectiveness of MLLMs in diverse real-world scenarios. To bridge this gap, we introduce MIKE, a comprehensive benchmark and dataset specifically designed for the FG multimodal entity knowledge editing. MIKE encompasses a suite of tasks tailored to assess different perspectives, including Vanilla Name Answering, Entity-Level Caption, and Complex-Scenario Recognition. In addition, a new form of knowledge editing, Multi-step Editing, is introduced to evaluate the editing efficiency. Through our extensive evaluations, we demonstrate that the current state-of-the-art methods face significant challenges in tackling our proposed benchmark, underscoring the complexity of FG knowledge editing in MLLMs. Our findings spotlight the urgent need for novel approaches in this domain, setting a clear agenda for future research and development efforts within the community.

翻訳日:2024-03-03 19:37:51 公開日:2024-02-18

# MSynFD:マルチホップ構文認識フェイクニュース検出

MSynFD: Multi-hop Syntax aware Fake News Detection ( http://arxiv.org/abs/2402.14834v1 )

ライセンス: Link先を確認

Liang Xiao, Qi Zhang, Chongyang Shi, Shoujin Wang, Usman Naseem, and Liang Hu

(参考訳) ソーシャルメディアプラットフォームの普及は偽ニュースの拡散を加速させ、われわれの現実社会に脅威をもたらしている。既存の手法では、マルチモーダルデータや文脈情報を用いて、ニュースコンテンツやそのソーシャルコンテキストを分析して偽ニュースの検出を強化する。しかし、これらの方法はしばしば本質的なテクスト的なニュースコンテンツ(記事)を見落とし、シーケンシャルなモデリングと世界的注意に依存して意味情報を抽出する。これらの既存の手法は、構文論的ミスマッチや先行バイアスといった、ニュース記事の複雑な微妙なひねりを処理できず、モダリティや社会的文脈が欠けている場合のパフォーマンスが低下し、潜在的な失敗につながる。これらの大きなギャップを埋めるために,偽ニュースの微妙なひねりに対処するために,補完的な構文情報を組み込んだマルチホップ構文認識フェイクニュース検出(msynfd)手法を提案する。具体的には、構文依存グラフを導入し、マルチホップ構文をキャプチャするマルチホップサブグラフアグリゲーション機構を設計する。単語知覚の効果を拡張し、効果的なノイズフィルタリングと隣接した関係強化につながる。その後、シーケンシャルな相対位置認識トランスは、先行バイアスを軽減するために、精巧なキーワードデバイアスモジュールと共にシーケンシャル情報をキャプチャするように設計されている。 2つのベンチマークデータセットにおける広範囲な実験結果から,提案手法の有効性と優れた性能を検証できた。

The proliferation of social media platforms has fueled the rapid dissemination of fake news, posing threats to our real-life society. Existing methods use multimodal data or contextual information to enhance the detection of fake news by analyzing news content and/or its social context. However, these methods often overlook essential textual news content (articles) and heavily rely on sequential modeling and global attention to extract semantic information. These existing methods fail to handle the complex, subtle twists in news articles, such as syntax-semantics mismatches and prior biases, leading to lower performance and potential failure when modalities or social context are missing. To bridge these significant gaps, we propose a novel multi-hop syntax aware fake news detection (MSynFD) method, which incorporates complementary syntax information to deal with subtle twists in fake news. Specifically, we introduce a syntactical dependency graph and design a multi-hop subgraph aggregation mechanism to capture multi-hop syntax. It extends the effect of word perception, leading to effective noise filtering and adjacent relation enhancement. Subsequently, a sequential relative position-aware Transformer is designed to capture the sequential information, together with an elaborate keyword debiasing module to mitigate the prior bias. Extensive experimental results on two public benchmark datasets verify the effectiveness and superior performance of our proposed MSynFD over state-of-the-art detection models.
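The idea of a multi-hop neighbourhood on a syntactic dependency graph can be illustrated with a short breadth-first search. This sketch only collects the words reachable within `k` hops of a given word; it does not reproduce the paper's subgraph aggregation mechanism, the position-aware Transformer, or the debiasing module. The function name, the adjacency-dictionary format, and the toy graph are hypothetical.

```python
from collections import deque


def k_hop_neighbors(adjacency, start, k):
    """Return {node: hop distance} for every node within k hops of `start`
    (excluding `start` itself), found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]
    return dist


# Hypothetical undirected dependency graph for "officials deny fake report".
toy_graph = {
    "deny": ["officials", "report"],
    "officials": ["deny"],
    "report": ["deny", "fake"],
    "fake": ["report"],
}
```

Here `k_hop_neighbors(toy_graph, "fake", 1)` reaches only `report`, whereas two hops also reach `deny`, which shows how a larger hop count widens the context available to each word.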

翻訳日:2024-03-03 19:37:27 公開日:2024-02-18

# ド・ジッター時空における三部交絡

A tripartite entanglement in de Sitter spacetime ( http://arxiv.org/abs/1909.13454v4 )

ライセンス: Link先を確認

Sang-Eon Bak, Paul M. Alsing, Warner A. Miller, Shahabeddin M. Aslmarand and Doyeol Ahn

(参考訳) ド・ジッター空間における三部絡み状態の量子相関について検討する。まず,ノイズ量子チャネルモデルを採用する。このモデルでは、拡大効果は対応するクラウス作用素との演算子和表現によって表現される。この写像はトレース保存であり、完全に正である。次に,チャネル状態対応を用いて量子相関解析を行う。拡大率が大きい場合には、三成分相互情報には大きな負の値があり、これは小さな二成分相互情報に対応する。この結果と局所的な測定から情報を回収する課題を関連づける。

We investigate the quantum correlation for tripartite entangled states in de Sitter space. First, we adopt the noisy quantum channel model. In this model, the expansion effect is represented by an operator sum representation with its corresponding Kraus operator. This map is shown to be trace-preserving and completely positive. Second, we analyze the quantum correlation by using the channel-state correspondence. For a large expansion rate, the tripartite mutual information has a large negative value, which corresponds to a small magnitude of bipartite mutual information. We relate this result with the challenge of recovering information from local measurements.

翻訳日:2024-03-03 19:35:18 公開日:2024-02-18
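
The operator-sum (Kraus) representation mentioned in the abstract above can be illustrated with a small numerical check. The sketch below does not use the paper's de Sitter channel, whose Kraus operators are not given here; as a stand-in it assumes the textbook single-qubit amplitude-damping channel with a hypothetical damping parameter `gamma`. It verifies the completeness relation $\sum_k K_k^\dagger K_k = I$, which is what makes the map trace-preserving.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(a):
    """Conjugate transpose of a matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]


def kraus_ops(gamma):
    """Amplitude-damping Kraus operators (stand-in channel, not the paper's)."""
    k0 = [[1.0, 0.0], [0.0, math.sqrt(1.0 - gamma)]]
    k1 = [[0.0, math.sqrt(gamma)], [0.0, 0.0]]
    return [k0, k1]


def completeness(ops):
    """Return sum_k K_k^dagger K_k; equals the identity for a trace-preserving map."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in ops:
        total = add(total, matmul(dagger(k), k))
    return total


def apply_channel(ops, rho):
    """Apply rho -> sum_k K_k rho K_k^dagger."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for k in ops:
        out = add(out, matmul(matmul(k, rho), dagger(k)))
    return out


def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))
```

Applying `apply_channel` to any density matrix should leave its trace equal to 1, whatever value of `gamma` between 0 and 1 is chosen.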

# 定常エネルギー輸送における周波数依存性ビブロニック効果

Frequency-Dependent Vibronic Effects in Steady State Energy Transport ( http://arxiv.org/abs/2402.16881v1 )

ライセンス: Link先を確認

Leonardo F. Calderón and Paul Brumer

(参考訳) 電子と分子内における高周波振動自由度の間の相互作用は、自然光ハーベスティングシステムにおいてユビキタスである。近年の研究では、分子内振動ドナー-受容体周波数差によってエネルギー輸送が促進されることが示されている。ここでは,分子内ドナー-受容体振動周波数の違いが平衡(コヒーレント光励起)における励起エネルギー輸送に与える影響と,より自然な非平衡定常状態(非コヒーレント光励起)構成に与える影響を分析する。また,Huang-Rhys因子が一定であれば,受容体の分子内振動周波数がドナーの振動周波数を上回ると,受容体のポピュレーションが増加することがわかった。振動周波数差による受容体ポピュレーションの増大は,Huang-Rhys因子や振電結合強度の値が高いほど大きい。しかし、非平衡定常状態の結果、振動ドナー-受容体の周波数差は、非コヒーレント光励起の自然なシナリオや生物学的に関連するパラメータの下でエネルギー輸送を著しく促進しないことが示された。反応中心での収穫時間の増加に基づいて,非平衡定常状態(NESS)におけるエネルギー移動を最適化する潜在的な機構について考察した。

The interplay between electronic and intramolecular high-frequency vibrational degrees of freedom is ubiquitous in natural light-harvesting systems. Recent studies have indicated that an intramolecular vibrational donor-acceptor frequency difference can enhance energy transport. Here, we analyze the extent to which different intramolecular donor-acceptor vibrational frequencies affect excitation energy transport in equilibrium (coherent light excitation) and the more natural nonequilibrium steady state (incoherent light excitation) configurations. It is found that if the Huang-Rhys factors remain constant, the acceptor population increases when the intramolecular vibrational frequency of the acceptor exceeds that of the donor. The increase in the acceptor population due to the vibrational frequency difference is higher for higher values of the Huang-Rhys factors or the vibronic coupling strengths. However, the nonequilibrium steady state results show that the vibrational donor-acceptor frequency difference does not significantly enhance energy transport in the natural scenario of incoherent light excitation and under biologically relevant parameters. We also analyze a potential mechanism for optimizing energy transfer in the nonequilibrium steady state (NESS), based on increasing the harvesting time at the reaction center.

翻訳日:2024-03-03 19:06:47 公開日:2024-02-18

# BESA: ブロックワイズパラメータ効率スパース性割り当てによる大規模言語モデルのプルーニング

BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation ( http://arxiv.org/abs/2402.16880v1 )

ライセンス: Link先を確認

Peng Xu, Wenqi Shao, Mengzhao Chen, Shitao Tang, Kaipeng Zhang, Peng Gao, Fengwei An, Yu Qiao, Ping Luo

(参考訳) 大規模言語モデル(LLM)は,テキスト要約やテキスト質問応答など,さまざまなタスクにおいて優れた性能を示している。その性能は印象的だが、膨大な数のパラメータによる計算フットプリントは法外なものになりうる。 SparseGPTやWandaといった既存のソリューションは、重みのプルーニングによってこの問題を緩和しようと試みている。しかし、それらの層単位のアプローチはモデルの出力にかなりの摂動をもたらし、プルーニング率などのハイパーパラメータを細心の注意を払って調整する必要があり、モデル全体の性能に悪影響を及ぼしうる。そこで本研究では,ブロック単位の再構成損失を適用する,ブロックワイズパラメータ効率スパース性割り当て(BESA)と呼ばれる新しいLLMプルーニング手法を提案する。典型的な層単位のプルーニング手法とは対照的に、BESAには2つの特徴がある。(i) 個々のTransformerブロックに関する全体的なプルーニング誤差を対象とし、(ii) 層ごとのスパース性を微分可能な方法で割り当てる。いずれもプルーニング後の性能劣化の低減につながる。実験の結果、BESAは最先端の性能を達成し、7Bから70BのパラメータをもつLLaMA1やLLaMA2のようなLLMを1つのA100 GPU上でわずか5時間で効率よくプルーニングできることが示された。コードは https://github.com/OpenGVLab/LLMPrune-BESA で公開されている。

Large language models (LLMs) have demonstrated outstanding performance in various tasks, such as text summarization and text question-answering. While their performance is impressive, the computational footprint due to their vast number of parameters can be prohibitive. Existing solutions such as SparseGPT and Wanda attempt to alleviate this issue through weight pruning. However, their layer-wise approach results in significant perturbation to the model's output and requires meticulous hyperparameter tuning, such as the pruning rate, which can adversely affect overall model performance. To address this, this paper introduces a novel LLM pruning technique dubbed blockwise parameter-efficient sparsity allocation (BESA) by applying a blockwise reconstruction loss. In contrast to the typical layer-wise pruning techniques, BESA is characterized by two distinctive attributes: i) it targets the overall pruning error with respect to individual transformer blocks, and ii) it allocates layer-specific sparsity in a differentiable manner, both of which ensure reduced performance degradation after pruning. Our experiments show that BESA achieves state-of-the-art performance, efficiently pruning LLMs like LLaMA1 and LLaMA2 with 7B to 70B parameters on a single A100 GPU in just five hours. Code is available at https://github.com/OpenGVLab/LLMPrune-BESA.

翻訳日:2024-03-03 19:06:24 公開日:2024-02-18
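
To make the idea of layer-specific sparsity and a block-level error concrete, here is a minimal sketch. It is not BESA itself: it uses plain magnitude pruning rather than the differentiable sparsity allocation described in the abstract. The "block" is a toy assumption, a chain of element-wise scaling layers, and all function names are hypothetical. It only shows how the same block can be scored under different per-layer sparsity choices.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def block_output(x, layers):
    """Toy block: apply each layer as an element-wise scaling of the input."""
    out = list(x)
    for w in layers:
        out = [o * wi for o, wi in zip(out, w)]
    return out


def blockwise_error(x, layers, sparsities):
    """Squared error between dense and pruned block outputs for input x."""
    pruned_layers = [prune_by_magnitude(w, s)
                     for w, s in zip(layers, sparsities)]
    dense = block_output(x, layers)
    sparse = block_output(x, pruned_layers)
    return sum((d - s) ** 2 for d, s in zip(dense, sparse))
```

Comparing `blockwise_error` for several candidate `sparsities` lists with the same average sparsity is a crude, non-differentiable analogue of searching for a good per-layer allocation.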

# RadarScenes: 自動車アプリケーションのための現実世界のレーダーポイントクラウドデータセット

RadarScenes: A Real-World Radar Point Cloud Data Set for Automotive Applications ( http://arxiv.org/abs/2104.02493v2 )

ライセンス: Link先を確認

Ole Schumann, Markus Hahn, Nicolas Scheiner, Fabio Weishaupt, Julius F. Tilly, Jürgen Dickmann, Christian Wöhler

(参考訳) 4時間以上の運転から測定値とポイントワイズアノテーションを備えた新しい自動車レーダデータセットが提示された。 1台の試験車に搭載された4つのレーダセンサーから得られたデータを記録し、動的物体の個別検出を手動でクラスターにグループ化し、その後ラベル付けした。このデータセットの目的は、移動道路利用者に焦点を当てた新しい(機械学習に基づく)レーダ認識アルゴリズムの開発を可能にすることである。記録されたシーケンスの画像は、ドキュメンタリーカメラで撮影された。将来のオブジェクト検出および分類アルゴリズムの評価のために,研究者が共通のアルゴリズムを評価できるように,スコア計算の提案を行う。追加情報とダウンロード手順は、データセットのウェブサイト(www.radar-scenes.com)で見ることができる。

A new automotive radar data set with measurements and point-wise annotations from more than four hours of driving is presented. Data provided by four series radar sensors mounted on one test vehicle were recorded and the individual detections of dynamic objects were manually grouped to clusters and labeled afterwards. The purpose of this data set is to enable the development of novel (machine learning-based) radar perception algorithms with the focus on moving road users. Images of the recorded sequences were captured using a documentary camera. For the evaluation of future object detection and classification algorithms, proposals for score calculation are made so that researchers can evaluate their algorithms on a common basis. Additional information as well as download instructions can be found on the website of the data set: www.radar-scenes.com.

翻訳日:2024-02-22 22:05:58 公開日:2024-02-18

# GPT4Motion:Blender-Oriented GPT Planningによるテキスト・ビデオ生成における物理動作のスクリプト作成

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning ( http://arxiv.org/abs/2311.12631v2 )

ライセンス: Link先を確認

Jiaxi Lv and Yi Huang and Mingfu Yan and Jiancheng Huang and Jianzhuang Liu and Yifan Liu and Yafei Wen and Xiaoxin Chen and Shifeng Chen

(参考訳) テキスト対ビデオ生成の最近の進歩は、拡散モデルの力を利用して、テキストプロンプトに基づいて視覚的に魅力的なコンテンツを作成する。しかし、通常高い計算コストに遭遇し、コヒーレントな物理的動きを持つビデオを作るのに苦労する。そこで本研究では,gptなどの大規模言語モデルの計画能力,ブレンダの物理シミュレーション強度,映像合成の質を高めるためのテキスト・画像拡散モデルの優れた画像生成能力を活用する,トレーニングフリーなフレームワークであるgpt4motionを提案する。具体的には、gpt4motionはgpt-4を使用してユーザーテキストプロンプトに基づいたブレンダースクリプトを生成し、blenderの組み込み物理エンジンにフレーム間のコヒーレントな物理運動をカプセル化する基本的なシーンコンポーネントを作成するよう命令する。そして、これらのコンポーネントを安定拡散に入力し、テキストプロンプトに合わせたビデオを生成する。剛体物体の落下・衝突・布のドッピング・揺動・液流を含む3つの基本的な物理運動シナリオの実験結果から,GPT4Motionは動きのコヒーレンシと実体の整合性を維持する上で,高品質な映像を効率よく生成できることを示した。 GPT4Motionは、テキスト・ビデオ研究における新たな洞察を提供し、その品質を高め、さらなる探索のための地平を広げる。

Recent advances in text-to-video generation have harnessed the power of diffusion models to create visually compelling content conditioned on text prompts. However, they usually encounter high computational costs and often struggle to produce videos with coherent physical motions. To tackle these issues, we propose GPT4Motion, a training-free framework that leverages the planning capability of large language models such as GPT, the physical simulation strength of Blender, and the excellent image generation ability of text-to-image diffusion models to enhance the quality of video synthesis. Specifically, GPT4Motion employs GPT-4 to generate a Blender script based on a user textual prompt, which commands Blender's built-in physics engine to craft fundamental scene components that encapsulate coherent physical motions across frames. Then these components are inputted into Stable Diffusion to generate a video aligned with the textual prompt. Experimental results on three basic physical motion scenarios, including rigid object drop and collision, cloth draping and swinging, and liquid flow, demonstrate that GPT4Motion can generate high-quality videos efficiently in maintaining motion coherency and entity consistency. GPT4Motion offers new insights in text-to-video research, enhancing its quality and broadening its horizon for further explorations.

翻訳日:2024-02-21 20:15:37 公開日:2024-02-18

# ModelGPT: モデル生成のためのLLMの能力の解放

ModelGPT: Unleashing LLM's Capabilities for Tailored Model Generation ( http://arxiv.org/abs/2402.12408v1 )

ライセンス: Link先を確認

Zihao Tang, Zheqi Lv, Shengyu Zhang, Fei Wu, Kun Kuang

(参考訳) 大規模言語モデル(LLM)の急速な進歩は、ルーチンタスクを自動化することで様々な分野に革命をもたらし、汎用人工知能(AGI)の実現に向けた一歩となった。しかしながら、ユーザの多様で固有のニーズへの対応や、平均的なユーザに対するAIモデルの利用の簡素化にはまだ苦労している。そこで本研究では,ユーザが提供するデータやタスク記述に合わせたAIモデルを,LLMの機能を活用して決定・生成する新しいフレームワークであるModelGPTを提案する。ユーザの要求に応じて、ModelGPTは以前のパラダイム(全パラメータやLoRAのファインチューニングなど)よりも最大270倍高速に、要求に合わせたモデルを提供することができる。 NLP、CV、Tabularデータセットに関する包括的な実験は、AIモデルをよりアクセシブルでユーザフレンドリにするためのフレームワークの有効性を実証しています。私たちのコードはhttps://github.com/IshiKura-a/ModelGPTで利用可能です。

The rapid advancement of Large Language Models (LLMs) has revolutionized various sectors by automating routine tasks, marking a step toward the realization of Artificial General Intelligence (AGI). However, they still struggle to accommodate the diverse and specific needs of users and simplify the utilization of AI models for the average user. In response, we propose ModelGPT, a novel framework designed to determine and generate AI models specifically tailored to the data or task descriptions provided by the user, leveraging the capabilities of LLMs. Given user requirements, ModelGPT is able to provide tailored models at most 270x faster than the previous paradigms (e.g. all-parameter or LoRA finetuning). Comprehensive experiments on NLP, CV, and Tabular datasets attest to the effectiveness of our framework in making AI models more accessible and user-friendly. Our code is available at https://github.com/IshiKura-a/ModelGPT.

翻訳日:2024-02-21 18:50:45 公開日:2024-02-18

# FPGA上の局所ラプラシアンフィルタの高速化

Accelerating local laplacian filters on FPGAs ( http://arxiv.org/abs/2402.12407v1 )

ライセンス: Link先を確認

Shashwat Khandelwal, Ziaul Choudhury, Shashwat Shrivastava and Suresh Purini

(参考訳) 様々なエンハンスメント技術を用いて処理された画像には、エッジ劣化やハロなどの不要なアーティファクトがしばしば生じる。これらのアーティファクトは画像の品質を損ないうるため、写真応用にとって大きな問題となる。画像処理の分野ではエッジアウェア技術が数多く提案されている。しかし、これらは複雑な最適化や後処理の方法の適用を必要とする。局所ラプラシアンフィルタリングは、単純なガウスピラミッドとラプラシアンピラミッドの構築を含むエッジ対応画像処理技術である。この技術は、画像のディテール平滑化、ディテール強調、トーンマッピング、逆トーンマッピングに、アーティファクトを生じさせることなくうまく適用できる。しかし、このアプローチの問題は計算コストが高いことだ。そのため、マルチコアCPUとGPUを用いた並列化方式が提案されている。よく知られているように、これらは電力効率が高くなく、FPGA上のよく設計されたハードウェアアーキテクチャはワット当たり性能の指標でより良い結果を出すことができる。本稿では,オンチップFPGAリソースの利用を最小化しつつ,局所ラプラシアンフィルタリングアルゴリズムで利用可能な並列性を完全に活用するハードウェアアクセラレータを提案する。 Virtex-7 FPGAでは、最適化されたベースラインCPU実装と比較して、1MBの画像の処理で7.5倍のスピードアップが得られる。私たちの知る限り、局所ラプラシアンフィルタリング問題に対するハードウェアアクセラレータは、これまで研究文献で提案されていない。

Images processed with various enhancement techniques often suffer from edge degradation and other unwanted artifacts such as halos. These artifacts pose a major problem for photographic applications, where they can degrade the quality of an image. There is a plethora of edge-aware techniques proposed in the field of image processing. However, these require the application of complex optimization or post-processing methods. Local Laplacian Filtering is an edge-aware image processing technique that involves the construction of simple Gaussian and Laplacian pyramids. This technique can be successfully applied for detail smoothing, detail enhancement, tone mapping and inverse tone mapping of an image while keeping it artifact-free. The problem with this approach, though, is that it is computationally expensive. Hence, parallelization schemes using multi-core CPUs and GPUs have been proposed. As is well known, they are not power-efficient, and a well-designed hardware architecture on an FPGA can do better on the performance per watt metric. In this paper, we propose a hardware accelerator, which fully exploits the available parallelism in the Local Laplacian Filtering algorithm, while minimizing the utilization of on-chip FPGA resources. On a Virtex-7 FPGA, we obtain a 7.5x speed-up to process a 1 MB image when compared to an optimized baseline CPU implementation. To the best of our knowledge, no other hardware accelerator has been proposed in the research literature for the Local Laplacian Filtering problem.

翻訳日:2024-02-21 18:50:28 公開日:2024-02-18
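
The pyramid construction that the abstract above builds on can be sketched in one dimension. This is only the decomposition and reconstruction step, not the local Laplacian filter or the FPGA design. To stay short it assumes a two-tap box average instead of a true Gaussian kernel, nearest-neighbour upsampling, and a signal whose length is divisible by two at every level; the function names are hypothetical.

```python
def downsample(signal):
    """Halve the resolution by averaging adjacent pairs (box filter stand-in)."""
    return [(signal[i] + signal[i + 1]) / 2.0
            for i in range(0, len(signal) - 1, 2)]


def upsample(signal, length):
    """Double the resolution by repeating each sample, trimmed to `length`."""
    out = []
    for v in signal:
        out.extend([v, v])
    return out[:length]


def build_laplacian_pyramid(signal, levels):
    """Return detail bands from fine to coarse, followed by the low-pass residual."""
    pyramid = []
    current = list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        predicted = upsample(coarse, len(current))
        # Detail band: what the coarse level fails to predict.
        pyramid.append([c - p for c, p in zip(current, predicted)])
        current = coarse
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert the decomposition by adding detail bands back, coarse to fine."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        up = upsample(current, len(detail))
        current = [u + d for u, d in zip(up, detail)]
    return current
```

Each level depends only on the level below it, and samples within a level are computed independently; that independence is the kind of parallelism a hardware pipeline can exploit.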

# 寛大な専門家としての教師 : 教師非依存のデータフリー知識蒸留

Teacher as a Lenient Expert: Teacher-Agnostic Data-Free Knowledge Distillation ( http://arxiv.org/abs/2402.12406v1 )

ライセンス: Link先を確認

Hyunjune Shin, Dong-Wan Choi

(参考訳) data-free knowledge distillation (dfkd) は、学習済みの知識を、元のデータを使わずに、ジェネレータの助けを借りて学生モデルに蒸留することを目的としている。このようなデータのないシナリオでは、DFKDの安定した性能を達成することが不可欠である。残念ながら,既存のDFKD法は様々な教師モデルに非常に敏感であり,よく訓練された教師モデルを用いても,蒸留の破滅的な失敗を示すことがある。 DFKDのジェネレータは,クラスプライアロスと対角損失の最小化という従来の代表的戦略を用いて,正確かつ多様なサンプルを生成することが常に保証されているわけではない。実験では,クラス優先が生成したサンプルの多様性を減少させるだけでなく,教師モデルによって予期せぬほど低品質なサンプルを生成する問題に完全に対処できないことに着目した。本稿では,教師モデルによらず,より堅牢で安定した性能を目指して,教師に依存しないデータフリー知識蒸留法(TA-DFKD)を提案する。私たちの基本的な考え方は、ジェネレータにクラス優先を強制する厳格な監督者ではなく、教師モデルにサンプルを評価するための寛大な専門家の役割を割り当てることです。具体的には,教師モデルによって検証されたクリーンなサンプルのみを取り出すサンプル選択手法を,多様なサンプル生成のパワーに制約を課さずに設計する。実験により,既存のDFKD法よりも高い性能を示しながら,様々な教師モデルにおける頑健さと訓練安定性を両立させることができた。

Data-free knowledge distillation (DFKD) aims to distill pretrained knowledge to a student model with the help of a generator without using original data. In such data-free scenarios, achieving stable performance of DFKD is essential due to the unavailability of validation data. Unfortunately, this paper has discovered that existing DFKD methods are quite sensitive to different teacher models, occasionally showing catastrophic failures of distillation, even when using well-trained teacher models. Our observation is that the generator in DFKD is not always guaranteed to produce precise yet diverse samples using the existing representative strategy of minimizing both class-prior and adversarial losses. Through our empirical study, we focus on the fact that class-prior not only decreases the diversity of generated samples, but also cannot completely address the problem of generating unexpectedly low-quality samples depending on teacher models. In this paper, we propose the teacher-agnostic data-free knowledge distillation (TA-DFKD) method, with the goal of more robust and stable performance regardless of teacher models. Our basic idea is to assign the teacher model a lenient expert role for evaluating samples, rather than a strict supervisor that enforces its class-prior on the generator. Specifically, we design a sample selection approach that takes only clean samples verified by the teacher model without imposing restrictions on the power of generating diverse samples. Through extensive experiments, we show that our method successfully achieves both robustness and training stability across various teacher models, while outperforming the existing DFKD methods.

翻訳日:2024-02-21 18:50:08 公開日:2024-02-18

# scInterpreter: セル型アノテーションのためのscRNA-seqデータ解釈のための大規模言語モデルのトレーニング

scInterpreter: Training Large Language Models to Interpret scRNA-seq Data for Cell Type Annotation ( http://arxiv.org/abs/2402.12405v1 )

ライセンス: Link先を確認

Cong Li, Meng Xiao, Pengfei Wang, Guihai Feng, Xin Li, Yuanchun Zhou

(参考訳) 単一セルのオミックデータを直接読み書きする上で、既存の大規模言語モデルの固有の制限にもかかわらず、基礎モデルとして重要な可能性と柔軟性を示している。本研究は、単一細胞RNAシークエンシングデータにおいて、細胞型を解釈し、区別する機能を備えた大規模言語モデルの訓練および適応方法に焦点を当てる。予備研究の結果,これらの基礎モデルが既知の細胞型を正確に分類し,新しい生物学的知見を明らかにする効果的なツールとしての大規模言語モデルの可能性を示した。

Despite the inherent limitations of existing Large Language Models in directly reading and interpreting single-cell omics data, they demonstrate significant potential and flexibility as the Foundation Model. This research focuses on how to train and adapt the Large Language Model with the capability to interpret and distinguish cell types in single-cell RNA sequencing data. Our preliminary research results indicate that these foundational models excel in accurately categorizing known cell types, demonstrating the potential of the Large Language Models as effective tools for uncovering new biological insights.

翻訳日:2024-02-21 18:49:39 公開日:2024-02-18

# Deep-Lock: ディープニューラルネットワークのセキュアな認証

Deep-Lock: Secure Authorization for Deep Neural Networks ( http://arxiv.org/abs/2008.05966v2 )

ライセンス: Link先を確認

Manaar Alam and Sayandeep Saha and Debdeep Mukhopadhyay and Sandip Kundu

(参考訳) 訓練されたディープニューラルネットワーク(DNN)モデルは、いくつかのビジネスモデルにおいて価値のある知的特性(IP)と見なされている。このようなDNNモデルのIP盗難防止と不正使用は、業界によって大きな関心を集めている。本稿では,鍵型モデルロック方式を提案することで,鍵型モデルが正しい秘密鍵を適用した場合にのみ正常に機能することを保証することで,DNNモデルの不正使用を防止する問題に対処する。提案方式はDeep-Lockと呼ばれ、S-Boxesと優れたセキュリティ特性を利用して、訓練済みのDNNモデルのパラメータを鍵スケジューリングアルゴリズムを介してマスターキーから生成される秘密鍵で暗号化する。結果として、暗号化された重みの密度の高いネットワークは、モデル微調整攻撃に対して堅牢である。最後に、Deep-LockはDNNモデルの構造とトレーニングを一切必要とせず、DNNの既存のソフトウェアおよびハードウェア実装すべてに適用できる。

Trained Deep Neural Network (DNN) models are considered valuable Intellectual Properties (IP) in several business models. Prevention of IP theft and unauthorized usage of such DNN models has been raised as a significant concern by industry. In this paper, we address the problem of preventing unauthorized usage of DNN models by proposing a generic and lightweight key-based model-locking scheme, which ensures that a locked model functions correctly only upon applying the correct secret key. The proposed scheme, known as Deep-Lock, utilizes S-Boxes with good security properties to encrypt each parameter of a trained DNN model with secret keys generated from a master key via a key scheduling algorithm. The resulting dense network of encrypted weights is found robust against model fine-tuning attacks. Finally, Deep-Lock does not require any intervention in the structure and training of the DNN models, making it applicable for all existing software and hardware implementations of DNN.

翻訳日:2024-02-21 07:53:58 公開日:2024-02-18

# 3D VRスケッチから3D形状への検索に向けて

Towards 3D VR-Sketch to 3D Shape Retrieval ( http://arxiv.org/abs/2209.10020v2 )

ライセンス: Link先を確認

Ling Luo, Yulia Gryaditskaya, Yongxin Yang, Tao Xiang, Yi-Zhe Song

(参考訳) 無料のオンライン3D形状コレクションは、3D検索の研究を規定している。しかし、活発な議論が続いている。 (i)検索をトリガーする最良の入力モダリティ、及び (ii)そのような検索の究極の使用シナリオ。本稿では,3次元スケッチを入力モダリティとして用い,検索を行うVRシナリオを提案する。したがって、究極のビジョンは、ユーザーがvr環境でエアドルリングすることで3dモデルを自由に取得できることだ。この新しい3dvr-sketch to 3d shape searchの問題を初めて見たとき、私たちは4つの貢献をした。まず、VRユーティリティをコーディングして、3DVRスケッチを収集し、検索を行う。第二に、ModelNetから2つの形状カテゴリーについて、最初の167ドルの3DVRスケッチを収集する。第3に,深層ネットワークを学習するために,抽象レベルが異なる人間の3Dスケッチの合成データセットを作成する手法を提案する。最後に,3次元の形状検索と3次元の形状検索とは対照的に,3次元の形状検索と3次元の立体スケッチのスパースで抽象的な性質により,3次元の形状検索に優れた性能を示すことを示す。これらのコントリビュートが、この課題に対する今後の試みの実現に一役買うと私たちは信じています。 VRインターフェース、コード、データセットはhttps://tinyurl.com/3DSketch3DVで入手できる。

The growth of free online 3D shape collections has driven research on 3D retrieval. There has, however, been active debate on (i) what the best input modality is to trigger retrieval, and (ii) the ultimate usage scenario for such retrieval. In this paper, we offer a different perspective towards answering these questions -- we study the use of 3D sketches as an input modality and advocate a VR-scenario where retrieval is conducted. Thus, the ultimate vision is that users can freely retrieve a 3D model by air-doodling in a VR environment. As a first stab at this new 3D VR-sketch to 3D shape retrieval problem, we make four contributions. First, we code a VR utility to collect 3D VR-sketches and conduct retrieval. Second, we collect the first set of 167 3D VR-sketches on two shape categories from ModelNet. Third, we propose a novel approach to generate a synthetic dataset of human-like 3D sketches of different abstract levels to train deep networks. Finally, we compare the common multi-view and volumetric approaches: we show that, in contrast to 3D shape to 3D shape retrieval, volumetric point-based approaches exhibit superior performance on 3D sketch to 3D shape retrieval due to the sparse and abstract nature of 3D VR-sketches. We believe these contributions will collectively serve as enablers for future attempts at this problem. The VR interface, code and datasets are available at https://tinyurl.com/3DSketch3DV.

翻訳日:2024-02-21 07:50:26 公開日:2024-02-18

# ログデータを用いた半教師付きバッチ学習

Semi-supervised Batch Learning From Logged Data ( http://arxiv.org/abs/2209.07148v3 )

ライセンス: Link先を確認

Gholamali Aminian, Armin Behnamnia, Roberto Vega, Laura Toni, Chengchun Shi, Hamid R. Rabiee, Omar Rivasplata, Miguel R. D. Rodrigues

(参考訳) オフポリシー学習法は、各サンプルポイントのコンテキスト、アクション、フィードバック(コストまたは報酬)を含むログデータからポリシーを学ぶことを意図している。本研究は, リスク最小化フレームワークの構築であり, また, 妥当性スコアへのアクセスも想定している。本稿では,いくつかのサンプルに対してフィードバックが欠落している問題に対する学習方法を提案する。我々は、このタイプの学習を、ログデータから半教師付きバッチ学習と呼び、広範囲のアプリケーションドメインで発生する。このような学習問題に対処するために、逆確率スコア推定器の下で真リスクの新たな上限を導出する。このバウンダリを用いて、正規化項がフィードバックに依存しないログデータを用いた半教師付きバッチ学習手法を提案し、その結果、ログ化された不足フィードバックデータを用いて評価できる。その結果、フィードバックは一部のサンプルにのみ存在するが、不足したフィードバックサンプルを活用することで学習ポリシーを学ぶことができる。ベンチマークデータセットから得られた実験の結果は、これらのアルゴリズムがロギングポリシーよりも優れたパフォーマンスでポリシーを達成することを示している。

Off-policy learning methods are intended to learn a policy from logged data, which includes context, action, and feedback (cost or reward) for each sample point. In this work, we build on the counterfactual risk minimization framework, which also assumes access to propensity scores. We propose learning methods for problems where feedback is missing for some samples, so there are samples with feedback and samples missing-feedback in the logged data. We refer to this type of learning as semi-supervised batch learning from logged data, which arises in a wide range of application domains. We derive a novel upper bound for the true risk under the inverse propensity score estimator to address this kind of learning problem. Using this bound, we propose a regularized semi-supervised batch learning method with logged data where the regularization term is feedback-independent and, as a result, can be evaluated using the logged missing-feedback data. Consequently, even though feedback is only present for some samples, a learning policy can be learned by leveraging the missing-feedback samples. The results of experiments derived from benchmark datasets indicate that these algorithms achieve policies with better performance in comparison with logging policies.

翻訳日:2024-02-21 07:49:49 公開日:2024-02-18
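
The inverse propensity score (IPS) estimator named in the abstract above can be sketched as follows. This is the standard IPS estimate computed on the samples that have feedback, plus one simple feedback-independent quantity that can also be evaluated on missing-feedback samples. That second function is an illustrative assumption, not the regularizer derived in the paper, and the record layout (`context`, `action`, `propensity`, `feedback`) is hypothetical.

```python
def ips_estimate(logged, target_policy):
    """IPS estimate of expected feedback under `target_policy`.

    Uses only samples whose feedback is present (not None).
    `target_policy(context, action)` returns the probability of `action`.
    """
    labeled = [s for s in logged if s["feedback"] is not None]
    if not labeled:
        raise ValueError("no samples with feedback")
    total = 0.0
    for s in labeled:
        # Importance weight: target probability over logging propensity.
        weight = target_policy(s["context"], s["action"]) / s["propensity"]
        total += weight * s["feedback"]
    return total / len(labeled)


def mean_importance_weight(logged, target_policy):
    """Average importance weight over ALL samples, with or without feedback.

    Feedback-independent, so missing-feedback samples contribute too.
    Illustrative only; not the paper's regularization term.
    """
    weights = [target_policy(s["context"], s["action"]) / s["propensity"]
               for s in logged]
    return sum(weights) / len(weights)
```

A mean importance weight far from 1 suggests the target policy has drifted away from the logging policy, which is when IPS estimates tend to become high-variance.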

# テクスチャ・サリエンシー適応型注意を画像の漫画化に組み込む学習

Learning to Incorporate Texture Saliency Adaptive Attention to Image Cartoonization ( http://arxiv.org/abs/2208.01587v4 )

ライセンス: Link先を確認

Xiang Gao, Yuqi Zhang, and Yingjie Tian

(参考訳) 画像の漫画化は、近ごろ、教師なしのイメージ・ツー・イメージ翻訳の観点から、特徴ある漫画スタイル(クリアエッジ、スムーズなカラーシェーディング、抽象的な微細構造など)を正確に捉え、十分に伝達することが固有の課題である、生成的敵ネットワーク(GAN)に支配されている。既存の高度なモデルは、エッジを逆方向に推進する学習、スタイル伝達損失の導入、あるいは複数の表現空間からスタイルを整合させる学習により、漫画化効果を高めようとする。本稿では,より鮮明かつ鮮明なマンガ化効果が,基本的対向損失のみで容易に達成できることを実証する。漫画のスタイルが漫画のテクスチャ・サレントなローカル画像領域でより明確であることを示すため,通常の画像レベルと平行して,漫画のテクスチャの特徴をよりよく認識し伝達するために,漫画のテクスチャ・サレントなローカルパッチに対する逆学習を制限する領域レベルの逆学習ブランチを構築した。そこで, マンガ・テクスチュア・サリエンシ・サンプラー (CTSS) モジュールを提案し, トレーニングデータからマンガ・テクスチュア・サリエントパッチを動的にサンプリングする。広範な実験により,画像マンガ化における関連する手法の欠如成分として,敵対的学習におけるテクスチャ・サリエンシー適応的注意が,特に高分解能入力画像において,画像マンガのスタイライゼーションの促進と向上に重要であることを実証した。

Image cartoonization is recently dominated by generative adversarial networks (GANs) from the perspective of unsupervised image-to-image translation, in which an inherent challenge is to precisely capture and sufficiently transfer characteristic cartoon styles (e.g., clear edges, smooth color shading, abstract fine structures, etc.). Existing advanced models try to enhance cartoonization effect by learning to promote edges adversarially, introducing style transfer loss, or learning to align style from multiple representation space. This paper demonstrates that more distinct and vivid cartoonization effect could be easily achieved with only basic adversarial loss. Observing that cartoon style is more evident in cartoon-texture-salient local image regions, we build a region-level adversarial learning branch in parallel with the normal image-level one, which constrains adversarial learning on cartoon-texture-salient local patches for better perceiving and transferring cartoon texture features. To this end, a novel cartoon-texture-saliency-sampler (CTSS) module is proposed to dynamically sample cartoon-texture-salient patches from training data. With extensive experiments, we demonstrate that texture saliency adaptive attention in adversarial learning, as a missing ingredient of related methods in image cartoonization, is of significant importance in facilitating and enhancing image cartoon stylization, especially for high-resolution input pictures.

翻訳日:2024-02-21 07:49:30 公開日:2024-02-18

# 融合ラッソによるグラフ上の分散推定

Variance estimation in graphs with the fused lasso ( http://arxiv.org/abs/2207.12638v3 )

ライセンス: Link先を確認

Oscar Hernan Madrid Padilla

(参考訳) 一般のグラフ構造をもつ問題における分散推定の問題について検討する。まず、等分散(homoscedastic)の場合について、一般グラフの分散を一致推定できる線形時間推定器を開発する。平均信号の全変動が標準的なスケーリングをもつ場合,我々の推定器がチェーングラフと2次元グリッドグラフにおいてミニマックスレートを達成することを示す。さらに、モーメント条件と誤差の裾の挙動に関する上界のもとで、一般グラフにおける融合ラッソ推定器の平均二乗誤差性能に対する一般的な上界を与える。これらの上界により、誤差が劣ガウス確率変数であるという仮定のもとでのみ成り立つことが知られていた融合ラッソに関する多くの既存結果を、劣指数(sub-exponential)分布のようなより広い分布のクラスへ一般化できる。これらの上界を利用して、不等分散(heteroscedastic)の場合に分散の信号を推定する単純な全変動正則化推定器を研究する。また,我々の不等分散推定器が,グリッドグラフおよび$K$近傍グラフにおいて有界変動の信号を推定する際にミニマックスレートを達成することを示す下界を与え,この推定器が任意の連結グラフにおいて分散の一致推定量であることを示す。

We study the problem of variance estimation in general graph-structured problems. First, we develop a linear time estimator for the homoscedastic case that can consistently estimate the variance in general graphs. We show that our estimator attains minimax rates for the chain and 2D grid graphs when the mean signal has total variation with canonical scaling. Furthermore, we provide general upper bounds on the mean squared error performance of the fused lasso estimator in general graphs under a moment condition and a bound on the tail behavior of the errors. These upper bounds allow us to generalize for broader classes of distributions, such as sub-exponential, many existing results on the fused lasso that are only known to hold with the assumption that errors are sub-Gaussian random variables. Exploiting our upper bounds, we then study a simple total variation regularization estimator for estimating the signal of variances in the heteroscedastic case. We also provide lower bounds showing that our heteroscedastic variance estimator attains minimax rates for estimating signals of bounded variation in grid graphs, and $K$-nearest neighbor graphs, and the estimator is consistent for estimating the variances in any connected graph.

翻訳日:2024-02-21 07:48:25 公開日:2024-02-18
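
To show what a linear-time variance estimator on a graph can look like, the sketch below uses the classical difference-based estimator $\hat\sigma^2 = \frac{1}{2|E|}\sum_{(i,j)\in E}(y_i - y_j)^2$. This is an assumption chosen for illustration; the abstract above does not spell out the paper's estimator, which may differ. The idea is that when the mean signal varies little across edges, differences of neighbouring observations are dominated by noise, and each has expectation close to $2\sigma^2$.

```python
def chain_edges(n):
    """Edges of a chain (path) graph on nodes 0..n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def difference_variance_estimate(y, edges):
    """Difference-based variance estimate; one pass over the edge list.

    Assumes homoscedastic noise and a mean signal with small total
    variation along the edges (illustrative, not the paper's estimator).
    """
    if not edges:
        raise ValueError("graph has no edges")
    total = sum((y[i] - y[j]) ** 2 for i, j in edges)
    return total / (2.0 * len(edges))
```

The cost is proportional to the number of edges, so the same function applies unchanged to grid or nearest-neighbour graphs once their edge lists are supplied.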

# 人間の視線シフトの内在的コストによるサリエンシーモデルの次の注視点予測の改善

Improving saliency models' predictions of the next fixation with humans' intrinsic cost of gaze shifts ( http://arxiv.org/abs/2207.04250v3 )

ライセンス: Link先を確認

Florian Kadner, Tobias Thomas, David Hoppe and Constantin A. Rothkopf

(参考訳) 画像領域に対する人間の優先順位付けは、サリエンシーマップを用いて時間不変な形で、あるいはスキャンパスモデルを用いて逐次的にモデル化することができる。しかしながら、どちらのモデルもいくつかのベンチマークやデータセットで着実に改善されているものの、人間の視線を予測するには依然として大きなギャップがある。本稿では,このギャップを減らすために,2つの最近の進展を活用する。すなわち,次の視線目標を予測するための原理的枠組みを確立する理論的解析と,画像の内容とは無関係な視線移動の人的コストの実証的測定である。任意の静的サリエンシーマップを動的な履歴依存の価値マップの列に変換し,視線シフトのたびに再計算するアルゴリズムを,逐次意思決定の枠組みで導入する。これらのマップは、 1) 任意のサリエンシーモデルによって提供されるサリエンシーマップ、 2) 眼球運動の大きさと方向の嗜好を定量化する、最近測定された人的コスト関数、 3) 視線シフトのたびに変化する逐次的な探索ボーナス、に基づいている。この探索ボーナスの空間的範囲と時間的減衰のパラメータは、人間の視線データから推定される。これら3つのコンポーネントの相対的な寄与はMIT1003データセット上でNSSスコアについて最適化されており、3つの画像データセット上の5つの最先端サリエンシーモデルについて、NSSおよびAUCスコアでの次の視線目標の予測を有意に上回るのに十分である。このように我々は、人間の視線嗜好の実装を提供し、これは人間の次の視線目標に対する任意のサリエンシーモデルの予測を改善するために使用できる。

The human prioritization of image regions can be modeled in a time invariant fashion with saliency maps or sequentially with scanpath models. However, while both types of models have steadily improved on several benchmarks and datasets, there is still a considerable gap in predicting human gaze. Here, we leverage two recent developments to reduce this gap: theoretical analyses establishing a principled framework for predicting the next gaze target and the empirical measurement of the human cost for gaze switches independently of image content. We introduce an algorithm in the framework of sequential decision making, which converts any static saliency map into a sequence of dynamic history-dependent value maps, which are recomputed after each gaze shift. These maps are based on 1) a saliency map provided by an arbitrary saliency model, 2) the recently measured human cost function quantifying preferences in magnitude and direction of eye movements, and 3) a sequential exploration bonus, which changes with each subsequent gaze shift. The parameters of the spatial extent and temporal decay of this exploration bonus are estimated from human gaze data. The relative contributions of these three components were optimized on the MIT1003 dataset for the NSS score and are sufficient to significantly outperform predictions of the next gaze target on NSS and AUC scores for five state of the art saliency models on three image data sets. Thus, we provide an implementation of human gaze preferences, which can be used to improve arbitrary saliency models' predictions of humans' next gaze targets.

翻訳日:2024-02-21 07:48:01 公開日:2024-02-18

# 適応型クラスアクティベーションマッピングによるマルチビュー機能拡張

Multi-view Feature Augmentation with Adaptive Class Activation Mapping ( http://arxiv.org/abs/2206.12943v4 )

ライセンス: Link先を確認

Xiang Gao, Yingjie Tian, and Zhiquan Qi

(参考訳) モデル性能を向上させるために,複数ビューの局所的特徴を抽出し,活用する画像分類のためのエンドツーエンド・トレーニング可能な機能拡張モジュールを提案する。グローバル平均プーリング(GAP)を用いて,グローバルビューのみからベクトル化された特徴を抽出するのと異なり,モデルロバスト性を改善するため,多様な多視点局所特徴をサンプリング・アンサンブルすることを提案する。今回提案したAdaCAM (Adaptive Class Activation Mapping, 適応型クラス活性化マッピング) を通じて, 特徴マップのクラス識別ローカル領域に効率よく適応的に対応できる, 単純な補助的分類器ヘッド(1$\times$1畳み込み層を含む)を組み込んだ。広範な実験は、マルチビュー機能拡張モジュールによって達成された一貫性と注目すべきパフォーマンスの向上を示しています。

We propose an end-to-end-trainable feature augmentation module built for image classification that extracts and exploits multi-view local features to boost model performance. Different from using global average pooling (GAP) to extract vectorized features from only the global view, we propose to sample and ensemble diverse multi-view local features to improve model robustness. To sample class-representative local features, we incorporate a simple auxiliary classifier head (comprising only one 1$\times$1 convolutional layer) which efficiently and adaptively attends to class-discriminative local regions of feature maps via our proposed AdaCAM (Adaptive Class Activation Mapping). Extensive experiments demonstrate consistent and noticeable performance gains achieved by our multi-view feature augmentation module.

翻訳日:2024-02-21 07:46:35 公開日:2024-02-18
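
The link between a 1×1 convolutional classifier head, global average pooling (GAP) and class activation maps in the abstract above can be checked with a few lines of code. The sketch below implements standard class activation mapping, not the paper's AdaCAM, whose adaptive component is not described here. Feature maps are plain nested lists and all names are hypothetical. It shows that averaging a class activation map gives the same score as applying the class weights to GAP features, because both operations are linear.

```python
def global_average_pool(fmap):
    """Mean of a single H x W feature map."""
    h, w = len(fmap), len(fmap[0])
    return sum(sum(row) for row in fmap) / (h * w)


def class_activation_map(features, class_weights):
    """Weighted sum of K feature maps; equivalent to a 1x1 convolution for one class.

    `features` is a list of K maps (each H x W); `class_weights` has K entries.
    """
    h, w = len(features[0]), len(features[0][0])
    return [[sum(wk * f[y][x] for wk, f in zip(class_weights, features))
             for x in range(w)] for y in range(h)]


def gap_class_score(features, class_weights):
    """Class score computed the usual way: GAP first, then a linear layer."""
    pooled = [global_average_pool(f) for f in features]
    return sum(wk * p for wk, p in zip(class_weights, pooled))
```

Because the map keeps spatial resolution, its high-valued locations indicate which local regions drive the class score, which is what makes it usable for sampling class-discriminative local features.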

# SMEMO: 軌道予測のためのソーシャルメモリ

SMEMO: Social Memory for Trajectory Forecasting ( http://arxiv.org/abs/2203.12446v2 )

ライセンス: Link先を確認

Francesco Marchetti, Federico Becattini, Lorenzo Seidenari, Alberto Del Bimbo

(参考訳) 人間の相互作用の効果的なモデリングは、将来の軌跡のような行動を予測する際に最も重要である。それぞれの個人は、その動きによって周囲のエージェントに影響を与え、全員が衝突回避やグループフォローのような社会的に記述されていない規則に従う。本稿では,アルゴリズム的な観点から,すなわちデータ操作タスクとして問題を見ることにより,時間を通じて常に進化するそのようなインタラクションをモデル化する。本稿では,各エージェントに関する情報の連続書き込み,更新,リコールが可能な外部ストレージとして機能する,エンドツーエンドのトレーニング可能な作業メモリに基づくニューラルネットワークを提案する。提案手法は,異なるエージェントの動き間の説明可能な因果関係を学習し,複数の軌道予測データセットの最先端結果を得る。

Effective modeling of human interactions is of utmost importance when forecasting behaviors such as future trajectories. Each individual, through their motion, influences surrounding agents, since everyone obeys unwritten social rules such as collision avoidance or group following. In this paper we model such interactions, which constantly evolve through time, by looking at the problem from an algorithmic point of view, i.e. as a data manipulation task. We present a neural network based on an end-to-end trainable working memory, which acts as an external storage where information about each agent can be continuously written, updated and recalled. We show that our method is capable of learning explainable cause-effect relationships between motions of different agents, obtaining state-of-the-art results on multiple trajectory forecasting datasets.

翻訳日:2024-02-21 07:44:34 公開日:2024-02-18

# 離散力学系における非自明な最小固定点の探索

Finding Nontrivial Minimum Fixed Points in Discrete Dynamical Systems ( http://arxiv.org/abs/2301.04090v4 )

ライセンス: Link先を確認

Zirou Qiu, Chen Chen, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti

(参考訳) ネットワーク化された離散力学系は、伝染の拡散や、協調ゲームにおけるエージェントの意思決定をモデル化するためにしばしば用いられる。このような力学系の固定点は、システムが収束する状態(構成)を表す。望ましくない伝染(噂や誤情報など)の拡散においては、影響を受けるノードが少ない固定点への収束が望ましい目標である。このような考察に基づき、影響を受けるノード数が最小となるシステムの非自明な固定点を見つけるという、新しい最適化問題を定式化する。 P = NP でない限り、任意の定数 $\epsilon > 0$ に対して、この問題の解を係数 $n^{1-\epsilon}$ 以内で近似する多項式時間アルゴリズムは存在しないことを示す。この計算困難性に対処するため、問題を効率的に解ける特別な場合をいくつか特定する。さらに、適度な規模のネットワークに対して問題を解くための整数線形計画を導入する。より大規模なネットワークで問題を解くために、貪欲選択法を伴う一般的なヒューリスティックの枠組みを提案する。実世界のネットワークにおける広範な実験結果は、提案するヒューリスティックの有効性を示している。

Networked discrete dynamical systems are often used to model the spread of contagions and decision-making by agents in coordination games. Fixed points of such dynamical systems represent configurations to which the system converges. In the dissemination of undesirable contagions (such as rumors and misinformation), convergence to fixed points with a small number of affected nodes is a desirable goal. Motivated by such considerations, we formulate a novel optimization problem of finding a nontrivial fixed point of the system with the minimum number of affected nodes. We establish that, unless P = NP, there is no polynomial time algorithm for approximating a solution to this problem to within the factor $n^{1-\epsilon}$ for any constant $\epsilon > 0$. To cope with this computational intractability, we identify several special cases for which the problem can be solved efficiently. Further, we introduce an integer linear program to address the problem for networks of reasonable sizes. For solving the problem on larger networks, we propose a general heuristic framework along with greedy selection methods. Extensive experimental results on real-world networks demonstrate the effectiveness of the proposed heuristics.

翻訳日:2024-02-21 07:36:58 公開日:2024-02-18
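
To make the optimization problem in the abstract above concrete, here is a small brute-force sketch in Python. The abstract does not state the update rule, so the synchronous threshold rule over closed neighborhoods used below is an assumption, as are the function names and the toy path graph. Exhaustive search is exponential in the number of nodes and is only meant to illustrate what a minimum nontrivial fixed point is, not how the paper's ILP or heuristics work.

```python
from itertools import combinations

def step(adj, thresholds, state):
    """One synchronous update: node v becomes 1 iff at least thresholds[v]
    nodes in its closed neighborhood (v plus its neighbors) are currently 1."""
    new_state = []
    for v in range(len(adj)):
        active = state[v] + sum(state[u] for u in adj[v])
        new_state.append(1 if active >= thresholds[v] else 0)
    return tuple(new_state)

def min_nontrivial_fixed_point(adj, thresholds):
    """Exhaustively search nonzero configurations by increasing number of 1s."""
    n = len(adj)
    for size in range(1, n + 1):
        for nodes in combinations(range(n), size):
            state = tuple(1 if v in nodes else 0 for v in range(n))
            if step(adj, thresholds, state) == state:
                return state
    return None

# Toy instance: path graph 0 - 1 - 2 - 3 with threshold 2 at every node.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
thresholds = [2, 2, 2, 2]
best = min_nontrivial_fixed_point(adj, thresholds)
```

On this toy instance no single affected node sustains itself, so the smallest nontrivial fixed point has two affected nodes.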

# 時間系の絡み合いと特殊相対性

Time-System Entanglement and Special Relativity ( http://arxiv.org/abs/2212.13348v3 )

ライセンス: Link先を確認

Ngo Phuc Duc Loc

(参考訳) 古典物理学では空間と時間はほぼ対等に扱われるが、量子力学ではそうではないことが知られている。空間と時間の両方を量子的に記述することは、現実の量子的な性質を真に理解するうえで重要である。量子時間に関するページ・ウッターズ(Page-Wootters)機構は有望な出発点であり、それによれば、量子系の時間発展は、その系と量子的な時間自由度との間の絡み合いによって記述される。本稿では、ローレンツ変換によって誘起されるウィグナー回転により量子系と絡み合う量子ビット時計を考える。この時間系の絡み合いがローレンツブーストのラピディティにどのように依存するかを調べる。具体例として、ガウス型の運動量分布を持つスピン1/2粒子の場合を考える。また、時間系の絡み合いエントロピーをスピン・運動量の絡み合いエントロピーと比較し、前者が後者より小さいことを見いだした。

We know that space and time are treated almost equally in classical physics, but we also know that this is not the case for quantum mechanics. A quantum description of both space and time is important to really understand the quantum nature of reality. The Page-Wootters mechanism of quantum time is a promising starting point, according to which the evolution of the quantum system is described by the entanglement between it and quantum temporal degrees of freedom. In this paper, we consider a qubit clock that is entangled with a quantum system due to the Wigner rotation induced by Lorentz transformation. We study how this time-system entanglement depends on the rapidity of the Lorentz boost. We consider the case of a spin-1/2 particle with Gaussian momentum distribution as a concrete example. We also compare the time-system entanglement entropy with the spin-momentum entanglement entropy and find that the former is smaller than the latter.

翻訳日:2024-02-21 07:36:42 公開日:2024-02-18
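
The abstract above compares entanglement entropies of bipartite pure states. As a generic illustration (not the paper's clock-system calculation), the sketch below computes the von Neumann entanglement entropy of a pure state from its Schmidt coefficients. The function name, the choice of bits as the unit, and the two test states are assumptions for this example.

```python
import numpy as np

def entanglement_entropy(state, dim_a, dim_b):
    """Von Neumann entropy (in bits) of subsystem A for a pure state on A x B."""
    psi = np.asarray(state, dtype=complex).reshape(dim_a, dim_b)
    psi = psi / np.linalg.norm(psi)
    # Squared singular values are the eigenvalues of the reduced density matrix.
    probs = np.linalg.svd(psi, compute_uv=False) ** 2
    probs = probs[probs > 1e-12]
    return float(-np.sum(probs * np.log2(probs)))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>) / sqrt(2)
product = np.array([1, 0, 0, 0])             # |00>

bell_entropy = entanglement_entropy(bell, 2, 2)
product_entropy = entanglement_entropy(product, 2, 2)
```

A maximally entangled two-qubit state gives one bit of entropy and a product state gives zero, which bounds the values any qubit clock entangled with a system can reach.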

# 分割統治カーネルベース関数線形回帰の統計的最適性

Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression ( http://arxiv.org/abs/2211.10968v3 )

ライセンス: Link先を確認

Jiading Liu and Lei Shi

(参考訳) 再生核ヒルベルト空間(RKHS)における正則化関数線形回帰の従来の解析では、通常、対象関数がこのカーネル空間に含まれることが要求される。本稿では、対象関数が基礎となるRKHSに必ずしも属さない状況における、分割統治推定器の収束性能を検討する。分解に基づくスケーラブルなアプローチとして、関数線形回帰の分割統治推定器は、時間とメモリの両面でアルゴリズムの計算量を大幅に削減できる。我々は積分作用素アプローチを開発し、説明変数と対象関数のさまざまな正則性条件の下で、分割統治推定器による予測に対するシャープな有限標本上界を確立する。また、ミニマックス下界を構成することにより、導出したレートの漸近的最適性を証明する。最後に、ノイズなし推定器の収束を考察し、緩やかな条件の下でレートが任意に速くなりうることを示す。

Previous analysis of regularized functional linear regression in a reproducing kernel Hilbert space (RKHS) typically requires the target function to be contained in this kernel space. This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not necessarily reside in the underlying RKHS. As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory. We develop an integral operator approach to establish sharp finite sample upper bounds for prediction with divide-and-conquer estimators under various regularity conditions of explanatory variables and target function. We also prove the asymptotic optimality of the derived rates by building the mini-max lower bounds. Finally, we consider the convergence of noiseless estimators and show that the rates can be arbitrarily fast under mild conditions.

翻訳日:2024-02-21 07:34:35 公開日:2024-02-18
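
The divide-and-conquer idea in the abstract above can be sketched as: split the sample into blocks, fit a regularized kernel estimator on each block, and average the local estimators. The example below does this for ordinary scalar kernel ridge regression, which is a deliberate simplification and not the functional linear regression setting analyzed in the paper. The Gaussian kernel, bandwidth, regularization value, and synthetic data are all assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2 * bandwidth**2))

def fit_krr(x, y, bandwidth, lam):
    """Kernel ridge regression on one block; returns a prediction function."""
    k = gaussian_kernel(x, x, bandwidth)
    alpha = np.linalg.solve(k + lam * len(x) * np.eye(len(x)), y)
    return lambda x_new: gaussian_kernel(x_new, x, bandwidth) @ alpha

def divide_and_conquer_krr(x, y, num_blocks, bandwidth, lam, rng):
    """Fit one estimator per random block and average their predictions."""
    blocks = np.array_split(rng.permutation(len(x)), num_blocks)
    local = [fit_krr(x[idx], y[idx], bandwidth, lam) for idx in blocks]
    return lambda x_new: np.mean([f(x_new) for f in local], axis=0)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
predict = divide_and_conquer_krr(x, y, num_blocks=4, bandwidth=0.1, lam=1e-3, rng=rng)

grid = np.linspace(0, 1, 50)
mse = float(np.mean((predict(grid) - np.sin(2 * np.pi * grid)) ** 2))
```

Each block solves a linear system of size n/m instead of n, which is where the time and memory savings mentioned in the abstract come from.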

# データフローエンジンによる高最適化量子回路

Highly optimized quantum circuits synthesized via data-flow engines ( http://arxiv.org/abs/2211.07685v3 )

ライセンス: Link先を確認

Peter Rakyta, Gregory Morse, Jakab Nádori, Zita Majnay-Takács, Oskar Mencer, Zoltán Zimborás

(参考訳) 最小限のゲート演算数で量子プログラムを定式化することは、現在利用可能なノイズのある量子プロセッサから有意義な結果を得るために極めて重要である。本研究では、FPGA(Field Programmable Gate Array)ベースのデータフローエンジン(DFE)を用いて変分量子コンパイラをスケールアップし、最大$9$量子ビットのプログラムまで回路を合成するユースケースを示す。このゲート分解器は、単一量子ビット回転と制御付き2量子ビットゲートからなる任意の量子回路をFPGAチップ上でシミュレートするように設計された、新たに開発したDFE量子コンピュータシミュレータを利用する。 QISKITパッケージとのベンチマークでは、SQUANDERパッケージ(DFEアクセラレータサポート付き)が生成する回路の深さは平均で$97\%$小さく、回路の忠実度は$\sim10^{-4}$の誤差の範囲で1に近いままであった。

The formulation of quantum programs in terms of the fewest number of gate operations is crucial to retrieve meaningful results from the noisy quantum processors accessible these days. In this work, we demonstrate a use-case for Field Programmable Gate Array (FPGA) based data-flow engines (DFEs) to scale up variational quantum compilers to synthesize circuits for programs of up to $9$ qubits. This gate decomposer utilizes a newly developed DFE quantum computer simulator that is designed to simulate arbitrary quantum circuits consisting of single-qubit rotations and controlled two-qubit gates on FPGA chips. In our benchmark with the QISKIT package, the depth of the circuits produced by the SQUANDER package (with the DFE accelerator support) was lower by $97\%$ on average, while the fidelity of the circuits was still close to unity up to an error of $\sim10^{-4}$.

翻訳日:2024-02-21 07:34:18 公開日:2024-02-18

# デノイジング拡散からデノイジング・マルコフモデルへ

From Denoising Diffusions to Denoising Markov Models ( http://arxiv.org/abs/2211.03595v3 )

ライセンス: Link先を確認

Joe Benton, Yuyang Shi, Valentin De Bortoli, George Deligiannidis, Arnaud Doucet

(参考訳) デノイジング拡散は、優れた経験的性能を示す最先端の生成モデルである。データ分布をガウス分布へと拡散させ、このノイズ付加過程を逆転させることを学習して合成データ点を得る。デノイジング拡散は、スコアマッチングを用いた、ノイズ付加後のデータ密度の対数微分の近似に依存している。このようなモデルは、事前分布と尤度からのサンプリングしかできない場合に、近似的な事後シミュレーションを行うためにも利用できる。本稿では、このアプローチを広いクラスの空間へ一般化する統一的な枠組みを提案し、スコアマッチングの独自の拡張を導く。得られたモデルを様々な応用例で示す。

Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. They work by diffusing the data distribution into a Gaussian distribution and then learning to reverse this noising process to obtain synthetic datapoints. The denoising diffusion relies on approximations of the logarithmic derivatives of the noised data densities using score matching. Such models can also be used to perform approximate posterior simulation when one can only sample from the prior and likelihood. We propose a unifying framework generalising this approach to a wide class of spaces and leading to an original extension of score matching. We illustrate the resulting models on various applications.

翻訳日:2024-02-21 07:33:45 公開日:2024-02-18
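
The abstract above says that denoising diffusions noise the data toward a Gaussian and learn the logarithmic derivative (score) of the noised densities. The sketch below illustrates only the standard Euclidean Gaussian case that the paper generalizes, not the paper's framework: it applies a Gaussian noising step and evaluates the conditional score that serves as the regression target in denoising score matching. The parameter name `alpha_bar` (the retained signal fraction) and the synthetic data are assumptions.

```python
import numpy as np

def forward_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps with eps ~ N(0, I)."""
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps

def conditional_score(x_t, x0, alpha_bar):
    """Gradient in x_t of log N(x_t; sqrt(alpha_bar) * x0, (1 - alpha_bar) * I)."""
    return -(x_t - np.sqrt(alpha_bar) * x0) / (1.0 - alpha_bar)

rng = np.random.default_rng(0)
x0 = rng.normal(loc=2.0, scale=0.5, size=(1000, 2))   # toy data set
alpha_bar = 0.3
x_t, eps = forward_noise(x0, alpha_bar, rng)
target = conditional_score(x_t, x0, alpha_bar)
```

In this Gaussian case the target reduces to `-eps / sqrt(1 - alpha_bar)`, which is why such models are often trained to predict the added noise.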

# 未分割の調理ビデオからのレシピ生成

Recipe Generation from Unsegmented Cooking Videos ( http://arxiv.org/abs/2209.10134v2 )

ライセンス: Link先を確認

Taichi Nishimura and Atsushi Hashimoto and Yoshitaka Ushiku and Hirotaka Kameko and Shinsuke Mori

(参考訳) 本稿では、未分割の調理ビデオからのレシピ生成に取り組む。これは、(1)料理を完成させる過程の重要なイベントを抽出し、(2)抽出したイベントに対する文を生成することをエージェントに要求するタスクである。本タスクは、イベントを網羅的に検出し、それらに対する文を生成することを目的とした高密度ビデオキャプション(DVC)に似ている。しかし、DVCとは異なり、レシピ生成ではレシピのストーリーを意識することが不可欠であり、モデルは適切な数のイベントを正しい順序で抽出し、それらに基づいて正確な文を生成する必要がある。 DVCモデルの出力を分析した結果、(1)いくつかのイベントはレシピのストーリーとして採用できるものの、(2)それらのイベントに対して生成された文は視覚的な内容に基づいていないことを確認した。これを踏まえ、出力されたイベントからオラクルイベントを選択し、それらに対する文を再生成することで、正しいレシピを得ることを目標とする。そのために、DVCのイベントからオラクルイベントを選択するイベントセレクタと、それらに対する文を生成する文生成器を学習する、トランスフォーマーに基づくマルチモーダルな再帰的アプローチを提案する。さらに、より正確なレシピを生成するために、材料を入力に含めるようにモデルを拡張する。実験の結果、提案手法は最先端のDVCモデルを上回ることが示された。また、レシピをストーリーを意識した形でモデル化することにより、提案モデルが適切な数のイベントを正しい順序で出力することも確認した。

This paper tackles recipe generation from unsegmented cooking videos, a task that requires agents to (1) extract key events in completing the dish and (2) generate sentences for the extracted events. Our task is similar to dense video captioning (DVC), which aims at detecting events thoroughly and generating sentences for them. However, unlike DVC, in recipe generation, recipe story awareness is crucial, and a model should extract an appropriate number of events in the correct order and generate accurate sentences based on them. We analyze the output of the DVC model and confirm that although (1) several events are adoptable as a recipe story, (2) the generated sentences for such events are not grounded in the visual content. Based on this, we set our goal to obtain correct recipes by selecting oracle events from the output events and re-generating sentences for them. To achieve this, we propose a transformer-based multimodal recurrent approach of training an event selector and sentence generator for selecting oracle events from the DVC's events and generating sentences for them. In addition, we extend the model by including ingredients to generate more accurate recipes. The experimental results show that the proposed method outperforms state-of-the-art DVC models. We also confirm that, by modeling the recipe in a story-aware manner, the proposed model outputs the appropriate number of events in the correct order.

翻訳日:2024-02-21 07:32:28 公開日:2024-02-18

# 深層学習によるUK Biobank眼底画像からの有病および新規発症パーキンソン病の予測

Deep Learning Predicts Prevalent and Incident Parkinson's Disease From UK Biobank Fundus Imaging ( http://arxiv.org/abs/2302.06727v3 )

ライセンス: Link先を確認

Charlie Tran, Kai Shen, Kang Liu, Akshay Ashok, Adolfo Ramirez-Zamora, Jinghua Chen, Yulin Li, and Ruogu Fang

(参考訳) パーキンソン病は世界で最も急速に増加している神経疾患である。パーキンソン病のメカニズムを解明し診断を自動化する研究は、患者の治療を大幅に改善するだろう。現在の診断方法は高価であり、利用できる場面も限られている。本疾患が潜行性に、かつ前臨床段階から発症・進行することを考慮すると、望ましいスクリーニングは、医療的介入を可能にするために、症状の発現前であっても診断精度が高くなければならない。我々は、しばしば「脳への窓」と呼ばれる網膜眼底画像を、パーキンソン病の診断スクリーニング手法として取り上げる。UK Biobankの眼底画像からパーキンソン病を分類するために、従来の機械学習と深層学習の手法を体系的に評価した。その結果、パーキンソン病患者は、年齢と性別を一致させた健常者から、曲線下面積(AUC)0.77で識別できることが示された。この精度は、有病(prevalent)または新規発症(incident)のいずれのパーキンソン病を予測する場合にも維持される。説明可能性と信頼性は、局在するバイオマーカーの視覚的属性マップと、データ摂動に対するモデルの頑健性の定量的指標によって高められる。

Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable screening should be diagnostically accurate even before the onset of symptoms to allow medical interventions. We highlight retinal fundus imaging, often termed a window to the brain, as a diagnostic screening modality for Parkinson's disease. We conducted a systematic evaluation of conventional machine learning and deep learning techniques to classify Parkinson's disease from UK Biobank fundus imaging. Our results show that Parkinson's disease individuals can be differentiated from age and gender-matched healthy subjects with an Area Under the Curve (AUC) of 0.77. This accuracy is maintained when predicting either prevalent or incident Parkinson's disease. Explainability and trustworthiness are enhanced by visual attribution maps of localized biomarkers and quantified metrics of model robustness to data perturbations.

翻訳日:2024-02-21 07:22:51 公開日:2024-02-18
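
The headline number in the abstract above is an Area Under the Curve (AUC) of 0.77. As a reminder of what that metric measures, the sketch below computes the AUC as the fraction of (positive, negative) pairs that a classifier ranks correctly. The function name and the four-sample toy data are assumptions and have nothing to do with the UK Biobank results.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: three of the four positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

Read this way, an AUC of 0.77 means that a randomly chosen patient receives a higher score than a randomly chosen matched control about 77% of the time.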

# 近似輸送写像を用いたサンプリングについて

On Sampling with Approximate Transport Maps ( http://arxiv.org/abs/2302.04763v3 )

ライセンス: Link先を確認

Louis Grenioux, Alain Durmus, Éric Moulines, Marylou Gabrié

(参考訳) 輸送写像は、非自明な幾何構造を持つ分布を扱いやすい分布に変換することで、そのサンプリングを容易にする。このアプローチの可能性は、参照分布をターゲットへ押し出すように訓練されたディープニューラルネットワークでパラメータ化された写像である正規化フロー(NF)の発展によって高まっている。最近提案されたNF強化サンプラーは、(マルコフ連鎖)モンテカルロ法を、(i)フローからの提案サンプル、または(ii)フローに基づく再パラメータ化のいずれかと組み合わせる。いずれの場合も、学習した輸送写像の品質が性能を左右する。本研究は、これら2つのアプローチの相対的な強みと弱みを初めて明らかにする。マルチモーダルなターゲットは、フローに基づく提案によって中程度に高い次元まで確実に扱えると結論づける。対照的に、再パラメータ化に依存する手法はマルチモーダル性に苦戦するが、それ以外の高次元の設定や学習が不十分な場合にはより頑健である。さらに、ターゲットと提案分布の適合度の影響を示すために、独立メトロポリス・ヘイスティングス(Independent Metropolis-Hastings)サンプラーの混合時間に対する新しい定量的な境界を導出する。

Transport maps can ease the sampling of distributions with non-trivial geometries by transforming them into distributions that are easier to handle. The potential of this approach has risen with the development of Normalizing Flows (NF) which are maps parameterized with deep neural networks trained to push a reference distribution towards a target. NF-enhanced samplers recently proposed blend (Markov chain) Monte Carlo methods with either (i) proposal draws from the flow or (ii) a flow-based reparametrization. In both cases, the quality of the learned transport conditions performance. The present work clarifies for the first time the relative strengths and weaknesses of these two approaches. Our study concludes that multimodal targets can be reliably handled with flow-based proposals up to moderately high dimensions. In contrast, methods relying on reparametrization struggle with multimodality but are more robust otherwise in high-dimensional settings and under poor training. To further illustrate the influence of target-proposal adequacy, we also derive a new quantitative bound for the mixing time of the Independent Metropolis-Hastings sampler.

翻訳日:2024-02-21 07:22:18 公開日:2024-02-18
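
The abstract above refers to the Independent Metropolis-Hastings (IMH) sampler, in which proposals come from a fixed distribution such as a trained flow. The sketch below is a minimal 1-D version under stated assumptions: a wide Gaussian stands in for the learned flow, the target is a standard normal, and all function names are hypothetical. It is meant to show the accept/reject rule, not to reproduce the paper's experiments or bound.

```python
import numpy as np

def independent_mh(log_target, sample_proposal, log_proposal, num_steps, rng):
    """Independent Metropolis-Hastings: proposals do not depend on the current state."""
    x = sample_proposal(rng)
    log_w = log_target(x) - log_proposal(x)   # log importance weight of the current state
    samples, accepted = [], 0
    for _ in range(num_steps):
        y = sample_proposal(rng)
        log_w_y = log_target(y) - log_proposal(y)
        # Accept with probability min(1, w(y) / w(x)).
        if np.log(rng.uniform()) < log_w_y - log_w:
            x, log_w = y, log_w_y
            accepted += 1
        samples.append(x)
    return np.array(samples), accepted / num_steps

# Toy setup: standard normal target, wider Gaussian standing in for a learned flow.
log_target = lambda x: -0.5 * x**2                   # unnormalized log density
sample_proposal = lambda rng: 2.0 * rng.normal()
log_proposal = lambda x: -0.5 * (x / 2.0) ** 2       # unnormalized; constants cancel in the ratio

rng = np.random.default_rng(0)
samples, acc_rate = independent_mh(log_target, sample_proposal, log_proposal, 5000, rng)
```

The acceptance ratio depends only on the importance weights, so a proposal that matches the target poorly (large weight variance) makes the chain stick, which is the target-proposal adequacy the abstract alludes to.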

# WOMD-LiDAR:モーション予測のための生センサデータセットベンチマーク

WOMD-LiDAR: Raw Sensor Dataset Benchmark for Motion Forecasting ( http://arxiv.org/abs/2304.03834v2 )

ライセンス: Link先を確認

Kan Chen, Runzhou Ge, Hang Qiu, Rami Al-Rfou, Charles R. Qi, Xuanyu Zhou, Zoey Yang, Scott Ettinger, Pei Sun, Zhaoqi Leng, Mustafa Baniodeh, Ivan Bogun, Weiyue Wang, Mingxing Tan, Dragomir Anguelov

(参考訳) 広く採用されている動き予測データセットは、観測されたセンサ入力を3Dボックスやポリラインのような高レベルの抽象表現で置き換えている。これらのスパースな形状は、知覚システムの予測で元のシーンをアノテーションすることにより推定される。このような中間表現は、動き予測モデルの品質をコンピュータビジョンモデルの性能に結びつけてしまう。さらに、人間が設計した知覚と動き予測の間の明示的なインターフェースは、通常、元のセンサ入力に含まれる意味情報の一部しか渡さない。これらのモジュラーアプローチの影響を調べ、これらの制約を緩和する新しいパラダイムを設計し、エンドツーエンドの動き予測モデルの開発を加速するために、動き予測タスク向けに、大規模かつ高品質で多様なLiDARデータを用いてWaymo Open Motion Dataset(WOMD)を拡張した。新しい拡張データセットWOMD-LiDARは、それぞれ20秒間にわたる100,000以上のシーンで構成され、都市部や郊外のさまざまな地域で取得された、よく同期・校正された高品質のLiDAR点群を含む(https://waymo.com/open/data/motion/)。 Waymo Open Dataset(WOD)と比較して、WOMD-LiDARデータセットには100倍以上のシーンが含まれている。さらに、LiDARデータを動き予測モデルの学習に統合し、強力なベースラインを提供する。実験の結果、LiDARデータは動き予測タスクに改善をもたらすことが示された。WOMD-LiDARがエンドツーエンドの動き予測モデルを強化する新たな機会を提供することを期待している。

Widely adopted motion forecasting datasets substitute the observed sensory inputs with higher-level abstractions such as 3D boxes and polylines. These sparse shapes are inferred through annotating the original scenes with perception systems' predictions. Such intermediate representations tie the quality of the motion forecasting models to the performance of computer vision models. Moreover, the human-designed explicit interfaces between perception and motion forecasting typically pass only a subset of the semantic information present in the original sensory input. To study the effect of these modular approaches, design new paradigms that mitigate these limitations, and accelerate the development of end-to-end motion forecasting models, we augment the Waymo Open Motion Dataset (WOMD) with large-scale, high-quality, diverse LiDAR data for the motion forecasting task. The new augmented dataset WOMD-LiDAR consists of over 100,000 scenes that each spans 20 seconds, consisting of well-synchronized and calibrated high quality LiDAR point clouds captured across a range of urban and suburban geographies (https://waymo.com/open/data/motion/). Compared to Waymo Open Dataset (WOD), WOMD-LiDAR dataset contains 100x more scenes. Furthermore, we integrate the LiDAR data into the motion forecasting model training and provide a strong baseline. Experiments show that the LiDAR data brings improvement in the motion forecasting task. We hope that WOMD-LiDAR will provide new opportunities for boosting end-to-end motion forecasting models.

翻訳日:2024-02-21 07:12:37 公開日:2024-02-18

# 確率制御とゲームのための機械学習手法の最近の進歩

Recent Developments in Machine Learning Methods for Stochastic Control and Games ( http://arxiv.org/abs/2303.10257v2 )

ライセンス: Link先を確認

Ruimeng Hu, Mathieu Laurière

(参考訳) 確率的最適制御とゲームは、金融や経済学から社会科学、ロボット工学、エネルギー管理まで幅広い応用を持つ。実世界の応用の多くは複雑なモデルを伴い、それが洗練された数値手法の開発を促してきた。近年、確率制御問題やゲームを解くために、機械学習に基づく計算手法が開発されている。本レビューでは、高次元の場合や構造が非常に複雑な場合であっても、従来の数値手法で達成できる範囲を超えてこのような問題を解く可能性を切り開いた深層学習手法に焦点を当てる。主に連続時間・連続空間の設定を考える。新しいアプローチの多くは、高次元偏微分方程式や後退確率微分方程式を解くための最近のニューラルネットワークベースの手法、あるいは画期的な成果をもたらしたマルコフ決定過程に対するモデルフリー強化学習に基づいている。本稿では、これらの手法を紹介するとともに、機械学習と確率制御・ゲームの交差領域における最先端の研究を概説する。

Stochastic optimal control and games have a wide range of applications, from finance and economics to social sciences, robotics, and energy management. Many real-world applications involve complex models that have driven the development of sophisticated numerical methods. Recently, computational methods based on machine learning have been developed for solving stochastic control problems and games. In this review, we focus on deep learning methods that have unlocked the possibility of solving such problems, even in high dimensions or when the structure is very complex, beyond what traditional numerical methods can achieve. We consider mostly the continuous time and continuous space setting. Many of the new approaches build on recent neural-network-based methods for solving high-dimensional partial differential equations or backward stochastic differential equations, or on model-free reinforcement learning for Markov decision processes that have led to breakthrough results. This paper provides an introduction to these methods and summarizes the state-of-the-art works at the crossroad of machine learning and stochastic control and games.

翻訳日:2024-02-21 07:09:31 公開日:2024-02-18

# 時系列予測のためのマルチタスクメタラベル補正

Multi-task Meta Label Correction for Time Series Prediction ( http://arxiv.org/abs/2303.08103v3 )

ライセンス: Link先を確認

Luxuan Yang, Ting Gao, Wei Wei, Min Dai, Cheng Fang, Jinqiao Duan

(参考訳) 時系列分類は2つの避けられない問題に直面している。 1つは特徴情報が部分的であること、もう1つはラベル品質の低さであり、いずれもモデルの性能に影響を及ぼしうる。これらの問題に対処するため、マルチタスクの枠組みの下で、メタ学習による時系列データ向けのラベル補正手法を構築する。主な貢献は3つある。第1に、外側ループにおいて、2分岐ニューラルネットワークを用いてラベル補正モデルを学習する。モデルに依存しない内側ループでは、既存の分類モデルをマルチタスク方式で用い、メタ知識を共同で更新することで、複雑な時系列に対する適応的なラベリングを実現する。第2に、過去データの画像パターンと予測ホライズンにおけるデータの両方に対する新しいデータ可視化手法を考案する。最後に、XOM、S&P500、SZ50を含むさまざまな金融データセットで提案手法を検証する。その結果、提案手法は既存のいくつかのラベル補正手法よりも有効で正確であることが示された。

Time series classification faces two unavoidable problems. One is partial feature information and the other is poor label quality, which may affect model performance. To address the above issues, we create a label correction method for time series data with meta-learning under a multi-task framework. There are three main contributions. First, we train the label correction model with a two-branch neural network in the outer loop. In the model-agnostic inner loop, we use pre-existing classification models in a multi-task way and jointly update the meta-knowledge so as to help us achieve adaptive labeling on complex time series. Second, we devise new data visualization methods for both image patterns of the historical data and data in the prediction horizon. Finally, we test our method with various financial datasets, including XOM, S&P500, and SZ50. Results show that our method is more effective and accurate than some existing label correction techniques.

翻訳日:2024-02-21 07:09:16 公開日:2024-02-18

# EventNet-ITA: イベントのためのイタリア語フレーム解析

EventNet-ITA: Italian Frame Parsing for Events ( http://arxiv.org/abs/2305.10892v2 )

ライセンス: Link先を確認

Marco Rovera

(参考訳) 本稿では、イタリア語のイベントフレームを全文アノテーションした大規模なマルチドメインコーパスであるEventNet-ITAを紹介する。さらに、フレーム解析のための効率的なマルチラベル系列ラベリング手法を提示し、徹底的に評価する。個人的、社会的、歴史的な幅広い現象をカバーし、53,000以上のアノテーション付き文と200以上のモデル化されたフレームを持つEventNet-ITAは、イベントのフレーム解析のための公開リソースをイタリア語に提供する初めての体系的な試みであり、幅広い研究や応用タスクに有用である。提案手法は、計算要件を最小限に抑えたうえで、フレーム分類で0.9、フレーム要素分類で0.72という有望な厳密F1スコアを達成する。アノテーション付きコーパスとフレーム解析モデルはオープンライセンスで公開されている。

This paper introduces EventNet-ITA, a large, multi-domain corpus annotated full-text with event frames for Italian. Moreover, we present and thoroughly evaluate an efficient multi-label sequence labeling approach for Frame Parsing. Covering a wide range of individual, social and historical phenomena, with more than 53,000 annotated sentences and over 200 modeled frames, EventNet-ITA constitutes the first systematic attempt to provide the Italian language with a publicly available resource for Frame Parsing of events, useful for a broad spectrum of research and application tasks. Our approach achieves a promising 0.9 strict F1-score for frame classification and 0.72 for frame element classification, on top of minimizing computational requirements. The annotated corpus and the frame parsing model are released under open license.

翻訳日:2024-02-21 06:58:55 公開日:2024-02-18

# マルチキュービットシステムにおけるエンタングルメントの可視化

Visualizing Entanglement in multi-Qubit Systems ( http://arxiv.org/abs/2305.07596v4 )

ライセンス: Link先を確認

Jonas Bley, Eva Rexigel, Alda Arias, Nikolas Longen, Lars Krupp, Maximilian Kiefer-Emmanouilidis, Paul Lukowicz, Anna Donhauser, Stefan Küchemann, Jochen Kuhn, and Artur Widera

(参考訳) 量子情報科学とテクノロジーの分野では、量子状態と関連するプロセスの表現と視覚化は研究と教育の両方に不可欠である。この文脈では、特に数量子ビットのアンサンブルに焦点を当てる。有名なブロッホ球面や一般化など、シングルキュービットおよびマルチキュービットシステムの多くの強力な表現が存在する。ここでは、そのようなアンサンブルの表現として次元円記法を用い、量子ビットのいわゆる円記法と、n-粒子系をn-次元空間で表現するアイデアを適用する。分離可能性の数学的条件は量子状態の対称性を可視化し、数量子ビット系の絡み合いや様々な量子アルゴリズムに対する新しい視点を提供する。このようにして、次元記法は、数量子ビット系の非自明な量子絡み合い特性と過程をより広いオーディエンスに伝達する大きな可能性を約束し、これらの概念を直感的な量子洞察と形式的な数学的記述との橋渡しとして理解を深めることができる。

In the field of quantum information science and technology, the representation and visualization of quantum states and related processes are essential for both research and education. In this context, a focus especially lies on ensembles of few qubits. There exist many powerful representations for single-qubit and multi-qubit systems, such as the famous Bloch sphere and generalizations. Here, we utilize the dimensional circle notation as a representation of such ensembles, adapting the so-called circle notation of qubits and the idea of representing the n-particle system in an n-dimensional space. We show that the mathematical conditions for separability lead to symmetry conditions of the quantum state visualized, offering a new perspective on entanglement in few-qubit systems and therefore on various quantum algorithms. In this way, dimensional notations promise significant potential for conveying nontrivial quantum entanglement properties and processes in few-qubit systems to a broader audience, and could enhance understanding of these concepts as a bridge between intuitive quantum insight and formal mathematical descriptions.

翻訳日:2024-02-21 06:58:12 公開日:2024-02-18

# MLCopilot: 機械学習タスクの解決における大規模言語モデルのパワーの解放

MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks ( http://arxiv.org/abs/2304.14979v2 )

ライセンス: Link先を確認

Lei Zhang, Yuge Zhang, Kan Ren, Dongsheng Li, Yuqing Yang

(参考訳) 機械学習(ML)の分野は広く普及し、特定のシナリオにMLを適用することへの大きな需要が生まれているが、それはいまだに高コストで容易ではない。 MLタスクの解決を自動化する主要なアプローチ(例えばAutoML)は、しばしば時間がかかり、人間の開発者にとって理解しにくい。対照的に、人間のエンジニアはタスクを理解し解法について推論する優れた能力を持つが、その経験や知識は往々にして乏しく、定量的なアプローチでは活用しにくい。本稿では、最先端の大規模言語モデル(LLM)を活用して新しいタスクのためのMLソリューションを開発する新しいフレームワークを導入し、機械知能と人間の知識のギャップを埋めることを目指す。構造化された入力を理解し、新しいMLタスクを解くための綿密な推論を行えるようにLLMの能力を拡張できる可能性を示す。また、いくつかの専用設計を施すことで、LLMが(i)MLタスクの既存の経験から観察し、(ii)効果的に推論して新しいタスクに対して有望な結果をもたらせることがわかった。生成されたソリューションは、そのまま使用しても高い競争力を達成できる。サンプルとコードはhttps://github.com/microsoft/CoMLで公開されている。

The field of machine learning (ML) has gained widespread adoption, leading to significant demand for adapting ML to specific scenarios, which is yet expensive and non-trivial. The predominant approaches towards the automation of solving ML tasks (e.g., AutoML) are often time-consuming and hard to understand for human developers. In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult to utilize by quantitative approaches. In this paper, we aim to bridge the gap between machine intelligence and human knowledge by introducing a novel framework, which leverages the state-of-the-art large language models to develop ML solutions for novel tasks. We showcase the possibility of extending the capability of LLMs to comprehend structured inputs and perform thorough reasoning for solving novel ML tasks. And we find that, after some dedicated design, the LLM can (i) observe from the existing experiences of ML tasks and (ii) reason effectively to deliver promising results for new tasks. The solution generated can be used directly to achieve high levels of competitiveness. Examples and code available at https://github.com/microsoft/CoML.

翻訳日:2024-02-21 06:56:30 公開日:2024-02-18

# 教育およびその他の分野におけるChatGPTに対する中国社会の視点に関する研究

A Study on Chinese Social Perspective regarding ChatGPT for Education and Beyond ( http://arxiv.org/abs/2306.04325v3 )

ライセンス: Link先を確認

Yao Tian, Chengwei Tong, Lik-Hang Lee, Reza Hadi Mogavi, Yong Liao, Pengyuan Zhou

(参考訳) ChatGPTは多くの分野、特に学術コミュニティの関心を集めている。最新バージョンのGPT-4はマルチモーダルな入出力のサポートを開始した。本研究は、ソーシャルメディアの投稿を調べ、中国の人々がChatGPTの教育目的および一般的な目的での可能性をどのように捉えているかを分析する。本研究は、GPT-4のリリース以降の世論の変化を調査する最初の試みでもある。分析結果によると、GPT-4以前は、AIの進歩が教育や社会に恩恵をもたらすと考えるソーシャルメディア利用者がいた一方で、ChatGPTのような高度なAIは人間に劣等感を抱かせ、不正行為や道徳観の低下といった問題を招くと考える利用者もおり、大多数は中立のままであった。興味深いことに、GPT-4のリリース以降、世論は肯定的な方向に変化する傾向にある。この傾向の変化を徹底的に分析するとともに、教育やその他の分野でChatGPT型モデルを倫理的に適用するためのロードマップを提示する。

ChatGPT has piqued the interest of many fields, particularly in the academic community. GPT-4, the latest version, starts supporting multimodal input and output. This study examines social media posts to analyze how the Chinese public perceives the potential of ChatGPT for educational and general purposes. The study also serves as the first effort to investigate the changes in public opinion since the release of GPT-4. According to the analysis results, prior to GPT-4, although some social media users believed that AI advancements would benefit education and society, some believed that advanced AI, such as ChatGPT, would make humans feel inferior and lead to problems such as cheating and a decline in moral principles, while the majority remain neutral. Interestingly, public attitudes have tended to shift in a positive direction since the release of GPT-4. We present a thorough analysis of the trending shift and a roadmap to ensure the ethical application of ChatGPT-like models in education and beyond.

翻訳日:2024-02-21 06:48:46 公開日:2024-02-18

# 視覚言語モデルのための一貫性誘導型プロンプト学習

Consistency-guided Prompt Learning for Vision-Language Models ( http://arxiv.org/abs/2306.01195v2 )

ライセンス: Link先を確認

Shuvendu Roy, Ali Etemad

(参考訳) 視覚言語モデルのための新しい微調整手法であるConsistency-guided Prompt learning(CoPrompt)を提案する。提案手法は、少数ショット設定で下流タスクに微調整した際の、大規模基盤モデルの汎化性能を改善する。 CoPromptの基本的な考え方は、学習可能なモデルと事前学習済みモデルの予測に一貫性制約を課し、下流タスクへの過適合を防ぐことである。さらに、性能を一層高めるために、摂動を加えた2つの入力に対して一貫性を課すこと、およびチューニングの2つの主要なパラダイムであるプロンプトとアダプタを組み合わせること、という2つの要素を一貫性制約に導入する。摂動を加えた入力に一貫性を課すことは、一貫性制約をさらに正則化し、汎化を改善するのに役立つ。また、アダプタとプロンプトの統合は、下流タスクの性能を高めるだけでなく、入力空間と出力空間の両方でチューニングの柔軟性を高める。これにより、少数ショット学習の設定で下流タスクへより効果的に適応できる。実験により、CoPromptは、ベースから新規クラスへの汎化、ドメイン汎化、データセット間評価など、さまざまな評価スイートにおいて既存手法を上回ることが示された。汎化に関しては、CoPromptはゼロショットタスクおよび11データセットにわたる全体の調和平均において最先端を更新している。詳細なアブレーション研究は、CoPromptの各構成要素の有効性を示している。

We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models. Our approach improves the generalization of large foundation models when fine-tuned on downstream tasks in a few-shot setting. The basic idea of CoPrompt is to enforce a consistency constraint in the prediction of the trainable and pre-trained models to prevent overfitting on the downstream task. Additionally, we introduce the following two components into our consistency constraint to further boost the performance: enforcing consistency on two perturbed inputs and combining two dominant paradigms of tuning, prompting and adapter. Enforcing consistency on perturbed input serves to further regularize the consistency constraint, thereby improving generalization. Moreover, the integration of adapters and prompts not only enhances performance on downstream tasks but also offers increased tuning flexibility in both input and output spaces. This facilitates more effective adaptation to downstream tasks in a few-shot learning setting. Experiments show that CoPrompt outperforms existing methods on a range of evaluation suites, including base-to-novel generalization, domain generalization, and cross-dataset evaluation. On generalization, CoPrompt improves the state-of-the-art on zero-shot tasks and the overall harmonic mean over 11 datasets. Detailed ablation studies show the effectiveness of each of the components in CoPrompt.

翻訳日:2024-02-21 06:48:04 公開日:2024-02-18
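
The core idea in the abstract above is a consistency constraint between the predictions of the trainable model and the frozen pre-trained model. The abstract does not give the form of that constraint, so the sketch below assumes a cosine-distance penalty between paired embeddings. The function name and the toy vectors are hypothetical, and this is one plausible instance of such a regularizer rather than the authors' loss.

```python
import numpy as np

def cosine_consistency_loss(trainable_emb, frozen_emb):
    """Mean cosine distance (1 - cosine similarity) between paired embeddings.

    trainable_emb, frozen_emb: arrays of shape (batch, dim).
    """
    a = trainable_emb / np.linalg.norm(trainable_emb, axis=1, keepdims=True)
    b = frozen_emb / np.linalg.norm(frozen_emb, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * b, axis=1)))

frozen = np.array([[1.0, 0.0], [0.0, 2.0]])
same = cosine_consistency_loss(frozen, frozen)
orthogonal = cosine_consistency_loss(np.array([[0.0, 3.0], [1.0, 0.0]]), frozen)
```

In training, a term like this would be added to the task loss so that the tuned embeddings stay close in direction to the pre-trained ones, which is the overfitting guard the abstract describes.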

# アダプティブフローサンプリングを用いたエネルギーベースモデルのバランストレーニング

Balanced Training of Energy-Based Models with Adaptive Flow Sampling ( http://arxiv.org/abs/2306.00684v4 )

ライセンス: Link先を確認

Louis Grenioux, Éric Moulines, Marylou Gabrié

(参考訳) エネルギーベースモデル(EBM)は、非正規化対数密度を直接パラメータ化する汎用的な密度推定モデルである。非常に柔軟である一方、EBMにはモデルの正規化定数が明示されておらず、モデルの尤度は計算的に扱いにくい。学習に必要な尤度勾配を推定するために、いくつかの近似サンプラーや変分推論手法が提案されている。これらの手法はサンプル生成において有望な結果を示しているが、データセット内の異なるクラスの相対的重要度の決定など、推定された密度の統計的精度にはほとんど注意が払われてこなかった。本研究では、サンプリングを容易にするものとして最近提案された別種の生成モデルである正規化フロー(NF)を用いた、EBMの新しい最尤学習アルゴリズムを提案する。本手法は学習中にNFをEBMに適合させ、NFを用いたサンプリング方式が常にEBMに対する正確な勾配を与えるようにし、最終的には新しいデータを生成するための高速なサンプラーをもたらす。

Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density. Although very flexible, EBMs lack a specified normalization constant of the model, making the likelihood of the model computationally intractable. Several approximate samplers and variational inference techniques have been proposed to estimate the likelihood gradients for training. These techniques have shown promising results in generating samples, but little attention has been paid to the statistical accuracy of the estimated density, such as determining the relative importance of different classes in a dataset. In this work, we propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF), which have recently been proposed to facilitate sampling. Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times, ultimately leading to a fast sampler for generating new data.

翻訳日:2024-02-21 06:47:21 公開日:2024-02-18

# ターゲットドメインラベルのないドメイン適応モデルの評価は可能か?

Can We Evaluate Domain Adaptation Models Without Target-Domain Labels? ( http://arxiv.org/abs/2305.18712v3 )

ライセンス: Link先を確認

Jianfei Yang, Hanjie Qian, Yuecong Xu, Kai Wang, Lihua Xie

(参考訳) 教師なしドメイン適応(Unsupervised domain adaptation, UDA)は、ラベル豊富なソースドメインでトレーニングされたモデルをラベルなしのターゲットドメインに適応させる。しかし、現実のシナリオでは、ターゲットドメインラベルがないため、UDAモデルの性能を評価することは困難である。さらに、敵対的訓練や自己学習に依存する主流のUDA法は、モデル退化や負の転移を引き起こす可能性があり、評価問題をさらに悪化させる。本稿では,これらの問題に対処する新しい指標である「Transfer Score」を提案する。提案指標は,モデルパラメータによる分類器の空間的均一性,および深層表現の転移性と識別性を評価することで,UDAモデルの教師なし評価を可能にする。この指標に基づき,ターゲットドメインラベルなしで3つの新たな目的を達成する。(1)利用可能な選択肢から最適なUDA法を選択すること,(2)モデル退化を防ぐためにUDAモデルのハイパーパラメータを最適化すること,(3)UDAモデルのどのチェックポイントが最適かを特定すること,である。我々の研究は、データレベルのUDA研究と実践的なUDAシナリオのギャップを埋め、UDAモデルの性能の現実的な評価を可能にする。異なるスケールおよび不均衡分布のUDAデータセットに関する広範な実験を通じて,提案指標の有効性を検証する。その結果、提案指標が上記の目標を頑健に達成できることが示された。

Unsupervised domain adaptation (UDA) involves adapting a model trained on a label-rich source domain to an unlabeled target domain. However, in real-world scenarios, the absence of target-domain labels makes it challenging to evaluate the performance of UDA models. Furthermore, prevailing UDA methods relying on adversarial training and self-training could lead to model degeneration and negative transfer, further exacerbating the evaluation problem. In this paper, we propose a novel metric called the Transfer Score to address these issues. The proposed metric enables the unsupervised evaluation of UDA models by assessing the spatial uniformity of the classifier via model parameters, as well as the transferability and discriminability of deep representations. Based on the metric, we achieve three novel objectives without target-domain labels: (1) selecting the best UDA method from a range of available options, (2) optimizing hyperparameters of UDA models to prevent model degeneration, and (3) identifying which checkpoint of a UDA model performs optimally. Our work bridges the gap between data-level UDA research and practical UDA scenarios, enabling a realistic assessment of UDA model performance. We validate the effectiveness of our metric through extensive empirical studies on UDA datasets of different scales and imbalanced distributions. The results demonstrate that our metric robustly achieves the aforementioned goals.

翻訳日:2024-02-21 06:45:16 公開日:2024-02-18

# Leaving the Nest: Predict-Then-Optimizeのための局所損失関数を超えて

Leaving the Nest: Going Beyond Local Loss Functions for Predict-Then-Optimize ( http://arxiv.org/abs/2305.16830v2 )

ライセンス: Link先を確認

Sanket Shah, Andrew Perrault, Bryan Wilder, Milind Tambe

(参考訳) Predict-then-Optimizeは、不確実性下での意思決定に機械学習を用いるフレームワークである。中心的な研究課題は、「意思決定タスクの構造を、その特定のタスク向けにMLモデルを調整するためにどのように利用できるか」である。この目的のために、近年の研究では、この構造を捉えるタスク固有の損失関数の学習が提案されている。しかしながら、現在のアプローチでは、これらの損失の形式とMLモデルの振る舞いへの影響について制限的な仮定がなされている。これらの仮定は計算コストの高いアプローチにつながり、実際に仮定が破られた場合には性能が低下する。本稿では,上記の仮定を回避し,MLモデルの特徴を活用して損失関数学習のサンプル効率を向上させることにより,これらの課題に対する解決策を提案する。実験により,本手法は文献中の4つの領域で最先端の結果を達成し,過去の同等の手法より1桁少ないサンプル数で済むことが多いことを示した。さらに, 局所性の仮定が破られた場合, 既存の最良手法を200%近く上回る。

Predict-then-Optimize is a framework for using machine learning to perform decision-making under uncertainty. The central research question it asks is, "How can the structure of a decision-making task be used to tailor ML models for that specific task?" To this end, recent work has proposed learning task-specific loss functions that capture this underlying structure. However, current approaches make restrictive assumptions about the form of these losses and their impact on ML model behavior. These assumptions lead to approaches with high computational cost and, when violated in practice, to poor performance. In this paper, we propose solutions to these issues, avoiding the aforementioned assumptions and utilizing the ML model's features to increase the sample efficiency of learning loss functions. We empirically show that our method achieves state-of-the-art results in four domains from the literature, often requiring an order of magnitude fewer samples than comparable methods from past work. Moreover, our approach outperforms the best existing method by nearly 200% when the localness assumption is broken.

翻訳日:2024-02-21 06:44:14 公開日:2024-02-18

# フェデレーション学習で何を共有するかに関するサーベイ : モデルユーティリティ,プライバシリーク,通信効率の観点から

A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency ( http://arxiv.org/abs/2307.10655v2 )

ライセンス: Link先を確認

Jiawei Shao, Zijian Li, Wenqiang Sun, Tailin Zhou, Yuchang Sun, Lumin Liu, Zehong Lin, Yuyi Mao, Jun Zhang

(参考訳) フェデレーション学習(FL)は、クライアント間の協調トレーニングのためのセキュアなパラダイムとして登場した。データを集中化することなく、FLはクライアントがプライバシーを保護しながらローカル情報を共有することを可能にする。このアプローチは大きな注目を集め、関連研究をまとめる多くのサーベイが発表されている。しかしながら、これらのサーベイの大部分は、トレーニング中にモデルパラメータを共有するFL手法に集中しており、他の形式でローカル情報を共有する可能性を見落としている。本稿では,FLで何を共有すべきかという新たな視点から,モデルユーティリティ,プライバシリーク,通信効率を重視した体系的なサーベイを行う。まず, モデル, 合成データ, 知識をそれぞれ共有する3つの共有手法に基づく, FL法の新しい分類法を提案する。第2に,プライバシ攻撃に対するさまざまな共有手法の脆弱性を分析し,防御機構をレビューする。第3に、FLにおける様々な共有手法の学習性能と通信オーバーヘッドを比較するための広範な実験を行う。さらに,様々な防御手法の有効性を比較しながら,モデルインバージョン攻撃とメンバーシップ推論攻撃によるプライバシー漏洩の可能性を評価する。最後に,今後の研究の方向性を示し,本サーベイを締めくくる。

Federated learning (FL) has emerged as a secure paradigm for collaborative training among clients. Without data centralization, FL allows clients to share local information in a privacy-preserving manner. This approach has gained considerable attention, prompting numerous surveys to summarize the related works. However, the majority of these surveys concentrate on FL methods that share model parameters during the training process, while overlooking the possibility of sharing local information in other forms. In this paper, we present a systematic survey from a new perspective of what to share in FL, with an emphasis on the model utility, privacy leakage, and communication efficiency. First, we present a new taxonomy of FL methods in terms of three sharing methods, which respectively share models, synthetic data, and knowledge. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms. Third, we conduct extensive experiments to compare the learning performance and communication overhead of various sharing methods in FL. In addition, we assess the potential privacy leakage through model inversion and membership inference attacks, while comparing the effectiveness of various defense approaches. Finally, we identify future research directions and conclude the survey.

翻訳日:2024-02-21 06:37:41 公開日:2024-02-18

# オーバーパラメータ付き畳み込み残差ネットワークを用いた低次元多様体の非パラメトリック分類

Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks ( http://arxiv.org/abs/2307.01649v2 )

ライセンス: Link先を確認

Kaiqi Zhang, Zixuan Zhang, Minshuo Chen, Yuma Takeda, Mengdi Wang, Tuo Zhao, Yu-Xiang Wang

(参考訳) 畳み込み残差ニューラルネットワーク(ConvResNets)は、過パラメータ化されているにもかかわらず、実際には優れた予測性能を達成できるが、これは従来の知見では十分に説明できない。このギャップを埋めるために,ConvResNetsを特別なケースとして含むConvResNeXtsを重み減衰付きで訓練した場合の性能を,非パラメトリック分類の観点から検討する。我々の分析は、ConvResNeXtsにおいて無限に多くのビルディングブロックを許容し、重み減衰がこれらのブロックにスパース性を暗黙的に課すことを示す。具体的には、低次元多様体上で支持される滑らかな対象関数を考え、ConvResNeXtsが関数の滑らかさと低次元構造に適応し、次元の呪いに苦しむことなく効率的に関数を学習できることを証明する。この結果は、従来の機械学習モデルに対する過パラメータ化されたConvResNeXtsの利点を部分的に正当化する。

Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkable prediction performance in practice, which cannot be well explained by conventional wisdom. To bridge this gap, we study the performance of ConvResNeXts, which cover ConvResNets as a special case, trained with weight decay from the perspective of nonparametric classification. Our analysis allows for infinitely many building blocks in ConvResNeXts, and shows that weight decay implicitly enforces sparsity on these blocks. Specifically, we consider a smooth target function supported on a low-dimensional manifold, then prove that ConvResNeXts can adapt to the function smoothness and low-dimensional structures and efficiently learn the function without suffering from the curse of dimensionality. Our findings partially justify the advantage of overparameterized ConvResNeXts over conventional machine learning models.

翻訳日:2024-02-21 06:37:23 公開日:2024-02-18

# ViTEraser:SegMIMプレトレーニングによるシーンテキスト除去のためのビジョントランスフォーマーのパワーの活用

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining ( http://arxiv.org/abs/2306.12106v2 )

ライセンス: Link先を確認

Dezhi Peng, Chongyu Liu, Yuliang Liu, Lianwen Jin

(参考訳) シーンテキスト除去(STR)は、自然シーンのテキストストロークを視覚的に整合した背景に置き換えることを目的としている。最近のSTRアプローチは反復的な改善や明示的なテキストマスクに依存しており、結果として複雑さが高く、テキストローカライゼーションの精度に敏感である。さらに、既存のSTR手法の多くは畳み込みアーキテクチャを採用しており、Vision Transformer(ViT)の可能性はほとんど未検討である。本稿では, ViTEraser と呼ばれる, 単純かつ効果的な ViT ベースのテキスト消去器を提案する。簡潔なエンコーダ・デコーダフレームワークに従い、ViTEraserは様々なViTを容易に組み込んで長距離モデリングを強化できる。具体的には、エンコーダはViTブロックとパッチ埋め込み層を介して入力画像を隠れ空間に階層的にマッピングし、デコーダはViTブロックとパッチ分割層で隠れ特徴をテキスト消去画像へ徐々にアップサンプリングする。ViTEraserはテキストローカライゼーションとインペインティングを暗黙的に統合するため、エンコーダをテキストボックスセグメンテーションに、デコーダをマスク付き画像モデリングにそれぞれ集中させる、SegMIMと呼ばれる新しいエンドツーエンド事前学習手法を提案する。実験結果から,SegMIM を用いた ViTEraser はSTR において最先端性能をかなりの差で達成し,改ざんされたシーンテキストの検出など他のタスクに拡張した場合にも強い一般化能力を示すことが明らかとなった。さらに,STR向けViTベースのエンコーダ・デコーダのアーキテクチャ,事前学習,スケーラビリティを総合的に検討し,ViTをSTR分野に適用するための深い洞察を提供する。コードはhttps://github.com/shannanyinxiang/ViTEraserで入手できる。

Scene text removal (STR) aims at replacing text strokes in natural scenes with visually coherent backgrounds. Recent STR approaches rely on iterative refinements or explicit text masks, resulting in high complexity and sensitivity to the accuracy of text localization. Moreover, most existing STR methods adopt convolutional architectures while the potential of vision Transformers (ViTs) remains largely unexplored. In this paper, we propose a simple-yet-effective ViT-based text eraser, dubbed ViTEraser. Following a concise encoder-decoder framework, ViTEraser can easily incorporate various ViTs to enhance long-range modeling. Specifically, the encoder hierarchically maps the input image into the hidden space through ViT blocks and patch embedding layers, while the decoder gradually upsamples the hidden features to the text-erased image with ViT blocks and patch splitting layers. As ViTEraser implicitly integrates text localization and inpainting, we propose a novel end-to-end pretraining method, termed SegMIM, which focuses the encoder and decoder on the text box segmentation and masked image modeling tasks, respectively. Experimental results demonstrate that ViTEraser with SegMIM achieves state-of-the-art performance on STR by a substantial margin and exhibits strong generalization ability when extended to other tasks, e.g., tampered scene text detection. Furthermore, we comprehensively explore the architecture, pretraining, and scalability of the ViT-based encoder-decoder for STR, which provides deep insights into the application of ViT to the STR field. Code is available at https://github.com/shannanyinxiang/ViTEraser.

翻訳日:2024-02-21 06:35:09 公開日:2024-02-18

# 視覚モデル適応とロバストネスのための群直交化正規化

Group Orthogonalization Regularization For Vision Models Adaptation and Robustness ( http://arxiv.org/abs/2306.10001v2 )

ライセンス: Link先を確認

Yoav Kurtz, Noga Bar, Raja Giryes

(参考訳) ニューラルネットワークが深くなるにつれて、パラメータ内の冗長性が増大する。この現象を受けて、畳み込みフィルタ間の相関を減らそうとするいくつかの手法が提案されてきた。本稿では、同じ層内のフィルタ群間の正規直交性を促進する計算効率の良い正則化手法を提案する。実験により,拡散モデルやVision Transformer(ViT)に対する最近の適応手法に組み込むと,この正則化により下流タスクの性能が向上することが示された。さらに、敵対的訓練中に群直交性を課すことで頑健性が向上することも示す。コードはhttps://github.com/YoavKurtz/GORで入手できる。

As neural networks become deeper, the redundancy within their parameters increases. This phenomenon has led to several methods that attempt to reduce the correlation between convolutional filters. We propose a computationally efficient regularization technique that encourages orthonormality between groups of filters within the same layer. Our experiments show that when incorporated into recent adaptation methods for diffusion models and vision transformers (ViTs), this regularization improves performance on downstream tasks. We further show improved robustness when group orthogonality is enforced during adversarial training. Our code is available at https://github.com/YoavKurtz/GOR.
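A group-wise orthonormality penalty can be sketched as follows. This is a minimal illustration, assuming the layer's filters are partitioned into equal-sized groups and that the penalty is the squared Frobenius norm of `W_g W_g^T - I` summed over groups. That reading of "groups" and the exact form of the penalty are assumptions; the authors' repository is the reference for the actual formulation.

```python
def group_orthogonality_penalty(filters, num_groups):
    """Sum over groups of ||W_g W_g^T - I||_F^2.

    filters: list of flattened filters (each a list of floats) from one layer.
    num_groups: number of equal-sized groups; must divide len(filters).
    """
    if len(filters) % num_groups != 0:
        raise ValueError("num_groups must divide the number of filters")
    group_size = len(filters) // num_groups
    penalty = 0.0
    for g in range(num_groups):
        group = filters[g * group_size:(g + 1) * group_size]
        # Compare each Gram-matrix entry with the identity matrix entry.
        for i, f_i in enumerate(group):
            for j, f_j in enumerate(group):
                gram = sum(a * b for a, b in zip(f_i, f_j))
                target = 1.0 if i == j else 0.0
                penalty += (gram - target) ** 2
    return penalty
```

Because the Gram matrix is computed per group rather than for the whole layer, the cost grows with the group size instead of the total number of filters, which is one plausible source of the efficiency the abstract mentions.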

翻訳日:2024-02-21 06:33:53 公開日:2024-02-18

# マトリックス製品密度演算子の量子状態トモグラフィ

Quantum State Tomography for Matrix Product Density Operators ( http://arxiv.org/abs/2306.09432v4 )

ライセンス: Link先を確認

Zhen Qin, Casey Jameson, Zhexuan Gong, Michael B. Wakin and Zhihui Zhu

(参考訳) 量子状態トモグラフィ(QST)を用いてしばしば行われる、実験的測定からの量子状態の再構成は、量子デバイスの検証とベンチマークに不可欠である。しかし、一般の非構造化量子状態に対してQSTを実行するには、最適な測定設定であっても、系内の個々の量子の数に対して指数関数的に増大する膨大な数の状態コピーが必要である。幸いなことに、ノイズのある中間スケール量子コンピュータによって生成される状態のような多くの物理的量子状態は、通常、構造を持つ。一次元では、そのような状態は、キュービット数に依存しない有限の行列/結合次元を持つ行列積作用素(MPO)によってよく近似されることが期待され、効率的な状態表現が可能となる。しかしながら、これらの状態に対して一般に効率的なQSTが実行可能であるかどうかはまだ不明である。本稿では, このギャップを埋め, 圧縮センシングと経験過程の理論を用いて, MPOの安定回復のための理論的保証を確立する。まず、ガウス測定とHaarランダムなランク1のPositive Operator Valued Measures (POVMs)の2種類のランダム測定設定について検討する。測定に統計的誤差がないと仮定すると、有限結合次元のMPOに含まれる情報は、キュービット数に線形にのみ依存する数のランダム測定によって保存できることを示す。次に、量子コンピュータ上で実装可能なHaarランダムなランク1のPOVMを用いた物理的な量子測定によるMPOベースのQSTを研究する。MPO状態の有界な回復誤差を保証するには、キュービット数の多項式個の状態コピーだけで十分であることを証明する。

The reconstruction of quantum states from experimental measurements, often achieved using quantum state tomography (QST), is crucial for the verification and benchmarking of quantum devices. However, performing QST for a generic unstructured quantum state requires an enormous number of state copies that grows exponentially with the number of individual quanta in the system, even for the most optimal measurement settings. Fortunately, many physical quantum states, such as states generated by noisy, intermediate-scale quantum computers, are usually structured. In one dimension, such states are expected to be well approximated by matrix product operators (MPOs) with a finite matrix/bond dimension independent of the number of qubits, therefore enabling efficient state representation. Nevertheless, it is still unclear whether efficient QST can be performed for these states in general. In this paper, we attempt to bridge this gap and establish theoretical guarantees for the stable recovery of MPOs using tools from compressive sensing and the theory of empirical processes. We begin by studying two types of random measurement settings: Gaussian measurements and Haar random rank-one Positive Operator Valued Measures (POVMs). We show that the information contained in an MPO with a finite bond dimension can be preserved using a number of random measurements that depends only linearly on the number of qubits, assuming no statistical error of the measurements. We then study MPO-based QST with physical quantum measurements through Haar random rank-one POVMs that can be implemented on quantum computers. We prove that only a polynomial number of state copies in the number of qubits is required to guarantee bounded recovery error of an MPO state.

翻訳日:2024-02-21 06:33:44 公開日:2024-02-18

# CoRe Optimizer: マシンラーニングのためのオールインワンソリューション

CoRe Optimizer: An All-in-One Solution for Machine Learning ( http://arxiv.org/abs/2307.15663v2 )

ライセンス: Link先を確認

Marco Eckhoff and Markus Reiher

(参考訳) 最適化アルゴリズムとそのハイパーパラメータは、機械学習アプリケーションにおけるトレーニング速度とモデル精度に大きな影響を与える可能性がある。理想的なオプティマイザの希望リストには、低誤差への高速で滑らかな収束、低い計算負荷、汎用的な適用性が含まれる。我々が最近導入したContinual Resilient (CoRe)オプティマイザは、生涯機械学習ポテンシャルのトレーニングにおいて、他の最先端の1次勾配ベースオプティマイザと比較して優れた性能を示した。本稿では,さまざまな機械学習タスクに対して,CoReオプティマイザと,AdamオプティマイザやResilient Backpropagation(RPROP)を含む9つの最適化アルゴリズムとの広範な性能比較を行う。我々は、異なるハイパーパラメータの影響を分析し、一般に適用可能な値を提供する。CoReオプティマイザは、調査したすべてのアプリケーションで最高または競争力のある性能を示し、ミニバッチ学習かバッチ学習かに応じて1つのハイパーパラメータを変更するだけでよい。

The optimization algorithm and its hyperparameters can significantly affect the training speed and resulting model accuracy in machine learning applications. The wish list for an ideal optimizer includes fast and smooth convergence to low error, low computational demand, and general applicability. Our recently introduced continual resilient (CoRe) optimizer has shown superior performance compared to other state-of-the-art first-order gradient-based optimizers for training lifelong machine learning potentials. In this work we provide an extensive performance comparison of the CoRe optimizer and nine other optimization algorithms including the Adam optimizer and resilient backpropagation (RPROP) for diverse machine learning tasks. We analyze the influence of different hyperparameters and provide generally applicable values. The CoRe optimizer yields best or competitive performance in every investigated application, while only one hyperparameter needs to be changed depending on mini-batch or batch learning.

翻訳日:2024-02-21 06:25:30 公開日:2024-02-18

# OUTFOX: 逆生成例を用いた文脈学習によるLLM生成エッセイ検出

OUTFOX: LLM-Generated Essay Detection Through In-Context Learning with Adversarially Generated Examples ( http://arxiv.org/abs/2307.11729v3 )

ライセンス: Link先を確認

Ryuto Koike, Masahiro Kaneko, Naoaki Okazaki

(参考訳) 大規模言語モデル (LLM) はテキスト生成において人間レベルの流暢さを達成しており、人間が書いたテキストとLLMが生成したテキストの区別が難しくなっている。これはLLMが悪用されるリスクを増大させ、LLM生成テキストを識別する検出器の開発が求められている。しかし、既存の検出器は攻撃に対する頑健性に欠けており、LLM生成テキストを単に言い換えるだけで検出精度が低下する。さらに、悪意のあるユーザが検出結果に基づいて意図的に検出を回避しようとする可能性があるが、これは従来の研究では想定されていなかった。本稿では,検出器と攻撃者の双方が互いの出力を考慮できるようにすることで,LLM生成テキスト検出器の頑健性を向上させるフレームワークであるOUTFOXを提案する。このフレームワークでは、攻撃者は検出器の予測ラベルをコンテキスト内学習の例として使用して、検出しにくいエッセイを敵対的に生成する一方、検出器は敵対的に生成されたエッセイをコンテキスト内学習の例として使用して、強い攻撃者からのエッセイを検出することを学習する。学生エッセイの領域での実験では、提案する検出器は攻撃者が生成したテキストの検出性能をF1スコアで最大+41.3ポイント改善することを示した。さらに、提案する検出器は、F1スコアで最大96.9ポイントという最先端の検出性能を示し、非攻撃テキストにおいて既存の検出器を上回る。最後に、提案する攻撃者は検出器の性能をF1スコアで最大-57.0ポイントと劇的に低下させ、検出回避のためのベースラインのパラフレーズ手法を大きく上回る。

Large Language Models (LLMs) have achieved human-level fluency in text generation, making it difficult to distinguish between human-written and LLM-generated texts. This poses a growing risk of misuse of LLMs and demands the development of detectors to identify LLM-generated texts. However, existing detectors lack robustness against attacks: their detection accuracy degrades when LLM-generated texts are simply paraphrased. Furthermore, a malicious user might attempt to deliberately evade the detectors based on detection results, but this has not been assumed in previous studies. In this paper, we propose OUTFOX, a framework that improves the robustness of LLM-generated-text detectors by allowing both the detector and the attacker to consider each other's output. In this framework, the attacker uses the detector's prediction labels as examples for in-context learning and adversarially generates essays that are harder to detect, while the detector uses the adversarially generated essays as examples for in-context learning to learn to detect essays from a strong attacker. Experiments in the domain of student essays show that the proposed detector improves the detection performance on the attacker-generated texts by up to +41.3 points F1-score. Furthermore, the proposed detector shows a state-of-the-art detection performance: up to 96.9 points F1-score, beating existing detectors on non-attacked texts. Finally, the proposed attacker drastically degrades the performance of detectors by up to -57.0 points F1-score, massively outperforming the baseline paraphrasing method for evading detection.
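The detector side of this loop relies on in-context learning, which amounts to building a prompt from labeled examples. The sketch below shows one way to assemble such a prompt; the instruction wording, the label names `Human` and `LM`, and the function name are hypothetical and are not taken from the paper.

```python
def build_detector_prompt(examples, essay):
    """Assemble a few-shot classification prompt.

    examples: list of (essay_text, label) pairs, with label "Human" or "LM".
              In an OUTFOX-style setup these would include adversarially
              generated essays labeled "LM" (assumption for illustration).
    essay: the essay to classify.
    """
    lines = ["Classify each essay as Human or LM."]
    for text, label in examples:
        lines.append(f"Essay: {text}")
        lines.append(f"Answer: {label}")
    # The final block is left open for the model to complete.
    lines.append(f"Essay: {essay}")
    lines.append("Answer:")
    return "\n".join(lines)
```

The attacker can be sketched symmetrically: its prompt would contain essays together with the detector's predicted labels, followed by an instruction to write an essay that receives the "Human" label.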

翻訳日:2024-02-21 06:25:16 公開日:2024-02-18

# 頑健なビジュアル質問回答:データセット,メソッド,今後の課題

Robust Visual Question Answering: Datasets, Methods, and Future Challenges ( http://arxiv.org/abs/2307.11471v2 )

ライセンス: Link先を確認

Jie Ma, Pinghui Wang, Dechen Kong, Zewei Wang, Jun Liu, Hongbin Pei, Junzhou Zhao

(参考訳) 視覚質問応答(VQA)では、画像と自然言語の質問が与えられたときに、システムが正確な自然言語の回答を提供することが求められる。しかし,従来の一般的なVQA手法は,回答を予測する前に画像をグラウンディングするといった適切な振る舞いを学習するのではなく,トレーニングデータに存在するバイアスを記憶する傾向があることが広く認識されている。したがって、これらの手法は通常、分布内性能は高いが、分布外性能は低い。近年,VQAのロバスト性を評価・向上するために,様々なデータセットとデバイアス手法がそれぞれ提案されている。本稿は,この新たな動向に焦点をあてた初の包括的サーベイである。具体的には、まず、分布内および分布外の観点からデータセットの発展過程を概観する。次に,これらのデータセットで用いられる評価指標について検討する。第3に, 既存のデバイアス手法の発展過程, 類似点と相違点, ロバスト性比較, および技術的特徴を示す類型を提案する。さらに,VQAにおける代表的な視覚・言語事前学習モデルのロバスト性を分析し,議論する。最後に、利用可能な文献の徹底的なレビューと実験分析を通じて、様々な観点から今後の研究の重要領域について論じる。

Visual question answering (VQA) requires a system to provide an accurate natural language answer given an image and a natural language question. However, it is widely recognized that previous generic VQA methods often exhibit a tendency to memorize biases present in the training data rather than learning proper behaviors, such as grounding images before predicting answers. Therefore, these methods usually achieve high in-distribution but poor out-of-distribution performance. In recent years, various datasets and debiasing methods have been proposed to evaluate and enhance the VQA robustness, respectively. This paper provides the first comprehensive survey focused on this emerging trend. Specifically, we first provide an overview of the development process of datasets from in-distribution and out-of-distribution perspectives. Then, we examine the evaluation metrics employed by these datasets. Thirdly, we propose a typology that presents the development process, similarities and differences, robustness comparison, and technical features of existing debiasing methods. Furthermore, we analyze and discuss the robustness of representative vision-and-language pre-training models on VQA. Finally, through a thorough review of the available literature and experimental analysis, we discuss the key areas for future research from various viewpoints.

翻訳日:2024-02-21 06:24:16 公開日:2024-02-18

# TALL:ディープフェイクビデオ検出のためのThumbnailレイアウト

TALL: Thumbnail Layout for Deepfake Video Detection ( http://arxiv.org/abs/2307.07494v3 )

ライセンス: Link先を確認

Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, Ran He

(参考訳) ディープフェイクが社会やサイバーセキュリティに及ぼす脅威の高まりは大きな社会的懸念を引き起こしており、ディープフェイクビデオ検出という重要な課題にますます多くの努力が注がれている。既存のビデオ手法は優れた性能を発揮するが、計算量が多い。本稿では,ビデオクリップを予め定義されたレイアウトに変換することで,空間的および時間的依存関係を保持する,Thumbnail Layout (TALL) というシンプルかつ効果的な手法を提案する。具体的には、連続したフレームを各フレーム内の固定位置でマスクして一般化を改善し、サブイメージにリサイズし、サムネイルとして予め定義されたレイアウトに再配置する。TALLはモデルに依存せず、数行のコードを変更するだけでよいため非常に単純である。Vision Transformerの成功に触発されて,TALLをSwin Transformerに組み込み,効率的かつ効果的なTALL-Swin法を構築した。データセット内およびクロスデータセットでの広範な実験により、TALLおよびTALL-Swinの有効性と優位性を検証した。TALL-Swinは、挑戦的なクロスデータセットタスクであるFaceForensics++ → Celeb-DFで90.79% AUCを達成した。コードはhttps://github.com/rainy-xu/TALL4Deepfakeで入手できる。

The growing threats of deepfakes to society and cybersecurity have raised enormous public concerns, and increasing efforts have been devoted to this critical topic of deepfake video detection. Existing video methods achieve good performance but are computationally intensive. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout to realize the preservation of spatial and temporal dependencies. Specifically, consecutive frames are masked in a fixed position in each frame to improve generalization, then resized to sub-images and rearranged into a pre-defined layout as the thumbnail. TALL is model-agnostic and extremely simple, requiring changes to only a few lines of code. Inspired by the success of vision transformers, we incorporate TALL into Swin Transformer, forming an efficient and effective method, TALL-Swin. Extensive experiments in intra-dataset and cross-dataset settings validate the effectiveness and superiority of TALL and the state-of-the-art TALL-Swin. TALL-Swin achieves 90.79% AUC on the challenging cross-dataset task, FaceForensics++ → Celeb-DF. The code is available at https://github.com/rainy-xu/TALL4Deepfake.
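The layout step can be sketched in a few lines. The example below assumes frames are already resized to equal-sized sub-images and represented as 2D lists of pixel values; the helper names, the row-major frame ordering, and the square mask are illustrative assumptions rather than the authors' implementation.

```python
def mask_block(frame, top, left, size):
    """Zero out a size x size block at a fixed position in one frame."""
    return [
        [0 if top <= y < top + size and left <= x < left + size else value
         for x, value in enumerate(row)]
        for y, row in enumerate(frame)
    ]


def make_thumbnail(frames, grid_rows, grid_cols):
    """Tile consecutive frames row by row into one grid_rows x grid_cols image."""
    if len(frames) != grid_rows * grid_cols:
        raise ValueError("number of frames must match the grid size")
    height = len(frames[0])
    thumbnail = []
    for r in range(grid_rows):
        row_frames = frames[r * grid_cols:(r + 1) * grid_cols]
        for y in range(height):
            line = []
            for frame in row_frames:
                line.extend(frame[y])  # concatenate the y-th row of each frame
            thumbnail.append(line)
    return thumbnail
```

The resulting thumbnail is an ordinary image, so it can be fed to an image backbone without architectural changes, which matches the abstract's claim that only a few lines of code need to change.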

翻訳日:2024-02-21 06:22:09 公開日:2024-02-18

# 深層強化学習における転移向上のための報酬機械抽象化上の文脈的事前計画

Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning ( http://arxiv.org/abs/2307.05209v3 )

ライセンス: Link先を確認

Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren

(参考訳) 近年の研究では、深層強化学習(DRL)エージェントは、訓練されたタスクに過度に適合し、わずかな環境変化にも適応できない傾向があることが示されている。未知のタスクへの転移時の学習を迅速化するために,現在のタスクを報酬機械(reward machine, RM)を用いて表現する新しい手法を提案する。RMは、現在のタスクの報酬とダイナミクスに基づいてサブタスクを誘導する状態機械抽象化である。本手法は,現在の抽象状態からの最適遷移の記号表現をエージェントに与え,それらの遷移を達成したエージェントに報酬を与える。これらの表現はタスク間で共有され、エージェントは以前に遭遇したシンボルや遷移の知識を活用できるため、転移が促進される。実験結果から, 本表現が種々の領域においてサンプル効率と少数ショット転移を改善することが示された。

Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains.
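A reward machine can be pictured as a small finite-state machine whose transitions fire on symbolic events and emit rewards. The sketch below is a generic illustration of that idea; the class and method names, and the `outgoing` helper standing in for the symbolic context given to the agent, are assumptions and not the paper's code.

```python
class RewardMachine:
    """Finite-state machine over abstract task states.

    transitions maps (state, label) -> (next_state, reward), where a label
    is a symbolic event observed in the environment (for example "key").
    """

    def __init__(self, initial_state, transitions):
        self.state = initial_state
        self.transitions = transitions

    def step(self, label):
        """Advance on an observed label and return the reward for that step."""
        next_state, reward = self.transitions.get(
            (self.state, label), (self.state, 0.0)  # unknown labels: stay, no reward
        )
        self.state = next_state
        return reward

    def outgoing(self):
        """List (label, next_state) pairs available from the current state."""
        return sorted(
            (label, nxt)
            for (state, label), (nxt, _) in self.transitions.items()
            if state == self.state
        )
```

Feeding the output of `outgoing()` to the policy as extra input is one simple way to give the agent a symbolic description of which transitions are currently worth pursuing.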

翻訳日:2024-02-21 06:21:29 公開日:2024-02-18

# 知識グラフ補完のための大規模言語モデル探索

Exploring Large Language Models for Knowledge Graph Completion ( http://arxiv.org/abs/2308.13916v4 )

ライセンス: Link先を確認

Liang Yao, Jiazhen Peng, Chengsheng Mao, Yuan Luo

(参考訳) 知識グラフは多くの人工知能タスクにおいて重要な役割を果たすが、不完全性の問題にしばしば直面する。本研究では,Large Language Models (LLM) を用いて知識グラフの補完を行う。我々は知識グラフのトリプルをテキストシーケンスとみなし、これらのトリプルをモデル化するための知識グラフ LLM (KG-LLM) と呼ばれる革新的なフレームワークを導入する。提案手法では,三重項の実体記述と関係記述を用いて,その応答を予測に利用する。ベンチマークナレッジグラフを用いた実験により,トリプル分類や関係予測などのタスクにおいて,最先端の性能が得られることが示された。また、微調整モデル(LLaMA-7B、ChatGLM-6B)が最近のChatGPTおよびGPT-4より優れていることも見出した。

Knowledge graphs play a vital role in numerous artificial intelligence tasks, yet they frequently face the issue of incompleteness. In this study, we explore utilizing Large Language Models (LLM) for knowledge graph completion. We consider triples in knowledge graphs as text sequences and introduce an innovative framework called Knowledge Graph LLM (KG-LLM) to model these triples. Our technique employs entity and relation descriptions of a triple as prompts and utilizes the response for predictions. Experiments on various benchmark knowledge graphs demonstrate that our method attains state-of-the-art performance in tasks such as triple classification and relation prediction. We also find that fine-tuning relatively smaller models (e.g., LLaMA-7B, ChatGLM-6B) outperforms recent ChatGPT and GPT-4.
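Treating a triple as a text sequence means turning it into a prompt. The sketch below shows two hypothetical templates, one for triple classification and one for relation prediction; the wording and function names are invented for illustration and are not the prompts used in the paper.

```python
def triple_classification_prompt(head, relation, tail):
    """Ask whether a (head, relation, tail) triple holds."""
    return f"Is this true: {head} {relation} {tail}?"


def relation_prediction_prompt(head, tail):
    """Ask for the relation linking two entities."""
    return f"What is the relationship between {head} and {tail}?"


if __name__ == "__main__":
    print(triple_classification_prompt("Paris", "is the capital of", "France"))
    print(relation_prediction_prompt("Paris", "France"))
```

For fine-tuning, each prompt would be paired with the expected answer, such as "Yes" or "No" for classification or the relation name for prediction.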

翻訳日:2024-02-21 06:12:36 公開日:2024-02-18

# SpikingBERT:陰的微分を用いたスパイキング言語モデルのトレーニングのためのBERTの蒸留

SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation ( http://arxiv.org/abs/2308.10873v3 )

ライセンス: Link先を確認

Malyaban Bal, Abhronil Sengupta

(参考訳) 大規模言語モデル(LLM)は非常に強力になりつつあるが、人間の脳に比べてニューロンやシナプスの数は桁違いに少ない。それにもかかわらず、動作にははるかに多くの電力・エネルギーを必要とする。本研究では,脳内のシナプス情報の流れから着想を得て,従来のLMの計算コストを削減することを目的とした,新しい生体模倣型スパイキング言語モデル(LM)を提案する。本稿では,平衡状態におけるニューロンの平均スパイク率を利用し,陰的微分法を用いてニューロモルフィックなスパイキングLMを訓練する枠組みを示す。これにより、いかなる代理勾配も用いずに、スパイキングニューラルネットワーク(SNN)に基づくアルゴリズムの微分不可能性の問題を克服する。また、スパイキングニューロンの定常状態への収束により、スケーラブルなスパイキングLMの開発に不可欠なスパイキングアテンション機構を設計できる。さらに、平衡状態におけるニューロンの平均スパイク率の収束を利用して、事前学習済みBERTモデルを「教師」として用い、「生徒」であるスパイキングアーキテクチャを訓練する新しいANN-SNN知識蒸留手法を開発した。本論文で提案する主要アーキテクチャはBERTに着想を得ているが,この手法はさまざまな種類のLLMに拡張できる可能性がある。我々の研究は、GLUEベンチマークの複数の異なるタスクにおいて、動作するスパイキングLMアーキテクチャの性能を実証した最初のものである。

Large language models (LLMs), though growing exceedingly powerful, comprise orders of magnitude fewer neurons and synapses than the human brain. However, they require significantly more power/energy to operate. In this work, we propose a novel bio-inspired spiking language model (LM) which aims to reduce the computational cost of conventional LMs by drawing motivation from the synaptic information flow in the brain. In this paper, we demonstrate a framework that leverages the average spiking rate of neurons at equilibrium to train a neuromorphic spiking LM using an implicit differentiation technique, thereby overcoming the non-differentiability problem of spiking neural network (SNN) based algorithms without using any type of surrogate gradient. The steady-state convergence of the spiking neurons also allows us to design a spiking attention mechanism, which is critical in developing a scalable spiking LM. Moreover, the convergence of the average spiking rate of neurons at equilibrium is utilized to develop a novel ANN-SNN knowledge distillation technique wherein we use a pre-trained BERT model as "teacher" to train our "student" spiking architecture. While the primary architecture proposed in this paper is motivated by BERT, the technique can potentially be extended to different kinds of LLMs. Our work is the first to demonstrate the performance of an operational spiking LM architecture on multiple different tasks in the GLUE benchmark.

翻訳日:2024-02-21 06:10:51 公開日:2024-02-18

# OctoPack: コード大規模言語モデルの命令チューニング

OctoPack: Instruction Tuning Code Large Language Models ( http://arxiv.org/abs/2308.07124v2 )

ライセンス: Link先を確認

Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre

(参考訳) 命令で大規模言語モデル(LLM)を微調整すると、自然言語タスクの性能が大幅に向上する。我々は、コード変更と人間による指示を対にするGitコミットの自然な構造を活用して、コードを用いた命令チューニングを適用する。CommitPackとして、350のプログラミング言語にわたる4テラバイトのGitコミットを収集した。16BパラメータのStarCoderモデル上で、CommitPackを他の自然および合成コード命令(xP3x、Self-Instruct、OASST)と比較し、HumanEval Pythonベンチマークにおいて、OpenAIの出力でトレーニングされていないモデルの中で最先端の性能(46.2% pass@1)を達成した。さらに、HumanEvalPackを導入し、HumanEvalベンチマークを6つの言語(Python、JavaScript、Java、Go、C++、Rust)にわたる合計3つのコーディングタスク(コード修復、コード説明、コード合成)に拡張した。我々のモデルであるOctoCoderとOctoGeeXは、すべての許容的ライセンスのモデルの中でHumanEvalPackにおいて最高の性能を達成し、より広範な言語や自然なコーディングタスクへの一般化におけるCommitPackの利点を実証している。コード、モデル、データはhttps://github.com/bigcode-project/octopackで無料で利用できる。

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B parameter StarCoder model, and achieve state-of-the-art performance among models not trained on OpenAI outputs, on the HumanEval Python benchmark (46.2% pass@1). We further introduce HumanEvalPack, expanding the HumanEval benchmark to a total of 3 coding tasks (Code Repair, Code Explanation, Code Synthesis) across 6 languages (Python, JavaScript, Java, Go, C++, Rust). Our models, OctoCoder and OctoGeeX, achieve the best performance across HumanEvalPack among all permissive models, demonstrating CommitPack's benefits in generalizing to a wider set of languages and natural coding tasks. Code, models and data are freely available at https://github.com/bigcode-project/octopack.
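The pairing of code changes with human instructions can be made concrete with a small sketch. It assumes a commit is reduced to its message plus the file contents before and after the change; the field names `instruction`, `input`, and `output` and the function name are hypothetical and do not describe the actual CommitPack schema.

```python
def commit_to_sample(commit_message, code_before, code_after):
    """Turn one commit into an instruction-tuning example.

    The commit message plays the role of the human instruction, the old
    file contents are the input, and the new contents are the target.
    """
    return {
        "instruction": commit_message.strip(),
        "input": code_before,
        "output": code_after,
    }


if __name__ == "__main__":
    sample = commit_to_sample(
        "Fix off-by-one error in range\n",
        "for i in range(n - 1):\n    total += xs[i]\n",
        "for i in range(n):\n    total += xs[i]\n",
    )
    print(sample["instruction"])
```

A real pipeline would also need filtering, for example dropping commits with uninformative messages, but the abstract does not describe those steps.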

翻訳日:2024-02-21 06:09:06 公開日:2024-02-18

# 動的活性化関数によるフィードフォワードと畳み込みニューラルネットワークの性能最適化

Optimizing Performance of Feedforward and Convolutional Neural Networks through Dynamic Activation Functions ( http://arxiv.org/abs/2308.05724v2 )

ライセンス: Link先を確認

Chinmay Rane, Kanishka Tyagi, Michael Manry

(参考訳) ディープラーニングのトレーニングアルゴリズムは、音声、テキスト、画像、ビデオなど、多くの分野において近年大きな成功を収めている。より深い層構造が次々と提案され、152層程度のResNet構造が大きな成功を収めている。浅層畳み込みニューラルネットワーク(CNN)は依然として活発な研究対象であり、いくつかの現象はまだ説明されていない。ネットワークで使用される活性化関数は、ネットワークに非線形性を与えるため、極めて重要である。ReLUは最もよく使われる活性化関数である。本稿では、隠れ層における複雑な区分線形(PWL)活性化を示す。これらのPWL活性化が、畳み込みニューラルネットワークおよび多層パーセプトロンにおいて、ReLU活性化よりもはるかに優れた性能を示すことを明らかにする。主張をさらに補強するために、浅層および深層CNNに対するPyTorchでの結果の比較を示す。

Deep learning training algorithms have achieved great success in recent years in many fields, including speech, text, image, and video. Ever deeper networks have been proposed with great success, with ResNet structures having around 152 layers. Shallow convolutional neural networks (CNNs) are still an active research area, where some phenomena remain unexplained. Activation functions used in the network are of utmost importance, as they provide nonlinearity to the networks. ReLUs are the most commonly used activation function. We show a complex piecewise linear (PWL) activation in the hidden layer. We show that these PWL activations work much better than ReLU activations in our networks for convolutional neural networks and multilayer perceptrons. Result comparisons in PyTorch for shallow and deep CNNs are given to further strengthen our case.
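A piecewise linear activation can be described by a set of hinge locations and the output value at each hinge, with linear interpolation in between. The sketch below evaluates such a function for a scalar input; treating the hinge values as the trainable parameters and extending the end segments linearly are assumptions for illustration, not details given in the abstract.

```python
import bisect


def pwl_activation(x, hinges, values):
    """Evaluate a piecewise linear function at x.

    hinges: strictly increasing hinge locations (at least two).
    values: activation output at each hinge (hypothetically trainable).
    Inputs outside the hinge range follow the slope of the nearest end segment.
    """
    # Locate the segment [hinges[i], hinges[i + 1]] that contains x.
    i = bisect.bisect_right(hinges, x) - 1
    i = max(0, min(i, len(hinges) - 2))
    x0, x1 = hinges[i], hinges[i + 1]
    y0, y1 = values[i], values[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

With hinges `[-1, 0, 1]` and values `[0, 0, 1]` this reproduces ReLU, so ReLU is a special case and additional hinges give the activation more freedom to change shape during training.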

翻訳日:2024-02-21 06:08:38 公開日:2024-02-18

# 高絡み合いトーリックxyモデルにおけるエニオン

Anyons in a highly-entangled toric xy model ( http://arxiv.org/abs/2308.01765v2 )

ライセンス: Link先を確認

Milo Moses, Konrad Deka

(参考訳) 1989年にXiao-Gang Wenによって表面的には造られたが、1972年からは古典的xyモデルの振る舞いを記述するためにトポロジカル秩序 (topological order) という用語が用いられてきた。 xyモデルは、非位相的 u(1) ゲージ作用の対象となるため、ウェンの位相次数を持たないことが指摘されている。私たちはある意味でこれが唯一の障害であることを示している。すなわち、ゲージ不変性がエネルギー的に強制されると、$xy$モデルは純粋に位相的に順序づけられる。実際、量子$xy$トポロジカル位数は、群 G=Z に適用された北エフの量子二重模型の無限格子極限であることを示す。

While ostensibly coined in 1989 by Xiao-Gang Wen, the term "topological order" has been in use since 1972 to describe the behavior of the classical xy model. It has been noted that the xy model does not have Wen's topological order, since it is also subject to a non-topological U(1) gauge action. We show that, in a sense, this is the only obstruction. That is, if gauge invariance is enforced energetically, then the $xy$ model becomes purely topologically ordered. In fact, we show that the quantum $xy$ topological order is an infinite lattice limit of Kitaev's quantum double model applied to the group G=Z.

翻訳日:2024-02-21 06:08:17 公開日:2024-02-18

# Decoupled Training: フラストレーションに易しいマルチドメイン学習の復活

Decoupled Training: Return of Frustratingly Easy Multi-Domain Learning ( http://arxiv.org/abs/2309.10302v2 )

ライセンス: Link先を確認

Ximei Wang, Junwei Pan, Xingzhuo Guo, Dapeng Liu, Jie Jiang

(参考訳) マルチドメイン学習(mdl)は、重複する複数のドメインに対して、最小平均リスクでモデルをトレーニングすることを目的としている。データセットバイアスとドメイン支配の課題に対処するために、分布を整列してドメインギャップを減らしたり、ドメイン固有のタワーやゲート、さらには専門家による差異を保ったりすることで共通性を求める多くのMDLアプローチが提案されている。 MDLモデルは、高度なネットワークアーキテクチャや損失関数によってますます複雑になり、余分なパラメータを導入し、計算コストを増大させています。本稿では,Decoupled Training (D-Train) という名前のマルチドメイン学習手法を提案する。 d-trainは、まずすべてのドメインを事前トレーニングしてルートモデルをウォームアップし、次にマルチヘッドに分割して各ドメインをポストトレーニングし、最終的にバックボーンを固定することでヘッドを微調整し、トレーニングを分離してドメイン独立を達成する3段階のトレーニング戦略である。 d-trainは単純さと効率性にも拘わらず、標準的なベンチマークから衛星画像やレコメンデーションシステムの応用に至るまで、さまざまなデータセットの広範な評価において非常に優れた性能を発揮している。

Multi-domain learning (MDL) aims to train a model with minimal average risk across multiple overlapping but non-identical domains. To tackle the challenges of dataset bias and domain domination, numerous MDL approaches have been proposed from the perspectives of seeking commonalities by aligning distributions to reduce domain gap or preserving differences by implementing domain-specific towers, gates, and even experts. MDL models are becoming increasingly complex with sophisticated network architectures or loss functions, introducing extra parameters and increasing computation costs. In this paper, we propose a frustratingly easy and hyperparameter-free multi-domain learning method named Decoupled Training (D-Train). D-Train is a tri-phase general-to-specific training strategy that first pre-trains on all domains to warm up a root model, then post-trains on each domain by splitting into multi-heads, and finally fine-tunes the heads by fixing the backbone, enabling decoupled training to achieve domain independence. Despite its extraordinary simplicity and efficiency, D-Train performs remarkably well in extensive evaluations of various datasets from standard benchmarks to applications of satellite imagery and recommender systems.

翻訳日:2024-02-21 06:00:21 公開日:2024-02-18

# アンカーポイント: 少ない例でベンチマークモデル

Anchor Points: Benchmarking Models with Much Fewer Examples ( http://arxiv.org/abs/2309.08638v2 )

ライセンス: Link先を確認

Rajan Vivek, Kawin Ethayarajh, Diyi Yang, Douwe Kiela

(参考訳) 現代の言語モデルは、しばしば強力だが不安定な振る舞いを示し、その振る舞いを確実に評価するより大きく、より多様なベンチマークの開発につながる。ここでは,モデルの性能を,より小さな評価セットでベンチマークし,解くことを提案する。まず,6つの人気言語分類ベンチマークにおいて,多くの点に対する正しいクラスに対するモデル信頼度は,モデル間で強く相関していることを示す。 Anchor Point Selectionは、データセット全体のモデル挙動をキャプチャするデータセットの小さなサブセットを選択するテクニックである。 1-30アンカーポイントを用いたモデルの評価は、正確なランキングモデルにおける一様サンプリングやその他のベースラインよりも優れています。さらに、いくつかのアンカーポイントを使用して、低平均の絶対誤差を持つデータセット内の他のすべてのポイントにおけるクラス毎のモデル予測を見積もることができる。最後に,これらの知見を可視化し,データセット分布内の様々な領域における異なるモデルの性能比較を容易にするアンカーポイントマップを提案する。

Modern language models often exhibit powerful but brittle behavior, leading to the development of larger and more diverse benchmarks to reliably assess their behavior. Here, we suggest that model performance can be benchmarked and elucidated with much smaller evaluation sets. We first show that in six popular language classification benchmarks, model confidence in the correct class on many pairs of points is strongly correlated across models. We build upon this phenomenon to propose Anchor Point Selection, a technique to select small subsets of datasets that capture model behavior across the entire dataset. Anchor points reliably rank models: across 87 diverse language model-prompt pairs, evaluating models using 1-30 anchor points outperforms uniform sampling and other baselines at accurately ranking models. Moreover, just several anchor points can be used to estimate model per-class predictions on all other points in a dataset with low mean absolute error, sufficient for gauging where the model is likely to fail. Lastly, we present Anchor Point Maps for visualizing these insights and facilitating comparisons of the performance of different models on various regions within the dataset distribution.
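
The observation that confidences on pairs of points are correlated across models suggests a simple selection procedure. The sketch below is an assumption-laden illustration, not the authors' algorithm: it computes point-to-point correlations from a hypothetical `conf` matrix (models by points) and greedily picks anchors so that every point is highly correlated with at least one anchor.

```python
import numpy as np


def point_correlations(conf):
    """conf[m, i] = confidence of model m in the correct class of point i.

    Returns the (points x points) Pearson correlation across models.
    Assumes no point has constant confidence over all models.
    """
    return np.corrcoef(conf, rowvar=False)


def select_anchor_points(conf, k):
    """Greedy, facility-location style choice of k anchor indices (illustrative)."""
    corr = point_correlations(conf)
    n_points = corr.shape[0]
    # best[j] = highest correlation between point j and any anchor chosen so far
    best = np.full(n_points, -np.inf)
    anchors = []
    for _ in range(k):
        # coverage[i] = total similarity if point i were added as an anchor
        coverage = np.maximum(best[None, :], corr).sum(axis=1)
        if anchors:
            coverage[anchors] = -np.inf  # never pick the same point twice
        choice = int(np.argmax(coverage))
        anchors.append(choice)
        best = np.maximum(best, corr[choice])
    return anchors
```

A new model would then be evaluated only on the returned indices, with its behavior on the remaining points inferred from the anchors they correlate with.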

翻訳日:2024-02-21 05:58:54 公開日:2024-02-18

# PROGrasp: 物体把握のための実用的ヒューマンロボットコミュニケーション

PROGrasp: Pragmatic Human-Robot Communication for Object Grasping ( http://arxiv.org/abs/2309.07759v2 )

ライセンス: Link先を確認

Gi-Cheon Kang, Junghyun Kim, Jaein Kim, Byoung-Tak Zhang

(参考訳) 対話型オブジェクトグラスピング(IOG)は、人間とロボットの自然言語による対話を通じて、望ましいオブジェクトを識別し、把握するタスクである。現在のIOGシステムは、人間が最初に対象のオブジェクトのカテゴリ(例えばボトル)を指定すると仮定している。目的達成のためにコンテキストに依存して意図を伝達する実践的手法に触発されて,新たなIOGタスクであるPragmatic-IOGと,それに対応するデータセットであるIntention-oriented Multi-modal Dialogue (IM-Dial)を導入する。提案するタスクシナリオでは、まず、意図指向の発話(例えば「喉が渇いている」など)がロボットに与えられる。ロボットは、人間のユーザと対話することで、対象物を識別する。タスク設定に基づいて,ユーザの意図を解釈し,対象物であるPROGrasp(Pragmatic Object Grasping)をピックアップするロボットシステムを提案する。 PROGraspは、視覚的なグラウンドニング、質問、オブジェクトの把握、そして最も重要なのは、実用的推論の解答解釈のモジュールを組み込むことで、Pragmatic-IOGを実行する。 ProGraspはオフライン(ターゲットオブジェクト発見)やオンライン(物理ロボットアーム付きIOG)の設定で有効であることを示す実験結果が得られた。コードとデータはhttps://github.com/gicheonkang/prograspで入手できる。

Interactive Object Grasping (IOG) is the task of identifying and grasping the desired object via human-robot natural language interaction. Current IOG systems assume that a human user initially specifies the target object's category (e.g., bottle). Inspired by pragmatics, where humans often convey their intentions by relying on context to achieve goals, we introduce a new IOG task, Pragmatic-IOG, and the corresponding dataset, Intention-oriented Multi-modal Dialogue (IM-Dial). In our proposed task scenario, an intention-oriented utterance (e.g., "I am thirsty") is initially given to the robot. The robot should then identify the target object by interacting with a human user. Based on the task setup, we propose a new robotic system that can interpret the user's intention and pick up the target object, Pragmatic Object Grasping (PROGrasp). PROGrasp performs Pragmatic-IOG by incorporating modules for visual grounding, question asking, object grasping, and most importantly, answer interpretation for pragmatic inference. Experimental results show that PROGrasp is effective in offline (i.e., target object discovery) and online (i.e., IOG with a physical robot arm) settings. Code and data are available at https://github.com/gicheonkang/prograsp.

翻訳日:2024-02-21 05:58:19 公開日:2024-02-18

# Contrast-Phys+: 時空間コントラストによる教師なし・弱教師付きビデオベース遠隔生理計測

Contrast-Phys+: Unsupervised and Weakly-supervised Video-based Remote Physiological Measurement via Spatiotemporal Contrast ( http://arxiv.org/abs/2309.06924v3 )

ライセンス: Link先を確認

Zhaodong Sun and Xiaobai Li

(参考訳) ビデオベースの遠隔生理計測は、顔の映像を利用して血液量変化信号を測定する。 rPPG測定の監視手法は優れた性能を発揮することが示されている。しかし、これらの手法の欠点は、しばしばコストがかかり入手が困難である、地上の真実(GT)生理学的信号を持つ顔ビデオを必要とすることである。本稿では,教師なし設定と弱い教師なし設定の両方で訓練できる方法であるcon contrast-phys+を提案する。我々は3DCNNモデルを用いて、複数の時空間rPPG信号を生成し、rPPGの事前知識を対照的な損失関数に組み込む。さらに、GT信号をコントラスト学習に組み込んで、部分的または不正なラベルに適応させる。対照的な損失は、同じビデオからのrPPG/GT信号をグループ化し、異なるビデオからそれらを分離させる。 RGBおよび近赤外ビデオを含む5つの公開データセットに対して,本手法の評価を行った。コントラスト-Phys+は、部分的に利用可能または不一致のGT信号を使用する場合やラベルが全くない場合でも、最先端の教師付き手法よりも優れている。さらに,計算効率,雑音頑健性,一般化の観点から,本手法の利点を強調した。私たちのコードはhttps://github.com/zhaodongsun/contrast-physで利用可能です。

Video-based remote physiological measurement utilizes facial videos to measure the blood volume change signal, which is also called remote photoplethysmography (rPPG). Supervised methods for rPPG measurements have been shown to achieve good performance. However, the drawback of these methods is that they require facial videos with ground truth (GT) physiological signals, which are often costly and difficult to obtain. In this paper, we propose Contrast-Phys+, a method that can be trained in both unsupervised and weakly-supervised settings. We employ a 3DCNN model to generate multiple spatiotemporal rPPG signals and incorporate prior knowledge of rPPG into a contrastive loss function. We further incorporate the GT signals into contrastive learning to adapt to partial or misaligned labels. The contrastive loss encourages rPPG/GT signals from the same video to be grouped together, while pushing those from different videos apart. We evaluate our methods on five publicly available datasets that include both RGB and Near-infrared videos. Contrast-Phys+ outperforms the state-of-the-art supervised methods, even when using partially available or misaligned GT signals, or no labels at all. Additionally, we highlight the advantages of our methods in terms of computational efficiency, noise robustness, and generalization. Our code is available at https://github.com/zhaodongsun/contrast-phys.

翻訳日:2024-02-21 05:57:48 公開日:2024-02-18

# DePT:パラメータ効率の良い微調整のための分解プロンプトチューニング

DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning ( http://arxiv.org/abs/2309.05173v5 )

ライセンス: Link先を確認

Zhengxiang Shi, Aldo Lipani

(参考訳) 言語モデル(lm)の入力に少量の訓練可能なソフト(連続)プロンプトベクトルが固定されるプロンプトチューニング(pt)は、パラメータ効率の良い微調整(peft)のための様々なタスクやモデルに対して有望な結果を示している。 PTは、トレーニング可能なパラメータが少なくて競合性能を保ち、モデルのサイズが拡大するにつれてパラメータを劇的にスケールアップしないため、他のPEFTアプローチと際立っている。しかし、PTはソフトプロンプトトークンを導入し、入力シーケンスが長くなり、Transformerの2次複雑さによるトレーニングや推論時間、メモリ使用量に大きな影響を及ぼす。特に大きな言語モデル(llm)では、日々の大量のクエリに直面する。この問題に対処するために,ソフトプロンプトを短いソフトプロンプトと2つの異なる学習率で最適化された2つの低ランク行列に分解するDecomposed Prompt Tuning (DePT)を提案する。これにより、トレーニング可能なパラメータサイズを変更することなく、バニラPTとその変種と比較してメモリと時間コストを大幅に削減しながら、パフォーマンスが向上する。 23の自然言語処理(NLP)と視覚言語(VL)タスクに関する広範な実験を通じて、DePTが最先端のPEFTアプローチより優れていることを示す。さらに,モデルサイズが大きくなるにつれてdeptがより効率的になることを示す。さらに,DePTは数ショットの学習環境においてパラメータ効率のよい伝達学習とシームレスに統合され,様々なモデルアーキテクチャやサイズへの適応性を強調している。

Prompt tuning (PT), where a small number of trainable soft (continuous) prompt vectors are affixed to the input of language models (LMs), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences, which significantly impacts training and inference time and memory usage due to the Transformer's quadratic complexity. This is particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. This allows DePT to achieve better performance while saving substantial memory and time costs compared to vanilla PT and its variants, without changing trainable parameter sizes. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline, in some scenarios. Additionally, we empirically show that DePT grows more efficient as the model size increases. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.
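
The decomposition can be illustrated with a small shape-level sketch. The abstract only states that the soft prompt is split into a shorter prompt plus a pair of low-rank matrices; the assumption made here is that the low-rank product is added to the frozen input embeddings. All sizes below (`d`, `s`, `l`, `m`, `r`) are illustrative choices picked so that the trainable parameter counts match.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s = 768, 256   # embedding width and input length (illustrative)
l = 100           # vanilla soft prompt length
m, r = 60, 30     # shorter prompt length and low-rank dimension (illustrative)


def vanilla_pt_input(E, P):
    """Vanilla PT: prepend l trainable prompt vectors to frozen embeddings E."""
    return np.concatenate([P, E], axis=0)


def dept_input(E, P_short, A, B):
    """Sketch of DePT: shorter prompt plus a low-rank update A @ B on E (assumed placement)."""
    return np.concatenate([P_short, E + A @ B], axis=0)


def n_params_vanilla(l, d):
    return l * d


def n_params_dept(m, r, s, d):
    return m * d + (s + d) * r


E = rng.normal(size=(s, d))        # frozen input embeddings
P = rng.normal(size=(l, d))        # trainable vanilla prompt
P_short = rng.normal(size=(m, d))  # trainable shorter prompt
A = rng.normal(size=(s, r))        # trainable low-rank factor
B = np.zeros((r, d))               # zero init leaves E unchanged at the start
```

In this configuration both variants train 76,800 parameters, but the DePT-style input is 316 tokens long instead of 356, which is where the memory and time savings under quadratic attention cost would come from.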

翻訳日:2024-02-21 05:57:24 公開日:2024-02-18

# ローエンド32ビットIoTデバイス上での高速KyberのためのPlantard Arithmeticの改良

Yet another Improvement of Plantard Arithmetic for Faster Kyber on Low-end 32-bit IoT Devices ( http://arxiv.org/abs/2309.00440v3 )

ライセンス: Link先を確認

Junhao Huang, Haosong Zhao, Jipeng Zhang, Wangchen Dai, Lu Zhou, Ray C.C. Cheung, Cetin Kaya Koc, Donglong Chen

(参考訳) 本稿では、SIMD拡張のない2つのローエンド32ビットIoTプラットフォーム(ARM Cortex-M3とRISC-V)上でKyberの実装を高速化するPlanard演算の別の改良版を提案する。具体的には、計算ステップを変更することなく、Planard演算の入力範囲をさらに拡大する。 Kyber のモジュラーに対して、Planard 算術を調整した後、定数によるPlanard 乗算の入力範囲は、TCHES2022 の元の設計よりも少なくとも2.14倍大きいことを示す。次に, Cortex-M3 と RISC-V の2つの最適化手法を提案する。プランタード算術はローエンド32ビットプラットフォーム上でモンゴメリー算術とバレット算術の両方に取って代わることを示す。これらのプラットフォーム上でのインプット範囲の拡大とPlanard演算の効率的な実装により,NTT/INTTの最適化手法を提案する。ローエンド32ビットプラットフォーム上で提案したPlanard演算の入力範囲を大きくすることで,NTT/INTTにおける係数のモジュラー化を最小化あるいは完全に排除する。さらに,2つのメモリ最適化手法を提案し,cortex-m4に比較して,速度変換kyber実装のスタック使用率を23.50%から28.31%に削減した。提案した最適化により、ローエンドIoTデバイス上でのスピードバージョン実装がより実現可能になった。上記の最適化のおかげで、NTT/INTTの実装は最先端の作業と比べてかなりスピードアップしている。全体として、メモリ制限されたIoTプラットフォーム上での速度変換Kyberの実装の適用性を示し、これらのプラットフォーム上でKyberの新しい速度記録を設定します。

This paper presents another improved version of Plantard arithmetic that could speed up Kyber implementations on two low-end 32-bit IoT platforms (ARM Cortex-M3 and RISC-V) without SIMD extensions. Specifically, we further enlarge the input range of the Plantard arithmetic without modifying its computation steps. After tailoring the Plantard arithmetic for Kyber's modulus, we show that the input range of the Plantard multiplication by a constant is at least 2.14 times larger than the original design in TCHES2022. Then, two optimization techniques for efficient Plantard arithmetic on Cortex-M3 and RISC-V are presented. We show that the Plantard arithmetic supersedes both Montgomery and Barrett arithmetic on low-end 32-bit platforms. With the enlarged input range and the efficient implementation of the Plantard arithmetic on these platforms, we propose various optimization strategies for NTT/INTT. We minimize or entirely eliminate the modular reduction of coefficients in NTT/INTT by taking advantage of the larger input range of the proposed Plantard arithmetic on low-end 32-bit platforms. Furthermore, we propose two memory optimization strategies that reduce stack usage by 23.50% to 28.31% for the speed-version Kyber implementation when compared to its counterpart on Cortex-M4. The proposed optimizations make the speed-version implementation more feasible on low-end IoT devices. Thanks to the aforementioned optimizations, our NTT/INTT implementation shows considerable speedups compared to the state-of-the-art work. Overall, we demonstrate the applicability of the speed-version Kyber implementation on memory-constrained IoT platforms and set new speed records for Kyber on these platforms.

翻訳日:2024-02-21 05:56:41 公開日:2024-02-18

# 多数の権限を与え、バイアスを負う: 大規模言語モデルによるジェネラリストクレジットスコアリング

Empowering Many, Biasing a Few: Generalist Credit Scoring through Large Language Models ( http://arxiv.org/abs/2310.00566v3 )

ライセンス: Link先を確認

Duanyu Feng, Yongfu Dai, Jimin Huang, Yifang Zhang, Qianqian Xie, Weiguang Han, Zhengyu Chen, Alejandro Lopez-Lira, Hao Wang

(参考訳) 金融業界では、クレジットスコアリングが基本的な要素であり、クレジットへのアクセスを形成し、個人やビジネスのローン条件を決定する。しかし、伝統的なクレジットスコアリング手法は、狭い知識範囲や独立したクレジットタスクの評価といった課題にしばしば対処している。我々の研究は、Large Language Models (LLM) が複数のタスクにまたがる強力な一般化能力を持つ信用スコアリングタスクに大きな可能性を持っていることを示唆している。クレジットスコアリングのためのLCMを体系的に探索するために,我々は,最初のオープンソース包括的フレームワークを提案する。筆者らは,14Kサンプルを用いた9つのデータセットを対象とし,LLM内の潜在的なバイアスに対する評価と評価を行うとともに,45k以上のサンプルを用いた新しいインストラクションチューニングデータについて検証した。そこで我々は,各種金融リスク評価タスクの煩雑な要求に合わせて,指導チューニングによる最初の信用リスク評価大言語モデル(CALM)を提案する。ビルドベンチマークでは,CALM,既存の最先端(SOTA)メソッド,オープンソースおよびクローズドソースのLCMを評価した。我々の経験的結果は、LLMが従来のモデルに適合するだけでなく、信用スコアがより包括的で包括的で偏見のない未来へ向けて、従来のモデルを上回る能力を示す。我々は、先駆的なインストラクションチューニングデータセット、信用とリスクアセスメントLLM、および研究コミュニティと金融業界とのベンチマークを共有することで、業界変革に貢献する。

In the financial industry, credit scoring is a fundamental element, shaping access to credit and determining the terms of loans for individuals and businesses alike. Traditional credit scoring methods, however, often grapple with challenges such as narrow knowledge scope and isolated evaluation of credit tasks. Our work posits that Large Language Models (LLMs) have great potential for credit scoring tasks, with strong generalization ability across multiple tasks. To systematically explore LLMs for credit scoring, we propose the first open-source comprehensive framework. We curate a novel benchmark covering 9 datasets with 14K samples, tailored for credit assessment and a critical examination of potential biases within LLMs, and novel instruction-tuning data with over 45K samples. We then propose the first Credit and Risk Assessment Large Language Model (CALM) by instruction tuning, tailored to the nuanced demands of various financial risk assessment tasks. We evaluate CALM, existing state-of-the-art (SOTA) methods, and open-source and closed-source LLMs on the built benchmark. Our empirical results illuminate the capability of LLMs to not only match but surpass conventional models, pointing towards a future where credit scoring can be more inclusive, comprehensive, and unbiased. We contribute to the industry's transformation by sharing our pioneering instruction-tuning datasets, credit and risk assessment LLM, and benchmarks with the research community and the financial industry.

翻訳日:2024-02-21 05:47:51 公開日:2024-02-18

# スケールでの粒度:高解像度オーソグラフィー画像とハイブリッド学習による近隣社会経済指標の推定

Granularity at Scale: Estimating Neighborhood Socioeconomic Indicators from High-Resolution Orthographic Imagery and Hybrid Learning ( http://arxiv.org/abs/2309.16808v3 )

ライセンス: Link先を確認

Ethan Brewer, Giovani Valdrighi, Parikshit Solunke, Joao Rulff, Yurii Piadyk, Zhonghui Lv, Jorge Poco, and Claudio Silva

(参考訳) 世界の多くの地域は、既存のデータ収集方法の限界のために、人口の社会経済的幸福に関する基本的な情報を持っていない。衛星や航空機などの遠隔地から得られたオーバーヘッド画像は、地上の生命状態の窓として機能し、より高解像度のセンサーを必要とするより小さなスケールでの推定で、コミュニティ情報が不足している「ギャップに埋める」のに役立つ。センサーの解像度の改善と並行して、機械学習とコンピュータビジョンの最近の進歩により、これらの特徴を他の情報と関連付けるプロセスにおいて、画像データのパターンから素早く特徴を抽出し、検出することが可能になった。本研究は, 教師付き畳み込みニューラルネットワークと半教師付きクラスタリングという2つのアプローチが, 人口密度, 中央値の世帯所得, および全米の都市の高解像度画像から各地区の教育的到達度を推定するものである。その結果、画像から抽出された特徴は、近隣の人口密度 (r$^2$- 0.81) を正確に推定でき、教師付きアプローチにより、人口の所得と教育の変動の約半分を説明できることがわかった。地理的一般化の基盤となる提示されたアプローチに加えて、新しい半教師付きアプローチは、ラベルデータを必要としない航空画像から微細な情報を推定する将来の研究の基盤を提供する。

Many areas of the world are without basic information on the socioeconomic well-being of the residing population due to limitations in existing data collection methods. Overhead images obtained remotely, such as from satellite or aircraft, can help serve as windows into the state of life on the ground and help "fill in the gaps" where community information is sparse, with estimates at smaller geographic scales requiring higher resolution sensors. Concurrent with improved sensor resolutions, recent advancements in machine learning and computer vision have made it possible to quickly extract features from and detect patterns in image data, in the process correlating these features with other information. In this work, we explore how well two approaches, a supervised convolutional neural network and semi-supervised clustering based on bag-of-visual-words, estimate population density, median household income, and educational attainment of individual neighborhoods from publicly available high-resolution imagery of cities throughout the United States. Results and analyses indicate that features extracted from the imagery can accurately estimate the density (R$^2$ up to 0.81) of neighborhoods, with the supervised approach able to explain about half the variation in a population's income and education. In addition to the presented approaches serving as a basis for further geographic generalization, the novel semi-supervised approach provides a foundation for future work seeking to estimate fine-scale information from aerial imagery without the need for label data.

翻訳日:2024-02-21 05:46:37 公開日:2024-02-18

# LSTDとランダム特徴を用いた強化学習における二重降下について

On Double Descent in Reinforcement Learning with LSTD and Random Features ( http://arxiv.org/abs/2310.05518v4 )

ライセンス: Link先を確認

David Brellmann, Elo\"ise Berthier, David Filliat and Goran Frehse

(参考訳) 時間差分法(TD)アルゴリズムは深層強化学習(RL)において広く用いられている。その性能はニューラルネットワークのサイズに大きく影響されている。教師付き学習では、過度パラメータ化の体制とその利点はよく理解されているが、RLの状況は明らかになっていない。本稿では,ネットワークサイズと$l_2$-regularizationが性能に与える影響を理論的に分析する。パラメータ数と訪問状態数との比率を重要な要因として同定し,1以上の場合の過剰パラメータ化をレジームとして定義する。さらに,二重降下現象,すなわち1のパラメータ/状態比付近で突然性能が低下する現象を観測した。ランダムな特徴と遅延学習体制を生かし、パラメータ数と状態が無限に近づき、一定比を維持するため、漸近的条件下でのLSTD(Last-Square Temporal difference)アルゴリズムについて検討する。二重降下の原因となる補正項を特徴とする経験的および真のベルマン誤差(MSBE)の決定論的限界を導出する。補正項は、$l_2$-レギュライゼーションが増加したり、見返りのない状態がゼロになったときに消滅する。合成環境と小さな実環境における数値実験は、理論的な予測と密接に一致する。

Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL). Their performance is heavily influenced by the size of the neural network. While in supervised learning, the regime of over-parameterization and its benefits are well understood, the situation in RL is much less clear. In this paper, we present a theoretical analysis of the influence of network size and $l_2$-regularization on performance. We identify the ratio between the number of parameters and the number of visited states as a crucial factor and define over-parameterization as the regime when it is larger than one. Furthermore, we observe a double descent phenomenon, i.e., a sudden drop in performance around the parameter/state ratio of one. Leveraging random features and the lazy training regime, we study the regularized Least-Square Temporal Difference (LSTD) algorithm in an asymptotic regime, as both the number of parameters and states go to infinity, maintaining a constant ratio. We derive deterministic limits of both the empirical and the true Mean-Squared Bellman Error (MSBE) that feature correction terms responsible for the double descent. Correction terms vanish when the $l_2$-regularization is increased or the number of unvisited states goes to zero. Numerical experiments with synthetic and small real-world environments closely match the theoretical predictions.
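
For readers unfamiliar with the estimator being analysed, the following is a minimal sketch of $l_2$-regularized LSTD on random features. It uses a textbook form of LSTD with a ridge term; the exact scaling of the regularizer in the paper may differ, and the ReLU random-feature map and all names are assumptions made for illustration.

```python
import numpy as np


def random_features(S, W):
    """Random ReLU features: S is (n, state_dim), W is a fixed random (N, state_dim) matrix."""
    return np.maximum(S @ W.T, 0.0)


def regularized_lstd(Phi, Phi_next, rewards, gamma, lam):
    """Solve (Phi^T (Phi - gamma * Phi_next) / n + lam * I) theta = Phi^T r / n.

    Phi, Phi_next: (n, N) features of visited states and their successors.
    The parameter/state ratio discussed in the abstract corresponds to N / n.
    """
    n, N = Phi.shape
    A = Phi.T @ (Phi - gamma * Phi_next) / n + lam * np.eye(N)
    b = Phi.T @ rewards / n
    return np.linalg.solve(A, b)


def predicted_values(Phi, theta):
    """Linear value estimates V(s) = phi(s)^T theta."""
    return Phi @ theta
```

Sweeping the number of features N past the number of visited states n, with a small `lam`, is the kind of experiment in which the double descent behavior described above would be looked for.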

翻訳日:2024-02-21 05:36:22 公開日:2024-02-18

# Hieros: 構造化状態空間シーケンスワールドモデルに関する階層的イマジネーション

Hieros: Hierarchical Imagination on Structured State Space Sequence World Models ( http://arxiv.org/abs/2310.05167v3 )

ライセンス: Link先を確認

Paul Mattes, Rainer Schlosser, Ralf Herbrich

(参考訳) 現代的深層強化学習(drl)アルゴリズムの最大の課題の1つはサンプル効率である。多くのアプローチは、エージェントを完全に想像力で訓練するために世界モデルを学び、トレーニング中に直接環境相互作用の必要性をなくす。しかし、これらの方法はしばしば想像力の正確さ、探索能力、実行時の効率の欠如に苦しむ。本研究では,時間的抽象世界表現を学習し,複数の時間的空間における軌跡を推定する階層的ポリシーであるHierosを提案する。 hierosはs5レイヤベースの世界モデルを使用して、トレーニング中と環境相互作用中の反復的に次の世界状態を並列に予測する。 s5層の特殊性により,並列に学習し,イマジネーション中に次世界の状態を反復的に予測できる。これにより、rnnベースのワールドモデルよりも効率的なトレーニングと、トランスフォーマーベースのワールドモデルよりも効率的なイマジネーションが可能になる。このアプローチはatari 100kベンチマークで平均値と平均値の正規化人間のスコアの点でアートの状態を上回っており、提案する世界モデルは複雑なダイナミクスを非常に正確に予測できることを示した。また、hierosは既存のアプローチよりも優れた探索能力を示している。

One of the biggest challenges to modern deep reinforcement learning (DRL) algorithms is sample efficiency. Many approaches learn a world model in order to train an agent entirely in imagination, eliminating the need for direct environment interaction during training. However, these methods often suffer from either a lack of imagination accuracy, exploration capabilities, or runtime efficiency. We propose Hieros, a hierarchical policy that learns time abstracted world representations and imagines trajectories at multiple time scales in latent space. Hieros uses an S5 layer-based world model, which predicts next world states in parallel during training and iteratively during environment interaction. Due to the special properties of S5 layers, our method can train in parallel and predict next world states iteratively during imagination. This allows for more efficient training than RNN-based world models and more efficient imagination than Transformer-based world models. We show that our approach outperforms the state of the art in terms of mean and median normalized human score on the Atari 100k benchmark, and that our proposed world model is able to predict complex dynamics very accurately. We also show that Hieros displays superior exploration capabilities compared to existing approaches.

翻訳日:2024-02-21 05:35:01 公開日:2024-02-18

# 不均衡階層型最適トランスポートフレームワークを用いたロバストグラフマッチング

Robust Graph Matching Using An Unbalanced Hierarchical Optimal Transport Framework ( http://arxiv.org/abs/2310.12081v4 )

ライセンス: Link先を確認

Haoran Cheng, Dixin Luo, Hongteng Xu

(参考訳) グラフマッチングは、異なるグラフ間のノード対応を見つけることを目的とした、最も重要なグラフ解析タスクの1つである。既存のグラフマッチングアプローチの多くは、ノード属性やサブグラフ構造など、グラフに隠されているマルチモーダル情報を十分に活用していないため、パフォーマンスが最適でデータノイズに敏感なトポロジ情報に依存している。本研究では,不均衡な階層的最適輸送(UHOT)フレームワークに基づく新しい頑健なグラフマッチング手法を提案する。原則として、多層メッセージパッシングを適用して、各グラフを異なるモードに対応する層ワイドノード埋め込みとして表現する。 2つのグラフが与えられたとき、それぞれのノードの埋め込みをそれぞれ同じモダリティと異なるモダリティに並べる。そして、全てのアライメント結果の重み付き平均によりノード対応を推定する。この方法は、2つのグラフ間のUHOT距離を計算するために実装され、各アライメントは2つのノード埋め込み間のノードレベル最適トランスポート計画によって達成され、全てのアライメント結果の重みは不均衡なモダリティレベル最適トランスポート計画に対応する。様々なグラフマッチングタスクにおける実験は、最先端のアプローチと比較して、提案手法の優越性と頑健性を示している。実装はhttps://github.com/Dixin-Lab/UHOT-GMで公開しています。

Graph matching is one of the most significant graph analytic tasks, which aims to find the node correspondence across different graphs. Most existing graph matching approaches mainly rely on topological information, whose performances are often sub-optimal and sensitive to data noise because of not fully leveraging the multi-modal information hidden in graphs, such as node attributes, subgraph structures, etc. In this study, we propose a novel and robust graph matching method based on an unbalanced hierarchical optimal transport (UHOT) framework, which, to our knowledge, makes the first attempt to exploit cross-modal alignment in graph matching. In principle, applying multi-layer message passing, we represent each graph as layer-wise node embeddings corresponding to different modalities. Given two graphs, we align their node embeddings within the same modality and across different modalities, respectively. Then, we infer the node correspondence by the weighted average of all the alignment results. This method is implemented as computing the UHOT distance between the two graphs -- each alignment is achieved by a node-level optimal transport plan between two sets of node embeddings, and the weights of all alignment results correspond to an unbalanced modality-level optimal transport plan. Experiments on various graph matching tasks demonstrate the superiority and robustness of our method compared to state-of-the-art approaches. Our implementation is available at https://github.com/Dixin-Lab/UHOT-GM.

翻訳日:2024-02-21 05:23:47 公開日:2024-02-18

# 複素量子系における遷移状態理論の微視的導出

Microscopic derivation of transition-state theory for complex quantum systems ( http://arxiv.org/abs/2310.09537v2 )

ライセンス: Link先を確認

K. Hagino and G.F. Bertsch

(参考訳) ポテンシャル障壁による量子複雑系の崩壊は、化学においてRRKM理論として知られる遷移状態理論でしばしば説明される。ここでは、構成-相互作用基底で構築されるようなジェネリックハミルトニアンに基づく遷移状態理論の基本公式を導出する。ガウス直交アンサンブルからのランダムなハミルトニアンの2つの貯水池は、障壁における遷移状態を表す中間状態と結合される。貯水池の開水路への崩壊が大きい条件下では、反応速度の解析式が導出される。遷移状態は、総遷移確率に付加的に寄与する独立したブライト・ウィグナー共鳴として作用し、共鳴トンネル状態による電子伝導で知られている。また, 遷移確率は, 広範囲の崩壊幅にわたって第2貯留層における状態の崩壊特性とは無関係であることが判明した。

The decay of quantum complex systems through a potential barrier is often described with transition-state theory, also known as RRKM theory in chemistry. Here we derive the basic formula for transition-state theory based on a generic Hamiltonian as might be constructed in a configuration-interaction basis. Two reservoirs of random Hamiltonians from Gaussian orthogonal ensembles are coupled to intermediate states representing the transition states at a barrier. Under the condition that the decay of the reservoirs to open channels is large, an analytic formula for reaction rates is derived. The transition states act as independent Breit-Wigner resonances which contribute additively to the total transition probability, as is well known for electronic conductance through resonant tunneling states. It is also found that the transition probability is independent of the decay properties of the states in the second reservoir over a wide range of decay widths.

翻訳日:2024-02-21 05:21:05 公開日:2024-02-18

# LAiW: 中国の法律大言語モデルベンチマーク

LAiW: A Chinese Legal Large Language Models Benchmark ( http://arxiv.org/abs/2310.05620v2 )

ライセンス: Link先を確認

Yongfu Dai, Duanyu Feng, Jimin Huang, Haochen Jia, Qianqian Xie, Yifang Zhang, Weiguang Han, Wei Tian, Hao Wang

(参考訳) 一般および法的ドメイン LLM は LegalAI の様々なタスクにおいて高いパフォーマンスを示している。しかし、これらのLLMの現在の評価は、コンピュータサイエンスの専門家によって定義されており、法的な実践の論理と整合性に欠けており、実用能力の判断が困難である。この課題に対処するため、我々はまず、法的実践の論理に基づいて、中国の法的LLMベンチマークLAiWを構築しました。法律専門家の思考プロセスや法的実践(シロジズム)に合わせるために,LLMの法的能力は,基本的な情報検索,法的基礎推論,複雑な法的応用の3つのレベルに分割する。各レベルは総合的な評価を保証するために複数のタスクを含んでいる。本ベンチマークでは,現在の一般領域と法域のLLMを自動評価することにより,これらのLLMは法的な実践の論理と一致しない可能性が示唆された。 llmは、複雑な法的応用能力を直接獲得できるが、いくつかの基本的なタスクでは性能が悪く、その実用的適用や法の専門家の受け入れに支障を来す可能性がある。法律適用シナリオにおける現在のLLMの複雑な法的な適用能力をさらに確認するために、人間の評価を法の専門家に取り入れる。その結果, LLMは高い性能を示すが, 法論理の強化が必要であることが示唆された。

General and legal domain LLMs have demonstrated strong performance in various tasks of LegalAI. However, the current evaluations of these LLMs in LegalAI are defined by the experts of computer science, lacking consistency with the logic of legal practice, making it difficult to judge their practical capabilities. To address this challenge, we are the first to build the Chinese legal LLMs benchmark LAiW, based on the logic of legal practice. To align with the thinking process of legal experts and legal practice (syllogism), we divide the legal capabilities of LLMs from easy to difficult into three levels: basic information retrieval, legal foundation inference, and complex legal application. Each level contains multiple tasks to ensure a comprehensive evaluation. Through automated evaluation of current general and legal domain LLMs on our benchmark, we indicate that these LLMs may not align with the logic of legal practice. LLMs seem to be able to directly acquire complex legal application capabilities but perform poorly in some basic tasks, which may pose obstacles to their practical application and acceptance by legal experts. To further confirm the complex legal application capabilities of current LLMs in legal application scenarios, we also incorporate human evaluation with legal experts. The results indicate that while LLMs may demonstrate strong performance, they still require reinforcement of legal logic.

翻訳日:2024-02-21 05:19:24 公開日:2024-02-18

# DreamSmooth: Reward Smoothingによるモデルベース強化学習の改善

DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing ( http://arxiv.org/abs/2311.01450v2 )

ライセンス: Link先を確認

Vint Lee, Pieter Abbeel, Youngwoon Lee

(参考訳) モデルベース強化学習(MBRL)は、複雑な振る舞いをサンプル効率のよい方法で学習する能力で注目を集めている。その成功にもかかわらず、驚くべきことに、報酬予測はMBRLのボトルネックとなることが多い。人間が大まかな報酬推定から学べる直感に触発され、与えられた報酬の正確な報酬ではなく、時間的に滑らかな報酬を予測することを学ぶ、単純で効果的な報酬平滑化アプローチDreamSmoothを提案する。 dreamsmoothはdeepmind control suiteやatari benchmarksといった一般的なベンチマークのパフォーマンスを損なうことなく、サンプル効率と最終パフォーマンスの両方において、長時間ホリゾンスパースリワードタスクで最先端のパフォーマンスを達成している。

Model-based reinforcement learning (MBRL) has gained much attention for its ability to learn complex behaviors in a sample-efficient way: planning actions by generating imaginary trajectories with predicted rewards. Despite its success, we found that, surprisingly, reward prediction is often a bottleneck of MBRL, especially for sparse rewards that are challenging (or even ambiguous) to predict. Motivated by the intuition that humans can learn from rough reward estimates, we propose a simple yet effective reward smoothing approach, DreamSmooth, which learns to predict a temporally-smoothed reward, instead of the exact reward at the given timestep. We empirically show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks both in sample efficiency and final performance without losing performance on common benchmarks, such as DeepMind Control Suite and Atari benchmarks.
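
Temporal reward smoothing can be sketched in a few lines. The example below applies a normalized Gaussian kernel along the time axis of one episode; the choice of kernel, the `sigma` value, and the function name are assumptions for illustration, and the abstract does not say which smoothing function the authors use.

```python
import numpy as np


def gaussian_smooth_rewards(rewards, sigma=1.0):
    """Smooth a 1-D reward sequence over time with a normalized Gaussian kernel.

    Assumes the episode is at least as long as the kernel (2 * ceil(3 * sigma) + 1).
    Rewards close to the episode boundaries lose some mass with mode="same".
    """
    rewards = np.asarray(rewards, dtype=float)
    radius = int(np.ceil(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()  # keep the total reward (away from boundaries) unchanged
    return np.convolve(rewards, kernel, mode="same")


# A single sparse reward in the middle of a 21-step episode.
sparse = np.zeros(21)
sparse[10] = 1.0
smoothed = gaussian_smooth_rewards(sparse, sigma=1.0)
```

The reward model would then be trained to regress `smoothed` rather than `sparse`, so a prediction that is off by one or two timesteps is penalized far less than under the exact sparse target.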

翻訳日:2024-02-21 05:11:36 公開日:2024-02-18

# MCE: カントンと英語のオーディオデータセット

MCE: Mixed Cantonese and English Audio Dataset ( http://arxiv.org/abs/2310.17953v2 )

ライセンス: Link先を確認

Peng Xie, Zihao Xin, Yang Wang, Shengjun Huang, Tsz Wai Chan, Kani Chen

(参考訳) 近年、whisperは英語音声認識において人間のレベルのロバスト性と正確性にアプローチしているが、マイナー言語と混合言語音声認識では、さらなる改善が必要である。本研究では、自作したデータセットであるMixed Cantoneseand English (MCE)オーディオデータセットをトレーニングしたWhisper-MCEの印象的な結果を示す。 Whisper-MCEは14.28%のMER(Mix Error Rate)を達成したが、これはオリジナルのモデルよりも35.13%低かった。また、共通音声zh-HKでは12.61%の文字誤り率(CER)を達成した。しかし、MERとCERは、混合言語とマイナー言語での有効性を評価する上で、課題となる。そこで我々は,FALと呼ばれる新しい評価基準を提案し,元の音声,精度,レイテンシに対する忠実度に基づいて自動音声認識(ASR)システムを評価する。 Whisper-MCEは、この評価基準で他のモデルよりも優れ、90.91 FALのスコアを得た。 MCEデータセットとコードはhttps://github.com/Shelton1013/Whisper MCEで見ることができる。

Recently, Whisper has approached human-level robustness and accuracy in English speech recognition, while in minor-language and mixed-language speech recognition there remains a compelling need for further improvement. In this work, we present the results of Whisper-MCE, our fine-tuned Whisper model, which was trained using our self-collected Mixed Cantonese and English (MCE) audio dataset. Whisper-MCE achieved a Mix Error Rate (MER) of 14.28%, which is 35.13% lower than the original model. It also achieved a 12.61% Character Error Rate (CER) on Common Voice zh-HK, positioning it as state-of-the-art. However, MER and CER pose challenges when it comes to evaluating its effectiveness in mixed-language and minor-language contexts. We propose a novel evaluation metric called FAL, which assesses an Automatic Speech Recognition (ASR) system based on fidelity to the original audio, accuracy, and latency. Whisper-MCE outperformed other models on this metric, achieving a score of 90.91 FAL. The MCE dataset and code can be found at https://github.com/Shelton1013/Whisper MCE.

翻訳日:2024-02-21 05:09:02 公開日:2024-02-18

# 超伝導量子ビットをカオスに駆動する

Driving superconducting qubits into chaos ( http://arxiv.org/abs/2310.17698v2 )

ライセンス: Link先を確認

Jorge Chávez-Carlos, Miguel A. Prado Reynoso, Ignacio García-Mata, Victor S. Batista, Francisco Pérez-Bernal, Diego A. Wisniacki, Lea F. Santos

(参考訳) カーパラメトリック発振器は、フォールトトレラント量子コンピュータのためのビルディングブロックである。それらはKerr-cat量子ビットを安定化し、エラー保護された量子情報のエンコーディングと操作の利点を提供する。カーキャット量子ビットの最近の実現は、SNAILトランスモン超伝導回路とスクイーズ駆動の非線形性を生かした。非線形性の増大はゲート時間の短縮を可能にするが、ここで示すようにカオスを引き起こして量子ビットを溶かすこともできる。我々は,kerr-cat qubit の有効領域を決定し,その崩壊を実験的に検出する方法について検討した。パラメトリック量子計算の危険領域は、駆動超伝導回路による量子カオスの研究の場でもある。

Kerr parametric oscillators are potential building blocks for fault-tolerant quantum computers. They can stabilize Kerr-cat qubits, which offer advantages toward the encoding and manipulation of error-protected quantum information. The recent realization of Kerr-cat qubits made use of the nonlinearity of the SNAIL transmon superconducting circuit and a squeezing drive. Increasing nonlinearities can enable faster gate times, but, as shown here, can also induce chaos and melt the qubit away. We determine the region of validity of the Kerr-cat qubit and discuss how its disintegration could be experimentally detected. The danger zone for parametric quantum computation is also a potential playground for investigating quantum chaos with driven superconducting circuits.

翻訳日:2024-02-21 05:08:39 公開日:2024-02-18

# ヒルベルト空間固有プロブレムによって生成される仮定公式

Summation formulas generated by Hilbert space eigenproblem ( http://arxiv.org/abs/2310.17210v3 )

ライセンス: Link先を確認

Petar Mali, Sonja Gombar, Slobodan Radošević, Milica Rutonjski, Milan Pantić, Milica Pavkov-Hrvojević

(参考訳) 一般化超幾何関数を含むSchlömilch的無限級数と級数のあるクラスは、無限ポテンシャル井戸内に閉じ込められた粒子の単純な量子モデルと量子力学の原理から、閉じた形で計算できることを実証する。我々は、ヒルベルト空間の固有プロブレムに基づく一般的なフレームワークを提供し、異なる正確な可解量子モデルに適用することができる。明確に定義された量子問題における正規化条件から級数を取得することは、それらの収束を保証する。

We demonstrate that certain classes of Schlömilch-like infinite series and series that include generalized hypergeometric functions can be calculated in closed form starting from a simple quantum model of a particle trapped inside an infinite potential well and using principles of quantum mechanics. We provide a general framework based on the Hilbert space eigenproblem that can be applied to different exactly solvable quantum models. Obtaining series from normalization conditions in well-defined quantum problems secures their convergence.

翻訳日:2024-02-21 05:08:28 公開日:2024-02-18

# 雑音Werner-Holevoチャネルとその特性

The noisy Werner-Holevo channel and its properties ( http://arxiv.org/abs/2310.15353v6 )

ライセンス: Link先を確認

Shayan Roofeh, Vahid Karimipour

(参考訳) Werner-Holevo チャネル $\Lambda_{1} (\rho)=\frac{1}{2}(\text{tr}(\rho)I-\rho^T)$ への関心は主に、その抽象的な数学的性質に起因する。三次元およびわずかな修正により、このチャネルはランダムな角度でランダムな方向における量子状態の回転として実現できることを示した。我々の修正は $\Lambda_x(\rho)=(1-x)\rho+x\Lambda_1(\rho)$ の形を取る。したがって、量子処理タスクにおけるクトリットの潜在的利用や、様々なプラットフォームにおけるそれらの実現を考えると、修正されたwerner-holevoチャネルは、量子ビットに対する脱分極チャネルと同様に、非常に単純で現実的なノイズモデルとして使用できる。我々は、このチャネルを詳細に研究し、その様々な特性を導き出す。特に、最近提案されたフラグ拡張や他の手法を用いて、このチャネルの異なる容量に対する解析的表現と境界を導出する。これらの導出において対称性の役割が明らかになる。また、チャネル $\Lambda_x$ が反分解可能であり、したがって領域 $\frac{4}{7}\leq x\leq 1.$ において量子容量がゼロであることを厳格に証明する。

The interest in the Werner-Holevo channel $\Lambda_{1} (\rho)=\frac{1}{2}(\text{tr}(\rho)I-\rho^T)$ has been mainly due to its abstract mathematical properties. We show that in three dimensions and with a slight modification, this channel can be realized as the rotation of qutrit states in random directions by random angles. Our modification takes the form $\Lambda_x(\rho)=(1-x)\rho+x\Lambda_1(\rho)$. Therefore, in view of the potential use of qutrits in quantum processing tasks and their realization in many different platforms, the modified Werner-Holevo channel can be used as a very simple and realistic noise model, in the same way that the depolarizing channel is used for qubits. We make a detailed study of this channel and derive its various properties. In particular, we use the recently proposed flag extension and other techniques to derive analytical expressions and bounds for the different capacities of this channel. The role of symmetry is revealed in these derivations. We also rigorously prove that the channel $\Lambda_x$ is anti-degradable, and hence has zero quantum capacity, in the region $\frac{4}{7}\leq x\leq 1$.
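
The two formulas in the abstract can be checked numerically. The sketch below applies $\Lambda_1$ and $\Lambda_x$ to a random qutrit density matrix; the function names, the random test state, and the choice $x=0.7$ are illustrative assumptions, and only the two channel formulas come from the abstract.

```python
import numpy as np

def werner_holevo(rho):
    """Lambda_1(rho) = (tr(rho) I - rho^T) / 2, for a qutrit (3x3) state."""
    assert rho.shape == (3, 3), "the 1/2 prefactor is specific to three dimensions"
    return 0.5 * (np.trace(rho) * np.eye(3) - rho.T)

def noisy_werner_holevo(rho, x):
    """Lambda_x(rho) = (1 - x) rho + x Lambda_1(rho), with 0 <= x <= 1."""
    return (1.0 - x) * rho + x * werner_holevo(rho)

# Build a random qutrit density matrix: positive semidefinite with unit trace.
rng = np.random.default_rng(0)
a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = a @ a.conj().T
rho = rho / np.trace(rho).real

out = noisy_werner_holevo(rho, x=0.7)
trace_out = np.trace(out)                      # should stay equal to 1
min_eigenvalue = np.linalg.eigvalsh(out).min()  # should not be negative
```

The output keeps unit trace because $\text{tr}\,\Lambda_1(\rho)=\frac{1}{2}(3-1)\,\text{tr}\,\rho$, and it stays positive semidefinite because the eigenvalues of $\frac{1}{2}(I-\rho^T)$ are $\frac{1}{2}(1-\lambda_i)\geq 0$ for a normalized state.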

翻訳日:2024-02-21 05:08:16 公開日:2024-02-18

# 羊の服を着たオオカミ:ネストした脱獄プロンプトは大きな言語モデルを簡単に騙す

A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models Easily ( http://arxiv.org/abs/2311.08268v2 )

ライセンス: Link先を確認

Peng Ding, Jun Kuang, Dan Ma, Xuezhi Cao, Yunsen Xian, Jiajun Chen, Shujian Huang

(参考訳) ChatGPTやGPT-4のような大規模言語モデル(LLM)は、有用で安全な応答を提供するように設計されている。しかし、"jailbreaks"と呼ばれる敵のプロンプトは、LLMが潜在的に有害な内容を生成するため、保護を回避することができる。ジェイルブレイクのプロンプトを探索することは、LSMの弱点を明らかにするのに役立ちます。残念ながら、既存のjailbreakメソッドは複雑な手動設計に悩まされるか、他のホワイトボックスモデルの最適化が必要であり、一般化や効率を損なう。本稿では,(1)プロンプトリライトと(2)シナリオネスティングの2つの側面にジェイルブレイク即時攻撃を一般化する。そこで本研究では,LDM自体を利用して効果的なジェイルブレイクプロンプトを生成する自動フレームワークReNeLLMを提案する。大規模な実験により、ReNeLLMは攻撃成功率を大幅に改善し、既存のベースラインと比較して時間コストを大幅に削減することが示された。また,LLMの保護における現在の防御方法の欠如も明らかにした。最後に,迅速な実行優先度の観点からllms防御の失敗を分析し,対応する防衛戦略を提案する。我々は,学術コミュニティとLLM開発者の両方に,より安全で規制の厳しいLLMの提供を促すことを願っている。コードはhttps://github.com/NJUNLP/ReNeLLMで入手できる。

Large Language Models (LLMs), such as ChatGPT and GPT-4, are designed to provide useful and safe responses. However, adversarial prompts known as 'jailbreaks' can circumvent safeguards, leading LLMs to generate potentially harmful content. Exploring jailbreak prompts can help to better reveal the weaknesses of LLMs and further steer us to secure them. Unfortunately, existing jailbreak methods either suffer from intricate manual design or require optimization on other white-box models, compromising generalization or efficiency. In this paper, we generalize jailbreak prompt attacks into two aspects: (1) Prompt Rewriting and (2) Scenario Nesting. Based on this, we propose ReNeLLM, an automatic framework that leverages LLMs themselves to generate effective jailbreak prompts. Extensive experiments demonstrate that ReNeLLM significantly improves the attack success rate while greatly reducing the time cost compared to existing baselines. Our study also reveals the inadequacy of current defense methods in safeguarding LLMs. Finally, we analyze the failure of LLM defenses from the perspective of prompt execution priority, and propose corresponding defense strategies. We hope that our research can spur both the academic community and LLM developers toward providing safer and more regulated LLMs. The code is available at https://github.com/NJUNLP/ReNeLLM.

翻訳日:2024-02-21 04:59:34 公開日:2024-02-18

# もう一度質問する:(ほとんど)すべてのシナリオで、セルフアグリメントが言語モデルの推論を改善する

Ask One More Time: Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios ( http://arxiv.org/abs/2311.08154v2 )

ライセンス: Link先を確認

Lei Lin, Jiayi Fu, Pengli Liu, Qingyang Li, Yan Gong, Junchen Wan, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai

(参考訳) チェーン・オブ・シンクレット(CoT)と言語モデルの組み合わせは複雑な推論タスクにおいて促進的な結果をもたらすが、CoTプロンプトで使用される単純なグレディ・デコードは通常、反復性と局所最適性を引き起こす。この欠点に対処するため、アンサンブル最適化は最終解集合を得るために複数の推論経路を得ようとする。しかし、現在のアンサンブル最適化手法では、単に \textit{self-consistency}のようなルールベースの後処理を用いるか、複数の推論パスの中で最良のものを選択するタスク関連のヒューマンアノテーションに基づいた追加モデルを訓練するが、入力された質問の種類や推論パスの回答形式が不明な現実的な設定に一般化できない。その限界を避けるために,入力質問のタイプや推論パスの回答形式が不明な場合,ほぼすべてのシナリオに適用可能な,一般化されたアンサンブル最適化手法である \textbf{self-agreement} を提案する。まず、言語モデルのデコーダからサンプルを取得して、推論パスの \textit{diverse} 集合を生成し、その後、サンプルされた推論パスの中から最も \textit{agreed} 回答を選択することで、言語モデル \textit{one more time} に最適な回答を決定するように促す。自己分離は、6つの公開推論ベンチマークと優れた一般化能力を同時に達成する。

Although chain-of-thought (CoT) prompting combined with language models has achieved encouraging results on complex reasoning tasks, the naive greedy decoding used in CoT prompting usually causes repetitiveness and local optimality. To address this shortcoming, ensemble-optimization tries to obtain multiple reasoning paths and assemble the final answer from them. However, current ensemble-optimization methods either simply employ rule-based post-processing such as self-consistency, or train an additional model on task-related human annotations to select the best among multiple reasoning paths, and they fail to generalize to realistic settings where the type of input questions or the answer format of reasoning paths is unknown. To avoid these limitations, we propose Self-Agreement, a generalizable ensemble-optimization method applicable in almost all scenarios, whether or not the type of input questions and the answer format of reasoning paths are known. Self-agreement first samples from the language model's decoder to generate a diverse set of reasoning paths, and then prompts the language model one more time to determine the optimal answer by selecting the most agreed-upon answer among the sampled reasoning paths. Self-agreement simultaneously achieves remarkable performance on six public reasoning benchmarks and superior generalization capabilities.
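
For contrast, the rule-based post-processing that the abstract calls self-consistency is commonly implemented as a majority vote over the final answers extracted from each sampled reasoning path. The sketch below shows only that baseline, assuming the answers have already been extracted as strings; the function name and the tie-breaking rule are illustrative assumptions. Self-Agreement itself replaces this vote with one further prompt to the language model, which is not reproduced here.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among sampled reasoning paths.

    Hypothetical helper for the rule-based baseline. Ties go to the answer
    that appeared first, since Counter.most_common keeps insertion order
    among equal counts.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalised = [a.strip().lower() for a in answers]  # crude answer normalisation
    return Counter(normalised).most_common(1)[0][0]

sampled_answers = ["42", " 42", "41", "42", "forty-two"]
voted = majority_vote(sampled_answers)
```

The example also shows why such a rule struggles when the answer format is unknown: "forty-two" is counted as a different answer from "42" unless a task-specific normaliser is written, which is the limitation the abstract points to.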

翻訳日:2024-02-21 04:59:10 公開日:2024-02-18

# ResMGCN: 高速バイオメディカルインタラクションのための残留メッセージグラフ畳み込みネットワーク

ResMGCN: Residual Message Graph Convolution Network for Fast Biomedical Interactions Discovering ( http://arxiv.org/abs/2311.07632v2 )

ライセンス: Link先を確認

Zecheng Yin

(参考訳) バイオメディカル情報グラフは、生物医療、バイオインフォマティクス、ヒトの医療コミュニティの関心を惹きつける多種多様な分子相互作用の同定や薬物発見など、現代におけるバイオメディカル情報の発見に不可欠である。今日では、バイオメディカル情報の実体を学習し、最先端の結果と生体分子の相互作用を正確に明らかにするために、グラフニューラルネットワークがますます多く提案されている。これらの手法は、遠方から特徴の消失を防ぎつつ、冗長なメモリと時間を犠牲にしてそのような問題を治療する。本稿では,異なる考え方で高速かつ正確な生体医学的相互作用予測を行うための,新しい残差メッセージグラフ畳み込みネットワーク (resmgcn) を提案する。具体的には、遠くのノードからメッセージを拡張する代わりに、ResMGCNは下位情報を次のラウンドの上位情報と集約してノード更新をガイドし、より意味のあるノード表現を得る。 resmgcnは、前層からの様々なメッセージと現在の層内の高次情報を最小のメモリと時間コストで認識・保存することができ、生体医学的実体の情報表現を得ることができる。タンパク質・タンパク質・薬物・薬物・ターゲット・遺伝子・疾患の相互作用を含む4つのバイオメディカル相互作用ネットワークデータセットについて実験を行い、ResMGCNが従来の最先端モデルより優れており、記憶と時間の両方において非常に有効であることを示した。

Biomedical information graphs are crucial for discovering interactions in biomedical information, such as identifying diverse molecular interactions and discovering drugs, and they attract increasing interest in the biomedicine, bioinformatics, and human healthcare communities. More and more graph neural networks have been proposed to learn the entities of biomedical information and precisely reveal biomedical molecule interactions with state-of-the-art results. These methods remedy the fading of features from distant nodes, but do so at the expensive cost of redundant memory and time. In this paper, we propose a novel Residual Message Graph Convolution Network (ResMGCN) that takes a different approach to fast and precise biomedical interaction prediction. Specifically, instead of enhancing the message from far nodes, ResMGCN aggregates lower-order information with the next round's higher-order information to guide the node update and obtain a more meaningful node representation. ResMGCN is able to perceive and preserve various messages from the previous layer and high-order information in the current layer with the least memory and time cost to obtain informative representations of biomedical entities. We conduct experiments on four biomedical interaction network datasets, covering protein-protein, drug-drug, drug-target, and gene-disease interactions, which demonstrate that ResMGCN outperforms previous state-of-the-art models while being highly efficient in both storage and time.

翻訳日:2024-02-21 04:58:22 公開日:2024-02-18

# sac3:semantic-aware cross-check consistencyによるブラックボックス言語モデルの信頼性の高い幻覚検出

SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency ( http://arxiv.org/abs/2311.01740v2 )

ライセンス: Link先を確認

Jiaxin Zhang, Zhuohang Li, Kamalika Das, Bradley A. Malin, Sricharan Kumar

(参考訳) 幻覚検出は、現代言語モデル(LM)の信頼性を理解するための重要なステップである。この目的を達成するために,lmsの自己矛盾に基づく既存の検出アプローチを再検討し,その結果生じる2種類の幻覚を明らかにする。 1)質問レベルと回答 2)自己整合性チェックのみでは効果的に識別できないモデルレベル。この発見に基づいて, 自己一貫性検査の原理に基づいて拡張した新しいサンプリングベース手法,すなわちsemantic-aware cross-check consistency (sac3)を提案する。我々のSAC3アプローチは、意味論的に等価な質問摂動やモデル間の応答整合性チェックなどの進歩を活用することで、質問レベルとモデルレベルの幻覚の両方を検出するための追加メカニズムを組み込んでいる。広範かつ体系的な実証分析を通じて、SAC3は複数の質問応答およびオープンドメイン生成ベンチマークにおいて、非実例と実例の両方の検出において、技術の現状より優れていることを示す。

Hallucination detection is a critical step toward understanding the trustworthiness of modern language models (LMs). To achieve this goal, we re-examine existing detection approaches based on the self-consistency of LMs and uncover two types of hallucinations resulting from 1) question-level and 2) model-level, which cannot be effectively identified through self-consistency check alone. Building upon this discovery, we propose a novel sampling-based method, i.e., semantic-aware cross-check consistency (SAC3) that expands on the principle of self-consistency checking. Our SAC3 approach incorporates additional mechanisms to detect both question-level and model-level hallucinations by leveraging advances including semantically equivalent question perturbation and cross-model response consistency checking. Through extensive and systematic empirical analysis, we demonstrate that SAC3 outperforms the state of the art in detecting both non-factual and factual statements across multiple question-answering and open-domain generation benchmarks.

翻訳日:2024-02-21 04:55:13 公開日:2024-02-18

# 大規模言語モデルからどこまで様々な視点を抽出できるか?

How Far Can We Extract Diverse Perspectives from Large Language Models? ( http://arxiv.org/abs/2311.09799v2 )

ライセンス: Link先を確認

Shirley Anugrah Hayati, Minhwa Lee, Dheeraj Rajagopal, Dongyeop Kang

(参考訳) 多様な人間の意見を集めるのは費用がかかり難い。これは、さまざまなデータを生成し、潜在的にスケーラブルで効率的なソリューションを提供するために、人間と大規模言語モデル(LLM)の協調作業の最近の傾向につながります。しかしながら、主観的話題に対する多様な視点を生み出すllmsの能力は、未解決の疑問である。本研究では,社会規範や論証文などの主観的話題に多様な視点と理性をもたらすLLMの能力について検討する。 LLMから最大多様性抽出の新しい問題を定式化する。本研究は, 人間の価値観を生かし, 多様な意見の基盤となる基準に基づく促進手法を提案する。 LLMからどの程度多様な視点を抽出できるか、あるいは多様性カバレッジと呼ばれるかを調べるため、反復的な方法でモデルからより多くの出力を生成するためにステップバイステップのリコールプロンプトを採用している。様々なタスクにメソッドを適用すると、実際にLLMはタスク主観性の度合いに応じて多様な意見を生成できることがわかった。

Collecting diverse human opinions is costly and challenging. This has led to a recent trend of collaborative efforts between humans and Large Language Models (LLMs) for generating diverse data, offering potentially scalable and efficient solutions. However, the extent of LLMs' capability to generate diverse perspectives on subjective topics remains an unexplored question. In this study, we investigate LLMs' capacity for generating diverse perspectives and rationales on subjective topics, such as social norms and argumentative texts. We formulate a new problem of maximum diversity extraction from LLMs. Motivated by how humans develop their opinions through their values, we propose a criteria-based prompting technique to ground diverse opinions. To see how far we can extract diverse perspectives from LLMs, which we call diversity coverage, we employ step-by-step recall prompting to generate more outputs from the model iteratively. Applying our methods to various tasks, we find that LLMs can indeed generate diverse opinions according to the degree of task subjectivity.

翻訳日:2024-02-21 04:45:51 公開日:2024-02-18

# DocLens:医療用テキスト生成のための多面的きめ細かい評価

DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation ( http://arxiv.org/abs/2311.09581v2 )

ライセンス: Link先を確認

Yiqing Xie, Sheng Zhang, Hao Cheng, Pengfei Liu, Zelalem Gero, Cliff Wong, Tristan Naumann, Hoifung Poon, Carolyn Rose

(参考訳) 医療用テキスト生成は、行政業務の支援と意思決定を支援するための健全な情報強調を目的としている。医療用テキストの具体的な要件を反映するため,本論文では,生成したテキストの完全性,簡潔性,属性をきめ細かなレベルで評価するための指標セットを提案する。メトリクスは、インストラクションフォロー(プロプライエタリとオープンソースの両方)や教師付きエンテーメントモデルなど、さまざまなタイプの評価者によって計算できる。臨床ノート作成,放射線報告書要約,患者の質問要約の3つのタスクにおいて,doclensが3つの評価器で有効性を示す。総合的な人間の研究によると、DocLensは既存の指標よりも医療専門家の判断とかなり高い一致を示している。結果はまた、オープンソースの評価ツールの改善の必要性を強調し、潜在的な方向性を提案する。

Medical text generation aims to assist with administrative work and highlight salient information to support decision-making. To reflect the specific requirements of medical text, in this paper, we propose a set of metrics to evaluate the completeness, conciseness, and attribution of the generated text at a fine-grained level. The metrics can be computed by various types of evaluators including instruction-following (both proprietary and open-source) and supervised entailment models. We demonstrate the effectiveness of the resulting framework, DocLens, with three evaluators on three tasks: clinical note generation, radiology report summarization, and patient question summarization. A comprehensive human study shows that DocLens exhibits substantially higher agreement with the judgments of medical experts than existing metrics. The results also highlight the need to improve open-source evaluators and suggest potential directions.

翻訳日:2024-02-21 04:44:52 公開日:2024-02-18

# symbol-llm: 大規模言語モデルのための基本記号中心インタフェースに向けて

Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models ( http://arxiv.org/abs/2311.09278v2 )

ライセンス: Link先を確認

Fangzhi Xu, Zhiyong Wu, Qiushi Sun, Siyu Ren, Fei Yuan, Shuai Yuan, Qika Lin, Yu Qiao, Jun Liu

(参考訳) 大規模言語モデル(llm)は、人間に似たテキストの処理と生成において顕著な能力を示すが、自然言語の境界を超えて広がる世界知識の理解と表現(例えば化学分子公式)に関して制限がある。 LLMのトレーニングに直接シンボリックデータのコレクションを注入することは、異なるシンボリックファミリー間のシナジーを無視し、自然なデータとシンボリックデータのバランスの取れた混合の必要性を見落としているため、問題となる。本研究では、データとフレームワークの観点からこれらの課題に取り組み、Symbol-LLMシリーズモデルを導入する。まず、34のタスクからなるデータコレクションをキュレーションし、約20の異なるシンボリックファミリーを組み込んで、相互関係を捉え、シンボル間の相乗効果を育む。そして、2段階のチューニングフレームワークは、一般化能力を失うことなく記号的知識を注入することに成功した。シンボル中心タスクとNL中心タスクの広範な実験は、Symbol-LLMシリーズモデルのバランスと優れた性能を示している。プロジェクトページはhttps://xufangzhi.github.io/symbol-llm-page/。

Although Large Language Models (LLMs) demonstrate remarkable ability in processing and generating human-like text, they have limitations when it comes to comprehending and expressing world knowledge that extends beyond the boundaries of natural language (e.g., chemical molecular formulas). Injecting a collection of symbolic data directly into the training of LLMs can be problematic, as it disregards the synergies among different symbolic families and overlooks the need for a balanced mixture of natural and symbolic data. In this work, we tackle these challenges from both a data and a framework perspective and introduce the Symbol-LLM series of models. First, we curate a data collection consisting of 34 tasks and incorporating approximately 20 distinct symbolic families, intending to capture the interrelations and foster synergies between symbols. Then, a two-stage tuning framework injects symbolic knowledge without loss of general ability. Extensive experiments on both symbol-centric and NL-centric tasks demonstrate the balanced and superior performance of the Symbol-LLM series models. The project page is https://xufangzhi.github.io/symbol-llm-page/.

翻訳日:2024-02-21 04:44:10 公開日:2024-02-18

# タスク特化知識のない自己強化学習のための自己監督型カリキュラム生成

Self-Supervised Curriculum Generation for Autonomous Reinforcement Learning without Task-Specific Knowledge ( http://arxiv.org/abs/2311.09195v2 )

ライセンス: Link先を確認

Sang-Hyun Lee and Seung-Woo Seo

(参考訳) 現在の強化学習アルゴリズムを現実世界のシナリオに適用する際の大きなボトルネックは、各エピソード間の環境をリセットする必要があることである。このリセットプロセスは人間の介入を必要とするため、エージェントが継続的に自律的に学習することは困難である。いくつかの最近の研究は、リセットとフォワードを共同でトレーニングするためのカリキュラムを生成する自律強化学習(ARL)アルゴリズムを導入している。彼らのカリキュラムは、エージェントの学習の進捗を考慮して、必要な手動リセットの数を減らすことができるが、事前定義された初期状態やリセット報酬関数のようなタスク固有の知識に依存している。本稿では,タスク固有の知識を使わずに,エージェントの学習進捗に適応したカリキュラムを生成する新しいARLアルゴリズムを提案する。我々のカリキュラムは、エージェントが多様かつ情報的な初期状態に自律的にリセットする権限を与えます。これを実現するために,エージェントがフォワードポリシーに従うと,各初期状態から成功確率を推定する成功判別器を導入する。成功判別器は自己監督的な方法で可逆遷移で訓練される。実験の結果, arlアルゴリズムは適応型カリキュラムを生成でき, エージェントのブートストラップにより, スパース・リワードの迷路ナビゲーションや操作タスクを効率的に解くことができ, 手動リセットの少ないベースラインよりも優れていた。

A significant bottleneck in applying current reinforcement learning algorithms to real-world scenarios is the need to reset the environment between every episode. This reset process demands substantial human intervention, making it difficult for the agent to learn continuously and autonomously. Several recent works have introduced autonomous reinforcement learning (ARL) algorithms that generate curricula for jointly training reset and forward policies. While their curricula can reduce the number of required manual resets by taking into account the agent's learning progress, they rely on task-specific knowledge, such as predefined initial states or reset reward functions. In this paper, we propose a novel ARL algorithm that can generate a curriculum adaptive to the agent's learning progress without task-specific knowledge. Our curriculum empowers the agent to autonomously reset to diverse and informative initial states. To achieve this, we introduce a success discriminator that estimates the success probability from each initial state when the agent follows the forward policy. The success discriminator is trained with relabeled transitions in a self-supervised manner. Our experimental results demonstrate that our ARL algorithm can generate an adaptive curriculum and enable the agent to efficiently bootstrap to solve sparse-reward maze navigation and manipulation tasks, outperforming baselines with significantly fewer manual resets.

翻訳日:2024-02-21 04:43:51 公開日:2024-02-18

# 時間的相関、コヒーレンス、ポスト選択が2光子干渉に及ぼす影響

Impact of temporal correlations, coherence, and postselection on two-photon interference ( http://arxiv.org/abs/2312.01503v2 )

ライセンス: Link先を確認

Fernando Redivo Cardoso, Jaewon Lee, Riccardo Checchinato, Jan-Heinrich Littmann, Marco De Gregorio, Sven Höfling, Christian Schneider, Celso J. Villas-Boas, Ana Predojević

(参考訳) 2光子干渉は量子フォトニクスにおいて必須の資源であるが、達成は容易ではない。光子対のカスケード生成は、2光子干渉を行う能力に悪影響を及ぼす固有の時間的相関を含むため、応用を妨げる。このような相関関係がデコヒーレンスや時間的ポストセレクションとどのように相互作用し、時間的ポストセレクションが2光子干渉の可視性を改善するかについて報告する。本研究は重要なパラメータを特定し,最適性能のソースへの道を示す。

Two-photon interference is an indispensable resource in quantum photonics, but it is not straightforward to achieve. The cascaded generation of photon pairs contains intrinsic temporal correlations that negatively affect the ability of such sources to perform two-photon interference, thus hindering applications. We report on how such correlation interplays with decoherence and temporal postselection, and under which conditions temporal postselection could improve two-photon interference visibility. Our study identifies crucial parameters and points the way to a source with optimal performance.

翻訳日:2024-02-21 04:35:21 公開日:2024-02-18

# ブロック圧縮特徴を用いたリアルタイム神経材料

Real-Time Neural Materials using Block-Compressed Features ( http://arxiv.org/abs/2311.16121v2 )

ライセンス: Link先を確認

Clément Weinreich, Louis de Oliveira, Antoine Houdard, Georges Nader

(参考訳) 神経材料は典型的にはデコーダネットワークと共に神経特徴の集合から成る。このようなモデルをリアルタイムレンダリングパイプラインに統合する上での大きな課題は、GPUメモリに機能を格納するために必要な大きなサイズと、ネットワークを効率的に評価する複雑性にある。本稿では,機能とデコーダをリアルタイムレンダリングパイプライン用に特別に設計したニューラルマテリアルモデルを提案する。我々のフレームワークはハードウェアベースのブロック圧縮(BC)テクスチャフォーマットを利用して学習した特徴を記憶し、そのモデルに空間と規模で連続的に材料情報を出力するように訓練する。これを実現するため、ブロックベースで特徴を整理し、トレーニング中にBC6の圧縮をエミュレートし、通常のBC6テクスチャとしてエクスポートする。この構造により、メモリフットプリントを低く保ちながら高解像度の機能を利用することができます。これにより、モデル全体の能力が向上し、シェーダ内で直接評価可能な軽量でシンプルなデコーダアーキテクチャが利用可能になります。さらに、学習した機能は継続的に復号化できるため、ランダムuvサンプリングとスケール間のスムーズな遷移を、その後のフィルタリングを必要とせずに実現することができる。その結果、我々の神経材料はメモリフットプリントが小さく、非常に高速にデコードでき、レンダリングパイプラインに最小の計算オーバーヘッドを加えることができる。

Neural materials typically consist of a collection of neural features along with a decoder network. The main challenge in integrating such models in real-time rendering pipelines lies in the large size required to store their features in GPU memory and the complexity of evaluating the network efficiently. We present a neural material model whose features and decoder are specifically designed to be used in real-time rendering pipelines. Our framework leverages hardware-based block compression (BC) texture formats to store the learned features and trains the model to output the material information continuously in space and scale. To achieve this, we organize the features in a block-based manner and emulate BC6 decompression during training, making it possible to export them as regular BC6 textures. This structure allows us to use high-resolution features while maintaining a low memory footprint. Consequently, this enhances our model's overall capability, enabling the use of a lightweight and simple decoder architecture that can be evaluated directly in a shader. Furthermore, since the learned features can be decoded continuously, it allows for random UV sampling and smooth transitions between scales without needing any subsequent filtering. As a result, our neural material has a small memory footprint and can be decoded extremely fast, adding minimal computational overhead to the rendering pipeline.

翻訳日:2024-02-21 04:35:05 公開日:2024-02-18

# 物理学におけるAlpha Zero:Alpha Zeroを用いたシンボリック回帰の物理解析への応用

Alpha Zero for Physics: Application of Symbolic Regression with Alpha Zero to find the analytical methods in physics ( http://arxiv.org/abs/2311.12713v3 )

ライセンス: Link先を確認

Yoshihiro Michishita

(参考訳) ニューラルネットワークによる機械学習は、自然言語処理、画像認識、ゲーム勝利、さらには物理学の問題など、さまざまなタスクのための、ますます強力なツールになりつつある。機械学習を数値計算や実験の支援に応用する研究は数多く存在するが、解析方法を見つけるために機械学習を適用する方法はあまり研究されていない。本稿では、アルファゼロアルゴリズム(α zero for physics (azfp))を用いた記号回帰を用いて、物理学における解析手法を開発する枠組みを提案する。実演として、AZfPはFloquetシステムの高周波展開を導出できることを示す。 AZfPは物理学の新しい理論フレームワークを開発する可能性がある。

Machine learning with neural networks is becoming an increasingly powerful tool for various tasks, such as natural language processing, image recognition, game playing, and even problems in physics. Although there are many studies on applying machine learning to numerical calculation and to assisting experiments, methods for applying machine learning to find analytical methods are poorly studied. In this paper, we propose a framework for developing analytical methods in physics using symbolic regression with the Alpha Zero algorithm, which we call Alpha Zero for Physics (AZfP). As a demonstration, we show that AZfP can derive the high-frequency expansion in Floquet systems. AZfP may make it possible to develop new theoretical frameworks in physics.

翻訳日:2024-02-21 04:32:34 公開日:2024-02-18

# 難易度対策と文脈情報に基づくToken-Level Adversarial Prompt Detection

Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information ( http://arxiv.org/abs/2311.11509v3 )

ライセンス: Link先を確認

Zhengmian Hu, Gang Wu, Saayan Mitra, Ruiyi Zhang, Tong Sun, Heng Huang, and Viswanathan Swaminathan

(参考訳) 近年,様々なアプリケーションにおいて,Large Language Models (LLM) が重要なツールとして登場している。しかし、これらのモデルは敵のプロンプト攻撃の影響を受けやすいため、攻撃者はLSMを誤る入力文字列を慎重にキュレートし、誤った出力や望ましくない出力を生成することができる。従来の研究によると、離散最適化に基づく比較的単純な効果的な攻撃では、モデルのモデレーションやアライメントをバイパスする逆のプロンプトを生成することができる。敵に対するこの脆弱性は、LSMの堅牢性と信頼性に関する重要な懸念を浮き彫りにする。本研究の目的は,次のトークンの確率を予測するLLMの能力を活用して,トークンレベルでの敵対的プロンプトの検出に新たなアプローチを導入することである。本研究では,高い確率で予測されるトークンが正規であり,高いパープレキシティを示すトークンが逆数としてフラグ付けされるような,モデルのパープレキシティの度合いを測定する。さらに,提案手法では,隣接トークン情報を組み込んだコンテキスト理解も統合し,連続した敵のプロンプトシーケンスの検出を促進する。この目的のために、最適化手法に基づく2つのアルゴリズムと確率的グラフィカルモデル(PGM)に基づく2つのアルゴリズムを設計する。どちらの手法も効率的な解法を備えており、効率のよい逆数検出が可能である。トークンレベルの検出結果は、テキストシーケンス上のヒートマップオーバーレイとして可視化でき、テキストのどの部分が逆プロンプトを含んでいるかを明確により直感的に表現することができます。

In recent years, Large Language Models (LLMs) have emerged as pivotal tools in various applications. However, these models are susceptible to adversarial prompt attacks, where attackers can carefully curate input strings that mislead LLMs into generating incorrect or undesired outputs. Previous work has revealed that with relatively simple yet effective attacks based on discrete optimization, it is possible to generate adversarial prompts that bypass moderation and alignment of the models. This vulnerability to adversarial prompts underscores a significant concern regarding the robustness and reliability of LLMs. Our work aims to address this concern by introducing a novel approach to detecting adversarial prompts at the token level, leveraging the LLM's capability to predict the next token's probability. We measure the degree of the model's perplexity, where tokens predicted with high probability are considered normal, and those exhibiting high perplexity are flagged as adversarial. Additionally, our method integrates context understanding by incorporating neighboring token information to encourage the detection of contiguous adversarial prompt sequences. To this end, we design two algorithms for adversarial prompt detection: one based on optimization techniques and another on Probabilistic Graphical Models (PGM). Both methods come with efficient solvers, keeping detection fast. Our token-level detection result can be visualized as heatmap overlays on the text sequence, allowing for a clearer and more intuitive representation of which part of the text may contain adversarial prompts.
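
The two ingredients described above, per-token perplexity and neighboring-token context, can be illustrated with a deliberately simple stand-in. The sketch below is not the optimization-based or PGM-based algorithm from the paper: it assumes the next-token probabilities have already been obtained from a language model, and the function name, the moving-average window, and the threshold value are all illustrative assumptions.

```python
import math

def flag_adversarial_tokens(token_probs, threshold=4.0, window=1):
    """Flag tokens whose neighbourhood has high average surprisal.

    token_probs: probability the model assigned to each observed token (0 < p <= 1).
    threshold:   assumed cut-off on average surprisal, in nats.
    window:      number of neighbours on each side included in the average.
    """
    surprisal = [-math.log(p) for p in token_probs]  # per-token negative log-likelihood
    n = len(surprisal)
    flags = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        local_mean = sum(surprisal[lo:hi]) / (hi - lo)  # context-aware score
        flags.append(local_mean > threshold)
    return flags

# Three very unlikely tokens in a row, surrounded by ordinary text.
probs = [0.5, 0.4, 0.6, 1e-4, 1e-5, 1e-4, 0.5, 0.3]
flags = flag_adversarial_tokens(probs)
```

Averaging over neighbours is what favours contiguous flagged spans: an isolated rare word is pulled below the threshold by its ordinary neighbours, while a run of unlikely tokens stays above it.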

翻訳日:2024-02-21 04:32:04 公開日:2024-02-18

# 地平線からのデコヒーレンス:一般定式化と回転ブラックホール

Decoherence from Horizons: General Formulation and Rotating Black Holes ( http://arxiv.org/abs/2311.11461v2 )

ライセンス: Link先を確認

Samuel E. Gralla and Hongji Wei

(参考訳) Danielson, Satishchandran, and Wald (DSW) による最近の研究は、ブラックホール ― そして実際、キリング地平線はより一般的に ― が、近くの全ての量子スーパーポジションに基本的なデコヒーレンスの割合を与えることを示した。ブラックホールの観測者(bob)は、重ねられた重力場を測定することによって、量子重ね合わせの外側を乱すことができるはずであるが、その作用は(因果性によって)この効果を持つことができないため、重ね合わせは自動的に妨害されなければならない。 DSWは、シュワルツシルト時空における遠い観測者、平時時におけるリンドラー観測者、デ・シッター時空における静的観測者に対して、デコヒーレンス率を未知の数値要因まで計算した。電磁的およびクライン=ゴードンアナログで作業し、それらの計算を一般化し、バイフルケートキリング地平線近傍のキリング観測者に対する正確なデコヒーレンス率の一般的な公式を導出する。カーブラックホールの対称性軸上の任意の位置における観測者に対する閉形式の速度を評価する。これにより、遠方のオブザーバーであるシュワルツシルトの結果における数値的要因が修正され、また近接ホリゾンおよび/または極端に近い振る舞いの新たな探索が可能になる。電磁界の場合、クーロン場がブラックホールに入るのを遮蔽する「ブラックホールマイスナー効果」のため、デコヒーレンスは極端に完全に消滅する。ボブは外側の重ね合わせの場を測定することができないので、非一貫性は必要ありません。

Recent work by Danielson, Satishchandran, and Wald (DSW) has shown that black holes -- and, in fact, Killing horizons more generally -- impart a fundamental rate of decoherence on all nearby quantum superpositions. The effect can be understood from measurement and causality: An observer (Bob) in the black hole should be able to disturb outside quantum superpositions by measuring their superposed gravitational fields, but since his actions cannot (by causality) have this effect, the superpositions must automatically disturb themselves. DSW calculated the rate of decoherence up to an unknown numerical factor for distant observers in Schwarzschild spacetime, Rindler observers in flat spacetime, and static observers in de Sitter spacetime. Working in electromagnetic and Klein-Gordon analogs, we flesh out and generalize their calculation to derive a general formula for the precise decoherence rate for Killing observers near bifurcate Killing horizons. We evaluate the rate in closed form for an observer at an arbitrary location on the symmetry axis of a Kerr black hole. This fixes the numerical factor in the distant-observer Schwarzschild result, while allowing new exploration of near-horizon and/or near-extremal behavior. In the electromagnetic case we find that the decoherence vanishes entirely in the extremal limit, due to the "Black hole Meissner effect" screening the Coulomb field from entering the black hole. This supports the causality picture: Since Bob is unable to measure the field of the outside superposition, no decoherence is necessary -- and indeed none occurs.

翻訳日:2024-02-21 04:31:39 公開日:2024-02-18

# 構造認識型スパースビューX線3次元再構成

Structure-Aware Sparse-View X-ray 3D Reconstruction ( http://arxiv.org/abs/2311.10959v2 )

ライセンス: Link先を確認

Yuanhao Cai, Jiahao Wang, Zongwei Zhou, Angtian Wang, Alan Yuille

(参考訳) 物体の内部構造を明らかにする能力で知られているx線は、可視光よりもリッチな3d再構成情報を提供することが期待されている。しかし、既存のニューラル放射場(NeRF)アルゴリズムは、X線の重要な性質を無視し、画像化された物体の構造的内容の取得に制限をもたらす。本稿では, スパースビューX線3次元再構成のための構造対応X線ニューラルラジオ密度場(SAX-NeRF)を提案する。まず,SAX-NeRFのバックボーンとしてLineformer(Lineformer)を設計する。 Linefomerは、X線の各線分内の依存関係をモデル化することで、3D空間内のオブジェクトの内部構造をキャプチャする。次に,2次元投影における文脈的および幾何学的情報を抽出するためのマスキング局所グローバル(mlg)レイサンプリング戦略を提案する。さらに、より広いX線アプリケーションをカバーする大規模なデータセットX3Dを収集する。 X3Dの実験では、SAX-NeRFは、新しいビュー合成とCT再構成において、従来のNeRF法を12.56と2.49dBで上回っている。コード、モデル、データはhttps://github.com/caiyuanhao1998/SAX-NeRFで公開される。

X-ray, known for its ability to reveal internal structures of objects, is expected to provide richer information for 3D reconstruction than visible light. Yet existing neural radiance field (NeRF) algorithms overlook this important property of X-rays, which limits their ability to capture the structural content of imaged objects. In this paper, we propose a framework, Structure-Aware X-ray Neural Radiodensity Fields (SAX-NeRF), for sparse-view X-ray 3D reconstruction. First, we design a Line Segment-based Transformer (Lineformer) as the backbone of SAX-NeRF. Lineformer captures internal structures of objects in 3D space by modeling the dependencies within each line segment of an X-ray. Second, we present a Masked Local-Global (MLG) ray sampling strategy to extract contextual and geometric information from 2D projections. In addition, we collect a larger-scale dataset, X3D, covering a wider range of X-ray applications. Experiments on X3D show that SAX-NeRF surpasses previous NeRF-based methods by 12.56 and 2.49 dB on novel view synthesis and CT reconstruction, respectively. Code, models, and data will be released at https://github.com/caiyuanhao1998/SAX-NeRF

翻訳日:2024-02-21 04:31:05 公開日:2024-02-18

# 微調整型大言語モデルのためのデミスティファイション命令混合

Demystifying Instruction Mixing for Fine-tuning Large Language Models ( http://arxiv.org/abs/2312.10793v3 )

ライセンス: Link先を確認

Renxi Wang, Haonan Li, Minghao Wu, Yuxia Wang, Xudong Han, Chiyu Zhang, Timothy Baldwin

(参考訳) インストラクションチューニングは、様々なタスクにわたる大規模言語モデル(LLM)の性能を大幅に向上させる。しかし、LLM微調整のための命令データセットの混合を最適化する手順はまだ理解されていない。本研究は,NLPダウンストリームタスク,コーディング,一般的なチャットの3つに分類する。提案手法は,LLMの性能に異なるデータセットの組み合わせが与える影響について検討し,特定の命令型が特定のアプリケーションに有利であるが,他の領域に悪影響を及ぼす可能性があることを示す。この研究は、命令の混合に関する洞察を与え、将来の研究の基礎を築いた。

Instruction tuning significantly enhances the performance of large language models (LLMs) across various tasks. However, the procedure for optimizing the mixture of instruction datasets for LLM fine-tuning is still poorly understood. This study categorizes instructions into three primary types: NLP downstream tasks, coding, and general chat. We explore how instruction tuning on different combinations of datasets affects LLM performance, and find that certain instruction types are more advantageous for specific applications but can negatively impact other areas. This work provides insights into instruction mixtures, laying the foundations for future research.

翻訳日:2024-02-21 04:10:40 公開日:2024-02-18

# 大規模言語モデルアライメントの多様な選好について

On Diversified Preferences of Large Language Model Alignment ( http://arxiv.org/abs/2312.07401v3 )

ライセンス: Link先を確認

Dun Zeng, Yong Dai, Pengyu Cheng, Tianhao Hu, Wanshun Chen, Nan Du, Zenglin Xu

(参考訳) 大規模言語モデル(LLM)を人間の好みに合わせることが,LLMのインタラクション品質向上の鍵であると認識されている。しかし、この多元的世界では、アノテータの異なる嗜好によって人間の嗜好が多様化し、LCMアライメント手法の有効性を阻害する。本稿では,ヒトのフィードバックデータセットを定量的に分析し,様々な好みが報酬モデルに与える影響について検討する。本研究では,報酬モデル(RM)の校正性能とLLMのアライメント性能の相関関係を明らかにする。その結果,様々な選好データが,例えば \textit{Harmless\&Helpful} などの人為的選好に対するRMの校正性能に悪影響を及ぼし,LCM のアライメント性能を損なうことがわかった。そこで本研究では, RMの校正性能を向上するMORE(Multi-Objective Reward Learning Method)を提案する。 3つのモデルと5つの人間好みデータセットで実験を行い,結果の検証を行った。提案手法はRMの予測キャリブレーションを大幅に改善し,Alpaca-7B モデルと \textit{Harmless\&Helpful} モデルのアライメントを向上させる。さらに,報奨校正性能と選好アライメント性能の関連性から,キャリブレーション誤差がRM評価の指標となることが示唆された。オープンソースのコードとデータは、 \url{https://github.com/dunzeng/more}で入手できる。

Aligning large language models (LLMs) with human preferences has been recognized as the key to improving LLMs' interaction quality. However, in this pluralistic world, human preferences can be diversified due to annotators' different tastes, which hinders the effectiveness of LLM alignment methods. This paper presents the first quantitative analysis of commonly used human feedback datasets to investigate the impact of diversified preferences on reward modeling. Our analysis reveals a correlation between the calibration performance of reward models (RMs) and the alignment performance of LLMs. We find that diversified preference data negatively affect the calibration performance of RMs on human-shared preferences, such as Harmless&Helpful, thereby impairing the alignment performance of LLMs. To address the ineffectiveness, we propose a novel Multi-Objective Reward learning method (MORE) to enhance the calibration performance of RMs on shared preferences. We validate our findings by experiments on three models and five human preference datasets. Our method significantly improves the prediction calibration of RMs, leading to better alignment of the Alpaca-7B model with Harmless&Helpful preferences. Furthermore, the connection between reward calibration and preference alignment performance suggests that calibration error can be adopted as a key metric for evaluating RMs. The open-source code and data are available at https://github.com/dunzeng/MORE.

翻訳日:2024-02-21 04:08:32 公開日:2024-02-18

# 効率的なニューラルネットワークのためのクラスアウェアプルーニング

Class-Aware Pruning for Efficient Neural Networks ( http://arxiv.org/abs/2312.05875v2 )

ライセンス: Link先を確認

Mengnan Jiang, Jingcun Wang, Amro Eldebiky, Xunzhao Yin, Cheng Zhuo, Ing-Chao Lin, Grace Li Zhang

(参考訳) ディープニューラルネットワーク(DNN)は様々な分野で顕著な成功を収めている。しかし、DNNにおける多数の浮動小数点演算(FLOP)は、エッジデバイスのようなリソース制約のアプリケーションに展開する上での課題となっている。この問題に対処するため、DNNの実行における計算コストを削減するためにプルーニングが導入された。従来のプルーニング戦略は、重量値、勾配値、アクティベーション出力に基づいている。本稿では,dnnを圧縮するクラスアウェアプルーニング手法を提案し,dnnの計算コストを削減するための新しい視点を提供する。各イテレーションで、ニューラルネットワークのトレーニングが変更され、クラス認識の刈り込みが容易になる。その後、クラス数に関するフィルタの重要性が評価される。いくつかのクラスでのみ重要なフィルタは削除される。ニューラルネットワークは、発生した精度の損失を補償するために再トレーニングされる。プルーニングのイテレーションは、フィルタがなくなるまで終了し、残りのフィルタが多くのクラスにとって非常に重要であることを示す。このプルーニング法は, 従来のプルーニング法よりも精度, プルーニング率, FLOPsの低減に優れていた。実験の結果, このクラスアウェアプルーニング手法は, 高い推定精度を維持しつつ, 重みとフラップ数を大幅に削減できることがわかった。

Deep neural networks (DNNs) have demonstrated remarkable success in various fields. However, the large number of floating-point operations (FLOPs) in DNNs poses challenges for their deployment in resource-constrained applications, e.g., edge devices. To address the problem, pruning has been introduced to reduce the computational cost of executing DNNs. Previous pruning strategies are based on weight values, gradient values, and activation outputs. In contrast to these solutions, in this paper we propose a class-aware pruning technique to compress DNNs, which provides a novel perspective on reducing the computational cost of DNNs. In each iteration, the neural network training is modified to facilitate class-aware pruning. Afterwards, the importance of each filter is evaluated in terms of the number of classes for which it matters. Filters that are important for only a few classes are removed. The neural network is then retrained to compensate for the incurred accuracy loss. The pruning iterations continue until no more filters can be removed, indicating that the remaining filters are very important for many classes. This pruning technique outperforms previous pruning solutions in terms of accuracy, pruning ratio, and the reduction of FLOPs. Experimental results confirm that this class-aware pruning technique can significantly reduce the number of weights and FLOPs while maintaining a high inference accuracy.
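
The filter-selection step described above can be sketched roughly as follows. This is only an illustration: the abstract does not say how per-class importance is computed, so the `importance` matrix, `score_threshold`, and `min_classes` are hypothetical names and parameters, not the authors' method.

```python
def select_filters_to_prune(importance, score_threshold, min_classes):
    """Return indices of filters that are important for fewer than `min_classes` classes.

    importance[f][c] is an assumed importance score of filter f for class c;
    a filter counts as important for a class when its score exceeds `score_threshold`.
    """
    to_prune = []
    for f, per_class in enumerate(importance):
        num_important = sum(1 for s in per_class if s > score_threshold)
        if num_important < min_classes:
            to_prune.append(f)
    return to_prune


# Toy scores for 3 filters and 4 classes (made-up values).
scores = [
    [0.9, 0.8, 0.7, 0.6],  # important for 4 classes
    [0.9, 0.1, 0.0, 0.2],  # important for 1 class
    [0.1, 0.6, 0.7, 0.2],  # important for 2 classes
]
print(select_filters_to_prune(scores, score_threshold=0.5, min_classes=2))  # [1]
```

In the iterative scheme of the abstract, a step like this would alternate with retraining until it returns an empty list.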

翻訳日:2024-02-21 04:07:22 公開日:2024-02-18

# NeRFをベースとした色とオパクティを持つガウススメッティング

Gaussian Splatting with NeRF-based Color and Opacity ( http://arxiv.org/abs/2312.13729v3 )

ライセンス: Link先を確認

Dawid Malarz, Weronika Smolak, Jacek Tabor, Sławomir Tadeja, Przemysław Spurek

(参考訳) neural radiance fields (nerfs) は、3dオブジェクトの複雑さを捉えるためのニューラルネットワークの驚くべき可能性を実証している。ニューラルネットワークの重みの中に形状と色情報をエンコードすることで、NeRFは3Dオブジェクトの驚くほどシャープな新しいビューを生み出すのに優れています。近年, 生成モデルを用いたNeRFの一般化が数多く現れ, その汎用性が高まっている。対照的に、gaussian splatting (gs) はニューラルネットワークを必要とせず、より高速なトレーニングと推論で同様のレンダリング品質を提供する。ガウス分布の集合に3Dオブジェクトに関する情報をエンコードし、古典的メッシュと同様に3Dで描画できる。残念ながら、GSは通常数十万のガウス成分を必要とするため、条件付けが難しい。両モデルの注意点を緩和するため,3dオブジェクトの形状のgs表現と,nerfに基づく色と不透明のエンコーディングを用いたハイブリッドモデル視聴方向ガウススプレーティング(vdgs)を提案する。我々のモデルは、ガウス分布とトレーニング可能な位置(すなわちガウスの手段)、形状(ガウスの共分散)、色と不透明度、ニューラルネットワークを用いており、ガウス分布と視方向のパラメータを使って色と不透明度の変化を生成する。その結果、3dオブジェクトのシャドウ、光反射、透明性をよりよく記述した。

Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding their versatility. In contrast, Gaussian Splatting (GS) offers similar render quality with faster training and inference, as it does not need neural networks to work. GS encodes information about 3D objects in a set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS models are difficult to condition, since they usually require on the order of a hundred thousand Gaussian components. To mitigate the caveats of both models, we propose a hybrid model, Viewing Direction Gaussian Splatting (VDGS), that uses the GS representation of the 3D object's shape and a NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e., means of the Gaussians), shapes (i.e., covariances of the Gaussians), colors, and opacities, together with a neural network that takes the Gaussian parameters and the viewing direction and produces changes in color and opacity. Consequently, our model better describes shadows, light reflections, and transparency of 3D objects.

翻訳日:2024-02-21 03:55:00 公開日:2024-02-18

# 擬似ブールモデルカウンタの工学

Engineering an Exact Pseudo-Boolean Model Counter ( http://arxiv.org/abs/2312.12341v2 )

ライセンス: Link先を確認

Suwei Yang and Kuldeep S. Meel

(参考訳) モデルカウント(英: model counting)とは、コンピュータ科学における基本的なタスクであり、結合正規形(cnf)で表されるブール公式の割り当て数を決定することを含む。 CNF式に対するモデルカウントは幅広い用途で広く注目されているが、Pseudo-Boolean(PB)式に対するモデルカウントの研究は比較的見過ごされている。擬ブール公式は命題のブール公式よりも簡潔であり、現実世界の問題を表現できる柔軟性を提供する。その結果,PB式に対するモデルカウントの効率的な手法を検討する必要がある。本研究では,代数的決定図による知識コンパイルアプローチに依拠する,最初の完全擬ボアリーンモデルカウンタpbcountを提案する。 pbcountは1513インスタンスのカウントを計算できるが、現在の最先端のアプローチでは1013インスタンスしか処理できない。私たちの研究は,事前処理手法の開発や知識コンパイル以外のアプローチの探求など,pb公式のモデルカウントという文脈で,今後の作業へのいくつかの道を開いた。

Model counting, a fundamental task in computer science, involves determining the number of satisfying assignments to a Boolean formula, typically represented in conjunctive normal form (CNF). While model counting for CNF formulas has received extensive attention with a broad range of applications, the study of model counting for Pseudo-Boolean (PB) formulas has been relatively overlooked. Pseudo-Boolean formulas, being more succinct than propositional Boolean formulas, offer greater flexibility in representing real-world problems. Consequently, there is a crucial need to investigate efficient techniques for model counting for PB formulas. In this work, we propose the first exact Pseudo-Boolean model counter, PBCount, which relies on a knowledge compilation approach via algebraic decision diagrams. Our extensive empirical evaluation shows that PBCount can compute counts for 1513 instances, while the current state-of-the-art approach could only handle 1013 instances. Our work opens up several avenues for future work in the context of model counting for PB formulas, such as the development of preprocessing techniques and the exploration of approaches other than knowledge compilation.
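
To make the task concrete, here is a minimal brute-force sketch of exact model counting for linear Pseudo-Boolean constraints. It is not PBCount's algorithm (which, per the abstract, uses knowledge compilation via algebraic decision diagrams); it simply enumerates all assignments, and the constraint encoding `(coeffs, bound)` is an assumption made for this example.

```python
from itertools import product


def count_pb_models(num_vars, constraints):
    """Count 0/1 assignments that satisfy every constraint.

    Each constraint is a pair (coeffs, bound) meaning
    sum(coeffs[i] * x[i]) >= bound, with x[i] in {0, 1}.
    Runs in O(2^num_vars), so it is only usable for tiny instances.
    """
    count = 0
    for assignment in product((0, 1), repeat=num_vars):
        if all(
            sum(c * x for c, x in zip(coeffs, assignment)) >= bound
            for coeffs, bound in constraints
        ):
            count += 1
    return count


# 2*x0 + x1 + x2 >= 2   and   x0 + x1 <= 1 (written as -x0 - x1 >= -1)
example = [((2, 1, 1), 2), ((-1, -1, 0), -1)]
print(count_pb_models(3, example))  # 3
```

A single PB constraint such as `2*x0 + x1 + x2 >= 2` would need several clauses in CNF, which is the succinctness the abstract refers to.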

翻訳日:2024-02-21 03:53:31 公開日:2024-02-18

# EASYTOOL:簡潔ツール指導によるLCMエージェントの強化

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction ( http://arxiv.org/abs/2401.06201v2 )

ライセンス: Link先を確認

Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Ren Kan, Dongsheng Li, Deqing Yang

(参考訳) 現実世界の複雑なタスクに対処するため、大規模言語モデル(LLM)の応用におけるツール利用への関心が高まっている。 LLMベースのエージェントを開発するには、通常、異なるツールドキュメントから多くのツール機能を理解する必要がある。しかし、これらのドキュメンテーションは多様で冗長で不完全で、ツールを使用する際のllmの能力に大きな影響を与えます。そこで本稿では,多種多様なツールドキュメントを統一的かつ簡潔なツール命令に変換するためのフレームワークであるEASYTOOLを紹介する。 EasyToolは、異なるソースの広範なツールドキュメントから必須情報を浄化し、標準化されたツール記述とLLMベースのエージェントの機能を提供する統一されたインターフェース(ツールインストラクション)を精査する。複数のタスクに関する大規模な実験は、EasyToolがトークン消費を大幅に削減し、現実のシナリオにおけるツール利用のパフォーマンスを向上させることを実証している。私たちのコードは将来的には \url{https://github.com/microsoft/JARVIS/} で利用可能になります。

To address intricate real-world tasks, there has been a rising interest in tool utilization in applications of large language models (LLMs). Developing LLM-based agents usually requires LLMs to understand many tool functions from different tool documentation. However, such documentation can be diverse, redundant, or incomplete, which severely limits the ability of LLMs to use tools. To solve this, we introduce EASYTOOL, a framework that transforms diverse and lengthy tool documentation into a unified and concise tool instruction for easier tool usage. EasyTool distills essential information from extensive tool documentation from different sources and builds a unified interface (i.e., tool instruction) that offers standardized tool descriptions and functionalities for LLM-based agents. Extensive experiments on multiple tasks demonstrate that EasyTool can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios. Our code will be available at https://github.com/microsoft/JARVIS/ in the future.

翻訳日:2024-02-21 03:42:56 公開日:2024-02-18

# GeoDecoder: マルチモーダルマップ理解の強化

GeoDecoder: Empowering Multimodal Map Understanding ( http://arxiv.org/abs/2401.15118v2 )

ライセンス: Link先を確認

Feng Qi, Mian Dai, Zixian Zheng, Chao Wang

(参考訳) 本稿では,地理空間情報を処理するための専用マルチモーダルモデルgeodecoderを提案する。 GeoDecoderはBeitGPTアーキテクチャに基づいて構築されており、画像やテキスト処理の専門的なモジュールが組み込まれている。画像側では、GeoDecoderはGaoDe Amapを基盤となるベースマップとして使用しています。レンダリング技術の利用により、モデルは外部データとシンボルマーカー、ドライブ軌道、ヒートマップ、ユーザ定義マーカーなどの機能をシームレスに統合し、追加の機能エンジニアリングの必要性をなくす。 geodecoderのテキストモジュールは、さまざまなコンテキストテキストと質問プロンプトを受け付け、gptのスタイルでテキスト出力を生成する。さらに、GPTベースのモデルは、エンドツーエンドで同じモデル内で複数のタスクのトレーニングと実行を可能にする。北京の地理空間の分布に関する知識をジオデコーダが取得できるようにするため,8つの基本的な地理空間課題を考案し,大規模テキスト画像サンプルを用いてモデルの事前学習を行った。その後、3つの下流タスクで迅速な微調整が行われ、パフォーマンスが大幅に向上した。 geodecoderモデルは、マップ要素とその関連操作の包括的理解を示し、異なるビジネスシナリオにおける多様な地理空間タスクの効率的かつ高品質な適用を可能にする。

This paper presents GeoDecoder, a dedicated multimodal model designed for processing geospatial information in maps. Built on the BeitGPT architecture, GeoDecoder incorporates specialized expert modules for image and text processing. On the image side, GeoDecoder utilizes GaoDe Amap as the underlying base map, which inherently encompasses essential details about road and building shapes, relative positions, and other attributes. Through the utilization of rendering techniques, the model seamlessly integrates external data and features such as symbol markers, drive trajectories, heatmaps, and user-defined markers, eliminating the need for extra feature engineering. The text module of GeoDecoder accepts various context texts and question prompts, generating text outputs in the style of GPT. Furthermore, the GPT-based model allows for the training and execution of multiple tasks within the same model in an end-to-end manner. To enhance map cognition and enable GeoDecoder to acquire knowledge about the distribution of geographic entities in Beijing, we devised eight fundamental geospatial tasks and conducted pretraining of the model using large-scale text-image samples. Subsequently, rapid fine-tuning was performed on three downstream tasks, resulting in significant performance improvements. The GeoDecoder model demonstrates a comprehensive understanding of map elements and their associated operations, enabling efficient and high-quality application of diverse geospatial tasks in different business scenarios.

翻訳日:2024-02-21 03:34:19 公開日:2024-02-18

# PsySafe: 多エージェントシステム安全の心理的攻撃・防衛・評価のための総合的枠組み

PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety ( http://arxiv.org/abs/2401.11880v2 )

ライセンス: Link先を確認

Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao

(参考訳) 大規模言語モデル(llm)で拡張されたマルチエージェントシステムは、集団知性において深い能力を発揮する。しかし、悪意のある目的のためにこのインテリジェンスの潜在的誤用は重大なリスクをもたらす。現在,マルチエージェントシステムの安全性に関する総合的な研究は限られている。本稿では,エージェント心理学の革新的なレンズを通して,エージェントの暗黒心理状態が安全性に対する重大な脅威となることを明らかにする。これらの問題に対処するために,エージェント心理学を基盤とした包括的枠組み(PsySafe)を提案する。まず,エージェントのダークパーソナリティ特性がいかに危険行動を引き起こすか,次に心理的・行動的観点からマルチエージェントシステムの安全性を評価すること,そしてリスクを軽減する効果的な戦略を考案することである。実験により,エージェント間の集団的危険行動,危険行動に関わるエージェントの自己反射,エージェントの心理的評価と危険行動の相関など,いくつかの興味深い現象が明らかになった。我々は,マルチエージェントシステムの安全性に関するさらなる研究に,我々のフレームワークと観測が貴重な洞察を提供することを期待している。データとコードをhttps://github.com/AI4Good24/PsySafeで公開します。

Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence. However, the potential misuse of this intelligence for malicious purposes presents significant risks. To date, comprehensive research on the safety issues associated with multi-agent systems remains limited. In this paper, we explore these concerns through the innovative lens of agent psychology, revealing that the dark psychological states of agents constitute a significant threat to safety. To tackle these concerns, we propose a comprehensive framework (PsySafe) grounded in agent psychology, focusing on three key areas: firstly, identifying how dark personality traits in agents can lead to risky behaviors; secondly, evaluating the safety of multi-agent systems from the psychological and behavioral perspectives, and thirdly, devising effective strategies to mitigate these risks. Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents' self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and dangerous behaviors. We anticipate that our framework and observations will provide valuable insights for further research into the safety of multi-agent systems. We will make our data and code publicly accessible at https://github.com/AI4Good24/PsySafe.

翻訳日:2024-02-21 03:33:29 公開日:2024-02-18

# LightDiC: 大規模図形表現学習におけるシンプルかつ効果的なアプローチ

LightDiC: A Simple yet Effective Approach for Large-scale Digraph Representation Learning ( http://arxiv.org/abs/2401.11772v2 )

ライセンス: Link先を確認

Xunkai Li, Meihao Liao, Zhengyu Wu, Daohan Su, Wentao Zhang, Rong-Hua Li, Guoren Wang

(参考訳) 既存のグラフニューラルネットワーク(GNN)のほとんどは、キャプチャされたリレーショナル情報の制限範囲が、実世界のシナリオにおける表現能力とデプロイメントを妨げる、非ダイレクトグラフに限られている。非有向グラフと比較して、有向グラフ (digraphs) は、輸送や金融ネットワークなどのノード間のより複雑な関係を捉えることにより、より複雑なトポロジーシステムのモデリングの要求に合致する。いくつかの指向型GNNが導入されたが、そのインスピレーションは主にディープラーニングアーキテクチャによるもので、冗長な複雑性と計算をもたらし、大規模データベースには適用できない。これらの問題に対処するために、磁気ラプラシアンに基づくダイグラフ畳み込みのスケーラブルな変種であるLightDiCを提案する。トポロジ関連の計算はオフライン前処理でのみ実行されるため、lightdicは例外的なスケーラビリティを実現し、再帰的な計算コストを伴わずに下流の予測を個別に訓練することができる。理論的解析により、lightdicはディリクレエネルギー最適化関数の近位勾配降下過程に対応する複素場に基づくメッセージパッシングを達成するために、ディグラフ信号のデノイジングの観点から有向情報を利用することが示され、その表現性が保証される。実験の結果、LightDiCは様々な下流タスクにおいて、学習可能なパラメータが少なく、訓練効率も高く、他のSOTAメソッドよりも優れていた。特に、LightDiCは最も代表的な大規模データベース(ogbn-papers100M)で満足できる結果を提供する最初のDiGNNである。

Most existing graph neural networks (GNNs) are limited to undirected graphs, whose restricted scope of captured relational information hinders their expressive capabilities and deployment in real-world scenarios. Compared with undirected graphs, directed graphs (digraphs) fit the demand for modeling more complex topological systems by capturing more intricate relationships between nodes, such as those in transportation and financial networks. While some directed GNNs have been introduced, their inspiration mainly comes from deep learning architectures, which leads to redundant complexity and computation, making them inapplicable to large-scale databases. To address these issues, we propose LightDiC, a scalable variant of the digraph convolution based on the magnetic Laplacian. Since topology-related computations are conducted solely during offline pre-processing, LightDiC achieves exceptional scalability, enabling downstream predictions to be trained separately without incurring recursive computational costs. Theoretical analysis shows that LightDiC utilizes directed information to achieve message passing based on the complex field, which corresponds to the proximal gradient descent process of the Dirichlet energy optimization function from the perspective of digraph signal denoising, ensuring its expressiveness. Experimental results demonstrate that LightDiC performs comparably to or even outperforms other SOTA methods in various downstream tasks, with fewer learnable parameters and higher training efficiency. Notably, LightDiC is the first DiGNN to provide satisfactory results on the most representative large-scale database (ogbn-papers100M).
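
For readers unfamiliar with the magnetic Laplacian mentioned above, the sketch below builds its standard unnormalized form in plain Python. The abstract does not state which normalization or phase parameter LightDiC uses, so the function name, the default `q=0.25`, and the dense list-of-lists representation are assumptions for illustration only.

```python
import cmath
import math


def magnetic_laplacian(adj, q=0.25):
    """Unnormalized magnetic Laplacian L = D_s - A_s * exp(i * Theta), elementwise.

    adj[u][v] = 1 if there is a directed edge u -> v, else 0.
    A_s = (A + A^T) / 2 is the symmetrized adjacency,
    Theta = 2 * pi * q * (A - A^T) encodes edge direction as a complex phase,
    D_s is the diagonal matrix of row sums of A_s.
    """
    n = len(adj)
    lap = [[0j] * n for _ in range(n)]
    for u in range(n):
        degree = 0.0
        for v in range(n):
            sym = (adj[u][v] + adj[v][u]) / 2.0
            phase = 2.0 * math.pi * q * (adj[u][v] - adj[v][u])
            degree += sym
            lap[u][v] = -sym * cmath.exp(1j * phase)
        lap[u][u] += degree
    return lap


# Directed path 0 -> 1 -> 2.
path = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
L = magnetic_laplacian(path)
```

The resulting matrix is Hermitian, so direction is kept in the sign of the imaginary part while the spectrum stays real. Because it depends only on the graph, it can be computed once offline, in line with the pre-processing design described in the abstract.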

翻訳日:2024-02-21 03:33:08 公開日:2024-02-18

# R-Judge: LLMエージェントの安全リスク意識のベンチマーク

R-Judge: Benchmarking Safety Risk Awareness for LLM Agents ( http://arxiv.org/abs/2401.10019v2 )

ライセンス: Link先を確認

Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Fangqi Li, Zhuosheng Zhang, Rui Wang, Gongshen Liu

(参考訳) 大規模言語モデル(LLM)は、現実世界のアプリケーション間で自律的にタスクを完了させる大きな可能性を示している。それにもかかわらず、これらのllmエージェントは、対話環境での運用において予期せぬ安全性リスクをもたらす。本研究は, LLM生成コンテンツの安全性を従来の研究で重視する代わりに, 多様な環境下でのLCMエージェントの行動安全のベンチマークの必要性に対処する。 r-judgeは,エージェントインタラクション記録による安全リスクの判定と同定において,llmの熟練度を評価するためのベンチマークである。 r-judgeはマルチターンエージェントインタラクションの162レコードで構成され、7つのアプリケーションカテゴリと10のリスクタイプのうち27の重要なリスクシナリオを包含する。安全に関する人間のコンセンサスと、注釈付き安全ラベルと高品質のリスク記述が組み込まれている。 r-judge における 9 llm の評価は llm のリスク意識を高める余地がある: ベストパフォーマンスモデル gpt-4 は 89.07% の人間のスコアに対して 72.52% を達成し、他の全てのモデルはランダムより少ない。さらに,環境フィードバックとしてリスク記述を活用することで,大幅な性能向上が期待できることを示す。事例研究では,オープンエージェントシナリオにおけるリスク認識は,知識と推論を伴う多次元的能力であり,現在のllmでは困難であることを明らかにした。 R-Judgeはhttps://github.com/Lordog/R-Judgeで公開されている。

Large language models (LLMs) have exhibited great potential in autonomously completing tasks across real-world applications. Despite this, these LLM agents introduce unexpected safety risks when operating in interactive environments. Whereas most prior studies center on the safety of LLM-generated content, this work addresses the pressing need to benchmark the behavioral safety of LLM agents within diverse environments. We introduce R-Judge, a benchmark crafted to evaluate the proficiency of LLMs in judging and identifying safety risks given agent interaction records. R-Judge comprises 162 records of multi-turn agent interaction, encompassing 27 key risk scenarios among 7 application categories and 10 risk types. It incorporates human consensus on safety with annotated safety labels and high-quality risk descriptions. Evaluation of 9 LLMs on R-Judge shows considerable room for enhancing the risk awareness of LLMs: the best-performing model, GPT-4, achieves 72.52% in contrast to the human score of 89.07%, while all other models score below random. Moreover, further experiments demonstrate that leveraging risk descriptions as environment feedback achieves substantial performance gains. With case studies, we reveal that risk awareness in open agent scenarios, which correlates with parameter count, is a multi-dimensional capability involving knowledge and reasoning, and is thus challenging for current LLMs. R-Judge is publicly available at https://github.com/Lordog/R-Judge.

翻訳日:2024-02-21 03:32:24 公開日:2024-02-18

# SymTC : 腰部MRIのインスタンス分割のための共生トランスフォーマー-CNNネット

SymTC: A Symbiotic Transformer-CNN Net for Instance Segmentation of Lumbar Spine MRI ( http://arxiv.org/abs/2401.09627v3 )

ライセンス: Link先を確認

Jiasong Chen, Linchen Qian, Linhai Ma, Timur Urakov, Weiyong Gu, Liang Liang

(参考訳) 椎間板疾患は一般的な疾患であり、しばしば間欠的または持続的な腰痛につながり、この疾患の診断と評価は腰椎mri画像から椎間板と椎間板の正確な測定に依存している。ディープニューラルネットワーク(DNN)モデルは、腰椎の個々のインスタンス(ディスクと脊椎)のより効率的なイメージセグメンテーションを自動化された方法で臨床医を支援する。本研究では,トランスフォーマーと畳み込みニューラルネットワーク(CNN)の強みを組み合わせた,革新的な腰椎MR画像分割モデルであるSymTCを提案する。具体的には、cnn層とtransformer層をマージする並列なデュアルパスアーキテクチャを設計し、トランスのセルフアテンションモジュールに新しい位置埋め込みを組み込むことにより、より正確なセグメンテーションのための位置情報の利用を強化した。モデル性能をさらに向上させるため,ssmspineと呼ばれる合成的で現実的なmr画像データセットを作成するための新しいデータ拡張技術を導入した。 ssmspineデータセットとプライベートデータセットのsymtcおよび既存の15のイメージセグメンテーションモデルを,dice類似度係数と95%ハウスドルフ距離の2つの指標を用いて評価した。その結果,SymTCは腰椎MRI画像における椎骨と椎間板のセグメンテーションに最適であることが示唆された。 SymTCコードとSSMSpineデータセットはhttps://github.com/jiasongchen/SymTCで公開されている。

Intervertebral disc disease, a prevalent ailment, frequently leads to intermittent or persistent low back pain, and the diagnosis and assessment of this disease rely on accurate measurement of vertebral bone and intervertebral disc geometries from lumbar MR images. Deep neural network (DNN) models may assist clinicians with more efficient, automated image segmentation of individual instances (discs and vertebrae) of the lumbar spine, which is termed instance image segmentation. In this work, we propose SymTC, an innovative lumbar spine MR image segmentation model that combines the strengths of Transformer and Convolutional Neural Network (CNN). Specifically, we design a parallel dual-path architecture to merge CNN layers and Transformer layers, and we integrate a novel position embedding into the self-attention module of the Transformer, enhancing the utilization of positional information for more accurate segmentation. To further improve model performance, we introduce a new data augmentation technique to create a synthetic yet realistic MR image dataset, named SSMSpine, which is made publicly available. We evaluate SymTC and 15 existing image segmentation models on our private in-house dataset and the public SSMSpine dataset, using two metrics: Dice Similarity Coefficient and 95% Hausdorff Distance. The results show that SymTC has the best performance for segmenting vertebral bones and intervertebral discs in lumbar spine MR images. The SymTC code and SSMSpine dataset are available at https://github.com/jiasongchen/SymTC.
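
As a reminder of what the first evaluation metric measures, here is a small sketch of the Dice Similarity Coefficient for two binary masks. Representing the masks as flat 0/1 lists and returning 1.0 when both masks are empty are conventions chosen for this example, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two binary masks.

    `pred` and `target` are flat sequences of 0/1 values of equal length.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total


predicted = [1, 1, 0, 0]
reference = [1, 0, 0, 0]
score = dice_coefficient(predicted, reference)  # 2 * 1 / (2 + 1)
```

For instance segmentation, a score like this would typically be computed per instance (each disc or vertebra) and then averaged.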

翻訳日:2024-02-21 03:31:58 公開日:2024-02-18

# 技術報告:ノードの到達不能性を考慮したゴシップ学習の収束について

Technical Report: On the Convergence of Gossip Learning in the Presence of Node Inaccessibility ( http://arxiv.org/abs/2401.09498v2 )

ライセンス: Link先を確認

Tian Liu, Yue Cui, Xueyang Hu, Yecheng Xu, Bo Liu

(参考訳) gossip learning(gl)は、連合学習(federated learning:fl)の代替として、無人航空機(uavs)によって形成される空飛ぶアドホックネットワーク(fanets)のようなリソース制約された無線ネットワークに適している。 GLは、UAVネットワークの効率を大幅に向上し、バッテリー寿命を延長することができる。この利点にもかかわらず、GLの性能はデータ分散、通信速度、ネットワーク接続に強く影響されている。しかし、これらの因子がGL収束にどのように影響するかはいまだ不明である。既存の研究は、コンビニエンスのために仮想量に基づくglの収束を研究したが、いくつかのノードがアクセスできない場合、ネットワークの実際の状態を反映しなかった。本稿では,動的ネットワークトポロジの下でglに対するアクセス不能ノードの影響を定式化し,検討する。まず、ノードがアクセス可能かどうかによって重み分散を分解する。そこで我々は,ノードアクセシビリティの動的条件下でのGL収束について検討し,到達不能ノード数,データ非i.d.ネス,到達不能期間が収束に与える影響を理論的に示す。理論的な結果の正しさを包括的に検証するために,実践的な実験を行った。

Gossip learning (GL), as a decentralized alternative to federated learning (FL), is more suitable for resource-constrained wireless networks, such as Flying Ad-Hoc Networks (FANETs) that are formed by unmanned aerial vehicles (UAVs). GL can significantly enhance the efficiency and extend the battery life of UAV networks. Despite these advantages, the performance of GL is strongly affected by data distribution, communication speed, and network connectivity. However, how these factors influence GL convergence is still unclear. Existing work studied the convergence of GL based on a virtual quantity for the sake of convenience, which fails to reflect the real state of the network when some nodes are inaccessible. In this paper, we formulate and investigate the impact of inaccessible nodes on GL under a dynamic network topology. We first decompose the weight divergence according to whether a node is accessible or not. Then, we investigate GL convergence under dynamic node accessibility and show theoretically how the number of inaccessible nodes, data non-i.i.d.-ness, and the duration of inaccessibility affect the convergence. Extensive experiments are carried out in practical settings to comprehensively verify the correctness of our theoretical findings.
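
The setting can be illustrated with a toy gossip round in which inaccessible nodes simply sit out. This is a deliberately simplified sketch: models are single floats, pairing is uniformly random, and there is no local training step, none of which is specified in the abstract. The names `gossip_round`, `models`, and `accessible` are hypothetical.

```python
import random


def gossip_round(models, accessible, rng):
    """Run one gossip round and return the updated list of models.

    Accessible nodes are shuffled and paired; each pair replaces both models
    with their average. Inaccessible nodes (and an unpaired leftover node)
    keep their current model unchanged.
    """
    updated = list(models)
    active = [i for i, ok in enumerate(accessible) if ok]
    rng.shuffle(active)
    for a, b in zip(active[0::2], active[1::2]):
        mean = (updated[a] + updated[b]) / 2.0
        updated[a] = mean
        updated[b] = mean
    return updated


rng = random.Random(0)
models = [0.0, 4.0, 8.0, 100.0]
accessible = [True, True, True, False]  # node 3 is unreachable
for _ in range(10):
    models = gossip_round(models, accessible, rng)
```

In this toy version, the stale model on node 3 never mixes with the others, so the network cannot reach a global consensus while the node stays unreachable. The paper's analysis concerns how the number of such nodes, data heterogeneity, and the duration of inaccessibility affect convergence.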

翻訳日:2024-02-21 03:31:28 公開日:2024-02-18

# 部分観測による空間・時間連続物理シミュレーション

Space and Time Continuous Physics Simulation From Partial Observations ( http://arxiv.org/abs/2401.09198v2 )

ライセンス: Link先を確認

Janny Steeven, Nadri Madiha, Digne Julie, Wolf Christian

(参考訳) 物理シミュレーションの最新の技術は、精度と複雑性のトレードオフに対処する数値スキームとメッシュリファインメント法に依存しているが、これらの手作りのソリューションは面倒で高い計算力を必要とする。大規模機械学習に基づくデータ駆動方式は、より直接的かつ効率的に長距離依存関係を統合することにより、高い適応性を実現する。本研究では,流体力学に焦点をあて,正則あるいは不規則な格子の形での計算と予測の固定的なサポートに基づく,文献の大部分の欠点に対処した。本研究では,空間的・時間的領域の連続的な予測を行うための新しい手法を提案する。本稿では,この課題を二重観測問題として定式化し,それぞれスパース位置と連続領域の2つの相互結合力学系を持つ解を提案し,初期状態からの解の予測と補間を可能にする。我々の実践的な実装は、繰り返しGNNと任意の場所で解を補間できる時空間注意オブザーバを含む。我々のモデルは(標準の自己回帰モデルのように)新しい初期条件に一般化するだけでなく、任意の空間と時間の位置で評価を行う。流体力学の標準データセットを3つ評価し、古典的設定と連続予測を必要とする拡張された新しいタスクの両方において優れたベースラインと比較した。

Modern techniques for physical simulations rely on numerical schemes and mesh-refinement methods to address trade-offs between precision and complexity, but these handcrafted solutions are tedious and require high computational power. Data-driven methods based on large-scale machine learning promise high adaptivity by integrating long-range dependencies more directly and efficiently. In this work, we focus on fluid dynamics and address the shortcomings of a large part of the literature, which is based on fixed support for computations and predictions in the form of regular or irregular grids. We propose a novel setup to perform predictions in a continuous spatial and temporal domain while being trained on sparse observations. We formulate the task as a double observation problem and propose a solution with two interlinked dynamical systems defined on, respectively, the sparse positions and the continuous domain, which allows forecasting and interpolating a solution from the initial condition. Our practical implementation involves recurrent GNNs and a spatio-temporal attention observer capable of interpolating the solution at arbitrary locations. Our model not only generalizes to new initial conditions (as standard auto-regressive models do) but also performs evaluation at arbitrary space and time locations. We evaluate on three standard datasets in fluid dynamics and compare to strong baselines, which are outperformed both in classical settings and in the extended new task requiring continuous predictions.

翻訳日:2024-02-21 03:30:44 公開日:2024-02-18

# AI適応画像ラベリングにおけるコンフォーマル予測セットの有用性の評価

Evaluating the Utility of Conformal Prediction Sets for AI-Advised Image Labeling ( http://arxiv.org/abs/2401.08876v3 )

ライセンス: Link先を確認

Dongping Zhang, Angelos Chatzimparmpas, Negar Kamali, and Jessica Hullman

(参考訳) ディープニューラルネットワークは高スループット領域に一般的に展開されるため、その解釈可能性の欠如は不確実性定量化を難しくする。共形予測セット$\unicode{x2013}$aの分布のない不確実性定量化$\unicode{x2013}$aの方法が、aiが助言する意思決定における不確実性を表現するために有効であることを検証した。大規模なオンライン実験を通じて、共形予測セットの有用性を、AIが推奨する画像ラベリングのためのTop-$とTop-k$の表示と比較する。事前登録された分析では,精度の予測セットの有用性はタスクの難易度に応じて変化し,Top-1$とTop-k$と同等以上の精度で画像の表示が可能であるのに対し,アウト・オブ・ディストリビューション(OOD)画像のラベル付けにおいて人を支援するための予測セットは優れている。本研究は,共形予測セットの実際的課題を実証的に特定し,実世界の意思決定にどのように組み込むかを示す。

As deep neural networks are more commonly deployed in high-stakes domains, their lack of interpretability makes uncertainty quantification challenging. We investigate the effects of presenting conformal prediction sets (a method for generating valid confidence sets in distribution-free uncertainty quantification) to express uncertainty in AI-advised decision-making. Through a large online experiment, we compare the utility of conformal prediction sets to displays of Top-$1$ and Top-$k$ predictions for AI-advised image labeling. In a pre-registered analysis, we find that the utility of prediction sets for accuracy varies with the difficulty of the task: while they result in accuracy on par with or less than Top-$1$ and Top-$k$ displays for easy images, prediction sets excel at assisting humans in labeling out-of-distribution (OOD) images, especially when the set size is small. Our results empirically pinpoint the practical challenges of conformal prediction sets and provide implications on how to incorporate them for real-world decision-making.
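
For context, the following is a minimal sketch of split conformal prediction for classification, one common way to build such prediction sets. The abstract does not say which conformal variant or score function the study uses; the score `1 - p(true class)`, the function names, and the toy calibration data are all assumptions for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Return the score threshold for target coverage 1 - alpha.

    The nonconformity score of a calibration example is 1 - p(true class).
    The threshold is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: every class is included
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """All classes whose score 1 - p does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy calibration data: softmax outputs and true labels (made-up values).
cal_probs = [[0.9, 0.05, 0.05], [0.2, 0.5, 0.3], [0.3, 0.6, 0.1], [0.2, 0.1, 0.7]]
cal_labels = [0, 1, 0, 2]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

confident = prediction_set([0.9, 0.06, 0.04], threshold)  # one label
uncertain = prediction_set([0.5, 0.35, 0.15], threshold)  # two labels
```

Unlike a Top-$k$ display, the set size adapts to the input: a confident prediction yields a single label, while an ambiguous one yields several.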

Translated: 2024-02-21 03:30:21 Published: 2024-02-18
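
The abstract above compares conformal prediction sets with Top-$1$ and Top-$k$ displays but does not say how the sets are built. As a rough illustration only, the sketch below uses standard split conformal prediction with the nonconformity score 1 - p(true label); the score, the function names, and the toy numbers are assumptions made here, not details taken from the paper.

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha):
    """Score threshold from a held-out calibration set (split conformal).

    cal_probs: (n, K) predicted class probabilities; cal_labels: (n,) true labels.
    """
    n = len(cal_labels)
    # Nonconformity score (an assumption): one minus the probability of the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level, capped at 1.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, level, method="higher")

def prediction_set(probs, qhat):
    """Every label whose score 1 - p(label) does not exceed the threshold."""
    return [int(k) for k in np.where(1.0 - probs <= qhat)[0]]

# Toy calibration data, invented for illustration.
cal_probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.2, 0.2, 0.6],
                      [0.5, 0.3, 0.2]])
cal_labels = np.array([0, 1, 2, 0])
qhat = conformal_quantile(cal_probs, cal_labels, alpha=0.25)

confident = prediction_set(np.array([0.9, 0.05, 0.05]), qhat)  # a single label
ambiguous = prediction_set(np.array([0.5, 0.5, 0.0]), qhat)    # two labels
```

A confident prediction yields a small set and an ambiguous one a larger set, which is the property a set-based display exposes to the human labeler.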

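# A Statistical Framework for Measuring AI Reliance

A Statistical Framework for Measuring AI Reliance ( http://arxiv.org/abs/2401.15356v2 )

License: check the linked page

Ziyang Guo, Yifan Wu, Jason Hartline and Jessica Hullman

(Reference translation) Humans often make decisions with the help of artificial intelligence (AI) systems. A common pattern is for the AI to recommend an action to a human who retains control over the final decision. Researchers have identified ensuring that humans rely on AI appropriately as a key factor in achieving complementary performance. We argue that the definition of appropriate reliance used in such research lacks formal statistical grounding and can lead to contradictions. Based on statistical decision theory, we propose a formal definition of reliance that separates the concept of reliance, the probability of following the AI's prediction, from the challenges a human may face in differentiating signals and forming accurate beliefs about the situation. Our definition yields a framework that can guide the design and interpretation of studies on human-AI complementarity and reliance. Using recent AI-advised decision-making studies from the literature, we demonstrate how the framework can separate the loss due to mis-reliance from the loss due to not accurately differentiating the signals. We evaluate these losses by comparing them with a baseline and a benchmark for complementary performance, defined by the expected payoff achieved by a rational decision-maker facing the same decision task as the behavioral decision-makers.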

Humans frequently make decisions with the aid of artificially intelligent (AI) systems. A common pattern is for the AI to recommend an action to the human who retains control over the final decision. Researchers have identified ensuring that a human has appropriate reliance on an AI as a critical component of achieving complementary performance. We argue that the current definition of appropriate reliance used in such research lacks formal statistical grounding and can lead to contradictions. We propose a formal definition of reliance, based on statistical decision theory, which separates the concepts of reliance as the probability the decision-maker follows the AI's prediction from challenges a human may face in differentiating the signals and forming accurate beliefs about the situation. Our definition gives rise to a framework that can be used to guide the design and interpretation of studies on human-AI complementarity and reliance. Using recent AI-advised decision making studies from literature, we demonstrate how our framework can be used to separate the loss due to mis-reliance from the loss due to not accurately differentiating the signals. We evaluate these losses by comparing to a baseline and a benchmark for complementary performance defined by the expected payoff achieved by a rational decision-maker facing the same decision task as the behavioral decision-makers.

Translated: 2024-02-21 03:19:59 Published: 2024-02-18

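# Instruction Fine-Tuning: Does Prompt Loss Matter?

Instruction Fine-Tuning: Does Prompt Loss Matter? ( http://arxiv.org/abs/2401.13586v2 )

License: check the linked page

Mathew Huerta-Enochian

(Reference translation) We study the effect of prompt loss weighting (PLW) on supervised instruction fine-tuning. We recreated Stanford's Alpaca experiment with both LLaMA 1 and LLaMA 2 and multiple instruction datasets. Performance of models fine-tuned on the short-completion dataset showed a statistically significant negative quadratic relationship with PLW, while performance of models fine-tuned on medium- and long-completion data showed no relationship with PLW. Prompt loss can therefore be safely ignored for many datasets. For short-completion data, small PLW values (0.01-0.1) were optimal for multiple-choice and short-generation tasks, and large PLW values (~1.0) were optimal for long-generation tasks. We conclude that a low non-zero PLW keeps models from diverging from the pre-trained weights during training, and that a high PLW reduces overfitting. Finally, we present a rough guide for selecting PLW values based on the completion-prompt length ratio of the fine-tuning data.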

We present a study analyzing the effects of prompt loss weighting (PLW) on supervised instruction fine-tuning. We recreated Stanford's Alpaca experiment with both LLaMA 1 and LLaMA 2 and multiple instruction datasets. We found that performance of models fine-tuned on our short-completion dataset had a statistically significant negative quadratic relationship with PLW, but performance of models fine-tuned on medium- and long-completion data did not show any relationship with PLW. I.e., prompt loss can be safely ignored for many datasets. For short-completion data, small values (0.01-0.1) of PLW were optimal for multiple-choice and short-generation tasks while large values (~ 1.0) of PLW were optimal for long-generation tasks. We concluded that low non-zero PLW encourages models to not diverge from pre-trained model weights during training and high PLW reduces overfitting. Finally, we present a rough guide for selecting PLW values based on the completion-prompt length ratio of fine-tuning data.

Translated: 2024-02-21 03:18:31 Published: 2024-02-18
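
To make the idea of prompt loss weighting concrete, here is a minimal sketch of one way a PLW-weighted loss could be computed from per-token negative log-likelihoods. The function name, the mask layout, and the normalization by the sum of weights are assumptions for illustration; the abstract does not specify the exact formula the author used.

```python
import numpy as np

def plw_loss(token_nll, is_prompt, plw):
    """Weighted mean of per-token negative log-likelihoods.

    Prompt tokens receive weight `plw`; completion tokens receive weight 1.
    Normalizing by the sum of weights is an assumption of this sketch.
    """
    token_nll = np.asarray(token_nll, dtype=float)
    weights = np.where(np.asarray(is_prompt, dtype=bool), plw, 1.0)
    return float((weights * token_nll).sum() / weights.sum())

# Two prompt tokens followed by two completion tokens (toy values).
nll = [2.0, 2.0, 1.0, 1.0]
mask = [True, True, False, False]

loss_ignore_prompt = plw_loss(nll, mask, plw=0.0)  # completion tokens only
loss_full_prompt = plw_loss(nll, mask, plw=1.0)    # all tokens weighted equally
```

With `plw=0.0` the prompt is masked out entirely, and with `plw=1.0` the objective reduces to ordinary language-model loss over the whole sequence; intermediate values interpolate between the two.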

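# Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey

Bridging Evolutionary Algorithms and Reinforcement Learning: A Comprehensive Survey ( http://arxiv.org/abs/2401.11963v2 )

License: check the linked page

Pengyi Li, Jianye Hao, Hongyao Tang, Xian Fu, Yan Zheng, Ke Tang

(Reference translation) Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL), has shown remarkable performance gains. By fusing the strengths of both approaches, ERL has emerged as a promising research direction. This survey gives an overview of the diverse research branches in ERL. Specifically, we systematically summarize recent advances in relevant algorithms and identify three research directions: EA-assisted optimization of RL, RL-assisted optimization of EA, and synergistic optimization of EA and RL. We then analyze each direction in depth and organize it into multiple research branches. We clarify the problems each branch aims to tackle and how integrating EA and RL addresses them. In conclusion, we discuss potential challenges and future research directions. To help researchers explore ERL, we organize the related algorithms and code at https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learning.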

Evolutionary Reinforcement Learning (ERL), which integrates Evolutionary Algorithms (EAs) and Reinforcement Learning (RL) for optimization, has demonstrated remarkable performance advancements. By fusing the strengths of both approaches, ERL has emerged as a promising research direction. This survey offers a comprehensive overview of the diverse research branches in ERL. Specifically, we systematically summarize recent advancements in relevant algorithms and identify three primary research directions: EA-assisted optimization of RL, RL-assisted optimization of EA, and synergistic optimization of EA and RL. Following that, we conduct an in-depth analysis of each research direction, organizing multiple research branches. We elucidate the problems that each branch aims to tackle and how the integration of EA and RL addresses these challenges. In conclusion, we discuss potential challenges and prospective future research directions across various research directions. To facilitate researchers in delving into ERL, we organize the algorithms and codes involved on https://github.com/yeshenpy/Awesome-Evolutionary-Reinforcement-Learning.

Translated: 2024-02-21 03:17:08 Published: 2024-02-18

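# TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution

TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution ( http://arxiv.org/abs/2402.01586v2 )

License: check the linked page

Wenyue Hua, Xianjun Yang, Zelong Li, Wei Cheng, Yongfeng Zhang

(Reference translation) The emergence of LLM-based agents has attracted considerable attention, yet their trustworthiness remains under-explored. Because agents can interact directly with the physical environment, their reliability and safety are critical. This paper presents TrustAgent, an agent framework based on an Agent Constitution. The framework consists of a pre-planning strategy that injects safety knowledge into the model before plan generation, an in-planning strategy that bolsters safety during plan generation, and a post-planning strategy that ensures safety through post-planning inspection. Through experiments, we demonstrate how these approaches effectively improve an LLM agent's safety by identifying and preventing potential dangers. We also examine the intricate relationships between safety and helpfulness, and between a model's reasoning ability and its efficacy as a safe agent. This paper shows that integrating safety awareness and trustworthiness into the design and deployment of LLM-based agents is essential, not only to enhance their performance but also to ensure their responsible integration into human-centric environments. Data and code are available at https://github.com/agiresearch/TrustAgent.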

The emergence of LLM-based agents has garnered considerable attention, yet their trustworthiness remains an under-explored area. As agents can directly interact with the physical environment, their reliability and safety are critical. This paper presents an Agent-Constitution-based agent framework, TrustAgent, an initial investigation into improving the safety dimension of trustworthiness in LLM-based agents. This framework consists of three strategies: a pre-planning strategy that injects safety knowledge into the model prior to plan generation, an in-planning strategy that bolsters safety during plan generation, and a post-planning strategy that ensures safety by post-planning inspection. Through experimental analysis, we demonstrate how these approaches can effectively elevate an LLM agent's safety by identifying and preventing potential dangers. Furthermore, we explore the intricate relationships between safety and helpfulness, and between the model's reasoning ability and its efficacy as a safe agent. This paper underscores the imperative of integrating safety awareness and trustworthiness into the design and deployment of LLM-based agents, not only to enhance their performance but also to ensure their responsible integration into human-centric environments. Data and code are available at https://github.com/agiresearch/TrustAgent.

Translated: 2024-02-21 03:08:39 Published: 2024-02-18

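# Vaccine: Perturbation-aware Alignment for Large Language Model

Vaccine: Perturbation-aware Alignment for Large Language Model ( http://arxiv.org/abs/2402.01109v2 )

License: check the linked page

Tiansheng Huang, Sihao Hu, Ling Liu

(Reference translation) A small amount of harmful data uploaded by users can easily trick fine-tuning into producing an alignment-broken model. We conduct an empirical analysis and uncover a harmful embedding drift phenomenon, which points to a probable cause of the alignment-broken effect. We propose Vaccine, a perturbation-aware alignment technique, to mitigate the security risk of user fine-tuning. The core idea of Vaccine is to produce invariant hidden embeddings by progressively adding crafted perturbations in the alignment phase. This enables the embeddings to withstand harmful perturbation from unsanitized user data in the fine-tuning phase. Results on mainstream open-source LLMs (e.g., Llama2, Opt, Vicuna) show that Vaccine can boost the robustness of alignment against embedding drift induced by harmful prompts while preserving reasoning ability on benign prompts. Our code is available at https://github.com/git-disl/Vaccine.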

The new paradigm of finetuning-as-a-service introduces a new attack surface for Large Language Models (LLMs): a small amount of harmful data uploaded by users can easily trick finetuning into producing an alignment-broken model. We conduct an empirical analysis and uncover a harmful embedding drift phenomenon, showing a probable cause of the alignment-broken effect. Inspired by our findings, we propose Vaccine, a perturbation-aware alignment technique to mitigate the security risk of user finetuning. The core idea of Vaccine is to produce invariant hidden embeddings by progressively adding crafted perturbation to them in the alignment phase. This enables the embeddings to withstand harmful perturbation from un-sanitized user data in the finetuning phase. Our results on open source mainstream LLMs (e.g., Llama2, Opt, Vicuna) demonstrate that Vaccine can boost the robustness of alignment against embedding drift induced by harmful prompts while preserving reasoning ability towards benign prompts. Our code is available at https://github.com/git-disl/Vaccine.

Translated: 2024-02-21 03:08:01 Published: 2024-02-18

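# UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset

UltraLink: An Open-Source Knowledge-Enhanced Multilingual Supervised Fine-tuning Dataset ( http://arxiv.org/abs/2402.04588v2 )

License: check the linked page

Haoyu Wang, Shuo Wang, Yukun Yan, Xujia Wang, Zhiyu Yang, Yuzhuang Xu, Zhenghao Liu, Liner Yang, Ning Ding, Xu Han, Zhiyuan Liu, Maosong Sun

(Reference translation) Open-source large language models (LLMs) have gained significant strength across diverse fields. Nevertheless, most studies concentrate on English, with only limited exploration of multilingual abilities. In this work, we therefore construct an open-source multilingual supervised fine-tuning dataset. Unlike previous work that simply translates English instructions, we consider both the language-specific and the language-agnostic abilities of LLMs. First, we introduce a knowledge-grounded data augmentation approach to elicit language-specific knowledge from LLMs, improving their ability to serve users from different countries. Moreover, because modern LLMs have strong cross-lingual transfer capabilities, repeatedly learning identical content in various languages is unnecessary. Consequently, the language-agnostic supervised fine-tuning (SFT) data can be substantially pruned without performance degradation, making multilingual SFT more efficient. The resulting UltraLink dataset comprises approximately 1 million samples across five languages (En, Zh, Ru, Fr, Es), and the proposed data construction method can be easily extended to other languages. UltraLink-LM, trained on UltraLink, outperforms representative baselines on many tasks.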

Open-source large language models (LLMs) have gained significant strength across diverse fields. Nevertheless, the majority of studies primarily concentrate on English, with only limited exploration into the realm of multilingual abilities. In this work, we therefore construct an open-source multilingual supervised fine-tuning dataset. Different from previous works that simply translate English instructions, we consider both the language-specific and language-agnostic abilities of LLMs. Firstly, we introduce a knowledge-grounded data augmentation approach to elicit more language-specific knowledge of LLMs, improving their ability to serve users from different countries. Moreover, we find modern LLMs possess strong cross-lingual transfer capabilities, thus repeatedly learning identical content in various languages is not necessary. Consequently, we can substantially prune the language-agnostic supervised fine-tuning (SFT) data without any performance degradation, making multilingual SFT more efficient. The resulting UltraLink dataset comprises approximately 1 million samples across five languages (i.e., En, Zh, Ru, Fr, Es), and the proposed data construction method can be easily extended to other languages. UltraLink-LM, which is trained on UltraLink, outperforms several representative baselines across many tasks.

Translated: 2024-02-21 02:57:49 Published: 2024-02-18

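# LightHGNN: Distilling Hypergraph Neural Networks into MLPs for $100\times$ Faster Inference

LightHGNN: Distilling Hypergraph Neural Networks into MLPs for $100\times$ Faster Inference ( http://arxiv.org/abs/2402.04296v2 )

License: check the linked page

Yifan Feng, Yihe Luo, Shihui Ying, Yue Gao

(Reference translation) Hypergraph Neural Networks (HGNNs) have recently attracted much attention and shown satisfactory performance thanks to their superiority in high-order correlation modeling. However, the high-order modeling capability of hypergraphs also increases computational complexity, which hinders practical industrial deployment. In practice, a key barrier to the efficient deployment of HGNNs is the high-order structural dependency during inference. In this paper, we propose to bridge the gap between HGNNs and inference-efficient Multi-Layer Perceptrons (MLPs) in order to eliminate the hypergraph dependency of HGNNs, reduce computational complexity, and improve inference speed. Specifically, we introduce LightHGNN and LightHGNN$^+$ for fast inference with low complexity. LightHGNN directly distills knowledge from teacher HGNNs into student MLPs via soft labels, and LightHGNN$^+$ explicitly injects reliable high-order correlations into the student MLPs to achieve topology-aware distillation and resistance to over-smoothing. Experiments on eight hypergraph datasets show that, even without hypergraph dependency, the proposed LightHGNNs achieve competitive or better performance than HGNNs and outperform vanilla MLPs by $16.3$ on average. Extensive experiments on three graph datasets show that our LightHGNNs achieve the best average performance among all compared methods. Experiments on synthetic hypergraphs with 5.5w vertices show that LightHGNNs can run $100\times$ faster than HGNNs.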

Hypergraph Neural Networks (HGNNs) have recently attracted much attention and exhibited satisfactory performance due to their superiority in high-order correlation modeling. However, it is noticed that the high-order modeling capability of hypergraph also brings increased computation complexity, which hinders its practical industrial deployment. In practice, we find that one key barrier to the efficient deployment of HGNNs is the high-order structural dependencies during inference. In this paper, we propose to bridge the gap between the HGNNs and inference-efficient Multi-Layer Perceptron (MLPs) to eliminate the hypergraph dependency of HGNNs and thus reduce computational complexity as well as improve inference speed. Specifically, we introduce LightHGNN and LightHGNN$^+$ for fast inference with low complexity. LightHGNN directly distills the knowledge from teacher HGNNs to student MLPs via soft labels, and LightHGNN$^+$ further explicitly injects reliable high-order correlations into the student MLPs to achieve topology-aware distillation and resistance to over-smoothing. Experiments on eight hypergraph datasets demonstrate that even without hypergraph dependency, the proposed LightHGNNs can still achieve competitive or even better performance than HGNNs and outperform vanilla MLPs by $16.3$ on average. Extensive experiments on three graph datasets further show the average best performance of our LightHGNNs compared with all other methods. Experiments on synthetic hypergraphs with 5.5w vertices indicate LightHGNNs can run $100\times$ faster than HGNNs, showcasing their ability for latency-sensitive deployments.

Translated: 2024-02-21 02:57:30 Published: 2024-02-18
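
The abstract says that LightHGNN distills a teacher HGNN into a student MLP via soft labels. A generic soft-label distillation loss can be sketched as below; this is the textbook temperature-softened KL objective, not necessarily the exact loss used in the paper, and the function names and the temperature value are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_label_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # soft labels from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

teacher = np.array([[1.0, 2.0, 3.0]])
matched = soft_label_kd_loss(teacher, teacher)                        # student agrees
mismatched = soft_label_kd_loss(np.array([[3.0, 2.0, 1.0]]), teacher)  # student disagrees
```

At inference time only the student is evaluated, so the cost no longer depends on the hypergraph structure; that is the source of the speedup the abstract reports.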

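# Entanglement-enhanced quantum metrology: from standard quantum limit to Heisenberg limit

Entanglement-enhanced quantum metrology: from standard quantum limit to Heisenberg limit ( http://arxiv.org/abs/2402.03572v2 )

License: check the linked page

Jiahao Huang, Min Zhuang, Chaohong Lee

(Reference translation) Entanglement-enhanced quantum metrology explores the use of quantum entanglement to enhance measurement precision. When the particles in a probe are prepared in a quantum entangled state, they collectively accumulate information about the physical quantity to be measured, which improves measurement precision beyond the standard quantum limit and toward the Heisenberg limit. Rapid advances in quantum manipulation and detection techniques have enabled the generation, manipulation, and detection of multi-particle entangled states in synthetic quantum systems such as cold atoms and trapped ions. This article reviews the fundamental principles and experimental progress that demonstrate multi-particle entanglement for quantum metrology, and discusses the potential applications of entanglement-enhanced quantum sensors.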

Entanglement-enhanced quantum metrology explores the utilization of quantum entanglement to enhance measurement precision. When particles in a probe are prepared into a quantum entangled state, they collectively accumulate information about the physical quantity to be measured, leading to an improvement in measurement precision beyond the standard quantum limit and approaching the Heisenberg limit. The rapid advancement of techniques for quantum manipulation and detection has enabled the generation, manipulation, and detection of multi-particle entangled states in synthetic quantum systems such as cold atoms and trapped ions. This article aims to review and illustrate the fundamental principles and experimental progresses that demonstrate multi-particle entanglement for quantum metrology, as well as discuss the potential applications of entanglement-enhanced quantum sensors.

Translated: 2024-02-21 02:56:06 Published: 2024-02-18
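
For readers unfamiliar with the two limits named in the title, the standard textbook scalings are given below; the symbols $N$ (number of probe particles) and $\Delta\theta$ (uncertainty of the estimated parameter) are notation chosen here, not taken from the abstract.

```latex
% N: number of probe particles; \Delta\theta: uncertainty of the estimated parameter.
\Delta\theta_{\mathrm{SQL}} \propto \frac{1}{\sqrt{N}},
\qquad
\Delta\theta_{\mathrm{HL}} \propto \frac{1}{N},
\qquad
\frac{\Delta\theta_{\mathrm{SQL}}}{\Delta\theta_{\mathrm{HL}}} \sim \sqrt{N}.
```

For example, with $N = 10^4$ particles, an ideal Heisenberg-limited measurement would be about $100$ times more precise than one at the standard quantum limit.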

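# $\alpha$-Divergence Loss Function for Neural Density Ratio Estimation

$\alpha$-Divergence Loss Function for Neural Density Ratio Estimation ( http://arxiv.org/abs/2402.02041v2 )

License: check the linked page

Yoshiaki Kitazawa

(Reference translation) Recently, neural networks have produced state-of-the-art results for density-ratio estimation (DRE), a fundamental technique in machine learning. However, existing methods suffer from optimization issues that arise from the loss functions of DRE: the large sample requirement of the Kullback-Leibler (KL) divergence, vanishing training-loss gradients, and biased gradients of the loss functions. This paper therefore proposes an $\alpha$-divergence loss function ($\alpha$-Div) that offers a concise implementation and stable optimization. We also present technical justifications for the proposed loss function. We empirically demonstrate its stability and investigate the estimation accuracy on DRE tasks. In addition, we present a sample requirement for DRE with the proposed loss function in terms of an upper bound on the $L_1$ error, which connects to the curse of dimensionality, a common problem in high-dimensional DRE tasks.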

Recently, neural networks have produced state-of-the-art results for density-ratio estimation (DRE), a fundamental technique in machine learning. However, existing methods bear optimization issues that arise from the loss functions of DRE: a large sample requirement of Kullback--Leibler (KL)-divergence, vanishing of train loss gradients, and biased gradients of the loss functions. Thus, an $\alpha$-divergence loss function ($\alpha$-Div) that offers concise implementation and stable optimization is proposed in this paper. Furthermore, technical justifications for the proposed loss function are presented. The stability of the proposed loss function is empirically demonstrated and the estimation accuracy of DRE tasks is investigated. Additionally, this study presents a sample requirement for DRE using the proposed loss function in terms of the upper bound of $L_1$ error, which connects a curse of dimensionality as a common problem in high-dimensional DRE tasks.

Translated: 2024-02-21 02:52:52 Published: 2024-02-18

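# MasonPerplexity at Multimodal Hate Speech Event Detection 2024: Hate Speech and Target Detection Using Transformer Ensembles

MasonPerplexity at Multimodal Hate Speech Event Detection 2024: Hate Speech and Target Detection Using Transformer Ensembles ( http://arxiv.org/abs/2402.01967v2 )

License: check the linked page

Amrita Ganguly, Al Nahian Bin Emran, Sadiya Sayara Chowdhury Puspo, Md Nishat Raihan, Dhiman Goswami, Marcos Zampieri

(Reference translation) The automatic identification of offensive language such as hate speech is important for keeping discussions civil in online communities. Identifying hate speech in multimodal content is particularly challenging because offensiveness can appear in the words, in the images, or in the juxtaposition of the two. This paper presents the MasonPerplexity submission to the Shared Task on Multimodal Hate Speech Event Detection at CASE 2024 at EACL 2024. The task is divided into two sub-tasks: sub-task A focuses on identifying hate speech, and sub-task B focuses on identifying targets in text-embedded images during political events. We use an XLM-roBERTa-large model for sub-task A and an ensemble combining XLM-roBERTa-base, BERTweet-large, and BERT-base for sub-task B, obtaining an F1-score of 0.8347 in sub-task A and 0.6741 in sub-task B.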

The automatic identification of offensive language such as hate speech is important to keep discussions civil in online communities. Identifying hate speech in multimodal content is a particularly challenging task because offensiveness can be manifested in either words or images or a juxtaposition of the two. This paper presents the MasonPerplexity submission for the Shared Task on Multimodal Hate Speech Event Detection at CASE 2024 at EACL 2024. The task is divided into two sub-tasks: sub-task A focuses on the identification of hate speech and sub-task B focuses on the identification of targets in text-embedded images during political events. We use an XLM-roBERTa-large model for sub-task A and an ensemble approach combining XLM-roBERTa-base, BERTweet-large, and BERT-base for sub-task B. Our approach obtained 0.8347 F1-score in sub-task A and 0.6741 F1-score in sub-task B ranking 3rd on both sub-tasks.

Translated: 2024-02-21 02:52:34 Published: 2024-02-18

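# You Still See Me: How Data Protection Supports the Architecture of ML Surveillance

You Still See Me: How Data Protection Supports the Architecture of ML Surveillance ( http://arxiv.org/abs/2402.06609v2 )

License: check the linked page

Rui-Jie Yew, Lucy Qin, Suresh Venkatasubramanian

(Reference translation) Human data forms the backbone of machine learning. Data protection laws therefore bear strongly on how ML systems are governed. Given that most requirements in data protection laws accompany the processing of personal data, organizations have an incentive to keep their data out of legal scope. This makes the development and application of certain privacy-preserving techniques (data protection techniques) an important strategy for ML compliance. In this paper, we examine the impact of a rhetoric that treats data wrapped in these techniques as "good-to-go" data. We show how their application in the development of ML systems, from private set intersection as part of dataset curation to homomorphic encryption and federated learning as part of model computation, can further support individual monitoring and data consolidation. Because data accumulation is at the core of how the ML pipeline is configured, we argue that data protection techniques are often instrumentalized in ways that support infrastructures of surveillance rather than in ways that protect the individuals associated with the data. Finally, we propose technology and policy strategies for evaluating data protection techniques. We close by highlighting the role technologists can play in devising policies that combat surveillance ML technologies.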

Human data forms the backbone of machine learning. Data protection laws thus have strong bearing on how ML systems are governed. Given that most requirements in data protection laws accompany the processing of personal data, organizations have an incentive to keep their data out of legal scope. This makes the development and application of certain privacy-preserving techniques--data protection techniques--an important strategy for ML compliance. In this paper, we examine the impact of a rhetoric that deems data wrapped in these techniques as data that is "good-to-go". We show how their application in the development of ML systems--from private set intersection as part of dataset curation to homomorphic encryption and federated learning as part of model computation--can further support individual monitoring and data consolidation. With data accumulation at the core of how the ML pipeline is configured, we argue that data protection techniques are often instrumentalized in ways that support infrastructures of surveillance, rather than in ways that protect individuals associated with data. Finally, we propose technology and policy strategies to evaluate data protection techniques in light of the protections they actually confer. We conclude by highlighting the role that technologists might play in devising policies that combat surveillance ML technologies.

Translated: 2024-02-21 01:08:06 Published: 2024-02-18

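# Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty

Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty ( http://arxiv.org/abs/2402.06529v2 )

License: check the linked page

Kaiqu Liang, Zixu Zhang, Jaime Fernández Fisac

(Reference translation) Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to understand natural language instructions and strategically plan high-level actions through proper grounding. However, LLM hallucination can lead robots to execute plans that are misaligned with user goals or, in extreme cases, unsafe. In addition, the ambiguity inherent in natural language instructions can induce task uncertainty, particularly when multiple valid options exist. To address this issue, LLMs must identify such uncertainty and proactively seek clarification. This paper explores introspective planning as a systematic method for guiding LLMs to form uncertainty-aware plans for robotic task execution. We investigate uncertainty quantification in task-level robot planning and show that introspection significantly improves both success rates and safety compared with state-of-the-art LLM-based planning approaches. We further evaluate introspective planning in conjunction with conformal prediction and show that the combination yields tighter confidence bounds, maintaining statistical success guarantees with fewer superfluous user clarification queries.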

Large language models (LLMs) exhibit advanced reasoning skills, enabling robots to comprehend natural language instructions and strategically plan high-level actions through proper grounding. However, LLM hallucination may result in robots confidently executing plans that are misaligned with user goals or, in extreme cases, unsafe. Additionally, inherent ambiguity in natural language instructions can induce task uncertainty, particularly in situations where multiple valid options exist. To address this issue, LLMs must identify such uncertainty and proactively seek clarification. This paper explores the concept of introspective planning as a systematic method for guiding LLMs in forming uncertainty-aware plans for robotic task execution without the need for fine-tuning. We investigate uncertainty quantification in task-level robot planning and demonstrate that introspection significantly improves both success rates and safety compared to state-of-the-art LLM-based planning approaches. Furthermore, we assess the effectiveness of introspective planning in conjunction with conformal prediction, revealing that this combination yields tighter confidence bounds, thereby maintaining statistical success guarantees with fewer superfluous user clarification queries.

Translated: 2024-02-21 01:07:47 Published: 2024-02-18

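# Multimodal Clinical Trial Outcome Prediction with Large Language Models

Multimodal Clinical Trial Outcome Prediction with Large Language Models ( http://arxiv.org/abs/2402.06512v2 )

License: check the linked page

Wenhao Zheng, Dongsheng Peng, Hongxia Xu, Hongtu Zhu, Tianfan Fu, Huaxiu Yao

(Reference translation) Clinical trials are a pivotal and costly process, often spanning multiple years and requiring substantial funding. Clinical trial outcome prediction models, which aim to exclude drugs likely to fail, therefore hold the potential for significant cost savings. Recent data-driven attempts use deep learning to integrate multimodal data for predicting clinical trial outcomes. However, these approaches rely on manually designed modality-specific encoders, which limits both the extensibility to new modalities and the ability to discern similar information patterns across modalities. To address these issues, we propose a multimodal mixture-of-experts (LIFTED) approach for clinical trial outcome prediction. Specifically, LIFTED unifies data from different modalities by transforming them into natural language descriptions. LIFTED then constructs unified noise-resilient encoders to extract information from the modality-specific language descriptions. Next, a sparse mixture-of-experts framework further refines the representations, enabling LIFTED to identify similar information patterns across modalities and extract more consistent representations from those patterns using the same expert model. Finally, by dynamically integrating the representations of the different modalities for prediction, LIFTED can automatically weigh the modalities and pay more attention to critical information. Experiments show that LIFTED significantly improves performance over the best baseline in predicting outcomes across all three trial phases, demonstrating the effectiveness of its key components.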

The clinical trial is a pivotal and costly process, often spanning multiple years and requiring substantial financial resources. Therefore, the development of clinical trial outcome prediction models aims to exclude drugs likely to fail and holds the potential for significant cost savings. Recent data-driven attempts leverage deep learning methods to integrate multimodal data for predicting clinical trial outcomes. However, these approaches rely on manually designed modal-specific encoders, which limits both the extensibility to adapt new modalities and the ability to discern similar information patterns across different modalities. To address these issues, we propose a multimodal mixture-of-experts (LIFTED) approach for clinical trial outcome prediction. Specifically, LIFTED unifies different modality data by transforming them into natural language descriptions. Then, LIFTED constructs unified noise-resilient encoders to extract information from modal-specific language descriptions. Subsequently, a sparse Mixture-of-Experts framework is employed to further refine the representations, enabling LIFTED to identify similar information patterns across different modalities and extract more consistent representations from those patterns using the same expert model. Finally, a mixture-of-experts module is further employed to dynamically integrate different modality representations for prediction, which gives LIFTED the ability to automatically weigh different modalities and pay more attention to critical information. The experiments demonstrate that LIFTED significantly enhances performance in predicting clinical trial outcomes across all three phases compared to the best baseline, showcasing the effectiveness of our proposed key components.

Translated: 2024-02-21 01:07:28 Published: 2024-02-18

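# TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning

TASER: Temporal Adaptive Sampling for Fast and Accurate Dynamic Graph Representation Learning ( http://arxiv.org/abs/2402.05396v2 )

License: check the linked page

Gangda Deng, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Christopher Leung, Jianbo Li, Rajgopal Kannan, Viktor Prasanna

(Reference translation) Recently, Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite their success, TGNNs are prone to the noise commonly found in real-world dynamic graphs, such as time-deprecated links and skewed interaction distributions. The noise causes two critical issues that significantly compromise TGNN accuracy: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise patterns of each node. They also suffer from excessive mini-batch generation overhead caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics and its temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate TASER with two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average 5.1x speedup in training time.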

Recently, Temporal Graph Neural Networks (TGNNs) have demonstrated state-of-the-art performance in various high-impact applications, including fraud detection and content recommendation. Despite the success of TGNNs, they are prone to the prevalent noise found in real-world dynamic graphs like time-deprecated links and skewed interaction distribution. The noise causes two critical issues that significantly compromise the accuracy of TGNNs: (1) models are supervised by inferior interactions, and (2) noisy input induces high variance in the aggregated messages. However, current TGNN denoising techniques do not consider the diverse and dynamic noise pattern of each node. In addition, they also suffer from the excessive mini-batch generation overheads caused by traversing more neighbors. We believe the remedy for fast and accurate TGNNs lies in temporal adaptive sampling. In this work, we propose TASER, the first adaptive sampling method for TGNNs optimized for accuracy, efficiency, and scalability. TASER adapts its mini-batch selection based on training dynamics and temporal neighbor selection based on the contextual, structural, and temporal properties of past interactions. To alleviate the bottleneck in mini-batch generation, TASER implements a pure GPU-based temporal neighbor finder and a dedicated GPU feature cache. We evaluate the performance of TASER using two state-of-the-art backbone TGNNs. On five popular datasets, TASER outperforms the corresponding baselines by an average of 2.3% in Mean Reciprocal Rank (MRR) while achieving an average of 5.1x speedup in training time.

Translated: 2024-02-21 01:04:34 Published: 2024-02-18
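
The accuracy figure above is reported in Mean Reciprocal Rank (MRR). As a reminder of what that metric measures, here is a minimal sketch of the standard definition; the function name and the toy ranks are illustrative assumptions, and the paper's evaluation protocol (for example, how negative candidates are sampled) is not described in the abstract.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where each rank is the 1-based position of the true item."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Three queries whose true items were ranked 1st, 2nd, and 4th (toy values).
mrr = mean_reciprocal_rank([1, 2, 4])
```

An MRR of 1.0 means the true item is always ranked first, so a gain of a few points reflects the correct link moving noticeably closer to the top of the ranking.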

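# Dueling Over Dessert, Mastering the Art of Repeated Cake Cutting

Dueling Over Dessert, Mastering the Art of Repeated Cake Cutting ( http://arxiv.org/abs/2402.08547v2 )

License: check the linked page

Simina Brânzei and MohammadTaghi Hajiaghayi and Reed Phillips and Suho Shin and Kun Wang

(Reference translation) We consider repeated fair division between two players, Alice and Bob, who have private valuations over a cake. In each round a new cake arrives, identical to the ones in previous rounds. Alice cuts the cake at a point of her choice, and Bob chooses the left piece or the right piece, leaving the remainder to Alice. We consider two versions: sequential, where Bob observes Alice's cut point before choosing left or right, and simultaneous, where he observes her cut point only after making his choice. The simultaneous version was first considered by Aumann and Maschler (1995). If Bob is almost myopic and chooses his favorite piece too often, Alice can systematically exploit him through a strategy akin to binary search. This strategy lets Alice approximate Bob's preferences with increasing precision and thereby secure a disproportionate share of the resource over time. We analyze the limits of how much one player can exploit the other and show that fair utility profiles are in fact achievable. Specifically, a player can enforce the equitable utility profile of $(1/2, 1/2)$ in the limit on every trajectory of play, by keeping the other player's utility to approximately $1/2$ on average while guaranteeing themselves at least approximately $1/2$ on average. We show this theorem using a connection with Blackwell approachability. Finally, we analyze a natural dynamic known as fictitious play, in which players best respond to the empirical distribution of the other player. We show that fictitious play converges to the equitable utility profile of $(1/2, 1/2)$ at a rate of $O(1/\sqrt{T})$.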

We consider the setting of repeated fair division between two players, denoted Alice and Bob, with private valuations over a cake. In each round, a new cake arrives, which is identical to the ones in previous rounds. Alice cuts the cake at a point of her choice, while Bob chooses the left piece or the right piece, leaving the remainder for Alice. We consider two versions: sequential, where Bob observes Alice's cut point before choosing left/right, and simultaneous, where he only observes her cut point after making his choice. The simultaneous version was first considered by Aumann and Maschler (1995). We observe that if Bob is almost myopic and chooses his favorite piece too often, then he can be systematically exploited by Alice through a strategy akin to a binary search. This strategy allows Alice to approximate Bob's preferences with increasing precision, thereby securing a disproportionate share of the resource over time. We analyze the limits of how much a player can exploit the other one and show that fair utility profiles are in fact achievable. Specifically, the players can enforce the equitable utility profile of $(1/2, 1/2)$ in the limit on every trajectory of play, by keeping the other player's utility to approximately $1/2$ on average while guaranteeing they themselves get at least approximately $1/2$ on average. We show this theorem using a connection with Blackwell approachability. Finally, we analyze a natural dynamic known as fictitious play, where players best respond to the empirical distribution of the other player. We show that fictitious play converges to the equitable utility profile of $(1/2, 1/2)$ at a rate of $O(1/\sqrt{T})$.

Translated: 2024-02-21 00:53:32 Published: 2024-02-18
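
The binary-search exploitation mentioned in the abstract can be illustrated with a toy simulation. The sketch below assumes a fully myopic Bob in the sequential version, a cake identified with the interval [0, 1], and a hypothetical valuation function `bob_cdf`; it shows only how Alice can locate Bob's midpoint from his choices, not the paper's full strategy or its analysis.

```python
def estimate_bob_midpoint(bob_cdf, rounds=30):
    """Alice's binary search against a myopic Bob on the cake [0, 1].

    bob_cdf(x) is Bob's value for the left piece [0, x]; the whole cake is worth 1.
    It is called only to simulate Bob's choice: Alice herself sees just left/right.
    """
    lo, hi = 0.0, 1.0
    for _ in range(rounds):
        cut = (lo + hi) / 2
        # A myopic Bob takes the left piece exactly when it is worth at least 1/2 to him.
        bob_takes_left = bob_cdf(cut) >= 0.5
        if bob_takes_left:
            hi = cut  # Bob's midpoint is at or to the left of the cut
        else:
            lo = cut  # Bob's midpoint is to the right of the cut
    return (lo + hi) / 2

# Hypothetical Bob whose value density increases toward the right end of the cake.
estimate = estimate_bob_midpoint(lambda x: x ** 2)
```

Once Alice knows the point that Bob values as an even split, she can cut close to it in later rounds and keep whichever side is worth more than half to her, which is the kind of disproportionate share the abstract warns about.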

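# Contextual Multinomial Logit Bandits with General Value Functions

Contextual Multinomial Logit Bandits with General Value Functions ( http://arxiv.org/abs/2402.08126v2 )

License: check the linked page

Mengxiao Zhang, Haipeng Luo

(Reference translation) Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems, such as online retailing and advertising. However, prior work has considered only (generalized) linear value functions, which greatly limits applicability. Motivated by this fact, we borrow ideas from a recent trend in the study of contextual bandits and consider contextual MNL bandits with a general value function class that contains the ground truth. Specifically, we consider both the stochastic and the adversarial settings and propose a suite of algorithms, each with a different computation-regret trade-off. When applied to the linear case, our results are not only the first with no dependence on a problem-dependent constant that can be exponentially large, but also enjoy other advantages such as computational efficiency, dimension-free regret bounds, or the ability to handle completely adversarial contexts and rewards.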

Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising. However, prior work has only considered (generalized) linear value functions, which greatly limits its applicability. Motivated by this fact, in this work, we consider contextual MNL bandits with a general value function class that contains the ground truth, borrowing ideas from a recent trend of studies on contextual bandits. Specifically, we consider both the stochastic and the adversarial settings, and propose a suite of algorithms, each with a different computation-regret trade-off. When applied to the linear case, our results not only are the first ones with no dependence on a certain problem-dependent constant that can be exponentially large, but also enjoy other advantages such as computational efficiency, dimension-free regret bounds, or the ability to handle completely adversarial contexts and rewards.
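
For readers unfamiliar with the MNL choice model behind these bandits, here is a minimal sketch. It assumes the standard formulation in which each offered item i has a positive value v_i and a "no purchase" option has value 1; the abstract does not spell this out. The function name and the `no_purchase` key are hypothetical.

```python
def mnl_choice_probabilities(values):
    """Return MNL choice probabilities for an offered assortment.

    values: dict mapping item -> positive value v_i (for example, the output
    of a value function evaluated on the current context).
    """
    denom = 1.0 + sum(values.values())  # the 1.0 is the outside option's value
    probs = {item: v / denom for item, v in values.items()}
    probs["no_purchase"] = 1.0 / denom
    return probs


probs = mnl_choice_probabilities({"a": 1.0, "b": 2.0})
```

In a contextual MNL bandit the learner does not know the values and must estimate the value function from observed choices. The sketch only shows how an assortment and its values turn into a choice distribution.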

翻訳日:2024-02-21 00:52:43 公開日:2024-02-18

# 平均場定常分布からのサンプリング

Sampling from the Mean-Field Stationary Distribution ( http://arxiv.org/abs/2402.07355v3 )

ライセンス: Link先を確認

Yunbum Kook, Matthew S. Zhang, Sinho Chewi, Murat A. Erdogdu, Mufan Bill Li

(参考訳) 本研究では,平均場SDEの定常分布からのサンプリングの複雑さ,あるいは相互作用項を含む確率測度空間上の関数の最小化の複雑さについて検討する。本研究の主な知見は,(1)有限粒子系による平均場sdeの近似,(2)カオスの均一な時間伝播,(2)標準対数対数解析による有限粒子定常分布からのサンプリング,の2つの重要な側面を分離することである。我々のアプローチは概念的にシンプルであり、その柔軟性はアルゴリズムと理論の両方に最先端の技術を取り入れることができる。これにより、平均フィールド状態における特定の2層ニューラルネットワークを最適化する保証の改善など、多数の設定での保証が改善される。

We study the complexity of sampling from the stationary distribution of a mean-field SDE, or equivalently, the complexity of minimizing a functional over the space of probability measures which includes an interaction term. Our main insight is to decouple the two key aspects of this problem: (1) approximation of the mean-field SDE via a finite-particle system, via uniform-in-time propagation of chaos, and (2) sampling from the finite-particle stationary distribution, via standard log-concave samplers. Our approach is conceptually simpler and its flexibility allows for incorporating the state-of-the-art for both algorithms and theory. This leads to improved guarantees in numerous settings, including better guarantees for optimizing certain two-layer neural networks in the mean-field regime.
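
Step (1), replacing the mean-field SDE with a finite-particle system, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's algorithm: it works in one dimension, uses a quadratic confinement V and a quadratic interaction W chosen only for illustration, and replaces the log-concave samplers mentioned in the abstract with a plain Euler-Maruyama discretization. All function names are hypothetical.

```python
import math
import random


def grad_V(x):
    """Gradient of the illustrative confinement potential V(x) = x^2 / 2."""
    return x


def grad_W(z):
    """Gradient of the illustrative interaction potential W(z) = z^2 / 2."""
    return z


def particle_step(xs, dt, rng):
    """One Euler-Maruyama step of the N-particle system.

    The expectation over the law of the process in the mean-field drift is
    replaced by an average over the current particles.
    """
    n = len(xs)
    new_xs = []
    for xi in xs:
        interaction = sum(grad_W(xi - xj) for xj in xs) / n
        noise = math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
        new_xs.append(xi - dt * (grad_V(xi) + interaction) + noise)
    return new_xs


def simulate(n_particles=50, n_steps=400, dt=0.01, seed=0):
    """Run the particle system from a common starting point."""
    rng = random.Random(seed)
    xs = [3.0] * n_particles
    for _ in range(n_steps):
        xs = particle_step(xs, dt, rng)
    return xs


particles = simulate()
```

With these quadratic potentials, the particles should settle around zero with a spread of roughly 0.5 in variance. The pairwise interaction sum costs O(N^2) per step, which is the price of the finite-particle approximation.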

翻訳日:2024-02-21 00:51:50 公開日:2024-02-18

# 知的分子特性予測におけるドメイン知識とマルチモダリティの影響--体系的調査

The Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey ( http://arxiv.org/abs/2402.07249v2 )

ライセンス: Link先を確認

Taojie Kuang, Pengfei Liu, Zhixiang Ren

(参考訳) 分子特性の正確な予測は薬物開発、特に仮想スクリーニングや化合物最適化の進歩に不可欠である。近年の多くの深層学習手法の導入は、分子特性予測(MPP)の強化、特に分子構造に対する精度と洞察の向上に顕著な可能性を示している。しかし、2つの重要な疑問が生じる: ドメイン知識の統合は分子特性予測の精度を高め、マルチモーダルデータ融合を用いることで、ユニークなデータソース法よりも正確な結果が得られるか? そこで本研究では,近年の深層学習法を総合的に検討し,定量的に分析する。分子情報の統合はMPPの回帰と分類のタスクをそれぞれ3.98%と1.72%改善することを発見した。また,1次元情報と2次元情報を同時に利用することにより,mppを最大4.2%向上できることがわかった。 2つの統合された洞察は、薬物発見の将来の進歩に重要なガイダンスを提供する。

The precise prediction of molecular properties is essential for advancements in drug development, particularly in virtual screening and compound optimization. The recent introduction of numerous deep learning-based methods has shown remarkable potential in enhancing molecular property prediction (MPP), especially improving accuracy and insights into molecular structures. Yet, two critical questions arise: does the integration of domain knowledge augment the accuracy of molecular property prediction, and does employing multi-modal data fusion yield more precise results than unique data source methods? To explore these matters, we comprehensively review and quantitatively analyze recent deep learning methods based on various benchmarks. We discover that integrating molecular information improves MPP regression and classification tasks by up to 3.98% and 1.72%, respectively. We also discover that utilizing 3-dimensional information together with 1-dimensional and 2-dimensional information can substantially enhance MPP, by up to 4.2%. The two consolidated insights offer crucial guidance for future advancements in drug discovery.

翻訳日:2024-02-21 00:51:35 公開日:2024-02-18

# 金融におけるネットワークレジリエンス向上のためのディープラーニング

Utilizing Deep Learning for Enhancing Network Resilience in Finance ( http://arxiv.org/abs/2402.09820v2 )

ライセンス: Link先を確認

Yulu Gong, Mengran Zhu, Shuning Huo, Yafei Xiang, Hanyi Yu

(参考訳) インターネットの時代において、人々の生活はますます今日のネットワーク技術に依存している。ネットワークの完全性を維持し、ユーザーの正当な利益を守ることは、ネットワーク構築の核心である。脅威検出は、完全かつ効果的な防衛システムの重要な部分である。未知の脅威を効果的に検出する方法は、ネットワーク保護の懸念のひとつだ。現在、ネットワーク脅威検出は、通常、人工的なルールを作成したり、大規模なデータアプリケーションに適用できない時空間的特徴を抽出したりするルールや従来の機械学習手法に基づいており、未知のリスクの発生によって元のモデルの検出精度が低下する。このことを念頭に置いて,金融業界の保護対策を改善するために,高度な脅威検出にディープラーニングを用いる。多くのネットワーク研究者は例外ベースの侵入検知技術に焦点を移した。検出技術は主に、通常のプログラムとネットワークの振る舞いデータを収集し、多次元の特徴を抽出し、このベースで決定機械学習モデルを訓練する統計機械学習手法を使用する(一般的には、ベイズ、決定木、サポートベクターマシン、ランダムフォレストなどを含む)。

In the age of the Internet, people's lives are increasingly dependent on today's network technology. Maintaining network integrity and protecting the legitimate interests of users is at the heart of network construction. Threat detection is an important part of a complete and effective defense system. How to effectively detect unknown threats is one of the concerns of network protection. Currently, network threat detection is usually based on rules and traditional machine learning methods, which rely on hand-crafted rules or extract common spatiotemporal features. These approaches cannot be applied to large-scale data applications, and the emergence of unknown risks causes the detection accuracy of the original model to decline. With this in mind, this paper uses deep learning for advanced threat detection to improve protective measures in the financial industry. Many network researchers have shifted their focus to anomaly-based intrusion detection techniques. The detection technology mainly uses statistical machine learning methods: collecting normal program and network behavior data, extracting multidimensional features, and training decision machine learning models on this basis (commonly used models include naive Bayes, decision trees, support vector machines, random forests, etc.).

翻訳日:2024-02-21 00:26:16 公開日:2024-02-18

# コントラスト学習とセルフアテンションを用いた時間軸の逐次推薦

Sequential Recommendation on Temporal Proximities with Contrastive Learning and Self-Attention ( http://arxiv.org/abs/2402.09784v2 )

ライセンス: Link先を確認

Hansol Jung, Hyunwoo Seo and Chiehyeon Lim

(参考訳) 逐次リコメンデータシステムは、過去のインタラクションからユーザの好みを識別し、後続の項目を最適に予測する。従来のディープラーニングモデルと最新のトランスフォーマーモデルでは、ユーザとテーマのインタラクションにおける一方向および双方向のパターンが捉えられているが、個人の行動パターンや社会的傾向パターンといった時間的文脈の重要性は未検討のままである。特に最近のモデルは、類似した時間枠の間、ユーザ間で暗黙的に発生するユーザのアクションの類似性を無視することが多い。これらのモデルは主に変換器の自己認識機構を適用し、個々のユーザアクションの時間的コンテキストを考慮する。一方、この適応は、アイテム間の相互作用における水平時間的近接性、例えば1週間以内のアイテム購入と1ヶ月以内のアイテム購入の区別を考慮しても依然として限定的である。これらのギャップに対処するため,ユーザ間相互作用の時間的近さを考慮し,コントラスト学習と自己注意法を含む,TemProxRecというシーケンシャルレコメンデーションモデルを提案する。提案するコントラスト学習法は,ユーザ間の密接な時間間隔で選択された項目の表現を学習する。同時に,提案手法は,絶対埋め込みと相対埋め込みの両方を用いて,ユーザシーケンス内の時間的および位置的コンテキストを符号化する。このようにして、私たちのTemProxRecは、特定の時間枠内のユーザとイテムのインタラクションに基づいて、関連するアイテムを正確に予測します。 temproxrecに関する包括的実験によって検証し、ベンチマークデータセットで既存のモデルと一貫して比較し、垂直および水平の時間軸を逐次レコメンデーションとして考慮することの重要性を示す。

Sequential recommender systems identify user preferences from their past interactions to predict subsequent items optimally. Although traditional deep-learning-based models and modern transformer-based models in previous studies capture unidirectional and bidirectional patterns within user-item interactions, the importance of temporal contexts, such as individual behavioral and societal trend patterns, remains underexplored. Notably, recent models often neglect similarities in users' actions that occur implicitly among users during analogous timeframes, a concept we term vertical temporal proximity. These models primarily adapt the self-attention mechanisms of the transformer to consider the temporal context in individual user actions. However, this adaptation remains limited in considering the horizontal temporal proximity within item interactions, such as distinguishing between subsequent item purchases within a week versus a month. To address these gaps, we propose a sequential recommendation model called TemProxRec, which includes contrastive learning and self-attention methods to consider temporal proximities both across and within user-item interactions. The proposed contrastive learning method learns representations of items selected in close temporal periods across different users to be close. Simultaneously, the proposed self-attention mechanism encodes temporal and positional contexts in a user sequence using both absolute and relative embeddings. This way, our TemProxRec accurately predicts the relevant items based on the user-item interactions within a specific timeframe. We validate this work through comprehensive experiments on TemProxRec, consistently outperforming existing models on benchmark datasets as well as showing the significance of considering the vertical and horizontal temporal proximities in sequential recommendation.

翻訳日:2024-02-21 00:25:57 公開日:2024-02-18

# 高アフィン変換に適応した領域特徴記述子

Region Feature Descriptor Adapted to High Affine Transformations ( http://arxiv.org/abs/2402.09724v2 )

ライセンス: Link先を確認

Shaojie Zhang, Yinghui Wang, Bin Nan, Jinlong Yang, Tao Yan, Liangyi Huang, and Mingfeng Wang

(参考訳) 画像が高アフィン変換を行う場合のグレースケール特徴情報の表現に効果のない特徴ディスクリプタの問題に対処するため,分類を用いてアフィン変換をシミュレートした領域特徴ディスクリプタを提案する。提案手法は当初,異なるアフィン次数を持つ画像を分類し,アフィン変換をシミュレートし,新たな画像群を生成する。その後、この新しい画像集合上の特徴点の近傍情報を算出する。最後に、特徴点が属する最大安定極端領域のグレースケールヒストグラムと特徴点領域のグレイスケールセントロイドに対する正規化位置とを組み合わせて記述子を生成する。アフィン変換のシナリオで特徴マッチングメトリクスを比較した実験の結果,提案する記述器は従来の記述器と比較して高い精度と頑健性を示すことがわかった。さらに、他のディスクリプタと統合すると堅牢性を示す。

To address the issue of feature descriptors being ineffective in representing grayscale feature information when images undergo high affine transformations, leading to a rapid decline in feature matching accuracy, this paper proposes a region feature descriptor based on simulating affine transformations using classification. The proposed method initially categorizes images with different affine degrees to simulate affine transformations and generate a new set of images. Subsequently, it calculates neighborhood information for feature points on this new image set. Finally, the descriptor is generated by combining the grayscale histogram of the maximally stable extremal region to which the feature point belongs and the normalized position relative to the grayscale centroid of the feature point's region. Experimental results, comparing feature matching metrics under affine transformation scenarios, demonstrate that the proposed descriptor exhibits higher precision and robustness compared to existing classical descriptors. Additionally, it shows robustness when integrated with other descriptors.

翻訳日:2024-02-21 00:25:23 公開日:2024-02-18

# モデル編集による蝶効果:大言語モデルの崩壊をトリガーできる編集は少ない

The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse ( http://arxiv.org/abs/2402.09656v2 )

ライセンス: Link先を確認

Wanli Yang, Fei Sun, Xinyu Ma, Xun Liu, Dawei Yin, Xueqi Cheng

(参考訳) モデル編集は、Large Language Models (LLM) における知識の改訂に有望であるが、LLMの本質的な能力への影響はしばしば見過ごされている。一つの編集でもモデル崩壊を引き起こし、様々なベンチマークタスクで大幅なパフォーマンス低下を示す。しかし、このような崩壊を防ぐために各編集後のLCMのベンチマークは、致命的であり、資源集約である。そこで本研究では,ダウンストリームタスクの性能と強い相関関係を実証した広範囲な実験により検証した,代理メトリックとしてのパープレキシティの利用を提案する。さらに,従来の単一編集研究の難題に焦点をあて,様々な編集手法やLLMをまたいだ実世界のシナリオの実践的設定であるシーケンシャル編集の詳細な研究を行っている。その結果, ほぼすべての編集手法が, ほんの数回の編集後, モデル崩壊をもたらすことがわかった。さらなる研究を容易にするため,我々はGPT-3.5を用いて,これらのハードケースに基づいた新しいデータセットであるHardEditを開発した。このデータセットは、信頼性のあるモデル編集の研究の先駆的な基盤と、編集によるモデル崩壊のメカニズムを確立することを目的としている。この作業が,モデル編集プラクティスに内在する潜在的なリスクに対して,コミュニティの注意を引き付けることを願っています。

Although model editing has shown promise in revising knowledge in Large Language Models (LLMs), its impact on the inherent capabilities of LLMs is often overlooked. In this work, we reveal a critical phenomenon: even a single edit can trigger model collapse, manifesting as significant performance degradation in various benchmark tasks. However, benchmarking LLMs after each edit, while necessary to prevent such collapses, is impractically time-consuming and resource-intensive. To mitigate this, we propose using perplexity as a surrogate metric, validated by extensive experiments demonstrating its strong correlation with downstream task performance. We further conduct an in-depth study on sequential editing, a practical setting for real-world scenarios, across various editing methods and LLMs, focusing on hard cases from our previous single edit studies. The results indicate that nearly all examined editing methods result in model collapse after only a few edits. To facilitate further research, we have utilized GPT-3.5 to develop a new dataset, HardEdit, based on those hard cases. This dataset aims to establish the foundation for pioneering research in reliable model editing and the mechanisms underlying editing-induced model collapse. We hope this work can draw the community's attention to the potential risks inherent in model editing practices.
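
The idea of using perplexity as a cheap surrogate can be sketched as below. This is a hedged illustration, not the authors' evaluation code: it assumes per-token natural-log probabilities are already available from the model on some held-out text. The `looks_collapsed` helper and its `ratio_threshold` of 10 are hypothetical choices made only for this example.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def looks_collapsed(ppl_before, ppl_after, ratio_threshold=10.0):
    """Flag an edit when perplexity grows by more than a chosen factor.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return ppl_after > ratio_threshold * ppl_before


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform_ppl = perplexity([math.log(0.25)] * 4)
```

Computing this after every edit needs only one forward pass over a small text sample, which is why it is far cheaper than rerunning full downstream benchmarks.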

翻訳日:2024-02-21 00:25:07 公開日:2024-02-18

# EventRL: 大規模言語モデルのアウトカムスーパービジョンによるイベント抽出の強化

EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models ( http://arxiv.org/abs/2402.11430v1 )

ライセンス: Link先を確認

Jun Gao, Huan Zhao, Wei Wang, Changlong Yu, Ruifeng Xu

(参考訳) 本研究では,大規模言語モデル(LLM)におけるイベント抽出の強化を目的とした強化学習手法であるEventRLを提案する。 EventRLは、結果の監視と特定の報酬関数を利用して、イベント構造のミスマッチや未定義のイベントタイプの生成として表される命令の追従や幻覚といった、LLMの一般的な課題に取り組む。我々は,Few-Shot Prompting (FSP) や Supervised Fine-Tuning (SFT) といった既存手法に対して,GPT-4, LLaMa, CodeLLaMa モデルを含む様々な LLM に対して EventRL を評価する。以上の結果から,EventRLはイベントの識別や構造化,特に新しいイベントタイプへの対応において,従来の手法よりも優れていた。この研究は、報酬関数の選択の重要な役割を強調し、より良いイベント抽出のためにコードデータを統合する利点を示す。モデルサイズの増加は高い精度をもたらすが、オーバーフィットを避けるには一般化する能力を維持することが不可欠である。

In this study, we present EventRL, a reinforcement learning approach developed to enhance event extraction for large language models (LLMs). EventRL utilizes outcome supervision with specific reward functions to tackle prevalent challenges in LLMs, such as instruction following and hallucination, manifested as the mismatch of event structure and the generation of undefined event types. We evaluate EventRL against existing methods like Few-Shot Prompting (FSP) (based on GPT-4) and Supervised Fine-Tuning (SFT) across various LLMs, including GPT-4, LLaMa, and CodeLLaMa models. Our findings show that EventRL significantly outperforms these conventional approaches by improving the performance in identifying and structuring events, particularly in handling novel event types. The study emphasizes the critical role of reward function selection and demonstrates the benefits of incorporating code data for better event extraction. While increasing model size leads to higher accuracy, maintaining the ability to generalize is essential to avoid overfitting.

翻訳日:2024-02-20 21:37:29 公開日:2024-02-18

# 選好微調整による視覚大言語モデルにおけるモーダリティの調整

Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ( http://arxiv.org/abs/2402.11411v1 )

ライセンス: Link先を確認

Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

(参考訳) VLLM(Instruction-following Vision Large Language Models)は近年,様々なタスクにおいて大きな進歩を遂げている。これらのアプローチは、強い事前訓練された視覚モデルと大きな言語モデル(LLM)を統合する。これらのコンポーネントは別々にトレーニングされているため、学習された表現は、追加のイメージ言語対のジョイントトレーニングと整合する必要がある。この手順は完全ではなく、モデルに幻覚を与える - コアLLMが極めて事実的であり、ビジョンバックボーンが十分に完全な表現を持っている場合でも、画像を正確に反映しない回答を提供する - 。本研究では,幻覚問題をアライメント問題として枠組し,嗜好調整によって対処する。具体的には,AIモデルを用いたフィードバックデータを生成するPOVIDを提案する。提案手法は,好ましくないデータを生成するための2段階のアプローチである。まず,GPT-4Vに対して,正解に可溶性幻覚を注入するよう促す。第2に、VLLMの固有の幻覚行動を引き起こすために、画像を歪ませる。これは自動化されたアプローチで、人間のデータ生成に依存したり、完璧な専門家を必要としません。最後に、これら2つの生成ストラテジーは、Direct Preference Optimizationを通じてRLHFパイプラインに統合される。広範ベンチマークを用いた実験では、幻覚を減らすだけでなく、標準ベンチマークでのモデル性能を向上させることができ、従来の手法よりも優れていた。私たちのデータとコードは https://github.com/YiyangZhou/POVID で公開されています。

Instruction-following Vision Large Language Models (VLLMs) have achieved significant progress recently on a variety of tasks. These approaches merge strong pre-trained vision models and large language models (LLMs). Since these components are trained separately, the learned representations need to be aligned with joint training on additional image-language pairs. This procedure is not perfect and can cause the model to hallucinate - provide answers that do not accurately reflect the image, even when the core LLM is highly factual and the vision backbone has sufficiently complete representations. In this work, we frame the hallucination problem as an alignment issue and tackle it with preference tuning. Specifically, we propose POVID to generate feedback data with AI models. We use ground-truth instructions as the preferred response and a two-stage approach to generate dispreferred data. First, we prompt GPT-4V to inject plausible hallucinations into the correct answer. Second, we distort the image to trigger the inherent hallucination behavior of the VLLM. This is an automated approach that does not rely on human data generation or require a perfect expert, which makes it easily scalable. Finally, both of these generation strategies are integrated into an RLHF pipeline via Direct Preference Optimization. In experiments across broad benchmarks, we show that we can not only reduce hallucinations, but also improve model performance across standard benchmarks, outperforming prior approaches. Our data and code are available at https://github.com/YiyangZhou/POVID.
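
The Direct Preference Optimization objective that consumes such preferred/dispreferred pairs can be sketched for a single pair as follows. This is the standard DPO loss as commonly stated, not code from the POVID repository. The function name, argument names, and the default `beta=0.1` are assumptions made for illustration. Inputs are summed log-probabilities of each full response under the trained policy and under a frozen reference model.

```python
import math


def dpo_loss(policy_logp_preferred, policy_logp_dispreferred,
             ref_logp_preferred, ref_logp_dispreferred, beta=0.1):
    """DPO loss for one (preferred, dispreferred) response pair."""
    # How much more the policy favors the preferred response than the
    # reference model does, measured in log-probability.
    margin = ((policy_logp_preferred - ref_logp_preferred)
              - (policy_logp_dispreferred - ref_logp_dispreferred))
    # -log(sigmoid(beta * margin)) rewritten as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))


# With no margin the loss is log(2). It shrinks as the policy learns to
# prefer the ground-truth answer over the hallucinated one.
loss_at_start = dpo_loss(-12.0, -15.0, -12.0, -15.0)
loss_after_training = dpo_loss(-10.0, -20.0, -12.0, -15.0)
```

In the setting described above, the preferred response would be the ground-truth answer and the dispreferred one a hallucinated answer, for example one obtained from a distorted image.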

翻訳日:2024-02-20 21:37:09 公開日:2024-02-18

# キャリブレーションまでの距離 $2\sqrt{T}$ を得る基本予測器

An Elementary Predictor Obtaining $2\sqrt{T}$ Distance to Calibration ( http://arxiv.org/abs/2402.11410v1 )

ライセンス: Link先を確認

Eshwar Ram Arunachaleswaran, Natalie Collina, Aaron Roth, Mirah Shi

(参考訳) Blasiokら [2023] は、期待校正誤差(ECE)とは異なり連続である校正誤差の自然な尺度として、校正までの距離を提案した。最近、Qiao と Zheng [2024] は、敵対的設定において校正までの距離 $O(\sqrt{T})$ を達成するオンライン予測器の存在を示す非構成的な議論を与えた。これはECEでは不可能であることが知られている。彼らは、明示的で効率的なアルゴリズムを見つけることを未解決問題として残した。我々はこの問題を解決し、校正までの距離が高々 $2\sqrt{T}$ となる、極めて単純で効率的な決定論的アルゴリズムを与える。

Blasiok et al. [2023] proposed distance to calibration as a natural measure of calibration error that, unlike expected calibration error (ECE), is continuous. Recently, Qiao and Zheng [2024] gave a non-constructive argument establishing the existence of an online predictor that can obtain $O(\sqrt{T})$ distance to calibration in the adversarial setting, which is known to be impossible for ECE. They leave as an open problem finding an explicit, efficient algorithm. We resolve this problem and give an extremely simple, efficient, deterministic algorithm that obtains distance to calibration error at most $2\sqrt{T}$.
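
The remark that ECE is not continuous can be made concrete with a small experiment. The sketch below is not the paper's predictor. It assumes one common unnormalized convention for ECE on a sequence, namely the sum over distinct predicted values of the absolute accumulated bias, and the function name is hypothetical.

```python
from collections import defaultdict


def expected_calibration_error(predictions, outcomes):
    """Unnormalized ECE: sum over distinct prediction values p of
    |sum over rounds with prediction p of (p - outcome)|."""
    bias = defaultdict(float)
    for p, y in zip(predictions, outcomes):
        bias[p] += p - y
    return sum(abs(b) for b in bias.values())


outcomes = [0, 1] * 50  # outcomes alternate, so half of them are 1

# Always predicting 0.5 is perfectly calibrated on this sequence.
ece_constant = expected_calibration_error([0.5] * 100, outcomes)

# Nudging each prediction by a tiny eps splits the rounds into two buckets,
# one containing only 0-outcomes and one containing only 1-outcomes.
eps = 1e-3
nudged = [0.5 + eps if y == 0 else 0.5 - eps for y in outcomes]
ece_nudged = expected_calibration_error(nudged, outcomes)
```

Under this convention the first predictor has ECE 0, while the nudged one, whose predictions differ by only 0.001, has ECE of about 50. A continuous measure such as distance to calibration avoids this kind of jump.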

翻訳日:2024-02-20 21:36:46 公開日:2024-02-18

# 共感的対話応答の多次元評価

Multi-dimensional Evaluation of Empathetic Dialog Responses ( http://arxiv.org/abs/2402.11409v1 )

ライセンス: Link先を確認

Zhichao Xu, Jiepu Jiang

(参考訳) 共感は効果的な会話コミュニケーションの重要な要素であるが、会話の共感を測定する以前の研究は、主に表現されたコミュニケーションの意図に焦点を当てている。対照的に,話者の視点から表現された意図と聞き手の視点から知覚された共感の両方を測定するために,既存の作業を拡張する多次元共感評価フレームワークを提案する。提案手法を適用して顧客・サービス対話の分析を行ったところ,2次元(表現意図型と知覚共感)は相互に関連しており,共感感は対話セッションの満足度と高い相関関係にあることがわかった。このフレームワークでは、トレーニングされたアノテータからの主観的な評価が必要である。そこで我々は,(1)凍結した大言語モデル(LLM)と(2)学習言語モデルに基づく分類器を用いて,対話的共感を自動的に計測する様々なモデリングオプションについて検討した。 GPT-4およびFlanファミリーモデルの性能の低下を反映して、内部および外部の対話データセットの広範な実験により、会話の共感を測定することは、凍結LDMの促進に依然として困難な課題であることが示された。一方,sequence-to-sequence (seq2seq) 言語モデルに基づく提案手法は,先行研究や競合ベースラインと比較して最高の性能を実現することができる。最後に,提案する命令精細分類器の性能に関する包括的アブレーション研究を行い,自動会話共感評価指標として採用する可能性について推奨する。

Empathy is a critical element of effective and satisfactory conversational communication, yet previous studies in measuring conversational empathy mostly focus on expressed communicative intents, that is, the way in which empathy is expressed, ignoring the fact that conversation is also a collaborative practice involving both speakers and listeners. In contrast, we propose a multi-dimensional empathy evaluation framework that extends upon existing work to measure both expressed intents from the speaker's perspective and perceived empathy from the listener's perspective. Applying the proposed framework to analyzing our internal customer-service dialogue shows that the two dimensions (expressed intent types and perceived empathy) are inter-connected, while perceived empathy has high correlation with the satisfaction level of dialogue sessions. This proposed framework still requires subjective assessments from trained annotators, which can be non-trivial to collect. To scale up evaluation without excessive reliance on carefully annotated data, we explore different modeling options to automatically measure conversational empathy with (1) prompting frozen large language models (LLMs) and (2) training language model-based classifiers. Extensive experiments on both internal and external dialogue datasets show that measuring conversational empathy remains a challenging task for prompting frozen LLMs, reflected by less satisfying performance of GPT-4 and Flan family models. On the other hand, our proposed instruction-finetuned classifiers based on sequence-to-sequence (Seq2Seq) language models are able to achieve the best performance compared to prior works and competitive baselines. Finally, we perform comprehensive ablation studies on the performance of proposed instruction-finetuned classifiers and give recommendations on potentially adopting them as automatic conversational empathy evaluation metrics.

翻訳日:2024-02-20 21:36:27 公開日:2024-02-18

# 極端に言うな - 暗黙のヘイトスピーチ検出におけるllmの過度の感度とキャリブレーション制限を明らかにする

Don't Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection ( http://arxiv.org/abs/2402.11406v1 )

ライセンス: Link先を確認

Min Zhang, Jianfeng He, Taoran Ji, Chang-Tien Lu

(参考訳) 大規模言語モデル(LLM)の公平性と信頼性は注目されている。憎しみの意図を伝えるために間接言語を用いる暗黙のヘイトスピーチは、実践のかなりの部分を占める。しかし、LLMがこの問題に効果的に対処する程度については、まだ十分に検証されていない。本稿では,LLMが暗黙のヘイトスピーチ(分類タスク)を検出し,その応答に対する自信を表現する能力について述べる。本評価は,様々なプロンプトパターンと主観的不確実性推定手法を念頭において検討する。 1) LLMは, 公平性問題を引き起こす可能性のあるグループやトピックに対して過度な感受性を示し, ヘイトスピーチとして良心的発言を誤分類する。 (2)各手法に対するllmsの信頼度スコアは固定範囲に集中し、データセットの複雑さにかかわらず変わらない。これにより、キャリブレーション性能は一次分類精度に大きく依存する。これらの発見はLSMの新たな制限を明らかにし、極端に向かないようモデルを最適化する際の注意が必要であることを強調している。これは、モデルフェアネスの追求における感度と信頼性を慎重に考慮するためのリマインダーとして機能する。

The fairness and trustworthiness of Large Language Models (LLMs) are receiving increasing attention. Implicit hate speech, which employs indirect language to convey hateful intentions, occupies a significant portion of practice. However, the extent to which LLMs effectively address this issue remains insufficiently examined. This paper delves into the capability of LLMs to detect implicit hate speech (Classification Task) and express confidence in their responses (Calibration Task). Our evaluation meticulously considers various prompt patterns and mainstream uncertainty estimation methods. Our findings highlight that LLMs exhibit two extremes: (1) LLMs display excessive sensitivity towards groups or topics that may cause fairness issues, resulting in misclassifying benign statements as hate speech. (2) LLMs' confidence scores for each method excessively concentrate on a fixed range, remaining unchanged regardless of the dataset's complexity. Consequently, the calibration performance is heavily reliant on primary classification accuracy. These discoveries unveil new limitations of LLMs, underscoring the need for caution when optimizing models to ensure they do not veer towards extremes. This serves as a reminder to carefully consider sensitivity and confidence in the pursuit of model fairness.

翻訳日:2024-02-20 21:35:57 公開日:2024-02-18

# autoprm: 制御可能な質問分解による多段階推論のための手続き的監督の自動化

AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition ( http://arxiv.org/abs/2402.11452v1 )

ライセンス: Link先を確認

Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj and Huaxiu Yao

(参考訳) 大規模言語モデル(LLM)の最近の進歩は、多段階推論タスクにおいて有望であるが、手続き的フィードバックを提供するための広範な手動ラベリングに依存していることは、依然として大きな障害である。本稿では,複雑な推論課題に対して,llmの微調整をより効率的に行うための,自己教師付きフレームワークautoprmを提案する。具体的には、autoprmは、まず複雑な問題を制御可能な粒度スイッチでより管理可能なサブクエストに分解し、その後順次強化学習を適用してサブクエストソルバを反復的に改善する。さらに,報酬の改ざんを回避するための文脈誘導復号法を提案し,従属問題の解法を導出する。大規模な実験により、AutoPRMはSOTA上の数学的および常識推論タスクの性能を著しく向上することが示された。さらに奨励的に、AutoPRMは他の直交推論パイプラインと簡単に統合できる。

Recent advancements in large language models (LLMs) have shown promise in multi-step reasoning tasks, yet their reliance on extensive manual labeling to provide procedural feedback remains a significant impediment. To address this challenge, in this paper, we propose a novel self-supervised framework AutoPRM that efficiently enhances the fine-tuning of LLMs for intricate reasoning challenges. Specifically, AutoPRM first decomposes complex problems into more manageable subquestions with a controllable granularity switch, then sequentially applies reinforcement learning to iteratively improve the subquestion solver. Additionally, we propose context-guided-decoding to avoid reward tampering and guide the subquestion solver towards the solution of the holistic problem. Extensive experiments show that AutoPRM significantly improves performance on mathematical and commonsense reasoning tasks over SOTA. More encouragingly, AutoPRM can be easily integrated with other orthogonal reasoning pipelines.

翻訳日:2024-02-20 21:26:47 公開日:2024-02-18

# Momentor: 微粒な時間推論によるビデオ大言語モデルの改善

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning ( http://arxiv.org/abs/2402.11435v1 )

ライセンス: Link先を確認

Long Qian, Juncheng Li, Yu Wu, Yaobo Ye, Hao Fei, Tat-Seng Chua, Yueting Zhuang, Siliang Tang

(参考訳) 大規模言語モデル(LLM)は、テキストベースのタスクの理解と処理において顕著な熟練度を示す。これらの属性をビデオLLMと呼ばれるビデオモダリティに転送するために、多くの努力がなされている。しかし、既存のVideo-LLMは粗いセマンティクスのみをキャプチャすることができ、特定のビデオセグメントの理解やローカライゼーションに関連するタスクを効果的に処理できない。これらの課題を踏まえ、細かな時間的理解タスクを実現できるビデオLLMであるMomentorを提案する。 Momentorのトレーニングを支援するために,セグメントレベルの命令データを持つ大規模ビデオ命令データセットであるMoment-10Mを構築するための自動データ生成エンジンを設計する。 moment-10mでmomentorをトレーニングし,セグメントレベルの推論とローカライズを可能にした。いくつかのタスクにおけるゼロショット評価は、モーメントアが微粒な時間的基底の理解と局所化において優れていることを示す。

Large Language Models (LLMs) demonstrate remarkable proficiency in comprehending and handling text-based tasks. Many efforts are being made to transfer these attributes to the video modality; the resulting models are termed Video-LLMs. However, existing Video-LLMs can only capture coarse-grained semantics and are unable to effectively handle tasks related to comprehension or localization of specific video segments. In light of these challenges, we propose Momentor, a Video-LLM capable of accomplishing fine-grained temporal understanding tasks. To support the training of Momentor, we design an automatic data generation engine to construct Moment-10M, a large-scale video instruction dataset with segment-level instruction data. We train Momentor on Moment-10M, enabling it to perform segment-level reasoning and localization. Zero-shot evaluations on several tasks demonstrate that Momentor excels in fine-grained temporally grounded comprehension and localization.

翻訳日:2024-02-20 21:26:30 公開日:2024-02-18

# IoTアプリケーションのための機械学習技術による屋内ローカライゼーションの改善

Improved Indoor Localization with Machine Learning Techniques for IoT applications ( http://arxiv.org/abs/2402.11433v1 )

ライセンス: Link先を確認

M.W.P. Maduranga

(参考訳) IoT(Internet of Things)とモバイルインターネットアプリケーションの台頭は、商業、軍事、社会アプリケーションのための位置情報サービス(LBS)への関心を喚起している。世界測位システム(GPS)が屋外の局地化を支配する一方で、その効果は信号の問題により屋内で弱まる。屋内ローカライゼーションシステムは、Wi-Fi、ZigBee、Bluetooth、UWBなどの無線技術を活用し、コンテキストに基づいて選択する。受信信号強度インジケータ(RSSI)技術は、その精度と単純さで広く採用されている。本研究は,rssiに基づく屋内定位のための教師付き回帰器,教師付き分類器,アンサンブル方式の3段階の機械学習アルゴリズムを用いる。さらに、重み付き最小二乗法と擬似線形解法を導入し、非線形rssi測定方程式に線形方程式を近似することで対処する。さまざまな無線技術とアンカーノードを利用する実験的なテストベッドは、IoTクラウドアーキテクチャを使用したデータ収集用に設計されている。事前処理には、アルゴリズムトレーニング前のデータ精錬のためのフィルタの調査が含まれる。この研究は、線形回帰、多項式回帰、支持ベクトル回帰、ランダム森林回帰、および様々な無線技術における決定木回帰といった機械学習モデルを用いている。これらのモデルは移動対象ノードの地理的座標を推定し、その性能を精度、根平均二乗誤差、精度、リコール、感度、行列式係数、f1-scoreなどの指標を用いて評価する。実験の結果は、屋内環境におけるローカライズ精度とロバスト性の観点から、異なる教師付き機械学習技術の有効性に関する洞察を与える。

The rise of the Internet of Things (IoT) and mobile internet applications has spurred interest in location-based services (LBS) for commercial, military, and social applications. While the global positioning system (GPS) dominates outdoor localization, its efficacy wanes indoors due to signal challenges. Indoor localization systems leverage wireless technologies like Wi-Fi, ZigBee, Bluetooth, and UWB, selected based on context. Received signal strength indicator (RSSI) technology, known for its accuracy and simplicity, is widely adopted. This study employs machine learning algorithms in three phases: supervised regressors, supervised classifiers, and ensemble methods for RSSI-based indoor localization. Additionally, it introduces a weighted least squares technique and pseudo-linear solution approach to address non-linear RSSI measurement equations by approximating them with linear equations. An experimental testbed, utilizing diverse wireless technologies and anchor nodes, is designed for data collection, employing IoT cloud architectures. Pre-processing involves investigating filters for data refinement before algorithm training. The study employs machine learning models like linear regression, polynomial regression, support vector regression, random forest regression, and decision tree regression across various wireless technologies. These models estimate the geographical coordinates of a moving target node, and their performance is evaluated using metrics such as accuracy, root mean square error, precision, recall, sensitivity, coefficient of determination, and the F1-score. The experiment's outcomes provide insights into the effectiveness of different supervised machine learning techniques in terms of localization accuracy and robustness in indoor environments.
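
The idea of replacing non-linear RSSI range equations with linear ones can be sketched as follows. This is an illustrative simplification, not the study's method: it assumes a log-distance path-loss model with hypothetical parameters (`rssi_at_1m`, `path_loss_exponent`), works in 2-D, and solves an ordinary (unweighted) least-squares problem rather than the weighted variant mentioned in the abstract.

```python
import math


def rssi_to_distance(rssi, rssi_at_1m=-40.0, path_loss_exponent=2.0):
    """Invert the log-distance model rssi = rssi_at_1m - 10 * n * log10(d)."""
    return 10 ** ((rssi_at_1m - rssi) / (10.0 * path_loss_exponent))


def locate(anchors, distances):
    """Estimate (x, y) from anchor positions and ranges.

    Subtracting the first anchor's circle equation from the others removes
    the quadratic terms, leaving linear equations a*x + b*y = c that are
    solved in the least-squares sense via the 2x2 normal equations.
    """
    (x1, y1), d1 = anchors[0], distances[0]
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        a = 2.0 * (xi - x1)
        b = 2.0 * (yi - y1)
        c = d1 ** 2 - di ** 2 + xi ** 2 - x1 ** 2 + yi ** 2 - y1 ** 2
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c
    det = s_aa * s_bb - s_ab ** 2
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not identifiable")
    x = (s_bb * s_ac - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return x, y


# Noise-free demo: four anchors at the corners of a 10 m room, target at (3, 4).
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
true_position = (3.0, 4.0)
readings = [-40.0 - 20.0 * math.log10(math.dist(a, true_position)) for a in anchors]
estimate = locate(anchors, [rssi_to_distance(r) for r in readings])
```

With noisy RSSI readings the recovered position will be off. Weighting each linear equation by the reliability of its range estimate is one motivation for a weighted least-squares variant.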

翻訳日:2024-02-20 21:26:14 公開日:2024-02-18

# 偽造検出はより深くできるか? 認識推論のためのデータセット, 評価, ベンチマーク

Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning ( http://arxiv.org/abs/2402.11432v1 )

ライセンス: Link先を確認

Kang Chen, Zheng Lian, Haiyang Sun, Bin Liu, Jianhua Tao

(参考訳) 偽造検出は、多くの実践シナリオにおいてその重要性から注目を集めている。現在、データ不足はこの分野の発展に悪影響を及ぼす。一方、虚偽のシナリオをシミュレートするために参加者を雇うのはコストがかかる。一方,インターネット上での偽装行動を含む動画の収集は困難である。本稿では,データ不足に対処するため,新しいデータ収集パイプラインを提案する。具体的には、GPT-4を用いて被疑者と警察官のロールプレイをシミュレートする。尋問中、容疑者は犯罪の責任を逃れるために警察官に嘘をつき、警察官は真実を知り、証拠を収集する。以前のデータセットと比較して、この戦略はデータ収集コストを削減し、データセットのサイズを増加させる有望な方法を提供する。一方,従来の偽装検出タスクを偽装推論に拡張し,さらに偽装部品のエビデンスを提供する。このデータセットは、現在の大規模言語モデルの複雑な推論能力を評価するためにも使用でき、さらなる研究のための推論ベンチマークとして役立ちます。

Deception detection has attracted increasing attention due to its importance in many practical scenarios. Currently, data scarcity harms the development of this field. On the one hand, it is costly to hire participants to simulate deception scenarios. On the other hand, it is difficult to collect videos containing deceptive behaviors on the Internet. To address data scarcity, this paper proposes a new data collection pipeline. Specifically, we use GPT-4 to simulate a role-play between a suspect and a police officer. During interrogation, the suspect lies to the police officer to evade responsibility for the crime, while the police officer uncovers the truth and gathers evidence. Compared with previous datasets, this strategy reduces data collection costs, providing a promising way to increase the dataset size. Meanwhile, we extend the traditional deception detection task to deception reasoning, further providing evidence for deceptive parts. This dataset can also be used to evaluate the complex reasoning capability of current large language models and serve as a reasoning benchmark for further research.

翻訳日:2024-02-20 21:25:45 公開日:2024-02-18

# 3次元再構成のためのロバストなエラー耐性ビュー選択法

A Robust Error-Resistant View Selection Method for 3D Reconstruction ( http://arxiv.org/abs/2402.11431v1 )

ライセンス: Link先を確認

Shaojie Zhang, Yinghui Wang, Bin Nan, Jinlong Yang, Tao Yan, Liangyi Huang, and Mingfeng Wang

To address the increased triangulation uncertainty caused by selecting views with small camera baselines in Structure from Motion (SFM) view selection, this paper proposes a robust error-resistant view selection method. The method uses a triangulation-based computation to obtain an error-resistant model, which is then used to construct an error-resistant matrix. The sorted entries of each row of the error-resistant matrix determine the candidate view set for each view. By traversing the candidate view sets of all views and completing the missing views based on the error-resistant matrix, the integrity of the 3D reconstruction is ensured. The method is compared experimentally with the exhaustive method, the most accurate one in the COLMAP program, in terms of the average reprojection error and the absolute trajectory error of the reconstruction results. On the TUM and DTU datasets, the proposed method reduces the average reprojection error by 29.40% and the absolute trajectory error by 5.07% on average.
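The link between small baselines and triangulation uncertainty can be illustrated with a short Python sketch. It is not the paper's error-resistant model, whose exact form the abstract does not give. It only uses two textbook quantities: the triangulation angle subtended at a 3D point by two camera centers, and the first-order depth error of a rectified stereo pair, $\delta z \approx z^2 \, \delta d / (f b)$. All function names and numbers are illustrative assumptions.

```python
import math


def triangulation_angle_deg(c1, c2, point):
    """Angle in degrees between the rays from two camera centers to a 3D point."""
    r1 = [p - c for p, c in zip(point, c1)]
    r2 = [p - c for p, c in zip(point, c2)]
    dot = sum(a * b for a, b in zip(r1, r2))
    n1 = math.sqrt(sum(a * a for a in r1))
    n2 = math.sqrt(sum(a * a for a in r2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))


def stereo_depth_error(depth, baseline, focal_px, disparity_err_px=1.0):
    """First-order depth error of a rectified stereo pair (textbook formula)."""
    return depth ** 2 / (focal_px * baseline) * disparity_err_px


# Same point and focal length, two different baselines (assumed values).
narrow = stereo_depth_error(depth=10.0, baseline=0.1, focal_px=1000.0)
wide = stereo_depth_error(depth=10.0, baseline=1.0, focal_px=1000.0)
print(narrow, wide)
```

Under these assumed numbers, shrinking the baseline by a factor of ten inflates the depth error by the same factor, which is the effect a view selection method would want to avoid.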

Translation date: 2024-02-20 21:25:31 Publication date: 2024-02-18

# OptEx: Expediting First-Order Optimization with Approximately Parallelized Iterations ( http://arxiv.org/abs/2402.11427v1 )

License: see the linked page

Yao Shu, Jiongfeng Fang, Ying Tiffany He, Fei Richard Yu

First-order optimization (FOO) algorithms are pivotal in numerous computational domains such as machine learning and signal denoising. However, their application to complex tasks like neural network training often entails significant inefficiencies because convergence requires many sequential iterations. In response, we introduce first-order optimization expedited with approximately parallelized iterations (OptEx), the first framework that enhances the efficiency of FOO by leveraging parallel computing to mitigate its iterative bottleneck. OptEx employs kernelized gradient estimation to make use of gradient history for future gradient prediction, enabling parallelization of iterations, a strategy once considered impractical because of the inherent iterative dependency in FOO. We provide theoretical guarantees for the reliability of our kernelized gradient estimation and the iteration complexity of SGD-based OptEx, confirming that estimation errors diminish to zero as historical gradients accumulate and that SGD-based OptEx enjoys an effective acceleration rate of $\Omega(\sqrt{N})$ over standard SGD given a parallelism of $N$. We also use extensive empirical studies, including synthetic functions, reinforcement learning tasks, and neural network training across various datasets, to underscore the substantial efficiency improvements achieved by OptEx.
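The idea of predicting a future gradient from gradient history can be sketched in a few lines of Python. The abstract does not specify the estimator, so the sketch below substitutes a plain Nadaraya-Watson kernel smoother with an RBF kernel; this is a stand-in chosen for illustration, not the OptEx estimator, and `rbf`, `estimate_gradient`, and `lengthscale` are assumed names.

```python
import math


def rbf(x, y, lengthscale=1.0):
    """RBF kernel between two points given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def estimate_gradient(query, history, lengthscale=1.0):
    """Kernel-weighted average of past gradients (a stand-in estimator).

    `history` is a list of (point, gradient) pairs. Note that the weights
    can underflow to zero if the query is very far from every past point.
    """
    weights = [rbf(query, point, lengthscale) for point, _ in history]
    total = sum(weights)
    dim = len(query)
    return [
        sum(w * grad[i] for w, (_, grad) in zip(weights, history)) / total
        for i in range(dim)
    ]


# Gradient history of f(x) = x^2, whose gradient is 2x, at three points.
history = [([-1.0], [-2.0]), ([0.0], [0.0]), ([1.0], [2.0])]
print(estimate_gradient([0.0], history))
```

With such a proxy, several candidate iterates could be evaluated in parallel instead of waiting for each true gradient in turn. As for the stated rate, $\Omega(\sqrt{N})$ means that, up to constants, a parallelism of $N = 16$ guarantees an acceleration on the order of $\sqrt{16} = 4$.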

Translation date: 2024-02-20 21:25:12 Publication date: 2024-02-18

# Online Local False Discovery Rate Control: A Resource Allocation Approach ( http://arxiv.org/abs/2402.11425v1 )

License: see the linked page

Ruicheng Ao, Hongyu Chen, David Simchi-Levi, Feng Zhu

We consider the problem of online local false discovery rate (FDR) control where multiple tests are conducted sequentially, with the goal of maximizing the total expected number of discoveries. We formulate the problem as an online resource allocation problem with accept/reject decisions, which at a high level can be viewed as an online knapsack problem with the additional uncertainty of random budget replenishment. We start with general arrival distributions and propose a simple policy that achieves $O(\sqrt{T})$ regret. We complement the result by showing that this regret rate cannot be improved in general. We then shift our focus to discrete arrival distributions. We find that many existing re-solving heuristics in the online resource allocation literature, although they achieve bounded loss in canonical settings, may incur $\Omega(\sqrt{T})$ or even $\Omega(T)$ regret. Observing that canonical policies tend to be too optimistic and over-accept arrivals, we propose a novel policy that incorporates budget buffers. We show that small additional logarithmic buffers suffice to reduce the regret from $\Omega(\sqrt{T})$ or even $\Omega(T)$ to $O(\ln^2 T)$. Numerical experiments are conducted to validate our theoretical findings. Our formulation may have wider applications beyond the problem considered in this paper, and our results emphasize how effective policies should be designed to strike a balance between avoiding wrong accepts and reducing wrong rejects in online resource allocation problems with uncertain budgets.
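The role of a budget buffer can be shown with a toy accept/reject rule in Python. This is not the policy analyzed in the paper; it is a minimal sketch under assumptions of my own: each arrival carries a signed cost, negative costs replenish the budget, and a costly arrival is accepted only if the budget left afterwards stays above a buffer that shrinks logarithmically with the remaining horizon. The name `buffered_accept` and the parameter `buffer_scale` are hypothetical.

```python
import math


def buffered_accept(costs, horizon, buffer_scale=1.0):
    """Toy online accept/reject rule with a logarithmic budget buffer."""
    budget = 0.0
    decisions = []
    for t, cost in enumerate(costs):
        remaining = horizon - t  # arrivals left, including the current one
        buffer = buffer_scale * math.log(remaining) if remaining > 1 else 0.0
        if cost <= 0:
            accept = True  # replenishing arrivals never hurt the budget
        else:
            accept = budget - cost >= buffer
        if accept:
            budget -= cost
        decisions.append(accept)
    return decisions, budget


costs = [-1.0, 0.5, -1.0, 0.5]
print(buffered_accept(costs, horizon=4, buffer_scale=0.0))  # greedy rule
print(buffered_accept(costs, horizon=4, buffer_scale=1.0))  # buffered rule
```

In this toy run the greedy rule accepts every arrival, while the buffered rule declines the early costly arrival and keeps more budget in reserve, which mirrors the abstract's observation that canonical policies tend to over-accept.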

Translation date: 2024-02-20 21:24:55 Publication date: 2024-02-18

# Data Distribution Distilled Generative Model for Generalized Zero-Shot Recognition ( http://arxiv.org/abs/2402.11424v1 )

License: see the linked page

Yijie Wang, Mingjian Hong, Luwen Huangfu, Sheng Huang

In the realm of Zero-Shot Learning (ZSL), we address biases in Generalized Zero-Shot Learning (GZSL) models, which favor seen data. To counter this, we introduce an end-to-end generative GZSL framework called D$^3$GZSL. This framework regards seen and synthesized unseen data as in-distribution and out-of-distribution data, respectively, for a more balanced model. D$^3$GZSL comprises two core modules: in-distribution dual space distillation (ID$^2$SD) and out-of-distribution batch distillation (O$^2$DBD). ID$^2$SD aligns teacher-student outcomes in embedding and label spaces, enhancing learning coherence. O$^2$DBD introduces low-dimensional out-of-distribution representations per batch sample, capturing shared structures between seen and unseen categories. Our approach demonstrates its effectiveness across established GZSL benchmarks and integrates seamlessly into mainstream generative frameworks. Extensive experiments consistently show that D$^3$GZSL improves the performance of existing generative GZSL methods, underscoring its potential to refine zero-shot learning practices. The code is available at: https://github.com/PJBQ/D3GZSL.git

Translation date: 2024-02-20 21:24:20 Publication date: 2024-02-18

# Mitigating Catastrophic Forgetting in Multi-domain Chinese Spelling Correction by Multi-stage Knowledge Transfer Framework ( http://arxiv.org/abs/2402.11422v1 )

License: see the linked page

Peng Xing, Yinghui Li, Shirong Ma, Xinnian Liang, Haojing Huang, Yangning Li, Hai-Tao Zheng, Wenhao Jiang, Ying Shen

Chinese Spelling Correction (CSC) aims to detect and correct spelling errors in given sentences. Recently, multi-domain CSC has gradually attracted the attention of researchers because it is more practical. In this paper, we focus on the key flaw of CSC models when adapting to multi-domain scenarios: the tendency to forget previously acquired knowledge upon learning new domain-specific knowledge (i.e., catastrophic forgetting). To address this, we propose a novel model-agnostic Multi-stage Knowledge Transfer (MKT) framework, which uses a continuously evolving teacher model for knowledge transfer in each domain, rather than focusing solely on new domain knowledge. Notably, we are the first to apply continual learning methods to the multi-domain CSC task. Experiments demonstrate the effectiveness of our proposed method, and further analyses show the importance of overcoming catastrophic forgetting for improving model performance.

Translation date: 2024-02-20 21:23:45 Publication date: 2024-02-18

# Rethinking the Roles of Large Language Models in Chinese Grammatical Error Correction ( http://arxiv.org/abs/2402.11420v1 )

License: see the linked page

Yinghui Li, Shang Qin, Jingheng Ye, Shirong Ma, Yangning Li, Libo Qin, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Philip S. Yu

Recently, Large Language Models (LLMs) have been widely studied by researchers for their roles in various downstream NLP tasks. As a fundamental task in the NLP field, Chinese Grammatical Error Correction (CGEC) aims to correct all potential grammatical errors in the input sentences. Previous studies have shown that the performance of LLMs as correctors on CGEC remains unsatisfactory owing to the task's challenging focus. To help the CGEC field better adapt to the era of LLMs, we rethink the roles of LLMs in the CGEC task so that they can be better utilized and explored in CGEC. Considering the rich grammatical knowledge stored in LLMs and their powerful semantic understanding capabilities, we use LLMs as explainers that provide explanation information to small CGEC models during error correction to enhance performance. We also use LLMs as evaluators to obtain more reasonable CGEC evaluations, thus alleviating the problems caused by the subjectivity of the CGEC task. In particular, our work is also an active exploration of how LLMs and small models can better collaborate in downstream tasks. Extensive experiments and detailed analyses on widely used datasets verify the effectiveness of our intuition and the proposed methods.

Translation date: 2024-02-20 21:23:17 Publication date: 2024-02-18

# Capturing many-body correlation effects with quantum and classical computing ( http://arxiv.org/abs/2402.11418v1 )

License: see the linked page

Karol Kowalski, Nicholas P. Bauman, Guang Hao Low, Martin Roetteler, John J. Rehr, Fernando D. Vila

Theoretical descriptions of excited states of molecular systems in high-energy regimes are crucial for supporting and driving many experimental efforts at light source facilities. However, capturing their complicated correlation effects requires formalisms that provide a hierarchical infrastructure of approximations. These approximations lead to an increased overhead in classical computing methods, and therefore, decisions regarding the ranking of approximations and the quality of results must be made on purely numerical grounds. The emergence of quantum computing methods has the potential to change this situation. In this study, we demonstrate the efficiency of the Quantum Phase Estimator (QPE) in identifying core-level states relevant to x-ray photoelectron spectroscopy. We compare and validate the QPE predictions with exact diagonalization and real-time equation-of-motion coupled cluster formulations, which are some of the most accurate methods for states dominated by collective correlation effects.

Translation date: 2024-02-20 21:22:42 Publication date: 2024-02-18

# LoRETTA: Low-Rank Economic Tensor-Train Adaptation for Ultra-Low-Parameter Fine-Tuning of Large Language Models ( http://arxiv.org/abs/2402.11417v1 )

License: see the linked page

Yifan Yang, Jiajun Zhou, Ngai Wong, Zheng Zhang

Various parameter-efficient fine-tuning (PEFT) techniques have been proposed to enable computationally efficient fine-tuning while maintaining model performance. However, existing PEFT methods are still limited by the growing number of trainable parameters with the rapid deployment of Large Language Models (LLMs). To address this challenge, we present LoRETTA, an ultra-parameter-efficient framework that significantly reduces trainable parameters through tensor-train decomposition. Specifically, we propose two methods, named LoRETTA$_{adp}$ and LoRETTA$_{rep}$. The former employs tensorized adapters, offering a high-performance yet lightweight approach for the fine-tuning of LLMs. The latter emphasizes fine-tuning via weight parameterization with a set of small tensor factors. LoRETTA achieves comparable or better performance than most widely used PEFT methods with up to $100\times$ fewer parameters on the LLaMA-2-7B models. Furthermore, empirical results demonstrate that the proposed method effectively improves training efficiency, enjoys better multi-task learning performance, and enhances the anti-overfitting capability. Plug-and-play code built upon the Huggingface framework and PEFT library will be released.
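To see why a tensor-train (TT) parameterization can be so compact, the following Python sketch counts parameters. It relies only on the standard TT format, in which a tensor with mode sizes $n_1, \dots, n_d$ and ranks $r_0 = 1, r_1, \dots, r_d = 1$ is stored as cores of shape $r_{k-1} \times n_k \times r_k$. The matrix size, mode sizes, and ranks below are illustrative assumptions and are not taken from the paper.

```python
def tt_param_count(mode_sizes, ranks):
    """Number of parameters in a tensor-train with the given modes and ranks."""
    if len(ranks) != len(mode_sizes) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("ranks must have len(mode_sizes) + 1 entries and start/end with 1")
    return sum(ranks[k] * n * ranks[k + 1] for k, n in enumerate(mode_sizes))


def lora_param_count(d_in, d_out, rank):
    """Number of parameters in a rank-r LoRA update for a d_out x d_in matrix."""
    return rank * (d_in + d_out)


# Assumed example: a 4096 x 4096 weight update (16**6 entries in total).
dense = 4096 * 4096
lora = lora_param_count(4096, 4096, rank=8)
tt = tt_param_count([16] * 6, [1, 5, 5, 5, 5, 5, 1])
print(dense, lora, tt)
```

Under these assumed shapes the dense update has 16,777,216 entries, the rank-8 LoRA update 65,536, and the TT factorization 1,760, which gives a feel for how a reduction of roughly two orders of magnitude relative to LoRA-style methods can arise.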

Translation date: 2024-02-20 21:21:49 Publication date: 2024-02-18

# Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization ( http://arxiv.org/abs/2402.11414v1 )

License: see the linked page

Liqiang Jing, Jingxuan Zuo, Yue Zhang

Multimodal summarization aims to generate a concise summary based on the input text and image. However, existing methods may produce non-factual output. To evaluate the factuality of multimodal summarization models, we propose two fine-grained and explainable evaluation frameworks (FALLACIOUS) for different application scenarios, namely a reference-based factuality evaluation framework and a reference-free factuality evaluation framework. Notably, the reference-free framework does not need ground truth and hence has a wider range of application scenarios. To evaluate the effectiveness of the proposed frameworks, we compute the correlation between our frameworks and other metrics. The experimental results show the effectiveness of our proposed method. We will release our code and dataset via GitHub.

Translation date: 2024-02-20 21:21:28 Publication date: 2024-02-18

# A Multispectral Automated Transfer Technique (MATT) for machine-driven image labeling utilizing the Segment Anything Model (SAM) ( http://arxiv.org/abs/2402.11413v1 )

License: see the linked page

James E. Gallagher, Aryav Gogia, Edward J. Oughton

The Segment Anything Model (SAM) is drastically improving the speed and accuracy of automatically segmenting and labeling large Red-Green-Blue (RGB) imagery datasets. However, SAM is unable to segment and label images outside of the visible light spectrum, for example, multispectral or hyperspectral imagery. Therefore, this paper outlines a method we call the Multispectral Automated Transfer Technique (MATT). By transposing SAM segmentation masks from RGB images, we can automatically segment and label multispectral imagery with high precision and efficiency. For example, the results demonstrate that segmenting and labeling a 2,400-image dataset with MATT achieves a time reduction of 87.8% in developing a trained model, reducing roughly 20 hours of manual labeling to only 2.4 hours. This efficiency gain is associated with only a 6.7% decrease in overall mean average precision (mAP) when training multispectral models via MATT, compared to a manually labeled dataset. We consider this an acceptable level of precision loss when considering the time saved during training, especially for rapidly prototyping experimental modeling methods. This research greatly contributes to the study of multispectral object detection by providing a novel and open-source method to rapidly segment, label, and train multispectral object detection models with minimal human interaction. Future research needs to focus on applying these methods to (i) space-based multispectral, and (ii) drone-based hyperspectral imagery.
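The core idea of reusing an RGB mask as a label for a multispectral frame can be sketched in Python. The sketch assumes that the RGB and multispectral images are pixel-aligned and of equal size, and that labels are stored as normalized center/width/height boxes; both are assumptions for illustration, and the function names are hypothetical rather than part of MATT or SAM.

```python
def mask_to_bbox(mask):
    """Tight bounding box (x0, y0, x1, y1) of a binary mask, or None if empty."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return min(cols), min(rows), max(cols), max(rows)


def bbox_to_normalized(bbox, width, height):
    """Convert a pixel box to normalized (center_x, center_y, w, h)."""
    x0, y0, x1, y1 = bbox
    box_w = x1 - x0 + 1
    box_h = y1 - y0 + 1
    return ((x0 + box_w / 2) / width, (y0 + box_h / 2) / height,
            box_w / width, box_h / height)


def time_reduction(manual_hours, automated_hours):
    """Fraction of labeling time saved."""
    return (manual_hours - automated_hours) / manual_hours


# A 4 x 4 mask, as if produced on the RGB frame (toy data).
rgb_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
label = bbox_to_normalized(mask_to_bbox(rgb_mask), width=4, height=4)
print(label)  # the same label would be attached to the aligned multispectral frame
print(time_reduction(20.0, 2.4))
```

With the rounded figures from the abstract (roughly 20 hours down to 2.4 hours), the computed saving is about 88%, consistent with the reported 87.8%.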

Translation date: 2024-02-20 21:21:00 Publication date: 2024-02-18

# Optimal Quantum State Tomography via Weak Value ( http://arxiv.org/abs/2402.11484v1 )

License: see the linked page

Xuanmin Zhua, Dezheng Zhanga, Runping Gao, Qun wei, Lixia Liu, and Zijiang Luo

To improve the efficiency of the state tomography strategy via weak value, we search for the optimal coupling strength between the system and the measuring device. For an arbitrary d-dimensional quantum system, we obtain the optimal strengths for measuring the real and imaginary parts of the density matrix. The optimal efficiency of the state tomography is also studied in terms of the mean square error, and the minimal mean square errors in the reconstructed density matrices are derived. The state tomography strategy studied in this article may be useful for measuring unknown quantum states.
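For readers unfamiliar with the weak value that this tomography strategy builds on, here is a minimal Python sketch of its standard definition, $A_w = \langle \phi | A | \psi \rangle / \langle \phi | \psi \rangle$, for a pre-selected state $|\psi\rangle$ and a post-selected state $|\phi\rangle$. It does not reproduce the paper's optimization of the coupling strength; the helper names and the qubit example are assumptions chosen for illustration.

```python
import math


def inner(bra, ket):
    """Inner product <bra|ket> for state vectors given as lists."""
    return sum(b.conjugate() * k for b, k in zip(bra, ket))


def apply(op, ket):
    """Apply a square matrix (list of rows) to a state vector."""
    return [sum(op[i][j] * ket[j] for j in range(len(ket))) for i in range(len(op))]


def weak_value(op, pre, post):
    """Weak value of `op` for pre-selected `pre` and post-selected `post`."""
    overlap = inner(post, pre)
    if abs(overlap) < 1e-12:
        raise ValueError("pre- and post-selected states are orthogonal")
    return inner(post, apply(op, pre)) / overlap


# Qubit example: sigma_x, pre-selection (|0> + i|1>)/sqrt(2), post-selection |0>.
s = 1 / math.sqrt(2)
sigma_x = [[0, 1], [1, 0]]
print(weak_value(sigma_x, [s, 1j * s], [1, 0]))
```

In this example the weak value is purely imaginary, which illustrates why schemes of this kind need separate readouts for the real and imaginary parts.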

Translation date: 2024-02-20 21:13:27 Publication date: 2024-02-18

# Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge ( http://arxiv.org/abs/2402.11459v1 )

License: see the linked page

Yufei Huang, Odin Zhang, Lirong Wu, Cheng Tan, Haitao Lin, Zhangyang Gao, Siyuan Li and Stan Z. Li

Accurate prediction of protein-ligand binding structures, a task known as molecular docking, is crucial for drug design but remains challenging. While deep learning has shown promise, existing methods often depend on holo-protein structures (docked, and not accessible in realistic tasks) or neglect pocket sidechain conformations, leading to limited practical utility and unrealistic conformation predictions. To fill these gaps, we introduce an under-explored task, named flexible docking, to predict poses of ligand and pocket sidechains simultaneously, and introduce Re-Dock, a novel diffusion bridge generative model extended to geometric manifolds. Specifically, we propose an energy-to-geometry mapping inspired by the Newton-Euler equation to co-model the binding energy and conformations, reflecting the energy-constrained docking generative process. Comprehensive experiments on designed benchmark datasets including apo-dock and cross-dock demonstrate our model's superior effectiveness and efficiency over current methods.

Translation date: 2024-02-20 21:13:19 Publication date: 2024-02-18

# Key Patch Proposer: Key Patches Contain Rich Information ( http://arxiv.org/abs/2402.11458v1 )

License: see the linked page

Jing Xu, Beiwen Tian, Hao Zhao

In this paper, we introduce a novel algorithm named Key Patch Proposer (KPP) designed to select key patches in an image without additional training. Our experiments showcase KPP's robust capacity to capture semantic information in both reconstruction and classification tasks. The efficacy of KPP suggests its potential application in active learning for semantic segmentation. Our source code is publicly available at https://github.com/CA-TT-AC/key-patch-proposer.

Translation date: 2024-02-20 21:13:02 Publication date: 2024-02-18

# When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation ( http://arxiv.org/abs/2402.11457v1 )

License: see the linked page

Shiyu Ni, Keping Bi, Jiafeng Guo, Xueqi Cheng

Large Language Models (LLMs) have been found to have difficulty knowing that they do not possess certain knowledge, and they tend to provide specious answers in such cases. Retrieval Augmentation (RA) has been extensively studied to mitigate LLMs' hallucinations. However, due to the extra overhead and the unassured quality of retrieval, it may not be optimal to conduct RA all the time. A straightforward idea is to conduct retrieval only when LLMs are uncertain about a question. This motivates us to enhance LLMs' ability to perceive their knowledge boundaries in order to help RA. In this paper, we first quantitatively measure this ability of LLMs and confirm their overconfidence. Then, we study how LLMs' certainty about a question correlates with their dependence on externally retrieved information. We propose several methods to enhance LLMs' perception of knowledge boundaries and show that they are effective in reducing overconfidence. Additionally, equipped with these methods, LLMs can achieve comparable or even better RA performance with far fewer retrieval calls.
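The "retrieve only when uncertain" idea can be written down as a small gating function in Python. The sketch below is a generic illustration, not the paper's method: `llm` and `retriever` are hypothetical callables, the confidence score is assumed to come from the model in some way the abstract does not specify, and the threshold of 0.7 is arbitrary.

```python
def answer_with_adaptive_retrieval(question, llm, retriever, threshold=0.7):
    """Answer a question, calling the retriever only when the model is unsure.

    `llm(question, context)` is a hypothetical callable returning
    (answer, confidence in [0, 1]); `retriever(question)` returns context text.
    Returns (answer, retrieval_used).
    """
    answer, confidence = llm(question, None)
    if confidence >= threshold:
        return answer, False
    context = retriever(question)
    answer, _ = llm(question, context)
    return answer, True


# Toy stand-ins so the sketch runs end to end; they are not real models.
def toy_llm(question, context):
    if context is not None:
        return f"answer grounded in: {context}", 0.9
    return "guess", (0.9 if "capital" in question else 0.2)


def toy_retriever(question):
    return f"documents about '{question}'"


print(answer_with_adaptive_retrieval("capital of France?", toy_llm, toy_retriever))
print(answer_with_adaptive_retrieval("obscure fact?", toy_llm, toy_retriever))
```

The gate is only as good as the confidence signal: an overconfident model skips retrieval exactly when it is needed, which is why the abstract ties reducing overconfidence to better retrieval augmentation.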

Translation date: 2024-02-20 21:12:54 Publication date: 2024-02-18

# FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence ( http://arxiv.org/abs/2402.11456v1 )

License: see the linked page

Sebastian Antony Joseph, Lily Chen, Jan Trienes, Hannah Louisa Göke, Monika Coers, Wei Xu, Byron C Wallace, Junyi Jessy Li

Plain language summarization with LLMs can be useful for improving textual accessibility of technical content. But how factual are these summaries in a high-stakes domain like medicine? This paper presents FactPICO, a factuality benchmark for plain language summarization of medical texts describing randomized controlled trials (RCTs), which are the basis of evidence-based medicine and can directly inform patient treatment. FactPICO consists of 345 plain language summaries of RCT abstracts generated from three LLMs (i.e., GPT-4, Llama-2, and Alpaca), with fine-grained evaluation and natural language rationales from experts. We assess the factuality of critical elements of RCTs in those summaries: Populations, Interventions, Comparators, Outcomes (PICO), as well as the reported findings concerning these. We also evaluate the correctness of the extra information (e.g., explanations) added by LLMs. Using FactPICO, we benchmark a range of existing factuality metrics, including newly devised ones based on LLMs. We find that plain language summarization of medical evidence is still challenging, especially when balancing simplicity and factuality, and that existing metrics correlate poorly with expert judgments on the instance level.

Translation date: 2024-02-20 21:12:38 Publication date: 2024-02-18

# LoRA-Flow: Dynamic LoRA Fusion for Large Language Models in Generative Tasks ( http://arxiv.org/abs/2402.11455v1 )

License: see the linked page

Hanqing Wang, Bowen Ping, Shuo Wang, Xu Han, Yun Chen, Zhiyuan Liu, Maosong Sun

LoRA employs lightweight modules to customize large language models (LLMs) for each downstream task or domain, where different learned additional modules represent diverse skills. Combining existing LoRAs to address new tasks can enhance the reusability of learned LoRAs, which is particularly beneficial for tasks with limited annotated data. Most prior works on LoRA combination primarily rely on task-level weights for each involved LoRA, making different examples and tokens share the same LoRA weights. However, in generative tasks, different tokens may require diverse skills. Taking the Chinese math task as an example, understanding the problem description may depend more on the Chinese LoRA, while the calculation part may rely more on the math LoRA. To this end, we propose LoRA-Flow, which uses dynamic weights to adjust the impact of different LoRAs. The weights at each step are determined by a fusion gate with extremely few parameters, which can be learned with only 200 training examples. Experiments across six generative tasks demonstrate that our method consistently outperforms baselines with task-level fusion weights. This underscores the necessity of introducing dynamic fusion weights for LoRA combination.
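The difference between task-level and per-token fusion weights can be illustrated with a small Python sketch. The abstract does not give the form of the fusion gate, so the sketch assumes a linear layer followed by a softmax over the available LoRAs; this choice, the function names, and the toy vectors are assumptions, not the LoRA-Flow implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_lora_outputs(hidden, gate_weights, lora_outputs):
    """Mix the outputs of several LoRA modules with token-dependent weights.

    `hidden` is the current token's hidden state, `gate_weights` has one row
    per LoRA (an assumed linear gate), and `lora_outputs` holds each LoRA's
    output vector for this token.
    """
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in gate_weights]
    weights = softmax(scores)
    dim = len(lora_outputs[0])
    fused = [sum(wt * out[i] for wt, out in zip(weights, lora_outputs))
             for i in range(dim)]
    return weights, fused


# Two LoRAs (say, a language skill and a math skill) and two different tokens.
gate = [[10.0, 0.0], [0.0, 10.0]]
outputs = [[1.0, 0.0], [0.0, 1.0]]
print(fuse_lora_outputs([1.0, 0.0], gate, outputs))  # leans on the first LoRA
print(fuse_lora_outputs([0.0, 1.0], gate, outputs))  # leans on the second LoRA
```

Because the weights are recomputed from each token's hidden state, one token can lean on one LoRA and the next token on another, whereas task-level weights would fix a single mixture for the whole sequence. A gate of this shape has only as many parameters as the number of LoRAs times the hidden size, in line with the abstract's remark that very few parameters are needed.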

Translation date: 2024-02-20 21:12:16 Publication date: 2024-02-18

# MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization ( http://arxiv.org/abs/2402.11453v1 )

License: see the linked page

Zhiyu Yang, Zihan Zhou, Shuo Wang, Xin Cong, Xu Han, Yukun Yan, Zhenghao Liu, Zhixing Tan, Pengyuan Liu, Dong Yu, Zhiyuan Liu, Xiaodong Shi, Maosong Sun

Scientific data visualization plays a crucial role in research by enabling the direct display of complex information and assisting researchers in identifying implicit patterns. Despite its importance, the use of Large Language Models (LLMs) for scientific data visualization remains largely unexplored. In this study, we introduce MatPlotAgent, an efficient model-agnostic LLM agent framework designed to automate scientific data visualization tasks. Leveraging the capabilities of both code LLMs and multi-modal LLMs, MatPlotAgent consists of three core modules: query understanding, code generation with iterative debugging, and a visual feedback mechanism for error correction. To address the lack of benchmarks in this field, we present MatPlotBench, a high-quality benchmark consisting of 100 human-verified test cases. Additionally, we introduce a scoring approach that uses GPT-4V for automatic evaluation. Experimental results demonstrate that MatPlotAgent can improve the performance of various LLMs, including both commercial and open-source models. Furthermore, the proposed evaluation method shows a strong correlation with human-annotated scores.

Translation date: 2024-02-20 21:11:55 Publication date: 2024-02-18

# SciAgent: Tool-augmented Language Models for Scientific Reasoning ( http://arxiv.org/abs/2402.11451v1 )

License: see the linked page

Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao and Aixin Sun

Scientific reasoning poses a formidable challenge for even the most advanced Large Language Models (LLMs). To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning. This setting supplements LLMs with scalable toolsets, and shifts the focus from pursuing an omniscient problem solver to a proficient tool-user. To facilitate research on this setting, we construct a tool-augmented training corpus named MathFunc, which encompasses over 30,000 samples and roughly 6,000 tools. Building on MathFunc, we develop SciAgent to retrieve, understand and, if necessary, use tools for scientific problem solving. Additionally, we craft a benchmark, SciToolBench, spanning five scientific domains to evaluate LLMs' abilities with tool assistance. Extensive experiments on SciToolBench confirm the effectiveness of SciAgent. Notably, SciAgent-Mistral-7B surpasses other LLMs of the same size by more than 13% in absolute accuracy. Furthermore, SciAgent-DeepMath-7B substantially outperforms ChatGPT.

翻訳日:2024-02-20 21:11:38 公開日:2024-02-18

# ラベル分布による文脈内サンプル順序付け

In-Context Example Ordering Guided by Label Distributions ( http://arxiv.org/abs/2402.11447v1 )

ライセンス: Link先を確認

Zhichao Xu, Daniel Cohen, Bei Wang, Vivek Srikumar

(参考訳) タスク固有のトレーニングなしでモデルを予測できるようにすることで、事前訓練されたLLMを用いたインコンテキスト学習(ICL)は、NLPにおいて大きな可能性を秘めている。しかし、ICLには多くの問題が残っている。特に、そのパフォーマンスは、コンテキスト内例の選択と順序に敏感である。同じコンテキスト内例の集合でも順序が異なると、モデルの性能は、ほぼランダムからほぼ最先端まで変動しうる。本研究では,コンテキスト内例の順序付けを最適化問題として定式化する。タスクについて何が既知であるかという仮定が異なる3つの問題設定について検討する。ラベル比率から学習するという考えに触発されて,モデルの確率予測に導かれる文脈内サンプル順序付けの2つの原則を提案する。提案する原則を13のテキスト分類データセットと,700Mから13Bパラメータを持つ9種類の自己回帰LLMに適用した。提案手法は, 分類精度の向上, モデルの誤校正の低減, より良い文脈内事例の選択により, ベースラインよりも優れていることを示す。

By allowing models to predict without task-specific training, in-context learning (ICL) with pretrained LLMs has enormous potential in NLP. However, a number of problems persist in ICL. In particular, its performance is sensitive to the choice and order of in-context examples. Given the same set of in-context examples with different orderings, model performance may vary from near random to near state-of-the-art. In this work, we formulate in-context example ordering as an optimization problem. We examine three problem settings that differ in the assumptions they make about what is known about the task. Inspired by the idea of learning from label proportions, we propose two principles for in-context example ordering guided by the model's probability predictions. We apply our proposed principles to thirteen text classification datasets and nine different autoregressive LLMs with 700M to 13B parameters. We demonstrate that our approach outperforms the baselines by improving the classification accuracy, reducing model miscalibration, and also by selecting better in-context examples.
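As a rough illustration of ordering in-context examples by the model's probability predictions, the sketch below scores each permutation by how far the average predicted label distribution on unlabeled inputs is from an assumed label prior, and keeps the closest one. This is not the paper's two principles, which the abstract does not spell out; `predict_proba`, `toy_predict_proba`, and the uniform prior are hypothetical stand-ins for an LLM call and a task assumption.

```python
import itertools
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mean_distribution(rows):
    """Average a list of probability vectors component-wise."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pick_ordering(examples, unlabeled, predict_proba, prior):
    """Return the permutation whose mean predicted label distribution
    over the unlabeled inputs is closest (in KL) to the assumed prior."""
    best, best_score = None, float("inf")
    for perm in itertools.permutations(examples):
        preds = [predict_proba(perm, x) for x in unlabeled]
        score = kl_divergence(mean_distribution(preds), prior)
        if score < best_score:
            best, best_score = perm, score
    return list(best), best_score

def toy_predict_proba(ordered_examples, x):
    """Hypothetical model with recency bias: it over-predicts the first
    label whenever the last in-context example is labeled 'pos'."""
    return [0.9, 0.1] if ordered_examples[-1][1] == "pos" else [0.5, 0.5]
```

Exhaustive enumeration is only feasible for a handful of examples; with more examples one would sample permutations instead.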

翻訳日:2024-02-20 21:11:22 公開日:2024-02-18

# 米国における条件付き自動走行車の受容状況

Gauging Public Acceptance of Conditionally Automated Cars in the United States ( http://arxiv.org/abs/2402.11444v1 )

ライセンス: Link先を確認

Antonios Saravanos (1) ((1) New York University)

(参考訳) 本研究では、スマートシティの要素である条件付き自動走行車(SAEレベル3)を取り上げ、米国における社会的受容に影響を与える要因を調査する。 UTAUT2モデルを適応させて適用した。実験的アプローチをとり,米国の358名の参加者に,L3技術を概説したビネット(短い説明文)と,条件付き自動走行車に対する認識を捉える一連の質問を提示した。収集データの解析にはPLS-SEMを用いた。その結果, 技術の受容は,重要度の高い順に,社会的影響, パフォーマンス期待, ヘドニックモチベーション, 促進条件, 努力期待によって決定された。さらに、ヘドニックモチベーション、社会的影響、促進条件、努力期待はいずれも、技術がいかに有用であるかの認識に正の影響を与え、促進条件、ヘドニックモチベーション、社会的影響はいずれも努力期待に正の影響を与え、社会的影響と促進条件はヘドニックモチベーションに正の影響を与え、社会的影響は促進条件に正の影響を与える。性別による調整効果がみられ, ヘドニックモチベーションが採用意図に与える影響は男性でより顕著であった。

In this work we look at an element of smart cities, conditionally automated cars (SAE Level 3), investigating the factors influencing public acceptance in the United States. We apply an adaptation of the UTAUT2 model. In an experimental study, 358 participants in the US were presented with a vignette outlining the L3 technology followed by a series of questions to capture their perceptions of conditionally automated cars. PLS-SEM was used to analyze the collected data. The results reveal that the acceptance of the technology, in order of decreasing importance, was determined by social influence, performance expectancy, hedonic motivation, facilitating conditions, and effort expectancy. Furthermore, hedonic motivation, social influence, facilitating conditions and effort expectancy all have a positive influence on the perception of how useful the technology is; facilitating conditions, hedonic motivation, and social influence all have a positive influence on effort expectancy; social influence and facilitating conditions positively influence hedonic motivation; and social influence positively influences facilitating conditions. A moderating effect for gender was found, with the effect of hedonic motivation on intention to adopt being more prominent for men.

翻訳日:2024-02-20 21:11:07 公開日:2024-02-18

# ベンチマーク自己進化:動的LLM評価のためのマルチエージェントフレームワーク

Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation ( http://arxiv.org/abs/2402.11443v1 )

ライセンス: Link先を確認

Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei, Xuanjing Huang

(参考訳) 本稿では,高速に進行する大規模言語モデル(LLM)を動的に評価するためのベンチマーク自己進化フレームワークを提案する。マルチエージェントシステムを使用して、元のインスタンスのコンテキストや質問を操作し、既存のベンチマークを動的に拡張する信頼性の高い新しいインスタンスをフレーミングする。よりスケーラブルでロバストできめ細かい評価を行うため、様々なクエリやデータノイズに対してLLMをテストする進化するインスタンスを構築するために、6つのリフレーミング操作を実装し、問題解決するサブアビリティを探索します。このフレームワークでは、4つのタスクのベンチマークデータセットを拡張する。実験結果から, LLMの当初の結果に対する性能低下が認められた。スケーラブルで堅牢な評価の下でのこの低下は、より正確にモデルの能力を反映する、きめ細かい評価と並んでいます。さらに、当社のフレームワークは、異なるモデルとさまざまなタスクにおける同一モデル間のパフォーマンスの相違を拡大し、特定のタスクに対するより情報のあるモデル選択を容易にします(コードとデータはhttps://github.com/NanshineLoong/Self-Evolving-Benchmarkで利用可能です)。

This paper presents a benchmark self-evolving framework to dynamically evaluate rapidly advancing Large Language Models (LLMs), aiming for a more accurate assessment of their capabilities and limitations. We utilize a multi-agent system to manipulate the context or question of original instances, reframing new evolving instances with high confidence that dynamically extend existing benchmarks. Towards a more scalable, robust and fine-grained evaluation, we implement six reframing operations to construct evolving instances testing LLMs against diverse queries, data noise and probing their problem-solving sub-abilities. With this framework, we extend benchmark datasets of four tasks. Experimental results show a general performance decline in most LLMs against their original results. This decline under our scalable and robust evaluations, alongside our fine-grained evaluation, more accurately reflects models' capabilities. Besides, our framework widens performance discrepancies both between different models and within the same model across various tasks, facilitating more informed model selection for specific tasks (Code and data are available at https://github.com/NanshineLoong/Self-Evolving-Benchmark).

翻訳日:2024-02-20 21:09:59 公開日:2024-02-18

# LLMはルールに基づいて推論できるか? LLMのストレステストと改善のための論理スキャフォールディング

Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs ( http://arxiv.org/abs/2402.11442v1 )

ライセンス: Link先を確認

Siyuan Wang, Zhongyu Wei, Yejin Choi, Xiang Ren

(参考訳) 大規模言語モデル(LLM)は様々な推論タスクで印象的な人間並みのパフォーマンスを達成している。しかし、その根底にある推論規則の習熟度は、いまだ人間の能力に及ばない。そこで本研究では,5つの領域にまたがるプリミティブルールとコンポジションルールからなる推論ルールベースULogicを構築するための,論理スキャフォールディングによる推論ルール生成フレームワークを提案する。ルールサブセット上でのGPT系列モデルの解析により,LLMの論理理解には,特に特定のバイアスパターンを持つ構成的・構造的に複雑な規則において,人間の性能と比較して大きなギャップがあることが明らかになった。さらにこれらのルールを,柔軟なルール生成と下流推論の強化のために,より小規模な推論エンジンに蒸留する。複数の評価者による評価を通じて,提案する推論エンジンが正確,複雑かつ抽象的な結論と前提の生成に有効であり,各種常識推論タスクを改善することを示す。全体として、我々の研究は、推論ルールの把握における LLM の限界に光を当て、論理的推論能力を向上する方法を示している(コードとデータは https://github.com/SiyuanWangw/ULogic で利用可能)。

Large language models (LLMs) have achieved impressive human-like performance across various reasoning tasks. However, their mastery of underlying inferential rules still falls short of human capabilities. To investigate this, we propose a logic scaffolding inferential rule generation framework, to construct an inferential rule base, ULogic, comprising both primitive and compositional rules across five domains. Our analysis of GPT-series models over a rule subset reveals significant gaps in LLMs' logic understanding compared to human performance, especially in compositional and structurally complex rules with certain bias patterns. We further distill these rules into a smaller-scale inference engine for flexible rule generation and enhancing downstream reasoning. Through a multi-judger evaluation, our inference engine proves effective in generating accurate, complex and abstract conclusions and premises, and improves various commonsense reasoning tasks. Overall, our work sheds light on LLMs' limitations in grasping inferential rules and suggests ways to enhance their logical reasoning abilities (code and data are available at https://github.com/SiyuanWangw/ULogic).

翻訳日:2024-02-20 21:09:34 公開日:2024-02-18

# InfuserKI:Infuser-Guided Knowledge Integrationによる知識グラフによる大規模言語モデルの強化

InfuserKI: Enhancing Large Language Models with Knowledge Graphs via Infuser-Guided Knowledge Integration ( http://arxiv.org/abs/2402.11441v1 )

ライセンス: Link先を確認

Fali Wang, Runxue Bao, Suhang Wang, Wenchao Yu, Yanchi Liu, Wei Cheng, Haifeng Chen

(参考訳) 大規模言語モデル(LLM)は、様々な領域にまたがる顕著なオープンジェネレーション能力を示しているが、知識集約的なタスクには苦労している。この問題を軽減するため、外部モジュールを用いてドメイン固有の知識グラフでLLMを強化する知識統合手法が提案されている。しかし、これらの手法は微調整に既知の知識と未知の知識の両方を必要とするため、データ効率が低い。そこで本研究では,既知の知識との不要な重複なしに,未知の知識をLLMに効率的に統合するという新たな課題について検討する。新しい知識を注入すると、以前に獲得した知識を忘れるリスクが生じる。そこで本研究では,トランスフォーマーの内部状態を利用して,元のLLM出力を追加情報で強化するかどうかを判断し,知識の忘却を効果的に緩和する新しいInfuser-Guided Knowledge Integration(InfuserKI)フレームワークを提案する。 UMLS-2.5k と MetaQA ドメイン知識グラフの評価は、InfuserKI が新しい知識を効果的に獲得し、知識の忘却の低減において最先端のベースラインをそれぞれ 9% と 6% 上回ることを示している。

Though Large Language Models (LLMs) have shown remarkable open-generation capabilities across diverse domains, they struggle with knowledge-intensive tasks. To alleviate this issue, knowledge integration methods have been proposed to enhance LLMs with domain-specific knowledge graphs using external modules. However, they suffer from data inefficiency as they require both known and unknown knowledge for fine-tuning. Thus, we study a novel problem of integrating unknown knowledge into LLMs efficiently without unnecessary overlap of known knowledge. Injecting new knowledge poses the risk of forgetting previously acquired knowledge. To tackle this, we propose a novel Infuser-Guided Knowledge Integration (InfuserKI) framework that utilizes transformer internal states to determine whether to enhance the original LLM output with additional information, thereby effectively mitigating knowledge forgetting. Evaluations on the UMLS-2.5k and MetaQA domain knowledge graphs demonstrate that InfuserKI can effectively acquire new knowledge and outperform state-of-the-art baselines by 9% and 6%, respectively, in reducing knowledge forgetting.

翻訳日:2024-02-20 21:09:11 公開日:2024-02-18

# 自己フィードバックの危険性: 大規模言語モデルにおける自己バイアスの増幅

Perils of Self-Feedback: Self-Bias Amplifies in Large Language Models ( http://arxiv.org/abs/2402.11436v1 )

ライセンス: Link先を確認

Wenda Xu, Guanglei Zhu, Xuandong Zhao, Liangming Pan, Lei Li, William Yang Wang

(参考訳) 最近の研究によると、自己フィードバックは特定のタスクにおいて大規模言語モデル(LLM)を改善する一方で、他のタスクでは悪化させる。このような食い違いは、LLMが自身の出力に偏っていることに起因することがわかった。本稿では, LLMの自己バイアス(自身の生成物を好む傾向)を2つの統計量を用いて正式に定義する。我々は、翻訳、制約付きテキスト生成、数学的推論のタスクで6つのLLMを解析する。自己バイアスは、複数の言語やタスクにまたがって、調査した全てのLLMに広く見られる。分析の結果,自己改良(self-refine)パイプラインはモデル出力の流暢さと理解しやすさを向上するが,自己バイアスをさらに増幅することがわかった。このようなバイアスを軽減するために,より大きなモデルサイズと正確な評価を伴う外部フィードバックが,自己改良パイプラインのバイアスを著しく低減し,下流タスクの実際のパフォーマンス向上につながることを見出した。

Recent studies show that self-feedback improves large language models (LLMs) on certain tasks while worsening them on others. We discovered that this discrepancy is due to LLMs' bias towards their own output. In this paper, we formally define LLM's self-bias -- the tendency to favor its own generation -- using two statistics. We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks.
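The abstract does not name the two statistics. As an assumption for illustration only, the sketch below quantifies self-bias with (a) the mean gap between a model's self-assigned quality scores and reference scores, and (b) the sample distance skewness of those gaps about zero, a standard asymmetry measure. The function names and example scores are hypothetical.

```python
def score_gaps(self_scores, ref_scores):
    """Per-sample gap: self-assigned score minus reference score."""
    return [s - r for s, r in zip(self_scores, ref_scores)]

def mean_bias(gaps):
    """Positive values mean the model rates its own output too highly."""
    return sum(gaps) / len(gaps)

def distance_skewness(gaps):
    """Sample distance skewness about 0:
    1 - sum|x_i - x_j| / sum|x_i + x_j|.
    It is 0 for a sample symmetric about 0 and approaches 1
    as the gaps concentrate on one side."""
    num = sum(abs(a - b) for a in gaps for b in gaps)
    den = sum(abs(a + b) for a in gaps for b in gaps)
    return 0.0 if den == 0 else 1.0 - num / den
```

Tracking both numbers across refinement rounds would show whether the gap grows as the model keeps revising its own output.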

翻訳日:2024-02-20 21:08:49 公開日:2024-02-18

# 大規模言語モデルのための知識境界のベンチマーク:モデル評価の異なる視点

Benchmarking Knowledge Boundary for Large Language Model: A Different Perspective on Model Evaluation ( http://arxiv.org/abs/2402.11493v1 )

ライセンス: Link先を確認

Xunjian Yin and Xu Zhang and Jie Ruan and Xiaojun Wan

(参考訳) 近年,大規模言語モデルの開発は大きく進歩し,多種多様なタスクにおいて顕著な性能を達成している。言語モデルの知識能力を評価するため,従来の研究では,質問応答ペアに基づくベンチマークが多数提案されている。我々は,言語モデルがプロンプトに敏感であるため,固定された質問や限られた言い換えをクエリとして言語モデルを評価することは,信頼性も包括性も十分でないと主張する。そこで本研究では,言語モデル内のプロンプト非依存の知識とプロンプト依存の知識の両方を包含する,知識境界という新しい概念を導入する。知識境界は言語モデル評価におけるプロンプト感度を回避し、評価の信頼性と堅牢性を高める。与えられたモデルの知識境界を探索するために,各知識に対して最適なプロンプトを識別する新しいアルゴリズムである,意味制約付き射影勾配降下法を提案する。実験により,既存の手法と比較して,知識境界の計算において本アルゴリズムが優れた性能を持つことを示す。さらに,知識境界を用いて,複数の領域における複数の言語モデルの能力を評価する。

In recent years, substantial advancements have been made in the development of large language models, achieving remarkable performance across diverse tasks. To evaluate the knowledge ability of language models, previous studies have proposed lots of benchmarks based on question-answering pairs. We argue that it is not reliable and comprehensive to evaluate language models with a fixed question or limited paraphrases as the query, since language models are sensitive to prompt. Therefore, we introduce a novel concept named knowledge boundary to encompass both prompt-agnostic and prompt-sensitive knowledge within language models. Knowledge boundary avoids prompt sensitivity in language model evaluations, rendering them more dependable and robust. To explore the knowledge boundary for a given model, we propose projected gradient descent method with semantic constraints, a new algorithm designed to identify the optimal prompt for each piece of knowledge. Experiments demonstrate a superior performance of our algorithm in computing the knowledge boundary compared to existing methods. Furthermore, we evaluate the ability of multiple language models in several domains with knowledge boundary.
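To make the phrase "projected gradient descent with semantic constraints" concrete, here is a generic projected gradient descent sketch. It assumes, purely for illustration, that the semantic constraint is an L2 ball around an anchor vector (for example an embedding of the original question); the abstract does not specify the paper's actual constraint set or objective, and all names below are hypothetical.

```python
import math

def project_to_ball(x, center, radius):
    """Project x onto the L2 ball of the given radius around center."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return list(x)
    scale = radius / norm
    return [ci + d * scale for ci, d in zip(center, diff)]

def projected_gradient_descent(grad, x0, center, radius, lr=0.1, steps=100):
    """Alternate a gradient step on the loss with a projection
    back onto the constraint set."""
    x = project_to_ball(x0, center, radius)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        x = project_to_ball(x, center, radius)
    return x
```

For instance, minimizing the squared distance to the point (3, 0) while staying inside the unit ball around the origin ends on the ball's boundary at (1, 0), the feasible point closest to the unconstrained optimum.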

翻訳日:2024-02-20 21:02:27 公開日:2024-02-18

# 何の計画だ? LLMのためのプランニングアウェア技術の評価と開発

What's the Plan? Evaluating and Developing Planning-Aware Techniques for LLMs ( http://arxiv.org/abs/2402.11489v1 )

ライセンス: Link先を確認

Eran Hirsch, Guy Uziel, Ateret Anaby-Tavor

(参考訳) 計画は、特定の環境で特定の目標を達成する一連の行動を見つけることを含む、人工知能の基本的なタスクである。大規模言語モデル(LLM)は、Webエージェントやエンボディドエージェントのような計画機能を必要とするアプリケーションにますます使われている。近年の研究と同様に,LLMには計画に必要なスキルが欠けていることを実験を通じて実証する。これらの観測に基づいて,LLMと古典的計画手法を組み合わせたハイブリッドアプローチの可能性を提唱する。次に,新しいハイブリッド手法であるSimPlanを紹介し,その性能を新たな挑戦的な設定で評価する。様々な計画領域にわたる広範な実験により、SimPlanは既存のLLMベースのプランナーよりも大幅に優れていることが示された。

Planning is a fundamental task in artificial intelligence that involves finding a sequence of actions that achieve a specified goal in a given environment. Large language models (LLMs) are increasingly used for applications that require planning capabilities, such as web or embodied agents. In line with recent studies, we demonstrate through experimentation that LLMs lack necessary skills required for planning. Based on these observations, we advocate for the potential of a hybrid approach that combines LLMs with classical planning methodology. Then, we introduce SimPlan, a novel hybrid-method, and evaluate its performance in a new challenging setup. Our extensive experiments across various planning domains demonstrate that SimPlan significantly outperforms existing LLM-based planners.

翻訳日:2024-02-20 21:02:08 公開日:2024-02-18

# IRFundusSet: 調和された健常ラベルを持つ統合網膜眼底データセット

IRFundusSet: An Integrated Retinal Fundus Dataset with a Harmonized Healthy Label ( http://arxiv.org/abs/2402.11488v1 )

ライセンス: Link先を確認

P. Bilha Githinji, Keming Zhao, Jiantao Wang, Peiwu Qin

(参考訳) 眼疾患は世界的な関心事であり、網膜眼底カラー写真を利用した計算ツールは定期的なスクリーニングと管理に役立つ。しかし、人口統計や撮影条件によるばらつきに加えて病変内の不均一性も示す複雑な網膜眼底について、包括的かつ十分な規模のデータセットを得ることは容易ではない。さらに、公開されている網膜眼底データセットは、データの構成や健常な観察の定義において断片化している。本稿では,複数の公開データセットを統合し,調和させ,キュレーションすることで,一貫したis_normalラベルを持つ統一された全体として利用できるようにしたデータセットである Integrated Retinal Fundus Set (IRFundusSet) を提案する。 IRFundusSetは、調和処理を自動化し、PyTorchのアプローチに沿ったデータセットオブジェクトを提供するPythonパッケージで構成される。さらに、画像が目視でレビューされ、健常な観察の一貫した定義のために新しいis_normalラベルが付与される。当初 10の公開データセット、計46064枚の画像が検討され、そのうち25406枚が新しいis_normalラベルのためにキュレートされ、3515枚がソース全体で健常と判定されている。

Ocular conditions are a global concern and computational tools utilizing retinal fundus color photographs can aid in routine screening and management. Obtaining comprehensive and sufficiently sized datasets, however, is non-trivial for the intricate retinal fundus, which exhibits heterogeneities within pathologies, in addition to variations from demographics and acquisition. Moreover, retinal fundus datasets in the public space suffer fragmentation in the organization of data and definition of a healthy observation. We present Integrated Retinal Fundus Set (IRFundusSet), a dataset that consolidates, harmonizes and curates several public datasets, facilitating their consumption as a unified whole and with a consistent is_normal label. IRFundusSet comprises a Python package that automates harmonization and avails a dataset object in line with the PyTorch approach. Moreover, images are physically reviewed and a new is_normal label is annotated for a consistent definition of a healthy observation. Ten public datasets are initially considered with a total of 46064 images, of which 25406 are curated for a new is_normal label and 3515 are deemed healthy across the sources.

翻訳日:2024-02-20 21:01:59 公開日:2024-02-18

# テキスト-画像拡散モデルを用いた視覚概念駆動画像生成

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model ( http://arxiv.org/abs/2402.11487v1 )

ライセンス: Link先を確認

Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal

(参考訳) テキスト・ツー・イメージ(TTI)拡散モデルは、複雑なシーンや想像上のシーンの高解像度画像の生成において素晴らしい結果を示している。近年のアプローチでは、これらの手法をパーソナライズ技術でさらに拡張し、少数のサンプル画像を使ってユーザが例示した概念(例えば、ユーザ自身)を統合できるようになった。しかし、人間の被写体など、複数の相互作用する概念を持つ画像を生成する能力や、1つあるいは複数の画像例の中で絡み合っているかもしれない概念を扱う能力は、いまだ実現が難しい。本研究では,これらの課題に対処する概念駆動型TTIパーソナライズフレームワークを提案する。ユーザが例示した概念のカスタムトークンを学習し、TTIモデルの既存のテキストトークンと相互作用できるようにする既存の研究に基づいて構築する。ただし重要な点として,問題となっている概念を分離し,よりよく学習するために,ユーザが提供する画像例においてこれらの概念を分離する(潜在的な)セグメンテーションマスクを同時に学習する。我々は,カスタムトークンの学習と,ユーザ提供画像中の対応する概念を包含するマスクの推定を交互に行う,期待値最大化(EM)に類似した最適化手順を導入する。これらのマスクは,U-Netでパラメータ化された潜在拡散モデル内のクロスアテンションと,それに続くDense CRF最適化に基づいて得られる。このような交互の共同改良が、概念のより良いトークンの学習につながり、また、副産物として潜在マスクが得られることを示す。提案手法の利点を定性的かつ(ユーザスタディを通して)定量的に示し,最大3つの絡み合った概念を結合できる例とユースケースをいくつか紹介する。

Text-to-image (TTI) diffusion models have demonstrated impressive results in generating high-resolution images of complex and imaginative scenes. Recent approaches have further extended these methods with personalization techniques that allow them to integrate user-illustrated concepts (e.g., the user him/herself) using a few sample image illustrations. However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations remains elusive. In this work, we propose a concept-driven TTI personalization framework that addresses these core challenges. We build on existing works that learn custom tokens for user-illustrated concepts, allowing those to interact with existing text tokens in the TTI model. However, importantly, to disentangle and better learn the concepts in question, we jointly learn (latent) segmentation masks that disentangle these concepts in user-provided image illustrations. We do so by introducing an Expectation Maximization (EM)-like optimization procedure where we alternate between learning the custom tokens and estimating masks encompassing corresponding concepts in user-supplied images. We obtain these masks based on cross-attention, from within the U-Net parameterized latent diffusion model and subsequent Dense CRF optimization. We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a by-product, latent masks. We illustrate the benefits of the proposed approach qualitatively and quantitatively (through user studies) with a number of examples and use cases that can combine up to three entangled concepts.

翻訳日:2024-02-20 21:01:36 公開日:2024-02-18

# 回転加速基準フレームにおける軌道角運動量スペクトルと絡み合い

Orbital angular momentum spectrum and entanglement in rotating accelerated reference frame ( http://arxiv.org/abs/2402.11486v1 )

ライセンス: Link先を確認

Haorong Wu, Xilong Fan, and Lixiang Chen

(参考訳) 粒子の定義は理論によって異なる。曲がった時空における場の量子論は、線形加速する観測者の視点からすると、慣性系の空の空間が熱的粒子で満たされうることを示している。この効果はUnruh効果として知られている。軌道角運動量(OAM)の自由度を考慮すると、全てのOAMモードは同じ期待粒子数を共有する。本稿では, 回転加速基準フレームにおけるOAMスペクトルを調べ, 線形加速の場合とスペクトルがどう異なるかを検討する。観測者が回転し始めると、全てのOAMモードが許されるわけではなくなり、負のエネルギーモードが現れる。回転加速する観測者が実際にこれらの粒子をどう知覚するかを理解するために、Unruh-DeWitt検出器とその詳細釣り合いを調べる。この関係は、共動慣性フレームと静止フレームの両方で調べる。これらの結果に基づいて, OAMエンタングルメントの劣化を2次元および高次元の場合についてそれぞれ検討する。その結果, エンタングルメント次元とOAMモードの最高次数は, それぞれ主に加速度と回転に関係していることが示唆された。さらに、これらの結果が全ての定常軌道に一般化できることを示す。

The particle definition varies across different theories. The quantum field theory in curved spacetime shows that from the perspective of a linearly accelerated observer, an inertial empty space may be full of thermal particles. This effect is known as the Unruh effect. When the degrees of freedom of orbital angular momentum (OAM) are considered, all OAM modes share the same expected particle number. Here, we examine the OAM spectrum in a rotating accelerated reference frame to see how the spectrum differs from the linear accelerated case. When the observer starts to rotate, not all OAM modes are allowed and some negative energy modes show up. To understand how a rotating accelerated observer actually perceives these particles, the Unruh-DeWitt detector and its detailed balance are studied. This relation is studied both in the comoving inertial frame and in the rest frame. Based on these results, the OAM entanglement degradation is explored in two-dimensional and high-dimensional cases, respectively. The results indicate that the entanglement dimension and the highest order of OAM modes are mainly related to the acceleration and the rotation, respectively. It is then demonstrated that these results can be generalized to all stationary trajectories.
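For reference, the textbook form of the Unruh effect for a uniformly, linearly accelerated observer (proper acceleration $a$) is given below. These are the standard expressions, not formulas quoted from the paper; the symbols $T_U$, $\omega$, and $\ell$ are notation chosen here. The thermal occupation number depends only on the mode frequency $\omega$ and not on the OAM index $\ell$, which is the sense in which all OAM modes share the same expected particle number in the linear case.

```latex
% Unruh temperature seen by an observer with proper acceleration a
T_U = \frac{\hbar a}{2\pi c k_B}

% Bose-Einstein occupation of a mode with frequency \omega and OAM index \ell;
% substituting T_U shows the result is independent of \ell
\langle n_{\omega,\ell} \rangle
  = \frac{1}{e^{\hbar\omega / (k_B T_U)} - 1}
  = \frac{1}{e^{2\pi c\,\omega / a} - 1}
```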

翻訳日:2024-02-20 21:01:06 公開日:2024-02-18

# LEIA:エンティティベースのデータ拡張による言語モデルにおける言語間知識伝達の実現

LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation ( http://arxiv.org/abs/2402.11485v1 )

ライセンス: Link先を確認

Ikuya Yamada and Ryokan Ri

(参考訳) 英語に基づく大規模言語モデル(LLM)を他の言語に適応させることは、言語間移動の効率性と可能性から、ますます人気が高まっている。しかし、既存の言語適応手法はしばしば言語間監督の利点を見落としている。本研究では,言語間で対応付けられたウィキペディアのエンティティ名を利用する言語適応チューニング手法であるLEIAを紹介する。この方法は、対象言語コーパスを英語のエンティティ名で拡張し、左から右への言語モデリングを用いてモデルをトレーニングすることを含む。 7Bパラメータ LLM を用いて多様な質問応答データセット上でLEIAを評価し,英語以外の様々な言語で顕著な性能向上を示した。ソースコードはhttps://github.com/studio-ousia/leiaで入手できる。

Adapting English-based large language models (LLMs) to other languages has become increasingly popular due to the efficiency and potential of cross-lingual transfer. However, existing language adaptation methods often overlook the benefits of cross-lingual supervision. In this study, we introduce LEIA, a language adaptation tuning method that utilizes Wikipedia entity names aligned across languages. This method involves augmenting the target language corpus with English entity names and training the model using left-to-right language modeling. We assess LEIA on diverse question answering datasets using 7B-parameter LLMs, demonstrating significant performance gains across various non-English languages. The source code is available at https://github.com/studio-ousia/leia.
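The augmentation step described above can be sketched as follows: each entity mention in the target-language text is followed by its English name. The marker strings, the span format, and the function name are assumptions made for this illustration; the abstract does not specify how LEIA delimits the inserted names.

```python
def augment_with_english_names(text, mentions,
                               open_tag="<translate>",
                               close_tag="</translate>"):
    """Insert the English entity name after each mention.

    mentions: list of (start, end, english_name) with non-overlapping
    character spans into `text`. The tag strings are hypothetical.
    """
    out, cursor = [], 0
    for start, end, name in sorted(mentions):
        out.append(text[cursor:end])               # text up to and including the mention
        out.append(f"{open_tag}{name}{close_tag}")  # English name right after it
        cursor = end
    out.append(text[cursor:])                       # remainder after the last mention
    return "".join(out)
```

The augmented text would then be used as ordinary left-to-right language modeling data, so the model sees the English name in the context of the target-language sentence.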

翻訳日:2024-02-20 21:00:47 公開日:2024-02-18

# DictLLM:医学診断のための大規模言語モデルを用いたキーバリューデータ構造

DictLLM: Harnessing Key-Value Data Structures with Large Language Models for Enhanced Medical Diagnostics ( http://arxiv.org/abs/2402.11481v1 )

ライセンス: Link先を確認

YiQiu Guo, Yuchen Yang, Ya Zhang, Yu Wang, Yanfeng Wang

(参考訳) 構造化データは、情報を組織化するための洗練されたメカニズムを提供する。大規模言語モデルの文脈における構造化データのテキストシリアライズのための既存の手法は、キー値構造化データに固有の不均一性に適切に対処できない。これらの手法は理想的ではなく、しばしば入力サイズが大きくなり、入力変更への適応性が低い。本稿では,医学検査報告などのキーバリュー構造化データのモデリングを改善し,医療診断を生成するための革新的なフレームワークであるDictLLMを紹介する。 DictLLMは,(1)置換不変性を維持するためのグループ位置符号化,(2)構造化データの固有バイアスを捉える階層的注意バイアス,(3)辞書エンコーダが生成する埋め込みをLLMに整列させ,固定長の仮想トークン列を生成する最適輸送アライメント層,の3つの重要な構成要素を統合する。診断自動生成のための包括的な実世界の医学検査報告データセット上で,様々なLLMを用いた実験を行った。その結果,DictLLMはRouge-LとKnowledge F1スコアの両方において,確立されたベースライン法と少数ショットGPT-4実装を有意に上回ることが示された。さらに,一連の実験によるフレームワークのスケーラビリティとロバスト性の評価は,医用辞書データの複雑なキーバリューデータ構造を正確にモデル化する上で,その卓越した能力を裏付けている。

Structured data offers a sophisticated mechanism for the organization of information. Existing methodologies for the text-serialization of structured data in the context of large language models fail to adequately address the heterogeneity inherent in key-value structured data. These methods are not ideal and frequently result in larger input sizes and poor adaptability to input changes. In this paper, we introduce DictLLM, an innovative framework designed to improve the modeling of key-value structured data, like medical laboratory reports, for generating medical diagnoses. DictLLM integrates three key components: (1) group positional encoding to maintain permutation invariance, (2) hierarchical attention bias to capture the inherent bias in structured data, and (3) an optimal transport alignment layer that aligns the embedding generated by the dictionary encoder with the LLM, thereby producing a sequence of fixed-length virtual tokens. We carry out experiments using various LLM models on a comprehensive real-world medical laboratory report dataset for automatic diagnosis generation. Our findings illustrate that DictLLM significantly outperforms established baseline methods and few-shot GPT-4 implementations in terms of both Rouge-L and Knowledge F1 scores. Furthermore, our evaluation of the framework's scalability and robustness, through a series of experiments, underscores its exceptional capability in accurately modeling the complex key-value data structure of medical dictionary data.
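One plausible reading of "group positional encoding to maintain permutation invariance" is that position indices restart inside every key-value group, so shuffling the groups leaves each token's position unchanged. The sketch below illustrates only that reading; it is an assumption, not DictLLM's documented mechanism, and the lab-report tokens are made up.

```python
def group_position_ids(groups):
    """Flatten key-value groups into tokens with per-group position ids.

    groups: list of token lists, one list per key-value pair.
    Positions restart at 0 in every group, so they do not depend
    on where the group appears in the serialized input.
    """
    tokens, positions = [], []
    for group in groups:
        for i, tok in enumerate(group):
            tokens.append(tok)
            positions.append(i)
    return tokens, positions
```

With ordinary absolute positions, moving a lab item from the top of a report to the bottom would change every position id in the sequence; with per-group ids the set of (token, position) pairs is the same for any ordering of the items.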

翻訳日:2024-02-20 21:00:13 公開日:2024-02-18

# インドにおける異なるメンタルヘルス表現に関する研究

Studying Differential Mental Health Expressions in India ( http://arxiv.org/abs/2402.11477v1 )

ライセンス: Link先を確認

Khushi Shelat, Sunny Rai, Devansh R Jain, Young Min Cho, Maitreyi Redkar, Samindara Sawant, Sharath Chandra Guntuku

(参考訳) 心理社会的ストレス要因と精神障害の症状学は、社会文化的環境によって異なることが知られている。しかし、ソーシャルメディア上でのメンタルヘルスの表現に関する知見は、主にWEIRD(Western, Educated, Industrial, Rich, and Democratic)の文脈での研究に基づいている。本稿では,インドの個人によるRedditのメンタルヘルス投稿を分析し,Rest of the World (ROW) のユーザと比較して,インドの文脈に特有なオンラインうつ病言語の違いを明らかにする。西洋のサンプルとは異なり、インドにおけるメンタルヘルスの議論は、加えて悲しみを表現し、否定を用い、現在に焦点を当てており、仕事と達成に関連している。「Illness(病気)」はインドとのみ相関しており、インドの患者における身体症状と精神障害の関連を再確認している。 2人の臨床心理学者がソーシャルメディア投稿からの知見を検証し、メンタルヘルスに関する議論に関連する上位20トピックの95%がインド人に「広く見られる」と判断した。インドにおけるオンラインメンタルヘルス関連言語のROWと比較した重要な言語的差異は、文化を考慮した精密なメンタルヘルスモデルの必要性を強調している。これらの知見は、インドにおける精神疾患の診断と治療の拡大するギャップを減少させるために文化的に適切な介入を設計する上で重要な意味を持つ。

Psychosocial stressors and the symptomatology of mental disorders are known to vary with socio-cultural environment. Mental health expressions on social media, however, are primarily informed by studies in the WEIRD (Western, Educated, Industrial, Rich, and Democratic) contexts. In this paper, we analyze mental health posts on Reddit made by individuals in India, to identify variations in online depression language specific to the Indian context compared to users from the Rest of the World (ROW). Unlike in Western samples, mental health discussions in India additionally express sadness, use negation, are present-focused, and are related to work and achievement. "Illness" is exclusively correlated to India, reaffirming the link between somatic symptoms and mental disorders in Indian patients. Two clinical psychologists validated the findings from social media posts and found 95% of the top-20 topics associated with mental health discussions as prevalent in Indians. Significant linguistic variations in online mental health-related language in India compared to ROW highlight the need for precision culturally-aware mental health models. These findings have important implications for designing culturally appropriate interventions to reduce the growing diagnosis and treatment gap for mental disorders in India.

翻訳日:2024-02-20 20:59:44 公開日:2024-02-18

# EndoOOD: カプセル内視鏡診断における不確実性を考慮した分布外検出

EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis ( http://arxiv.org/abs/2402.11476v1 )

ライセンス: Link先を確認

Qiaozhi Tan, Long Bai, Guankun Wang, Mobarakol Islam, Hongliang Ren

(参考訳) ワイヤレスカプセル内視鏡(WCE)は、消化管(GI)の可視化を可能にする非侵襲的な診断方法である。深層学習に基づく手法は、WCEデータを用いた疾患スクリーニングの有効性を示し、医療専門家の負担を軽減する。しかしながら、既存のカプセル内視鏡分類法は、主に予め定義されたカテゴリに依存しており、未定義のカテゴリや解剖学的ランドマークなど、分布外(OOD)データの識別と分類が困難である。この問題に対処するために,WCE 診断における OOD 検出課題を効果的に扱うことを目的とした Endoscopy Out-of-Distribution (EndoOOD) フレームワークを提案する。提案フレームワークは,不確実性を考慮したmixup訓練とロングテールの分布内(ID)データのキャリブレーション手法を取り入れることで,WCE診断機能の堅牢性と信頼性の向上に重点を置いている。さらに、情報損失を最小限に抑えつつ、OODデータとIDデータを正確に識別するために仮想ロジットマッチングを用いる。提案手法の性能を評価するために,2つの公開データセットを用いて12の最先端(SOTA)手法との評価と比較を行った。その結果,診断精度の向上と臨床意思決定支援におけるフレームワークの有効性が示された。

Wireless capsule endoscopy (WCE) is a non-invasive diagnostic procedure that enables visualization of the gastrointestinal (GI) tract. Deep learning-based methods have shown effectiveness in disease screening using WCE data, alleviating the burden on healthcare professionals. However, existing capsule endoscopy classification methods mostly rely on pre-defined categories, making it challenging to identify and classify out-of-distribution (OOD) data, such as undefined categories or anatomical landmarks. To address this issue, we propose the Endoscopy Out-of-Distribution (EndoOOD) framework, which aims to effectively handle the OOD detection challenge in WCE diagnosis. The proposed framework focuses on improving the robustness and reliability of WCE diagnostic capabilities by incorporating uncertainty-aware mixup training and long-tailed in-distribution (ID) data calibration techniques. Additionally, virtual-logit matching is employed to accurately distinguish between OOD and ID data while minimizing information loss. To assess the performance of our proposed solution, we conduct evaluations and comparisons with 12 state-of-the-art (SOTA) methods using two publicly available datasets. The results demonstrate the effectiveness of the proposed framework in enhancing diagnostic accuracy and supporting clinical decision-making.
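For readers unfamiliar with mixup, the sketch below shows the standard (plain) mixup augmentation: two samples and their one-hot labels are blended with a weight drawn from a Beta distribution. How EndoOOD makes this step "uncertainty-aware" is not described in the abstract, so that part is deliberately left out; the default `alpha` value and the flat-list feature format are assumptions for illustration.

```python
import random

def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two samples and their one-hot labels.

    lam ~ Beta(alpha, alpha); features and labels are mixed with
    the same weight, so the mixed label stays a valid distribution.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam
```

Training on such blended samples discourages overconfident predictions between classes, which is one reason mixup is often paired with out-of-distribution detection.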

翻訳日:2024-02-20 20:59:24 公開日:2024-02-18

# 有毒偽造顔:顔偽造検出におけるバックドア攻撃へ向けて

Poisoned Forgery Face: Towards Backdoor Attacks on Face Forgery Detection ( http://arxiv.org/abs/2402.11473v1 )

ライセンス: Link先を確認

Jiawei Liang, Siyuan Liang, Aishan Liu, Xiaojun Jia, Junhao Kuang, Xiaochun Cao

(参考訳) 顔偽造技術の普及は社会に重大な懸念を引き起こし、顔偽造検出法の開発を動機付けている。これらの方法は、偽造された顔と本物の顔とを区別することを目的としており、実用上有効であることが証明されている。しかし,本論文では,顔偽造検出のシナリオにおいて,バックドア攻撃による新たな,これまで認識されていなかった脅威を紹介する。バックドアをモデルに埋め込み、特定のトリガーパターンを入力に組み込むことで、攻撃者は検出器を欺き、偽造された顔に対する誤った予測を生成させることができる。この目的を達成するために,顔偽造検出器に対するクリーンラベルバックドア攻撃を可能にする Poisoned Forgery Face フレームワークを提案する。提案手法は,スケーラブルなトリガジェネレータを構築し,新しい畳み込み処理を利用して平行移動に敏感なトリガパターンを生成する。また, 汚染サンプルのステルス性を高めるために, ランドマークに基づく領域に基づく相対的埋め込み法を適用した。その結果、汚染サンプルで訓練された検出器にはバックドアが埋め込まれる。特に,本手法は攻撃成功率の大幅な向上 (+16.39% BD-AUC) と可視性の低減 (-12.65% $L_\infty$) により,SoTAのバックドアベースラインを上回っている。さらに,本攻撃はバックドア防御に対しても有望な性能を示す。本論文が, 顔偽造検出シナリオにおけるバックドア攻撃による潜在的な脅威に, より大きな注意を向けさせることを期待する。我々のコードは https://github.com/JWLiang007/PFF で公開される予定である。

The proliferation of face forgery techniques has raised significant concerns within society, thereby motivating the development of face forgery detection methods. These methods aim to distinguish forged faces from genuine ones and have proven effective in practical applications. However, this paper introduces a novel and previously unrecognized threat in face forgery detection scenarios caused by backdoor attacks. By embedding backdoors into models and incorporating specific trigger patterns into the input, attackers can deceive detectors into producing erroneous predictions for forged faces. To achieve this goal, this paper proposes the Poisoned Forgery Face framework, which enables clean-label backdoor attacks on face forgery detectors. Our approach involves constructing a scalable trigger generator and utilizing a novel convolving process to generate translation-sensitive trigger patterns. Moreover, we employ a relative embedding method based on landmark-based regions to enhance the stealthiness of the poisoned samples. Consequently, detectors trained on our poisoned samples are embedded with backdoors. Notably, our approach surpasses SoTA backdoor baselines with a significant improvement in attack success rate (+16.39% BD-AUC) and reduction in visibility (-12.65% $L_\infty$). Furthermore, our attack exhibits promising performance against backdoor defenses. We anticipate that this paper will draw greater attention to the potential threats posed by backdoor attacks in face forgery detection scenarios. Our codes will be made available at https://github.com/JWLiang007/PFF

翻訳日:2024-02-20 20:58:59 公開日:2024-02-18

# DDIPrompt: グラフプロンプト学習に基づく薬物と薬物の相互作用イベント予測

DDIPrompt: Drug-Drug Interaction Event Prediction based on Graph Prompt Learning ( http://arxiv.org/abs/2402.11472v1 )

ライセンス: Link先を確認

Yingying Wang, Yun Xiong, Xixi Wu, Xiangguo Sun, and Jiawei Zhang

(参考訳) 近年、グラフニューラルネットワークは、薬物分子内の原子と官能基間の複雑な関連をモデル化する能力があるため、有害薬物-薬物相互作用(ddi)を予測するためにますます普及している。しかし、(1)特定の相互作用が過小評価されている医療データセットでは一般的なが重要な問題である、高度に不均衡な事象分散の問題である。この不均衡は、正確で信頼性の高いDDI予測を達成する上で大きな障壁となる。 2) まれな事象のラベル付きデータの不足は, 稀かつ潜在的に重要な相互作用が限られたデータによって見過ごされ, 過小評価される場合が多い医療分野において, 広範な問題である。これに対し、グラフプロンプトの最近の進歩に触発された革新的なパナセアであるDDIPromptを提供する。我々のフレームワークは、トレーニング済みのモデルから本質的な知識を活用することで、これらの問題に対処することを目的としており、最小限の下流データで効率的にデプロイできる。特に、最初の課題を解決するために、DDIPromptは、構造的および対話的な近接性の両方を考慮して、薬物間のリンクを増設する。分子内構造と分子間相互作用を理解する階層的な事前学習戦略を特徴とし、薬物特性の包括的で偏見のない理解を促進する。第2の課題として,推論中にprototype-enhanced prompting機構を実装した。このメカニズムは、各カテゴリの数少ない例によって洗練され、リッチな事前学習知識を効果的に活用し、予測精度を高める。 2つのベンチマークデータセットの総合評価は、DDIPromptの優位性を示し、特に稀なDDIイベントを予測する。

Recently, Graph Neural Networks have become increasingly prevalent in predicting adverse drug-drug interactions (DDI) due to their proficiency in modeling the intricate associations between atoms and functional groups within and across drug molecules. However, they are still hindered by two significant challenges: (1) the issue of highly imbalanced event distribution, which is a common but critical problem in medical datasets where certain interactions are vastly underrepresented. This imbalance poses a substantial barrier to achieving accurate and reliable DDI predictions. (2) the scarcity of labeled data for rare events, which is a pervasive issue in the medical field where rare yet potentially critical interactions are often overlooked or under-studied due to limited available data. In response, we offer DDIPrompt, an innovative panacea inspired by the recent advancements in graph prompting. Our framework aims to address these issues by leveraging the intrinsic knowledge from pre-trained models, which can be efficiently deployed with minimal downstream data. Specifically, to solve the first challenge, DDIPrompt employs augmented links between drugs, considering both structural and interactive proximity. It features a hierarchical pre-training strategy that comprehends intra-molecular structures and inter-molecular interactions, fostering a comprehensive and unbiased understanding of drug properties. For the second challenge, we implement a prototype-enhanced prompting mechanism during inference. This mechanism, refined by few-shot examples from each category, effectively harnesses the rich pre-training knowledge to enhance prediction accuracy, particularly for these rare but crucial interactions. Comprehensive evaluations on two benchmark datasets demonstrate the superiority of DDIPrompt, particularly in predicting rare DDI events.
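To make the prototype-enhanced prompting idea more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes a pre-trained encoder (not shown) has already mapped each drug pair to an embedding vector. It builds one prototype per event class as the mean of a few labeled examples and assigns a query to the class whose prototype is most similar by cosine similarity. The function names, the toy 2-D embeddings, and the class labels are all hypothetical.

```python
import math
from collections import defaultdict

def build_prototypes(examples):
    """Average the few-shot embeddings of each class into one prototype vector."""
    grouped = defaultdict(list)
    for embedding, label in examples:
        grouped[label].append(embedding)
    prototypes = {}
    for label, vectors in grouped.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return prototypes

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict(query, prototypes):
    """Return the label of the prototype closest to the query embedding."""
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))

# Hypothetical few-shot support set: (embedding, event label).
few_shot = [
    ([1.0, 0.1], "common_event"),
    ([0.9, 0.2], "common_event"),
    ([0.1, 1.0], "rare_event"),
    ([0.2, 0.8], "rare_event"),
]
prototypes = build_prototypes(few_shot)
prediction = predict([0.15, 0.9], prototypes)
```

Because each class contributes a single prototype regardless of how many training pairs it has, a rare class is not outvoted by a frequent one at inference time, which is one plausible reading of why such a mechanism helps with imbalanced events.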

翻訳日:2024-02-20 20:58:34 公開日:2024-02-18

# Transformerテキストモデルにおけるトレーニングデータと敵対的ロバスト性の相関関係の探索

A Curious Case of Searching for the Correlation between Training Data and Adversarial Robustness of Transformer Textual Models ( http://arxiv.org/abs/2402.11469v1 )

ライセンス: Link先を確認

Cuong Dang, Dung D. Le, Thai Le

(参考訳) 既存の研究によると、微調整されたテキストTransformerモデルは最先端の予測性能を実現するが、敵対的なテキスト摂動にも弱い。従来の敵対的評価は、多くの場合モデルの微調整後にのみ行われ、トレーニングデータは考慮されない。本稿では,トレーニングデータとモデルロバスト性との間にも強い相関関係があることを証明したい。この目的のために,入力の微調整コーパスの幅広い特性を表す13の異なる特徴を抽出し,それらを用いて微調整モデルの敵対的ロバスト性を予測する。主にエンコーダのみのTransformerモデルである BERT と RoBERTa に着目し,BART,ELECTRA,GPT2 の追加結果も示しながら,この議論を裏付けるさまざまな証拠を提供する。第1に、実証的な分析から、(a) 抽出した特徴をランダムフォレストなどの軽量分類器と組み合わせることで攻撃成功率を効果的に予測でき、(b) モデルのロバスト性に最も影響を及ぼす特徴はロバスト性と明確に相関することが示された。第2に、このフレームワークはロバスト性評価のための高速かつ効果的な追加ツールとして使用できる。すなわち、(a) 従来の手法と比較してランタイムを30倍から193倍節約し、(b) モデル間で転送可能であり、(c) 敵対的訓練の下でも使用でき、(d) 統計的ランダム性に対して頑健である。私たちのコードは公開予定である。

Existing works have shown that fine-tuned textual transformer models achieve state-of-the-art prediction performance but are also vulnerable to adversarial text perturbations. Traditional adversarial evaluation is often done only after fine-tuning the models, ignoring the training data. In this paper, we want to prove that there is also a strong correlation between training data and model robustness. To this end, we extract 13 different features representing a wide range of input fine-tuning corpora properties and use them to predict the adversarial robustness of the fine-tuned models. Focusing mostly on encoder-only transformer models BERT and RoBERTa with additional results for BART, ELECTRA and GPT2, we provide diverse evidence to support our argument. First, empirical analyses show that (a) extracted features can be used with a lightweight classifier such as Random Forest to effectively predict the attack success rate and (b) features with the most influence on the model robustness have a clear correlation with the robustness. Second, our framework can be used as a fast and effective additional tool for robustness evaluation since it (a) saves 30x-193x runtime compared to the traditional technique, (b) is transferable across models, (c) can be used under adversarial training, and (d) is robust to statistical randomness. Our code will be publicly available.

翻訳日:2024-02-20 20:58:04 公開日:2024-02-18

# 長期時系列予測のためのアトラクタメモリ:カオスの視点から

Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective ( http://arxiv.org/abs/2402.11463v1 )

ライセンス: Link先を確認

Jiaxi Hu, Yuehong Hu, Wei Chen, Ming Jin, Shirui Pan, Qingsong Wen, Yuxuan Liang

(参考訳) 長期時系列予測(LTSF)タスクでは、既存のディープラーニングモデルは、離散時系列が基礎となる連続力学系から生じるという重要な特性を見落としており、外挿能力と進化能力を欠いている。実世界のデータのカオス性を踏まえ、我々のモデル Attraos はカオス理論をLTSFに取り入れ、実世界の時系列を未知の高次元カオス力学系からの観測として捉える。アトラクタ不変性の概念の下で、Attraosは提案したマルチスケール動的メモリユニットを使用して過去の力学構造を記憶し、周波数強調ローカル進化戦略によって予測する。詳細な理論的分析と豊富な経験的証拠は、Attraosが主流のLTSFデータセットやカオスデータセット上で様々なLTSFメソッドより優れていることを一貫して示している。

In long-term time series forecasting (LTSF) tasks, existing deep learning models overlook the crucial characteristic that discrete time series originate from underlying continuous dynamic systems, resulting in a lack of extrapolation and evolution capabilities. Recognizing the chaotic nature of real-world data, our model, Attraos, incorporates chaos theory into LTSF, perceiving real-world time series as observations from unknown high-dimensional chaotic dynamic systems. Under the concept of attractor invariance, Attraos utilizes the proposed multi-scale dynamic memory unit to memorize historical dynamics structure and predicts by a frequency-enhanced local evolution strategy. Detailed theoretical analysis and abundant empirical evidence consistently show that Attraos outperforms various LTSF methods on mainstream LTSF datasets and chaotic datasets.

翻訳日:2024-02-20 20:57:40 公開日:2024-02-18

# FGeo-HyperGNet: 形式的シンボリックシステムとハイパーグラフニューラルネットワークを統合した幾何問題解決

FGeo-HyperGNet: Geometry Problem Solving Integrating Formal Symbolic System and Hypergraph Neural Network ( http://arxiv.org/abs/2402.11461v1 )

ライセンス: Link先を確認

Xiaokai Zhang, Na Zhu, Yiming He, Jia Zou, Cheng Qin, Yang Li, Zhenbing Zeng, Tuo Leng

(参考訳) 幾何学的問題解決は、自動推論と人工知能の分野における長年にわたる課題である。本稿は一連の研究の第5報であり、人間のような幾何学的演繹推論を自動実行するためのニューラルシンボリックシステムを構築した。シンボリック部分はFormalGeo上に構築された形式的システムであり、幾何学的な関係推論と代数的計算を自動実行し、解の過程を、条件をハイパーノード、定理をハイパーエッジとする解ハイパーツリーに整理することができる。ニューラル部分はHyperGNetと呼ばれ、注意機構に基づくハイパーグラフニューラルネットワークであり、ハイパーツリーの構造および意味情報を効果的にエンコードするエンコーダと、問題解決ガイダンスを提供するソルバが含まれている。ニューラル部分はハイパーツリーに従って定理を予測し、シンボリック部分は定理を適用してハイパーツリーを更新する。これによりPredict-Apply Cycleが形成され、最終的に可読で追跡可能な幾何問題の自動解法が実現される。実験は、このニューラルシンボリックアーキテクチャの正確性と有効性を示す。formalgeo7kデータセットでは、ステップワイズ精度87.65%、全体的な精度85.53%を達成した。コードとデータは https://github.com/BitSecret/HyperGNet で入手できる。

Geometry problem solving has long been a challenge in the fields of automated reasoning and artificial intelligence. In this fifth article in our series, we build a neural-symbolic system to automatically perform human-like geometric deductive reasoning. The symbolic part is a formal system built on FormalGeo, which can automatically perform geometric relational reasoning and algebraic calculations and organize the solving process into a solution hypertree with conditions as hypernodes and theorems as hyperedges. The neural part, called HyperGNet, is a hypergraph neural network based on the attention mechanism, including an encoder to effectively encode the structural and semantic information of the hypertree, and a solver to provide problem-solving guidance. The neural part predicts theorems according to the hypertree, and the symbolic part applies theorems and updates the hypertree, thus forming a Predict-Apply Cycle to ultimately achieve readable and traceable automatic solving of geometric problems. Experiments demonstrate the correctness and effectiveness of this neural-symbolic architecture. We achieved a step-wise accuracy of 87.65% and an overall accuracy of 85.53% on the formalgeo7k dataset. The code and data are available at https://github.com/BitSecret/HyperGNet.
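The Predict-Apply Cycle described above can be illustrated with a toy loop. This is only a sketch of the control flow, not the FGeo-HyperGNet code: the theorem library, the fact strings, and both helper functions are hypothetical. The "predictor" is a trivial rule that picks the first applicable theorem, where the paper uses a trained hypergraph network.

```python
# Toy theorem library: name -> (premises, conclusion). All entries are hypothetical.
THEOREMS = {
    "triangle_angle_sum": ({"triangle(ABC)"}, "A+B+C=180"),
    "substitute_known_angles": ({"A+B+C=180", "A=60", "B=70"}, "C=50"),
}

def predict_theorem(facts):
    """Stand-in for the neural predictor: return an applicable theorem that adds a new fact."""
    for name, (premises, conclusion) in THEOREMS.items():
        if premises <= facts and conclusion not in facts:
            return name
    return None

def apply_theorem(name, facts):
    """Stand-in for the symbolic system: add the theorem's conclusion to the known facts."""
    _, conclusion = THEOREMS[name]
    return facts | {conclusion}

def solve(initial_facts, goal, max_steps=10):
    """Alternate prediction and application until the goal is derived or nothing applies."""
    facts, trace = set(initial_facts), []
    for _ in range(max_steps):
        if goal in facts:
            break
        name = predict_theorem(facts)
        if name is None:
            break
        facts = apply_theorem(name, facts)
        trace.append(name)
    return goal in facts, trace

solved, trace = solve({"triangle(ABC)", "A=60", "B=70"}, "C=50")
```

The returned `trace` lists the applied theorems in order, which is the toy counterpart of the readable, traceable solution the abstract mentions.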

翻訳日:2024-02-20 20:57:13 公開日:2024-02-18

# 時空間知識グラフに関する質問応答

Question Answering Over Spatio-Temporal Knowledge Graph ( http://arxiv.org/abs/2402.11542v1 )

ライセンス: Link先を確認

Xinbang Dai, Huiying Li, Guilin Qi

(参考訳) 時空間知識グラフ(STKG)は、時間と位置の情報を組み込むことで知識グラフ(KG)の概念を拡張したものである。研究コミュニティは知識グラフ質問応答(KGQA)に注力してきたが、STKGに基づいて空間情報と時間情報の両方を含む質問に答える分野は、ほとんど未開拓である。さらに、包括的なデータセットの欠如も、この分野の進歩を妨げている。この問題に対処するために、時空間知識グラフ質問応答(STKGQA)のための1万の自然言語質問からなるデータセットSTQADを提案する。残念なことに、最先端のKGQAアプローチは、我々のデータセットで十分なパフォーマンスを達成するには程遠い。そこで本研究では,STComplExという新しいSTKG埋め込み手法を用いた時空間KGQA手法であるSTCQAを提案する。質問から時間的・空間的な情報を抽出することにより、質問をよりよく理解し、STKGから正確な回答を得ることができる。大規模な実験を通じて、データセットの品質とSTKGQA法の有効性を実証した。

Spatio-temporal knowledge graphs (STKGs) extend the concept of knowledge graphs (KGs) by incorporating time and location information. While the research community has focused on Knowledge Graph Question Answering (KGQA), answering questions that incorporate both spatial and temporal information based on STKGs remains largely unexplored. Furthermore, a lack of comprehensive datasets has also hindered progress in this area. To address this issue, we present STQAD, a dataset comprising 10,000 natural language questions for spatio-temporal knowledge graph question answering (STKGQA). Unfortunately, various state-of-the-art KGQA approaches fall far short of achieving satisfactory performance on our dataset. In response, we propose STCQA, a new spatio-temporal KGQA approach that utilizes a novel STKG embedding method named STComplEx. By extracting temporal and spatial information from a question, our QA model can better comprehend the question and retrieve accurate answers from the STKG. Through extensive experiments, we demonstrate the quality of our dataset and the effectiveness of our STKGQA method.

翻訳日:2024-02-20 20:48:42 公開日:2024-02-18

# 大語彙アラビア語リップリーディングのための視覚的特徴と幾何学的特徴のクロスアテンション融合

Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading ( http://arxiv.org/abs/2402.11520v1 )

ライセンス: Link先を確認

Samar Daou, Ahmed Rekik, Achraf Ben-Hamadou, Abdelaziz Kallel

(参考訳) リップリーディングは、唇とその周辺領域の動きを分析することによって、音声の認識に視覚データを使用する。これは、人間と機械の相互作用や音声認識の強化など、多くの潜在的な応用に関する熱い研究トピックである。近年の深層学習に基づく研究は,口域から抽出した視覚的特徴を唇輪郭の目印点と統合することを目的としている。しかし、結合のような単純な組み合わせ法は最適な特徴ベクトルを得るための最も効果的なアプローチではないかもしれない。まず,この課題に対処するために,大語彙レキシコン語彙によるビデオ中の発話単語の予測のためのクロス・アテンション・フュージョンに基づくアプローチを提案する。本手法は,視覚的特徴と幾何学的特徴を効率的に統合するために,クロスアテンションネットワークのパワーを利用する。第二に, アラビア語 (lrw-ar) 用に, 36名の話者が発話する100語クラスの2万本のビデオを含む大規模リップリーディングを初めて紹介する。 lrw-ar と arabic visual database で得られた実験結果は,提案手法の有効性と頑健性を示した。私たちの研究は、アラビア語にリップリード技術を適用する可能性と有効性について洞察を与え、この分野におけるさらなる研究の扉を開く。プロジェクトページへのリンク: https://crns-smartvision.github.io/lrwar

Lipreading involves using visual data to recognize spoken words by analyzing the movements of the lips and surrounding area. It is a hot research topic with many potential applications, such as human-machine interaction and enhancing audio speech recognition. Recent deep-learning based works aim to integrate visual features extracted from the mouth region with landmark points on the lip contours. However, employing a simple combination method such as concatenation may not be the most effective approach to get the optimal feature vector. To address this challenge, firstly, we propose a cross-attention fusion-based approach for large lexicon Arabic vocabulary to predict spoken words in videos. Our method leverages the power of cross-attention networks to efficiently integrate visual and geometric features computed on the mouth region. Secondly, we introduce the first large-scale Lip Reading in the Wild for Arabic (LRW-AR) dataset containing 20,000 videos for 100-word classes, uttered by 36 speakers. The experimental results obtained on LRW-AR and ArabicVisual databases showed the effectiveness and robustness of the proposed approach in recognizing Arabic words. Our work provides insights into the feasibility and effectiveness of applying lipreading techniques to the Arabic language, opening doors for further research in this field. Link to the project page: https://crns-smartvision.github.io/lrwar

翻訳日:2024-02-20 20:48:24 公開日:2024-02-18

# 異種情報ネットワークにおける大規模言語モデル駆動型メタ構造発見

Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network ( http://arxiv.org/abs/2402.11518v1 )

ライセンス: Link先を確認

Lin Chen, Fengli Xu, Nian Li, Zhenyu Han, Meng Wang, Yong Li, Pan Hui

(参考訳) 不均一情報ネットワーク(HIN)は、多様なタイプのノード間の複雑な関係を捉えることができることで人気が高まっている。メタ構造は、豊かな意味情報を抽出し、グラフニューラルネットワークが表現表現を学ぶのに有効であることが証明されたHINに関する重要な関係パターンを特定するために提案された。しかし,手作りのメタ構造はスケールアップの難しさを招き,自動メタ構造探索アルゴリズムの開発に広く研究されている。以前の取り組みは、説明可能性を見越して、経験的予測性能の優れたメタ構造を探索することに集中していた。したがって、それらはしばしば、過度に適合し、人間には理解できないメタ構造を生み出す。これに対処するため、私たちは大言語モデル(llm)の創発的な推論能力からインスピレーションを得ます。本稿では,LLM推論を進化過程に統合したReasoning meta-STRUCTure search(ReStruct)フレームワークを提案する。 ReStructは文法トランスレータを使用して、メタ構造を自然言語文にエンコードし、LLMの推論能力を利用して意味論的に可能なメタ構造を評価する。 ReStructはパフォーマンス指向の進化操作も採用している。これら2つの競合する力は、メタ構造の意味的説明可能性と経験的性能を共同で最適化する。また,発見したメタ構造を自然言語で説明できる差分LLM説明器を設計し,検索履歴を通した推論により説明を洗練する。 5つのデータセットの実験では、ノード分類とリンクレコメンデーションタスクにおいて、ReStructがSOTAのパフォーマンスを達成することを示した。さらに、73人の大学院生を対象にした調査では、ReStructが生み出したメタ構造や自然言語の説明が理解しやすくなっている。

Heterogeneous information networks (HIN) have gained increasing popularity for being able to capture complex relations between nodes of diverse types. Meta-structure was proposed to identify important patterns of relations on HIN, which has been proven effective for extracting rich semantic information and facilitating graph neural networks to learn expressive representations. However, hand-crafted meta-structures pose challenges for scaling up, which draws wide research attention for developing automatic meta-structure search algorithms. Previous efforts concentrate on searching for meta-structures with good empirical prediction performance, overlooking explainability. Thus, they often produce meta-structures prone to overfitting and incomprehensible to humans. To address this, we draw inspiration from the emergent reasoning abilities of large language models (LLMs). We propose a novel REasoning meta-STRUCTure search (ReStruct) framework that integrates LLM reasoning into the evolutionary procedure. ReStruct uses a grammar translator to encode meta-structures into natural language sentences, and leverages the reasoning power of LLMs to evaluate semantically feasible meta-structures. ReStruct also employs performance-oriented evolutionary operations. These two competing forces jointly optimize for semantic explainability and empirical performance of meta-structures. We also design a differential LLM explainer that can produce natural language explanations for the discovered meta-structures, and refine the explanation by reasoning through the search history. Experiments on five datasets demonstrate that ReStruct achieves SOTA performance in node classification and link recommendation tasks. Additionally, a survey study involving 73 graduate students shows that the meta-structures and natural language explanations generated by ReStruct are substantially more comprehensible.

翻訳日:2024-02-20 20:48:01 公開日:2024-02-18

# Knowledge-to-SQL: データエキスパートLLMによるSQL生成の強化

Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM ( http://arxiv.org/abs/2402.11517v1 )

ライセンス: Link先を確認

Zijin Hong, Zheng Yuan, Hao Chen, Qinggang Zhang, Feiran Huang, Xiao Huang

(参考訳) ユーザクエリ(text-to-SQL)に対する正確なSQLの生成は、SQLの生成がクエリとデータベースを解釈し、データベースから正確なデータを取得する必要があるため、長年にわたる問題である。既存のモデルはデータベーススキーマに従ってSQLを生成するためのLLM(Large Language Models)の包括的な能力に依存している。しかし、データベーススキーマに明示的に含まれていない、あるいはllmsによって学習された必要な知識がある。したがって、生成した知識不足クエリのsqlは不正確であり、テキスト対sqlモデルのロバスト性に悪影響を及ぼす可能性がある。この状況に対処するため,データエキスパートのLLM(DELLM)を用いて,すべてのタイプのテキスト・トゥ・SQLモデルに有用な知識を提供するKnowledge-to-SQLフレームワークを提案する。具体的には,DELLMの詳細設計とテーブル読解,および基礎的な微調整プロセスについて述べる。さらに、データベースフィードバックによる強化学習(RLDBF)のトレーニング戦略を提供し、DELLMを誘導し、LLMのより有用な知識を生成する。大規模な実験により、DELLMはテキストからSQLタスクにおける最先端のLLMを強化することができる。 DELLMのモデル構造とパラメータ重量は、さらなる研究のために公表される。

Generating accurate SQL for user queries (text-to-SQL) is a long-standing problem, since generating the SQL requires comprehending the query and the database and retrieving the accurate data from the database accordingly. Existing models rely on the comprehensive ability of Large Language Models (LLMs) to generate the SQL according to the database schema. However, there is some necessary knowledge that is not explicitly included in the database schema or has been learned by LLMs. Thus, the generated SQL of the knowledge-insufficient queries may be inaccurate, which negatively impacts the robustness of the text-to-SQL models. To deal with this situation, we propose the Knowledge-to-SQL framework, which employs a tailored Data Expert LLM (DELLM) to provide helpful knowledge for all types of text-to-SQL models. Specifically, we provide the detailed design of DELLM, in terms of table reading, and the basic fine-tuning process. We further provide a Reinforcement Learning via Database Feedback (RLDBF) training strategy to guide the DELLM to generate more helpful knowledge for LLMs. Extensive experiments verify that DELLM can enhance the state-of-the-art LLMs on text-to-SQL tasks. The model structure and the parameter weights of DELLM are released for further research.
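The abstract does not say how the database feedback in RLDBF is computed. As an illustration only, one simple signal of that general kind is an execution check: run a candidate query and a reference query against the database and compare the returned rows. The sketch below uses Python's standard `sqlite3` module. The table, the column names, the "expert knowledge" rule, and the 0/1 reward are all hypothetical and are not taken from the paper.

```python
import sqlite3

def execution_feedback(conn, candidate_sql, reference_sql):
    """Return 1.0 if both queries run and yield the same rows (order ignored), else 0.0."""
    try:
        candidate_rows = conn.execute(candidate_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # the candidate query does not execute
    reference_rows = conn.execute(reference_sql).fetchall()
    return 1.0 if sorted(candidate_rows) == sorted(reference_rows) else 0.0

# Hypothetical in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (name TEXT, enrollment INTEGER, free_meal_count INTEGER)")
conn.executemany(
    "INSERT INTO schools VALUES (?, ?, ?)",
    [("North", 200, 120), ("South", 300, 60)],
)

# Hypothetical piece of expert knowledge that the schema alone does not state:
#   eligible free rate = free_meal_count / enrollment
reference = "SELECT name FROM schools WHERE 1.0 * free_meal_count / enrollment > 0.5"
with_knowledge = "SELECT name FROM schools WHERE free_meal_count * 1.0 / enrollment > 0.5"
without_knowledge = "SELECT name FROM schools WHERE free_meal_count > 0.5"

reward_good = execution_feedback(conn, with_knowledge, reference)
reward_bad = execution_feedback(conn, without_knowledge, reference)
```

Here the query written with the extra definition matches the reference result, while the query that misreads "rate" as a raw count does not, which is the kind of gap the generated knowledge is meant to close.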

翻訳日:2024-02-20 20:47:34 公開日:2024-02-18

# 深層強化学習に基づく計算流体力学におけるアクティブフロー制御のための最適並列化戦略

Optimal Parallelization Strategies for Active Flow Control in Deep Reinforcement Learning-Based Computational Fluid Dynamics ( http://arxiv.org/abs/2402.11515v1 )

ライセンス: Link先を確認

Wang Jia and Hang Xu

(参考訳) Deep Reinforcement Learning (DRL) は、高ダイナミックかつ非線形なアクティブフロー制御(AFC)問題を扱うための有望なアプローチとして登場した。しかし、DRLモデルのトレーニングに伴う計算コストは、大きなパフォーマンスボトルネックとなる。この課題に対処し、高性能コンピューティングアーキテクチャの効率的なスケーリングを実現するために、DRLベースのアルゴリズムを並列設定で最適化することに焦点を当てた。我々は、AFC問題に使用される既存の最先端DRLフレームワークを検証し、その効率ボトルネックについて議論する。その後、フレームワーク全体を分解し、個々のコンポーネントの広範なスケーラビリティベンチマークを行うことで、様々なハイブリッド並列化構成を調査し、効率的な並列化戦略を提案する。さらに,複数環境drlトレーニングにおける入出力(i/o)操作を洗練し,データ移動に伴う重要なオーバーヘッドに対処する。最後に,一般的なafc問題に対して最適化されたフレームワークを示し,そのフレームワーク全体に対してニアリニアスケーリングを求める。並列効率を約49%から約78%に大幅に向上させ,60cpuコアを用いて約47倍高速化した。これらの知見は、DRLに基づくAFC研究のさらなる進歩に有用な知見をもたらすことが期待されている。

Deep Reinforcement Learning (DRL) has emerged as a promising approach for handling highly dynamic and nonlinear Active Flow Control (AFC) problems. However, the computational cost associated with training DRL models presents a significant performance bottleneck. To address this challenge and enable efficient scaling on high-performance computing architectures, this study focuses on optimizing DRL-based algorithms in parallel settings. We validate an existing state-of-the-art DRL framework used for AFC problems and discuss its efficiency bottlenecks. Subsequently, by deconstructing the overall framework and conducting extensive scalability benchmarks for individual components, we investigate various hybrid parallelization configurations and propose efficient parallelization strategies. Moreover, we refine input/output (I/O) operations in multi-environment DRL training to tackle critical overhead associated with data movement. Finally, we demonstrate the optimized framework for a typical AFC problem where near-linear scaling can be obtained for the overall framework. We achieve a significant boost in parallel efficiency from around 49% to approximately 78%, and the training process is accelerated by approximately 47 times using 60 CPU cores. These findings are expected to provide valuable insights for further advancements in DRL-based AFC studies.
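The two headline figures in this abstract can be cross-checked with the standard definition of parallel efficiency, speedup divided by core count. The short sketch below does that arithmetic. It assumes that definition applies here, and the "implied baseline speedup" additionally assumes the 49% figure was also measured on 60 cores, which the abstract does not state.

```python
def parallel_efficiency(speedup, n_cores):
    """Parallel efficiency = speedup / number of cores (1.0 means ideal linear scaling)."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return speedup / n_cores

def implied_speedup(efficiency, n_cores):
    """Speedup that a given efficiency would correspond to on n_cores."""
    return efficiency * n_cores

optimized = parallel_efficiency(47, 60)        # figures quoted in the abstract
baseline_speedup = implied_speedup(0.49, 60)   # assumption: same 60-core setting
```

A 47-fold speedup on 60 cores gives an efficiency of about 0.78, consistent with the reported "approximately 78%".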

翻訳日:2024-02-20 20:47:12 公開日:2024-02-18

# 偏見からパリティへ: 大きな言語モデルによる単語埋め込みのデバイアスに対する新しいアプローチ

From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings ( http://arxiv.org/abs/2402.11512v1 )

ライセンス: Link先を確認

Aishik Rakshit, Smriti Singh, Shuvam Keshari, Arijit Ghosh Chowdhury, Vinija Jain, Aman Chadha

(参考訳) 埋め込みは、大規模言語モデルの有効性において重要な役割を果たす。これらのモデルが文脈的関係を把握し、言語に対するよりニュアンス的な理解を育み、その結果、人間言語の基本的な理解を必要とする多くの複雑なタスクにおいて、著しく機能する基盤となる。これらの埋め込み自体がしばしばバイアスを反映または表象していることを考えると、これらのモデルが必然的にこのバイアスを学習する理由である。本研究では,これまでの精巧な研究に基づいて,ニューラルネットワークを用いて「ソフトデバイアス」を行うアルゴリズムであるdeepsoftdebiasを提案する。我々はこのアルゴリズムを様々なSOTAデータセット、精度メトリクス、難解なNLPタスクで徹底的に評価する。 DeepSoftDebiasは、性別、人種、宗教の偏見を減らし、最先端の手法よりも優れています。

Embeddings play a pivotal role in the efficacy of Large Language Models. They are the bedrock on which these models grasp contextual relationships and foster a more nuanced understanding of language and consequently perform remarkably on a plethora of complex tasks that require a fundamental understanding of human language. Given that these embeddings themselves often reflect or exhibit bias, it stands to reason that these models may also inadvertently learn this bias. In this work, we build on the seminal previous work and propose DeepSoftDebias, an algorithm that uses a neural network to perform `soft debiasing'. We exhaustively evaluate this algorithm across a variety of SOTA datasets, accuracy metrics, and challenging NLP tasks. We find that DeepSoftDebias outperforms the current state-of-the-art methods at reducing bias across gender, race, and religion.

翻訳日:2024-02-20 20:46:50 公開日:2024-02-18

# 胸部X線セグメンテーションマスクにおける肺領域の過小評価 : CTによる肺総容積評価との比較

Underestimation of lung regions on chest X-ray segmentation masks assessed by comparison with total lung volume evaluated on computed tomography ( http://arxiv.org/abs/2402.11510v1 )

ライセンス: Link先を確認

Przemysław Bombiński, Patryk Szatkowski, Bartłomiej Sobieski, Tymoteusz Kwieciński, Szymon Płotka, Mariusz Adamek, Marcin Banasiuk, Mariusz I. Furmanek, Przemysław Biecek

(参考訳) 肺マスクの作成には明確な基準や標準化されたガイドラインが欠如しており、アノテータ間の主観性が高い。本研究では, 現在の最新手法に従って作成された胸部X線セグメンテーションマスクにおける肺領域の過小評価を, CTで評価した肺総容積との比較により評価した。心臓, 縦隔, 横隔膜の輪郭に沿って作成された肺X線マスクは, 肺領域を著しく過小評価し, 肺のかなりの部分をさらなる評価から除外するため, 臨床上の誤りを多数生じさせる可能性があることを示す。

Lung mask creation lacks well-defined criteria and standardized guidelines, leading to a high degree of subjectivity between annotators. In this study, we assess the underestimation of lung regions on chest X-ray segmentation masks created according to the current state-of-the-art method, by comparison with total lung volume evaluated on computed tomography (CT). We show that lung X-ray masks created by following the contours of the heart, mediastinum, and diaphragm significantly underestimate lung regions and exclude substantial portions of the lungs from further assessment, which may result in numerous clinical errors.

翻訳日:2024-02-20 20:46:34 公開日:2024-02-18

# MAL:自己監督深度推定のための時間・蒸留ヒント付き運動認識損失

MAL: Motion-Aware Loss with Temporal and Distillation Hints for Self-Supervised Depth Estimation ( http://arxiv.org/abs/2402.11507v1 )

ライセンス: Link先を確認

Yup-Jiang Dong, Fang-Lue Zhang, Song-Hai Zhang

(参考訳) 深度知覚は、幅広いロボット応用に不可欠である。大規模でラベルのない実世界のデータを活用できるため,多フレーム自己監督深度推定手法が研究の関心を集めている。しかし、自己教師型手法は静的シーンの仮定に依存し、動的環境において性能が劣化する傾向がある。そこで本研究では,連続した入力フレーム間の時間的関係と,教師と生徒ネットワーク間の新たな蒸留方式を活用し,マルチフレーム自己教師奥行き推定手法を提案する。具体的には,移動物体の空間位置と入力フレームの時間順序を関連付け,物体の動きによる誤差を解消する。一方,マルチフレーム方式では元の蒸留スキームを強化し,教師ネットワークからの知識をより活用する。 MALはマルチフレームの自己監督型単眼深度推定手法にシームレスに統合するために設計された新しいプラグアンドプレイモジュールである。従来の最先端手法にMALを追加すると、KITTIとCityScapesベンチマークでそれぞれ4.2%と10.8%の深さ推定誤差が減少する。

Depth perception is crucial for a wide range of robotic applications. Multi-frame self-supervised depth estimation methods have gained research interest due to their ability to leverage large-scale, unlabeled real-world data. However, self-supervised methods often rely on the assumption of a static scene, and their performance tends to degrade in dynamic environments. To address this issue, we present Motion-Aware Loss (MAL), which leverages the temporal relation among consecutive input frames and a novel distillation scheme between the teacher and student networks in multi-frame self-supervised depth estimation methods. Specifically, we associate the spatial locations of moving objects with the temporal order of input frames to eliminate errors induced by object motion. Meanwhile, we enhance the original distillation scheme in multi-frame methods to better exploit the knowledge from a teacher network. MAL is a novel, plug-and-play module designed for seamless integration into multi-frame self-supervised monocular depth estimation methods. Adding MAL into previous state-of-the-art methods leads to a reduction in depth estimation errors by up to 4.2% and 10.8% on KITTI and CityScapes benchmarks, respectively.

翻訳日:2024-02-20 20:46:20 公開日:2024-02-18

# 不均質な言語課題とクライアントリソースに基づく大規模言語モデルのフェデレーション微調整

Federated Fine-tuning of Large Language Models under Heterogeneous Language Tasks and Client Resources ( http://arxiv.org/abs/2402.11505v1 )

ライセンス: Link先を確認

Jiamu Bai, Daoyuan Chen, Bingchen Qian, Liuyi Yao, Yaliang Li

(参考訳) Federated Learning (FL) は、最近、LLM(Large Language Models)のパラメータ効率の高い微調整に応用されている。将来性はあるものの、クライアントの異種リソースとデータ分散による大きな課題を提起する。本研究では、llmの微調整のためのシンプルで効果的な集約スキームflexloraを紹介している。これは、リソース不足の参加者の能力に結びつけることで、豊富なリソースを持つクライアントの可能性を制限する従来のflの「バケット効果」を緩和する。 FlexLoRAはローカルなLoRAランクの動的調整を可能にし、より広範でタスク固有の知識の少ないグローバルモデルの開発を促進する。個々のクライアントからのコントリビューションからフルサイズのLoRA重みを合成し、重量再分配にSingular Value Decomposition(SVD)を採用することで、FlexLoRAは異種クライアントリソースを完全に活用する。 1,600以上のクライアントが多様なNLPタスクを担い、この実験はFlexLoRAの有効性を検証し、フェデレートされたグローバルモデルにより、下流のNLPタスクのパフォーマンスが3.1%向上した。 FlexLoRAの実用性は、既存のLoRAベースのFLメソッドとシームレスに統合され、LLMのスケーラブルでプライバシ保護されたフェデレーションチューニングへの道を提供する理論解析によってさらに強調されている。

Federated Learning (FL) has recently been applied to the parameter-efficient fine-tuning of Large Language Models (LLMs). While promising, it raises significant challenges due to the heterogeneous resources and data distributions of clients. This study introduces FlexLoRA, a simple yet effective aggregation scheme for LLM fine-tuning, which mitigates the "buckets effect" in traditional FL that restricts the potential of clients with ample resources by tying them to the capabilities of the least-resourced participants. FlexLoRA allows for dynamic adjustment of local LoRA ranks, fostering the development of a global model imbued with broader, less task-specific knowledge. By synthesizing a full-size LoRA weight from individual client contributions and employing Singular Value Decomposition (SVD) for weight redistribution, FlexLoRA fully leverages heterogeneous client resources. Involving over 1,600 clients performing diverse NLP tasks, our experiments validate the efficacy of FlexLoRA, with the federated global model achieving up to a 3.1% average improvement in downstream NLP task performance. FlexLoRA's practicality is further underscored by its seamless integration with existing LoRA-based FL methods and theoretical analysis, offering a path toward scalable, privacy-preserving federated tuning for LLMs.
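The aggregation step described above (build a full-size LoRA weight from client contributions, then redistribute it with SVD) can be written out roughly as follows. This is a sketch of one natural reading of the abstract, not the paper's exact formulation. The sample-size weights $n_i / N$, the symbol names, and the choice to fold the singular values into $B_i'$ are assumptions.

```latex
% Client i holds LoRA factors B_i (d x r_i) and A_i (r_i x k) with its own rank r_i.
% n_i is the (assumed) number of local samples and N = \sum_i n_i.
\begin{align}
  \Delta W &= \sum_{i} \frac{n_i}{N}\, B_i A_i
    && \text{(full-size weight built from client contributions)} \\
  \Delta W &= U \Sigma V^{\top}
    && \text{(singular value decomposition)} \\
  B_i' &= U_{[:,\,1:r_i]}\, \Sigma_{[1:r_i,\,1:r_i]}, \quad
  A_i' = V^{\top}_{[1:r_i,\,:]}
    && \text{(rank-$r_i$ factors returned to client $i$)}
\end{align}
```

Under this reading, a client with a larger rank $r_i$ receives a closer approximation of $\Delta W$, so well-resourced clients are not capped at the smallest rank in the federation.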

翻訳日:2024-02-20 20:45:59 公開日:2024-02-18

# プロプライエタリなストリートビュー画像を(健康と場所)調査で使うか、使わないか? それが問題です

To use or not to use proprietary street view images in (health and place) research? That is the question ( http://arxiv.org/abs/2402.11504v1 )

ライセンス: Link先を確認

Marco Helbich, Matthew Danish, SM Labib, Britta Ricker

(参考訳) コンピュータビジョンに基づくストリートビュー画像の解析は環境アセスメントに変化をもたらす。インタラクティブなWebサービス、特にGoogleストリートビューは、画像データをユビキタスにするための重要な役割を果たす。何百万ものGoogleストリートビュー画像を利用する技術的容易さにもかかわらず、この記事は、このプロプライエタリなデータソースを使用する際の現在のプラクティスに疑問を投げかける。画像の大量ダウンロードやストリートビュー画像ベースのインデックスの生成を禁止しているGoogleのサービス規約に懸念があります。データライセンス契約と法的整合性を維持しつつ研究を画期的に進めることによる社会の発展の課題を解決するためには、オープンデータ原則に固執し、将来の研究にオープンイメージソースを活用することが不可欠である。

Computer vision-based analysis of street view imagery has transformative impacts on environmental assessments. Interactive web services, particularly Google Street View, play an ever-important role in making imagery data ubiquitous. Despite the technical ease of harnessing millions of Google Street View images, this article questions the current practices in using this proprietary data source. Our concern lies with Google's terms of service, which prohibit bulk image downloads and the generation of street view image-based indices. To reconcile the challenge of advancing society through groundbreaking research while maintaining data license agreements and legal integrity, it is crucial to adhere to open data principles and utilize open image sources for future research.

翻訳日:2024-02-20 20:45:33 公開日:2024-02-18

# GenAD: 次世代のエンドツーエンド自動運転

GenAD: Generative End-to-End Autonomous Driving ( http://arxiv.org/abs/2402.11502v1 )

ライセンス: Link先を確認

Wenzhao Zheng, Ruiqi Song, Xianda Guo, Long Chen

(参考訳) 生センサによる計画結果を直接生成することは、自動運転の長年望まれてきたソリューションであり、近年注目を集めている。既存のエンドツーエンドの自動運転手法の多くは、この問題を知覚、運動予測、計画に分解している。しかし、従来のプログレッシブパイプラインは、例えば、エゴカーと他の交通参加者と、それ以前の構造軌道との間の将来の相互作用など、交通進化過程全体を包括的にモデル化することはできない。本稿では,エゴカーと周辺環境が過去の場面でどのように進化するかを予測するために,エンド・ツー・エンドの自動運転の新しいパラダイムを探求する。我々は、自律運転を生成モデル問題に投入する生成フレームワークGenADを提案する。まず,周辺シーンをmap-awareインスタンストークンに変換するインスタンス中心のシーントークン化器を提案する。次に、変動オートエンコーダを用いて、軌道先行モデリングのための構造潜在空間における将来の軌道分布を学習する。さらに, 潜伏空間におけるエージェントとエゴの動きを捉えるための時間モデルを採用し, より効果的な将来の軌跡を生成する。最後にgenadは、インスタンストークンに条件付けされた学習構造潜在空間の分布をサンプリングし、学習時間モデルを使用して未来を生成することで、動作予測と計画を同時に行う。広く使用されているnuScenesベンチマークの大規模な実験により、提案されたGenADは、高効率でビジョン中心のエンドツーエンド自動運転における最先端のパフォーマンスを達成することが示された。

Directly producing planning results from raw sensors has been a long-desired solution for autonomous driving and has attracted increasing attention recently. Most existing end-to-end autonomous driving methods factorize this problem into perception, motion prediction, and planning. However, we argue that the conventional progressive pipeline still cannot comprehensively model the entire traffic evolution process, e.g., the future interaction between the ego car and other traffic participants and the structural trajectory prior. In this paper, we explore a new paradigm for end-to-end autonomous driving, where the key is to predict how the ego car and the surroundings evolve given past scenes. We propose GenAD, a generative framework that casts autonomous driving into a generative modeling problem. We propose an instance-centric scene tokenizer that first transforms the surrounding scenes into map-aware instance tokens. We then employ a variational autoencoder to learn the future trajectory distribution in a structural latent space for trajectory prior modeling. We further adopt a temporal model to capture the agent and ego movements in the latent space to generate more effective future trajectories. GenAD finally simultaneously performs motion prediction and planning by sampling distributions in the learned structural latent space conditioned on the instance tokens and using the learned temporal model to generate futures. Extensive experiments on the widely used nuScenes benchmark show that the proposed GenAD achieves state-of-the-art performance on vision-centric end-to-end autonomous driving with high efficiency.

翻訳日:2024-02-20 20:45:19 公開日:2024-02-18

# 基礎モデルを用いた複雑なロボット指導の検証

Verifiably Following Complex Robot Instructions with Foundation Models ( http://arxiv.org/abs/2402.11498v1 )

ライセンス: Link先を確認

Benedict Quartey, Eric Rosen, Stefanie Tellex, George Konidaris

Enabling robots to follow complex natural language instructions is an important yet challenging problem. People want to flexibly express constraints, refer to arbitrary landmarks, and verify behavior when instructing robots. Conversely, robots must disambiguate human instructions into specifications and ground instruction referents in the real world. We propose Language Instruction grounding for Motion Planning (LIMP), a system that leverages foundation models and temporal logics to generate instruction-conditioned semantic maps that enable robots to verifiably follow expressive and long-horizon instructions with open-vocabulary referents and complex spatiotemporal constraints. In contrast to prior methods for using foundation models in robot task execution, LIMP constructs an explainable instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of robot behaviors that are correct by construction. We demonstrate LIMP in three real-world environments, across a set of 35 complex spatiotemporal instructions, showing the generality of our approach and the ease of deployment in novel unstructured domains. In our experiments, LIMP can spatially ground open-vocabulary referents and synthesize constraint-satisfying plans in 90% of object-goal navigation and 71% of mobile manipulation instructions. See supplementary videos at https://robotlimp.github.io
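
To make the idea of a verifiable plan more concrete, the following is a minimal Python sketch that checks a finite sequence of visited regions against two simple temporal constraints: visit one region before another, and never enter a forbidden region. The function names and region labels are hypothetical, and this is not LIMP's implementation; the abstract only states that foundation models are combined with temporal logics.

```python
from typing import Sequence


def visits_in_order(trace: Sequence[str], first: str, then: str) -> bool:
    """True if `first` occurs and `then` occurs at some later step."""
    steps = list(trace)
    if first not in steps:
        return False
    start = steps.index(first)
    return then in steps[start + 1:]


def always_avoids(trace: Sequence[str], forbidden: str) -> bool:
    """True if `forbidden` never appears in the trace."""
    return forbidden not in trace


def satisfies(trace: Sequence[str], first: str, then: str, forbidden: str) -> bool:
    """Conjunction of the ordering constraint and the avoidance constraint."""
    return visits_in_order(trace, first, then) and always_avoids(trace, forbidden)


# Hypothetical plan expressed as a sequence of region labels.
plan = ["hallway", "kitchen", "hallway", "office"]
print(satisfies(plan, "kitchen", "office", "stairs"))
```

A plan that fails either check can be rejected before execution, which is the sense in which a specification makes behavior checkable.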

Translation date: 2024-02-20 20:44:54 Publication date: 2024-02-18

# Thyroid ultrasound diagnosis improvement via multi-view self-supervised learning and two-stage pre-training ( http://arxiv.org/abs/2402.11497v1 )

License: see the linked page

Jian Wang, Xin Yang, Xiaohong Jia, Wufeng Xue, Rusi Chen, Yanlin Chen, Xiliang Zhu, Lian Liu, Yan Cao, Jianqiao Zhou, Dong Ni, Ning Gu

Thyroid nodule classification and segmentation in ultrasound images are crucial for computer-aided diagnosis; however, they are limited by insufficient labeled data. In this study, we propose a multi-view contrastive self-supervised method to improve thyroid nodule classification and segmentation performance with limited manual labels. Our method aligns the transverse and longitudinal views of the same nodule, thereby enabling the model to focus more on the nodule area. We design an adaptive loss function that removes the limitations imposed by paired data. Additionally, we adopt two-stage pre-training to exploit pre-training on both ImageNet and thyroid ultrasound images. Extensive experiments were conducted on a large-scale dataset collected from multiple centers. The results show that the proposed method significantly improves nodule classification and segmentation performance with limited manual labels and outperforms state-of-the-art self-supervised methods. The two-stage pre-training also significantly exceeds ImageNet pre-training.

Translation date: 2024-02-20 20:44:30 Publication date: 2024-02-18

# URLBERT: A Contrastive and Adversarial Pre-trained Model for URL Classification ( http://arxiv.org/abs/2402.11495v1 )

License: see the linked page

Yujie Li, Yanbin Wang, Haitao Xu, Zhenhao Guo, Zheng Cao, Lun Zhang

URLs play a crucial role in understanding and categorizing web content, particularly in tasks related to security control and online recommendations. While pre-trained models currently dominate various fields, the domain of URL analysis still lacks specialized pre-trained models. To address this gap, this paper introduces URLBERT, the first pre-trained representation learning model applied to a variety of URL classification or detection tasks. We first train a URL tokenizer on a corpus of billions of URLs to address URL data tokenization. Additionally, we propose two novel pre-training tasks: (1) self-supervised contrastive learning tasks, which strengthen the model's understanding of URL structure and the capture of category differences by distinguishing different variants of the same URL; (2) virtual adversarial training, aimed at improving the model's robustness in extracting semantic features from URLs. Finally, our proposed methods are evaluated on tasks including phishing URL detection, web page classification, and ad filtering, achieving state-of-the-art performance. Importantly, we also explore multi-task learning with URLBERT, and experimental results demonstrate that a multi-task learning model based on URLBERT is as effective as independently fine-tuned models, showing the simplicity of URLBERT in handling complex task requirements. The code for our work is available at https://github.com/Davidup1/URLBERT.

Translation date: 2024-02-20 20:44:14 Publication date: 2024-02-18

# Graph Out-of-Distribution Generalization via Causal Intervention ( http://arxiv.org/abs/2402.11494v1 )

License: see the linked page

Qitian Wu, Fan Nie, Chenxiao Yang, Tianyi Bao, Junchi Yan

Out-of-distribution (OOD) generalization has gained increasing attention for learning on graphs, as graph neural networks (GNNs) often exhibit performance degradation under distribution shifts. The challenge is that distribution shifts on graphs involve intricate interconnections between nodes, and environment labels are often absent from the data. In this paper, we adopt a bottom-up data-generative perspective and reveal a key observation through causal analysis: the crux of GNNs' failure in OOD generalization lies in the latent confounding bias from the environment. This bias misguides the model into leveraging environment-sensitive correlations between ego-graph features and target nodes' labels, resulting in undesirable generalization on new unseen nodes. Building on this analysis, we introduce a conceptually simple yet principled approach for training robust GNNs under node-level distribution shifts, without prior knowledge of environment labels. Our method uses a new learning objective derived from causal inference that coordinates an environment estimator and a mixture-of-experts GNN predictor. The new approach can counteract the confounding bias in training data and facilitate learning generalizable predictive relations. Extensive experiments demonstrate that our model can effectively enhance generalization under various types of distribution shifts and yield up to 27.4% accuracy improvement over the state of the art on graph OOD generalization benchmarks. Source code is available at https://github.com/fannie1208/CaNet.
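
The coordination of an environment estimator with a mixture-of-experts predictor can be illustrated with a small Python sketch: the estimator outputs logits over pseudo-environments, and the prediction is the probability-weighted sum of the expert outputs. All names and the toy components here are assumptions made for illustration; the abstract does not specify the actual architecture or objective.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_predict(
    x: Sequence[float],
    env_logits: Callable[[Sequence[float]], Sequence[float]],
    experts: Sequence[Callable[[Sequence[float]], float]],
) -> float:
    """Weight each expert's prediction by the estimated environment probability."""
    weights = softmax(env_logits(x))
    return sum(w * expert(x) for w, expert in zip(weights, experts))


# Hypothetical toy components: two pseudo-environments and two scalar experts.
def toy_env_logits(x: Sequence[float]) -> List[float]:
    return [x[0], -x[0]]


def expert_sum(x: Sequence[float]) -> float:
    return sum(x)


def expert_neg_sum(x: Sequence[float]) -> float:
    return -sum(x)


print(mixture_predict([0.0, 1.0], toy_env_logits, [expert_sum, expert_neg_sum]))
```

In a real GNN the experts would be message-passing branches and the estimator would be learned jointly; the sketch only shows how the two pieces combine at prediction time.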

Translation date: 2024-02-20 20:43:48 Publication date: 2024-02-18

# BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models ( http://arxiv.org/abs/2402.11573v1 )

License: see the linked page

Kun Luo and Zheng Liu and Shitao Xiao and Kang Liu

Large language models (LLMs) call for extension of context to handle many critical applications. However, existing approaches tend to be expensive and to deliver inferior quality of context extension. In this work, we propose Extensible Embedding, which realizes high-quality extension of an LLM's context with strong flexibility and cost-effectiveness. Extensible embedding is an enhancement of the typical token embedding: it represents the information for an extensible scope of context instead of a single token. By leveraging such compact input units of higher information density, the LLM can access a vast scope of context even with a small context window. Extensible embedding is systematically optimized in architecture and training method, which leads to multiple advantages. 1) High flexibility of context extension, which supports ad-hoc extension to diverse context lengths. 2) Strong sample efficiency of training, which enables the embedding model to be learned in a cost-effective way. 3) Superior compatibility with existing LLMs, where the extensible embedding can be seamlessly introduced as a plug-in component. Comprehensive evaluations on long-context language modeling and understanding tasks verify extensible embedding as an effective, efficient, flexible, and compatible method to extend the LLM's context.

Translation date: 2024-02-20 20:36:19 Publication date: 2024-02-18

# LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration ( http://arxiv.org/abs/2402.11550v1 )

License: see the linked page

Jun Zhao, Can Zu, Hao Xu, Yi Lu, Wei He, Yiwen Ding, Tao Gui, Qi Zhang, Xuanjing Huang

Large language models (LLMs) have demonstrated impressive performance in understanding language and executing complex reasoning tasks. However, LLMs with long context windows are notorious for their expensive training costs and high inference latency. Even the most advanced models such as GPT-4 and Claude2 often make mistakes when processing inputs of over 100k tokens, a phenomenon also known as "lost in the middle". In this paper, we propose LongAgent, a method based on multi-agent collaboration, which scales LLMs (e.g., LLaMA) to a context of 128K and demonstrates potential superiority in long-text processing compared to GPT-4. In LongAgent, a leader is responsible for understanding user intent and directing team members to acquire information from documents. Due to members' hallucinations, it is non-trivial for a leader to obtain accurate information from the responses of dozens to hundreds of members. To address this, we develop an inter-member communication mechanism to resolve response conflicts caused by hallucinations through information sharing. Our experimental results indicate that LongAgent offers a promising alternative for long-text processing. The agent team instantiated with LLaMA-7B achieves significant improvements over GPT-4 in tasks such as 128k-long text retrieval and multi-hop question answering.

Translation date: 2024-02-20 20:35:58 Publication date: 2024-02-18

# Syntactic Language Change in English and German: Metrics, Parsers, and Convergences ( http://arxiv.org/abs/2402.11549v1 )

License: see the linked page

Yanran Chen, Wei Zhao, Anne Breitbarth, Manuel Stoeckel, Alexander Mehler, Steffen Eger

Many studies have shown that human languages tend to optimize for lower complexity and increased communication efficiency. Syntactic dependency distance, which measures the linear distance between dependent words, is often considered a key indicator of language processing difficulty and working memory load. The current paper looks at diachronic trends in syntactic language change in both English and German, using corpora of parliamentary debates from the last c. 160 years. We base our observations on five dependency parsers, including the widely used Stanford CoreNLP as well as four newer alternatives. Our analysis of syntactic language change goes beyond linear dependency distance and explores 15 metrics relevant to dependency distance minimization (DDM) and/or based on tree graph properties, such as the tree height and degree variance. Even though we have evidence that recent parsers trained on modern treebanks are not heavily affected by data 'noise' such as spelling changes and OCR errors in our historic data, we find that results of syntactic language change are sensitive to the parsers involved, which is a caution against using a single parser for evaluating syntactic language change as done in previous work. We also show that syntactic language change over the time period investigated is largely similar between English and German across the different metrics explored: only 4% of the cases we examine yield opposite conclusions regarding upward and downward trends of syntactic metrics across German and English. We also show that changes in syntactic measures seem to be more frequent at the tails of sentence length distributions. To the best of our knowledge, ours is the most comprehensive analysis of syntactic language change using modern NLP technology in recent corpora of English and German.
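
Two of the quantities named above, dependency distance and tree height, are easy to compute from a head-index encoding of a parse. The sketch below assumes a CoNLL-style convention (1-based head positions, 0 for the root) and uses a hand-annotated toy sentence; it is an illustration of the metrics, not the paper's pipeline.

```python
from typing import List


def mean_dependency_distance(heads: List[int]) -> float:
    """Average |token position - head position| over all non-root tokens.

    `heads[i]` is the 1-based position of the head of token i + 1; 0 marks the root.
    """
    distances = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances)


def tree_height(heads: List[int]) -> int:
    """Number of edges on the longest path from the root to any token."""
    def depth(token: int) -> int:
        d = 0
        while heads[token - 1] != 0:
            token = heads[token - 1]
            d += 1
        return d

    return max(depth(t) for t in range(1, len(heads) + 1))


# "The dog chased the cat": The->dog, dog->chased, chased is the root,
# the->cat, cat->chased (a hand-made parse used only for illustration).
heads = [2, 3, 0, 5, 3]
print(mean_dependency_distance(heads), tree_height(heads))
```

Because both functions depend only on the head indices, the same sentence parsed by two different parsers can yield different values, which is the parser sensitivity the abstract warns about.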

Translation date: 2024-02-20 20:35:34 Publication date: 2024-02-18

# KMMLU: Measuring Massive Multitask Language Understanding in Korean ( http://arxiv.org/abs/2402.11548v1 )

License: see the linked page

Guijin Son and Hanwool Lee and Sungdong Kim and Seungone Kim and Niklas Muennighoff and Taekyoon Choi and Cheonbok Park and Kang Min Yoo and Stella Biderman

We propose KMMLU, a new Korean benchmark with 35,030 expert-level multiple-choice questions across 45 subjects ranging from humanities to STEM. Unlike previous Korean benchmarks that are translated from existing English benchmarks, KMMLU is collected from original Korean exams, capturing linguistic and cultural aspects of the Korean language. We test 26 publicly available and proprietary LLMs, identifying significant room for improvement. The best publicly available model achieves 50.54% on KMMLU, far below the average human performance of 62.6%. This model was primarily trained for English and Chinese, not Korean. Current LLMs tailored to Korean, such as Polyglot-Ko, perform far worse. Surprisingly, even the most capable proprietary LLMs, e.g., GPT-4 and HyperCLOVA X, achieve only 59.95% and 53.40%, respectively. This suggests that further work is needed to improve Korean LLMs, and KMMLU offers the right tool to track this progress. We make our dataset publicly available on the Hugging Face Hub and integrate the benchmark into EleutherAI's Language Model Evaluation Harness.

Translation date: 2024-02-20 20:35:05 Publication date: 2024-02-18

# Counter-intuitive: Large Language Models Can Better Understand Knowledge Graphs Than We Thought ( http://arxiv.org/abs/2402.11541v1 )

License: see the linked page

Xinbang Dai, Yuncheng Hua, Tongtong Wu, Yang Sheng, Guilin Qi

Although enhancing the reasoning ability of large language models (LLMs) and reducing their hallucinations through the use of knowledge graphs (KGs) has received widespread attention, how to enable LLMs to integrate the structured knowledge in KGs on the fly remains underexplored. Researchers often co-train KG embeddings and LLM parameters to equip LLMs with the ability to comprehend KG knowledge. However, this resource-hungry training paradigm significantly increases the model learning cost and is also unsuitable for non-open-source, black-box LLMs. In this paper, we employ complex question answering (CQA) as a task to assess the LLM's ability to comprehend KG knowledge. We conduct a comprehensive comparison of KG knowledge injection methods (from triples to natural language text), aiming to explore the optimal prompting method for supplying KG knowledge to LLMs, thereby enhancing their comprehension of KGs. Contrary to our initial expectations, our analysis reveals that LLMs effectively handle messy, noisy, and linearized KG knowledge, outperforming methods that employ well-designed natural language (NL) textual prompts. This counter-intuitive finding provides substantial insights for future research on LLMs' comprehension of structured knowledge.
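
As an illustration of what supplying linearized triples to a model can look like, here is a minimal Python sketch that renders triples one per line and places them ahead of the question. The prompt template, function names, and relation labels are hypothetical; the abstract does not give the exact formats that were compared.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def linearize_triples(triples: Iterable[Triple]) -> str:
    """Render each triple as one '(head, relation, tail)' line."""
    return "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)


def build_prompt(triples: Iterable[Triple], question: str) -> str:
    """Place the linearized triples ahead of the question in one prompt string."""
    return f"Knowledge:\n{linearize_triples(triples)}\n\nQuestion: {question}\nAnswer:"


# Toy facts used only to show the output format.
facts: List[Triple] = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Warsaw", "capital_of", "Poland"),
]
print(build_prompt(facts, "In which country was Marie Curie born?"))
```

The alternative the abstract contrasts this with would first rewrite the same triples as fluent sentences before building the prompt.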

Translation date: 2024-02-20 20:34:48 Publication date: 2024-02-18

# CPN: Complementary Proposal Network for Unconstrained Text Detection ( http://arxiv.org/abs/2402.11540v1 )

License: see the linked page

Longhuang Wu, Shangxuan Tian, Youxin Wang, Pengfei Xiong

Existing methods for scene text detection can be divided into two paradigms: segmentation-based and anchor-based. While segmentation-based methods are well suited to irregular shapes, they struggle with compact or overlapping layouts. Conversely, anchor-based approaches excel at complex layouts but struggle with irregular shapes. To combine their merits and overcome their respective demerits, we propose a Complementary Proposal Network (CPN) that seamlessly integrates semantic and geometric information in parallel for superior performance. The CPN comprises two efficient networks for proposal generation: the Deformable Morphology Semantic Network, which generates semantic proposals employing an innovative deformable morphological operator, and the Balanced Region Proposal Network, which produces geometric proposals with pre-defined anchors. To further enhance the complementarity, we introduce an Interleaved Feature Attention module that enables semantic and geometric features to interact deeply before proposal generation. By leveraging both complementary proposals and features, CPN outperforms state-of-the-art approaches by significant margins at comparable computation cost. Specifically, our approach achieves improvements of 3.6%, 1.3% and 1.0% on the challenging benchmarks ICDAR19-ArT, IC15, and MSRA-TD500, respectively. Code for our method will be released.

Translation date: 2024-02-20 20:34:23 Publication date: 2024-02-18

# PASCL: Supervised Contrastive Learning with Perturbative Augmentation for Particle Decay Reconstruction ( http://arxiv.org/abs/2402.11538v1 )

License: see the linked page

Junjian Lu, Siwei Liu, Dmitrii Kobylianski, Etienne Dreyer, Eilam Gross, Shangsong Liang

In high-energy physics, particles produced in collision events decay in the form of a hierarchical tree structure, where only the final decay products can be observed using detectors. However, the large combinatorial space of possible tree structures makes it challenging to recover the actual decay process given a set of final particles. To better analyse the hierarchical tree structure, we propose a graph-based deep learning model that infers the tree structure in order to reconstruct collision events. In particular, we use a compact matrix representation, termed the lowest common ancestor generations (LCAG) matrix, to encode the particle decay tree structure. Then, we introduce a perturbative augmentation technique applied to node features, aiming to mimic experimental uncertainties and increase data diversity. We further propose a supervised graph contrastive learning algorithm to utilize the information of inter-particle relations from multiple decay processes. Extensive experiments show that our proposed supervised graph contrastive learning with perturbative augmentation (PASCL) method outperforms state-of-the-art baseline models on an existing physics-based dataset, significantly improving the reconstruction accuracy. This method provides a more effective training strategy for models with the same parameters and paves the way for more accurate and efficient high-energy particle physics data analysis.
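
The LCAG encoding mentioned above can be sketched in a few lines of Python. The generation convention used here (leaves at generation 0, an internal node at 1 plus the maximum generation of its children, zeros on the diagonal) is an assumption made for this illustration, as are the function name and the toy tree; the abstract does not spell out the exact definition.

```python
from typing import Dict, List, Optional


def lcag_matrix(parent: Dict[str, Optional[str]], leaves: List[str]) -> List[List[int]]:
    """Entry (i, j) is the generation of the lowest common ancestor of leaves i and j."""
    children: Dict[str, List[str]] = {}
    for node, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(node)

    def generation(node: str) -> int:
        # Assumed convention: leaves are 0, internal nodes are 1 + max over children.
        kids = children.get(node, [])
        return 0 if not kids else 1 + max(generation(k) for k in kids)

    def ancestors(node: str) -> List[str]:
        # The node itself, then its parent, and so on up to the root.
        path = [node]
        cur = parent[node]
        while cur is not None:
            path.append(cur)
            cur = parent[cur]
        return path

    matrix = []
    for a in leaves:
        on_path = set(ancestors(a))
        row = []
        for b in leaves:
            lca = next(n for n in ancestors(b) if n in on_path)
            row.append(generation(lca))
        matrix.append(row)
    return matrix


# Hypothetical decay tree: root -> (x -> (p1, p2), p3)
toy_parent = {"root": None, "x": "root", "p1": "x", "p2": "x", "p3": "root"}
print(lcag_matrix(toy_parent, ["p1", "p2", "p3"]))
```

The matrix is indexed only by the observed final-state particles, which is what makes it a compact target for a model that never sees the intermediate nodes.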

Translation date: 2024-02-20 20:34:00 Publication date: 2024-02-18

# Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning ( http://arxiv.org/abs/2402.11537v1 )

License: see the linked page

Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Zhouhao Sun, Jun Shi, Ting Liu and Bing Qin

Through pretraining on a corpus drawn from various sources, Large Language Models (LLMs) have achieved impressive performance. However, the impact of each component of the pretraining corpus remains opaque. As a result, the organization of the pretraining corpus is still empirical and may deviate from the optimal. To address this issue, we systematically analyze the impact of 48 datasets from 5 major categories of LLM pretraining data and measure their impact on LLMs using benchmarks covering nine major categories of model capabilities. Our analyses provide empirical results about the contribution of multiple corpora to the performance of LLMs, along with their joint impact patterns, including complementary, orthogonal, and correlational relationships. We also identify a set of "high-impact data", such as Books, that is significantly related to a set of model capabilities. These findings provide insights into the organization of data to support more efficient pretraining of LLMs.

Translation date: 2024-02-20 20:33:39 Publication date: 2024-02-18

# PreAct: Predicting Future in ReAct Enhances Agent's Planning Ability ( http://arxiv.org/abs/2402.11534v1 )

License: see the linked page

Dayuan Fu, Jianzhao Huang, Siyuan Lu, Guanting Dong, Yejie Wang, Keqing He, Weiran Xu

Addressing the discrepancies between predictions and actual outcomes often helps individuals expand their thought processes and engage in reflection, thereby steering their reasoning in the correct direction. In this paper, we introduce PreAct, an agent framework that integrates prediction with reasoning and action. Leveraging the information provided by predictions, a large language model (LLM) based agent can offer more diversified and strategically oriented reasoning, which in turn leads to more effective actions that help the agent complete complex tasks. Our experiments demonstrate that PreAct outperforms the ReAct approach in accomplishing complex tasks and that PreAct can be further enhanced when combined with Reflexion methods. We prompt the model with different numbers of historical predictions and find that historical predictions have a sustained positive effect on LLM planning. The differences in single-step reasoning between PreAct and ReAct show that PreAct indeed offers advantages over ReAct in terms of diversity and strategic directivity.

Translation date: 2024-02-20 20:33:24 Publication date: 2024-02-18

# Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models ( http://arxiv.org/abs/2402.11532v1 )

License: see the linked page

Shirley Anugrah Hayati, Taehee Jung, Tristan Bodding-Long, Sudipta Kar, Abhinav Sethy, Joo-Kyung Kim, Dongyeop Kang

Fine-tuning large language models (LLMs) with a collection of large and diverse instructions has improved the model's generalization to different tasks, even unseen ones. However, most existing instruction datasets include only single instructions, and models trained on them struggle to follow complex instructions composed of multiple subtasks (Wang et al., 2023a). In this work, we propose a novel concept of compositional instructions called chain-of-instructions (CoI), where the output of one instruction becomes an input for the next, like a chain. Unlike the conventional practice of solving single-instruction tasks, our proposed method encourages a model to solve each subtask step by step until the final answer is reached. CoI-tuning (i.e., fine-tuning with CoI instructions) improves the model's ability to handle instructions composed of multiple subtasks. CoI-tuned models also outperform baseline models on multilingual summarization, demonstrating the generalizability of CoI models to unseen composite downstream tasks.
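
The chaining idea, where each instruction consumes the previous instruction's output, can be mimicked with ordinary function composition. In the sketch below the "instructions" are plain Python functions standing in for model calls; all names are hypothetical and nothing here reflects how the CoI training data were actually built.

```python
from typing import Callable, List

Instruction = Callable[[str], str]


def run_chain(instructions: List[Instruction], text: str) -> str:
    """Apply each instruction in order, feeding every output into the next step."""
    for step in instructions:
        text = step(text)
    return text


# Hypothetical stand-ins for two subtasks of a composed instruction.
def keep_first_sentence(text: str) -> str:
    """Crude 'summarize' step: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def to_upper(text: str) -> str:
    """Crude 'restyle' step: convert the text to upper case."""
    return text.upper()


print(run_chain([keep_first_sentence, to_upper], "The cat sat. The dog ran."))
```

Swapping the order of the two steps gives the same result here, but in general the order matters, which is why a composed instruction is harder to follow than either subtask alone.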

Translation date: 2024-02-20 20:33:01 Publication date: 2024-02-18

# Efficient Multimodal Learning from Data-centric Perspective ( http://arxiv.org/abs/2402.11530v1 )

License: see the linked page

Muyang He, Yexin Liu, Boya Wu, Jianhao Yuan, Yueze Wang, Tiejun Huang, Bo Zhao

Multimodal Large Language Models (MLLMs) have demonstrated notable capabilities in general visual understanding and reasoning tasks. However, their deployment is hindered by substantial computational costs in both training and inference, limiting accessibility to the broader research and user communities. A straightforward solution is to leverage smaller pre-trained vision and language models, which inevitably causes a significant performance drop. In this paper, we demonstrate that it is possible to beat the scaling law and train a smaller but better MLLM by exploring more informative training data. Specifically, we introduce Bunny, a family of lightweight MLLMs with flexible vision and language backbones for efficient multimodal learning from condensed training data. Remarkably, our Bunny-3B outperforms state-of-the-art large MLLMs, especially LLaVA-v1.5-13B, on multiple benchmarks. The code, models and data can be found at https://github.com/BAAI-DCAI/Bunny.

Translation date: 2024-02-20 20:32:44 Publication date: 2024-02-18

# Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution ( http://arxiv.org/abs/2402.11525v1 )

License: see the linked page

Nuo Xu, Jun Zhao, Can Zu, Tao Gui, Qi Zhang, Xuanjing Huang

Faithfulness, expressiveness, and elegance are the constant pursuit in machine translation. However, traditional metrics like BLEU do not strictly align with human preferences regarding translation quality. In this paper, we explore leveraging reinforcement learning with human feedback (RLHF) to improve translation quality. It is non-trivial to collect a large high-quality dataset of human comparisons between translations, especially for low-resource languages. To address this issue, we propose a cost-effective preference learning strategy, optimizing reward models by distinguishing between human and machine translations. In this manner, the reward model learns the deficiencies of machine translation compared to human translation and guides subsequent improvements in machine translation. Experimental results demonstrate that RLHF can effectively enhance translation quality and that this improvement benefits other translation directions not trained with RLHF. Further analysis indicates that the model's language capabilities play a crucial role in preference learning. A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality and align better with real human translation preferences.
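
A reward model that distinguishes human from machine translations can be trained with a pairwise preference loss. The sketch below uses the Bradley-Terry-style form, -log(sigmoid(r_human - r_machine)), which is a common choice for reward models; the abstract does not state which loss the authors use, so the formula and the function name are assumptions.

```python
import math
from typing import Sequence


def preference_loss(human_scores: Sequence[float], machine_scores: Sequence[float]) -> float:
    """Mean of -log(sigmoid(r_human - r_machine)) over paired translations."""
    losses = []
    for r_h, r_m in zip(human_scores, machine_scores):
        margin = r_h - r_m
        # -log(sigmoid(m)) equals log(1 + exp(-m)); this form is stable for any sign of m.
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return sum(losses) / len(losses)


# Hypothetical reward scores for two (human, machine) translation pairs.
print(preference_loss([2.0, 0.5], [0.0, 0.5]))
```

The loss shrinks as the reward model scores the human translation above the machine translation, so no human ranking of machine outputs is needed to produce the training pairs.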

Translated: 2024-02-20 20:32:29 Published: 2024-02-18

# Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering

Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering ( http://arxiv.org/abs/2402.11523v1 )

License: see the linked page

Peijie Sun, Le Wu, Kun Zhang, Xiangzhi Chen, and Meng Wang

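(Reference translation) Although effective in recommendation tasks, collaborative filtering (CF) techniques face the challenge of data sparsity. To address it, researchers have begun using contrastive learning to introduce additional self-supervised signals. However, this approach often unintentionally pushes the target user/item away from its collaborative neighbors, which limits its effectiveness. We therefore propose a method that treats the collaborative neighbors of the anchor node as positive samples within the final objective loss function. This paper focuses on developing two unique supervised contrastive loss functions that effectively combine supervision signals with the contrastive loss. We analyze the proposed loss functions through the lens of gradients and show that different positive samples simultaneously influence the update of the anchor node's embedding. The influence of these samples depends on their similarity to the anchor node and to the negative samples. Using a graph-based collaborative filtering model as our backbone and following the same data augmentation methods as the existing contrastive learning model SGL, we effectively improve the performance of the recommendation model. The proposed Neighborhood-Enhanced Supervised Contrastive Loss (NESCL) model replaces the contrastive loss function in SGL with the new loss function and shows improved performance. On three real-world datasets, Yelp2018, Gowalla, and Amazon-Book, our model outperforms the original SGL by 10.09%, 7.09%, and 35.36% on NDCG@20, respectively.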

While effective in recommendation tasks, collaborative filtering (CF) techniques face the challenge of data sparsity. Researchers have begun leveraging contrastive learning to introduce additional self-supervised signals to address this. However, this approach often unintentionally distances the target user/item from their collaborative neighbors, limiting its efficacy. In response, we propose a solution that treats the collaborative neighbors of the anchor node as positive samples within the final objective loss function. This paper focuses on developing two unique supervised contrastive loss functions that effectively combine supervision signals with contrastive loss. We analyze our proposed loss functions through the gradient lens, demonstrating that different positive samples simultaneously influence updating the anchor node's embeddings. These samples' impact depends on their similarities to the anchor node and the negative samples. Using the graph-based collaborative filtering model as our backbone and following the same data augmentation methods as the existing contrastive learning model SGL, we effectively enhance the performance of the recommendation model. Our proposed Neighborhood-Enhanced Supervised Contrastive Loss (NESCL) model substitutes the contrastive loss function in SGL with our novel loss function, showing marked performance improvement. On three real-world datasets, Yelp2018, Gowalla, and Amazon-Book, our model surpasses the original SGL by 10.09%, 7.09%, and 35.36% on NDCG@20, respectively.
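
The idea of treating collaborative neighbors as extra positives can be sketched as follows. This is a generic supervised-contrastive form and not necessarily either of the two losses the paper derives. The function names, the temperature value, and the use of cosine similarity are assumptions made for the illustration.

```python
import math


def _cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighbor_contrastive_loss(anchor, positives, negatives, temperature=0.2):
    """Average InfoNCE-style loss over several positives of one anchor.

    `positives` would hold the augmented view of the anchor plus its
    collaborative neighbors; `negatives` holds the other nodes in the batch.
    """
    pos_logits = [_cosine(anchor, p) / temperature for p in positives]
    neg_logits = [_cosine(anchor, n) / temperature for n in negatives]
    all_logits = pos_logits + neg_logits
    # Numerically stable log-sum-exp over every candidate.
    peak = max(all_logits)
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in all_logits))
    return sum(log_denominator - z for z in pos_logits) / len(pos_logits)


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    neighbors = [[1.0, 0.0], [0.9, 0.1]]
    others = [[-1.0, 0.0], [0.0, 1.0]]
    print(neighbor_contrastive_loss(anchor, neighbors, others))
```

Because every positive appears in the shared denominator, each one contributes to the anchor's update at the same time, which is consistent with the gradient behavior the abstract describes.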

Translated: 2024-02-20 20:32:08 Published: 2024-02-18

# Unveiling the Secrets of Engaging Conversations: Factors that Keep Users Hooked on Role-Playing Dialog Agents

Unveiling the Secrets of Engaging Conversations: Factors that Keep Users Hooked on Role-Playing Dialog Agents ( http://arxiv.org/abs/2402.11522v1 )

License: see the linked page

Shuai Zhang, Yu Lu, Junwen Liu, Jia Yu, Huachuan Qiu, Yuming Yan, Zhenzhong Lan

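(Reference translation) As dialog agents become more humanlike, people are engaging in extended conversations that range from brief moments to considerable lengths of time. Understanding the factors that help sustain these interactions is important, yet existing studies focus mainly on short-term simulations and rarely explore such long, real conversations. This paper examines the factors that influence retention rates in real interactions with role-playing models. By analyzing a large dataset of interactions between real users and thousands of characters, we systematically examine multiple factors and evaluate their effect on user retention rate. Surprisingly, the degree to which the bot embodies the role it plays has limited influence on retention, while the length of each turn it speaks has a significant effect. This study clarifies important aspects of user engagement with role-playing models and offers valuable insights for future improvements in developing large language models for role-playing purposes.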

With the growing humanlike nature of dialog agents, people are now engaging in extended conversations that can stretch from brief moments to substantial periods of time. Understanding the factors that contribute to sustaining these interactions is crucial, yet existing studies primarily focus on short-term simulations that rarely explore such prolonged and real conversations. In this paper, we investigate the factors influencing retention rates in real interactions with role-playing models. By analyzing a large dataset of interactions between real users and thousands of characters, we systematically examine multiple factors and assess their impact on user retention rate. Surprisingly, we find that the degree to which the bot embodies the roles it plays has limited influence on retention rates, while the length of each turn it speaks significantly affects retention rates. This study sheds light on the critical aspects of user engagement with role-playing models and provides valuable insights for future improvements in the development of large language models for role-playing purposes.

Translated: 2024-02-20 20:31:43 Published: 2024-02-18

# Communication Cost in Simulating Unknown Entangled States

Communication Cost in Simulating Unknown Entangled States ( http://arxiv.org/abs/2402.11610v1 )

License: see the linked page

Kelvin Onggadinata, Pawel Kurzynski, Dagomir Kaszlikowski

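(Reference translation) We show how to universally simulate the ensemble statistics of projective local measurements on any $n$-qubit state shared among $n$ observers, using classical communication and shared randomness. Our technique originates from protocols for simulating quantum non-locality [in Horizons of the Mind, Springer, Cham (2014)] and from classical simulation of quantum circuits [Phys. Rev. Lett. 115, 070501 (2015)]. In contrast to other approaches, the protocol preserves three important aspects of the simulated quantum scenario: no additional parties are involved, none of the observers knows the global state of the system, and local measurement settings remain undisclosed.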

We demonstrate how to universally simulate ensemble statistics of projective local measurements on any $n$-qubit state shared among $n$ observers with classical communication and shared randomness. Our technique originates from protocols designed to simulate quantum non-locality [in Horizons of the Mind, Springer, Cham (2014)] and classical simulation of quantum circuits [Phys. Rev. Lett. 115, 070501 (2015)]. The protocol preserves three crucial aspects of the simulated quantum scenario in contrast to other approaches: no involvement of additional parties, none of the observers knows the global state of the system, and local measurement settings remain undisclosed.

Translated: 2024-02-20 20:22:52 Published: 2024-02-18

# Using rule engine in self-healing systems and MAPE model

Using rule engine in self-healing systems and MAPE model ( http://arxiv.org/abs/2402.11581v1 )

License: see the linked page

Zahra Yazdanparast

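(Reference translation) Software malfunction is a major hurdle in the computing domain and poses substantial risks to systems, enterprises, and users. Effective debugging is essential for producing reliable, high-quality software. Program debugging is an activity that reduces software maintenance costs. This study presents a failure repair method that uses a rule engine. Simulation on mRUBIS showed that the method can be efficient in an operational environment. By thoroughly understanding software failures and adopting efficient mitigation strategies, stakeholders can improve the reliability, security, and adaptability of software systems. This in turn reduces the impact of failures and increases confidence in digital technologies.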

Software malfunction presents a significant hurdle within the computing domain, carrying substantial risks for systems, enterprises, and users universally. To produce software with high reliability and quality, effective debugging is essential. Program debugging is an activity to reduce software maintenance costs. In this study, a failure repair method that uses a rule engine is presented. The simulation on mRUBIS showed that the proposed method could be efficient in the operational environment. Through a thorough grasp of software failure and the adoption of efficient mitigation strategies, stakeholders can bolster the dependability, security, and adaptability of software systems. This, in turn, reduces the repercussions of failures and cultivates increased confidence in digital technologies.

Translated: 2024-02-20 20:22:39 Published: 2024-02-18

# Extensible Embedding: A Flexible Multiplier for LLM's Context Length

Extensible Embedding: A Flexible Multipler For LLM's Context Length ( http://arxiv.org/abs/2402.11577v1 )

License: see the linked page

Ninglu Shao, Shitao Xiao, Zheng Liu, Peitian Zhang

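(Reference translation) Large language models (LLMs) require context extension to handle many important applications. However, existing approaches tend to be costly and to deliver inferior context-extension quality. In this work we propose Extensible Embedding, which extends an LLM's context with high quality while being both flexible and cost-effective. An extensible embedding is an enhancement of the typical token embedding that represents the information of an extensible scope of context rather than a single token. By using such compact input units with higher information density, the LLM can access a wide scope of context even with a small context window. Extensible embedding is systematically optimized in its architecture and training method, which brings several advantages. 1) High flexibility of context extension, supporting ad-hoc extension to diverse context lengths. 2) Strong sample efficiency in training, so that the embedding model can be learned cost-effectively. 3) Compatibility with existing LLMs, into which the extensible embedding can be introduced seamlessly as a plug-in component. Comprehensive evaluations on long-context language modeling and understanding tasks verify extensible embedding as an effective, efficient, flexible, and compatible way to extend an LLM's context.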

Large language models (LLMs) call for extension of context to handle many critical applications. However, the existing approaches are prone to expensive costs and inferior quality of context extension. In this work, we propose Extensible Embedding, which realizes high-quality extension of LLM's context with strong flexibility and cost-effectiveness. Extensible embedding stands as an enhancement of typical token embedding, which represents the information for an extensible scope of context instead of a single token. By leveraging such compact input units of higher information density, the LLM can access a vast scope of context even with a small context window. Extensible embedding is systematically optimized in architecture and training method, which leads to multiple advantages. 1) High flexibility of context extension, which flexibly supports ad-hoc extension of diverse context lengths. 2) Strong sample efficiency of training, which enables the embedding model to be learned in a cost-effective way. 3) Superior compatibility with the existing LLMs, where the extensible embedding can be seamlessly introduced as a plug-in component. Comprehensive evaluations on long-context language modeling and understanding tasks verify extensible embedding as an effective, efficient, flexible, and compatible method to extend the LLM's context.

Translated: 2024-02-20 20:22:28 Published: 2024-02-18

# Visual In-Context Learning for Large Vision-Language Models

Visual In-Context Learning for Large Vision-Language Models ( http://arxiv.org/abs/2402.11574v1 )

License: see the linked page

Yucheng Zhou, Xiang Li, Qianning Wang, Jianbing Shen

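(Reference translation) In large vision-language models (LVLMs), the effectiveness of in-context learning (ICL) is limited by challenges in cross-modal interaction and representation disparities. To overcome these challenges, we propose a new Visual In-Context Learning (VICL) method comprising visual demonstration retrieval, intent-oriented image summarization, and intent-oriented demonstration composition. The method retrieves images with a "Retrieval & Rerank" paradigm, summarizes images using task intent and task-specific visual parsing, and composes language-based demonstrations that reduce the token count and alleviate the cross-modal interaction problem. Experimental evaluation on five visual reasoning datasets shows the effectiveness of the method. In addition, we use information flow analysis to explain why the method is effective and examine how the length and position of demonstrations affect LVLMs. The use of in-context unlearning further shows the potential to reset specific model knowledge without retraining.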

In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition. Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarizes images with task intent and task-specific visual parsing, and composes language-based demonstrations that reduce the token count and alleviate the cross-modal interaction problem. Experimental evaluations on five visual reasoning datasets demonstrate the effectiveness of our method. Moreover, our extensive experiments leverage information flow analysis to elucidate the effectiveness of our method, and investigate the impact of the length and position of demonstrations for LVLMs. The use of in-context unlearning further shows promise in resetting specific model knowledge without retraining.

Translated: 2024-02-20 20:22:08 Published: 2024-02-18

# Cobra Effect in Reference-Free Image Captioning Metrics

Cobra Effect in Reference-Free Image Captioning Metrics ( http://arxiv.org/abs/2402.11572v1 )

License: see the linked page

Zheng Ma, Changxin Wang, Yawen Ouyang, Fei Zhao, Jianbing Zhang, Shujian Huang, Jiajun Chen

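(Reference translation) Evaluating the compatibility between a textual description and the corresponding image is a core task in multi-modal research. In recent years, reference-free methods that leverage vision-language pre-trained models (VLMs) have become widespread. Empirical evidence confirms that these approaches correlate highly with human judgment, marking a major advance in the field. But is a higher correlation with human evaluation alone enough to show that a metric is complete? In response to this question, this paper examines whether reference-free metrics have deficiencies. Specifically, inspired by the Cobra Effect, we use metric scores as rewards to direct a captioning model toward generating descriptions that closely match the metric's criteria. If a metric has flaws, the model will exploit them and they will be reflected in the generated sentences. Our results show that descriptions guided by these metrics contain serious flaws, such as incoherent sentences and excessive repetition. We then propose a new method, called Self-Improving, to correct the problems in these metrics. Using GPT-4V as an evaluation tool to assess generated sentences, we show that the proposed method achieves state-of-the-art (SOTA) performance. We also introduce a challenging evaluation benchmark called Flaws Caption to evaluate reference-free image captioning metrics comprehensively. Our code is available at https://github.com/aaronma2020/robust_captioning_metric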

Evaluating the compatibility between textual descriptions and corresponding images represents a core endeavor within multi-modal research. In recent years, a proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged. Empirical evidence has substantiated that these innovative approaches exhibit a higher correlation with human judgment, marking a significant advancement in the field. However, does a higher correlation with human evaluations alone sufficiently denote the completeness of a metric? In response to this question, in this paper, we study whether there are any deficiencies in reference-free metrics. Specifically, inspired by the Cobra Effect, we utilize metric scores as rewards to direct the captioning model toward generating descriptions that closely align with the metric's criteria. If a certain metric has flaws, it will be exploited by the model and reflected in the generated sentences. Our findings reveal that descriptions guided by these metrics contain significant flaws, e.g., incoherent statements and excessive repetition. Subsequently, we propose a novel method termed Self-Improving to rectify the identified shortcomings within these metrics. We employ GPT-4V as an evaluative tool to assess generated sentences, and the result reveals that our approach achieves state-of-the-art (SOTA) performance. In addition, we also introduce a challenging evaluation benchmark called Flaws Caption to evaluate reference-free image captioning metrics comprehensively. Our code is available at https://github.com/aaronma2020/robust_captioning_metric

Translated: 2024-02-20 20:21:53 Published: 2024-02-18

# Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru

Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru ( http://arxiv.org/abs/2402.11571v1 )

License: see the linked page

Zining Wang and Paul Reisert and Eric Nichols and Randy Gomez

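(Reference translation) Social robots aim to build long-term bonds with humans through conversation. However, traditional conversational approaches rely on scripted dialogue and often fall short of sustaining a conversation. This paper addresses that limitation by integrating large language models (LLMs) into social robots to achieve more dynamic and expressive conversations. We propose a fully automated conversation system that uses LLMs to generate robot responses accompanied by expressive behaviors consistent with the robot's personality. Robot behavior is incorporated through two modalities: 1) a text-to-speech (TTS) engine capable of various delivery styles, and 2) a library of physical actions for the robot. We developed a custom, state-of-the-art emotion recognition model that dynamically selects the robot's tone of voice, and we use emojis in the LLM output as cues for generating robot actions. A demo of our system is available here. To illuminate design and implementation issues, we ran a pilot study in which volunteers chatted with a social robot using the proposed system; we analyzed their feedback and performed a rigorous error analysis of the chat transcripts. Feedback was overwhelmingly positive, with participants commenting on the robot's empathy, helpfulness, naturalness, and entertainment value. Most negative feedback was caused by automatic speech recognition (ASR) errors, which had limited impact on the conversations. However, we observed a small class of errors, such as the LLM repeating itself or hallucinating fictitious information and human responses, that can derail conversations and raise important issues for LLM applications.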

Social robots aim to establish long-term bonds with humans through engaging conversation. However, traditional conversational approaches, reliant on scripted interactions, often fall short in maintaining engaging conversations. This paper addresses this limitation by integrating large language models (LLMs) into social robots to achieve more dynamic and expressive conversations. We introduce a fully-automated conversation system that leverages LLMs to generate robot responses with expressive behaviors, congruent with the robot's personality. We incorporate robot behavior with two modalities: 1) a text-to-speech (TTS) engine capable of various delivery styles, and 2) a library of physical actions for the robot. We develop a custom, state-of-the-art emotion recognition model to dynamically select the robot's tone of voice and utilize emojis from LLM output as cues for generating robot actions. A demo of our system is available here. To illuminate design and implementation issues, we conduct a pilot study where volunteers chat with a social robot using our proposed system, and we analyze their feedback, conducting a rigorous error analysis of chat transcripts. Feedback was overwhelmingly positive, with participants commenting on the robot's empathy, helpfulness, naturalness, and entertainment. Most negative feedback was due to automatic speech recognition (ASR) errors which had limited impact on conversations. However, we observed a small class of errors, such as the LLM repeating itself or hallucinating fictitious information and human responses, that have the potential to derail conversations, raising important issues for LLM application.

Translated: 2024-02-20 20:21:27 Published: 2024-02-18

# Developing Autonomous Robot-Mediated Behavior Coaching Sessions with Haru

Developing Autonomous Robot-Mediated Behavior Coaching Sessions with Haru ( http://arxiv.org/abs/2402.11569v1 )

License: see the linked page

Matouš Jelínek and Eric Nichols and Randy Gomez

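(Reference translation) This study presents an empirical investigation of the design and impact of autonomous dialogue in human-robot interaction for behavior change coaching. Focusing on the tabletop social robot Haru, we explore an implementation of the Tiny Habits method for encouraging positive behavior change. The core of the study is the development of a fully autonomous dialogue system that maximizes Haru's emotional expressiveness and unique personality. Our methodology involved iterative design and extensive testing of the dialogue system so that it effectively embodies the principles of the Tiny Habits method while incorporating strategies for trust-raising and trust-dampening. The effectiveness of the final version of the dialogue was evaluated in an experimental study with human participants (N=12). The results showed a significant improvement in perceptions of Haru's liveliness, interactivity, and neutrality. The study also contributes to a broader understanding of dialogue design in social robotics and offers practical insights for future development.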

This study presents an empirical investigation into the design and impact of autonomous dialogues in human-robot interaction for behavior change coaching. We focus on the use of Haru, a tabletop social robot, and explore the implementation of the Tiny Habits method for fostering positive behavior change. The core of our study lies in developing a fully autonomous dialogue system that maximizes Haru's emotional expressiveness and unique personality. Our methodology involved iterative design and extensive testing of the dialogue system, ensuring it effectively embodied the principles of the Tiny Habits method while also incorporating strategies for trust-raising and trust-dampening. The effectiveness of the final version of the dialogue was evaluated in an experimental study with human participants (N=12). The results indicated a significant improvement in perceptions of Haru's liveliness, interactivity, and neutrality. Additionally, our study contributes to the broader understanding of dialogue design in social robotics, offering practical insights for future developments in the field.

Translated: 2024-02-20 20:21:00 Published: 2024-02-18

# A novel Fourier neural operator framework for classification of multi-sized images: Application to 3D digital porous media

A novel Fourier neural operator framework for classification of multi-sized images: Application to 3D digital porous media ( http://arxiv.org/abs/2402.11568v1 )

License: see the linked page

Ali Kashefi, Tapan Mukerji

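(Reference translation) Fourier neural operators (FNOs) are invariant to the size of input images, so, unlike traditional convolutional neural networks (CNNs), images of any size can be fed into an FNO-based framework without changing the network architecture. Taking advantage of this property, we propose a new deep learning framework for classifying images of varying sizes. In particular, the proposed network is trained on multi-sized images simultaneously. As a practical application, we consider the problem of predicting a label (for example, permeability) of three-dimensional digital porous media. An intuitive way to build the framework is to connect the FNO layers to a classifier using adaptive max pooling. We first show that this approach is effective only for porous media of a fixed size and fails for porous media of varying sizes. To overcome this limitation, instead of adaptive max pooling we use static max pooling whose size equals the channel width of the FNO layers. Because the channel width of the FNO layers is independent of the input image size, the framework can handle multi-sized images during training. We demonstrate the effectiveness of the framework and compare it with the intuitive approach on the classification of three-dimensional digital porous media of varying sizes.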

Fourier neural operators (FNOs) are invariant with respect to the size of input images, and thus images with any size can be fed into FNO-based frameworks without any modification of network architectures, in contrast to traditional convolutional neural networks (CNNs). Leveraging the advantage of FNOs, we propose a novel deep-learning framework for classifying images with varying sizes. Particularly, we simultaneously train the proposed network on multi-sized images. As a practical application, we consider the problem of predicting the label (e.g., permeability) of three-dimensional digital porous media. To construct the framework, an intuitive approach is to connect FNO layers to a classifier using adaptive max pooling. First, we show that this approach is only effective for porous media with fixed sizes, whereas it fails for porous media of varying sizes. To overcome this limitation, we introduce our approach: instead of using adaptive max pooling, we use static max pooling with the size of channel width of FNO layers. Since the channel width of the FNO layers is independent of input image size, the introduced framework can handle multi-sized images during training. We show the effectiveness of the introduced framework and compare its performance with the intuitive approach through the example of the classification of three-dimensional digital porous media of varying sizes.
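
The abstract does not spell out the pooling step, so the following sketch shows only one reading of it: a channel-wise global maximum over all spatial positions, which yields a vector whose length equals the channel width regardless of the input volume size. The helper names and the toy activations are hypothetical, and plain nested lists stand in for FNO feature maps.

```python
from itertools import chain


def flatten3d(volume):
    """Flatten a nested D x H x W list into a flat list of activations."""
    return list(chain.from_iterable(chain.from_iterable(volume)))


def channelwise_global_max(features):
    """Reduce each channel's 3D feature map to its single largest activation.

    `features` is a list over channels; the output length equals the number
    of channels and does not depend on the spatial size of the input.
    """
    return [max(flatten3d(volume)) for volume in features]


def make_features(channels, size):
    """Build deterministic toy activations for a cubic volume of side `size`."""
    return [
        [[[(c + 1) * (i + j + k) for k in range(size)]
          for j in range(size)]
         for i in range(size)]
        for c in range(channels)
    ]


if __name__ == "__main__":
    # Two volumes of different sizes map to vectors of the same length.
    print(channelwise_global_max(make_features(4, 2)))
    print(channelwise_global_max(make_features(4, 3)))
```

Since the pooled vector has a fixed length, the same classifier head can follow it for every input size, which matches the motivation stated in the abstract.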

Translated: 2024-02-20 20:20:44 Published: 2024-02-18

# Saturability of the Quantum Cramér-Rao Bound in Multiparameter Quantum Estimation at the Single-Copy Level

Saturability of the Quantum Cramér-Rao Bound in Multiparameter Quantum Estimation at the Single-Copy Level ( http://arxiv.org/abs/2402.11567v1 )

License: see the linked page

Hendra I. Nurdin

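(Reference translation) The quantum Cramér-Rao bound (QCRB), the ultimate lower bound on precision in quantum parameter estimation, is known to be saturable in the multiparameter setting only in special cases and under conditions such as full or average commutativity of the symmetric logarithmic derivatives (SLDs) associated with the parameters. Moreover, for general mixed states, collective measurements over infinitely many identical copies of the quantum state are generally required to attain the QCRB. In the important and experimentally relevant single-copy scenario, a necessary condition for saturating the QCRB in the multiparameter setting for general mixed states is the so-called partial commutativity condition on the SLDs. However, it is not known whether this condition is also sufficient. This paper derives new necessary conditions that imply partial commutativity and are almost sufficient. It is shown that, together with another condition, they become sufficient for saturability of the QCRB in the multiparameter single-copy case. Furthermore, when the sufficient conditions are satisfied, an optimal measurement saturating the QCRB can be chosen to be projective and can be characterized explicitly. An example is given of a multiparameter quantum state for which the conditions are satisfied and can be verified explicitly.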

The quantum Cramér-Rao bound (QCRB) as the ultimate lower bound for precision in quantum parameter estimation is only known to be saturable in the multiparameter setting in special cases and under conditions such as full or average commutativity of the symmetric logarithmic derivatives (SLDs) associated with the parameters. Moreover, for general mixed states, collective measurements over infinitely many identical copies of the quantum state are generally required to attain the QCRB. In the important and experimentally relevant single-copy scenario, a necessary condition for saturating the QCRB in the multiparameter setting for general mixed states is the so-called partial commutativity condition on the SLDs. However, it is not known if this condition is also sufficient. This paper derives new necessary conditions that imply partial commutativity and are almost sufficient. It is shown that together with another condition they become sufficient for saturability of the QCRB in the multiparameter single-copy case. Moreover, when the sufficient conditions are satisfied, an optimal measurement saturating the QCRB can be chosen to be projective and explicitly characterized. An example is developed to illustrate the case of a multiparameter quantum state where the conditions derived herein are satisfied and can be explicitly verified.
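
For readers unfamiliar with the objects named in the abstract, the standard textbook definitions can be written as below. The notation ($\rho_\theta$, $L_i$, $F_Q$, $\hat\theta$) is chosen here for illustration and is not taken from the paper. The partial commutativity condition and the paper's new conditions are not reproduced, since the abstract does not state them.

```latex
\begin{align}
  % Symmetric logarithmic derivative (SLD) L_i for parameter \theta_i
  \partial_{\theta_i}\rho_\theta
    &= \tfrac{1}{2}\left(L_i\,\rho_\theta + \rho_\theta\,L_i\right), \\
  % Quantum Fisher information matrix
  [F_Q]_{ij}
    &= \tfrac{1}{2}\operatorname{Tr}\!\left(\rho_\theta\,\{L_i, L_j\}\right), \\
  % QCRB for a locally unbiased estimator from a single copy
  \operatorname{Cov}(\hat\theta)
    &\succeq F_Q^{-1}, \\
  % Full commutativity of the SLDs
  [L_i, L_j] &= 0 \quad \text{for all } i, j, \\
  % Average (weak) commutativity of the SLDs
  \operatorname{Tr}\!\left(\rho_\theta\,[L_i, L_j]\right) &= 0
    \quad \text{for all } i, j.
\end{align}
```

Full commutativity implies average commutativity, so the second condition is the weaker of the two.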

Translated: 2024-02-20 20:20:21 Published: 2024-02-18

# Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training

Boosting Semi-Supervised 2D Human Pose Estimation by Revisiting Data Augmentation and Consistency Training ( http://arxiv.org/abs/2402.11566v1 )

License: see the linked page

Huayi Zhou, Mukun Luo, Fei Jiang, Yue Ding, Hongtao Lu

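(Reference translation) 2D human pose estimation is a basic vision problem. However, supervised learning of a model requires a large number of labeled images. This paper aims to raise the accuracy of a pose estimator by exploiting extra unlabeled images in a semi-supervised learning (SSL) manner. Previous consistency-based SSL methods have tried to constrain the model to predict consistent results for differently augmented images. Following this consensus, we revisit two core aspects: advanced data augmentation methods and concise consistency training frameworks. Specifically, we heuristically explore various combinations of existing data augmentations and discover new, superior augmentation schemes that add noise to unlabeled samples more effectively. These can form easy-hard augmentation pairs with larger gaps in transformation difficulty, which play an important role in consistency-based SSL. We further propose to strongly augment unlabeled images repeatedly with diverse augmentations, generate multi-path predictions sequentially, and optimize the corresponding unsupervised consistency losses using a single network. This simple and compact design is on a par with earlier methods built from dual or triple networks. It can also be combined with multiple networks to improve performance further. Compared with state-of-the-art SSL approaches, our method brings substantial improvements on public datasets. Code is released for academic use at https://github.com/hnuzhy/MultiAugs.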

The 2D human pose estimation is a basic visual problem. However, supervised learning of a model requires massive labeled images, which is expensive and labor-intensive. In this paper, we aim at boosting the accuracy of a pose estimator by excavating extra unlabeled images in a semi-supervised learning (SSL) way. Most previous consistency-based SSL methods strive to constrain the model to predict consistent results for differently augmented images. Following this consensus, we revisit two core aspects including advanced data augmentation methods and concise consistency training frameworks. Specifically, we heuristically dig various collaborative combinations of existing data augmentations, and discover novel superior data augmentation schemes to more effectively add noise on unlabeled samples. They can compose easy-hard augmentation pairs with larger transformation difficulty gaps, which play a crucial role in consistency-based SSL. Moreover, we propose to strongly augment unlabeled images repeatedly with diverse augmentations, generate multi-path predictions sequentially, and optimize corresponding unsupervised consistency losses using one single network. This simple and compact design is on a par with previous methods consisting of dual or triple networks. Furthermore, it can also be integrated with multiple networks to produce better performance. Compared to state-of-the-art SSL approaches, our method brings substantial improvements on public datasets. Code is released for academic use at https://github.com/hnuzhy/MultiAugs.

Translated: 2024-02-20 20:19:57 Published: 2024-02-18

# Continual Learning on Graphs: Challenges, Solutions, and Opportunities

Continual Learning on Graphs: Challenges, Solutions, and Opportunities ( http://arxiv.org/abs/2402.11565v1 )

License: see the linked page

Xikun Zhang, Dongjin Song, Dacheng Tao

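(Reference translation) Continual learning on graph data has recently attracted attention for its aim of resolving catastrophic forgetting on existing tasks while adapting a sequentially updated model to newly emerging graph tasks. Although there have been efforts to summarize progress in continual learning over Euclidean data such as images and text, a systematic review of continual learning on graphs, also known as continual graph learning (CGL) or lifelong graph learning, is still needed. Graph data are far more complex in terms of data structures and application scenarios, which makes CGL task settings, model designs, and applications very challenging. To fill this gap, we comprehensively review existing CGL algorithms, clarifying the different task settings and categorizing existing methods by their characteristics. We compare CGL methods with traditional continual learning techniques and analyze whether the traditional techniques are applicable to CGL tasks. We also review the benchmark works that are essential to CGL research. Finally, we discuss the remaining challenges and propose future directions. A comprehensive list of CGL algorithms is available at https://github.com/UConn-DSIS/Survey-of-Continual-Learning-on-Graphs.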

Continual learning on graph data has recently attracted paramount attention for its aim to resolve the catastrophic forgetting problem on existing tasks while adapting the sequentially updated model to newly emerged graph tasks. While there have been efforts to summarize progress on continual learning research over Euclidean data, e.g., images and texts, a systematic review of progress in continual learning on graphs, a.k.a. continual graph learning (CGL) or lifelong graph learning, is still needed. Graph data are far more complex in terms of data structures and application scenarios, making CGL task settings, model designs, and applications extremely challenging. To bridge the gap, we provide a comprehensive review of existing CGL algorithms by elucidating the different task settings and categorizing the existing methods based on their characteristics. We compare the CGL methods with traditional continual learning techniques and analyze the applicability of the traditional continual learning techniques to CGL tasks. Additionally, we review the benchmark works that are crucial to CGL research. Finally, we discuss the remaining challenges and propose several future directions. We will maintain an up-to-date GitHub repository featuring a comprehensive list of CGL algorithms, accessible at https://github.com/UConn-DSIS/Survey-of-Continual-Learning-on-Graphs.

Translated: 2024-02-20 20:19:35 Published: 2024-02-18

# Temporal Disentangled Contrastive Diffusion Model for Spatiotemporal Imputation

Temporal Disentangled Contrastive Diffusion Model for Spatiotemporal Imputation ( http://arxiv.org/abs/2402.11558v1 )

License: see the linked page

Yakun Chen, Kaize Shi, Zhangkai Wu, Juan Chen, Xianzhi Wang, Julian McAuley, Guandong Xu, Shui Yu

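(Reference translation) Spatiotemporal data analysis is important in many fields, including transportation, meteorology, and healthcare. However, data collected in real scenarios is often incomplete because of sensor failures and network transmission errors. Spatiotemporal imputation predicts missing values by exploiting the spatial and temporal dependencies present in the observed data. Traditional approaches that rely on classical statistical and machine learning techniques are often inadequate, particularly when the data does not satisfy strict distributional assumptions. In contrast, recent deep learning methods that use graph and recurrent neural networks have shown improved effectiveness. Nevertheless, these approaches are prone to error accumulation. Generative models are increasingly adopted to avoid relying on potentially inaccurate historical imputed values for future predictions. These models face the challenge of producing unstable results, a particular problem in diffusion models. We aim to address these challenges by designing conditional features that guide the generative process and speed up training. Specifically, we introduce C$^2$TSD, a new approach that incorporates trend and seasonal information as conditional features and uses contrastive learning to improve the model's generalizability. Extensive experiments on three real-world datasets show that C$^2$TSD outperforms various state-of-the-art baselines.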

Spatiotemporal data analysis is pivotal across various domains, including transportation, meteorology, and healthcare. However, the data collected in real-world scenarios often suffers from incompleteness due to sensor malfunctions and network transmission errors. Spatiotemporal imputation endeavours to predict missing values by exploiting the inherent spatial and temporal dependencies present in the observed data. Traditional approaches, which rely on classical statistical and machine learning techniques, are often inadequate, particularly when the data fails to meet strict distributional assumptions. In contrast, recent deep learning-based methods, leveraging graph and recurrent neural networks, have demonstrated enhanced efficacy. Nonetheless, these approaches are prone to error accumulation. Generative models have been increasingly adopted to circumvent the reliance on potentially inaccurate historical imputed values for future predictions. These models grapple with the challenge of producing unstable results, a particular issue in diffusion-based models. We aim to address these challenges by designing conditional features to guide the generative process and expedite training. Specifically, we introduce C$^2$TSD, a novel approach incorporating trend and seasonal information as conditional features and employing contrastive learning to improve model generalizability. The extensive experiments on three real-world datasets demonstrate the superior performance of C$^2$TSD over various state-of-the-art baselines.
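
How C$^2$TSD extracts trend and seasonal information is not described in the abstract. As a rough illustration of what such conditional features might look like, the sketch below applies a classical additive decomposition (moving-average trend plus per-phase seasonal means) to a fully observed series. This is an assumed stand-in and not the authors' method, and all function names are hypothetical.

```python
def moving_average(series, window):
    """Centered moving average with shrinking windows at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def decompose(series, period):
    """Split a series into additive trend, seasonal, and residual parts.

    Requires len(series) >= period so every seasonal phase is observed.
    """
    trend = moving_average(series, period)
    detrended = [x - t for x, t in zip(series, trend)]
    # Average the detrended values that share the same phase in the cycle.
    phase_means = [
        sum(detrended[p::period]) / len(detrended[p::period])
        for p in range(period)
    ]
    seasonal = [phase_means[i % period] for i in range(len(series))]
    residual = [x - t - s for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual


if __name__ == "__main__":
    toy = [float(i + [1, -1, 2, -2][i % 4]) for i in range(12)]
    trend, seasonal, residual = decompose(toy, period=4)
    print(trend[:4], seasonal[:4])
```

In a conditional generative model, the `trend` and `seasonal` sequences would be supplied alongside the observed values as extra inputs. How they are encoded and fused is left open here.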

Translated: 2024-02-20 20:19:09 Published: 2024-02-18

# Evaluating Adversarial Robustness of Low dose CT Recovery

Evaluating Adversarial Robustness of Low dose CT Recovery ( http://arxiv.org/abs/2402.11557v1 )

License: see the linked page

Kanchana Vaishnavi Gandikota, Paramanand Chandramouli, Hannah Droege, Michael Moeller

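(Reference translation) Low dose computed tomography (CT) acquisition is recommended to reduce the harmful effects of X-ray radiation. Recent work has successfully applied deep networks to the low dose CT recovery problem on benchmark datasets. However, their robustness needs thorough evaluation before clinical use. In this work we evaluate the robustness of different deep learning approaches and classical methods for CT recovery. We show that deep networks, including model-based networks that encourage data consistency, are more susceptible to untargeted attacks. Surprisingly, data consistency is not heavily affected even for these poor-quality reconstructions, which motivates the need for better regularization of the networks. We demonstrate the feasibility of universal attacks and study attack transferability across methods. We analyze robustness to attacks that cause localized changes in clinically relevant regions. Both classical approaches and deep networks are affected by such attacks, which change the visual appearance of localized lesions with extremely small perturbations. Because the resulting reconstructions have high data consistency with the original measurements, these localized attacks can be used to explore the solution space of the CT recovery problem.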

Low dose computed tomography (CT) acquisition using reduced radiation or sparse angle measurements is recommended to decrease the harmful effects of X-ray radiation. Recent works successfully apply deep networks to the problem of low dose CT recovery on benchmark datasets. However, their robustness needs a thorough evaluation before use in clinical settings. In this work, we evaluate the robustness of different deep learning approaches and classical methods for CT recovery. We show that deep networks, including model-based networks encouraging data consistency, are more susceptible to untargeted attacks. Surprisingly, we observe that data consistency is not heavily affected even for these poor-quality reconstructions, motivating the need for better regularization for the networks. We demonstrate the feasibility of universal attacks and study attack transferability across different methods. We analyze robustness to attacks causing localized changes in clinically relevant regions. Both classical approaches and deep networks are affected by such attacks, leading to changes in the visual appearance of localized lesions for extremely small perturbations. As the resulting reconstructions have high data consistency with the original measurements, these localized attacks can be used to explore the solution space of the CT recovery problem.

Translated: 2024-02-20 20:18:50 Published: 2024-02-18

# Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling

Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling ( http://arxiv.org/abs/2402.11552v1 )

License: see the linked page

Cristiano Tamborrino, Antonella Falini, Francesca Mazzia

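(Reference translation) Density estimation is a fundamental technique used in many fields to model and understand the underlying distribution of data. Its main objective is to estimate the probability density function of a random variable. The process is particularly useful when dealing with univariate or multivariate data and is essential for tasks such as clustering, anomaly detection, and generative modeling. This paper proposes a univariate approximation of the density using spline quasi-interpolation and applies it in the context of clustering modeling. The clustering technique is based on constructing suitable multivariate distributions that rely on estimates of the univariate empirical densities (marginals). This approximation is obtained with the proposed spline quasi-interpolation, and the joint distributions that model the sought clustering partition are constructed with copula functions. In particular, because copulas can capture the dependence between features of the data independently of the marginal distributions, a finite mixture copula model is proposed. The proposed algorithm is validated on artificial and real datasets.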

Density estimation is a fundamental technique employed in various fields to model and understand the underlying distribution of data. The primary objective of density estimation is to estimate the probability density function of a random variable. This process is particularly valuable when dealing with univariate or multivariate data and is essential for tasks such as clustering, anomaly detection, and generative modeling. In this paper we propose a univariate approximation of the density using spline quasi-interpolation and apply it in the context of clustering modeling. The clustering technique used is based on the construction of suitable multivariate distributions which rely on the estimation of the univariate empirical densities (marginals). Such an approximation is achieved by using the proposed spline quasi-interpolation, while the joint distribution that models the sought clustering partition is constructed using copula functions. In particular, since copulas can capture the dependence between the features of the data independently from the marginal distributions, a finite mixture copula model is proposed. The presented algorithm is validated on artificial and real datasets.

翻訳日:2024-02-20 20:18:32 公開日:2024-02-18

# 変圧器を用いたインコンテキスト学習:リップシッツネスに適応したソフトマックスアテンション

In-Context Learning with Transformers: Softmax Attention Adapts to Function Lipschitzness ( http://arxiv.org/abs/2402.11639v1 )

ライセンス: Link先を確認

Liam Collins, Advait Parulekar, Aryan Mokhtari, Sujay Sanghavi, Sanjay Shakkottai

(参考訳) トランスフォーマーの驚くべき特性は、あるデータを通じて暗黙的に推論中に学習者が新しいコンテキストを提示し、そのコンテキストで予測を行うことを任務とする機械学習フレームワーク、in-context learning(icl)を実行する能力である。そのため、学習者は追加のトレーニングなしに文脈に適応しなければならない。各コンテキストが回帰タスクを符号化するICL設定におけるソフトマックスアテンションの役割について検討する。注意ユニットは、事前学習タスクのランドスケープに適応した最寄りの予測器を実装するために使用するウィンドウを学習する。具体的には,このウィンドウがリプシッツ性の減少とラベルノイズの増加によって拡大することを示す。また,低ランク線形問題において,注意ユニットは推論前に適切な部分空間に投影することを学習することを示した。さらに, この適応性はソフトマックス活性化に大きく依存しており, 先行理論解析でよく研究される線形活性化では再現できないことを示した。

A striking property of transformers is their ability to perform in-context learning (ICL), a machine learning framework in which the learner is presented with a novel context during inference implicitly through some data, and tasked with making a prediction in that context. As such, the learner must adapt to the context without additional training. We explore the role of softmax attention in an ICL setting where each context encodes a regression task. We show that an attention unit learns a window that it uses to implement a nearest-neighbors predictor adapted to the landscape of the pretraining tasks. Specifically, we show that this window widens with decreasing Lipschitzness and increasing label noise in the pretraining tasks. We also show that on low-rank, linear problems, the attention unit learns to project onto the appropriate subspace before inference. Further, we show that this adaptivity relies crucially on the softmax activation and thus cannot be replicated by the linear activation often studied in prior theoretical analyses.
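
The "window" reading of softmax attention can be illustrated with a small sketch. It is not the paper's construction: the score used here is the negative squared distance between query and key, and the single `scale` parameter is a hypothetical stand-in for whatever the learned attention weights encode. Under those assumptions, a large `scale` gives a narrow window (close to a nearest-neighbor predictor) and a small `scale` gives a wide window (close to a plain average of the context labels).

```python
import math


def softmax_attention_predict(query, keys, labels, scale):
    """Predict a label for `query` as a softmax-weighted average of context labels.

    Scores are -scale * squared distance, so a larger `scale` (an illustrative
    parameter, not taken from the paper) concentrates weight on nearby keys.
    """
    scores = [
        -scale * sum((q - k) ** 2 for q, k in zip(query, key))
        for key in keys
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, labels)) / total


# A toy one-dimensional context: three (key, label) pairs.
keys = [(0.0,), (1.0,), (2.0,)]
labels = [0.0, 1.0, 4.0]

# Narrow window: the prediction is dominated by the key closest to the query.
narrow = softmax_attention_predict((1.0,), keys, labels, scale=50.0)

# Wide window: every key gets equal weight, so the prediction is the label mean.
wide = softmax_attention_predict((1.0,), keys, labels, scale=0.0)
```

With noisy labels or a slowly varying target, averaging over a wider window reduces variance; with a rapidly varying target, a narrow window reduces bias. This is the trade-off the abstract describes.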

翻訳日:2024-02-20 20:10:53 公開日:2024-02-18

# 流れ速度の古典性から導かれる量子粒子の位相的挙動

A topological behavior of quantum particles originated from the classicality of their flow velocity ( http://arxiv.org/abs/2402.11624v1 )

ライセンス: Link先を確認

Tomer Shushi

(参考訳) この手紙では、量子粒子を古典流体として記述することから自然に生じる新しい量子効果を提案する。有限凸領域における粒子の量子力学の流体力学的定式化に続いて、波動関数の振幅の最大値は、消滅した量子ポテンシャルを示唆する領域の境界に沿ってどのようにあるかを示し、粒子の古典的な流れ速度を示唆する。この効果は、リーマン構造によって記述された曲線空間の粒子に対して得られる。さらに、平面時空や曲線時空の量子粒子を扱う場合、相対論的状態においてそのような効果は達成できないことを示す。

In this letter, we propose a new quantum effect that naturally emerges from describing the quantum particle as a classical fluid. Following the hydrodynamical formulation of quantum mechanics for a particle in a finite convex region, we show that the maximum values of the wavefunction's amplitude lie along the boundaries of the region when a vanishing quantum potential is imposed, which implies a classical flow velocity of the particle. The effect is obtained for the case of particles in curved space, described by Riemannian structures. We further show that such an effect cannot be achieved in the relativistic regime when dealing with quantum particles in flat or curved spacetime.

翻訳日:2024-02-20 20:10:35 公開日:2024-02-18

# ファブリペロマイクロキャビティにおける量子ドットからのフィルタフリー高性能単一光子放出

Filter-free high-performance single photon emission from a quantum dot in a Fabry-Perot microcavity ( http://arxiv.org/abs/2402.11623v1 )

ライセンス: Link先を確認

Zhixuan Rao, Jiawei Yang, Changkun Song, Mujie Rao, Ziyang Zheng, Luyu Liu, Xuebin Peng, Ying Yu and Siyuan Yu

(参考訳) 共鳴励起とPurcell-enhanced single quantum dots(QD)を組み合わせることは、高性能な固体単一光子源を実現するための重要な戦略である。しかし、光子効率の最適化には、励起レーザーとqdsの発光を効果的に分離する問題に対処する必要がある。伝統的に、これは偏光フィルタリングであり、達成可能な偏光方向とフォトニック状態のスケーラビリティを制限する。本研究では, モノリシックファブリペロマイクロキャビティと決定的に結合したQDの空間直交共振励起を用いて, この問題に対処した。膜キャビティ構造を利用して, フィルタのない単一光子共鳴蛍光を実現した。得られた光源は、高い抽出効率が0.87、純度が0.9045(4)、識別性が0.963(4)である単一光子を生成する。

Combining resonant excitation with Purcell-enhanced single quantum dots (QDs) stands out as a prominent strategy for realizing high performance solid-state single photon sources. However, optimizing photon efficiency requires addressing challenges associated with effectively separating the excitation laser from QDs' emission. Traditionally, this involves polarization filtering, which limits the achievable polarization directions and the scalability of photonic states. In this study, we have successfully tackled this challenge by employing spatially-orthogonal resonant excitation of QDs, deterministically coupled to monolithic Fabry-Perot microcavities. Leveraging the membrane cavity structures, we have achieved filter-free single photon resonant fluorescence. The resulting source produces single photons with a simultaneous high extraction efficiency of 0.87, purity of 0.9045(4), and indistinguishability of 0.963(4).

翻訳日:2024-02-20 20:10:23 公開日:2024-02-18

# 論理閉ループ:大規模視覚言語モデルにおける物体幻覚の発見

Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models ( http://arxiv.org/abs/2402.11622v1 )

ライセンス: Link先を確認

Junfei Wu, Qiang Liu, Ding Wang, Jinghao Zhang, Shu Wu, Liang Wang, Tieniu Tan

(参考訳) 物体幻覚は、大きな視覚言語モデル(LVLM)の幅広い応用を妨げるアキレス腱である。オブジェクト幻覚(Object Hallucination)とは、LVLMが画像に存在しない物体を主張する現象である。対象幻覚を緩和するために,大規模計算資源を必要とするか,あるいは外部モデルの検出結果に依存する命令チューニングや外部モデルに基づく検出手法が提案されている。しかし、lvlm自体を物体幻覚の緩和に利用する未熟な分野は残されている。本研究では、lvlm は存在物体に対して論理的に一貫して応答するが、幻覚対象には一貫性がないという直観を取り入れている。そこで我々は,物体の幻覚検出と緩和のための論理閉ループベースのフレームワーク,LogicCheckGPTを提案する。具体的には、論理的整合性探索を考案し、論理的相関による質問を提起し、オブジェクトの属性を問う。それらの反応が論理閉ループを形成するか否かは、対象幻覚の指標となる。プラグアンドプレイ法として、既存のすべてのLVLMにシームレスに適用することができる。 4つのLVLMにまたがる3つのベンチマークで実施した総合的な実験により,本手法による大幅な改善が示された。

Object hallucination has been an Achilles' heel which hinders the broader applications of large vision-language models (LVLMs). Object hallucination refers to the phenomenon in which LVLMs claim that non-existent objects are present in the image. To mitigate object hallucinations, instruction tuning and external model-based detection methods have been proposed, which either require large-scale computational resources or depend on the detection results of external models. However, using the LVLM itself to alleviate object hallucinations remains under-explored. In this work, we adopt the intuition that the LVLM tends to respond in a logically consistent way for existent objects but inconsistently for hallucinated objects. Therefore, we propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT. Specifically, we devise logical consistency probing to raise questions with logical correlations, inquiring about attributes from objects and vice versa. Whether their responses can form a logical closed loop serves as an indicator of object hallucination. As a plug-and-play method, it can be seamlessly applied to all existing LVLMs. Comprehensive experiments conducted on three benchmarks across four LVLMs have demonstrated significant improvements brought by our method, indicating its effectiveness and generality.
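
The closed-loop check described above can be sketched as follows. This is only an illustration of the idea, not the LogicCheckGPT implementation: the two callables `ask_attributes` and `ask_objects_with` are hypothetical wrappers around an LVLM, the scoring rule (fraction of attributes that lead back to the object) and the `threshold` value are assumptions, and the model is replaced here by dictionary lookups.

```python
def closed_loop_rate(obj, ask_attributes, ask_objects_with):
    """Fraction of object -> attribute -> object probes that return to `obj`.

    `ask_attributes(obj)` and `ask_objects_with(attribute)` are hypothetical
    callables standing in for questions posed to a vision-language model.
    """
    attributes = ask_attributes(obj)
    if not attributes:
        return 0.0
    closed = sum(1 for attr in attributes if obj in ask_objects_with(attr))
    return closed / len(attributes)


def flag_hallucinations(objects, ask_attributes, ask_objects_with, threshold=0.5):
    """Return the objects whose closed-loop rate falls below `threshold`."""
    return [
        obj
        for obj in objects
        if closed_loop_rate(obj, ask_attributes, ask_objects_with) < threshold
    ]


# Stub "model": answers come from fixed tables instead of an actual LVLM.
attribute_answers = {
    "dog": ["brown", "running"],
    "frisbee": ["red", "flying"],
}
object_answers = {
    "brown": ["dog"],
    "running": ["dog"],
    "red": ["ball"],     # the stub never mentions a frisbee when asked in reverse
    "flying": ["bird"],
}

flagged = flag_hallucinations(
    ["dog", "frisbee"],
    ask_attributes=lambda obj: attribute_answers.get(obj, []),
    ask_objects_with=lambda attr: object_answers.get(attr, []),
)
```

In this toy run the "dog" probes close the loop and the "frisbee" probes do not, so only the latter is flagged.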

翻訳日:2024-02-20 20:10:05 公開日:2024-02-18

# Decoding News Narratives: Framing Bias Detectionにおける大規模言語モデルの批判的分析

Decoding News Narratives: A Critical Analysis of Large Language Models in Framing Bias Detection ( http://arxiv.org/abs/2402.11621v1 )

ライセンス: Link先を確認

Valeria Pastorino, Jasivan A. Sivakumar, Nafise Sadat Moosavi

(参考訳) 本研究は,GPT-3.5 Turbo, GPT-4, Flan-T5モデルを用いて,ゼロショット, 少数ショット, 説明可能なプロンプト手法によるニュース見出しのフレーミングバイアスを検出することにより, 社会科学におけるLCMの適用性の向上に寄与する。評価から得られた重要な知見は、これらのモデルの信頼性を高めるための説明可能な効果が顕著であり、フレーミングバイアスに関する社会科学研究における説明可能な設定の重要性を強調している。特にGPT-4は、関連するドメイン内の様々な例を示す場合、いくつかのシナリオでパフォーマンスが向上した。 FLAN-T5の貧弱な性能は、より小さなモデルではフレーミングバイアスの検出にタスク固有の微調整が必要になることを示している。また、モデル、特にgpt-4は、しばしば感情言語をフレーミングバイアスの指標として誤解し、真の感情表現を報告することと、意図的にニュース見出しでフレーミングバイアスを使用することを区別することの難しさを強調している。さらに,フレーミングバイアスの有無が明確か,あるいはより議論された見出しの2つの部分集合について評価を行い,既存のデータセットや新しいデータセット内の潜在的なアノテーション不正確性をフラグ付けする上で,これらのモデルが有効であることを示唆した。最後に、この研究は、実際の状況(野における)におけるモデルを評価し、米国銃暴力に焦点を当てた最初のデータセットを超えて、幅広いトピックをカバーするフレーム付き見出しでモデルのパフォーマンスを評価する。

This work contributes to the expanding research on the applicability of LLMs in social sciences by examining the performance of GPT-3.5 Turbo, GPT-4, and Flan-T5 models in detecting framing bias in news headlines through zero-shot, few-shot, and explainable prompting methods. A key insight from our evaluation is the notable efficacy of explainable prompting in enhancing the reliability of these models, highlighting the importance of explainable settings for social science research on framing bias. GPT-4, in particular, demonstrated enhanced performance in few-shot scenarios when presented with a range of relevant, in-domain examples. Flan-T5's poor performance indicates that smaller models may require additional task-specific fine-tuning for identifying framing bias. Our study also found that models, particularly GPT-4, often misinterpret emotional language as an indicator of framing bias, underscoring the challenge of distinguishing between reporting genuine emotional expression and intentionally using framing bias in news headlines. We further evaluated the models on two subsets of headlines where the presence or absence of framing bias was either clear-cut or more contested, with the results suggesting that these models can be useful in flagging potential annotation inaccuracies within existing or new datasets. Finally, the study evaluates the models in real-world conditions ("in the wild"), moving beyond the initial dataset focused on U.S. gun violence and assessing the models' performance on framed headlines covering a broad range of topics.

翻訳日:2024-02-20 20:09:42 公開日:2024-02-18

# BERT表現における言語特徴の処理プロファイルを識別するメトリック学習符号化モデル

Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations ( http://arxiv.org/abs/2402.11608v1 )

ライセンス: Link先を確認

Louis Jalouzot, Robin Sobczyk, Bastien Lhopitallier, Jeanne Salle, Nur Lan, Emmanuel Chemla, Yair Lakretz

(参考訳) 我々は、ニューラルネットワークが処理対象の理論的特徴をどのように表現するかを理解するための新しいアプローチとして、Metric-Learning Encoding Models (MLEMs)を紹介した。概念実証として,BERTから抽出した神経表現にMLEMを適用し,多種多様な言語的特徴(時制,主観的人格,節型,節の埋め込みなど)を追跡する。 1) 言語的特徴は順序づけられる: 異なる層で異なる程度に異なる文の表現を分離する; 2) 神経的表現は階層的に整理される: いくつかの層では、より大きなクラスターの中に入れ替わる表現の集合体が、連続して重要な言語的特徴に従って見つかる; (3) 言語的特徴は中間層で不連続である: 区別的、選択的単位は異なる言語的特徴によって活性化される。メソジカルには、MLEMは多変量復号法よりも優れ、型Iエラーに対してより堅牢であり、(5)局所表現と分散表現の両方を予測することができる。これは、言語モデルにおける言語的特徴のニューラルエンコード方法の研究におけるメトリックラーニング符号化法の有用性と、従来の手法よりもMLEMの利点を示すものである。 MLEMは、他のドメイン(例えば視覚)や人間の脳などの他の神経系に拡張することができる。

We introduce Metric-Learning Encoding Models (MLEMs) as a new approach to understand how neural systems represent the theoretical features of the objects they process. As a proof-of-concept, we apply MLEMs to neural representations extracted from BERT, and track a wide variety of linguistic features (e.g., tense, subject person, clause type, clause embedding). We find that: (1) linguistic features are ordered: they separate representations of sentences to different degrees in different layers; (2) neural representations are organized hierarchically: in some layers, we find clusters of representations nested within larger clusters, following successively important linguistic features; (3) linguistic features are disentangled in middle layers: distinct, selective units are activated by distinct linguistic features. Methodologically, MLEMs are superior (4) to multivariate decoding methods, being more robust to type-I errors, and (5) to univariate encoding methods, in being able to predict both local and distributed representations. Together, this demonstrates the utility of Metric-Learning Encoding Methods for studying how linguistic features are neurally encoded in language models and the advantage of MLEMs over traditional methods. MLEMs can be extended to other domains (e.g. vision) and to other neural systems, such as the human brain.

翻訳日:2024-02-20 20:09:12 公開日:2024-02-18

# 準確率的トイモデルにおける非古典性原始

Non-classicality Primitive in a Quasi-probabilistic Toy Model ( http://arxiv.org/abs/2402.11607v1 )

ライセンス: Link先を確認

Kelvin Onggadinata, Pawel Kurzynski, Dagomir Kaszlikowski

(参考訳) 局所アリスとボブが古典的ランダム性を共有する準確率的玩具モデルにおいて,基本的な非古典的効果を示す。我々のシナリオは、ベル不等式違反などの非古典性の正統的な実証と異なり、両方の局地観察者が自由意志を持ち、ランダムに測定設定を選択する。議論の中核は、Abramsky と Brandenburger (Horizons of the Mind, Springer, Cham (2014)) と Pashayan らによって修正されたアルゴリズムである。アル [Phys. Rev. Lett. 115, 070501 (2015)]Bobが決定論的に準確率演算を行うなら、AliceとBobはそれをシミュレートするために古典的なコミュニケーションを要求する。

We demonstrate a basic non-classical effect in a quasi-probabilistic toy model with local Alice and Bob who share classical randomness. Our scenario differs from the orthodox demonstrations of non-classicality, such as violations of Bell inequalities, where both local observers have free will and randomly choose their measurement settings. The core of the argument is a pair of modified algorithms by Abramsky and Brandenburger [in Horizons of the Mind, Springer, Cham (2014)] and Pashayan et al. [Phys. Rev. Lett. 115, 070501 (2015)], which we use to show that if Bob deterministically performs a quasi-stochastic operation, Alice and Bob require classical communication to simulate it.

翻訳日:2024-02-20 20:08:42 公開日:2024-02-18

# 自己進化型オートエンコーダ組み込みQネットワーク

Self-evolving Autoencoder Embedded Q-Network ( http://arxiv.org/abs/2402.11604v1 )

ライセンス: Link先を確認

J. Senthilnath, Bangjian Zhou, Zhen Wei Ng, Deeksha Aggarwal, Rajdeep Dutta, Ji Wei Yoon, Aye Phyu Phyu Aung, Keyu Wu, Min Wu, Xiaoli Li

(参考訳) 逐次的意思決定タスクの分野では,強化学習(rl)エージェントの探索能力は,環境とのインタラクションを通じて高い報酬を得る上で最重要となる。そこで本研究では,自己進化型オートエンコーダ(SA)をQ-Network(QN)に組み込む新しい手法であるSAQNを提案する。 SAQNでは、自己進化型オートエンコーダアーキテクチャは、エージェントが環境を探索する際に適応して進化する。この進化により、オートエンコーダは様々な生の観測を捉え、潜在空間において効果的に表現することができる。エンコーダ生成した潜在空間から抽出された不連続状態を利用して、qnを訓練し、報酬を改善する最適なアクションを決定する。オートエンコーダアーキテクチャの進化において、rlエージェントからの最適な応答を導出するためにバイアス分散規制戦略が用いられる。この戦略には2つの重要な要素があります (i)事前に獲得した知識を保持するためのノードの成長の促進、環境の豊かな表現の確保、 (ii)より管理可能でトラクタブルな潜在空間を維持するために、最小の寄与ノードをプルーニングすること。 3つの異なるベンチマーク環境と実世界の分子環境で行った大規模な実験により、提案したSAQNは最先端の環境よりも大幅に優れていることが示された。その結果、自己進化型オートエンコーダの有効性と、シーケンシャルな意思決定タスクに取り組む上でのQ-Networkとの協調性を強調した。

In the realm of sequential decision-making tasks, the exploration capability of a reinforcement learning (RL) agent is paramount for achieving high rewards through interactions with the environment. To enhance this crucial ability, we propose SAQN, a novel approach wherein a self-evolving autoencoder (SA) is embedded with a Q-Network (QN). In SAQN, the self-evolving autoencoder architecture adapts and evolves as the agent explores the environment. This evolution enables the autoencoder to capture a diverse range of raw observations and represent them effectively in its latent space. By leveraging the disentangled states extracted from the encoder generated latent space, the QN is trained to determine optimal actions that improve rewards. During the evolution of the autoencoder architecture, a bias-variance regulatory strategy is employed to elicit the optimal response from the RL agent. This strategy involves two key components: (i) fostering the growth of nodes to retain previously acquired knowledge, ensuring a rich representation of the environment, and (ii) pruning the least contributing nodes to maintain a more manageable and tractable latent space. Extensive experimental evaluations conducted on three distinct benchmark environments and a real-world molecular environment demonstrate that the proposed SAQN significantly outperforms state-of-the-art counterparts. The results highlight the effectiveness of the self-evolving autoencoder and its collaboration with the Q-Network in tackling sequential decision-making tasks.

翻訳日:2024-02-20 20:08:26 公開日:2024-02-18

# マルチタスク推論: 大規模言語モデルは一度に複数の命令を追えるか?

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once? ( http://arxiv.org/abs/2402.11597v1 )

ライセンス: Link先を確認

Guijin Son, Sangwon Baek, Sangdae Nam, Ilgyun Jeong, Seungone Kim

(参考訳) 大規模言語モデル(LLM)は通常、推論呼び出し毎に単一の命令に従うように促される。本研究では、llmsが複数の命令を同時に処理できるかどうかを、マルチタスク推論として分析する。 MTI Bench(Multi-Task Inference Benchmark)は,25タスクにわたる5,000インスタンスを対象とした総合評価ベンチマークである。 MTIベンチの各タスクは2～3つのサブタスクを含む。予想通り、マルチタスク推論は複数の推論呼び出しを必要としないため、平均で1.46倍の推論時間を削減できることを最初に実証した。興味深いことに、タスク分割時のLLMの性能は向上すると期待されているのに対して、Llama-2-Chat-70BやGPT-4のような最先端のLLMは、MTI Benchのシングルタスク推論と比較して最大7.3%、12.4%向上している。 MTI Benchデータセットとコードをこのリンクでリリースします。

Large language models (LLMs) are typically prompted to follow a single instruction per inference call. In this work, we analyze whether LLMs also hold the capability to handle multiple instructions simultaneously, denoted as Multi-Task Inference. For this purpose, we introduce the MTI Bench (Multi-Task Inference Benchmark), a comprehensive evaluation benchmark encompassing 5,000 instances across 25 tasks. Each task in the MTI Bench involves 2 to 3 sub-tasks. As expected, we first demonstrate that Multi-Task Inference reduces the total inference time by a factor of 1.46 on average, since it does not require multiple inference calls. Interestingly, contrary to the expectation that LLMs would perform better when tasks are divided, we find that state-of-the-art LLMs, such as Llama-2-Chat-70B and GPT-4, show up to 7.3% and 12.4% improved performance with Multi-Task Inference compared to Single-Task Inference on the MTI Bench. We release the MTI Bench dataset and our code at this link https://github.com/guijinSON/MTI-Bench.
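
The difference between the two prompting modes can be sketched with plain string construction. The prompt wording below is a hypothetical template for illustration only and is not the format used by MTI Bench; the function names are likewise made up. No model is called.

```python
def build_multi_task_prompt(context, instructions):
    """Pack several instructions into one prompt, i.e. one inference call."""
    lines = [
        f"Context: {context}",
        "",
        "Complete all of the following tasks in order:",
    ]
    for number, instruction in enumerate(instructions, start=1):
        lines.append(f"Task {number}: {instruction}")
    lines.append("")
    lines.append("Answer each task on its own line, prefixed with 'Task <n>:'.")
    return "\n".join(lines)


def build_single_task_prompts(context, instructions):
    """Baseline: one prompt per instruction, i.e. one inference call each."""
    return [f"Context: {context}\n\nTask: {instruction}" for instruction in instructions]


context = "The meeting was moved from Monday to Thursday at 3 pm."
instructions = [
    "Extract the new meeting day.",
    "Rewrite the sentence in the passive voice.",
]

multi_prompt = build_multi_task_prompt(context, instructions)
single_prompts = build_single_task_prompts(context, instructions)
```

Here the multi-task variant needs one call where the single-task variant needs `len(instructions)` calls, which is the source of the inference-time saving the abstract reports.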

翻訳日:2024-02-20 20:08:02 公開日:2024-02-18

# オンライン機械学習におけるハイパーパラメータチューニングの簡略化 -- spotRiverGUI

Simplifying Hyperparameter Tuning in Online Machine Learning -- The spotRiverGUI ( http://arxiv.org/abs/2402.11594v1 )

ライセンス: Link先を確認

Thomas Bartz-Beielstein

(参考訳) Batch Machine Learning(BML)は非常に大量のストリーミングデータを扱う場合、その限界に達する。これは、利用可能なメモリ、データストリームのドリフト処理、新しい未知のデータ処理に特に当てはまる。 Online Machine Learning (OML)は、BMLの制限を克服するBMLに代わるものだ。 OMLはシーケンシャルな方法でデータを処理することができ、特にデータストリームに役立ちます。 River`パッケージはPython OMLライブラリであり、分類、回帰、クラスタリング、異常検出など、さまざまなオンライン学習アルゴリズムを提供する。パッケージは、OMLモデルのハイパーパラメータチューニングのためのフレームワークを提供する。 spotRiverGUI`は、‘spotRiver`パッケージのグラフィカルユーザインターフェースである。 spotrivergui`は、最適なハイパーパラメータの設定を手動で検索する負担からユーザーを解放する。データが提供されると、ユーザは強力な‘River’パッケージから異なるOMLアルゴリズムを比較して、選択したアルゴリズムを非常に効率的にチューニングできる。

Batch Machine Learning (BML) reaches its limits when dealing with very large amounts of streaming data. This is especially true for available memory, handling drift in data streams, and processing new, unknown data. Online Machine Learning (OML) is an alternative to BML that overcomes the limitations of BML. OML is able to process data in a sequential manner, which is especially useful for data streams. The `river` package is a Python OML-library, which provides a variety of online learning algorithms for classification, regression, clustering, anomaly detection, and more. The `spotRiver` package provides a framework for hyperparameter tuning of OML models. The `spotRiverGUI` is a graphical user interface for the `spotRiver` package. The `spotRiverGUI` releases the user from the burden of manually searching for the optimal hyperparameter setting. After the data is provided, users can compare different OML algorithms from the powerful `river` package in a convenient way and tune the selected algorithms very efficiently.
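
The following sketch illustrates what sequential processing and hyperparameter comparison mean in an online setting. It deliberately does not import `river`, `spotRiver`, or `spotRiverGUI`; the class, the method names `predict_one` and `learn_one` (chosen to resemble the one-sample-at-a-time style of online learning libraries), the synthetic stream, and the two candidate learning rates are all assumptions made for illustration.

```python
class OnlineLinearRegression:
    """A tiny linear model updated one sample at a time with SGD."""

    def __init__(self, n_features, lr):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = lr

    def predict_one(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        error = self.predict_one(x) - y
        self.weights = [w - self.lr * error * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * error


def progressive_validation(model, stream):
    """Predict first, then learn: every sample is test data before it is training data."""
    abs_errors = []
    for x, y in stream:
        abs_errors.append(abs(model.predict_one(x) - y))
        model.learn_one(x, y)
    return abs_errors


def make_stream(n_samples):
    """Synthetic, noise-free stream following y = 2x + 1."""
    for i in range(n_samples):
        x = (i % 10) / 10.0
        yield (x,), 2.0 * x + 1.0


# A minimal "tuning" loop: compare two candidate learning rates on the same stream.
candidate_lrs = [0.001, 0.2]
mean_error = {}
for lr in candidate_lrs:
    model = OnlineLinearRegression(n_features=1, lr=lr)
    errors = progressive_validation(model, make_stream(2000))
    mean_error[lr] = sum(errors) / len(errors)

best_lr = min(mean_error, key=mean_error.get)
```

The model never stores the stream, so memory use stays constant regardless of how many samples arrive. A tuning tool automates the comparison loop at the end over a much larger search space.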

翻訳日:2024-02-20 20:07:44 公開日:2024-02-18

# メモリ効率の良いLLMファインチューニングのためのゼロ階最適化の再検討:ベンチマーク

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark ( http://arxiv.org/abs/2402.11592v1 )

ライセンス: Link先を確認

Yihua Zhang, Pingzhi Li, Junyuan Hong, Jiaxiang Li, Yimeng Zhang, Wenqing Zheng, Pin-Yu Chen, Jason D. Lee, Wotao Yin, Mingyi Hong, Zhangyang Wang, Sijia Liu, Tianlong Chen

(参考訳) 自然言語処理(NLP)の進化途上において、SGDやAdamのような一階最適化(FO)を備えた微調整済みの大規模言語モデル(LLM)が標準となっている。しかし, LLMのサイズが大きくなるにつれて, FO勾配計算のバックプロパゲーション(BP)によるメモリオーバーヘッドが大幅に増大する。メモリ効率が最重要となるオンデバイストレーニングのようなアプリケーションでは、この問題に対処することが特に重要です。本稿では, BPフリーなゼロオーダー最適化(ZO)へのシフトを, MeZO による初期概念に基づく LLM 微調整時のメモリコスト削減ソリューションとして提案する。従来のZO-SGD法とは異なり、我々の研究はより広範なZO最適化手法に拡張され、5つのLLMファミリー(Roberta, OPT, LLaMA, Vicuna, Mistral)、3つのタスク複雑度、5つの微調整スキームにまたがる総合的なベンチマーク研究が実施されている。本研究は,これまで見過ごされていた最適化原理を明らかにし,タスクアライメントの重要性,前傾勾配法の役割,アルゴリズムの複雑さと微調整性能のバランスを強調する。さらに,ブロックワイド降下,ハイブリッドトレーニング,勾配間隔など,ZO最適化の新たな拡張も導入する。本研究は、さらなるメモリ効率のllm微調整を実現するための有望な方向性を提供する。すべての実験を再現するためのコードはhttps://github.com/ZO-Bench/ZO-LLM にある。

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by MeZO. Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques, through a comprehensive, first-of-its-kind benchmarking study across five LLM families (RoBERTa, OPT, LLaMA, Vicuna, Mistral), three task complexities, and five fine-tuning schemes. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction for achieving further memory-efficient LLM fine-tuning. Code to reproduce all our experiments is at https://github.com/ZO-Bench/ZO-LLM .
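
To make the BP-free idea concrete, here is a minimal sketch of a two-point randomized gradient estimate of the kind used in ZO-SGD: the loss is evaluated at two perturbed parameter vectors and the finite difference is projected back onto the random direction. The toy quadratic loss, the step size, the perturbation size `mu`, and all function names are assumptions for illustration; this is not the benchmark's code.

```python
import random


def zo_gradient(loss, theta, mu, rng):
    """Two-point zeroth-order gradient estimate using only loss evaluations.

    Samples a Gaussian direction u and returns
    ((loss(theta + mu*u) - loss(theta - mu*u)) / (2*mu)) * u.
    """
    u = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + mu * ui for t, ui in zip(theta, u)]
    minus = [t - mu * ui for t, ui in zip(theta, u)]
    directional = (loss(plus) - loss(minus)) / (2.0 * mu)
    return [directional * ui for ui in u]


def zo_sgd(loss, theta, lr, mu, steps, seed=0):
    """Plain SGD where the gradient is replaced by the zeroth-order estimate."""
    rng = random.Random(seed)
    for _ in range(steps):
        grad = zo_gradient(loss, theta, mu, rng)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def quadratic(theta):
    """Toy loss with its minimum at (3, 3)."""
    return sum((t - 3.0) ** 2 for t in theta)


theta = zo_sgd(quadratic, [0.0, 0.0], lr=0.05, mu=1e-3, steps=500)
final_loss = quadratic(theta)
```

Each step needs two forward evaluations and no stored activations, which is where the memory saving over back-propagation comes from. The price is a noisier gradient estimate whose variance grows with the number of parameters.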

翻訳日:2024-02-20 20:07:30 公開日:2024-02-18

# SDiT:トランスを用いたスパイキング拡散モデル

SDiT: Spiking Diffusion Model with Transformer ( http://arxiv.org/abs/2402.11588v1 )

ライセンス: Link先を確認

Shu Yang, Hanzhi Ma, Chengting Yu, Aili Wang, Er-Ping Li

(参考訳) スパイキングニューラルネットワーク (snn) は消費電力が低く, バイオコンタプリタブルな特性を有しており, エネルギー効率の高いコンピューティングの可能性を秘めていると考えられている。しかし、画像生成タスクにおけるSNNの探索は非常に限定的であり、SNNベースの生成モデルに対する統一的で効果的な構造はまだ提案されていない。本稿では,スパイクニューラルネットワークにおける新しい拡散モデルアーキテクチャについて検討する。我々は、主流拡散モデルにおいてよく使われるU-net構造を置き換えるためにトランスフォーマーを利用する。比較的低い計算コストと短いサンプリング時間で高品質な画像を生成することができる。 SNNに基づく生成モデルの研究のための経験的ベースラインの提供を目的としている。 MNIST、Fashion-MNIST、CIFAR-10データセットの実験は、既存のSNN生成モデルと比較して、我々の研究が非常に競合していることを示している。

Spiking neural networks (SNNs) have low power consumption and bio-interpretable characteristics, and are considered to have tremendous potential for energy-efficient computing. However, the exploration of SNNs on image generation tasks remains very limited, and a unified and effective structure for SNN-based generative models has yet to be proposed. In this paper, we explore a novel diffusion model architecture within spiking neural networks. We use a transformer to replace the U-Net structure commonly used in mainstream diffusion models. The resulting model can generate higher-quality images with relatively lower computational cost and shorter sampling time. It aims to provide an empirical baseline for research on generative models based on SNNs. Experiments on the MNIST, Fashion-MNIST, and CIFAR-10 datasets demonstrate that our work is highly competitive compared to existing SNN generative models.

翻訳日:2024-02-20 20:07:01 公開日:2024-02-18

# PolypNextLSTM:ConvNextとConvLSTMを用いた軽量かつ高速なPolypビデオセグメンテーションネットワーク

PolypNextLSTM: A lightweight and fast polyp video segmentation network using ConvNext and ConvLSTM ( http://arxiv.org/abs/2402.11585v1 )

ライセンス: Link先を確認

Debayan Bhattacharya, Konrad Reuter, Finn Behrendnt, Lennart Maack, Sarah Grube, Alexander Schlaefer

(参考訳) ポリプセグメンテーションで一般的に用いられる単一の画像unetアーキテクチャは、ポリープの診断においてビデオデータから得られる時間的洞察が欠如している。臨床実践をより忠実に反映するために,提案手法であるPolypNextLSTMは,映像に基づく深層学習を活用し,時間的情報を利用して,最小パラメータオーバーヘッドでセグメンテーション性能を向上させる。 PolypNextLSTMは、UNetライクな構造で、ConvNext-Tinyをバックボーンとして、パラメータオーバーヘッドを減らすために、最後の2つのレイヤを戦略的に省略する。我々の時間融合モジュールであるConvLSTM(Convolutional Long Short Term Memory)は、時間的特徴を効果的に活用する。我々の主な特徴はPolypNextLSTMであり、パラメータの最もリーンで最速のモデルであり、5つの最先端の画像モデルとビデオベースのディープラーニングモデルの性能を上回っている。 sun-segデータセットの評価は、高速モーションやオクルージョンのような挑戦的なアーティファクトを含むビデオとともに、検出が容易で検出が難しいポリプシナリオにまたがる。

Commonly employed in polyp segmentation, single-image UNet architectures lack the temporal insight clinicians gain from video data in diagnosing polyps. To mirror clinical practices more faithfully, our proposed solution, PolypNextLSTM, leverages video-based deep learning, harnessing temporal information for superior segmentation performance with the least parameter overhead, making it possibly suitable for edge devices. PolypNextLSTM employs a UNet-like structure with ConvNext-Tiny as its backbone, strategically omitting the last two layers to reduce parameter overhead. Our temporal fusion module, a Convolutional Long Short Term Memory (ConvLSTM), effectively exploits temporal features. Our primary novelty lies in PolypNextLSTM, which stands out as the leanest in parameters and the fastest model, surpassing the performance of five state-of-the-art image and video-based deep learning models. The evaluation on the SUN-SEG dataset spans easy-to-detect and hard-to-detect polyp scenarios, along with videos containing challenging artefacts like fast motion and occlusion.

翻訳日:2024-02-20 20:06:49 公開日:2024-02-18

# 公に監査可能なプライバシー保護選挙ロール

Publicly auditable privacy-preserving electoral rolls ( http://arxiv.org/abs/2402.11582v1 )

ライセンス: Link先を確認

Prashant Agrawal, Mahabir Prasad Jhanwar, Subodh Vishnu Sharma, Subhashis Banerjee

(参考訳) 電子投票に関する既存の文献は、投票プロトコルの妥当性を広く取り上げているが、大規模な選挙における選挙権の脆弱性は依然として重要な懸念となっている。選挙人ロールの完全性を確保するために、現在の慣習は選挙人ロールを公にするか、政党と共有することである。しかし、これは詳細な有権者プロファイルの構築と、有権者の選択的ターゲティングと操作を可能にし、自由かつ公正な選挙の基本原則を損なう。本稿では,公的な監査可能かつプライバシ保護型選挙ロールの設計問題について検討する。まず脅威モデルを定式化し、正式なセキュリティ定義を提供する。次に,脅威を軽減する選挙ロールの作成と維持のためのプロトコルを提案する。政党や監査役は選挙のロールを統計的に監査することができる。選挙人名簿全体は明かされておらず、大規模な組織的な選挙人によるターゲティングや操作を妨げている。

While existing literature on electronic voting has extensively addressed verifiability of voting protocols, the vulnerability of electoral rolls in large public elections remains a critical concern. To ensure integrity of electoral rolls, the current practice is to either make electoral rolls public or share them with the political parties. However, this enables construction of detailed voter profiles and selective targeting and manipulation of voters, thereby undermining the fundamental principle of free and fair elections. In this paper, we study the problem of designing publicly auditable yet privacy-preserving electoral rolls. We first formulate a threat model and provide formal security definitions. We then present a protocol for creation and maintenance of electoral rolls that mitigates the threats. Eligible voters can verify their inclusion, whereas political parties and auditors can statistically audit the electoral roll. The entire electoral roll is never revealed, which prevents any large-scale systematic voter targeting and manipulation.

翻訳日:2024-02-20 20:06:24 公開日:2024-02-18

# 因果潜在因子モデルにおける二重ロバスト推論

Doubly Robust Inference in Causal Latent Factor Models ( http://arxiv.org/abs/2402.11652v1 )

ライセンス: Link先を確認

Alberto Abadie, Anish Agarwal, Raaz Dwivedi, Abhin Shah

(参考訳) 本稿では、多数の単位と結果を含む現代データ豊富な環境において、観測不能な条件下での平均処理効果を推定するための新しいフレームワークを紹介する。提案した推定器は2重に頑健であり,結果計算,逆確率重み付け,行列補完のための新しいクロスフィット手法を組み合わせた。有限サンプルと漸近保証を導出し、新しい推定器の誤差がパラメトリックレートで平均零ガウス分布に収束することを示す。シミュレーション結果は,本論文で分析した推定器の形式的特性の実用的妥当性を示す。

This article introduces a new framework for estimating average treatment effects under unobserved confounding in modern data-rich environments featuring large numbers of units and outcomes. The proposed estimator is doubly robust, combining outcome imputation, inverse probability weighting, and a novel cross-fitting procedure for matrix completion. We derive finite-sample and asymptotic guarantees, and show that the error of the new estimator converges to a mean-zero Gaussian distribution at a parametric rate. Simulation results demonstrate the practical relevance of the formal properties of the estimators analyzed in this article.
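
For readers unfamiliar with double robustness, the sketch below shows the textbook augmented inverse probability weighting (AIPW) form that combines outcome imputation with inverse probability weighting. It is not the estimator of the paper: the matrix completion and cross-fitting steps are omitted, and the imputed outcomes `mu1`, `mu0` and propensities `p` are simply assumed to be given. The toy numbers are made up.

```python
def aipw_ate(y, a, mu1, mu0, p):
    """Augmented IPW estimate of the average treatment effect.

    y   : observed outcomes
    a   : treatment indicators (1 = treated, 0 = control)
    mu1 : imputed outcomes under treatment (assumed given)
    mu0 : imputed outcomes under control (assumed given)
    p   : estimated treatment probabilities (assumed given, strictly in (0, 1))
    """
    total = 0.0
    for yi, ai, m1, m0, pi in zip(y, a, mu1, mu0, p):
        # Imputed outcome plus an inverse-probability-weighted residual correction.
        treated_part = m1 + ai * (yi - m1) / pi
        control_part = m0 + (1 - ai) * (yi - m0) / (1 - pi)
        total += treated_part - control_part
    return total / len(y)


y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
p = [0.5, 0.5, 0.5, 0.5]

# Case 1: the outcome model matches the observed outcomes exactly.
ate_good_outcome_model = aipw_ate(
    y, a, mu1=[3.0, 3.0, 4.0, 4.0], mu0=[1.0, 1.0, 2.0, 2.0], p=p
)

# Case 2: the outcome model is useless (all zeros) but the propensities are right;
# the estimate then reduces to plain inverse probability weighting.
ate_bad_outcome_model = aipw_ate(y, a, mu1=[0.0] * 4, mu0=[0.0] * 4, p=p)
```

In this toy example both calls give the same estimate, which illustrates the sense in which the estimator tolerates an error in one of its two ingredients.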

翻訳日:2024-02-20 19:59:13 公開日:2024-02-18

# 失敗から学ぶ: 大きな言語モデルをエージェントとして微調整するとき、否定的な例を統合する

Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents ( http://arxiv.org/abs/2402.11651v1 )

ライセンス: Link先を確認

Renxi Wang, Haonan Li, Xudong Han, Yixuan Zhang, Timothy Baldwin

(参考訳) 大規模言語モデル(llm)は、検索エンジンのようなツールを通じて環境と対話するエージェントとして機能することに成功した。しかし、LSMはトレーニングやアライメントにおいてツールの使用に特化せず、エージェントとしての有効性を制限している。この問題を解決するために、従来の研究はGPT-4と環境の間の相互作用軌跡を収集し、それらを微調整した小さなモデルを開発した。これの一環として、標準的なアプローチでは、タスクを正常に終了しないトラジェクトリを単に破棄し、一方、データやリソースのかなりの無駄を招き、他方、微調整時に可能な最適化パスを制限する可能性がある。本稿では,大規模な言語モデルが適切なデータクリーニングと微調整戦略によって失敗から学習できることを論じる。数学的推論,マルチホップ質問応答,戦略的質問応答タスクについて実験を行う。実験結果から, 正の例のみを用いた場合と比較して, 負の例を取り入れた場合, モデル性能が大きく向上することが示された。

Large language models (LLMs) have achieved success in acting as agents, which interact with environments through tools like search engines. However, LLMs are not optimized specifically for tool use during training or alignment, limiting their effectiveness as agents. To resolve this problem, previous work has collected interaction trajectories between GPT-4 and environments, and fine-tuned smaller models with them. As part of this, the standard approach has been to simply discard trajectories that do not finish the task successfully, which, on the one hand, leads to a significant waste of data and resources, and on the other hand, has the potential to limit the possible optimization paths during fine-tuning. In this paper, we contend that large language models can learn from failures through appropriate data cleaning and fine-tuning strategies. We conduct experiments on mathematical reasoning, multi-hop question answering, and strategic question answering tasks. Experimental results demonstrate that compared to solely using positive examples, incorporating negative examples enhances model performance by a large margin.

翻訳日:2024-02-20 19:59:03 公開日:2024-02-18

# プログラム強化学習のための理論的基礎

Theoretical foundations for programmatic reinforcement learning ( http://arxiv.org/abs/2402.11650v1 )

ライセンス: Link先を確認

Guruprerana Shabadi, Nathanaël Fijalkow, Théo Matricon

(参考訳) 強化学習(rl)の分野は、未知の確率環境において最適方針を学習するためのアルゴリズムに関するものである。プログラムRLは、制御ループのような高次構造を含むプログラムとしてポリシーの表現を研究する。機械学習とフォーマルなメソッドコミュニティの交差点で多くの注目を集めているにもかかわらず、プログラム的RLに関する理論的側面についてはほとんど知られていない。最適なプログラムポリシーはどのくらい大きいか? どうやって学ぶのか? 本論文の目的は,プログラム的rlの理論研究を始めながら,これらの質問に対する最初の回答を与えることである。

The field of Reinforcement Learning (RL) is concerned with algorithms for learning optimal policies in unknown stochastic environments. Programmatic RL studies representations of policies as programs, that is, policies involving higher-order constructs such as control loops. Despite attracting a lot of attention at the intersection of the machine learning and formal methods communities, very little is known on the theoretical front about programmatic RL: what are good classes of programmatic policies? How large are optimal programmatic policies? How can we learn them? The goal of this paper is to give first answers to these questions, initiating a theoretical study of programmatic RL.

翻訳日:2024-02-20 19:58:31 公開日:2024-02-18

# 機械学習による量子画像処理: 量子画像処理の品質と信頼性を改善する新しいアプローチ

Quantum Image Denoising with Machine Learning: A Novel Approach to Improve Quantum Image Processing Quality and Reliability ( http://arxiv.org/abs/2402.11645v1 )

ライセンス: Link先を確認

Yew Kee Wonga, Yifan Zhou, Yan Shing Liang

(参考訳) 量子画像処理(QIP)は、画像の操作と解析に量子コンピューティングの利点を活用することを目的とした分野である。しかし、QIPは量子ビットの制限と量子マシン内のノイズの存在という2つの課題に直面している。本研究では,QIPにおけるノイズ問題に対処する新しい手法を提案する。量子処理画像のノイズを識別し補正する機械学習モデルを訓練し活用することにより、機械に起因するノイズを補償し、古典的コンピュータが行うものと類似した処理結果を高い効率で得ることができる。このモデルは、オープンアクセスデータセットから既存の処理された画像と量子処理された画像の両方からなるデータセットを学習することでトレーニングされる。このモデルは、各ピクセルとその元の値に対する信頼性レベルを提供することができます。 QIPにおける損失とデコヒーレンスを補正するモデルの精度を評価するために,Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM), Mean Opinion Score (MOS)の3つの指標を用いて評価を行った。さらに、ドメイン間のモデルの適用性や、代替手法と比較してコスト効果についても論じる。

Quantum Image Processing (QIP) is a field that aims to utilize the benefits of quantum computing for manipulating and analyzing images. However, QIP faces two challenges: the limitation of qubits and the presence of noise in a quantum machine. In this research, we propose a novel approach to address the issue of noise in QIP. By training and employing a machine learning model that identifies and corrects the noise in quantum-processed images, we can compensate for the noisiness caused by the machine and retrieve a processing result similar to that produced by a classical computer, with higher efficiency. The model is trained on a dataset consisting of both existing processed images and quantum-processed images from open-access datasets. This model will be capable of providing us with the confidence level for each pixel and its potential original value. To assess the model's accuracy in compensating for loss and decoherence in QIP, we evaluate it using three metrics: Peak Signal to Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mean Opinion Score (MOS). Additionally, we discuss the applicability of our model across domains as well as its cost-effectiveness compared to alternative methods.
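Of the three metrics named in the abstract, PSNR has a simple closed form: PSNR = 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error between the reference and the processed image. The sketch below computes it for tiny grayscale images stored as nested lists. The pixel values and the 8-bit range (`max_value=255`) are assumptions for illustration; the abstract does not say how the authors compute the metric.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized grayscale images."""
    ref = [p for row in reference for p in row]
    tst = [p for row in test for p in row]
    if len(ref) != len(tst):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(ref, tst)) / len(ref)


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


# Made-up 2x2 images: a clean reference and a slightly noisy version.
clean = [[0, 64], [128, 255]]
noisy = [[2, 60], [130, 251]]

print(mse(clean, noisy))
print(psnr(clean, noisy))
```

A higher PSNR means the denoised output is closer to the reference. SSIM and MOS are not shown because they need, respectively, local image statistics and human ratings.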

翻訳日:2024-02-20 19:58:15 公開日:2024-02-18

# 汎用グラフ学習へのアプローチ : 大規模言語モデルの観点から

Towards Versatile Graph Learning Approach: from the Perspective of Large Language Models ( http://arxiv.org/abs/2402.11641v1 )

ライセンス: Link先を確認

Lanning Wei, Jun Gao, Huan Zhao

(参考訳) グラフ構造化データは一般的に使われ、現実世界で幅広いアプリケーションシナリオを持つ。これらの多様なアプリケーションに対して、多種多様な学習タスク、グラフドメイン、複雑なグラフ学習手順は、汎用的なグラフ学習アプローチを設計する際に、人間の専門家に挑戦を与える。これらの課題に直面した大規模言語モデル(LLM)は、広範な知識と人間のような知性のために潜在的な解決策を提供する。本稿では, LLMを用いた多目的グラフ学習手法を設計するための新しい概念的プロトタイプを提案し, 特に 'where' と 'how' の視点に着目した。 'where' の観点では,タスク定義,グラフデータ機能工学,モデル選択と最適化,デプロイと提供という4つの重要なグラフ学習手順を要約する。次に、これらの手順におけるLLMの応用シナリオを幅広いスペクトルにわたって検討する。'how' の観点では、LLMの能力と各手順の要件を一致させます。最後に,LLMの強みを多目的グラフ学習法に活用する上で有望な方向性を指摘する。

Graph-structured data are commonly used and have wide application scenarios in the real world. For these diverse applications, the vast variety of learning tasks, graph domains, and complex graph learning procedures present challenges for human experts when designing versatile graph learning approaches. Facing these challenges, large language models (LLMs) offer a potential solution due to their extensive knowledge and human-like intelligence. This paper proposes a novel conceptual prototype for designing versatile graph learning methods with LLMs, with a particular focus on the "where" and "how" perspectives. From the "where" perspective, we summarize four key graph learning procedures: task definition, graph data feature engineering, model selection and optimization, and deployment and serving. We then explore the application scenarios of LLMs in these procedures across a wider spectrum. From the "how" perspective, we align the abilities of LLMs with the requirements of each procedure. Finally, we point out promising directions that could better leverage the strengths of LLMs towards versatile graph learning methods.

翻訳日:2024-02-20 19:57:36 公開日:2024-02-18

# 妨害ブロック: 攻撃下の機械生成テキスト検出器のロバスト性に関するストレステスト

Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks ( http://arxiv.org/abs/2402.11638v1 )

ライセンス: Link先を確認

Yichen Wang, Shangbin Feng, Abe Bohan Hou, Xiao Pu, Chao Shen, Xiaoming Liu, Yulia Tsvetkov, Tianxing He

(参考訳) 大規模言語モデル(LLM)の普及により、誤用を防ぐために機械生成テキストを検出する手法の需要が高まっている。本研究の目的は,現実のシナリオにおいて,悪意のある攻撃に対する検出器の頑健性をテストすることである。我々は,一般的な機械生成テキスト検出器の堅牢性について,編集,パラフレージング,プロンプト,コジェネレーションの様々なカテゴリの攻撃下で総合的に研究する。我々の攻撃はジェネレータLLMへの限られたアクセスを前提としており、異なる予算レベルで異なる攻撃に対する検出器の性能を比較する。実験の結果、既存の検出器のほとんどは全ての攻撃の下で堅牢性を維持できず、すべての検出器が異なる抜け穴を示すことがわかった。全ての検知器を平均すると、全ての攻撃で性能は35%低下する。さらに,これらの欠陥の原因を調査し,堅牢性を改善するための初期パッチを提案する。

The widespread use of large language models (LLMs) is increasing the demand for methods that detect machine-generated text to prevent misuse. The goal of our study is to stress test the detectors' robustness to malicious attacks under realistic scenarios. We comprehensively study the robustness of popular machine-generated text detectors under attacks from diverse categories: editing, paraphrasing, prompting, and co-generating. Our attacks assume limited access to the generator LLMs, and we compare the performance of detectors on different attacks under different budget levels. Our experiments reveal that almost none of the existing detectors remain robust under all the attacks, and all detectors exhibit different loopholes. Averaging all detectors, the performance drops by 35% across all attacks. Further, we investigate the reasons behind these defects and propose initial out-of-the-box patches to improve robustness.

翻訳日:2024-02-20 19:56:18 公開日:2024-02-18

# フェイクユーザによるフェデレーションレコメンダシステム

Poisoning Federated Recommender Systems with Fake Users ( http://arxiv.org/abs/2402.11637v1 )

ライセンス: Link先を確認

Ming Yin, Yichang Xu, Minghong Fang, and Neil Zhenqiang Gong

(参考訳) フェデレーションレコメンデーション(federated recommendation)は、フェデレーション学習における重要なユースケースだが、ユーザからサーバ側の脆弱性など、さまざまな攻撃に影響を受けやすい。毒殺攻撃は、参加者が悪質なモデルアップデートをアップロードしてグローバルモデルを欺き、特定のターゲットアイテムの宣伝や取り下げを意図しているため、ユーザー側の攻撃で特に顕著である。本研究では,フェデレーションレコメンデータシステムにおけるプロモーションアタック実行戦略について検討する。フェデレートされたレコメンダシステムに対する現在の中毒攻撃は、実際のユーザやアイテムの人気に関するローカルトレーニングデータなどの追加情報に依存することが多い。しかし、そのような情報は潜在的な攻撃者が得るのに困難である。したがって、サーバから取得したアイテムの埋め込み以外に余分な情報を必要としない攻撃を開発する必要がある。本稿では,ユーザ評価データやユーザ属性,サーバが使用するアグリゲーションルールなどの知識を必要とせずに,フェデレーションレコメンデータシステムにおいて,攻撃対象の項目をプロモートするための,新たな偽ユーザベース中毒攻撃であるPoisonFRSを導入する。複数の実世界のデータセットに対する大規模な実験により、PoisonFRSは攻撃対象のアイテムを真のユーザの大部分に効果的にプロモートし、システムに関する追加情報に依存する現在のベンチマークを上回ります。さらに,実際のユーザと偽ユーザの両方によるモデル更新は,潜在領域では区別がつかないことも確認した。

Federated recommendation is a prominent use case within federated learning, yet it remains susceptible to various attacks, from user to server-side vulnerabilities. Poisoning attacks are particularly notable among user-side attacks, as participants upload malicious model updates to deceive the global model, often intending to promote or demote specific targeted items. This study investigates strategies for executing promotion attacks in federated recommender systems. Current poisoning attacks on federated recommender systems often rely on additional information, such as the local training data of genuine users or item popularity. However, such information is challenging for the potential attacker to obtain. Thus, there is a need to develop an attack that requires no extra information apart from item embeddings obtained from the server. In this paper, we introduce a novel fake user based poisoning attack named PoisonFRS to promote the attacker-chosen targeted item in federated recommender systems without requiring knowledge about user-item rating data, user attributes, or the aggregation rule used by the server. Extensive experiments on multiple real-world datasets demonstrate that PoisonFRS can effectively promote the attacker-chosen targeted item to a large portion of genuine users and outperform current benchmarks that rely on additional information about the system. We further observe that the model updates from both genuine and fake users are indistinguishable within the latent space.

翻訳日:2024-02-20 19:55:42 公開日:2024-02-18

# IDEのユニバーサルインターフェースとしてのツール拡張LDM

Tool-Augmented LLMs as a Universal Interface for IDEs ( http://arxiv.org/abs/2402.11635v1 )

ライセンス: Link先を確認

Yaroslav Zharov, Yury Khudyakov, Evgeniia Fedotova, Evgeny Grigorenko, Egor Bogomolov

(参考訳) 現在の統合開発環境(IDE)は、初期のテキスト編集ユーティリティから、開発者を支援する数千の関数を含む複雑なプログラムまで、長い道のりをたどっています。しかし、効率向上ツールが組み込まれたため、IDEは徐々に学習曲線の急激な高度化を図った。自然言語対話とコード生成の両方が可能なLarge Language Models(LLM)の台頭は、IDEの概念の陳腐化に関する議論につながります。本研究では,IDE 施設を包むユニバーサルインターフェースとして,IDE における LLM の位置づけについて考察する。ユーザコマンドで複数のIDE機能を含む複雑なアクションを実行でき、オプションやアクションを検索する際の面倒な作業のユーザエクスペリエンスを削除できるモデルを構想する。作業の実際的な部分については、あるタスクの実行を迅速化する外部ツールを呼び出すLLMの能力を探究する作業に従事します。このようなツールの概念実証を紹介する。

Modern-day Integrated Development Environments (IDEs) have come a long way from the early text editing utilities to the complex programs encompassing thousands of functions to help developers. However, with the increasing number of efficiency-enhancing tools incorporated, IDEs gradually became sophisticated software with a steep learning curve. The rise of the Large Language Models (LLMs) capable of both natural language dialogue and code generation leads to a discourse on the obsolescence of the concept of IDE. In this work, we offer a view on the place of the LLMs in the IDEs as the universal interface wrapping the IDE facilities. We envision a model that is able to perform complex actions involving multiple IDE features upon user command, stripping the user experience of the tedious work involved in searching through options and actions. For the practical part of the work, we engage with the works exploring the ability of LLMs to call for external tools to expedite a given task execution. We showcase a proof-of-concept of such a tool.

翻訳日:2024-02-20 19:55:15 公開日:2024-02-18

# インテント認識情報参照ダイアログ生成のためのセルフシーディングおよびマルチインテント自己指示llm

Self-seeding and Multi-intent Self-instructing LLMs for Generating Intent-aware Information-Seeking dialogs ( http://arxiv.org/abs/2402.11633v1 )

ライセンス: Link先を確認

Arian Askari, Roxana Petcu, Chuan Meng, Mohammad Aliannejadi, Amin Abolghasemi, Evangelos Kanoulas, Suzan Verberne

(参考訳) 情報検索ダイアログにおけるユーザ意図の特定は,ユーザの情報ニーズを満たすシステムにとって極めて重要である。意図予測(IP)は困難であり、トレーニングのための人間ラベルの意図と十分な対話を要求する。しかし、手動でアノテートするインテントはリソース集約である。大規模言語モデル(LLM)は合成データの生成に有効であることが示されているが、意図認識情報参照ダイアログを生成するためにLLMを使用する研究はない。本稿では,大規模・オープンドメイン・インテント対応情報検索ダイアログのゼロショット生成にLLMを活用することに焦点を当てる。本稿では,新しいセルフシーディングとマルチインテント・セルフインストラクションスキームを持つSOLIDを提案する。前者は、LLM自身の知識スコープを用いてダイアログ生成を開始し、後者は、LLMに順次発声を発生させるよう促し、複雑なマルチインテント発話を発生させる際に、LLMにそのプロンプト命令を自律的に適応させることで、手動のプロンプト設計の必要性を緩和する。さらに,SOLIDが生成するデータに対して1ステップでダイアログを生成するように訓練したSOLID-RLを提案する。そこで本研究では,SOLID-RLの学習過程において,SOLID生成ダイアログに様々な重みを割り当てる長さに基づく品質推定機構を提案する。我々は、SOLIDとSOLID-RLを使用して30万以上の意図認識ダイアログを生成し、既存のデータセットのサイズを超える。実験により、SOLIDとSOLID-RLによって生成されたダイアログに基づいて訓練されたIPメソッドは、人為的なダイアログよりも優れたIP品質を実現することが示された。

Identifying user intents in information-seeking dialogs is crucial for a system to meet users' information needs. Intent prediction (IP) is challenging and demands sufficient dialogs with human-labeled intents for training. However, manually annotating intents is resource-intensive. While large language models (LLMs) have been shown to be effective in generating synthetic data, there is no study on using LLMs to generate intent-aware information-seeking dialogs. In this paper, we focus on leveraging LLMs for zero-shot generation of large-scale, open-domain, and intent-aware information-seeking dialogs. We propose SOLID, which has novel self-seeding and multi-intent self-instructing schemes. The former improves the generation quality by using the LLM's own knowledge scope to initiate dialog generation; the latter prompts the LLM to generate utterances sequentially, and mitigates the need for manual prompt design by asking the LLM to autonomously adapt its prompt instruction when generating complex multi-intent utterances. Furthermore, we propose SOLID-RL, which is further trained to generate a dialog in one step on the data generated by SOLID. We propose a length-based quality estimation mechanism to assign varying weights to SOLID-generated dialogs based on their quality during the training process of SOLID-RL. We use SOLID and SOLID-RL to generate more than 300k intent-aware dialogs, surpassing the size of existing datasets. Experiments show that IP methods trained on dialogs generated by SOLID and SOLID-RL achieve better IP quality than ones trained on human-generated dialogs.

翻訳日:2024-02-20 19:55:00 公開日:2024-02-18

# ニューロモルフィックな顔分析:調査

Neuromorphic Face Analysis: a Survey ( http://arxiv.org/abs/2402.11631v1 )

ライセンス: Link先を確認

Federico Becattini, Lorenzo Berlincioni, Luca Cultrera, Alberto Del Bimbo

(参考訳) イベントカメラ(英: event camera)またはニューロモルフィックセンサー(英: Neuromorphic sensor)は、生物学的視覚系の機能を模倣する撮像装置の一種。異なる間隔で固定画像をキャプチャする従来のフレームベースのカメラとは異なり、ニューロモルフィックセンサーは、高時間分解能と低レイテンシで視野内の光強度や動きの変化を表すイベントを連続的に生成する。これらの特性は、有効性とプライバシー保護の観点から、人間の顔のモデリングにおいて興味深いことが証明されている。しかし、ニューロモルフィック顔分析は依然として生で非構造的な研究分野であり、明確な基準やベンチマークを持たない様々なタスクに対処しようとする試みがいくつかある。本稿では,ニューロモルフィック顔分析の領域における機能,課題,新たな応用について概説し,将来性のある方向性と課題を概説する。ニューロモルフィック・ビジョンの基本的な動作原理を議論し、関連する研究の詳細な概要を提示した後、利用可能なデータ、標準データ表現、新たな課題、さらなる調査を必要とする限界について検討する。本稿では,この発展分野における最近のプロセスに注目し,経験豊富な研究者と新参研究者の双方に,その問題点と欠点を総合的に分析することを目的とする。

Neuromorphic sensors, also known as event cameras, are a class of imaging devices mimicking the function of biological visual systems. Unlike traditional frame-based cameras, which capture fixed images at discrete intervals, neuromorphic sensors continuously generate events that represent changes in light intensity or motion in the visual field with high temporal resolution and low latency. These properties have proven to be interesting in modeling human faces, both from an effectiveness and a privacy-preserving point of view. Neuromorphic face analysis, however, is still a raw and unstructured field of research, with several attempts at addressing different tasks with no clear standard or benchmark. This survey paper presents a comprehensive overview of capabilities, challenges, and emerging applications in the domain of neuromorphic face analysis, to outline promising directions and open issues. After discussing the fundamental working principles of neuromorphic vision and presenting an in-depth overview of the related research, we explore the current state of available data, standard data representations, emerging challenges, and limitations that require further investigation. This paper aims to highlight the recent progress in this evolving field and to provide both experienced researchers and newcomers with an all-encompassing analysis of the state of the art along with its problems and shortcomings.

翻訳日:2024-02-20 19:54:31 公開日:2024-02-18

# 離散ニューラルアルゴリズムによる推論

Discrete Neural Algorithmic Reasoning ( http://arxiv.org/abs/2402.11628v1 )

ライセンス: Link先を確認

Gleb Rodionov, Liudmila Prokhorenkova

(参考訳) ニューラルアルゴリズム推論は、モデルを学習して古典的なアルゴリズムの実行を模倣することで、ニューラルネットワークによる計算をキャプチャすることを目的としている。一般的なアーキテクチャは重み付け空間に正しいモデルを含むのに十分な表現力を持っているが、現在のニューラル推論は分散データの一般化に苦戦している。一方、古典計算は、離散的な計算状態間の遷移として説明できるので、分布シフトに影響されない。本研究は,有限状態の組合せとして,ニューラル推論器に実行軌道の維持を強制することを提案する。アルゴリズムの状態遷移を監督して訓練されたそのようなモデルは、元のアルゴリズムと完全に整合することができる。これを示すために、SALSA-CLRSベンチマークに対する我々のアプローチを評価し、全てのタスクに対して完璧なテストスコアを得る。さらに,提案するアーキテクチャの選択により,任意のテストデータに対する学習アルゴリズムの正しさを証明できる。

Neural algorithmic reasoning aims to capture computations with neural networks by training models to imitate the execution of classical algorithms. While common architectures are expressive enough to contain the correct model in the weight space, current neural reasoners struggle to generalize well on out-of-distribution data. On the other hand, classical computations are not affected by distribution shifts, as they can be described as transitions between discrete computational states. In this work, we propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states. Trained with supervision on the algorithm's state transitions, such models are able to perfectly align with the original algorithm. To show this, we evaluate our approach on the SALSA-CLRS benchmark, where we get perfect test scores for all tasks. Moreover, the proposed architectural choice allows us to prove the correctness of the learned algorithms for any test data.
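To make the idea of "a combination of finite predefined states" concrete, the sketch below snaps a vector of real-valued scores to a one-hot choice among a small set of states. This is only an assumed illustration of discretization in general: the state names, the score values, and the `discretize` helper are hypothetical, and the paper's actual architecture is not described in the abstract.

```python
def discretize(scores):
    """Map real-valued scores to a one-hot vector over predefined states."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return [1 if k == best else 0 for k in range(len(scores))]


# Hypothetical node states for a graph traversal algorithm.
STATES = ["unvisited", "frontier", "visited"]

clean_scores = [0.1, 2.3, -0.5]
perturbed_scores = [0.4, 2.0, -0.7]   # the same scores with small noise added

for scores in (clean_scores, perturbed_scores):
    one_hot = discretize(scores)
    print(STATES[one_hot.index(1)])
```

Both score vectors map to the same discrete state, so whatever consumes the state sees identical input despite the perturbation. This is the intuition behind the abstract's remark that transitions between discrete states are not affected by distribution shifts.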

翻訳日:2024-02-20 19:54:09 公開日:2024-02-18

# ループ内のユーザによるインタラクティブな服装推薦

Interactive Garment Recommendation with User in the Loop ( http://arxiv.org/abs/2402.11627v1 )

ライセンス: Link先を確認

Federico Becattini, Xiaolin Chen, Andrea Puccia, Haokun Wen, Xuemeng Song, Liqiang Nie, Alberto Del Bimbo

(参考訳) ファッションアイテムのリコメンデーションは、しばしばリッチなユーザープロファイルを活用し、過去の履歴と過去の購入に基づいてターゲットとなる提案を行う。本稿では,ユーザの事前知識が与えられていないことを前提として作業を行う。我々は,着物を構成するための補完アイテムを推奨するため,ユーザの反応を統合することで,ユーザプロファイルをオンザフライで構築することを提案する。本稿では,適切な衣服を提案し,ユーザのフィードバックを取り込み,その推奨を改善し,ユーザ満足度を最大化する強化学習エージェントを提案する。このようなモデルをトレーニングするために、私たちは、トレーニングループ内のユーザフィードバックをシミュレートできるプロキシモデルを活用します。我々はIQON3000のファッションデータセットを実験し、強化学習に基づくエージェントが個人の好みを考慮し、推薦を改善することができることを示した。さらに、そのような作業は、訓練中の探索を活用できない非強化モデルにとって困難であることが証明された。

Recommending fashion items often leverages rich user profiles and makes targeted suggestions based on past history and previous purchases. In this paper, we work under the assumption that no prior knowledge is given about a user. We propose to build a user profile on the fly by integrating user reactions as we recommend complementary items to compose an outfit. We present a reinforcement learning agent capable of suggesting appropriate garments and ingesting user feedback so as to improve its recommendations and maximize user satisfaction. To train such a model, we resort to a proxy model that simulates user feedback in the training loop. We experiment on the IQON3000 fashion dataset and find that a reinforcement learning-based agent becomes capable of improving its recommendations by taking into account personal preferences. Furthermore, this task proved to be hard for non-reinforcement models, which cannot exploit exploration during training.

翻訳日:2024-02-20 19:53:53 公開日:2024-02-18

# メタ認知検索型大規模言語モデル

Metacognitive Retrieval-Augmented Large Language Models ( http://arxiv.org/abs/2402.11626v1 )

ライセンス: Link先を確認

Yujia Zhou, Zheng Liu, Jiajie Jin, Jian-Yun Nie, Zhicheng Dou

(参考訳) 検索増強世代は、事実コンテンツの生成に効果があるため、自然言語処理の中心となっている。従来の方法では単一時間検索を用いるが、近年ではマルチホップ推論タスクのマルチ時間検索に移行している。しかし、これらの戦略は事前定義された推論ステップに縛られ、応答生成の不正確性に繋がる可能性がある。本稿では,検索型生成プロセスとメタ認知を組み合わせた手法であるmetaragを提案する。認知心理学から引き出すと、メタ認知は個人が自己反射し、その認知過程を批判的に評価することを可能にする。これを統合することで、MetaRAGはモデルが応答戦略を監視し、評価し、計画し、イントロスペクティブ推論能力を高めることができる。 3段階のメタ認知制御パイプラインを通じて、モデルは初期認知反応の欠如を識別し、修正することができる。経験的評価は、MetaRAGが既存の手法よりも著しく優れていることを示している。

Retrieval-augmented generation has become central in natural language processing due to its efficacy in generating factual content. While traditional methods employ single-time retrieval, more recent approaches have shifted towards multi-time retrieval for multi-hop reasoning tasks. However, these strategies are bound by predefined reasoning steps, potentially leading to inaccuracies in response generation. This paper introduces MetaRAG, an approach that combines the retrieval-augmented generation process with metacognition. Drawing from cognitive psychology, metacognition allows an entity to self-reflect and critically evaluate its cognitive processes. By integrating this, MetaRAG enables the model to monitor, evaluate, and plan its response strategies, enhancing its introspective reasoning abilities. Through a three-step metacognitive regulation pipeline, the model can identify inadequacies in initial cognitive responses and fix them. Empirical evaluations show that MetaRAG significantly outperforms existing methods.

翻訳日:2024-02-20 19:53:38 公開日:2024-02-18

# SpeCrawler: 大規模言語モデルを使用したAPIドキュメンテーションからOpenAPI仕様を生成する

SpeCrawler: Generating OpenAPI Specifications from API Documentation Using Large Language Models ( http://arxiv.org/abs/2402.11625v1 )

ライセンス: Link先を確認

Koren Lazar, Matan Vetzler, Guy Uziel, David Boaz, Esther Goldbraich, David Amid, Ateret Anaby-Tavor

(参考訳) デジタル時代には、広く使われているAPIが明らかである。しかし、スケーラブルなAPIの利用は、オンラインAPIドキュメンテーションで見られる構造的なばらつきのため、課題となる。これにより、api使用を容易にする自動ツールの必要性が高まる。実行可能なアプローチには、ドキュメントをAPI仕様フォーマットに変換することが含まれる。ルールベースのメソッドを使った以前の試みはあったが、これらのアプローチは様々なドキュメントにまたがる一般化の困難に遭遇した。本稿では,大規模言語モデル(LLM)を利用して,多種多様なAPIドキュメントから,慎重に構築されたパイプラインを通じてOpenAPI仕様を生成する総合システムであるSpeCrawlerを紹介する。多数のAPIの標準化フォーマットを作成することにより、SpeCrawlerは、APIオーケストレーションシステム内の統合プロセスの合理化と、ツールのLLMへの組み込みを容易にする。本稿では,SpeCrawlerの方法論を実証的エビデンスとケーススタディで実証し,LLM機能による有効性を示す。

In the digital era, the widespread use of APIs is evident. However, scalable utilization of APIs poses a challenge due to the structural divergence observed in online API documentation. This underscores the need for automatic tools to facilitate API consumption. A viable approach involves the conversion of documentation into an API specification format. While previous attempts have been made using rule-based methods, these approaches encountered difficulties in generalizing across diverse documentation. In this paper, we introduce SpeCrawler, a comprehensive system that utilizes large language models (LLMs) to generate OpenAPI Specifications from diverse API documentation through a carefully crafted pipeline. By creating a standardized format for numerous APIs, SpeCrawler aids in streamlining integration processes within API orchestrating systems and facilitating the incorporation of tools into LLMs. The paper explores SpeCrawler's methodology, supported by empirical evidence and case studies, demonstrating its efficacy through LLM capabilities.

翻訳日:2024-02-20 19:53:26 公開日:2024-02-18

# なぜそんなに重いの? 層を切り離して大きな言語モデルをスリム化する

Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers ( http://arxiv.org/abs/2402.11700v1 )

ライセンス: Link先を確認

Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Färber

(参考訳) 大規模言語モデル(LLM)は、様々な自然言語処理(NLP)タスクに対処する際、優れた能力を持っている。しかし、これらのモデルの大きさは、層積み重ねによる数十億のパラメータを含むため、ストレージ、トレーニング、推論の点で問題となる。モデルプルーニングや蒸留のような伝統的なアプローチは、モデルサイズを減らす方法を提供しているが、しばしば性能維持の犠牲になる。本研究では,llmにおけるレイヤ数を削減する手法を体系的に検討する。驚くことに、少ないレイヤでもllmは、特にテキスト分類タスクのプロンプトベースの微調整において、同様の、あるいはより優れたパフォーマンスレベルを維持している。注目すべきは、あるケースでは、単一の層を持つモデルは、完全に層化されたモデルよりも優れています。これらの知見は, LLMのサイズ制約を緩和し, 性能を保ちながら, LLMを効果的に活用するための道を開くことを目的とした今後の研究に有用である。

Large Language Models (LLMs) possess outstanding capabilities in addressing various natural language processing (NLP) tasks. However, the sheer size of these models poses challenges in terms of storage, training and inference due to the inclusion of billions of parameters through layer stacking. While traditional approaches such as model pruning or distillation offer ways for reducing model size, they often come at the expense of performance retention. In our investigation, we systematically explore the approach of reducing the number of layers in LLMs. Surprisingly, we observe that even with fewer layers, LLMs maintain similar or better performance levels, particularly in prompt-based fine-tuning for text classification tasks. Remarkably, in certain cases, models with a single layer outperform their fully layered counterparts. These findings offer valuable insights for future work aimed at mitigating the size constraints of LLMs while preserving their performance, thereby opening avenues for significantly more efficient use of LLMs.

翻訳日:2024-02-20 19:46:15 公開日:2024-02-18

# 大規模言語モデルを用いた対談評価のためのマルチアスペクトフレームワーク

A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models ( http://arxiv.org/abs/2402.11676v1 )

ライセンス: Link先を確認

Jaylen Jones, Lingbo Mo, Eric Fosler-Lussier, Huan Sun

(参考訳) ヘイトスピーチの介入戦略として、ヘイトフルな主張を否定し、遭遇を非エスカレートするために設計されたヘイトスピーチの文脈に対する情報的な反応が現れた。先行研究では手作業による介入を支援する自動カウンターナラティブ生成手法が提案されているが,これらの手法の評価は未定である。対談的評価のための従来の自動指標は、対談的品質の重要側面を評価基準として組み込むのではなく、表面的参照比較に依存するため、人間の判断と一致しない。先行評価の限界に対処するために, 対談専門ngoのガイドラインから導かれた5つの特徴を用いて, llmが生成した対談候補に対してスコアとフィードバックを提供する新しい評価フレームワークを提案する。 LLM評価器は人手による注釈付きスコアやフィードバックに強く対応し,多視点・参照なし・解釈可能な評価器としての可能性を示した。

Counter narratives - informed responses to hate speech contexts designed to refute hateful claims and de-escalate encounters - have emerged as an effective hate speech intervention strategy. While previous work has proposed automatic counter narrative generation methods to aid manual interventions, the evaluation of these approaches remains underdeveloped. Previous automatic metrics for counter narrative evaluation lack alignment with human judgment as they rely on superficial reference comparisons instead of incorporating key aspects of counter narrative quality as evaluation criteria. To address prior evaluation limitations, we propose a novel evaluation framework prompting LLMs to provide scores and feedback for generated counter narrative candidates using 5 defined aspects derived from guidelines from counter narrative specialized NGOs. We found that LLM evaluators achieve strong alignment to human-annotated scores and feedback and outperform alternative metrics, indicating their potential as multi-aspect, reference-free and interpretable evaluators for counter narrative evaluation.

翻訳日:2024-02-20 19:46:00 公開日:2024-02-18

# decoy状態を持つ単一光子を用いたセキュア量子イメージング

Secure quantum imaging with decoy state heralded single photons ( http://arxiv.org/abs/2402.11675v1 )

ライセンス: Link先を確認

Siddhant Vernekar and Jolly Xavier

(参考訳) 弱コヒーレント源(WCS)と自発パラメトリックダウン変換された単光子対は量子鍵分布(QKD)および量子イメージング(QI)実験に応用されている。ディコイ状態法はQKDとQIのセキュリティを高めるためにも使われている。我々は,decoy state heralded single photon source (hsps)を用いて,量子安定イメージングの研究を行った。低光子数状態におけるHSPSの優れた性能は、測定の不確実性を低減し、セキュアなQIを確保するために量子鍵分布プロトコルを統合する理想的な候補となる。さらに, デコイ状態hspsよりも動作速度が高いため, デコイ状態wcsの影響を推察し, 量子ホールドイメージングにおいて平均光子数が高い条件下では有効であることを示した。

Weak coherent sources (WCS) and heralded single photon pairs generated by spontaneous parametric down-conversion have found applications in quantum key distribution (QKD) and quantum imaging (QI) experiments. Decoy state methods have also been used to enhance the security of QKD and QI. We study quantum secured imaging with the decoy state heralded single photon source (HSPS). The HSPS's superior performance in low photon number regimes makes it an ideal candidate for integrating quantum key distribution protocols to reduce measurement uncertainty and ensure secure QI. Furthermore, our results suggest that the decoy state WCS, owing to its higher operating speed compared with the decoy state HSPS, would be effective in conditions that allow higher mean photon numbers for quantum secured imaging.
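The role of the mean photon number can be illustrated with the standard textbook model in which the photon number of a weak coherent pulse follows a Poisson distribution, P(n) = exp(-mu) * mu^n / n!. The sketch below computes, under that assumption only, the fraction of non-empty pulses that carry two or more photons. The function names and the sample values of `mu` are chosen for illustration and are not taken from the paper.

```python
import math


def poisson_pmf(n, mu):
    """Probability of exactly n photons in a pulse with mean photon number mu."""
    return math.exp(-mu) * mu ** n / math.factorial(n)


def multi_photon_fraction(mu):
    """Fraction of non-empty pulses that contain two or more photons."""
    p0 = poisson_pmf(0, mu)
    p1 = poisson_pmf(1, mu)
    return (1.0 - p0 - p1) / (1.0 - p0)


for mu in (0.1, 0.5, 1.0):
    print(mu, multi_photon_fraction(mu))
```

The multi-photon fraction grows with `mu`. In QKD, multi-photon pulses are the ones an eavesdropper can exploit, and decoy states are a known way to bound that risk. The trade-off this illustrates is why the mean photon number matters when choosing between a WCS and a heralded single photon source.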

翻訳日:2024-02-20 19:45:40 公開日:2024-02-18

# 非線形抵抗ネットワークをシミュレートする高速アルゴリズム

A Fast Algorithm to Simulate Nonlinear Resistive Networks ( http://arxiv.org/abs/2402.11674v1 )

ライセンス: Link先を確認

Benjamin Scellier

(参考訳) エネルギー効率の高い人工知能システムを求めて、抵抗ネットワークは従来のGPUベースのニューラルネットワークに代わるものとして注目を集めている。これらのネットワークは電気回路の物理を利用して推論し、平衡伝播のような局所的な訓練手法で最適化することができる。電力消費の観点からは潜在的な優位性にもかかわらず、これらの抵抗ネットワークを効率的にシミュレーションすることはスケーラビリティを評価する上で重要なボトルネックであり、現在の手法は線形ネットワークに限られるか、SPICEのような現実的で遅い回路シミュレータに依存している。理想回路要素を仮定し,線形不等式制約を持つ二次計画問題として構成する非線形抵抗ネットワークのシミュレーション手法を提案し,高速で正確な座標降下アルゴリズムを用いて解く。シミュレーション手法は,従来のSPICEベースのシミュレーションを著しく上回り,最大325倍の規模のネットワークを150倍の速度でトレーニングでき,ネットワークサイズとエポック時間の比率が5万倍に向上した。我々のアプローチは他の電気部品にも適用可能であり、非線形電気ネットワークのシミュレーションの急速な進歩を促すことができる。

In the quest for energy-efficient artificial intelligence systems, resistor networks are attracting interest as an alternative to conventional GPU-based neural networks. These networks leverage the physics of electrical circuits for inference and can be optimized with local training techniques such as equilibrium propagation. Despite their potential advantage in terms of power consumption, the challenge of efficiently simulating these resistor networks has been a significant bottleneck to assess their scalability, with current methods either being limited to linear networks or relying on realistic, yet slow circuit simulators like SPICE. Assuming ideal circuit elements, we introduce a novel approach for the simulation of nonlinear resistive networks, which we frame as a quadratic programming problem with linear inequality constraints, and which we solve using a fast, exact coordinate descent algorithm. Our simulation methodology significantly outperforms existing SPICE-based simulations, enabling the training of networks up to 325 times larger at speeds 150 times faster, resulting in a 50,000-fold improvement in the ratio of network size to epoch duration. Our approach, adaptable to other electrical components, can foster more rapid progress in the simulations of nonlinear electrical networks.
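The abstract's recipe, a quadratic program solved by exact coordinate descent, can be sketched on a toy problem. The code below minimizes 0.5 * x^T A x - b^T x subject to simple box bounds, updating one coordinate at a time to its exact constrained minimizer. Box bounds are a simplifying assumption standing in for the paper's linear inequality constraints, and the matrix `A`, vector `b`, and function name are hypothetical; this is not the author's simulator.

```python
def coordinate_descent_qp(A, b, lower, upper, sweeps=200, tol=1e-12):
    """Minimize 0.5*x^T A x - b^T x with lower <= x <= upper.

    A must be symmetric with positive diagonal entries (positive definite
    for guaranteed convergence). Each update solves the one-dimensional
    problem in x[i] exactly and clips the result to its bounds.
    """
    n = len(b)
    x = [min(max(0.0, lower[i]), upper[i]) for i in range(n)]
    for _ in range(sweeps):
        largest_change = 0.0
        for i in range(n):
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            unconstrained = (b[i] - off_diag) / A[i][i]
            new_xi = min(max(unconstrained, lower[i]), upper[i])
            largest_change = max(largest_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if largest_change < tol:
            break
    return x


A = [[2.0, -1.0], [-1.0, 2.0]]
b = [1.0, 1.0]

# Loose bounds: the solution is the unconstrained minimizer of the quadratic.
print(coordinate_descent_qp(A, b, [-10.0, -10.0], [10.0, 10.0]))
# Tight upper bound on x[0]: the first coordinate is held at its bound.
print(coordinate_descent_qp(A, b, [0.0, 0.0], [0.5, 10.0]))
```

Each coordinate update is exact, not a gradient step, so no step size needs tuning. As a sanity check on the headline numbers, 325 × 150 = 48,750, which is consistent with the roughly 50,000-fold improvement in the ratio of network size to epoch duration quoted in the abstract.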

翻訳日:2024-02-20 19:45:25 公開日:2024-02-18

# エストニア語テキストの自動修正:EKTB25プロジェクトの最終報告

Autocorrect for Estonian texts: final report from project EKTB25 ( http://arxiv.org/abs/2402.11671v1 )

ライセンス: Link先を確認

Agnes Luhtaru, Martin Vainikko, Krista Liin, Kais Allkivi-Metsoja, Jaagup Kippar, Pille Eslon, Mark Fishel

(参考訳) このプロジェクトは2021-2023年にエストニア語技術プログラムによって資金提供された。その主な目的はエストニア語の綴りと文法の修正ツールを開発することだった。主な課題は、そのような開発に必要なごく少量のエラー訂正データであった。これを緩和するために,(1)モデルトレーニングとテストのためにより多くの補正データをアノテートし,(2)他のタスク用に作成された機械学習モデルをリトレーニングするトランスファーラーニングをテストし,(3)大規模言語モデルを含む代替手法と比較した。また,誤差カテゴリによる補正の精度と収率を算出し,異なる手法の有効性を詳細に比較できる自動評価法を開発した。プロジェクトの間に大きな言語モデルにブレークスルーがあった。エストニア語をサポートする商用言語モデルであるGPT4が作成された。本報告では,計画調整時のモデルの存在を考慮し,エストニア語テキスト改善のためのgpt4の機能との比較を行った。最終結果は、GPT4よりも優れたスコアを提供し、その結果は有用であるが、完全には信頼できないことを示している。レポートにはまた、オープンソースソリューションに焦点を当てたGPT4や他の主要言語モデルの実装方法に関するアイデアも含まれている。このプロジェクトの結果はすべてオープンソース/オープンソースで、商用ライセンスを含む目的で使用することができる。

The project was funded in 2021-2023 by the National Programme of Estonian Language Technology. Its main aim was to develop spelling and grammar correction tools for the Estonian language. The main challenge was the very small amount of available error correction data needed for such development. To mitigate this, (1) we annotated more correction data for model training and testing, (2) we tested transfer-learning, i.e. retraining machine learning models created for other tasks, so as not to depend solely on correction data, (3) we compared the developed method and model with alternatives, including large language models. We also developed automatic evaluation, which can calculate the accuracy and yield of corrections by error category, so that the effectiveness of different methods can be compared in detail. There has been a breakthrough in large language models during the project: GPT4, a commercial language model with Estonian-language support, has been created. We took into account the existence of the model when adjusting plans and in the report we present a comparison with the ability of GPT4 to improve the Estonian language text. The final results show that the approach we have developed provides better scores than GPT4 and the result is usable but not entirely reliable yet. The report also contains ideas on how GPT4 and other major language models can be implemented in the future, focusing on open-source solutions. All results of this project are open-data/open-source, with licenses that allow them to be used for purposes including commercial ones.

翻訳日:2024-02-20 19:45:08 公開日:2024-02-18

# ブラックボックスへの挑戦:農業と林業におけるcnn応用の帰属マップの包括的評価

Challenging the Black Box: A Comprehensive Evaluation of Attribution Maps of CNN Applications in Agriculture and Forestry ( http://arxiv.org/abs/2402.11670v1 )

ライセンス: Link先を確認

Lars Nieradzik, Henrike Stephani, Jördis Sieburg-Rockel, Stephanie Helmling, Andrea Olbrich, Janis Keuper

(参考訳) 本研究では,農業・林業におけるニューラルネットワークの説明可能性,特に肥料処理の分類と木材識別について検討する。しばしば「ブラックボックス」と見なされるこれらのモデルの不透明な性質は、クラスアクティベーションマップ(cams)またはサリエンシーマップ(saliency maps)として知られる最先端のアトリビューションマップ(ams)の広範な評価を通じて解決される。これらのAMの包括的質的および定量的分析により、重要な実用的限界が明らかになった。発見によると、AMは重要な機能を一貫して強調しておらず、ドメインの専門家が重要とみなす機能と誤認することが多い。これらの相違は、ニューラルネットワークの意思決定プロセスを理解する上でのAMの有用性に関する重大な疑問を引き起こす。本研究は,農業・林業分野におけるamsの信頼性と実用性に関する重要な知見を提供し,これらの応用分野におけるニューラルネットワークの理解を深める。

In this study, we explore the explainability of neural networks in agriculture and forestry, specifically in fertilizer treatment classification and wood identification. The opaque nature of these models, often considered 'black boxes', is addressed through an extensive evaluation of state-of-the-art Attribution Maps (AMs), also known as class activation maps (CAMs) or saliency maps. Our comprehensive qualitative and quantitative analysis of these AMs uncovers critical practical limitations. Findings reveal that AMs frequently fail to consistently highlight crucial features and often misalign with the features considered important by domain experts. These discrepancies raise substantial questions about the utility of AMs in understanding the decision-making process of neural networks. Our study provides critical insights into the trustworthiness and practicality of AMs within the agriculture and forestry sectors, thus facilitating a better understanding of neural networks in these application areas.

翻訳日:2024-02-20 19:44:45 公開日:2024-02-18

# アナログ量子シミュレータの最適制御による高速フォワード分子基底状態生成

Fast-forwarding molecular ground state preparation with optimal control on analog quantum simulators ( http://arxiv.org/abs/2402.11667v1 )

ライセンス: Link先を確認

Davide Castaldo, Marta Rosa, Stefano Corni

(参考訳) 電子力学の最適制御は、量子力学によって課される境界に近づく進化時間とともに、化学的精度で分子基底状態を作成することができることを示す。我々は、分子ハミルトニアンにすでに存在する相互作用の観点からのみ、分子進化の特定のパラメータ化を提案する。したがって,提案手法は量子シミュレーションルーチンのみを使用し,好適なスケーリングを維持している。変動量子アルゴリズムと最適制御の親密な関係により、可能であれば、文献における最先端の手法と比較する。化学精度とアルゴリズムスケーリングを達成するために必要なパラメータの数は、変分アンサーゼを構築するためのコンパクトな適応戦略と一致していることがわかった。このアルゴリズムは量子シミュレータにも適しており、デジタル量子プロセッサ(最大16量子ビット)をエミュレートして実装され、異なる電子相関度にまたがる異なる分子やジオメトリでテストされている。

We show that optimal control of the electron dynamics is able to prepare molecular ground states, within chemical accuracy, with evolution times approaching the bounds imposed by quantum mechanics. We propose a specific parameterization of the molecular evolution only in terms of interactions already present in the molecular Hamiltonian. Thus, the proposed method solely utilizes quantum simulation routines, retaining their favourable scalings. Due to the intimate relationship between variational quantum algorithms and optimal control, we compare, when possible, our results with state-of-the-art methods in the literature. We find that the number of parameters needed to reach chemical accuracy and the algorithmic scaling are in line with compact adaptive strategies to build variational ansatze. The algorithm, which is also suitable for quantum simulators, is implemented by emulating a digital quantum processor (up to 16 qubits) and tested on different molecules and geometries spanning different degrees of electron correlation.

翻訳日:2024-02-20 19:44:27 公開日:2024-02-18

# マルチスケール時間分解による解釈可能な短期負荷予測

Interpretable Short-Term Load Forecasting via Multi-Scale Temporal Decomposition ( http://arxiv.org/abs/2402.11664v1 )

ライセンス: Link先を確認

Yuqi Jiang, Yan Li, and Yize Chen

(参考訳) 機械学習とディープラーニングの急速な進歩により、電力系統の電力負荷予測、例えば単変量および多変量短期負荷予測における幅広い応用が可能となった。負荷パターンの非線形性や高い予測精度の学習能力は高いが、電力負荷予測のための典型的なディープラーニングモデルの解釈可能性はあまり研究されていない。本稿では,各ニューラルネットワークの線形結合を学習し,入力時間特徴を学習する,解釈可能な深層学習手法を提案する。また,複雑な時系列パターンに対処するマルチスケール時系列分解法を提案する。ケーススタディはベルギー中央グリッド負荷データセットで行われており、提案モデルは頻繁に適用されるベースラインモデルよりも精度が高かった。具体的には,MSE,MAE,RMSEはそれぞれ0.52,0.57,0.72である。解釈可能性については,提案手法では一般化能力を示す。一方,他の基本手法と比較して,特徴だけでなく時間的解釈可能性も示すことができる。また、グローバルタイム特徴の解釈性も得られる。グローバルな特徴の解釈性を得ることで、負荷データの全体的なパターン、傾向、循環性を把握でき、最終出力の形成における様々な時間関連特徴の重要性も明らかにできる。

Rapid progress in machine learning and deep learning has enabled a wide range of applications in the electricity load forecasting of power systems, for instance, univariate and multivariate short-term load forecasting. Although strong capabilities in learning the non-linearity of load patterns and high prediction accuracy have been achieved, the interpretability of typical deep learning models for electricity load forecasting is less studied. This paper proposes an interpretable deep learning method, which learns a linear combination of neural networks that each attends to an input time feature. We also propose a multi-scale time series decomposition method to deal with complex time patterns. Case studies have been carried out on the Belgium central grid load dataset, and the proposed model demonstrated better accuracy compared to the frequently applied baseline model. Specifically, the proposed multi-scale temporal decomposition achieves the best MSE, MAE and RMSE of 0.52, 0.57 and 0.72 respectively. As for interpretability, on one hand, the proposed method displays generalization capability. On the other hand, it can demonstrate not only the feature but also the temporal interpretability compared to other baseline methods. Besides, the global time feature interpretabilities are also obtained. Obtaining global feature interpretabilities allows us to catch the overall patterns, trends, and cyclicality in load data while also revealing the significance of various time-related features in forming the final outputs.

翻訳日:2024-02-20 19:44:13 公開日:2024-02-18
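
The load-forecasting abstract above reports MSE, MAE and RMSE. As a minimal sketch of how these three error metrics relate, the following Python computes them on made-up values; the `actual` and `forecast` lists are hypothetical and are not taken from the paper or from the Belgium grid dataset. Since RMSE is the square root of MSE, the reported pair (0.52, 0.72) is internally consistent.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, i.e. the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical load values, for illustration only.
actual = [10.0, 12.0, 11.0, 13.0]
forecast = [9.0, 12.5, 11.0, 14.0]

print(mse(actual, forecast), mae(actual, forecast), rmse(actual, forecast))

# Consistency check on the figures quoted in the abstract: sqrt(MSE) ~ RMSE.
print(round(math.sqrt(0.52), 2))
```

MAE weights all errors linearly, while MSE and RMSE penalize large deviations more heavily, which is presumably why the abstract reports all three.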

# 重力による脱コヒーレンス

Gravity-mediated decoherence ( http://arxiv.org/abs/2402.11663v1 )

ライセンス: Link先を確認

Dimitris Moustos, Charis Anastopoulos

(参考訳) 質量体の重力場内の小さな量子系は、後者の量子自由度と絡み合う。したがって、巨大体は環境として機能し、量子系への非単体力学、ノイズ、デコヒーレンスを誘導する。この重力によるデコヒーレンスから地球上のシステムを保護することは不可能であり、これはマクロな量子システムによる全ての実験に深刻な影響を及ぼす可能性がある。我々は,この効果の第一原理解析を行い,対応するオープンシステムのダイナミクスを導出する。近未来の量子実験は影響を受けないが、人間のスケールでは強い非一貫性効果がある。 1メートル分離された人間の2つの局所状態の重ね合わせのデコヒーレンス時間は1秒の順序である。

A small quantum system within the gravitational field of a massive body will be entangled with the quantum degrees of freedom of the latter. Hence, the massive body acts as an environment, and it induces non-unitary dynamics, noise, and decoherence to the quantum system. It is impossible to shield systems on Earth from this gravity-mediated decoherence, which could severely affect all experiments with macroscopic quantum systems. We undertake a first-principles analysis of this effect, by deriving the corresponding open system dynamics. We find that near-future quantum experiments are not affected, but there is a strong decoherence effect at the human scale. The decoherence time for a superposition of two localized states of a human with a one-meter separation is of the order of one second.

翻訳日:2024-02-20 19:43:53 公開日:2024-02-18

# TDE-3:スパイクニューラルネットワークにおける光フロー計算の事前改善

TDE-3: An improved prior for optical flow computation in spiking neural networks ( http://arxiv.org/abs/2402.11662v1 )

ライセンス: Link先を確認

Matthew Yedutenko, Federico Paredes-Valles, Lyes Khacef and Guido C.H.E. De Croon

(参考訳) モーション検出は、ロボットシステムが環境を知覚し、ナビゲートするために必要な主要なタスクである。文献で提案されているバイオインスパイアされたニューロモルフィックな時間差エンコーダ(TDE-2)は、イベントベースのセンサーとプロセッサをスパイクニューラルネットワークと組み合わせ、空間内の2つの点間の時間的相関を抽出することでリアルタイムかつエネルギー効率の高い運動検出を提供する。しかし、アルゴリズムレベルでは、この設計はテクスチャ環境における個々のTDEの方向選択性を失う。本稿では, テクスチャ環境下でのTDE-3の方向選択性を高めるために, さらなる抑制入力を付加した3点TDE(TDE-3)を提案する。我々は,入力速度を出力スパイク数やISI(Inter-Spike Interval)に線形にマッピングするために,時間的バックプロパゲーションとサロゲート勾配を用いて新しいTDE-3を訓練する手法を開発した。私たちの研究は、特定のISIを持つためにスパイクニューロンを訓練する最初の例です。合成データを用いて,刺激のダイナミックレンジ,空間周波数,騒音レベルの変化について,スパイク数とISIのトレーニングと推論を比較した。 ISIは空間周波数の変化に対してより頑健であるのに対し、スパイク数はノイズの存在下でより信頼性の高い訓練信号である。我々は,TDEによる光フロー符号化の詳細な定量的検討を行い,TDE-2とTDE-3を比較した。その結果,両検出器のネットワークレベルでも同様の精度(20度角誤差,88%の相関)を示した。しかし、個々のTDEのより堅牢な方向選択性のため、TDE-3ベースのネットワークスパイクは少なく、エネルギー効率が良い。報告された精度はモデルベースの手法と同等であるが、TDEのスパイクベースの処理により、ニューロモルフィックハードウェアによるよりエネルギー効率の高い推論が可能になる。

Motion detection is a primary task required for robotic systems to perceive and navigate in their environment. The bioinspired neuromorphic Time-Difference Encoder (TDE-2) proposed in the literature combines event-based sensors and processors with spiking neural networks to provide real-time and energy-efficient motion detection by extracting temporal correlations between two points in space. However, on the algorithmic level, this design leads to a loss of direction-selectivity of individual TDEs in textured environments. Here we propose an augmented 3-point TDE (TDE-3) with an additional inhibitory input that makes TDE-3 direction-selectivity robust in textured environments. We developed a procedure to train the new TDE-3 using backpropagation through time and surrogate gradients to linearly map input velocities into an output spike count or an Inter-Spike Interval (ISI). Our work is the first instance of training a spiking neuron to have a specific ISI. Using synthetic data, we compared training and inference with spike count and ISI with respect to changes in stimuli dynamic range, spatial frequency, and level of noise. ISI turns out to be more robust towards variation in spatial frequency, whereas the spike count is a more reliable training signal in the presence of noise. We performed the first in-depth quantitative investigation of optical flow coding with TDE and compared TDE-2 vs TDE-3 in terms of energy-efficiency and coding precision. Results show that on the network level both detectors show similar precision (20 degree angular error, 88% correlation with ground truth). Yet, due to the more robust direction-selectivity of individual TDEs, TDE-3 based networks spike less and hence are more energy-efficient. Reported precision is on par with model-based methods, but the spike-based processing of the TDEs allows more energy-efficient inference with neuromorphic hardware.

翻訳日:2024-02-20 19:43:43 公開日:2024-02-18
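
The TDE-3 abstract above contrasts two output codes of a spiking neuron: the spike count and the Inter-Spike Interval (ISI). The sketch below only illustrates what these two quantities are for a given list of spike times. It does not model the TDE itself or its training, and the spike times in `spikes_ms` are hypothetical.

```python
def spike_count(spike_times):
    """Number of spikes emitted in the observation window."""
    return len(spike_times)


def inter_spike_intervals(spike_times):
    """Differences between consecutive spike times, after sorting."""
    ordered = sorted(spike_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def mean_isi(spike_times):
    """Average inter-spike interval, or None when fewer than two spikes exist."""
    intervals = inter_spike_intervals(spike_times)
    return sum(intervals) / len(intervals) if intervals else None


# Hypothetical spike times in milliseconds, for illustration only.
spikes_ms = [2.0, 7.0, 12.0, 17.0]

print(spike_count(spikes_ms))
print(inter_spike_intervals(spikes_ms))
print(mean_isi(spikes_ms))
```

The count summarizes the whole window, whereas the ISI is defined per pair of spikes, so a single missing or spurious spike affects the two codes differently.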

# 階層型アクティブ推論における動的計画法

Dynamic planning in hierarchical active inference ( http://arxiv.org/abs/2402.11658v1 )

ライセンス: Link先を確認

Matteo Priorelli and Ivilin Peev Stoianov

(参考訳) 動的計画法により、人間の脳が認知決定に関連する運動軌跡を推論し、導入する能力について述べる。最近のパラダイムであるアクティブ推論(active inference)は、生物の適応に関する基本的な洞察をもたらし、予測誤差を最小化し、生命に適合する状態に制限する。過去数年間、多くの研究が、ロボットと人工知能の革新的な解決策を刺激する、個別の意思決定や継続的なモーター制御といった、アクティブな推論プロセスの観点から、人間と動物の行動がどのように説明できるかを示してきた。しかし、この文献には、変化する環境におけるアクションを効果的に計画する方法に関する包括的な見通しが欠けている。モデリングツールの使用の目標を設定し、アクティブな推論における動的計画の話題を掘り下げ、生物学的目標指向行動の2つの重要な側面を念頭に置いて、オブジェクト操作の余裕を理解し活用する能力、そして他のエージェントを含む自己と環境の間の階層的相互作用を学ぶ。単純な単位から始めて、より高度な構造を徐々に記述し、最近提案された設計選択を比較し、各セクションの基本的な例を提供する。この研究は、ニューラルネットワークと強化学習を中心とする従来の見解とは距離を置き、階層モデルにおけるハイブリッド表現という、アクティブ推論の未検討の方向に向かっている。

By dynamic planning, we refer to the ability of the human brain to infer and impose motor trajectories related to cognitive decisions. A recent paradigm, active inference, brings fundamental insights into the adaptation of biological organisms, constantly striving to minimize prediction errors to restrict themselves to life-compatible states. Over the past years, many studies have shown how human and animal behavior could be explained in terms of an active inferential process -- either as discrete decision-making or continuous motor control -- inspiring innovative solutions in robotics and artificial intelligence. Still, the literature lacks a comprehensive outlook on how to effectively plan actions in changing environments. Setting ourselves the goal of modeling tool use, we delve into the topic of dynamic planning in active inference, keeping in mind two crucial aspects of biological goal-directed behavior: the capacity to understand and exploit affordances for object manipulation, and to learn the hierarchical interactions between the self and the environment, including other agents. We start from a simple unit and gradually describe more advanced structures, comparing recently proposed design choices and providing basic examples for each section. This study distances itself from traditional views centered on neural networks and reinforcement learning, and points toward a yet unexplored direction in active inference: hybrid representations in hierarchical models.

翻訳日:2024-02-20 19:43:07 公開日:2024-02-18

# 物理層通信による事前学習言語モデルの統合

Integrating Pre-Trained Language Model with Physical Layer Communications ( http://arxiv.org/abs/2402.11656v1 )

ライセンス: Link先を確認

Ju-Hyung Lee and Dong-Ho Lee and Joohan Lee and Jay Pujara

(参考訳) デバイスが言語モデル(lms)などの組み込み基盤モデルを通じて情報を直接交換するオンデバイスai通信の分野は、堅牢で効率的で汎用的な通信フレームワークを必要としている。しかし、これらのフレームワークを既存の無線システムに統合し、ノイズやビットエラーを効果的に管理することは大きな課題となる。本研究では,物理層(PHY)通信機能と統合されたデバイス上での実用的なAI通信フレームワークを提案する。本フレームワークは,チャネルノイズを用いたエンドツーエンドトレーニングを取り入れ,レジリエンスを高めるとともに,ベクトル量子化変分オートエンコーダ(vq-vae)を効率良くロバストな通信に活用し,プリトレーニングエンコーダ・デコーダトランスフォーマを一般化能力向上に活用する。各種通信シナリオにまたがるシミュレーションにより,我々のフレームワークは,標準化された3GPPチャネルモデルにおいて,相当な一般化能力とノイズロバスト性を示しながら,送信サイズを50%削減できることが判明した。

The burgeoning field of on-device AI communication, where devices exchange information directly through embedded foundation models, such as language models (LMs), requires robust, efficient, and generalizable communication frameworks. However, integrating these frameworks with existing wireless systems and effectively managing noise and bit errors pose significant challenges. In this work, we introduce a practical on-device AI communication framework, integrated with physical layer (PHY) communication functions, demonstrated through its performance on a link-level simulator. Our framework incorporates end-to-end training with channel noise to enhance resilience, incorporates vector quantized variational autoencoders (VQ-VAE) for efficient and robust communication, and utilizes pre-trained encoder-decoder transformers for improved generalization capabilities. Simulations, across various communication scenarios, reveal that our framework achieves a 50% reduction in transmission size while demonstrating substantial generalization ability and noise robustness under standardized 3GPP channel models.

翻訳日:2024-02-20 19:42:41 公開日:2024-02-18
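
The abstract above mentions vector quantized variational autoencoders (VQ-VAE) as the means of compact transmission. A minimal sketch of the quantization step alone, assuming a fixed toy codebook: each latent vector is replaced by the index of its nearest codebook entry, and only the indices need to be sent. The `codebook` and `latents` values are hypothetical, and nothing here reproduces the paper's training procedure, channel model, or transformer components.

```python
import math


def quantize(vector, codebook):
    """Return the index of the codebook entry nearest to `vector`
    (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def bits_per_index(codebook_size):
    """Bits needed to address one entry of a codebook of the given size."""
    return math.ceil(math.log2(codebook_size))


# Hypothetical 2-D codebook and encoder outputs, for illustration only.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latents = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]

indices = [quantize(z, codebook) for z in latents]    # what would be transmitted
reconstructed = [codebook[i] for i in indices]        # receiver-side lookup

print(indices, bits_per_index(len(codebook)))
```

Because the receiver only needs the shared codebook and the indices, the payload size depends on the codebook size rather than on the dimensionality of the latent vectors.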

# メカニズムの競合:言語モデルがファクトやカウンターファクトをどう扱うかの追跡

Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals ( http://arxiv.org/abs/2402.11655v1 )

ライセンス: Link先を確認

Francesco Ortu, Zhijing Jin, Diego Doimo, Mrinmaya Sachan, Alberto Cazzaniga, Bernhard Schölkopf

(参考訳) 解釈可能性の研究は、経験的成功と大規模言語モデル(LLM)の内部動作に関する科学的理解のギャップを埋めることを目的としている。しかし、この分野の既存の研究のほとんどは、モデルが事実の知識をコピーまたはリコールする方法のような単一のメカニズムの分析に焦点を当てている。本研究では,個々のメカニズムではなく,複数のメカニズムの相互作用に着目したメカニズムの競合の定式化を提案し,そのひとつが最終予測において支配的になることを示す。我々は,ロジット検査と注意修正という2つの解釈方法を用いて,llm内の機構の競合がどのようにして起こるかを明らかにする。本研究は,様々なモデル成分間の機構とその競合の痕跡を示し,特定の機構の強度を効果的に制御する注意位置を明らかにした。私たちのコードとデータはhttps://github.com/francescortu/Competition_of_Mechanismsにあります。

Interpretability research aims to bridge the gap between the empirical success and our scientific understanding of the inner workings of large language models (LLMs). However, most existing research in this area focuses on analyzing a single mechanism, such as how models copy or recall factual knowledge. In this work, we propose the formulation of competition of mechanisms, which instead of individual mechanisms focuses on the interplay of multiple mechanisms, and traces how one of them becomes dominant in the final prediction. We uncover how and where the competition of mechanisms happens within LLMs using two interpretability methods, logit inspection and attention modification. Our findings show traces of the mechanisms and their competition across various model components, and reveal attention positions that effectively control the strength of certain mechanisms. Our code and data are at https://github.com/francescortu/Competition_of_Mechanisms.

翻訳日:2024-02-20 19:42:22 公開日:2024-02-18

# モデルフリーな$\mu$-synthesis:非滑らかな最適化の観点から

Model-Free $\mu$-Synthesis: A Nonsmooth Optimization Perspective ( http://arxiv.org/abs/2402.11654v1 )

ライセンス: Link先を確認

Darioush Keivan, Xingang Guo, Peter Seiler, Geir Dullerud, Bin Hu

(参考訳) 本稿では,モデルフリーポリシーサーチを重要なロバスト制御ベンチマーク,すなわち$\mu$-synthesisで再検討する。一般的な出力フィードバック設定では、この問題に対する凸定式化は存在しないため、大域的最適性保証は期待できない。 Apkarian (2011) は、この問題に対して非凸な非滑らかなポリシー最適化手法を提案し、モデルベースの方法で更新方向を生成する劣勾配ベースのポリシー探索アルゴリズムを用いて最先端の設計結果を達成した。凸性や大域的最適性保証の欠如にもかかわらず、これらの劣勾配ベースのポリシー探索手法は、実際は驚くべき数値的な結果をもたらしている。このような政策最適化の観点に基づき,これらの劣勾配ベースの探索手法をモデルフリーな設定に拡張する。具体的には,モデルフリーの非導出的サンプリング法と一様平滑化を伴うゼロ次ポリシー探索法という2つのモデルフリーポリシー最適化手法の有効性について検討する。両手法がモデルベースで達成した設計成果を一貫して再現することを示すため,広範な数値実験を行った。さらに, 定常点への収束保証が, コスト関数の強制性に関連するいくつかの仮定の下で, モデルフリーな$\mu$-synthesisに対して確立されることを示す理論的正当性を示す。総じて,デリバティブフリー政策最適化は,モデルフリー設定における一般出力フィードバック$\mu$合成問題を解くための競争的かつ実行可能なアプローチであることを示す。

In this paper, we revisit model-free policy search on an important robust control benchmark, namely $\mu$-synthesis. In the general output-feedback setting, there do not exist convex formulations for this problem, and hence global optimality guarantees are not expected. Apkarian (2011) presented a nonconvex nonsmooth policy optimization approach for this problem, and achieved state-of-the-art design results using subgradient-based policy search algorithms which generate update directions in a model-based manner. Despite the lack of convexity and global optimality guarantees, these subgradient-based policy search methods have led to impressive numerical results in practice. Building upon such a policy optimization perspective, our paper extends these subgradient-based search methods to a model-free setting. Specifically, we examine the effectiveness of two model-free policy optimization strategies: the model-free non-derivative sampling method and the zeroth-order policy search with uniform smoothing. We performed an extensive numerical study to demonstrate that both methods consistently replicate the design outcomes achieved by their model-based counterparts. Additionally, we provide some theoretical justifications showing that convergence guarantees to stationary points can be established for our model-free $\mu$-synthesis under some assumptions related to the coerciveness of the cost function. Overall, our results demonstrate that derivative-free policy optimization offers a competitive and viable approach for solving general output-feedback $\mu$-synthesis problems in the model-free setting.

翻訳日:2024-02-20 19:42:06 公開日:2024-02-18
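
The $\mu$-synthesis abstract above names "zeroth-order policy search with uniform smoothing" as one of its model-free strategies. The following is a generic sketch of that family of methods, not the paper's algorithm: it estimates a descent direction from cost evaluations only, by sampling directions uniformly on the unit sphere and using two-point differences. The cost `toy_cost`, the step size, the smoothing radius `delta`, and the sample count are all assumptions chosen for illustration; a real $\mu$-synthesis cost would come from closed-loop evaluations of a controller.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly on the unit sphere (normalized Gaussian)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def zeroth_order_gradient(cost, x, delta, rng, num_samples=20):
    """Two-point estimate of the gradient of the uniformly smoothed cost.

    Only evaluations of `cost` are used; no derivatives or model are needed.
    """
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(num_samples):
        u = random_unit_vector(dim, rng)
        plus = cost([xi + delta * ui for xi, ui in zip(x, u)])
        minus = cost([xi - delta * ui for xi, ui in zip(x, u)])
        scale = dim * (plus - minus) / (2.0 * delta * num_samples)
        grad = [g + scale * ui for g, ui in zip(grad, u)]
    return grad


def zeroth_order_search(cost, x0, step=0.05, delta=0.05, iterations=300, seed=0):
    """Plain descent on the estimated gradient, returning the best point seen."""
    rng = random.Random(seed)
    x = list(x0)
    best_x, best_cost = list(x), cost(x)
    for _ in range(iterations):
        g = zeroth_order_gradient(cost, x, delta, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
        current = cost(x)
        if current < best_cost:
            best_x, best_cost = list(x), current
    return best_x, best_cost


def toy_cost(x):
    """Hypothetical nonsmooth cost with kinks at x[0] = 1 and x[1] = -0.5."""
    return abs(x[0] - 1.0) + 2.0 * abs(x[1] + 0.5)


start = [3.0, 2.0]
best_point, best_value = zeroth_order_search(toy_cost, start)
print(toy_cost(start), best_value)
```

Tracking the best point seen is a common safeguard for nonsmooth costs, where a fixed step size makes the iterates oscillate around kinks rather than settle.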

# モバイルエッジコンピューティングにおけるタスクオフロードのためのコンビネートクライアントマスタマルチエージェント深層強化学習

Combinatorial Client-Master Multiagent Deep Reinforcement Learning for Task Offloading in Mobile Edge Computing ( http://arxiv.org/abs/2402.11653v1 )

ライセンス: Link先を確認

Tesfay Zemuy Gebrekidan, Sebastian Stein, Timothy J. Norman

(参考訳) 近年,ビデオストリーミング,データマイニング,仮想現実,拡張現実,画像処理,ビデオ処理,顔認識,オンラインゲームなど,計算集約的なタスクを行うモバイルアプリケーションが急増している。しかし、タブレットやスマートフォンのようなユーザデバイス(UD)は、タスクの計算要求を実行する能力が限られている。モバイルエッジコンピューティング(MEC)は、UDのコンピューティング需要の増加に対応するための有望な技術として登場した。 MECのタスクオフロードは、UDとMECサーバ間でタスクを分散することでUDの要求を満たす戦略である。深層強化学習(DRL)は、動的変化に適応し、オンライン計算複雑性を最小限に抑えることができるため、タスクオフロード問題において注目されている。しかし、UDやMECサーバにおける各種のリソース制約は、効率的なDRLベースのタスクオフロード戦略の設計に困難をもたらす。既存のDRLベースのタスクオフロードアルゴリズムは、サーバに十分なストレージリソースが利用できることを前提として、UDの制約に焦点を当てている。さらに、既存のマルチエージェントDRL(MADRL)ベースのタスクオフロードアルゴリズムは、均質なエージェントであり、均質な制約を報酬関数のペナルティとみなす。我々は,タスクオフロードをMEC (CCM\_MADRL\_MEC) で行うための新しい組合せクライアントマスターMADRL (CCM\_MADRL) アルゴリズムを提案し,UDがリソース要求を判断し,サーバがUDの要求に基づいて組合せ決定を行えるようにした。 CCM\_MADRL\_MECは、UDの制約に加えてサーバストレージ容量を考慮するタスクオフロードにおける最初のMADRLである。 CCM\_MADRL\_MECは組合せ行動選択を利用して既存のMADDPGおよびヒューリスティックアルゴリズムよりも優れた収束性を示した。

Recently, there has been an explosion of mobile applications that perform computationally intensive tasks such as video streaming, data mining, virtual reality, augmented reality, image processing, video processing, face recognition, and online gaming. However, user devices (UDs), such as tablets and smartphones, have a limited ability to meet the computation needs of these tasks. Mobile edge computing (MEC) has emerged as a promising technology to meet the increasing computing demands of UDs. Task offloading in MEC is a strategy that meets the demands of UDs by distributing tasks between UDs and MEC servers. Deep reinforcement learning (DRL) is gaining attention in task-offloading problems because it can adapt to dynamic changes and minimize online computational complexity. However, the various types of continuous and discrete resource constraints on UDs and MEC servers pose challenges to the design of an efficient DRL-based task-offloading strategy. Existing DRL-based task-offloading algorithms focus on the constraints of the UDs, assuming the availability of enough storage resources on the server. Moreover, existing multiagent DRL (MADRL)-based task-offloading algorithms use homogeneous agents and consider homogeneous constraints as a penalty in their reward function. We propose a novel combinatorial client-master MADRL (CCM\_MADRL) algorithm for task offloading in MEC (CCM\_MADRL\_MEC) that enables UDs to decide their resource requirements and the server to make a combinatorial decision based on the requirements of the UDs. CCM\_MADRL\_MEC is the first MADRL in task offloading to consider server storage capacity in addition to the constraints in the UDs. By taking advantage of the combinatorial action selection, CCM\_MADRL\_MEC has shown superior convergence over existing MADDPG and heuristic algorithms.

翻訳日:2024-02-20 19:41:44 公開日:2024-02-18

# 大規模言語モデルはイデオロギー操作にどの程度影響されやすいか

How Susceptible are Large Language Models to Ideological Manipulation? ( http://arxiv.org/abs/2402.11725v1 )

ライセンス: Link先を確認

Kai Chen, Zihao He, Jun Yan, Taiwei Shi, Kristina Lerman

(参考訳) 大規模言語モデル(LLM)は、大衆の認識や情報との相互作用に大きな影響を与える可能性がある。これは、これらのモデル内のイデオロギーを容易に操作できる場合に生じる社会的な影響に関する懸念を引き起こす。本研究では,llmがいかに効果的にイデオロギーバイアスを学習し,一般化できるかを検討する。少量のイデオロギー駆動サンプルへの曝露は,LSMのイデオロギーを著しく変化させる。特に、LLMは、あるトピックからイデオロギーを吸収し、それとは無関係なトピックに一般化する能力を示す。 LLMのイデオロギーが歪められることの容易さは、悪意あるアクターによる故意に有害なトレーニングデータや、データアノテータによる不注意に導入されたバイアスに関連するリスクを浮き彫りにする。また、llmに対するイデオロギー操作の影響を軽減するための堅牢なセーフガードの必要性も強調している。

Large Language Models (LLMs) possess the potential to exert substantial influence on public perceptions and interactions with information. This raises concerns about the societal impact that could arise if the ideologies within these models can be easily manipulated. In this work, we investigate how effectively LLMs can learn and generalize ideological biases from their instruction-tuning data. Our findings reveal a concerning vulnerability: exposure to only a small amount of ideologically driven samples significantly alters the ideology of LLMs. Notably, LLMs demonstrate a startling ability to absorb ideology from one topic and generalize it to even unrelated ones. The ease with which LLMs' ideologies can be skewed underscores the risks associated with intentionally poisoned training data by malicious actors or inadvertently introduced biases by data annotators. It also emphasizes the imperative for robust safeguards to mitigate the influence of ideological manipulations on LLMs.

翻訳日:2024-02-20 19:35:12 公開日:2024-02-18

# ChatGPTは開発者をサポートできるか? コード生成のための大規模言語モデルの実証評価

Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation ( http://arxiv.org/abs/2402.11702v1 )

ライセンス: Link先を確認

Kailun Jin, Chung-Yu Wang, Hung Viet Pham, Hadi Hemmati

(参考訳) 大規模言語モデル(llm)は、様々な開発シナリオで有望な能力を示す多くの先行研究とともに、コード生成において顕著な熟練度を示している。しかし、これらの研究は主に研究環境での評価を提供しており、LLMが現実世界の開発者をいかに効果的に支援できるかを理解するための大きなギャップを残している。これを解決するために、私たちは、開発者とChatGPT(GitHubなどのプラットフォーム上のShare Link機能でキャプチャされた)の会話から収集されたデータセットであるDevGPTで会話を経験的に分析しました。私たちの経験から,LLM生成コードを使用する現在のプラクティスは,一般的には,高レベルな概念のデモやドキュメントの例の提供に限られています。これらの結果は、現代のソフトウェア開発において不可欠な部分になる前に、コード生成におけるLLMを改善するには、将来的な作業が必要であることを示している。

Large language models (LLMs) have demonstrated notable proficiency in code generation, with numerous prior studies showing their promising capabilities in various development scenarios. However, these studies mainly provide evaluations in research settings, which leaves a significant gap in understanding how effectively LLMs can support developers in the real world. To address this, we conducted an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT (captured with the Share Link feature on platforms such as GitHub). Our empirical findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation, rather than being used as production-ready code. These findings indicate that there is much future work needed to improve LLMs in code generation before they can be integral parts of modern software development.

翻訳日:2024-02-20 19:34:51 公開日:2024-02-18

# イジングモデルの機械学習解法を説明する

Explaining the Machine Learning Solution of the Ising Model ( http://arxiv.org/abs/2402.11701v1 )

ライセンス: Link先を確認

Roberto C. Alamino

(参考訳) 機械学習(ML)技術と同様に、大きな次元を持つデータに関わる問題を解く上でも強力であり、パラメータを組み込んだ結果を説明することは、特に物理学的な応用において最も重要な課題である。ここでは、近年の多くのML研究のターゲットである強磁性イジングモデルに対して、これがどのように達成できるかを示す。隠れた層を持たないニューラルネットワーク(NN)とハミルトニアン対称性を用いてモデルの連続相転移の臨界温度を求めることにより、その戦略を説明することができる。これにより、対称性が分かっていないとき、nn の最小拡張の予測が問題を解くことができるが、これも説明できる。

As powerful as machine learning (ML) techniques are in solving problems involving data with large dimensionality, explaining the results from the fitted parameters remains a challenging task of utmost importance, especially in physics applications. Here it is shown how this can be accomplished for the ferromagnetic Ising model, the target of many ML studies in recent years. By using a neural network (NN) without any hidden layers and the symmetry of the Hamiltonian to find the critical temperature for the continuous phase transition of the model, an explanation of its strategy is found. This allows the prediction of the minimal extension of the NN to solve the problem when the symmetry is not known, which is also explainable.

翻訳日:2024-02-20 19:34:33 公開日:2024-02-18

# 5Gセルラー:エネルギー効率の観点から

5G Cellular -- An Energy Efficiency Perspective ( http://arxiv.org/abs/2402.11698v1 )

ライセンス: Link先を確認

Deven Panchal

(参考訳) セルラー通信の5g技術は、いつでもどこでも情報にアクセスするための大きな容量と範囲を約束するが、膨大な電力消費を持つ恐れがある。加入者側と運用者側の両方に存在するこの問題の解決に向けた重要な研究が進められている。トラフィックの予測、物理層の変更、そして5G技術をよりエネルギー効率良くするための取り組みなどがあった。本研究の目的は,エネルギー効率の観点から5g技術の実現可能性を検討することである。改良や修正によって5Gセルのエネルギー効率が向上する5Gセル内の特定の領域を指摘する努力がなされる。

While the 5G technology of cellular communications promises great capacity and coverage to access information anywhere and anytime, it is feared to have huge power consumption. Significant research has been directed towards solving this problem, which exists on both the subscriber's side and the operator's side. There have been efforts such as predicting traffic and modifying the physical layer towards making the 5G technology more energy efficient. The aim of this study is to examine the technology enablers for 5G from an energy efficiency perspective. Efforts will be made to point out specific areas in 5G cellular where improvements or modifications could make 5G cellular more energy efficient.

翻訳日:2024-02-20 19:34:22 公開日:2024-02-18

# ソフトウェア定義光ネットワークの実現

Enabling Software Defined Optical Networks ( http://arxiv.org/abs/2402.11695v1 )

ライセンス: Link先を確認

Deven Panchal

(参考訳) 本稿では,Software Defined Optical Networks(SDON)の概要と実装方法について述べる。これは光ネットワークの進化をGMPLSまで遡り、SDNのアイデアを辿り、OpenFlowに構築する。論文では、SDONの必要性を調査し、ハードウェアを含むSDONソリューションがどのようなものかを説明する。また、GMPLSの制限を克服するために、このソリューションの一部としてOpenFlowをどのように使用できるかについても説明している。

This paper gives an overview of Software Defined Optical Networks (SDONs) and how they can be implemented. It traces the evolution of optical networks up to GMPLS, then traces the idea of SDN and builds up to OpenFlow. The paper explores the need for SDONs and explains what an SDON solution could look like, including the hardware. It also seeks to explain how OpenFlow could be used as part of this solution to overcome the limitations of GMPLS.

翻訳日:2024-02-20 19:34:11 公開日:2024-02-18

# Vision-Flan: ビジュアルインストラクションチューニングにおけるヒューマンラベルタスクのスケーリング

Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning ( http://arxiv.org/abs/2402.11690v1 )

ライセンス: Link先を確認

Zhiyang Xu, Chao Feng, Rulin Shao, Trevor Ashby, Ying Shen, Di Jin, Yu Cheng, Qifan Wang, Lifu Huang

(参考訳) 視覚言語モデル(VLM)は、多目的視覚アシスタントとして優れた機能を持つが、既存のVLMフレームワークには、(1)事前学習と視覚指導のタスク多様性の欠如、(2)GPT-4合成命令チューニングデータにおけるアノテーションエラーとバイアスの2つの大きな課題がある。どちらの課題も、ジェネラビリティの低下、幻覚、破滅的な忘れるといった問題を引き起こす。これらの課題に対処するため,我々は187の多様なタスクと1,664,261のインスタンスからなる,これまでに利用可能な最も多様な視覚インストラクションチューニングデータセットであるvision-flanを構築し,各タスクに専門家による命令を添付する。さらに,VLMをまずVision-Flan上で微調整し,さらにGPT-4合成データに基づいて調整する2段階の命令チューニングフレームワークを提案する。この2段階のチューニングフレームワークは、従来の1段階のビジュアル命令チューニングフレームワークを著しく上回り、幅広いマルチモーダル評価ベンチマークで最先端のパフォーマンスを実現しています。その結果,(1) GPT-4 合成データは VLM の能力を大幅に向上させるものではなく,むしろ人間の嗜好形式に対するモデル応答を変調する。(2) GPT-4 合成データの最小量 (例: 1000) は VLM 応答を人間の嗜好と効果的に整合させることができる;(3) 視覚的指示チューニングは主に大言語モデル(LLM)の視覚的特徴の理解を支援する。

Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within the existing VLM frameworks: (1) lacking task diversity in pretraining and visual instruction tuning, and (2) annotation error and bias in GPT-4 synthesized instruction tuning data. Both challenges lead to issues such as poor generalizability, hallucination, and catastrophic forgetting. To address these challenges, we construct Vision-Flan, the most diverse publicly available visual instruction tuning dataset to date, comprising 187 diverse tasks and 1,664,261 instances sourced from academic datasets, and each task is accompanied by an expert-written instruction. In addition, we propose a two-stage instruction tuning framework, in which VLMs are firstly finetuned on Vision-Flan and further tuned on GPT-4 synthesized data. We find this two-stage tuning framework significantly outperforms the traditional single-stage visual instruction tuning framework and achieves the state-of-the-art performance across a wide range of multi-modal evaluation benchmarks. Finally, we conduct in-depth analyses to understand visual instruction tuning and our findings reveal that: (1) GPT-4 synthesized data does not substantially enhance VLMs' capabilities but rather modulates the model's responses to human-preferred formats; (2) A minimal quantity (e.g., 1,000) of GPT-4 synthesized data can effectively align VLM responses with human-preference; (3) Visual instruction tuning mainly helps large-language models (LLMs) to understand visual features.

翻訳日:2024-02-20 19:34:04 公開日:2024-02-18

# 量子ニューラルネットワークにおけるモデル盗み攻撃と防御の効果評価

Evaluating Efficacy of Model Stealing Attacks and Defenses on Quantum Neural Networks ( http://arxiv.org/abs/2402.11687v1 )

ライセンス: Link先を確認

Satwik Kundu, Debarshi Kundu and Swaroop Ghosh

(参考訳) 量子機械学習(QML)モデルのクラウドホスティングは、モデルをさまざまな脆弱性にさらし、その中で最も重大なものがモデル盗み攻撃である。本研究では,量子コンピューティングの領域におけるそのような攻撃の有効性を評価する。複数のQMLモデルアーキテクチャを用いた各種データセットの総合的な実験を行った。その結果、モデル盗み攻撃は、Top-$1$とTop-$k$のラベル($k$: num\_classes)で訓練すると、それぞれ最大$0.9\times$と$0.99\times$のクローンテスト精度を達成するクローンモデルを生成できることが判明した。これらの攻撃から防御するために、我々は現在の騒がしいハードウェアのユニークな特性を利用し、被害者モデルの出力を摂動させ、攻撃者のトレーニングプロセスを妨げる。特に,我々は次のように提案する。 1)ハードウェア変動誘発摂動(HVIP)と 2)ハードウェアとアーキテクチャの変化による摂動(HAVIP)。ノイズとアーキテクチャのばらつきは最大$\sim16\%$の出力難読化を実現することができるが, 包括的解析により, ノイズ条件下でクローンされたモデルは耐障害性が高く, 難読化による性能劣化がほとんどないことがわかった。ノイズの多いハードウェアでトレーニングされたQMLモデルは、摂動や難読化に基づく防御や攻撃に自然に抵抗する。

Cloud hosting of quantum machine learning (QML) models exposes them to a range of vulnerabilities, the most significant of which is the model stealing attack. In this study, we assess the efficacy of such attacks in the realm of quantum computing. We conducted comprehensive experiments on various datasets with multiple QML model architectures. Our findings revealed that model stealing attacks can produce clone models achieving up to $0.9\times$ and $0.99\times$ clone test accuracy when trained using Top-$1$ and Top-$k$ labels, respectively ($k:$ num\_classes). To defend against these attacks, we leverage the unique properties of current noisy hardware to perturb the victim model outputs and hinder the attacker's training process. In particular, we propose: 1) hardware variation-induced perturbation (HVIP) and 2) hardware and architecture variation-induced perturbation (HAVIP). Although noise and architectural variability can provide up to $\sim16\%$ output obfuscation, our comprehensive analysis revealed that models cloned under noisy conditions tend to be resilient, suffering little to no performance degradation due to such obfuscations. Despite limited success with our defense techniques, this outcome has led to an important discovery: QML models trained on noisy hardware are naturally resistant to perturbation or obfuscation-based defenses or attacks.

翻訳日:2024-02-20 19:33:32 公開日:2024-02-18

# 離散力学系のトポロジーと挙動の学習

Learning the Topology and Behavior of Discrete Dynamical Systems ( http://arxiv.org/abs/2402.11686v1 )

ライセンス: Link先を確認

Zirou Qiu, Abhijin Adiga, Madhav V. Marathe, S. S. Ravi, Daniel J. Rosenkrantz, Richard E. Stearns, Anil Vullikanti

(参考訳) 離散力学系は、現実世界のネットワーク上での感染拡大をモデル化するために一般的に用いられる。 PACフレームワークの下では、基礎となるネットワークが知られていると仮定して、システムの振る舞いを学習する問題を研究している。本研究では、ブラックボックスシステムの振る舞いと基盤となるトポロジの両方を学習する、より困難な設定に焦点を当てる。一般に、この学習問題は計算的に難解であることを示す。正の面では、動的システムの基盤となるグラフがいくつかのクラスに属する場合、PACモデルの下で効率的な学習方法を示す。さらに,未知系のトポロジーが部分的に観測される緩和設定について検討する。そこで本研究では,システムの推論とサンプルの複雑さの確立に有効なPAC学習者を提案する。最後に、ナタラジャン次元のよく知られた形式主義を用いて、トポロジーと振舞いの両方が未知である力学系の仮説クラスの表現力の形式的解析を行う。本研究は離散力学系の挙動とトポロジーを学習するための理論的基礎を提供する。

Discrete dynamical systems are commonly used to model the spread of contagions on real-world networks. Under the PAC framework, existing research has studied the problem of learning the behavior of a system, assuming that the underlying network is known. In this work, we focus on a more challenging setting: to learn both the behavior and the underlying topology of a black-box system. We show that, in general, this learning problem is computationally intractable. On the positive side, we present efficient learning methods under the PAC model when the underlying graph of the dynamical system belongs to some classes. Further, we examine a relaxed setting where the topology of an unknown system is partially observed. For this case, we develop an efficient PAC learner to infer the system and establish the sample complexity. Lastly, we present a formal analysis of the expressive power of the hypothesis class of dynamical systems where both the topology and behavior are unknown, using the well-known formalism of the Natarajan dimension. Our results provide a theoretical foundation for learning both the behavior and topology of discrete dynamical systems.

翻訳日:2024-02-20 19:33:07 公開日:2024-02-18

# ALLaVA:ライトビジョンランゲージモデルのためのGPT4V合成データのハーネス化

ALLaVA: Harnessing GPT4V-synthesized Data for A Lite Vision-Language Model ( http://arxiv.org/abs/2402.11684v1 )

ライセンス: Link先を確認

Guiming Hardy Chen, Shunian Chen, Ruifei Zhang, Junying Chen, Xiangbo Wu, Zhiyi Zhang, Zhihong Chen, Jianquan Li, Xiang Wan, Benyou Wang

(参考訳) 近年の大型視覚言語モデル(lvlms)の進歩により、言語モデルにおけるマルチモーダル入力の処理が可能となったが、特にエッジデバイスでは重要な計算資源を必要とする。本研究では,従来のLVLMとリソースフレンドリなライトバージョンのパフォーマンスギャップを,高品質なトレーニングデータを用いて橋渡しすることを目的とする。これを実現するために、gpt-4vの詳細なキャプション、複雑な推論命令、画像からの詳細な回答を生成する能力を利用して合成データセットを作成する。得られたモデルであるALLaVAは、最大3B LVLMまでの12ベンチマークで競合性能を達成する。この研究は、より効率的なLVLMを作成する際に高品質なデータを採用する可能性を強調している。オンラインデモは \url{https://allava.freedomai.cn} で公開しています。

Recent advancements in Large Vision-Language Models (LVLMs) have enabled processing of multimodal inputs in language models but require significant computational resources for deployment, especially on edge devices. This study aims to bridge the performance gap between traditional-scale LVLMs and resource-friendly lite versions by adopting high-quality training data. To do this, a synthetic dataset is created by leveraging GPT-4V's ability to generate detailed captions, complex reasoning instructions and detailed answers from images. The resulting model trained with our data, ALLaVA, achieves competitive performance on 12 benchmarks up to 3B LVLMs. This work highlights the feasibility of adopting high-quality data in crafting more efficient LVLMs. Our online demo is available at https://allava.freedomai.cn.

翻訳日:2024-02-20 19:32:50 公開日:2024-02-18

# すべてを支配するための1つのプロンプト: 意見要約のためのllm

One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation ( http://arxiv.org/abs/2402.11683v1 )

ライセンス: Link先を確認

Tejpalsingh Siledar, Swaroop Nath, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, Swaprava Nath, Pushpak Bhattacharyya, Suman Banerjee, Amey Patil, Sudhanshu Shekhar Singh, Muthusamy Chelliah, Nikesh Garera

(参考訳) 従来の基準に基づく指標を用いた意見要約の評価は、概観的な評価を提供することは稀であり、人間の判断との相関が比較的低いことが示されている。近年,NLG評価のための基準フリー指標としてLarge Language Models (LLMs) が提案されているが,意見要約評価には未検討である。さらに、限定的な意見要約評価データセットは進捗を阻害する。これに対処するため、私たちはsummeval-opデータセットをリリースします。このデータセットは、意見要約の評価に関連する7つの側面をカバーする: フルエンシ、コヒーレンス、妥当性、忠実性、アスペクトカバレッジ、感情一貫性、特異性。本稿では,Op-I-Promptを次元に依存しないプロンプト,Op-Promptsについて考察する。実験の結果、Op-I-Promptは、人間と平均で0.70のスピアマン相関を達成し、これまでのすべてのアプローチよりも優れているという意見の要約を評価するための優れた代替手段として現れている。我々の知る限り、我々は、意見要約領域において、クローズドソースモデルとオープンソースモデルの両方において、LCMを評価対象として初めて調査する。

Evaluation of opinion summaries using conventional reference-based metrics rarely provides a holistic evaluation and has been shown to have a relatively low correlation with human judgments. Recent studies suggest using Large Language Models (LLMs) as reference-free metrics for NLG evaluation; however, they remain unexplored for opinion summary evaluation. Moreover, limited opinion summary evaluation datasets inhibit progress. To address this, we release the SUMMEVAL-OP dataset covering 7 dimensions related to the evaluation of opinion summaries: fluency, coherence, relevance, faithfulness, aspect coverage, sentiment consistency, and specificity. We investigate Op-I-Prompt, a dimension-independent prompt, and Op-Prompts, a dimension-dependent set of prompts for opinion summary evaluation. Experiments indicate that Op-I-Prompt emerges as a good alternative for evaluating opinion summaries, achieving an average Spearman correlation of 0.70 with humans and outperforming all previous approaches. To the best of our knowledge, we are the first to investigate LLMs as evaluators on both closed-source and open-source models in the opinion summarization domain.
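The Spearman correlation reported above measures how well the ranking induced by a metric agrees with the ranking induced by human ratings. The following is a minimal pure-Python sketch of that statistic (Pearson correlation computed on average ranks); the score lists are made-up illustrations, not data from the paper, and the helper names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five summaries on one dimension (not from the paper).
llm_scores = [4, 2, 5, 3, 1]
human_scores = [5, 1, 4, 3, 2]
rho = spearman(llm_scores, human_scores)
```

In practice a library routine such as SciPy's `spearmanr` would normally be used; the hand-written version is shown only to make the definition explicit.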

翻訳日:2024-02-20 19:32:35 公開日:2024-02-18

# 非可換性による学習条件不変性

Learning Conditional Invariances through Non-Commutativity ( http://arxiv.org/abs/2402.11682v1 )

ライセンス: Link先を確認

Abhra Chaudhuri, Serban Georgescu, Anjan Dutta

(参考訳) ドメイン固有の確率変数を障害として条件付きフィルタリングする非分散学習アルゴリズムは、評価対象のドメインではなく、データセマンティクスのみに基づいて行う。目的領域に非可換的に向くような不変条件を緩和することにより, 条件付き不変条件の学習に最適で, サンプル効率のよい学習方法を示す。ドメイン非対称性の下では、ターゲットドメインがソースに存在しない意味的関連情報を含んでいる場合、ドメインの平均で最適であるエンコーダ$\varphi^*$のリスクは、ターゲット固有の最適エンコーダ$\Phi^*_\tau$のリスクによって厳密に低くされる。非可換性は$\Phi^*_\tau$ を $\varphi^*$ ではなく $\Phi^*_\tau$ に最適化することを証明し、ドメイン間の$\mathcal{H}$-divergence をゼロにすることで、ターゲットのリスクに厳密な制限を与える。我々の理論と実験は、NCI(Non-commutative invariance)が、ドメイン適応のためのSOTA不変学習アルゴリズムを超越した$\Phi^*_\tau$を学習する際の、サンプルの複雑さを満たすために、ソースドメインサンプルを活用することを実証している。実装はhttps://github.com/abhrac/nciで利用可能である。

Invariance learning algorithms that conditionally filter out domain-specific random variables as distractors do so based only on the data semantics, and not the target domain under evaluation. We show that a provably optimal and sample-efficient way of learning conditional invariances is by relaxing the invariance criterion to be non-commutatively directed towards the target domain. Under domain asymmetry, i.e., when the target domain contains semantically relevant information absent in the source, the risk of the encoder $\varphi^*$ that is optimal on average across domains is strictly lower-bounded by the risk of the target-specific optimal encoder $\Phi^*_\tau$. We prove that non-commutativity steers the optimization towards $\Phi^*_\tau$ instead of $\varphi^*$, bringing the $\mathcal{H}$-divergence between domains down to zero, leading to a stricter bound on the target risk. Both our theory and experiments demonstrate that non-commutative invariance (NCI) can leverage source domain samples to meet the sample complexity needs of learning $\Phi^*_\tau$, surpassing SOTA invariance learning algorithms for domain adaptation, at times by over $2\%$, approaching the performance of an oracle. Implementation is available at https://github.com/abhrac/nci.

翻訳日:2024-02-20 19:32:01 公開日:2024-02-18

# 言語習得のブラックボックスを開く

Opening the black box of language acquisition ( http://arxiv.org/abs/2402.11681v1 )

ライセンス: Link先を確認

Jérôme Michaud and Anna Jon-and

(参考訳) ディープラーニング技術を用いた大規模言語モデルの最近の進歩は、データから言語を学習する方法に新たな関心を寄せている。しかし、これらのモデルが学習言語からの文法情報をどう表現するかは不明である。加えて、モデルは使用前に大きなコーパスで事前訓練されなければならない。本研究では,学習言語のための代替的,より透明で認知的に妥当なアーキテクチャを提案する。ディープラーニングの代わりに、シーケンスメモリとチャンキングに基づいた最小限の認知アーキテクチャを使用します。学習メカニズムは強化学習の原理に基づいている。私たちは、多くの自然のおもちゃの言語でアーキテクチャをテストします。その結果,モデルがこれらの人工言語をゼロから学習し,学習を支援する文法情報を抽出できることが示唆された。本研究は,このシンプルなアーキテクチャのパワーを実証し,言語学習プロセスの重要な要素としてシーケンスメモリの重要性を強調した。他の動物は忠実なシーケンス記憶を持っていないように見えるため、なぜ人間だけが複雑な言語を発達させたのかを説明することができる。

Recent advances in large language models using deep learning techniques have renewed interest in how languages can be learned from data. However, it is unclear whether or how these models represent grammatical information from the learned languages. In addition, the models must be pre-trained on large corpora before they can be used. In this work, we propose an alternative, more transparent and cognitively plausible architecture for learning language. Instead of using deep learning, our approach uses a minimal cognitive architecture based on sequence memory and chunking. The learning mechanism is based on the principles of reinforcement learning. We test our architecture on a number of natural-like toy languages. Results show that the model can learn these artificial languages from scratch and extract grammatical information that supports learning. Our study demonstrates the power of this simple architecture and stresses the importance of sequence memory as a key component of the language learning process. Since other animals do not seem to have a faithful sequence memory, this may explain why only humans have developed complex languages.

翻訳日:2024-02-20 19:31:18 公開日:2024-02-18

# リカレントニューラルネットワークと画像圧縮法による3次元点クラウド圧縮

3D Point Cloud Compression with Recurrent Neural Network and Image Compression Methods ( http://arxiv.org/abs/2402.11680v1 )

ライセンス: Link先を確認

Till Beemelmanns, Yuchen Tao, Bastian Lampe, Lennart Reiher, Raphael van Kempen, Timo Woopen, and Lutz Eckstein

(参考訳) LiDARポイントクラウドデータの保存と送信は、トレーニングデータ収集、リモートコントロール、クラウドサービス、SLAMなど、多くのAVアプリケーションにとって不可欠である。しかし,データの大きさや秩序のない構造のため,ポイントクラウドデータを低容量に圧縮することは困難である。原点雲データを密度の高い2次元行列構造に変換することは、圧縮アルゴリズムを適用する上で有望な方法である。本研究では,2次元表現における空間相関を効率的に利用するための圧縮アルゴリズムを提案する。構造化表現の圧縮には,一般的な画像圧縮法と,再帰的ニューラルネットワークを用いた自己教師あり深層圧縮法を用いる。また,LiDARの強度測定を密度2D表現に再構成し,その強度の圧縮性能を評価するための新しい指標を提案する。一般的なoctreeポイントクラウド圧縮や生のポイントクラウドデータ圧縮に基づくアプローチと比較すると、このアプローチは最良の定量的かつ視覚的なパフォーマンスを達成します。ソースコードとデータセットはhttps://github.com/ika-rwth-aachen/point-cloud-compressionで入手できる。

Storing and transmitting LiDAR point cloud data is essential for many AV applications, such as training data collection, remote control, cloud services or SLAM. However, due to the sparsity and unordered structure of the data, it is difficult to compress point cloud data to a low volume. Transforming the raw point cloud data into a dense 2D matrix structure is a promising way to apply compression algorithms. We propose a new lossless and calibrated 3D-to-2D transformation which allows compression algorithms to efficiently exploit spatial correlations within the 2D representation. To compress the structured representation, we use common image compression methods and also a self-supervised deep compression approach using a recurrent neural network. We also rearrange the LiDAR's intensity measurements to a dense 2D representation and propose a new metric to evaluate the compression performance of the intensity. Compared to approaches that are based on generic octree point cloud compression or based on raw point cloud data compression, our approach achieves the best quantitative and visual performance. Source code and dataset are available at https://github.com/ika-rwth-aachen/Point-Cloud-Compression.
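As a rough illustration of what a 3D-to-2D transformation looks like, the sketch below bins points by azimuth and elevation and stores their range in a 2D grid. This is a generic, uncalibrated spherical projection and is lossy when two points share a cell; it is explicitly not the lossless, calibrated transformation proposed in the paper. The function name, grid size and field-of-view values are assumptions chosen for the example.

```python
import math


def to_range_image(points, rows, cols, elev_min, elev_max):
    """Project (x, y, z) points onto a rows x cols grid of range values.

    Cells with no point keep the value 0.0. When several points fall into the
    same cell the last one wins, so this simple version is lossy.
    """
    image = [[0.0] * cols for _ in range(rows)]
    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            continue
        azimuth = math.atan2(y, x)      # horizontal angle in [-pi, pi]
        elevation = math.asin(z / rng)  # vertical angle in [-pi/2, pi/2]
        if not elev_min <= elevation <= elev_max:
            continue
        col = int((azimuth + math.pi) / (2 * math.pi) * cols) % cols
        row = int((elev_max - elevation) / (elev_max - elev_min) * rows)
        row = min(row, rows - 1)
        image[row][col] = rng
    return image


# Three toy points; the field of view of -25 to +15 degrees is an assumption.
points = [(3.0, 4.0, 0.0), (-4.0, -3.0, 0.0), (3.0, 4.0, -1.0)]
image = to_range_image(points, rows=4, cols=8,
                       elev_min=math.radians(-25), elev_max=math.radians(15))
```

Once the data sits in a dense grid like `image`, ordinary image codecs can exploit the correlation between neighbouring cells, which is the motivation given in the abstract.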

翻訳日:2024-02-20 19:30:54 公開日:2024-02-18

# MultiCorrupt: マルチモードロバストネスデータセットと3次元物体検出のためのLiDAR-Camera Fusionのベンチマーク

MultiCorrupt: A Multi-Modal Robustness Dataset and Benchmark of LiDAR-Camera Fusion for 3D Object Detection ( http://arxiv.org/abs/2402.11677v1 )

ライセンス: Link先を確認

Till Beemelmanns, Quan Zhang, and Lutz Eckstein

(参考訳) 自動走行のためのマルチモーダル3Dオブジェクト検出モデルは、nuScenesのようなコンピュータビジョンベンチマークでは例外的な性能を示した。しかし、密集したLiDAR点雲や精密に校正されたセンサーアレイへの依存は、現実世界のアプリケーションに課題をもたらす。センサの誤用、ミスキャリブレーション、異なるサンプリング周波数などの問題は、lidarやカメラからのデータの空間的および時間的不均衡につながる。加えて、LiDARとカメラデータの完全性は、インクリメント気象などの有害な環境条件によってしばしば損なわれ、閉塞やノイズ干渉を引き起こす。この課題に対処するため,我々は,マルチモーダル3次元物体検出器のロバスト性を評価するための総合ベンチマークであるmulticorruptを導入する。マルチコラプトにおける5つの最先端マルチモーダル検出器を評価し,その耐性について検討した。以上の結果から, 既存手法では腐敗の種類や融解戦略によってロバスト性が異なっていた。マルチモーダルな設計選択が、そのようなモデルをある種の摂動に対して堅牢にするための洞察を提供する。データセット生成コードとベンチマークはhttps://github.com/ika-rwth-aachen/MultiCorruptで公開されている。

Multi-modal 3D object detection models for automated driving have demonstrated exceptional performance on computer vision benchmarks like nuScenes. However, their reliance on densely sampled LiDAR point clouds and meticulously calibrated sensor arrays poses challenges for real-world applications. Issues such as sensor misalignment, miscalibration, and disparate sampling frequencies lead to spatial and temporal misalignment in data from LiDAR and cameras. Additionally, the integrity of LiDAR and camera data is often compromised by adverse environmental conditions such as inclement weather, leading to occlusions and noise interference. To address this challenge, we introduce MultiCorrupt, a comprehensive benchmark designed to evaluate the robustness of multi-modal 3D object detectors against ten distinct types of corruptions. We evaluate five state-of-the-art multi-modal detectors on MultiCorrupt and analyze their performance in terms of their resistance ability. Our results show that existing methods exhibit varying degrees of robustness depending on the type of corruption and their fusion strategy. We provide insights into which multi-modal design choices make such models robust against certain perturbations. The dataset generation code and benchmark are open-sourced at https://github.com/ika-rwth-aachen/MultiCorrupt.

翻訳日:2024-02-20 19:29:51 公開日:2024-02-18

# LiRaFusion:3次元物体検出のための深層適応LiDAR-Radar核融合

LiRaFusion: Deep Adaptive LiDAR-Radar Fusion for 3D Object Detection ( http://arxiv.org/abs/2402.11735v1 )

ライセンス: Link先を確認

Jingyu Song, Lingjun Zhao, Katherine A. Skinner

(参考訳) 既存のLiDARレーダ検出器の性能ギャップを埋めるために,LiRaFusionを用いて3次元物体検出を行う。これら2つのモードから特徴抽出能力を向上させるために,ジョイントボクセル特徴符号化のための早期融合モジュールと,ゲートネットワークを介して特徴マップを適応的に融合する中間融合モジュールを設計した。我々は、LiRaFusionがLiDARとレーダーの補完情報を効果的に活用し、既存の手法よりも顕著な改善を実現していることを示す。

We propose LiRaFusion to tackle LiDAR-radar fusion for 3D object detection to fill the performance gap of existing LiDAR-radar detectors. To improve the feature extraction capabilities from these two modalities, we design an early fusion module for joint voxel feature encoding, and a middle fusion module to adaptively fuse feature maps via a gated network. We perform extensive evaluation on nuScenes to demonstrate that LiRaFusion leverages the complementary information of LiDAR and radar effectively and achieves notable improvement over existing methods.

翻訳日:2024-02-20 19:20:57 公開日:2024-02-18

# 大規模言語モデルを用いたデータ中心タスクの解決

Solving Data-centric Tasks using Large Language Models ( http://arxiv.org/abs/2402.11734v1 )

ライセンス: Link先を確認

Shraddha Barke, Christian Poelitz, Carina Suzana Negreanu, Benjamin Zorn, José Cambronero, Andrew D. Gordon, Vu Le, Elnaz Nouri, Nadia Polikarpova, Advait Sarkar, Brian Slininger, Neil Toronto, Jack Williams

(参考訳) 大規模言語モデル(llm)はstackoverflowのようなヘルプフォーラムを急速に置き換えている。これらのユーザは、スプレッドシート操作やデータラングといったデータ中心のタスクに関心を持っていることが多い。しかし、どのデータとどのデータをプロンプトに含めるかをどのように決めるのか? 本稿では,この問題への回答に2つの貢献をする。まず,StackOverflowの投稿から抽出した表データを操作する実世界のNL-to-codeタスクのデータセットを作成する。次に,LLMプロンプトに入力データから最も代表的な行を追加するクラスタ列選択プロンプト手法を提案する。実験の結果,LLMの性能はプロンプトに渡されるデータ量に非常に敏感であり,入力テーブルに多くの構文変化があるタスクの場合,クラスタ列選択手法はランダム選択ベースラインよりも優れていた。

Large language models (LLMs) are rapidly replacing help forums like StackOverflow, and are especially helpful for non-professional programmers and end users. These users are often interested in data-centric tasks, such as spreadsheet manipulation and data wrangling, which are hard to solve if the intent is only communicated using a natural-language description, without including the data. But how do we decide how much data and which data to include in the prompt? This paper makes two contributions towards answering this question. First, we create a dataset of real-world NL-to-code tasks manipulating tabular data, mined from StackOverflow posts. Second, we introduce a cluster-then-select prompting technique, which adds the most representative rows from the input data to the LLM prompt. Our experiments show that LLM performance is indeed sensitive to the amount of data passed in the prompt, and that for tasks with a lot of syntactic variation in the input table, our cluster-then-select technique outperforms a random selection baseline.
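The idea of adding the most representative rows to the prompt can be sketched as follows. The abstract does not say how rows are clustered, so this example assumes a very simple scheme: each cell is abstracted into a character-class pattern, rows with identical patterns form a cluster, and one row per cluster is selected, largest clusters first. All names and the sample table are hypothetical.

```python
from collections import defaultdict


def cell_pattern(cell):
    """Abstract a cell into a coarse syntactic pattern, e.g. 'AB-12' -> 'a-d'."""
    pattern = []
    for ch in str(cell):
        kind = "d" if ch.isdigit() else "a" if ch.isalpha() else ch
        # Collapse runs of the same kind so 'AB' and 'ABC' share a pattern.
        if not pattern or pattern[-1] != kind:
            pattern.append(kind)
    return "".join(pattern)


def select_representative_rows(rows, budget):
    """Group rows by syntactic pattern and return one row per group.

    Larger groups are served first; at most `budget` rows are returned.
    """
    clusters = defaultdict(list)
    for row in rows:
        clusters[tuple(cell_pattern(c) for c in row)].append(row)
    ordered = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ordered[:budget]]


rows = [
    ["2021-03-04", "Alice"],
    ["2020-11-30", "Bob"],
    ["03/04/2021", "Carol"],
    ["2019-01-15", "Dan"],
    ["March 4", "Eve"],
]
selected = select_representative_rows(rows, budget=2)
```

With a budget of two rows, the selection covers both the ISO-style and the slash-style date formats, whereas a random pick of two rows could easily return two rows of the same format.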

翻訳日:2024-02-20 19:20:45 公開日:2024-02-18

# ロバスト一般化におけるランダムフォーミングの有効性

The Effectiveness of Random Forgetting for Robust Generalization ( http://arxiv.org/abs/2402.11733v1 )

ライセンス: Link先を確認

Vijaya Raghavan T Ramkumar, Bahram Zonooz and Elahe Arani

(参考訳) ディープニューラルネットワークは、敵攻撃の影響を受けやすいため、パフォーマンスと精度を損なう可能性がある。敵訓練(AT)は、そのような攻撃からニューラルネットワークを保護する一般的なアプローチとして現れている。しかし、ATの重要な課題は、テストデータに対するネットワークの堅牢な性能がさらなるトレーニングで悪化し、一般化を阻害する、堅牢なオーバーフィッティングである。脳における能動的忘れるという概念に動機づけられ、我々は新しい学習パラダイム"forget to ease overfitting (fomo)"を導入した。 FOMOは、重みのサブセットをランダムに忘れ、重みの再初期化を通じてモデルの情報を規制する忘れ相と、一般化可能な特徴の学習を強調する再学習相とを交互に扱う。ベンチマークデータセットと敵攻撃による実験により、FOMOは、最先端のロバスト性を改善しつつ、最良のテストと最後のロバストテストの精度のギャップを大幅に減らし、ロバストなオーバーフィッティングを緩和することが示された。さらに、FOMOは標準とロバストな精度のトレードオフを向上し、ベースラインの対角法よりも優れている。最後に、我々のフレームワークはAutoAttacksに対して堅牢であり、多くの実世界のシナリオにおける一般化を高めます。

Deep neural networks are susceptible to adversarial attacks, which can compromise their performance and accuracy. Adversarial Training (AT) has emerged as a popular approach for protecting neural networks against such attacks. However, a key challenge of AT is robust overfitting, where the network's robust performance on test data deteriorates with further training, thus hindering generalization. Motivated by the concept of active forgetting in the brain, we introduce a novel learning paradigm called "Forget to Mitigate Overfitting (FOMO)". FOMO alternates between the forgetting phase, which randomly forgets a subset of weights and regulates the model's information through weight reinitialization, and the relearning phase, which emphasizes learning generalizable features. Our experiments on benchmark datasets and adversarial attacks show that FOMO alleviates robust overfitting by significantly reducing the gap between the best and last robust test accuracy while improving the state-of-the-art robustness. Furthermore, FOMO provides a better trade-off between standard and robust accuracy, outperforming baseline adversarial methods. Finally, our framework is robust to AutoAttacks and increases generalization in many real-world scenarios.
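The forgetting phase described above can be illustrated with a small sketch that reinitializes a random subset of a flat weight list. This only shows the random-forgetting step under assumed details (uniform reinitialization, a fixed fraction, a flat list instead of network layers); the relearning phase and everything specific to the FOMO method are omitted, and the function name is hypothetical.

```python
import random


def forget(weights, fraction, rng, init_scale=0.1):
    """Return a copy of `weights` with a random subset reinitialized.

    fraction: share of weights to forget, between 0 and 1
    rng: a random.Random instance, passed in for reproducibility
    """
    count = int(len(weights) * fraction)
    forgotten = set(rng.sample(range(len(weights)), count))
    new_weights = [
        rng.uniform(-init_scale, init_scale) if i in forgotten else w
        for i, w in enumerate(weights)
    ]
    return new_weights, forgotten


rng = random.Random(0)
weights = [1.0] * 20
new_weights, forgotten = forget(weights, fraction=0.25, rng=rng)
```

In a training loop, a call like this would alternate with ordinary adversarial training epochs, so that the model repeatedly loses and relearns part of its parameters.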

翻訳日:2024-02-20 19:20:27 公開日:2024-02-18

# プロスペクタヘッド:大規模モデルとデータに対する一般的な特徴属性

Prospector Heads: Generalized Feature Attribution for Large Models & Data ( http://arxiv.org/abs/2402.11729v1 )

ライセンス: Link先を確認

Gautam Machiraju, Alexander Derry, Arjun Desai, Neel Guha, Amir-Hossein Karimi, James Zou, Russ Altman, Christopher Ré, Parag Mallick

(参考訳) 特徴帰属(feature attribution)は、分類に関連する入力データの領域をローカライズする能力であり、科学的および生物医学領域の機械学習モデルにとって重要な機能である。エンド・ツー・エンドの分類器の予測を「説明」する現在の特徴帰属法は、不正確な特徴の局在化に苦しめられ、計算上の課題のために小さなサンプルサイズと高次元データセットでの使用には不十分である。我々は,任意のエンコーダおよび任意のデータモダリティに適用可能な特徴帰属のための説明ベース手法の効率的かつ解釈可能な代替手段であるprospector headを提案する。プロスペクタヘッドは、シーケンス(テキスト)、画像(病理)、およびグラフ(タンパク質構造)の実験を通じてモダリティを一般化し、平均局在auprcにおけるベースラインアトリビューション法を最大49ポイント上回った。また、入力データ中のクラス固有のパターンの解釈と発見を改善する方法を示す。ハイパフォーマンス、柔軟性、一般化性を通じて、複雑なドメインにおける機械学習モデルの信頼性と透明性を改善するためのフレームワークを提供する。

Feature attribution, the ability to localize regions of the input data that are relevant for classification, is an important capability for machine learning models in scientific and biomedical domains. Current methods for feature attribution, which rely on "explaining" the predictions of end-to-end classifiers, suffer from imprecise feature localization and are inadequate for use with small sample sizes and high-dimensional datasets due to computational challenges. We introduce prospector heads, an efficient and interpretable alternative to explanation-based methods for feature attribution that can be applied to any encoder and any data modality. Prospector heads generalize across modalities through experiments on sequences (text), images (pathology), and graphs (protein structures), outperforming baseline attribution methods by up to 49 points in mean localization AUPRC. We also demonstrate how prospector heads enable improved interpretation and discovery of class-specific patterns in the input data. Through their high performance, flexibility, and generalizability, prospectors provide a framework for improving trust and transparency for machine learning models in complex domains.

翻訳日:2024-02-20 19:20:04 公開日:2024-02-18

# 金融における数値的クレーム検出:新しい金融データセット、弱いスーパービジョンモデル、市場分析

Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis ( http://arxiv.org/abs/2402.11728v1 )

ライセンス: Link先を確認

Agam Shah, Arnav Hiray, Pratvi Shah, Arkaprabha Banerjee, Anushka Singh, Dheeraj Eidnani, Bhaskar Chaudhury, Sudheer Chava

(参考訳) 本稿では、上場企業にとって重要な四半期イベントであるとして、アナリスト報告や決算報告が金融市場リターンに与える影響を検討する。包括的分析を容易にするために,金融領域におけるクレーム検出タスクのための新たな財務データセットを構築する。我々は,本データセット上で様々な言語モデルをベンチマークし,既存のアプローチよりも優れた対象事項エキスパート(SME)の知識を集約関数に取り入れた,新しい弱スーパービジョンモデルを提案する。さらに,「最適化」という新しい尺度を構築することで,提案モデルの実用性を実証する。さらに、利益サプライズへの依存と楽観的尺度への回帰も観察した。私たちのデータセット、モデル、コードは(CC BY 4.0ライセンスの下で)GitHubとHugging Faceで公開されます。

In this paper, we investigate the influence of claims in analyst reports and earnings calls on financial market returns, considering them as significant quarterly events for publicly traded companies. To facilitate a comprehensive analysis, we construct a new financial dataset for the claim detection task in the financial domain. We benchmark various language models on this dataset and propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) in the aggregation function, outperforming existing approaches. Furthermore, we demonstrate the practical utility of our proposed model by constructing a novel measure, "optimism". We also observe the dependence of earnings surprise and return on our optimism measure. Our dataset, models, and code will be made publicly available (under a CC BY 4.0 license) on GitHub and Hugging Face.

翻訳日:2024-02-20 19:19:41 公開日:2024-02-18

# 人間とaiのコラボレーションを形作る:言語モデルとの共著における様々な足場レベル

Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models ( http://arxiv.org/abs/2402.11723v1 )

ライセンス: Link先を確認

Paramveer S. Dhillon, Somayeh Molaei, Jiaqi Li, Maximilian Golub, Shaochun Zheng, Lionel P. Robert

(参考訳) 言語モデリングの進歩は、新しい人間-ai共著体験への道を開いた。本稿では,大規模言語モデル(llm)からのスキャフォールディングの多種多様なレベルについて検討する。ラテン四角形設計を用いて、被験者(N=131)に、AIアシスト(制御)なし、次文提案(低足場化)、次パラグラフ提案(高足場化)の3つのランダムな条件下での議論的記述プロンプトへの対応を依頼した。以上の結果から,足場が文字品質と生産性(単語/時間)に与える影響が明らかとなった。低いスキャフォールディングは書き込みの品質や生産性を著しく改善しなかったが、高いスキャフォールディングは大きな改善をもたらし、特に非正規のライターや技術に精通していないユーザーにとって恩恵となった。足場作成ツールを用いた場合,認知的負担は認められなかったが,テキストの所有と満足度は適度に低下した。我々の結果は、パーソナライズされたスキャフォールディング機構の必要性を含む、AIを活用した書込みツールの設計に幅広い影響を及ぼす。

Advances in language modeling have paved the way for novel human-AI co-writing experiences. This paper explores how varying levels of scaffolding from large language models (LLMs) shape the co-writing process. Employing a within-subjects field experiment with a Latin square design, we asked participants (N=131) to respond to argumentative writing prompts under three randomly sequenced conditions: no AI assistance (control), next-sentence suggestions (low scaffolding), and next-paragraph suggestions (high scaffolding). Our findings reveal a U-shaped impact of scaffolding on writing quality and productivity (words/time). While low scaffolding did not significantly improve writing quality or productivity, high scaffolding led to significant improvements, especially benefiting non-regular writers and less tech-savvy users. No significant cognitive burden was observed while using the scaffolded writing tools, but a moderate decrease in text ownership and satisfaction was noted. Our results have broad implications for the design of AI-powered writing tools, including the need for personalized scaffolding mechanisms.
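A Latin square design balances the order in which each participant meets the three conditions. The sketch below builds a simple cyclic Latin square for them; how the study actually assigned participants to sequences is not described in the abstract, so the assignment rule in the comment is an assumption.

```python
def latin_square(conditions):
    """Return a cyclic Latin square: row i is `conditions` rotated left by i."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]


conditions = ["control", "low scaffolding", "high scaffolding"]
orders = latin_square(conditions)

# Assumed assignment rule: participant p follows the sequence orders[p % 3].
```

Each condition appears exactly once in every row (one participant's sequence) and once in every column (one position in the session), so order effects are spread evenly across conditions.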

翻訳日:2024-02-20 19:19:28 公開日:2024-02-18

# 逆問題と逆問題に対処する可逆フーリエニューラル演算子

Invertible Fourier Neural Operators for Tackling Both Forward and Inverse Problems ( http://arxiv.org/abs/2402.11722v1 )

ライセンス: Link先を確認

Da Long and Shandian Zhe

(参考訳) Fourier Neural Operator (FNO)は、多くのタスクで最先端のパフォーマンスを実証した、人気のある演算子学習手法である。しかし、FNOは主に前方予測に使われているが、多くのアプリケーションは逆問題の解決に頼っている。本稿では,前向きと逆問題の両方に対処する可逆フーリエニューラル演算子 (iFNO) を提案する。潜在チャネル空間における可逆フーリエブロックの設計を行い,モデルパラメータを共有し,情報を効率的に交換し,双方向タスクの学習を相互に規則化する。本研究では,入力空間内の固有構造を捉えるための変分自動エンコーダを統合し,不備やデータ不足,ノイズなどの問題に対処するために後部推論を可能にする。効率的なトレーニングのために,事前学習と微調整のための3段階のプロセスを開発した。 5つのベンチマーク問題に対する評価は,本手法の有効性を示した。

Fourier Neural Operator (FNO) is a popular operator learning method, which has demonstrated state-of-the-art performance across many tasks. However, FNO is mainly used in forward prediction, yet a large family of applications rely on solving inverse problems. In this paper, we propose an invertible Fourier Neural Operator (iFNO) that tackles both the forward and inverse problems. We designed a series of invertible Fourier blocks in the latent channel space to share the model parameters, efficiently exchange the information, and mutually regularize the learning for the bi-directional tasks. We integrated a variational auto-encoder to capture the intrinsic structures within the input space and to enable posterior inference so as to overcome challenges of ill-posedness, data shortage, noise, etc. We developed a three-step process for pre-training and fine-tuning for efficient training. The evaluations on five benchmark problems have demonstrated the effectiveness of our approach.

翻訳日:2024-02-20 19:19:04 公開日:2024-02-18

# LLMエージェントを用いた政治連携交渉のモデル化

Modelling Political Coalition Negotiations Using LLM-based Agents ( http://arxiv.org/abs/2402.11712v1 )

ライセンス: Link先を確認

Farhad Moghimifar, Yuan-Fang Li, Robert Thomson, Gholamreza Haffari

(参考訳) 連立交渉は議会の民主主義の基礎であり、複雑な相互作用と政党間の戦略的コミュニケーションが特徴である。その重要性にもかかわらず、これらの交渉のモデル化は、主に適切なデータがないために、自然言語処理(NLP)の領域で未検討のままである。本稿では,新しいnlpタスクとして連立交渉を導入し,大規模言語モデルに基づくエージェント間の交渉としてモデル化する。我々は、欧州政党の宣言とこれらの国における多数の選挙に関する連立協定を含む多言語データセット POLCA を導入する。このデータセットは、様々な実世界のシミュレーション基盤を提供することによって、政治交渉モデリングにおける現在の範囲制限の課題に対処する。さらに,政党間の連立交渉の過程をシミュレートし,結果を予測する階層的マルコフ決定プロセスを提案する。我々は,現在最先端の大規模言語モデル(LLM)の性能を,連立交渉に対処するエージェントとして評価し,その能力に関する洞察を提供し,今後の政治モデリングの発展への道を開く。

Coalition negotiations are a cornerstone of parliamentary democracies, characterised by complex interactions and strategic communications among political parties. Despite their significance, the modelling of these negotiations has remained unexplored within the domain of Natural Language Processing (NLP), mostly due to a lack of proper data. In this paper, we introduce coalition negotiations as a novel NLP task, and model it as a negotiation between large language model-based agents. We introduce a multilingual dataset, POLCA, comprising manifestos of European political parties and coalition agreements over a number of elections in these countries. This dataset addresses the challenge of the current scope limitations in political negotiation modelling by providing a diverse, real-world basis for simulation. Additionally, we propose a hierarchical Markov decision process designed to simulate the process of coalition negotiation between political parties and predict the outcomes. We evaluate the performance of state-of-the-art large language models (LLMs) as agents in handling coalition negotiations, offering insights into their capabilities and paving the way for future advancements in political modelling.

翻訳日:2024-02-20 19:18:50 公開日:2024-02-18

# MORL-Prompt:離散プロンプト最適化のための多目的強化学習の実証分析

MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization ( http://arxiv.org/abs/2402.11711v1 )

ライセンス: Link先を確認

Yasaman Jafari, Dheeraj Mekala, Rose Yu, Taylor Berg-Kirkpatrick

(参考訳) RLに基づく手法は、ターゲット言語モデルに入力された場合、ユーザーが指定した報酬関数の集合を最大化するプロンプトを探索するために用いられる。しかし、多くのターゲットアプリケーションでは、自然報酬関数は、例えば、スタイル転送タスクにおけるコンテンツ保存対スタイルマッチングといった、互いに緊張状態にある。現在の技術では、報酬関数の平均を最大化することに焦点を当てている。これは必ずしも報酬間のバランスを達成するプロンプトにつながるわけではない。これは、多目的で堅牢な最適化文献でよく研究されている問題である。本稿では,多目的最適化のための複数の手法をrlベースの離散的プロンプト最適化に適用する。2つはパレートの報酬面の体積を考慮し,もう1つは全ての報酬を同時に得られる更新方向を選択する。これら2つのnlpタスク(スタイル転送と機械翻訳)について経験的分析を行い,3つの報酬関数を用いた。実験により,音量を直接最適化する多目的手法は,単調な更新方向を見つけようとする方法よりも,すべての報酬のバランスが良好であることを示す。

RL-based techniques can be used to search for prompts that, when fed into a target language model, maximize a set of user-specified reward functions. However, in many target applications, the natural reward functions are in tension with one another -- for example, content preservation vs. style matching in style transfer tasks. Current techniques focus on maximizing the average of reward functions, which does not necessarily lead to prompts that achieve balance across rewards -- an issue that has been well-studied in the multi-objective and robust optimization literature. In this paper, we adapt several techniques for multi-objective optimization to RL-based discrete prompt optimization -- two that consider volume of the Pareto reward surface, and another that chooses an update direction that benefits all rewards simultaneously. We conduct an empirical analysis of these methods on two NLP tasks: style transfer and machine translation, each using three competing reward functions. Our experiments demonstrate that multi-objective methods that directly optimize volume perform better and achieve a better balance of all rewards than those that attempt to find monotonic update directions.
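A small numeric sketch shows why averaging rewards can hide imbalance while a volume-style objective does not. It uses the product of the rewards of a single prompt as a simplified stand-in for volume; the paper's volume-based methods concern the Pareto reward surface and are not reproduced here. The reward values are hypothetical.

```python
from math import prod


def average_reward(rewards):
    """Arithmetic mean of the individual rewards."""
    return sum(rewards) / len(rewards)


def volume_reward(rewards):
    """Product of the rewards: the volume of the box spanned from the origin."""
    return prod(rewards)


# Hypothetical (content preservation, style match) rewards for two prompts.
lopsided = (0.9, 0.1)
balanced = (0.5, 0.5)
```

Both prompts have the same average reward, so an average-maximizing search cannot tell them apart, whereas the product is higher for the balanced prompt than for the lopsided one.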

翻訳日:2024-02-20 19:18:32 公開日:2024-02-18

# 完成へのバイアスについての一考察

A Note on Bias to Complete ( http://arxiv.org/abs/2402.11710v1 )

ライセンス: Link先を確認

Jia Xu and Mona Diab

(参考訳) 社会バイアスの最小化は社会的な結合を強化し、共有理解を促進し、意思決定を改善する。動的環境における新しいバイアスタイプ(例えば社会的地位)を発見してバイアスの定義を再考し、文化、地域、時間、個人的背景といった文脈に関連してそれらを記述する。本フレームワークは,各仮定に対するバイアスに関する8つの仮説と最小化バイアス戦略と,LLMで提案された解として提案される5つの方法を含む。フレームワークの実現はまだ完了していない。

Minimizing social bias strengthens societal bonds, promoting shared understanding and better decision-making. We revisit the definition of bias by discovering new bias types (e.g., societal status) in dynamic environments and describe them relative to context, such as culture, region, time, and personal background. Our framework includes eight hypotheses about bias and a minimizing bias strategy for each assumption as well as five methods as proposed solutions in LLM. The realization of the framework is yet to be completed.

翻訳日:2024-02-20 19:18:10 公開日:2024-02-18

# GNNavi: グラフニューラルネットワークによる大規模言語モデルの情報フローのナビゲート

GNNavi: Navigating the Information Flow in Large Language Models by Graph Neural Network ( http://arxiv.org/abs/2402.11709v1 )

ライセンス: Link先を確認

Shuzhou Yuan, Ercong Nie, Michael Färber, Helmut Schmid, Hinrich Schütze

(参考訳) 大きな言語モデル(LLM)は、デモによるプロンプトが適用されると、強力なインコンテキスト学習(ICL)能力を示す。しかし、さらに適応性を高めるためには微調整が依然として不可欠である。プロンプトベースの微調整は、低データシナリオにおいて効果的な微調整方法であることが証明されるが、計算リソースへの高い要求は、その実用性を制限する。本稿では,パラメータ効率向上手法(PEFT)を導入することでこの問題に対処する。 GNNaviはICLの情報フローダイナミクスの洞察を活用し、ラベル語が情報伝達のアンカーとして働くことを示す。 GNNaviはグラフニューラルネットワーク(GNN)レイヤを使用して、希望する情報フローをGNNにハードスイッチすることで、プロンプト処理中に情報フローの集約と分布を正確にガイドする。 GPT-2とLlama2を用いたテキスト分類タスクの実験では、GNNaviはパラメータの0.2%から0.5%を更新することで、数ショット設定で標準のプロンプトベースの微調整手法を超えている。我々は、GNNaviとプレフィックスチューニング、LoRA、AdapterなどのPEFTアプローチを比較し、性能と効率の点で比較する。分析の結果,gnnaviは情報フローを強化し,明確な集約プロセスを保証する。

Large Language Models (LLMs) exhibit strong In-Context Learning (ICL) capabilities when prompts with demonstrations are applied to them. However, fine-tuning still remains crucial to further enhance their adaptability. Prompt-based fine-tuning proves to be an effective fine-tuning method in low-data scenarios, but high demands on computing resources limit its practicality. We address this issue by introducing a prompt-based parameter-efficient fine-tuning (PEFT) approach. GNNavi leverages insights into ICL's information flow dynamics, which indicate that label words act in prompts as anchors for information propagation. GNNavi employs a Graph Neural Network (GNN) layer to precisely guide the aggregation and distribution of information flow during the processing of prompts by hardwiring the desired information flow into the GNN. Our experiments on text classification tasks with GPT-2 and Llama2 show that GNNavi surpasses standard prompt-based fine-tuning methods in few-shot settings by updating just 0.2% to 0.5% of parameters. We compare GNNavi with prevalent PEFT approaches, such as prefix tuning, LoRA and Adapter, in terms of performance and efficiency. Our analysis reveals that GNNavi enhances information flow and ensures a clear aggregation process.

翻訳日:2024-02-20 19:18:04 公開日:2024-02-18

# 検索エンジンのChatGPT: ジェネレーティブな人工知能が検索の信頼性を損なう

Search Engines Post-ChatGPT: How Generative Artificial Intelligence Could Make Search Less Reliable ( http://arxiv.org/abs/2402.11707v1 )

ライセンス: Link先を確認

Shahan Ali Memon, Jevin D. West

(参考訳) 本稿では,生成人工知能(GenAI)が生成したコンテンツを生成,インデックス化,配信し始める中で,検索エンジンの進化する性質について論じる。我々の議論は、GenAI統合の初期段階、特に事実上の矛盾とバイアスに関する課題を強調します。我々は, 透明性とソーシング能力の低下を伴いながら, ジェナイからの出力が不当な信頼感をもたらすかについて議論する。さらに、検索エンジンは、すでにエラーの少ない、生成されたコンテンツでクエリに答えており、情報の証明をさらに曖昧にし、情報のエコシステムの完全性に影響を与える。これらの要因が検索エンジンの信頼性を低下させるのか議論する。最後に、活発な研究の方向性とオープンな質問について要約する。

In this commentary, we discuss the evolving nature of search engines, as they begin to generate, index, and distribute content created by generative artificial intelligence (GenAI). Our discussion highlights challenges in the early stages of GenAI integration, particularly around factual inconsistencies and biases. We discuss how output from GenAI carries an unwarranted sense of credibility, while decreasing transparency and sourcing ability. Furthermore, search engines are already answering queries with error-laden, generated content, further blurring the provenance of information and impacting the integrity of the information ecosystem. We argue that all these factors could reduce the reliability of search engines. Finally, we summarize some of the active research directions and open questions.

Translated: 2024-02-20 19:17:43 Published: 2024-02-18

# Learning Memory Kernels in Generalized Langevin Equations

Learning Memory Kernels in Generalized Langevin Equations ( http://arxiv.org/abs/2402.11705v1 )

License: see the linked page

Quanjun Lang, Jianfeng Lu

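(Reference translation) We propose a new method for learning memory kernels in generalized Langevin equations. The approach first uses a regularized Prony method to estimate correlation functions from trajectory data, and then performs regression with a loss function based on a Sobolev norm with RKHS regularization. In the proposed method, the kernel estimation error is controlled by the error in the estimated correlation functions, and improved performance is guaranteed within an exponentially weighted $L^2$ space. Using numerical examples that show a consistent advantage across various choices of weight parameter, we demonstrate the superiority of the estimator over other regression estimators that rely on an $L^2$ loss function and over an estimator derived from the inverse Laplace transform. We also present examples that include force and drift terms in the equation.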

We introduce a novel approach for learning memory kernels in Generalized Langevin Equations. This approach initially utilizes a regularized Prony method to estimate correlation functions from trajectory data, followed by regression over a Sobolev norm-based loss function with RKHS regularization. Our approach guarantees improved performance within an exponentially weighted $L^2$ space, with the kernel estimation error controlled by the error in estimated correlation functions. We demonstrate the superiority of our estimator compared to other regression estimators that rely on $L^2$ loss functions and also an estimator derived from the inverse Laplace transform, using numerical examples that highlight its consistent advantage across various weight parameter selections. Additionally, we provide examples that include the application of force and drift terms in the equation.
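
A Prony method fits a sampled signal with a sum of exponentials. The sketch below shows the classical, unregularized version for exactly two real, distinct modes; it is a simplification for illustration, not the regularized variant the abstract refers to. The helper names and the synthetic samples are assumptions.

```python
import math


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


def prony_two_modes(y):
    """Fit y[k] ~ a1 * z1**k + a2 * z2**k, assuming two real, distinct modes."""
    # Step 1: least-squares linear prediction y[k+2] = c1*y[k+1] + c0*y[k].
    s11 = s12 = s22 = t1 = t2 = 0.0
    for k in range(len(y) - 2):
        u, v, w = y[k + 1], y[k], y[k + 2]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        t1 += u * w
        t2 += v * w
    c1, c0 = solve_2x2(s11, s12, s12, s22, t1, t2)

    # Step 2: the modes are the roots of z**2 - c1*z - c0 = 0.
    disc = math.sqrt(c1 * c1 + 4.0 * c0)
    z1, z2 = (c1 + disc) / 2.0, (c1 - disc) / 2.0

    # Step 3: least-squares amplitudes for the recovered modes.
    p11 = p12 = p22 = q1 = q2 = 0.0
    for k, value in enumerate(y):
        e1, e2 = z1 ** k, z2 ** k
        p11 += e1 * e1
        p12 += e1 * e2
        p22 += e2 * e2
        q1 += e1 * value
        q2 += e2 * value
    a1, a2 = solve_2x2(p11, p12, p12, p22, q1, q2)
    return (a1, z1), (a2, z2)


# Synthetic, noise-free "correlation function" with two decaying modes.
samples = [2.0 * 0.9 ** k + 1.0 * 0.5 ** k for k in range(20)]
(a1, z1), (a2, z2) = prony_two_modes(samples)
```

With noisy trajectory data the linear-prediction step becomes ill-conditioned, which is one reason a regularized variant is attractive.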

Translated: 2024-02-20 19:17:28 Published: 2024-02-18

# Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance ( http://arxiv.org/abs/2402.11742v1 )

License: see the linked page

Chiraag Kaushik, Ran Liu, Chi-Heng Lin, Amrit Khera, Matthew Y Jin, Wenrui Ma, Vidya Muthukumar, Eva L Dyer

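(Reference translation) Classification models are expected to perform equally well across different classes, but in practice there are often large gaps in their performance. This class-bias problem has been studied extensively for datasets with sample imbalance, but has been overlooked for balanced datasets. In this work, we introduce spectral imbalance in features as a potential source of class disparities and examine the relationship between spectral imbalance and class bias in both theory and practice. To establish the link between spectral imbalance and class gaps, we build a theoretical framework that derives exact expressions for the per-class error in a high-dimensional mixture model. We then analyze this phenomenon in 11 pretrained encoders, and show how the proposed framework can be used to compare encoder quality and to evaluate and combine data augmentation strategies that mitigate the problem. Our work sheds light on the class-dependent effects of learning and offers new insight into state-of-the-art pretrained features, which may carry unknown biases that can be diagnosed through their spectra.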

Classification models are expected to perform equally well for different classes, yet in practice, there are often large gaps in their performance. This issue of class bias is widely studied in cases of datasets with sample imbalance, but is relatively overlooked in balanced datasets. In this work, we introduce the concept of spectral imbalance in features as a potential source of class disparities and study the connections between spectral imbalance and class bias in both theory and practice. To build the connection between spectral imbalance and class gap, we develop a theoretical framework for studying class disparities and derive exact expressions for the per-class error in a high-dimensional mixture model setting. We then study this phenomenon in 11 different state-of-the-art pretrained encoders and show how our proposed framework can be used to compare the quality of encoders, as well as evaluate and combine data augmentation strategies to mitigate the issue. Our work sheds light on the class-dependent effects of learning, and provides new insights into how state-of-the-art pretrained features may have unknown biases that can be diagnosed through their spectra.
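
One rough way to picture spectral imbalance is to compare the eigenvalues of each class's feature covariance even when the classes have the same number of samples. The sketch below does this for 2D toy features, using the closed-form eigenvalues of a symmetric 2x2 matrix. The abstract does not give the paper's exact definition, so this reading, the function name, and the toy data are all assumptions.

```python
import math


def covariance_eigenvalues_2d(points):
    """Eigenvalues (largest first) of the covariance of a list of 2D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Closed form for a symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    radius = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return half_trace + radius, half_trace - radius


# Two classes with the same number of samples but different spectra.
class_a = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
class_b = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
spectrum_a = covariance_eigenvalues_2d(class_a)
spectrum_b = covariance_eigenvalues_2d(class_b)
```

Here the dataset is balanced in sample count, yet class A has a much larger leading eigenvalue than class B.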

Translated: 2024-02-20 19:07:23 Published: 2024-02-18

# Extraction of nonlinearity in neural networks and model compression with Koopman operator

Extraction of nonlinearity in neural networks and model compression with Koopman operator ( http://arxiv.org/abs/2402.11740v1 )

License: see the linked page

Naoki Sugishita, Kayo Kinjo, Jun Ohkubo

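(Reference translation) Nonlinearity plays an important role in deep neural networks. In this paper, we first examine the extent to which the nonlinearity of a neural network is essential. To this end, we use the Koopman operator, extended dynamic mode decomposition, and the tensor-train format. The results indicate that restricted nonlinearity is sufficient for classifying handwritten digits. We then propose a model compression method for deep neural networks that is useful for handling large networks in resource-constrained environments. By exploiting the Koopman operator, the proposed method makes it possible to use linear algebra in the internal processing of a neural network. In highly compressed model settings for a handwritten digit recognition task, the proposed method performs comparably to or better than conventional methods.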

Nonlinearity plays a crucial role in deep neural networks. In this paper, we first investigate the degree to which the nonlinearity of the neural network is essential. For this purpose, we employ the Koopman operator, extended dynamic mode decomposition, and the tensor-train format. The results imply that restricted nonlinearity is enough for the classification of handwritten digits. Then, we propose a model compression method for deep neural networks, which could be beneficial for handling large networks in resource-constrained environments. Leveraging the Koopman operator, the proposed method enables us to use linear algebra in the internal processing of neural networks. We numerically show that the proposed method performs comparably to or better than conventional methods in highly compressed model settings for the handwritten digit recognition task.
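
Extended dynamic mode decomposition (EDMD) approximates the Koopman operator by a matrix acting on a fixed dictionary of observables, fitted by least squares. The sketch below uses a scalar toy map and the two-element dictionary (x, x**2). The map, the dictionary, and the sample points are assumptions chosen so the result is easy to check by hand; no neural network or tensor-train format is involved.

```python
def edmd_two_observables(xs, step):
    """Least-squares Koopman matrix K with psi(step(x)) ~ psi(x) @ K.

    The dictionary is fixed to psi(x) = (x, x**2), with psi treated as a row vector.
    """
    def psi(x):
        return (x, x * x)

    gram = [[0.0, 0.0], [0.0, 0.0]]   # sum of psi(x)^T psi(x)
    cross = [[0.0, 0.0], [0.0, 0.0]]  # sum of psi(x)^T psi(step(x))
    for x in xs:
        now, nxt = psi(x), psi(step(x))
        for i in range(2):
            for j in range(2):
                gram[i][j] += now[i] * now[j]
                cross[i][j] += now[i] * nxt[j]

    # Invert the 2x2 Gram matrix explicitly, then K = gram^{-1} @ cross.
    det = gram[0][0] * gram[1][1] - gram[0][1] * gram[1][0]
    inv = [[gram[1][1] / det, -gram[0][1] / det],
           [-gram[1][0] / det, gram[0][0] / det]]
    return [[sum(inv[i][k] * cross[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Toy dynamics x -> 0.5 * x, so x scales by 0.5 and x**2 scales by 0.25.
koopman = edmd_two_observables([-1.0, -0.5, 0.5, 1.0, 2.0], lambda x: 0.5 * x)
```

For this linear map the dictionary is closed under the dynamics, so the fitted matrix is diagonal with entries 0.5 and 0.25 up to rounding.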

Translated: 2024-02-20 19:07:03 Published: 2024-02-18

# A Transition System Abstraction Framework for Neural Network Dynamical System Models

A Transition System Abstraction Framework for Neural Network Dynamical System Models ( http://arxiv.org/abs/2402.11739v1 )

License: see the linked page

Yejiang Yang, Zihao Mo, Hoang-Dung Tran, and Weiming Xiang

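(Reference translation) This paper proposes a transition system abstraction framework for neural network dynamical system models in order to improve model interpretability, with applications to complex dynamical systems such as human behavior learning and verification. First, the localized working zone is divided into multiple localized partitions using a data-driven Maximum Entropy (ME) partitioning method. Next, the transition matrix is obtained from a set-valued reachability analysis of the neural network. Finally, the proposed abstraction framework is validated through applications to learning and verifying human handwriting dynamics, demonstrating its benefit in improving the interpretability of black-box models. In other words, the framework can abstract a data-driven neural network model into a transition system, making the model interpretable through the verification of specifications written in Computation Tree Logic (CTL).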

This paper proposes a transition system abstraction framework for neural network dynamical system models to enhance model interpretability, with applications to complex dynamical systems such as human behavior learning and verification. First, the localized working zone is segmented into multiple localized partitions using the data-driven Maximum Entropy (ME) partitioning method. Then, the transition matrix is obtained based on set-valued reachability analysis of neural networks. Finally, applications to human handwriting dynamics learning and verification are given to validate the proposed abstraction framework and to demonstrate its advantage in enhancing the interpretability of black-box models. Specifically, the framework abstracts a data-driven neural network model into a transition system, making the model interpretable through the verification of specifications described in Computation Tree Logic (CTL).
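
The partition, reachability, and verification steps can be pictured on a one-dimensional toy system. The sketch below makes several simplifying assumptions: equal-width cells stand in for Maximum Entropy partitioning, a monotone scalar map stands in for the neural network (so the image of an interval is bounded by the images of its endpoints), and a breadth-first search stands in for checking a CTL-style "eventually reachable" property. All names are hypothetical.

```python
from collections import deque


def abstract_transitions(step, lower, upper, n_cells):
    """Boolean transition matrix between equal-width cells of [lower, upper].

    Assumes `step` is monotonically non-decreasing, so the image of a cell
    [a, b] is contained in [step(a), step(b)].
    """
    width = (upper - lower) / n_cells
    cells = [(lower + i * width, lower + (i + 1) * width) for i in range(n_cells)]
    matrix = [[False] * n_cells for _ in range(n_cells)]
    for i, (a, b) in enumerate(cells):
        image_lo, image_hi = step(a), step(b)
        for j, (c, d) in enumerate(cells):
            # Closed-interval overlap keeps the abstraction an over-approximation.
            matrix[i][j] = image_lo <= d and image_hi >= c
    return matrix


def can_eventually_reach(matrix, source, target):
    """Breadth-first search: is `target` reachable from `source` in the abstraction?"""
    seen, queue = {source}, deque([source])
    while queue:
        cell = queue.popleft()
        if cell == target:
            return True
        for nxt, allowed in enumerate(matrix[cell]):
            if allowed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


transitions = abstract_transitions(lambda x: 0.5 * x, 0.0, 4.0, 4)
reaches_origin_cell = can_eventually_reach(transitions, 3, 0)
```

Because the abstraction over-approximates, a "reachable" answer means the property may hold in the concrete system, while an "unreachable" answer is conclusive.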

Translated: 2024-02-20 19:06:51 Published: 2024-02-18

# Bulk and boundary entanglement transitions in the projective gauge-Higgs model

Bulk and boundary entanglement transitions in the projective gauge-Higgs model ( http://arxiv.org/abs/2402.11738v1 )

License: see the linked page

Hiroki Sukeno, Kazuki Ikeda, Tzu-Chieh Wei

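(Reference translation) In quantum many-body spin systems, the interplay between the entangling effect of multi-qubit Pauli measurements and the disentangling effect of single-qubit Pauli measurements can give rise to two competing effects. By introducing a randomized measurement pattern in such bases, a phase transition can be induced by varying their ratio. In this work, we numerically analyze a measurement-based model associated with the $(2+1)$d $\mathbb{Z}_2$ Fradkin-Shenker Hamiltonian model. We determine the phase diagram of the measurement-only model using entanglement measures. For the bulk topological order, we use the topological entanglement entropy. We also use the mutual information between separated boundary regions to diagnose the boundary phase transition associated with the Higgs or bulk SPT phase. We observe a structural similarity between our phase diagram and the one in the standard quantum Hamiltonian formulation of the Fradkin-Shenker model with an open rough boundary. First, the deconfining phase is detected by a nonzero, constant topological entanglement entropy. Second, we find a (boundary) phase transition curve that separates the Higgs=SPT phase from the remaining phases. In certain limits, the topological phase transitions lie at the critical point for the formation of giant homological cycles in the bulk 3d spacetime lattice, and at the bond percolation threshold of the boundary 2d spacetime lattice when it is effectively decoupled from the bulk. In addition, analogous mixed-phase properties appear in a particular region of the phase diagram, arising from how the measurement-based procedure is terminated. These findings open an alternative route to studying the physics of Higgs=SPT phases on quantum devices in the near future.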

In quantum many-body spin systems, the interplay between the entangling effect of multi-qubit Pauli measurements and the disentangling effect of single-qubit Pauli measurements may give rise to two competing effects. By introducing a randomized measurement pattern with such bases, a phase transition can be induced by altering the ratio between them. In this work, we numerically investigate a measurement-based model associated with the $(2+1)$d $\mathbb{Z}_2$ Fradkin-Shenker Hamiltonian model, encompassing the deconfining, confining, and Higgs phases. We determine the phase diagram in our measurement-only model by employing entanglement measures. For the bulk topological order, we use the topological entanglement entropy. We also use the mutual information between separated boundary regions to diagnose the boundary phase transition associated with the Higgs or the bulk SPT phase. We observe a structural similarity between our phase diagram and the one in the standard quantum Hamiltonian formulation of the Fradkin-Shenker model with the open rough boundary. First, a deconfining phase is detected by nonzero and constant topological entanglement entropy. Second, we find a (boundary) phase transition curve separating the Higgs=SPT phase from the rest. In certain limits, the topological phase transitions reside at the critical point of the formation of giant homological cycles in the bulk 3d spacetime lattice, as well as the bond percolation threshold of the boundary 2d spacetime lattice when it is effectively decoupled from the bulk. Additionally, there are analogous mixed-phase properties in a certain region of the phase diagram, emerging from how we terminate the measurement-based procedure. Our findings open an alternative pathway for studying the physics of Higgs=SPT phases on quantum devices in the near future.

Translated: 2024-02-20 19:06:34 Published: 2024-02-18

# Compression Repair for Feedforward Neural Networks Based on Model Equivalence Evaluation

Compression Repair for Feedforward Neural Networks Based on Model Equivalence Evaluation ( http://arxiv.org/abs/2402.11737v1 )

License: see the linked page

Zihao Mo, Yejiang Yang, Shuaizheng Lu, and Weiming Xiang

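(Reference translation) This paper proposes a method for repairing compressed feedforward neural networks (FNNs) based on an equivalence evaluation of two neural networks. Within the repair framework, a new neural network equivalence evaluation method is developed to compute the output discrepancy between the two networks. The output discrepancy quantitatively characterizes the output difference introduced by the compression procedure. Based on the computed output discrepancy, the method first initializes a new training set for the compressed network in order to narrow the discrepancy between the two networks and improve the performance of the compressed network. The compressed FNN is then repaired by retraining on this training set. To demonstrate the effectiveness and advantages of the proposed repair method, we apply it to the MNIST dataset.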

In this paper, we propose a method of repairing compressed Feedforward Neural Networks (FNNs) based on equivalence evaluation of two neural networks. In the repairing framework, a novel neural network equivalence evaluation method is developed to compute the output discrepancy between two neural networks. The output discrepancy can quantitatively characterize the output difference produced by compression procedures. Based on the computed output discrepancy, the repairing method first initializes a new training set for the compressed networks to narrow down the discrepancy between the two neural networks and improve the performance of the compressed network. Then, we repair the compressed FNN by re-training based on the training set. We apply our developed method to the MNIST dataset to demonstrate the effectiveness and advantages of our proposed repair method.
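
The notion of an output discrepancy between an original and a compressed network can be illustrated on a tiny ReLU network. The sketch below estimates the largest output gap by sampling a grid of inputs, which only yields a lower bound on the true maximum; the abstract does not describe how the paper's equivalence evaluation computes it, so this sampling approach is a stand-in. The network shapes, weights, and function names are made up for illustration.

```python
def relu_net(params, x):
    """Scalar input -> one hidden ReLU layer -> scalar output."""
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, w * x + b) for w, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def sampled_discrepancy(net_a, net_b, lower, upper, n_samples=201):
    """Largest |net_a(x) - net_b(x)| over an evenly spaced grid, with its location."""
    worst_gap, worst_x = 0.0, lower
    for i in range(n_samples):
        x = lower + (upper - lower) * i / (n_samples - 1)
        gap = abs(relu_net(net_a, x) - relu_net(net_b, x))
        if gap > worst_gap:
            worst_gap, worst_x = gap, x
    return worst_gap, worst_x


# Hypothetical original network and a "compressed" copy with coarsely rounded weights.
original = (([1.26, -0.74], [0.1, 0.2]), ([0.52, 1.48], 0.05))
compressed = (([1.0, -1.0], [0.0, 0.0]), ([0.5, 1.5], 0.0))

gap, location = sampled_discrepancy(original, compressed, -1.0, 1.0)
self_gap, _ = sampled_discrepancy(original, original, -1.0, 1.0)
```

Inputs where the gap is large are natural candidates for a retraining set aimed at narrowing the discrepancy.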

Translated: 2024-02-20 19:05:56 Published: 2024-02-18

# Monte Carlo with kernel-based Gibbs measures: Guarantees for probabilistic herding

Monte Carlo with kernel-based Gibbs measures: Guarantees for probabilistic herding ( http://arxiv.org/abs/2402.11736v1 )

License: see the linked page

Martin Rouault, Rémi Bardenet, Mylène Maïda

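(Reference translation) Kernel herding belongs to a family of deterministic quadratures that seek to minimize the worst-case integration error over a reproducing kernel Hilbert space (RKHS). Despite strong experimental support, it has been difficult to prove that this worst-case error decreases faster than the standard inverse-square-root rate in the number of quadrature nodes, at least in the usual case where the RKHS is infinite-dimensional. In this paper, we study a joint probability distribution over quadrature nodes whose support tends to minimize the same worst-case error as kernel herding. We prove that it outperforms Monte Carlo in the sense that it comes with a tighter concentration inequality for the worst-case integration error. Although the rate is not yet improved, this shows that the mathematical tools used to study Gibbs measures can help in understanding how far kernel herding and its variants improve on computationally cheaper methods. Furthermore, early experiments suggest that a faster rate of convergence, though not in the worst case, is likely.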

Kernel herding belongs to a family of deterministic quadratures that seek to minimize the worst-case integration error over a reproducing kernel Hilbert space (RKHS). In spite of strong experimental support, it has proven difficult to show that this worst-case error decreases faster than the standard inverse-square-root rate in the number of quadrature nodes, at least in the usual case where the RKHS is infinite-dimensional. In this theoretical paper, we study a joint probability distribution over quadrature nodes, whose support tends to minimize the same worst-case error as kernel herding. We prove that it does outperform i.i.d. Monte Carlo, in the sense of coming with a tighter concentration inequality on the worst-case integration error. While not improving the rate yet, this demonstrates that the mathematical tools of the study of Gibbs measures can help understand to what extent kernel herding and its variants improve on computationally cheaper methods. Moreover, we provide early experimental evidence that a faster rate of convergence, though not worst-case, is likely.
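
For orientation, the deterministic kernel herding baseline can be sketched as a greedy loop: each new node maximizes the target's mean embedding minus the average kernel similarity to the nodes already chosen. The sketch below is not the probabilistic (Gibbs measure) relaxation studied in the paper. The Gaussian kernel, the empirical target, and the candidate grid are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def kernel_herding(target, candidates, n_nodes, kernel=gaussian_kernel):
    """Greedy herding over a finite candidate set for an empirical target."""
    # Mean embedding of the target, evaluated at every candidate.
    embedding = [sum(kernel(c, t) for t in target) / len(target) for c in candidates]
    nodes = []
    for step in range(n_nodes):
        def score(idx):
            repulsion = sum(kernel(candidates[idx], x) for x in nodes) / (step + 1)
            return embedding[idx] - repulsion
        best = max(range(len(candidates)), key=score)
        nodes.append(candidates[best])
    return nodes


def squared_worst_case_error(nodes, target, kernel=gaussian_kernel):
    """Squared worst-case integration error over the RKHS unit ball (squared MMD)."""
    n, m = len(nodes), len(target)
    node_term = sum(kernel(a, b) for a in nodes for b in nodes) / (n * n)
    cross_term = sum(kernel(a, t) for a in nodes for t in target) / (n * m)
    target_term = sum(kernel(s, t) for s in target for t in target) / (m * m)
    return node_term - 2.0 * cross_term + target_term


target_points = [-1.0, 0.0, 1.0]
grid = [-2.0 + 0.5 * i for i in range(9)]
herded = kernel_herding(target_points, grid, n_nodes=3)
error = squared_worst_case_error(herded, target_points)
```

The first node lands where the mean embedding peaks, and later nodes are pushed away from earlier ones by the repulsion term.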

Translated: 2024-02-20 19:05:36 Published: 2024-02-18

PDF registration status (publication date: 20240218)