Fugu-MT: arXiv Paper Translations

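This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, and for papers that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is done with Fugu-Machine Translator.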

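The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).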

Papers with a publication date of 20240206.

Title	Authors	Abstract	Publication date / translation date
# DeepTraderX: Challenging Conventional Trading Strategies with Deep Learning in Multi-Threaded Market Simulations ( http://arxiv.org/abs/2403.18831v1 ) License: see link	Armand Mihai Cismaru,	In this paper, we introduce DeepTraderX (DTX), a simple Deep Learning-based trader, and present results that demonstrate its performance in a multi-threaded market simulation. In a total of about 500 simulated market days, DTX has learned solely by watching the prices that other strategies produce. By doing this, it has successfully created a mapping from market data to quotes, either bid or ask orders, to place for an asset. Trained on historical Level-2 market data, i.e., the Limit Order Book (LOB) for specific tradable assets, DTX processes the market state $S$ at each timestep $T$ to determine a price $P$ for market orders. The market data used in both training and testing was generated from unique market schedules based on real historic stock market data. DTX was tested extensively against the best strategies in the literature, with its results validated by statistical analysis. Our findings underscore DTX's capability to rival, and in many instances, surpass, the performance of public-domain traders, including those that outclass human traders, emphasising the efficiency of simple models, as this is required to succeed in intricate multi-threaded simulations. This highlights the potential of leveraging "black-box" Deep Learning systems to create more efficient financial markets.	Translated: 2024-04-01 02:34:48 Published: 2024-02-06
# Rationale Dataset and Analysis for the Commit Messages of the Linux Kernel Out-of-Memory Killer ( http://arxiv.org/abs/2403.18832v1 ) License: see link	Mouna Dhaouadi, Bentley James Oakes, Michalis Famelis,	Code commit messages can contain useful information on why a developer has made a change. However, the presence and structure of rationale in real-world code commit messages are not well studied. Here, we detail the creation of a labelled dataset to analyze the code commit messages of the Linux Kernel Out-Of-Memory Killer component. We study aspects of rationale information, such as presence, temporal evolution, and structure. We find that 98.9% of commits in our dataset contain sentences with rationale information, and that experienced developers report rationale in about 60% of the sentences in their commits. We report on the challenges we faced and provide examples for our labelling.	Translated: 2024-04-01 02:34:48 Published: 2024-02-06
# QTFlow: Quantitative Timing-Sensitive Information Flow for Security-Aware Hardware Design on RTL ( http://arxiv.org/abs/2401.17819v2 ) License: see link	Lennart M. Reimann, Anshul Prashar, Chiara Ghinami, Rebecca Pelke, Dominik Sisejkovic, Farhad Merchant, Rainer Leupers,	In contemporary Electronic Design Automation (EDA) tools, security often takes a backseat to the primary goals of power, performance, and area optimization. Commonly, the security analysis is conducted by hand, leading to vulnerabilities in the design remaining unnoticed. Security-aware EDA tools assist the designer in the identification and removal of security threats while keeping performance and area in mind. Cutting-edge methods employ information flow analysis to identify inadvertent information leaks in design structures. Current information leakage detection methods use quantitative information flow analysis to quantify the leaks. However, handling sequential circuits poses challenges for state-of-the-art techniques due to their time-agnostic nature, overlooking timing channels, and introducing false positives. To address this, we introduce QTFlow, a timing-sensitive framework for quantifying hardware information leakages during the design phase. Illustrating its effectiveness on open-source benchmarks, QTFlow autonomously identifies timing channels and diminishes all false positives arising from time-agnostic analysis when contrasted with current state-of-the-art techniques.	Translated: 2024-03-25 12:08:11 Published: 2024-02-06
# Privacy risk in GeoData: A survey ( http://arxiv.org/abs/2402.03612v1 ) License: see link	Mahrokh Abdollahi Lorestani, Thilina Ranbaduge, Thierry Rakotoarivelo,	With the ubiquitous use of location-based services, large-scale individual-level location data has been widely collected through location-awareness devices. The exposure of location data constitutes a significant privacy risk to users as it can lead to de-anonymisation, the inference of sensitive information, and even physical threats. Geoprivacy concerns arise on the issues of user identity de-anonymisation and location exposure. In this survey, we analyse different geomasking techniques that have been proposed to protect the privacy of individuals in geodata. We present a taxonomy to characterise these techniques along different dimensions, and conduct a survey of geomasking techniques. We then highlight shortcomings of current techniques and discuss avenues for future research.	Translated: 2024-03-18 07:48:02 Published: 2024-02-06
# Lossy Cryptography from Code-Based Assumptions ( http://arxiv.org/abs/2402.03633v1 ) License: see link	Quang Dao, Aayush Jain,	Over the past few decades, we have seen a proliferation of advanced cryptographic primitives with lossy or homomorphic properties built from various assumptions such as Quadratic Residuosity, Decisional Diffie-Hellman, and Learning with Errors. These primitives imply hard problems in the complexity class $SZK$ (statistical zero-knowledge); as a consequence, they can only be based on assumptions that are broken in $BPP^{SZK}$. This poses a barrier for building advanced primitives from code-based assumptions, as the only known such assumption is Learning Parity with Noise (LPN) with an extremely low noise rate $\frac{\log^2 n}{n}$, which is broken in quasi-polynomial time. In this work, we propose a new code-based assumption: Dense-Sparse LPN, that falls in the complexity class $BPP^{SZK}$ and is conjectured to be secure against subexponential time adversaries. Our assumption is a variant of LPN that is inspired by McEliece's cryptosystem and random $k$-XOR in average-case complexity. We leverage our assumption to build lossy trapdoor functions (Peikert-Waters STOC 08). This gives the first post-quantum alternative to the lattice-based construction in the original paper. Lossy trapdoor functions, being a fundamental cryptographic tool, are known to enable a broad spectrum of both lossy and non-lossy cryptographic primitives; our construction thus implies these primitives in a generic manner. In particular, we achieve collision-resistant hash functions with plausible subexponential security, improving over a prior construction from LPN with noise rate $\frac{\log^2 n}{n}$ that is only quasi-polynomially secure.	Translated: 2024-03-18 07:48:02 Published: 2024-02-06
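For readers unfamiliar with the Learning Parity with Noise problem named in the abstract above, the following is a minimal Python sketch of how standard LPN samples are generated. It illustrates plain LPN only, not the Dense-Sparse variant proposed by the authors. The function names `noise_rate` and `lpn_samples`, the base-2 logarithm, and all parameter choices are assumptions made for illustration.

```python
import math
import random

def noise_rate(n):
    """Low-noise regime log^2(n)/n from the abstract (base-2 log is an assumption)."""
    return math.log2(n) ** 2 / n

def lpn_samples(secret, m, eta, rng):
    """Return m samples (a, <a, s> + e mod 2), where e is Bernoulli(eta) noise.

    secret: list of 0/1 bits (the hidden vector s)
    eta:    probability that a label is flipped
    rng:    a random.Random instance, so runs are reproducible
    """
    n = len(secret)
    samples = []
    for _ in range(m):
        a = [rng.randrange(2) for _ in range(n)]           # uniform public vector
        inner = sum(x * y for x, y in zip(a, secret)) % 2  # inner product over GF(2)
        e = 1 if rng.random() < eta else 0                 # noise bit
        samples.append((a, inner ^ e))
    return samples
```

With `eta = 0` the secret can be recovered by Gaussian elimination; hardness comes entirely from the noise, which is why the noise rate appears in the security claims.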
# WhisperFuzz: White-Box Fuzzing for Detecting and Locating Timing Vulnerabilities in Processors ( http://arxiv.org/abs/2402.03704v1 ) License: see link	Pallavi Borkar, Chen Chen, Mohamadreza Rostami, Nikhilesh Singh, Rahul Kande, Ahmad-Reza Sadeghi, Chester Rebeiro, Jeyavijayan Rajendran,	Timing vulnerabilities in processors have emerged as a potent threat. As processors are the foundation of any computing system, identifying these flaws is imperative. Recently, fuzzing techniques, traditionally used for detecting software vulnerabilities, have shown promising results for uncovering vulnerabilities in large-scale hardware designs, such as processors. Researchers have adapted black-box or grey-box fuzzing to detect timing vulnerabilities in processors. However, they cannot identify the locations or root causes of these timing vulnerabilities, nor do they provide coverage feedback to enable the designer's confidence in the processor's security. To address the deficiencies of the existing fuzzers, we present WhisperFuzz, the first white-box fuzzer with static analysis, which aims to detect and locate timing vulnerabilities in processors and evaluate the coverage of microarchitectural timing behaviors. WhisperFuzz uses the fundamental nature of processors' timing behaviors, microarchitectural state transitions, to localize timing vulnerabilities. WhisperFuzz automatically extracts microarchitectural state transitions from a processor design at the register-transfer level (RTL) and instruments the design to monitor the state transitions as coverage. Moreover, WhisperFuzz measures the time a design-under-test (DUT) takes to process tests, identifying any minor, abnormal variations that may hint at a timing vulnerability. WhisperFuzz detects 12 new timing vulnerabilities across advanced open-sourced RISC-V processors: BOOM, Rocket Core, and CVA6. Eight of these violate the zero latency requirements of the Zkt extension and are considered serious security vulnerabilities. Moreover, WhisperFuzz also pinpoints the locations of the new and the existing vulnerabilities.	Translated: 2024-03-18 07:38:15 Published: 2024-02-06
# Enhanced Security and Efficiency in Blockchain with Aggregated Zero-Knowledge Proof Mechanisms ( http://arxiv.org/abs/2402.03834v1 ) License: see link	Oleksandr Kuznetsov, Alex Rusnak, Anton Yezhov, Dzianis Kanonik, Kateryna Kuznetsova, Stanislav Karashchuk,	Blockchain technology has emerged as a revolutionary tool in ensuring data integrity and security in digital transactions. However, the current approaches to data verification in blockchain systems, particularly in Ethereum, face challenges in terms of efficiency and computational overhead. The traditional use of Merkle Trees and cryptographic hash functions, while effective, leads to significant resource consumption, especially for large datasets. This highlights a gap in existing research: the need for more efficient methods of data verification in blockchain networks. Our study addresses this gap by proposing an innovative aggregation scheme for Zero-Knowledge Proofs within the structure of Merkle Trees. We develop a system that significantly reduces the size of the proof and the computational resources needed for its generation and verification. Our approach represents a paradigm shift in blockchain data verification, balancing security with efficiency. We conducted extensive experimental evaluations using real Ethereum block data to validate the effectiveness of our proposed scheme. The results demonstrate a drastic reduction in proof size and computational requirements compared to traditional methods, making the verification process more efficient and economically viable. Our contribution fills a critical research void, offering a scalable and secure solution for blockchain data verification. The implications of our work are far-reaching, enhancing the overall performance and adaptability of blockchain technology in various applications, from financial transactions to supply chain management.	Translated: 2024-03-18 07:38:15 Published: 2024-02-06
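To make the "traditional" Merkle-tree verification mentioned in the abstract above concrete, here is a minimal Python sketch of a binary Merkle tree with an inclusion proof. It is a baseline illustration only and does not implement the authors' aggregated zero-knowledge scheme. SHA-256, the duplicate-last-node rule for odd levels, and the helper names `merkle_root_and_proof` and `verify` are assumptions chosen for simplicity; Ethereum itself uses Keccak-256 and a different trie structure.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index]; leaves must be non-empty.

    proof is a list of (sibling_hash, sibling_is_left) pairs, ordered
    from the leaf level up to the level just below the root.
    """
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # duplicate last node on odd levels
        sibling = index ^ 1                  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof and compare."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The proof holds one hash per tree level, so its size grows logarithmically with the number of leaves. That per-item cost is the kind of overhead the abstract describes as significant for large datasets.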
# LIPSTICK: Corruptibility-Aware and Explainable Graph Neural Network-based Oracle-Less Attack on Logic Locking ( http://arxiv.org/abs/2402.04235v1 ) License: see link	Yeganeh Aghamohammadi, Amin Rezaei,	In a zero-trust fabless paradigm, designers are increasingly concerned about hardware-based attacks on the semiconductor supply chain. Logic locking is a design-for-trust method that adds extra key-controlled gates in the circuits to prevent hardware intellectual property theft and overproduction. While attackers have traditionally relied on an oracle to attack logic-locked circuits, machine learning attacks have shown the ability to retrieve the secret key even without access to an oracle. In this paper, we first examine the limitations of state-of-the-art machine learning attacks and argue that the use of key hamming distance as the sole model-guiding structural metric is not always useful. Then, we develop, train, and test a corruptibility-aware graph neural network-based oracle-less attack on logic locking that takes into consideration both the structure and the behavior of the circuits. Our model is explainable in the sense that we analyze what the machine learning model has interpreted in the training process and how it can perform a successful attack. Chip designers may find this information beneficial in securing their designs while avoiding incremental fixes.	Translated: 2024-03-18 07:38:15 Published: 2024-02-06
# Merkle Trees in Blockchain: A Study of Collision Probability and Security Implications ( http://arxiv.org/abs/2402.04367v1 ) License: see link	Oleksandr Kuznetsov, Alex Rusnak, Anton Yezhov, Kateryna Kuznetsova, Dzianis Kanonik, Oleksandr Domin,	In the rapidly evolving landscape of blockchain technology, ensuring the integrity and security of data is paramount. This study delves into the security aspects of Merkle Trees, a fundamental component in blockchain architectures, such as Ethereum. We critically examine the susceptibility of Merkle Trees to hash collisions, a potential vulnerability that poses significant risks to data security within blockchain systems. Despite their widespread application, the collision resistance of Merkle Trees and their robustness against preimage attacks have not been thoroughly investigated, leading to a notable gap in the comprehensive understanding of blockchain security mechanisms. Our research endeavors to bridge this gap through a meticulous blend of theoretical analysis and empirical validation. We scrutinize the probability of root collisions in Merkle Trees, considering various factors such as hash length and path length within the tree. Our findings reveal a direct correlation between the increase in path length and the heightened probability of root collisions, thereby underscoring potential security vulnerabilities. Conversely, we observe that an increase in hash length significantly reduces the likelihood of collisions, highlighting its critical role in fortifying security. The insights garnered from our research offer valuable guidance for blockchain developers and researchers, aiming to bolster the security and operational efficacy of blockchain-based systems.	Translated: 2024-03-18 07:38:15 Published: 2024-02-06
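The qualitative trend reported in the abstract above (collision probability rises with path length and falls with hash length) can be reproduced with a simple back-of-the-envelope model. The Python sketch below is not the authors' analysis. It assumes, purely for illustration, that each hashing step along a path collides independently with probability $2^{-b}$ for a $b$-bit hash, so that a path of $d$ steps sees at least one collision with probability $1 - (1 - 2^{-b})^d \approx d \cdot 2^{-b}$. The function name `path_collision_probability` is hypothetical.

```python
import math

def path_collision_probability(hash_bits, path_length):
    """P(at least one collision along a path) under an independence assumption.

    hash_bits:   output length b of the hash in bits (b >= 1)
    path_length: number d of hashing steps from leaf to root
    """
    p = 2.0 ** -hash_bits
    # 1 - (1 - p)^d, computed via log1p/expm1 so tiny p does not round to zero
    return -math.expm1(path_length * math.log1p(-p))
```

Under this toy model, doubling the path length roughly doubles the probability, while each extra hash bit halves it, which is consistent with the direction of both findings in the abstract.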
# Unveiling the influence of behavioural, built environment and socio-economic features on the spatial and temporal variability of bus use using explainable machine learning ( http://arxiv.org/abs/2403.05545v1 ) License: see link	Sui Tao, Francisco Rowe, Hongyu Shan,	Understanding the variability of people's travel patterns is key to transport planning and policy-making. However, to what extent daily transit use displays geographic and temporal variabilities, and what the contributing factors are, have not been fully addressed. Drawing on smart card data in Beijing, China, this study seeks to address these deficits by adopting new indices to capture the spatial and temporal variability of bus use during peak hours and investigate their associations with relevant contextual features. Using explainable machine learning, our findings reveal non-linear interaction between spatial and temporal variability and trip frequency. Furthermore, greater distance to the urban centres (>10 kilometres) is associated with increased spatial variability of bus use, while greater separation of trip origins and destinations from the subcentres reduces both spatial and temporal variability. Higher availability of bus routes is linked to higher spatial variability but lower temporal variability. Meanwhile, both lower and higher road density is associated with higher spatial variability of bus use especially in morning times. These findings indicate that different built environment features moderate the flexibility of travel time and locations. Implications are derived to inform more responsive and reliable operation and planning of transit systems.	Translated: 2024-03-18 06:19:57 Published: 2024-02-06
# Unified Occupancy on a Public Transport Network through Combination of AFC and APC Data ( http://arxiv.org/abs/2403.05546v1 ) License: see link	Amir Dib, Noëlie Cherrier, Martin Graive, Baptiste Rérolle, Eglantine Schmitt,	In a transport network, the onboard occupancy is key for gaining insights into travelers' habits and adjusting the offer. Traditionally, operators have relied on field studies to evaluate ridership of a typical workday. However, automated fare collection (AFC) and automatic passenger counting (APC) data, which provide complete temporal coverage, are often available but underexploited. It should be noted, however, that each data source comes with its own biases: AFC data may not account for fraud, while not all vehicles are equipped with APC systems. This paper introduces the unified occupancy method, a geostatistical model to extrapolate occupancy to every course of a public transportation network by combining AFC and APC data with partial coverage. Unified occupancy completes missing APC information for courses on lines where other courses have APC measures, as well as for courses on lines where no APC data is available at all. The accuracy of this method is evaluated on real data from several public transportation networks in France.	Translated: 2024-03-18 06:19:57 Published: 2024-02-06
# AI for non-programmers: Applied AI in the lectures for students without programming skills ( http://arxiv.org/abs/2403.05547v1 ) License: see link	Julius Schöning, Tim Wawer, Kai-Michael Griese,	Applications such as ChatGPT and WOMBO Dream make it easy to inspire students without programming knowledge to use artificial intelligence (AI). Therefore, given the increasing importance of AI in all disciplines, innovative strategies are needed to educate students in AI without programming knowledge so that AI can be integrated into their study modules as a future skill. This work presents a didactic planning script for applied AI. The didactic planning script is based on the AI application pipeline and links AI concepts with study-relevant topics. These linkages open up a new solution space and promote students' interest in and understanding of the potentials and risks of AI. An example lecture series for master students in energy management shows how AI can be seamlessly integrated into discipline-specific lectures. To this end, the planning script for applied AI is adapted to fit the study programs' topic. This specific teaching scenario enables students to solve a discipline-specific task step by step using the AI application pipeline. Thus, the application of the didactic planning script for applied AI shows the practical implementation of the theoretical concepts of AI. In addition, a checklist is presented that can be used to assess whether AI can be used in the discipline-specific lecture. AI as a future skill must be learned by students based on use cases that are relevant to the course of studies. For this reason, AI education should fit seamlessly into various curricula, even if the students do not have a programming background due to their field of study.	Translated: 2024-03-18 06:19:57 Published: 2024-02-06
# Monitoring the evolution of antisemitic discourse on extremist social media using BERT ( http://arxiv.org/abs/2403.05548v1 ) License: see link	Raza Ul Mustafa, Nathalie Japkowicz,	Racism and intolerance on social media contribute to a toxic online environment which may spill offline to foster hatred, and eventually lead to physical violence. That is the case with online antisemitism, the specific category of hatred considered in this study. Tracking antisemitic themes and their associated terminology over time in online discussions could help monitor the sentiments of their participants and their evolution, and possibly offer avenues for intervention that may prevent the escalation of hatred. Due to the large volume and constant evolution of online traffic, monitoring conversations manually is impractical. Instead, we propose an automated method that extracts antisemitic themes and terminology from extremist social media over time and captures their evolution. Since supervised learning would be too limited for such a task, we created an unsupervised online machine learning approach that uses large language models to assess the contextual similarity of posts. The method clusters similar posts together, dividing, and creating additional clusters over time when sub-themes emerge from existing ones or new themes appear. The antisemitic terminology used within each theme is extracted from the posts in each cluster. Our experiments show that our methodology outperforms existing baselines and demonstrates the kind of themes and sub-themes it discovers within antisemitic discourse along with their associated terminology. We believe that our approach will be useful for monitoring the evolution of all kinds of hatred beyond antisemitism on social platforms.	Translated: 2024-03-18 06:19:57 Published: 2024-02-06
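The general idea of clustering posts online by contextual similarity, as outlined in the abstract above, can be sketched in a few lines of Python. This is a generic threshold-based scheme, not the authors' method: it only opens new clusters and does not split existing ones into sub-themes. The embeddings are assumed to be precomputed (for example by a BERT-style encoder), and the function name `assign_online` and the `threshold` value are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign_online(embeddings, threshold=0.8):
    """Assign each embedding, in arrival order, to the most similar cluster.

    A post joins the cluster whose centroid has the highest cosine
    similarity, provided that similarity reaches `threshold`; otherwise
    it starts a new cluster. Returns one cluster label per post.
    """
    centroids, counts, labels = [], [], []
    for vec in embeddings:
        best, best_sim = None, threshold
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:                      # no cluster is close enough
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:                                 # update the running-mean centroid
            counts[best] += 1
            n = counts[best]
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], vec)]
            labels.append(best)
    return labels
```

Because posts are processed once in arrival order, the number of clusters can grow over time as new themes appear, which is the property the abstract relies on for tracking evolving discourse.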
# Forward Direct Feedback Alignment for Online Gradient Estimates of Spiking Neural Networks ( http://arxiv.org/abs/2403.08804v1 ) License: see link	Florian Bacho, Dominique Chu,	There is an interest in finding energy-efficient alternatives to current state-of-the-art neural network training algorithms. Spiking neural networks are a promising approach, because they can be simulated energy-efficiently on neuromorphic hardware platforms. However, these platforms come with limitations on the design of the training algorithm. Most importantly, backpropagation cannot be implemented on those. We propose a novel neuromorphic algorithm, the Spiking Forward Direct Feedback Alignment (SFDFA) algorithm, an adaptation of Forward Direct Feedback Alignment to train SNNs. SFDFA estimates the weights between output and hidden neurons as feedback connections. The main contribution of this paper is to describe how exact local gradients of spikes can be computed in an online manner while taking into account the intra-neuron dependencies between post-synaptic spikes and derive a dynamical system for neuromorphic hardware compatibility. We compare the SFDFA algorithm with a number of competitor algorithms and show that the proposed algorithm achieves higher performance and convergence rates.	Translated: 2024-03-18 05:40:54 Published: 2024-02-06
# Adversarially Robust Deepfake Detection via Adversarial Feature Similarity Learning ( http://arxiv.org/abs/2403.08806v1 ) License: see link	Sarwar Khan,	Deepfake technology has raised concerns about the authenticity of digital content, necessitating the development of effective detection methods. However, the widespread availability of deepfakes has given rise to a new challenge in the form of adversarial attacks. Adversaries can manipulate deepfake videos with small, imperceptible perturbations that can deceive the detection models into producing incorrect outputs. To tackle this critical issue, we introduce Adversarial Feature Similarity Learning (AFSL), which integrates three fundamental deep feature learning paradigms. By optimizing the similarity between samples and weight vectors, our approach aims to distinguish between real and fake instances. Additionally, we aim to maximize the similarity between both adversarially perturbed examples and unperturbed examples, regardless of their real or fake nature. Moreover, we introduce a regularization technique that maximizes the dissimilarity between real and fake samples, ensuring a clear separation between these two categories. With extensive experiments on popular deepfake datasets, including FaceForensics++, FaceShifter, and DeeperForensics, the proposed method outperforms other standard adversarial training-based defense methods significantly. This further demonstrates the effectiveness of our approach to protecting deepfake detectors from adversarial attacks.	Translated: 2024-03-18 05:40:54 Published: 2024-02-06
# 多目的組合せ最適化問題に対する実効時アルゴリズム Effective anytime algorithm for multiobjective combinatorial optimization problems ( http://arxiv.org/abs/2403.08807v1 ) ライセンス: Link先を確認	Miguel Ángel Domínguez-Ríos, Francisco Chicano, Enrique Alba,	(参考訳) 多目的最適化において、最適化アルゴリズムの結果は、意思決定者が選択した効率的な解の集合である。すべての効率的な解を短時間で計算できる訳ではなく、探索アルゴリズムを早めに停止させ、これまでに見いだされた解を解析することが一般的である。客観的な空間で十分に普及している効率的なソリューションのセットは、意思決定者に対して様々なソリューションを提供するのに好まれる。しかし、文学におけるいくつかの正確なアルゴリズムは、いつでも、そのようなよく普及した一連のソリューションを提供する能力をもって存在する:我々は、いつでもそれらをアルゴリズムと呼ぶ。そこで我々は,3つの新しいアイデアを組み合わせた多目的組合せ最適化のための新しい正確な随時アルゴリズムを提案する。提案アルゴリズムは, 既知ベンチマークの480インスタンスと, 総合非支配ベクトル生成率, ハイパーボリューム, 一般スプレッド, 加算エプシロンインジケータの4つの異なる性能測定値を用いて, 任意の多目的組合せ最適化のための最先端のアルゴリズムと比較した。総合的な実験的研究により、我々の提案は、ほとんどの事例において、以前のアルゴリズムよりも優れていたことが判明した。 In multiobjective optimization, the result of an optimization algorithm is a set of efficient solutions from which the decision maker selects one. It is common that not all the efficient solutions can be computed in a short time and the search algorithm has to be stopped prematurely to analyze the solutions found so far. A set of efficient solutions that are well-spread in the objective space is preferred to provide the decision maker with a great variety of solutions. However, just a few exact algorithms in the literature exist with the ability to provide such a well-spread set of solutions at any moment: we call them anytime algorithms. We propose a new exact anytime algorithm for multiobjective combinatorial optimization combining three novel ideas to enhance the anytime behavior. We compare the proposed algorithm with those in the state-of-the-art for anytime multiobjective combinatorial optimization using a set of 480 instances from different well-known benchmarks and four different performance measures: the overall non-dominated vector generation ratio, the hypervolume, the general spread and the additive epsilon indicator. 
A comprehensive experimental study reveals that our proposal outperforms the previous algorithms in most of the instances.	翻訳日:2024-03-18 05:40:54 公開日:2024-02-06
# 耐異常性を有する長距離水中航行のためのバイオン型データ駆動アプローチ A Bionic Data-driven Approach for Long-distance Underwater Navigation with Anomaly Resistance ( http://arxiv.org/abs/2403.08808v1 ) ライセンス: Link先を確認	Songnan Yang, Xiaohui Zhang, Shiliang Zhang, Xuehui Ma, Wenqi Bai, Yushuai Li, Tingwen Huang,	(参考訳) 様々な動物が環境の手がかりを使って正確なナビゲーションをしている。地球の磁場は長距離動物相の移動において信頼できる情報源であることが証明されている。動物航法にインスパイアされたこの研究は、長距離水中航法のためのバイオニックでデータ駆動のアプローチを提案する。提案手法では,GPSシステムや地理地図を必要とせず,測地データを用いてナビゲーションを行う。特に,時間的注意に基づくLong Short-Term Memory(TA-LSTM)ネットワークを構築し,ナビゲーション中の方向角を予測する。地磁気異常の影響を緩和するため,最大線量推定に基づく異常の検出・定量化機構を開発した。開発機構をTA-LSTMと統合し、予測方向角を校正し、地磁気異常に対する耐性を得る。 WMMモデルから取得したデータを用いて,多様なナビゲーション条件を用いた数値シミュレーションを行い,本手法を検証した。シミュレーションの結果,地磁気異常に対するレジリエンスナビゲーションと,単一および複数目的地における水中ナビゲーションの精度と安定性が示された。 Various animals exhibit accurate navigation using environment cues. The Earth's magnetic field has been proved a reliable information source in long-distance fauna migration. Inspired by animal navigation, this work proposes a bionic and data-driven approach for long-distance underwater navigation. The proposed approach uses measured geomagnetic data for the navigation, and requires no GPS systems or geographical maps. Particularly, we construct and train a Temporal Attention-based Long Short-Term Memory (TA-LSTM) network to predict the heading angle during the navigation. To mitigate the impact of geomagnetic anomalies, we develop the mechanism to detect and quantify the anomalies based on Maximum Likelihood Estimation. We integrate the developed mechanism with the TA-LSTM, and calibrate the predicted heading angles to gain resistance against geomagnetic anomalies. Using the retrieved data from the WMM model, we conduct numerical simulations with diversified navigation conditions to test our approach. The simulation results demonstrate a resilience navigation against geomagnetic anomalies by our approach, along with precision and stability of the underwater navigation in single and multiple destination missions.	
翻訳日:2024-03-18 05:40:54 公開日:2024-02-06
# webotsによるコンテナ化(深層)強化学習のためのアーキテクチャ An Architecture for Unattended Containerized (Deep) Reinforcement Learning with Webots ( http://arxiv.org/abs/2403.00765v1 ) ライセンス: Link先を確認	Tobias Haubold, Petra Linke	(参考訳) データサイエンスアプリケーションが業界で採用されるにつれて、ツールの世界は成熟し、そのようなアプリケーションのライフサイクルを促進し、関係者の生産性向上に関わる課題の解決策を提供する。 3d世界のエージェントによる強化学習は、まだ課題に直面している可能性がある。シミュレーションソフトウェアを使用するために必要な知識と、無人のトレーニングパイプラインでのスタンドアロンシミュレーションソフトウェアの利用。本稿では,ロボットロボットに関して,ロボットの強化学習エージェントを訓練するためのツールとアプローチをレビューし,仮想世界の創造者のためのシミュレーション環境とデータサイエンティストのためのモデル開発環境の分離は,あまり話題になっていないことを論じる。どちらも同じで、データサイエンティストはapiを直接扱うためにシミュレーションソフトウェアに関する知識を必要とします。さらに、仮想世界やデータサイエンティストの作者が同じファイルで作業することもある。私たちは、データサイエンティストがシミュレーションソフトウェアに関する知識を必要としないアプローチを説明することで、このトピックに貢献したいと考えています。本手法では,シミュレーションソフトウェアであるwebots,ロボットオペレーティングシステムを用いてシミュレーションロボットと通信し,シミュレーションソフトウェア自体とコンテナ技術を用いてシミュレーションをモデル開発環境から分離する。私たちは、データサイエンティストが扱うAPIと、意図しないトレーニングパイプラインでスタンドアロンのシミュレーションソフトウェアを使用することを強調しました。ロボットノに特有の部分と学習すべきロボットタスクを示す。 As data science applications gain adoption across industries, the tooling landscape matures to facilitate the life cycle of such applications and provide solutions to the challenges involved to boost the productivity of the people involved. Reinforcement learning with agents in a 3D world could still face challenges: the knowledge required to use a simulation software as well as the utilization of a standalone simulation software in unattended training pipelines. In this paper we review tools and approaches to train reinforcement learning agents for robots in 3D worlds with respect to the robot Robotino and argue that the separation of the simulation environment for creators of virtual worlds and the model development environment for data scientists is not a well covered topic. Often both are the same and data scientists require knowledge of the simulation software to work directly with their APIs. Moreover, sometimes creators of virtual worlds and data scientists even work on the same files. 
We want to contribute to that topic by describing an approach where data scientists don't require knowledge about the simulation software. Our approach uses the standalone simulation software Webots, the Robot Operating System to communicate with simulated robots as well as the simulation software itself and container technology to separate the simulation from the model development environment. We put emphasis on the APIs the data scientists work with and the use of a standalone simulation software in unattended training pipelines. We show the parts that are specific to the Robotino and the robot task to learn.	翻訳日:2024-03-11 00:19:35 公開日:2024-02-06
# 拡張クエリによる言語生成のための検索プロセスの強化 Enhancing Retrieval Processes for Language Generation with Augmented Queries ( http://arxiv.org/abs/2402.16874v1 ) ライセンス: Link先を確認	Julien Pierre Edmond Ghali, Kosuke Shima, Koichi Moriyama, Atsuko Mutoh, Nobuhiro Inuzuka	(参考訳) スマートテクノロジーの急速な変化の中で、高度な言語モデルの台頭により、文書の検索がますます困難になっている。これらのモデルは、しばしば「幻覚」として知られる不正確な情報を提供するような困難に直面している。本研究は,実事実に基づく正確な応答をモデルに誘導するRAG(Retrieval-Augmented Generation)を通じてこの問題に対処することに焦点を当てる。スケーラビリティの問題を克服するために、この研究は、革新的なクエリ最適化プロセスを使用して、BERTやOrca2といった高度な言語モデルとユーザクエリを接続することを検討している。この研究は、3つのシナリオに展開されている。まずはRAGなしで、次に追加の助けなしで、最後に追加の助けありで。コンパクトだが効率的なOrca2 7Bモデルを選択することは、コンピューティングリソースのスマートな利用を実証する。実験結果から,RAGによる初期言語モデルの性能向上,特にプロンプト強化時の性能向上が示唆された。異なるエンコーディング間の文書検索の一貫性は、言語モデル生成クエリの使用の有効性を強調する。 UMAP for BERTの導入により、強力な結果を維持しながら文書検索がさらに簡単になる。 In the rapidly changing world of smart technology, searching for documents has become more challenging due to the rise of advanced language models. These models sometimes face difficulties, like providing inaccurate information, commonly known as "hallucination." This research focuses on addressing this issue through Retrieval-Augmented Generation (RAG), a technique that guides models to give accurate responses based on real facts. To overcome scalability issues, the study explores connecting user queries with sophisticated language models such as BERT and Orca2, using an innovative query optimization process. The study unfolds in three scenarios: first, without RAG, second, without additional assistance, and finally, with extra help. Choosing the compact yet efficient Orca2 7B model demonstrates a smart use of computing resources. The empirical results indicate a significant improvement in the initial language model's performance under RAG, particularly when assisted with prompt augmenters. Consistency in document retrieval across different encodings highlights the effectiveness of using language model-generated queries. 
The introduction of UMAP for BERT further simplifies document retrieval while maintaining strong results.	翻訳日:2024-03-03 19:20:57 公開日:2024-02-06
# Mind the Gap: ピアグループからのセキュリティ逸脱に基づいたセキュアなサイバーリスクモデリング Mind the Gap: Securely modeling cyber risk based on security deviations from a peer group ( http://arxiv.org/abs/2402.04166v1 ) ライセンス: Link先を確認	Taylor Reynolds, Sarah Scheffler, Daniel J. Weitzner, Angelina Wu	(参考訳) 組織が主に答えられなかったサイバーリスクについて、戦略的かつ長年にわたって疑問が2つある。両方の回答には、セキュリティ姿勢、インシデント、損失に関する業界全体のデータが必要である。現在、暗号コンピューティングのようなプライバシー強化技術(pets)は、機密性の高い入力データを非公開にしながら、組織のピアグループによるサイバーリスクメトリクスの安全な計算を可能にする。これらの新しい集計データが利用可能になると、アナリストはそれらをサイバーリスクモデルに統合し、より信頼できるリスクアセスメントを生成し、ピアグループと比較できるようにする方法が必要となる。本稿では,セキュアな計算から生じる新しい変数を用いて,ピアに対するサイバー姿勢のベンチマークを行い,特定の経済セクターにおけるサイバーリスクを推定する枠組みを提案する。本稿では,組織とその仲間間の重み付けされたセキュリティギャップを表す,defid gap indexと呼ばれる新たなトップライン変数を導入し,過去の産業データに基づいて組織のセキュリティリスクを予測する。我々は,25の大企業から収集したデータを用いて特定の分野に適用し,業界ISAOと共同で業界リスクモデルを構築し,参加者に自身のリスク露出を推定するためのツールを提供し,セキュリティ姿勢を仲間とプライベートに比較する。 There are two strategic and longstanding questions about cyber risk that organizations largely have been unable to answer: What is an organization's estimated risk exposure and how does its security compare with peers? Answering both requires industry-wide data on security posture, incidents, and losses that, until recently, have been too sensitive for organizations to share. Now, privacy enhancing technologies (PETs) such as cryptographic computing can enable the secure computation of aggregate cyber risk metrics from a peer group of organizations while leaving sensitive input data undisclosed. As these new aggregate data become available, analysts need ways to integrate them into cyber risk models that can produce more reliable risk assessments and allow comparison to a peer group. This paper proposes a new framework for benchmarking cyber posture against peers and estimating cyber risk within specific economic sectors using the new variables emerging from secure computations. 
We introduce a new top-line variable called the Defense Gap Index representing the weighted security gap between an organization and its peers that can be used to forecast an organization's own security risk based on historical industry data. We apply this approach in a specific sector using data collected from 25 large firms, in partnership with an industry ISAO, to build an industry risk model and provide tools back to participants to estimate their own risk exposure and privately compare their security posture with their peers.	翻訳日:2024-02-18 14:32:23 公開日:2024-02-06
# 人間との討論における大規模言語モデルの限界 Limits of Large Language Models in Debating Humans ( http://arxiv.org/abs/2402.06049v1 ) ライセンス: Link先を確認	James Flamino, Mohammed Shahid Modi, Boleslaw K. Szymanski, Brendan Cross, Colton Mikolajczyk	(参考訳) 大規模言語モデル(LLM)は、人間と巧みに対話する能力に顕著な期待を示してきた。その後、会話に関わる社会学的実験で人工的な南軍やサロゲートとしての使用の可能性は、エキサイティングな見通しである。しかし、このアイデアはどの程度有効か? 本論文は,LLMエージェントを現実人と組み合わせた事前登録研究により,現在のLLMの限界を検証しようとする試みである。この研究は、人間のみ、エージェントと人間、エージェントのみの3つの環境における議論に基づく意見合意形成に焦点を当てている。私たちのゴールは、LLMエージェントが人間にどのように影響するか、そして人間のように議論する能力について理解することです。 LLMは人間に溶け込み、人間の生産性を促進できるが、議論では説得力に欠けており、最終的には人間の行動から逸脱する。我々は、これらの主要な失敗を解明し、LLMが議論者になる前にさらに進化する必要があることを期待する。 Large Language Models (LLMs) have shown remarkable promise in their ability to interact proficiently with humans. Subsequently, their potential use as artificial confederates and surrogates in sociological experiments involving conversation is an exciting prospect. But how viable is this idea? This paper endeavors to test the limits of current-day LLMs with a pre-registered study integrating real people with LLM agents acting as people. The study focuses on debate-based opinion consensus formation in three environments: humans only, agents and humans, and agents only. Our goal is to understand how LLM agents influence humans, and how capable they are in debating like humans. We find that LLMs can blend in and facilitate human productivity but are less convincing in debate, with their behavior ultimately deviating from that of humans. We elucidate these primary failings and anticipate that LLMs must evolve further before being viable debaters.	翻訳日:2024-02-18 14:07:29 公開日:2024-02-06
# かなり進歩していますか? 不均衡回帰から見た化学反応収率予測の再検討 Are we making much progress? Revisiting chemical reaction yield prediction from an imbalanced regression perspective ( http://arxiv.org/abs/2402.05971v1 ) ライセンス: Link先を確認	Yihong Ma, Xiaobao Huang, Bozhao Nan, Nuno Moniz, Xiangliang Zhang, Olaf Wiest and Nitesh V. Chawla	(参考訳) 化学反応の収率は、化学反応中に消費される反応物に関連して形成されるターゲット生成物の割合を定量する。正確な収率予測は、合成計画中に高yield反応を選択するための化学者のガイドとなり、ウェットラボの実験に時間と資源を割く前に貴重な洞察を提供する。近年の歩留まり予測の進歩は収量範囲全体の全体的な性能改善に繋がったが、化学者にとって大きな関心事である高yield反応の予測の強化には未解決の課題が残っている。本稿では, 高収率予測における性能差は, 低収率反応に歪んだ実世界のデータの不均衡分布に起因すると論じる。このデータ不均衡にもかかわらず、既存の収量予測法は、バランスの取れたトレーニング分布を仮定して、異なる収量範囲を等しく扱い続けている。 3つの実世界の収量予測データセットに関する広範囲な実験を通じて,不均衡回帰問題としての反応収量予測の再フレームの必要性を強調する。最後に,簡易なコストセンシティブな再重み付け手法の導入により,低表示高yield領域における収率予測モデルの性能が著しく向上することを示す。 The yield of a chemical reaction quantifies the percentage of the target product formed in relation to the reactants consumed during the chemical reaction. Accurate yield prediction can guide chemists toward selecting high-yield reactions during synthesis planning, offering valuable insights before dedicating time and resources to wet lab experiments. While recent advancements in yield prediction have led to overall performance improvement across the entire yield range, an open challenge remains in enhancing predictions for high-yield reactions, which are of greater concern to chemists. In this paper, we argue that the performance gap in high-yield predictions results from the imbalanced distribution of real-world data skewed towards low-yield reactions, often due to unreacted starting materials and inherent ambiguities in the reaction processes. Despite this data imbalance, existing yield prediction methods continue to treat different yield ranges equally, assuming a balanced training distribution. Through extensive experiments on three real-world yield prediction datasets, we emphasize the urgent need to reframe reaction yield prediction as an imbalanced regression problem. 
Finally, we demonstrate that incorporating simple cost-sensitive re-weighting methods can significantly enhance the performance of yield prediction models on underrepresented high-yield regions.	翻訳日:2024-02-18 14:07:00 公開日:2024-02-06
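The entry above mentions simple cost-sensitive re-weighting for imbalanced yield regression without naming a specific scheme. The following is a minimal sketch of one common variant, assuming yields are percentages in [0, 100] and using inverse bin frequency as the weight. The function names and the 10-point bin width are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def inverse_frequency_weights(yields, bin_width=10.0):
    """Weight each sample by the inverse frequency of its yield bin.

    Assumption: yields are percentages in [0, 100]. Weights are
    normalised so that they average to 1 over the dataset.
    """
    last_bin = int(100 // bin_width) - 1
    bins = [min(int(y // bin_width), last_bin) for y in yields]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


def weighted_mse(y_true, y_pred, weights):
    """Mean squared error where each sample contributes by its weight."""
    total = sum(w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred))
    return total / sum(weights)


# Toy data skewed towards low yields, with a single high-yield reaction.
yields = [5, 8, 12, 15, 18, 22, 25, 95]
weights = inverse_frequency_weights(yields)
print([round(w, 3) for w in weights])
```

In this toy dataset the single high-yield sample receives the largest weight, so errors on it cost more in `weighted_mse` than errors on the crowded low-yield bins.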
# ニューラル離散学習とレベル・オブ・エキスパートを用いた時空間力学系のモデリング Modeling Spatio-temporal Dynamical Systems with Neural Discrete Learning and Levels-of-Experts ( http://arxiv.org/abs/2402.05970v1 ) ライセンス: Link先を確認	Kun Wang, Hao Wu, Guibin Zhang, Junfeng Fang, Yuxuan Liang, Yuankai Wu, Roger Zimmermann, Yang Wang	(参考訳) 本稿では,映像フレームのような観測順序に基づく時空間力学系の状態変化のモデル化と推定の問題について述べる。従来の数値シミュレーションシステムは、構成された偏微分方程式(PDE)の初期設定と正しさに大きく依存する。近年の取り組みは、ニューラルネットワークによるデータ駆動型PDEの発見に大きな成功をもたらしたが、特異なシナリオによる制限と、局所的な洞察の欠如により、より広い現実世界の文脈で効果的に実行できなくなる。そこで本研究では,一般的な物理プロセスの進化法則をデータ駆動方式で捉えるために,ユニバーサル・エキスパート・モジュール,すなわち光フロー推定コンポーネントを提案する。局所的なインサイトを高めるため,局所的な特性は内部の様々な情報に影響され,システム全体のマクロ的特性と矛盾する可能性があるため,より微細な物理パイプラインの設計に苦慮する。さらに、現在広く使われているニューラル離散学習を利用して、潜在空間の根底にある重要な特徴を明らかにし、このプロセスは解釈可能性をより良く注入し、これらの離散確率変数に対して強力な先行性を得るのに役立つ。提案手法が既存の SOTA ベースラインと比較して大きな性能マージンを達成することを示すために,広範な実験とアブレーションを実施している。 In this paper, we address the issue of modeling and estimating changes in the state of the spatio-temporal dynamical systems based on a sequence of observations like video frames. Traditional numerical simulation systems depend largely on the initial settings and correctness of the constructed partial differential equations (PDEs). Despite recent efforts yielding significant success in discovering data-driven PDEs with neural networks, the limitations posed by singular scenarios and the absence of local insights prevent them from performing effectively in a broader real-world context. To this end, this paper proposes a universal expert module, that is, an optical flow estimation component, to capture the evolution laws of general physical processes in a data-driven fashion. To enhance local insight, we painstakingly design a finer-grained physical pipeline, since local characteristics may be influenced by various internal contextual information, which may contradict the macroscopic properties of the whole system. 
Further, we harness currently popular neural discrete learning to unveil the underlying important features in its latent space. This process better injects interpretability, which can help us obtain a powerful prior over these discrete random variables. We conduct extensive experiments and ablations to demonstrate that the proposed framework achieves large performance margins compared with the existing SOTA baselines.	翻訳日:2024-02-18 14:06:37 公開日:2024-02-06
# 変圧器の訓練における破壊対称性 Breaking Symmetry When Training Transformers ( http://arxiv.org/abs/2402.05969v1 ) ライセンス: Link先を確認	Chunsheng Zuo, Michael Guerzhoy	(参考訳) 本稿では,入力トークン1, 2, ..., n-1$の置換に対して,位置エンコーディングと因果注意のメカニズムの1つを使わずに,出力トークン$n+1$のTransformerアーキテクチャの予測を行う。通常、両方の機構が採用され、入力トークンに対する対称性が損なわれる。近年,位置符号化なしでトランスフォーマーを訓練できることが示されている。これは因果注意機構によって実現されなければならない。本稿では,変換器が順序が重要な入力シーケンスをモデル化できるという事実に対して,因果接続機構が責任を負うべきであるという議論を詳述する。トランスフォーマーの垂直な「スライス」は入力シーケンスで同じ位置の$k$を表すように推奨されている。我々は、残留接続がこの現象に寄与すると仮定し、その証拠を示す。 As we show in this paper, the prediction for output token $n+1$ of Transformer architectures without one of the mechanisms of positional encodings and causal attention is invariant to permutations of input tokens $1, 2, ..., n-1$. Usually, both mechanisms are employed and the symmetry with respect to the input tokens is broken. Recently, it has been shown that one can train Transformers without positional encodings. This must be enabled by the causal attention mechanism. In this paper, we elaborate on the argument that the causal connection mechanism must be responsible for the fact that Transformers are able to model input sequences where the order is important. Vertical "slices" of Transformers are all encouraged to represent the same location $k$ in the input sequence. We hypothesize that residual connections contribute to this phenomenon, and demonstrate evidence for this.	翻訳日:2024-02-18 14:06:15 公開日:2024-02-06
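The invariance claim in the entry above can be checked numerically on a toy case. The sketch below assumes a single unmasked attention head with no positional encoding and random weights. All names are hypothetical, and it is only an illustration of the stated property, not the authors' code. With a single layer the causal mask would not change the last position's output, so this sketch shows the symmetry itself rather than how causal attention breaks it in deeper models.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attend_last(tokens, w_q, w_k, w_v):
    """Output of one unmasked attention head at the last position.

    Hypothetical helper: no positional encoding is added to `tokens`.
    """
    query = matvec(w_q, tokens[-1])
    keys = [matvec(w_k, t) for t in tokens]
    values = [matvec(w_v, t) for t in tokens]
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


rng = random.Random(0)
dim, n = 4, 6
w_q, w_k, w_v = ([[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)] for _ in range(3))
tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

# Shuffle tokens 1..n-1 and keep token n (the query position) in place.
prefix = tokens[:-1]
rng.shuffle(prefix)
shuffled = prefix + [tokens[-1]]

original_out = attend_last(tokens, w_q, w_k, w_v)
shuffled_out = attend_last(shuffled, w_q, w_k, w_v)
# Compare with a tolerance because the summation order changes.
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original_out, shuffled_out))
print("invariant to prefix permutation:", same)
```

The comparison uses a small tolerance because permuting the inputs changes the floating-point summation order, even though the result is mathematically identical.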
# 悪質な再構成可能な知的表面を含む物理層秘密鍵を用いた説明可能な逆学習フレームワーク Explainable Adversarial Learning Framework on Physical Layer Secret Keys Combating Malicious Reconfigurable Intelligent Surface ( http://arxiv.org/abs/2402.06663v1 ) ライセンス: Link先を確認	Zhuangkun Wei, Wenxiu Hu, Weisi Guo	(参考訳) 再構成可能なインテリジェントサーフェス(ris)の開発は、物理層セキュリティ(pls)のための二重刃の剣である。適切なRISは、物理層シークレットキー生成(PL-SKG)を高めるためにチャネルランダム性の増加を含む有益な影響をもたらすが、悪意のあるRISは正当なチャネルを毒化し、既存のPL-SKGの大半を分解する。本研究では,この中途半端な悪意あるRIS(MITM-RIS)盗聴に対処するため,Alice と Bob の対立学習フレームワークを提案する。まず、正当なペアとMITM-RISの理論的相互情報ギャップを推定する。そこでAliceとBobはGAN(Generative Adversarial Network)を利用して、MITM-RISと重なり合う情報を持たない共通の特徴曲面を実現する。次に,シンボリック説明可能ai(xai)表現を用いて,ブラックボックスニューラルネットワークの信号処理解釈を支援する。これらの支配的なニューロンの象徴的な用語は、工学に基づく検証とPLS共通特徴空間の将来の設計に役立つ。シミュレーションの結果,提案したGANベースおよびシンボリックベースPL-SKGは,正規ユーザ間での高いキーコンセンサスを達成でき,また,正規機能生成(NNや公式)の知識を持つMITM-RIS Eveにも耐性があることがわかった。これにより、将来の6gで信頼できない反射型デバイスでワイヤレス通信を確保する方法が整う。 The development of reconfigurable intelligent surfaces (RIS) is a double-edged sword to physical layer security (PLS). Whilst a legitimate RIS can yield beneficial impacts including increased channel randomness to enhance physical layer secret key generation (PL-SKG), malicious RIS can poison legitimate channels and crack most of existing PL-SKGs. In this work, we propose an adversarial learning framework between legitimate parties (namely Alice and Bob) to address this Man-in-the-middle malicious RIS (MITM-RIS) eavesdropping. First, the theoretical mutual information gap between legitimate pairs and MITM-RIS is deduced. Then, Alice and Bob leverage generative adversarial networks (GANs) to learn to achieve a common feature surface that does not have mutual information overlap with MITM-RIS. Next, we aid signal processing interpretation of black-box neural networks by using a symbolic explainable AI (xAI) representation. These symbolic terms of dominant neurons aid feature engineering-based validation and future design of PLS common feature space. 
Simulation results show that our proposed GAN-based and symbolic-based PL-SKGs can achieve high key agreement rates between legitimate users, and are even resistant to a MITM-RIS Eve with knowledge of the legitimate feature generation (NNs or formulas). This therefore paves the way to secure wireless communications with untrusted reflective devices in future 6G.	翻訳日:2024-02-18 13:56:38 公開日:2024-02-06
# 注意に基づくグラフデコーダの符号ランク制限 Sign Rank Limitations for Attention-Based Graph Decoders ( http://arxiv.org/abs/2402.06662v1 ) ライセンス: Link先を確認	Su Hyeong Lee and Qingqi Zhang and Risi Kondor	(参考訳) 内部製品ベースのデコーダは、潜在埋め込みから有意義なデータを抽出するために使用される最も影響力のあるフレームワークの1つである。しかし、そのようなデコーダは、特にグラフ再構成問題において顕著な多くの著作において、表現能力の限界を示している。本稿では,この普及現象をグラフデータで初めて理論的に解明し,内部積の枠組みから逸脱することなく,この問題を回避するための簡単な修正を提案する。 Inner product-based decoders are among the most influential frameworks used to extract meaningful data from latent embeddings. However, such decoders have shown limitations in representation capacity in numerous works within the literature, which have been particularly notable in graph reconstruction problems. In this paper, we provide the first theoretical elucidation of this pervasive phenomenon in graph data, and suggest straightforward modifications to circumvent this issue without deviating from the inner product framework.	翻訳日:2024-02-18 13:56:09 公開日:2024-02-06
# 現実的な予測プロセスによる拡散による天気予報 Weather Prediction with Diffusion Guided by Realistic Forecast Processes ( http://arxiv.org/abs/2402.06666v1 ) ライセンス: Link先を確認	Zhanxiang Hua, Yutong He, Chengqian Ma, Alexandra Anderson-Frey	(参考訳) 最近開発されたディープラーニング(dl)に基づくモデルが、従来の数値気象予測(nwp)モデルのパフォーマンスにアプローチしている。しかしながら、これらのDLモデルは、しばしば複雑でリソース集約的であり、訓練後の柔軟性とNWP予測の導入の制限に直面しており、潜在的な非物理的予測による信頼性の懸念につながっている。本研究では,気象予測に拡散モデル(DM)を適用した新しい手法を提案する。特に,本手法は,同じモデリングフレームワークを用いて,直接予測と反復予測の両方を実現できる。我々のモデルは、独立して予測を生成するだけでなく、サンプリングプロセス中に異なるリード時間であっても、NWP予測の統合を可能にする。我々のモデルの柔軟性と制御性は、一般の気象コミュニティにとってより信頼性の高いDLシステムを可能にする。さらに,永続性と気候データの統合は,長期予測安定性をさらに向上させる。実験により,本手法の有効性と一般化性を示し,再学習を必要とせず,より高度な拡散モデルの実現が期待できることを示す。 Weather forecasting remains a crucial yet challenging domain, where recently developed models based on deep learning (DL) have approached the performance of traditional numerical weather prediction (NWP) models. However, these DL models, often complex and resource-intensive, face limitations in flexibility post-training and in incorporating NWP predictions, leading to reliability concerns due to potential unphysical predictions. In response, we introduce a novel method that applies diffusion models (DM) for weather forecasting. In particular, our method can achieve both direct and iterative forecasting with the same modeling framework. Our model is not only capable of generating forecasts independently but also uniquely allows for the integration of NWP predictions, even with varying lead times, during its sampling process. The flexibility and controllability of our model empowers a more trustworthy DL system for the general weather community. Additionally, incorporating persistence and climatology data further enhances our model's long-term forecasting stability. Our empirical findings demonstrate the feasibility and generalizability of this approach, suggesting a promising direction for future, more sophisticated diffusion models without the need for retraining.	翻訳日:2024-02-18 13:38:41 公開日:2024-02-06
# 身体的AIの基礎的世界モデルにおける因果関係の本質的役割 The Essential Role of Causality in Foundation World Models for Embodied AI ( http://arxiv.org/abs/2402.06665v1 ) ライセンス: Link先を確認	Tarun Gupta, Wenbo Gong, Chao Ma, Nick Pawlowski, Agrin Hilmkil, Meyer Scetbon, Ade Famoti, Ashley Juan Llorens, Jianfeng Gao, Stefan Bauer, Danica Kragic, Bernhard Sch\"olkopf, Cheng Zhang	(参考訳) 基礎モデルの最近の進歩、特に大規模マルチモーダルモデルや会話エージェントは、一般的に有能な具体化エージェントの可能性に関心を燃やしている。このようなエージェントは、多くの異なる現実世界環境で新しいタスクを実行する能力を必要とする。しかし、現在の基礎モデルは現実世界との物理的相互作用を正確にモデル化できないため、Embodied AIには不十分である。因果関係の研究は、可能な相互作用の結果を正確に予測するために不可欠である、検証世界モデルの構築に結びつく。本稿では,次世代のエンボディエージェントのための基礎世界モデルの構築に焦点をあて,それらの意義に関する新たな視点を示す。因果的考察の統合は,世界と有意義な物理的相互作用を促進する上で不可欠であると考えられる。最後に,この文脈における因果性に関する誤解を解き明かすとともに,今後の研究の展望を示す。 Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents would require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions with the real world thus not sufficient for Embodied AI. The study of causality lends itself to the construction of veridical world models, which are crucial for accurately predicting the outcomes of possible interactions. This paper focuses on the prospects of building foundation world models for the upcoming generation of embodied agents and presents a novel viewpoint on the significance of causality within these. We posit that integrating causal considerations is vital to facilitate meaningful physical interactions with the world. Finally, we demystify misconceptions about causality in this context and present our outlook for future research.	翻訳日:2024-02-18 13:38:25 公開日:2024-02-06
# No-Code AutoMLによる人間中心のAIプロダクトプロトタイプ - 概念フレームワーク、可能性、限界 Human-Centered AI Product Prototyping with No-Code AutoML: Conceptual Framework, Potentials and Limitations ( http://arxiv.org/abs/2402.07933v1 ) ライセンス: Link先を確認	Mario Truss, Marc Schmitt	(参考訳) 本稿では,AI製品プロトタイピングにおける課題に対する解決策としてNo-Code AutoMLを評価し,非専門家への予測不能と到達不能を特徴とし,概念的枠組みを提案する。このAI製品の複雑さは、人間中心のAI製品にとって不可欠なシームレスな実行と学際的なコラボレーションを妨げる。産業とイノベーションに関連して、戦略的意思決定と投資リスク軽減に影響を及ぼす。現在のアプローチは、AIプロダクトのアイデアの可能性と実現可能性に関する限られた洞察を提供する。この研究はDesign Science Researchを採用し、No-code AutoMLを使ったAIプロダクトプロトタイプのフレームワークを提供することで、課題を特定し、ソリューションとしてNo-code AutoMLを統合する。ケーススタディでは、AI製品開発に対する構造化されたアプローチを提供する非専門家をサポートする可能性を確認している。このフレームワークは、アクセシブルで解釈可能なプロトタイピングを促進し、アカデミック、マネージャ、意思決定者に恩恵を与える。 no-code AutoMLの戦略的統合は効率を高め、非専門家に権限を与え、承認された制限にもかかわらず、アーリーステージの決定を通知する。 This paper evaluates No-Code AutoML as a solution for challenges in AI product prototyping, characterized by unpredictability and inaccessibility to non-experts, and proposes a conceptual framework. This complexity of AI products hinders seamless execution and interdisciplinary collaboration crucial for human-centered AI products. Relevant to industry and innovation, it affects strategic decision-making and investment risk mitigation. Current approaches provide limited insights into the potential and feasibility of AI product ideas. Employing Design Science Research, the study identifies challenges and integrates no-code AutoML as a solution by presenting a framework for AI product prototyping with No-code AutoML. A case study confirms its potential in supporting non-experts, offering a structured approach to AI product development. The framework facilitates accessible and interpretable prototyping, benefiting academia, managers, and decision-makers. Strategic integration of no-code AutoML enhances efficiency, empowers non-experts, and informs early-stage decisions, albeit with acknowledged limitations.	翻訳日:2024-02-18 13:28:32 公開日:2024-02-06
# スキーマ開発のための人間・機械協調フレームワーク A Human-Machine Collaboration Framework for the Development of Schemas ( http://arxiv.org/abs/2402.07932v1 ) ライセンス: Link先を確認	Nicos Isaak	(参考訳) Winograd Schema Challenge (WSC)は、マシンインテリジェンスのためのよく考えられたテストであり、人間の振る舞いを示すシステムの開発システムに光を当てることが提案されている。導入以来、AIコミュニティの焦点をテクノロジーからAI科学へと転換することを目的としていた。人間にとって一般的で自明な研究は、機械にとって、特に新しいスキーマを扱う必要がある場合、特に、明確な代名詞の解決を必要とするよく設計された文は、依然として困難であることを示している。研究者がチャレンジそのものに関心を持つようになるにつれて、これはおそらく、人間の専門家が合理的に開発できる範囲を超えて、多くのウィノグラードスキーマが利用可能になる必要があるだろう。このニーズに対処するために、人間と機械がチームメイトとしてどのように協力して新しいスキーマをゼロから設計できるかを明確に焦点をあてる新しいフレームワークを提案する。これは2つの最近の研究を組み合わせることで達成されている。 i)winventorは、高品質ではないが、大量のwinogradスキーマを開発するための機械駆動のアプローチで、 ii)WinoFlexiは、クラウドソーシングシステムで、クラウドワーカーが専門家とよく似た品質の限られた数のスキーマを開発することができる。提案手法は,人間と機械の知能を向上し,補完的な強みを生かした新しい協調プラットフォームを開発するための新たなロードマップを構築する。 The Winograd Schema Challenge (WSC), a seemingly well-thought-out test for machine intelligence, has been proposed to shed light on developing systems that exhibit human behavior. Since its introduction, it aimed to pivot the focus of the AI community from the technology to the science of AI. While common and trivial for humans, studies show that it is still challenging for machines, especially when they have to deal with novel schemas, that is, well-designed sentences that require the resolving of definite pronouns. As researchers have become increasingly interested in the challenge itself, this presumably necessitates the availability of an extensive collection of Winograd schemas, which goes beyond what human experts can reasonably develop themselves, especially after proposed ways of utilizing them as novel forms of CAPTCHAs. To address this necessity, we propose a novel framework that explicitly focuses on how humans and machines can collaborate as teammates to design novel schemas from scratch. 
This is being accomplished by combining two recent studies from the literature: i) Winventor, a machine-driven approach for the development of large amounts of Winograd schemas, albeit not of high quality, and ii) WinoFlexi, an online crowdsourcing system that allows crowd workers to develop a limited number of schemas often of similar quality to that of experts. Our proposal crafts a new road map toward developing a novel collaborative platform that amplifies human and machine intelligence by combining their complementary strengths.	翻訳日:2024-02-18 13:28:16 公開日:2024-02-06
# コンテキスト対応型自動乗客計数データデノイング Context-Aware Automated Passenger Counting Data Denoising ( http://arxiv.org/abs/2402.08688v1 ) ライセンス: Link先を確認	No\"elie Cherrier, Baptiste R\'erolle, Martin Graive, Amir Dib, Eglantine Schmitt	(参考訳) 公共交通網における利用者の信頼性と正確な知識は、公共交通事業者や公共団体にとって、そのネットワークの使用と交通提供の最適化を意識することが不可欠である。現在、乗客数を推定する手法がいくつか存在し、一部は自動化されている。そのうち、自動旅客カウント(APC)システムは、コースの各駅に車両を乗降させる乗客を検知する。しかし、これらのシステムから得られるデータは、しばしばうるさいか、あるいは偏りがあるため、搭載された占有率の過大評価に繋がる。本研究では,APCデータのロバスト性向上と解析の容易化を目的としたデノナイズアルゴリズムを提案する。提案手法は制約付き整数線形最適化であり,チケットデータと過去のライダーシップデータを利用して最適化をさらに制約し,ガイドする。パフォーマンスは、フランスのいくつかの公共交通網における他の鳴り物入り手法や、これらのネットワークの1つで利用可能な手動カウント、およびシミュレーションデータと比較される。 A reliable and accurate knowledge of the ridership in public transportation networks is crucial for public transport operators and public authorities to be aware of their network's use and optimize transport offering. Several techniques to estimate ridership exist nowadays, some of them in an automated manner. Among them, Automatic Passenger Counting (APC) systems detect passengers entering and leaving the vehicle at each station of its course. However, data resulting from these systems are often noisy or even biased, resulting in under or overestimation of onboard occupancy. In this work, we propose a denoising algorithm for APC data to improve their robustness and ease their analyzes. The proposed approach consists in a constrained integer linear optimization, taking advantage of ticketing data and historical ridership data to further constrain and guide the optimization. The performances are assessed and compared to other denoising methods on several public transportation networks in France, to manual counts available on one of these networks, and on simulated data.	翻訳日:2024-02-18 13:14:26 公開日:2024-02-06
# Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening ( http://arxiv.org/abs/2402.10098v1 ) License: see link	Stefan Schoepf, Jack Foster, Alexandra Brintrup	Data entry constitutes a fundamental component of the machine learning pipeline, yet it frequently results in the introduction of labelling errors. When a model has been trained on a dataset containing such errors, its performance is reduced. This leads to the challenge of efficiently unlearning the influence of the erroneous data to improve the model performance without needing to completely retrain the model. While model editing methods exist for cases in which the correct label for a wrong entry is known, we focus on the case of data entry errors where we do not know the correct labels for the erroneous data. Our contribution is twofold. First, we introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning, making unlearning accessible to practitioners. We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks. 
Second, we demonstrate the performance of ASSD in a supply chain delay prediction problem with labelling errors, using real-world data into which we randomly introduce various levels of labelling errors. The application of this approach is particularly compelling in industrial settings, such as supply chain management, where a significant portion of data entry occurs manually through Excel sheets, rendering it error-prone. ASSD shows strong performance on general unlearning benchmarks and on the error correction problem, where it outperforms fine-tuning for error correction.	Translated: 2024-02-18 12:40:20 Published: 2024-02-06
# Carthago Delenda Est: Co-opetitive Indirect Information Diffusion Model for Influence Operations on Online Social Media ( http://arxiv.org/abs/2402.01905v2 ) License: see link	Jwen Fai Low, Benjamin C. M. Fung, Farkhund Iqbal, and Claude Fachkha	For a state or non-state actor whose credibility is bankrupt, relying on bots to conduct non-attributable, non-accountable, and seemingly-grassroots-but-decentralized-in-actuality influence/information operations (info ops) on social media can help circumvent the issue of trust deficit while advancing its interests. Planning and/or defending against decentralized info ops can be aided by computational simulations in lieu of ethically fraught live experiments on social media. In this study, we introduce Diluvsion, an agent-based model for contested information propagation efforts on Twitter-like social media. The model emphasizes a user's belief in an opinion (stance) being impacted by the perception of potentially illusory popular support from constant incoming floods of indirect information, floods that can be cooperatively engineered in an uncoordinated manner by bots as they compete to spread their stances. 
Our model, which has been validated against real-world data, is an advancement over previous models because we account for engagement metrics in influencing stance adoption, non-social tie spreading of information, neutrality as a stance that can be spread, and themes that are analogous to media's framing effect and are symbiotic with respect to stance propagation. The strengths of the Diluvsion model are demonstrated in simulations of orthodox info ops, e.g., maximizing adoption of one stance, creating echo chambers, and inducing polarization, and of unorthodox info ops, e.g., simultaneous support of multiple stances as a Trojan horse tactic for the dissemination of a theme.	Translated: 2024-02-09 18:25:07 Published: 2024-02-06
# Tensor Completion via Integer Optimization ( http://arxiv.org/abs/2402.05141v1 ) License: see link	Xin Chen, Sukanya Kudva, Yongzheng Dai, Anil Aswani, Chen Chen	The main challenge with the tensor completion problem is a fundamental tension between computation power and the information-theoretic sample complexity rate. Past approaches either achieve the information-theoretic rate but lack practical algorithms to compute the corresponding solution, or have polynomial-time algorithms that require an exponentially larger number of samples for low estimation error. This paper develops a novel tensor completion algorithm that resolves this tension by achieving both provable convergence (in numerical tolerance) in a linear number of oracle steps and the information-theoretic rate. Our approach formulates tensor completion as a convex optimization problem constrained using a gauge-based tensor norm, which is defined in a way that allows the use of integer linear optimization to solve linear separation problems over the unit ball in this new norm. Adaptations based on this insight are incorporated into a Frank-Wolfe variant to build our algorithm. We show that our algorithm scales well using numerical experiments on tensors with up to ten million entries.	Translated: 2024-02-09 17:57:47 Published: 2024-02-06
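The abstract above builds on a Frank-Wolfe variant whose linear subproblem is solved by integer linear optimization. As a rough illustration of the Frank-Wolfe pattern only, the sketch below runs the textbook method over the probability simplex, where the linear minimization oracle is simply "pick the vertex with the smallest gradient entry". The simplex constraint, the quadratic objective, the step size 2/(t+2) and the name `frank_wolfe_simplex` are all assumptions made for illustration; this is not the paper's gauge-norm oracle or its algorithm.

```python
def frank_wolfe_simplex(grad, x0, steps=200):
    """Textbook Frank-Wolfe over the probability simplex (illustrative only)."""
    x = list(x0)
    for t in range(steps):
        g = grad(x)
        # Linear minimization oracle: over the simplex, <g, s> is minimized
        # at the vertex whose gradient entry is smallest.
        i = min(range(len(x)), key=lambda j: g[j])
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        # Convex combination of the current point and the chosen vertex.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


# Hypothetical objective: squared distance to a target point inside the simplex.
target = [0.2, 0.5, 0.3]
grad = lambda x: [2.0 * (xj - tj) for xj, tj in zip(x, target)]
x_hat = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0], steps=2000)
```

Each iterate stays feasible because it is a convex combination of feasible points, so no projection step is needed. That property is what makes the method attractive when the feasible set is only accessible through a linear oracle.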
# Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains ( http://arxiv.org/abs/2402.05140v1 ) License: see link	Junhong Shen, Neil Tenenholtz, James Brian Hall, David Alvarez-Melis, Nicolo Fusi	Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into effective task solvers for specialized domains. We introduce a novel, model-agnostic framework for learning custom input tags, which are parameterized as continuous vectors appended to the LLM's embedding layer, to condition the LLM. We design two types of input tags: domain tags are used to delimit specialized representations (e.g., chemical formulas) and provide domain-relevant context; function tags are used to represent specific functions (e.g., predicting molecular properties) and compress function-solving instructions. We develop a three-stage protocol to learn these tags using auxiliary data and domain knowledge. By explicitly disentangling task domains from task functions, our method enables zero-shot generalization to unseen problems through diverse combinations of the input tags. 
It also boosts the LLM's performance in various specialized domains, such as predicting protein or chemical properties and modeling drug-target interactions, outperforming expert models tailored to these tasks.	Translated: 2024-02-09 17:57:29 Published: 2024-02-06
# SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark ( http://arxiv.org/abs/2402.05138v1 ) License: see link	Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, Xiangliang Zhang	The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level. It addresses a critical educational phase often overlooked in existing benchmarks, spanning high school to pre-college levels. SceMQA focuses on core science subjects including Mathematics, Physics, Chemistry, and Biology. It features a blend of multiple-choice and free-response formats, ensuring a comprehensive evaluation of AI models' abilities. Additionally, our benchmark provides specific knowledge points for each problem and detailed explanations for each answer. SceMQA also uniquely presents problems with identical contexts but varied questions to facilitate a more thorough and accurate assessment of reasoning capabilities. In the experiment, we evaluate both open-source and closed-source state-of-the-art Multimodal Large Language Models (MLLMs) across various experimental settings. The results show that further research and development are needed to build more capable MLLMs, as highlighted by the accuracy of only 50% to 60% achieved by the strongest models. Our benchmark and analysis will be available at https://scemqa.github.io/	Translated: 2024-02-09 17:57:07 Published: 2024-02-06
# LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology ( http://arxiv.org/abs/2402.05137v1 ) License: see link	Matthew Ho, Deaglan J. Bartlett, Nicolas Chartier, Carolina Cuesta-Lazaro, Simon Ding, Axel Lapel, Pablo Lemos, Christopher C. Lovell, T. Lucas Makinen, Chirag Modi, Viraj Pandya, Shivam Pandey, Lucia A. Perez, Benjamin Wandelt, Greg L. Bryan	This paper presents the Learning the Universe Implicit Likelihood Inference (LtU-ILI) pipeline, a codebase for rapid, user-friendly, and cutting-edge machine learning (ML) inference in astrophysics and cosmology. The pipeline includes software for implementing various neural architectures, training schema, priors, and density estimators in a manner easily adaptable to any research workflow. It includes comprehensive validation metrics to assess posterior estimate coverage, enhancing the reliability of inferred results. Additionally, the pipeline is easily parallelizable and designed for efficient exploration of modeling hyperparameters. 
To demonstrate its capabilities, we present real applications across a range of astrophysics and cosmology problems, such as: estimating galaxy cluster masses from X-ray photometry; inferring cosmology from matter power spectra and halo point clouds; characterising progenitors in gravitational wave signals; capturing physical dust parameters from galaxy colors and luminosities; and establishing properties of semi-analytic models of galaxy formation. We also include exhaustive benchmarking and comparisons of all implemented methods as well as discussions about the challenges and pitfalls of ML inference in astronomical sciences. All code and examples are made publicly available at https://github.com/maho3/ltu-ili.	Translated: 2024-02-09 17:56:49 Published: 2024-02-06
# LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K ( http://arxiv.org/abs/2402.05136v1 ) License: see link	Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang	State-of-the-art large language models (LLMs) are now claiming remarkable supported context lengths of 256k or even more. In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation. This paper introduces LV-Eval, a challenging long-context benchmark with five length levels (16k, 32k, 64k, 128k, and 256k) reaching up to 256k words. LV-Eval features two main tasks, single-hop QA and multi-hop QA, comprising 11 bilingual datasets. The design of LV-Eval has incorporated three key techniques, namely confusing facts insertion, keyword and phrase replacement, and keyword-recall-based metric design. 
The advantages of LV-Eval include controllable evaluation across different context lengths, challenging test instances with confusing facts, mitigated knowledge leakage, and more objective evaluations. We evaluate 10 LLMs on LV-Eval and conduct ablation studies on the techniques used in LV-Eval construction. The results reveal that: (i) Commercial LLMs generally outperform open-source LLMs when evaluated within length levels shorter than their claimed context length. However, their overall performance is surpassed by open-source LLMs with longer context lengths. (ii) Extremely long-context LLMs, such as Yi-6B-200k, exhibit a relatively gentle degradation of performance, but their absolute performance is not necessarily higher than that of LLMs with shorter context lengths. (iii) LLMs' performance can significantly degrade in the presence of confusing information, especially in the pressure test of "needle in a haystack". (iv) Issues related to knowledge leakage and inaccurate metrics introduce bias in evaluation, and these concerns are alleviated in LV-Eval. All datasets and evaluation code are released at: https://github.com/infinigence/LVEval.	Translated: 2024-02-09 17:56:27 Published: 2024-02-06
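The LV-Eval abstract above mentions a keyword-recall-based metric design without giving its details. One plausible reading is sketched below: measure how many annotated answer keywords appear in the model output, and only credit a base similarity score when that recall clears a threshold. The function names, the substring matching, the gating rule and the 0.5 threshold are all hypothetical choices for illustration, not the benchmark's actual scoring code.

```python
def keyword_recall(answer, keywords):
    """Fraction of reference keywords that appear in the answer (case-insensitive)."""
    if not keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def gated_score(answer, keywords, base_score, threshold=0.5):
    """Keep base_score only when enough keywords are recalled (assumed gating rule)."""
    return base_score if keyword_recall(answer, keywords) >= threshold else 0.0


# Hypothetical example: two of the three keywords are present in the answer.
recall = keyword_recall("The Eiffel Tower is in Paris", ["Eiffel", "Paris", "1889"])
```

A gate of this kind is one way to stop long, fluent but wrong answers from earning partial credit through incidental word overlap.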
# CADReN: Contextual Anchor-Driven Relational Network for Controllable Cross-Graphs Node Importance Estimation ( http://arxiv.org/abs/2402.05135v1 ) License: see link	Zijie Zhong, Yunhui Zhang, Ziyi Chang, Zengchang Qin	Node Importance Estimation (NIE) is crucial for integrating external information into Large Language Models through Retriever-Augmented Generation. Traditional methods, focusing on static, single-graph characteristics, lack adaptability to new graphs and user-specific requirements. CADReN, our proposed method, addresses these limitations by introducing a Contextual Anchor (CA) mechanism. This approach enables the network to assess node importance relative to the CA, considering both structural and semantic features within Knowledge Graphs (KGs). Extensive experiments show that CADReN achieves better performance on cross-graph NIE tasks, with zero-shot prediction ability. CADReN is also shown to match the performance of previous models on single-graph NIE tasks. Additionally, we introduce and open-source two new datasets, RIC200 and WK1K, specifically designed for cross-graph NIE research, providing a valuable resource for future developments in this domain.	Translated: 2024-02-09 17:55:54 Published: 2024-02-06
# Personalized Language Modeling from Personalized Human Feedback ( http://arxiv.org/abs/2402.05133v1 ) License: see link	Xinyu Li, Zachary C. Lipton, Liu Leqi	Reinforcement Learning from Human Feedback (RLHF) is the currently dominant framework for fine-tuning large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data. We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimization. To demonstrate the efficacy of our method, we test it on real-world text summarization data with annotated preferences and annotator information. 
We fine-tune GPT-J 6B to obtain personalized language (and reward) models, which outperform non-personalized models in terms of aligning with individual preferences.	Translated: 2024-02-09 17:55:36 Published: 2024-02-06
# Medium Access Control protocol for Collaborative Spectrum Learning in Wireless Networks ( http://arxiv.org/abs/2111.12581v2 ) License: see link	Tomer Boyarski, Wenbo Wang, Amir Leshem	In recent years there has been a growing effort to provide learning algorithms for spectrum collaboration. In this paper we present a medium access control protocol which allows spectrum collaboration with minimal regret and high spectral efficiency in highly loaded networks. We present a fully distributed algorithm for spectrum collaboration in congested ad-hoc networks. The algorithm jointly solves both the channel allocation and access scheduling problems. We prove that the algorithm has an optimal logarithmic regret. Based on the algorithm, we provide a medium access control protocol which allows distributed implementation of the algorithm in ad-hoc networks. The protocol utilizes single-channel opportunistic carrier sensing to carry out a low-complexity distributed auction in time and frequency. We also discuss practical implementation issues such as bounded frame size and speed of convergence. Computer simulations comparing the algorithm to state-of-the-art distributed medium access control protocols show the significant advantage of the proposed scheme.	Translated: 2024-02-08 21:12:28 Published: 2024-02-06
# Semi-supervised learning for generalizable intracranial hemorrhage detection and segmentation ( http://arxiv.org/abs/2105.00582v2 ) License: see link	Emily Lin, Esther Yuh	Purpose: To develop and evaluate a semi-supervised learning model for intracranial hemorrhage detection and segmentation on an out-of-distribution head CT evaluation set. Materials and Methods: This retrospective study used semi-supervised learning to bootstrap performance. An initial "teacher" deep learning model was trained on 457 pixel-labeled head CT scans collected from one US institution from 2010-2017 and used to generate pseudo-labels on a separate unlabeled corpus of 25000 examinations from the RSNA and ASNR. A second "student" model was trained on this combined pixel- and pseudo-labeled dataset. Hyperparameter tuning was performed on a validation set of 93 scans. Testing for both classification (n=481 examinations) and segmentation (n=23 examinations, or 529 images) was performed on CQ500, a dataset of 481 scans performed in India, to evaluate out-of-distribution generalizability. 
The semi-supervised model was compared with a baseline model trained on only labeled data using area under the receiver operating characteristic curve (AUC), Dice similarity coefficient (DSC), and average precision (AP) metrics. Results: The semi-supervised model achieved statistically significantly higher examination AUC on CQ500 compared with the baseline (0.939 [0.938, 0.940] vs. 0.907 [0.906, 0.908]) (p=0.009). It also achieved a higher DSC (0.829 [0.825, 0.833] vs. 0.809 [0.803, 0.812]) (p=0.012) and Pixel AP (0.848 [0.843, 0.853] vs. 0.828 [0.817, 0.828]) compared to the baseline. Conclusion: The addition of unlabeled data in a semi-supervised learning framework demonstrates stronger generalizability potential for intracranial hemorrhage detection and segmentation compared with a supervised baseline.	Translated: 2024-02-08 21:12:16 Published: 2024-02-06
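The segmentation results above are reported as a Dice similarity coefficient (DSC). For readers unfamiliar with the metric, here is a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), on flattened binary masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for this example; the study's own evaluation code is not shown in the abstract.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two flattened binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked positive in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total positive pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One overlapping pixel out of two predicted and two true positives.
dsc = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

Because the denominator counts only positive pixels, DSC is insensitive to the large background region of a scan, which is why it is commonly preferred over plain pixel accuracy for small lesions.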
# Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices ( http://arxiv.org/abs/2303.10019v3 ) License: see link	Jonathan Berrisch, Florian Ziel	This paper presents a new method for combining (or aggregating or ensembling) multivariate probabilistic forecasts, considering dependencies between quantiles and marginals through a smoothing procedure that allows for online learning. We discuss two smoothing methods: dimensionality reduction using basis matrices and penalized smoothing. The new online learning algorithm generalizes the standard CRPS learning framework into multivariate dimensions. It is based on Bernstein Online Aggregation (BOA) and yields optimal asymptotic learning properties. The procedure uses horizontal aggregation, i.e., aggregation across quantiles. We provide an in-depth discussion on possible extensions of the algorithm and several nested cases related to the existing literature on online forecast combination. We apply the proposed methodology to forecasting day-ahead electricity prices, which are 24-dimensional distributional forecasts. The proposed method yields significant improvements over uniform combination in terms of continuous ranked probability score (CRPS). We discuss the temporal evolution of the weights and hyperparameters and present the results of reduced versions of the preferred model. 
A fast C++ implementation of the proposed algorithm is provided in the open-source R package profoc on CRAN.	Translated: 2024-02-08 21:02:12 Published: 2024-02-06
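The abstract above scores combined quantile forecasts with the CRPS. A common way to evaluate this on a finite quantile grid uses the identity that the CRPS equals twice the integral of the pinball (quantile) loss over all probability levels, approximated here by an average over the grid. The sketch below shows that approximation together with a per-quantile convex combination of two experts. The function names, the fixed weights and the toy numbers are illustrative assumptions; the sketch does not implement Bernstein Online Aggregation, the smoothing procedures, or the `profoc` package interface.

```python
def pinball_loss(q, y, tau):
    """Pinball loss of quantile forecast q at level tau for observation y."""
    indicator = 1.0 if y < q else 0.0
    return (tau - indicator) * (y - q)


def crps_from_quantiles(quantiles, taus, y):
    """Approximate the CRPS as twice the mean pinball loss over a quantile grid."""
    losses = [pinball_loss(q, y, tau) for q, tau in zip(quantiles, taus)]
    return 2.0 * sum(losses) / len(losses)


def combine(expert_quantiles, weights):
    """Per-quantile convex combination; weights[i][k] is expert i's weight at level k."""
    n_levels = len(expert_quantiles[0])
    return [
        sum(w[k] * e[k] for w, e in zip(weights, expert_quantiles))
        for k in range(n_levels)
    ]


# Two hypothetical experts on a three-point grid, combined with equal weights.
taus = [0.1, 0.5, 0.9]
expert_a = [0.0, 1.0, 2.0]
expert_b = [1.0, 2.0, 3.0]
combined = combine([expert_a, expert_b], [[0.5] * 3, [0.5] * 3])
score = crps_from_quantiles(combined, taus, y=1.5)
```

Since the pinball loss is convex in the forecast, the combined forecast never scores worse than the weighted average of the individual scores, which is the basic reason convex forecast combination is a safe default.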
# Revisiting Signed Propagation for Multi-Class Graph Neural Networks ( http://arxiv.org/abs/2301.08918v5 ) License: see link	Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim	Message-passing Graph Neural Networks (GNNs), which collect information from adjacent nodes, achieve dismal performance on heterophilic graphs. Various schemes have been proposed to solve this problem, and propagating signed information on heterophilic edges has gained great attention. Recently, some works provided theoretical analysis that signed propagation always leads to performance improvement under a binary class scenario. However, we notice that prior analyses do not align well with multi-class benchmark datasets. This paper provides a new understanding of signed propagation for multi-class scenarios and points out two drawbacks in terms of message-passing and parameter update: (1) Message-passing: if two nodes belong to different classes but have a high similarity, signed propagation can decrease the separability. (2) Parameter update: the prediction uncertainty (e.g., conflict evidence) of signed neighbors increases during training, which can impede the stability of the algorithm. Based on this observation, we introduce two novel strategies for improving signed propagation under multi-class graphs. The proposed scheme combines calibration to secure robustness while reducing uncertainty. 
We show the efficacy of our theorem through extensive experiments on six benchmark graph datasets.	Translated: 2024-02-08 21:01:43 Published: 2024-02-06
# RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation ( http://arxiv.org/abs/2211.09869v3 ) License: see link	Titas Anciukevicius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J. Mitra, Paul Guerrero	Diffusion models currently achieve state-of-the-art performance for both conditional and unconditional image generation. However, so far, image diffusion models do not support tasks required for 3D understanding, such as view-consistent 3D generation or single-view object reconstruction. In this paper, we present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision. Central to our method is a novel image denoising architecture that generates and renders an intermediate three-dimensional representation of a scene in each denoising step. This enforces a strong inductive structure within the diffusion process, providing a 3D consistent representation while only requiring 2D supervision. The resulting 3D representation can be rendered from any view. We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images. Additionally, our diffusion-based approach allows us to use 2D inpainting to edit 3D scenes.	Translated: 2024-02-08 20:59:07 Published: 2024-02-06
# Contraction of Locally Differentially Private Mechanisms ( http://arxiv.org/abs/2210.13386v3 ) License: see link	Shahab Asoodeh and Huanyu Zhang	We investigate the contraction properties of locally differentially private mechanisms. More specifically, we derive tight upper bounds on the divergence between the output distributions $PK$ and $QK$ of an $\epsilon$-LDP mechanism $K$ in terms of a divergence between the corresponding input distributions $P$ and $Q$, respectively. Our first main technical result presents a sharp upper bound on the $\chi^2$-divergence $\chi^2(PK\|QK)$ in terms of $\chi^2(P\|Q)$ and $\epsilon$. We also show that the same result holds for a large family of divergences, including KL-divergence and squared Hellinger distance. The second main technical result gives an upper bound on $\chi^2(PK\|QK)$ in terms of total variation distance $\mathsf{TV}(P, Q)$ and $\epsilon$. We then utilize these bounds to establish locally private versions of the van Trees inequality, Le Cam's, Assouad's, and the mutual information methods, which are powerful tools for bounding minimax estimation risks. These results are shown to lead to better privacy analyses than the state of the art in several statistical problems such as entropy and discrete distribution estimation, non-parametric density estimation, and hypothesis testing.	Translated: 2024-02-08 20:58:49 Published: 2024-02-06
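To make the notion of contraction in the abstract above concrete, the sketch below pushes two Bernoulli inputs through binary randomized response, a standard $\epsilon$-LDP mechanism, and compares divergences before and after. For this particular mechanism the total variation distance shrinks by exactly $(e^{\epsilon}-1)/(e^{\epsilon}+1)$, which follows from a one-line calculation, and the $\chi^2$-divergence cannot increase by the data processing inequality. The choice of mechanism, the function names and the numbers are assumptions for illustration; the paper's general bounds are not reproduced here.

```python
import math


def randomized_response(p, eps):
    """Output probability of 1 when a Bernoulli(p) bit goes through binary randomized response."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))  # probability of reporting the true bit
    return p * keep + (1.0 - p) * (1.0 - keep)


def chi2_bernoulli(p, q):
    """chi^2(P || Q) for Bernoulli(p) versus Bernoulli(q), requiring 0 < q < 1."""
    return (p - q) ** 2 / q + (p - q) ** 2 / (1.0 - q)


def tv_bernoulli(p, q):
    """Total variation distance between Bernoulli(p) and Bernoulli(q)."""
    return abs(p - q)


# Hypothetical inputs P = Bernoulli(0.9), Q = Bernoulli(0.2) and privacy level eps = 1.
eps, p, q = 1.0, 0.9, 0.2
pk, qk = randomized_response(p, eps), randomized_response(q, eps)
tv_ratio = tv_bernoulli(pk, qk) / tv_bernoulli(p, q)
chi2_in, chi2_out = chi2_bernoulli(p, q), chi2_bernoulli(pk, qk)
```

Here `tv_ratio` matches the closed-form factor and `chi2_out` is far smaller than `chi2_in`, which illustrates why privatized samples carry less statistical information and why minimax risks grow under local privacy.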
# 絡み合い支援通信のためのフォールトトレラント符号化 Fault-tolerant Coding for Entanglement-Assisted Communication ( http://arxiv.org/abs/2210.02939v2 ) ライセンス: Link先を確認	Paula Belzig, Matthias Christandl, Alexander Müller-Hermes	(参考訳) チャネル容量は、ノイズの多いチャネル上で情報を確実に送信する最適な速度を定量化する。通常、キャパシティの研究は、送信側と受信側がエンコードとデコードに使用する回路が完全なノイズのないゲートからなると仮定している。しかし、量子チャネル上の通信の場合、この仮定は、デコヒーレンスの過程によって影響を受ける量子情報の脆弱さのために、長期的にも非現実的であると広く信じられている。そのため、ChristandlとMüller-Hermesは、量子チャネルのフォールトトレラントチャネル符号化、すなわちエンコーダ回路とデコーダ回路がノイズに影響を受けるコーディングスキームの研究を開始し、フォールトトレラント量子コンピューティングの技法を用いて古典的および量子的情報を送信するための符号化定理を確立した。ここでは,これらの手法を絡み合い支援通信の場合,特にゲートエラーがゼロに近づくと,耐故障能力が通常の容量に近づくことを示す。独立した関心を持つと思われる主なツールは、フォールトトレラントなエンタングルメント蒸留の導入である。さらに,他のフォールトトレラントな通信シナリオでも容易に適用できるように,使用されるテクニックのモジュール化にも重点を置いています。 Channel capacities quantify the optimal rates of sending information reliably over noisy channels. Usually, the study of capacities assumes that the circuits which sender and receiver use for encoding and decoding consist of perfectly noiseless gates. In the case of communication over quantum channels, however, this assumption is widely believed to be unrealistic, even in the long-term, due to the fragility of quantum information, which is affected by the process of decoherence. Christandl and Müller-Hermes have therefore initiated the study of fault-tolerant channel coding for quantum channels, i.e. coding schemes where encoder and decoder circuits are affected by noise, and have used techniques from fault-tolerant quantum computing to establish coding theorems for sending classical and quantum information in this scenario. Here, we extend these methods to the case of entanglement-assisted communication, in particular proving that the fault-tolerant capacity approaches the usual capacity when the gate error approaches zero. A main tool, which might be of independent interest, is the introduction of fault-tolerant entanglement distillation. 
We furthermore focus on the modularity of the techniques used, so that they can be easily adopted in other fault-tolerant communication scenarios.	翻訳日:2024-02-08 20:58:21 公開日:2024-02-06
# 確率的未発達連帯学習 Stochastic Unrolled Federated Learning ( http://arxiv.org/abs/2305.15371v2 ) ライセンス: Link先を確認	Samar Hadou, Navid NaderiAlizadeh, and Alejandro Ribeiro	(参考訳) アルゴリズムの展開は、学習ベースの最適化パラダイムとして登場し、学習可能なニューラルネットワークオプティマイザで断続的な反復アルゴリズムを展開する。本研究では,その収束を早めるために,アルゴリズムを連帯学習に拡張する手法である確率的連帯学習(surf)を提案する。提案手法は,この拡張の2つの課題,すなわち,非学習最適化者にデータセット全体を供給して,学習の降下方向と分散的な性質を見出す必要性に対処する。我々は,各階層に確率的ミニバッチを供給し,その収束を保証するために降下制約を課すことにより,従来の課題を回避する。本稿では,分散勾配降下(dgd)アルゴリズムをグラフニューラルネットワーク(gnn)ベースの未ロールアーキテクチャに展開することで,連合学習におけるトレーニングの分散性を維持することで,後者の課題に対処する。提案したアンロール最適化器がほぼ最適領域に無限に収束することを理論的に証明する。また,広範な数値実験を通じて,画像分類器の協調学習における提案手法の有効性を実証する。 Algorithm unrolling has emerged as a learning-based optimization paradigm that unfolds truncated iterative algorithms in trainable neural-network optimizers. We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning in order to expedite its convergence. Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers to find a descent direction and the decentralized nature of federated learning. We circumvent the former challenge by feeding stochastic mini-batches to each unrolled layer and imposing descent constraints to guarantee its convergence. We address the latter challenge by unfolding the distributed gradient descent (DGD) algorithm in a graph neural network (GNN)-based unrolled architecture, which preserves the decentralized nature of training in federated learning. We theoretically prove that our proposed unrolled optimizer converges to a near-optimal region infinitely often. Through extensive numerical experiments, we also demonstrate the effectiveness of the proposed framework in collaborative training of image classifiers.	翻訳日:2024-02-08 20:51:16 公開日:2024-02-06
# スケールでの解釈可能性:アルパカにおける因果メカニズムの解明 Interpretability at Scale: Identifying Causal Mechanisms in Alpaca ( http://arxiv.org/abs/2305.08809v3 ) ライセンス: Link先を確認	Zhengxuan Wu, Atticus Geiger, Thomas Icard, Christopher Potts, Noah D. Goodman	(参考訳) 大規模で汎用的な言語モデルの人間解釈可能な説明を得ることは、AI安全性の緊急の目標である。しかし、我々の解釈可能性法は、モデル行動の根底にある因果ダイナミクスに忠実であり、未知の入力に頑健に一般化できることと同じくらい重要である。分散アライメント探索(DAS)は、因果抽象理論に基づく強力な勾配降下法であり、解釈可能なシンボルアルゴリズムと特定のタスクのために微調整された小さなディープラーニングモデルとの完全な整合性を発見した。本稿では,残ったブルートフォースサーチステップを学習パラメーターに置き換え,境界なしdasと呼ぶアプローチにより,dasを格段にスケールする。これにより、命令に従う間、大規模言語モデルにおける解釈可能な因果構造を効率的に探索できる。境界のないdasをalpacaモデル(7bパラメータ)に適用する。このモデルは、そのままの状態で単純な数値推論問題を解く。境界のないdasでは、2つの解釈可能なブール変数を持つ因果モデルを実装することでalpacaがこれを行うことが分かる。さらに,これらの変数に対する神経表現のアライメントは,入力や命令の変化に対して頑健であることが判明した。これらの発見は、我々の成長し、最も広く展開されている言語モデルの内部動作を忠実に理解するための第一歩である。私たちのツールはより大きなLLMに拡張可能で、https://github.com/stanfordnlp/pyvene で公開されています。 Obtaining human-interpretable explanations of large, general-purpose language models is an urgent goal for AI safety. However, it is just as important that our interpretability methods are faithful to the causal dynamics underlying model behavior and able to robustly generalize to unseen inputs. Distributed Alignment Search (DAS) is a powerful gradient descent method grounded in a theory of causal abstraction that has uncovered perfect alignments between interpretable symbolic algorithms and small deep learning models fine-tuned for specific tasks. In the present paper, we scale DAS significantly by replacing the remaining brute-force search steps with learned parameters -- an approach we call Boundless DAS. This enables us to efficiently search for interpretable causal structure in large language models while they follow instructions. We apply Boundless DAS to the Alpaca model (7B parameters), which, off the shelf, solves a simple numerical reasoning problem. With Boundless DAS, we discover that Alpaca does this by implementing a causal model with two interpretable boolean variables. 
Furthermore, we find that the alignment of neural representations with these variables is robust to changes in inputs and instructions. These findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models. Our tool is extensible to larger LLMs and is released publicly at `https://github.com/stanfordnlp/pyvene`.	翻訳日:2024-02-08 20:49:30 公開日:2024-02-06
# 開量子系に対する適応変分シミュレーション Adaptive variational simulation for open quantum systems ( http://arxiv.org/abs/2305.06915v2 ) ライセンス: Link先を確認	Huo Chen, Niladri Gomes, Siyuan Niu and Wibe Albert de Jong	(参考訳) 量子ハードウェアは量子シミュレーションの新しい可能性を提供する。研究の多くはクローズド量子システムのシミュレーションに重点を置いているが、現実の量子システムは大部分がオープンである。したがって、オープン量子システムを効果的にシミュレートできる量子アルゴリズムを開発することが不可欠である。本稿では,lindblad方程式によって記述された開量子系ダイナミクスをシミュレートする適応変分量子アルゴリズムを提案する。このアルゴリズムは,シミュレーション精度を保ち,演算子の動的付加により資源効率の良いアンサーゼを構築するように設計されている。我々は、ノイズレスシミュレータとIBM量子プロセッサの両方におけるアルゴリズムの有効性を検証し、正確な解との定量的および定性的な整合性を観察する。また,必要資源のスケールをシステムサイズと精度で検討し,多項式の挙動を求める。その結果、近未来の量子プロセッサはオープン量子システムをシミュレートできることがわかった。 Emerging quantum hardware provides new possibilities for quantum simulation. While much of the research has focused on simulating closed quantum systems, real-world quantum systems are mostly open. Therefore, it is essential to develop quantum algorithms that can effectively simulate open quantum systems. Here we present an adaptive variational quantum algorithm for simulating open quantum system dynamics described by the Lindblad equation. The algorithm is designed to build resource-efficient ansatze through the dynamical addition of operators while maintaining the simulation accuracy. We validate the effectiveness of our algorithm on both noiseless simulators and IBM quantum processors and observe good quantitative and qualitative agreement with the exact solution. We also investigate the scaling of the required resources with system size and accuracy and find polynomial behavior. Our results demonstrate that near-future quantum processors are capable of simulating open quantum systems.	翻訳日:2024-02-08 20:49:03 公開日:2024-02-06
# 時間依存ハミルトニアンの密度行列のベクトル化とフォン・ノイマン方程式の量子シミュレーション Vectorization of the density matrix and quantum simulation of the von Neumann equation of time-dependent Hamiltonians ( http://arxiv.org/abs/2306.08775v4 ) ライセンス: Link先を確認	Alejandro Kunold	(参考訳) リー代数の性質に基づいて、この研究はフォン・ノイマン方程式を量子シミュレーションに適した形で線形化するための一般的な枠組みを開発した。フォン・ノイマン方程式のこれらの線型化のうちの1つは、状態ベクトルが密度行列の列積要素となり、ハミルトニアン超作用素が$I\otimes H-H^\top \otimes I$、$I$が恒等行列、$H$が標準ハミルトニアンとなる標準的な場合に対応することを示す。この特定の形式はフォン・ノイマン方程式を線型化する方法のより広いクラスに属することが証明されており、それらはそれらの原型である代数によって分類することができる。特に、状態ベクトルの量子トモグラフィーを実質的に単純化する実密度行列係数を与えるエルミート代数に注意が払われる。この考え方に基づき,密度行列のダイナミクスをシミュレートする量子アルゴリズムを提案する。この手法は、パウリ弦によって形成される代数のユニークな性質とともに、トロタライズの使用を避けることができ、したがって回路深さを著しく減少させる。パウリの弦によって形成される代数の特別なケースを使ったとしても、アルゴリズムは他の代数に容易に適用できる。このアルゴリズムはIBMノイズ量子回路シミュレータを用いて2つのおもちゃハミルトンに対して実証される。 Based on the properties of Lie algebras, in this work we develop a general framework to linearize the von Neumann equation, rendering it in a suitable form for quantum simulations. We show that one of these linearizations of the von Neumann equation corresponds to the standard case in which the state vector becomes the column-stacked elements of the density matrix and the Hamiltonian superoperator takes the form $I\otimes H-H^\top \otimes I$, where $I$ is the identity matrix and $H$ is the standard Hamiltonian. It is proven that this particular form belongs to a wider class of ways of linearizing the von Neumann equation that can be categorized by the algebra from which they originated. Particular attention is paid to Hermitian algebras that yield real density matrix coefficients, substantially simplifying the quantum tomography of the state vector. Based on these ideas, a quantum algorithm to simulate the dynamics of the density matrix is proposed. It is shown that this method, along with the unique properties of the algebra formed by Pauli strings, makes it possible to avoid the use of Trotterization, hence considerably reducing the circuit depth. 
Even though we have used the special case of the algebra formed by the Pauli strings, the algorithm can be readily adapted to other algebras. The algorithm is demonstrated for two toy Hamiltonians using the IBM noisy quantum circuit simulator.	翻訳日:2024-02-08 20:37:16 公開日:2024-02-06
# オフライン帯域におけるベイズレジスト最小化 Bayesian Regret Minimization in Offline Bandits ( http://arxiv.org/abs/2306.01237v2 ) ライセンス: Link先を確認	Mohammad Ghavamzadeh, Marek Petrik, Guy Tennenholtz	(参考訳) オフライン線形包帯におけるベイズ的後悔を最小限に抑える決定の仕方について検討する。先行研究は、報酬に対して最大低い信頼率(lcb)の行動を取ることを示唆している。我々は, LCB への依存は本質的にこの設定に欠陥があることを論じ, 効率的な円錐最適化解法を用いてベイズ後悔の上限を直接最小化するアルゴリズムを提案する。我々の限界は金融リスク対策との新たなつながりに重きを置いている。一致した下限を証明し、上限がきついことを示し、それらを最小化することで、LCBアプローチより優れていることが保証される。合成ドメインの数値計算の結果, LCBの最大化よりもアプローチが優れていることが確認された。 We study how to make decisions that minimize Bayesian regret in offline linear bandits. Prior work suggests that one must take actions with maximum lower confidence bound (LCB) on their reward. We argue that reliance on LCB is inherently flawed in this setting and propose a new algorithm that directly minimizes upper bounds on the Bayesian regret using efficient conic optimization solvers. Our bounds build heavily on new connections to monetary risk measures. Proving a matching lower bound, we show that our upper bounds are tight, and by minimizing them we are guaranteed to outperform the LCB approach. Our numerical results on synthetic domains confirm that our approach is superior to maximizing LCB.	翻訳日:2024-02-08 20:34:45 公開日:2024-02-06
# AV2Wav: 音声音声強調のための連続自己教師機能からの拡散に基づく再合成 AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement ( http://arxiv.org/abs/2309.08030v3 ) ライセンス: Link先を確認	Ju-Chieh Chou, Chung-Ming Chien, Karen Livescu	(参考訳) 音声強調システムは通常、クリーンな音声と騒がしい音声のペアを使って訓練される。オーディオ・ヴィジュアル音声強調(AVSE)では、音声・ヴィジュアル・データセットは、背景雑音や残響を伴う現実世界の環境で収集され、AVSEの開発を妨げている。本研究では,実世界の学習データの課題にもかかわらずクリーンな音声を生成できる再生型音声視覚音声強調手法であるAV2Wavを紹介する。ニューラルクオリティ推定器を用いて音声・視覚コーパスからほぼクリーンな音声のサブセットを取得し、このサブセット上で拡散モデルを訓練し、ノイズロバストトレーニングによりAV-HuBERTから連続音声表現に条件付き波形を生成する。韻律や話者情報を保持するために、離散表現よりも連続表現を用いる。このvocodingタスクだけで、モデルはマスキングベースのベースラインよりも音声強調を行うことができる。さらに, クリーン・ノイズ対の拡散モデルを微調整し, 性能向上を図る。提案手法は,自動測定と人間の聴力テストの両方においてマスキングベースのベースラインを上回り,聴力テストにおけるターゲット音声にほぼ近い品質である。オーディオサンプルはhttps://home.ttic.edu/~jcchou/demo/avse/avse_demo.htmlにある。 Speech enhancement systems are typically trained using pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data available; most audio-visual datasets are collected in real-world environments with background noise and reverberation, hampering the development of AVSE. In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data. We obtain a subset of nearly clean speech from an audio-visual corpus using a neural quality estimator, and then train a diffusion model on this subset to generate waveforms conditioned on continuous speech representations from AV-HuBERT with noise-robust training. We use continuous rather than discrete representations to retain prosody and speaker information. With this vocoding task alone, the model can perform speech enhancement better than a masking-based baseline. We further fine-tune the diffusion model on clean/noisy utterance pairs to improve the performance. 
Our approach outperforms a masking-based baseline in terms of both automatic metrics and a human listening test and is close in quality to the target speech in the listening test. Audio samples can be found at https://home.ttic.edu/~jcchou/demo/avse/avse_demo.html.	翻訳日:2024-02-08 20:27:27 公開日:2024-02-06
# 量子絡み合いの幾何学的意味を明らかにする:離散変数系と連続変数系 Unveiling the geometric meaning of quantum entanglement: discrete and continuous variable systems ( http://arxiv.org/abs/2307.16835v2 ) ライセンス: Link先を確認	Arthur Vesperini, Ghofrane Bel-Hadj-Aissa, Lorenzo Capra, and Roberto Franzosi	(参考訳) 量子状態の多様体はリッチで非自明な幾何学的構造を持つことを示す。我々は、多量子ビット量子系の射影ヒルベルト空間のフビニ・スタディ計量を導出し、リーマン計量構造を与え、この空間の状態の絡み合いとの深い関係を調べる。尺度として, [1] で予備的に提案された絡み合い距離 E を採用する。我々の解析は、絡み合いが幾何学的解釈を持つことを示す。すなわち E(|psi>) は |psi> とその共役状態、すなわち状態 v^mu . sigma^mu |psi> の間の平方距離の和の最小値である。ここで v^mu は単位ベクトルであり、mu はパーティにわたって走る。 2つの状態が局所ユニタリ作用素の作用を除いて同じ状態でないかどうかを決定する一般的な手法を導出する。我々は, 絡み合い距離が, 混合状態への凸屋根の拡大とともに, 絡み合い測度に必要な3つの条件を満たすことを証明した。 i) E(|psi>) =0 iff |psi> は完全に分離可能である。 ii) E は局所ユニタリ変換の下で不変である。 iii) E は局所操作と古典通信の下で増加しない。この性質には2つの異なる証明がある。また、2量子ビット純粋状態の場合、状態 |psi> の絡み合い距離は、この状態のコンカレンスの2乗の2倍と一致することも示している。連続変数系に対する絡み合い距離の一般化を提案する。最後に,Greenberger-Horne-Zeilinger状態,Briegel-Raussendorf状態,W状態と結びついた3つの状態族の絡み合いの大きさと同値類の性質の研究に幾何学的アプローチを適用した。連続変数を持つ系の場合の応用例として、2つの結合したグラウバーコヒーレント状態の系を考える。 We show that the manifold of quantum states is endowed with a rich and nontrivial geometric structure. We derive the Fubini-Study metric of the projective Hilbert space of a multi-qubit quantum system, endowing it with a Riemannian metric structure, and investigate its deep link with the entanglement of the states of this space. As a measure, we adopt the Entanglement Distance E preliminarily proposed in [1]. Our analysis shows that entanglement has a geometric interpretation: E(|psi>) is the minimum value of the sum of the squared distances between |psi> and its conjugate states, namely the states v^mu . sigma^mu |psi>, where v^mu are unit vectors and mu runs over the parties. We derive a general method to determine when two states are not the same state up to the action of local unitary operators. 
We prove that the entanglement distance, along with its convex roof expansion to mixed states, fulfills the three conditions required for an entanglement measure, namely: i) E(|psi>) = 0 iff |psi> is fully separable; ii) E is invariant under local unitary transformations; iii) E does not increase under local operations and classical communication. Two different proofs are provided for this latter property. We also show that in the case of two-qubit pure states, the entanglement distance for a state |psi> coincides with two times the square of the concurrence of this state. We propose a generalization of the entanglement distance to continuous variable systems. Finally, we apply the proposed geometric approach to the study of the entanglement magnitude and the equivalence-class properties of three families of states linked to the Greenberger-Horne-Zeilinger states, the Briegel-Raussendorf states and the W states. As an example of an application for the case of a system with continuous variables, we have considered a system of two coupled Glauber coherent states.	翻訳日:2024-02-08 20:26:43 公開日:2024-02-06
# 一貫性のあるoracleによるシンプルなオンライン学習 Simple online learning with consistent oracle ( http://arxiv.org/abs/2308.08055v2 ) ライセンス: Link先を確認	Alexander Kozachinskiy, Tomasz Steifer	(参考訳) 学習アルゴリズムが「consistent oracle」(任意の時点で,これまで見てきたすべての例に一致する関数をクラスから与えることができるオラクル)を介してのみクラスにアクセスできるモデルにおけるオンライン学習を考える。このモデルは最近Assosら(COLT'23)によって検討された。オンライン学習の標準的な方法は、計算的に難解な問題であるサブクラスのリトルストーン次元の計算に依存しているという事実に動機づけられている。Assosらは、リトルストーン次元$d$のクラスに対して、ある未特定の絶対定数$C > 0$について高々$C^d$回の誤りを犯すオンライン学習アルゴリズムをこのモデルで与えた。我々は高々$O(256^d)$回の誤りを犯す新しいアルゴリズムを与える。この証明は非常に単純であり、リトルストーン次元の非常に基本的な性質のみを用いる。また、このモデルには$3^d$回未満の誤りしか犯さないアルゴリズムが存在しないことも示している。 We consider online learning in the model where a learning algorithm can access the class only via the consistent oracle -- an oracle that, at any moment, can give a function from the class that agrees with all examples seen so far. This model was recently considered by Assos et al. (COLT'23). It is motivated by the fact that standard methods of online learning rely on computing the Littlestone dimension of subclasses, a computationally intractable problem. Assos et al. gave an online learning algorithm in this model that makes at most $C^d$ mistakes on classes of Littlestone dimension $d$, for some absolute unspecified constant $C > 0$. We give a novel algorithm that makes at most $O(256^d)$ mistakes. Our proof is significantly simpler and uses only very basic properties of the Littlestone dimension. We also show that there exists no algorithm in this model that makes fewer than $3^d$ mistakes.	翻訳日:2024-02-08 20:12:51 公開日:2024-02-06
# 空間幾何学的推論を必要とするオブジェクトアセンブリタスクにおける視覚的表現のロバスト性評価 Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning ( http://arxiv.org/abs/2310.09943v3 ) ライセンス: Link先を確認	Chahyon Ku, Carl Winge, Ryan Diaz, Wentao Yuan, Karthik Desingh	(参考訳) 本稿では主に、オブジェクトアセンブリタスクのコンテキストにおける視覚表現の堅牢性の評価とベンチマークに焦点をあてる。具体的には、一般にpeg-in-holeタスクと呼ばれる幾何学的押出しと侵入を伴う物体のアライメントと挿入について検討する。成功組立のためにSE(3)空間のペグと穴形状を検出・オリエントするために必要な精度は大きな課題となる。そこで我々はヴィジュアル・エンコーダとして視覚前訓練モデルを利用するvisuomotor policy learningの汎用フレームワークを採用している。本研究は,両腕操作設定,特に把持変動に対して適用した場合のロバスト性について検討する。我々の定量的分析は、既存の事前学習モデルでは、このタスクに必要な視覚的特徴を捉えることができないことを示している。しかし、スクラッチから訓練されたビジュアルエンコーダは、凍結した事前訓練されたモデルよりも一貫して優れている。さらに、政策学習を大幅に改善する回転表現と関連する損失関数について論じる。本稿では,幾何学的・空間的推論を必要とする複雑な組み立て作業のロバスト性向上に特に焦点をあてた,visuomotor policy learningの進歩を評価するための新しいタスクシナリオを提案する。ビデオ、追加の実験、データセット、コードはhttps://bit.ly/geometric-peg-in-hole で入手できる。 This paper primarily focuses on evaluating and benchmarking the robustness of visual representations in the context of object assembly tasks. Specifically, it investigates the alignment and insertion of objects with geometrical extrusions and intrusions, commonly referred to as a peg-in-hole task. The accuracy required to detect and orient the peg and the hole geometry in SE(3) space for successful assembly poses significant challenges. Addressing this, we employ a general framework in visuomotor policy learning that utilizes visual pretraining models as vision encoders. Our study investigates the robustness of this framework when applied to a dual-arm manipulation setup, specifically to the grasp variations. Our quantitative analysis shows that existing pretrained models fail to capture the essential visual features necessary for this task. However, a visual encoder trained from scratch consistently outperforms the frozen pretrained models. Moreover, we discuss rotation representations and associated loss functions that substantially improve policy learning. 
We present a novel task scenario designed to evaluate the progress in visuomotor policy learning, with a specific focus on improving the robustness of intricate assembly tasks that require both geometrical and spatial reasoning. Videos, additional experiments, dataset, and code are available at https://bit.ly/geometric-peg-in-hole .	翻訳日:2024-02-08 20:01:48 公開日:2024-02-06
# FlorDB: 継続的トレーニングのためのマルチバージョン監視ロギング FlorDB: Multiversion Hindsight Logging for Continuous Training ( http://arxiv.org/abs/2310.07898v2 ) ライセンス: Link先を確認	Rolando Garcia, Anusha Dandamudi, Gabriel Matute, Lehan Wan, Joseph Gonzalez, Joseph M. Hellerstein, Koushik Sen	(参考訳) プロダクション機械学習には継続的トレーニングが伴う。複数のバージョンのモデルを時間とともにホストし、多くの場合、複数のモデルバージョンを同時に実行する。モデルパフォーマンスが期待を満たさない場合、機械学習エンジニア(mles)は、多くの以前のバージョンのコードとトレーニングデータの探索と分析を通じて問題をデバッグし、根本原因を特定し、問題を緩和する。従来のデバッグとロギングツールは、実験的なマルチバージョンコンテキストの管理に不足することが多い。 FlorDBはMultiversion Hindsight Loggingを導入し、エンジニアは最新のバージョンのロギングステートメントを使用して過去のバージョンを問い合わせることができる。ログステートメントの伝搬は、コードベースの変更にかかわらず、過去のコードバージョンにロギングステートメントを一貫した注入を可能にする。ログステートメントがコードバージョンに伝播されると、multiversionhindsight loggingの残りの課題は、以前の実行時のチェックポイントに基づいて、新しいログステートメントを効率的に再生することである。最後に、すべてのバージョンのコードとデータのMLEデバッグを支援するために、一貫性のあるユーザエクスペリエンスが必要です。この目的のためにflordbは、履歴クエリを効率的に処理するための統一リレーショナルモデルを提示し、ログ履歴の包括的なビューを提供し、過去のコードのイテレーションの探索を簡単にする。本稿では,クエリベースのフィルタリングとチェックポイントベースの並列処理を有効活用し,そのスケーラビリティとリアルタイムクエリ応答能力を確認した多種多様なベンチマークの性能評価を行う。 Production Machine Learning involves continuous training: hosting multiple versions of models over time, often with many model versions running at once. When model performance does not meet expectations, Machine Learning Engineers (MLEs) debug issues by exploring and analyzing numerous prior versions of code and training data to identify root causes and mitigate problems. Traditional debugging and logging tools often fall short in managing this experimental, multi-version context. FlorDB introduces Multiversion Hindsight Logging, which allows engineers to use the most recent version's logging statements to query past versions, even when older versions logged different data. Log statement propagation enables consistent injection of logging statements into past code versions, regardless of changes to the codebase. 
Once log statements are propagated across code versions, the remaining challenge in Multiversion Hindsight Logging is to efficiently replay the new log statements based on checkpoints from previous runs. Finally, a coherent user experience is required to help MLEs debug across all versions of code and data. To this end, FlorDB presents a unified relational model for efficient handling of historical queries, offering a comprehensive view of the log history to simplify the exploration of past code iterations. We present a performance evaluation on diverse benchmarks confirming its scalability and the ability to deliver real-time query responses, leveraging query-based filtering and checkpoint-based parallelism for efficient replay.	翻訳日:2024-02-08 20:01:04 公開日:2024-02-06
# 置換不変な量子符号の族 A family of permutationally invariant quantum codes ( http://arxiv.org/abs/2310.05358v2 ) ライセンス: Link先を確認	Arda Aydin, Max A. Alekseyev, Alexander Barg	(参考訳) 任意の$t\ge 1$に対して$t$ Pauliエラーを補正する、置換不変コードの新しいファミリーを構築します。また,新しい系統の符号は,量子欠失誤差と自発的減衰誤差を補正することを示した。我々の構成は、以前に知られている置換不変量子符号のいくつかを特別な場合として含んでおり、これらはトランスバーサルゲートも許容する。多くの場合、新しいファミリーの符号は、ポーリの誤りや削除に対する最もよく知られた明示的な置換不変符号よりも短い。さらに、新しいコードファミリーには、新しい$((4,2,2))$最適単一削除訂正符号が含まれています。別の結果として、置換不変符号が$t$個のPauliエラーを補正するための条件を、既知の$t=1$の結果から任意の数のエラーへと一般化する。小さな$t$の場合、これらの条件はコンピュータによるコードの新しい例を構築するのに使うことができる。 We construct a new family of permutationally invariant codes that correct $t$ Pauli errors for any $t\ge 1$. We also show that codes in the new family correct quantum deletion errors as well as spontaneous decay errors. Our construction contains some of the previously known permutationally invariant quantum codes as particular cases, which also admit transversal gates. In many cases, the codes in the new family are shorter than the best previously known explicit permutationally invariant codes for Pauli errors and deletions. Furthermore, our new code family includes a new $((4,2,2))$ optimal single-deletion-correcting code. As a separate result, we generalize the conditions for permutationally invariant codes to correct $t$ Pauli errors from the previously known results for $t=1$ to any number of errors. For small $t$, these conditions can be used to construct new examples of codes by computer.	翻訳日:2024-02-08 20:00:02 公開日:2024-02-06
# 転送可能なグラフオートエンコーダによるネットワークアライメント Network Alignment with Transferable Graph Autoencoders ( http://arxiv.org/abs/2310.03272v2 ) ライセンス: Link先を確認	Jiashu He, Charilaos I. Kanatsoulis, Alejandro Ribeiro	(参考訳) ネットワークアライメントは、異なるグラフのノード間の1対1の対応を確立し、ハイインパクトなドメインで多くのアプリケーションを見つけるタスクである。しかし、このタスクはNPハードであることが知られており、既存のアルゴリズムはグラフのサイズが大きくなるにつれてスケールアップしない。そこで我々は,アライメントタスクに適合した,強力でロバストなノード埋め込みを抽出することを目的とした,新しい一般化グラフオートエンコーダアーキテクチャを提案する。生成した埋め込みはグラフの固有値と固有ベクトルに関連付けられ、古典的なスペクトル法と比較してより正確なアライメントが得られることが証明される。また,提案フレームワークでは,転送学習とデータ拡張を利用して,再トレーニングすることなく大規模ネットワークアライメントを実現している。実世界のグラフとのネットワークとサブネットワークの連携に関する広範囲な実験は、提案手法の有効性とスケーラビリティを裏付ける証拠を提供する。 Network alignment is the task of establishing one-to-one correspondences between the nodes of different graphs and finds a plethora of applications in high-impact domains. However, this task is known to be NP-hard in its general form, and existing algorithms do not scale up as the size of the graphs increases. To tackle both challenges we propose a novel generalized graph autoencoder architecture, designed to extract powerful and robust node embeddings, that are tailored to the alignment task. We prove that the generated embeddings are associated with the eigenvalues and eigenvectors of the graphs and can achieve more accurate alignment compared to classical spectral methods. Our proposed framework also leverages transfer learning and data augmentation to achieve efficient network alignment at a very large scale without retraining. Extensive experiments on both network and sub-network alignment with real-world graphs provide corroborating evidence supporting the effectiveness and scalability of the proposed approach.	翻訳日:2024-02-08 19:58:44 公開日:2024-02-06
# テキストから画像への拡散によるドメインの変換:ドメイン適応へのソースフリーアプローチ Transcending Domains through Text-to-Image Diffusion: A Source-Free Approach to Domain Adaptation ( http://arxiv.org/abs/2310.01701v4 ) ライセンス: Link先を確認	Shivang Chopra, Suraj Kothawade, Houda Aynaou, Aman Chadha	(参考訳) ドメイン適応(da)は、モデルが関連するソースドメインから取得した情報を十分なラベル付きデータで適用することにより、不十分なアノテートデータを持つ対象ドメインにおけるモデルの性能を向上させる手法である。 HIPAA、COPPA、FERPAなどのデータプライバシ規制の実施が、ソースデータに直接アクセスする必要を回避しつつ、新しいドメインにモデルを適用することへの関心を高め、ソースフリードメイン適応(Source-free Domain Adaptation、SFDA)と呼ばれる問題を引き起こした。本稿では,対象ドメインのサンプルに基づいて訓練されたテキスト・画像拡散モデルを用いて,ソースデータを生成する新しいSFDAフレームワークを提案する。提案手法は,ラベル付き対象領域のサンプルに対してテキスト・画像拡散モデルをトレーニングし,事前学習したソースモデルを用いて微調整を行い,ソースデータに近いサンプルを生成する。最後に、ドメイン適応技術を用いて、人工的に生成されたソースデータを対象のドメインデータと整合させることにより、ターゲットのドメイン上でのモデルの性能が大幅に向上する。標準のoffice-31, office-home, visdaベンチマークにおける複数のベースラインとの比較を行い,sfdaタスクに対するアプローチの有効性を実証した。 Domain Adaptation (DA) is a method for enhancing a model's performance on a target domain with inadequate annotated data by applying the information the model has acquired from a related source domain with sufficient labeled data. The escalating enforcement of data-privacy regulations such as HIPAA, COPPA, and FERPA has sparked a heightened interest in adapting models to novel domains while circumventing the need for direct access to the source data, a problem known as Source-Free Domain Adaptation (SFDA). In this paper, we propose a novel framework for SFDA that generates source data using a text-to-image diffusion model trained on the target domain samples. Our method starts by training a text-to-image diffusion model on the labeled target domain samples, which is then fine-tuned using the pre-trained source model to generate samples close to the source data. Finally, we use Domain Adaptation techniques to align the artificially generated source data with the target domain data, resulting in significant performance improvements of the model on the target domain. 
Through extensive comparison against several baselines on the standard Office-31, Office-Home, and VisDA benchmarks, we demonstrate the effectiveness of our approach for the SFDA task.	翻訳日:2024-02-08 19:58:29 公開日:2024-02-06
# 動的ゴール認識フラグメントによる薬物発見 Drug Discovery with Dynamic Goal-aware Fragments ( http://arxiv.org/abs/2310.00841v2 ) ライセンス: Link先を確認	Seul Lee, Seanie Lee, Kenji Kawaguchi, Sung Ju Hwang	(参考訳) フラグメントに基づく薬物発見は、広大な化学領域における薬物候補の発見に有効な戦略であり、分子生成モデルに広く用いられている。しかし、そのようなモデルにおける既存の断片抽出法の多くは、対象の化学的性質を考慮せず、ヒューリスティックな規則に依存する。さらに、既存のフラグメントベースの生成モデルは、生成中に新たに発見されたゴール対応のフラグメントでフラグメント語彙を更新できない。そこで本研究では,創薬のための分子生成フレームワークであるgoal-aware fragment extraction, assembly and modified (geam)を提案する。 GEAMは3つのモジュールから構成されており、それぞれがゴール対応のフラグメント抽出、フラグメントアセンブリ、フラグメント修正を担当している。フラグメント抽出モジュールは、情報ボトルネック原理により、所望の目標プロパティに寄与する重要なフラグメントを識別し、有効ゴール対応フラグメント語彙を構築する。さらに、GEAMはフラグメント修正モジュールで最初の語彙を超える探索が可能であり、動的ゴール対応語彙更新によってさらに探索が強化される。 GEAMは, 薬物発見タスクにおける3つのモジュールの生成サイクルを通じて, 薬物候補を効果的に発見できることを実験的に実証した。 Fragment-based drug discovery is an effective strategy for discovering drug candidates in the vast chemical space, and has been widely employed in molecular generative models. However, many existing fragment extraction methods in such models do not take the target chemical properties into account or rely on heuristic rules. Additionally, the existing fragment-based generative models cannot update the fragment vocabulary with goal-aware fragments newly discovered during the generation. To this end, we propose a molecular generative framework for drug discovery, named Goal-aware fragment Extraction, Assembly, and Modification (GEAM). GEAM consists of three modules, each responsible for goal-aware fragment extraction, fragment assembly, and fragment modification. The fragment extraction module identifies important fragments contributing to the desired target properties with the information bottleneck principle, thereby constructing an effective goal-aware fragment vocabulary. Moreover, GEAM can explore beyond the initial vocabulary with the fragment modification module, and the exploration is further enhanced through the dynamic goal-aware vocabulary update. 
We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules in various drug discovery tasks.	翻訳日:2024-02-08 19:57:44 公開日:2024-02-06
# FENDA-FL : 不均一な臨床データを用いた個人化フェデレーション学習 FENDA-FL: Personalized Federated Learning on Heterogeneous Clinical Datasets ( http://arxiv.org/abs/2309.16825v2 ) ライセンス: Link先を確認	Fatemeh Tavakoli, D.B. Emerson, Sana Ayromlou, John Jewell, Amrit Krishnan, Yuchong Zhang, Amol Verma, Fahad Razak	(参考訳) フェデレーテッド・ラーニング(FL)は、臨床環境での機械学習モデルのトレーニングと展開を頻繁に妨害するデータサイロを克服するための重要なアプローチとして、ますます認識されている。この研究は、3つの重要な方向に沿って臨床応用に焦点を当てたfl研究の発展に寄与している。まず、FLambyベンチマーク(du Terrail et al., 2022a)を拡張し、パーソナライズされたFL手法の評価を行い、元の結果よりも実質的な性能改善を示す。次に,実際の設定を反映し,複数の比較基準を提供するために,FLの総合的なチェックポイントと評価フレームワークを提案する。最後に,perfcl(zhang et al., 2022)の重要なアブレーションについて検討した。このアブレーションは、FL設定へのFENDA(Kim et al., 2016)の自然な拡張である。 flambyベンチマークとgeminiデータセット(verma et al., 2017)で実施した実験によると、このアプローチは異種臨床データに対して堅牢であり、perfclを含む既存のグローバルおよびパーソナライズされたfl技術を上回ることが多い。 Federated learning (FL) is increasingly being recognized as a key approach to overcoming the data silos that so frequently obstruct the training and deployment of machine-learning models in clinical settings. This work contributes to a growing body of FL research specifically focused on clinical applications along three important directions. First, we expand the FLamby benchmark (du Terrail et al., 2022a) to include evaluation of personalized FL methods and demonstrate substantive performance improvements over the original results. Next, we advocate for a comprehensive checkpointing and evaluation framework for FL to reflect practical settings and provide multiple comparison baselines. Finally, we study an important ablation of PerFCL (Zhang et al., 2022). This ablation is a natural extension of FENDA (Kim et al., 2016) to the FL setting. Experiments conducted on the FLamby benchmarks and GEMINI datasets (Verma et al., 2017) show that the approach is robust to heterogeneous clinical data and often outperforms existing global and personalized FL techniques, including PerFCL.	翻訳日:2024-02-08 19:57:25 公開日:2024-02-06
# 確率モデルに基づくメタ強化学習によるデータ効率の高いタスク一般化 Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning ( http://arxiv.org/abs/2311.07558v2 ) ライセンス: Link先を確認	Arjun Bhardwaj, Jonas Rothfuss, Bhavya Sukhija, Yarden As, Marco Hutter, Stelian Coros, Andreas Krause	(参考訳) 本稿では,モデルに基づくメタ強化学習(Meta-RL)アルゴリズムであるPACOH-RLを紹介する。 PACOH-RLメタ学習は動的モデルに先行し、最小の相互作用データを持つ新しい力学への迅速な適応を可能にする。既存のメタrlメソッドは豊富なメタラーニングデータを必要とするため、データ取得にコストがかかるロボティクスなどの設定での適用性が制限される。これを解決するため、PACOH-RLは、メタラーニングとタスク適応の段階において、正規化と疫学的不確実性の定量化を取り入れている。新しいダイナミクスに直面するとき、探索とデータ収集を効果的に導くために、これらの不確実性推定を使用する。全体として、以前のタスクや動的設定からのデータにアクセスしても、ポジティブな転送が可能になる。実験の結果,PACOH-RLはモデルベースRLおよびモデルベースMeta-RLベースラインよりも高い性能を示し,新しい動的条件に適応した。最後に、実車上では、多種多様なデータスカース条件下での効率的なRLポリシー適応の可能性を示す。 We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics, where data is costly to obtain. To address this, PACOH-RL incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages. When facing new dynamics, we use these uncertainty estimates to effectively guide exploration and data collection. Overall, this enables positive transfer, even when access to data from prior tasks or dynamic settings is severely limited. Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real robotic car, we showcase the potential for efficient RL policy adaptation in diverse, data-scarce conditions.	翻訳日:2024-02-08 19:49:18 公開日:2024-02-06
# 時間同期配電系統状態推定のためのディープニューラルネットワークの性能解析検証 Analytical Verification of Deep Neural Network Performance for Time-Synchronized Distribution System State Estimation ( http://arxiv.org/abs/2311.06973v2 ) ライセンス: Link先を確認	Behrouz Azimian, Shiva Moshtagh, Anamitra Pal, Shanshan Ma	(参考訳) 近年,リアルタイム観測不能な分散システムのためのディープニューラルネットワーク(DNN)を用いた時間同期状態推定器の成功例が報告されている。本稿では,入力測定における摂動関数として,その状態推定器の性能に関する解析的境界を与える。テストデータセットのみに基づいてパフォーマンスを評価することは、トレーニング済みのDNNが入力摂動を処理する能力を効果的に示すものではないことがすでに示されている。そこで我々はDNNの堅牢性と信頼性を解析的に検証し,それらを混合整数線形プログラミング(MILP)問題として扱う。 MILP定式化のスケーラビリティ制限に対処する際のバッチ正規化の能力も強調されている。このフレームワークは、修正されたieee 34ノードシステムと実世界の大規模分散システムに対する時間同期分布系状態推定を行い、いずれもマイクロファサー測定ユニットによって不完全に観測される。 Recently, we demonstrated success of a time-synchronized state estimator using deep neural networks (DNNs) for real-time unobservable distribution systems. In this letter, we provide analytical bounds on the performance of that state estimator as a function of perturbations in the input measurements. It has already been shown that evaluating performance based on only the test dataset might not effectively indicate a trained DNN's ability to handle input perturbations. As such, we analytically verify robustness and trustworthiness of DNNs to input perturbations by treating them as mixed-integer linear programming (MILP) problems. The ability of batch normalization in addressing the scalability limitations of the MILP formulation is also highlighted. The framework is validated by performing time-synchronized distribution system state estimation for a modified IEEE 34-node system and a real-world large distribution system, both of which are incompletely observed by micro-phasor measurement units.	翻訳日:2024-02-08 19:48:57 公開日:2024-02-06
# 無線ネットワークにおけるビデオキャッシングのためのリソースアウェア階層型フェデレート学習 Resource-Aware Hierarchical Federated Learning for Video Caching in Wireless Networks ( http://arxiv.org/abs/2311.06918v2 ) ライセンス: Link先を確認	Md Ferdous Pervej and Andreas F Molisch	(参考訳) ビデオキャッシングは、ユーザーが頻繁に要求する人気のコンテンツをローカルに保存することで、交通渋滞を著しく改善することができる。ユーザの要求が時間とともにどのように変化するかを学ぶためには,プライバシ保護手法が望ましい。そこで本研究では,コンテンツ要求が散発的であり,ユーザのデータセットは要求されたコンテンツの情報に基づいてのみ更新可能であるという現実的な仮定の下で,ユーザの今後のコンテンツ要求を予測するための,リソース対応階層型学習(RawHFL)ソリューションを提案する。部分的なクライアント参加の場合を考えると、まず、クライアントのローカルトレーニングラウンドに依存するグローバルグラデーションノルムの上限と、無線リンク上で蓄積されたグラデーションの受信の成功を導出する。遅延,エネルギー,無線リソースの制約の下で,RawHFLの収束をエネルギー効率よく促進する重み付きユーティリティ関数を最小化するために,クライアントの選択とその局所ラウンドとCPU周波数を最適化する。シミュレーション結果から,提案手法は予測精度と総エネルギー消費量の点で基準値を大きく上回ることがわかった。 Video caching can significantly improve backhaul traffic congestion by locally storing the popular content that users frequently request. A privacy-preserving method is desirable to learn how users' demands change over time. As such, this paper proposes a novel resource-aware hierarchical federated learning (RawHFL) solution to predict users' future content requests under the realistic assumptions that content requests are sporadic and users' datasets can only be updated based on the requested content's information. Considering a partial client participation case, we first derive the upper bound of the global gradient norm that depends on the clients' local training rounds and the successful reception of their accumulated gradients over the wireless links. Under delay, energy and radio resource constraints, we then optimize client selection and their local rounds and central processing unit (CPU) frequencies to minimize a weighted utility function that facilitates RawHFL's convergence in an energy-efficient way. Our simulation results show that the proposed solution significantly outperforms the considered baselines in terms of prediction accuracy and total energy expenditure.	翻訳日:2024-02-08 19:48:44 公開日:2024-02-06
# 強化学習におけるトンプソンサンプリングのためのベイズ回帰境界の改良 Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning ( http://arxiv.org/abs/2310.20007v2 ) ライセンス: Link先を確認	Ahmadreza Moradipari, Mohammad Pedramfar, Modjtaba Shokrian Zini, Vaneet Aggarwal	(参考訳) 本稿では,複数設定の強化学習におけるトンプソンサンプリングに対する最初のベイズ的後悔の限界を実証する。本稿では,サロゲート環境の離散セットを用いた学習問題を単純化し,後方整合性を用いた情報比率の高精度解析を提案する。これは、$H$ がエピソードの長さ、$d_{l_1}$ が環境空間のコルモゴロフ $l_1-$dimensionであるような不均質な強化学習問題において、順序 $\widetilde{O}(H\sqrt{d_{l_1}T})$ の上限となる。次に、表、線形、有限混合といった様々な設定で$d_{l_1}$の具体的な境界を見つけ、その結果がどのようにそれらの種類の最初のものであるか、それとも最先端の技術を改善するかについて議論する。 In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We simplify the learning problem using a discrete set of surrogate environments, and present a refined analysis of the information ratio using posterior consistency. This leads to an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time inhomogeneous reinforcement learning problem where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1-$dimension of the space of environments. We then find concrete bounds of $d_{l_1}$ in a variety of settings, such as tabular, linear and finite mixtures, and discuss how our results are either the first of their kind or improve the state-of-the-art.	翻訳日:2024-02-08 19:46:56 公開日:2024-02-06
# 古典的なfrenet-serret装置から量子力学的進化の曲率とねじれまで。第2部。非定常ハミルトニアン From the classical Frenet-Serret apparatus to the curvature and torsion of quantum-mechanical evolutions. Part II. Nonstationary Hamiltonians ( http://arxiv.org/abs/2311.18463v2 ) ライセンス: Link先を確認	Paul M. Alsing, Carlo Cafaro	(参考訳) 非定常ハミルトニアンの下で進化する状態ベクトルによって追跡される量子曲線の曲げとねじれの定量化に関する幾何学的視点を示す。具体的には, 定常ハミルトニアンの既存の幾何学的視点に基づき, 時変曲率とねじれ係数の両方が重要な役割を果たす時間依存量子力学的シナリオへの理論的構成の一般化について論じる。具体的には、シュロディンガー発展方程式を規定する時間依存ハミルトニアンの下で一元的に進化する平行移動純量子状態によってトレースされる射影ヒルベルト空間における量子軌道に対するフレネット・セルレート装置の量子バージョンを提案する。時変曲率係数は、接ベクトルと状態ベクトルの共変微分の2乗の大きさで指定され、量子曲線の曲げを測定する。時間変化のねじれ係数は、接ベクトルの共変微分の状態ベクトルへの射影の大きさの2乗、接ベクトルと状態ベクトルに直交し、さらに量子曲線のねじれを測定することによって与えられる。時間変化の設定は、統計的観点からよりリッチな構造を示す。例えば、時間に依存しない構成とは異なり、一般化された分散の概念は非定常ハミルトニアンの下で進化する量子状態によってトレースされる曲線のねじれの定義において非自明に入る。本手法の意義を物理的に説明するために, 正弦波振動時間依存ポテンシャルによって特定される, 完全に可溶な時間依存二状態rabi問題に適用する。 We present a geometric perspective on how to quantify the bending and the twisting of quantum curves traced by state vectors evolving under nonstationary Hamiltonians. Specifically, relying on the existing geometric viewpoint for stationary Hamiltonians, we discuss the generalization of our theoretical construct to time-dependent quantum-mechanical scenarios where both time-varying curvature and torsion coefficients play a key role. Specifically, we present a quantum version of the Frenet-Serret apparatus for a quantum trajectory in projective Hilbert space traced out by a parallel-transported pure quantum state evolving unitarily under a time-dependent Hamiltonian specifying the Schrodinger evolution equation. The time-varying curvature coefficient is specified by the magnitude squared of the covariant derivative of the tangent vector to the state vector and measures the bending of the quantum curve. 
The time-varying torsion coefficient, instead, is given by the magnitude squared of the projection of the covariant derivative of the tangent vector to the state vector, orthogonal to the tangent vector and state vector and, in addition, measures the twisting of the quantum curve. We find that the time-varying setting exhibits a richer structure from a statistical standpoint. For instance, unlike the time-independent configuration, we find that the notion of generalized variance enters nontrivially in the definition of the torsion of a curve traced out by a quantum state evolving under a nonstationary Hamiltonian. To physically illustrate the significance of our construct, we apply it to an exactly soluble time-dependent two-state Rabi problem specified by a sinusoidal oscillating time-dependent potential...	翻訳日:2024-02-08 19:35:39 公開日:2024-02-06
# lightgaussian: 15倍縮小200fpsの非有界3次元ガウス圧縮 LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS ( http://arxiv.org/abs/2311.17245v4 ) ライセンス: Link先を確認	Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang	(参考訳) ポイントベース技術を用いたリアルタイムニューラルレンダリングの最近の進歩は、3D表現の普及の道を開いた。しかし、3D Gaussian Splattingのような基本的なアプローチは、SfMポイントを数百万に拡大し、単一の無制限シーンに対してギガバイトレベルのディスクスペースを必要とすることがあり、大きなスケーラビリティ上の課題を生じさせ、スティング効率を妨げている。この課題に対処するために、我々は3Dガウスをより効率的でコンパクトなフォーマットに変換するために設計された新しい方法であるLightGaussianを紹介する。ネットワークプルーニングの概念からインスピレーションを得て、lightgaussianはシーンの再構築に寄与しないガウス人を特定し、プルーニングとリカバリのプロセスを採用し、視覚効果を保ちながらガウス数における冗長性を効果的に削減した。さらに、LightGaussianは、蒸留と擬似ビュー拡張を使用して球面調和を低い程度に蒸留し、反射性を維持しながらよりコンパクトな表現への知識伝達を可能にする。さらに,全ての属性を量子化するハイブリッド方式であるVecTree Quantizationを提案する。要約すると、LightGaussian は FPS を 139 から 215 に向上させ、Mip-NeRF 360, Tank と Temple のデータセット上の複雑なシーンの効率的な表現を可能にした。プロジェクトウェブサイト: https://lightgaussian.github.io/ Recent advancements in real-time neural rendering using point-based techniques have paved the way for the widespread adoption of 3D representations. However, foundational approaches like 3D Gaussian Splatting come with a substantial storage overhead caused by growing the SfM points to millions, often demanding gigabyte-level disk space for a single unbounded scene, posing significant scalability challenges and hindering the splatting efficiency. To address this challenge, we introduce LightGaussian, a novel method designed to transform 3D Gaussians into a more efficient and compact format. Drawing inspiration from the concept of Network Pruning, LightGaussian identifies Gaussians that are insignificant in contributing to the scene reconstruction and adopts a pruning and recovery process, effectively reducing redundancy in Gaussian counts while preserving visual effects. 
Additionally, LightGaussian employs distillation and pseudo-view augmentation to distill spherical harmonics to a lower degree, allowing knowledge transfer to more compact representations while maintaining reflectance. Furthermore, we propose a hybrid scheme, VecTree Quantization, to quantize all attributes, resulting in lower bitwidth representations with minimal accuracy losses. In summary, LightGaussian achieves an averaged compression rate over 15x while boosting the FPS from 139 to 215, enabling an efficient representation of complex scenes on Mip-NeRF 360, Tank and Temple datasets. Project website: https://lightgaussian.github.io/	翻訳日:2024-02-08 19:35:11 公開日:2024-02-06
# VALUED -- 視覚と論理的理解評価データセット VALUED -- Vision and Logical Understanding Evaluation Dataset ( http://arxiv.org/abs/2311.12610v2 ) ライセンス: Link先を確認	Soumadeep Saha, Saptarshi Saha, Utpal Garain	(参考訳) コンピュータビジョンタスクの初期の成功から始まり、ディープラーニングベースの技術は、多くの領域で最先端の技術アプローチを追い越してきた。しかし、これらの手法が意味的文脈や論理的制約を捉えず、答えに到達するには素早い相関に依存することが何度も示されてきた。批判シナリオへのディープラーニング技術の適用は、ドメイン固有の制約の遵守に依存しているため、この問題に対処するためのいくつかの試みがなされている。この領域の徹底的な探索を控える制限のひとつは、豊富なルールを特徴とする適切なデータセットの欠如である。この問題に対処するため,200,000$+$の注釈付き画像と関連するルールセットからなるVALUE(Vision And Logical Understanding Evaluation)データセットを,人気ボードゲームであるチェスに基づいて提示する。キュレートされたルールセットは許容可能な予測のセットをかなり制約し、ローカライゼーションや列挙のようなキーセマンティックな能力を探索するように設計されている。標準的なメトリクスに加えて、論理的一貫性に関するパフォーマンスを測定するための追加メトリクスも提示される。我々は,このタスクにおけるアートビジョンモデルの人気と現状を分析し,標準メトリクスのパフォーマンスは評価可能であるが,無矛盾な結果が多数得られており,このデータセットが今後の作業において重要な課題であることを示す。 Starting with early successes in computer vision tasks, deep learning based techniques have since overtaken state of the art approaches in a multitude of domains. However, it has been demonstrated time and again that these techniques fail to capture semantic context and logical constraints, instead often relying on spurious correlations to arrive at the answer. Since application of deep learning techniques to critical scenarios is dependent on adherence to domain specific constraints, several attempts have been made to address this issue. One limitation holding back a thorough exploration of this area is a lack of suitable datasets which feature a rich set of rules. In order to address this, we present the VALUE (Vision And Logical Understanding Evaluation) Dataset, consisting of 200,000$+$ annotated images and an associated rule set, based on the popular board game - chess. The curated rule set considerably constrains the set of allowable predictions, and is designed to probe key semantic abilities like localization and enumeration. Alongside standard metrics, additional metrics to measure performance with regard to logical consistency are presented. 
We analyze several popular and state of the art vision models on this task, and show that, although their performance on standard metrics is laudable, they produce a plethora of incoherent results, indicating that this dataset presents a significant challenge for future works.	翻訳日:2024-02-08 19:33:55 公開日:2024-02-06
# (なぜ) 私のプロンプトはもっと悪いのか? LLM APIの進化における回帰テストの再考 (Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs ( http://arxiv.org/abs/2311.11123v2 ) ライセンス: Link先を確認	Wanqin Ma, Chenyang Yang, Christian K\"astner	(参考訳) 大規模言語モデル(LLM)はますますソフトウェアアプリケーションに統合されている。下流のアプリケーション開発者は、サービスとして提供されるAPIを通じてLLMにアクセスすることが多い。しかし、LLM APIは、しばしば静かに更新され、非推奨にされ、ユーザーは進化するモデルに継続的に適応せざるを得ない。これは性能の低下を引き起こし、毒性検出のケーススタディで証明されているように、迅速な設計選択に影響を与える可能性がある。ケーススタディに基づき、LLM APIの進化における回帰テストの概念の必要性と再検討を強調した。 LLMの回帰テストには、異なる正確性の概念、不安定性の促進、LLM APIの非決定性など、従来のテストアプローチに根本的な変更が必要であると我々は主張する。 Large Language Models (LLMs) are increasingly integrated into software applications. Downstream application developers often access LLMs through APIs provided as a service. However, LLM APIs are often updated silently and scheduled to be deprecated, forcing users to continuously adapt to evolving models. This can cause performance regression and affect prompt design choices, as evidenced by our case study on toxicity detection. Based on our case study, we emphasize the need for and re-examine the concept of regression testing for evolving LLM APIs. We argue that regression testing LLMs requires fundamental changes to traditional testing approaches, due to different correctness notions, prompting brittleness, and non-determinism in LLM APIs.	翻訳日:2024-02-08 19:33:32 公開日:2024-02-06
# プロンプト工学から、ループの中の人間とのプロンプト科学へ From Prompt Engineering to Prompt Science With Human in the Loop ( http://arxiv.org/abs/2401.04122v2 ) ライセンス: Link先を確認	Chirag Shah	(参考訳) LLMが私たちの生活の様々な側面に進出するにつれ、LLMの使用に関する精査が増加するのは科学的研究である。研究目的のデータの生成や分析にLLMを使うことが普及している。しかし、そのようなアプリケーションがアドホックな決定とエンジニアリングのソリューションに満ちている場合、その研究、その発見、またはその研究に基づく将来にどのように影響するかを心配する必要があります。研究にLLMを使うには、もっと科学的アプローチが必要です。より体系的なプロンプトの構築を支援するための活動はいくつかあるが、しばしば、十分な透明性、客観性、または厳密さで複製可能で一般化可能な知識を生成するよりも、望ましい結果を達成することに重点を置いている。本稿では,質的手法によるコードブック構築に着想を得た新しい手法を提案する。この手法は、ループ内の人間と多相検証プロセスを用いて、データ分析にLLMを適用するためのより体系的で客観的で信頼できる方法の基礎を定めている。具体的には、一連の研究者が厳密なラベル付け、検討、文書化のプロセスを通じて、主観性を排除し、透明性と複製性を生成プロセスにもたらす方法を示す。 As LLMs make their way into many aspects of our lives, one place that warrants increased scrutiny with LLM usage is scientific research. Using LLMs for generating or analyzing data for research purposes is gaining popularity. But when such application is marred by ad-hoc decisions and engineering solutions, we need to be concerned about how it may affect that research, its findings, or any future works based on that research. We need a more scientific approach to using LLMs in our research. While there are several active efforts to support more systematic construction of prompts, they are often focused more on achieving desirable outcomes rather than producing replicable and generalizable knowledge with sufficient transparency, objectivity, or rigor. This article presents a new methodology inspired by codebook construction through qualitative methods to address that. Using humans in the loop and a multi-phase verification process, this methodology lays a foundation for a more systematic, objective, and trustworthy way of applying LLMs for analyzing data. Specifically, we show how a set of researchers can work through a rigorous process of labeling, deliberating, and documenting to remove subjectivity and bring transparency and replicability to the prompt generation process.	翻訳日:2024-02-08 19:23:55 公開日:2024-02-06
# diarizationlm:大規模言語モデルを用いた話者ダイアリゼーション後処理 DiarizationLM: Speaker Diarization Post-Processing with Large Language Models ( http://arxiv.org/abs/2401.03506v4 ) ライセンス: Link先を確認	Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao	(参考訳) 本稿では,大言語モデル(LLM)を利用して話者ダイアリゼーションシステムから出力を後処理するフレームワークであるダイアリゼーションLMを紹介する。提案するフレームワークでは,ダイアリゼーション文字の可読性の向上や,単語ダイアリゼーション誤り率(WDER)の低減など,さまざまな目標を達成することができる。この枠組みでは、自動音声認識(asr)および話者ダイアリゼーションシステムの出力を、任意に微調整されたllmへのプロンプトに含まれるコンパクトテキスト形式として表現する。 LLMの出力は、所望の増強で精製ダイアリゼーション結果として用いることができる。処理後ステップとして、このフレームワークは既存のコンポーネントを再トレーニングすることなく、任意の既製のasrおよび話者ダイアリゼーションシステムに容易に適用できる。実験の結果,微調整された PaLM 2-S モデルにより WDER を rel で低減できることがわかった。 Fisher 電話の会話データセットで55.5%、rel。 44.9%であった。 In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system. Various goals can be achieved with the proposed framework, such as improving the readability of the diarized transcript, or reducing the word diarization error rate (WDER). In this framework, the outputs of the automatic speech recognition (ASR) and speaker diarization systems are represented as a compact textual format, which is included in the prompt to an optionally finetuned LLM. The outputs of the LLM can be used as the refined diarization results with the desired enhancement. As a post-processing step, this framework can be easily applied to any off-the-shelf ASR and speaker diarization systems without retraining existing components. Our experiments show that a finetuned PaLM 2-S model can reduce the WDER by rel. 55.5% on the Fisher telephone conversation dataset, and rel. 44.9% on the Callhome English dataset.	翻訳日:2024-02-08 19:23:33 公開日:2024-02-06
# 教師なし深層学習画像検証法 Unsupervised Deep Learning Image Verification Method ( http://arxiv.org/abs/2312.14395v2 ) ライセンス: Link先を確認	Enoch Solomon, Abraham Woubie and Eyael Solomon Emiru	(参考訳) ディープラーニングは一般的に画像認識に使用されるが、通常は大量のラベル付きトレーニングデータが必要である。これにより、最先端の教師なし顔認証技術と比較すると、顕著な性能格差が生じる。本研究では,顔画像ベクトルを新しい表現に変換するオートエンコーダを利用して,このギャップを狭める手法を提案する。特に、オートエンコーダは、元の入力画像ベクトルではなく、隣接する顔画像ベクトルを再構成するように訓練される。これらの隣接顔画像ベクトルは、訓練顔画像ベクトルとの最高コサインスコアに基づいて教師なしプロセスにより選択される。提案手法は,野生(lfw)データセットのラベル付き顔のベースラインシステム上でのeerの相対的改善を56%達成する。これにより、コサインとPLDAスコアリングシステムのパフォーマンスギャップを狭めることに成功した。 Although deep learning is commonly employed for image recognition, it usually requires a huge amount of labeled training data, which may not always be readily available. This leads to a noticeable performance disparity when compared to state-of-the-art unsupervised face verification techniques. In this work, we propose a method to narrow this gap by leveraging an autoencoder to convert the face image vector into a novel representation. Notably, the autoencoder is trained to reconstruct neighboring face image vectors rather than the original input image vectors. These neighbor face image vectors are chosen through an unsupervised process based on the highest cosine scores with the training face image vectors. The proposed method achieves a relative improvement of 56% in terms of EER over the baseline system on the Labeled Faces in the Wild (LFW) dataset. This has successfully narrowed down the performance gap between cosine and PLDA scoring systems.	翻訳日:2024-02-08 19:22:34 公開日:2024-02-06
# 最適統計透かしに向けて Towards Optimal Statistical Watermarking ( http://arxiv.org/abs/2312.07930v3 ) ライセンス: Link先を確認	Baihe Huang and Hanlin Zhu and Banghua Zhu and Kannan Ramchandran and Michael I. Jordan and Jason D. Lee and Jiantao Jiao	(参考訳) 統計的ウォーターマーキングを仮説検定問題として定式化し,従来のすべての統計ウォーターマーキング法を仮定した。我々の定式化の鍵は出力トークンと拒否領域の結合であり、実際には擬似ランダム生成器によって実現され、I型エラーとII型エラーの非自明なトレードオフを可能にする。一般仮説テスト設定におけるUMP(Uniformly Most Powerful)の透かしとモデル非依存設定におけるミニマックスタイプIIの誤差を特徴付ける。出力が$n$トークンのシーケンスである一般的なシナリオでは、小さなタイプIとタイプIIのエラーを保証するために必要なi.d.トークンの数にほぼ一致する上限と下位の境界を確立する。我々のレートは$\Theta(h^{-1} \log (1/h))$で、トークン当たりの平均エントロピーは$h$で、前作の$h^{-2}$から改善のためのポテンシャルを強調する。さらに、ユーザが生成したテキストに対して摂動のクラスを実行することを許されるロバストな透かし問題を定式化し、線形プログラミング問題を通じてロバストなUMPテストのタイプIIエラーを特徴付ける。我々の知る限りでは、これは、将来の研究の関心を惹きつけるであろう、近距離最適率の透かし問題に関する最初の体系的な統計処理である。 We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error. We characterize the Uniformly Most Powerful (UMP) watermark in the general hypothesis testing setting and the minimax Type II error in the model-agnostic setting. In the common scenario where the output is a sequence of $n$ tokens, we establish nearly matching upper and lower bounds on the number of i.i.d. tokens required to guarantee small Type I and Type II errors. Our rate of $\Theta(h^{-1} \log (1/h))$ with respect to the average entropy per token $h$ highlights potentials for improvement from the rate of $h^{-2}$ in the previous works. Moreover, we formulate the robust watermarking problem where the user is allowed to perform a class of perturbations on the generated texts, and characterize the optimal Type II error of robust UMP tests via a linear programming problem. 
To the best of our knowledge, this is the first systematic statistical treatment on the watermarking problem with near-optimal rates in the i.i.d. setting, which might be of interest for future works.	翻訳日:2024-02-08 19:20:22 公開日:2024-02-06
# 拡散モデルにおけるニューラルネットワークに基づくスコア推定:最適化と一般化 Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization ( http://arxiv.org/abs/2401.15604v2 ) ライセンス: Link先を確認	Yinbin Han, Meisam Razaviyayn, Renyuan Xu	(参考訳) 拡散モデルがgansに匹敵する強力なツールとして登場し、忠実性、柔軟性、堅牢性を改善した高品質なサンプルを生成する。これらのモデルの鍵となる要素は、スコアマッチングを通じてスコア関数を学ぶことである。様々なタスクで経験的な成功にもかかわらず、勾配に基づくアルゴリズムが証明可能な精度でスコア関数を学習できるかどうかは不明である。この質問に答える第一歩として,勾配降下によって学習したニューラルネットワークを用いてスコア推定を解析するための数学的枠組みを確立した。本分析は,学習手順の最適化と一般化の両面をカバーする。特に,ノイズラベルを用いた回帰として,発声スコアマッチング問題を定式化するパラメトリック形式を提案する。標準教師付き学習装置と比較して、スコアマッチング問題は、非有界入力、ベクトル値出力、追加の時間変数などの異なる課題を導入し、既存のテクニックが直接適用されないようにする。本稿では、適切に設計されたニューラルネットワークアーキテクチャを用いて、スコア関数を、神経接核によって引き起こされる再生核ヒルベルト空間によって正確に近似できることを示す。さらに,勾配降下の早期停止ルールを適用し,ニューラルネットワークのトレーニングとカーネル回帰の結合引数を活用することで,観測にノイズが存在するにもかかわらずスコア関数を学習するための最初の一般化誤差(サンプル複雑性)境界を確立する。本研究は,ニューラルネットの新しいパラメトリック形式と,スコアマッチングと回帰分析の革新的な関連を基礎として,高度な統計・最適化手法の適用を促進する。 Diffusion models have emerged as a powerful tool rivaling GANs in generating high-quality samples with improved fidelity, flexibility, and robustness. A key component of these models is to learn the score function through score matching. Despite empirical success on various tasks, it remains unclear whether gradient-based algorithms can learn the score function with a provable accuracy. As a first step toward answering this question, this paper establishes a mathematical framework for analyzing score estimation using neural networks trained by gradient descent. Our analysis covers both the optimization and the generalization aspects of the learning procedure. In particular, we propose a parametric form to formulate the denoising score-matching problem as a regression with noisy labels. Compared to the standard supervised learning setup, the score-matching problem introduces distinct challenges, including unbounded input, vector-valued output, and an additional time variable, preventing existing techniques from being applied directly. 
In this paper, we show that with a properly designed neural network architecture, the score function can be accurately approximated by a reproducing kernel Hilbert space induced by neural tangent kernels. Furthermore, by applying an early-stopping rule for gradient descent and leveraging certain coupling arguments between neural network training and kernel regression, we establish the first generalization error (sample complexity) bounds for learning the score function despite the presence of noise in the observations. Our analysis is grounded in a novel parametric form of the neural network and an innovative connection between score matching and regression analysis, facilitating the application of advanced statistical and optimization techniques.	翻訳日:2024-02-08 19:11:38 公開日:2024-02-06
# TA-RNN:電子健康記録のための注意に基づく時間認識リカレントニューラルネットワークアーキテクチャ TA-RNN: an Attention-based Time-aware Recurrent Neural Network Architecture for Electronic Health Records ( http://arxiv.org/abs/2401.14694v2 ) ライセンス: Link先を確認	Mohammad Al Olaimat, Serdar Bozdag (for the Alzheimer's Disease Neuroimaging Initiative)	(参考訳) 動機:Electronic Health Records(EHR)は患者の医療史の総合的な資料である。 EHRは、深層学習(DL)のような高度な技術を活用するために不可欠であり、医療提供者が広範なデータを分析し、貴重な洞察を抽出し、正確でデータ駆動型の臨床決定を下すことができる。リカレントニューラルネットワーク(Recurrent Neural Networks, RNN)のようなDL手法を用いて, EHRを分析して疾患の進行をモデル化し, 診断を予測する。しかし、これらの手法は、臨床訪問間の不規則な時間間隔など、EHRデータに固有の不規則性には対処しない。さらに、ほとんどのDLモデルは解釈できない。本研究では,RNNをベースとした2つの解釈可能なDLアーキテクチャ,TA-RNN(Time-Aware RNN)とTA-RNN-Autoencoder(TA-RNN-AE)を提案する。本研究では,不規則な時間間隔の影響を軽減するため,訪問時間間の時間埋め込みを提案する。そこで本研究では,各訪問における訪問と特徴の間で動作する2レベルアテンション機構を提案する。結果: アルツハイマー病神経画像イニシアチブ (ADNI) と国立アルツハイマー病コーディネートセンター (NACC) データセットを用いて行った実験の結果, F2 と感度に基づく最先端およびベースラインアプローチと比較して,アルツハイマー病(AD)を予測するための提案モデルの優れた性能を示した。さらに、TA-RNNは、死亡予測のためのMIMIC-IIIデータセットにおいて優れた性能を示した。アブレーション実験では,時間埋め込みと注意機構を取り入れた予測性能が向上した。最後に注意重みの調査は、予測に影響力のある訪問や特徴を特定するのに役立った。 Motivation: Electronic Health Records (EHR) represent a comprehensive resource of a patient's medical history. EHR are essential for utilizing advanced technologies such as deep learning (DL), enabling healthcare providers to analyze extensive data, extract valuable insights, and make precise and data-driven clinical decisions. DL methods such as Recurrent Neural Networks (RNN) have been utilized to analyze EHR to model disease progression and predict diagnosis. However, these methods do not address some inherent irregularities in EHR data such as irregular time intervals between clinical visits. Furthermore, most DL models are not interpretable. In this study, we propose two interpretable DL architectures based on RNN, namely Time-Aware RNN (TA-RNN) and TA-RNN-Autoencoder (TA-RNN-AE) to predict patient's clinical outcome in EHR at next visit and multiple visits ahead, respectively. 
To mitigate the impact of irregular time intervals, we propose incorporating time embedding of the elapsed times between visits. For interpretability, we propose employing a dual-level attention mechanism that operates between visits and features within each visit. Results: The results of the experiments conducted on Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC) datasets indicated superior performance of proposed models for predicting Alzheimer's Disease (AD) compared to state-of-the-art and baseline approaches based on F2 and sensitivity. Additionally, TA-RNN showed superior performance on Medical Information Mart for Intensive Care (MIMIC-III) dataset for mortality prediction. In our ablation study, we observed enhanced predictive performance by incorporating time embedding and attention mechanisms. Finally, investigating attention weights helped identify influential visits and features in predictions.	翻訳日:2024-02-08 19:11:13 公開日:2024-02-06
# varshni-hellmannポテンシャルの近似境界状態解 Approximate Bound States Solution of the Varshni-Hellmann Potential ( http://arxiv.org/abs/2401.11151v3 ) ライセンス: Link先を確認	N. Tazimi	(参考訳) 本稿では,varshni-hellmannポテンシャルの有界状態問題を有用な手法で解く。本研究では,varshni-hellmannポテンシャルに対するschrodinger方程式の境界状態解をansatz法で求める。エネルギー固有値と対応する固有関数を得る。また、地中におけるエネルギースペクトルの挙動と、2つの身体系の励起状態について図式的に示す。この結果と正確な数値との類似性は,本手法の効率性を示すものである。 In this paper, we solve the bound state problem for Varshni-Hellmann potential via a useful technique. In our technique, we obtain the bound state solution of the Schrodinger equation for the Varshni-Hellmann potential via ansatz method. We obtain the energy eigenvalues and the corresponding eigen-functions. Also, the behavior of the energy spectra for both the ground and the excited state of the two body systems is illustrated graphically. The similarity of our results to the accurate numerical values is indicative of the efficiency of our technique.	翻訳日:2024-02-08 19:10:19 公開日:2024-02-06
# AI既存リスクの2つのタイプ:決定的かつ累積的 Two Types of AI Existential Risk: Decisive and Accumulative ( http://arxiv.org/abs/2401.07836v2 ) ライセンス: Link先を確認	Atoosa Kasirzadeh	(参考訳) AIからの現実的リスク(xリスク)に関する従来の談話は、一般的には、高度なAIシステム、特に人間レベルの知性を達成したり、超えたりすることによる、突発的で恐ろしい出来事に焦点を当てている。これらの出来事は、人類の絶滅に繋がる深刻な結果をもたらすか、あるいは不可逆的に人間の文明を回復の限界まで破壊する。しかし、この談話はしばしば、より小さく相互接続された一連の混乱を通じて徐々に現れるai x-リスクの深刻な可能性を無視し、徐々に臨界しきい値を超えていく。本稿では,従来の「決定的ai x-risk仮説」と「蓄積的ai x-risk仮説」を対比する。前者は、制御不能な超知能のようなシナリオを特徴とする、AIによる過剰な乗っ取り経路を想定しているが、後者は、実在する災害に対する別の因果経路を示唆している。これには、深刻な脆弱性やエコノポリティカルな構造の体系的侵食など、AIによって引き起こされる脅威が徐々に蓄積される。累積仮説は、インクリメンタルaiのリスクがゆっくりと収束し、引き起こされる事象が不可逆的な崩壊に至るまでレジリエンスを損なう、沸騰するカエルシナリオを示唆する。システム分析を通じて,これら2つの仮説を区別する明確な仮定について検討する。累積的な視点は、AIリスクに関する一見互換性のない視点を一致させる、と論じられている。これらの因果経路 – 決定的かつ累積的 – との違いが,AIリスクのガバナンスや長期的なAI安全性に与える影響について論じる。 The conventional discourse on existential risks (x-risks) from AI typically focuses on abrupt, dire events caused by advanced AI systems, particularly those that might achieve or surpass human-level intelligence. These events have severe consequences that either lead to human extinction or irreversibly cripple human civilization to a point beyond recovery. This discourse, however, often neglects the serious possibility of AI x-risks manifesting incrementally through a series of smaller yet interconnected disruptions, gradually crossing critical thresholds over time. This paper contrasts the conventional "decisive AI x-risk hypothesis" with an "accumulative AI x-risk hypothesis." While the former envisions an overt AI takeover pathway, characterized by scenarios like uncontrollable superintelligence, the latter suggests a different causal pathway to existential catastrophes. This involves a gradual accumulation of critical AI-induced threats such as severe vulnerabilities and systemic erosion of econopolitical structures. 
The accumulative hypothesis suggests a boiling frog scenario where incremental AI risks slowly converge, undermining resilience until a triggering event results in irreversible collapse. Through systems analysis, this paper examines the distinct assumptions differentiating these two hypotheses. It is then argued that the accumulative view reconciles seemingly incompatible perspectives on AI risks. The implications of differentiating between these causal pathways -- the decisive and the accumulative -- for the governance of AI risks as well as long-term AI safety are discussed.	翻訳日:2024-02-08 19:09:43 公開日:2024-02-06
# LightHGNN: Distilling Hypergraph Neural Networks into MLPs for $100\times$ Faster Inference ( http://arxiv.org/abs/2402.04296v1 ) License: see link	Yifan Feng, Yihe Luo, Shihui Ying, Yue Gao	Hypergraph Neural Networks (HGNNs) have recently attracted much attention and exhibited satisfactory performance due to their superiority in high-order correlation modeling. However, the high-order modeling capability of hypergraphs also brings increased computational complexity, which hinders their practical industrial deployment. In practice, we find that one key barrier to the efficient deployment of HGNNs is the high-order structural dependencies during inference. In this paper, we propose to bridge the gap between HGNNs and inference-efficient Multi-Layer Perceptrons (MLPs) to eliminate the hypergraph dependency of HGNNs and thus reduce computational complexity as well as improve inference speed. Specifically, we introduce LightHGNN and LightHGNN$^+$ for fast inference with low complexity. LightHGNN directly distills the knowledge from teacher HGNNs to student MLPs via soft labels, and LightHGNN$^+$ further explicitly injects reliable high-order correlations into the student MLPs to achieve topology-aware distillation and resistance to over-smoothing. Experiments on eight hypergraph datasets demonstrate that even without hypergraph dependency, the proposed LightHGNNs can still achieve competitive or even better performance than HGNNs and outperform vanilla MLPs by $16.3$ on average. Extensive experiments on three graph datasets further show the average best performance of our LightHGNNs compared with all other methods. Experiments on synthetic hypergraphs with 5.5w vertices indicate LightHGNNs can run $100\times$ faster than HGNNs, showcasing their ability for latency-sensitive deployments.	Translated: 2024-02-08 18:48:22 Published: 2024-02-06
# Personality Trait Recognition using ECG Spectrograms and Deep Learning ( http://arxiv.org/abs/2402.04326v1 ) License: see link	Muhammad Mohsin Altaf, Saadat Ullah Khan, Muhammad Majd, Syed Muhammad Anwar	This paper presents an innovative approach to recognizing personality traits using deep learning (DL) methods applied to electrocardiogram (ECG) signals. Within the framework of detecting the big five personality traits model encompassing extraversion, neuroticism, agreeableness, conscientiousness, and openness, the research explores the potential of ECG-derived spectrograms as informative features. Optimal window sizes for spectrogram generation are determined, and a convolutional neural network (CNN), specifically ResNet-18, and a vision transformer (ViT) are employed for feature extraction and personality trait classification. The study utilizes the publicly available ASCERTAIN dataset, which comprises various physiological signals, including ECG recordings, collected from 58 participants during the presentation of video stimuli categorized by valence and arousal levels. The outcomes of this study demonstrate noteworthy performance in personality trait classification, consistently achieving F1-scores exceeding 0.9 across different window sizes and personality traits. These results emphasize the viability of ECG signal spectrograms as a valuable modality for personality trait recognition, with ResNet-18 exhibiting effectiveness in discerning distinct personality traits.	Translated: 2024-02-08 18:34:53 Published: 2024-02-06
# Electron Transport Through a 1D Chain of Dopant-Based Quantum Dots ( http://arxiv.org/abs/2402.04300v1 ) License: see link	Sumedh Vangara	Strongly interacting electron systems can provide insight into quantum many-body phenomena, such as Mott insulating behavior and spin liquidity, facilitating semiconductor optimization. The Fermi-Hubbard model is the prototypical model used to study such systems. Recent research, however, has shown that the extended Fermi-Hubbard model, which accounts for long-range interactions, is more accurate, especially for systems far from half-filling. In this study, we use the extended Fermi-Hubbard model to mathematically analyze charge transport through a lattice of quantum dots. One-dimensional chains with spinless electrons and source-drain bias are observed, focusing on the transition between the ground state and the first excited state. Level repulsion decreases the expected energy levels of anticrossings as the hopping onto the chain tends to the hopping within the chain. The distribution of charge density along the chain is characterized in terms of the hopping, nuclear, and Coulomb parameters, and novel plasmonic behavior is analyzed. Minor perturbations in electron transport are identified, corresponding to the one-dimensional nature of the observed systems. This research will lead to a better understanding of electron behavior in silicon-doped semiconductors, like the formation of correlation-induced band gaps, and open the door to using the extended Fermi-Hubbard model as a more accurate alternative to study quantum many-body systems.	Translated: 2024-02-08 18:34:30 Published: 2024-02-06
# Multi-View Symbolic Regression ( http://arxiv.org/abs/2402.04298v1 ) License: see link	Etienne Russeil, Fabrício Olivetti de França, Konstantin Malanchev, Bogdan Burlacu, Emille E. O. Ishida, Marion Leroux, Clément Michelin, Guillaume Moinard, Emmanuel Gangler	Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. Nevertheless, the researcher is frequently confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression since the parameters of each experiment can be different. In this work we present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions $f(x; \theta)$ simultaneously capable of accurately fitting all datasets. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economy, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to hyperparameter changes. In real-world data, it is able to grasp the group behaviour, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR in a wide range of experimental scenarios.	Translated: 2024-02-08 18:34:04 Published: 2024-02-06
# Road Surface Defect Detection -- From Image-based to Non-image-based: A Survey ( http://arxiv.org/abs/2402.04297v1 ) License: see link	Jongmin Yu, Jiaqi Jiang, Sebastiano Fichera, Paolo Paoletti, Lisa Layzell, Devansh Mehta, and Shan Luo	Ensuring traffic safety is crucial, which necessitates the detection and prevention of road surface defects. As a result, there has been a growing interest in the literature on the subject, leading to the development of various road surface defect detection methods. The methods for detecting road defects can be categorised in various ways depending on the input data types or training methodologies. The predominant approach involves image-based methods, which analyse pixel intensities and surface textures to identify defects. Despite their popularity, image-based methods share the distinct limitation of vulnerability to weather and lighting changes. To address this issue, researchers have explored the use of additional sensors, such as laser scanners or LiDARs, providing explicit depth information to enable the detection of defects in terms of scale and volume. However, data beyond images has not been sufficiently investigated. In this survey paper, we provide a comprehensive review of road surface defect detection studies, categorising them based on input data types and methodologies used. Additionally, we review recently proposed non-image-based methods and discuss several challenges and open problems associated with these techniques.	Translated: 2024-02-08 18:33:47 Published: 2024-02-06
# Modified attractive inverse-square potential in the induced electric dipole system ( http://arxiv.org/abs/2402.04294v1 ) License: see link	K. Bakke and J. G. G. S. Ramos	We examine the spatial distribution of electric charges within an extended, non-conductive cylinder featuring an inner radius denoted as $r_{0}$. Our investigation unveils the emergence of a distinct modified attractive inverse-square potential, arising from the intricate interplay between the electric field and the induced electric dipole moment of a neutral particle. This modified potential notably departs from the conventional inverse-square potential, showcasing an additional term proportional to $r^{-1}$. As a result, we present compelling evidence for the realization of a discrete energy spectrum within this intricate system.	Translated: 2024-02-08 18:33:30 Published: 2024-02-06
# On the attractive inverse-square potential in the induced electric dipole system under the influence of the harmonic oscillator ( http://arxiv.org/abs/2402.04293v1 ) License: see link	K. Bakke and J. G. G. S. Ramos	We obtain the analytical solutions to the Schrödinger equation for the attractive inverse-square potential in an induced electric dipole moment system under the influence of the harmonic oscillator. We show that bound states can exist when the electric field configuration brings a cut-off point that imposes a forbidden region for the neutral particle. Then, by dealing with $s$-waves, we obtain the energy eigenvalues in the strong electric field regime and for small values of the angular frequency of the harmonic oscillator. Further, we extend our discussion about the energy eigenvalues beyond the $s$-waves.	Translated: 2024-02-08 18:33:21 Published: 2024-02-06
# AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies ( http://arxiv.org/abs/2402.04292v1 ) License: see link	Xixi Hu, Bo Liu, Xingchao Liu and Qiang Liu	Diffusion-based imitation learning improves Behavioral Cloning (BC) on multi-modal decision-making, but comes at the cost of significantly slower inference due to the recursion in the diffusion process. This motivates the design of efficient policy generators that keep the ability to generate diverse actions. To address this challenge, we propose AdaFlow, an imitation learning framework based on flow-based generative modeling. AdaFlow represents the policy with state-conditioned ordinary differential equations (ODEs), which are known as probability flows. We reveal an intriguing connection between the conditional variance of their training loss and the discretization error of the ODEs. With this insight, we propose a variance-adaptive ODE solver that can adjust its step size in the inference stage, making AdaFlow an adaptive decision-maker that offers rapid inference without sacrificing diversity. Interestingly, it automatically reduces to a one-step generator when the action distribution is uni-modal. Our comprehensive empirical evaluation shows that AdaFlow achieves high performance across all dimensions, including success rate, behavioral diversity, and inference speed. The code is available at https://github.com/hxixixh/AdaFlow	Translated: 2024-02-08 18:33:12 Published: 2024-02-06
# BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ( http://arxiv.org/abs/2402.04291v1 ) License: see link	Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi	Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources. As a powerful compression technology, binarization can reduce model weights to a mere 1 bit, lowering the expensive computation and memory requirements. However, existing quantization techniques fall short of maintaining LLM performance under ultra-low bit-widths. In response to this challenge, we present BiLLM, a groundbreaking 1-bit post-training quantization scheme tailored for pretrained LLMs. Based on the weight distribution of LLMs, BiLLM first identifies and structurally selects salient weights, and minimizes the compression loss through an effective binary residual approximation strategy. Moreover, considering the bell-shaped distribution of the non-salient weights, we propose an optimal splitting search to group and binarize them accurately. BiLLM achieves, for the first time, high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families and evaluation metrics, and outperforms SOTA quantization methods for LLMs by significant margins. Moreover, BiLLM enables the binarization process of an LLM with 7 billion weights within 0.5 hours on a single GPU, demonstrating satisfactory time efficiency.	Translated: 2024-02-08 18:32:52 Published: 2024-02-06
# CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling ( http://arxiv.org/abs/2402.04290v1 ) License: see link	Junchao Gong, Lei Bai, Peng Ye, Wanghan Xu, Na Liu, Jianhua Dai, Xiaokang Yang, Wanli Ouyang	Precipitation nowcasting based on radar data plays a crucial role in extreme weather prediction and has broad implications for disaster management. Although deep learning has brought progress, two key challenges of precipitation nowcasting are not well solved: (i) the modeling of complex precipitation system evolutions with different scales, and (ii) accurate forecasts for extreme precipitation. In this work, we propose CasCast, a cascaded framework composed of a deterministic and a probabilistic part to decouple the predictions for mesoscale precipitation distributions and small-scale patterns. Then, we explore training the cascaded framework at high resolution and conducting the probabilistic modeling in a low-dimensional latent space with a frame-wise-guided diffusion transformer, enhancing the optimization of extreme events while reducing computational costs. Extensive experiments on three benchmark radar precipitation datasets show that CasCast achieves competitive performance. In particular, CasCast significantly surpasses the baseline (up to +91.8%) for regional extreme-precipitation nowcasting.	Translated: 2024-02-08 18:32:26 Published: 2024-02-06
# Association between Prefrontal fNIRS signals during Cognitive tasks and College scholastic ability test (CSAT) scores: Analysis using a quantum annealing approach ( http://arxiv.org/abs/2402.04287v1 ) License: see link	Yeaju Kim, Junggu Choi, Bora Kim, Yongwan Park, Jihyun Cha, Jongkwan Choi, and Sanghoon Han	Academic achievement is a critical measure of intellectual ability, prompting extensive research into cognitive tasks as potential predictors. Neuroimaging technologies, such as functional near-infrared spectroscopy (fNIRS), offer insights into brain hemodynamics, allowing understanding of the link between cognitive performance and academic achievement. Herein, we explored the association between cognitive tasks and academic achievement by analyzing prefrontal fNIRS signals. A novel quantum annealer (QA) feature selection algorithm was applied to fNIRS data to identify cognitive tasks correlated with CSAT scores. Twelve features (signal mean, median, variance, peak, number of peaks, sum of peaks, slope, minimum, kurtosis, skewness, standard deviation, and root mean square) were extracted from fNIRS signals at two time windows (10- and 60-second) to compare results from various feature variable conditions. The feature selection results from the QA-based and XGBoost regressor algorithms were compared to validate the former's performance. In a three-step validation process using multiple linear regression models, correlation coefficients between the feature variables and the CSAT scores, model fitness (adjusted $R^2$), and model prediction error (RMSE) values were calculated. The quantum annealer demonstrated comparable performance to classical machine learning models, and specific cognitive tasks, including verbal fluency, recognition, and the Corsi block tapping task, were correlated with academic achievement. Group analyses revealed stronger associations of the Tower of London and N-back tasks with higher CSAT scores. Quantum annealing algorithms have significant potential in feature selection using fNIRS data and represent a novel research approach. Future studies should explore predictors of academic achievement and cognitive ability.	Translated: 2024-02-08 18:32:11 Published: 2024-02-06
# Progress and Opportunities of Foundation Models in Bioinformatics ( http://arxiv.org/abs/2402.04286v1 ) License: see link	Qing Li, Zhihang Hu, Yixuan Wang, Lei Li, Yimin Fan, Irwin King, Le Song, Yu Li	Bioinformatics has witnessed a paradigm shift with the increasing integration of artificial intelligence (AI), particularly through the adoption of foundation models (FMs). These AI techniques have rapidly advanced, addressing historical challenges in bioinformatics such as the scarcity of annotated data and the presence of data noise. FMs are particularly adept at handling large-scale, unlabeled data, a common scenario in biological contexts due to the time-consuming and costly nature of experimentally determining labeled data. This characteristic has allowed FMs to excel and achieve notable results in various downstream validation tasks, demonstrating their ability to represent diverse biological entities effectively. Undoubtedly, FMs have ushered in a new era in computational biology, especially in the realm of deep learning. The primary goal of this survey is to conduct a systematic investigation and summary of FMs in bioinformatics, tracing their evolution, current research status, and the methodologies employed. Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs. We delve into the specifics of the problems at hand, including sequence analysis, structure prediction, function annotation, and multimodal integration, comparing the structures and advancements against traditional methods. Furthermore, the review analyses challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases. Finally, we outline potential development paths and strategies for FMs in future biological research, setting the stage for continued innovation and application in this rapidly evolving field. This comprehensive review serves not only as an academic resource but also as a roadmap for future explorations and applications of FMs in biology.	Translated: 2024-02-08 18:31:33 Published: 2024-02-06
# PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks ( http://arxiv.org/abs/2402.04284v1 ) License: see link	Junwei Su, Difan Zou, Chuan Wu	Memory-based Dynamic Graph Neural Networks (MDGNNs) are a family of dynamic graph neural networks that leverage a memory module to extract, distill, and memorize long-term temporal dependencies, leading to superior performance compared to memory-less counterparts. However, training MDGNNs faces the challenge of handling entangled temporal and structural dependencies, requiring sequential and chronological processing of data sequences to capture accurate temporal patterns. During batch training, the temporal data points within the same batch are processed in parallel, while their temporal dependencies are neglected. This issue is referred to as temporal discontinuity and restricts the effective temporal batch size, limiting data parallelism and reducing MDGNNs' flexibility in industrial applications. This paper studies the efficient training of MDGNNs at scale, focusing on the temporal discontinuity in training MDGNNs with large temporal batch sizes. We first conduct a theoretical study on the impact of temporal batch size on the convergence of MDGNN training. Based on the analysis, we propose PRES, an iterative prediction-correction scheme combined with a memory coherence learning objective to mitigate the effect of temporal discontinuity, enabling MDGNNs to be trained with significantly larger temporal batches without sacrificing generalization performance. Experimental results demonstrate that our approach enables up to a 4x larger temporal batch (3.4x speed-up) during MDGNN training.	Translated: 2024-02-08 18:31:04 Published: 2024-02-06
# Breaking Data Silos: Cross-Domain Learning for Multi-Agent Perception from Independent Private Sources ( http://arxiv.org/abs/2402.04273v1 ) License: see link	Jinlong Li, Baolu Li, Xinyu Liu, Runsheng Xu, Jiaqi Ma, Hongkai Yu	The diverse agents in multi-agent perception systems may be from different companies. Each company might use the same classic neural network architecture for its feature-extraction encoder. However, the data used to train the various agents is independent and private to each company, leading to a distribution gap between the private datasets used to train distinct agents in a multi-agent perception system. The data silos caused by this distribution gap could result in a significant performance decline in multi-agent perception. In this paper, we thoroughly examine the impact of the distribution gap on existing multi-agent perception systems. To break the data silos, we introduce the Feature Distribution-aware Aggregation (FDA) framework for cross-domain learning to mitigate this distribution gap in multi-agent perception. FDA comprises two key components, a Learnable Feature Compensation Module and a Distribution-aware Statistical Consistency Module, both aimed at enhancing intermediate features to minimize the distribution gap among multi-agent features. Intensive experiments on the public OPV2V and V2XSet datasets underscore FDA's effectiveness in point cloud-based 3D object detection, presenting it as an invaluable augmentation to existing multi-agent perception systems.	Translated: 2024-02-08 18:30:25 Published: 2024-02-06
# Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data ( http://arxiv.org/abs/2402.04375v1 ) License: see link	Yvonne Zhou, Mingyu Liang, Ivan Brugere, Dana Dachman-Soled, Danial Dervovic, Antigoni Polychroniadou, Min Wu	The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.	Translated: 2024-02-08 18:22:16 Published: 2024-02-06
# Skills in computational thinking of engineering students of the first school year ( http://arxiv.org/abs/2402.04340v1 ) License: see link	Concepcion Varela, Carolina Rebollar, Olatz Garcia, Eugenio Bravo, Javier Bilbao	In the digital era in which we are living, one of the fundamental competences that students must acquire is competence in Computational Thinking (CT). Although there is no general consensus on a formal definition, there is a general understanding of it as a set of skills and attitudes necessary for the resolution, with or without a computer, of problems that may arise in any area of life. Measuring and evaluating which of the CT skills students have acquired is fundamental, and for this purpose, previously validated measuring instruments must be used. In this study, a previously validated instrument is applied to determine whether the new students in the Engineering Degrees of the University of the Basque Country have the following skills in CT: Critical Thinking, Algorithmic Thinking, Problem Solving, Cooperativity and Creativity.	Translated: 2024-02-08 18:22:03 Published: 2024-02-06
# 振動ミラーからの両側光子放出と多光子もつれ生成 Bilateral photon emission from a vibrating mirror and multiphoton entanglement generation ( http://arxiv.org/abs/2402.04339v1 ) ライセンス: Link先を確認	Alberto Mercurio, Enrico Russo, Fabio Mauceri, Salvatore Savasta, Franco Nori, Vincenzo Macrì, Rosario Lo Franco	(参考訳) 量子もつれは、量子技術を利用したデバイスの開発において重要な役割を果たす。重要な目標の1つは、もつれ状態の決定論的な生成と分配であり、これは例えば、閉じ込められた電磁場と相互作用する機械振動子を通じて達成される。本研究では、両面完全鏡を含む空洞共振器を調べる。鏡は空洞モードを2つの独立した閉じ込め電磁場に分離するが、放射圧相互作用はすべてのサブシステムにまたがる高次の有効相互作用を生じさせる。鏡の位置にも関係する共鳴条件の選び方に応じて、$2n$光子もつれの生成と両側への光子対放出を調べる。機械振動子の非古典性を実証することで、これらの現象を制御する道筋を示し、量子技術への応用の可能性を開く。将来的には、同様の集積デバイスを用いて、マイクロ波光子と光領域の光子のようにエネルギースケールが大きく異なるサブシステム同士をもつれさせることができるだろう。 Entanglement plays a crucial role in the development of quantum-enabled devices. One significant objective is the deterministic creation and distribution of entangled states, achieved, for example, through a mechanical oscillator interacting with confined electromagnetic fields. In this study, we explore a cavity resonator containing a two-sided perfect mirror. Although the mirror separates the cavity modes into two independent confined electromagnetic fields, the radiation pressure interaction gives rise to high-order effective interactions across all subsystems. Depending on the chosen resonant conditions, which are also related to the position of the mirror, we study $2n$-photon entanglement generation and bilateral photon pair emission. Demonstrating the non-classical nature of the mechanical oscillator, we provide a pathway to control these phenomena, opening potential applications in quantum technologies. Looking ahead, similar integrated devices could be used to entangle subsystems across vastly different energy scales, such as microwave and optical photons.	翻訳日:2024-02-08 18:21:47 公開日:2024-02-06
# モノのインターネットにおける識別問題を解決する論理認識法 Logical recognition method for solving the problem of identification in the Internet of Things ( http://arxiv.org/abs/2402.04338v1 ) ライセンス: Link先を確認	Islambek Saymanov	(参考訳) 近年現れた論理代数および値付き論理(valued logic)の手法の新しい応用分野は、さまざまな対象や現象の認識、医学的・技術的診断、現代的な機械の構築、テスト問題の検査などの問題であり、これらは論理関数の特徴空間全体への最適な拡張の構成に帰着できる。例えば論理認識システムでは、離散解析に基づく論理的手法と、それに基づく命題計算が、独自の認識アルゴリズムの構築に用いられる。一般に、論理認識法を用いる際には、認識対象となる物体や現象の論理的特徴を変数とするk値関数の、特徴空間全体への最適な延長によって表される論理的関係の存在が前提となる。本研究の目的は、与えられた特徴空間のベクトルとして指定される、論理的特徴と互いに交わらない対象のクラスからなる参照テーブルに基づく、対象認識のための論理的手法を開発することである。この手法では、参照テーブルをいたるところで定義されているわけではない論理関数とみなし、その論理関数の特徴空間全体への最適な延長を構成することで、クラスの空間全体への拡張を定める。 A new area of application of methods of algebra of logic and to valued logic, which has emerged recently, is the problem of recognizing a variety of objects and phenomena, medical or technical diagnostics, constructing modern machines, checking test problems, etc., which can be reduced to constructing an optimal extension of the logical function to the entire feature space. For example, in logical recognition systems, logical methods based on discrete analysis and propositional calculus based on it are used to build their own recognition algorithms. In the general case, the use of a logical recognition method provides for the presence of logical connections expressed by the optimal continuation of a k-valued function over the entire feature space, in which the variables are the logical features of the objects or phenomena being recognized. The goal of this work is to develop a logical method for object recognition consisting of a reference table with logical features and classes of non-intersecting objects, which are specified as vectors from a given feature space. The method consists of considering the reference table as a logical function that is not defined everywhere and constructing an optimal continuation of the logical function to the entire feature space, which determines the extension of classes to the entire space.	翻訳日:2024-02-08 18:21:30 公開日:2024-02-06
# LegalLens: 非構造化テキストにおける法的違反の識別にLLMを活用する LegalLens: Leveraging LLMs for Legal Violation Identification in Unstructured Text ( http://arxiv.org/abs/2402.04335v1 ) ライセンス: Link先を確認	Dor Bernsohn, Gil Semo, Yaron Vazana, Gila Hayat, Ben Hagag, Joel Niklaus, Rohit Saha, Kyryl Truskovskyi	(参考訳) 本研究では2つの主要なタスクに焦点を当てる。1つ目は非構造化テキストデータ中の法的違反を検出すること、2つ目はそれらの違反を、影響を受ける可能性のある個人と関連付けることである。我々は大規模言語モデル(LLM)を用いて2つのデータセットを構築し、その後ドメイン専門家のアノテーターによる検証を行った。どちらのタスクも集団訴訟(クラスアクション)の文脈に特化して設計されている。実験では、BERTファミリーのモデルとオープンソースLLMのファインチューニングに加え、クローズドソースLLMを用いたfew-shot実験を行った。F1スコアは62.69%(違反の識別)および81.02%(被害者の関連付け)であり、我々のデータセットと設定が両方のタスクに利用できることを示している。最後に、法律分野の自然言語処理(NLP)の研究をさらに進めるため、実験に用いたデータセットとコードを公開する。 In this study, we focus on two main tasks, the first for detecting legal violations within unstructured textual data, and the second for associating these violations with potentially affected individuals. We constructed two datasets using Large Language Models (LLMs) which were subsequently validated by domain expert annotators. Both tasks were designed specifically for the context of class-action cases. The experimental design incorporated fine-tuning models from the BERT family and open-source LLMs, and conducting few-shot experiments using closed-source LLMs. Our results, with an F1-score of 62.69% (violation identification) and 81.02% (associating victims), show that our datasets and setups can be used for both tasks. Finally, we publicly release the datasets and the code used for the experiments in order to advance further research in the area of legal natural language processing (NLP).	翻訳日:2024-02-08 18:21:10 公開日:2024-02-06
# インテリジェントトランスデューサイネーブラに基づくホームオートメーションシステム Home Automation System based on Intelligent Transducer Enablers ( http://arxiv.org/abs/2402.04334v1 ) ライセンス: Link先を確認	Manuel Suárez-Albela, Paula Fraga-Lamas, Tiago M. Fernández-Caramés, Adriana Dapena and Miguel González-López	(参考訳) 本稿では、トランスデューサを簡単かつ迅速に識別・設定できるように設計された新しいホームオートメーションシステムHASITE(Home Automation System based on Intelligent Transducer Enablers)を提案する。これらの機能は、多数のトランスデューサを配備する状況で特に有用である。そのような状況では、セットアップが多くの時間と人的資源を消費する煩雑な作業になるためである。HASITEは、無線ネットワークと、自己設定プロトコルおよび自己登録プロトコルを用いることで、ホームオートメーションシステムの導入を簡素化する。これら3つの要素により、HASITEは電源を入れるだけで新しいトランスデューサを追加できる。さまざまな現実的なシナリオで実施したテストによると、トランスデューサは13秒未満で使用可能になる。さらに、HASITEのすべての機能にはAPIを通じてアクセスでき、サードパーティシステムとの統合も可能である。一例として、このAPIに基づくAndroidアプリケーションを紹介する。リモートユーザは、通常のスマートフォンやタブレットだけでトランスデューサを操作できる。 This paper presents a novel home automation system named HASITE (Home Automation System based on Intelligent Transducer Enablers), which has been specifically designed to identify and configure transducers easily and quickly. These features are especially useful in situations where many transducers are deployed, since their setup becomes a cumbersome task that consumes a significant amount of time and human resources. HASITE simplifies the deployment of a home automation system by using wireless networks and both self-configuration and self-registration protocols. Thanks to the application of these three elements, HASITE is able to add new transducers by just powering them up. According to the tests performed in different realistic scenarios, a transducer is ready to be used in less than 13 s. Moreover, all HASITE functionalities can be accessed through an API, which also allows for the integration of third-party systems. As an example, an Android application based on the API is presented. Remote users can use it to interact with transducers by just using a regular smartphone or a tablet.	翻訳日:2024-02-08 18:20:54 公開日:2024-02-06
# LESS: ターゲット型命令チューニングのための影響力のあるデータの選択 LESS: Selecting Influential Data for Targeted Instruction Tuning ( http://arxiv.org/abs/2402.04333v1 ) ライセンス: Link先を確認	Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen	(参考訳) 命令チューニングは大規模言語モデル(LLM)の強力な能力を引き出し、複数のデータセットを組み合わせて汎用チャットボットを開発することを可能にした。しかし、現実のアプリケーションでは、推論などの特化したスキル群が求められることが多い。課題は、こうした大規模なデータセットから最も関連性の高いデータを特定し、特定の能力を効果的に伸ばすことであり、我々はこの設定をターゲット型命令チューニング(targeted instruction tuning)と呼ぶ。我々は、データの影響を効果的に推定し、命令データ選択のための低ランク勾配類似度探索(Low-rank gradiEnt Similarity Search)を行う、オプティマイザを考慮した実用的に効率のよいアルゴリズムLESSを提案する。重要な点として、LESSは既存の影響度の定式化を、Adamオプティマイザと可変長の命令データを扱えるように適応させている。LESSはまず、低次元の勾配特徴からなる、再利用性と転用性の高い勾配データストアを構築し、次に、特定の能力を体現するfew-shotの例との類似度に基づいてサンプルを選択する。実験の結果、LESSで選択した5%のデータで訓練すると、多様な下流タスクにおいて、データセット全体で訓練した場合を上回ることが多い。さらに、選択されたデータは転用性が高く、小さなモデルを使って、より大きなモデルや異なるファミリーのモデルに有用なデータを選択できる。定性的分析により、本手法は表層的な形式の手がかりにとどまらず、下流アプリケーションに必要な推論スキルを例示するデータを特定していることが示された。 Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots. However, real-world applications often require a specialized suite of skills (e.g., reasoning). The challenge lies in identifying the most relevant data from these extensive datasets to effectively develop specific capabilities, a setting we frame as targeted instruction tuning. We propose LESS, an optimizer-aware and practically efficient algorithm to effectively estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection. Crucially, LESS adapts existing influence formulations to work with the Adam optimizer and variable-length instruction data. LESS first constructs a highly reusable and transferable gradient datastore with low-dimensional gradient features and then selects examples based on their similarity to few-shot examples embodying a specific capability. Experiments show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks. Furthermore, the selected data is highly transferable: smaller models can be leveraged to select useful data for larger models and models from different families. Our qualitative analysis shows that our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.	翻訳日:2024-02-08 18:20:38 公開日:2024-02-06
# 非本質ニューロンへのノイズ注入によるDNNの敵対的ロバスト性と効率の向上 Enhance DNN Adversarial Robustness and Efficiency via Injecting Noise to Non-Essential Neurons ( http://arxiv.org/abs/2402.04325v1 ) ライセンス: Link先を確認	Zhenyu Liu, Garrett Gagnon, Swagath Venkataramani, Liu Liu	(参考訳) ディープニューラルネットワーク(DNN)は、データ分析と意思決定において比類のない能力を提供し、医療や金融から自動車まで幅広い産業に変革をもたらしてきた。その大きな影響にもかかわらず、DNNは、敵対的攻撃に対する脆弱性と、モデルの複雑化・大規模化に伴う計算コストの増大という2つの重大な課題に直面している。本稿では、敵対的ロバスト性と実行効率を同時に向上させる効果的な手法を提案する。ノイズを一様に注入してロバスト性を高める従来研究とは異なり、我々は非一様なノイズ注入アルゴリズムを導入し、これをDNNの各層に戦略的に適用することで、攻撃によって加えられる敵対的摂動を妨害する。近似手法を用いて、本質的なニューロンを特定して保護する一方、非本質的なニューロンには戦略的にノイズを導入する。実験の結果、本手法は複数の攻撃シナリオ、モデルアーキテクチャ、データセットにわたって、ロバスト性と効率の両方を向上させることが示された。 Deep Neural Networks (DNNs) have revolutionized a wide range of industries, from healthcare and finance to automotive, by offering unparalleled capabilities in data analysis and decision-making. Despite their transforming impact, DNNs face two critical challenges: the vulnerability to adversarial attacks and the increasing computational costs associated with more complex and larger models. In this paper, we introduce an effective method designed to simultaneously enhance adversarial robustness and execution efficiency. Unlike prior studies that enhance robustness via uniformly injecting noise, we introduce a non-uniform noise injection algorithm, strategically applied at each DNN layer to disrupt adversarial perturbations introduced in attacks. By employing approximation techniques, our approach identifies and protects essential neurons while strategically introducing noise into non-essential neurons. Our experimental results demonstrate that our method successfully enhances both robustness and efficiency across several attack scenarios, model architectures, and datasets.	翻訳日:2024-02-08 18:20:16 公開日:2024-02-06
# ConsistI2V:画像からビデオへの生成における視覚的一貫性の強化 ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation ( http://arxiv.org/abs/2402.04324v1 ) ライセンス: Link先を確認	Weiming Ren, Harry Yang, Ge Zhang, Cong Wei, Xinrun Du, Stephen Huang, Wenhu Chen	(参考訳) Image-to-Video(I2V)生成は、最初のフレームを(テキストプロンプトとともに)用いてビデオシーケンスを生成することを目的とする。I2V生成における大きな課題は、ビデオ全体を通して視覚的一貫性を維持することである。既存の手法は、最初のフレームの被写体、背景、スタイルを保つことや、ビデオの物語の中で滑らかで論理的な進行を保証することにしばしば苦労する。これらの問題を緩和するため、I2V生成の視覚的一貫性を高める拡散ベースの手法ConsistI2Vを提案する。具体的には、(1)空間的一貫性と動きの一貫性を維持するための、最初のフレームに対する時空間アテンションと、(2)レイアウトの一貫性を高めるための、最初のフレームの低周波帯域からのノイズ初期化を導入する。これら2つのアプローチにより、ConsistI2Vは高い一貫性を持つビデオを生成できる。また、提案手法を拡張し、自己回帰的な長尺ビデオ生成とカメラモーション制御における一貫性向上の可能性を示す。本手法の有効性を検証するため、I2V生成の包括的な評価ベンチマークであるI2V-Benchを提案する。自動評価と人手評価の結果は、ConsistI2Vが既存手法より優れていることを示している。 Image-to-video (I2V) generation aims to use the initial frame (alongside a text prompt) to create a video sequence. A grand challenge in I2V generation is to maintain visual consistency throughout the video: existing methods often struggle to preserve the integrity of the subject, background, and style from the first frame, as well as ensure a fluid and logical progression within the video narrative. To mitigate these issues, we propose ConsistI2V, a diffusion-based method to enhance visual consistency for I2V generation. Specifically, we introduce (1) spatiotemporal attention over the first frame to maintain spatial and motion consistency, (2) noise initialization from the low-frequency band of the first frame to enhance layout consistency. These two approaches enable ConsistI2V to generate highly consistent videos. We also extend the proposed approaches to show their potential to improve consistency in auto-regressive long video generation and camera motion control. To verify the effectiveness of our method, we propose I2V-Bench, a comprehensive evaluation benchmark for I2V generation. Our automatic and human evaluation results demonstrate the superiority of ConsistI2V over existing methods.	翻訳日:2024-02-08 18:19:58 公開日:2024-02-06
# 量子機械学習と光子計数のための3次元空洞中のトランスモン量子ビットの特性評価 Characterization of a Transmon Qubit in a 3D Cavity for Quantum Machine Learning and Photon Counting ( http://arxiv.org/abs/2402.04322v1 ) ライセンス: Link先を確認	Alessandro D'Elia, Boulos Alfakes, Anas Alkhazaleh, Leonardo Banchi, Matteo Beretta, Stefano Carrazza, Fabio Chiarello, Daniele Di Gioacchino, Andrea Giachero, Felix Henrich, Alex Stephane Piedjou Komnang, Carlo Ligi, Giovanni Maccarrone, Massimo Macucci, Emanuele Palumbo, Andrea Pasquale, Luca Piersanti, Florent Ravaux, Alessio Rettaroli, Matteo Robbiati, Simone Tocci, Claudio Gatti	(参考訳) 本稿では、3次元空洞中の超伝導トランスモン量子ビットを量子機械学習と光子計数に応用した結果を報告する。まず、3次元共振器に結合したトランスモン量子ビットの作製と特性評価について述べ、シミュレーションフレームワークと、分散シフトや量子ビットの非調和性といった重要なパラメータの実験的測定を詳しく説明する。次に、単一量子ビットデバイス上に実装した量子機械学習の応用として、陽子のuクォークのパートン分布関数のフィッティングを報告する。最後の節では、同一の3次元共振器に結合した2つの量子ビットに基づく新しいマイクロ波光子検出方式を提示する。これは原理的にダークカウント率を低減でき、アクシオン暗黒物質探索などの応用に有利に働く可能性がある。 In this paper we report the use of a superconducting transmon qubit in a 3D cavity for quantum machine learning and photon counting applications. We first describe the realization and characterization of a transmon qubit coupled to a 3D resonator, providing a detailed description of the simulation framework and of the experimental measurement of important parameters, like the dispersive shift and the qubit anharmonicity. We then report on a Quantum Machine Learning application implemented on the single-qubit device to fit the u-quark parton distribution function of the proton. In the final section of the manuscript we present a new microwave photon detection scheme based on two qubits coupled to the same 3D resonator. This could in principle decrease the dark count rate, favouring applications like axion dark matter searches.	翻訳日:2024-02-08 18:19:34 公開日:2024-02-06
# 量子シミュレーションのための改良フェルミオンハミルトニアン Improved Fermion Hamiltonians for Quantum Simulation ( http://arxiv.org/abs/2402.04317v1 ) ライセンス: Link先を確認	Erik Gustafson, Ruth Van de Water	(参考訳) 我々は、ASQTAD作用およびHISQ(highly improved staggered quark)作用に着想を得たハミルトニアンを開発し、これらのハミルトニアンが量子シミュレーションにどのように使用できるかを示す。これらの改良ハミルトニアンの時間発展に要するゲートコストと、1+1次元格子シュウィンガーモデルを用いた格子間隔誤差の低減の実証を示す。 We developed a Hamiltonian inspired by ASQTAD and highly improved staggered quark (HISQ) actions and show how these Hamiltonians can be used for quantum simulations. Gate costs for the time evolution of these improved Hamiltonians are provided as well as a demonstration of the reduction of lattice spacing errors using the 1+1d lattice Schwinger model.	翻訳日:2024-02-08 18:19:21 公開日:2024-02-06
# きめ細かな報酬による引用付きテキスト生成のための言語モデルの訓練 Training Language Models to Generate Text with Citations via Fine-grained Rewards ( http://arxiv.org/abs/2402.04315v1 ) ライセンス: Link先を確認	Chengyu Huang, Zeqiu Wu, Yushi Hu, Wenya Wang	(参考訳) 近年の大規模言語モデル(LLM)はユーザの質問への回答に有用であることが示されているが、幻覚を起こしやすく、信頼できる情報源への参照を欠くために応答の信頼性が低いことが多い。これらの問題に対する直感的な解決策は、外部文書を根拠として参照する文中引用を含めることである。先行研究ではLLMに直接プロンプトを与えて文中引用を生成させてきたが、その性能は、特に小規模なLLMでは満足にはほど遠い。本研究では、応答の正確性を確保しつつ、裏付けとして十分で関連性の高い引用を生成するようLLMに教えるための、きめ細かな報酬を用いた効果的な訓練フレームワークを提案する。また、これらのきめ細かな報酬を一般的なLLM訓練戦略に適用する体系的な分析を行い、従来の手法に対する優位性を示す。ALCEベンチマークの質問応答(QA)データセットで広範な実験を行い、EXPERTQAを用いてモデルの汎化性を検証する。LLaMA-2-7Bでは、きめ細かな報酬を取り入れることでベースラインの中で最高の性能を達成し、GPT-3.5-turboをも上回る。 While recent Large Language Models (LLMs) have proven useful in answering user queries, they are prone to hallucination, and their responses often lack credibility due to missing references to reliable sources. An intuitive solution to these issues would be to include in-text citations referring to external documents as evidence. While previous works have directly prompted LLMs to generate in-text citations, their performances are far from satisfactory, especially when it comes to smaller LLMs. In this work, we propose an effective training framework using fine-grained rewards to teach LLMs to generate highly supportive and relevant citations, while ensuring the correctness of their responses. We also conduct a systematic analysis of applying these fine-grained rewards to common LLM training strategies, demonstrating its advantage over conventional practices. We conduct extensive experiments on Question Answering (QA) datasets taken from the ALCE benchmark and validate the model's generalizability using EXPERTQA. On LLaMA-2-7B, the incorporation of fine-grained rewards achieves the best performance among the baselines, even surpassing that of GPT-3.5-turbo.	翻訳日:2024-02-08 18:19:13 公開日:2024-02-06
# 半古典的ユークリッド重力に対する新しい適切な境界条件 New Well-Posed Boundary Conditions for Semi-Classical Euclidean Gravity ( http://arxiv.org/abs/2402.04308v1 ) ライセンス: Link先を確認	Xiaoyi Liu, Jorge E. Santos, Toby Wiseman	(参考訳) 有限の空洞における4次元ユークリッド重力を考える。Andersonは、ディリクレ条件では適切(well-posed)な楕円型系が得られないことを示し、適切な系を与える境界条件を提案した。本稿では、定数$p$でパラメータ付けられる1パラメータ族の境界条件が存在することを指摘する。そこでは、適切にワイル再スケールされた境界計量が固定され、いずれも適切な楕円型系を与える。Andersonの境界条件とディリクレ境界条件は、これらの極限$p \to 0$および$p \to \infty$とみなせる。静的なユークリッド解に注目して、熱力学第一法則を導出する。空間境界を球面に限ると、内部を埋める解は平坦空間またはシュワルツシルト解であり、ディリクレの場合と同様の熱力学を持つ。平坦空間の鞍点まわりの滑らかなユークリッド揺らぎを考えると、$p > 1/6$ではリヒネロヴィッツ作用素のスペクトルは安定であり、その固有値は正の実部を持つ。したがって、大きな$p$を、不適切なディリクレ境界条件の正則化とみなすことができる。しかし$p < 1/6$では、球対称かつ静的なセクターにおいても不安定モードが存在する。次にローレンツ符号に目を向ける。$p < 1/6$では、この球対称なユークリッド不安定性は、境界そのものの力学に伴うローレンツ不安定性と対になっていると理解できる。しかし、球対称性を破る摂動を考えると謎が生じる。この場合、$p > 1/6$であっても動的に不安定なモードが多数見つかり、我々が見いだしたユークリッドでの安定性と著しく対照的である。したがって、熱力学は安定だが力学は不安定な系が得られたように見え、ユークリッド理論の議論で課した滑らかさという標準的な仮定に疑問が投げかけられる。 We consider four-dimensional Euclidean gravity in a finite cavity. Anderson has shown Dirichlet conditions do not yield a well-posed elliptic system, and has suggested boundary conditions that do. Here we point out that there exists a one-parameter family of boundary conditions, parameterized by a constant $p$, where a suitably Weyl rescaled boundary metric is fixed, and all give a well-posed elliptic system. Anderson and Dirichlet boundary conditions can be seen as the limits $p \to 0$ and $\infty$ of these. Focussing on static Euclidean solutions, we derive a thermodynamic first law. Restricting to a spherical spatial boundary, the infillings are flat space or the Schwarzschild solution, and have similar thermodynamics to the Dirichlet case. We consider smooth Euclidean fluctuations about the flat space saddle; for $p > 1/6$ the spectrum of the Lichnerowicz operator is stable -- its eigenvalues have positive real part. Thus we may regard large $p$ as a regularization of the ill-posed Dirichlet boundary conditions. However for $p < 1/6$ there are unstable modes, even in the spherically symmetric and static sector. We then turn to Lorentzian signature. For $p < 1/6$ we may understand this spherical Euclidean instability as being paired with a Lorentzian instability associated with the dynamics of the boundary itself. However, a mystery emerges when we consider perturbations that break spherical symmetry. Here we find a plethora of dynamically unstable modes even for $p > 1/6$, contrasting starkly with the Euclidean stability we found. Thus we seemingly obtain a system with stable thermodynamics, but unstable dynamics, calling into question the standard assumption of smoothness that we have implemented when discussing the Euclidean theory.	翻訳日:2024-02-08 18:18:51 公開日:2024-02-06
# Deep PCCT:光子計数CT(Photon Counting Computed Tomography)における深層学習応用のレビュー Deep PCCT: Photon Counting Computed Tomography Deep Learning Applications Review ( http://arxiv.org/abs/2402.04301v1 ) ライセンス: Link先を確認	Ana Carolina Alves, André Ferreira, Gijs Luijten, Jens Kleesiek, Behrus Puladi, Jan Egger, Victor Alves	(参考訳) 医用画像には、空間分解能の限界、電子ノイズによる干渉、コントラスト対雑音比の低さといった課題がある。光子計数CT(Photon Counting Computed Tomography, PCCT)は、その革新的な技術によってこれらの問題に対処する解決策として登場した。本レビューでは、前臨床研究におけるPCCTの最近の発展と応用を掘り下げ、従来の画像診断の限界を克服する可能性を強調する。例えばPCCTは、乳房の微細な異常の検出を改善するうえで顕著な有効性を示しており、これまで得られなかった水準の詳細さを提供する。PCCTに関する現在の文献を調査し、スキャナの主な特徴と多様な応用を取り上げながら、この技術を包括的に分析する。さらに、PCCTへの深層学習の統合とラジオミクス特徴量の研究についても検討し、データ処理における成功例を紹介する。これらの進歩を認めたうえで、この分野に残る課題についても議論し、医用画像技術の今後の研究と改良への道筋を示す。PCCTの臨床導入が最近であるためにこの主題に関する論文数は限られているが、その潜在的な利点はさまざまな診断用途に及ぶ。 Medical imaging faces challenges such as limited spatial resolution, interference from electronic noise and poor contrast-to-noise ratios. Photon Counting Computed Tomography (PCCT) has emerged as a solution, addressing these issues with its innovative technology. This review delves into the recent developments and applications of PCCT in pre-clinical research, emphasizing its potential to overcome traditional imaging limitations. For example, PCCT has demonstrated remarkable efficacy in improving the detection of subtle abnormalities in the breast, providing a level of detail previously unattainable. Examining the current literature on PCCT, it presents a comprehensive analysis of the technology, highlighting the main features of scanners and their varied applications. In addition, it explores the integration of deep learning into PCCT, along with the study of radiomic features, presenting successful applications in data processing. While acknowledging these advances, it also discusses the existing challenges in this field, paving the way for future research and improvements in medical imaging technologies. Despite the limited number of articles on this subject, due to the recent integration of PCCT at a clinical level, its potential benefits extend to various diagnostic applications.	翻訳日:2024-02-08 18:18:17 公開日:2024-02-06
# 時間的ラベルノイズ下での時系列からの学習 Learning from Time Series under Temporal Label Noise ( http://arxiv.org/abs/2402.04398v1 ) ライセンス: Link先を確認	Sujay Nagaraj, Walter Gerych, Sana Tonekaboni, Anna Goldenberg, Berk Ustun, Thomas Hartvigsen	(参考訳) 多くの逐次的な分類タスクは、時間とともに変化するラベルノイズの影響を受ける。このようなノイズにより、ラベルの品質は時間とともに向上したり、悪化したり、周期的に変化したりする。我々はまず、時系列の逐次分類においてこれまで研究されていなかった問題である時間的ラベルノイズを提案し、定式化する。この設定では、複数のラベルが、時間依存のノイズ関数によって破損しながら順次記録される。まず、ラベルノイズ関数の時間的性質をモデル化することの重要性と、既存の手法では一貫して性能が劣ることを示す。次に、データから時間的ラベルノイズ関数を直接推定することで、ノイズに頑健な分類器を訓練する手法を提案する。実データと合成データを用いて、提案手法が多様な時間的ラベルノイズ関数の存在下で最先端の性能を達成することを示す。 Many sequential classification tasks are affected by label noise that varies over time. Such noise can cause label quality to improve, worsen, or periodically change over time. We first propose and formalize temporal label noise, an unstudied problem for sequential classification of time series. In this setting, multiple labels are recorded in sequence while being corrupted by a time-dependent noise function. We first demonstrate the importance of modelling the temporal nature of the label noise function and how existing methods will consistently underperform. We then propose methods that can train noise-tolerant classifiers by estimating the temporal label noise function directly from data. We show that our methods lead to state-of-the-art performance in the presence of diverse temporal label noise functions using real and synthetic data.	翻訳日:2024-02-08 18:11:51 公開日:2024-02-06
# 保証付きLLMベースのソフトウェアエンジニアリング Assured LLM-Based Software Engineering ( http://arxiv.org/abs/2402.04380v1 ) ライセンス: Link先を確認	Nadia Alshahwan, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang	(参考訳) 本稿では、次の問いに取り組む。大規模言語モデル(LLM)を用いて人間から独立にコードを改善しつつ、改善されたコードが (1) 元のコードの性質を退行させないこと、(2) 検証可能かつ測定可能な形で元のコードを改善していること、をどのように保証できるか。この問いに答えるため、我々はAssured LLM-Based Software Engineering(保証付きLLMベースソフトウェア工学)を提唱する。これは、遺伝的改良(Genetic Improvement)に着想を得た生成・テスト型のアプローチである。Assured LLMSEは一連のセマンティックフィルタを適用し、この2つの保証を満たさないコードを破棄する。これにより、LLMが幻覚を起こしやすいという潜在的な問題を克服できる。その結果、人間に依存せずにLLMでコードを生成できるようになる。人間は、他の人間のエンジニアが書いたコードに対して行うのと同様に、最終的なコードレビュアーの役割のみを担う。本稿は、2024年4月15日(月)にポルトガルのリスボンで開催されるInternational Workshop on Interpretability, Robustness, and Benchmarking in Neural Software EngineeringにおけるMark Harmanの基調講演の内容の概要である。 In this paper we address the following question: How can we use Large Language Models (LLMs) to improve code independently of a human, while ensuring that the improved code (1) does not regress the properties of the original code, and (2) improves the original in a verifiable and measurable way? To address this question, we advocate Assured LLM-Based Software Engineering; a generate-and-test approach, inspired by Genetic Improvement. Assured LLMSE applies a series of semantic filters that discard code that fails to meet these twin guarantees. This overcomes the potential problem of LLMs' propensity to hallucinate. It allows us to generate code using LLMs, independently of any human. The human plays the role only of final code reviewer, as they would do with code generated by other human engineers. This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.	翻訳日:2024-02-08 18:11:39 公開日:2024-02-06
# ファインチューニングした言語モデルは安定な無機材料をテキストとして生成する Fine-Tuned Language Models Generate Stable Inorganic Materials as Text ( http://arxiv.org/abs/2402.04379v1 ) ライセンス: Link先を確認	Nate Gruver, Anuroop Sriram, Andrea Madotto, Andrew Gordon Wilson, C. Lawrence Zitnick, Zachary Ulissi	(参考訳) 安定な材料を生成するために大規模言語モデルをファインチューニングすることを提案する。型破りではあるが、テキストとして符号化した原子レベルのデータで大規模言語モデルをファインチューニングする方法は、実装が簡単でありながら信頼性が高く、サンプリングした構造の約90%が原子の位置と電荷に関する物理的制約を満たす。学習済みのMLポテンシャルとゴールドスタンダードであるDFT計算の両方によるエネルギー・アバブ・ハル(energy above hull)の計算を用いて、我々の最も強力なモデル(ファインチューニングしたLLaMA-2 70B)が、競合する拡散モデルであるCDVAEの約2倍の割合(49%対28%)で、準安定と予測される材料を生成できることを示す。テキストプロンプトに本来備わる柔軟性により、我々のモデルは、安定材料の無条件生成、部分構造の補完(インフィリング)、テキスト条件付き生成に同時に利用できる。最後に、結晶構造の主要な対称性を捉える言語モデルの能力はモデルの規模とともに向上することを示す。これは、事前学習済みLLMのバイアスが原子レベルのデータに驚くほど適していることを示唆している。 We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable material, infilling of partial structures and text-conditional generation. Finally, we show that language models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.	翻訳日:2024-02-08 18:11:07 公開日:2024-02-06
# $\texttt{NeRCC}$: レジリエントな分散予測サービングシステムのためのネスト回帰符号化計算 $\texttt{NeRCC}$: Nested-Regression Coded Computing for Resilient Distributed Prediction Serving Systems ( http://arxiv.org/abs/2402.04377v1 ) ライセンス: Link先を確認	Parsa Moradi, Mohammad Ali Maddah-Ali	(参考訳) ストラグラーに対する耐性は、事前学習済みの機械学習モデルで入力データに対する推論を実行する予測サービングシステムの重要な要素である。本稿では、近似符号化計算のための汎用的なストラグラー耐性フレームワークNeRCCを提案する。NeRCCは次の3層からなる。(1)符号化回帰とサンプリング:元のデータ点の組み合わせとして符号化データ点を生成する。(2)計算:ワーカーのクラスタが符号化データ点に対して推論を実行する。(3)復号回帰とサンプリング:符号化データ点に対する利用可能な予測から、元のデータ点の予測を近似的に復元する。このフレームワーク全体の目的関数から、符号化層と復号層の2つの回帰モデルの間に相互関係があることが明らかになると論じる。このネスト回帰問題に対し、2つの回帰の依存関係を、同時に最適化される2つの正則化項に集約する解法を提案する。さまざまなデータセットと、LeNet5、RepVGG、Vision Transformer(ViT)を含むさまざまな機械学習モデルでの広範な実験により、NeRCCが幅広いストラグラーの条件において元の予測を精度よく近似し、最先端手法を最大23%上回ることを示す。 Resilience against stragglers is a critical element of prediction serving systems, tasked with executing inferences on input data for a pre-trained machine-learning model. In this paper, we propose NeRCC, as a general straggler-resistant framework for approximate coded computing. NeRCC includes three layers: (1) encoding regression and sampling, which generates coded data points, as a combination of original data points, (2) computing, in which a cluster of workers run inference on the coded data points, (3) decoding regression and sampling, which approximately recovers the predictions of the original data points from the available predictions on the coded data points. We argue that the overall objective of the framework reveals an underlying interconnection between two regression models in the encoding and decoding layers. We propose a solution to the nested regressions problem by summarizing their dependence on two regularization terms that are jointly optimized. Our extensive experiments on different datasets and various machine learning models, including LeNet5, RepVGG, and Vision Transformer (ViT), demonstrate that NeRCC accurately approximates the original predictions in a wide range of stragglers, outperforming the state-of-the-art by up to 23%.	翻訳日:2024-02-08 18:10:36 公開日:2024-02-06
# 実データと代理データを用いた学習のスケーリング則 Scaling laws for learning with real and surrogate data ( http://arxiv.org/abs/2402.04376v1 ) ライセンス: Link先を確認	Ayush Jain, Andrea Montanari and Eren Sasoglu	(参考訳) 高品質なデータを大量に収集することは、しばしば法外に高価または非現実的であり、機械学習における重大なボトルネックとなっている。その代わりに、ターゲット分布からの$n$個の小さなデータ集合を、公開データセット、異なる状況下で収集されたデータ、生成モデルで合成されたデータなど、より入手しやすい情報源のデータで拡張することが考えられる。これらの区別をあえて曖昧にして、このようなデータを「代理データ(surrogate data)」と呼ぶ。我々は、代理データを訓練に統合するための単純な方式を定義し、理論モデルと実証研究の両方を用いてその振る舞いを調べる。主な発見は次のとおりである。$(i)$ 代理データを統合することで、元の分布に対するテスト誤差を大幅に低減できる。$(ii)$ この利益を得るには、最適に重み付けした経験リスク最小化を用いることが不可欠である。$(iii)$ 実データと代理データの混合で訓練したモデルのテスト誤差は、スケーリング則でよく記述される。これを用いて、最適な重み付けと代理データから得られる利得を予測できる。 Collecting large quantities of high-quality data is often prohibitively expensive or impractical, and a crucial bottleneck in machine learning. One may instead augment a small set of $n$ data points from the target distribution with data from more accessible sources like public datasets, data collected under different circumstances, or synthesized by generative models. Blurring distinctions, we refer to such data as 'surrogate data'. We define a simple scheme for integrating surrogate data into training and use both theoretical models and empirical studies to explore its behavior. Our main findings are: $(i)$ Integrating surrogate data can significantly reduce the test error on the original distribution; $(ii)$ In order to reap this benefit, it is crucial to use optimally weighted empirical risk minimization; $(iii)$ The test error of models trained on mixtures of real and surrogate data is well described by a scaling law. This can be used to predict the optimal weighting and the gain from surrogate data.	翻訳日:2024-02-08 18:09:54 公開日:2024-02-06
# 生成AIの世界 - ディープフェイクと大規模言語モデル The World of Generative AI: Deepfakes and Large Language Models ( http://arxiv.org/abs/2402.04373v1 ) ライセンス: Link先を確認	Alakananda Mitra, Saraju P. Mohanty, and Elias Kougianos	(参考訳) 我々は生成人工知能(GenAI)の時代に生きている。ディープフェイクと大規模言語モデル(LLM)はGenAIの2つの例である。特にディープフェイクは、誤情報を拡散し真実をねじ曲げることができるため、社会に対する憂慮すべき脅威となっている。LLMは汎用的な言語を生成する強力な言語モデルである。しかし、その生成的な性質ゆえに、悪意を持って使われれば人々にとってのリスクにもなりうる。これらの技術の倫理的な利用は大きな関心事である。この短い記事は、両者の相互関係を明らかにしようとするものである。 We live in the era of Generative Artificial Intelligence (GenAI). Deepfakes and Large Language Models (LLMs) are two examples of GenAI. Deepfakes, in particular, pose an alarming threat to society as they are capable of spreading misinformation and changing the truth. LLMs are powerful language models that generate general-purpose language. However, due to their generative aspect, they can also be a risk to people if used with ill intentions. The ethical use of these technologies is a big concern. This short article tries to find out the interrelationship between them.	翻訳日:2024-02-08 18:09:27 公開日:2024-02-06
# 歩行者横断決定は、雑音の多い視覚知覚下での最適意思決定によって説明できる Pedestrian crossing decisions can be explained by bounded optimal decision-making under noisy visual perception ( http://arxiv.org/abs/2402.04370v1 ) ライセンス: Link先を確認	Yueyang Wang, Aravinda Ramakrishnan Srinivasan, Jussi P.P. Jokinen, Antti Oulasvirta, Gustav Markkula	(参考訳) 本稿では,計算的合理性理論に基づく歩行者横断決定のモデルを提案する。交差決定は、人間の認知的限界から生じる最適性に縛られ、境界的に最適であると仮定される。これまでの歩行者行動のモデルは「ブラックボックス」機械学習モデルか、認知的要因に関する明確な仮定を持つ機械的モデルであった。具体的には、機械的にノイズの多い人間の視覚知覚をモデル化し、交差する際の報酬を仮定するが、強化学習を用いて境界付き最適行動ポリシーを学習する。本モデルでは, 従来モデルよりも多くの経験的現象を再現し, 1) 接近する車両の到着までの時間が, 歩行者がギャップを受理するか否か, (2) 降車前を横断する速度と, (3) 降車前を横断する歩行者タイミングと, (4) 降車停止距離の横断タイミングに与える影響について検討した。特に, 速度依存的ギャップ受容などの意思決定における行動が, 視覚的知覚の制約に対する合理的適応の産物である可能性が示唆された。また,個人毎の認知的制約や報酬のパラメータを適合させることで,個人差をよりよく説明できる。結論として、RLモデルとメカニスティックモデルの両方を活用することで、歩行者行動に関する新たな洞察を与え、より正確でスケーラブルな歩行者モデルに有用な基盤を提供する。 This paper presents a model of pedestrian crossing decisions, based on the theory of computational rationality. It is assumed that crossing decisions are boundedly optimal, with bounds on optimality arising from human cognitive limitations. While previous models of pedestrian behaviour have been either 'black-box' machine learning models or mechanistic models with explicit assumptions about cognitive factors, we combine both approaches. Specifically, we model mechanistically noisy human visual perception and assumed rewards in crossing, but we use reinforcement learning to learn bounded optimal behaviour policy. The model reproduces a larger number of known empirical phenomena than previous models, in particular: (1) the effect of the time to arrival of an approaching vehicle on whether the pedestrian accepts the gap, the effect of the vehicle's speed on both (2) gap acceptance and (3) pedestrian timing of crossing in front of yielding vehicles, and (4) the effect on this crossing timing of the stopping distance of the yielding vehicle. 
Notably, our findings suggest that behaviours previously framed as 'biases' in decision-making, such as speed-dependent gap acceptance, might instead be a product of rational adaptation to the constraints of visual perception. Our approach also permits fitting the parameters of cognitive constraints and rewards per individual, to better account for individual differences. To conclude, by leveraging both RL and mechanistic modelling, our model offers novel insights about pedestrian behaviour, and may provide a useful foundation for more accurate and scalable pedestrian models.	翻訳日:2024-02-08 18:08:51 公開日:2024-02-06
# ニューラルネットワークは複雑さの増加統計を学習する Neural Networks Learn Statistics of Increasing Complexity ( http://arxiv.org/abs/2402.04362v1 ) ライセンス: Link先を確認	Nora Belrose, Quintin Pope, Lucia Quirke, Alex Mallen, Xiaoli Fern	(参考訳) 分布の単純さバイアス(DSB)は、ニューラルネットワークがまずデータ分散の低次モーメントを学習し、次に高次相関に移行することを仮定する。本研究は,低次統計値がトレーニング開始直後のトレーニングセットと一致した最大エントロピー分布において,ネットワークが自動的に良好に学習し,その後にその能力を失うことを示すことによって,DSBに対する説得力のある新たな証拠を示す。また、トークン$n$-gramの周波数と埋め込みベクトルのモーメントの等価性を証明し、LLMのバイアスに関する経験的証拠を見つけることによって、DSBを離散領域に拡張する。最後に, 最適な移動手段を用いて, あるクラスの低次統計を手術的に編集し, 初期学習ネットワークが, 対象クラスから抽出されたかのように, 編集されたサンプルを処理していることを示す。コードはhttps://github.com/EleutherAI/features-across-timeで入手できる。 The distributional simplicity bias (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in training, then lose this ability later. We also extend the DSB to discrete domains by proving an equivalence between token $n$-gram frequencies and the moments of embedding vectors, and by finding empirical evidence for the bias in LLMs. Finally we use optimal transport methods to surgically edit the low-order statistics of one class to match those of another, and show that early-training networks treat the edited samples as if they were drawn from the target class. Code is available at https://github.com/EleutherAI/features-across-time.	翻訳日:2024-02-08 18:07:50 公開日:2024-02-06
# 適応推論:理論的限界と未探究の機会 Adaptive Inference: Theoretical Limits and Unexplored Opportunities ( http://arxiv.org/abs/2402.04359v1 ) ライセンス: Link先を確認	Soheil Hor, Ying Qian, Mert Pilanci, Amin Arbabian	(参考訳) 本稿では,適応推論アルゴリズムの効率と性能ゲイン機会サイズを定量化する最初の理論的枠組みを提案する。コンピュータビジョンおよび自然言語処理タスクにおける10-100倍の効率向上の可能性を示す実証的証拠により,性能上のペナルティを伴わずに実現可能な効率と性能向上のための新たな近似的および厳密な境界を提供する。さらに,適応推論状態空間の最適選択と設計を通じて,実現可能な効率の向上に関する洞察を提供する。 This paper introduces the first theoretical framework for quantifying the efficiency and performance gain opportunity size of adaptive inference algorithms. We provide new approximate and exact bounds for the achievable efficiency and performance gains, supported by empirical evidence demonstrating the potential for 10-100x efficiency improvements in both Computer Vision and Natural Language Processing tasks without incurring any performance penalties. Additionally, we offer insights on improving achievable efficiency gains through the optimal selection and design of adaptive inference state spaces.	翻訳日:2024-02-08 18:07:30 公開日:2024-02-06
# ダンス生成のための双方向自己回帰拡散モデル Bidirectional Autoregressive Diffusion Model for Dance Generation ( http://arxiv.org/abs/2402.04356v1 ) ライセンス: Link先を確認	Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang	(参考訳) ダンスは人間の感情を表現するための強力な媒体として機能するが、人生のようなダンスの生成は依然としてかなりの課題である。近年、拡散モデルは様々な領域で顕著な生成能力を示した。彼らは、適応可能な多対多の性質のために、人間のモーション生成を約束します。それにもかかわらず、現在の拡散に基づく運動生成モデルは、局所的および双方向的な拡張による動きに焦点を絞らず、直接かつ一方向の運動列を直接生成することが多い。高品質な舞踊の動きを振る舞う際には、音楽的文脈だけでなく、近隣の音楽的な舞踊の動きも考慮する必要がある。そこで本研究では,音楽対ダンス生成のための双方向自己回帰拡散モデル(badm)を提案する。生成したダンス動作をよりスムーズにするため、局所運動強調のための局所情報デコーダを構築する。提案手法は,入力条件と近傍動作に基づいて新たな動きを生成可能とし,個々の動きスライスを反復的に予測し,すべての予測を集約する。生成されたダンスとビートとの同期性をさらに向上するため、ビート情報を入力として組み込んで、より優れた音楽整列ダンス動作を生成する。実験結果から,提案モデルが既存の一方向アプローチと比較して最先端性能を実現することを示す。 Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create entire motion sequences directly and unidirectionally, lacking focus on the motion with local and bidirectional enhancement. When choreographing high-quality dance movements, people need to take into account not only the musical context but also the nearby music-aligned dance motions. To authentically capture human behavior, we propose a Bidirectional Autoregressive Diffusion Model (BADM) for music-to-dance generation, where a bidirectional encoder is built to enforce that the generated dance is harmonious in both the forward and backward directions. To make the generated dance motion smoother, a local information decoder is built for local motion enhancement. 
The proposed framework is able to generate new motions based on the input conditions and nearby motions, which foresees individual motion slices iteratively and consolidates all predictions. To further refine the synchronicity between the generated dance and the beat, the beat information is incorporated as an input to generate better music-aligned dance movements. Experimental results demonstrate that the proposed model achieves state-of-the-art performance compared to existing unidirectional approaches on the prominent benchmark for music-to-dance generation.	翻訳日:2024-02-08 18:07:19 公開日:2024-02-06
# PQMass:確率質量推定を用いた生成モデルの品質の確率論的評価 PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation ( http://arxiv.org/abs/2402.04355v1 ) ライセンス: Link先を確認	Pablo Lemos, Sammy Sharief, Nikolay Malkin, Laurence Perreault-Levasseur, Yashar Hezaveh	(参考訳) 生成モデルの品質を評価するための包括的サンプルベース手法を提案する。提案手法は,2組のサンプルが同一分布から抽出される確率を推定し,単一生成モデルの性能を評価する統計的に厳密な手法や,同一データセット上で訓練された複数の競合モデルの比較を可能にする。この比較は、空間を重複しない領域に分割し、各領域のデータサンプル数を比較することで行うことができる。このメソッドは生成モデルとテストデータからのサンプルのみを必要とする。高次元データ上で直接機能することができ、次元の縮小の必要性を回避できる。特に,本手法は真の分布の密度に関する仮定に依存せず,訓練や補助モデルへの適合にも依存しない。代わりに、データ空間内の様々な部分領域にわたる密度(確率質量)の積分を近似することに焦点を当てている。 We propose a comprehensive sample-based method for assessing the quality of generative models. The proposed approach enables the estimation of the probability that two sets of samples are drawn from the same distribution, providing a statistically rigorous method for assessing the performance of a single generative model or the comparison of multiple competing models trained on the same dataset. This comparison can be conducted by dividing the space into non-overlapping regions and comparing the number of data samples in each region. The method only requires samples from the generative model and the test data. It is capable of functioning directly on high-dimensional data, obviating the need for dimensionality reduction. Significantly, the proposed method does not depend on assumptions regarding the density of the true distribution, and it does not rely on training or fitting any auxiliary models. Instead, it focuses on approximating the integral of the density (probability mass) across various sub-regions within the data space.	翻訳日:2024-02-08 18:06:56 公開日:2024-02-06
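The region-counting idea in the abstract above can be sketched in a few lines. This sketch assumes Voronoi cells around a handful of reference points as the non-overlapping regions and a Pearson two-sample chi-squared statistic for comparing counts. Both choices, and all function names, are illustrative assumptions and are not taken from the paper.

```python
import random


def nearest(point, centers):
    """Index of the reference point closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def region_counts(samples, centers):
    """Count how many samples fall into each Voronoi cell defined by `centers`."""
    counts = [0] * len(centers)
    for sample in samples:
        counts[nearest(sample, centers)] += 1
    return counts


def chi2_two_sample(counts_a, counts_b):
    """Pearson chi-squared statistic comparing two vectors of region counts."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    stat = 0.0
    for a, b in zip(counts_a, counts_b):
        if a + b == 0:
            continue  # an empty region carries no information
        expected_a = n_a * (a + b) / (n_a + n_b)
        expected_b = n_b * (a + b) / (n_a + n_b)
        stat += (a - expected_a) ** 2 / expected_a + (b - expected_b) ** 2 / expected_b
    return stat


# Toy usage: two sample sets drawn from the same 2-D Gaussian.
rng = random.Random(0)
generated = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
held_out = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
centers = held_out[:10]  # hypothetical choice of reference points
stat = chi2_two_sample(region_counts(generated, centers), region_counts(held_out, centers))
print("chi-squared statistic:", stat)
```

Under the null hypothesis that both sets come from the same distribution, a statistic of this kind is approximately chi-squared distributed with (number of regions - 1) degrees of freedom, so a p-value can be read from a standard table. No density model or dimensionality reduction is involved, only counts.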
# 3Dプリンターで制御されたシリンジポンプは、試薬を2本、アクティブ、レギュラブル、同時に供給する。免疫クロマトグラフィーテストストリップの製造 3D printer-controlled syringe pumps for dual, active, regulable and simultaneous dispensing of reagents. Manufacturing of immunochromatographic test strips ( http://arxiv.org/abs/2402.04354v1 ) ライセンス: Link先を確認	Gabriel Siano, Leandro Peretti, Juan Manuel Marquez, Nazarena Pujato, Leonardo Giovanini and Claudio Berli	(参考訳) 横流式免疫測定法 (lfia) は, 製造コスト, 単純性, 移植性といった複数の利点を組み合わせることで, インフラや高度に訓練された人材を必要とせずにバイオマーカーを検出できるため, 様々なアナライトの検出に世界中で広く用いられている。本稿では,特に試験線 (tl) と制御線 (cl) の形での試薬の制御および能動的投与に関して,実験室規模でのlfiaの製造プロセスに対する解決策を提供する。提案する3dプリンタの適応は簡単で、フリーで、多くの研究所が既にインフラに導入しているため、この課題を達成するため、3dプリンタをシリンジポンプ(sp)の制御にも応用した。 3Dプリンタの標準機能は、SPを切断し、エクストルーダを再接続することで容易に復元できる。さらに、3dプリンターの統一的な制御により、特定の高価な商用機器でのみ見られる4つの機能、デュアル、アクティブ、レギュレータブル、同時ディスペンサーが可能になる。提案手法では,3dプリンタで制御したspsで2本以上の線(cl,tl)を同時に供給することの課題に対処し,実験範囲内では線幅の規制などを行った。また,レプトスピローシス検出のためのLFIAの構築も,自動試薬ディスペンシングの実践例として示されている。 Lateral flow immunoassays (LFIA) are widely used worldwide for the detection of different analytes because they combine multiple advantages such as low production cost, simplicity, and portability, which allows biomarkers detection without requiring infrastructure or highly trained personnel. Here we propose to provide solutions to the manufacturing process of LFIA at laboratory-scale, particularly to the controlled and active dispensing of the reagents in the form the Test Lines (TL) and the Control Lines (CL). To accomplish this task, we adapted a 3D printer to also control Syringe Pumps (SP), since the proposed adaptation of a 3D printer is easy, free and many laboratories already have it in their infrastructure. In turn, the standard function of the 3D printer can be easily restored by disconnecting the SPs and reconnecting the extruder. Additionally, the unified control of the 3D printer enables dual, active, regulable and simultaneous dispensing, four features that are typically found only in certain high-cost commercial equipment. 
With the proposed setup, the challenge of dispensing simultaneously at least 2 lines (CL and TL) with SPs controlled by a 3D printer was addressed, including regulation in the width of dispensed lines within experimental limits. Also, the construction of a LFIA for the detection of leptospirosis is shown as a practical example of automatized reagent dispensing.	翻訳日:2024-02-08 18:06:41 公開日:2024-02-06
# The Hedgehog & the Porcupine:softmaxの模倣による表現力の高い線形注意 The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry ( http://arxiv.org/abs/2402.04347v1 ) ライセンス: Link先を確認	Michael Zhang, Kush Bhatia, Hermann Kumbong, and Christopher Ré	(参考訳) 線形注意はトランスフォーマーの効率を改善する可能性を示し、注意の2次計算量を系列長に対して線形に削減する。これは(1)スクラッチからの線形トランスフォーマーの学習、(2)タスク固有のトランスフォーマーをタスク性能を回復する線形版に変換する「finetuned-conversion」、(3)大規模言語モデルなどのトランスフォーマーを下流タスクで微調整可能な線形版に変換する「pretrained-conversion」、という魅力的な可能性を持つ。しかし、線形注意は品質において標準的なソフトマックス注意を下回ることが多い。この性能ギャップを埋めるために調べたところ、従来の線形注意は、低エントロピー(「スパイキー」)な重みとドット積単調性という、優れた性能に結びつくソフトマックス注意の重要な特性を欠いていることがわかった。さらに,これらの特性を保ち,ソフトマックス性能に匹敵するが,線形注意で計算するには非効率な,驚くほど単純な特徴マップも観察する。そこで我々は,線形計算量を維持しつつ,ソフトマックス注意のスパイキー性と単調性を保持する学習可能な線形注意であるHedgehogを提案する。 Hedgehogは単純な学習可能なMLPを使用して、ソフトマックス注意を模倣する注意重みを生成する。実験の結果、Hedgehogはスクラッチ学習およびfinetuned-conversionの設定で標準トランスフォーマーの品質の99%以上を回復し、因果GPTによるWikiText-103では最大6パープレキシティポイント、微調整された双方向BERTではGLUEスコアで最大8.7ポイント、従来の線形注意を上回った。 Hedgehogはpretrained-conversionも可能にする。事前学習済みのGPT-2を線形注意版に変換することで、125Mのサブクアドラティックデコーダモデルとして、WikiText-103で最先端の16.7パープレキシティを実現する。最後に、事前学習済みのLlama-2 7Bを実用的な線形注意Llamaに変換する。低ランク適応により、Hedgehog-Llama2 7Bは標準注意のベースモデルよりROUGE-1で28.1ポイント高いスコアを達成するが、従来の線形注意では16.5ポイント低下する。 Linear attentions have shown potential for improving Transformer efficiency, reducing attention's quadratic complexity to linear in sequence length. This holds exciting promise for (1) training linear Transformers from scratch, (2) "finetuned-conversion" of task-specific Transformers into linear versions that recover task performance, and (3) "pretrained-conversion" of Transformers such as large language models into linear versions finetunable on downstream tasks. However, linear attentions often underperform standard softmax attention in quality. To close this performance gap, we find prior linear attentions lack key properties of softmax attention tied to good performance: low-entropy (or "spiky") weights and dot-product monotonicity. 
We further observe surprisingly simple feature maps that retain these properties and match softmax performance, but are inefficient to compute in linear attention. We thus propose Hedgehog, a learnable linear attention that retains the spiky and monotonic properties of softmax attention while maintaining linear complexity. Hedgehog uses simple trainable MLPs to produce attention weights mimicking softmax attention. Experiments show Hedgehog recovers over 99% of standard Transformer quality in train-from-scratch and finetuned-conversion settings, outperforming prior linear attentions up to 6 perplexity points on WikiText-103 with causal GPTs, and up to 8.7 GLUE score points on finetuned bidirectional BERTs. Hedgehog also enables pretrained-conversion. Converting a pretrained GPT-2 into a linear attention variant achieves state-of-the-art 16.7 perplexity on WikiText-103 for 125M subquadratic decoder models. We finally turn a pretrained Llama-2 7B into a viable linear attention Llama. With low-rank adaptation, Hedgehog-Llama2 7B achieves 28.1 higher ROUGE-1 points over the base standard attention model, where prior linear attentions lead to 16.5 point drops.	翻訳日:2024-02-08 18:06:14 公開日:2024-02-06
# 信頼度校正はコンフォーマル予測に役立つか? Does Confidence Calibration Help Conformal Prediction? ( http://arxiv.org/abs/2402.04344v1 ) ライセンス: Link先を確認	Huajun Xi, Jianguo Huang, Lei Feng, Hongxin Wei	(参考訳) 不確実性認定技術としての共形予測は、真のラベルを高い確率で含むことが保証される予測セットを構築する。以前の研究は通常、信頼度校正が共形予測に役立つと仮定して、分類器の校正に温度スケーリングを用いる。本研究は, 熱後キャリブレーション法により, キャリブレーションを改良した予測セットが驚くほど大きくなり, 小温度での過信が共形予測性能の恩恵を受けることを示した。理論的には、高い信頼性は予測セットに新しいクラスを追加する確率を減少させる。この解析に触発されて,接地ラベルの閾値と非定値スコアの差を補正する新しい手法である$\textbf{conformal temperature scaling}$ (confts)を提案する。このようにして、ConfTSの新しい目的は、$\textit{marginal coverage}$を満たす最適なセットに向けて温度値を最適化する。実験により,提案手法は広く用いられている共形予測法を効果的に改善できることが示された。 Conformal prediction, as an emerging uncertainty qualification technique, constructs prediction sets that are guaranteed to contain the true label with high probability. Previous works usually employ temperature scaling to calibrate the classifier, assuming that confidence calibration can benefit conformal prediction. In this work, we first show that post-hoc calibration methods surprisingly lead to larger prediction sets with improved calibration, while over-confidence with small temperatures benefits the conformal prediction performance instead. Theoretically, we prove that high confidence reduces the probability of appending a new class in the prediction set. Inspired by the analysis, we propose a novel method, $\textbf{Conformal Temperature Scaling}$ (ConfTS), which rectifies the objective through the gap between the threshold and the non-conformity score of the ground-truth label. In this way, the new objective of ConfTS will optimize the temperature value toward an optimal set that satisfies the $\textit{marginal coverage}$. Experiments demonstrate that our method can effectively improve widely-used conformal prediction methods.	翻訳日:2024-02-08 18:05:28 公開日:2024-02-06
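To make the interaction between temperature and prediction sets concrete, here is a minimal sketch of split conformal prediction with a temperature-scaled softmax. It assumes the common threshold-style non-conformity score (one minus the probability of the true label). The abstract does not state which scores the authors use, and the ConfTS objective itself is not reproduced here. All function names are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; smaller temperatures sharpen the distribution."""
    peak = max(logits)
    exps = [math.exp((z - peak) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conformal_quantile(scores, alpha):
    """Finite-sample quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every class gets included
    return sorted(scores)[k - 1]


def prediction_set(probs, qhat):
    """All classes whose non-conformity score 1 - p does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]


def calibrate(cal_logits, cal_labels, alpha, temperature):
    """Compute the conformal threshold on a held-out calibration split."""
    scores = [
        1.0 - softmax(logits, temperature)[label]
        for logits, label in zip(cal_logits, cal_labels)
    ]
    return conformal_quantile(scores, alpha)
```

In use, one would call `calibrate` once per candidate temperature on the calibration split and compare the average size of `prediction_set` on test points. That comparison is the kind of experiment the abstract describes.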
# 高速オンライン変更点検出 Fast Online Changepoint Detection ( http://arxiv.org/abs/2402.04433v1 ) ライセンス: Link先を確認	Fabrizio Ghezzi, Eduardo Rossi, Lorenzo Trapani	(参考訳) 線形回帰モデルを用いてオンライン変化点検出について検討する。観測地平線の早期に発生する破断のタイムリーな検出を可能にするために, 回帰残差のCUSUMプロセスに基づく重み付き統計クラスを提案する。次に,異なる重み付けスキームを用いて構成された複合統計学のクラスを提案する。変更点をマークする決定規則は,様々な重みで最大の統計値に基づいており,変更点の位置に関係なく迅速な検出を可能にするvetoベースの投票機構として効果的に機能する。我々の理論は、非常に一般的な弱い依存の形で導出され、経済学、医学、その他の応用科学で遭遇する全ての時系列にテストを適用することができる。モンテカルロシミュレーションにより,本手法は手続き的にi型エラーを制御でき,ブレークの有無で検出遅延が短いことを示す。 We study online changepoint detection in the context of a linear regression model. We propose a class of heavily weighted statistics based on the CUSUM process of the regression residuals, which are specifically designed to ensure timely detection of breaks occurring early on during the monitoring horizon. We subsequently propose a class of composite statistics, constructed using different weighing schemes; the decision rule to mark a changepoint is based on the largest statistic across the various weights, thus effectively working like a veto-based voting mechanism, which ensures fast detection irrespective of the location of the changepoint. Our theory is derived under a very general form of weak dependence, thus being able to apply our tests to virtually all time series encountered in economics, medicine, and other applied sciences. Monte Carlo simulations show that our methodologies are able to control the procedure-wise Type I Error, and have short detection delays in the presence of breaks.	翻訳日:2024-02-08 17:57:49 公開日:2024-02-06
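The detector described in the abstract above can be sketched as a weighted CUSUM of monitoring-period residuals, taking the largest statistic over several weights. The boundary function used here, sqrt(m) * (1 + k/m) * (k/(k+m))^gamma, is the classical form from the sequential monitoring literature and is an assumption about what the authors use. The single `threshold` is a placeholder, since in practice each weight has its own critical value.

```python
import math


def weighted_cusum_alarm(residuals, m, sigma, gammas=(0.0, 0.25, 0.45), threshold=2.0):
    """Return the first monitoring index k (1-based) at which the largest
    weighted CUSUM statistic exceeds `threshold`, or None if it never does.

    residuals : regression residuals observed during monitoring
    m         : length of the historical (training) sample
    sigma     : residual standard deviation estimated on the historical sample
    gammas    : weighting exponents in [0, 1/2); larger values put more weight
                on early breaks (values chosen here are illustrative)
    threshold : hypothetical common critical value, for illustration only
    """
    partial_sum = 0.0
    for k, residual in enumerate(residuals, start=1):
        partial_sum += residual
        base = abs(partial_sum) / (sigma * math.sqrt(m) * (1.0 + k / m))
        # Composite rule: take the largest statistic across the weights.
        stat = max(base / (k / (k + m)) ** gamma for gamma in gammas)
        if stat > threshold:
            return k
    return None


# Toy usage: residuals are zero for 20 periods, then shift by 5 units.
no_break = [0.0] * 60
with_break = [0.0] * 20 + [5.0] * 40
print(weighted_cusum_alarm(no_break, m=100, sigma=1.0))
print(weighted_cusum_alarm(with_break, m=100, sigma=1.0))
```

Taking the maximum over `gammas` is what lets a heavily weighted statistic react to a break soon after monitoring starts, while the lightly weighted ones remain sensitive to later breaks.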
# ChatbotがPipelineを発表 - 有限オートマトンによる大規模言語モデルの拡張 Chatbot Meets Pipeline: Augment Large Language Model with Definite Finite Automaton ( http://arxiv.org/abs/2402.04411v1 ) ライセンス: Link先を確認	Yiyou Sun and Junjie Hu and Wei Cheng and Haifeng Chen	(参考訳) 本稿では,大規模言語モデル(llm)を用いた対話型エージェントの能力向上を目的とした新しいフレームワークである,有限オートマトン拡張大言語モデル(dfa-llm)を提案する。従来のllmは、感情的サポートやカスタマサービスなど、所定のレスポンスガイドラインを備えた特別なシナリオで、規制された応答とコンプライアンス応答を生成する上での課題に直面している。我々のフレームワークは、LLM内のトレーニング対話から学んだDFA(Definite Finite Automaton)を組み込むことによって、これらの課題に対処する。この構造的アプローチにより、LDMはDFAによって導かれる決定論的応答経路に従うことができる。 DFA-LLMの利点は、人間可読なDFAによる解釈可能な構造、会話における応答の文脈認識検索、既存のLLMとのプラグアンドプレイ互換性である。大規模なベンチマークでは、DFA-LLMの有効性が検証され、会話エージェントに重要な貢献をする可能性を示している。 This paper introduces the Definite Finite Automaton augmented large language model (DFA-LLM), a novel framework designed to enhance the capabilities of conversational agents using large language models (LLMs). Traditional LLMs face challenges in generating regulated and compliant responses in special scenarios with predetermined response guidelines, like emotional support and customer service. Our framework addresses these challenges by embedding a Definite Finite Automaton (DFA), learned from training dialogues, within the LLM. This structured approach enables the LLM to adhere to a deterministic response pathway, guided by the DFA. The advantages of DFA-LLM include an interpretable structure through human-readable DFA, context-aware retrieval for responses in conversations, and plug-and-play compatibility with existing LLMs. Extensive benchmarks validate DFA-LLM's effectiveness, indicating its potential as a valuable contribution to the conversational agent.	翻訳日:2024-02-08 17:57:33 公開日:2024-02-06
# フェデレーション学習における公平でロバストで効率的な顧客貢献評価に向けて Towards Fair, Robust and Efficient Client Contribution Evaluation in Federated Learning ( http://arxiv.org/abs/2402.04409v1 ) ライセンス: Link先を確認	Meiying Zhang, Huan Zhao, Sheldon Ebron, Kan Yang	(参考訳) フェデレーション学習(fl)におけるクライアントのパフォーマンスは、さまざまな理由により異なる可能性がある。各クライアントの貢献度を評価することは、クライアントの選択と補償に不可欠である。クライアントが非独立で同一に分散した(非ID)データを持つことが多いため、ノイズや発散する可能性があるため、これは難しい。悪意のあるクライアントのリスクは、特にクライアントのローカルデータやベンチマークルートデータセットにアクセスできない場合の課題を増幅する。本稿ではFRECA(Fair, Robust, Efficient Client Assessment)と呼ばれる新しい手法を提案する。 FRECAはFedTruthというフレームワークを使用して、グローバルモデルの真実の更新を見積もり、すべてのクライアントからのコントリビューションのバランスをとり、悪意のあるクライアントからの影響をフィルタリングする。このアプローチはビザンチン攻撃に対して堅牢であり、ビザンチン耐性集約アルゴリズムを取り入れている。 FRECAはローカルモデルの更新のみで動作し、検証操作やデータセットを必要としないため、効率も良い。実験の結果,frecaはロバストな方法で顧客貢献を正確かつ効率的に定量化できることがわかった。 The performance of clients in Federated Learning (FL) can vary due to various reasons. Assessing the contributions of each client is crucial for client selection and compensation. It is challenging because clients often have non-independent and identically distributed (non-iid) data, leading to potentially noisy or divergent updates. The risk of malicious clients amplifies the challenge especially when there's no access to clients' local data or a benchmark root dataset. In this paper, we introduce a novel method called Fair, Robust, and Efficient Client Assessment (FRECA) for quantifying client contributions in FL. FRECA employs a framework called FedTruth to estimate the global model's ground truth update, balancing contributions from all clients while filtering out impacts from malicious ones. This approach is robust against Byzantine attacks and incorporates a Byzantine-resilient aggregation algorithm. FRECA is also efficient, as it operates solely on local model updates and requires no validation operations or datasets. Our experimental results show that FRECA can accurately and efficiently quantify client contributions in a robust manner.	翻訳日:2024-02-08 17:57:14 公開日:2024-02-06
# 口腔希少疾患における歯の検出・セグメンテーション・番号付けのためのDetection Transformer : データ拡張とインペインティング技術を中心に Detection Transformer for Teeth Detection, Segmentation, and Numbering in Oral Rare Diseases: Focus on Data Augmentation and Inpainting Techniques ( http://arxiv.org/abs/2402.04408v1 ) ライセンス: Link先を確認	Hocine Kadi, Théo Sourget, Marzena Kawczynski, Sara Bendjama, Bruno Grollemund, Agnès Bloch-Zupan	(参考訳) 本研究は, 口腔希少疾患の文脈における深層学習による画像処理に着目する。これはデータの入手可能性が限られるため困難を伴う。重要なステップは、パノラマX線写真における歯の検出、セグメンテーション、番号付けである。そこで我々は,希少口腔疾患患者から得られ,専門家がラベル付けしたパノラマX線写真156枚からなるデータセットを用いた。我々は, 歯の検出、セグメンテーション、および52の歯クラスの番号付けのために、Detection Transformer(DETR)ニューラルネットワークを訓練した。さらに,幾何学的変換を含むデータ拡張手法を用いた。最後に, パノラマX線写真から歯を除去したり歯を組み込んだりすることで, Stable Diffusionによるインペインティング技術を用いて新しいパノラマ画像を生成した。その結果,データ拡張を伴わないDETRではmAPが0.69を超えた。データ拡張技術を使用すると、mAPは0.82に改善された。また, インペインティング技術により生成した新しいパノラマX線写真を用いた場合も、mAPは0.76と有望な性能が得られた。 In this work, we focused on deep learning image processing in the context of oral rare diseases, which pose challenges due to limited data availability. A crucial step involves teeth detection, segmentation and numbering in panoramic radiographs. To this end, we used a dataset consisting of 156 panoramic radiographs from individuals with rare oral diseases and labeled by experts. We trained the Detection Transformer (DETR) neural network for teeth detection, segmentation, and numbering the 52 teeth classes. In addition, we used data augmentation techniques, including geometric transformations. Finally, we generated new panoramic images using inpainting techniques with stable diffusion, by removing teeth from a panoramic radiograph and integrating teeth into it. The results showed a mAP exceeding 0.69 for DETR without data augmentation. The mAP improved to 0.82 when data augmentation techniques were used. Furthermore, we observed promising performances when using new panoramic radiographs generated with inpainting technique, with a mAP of 0.76.	翻訳日:2024-02-08 17:56:54 公開日:2024-02-06
# 位相空間におけるガウス関数の線形結合の二次コヒーレンススケール Quadrature Coherence Scale of Linear Combinations of Gaussian Functions in Phase Space ( http://arxiv.org/abs/2402.04404v1 ) ライセンス: Link先を確認	Anaelle Hertz, Aaron Z. Goldberg and Khabat Heshami	(参考訳) 二次コヒーレンス尺度(quadrature coherence scale)は、最近導入された非古典性の効率的な目撃指標である。純粋な状態とガウス状態の単純な形式を取るが、混合状態の一般的な表現は違法に扱いにくい傾向にある。本稿では,ガウス関数の線形結合として表現可能なウィグナー関数を特徴とする量子状態の二次コヒーレンススケールの計算法を提案する。このフレームワークで注目すべき例として、猫の状態、GKP状態、ガウス変換、測定、繁殖プロトコルによる状態がある。特に,二次コヒーレンススケールは,損失の存在下で非古典性のスケーラビリティを調べる上で有用なツールであることを示す。我々の発見は、50%以上の損失を受け、全ての純粋な状態が古典的になるという予想を導いた。また,2次コヒーレンス尺度を,育種プロトコルの出力状態の品質の尺度として検討した。 The quadrature coherence scale is a recently introduced measure that was shown to be an efficient witness of nonclassicality. It takes a simple form for pure and Gaussian states, but a general expression for mixed states tends to be prohibitively unwieldy. In this paper, we introduce a method for computing the quadrature coherence scale of quantum states characterized by Wigner functions expressible as linear combinations of Gaussian functions. Notable examples within this framework include cat states, GKP states, and states resulting from Gaussian transformations, measurements, and breeding protocols. In particular, we show that the quadrature coherence scale serves as a valuable tool for examining the scalability of nonclassicality in the presence of loss. Our findings lead us to put forth a conjecture suggesting that, subject to 50% loss or more, all pure states become classical. We also consider the quadrature coherence scale as a measure of quality of the output state of the breeding protocol.	翻訳日:2024-02-08 17:56:34 公開日:2024-02-06
# エッジ並列グラフエンコーダ埋め込み Edge-Parallel Graph Encoder Embedding ( http://arxiv.org/abs/2402.04403v1 ) ライセンス: Link先を確認	Ariel Lubonja (1), Cencheng Shen (2), Carey Priebe (1) and Randal Burns (1) ((1) Johns Hopkins University, (2) University of Delaware)	(参考訳) グラフを埋め込む新しいアルゴリズムは、低次元表現を見つけるための漸近的計算量を減少させた。 One-Hot Graph Encoder Embedding (GEE) はエッジ上の単一の線形パスを使用し、スペクトル埋め込みに漸近的に収束する埋め込みを生成する。このアプローチのスケーリングとパフォーマンスの利点は、インタプリタ言語によるシリアル実装によって制限されてきた。我々はGEEをLigraグラフエンジン上の並列プログラムにリファクタリングし、グラフのエッジ上に関数をマッピングし、ロックフリーなアトミック命令を使ってデータ競合を防ぐ。 1.8Bエッジのグラフでは、オリジナルの実装に対して500倍、ジャストインタイムコンパイル版に対して17倍のスピードアップを実現している。 New algorithms for embedding graphs have reduced the asymptotic complexity of finding low-dimensional representations. One-Hot Graph Encoder Embedding (GEE) uses a single, linear pass over edges and produces an embedding that converges asymptotically to the spectral embedding. The scaling and performance benefits of this approach have been limited by a serial implementation in an interpreted language. We refactor GEE into a parallel program in the Ligra graph engine that maps functions over the edges of the graph and uses lock-free atomic instructions to prevent data races. On a graph with 1.8B edges, this results in a 500 times speedup over the original implementation and a 17 times speedup over a just-in-time compiled version.	翻訳日:2024-02-08 17:56:17 公開日:2024-02-06
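The single pass over edges that the abstract above refers to can be illustrated with a small serial sketch. It assumes an undirected, weighted edge list without self-loops and integer class labels for every vertex. It is a plain-Python illustration of the one-hot encoder embedding idea, not the Ligra-based parallel implementation described in the paper.

```python
def graph_encoder_embedding(edges, labels, num_classes):
    """One-hot graph encoder embedding computed in one pass over the edges.

    edges       : iterable of (u, v, weight) for an undirected graph, no self-loops
    labels      : labels[i] is the class (0 .. num_classes - 1) of vertex i
    num_classes : number of classes K; the embedding has K dimensions

    Each vertex accumulates, per class, the total weight of its edges into
    that class divided by the class size.
    """
    class_sizes = [0] * num_classes
    for label in labels:
        class_sizes[label] += 1

    embedding = [[0.0] * num_classes for _ in labels]
    for u, v, weight in edges:
        # In a parallel version these two updates are the ones that would
        # need atomic additions, because many edges touch the same row.
        embedding[u][labels[v]] += weight / class_sizes[labels[v]]
        embedding[v][labels[u]] += weight / class_sizes[labels[u]]
    return embedding


# Toy usage: a path graph 0 - 1 - 2 - 3 with two classes.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
labels = [0, 0, 1, 1]
for row in graph_encoder_embedding(edges, labels, num_classes=2):
    print(row)
```

Because each edge contributes two independent additions, the loop body maps naturally onto an edge-parallel engine. The only shared state is the embedding rows, which is where lock-free atomics come in.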
# 個人化パラメータ効率の良い微調整による大規模言語モデルの民主化 Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning ( http://arxiv.org/abs/2402.04401v1 ) ライセンス: Link先を確認	Zhaoxuan Tan, Qingkai Zeng, Yijun Tian, Zheyuan Liu, Bing Yin, Meng Jiang	(参考訳) 大規模言語モデル(LLM)におけるパーソナライゼーションは、LLMのインタラクション、コンテンツ、レコメンデーションを個々のユーザの好みに合わせることを目的として、ますます重要になっている。 llmパーソナライゼーションの最近の進歩は、行動履歴検索とテキストプロファイルによる非パラメトリック知識によるユーザクエリの強化によって、効果的なプロンプトデザインにスポットライトを当てている。しかし、これらのアプローチはモデルオーナシップの欠如によって制限され、カスタマイズとプライバシの問題に繋がった。さらに、特にユーザデータが複雑でダイナミックな場合に、ユーザの振る舞いパターンを正確に捉えられなかったことも少なくありません。これらの欠点に対処するため,ユーザ固有の行動パターンや好みを格納するために,PEFTモジュールをパーソナライズするOne PEFT Per User (OPPU)を導入する。ユーザのPEFTパラメータをプラグインすることで、個人でLLMを所有および使用することができる。 OPPUは、個人PEFTパラメータにパラメトリックユーザ知識を、検索とプロファイルを通じて取得した非パラメトリック知識と統合する。この統合は個々のllmをユーザの動作シフトに適応させる。実験の結果,OPPUはLaMPベンチマークの7つのタスクにおいて,既存のプロンプトベースの手法よりも有意に優れていた。さらに詳細な研究により、OPPUのユーザ行動シフト処理能力の強化、異なるアクティブレベルでのユーザモデリング、さまざまなユーザ履歴フォーマット間の堅牢性維持、異なるPEFTメソッドによる汎用性の表示が明らかになった。 Personalization in large language models (LLMs) is increasingly important, aiming to align LLM's interactions, content, and recommendations with individual user preferences. Recent advances in LLM personalization have spotlighted effective prompt design, by enriching user queries with non-parametric knowledge through behavior history retrieval and textual profiles. However, these approaches were limited due to a lack of model ownership, resulting in constrained customization and privacy issues. Moreover, they often failed to accurately capture user behavior patterns, especially in cases where user data were complex and dynamic. To address these shortcomings, we introduce One PEFT Per User (OPPU), which employs personalized parameter-efficient fine-tuning (PEFT) modules, to store user-specific behavior patterns and preferences. By plugging in users' personal PEFT parameters, they can own and use their LLMs personally. 
OPPU integrates parametric user knowledge in the personal PEFT parameters with the non-parametric knowledge acquired through retrieval and profile. This integration adapts individual LLMs to user behavior shifts. Experimental results demonstrate that OPPU significantly outperforms existing prompt-based methods across seven diverse tasks in the LaMP benchmark. Further in-depth studies reveal OPPU's enhanced capabilities in handling user behavior shifts, modeling users at different active levels, maintaining robustness across various user history formats, and displaying versatility with different PEFT methods.	翻訳日:2024-02-08 17:56:05 公開日:2024-02-06
# CEHR-GPT: 患者時系列による電子健康記録の作成 CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines ( http://arxiv.org/abs/2402.04400v1 ) ライセンス: Link先を確認	Chao Pang, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S. Kalluri, Elise L. Minto, Jason Patterson, Linying Zhang, George Hripcsak, No\'emie Elhadad, Karthik Natarajan	(参考訳) 合成電子健康記録(ehr)は、医療アプリケーションや機械学習モデル、特に医療データに直接アクセスしない研究者にとって、重要なツールとして登場した。ルールベースのアプローチやGAN(Generative Adversarial Network)のような既存の手法は、現実のEHRデータに似た合成データを生成するが、これらの手法はしばしば表形式を使用し、患者の履歴の時間的依存関係を無視し、データの複製を制限する。近年、EHRデータにGPT(Generative Pre-trained Transformer)を活用することへの関心が高まっている。これにより、疾患の進行分析、人口推定、反事実推論、合成データ生成などのアプリケーションが可能になる。本研究では,合成データ生成に着目し,cehr-bert由来の特定の患者表現を用いてgptモデルを訓練する能力を示し,観察的医療成果連携(omop)データ形式にシームレスに変換可能な患者シーケンスを生成する。 Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data. Although existing methods, like rule-based approaches and generative adversarial networks (GANs), generate synthetic data that resembles real-world EHR data, these methods often use a tabular format, disregarding temporal dependencies in patient histories and limiting data replication. Recently, there has been a growing interest in leveraging Generative Pre-trained Transformers (GPT) for EHR data. This enables applications like disease progression analysis, population estimation, counterfactual reasoning, and synthetic data generation. In this work, we focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation derived from CEHR-BERT, enabling us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.	翻訳日:2024-02-08 17:55:36 公開日:2024-02-06
# QuIP#: Hadamard IncoherenceとLattice CodebookによるLLM量子化をさらに改善 QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks ( http://arxiv.org/abs/2402.04396v1 ) ライセンス: Link先を確認	Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa	(参考訳) ポストトレーニング量子化(PTQ)は、LLMの重みを低精度に量子化することで、そのメモリフットプリントを削減する。本稿では,重みのみのPTQ手法であるQuIP#を紹介する。この手法は3つの新しい技術を用いて,極端な圧縮領域(重みあたり4ビット以下)で最先端の結果を達成する。第一に、QuIP#はランダム化アダマール変換を用いてQuIPの非干渉(incoherence)処理を改善する。この変換はより高速で、より良い理論的性質を持つ。第二に、QuIP#はベクトル量子化技術を使って、非干渉化された重みが持つ球状のサブガウス分布を利用する。具体的には、最適な8次元単位球充填を達成する、高対称な$E_8$格子に基づく、ハードウェア効率のよいコードブックのセットを導入する。第三に、QuIP#はファインチューニングを使用して、元のモデルに対する忠実性を改善する。実験の結果,QuIP#は既存のPTQ手法よりも優れ,PTQスケーリングにおける新しい挙動を可能にし,高速な推論をサポートすることがわかった。 Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low-precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression regimes ($\le$ 4 bits per weight) using three novel techniques. First, QuIP# improves the incoherence processing from QuIP by using the randomized Hadamard transform, which is faster and has better theoretical properties. Second, QuIP# uses vector quantization techniques to take advantage of the ball-shaped sub-Gaussian distribution that incoherent weights possess: specifically, we introduce a set of hardware-efficient codebooks based on the highly symmetric $E_8$ lattice, which achieves the optimal 8-dimension unit ball packing. Third, QuIP# uses fine-tuning to improve fidelity to the original model. Our experiments show that QuIP# outperforms existing PTQ methods, enables new behaviors in PTQ scaling, and supports fast inference.	翻訳日:2024-02-08 17:55:16 publication_placeholder
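The randomized Hadamard transform named in the abstract above is easy to illustrate on a single vector. The sketch assumes a power-of-two length and a random ±1 sign vector. QuIP# applies such transforms to weight matrices, and the details of that two-sided application are not reproduced here. Function names are illustrative.

```python
import math
import random


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform in O(n log n); n must be a power of two."""
    x = list(values)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = x[i], x[i + h]
                x[i], x[i + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def randomized_hadamard(values, signs):
    """Flip signs with a random +/-1 vector, then apply the Hadamard transform."""
    return fwht([s * v for s, v in zip(signs, values)])


# Toy usage: a single large outlier is spread evenly across all coordinates.
rng = random.Random(0)
signs = [rng.choice((-1.0, 1.0)) for _ in range(8)]
outlier = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(randomized_hadamard(outlier, signs))
```

The transform is orthogonal, so the vector norm is preserved while the largest entry shrinks. That spreading of mass is the intuition behind "incoherence" and is what makes the transformed weights easier to quantize with a fixed codebook.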
# Howard-Harvard効果:交差的不平等の制度的再生産 The Howard-Harvard effect: Institutional reproduction of intersectional inequalities ( http://arxiv.org/abs/2402.04391v1 ) ライセンス: Link先を確認	Diego Kozlowski, Thema Monroe-White, Vincent Larivière and Cassidy R. Sugimoto	(参考訳) 米国の高等教育システムでは、科学と科学者の生産が少数の機関に集中している。これは、マイノリティ化された研究者や、彼らが不均衡に多く関連付けられているトピックに影響を及ぼす。本稿では,交差的アイデンティティの異なる著者と機関との間のトピックの整合性と、それが権威および科学的影響とどう関係するかを検討する。我々はハワード・ハーバード効果を観察する。すなわち、マイノリティ化された研究者のトピックプロファイルは、ミッション主導の機関では増幅され、権威ある機関では縮小される。結果は、トピックと研究の影響における不平等の一貫したパターンを示している。具体的には,マイノリティ化された研究者と白人男性の間で,引用数と雑誌の影響について統計的に有意な差異を観察する。権威ある米国大学の総合研究プロファイルは,白人男性の研究プロファイルと高い相関関係にあり,マイノリティ化された女性の研究プロファイルとは高い負の相関関係にある。さらに、より権威ある機関に属する著者は、引用と雑誌の影響の両方における不平等の拡大と関連している。学術機関や資金提供者は、米国が完全に強固な科学的エコシステムを達成するのを妨げる体系的な障壁を緩和する政策を策定することが求められる。 The US higher education system concentrates the production of science and scientists within a few institutions. This has implications for minoritized scholars and the topics with which they are disproportionately associated. This paper examines topical alignment between institutions and authors of varying intersectional identities, and the relationship with prestige and scientific impact. We observe a Howard-Harvard effect, in which the topical profiles of minoritized scholars are amplified in mission-driven institutions and decreased in prestigious institutions. Results demonstrate a consistent pattern of inequality in topics and research impact. Specifically, we observe statistically significant differences between minoritized scholars and White men in citations and journal impact. The aggregate research profile of prestigious US universities is highly correlated with the research profile of White men, and highly negatively correlated with the research profile of minoritized women. Furthermore, authors affiliated with more prestigious institutions are associated with increasing inequalities in both citations and journal impact. 
Academic institutions and funders are called to create policies to mitigate the systemic barriers that prevent the United States from achieving a fully robust scientific ecosystem.	翻訳日:2024-02-08 17:54:52 公開日:2024-02-06
# 密乗算型物理インフォームドニューラルネットワーク Densely Multiplied Physics Informed Neural Network ( http://arxiv.org/abs/2402.04390v1 ) ライセンス: Link先を確認	Feilong Jiang, Xiaonan Hou, Min Xia	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は非線形偏微分方程式(PDE)を扱う大きな可能性を示しているが、精度が不十分であったり誤った結果が得られたりする問題に悩まされることが多い。トレーニングプロセスの最適化によってPINNの能力を向上しようとする既存のソリューションの多くとは異なり、本研究ではニューラルネットワークのアーキテクチャを改良してPINNの性能を向上させる。本稿では,ある隠れ層の出力を後続のすべての隠れ層の出力に乗算する,密乗算型PINN(DM-PINN)アーキテクチャを提案する。訓練可能なパラメータを増やすことなく、この効果的な機構はPINNの精度を大幅に向上させることができる。提案アーキテクチャは,Allen-Cahn方程式,Helmholtz方程式,Burgers方程式,1次元対流方程式の4つのベンチマーク例で評価された。提案アーキテクチャと異なるPINN構造との比較により,DM-PINNは精度と効率の両面で優れた性能を示した。 Although physics-informed neural networks (PINNs) have shown great potential in dealing with nonlinear partial differential equations (PDEs), PINNs commonly suffer from insufficient precision or produce incorrect outcomes. Unlike most existing solutions, which try to enhance the ability of PINNs by optimizing the training process, this paper modifies the neural network architecture to improve the performance of PINNs. We propose a densely multiplied PINN (DM-PINN) architecture, which multiplies the output of a hidden layer with the outputs of all subsequent hidden layers. Without introducing more trainable parameters, this effective mechanism can significantly improve the accuracy of PINNs. The proposed architecture is evaluated on four benchmark examples (Allen-Cahn equation, Helmholtz equation, Burgers equation and 1D convection equation). Comparisons between the proposed architecture and different PINN structures demonstrate the superior performance of the DM-PINN in both accuracy and efficiency.	翻訳日:2024-02-08 17:54:33 公開日:2024-02-06
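The DM-PINN abstract above describes multiplying a hidden layer's output with the outputs of the hidden layers after it, without adding trainable parameters. The sketch below is one possible reading of that sentence, not the authors' architecture: it keeps a running element-wise product of hidden outputs and feeds that product forward. The helper names, the equal layer widths, the tanh activation, and the random weights are all assumptions made for illustration.

```python
import math
import random


def dense_tanh(x, weights, bias):
    """Affine map followed by tanh; weights is a list of rows."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def dm_forward(x, hidden_layers, out_weights, out_bias):
    """Forward pass in which each hidden output is multiplied element-wise
    into the outputs of every later hidden layer (a running product).
    Assumes all hidden layers share the same width."""
    running = None
    signal = x
    for weights, bias in hidden_layers:
        h = dense_tanh(signal, weights, bias)
        running = h if running is None else [r * hi for r, hi in zip(running, h)]
        signal = running  # the product, not the raw output, feeds the next layer
    return [sum(w * s for w, s in zip(row, signal)) + b
            for row, b in zip(out_weights, out_bias)]


def random_layer(n_in, n_out, rng):
    """Hypothetical initialiser: uniform weights, zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
width = 4
hidden = [random_layer(2, width, rng)] + [random_layer(width, width, rng) for _ in range(2)]
out_w, out_b = random_layer(width, 1, rng)
u = dm_forward([0.5, 0.1], hidden, out_w, out_b)  # hypothetical (x, t) collocation point
print(u)
```

The multiplications reuse existing activations, so the parameter count matches a plain fully connected network of the same shape, which is consistent with the claim in the abstract.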
# 6つの簡単なステップにおける非定常拡散確率モデル Denoising Diffusion Probabilistic Models in Six Simple Steps ( http://arxiv.org/abs/2402.04384v1 ) ライセンス: Link先を確認	Richard E. Turner, Cristiana-Diana Diaconu, Stratis Markou, Aliaksandra Shysheya, Andrew Y. K. Foong and Bruno Mlodozeniec	(参考訳) Denoising Diffusion Probabilistic Models (DDPM) は、画像およびビデオ生成、タンパク質と物質合成、天気予知、偏微分方程式のニューラルネットワークサロゲートといった様々な問題にうまく適用された、非常に一般的な深層生成モデルである。その普及にもかかわらず、単純で包括的でクリーンで明確であるddpmsの紹介を見つけるのは難しい。研究論文で必要とされるコンパクトな説明は、DDPMを定式化するための様々な設計手順の全てを解明することができず、提示されるステップの理性はしばしば空間を節約するために省略される。さらに、展示は典型的には、その方法がなぜ機能するのかを曖昧にし、実際にうまく機能しない一般化を示唆するため、不必要でおそらく有害な変分下界の視点から提示される。一方、連続的な時間制限を取る視点は美しく一般的であるが、確率微分方程式や確率フローの背景知識を必要とするため、参入への障壁が高い。本稿では、DDPMの定式化を6つの単純なステップに分割し、それぞれに明確な理論的根拠を与える。読者は、基本的な確率的モデリング、ガウス分布、最大確率推定、ディープラーニングを含む機械学習の基本トピックに精通していると仮定する。 Denoising Diffusion Probabilistic Models (DDPMs) are a very popular class of deep generative model that have been successfully applied to a diverse range of problems including image and video generation, protein and material synthesis, weather forecasting, and neural surrogates of partial differential equations. Despite their ubiquity it is hard to find an introduction to DDPMs which is simple, comprehensive, clean and clear. The compact explanations necessary in research papers are not able to elucidate all of the different design steps taken to formulate the DDPM and the rationale of the steps that are presented is often omitted to save space. Moreover, the expositions are typically presented from the variational lower bound perspective which is unnecessary and arguably harmful as it obfuscates why the method is working and suggests generalisations that do not perform well in practice. On the other hand, perspectives that take the continuous time-limit are beautiful and general, but they have a high barrier-to-entry as they require background knowledge of stochastic differential equations and probability flow. 
In this note, we distill down the formulation of the DDPM into six simple steps each of which comes with a clear rationale. We assume that the reader is familiar with fundamental topics in machine learning including basic probabilistic modelling, Gaussian distributions, maximum likelihood estimation, and deep learning.	翻訳日:2024-02-08 17:54:15 公開日:2024-02-06
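As a small companion to the DDPM abstract above, the sketch below shows the forward (noising) process in its usual closed form, where a noised sample at step t is drawn from a Gaussian with mean sqrt(abar_t) * x0 and variance (1 - abar_t). The linear variance schedule from 1e-4 to 0.02 over 1000 steps is a commonly used default and is an assumption here; the note itself may use different notation or a different schedule.

```python
import math
import random


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) in a single step.
    Returns the noised sample and the noise that was used."""
    mean_scale = math.sqrt(abar[t])
    std = math.sqrt(1.0 - abar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [mean_scale * xi + std * e for xi, e in zip(x0, eps)]
    return xt, eps


T = 1000
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]  # assumed linear schedule
abar = alpha_bars(betas)

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]  # toy "data point"
xt, eps = noise_sample(x0, T - 1, abar, rng)
print(abar[-1])  # close to zero: the final step is almost pure noise
```

A denoising network would then be trained to predict `eps` from `xt` and `t`; that training loop is outside the scope of this sketch.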
# FairWire:公正なグラフ生成 FairWire: Fair Graph Generation ( http://arxiv.org/abs/2402.04383v1 ) ライセンス: Link先を確認	O. Deniz Kose and Yanning Shen	(参考訳) グラフ上の機械学習は、重要な相互接続システム内で複雑な関係を分析し学習する能力によって、近年注目を集めている。しかし、これらのアルゴリズムにおける偏りのあるグラフ構造の使用によって増幅される異なる影響は、現実世界の意思決定システムにおけるそれらの導入に重大な懸念を提起している。加えて、合成グラフ生成はプライバシやスケーラビリティの観点から重要になっているが、構造バイアスに対する生成学習アルゴリズムの影響はまだ調査されていない。この研究は、実グラフと合成グラフの両方における構造バイアスの分析と緩和に焦点を当てている。具体的には,まず,構造バイアスの発生源を理論的に解析し,不均一な関係の予測を行う。同定されたバイアス要因を緩和するため、多目的な利用を提供する新しい公正正則化器を設計する。本研究で明らかになったグラフ生成モデルのバイアス増幅に直面すると、我々はさらに公正なグラフ生成フレームワークであるFairWireを提案し、この公正な正規化設計を生成モデルに活用する。実世界のネットワークにおける実験結果から,提案手法が実グラフと合成グラフの両方に対して効果的な構造バイアス緩和をもたらすことが検証された。 Machine learning over graphs has recently attracted growing attention due to its ability to analyze and learn complex relations within critical interconnected systems. However, the disparate impact that is amplified by the use of biased graph structures in these algorithms has raised significant concerns for the deployment of them in real-world decision systems. In addition, while synthetic graph generation has become pivotal for privacy and scalability considerations, the impact of generative learning algorithms on the structural bias has not yet been investigated. Motivated by this, this work focuses on the analysis and mitigation of structural bias for both real and synthetic graphs. Specifically, we first theoretically analyze the sources of structural bias that result in disparity for the predictions of dyadic relations. To alleviate the identified bias factors, we design a novel fairness regularizer that offers a versatile use. Faced with the bias amplification in graph generation models that is brought to light in this work, we further propose a fair graph generation framework, FairWire, by leveraging our fair regularizer design in a generative model. Experimental results on real-world networks validate that the proposed tools herein deliver effective structural bias mitigation for both real and synthetic graphs.	
翻訳日:2024-02-08 17:53:51 公開日:2024-02-06
# 解集合プログラミングによる対物生成 Counterfactual Generation with Answer Set Programming ( http://arxiv.org/abs/2402.04382v1 ) ライセンス: Link先を確認	Sopam Dasgupta, Farhad Shakerin, Joaqu\'in Arias, Elmer Salazar, Gopal Gupta	(参考訳) 意思決定を自動化する機械学習モデルは、ローンの承認、プレトライアルの保釈承認、雇用など、連続した分野での利用が増えている。残念なことに、これらのモデルのほとんどはブラックボックスであり、これらの予測決定にどのように到達するかを明らかにすることができない。このような予測を正当化する透明性の必要性。影響を受ける個人は、なぜ意思決定が行われたのかを理解するために説明を求めることもある。倫理的および法的考察は、望ましい結果をもたらすことができる入力属性の変化を個人に通知する必要があるかもしれない。本稿では, 逆実説明を自動生成する後者の問題に焦点をあてる。本稿では,ルールベース機械学習(RBML)アルゴリズムが生成するルールから,応答セットプログラミング(ASP)と s(CASP)目標指向のASPシステムを用いて,逆ファクトリアルな説明を自動的に生成するフレームワークを提案する。本フレームワークでは, 事実の前提が変更/変更される世界を想像することで, 反実的説明がどう計算され, 正当化されるかを示す。さらに重要なことは、これらの世界、すなわち、元の世界/scenarioから、望まれないし望ましくない結果が得られる想像の世界/scenarioに、どのようにナビゲートできるかを示します。 Machine learning models that automate decision-making are increasingly being used in consequential areas such as loan approvals, pretrial bail approval, hiring, and many more. Unfortunately, most of these models are black-boxes, i.e., they are unable to reveal how they reach these prediction decisions. A need for transparency demands justification for such predictions. An affected individual might also desire explanations to understand why a decision was made. Ethical and legal considerations may further require informing the individual of changes in the input attribute that could be made to produce a desirable outcome. This paper focuses on the latter problem of automatically generating counterfactual explanations. We propose a framework Counterfactual Generation with s(CASP) (CFGS) that utilizes answer set programming (ASP) and the s(CASP) goal-directed ASP system to automatically generate counterfactual explanations from rules generated by rule-based machine learning (RBML) algorithms. In our framework, we show how counterfactual explanations are computed and justified by imagining worlds where some or all factual assumptions are altered/changed. 
More importantly, we show how we can navigate between these worlds, namely, go from our original world/scenario where we obtain an undesired outcome to the imagined world/scenario where we obtain a desired/favourable outcome.	翻訳日:2024-02-08 17:53:30 公開日:2024-02-06
# ディープラーニングによるIoTネットワークトラフィック分析 IoT Network Traffic Analysis with Deep Learning ( http://arxiv.org/abs/2402.04469v1 ) ライセンス: Link先を確認	Mei Liu and Leon Yang	(参考訳) IoTネットワークはより複雑になり、大量のダイナミックデータを生成するため、従来の統計手法や機械学習手法を使用して異常を監視および検出することは困難である。ディープラーニングアルゴリズムは、大量のデータから処理と学習を行うことができ、教師なしの学習技術を使ってトレーニングすることもできる。これにより、これまで検出されていなかった新しい未知の異常を検出できる。また、ディープラーニングアルゴリズムは自動化され、高度にスケーラブルになり、バックエンドで継続的に動作し、大きなIoTネットワークを即座に監視できるようにする。本研究では,近年の深層学習技術を用いた文献レビューを行い,KDD Cup 99データセット上でのアンサンブル手法を用いたモデルの実装を行う。実験結果は,深部異常検出モデルの印象的な性能を示し,98%以上の精度を得た。 As IoT networks become more complex and generate massive amounts of dynamic data, it is difficult to monitor and detect anomalies using traditional statistical methods and machine learning methods. Deep learning algorithms can process and learn from large amounts of data and can also be trained using unsupervised learning techniques, meaning they don't require labelled data to detect anomalies. This makes it possible to detect new and unknown anomalies that may not have been detected before. Also, deep learning algorithms can be automated and highly scalable; thereby, they can run continuously in the backend and make it achievable to monitor large IoT networks instantly. In this work, we conduct a literature review on the most recent works using deep learning techniques and implement a model using ensemble techniques on the KDD Cup 99 dataset. The experimental results showcase the impressive performance of our deep anomaly detection model, achieving an accuracy of over 98%.	翻訳日:2024-02-08 17:47:09 公開日:2024-02-06
# 大規模言語モデルを用いた構造化エンティティ抽出 Structured Entity Extraction Using Large Language Models ( http://arxiv.org/abs/2402.04437v1 ) ライセンス: Link先を確認	Haolun Wu, Ye Yuan, Liana Mikaelyan, Alexander Meulemans, Xue Liu, James Hensman, Bhaskar Mitra	(参考訳) 機械学習の最近の進歩は情報抽出の分野に大きな影響を与えており、Large Language Models (LLM) は構造化されていないテキストから構造化情報を取り出す上で重要な役割を果たしている。本稿では、構造化エンティティ抽出における現在の方法論の課題と限界を考察し、これらの問題に対処するための新しいアプローチを紹介する。まず、構造化エンティティ抽出(SEE)タスクの導入と形式化を行い、続いて、このタスク上でモデルパフォーマンスを適切に評価するように設計されたAESOP(Approximate Entity Set OverlaP)メトリックを提案します。その後, 抽出タスク全体を多段階に分解し, LLMのパワーを活用し, 有効性と効率を向上させる新しいモデルを提案する。定量的評価と人体側評価により,本モデルがベースラインより優れており,構造化エンティティ抽出の今後の進歩に期待できる方向を提供する。 Recent advances in machine learning have significantly impacted the field of information extraction, with Large Language Models (LLMs) playing a pivotal role in extracting structured information from unstructured text. This paper explores the challenges and limitations of current methodologies in structured entity extraction and introduces a novel approach to address these issues. We contribute to the field by first introducing and formalizing the task of Structured Entity Extraction (SEE), followed by proposing the Approximate Entity Set OverlaP (AESOP) metric, designed to appropriately assess model performance on this task. Later, we propose a new model that harnesses the power of LLMs for enhanced effectiveness and efficiency through decomposing the entire extraction task into multiple stages. Quantitative evaluation and human side-by-side evaluation confirm that our model outperforms baselines, offering promising directions for future advancements in structured entity extraction.	翻訳日:2024-02-08 17:46:56 公開日:2024-02-06
# 連続多次元スケーリング Continuous Multidimensional Scaling ( http://arxiv.org/abs/2402.04436v1 ) ライセンス: Link先を確認	Michael W. Trosset, Carey E. Priebe	(参考訳) 多次元スケーリング (multidimensional scaling, mds) は、n$ のオブジェクトの集合の近接情報を $d$ 次元ユークリッド空間に埋め込む行為である。もともと心理測定のコミュニティが考え出したように、MDSは固定されたオブジェクトの集合に関連する固定された確率のセットを埋めることに関心を持っていた。現代の関心事、例えば、ランダムグラフの統計的推論のための漸近理論の開発において生じる、より一般的には、増大する対象の集合に関連する一連の公理の列の制限挙動を研究することである。点対集合写像の理論の標準的な結果は、$n$ が固定された場合、埋め込み構造の極限は制限された近似の埋め込み構造であることを意味する。でも、$n$が上がったら? したがって、MDSを再構成し、埋め込み問題全体の列を固定空間における最適化問題の列と見なせるようにする必要がある。このような改革を提示し、いくつかの結果をもたらす。 Multidimensional scaling (MDS) is the act of embedding proximity information about a set of $n$ objects in $d$-dimensional Euclidean space. As originally conceived by the psychometric community, MDS was concerned with embedding a fixed set of proximities associated with a fixed set of objects. Modern concerns, e.g., that arise in developing asymptotic theories for statistical inference on random graphs, more typically involve studying the limiting behavior of a sequence of proximities associated with an increasing set of objects. Standard results from the theory of point-to-set maps imply that, if $n$ is fixed, then the limit of the embedded structures is the embedded structure of the limiting proximities. But what if $n$ increases? It then becomes necessary to reformulate MDS so that the entire sequence of embedding problems can be viewed as a sequence of optimization problems in a fixed space. We present such a reformulation and derive some consequences.	翻訳日:2024-02-08 17:46:38 公開日:2024-02-06
# PreGIP: 深部知的財産保護のためのグラフニューラルネットワークの事前学習の透かし PreGIP: Watermarking the Pretraining of Graph Neural Networks for Deep Intellectual Property Protection ( http://arxiv.org/abs/2402.04435v1 ) ライセンス: Link先を確認	Enyan Dai, Minhua Lin, Suhang Wang	(参考訳) グラフニューラルネットワーク(GNN)の事前トレーニングは、さまざまな下流タスクの促進に大きく貢献している。事前学習は一般的に大量のデータと計算資源を必要とするため、事前訓練されたGNNは正当な所有者の高価値知的特性(IP)である。しかし、敵は、下流のタスクのために訓練済みのGNNモデルを違法にコピーして展開することができる。 IP 保護のための GNN 分類器の透かしに最初に取り組みが行われたが、これらの手法は透かしのための目標分類タスクを必要とするため、GNN モデルの自己管理事前訓練には適用できない。そこで本研究では,組込み空間の高品質を維持しつつ,IP保護のためのGNNエンコーダの事前訓練を透かし,PreGIPという新しいフレームワークを提案する。 PreGIPは、事前訓練されたGNNエンコーダの埋め込み空間を透かし、タスクフリーな透かし損失を取り入れている。さらに微調整耐性透かし注入を施す。理論的解析と広範な実験により、ダウンストリームタスクにおけるIP保護と高性能維持における PreGIP の有効性が示されている。 Pretraining on Graph Neural Networks (GNNs) has shown great power in facilitating various downstream tasks. As pretraining generally requires a huge amount of data and computational resources, the pretrained GNNs are high-value Intellectual Properties (IP) of the legitimate owner. However, adversaries may illegally copy and deploy the pretrained GNN models for their downstream tasks. Though initial efforts have been made to watermark GNN classifiers for IP protection, these methods require the target classification task for watermarking, and thus are not applicable to self-supervised pretraining of GNN models. Hence, in this work, we propose a novel framework named PreGIP to watermark the pretraining of GNN encoder for IP protection while maintaining the high quality of the embedding space. PreGIP incorporates a task-free watermarking loss to watermark the embedding space of pretrained GNN encoder. A finetuning-resistant watermark injection is further deployed. Theoretical analysis and extensive experiments show the effectiveness of PreGIP in IP protection and maintaining high performance for downstream tasks.	翻訳日:2024-02-08 17:46:21 公開日:2024-02-06
# 医用画像調和ベンチマークのための定量的指標 Quantitative Metrics for Benchmarking Medical Image Harmonization ( http://arxiv.org/abs/2402.04426v1 ) ライセンス: Link先を確認	Abhijeet Parida, Zhifan Jiang, Roger J. Packer, Robert A. Avery, Syed M. Anwar, Marius G. Linguraru	(参考訳) 画像調和は、異なるマシンで取得したデータや医療画像における走査プロトコルから生じるドメインシフトに対処するための重要な前処理戦略である。しかしながら、調和技術の有効性のベンチマークは、広く利用可能な標準データセットが欠如しているため、課題となっている。この文脈では、2つの強度調和度と1つの解剖学的保存度という3つの指標が提案される。利用可能な調和基底真理を持つデータセットの広範な研究を通じて、我々のメトリクスが確立された画像品質評価指標と相関していることを示す。これらの新しい指標が、調和基盤の真理が存在しない現実世界のシナリオにどのように適用されるかを示す。さらに, 計量値の異なる解釈に対する洞察を提供し, 調和過程の文脈におけるその意義を明らかにする。以上の結果から,これらの定量的調和指標を画像調和手法の性能評価基準として採用することを提唱した。 Image harmonization is an important preprocessing strategy to address domain shifts arising from data acquired using different machines and scanning protocols in medical imaging. However, benchmarking the effectiveness of harmonization techniques has been a challenge due to the lack of widely available standardized datasets with ground truths. In this context, we propose three metrics: two intensity harmonization metrics and one anatomy preservation metric for medical images during harmonization, where no ground truths are required. Through extensive studies on a dataset with available harmonization ground truth, we demonstrate that our metrics are correlated with established image quality assessment metrics. We show how these novel metrics may be applied to real-world scenarios where no harmonization ground truth exists. Additionally, we provide insights into different interpretations of the metric values, shedding light on their significance in the context of the harmonization process. As a result of our findings, we advocate for the adoption of these quantitative harmonization metrics as a standard for benchmarking the performance of image harmonization techniques.	翻訳日:2024-02-08 17:46:05 公開日:2024-02-06
# 造船所用スマートパイプシステム 4.0 Smart Pipe System for a Shipyard 4.0 ( http://arxiv.org/abs/2402.04423v1 ) ライセンス: Link先を確認	Paula Fraga-Lamas, Diego Noceda-Davila, Tiago M. Fern\'andez-Caram\'es, Manuel A. D\'iaz-Bouza and Miguel Vilar-Montesinos	(参考訳) 産業4.0パラダイムの進歩的な注入の結果、多くの産業が造船所が無視できない革命を試みている。そのため、造船所への産業4.0の原則の適用は、造船所4.0の創設に繋がる。このため、世界最大の造船会社10社のうちの1つであるナバンティアは、造船所4.0が直面する近未来の課題に対応するため、内部の作業全体を更新している。このような課題は、プロダクションシステムの垂直統合、新しい世代の価値創造ネットワークの水平統合、プロダクションチェーン全体の再設計の3つのグループに分けられる。パイプは船上に存在し、種類もさまざまで、重要な部品の1つであり、その監視は将来的なサイバー物理システムを構成する。改良された識別能力、トレーサビリティ、屋内位置、生産から生活を通じて、造船所の生産性と安全性を高めることができる。このような作業を行うため、本論文はまず造船所環境の徹底的な分析を行う。この分析から,本質的なハードウェアおよびソフトウェア技術要件が決定される。次に、スマートパイプの概念を、造船所で拡張サービスを提供するために定期的に信号を送信できるオブジェクトとして提示し、定義する。スマートパイプシステムを構築するために、異なる技術が選択され、評価され、受動的かつアクティブなrfidがそれを作るのに最適な技術であると結論づけられる。さらに、パイプワークショップで得られた有望な屋内測位結果から、マルチアンテナアルゴリズムとカルマンフィルタが受信信号強度(rss)の安定化とシステム全体の精度の向上に寄与することを示した。 As a result of the progressive implantation of the Industry 4.0 paradigm, many industries are experimenting a revolution that shipyards cannot ignore. Therefore, the application of the principles of Industry 4.0 to shipyards are leading to the creation of Shipyards 4.0. Due to this, Navantia, one of the 10 largest shipbuilders in the world, is updating its whole inner workings to keep up with the near-future challenges that a Shipyard 4.0 will have to face. Such challenges can be divided into three groups: the vertical integration of production systems, the horizontal integration of a new generation of value creation networks, and the re-engineering of the entire production chain, making changes that affect the entire life cycle of each piece of a ship. Pipes, which exist in a huge number and varied typology on a ship, are one of the key pieces, and its monitoring constitutes a prospective cyber-physical system. Their improved identification, traceability, and indoor location, from production and through their life, can enhance shipyard productivity and safety. 
In order to perform such tasks, this article first conducts a thorough analysis of the shipyard environment. From this analysis, the essential hardware and software technical requirements are determined. Next, the concept of smart pipe is presented and defined as an object able to transmit signals periodically that allows for providing enhanced services in a shipyard. In order to build a smart pipe system, different technologies are selected and evaluated, concluding that passive and active RFID are currently the most appropriate technologies to create it. Furthermore, some promising indoor positioning results obtained in a pipe workshop are presented, showing that multi-antenna algorithms and Kalman filtering can help to stabilize Received Signal Strength (RSS) and improve the overall accuracy of the system.	翻訳日:2024-02-08 17:45:49 公開日:2024-02-06
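The smart pipe abstract above reports that Kalman filtering can help stabilize Received Signal Strength (RSS) readings. The sketch below is a minimal one-dimensional Kalman filter for a slowly varying signal, offered only as an illustration of the idea. The variance settings and the simulated dBm readings are assumptions, and the paper's multi-antenna processing is not modelled.

```python
import random
import statistics


def kalman_smooth(readings, process_var=0.01, measurement_var=4.0):
    """Scalar Kalman filter with a constant-state model.
    process_var: how much the true RSS is assumed to drift per step.
    measurement_var: assumed variance of a single RSS reading."""
    if not readings:
        return []
    estimate = readings[0]
    error_var = measurement_var
    smoothed = [estimate]
    for z in readings[1:]:
        error_var += process_var                       # predict: uncertainty grows
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)              # update with the new reading
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


rng = random.Random(1)
raw_rss = [-60.0 + rng.gauss(0.0, 2.0) for _ in range(200)]  # simulated dBm readings
smooth_rss = kalman_smooth(raw_rss)
print(statistics.pvariance(raw_rss), statistics.pvariance(smooth_rss))
```

A smaller `process_var` gives a steadier estimate but reacts more slowly when a tag actually moves, so the value would need tuning against real measurements.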
# rで脆弱なコードエンティティを調べる Studying Vulnerable Code Entities in R ( http://arxiv.org/abs/2402.04421v1 ) ライセンス: Link先を確認	Zixiao Zhao, Millon Madhur Das, Fatemeh H. Fard	(参考訳) 事前訓練されたコード言語モデル(Code-PLMs)は、過去数年間で多くの進歩を示し、多くのソフトウェアエンジニアリングタスクで最先端の結果を得た。 These models are mainly targeted for popular programming languages such as Java and Python, leaving out many other ones like R. Though R has a wide community of developers and users, there is little known about the applicability of Code-PLMs for R. In this preliminary study, we aim to investigate the vulnerability of Code-PLMs for code entities in R. For this purpose, we use an R dataset of code and comment pairs and then apply CodeAttack, a black-box attack model that uses the structure of code to generate adversarial code samples. これは、一般的なプログラミング言語(例えばJava)と比較して、Rトークンの型の重要性を理解するための第一歩です。私たちは研究をコード要約に限定します。その結果、最も脆弱なコードエンティティが識別子であり、Rに特有の構文トークンが続き、トークン型の重要性が明らかになり、R言語のコード要約とメソッド名予測のためのモデルの開発に役立ちます。 Pre-trained Code Language Models (Code-PLMs) have shown many advancements and achieved state-of-the-art results for many software engineering tasks in the past few years. These models are mainly targeted for popular programming languages such as Java and Python, leaving out many other ones like R. Though R has a wide community of developers and users, there is little known about the applicability of Code-PLMs for R. In this preliminary study, we aim to investigate the vulnerability of Code-PLMs for code entities in R. For this purpose, we use an R dataset of code and comment pairs and then apply CodeAttack, a black-box attack model that uses the structure of code to generate adversarial code samples. We investigate how the model can attack different entities in R. This is the first step towards understanding the importance of R token types, compared to popular programming languages (e.g., Java). We limit our study to code summarization. 
Our results show that the most vulnerable code entity is the identifier, followed by some syntax tokens specific to R. The results can shed light on the importance of token types and help in developing models for code summarization and method name prediction for the R language.	翻訳日:2024-02-08 17:45:21 公開日:2024-02-06
# 機械学習の測定はステレオタイプから損なう--どのエラーがどのような方法で損なわれているのかを理解する必要がある Measuring machine learning harms from stereotypes: requires understanding who is being harmed by which errors in what ways ( http://arxiv.org/abs/2402.04420v1 ) ライセンス: Link先を確認	Angelina Wang and Xuechunzi Bai and Solon Barocas and Su Lin Blodgett	(参考訳) 機械学習のアプリケーションが普及するにつれて、その危険性を理解する必要がある。しかし、現在の公正度測定基準は人間の心理的な害経験にはほとんど根ざしていない。ステレオタイプの社会心理学を題材として,画像検索におけるジェンダーステレオタイプの事例研究を行い,機械学習の誤りに対する反応について検討した。まず、すべての機械学習エラーがステレオタイプを反映しているか、等しく有害であるかを示すために、調査研究を使用する。実験では、被験者にステレオタイプ強化、違反、ニュートラルな機械学習エラーをランダムに暴露する。ステレオタイプ強化エラーは、認知的信念、態度、行動に対する最小限の変化を持ちながら、より経験的な(主観的な)有害な経験をもたらす。この経験的な害は男性よりも女性に影響を及ぼす。しかし、ある種のステレオタイプに違反するエラーは、男性にとってより実験的に有害である。我々は、公正な緩和において害は唯一のガイドにはならないと結論し、誰が害と理由を経験しているかによって、ニュアンスな視点を提案する。 As machine learning applications proliferate, we need an understanding of their potential for harm. However, current fairness metrics are rarely grounded in human psychological experiences of harm. Drawing on the social psychology of stereotypes, we use a case study of gender stereotypes in image search to examine how people react to machine learning errors. First, we use survey studies to show that not all machine learning errors reflect stereotypes nor are equally harmful. Then, in experimental studies we randomly expose participants to stereotype-reinforcing, -violating, and -neutral machine learning errors. We find stereotype-reinforcing errors induce more experientially (i.e., subjectively) harmful experiences, while having minimal changes to cognitive beliefs, attitudes, or behaviors. This experiential harm impacts women more than men. However, certain stereotype-violating errors are more experientially harmful for men, potentially due to perceived threats to masculinity. We conclude that harm cannot be the sole guide in fairness mitigation, and propose a nuanced perspective depending on who is experiencing what harm and why.	翻訳日:2024-02-08 17:45:04 公開日:2024-02-06
# 胸部ct分類における弱教師付き深層学習の性能の制限 What limits performance of weakly supervised deep learning for chest CT classification? ( http://arxiv.org/abs/2402.04419v1 ) ライセンス: Link先を確認	Fakrul Islam Tushar, Vincent M. D'Anniballe, Geoffrey D. Rubin, Joseph Y. Lo	(参考訳) ノイズデータを用いた弱い教師付き学習は,良質な疾患ラベルが不足していることから,医療画像コミュニティの注目を集めている。しかし、このような弱教師付き学習の限界と、これらの制約が疾患分類性能に与える影響についてはほとんど分かっていない。本稿では,3つの条件に対するモデル許容性を調べることにより,このような弱い監督の影響を検証した。まず,学習データ内のラベルの誤差を段階的に増加させることにより,ノイズデータに対するモデル許容性を検討した。第2に,データセットサイズがトレーニングデータ量に与える影響について検討した。第3に,バイナリ分類とマルチラベル分類の比較を行った。その結果, 疾患分類性能の低下を経験する前に, ラベルエラーを最大10%加えることができた。すべての病級でトレーニングデータ量が増加し,75%のトレーニングデータで高い成績を呈するようになると,疾患分類性能は着実に向上した。最後に、バイナリモデルはすべての疾患カテゴリでマルチラベルモデルよりも優れていた。しかし、二項モデルは同時進行する疾患の影響を強く受けており、画像中の病気の特定の特徴を学ばなかったため、このような解釈は誤解を招く可能性がある。結論として, この研究は, 医療画像のコミュニティにおいて, ノイズラベルによる弱い監督の利点とリスクを理解するのに役立つ可能性がある。このような研究は、多様な大規模データセットを構築し、説明可能なAIを開発する必要性を示している。 Weakly supervised learning with noisy data has drawn attention in the medical imaging community due to the sparsity of high-quality disease labels. However, little is known about the limitations of such weakly supervised learning and the effect of these constraints on disease classification performance. In this paper, we test the effects of such weak supervision by examining model tolerance for three conditions. First, we examined model tolerance for noisy data by incrementally increasing error in the labels within the training data. Second, we assessed the impact of dataset size by varying the amount of training data. Third, we compared performance differences between binary and multi-label classification. Results demonstrated that the model could endure up to 10% added label error before experiencing a decline in disease classification performance. Disease classification performance steadily rose as the amount of training data was increased for all disease classes, before experiencing a plateau in performance at 75% of training data. 
Last, the binary model outperformed the multilabel model in every disease category. However, such interpretations may be misleading, as the binary model was heavily influenced by co-occurring diseases and may not have learned the specific features of the disease in the image. In conclusion, this study may help the medical imaging community understand the benefits and risks of weak supervision with noisy labels. Such studies demonstrate the need to build diverse, large-scale datasets and to develop explainable and responsible AI.	翻訳日:2024-02-08 17:44:44 公開日:2024-02-06
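The chest CT abstract above tests tolerance to label noise by incrementally adding error to the training labels. One simple way to set up such an experiment is sketched below for binary labels: a chosen fraction of labels is flipped at random. The function name and the flipping scheme are assumptions for illustration; the abstract does not say how the authors injected the errors.

```python
import random


def add_label_noise(labels, error_rate, rng):
    """Return a copy of binary labels with round(error_rate * n) entries flipped.
    The flipped positions are sampled without replacement."""
    n_flip = round(error_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]


rng = random.Random(0)
clean = [0, 1] * 50                      # 100 toy labels
noisy = add_label_noise(clean, 0.10, rng)
n_changed = sum(a != b for a, b in zip(clean, noisy))
print(n_changed)
```

Repeating training with `error_rate` set to 0.0, 0.05, 0.10 and so on, while keeping the test labels clean, gives the kind of tolerance curve the abstract describes.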
# 分散型ブロックチェーンベースのロバストマルチエージェントマルチアーム付きバンディット Decentralized Blockchain-based Robust Multi-agent Multi-armed Bandit ( http://arxiv.org/abs/2402.04417v1 ) ライセンス: Link先を確認	Mengfan Xu, Diego Klabjan	(参考訳) 我々は、複数のクライアントまたは参加者が完全に分散したブロックチェーン上に分散され、悪意を持つ可能性がある、堅牢なマルチエージェントマルチアームのバンディット問題を調査した。アームの報酬はクライアント間で均質であり、システムが十分に安全である場合にのみ参加者に明らかにされる時間不変確率分布に従う。システムの目的は、正直な参加者が得た累積報酬を効率的に確保することである。この目的と最善の知識のために、私たちは、ブロックチェーンからの高度な技術と新しいメカニズムを、正直な参加者のために最適な戦略を設計するシステムに組み入れました。これにより、さまざまな悪意ある行動や、参加者のプライバシーの維持が可能になる。より具体的には、すべての参加者にアクセス可能な検証者のプールをランダムに選択し、これらの検証者のためのデジタル署名に基づく真新しいコンセンサスメカニズムをデザインし、安全なマルチパーティ計算を通じて参加者からの情報を少なくするucbベースの戦略を考案し、連鎖参加型インタラクションと参加者の参加を促すインセンティブメカニズムを設計する。特に、ブロックチェーンの最適性という文脈で後悔して解析することにより、提案アルゴリズムの理論的保証を最初に証明した。ブロックチェーンと、主に数値最適性に焦点を当てたフェデレーション学習のような学習問題を統合する既存の作業とは異なり、正直な参加者の後悔は、$log{T}$で上限づけられている。これは、悪意のある参加者がいないマルチエージェントのマルチアームバンディット問題と、純粋なビザンティン攻撃を伴う堅牢なマルチエージェントのマルチアームバンディット問題と一致している。 We study a robust multi-agent multi-armed bandit problem where multiple clients or participants are distributed on a fully decentralized blockchain, with the possibility of some being malicious. The rewards of arms are homogeneous among the clients, following time-invariant stochastic distributions that are revealed to the participants only when the system is secure enough. The system's objective is to efficiently ensure the cumulative rewards gained by the honest participants. To this end and to the best of our knowledge, we are the first to incorporate advanced techniques from blockchains, as well as novel mechanisms, into the system to design optimal strategies for honest participants. This allows various malicious behaviors and the maintenance of participant privacy. 
More specifically, we randomly select a pool of validators who have access to all participants, design a brand-new consensus mechanism based on digital signatures for these validators, invent a UCB-based strategy that requires less information from participants through secure multi-party computation, and design the chain-participant interaction and an incentive mechanism to encourage participants' participation. Notably, we are the first to prove the theoretical guarantee of the proposed algorithms by regret analyses in the context of optimality in blockchains. Unlike existing work that integrates blockchains with learning problems such as federated learning which mainly focuses on numerical optimality, we demonstrate that the regret of honest participants is upper bounded by $\log T$. This is consistent with the multi-agent multi-armed bandit problem without malicious participants and the robust multi-agent multi-armed bandit problem with purely Byzantine attacks.	翻訳日:2024-02-08 17:44:07 公開日:2024-02-06
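The bandit abstract above builds on a UCB-based strategy with logarithmic regret. For orientation, the sketch below is the textbook single-agent UCB1 rule on Bernoulli arms. It leaves out everything specific to the paper (validators, consensus, secure multi-party computation, malicious participants), and the arm means and horizon are made-up values.

```python
import math
import random


def ucb1(arm_means, horizon, rng):
    """Run UCB1 on Bernoulli arms with the given (hidden) means.
    Returns per-arm pull counts and the cumulative pseudo-regret."""
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: play every arm once
        else:
            # empirical mean plus an exploration bonus that shrinks with pulls
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return counts, regret


counts, regret = ucb1([0.2, 0.5, 0.8], 5000, random.Random(0))
print(counts, regret)
```

Suboptimal arms are pulled on the order of log T times, so the pseudo-regret grows far more slowly than the horizon.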
# webスケールマルチモーダルデータからの検索による教師なしドメイン一般化のためのデータ中心アプローチ A Data Centric Approach for Unsupervised Domain Generalization via Retrieval from Web Scale Multimodal Data ( http://arxiv.org/abs/2402.04416v1 ) ライセンス: Link先を確認	Christopher Liao, Theodoros Tsiligkaridis, Brian Kulis	(参考訳) ドメイン一般化 (Domain Generalization, DG) は、共有ラベル空間を仮定して、1つ以上のソースドメインを活用するテスト領域を一般化できるモデルを学ぶ重要な問題である。しかし、ほとんどのdgメソッドは、ターゲットのラベル空間における豊富なソースデータへのアクセスを前提としており、ターゲットのタスクと同じラベル空間を取得する場合、多くの実世界のアプリケーションに対して過度に厳密であることを証明する要件である。この設定のために、細かいチューニング中にlaion-2bのような大きなタスクに依存しない非ラベルのソースデータセットを使用するunsupervised domain generalization(udg)問題のマルチモーダルバージョンに取り組む。私たちのフレームワークは、ソースデータセットとターゲットタスクの関係を明示的に仮定していません。代わりに、ソースデータセットを共同ビジョン言語空間で効率的に検索できるという前提にのみ依存する。このマルチモーダルUDG設定では,(1)ラベル名を用いたクエリの多様化,(2)擬似ラベル付け,(3)クラスタリングによる代表サンプルの検索,という3つの簡単なステップで,ソースデータの小さな($100K)サブセットを構築する方法を提案する。マルチモーダルUDG問題の研究価値を示すために,各ベンチマークにおける最先端のソースフリーDGとゼロショット(ZS)手法を比較し,20種類のターゲットデータセットに対して最大10%の精度向上を示す。さらに, この多段階データセット構築手法は, 近隣の検索よりも平均3%改善されている。コード提供: https://github.com/chris210634/mudg Domain generalization (DG) is an important problem that learns a model that can generalize to unseen test domains leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (UDG) problem, which uses a large task-agnostic unlabeled source dataset, such as LAION-2B during finetuning. Our framework does not explicitly assume any relationship between the source dataset and target task. Instead, it relies only on the premise that the source dataset can be efficiently searched in a joint vision-language space. 
For this multimodal UDG setting, we propose a novel method to build a small ($<$100K) subset of the source data in three simple steps: (1) diversified retrieval using label names as queries, (2) rank pseudo-labeling, and (3) clustering to find representative samples. To demonstrate the value of studying the multimodal UDG problem, we compare our results against state-of-the-art source-free DG and zero-shot (ZS) methods on their respective benchmarks and show up to 10% improvement in accuracy on 20 diverse target datasets. Additionally, our multi-stage dataset construction method achieves 3% improvement on average over nearest neighbors retrieval. Code is available: https://github.com/Chris210634/mudg	翻訳日:2024-02-08 17:43:17 公開日:2024-02-06
# 対称測定による非マルコフ量子力学 Non-Markovian quantum dynamics from symmetric measurements ( http://arxiv.org/abs/2402.04415v1 ) ライセンス: Link先を確認	Katarzyna Siudzińska	(参考訳) 我々は対称測度演算子を用いて、一般化されたパウリチャネルのさらなる一般化を提供する量子チャネルを構築する。得られた写像はビストヒスティックであるが、一般には混合ユニタリではない。完全正当性や量子エンタングルメントを破る能力など,それらの重要な性質を解析する。主部では、時間局所発生器による対応する開量子系力学を考察する。動的写像の可除性から、十分なマルコビアン性および非マルコビアン性条件を導出する。インストラクティブな例として、P-分割可能な一般化されたパウリ力学写像の生成元を示し、デコヒーレンス率のより負性性を高める。 We use symmetric measurement operators to construct quantum channels that provide a further generalization of generalized Pauli channels. The resulting maps are bistochastic but in general no longer mixed unitary. We analyze their important properties, such as complete positivity and the ability to break quantum entanglement. In the main part, we consider the corresponding open quantum systems dynamics with time-local generators. From divisibility properties of dynamical maps, we derive sufficient Markovianity and non-Markovianity conditions. As instructive examples, we present the generators of P-divisible generalized Pauli dynamical maps that allow for more negativity in the decoherence rates.	翻訳日:2024-02-08 17:41:45 公開日:2024-02-06
# 量子渦の中心近傍における光電子の波動関数 The wave function of a photoelectron near the center of a quantum vortex ( http://arxiv.org/abs/2402.04414v1 ) ライセンス: Link先を確認	N. V. Larionov, Yu. L. Kolesnikov	(参考訳) 二次元近似では、量子渦の局在に近い光電子の確率密度と電流を理論的に研究する。先に発見した運動量表現の波動関数は、渦の中心に対応するゼロ付近で単純化される。これにより、ガウス波のパケットの構造を持ち、渦に関する基本的な情報を含む単純な解析式を得ることができる。運動量と座標空間の両方において、量子渦の時間発展を分析するために用いられる。イオン化超短パルスの強度が量子渦形状に及ぼす影響についても検討した。 In a two-dimensional approximation, the probability density and current for a photoelectron near the localization of a quantum vortex are theoretically investigated. The wave function in the momentum representation, which we found earlier, is simplified near zero, corresponding to the center of the vortex. This allows us to obtain a simple analytical expression for it, which has the structure of a Gaussian wave packet and contains basic information about the vortex. It is used to analyze the temporal evolution of a quantum vortex, both in momentum and coordinate space. The effect of the intensity of an ionizing ultrashort laser pulse on the shape of a quantum vortex is also investigated.	翻訳日:2024-02-08 17:41:34 公開日:2024-02-06
# VampPrior混合モデル The VampPrior Mixture Model ( http://arxiv.org/abs/2402.04412v1 ) ライセンス: Link先を確認	Andrew Stirn and David A. Knowles	(参考訳) 深層潜伏変数モデル(DLVM)の現在のクラスタリングでは、a-prioriのクラスタ数を定義する必要があり、初期化が貧弱である。これらの欠陥に対処することは、統合とクラスタリングを同時に行うことで、ディープラーニングベースのscrna-seq分析に大きなメリットがある。我々は、vampprior (tomczak & welling, 2018) をdirichlet process gaussian mixed modelに適応させ、dlvmsに先立つ新しいvampprior mixed model (vmm) を実現した。本稿では,変分推論と経験ベイズを交互に交互に推定し,変分パラメータと先行パラメータをきれいに区別する手法を提案する。変分オートコーダでVMMを使用すると、ベンチマークデータセット上で非常に競争力のあるクラスタリング性能が得られる。 Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMMは、その性能を著しく改善し、細胞を生物学的に意味のあるクラスターに自動的に配置する。 Current clustering priors for deep latent variable models (DLVMs) require defining the number of clusters a-priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep learning-based scRNA-seq analysis by performing integration and clustering simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture model, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. We propose an inference procedure that alternates between variational inference and Empirical Bayes to cleanly distinguish variational and prior parameters. Using the VMM in a Variational Autoencoder attains highly competitive clustering performance on benchmark datasets. Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMM significantly improves its performance and automatically arranges cells into biologically meaningful clusters.	翻訳日:2024-02-08 17:41:17 公開日:2024-02-06
# ナレーションによる言語モデルにおけるモード崩壊の検出 Detecting Mode Collapse in Language Models via Narration ( http://arxiv.org/abs/2402.04477v1 ) ライセンス: Link先を確認	Sil Hamilton	(参考訳) 2人の作家が同じように書くことはない。レキシコンから修辞学的な装置に至るまで、書かれた物語の中で個人的繁栄が引き起こされ、それは特定の著者、つまり、文学理論家たちが、本文の実際の著者やナレーターとは異なる、暗黙または仮想的な著者と名付けることを意味する。様々な不協和性ソースから抽出された未フィルタリングトレーニングセットに基づいて訓練された初期の大きな言語モデルは、不整合な個性をもたらし、会話のタスクには問題があったが、複数の観点から文献をサンプリングするのに有用であった。近年のアライメント研究の成功により、研究者はヒューマンフィードバック(RLHF)からの指導チューニングや強化学習を通じて、言語モデルに主観的に一貫したペルソナを課すことができたが、アライメントモデルが任意の仮想著者をモデル化する能力を保持するかどうかはほとんど調査されていない。 3つのOpenAI言語モデルからサンプリングされた4,374のストーリーを解析することにより、GPT-3の連続バージョンは「モード崩壊」の度合いの上昇に悩まされ、アライメント中のモデルに過度に適合することで、オーサシップを一般化することを防ぐ。社会学シミュレーションに言語モデルを用いたい研究者にとって,本手法と結果が重要である。 No two authors write alike. Personal flourishes invoked in written narratives, from lexicon to rhetorical devices, imply a particular author--what literary theorists label the implied or virtual author; distinct from the real author or narrator of a text. Early large language models trained on unfiltered training sets drawn from a variety of discordant sources yielded incoherent personalities, problematic for conversational tasks but proving useful for sampling literature from multiple perspectives. Successes in alignment research in recent years have allowed researchers to impose subjectively consistent personae on language models via instruction tuning and reinforcement learning from human feedback (RLHF), but whether aligned models retain the ability to model an arbitrary virtual author has received little scrutiny. By studying 4,374 stories sampled from three OpenAI language models, we show successive versions of GPT-3 suffer from increasing degrees of "mode collapse" whereby overfitting the model during alignment constrains it from generalizing over authorship: models suffering from mode collapse become unable to assume a multiplicity of perspectives. Our method and results are significant for researchers seeking to employ language models in sociological simulations.	
翻訳日:2024-02-08 17:32:58 公開日:2024-02-06
# Webナビゲーションのためのデュアルビュービジュアルコンテクスト化 Dual-View Visual Contextualization for Web Navigation ( http://arxiv.org/abs/2402.04476v1 ) ライセンス: Link先を確認	Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao	(参考訳) 自動Webナビゲーションは、言語命令に従って現実世界のウェブサイトで複雑で多様なタスクを実行するWebエージェントを構築することを目的としている。既存の作業は、主にHTMLドキュメントを入力として、Webページのコンテンツとアクション空間(つまり実行可能な要素と操作)を定義する。それでもHTMLドキュメントは各要素に対して明確なタスク関連コンテキストを提供していないため、正しい(順序の)アクションを選択するのが難しい。本稿では,Webページのスクリーンショットにおいて,各HTML要素が対応するバウンディングボックスとスクリーンショット内の視覚的コンテンツを持つ「デュアルビュー」を通じて,HTML要素のコンテキスト化を提案する。 Web開発者は、Webページの近くにあるタスク関連要素を配置して、ユーザエクスペリエンスを向上させる傾向があり、テキストとビジュアルの両方の機能を使用して、各要素を隣の要素でコンテキスト化することを提案します。結果として生じるHTML要素の表現は、エージェントがアクションを取るためのより情報的です。我々は最近リリースされたMind2Webデータセット上で,実際のWebサイト上で多様なナビゲーションドメインとタスクを特徴付ける手法を検証する。提案手法は,クロスタスク,クロスサイト,クロスドメインなど,すべてのシナリオにおいて一貫してベースラインを上回ります。 Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, making it hard to select the right (sequence of) actions. In this paper, we propose to contextualize HTML elements through their "dual views" in webpage screenshots: each HTML element has its corresponding bounding box and visual content in the screenshot. We build upon the insight -- web developers tend to arrange task-related elements nearby on webpages to enhance user experiences -- and propose to contextualize each element with its neighbor elements, using both textual and visual features. The resulting representations of HTML elements are more informative for the agent to take action. We validate our method on the recently released Mind2Web dataset, which features diverse navigation domains and tasks on real-world websites. 
Our method consistently outperforms the baseline in all the scenarios, including cross-task, cross-website, and cross-domain ones.	翻訳日:2024-02-08 17:32:31 公開日:2024-02-06
# 帰納的量子位相推定 Reductive Quantum Phase Estimation ( http://arxiv.org/abs/2402.04471v1 ) ライセンス: Link先を確認	Nicholas J.C. Papadopoulos, Jarrod T. Reilly, John Drew Wilson, Murray J. Holland	(参考訳) 量子相の推定は、幅広い分野の量子科学において必要となる課題である。この課題を達成するために、原子物理学と分子物理学のラムゼー干渉計(RI)と量子コンピューティングの量子位相推定(QPE)という2つのよく知られた手法が異なる文脈で開発された。これらの正準例は、還元量子位相推定(RQPE)回路と呼ばれる、より大規模な位相推定プロトコルの例であることを示す。ここでは、RQPE回路を作成できる明示的なアルゴリズムを提案する。この回路は、より少ない量子ビットとユニタリな応用を持つ任意の位相の集合を区別し、RIとQPEが属する一般的な量子仮説テストのクラスを解く。さらに、測定精度と位相識別性とのトレードオフを実証し、回路を特定の用途に最適に調整できるようにする。 Estimating a quantum phase is a necessary task in a wide range of fields of quantum science. To accomplish this task, two well-known methods have been developed in distinct contexts, namely, Ramsey interferometry (RI) in atomic and molecular physics and quantum phase estimation (QPE) in quantum computing. We demonstrate that these canonical examples are instances of a larger class of phase estimation protocols, which we call reductive quantum phase estimation (RQPE) circuits. Here we present an explicit algorithm that allows one to create an RQPE circuit. This circuit distinguishes an arbitrary set of phases with fewer qubits and unitary applications, thereby solving a general class of quantum hypothesis testing to which RI and QPE belong. We further demonstrate a trade-off between measurement precision and phase distinguishability, which allows one to tune the circuit to be optimal for a specific application.	翻訳日:2024-02-08 17:32:11 公開日:2024-02-06
# 人間ではなく、ロールプレイングツールとしてのAI言語モデル AI language models as role-playing tools, not human participants ( http://arxiv.org/abs/2402.04470v1 ) ライセンス: Link先を確認	Zhicheng Lin	(参考訳) AIの進歩は、人間の参加者の代替として言語モデルの誤用を招いている。平均的な人間の心を垣間見るものとして、これらの統計アルゴリズムを根本的に誤認識し、言語モデルは柔軟なシミュレーションツールとして受け入れるべきであり、人間の特性自体を持たずに多様な振る舞いを模倣できると主張している。 Advances in AI invite misuse of language models as replacements for human participants. We argue that treating their responses as glimpses into an average human mind fundamentally mischaracterizes these statistical algorithms and that language models should be embraced as flexible simulation tools, able to mimic diverse behaviors without possessing human traits themselves.	翻訳日:2024-02-08 17:31:55 公開日:2024-02-06
# DySLIM:カオスシステムのための不変測度による動的安定学習 DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems ( http://arxiv.org/abs/2402.04467v1 ) ライセンス: Link先を確認	Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-Núñez	(参考訳) 散逸したカオスシステムからの学習ダイナミクスは、その固有の不安定性のために、学習ダイナミクスの誤りを指数関数的に増幅するポジティブなリアプノフ指数によって形式化されるため、悪名高い。しかし、これらの系の多くはエルゴード性や引力を示す:コンパクトで非常に複雑な多様体で、軌跡は有限時間で収束し、不変測度、すなわち力学の作用の下で不変な確率分布をサポートし、システムの長期的な統計的挙動を規定する。本研究では, トラジェクタ間の不適合のみを対象とする一般的な手法と対照的に, トラジェクタの長さが増加するにつれて発散してしまう場合が多く, 不変測度の学習とダイナミクスを対象とする新しい枠組みを提案する。我々は,既存の学習目標と併用可能な効率的な目標を提示するために,このフレームワークを用いて提案する。 invariant measures(dyslim)目標による動的安定学習は、他の学習目標と比較して、ポイントアラウンドトラッキングと長期的な統計精度を達成するモデルトレーニングを可能にする。スケーラブルな正規化項で分布をターゲットとすることで、気候や気候モデルのようなゆっくりと変化する分布を示すより複雑なシステムにこのアプローチを拡張できることを期待する。 Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite-time, that supports an invariant measure, i.e., a probability distribution that is invariant under the action of the dynamics, which dictates the long-term statistical behavior of the system. In this work, we leverage this structure to propose a new framework that targets learning the invariant measure as well as the dynamics, in contrast with typical methods that only target the misfit between trajectories, which often leads to divergence as the trajectories' length increases. We use our framework to propose a tractable and sample efficient objective that can be used with any existing learning objectives. 
Our Dynamics Stable Learning by Invariant Measures (DySLIM) objective enables model training that achieves better point-wise tracking and long-term statistical accuracy relative to other learning objectives. By targeting the distribution with a scalable regularization term, we hope that this approach can be extended to more complex systems exhibiting slowly-variant distributions, such as weather and climate models.	翻訳日:2024-02-08 17:31:48 公開日:2024-02-06
# NVIDIA Holoscanにおける医療AIシステムのための決定論的エンドツーエンドレイテンシ Towards Deterministic End-to-end Latency for Medical AI Systems in NVIDIA Holoscan ( http://arxiv.org/abs/2402.04466v1 ) ライセンス: Link先を確認	Soham Sinha, Shekhar Dwivedi, Mahdi Azizian	(参考訳) 医療機器へのAIとML技術の導入は、医療診断と治療に革命をもたらした。医療機器メーカーは、単一のプラットフォームに複数のアプリケーションを統合することで、AIとMLがもたらすメリットを最大化することを熱望している。しかし、独自の視覚化コンポーネントを備えた複数のAIアプリケーションの同時実行は、主にGPUリソースの競合による予測不可能なエンドツーエンドレイテンシにつながる。これを軽減するため、製造業者は通常、異なるAIアプリケーションのための別々のワークステーションをデプロイし、財務、エネルギー、メンテナンスコストを増大させる。本稿では、センサーデータと画像をストリーミングするリアルタイムAIシステムであるNVIDIAのHoloscanプラットフォームにおけるこれらの課題に対処する。計算タスクとグラフィックスタスクの両方を含む異種GPUワークロードに最適化されたシステム設計を提案する。我々の設計では、CUDA MPSを計算ワークロードの空間分割に利用し、計算処理とグラフィックス処理を別々のGPUに分離する。実世界の医療機器アプリケーションを用いた経験的評価により,様々な終末遅延決定指標の大幅な性能向上を示す。例えば、提案した設計では、単一GPUベースラインと比較して、最大レイテンシを21～30%削減し、最大5つの同時内視鏡ツールトラッキングAIアプリケーションに対して、レイテンシ分散フラットネスを17～25%改善している。デフォルトのマルチGPUセットアップに対して,GPU利用率を42%向上させることで,最大で6つの並列アプリケーションで最大遅延を35%削減する。本稿では、並列および異種gpuワークロードのパフォーマンス予測が重要な要件である医療システムを含むエッジコンピューティング領域におけるaiアプリケーションについて、明確な設計知見を提供する。 The introduction of AI and ML technologies into medical devices has revolutionized healthcare diagnostics and treatments. Medical device manufacturers are keen to maximize the advantages afforded by AI and ML by consolidating multiple applications onto a single platform. However, concurrent execution of several AI applications, each with its own visualization components, leads to unpredictable end-to-end latency, primarily due to GPU resource contentions. To mitigate this, manufacturers typically deploy separate workstations for distinct AI applications, thereby increasing financial, energy, and maintenance costs. This paper addresses these challenges within the context of NVIDIA's Holoscan platform, a real-time AI system for streaming sensor data and images. We propose a system design optimized for heterogeneous GPU workloads, encompassing both compute and graphics tasks. 
Our design leverages CUDA MPS for spatial partitioning of compute workloads and isolates compute and graphics processing onto separate GPUs. We demonstrate significant performance improvements across various end-to-end latency determinism metrics through empirical evaluation with real-world Holoscan medical device applications. For instance, the proposed design reduces maximum latency by 21-30% and improves latency distribution flatness by 17-25% for up to five concurrent endoscopy tool tracking AI applications, compared to a single-GPU baseline. Against a default multi-GPU setup, our optimizations decrease maximum latency by 35% for up to six concurrent applications by improving GPU utilization by 42%. This paper provides clear design insights for AI applications in the edge-computing domain including medical systems, where performance predictability of concurrent and heterogeneous GPU workloads is a critical requirement.	翻訳日:2024-02-08 17:31:24 公開日:2024-02-06
# BAdaCost: コストを伴うマルチクラスのブースティング BAdaCost: Multi-class Boosting with Costs ( http://arxiv.org/abs/2402.04465v1 ) ライセンス: Link先を確認	Antonio Fernández-Baldera, José M. Buenaposada, Luis Baumela	(参考訳) マルチクラスコスト感性分類アルゴリズムであるBAdaCostを提案する。コストに敏感な複数クラスの弱い学習者を組み合わせて、Boostingフレームワーク内で強力な分類規則を得る。このアルゴリズムを導出するために,AdaBoost,SAMME,コストセンシティブなAdaBoost,PIBoostなどの様々な分類アルゴリズムで最適化された損失を一般化する,コストセンシティブなマルチクラス指数損失であるCMELを導入する。それゆえ、共通の理論的枠組みの下でそれらを統一する。実験では, BAdaCostが従来のマルチクラスコスト感性アプローチと比較して, 性能の大幅な向上を実証した。非対称多クラス分類における提案アルゴリズムの利点は、実用的多視点顔と車検出問題でも評価されている。 We present BAdaCost, a multi-class cost-sensitive classification algorithm. It combines a set of cost-sensitive multi-class weak learners to obtain a strong classification rule within the Boosting framework. To derive the algorithm we introduce CMEL, a Cost-sensitive Multi-class Exponential Loss that generalizes the losses optimized in various classification algorithms such as AdaBoost, SAMME, Cost-sensitive AdaBoost and PIBoost, thereby unifying them under a common theoretical framework. Our experiments show that BAdaCost achieves significant gains in performance when compared to previous multi-class cost-sensitive approaches. The advantages of the proposed algorithm in asymmetric multi-class classification are also evaluated in practical multi-view face and car detection problems.	翻訳日:2024-02-08 17:30:58 公開日:2024-02-06
# 人工知能の難解な10の課題 Ten Hard Problems in Artificial Intelligence We Must Get Right ( http://arxiv.org/abs/2402.04464v1 ) ライセンス: Link先を確認	Gavin Leech and Simson Garfinkel and Misha Yagudin and Alexander Briand and Aleksandr Zhuravlev	(参考訳) We explore the AI2050 "hard problems" that block the promise of AI and cause AI risks: (1) developing general capabilities of the systems; (2) assuring the performance of AI systems and their training processes; (3) aligning system goals with human goals; (4) enabling great applications of AI in real life; (5) addressing economic disruptions; (6) ensuring the participation of all; (7) at the same time ensuring socially responsible deployment; (8) addressing any geopolitical disruptions that AI causes; (9) promoting sound governance of the technology; and (10) managing the philosophical disruptions for humans living in the age of AI. それぞれの問題について、その領域を概説し、最近の重要な作業を特定し、今後の方向性を提案する。 (注:2023年1月までの文献をレビューする。) We explore the AI2050 "hard problems" that block the promise of AI and cause AI risks: (1) developing general capabilities of the systems; (2) assuring the performance of AI systems and their training processes; (3) aligning system goals with human goals; (4) enabling great applications of AI in real life; (5) addressing economic disruptions; (6) ensuring the participation of all; (7) at the same time ensuring socially responsible deployment; (8) addressing any geopolitical disruptions that AI causes; (9) promoting sound governance of the technology; and (10) managing the philosophical disruptions for humans living in the age of AI. For each problem, we outline the area, identify significant recent work, and suggest ways forward. [Note: this paper reviews literature through January 2023.]	翻訳日:2024-02-08 17:30:43 公開日:2024-02-06
# リコメンダシステムにおけるAutoMLの可能性 The Potential of AutoML for Recommender Systems ( http://arxiv.org/abs/2402.04453v1 ) ライセンス: Link先を確認	Tobias Vente, Joeran Beel	(参考訳) Automated Machine Learning (AutoML)は、モデル圧縮、機械翻訳、コンピュータビジョンを含む機械学習(ML)の非常に高度な応用である。 Recommender Systems (RecSys) はMLの応用と見なすことができる。しかし、AutoMLはRecSysコミュニティではほとんど注目を集めていない。 AutoML技術を採用するのは、比較的単純なAutomated Recommender Systems(AutoRecSys)ライブラリのみである。しかし、これらのライブラリは学生プロジェクトに基づいており、automlライブラリの機能や完全な開発を提供していない。私たちは、推奨システムを実装したい経験の浅いユーザのシナリオでAutoMLライブラリがどのように機能するかを判断することにしました。我々は、平均予測基準を含む15のライブラリの60のAutoML、AutoRecSys、ML、RecSysアルゴリズムの予測性能を14の明示的なフィードバックRecSysデータセットで比較した。経験の浅いユーザの視点をシミュレートするために,アルゴリズムをデフォルトのハイパーパラメータで評価した。私たちはAutoMLとAutoRecSysライブラリが最善であることがわかった。 AutoMLライブラリは14のデータセットのうち6つ(43%)で最高に動作したが、必ずしも同じAutoMLライブラリが最高に動作していたわけではない。シングルベストのライブラリはAutoRecSysライブラリのAuto-Surpriseで、5つのデータセット(36%)で最高のパフォーマンスを示している。 3つのデータセット(21%)では、AutoMLライブラリはパフォーマンスが悪く、デフォルトパラメータを持つRecSysライブラリがベストだった。しかし、データセット毎に上位5つの配置の50%を取得すると、recsysアルゴリズムは平均してautomlに遅れる。 MLアルゴリズムは一般的に最悪だった。 Automated Machine Learning (AutoML) has greatly advanced applications of Machine Learning (ML) including model compression, machine translation, and computer vision. Recommender Systems (RecSys) can be seen as an application of ML. Yet, AutoML has found little attention in the RecSys community; nor has RecSys found notable attention in the AutoML community. Only few and relatively simple Automated Recommender Systems (AutoRecSys) libraries exist that adopt AutoML techniques. However, these libraries are based on student projects and do not offer the features and thorough development of AutoML libraries. We set out to determine how AutoML libraries perform in the scenario of an inexperienced user who wants to implement a recommender system. We compared the predictive performance of 60 AutoML, AutoRecSys, ML, and RecSys algorithms from 15 libraries, including a mean predictor baseline, on 14 explicit feedback RecSys datasets. 
To simulate the perspective of an inexperienced user, the algorithms were evaluated with default hyperparameters. We found that AutoML and AutoRecSys libraries performed best. AutoML libraries performed best for six of the 14 datasets (43%), but it was not always the same AutoML library performing best. The single-best library was the AutoRecSys library Auto-Surprise, which performed best on five datasets (36%). On three datasets (21%), AutoML libraries performed poorly, and RecSys libraries with default parameters performed best. Although RecSys algorithms obtain 50% of all top-five placements per dataset, they fall behind AutoML on average. ML algorithms generally performed the worst.	翻訳日:2024-02-08 17:30:31 公開日:2024-02-06
# 質量細胞計測のための細胞セグメンテーションモデルの限界を押し上げる Pushing the limits of cell segmentation models for imaging mass cytometry ( http://arxiv.org/abs/2402.04446v1 ) ライセンス: Link先を確認	Kimberley M. Bird, Xujiong Ye, Alan M. Race, James M. Brown	(参考訳) imaging mass cytometry (imc) は比較的新しい生体組織を細胞内分解能でイメージングする技術である。近年、学習に基づくセグメンテーション手法により、細胞型と形態の正確な定量化が可能になっているが、一般的には、完全に注釈付き基底真理(gt)ラベルを持つ大規模データセットに依存している。本稿では,不完全ラベルが学習ベースセグメンテーションモデルに及ぼす影響について検討し,これらのモデルの組織タイプへの一般化性を評価する。以上の結果から,GTマスクから50%の細胞アノテーションを除去すると,DSCスコアは0.874に低下する(GTマスクで訓練したモデルによる0.889から)。これは、アノテーションの時間がパフォーマンスに悪影響を及ぼすことなく、少なくとも半分削減できることを意味する。さらに,不完全ラベルを用いた単発モデルの訓練では,多発組織型に比べてdscが0.031減少し,セグメンテーションの質的差異が無視できる。さらに、最悪のパフォーマンスモデル(5%の細胞アノテーションを含む)をブートストラッピングすると、10倍のDSCスコアが0.720から0.829に向上する。これらの知見は、トレーニング中に複数のIMC組織タイプの必要性を排除し、また、ラベルがほとんどないモデルが自分自身で改善する可能性も提供することを含む、同等のセグメンテーションモデルを作成するプロセスに、時間と作業が費やされる可能性があることを示唆している。ソースコードはgithubにある。 https://github.com/kimberley/isbi2024。 Imaging mass cytometry (IMC) is a relatively new technique for imaging biological tissue at subcellular resolution. In recent years, learning-based segmentation methods have enabled precise quantification of cell type and morphology, but typically rely on large datasets with fully annotated ground truth (GT) labels. This paper explores the effects of imperfect labels on learning-based segmentation models and evaluates the generalisability of these models to different tissue types. Our results show that removing 50% of cell annotations from GT masks only reduces the dice similarity coefficient (DSC) score to 0.874 (from 0.889 achieved by a model trained on fully annotated GT masks). This implies that annotation time can in fact be reduced by at least half without detrimentally affecting performance. Furthermore, training our single-tissue model on imperfect labels only decreases DSC by 0.031 on an unseen tissue type compared to its multi-tissue counterpart, with negligible qualitative differences in segmentation. 
Additionally, bootstrapping the worst-performing model (with 5% of cell annotations) a total of ten times improves its original DSC score of 0.720 to 0.829. These findings imply that less time and work can be put into the process of producing comparable segmentation models; this includes eliminating the need for multiple IMC tissue types during training, whilst also providing the potential for models with very few labels to improve on themselves. Source code is available on GitHub: https://github.com/kimberley/ISBI2024.	翻訳日:2024-02-08 17:30:00 公開日:2024-02-06
# 医師-AIコンサルテーションのワンショット分類における埋め込みの評価 Evaluating Embeddings for One-Shot Classification of Doctor-AI Consultations ( http://arxiv.org/abs/2402.04442v1 ) ライセンス: Link先を確認	Olumide Ebenezer Ojo, Olaronke Oluwayemisi Adebanji, Alexander Gelbukh, Hiram Calvo and Anna Feldman	(参考訳) 医療提供者と患者との効果的なコミュニケーションは、高品質な患者医療の提供に不可欠である。本研究では,医療相談における医師書きとAI生成のテキストを,最先端の埋め込みとワンショット分類システムを用いてどのように分類するかを検討する。 bag-of-words, character n-grams, word2vec, glove, fasttext, gpt2 embeddedsなどの埋め込みを解析することにより,ワンショット分類システムが医療相談の中で意味情報を取得する方法を検討する。その結果、埋め込みはテキストからセマンティックな特徴を信頼性と適応性でキャプチャできることがわかった。全体として、Word2Vec、GloVe、および character n-grams の埋め込みは良好に動作し、このタスクをターゲットにしたモデリングに適していることを示している。 GPT2埋め込みも顕著な性能を示しており、このタスクに合わせたモデルにも適していることを示している。当社の機械学習アーキテクチャは、トレーニングデータが少ない場合の健康会話の質を大幅に向上させ、患者と医療提供者間のコミュニケーションを改善しました。 Effective communication between healthcare providers and patients is crucial to providing high-quality patient care. In this work, we investigate how Doctor-written and AI-generated texts in healthcare consultations can be classified using state-of-the-art embeddings and one-shot classification systems. By analyzing embeddings such as bag-of-words, character n-grams, Word2Vec, GloVe, fastText, and GPT2 embeddings, we examine how well our one-shot classification systems capture semantic information within medical consultations. Results show that the embeddings are capable of capturing semantic features from text in a reliable and adaptable manner. Overall, Word2Vec, GloVe and Character n-grams embeddings performed well, indicating their suitability for modeling targeted to this task. GPT2 embedding also shows notable performance, indicating its suitability for models tailored to this task as well. Our machine learning architectures significantly improved the quality of health conversations when training data are scarce, improving communication between patients and healthcare providers.	翻訳日:2024-02-08 17:29:35 公開日:2024-02-06
# 全相関を用いた高次ニューラルネットワークノード相互作用の探索 Exploring higher-order neural network node interactions with total correlation ( http://arxiv.org/abs/2402.04440v1 ) ライセンス: Link先を確認	Thomas Kerby, Teresa White, Kevin Moon	(参考訳) 生態システム、コラボレーション、人間の脳などの領域では、変数は複雑な方法で相互作用する。しかし、高次変数相互作用(HOI)を正確に特徴付けることは、データ間でHOIが変化するとさらに悪化する難しい問題である。そこで本研究では,データ多様体に近接してデータポイントをクラスタリングし,局所的スケールでhoisをキャプチャする新しい手法corexを提案する。次に、全相関と呼ばれる相互情報の多変量バージョンを用いて、各クラスタ内のデータの潜在因子表現を構築し、局所的なHOIを学習する。我々はLocal CorExを用いて、合成および実世界のデータ中のHOIを探索し、データ構造に関する隠れた洞察を抽出する。最後に、トレーニングニューラルネットワークの内部動作の探索と解釈にLocal CorExが適していることを示します。 In domains such as ecological systems, collaborations, and the human brain the variables interact in complex ways. Yet accurately characterizing higher-order variable interactions (HOIs) is a difficult problem that is further exacerbated when the HOIs change across the data. To solve this problem we propose a new method called Local Correlation Explanation (CorEx) to capture HOIs at a local scale by first clustering data points based on their proximity on the data manifold. We then use a multivariate version of the mutual information called the total correlation, to construct a latent factor representation of the data within each cluster to learn the local HOIs. We use Local CorEx to explore HOIs in synthetic and real world data to extract hidden insights about the data structure. Lastly, we demonstrate Local CorEx's suitability to explore and interpret the inner workings of trained neural networks.	翻訳日:2024-02-08 17:29:14 公開日:2024-02-06
# Degenerate Clifford Algebrasにおける知識グラフの埋め込み Embedding Knowledge Graphs in Degenerate Clifford Algebras ( http://arxiv.org/abs/2402.04870v1 ) ライセンス: Link先を確認	Louis Mozart Kamdem, Caglar Demir and Axel-Cyrille Ngonga	(参考訳) クリフォード代数は実数、複素数、四元数の自然な一般化である。これまでのところ、$Cl_{p,q}$(つまり、零基ベクトルを持たない代数)という形のクリフォード代数のみが知識グラフ埋め込みの文脈で研究されてきた。そこで本研究では,nilpotency index が 2 である nilpotent base vector について考察する。これらの空間において、$Cl_{p,q,r}$ は双対数に基づくアプローチ(これは $Cl_{p,q}$ でモデル化できない)を一般化し、実体埋め込みの現実部分と複素部分の間の高次相互作用が存在しないことから生じるパターンを捉えることができる。パラメータの発見には$p$,$q$,$r$の2つの新しいモデルを設計する。最初のモデルはgreedy検索を使用して$p$、$q$、$r$を最適化する。 2つ目は、ニューラルネットワークを用いて計算された入力知識グラフの埋め込みに基づいて$(p, q,r)$を予測する。 7つのベンチマークデータセットによる評価結果から, 零ベクトルが埋め込みの捕集に有効であることが示唆された。我々の技術との比較は、検証データで達成するMRRを全てのデータセットの他のアプローチよりも一般化していることを示唆している。また、greedy検索は$p$、$q$、$r$の値が最適に近い値を見つけるのに十分であることを示す。 Clifford algebras are a natural generalization of the real numbers, the complex numbers, and the quaternions. So far, solely Clifford algebras of the form $Cl_{p,q}$ (i.e., algebras without nilpotent base vectors) have been studied in the context of knowledge graph embeddings. We propose to consider nilpotent base vectors with a nilpotency index of two. These spaces, denoted $Cl_{p,q,r}$, allow generalizing over approaches based on dual numbers (which cannot be modelled using $Cl_{p,q}$) and capturing patterns that emanate from the absence of higher-order interactions between real and complex parts of entity embeddings. We design two new models for the discovery of the parameters $p$, $q$, and $r$. The first model uses a greedy search to optimize $p$, $q$, and $r$. The second predicts $(p, q,r)$ based on an embedding of the input knowledge graph computed using neural networks. The results of our evaluation on seven benchmark datasets suggest that nilpotent vectors can help capture embeddings better. Our comparison against the state of the art suggests that our approach generalizes better than other approaches on all datasets w.r.t. the MRR it achieves on validation data. We also show that a greedy search suffices to discover values of $p$, $q$ and $r$ that are close to optimal.	翻訳日:2024-02-08 15:17:44 公開日:2024-02-06
# 推定リーンとデータ適応予測 Assumption-lean and Data-adaptive Post-Prediction Inference ( http://arxiv.org/abs/2311.14220v3 ) ライセンス: Link先を確認	Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, and Qiongshi Lu	(参考訳) 現代の科学研究が直面する主な課題は金本位制のデータの入手が限られていることであり、費用と労力がかかる。機械学習(ML)の急速な発展により、科学者は容易に得られる共変量でこれらの金標準結果を予測するためにMLアルゴリズムに依存してきた。しかし、これらの予測結果は、予測手順によってもたらされた不正確さや不均質性を無視して、後続の統計分析で直接使用されることが多い。これはおそらく偽陽性の発見と無効な科学的結論をもたらす。本研究では、ML予測結果に基づいて、有効かつ強力な推論を可能にする仮定型およびデータ適応型ポストプレディション推論(POP-Inf)手法を提案する。その「推定リーン」特性は、幅広い統計量のML予測を仮定せずに信頼できる統計的推測を保証する。その"data-adaptive"機能は、ml-predictionの精度に関わらず、既存の予測後推論メソッドよりも効率性が向上する。シミュレーションと大規模ゲノムデータを用いて,本手法の優位性と適用性を示す。 A primary challenge facing modern scientific research is the limited availability of gold-standard data, which can be both costly and labor-intensive to obtain. With the rapid development of machine learning (ML), scientists have relied on ML algorithms to predict these gold-standard outcomes with easily obtained covariates. However, these predicted outcomes are often used directly in subsequent statistical analyses, ignoring imprecision and heterogeneity introduced by the prediction procedure. This will likely result in false positive findings and invalid scientific conclusions. In this work, we introduce an assumption-lean and data-adaptive Post-Prediction Inference (POP-Inf) procedure that allows valid and powerful inference based on ML-predicted outcomes. Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML-prediction, for a wide range of statistical quantities. Its "data-adaptive" feature guarantees an efficiency gain over existing post-prediction inference methods, regardless of the accuracy of ML-prediction. We demonstrate the superiority and applicability of our method through simulations and large-scale genomic data.	翻訳日:2024-02-08 12:09:48 公開日:2024-02-06
# AIが生成した画像から人間のアートを区別できるのか? Organic or Diffused: Can We Distinguish Human Art from AI-generated Images? ( http://arxiv.org/abs/2402.03214v2 ) ライセンス: Link先を確認	Anna Yoo Jeong Ha, Josephine Passananti, Ronik Bhaskar, Shawn Shan, Reid Southen, Haitao Zheng, Ben Y. Zhao	(参考訳) 生成AI画像の出現は、アートの世界を完全に破壊した。 aiが生成した画像を人間の芸術と区別することは、時間とともに影響が増大する困難な問題である。この問題に対処できないため、悪いアクターは、AIイメージを禁止したポリシーを掲げる人間芸術や企業に対してプレミアムを支払う個人を欺くことができる。また、コンテンツ所有者が著作権を確立し、モデルの崩壊を避けるためにトレーニングデータのキュレーションに関心を持つモデルトレーナーにとっても重要である。人間のアートとAIイメージを区別するアプローチには、教師付き学習によって訓練された分類器、拡散モデルをターゲットにした研究ツール、芸術技術に関する知識を使ったプロのアーティストによる識別など、さまざまなものがある。本稿では,これらのアプローチが現在の現代生成モデルに対して,良性と敵意の両方においてどのように機能するかを理解したい。実際の人間のアートを7つのスタイルでキュレートし、5つの生成モデルからマッチング画像を生成し、8つの検出器(180人の群衆、4000人以上のプロのアーティスト、13のエキスパートアーティストを含む5つの自動検出器と3つの異なる人間グループ)を適用する。 Hiveとエキスパートアーティストはどちらも非常にうまく機能するが、異なる方法で間違いを犯す(Hiveは敵の摂動に対して弱く、エキスパートアーティストは高い偽陽性を生成する)。これらの弱点は、モデルが進化し続けるにつれて残ると信じており、私たちのデータを使用して、人間と自動化された検出器のチームが、正確性と堅牢性の最高の組み合わせを提供する理由を実証しています。 The advent of generative AI images has completely disrupted the art world. Distinguishing AI generated images from human art is a challenging problem whose impact is growing over time. A failure to address this problem allows bad actors to defraud individuals paying a premium for human art and companies whose stated policies forbid AI imagery. It is also critical for content owners to establish copyright, and for model trainers interested in curating training data in order to avoid potential model collapse. There are several different approaches to distinguishing human art from AI images, including classifiers trained by supervised learning, research tools targeting diffusion models, and identification by professional artists using their knowledge of artistic techniques. In this paper, we seek to understand how well these approaches can perform against today's modern generative models in both benign and adversarial settings. 
We curate real human art across 7 styles, generate matching images from 5 generative models, and apply 8 detectors (5 automated detectors and 3 different human groups including 180 crowdworkers, 4000+ professional artists, and 13 expert artists experienced at detecting AI). Both Hive and expert artists do very well, but make mistakes in different ways (Hive is weaker against adversarial perturbations while Expert artists produce higher false positives). We believe these weaknesses will remain as models continue to evolve, and use our data to demonstrate why a combined team of human and automated detectors provides the best combination of accuracy and robustness.	翻訳日:2024-02-08 12:00:56 公開日:2024-02-06
# 検証回路の再利用による言語モデルの信頼性向上 Increasing Trust in Language Models through the Reuse of Verified Circuits ( http://arxiv.org/abs/2402.02619v2 ) ライセンス: Link先を確認	Philip Quirke, Clement Neo, Fazl Barez	(参考訳) 言語モデル(LM)は、幅広い予測タスクにますます使われていますが、それらのトレーニングは稀なエッジケースを無視し、信頼性を低下させます。ここでは、タスクアルゴリズムと回路実装を検証し、エッジケースを考慮し、既知の障害モードを含まない、厳格な信頼性基準を定義する。数学的および論理的に規定されたフレームワークを使用して構築すれば,トランスフォーマーモデルをこの標準を満たすように訓練できることが示される。本稿では n-桁整数加算のモデルを完全に検証する。検証されたモジュールの再利用性を示すために、訓練された整数加算モデルを未訓練モデルに挿入し、複合モデルを訓練して加算と減算の両方を実行する。両タスクの加算回路を広範囲に再利用し,より複雑な減算器モデルの検証を容易にする。本稿では,検証済みのタスクモジュールをLMに挿入することで,モデルの再利用を活かし,それらを用いた言語モデルの妥当性と信頼性を向上させる方法について論じる。検証回路の再利用により、言語モデルの安全性に向けた重要なステップであると考えられる、より複雑な複合モデルを検証する労力が削減される。 Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a transformer model can be trained to meet this standard if built using mathematically and logically specified frameworks. In this paper, we fully verify a model for n-digit integer addition. To exhibit the reusability of verified modules, we insert the trained integer addition model into an untrained model and train the combined model to perform both addition and subtraction. We find extensive reuse of the addition circuits for both tasks, easing verification of the more complex subtractor model. We discuss how inserting verified task modules into LMs can leverage model reuse to improve verifiability and trustworthiness of language models built using them. The reuse of verified circuits reduces the effort to verify more complex composite models which we believe to be a significant step towards safety of language models.	翻訳日:2024-02-08 11:59:26 公開日:2024-02-06
# グラフ基礎モデル Graph Foundation Models ( http://arxiv.org/abs/2402.02216v2 ) ライセンス: Link先を確認	Haitao Mao, Zhikai Chen, Wenzhuo Tang, Jianan Zhao, Yao Ma, Tong Zhao, Neil Shah, Mikhail Galkin, Jiliang Tang	(参考訳) グラフ基礎モデル(GFM)は、グラフ領域における新しいトレンド研究トピックであり、異なるグラフやタスクを一般化可能なグラフモデルの開発を目指している。しかし、汎用的なGFMはまだ達成されていない。 GFMを構築する上で重要な課題は、さまざまな構造パターンを持つグラフ間でポジティブな転送を可能にする方法である。 cvおよびnlpドメインにおける既存の基礎モデルに着想を得て、グラフ上の不変性を符号化する基本転送可能な単位の「グラフ語彙」を提唱し、gfm開発の新たな展望を提案する。我々は,ネットワーク解析,理論的基礎,安定性といった重要な側面からグラフ語彙の構成を基礎づける。このような語彙的視点は、ニューラルスケーリング法則に従って将来のGFM設計を前進させる可能性がある。 Graph Foundation Model (GFM) is a new trending research topic in the graph domain, aiming to develop a graph model capable of generalizing across different graphs and tasks. However, a versatile GFM has not yet been achieved. The key challenge in building GFM is how to enable positive transfer across graphs with diverse structural patterns. Inspired by the existing foundation models in the CV and NLP domains, we propose a novel perspective for the GFM development by advocating for a "graph vocabulary", in which the basic transferable units underlying graphs encode the invariance on graphs. We ground the graph vocabulary construction from essential aspects including network analysis, theoretical foundations, and stability. Such a vocabulary perspective can potentially advance the future GFM design following the neural scaling laws.	翻訳日:2024-02-08 11:56:58 公開日:2024-02-06
# ソフトウェア工学のための協調エージェント Collaborative Agents for Software Engineering ( http://arxiv.org/abs/2402.02172v2 ) ライセンス: Link先を確認	Daniel Tang and Zhenghan Chen and Kisub Kim and Yewei Song and Haoye Tian and Saad Ezzini and Yongfeng Huang and Jacques Klein and Tegawende F. Bissyande	(参考訳) コードレビューは協調的なプロセスであり、ソフトウェアの全体的な品質と信頼性を保証することを目的としています。これは大きなメリットを提供するが、組織におけるコードレビューの実装は、自動化をアピールするいくつかの課題に直面している。自動化されたコードレビューツールが開発されてからしばらく経ち、新しいaiモデルの採用によって改善されている。残念なことに、既存のメソッドは不足している。彼らはしばしば単一の入出力生成モデルをターゲットにしており、様々な視点を考慮したコードレビューのコラボレーションインタラクションをシミュレートできない。本稿では,コードレビューのための新しいマルチエージェントシステムであるCodeAgentを導入することにより,コードレビュー自動化の最先端技術について述べる。基本的に、CodeAgentはQA-Checker("Question-Answer Checking"の略)によって運営されている。 codeagentは自律的で、マルチエージェントで、大きな言語モデル駆動です。コードエージェントの有効性を実証するために,様々なタスクにおいてその能力を評価する実験を行った。 1)コード変更とコミットメッセージの不一致の検出。 2) コミットによる脆弱性導入の検出。 3) コードスタイルの遵守の検証。私たちのウェブサイトは \url{https://code-agent-new.vercel.app/index.html} でアクセスできます。 Code review is a heavily collaborative process, which aims at ensuring the overall quality and reliability of software. While it provides massive benefits, the implementation of code review in an organization faces several challenges that make its automation appealing. Automated code review tools have been around for a while and are now improving thanks to the adoption of novel AI models, which can help learn about standard practices and systematically check that the reviewed code adheres to them. Unfortunately, existing methods fall short: they often target a single input-output generative model, which cannot simulate the collaboration interactions in code review to account for various perspectives; they are also sub-performing on various critical code review sub-tasks. In this paper, we advance the state of the art in code review automation by introducing CodeAgent, a novel multi-agent-based system for code review. 
Fundamentally, CodeAgent is steered by QA-Checker (short for "Question-Answer Checking"), a supervision agent designed specifically to ensure that all agents' contributions remain relevant to the initial review question. CodeAgent is autonomous, multi-agent, and large language model-driven. To demonstrate the effectiveness of CodeAgent, we performed experiments to assess its capabilities in various tasks including 1) detection of inconsistencies between code changes and commit messages, 2) detection of vulnerability introduction by commits, and 3) validation of adherence to code style. Our website is available at https://code-agent-new.vercel.app/index.html.	翻訳日:2024-02-08 11:56:31 公開日:2024-02-06
# プライバシ保存および検証可能なreluネットワークに対する多項式近似について On Polynomial Approximations for Privacy-Preserving and Verifiable ReLU Networks ( http://arxiv.org/abs/2011.05530v4 ) ライセンス: Link先を確認	Ramy E. Ali, Jinhyun So, A. Salman Avestimehr	(参考訳) ディープニューラルネットワーク(DNN)推論タスクを信頼できないクラウドにアウトソーシングすることで、データのプライバシと整合性に関する懸念が高まる。多項式ベースの計算にはプライバシーと整合性を保証する技術が数多く存在するが、DNNには多項式以外の計算が含まれる。これらの課題に対処するため、正則線形単位(ReLU)関数を多項式活性化関数に置き換えることで、プライバシー保護および検証可能な推論手法が提案されている。そのような手法は通常、整数係数を持つ多項式や有限体上の多項式を必要とする。そのような要求により、ReLU関数を平方関数に置き換えるいくつかの研究が提案された。本研究では,多項式を整数係数に制限してもルル関数を置換できるような正方関数は最適次数2多項式ではないことを実証的に示す。代わりに、次数2の多項式活性化関数を1次項で提案し、より優れたモデルに導くことを実証的に示す。 VGG-16などの各種アーキテクチャにおけるCIFARおよびTiny ImageNetデータセットの実験から,提案した関数は正方形関数と比較して最大10.4%精度が向上することが示された。 Outsourcing deep neural networks (DNNs) inference tasks to an untrusted cloud raises data privacy and integrity concerns. While there are many techniques to ensure privacy and integrity for polynomial-based computations, DNNs involve non-polynomial computations. To address these challenges, several privacy-preserving and verifiable inference techniques have been proposed based on replacing the non-polynomial activation functions such as the rectified linear unit (ReLU) function with polynomial activation functions. Such techniques usually require polynomials with integer coefficients or polynomials over finite fields. Motivated by such requirements, several works proposed replacing the ReLU function with the square function. In this work, we empirically show that the square function is not the best degree-2 polynomial that can replace the ReLU function even when restricting the polynomials to have integer coefficients. We instead propose a degree-2 polynomial activation function with a first order term and empirically show that it can lead to much better models. Our experiments on the CIFAR and Tiny ImageNet datasets on various architectures such as VGG-16 show that our proposed function improves the test accuracy by up to 10.4% compared to the square function.	
翻訳日:2024-02-07 23:36:28 公開日:2024-02-06
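The contrast drawn in the abstract above, between the square function and a degree-2 polynomial with a first-order term, can be illustrated with a small sketch. The coefficients `a = 1` and `b = 1` and the function names are assumptions made for illustration; the abstract does not state the exact polynomial. The sketch only shows that such a polynomial keeps integer coefficients, can be evaluated modulo a prime, and, unlike the plain square, does not map `x` and `-x` to the same value.

```python
def square_activation(x):
    """Plain square activation: x^2."""
    return x * x


def poly_activation(x, a=1, b=1):
    """Degree-2 polynomial with a first-order term: a*x^2 + b*x.

    The integer coefficients a and b are illustrative assumptions.
    """
    return a * x * x + b * x


def poly_activation_mod(x, p, a=1, b=1):
    """The same polynomial evaluated over the integers modulo a prime p."""
    return (a * x * x + b * x) % p


# The square discards the sign of its input; the first-order term keeps it.
same_under_square = square_activation(-3) == square_activation(3)
same_under_poly = poly_activation(-3) == poly_activation(3)
```

Because both functions use only integer additions and multiplications, either one can be evaluated inside a scheme that supports polynomial arithmetic over a finite field.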
# ノイズチャネルにおける量子ランダムアクセスコード Quantum Random Access Code in Noisy Channels ( http://arxiv.org/abs/2204.09485v2 ) ライセンス: Link先を確認	Breno Marques and Rafael A. da Silva	(参考訳) ランダムアクセスコード(RAC)通信プロトコルは、当事者間の通信が制限されている場合に特に有用である。本研究は,従来の量子ランダムアクセスコード(qrac)を,ノイズがなければ従来のランダムアクセスコード(crac)よりも有利であると証明し,ノイズチャネルがqrac性能に与える影響と,ノイズチャネルが知られている場合のセミデファイト・プログラミングにより最適化されたシーソー法を用いて損失を軽減する方法について検討した。 The random access code (RAC) communication protocol is particularly useful when the communication between parties is restricted. In this work, we build upon works that have previously proven quantum random access codes (QRAC), in the absence of noise, to be more advantageous than classical random access codes (CRAC). We investigate the effects of noisy channels on QRAC performance and how the losses can be mitigated by using the see-saw method optimized by semidefinite programming when the noisy channel is known.	翻訳日:2024-02-07 21:53:34 公開日:2024-02-06
# IM-META:未知位相をもつネットワークにおけるノードメタデータによる影響最大化 IM-META: Influence Maximization Using Node Metadata in Networks With Unknown Topology ( http://arxiv.org/abs/2106.02926v3 ) ライセンス: Link先を確認	Cong Tran, Won-Yong Shin, Andreas Spitz	(参考訳) 複雑なネットワークの構造はしばしば不明であるため、ノードクエリの予算が小さいため、基盤となるネットワークの一部のみを探索することで、最も影響力のあるシードノードを特定することができる。本稿では、クエリやノードメタデータから情報を取得することで、未知のトポロジを持つネットワークにおける最大化(IM)に影響を与えるソリューションであるIM-METAを提案する。このようなメタデータの使用は、メタデータのノイズ性や接続性推論の不確実性のため、リスクがないため、シードノードとクエリノードの両方を見つけることを目的とした新しいIM問題を定式化する。 IM-METAでは,3つのステップを反復的に行う効果的な手法を開発した。 1) 収集したメタデータとエッジの関係を, シームズニューラルネットワークを用いて学習する。 2) 強化グラフを構築するために, 多数の不確かさエッジを選択する。 3)我々のトポロジ対応ランキング戦略を用いて,推定影響の最大化により,クエリの次のノードを特定する。実世界の4つのデータセットにおけるim-metaの実験的評価を通して,その実証を行った。 a)ノードクエリによるネットワーク探索の速度 b) 各モジュールの有効性 c) ベンチマーク手法に対する優位性 d) より困難な設定に対する堅牢性 e)ハイパーパラメータの感度,及び f)スケーラビリティ。 Since the structure of complex networks is often unknown, we may identify the most influential seed nodes by exploring only a part of the underlying network, given a small budget for node queries. We propose IM-META, a solution to influence maximization (IM) in networks with unknown topology by retrieving information from queries and node metadata. Since using such metadata is not without risk due to the noisy nature of metadata and uncertainties in connectivity inference, we formulate a new IM problem that aims to find both seed nodes and queried nodes. In IM-META, we develop an effective method that iteratively performs three steps: 1) we learn the relationship between collected metadata and edges via a Siamese neural network, 2) we select a number of inferred confident edges to construct a reinforced graph, and 3) we identify the next node to query by maximizing the inferred influence spread using our topology-aware ranking strategy. 
Through experimental evaluation of IM-META on four real-world datasets, we demonstrate a) the speed of network exploration via node queries, b) the effectiveness of each module, c) the superiority over benchmark methods, d) the robustness to more difficult settings, e) the hyperparameter sensitivity, and f) the scalability.	翻訳日:2024-02-07 21:53:22 公開日:2024-02-06
# InstaHideの2つのプライベート画像の混合におけるサンプル複雑さ InstaHide's Sample Complexity When Mixing Two Private Images ( http://arxiv.org/abs/2011.11877v2 ) ライセンス: Link先を確認	Baihe Huang, Zhao Song, Runzhou Tao, Junze Yin, Ruizhe Zhang, Danyang Zhuo	(参考訳) ニューラルネットワークのトレーニングは通常、大量の機密データを必要とし、トレーニングデータのプライバシを保護する方法が、ディープラーニング研究において重要なトピックになっている。 InstaHideは、テスト精度に小さな影響しか与えず、トレーニングデータのプライバシを保護するための最先端のスキームだ。本稿では,instahideに対する最近の攻撃を体系的に研究し,これらの攻撃を理解し分析するための統一フレームワークを提案する。既存の攻撃は証明可能な保証を持たないか、1つのプライベートイメージのみを復元できる。それぞれのInstaHideイメージが2つのプライベートイメージの混合である現在のInstaHideチャレンジ設定では、証明可能な保証と最適なサンプル複雑さですべてのプライベートイメージを復元する新しいアルゴリズムを提案する。さらに,すべてのinstahide画像の検索における計算困難性も提供する。以上の結果から,InstaHideは2枚のプライベートイメージを混合しても,情報理論上は安全ではないが,最悪の場合,計算上は安全であることがわかった。 Training neural networks usually requires large amounts of sensitive training data, and how to protect the privacy of training data has thus become a critical topic in deep learning research. InstaHide is a state-of-the-art scheme to protect training data privacy with only minor effects on test accuracy, and its security has become a salient question. In this paper, we systematically study recent attacks on InstaHide and present a unified framework to understand and analyze these attacks. We find that existing attacks either do not have a provable guarantee or can only recover a single private image. On the current InstaHide challenge setup, where each InstaHide image is a mixture of two private images, we present a new algorithm to recover all the private images with a provable guarantee and optimal sample complexity. In addition, we also provide a computational hardness result on retrieving all InstaHide images. Our results demonstrate that InstaHide is not information-theoretically secure but computationally secure in the worst case, even when mixing two private images.	翻訳日:2024-02-07 21:53:02 公開日:2024-02-06
# ドメインシフトのための校正不確かさの学習:分散ロバスト学習アプローチ Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach ( http://arxiv.org/abs/2010.05784v4 ) ライセンス: Link先を確認	Haoxuan Wang, Zhiding Yu, Yisong Yue, Anima Anandkumar, Anqi Liu, Junchi Yan	(参考訳) 本稿では,対象(テスト)分布とソース(トレーニング)分布が異なる領域シフトの下で校正不確かさを学習するためのフレームワークを提案する。このような領域シフトを、微分密度比推定器を用いて検出し、タスクネットワークと共に訓練し、ドメインシフトに関する調整されたソフトマックス予測形式を構成する。特に、密度比の推定は、ターゲット(テスト)サンプルのソース(トレーニング)分布との密接性を反映している。我々はタスクネットワークにおける予測の不確実性を調整するためにそれを用いる。この密度比を利用するという考え方は、相対的リスク最小化によるドメインシフトを考慮に入れた分布的ロバスト学習(DRL)フレームワークに基づいている。提案手法は,非教師付きドメイン適応 (UDA) や半教師付き学習 (SSL) などの下流タスクに有効な校正不確実性を生成する。これらのタスクでは、セルフトレーニングやFixMatchのようなメソッドが不確実性を使用して、再トレーニングのための確実な疑似ラベルを選択する。実験の結果,DRLの導入はドメイン間性能の大幅な向上につながることがわかった。また,推定密度比は人間の選択頻度と一致し,不確かさの指標との正の相関が示唆された。 We propose a framework for learning calibrated uncertainties under domain shifts, where the source (training) distribution differs from the target (test) distribution. We detect such domain shifts via a differentiable density ratio estimator and train it together with the task network, composing an adjusted softmax predictive form concerning domain shift. In particular, the density ratio estimation reflects the closeness of a target (test) sample to the source (training) distribution. We employ it to adjust the uncertainty of prediction in the task network. This idea of using the density ratio is based on the distributionally robust learning (DRL) framework, which accounts for the domain shift by adversarial risk minimization. We show that our proposed method generates calibrated uncertainties that benefit downstream tasks, such as unsupervised domain adaptation (UDA) and semi-supervised learning (SSL). On these tasks, methods like self-training and FixMatch use uncertainties to select confident pseudo-labels for re-training. Our experiments show that the introduction of DRL leads to significant improvements in cross-domain performance. 
We also show that the estimated density ratios align with human selection frequencies, suggesting a positive correlation with a proxy of human perceived uncertainties.	翻訳日:2024-02-07 21:52:32 公開日:2024-02-06
# 非パラメトリックIVモデルにおける適応的・最適仮説テスト Adaptive, Rate-Optimal Hypothesis Testing in Nonparametric IV Models ( http://arxiv.org/abs/2006.09587v4 ) ライセンス: Link先を確認	Christoph Breunig, Xiaohong Chen	(参考訳) 非パラメトリックインストゥルメンタル変数(npiv)モデルにおける構造関数に対する不等式(単調性、凸性など)と等式(パラメトリック、半パラメトリックなど)に対する新しい適応的仮説テストを提案する。実験統計は, 拘束型と非拘束型のNPIV推定器間の2次距離を改良した1次サンプルアナログに基づく。シーブチューニングパラメータとボンフェルロニ調整されたカイ二乗臨界値の計算量的・データ駆動的選択を提供する。本試験は,楽器の内在性と未知強度の存在下での代替関数の未知の滑らかさに適応する。テストの適応ミニマックスレートは$l^2$である。すなわち、合成ヌル上のタイプiの誤差と非パラメトリックな代替モデル上のタイプiiの誤差の和は、未知の正則性を持つnpivモデルに対する他の仮説テストによっては改善できない。 l^2$の信頼度セットは、適応テストの反転によって得られる。シミュレーションにより、我々の適応テストはNPIVモデルにおける単調性およびパラメトリックの制約に対する既存の非適応テストよりもはるかに大きいサイズと有限サンプルパワーを制御することを確認した。異なる製品需要とエンゲル曲線の形状制限を試験するための実証的応用について述べる。 We propose a new adaptive hypothesis test for inequality (e.g., monotonicity, convexity) and equality (e.g., parametric, semiparametric) restrictions on a structural function in a nonparametric instrumental variables (NPIV) model. Our test statistic is based on a modified leave-one-out sample analog of a quadratic distance between the restricted and unrestricted sieve NPIV estimators. We provide computationally simple, data-driven choices of sieve tuning parameters and Bonferroni adjusted chi-squared critical values. Our test adapts to the unknown smoothness of alternative functions in the presence of unknown degree of endogeneity and unknown strength of the instruments. It attains the adaptive minimax rate of testing in $L^2$. That is, the sum of its type I error uniformly over the composite null and its type II error uniformly over nonparametric alternative models cannot be improved by any other hypothesis test for NPIV models of unknown regularities. Confidence sets in $L^2$ are obtained by inverting the adaptive test. Simulations confirm that our adaptive test controls size and its finite-sample power greatly exceeds existing non-adaptive tests for monotonicity and parametric restrictions in NPIV models. 
Empirical applications to test for shape restrictions of differentiated products demand and of Engel curves are presented.	翻訳日:2024-02-07 21:52:12 公開日:2024-02-06
# クロスドメインFew-Shot学習における大規模マージン機構と擬似クエリセット Large Margin Mechanism and Pseudo Query Set on Cross-Domain Few-Shot Learning ( http://arxiv.org/abs/2005.09218v2 ) ライセンス: Link先を確認	Jia-Fong Yeh and Hsin-Ying Lee and Bing-Chen Tsai and Yi-Rong Chen and Ping-Chia Huang and Winston H. Hsu	(参考訳) 近年では、数発の学習問題に注目が集まっている。以前のほとんどの作業のメソッドは、単一のドメインのデータセットでトレーニングとテストが行われたが、クロスドメインの少数ショット学習は、トレーニングフェーズとテストフェーズの間にあるさまざまなドメインのデータセットを処理する、少数ショット学習問題の真新しいブランチである。本稿では,共通対象,衛星画像,医用画像など4つの異なる領域のデータセットを微調整しながら,単一のデータセット上で事前学習(メタ訓練)されているという問題を解決するために,支援画像から疑似クエリ画像を生成し,顔認識の手法に触発された大きなマージン機構で特徴抽出モジュールを微調整する,新しい大マージン微調整法(lmm-pqs)を提案する。実験結果によると,LMM-PQSはベースラインモデルよりもかなりのマージンを越え,我々のアプローチが堅牢であり,事前学習されたモデルをデータが少ない新しい領域に容易に適応できることを示した。 In recent years, few-shot learning problems have received a lot of attention. While methods in most previous works were trained and tested on datasets in one single domain, cross-domain few-shot learning is a brand-new branch of few-shot learning problems, where models handle datasets in different domains between training and testing phases. In this paper, to solve the problem that the model is pre-trained (meta-trained) on a single dataset while fine-tuned on datasets in four different domains, including common objects, satellite images, and medical images, we propose a novel large margin fine-tuning method (LMM-PQS), which generates pseudo query images from support images and fine-tunes the feature extraction modules with a large margin mechanism inspired by methods in face recognition. According to the experiment results, LMM-PQS surpasses the baseline models by a significant margin and demonstrates that our approach is robust and can easily adapt pre-trained models to new domains with few data.	翻訳日:2024-02-07 21:51:50 公開日:2024-02-06
# Fake News, Disinformation, and Deepfakes: 分散型Ledgerテクノロジとブロックチェーンを活用して,ディジタル偽造と偽造現実に対処する Fake News, Disinformation, and Deepfakes: Leveraging Distributed Ledger Technologies and Blockchain to Combat Digital Deception and Counterfeit Reality ( http://arxiv.org/abs/1904.05386v3 ) ライセンス: Link先を確認	Paula Fraga-Lamas, Tiago M. Fernández-Caramés	(参考訳) ユビキタスなディープフェイク、偽情報、偽情報、プロパガンダ、ポストトゥルースは、しばしば偽ニュースと呼ばれ、近代民主主義社会におけるインターネットとソーシャルメディアの役割に対する懸念を提起している。その急速な普及により、デジタル詐欺は個人または社会的コスト(例えば選挙の完全性を妨げるために)だけでなく、経済的損失(例えば株式市場のパフォーマンスに影響を及ぼす)や国家の安全へのリスクにつながる可能性がある。 Blockchainと他のDistributed Ledger Technologies(DLT)は、情報の保存と交換のためのピアツーピア安全なプラットフォームを作成しながら、透過的で不変で検証可能なトランザクションの記録を提供することによって、データの証明、信頼性、トレーサビリティを保証する。この概要は、デジタル詐欺と戦うためのDLTとブロックチェーンの可能性を探り、現在開発中のイニシアチブをレビューし、主要な課題を特定することを目的としている。さらに、将来の研究者が偽ニュースや偽情報、ディープフェイクに直面するために取り組まなければならない問題について、今日のオンラインメディアにおけるサイバー脅威に対するレジリエンス強化の不可欠な部分として、いくつかの推奨が列挙されている。 The rise of ubiquitous deepfakes, misinformation, disinformation, propaganda and post-truth, often referred to as fake news, raises concerns over the role of Internet and social media in modern democratic societies. Due to its rapid and widespread diffusion, digital deception has not only an individual or societal cost (e.g., to hamper the integrity of elections), but it can lead to significant economic losses (e.g., to affect stock market performance) or to risks to national security. Blockchain and other Distributed Ledger Technologies (DLTs) guarantee the provenance, authenticity and traceability of data by providing a transparent, immutable and verifiable record of transactions while creating a peer-to-peer secure platform for storing and exchanging information. This overview aims to explore the potential of DLTs and blockchain to combat digital deception, reviewing initiatives that are currently under development and identifying their main current challenges. 
Moreover, some recommendations are enumerated to guide future researchers on issues that will have to be tackled to face fake news, disinformation and deepfakes, as an integral part of strengthening the resilience against cyber-threats on today's online media.	翻訳日:2024-02-07 21:50:31 公開日:2024-02-06
# 超高速単光子レベルパルスキャラクタリゼーションのための可変電気光学せん断干渉法 Variable electro-optic shearing interferometry for ultrafast single-photon-level pulse characterization ( http://arxiv.org/abs/2207.14049v2 ) ライセンス: Link先を確認	Stanisław Kurzyna, Marcin Jastrzębski, Nicolas Fabre, Wojciech Wasilewski, Michał Lipka, Michał Parniak	(参考訳) 利用可能な多くの方法にもかかわらず、超高速パルスの特性化は、特に単光子レベルでの困難な試みである。本稿では、短時間フーリエ変換の大きさをマッピングするパルス特性化方式を提案する。多くのよく知られた解とは異なり、非線形効果は必要とせず、単光子レベルの測定に適している。本手法は,完全電子的実験制御が可能な電気光学変調器を用いて,一連の制御時間と周波数シフトを導入することに基づく。古典的および単光子レベルのパルスのスペクトル幅と時間幅を特徴付け,スペクトル位相と振幅の再構成に成功した。この方法は位相感度測定を実装することで拡張することができ、自然に部分的に不整合光に適している。 Despite the multitude of available methods, the characterisation of ultrafast pulses remains a challenging endeavour, especially at the single-photon level. We introduce a pulse characterisation scheme that maps the magnitude of its short-time Fourier transform. Contrary to many well-known solutions it does not require nonlinear effects and is therefore suitable for single-photon-level measurements. Our method is based on introducing a series of controlled time and frequency shifts, where the latter is performed via an electro-optic modulator allowing a fully-electronic experimental control. We characterized the full spectral and temporal width of a classical and single-photon-level pulse and successfully reconstructed their spectral phase and amplitude. The method can be extended by implementing a phase-sensitive measurement and is naturally well-suited to partially-incoherent light.	翻訳日:2024-02-07 21:43:15 公開日:2024-02-06
# ポテンシャルと密度空間における縮退の幾何学 Geometry of Degeneracy in Potential and Density Space ( http://arxiv.org/abs/2206.12366v3 ) ライセンス: Link先を確認	Markus Penz, Robert van Leeuwen	(参考訳) 先行研究[j. chem. phys. 155, 244111 (2021)]において、グラフで表される有限格子系における密度汎関数理論からホッヘンバーグ・コーンの定理の反例を発見した。ここで、これは非常に特異で稀な密度でのみ起こることを示し、縮退領域と呼ばれる縮退した基底状態から生じる密度集合が互いに接触したり、密度領域全体の境界に接触することを示す。退化領域は一般に、連続体設定においても代数多様体の凸包の形状であることが示されている。密度領域とそれらの生成するポテンシャルの間に生じる幾何学は分析され、他の形状の中でローマ表面を特徴付ける例で説明される。 In a previous work [J. Chem. Phys. 155, 244111 (2021)], we found counterexamples to the fundamental Hohenberg-Kohn theorem from density-functional theory in finite-lattice systems represented by graphs. Here, we demonstrate that this only occurs at very peculiar and rare densities, those where density sets arising from degenerate ground states, called degeneracy regions, touch each other or the boundary of the whole density domain. Degeneracy regions are shown to generally be in the shape of the convex hull of an algebraic variety, even in the continuum setting. The geometry arising between density regions and the potentials that create them is analyzed and explained with examples that, among other shapes, feature the Roman surface.	翻訳日:2024-02-07 21:42:30 公開日:2024-02-06
# 分布比較のためのヒルベルト曲線投影距離 Hilbert Curve Projection Distance for Distribution Comparison ( http://arxiv.org/abs/2205.15059v4 ) ライセンス: Link先を確認	Tao Li, Cheng Meng, Hongteng Xu, Jun Yu	(参考訳) 分散比較は、データ分類や生成モデリングといった多くの機械学習タスクにおいて中心的な役割を果たす。本研究では,Hilbert curve projection (HCP) distanceと呼ばれる新しい計量法を提案し,低複雑性の2つの確率分布間の距離を測定する。特に、まずヒルベルト曲線を用いた2つの高次元確率分布を投影し、それらのカップリングを求め、カップリングに従って元の空間におけるこれらの2つの分布間の移動距離を計算する。我々は, hcp距離が適切な計量であり, 有界台を持つ確率測度に対して well-defined であることを示す。さらに、$d$次元空間における$L_p$コストによる修正された経験的 HCP 距離が、$O(n^{-1/2\max\{d,p\}})$未満の速度でその集団に収束することを示した。次元の呪いを抑制するため、(学習可能な)部分空間射影を用いたhcp距離の2つの変種も開発する。合成データと実世界のデータの両方で実験したところ、我々のHCP距離はワッサーシュタイン距離の効果的なサロゲートとして機能し、スライスされたワッサーシュタイン距離の欠点を克服している。 Distribution comparison plays a central role in many machine learning tasks like data classification and generative modeling. In this study, we propose a novel metric, called Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions with low complexity. In particular, we first project two high-dimensional probability distributions using Hilbert curve to obtain a coupling between them, and then calculate the transport distance between these two distributions in the original space, according to the coupling. We show that HCP distance is a proper metric and is well-defined for probability measures with bounded supports. Furthermore, we demonstrate that the modified empirical HCP distance with the $L_p$ cost in the $d$-dimensional space converges to its population counterpart at a rate of no more than $O(n^{-1/2\max\{d,p\}})$. To suppress the curse-of-dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that our HCP distance works as an effective surrogate of the Wasserstein distance with low complexity and overcomes the drawbacks of the sliced Wasserstein distance.	翻訳日:2024-02-07 21:42:15 公開日:2024-02-06
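The two-step recipe in the abstract above (order the samples along a Hilbert curve to obtain a coupling, then measure the transport cost in the original space) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it is restricted to two dimensions, assumes both samples have equal size and lie in the unit square, and the grid resolution `order` and the function names are hypothetical. The curve index uses the standard bit-manipulation construction for a 2-D Hilbert curve.

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along the Hilbert curve on an n-by-n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hcp_distance(xs, ys, p=2, order=8):
    """Sketch of an empirical Hilbert-curve-projection distance in 2-D.

    xs and ys are equal-size lists of (x, y) points in the unit square.
    Points are sorted by Hilbert index, paired rank by rank, and the
    L_p transport cost of that pairing is returned.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    n = 1 << order

    def key(pt):
        cx = min(int(pt[0] * n), n - 1)
        cy = min(int(pt[1] * n), n - 1)
        return hilbert_index(n, cx, cy)

    xs_sorted = sorted(xs, key=key)
    ys_sorted = sorted(ys, key=key)
    cost = sum(
        abs(a[0] - b[0]) ** p + abs(a[1] - b[1]) ** p
        for a, b in zip(xs_sorted, ys_sorted)
    )
    return (cost / len(xs)) ** (1.0 / p)
```

Sorting dominates the cost, so the sketch runs in O(n log n) time for n points per sample, which is consistent with the low-complexity claim in the abstract.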
# EfficientViT:高分解能Dense予測のためのマルチスケールリニアアテンション EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction ( http://arxiv.org/abs/2205.14756v6 ) ライセンス: Link先を確認	Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han	(参考訳) 高分解能高密度予測は、計算写真や自動運転など、多くの現実世界の応用を可能にする。しかし、計算コストが大きいため、最先端の高解像度の予測モデルをハードウェアデバイスに展開することは困難である。この研究は、新しいマルチスケール線形注意を持つ高解像度ビジョンモデルのファミリーであるEfficientViTを提示する。従来のソフトマックス, ハードウェア非効率大カーネル畳み込み, 複雑なトポロジ構造に依存した高分解能高密度予測モデルとは異なり, マルチスケール線形注意は, 軽量かつハードウェア効率の高い操作のみで, グローバル受容場とマルチスケール学習(高分解能高密度予測の2つの望ましい特徴)を実現する。そのため、EfficientViTは、モバイルCPU、エッジGPU、クラウドGPUなど、さまざまなハードウェアプラットフォーム上での大幅なスピードアップによって、これまでの最先端モデルよりも、顕著なパフォーマンス向上を実現している。 Cityscapesのパフォーマンスを損なうことなく、EfficientViTは最大13.9$\times$と6.2$\times$GPUレイテンシをSegFormerとSegNeXtで削減します。超高解像度では、EfficientViTはRestormer上で最大6.4倍のスピードアップを実現し、PSNRでは0.11dBのゲインを提供する。 Segment Anythingでは、EfficientViTはA100 GPU上で48.9倍高いスループットを提供すると同時に、COCO上でのゼロショットインスタンスセグメンテーションのパフォーマンスをわずかに向上させる。 High-resolution dense prediction enables many appealing real-world applications, such as computational photography, autonomous driving, etc. However, the vast computational cost makes deploying state-of-the-art high-resolution dense prediction models on hardware devices difficult. This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention. Unlike prior high-resolution dense prediction models that rely on heavy softmax attention, hardware-inefficient large-kernel convolution, or complicated topology structure to obtain good performances, our multi-scale linear attention achieves the global receptive field and multi-scale learning (two desirable features for high-resolution dense prediction) with only lightweight and hardware-efficient operations. As such, EfficientViT delivers remarkable performance gains over previous state-of-the-art models with significant speedup on diverse hardware platforms, including mobile CPU, edge GPU, and cloud GPU. 
Without performance loss on Cityscapes, our EfficientViT provides up to 13.9$\times$ and 6.2$\times$ GPU latency reduction over SegFormer and SegNeXt, respectively. For super-resolution, EfficientViT delivers up to 6.4$\times$ speedup over Restormer while providing 0.11 dB gain in PSNR. For Segment Anything, EfficientViT delivers 48.9$\times$ higher throughput on an A100 GPU while achieving slightly better zero-shot instance segmentation performance on COCO.	翻訳日:2024-02-07 21:41:54 公開日:2024-02-06
# ガウス混合系のエントロピー近似の理論的誤差解析 Theoretical Error Analysis of Entropy Approximation for Gaussian Mixture ( http://arxiv.org/abs/2202.13059v4 ) ライセンス: Link先を確認	Takashi Furuya, Hiroyuki Kusumoto, Koichi Taniguchi, Naoya Kanno, Kazuma Suetake	(参考訳) ガウス混合分布は一般に一般確率分布を表すために用いられる。不確実性推定にガウス混合を用いる重要性はあるが、ガウス混合のエントロピーは解析的に計算することはできない。特に、Gal と Ghahramani [2016] は、非モダルガウス分布のエントロピーの和である近似エントロピーを提案した。この近似は次元に関係なく解析的に計算し易いが、理論的な保証はない。本稿では, 真のエントロピーと近似エントロピーの近似誤差を理論的に解析し, この近似が効果的に働くときに明らかにする。この誤差は、ガウス混合物の各ガウス成分がどれだけ離れているかによって制御される。このような分離を測定するために、ガウス混合のそれぞれのガウス成分の分散の和に対する平均間の距離の比率を導入し、その比率が無限になるにつれて誤差がゼロに収束することを示す。この収束状況は高次元空間においてより起こりやすい。したがって,この近似が高次元問題,特に重みを多用するニューラルネットワークのようなシナリオにおいて有効であることを保証できる。 Gaussian mixture distributions are commonly employed to represent general probability distributions. Despite the importance of using Gaussian mixtures for uncertainty estimation, the entropy of a Gaussian mixture cannot be analytically calculated. Notably, Gal and Ghahramani [2016] proposed the approximate entropy that is the sum of the entropies of unimodal Gaussian distributions. This approximation is easy to analytically calculate regardless of dimension, but it lacks theoretical guarantees. In this paper, we theoretically analyze the approximation error between the true entropy and the approximate one to reveal when this approximation works effectively. This error is controlled by how far apart the Gaussian components of the mixture are. To measure such separation, we introduce the ratios of the distances between the means to the sum of the variances of each Gaussian component of the Gaussian mixture, and we reveal that the error converges to zero as the ratios tend to infinity. This convergence situation is more likely to occur in higher dimensional spaces. Therefore, our results provide a guarantee that this approximation works well in higher dimension problems, particularly in scenarios such as neural networks that involve a large number of weights.	
翻訳日:2024-02-07 21:41:26 公開日:2024-02-06
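The separation argument in the abstract above can be checked numerically in one dimension. The sketch below is only an illustration under stated assumptions: it takes the approximate entropy to be the weighted sum of component entropies plus the entropy of the mixing weights (the abstract does not spell out the exact form, so this choice is an assumption), and compares it with a grid-integrated entropy. All function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gaussian_entropy(std):
    """Closed-form differential entropy of a 1-D Gaussian."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std * std)

def approx_entropy(weights, stds):
    """Assumed approximation: sum_i w_i * (H[N_i] - log w_i)."""
    return sum(w * (gaussian_entropy(s) - math.log(w))
               for w, s in zip(weights, stds))

def numeric_entropy(weights, means, stds, n=20001):
    """Entropy of the mixture by trapezoidal integration on a grid."""
    lo = min(m - 10.0 * s for m, s in zip(means, stds))
    hi = max(m + 10.0 * s for m, s in zip(means, stds))
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        p = sum(w * gaussian_pdf(x, m, s)
                for w, m, s in zip(weights, means, stds))
        f = -p * math.log(p) if p > 0.0 else 0.0
        total += (0.5 if i in (0, n - 1) else 1.0) * f
    return total * h

weights, stds = [0.5, 0.5], [1.0, 1.0]
# Well-separated means: the gap between the two values is negligible.
gap_far = approx_entropy(weights, stds) - numeric_entropy(weights, [0.0, 20.0], stds)
# Identical means: the approximation overshoots by log 2.
gap_same = approx_entropy(weights, stds) - numeric_entropy(weights, [0.0, 0.0], stds)
```

With these inputs the gap shrinks towards zero as the means move apart, which is the qualitative behaviour the abstract describes. The high-dimensional claim is not exercised by this 1-D sketch.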
# 群フェアネス下におけるベイズ最適分類器 Bayes-Optimal Classifiers under Group Fairness ( http://arxiv.org/abs/2202.09724v5 ) ライセンス: Link先を確認	Xianli Zeng and Edgar Dobriban and Guang Cheng	(参考訳) 機械学習のアルゴリズムは、社会福祉問題など、より高度な意思決定プロセスに統合されつつある。アルゴリズム予測から潜在的に異なる影響を緩和する必要があるため、公正な機械学習の分野において多くのアプローチが提案されている。しかし,様々な群フェアネス制約の下でベイズ最適分類器を特徴付ける基本的な問題は,いくつかの特別なケースでのみ研究されている。最適仮説テストのための古典的ネマン・ピアソンの議論(Neyman and Pearson, 1933; Shao, 2003)に基づいて、本論文は群フェアネスの下でベイズ最適分類器を導出するための統一的な枠組みを提供する。これにより、FairBayesと呼ばれるグループベースのしきい値設定手法を提案し、この手法は相違を直接制御し、本質的に最適なフェアネス精度トレードオフを実現する。これらの利点は徹底的な実験によって支えられている。 Machine learning algorithms are becoming integrated into more and more high-stakes decision-making processes, such as in social welfare issues. Due to the need of mitigating the potentially disparate impacts from algorithmic predictions, many approaches have been proposed in the emerging area of fair machine learning. However, the fundamental problem of characterizing Bayes-optimal classifiers under various group fairness constraints has only been investigated in some special cases. Based on the classical Neyman-Pearson argument (Neyman and Pearson, 1933; Shao, 2003) for optimal hypothesis testing, this paper provides a unified framework for deriving Bayes-optimal classifiers under group fairness. This enables us to propose a group-based thresholding method we call FairBayes, that can directly control disparity, and achieve an essentially optimal fairness-accuracy tradeoff. These advantages are supported by thorough experiments.	翻訳日:2024-02-07 21:40:49 公開日:2024-02-06
# 文字統計を用いた種子単語の選択 Selecting Seed Words for Wordle using Character Statistics ( http://arxiv.org/abs/2202.03457v3 ) ライセンス: Link先を確認	Nisansa de Silva	(参考訳) 単語推測ゲーム「wordle」は2022年1月に世界的な人気を博した。ゲームの目的は6回以内に5文字の英語単語を推測することである。各トライは、あるキャラクタがソリューションの一部であるかどうかを知らせる色を変えるタイルによってプレイヤーにヒントを与え、それがソリューションの一部である場合、それが正しい配置にあるかどうかを判断する。毎日の単語を解決するための最善の出発語と最善の戦略を見つけるために、多くの試みがなされている。本研究は,5文字単語の文字統計を用いて,最良3単語を決定する。 Wordle, a word guessing game rose to global popularity in the January of 2022. The goal of the game is to guess a five-letter English word within six tries. Each try provides the player with hints by means of colour changing tiles which inform whether or not a given character is part of the solution as well as, in cases where it is part of the solution, whether or not it is in the correct placement. Numerous attempts have been made to find the best starting word and best strategy to solve the daily wordle. This study uses character statistics of five-letter words to determine the best three starting words.	翻訳日:2024-02-07 21:40:31 公開日:2024-02-06
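A minimal sketch of ranking starting words by character statistics, as described in the abstract above. The scoring rule (sum, over the distinct letters of a word, of the number of dictionary words containing that letter) and the tiny word list are assumptions made for illustration; the paper's actual statistics and dictionary may differ.

```python
from collections import Counter

def letter_document_frequency(words):
    """Count, for each letter, how many words contain it at least once."""
    counts = Counter()
    for word in words:
        counts.update(set(word))
    return counts

def score(word, counts):
    """Score a word by the frequencies of its distinct letters."""
    return sum(counts[ch] for ch in set(word))

def rank_seed_words(words):
    """Return words sorted from best to worst; ties break alphabetically."""
    counts = letter_document_frequency(words)
    return sorted(words, key=lambda w: (-score(w, counts), w))

# Hypothetical five-letter word list, far smaller than a real dictionary.
word_list = ["arise", "stone", "crane", "fuzzy", "mamma"]
ranking = rank_seed_words(word_list)
```

Counting each letter once per word penalises words with repeated letters such as "mamma", which reveal less about the solution per guess.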
# spooky pebbleゲームにおける厳密な境界: 測定値によるクビットのリサイクル Tight Bounds on the Spooky Pebble Game: Recycling Qubits with Measurements ( http://arxiv.org/abs/2110.08973v2 ) ライセンス: Link先を確認	Niels Kornerup, Jonathan Sadun, David Soloveichik	(参考訳) Pebbleゲームは、時間空間のトレードオフを分析する一般的なモデルである。特に可逆小石ゲームは、グロバー探索のような量子アルゴリズムで重ね合わせの入力の古典計算を効率的にシミュレートするためによく用いられる。しかし、可逆小石ゲームは、可逆中間測度によって与えられる余分な計算力を利用することはできない。測定と適応位相補正をモデル化したスポーキーな小石ゲームは、可逆的なアプローチが達成できる範囲を超えて、キュービットの数を減らす。スパーキー小石ゲームはシミュレーションの総空間(ビット+キュービット)の複雑さを減少させるわけではないが、キュービットに格納しなければならない空間の量を減少させる。あらゆるpebbleバウンドのライン上に、spooky pebbleゲームに対する漸近的に厳しいトレードオフがあることを証明し、sooky pebbleゲームと任意の古典的なシーケンシャルな計算をシミュレートするための、厳密な時間的トレードオフを与えました。例えば、すべての$\epsilon \in (0,1]$に対して、時間$T$とスペース$S$を必要とする古典的な計算は、量子コンピュータ上で$O(T/ \epsilon)$ gatesと$O(T^{\epsilon}S^{1-\epsilon})$ qubitsで実装できる。これにより、その数で可逆小石ゲームに最もよく知られた境界が改善され、これは$O(2^{1/\epsilon} T)$ gates を使用する。さらに,より一般的な有向非循環グラフ(dag)上では,細粒度データ依存性をキャプチャし,このゲームが木上で可逆的なpebbleゲームに勝ることを示す。さらに、任意のDAGは、不可逆小石ゲームで必要とされる以上の1つ以上の小石で小石化することができ、最大2度のDAG上でスポッキー小石ゲームをプレイするのに必要となる最小の小石を見つけることは、PSPACEハードであることを意味する。 Pebble games are popular models for analyzing time-space trade-offs. In particular, the reversible pebble game is often applied in quantum algorithms like Grover's search to efficiently simulate classical computation on inputs in superposition. However, the reversible pebble game cannot harness the additional computational power granted by irreversible intermediate measurements. The spooky pebble game, which models interleaved measurements and adaptive phase corrections, reduces the number of qubits beyond what reversible approaches can achieve. While the spooky pebble game does not reduce the total space (bits plus qubits) complexity of the simulation, it reduces the amount of space that must be stored in qubits. 
We prove asymptotically tight trade-offs for the spooky pebble game on a line with any pebble bound, giving a tight time-qubit trade-off for simulating arbitrary classical sequential computation with the spooky pebble game. For example, for all $\epsilon \in (0,1]$, any classical computation requiring time $T$ and space $S$ can be implemented on a quantum computer using only $O(T/\epsilon)$ gates and $O(T^{\epsilon}S^{1-\epsilon})$ qubits. This improves on the best known bound for the reversible pebble game with that number of qubits, which uses $O(2^{1/\epsilon} T)$ gates. We also consider the spooky pebble game on more general directed acyclic graphs (DAGs), capturing fine-grained data dependency in computation, and show that this game can outperform the reversible pebble game on trees. Additionally, any DAG can be pebbled with at most one more pebble than is needed in the irreversible pebble game, implying that finding the minimum number of pebbles necessary to play the spooky pebble game on a DAG with maximum in-degree two is PSPACE-hard to approximate.	翻訳日:2024-02-07 21:39:55 公開日:2024-02-06
# メタラーニング3次元形状分割関数 Meta-Learning 3D Shape Segmentation Functions ( http://arxiv.org/abs/2110.03854v2 ) ライセンス: Link先を確認	Yu Hao, Hao Huang, Shuaihang Yuan, Yi Fang	(参考訳) ディープニューラルネットワークを用いたロバストな3d形状セグメンテーション関数の学習は、強力なパラダイムとして登場し、各3d形状の一貫した部分セグメンテーションを生成する有望なパフォーマンスを提供する。 3次元形状分割関数を一般化するには、各関数空間上の事前のロバストな学習が必要であり、重要な3次元構造変化が存在する場合、形状の一貫した部分分割を可能にする。既存の一般化法は、大規模ラベル付きデータセット上の3次元形状セグメンテーション関数の広範なトレーニングに依存している。本稿では,3次元形状分割関数空間の学習をメタラーニング問題として定式化することを提案し,学習データのない新しい形状に素早く適応可能な3次元分割モデルを予測することを目的とした。より具体的には、各タスクを3d空間の入力点として部品ラベルを予測する形状条件付き3dセグメンテーション関数の教師なし学習と定義する。 3Dセグメンテーション機能は、パートラベルを必要とせずに自己監督型3D形状復元損失によって訓練される。また,3次元形状を入力とし,各3次元セグメンテーション関数空間上での事前予測を行うメタリーナーとして,補助深層ニューラルネットワークを導入する。実験では,メタ3DSegと呼ばれるメタ学習手法が,従来の3次元形状分割関数のためのディープニューラルネットワークの設計よりも,教師なしの3次元形状分割を改善することを示す。 Learning robust 3D shape segmentation functions with deep neural networks has emerged as a powerful paradigm, offering promising performance in producing a consistent part segmentation of each 3D shape. Generalizing across 3D shape segmentation functions requires robust learning of priors over the respective function space and enables consistent part segmentation of shapes in presence of significant 3D structure variations. Existing generalization methods rely on extensive training of 3D shape segmentation functions on large-scale labeled datasets. In this paper, we proposed to formalize the learning of a 3D shape segmentation function space as a meta-learning problem, aiming to predict a 3D segmentation model that can be quickly adapted to new shapes with no or limited training data. More specifically, we define each task as unsupervised learning of shape-conditioned 3D segmentation function which takes as input points in 3D space and predicts the part-segment labels. The 3D segmentation function is trained by a self-supervised 3D shape reconstruction loss without the need for part labels. 
Also, we introduce an auxiliary deep neural network as a meta-learner which takes as input a 3D shape and predicts the prior over the respective 3D segmentation function space. We show in experiments that our meta-learning approach, denoted as Meta-3DSeg, leads to improvements on unsupervised 3D shape segmentation over the conventional designs of deep neural networks for 3D shape segmentation functions.	翻訳日:2024-02-07 21:39:15 公開日:2024-02-06
# 非凸損失関数上のワンショットフェデレーション学習における順序最適境界 Order Optimal Bounds for One-Shot Federated Learning over non-Convex Loss Functions ( http://arxiv.org/abs/2108.08677v3 ) ライセンス: Link先を確認	Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani	(参考訳) 非凸損失関数上の未知分布から$m$のサンプル関数を観測し,それぞれに$m$のマシンが存在する一ショット環境でのフェデレーション学習の問題点を考察する。 F:[-1,1]^d\to\mathbb{R}$ をこの未知分布に対する期待損失関数とする。目標は、最小値が$f$の見積もりを見つけることである。その観察に基づいて、各マシンは有界長$b$の信号を生成し、それをサーバに送る。サーバは全マシンの信号を収集し、最小値である$f$の見積もりを出力する。任意のアルゴリズムの損失は、$\max\big(1/(\sqrt{n}(mB)^{1/d}), 1/\sqrt{mn}\big)$ で、対数係数まで下界であることが示される。次に、この下限が分散学習アルゴリズムであるマルチレゾリューション推定器(multi- resolution estimator for non-convex loss function, mre-nc)を提示することにより、m$とn$の順に最適であることを証明する。 We consider the problem of federated learning in a one-shot setting in which there are $m$ machines, each observing $n$ sample functions from an unknown distribution on non-convex loss functions. Let $F:[-1,1]^d\to\mathbb{R}$ be the expected loss function with respect to this unknown distribution. The goal is to find an estimate of the minimizer of $F$. Based on its observations, each machine generates a signal of bounded length $B$ and sends it to a server. The server collects signals of all machines and outputs an estimate of the minimizer of $F$. We show that the expected loss of any algorithm is lower bounded by $\max\big(1/(\sqrt{n}(mB)^{1/d}), 1/\sqrt{mn}\big)$, up to a logarithmic factor. We then prove that this lower bound is order optimal in $m$ and $n$ by presenting a distributed learning algorithm, called Multi-Resolution Estimator for Non-Convex loss function (MRE-NC), whose expected loss matches the lower bound for large $mn$ up to polylogarithmic factors.	翻訳日:2024-02-07 21:38:49 公開日:2024-02-06
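The lower bound quoted in the abstract above, $\max\big(1/(\sqrt{n}(mB)^{1/d}), 1/\sqrt{mn}\big)$, can be evaluated directly to see which term dominates. The sketch below ignores constants and logarithmic factors; the names `communication_term` and `statistical_term` are labels chosen here for readability, not terminology from the paper.

```python
import math

def one_shot_lower_bound(m, n, B, d):
    """Evaluate max(1/(sqrt(n) (mB)^(1/d)), 1/sqrt(mn)) without constants.

    m: number of machines, n: samples per machine,
    B: signal length bound, d: dimension of the domain [-1, 1]^d.
    """
    communication_term = 1.0 / (math.sqrt(n) * (m * B) ** (1.0 / d))
    statistical_term = 1.0 / math.sqrt(m * n)
    return max(communication_term, statistical_term)

low_dim = one_shot_lower_bound(m=100, n=100, B=1, d=1)
high_dim = one_shot_lower_bound(m=100, n=100, B=1, d=10)
```

For these illustrative values, the second term sets the bound when d = 1, while the first term takes over when d = 10, because $(mB)^{1/d}$ grows slowly in high dimension.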
# ソフトウェアに基づく対話システム:調査,分類,課題 Software-Based Dialogue Systems: Survey, Taxonomy and Challenges ( http://arxiv.org/abs/2106.10901v2 ) ライセンス: Link先を確認	Quim Motger, Xavier Franch and Jordi Marco	(参考訳) 人-コンピュータ相互作用の分野における自然言語インタフェースの利用は、専門の科学・産業研究を通じて激しい研究が進められている。この分野での最新のコントリビューションは、リカレントニューラルネットワークやコンテキスト認識戦略の可能性、ユーザ中心の設計アプローチといったディープラーニングアプローチを含む、コミュニティの関心を、会話エージェントやチャットボットとして知られるソフトウェアベースの対話システムへと引き戻すものだ。それにもかかわらず、この分野の新規性を考えると、関連するすべての研究の観点をカバーする会話エージェントの研究の現状に関する、一般的な文脈に依存しない概要が欠落している。本稿では,この文脈に動機づけられ,二次研究の体系的文献レビューを通して,対話型エージェント研究の現状について概説する。本研究は,最近の文献から得られた知識を,様々な領域,研究の焦点,文脈において明確に提示することで,徹底的な視点を育むように設計されている。そこで本研究では,対話エージェントの分野における異なる次元の包括的分類法を提案し,研究者を支援するとともに,自然言語インタフェースの分野における今後の研究の基盤となることを期待する。 The use of natural language interfaces in the field of human-computer interaction is undergoing intense study through dedicated scientific and industrial research. The latest contributions in the field, including deep learning approaches like recurrent neural networks, the potential of context-aware strategies and user-centred design approaches, have brought back the attention of the community to software-based dialogue systems, generally known as conversational agents or chatbots. Nonetheless, and given the novelty of the field, a generic, context-independent overview on the current state of research of conversational agents covering all research perspectives involved is missing. Motivated by this context, this paper reports a survey of the current state of research of conversational agents through a systematic literature review of secondary studies. The conducted research is designed to develop an exhaustive perspective through a clear presentation of the aggregated knowledge published by recent literature within a variety of domains, research focuses and contexts. 
As a result, this research proposes a holistic taxonomy of the different dimensions involved in the conversational agents' field, which is expected to help researchers and to lay the groundwork for future research in the field of natural language interfaces.	翻訳日:2024-02-07 21:38:25 公開日:2024-02-06
# 潜時空間探索と因果推論による未知通信システムへのアプローチ Approaching an unknown communication system by latent space exploration and causal inference ( http://arxiv.org/abs/2303.10931v2 ) ライセンス: Link先を確認	Ga\v{s}per Begu\v{s} and Andrej Leban, Shane Gero	(参考訳) 本稿では,教師なし深層生成モデルの潜在空間を探索し,データ中の有意義な性質を発見する手法を提案する。個々の潜在変数を極値に操作し,因果推論に触発された手法をcdev(causal disentanglement with extreme values)と呼ぶアプローチに組み合わせることで,モデル解釈可能性に対する洞察が得られることを示す。これにより、モデルが有意義にエンコードする未知のデータの性質を検証し、最も興味深く調査された動物コミュニケーションシステムの一つであるクジラクジラ(Physeter macrocephalus)のコミュニケーションシステムについての洞察を深めることが出来る。ネットワークアーキテクチャは、音声の有意義な表現を学習するために用いられており、ここでは、基礎的真実を持たない場合の他の音声通信システムの特性を解読する学習メカニズムとして用いられる。提案手法は, コウクジラが, 一連のクリック数, タイミングの規則性, スペクトル平均, 音の規則性などの音響特性を用いて, 情報をエンコードしていることを示唆している。これらの発見の一部は既存の仮説と一致しているが、他の発見は初めて提案されている。また,学習中に提示されない革新的なデータを生成しながら,通信システム内のユニット構造を統制し,それらを適用するためのルールを明らかにする。本稿では,因果推論手法を用いた深層ニューラルネットワークのアウトプットの解釈は,未知なデータに近づくための有効な戦略であり,深層学習が仮説空間を制限できる別の事例を示す。最後に、提案されたアプローチは他のアーキテクチャやデータセットにも拡張できる。 This paper proposes a methodology for discovering meaningful properties in data by exploring the latent space of unsupervised deep generative models. We combine manipulation of individual latent variables to extreme values with methods inspired by causal inference into an approach we call causal disentanglement with extreme values (CDEV) and show that this method yields insights for model interpretability. With this, we can test for what properties of unknown data the model encodes as meaningful, using it to glean insight into the communication system of sperm whales (Physeter macrocephalus), one of the most intriguing and understudied animal communication systems. The network architecture used has been shown to learn meaningful representations of speech; here, it is used as a learning mechanism to decipher the properties of another vocal communication system in which case we have no ground truth. 
The proposed methodology suggests that sperm whales encode information using the number of clicks in a sequence, the regularity of their timing, and audio properties such as the spectral mean and the acoustic regularity of the sequences. Some of these findings are consistent with existing hypotheses, while others are proposed for the first time. We also argue that our models uncover rules that govern the structure of units in the communication system and apply them while generating innovative data not shown during training. This paper suggests that an interpretation of the outputs of deep neural networks with causal inference methodology can be a viable strategy for approaching data about which little is known and presents another case of how deep learning can limit the hypothesis space. Finally, the proposed approach can be extended to other architectures and datasets.	翻訳日:2024-02-07 21:30:49 公開日:2024-02-06
# locposenet:未発見のオブジェクトポーズ推定に先立つロバストな位置 LocPoseNet: Robust Location Prior for Unseen Object Pose Estimation ( http://arxiv.org/abs/2211.16290v3 ) ライセンス: Link先を確認	Chen Zhao, Yinlin Hu, Mathieu Salzmann	(参考訳) 標準の6dオブジェクトポーズ推定設定では、オブジェクトの位置優先が重要となる。前者は、3Dオブジェクトの変換を初期化し、3Dオブジェクトの回転推定を容易にするために使用できる。残念ながら、この目的のために使用される物体検出器は、見えない物体に一般化しない。したがって、未確認物体の既存の6次元ポーズ推定法は、地中真正物体の位置が未知であると仮定するか、不正確な結果が得られる。本稿では,未確認オブジェクトに先立って位置を頑健に学習できるLocPoseNetという手法を開発し,この問題に対処する。提案手法は,テンプレートマッチング戦略に基づいて,参照カーネルを分散し,マルチスケール相関を効率的に計算するためのクエリでそれらを畳み込む手法を提案する。次に,異なる対象位置パラメータを予測するために,スケール認識機能とスケールロバスト機能を分離する新しい翻訳推定器を導入する。提案手法は,LINEMOD と GenMOP において,既存の作業よりも優れた性能を示す。さらに,難易度の高い合成データセットを構築し,様々なノイズ源に対する手法のロバスト性を強調した。プロジェクトのWebサイトは以下の通り。 Object location prior is critical for the standard 6D object pose estimation setting. The prior can be used to initialize the 3D object translation and facilitate 3D object rotation estimation. Unfortunately, the object detectors that are used for this purpose do not generalize to unseen objects. Therefore, existing 6D pose estimation methods for unseen objects either assume the ground-truth object location to be known or yield inaccurate results when it is unavailable. In this paper, we address this problem by developing a method, LocPoseNet, able to robustly learn location prior for unseen objects. Our method builds upon a template matching strategy, where we propose to distribute the reference kernels and convolve them with a query to efficiently compute multi-scale correlations. We then introduce a novel translation estimator, which decouples scale-aware and scale-robust features to predict different object location parameters. Our method outperforms existing works by a large margin on LINEMOD and GenMOP. We further construct a challenging synthetic dataset, which allows us to highlight the better robustness of our method to various noise sources. Our project website is at: https://sailor-z.github.io/projects/3DV2024_LocPoseNet.html.	
翻訳日:2024-02-07 21:30:20 公開日:2024-02-06
# RaLiBEV:アンカーボックス自由物体検出システムのためのレーダとLiDARのBEV融合学習 RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection Systems ( http://arxiv.org/abs/2211.06108v5 ) ライセンス: Link先を確認	Yanlong Yang, Jianan Liu, Tao Huang, Qing-Long Han, Gang Ma and Bing Zhu	(参考訳) 自動運転では、LiDARとレーダーは環境認識に不可欠である。 LiDARは正確な3D空間センシング情報を提供するが、霧のような悪天候に苦しむ。逆に、レーダー信号は、特定の波長によって雨や霧を貫通するが、ノイズの乱れを起こしやすい。最近の最先端の研究は、レーダーとLiDARの融合が悪天候の堅牢な検出につながることを明らかにしている。既存の研究では、畳み込みニューラルネットワークアーキテクチャを採用して、各センサデータから特徴を抽出し、2つの分岐特徴を調整して集約し、オブジェクト検出結果を予測する。しかし,これらの手法はラベル割り当てと融合戦略の単純な設計のため,予測境界ボックスの精度が低い。本稿では,レーダーレンジ方位熱マップとLiDAR点雲から得られた特徴を融合させて,可能な物体を推定する,鳥眼視融合学習に基づくアンカーボックスフリー物体検出システムを提案する。異なるラベル割り当て戦略は、前景や背景アンカーポイントの分類と対応する境界ボックスの回帰との整合性を促進するように設計されている。さらに,新しい対話型トランスモジュールを用いることで,オブジェクト検出器の性能をさらに向上する。本稿では,最近発表されたOxford Radar RobotCarデータセットを用いて,提案手法の優れた性能を示す。本システムの平均精度は, 「クラー」と「フォギー」の訓練条件下で, 0.8 の IoU (IoU) 区間において, 13.1% と 19.0% に向上した。 In autonomous driving, LiDAR and radar are crucial for environmental perception. LiDAR offers precise 3D spatial sensing information but struggles in adverse weather like fog. Conversely, radar signals can penetrate rain or mist due to their specific wavelength but are prone to noise disturbances. Recent state-of-the-art works reveal that the fusion of radar and LiDAR can lead to robust detection in adverse weather. The existing works adopt convolutional neural network architecture to extract features from each sensor data, then align and aggregate the two branch features to predict object detection results. However, these methods have low accuracy of predicted bounding boxes due to a simple design of label assignment and fusion strategies. In this paper, we propose a bird's-eye view fusion learning-based anchor box-free object detection system, which fuses the feature derived from the radar range-azimuth heatmap and the LiDAR point cloud to estimate possible objects. 
Different label assignment strategies have been designed to facilitate the consistency between the classification of foreground or background anchor points and the corresponding bounding box regressions. Furthermore, the performance of the proposed object detector is enhanced by employing a novel interactive transformer module. The superior performance of the methods proposed in this paper has been demonstrated using the recently published Oxford Radar RobotCar dataset. Our system's average precision significantly outperforms the state-of-the-art method by 13.1% and 19.0% at an Intersection over Union (IoU) of 0.8 under 'Clear+Foggy' training conditions for 'Clear' and 'Foggy' testing, respectively.	翻訳日:2024-02-07 21:29:56 公開日:2024-02-06
# pyRDDLGym:RDDLからGym環境へ pyRDDLGym: From RDDL to Gym Environments ( http://arxiv.org/abs/2211.05939v5 ) ライセンス: Link先を確認	Ayal Taitler, Michael Gimelfarb, Jihwan Jeong, Sriram Gopalakrishnan, Martin Mladenov, Xiaotian Liu, Scott Sanner	(参考訳) 提案するpyRDDLGymは, RDDL宣言記述からOpenAI Gym環境の自動生成のためのPythonフレームワークである。 RDDLにおける変数の離散時間ステップ進化は、Gymのstepスキームに自然に適合する条件付き確率関数によって記述される。さらに、RDDLは持ち上げられた記述であるため、複数のエンティティと異なる構成をサポートする環境の修正とスケールアップは、誤りを起こしやすい面倒なプロセスではなく、簡単になる。我々は,pyRDDLGymがRDDLの独特な表現力により,ベンチマークの容易かつ迅速な開発を可能にすることで,強化学習コミュニティの新たな風として機能することを期待する。 RDDL記述におけるモデルへの明示的なアクセスを提供することで、pyRDDLGymはモデルの知識を活用しながら相互作用から学ぶためのハイブリッドアプローチの研究を促進できる。本稿では、pyRDDLGymの設計と組込み例と、フレームワークに組み込まれたRDDL言語への追加について述べる。 We present pyRDDLGym, a Python framework for auto-generation of OpenAI Gym environments from an RDDL declarative description. The discrete time step evolution of variables in RDDL is described by conditional probability functions, which fits naturally into the Gym step scheme. Furthermore, since RDDL is a lifted description, the modification and scaling up of environments to support multiple entities and different configurations becomes trivial rather than a tedious process prone to errors. We hope that pyRDDLGym will serve as a new wind in the reinforcement learning community by enabling easy and rapid development of benchmarks due to the unique expressive power of RDDL. By providing explicit access to the model in the RDDL description, pyRDDLGym can also facilitate research on hybrid approaches for learning from interaction while leveraging model knowledge. We present the design and built-in examples of pyRDDLGym, and the additions made to the RDDL language that were incorporated into the framework.	翻訳日:2024-02-07 21:29:26 公開日:2024-02-06
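To illustrate why a transition described by conditional probability functions maps onto a Gym-style `step` call, here is a hand-written toy environment. It is a hypothetical sketch: `ToyReservoirEnv` is not part of pyRDDLGym, it does not use the pyRDDLGym or Gym APIs, and it is not generated from any RDDL file. It only mirrors the `reset`/`step` shape of such environments.

```python
import random

class ToyReservoirEnv:
    """Hypothetical Gym-style environment (not the pyRDDLGym API)."""

    def __init__(self, capacity=10, rain_prob=0.3, seed=0):
        self.capacity = capacity
        self.rain_prob = rain_prob
        self.rng = random.Random(seed)  # private RNG for reproducibility
        self.level = 0

    def reset(self):
        """Start each episode with a half-full reservoir."""
        self.level = self.capacity // 2
        return self.level

    def step(self, release):
        """Advance one discrete time step.

        The next state is sampled from a conditional distribution:
        rain' ~ Bernoulli(rain_prob), level' = clip(level - release + rain').
        """
        rain = 1 if self.rng.random() < self.rain_prob else 0
        release = max(0, min(release, self.level))
        self.level = min(self.capacity, self.level - release + rain)
        reward = -abs(self.level - self.capacity // 2)  # stay near half full
        done = False
        return self.level, reward, done, {}

env = ToyReservoirEnv()
state = env.reset()
trajectory = []
for _ in range(20):
    state, reward, done, info = env.step(release=1)
    trajectory.append((state, reward))
```

In an RDDL-driven setup, the sampling rule inside `step` would come from the declarative description rather than being hard-coded as it is here.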
# 非線形ポンププローブ分光における分数統計の署名 Signatures of fractional statistics in nonlinear pump-probe spectroscopy ( http://arxiv.org/abs/2210.16249v2 ) ライセンス: Link先を確認	Max McGinley, Michele Fava, S. A. Parameswaran	(参考訳) 二次元系の励起スペクトルにおけるオンの存在は非線形分光量から推測できることを示した。特に,試料に2つの光パルスを照射し,その間に時間遅延を調節できるポンププローブ分光について考察した。関連する応答係数は、第1パルスブレイドによって生成されたイオンが第2パルスブレイドによって生成されたときに得られる統計位相に由来する普遍的な形式を示す。この挙動は、非統計相互作用や小さな非零温度を含む非普遍物理学によって定性的に変化することが示されている。磁気システムでは、現在利用可能なテラヘルツ領域プローブを用いて興味の信号を測定することができ、量子スピン液体の探索における非線形分光技術の有用性を強調している。 We show that the presence of anyons in the excitation spectrum of a two-dimensional system can be inferred from nonlinear spectroscopic quantities. In particular, we consider pump-probe spectroscopy, where a sample is irradiated by two light pulses with an adjustable time delay between them. The relevant response coefficient exhibits a universal form that originates from the statistical phase acquired when anyons created by the first pulse braid around those created by the second. This behaviour is shown to be qualitatively unchanged by non-universal physics including non-statistical interactions and small nonzero temperatures. In magnetic systems, the signal of interest can be measured using currently available terahertz-domain probes, highlighting the potential usefulness of nonlinear spectroscopic techniques in the search for quantum spin liquids.	翻訳日:2024-02-07 21:29:09 公開日:2024-02-06
# 量子カオスの操作計量と時空間絡み合い構造 Operational Metric for Quantum Chaos and the Corresponding Spatiotemporal Entanglement Structure ( http://arxiv.org/abs/2210.14926v4 ) ライセンス: Link先を確認	Neil Dowling and Kavan Modi	(参考訳) カオスシステムは小さな摂動に非常に敏感であり、生物学的科学、物理科学、社会科学にも至る所に存在する。これを基本原理として、量子カオスの運用概念を構築します。すなわち、多体孤立量子システムの将来の状態は、そのシステムの小さな部分における過去のマルチタイム操作に敏感である。感性」とは、2つの異なる摂動状態から得られる状態が互いに容易に変換できないことを意味する。すなわち、関連する量は最終状態における摂動の影響の複雑さである。 Butterfly Flutter Fidelityと呼ばれるこの直感的な計量から、我々は、カオスに関する一連の操作条件、特に時空間絡みのスケーリングを特定するために、マルチタイム量子プロセスの言語を使用する。我々の基準はすでに、通常の概念と、量子カオスのよく知られた診断を含んでいる。これには、Peres-Loschmidt Echo、Dynamical Entropy、Tripartite Mutual Information、Local-Operator Entanglementが含まれる。したがって、既存の診断を単一の構造内に統一したフレームワークを提供する。さらに、ランダム回路から発生した進化など、量子カオスにつながるいくつかのメカニズムを定量化する。本研究は,多体局在化,測定誘起相転移,フロッケダイナミクスなどの多体力学現象を体系的に研究する手法である。 Chaotic systems are highly sensitive to a small perturbation, and are ubiquitous throughout biological sciences, physical sciences and even social sciences. Taking this as the underlying principle, we construct an operational notion for quantum chaos. Namely, we demand that the future state of a many-body, isolated quantum system is sensitive to past multitime operations on a small subpart of that system. By `sensitive', we mean that the resultant states from two different perturbations cannot easily be transformed into each other. That is, the pertinent quantity is the complexity of the effect of the perturbation within the final state. From this intuitive metric, which we call the Butterfly Flutter Fidelity, we use the language of multitime quantum processes to identify a series of operational conditions on chaos, in particular the scaling of the spatiotemporal entanglement. Our criteria already contain the routine notions, as well as the well-known diagnostics for quantum chaos. This includes the Peres-Loschmidt Echo, Dynamical Entropy, Tripartite Mutual Information, and Local-Operator Entanglement. 
We hence present a unified framework for these existing diagnostics within a single structure. We also go on to quantify how several mechanisms lead to quantum chaos, such as evolution generated from random circuits. Our work paves the way to systematically study many-body dynamical phenomena like Many-Body Localization, measurement-induced phase transitions, and Floquet dynamics.	翻訳日:2024-02-07 21:28:56 公開日:2024-02-06
# ページ全体のランク付けに偏りのない学習 Whole Page Unbiased Learning to Rank ( http://arxiv.org/abs/2210.10718v2 ) ライセンス: Link先を確認	Haitao Mao, Lixin Zou, Yujia Zheng, Jiliang Tang, Xiaokai Chu, Jiashu Zhao, Qian Wang, Dawei Yin	(参考訳) 情報検索システム、特にクリック行動におけるページ提示バイアスは、暗黙のユーザフィードバックによるランキングモデルのパフォーマンス向上を妨げる、よく知られた課題である。 Unbiased Learning to Rank (ULTR) アルゴリズムは、バイアス付きクリックデータを用いて偏りのないランキングモデルを学ぶために提案されている。しかし、既存のアルゴリズムの多くは、検索結果ページの表示(SERP)における他の特徴によって引き起こされるバイアス、例えばマルチメディアによって引き起こされる魅力的なバイアスを考慮せずに、信頼バイアスなどの位置関連バイアスを緩和するように設計されている。残念ながら、これらのバイアスは産業システムにおいて広く存在し、不十分な検索体験につながる可能性がある。そこで本研究では,全ページSERP機能によって引き起こされるバイアスを同時に処理することを目的とした,全ページのUnbiased Learning to Rank (WP-ULTR) という新たな問題を導入する。この問題には大きな課題がある。(1)適切なユーザ行動モデル(ユーザ行動仮説)を見つけるのは困難であり、(2)複雑なバイアスは既存のアルゴリズムでは処理できない。上記の課題に対処するために、BALと呼ぶバイアス非依存の全ページULTRアルゴリズムを提案し、因果発見によるユーザ行動モデルを自動的に見つけ、特定の設計をせずに複数のSERP機能によって引き起こされるバイアスを軽減する。実世界のデータセットによる実験結果から,BALの有効性が検証された。 Page presentation bias in information retrieval systems, especially in click behavior, is a well-known challenge that hinders improving ranking models' performance with implicit user feedback. Unbiased Learning to Rank (ULTR) algorithms have been proposed to learn an unbiased ranking model with biased click data. However, most existing algorithms are specifically designed to mitigate position-related bias, e.g., trust bias, without considering biases induced by other features in search result page presentation (SERP), e.g., attractive bias induced by multimedia. Unfortunately, those biases widely exist in industrial systems and may lead to an unsatisfactory search experience. Therefore, we introduce a new problem, i.e., whole-page Unbiased Learning to Rank (WP-ULTR), aiming to handle biases induced by whole-page SERP features simultaneously. It presents tremendous challenges: (1) a suitable user behavior model (user behavior hypothesis) can be hard to find; and (2) complex biases cannot be handled by existing algorithms. 
To address the above challenges, we propose a Bias Agnostic whole-page unbiased Learning to rank algorithm, named BAL, to automatically find the user behavior model with causal discovery and mitigate the biases induced by multiple SERP features with no specific design. Experimental results on a real-world dataset verify the effectiveness of the BAL.	翻訳日:2024-02-07 21:28:35 公開日:2024-02-06
# 視聴覚および自己報告型パーソナリティ認識のためのディープラーニングモデルのオープンソースベンチマーク An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition ( http://arxiv.org/abs/2210.09138v2 ) ライセンス: Link先を確認	Rongfan Liao and Siyang Song and Hatice Gunes	(参考訳) パーソナリティは、人間の日常生活や作業行動の多様さを決定づけ、人間の内外的状態を理解するのに不可欠である。近年,非言語的音声・視覚行動に基づく被験者の見かけのパーソナリティまたは自己報告のパーソナリティを予測するための自動パーソナリティ計算手法が多数開発されている。しかし、その大半は複雑なデータセット固有の前処理ステップやモデルトレーニングのトリックに苦しむ。一貫性のある実験的な設定の標準ベンチマークがないため、これらのパーソナリティコンピューティングモデルの実際の性能を適切に比較することは不可能であり、再現も困難である。本稿では,既存の8つのパーソナリティ・コンピューティングモデル(例えば,音声,視覚,音声視覚)と7つの標準ディープラーニングモデルについて,自己報告と明らかなパーソナリティ認識タスクの両方で公正かつ一貫した評価を行うための,最初の再現可能な音声・視覚ベンチマークフレームワークを提案する。また、一連のベンチマークモデルに基づいて、人格計算結果に対する短期・フレームレベルの予測を要約するための2つの長期モデリング戦略の影響についても検討する。結果は以下の通りである。 (i)ほとんどのベンチマークされたディープラーニングモデルによる顔行動から推定される明らかな性格特性は、自己報告されたものよりも信頼性が高い。 (II)視覚モデルは、人格認識における音声モデルよりも優れたパフォーマンスをしばしば達成する。 (iii)非言語行動は、異なる性格特性の予測に異なる寄与をする。 (4) 再現されたパーソナリティ・コンピューティング・モデルは, 当初報告した結果よりも性能が悪くなった。我々のベンチマークは \url{https://github.com/liaorongfan/DeepPersonality} で公開されています。 Personality determines a wide variety of human daily and working behaviours, and is crucial for understanding human internal and external states. In recent years, a large number of automatic personality computing approaches have been developed to predict either the apparent personality or self-reported personality of the subject based on non-verbal audio-visual behaviours. However, the majority of them suffer from complex and dataset-specific pre-processing steps and model training tricks. In the absence of a standardized benchmark with consistent experimental settings, it is not only impossible to fairly compare the real performances of these personality computing models but also makes them difficult to be reproduced. 
In this paper, we present the first reproducible audio-visual benchmarking framework to provide a fair and consistent evaluation of eight existing personality computing models (e.g., audio, visual and audio-visual) and seven standard deep learning models on both self-reported and apparent personality recognition tasks. Building upon a set of benchmarked models, we also investigate the impact of two previously-used long-term modelling strategies for summarising short-term/frame-level predictions on personality computing results. The results conclude: (i) apparent personality traits, inferred from facial behaviours by most benchmarked deep learning models, show more reliability than self-reported ones; (ii) visual models frequently achieved superior performances than audio models on personality recognition; (iii) non-verbal behaviours contribute differently in predicting different personality traits; and (iv) our reproduced personality computing models generally achieved worse performances than their original reported results. Our benchmark is publicly available at \url{https://github.com/liaorongfan/DeepPersonality}.	翻訳日:2024-02-07 21:27:58 公開日:2024-02-06
# 崩壊しない2光子状態の多重測定 Multiple measurements on an uncollapsed entangled two-photon state ( http://arxiv.org/abs/2210.06045v2 ) ライセンス: Link先を確認	Dalibor Jav\r{u}rek	(参考訳) 同時性の相対性と量子状態の崩壊の定義を組み合わせると、崩壊していない量子状態に対して複数の測定が行える実験状況が生じる。量子状態の崩壊の時空分布を、量子系を測定する検出器の基準系と、検出器に対して運動する基準系とで示す。これらを検討すると、ある条件下では、同じ非崩壊量子状態に対して複数の測定が許されることがわかる。この手法を、偏光とエネルギーで絡み合った光子対状態の測定に適用する。私は、非崩壊の光子対状態に対して2つの測定が行える条件を導出する。同じ非崩壊状態に対する複数の測定が許されることから、深刻な帰結が導かれる。例えば、この状況における両方の検出器による測定は相関しない。さらに、すべての保存則は個々の測定では破れうるが、平均値では破れない。この主張はエネルギーで絡み合った2光子状態について証明される。これは、互いに相対的に静止した検出器で観測された実験結果と矛盾する。提案する実験の結果が予測と一致しない場合、すなわち量子状態に対する測定結果が相関している場合、コペンハーゲン解釈とは異なる量子状態の崩壊の新しい時空分布を、この状況の適切な解決として提案しなければならない。 The relativity of simultaneity, together with the definition of a quantum state's collapse, results in experimental situations where multiple measurements can be taken on an uncollapsed quantum state. A quantum state's collapse space-time distribution is shown in a reference frame of a detector measuring the quantum system and in a reference frame moving relative to the detector. From their inspection it follows that, under certain conditions, multiple measurements are allowed on the same uncollapsed quantum state. An application of the developed approach is shown on the measurement of a photon-pair state entangled in polarization and energy. I derive conditions under which two measurements can be taken on the uncollapsed photon-pair state. Serious consequences follow from the allowance of multiple measurements on the same uncollapsed state. For example, the measurements taken by both detectors in this situation are uncorrelated. Moreover, all the conservation laws could be violated in individual measurements, but not in the mean value. This statement is proved for the two-photon state entangled in energy. This is in contradiction with experimental results observed by detectors at rest relative to each other. 
If experimental results of the proposed experiment disagree with the predictions -- results measured on the quantum state are correlated, new space-time distribution of the quantum state's collapse, different from the Copenhagen interpretation, has to be proposed for proper solution of this situation.	翻訳日:2024-02-07 21:27:27 公開日:2024-02-06
# ViT-DD:半教師付きドライバディストラクション検出のためのマルチタスク・ビジョン・トランスフォーマー ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection ( http://arxiv.org/abs/2209.09178v4 ) ライセンス: Link先を確認	Yunsheng Ma and Ziran Wang	(参考訳) 現代の運転における交通安全確保と事故軽減が最重要であり、コンピュータビジョン技術はこの目標に大きく貢献する可能性がある。本稿では,運転者注意散漫検出と運転者の感情認識の両方に関連するトレーニング信号からインダクティブ情報を取り入れたマルチモーダルVision Transformer(ViT-DD)を提案する。さらに,感情ラベルのないドライバデータをViT-DDのマルチタスクトレーニングプロセスにシームレスに統合可能な自己学習アルゴリズムを開発した。実験結果から,提案したViT-DDは,SFDDDデータセットとAUCDDデータセットにおいて,運転者注意散漫検出の既存の最先端手法をそれぞれ6.5%,0.9%上回ることが示された。 Ensuring traffic safety and mitigating accidents in modern driving is of paramount importance, and computer vision technologies have the potential to significantly contribute to this goal. This paper presents a multi-modal Vision Transformer for Driver Distraction Detection (termed ViT-DD), which incorporates inductive information from training signals related to both distraction detection and driver emotion recognition. Additionally, a self-learning algorithm is developed, allowing for the seamless integration of driver data without emotion labels into the multi-task training process of ViT-DD. Experimental results reveal that the proposed ViT-DD surpasses existing state-of-the-art methods for driver distraction detection by 6.5% and 0.9% on the SFDDD and AUCDD datasets, respectively.	翻訳日:2024-02-07 21:27:07 公開日:2024-02-06
# アニーリングパスの変分表現:単調埋め込み下のブレグマン情報 Variational Representations of Annealing Paths: Bregman Information under Monotonic Embedding ( http://arxiv.org/abs/2209.07481v3 ) ライセンス: Link先を確認	Rob Brekelmans, Frank Nielsen	(参考訳) マルコフ連鎖モンテカルロ法による複素分布のサンプリングと正規化定数の推定は、移動可能な初期分布と関心のターゲット密度とを橋渡しするアニーリングパスに沿った中間分布の列からサンプルをシミュレートすることが多い。先行研究は準算術的な手段を用いてアニーリングパスを構築し、結果として生じる中間密度は、エンドポイントへの期待分散を最小限に抑えるものとして解釈した。これらのアニーリングパスの変分表現を分析するために、算術平均の引数が期待されるブレグマン偏差を1つの代表点まで最小化することを示す既知の結果を拡張する。特に、ブレグマン発散への入力が単調な埋め込み関数の下で変換されるとき、準算術的な方法で類似の結果を得る。本解析では,rho-tau表現型ブレグマン発散フレームワークを用いた準アリオスメティックな手段,パラメトリック族,発散関数間の相互作用に着目し,発散関数と中間密度をアニーリング経路に沿って関連付ける。 Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest. Prior works have constructed annealing paths using quasi-arithmetic means, and interpreted the resulting intermediate densities as minimizing an expected divergence to the endpoints. To analyze these variational representations of annealing paths, we extend known results showing that the arithmetic mean over arguments minimizes the expected Bregman divergence to a single representative point. In particular, we obtain an analogous result for quasi-arithmetic means, when the inputs to the Bregman divergence are transformed under a monotonic embedding function. Our analysis highlights the interplay between quasi-arithmetic means, parametric families, and divergence functionals using the rho-tau representational Bregman divergence framework, and associates common divergence functionals with intermediate densities along an annealing path.	翻訳日:2024-02-07 21:26:55 公開日:2024-02-06
# Semantic2Graph:ビデオにおけるアクションセグメンテーションのためのグラフベースのマルチモーダル機能融合 Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos ( http://arxiv.org/abs/2209.05653v5 ) ライセンス: Link先を確認	Junbin Zhang, Pei-Hsuan Tsai and Meng-Hsun Tsai	(参考訳) ビデオアクションセグメンテーションは多くの分野で広く適用されている。これまでの研究のほとんどは、この目的のためにビデオベースのビジョンモデルを使用していた。しかし、ビデオ内の長期的な依存関係を捉えるために、大きな受容野やLSTM、Transformerといった手法に依存することがしばしばあり、多大な計算資源を必要とする。この課題に対処するため、グラフベースのモデルが提案された。しかし、従来のグラフベースのモデルは精度が低い。そこで本研究では,Semantic2Graphというグラフ構造化手法を導入し,ビデオの長期依存性をモデル化し,計算コストを低減し,精度を高める。映像のグラフ構造をフレームレベルで構築する。時間的エッジはビデオ内の時間的関係と行動順序をモデル化するために使用される。さらに,ビデオ行動における長期的・短期的な意味的関係を捉えるために,対応するエッジ重みを伴う正・負の意味的エッジを設計した。ノード属性は、ビデオコンテンツ、グラフ構造、ラベルテキストから抽出された豊富なマルチモーダルな特徴を包含し、視覚的、構造的、セマンティックな手がかりを包含する。このマルチモーダル情報を効果的に合成するために,ノード動作ラベル分類のための多モーダル特徴を融合するグラフニューラルネットワーク(GNN)モデルを用いる。実験の結果、Semantic2Graphは、特にGTEAや50Saladsのようなベンチマークデータセットにおいて、最先端の手法よりもパフォーマンスが優れていることが示された。複数のアブレーション実験は、モデル性能の向上における意味的特徴の有効性をさらに検証する。特に、Semantic2Graphにセマンティックエッジを組み込むことで、ビデオベースのビジョンモデルにおける計算リソースの制約による課題に対処する上で、長期的な依存関係をコスト効率よくキャプチャすることができる。 Video action segmentation has been widely applied in many fields. Most previous studies employed video-based vision models for this purpose. However, they often rely on a large receptive field, LSTM or Transformer methods to capture long-term dependencies within videos, leading to significant computational resource requirements. To address this challenge, graph-based models were proposed. However, previous graph-based models are less accurate. Hence, this study introduces a graph-structured approach named Semantic2Graph to model long-term dependencies in videos, thereby reducing computational costs and raising accuracy. We construct a graph structure of video at the frame level. Temporal edges are utilized to model the temporal relations and action order within videos. 
Additionally, we have designed positive and negative semantic edges, accompanied by corresponding edge weights, to capture both long-term and short-term semantic relationships in video actions. Node attributes encompass a rich set of multi-modal features extracted from video content, graph structures, and label text, encompassing visual, structural, and semantic cues. To synthesize this multi-modal information effectively, we employ a graph neural network (GNN) model to fuse multi-modal features for node action label classification. Experimental results demonstrate that Semantic2Graph outperforms state-of-the-art methods in terms of performance, particularly on benchmark datasets such as GTEA and 50Salads. Multiple ablation experiments further validate the effectiveness of semantic features in enhancing model performance. Notably, the inclusion of semantic edges in Semantic2Graph allows for the cost-effective capture of long-term dependencies, affirming its utility in addressing the challenges posed by computational resource constraints in video-based vision models.	翻訳日:2024-02-07 21:26:33 公開日:2024-02-06
# 多変量拡散・分別凝集機構による皮膚癌逆行例の検討 Reversing Skin Cancer Adversarial Examples by Multiscale Diffusive and Denoising Aggregation Mechanism ( http://arxiv.org/abs/2208.10373v3 ) ライセンス: Link先を確認	Yongwei Wang, Yuan Li, Zhiqi Shen, Yuhui Qiao	(参考訳) 皮膚癌診断モデルが早期スクリーニングや医療介入において重要な役割を担っている。コンピュータ支援型皮膚がん分類システムでは、ディープラーニングアプローチを採用している。しかし、近年の研究では、皮膚がんの診断モデルの性能を著しく低下させるために、逆境攻撃に対する極端な脆弱性が明らかにされている。これらの脅威を軽減するため,本研究は,皮膚がん画像におけるリバースエンジニアリング逆転による,シンプルで効果的で資源効率のよい防御枠組みを示す。具体的には、医療画像領域の識別構造をより良く保存するために、まず、多スケール画像ピラミッドが確立される。逆効果を中和するために、異方性ガウス雑音を注入して異なるスケールの皮膚画像を徐々に拡散させ、逆効果例をクリーン画像多様体に移動させる。さらに、逆方向のノイズを逆転させ、冗長なノイズを抑えるため、隣接するスケールの画像情報を集約する新しいマルチスケールデノナイズ機構を慎重に設計する。皮膚がんの多クラス分類データセットであるISIC 2019において,本手法の防御効果を評価した。実験の結果,本手法は異なる攻撃による逆向きの摂動を効果的に回避し,皮膚がんの診断モデルにおいて最先端の手法を著しく上回ることがわかった。 Reliable skin cancer diagnosis models play an essential role in early screening and medical intervention. Prevailing computer-aided skin cancer classification systems employ deep learning approaches. However, recent studies reveal their extreme vulnerability to adversarial attacks -- often imperceptible perturbations to significantly reduce the performances of skin cancer diagnosis models. To mitigate these threats, this work presents a simple, effective, and resource-efficient defense framework by reverse engineering adversarial perturbations in skin cancer images. Specifically, a multiscale image pyramid is first established to better preserve discriminative structures in the medical imaging domain. To neutralize adversarial effects, skin images at different scales are then progressively diffused by injecting isotropic Gaussian noises to move the adversarial examples to the clean image manifold. Crucially, to further reverse adversarial noises and suppress redundant injected noises, a novel multiscale denoising mechanism is carefully designed that aggregates image information from neighboring scales. 
We evaluated the defensive effectiveness of our method on ISIC 2019, a large skin cancer multiclass classification dataset. Experimental results demonstrate that the proposed method can successfully reverse adversarial perturbations from different attacks and significantly outperform some state-of-the-art methods in defending skin cancer diagnosis models.	翻訳日:2024-02-07 21:25:48 公開日:2024-02-06
# ARIEL: 逆グラフコントラスト学習 ARIEL: Adversarial Graph Contrastive Learning ( http://arxiv.org/abs/2208.06956v2 ) ライセンス: Link先を確認	Shengyu Feng, Baoyu Jing, Yada Zhu, Hanghang Tong	(参考訳) コントラスト学習はグラフ表現学習において効果的な教師なしの手法であり、対照的学習の重要な要素は正と負のサンプルの構築にある。従来の方法は通常、グラフ内のノードの近接を原則として利用する。近年,データ提示型コントラスト学習法が進歩し,視覚領域で大きな力を発揮するようになり,その手法を画像からグラフに拡張した研究もある。しかし、画像上のデータ拡張とは異なり、グラフ上のデータ拡張は直感的ではなく、高品質のコントラストサンプルを提供することがはるかに難しく、改善の余地がたくさんある。本研究では、データ拡張のための逆グラフビューを導入することにより、合理的な制約の中で情報的コントラストサンプルを抽出する簡易かつ効果的な手法である逆グラフコントラスト学習(ARIEL)を提案する。我々は,安定トレーニングのための情報正規化と呼ばれる新しい手法を開発し,拡張性にサブグラフサンプリングを用いる。ノードレベルのコントラスト学習からグラフレベルまで,各グラフインスタンスをスーパーノードとして扱うことで一般化する。 ARIELは、実世界のデータセット上のノードレベルとグラフレベルの両方の分類タスクにおいて、現在のグラフコントラスト学習手法よりも一貫して優れている。さらに、ARIELは敵の攻撃に対してより堅牢であることを示す。 Contrastive learning is an effective unsupervised method in graph representation learning, and the key component of contrastive learning lies in the construction of positive and negative samples. Previous methods usually utilize the proximity of nodes in the graph as the principle. Recently, the data-augmentation-based contrastive learning method has advanced to show great power in the visual domain, and some works extended this method from images to graphs. However, unlike the data augmentation on images, the data augmentation on graphs is far less intuitive and much harder to provide high-quality contrastive samples, which leaves much space for improvement. In this work, by introducing an adversarial graph view for data augmentation, we propose a simple but effective method, Adversarial Graph Contrastive Learning (ARIEL), to extract informative contrastive samples within reasonable constraints. We develop a new technique called information regularization for stable training and use subgraph sampling for scalability. We generalize our method from node-level contrastive learning to the graph level by treating each graph instance as a super-node. 
ARIEL consistently outperforms the current graph contrastive learning methods for both node-level and graph-level classification tasks on real-world datasets. We further demonstrate that ARIEL is more robust in the face of adversarial attacks.	翻訳日:2024-02-07 21:25:24 公開日:2024-02-06
# 多重不変集合をもつ非線形系の持ち上げと再構成について On the lifting and reconstruction of nonlinear systems with multiple invariant sets ( http://arxiv.org/abs/2304.11860v3 ) ライセンス: Link先を確認	Shaowu Pan and Karthik Duraisamy	(参考訳) クープマン作用素(koopman operator)は、不変部分空間における可観測性の進化に焦点をあてることで、非線形ダイナミクスに関する線型視点を与える。可観測性は通常、クープマン固有関数から線形に再構成される。過去数年間のクープマン作用素の広範な使用にもかかわらず、クープマン作用素を複数の不連続不変量集合(例えば孤立不動点からのアトラクションの盆地)を持つ力学系に適用する可能性について、いくつかの誤解がある。本稿では,まず,複数の不連続不変量集合を持つ非線形システムの線形再構成に基づくクープマン作用素の機構について,簡単な説明を行う。次に、データ効率の良い方法でクープマン固有関数を構成するために、そのような不変集合間の離散対称性の使用について議論する。最後に、koopman作用素の学習に対称性を利用する利点を説明するために、いくつかの数値例が提供されている。 The Koopman operator provides a linear perspective on non-linear dynamics by focusing on the evolution of observables in an invariant subspace. Observables of interest are typically linearly reconstructed from the Koopman eigenfunctions. Despite the broad use of Koopman operators over the past few years, there exist some misconceptions about the applicability of Koopman operators to dynamical systems with more than one disjoint invariant sets (e.g., basins of attractions from isolated fixed points). In this work, we first provide a simple explanation for the mechanism of linear reconstruction-based Koopman operators of nonlinear systems with multiple disjoint invariant sets. Next, we discuss the use of discrete symmetry among such invariant sets to construct Koopman eigenfunctions in a data efficient manner. Finally, several numerical examples are provided to illustrate the benefits of exploiting symmetry for learning the Koopman operator.	翻訳日:2024-02-07 21:17:51 公開日:2024-02-06
# MI-SegNet:unseen Domain Generalizationのための相互情報に基づくUSセグメンテーション MI-SegNet: Mutual Information-Based US Segmentation for Unseen Domain Generalization ( http://arxiv.org/abs/2303.12649v3 ) ライセンス: Link先を確認	Yuan Bi, Zhongliang Jiang, Ricarda Clarenbach, Reza Ghotbi, Angelos Karlas, Nassir Navab	(参考訳) ドメイン間の学習に基づく医用画像分割の一般化は、現在、領域シフトによる性能低下、特に超音波(us)イメージングによって制限されている。アメリカの画像の品質は、音像、機械、設定によって異なる、注意深く調整された音響パラメータに大きく依存している。ドメイン間のUS画像の一般化性を改善するために,解剖学的特徴表現とドメイン特徴表現を明確に分離する新たな相互情報(MI)ベースのフレームワークMI-SegNetを提案する。 2つのエンコーダを使用して、絡み合いの関連特徴を抽出する。セグメンテーションはその予測に解剖学的特徴マップのみを使用する。エンコーダに有意義な特徴表現を学習させるために、トレーニング中にクロスリコンストラクション法が使用される。ドメインまたは解剖学に特有の変換は、それぞれの特徴抽出タスクでエンコーダを導くために適用される。さらに、両方の機能マップに存在するすべてのmiは、別々の機能空間をさらに促進するために罰せられる。パラメータやマシンの異なる複数のデータセットに対して提案したドメイン独立セグメンテーション手法の一般化可能性を検証する。さらに,提案するMI-SegNetを,最先端ネットワークと比較し,事前学習モデルとして有効であることを示す。 Generalization capabilities of learning-based medical image segmentation across domains are currently limited by the performance degradation caused by the domain shift, particularly for ultrasound (US) imaging. The quality of US images heavily relies on carefully tuned acoustic parameters, which vary across sonographers, machines, and settings. To improve the generalizability on US images across domains, we propose MI-SegNet, a novel mutual information (MI) based framework to explicitly disentangle the anatomical and domain feature representations; therefore, robust domain-independent segmentation can be expected. Two encoders are employed to extract the relevant features for the disentanglement. The segmentation only uses the anatomical feature map for its prediction. In order to force the encoders to learn meaningful feature representations a cross-reconstruction method is used during training. Transformations, specific to either domain or anatomy are applied to guide the encoders in their respective feature extraction task. Additionally, any MI present in both feature maps is punished to further promote separate feature spaces. 
We validate the generalizability of the proposed domain-independent segmentation approach on several datasets with varying parameters and machines. Furthermore, we demonstrate the effectiveness of the proposed MI-SegNet serving as a pre-trained model by comparing it with state-of-the-art networks.	翻訳日:2024-02-07 21:17:25 公開日:2024-02-06
# 遷移系を用いた非循環的論証フレームワークへの時間性と因果性の統合 Integrating Temporality and Causality into Acyclic Argumentation Frameworks using a Transition System ( http://arxiv.org/abs/2303.09197v2 ) ライセンス: Link先を確認	Y. Munro (1), C. Sarmiento (1), I. Bloch (1), G. Bourgne (1), M.-J. Lesot (1) ((1) Sorbonne Université, CNRS, LIP6, Paris, France)	(参考訳) 抽象的論証の文脈では、時間性、すなわち、論証が述べられる順序、および因果性を考慮する利点を提示する。本研究では,非循環的抽象的論証フレームワークの概念をアクション言語に書き換える形式的手法を提案する。これは世界の進化をモデル化し,直接的・間接的を問わず,論証の発話とその帰結との因果関係を確立する。解集合プログラミングの実装も提案され、説明への視点も提案されている。 In the context of abstract argumentation, we present the benefits of considering temporality, i.e. the order in which arguments are enunciated, as well as causality. We propose a formal method to rewrite the concepts of acyclic abstract argumentation frameworks into an action language, that allows us to model the evolution of the world, and to establish causal relationships between the enunciation of arguments and their consequences, whether direct or indirect. An Answer Set Programming implementation is also proposed, as well as perspectives towards explanations.	翻訳日:2024-02-07 21:17:04 公開日:2024-02-06
# 2次元拡散モデルにロバストテキスト-3次元生成のための3次元一貫性を知らせる Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation ( http://arxiv.org/abs/2303.07937v4 ) ライセンス: Link先を確認	Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim	(参考訳) テキストから3Dへの生成は、事前学習済みのテキストから2Dへの拡散モデルを用いてゼロショット設定でニューラル放射場(NeRF)を最適化する手法であるスコア蒸留の出現により、近年急速に進歩している。しかし, 2次元拡散モデルにおける3次元認識の欠如は, スコア蒸留法による3次元シーンの再構成を不安定にする。そこで本研究では,事前学習した2次元拡散モデルに3次元認識を組み込んだ新しいフレームワークである3DFuseを提案する。まず,与えられたテキストプロンプトの粗い3次元構造を構築し,拡散モデルの条件として投影された視点特異的深度マップを用いた。さらに,ロバストな生成のための粗い3次元構造内の誤差やスパース性を扱う2次元拡散モデルの学習を可能にするトレーニング戦略と,シーンのすべての視点において意味的一貫性を確保する手法を導入する。我々の枠組みは, 先行技術の限界を超え, 2次元拡散モデルの3次元整合生成に大きな影響を与える。 Text-to-3D generation has shown rapid progress in recent days with the advent of score distillation, a methodology of using pretrained text-to-2D diffusion models to optimize a neural radiance field (NeRF) in the zero-shot setting. However, the lack of 3D awareness in the 2D diffusion models destabilizes score distillation-based methods when they reconstruct a plausible 3D scene. To address this issue, we propose 3DFuse, a novel framework that incorporates 3D awareness into pretrained 2D diffusion models, enhancing the robustness and 3D consistency of score distillation-based methods. We realize this by first constructing a coarse 3D structure of a given text prompt and then utilizing a projected, view-specific depth map as a condition for the diffusion model. Additionally, we introduce a training strategy that enables the 2D diffusion model to learn to handle the errors and sparsity within the coarse 3D structure for robust generation, as well as a method for ensuring semantic consistency throughout all viewpoints of the scene. Our framework surpasses the limitations of prior art, and has significant implications for 3D-consistent generation with 2D diffusion models.	翻訳日:2024-02-07 21:16:55 公開日:2024-02-06
# ORCHNet: 果樹園における3次元LiDARに基づく位置認識のためのロバストグローバルな特徴集約アプローチ ORCHNet: A Robust Global Feature Aggregation approach for 3D LiDAR-based Place recognition in Orchards ( http://arxiv.org/abs/2303.00477v2 ) ライセンス: Link先を確認	T. Barros, L. Garrote, P. Conde, M.J. Coombes, C. Liu, C. Premebida, U.J. Nunes	(参考訳) 農業環境におけるロバストで信頼性の高い位置認識とループ閉鎖検出は依然として未解決の問題である。特に果樹園は、全分野にわたる構造的類似性のため、難しいケーススタディである。本研究では,3次元LiDARデータを利用した果樹園における位置認識問題に対処する。そこで我々は,3D-LiDARスキャンをグローバルディスクリプタにマッピングするディープラーニングベースのアプローチORCHNetを提案する。具体的には,複数のアグリゲーションメソッドをロバストなグローバルディスクリプタに融合する,新たなグローバル機能アグリゲータアプローチを提案する。 ORCHNetは、夏と秋の季節のデータを含む果樹園で収集された実世界のデータに基づいて評価される。このロバスト性を評価するために,orchnet と同一季節および季節間のデータを用いた最先端の集計手法を比較した。さらに,ORCHNetをループ閉鎖検出器として利用する局所化フレームワークの一部として,提案手法を評価した。実験結果から, ORCHNetは場所認識タスクにおいて, 残りのアプローチよりも優れており, シーズンを通じて堅牢であることがわかった。ローカライゼーションに関しては,ORCHNetをループ検出器として統合する際,木を通り抜けるエッジケースを解決し,本課題における提案手法の適用可能性を示す。コードは:\url{https://github.com/Cybonic/ORCHNet.git}で公開される。 Robust and reliable place recognition and loop closure detection in agricultural environments is still an open problem. In particular, orchards are a difficult case study due to structural similarity across the entire field. In this work, we address the place recognition problem in orchards resorting to 3D LiDAR data, which is considered a key modality for robustness. Hence, we propose ORCHNet, a deep-learning-based approach that maps 3D-LiDAR scans to global descriptors. Specifically, this work proposes a new global feature aggregation approach, which fuses multiple aggregation methods into a robust global descriptor. ORCHNet is evaluated on real-world data collected in orchards, comprising data from the summer and autumn seasons. To assess the robustness, we compare ORCHNet with state-of-the-art aggregation approaches on data from the same season and across seasons. Moreover, we additionally evaluate the proposed approach as part of a localization framework, where ORCHNet is used as a loop closure detector. 
The empirical results indicate that, on the place recognition task, ORCHNet outperforms the remaining approaches, and is also more robust across seasons. As for the localization, the edge cases where the path goes through the trees are solved when integrating ORCHNet as a loop detector, showing the potential applicability of the proposed approach in this task. The code will be publicly available at: https://github.com/Cybonic/ORCHNet.git	翻訳日:2024-02-07 21:16:35 公開日:2024-02-06
# ターゲット拡張による領域外ロバスト性 Out-of-Domain Robustness via Targeted Augmentations ( http://arxiv.org/abs/2302.11861v3 ) ライセンス: Link先を確認	Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang	(参考訳) あるドメインでトレーニングされたモデルは、例えば野生生物の監視モデルが新しいカメラの場所にデプロイされる場合など、目に見えないドメインのパフォーマンス低下を被ることが多い。本研究では、外部ドメイン(OOD)一般化のためのデータ拡張を設計するための原則について研究する。特に、ドメインに依存しないいくつかの機能が堅牢である実世界のシナリオ、すなわちドメイン毎に異なるいくつかの機能は予測OODである。例えば、上記の野生生物モニタリングアプリケーションでは、画像の背景はカメラの場所によって異なるが、生息地のタイプを示す。線形設定に関する理論的解析に動機づけられ,ロバストな特徴を保ちながらスプリアスなドメイン依存特徴を選択的にランダム化する目標拡張法を提案する。対象の拡張によってOOD性能が向上し、より少ないドメインでモデルを一般化できることを示す。対照的に、ドメイン依存機能のランダム化に失敗したジェネリック拡張や、すべてのドメイン依存機能のランダム化を行うドメイン不変拡張といった既存のアプローチは、いずれもOODが不十分である。実世界の3つのデータセットの実験では、ターゲット拡張によってOODのパフォーマンスが3.2～15.2ポイント向上した。 Models trained on one set of domains often suffer performance drops on unseen domains, e.g., when wildlife monitoring models are deployed in new camera locations. In this work, we study principles for designing data augmentations for out-of-domain (OOD) generalization. In particular, we focus on real-world scenarios in which some domain-dependent features are robust, i.e., some features that vary across domains are predictive OOD. For example, in the wildlife monitoring application above, image backgrounds vary across camera locations but indicate habitat type, which helps predict the species of photographed animals. Motivated by theoretical analysis on a linear setting, we propose targeted augmentations, which selectively randomize spurious domain-dependent features while preserving robust ones. We prove that targeted augmentations improve OOD performance, allowing models to generalize better with fewer domains. In contrast, existing approaches such as generic augmentations, which fail to randomize domain-dependent features, and domain-invariant augmentations, which randomize all domain-dependent features, both perform poorly OOD. 
In experiments on three real-world datasets, we show that targeted augmentations set new states-of-the-art for OOD performance by 3.2-15.2 percentage points.	翻訳日:2024-02-07 21:16:11 公開日:2024-02-06
# 知識グラフによる推論のためのニューロシンボリックAI:サーベイ Neurosymbolic AI for Reasoning over Knowledge Graphs: A Survey ( http://arxiv.org/abs/2302.07200v2 ) ライセンス: Link先を確認	Lauren Nicole DeLong, Ramon Fernández Mir, Jacques D. Fleuriot (The University of Edinburgh School of Informatics, Artificial Intelligence and its Applications Institute)	(参考訳) ニューロシンボリックAIは、シンボリック推論手法とディープラーニングを組み合わせて、補完的な利点を活用する研究の活発な領域である。知識グラフは異種・多関係的なデータを表現するための一般的な方法になりつつあるため、グラフ構造を推論する手法はこのニューロシンボリックパラダイムに従おうとしている。従来、そのようなアプローチは規則に基づく推論か、パターンを抽出できる代表的な数値埋め込みのいずれかを使用してきた。しかし、近年のいくつかの研究は、この二分法を橋渡しして、解釈性を促進し、競争力を保ち、専門家の知識を統合するモデルを作ろうと試みている。そこで我々は,知識グラフ上でニューロシンボリック推論タスクを行う手法を調査し,それらを分類できる新しい分類法を提案する。具体的には,(1)論理式埋め込みアプローチ,(2)論理制約付き埋め込みアプローチ,(3)規則学習アプローチの3つの主要なカテゴリを提案する。分類と並行して,より直接的な比較のために,アプローチの概要とソースコードへのリンクを提供する。最後に,これらの手法の特徴と限界について考察し,この研究分野が発展するであろういくつかの今後の方向性を提案する。 Neurosymbolic AI is an increasingly active area of research that combines symbolic reasoning methods with deep learning to leverage their complementary benefits. As knowledge graphs are becoming a popular way to represent heterogeneous and multi-relational data, methods for reasoning on graph structures have attempted to follow this neurosymbolic paradigm. Traditionally, such approaches have utilized either rule-based inference or generated representative numerical embeddings from which patterns could be extracted. However, several recent studies have attempted to bridge this dichotomy to generate models that facilitate interpretability, maintain competitive performance, and integrate expert knowledge. Therefore, we survey methods that perform neurosymbolic reasoning tasks on knowledge graphs and propose a novel taxonomy by which we can classify them. Specifically, we propose three major categories: (1) logically-informed embedding approaches, (2) embedding approaches with logical constraints, and (3) rule learning approaches. 
Alongside the taxonomy, we provide a tabular overview of the approaches and links to their source code, if available, for more direct comparison. Finally, we discuss the unique characteristics and limitations of these methods, then propose several prospective directions toward which this field of research could evolve.	翻訳日:2024-02-07 21:15:51 公開日:2024-02-06
# 分散型適応選好エージェントに対するオンライン勧告 Online Recommendations for Agents with Discounted Adaptive Preferences ( http://arxiv.org/abs/2302.06014v2 ) ライセンス: Link先を確認	Arpit Agarwal, William Brown	(参考訳) 未知の$\textit{preference model}$ によれば、エージェントの好み(推奨項目よりも選択確率を示す)が過去の選択関数として進化する、バンディットの推奨問題を考える。各ラウンドで、エージェントに$k$アイテム(合計$n$)のメニューを表示し、1つのアイテムを選択する。そして、エージェントの選択に対する敵の損失に対して、$\textit{target set}$(アイテムのサブセット)に対する後悔を最小限に抑える。均一メモリエージェントが考慮されたagarwalとbrown(2022年)から設定を拡張することにより、次のラウンド毎にエージェントのメモリベクトルにディスカウント係数が適用される一様でないメモリを許容する。長期記憶(long-term memory)」では、任意の$\textit{smooth}$のモデルに対して、効率的なサブリニア後悔が$\textit{everywhere instantaneally realizable distributions}$(以前の作業で定式化された"eird set")のセットに対して得られることが示される。さらに、メモリ重みの線型関数によって上下に有界な選好(これらの「スケールバウンド」選好と呼ぶ)に対して、$\textit{entire}$ item simplex のほとんどについて効率的なサブ線形後悔を求めるアルゴリズムを与える。 EIRD以上のターゲットに拡張するためのNP-hardnessの結果を示す。短期記憶」体制(メモリ水平線が一定である場合)において、スケールバウンドされた嗜好は、損失があまり頻繁に変化しない場合でも、スムーズさを伴わずに、ほぼすべての単純体に対して効率的なサブ線形後悔を可能にすることを示すが、損失が一定であっても任意のスムーズな選好モデルの下でEIRDと競合する情報理論上の障壁を示す。 We consider a bandit recommendations problem in which an agent's preferences (representing selection probabilities over recommended items) evolve as a function of past selections, according to an unknown $\textit{preference model}$. In each round, we show a menu of $k$ items (out of $n$ total) to the agent, who then chooses a single item, and we aim to minimize regret with respect to some $\textit{target set}$ (a subset of the item simplex) for adversarial losses over the agent's choices. Extending the setting from Agarwal and Brown (2022), where uniform-memory agents were considered, here we allow for non-uniform memory in which a discount factor is applied to the agent's memory vector at each subsequent round. 
In the "long-term memory" regime (when the effective memory horizon scales with $T$ sublinearly), we show that efficient sublinear regret is obtainable with respect to the set of $\textit{everywhere instantaneously realizable distributions}$ (the "EIRD set", as formulated in prior work) for any $\textit{smooth}$ preference model. Further, for preferences which are bounded above and below by linear functions of memory weight (we call these "scale-bounded" preferences) we give an algorithm which obtains efficient sublinear regret with respect to nearly the $\textit{entire}$ item simplex. We show an NP-hardness result for expanding to targets beyond EIRD in general. In the "short-term memory" regime (when the memory horizon is constant), we show that scale-bounded preferences again enable efficient sublinear regret for nearly the entire simplex even without smoothness if losses do not change too frequently, yet we show an information-theoretic barrier for competing against the EIRD set under arbitrary smooth preference models even when losses are constant.	翻訳日:2024-02-07 21:15:29 公開日:2024-02-06
# 有限温度における量子忠実性に現れる量子相転移のシグネチャ Signature of quantum phase transition manifested in quantum fidelity at finite temperature ( http://arxiv.org/abs/2302.01795v2 ) ライセンス: Link先を確認	Protyush Nandi, Sirshendu Bhattacharyya and Subinay Dasgupta	(参考訳) 量子相転移のシグネチャは一般に有限温度で消去される。非解析的行動を通じてこのシグネチャを運ぶために観測された少量の量は、低温のみに限られる。高温で適切な動的量を特定することを目的として、我々は最近、低温状態を超えた量子臨界点で非解析的シグネチャを持つ量子忠実度から関数を構築した。本稿では, 初期の研究を詳述し, 対応する速度関数の挙動と, 異なる次元の多体ハミルトニアンに対する非解析性の堅牢性を示す。また、我々の速度関数は、ゼロ温度での動的量子相転移(DQPT)の実証に使用されるものまで減少することを示した。さらに、DQPTとは異なり、速度関数の長い時間制限は平衡量子相転移を忠実に検出することができることが観察されている。 The signature of quantum phase transition is generally wiped out at finite temperature. A few quantities that have been observed to carry this signature through a nonanalytic behavior are also limited to low temperatures only. With an aim to identify a suitable dynamical quantity at a high temperature, we have recently constructed a function from quantum fidelity, which has the potential to bear a nonanalytic signature at the quantum critical point beyond low temperature regime. In this paper, we elaborate our earlier work and demonstrate the behavior of the corresponding rate function and the robustness of the nonanalyticity for a number of many-body Hamiltonians in different dimensions. We have also shown that our rate function reduces to that used in the demonstration of the dynamical quantum phase transition (DQPT) at zero temperature. It has been further observed that, unlike DQPT, the long time limit of the rate function can faithfully detect the equilibrium quantum phase transition as well.	翻訳日:2024-02-07 21:14:47 公開日:2024-02-06
# 3次元LiDARの効率よい凸ハル型車両電位推定法 An Efficient Convex Hull-based Vehicle Pose Estimation Method for 3D LiDAR ( http://arxiv.org/abs/2302.01034v3 ) ライセンス: Link先を確認	Ningning Ding	(参考訳) lidarによる車両ポーズ推定は、自動運転の知覚技術において不可欠である。しかし,lidar点雲の不完全観測とスパース性のため,既存のポーズ推定法を用いて3次元lidarに基づく適切なポーズ抽出を実現することが困難である。また、リアルタイム性能要求により、ポーズ推定タスクの難易度がさらに向上する。本稿では,凸船体に基づく新しい車両ポーズ推定手法を提案する。抽出した3Dクラスタを凸船体に還元し、重要な輪郭情報を保持しながらその後の計算負担を低減する。その後、探索に基づくアルゴリズムに対して、最小閉塞面積に基づく新しい基準を開発し、正確なポーズ推定を可能にする。さらに、この基準により提案アルゴリズムは特に障害物回避に適している。提案アルゴリズムは,工業団地で取得したKITTIデータセットと手動ラベル付きデータセットで検証される。その結果,提案手法は実時間速度を維持しつつ,従来のポーズ推定法よりも精度が高いことを示した。 Vehicle pose estimation with LiDAR is essential in the perception technology of autonomous driving. However, due to incomplete observation measurements and sparsity of the LiDAR point cloud, it is challenging to achieve satisfactory pose extraction based on 3D LiDAR with the existing pose estimation methods. In addition, the demand for real-time performance further increases the difficulty of the pose estimation task. In this paper, we propose a novel vehicle pose estimation method based on the convex hull. The extracted 3D cluster is reduced to the convex hull, reducing the subsequent computation burden while preserving essential contour information. Subsequently, a novel criterion based on the minimum occlusion area is developed for the search-based algorithm, enabling accurate pose estimation. Additionally, this criterion renders the proposed algorithm particularly well-suited for obstacle avoidance. The proposed algorithm is validated on the KITTI dataset and a manually labeled dataset acquired at an industrial park. The results demonstrate that our proposed method can achieve better accuracy than the classical pose estimation method while maintaining real-time speed.	翻訳日:2024-02-07 21:14:33 公開日:2024-02-06
# A Comprehensive Survey of Continual Learning: Theory, Method and Application ( http://arxiv.org/abs/2302.00487v3 ) License: see link	Liyuan Wang, Xingxing Zhang, Hang Su, Jun Zhu	To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as continual learning, provides a foundation for AI systems to develop themselves adaptively. In a general sense, continual learning is explicitly limited by catastrophic forgetting, where learning a new task usually results in a dramatic performance degradation of the old tasks. Beyond this, increasingly numerous advances have emerged in recent years that largely extend the understanding and application of continual learning. The growing and widespread interest in this direction demonstrates its realistic significance as well as complexity. In this work, we present a comprehensive survey of continual learning, seeking to bridge the basic settings, theoretical foundations, representative methods, and practical applications. Based on existing theoretical and empirical results, we summarize the general objectives of continual learning as ensuring a proper stability-plasticity trade-off and an adequate intra/inter-task generalizability in the context of resource efficiency. Then we provide a state-of-the-art and elaborated taxonomy, extensively analyzing how representative methods address continual learning, and how they are adapted to particular challenges in realistic applications. Through an in-depth discussion of promising directions, we believe that such a holistic perspective can greatly facilitate subsequent exploration in this field and beyond.	Translated: 2024-02-07 21:14:18 Published: 2024-02-06
# Semiparametric inference using fractional posteriors ( http://arxiv.org/abs/2301.08158v2 ) License: see link	Alice L'Huillier, Luke Travis, Ismaël Castillo and Kolyan Ray	We establish a general Bernstein--von Mises theorem for approximately linear semiparametric functionals of fractional posterior distributions based on nonparametric priors. This is illustrated in a number of nonparametric settings and for different classes of prior distributions, including Gaussian process priors. We show that fractional posterior credible sets can provide reliable semiparametric uncertainty quantification, but have inflated size. To remedy this, we further propose a shifted-and-rescaled fractional posterior set that is an efficient confidence set having optimal size under regularity conditions. As part of our proofs, we also refine existing contraction rate results for fractional posteriors by sharpening the dependence of the rate on the fractional exponent.	Translated: 2024-02-07 21:13:53 Published: 2024-02-06
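The statement that fractional posterior credible sets are reliable but inflated can be seen in a toy parametric case. The worked example below assumes a Gaussian location model with known variance and a flat prior, which is far simpler than the semiparametric setting of the paper; the symbols $\alpha$, $\sigma$, $\bar X_n$ and the map $T$ are introduced here for illustration only.

```latex
% Toy model (assumption): X_1,\dots,X_n \sim N(\theta,\sigma^2) i.i.d., flat prior on \theta.
\begin{align*}
\pi_\alpha(\theta \mid X) &\propto \prod_{i=1}^{n} p_\theta(X_i)^{\alpha}
  \propto \exp\!\Big(-\frac{\alpha n}{2\sigma^2}\,(\theta - \bar X_n)^2\Big),
  \qquad \alpha \in (0,1), \\
\pi_\alpha(\cdot \mid X) &= N\!\Big(\bar X_n,\ \frac{\sigma^2}{\alpha n}\Big), \\
\text{95\% credible interval: } & \bar X_n \pm 1.96\,\frac{\sigma}{\sqrt{\alpha n}}
  \quad \text{versus the efficient } \bar X_n \pm 1.96\,\frac{\sigma}{\sqrt{n}}, \\
T(\theta) &= \bar X_n + \sqrt{\alpha}\,(\theta - \bar X_n)
  \;\Longrightarrow\; T_{\#}\,\pi_\alpha(\cdot \mid X) = N\!\Big(\bar X_n,\ \frac{\sigma^2}{n}\Big).
\end{align*}
```

In this toy case the fractional interval is too wide by a factor $\alpha^{-1/2}$, and recentring at the posterior mean and rescaling by $\sqrt{\alpha}$ restores the efficient width.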
# Optimal Regularization for a Data Source ( http://arxiv.org/abs/2212.13597v3 ) License: see link	Oscar Leong, Eliza O'Reilly, Yong Sheng Soh and Venkat Chandrasekaran	In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structural properties in the solution. The choice of a suitable regularizer is typically driven by a combination of prior domain information and computational considerations. Convex regularizers are attractive computationally but they are limited in the types of structure they can promote. On the other hand, nonconvex regularizers are more flexible in the forms of structure they can promote and they have showcased strong empirical performance in some applications, but they come with the computational challenge of solving the associated optimization problems. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what is the optimal regularizer for data drawn from the distribution? What properties of a data source govern whether the optimal regularizer is convex? We address these questions for the class of regularizers specified by functionals that are continuous, positively homogeneous, and positive away from the origin. We say that a regularizer is optimal for a data distribution if the Gibbs density with energy given by the regularizer maximizes the population likelihood (or equivalently, minimizes cross-entropy loss) over all regularizer-induced Gibbs densities. As the regularizers we consider are in one-to-one correspondence with star bodies, we leverage dual Brunn-Minkowski theory to show that a radial function derived from a data distribution is akin to a "computational sufficient statistic" as it is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization.	Translated: 2024-02-07 21:13:41 Published: 2024-02-06
# Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards ( http://arxiv.org/abs/2304.14989v3 ) License: see link	Hao Qin, Kwang-Sung Jun and Chicheng Zhang	We study $K$-armed bandit problems where the reward distributions of the arms are all supported on the $[0,1]$ interval. It has been a challenge to design regret-efficient randomized exploration algorithms in this setting. Maillard sampling~\cite{maillard13apprentissage}, an attractive alternative to Thompson sampling, has recently been shown to achieve competitive regret guarantees in the sub-Gaussian reward setting~\cite{bian2022maillard} while maintaining closed-form action probabilities, which is useful for offline policy evaluation. In this work, we propose the Kullback-Leibler Maillard Sampling (KL-MS) algorithm, a natural extension of Maillard sampling that achieves a KL-style gap-dependent regret bound. We show that KL-MS enjoys asymptotic optimality when the rewards are Bernoulli and has a worst-case regret bound of the form $O(\sqrt{\mu^*(1-\mu^*) K T \ln K} + K \ln T)$, where $\mu^*$ is the expected reward of the optimal arm, and $T$ is the time horizon length.	Translated: 2024-02-07 21:04:58 Published: 2024-02-06
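The closed-form action probabilities mentioned in the abstract above can be illustrated with a short sketch. The abstract does not spell out the sampling rule, so the code assumes the form $p_a \propto \exp(-N_a \cdot \mathrm{kl}(\hat\mu_a, \hat\mu_{\max}))$, where $N_a$ is the pull count of arm $a$, $\hat\mu_a$ its empirical mean, and $\mathrm{kl}$ the binary relative entropy. The function names are hypothetical.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """Binary relative entropy kl(p, q), with inputs clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ms_probabilities(counts, means):
    """Assumed KL-MS rule: weight each arm by exp(-N_a * kl(mean_a, best mean))."""
    mu_max = max(means)
    weights = [math.exp(-n * bernoulli_kl(m, mu_max))
               for n, m in zip(counts, means)]
    total = sum(weights)
    return [w / total for w in weights]
```

The empirically best arm always gets weight 1 before normalisation, and an arm's probability decays exponentially in both its pull count and its KL gap to the best arm. Because the probabilities are available in closed form, they can be logged alongside each action, which is what makes this family convenient for offline policy evaluation.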
# MUDiff: Unified Diffusion for Complete Molecule Generation ( http://arxiv.org/abs/2304.14621v3 ) License: see link	Chenqing Hua, Sitao Luan, Minkai Xu, Rex Ying, Jie Fu, Stefano Ermon, Doina Precup	Molecule generation is a very important practical problem, with uses in drug discovery and material design, and AI methods promise to provide useful solutions. However, existing methods for molecule generation focus either on 2D graph structure or on 3D geometric structure, which is not sufficient to represent a complete molecule as 2D graph captures mainly topology while 3D geometry captures mainly spatial atom arrangements. Combining these representations is essential to better represent a molecule. In this paper, we present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates, by combining discrete and continuous diffusion processes. The use of diffusion processes allows for capturing the probabilistic nature of molecular processes and exploring the effect of different factors on molecular structures. Additionally, we propose a novel graph transformer architecture to denoise the diffusion process. The transformer adheres to 3D roto-translation equivariance constraints, allowing it to learn invariant atom and edge representations while preserving the equivariance of atom coordinates. This transformer can be used to learn molecular representations robust to geometric transformations. We evaluate the performance of our model through experiments and comparisons with existing methods, showing its ability to generate more stable and valid molecules. Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.	Translated: 2024-02-07 21:04:25 Published: 2024-02-06
# Expand-and-Cluster: Parameter Recovery of Neural Networks ( http://arxiv.org/abs/2304.12794v3 ) License: see link	Flavio Martinelli, Berfin Simsek, Wulfram Gerstner and Johanni Brea	Can we identify the parameters of a neural network by probing its input-output mapping? Usually, there is no unique solution because of permutation, overparameterisation and activation function symmetries. Yet, we show that the incoming weight vector of each neuron is identifiable up to sign or scaling, depending on the activation function. For all commonly used activation functions, our novel method 'Expand-and-Cluster' identifies the size and parameters of a target network in two phases: (i) to relax the non-convexity of the problem, we train multiple student networks of expanded size to imitate the mapping of the target network; (ii) to identify the target network, we employ a clustering procedure and uncover the weight vectors shared between students. We demonstrate successful parameter and size recovery of trained shallow and deep networks with less than 10% overhead in the neuron number and describe an 'ease-of-identifiability' axis by analysing 150 synthetic problems of variable difficulty.	Translated: 2024-02-07 21:04:00 Published: 2024-02-06
# Estimation of sparse linear regression coefficients under $L$-subexponential covariates ( http://arxiv.org/abs/2304.11958v2 ) License: see link	Takeyuki Sasai	We tackle estimating sparse coefficients in a linear regression when the covariates are sampled from an $L$-subexponential random vector. This vector belongs to a class of distributions that exhibit heavier tails than Gaussian random vectors. Previous studies have established error bounds similar to those derived for Gaussian random vectors. However, these methods require stronger conditions than those used for Gaussian random vectors to derive the error bounds. In this study, we present an error bound identical to the one obtained for Gaussian random vectors up to constant factors without imposing stronger conditions, when the covariates are drawn from an $L$-subexponential random vector. Interestingly, we employ an $\ell_1$-penalized Huber regression, which is known for its robustness against heavy-tailed random noises rather than covariates. We believe that this study uncovers a new aspect of the $\ell_1$-penalized Huber regression method.	Translated: 2024-02-07 21:03:43 Published: 2024-02-06
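The estimator named in the abstract above, $\ell_1$-penalized Huber regression, minimises $\frac{1}{n}\sum_i H_\delta(y_i - x_i^\top\beta) + \lambda\|\beta\|_1$, where $H_\delta$ is the Huber loss. The sketch below solves it by proximal gradient descent (ISTA) in plain Python. The solver choice, the step-size rule, and all function names are assumptions made for illustration; the abstract does not say how the estimator is computed or tuned.

```python
def huber_grad(r, delta):
    """Derivative of the Huber loss in the residual r: r clipped to [-delta, delta]."""
    return max(-delta, min(delta, r))


def soft_threshold(z, t):
    """Proximal operator of t * |.| (shrinks z towards zero by t)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def l1_huber_regression(X, y, lam, delta=1.0, iters=500):
    """ISTA for (1/n) * sum_i Huber_delta(y_i - x_i . beta) + lam * ||beta||_1."""
    n, d = len(X), len(X[0])
    # Safe step size: the smooth part has gradient Lipschitz constant
    # at most (1/n) * sum of squared entries of X.
    step = n / sum(v * v for row in X for v in row)
    beta = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for row, yi in zip(X, y):
            r = yi - sum(xj * bj for xj, bj in zip(row, beta))
            g = huber_grad(r, delta)
            for j in range(d):
                grad[j] -= g * row[j] / n
        beta = [soft_threshold(b - step * gj, step * lam)
                for b, gj in zip(beta, grad)]
    return beta
```

Clipping the residual in `huber_grad` is what bounds the influence of any single observation on the gradient, and the soft-thresholding step is what produces exact zeros in the coefficient vector.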
# Scaling Transformer to 1M tokens and beyond with RMT ( http://arxiv.org/abs/2304.11062v2 ) License: see link	Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev	A major limitation for the broader scope of problems solvable by transformers is the quadratic scaling of computational complexity with input size. In this study, we investigate the recurrent memory augmentation of pre-trained transformer models to extend input context length while linearly scaling compute. Our approach demonstrates the capability to store information in memory for sequences of up to an unprecedented two million tokens while maintaining high retrieval accuracy. Experiments with language modeling tasks show perplexity improvement as the number of processed input segments increases. These results underscore the effectiveness of our method, which has significant potential to enhance long-term dependency handling in natural language understanding and generation tasks, as well as enable large-scale context processing for memory-intensive applications.	Translated: 2024-02-07 21:03:27 Published: 2024-02-06
# Along the Margins: Marginalized Communities' Ethical Concerns about Social Platforms ( http://arxiv.org/abs/2304.08882v2 ) License: see link	Lauren Olson and Emitzá Guzmán and Florian Kunneman	In this paper, we identified marginalized communities' ethical concerns about social platforms. We performed this identification because recent platform malfeasance indicates that software teams prioritize shareholder concerns over user concerns. Additionally, these platform shortcomings often have devastating effects on marginalized populations. We first scraped 586 marginalized communities' subreddits, aggregated a dataset of their social platform mentions and manually annotated mentions of ethical concerns in these data. We subsequently analyzed trends in the manually annotated data and tested the extent to which ethical concerns can be automatically classified by means of natural language processing (NLP). We found that marginalized communities' ethical concerns predominantly revolve around discrimination and misrepresentation, and reveal deficiencies in current software development practices. As such, researchers and developers could use our work to further investigate these concerns and rectify current software flaws.	Translated: 2024-02-07 21:02:57 Published: 2024-02-06
# Feudal Graph Reinforcement Learning ( http://arxiv.org/abs/2304.05099v2 ) License: see link	Tommaso Marzi, Arshjot Khehra, Andrea Cini, Cesare Alippi	Graph-based representations and weight-sharing modular policies constitute prominent approaches to tackling composable control problems in Reinforcement Learning (RL). However, as shown by recent graph deep learning literature, message-passing operators can create bottlenecks in information propagation and hinder global coordination. The issue becomes dramatic in tasks where high-level planning is needed. In this work, we propose a novel methodology, named Feudal Graph Reinforcement Learning (FGRL), that addresses such challenges by relying on hierarchical RL and a pyramidal message-passing architecture. In particular, FGRL defines a hierarchy of policies where high-level commands are propagated from the top of the hierarchy down through a layered graph structure. The bottom layers mimic the morphology of the physical system, while the upper layers capture more abstract sub-modules. The resulting agents are then characterized by a committee of policies where actions at a certain level set goals for the level below, thus implementing a hierarchical decision-making structure that encompasses task decomposition. We evaluate the proposed framework on locomotion tasks on benchmark MuJoCo environments and show that FGRL compares favorably against relevant baselines. Furthermore, an in-depth analysis of the command propagation mechanism provides evidence that the introduced message-passing scheme favors the learning of hierarchical decision-making policies.	Translated: 2024-02-07 21:02:27 Published: 2024-02-06
# Quantifying the Academic Quality of Children's Videos using Machine Comprehension ( http://arxiv.org/abs/2303.17201v2 ) License: see link	Sumeet Kumar, Mallikarjuna T., Ashiqur Khudabukhsh	YouTube Kids (YTK) is one of the most popular kids' applications used by millions of kids daily. However, various studies have highlighted concerns about the videos on the platform, like the over-presence of entertaining and commercial content. YouTube recently proposed high-quality guidelines that include 'promoting learning' and proposed to use it in ranking channels. However, the concept of learning is multi-faceted, and it can be difficult to define and measure in the context of online videos. This research focuses on learning in terms of what's taught in schools and proposes a way to measure the academic quality of children's videos. Using a new dataset of questions and answers from children's videos, we first show that a Reading Comprehension (RC) model can estimate academic learning. Then, using a large dataset of middle school textbook questions on diverse topics, we quantify the academic quality of top channels as the number of children's textbook questions that an RC model can correctly answer. By analyzing over 80,000 videos posted on the top 100 channels, we present the first thorough analysis of the academic quality of channels on YTK.	Translated: 2024-02-07 21:01:28 Published: 2024-02-06
# Stabilizer Codes with Exotic Local-dimensions ( http://arxiv.org/abs/2303.17000v2 ) License: see link	Lane G. Gunderman	Traditional stabilizer codes operate over prime power local-dimensions. In this work we extend the stabilizer formalism using the local-dimension-invariant setting to import stabilizer codes from these standard local-dimensions to other cases. In particular, we show that any traditional stabilizer code can be used for analog continuous-variable codes, and consider restrictions in phase space and discretized phase space. This puts this framework on an equivalent footing as traditional stabilizer codes. Following this, using extensions of prior ideas, we show that a stabilizer code originally designed with a finite field local-dimension can be transformed into a code with the same $n$, $k$, and $d$ parameters for any integral domain. This is of theoretical interest and can be of use for systems whose local-dimension is better described by mathematical rings, which permits the use of traditional stabilizer codes for protecting their information as well.	Translated: 2024-02-07 21:01:08 Published: 2024-02-06
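Why a stabilizer code does not carry over to another local dimension automatically can be shown with the standard symplectic commutation test. A generalized Pauli $X^{a}Z^{b}$ on $n$ qudits is written as a pair of integer vectors $(a \mid b)$, and two such operators commute on qudits of dimension $q$ exactly when their symplectic product vanishes modulo $q$. The sketch below treats local-dimension invariance as the product vanishing over the integers, which is an assumed reading of the term used in the abstract; the function names are hypothetical.

```python
def symplectic_product(p1, p2):
    """Integer symplectic product of two Paulis given as (x_vector, z_vector) pairs."""
    (x1, z1), (x2, z2) = p1, p2
    return (sum(a * b for a, b in zip(x1, z2))
            - sum(a * b for a, b in zip(z1, x2)))


def commute(p1, p2, q):
    """Two generalized Paulis commute on dimension-q qudits iff the product is 0 mod q."""
    return symplectic_product(p1, p2) % q == 0


XX = ((1, 1), (0, 0))        # X tensor X
ZZ = ((0, 0), (1, 1))        # Z tensor Z
ZZ_INV = ((0, 0), (1, -1))   # Z tensor Z^{-1}
```

With these definitions, `XX` and `ZZ` commute for qubits (the product is 2, which is 0 mod 2) but not for qutrits. Replacing `ZZ` by `ZZ_INV`, which is the same operator up to phase on qubits, makes the product 0 over the integers, so the pair commutes for every `q`.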
# Learnable Graph Matching: A Practical Paradigm for Data Association ( http://arxiv.org/abs/2303.15414v2 ) License: see link	Jiawei He, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang	Data association is at the core of many computer vision tasks, e.g., multiple object tracking, image matching, and point cloud registration. However, current data association solutions have some defects: they mostly ignore the intra-view context information; besides, they either train deep association models in an end-to-end way and hardly utilize the advantage of optimization-based assignment methods, or only use an off-the-shelf neural network to extract features. In this paper, we propose a general learnable graph matching method to address these issues. Especially, we model the intra-view relationships as an undirected graph. Then data association turns into a general graph matching problem between graphs. Furthermore, to make optimization end-to-end differentiable, we relax the original graph matching problem into continuous quadratic programming and then incorporate training into a deep graph neural network with KKT conditions and implicit function theorem. In MOT task, our method achieves state-of-the-art performance on several MOT datasets. For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet. For point cloud registration, we also achieve competitive results. Code will be available at https://github.com/jiaweihe1996/GMTracker.	Translated: 2024-02-07 21:00:54 Published: 2024-02-06
# Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning ( http://arxiv.org/abs/2305.19600v3 ) License: see link	M.Yashwanth, Gaurav Kumar Nayak, Arya Singh, Yogesh Simmhan, Anirban Chakraborty	Federated Learning (FL) is a machine learning paradigm that enables clients to jointly train a global model by aggregating the locally trained models without sharing any local training data. In practice, there can often be substantial heterogeneity (e.g., class imbalance) across the local data distributions observed by each of these clients. Under such non-iid data distributions across clients, FL suffers from the 'client-drift' problem where every client drifts to its own local optimum. This results in slower convergence and poor performance of the aggregated model. To address this limitation, we propose a novel regularization technique based on adaptive self-distillation (ASD) for training models on the client side. Our regularization scheme adaptively adjusts to the client's training data based on the global model entropy and the client's label distribution. The proposed regularization can be easily integrated atop existing, state-of-the-art FL algorithms, leading to a further boost in the performance of these off-the-shelf methods. We theoretically explain how ASD reduces client-drift and also explain its generalization ability. We demonstrate the efficacy of our approach through extensive experiments on multiple real-world benchmarks and show substantial gains in performance over state-of-the-art methods.	Translated: 2024-02-07 20:53:10 Published: 2024-02-06
# Smooth, exact rotational symmetrization for deep learning on point clouds ( http://arxiv.org/abs/2305.19302v3 ) License: see link	Sergey N. Pozdnyakov and Michele Ceriotti	Point clouds are versatile representations of 3D objects and have found widespread application in science and engineering. Many successful deep-learning models have been proposed that use them as input. The domain of chemical and materials modeling is especially challenging because exact compliance with physical constraints is highly desirable for a model to be usable in practice. These constraints include smoothness and invariance with respect to translations, rotations, and permutations of identical atoms. If these requirements are not rigorously fulfilled, atomistic simulations might lead to absurd outcomes even if the model has excellent accuracy. Consequently, dedicated architectures, which achieve invariance by restricting their design space, have been developed. General-purpose point-cloud models are more varied but often disregard rotational symmetry. We propose a general symmetrization method that adds rotational equivariance to any given model while preserving all the other requirements. Our approach simplifies the development of better atomic-scale machine-learning schemes by relaxing the constraints on the design space and making it possible to incorporate ideas that proved effective in other domains. We demonstrate this idea by introducing the Point Edge Transformer (PET) architecture, which is not intrinsically equivariant but achieves state-of-the-art performance on several benchmark datasets of molecules and solids. A-posteriori application of our general protocol makes PET exactly equivariant, with minimal changes to its accuracy.	Translated: 2024-02-07 20:52:50 Published: 2024-02-06
# Inverse Approximation Theory for Nonlinear Recurrent Neural Networks ( http://arxiv.org/abs/2305.19190v4 ) License: see link	Shida Wang, Zhong Li and Qianxiao Li	We prove an inverse approximation theorem for the approximation of nonlinear sequence-to-sequence relationships using recurrent neural networks (RNNs). This is a so-called Bernstein-type result in approximation theory, which deduces properties of a target function under the assumption that it can be effectively approximated by a hypothesis space. In particular, we show that nonlinear sequence relationships that can be stably approximated by nonlinear RNNs must have an exponentially decaying memory structure - a notion that can be made precise. This extends the previously identified curse of memory in linear RNNs into the general nonlinear setting, and quantifies the essential limitations of the RNN architecture for learning sequential relationships with long-term memory. Based on the analysis, we propose a principled reparameterization method to overcome the limitations. Our theoretical results are confirmed by numerical experiments. The code has been released at https://github.com/radarFudan/Curse-of-memory	Translated: 2024-02-07 20:52:17 Published: 2024-02-06
# Better Batch for Deep Probabilistic Time Series Forecasting ( http://arxiv.org/abs/2305.17028v3 ) License: see link	Vincent Zhihao Zheng, Seongjin Choi, Lijun Sun	Deep probabilistic time series forecasting has gained attention for its superior performance in nonlinear approximation and its capability to offer valuable uncertainty quantification for decision-making. However, existing models often oversimplify the problem by assuming a time-independent error process, overlooking serial correlation. To overcome this limitation, we propose an innovative training method that incorporates error autocorrelation to enhance probabilistic forecasting accuracy. Our method constructs a mini-batch as a collection of $D$ consecutive time series segments for model training. It explicitly learns a time-varying covariance matrix over each mini-batch, encoding error correlation among adjacent time steps. The learned covariance matrix can be used to improve prediction accuracy and enhance uncertainty quantification. We evaluate our method on two different neural forecasting models and multiple public datasets. Experimental results confirm the effectiveness of the proposed approach in improving the performance of both models across a range of datasets, resulting in notable improvements in predictive accuracy.	Translated: 2024-02-07 20:52:03 Published: 2024-02-06
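The effect of dropping the time-independent error assumption can be illustrated with a small likelihood computation. The paper learns a time-varying covariance matrix over each mini-batch; the sketch below replaces that with a fixed stationary AR(1) error model (a deliberate simplification, with hypothetical names `sigma` and `rho`), because its Gaussian likelihood factorises into one-step conditionals and needs no matrix inversion.

```python
import math


def ar1_gaussian_nll(errors, sigma, rho):
    """Negative log-likelihood of consecutive forecast errors under a
    stationary Gaussian AR(1) model with marginal std sigma and lag-1
    correlation rho. With rho = 0 this is the usual independent loss.
    """
    # First error: marginal N(0, sigma^2)
    nll = 0.5 * math.log(2 * math.pi * sigma ** 2) + errors[0] ** 2 / (2 * sigma ** 2)
    # Later errors: e_t | e_{t-1} ~ N(rho * e_{t-1}, sigma^2 * (1 - rho^2))
    cond_var = sigma ** 2 * (1 - rho ** 2)
    for prev, cur in zip(errors, errors[1:]):
        innovation = cur - rho * prev
        nll += 0.5 * math.log(2 * math.pi * cond_var) + innovation ** 2 / (2 * cond_var)
    return nll
```

When the errors of adjacent time steps are positively correlated, as in a run such as `[1.0, 0.9, 0.8]`, the correlated model assigns them a lower negative log-likelihood than the independent one. That gap is the training signal a correlation-aware loss can exploit.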
# Size Generalization of Graph Neural Networks on Biological Data: Insights and Practices from the Spectral Perspective ( http://arxiv.org/abs/2305.15611v3 ) License: see link	Gaotang Li, Yujun Yan, Danai Koutra	We investigate size-induced distribution shifts in graphs and assess their impact on the ability of graph neural networks (GNNs) to generalize to larger graphs relative to the training data. Existing literature presents conflicting conclusions on GNNs' size generalizability, primarily due to disparities in application domains and underlying assumptions concerning size-induced distribution shifts. Motivated by this, we take a data-driven approach: we focus on real biological datasets and seek to characterize the types of size-induced distribution shifts. Diverging from prior approaches, we adopt a spectral perspective and identify that spectrum differences induced by size are related to differences in subgraph patterns (e.g., average cycle lengths). While previous studies have identified that the inability of GNNs in capturing subgraph information negatively impacts their in-distribution generalization, our findings further show that this decline is more pronounced when evaluating on larger test graphs not encountered during training. Based on these spectral insights, we introduce a simple yet effective model-agnostic strategy, which makes GNNs aware of these important subgraph patterns to enhance their size generalizability. Our empirical results reveal that our proposed size-insensitive attention strategy substantially enhances graph classification performance on large test graphs, which are 2-10 times larger than the training graphs, resulting in an improvement in F1 scores by up to 8%.	Translated: 2024-02-07 20:51:49 Published: 2024-02-06
# 熱力学量を用いた多体絡み合いの実験的検証 Experimental Verification of Many-Body Entanglement Using Thermodynamic Quantities ( http://arxiv.org/abs/2305.15012v2 ) ライセンス: Link先を確認	Jitendra Joshi, Mir Alimuddin, T S Mahesh, Manik Banik	(参考訳) 量子絡み合いの現象は、新しい量子技術を可能にするいくつかの重要なプロトコルの下にある。しかし、絡み合った状態は極めて繊細であり、しばしば外部環境の小さなゆらぎによって摂動する。したがって、このリソースを含むプロトコルの実装を成功させるには、絡み合いの証明が極めて重要である。本研究では,ある種の熱力学量を測定することで容易に検証できるマルチキュービットシステムの絡み合い基準を提案する。特に、この基準は、それぞれ大域的および局所的な相互作用の下で孤立量子系から抽出可能な最適な大域的および局所的な作業の差に依存する。原理の証明として,核磁気共鳴(NMR)アーキテクチャを用いて最大10量子ビットの核スピンレジスタに関する提案手法を実証する。我々は、スター型トポロジー系におけるノイズの多いベル対角状態とノイズの多いグリーンバーガー・ホーン・ツァイリンガー(GHZ)クラスの状態を作成し、熱力学の基準によってそれらの絡み合いを認証する。また, 多体システムにおいて, 状態に関する知識が部分的あるいは全く存在しない場合にも, 絡み合い認証方式を提案する。 The phenomenon of quantum entanglement underlies several important protocols that enable emerging quantum technologies. Entangled states, however, are extremely delicate and often get perturbed by tiny fluctuations in their external environment. Certification of entanglement is therefore crucial for the successful implementation of protocols involving this resource. In this work, we propose a set of entanglement criteria for multi-qubit systems that can be easily verified by measuring certain thermodynamic quantities. In particular, the criteria depend on the difference in optimal global and local works extractable from an isolated quantum system under global and local interactions, respectively. As a proof of principle, we demonstrate the proposed scheme on nuclear spin registers of up to 10 qubits using the Nuclear Magnetic Resonance architecture. We prepare noisy Bell-diagonal states and noisy Greenberger-Horne-Zeilinger-class states in star-topology systems and certify their entanglement through our thermodynamic criteria. Along the same line, we also propose an entanglement certification scheme in many-body systems when only partial or even no knowledge about the state is available.	翻訳日:2024-02-07 20:51:23 公開日:2024-02-06
# 基準自由画像キャプション評価指標のロバスト性に関する検討 An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics ( http://arxiv.org/abs/2305.14998v2 ) ライセンス: Link先を確認	Saba Ahmadi, Aishwarya Agrawal	(参考訳) 近年,CLIPScore (Hessel et al., 2021), UMIC (Lee et al., 2021), PAC-S (Sarto et al., 2023) などの参照フリー指標が画像キャプションの自動参照フリー評価のために提案されている。我々の焦点は、語彙の重なりが大きい2つのキャプションを区別する必要があるシナリオにおいて、これらの指標の堅牢性を評価することである。以上の結果から,クリップスコア,umic,pac-sは,人間の判断と高い相関関係にあるものの,きめ細かい誤りの特定に苦慮していることが明らかとなった。すべての指標は視覚的な接地誤差に対して強い感度を示すが、キャプションに対する感受性は限定的である。さらに,すべての指標がキャプション内の画像関連物の大きさの変動に敏感であり,CLIPScoreとPAC-Sもキャプション内の画像関連物への言及数に敏感であることがわかった。キャプションの言語的側面については,すべての指標が否定の弱い理解を示し,CLIPScoreとPAC-Sはキャプションの構造に非常に敏感である。画像キャプションの非参照評価のさらなる改善が期待できる。 Recently, reference-free metrics such as CLIPScore (Hessel et al., 2021), UMIC (Lee et al., 2021), and PAC-S (Sarto et al., 2023) have been proposed for automatic reference-free evaluation of image captions. Our focus lies in evaluating the robustness of these metrics in scenarios that require distinguishing between two captions with high lexical overlap but very different meanings. Our findings reveal that despite their high correlation with human judgments, CLIPScore, UMIC, and PAC-S struggle to identify fine-grained errors. While all metrics exhibit strong sensitivity to visual grounding errors, their sensitivity to caption implausibility errors is limited. Furthermore, we found that all metrics are sensitive to variations in the size of image-relevant objects mentioned in the caption, while CLIPScore and PAC-S are also sensitive to the number of mentions of image-relevant objects in the caption. Regarding linguistic aspects of a caption, all metrics show weak comprehension of negation, and CLIPScore and PAC-S are insensitive to the structure of the caption to a great extent. We hope our findings will guide further improvements in reference-free evaluation of image captioning.	翻訳日:2024-02-07 20:51:06 公開日:2024-02-06
# 不均衡最適輸送の半二重定式化による生成モデル Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport ( http://arxiv.org/abs/2305.14777v3 ) ライセンス: Link先を確認	Jaemoo Choi, Jaewoong Choi, Myungjoo Kang	(参考訳) 最適輸送(OT)問題は、与えられたコスト関数を最小化しながら2つの分布をブリッジする輸送マップを調べる。この点において、扱いやすい事前分布とデータの間のOTは生成的モデリングタスクに利用されてきた。しかし、OTベースの手法は、外れ値の影響を受けやすく、トレーニング中に最適化の課題に直面する。本稿では,不均衡最適輸送(UOT)の半二重定式化に基づく新しい生成モデルを提案する。 OTとは異なり、UOTは分布マッチングの厳しい制約を緩和する。このアプローチは、外れ値に対する堅牢性、トレーニング中の安定性、より高速な収束を提供する。これらの特性を実験的に検証する。さらに,UOTにおける分布間の分岐の理論的上界について検討した。 CIFAR-10ではFIDスコアが2.97、CelebA-HQ-256では6.36である。コードは https://github.com/Jae-Moo/UOTM で入手できる。 Optimal Transport (OT) problem investigates a transport map that bridges two distributions while minimizing a given cost function. In this regard, OT between tractable prior distribution and data has been utilized for generative modeling tasks. However, OT-based methods are susceptible to outliers and face optimization challenges during training. In this paper, we propose a novel generative model based on the semi-dual formulation of Unbalanced Optimal Transport (UOT). Unlike OT, UOT relaxes the hard constraint on distribution matching. This approach provides better robustness against outliers, stability during training, and faster convergence. We validate these properties empirically through experiments. Moreover, we study the theoretical upper-bound of divergence between distributions in UOT. Our model outperforms existing OT-based generative models, achieving FID scores of 2.97 on CIFAR-10 and 6.36 on CelebA-HQ-256. The code is available at https://github.com/Jae-Moo/UOTM.	翻訳日:2024-02-07 20:50:39 公開日:2024-02-06
# DirecT2V:大言語モデルはゼロショットテキスト・ビデオ生成のためのフレームレベルディレクトリである DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation ( http://arxiv.org/abs/2305.14330v3 ) ライセンス: Link先を確認	Susung Hong, Junyoung Seo, Heeseong Shin, Sunghwan Hong, Seungryong Kim	(参考訳) AIGC(AIGC)のパラダイムでは、事前訓練されたテキスト・トゥ・イメージ(T2I)モデルからテキスト・トゥ・ビデオ(T2V)生成への知識の移行に注目が集まっている。その効果にもかかわらず、これらのフレームワークは、一貫性のある物語を維持し、単一の抽象ユーザプロンプトからシーン構成やオブジェクト配置のシフトを処理する上での課題に直面している。大規模言語モデル(LLM)が時間依存のフレーム単位のプロンプトを生成する能力について検討し,新しいフレームワークであるDirecT2Vを提案する。 DirecT2Vは命令で調整されたLCMをディレクターとして利用し、時間変化のあるコンテンツを含め、一貫したビデオ生成を容易にする。時間的一貫性を保ち、異なるオブジェクトへの値のマッピングを防止するため、新たな値マッピング法と、追加のトレーニングを必要としないデュアルソフトマックスフィルタリングを拡散モデルに装備する。実験結果は,抽象的ユーザのプロンプトから視覚的にコヒーレントかつストーリーフルな映像を生成できるフレームワークの有効性を検証し,ゼロショットビデオ生成の課題への対処に成功した。 In the paradigm of AI-generated content (AIGC), there has been increasing attention to transferring knowledge from pre-trained text-to-image (T2I) models to text-to-video (T2V) generation. Despite their effectiveness, these frameworks face challenges in maintaining consistent narratives and handling shifts in scene composition or object placement from a single abstract user prompt. Exploring the ability of large language models (LLMs) to generate time-dependent, frame-by-frame prompts, this paper introduces a new framework, dubbed DirecT2V. DirecT2V leverages instruction-tuned LLMs as directors, enabling the inclusion of time-varying content and facilitating consistent video generation. To maintain temporal consistency and prevent mapping the value to a different object, we equip a diffusion model with a novel value mapping method and dual-softmax filtering, which do not require any additional training. The experimental results validate the effectiveness of our framework in producing visually coherent and storyful videos from abstract user prompts, successfully addressing the challenges of zero-shot video generation.	翻訳日:2024-02-07 20:50:25 公開日:2024-02-06
# プライベート微調整のための選択的事前学習 Selective Pre-training for Private Fine-tuning ( http://arxiv.org/abs/2305.13865v2 ) ライセンス: Link先を確認	Da Yu, Sivakanth Gopi, Janardhan Kulkarni, Zinan Lin, Saurabh Naik, Tomasz Lukasz Religa, Jian Yin, Huishuai Zhang	(参考訳) 電子メールクライアントやワードプロセッサでテキスト予測モデルをトレーニングしたいとします。これらのモデルは、1時間に数十億の予測を処理し、ユーザデータのプライバシを保持し、メモリ、推論時間要件を満たし、推論コストを削減するために、特定のモデルサイズ制約に準拠しなければならない。小さく、速く、プライベートなドメイン固有言語モデルを構築することは、活発な研究分野である。本稿では,プライベートデータセットに導かれる公開データセットの「サブセット」上での注意深い事前トレーニングが,小さなDP言語モデルのトレーニングに不可欠であることを示す。標準ベンチマークでは、我々の新しいフレームワークでトレーニングされたモデルは最先端のパフォーマンスを実現し、文献のすべてのベースラインを改善する。パフォーマンスの改善に加えて、我々のフレームワークは、注意深い事前トレーニングとプライベートな微調整により、より小さなモデルは、プライベートデータにアクセスできないはるかに大きなモデルの性能と一致し、モデル圧縮と効率のツールとしてのプライベートラーニングの約束を強調します。医療、金融など多くのアプリケーションでは、プライベートデータセットは通常、公開データセットよりもはるかに高品質であり、本研究は、パイプライントレーニングのすべての段階でプライベートデータセットを活用する新しい方法を示し、ディープラーニング効率を向上させる。私たちのフレームワークをベースとした言語モデルは、1日に数十億ドルの予測(そして推論コストの面で数百万ドルを節約)を提供する複数の実世界のデプロイメントで使われてきました。 Suppose we want to train text prediction models in email clients or word processors. These models, which serve billions of predictions per hour, must preserve the privacy of user data and adhere to specific model size constraints to meet memory, inference time requirements, and to reduce inference cost. Building small, fast, and private domain-specific language models is a thriving area of research. In this work, we show that a careful pre-training on a subset of the public dataset that is guided by the private dataset is crucial to train small DP language models. On standard benchmarks, models trained with our new framework achieve state-of-the-art performance, improving upon all the baselines from the literature. Besides performance improvements, our framework also shows that with careful pre-training and private fine-tuning, smaller models can match the performance of much larger models that do not have access to private data, highlighting the promise of private learning as a tool for model compression and efficiency. 
In many applications such as health care, finance, etc., private datasets are usually of much higher quality than public datasets, and our work shows novel ways of utilizing private datasets at all stages of the training pipeline to improve deep learning efficiency. Language models based on our framework have been used in multiple real-world deployments serving billions of predictions per day (and saving millions of dollars in terms of inference cost), highlighting the general applicability of our framework beyond academic benchmarks.	翻訳日:2024-02-07 20:50:00 公開日:2024-02-06
# 2回考える:質問応答モデルの予測ショートカットをなくす効率を計測する Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models ( http://arxiv.org/abs/2305.06841v2 ) ライセンス: Link先を確認	Lukáš Mikula, Michal Štefánik, Marek Petrovič, Petr Sojka	(参考訳) 大規模な言語モデル(LLM)が言語理解タスクの大部分を占める一方で、以前の研究は、これらの結果のいくつかがトレーニングデータセットのスプリアス相関のモデリングによってサポートされていることを示している。著者は一般的に、同じタスクのout-of-distribution(OOD)データセットでモデルを評価することによってモデルのロバスト性を評価するが、これらのデータセットはトレーニングデータセットのバイアスを共有する可能性がある。本稿では,様々な事前学習モデルと問合せ解答法(QA)において,モデルが特定された突発的特徴への依存度を簡易に測定し,既知の予測バイアスと新たに発見された予測バイアスに対するロバスト性を評価する方法を提案する。既存のデバイアス法は、選択された刺激的特徴への依存を軽減することができるが、これらの手法のOOD性能向上は、バイアス付き特徴への依存を緩和することによって説明できないことを示し、異なるQAデータセット間でバイアスが共有されることを示唆している。最後に、異なるQAデータセットでトレーニングされたモデルの性能が、同じバイアス特性に比較可能に依存していることを測定することで、これを証明している。これらの結果は、LMsの堅牢性に関する報告を、特定の突発的特徴に対処する敵のサンプルレベルまで改善する将来の研究の動機となることを願っている。 While the Large Language Models (LLMs) dominate a majority of language understanding tasks, previous work shows that some of these results are supported by modelling spurious correlations of training datasets. Authors commonly assess model robustness by evaluating their models on out-of-distribution (OOD) datasets of the same task, but these datasets might share the bias of the training dataset. We propose a simple method for measuring a scale of models' reliance on any identified spurious feature and assess the robustness towards a large set of known and newly found prediction biases for various pre-trained models and debiasing methods in Question Answering (QA). We find that while existing debiasing methods can mitigate reliance on a chosen spurious feature, the OOD performance gains of these methods cannot be explained by mitigated reliance on biased features, suggesting that biases are shared among different QA datasets. Finally, we evidence this to be the case by measuring that the performance of models trained on different QA datasets relies comparably on the same bias features. 
We hope these results will motivate future work to refine the reports of LMs' robustness to a level of adversarial samples addressing specific spurious features.	翻訳日:2024-02-07 20:49:36 公開日:2024-02-06
# 自己注意力学におけるクラスターの出現 The emergence of clusters in self-attention dynamics ( http://arxiv.org/abs/2305.05465v5 ) ライセンス: Link先を確認	Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet	(参考訳) 相互作用する粒子系としてトランスフォーマーを見ることにより,重みが時間に依存しない場合の学習表現の幾何学を記述する。トークンを表す粒子は、時間とともに無限大となるため、特定の制限対象に向かって集結する傾向にある。クラスタ位置は初期トークンによって決定され、Transformersが学習した表現のコンテキスト認識を確認する。力学系と偏微分方程式の手法を用いて、出現する制限対象の型は値行列のスペクトルに依存することを示した。さらに、一次元の場合、自己着行列が低階ブール行列に収束することを証明する。これらの結果の組み合わせは、vaswaniらによる経験的観察を数学的に確認する。 [VSP'17]トランスフォーマーによって処理されると、リーダーが一連のトークンに現れる。 Viewing Transformers as interacting particle systems, we describe the geometry of learned representations when the weights are not time dependent. We show that particles, representing tokens, tend to cluster toward particular limiting objects as time tends to infinity. Cluster locations are determined by the initial tokens, confirming context-awareness of representations learned by Transformers. Using techniques from dynamical systems and partial differential equations, we show that the type of limiting object that emerges depends on the spectrum of the value matrix. Additionally, in the one-dimensional case we prove that the self-attention matrix converges to a low-rank Boolean matrix. The combination of these results mathematically confirms the empirical observation made by Vaswani et al. [VSP'17] that leaders appear in a sequence of tokens when processed by Transformers.	翻訳日:2024-02-07 20:49:12 公開日:2024-02-06
# ランダムコンパイルによるクロストーク誤差の緩和:超伝導量子コンピュータ上でのBCSモデルのシミュレーション Mitigating crosstalk errors by randomized compiling: Simulation of the BCS model on a superconducting quantum computer ( http://arxiv.org/abs/2305.02345v3 ) ライセンス: Link先を確認	Hugo Perrin, Thibault Scoquart, Alexander Shnirman, Jörg Schmalian and Kyrylo Snizhko	(参考訳) 我々は、隣接する量子ビットの特別な処理を含むランダム化コンパイル(RC)プロトコルの拡張を開発し、IBMQ量子コンピュータ(ibm_lagos および ibmq_ehningen)の超伝導量子ビットへの障害ゲートの適用によるクロストーク効果を劇的に低減する。 CNOTの2量子ゲートに由来するクロストークエラーは、多くの量子コンピューティングプラットフォームにおけるエラーの重要な原因である。 IBMQマシンの場合、その大きさは見過ごされることが多い。このRCプロトコルはクロストークによるコヒーレントノイズを非分極ノイズチャネルに変換し,ノイズ推定回路などの確立されたエラー緩和スキームを用いて処理する。超伝導に対するバルディーン・クーパー=シュリーファー(BCS)ハミルトニアン(英語版)の非平衡力学の量子シミュレーションに適用し、クーパー対の長距離相互作用により量子ハードウェア上でのシミュレーションが特に困難である。 135のCNOTゲートでは、トロッター化やキュービットのデコヒーレンスとは対照的に、クロストークがエラーを支配するような方法で作業します。隣り合う量子ビットの回転は、新しい量子ビットや回路を追加する必要なしにノイズ推定プロトコルを劇的に改善し、BCSモデルの定量的シミュレーションを可能にしている。 We develop and apply an extension of the randomized compiling (RC) protocol that includes a special treatment of neighboring qubits and dramatically reduces crosstalk effects caused by the application of faulty gates on superconducting qubits in IBMQ quantum computers (ibm_lagos and ibmq_ehningen). Crosstalk errors, stemming from CNOT two-qubit gates, are a crucial source of errors on numerous quantum computing platforms. For the IBMQ machines, their magnitude is often overlooked. Our RC protocol turns coherent noise due to crosstalk into a depolarising noise channel that can then be treated using established error mitigation schemes, such as noise estimation circuits. We apply our approach to the quantum simulation of the non-equilibrium dynamics of the Bardeen-Cooper-Schrieffer (BCS) Hamiltonian for superconductivity, a particularly challenging model to simulate on quantum hardware because of the long-range interaction of Cooper pairs. 
With 135 CNOT gates, we work in a regime where crosstalk, as opposed to either Trotterization or qubit decoherence, dominates the error. Our twirling of neighboring qubits is shown to dramatically improve the noise estimation protocol without the need to add new qubits or circuits and allows for a quantitative simulation of the BCS model.	翻訳日:2024-02-07 20:49:01 公開日:2024-02-06
# 多値量子ニューロン Multi-Valued Quantum Neurons ( http://arxiv.org/abs/2305.02018v5 ) ライセンス: Link先を確認	M. W. AlMasri	(参考訳) 多値量子論理は、真理値が単位円上に置かれるユニタリのユニークな根として自然に表されるように体系的に定式化される。したがって、多値量子ニューロン(MVQN)は複素数体上の多重値しきい値論理の原理に基づいている。 MVQNの訓練は、単位円に沿った運動に還元される。多値量子ニューロンに基づく量子ニューラルネットワーク(QNN)は、複雑な重み、入力、単位のルートで符号化された出力と、複素平面を単位円にマッピングする活性化関数で構築することができる。このようなニューラルネットワークは、同じ数のニューロンと層を持つバイナリ入力に基づく量子ニューラルネットワークと比較して、高速収束と高機能を享受する。我々の構造は量子系のエネルギースペクトルを分析するのに利用できる。可能な実用的な応用は、光や分子スピンquditsのような多レベル系の軌道角運動量(oam)から構築された量子ニューラルネットワークを用いることができる。 The multiple-valued quantum logic is formulated systematically such that the truth values are represented naturally as unique roots of unity placed on the unit circle. Consequently, multi-valued quantum neuron (MVQN) is based on the principles of multiple-valued threshold logic over the field of complex numbers. The training of MVQN is reduced to the movement along the unit circle. A quantum neural network (QNN) based on multi-valued quantum neurons can be constructed with complex weights, inputs, and outputs encoded by roots of unity and an activation function that maps the complex plane into the unit circle. Such neural networks enjoy fast convergence and higher functionalities compared with quantum neural networks based on binary input with the same number of neurons and layers. Our construction can be used in analyzing the energy spectrum of quantum systems. Possible practical applications can be found using the quantum neural networks built from orbital angular momentum (OAM) of light or multi-level systems such as molecular spin qudits.	翻訳日:2024-02-07 20:48:33 公開日:2024-02-06
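The abstract above describes neurons whose inputs and outputs are roots of unity, with an activation that maps the complex plane onto the unit circle. A minimal classical sketch of such an activation follows. It assumes the standard sector-based rule from multiple-valued threshold logic (the output is the k-th root of unity whose angular sector contains the argument of the weighted sum); the abstract does not spell out the rule, and the function names are hypothetical.

```python
import cmath
import math


def root_of_unity_activation(z, k):
    """Map complex z to the k-th root of unity whose angular sector contains arg(z).

    Assumed rule: sector s covers angles in [2*pi*s/k, 2*pi*(s+1)/k).
    """
    angle = cmath.phase(z) % (2 * math.pi)           # argument folded into [0, 2*pi)
    sector = int(angle // (2 * math.pi / k)) % k     # % k guards the rounding edge case
    return cmath.exp(1j * 2 * math.pi * sector / k)  # point on the unit circle


def multi_valued_neuron(weights, inputs, k):
    """Weighted sum of complex inputs followed by the sector activation."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return root_of_unity_activation(z, k)


out_east = root_of_unity_activation(1 + 0.1j, 4)
out_north = root_of_unity_activation(-1 + 0.1j, 4)
out_south = root_of_unity_activation(0.1 - 1j, 4)
```

With `k = 4` the plane is split into four quadrant-sized sectors, so every output is one of `1`, `i`, `-1`, `-i`. Training in this picture amounts to rotating the weighted sum into the desired sector, which matches the abstract's remark that learning reduces to movement along the unit circle.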
# CroSSL: 潜時マスキングによる時系列のクロスモーダル自己監視学習 CroSSL: Cross-modal Self-Supervised Learning for Time-series through Latent Masking ( http://arxiv.org/abs/2307.16847v2 ) ライセンス: Link先を確認	Shohreh Deldari, Dimitris Spathis, Mohammad Malekzadeh, Fahim Kawsar, Flora Salim, Akhil Mathur	(参考訳) マルチモーダル時系列の機械学習のためのラベル付きデータの可用性は、フィールドの進歩を広範囲に阻害する。自己教師付き学習(SSL)はラベルに頼ることなくデータ表現を学ぶための有望なアプローチである。しかし、既存のSSLメソッドは、負のペアの高価な計算を必要とし、通常は単一のモダリティのために設計されている。我々はCroSSL(Cross-modal SSL)を導入し、モダリティ固有のエンコーダによって生成された中間埋め込みをマスキングすることと、下流の分類器に供給できるクロスモーダルアグリゲータを通じてグローバルな埋め込みに集約することの2つの新しい概念を紹介した。 CroSSLは、欠落したモダリティとエンドツーエンドのクロスモーダル学習を、欠落した入力を処理するための事前データ前処理や、対照的な学習のためのネガティブペアサンプリングを必要としない。本研究では,加速度センサやジャイロスコープ,生体信号(心拍数,脳電図,筋電図,筋電図,筋電図)など,様々なデータに対してマスキング比とマスキング戦略が与える影響について検討した。全体として、CroSSLは、最小限のラベル付きデータを使用して以前のSSLと教師付きベンチマークより優れており、また、潜伏マスキングがクロスモーダル学習を改善する方法についても光を当てている。私たちのコードはhttps://github.com/dr-bell/crosslでオープンソースです。 Limited availability of labeled data for machine learning on multimodal time-series extensively hampers progress in the field. Self-supervised learning (SSL) is a promising approach to learning data representations without relying on labels. However, existing SSL methods require expensive computations of negative pairs and are typically designed for single modalities, which limits their versatility. We introduce CroSSL (Cross-modal SSL), which puts forward two novel concepts: masking intermediate embeddings produced by modality-specific encoders, and their aggregation into a global embedding through a cross-modal aggregator that can be fed to down-stream classifiers. CroSSL allows for handling missing modalities and end-to-end cross-modal learning without requiring prior data preprocessing for handling missing inputs or negative-pair sampling for contrastive learning. 
We evaluate our method on a wide range of data, including motion sensors such as accelerometers or gyroscopes and biosignals (heart rate, electroencephalograms, electromyograms, electrooculograms, and electrodermal) to investigate the impact of masking ratios and masking strategies for various data types and the robustness of the learned representations to missing data. Overall, CroSSL outperforms previous SSL and supervised benchmarks using minimal labeled data, and also sheds light on how latent masking can improve cross-modal learning. Our code is open-sourced at https://github.com/dr-bell/CroSSL	翻訳日:2024-02-07 20:40:40 公開日:2024-02-06
# ASCII-Artに基づく横断的タスクによるChatGPTの理解度:ASCII-Artの認識と生成に関するGPT3.5の能力は、完全には欠落していない Testing the Depth of ChatGPT's Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5's Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking ( http://arxiv.org/abs/2307.16806v2 ) ライセンス: Link先を確認	David Bayani	(参考訳) リリースから8ヶ月にわたって、ChatGPTとその基盤となるモデルであるGPT3.5は、能力とアクセシビリティの強力な混在により、大きな注目を集めている。これらのモデルが持つ能力の範囲を調査した、ニッチな論文が登場しているが、これらのネットワークから供給され抽出される情報は、自然言語テキストか、スタイリッシュなコードライクな言語である。本研究は,真の人間レベルの知的エージェントが複数の信号モダリティにまたがる能力から着想を得たものである。本研究では,ASCIIアートとして提供される特徴内容の入力を,言語的な要約に含めることなく,GPT3.5の視覚的タスクに対する適性について検討する。視覚設定に典型的な様々な変換後の画像認識タスクにおけるモデルの性能分析,画像部品の知識の検証,画像生成に関する課題について実験を行った。 Over the eight months since its release, ChatGPT and its underlying model, GPT3.5, have garnered massive attention, due to their potent mix of capability and accessibility. While a niche-industry of papers have emerged examining the scope of capabilities these models possess, the information fed to and extracted from these networks has been either natural language text or stylized, code-like language. Drawing inspiration from the prowess we expect a truly human-level intelligent agent to have across multiple signal modalities, in this work we examine GPT3.5's aptitude for visual tasks, where the inputs feature content provided as ASCII-art without overt distillation into a lingual summary. We conduct experiments analyzing the model's performance on image recognition tasks after various transforms typical in visual settings, trials investigating knowledge of image parts, and tasks covering image generation.	翻訳日:2024-02-07 20:40:12 公開日:2024-02-06
# ソフトウェア工学におけるアダプタベース知識伝達のための事前学習言語モデルの利用 Utilization of Pre-trained Language Model for Adapter-based Knowledge Transfer in Software Engineering ( http://arxiv.org/abs/2307.08540v2 ) ライセンス: Link先を確認	Iman Saberi, Fatemeh Fard and Fuxiang Chen	(参考訳) software engineering (se) pre-trained language model (plm) は、codebertのような大規模なコードコーパス上で事前学習されており、plmの微調整を通じて下流タスク(例えば、コードクローン検出)へ移行することに成功した。自然言語処理(NLP)では、PLMに挿入されるコンパクトでパラメータ効率の良いモジュールであるアダプタを用いて、PLMの知識を伝達する代替手段を探索する。アダプタの使用は多くのNLPベースのダウンストリームタスクにおいて有望な結果を示しているが、SEベースのダウンストリームタスクの応用と探索は限られている。本稿では,クローゼテスト,コードクローン検出,コード要約など,複数の下流タスクに対するアダプタを用いた知識伝達について検討する。これらのアダプタはコードコーパスでトレーニングされ、英語コーパスまたはコードコーパスで事前トレーニングされたplmに挿入される。これらのPLMをNL-PLM, C-PLMと呼ぶ。アダプタを持たないPLMに対してNL-PLMを用いることで,NL-PLMからSEタスクに有用な知識を変換し,活用できることが示唆された。結果がc-plmの結果と同等かそれ以上になる場合があり、パラメータ数やトレーニング時間の観点からはより効率的である。興味深いことに、C-PLMに挿入されたアダプタは、通常、従来の微調整されたC-PLMよりも良い結果をもたらす。結果はSEタスクのためのよりコンパクトなモデルを構築するための新しい方向を開く。 Software Engineering (SE) Pre-trained Language Models (PLMs), such as CodeBERT, are pre-trained on large code corpora, and their learned knowledge has shown success in transferring into downstream tasks (e.g., code clone detection) through the fine-tuning of PLMs. In Natural Language Processing (NLP), an alternative in transferring the knowledge of PLMs is explored through the use of adapter, a compact and parameter efficient module that is inserted into a PLM. Although the use of adapters has shown promising results in many NLP-based downstream tasks, their application and exploration in SE-based downstream tasks are limited. Here, we study the knowledge transfer using adapters on multiple down-stream tasks including cloze test, code clone detection, and code summarization. These adapters are trained on code corpora and are inserted into a PLM that is pre-trained on English corpora or code corpora. We called these PLMs as NL-PLM and C-PLM, respectively. 
We observed an improvement in results using NL-PLM over a PLM that does not have adapters, and this suggested that adapters can transfer and utilize useful knowledge from NL-PLM to SE tasks. The results are sometimes on par with or exceed the results of C-PLM; while being more efficient in terms of the number of parameters and training time. Interestingly, adapters inserted into a C-PLM generally yield better results than a traditional fine-tuned C-PLM. Our results open new directions to build more compact models for SE tasks.	翻訳日:2024-02-07 20:39:32 公開日:2024-02-06
# LLM比較評価:大規模言語モデルを用いたペアワイズ比較によるゼロショットNLG評価 LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models ( http://arxiv.org/abs/2307.07889v3 ) ライセンス: Link先を確認	Adian Liusie, Potsawee Manakul, Mark J. F. Gales	(参考訳) 大規模言語モデル(LLM)の現在の開発は、様々な自然言語タスクで印象的なゼロショット機能を実現している。これらのシステムの興味深い応用として、自然言語生成(NLG)の自動評価がある。本稿では,ゼロショットNLG評価におけるLCMの創発的能力を活用するための2つの選択肢について検討する。 NLG評価において比較評価は広く研究されていないが、人間は個別に評価するよりも2つの選択肢を比較する方が直感的であることが多い。本研究は,複数の視点から比較評価を行う: 絶対的な評価と比較する性能,プロンプトにおける位置バイアス,比較数の観点からの効率的なランキング。 LLM比較評価はNLG評価における単純で汎用的で効果的なアプローチであることを示す。 FlanT5 や Llama2-chat のような中規模のオープンソース LLM では、スコアリングよりも比較評価が優れている。さらに,対数比較を行う場合,llmは位置偏りが強いことを実証し,さらに性能を向上させるデバイアス手法を提案する。 Current developments in large language models (LLMs) have enabled impressive zero-shot capabilities across various natural language tasks. An interesting application of these systems is in the automated assessment of natural language generation (NLG), a highly challenging area with great practical benefit. In this paper, we explore two options for exploiting the emergent abilities of LLMs for zero-shot NLG assessment: absolute score prediction, and comparative assessment which uses relative comparisons between pairs of candidates. Though comparative assessment has not been extensively studied in NLG assessment, we note that humans often find it more intuitive to compare two options rather than scoring each one independently. This work examines comparative assessment from multiple perspectives: performance compared to absolute grading; positional biases in the prompt; and efficient ranking in terms of the number of comparisons. We illustrate that LLM comparative assessment is a simple, general and effective approach for NLG assessment. For moderate-sized open-source LLMs, such as FlanT5 and Llama2-chat, comparative assessment is superior to prompt scoring, and in many cases can achieve performance competitive with state-of-the-art methods. 
Additionally, we demonstrate that LLMs often exhibit strong positional biases when making pairwise comparisons, and we propose debiasing methods that can further improve performance.	翻訳日:2024-02-07 20:39:05 公開日:2024-02-06
# マルチスケール空間時間骨格マッチングによるワンショット行動認識 One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching ( http://arxiv.org/abs/2307.07286v2 ) ライセンス: Link先を確認	Siyuan Yang, Jun Liu, Shijian Lu, Er Meng Hwa, Alex C. Kot	(参考訳) 単一トレーニングサンプルで骨格行動認識モデルを学習することを目的としたワンショット骨格行動認識は,大規模な骨格行動データの収集と注釈付けの難しさから注目されている。しかし、既存のほとんどの研究は、空間構造や骨格データの時間順序を無視する特徴ベクトルを直接比較することで骨格配列と一致している。本稿では,マルチスケールな時空間特徴マッチングによる骨格行動認識を行う一発骨格行動認識技術を提案する。複数の空間的および時間的スケールでスケルトンデータを表現し、2つの視点から最適な特徴マッチングを実現する。ひとつはマルチスケールマッチングで、複数の空間的および時間的スケールでスケルトンデータのスケールワイドな意味関係を同時にキャプチャする。 2つ目はクロススケールマッチングで、複数のスケールにまたがるサンプルワイドの関連性を捉えることで、異なる動きの大きさと速度を扱う。大規模な3つのデータセット(NTU RGB+D, NTU RGB+D 120, PKU-MMD)に対する大規模な実験により, 本手法は優れた単発骨格の動作認識を達成し, 高いマージンで一貫した性能を発揮することが示された。 One-shot skeleton action recognition, which aims to learn a skeleton action recognition model with a single training sample, has attracted increasing interest due to the challenge of collecting and annotating large-scale skeleton action data. However, most existing studies match skeleton sequences by comparing their feature vectors directly which neglects spatial structures and temporal orders of skeleton data. This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching. We represent skeleton data at multiple spatial and temporal scales and achieve optimal feature matching from two perspectives. The first is multi-scale matching which captures the scale-wise semantic relevance of skeleton data at multiple spatial and temporal scales simultaneously. The second is cross-scale matching which handles different motion magnitudes and speeds by capturing sample-wise relevance across multiple scales. Extensive experiments over three large-scale datasets (NTU RGB+D, NTU RGB+D 120, and PKU-MMD) show that our method achieves superior one-shot skeleton action recognition, and it outperforms the state-of-the-art consistently by large margins.	
翻訳日:2024-02-07 20:38:43 公開日:2024-02-06
# StyleGAN3:翻訳と回転の等価性向上のための生成ネットワーク StyleGAN3: Generative Networks for Improving the Equivariance of Translation and Rotation ( http://arxiv.org/abs/2307.03898v3 ) ライセンス: Link先を確認	Tianlei Zhu, Junqi Chen, Renzhe Zhu, Gaurav Gupta	(参考訳) StyleGANは、顔の姿勢やアイデンティティに影響を及ぼすスタイルや、髪、しわ、肌の色、その他の詳細に影響を及ぼすノイズを利用することができる。これらのうち、画像処理の結果はスタイルGANの異なるバージョンによって若干異なる。その結果, styleGAN2 と styleGAN3 の2つの改良版の比較が本研究の主な焦点となる。 FFHQデータセットをデータセットとして使用し,FID,EQ-T,EQ-Rをモデル評価に使用した。結局、Stylegan3バージョンは同値性を改善するためのより良い生成ネットワークであることが判明した。私たちの発見は、アニメーションやビデオの作成にポジティブな影響を与えます。 StyleGAN can use style to affect facial posture and identity features, and noise to affect hair, wrinkles, skin color and other details. Among these, the outcomes of the picture processing will vary slightly between different versions of styleGAN. As a result, the comparison of performance differences between styleGAN2 and the two modified versions of styleGAN3 will be the main focus of this study. We used the FFHQ dataset as the dataset and FID, EQ-T, and EQ-R were used to be the assessment of the model. In the end, we discovered that Stylegan3 version is a better generative network to improve the equivariance. Our findings have a positive impact on the creation of animation and videos.	翻訳日:2024-02-07 20:38:19 公開日:2024-02-06
# 予測状態表現の学習に有効なUCB型アルゴリズム Provably Efficient UCB-type Algorithms For Learning Predictive State Representations ( http://arxiv.org/abs/2307.00405v3 ) ライセンス: Link先を確認	Ruiquan Huang, Yingbin Liang, Jing Yang	(参考訳) マルコフ決定プロセス(MDP)と部分的に観察可能なMDP(POMDP)を特別に含む一般的なシーケンシャルな意思決定問題は、時間とともに観察と行動の歴史に基づいて一連の意思決定を行うことで累積報酬を最大化することである。近年の研究では、予測状態表現(PSR)によってモデル化された低ランク構造を認める場合、逐次的意思決定問題は統計的に学習可能であることが示されている。これらの進歩にもかかわらず、既存のアプローチは通常、計算的に難解なオラクルやステップを含む。一方,楽観的なボーナスデザインの難しさから,盗賊やMDPの計算効率向上に成功している上位信頼境界(UCB)に基づくアプローチは,より一般的なPSRでは研究されていない。本稿では,推定モデルと実モデル間の全変動距離を上限とする新しいボーナス項を特徴とする,PSRに対する最初のUCB型アプローチを提案する。さらに,オンラインPSRとオフラインPSRの両方に設計したUCB型アルゴリズムのサンプル複雑性の上界を特徴付ける。 PSRに対する既存のアプローチとは対照的に、UCB型アルゴリズムは計算的トラクタビリティ、最優先の準最適ポリシー、モデルの精度が保証される。 The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time. Recent studies have shown that the sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). Despite these advancements, existing approaches typically involve oracles or steps that are computationally intractable. On the other hand, the upper confidence bound (UCB) based approaches, which have served successfully as computationally efficient methods in bandits and MDPs, have not been investigated for more general PSRs, due to the difficulty of optimistic bonus design in these more challenging settings. This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. We further characterize the sample complexity bounds for our designed UCB-type algorithms for both online and offline PSRs. 
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.	翻訳日:2024-02-07 20:38:08 公開日:2024-02-06
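For readers unfamiliar with the UCB idea that the abstract above builds on, here is a minimal sketch of the classic UCB1 selection rule for multi-armed bandits. It is not the PSR algorithm from the paper, whose bonus term bounds a total variation distance; the function name `ucb1_select` and the exploration constant 2 are illustrative assumptions.

```python
import math

def ucb1_select(counts, reward_sums, t):
    """Pick an arm by the UCB1 rule: empirical mean plus an optimism bonus.

    counts[i]      -- number of times arm i has been pulled so far
    reward_sums[i] -- total reward collected from arm i
    t              -- total number of pulls so far
    """
    # Pull every arm once before trusting any empirical mean.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm

    def index(arm):
        mean = reward_sums[arm] / counts[arm]
        # The bonus shrinks as an arm is pulled more often (hypothetical constant 2).
        bonus = math.sqrt(2.0 * math.log(t) / counts[arm])
        return mean + bonus

    return max(range(len(counts)), key=index)
```

A rarely pulled arm gets a large bonus, so it can be selected even when its empirical mean is lower; this is the "optimism" that the paper's bonus design generalizes to PSRs.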
# ベイズリスクの改善は競争で社会福祉を減らし得る Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition ( http://arxiv.org/abs/2306.14670v3 ) ライセンス: Link先を確認	Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt, Nika Haghtalab	(参考訳) 機械学習モデルの規模が増加するにつれて、スケーリング法則のようなトレンドが予測精度の一貫した下流改善を予測している。しかし、これらのトレンドは独立した単一のモデル提供者の視点をとっており、現実にはプロバイダー同士がユーザーをめぐって競い合うことが多い。本研究は,競争がこれらのスケーリングトレンドの振る舞いを根本的に変え得ること,さらにはユーザ全体の予測精度がスケールに対して非単調あるいは減少し得ることを示す。分類タスクにおける競争のモデルを定義し、スケールの増大の影響を研究するためのレンズとしてデータ表現を使用する。(ベイズリスクによって測定された)データ表現品質の改善が、競合するモデルプロバイダの市場において、ユーザ全体の予測精度(すなわち社会福祉)を低下させる多くの設定を見出した。我々の例は、単純な設定のクローズドフォーム公式から、CIFAR-10の事前訓練された表現を伴うシミュレーションまで様々である。概念レベルでは、個々のモデルプロバイダにとって好ましいスケーリング傾向が、複数のモデルプロバイダを持つマーケットプレースにおける社会福祉の下流改善につながるとは限らないことを示唆する。 As the scale of machine learning models increases, trends such as scaling laws anticipate consistent downstream improvements in predictive accuracy. However, these trends take the perspective of a single model-provider in isolation, while in reality providers often compete with each other for users. In this work, we demonstrate that competition can fundamentally alter the behavior of these scaling trends, even causing overall predictive accuracy across users to be non-monotonic or decreasing with scale. We define a model of competition for classification tasks, and use data representations as a lens for studying the impact of increases in scale. We find many settings where improving data representation quality (as measured by Bayes risk) decreases the overall predictive accuracy across users (i.e., social welfare) for a marketplace of competing model-providers. Our examples range from closed-form formulas in simple settings to simulations with pretrained representations on CIFAR-10. At a conceptual level, our work suggests that favorable scaling trends for individual model-providers need not translate to downstream improvements in social welfare in marketplaces with multiple model providers.	翻訳日:2024-02-07 20:37:45 公開日:2024-02-06
# 均一電場及び磁場中の平面フェルミオンに対するフェルミオン縮合と真空エネルギー-モーメントテンソル Fermionic condensate and the vacuum energy-momentum tensor for planar fermions in homogeneous electric and magnetic fields ( http://arxiv.org/abs/2306.11402v3 ) ライセンス: Link先を確認	V. V. Parazian	(参考訳) 外部定数と均質な電場と磁場の平面上に局在した巨大なフェルミイオン量子場を考える。磁場は平面に垂直であり、電場は平行である。ディラック方程式に対する完全な解の集合が提示される。真空状態の重要な物理特性として,フェルミオン凝縮とエネルギー-運動テンソルの期待値について検討した。再正規化はHurwitz関数を用いて行われる。結果は、ゼロ電界の場合の研究結果と比較される。問題パラメータの値について,各領域における真空期待値の挙動について考察する。その結果は、長波長近似におけるディラックモデルにより記述されたグラフェンシートの電子サブシステムを含む。 We consider a massive fermionic quantum field localized on a plane in external constant and homogeneous electric and magnetic fields. The magnetic field is perpendicular to the plane and the electric field is parallel. The complete set of solutions to the Dirac equation is presented. As important physical characteristics of the vacuum state, the fermion condensate and the expectation value of the energy-momentum tensor are investigated. The renormalization is performed using the Hurwitz function. The results are compared with those previously studied in the case of zero electric field. We discuss the behavior of the vacuum expectation values in different regions for the values of the problem parameters. Applications of the results include the electronic subsystem of graphene sheet described by the Dirac model in the long-wavelength approximation.	翻訳日:2024-02-07 20:37:27 公開日:2024-02-06
# 多段階一般化による弱めの3次元物体検出 Weakly Supervised 3D Object Detection with Multi-Stage Generalization ( http://arxiv.org/abs/2306.05418v2 ) ライセンス: Link先を確認	Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang	(参考訳) 大規模モデルの急速な発展に伴い、データの必要性はますます重要になっている。特に3dオブジェクト検出では、コストのかかる手動アノテーションがさらなる進歩を妨げている。アノテーションの負担を軽減するため,2次元アノテーションのみに基づく3次元オブジェクト検出の課題について検討した。高度な3D再構成技術により、全体の静的な3Dシーンを再構築することが可能になった。しかし、シーン全体から正確なオブジェクトレベルのアノテーションを抽出し、これらの制限されたアノテーションをシーン全体に一般化することは、依然として課題である。本稿では,擬似ラベル生成と多段階一般化を包含するba$^2$-detと呼ばれる新しいパラダイムを提案する。再構成されたシーンレベルポイントからオブジェクトクラスタを得るために,ダブルクラスタアルゴリズムを考案し,一般化の3段階(完全から部分へ,静的から動的へ,遠くまで)を展開することにより,モデルの検出能力をさらに向上させる。大規模なWaymo Open Datasetで実施された実験によると、BA$^2$-Detのパフォーマンスは10%アノテーションを使用した完全に教師された手法と同等である。さらに、事前トレーニングのために大きな生動画を使用すると、BA$^2$-DetはKITTIデータセットに対して20%の相対的な改善を達成できる。この手法は複雑なシーンでオープンセットの3Dオブジェクトを検出する可能性も大きい。プロジェクトページ: https://ba2det.site。 With the rapid development of large models, the need for data has become increasingly crucial. Especially in 3D object detection, costly manual annotations have hindered further advancements. To reduce the burden of annotation, we study the problem of achieving 3D object detection solely based on 2D annotations. Thanks to advanced 3D reconstruction techniques, it is now feasible to reconstruct the overall static 3D scene. However, extracting precise object-level annotations from the entire scene and generalizing these limited annotations to the entire scene remain challenges. In this paper, we introduce a novel paradigm called BA$^2$-Det, encompassing pseudo label generation and multi-stage generalization. We devise the DoubleClustering algorithm to obtain object clusters from reconstructed scene-level points, and further enhance the model's detection capabilities by developing three stages of generalization: progressing from complete to partial, static to dynamic, and close to distant. 
Experiments conducted on the large-scale Waymo Open Dataset show that the performance of BA$^2$-Det is on par with the fully-supervised methods using 10% annotations. Additionally, using large raw videos for pretraining, BA$^2$-Det can achieve a 20% relative improvement on the KITTI dataset. The method also has great potential for detecting open-set 3D objects in complex scenes. Project page: https://ba2det.site.	翻訳日:2024-02-07 20:37:16 公開日:2024-02-06
# アクティベーション最適化を用いたトロイの木馬モデル検出 Trojan Model Detection Using Activation Optimization ( http://arxiv.org/abs/2306.04877v2 ) ライセンス: Link先を確認	Mohamed E. Hussein, Sudharshan Subramaniam Janakiraman, Wael AbdAlmageed	(参考訳) 機械学習モデルのトレーニングは非常に高価であり,手が届かない場合さえある。これは、例えば、データ制限(使用不可能か、大きすぎるか)や計算能力の制限のためかもしれない。したがって、可能な限りオープンソースの事前学習モデルに頼るのが一般的である。しかし、このプラクティスはセキュリティの観点から警戒されている。事前訓練されたモデルはトロイの木馬攻撃に感染し、攻撃者はモデルにトリガーを埋め込んで、入力にトリガーが存在するときにモデルの動作がアタッカーによって制御されるようにする。本稿では,トロイの木馬モデルを検出する新しい手法を提案する。本手法はアクティベーション最適化に基づくモデルのシグネチャを生成する。分類器は、そのシグネチャが与えられたトロイの木馬モデルを検出するように訓練される。我々はこの手法をTRIGS(TRojan Identification from Gradient-based Signatures)と呼ぶ。 TRIGSは、畳み込みモデルの2つの公開データセットで最先端のパフォーマンスを達成する。さらに,視覚トランスフォーマーアーキテクチャに基づいた,ImageNetモデルの新たな挑戦的データセットも紹介する。 TRIGSは新しいデータセットで最高のパフォーマンスを提供し、ベースラインメソッドを大きなマージンで上回る。また,本実験では,TRIGSは少量のクリーンなサンプルしか必要とせず,防御側が攻撃者のモデルアーキテクチャについて事前知識を持たない場合でも合理的に機能することを示した。私たちのデータセットはまもなくリリースされます。 Training machine learning models can be very expensive or even unaffordable. This may be, for example, due to data limitations (unavailability or being too large), or computational power limitations. Therefore, it is a common practice to rely on open-source pre-trained models whenever possible. However, this practice is alarming from a security perspective. Pre-trained models can be infected with Trojan attacks, in which the attacker embeds a trigger in the model such that the model's behavior can be controlled by the attacker when the trigger is present in the input. In this paper, we present a novel method for detecting Trojan models. Our method creates a signature for a model based on activation optimization. A classifier is then trained to detect a Trojan model given its signature. We call our method TRIGS for TRojan Identification from Gradient-based Signatures. TRIGS achieves state-of-the-art performance on two public datasets of convolutional models. Additionally, we introduce a new challenging dataset of ImageNet models based on the vision transformer architecture. 
TRIGS delivers the best performance on the new dataset, surpassing the baseline methods by a large margin. Our experiments also show that TRIGS requires only a small amount of clean samples to achieve good performance, and works reasonably well even if the defender does not have prior knowledge about the attacker's model architecture. Our dataset will be released soon.	翻訳日:2024-02-07 20:36:54 公開日:2024-02-06
# 高次元および置換不変異常検出 High-dimensional and Permutation Invariant Anomaly Detection ( http://arxiv.org/abs/2306.03933v4 ) ライセンス: Link先を確認	Vinicius Mikuni, Benjamin Nachman	(参考訳) 新しい物理過程の異常検出法は、高次元確率密度の学習が困難であるため、しばしば低次元空間に限られる。特に構成レベルでは,一般密度推定法では置換不変性や可変長入力などの望ましい特性を組み込むことが困難となる。本研究では, 分散モデルに基づく粒子物理学データに対して, 可変長入力を扱うために特別に設計された置換不変密度推定器を提案する。本手法の有効性は,学習密度を置換不変な異常検出スコアとして利用し,背景のみの仮説の下でジェットを効果的に同定することによって実証する。密度推定法を検証するため, 教師付き分類アルゴリズムにより得られた密度の比について検討し, 比較を行った。 Methods for anomaly detection of new physics processes are often limited to low-dimensional spaces due to the difficulty of learning high-dimensional probability densities. Particularly at the constituent level, incorporating desirable properties such as permutation invariance and variable-length inputs becomes difficult within popular density estimation methods. In this work, we introduce a permutation-invariant density estimator for particle physics data based on diffusion models, specifically designed to handle variable-length inputs. We demonstrate the efficacy of our methodology by utilizing the learned density as a permutation-invariant anomaly detection score, effectively identifying jets with low likelihood under the background-only hypothesis. To validate our density estimation method, we investigate the ratio of learned densities and compare to those obtained by a supervised classification algorithm.	翻訳日:2024-02-07 20:36:34 公開日:2024-02-06
# 3次元分子相互作用学習に向けたジェネラリスト同変トランスフォーマー Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning ( http://arxiv.org/abs/2306.01474v4 ) ライセンス: Link先を確認	Xiangzhe Kong, Wenbing Huang, Yang Liu	(参考訳) 生物学や創薬における多くのプロセスは、タンパク質やタンパク質、タンパク質や小分子などの分子間の様々な3d相互作用を含んでいる。異なる分子は通常異なる粒度で表されるため、既存の手法では各種類の分子を異なるモデルで独立にエンコードし、普遍的な相互作用物理学を学ぶには欠陥がある。本稿ではまず,任意の3次元錯体を集合の幾何学的グラフとして普遍的に表現し,全ての分子を1つのモデルで符号化することを提案する。次に、ドメイン固有の階層とドメインに依存しない相互作用物理学の両方を効果的に捉えるためのジェネラリスト同変トランスフォーマー(get)を提案する。具体的には、GETはバイレベルアテンションモジュール、フィードフォワードモジュール、レイヤ正規化モジュールで構成されており、各モジュールはE(3)同変であり、可変サイズの集合を扱うのに特化している。特に、従来のプーリングベースの階層モデルとは対照的に、GETはあらゆるレベルのきめ細かい情報を保持できます。タンパク質,小分子,rna/dna間の相互作用に関する広範な実験により,提案手法の有効性と汎用性が検証された。 Many processes in biology and drug discovery involve various 3D interactions between molecules, such as protein and protein, protein and small molecule, etc. Given that different molecules are usually represented in different granularity, existing methods usually encode each type of molecules independently with different models, leaving it defective to learn the universal underlying interaction physics. In this paper, we first propose to universally represent an arbitrary 3D complex as a geometric graph of sets, shedding light on encoding all types of molecules with one model. We then propose a Generalist Equivariant Transformer (GET) to effectively capture both domain-specific hierarchies and domain-agnostic interaction physics. To be specific, GET consists of a bilevel attention module, a feed-forward module and a layer normalization module, where each module is E(3) equivariant and specialized for handling sets of variable sizes. Notably, in contrast to conventional pooling-based hierarchical models, our GET is able to retain fine-grained information of all levels. Extensive experiments on the interactions between proteins, small molecules and RNA/DNAs verify the effectiveness and generalization capability of our proposed method across different domains.	
翻訳日:2024-02-07 20:36:21 公開日:2024-02-06
# 事前学習音声モデルのモデル伝達可能性の推定法 How to Estimate Model Transferability of Pre-Trained Speech Models? ( http://arxiv.org/abs/2306.01015v3 ) ライセンス: Link先を確認	Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath	(参考訳) 本研究では,学習対象タスクに対する事前学習音声モデル(PSM)の伝達可能性を推定する「スコアベースアセスメント」フレームワークを提案する。我々は,ベイズ推定法と最適移動法という2つの表現理論を用いて,抽出した表現を用いてPSM候補のランクスコアを生成する。提案手法は, 時間的独立性の仮説を置くことで, 候補モデルやレイヤの微調整をすることなく, 転送可能性スコアを効率的に計算する。公開データを用いて,一般的な教師付き音声モデル (Conformer RNN-Transducerなど) と自己教師付き音声モデル (HuBERTなど) をクロス層およびクロスモデル設定で評価する。実験の結果,我々の推定フレームワークと微調整による正解との間で,スピアマンの順位相関は高く,$p$値は低いことがわかった。提案する転送性フレームワークは計算時間と資源を少なくし,音声基礎モデルをチューニングするための資源節約と時間効率のアプローチとなる。 In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks. We leverage two representation theories, Bayesian likelihood estimation and optimal transport, to generate rank scores for the PSM candidates using the extracted representations. Our framework efficiently computes transferability scores without actual fine-tuning of candidate models or layers by making a temporal independence hypothesis. We evaluate some popular supervised speech models (e.g., Conformer RNN-Transducer) and self-supervised speech models (e.g., HuBERT) in cross-layer and cross-model settings using public data. Experimental results show a high Spearman's rank correlation and low $p$-value between our estimation framework and fine-tuning ground truth. Our proposed transferability framework requires less computational time and resources, making it a resource-saving and time-efficient approach for tuning speech foundation models.	翻訳日:2024-02-07 20:36:00 公開日:2024-02-06
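The entry above judges a transferability score by its Spearman rank correlation with the true fine-tuning results. As a rough illustration of that metric only, the sketch below computes Spearman's rho with the textbook no-ties formula; it assumes all values are distinct, and the helper names and sample numbers are hypothetical, not taken from the paper.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical example: estimated scores vs. accuracies after fine-tuning.
estimated = [0.2, 0.5, 0.1, 0.9]
finetuned = [61.0, 70.0, 65.0, 88.0]
rho = spearman(estimated, finetuned)
```

A rho close to 1 means the cheap score orders the candidate models almost the same way that actual fine-tuning would.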
# 自己教師付き学習における確率的多値論理演算による表現合成 Representation Synthesis by Probabilistic Many-Valued Logic Operation in Self-Supervised Learning ( http://arxiv.org/abs/2309.04148v2 ) ライセンス: Link先を確認	Hiroki Nakamura, Masashi Okada, Tadahiro Taniguchi	(参考訳) 本稿では,論理操作が可能な表現のための自己教師付き学習(SSL)手法を提案する。表現学習は画像生成や検索といった様々なタスクに適用されている。表現の論理制御性はこれらのタスクにとって重要である。自然言語を入力として表現の直感的な制御を可能にする方法がいくつか示されているが、表現間の論理操作による表現制御は実証されていない。表現合成を用いたSSLメソッド(例えば、要素平均と最大演算)が提案されているが、これらのメソッドで実行される操作は論理演算を含まない。本研究では,既存の表現合成を多値論理の確率的拡張の演算に置き換え,論理操作可能な自己教師付き表現学習手法を提案する。表現は、画像内の各特徴の有無を示す真理値である特徴仮定次数の集合からなり、論理演算(例えば、or、and)を実現する。本手法は,両表現の特徴を持つ表現や,両表現に共通する特徴のみを生成することができる。さらに、多値論理の真理値の確率分布によって特徴量を示すことにより、特徴の曖昧な存在を表現することを実現する。合成表現を用いた従来のSSL手法と比較して,本手法はシングルラベルとマルチラベルの分類タスクにおいて競合的に動作することを示した。さらに,MNIST と PascalVOC を用いた画像検索実験により,提案手法の表現をOR および操作により操作可能であることを示した。 In this paper, we propose a new self-supervised learning (SSL) method for representations that enable logic operations. Representation learning has been applied to various tasks, such as image generation and retrieval. The logical controllability of representations is important for these tasks. Although some methods have been shown to enable the intuitive control of representations using natural languages as the inputs, representation control via logic operations between representations has not been demonstrated. Some SSL methods using representation synthesis (e.g., elementwise mean and maximum operations) have been proposed, but the operations performed in these methods do not incorporate logic operations. In this work, we propose a logic-operable self-supervised representation learning method by replacing the existing representation synthesis with the OR operation on the probabilistic extension of many-valued logic. The representations comprise a set of feature-possession degrees, which are truth values indicating the presence or absence of each feature in the image, and realize the logic operations (e.g., OR and AND). 
Our method can generate a representation that has the features of both representations or only those features common to both representations. In addition, the expression of the ambiguous presence of a feature is realized by indicating the feature-possession degree by the probability distribution of truth values of the many-valued logic. We showed that our method performs competitively in single and multi-label classification tasks compared with prior SSL methods using synthetic representations. Moreover, experiments on image retrieval using MNIST and PascalVOC showed that the representations of our method can be operated by OR and AND operations.	翻訳日:2024-02-07 20:29:05 公開日:2024-02-06
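To make the idea of logic operations on probabilistic truth values more concrete, here is a small sketch. It assumes each feature-possession degree is a categorical distribution over ordered truth levels, that the two inputs are independent, and that OR and AND follow Gödel semantics (maximum and minimum of the truth levels). These are illustrative assumptions; the paper's exact operators may differ.

```python
from itertools import product

def combine(p, q, op):
    """Distribution of op(X, Y) for independent truth-level indices X ~ p and Y ~ q."""
    out = [0.0] * len(p)
    for (i, pi), (j, qj) in product(enumerate(p), enumerate(q)):
        out[op(i, j)] += pi * qj
    return out

def logic_or(p, q):
    # Assumed Goedel OR: the resulting truth level is the larger of the two.
    return combine(p, q, max)

def logic_and(p, q):
    # Assumed Goedel AND: the resulting truth level is the smaller of the two.
    return combine(p, q, min)

# Three truth levels: index 0 = absent, 1 = ambiguous, 2 = present.
p = [0.5, 0.5, 0.0]
q = [0.5, 0.0, 0.5]
p_or_q = logic_or(p, q)
p_and_q = logic_and(p, q)
```

With these inputs, OR shifts probability mass toward "present" and AND shifts it toward "absent", which mirrors the abstract's description of representations holding the features of both inputs or only their common features.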
# デコード:建物の歴史的データと環境要因を活用したデータ駆動エネルギー消費予測 DECODE: Data-driven Energy Consumption Prediction leveraging Historical Data and Environmental Factors in Buildings ( http://arxiv.org/abs/2309.02908v2 ) ライセンス: Link先を確認	Aditya Mishra, Haroon R. Lone, Aayush Mishra	(参考訳) 建物のエネルギー予測は、効率的なエネルギー管理において重要な役割を果たす。正確な予測は、グリッド内の最適なエネルギー消費と分配を達成するために不可欠である。本稿では,過去のエネルギーデータ,居住パターン,気象条件を用いて,建築エネルギー消費量を予測するための長期短期記憶モデル(lstm)を提案する。 LSTMモデルは、既存の予測モデルと比較して、住宅や商業ビルの正確な短・中・長期エネルギー予測を提供する。 LSTMモデルと線形回帰,決定木,ランダム林などの確立した予測手法を比較した。 LSTMモデルは、すべての指標において優れたパフォーマーとして現れます。これは例外的な予測精度を示し、R2スコアは0.97で、平均絶対誤差(MAE)は0.007である。開発したモデルのさらなる利点は、限られたデータセットでトレーニングしても効率的なエネルギー消費予測を実現する能力である。我々は,実世界のデータに対する厳密なトレーニングと評価を通じて,過剰フィッティング(分散)と低フィッティング(バイアス)に関する懸念に対処する。まとめると、我々の研究は代替手法より優れ、優れた効率、一般化可能性、信頼性で機能する堅牢なLSTMモデルを提供することでエネルギー予測に寄与する。 Energy prediction in buildings plays a crucial role in effective energy management. Precise predictions are essential for achieving optimal energy consumption and distribution within the grid. This paper introduces a Long Short-Term Memory (LSTM) model designed to forecast building energy consumption using historical energy data, occupancy patterns, and weather conditions. The LSTM model provides accurate short, medium, and long-term energy predictions for residential and commercial buildings compared to existing prediction models. We compare our LSTM model with established prediction methods, including linear regression, decision trees, and random forest. Encouragingly, the proposed LSTM model emerges as the superior performer across all metrics. It demonstrates exceptional prediction accuracy, boasting the highest R2 score of 0.97 and the most favorable mean absolute error (MAE) of 0.007. An additional advantage of our developed model is its capacity to achieve efficient energy consumption forecasts even when trained on a limited dataset. We address concerns about overfitting (variance) and underfitting (bias) through rigorous training and evaluation on real-world data. 
In summary, our research contributes to energy prediction by offering a robust LSTM model that outperforms alternative methods and operates with remarkable efficiency, generalizability, and reliability.	翻訳日:2024-02-07 20:28:22 公開日:2024-02-06
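The entry above reports an R2 score and a mean absolute error (MAE). For reference, both metrics can be computed as in the following standard-library sketch; the function names are hypothetical and the numbers are toy data, not results from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy energy readings and forecasts (illustrative only).
actual = [1.0, 2.0, 3.0, 4.0]
forecast = [2.0, 2.0, 3.0, 3.0]
mae = mean_absolute_error(actual, forecast)
r2 = r2_score(actual, forecast)
```

MAE is expressed in the units of the target, so a value such as 0.007 is only meaningful relative to how the energy data were scaled, whereas R2 is scale-free.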
# OHQ:オンチップのハードウェア対応量子化 OHQ: On-chip Hardware-aware Quantization ( http://arxiv.org/abs/2309.01945v2 ) ライセンス: Link先を確認	Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yulun Zhang, Ying Li, Xianglong Liu	(参考訳) 量子化は、リソース制約のあるハードウェアに高度なディープモデルをデプロイするための最も有望なアプローチの1つとして現れます。混合精度量子化は、複数のビット幅アーキテクチャを活用して、量子化モデルの精度と効率性を解き放つ。しかし、既存の混合精度量子化は、膨大な計算オーバーヘッドを引き起こす網羅的な探索空間に苦しむ。本稿では,オンチップ・ハードウェアアウェア量子化(OHQ)フレームワークを提案する。このフレームワークは,オンラインデバイスにアクセスせずにハードウェア・アウェアの混合精度量子化を行う。第一に、オンチップ量子化認識(OQA)パイプラインを構築し、ハードウェア上で量子化演算子の実際の効率指標を認識できるようにする。第二に、オンチップレベルの計算能力の制約下で演算子の精度指標を効率的に推定するMask-guided Quantization Estimation(MQE)技術を提案する。特に、量子化プロセスは、追加のコンピューティングデバイスやデータアクセスなしで、オンチップで完全に実行される。 ResNet-18とMobileNetV3では,それぞれ70%,73%の精度を実現した。 OHQは、デプロイメント時のINT8と比較して、レイテンシを15～30%改善する。 Quantization emerges as one of the most promising approaches for deploying advanced deep models on resource-constrained hardware. Mixed-precision quantization leverages multiple bit-width architectures to unleash the accuracy and efficiency potential of quantized models. However, existing mixed-precision quantization suffers from an exhaustive search space that causes immense computational overhead. The quantization process thus relies on separate high-performance devices rather than running locally, which also leads to a significant gap between the considered hardware metrics and the real deployment. In this paper, we propose an On-chip Hardware-aware Quantization (OHQ) framework that performs hardware-aware mixed-precision quantization without accessing online devices. 
First, we construct the On-chip Quantization Awareness (OQA) pipeline, which perceives the actual efficiency metrics of the quantization operator on the hardware. Second, we propose the Mask-guided Quantization Estimation (MQE) technique to efficiently estimate the accuracy metrics of operators under the constraints of on-chip-level computing power. By synthesizing network and hardware insights through linear programming, we obtain optimized bit-width configurations. Notably, the quantization process occurs entirely on-chip without any additional computing devices or data access. We demonstrate accelerated inference after quantization for various architectures and compression ratios, achieving 70% and 73% accuracy for ResNet-18 and MobileNetV3, respectively. OHQ improves latency by 15~30% compared to INT8 on deployment.	翻訳日:2024-02-07 20:28:04 公開日:2024-02-06
# ws-sfmlearner : カメラパラメータ不明手術ビデオにおける自己教師付き単眼深度とエゴモーション推定 WS-SfMLearner: Self-supervised Monocular Depth and Ego-motion Estimation on Surgical Videos with Unknown Camera Parameters ( http://arxiv.org/abs/2308.11776v2 ) ライセンス: Link先を確認	Ange Lou and Jack Noble	(参考訳) 手術映像の深さ推定は多くの画像誘導手術において重要な役割を担っている。しかし,手術シーンの明るさやノイズの相違が原因で,手術映像に深度マップの真実データセットを作成するのが難しく,時間を要する。そのため,コンピュータビジョンコミュニティからは,高精度でロバストな自己監視深度とカメラの自我運動推定システムの構築が注目されている。いくつかの自己監督手法は、地上の真理深度マップやポーズの必要性を緩和するが、カメラ固有のパラメータがまだ必要であり、しばしば欠落しているか記録されていない。さらに,既存の作業におけるカメラ固有の予測手法は,データセットの品質に大きく依存する。本研究では,正確な深度マップとカメラポーズだけでなく,カメラ固有のパラメータを予測できる自己教師付き深度推定システムの構築を目標とした。我々は,カメラパラメータ予測のための補助的な監視を行うために,コストボリュームに基づく監視手法を提案した。実験の結果,提案手法は推定カメラパラメータ,エゴモーション,深さ推定の精度を改善した。 Depth estimation in surgical video plays a crucial role in many image-guided surgery procedures. However, it is difficult and time consuming to create depth map ground truth datasets in surgical videos due in part to inconsistent brightness and noise in the surgical scene. Therefore, building an accurate and robust self-supervised depth and camera ego-motion estimation system is gaining more attention from the computer vision community. Although several self-supervision methods alleviate the need for ground truth depth maps and poses, they still need known camera intrinsic parameters, which are often missing or not recorded. Moreover, the camera intrinsic prediction methods in existing works depend heavily on the quality of datasets. In this work, we aimed to build a self-supervised depth and ego-motion estimation system which can predict not only accurate depth maps and camera pose, but also camera intrinsic parameters. We proposed a cost-volume-based supervision manner to give the system auxiliary supervision for camera parameters prediction. The experimental results showed that the proposed method improved the accuracy of estimated camera parameters, ego-motion, and depth estimation.	翻訳日:2024-02-07 20:26:35 公開日:2024-02-06
# samsnerf: segment anything model(sam)はneural radiance field(nerf)によるダイナミックな手術シーンの再構築をガイドする。 SAMSNeRF: Segment Anything Model (SAM) Guides Dynamic Surgical Scene Reconstruction by Neural Radiance Field (NeRF) ( http://arxiv.org/abs/2308.11774v2 ) ライセンス: Link先を確認	Ange Lou, Yamin Li, Xing Yao, Yike Zhang and Jack Noble	(参考訳) 手術映像からの手術シーンの正確な再構成は, 術中ナビゲーションや画像誘導ロボット手術自動化など, 様々な応用に不可欠である。しかし,従来のアプローチは主に深度推定に頼っているため,移動式手術器具による手術シーンの再構築には限界がある。この制限に対処し,すべてのフレームにおける手術器具の正確な3次元位置予測を行うため,Segment Anything Model (SAM) とNeRF(NeRF)技術を組み合わせたSAMSNeRFと呼ばれる新しいアプローチを提案する。提案手法は,NeRFによる動的手術シーン再構築の洗練を導くSAMを用いて,手術器具の正確なセグメンテーションマスクを生成する。腹腔鏡下手術ビデオにおける実験結果から,本手法は高忠実度ダイナミックな手術場面を再現し,手術器具の空間情報を正確に反映する。提案手法は手術時の手術器具の正確な3次元位置情報を外科医に提供することで,手術ナビゲーションと自動化を大幅に向上させることができる。 The accurate reconstruction of surgical scenes from surgical videos is critical for various applications, including intraoperative navigation and image-guided robotic surgery automation. However, previous approaches, mainly relying on depth estimation, have limited effectiveness in reconstructing surgical scenes with moving surgical tools. To address this limitation and provide accurate 3D position prediction for surgical tools in all frames, we propose a novel approach called SAMSNeRF that combines Segment Anything Model (SAM) and Neural Radiance Field (NeRF) techniques. Our approach generates accurate segmentation masks of surgical tools using SAM, which guides the refinement of the dynamic surgical scene reconstruction by NeRF. Our experimental results on public endoscopy surgical videos demonstrate that our approach successfully reconstructs high-fidelity dynamic surgical scenes and accurately reflects the spatial information of surgical tools. Our proposed approach can significantly enhance surgical navigation and automation by providing surgeons with accurate 3D position information of surgical tools during surgery.The source code will be released soon.	翻訳日:2024-02-07 20:26:17 公開日:2024-02-06
# 正則化された$(XP)^2$モデル A Regularized $(XP)^2$ Model ( http://arxiv.org/abs/2308.11648v2 ) ライセンス: Link先を確認	Yu-Qi Chen and Zhao-Feng Ge	(参考訳) 古典的ハミルトニアン $H(x,p)=(x^2+a^2)(p^2+a^2)$ ($a^2>0$)によって記述される動的モデルを,古典力学,半古典力学,量子力学において検討する。高エネルギー$E$の極限では、位相パスは$(XP)^2$モデルに似ている。しかし、$a$のゼロでない値はレギュレータとして作用し、$x, p \sim 0$の領域に現れる特異点を取り除き、状態密度の対数的増加を特徴とする離散スペクトルとなる。古典解は楕円函数によって記述され、周期は楕円積分によって決定される。半古典近似では、漸近リーマン・ジーゲル公式は多重位相経路からの寄与の和として解釈できると推測する。量子化ハミルトニアンの3つの異なる形式を示し、それらを$\cosh 2x$-likeポテンシャルを持つ標準シュレーディンガー方程式に再構成する。これらのスペクトルの数値評価を行い、エネルギー準位のわずかな違いを明らかにした。そのうちの1つの興味深い形式は、古典版と同一のハミルトニアンをシュレーディンガー方程式において持つ。そのようなシナリオでは、固有値方程式は$i\infty$ポイントでのマチュー関数の値の消滅として表すことができ、さらにマチュー関数は波動関数として表すことができる。 We investigate a dynamic model described by the classical Hamiltonian $H(x,p)=(x^2+a^2)(p^2+a^2)$, where $a^2>0$, in classical, semi-classical, and quantum mechanics. In the high-energy $E$ limit, the phase path resembles that of the $(XP)^2$ model. However, the non-zero value of $a$ acts as a regulator, removing the singularities that appear in the region where $x, p \sim 0$, resulting in a discrete spectrum characterized by a logarithmic increase in state density. Classical solutions are described by elliptic functions, with the period being determined by elliptic integrals. In semi-classical approximation, we speculate that the asymptotic Riemann-Siegel formula may be interpreted as summing over contributions from multiple phase paths. We present three different forms of quantized Hamiltonians, and reformulate them into the standard Schrödinger equation with $\cosh 2x$-like potentials. Numerical evaluations of the spectra for these forms are carried out and reveal minor differences in energy levels. Among them, one interesting form possesses a Hamiltonian in the Schrödinger equation that is identical to its classical version. 
In such scenarios, the eigenvalue equations can be expressed as the vanishing of the Mathieu functions' value at $i\infty$ points, and furthermore, the Mathieu functions can be represented as the wave functions.	翻訳日:2024-02-07 20:25:57 公開日:2024-02-06
# 思考のグラフ: 大きな言語モデルで精巧な問題を解決する Graph of Thoughts: Solving Elaborate Problems with Large Language Models ( http://arxiv.org/abs/2308.09687v4 ) ライセンス: Link先を確認	Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler	(参考訳) Graph of Thoughts (GoT) を紹介する。これは大規模言語モデル(LLM)におけるプロンプト機能を、Chain-of-ThoughtやTree of Thoughts (ToT)といったパラダイムによって提供されるものを超えて推進するフレームワークである。 GoTの鍵となるアイデアと主要な利点は、LLMによって生成された情報を任意のグラフとしてモデル化する能力であり、そこでは情報の単位(LLM思考)が頂点であり、エッジはこれらの頂点間の依存関係に対応する。このアプローチにより、任意のLLM思考を相乗的な結果に組み合わせ、思考のネットワーク全体の本質を蒸留したり、フィードバックループを用いて思考を強化することができる。GoTは様々なタスクで最先端の手法に対する優位性を示し,例えば、ToTよりもソートの品質を62%向上させ、同時にコストを31%以上削減する。我々は、GoTが新しい思考変換によって拡張可能であることを保証し、それによって新しいプロンプトスキームを先導することができる。この研究は、LLM推論を人間の思考や再帰などの脳機構に近づけ、どちらも複雑なネットワークを形成する。 We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by >31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks.	翻訳日:2024-02-07 20:25:29 公開日:2024-02-06
# シングルコピー計測による$t$ドープ安定化状態の効率的な学習 Efficient learning of $t$-doped stabilizer states with single-copy measurements ( http://arxiv.org/abs/2308.07014v3 ) ライセンス: Link先を確認	Nai-Hui Chia, Ching-Yi Lai, Han-Hsuan Lin	(参考訳) 量子状態学習の主要な目的の1つは、量子回路から生成される状態の学習に時間効率の良いアルゴリズムを開発することである。初期の研究では、最大$\log(n)$個の非クリフォードゲートを持つクリフォード回路から生成される状態に対して、時間効率の良いアルゴリズムが示されている。しかし、これらのアルゴリズムはマルチコピー計測を必要とし、必要な量子メモリのために短期的に実装上の課題を提起する。それとは対照的に、計算基底でのシングルキュービット測定のみを用いる場合は、合理的な耐量子暗号の仮定の下で、1つの追加の$T$ゲートを持つクリフォード回路の出力分布でさえ学習するには不十分である。本研究では,最大$O(\log n)$個の非Cliffordゲートを持つClifford回路が生成する状態を学習するために,非適応的な単一コピー測定のみを用いる効率的な量子アルゴリズムを提案し,従来の肯定的結果と否定的結果の間のギャップを埋める。 One of the primary objectives in the field of quantum state learning is to develop algorithms that are time-efficient for learning states generated from quantum circuits. Earlier investigations have demonstrated time-efficient algorithms for states generated from Clifford circuits with at most $\log(n)$ non-Clifford gates. However, these algorithms necessitate multi-copy measurements, posing implementation challenges in the near term due to the requisite quantum memory. On the contrary, using solely single-qubit measurements in the computational basis is insufficient in learning even the output distribution of a Clifford circuit with one additional $T$ gate under reasonable post-quantum cryptographic assumptions. In this work, we introduce an efficient quantum algorithm that employs only nonadaptive single-copy measurement to learn states produced by Clifford circuits with a maximum of $O(\log n)$ non-Clifford gates, filling a gap between the previous positive and negative results.	翻訳日:2024-02-07 20:25:11 公開日:2024-02-06
# git-mol: グラフ、画像、テキストを用いた分子科学のためのマルチモーダル大言語モデル GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text ( http://arxiv.org/abs/2308.06911v3 ) ライセンス: Link先を確認	Pengfei Liu, Yiming Ren, Jun Tao and Zhixiang Ren	(参考訳) 大規模な言語モデルは自然言語処理において大きな進歩を遂げ、分子のテキスト表現を処理することによって分子科学における革新的な応用を可能にした。しかし、既存の言語モデルは複雑な分子構造や画像でリッチな情報を捉えることができない。本稿では,グラフ,画像,テキスト情報を統合したマルチモーダルな大規模言語モデルであるGIT-Molを紹介する。マルチモーダルな分子データの統合を容易にするため,全てのモダリティを統一された潜在空間に整列させることができる新しいアーキテクチャであるGIT-Formerを提案する。特性予測の精度が5%～10%向上し, 分子生成の有効性が20.2%向上した。言語間の分子翻訳戦略により, 化合物名認識や化学反応予測など, より下流の課題を遂行できる可能性が示唆された。 Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information with complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates the Graph, Image, and Text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture that is capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in properties prediction and a 20.2% boost in molecule generation validity compared to the baselines. With the any-to-language molecular translation strategy, our model has the potential to perform more downstream tasks, such as compound name recognition and chemical reaction prediction.	翻訳日:2024-02-07 20:24:53 公開日:2024-02-06
# 周期駆動システムのための対断駆動 Counterdiabatic Driving for Periodically Driven Systems ( http://arxiv.org/abs/2310.02728v2 ) ライセンス: Link先を確認	Paul Manuel Schindler and Marin Bukov	(参考訳) 周期駆動型システムは量子システムの特性を設計する上で有用な技術として登場し、量子シミュレーションの標準ツールボックスとして開発されている。このツールボックスを不完全な状態にしておくことは、強い周期ドライブにdressした状態の操作である。フロッケ制御の最先端はパラメータの断熱的変化である。しかし、これは実験におけるコヒーレンス時間の制限と矛盾する長いプロトコルを必要とする。非平衡量子物質を高速に制御するために、フロッケ系に着目した平衡から変分反断熱駆動の概念を一般化する。実効的なフロケ・ハミルトニアンに対する断熱ゲージポテンシャルの局所近似を求める非摂動的変分原理を導出する。これは、断熱体制から遠く離れたフロケ固有状態の遷移のない運転を可能にする。 2レベルFloquetバンドへの応用と周期駆動モデルとの相互作用について論じる。この技術により、非摂動光子共鳴を捕捉し、アクセス可能な制御項の局所性のような実験的な制限を尊重する高忠実度プロトコルを得ることができる。 Periodically driven systems have emerged as a useful technique to engineer the properties of quantum systems, and are in the process of being developed into a standard toolbox for quantum simulation. An outstanding challenge that leaves this toolbox incomplete is the manipulation of the states dressed by strong periodic drives. The state-of-the-art in Floquet control is the adiabatic change of parameters. Yet, this requires long protocols conflicting with the limited coherence times in experiments. To achieve fast control of nonequilibrium quantum matter, we generalize the notion of variational counterdiabatic driving away from equilibrium focusing on Floquet systems. We derive a nonperturbative variational principle to find local approximations to the adiabatic gauge potential for the effective Floquet Hamiltonian. It enables transitionless driving of Floquet eigenstates far away from the adiabatic regime. We discuss applications to two-level, Floquet band, and interacting periodically-driven models. The developed technique allows us to capture non-perturbative photon resonances and obtain high-fidelity protocols that respect experimental limitations like the locality of the accessible control terms.	翻訳日:2024-02-07 20:16:23 公開日:2024-02-06
# OceanGPT: 海洋科学タスクのための大規模言語モデル OceanGPT: A Large Language Model for Ocean Science Tasks ( http://arxiv.org/abs/2310.02031v5 ) ライセンス: Link先を確認	Zhen Bi, Ningyu Zhang, Yida Xue, Yixin Ou, Daxiong Ji, Guozhou Zheng, Huajun Chen	(参考訳) 生命と生物多様性の宝庫である海洋を探求する海洋科学は、地球の表面の70%以上を海洋がカバーしていることを考えると、非常に重要である。近年,Large Language Models (LLM) の進歩が科学のパラダイムを変えつつある。他の領域での成功にもかかわらず、現在のLLMは海洋学者のようなドメインの専門家のニーズに応えられず、海洋科学のためのLLMのポテンシャルは十分に探求されていない。内在的な理由は、海洋データの巨大で複雑な性質と、より高い粒度と知識の豊かさの必要性にあると考えられる。これらの問題を緩和するため,海洋分野における初のLLMであるOceanGPTを紹介する。マルチエージェント協調に基づいて命令を生成し,大量の海洋ドメイン命令データを自動的に取得する新しいフレームワークであるDoInstructを提案する。さらに,海洋域におけるLLMの能力を評価するため,最初の海洋学ベンチマークであるOceanBenchを構築した。総合的な実験を通じて、OceanGPTは海洋科学のタスクにおいて高度な専門知識を示すだけでなく、海洋技術における予備的な身体性知能(embodied intelligence)の能力も獲得することを示す。コード、データ、チェックポイントは近々 https://github.com/zjunlp/KnowLM で公開される。 Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, and the potential of LLMs for ocean science is under-explored. The intrinsic reason may be the immense and intricate nature of ocean data as well as the necessity for higher granularity and richness in knowledge. To alleviate these issues, we introduce OceanGPT, the first-ever LLM in the ocean domain, which is expert in various ocean science tasks. We propose DoInstruct, a novel framework to automatically obtain a large volume of ocean domain instruction data, which generates instructions based on multi-agent collaboration. Additionally, we construct the first oceanography benchmark, OceanBench, to evaluate the capabilities of LLMs in the ocean domain. 
Through comprehensive experiments, OceanGPT not only shows a higher level of knowledge expertise for ocean science tasks but also gains preliminary embodied intelligence capabilities in ocean technology. Codes, data and checkpoints will soon be available at https://github.com/zjunlp/KnowLM.	翻訳日:2024-02-07 20:15:46 公開日:2024-02-06
# 一般費用のエネルギー誘導型連続エントロピーバリアセンター推定 Energy-Guided Continuous Entropic Barycenter Estimation for General Costs ( http://arxiv.org/abs/2310.01105v2 ) ライセンス: Link先を確認	Alexander Kolesov, Petr Mokrov, Igor Udovichenko, Milena Gazdieva, Gudmund Pammer, Anastasis Kratsios, Evgeny Burnaev, Alexander Korotin	(参考訳) 最適輸送(OT)バリセンターは、幾何学的性質を捉えながら確率分布を平均化する方法である。要するに、バリセンターのタスクは、OTの相違点が与えられた確率分布の集合の平均を取ることである。任意のOTコスト関数に対して連続的エントロピーOT(EOT)バリセンタを近似する新しいアルゴリズムを提案する。我々のアプローチは、最近MLコミュニティの注目を集めている弱いOTに基づくEOT問題の二重再構成に基づいている。新規性以外にも、我々の方法にはいくつかの利点がある。 (i)復元された解の品質境界を確立する。 (ii)この手法は、エネルギーベースモデル(EBM)の学習手順とシームレスに連携し、関心のある問題に対して十分に調整されたアルゴリズムを利用できる。 (iii)ミニマックス、強化、その他の複雑な技術的トリックを避けるための直感的な最適化スキームを提供する。検証には,非ユークリッドコスト関数を含むいくつかの低次元シナリオと画像空間の設定を検討する。さらに,事前学習した生成モデルで生成した画像多様体上でバリセンタを学習する実践的課題について検討し,実世界の応用への新たな方向について検討する。 Optimal transport (OT) barycenters are a mathematically grounded way of averaging probability distributions while capturing their geometric properties. In short, the barycenter task is to take the average of a collection of probability distributions w.r.t. given OT discrepancies. We propose a novel algorithm for approximating the continuous Entropic OT (EOT) barycenter for arbitrary OT cost functions. Our approach is built upon the dual reformulation of the EOT problem based on weak OT, which has recently gained the attention of the ML community. Beyond its novelty, our method enjoys several advantageous properties: (i) we establish quality bounds for the recovered solution; (ii) this approach seamlessly interconnects with the Energy-Based Models (EBMs) learning procedure enabling the use of well-tuned algorithms for the problem of interest; (iii) it provides an intuitive optimization scheme avoiding min-max, reinforce and other intricate technical tricks. For validation, we consider several low-dimensional scenarios and image-space setups, including non-Euclidean cost functions. 
Furthermore, we investigate the practical task of learning the barycenter on an image manifold generated by a pretrained generative model, opening up new directions for real-world applications.	翻訳日:2024-02-07 20:15:24 公開日:2024-02-06
# リンク予測の再検討: データパースペクティブ Revisiting Link Prediction: A Data Perspective ( http://arxiv.org/abs/2310.00793v2 ) ライセンス: Link先を確認	Haitao Mao, Juanhui Li, Harry Shomer, Bingheng Li, Wenqi Fan, Yao Ma, Tong Zhao, Neil Shah, Jiliang Tang	(参考訳) グラフの基本的なタスクであるリンク予測は、フレンドレコメンデーション、タンパク質分析、薬物相互作用予測など、様々なアプリケーションで必須であることが証明されている。しかし、データセットは複数のドメインにまたがるので、異なるリンク形成メカニズムを持つことができる。既存の文献の証拠は、すべてのデータセットに適した普遍的に最適なアルゴリズムが存在しないことを裏付けている。本稿では,データ中心の観点から,多様なデータセットにまたがるリンク予測の原理を探求する。リンク予測に不可欠な3つの基本的な要因は,局所的構造的近接,大域的構造的近接,特徴的近接である。それらの要因間の関係を解明し (i)大域構造近接は局所構造近接が不十分な場合にのみ有効である。 (ii) 特徴点と構造的近接点の間には不整合が認められる。このような非互換性は、特徴近接係数が支配するエッジにおいて、GNNのリンク予測(GNN4LP)が一貫して過小評価される。データの観点からのこれらの新たな洞察に触発され、より包括的な評価のために適切なベンチマークデータセットを選択するためのGNN4LPモデル設計とガイドラインの実践的指導を提供する。 Link prediction, a fundamental task on graphs, has proven indispensable in various applications, e.g., friend recommendation, protein analysis, and drug interaction prediction. However, since datasets span a multitude of domains, they could have distinct underlying mechanisms of link formation. Evidence in existing literature underscores the absence of a universally best algorithm suitable for all datasets. In this paper, we endeavor to explore principles of link prediction across diverse datasets from a data-centric perspective. We recognize three fundamental factors critical to link prediction: local structural proximity, global structural proximity, and feature proximity. We then unearth relationships among those factors where (i) global structural proximity only shows effectiveness when local structural proximity is deficient. (ii) The incompatibility can be found between feature and structural proximity. Such incompatibility leads to GNNs for Link Prediction (GNN4LP) consistently underperforming on edges where the feature proximity factor dominates. 
Inspired by these new insights from a data perspective, we offer practical instruction for GNN4LP model design and guidelines for selecting appropriate benchmark datasets for more comprehensive evaluations.	翻訳日:2024-02-07 20:15:05 公開日:2024-02-06
# グラフニューラルネットワークは最適な近似アルゴリズムか? Are Graph Neural Networks Optimal Approximation Algorithms? ( http://arxiv.org/abs/2310.00526v5 ) ライセンス: Link先を確認	Morris Yau, Eric Lu, Nikolaos Karalias, Jessica Xu, Stefanie Jegelka	(参考訳) 本研究では,半正定値計画法(SDP)の強力なアルゴリズムツールを用いて,幅広い組合せ最適化問題に対する最適近似アルゴリズムをキャプチャするグラフニューラルネットワークアーキテクチャを設計する。具体的には, Unique Games Conjecture(ユニークゲーム予想)を仮定すると, 多項式サイズのメッセージパッシングアルゴリズムが最大制約充足問題に対する最も強力な多項式時間アルゴリズムを表現できることを証明する。我々はこの結果を利用して、Max-Cut、Min-Vertex-Cover、Max-3-SATといったランドマーク組合せ最適化問題に対する高品質な近似解を得る効率的なグラフニューラルネットワークアーキテクチャOptGNNを構築する。提案手法は,実世界および合成データセットの幅広い領域において,解法や神経ベースラインに対して強い実験結果が得られる。最後に, コンベックス緩和を捉えた OptGNN の機能を活用し, 学習した OptGNN の埋め込みから最適解のバウンドを生成するアルゴリズムを設計する。 In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems, using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message-passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max-Cut, Min-Vertex-Cover, and Max-3-SAT. Our approach achieves strong empirical results across a wide range of real-world and synthetic datasets against solvers and neural baselines. Finally, we take advantage of OptGNN's ability to capture convex relaxations to design an algorithm for producing bounds on the optimal solution from the learned embeddings of OptGNN.	翻訳日:2024-02-07 20:14:48 公開日:2024-02-06
# HarmonyDream:世界モデル内でのタスクハーモニゼーション HarmonyDream: Task Harmonization Inside World Models ( http://arxiv.org/abs/2310.00344v2 ) ライセンス: Link先を確認	Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long	(参考訳) モデルベース強化学習(MBRL)は、環境がどのように機能するかをモデル化し、典型的には2つのタスク、すなわち観察モデリングと報酬モデリングを包含する世界モデルを活用することで、サンプル効率の学習を約束する。本稿では,世界モデルにおいて各タスクが果たす役割について,専用の実証研究を通じてより深く理解し,見落としているサンプル効率のMBRLの可能性を明らかにする。我々の重要な洞察は、明示的なMBRLの一般的なアプローチは、観測モデルを通して環境の豊富な詳細を復元しようとするが、環境の複雑さと限られたモデル容量のために困難であるということである。一方で、暗黙のmbrlを支配しつつ、コンパクトなタスク中心のダイナミクスの学習に長けている報酬モデルは、よりリッチな学習信号なしでサンプル効率のよい学習には不十分である。これらの知見と発見に触発されて,世界モデル学習における2つのタスク間の動的平衡性を維持するために損失係数を自動的に調整する,シンプルで効果的なアプローチであるHarmonyDreamを提案する。実験の結果,HarmonyDreamをベースとしたMBRL法では,視覚ロボティクスの絶対性能が10%-69%向上し,Atari 100Kベンチマークに新たな最先端結果が得られた。 Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling. In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of sample-efficient MBRL by mitigating the domination of either observation or reward modeling. Our key insight is that while prevalent approaches of explicit MBRL attempt to restore abundant details of the environment via observation models, it is difficult due to the environment's complexity and limited model capacity. On the other hand, reward models, while dominating implicit MBRL and adept at learning compact task-centric dynamics, are inadequate for sample-efficient learning without richer learning signals. Motivated by these insights and discoveries, we propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization, i.e. a dynamic equilibrium between the two tasks in world model learning. 
Our experiments show that the base MBRL method equipped with HarmonyDream gains 10%-69% absolute performance boosts on visual robotic tasks and sets a new state-of-the-art result on the Atari 100K benchmark.	翻訳日:2024-02-07 20:14:33 公開日:2024-02-06
# 拡張ランダム化平滑化に対するリプシッツ分散マージントレードオフ The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing ( http://arxiv.org/abs/2309.16883v3 ) ライセンス: Link先を確認	Blaise Delattre, Alexandre Araujo, Quentin Barthélemy and Alexandre Allauzen	(参考訳) ディープニューラルネットワークの実際の応用は、ノイズの入力や敵対的な攻撃に直面すると不安定な予測によって妨げられる。この文脈では、認定半径はモデルの堅牢性の重要な指標である。しかし、関連する認定半径を持つ効率的な分類器をどう設計するか? ランダム化平滑化(randomized smoothing)は、ノイズを入力に注入することで、滑らかでロバストな分類器を得る、有望なフレームワークを提供する。本稿では,ランダムな平滑化過程推定におけるモンテカルロサンプリングによって生じる分散が,分類器の他の2つの重要な性質であるリプシッツ定数とマージンと密接に相互作用することを示す。より正確には、我々の研究は、滑らかな分類器と経験的分散の両方に対する基底分類器のリプシッツ定数の二重影響を強調している。さらに、証明されたロバスト半径を増やすために、基底分類器の確率ベクトルにロジットを変換して分散マージントレードオフを利用する方法を導入する。我々は、ランダム化平滑化のための拡張リプシッツ境界とともに、ベルンシュタインの濃度不等式を利用する。実験の結果,現在の最先端手法と比較して認証精度が有意に向上した。新たな認証手順により,ランダム化平滑化に使用する事前学習モデルの使用が可能となり,ゼロショット方式で現在の認証半径を効果的に改善できる。 Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. The certified radius is in this context a crucial indicator of the robustness of models. However, how can one design an efficient classifier with an associated certified radius? Randomized smoothing provides a promising framework by relying on noise injection into the inputs to obtain a smoothed and robust classifier. In this paper, we first show that the variance introduced by the Monte-Carlo sampling in the randomized smoothing procedure estimate closely interacts with two other important properties of the classifier, i.e., its Lipschitz constant and margin. More precisely, our work emphasizes the dual impact of the Lipschitz constant of the base classifier, on both the smoothed classifier and the empirical variance. Moreover, to increase the certified robust radius, we introduce a different way to convert logits to probability vectors for the base classifier to leverage the variance-margin trade-off. We leverage the use of Bernstein's concentration inequality along with enhanced Lipschitz bounds for randomized smoothing. 
Experimental results show a significant improvement in certified accuracy compared to current state-of-the-art methods. Our novel certification procedure allows us to use pre-trained models that are used with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.	翻訳日:2024-02-07 20:14:06 公開日:2024-02-06
# 機械翻訳におけるパラダイムシフト:大規模言語モデルの翻訳性能の向上 A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models ( http://arxiv.org/abs/2309.11674v2 ) ライセンス: Link先を確認	Haoran Xu, Young Jin Kim, Amr Sharaf, Hany Hassan Awadalla	(参考訳) 生成型大規模言語モデル(LLM)は様々なNLPタスクにおいて顕著な進歩を遂げている。しかし、これらの進歩は翻訳タスク、特に従来の教師付きエンコーダ・デコーダ翻訳モデルより遅れている中程度のモデルサイズ(7Bまたは13Bパラメータ)では反映されていない。これまでの研究では、これらの中等度LSMの翻訳能力の改善が試みられてきたが、その利益は限られている。本研究では、従来の翻訳モデルが依存する豊富な並列データの必要性をなくし、翻訳タスク用に特別に設計されたllmのための新しい微調整手法を提案する。提案手法は,モノリンガルデータに対する初期微調整と,それに続く少数の高品質並列データに対する微調整の2段階からなる。本稿では,ALMA (Advanced Language Model-based trAnslator) として,この戦略によって開発された LLM を紹介する。 LLaMA-2を基礎モデルとして,WMT'21(2方向)およびWMT'22(8方向)テストデータセットから10の翻訳方向にわたるゼロショット性能に対して,12BLEUおよび12COMET以上の平均的改善を達成できることを示す。 NLLB-54BモデルやGPT-3.5-text-davinci-003よりは優れており、7Bまたは13Bパラメータのみである。この手法は機械翻訳における新しい訓練パラダイムの基礎を確立する。 Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially those with moderate model sizes (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data. We introduce the LLM developed through this strategy as Advanced Language Model-based trAnslator (ALMA). 
Based on LLaMA-2 as our underlying model, our results show that the model can achieve an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. The performance is significantly better than all prior work and even superior to the NLLB-54B model and GPT-3.5-text-davinci-003, with only 7B or 13B parameters. This method establishes the foundation for a novel training paradigm in machine translation.	翻訳日:2024-02-07 20:13:43 公開日:2024-02-06
# 人間の意思決定を改善するAI不確かさの定量化 Using AI Uncertainty Quantification to Improve Human Decision-Making ( http://arxiv.org/abs/2309.10852v2 ) ライセンス: Link先を確認	Laura R. Marusich, Jonathan Z. Bakdash, Yan Zhou, Murat Kantarcioglu	(参考訳) AI不確実性定量化(UQ)は、AI予測以外の人間の意思決定を改善する可能性がある。 AIと人間の意思決定に関する過去の研究の大部分は、モデル説明可能性と解釈可能性に集中しており、UQが人間の意思決定に与える影響についてはほとんど理解していない。 2つのオンライン行動実験において、厳格なスコアリングルールを用いて校正した事例レベルのUQにおける人的意思決定への影響を評価した。最初の実験では、AI予測のみと比較して、UQは意思決定性能に有益であることを示した。第2の実験で、UQは確率的情報の様々な表現にまたがって意思決定に一般化可能な利点があることを発見した。これらの結果から、AIのインスタンスレベルの高品質なUQの実装は、AI予測単独と比較して、実際のシステムによる意思決定を改善する可能性が示唆された。 AI Uncertainty Quantification (UQ) has the potential to improve human decision-making beyond AI predictions alone by providing additional probabilistic information to users. The majority of past research on AI and human decision-making has concentrated on model explainability and interpretability, with little focus on understanding the potential impact of UQ on human decision-making. We evaluated the impact on human decision-making for instance-level UQ, calibrated using a strict scoring rule, in two online behavioral experiments. In the first experiment, our results showed that UQ was beneficial for decision-making performance compared to only AI predictions. In the second experiment, we found UQ had generalizable benefits for decision-making across a variety of representations for probabilistic information. These results indicate that implementing high quality, instance-level UQ for AI may improve decision-making with real systems compared to AI predictions alone.	翻訳日:2024-02-07 20:13:18 公開日:2024-02-06
# CC-SGG:学習シーングラフを用いたコーナーケースシナリオ生成 CC-SGG: Corner Case Scenario Generation using Learned Scene Graphs ( http://arxiv.org/abs/2309.09844v2 ) ライセンス: Link先を確認	George Drayson, Efimia Panagiotaki, Daniel Omeiza, Lars Kunze	(参考訳) コーナーケースシナリオは、自動運転車(AV)の安全性のテストと検証に不可欠なツールである。これらのシナリオは、自然主義的な運転データセットでは不十分であることが多いため、合成コーナーケースによるデータ拡張は、ユニークな状況下でのAVの安全な操作を大幅に強化する。しかし、合成的、しかし現実的なコーナーケースの生成は、大きな課題となる。本研究では,不均一グラフニューラルネットワーク(HGNN)に基づく新しい手法を導入し,通常の運転シナリオをコーナーケースに変換する。これを実現するために,我々はまず,通常の運転シーンの簡潔な表現をシーングラフとして生成し,その構造と特性を最小に操作する。我々のモデルはこれらのグラフを摂動させ、注意と三重埋め込みを用いてコーナーケースを生成する。入力グラフと摂動グラフはシミュレーションにインポートされ、コーナーケースシナリオを生成する。我々のモデルは入力シーングラフからコーナーケースを生成し、テストデータセットで89.9%の精度で予測することに成功した。さらに、ベースライン自律運転法で生成されたシナリオを検証し、ベースラインにとって重要な状況を効果的に生成するモデルの能力を実証する。 Corner case scenarios are an essential tool for testing and validating the safety of autonomous vehicles (AVs). As these scenarios are often insufficiently present in naturalistic driving datasets, augmenting the data with synthetic corner cases greatly enhances the safe operation of AVs in unique situations. However, the generation of synthetic, yet realistic, corner cases poses a significant challenge. In this work, we introduce a novel approach based on Heterogeneous Graph Neural Networks (HGNNs) to transform regular driving scenarios into corner cases. To achieve this, we first generate concise representations of regular driving scenes as scene graphs, minimally manipulating their structure and properties. Our model then learns to perturb those graphs to generate corner cases using attention and triple embeddings. The input and perturbed graphs are then imported back into the simulation to generate corner case scenarios. Our model successfully learned to produce corner cases from input scene graphs, achieving 89.9% prediction accuracy on our testing dataset. We further validate the generated scenarios on baseline autonomous driving methods, demonstrating our model's ability to effectively create critical situations for the baselines.	
翻訳日:2024-02-07 20:13:03 公開日:2024-02-06
# ベータダイバージェンスを用いた深部非負行列因子分解 Deep Nonnegative Matrix Factorization with Beta Divergences ( http://arxiv.org/abs/2309.08249v2 ) ライセンス: Link先を確認	Valentin Leplat, Le Thi Khanh Hien, Akwum Onwunta, Nicolas Gillis	(参考訳) ディープ非負行列因子化(Deep Non negative Matrix Factorization, ディープNMF)は、最近、異なるスケールで複数の特徴層を抽出する貴重な手法として登場した。しかし、既存のディープNMFモデルとアルゴリズムは、主に最小二乗誤差に基づく評価が中心であり、多様なデータセットの近似の質を評価するのに最も適していないかもしれない。例えば、音声信号や文書などのデータ型を扱う場合、$\beta$-divergencesはより適切な選択肢を提供すると広く認識されている。本稿では,Kullback-Leiblerの発散に着目し,$\beta$-divergencesを用いて深部NMFの新しいモデルとアルゴリズムを開発する。次に,これらの手法を,顔の特徴の抽出,文書収集中の話題の同定,ハイパースペクトル画像中の資料の同定に応用する。 Deep Nonnegative Matrix Factorization (deep NMF) has recently emerged as a valuable technique for extracting multiple layers of features across different scales. However, all existing deep NMF models and algorithms have primarily centered their evaluation on the least squares error, which may not be the most appropriate metric for assessing the quality of approximations on diverse datasets. For instance, when dealing with data types such as audio signals and documents, it is widely acknowledged that $\beta$-divergences offer a more suitable alternative. In this paper, we develop new models and algorithms for deep NMF using some $\beta$-divergences, with a focus on the Kullback-Leibler divergence. Subsequently, we apply these techniques to the extraction of facial features, the identification of topics within document collections, and the identification of materials within hyperspectral images.	翻訳日:2024-02-07 20:12:14 公開日:2024-02-06
# 多経路長期船舶軌道予測によるより安全な海上環境の構築 Building a Safer Maritime Environment Through Multi-Path Long-Term Vessel Trajectory Forecasting ( http://arxiv.org/abs/2310.18948v3 ) ライセンス: Link先を確認	Gabriel Spadon, Jay Kumar, Matthew Smith, Sarah Vela, Romina Gehrmann, Derek Eden, Joshua van Berkel, Amilcar Soares, Ronan Fablet, Ronald Pelot, Stan Matwin	(参考訳) 海洋輸送は世界的な経済成長を達成する上で最重要であり、持続可能性と絶滅危惧種の保護に同時に生態的義務を負う。この点において、自動識別システム(AIS)データは、船舶移動に関するリアルタイムストリーミングデータを提供することで、交通監視の強化に重要な役割を果たす。本研究では,AISデータ系列から長期の船舶軌道を予測することにより,船舶とクジラの衝突を防止するためのAISデータの活用について検討する。そこで我々は, 双方向長短期記憶ネットワーク(Bi-LSTM)を用いたエンコーダ・デコーダモデルアーキテクチャを開発し, 入力として1～3時間AISデータを用いて, 次の12時間の船舶軌道を予測した。我々は,各軌道の潜在的な経路や目的地を示す過去のAISデータから構築した確率的特徴をモデルに提供する。このモデルでは,空間的特徴学習における畳み込みレイヤと,時間的特徴学習における時系列の最近の時間ステップの重要性を増大させる位置認識型注意機構を活用することで,船の軌道を予測する。確率的特徴は、それぞれの特徴タイプに対して約85%と75%のF1スコアを持ち、ニューラルネットワークへの情報拡張の有効性を示す。我々は、タイセイヨウセミクジラ(NARW)の生息地として知られるセントローレンス湾で、我々のモデルを検証した。我々のモデルは、様々な技術と特徴を用いて、98%以上の高いR2スコアを達成した。旋回や経路選択の間に複雑な決定をすることができるため、他のアプローチの中でも際立っている。本研究は,海洋生物種の保全のためのデータ工学および軌道予測モデルの可能性を明らかにする。 Maritime transportation is paramount in achieving global economic growth, entailing concurrent ecological obligations in sustainability and safeguarding endangered marine species, most notably preserving large whale populations. In this regard, the Automatic Identification System (AIS) data plays a significant role by offering real-time streaming data on vessel movement, allowing enhanced traffic monitoring. This study explores using AIS data to prevent vessel-to-whale collisions by forecasting long-term vessel trajectories from engineered AIS data sequences. For such a task, we have developed an encoder-decoder model architecture using Bidirectional Long Short-Term Memory Networks (Bi-LSTM) to predict the next 12 hours of vessel trajectories using 1 to 3 hours of AIS data as input. We feed the model with probabilistic features engineered from historical AIS data that refer to each trajectory's potential route and destination. 
The model then predicts the vessel's trajectory, considering these additional features by leveraging convolutional layers for spatial feature learning and a position-aware attention mechanism that increases the importance of recent timesteps of a sequence during temporal feature learning. The probabilistic features have an F1 Score of approximately 85% and 75% for each feature type, respectively, demonstrating their effectiveness in augmenting information to the neural network. We test our model on the Gulf of St. Lawrence, a region known to be the habitat of North Atlantic Right Whales (NARW). Our model achieved a high R2 score of over 98% using various techniques and features. It stands out among other approaches as it can make complex decisions during turnings and path selection. Our study highlights the potential of data engineering and trajectory forecasting models for marine life species preservation.	翻訳日:2024-02-07 20:05:17 公開日:2024-02-06
# 言語モデルにおける真実性をモデル化するペルソナ Personas as a Way to Model Truthfulness in Language Models ( http://arxiv.org/abs/2310.18168v5 ) ライセンス: Link先を確認	Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, He He	(参考訳) 大規模な言語モデル(LLM)は、インターネットから大量のテキストで訓練されており、事実と誤解を招く情報の両方を含んでいる。 LMの古典的な見方からは直観的ではないが、最近の研究は、文の真理値がモデルの表現から引き出すことができることを示した。本稿では,真理ラベルのトレーニングを受けていないLMが真理を知っているように見える理由を説明する。事前学習データは、出力が共通の特徴を持つ(非)真実なエージェントのグループによって生成され、それらが(非)真実なペルソナを形成すると仮定する。このデータに基づいてトレーニングすることで、LMはそのアクティベーション空間におけるペルソナを推論し、表現することができる。これにより、モデルは真理を虚偽から切り離し、その生成の真実性を制御できる。我々は,(1)モデルの回答が真実かどうかを生成前にプローブで判定でき,(2)事実の集合上でモデルを微調整することで,未知のトピックに対する真実性が改善される,という2つの観察を通してペルソナ仮説の証拠を示す。次に,算術を合成環境として用いることで,事前学習データの構造が真実なペルソナを推測するために重要であることを示す。全体としては、モデルがデータの階層構造を利用して真実性のような抽象概念を学習できることが示唆されている。 Large language models (LLMs) are trained on vast amounts of text from the internet, which contains both factual and misleading information about the world. While unintuitive from a classic view of LMs, recent work has shown that the truth value of a statement can be elicited from the model's representations. This paper presents an explanation for why LMs appear to know the truth despite not being trained with truth labels. We hypothesize that the pretraining data is generated by groups of (un)truthful agents whose outputs share common features, and they form a (un)truthful persona. By training on this data, LMs can infer and represent the persona in its activation space. This allows the model to separate truth from falsehoods and controls the truthfulness of its generation. We show evidence for the persona hypothesis via two observations: (1) we can probe whether a model's answer will be truthful before it is generated; (2) finetuning a model on a set of facts improves its truthfulness on unseen topics. Next, using arithmetics as a synthetic environment, we show that structures of the pretraining data are crucial for the model to infer the truthful persona. 
Overall, our findings suggest that models can exploit hierarchical structures in the data to learn abstract concepts like truthfulness.	翻訳日:2024-02-07 20:04:47 公開日:2024-02-06
# 動作駆動型人間の運動予測のための方向認識脚運動学習 Orientation-Aware Leg Movement Learning for Action-Driven Human Motion Prediction ( http://arxiv.org/abs/2310.14907v2 ) ライセンス: Link先を確認	Chunzhi Gu, Chao Zhang, Shigeru Kuriyama	(参考訳) 行動駆動型人間の動作予測の課題は、与えられた行動ラベルを尊重しながら、観察されたシーケンスに基づいて将来の人間の動作を予測することである。人間の動きの確率性だけでなく、複数のアクションラベル間の滑らかで現実的な遷移をモデル化する必要がある。しかし、ほとんどのデータセットがそのような遷移データを含まないという事実は、このタスクを複雑にします。既存の作業は、単にスムーズな遷移を促進する前に滑らかさを学ぶことでこの問題に取り組むが、特に歴史と予測された動きが方向で著しく異なる場合、不自然な遷移をもたらす。本稿では,人間の動作遷移が現実的な脚の動きを取り入れて方向転換を処理し,それを動作条件付き対話型学習タスク(ACB)として活用し,遷移自然性を促進することを論じる。全てのトランジションをモデル化することは事実上不可能であるため、ACBはウォークやランなどのアクティブな歩行動作を持つ非常に少数のアクションクラスでのみ実行される。具体的には、まず動き拡散モデルを用いて、特定の将来の動作で目標動きを生成し、次に、観察と予測をスムーズに連結し、最終的に動き予測に対処した2段階予測戦略に従う。本手法はトレーニング中にラベル付き動作遷移データから完全に解放される。提案手法のロバスト性を示すため,1つのデータセット上でトレーニングした相互学習モデルを2つの大規模動きデータセットに一般化し,自然な遷移を生成する。 3つのベンチマークデータセットを総合的に評価した結果, 視覚的品質, 予測精度, 行動忠実度の観点から, 最先端の性能が得られた。 The task of action-driven human motion prediction aims to forecast future human motion based on the observed sequence while respecting the given action label. It requires modeling not only the stochasticity within human motion but the smooth yet realistic transition between multiple action labels. However, the fact that most datasets do not contain such transition data complicates this task. Existing work tackles this issue by learning a smoothness prior to simply promote smooth transitions, yet doing so can result in unnatural transitions especially when the history and predicted motions differ significantly in orientations. In this paper, we argue that valid human motion transitions should incorporate realistic leg movements to handle orientation changes, and cast it as an action-conditioned in-betweening (ACB) learning task to encourage transition naturalness. Because modeling all possible transitions is virtually unreasonable, our ACB is only performed on very few selected action classes with active gait motions, such as Walk or Run. 
Specifically, we follow a two-stage forecasting strategy by first employing the motion diffusion model to generate the target motion with a specified future action, and then producing the in-betweening to smoothly connect the observation and prediction to eventually address motion prediction. Our method is completely free from the labeled motion transition data during training. To show the robustness of our approach, we generalize our trained in-betweening learning model on one dataset to two unseen large-scale motion datasets to produce natural transitions. Extensive experimental evaluations on three benchmark datasets demonstrate that our method yields the state-of-the-art performance in terms of visual quality, prediction accuracy, and action faithfulness.	翻訳日:2024-02-07 20:04:24 公開日:2024-02-06
# LASER: 無線分散最適化における線形圧縮 LASER: Linear Compression in Wireless Distributed Optimization ( http://arxiv.org/abs/2310.13033v2 ) ライセンス: Link先を確認	Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar	(参考訳) データ並列SGDは分散最適化、特に大規模機械学習のためのデファクトアルゴリズムである。その利点にもかかわらず、コミュニケーションのボトルネックは永続的な問題の1つだ。これを緩和するほとんどの圧縮スキームは、ノイズレス通信リンクを仮定するか、実用的なタスクで良いパフォーマンスを達成できないかのいずれかである。本稿では,このギャップを埋めて LASER: LineAr CompreSsion in WirEless DistRibuted Optimization を紹介する。 LASERは勾配の固有の低ランク構造を利用し、ノイズチャネル上で効率的に伝送する。古典的なSGDと同様の理論的保証を享受する一方で、LASERは様々な実用的なベンチマークでベースラインよりも一貫した利得を示している。特に、難易度の高いコンピュータビジョンとGPT言語モデリングのタスクにおいて、最先端の圧縮スキームよりも優れている。後者では、ノイズの多いチャネルにおいてベースラインよりもパープレキシティが50〜64%改善する。 Data-parallel SGD is the de facto algorithm for distributed optimization, especially for large scale machine learning. Despite its merits, communication bottleneck is one of its persistent issues. Most compression schemes to alleviate this either assume noiseless communication links, or fail to achieve good performance on practical tasks. In this paper, we close this gap and introduce LASER: LineAr CompreSsion in WirEless DistRibuted Optimization. LASER capitalizes on the inherent low-rank structure of gradients and transmits them efficiently over the noisy channels. Whilst enjoying theoretical guarantees similar to those of the classical SGD, LASER shows consistent gains over baselines on a variety of practical benchmarks. In particular, it outperforms the state-of-the-art compression schemes on challenging computer vision and GPT language modeling tasks. On the latter, we obtain 50-64% improvement in perplexity over our baselines for noisy channels.	翻訳日:2024-02-07 20:03:34 公開日:2024-02-06
# Loci-Segmented: シーンセグメンテーション学習の改善 Loci-Segmented: Improving Scene Segmentation Learning ( http://arxiv.org/abs/2310.10410v3 ) ライセンス: Link先を確認	Manuel Traub, Frederic Becker, Adrian Sauter, Sebastian Otte, Martin V. Butz	(参考訳) 画像や映像からのシーンセグメンテーションのための現在のスロット指向アプローチは、提供された背景情報やスロット割り当てに依存している。本稿では,ロシ・セグメンツド(Loci-Segmented, Loci-s)という,これらの情報を必要としないセグメンテーションされた位置情報・ID追跡システムを提案する。シーンを動的に解釈可能な背景とスロットベースのオブジェクトエンコーディングに分割し、rgb、マスク、位置、深さ情報を分離する。その結果,MOViデータセットと,シーンセグメンテーションをターゲットとした別のデータセットコレクションにおいて,映像分解性能が大幅に向上したことが明らかとなった。このシステムのよく解釈可能な合成潜在エンコーディングは、下流タスクの基礎モデルとして機能する。 Current slot-oriented approaches for compositional scene segmentation from images and videos rely on provided background information or slot assignments. We present a segmented location and identity tracking system, Loci-Segmented (Loci-s), which does not require either of this information. It learns to dynamically segment scenes into interpretable background and slot-based object encodings, separating rgb, mask, location, and depth information for each. The results reveal largely superior video decomposition performance in the MOVi datasets and in another established dataset collection targeting scene segmentation. The system's well-interpretable, compositional latent encodings may serve as a foundation model for downstream tasks.	翻訳日:2024-02-07 20:03:20 公開日:2024-02-06
# マルチボディニューラルシーンフロー Multi-Body Neural Scene Flow ( http://arxiv.org/abs/2310.10301v2 ) ライセンス: Link先を確認	Kavisha Vidanapathirana, Shin-Fang Chng, Xueqian Li, Simon Lucey	(参考訳) ニューラルネットワークをニューラルネットワークとして使用したシーンフローのテスト時間最適化は、単純さ、データセットバイアスの欠如、最先端のパフォーマンスなどによって人気を集めている。しかし, 座標ネットワークは, 空間的平滑なシーンフロー予測を暗黙的に正則化することにより, 一般運動を捉えるが, 先行する神経は実世界データに存在する多体剛性運動を識別できない。これを解決するために, 従来の研究と同様, 剛体のSE(3)$パラメータを制約する, 煩雑で不安定な戦略を使わずに, 多体剛性を実現できることを示す。これは、剛体の流れ予測における等長性を促進するためにシーンフロー最適化を定式化することで達成される。この戦略により、連続した流れ場を維持しながら、シーンフローの多体剛性が可能となり、点雲の列をまたいだ密集した長期のシーンフロー統合が可能になる。我々は,実世界のデータセットに関する広範囲な実験を行い,我々のアプローチが3次元シーンフローと長期的ポイントワイズ4次元軌道予測の最先端を上回っていることを実証する。コードはhttps://github.com/kavisha725/mbnsfで入手できる。 The test-time optimization of scene flow - using a coordinate network as a neural prior - has gained popularity due to its simplicity, lack of dataset bias, and state-of-the-art performance. We observe, however, that although coordinate networks capture general motions by implicitly regularizing the scene flow predictions to be spatially smooth, the neural prior by itself is unable to identify the underlying multi-body rigid motions present in real-world data. To address this, we show that multi-body rigidity can be achieved without the cumbersome and brittle strategy of constraining the $SE(3)$ parameters of each rigid body as done in previous works. This is achieved by regularizing the scene flow optimization to encourage isometry in flow predictions for rigid bodies. This strategy enables multi-body rigidity in scene flow while maintaining a continuous flow field, hence allowing dense long-term scene flow integration across a sequence of point clouds. We conduct extensive experiments on real-world datasets and demonstrate that our approach outperforms the state-of-the-art in 3D scene flow and long-term point-wise 4D trajectory prediction. The code is available at: https://github.com/kavisha725/MBNSF.	翻訳日:2024-02-07 20:03:08 公開日:2024-02-06
# 相関の減衰からギブス状態の局所性と安定性へ From decay of correlations to locality and stability of the Gibbs state ( http://arxiv.org/abs/2310.09182v2 ) ライセンス: Link先を確認	Ángela Capel, Massimo Moscolari, Stefan Teufel, Tom Wessel	(参考訳) 本稿では,ギブス状態が相関の減衰を満たす場合,局所摂動がギブス状態に局所的にしか影響しないという意味で安定であり,かつ局所的である,すなわち局所識別不可能性を満たすことを示す。これらの含意は任意の次元において成り立ち、ハミルトニアンの局所性のみを必要とし、リーブ・ロビンソン境界に依拠する。次に、この結果を、相関の減衰が成り立つことが知られている十分高温での短距離相互作用を持つ任意の次元の量子スピン系に明示的に適用する。さらに,並進不変かつ指数的に減衰する相互作用を持つ有限一次元スピン鎖のギブス状態に適用し,有限範囲相互作用の極限でゼロとなる閾値温度以上で相関の減衰が成り立つことも示す。我々の証明は、ギブス状態に対する量子信念伝播の局所性の詳細な解析に基づいている。 In this paper we show that whenever a Gibbs state satisfies decay of correlations, then it is stable, in the sense that local perturbations influence the Gibbs state only locally, and it is local, namely it satisfies local indistinguishability. These implications hold in any dimension, only require locality of the Hamiltonian and rely on Lieb-Robinson bounds. Then, we explicitly apply our results to quantum spin systems in any dimension with short-range interactions at high enough temperature, where decay of correlations is known to hold. Furthermore, our results are applied to Gibbs states of finite one-dimensional spin chains with translation-invariant and exponentially decaying interactions, for which we also show that decay of correlations holds true above a threshold temperature that goes to zero in the limit of finite-range interactions. Our proofs are based on a detailed analysis of the locality properties of the quantum belief propagation for Gibbs states.	翻訳日:2024-02-07 20:02:16 公開日:2024-02-06
# Web上での銃器密売行動分析のための自己教師型視覚学習 Self-supervised visual learning for analyzing firearms trafficking activities on the Web ( http://arxiv.org/abs/2310.07975v2 ) ライセンス: Link先を確認	Sotirios Konstantakos, Despina Ioanna Chalkiadaki, Ioannis Mademlis, Adamantia Anna Rebolledo Chrysochoou, Georgios Th. Papadopoulos	(参考訳) RGB画像からの銃器の自動視覚分類は、公共空間のセキュリティ、情報収集、法執行機関の捜査に応用される重要な現実世界の課題である。 World Wide Web(ソーシャルメディアやダークウェブサイトを含む)から大量にクロールされた画像に適用すると、オープンソースインテリジェンスから得たビッグデータを分析することで、犯罪的な銃器密売ネットワークを特定しようとするシステムの重要な構成要素となる。ディープニューラルネットワーク(DNN)は、これを実現するための最先端の方法論であり、畳み込みニューラルネットワーク(CNN)が一般的に使用されている。一般的な転移学習アプローチは、ImageNet-1kのような画像全体の分類のための大規模で汎用的なアノテーション付きデータセットで事前学習し、次に、銃器の視覚分類のためのより小さなアノテーション付きタスク固有の下流データセットでDNNを微調整する。Visual Transformer(ViT)ニューラルアーキテクチャも、自己教師あり学習(SSL)アプローチも、この重要なタスクではこれまで評価されていない。 . Automated visual firearms classification from RGB images is an important real-world task with applications in public space security, intelligence gathering and law enforcement investigations. When applied to images massively crawled from the World Wide Web (including social media and dark Web sites), it can serve as an important component of systems that attempt to identify criminal firearms trafficking networks, by analyzing Big Data from open-source intelligence. Deep Neural Networks (DNN) are the state-of-the-art methodology for achieving this, with Convolutional Neural Networks (CNN) being typically employed. The common transfer learning approach consists of pretraining on a large-scale, generic annotated dataset for whole-image classification, such as ImageNet-1k, and then finetuning the DNN on a smaller, annotated, task-specific, downstream dataset for visual firearms classification. Neither Visual Transformer (ViT) neural architectures nor Self-Supervised Learning (SSL) approaches have been so far evaluated on this critical task..	翻訳日:2024-02-07 20:01:57 公開日:2024-02-06
# 大言語モデルにおける文のアナロジー同定と文構造符号化の関係について On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models ( http://arxiv.org/abs/2310.07818v3 ) ライセンス: Link先を確認	Thilini Wijesiriwardene, Ruwan Wickramarachchi, Aishwarya Naresh Reganti, Vinija Jain, Aman Chadha, Amit Sheth, Amitava Das	(参考訳) 言語の構文構造と意味構造を符号化するLarge Language Models (LLMs) の能力は、NLPにおいてよく調べられている。さらに、アナロジー識別は、単語アナロジーの形で、過去10年間の言語モデリングの文献で広く研究されてきた。本研究では,LLMが文アナロジー(互いに類似した意味を伝える文)を捉える能力が,文の構文的・意味的構造を符号化する能力とともにどのように変化するかを検討する。分析の結果,LLMの文アナロジーを識別する能力は,文の構文的・意味的構造を符号化する能力と正の相関があることがわかった。特に,構文構造をよりよく捉えるLLMは,文アナロジーを識別する能力も高いことが判明した。 The ability of Large Language Models (LLMs) to encode syntactic and semantic structures of language is well examined in NLP. Additionally, analogy identification, in the form of word analogies, has been extensively studied in the last decade of language modeling literature. In this work, we specifically look at how LLMs' abilities to capture sentence analogies (sentences that convey analogous meaning to each other) vary with LLMs' abilities to encode syntactic and semantic structures of sentences. Through our analysis, we find that LLMs' ability to identify sentence analogies is positively correlated with their ability to encode syntactic and semantic structures of sentences. Specifically, we find that the LLMs that capture syntactic structures better also have higher abilities in identifying sentence analogies.	翻訳日:2024-02-07 20:01:28 公開日:2024-02-06
# オンライン言語モデルインタラクションのための圧縮コンテキストメモリ Compressed Context Memory For Online Language Model Interaction ( http://arxiv.org/abs/2312.03414v2 ) ライセンス: Link先を確認	Jang-Hyun Kim, Junyoung Yeom, Sangdoo Yun, Hyun Oh Song	(参考訳) 本稿では,コンテキストが継続的に拡大するオンラインシナリオにおけるTransformer言語モデルに対するコンテキストキー/値圧縮手法を提案する。コンテキストが長くなるにつれて、注意プロセスはメモリと計算の増大を必要とし、それによって言語モデルのスループットが低下する。この課題に対処するため、蓄積するアテンションのキー/値ペアをコンパクトなメモリ空間に継続的に圧縮し、計算環境の限られたメモリ空間での言語モデル推論を容易にする圧縮コンテキストメモリシステムを提案する。私たちの圧縮プロセスでは、推論中に軽量な条件付きLoRAを言語モデルの前方パスに統合し、モデルの重み全体を微調整する必要はありません。再帰的圧縮プロセスを単一の並列化された前方計算としてモデル化することにより,効率的なトレーニングを実現する。会話,パーソナライゼーション,マルチタスク学習の評価を通じて,本手法が$5\times$小さいコンテキストメモリサイズでフルコンテキストモデルの性能レベルを達成できることを実証した。さらに,無制限なコンテキスト長のストリーミング環境において本手法の適用性を示し,スライディングウィンドウアプローチを上回る性能を示す。コードはhttps://github.com/snu-mllab/context-memoryで入手できる。 This paper presents a context key/value compression method for Transformer language models in online scenarios, where the context continually expands. As the context lengthens, the attention process demands increasing memory and computations, which in turn reduces the throughput of the language model. To address this challenge, we propose a compressed context memory system that continually compresses the accumulating attention key/value pairs into a compact memory space, facilitating language model inference in a limited memory space of computing environments. Our compression process involves integrating a lightweight conditional LoRA into the language model's forward pass during inference, without the need for fine-tuning the model's entire set of weights. We achieve efficient training by modeling the recursive compression process as a single parallelized forward computation. Through evaluations on conversation, personalization, and multi-task learning, we demonstrate that our approach achieves the performance level of a full context model with $5\times$ smaller context memory size. 
We further demonstrate the applicability of our approach in a streaming setting with an unlimited context length, outperforming the sliding window approach. Code is available at https://github.com/snu-mllab/context-memory.	翻訳日:2024-02-07 19:52:44 公開日:2024-02-06
# クエンチ力学の線形スケールシミュレーション Linear-scale simulations of quench dynamics ( http://arxiv.org/abs/2311.09556v2 ) ライセンス: Link先を確認	Niaz Ali Khan, Wen Chen, Munsif Jan, and Gao Xianlong	(参考訳) 量子系の非平衡特性の正確な記述とロバストな計算モデリングは、凝縮物質物理学の課題である。本研究では,量子クエンチ系の非平衡力学に対する線形スケール計算シミュレーション手法を開発した。特に,非相互作用量子クエンチ系の動的量子相転移を記述するために,Loschmidtエコーの多項式展開を報告する。拡張に基づく手法により、ハミルトニアン系を対角化することなく、無限大系に対するLoschmidtエコーを効率的に計算できる。その有用性を示すために, 密結合準結晶と不規則格子の1つの空間次元における量子クエンチングダイナミクスを強調する。さらに、格子モデルの下でのクエンチダイナミクスにおける波動ベクトルの役割についても論じる。波動ベクトル非依存の動的位相遷移を自己双対局在モデルで観測する。 The accurate description and robust computational modeling of the nonequilibrium properties of quantum systems remain a challenge in condensed matter physics. In this work, we develop a linear-scale computational simulation technique for the non-equilibrium dynamics of quantum quench systems. In particular, we report a polynomial-expansion of the Loschmidt echo to describe the dynamical quantum phase transitions of noninteracting quantum quench systems. An expansion-based method allows us to efficiently compute the Loschmidt echo for infinitely large systems without diagonalizing the system Hamiltonian. To demonstrate its utility, we highlight quantum quenching dynamics under tight-binding quasicrystals and disordered lattices in one spatial dimension. In addition, the role of the wave vector on the quench dynamics under lattice models is addressed. We observe wave vector-independent dynamical phase transitions in self-dual localization models.	翻訳日:2024-02-07 19:52:05 公開日:2024-02-06
# PCAを超えて: 特徴抽出のための確率的グラム・シュミットアプローチ Beyond PCA: A Probabilistic Gram-Schmidt Approach to Feature Extraction ( http://arxiv.org/abs/2311.09386v2 ) ライセンス: Link先を確認	Bahram Yaghooti, Netanel Raviv, Bruno Sinopoli	(参考訳) データ間に非線形依存が存在する状況での線形特徴抽出は、教師なし学習における基本的な課題である。本稿では,冗長な次元を検出・マップアウトするために,確率的グラム・シュミット(GS)型直交化法を提案する。具体的には、データ内の非線形依存を捉えると想定される関数族にGSプロセスを適用することで、新しい大きな分散方向を識別したり、主成分からそれらの依存を取り除くために使用できる一連の共分散行列を構築する。前者の場合、エントロピー低減の観点から情報理論的な保証を提供する。後者では、ある仮定の下で、選択された関数族の線形スパンに依存関係がある場合、得られるアルゴリズムが非線形依存を検出し、除去することを証明する。どちらの手法も非線形冗長性を取り除きながらデータから線形特徴を抽出する。抽出された特徴の分散最大化と分類アルゴリズムの性能向上の両方の観点から,PCAおよび最先端の線形特徴抽出アルゴリズムに対する性能向上を示す、合成および実世界のデータセットでのシミュレーション結果を提供する。さらに,本手法は非線形手法であるカーネルPCAと同等であり、しばしばそれを上回る。 Linear feature extraction in the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning. We propose using a probabilistic Gram-Schmidt (GS) type orthogonalization process in order to detect and map out redundant dimensions. Specifically, by applying the GS process over a family of functions which presumably captures the nonlinear dependencies in the data, we construct a series of covariance matrices that can either be used to identify new large-variance directions, or to remove those dependencies from the principal components. In the former case, we provide information-theoretic guarantees in terms of entropy reduction. In the latter, we prove that under certain assumptions the resulting algorithms detect and remove nonlinear dependencies whenever those dependencies lie in the linear span of the chosen function family. Both proposed methods extract linear features from the data while removing nonlinear redundancies. We provide simulation results on synthetic and real-world datasets which show improved performance over PCA and state-of-the-art linear feature extraction algorithms, both in terms of variance maximization of the extracted features, and in terms of improved performance of classification algorithms. 
Additionally, our methods are comparable to, and often outperform, the non-linear method of kernel PCA.	翻訳日:2024-02-07 19:51:52 公開日:2024-02-06
# CodeScope: コード理解と生成におけるLLM評価のための実行型多言語マルチタスク多次元ベンチマーク CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation ( http://arxiv.org/abs/2311.08588v2 ) ライセンス: Link先を確認	Weixiang Yan, Haitian Liu, Yunkun Wang, Yunzhe Li, Qian Chen, Wen Wang, Tingyu Lin, Weishan Zhao, Li Zhu, Shuiguang Deng, Hari Sundaram	(参考訳) 大規模言語モデル(LLM)は、特に人間のプログラミング支援とプログラミング自動化の促進において、コーディングに関連するタスクで顕著なパフォーマンスを示している。しかし、LLMのコード理解と生成能力を評価するための既存のベンチマークには深刻な制限がある。第一に、ほとんどのベンチマークは狭い範囲の一般的なプログラミング言語と特定のタスクにのみ焦点を当てているが、実際のソフトウェア開発シナリオでは、多様な要件を満たすために多言語プログラミング環境でシステムを実装する必要性が高い。また、実用的なプログラミングでは、LLMのコーディング能力を包括的かつ堅牢にテストするためのマルチタスク設定が強く求められる。第二に、ほとんどのベンチマークは、生成されたコードの実際の実行可能性と実行結果の一貫性を考慮していない。既存のベンチマークと実用アプリケーションからの期待とのギャップを埋めるため,コーディングタスクにおけるLLMの能力を包括的に測定する,実行ベース,多言語,マルチタスク,多次元の評価ベンチマークであるCodeScopeを導入する。CodeScopeは43のプログラミング言語と8つのコーディングタスクをカバーし、難易度, 効率, 長さの3つの次元からLLMのコーディング性能を評価する。コード生成の実行ベース評価を容易にするため,14のプログラミング言語をサポートする自動コード実行エンジンであるMultiCodeEngineを開発した。最後に,CodeScopeのタスク上で8つの主要なLLMを体系的に評価・分析し,他のベンチマークと比較して,コード理解および生成タスクにおけるLLMの評価に対するCodeScopeの優れた広さと難しさを示す。CodeScopeベンチマークとデータセットはhttps://github.com/WeixiangYAN/CodeScopeで公開されている。 Large Language Models (LLMs) have demonstrated remarkable performance on coding-related tasks, particularly in assisting humans with programming and facilitating programming automation. However, existing benchmarks for evaluating the code understanding and generation capacities of LLMs suffer from severe limitations. First, most benchmarks are deficient as they focus on a narrow range of popular programming languages and specific tasks, whereas real-world software development scenarios show a dire need to implement systems in multilingual programming environments to satisfy diverse requirements. Practical programming also calls for multi-task settings that test the coding capabilities of LLMs comprehensively and robustly. 
Second, most benchmarks also fail to consider the actual executability and the consistency of execution results of the generated code. To bridge these gaps between existing benchmarks and expectations from practical applications, we introduce CodeScope, an execution-based, multilingual, multi-task, multi-dimensional evaluation benchmark for comprehensively gauging LLM capabilities on coding tasks. CodeScope covers 43 programming languages and 8 coding tasks. It evaluates the coding performance of LLMs from three dimensions (perspectives): difficulty, efficiency, and length. To facilitate execution-based evaluations of code generation, we develop MultiCodeEngine, an automated code execution engine that supports 14 programming languages. Finally, we systematically evaluate and analyze 8 mainstream LLMs on CodeScope tasks and demonstrate the superior breadth and challenges of CodeScope for evaluating LLMs on code understanding and generation tasks compared to other benchmarks. The CodeScope benchmark and datasets are publicly available at https://github.com/WeixiangYAN/CodeScope.	翻訳日:2024-02-07 19:51:34 公開日:2024-02-06
# Vlasov-Maxwell方程式を解くための量子化テンソルネットワーク Quantized tensor networks for solving the Vlasov-Maxwell equations ( http://arxiv.org/abs/2311.07756v2 ) ライセンス: Link先を確認	Erika Ye, Nuno Loureiro	(参考訳) ヴラソフ・マクスウェル方程式は衝突のないプラズマの第一原理(ab initio)的な記述を提供するが、その解法は計算コストが高いため現実的ではないことが多い。本研究では,量子化テンソルネットワーク(QTN)フレームワークを用いた半陰的Vlasov-Maxwell解法を実装する。このフレームワークは、高次元データセットの低ランク近似を効率的に表現し、操作することができる。その結果、ソルバのコストはパラメータ$D$(いわゆる結合次元)に対して多項式的にスケールし、これは低ランク近似に関連する誤差に直接関係する。$D$を増加させることで、低ランク近似なしで解法が得る力学への収束が保証される。ここで考慮された2D3Vテスト問題に対して、シミュレーションは合計$2^{36}$個のグリッドポイントを用いており、厳密な計算には$D=2^{18}$が必要となるにもかかわらず、控えめな$D=64$で期待される物理を捉えるのに十分であるように見える。さらに、Dirac-Frenkel変分原理に基づくQTN時間発展スキームを用いることで、Courant-Friedrichs-Lewy(CFL)制約により規定されるよりも大きな時間ステップを使うことができる。このように、QTN形式は、コストを大幅に削減してVlasov-Maxwell方程式を近似的に解く有望な手段であるように見える。 While the Vlasov-Maxwell equations provide an ab initio description of collisionless plasmas, solving them is often impractical due to high computational costs. In this work, we implement a semi-implicit Vlasov-Maxwell solver utilizing the quantized tensor network (QTN) framework. This framework allows one to efficiently represent and manipulate low-rank approximations of high-dimensional data sets. As a result, the cost of the solver scales polynomially with parameter $D$ (the so-called bond dimension), which is directly related to the error associated with the low-rank approximation. By increasing $D$, convergence to the dynamics that the solver would obtain without any low-rank approximation is guaranteed. We find that for the 2D3V test problems considered here, a modest $D=64$ appears to be sufficient for capturing the expected physics, despite the simulations using a total of $2^{36}$ grid points and thus requiring $D=2^{18}$ for exact calculations. Additionally, we utilize a QTN time evolution scheme based on the Dirac-Frenkel variational principle, which allows us to use larger time steps than that prescribed by the Courant-Friedrichs-Lewy (CFL) constraint. 
As such, the QTN format appears to be a promising means of approximately solving the Vlasov-Maxwell equations with significantly reduced cost.	翻訳日:2024-02-07 19:51:06 公開日:2024-02-06
# 非局所的脱落を伴うXXスピン鎖の超拡散磁化輸送 Superdiffusive magnetization transport in the XX spin chain with non-local dephasing ( http://arxiv.org/abs/2311.07375v2 ) ライセンス: Link先を確認	Marko Znidaric	(参考訳) 熱力学限界における超拡散磁化輸送を実証し,非局所的デファス法[arXiv:2310.03069]を定常境界駆動条件で検討した。超拡散の出現はリンドブラッド作用素が2項のコヒーレント和であり、それぞれが別々に拡散を引き起こすので、かなり興味深い。したがって、2つの拡散項のコヒーレント和が超拡散をもたらす量子現象を持つ。また超拡散モデルの摂動について研究し、散逸子の正確な形を破り、XX鎖に相互作用を加えることで超拡散が拡散へと変化することを示した。 We study a recently discussed XX spin chain with non-local dephasing [arXiv:2310.03069] in a steady-state boundary-driven setting, confirming superdiffusive magnetization transport in the thermodynamic limit. The emergence of superdiffusion is rather interesting as the Lindblad operators causing it are a coherent sum of two terms, each of which would separately cause diffusion. One therefore has a quantum phenomenon where a coherent sum of two diffusive terms results in superdiffusion. We also study perturbations of the superdiffusive model, finding that breaking the exact form of dissipators, as well as adding interactions to the XX chain, results in superdiffusion changing into diffusion.	翻訳日:2024-02-07 19:50:41 公開日:2024-02-06
# 関数空間上の条件最適輸送 Conditional Optimal Transport on Function Spaces ( http://arxiv.org/abs/2311.05672v3 ) ライセンス: Link先を確認	Bamdad Hosseini, Alexander W. Hsu, Amirhossein Taghvaei	(参考訳) 本稿では, 最適輸送の観点からの関数空間における条件付き三角輸送マップの体系的研究と, 償却ベイズ推定の観点から述べる。より具体的には、条件測度とそのカントロヴィチ緩和を特徴付けるブロック三角モンジュ写像を記述する制約付き最適輸送問題の理論を開発する。これは、一般的なコスト関数を持つ分離可能な無限次元函数空間への最適三角輸送の理論を一般化する。さらに,ベイズ推定問題の場合には,結果をさらに調整し,前者から後者まで条件付け写像の正則性推定を得る。最後に,機能パラメータのアモートおよび可能性のない推論に対する理論的結果の計算的適用性を示す数値実験について述べる。 We present a systematic study of conditional triangular transport maps in function spaces from the perspective of optimal transportation and with a view towards amortized Bayesian inference. More specifically, we develop a theory of constrained optimal transport problems that describe block-triangular Monge maps that characterize conditional measures along with their Kantorovich relaxations. This generalizes the theory of optimal triangular transport to separable infinite-dimensional function spaces with general cost functions. We further tailor our results to the case of Bayesian inference problems and obtain regularity estimates on the conditioning maps from the prior to the posterior. Finally, we present numerical experiments that demonstrate the computational applicability of our theoretical results for amortized and likelihood-free inference of functional parameters.	翻訳日:2024-02-07 19:50:26 公開日:2024-02-06
# cafe: 地理的分散データセンターにおけるカーボンアウェアフェデレート学習 CAFE: Carbon-Aware Federated Learning in Geographically Distributed Data Centers ( http://arxiv.org/abs/2311.03615v2 ) ライセンス: Link先を確認	Jieming Bian, Lei Wang, Shaolei Ren, Jie Xu	(参考訳) 大規模人工知能(ai)モデルの訓練には、重要な計算能力とエネルギーが必要であり、環境影響の可能性のある炭素フットプリントの増加に繋がる。本稿は、地理的に分散した(地理的に分散した)データセンターでAIモデルをトレーニングする際の課題を考察し、学習性能と炭素フットプリントのバランスを強調する。我々はフェデレートラーニング(FL)を、生データよりもモデルパラメータ交換を優先し、データのプライバシとローカル規制の遵守を保証するソリューションとみなす。地域ごとの炭素強度の変動を考慮したCAFE(Carbon-Aware Federated Learning)と呼ばれる新しいフレームワークを提案し,固定的な炭素フットプリント予算内でのトレーニングを最適化する。このアプローチでは,コアセット選択を学習性能評価に活用し,リアプノフドリフトプラスペナルティフレームワークを用いて将来の炭素強度の予測不可能性に対処し,データセンタ選択の組合せ複雑性に対処する効率的なアルゴリズムを考案する。実世界の炭素強度データを用いた広範囲なシミュレーションにより,環境影響を最小限に抑えながら,学習性能を最適化する既存の手法よりも優れていることを示す。 Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to increased carbon footprint with potential environmental repercussions. This paper delves into the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consider Federated Learning (FL) as a solution, which prioritizes model parameter exchange over raw data, ensuring data privacy and compliance with local regulations. Given the variability in carbon intensity across regions, we propose a new framework called CAFE (short for Carbon-Aware Federated Learning) to optimize training within a fixed carbon footprint budget. Our approach incorporates coreset selection to assess learning performance, employs the Lyapunov drift-plus-penalty framework to address the unpredictability of future carbon intensity, and devises an efficient algorithm to address the combinatorial complexity of the data center selection. 
Through extensive simulations using real-world carbon intensity data, we demonstrate the efficacy of our algorithm, highlighting its superiority over existing methods in optimizing learning performance while minimizing environmental impact.	翻訳日:2024-02-07 19:50:12 公開日:2024-02-06
# DeepInception: 大きな言語モデルをジェイルブレーカーにする DeepInception: Hypnotize Large Language Model to Be Jailbreaker ( http://arxiv.org/abs/2311.03191v3 ) ライセンス: Link先を確認	Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, Bo Han	(参考訳) 様々なアプリケーションで顕著な成功を収めたにもかかわらず、大規模言語モデル(LLM)は、安全ガードレールを無効にする敵対的なジェイルブレイクに対して脆弱である。しかし、従来のジェイルブレイクの研究は、計算コストの高いブルートフォース最適化や外挿に頼ることが多く、実用的でも効果的でもない可能性がある。本稿では,有害行為を誘発する権威の力に関するミルグラム実験に着想を得て、LLMを容易に催眠状態にしてジェイルブレーカーにできる、DeepInceptionと呼ばれる軽量な手法を開示する。具体的には、DeepInceptionは、LLMの擬人化能力を活用して新しい入れ子状のシーンを構築し、通常のシナリオでの使用制御から逃れる適応的な方法を実現する。実験的には、DeepInceptionは従来手法と競合するジェイルブレイク成功率を達成し、その後の対話でも継続的なジェイルブレイクを実現できる。これは、FalconやVicuna-v1.5,Llama-2,GPT-3.5-turbo/4といったオープンソースおよびクローズドソースのLLMにおける自己喪失(self-losing)という致命的な弱点を明らかにしている。我々の調査は、LLMの安全性の側面により注意を払い、悪用リスクに対するより強力な防御を開発するよう訴えるものである。コードはhttps://github.com/tmlr-group/DeepInceptionで公開されている。 Despite remarkable success in various applications, large language models (LLMs) are vulnerable to adversarial jailbreaks that make the safety guardrails void. However, previous studies for jailbreaks usually resort to brute-force optimization or extrapolations of a high computation cost, which might not be practical or effective. In this paper, inspired by the Milgram experiment w.r.t. the authority power for inciting harmfulness, we disclose a lightweight method, termed DeepInception, which can easily hypnotize an LLM to be a jailbreaker. Specifically, DeepInception leverages the personification ability of the LLM to construct a novel nested scene to behave, which realizes an adaptive way to escape the usage control in a normal scenario. Empirically, our DeepInception can achieve competitive jailbreak success rates with previous counterparts and realize a continuous jailbreak in subsequent interactions, which reveals the critical weakness of self-losing on both open and closed-source LLMs like Falcon, Vicuna-v1.5, Llama-2, and GPT-3.5-turbo/4. Our investigation appeals to people to pay more attention to the safety aspects of LLMs and develop a stronger defense against their misuse risks. 
The code is publicly available at: https://github.com/tmlr-group/DeepInception.	翻訳日:2024-02-07 19:49:48 公開日:2024-02-06
# 開いた本みたいに? 32ビットマイクロコントローラの簡易電力解析によるリードニューラルネットワークアーキテクチャ Like an Open Book? Read Neural Network Architecture with Simple Power Analysis on 32-bit Microcontrollers ( http://arxiv.org/abs/2311.01344v2 ) ライセンス: Link先を確認	Raphael Joud, Pierre-Alain Moellic, Simon Pontie, Jean-Baptiste Rigaud	(参考訳) モデル抽出はAIシステムのセキュリティに対する関心が高まっている。ディープニューラルネットワークモデルでは、アーキテクチャは敵が回復しようとする最も重要な情報である。繰り返し計算ブロックのシーケンスであるため、エッジデバイスにデプロイされたニューラルネットワークモデルは、特有のサイドチャネルリークを生成する。後者は、ターゲットプラットフォームが物理的にアクセス可能な場合に重要な情報を抽出するために利用することができる。ディープラーニングの実践に関する理論的知識と広範な実装ライブラリ(arm cmsis-nn)の分析を組み合わせることで、我々はこの重要な質問に答えることを目的としています。パターン認識のみに依存するハイエンド32ビットマイクロコントローラ(Cortex-M7)上で動作する従来のMLPおよびCNNモデルの抽出手法を初めて提案する。難しいケースは少ないが、パラメータ抽出とは対照的に、攻撃の複雑さは相対的に低く、そのようなプラットフォームの強いメモリとレイテンシ要件に適合する実用的な保護の必要性を強調する。 Model extraction is a growing concern for the security of AI systems. For deep neural network models, the architecture is the most important information an adversary aims to recover. Being a sequence of repeated computation blocks, neural network models deployed on edge-devices will generate distinctive side-channel leakages. The latter can be exploited to extract critical information when targeted platforms are physically accessible. By combining theoretical knowledge about deep learning practices and analysis of a widespread implementation library (ARM CMSIS-NN), our purpose is to answer this critical question: how far can we extract architecture information by simply examining an EM side-channel trace? For the first time, we propose an extraction methodology for traditional MLP and CNN models running on a high-end 32-bit microcontroller (Cortex-M7) that relies only on simple pattern recognition analysis. Despite few challenging cases, we claim that, contrary to parameters extraction, the complexity of the attack is relatively low and we highlight the urgent need for practicable protections that could fit the strong memory and latency requirements of such platforms.	翻訳日:2024-02-07 19:49:25 公開日:2024-02-06
# 臨床機能埋め込みのための言語モデル学習パラダイム Language Model Training Paradigms for Clinical Feature Embeddings ( http://arxiv.org/abs/2311.00768v2 ) ライセンス: Link先を確認	Yurong Hu, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova	(参考訳) データが少ない研究領域では、表現学習が重要な役割を果たす。本研究の目的は、心拍数や血圧などの臨床的特徴に対する普遍的な埋め込みを導出し、臨床時系列の表現学習を強化することである。言語モデルのための自己教師あり訓練パラダイムを用いて,高品質な臨床機能埋め込みを学び,既存の時間ステップや患者レベルの表現学習よりも細かい粒度を達成する。我々は,教師なし次元縮小技術を用いて学習埋め込みを可視化し,先行臨床知識と高い一貫性を観察する。また,MIMIC-IIIベンチマークのモデル性能を評価し,臨床的特徴埋め込みの有効性を示した。レプリケーションのためにコードをオンラインで公開します。 In research areas with scarce data, representation learning plays a significant role. This work aims to enhance representation learning for clinical time series by deriving universal embeddings for clinical features, such as heart rate and blood pressure. We use self-supervised training paradigms for language models to learn high-quality clinical feature embeddings, achieving a finer granularity than existing time-step and patient-level representation learning. We visualize the learnt embeddings via unsupervised dimension reduction techniques and observe a high degree of consistency with prior clinical knowledge. We also evaluate the model performance on the MIMIC-III benchmark and demonstrate the effectiveness of using clinical feature embeddings. We publish our code online for replication.	翻訳日:2024-02-07 19:49:08 公開日:2024-02-06
# AIにおけるバイアスを暴く:電子健康記録に基づくモデルにおけるバイアス検出と緩和戦略の体系的レビュー Unmasking Bias in AI: A Systematic Review of Bias Detection and Mitigation Strategies in Electronic Health Record-based Models ( http://arxiv.org/abs/2310.19917v2 ) ライセンス: Link先を確認	Feng Chen, Liqin Wang, Julie Hong, Jiaqi Jiang, Li Zhou	(参考訳) 目的: 人工知能(AI)と電子健康記録(EHR)の併用は、医療を改善する変革的な可能性を秘めている。しかし、医療格差を悪化させる危険性があるAIのバイアスへの対処は見過ごせない。本研究では,EHRデータを用いて開発されたAIモデルにおける多様なバイアスを検出・緩和する手法をレビューする。方法: Preferred Reporting Items for Systematic Reviews and Meta-analyses(PRISMA)ガイドラインに従って体系的レビューを行い、2010年1月1日から2023年12月17日までに発行されたPubMed, Web of Science, IEEEの論文を分析した。レビューでは、主要なバイアスを特定し、AIモデル開発プロセス全体でバイアスを検出し緩和するための戦略を概説し、バイアス評価のための指標を分析した。結果: 検索した450件の論文のうち20件が基準を満たし,アルゴリズム,交絡,暗黙,測定,選択,時間の6つの主要なバイアスタイプが明らかになった。AIモデルは、主に医療現場における予測タスクのために開発されていた。4件の研究は、統計的パリティ、機会均等、予測的公平性といった公正性指標を用いた暗黙的バイアスとアルゴリズム的バイアスの検出に焦点を当てていた。16件の研究はバイアスを緩和するための様々な戦略を提案し、特に暗黙的バイアスと選択バイアスを対象としていた。これらの戦略は、性能(例えば、精度、AUROC)と公正性指標の両方で評価され、主にデータ収集と、再サンプリング、再重み付け、変換といった前処理技術に関わるものであった。議論: このレビューは、EHRベースのAIモデルにおけるバイアスに対処する戦略の多様かつ進化的な性質を強調し、医療における公正性と公平性を促進する倫理的AIシステムの構築を促すための、標準化され、一般化可能で、解釈可能な方法論の確立が急務であることを強調している。 Objectives: Leveraging artificial intelligence (AI) in conjunction with electronic health records (EHRs) holds transformative potential to improve healthcare. Yet, addressing bias in AI, which risks worsening healthcare disparities, cannot be overlooked. This study reviews methods to detect and mitigate diverse forms of bias in AI models developed using EHR data. Methods: We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines, analyzing articles from PubMed, Web of Science, and IEEE published between January 1, 2010, and Dec 17, 2023. The review identified key biases, outlined strategies for detecting and mitigating bias throughout the AI model development process, and analyzed metrics for bias assessment. 
Results: Of the 450 articles retrieved, 20 met our criteria, revealing six major bias types: algorithmic, confounding, implicit, measurement, selection, and temporal. The AI models were primarily developed for predictive tasks in healthcare settings. Four studies concentrated on the detection of implicit and algorithmic biases employing fairness metrics like statistical parity, equal opportunity, and predictive equity. Sixteen proposed various strategies for mitigating biases, especially targeting implicit and selection biases. These strategies, evaluated through both performance (e.g., accuracy, AUROC) and fairness metrics, predominantly involved data collection and preprocessing techniques like resampling, reweighting, and transformation. Discussion: This review highlights the varied and evolving nature of strategies to address bias in EHR-based AI models, emphasizing the urgent need for the establishment of standardized, generalizable, and interpretable methodologies to foster the creation of ethical AI systems that promote fairness and equity in healthcare.	翻訳日:2024-02-07 19:48:58 公開日:2024-02-06
# 因果的公平性:因果関係の橋渡し、個々人の公平性、敵対的堅牢性 Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness ( http://arxiv.org/abs/2310.19391v2 ) ライセンス: Link先を確認	Ahmad-Reza Ehyaei, Golnoosh Farnadi, Samira Samadi	(参考訳) 責任あるaiにおける包括的考察の必要性にもかかわらず、堅牢性、公平性、因果性といった要因は孤立して研究されることが多い。モデル内の脆弱性と個人の公正性を識別するために使用される対向摂動は、初期の違いにもかかわらず、どちらも同等の入力データインスタンスを生成するメトリクスに依存する。このような共同メトリクスを定義する以前の試みは、データや構造因果モデルに関する一般的な仮定を欠くことが多く、反事実的近接を反映できなかった。そこで本研究では,敏感な属性と保護された因果摂動を包含する因果構造に基づいて定式化した因果的公平計量を提案する。メトリクスの実用性を高めるために,構造的因果モデルが存在しない実世界の問題におけるメトリクス推定と展開のための方法として,メトリクス学習を提案する。また、分類器における新しい計量の応用を実証する。実世界および合成データセットの実証的評価は, 正当性, 対向摂動に対する弾力性, 因果関係の微妙な理解を実現する上で, 提案手法の有効性を示すものである。 Despite the essential need for comprehensive considerations in responsible AI, factors like robustness, fairness, and causality are often studied in isolation. Adversarial perturbation, used to identify vulnerabilities in models, and individual fairness, aiming for equitable treatment of similar individuals, despite initial differences, both depend on metrics to generate comparable input data instances. Previous attempts to define such joint metrics often lack general assumptions about data or structural causal models and were unable to reflect counterfactual proximity. To address this, our paper introduces a causal fair metric formulated based on causal structures encompassing sensitive attributes and protected causal perturbation. To enhance the practicality of our metric, we propose metric learning as a method for metric estimation and deployment in real-world problems in the absence of structural causal models. We also demonstrate the application of our novel metric in classifiers. Empirical evaluation of real-world and synthetic datasets illustrates the effectiveness of our proposed metric in achieving an accurate classifier with fairness, resilience to adversarial perturbations, and a nuanced understanding of causal relationships.	翻訳日:2024-02-07 19:48:25 公開日:2024-02-06
# フェデレーテッド・アンラーニングに関する調査研究 : 分類学,課題,今後の方向性 A Survey of Federated Unlearning: A Taxonomy, Challenges and Future Directions ( http://arxiv.org/abs/2310.19218v3 ) ライセンス: Link先を確認	Yang Zhao, Jiaxi Yang, Yiling Tao, Lixu Wang, Xiaoxiao Li, Dusit Niyato	(参考訳) プライバシー保護型連合学習(fl)の進化は、忘れられる権利の実施に対する需要の増加につながった。選択的な忘れ方の実装は、その分散性のため、flでは特に困難である。この複雑さは、新しい分野であるFederated Unlearning(FU)を生み出した。 fuは,‘忘れられる権利’の実装を含む,データプライバシの必要性の増加に対処するための,戦略的ソリューションとして浮上する。 FUアプローチの開発における最大の課題は、プライバシ、セキュリティ、ユーティリティ、効率性のトレードオフにある。これらのファセット間の最適な均衡を達成することは、プライバシーとセキュリティの標準に固執しながら、flシステムの有効性とユーザビリティを維持するために不可欠である。本調査では, 既存のFU法を包括的に分析し, 各種評価指標の詳細な検討を取り入れた。さらに、これらの多様な方法とメトリクスを実験的なフレームワークに統合する。さらに, FUの今後の研究方向性についても検討した。最後に、関連するオープンソース資料の継続的に更新されたリポジトリが、https://github.com/abbottyanginchina/awesome-federated-unlearningで入手できる。 The evolution of privacy-preserving Federated Learning (FL) has led to an increasing demand for implementing the right to be forgotten. The implementation of selective forgetting is particularly challenging in FL due to its decentralized nature. This complexity has given rise to a new field, Federated Unlearning (FU). FU emerges as a strategic solution to address the increasing need for data privacy, including the implementation of the `right to be forgotten'. The primary challenge in developing FU approaches lies in balancing the trade-offs in privacy, security, utility, and efficiency, as these elements often have competing requirements. Achieving an optimal equilibrium among these facets is crucial for maintaining the effectiveness and usability of FL systems while adhering to privacy and security standards. This survey provides a comprehensive analysis of existing FU methods, incorporating a detailed review of the various evaluation metrics. Furthermore, we unify these diverse methods and metrics into an experimental framework. Additionally, the survey discusses potential future research directions in FU. 
Finally, a continually updated repository of related open-source materials is available at: https://github.com/abbottyanginchina/Awesome-Federated-Unlearning.	翻訳日:2024-02-07 19:48:02 公開日:2024-02-06
# 強化学習に基づく発話非流暢性最小化のための投薬調整システムに向けて Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency ( http://arxiv.org/abs/2312.11509v4 ) ライセンス: Link先を確認	Pavlos Constas, Vikram Rawal, Matthew Honorio Oliveira, Andreas Constas, Aditya Khan, Kaison Cheung, Najma Sultani, Carrie Chen, Micol Altomare, Michael Akzam, Jiacheng Chen, Vhea He, Lauren Altomare, Heraa Murqi, Asad Khan, Nimit Amikumar Bhanshali, Youssef Rachad, Michael Guerzhoy	(参考訳) 本研究では, 仮想的な患者に対して, 精神保健に関連する発話の非流暢性を改善しうる薬剤を自動的に処方し, 患者の発話の流暢さをゼロコストで頻繁に測定した結果に応じて薬剤と服用量を調整する強化学習(RL)システムを提案する。本システムの構成要素として, 私たちが構築した大規模なデータセット上で発話の非流暢性を検出し評価するモジュールと、薬剤の優れた組み合わせを自動的に見つけ出すRLアルゴリズムを示す。この2つのモジュールを支援するために,発話の非流暢性に対する精神科治療薬の効果に関するデータを文献から収集し,妥当な患者シミュレーションシステムを構築した。我々は、ある状況下では、RLシステムが優れた投薬計画に収束できることを実証する。発話の非流暢性の可能性がある人々のデータセットを収集してラベル付けし,そのデータセットを用いて本手法を実証する。我々の研究は概念実証であり、発話の非流暢性に対処するために自動データ収集を使うという考えには将来性があることを示す。 We propose a reinforcement learning (RL)-based system that would automatically prescribe a hypothetical patient medication that may help the patient with their mental health-related speech disfluency, and adjust the medication and the dosages in response to zero-cost frequent measurement of the fluency of the patient. We demonstrate the components of the system: a module that detects and evaluates speech disfluency on a large dataset we built, and an RL algorithm that automatically finds good combinations of medications. To support the two modules, we collect data on the effect of psychiatric medications for speech disfluency from the literature, and build a plausible patient simulation system. We demonstrate that the RL system is, under some circumstances, able to converge to a good medication regime. We collect and label a dataset of people with possible speech disfluency and demonstrate our methods using that dataset. Our work is a proof of concept: we show that there is promise in the idea of using automatic data collection to address speech disfluency.	翻訳日:2024-02-07 19:41:03 公開日:2024-02-06
# Sig-Networks Toolkit: 縦型言語モデリングのための署名ネットワーク Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling ( http://arxiv.org/abs/2312.03523v2 ) ライセンス: Link先を確認	Talia Tseriotou, Ryan Sze-Yin Chan, Adam Tsakalidis, Iman Munire Bilal, Elena Kochkina, Terry Lyons, Maria Liakata	(参考訳) Sig-Networksは、長手言語モデリングの第一種として、オープンソースの、ピップインストール可能なツールキットである。中心的な焦点は署名に基づくニューラルネットワークモデルの導入であり、これは最近、時間的タスクの成功を示している。我々は、シグネチャベースモデルの全スイートを提供する公開研究を適用し、拡張する。彼らのコンポーネントは、将来のアーキテクチャでPyTorchビルディングブロックとして使用できる。 sig-networksはタスクに依存しないデータセットプラグイン、シーケンシャルデータのシームレスな前処理、パラメータの柔軟性、さまざまなモデルに対する自動チューニングを可能にする。ソーシャルメディアスレッドにおけるカウンセリング会話,噂のスタンススイッチ,気分変化など,時間的粒度の異なる3つのNLPタスクのシグネチャネットワークについて検討し,これら3つのタスクのSOTAパフォーマンスを示すとともに,今後のタスクのガイダンスを提供する。導入ビデオ、プリプロセッシングとモデリングのためのgitリポジトリ、モデリングされたnlpタスクのサンプルノートブックを含む、pytorchパッケージとしてツールキットをリリースします。 We present an open-source, pip installable toolkit, Sig-Networks, the first of its kind for longitudinal language modelling. A central focus is the incorporation of Signature-based Neural Network models, which have recently shown success in temporal tasks. We apply and extend published research providing a full suite of signature-based models. Their components can be used as PyTorch building blocks in future architectures. Sig-Networks enables task-agnostic dataset plug-in, seamless pre-processing for sequential data, parameter flexibility, automated tuning across a range of models. We examine signature networks under three different NLP tasks of varying temporal granularity: counselling conversations, rumour stance switch and mood changes in social media threads, showing SOTA performance in all three, and provide guidance for future tasks. We release the Toolkit as a PyTorch package with an introductory video, Git repositories for preprocessing and modelling including sample notebooks on the modeled NLP tasks.	翻訳日:2024-02-07 19:40:46 公開日:2024-02-06
# 一般スプーフィング攻撃下における量子セキュア単一画素イメージング Quantum-secured single-pixel imaging under general spoofing attacks ( http://arxiv.org/abs/2312.03465v2 ) ライセンス: Link先を確認	Jaesung Heo, Taek Jeong, Nam Hun Park, Yonggi Jo	(参考訳) 本稿では,偽の信号で撮像システムを欺こうとするスプーフィング攻撃に耐えるように設計された量子セキュア単一画素イメージング(QS-SPI)手法を提案する。真の信号が存在する場合でも動作を制限するしきい値エラー率を課す従来の量子セキュアプロトコルとは異なり、我々のアプローチはスプーフィング攻撃を識別するだけでなく、真の画像の再構成も可能にする。本手法は, 画像形成に使用されるモードとは独立な光子対の特定のモード相関を解析し, セキュリティチェックを行う。この分析により,攻撃の対象となった画像領域とスプーフィング攻撃の種類の両方を識別し,真の画像の再構成を可能にする。光子対の偏光相関を利用した原理実証実験を行い、真の信号の2000倍強いスプーフィング信号の条件下でも良好な画像再構成を示す。我々は、本手法が量子ターゲット検出や量子測距などの量子セキュアな信号処理に応用されることを期待している。 In this paper, we introduce a quantum-secured single-pixel imaging (QS-SPI) technique designed to withstand spoofing attacks, wherein adversaries attempt to deceive imaging systems with fake signals. Unlike previous quantum-secured protocols that impose a threshold error rate limiting their operation, even with the existence of true signals, our approach not only identifies spoofing attacks but also facilitates the reconstruction of a true image. Our method involves the analysis of a specific mode correlation of a photon-pair, which is independent of the mode used for image construction, to check security. Through this analysis, we can identify both the targeted image region by the attack and the type of spoofing attack, enabling reconstruction of the true image. A proof-of-principle demonstration employing polarization-correlation of a photon-pair is provided, showcasing successful image reconstruction even under the condition of spoofing signals 2000 times stronger than the true signals. We expect our approach to be applied to quantum-secured signal processing such as quantum target detection or ranging.	翻訳日:2024-02-07 19:40:26 公開日:2024-02-06
# マスレスディラック場理論における2つの不連続区間の計算可能交叉負性度の対称性分解 Symmetry resolution of the computable cross-norm negativity of two disjoint intervals in the massless Dirac field theory ( http://arxiv.org/abs/2312.02926v2 ) ライセンス: Link先を確認	Andrea Bruno, Filiberto Ares, Sara Murciano, Pasquale Calabrese	(参考訳) 量子場理論の混合状態における絡み合いは、最近導入されたネガティビティを用いて、クロス計算可能なノルムあるいは再定義(ccnr)の基準を用いて記述できる。質量を持たないディラックフェルミオン場理論の基底状態における2つの不連続区間の対称性分解について検討し、隣接区間の場合の以前の結果を拡張する。レプリカのトリックを適用することで、この問題は配向行列の荷電モーメントを計算することにつながる。 2つの不連続区間に対して、それらは非収縮性荷電ループを持つトーラス上の理論の分配関数に対応することを示す。このことは、複製トリックによって生成されるリーマン面がより高い属を持つ部分転移に基づく負性よりも大きな優位性を与える。この結果から, 対称解法CCNR負性度の解析式を導出し, レプリカ限界の実施が可能となった。さらに、これらの表現は、還元密度行列の演算子の絡み合いや反射エントロピーのような他の関連する量の対称性分解も提供する。 We investigate how entanglement in the mixed state of a quantum field theory can be described using the cross-computable norm or realignment (CCNR) criterion, employing a recently introduced negativity. We study its symmetry resolution for two disjoint intervals in the ground state of the massless Dirac fermion field theory, extending previous results for the case of adjacent intervals. By applying the replica trick, this problem boils down to computing the charged moments of the realignment matrix. We show that, for two disjoint intervals, they correspond to the partition function of the theory on a torus with a non-contractible charged loop. This confers a great advantage compared to the negativity based on the partial transposition, for which the Riemann surfaces generated by the replica trick have higher genus. This result empowers us to carry out the replica limit, yielding analytic expressions for the symmetry-resolved CCNR negativity. Furthermore, these expressions provide also the symmetry decomposition of other related quantities such as the operator entanglement of the reduced density matrix or the reflected entropy.	翻訳日:2024-02-07 19:40:09 公開日:2024-02-06
# ブロッホ球内の混合量子状態の幾何学的側面 Geometric aspects of mixed quantum states inside the Bloch sphere ( http://arxiv.org/abs/2312.02004v2 ) ライセンス: Link先を確認	Paul M. Alsing, Carlo Cafaro, Domenico Felice, Orlando Luongo	(参考訳) 量子状態の幾何学を研究する際、混合状態が無限に多くの計量によって区別できることが認識されている。残念ながら、この自由度は、複雑性や量子状態の体積のような物理的に重要な幾何学量の計量依存的な解釈を引き起こす。本稿では,Bloch球内におけるBures計量とSjöqvist計量の違いについて,洞察に富んだ議論を行う。まず、2つの計量の形式的な比較分析から始め、各計量に対する3つの代替解釈を批判的に議論する。第二に、2つの計量多様体のそれぞれの上の測地線経路の異なる挙動を明示する。第三に、2つの計量で計算した場合の、初期混合状態と最終混合状態の間の有限距離を比較する。興味深いことに、異なる計量関数を備えた実ユークリッド空間の位相的側面(例えば、通常のユークリッド計量とタクシー計量)を研究する場合と同様に、混合量子状態間の有限距離の概念に基づく相対的順位は、Bures計量とSjöqvist計量で決定される距離を比較すると保存されないことが観測される。最後に,この計量に基づく相対的順位の破れが混合量子状態の複雑性と体積の概念に及ぼす帰結に関する簡単な議論で締めくくる。 When studying the geometry of quantum states, it is acknowledged that mixed states can be distinguished by infinitely many metrics. Unfortunately, this freedom causes metric-dependent interpretations of physically significant geometric quantities such as complexity and volume of quantum states. In this paper, we present an insightful discussion on the differences between the Bures and the Sjöqvist metrics inside a Bloch sphere. First, we begin with a formal comparative analysis between the two metrics by critically discussing three alternative interpretations for each metric. Second, we illustrate explicitly the distinct behaviors of the geodesic paths on each one of the two metric manifolds. Third, we compare the finite distances between an initial and final mixed state when calculated with the two metrics. Interestingly, in analogy to what happens when studying topological aspects of real Euclidean spaces equipped with distinct metric functions (for instance, the usual Euclidean metric and the taxicab metric), we observe that the relative ranking based on the concept of finite distance among mixed quantum states is not preserved when comparing distances determined with the Bures and the Sjöqvist metrics. 
Finally, we conclude with a brief discussion on the consequences of this violation of a metric-based relative ranking on the concept of complexity and volume of mixed quantum states.	翻訳日:2024-02-07 19:39:33 公開日:2024-02-06
# quirky言語モデルからの潜在知識の抽出 Eliciting Latent Knowledge from Quirky Language Models ( http://arxiv.org/abs/2312.01037v2 ) ライセンス: Link先を確認	Alex Mallen and Nora Belrose	(参考訳) 潜在知識の抽出(ELK)は、ネットワークの表向きの出力が誤りであったり誤解を招くものであったりする場合でも、世界の真の状態を頑健に追跡するパターンを、有能なニューラルネットワークのアクティベーションの中から見つけることを目的としている。ELK研究を進めるために、12のデータセットと、それに対応する一連の"quirky"言語モデルを紹介する。これらのモデルは、プロンプトにキーワード"Bob"が存在する場合に限り、質問への回答で系統的な誤りを犯すようLoRAで微調整されている。実験では, プローブの学習に用いた問題より難しい問題であっても, 単純なプロービング手法によって, これらの文脈における正解に関するモデルの潜在知識を引き出せることを実証する。これは、中間層のアクティベーションにある文脈に依存しない知識表現によって実現される。また, 機械論的な異常検出手法は, 94% AUROC で不誠実な振る舞いを検知できることがわかった。以上の結果は,有能だが信頼できないモデルから信頼できる知識を引き出せる可能性を示し,ELK法を実証的に調査する今後の研究を促進する。 Eliciting Latent Knowledge (ELK) aims to find patterns in a capable neural network's activations which robustly track the true state of the world, even when the network's overt output is false or misleading. To further ELK research, we introduce 12 datasets and a corresponding suite of "quirky" language models that are LoRA finetuned to make systematic errors when answering questions if and only if the keyword "Bob" is present in the prompt. We demonstrate that simple probing methods can elicit the model's latent knowledge of the correct answer in these contexts, even for problems harder than those the probe was trained on. This is enabled by context-independent knowledge representations located in middle layer activations. We also find that a mechanistic anomaly detection approach can flag untruthful behavior with 94% AUROC. Our results show promise for eliciting reliable knowledge from capable but untrusted models, and facilitate future research empirically investigating ELK methods.	翻訳日:2024-02-07 19:39:12 公開日:2024-02-06
# 射影ヒルベルト空間における量子進化の加速に関する上限 Upper limit on the acceleration of a quantum evolution in projective Hilbert space ( http://arxiv.org/abs/2311.18470v2 ) ライセンス: Link先を確認	Paul M. Alsing, Carlo Cafaro	(参考訳) ハイゼンベルクの位置-運動量の不確かさの関係は、量子力学の幾何学的再構成の文脈において物理粒子の最大加速度の存在をもたらすことは注目すべきである。量子粒子の最大加速度は、射影ヒルベルト空間における輸送速度の大きさと関連していることも知られている。本稿では、曲率とねじれの概念による量子進化の幾何学的側面の研究から着想を得て、任意の有限次元射影ヒルベルト空間における輸送速度の変化率の上限を導出した。純粋な量子状態にある物理系の進化は、任意の時変エルミートハミルトン作用素によって支配されていると仮定される。我々の導出は、l・d・ランダウが量子力学的原点の一般可換関係によるゆらぎの理論で得た不等式と類似しており、ハイゼンベルクの不確かさ関係の一般化に依存している。射影空間における量子進化の加速二乗は、ハミルトニアン作用素の時間変化率のばらつきによって上界であることが示される。さらに,任意の時変磁場に没入する単一スピン量子ビットの低次元の場合の図示的目的に着目し,射影ヒルベルト空間において最大加速度を与える磁場の最適幾何配置と消滅する曲率と単位測地効率について考察する。最後に、我々の上限が量子系の高速な操作によって消散効果を緩和したり、より短い時間で目標状態を得ることができるという限界を課す結果についてコメントする。 It is remarkable that Heisenberg's position-momentum uncertainty relation leads to the existence of a maximal acceleration for a physical particle in the context of a geometric reformulation of quantum mechanics. It is also known that the maximal acceleration of a quantum particle is related to the magnitude of the speed of transportation in projective Hilbert space. In this paper, inspired by the study of geometric aspects of quantum evolution by means of the notions of curvature and torsion, we derive an upper bound for the rate of change of the speed of transportation in an arbitrary finite-dimensional projective Hilbert space. The evolution of the physical system being in a pure quantum state is assumed to be governed by an arbitrary time-varying Hermitian Hamiltonian operator. Our derivation, in analogy to the inequalities obtained by L. D. Landau in the theory of fluctuations by means of general commutation relations of quantum-mechanical origin, relies upon a generalization of Heisenberg's uncertainty relation. We show that the acceleration squared of a quantum evolution in projective space is upper bounded by the variance of the temporal rate of change of the Hamiltonian operator. 
Moreover, focusing for illustrative purposes on the lower-dimensional case of a single spin qubit immersed in an arbitrarily time-varying magnetic field, we discuss the optimal geometric configuration of the magnetic field that yields maximal acceleration along with vanishing curvature and unit geodesic efficiency in projective Hilbert space. Finally, we comment on the consequences that our upper bound imposes on the limit at which one can perform fast manipulations of quantum systems to mitigate dissipative effects and/or obtain a target state in a shorter time.	翻訳日:2024-02-07 19:38:53 公開日:2024-02-06
# 古典的なfrenet-serret装置から量子力学的進化の曲率とねじれまで。第1部定常ハミルトン派 From the classical Frenet-Serret apparatus to the curvature and torsion of quantum-mechanical evolutions. Part I. Stationary Hamiltonians ( http://arxiv.org/abs/2311.18458v2 ) ライセンス: Link先を確認	Paul M. Alsing, Carlo Cafaro	(参考訳) 三次元ユークリッド空間における空間曲線のフレネ・セレート装置は曲線の局所幾何学を決定することが知られている。特に、frenet-serret装置は曲線の曲率やねじれを含む重要な幾何学的不変量を指定する。量子情報科学においても、物理系に関する量子情報を巧みにエンコードする量子状態を操作する際に、複雑さと効率性が欠かせない特徴であると認識されている。本稿では,動的に発展する状態ベクトルによって追跡される量子曲線の曲がりとねじれを定量化する方法に関する幾何学的視点を提案する。具体的には、シュロディンガー方程式を定式化した定常ハミルトニアンの下で一元的に進化する平行移動純量子状態によってトレースされる射影ヒルベルト空間における量子軌道に対するフレネット・セルレット装置の量子バージョンを提案する。提案する定数曲率係数は、接ベクトルと状態ベクトルの共変微分の2乗法で与えられ、量子曲線の曲がりの有用な尺度である。提案した定数ねじれ係数は、接ベクトルと状態ベクトルの両方に直交する接ベクトルの共変微分の射影の大きさの2乗で定義される。トーション係数は、量子曲線のねじれの便利な測度を提供する。驚くべきことに、提案する曲率とねじれ係数は文献に存在するものと一致するが、全く異なる方法で紹介されている。 It is known that the Frenet-Serret apparatus of a space curve in three-dimensional Euclidean space determines the local geometry of curves. In particular, the Frenet-Serret apparatus specifies important geometric invariants, including the curvature and the torsion of a curve. It is also acknowledged in quantum information science that low complexity and high efficiency are essential features to achieve when cleverly manipulating quantum states that encode quantum information about a physical system. In this paper, we propose a geometric perspective on how to quantify the bending and the twisting of quantum curves traced by dynamically evolving state vectors. Specifically, we propose a quantum version of the Frenet-Serret apparatus for a quantum trajectory in projective Hilbert space traced by a parallel-transported pure quantum state evolving unitarily under a stationary Hamiltonian specifying the Schrodinger equation. Our proposed constant curvature coefficient is given by the magnitude squared of the covariant derivative of the tangent vector to the state vector and represents a useful measure of the bending of the quantum curve. 
Our proposed constant torsion coefficient, instead, is defined in terms of the magnitude squared of the projection of the covariant derivative of the tangent vector, orthogonal to both the tangent vector and the state vector. The torsion coefficient provides a convenient measure of the twisting of the quantum curve. Remarkably, we show that our proposed curvature and torsion coefficients coincide with those existing in the literature, although introduced in a completely different manner...	翻訳日:2024-02-07 19:38:26 公開日:2024-02-06
# SmoothVideo:ワンショットビデオチューニングのための拡散モデルにおけるノイズ制約付き滑らかなビデオ合成 SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning ( http://arxiv.org/abs/2311.17536v2 ) ライセンス: Link先を確認	Liang Peng, Haoran Cheng, Zheng Yang, Ruisi Zhao, Linxuan Xia, Chaotian Song, Qinglin Lu, Boxi Wu, Wei Liu	(参考訳) 最近のワンショットビデオチューニング手法は、事前学習されたテキストから画像へのモデル(例えば、Stable Diffusion)に基づいて特定のビデオ上でネットワークを微調整するもので、その柔軟性からコミュニティで人気がある。しかし、これらの手法は一貫性や整合性に欠けるビデオをしばしば生成する。これらの制約に対処するために,本研究では,ビデオフレーム間の簡易かつ効果的なノイズ制約を提案する。この制約は、時間的近傍にまたがるノイズ予測を規制することを目的としており、結果として滑らかな潜在表現が得られる。単にトレーニング段階での損失項として含めることができる。既存のワンショットビデオチューニング手法にこの損失を適用することで、生成されたビデオの全体的な一貫性と滑らかさを大幅に改善する。さらに,現在の映像評価指標は滑らかさを十分に捉えられていないと主張する。そこで本稿では,詳細な特徴とその時間的ダイナミクスを考慮した新しい指標を提案する。種々のワンショットビデオチューニングベースライン上で、より滑らかなビデオの生成における本アプローチの有効性を実験的に検証した。ソースコードとビデオデモは https://github.com/SPengLiang/SmoothVideo で公開されている。 Recent one-shot video tuning methods, which fine-tune the network on a specific video based on pre-trained text-to-image models (e.g., Stable Diffusion), are popular in the community because of the flexibility. However, these methods often produce videos marred by incoherence and inconsistency. To address these limitations, this paper introduces a simple yet effective noise constraint across video frames. This constraint aims to regulate noise predictions across their temporal neighbors, resulting in smooth latents. It can be simply included as a loss term during the training phase. By applying the loss to existing one-shot video tuning methods, we significantly improve the overall consistency and smoothness of the generated videos. Furthermore, we argue that current video evaluation metrics inadequately capture smoothness. To address this, we introduce a novel metric that considers detailed features and their temporal dynamics. Experimental results validate the effectiveness of our approach in producing smoother videos on various one-shot video tuning baselines. 
The source codes and video demos are available at https://github.com/SPengLiang/SmoothVideo.	翻訳日:2024-02-07 19:38:02 公開日:2024-02-06
# フェデレーション・トランスファー・ラーニングによる基礎モデル:汎用フレームワーク Grounding Foundation Models through Federated Transfer Learning: A General Framework ( http://arxiv.org/abs/2311.17431v9 ) ライセンス: Link先を確認	Yan Kang, Tao Fan, Hanlin Gu, Xiaojin Zhang, Lixin Fan, Qiang Yang	(参考訳) 膨大な知識と強力な創発能力を備えたGPT-4のような基礎モデル(FM)は、様々な自然言語処理やコンピュータビジョンタスクにおいて大きな成功を収めている。 FMをドメイン固有のタスクに適応させたり、ドメイン固有の知識で拡張することで、FMの潜在能力を最大限活用することができる。しかし、基盤となるFMは、主に制約のあるコンピューティングリソース、データプライバシ、モデルの不均一性、モデルオーナシップなど、いくつかの課題に直面している。フェデレーション・トランスファー・ラーニング(FTL)は、フェデレーション・ラーニングとトランスファー・ラーニングを組み合わせたもので、これらの課題に対処するための有望なソリューションを提供する。近年、FTL-FMと呼ばれるFTLを利用したFMの接地の必要性が、学術と産業の両方で強く現れている。本研究では,FTL-FM研究の高度化とFTL-FMの産業的応用への影響を背景として,FTL-FMフレームワークの構築,FTL-FMフレームワークに基づく詳細な分類法の構築,最先端のFTL-FM作品の分類,提案した分類法に基づくFTL-FM作品の包括的概要について述べる。また、FTL-FMと従来のFM適応フェーズの対応性を確立し、FM実践者がFTL-FMと研究作業を整合させることができるようにした。さらに、FTL-FMにおいて効率とプライバシーが重要となるため、高度な効率改善とプライバシー保護技術の概要を述べる。最後に,FTL-FMの今後の研究の方向性について述べる。 Foundation Models (FMs) such as GPT-4 encoded with vast knowledge and powerful emergent abilities have achieved remarkable success in various natural language processing and computer vision tasks. Grounding FMs by adapting them to domain-specific tasks or augmenting them with domain-specific knowledge enables us to exploit the full potential of FMs. However, grounding FMs faces several challenges, stemming primarily from constrained computing resources, data privacy, model heterogeneity, and model ownership. Federated Transfer Learning (FTL), the combination of federated learning and transfer learning, provides promising solutions to address these challenges. In recent years, the need for grounding FMs leveraging FTL, coined FTL-FM, has arisen strongly in both academia and industry. 
Motivated by the strong growth in FTL-FM research and the potential impact of FTL-FM on industrial applications, we propose an FTL-FM framework that formulates problems of grounding FMs in the federated learning setting, construct a detailed taxonomy based on the FTL-FM framework to categorize state-of-the-art FTL-FM works, and comprehensively overview FTL-FM works based on the proposed taxonomy. We also establish correspondences between FTL-FM and conventional phases of adapting FM so that FM practitioners can align their research works with FTL-FM. In addition, we overview advanced efficiency-improving and privacy-preserving techniques because efficiency and privacy are critical concerns in FTL-FM. Last, we discuss opportunities and future research directions of FTL-FM.	翻訳日:2024-02-07 19:37:43 公開日:2024-02-06
# 幻覚を超えて:幻覚を意識した直接選好最適化によるLVLMの強化 Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ( http://arxiv.org/abs/2311.16839v2 ) ライセンス: Link先を確認	Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He	(参考訳) マルチモーダルな大規模言語モデルは近年大きな進歩を遂げているが、関連する画像の内容を不正確に描写したり完全に捏造したりするテキスト記述を生成する「幻覚問題」と呼ばれる共通の問題にいまだ悩まされている。本稿では,幻覚問題を選好選択タスクとして再定式化する新しい解法HA-DPO(Hallucination-Aware Direct Preference Optimization)を提案する。モデルは、同じ画像に対する2つの応答(正確なものと幻覚を含むもの)が提示されたとき、幻覚のない応答を優先するように訓練される。さらに本論文では,ポジティブ(非幻覚的)とネガティブ(幻覚的)のサンプルペアを構築し,ロバストな選好学習のための高品質でスタイルの一貫したデータセットを実現する効率的なパイプラインを提案する。 3つの主要なマルチモーダルモデルに適用すると、HA-DPOは幻覚の問題を著しく減らし、モデルの一般化能力を高めた。特にMiniGPT-4モデルにHA-DPOを適用すると、POPEの精度は51.13%から86.13%(絶対値で35%の改善)に向上し、MMEのスコアは932.00から1326.46(相対値で42.32%の改善)に上昇した。コード、モデル、データセットはhttps://opendatalab.github.io/HA-DPOでアクセス可能である。 Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem", in which the models generate textual descriptions that inaccurately depict or entirely fabricate content from associated images. This paper introduces a novel solution, Hallucination-Aware Direct Preference Optimization (HA-DPO), which reframes the hallucination problem as a preference selection task. The model is trained to favor the non-hallucinating response when presented with two responses of the same image (one accurate and one hallucinatory). Furthermore, this paper proposes an efficient pipeline for constructing positive (non-hallucinatory) and negative (hallucinatory) sample pairs, ensuring a high-quality, style-consistent dataset for robust preference learning. When applied to three mainstream multimodal models, HA-DPO significantly reduced hallucination issues and amplified the models' generalization capabilities. 
Notably, the MiniGPT-4 model, when enhanced with HA-DPO, demonstrated a substantial improvement: POPE accuracy rose from 51.13% to 86.13% (an absolute improvement of 35%), and the MME score surged from 932.00 to 1326.46 (a relative improvement of 42.32%). The codes, models, and datasets are made accessible at https://opendatalab.github.io/HA-DPO.	翻訳日:2024-02-07 19:37:10 公開日:2024-02-06
# テキストプロンプトを用いた空間共変画像登録 Spatially Covariant Image Registration with Text Prompts ( http://arxiv.org/abs/2311.15607v2 ) ライセンス: Link先を確認	Xiang Chen, Min Liu, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li, and Hang Zhang	(参考訳) 医療画像は、しばしばその構造化された解剖学的表現と空間的に不均一なコントラストによって特徴づけられる。ニューラルネットワークにおいて解剖学的な事前知識を活用することで、リソースに制約のある臨床現場での有用性が大幅に向上する。先行研究は画像分割にこのような情報を利用してきたが、変形可能な画像登録における進歩は控えめである。このギャップを埋めるために、空間共変フィルタと、視覚言語モデルで符号化されたテキストによる解剖学的プロンプトを統合する新しい方法であるtextSCFを導入する。このアプローチでは、解剖学的領域のテキスト埋め込みをフィルタの重みに関連付ける暗黙の関数を最適化し、畳み込み操作に典型的な並進不変性の制約を緩和する。 TextSCFは計算効率を向上するだけでなく、登録精度を維持または改善することもできる。解剖学的領域間の文脈的相互作用を捉えることで、優れた領域間転移性と、登録中に構造的不連続性を保持する能力を提供する。 TextSCFの性能は、被験者間の脳MRIおよび腹部CTの登録タスクで厳格にテストされ、MICCAI Learn2Reg 2021チャレンジで既存の最先端モデルを上回り、リーダーボードの首位に立っている。腹部の登録では、textSCFの大きなモデル変種は2番目に優れたモデルよりもDiceスコアを11.3%改善し、小さなモデル変種は同様の精度を維持しつつ、ネットワークパラメータを89.13%、計算操作を98.34%削減した。 Medical images are often characterized by their structured anatomical representations and spatially inhomogeneous contrasts. Leveraging anatomical priors in neural networks can greatly enhance their utility in resource-constrained clinical settings. Prior research has harnessed such information for image segmentation, yet progress in deformable image registration has been modest. Our work introduces textSCF, a novel method that integrates spatially covariant filters and textual anatomical prompts encoded by visual-language models, to fill this gap. This approach optimizes an implicit function that correlates text embeddings of anatomical regions to filter weights, relaxing the typical translation-invariance constraint of convolutional operations. TextSCF not only boosts computational efficiency but can also retain or improve registration accuracy. By capturing the contextual interplay between anatomical regions, it offers impressive inter-regional transferability and the ability to preserve structural discontinuities during registration. 
TextSCF's performance has been rigorously tested on inter-subject brain MRI and abdominal CT registration tasks, outperforming existing state-of-the-art models in the MICCAI Learn2Reg 2021 challenge and leading the leaderboard. In abdominal registrations, textSCF's larger model variant improved the Dice score by 11.3% over the second-best model, while its smaller variant maintained similar accuracy but with an 89.13% reduction in network parameters and a 98.34% decrease in computational operations.	翻訳日:2024-02-07 19:36:45 公開日:2024-02-06
# 法的要件分析:規制コンプライアンスの観点から Legal Requirements Analysis: A Regulatory Compliance Perspective ( http://arxiv.org/abs/2311.13871v2 ) ライセンス: Link先を確認	Sallam Abualhaija and Marcello Ceci and Lionel Briand	(参考訳) 現代のソフトウェアは多くの分野やアプリケーションコンテキストにおいて日常的な活動の不可欠な部分です。人工知能(AI)を活用したインテリジェントオートメーションの導入は、多くの分野でブレークスルーにつながった。 aiの有効性は、データの可用性の増加など、いくつかの要因によって引き起こされる可能性がある。欧州連合(EU)におけるGDPR(General Data Protection Regulation)などの規制は、個人データの保護を保証するために導入されている。個人データを収集、処理、共有するソフトウェアシステムは、そのような規則に従っている。コンプライアンスソフトウェアの開発は、ソフトウェア開発プロセスの要件工学(re)フェーズにおける中心的な活動である、適用規則に規定された法的要件の対処に大きく依存する。 REは、法的要件を含むシステム・トゥ・ビーの要件を特定し維持することに関心がある。個人データ処理のために組織が実施する政策を記述した法的合意は、法的要件を付与するための規制に付加的な情報源を提供することができる。本章では、法的要件を分析し、GDPR上でそれらを実証する様々な方法について考察する。具体的には、規制から機械分析可能な表現を作成するための代替案について述べ、規制に対するコンプライアンス検証を可能にする既存の自動化手段を調査し、法的要件分析の現在の課題をさらに反映する。 Modern software has been an integral part of everyday activities in many disciplines and application contexts. Introducing intelligent automation by leveraging artificial intelligence (AI) led to break-throughs in many fields. The effectiveness of AI can be attributed to several factors, among which is the increasing availability of data. Regulations such as the general data protection regulation (GDPR) in the European Union (EU) are introduced to ensure the protection of personal data. Software systems that collect, process, or share personal data are subject to compliance with such regulations. Developing compliant software depends heavily on addressing legal requirements stipulated in applicable regulations, a central activity in the requirements engineering (RE) phase of the software development process. RE is concerned with specifying and maintaining requirements of a system-to-be, including legal requirements. Legal agreements which describe the policies organizations implement for processing personal data can provide an additional source to regulations for eliciting legal requirements. 
In this chapter, we explore a variety of methods for analyzing legal requirements and exemplify them on GDPR. Specifically, we describe possible alternatives for creating machine-analyzable representations from regulations, survey the existing automated means for enabling compliance verification against regulations, and further reflect on the current challenges of legal requirements analysis.	翻訳日:2024-02-07 19:36:18 公開日:2024-02-06
# 最初の100日間のパンデミック : 薬物・行動・デジタル介入の相互作用-エージェント・ベース・モデリングを用いた研究 First 100 days of pandemic; an interplay of pharmaceutical, behavioral and digital interventions -- A study using agent based modeling ( http://arxiv.org/abs/2401.04795v2 ) ライセンス: Link先を確認	Gauri Gupta, Ritvik Kapila, Ayush Chopra, Ramesh Raskar	(参考訳) パンデミック、特に最近の新型コロナウイルスの流行は、公衆衛生と世界経済の両方に影響を与えている。今後の流行に備えるためには、病気の進行と効率的な対応戦略の深い理解が必要である。本稿では,複雑な感染動態を捉え,介入の影響を理解する上で,エージェントベースモデル(ABM)の可能性を強調する。我々は、現実の政策導入における課題を反映した現実的な医薬品、行動、デジタル介入をシミュレートし、これらの介入の全体的組み合わせをパンデミック対応に提案する。これらのシミュレーションを用いて,ワシントン州キングス郡における実世界社会デマトグラフィーおよび地理センサスデータに基づいて,大規模人口における創発行動の傾向を検討した。本分析は, 迅速な意思決定と効率的な政策開発の重要性を強調した上で, パンデミックの進路を決定する上で, 最初の100日間の重要な役割を明らかにした。さらに、行動やデジタル介入への投資は、感染や入院の合計数を減らし、パンデミックのピークを遅らせることで、薬剤的介入の負担を軽減できる点を強調した。また、接触追跡や自己検疫による広範囲な検査に同じ金額を割り当てることで、予防接種に全予算を費やすよりもコスト効率が高いと推測しています。 Pandemics, notably the recent COVID-19 outbreak, have impacted both public health and the global economy. A profound understanding of disease progression and efficient response strategies is thus needed to prepare for potential future outbreaks. In this paper, we emphasize the potential of Agent-Based Models (ABM) in capturing complex infection dynamics and understanding the impact of interventions. We simulate realistic pharmaceutical, behavioral, and digital interventions that mirror challenges in real-world policy adoption and suggest a holistic combination of these interventions for pandemic response. Using these simulations, we study the trends of emergent behavior on a large-scale population based on real-world socio-demographic and geo-census data from Kings County in Washington. Our analysis reveals the pivotal role of the initial 100 days in dictating a pandemic's course, emphasizing the importance of quick decision-making and efficient policy development. 
Further, we highlight that investing in behavioral and digital interventions can reduce the burden on pharmaceutical interventions by reducing the total number of infections and hospitalizations, and by delaying the pandemic's peak. We also infer that allocating the same amount of dollars towards extensive testing with contact tracing and self-quarantine offers greater cost efficiency compared to spending the entire budget on vaccinations.	翻訳日:2024-02-07 19:27:39 公開日:2024-02-06
# バイオマーカー選択のための多目的遺伝的アルゴリズムに適用された系統的過大評価のための2段階最適化 Dual-stage optimizer for systematic overestimation adjustment applied to multi-objective genetic algorithms for biomarker selection ( http://arxiv.org/abs/2312.16624v2 ) ライセンス: Link先を確認	Luca Cattelani and Vittorio Fortino	(参考訳) オミクスデータからの機械学習によるバイオマーカー発見の課題は、分子の特徴の豊富さとサンプルの不足にある。機械学習におけるほとんどの特徴選択法は、最も効果的な組み合わせを決定するために様々な特徴集合(モデル)を評価する必要がある。このプロセスは通常、バリデーションデータセットを使用して行われ、モデルのパフォーマンスを最適化するためにさまざまな機能セットをテストする。評価は性能推定エラーを持ち、選択が多くのモデルを伴う場合、ベストなモデルはほとんど確実に過大評価されます。特徴選択手法を用いたバイオマーカーの同定は、特徴数の予測能力とパシモニーの間のトレードオフを伴う多目的問題として対処できる。遺伝的アルゴリズムは多目的最適化の一般的なツールであるが、多くの解を進化させ、過大評価しがちである。モデルが既に単一目的問題で選択された後に過大評価を減少させる手法が提案されているが、最適化やモデル選択の改善、より一般的な多目的領域に適用できるアルゴリズムは存在しない。提案するDOSA-MOは多目的最適化ラッパーアルゴリズムで,元の推定値,分散度,および解の特徴セットサイズが過大評価を予測する。 DOSA-MOは最適化時の性能の期待値を調整し、解集合の構成を改善する。癌サブタイプおよび/または患者全体の生存率を予測する場合, DOSA-MOは, 腎癌および乳癌の3つの転写学的データセットを用いて, 最先端の遺伝的アルゴリズムの性能を向上させることが確認された。 The challenge in biomarker discovery using machine learning from omics data lies in the abundance of molecular features but scarcity of samples. Most feature selection methods in machine learning require evaluating various sets of features (models) to determine the most effective combination. This process, typically conducted using a validation dataset, involves testing different feature sets to optimize the model's performance. Evaluations have performance estimation error and when the selection involves many models the best ones are almost certainly overestimated. Biomarker identification with feature selection methods can be addressed as a multi-objective problem with trade-offs between predictive ability and parsimony in the number of features. Genetic algorithms are a popular tool for multi-objective optimization but they evolve numerous solutions thus are prone to overestimation. 
Methods have been proposed to reduce the overestimation after a model has already been selected in single-objective problems, but no algorithm existed that could reduce the overestimation during the optimization, improve model selection, or be applied in the more general multi-objective domain. We propose DOSA-MO, a novel multi-objective optimization wrapper algorithm that learns how the original estimation, its variance, and the feature set size of the solutions predict the overestimation. DOSA-MO adjusts the expectation of the performance during the optimization, improving the composition of the solution set. We verify that DOSA-MO improves the performance of a state-of-the-art genetic algorithm on left-out or external sample sets, when predicting cancer subtypes and/or patient overall survival, using three transcriptomics datasets for kidney and breast cancer.	翻訳日:2024-02-07 19:27:15 公開日:2024-02-06
# 平均場下減衰ランゲヴィンダイナミクスとその時空離散化 Mean-field underdamped Langevin dynamics and its spacetime discretization ( http://arxiv.org/abs/2312.16360v5 ) ライセンス: Link先を確認	Qiang Fu, Ashia Wilson	(参考訳) 確率測度空間上で定義された非線形汎函数の特殊クラスを最適化するN-粒子アンダーダム化ランゲヴィンアルゴリズムを提案する。この定式化に関する問題の例としては、平均場ニューラルネットワークのトレーニング、最大平均離散性最小化、カーネルスタイン離散性最小化などがある。我々のアルゴリズムは、平均場下にあるランゲヴィン力学の時空離散化に基づいており、新しい高速混合保証を提供する。さらに,本アルゴリズムは全変動距離においてグローバルに収束し,ダイナミクスと実用的実装との理論的ギャップを橋渡しすることを示した。 We propose a new method called the N-particle underdamped Langevin algorithm for optimizing a special class of non-linear functionals defined over the space of probability measures. Examples of problems with this formulation include training mean-field neural networks, maximum mean discrepancy minimization and kernel Stein discrepancy minimization. Our algorithm is based on a novel spacetime discretization of the mean-field underdamped Langevin dynamics, for which we provide a new, fast mixing guarantee. In addition, we demonstrate that our algorithm converges globally in total variation distance, bridging the theoretical gap between the dynamics and its practical implementation.	翻訳日:2024-02-07 19:26:48 公開日:2024-02-06
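For orientation, the mean-field underdamped Langevin dynamics named above is commonly written as the pair of stochastic differential equations below. The notation is an assumption made for illustration and is not taken from the abstract: $F$ is the functional being minimized, $\mu_t$ is the law of the position $X_t$, $V_t$ is the velocity, $\gamma > 0$ is a friction parameter, $B_t$ is a standard Brownian motion, and the temperature is set to one. The paper's own scaling conventions may differ.

$$
\begin{aligned}
dX_t &= V_t\,dt,\\
dV_t &= -\gamma V_t\,dt - \nabla \frac{\delta F}{\delta \mu}(\mu_t)(X_t)\,dt + \sqrt{2\gamma}\,dB_t, \qquad \mu_t = \mathrm{Law}(X_t).
\end{aligned}
$$

Under these assumptions, an N-particle scheme of the kind the abstract describes would replace $\mu_t$ with the empirical measure of $N$ particles and discretize time.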
# スケーリングが必要なのはすべて - JAX-Accelerated Reinforcement Learningによる自律運転 Scaling Is All You Need: Autonomous Driving with JAX-Accelerated Reinforcement Learning ( http://arxiv.org/abs/2312.15122v2 ) ライセンス: Link先を確認	Moritz Harmel, Anubhav Paras, Andreas Pasternak, Gary Linscott	(参考訳) 強化学習は、ビデオゲームのような複雑な領域で最高の人間よりも優れていることが示されている。しかし、自動運転に必要な規模で強化学習実験を行うことは極めて困難である。大規模な強化学習システムを構築し、多くのGPUに分散することは難しい。現実世界の車両でのトレーニング中の収集経験は、安全性とスケーラビリティの観点から禁止されている。そのため、実世界の運転から大量のデータを利用する効率的で現実的な運転シミュレータが必要となる。これらの機能をまとめて,自律運転のための大規模強化学習実験を行う。当社の政策性能は大規模化とともに向上することを示す。当社のベストパフォーマンスポリシは、自動運転のための最先端機械学習によるポリシと比較して、運転進捗率を25%向上しながら、障害率を64%削減します。 Reinforcement learning has been demonstrated to outperform even the best humans in complex domains like video games. However, running reinforcement learning experiments on the required scale for autonomous driving is extremely difficult. Building a large scale reinforcement learning system and distributing it across many GPUs is challenging. Gathering experience during training on real world vehicles is prohibitive from a safety and scalability perspective. Therefore, an efficient and realistic driving simulator is required that uses a large amount of data from real-world driving. We bring these capabilities together and conduct large-scale reinforcement learning experiments for autonomous driving. We demonstrate that our policy performance improves with increasing scale. Our best performing policy reduces the failure rate by 64% while improving the rate of driving progress by 25% compared to the policies produced by state-of-the-art machine learning for autonomous driving.	翻訳日:2024-02-07 19:26:37 公開日:2024-02-06
# SimLM: 言語モデルは物理系のパラメータを推測できるか? SimLM: Can Language Models Infer Parameters of Physical Systems? ( http://arxiv.org/abs/2312.14215v2 ) ライセンス: Link先を確認	Sean Memery, Mirella Lapata, Kartic Subr	(参考訳) いくつかの機械学習手法は、複雑な物理システムについて学習または推論することを目的としている。推論への一般的な第一歩は、システムパラメータをその振る舞いの観察から推測することである。本稿では,大規模言語モデル(LLM)の物理系におけるパラメータ推論における性能について検討する。実験の結果,単純なシステムであっても,本課題には適していないことが示唆された。本稿では,物理シミュレータを用いてllmの文脈を補強する探査の有望な方向性を提案する。我々は,物理シミュレーションを利用せずに,簡単な実例で異なるllmの性能を評価し比較する。 Several machine learning methods aim to learn or reason about complex physical systems. A common first-step towards reasoning is to infer system parameters from observations of its behavior. In this paper, we investigate the performance of Large Language Models (LLMs) at performing parameter inference in the context of physical systems. Our experiments suggest that they are not inherently suited to this task, even for simple systems. We propose a promising direction of exploration, which involves the use of physical simulators to augment the context of LLMs. We assess and compare the performance of different LLMs on a simple example with and without access to physical simulation.	翻訳日:2024-02-07 19:26:26 公開日:2024-02-06
# XLand-MiniGrid:JAXにおけるスケーラブルなメタ強化学習環境 XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX ( http://arxiv.org/abs/2312.12044v2 ) ライセンス: Link先を確認	Alexander Nikulin, Vladislav Kurenkov, Ilya Zisman, Artem Agarkov, Viacheslav Sinii, Sergey Kolesnikov	(参考訳) XLandの多様性と深さ、MiniGridのシンプルさとミニマリズムに触発され、メタ強化学習研究のためのツールとグリッドワールド環境のスイートであるXLand-MiniGridを紹介した。 JAXで書かれたXLand-MiniGridは高度にスケーラブルな設計で、GPUやTPUアクセラレータ上で実行でき、限られたリソースで大規模な実験を民主化することができる。環境とともに、XLand-MiniGridは、ユーザが適応エージェントのトレーニングを素早く始められるような、難易度と使い易いベースラインの、何百万ものユニークなタスクで、事前サンプリングされたベンチマークを提供する。さらに,スケーリングと一般化の予備的な分析を行い,トレーニング中にベースラインが毎秒数百万ステップに達することを示し,提案したベンチマークが困難であることを検証した。 Inspired by the diversity and depth of XLand and the simplicity and minimalism of MiniGrid, we present XLand-MiniGrid, a suite of tools and grid-world environments for meta-reinforcement learning research. Written in JAX, XLand-MiniGrid is designed to be highly scalable and can potentially run on GPU or TPU accelerators, democratizing large-scale experimentation with limited resources. Along with the environments, XLand-MiniGrid provides pre-sampled benchmarks with millions of unique tasks of varying difficulty and easy-to-use baselines that allow users to quickly start training adaptive agents. In addition, we have conducted a preliminary analysis of scaling and generalization, showing that our baselines are capable of reaching millions of steps per second during training and validating that the proposed benchmarks are challenging.	翻訳日:2024-02-07 19:26:18 公開日:2024-02-06
# 変圧器の数学的展望 A mathematical perspective on Transformers ( http://arxiv.org/abs/2312.10794v3 ) ライセンス: Link先を確認	Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet	(参考訳) トランスフォーマーは、大きな言語モデルの内部動作において中心的な役割を果たす。本研究では,相互作用する粒子系として解釈したトランスフォーマーを解析するための数学的枠組みを構築した。我々の研究は基礎となる理論を探求し、数学者と計算機科学者に新しい視点を提供する。 Transformers play a central role in the inner workings of large language models. We develop a mathematical framework for analyzing Transformers based on their interpretation as interacting particle systems, which reveals that clusters emerge in long time. Our study explores the underlying theory and offers new perspectives for mathematicians as well as computer scientists.	翻訳日:2024-02-07 19:26:01 公開日:2024-02-06
# TiMix:効果的なビジョンランゲージ事前学習のためのテキスト対応画像ミキシング TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training ( http://arxiv.org/abs/2312.08846v3 ) ライセンス: Link先を確認	Chaoya Jiang, Wei ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang	(参考訳) 自己教師型マルチモーダル・コントラシティブ・ラーニング(SMCL)は、視覚的・言語的モダリティを整合させることにより、現代のビジョンランゲージ・プレトレーニング(VLP)モデルを大幅に進歩させる。しかし、ウェブハーベストテキストイメージペアのノイズのため、SMCLにおけるトレーニングデータボリュームのスケールアップは、計算コストとデータ非効率の点でかなりの障害となる。本稿では,VLPにおけるデータ効率を向上させるために,ミックスベースデータ拡張技術をSMCLに統合したテキスト認識画像混合(TiMix)を提案する。本稿では,相互情報(MI)の観点からTiMixの理論的解析を行い,相互学習のための混合データサンプルが,対照損失の正則化として暗黙的に機能していることを示す。実験の結果,TiMixは既存の手法に対してベンチマークを行った場合,トレーニングデータの量が少なく,トレーニング時間が短い場合でも,下流タスクで同等のパフォーマンスを示すことがわかった。この研究は、データ効率と計算可能なVLPのためのデータ混合の可能性を実証的かつ理論的に実証し、実用シナリオにおけるより広範なVLPモデルの採用に寄与する。 Self-supervised Multi-modal Contrastive Learning (SMCL) remarkably advances modern Vision-Language Pre-training (VLP) models by aligning visual and linguistic modalities. Due to noise in web-harvested text-image pairs, however, scaling up training data volume in SMCL presents considerable obstacles in terms of computational cost and data inefficiency. To improve data efficiency in VLP, we propose Text-aware Image Mixing (TiMix), which integrates mix-based data augmentation techniques into SMCL, yielding significant performance improvements without significantly increasing computational overhead. We provide a theoretical analysis of TiMix from a mutual information (MI) perspective, showing that mixed data samples for cross-modal contrastive learning implicitly serve as a regularizer for the contrastive loss. The experimental results demonstrate that TiMix exhibits comparable performance on downstream tasks, even with a reduced amount of training data and shorter training time, when benchmarked against existing methods. This work empirically and theoretically demonstrates the potential of data mixing for data-efficient and computationally viable VLP, benefiting broader VLP model adoption in practical scenarios.	翻訳日:2024-02-07 19:25:56 公開日:2024-02-06
# テキスト・画像拡散モデルにおける局所条件制御 Local Conditional Controlling for Text-to-Image Diffusion Models ( http://arxiv.org/abs/2312.08768v2 ) ライセンス: Link先を確認	Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Wei Zhao, qinglin lu, Boxi Wu, Wei Liu	(参考訳) 拡散モデルは、テキストから画像へのタスクにおいて印象的な傾向を示してきた。近年の手法では、エッジや深度マップなどの画像レベルの制御を加えて、テキストプロンプトとともに生成プロセスを操作し、所望の画像を取得する。この制御プロセスは、制御領域の柔軟性を制限する全画像上でグローバルに操作される。本稿では,ローカル制御という,シンプルで実用的なタスク設定を提案する。ユーザが定義した画像条件に従って特定の局所領域を制御することに焦点を当て、残りの領域は元のテキストプロンプトによってのみ条件付けされる。この方法では、ユーザがきめ細かい方法で画像生成を柔軟に制御できる。しかし、この目標を達成することは自明ではない。局所的な条件を直接付加するナイーブな方法が、局所的な支配的な問題に繋がる可能性がある。そこで本研究では,非制御領域における概念生成を促進するため,非制御領域におけるデノイジング過程におけるクロス・アテンション・マップのノイズの更新とパラメータを活用するトレーニングフリーな手法を提案する。また,局所制御領域内外における情報差に起因する合成画像品質の劣化を軽減するために,特徴マスク制約を用いる。広域実験により,高品質画像を局所制御条件下でプロンプトに合成できることが実証された。コードは https://github.com/YibooZhao/Local-Control で入手できる。 Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired images. This controlling process operates globally on the entire image, which limits the flexibility of control regions. In this paper, we introduce a new, simple yet practical task setting: local control. It focuses on controlling specific local areas according to user-defined image conditions, while the remaining areas are conditioned only by the original text prompt. This allows users to flexibly control the image generation in a fine-grained way. However, it is non-trivial to achieve this goal. The naive approach of directly adding local conditions may lead to the local control dominance problem. To mitigate this problem, we propose a training-free method that leverages the updates of noised latents and parameters in the cross-attention map during the denoising process to promote concept generation in non-control areas. 
Moreover, we use feature mask constraints to mitigate the degradation of synthesized image quality caused by information differences inside and outside the local control area. Extensive experiments demonstrate that our method can synthesize high-quality images that match the prompt under local control conditions. Code is available at https://github.com/YibooZhao/Local-Control.	翻訳日:2024-02-07 19:25:35 公開日:2024-02-06
# smerf:リアルタイム大規模探索のための効率的なラミアンスフィールド SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration ( http://arxiv.org/abs/2312.07541v2 ) ライセンス: Link先を確認	Daniel Duckworth, Peter Hedman, Christian Reiser, Peter Zhizhin, Jean-Fran\c{c}ois Thibert, Mario Lu\v{c}i\'c, Richard Szeliski, Jonathan T. Barron	(参考訳) 近年のリアルタイムビュー合成技術は, 忠実度と速度が急速に向上し, インタラクティブなフレームレートで近光写実的シーンをレンダリングすることができる。同時に、ラスタ化に寄与する明示的なシーン表現と、レイマーチング上に構築されたニューラルフィールドとの間に緊張が生じ、後者の最先端のインスタンスは、リアルタイムアプリケーションでは違法に高価であると同時に、前者の品質を上回っている。本研究では,最大300 m$^2$ 3.5 mm$^3$ の体積分解能で,大規模シーンにおけるリアルタイム手法の最先端精度を実現するビュー合成手法であるsmerfを提案する。本手法は,計算量とメモリ消費を制約しながらモデル容量を増加させる階層的モデル分割方式と,高忠実度と内部整合性を同時に生成する蒸留訓練戦略の2つの主要な貢献に基づいて構築されている。当社のアプローチは,Webブラウザ内での6自由度ナビゲーションを可能にし,コモディティスマートフォンやラップトップ上でリアルタイムにレンダリングする。大規模実験により,本手法は,標準ベンチマークで0.78db,大シーンで1.78db,最先端のラミアンスフィールドモデルより3桁早くフレームを描画し,スマートフォンを含む多種多様なコモディティデバイスでリアルタイム性能を実現する。プロジェクトのWebサイトでは、これらのモデルをインタラクティブに探求することを読者に勧めています。 Recent techniques for real-time view synthesis have rapidly advanced in fidelity and speed, and modern methods are capable of rendering near-photorealistic scenes at interactive frame rates. At the same time, a tension has arisen between explicit scene representations amenable to rasterization and neural fields built on ray marching, with state-of-the-art instances of the latter surpassing the former in quality while being prohibitively expensive for real-time applications. In this work, we introduce SMERF, a view synthesis approach that achieves state-of-the-art accuracy among real-time methods on large scenes with footprints up to 300 m$^2$ at a volumetric resolution of 3.5 mm$^3$. Our method is built upon two primary contributions: a hierarchical model partitioning scheme, which increases model capacity while constraining compute and memory consumption, and a distillation training strategy that simultaneously yields high fidelity and internal consistency. 
Our approach enables full six degrees of freedom (6DOF) navigation within a web browser and renders in real-time on commodity smartphones and laptops. Extensive experiments show that our method exceeds the current state-of-the-art in real-time novel view synthesis by 0.78 dB on standard benchmarks and 1.78 dB on large scenes, renders frames three orders of magnitude faster than state-of-the-art radiance field models, and achieves real-time performance across a wide variety of commodity devices, including smartphones. We encourage readers to explore these models interactively at our project website: https://smerf-3d.github.io.	翻訳日:2024-02-07 19:25:14 公開日:2024-02-06
# 運動量粒子の最大範囲 Momentum Particle Maximum Likelihood ( http://arxiv.org/abs/2312.07335v2 ) ライセンス: Link先を確認	Jen Ning Lim, Juan Kuntz, Samuel Power, Adam M. Johansen	(参考訳) 潜在変数モデルの最大確率推定(MLE)は、パラメータと確率分布の拡張空間に対する最適化問題としてしばしば再キャストされる。例えば、期待最大化(EM)アルゴリズムは、この空間上の適切な自由エネルギー汎関数に適用された座標降下と解釈できる。近年、この視点は最適輸送とワッサーシュタイン勾配流からの洞察と組み合わされ、標準EMよりも広いモデルのクラスに適用可能な粒子ベースのアルゴリズムが開発されている。通常の微分方程式の離散化として 'momentum-enriched' 最適化アルゴリズムを解釈する先行研究からインスピレーションを得て、パラメータと確率分布の拡張空間上の自由エネルギー関数を最小化する類似の力学系に基づくアプローチを提案する。その結果、ネステロフの加速勾配法、アンダーダムのランゲヴィン拡散法、および粒子法の要素をブレンドする力学系が得られた。適切な仮定の下では,提案方式の定量的収束を連続時間における関数のユニークな最小化に確立する。そこで本研究では,潜在変数モデルにおけるパラメータ推定に適用可能な数値的な離散化を提案する。数値実験により,結果のアルゴリズムは既存の手法よりも高速に収束し,他の(ほぼ)mleアルゴリズムと比較できることを示した。 Maximum likelihood estimation (MLE) of latent variable models is often recast as an optimization problem over the extended space of parameters and probability distributions. For example, the Expectation Maximization (EM) algorithm can be interpreted as coordinate descent applied to a suitable free energy functional over this space. Recently, this perspective has been combined with insights from optimal transport and Wasserstein gradient flows to develop particle-based algorithms applicable to wider classes of models than standard EM. Drawing inspiration from prior works which interpret `momentum-enriched' optimisation algorithms as discretizations of ordinary differential equations, we propose an analogous dynamical systems-inspired approach to minimizing the free energy functional over the extended space of parameters and probability distributions. The result is a dynamic system that blends elements of Nesterov's Accelerated Gradient method, the underdamped Langevin diffusion, and particle methods. Under suitable assumptions, we establish quantitative convergence of the proposed system to the unique minimiser of the functional in continuous time. 
We then propose a numerical discretization of this system which enables its application to parameter estimation in latent variable models. Through numerical experiments, we demonstrate that the resulting algorithm converges faster than existing methods and compares favourably with other (approximate) MLE algorithms.	翻訳日:2024-02-07 19:24:44 公開日:2024-02-06
# HumanReg:Human Point Cloudの自己管理型非厳格登録 HumanReg: Self-supervised Non-rigid Registration of Human Point Cloud ( http://arxiv.org/abs/2312.05462v2 ) ライセンス: Link先を確認	Yifan Chen, Zhiyu Pan, Zhicheng Zhong, Wenxuan Guo, Jianjiang Feng, Jie Zhou	(参考訳) 本稿では、2つの人点雲間の非剛性変換をエンドツーエンドに学習する新しい登録フレームワークであるHumanRegを提案する。このタイプのポイントクラウドを効率的に扱うために、登録プロセスにボディを導入します。高価なポイント単位のフローアノテーションを必要とする既存の管理された登録技術とは異なり、HumanRegは、新しい損失関数の集合から恩恵を受ける自己管理的な方法で訓練することができる。実世界のデータにモデルをよりよく収束させるため、事前学習戦略を提案し、動的で疎い人点雲と自動生成された地底真理アノテーションからなる合成データセット(HumanSyn4D)を提案する。我々の実験では、HumanReg は CAPE-512 データセットで最先端のパフォーマンスを達成し、また別の挑戦的な実世界のデータセットで定性的な結果が得られることを示した。さらに,本研究は合成データセットと新しい損失関数の有効性を示す。私たちのコードと合成データセットは https://github.com/chenyifanthu/HumanReg で利用可能です。 In this paper, we present a novel registration framework, HumanReg, that learns a non-rigid transformation between two human point clouds end-to-end. We introduce a body prior into the registration process to efficiently handle this type of point cloud. Unlike most existing supervised registration techniques that require expensive point-wise flow annotations, HumanReg can be trained in a self-supervised manner, benefiting from a set of novel loss functions. To make our model converge better on real-world data, we also propose a pretraining strategy and a synthetic dataset (HumanSyn4D) consisting of dynamic, sparse human point clouds and their auto-generated ground truth annotations. Our experiments show that HumanReg achieves state-of-the-art performance on the CAPE-512 dataset and yields qualitative results on another, more challenging real-world dataset. Furthermore, our ablation studies demonstrate the effectiveness of our synthetic dataset and novel loss functions. Our code and synthetic dataset are available at https://github.com/chenyifanthu/HumanReg.	翻訳日:2024-02-07 19:24:22 公開日:2024-02-06
# 並列関数呼び出しのためのLLMコンパイラ An LLM Compiler for Parallel Function Calling ( http://arxiv.org/abs/2312.04511v2 ) ライセンス: Link先を確認	Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami	(参考訳) 最近の言語モデルは様々な複雑な推論ベンチマークで顕著な結果を示している。 LLMの推論能力により、知識の遮断、算術能力の不足、プライベートデータへのアクセスの欠如など、独自の制限を克服するために外部関数呼び出しを実行することができる。この開発により、LLMはコンテキストに基づいて複数の関数を選択し調整し、より複雑な問題に取り組むことができる。しかし、現在の複数の関数呼び出しのメソッドは、しばしば、高いレイテンシ、コスト、時には不正確な振る舞いをもたらす、各関数のシーケンシャルな推論と動作を必要とする。これに対処するために,並列に関数を実行するLLMCompilerを導入し,複数の関数呼び出しを効率的にオーケストレーションする。古典的なコンパイラの原則から、LLMCompilerは3つのコンポーネントで並列関数呼び出しを合理化する。 i) LLMプランナーであって,実行計画を定めているもの (ii)タスクフェッチユニット、タスクを呼び出す関数のディスパッチ、及び (iii)これらのタスクを並列に実行するExecutor。 LLMCompilerは関数呼び出しに最適化されたオーケストレーションを自動的に生成し、オープンソースモデルとクローズドソースモデルの両方で使用することができる。我々は様々な関数呼び出しパターンを持つタスクでllmcompilerをベンチマークした。一貫性のあるレイテンシのスピードアップは3.7倍まで,コスト削減は6.7倍まで,正確性は最大9%まで向上しています。 Recent language models have shown remarkable results on various complex reasoning benchmarks. The reasoning capabilities of LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for multiple function calling often require sequential reasoning and acting for each function which can result in high latency, cost, and sometimes inaccurate behavior. To address this, we introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calling. Drawing from the principles of classical compilers, LLMCompiler streamlines parallel function calling with three components: (i) an LLM Planner, formulating execution plans; (ii) a Task Fetching Unit, dispatching function calling tasks; and (iii) an Executor, executing these tasks in parallel. 
LLMCompiler automatically generates an optimized orchestration for the function calls and can be used with both open-source and closed-source models. We have benchmarked LLMCompiler on a range of tasks with different patterns of function calling. We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to ~9% compared to ReAct.	翻訳日:2024-02-07 19:24:02 公開日:2024-02-06
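The planner, task fetching unit, and executor described in this abstract can be illustrated with a small sketch. This is not LLMCompiler's implementation: the `run_plan` helper, the `plan` dictionary format, and the toy functions are all hypothetical, and the plan is written by hand rather than produced by an LLM. The sketch only shows the scheduling idea, that every task whose dependencies are already resolved is dispatched in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def run_plan(plan, functions, max_workers=4):
    """Execute a dependency graph of function calls, running ready tasks in parallel.

    plan: dict mapping task name -> (function name, list of dependency task names)
    functions: dict mapping function name -> callable taking a list of dependency results
    (Both formats are hypothetical, chosen only for this illustration.)
    """
    results = {}
    pending = dict(plan)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Fetching step: collect every task whose dependencies are resolved.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cyclic or unsatisfiable dependencies")
            # Execution step: submit all ready tasks at once.
            futures = {
                name: pool.submit(functions[pending[name][0]],
                                  [results[d] for d in pending[name][1]])
                for name in ready
            }
            # Wait for this batch and record its results.
            for name, future in futures.items():
                results[name] = future.result()
                del pending[name]
    return results


# Toy functions standing in for external function calls.
functions = {
    "lookup_a": lambda args: 3,
    "lookup_b": lambda args: 4,
    "add": lambda args: sum(args),
}
# t1 and t2 are independent, so they run in parallel; t3 waits for both.
plan = {
    "t1": ("lookup_a", []),
    "t2": ("lookup_b", []),
    "t3": ("add", ["t1", "t2"]),
}
results = run_plan(plan, functions)
```

Batching by readiness is the simplest choice here; a scheduler that starts each task the moment its own dependencies finish would reduce idle time further, at the cost of more bookkeeping.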
# 量子非局所性の多元的性質を解き明かす Unmasking the Polygamous Nature of Quantum Nonlocality ( http://arxiv.org/abs/2312.04373v2 ) ライセンス: Link先を確認	Pawe{\l} Cie\'sli\'nski, Lukas Knips, Mateusz Kowalczyk, Wies{\l}aw Laskowski, Tomasz Paterek, Tam\'as V\'ertesi, Harald Weinfurter	(参考訳) 量子力学は、ある観測値の統計に制限を課す。おそらく最も有名な例は不確実性原理である。複数のベルの不等式を同時に違反する同様のトレードオフも存在する。 3人の観察者の最も単純な場合、ベルの不平等に違反することは他の不平等に違反することを妨げることが示されている。ベル・モノガミーの形式は無符号原理と関連しており、全ての不等式を同時に違反することができないことは、その基本的な性質と見なされている。ここではベル単ガミーが普遍的に成り立たないことを示し、実際には三人の観測者に対してのみ単ガミー的な状況が存在する。したがって、量子非局所性の性質は真に多元的である。 3人以上の観測者に対して単元原理に従わない量子状態とタイトベル不等式を同定するための体系的手法を提案する。同定された多価不等式は、6光子ディック状態を用いたベル型相関の測定によって実験的に破られ、量子暗号や量子ネットワーク内の複数のノードの同時自己検査に利用することができる。 Quantum mechanics imposes limits on the statistics of certain observables. Perhaps the most famous example is the uncertainty principle. Similar trade-offs also exist for the simultaneous violation of multiple Bell inequalities. In the simplest case of three observers, it has been shown that violating one Bell inequality precludes the violation of any other inequality, a property called monogamy of Bell violations. Forms of Bell monogamy have been linked to the no-signalling principle and the inability of simultaneous violations of all inequalities is regarded as their fundamental property. Here we show that the Bell monogamy does not hold universally and that in fact the only monogamous situation exists only for three observers. Consequently, the nature of quantum nonlocality is truly polygamous. We present a systematic methodology for identifying quantum states and tight Bell inequalities that do not obey the monogamy principle for any number of more than three observers. The identified polygamous inequalities are experimentally violated by the measurement of Bell-type correlations using six-photon Dicke states and may be exploited for quantum cryptography as well as simultaneous self testing of multiple nodes in a quantum network.	翻訳日:2024-02-07 19:23:41 公開日:2024-02-06
# 教師なし類似度尺度を用いたソースコードクローン検出 Source Code Clone Detection Using Unsupervised Similarity Measures ( http://arxiv.org/abs/2401.09885v3 ) ライセンス: Link先を確認	Jorge Martinez-Gil	(参考訳) 近年,クローン検出やコード検索,レコメンデーションといったソフトウェア工学タスクの重要性から,ソースコードの類似性の評価が注目されている。本研究はソースコードクローン検出のための教師なし類似度尺度の比較分析を行う。目標は、現在の最先端技術、その強み、弱点を概観することである。そのため、既存の教師なし戦略をコンパイルし、ベンチマークデータセットでパフォーマンスを評価することで、ソフトウェアエンジニアが特定のユースケースに適した方法を選択するようにガイドします。この研究のソースコードはhttps://github.com/jorge-martinez-gil/codesimで入手できる。 Assessing similarity in source code has gained significant attention in recent years due to its importance in software engineering tasks such as clone detection and code search and recommendation. This work presents a comparative analysis of unsupervised similarity measures for source code clone detection. The goal is to give an overview of the current state-of-the-art techniques and their strengths and weaknesses. To do that, we compile the existing unsupervised strategies and evaluate their performance on a benchmark dataset to guide software engineers in selecting appropriate methods for their specific use cases. The source code of this study is available at https://github.com/jorge-martinez-gil/codesim.	翻訳日:2024-02-07 19:16:50 publicated:2024-02-06
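To make "unsupervised similarity measure" concrete, the sketch below scores two code fragments by the Jaccard similarity of their token sets. Jaccard is used only as a familiar example; the abstract does not say which measures the study evaluates, and the `tokenize` and `jaccard_similarity` helpers are hypothetical names, not part of the linked repository.

```python
import re


def tokenize(code):
    """Split source code into identifier, number, and single-symbol tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def jaccard_similarity(code_a, code_b):
    """Jaccard similarity between the token sets of two fragments (0.0 to 1.0)."""
    a, b = set(tokenize(code_a)), set(tokenize(code_b))
    if not a and not b:
        return 1.0  # two empty fragments are treated as identical
    return len(a & b) / len(a | b)


snippet_1 = "int add(int a, int b) { return a + b; }"
snippet_2 = "int sum(int a, int b) { return a + b; }"  # renamed function only
snippet_3 = "print('hello')"                            # unrelated code

sim_close = jaccard_similarity(snippet_1, snippet_2)
sim_far = jaccard_similarity(snippet_1, snippet_3)
```

No labelled clone pairs are needed: a threshold on the score decides whether two fragments count as clones. A set-based measure ignores token order and frequency, which is one reason a comparative study of several measures is useful.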
# 2次元の低オーバーヘッド量子コンピューティングのためのLDPC-cat符号 LDPC-cat codes for low-overhead quantum computing in 2D ( http://arxiv.org/abs/2401.09541v2 ) ライセンス: Link先を確認	Diego Ruiz, J\'er\'emie Guillaud, Anthony Leverrier, Mazyar Mirrahimi, Christophe Vuillot	(参考訳) 量子低密度パリティチェック(qLDPC)コードは、フォールトトレラント量子コンピューティング(FTQC)アーキテクチャのオーバーヘッドを大幅に削減するための有望な構造である。しかし、これらのコードの既知のハードウェア実装はすべて、長距離量子ビット接続、高速安定化器、多層チップレイアウトなどの高度な技術を必要とする。フォールトトレランスのハードウェアオーバーヘッドを削減する別のアプローチは、ビットフリップエラーが指数関数的に設計によって抑制されるボソニックキャットキュービットを使用することである。本研究では,両手法を組み合わせて,位相フリップを補正する古典的LDPC符号を構成する猫量子ビットに基づくアーキテクチャを提案する。このような位相フリップLDPC符号を用いることで、2つの大きな利点が得られます。まず、2Dおよび低ウェイト安定化器における短距離量子ビット相互作用により、現在の超伝導回路技術と容易に互換性のあるコードの実装を実現する。第2に,局所接続を維持しつつ,猫キュービットの第2層を持つ論理ゲートのフォールトトレラントなユニバーサルセットの実装方法を示す。我々はこれらの古典符号の数値的ブルートフォース最適化を行い、アルゴリズムが関連する符号距離に最適な符号化レートの符号を求める。我々は、最良のコードのいくつかがセル・オートマトン構造から恩恵を受けていることを発見します。これにより、高いエンコーディングレートと距離を持つコードのファミリーを定義することができます。最後に,回路レベルの雑音下でのコードの性能を数値的に評価する。物理的フェイズフリップエラー確率$\epsilon \approx 0.1\%$と仮定すると、私たちの$[165+8\ell, 34+2\ell, 22]$コードファミリーは、合計論理的エラー確率(論理的位相フリップとビットフリップの両方を含む)と論理的キュービット$\epsilon_L \leq 10^{-8}$を758ドルのキャット量子ビットチップで符号化することができる。 Quantum low-density parity-check (qLDPC) codes are a promising construction for drastically reducing the overhead of fault-tolerant quantum computing (FTQC) architectures. However, all of the known hardware implementations of these codes require advanced technologies, such as long-range qubit connectivity, high-weight stabilizers, or multi-layered chip layouts. An alternative approach to reduce the hardware overhead of fault-tolerance is to use bosonic cat qubits where bit-flip errors are exponentially suppressed by design. In this work, we combine both approaches and propose an architecture based on cat qubits concatenated in classical LDPC codes correcting for phase-flips. We find that employing such phase-flip LDPC codes provides two major advantages. 
First, the hardware implementation of the code can be realised using short-range qubit interactions in 2D and low-weight stabilizers, which makes it readily compatible with current superconducting circuit technologies. Second, we demonstrate how to implement a fault-tolerant universal set of logical gates with a second layer of cat qubits while maintaining the local connectivity. We conduct a numerical brute-force optimisation of these classical codes to find the ones with the best encoding rate for algorithmically relevant code distances. We discover that some of the best codes benefit from a cellular automaton structure. This allows us to define families of codes with high encoding rates and distances. Finally, we numerically assess the performance of our codes under circuit-level noise. Assuming a physical phase-flip error probability $\epsilon \approx 0.1\%$, our $[165+8\ell, 34+2\ell, 22]$ code family allows encoding $100$ logical qubits with a total logical error probability (including both logical phase-flip and bit-flip) per cycle and per logical qubit $\epsilon_L \leq 10^{-8}$ on a $758$ cat qubit chip.	翻訳日:2024-02-07 19:16:40 公開日:2024-02-06
# インフレのクリロフ複雑性 Inflationary Krylov complexity ( http://arxiv.org/abs/2401.09307v3 ) ライセンス: Link先を確認	Tao Li and Lei-Hua Liu	(参考訳) 本研究では,インフレーションにおける変形分散関係に対する曲率摂動のクリロフ複雑性を体系的に検討した。多くの量子重力フレームワークはこの種の分散関係を修正できるため、我々の分析は弦宇宙論、ループ重力などに適用できる。 lanczosアルゴリズムに従い、非常に初期の宇宙は無限多体、最大カオス系であることがわかった。我々の数値は、標準分散関係のLanczos係数とLyapunov指数が主にスケール係数によって決定されることを示している。修正された場合については、運動量によってほぼ決定される。閉系の手法では、水平線が抜ける前にクリロフ複雑性が不規則な振動を示すことが分かる。修正されたケースは、地平線を出た後により高速な成長を示す。開系のアプローチについては、Lanczos係数を$n$(主量子数)に比例させるだけで非常に堅牢な正確な波動関数を構築する。これに基づいて、Krylov複雑性とKrylovエントロピーは、弱散逸近似の下で閉じた系の場合、十分に回復可能であることを発見し、この分析により、Krylov複雑性の進化は元の状況と同じにはならないことを示した。また,インフレーション期は強い消散期であることがわかった。一方、我々の数値は、クリロフの複雑さがインフレーション期間中に増加することを明らかに示しています。しかし、小さなスケールでは、地平線が出てからピークとなるだろう。分析の結果,背景の劇的な変化(インフレーション)がクリロフ複雑性の進化に大きく影響することが明らかとなった。曲率摂動は量子レベルから古典レベルに遷移する。このデコヒーレンスがインフレーション中のクリロフの複雑さに大きな影響を与えると期待できる。 In this work, we have systematically investigated the Krylov complexity of curvature perturbation for the modified dispersion relation in inflation. Since many quantum gravitational frameworks could lead to this kind of modified dispersion relation, our analysis could be applied to string cosmology, loop gravity, etc. Following the Lanczos algorithm, we find the very early universe is an infinite, many-body, and maximally chaotic system. Our numerics show that the Lanczos coefficient and Lyapunov index of the standard dispersion relation are mainly determined by the scale factor. As for the modified case, it is nearly determined by the momentum. Using the closed-system method, we discover that the Krylov complexity shows irregular oscillation before horizon exit. The modified case presents faster growth after horizon exit. For the open-system approach, we construct the exact wave function, which is very robust and only requires the Lanczos coefficient to be proportional to $n$ (main quantum number). 
Based on this, we find that the Krylov complexity and Krylov entropy nicely recover the closed-system case under the weak dissipative approximation, in which our analysis shows that the evolution of Krylov complexity will not be the same as in the original situation. We also find the inflationary period is a strong dissipative system. Meanwhile, our numerics clearly show that the Krylov complexity grows during the whole inflationary period. For small scales, however, there is a peak after horizon exit. Our analysis reveals that the dramatic change in background (inflation) significantly impacts the evolution of Krylov complexity. Since the curvature perturbation transitions from the quantum level to the classical level, we expect that decoherence will strongly affect the Krylov complexity during inflation.	翻訳日:2024-02-07 19:16:02 公開日:2024-02-06
# マルコフ雑音を用いた確率近似と強化学習のためのode法 The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise ( http://arxiv.org/abs/2401.07844v2 ) ライセンス: Link先を確認	Shuze Liu, Shuhang Chen, Shangtong Zhang	(参考訳) 確率近似(英: stochastic approximation)は、ベクトルを反復的に、漸進的に、そして確率的に更新するアルゴリズムのクラスである。確率近似アルゴリズムを解析する基本的な課題は、その安定性、すなわち確率ベクトル反復がほぼ確実に有界であることを示すことである。本稿では, マルティンゲール差分雑音設定からマルコフ雑音設定への安定性に対するボルカー・マインの定理を拡張し, 強化学習, 特に線形関数近似と適性トレースを用いたオフポリシー強化学習アルゴリズムに適用性を大幅に向上させた。我々の分析の中心は、少数の函数の変化の漸近速度の減少であり、これは大数の強い法則の形式とよく使われるV4リャプノフドリフト条件の両方によって示唆され、マルコフ鎖が有限で既約であれば自明に成り立つ。 Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of strong law of large numbers and a commonly used V4 Lyapunov drift condition and trivially holds if the Markov chain is finite and irreducible.	翻訳日:2024-02-07 19:15:35 公開日:2024-02-06
# 大規模言語モデルからのイベントシーケンス知識の蒸留 Distilling Event Sequence Knowledge From Large Language Models ( http://arxiv.org/abs/2401.07237v2 ) ライセンス: Link先を確認	Somin Wadhwa, Oktie Hassanzadeh, Debarun Bhattacharjya, Ken Barker, Jian Ni	(参考訳) イベントシーケンスモデルは、イベントの分析と予測に非常に有効であることが判明している。このようなモデルの構築には、豊富な高品質なイベントシーケンスデータが必要になる。しかし、特定のアプリケーションでは、クリーンな構造化されたイベントシーケンスは利用できず、自動シーケンス抽出はノイズが多く不完全なデータをもたらす。本研究では,確率的イベントモデル構築に効果的に使用できるイベントシーケンスを生成するための大規模言語モデル(llm)の利用を検討する。これは、LLMからイベントシーケンス知識を蒸留するメカニズムと見なすことができる。本手法は、因果関係を持つ事象概念の知識グラフ(KG)を用いて、因果関係生成のための生成言語モデルを導出する。提案手法は,入力KGの知識ギャップを埋めて,高品質なイベントシーケンスを生成することができることを示す。さらに,パターンマイニングや確率的イベントモデルから有用で複雑な構造化知識を発見するために,生成されたシーケンスをどのように活用するかを検討する。我々は、シーケンス生成コードと評価フレームワーク、およびイベントシーケンスデータのコーパスをリリースする。 Event sequence models have been found to be highly effective in the analysis and prediction of events. Building such models requires availability of abundant high-quality event sequence data. In certain applications, however, clean structured event sequences are not available, and automated sequence extraction results in data that is too noisy and incomplete. In this work, we explore the use of Large Language Models (LLMs) to generate event sequences that can effectively be used for probabilistic event model construction. This can be viewed as a mechanism of distilling event sequence knowledge from LLMs. Our approach relies on a Knowledge Graph (KG) of event concepts with partial causal relations to guide the generative language model for causal event sequence generation. We show that our approach can generate high-quality event sequences, filling a knowledge gap in the input KG. Furthermore, we explore how the generated sequences can be leveraged to discover useful and more complex structured knowledge from pattern mining and probabilistic event models. We release our sequence generation code and evaluation framework, as well as corpus of event sequence data.	翻訳日:2024-02-07 19:15:14 公開日:2024-02-06
# 付加量子化による大規模言語モデルの極端圧縮 Extreme Compression of Large Language Models via Additive Quantization ( http://arxiv.org/abs/2401.06118v2 ) ライセンス: Link先を確認	Vage Egiazarian, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, Dan Alistarh	(参考訳) 正確なオープン大言語モデル(LLM)の出現は、エンドユーザーデバイス上での実行を可能にするようなモデルの量子化技術への競争につながった。本稿では,Multi-Codebook Quantization(MCQ)における古典的手法の観点から,パラメータあたり2ビットから3ビットといった,極めて低ビット数を対象として定義されたLLM圧縮の問題を再考する。我々の研究は、MCQファミリーの古典的なアルゴリズムであるAdditive Quantizationの上に構築され、言語モデルの量子化に適応する。結果として得られたアルゴリズムは、LLM圧縮の最先端を推し進め、与えられた圧縮予算の精度において、最近提案されたすべての技術より優れている。例えば、Llama 2モデルをパラメータあたり2ビットに圧縮する場合、我々のアルゴリズムは、7Bモデルを6.93パープレキシティ(最高の先行処理に対して1.29改善、FP16から1.81ポイント)、13Bモデルを5.70パープレキシティ(.36改善)、70Bモデルを3.94パープレキシティ(.22改善)に量子化する。我々は,LLM量子化の今後の研究を促進するために,言語モデル AQLM をベースラインとして追加量子化の実装をリリースする。 The emergence of accurate open large language models (LLMs) has led to a race towards quantization techniques for such models enabling execution on end-user devices. In this paper, we revisit the problem of "extreme" LLM compression--defined as targeting extremely low bit counts, such as 2 to 3 bits per parameter, from the point of view of classic methods in Multi-Codebook Quantization (MCQ). Our work builds on top of Additive Quantization, a classic algorithm from the MCQ family, and adapts it to the quantization of language models. The resulting algorithm advances the state-of-the-art in LLM compression, outperforming all recently-proposed techniques in terms of accuracy at a given compression budget. For instance, when compressing Llama 2 models to 2 bits per parameter, our algorithm quantizes the 7B model to 6.93 perplexity (a 1.29 improvement relative to the best prior work, and 1.81 points from FP16), the 13B model to 5.70 perplexity (a .36 improvement) and the 70B model to 3.94 perplexity (a .22 improvement) on WikiText2. We release our implementation of Additive Quantization for Language Models AQLM as a baseline to facilitate future research in LLM quantization.	
翻訳日:2024-02-07 19:14:57 公開日:2024-02-06
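To make the bit budgets in the abstract above concrete, here is a minimal sketch of the representation used by additive (multi-codebook) quantization: each group of weights is stored as one index per codebook and reconstructed as the sum of the selected codewords. The function names, array shapes and the example configurations are illustrative assumptions, not AQLM's released implementation. The sketch covers decoding and bit counting only, not how codes and codebooks are learned.

```python
import math

import numpy as np


def aq_decode(codebooks, codes):
    """Reconstruct weight groups as sums of codewords.

    codebooks: float array of shape (M, K, g), M codebooks with K codewords of length g.
    codes:     int array of shape (N, M), one codeword index per codebook per group.
    Returns an array of shape (N, g).
    """
    book_index = np.arange(codebooks.shape[0])   # shape (M,), broadcast against (N, M)
    selected = codebooks[book_index, codes]      # shape (N, M, g)
    return selected.sum(axis=1)


def bits_per_weight(num_codebooks, codebook_size, group_size):
    """Index storage cost only; codebook and scale overheads are ignored here."""
    return num_codebooks * math.log2(codebook_size) / group_size


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    books = rng.normal(size=(2, 256, 8))          # hypothetical: 2 codebooks, 8-bit codes, groups of 8
    codes = rng.integers(0, 256, size=(4, 2))
    print(aq_decode(books, codes).shape)          # (4, 8)
    print(bits_per_weight(2, 256, 8))             # 2.0
```

Under these assumptions, two 8-bit codebooks or one 16-bit codebook over groups of eight weights both cost 2 bits per parameter, which is the regime the abstract describes.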
# kernel fisher-rao flowを用いた単位時間サンプリング Sampling in Unit Time with Kernel Fisher-Rao Flow ( http://arxiv.org/abs/2401.03892v2 ) ライセンス: Link先を確認	Aimee Maurais and Youssef Marzouk	(参考訳) 非正規化対象密度からサンプリングするための新しい平均場ODEと対応する相互作用粒子系(IPS)を導入する。 IPSは勾配のない閉形式であり、参照密度からサンプリングして(正規化されていない)ターゲット-参照密度比を計算する能力のみを必要とする。平均場ODEは、特定のフィッシャー-ラオ勾配流の経路である2つの密度の幾何学的混合に沿ってサンプルを輸送する速度場に対するポアソン方程式を解くことで得られる。速度場に RKHS ansatz を用いることでポアソン方程式を扱いやすくし, 得られた平均場 ODE を有限標本上で離散化することができる。平均場ODEは、サンプル駆動最適輸送として知られるフレームワーク内でのモンジュ・アンペール方程式の連続線型化の極限として離散時間の観点からも導出することができる。我々は,このアプローチの確率的変種を導入し,IPSが様々なターゲット分布から高品質なサンプルを生成できることを実証的に示す。 We introduce a new mean-field ODE and corresponding interacting particle systems (IPS) for sampling from an unnormalized target density. The IPS are gradient-free, available in closed form, and only require the ability to sample from a reference density and compute the (unnormalized) target-to-reference density ratio. The mean-field ODE is obtained by solving a Poisson equation for a velocity field that transports samples along the geometric mixture of the two densities, which is the path of a particular Fisher-Rao gradient flow. We employ a RKHS ansatz for the velocity field, which makes the Poisson equation tractable and enables discretization of the resulting mean-field ODE over finite samples. The mean-field ODE can additionally be derived from a discrete-time perspective as the limit of successive linearizations of the Monge-Ampère equations within a framework known as sample-driven optimal transport. We introduce a stochastic variant of our approach and demonstrate empirically that our IPS can produce high-quality samples from varied target distributions, outperforming comparable gradient-free particle systems and competitive with gradient-based alternatives.	翻訳日:2024-02-07 19:14:31 公開日:2024-02-06
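As a hedged aside on the abstract above, the "geometric mixture of the two densities" is usually written as follows. The symbols are our own labels rather than notation taken from the abstract: $\pi_0$ is the reference density, $\pi_1$ the target density, and $t \in [0,1]$ the flow time.

```latex
\pi_t(x) \;\propto\; \pi_0(x)^{1-t}\,\pi_1(x)^{t}
        \;=\; \pi_0(x)\left(\frac{\pi_1(x)}{\pi_0(x)}\right)^{t},
\qquad t \in [0,1].
```

At $t=0$ this is the reference and at $t=1$ the target, so samples are transported in unit time. Only the ratio $\pi_1/\pi_0$ appears, and any unknown normalizing constant is absorbed by the proportionality sign, which is consistent with the abstract's claim that an unnormalized density ratio suffices.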
# 現実のドラッグ発見のための量子コンピューティングパイプライン:アルゴリズムから量子ハードウェアへ A Quantum Computing Pipeline for Real World Drug Discovery: From Algorithm to Quantum Hardware ( http://arxiv.org/abs/2401.03759v2 ) ライセンス: Link先を確認	Weitang Li, Zhi Yin, Xiaoran Li, Dongqiang Ma, Shuang Yi, Zhenxing Zhang, Chenji Zou, Kunliang Bu, Maochun Dai, Jie Yue, Yuzong Chen, Xiaojin Zhang, Shengyu Zhang	(参考訳) 量子コンピューティングは、古典的アプローチよりも優れた計算能力を持ち、医薬品を含む多くの科学領域に革命を起こす可能性を秘めている。しかし、量子コンピューティングの薬物発見への応用は主に概念実証研究に限られており、現実の薬物開発課題の複雑さを捉えるのに失敗することが多い。本研究では,創薬設計問題に対処するための高度な量子コンピューティングパイプラインを開発することにより,従来の研究から逸脱する。提案手法は, 量子計算の実用的応用を強調し, 実用化に向けて推進するものである。具体的には, 共有結合切断を伴うプロドラッグ活性化のためのギブス自由エネルギープロファイルの正確な決定と, 共有結合相互作用の正確なシミュレーションという, 薬物発見における2つの重要な課題に対処する汎用量子コンピューティングパイプラインを構築した。この研究は、薬物設計で遭遇する検証可能なシナリオ、特に2つのケーススタディに存在する共有結合問題に対する量子コンピューティングのベンチマークの先駆的な取り組みとなり、理論モデルから具体的応用へと移行する。本結果は,現実の薬物設計ワークフローに統合するための量子コンピューティングパイプラインの可能性を示す。 Quantum computing, with its superior computational capabilities compared to classical approaches, holds the potential to revolutionize numerous scientific domains, including pharmaceuticals. However, the application of quantum computing for drug discovery has primarily been limited to proof-of-concept studies, which often fail to capture the intricacies of real-world drug development challenges. In this study, we diverge from conventional investigations by developing an advanced quantum computing pipeline tailored to address genuine drug design problems. Our approach underscores the pragmatic application of quantum computation and propels it towards practical industrial adoption. We specifically construct our versatile quantum computing pipeline to address two critical tasks in drug discovery: the precise determination of Gibbs free energy profiles for prodrug activation involving covalent bond cleavage, and the accurate simulation of covalent bond interactions. 
This work serves as a pioneering effort in benchmarking quantum computing against veritable scenarios encountered in drug design, especially the covalent bonding issue present in both of the case studies, thereby transitioning from theoretical models to tangible applications. Our results demonstrate the potential of a quantum computing pipeline for integration into real world drug design workflows.	翻訳日:2024-02-07 19:14:11 公開日:2024-02-06
# サンプル効率の良いオフライン強化学習について:データ多様性、後方サンプリングなど On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond ( http://arxiv.org/abs/2401.03301v2 ) ライセンス: Link先を確認	Thanh Nguyen-Tang and Raman Arora	(参考訳) オフライン強化学習(offline reinforcement learning, RL)として知られる, 逐次的意思決定のための歴史的データセットからのサンプル効率学習を促進するものを理解することを目的とする。さらに,(値)関数近似を活用しながらサンプル効率を楽しむアルゴリズムにも興味を持っている。本稿では,これらの基本的な質問について述べる。 (i)オフラインRLにおける以前のカバレッジ尺度の概念を包含するデータ多様性の概念の提案 (ii) この概念を用いて、バージョン空間(VS)、正規化最適化(RO)、後続サンプリング(PS)に基づくオフラインRLアルゴリズムの3つの異なるクラスを統一する。標準仮定の下では,VS-based, RO-based, PS-basedアルゴリズムにより, 有限および線形モデルクラスに対する最先端の準最適境界を回復し, 同等のサンプル効率を得る。この結果は、以前の研究がVSベースのアルゴリズムと比較してROベースのアルゴリズムの好ましくないサンプルの複雑さを示唆しているのに対して、後続サンプリングは、その探索的な性質からオフラインRLではまれであることを考えると、驚くべきものである。特に,提案するオフラインRLのためのモデルフリーPSベースアルゴリズムは新規であり,その準最適境界は本質的に頻度論的(すなわち最悪の場合)である。 We seek to understand what facilitates sample-efficient learning from historical datasets for sequential decision-making, a problem that is popularly known as offline reinforcement learning (RL). Further, we are interested in algorithms that enjoy sample efficiency while leveraging (value) function approximation. In this paper, we address these fundamental questions by (i) proposing a notion of data diversity that subsumes the previous notions of coverage measures in offline RL and (ii) using this notion to unify three distinct classes of offline RL algorithms based on version spaces (VS), regularized optimization (RO), and posterior sampling (PS). We establish that VS-based, RO-based, and PS-based algorithms, under standard assumptions, achieve comparable sample efficiency, which recovers the state-of-the-art sub-optimality bounds for finite and linear model classes with the standard assumptions. This result is surprising, given that the prior work suggested an unfavorable sample complexity of the RO-based algorithm compared to the VS-based algorithm, whereas posterior sampling is rarely considered in offline RL due to its explorative nature. 
Notably, our proposed model-free PS-based algorithm for offline RL is novel, with sub-optimality bounds that are frequentist (i.e., worst-case) in nature.	翻訳日:2024-02-07 19:13:52 公開日:2024-02-06
# voronav:voronoiベースの大きな言語モデルによるゼロショットオブジェクトナビゲーション VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model ( http://arxiv.org/abs/2401.02695v2 ) ライセンス: Link先を確認	Pengying Wu, Yao Mu, Bingxian Wu, Yi Hou, Ji Ma, Shanghang Zhang, Chang Liu	(参考訳) 家庭用ロボティクスの領域では、ゼロショットオブジェクトナビゲーション(ZSON)タスクは、エージェントが不慣れな環境を巧みに横切り、前もって明示的な訓練をせずに新しいカテゴリーからオブジェクトを見つけることを可能にする。本稿では,新しい意味探索フレームワークvoronavについて紹介する。voronoiグラフを縮小し,探索経路と計画ノードをリアルタイムで構築した意味マップから抽出する。トポロジカルおよびセマンティック情報を活用することで、VoroNavは大きな言語モデル(LLM)で容易に解釈できるパスとイメージのテキストベースの記述を設計する。特に,本手法では,環境コンテキストを表現するため,経路と遠近性記述の相乗効果を示し,ナビゲーションの経路点の確認にコモンセンス推論を適用した。 HM3DとHSSDの大規模な評価では、VoroNavは成功率と探索効率の両方で既存のベンチマークを上回っている(絶対改善:+2.8%、HM3Dは+3.7%、+2.6%、+3.8%、HSSDは+3.8%)。さらに,障害物回避能力と知覚効率を評価する指標を導入し,ZSON計画における我々の手法による改善をさらに裏付けた。プロジェクトページ: https://voro-nav.github.io In the realm of household robotics, the Zero-Shot Object Navigation (ZSON) task empowers agents to adeptly traverse unfamiliar environments and locate objects from novel categories without prior explicit training. This paper introduces VoroNav, a novel semantic exploration framework that proposes the Reduced Voronoi Graph to extract exploratory paths and planning nodes from a semantic map constructed in real time. By harnessing topological and semantic information, VoroNav designs text-based descriptions of paths and images that are readily interpretable by a large language model (LLM). In particular, our approach presents a synergy of path and farsight descriptions to represent the environmental context, enabling LLM to apply commonsense reasoning to ascertain waypoints for navigation. Extensive evaluation on HM3D and HSSD validates VoroNav surpasses existing benchmarks in both success rate and exploration efficiency (absolute improvement: +2.8% Success and +3.7% SPL on HM3D, +2.6% Success and +3.8% SPL on HSSD). 
Additionally introduced metrics that evaluate obstacle avoidance proficiency and perceptual efficiency further corroborate the enhancements achieved by our method in ZSON planning. Project page: https://voro-nav.github.io	翻訳日:2024-02-07 19:13:26 公開日:2024-02-06
# スパース報酬を用いた軌道指向政策最適化 Trajectory-Oriented Policy Optimization with Sparse Rewards ( http://arxiv.org/abs/2401.02225v2 ) ライセンス: Link先を確認	Guojian Wang, Faguo Wu, Xiao Zhang	(参考訳) 深層強化学習(DRL)を習得することは、難解な報酬を含むタスクにおいて困難である。これらの制限された報酬は、エージェントが有意義なフィードバックを得る前に、そのタスクが部分的に、または完全に完了しているかどうかを示すだけである。その結果、既存のDRL探索アルゴリズムの大部分は、合理的な時間枠内で実践的なポリシーを取得するのに苦労している。この課題に対処するため,より高速で効率的なオンラインRLを実現するために,オフラインのデモトラジェクトリを利用する手法を提案する。私たちの重要な洞察は、オフラインデモの軌跡を単なる模倣ではなくガイダンスとして扱うことで、ステートアクション訪問の分布がオフラインデモのそれとわずかに一致するポリシーを学習できるようにすることです。具体的には,最大平均偏差(mmd)とキャストポリシー最適化を距離制約最適化問題として用いる新しい軌道距離について紹介する。そして、この最適化問題をポリシーグレードのアルゴリズムに合理化し、オフラインのデモから得られた洞察によって形成された報酬を統合することを示します。提案手法は,広範囲にわたる離散的および連続的な制御タスクに対する評価を行う。実験の結果,提案アルゴリズムは,多様な探索と最適方針の獲得に関して,ベースライン法よりも優れていることがわかった。 Mastering deep reinforcement learning (DRL) proves challenging in tasks featuring scant rewards. These limited rewards merely signify whether the task is partially or entirely accomplished, necessitating various exploration actions before the agent garners meaningful feedback. Consequently, the majority of existing DRL exploration algorithms struggle to acquire practical policies within a reasonable timeframe. To address this challenge, we introduce an approach leveraging offline demonstration trajectories for swifter and more efficient online RL in environments with sparse rewards. Our pivotal insight involves treating offline demonstration trajectories as guidance, rather than mere imitation, allowing our method to learn a policy whose distribution of state-action visitation marginally matches that of offline demonstrations. We specifically introduce a novel trajectory distance relying on maximum mean discrepancy (MMD) and cast policy optimization as a distance-constrained optimization problem. We then illustrate that this optimization problem can be streamlined into a policy-gradient algorithm, integrating rewards shaped by insights from offline demonstrations. 
The proposed algorithm undergoes evaluation across extensive discrete and continuous control tasks with sparse and misleading rewards. The experimental findings demonstrate the significant superiority of our proposed algorithm over baseline methods concerning diverse exploration and the acquisition of an optimal policy.	翻訳日:2024-02-07 19:12:58 公開日:2024-02-06
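The abstract above builds its trajectory distance on maximum mean discrepancy (MMD). As a minimal sketch of the underlying quantity only, the following estimates squared MMD between two sample sets with a Gaussian kernel. Treating each row as a state-action feature vector, the kernel choice and the bandwidth are assumptions made here for illustration. The paper's actual trajectory distance and constrained optimization are not reproduced.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Pairwise Gaussian kernel between rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(200, 2))              # hypothetical demonstration state-action features
    close_policy = rng.normal(size=(200, 2))      # same distribution as the demonstrations
    far_policy = rng.normal(size=(200, 2)) + 3.0  # shifted distribution
    print(mmd_squared(demo, close_policy), mmd_squared(demo, far_policy))
```

A policy whose visitation samples resemble the demonstrations yields a small value, so a quantity of this kind can serve as the distance in a distance-constrained objective.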
# 非有界損失に対するPAC-Bayes-Chernoff境界 PAC-Bayes-Chernoff bounds for unbounded losses ( http://arxiv.org/abs/2401.01148v3 ) ライセンス: Link先を確認	Ioar Casado, Luis A. Ortega, Andrés R. Masegosa and Aritz Pérez	(参考訳) 我々は,非有界損失に対する新しいPAC-Bayesオラクル境界を導入する。この結果は、Cramér-Chernoff 境界の PAC-Bayesian 版として理解することができる。証明手法は、損失のCramér変換を含む特定のランダム変数のテールを制御することに依存する。我々は主定理のいくつかの応用を強調する。まず,多くのPAC-Bayes境界における自由パラメータの正確な最適化が自然に可能であることを示す。第2に,これまでの結果を回復し,一般化する。最後に、我々のアプローチはより情報的かつ潜在的に厳密な境界をもたらすリッチな仮定で作業できることを示す。この方向において、パラメータノルムとlog-Sobolevの不等式に基づいて境界を求める新しい "model-dependent bounded CGF" 仮定の下で一般境界を与える。これら全ての境界は、新しい後進を得るために最小化することができる。 We introduce a new PAC-Bayes oracle bound for unbounded losses. This result can be understood as a PAC-Bayesian version of the Cramér-Chernoff bound. The proof technique relies on controlling the tails of certain random variables involving the Cramér transform of the loss. We highlight several applications of the main theorem. First, we show that our result naturally allows exact optimization of the free parameter on many PAC-Bayes bounds. Second, we recover and generalize previous results. Finally, we show that our approach allows working with richer assumptions that result in more informative and potentially tighter bounds. In this direction, we provide a general bound under a new "model-dependent bounded CGF" assumption from which we obtain bounds based on parameter norms and log-Sobolev inequalities. All these bounds can be minimized to obtain novel posteriors.	翻訳日:2024-02-07 19:12:39 公開日:2024-02-06
# 拡散モデル、画像の超解像とすべて:調査 Diffusion Models, Image Super-Resolution And Everything: A Survey ( http://arxiv.org/abs/2401.00736v2 ) ライセンス: Link先を確認	Brian B. Moser, Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Sebastian Palacio and Andreas Dengel	(参考訳) 拡散モデル(dms)は、画像スーパーレゾリューション(sr)フィールドを混乱させ、さらに画質と人間の知覚嗜好のギャップを閉じた。訓練は簡単で、従来の生成手法による現実性を超えた非常に高品質なサンプルを作成できる。有望な結果にもかかわらず、計算能力の高い要求、互換性、説明可能性の欠如、色の変化など、さらなる研究を必要とする新たな課題も伴う。残念ながら、この分野への参入は出版物が多いため圧倒的である。これに対処するため、我々は、イメージsrに適用される理論的基礎の統一的な再集計と、この分野の幅広い既存のレビューとは別として、このドメインにおけるユニークな特徴と方法論の基礎となる詳細な分析を提供する。本調査は,DM原則の密集的な理解を具体化し,代替入力領域,条件付け手法,指導機構,汚職空間,ゼロショット学習アプローチなど,現在の研究手法を探求する。 DMのレンズを通して画像SRの進化と現在の傾向を詳細に調べることにより、この急速に進歩する領域におけるさらなるイノベーションを刺激し、既存の課題と今後の方向性を図示する。 Diffusion Models (DMs) have disrupted the image Super-Resolution (SR) field and further closed the gap between image quality and human perceptual preferences. They are easy to train and can produce very high-quality samples that exceed the realism of those produced by previous generative methods. Despite their promising results, they also come with new challenges that need further research: high computational demands, comparability, lack of explainability, color shifts, and more. Unfortunately, entry into this field is overwhelming because of the abundance of publications. To address this, we provide a unified recount of the theoretical foundations underlying DMs applied to image SR and offer a detailed analysis that underscores the unique characteristics and methodologies within this domain, distinct from broader existing reviews in the field. This survey articulates a cohesive understanding of DM principles and explores current research avenues, including alternative input domains, conditioning techniques, guidance mechanisms, corruption spaces, and zero-shot learning approaches. 
By offering a detailed examination of the evolution and current trends in image SR through the lens of DMs, this survey sheds light on the existing challenges and charts potential future directions, aiming to inspire further innovation in this rapidly advancing area.	翻訳日:2024-02-07 19:12:24 公開日:2024-02-06
# MR-GSM8K:大規模言語モデル評価におけるメタ推論革命 MR-GSM8K: A Meta-Reasoning Revolution in Large Language Model Evaluation ( http://arxiv.org/abs/2312.17080v3 ) ライセンス: Link先を確認	Zhongshen Zeng, Pengguang Chen, Shu Liu, Haiyun Jiang, Jiaya Jia	(参考訳) 本稿では,メタ推論への取り組みに挑戦する,大規模言語モデルのための新しい評価パラダイムを提案する。このアプローチは、従来のエージェントの認知能力を評価するために使用される既存の数学問題解決ベンチマークの重大な欠点に対処する。我々のパラダイムは、しばしば推論プロセスを見落としている結果指向の評価から、モデル間の認知能力を効果的に区別するより包括的な評価へと焦点を移します。例えば、我々のベンチマークでは、GPT-4はGPT3-5の5倍の性能を示している。この新しいパラダイムの意義は、GSM8Kのような現在のベンチマークが、その飽和と様々な推論能力の効果的な分化の欠如のため、LLMの潜在的な認知的欠陥を明らかにする能力にある。当社の包括的な分析には、オープンソースコミュニティとクローズドソースコミュニティの両方の最先端の数学モデルが含まれており、トレーニングと評価アプローチの根本的な欠陥を明らかにしています。本稿では,LLMの評価におけるパラダイムシフトを提唱するだけでなく,AI(Artificial General Intelligence, AGI)の軌道に関する議論にも貢献する。メタ推論評価手法の採用を促進することで,LLMの真の認知能力をより正確に評価することを目指している。 In this work, we introduce a novel evaluation paradigm for Large Language Models, one that challenges them to engage in meta-reasoning. This approach addresses critical shortcomings in existing math problem-solving benchmarks, traditionally used to evaluate the cognitive capabilities of agents. Our paradigm shifts the focus from result-oriented assessments, which often overlook the reasoning process, to a more holistic evaluation that effectively differentiates the cognitive capabilities among models. For example, in our benchmark, GPT-4 demonstrates a performance five times better than GPT3-5. The significance of this new paradigm lies in its ability to reveal potential cognitive deficiencies in LLMs that current benchmarks, such as GSM8K, fail to uncover due to their saturation and lack of effective differentiation among varying reasoning abilities. Our comprehensive analysis includes several state-of-the-art math models from both open-source and closed-source communities, uncovering fundamental deficiencies in their training and evaluation approaches. 
This paper not only advocates for a paradigm shift in the assessment of LLMs but also contributes to the ongoing discourse on the trajectory towards Artificial General Intelligence (AGI). By promoting the adoption of meta-reasoning evaluation methods similar to ours, we aim to facilitate a more accurate assessment of the true cognitive abilities of LLMs.	翻訳日:2024-02-07 19:12:01 公開日:2024-02-06
# SMUTF:生成タグとハイブリッド機能を用いたスキーママッチング SMUTF: Schema Matching Using Generative Tags and Hybrid Features ( http://arxiv.org/abs/2402.01685v2 ) ライセンス: Link先を確認	Yu Zhang, Mei Di, Haozheng Luo, Chenwei Xu, Richard Tzong-Han Tsai	(参考訳) SMUTFは,教師付き学習がオープンドメインタスクのパフォーマンスに影響を与えないことを想定し,効果的なクロスドメインマッチングを実現する,大規模表型データスキーママッチング(SM)のためのユニークなアプローチである。このシステムは、ルールベースの機能工学、事前学習された言語モデル、ジェネレーティブな大規模言語モデルを組み合わせている。人道交換言語に触発された革新的適応では、各データ列に「生成タグ」を配置し、SMの有効性を高める。 SMUTFは幅広い汎用性を示し、既存の事前訓練された埋め込み、分類方法、生成モデルとシームレスに動作する。 SM用の広範な公開データセットがないことを認識して、公開人道データからHDXSMデータセットを作成し、オープンソース化しました。これは現在利用可能な最も徹底的なSMデータセットだと考えています。様々な公開データセットと新しいHDXSMデータセットの評価において、SMUTFは、精度と効率の点で既存の最先端モデルを上回り、F1スコアを11.84%改善し、ROCのAUCを5.08%改善した。 We introduce SMUTF, a unique approach for large-scale tabular data schema matching (SM), which assumes that supervised learning does not affect performance in open-domain tasks, thereby enabling effective cross-domain matching. This system uniquely combines rule-based feature engineering, pre-trained language models, and generative large language models. In an innovative adaptation inspired by the Humanitarian Exchange Language, we deploy 'generative tags' for each data column, enhancing the effectiveness of SM. SMUTF exhibits extensive versatility, working seamlessly with any pre-existing pre-trained embeddings, classification methods, and generative models. Recognizing the lack of extensive, publicly available datasets for SM, we have created and open-sourced the HDXSM dataset from the public humanitarian data. We believe this to be the most exhaustive SM dataset currently available. In evaluations across various public datasets and the novel HDXSM dataset, SMUTF demonstrated exceptional performance, surpassing existing state-of-the-art models in terms of accuracy and efficiency, and improving the F1 score by 11.84% and the AUC of ROC by 5.08%.	翻訳日:2024-02-07 19:03:58 公開日:2024-02-06
# LLsM:大規模言語モデルを用いた言語ステレオグラフィ LLsM: Generative Linguistic Steganography with Large Language Model ( http://arxiv.org/abs/2401.15656v2 ) ライセンス: Link先を確認	Yihao Wang and Ruiqi Song and Ru Zhang and Jianyi Liu and Lingxiao Li	(参考訳) 言語ステガノグラフィー(LS)タスクは、秘密情報に基づいてステガノグラフィーテキスト(ステゴ)を生成することを目的としている。認証を受けた受取人だけが、テキスト内の秘密の存在を認識し、それらを抽出することで、プライバシーを保護できる。しかし,既存のスキームが生成するステゴの制御性は乏しく,スタイルなどの特定の談話の特徴を包含することは困難である。その結果、ステゴは容易に検出でき、カバート通信を妥協する。本稿では,Large Language Model (LLM) を用いた最初のLSである LLsM を提案する。我々は,高度な談話特性を包含する大規模構築データセットを用いてllama2の微調整を行った。そして、この談話を案内情報として使用し、秘密とともにプロンプトの形式で微調整LDMに入力する。このベースで構築された候補プールはレンジエンコードされ、シークレットを使用して間隔を決定する。この区間の始まりと終わりの同じ接頭辞は、この瞬間に埋め込まれた秘密である。実験の結果, LLsMはテキスト品質, 統計解析, 談話マッチング, アンチステガナシスに関して, LS-taskおよび関連タスクベースラインよりも優れていた。特に、llsmのmave matricは、いくつかのベースラインを70%-80%上回っており、その反ステグアナリティクス性能は30%-40%高い。また、LLsMにより生成される長長のステゴの例を示し、長長のLSタスクにおいてその潜在的な優位性を示す。 Linguistic Steganography (LS) tasks aim to generate steganographic text (stego) based on secret information. Only authorized recipients can perceive the existence of secrets in the texts and extract them, thereby preserving privacy. However, the controllability of the stego generated by existing schemes is poor, and the stego is difficult to contain specific discourse characteristics such as style. As a result, the stego is easily detectable, compromising covert communication. To address these problems, this paper proposes LLsM, the first LS with the Large Language Model (LLM). We fine-tuned the LLaMA2 with a large-scale constructed dataset encompassing rich discourse characteristics, which enables the fine-tuned LLM to generate texts with specific discourse in a controllable manner. Then the discourse is used as guiding information and inputted into the fine-tuned LLM in the form of the Prompt together with secret. On this basis, the constructed candidate pool will be range encoded and use secret to determine the interval. 
The prefix shared by the beginning and the end of this interval is the secret embedded at that step. Experiments show that LLsM outperforms prevalent LS-task and related-task baselines in text quality, statistical analysis, discourse matching, and anti-steganalysis. In particular, LLsM's MAUVE metric surpasses some baselines by 70%-80%, and its anti-steganalysis performance is 30%-40% higher. Notably, we also present examples of longer stegos generated by LLsM, showing its potential superiority in long LS tasks.	翻訳日:2024-02-07 19:03:04 公開日:2024-02-06
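The LLsM abstract above says that the bits embedded at a step are the prefix common to the two endpoints of the selected range-coding interval. A minimal sketch of just that prefix step follows. The function name and the bit-string representation of the endpoints are hypothetical, and the construction of the candidate pool and of the interval itself is not shown.

```python
def shared_prefix(low_bits, high_bits):
    """Return the leading bits on which both interval endpoints agree."""
    prefix = []
    for low_bit, high_bit in zip(low_bits, high_bits):
        if low_bit != high_bit:
            break                      # endpoints diverge; later bits are not yet fixed
        prefix.append(low_bit)
    return "".join(prefix)


if __name__ == "__main__":
    # Hypothetical 6-bit endpoints of the interval chosen at one generation step.
    print(shared_prefix("101100", "101111"))
```

Every value inside the interval begins with the shared prefix, which is why a receiver who reconstructs the same interval can read those bits back unambiguously.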
# 先進的なアーティストの意見:AI生成芸術における透明性、オーナーシップ、公正性に関する調査研究 Foregrounding Artist Opinions: A Survey Study on Transparency, Ownership, and Fairness in AI Generative Art ( http://arxiv.org/abs/2401.15497v3 ) ライセンス: Link先を確認	Juniper Lovato, Julia Zimmerman, Isabelle Smith, Peter Dodds, Jennifer Karson	(参考訳) 生成人工知能(AI)ツールは、アートのようなアウトプットを作成し、創造的なプロセスを支援するために使用される。これらのツールはアーティストに利益をもたらすが、芸術労働力を傷つけ、芸術的および知的所有権を侵害する可能性がある。生成AI作成者は、アーティストからの明確な同意なく、アーチストのデジタル作品をスクラップして、生成AIモデルをトレーニングし、大規模にアートライクなモデル出力を生成する。これらのアウトプットは、現在、市場での人間アーティストとの競争に使われ、また、生成過程においてアートを作成するアーティストによって使用されている。我々は459人のアーティストを調査し、生成AIアートの潜在的有用性と害に関するアーティストの意見の緊張関係を調査した。本研究では、生成AIアートモデルの有用性と脅威、AIアートトレーニングモデルにおける芸術作品の公開における公正な実践、AIアートデリバティブの所有と権利、公正な補償に関するアーティストの意見を調査する。概して、モデルクリエーターは、AIモデルをトレーニングするために使用するアートやイメージの詳細を開示する必要がある、と私たちは考えています。また, アーティストの意見は, 職業的地位や実践, 人口動態, 美術品購入の有無, 生成aiの習熟度, 利用によって異なることがわかった。この研究の結果が、アートコミュニティとジェネレーティブAI研究者と開発者の間でより有意義なコラボレーションと整合性をもたらすことを期待しています。 Generative Artificial Intelligence (AI) tools are used to create art-like outputs and aid in the creative process. While these tools have potential benefits for artists, they also have the potential to harm the art workforce and infringe upon artistic and intellectual property rights. Without explicit consent from artists, Generative AI creators scrape artists' digital work to train Generative AI models and produce art-like model outputs at scale. These outputs are now being used to compete with human artists in the marketplace as well as being used by some artists in their generative processes to create art. We surveyed 459 artists to investigate the tension between artists' opinions on Generative AI art's potential utility and harm. This study surveys artists' opinions on the utility and threat of Generative AI art models, fair practices in the disclosure of artistic works in AI art training models, ownership and rights of AI art derivatives, and fair compensation. 
We find that artists, by and large, think that model creators should be required to disclose in detail what art and images they use to train their AI models. We also find that artists' opinions vary by professional status and practice, demographics, whether they have purchased art, and familiarity with and use of Generative AI. We hope the results of this work will further more meaningful collaboration and alignment between the art community and Generative AI researchers and developers.	翻訳日:2024-02-07 19:02:39 公開日:2024-02-06
# 太陽発電予測のための位置非依存電源領域適応学習 Location Agnostic Source-Free Domain Adaptive Learning to Predict Solar Power Generation ( http://arxiv.org/abs/2401.14422v2 ) ライセンス: Link先を確認	Md Shazid Islam, A S M Jahid Hasan, Md Saydur Rahman, Jubair Yusuf, Md Saiful Islam Sajol, Farhana Akter Tumpa	(参考訳) 太陽発電の予測は、空間的および時間的変動を示す気候特性に依存しているため、難しい課題である。予測モデルの性能はデータ分布の変化によって異なる場所によって異なり、結果としてある地域でうまく機能するが他の地域では機能しないモデルとなる。また、地球温暖化の影響で、年間を通じて天候の変化が顕著に加速している。この現象は、時間経過とともに同じ地理的領域内であっても、既存のモデルの有効性が低下する可能性をもたらす。本稿では,前述の課題を解決するための気象特性を用いた太陽発電を推定するために,ドメイン適応型深層学習に基づくフレームワークを提案する。フィードフォワード深部畳み込みネットワークモデルは、既知の位置データセットを教師付きでトレーニングし、後に未知の場所の太陽エネルギーを予測するために使用される。この適応型データ駆動アプローチは、計算速度、ストレージ効率、そして最先端の非適応的手法が失敗するシナリオで結果を改善する能力において、顕著な利点を示す。我々の手法では、カリフォルニア(CA)、フロリダ(FL)、ニューヨーク(NY)の順応的でない手法と比較して、太陽エネルギー予測精度が10.47 \%$、7.44 \%$、5.11\%$改善されている。 The prediction of solar power generation is a challenging task due to its dependence on climatic characteristics that exhibit spatial and temporal variability. The performance of a prediction model may vary across different places due to changes in data distribution, resulting in a model that works well in one region but not in others. Furthermore, as a consequence of global warming, there is a notable acceleration in the alteration of weather patterns on an annual basis. This phenomenon introduces the potential for diminished efficacy of existing models, even within the same geographical region, as time progresses. In this paper, a domain adaptive deep learning-based framework is proposed to estimate solar power generation using weather features that can solve the aforementioned challenges. A feed-forward deep convolutional network model is trained for a known location dataset in a supervised manner and utilized to predict the solar power of an unknown location later. 
This adaptive data-driven approach exhibits notable advantages in terms of computing speed, storage efficiency, and its ability to improve outcomes in scenarios where state-of-the-art non-adaptive methods fail. Our method has shown improvements of 10.47%, 7.44%, and 5.11% in solar power prediction accuracy compared to the best-performing non-adaptive method for California (CA), Florida (FL), and New York (NY), respectively.	翻訳日:2024-02-07 19:01:58 公開日:2024-02-06
# MTRGL:マルチモーダル時間関係グラフ学習による時間相関の影響 MTRGL:Effective Temporal Correlation Discerning through Multi-modal Temporal Relational Graph Learning ( http://arxiv.org/abs/2401.14199v2 ) ライセンス: Link先を確認	Junwei Su, Shan Wu, Jinhui Li	(参考訳) 本研究では,ペアトレーディングに着目し,ディープラーニングと金融市場アプリケーションのシナジーについて検討する。この市場中立戦略は量的金融に不可欠であり、高度なディープラーニング技術に適している。ペアトレーディングにおける重要な課題は、エンティティ間の時間的相関を識別することであり、多様なデータモダリティの統合を必要とする。そこで我々は,MTRGL(Multi-modal Temporal Relation Graph Learning)という新しいフレームワークを導入する。 MTRGLは時系列データと離散特徴を時間グラフに結合し、メモリベースの時間グラフニューラルネットワークを使用する。このアプローチは、経験的成功を示す時間グラフリンク予測タスクとして、時間相関識別を再構成する。実世界のデータセットに関する我々の実験は、MTRGLの優れた性能を確認し、自動ペアトレーディング戦略の洗練におけるその約束を強調した。 In this study, we explore the synergy of deep learning and financial market applications, focusing on pair trading. This market-neutral strategy is integral to quantitative finance and is apt for advanced deep-learning techniques. A pivotal challenge in pair trading is discerning temporal correlations among entities, necessitating the integration of diverse data modalities. Addressing this, we introduce a novel framework, Multi-modal Temporal Relation Graph Learning (MTRGL). MTRGL combines time series data and discrete features into a temporal graph and employs a memory-based temporal graph neural network. This approach reframes temporal correlation identification as a temporal graph link prediction task, which has shown empirical success. Our experiments on real-world datasets confirm the superior performance of MTRGL, emphasizing its promise in refining automated pair trading strategies.	翻訳日:2024-02-07 19:01:35 公開日:2024-02-06
# 真空光学非線形観測用sagnac干渉計の性能 Performance of a Sagnac interferometer to observe vacuum optical nonlinearity ( http://arxiv.org/abs/2401.13720v2 ) ライセンス: Link先を確認	Aurélie Max Mailliet, Adrien E. Kraych, François Couchot, Xavier Sarazin, Elsa Baynard, Julien Demailly, Moana Pittman, Arache Djannati-Ataï, Sophie Kazamias, Scott Robertson, Marcel Urban	(参考訳) 量子電磁力学では、真空は非線形光学媒体となり、その光学指数は強い外部電磁場の存在下で修正されるべきである。 DeLLight project (Deflection of Light by Light) は LASERIX が配信する集中フェムト秒レーザーパルスを用いてこの効果を観測することを目的としている。サニャック干渉計を用いて、高強度パルス(ポンプ)によって誘導される真空指数勾配を越える低強度集束パルス(プローブ)の偏向を測定する。フェムト秒レーザーパルスを用いたサニャック干渉計がDeLLightプロジェクトのために開発された。以前のプロトタイプと比較して、干渉計は相互作用領域におけるプローブビームの焦点を含むようになった。本稿では、干渉計の感度を制限する重要な実験パラメータ、すなわち、消滅因子、空間分解能、およびプローブパルスの焦点のウエストを測定し、特徴付ける。今後の改善について論じる。 In Quantum Electrodynamics, vacuum becomes a nonlinear optical medium: its optical index should be modified in the presence of intense external electromagnetic fields. The DeLLight project (Deflection of Light by Light) aims to observe this effect using intense focused femtosecond laser pulses delivered by LASERIX. The principle is to measure with a Sagnac interferometer the deflection of a low-intensity focused pulse (probe) crossing the vacuum index gradient induced by a high-intensity pulse (pump). A Sagnac interferometer working with femtosecond laser pulses has been developed for the DeLLight project. Compared to previous prototypes, the interferometer now includes the focusing of the probe beam in the interaction area. In this article, we measure and characterize the critical experimental parameters limiting the sensitivity of the interferometer, namely the extinction factor, the spatial resolution, and the waist at focus of the probe pulse. We discuss future improvements.	翻訳日:2024-02-07 19:01:20 公開日:2024-02-06
# LPNL:大規模言語モデルを用いたスケーラブルリンク予測 LPNL: Scalable Link Prediction with Large Language Models ( http://arxiv.org/abs/2401.13227v2 ) ライセンス: Link先を確認	Baolong Bi, Shenghua Liu, Yiwei Wang, Lingrui Mei and Xueqi Cheng	(参考訳) グラフ学習への大規模言語モデル(LLM)の適用の探求は、新たな取り組みだ。しかし、巨大なグラフに固有の膨大な情報はこのプロセスに重大な課題をもたらす。本研究はリンク予測タスクに着目し,大規模不均一グラフ上でスケーラブルなリンク予測用に設計された大規模言語モデルに基づくフレームワークであるLPNL(自然言語によるリンク予測)を紹介する。グラフの詳細を自然言語で表現するリンク予測のための新しいプロンプトを設計した。本稿では,グラフから重要な情報を抽出する2段階のサンプリングパイプラインと,事前定義された範囲内で入力トークンを制御するための分割統治戦略を提案する。リンク予測用に設計された自己教師型学習に基づいてT5モデルを微調整する。大規模グラフ上でのリンク予測タスクにおいて,LPNLは複数の高度なベースラインよりも優れていることを示す。 Exploring the application of large language models (LLMs) to graph learning is an emerging endeavor. However, the vast amount of information inherent in large graphs poses significant challenges to this process. This work focuses on the link prediction task and introduces LPNL (Link Prediction via Natural Language), a framework based on large language models designed for scalable link prediction on large-scale heterogeneous graphs. We design novel prompts for link prediction that articulate graph details in natural language. We propose a two-stage sampling pipeline to extract crucial information from the graphs, and a divide-and-conquer strategy to control the input tokens within predefined limits, addressing the challenge of overwhelming information. We fine-tune a T5 model based on our self-supervised learning designed for link prediction. Extensive experimental results demonstrate that LPNL outperforms multiple advanced baselines in link prediction tasks on large-scale graphs.	翻訳日:2024-02-07 19:01:07 公開日:2024-02-06
# テキストと画像の拡散をマスターする:マルチモーダルLLMによる再キャプション, 計画, 生成 Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs ( http://arxiv.org/abs/2401.11708v2 ) ライセンス: Link先を確認	Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui	(参考訳) 拡散モデルはテキスト・画像の生成・編集において例外的な性能を示した。しかし、複数の属性と関係を持つ複数のオブジェクトを含む複雑なテキストプロンプトを扱う場合、既存のメソッドは、しばしば課題に直面する。本稿では,マルチモーダルLLMの強力なチェーン・オブ・ソート推論能力を活用し,テキスト・ツー・イメージ拡散モデルの構成性を向上する,新たなトレーニングフリーなテキスト・ツー・イメージ生成/編集フレームワーク Recaption, Plan and Generate (RPG) を提案する。本手法では,MLLMをグローバルプランナとして使用し,複雑な画像の生成過程をサブリージョン内の複数の単純な生成タスクに分解する。地域的構成生成を可能にするために,補完的な地域拡散を提案する。さらに,提案したRPGのテキスト誘導画像生成と編集をクローズドループ方式で統合し,一般化能力を向上する。大規模な実験により、RPGはDALL-E 3やSDXLといった最先端のテキストから画像への拡散モデルを、特にマルチカテゴリのオブジェクト構成やテキストと画像のセマンティックアライメントにおいて上回ることが示された。特に、RPGフレームワークは、さまざまなMLLMアーキテクチャ(MiniGPT-4など)と拡散バックボーン(ControlNetなど)との広範な互換性を示す。私たちのコードは、https://github.com/YangLing0818/RPG-DiffusionMasterで利用可能です。 Diffusion models have exhibited exceptional performance in text-to-image generation and editing. However, existing methods often face challenges when handling complex text prompts that involve multiple objects with multiple attributes and relationships. In this paper, we propose a brand new training-free text-to-image generation/editing framework, namely Recaption, Plan and Generate (RPG), harnessing the powerful chain-of-thought reasoning ability of multimodal LLMs to enhance the compositionality of text-to-image diffusion models. Our approach employs the MLLM as a global planner to decompose the process of generating complex images into multiple simpler generation tasks within subregions. We propose complementary regional diffusion to enable region-wise compositional generation. Furthermore, we integrate text-guided image generation and editing within the proposed RPG in a closed-loop fashion, thereby enhancing generalization ability. 
Extensive experiments demonstrate our RPG outperforms state-of-the-art text-to-image diffusion models, including DALL-E 3 and SDXL, particularly in multi-category object composition and text-image semantic alignment. Notably, our RPG framework exhibits wide compatibility with various MLLM architectures (e.g., MiniGPT-4) and diffusion backbones (e.g., ControlNet). Our code is available at: https://github.com/YangLing0818/RPG-DiffusionMaster	翻訳日:2024-02-07 19:00:50 公開日:2024-02-06
# 反事実彫刻は絡み合った状態の忠実度を指数関数的に改善する Counter-factual carving exponentially improves entangled-state fidelity ( http://arxiv.org/abs/2401.11407v2 ) ライセンス: Link先を確認	Joshua Ramette, Josiah Sinclair, Vladan Vuletić	(参考訳) 本研究では,プローブの"no-jump"進化を用いて,忠実度の高い絡み合った多体状態を生成する新しい手法である"counter-factual"型彫刻を提案する。プローブは、量子ビットのターゲットアンサンブルに結合され、ターゲットの集団スピンに依存する速度で指数関数的に減衰するように設計されており、プローブの崩壊が観測されないことを条件とするポストセレクションによって、より速く減衰する特定のスピン成分が正確に除去される。プローブと$N$量子ビットのターゲットが協同性$C$の共振器モードを介して相互作用すると、反事実彫刻は不忠実度$e^{-C/N}$の絡み合った状態を生成し、これは以前の彫刻方式に対する指数関数的な改善である。反事実彫刻は量子計測や量子コンピューティングへの応用のために複雑な絡み合った状態を生成することができる。 We propose a new method, "counter-factual" carving, that uses the "no-jump" evolution of a probe to generate entangled many-body states of high fidelity. The probe is coupled to a target ensemble of qubits and engineered to exponentially decay at a rate depending on the target collective spin, such that post-selecting on observing no probe decay precisely removes select faster-decaying spin components. When probe and $N$-qubit target interact via a cavity mode of cooperativity $C$, counter-factual carving generates entangled states with infidelities of $e^{-C/N}$, an exponential improvement over previous carving schemes. Counter-factual carving can generate complex entangled states for applications in quantum metrology and quantum computing.	翻訳日:2024-02-07 19:00:24 公開日:2024-02-06
# ベストは最善の手段で終わる:アプリレビューの倫理的懸念 The Best Ends by the Best Means: Ethical Concerns in App Reviews ( http://arxiv.org/abs/2401.11063v2 ) ライセンス: Link先を確認	Lauren Olson, Neelam Tjikhoeri, Emitzá Guzmán	(参考訳) この研究は、ユーザのアプリストアレビューに見られる倫理的懸念を分析します。本研究を行ったのは,モバイルアプリケーション(アプリ)における倫理的懸念が広く見られ,エンドユーザーや社会に深刻な脅威をもたらす一方で,系統的な分析や検出・分類の手法が欠如しているためである。さらにapp storeのレビューでは,地理的に分散した大規模オーディエンスから,ソフトウェアの欠陥を特定する上で極めて重要なユーザ視点の収集が可能になる。分析のために,500万件のユーザレビューを収集し,ユーザの嗜好を表す倫理的関心事のセットを開発し,これらのレビューのサンプルを手作業でラベル付けした。その結果,(1) ユーザは検閲, 身元盗難, 安全に関する倫理的懸念を高い頻度で報告すること, (2) 倫理的懸念を伴うユーザレビューはより長く, 人気が高く, 評価が低いこと, (3) これらのレビューの分類とフィルタリングの自動化の可能性が高いことが判明した。ソフトウェア進化における倫理的懸念を体系的に考慮する上で,app storeのレビューが有効であることを強調する。 This work analyzes ethical concerns found in users' app store reviews. We performed this study because ethical concerns in mobile applications (apps) are widespread, pose severe threats to end users and society, and lack systematic analysis and methods for detection and classification. In addition, app store reviews allow practitioners to collect users' perspectives, crucial for identifying software flaws, from a geographically distributed and large-scale audience. For our analysis, we collected five million user reviews, developed a set of ethical concerns representative of user preferences, and manually labeled a sample of these reviews. We found that (1) users frequently report ethical concerns about censorship, identity theft, and safety, (2) user reviews with ethical concerns are longer, more popular, and rated lower, and (3) there is high automation potential for the classification and filtering of these reviews. Our results highlight the relevance of using app store reviews for the systematic consideration of ethical concerns during software evolution.	翻訳日:2024-02-07 19:00:06 公開日:2024-02-06
# グロッキングの視点からみた言語モデルの臨界データサイズ Critical Data Size of Language Models from a Grokking Perspective ( http://arxiv.org/abs/2401.10463v2 ) ライセンス: Link先を確認	Xuekai Zhu, Yao Fu, Bowen Zhou, Zhouhan Lin	(参考訳) 我々は、言語モデルにおける重要なデータサイズを探索する。これは、素早い記憶から遅い一般化への根本的なシフトを示すしきい値である。グロッキング構成下での相転移をデータ効率仮説に定式化し,言語モデルの学習ダイナミクスにおけるデータ不足,不十分,余剰レジームを同定する。我々は、初期化と重み劣化を再スケーリングすることで、単純化された言語モデル上でグラッキングを安定的に再現するためのグラッキング構成を開発する。一般化は言語モデルが臨界サイズに達する場合にのみ起こることを示す。サンプル単位とモデル単位のグロッキングを解析し,提案するデータ効率仮説を検証した。実験の結果,言語データセットのクリティカルデータセットサイズで発生するスムーズな相転移が明らかになった。モデルのサイズが大きくなると、このクリティカルポイントも大きくなり、より大きなモデルにはより多くのデータが必要となる。その結果,言語モデル学習の理解を深め,言語モデルの学習メカニズムにおけるデータの役割に関する新たな視点が得られた。 We explore the critical data size in language models, a threshold that marks a fundamental shift from quick memorization to slow generalization. We formalize the phase transition under the grokking configuration into the Data Efficiency Hypothesis and identify data insufficiency, sufficiency, and surplus regimes in language models training dynamics. We develop a grokking configuration to reproduce grokking on simplistic language models stably by rescaling initialization and weight decay. We show that generalization occurs only when language models reach a critical size. We analyze grokking across sample-wise and model-wise, verifying the proposed data efficiency hypothesis. Our experiments reveal smoother phase transitions occurring at the critical dataset size for language datasets. As the model size increases, this critical point also becomes larger, indicating that larger models require more data. Our results deepen the understanding of language model training, offering a novel perspective on the role of data in the learning mechanism of language models.	翻訳日:2024-02-07 18:59:45 公開日:2024-02-06
# 原理グラフトランスフォーマーを目指して Towards Principled Graph Transformers ( http://arxiv.org/abs/2401.10119v2 ) ライセンス: Link先を確認	Luis Müller and Daniel Kusuma and Christopher Morris	(参考訳) k次元Weisfeiler-Leman(k-WL)階層に基づくグラフ学習アーキテクチャは、理論的によく理解された表現力を提供する。しかし、そのようなアーキテクチャは現実のタスクにしっかりとした予測性能を持たず、実際の影響を限定することが多い。対照的に、グラフトランスフォーマーのようなグローバルな注意に基づくモデルは、実際には強力なパフォーマンスを示しているが、表現力とk-WL階層との比較は、特にこれらのアーキテクチャが表現力と予測性能のために位置エンコーディングや構造エンコーディングに依存しているため、依然として困難である。そこで本研究では,ノードではなくノードペアで動作するグローバルアテンションモデルであるEdge Transformerが,少なくとも3-WLの表現力を持つことを示す。実験的に、Edge Transformerは、位置や構造的エンコーディングを頼らずに、予測性能に関する他の理論的に整合したアーキテクチャを上回ることを実証する。 Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer a theoretically well-understood expressive power. However, such architectures often fail to deliver solid predictive performance on real-world tasks, limiting their practical impact. In contrast, global attention-based models such as graph transformers demonstrate strong performance in practice, but comparing their expressive power with the k-WL hierarchy remains challenging, particularly since these architectures rely on positional or structural encodings for their expressivity and predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures regarding predictive performance while not relying on positional or structural encodings.	翻訳日:2024-02-07 18:59:29 公開日:2024-02-06
# ニューラルODEの補間における深さと幅の相互作用 Interplay between depth and width for interpolation in neural ODEs ( http://arxiv.org/abs/2401.09902v3 ) ライセンス: Link先を確認	Antonio Álvarez-López, Arselane Hadj Slimane, Enrique Zuazua	(参考訳) ニューラル常微分方程式 (neural ODEs) は制御の観点から教師あり学習の自然な道具として登場したが、それらの最適アーキテクチャの完全な理解はいまだ解明されていない。本研究では,その幅$p$と層遷移数$L$(事実上深さ$L+1$)の相互作用について検討する。具体的には、ワッサーシュタイン誤差マージン$\varepsilon>0$の中で、$N$個の点対からなる有限データセット$D$または$\mathbb{R}^d$上の2つの確率測度を補間する能力の観点からモデル表現性を評価する。この結果から,$p$と$L$の間にはトレードオフがあり,データセット補間では$L$が$O(1+N/p)$,測度補間では$L=O\left(1+(p\varepsilon^d)^{-1}\right)$でスケールすることが判明した。自律的なケース,すなわち$L=0$の場合には別の研究が必要であり,我々はデータセットの補間に焦点を当ててこれを行う。我々は、$\varepsilon$-approximate controllabilityの緩和問題に対処し、$\varepsilon\sim O(\log(p)p^{-1/d})$の誤差崩壊を確立する。この減衰率は、$D$を補間するカスタム構築リプシッツベクトル場に普遍近似定理を適用する結果である。高次元設定では、$p=O(N)$ニューロンが正確な制御を達成するのに十分である可能性が高いことを示す。 Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)p^{-1/d})$. 
This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.	翻訳日:2024-02-07 18:59:10 公開日:2024-02-06
# 大規模言語モデルがベクトルデータベースを満たすとき:調査 When Large Language Models Meet Vector Databases: A Survey ( http://arxiv.org/abs/2402.01763v2 ) ライセンス: Link先を確認	Zhi Jing, Yongye Su, Yikun Han, Bo Yuan, Haiyun Xu, Chunjiang Liu, Kehai Chen, Min Zhang	(参考訳) 本稿では,大規模言語モデル (LLMs) とベクトルデータベース (VecDBs) の相乗的ポテンシャルについて検討する。 LLMの普及に伴い、幻覚、時代遅れの知識、禁止的な商用アプリケーションコスト、メモリ問題など、多くの課題が伴う。 VecDBは、LLM操作に固有の高次元ベクトル表現を保存、取得、管理する効率的な手段を提供することによって、これらの問題の魅力的な解決策として浮上する。本稿では,LLM と VecDB の基本原理を概説し,LLM の機能強化に対するそれらの統合の影響を批判的に分析する。この議論は、先進的なデータ処理と知識抽出能力のためにLLMとVecDBの結合を最適化するためのさらなる研究を促進することを目的として、この領域における投機的将来の発展に関する議論へと展開する。 This survey explores the synergistic potential of Large Language Models (LLMs) and Vector Databases (VecDBs), a burgeoning but rapidly evolving research area. With the proliferation of LLMs comes a host of challenges, including hallucinations, outdated knowledge, prohibitive commercial application costs, and memory issues. VecDBs emerge as a compelling solution to these issues by offering an efficient means to store, retrieve, and manage the high-dimensional vector representations intrinsic to LLM operations. Through this nuanced review, we delineate the foundational principles of LLMs and VecDBs and critically analyze their integration's impact on enhancing LLM functionalities. This discourse extends into a discussion on the speculative future developments in this domain, aiming to catalyze further research into optimizing the confluence of LLMs and VecDBs for advanced data handling and knowledge extraction capabilities.	翻訳日:2024-02-07 18:50:16 公開日:2024-02-06
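The survey above describes VecDBs as a way to store and retrieve the high-dimensional vectors that LLM pipelines rely on. As a rough illustration only, the following is a minimal in-memory sketch of nearest-neighbour retrieval by cosine similarity. The class `TinyVectorStore`, its method names, and the three-dimensional toy vectors are hypothetical and are not taken from the paper or from any real vector database; production systems typically use approximate indexes rather than a full scan.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical toy store: keeps (key, vector) pairs and scans all of them on search."""

    def __init__(self):
        self._items = []

    def add(self, key, vector):
        self._items.append((key, list(vector)))

    def search(self, query, k=1):
        # Rank every stored vector by similarity to the query, best first.
        ranked = sorted(self._items, key=lambda kv: cosine(query, kv[1]), reverse=True)
        return [key for key, _ in ranked[:k]]


# Toy "embeddings"; a real pipeline would obtain these from an embedding model.
store = TinyVectorStore()
store.add("doc_cats", [1.0, 0.0, 0.0])
store.add("doc_dogs", [0.0, 1.0, 0.0])
store.add("doc_cars", [0.0, 0.0, 1.0])

top = store.search([0.9, 0.1, 0.0], k=2)
print(top)
```

In a retrieval-augmented setup, the returned keys would be used to fetch passages that are then placed in the LLM prompt; that step is omitted here.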
# 戦略としての文字列としてのステート:ゲーム理論による言語モデルの操り方 States as Strings as Strategies: Steering Language Models with Game-Theoretic Solvers ( http://arxiv.org/abs/2402.01704v2 ) ライセンス: Link先を確認	Ian Gemp, Yoram Bachrach, Marc Lanctot, Roma Patel, Vibhavari Dasagi, Luke Marris, Georgios Piliouras, Siqi Liu, Karl Tuyls	(参考訳) ゲーム理論は、合理的エージェント間の戦略的相互作用の数学的モデルの研究である。言語は人間にとって重要な対話手段であるが、歴史的に対話とその戦略的動機を数学的にモデル化することは困難である。言語相互作用に関連するプレイヤー、戦略、報酬の適切なモデル(つまり、ゲーム理論の従来の象徴論理への結合)は、既存のゲーム理論アルゴリズムが言語空間における戦略的な解決策を提供することができる。言い換えれば、バインディングは対話における安定した合理的な会話戦略を計算するための経路を提供することができる。大規模言語モデル(llm)は、その生成能力が自然対話の現実的な人間のようなシミュレーションを可能にする点に到達している。様々な方法でそれらを促すことで、異なる出力発話に対して反応を制御できる。自然言語の表現力を活用することで、llmは現実世界のアプリケーションで基盤となる新しい対話シナリオを迅速に生成する上でも役立ちます。本研究では,対話からゲーム理論への結合の可能性と,既存の平衡探索アルゴリズムの一般化について述べる。さらに,提案するバインディングとともにllms生成機能を活用することで,ゲーム理論的なソリューション概念を学習し,テスト可能な,公式なゲームリポジトリを合成することができる。また, LLM によるゲーム生成, ゲーム理論解法, 模倣学習を組み合わせて, LLM の戦略能力向上のプロセスを構築する方法を示す。 Game theory is the study of mathematical models of strategic interactions among rational agents. Language is a key medium of interaction for humans, though it has historically proven difficult to model dialogue and its strategic motivations mathematically. A suitable model of the players, strategies, and payoffs associated with linguistic interactions (i.e., a binding to the conventional symbolic logic of game theory) would enable existing game-theoretic algorithms to provide strategic solutions in the space of language. In other words, a binding could provide a route to computing stable, rational conversational strategies in dialogue. Large language models (LLMs) have arguably reached a point where their generative capabilities can enable realistic, human-like simulations of natural dialogue. By prompting them in various ways, we can steer their responses towards different output utterances. Leveraging the expressivity of natural language, LLMs can also help us quickly generate new dialogue scenarios, which are grounded in real world applications. 
In this work, we present one possible binding from dialogue to game theory as well as generalizations of existing equilibrium finding algorithms to this setting. In addition, by exploiting LLMs' generation capabilities along with our proposed binding, we can synthesize a large repository of formally-defined games in which one can study and test game-theoretic solution concepts. We also demonstrate how one can combine LLM-driven game generation, game-theoretic solvers, and imitation learning to construct a process for improving the strategic capabilities of LLMs.	翻訳日:2024-02-07 18:49:57 公開日:2024-02-06
# 靴センサの最近の進歩 : 医療におけるスマート・フットウェアの役割 Recent Innovations in Footwear Sensors: Role of Smart Footwear in Healthcare -- A Survey ( http://arxiv.org/abs/2402.01645v2 ) ライセンス: Link先を確認	Pradyumna G. R., Roopa B. Hegde, Bommegowda K. B., Anil Kumar Bhat, Ganesh R. Naik, Amit N. Pujari	(参考訳) スマートシューズは、パーソナライズされた健康モニタリングと補助技術の新時代を支えている。この靴はbluetoothなどの技術をデータ収集や無線伝送に活用し、gps追跡、障害物検出、フィットネストラッキングなどの機能を備えている。本稿では,スマートシュー技術の現状について概説するとともに,健康モニタリング,エネルギー収穫,視覚障害者支援機能,データ分析のためのディープラーニングの統合について述べる。本研究は、特に糖尿病患者に対する医療応用におけるスマートフットウェアの可能性と、この分野での現在進行中の研究について論じる。複雑な構造、不適合性、快適性、コストなど、現在の履物の問題も議論されている。 Smart shoes have ushered in a new era of personalised health monitoring and assistive technology. The shoe leverages technologies such as Bluetooth for data collection and wireless transmission and incorporates features such as GPS tracking, obstacle detection, and fitness tracking. This article provides an overview of the current state of smart shoe technology, highlighting the integration of advanced sensors for health monitoring, energy harvesting, assistive features for the visually impaired, and deep learning for data analysis. The study discusses the potential of smart footwear in medical applications, particularly for patients with diabetes, and the ongoing research in this field. Current footwear challenges are also discussed, including complex construction, poor fit, comfort, and high cost.	翻訳日:2024-02-07 18:49:34 公開日:2024-02-06
# 予測可能な性能保証を伴うAIエラー訂正のための弱教師付き学習者 Weakly Supervised Learners for Correction of AI Errors with Provable Performance Guarantees ( http://arxiv.org/abs/2402.00899v2 ) ライセンス: Link先を確認	Ivan Y. Tyukin, Tatiana Tyukina, Daniel van Helden, Zedong Zheng, Evgeny M. Mirkes, Oliver J. Sutton, Qinghua Zhou, Alexander N. Gorban, Penelope Allison	(参考訳) 本稿では,最優先性能保証付き弱教師付きAI誤り訂正器を導入することにより,AIエラーを処理する新しい手法を提案する。これらのAI補正は、その決定を承認または拒否することで、以前に構築されたいくつかの下位分類器の決定を緩和する役割を持つ補助マップである。決定の拒絶は、決定の棄却を示唆する信号として用いることができる。この作業の重要な技術的焦点は、不正確な決定の確率の限界を通して、これらの新しいai修正者のパフォーマンス保証を提供することである。これらの境界は分布非依存であり、データ次元の仮定に依存しない。私たちの経験的な例は、トレーニングデータが不足している実世界の課題において、画像分類器のパフォーマンス向上にフレームワークを適用する方法を示している。 We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining from making a decision. A key technical focus of the work is in providing performance guarantees for these new AI correctors through bounds on the probabilities of incorrect decisions. These bounds are distribution agnostic and do not rely on assumptions on the data dimension. Our empirical example illustrates how the framework can be applied to improve the performance of an image classifier in a challenging real-world task where training data are scarce.	翻訳日:2024-02-07 18:49:13 公開日:2024-02-06
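The abstract above describes a corrector as an auxiliary map that approves or rejects the decisions of an existing classifier, with a rejection read as a signal to abstain. The sketch below shows only that control flow. The function names, the toy classifier, and the hand-written margin rule are assumptions made for illustration; the paper's correctors are learned with weak supervision and come with probability bounds, none of which is reproduced here.

```python
def corrected_predict(classifier, corrector, x):
    """Return the classifier's label if the corrector approves it, else None (abstain)."""
    label = classifier(x)
    return label if corrector(x, label) else None


def toy_classifier(x):
    # Hypothetical underlying classifier: sign of a scalar feature.
    return "positive" if x >= 0 else "negative"


def toy_corrector(x, label):
    # Hypothetical corrector: approve only inputs far from the decision boundary.
    return abs(x) >= 0.5


inputs = (-2.0, -0.1, 0.3, 1.5)
results = [corrected_predict(toy_classifier, toy_corrector, x) for x in inputs]
print(results)
```

Keeping the corrector separate from the classifier means the underlying model is left untouched, which matches the "previously constructed" framing in the abstract.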
# ビデオは効果的に使っていない: 更新されたドメイン適応ビデオセグメンテーションベースライン We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline ( http://arxiv.org/abs/2402.00868v2 ) ライセンス: Link先を確認	Simar Kareer, Vivek Vijaykumar, Harsh Maheshwari, Prithvijit Chattopadhyay, Judy Hoffman, Viraj Prabhu	(参考訳) セマンティックセグメンテーションのための教師なしドメイン適応(DAS)には、ラベル付きソースドメインの画像でトレーニングされたモデルをラベルなしターゲットドメインに適応させようとする多くの研究がある。以前の研究の大半はフレームレベルのImage-DAS問題としてこれを研究してきたが、少数のVideo-DAS研究は隣接するフレームに存在する時間信号をさらに活用しようと試みている。しかし、Video-DASの研究は歴史的にImage-DASとは異なるベンチマークのセットを、相互ベンチマークがほとんどないまま研究してきた。この作業では、このギャップに対処します。驚いたことに、(1)データとモデルアーキテクチャを慎重に制御した後でも、最先端のImage-DAS手法(HRDAとHRDA+MIC)は、確立されたVideo-DASベンチマーク(+14.5 mIoU on Viper$\rightarrow$CityscapesSeq, +19.0 mIoU on Synthia$\rightarrow$CityscapesSeq)において、Video-DAS手法よりも優れており、(2)Image-DASとVideo-DAS技術の単純な組み合わせはデータセット間でわずかな改善にしか至らない。 Image-DAS と Video-DAS のサイロ化の進展を避けるため、我々は、共通のベンチマークで Video-DAS と Image-DAS メソッドの包括的なセットをサポートするコードベースをオープンソース化した。コードはhttps://github.com/SimarKareer/UnifiedVideoDAで利用可能 There has been abundant work in unsupervised domain adaptation for semantic segmentation (DAS) seeking to adapt a model trained on images from a labeled source domain to an unlabeled target domain. While the vast majority of prior work has studied this as a frame-level Image-DAS problem, a few Video-DAS works have sought to additionally leverage the temporal signal present in adjacent frames. However, Video-DAS works have historically studied a distinct set of benchmarks from Image-DAS, with minimal cross-benchmarking. In this work, we address this gap. Surprisingly, we find that (1) even after carefully controlling for data and model architecture, state-of-the-art Image-DAS methods (HRDA and HRDA+MIC) outperform Video-DAS methods on established Video-DAS benchmarks (+14.5 mIoU on Viper$\rightarrow$CityscapesSeq, +19.0 mIoU on Synthia$\rightarrow$CityscapesSeq), and (2) naive combinations of Image-DAS and Video-DAS techniques only lead to marginal improvements across datasets. 
To avoid siloed progress between Image-DAS and Video-DAS, we open-source our codebase with support for a comprehensive set of Video-DAS and Image-DAS methods on a common benchmark. Code available at https://github.com/SimarKareer/UnifiedVideoDA	翻訳日:2024-02-07 18:49:03 公開日:2024-02-06
# ポジションペーパー:大規模AIの時代におけるベイズ的深層学習 Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI ( http://arxiv.org/abs/2402.00809v2 ) ライセンス: Link先を確認	Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, Jose Miguel Hernandez Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang	(参考訳) ディープラーニング研究の現在の状況では、大規模な画像と言語データセットを含む教師付きタスクにおいて、高い予測精度を達成することに重点が置かれている。しかし、より広い視点から見れば、不確実性、活動的かつ継続的な学習、科学的なデータなど、見落とされがちなメトリクス、タスク、データタイプが、注意を喚起する。 Bayesian Deep Learning(BDL)は,これらのさまざまな設定にまたがってメリットを提供する,有望な道の1つである。本稿では,BDLが深層学習の能力を高めることができることを示唆する。 BDLの強みを再考し、既存の課題を認識し、これらの障害に対処するためのエキサイティングな研究方法を強調します。今後の議論は、大規模ファンデーションモデルをBDLと組み合わせて、その潜在能力を最大限に活用する方法に焦点を当てている。 In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.	翻訳日:2024-02-07 18:48:01 公開日:2024-02-06
# シーケンスモデリングのためのトランスの表現力と機構の理解 Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling ( http://arxiv.org/abs/2402.00522v2 ) ライセンス: Link先を確認	Mingze Wang, Weinan E	(参考訳) 長大,スパース,複雑なメモリを有するシーケンスモデリングのための変圧器の近似特性を体系的に研究する。点生成自己着脱,位置符号化,フィードフォワード層などのトランスフォーマーの異なる成分が,その表現力にどのような影響を及ぼすかを調査し,それらの組み合わせ効果を明示的な近似率の確立を通じて検討する。本研究は,トランスフォーマーにおけるクリティカルパラメータの役割を明らかにする。レイヤ数やアテンションヘッド数などである。 We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads, and these insights also provide natural suggestions for alternative architectures.	翻訳日:2024-02-07 18:47:46 公開日:2024-02-06
# 量子エネルギーテレポーテーションにおける量子相関のロバスト性 Robustness of quantum correlation in quantum energy teleportation ( http://arxiv.org/abs/2402.00479v2 ) ライセンス: Link先を確認	Kazuki Ikeda and Adam Lowe	(参考訳) 本稿では、従来のエンタングルメントエントロピーではなく、量子不協和を用いた量子エネルギーテレポーテーション(QET)プロトコルにおける量子相関の進化について述べる。局所的な観測と条件付き操作を繰り返し行うQETプロトコルでは、混合状態の統計的生成のために量子相関は非自明になる。本稿では,混合状態における量子相関の尺度として量子ディスコードを用い,そのテレポーティングエネルギーと相転移との関係について検討する。 QETを実行するアリスとボブの過程において、アリスとボブの間の絡み合いはアリスの量子状態の測定によって完全に崩壊し、量子相関が消えると予想される。しかし、この予想に反して、量子不協和を用いて量子相関がQETの全過程中に消失しないことが示されている。種々の相構造におけるQETの量子相関のロバスト性を示すために, キラル化学ポテンシャルと化学ポテンシャルの両方を持つナムブ・ジョナ・ラシーノ(NJL)モデルを含むいくつかのベンチマークモデルを用いて数値解析を行い, キラル密度演算子に結合した左クォークと右クォークのキラル不均衡を模した相構造の研究に有用である。研究した全てのケースにおいて、量子不協和は相転移の秩序パラメータとして振る舞う。 We present the evolution of quantum correlation in the quantum energy teleportation (QET) protocol using quantum discord, instead of the traditionally used entanglement entropy. In the QET protocol, where local observations and conditional operations are repeated, quantum correlations become nontrivial because of the statistical creation of mixed states. In this paper, we use quantum discord as a measure of quantum correlation in mixed states and investigate its relationship to teleported energy and phase transitions. During the process of Alice and Bob performing QET, one would expect that the entanglement between Alice and Bob is completely broken by Alice's measurement of the quantum state, and thus the quantum correlation disappears. However, contrary to this expectation, it is shown using quantum discord that the quantum correlation does not disappear during the entire process of QET. 
To demonstrate the robustness of the quantum correlation in QET at various phase structures, we perform the numerical analysis using several benchmark models including the Nambu-Jona-Lasinio (NJL) model with both the chiral chemical potential and the chemical potential, which are useful to study the phase structures mimicking the chiral imbalance between left- and right-handed quarks coupled to the chirality density operator. In all cases we studied, the quantum discord behaved as an order parameter of the phase transition.	翻訳日:2024-02-07 18:47:36 公開日:2024-02-06
# マルチモーダルシリアル再生を用いた人・大言語モデルの抽象化の比較 Comparing Abstraction in Humans and Large Language Models Using Multimodal Serial Reproduction ( http://arxiv.org/abs/2402.03618v1 ) ライセンス: Link先を確認	Sreejan Kumar, Raja Marjieh, Byron Zhang, Declan Campbell, Michael Y. Hu, Umang Bhatt, Brenden Lake, Thomas L. Griffiths	(参考訳) 人間は騒がしい感覚データから世界の有用な抽象概念を抽出する。連続再生は、ある人が刺激を観察し、次の人のためにそれを再現して再生の連鎖を形成するという、伝言ゲームに似たパラダイムを通じて、人々がどのように世界を解釈するかを研究できる。過去の連続再生実験は、通常、単一の感覚的モダリティを用いるが、人間はしばしば言語を通して世界の抽象を互いに伝達する。言語が抽象概念の形成に与える影響を調べるために,視覚刺激を受けた人に言語形式で再現するよう依頼し,またその逆も行う,新しいマルチモーダル連続再生フレームワークを実装した。ヒトとGPT-4の双方で単一モダリティおよびマルチモーダルの連鎖を実行し,言語をモダリティとして加えると,GPT-4よりもヒトの再現に大きな影響を及ぼすことがわかった。これは、人間の視覚的および言語的表現がGPT-4よりも解離しやすいことを示唆している。 Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a paradigm similar to the game of telephone, where one person observes a stimulus and reproduces it for the next to form a chain of reproductions. Past serial reproduction experiments typically employ a single sensory modality, but humans often communicate abstractions of the world to each other through language. To investigate the effect of language on the formation of abstractions, we implement a novel multimodal serial reproduction framework by asking people who receive a visual stimulus to reproduce it in a linguistic format, and vice versa. We ran unimodal and multimodal chains with both humans and GPT-4 and find that adding language as a modality has a larger effect on human reproductions than GPT-4's. This suggests human visual and linguistic representations are more dissociable than those of GPT-4.	翻訳日:2024-02-07 17:24:31 公開日:2024-02-06
# 大規模言語モデルを用いた実世界データからの避妊法切り替え理由の特定 Identifying Reasons for Contraceptive Switching from Real-World Data Using Large Language Models ( http://arxiv.org/abs/2402.03597v1 ) ライセンス: Link先を確認	Brenda Y. Miao, Christopher YK Williams, Ebenezer Chinedu-Eneh, Travis Zack, Emily Alsentzer, Atul J. Butte, Irene Y. Chen	(参考訳) 処方避妊薬は女性の生殖に関する健康の支援に重要な役割を果たす。米国では約5000万人の女性が避妊薬を使用しており、避妊薬の選択と切り替えを駆動する要因を理解することは大きな関心事である。しかし、薬剤の切り替えに関連する多くの要因は、しばしば、構造化されていない臨床ノートにのみ記録され、抽出が困難である。本稿では,近年開発された大規模言語モデル GPT-4 (HIPAA準拠のMicrosoft Azure API経由) のゼロショット能力を評価し,UCSF Information Commons臨床ノートデータセットから避妊薬のクラスを切り替える理由を特定する。 GPT-4は避妊の切り替え理由を正確に抽出でき, 避妊開始および停止の抽出においてそれぞれ0.849, 0.881のマイクロF1スコアを達成し, ベースラインのBERTベースモデルを上回る。 GPT-4が抽出した理由のヒトによる評価は91.4%の精度で、幻覚は最小であった。抽出された理由を用いて,教師なしトピックモデリングアプローチにより,切り替えの主な理由として患者の嗜好,有害事象,保険を特定した。また, 特定の人口集団における避妊切り替えの理由として, 「体重増加/気分変化」と「保険カバレッジ」が不均衡に多く見出された。私たちのコードと補足データはhttps://github.com/BMiao10/contraceptive-switchingで入手できます。 Prescription contraceptives play a critical role in supporting women's reproductive health. With nearly 50 million women in the United States using contraceptives, understanding the factors that drive contraceptive selection and switching is of significant interest. However, many factors related to medication switching are often only captured in unstructured clinical notes and can be difficult to extract. Here, we evaluate the zero-shot abilities of a recently developed large language model, GPT-4 (via HIPAA-compliant Microsoft Azure API), to identify reasons for switching between classes of contraceptives from the UCSF Information Commons clinical notes dataset. We demonstrate that GPT-4 can accurately extract reasons for contraceptive switching, outperforming baseline BERT-based models with microF1 scores of 0.849 and 0.881 for contraceptive start and stop extraction, respectively. Human evaluation of GPT-4-extracted reasons for switching showed 91.4% accuracy, with minimal hallucinations. 
Using extracted reasons, we identified patient preference, adverse events, and insurance as key reasons for switching using unsupervised topic modeling approaches. Notably, we also showed using our approach that "weight gain/mood change" and "insurance coverage" are disproportionately found as reasons for contraceptive switching in specific demographic populations. Our code and supplemental data are available at https://github.com/BMiao10/contraceptive-switching.	翻訳日:2024-02-07 17:24:15 公開日:2024-02-06
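The abstract above reports extraction quality as micro-averaged F1. For readers unfamiliar with the metric, the following minimal sketch shows one common way to compute it for multi-label extraction: true positives, false positives, and false negatives are pooled over all notes before precision and recall are taken. The reason labels and the three toy notes are invented for illustration and have no connection to the paper's data or to its reported scores of 0.849 and 0.881.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over paired collections of label sets (one set per note)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels correctly extracted
        fp += len(p - g)   # labels extracted but not in the gold annotation
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold annotations and model outputs for three notes.
gold = [{"adverse_event"}, {"insurance", "patient_preference"}, {"adverse_event"}]
pred = [{"adverse_event"}, {"insurance"}, {"patient_preference"}]

score = micro_f1(gold, pred)
print(round(score, 4))
```

Here the pooled counts are 2 true positives, 1 false positive, and 2 false negatives, giving precision 2/3, recall 1/2, and a micro-F1 of 4/7.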
# グラフ構造ピラミッド型全スライド画像表現 GRASP: GRAph-Structured Pyramidal Whole Slide Image Representation ( http://arxiv.org/abs/2402.03592v1 ) ライセンス: Link先を確認	Ali Khajegili Mirabadi, Graham Archibald, Amirali Darbandsari, Alberto Contreras-Sanz, Ramin Ebrahim Nakhli, Maryam Asadi, Allen Zhang, C. Blake Gilks, Peter Black, Gang Wang, Hossein Farahani, Ali Bashashati	(参考訳) がんのサブタイピングはデジタル病理学において最も難しい課題の1つであり、近年の研究では、ギガピクセル全体のスライド画像(WSI)を処理するマルチインスタンスラーニング(MIL)が注目されている。しかし、MILアプローチはWSIに含まれる画像間および画像内情報を利用できない。本稿では,デジタル病理学におけるWSI処理のための新しいグラフ構造化多重化フレームワークGRASPを提案する。我々のアプローチは、WSIの処理における病理学者の振る舞いを動的にエミュレートし、WSIの階層構造から利益を得るように設計されています。 GRASPは、従来のプール機構の代わりに収束ベースのノードアグリゲーションを導入し、2つの異なるがんデータセットに対する最先端メソッドを最大10%のバランスの取れた精度で上回り、パラメータの数の観点から最も近いパフォーマンスの最先端モデルよりも7倍小さい。以上の結果から,GRASPはがんの亜型化のための様々な倍率の発見と相談において動的であり,様々なハイパーパラメータにわたって信頼性と安定性を有することが示唆された。モデルの振る舞いは、2人の専門的な病理学者によって評価され、モデルのダイナミクスの解釈可能性が確認された。また、実験的な証拠とともに、GRASPがグラフ内の異なる倍率やノードとどのように相互作用して予測を行うかを説明する理論的基盤も提供します。 GRASPの強い特性と単純な構造は、デジタル病理学におけるWSI表現の解釈可能な構造ベースの設計を促進するだろうと考えている。さらに,珍しい卵巣癌と膀胱癌のグラフデータセットを2つ公開し,この分野に貢献する。 Cancer subtyping is one of the most challenging tasks in digital pathology, where Multiple Instance Learning (MIL) by processing gigapixel whole slide images (WSIs) has been in the spotlight of recent research. However, MIL approaches do not take advantage of inter- and intra-magnification information contained in WSIs. In this work, we present GRASP, a novel graph-structured multi-magnification framework for processing WSIs in digital pathology. Our approach is designed to dynamically emulate the pathologist's behavior in handling WSIs and benefits from the hierarchical structure of WSIs. GRASP, which introduces a convergence-based node aggregation instead of traditional pooling mechanisms, outperforms state-of-the-art methods over two distinct cancer datasets by a margin of up to 10% balanced accuracy, while being 7 times smaller than the closest-performing state-of-the-art model in terms of the number of parameters. 
Our results show that GRASP is dynamic in finding and consulting with different magnifications for subtyping cancers and is reliable and stable across different hyperparameters. The model's behavior has been evaluated by two expert pathologists confirming the interpretability of the model's dynamic. We also provide a theoretical foundation, along with empirical evidence, for our work, explaining how GRASP interacts with different magnifications and nodes in the graph to make predictions. We believe that the strong characteristics yet simple structure of GRASP will encourage the development of interpretable, structure-based designs for WSI representation in digital pathology. Furthermore, we publish two large graph datasets of rare Ovarian and Bladder cancers to contribute to the field.	翻訳日:2024-02-07 17:23:49 公開日:2024-02-06
# CAT-SAM:Segmentation Anything ModelのFew-Shot Adaptationのための条件調整ネットワーク CAT-SAM: Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model ( http://arxiv.org/abs/2402.03631v1 ) ライセンス: Link先を確認	Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Shijian Lu	(参考訳) 最近のSegment Anything Model (SAM) は、一般画像のセグメンテーションにおいて顕著なゼロショット能力と柔軟な幾何学的プロンプトを示した。しかしSAMは、航空、医療、非RGB画像など、様々な非伝統的なイメージを扱う際にしばしば苦労する。本稿では,CAT-SAM(ConditionAl Tuning Network)を提案する。 CAT-SAMはSAM全体を凍結し、マスクデコーダとイメージエンコーダに少数の学習可能なパラメータを同時に適用する。コア設計は、重厚画像エンコーダと軽量マスクデコーダのデコーダ条件付きジョイントチューニングを可能にするプロンプトブリッジ構造である。ブリッジングは、マスクデコーダのプロンプトトークンを画像エンコーダにマッピングし、相互の利益により、エンコーダとデコーダのシナジー適応を促進する。我々は、入力空間に学習可能なプロンプトトークンを注入する1つのCAT-SAMと、軽量なアダプタネットワークを挿入する2つのCAT-SAM変異をもたらすイメージエンコーダの代表的なチューニング戦略を開発する。 11の非従来型タスクに対する大規模な実験により、CAT-SAMはどちらも、非常に困難なワンショット適応設定の下でも、常に優れた目標セグメンテーション性能を達成している。プロジェクトページ: \url{https://xiaoaoran.github.io/projects/CAT-SAM} The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks with just few-shot target samples. CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters. The core design is a prompt bridge structure that enables decoder-conditioned joint tuning of the heavyweight image encoder and the lightweight mask decoder. The bridging maps the prompt token of the mask decoder to the image encoder, fostering synergic adaptation of the encoder and the decoder with mutual benefits. 
We develop two representative tuning strategies for the image encoder which leads to two CAT-SAM variants: one injecting learnable prompt tokens in the input space and the other inserting lightweight adapter networks. Extensive experiments over 11 unconventional tasks show that both CAT-SAM variants achieve superior target segmentation performance consistently even under the very challenging one-shot adaptation setup. Project page: \url{https://xiaoaoran.github.io/projects/CAT-SAM}	翻訳日:2024-02-07 17:10:08 公開日:2024-02-06
# IDE由来の静的コンテキストのネイティブ統合によるLLMベースのコーディングツールの強化 Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context ( http://arxiv.org/abs/2402.03630v1 ) ライセンス: Link先を確認	Yichen Li and Yun Peng and Yintong Huo and Michael R. Lyu	(参考訳) 大規模言語モデル(LLM)はコード補完において顕著な成功を収めており、Copilotのようなコードアシスタントサービスの開発において不可欠な役割を担っていることからも明らかである。ファイル内のコンテキストでトレーニングされているため、現在のLLMは単一のソースファイルのコード補完に非常に有効である。しかし、クロスファイル情報を必要とする大規模なソフトウェアプロジェクトに対して、リポジトリレベルのコード補完を行うことは困難である。 LLMベースのリポジトリレベルのコード補完に関する既存の研究は、ファイル間のコンテキストを特定し統合するが、精度の低さとLLMのコンテキスト長の制限に悩まされている。本稿では,統合開発環境(IDE)がリポジトリレベルのコード補完のために,直接的かつ正確かつリアルタイムなクロスファイル情報を提供できることを論じる。我々は,IDEネイティブな静的コンテキストをクロスコンテキスト構築に,診断結果を自己修正に活用する,実践的なフレームワークであるIDECoderを提案する。 IDECoderは、リポジトリレベルのコード補完におけるLLMの能力を強化するために、IDEで利用可能なリッチなクロスコンテキスト情報を利用する。我々はIDECoderの性能を検証するための予備実験を行い、この相乗効果が今後の探索に有望な傾向を示すことを観察した。 Large Language Models (LLMs) have achieved remarkable success in code completion, as evidenced by their essential roles in developing code assistant services such as Copilot. Being trained on in-file contexts, current LLMs are quite effective in completing code for single source files. However, it is challenging for them to conduct repository-level code completion for large software projects that require cross-file information. Existing research on LLM-based repository-level code completion identifies and integrates cross-file contexts, but it suffers from low accuracy and limited context length of LLMs. In this paper, we argue that Integrated Development Environments (IDEs) can provide direct, accurate and real-time cross-file information for repository-level code completion. We propose IDECoder, a practical framework that leverages IDE native static contexts for cross-context construction and diagnosis results for self-refinement. IDECoder utilizes the rich cross-context information available in IDEs to enhance the capabilities of LLMs of repository-level code completion. 
We conducted preliminary experiments to validate the performance of IDECoder and observed that this synergy represents a promising trend for future exploration.	翻訳日:2024-02-07 17:09:43 公開日:2024-02-06
# プライベート推論のための線形化がグループ精度に及ぼす不均衡な影響 Disparate Impact on Group Accuracy of Linearization for Private Inference ( http://arxiv.org/abs/2402.03629v1 ) ライセンス: Link先を確認	Saswat Das, Marco Romanelli, Ferdinando Fioretto	(参考訳) 暗号学的に安全なデータに対するプライバシ保護推論の実現は、よく知られた計算上の課題である。非線形アクティベーションにおけるコストのかかる暗号計算のボトルネックを軽減するため、最近の手法では、ニューラルネットワークにおいてこれらのアクティベーションの対象部分を線形化することが提案されている。この技術は、精度への影響が多くの場合無視できる程度でありながら、実行時間を大幅に削減する。本稿では,このような計算上の利点が公平性コストの増大につながりうることを実証する。具体的には、ReLUアクティベーション数を減らすと、多数派グループに比べて少数派グループの精度が不釣り合いに低下することがわかった。これらの観察を説明するために、決定境界の性質に関する制限された仮定の下での数学的解釈を与えるとともに、広く使われているデータセットやアーキテクチャにわたってこの問題が広く見られることを示す。最後に,線形化モデルの微調整ステップを変更する簡単な手順が,効果的な緩和戦略として機能することを示す。 Ensuring privacy-preserving inference on cryptographically secure data is a well-known computational challenge. To alleviate the bottleneck of costly cryptographic computations in non-linear activations, recent methods have suggested linearizing a targeted portion of these activations in neural networks. This technique results in significantly reduced runtimes with often negligible impacts on accuracy. In this paper, we demonstrate that such computational benefits may lead to increased fairness costs. Specifically, we find that reducing the number of ReLU activations disproportionately decreases the accuracy for minority groups compared to majority groups. To explain these observations, we provide a mathematical interpretation under restricted assumptions about the nature of the decision boundary, while also showing the prevalence of this problem across widely used datasets and architectures. Finally, we show how a simple procedure altering the fine-tuning step for linearized models can serve as an effective mitigation strategy.	翻訳日:2024-02-07 17:09:20 公開日:2024-02-06
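The "disparate impact on group accuracy" described in the abstract above can be made concrete by comparing per-group accuracy before and after a model change. The following is a minimal sketch only: the function names, the group labels and the toy predictions are all hypothetical and are not taken from the paper, and the two prediction lists merely stand in for an original and a linearized network.

```python
from collections import defaultdict


def group_accuracy(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions and group ids."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_drop_by_group(y_true, pred_before, pred_after, groups):
    """Per-group accuracy lost when moving from one model's predictions to another's."""
    before = group_accuracy(y_true, pred_before, groups)
    after = group_accuracy(y_true, pred_after, groups)
    return {group: before[group] - after[group] for group in before}


# Hypothetical toy data: six majority-group samples, two minority-group samples.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
groups = ["majority"] * 6 + ["minority"] * 2
pred_full = [1, 0, 1, 1, 0, 1, 0, 1]    # stand-in for the original network
pred_linear = [1, 0, 1, 1, 0, 0, 1, 1]  # stand-in for the linearized network

drops = accuracy_drop_by_group(y_true, pred_full, pred_linear, groups)
```

In this made-up example each group loses exactly one correct prediction, yet the minority group's accuracy falls by far more because it has fewer samples. An overall accuracy figure would hide that difference, which is why the comparison is made per group.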
# 専門家エージェント -- 大きな言語モデルから人間レベルの能力を持つ自律的な専門家へと進化する Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies ( http://arxiv.org/abs/2402.03628v1 ) ライセンス: Link先を確認	Zhixuan Chu, Yan Wang, Feng Zhu, Lu Yu, Longfei Li, Jinjie Gu	(参考訳) ChatGPT、PaLM、GPT-4のような大型言語モデル(LLM)の出現は、自然言語処理において顕著な進歩を触媒し、人間の言語流布や推論能力を示している。本稿では、制御可能で専門的で対話的でプロフェッショナルレベルの能力を持つ自律エージェントを作成するためのLLM機能を利用したアプリケーションフレームワークであるProfessional Agents(PAgents)の概念を紹介する。我々は、PAgentsが継続的な専門知識を通じてプロフェッショナルサービスを再形成できると仮定する。提案するpagentsフレームワークは,生成,進化,シナジーのための3層アーキテクチャであるベースツール層,中間エージェント層,トップシナジー層を含んでいる。本稿は,LLMの現実的応用の可能性について論じる。我々は、PAgentの高度化と統合が、複雑なドメインに対して専門的な熟達を示し、重要なニーズに対処し、人工知能の達成に繋がる可能性があると論じている。 The advent of large language models (LLMs) such as ChatGPT, PaLM, and GPT-4 has catalyzed remarkable advances in natural language processing, demonstrating human-like language fluency and reasoning capacities. This position paper introduces the concept of Professional Agents (PAgents), an application framework harnessing LLM capabilities to create autonomous agents with controllable, specialized, interactive, and professional-level competencies. We posit that PAgents can reshape professional services through continuously developed expertise. Our proposed PAgents framework entails a tri-layered architecture for genesis, evolution, and synergy: a base tool layer, a middle agent layer, and a top synergy layer. This paper aims to spur discourse on promising real-world applications of LLMs. We argue the increasing sophistication and integration of PAgents could lead to AI systems exhibiting professional mastery over complex domains, serving critical needs, and potentially achieving artificial general intelligence.	翻訳日:2024-02-07 17:09:04 公開日:2024-02-06
# 視覚言語モデルのロバスト性のための部分的再中心化ソフトマックス損失 Partially Recentralization Softmax Loss for Vision-Language Models Robustness ( http://arxiv.org/abs/2402.03627v1 ) ライセンス: Link先を確認	Hao Wang, Xin Zhang, Jinzhe Jiang, Yaqian Zhao and Chen Li	(参考訳) 大規模言語モデルが自然言語処理(NLP)タスクで飛躍的な進歩を遂げるにつれ、マルチモーダル技術は非常に人気が高まっている。しかし、マルチモーダルNLPは、入力への摂動によってモデルの出力が劇的に変化しうる敵対的攻撃に弱いことが示されている。コンピュータビジョンとNLPモデルの両方でいくつかの防御技術が提案されているが、モデルのマルチモーダルロバスト性は十分に研究されていない。本稿では,上位K個のソフトマックス出力を制限するように事前学習済みマルチモーダルモデルの損失関数を修正することで得られる敵対的ロバスト性について検討する。評価とスコアリングに基づく実験から,微調整後,事前学習済みモデルの一般的な攻撃に対する敵対的ロバスト性を大幅に改善できることが示された。出力の多様性、一般化、この種の損失関数におけるロバスト性と性能のトレードオフなど、さらなる研究が必要である。私たちのコードは、この論文が受理された後に公開される予定である。 As Large Language Models make a breakthrough in natural language processing tasks (NLP), multimodal technique becomes extremely popular. However, it has been shown that multimodal NLP are vulnerable to adversarial attacks, where the outputs of a model can be dramatically changed by a perturbation to the input. While several defense techniques have been proposed both in computer vision and NLP models, the multimodal robustness of models have not been fully explored. In this paper, we study the adversarial robustness provided by modifying loss function of pre-trained multimodal models, by restricting top K softmax outputs. Based on the evaluation and scoring, our experiments show that after a fine-tuning, adversarial robustness of pre-trained models can be significantly improved, against popular attacks. Further research should be studying, such as output diversity, generalization and the robustness-performance trade-off of this kind of loss functions. Our code will be available after this paper is accepted	翻訳日:2024-02-07 17:08:46 公開日:2024-02-06
# ReLUニューラルネットワークの凸緩和は多項式時間で大域的最適解を近似する Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time ( http://arxiv.org/abs/2402.03625v1 ) ライセンス: Link先を確認	Sungyoon Kim, Mert Pilanci	(参考訳) 本稿では,重み減衰で正則化された2層ReLUネットワークとその凸緩和との間の最適性ギャップについて検討する。トレーニングデータがランダムである場合、元の問題とその緩和との間の相対的最適性ギャップは、トレーニングサンプル数を$n$として、係数$O(\sqrt{\log n})$で抑えられることを示す。単純な応用として、元の非凸問題を対数係数の範囲内で解くことが保証された、扱いやすい多項式時間アルゴリズムが得られる。さらに, 軽度の仮定の下では, パラメータのランダム初期化により, 局所勾配法がトレーニング損失の小さい点にほぼ確実に収束することを示す。この結果は既存の結果と比較して指数関数的な改善であり,局所勾配法がうまく機能する理由の理解に新たな光を当てる。 In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of $O(\sqrt{\log n})$, where $n$ is the number of training samples. A simple application leads to a tractable polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that with random initialization on the parameters local gradient methods almost surely converge to a point that has low training loss. Our result is an exponential improvement compared to existing results and sheds new light on understanding why local gradient methods work well.	翻訳日:2024-02-07 17:08:28 公開日:2024-02-06
# 確率回路におけるMarginal MAPのためのニューラルネットワーク近似器 Neural Network Approximators for Marginal MAP in Probabilistic Circuits ( http://arxiv.org/abs/2402.03621v1 ) ライセンス: Link先を確認	Shivvrat Arya, Tahrima Rahman, Vibhav Gogate	(参考訳) 和積ネットワークのような確率回路(PC)は、大規模な多変量確率分布を効率的に表現する。PCはネットワークのサイズに対して線形にスケールする時間で周辺推論(MAR)タスクを解くことができるため、実際にはベイジアンネットワークやマルコフネットワークのような他の確率的表現よりも好まれる。残念なことに、これらのモデルでは最大事後確率(MAP)と周辺MAP(MMAP)タスクはNPハードのままである。整数線形計画法などの最適化問題に対して,ニューラルネットワークを用いて最適に近い解を生成する最近の研究から着想を得て,ニューラルネットワークを用いてPC内の(M)MAP推論を近似する手法を提案する。提案手法の主な考え方は,連続的多線形関数を用いてクエリ変数への代入のコストを近似し,後者を損失関数として用いることである。新しい手法の2つの主な利点は、自己教師型であることと、ニューラルネットワークが学習された後は解の出力に線形時間しか必要としないことである。我々は,いくつかのベンチマークデータセットで新しいアプローチを評価し,PCのMMAPタスクを解くために実際に使用されている最大積推論,最大周辺推論,逐次推定という3つの競合する線形時間近似よりも優れていることを示す。 Probabilistic circuits (PCs) such as sum-product networks efficiently represent large multi-variate probability distributions. They are preferred in practice over other probabilistic representations such as Bayesian and Markov networks because PCs can solve marginal inference (MAR) tasks in time that scales linearly in the size of the network. Unfortunately, the maximum-a-posteriori (MAP) and marginal MAP (MMAP) tasks remain NP-hard in these models. Inspired by the recent work on using neural networks for generating near-optimal solutions to optimization problems such as integer linear programming, we propose an approach that uses neural networks to approximate (M)MAP inference in PCs. The key idea in our approach is to approximate the cost of an assignment to the query variables using a continuous multilinear function, and then use the latter as a loss function. The two main benefits of our new method are that it is self-supervised and after the neural network is learned, it requires only linear time to output a solution. 
We evaluate our new approach on several benchmark datasets and show that it outperforms three competing linear time approximations, max-product inference, max-marginal inference and sequential estimation, which are used in practice to solve MMAP tasks in PCs.	翻訳日:2024-02-07 17:08:14 公開日:2024-02-06
# Self-Discover: 大規模言語モデルが推論構造を自己構成する Self-Discover: Large Language Models Self-Compose Reasoning Structures ( http://arxiv.org/abs/2402.03620v1 ) ライセンス: Link先を確認	Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng	(参考訳) 本稿では, LLMのための汎用フレームワークであるSELF-DISCOVERを導入する。これは,典型的なプロンプト手法では難しい複雑な推論問題に対処するために,タスクに内在する推論構造をLLM自身が発見するものである。フレームワークの中核は自己発見プロセスであり、LLMは批判的思考やステップバイステップ思考などの複数のアトミック推論モジュールを選択し、それらをデコード時に従うべき明示的な推論構造へと構成する。 SELF-DISCOVERは、BigBench-Hard、グラウンドドエージェント推論、MATHといった挑戦的な推論ベンチマークにおいて、GPT-4とPaLM 2の性能を、Chain of Thought (CoT)と比較して最大32%改善した。さらに、SELF-DISCOVERはCoT-Self-Consistencyのような推論集約的な手法を20%以上上回り、必要な推論計算量は10～40分の1である。最後に, 自己発見された推論構造は, PaLM 2-L から GPT-4 まで, GPT-4 から Llama2 まで, モデルファミリー全体にわたって普遍的に適用可能であり,人間の推論パターンと共通点を持つことを示す。 We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.	翻訳日:2024-02-07 17:07:54 公開日:2024-02-06
# ハイブリッド職場意思決定支援のための大規模言語モデル活用 Leveraging Large Language Models for Hybrid Workplace Decision Support ( http://arxiv.org/abs/2402.03616v1 ) ライセンス: Link先を確認	Yujin Kim, Chin-Chia Hsu	(参考訳) 大きな言語モデル(LLM)は、様々なテキスト処理タスクを実行し、提案されたアクションや決定に対してテキストによる説明を提供する可能性を持っている。ハイブリッドワークの時代において、LLMは、ハイブリッドワークプランを設計している労働者にインテリジェントな意思決定支援を提供することができる。特に、多くの意思決定要因のバランスをとる労働者に提案や説明を提供することで、作業経験を向上させることができる。本稿では,LLMの推論技術を活用した,ハイブリッド作業環境におけるワークスペースの決定支援モデルを提案する。まず、LLMが適切なワークスペース提案を行う能力について検討する。その推論はプロンプトのガイドラインを超えており、LLMはワークスペースで利用可能なリソース間のトレードオフを管理することができる。我々は,ワークスペース選択における作業者の意思決定過程を理解し,システムの有効性を評価するために,広範なユーザ調査を実施している。作業者の判断は, LLMの提案や説明に影響される可能性がある。本研究の参加者は, 理由の有無にかかわらず, 便利なシステムであることが確認された。この結果から,LLMを活用したワークスペース選択システムにより,従業員はハイブリッド職場におけるワークスペース選択のメリットを享受できることがわかった。 Large Language Models (LLMs) hold the potential to perform a variety of text processing tasks and provide textual explanations for proposed actions or decisions. In the era of hybrid work, LLMs can provide intelligent decision support for workers who are designing their hybrid work plans. In particular, they can offer suggestions and explanations to workers balancing numerous decision factors, thereby enhancing their work experience. In this paper, we present a decision support model for workspaces in hybrid work environments, leveraging the reasoning skill of LLMs. We first examine LLM's capability of making suitable workspace suggestions. We find that its reasoning extends beyond the guidelines in the prompt and the LLM can manage the trade-off among the available resources in the workspaces. We conduct an extensive user study to understand workers' decision process for workspace choices and evaluate the effectiveness of the system. We observe that a worker's decision could be influenced by the LLM's suggestions and explanations. The participants in our study find the system to be convenient, regardless of whether reasons are provided or not. 
Our results show that employees can benefit from the LLM-empowered system for their workspace selection in hybrid workplace.	翻訳日:2024-02-07 17:07:30 公開日:2024-02-06
# 多変量時系列データのためのベイズ因子分解グレンジャー因果グラフ Bayesian Factorised Granger-Causal Graphs For Multivariate Time-series Data ( http://arxiv.org/abs/2402.03614v1 ) ライセンス: Link先を確認	He Zhao and Edwin V. Bonilla	(参考訳) 本研究では,観測された多変量時系列データからグレンジャー因果関係を自動的に発見する問題について検討する。ベクトル自己回帰(VAR)モデルは、ベイズ的な変種や、より最近のディープニューラルネットワークを用いた発展を含め、この問題に対して長年にわたり実績がある。グレンジャー因果性のための既存のVAR法の多くは、スパース性を誘導するペナルティや事前分布、あるいは事後的な閾値処理を用いて、その係数をグレンジャー因果グラフとして解釈する。代わりに、VAR係数とは別に、二値グレンジャー因果グラフ上の階層的グラフ事前分布を持つ新しいベイズVARモデルを提案する。我々は,二値グレンジャー因果グラフの事後分布を推論する効率的なアルゴリズムを開発した。本手法は,より優れた不確かさの定量化を提供し,ハイパーパラメータが少なく,特に疎な多変量時系列データにおいて,競合するアプローチよりも優れた性能を実現する。 We study the problem of automatically discovering Granger causal relations from observational multivariate time-series data. Vector autoregressive (VAR) models have been time-tested for this problem, including Bayesian variants and more recent developments using deep neural networks. Most existing VAR methods for Granger causality use sparsity-inducing penalties/priors or post-hoc thresholds to interpret their coefficients as Granger causal graphs. Instead, we propose a new Bayesian VAR model with a hierarchical graph prior over binary Granger causal graphs, separately from the VAR coefficients. We develop an efficient algorithm to infer the posterior over binary Granger causal graphs. Our method provides better uncertainty quantification, has less hyperparameters, and achieves better performance than competing approaches, especially on sparse multivariate time-series data.	翻訳日:2024-02-07 17:07:12 公開日:2024-02-06
# RAP:マルチモーダルLLMエージェントのコンテキスト記憶による検索拡張計画 RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents ( http://arxiv.org/abs/2402.03610v1 ) ライセンス: Link先を確認	Tomoyuki Kagaya, Thong Jing Yuan, Yuxuan Lou, Jayashree Karlekar, Sugiri Pranata, Akira Kinose, Koki Oguri, Felix Wick, Yang You	(参考訳) 最近の進歩により、ロボット工学、ゲーム、API統合など、ますます複雑な意思決定アプリケーションのためのエージェントとして、LLM(Large Language Models)がデプロイできるようになった。しかし、人間の行動である現在の意思決定プロセスにおける過去の経験を反映して、大きな課題が生まれ続けている。そこで本稿では,現在状況や状況に応じた過去の経験を動的に活用し,エージェントの計画能力を向上するためのRAP(Retrieval-Augmented Planning)フレームワークを提案する。 rapは、テキストのみの環境とマルチモーダル環境の両方で優れているため、幅広いタスクに適しています。経験的評価は、テキストシナリオにおけるSOTA性能を実現し、具体的タスクに対するマルチモーダルLLMエージェントのパフォーマンスを顕著に向上するRAPの有効性を示す。これらの結果は、複雑な実世界のアプリケーションにおいて、LLMエージェントの機能と適用性を向上させるRAPの可能性を強調している。 Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in current decision-making processes, an innate human behavior, continues to pose significant challenges. Addressing this, we propose Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness, where it achieves SOTA performance in textual scenarios and notably enhances multimodal LLM agents' performance for embodied tasks. These results highlight RAP's potential in advancing the functionality and applicability of LLM agents in complex, real-world applications.	翻訳日:2024-02-07 17:06:57 公開日:2024-02-06
# 大運動量移動と発射原子を用いた点源干渉型慣性測定器の感度と帯域幅 Sensitivity and Bandwidth of a Point-Source-Interferometry-based Inertial Measurement Unit Employing Large Momentum Transfer and Launched Atoms ( http://arxiv.org/abs/2402.03608v1 ) ライセンス: Link先を確認	Jinyang Li, Timothy Kovachy, Jason Bonacum, Selim M. Shahriar	(参考訳) 本研究では,大質量運動量移動(lmt)を用いた点光源干渉計を用いて加速度計と回転センシングの感度と帯域を理論的に解析した。打ち上げプロセスにより、ラマンパルスの方向を物理的に変更することなくLMTプロセスを実現することができ、装置を著しく単純化し、測定に必要な時間を削減することができる。これらの利点は、3つの軸に沿って回転と加速度を計測できる慣性測定ユニット(IMU)を実現するためにこのプロセスが使われるとより重要になる。我々は、そのようなIMUの明示的なスキームを記述し、実験的にアクセス可能なパラメータに対する予測感度と帯域幅を決定する。 We analyze theoretically the sensitivity and bandwidth of accelerometry and rotation sensing with a point source interferometer employing large momentum transfer (LMT) with molasses-launched atoms. The launching process makes it possible to realize the LMT process without the need to physically change directions of the Raman pulses, thus significantly simplifying the apparatus and reducing the amount of time needed to make the measurements. These advantages become more important when this process is used for realizing an inertial measurement unit (IMU) that can measure rotation around and acceleration along each of the three axes. We describe an explicit scheme for a such an IMU and determine the expected sensitivity and bandwidth thereof for experimentally accessible parameters.	翻訳日:2024-02-07 17:06:41 公開日:2024-02-06
# 知識融合学習を用いた効果的なマルチモーダルマーケティングのためのモダリティ間の文脈一致の改善 Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning ( http://arxiv.org/abs/2402.03607v1 ) ライセンス: Link先を確認	Trilok Padhi, Ugur Kursuncu, Yaman Kumar, Valerie L. Shalin, Lane Peterson Fronczek	(参考訳) 複数のモーダルでモーメントをキャプチャできるスマートデバイスの普及により、ユーザはオンラインでマルチモーダル情報を体験できるようになった。しかし、大きな言語(LLM)とビジョンモデル(LVM)は、相反する意味関係を持つ全体的意味を捉えることにはまだ限界がある。明示的で常識的な知識(例えば知識グラフ)がなければ、視覚言語モデル(vlms)は、巨大なコーパスでハイレベルなパターンを捉えて暗黙的な表現のみを学習し、必須の文脈横断的手がかりを欠く。本研究では,ダウンストリームタスクの性能を向上させるために,知識グラフの形で明示的な常識知識を結合するフレームワークを設計し,マルチモーダルマーケティングキャンペーンの有効性を予測した。マーケティングアプリケーションは,提案手法を評価するための説得力のある指標を提供するが,本手法は,多モードキャンペーンの可能性を早期に検出し,マーケティング理論の評価と拡張を可能にする。 The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. However, large Language (LLMs) and Vision models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit, common sense knowledge (e.g., as a knowledge graph), Visual Language Models (VLMs) only learn implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task, predicting the effectiveness of multi-modal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multi-modal campaigns and the assessment and augmentation of marketing theory.	翻訳日:2024-02-07 17:06:30 公開日:2024-02-06
# 防衛と公共安全のためのモノのインターネットに関するレビュー A Review on Internet of Things for Defense and Public Safety ( http://arxiv.org/abs/2402.03599v1 ) ライセンス: Link先を確認	Paula Fraga-Lamas, Tiago M. Fern\'andez-Caram\'es, Manuel Su\'arez-Albela, Luis Castedo and Miguel Gonz\'alez-L\'opez	(参考訳) IoT(Internet of Things, モノのインターネット)は、組織が日常の業務や産業上の手続きを伝達し、組織化する方法を確実に変えつつある。その採用は、多数の資産を管理し、複雑な分散プロセスを調整するセクターに適していることが証明されている。この調査は、IoT技術(すなわち、データ駆動アプリケーションや、組み込み自動化およびインテリジェントな適応システム)を適用して現代の戦争に革命をもたらし、産業界と同様のメリットを提供する大きな可能性を分析する。コストを削減し運用の効率と有効性を高めつつ、防衛と公共安全(PS)が商用IoT機能をより有効に活用して、戦闘員やファーストレスポンダーの生存可能性を高められるシナリオを特定する。この記事では、主要な戦術的要件とアーキテクチャをレビューし、軍事分野とミッションクリティカルなシナリオにおける既存のIoTシステムのギャップと欠点を検討する。このレビューでは、広範な展開に向けた未解決の課題を特徴付け、防衛とPSのための手頃なIoTを実現する研究ロードマップを提示する。 The Internet of Things (IoT) is undeniably transforming the way that organizations communicate and organize everyday businesses and industrial procedures. Its adoption has proven well suited for sectors that manage a large number of assets and coordinate complex and distributed processes. This survey analyzes the great potential for applying IoT technologies (i.e., data-driven applications or embedded automation and intelligent adaptive systems) to revolutionize modern warfare and provide benefits similar to those in industry. It identifies scenarios where Defense and Public Safety (PS) could leverage better commercial IoT capabilities to deliver greater survivability to the warfighter or first responders, while reducing costs and increasing operation efficiency and effectiveness. This article reviews the main tactical requirements and the architecture, examining gaps and shortcomings in existing IoT systems across the military field and mission-critical scenarios. The review characterizes the open challenges for a broad deployment and presents a research roadmap for enabling an affordable IoT for defense and PS.	翻訳日:2024-02-07 17:06:11 公開日:2024-02-06
# 部分グロモフ・ワッサーシュタインの効率的な解法 Efficient Solvers for Partial Gromov-Wasserstein ( http://arxiv.org/abs/2402.03664v1 ) ライセンス: Link先を確認	Yikun Bai, Rocio Diaz Martin, Hengrong Du, Ashkan Shahbazi, and Soheil Kolouri	(参考訳) 部分グロモフ=ワッサーシュタイン(PGW)問題は、異なる可能性のある距離空間上に存在する質量の等しくない測度同士の比較を容易にし、これらの空間間の不均衡かつ部分的なマッチングを可能にする。本稿では, 部分最適輸送問題が最適輸送問題に変換されるのと同様に,PGW問題をGromov-Wasserstein問題の変種に変換できることを示す。この変換は、フランク・ウルフアルゴリズムに基づく、数学的にも計算的にも等価な2つの新しい解法につながり、PGW問題の効率的な解を与える。さらに、PGW問題が距離測度空間に対する距離を構成することを示す。最後に,形状マッチングおよびpositive-unlabeled学習の問題において,提案する解法の有効性を計算時間と性能の観点から検証し,既存のベースラインと比較する。 The partial Gromov-Wasserstein (PGW) problem facilitates the comparison of measures with unequal masses residing in potentially distinct metric spaces, thereby enabling unbalanced and partial matching across these spaces. In this paper, we demonstrate that the PGW problem can be transformed into a variant of the Gromov-Wasserstein problem, akin to the conversion of the partial optimal transport problem into an optimal transport problem. This transformation leads to two new solvers, mathematically and computationally equivalent, based on the Frank-Wolfe algorithm, that provide efficient solutions to the PGW problem. We further establish that the PGW problem constitutes a metric for metric measure spaces. Finally, we validate the effectiveness of our proposed solvers in terms of computation time and performance on shape-matching and positive-unlabeled learning problems, comparing them against existing baselines.	翻訳日:2024-02-07 17:00:03 公開日:2024-02-06
# シンボリック層を含むディープニューラルネットワークにおけるシンボルの正しさ Symbol Correctness in Deep Neural Networks Containing Symbolic Layers ( http://arxiv.org/abs/2402.03663v1 ) ライセンス: Link先を確認	Aaron Bembenek, Toby Murray	(参考訳) 知覚と論理的推論を組み合わせたAIタスクを扱うために、最近の研究では、従来のニューラルネットワーク層に加えて、推論時にシンボリックソルバによって評価されるシンボリック表現(SAT式、論理プログラムなど)であるシンボリック層を含むニューロシンボリックディープニューラルネットワーク(NS-DNN)を導入している。我々は,NS-DNNの設計と分析を導く直感的かつ高レベルな原理として,シンボルの正しさ,すなわち,入力データの(一般には未知の)正解のシンボリック表現に対する,ニューラルネットワーク層が予測する中間シンボルの正しさを特定し,定式化する。シンボルの正しさは(一般にはそれを目的として訓練することが不可能であるにもかかわらず)NS-DNNの説明可能性と転移学習に必要な特性であることを示す。さらに,シンボルの正しさの枠組みは,ニューラルとシンボリックの境界におけるモデルの振る舞いについて推論し伝達するための正確な方法を提供し,NS-DNNトレーニングアルゴリズムが直面する基本的なトレードオフへの洞察を与えることを示す。これにより,先行研究における重要な曖昧さを特定するとともに,NS-DNNのさらなる発展を支援する枠組みを提供する。 To handle AI tasks that combine perception and logical reasoning, recent work introduces Neurosymbolic Deep Neural Networks (NS-DNNs), which contain -- in addition to traditional neural layers -- symbolic layers: symbolic expressions (e.g., SAT formulas, logic programs) that are evaluated by symbolic solvers during inference. We identify and formalize an intuitive, high-level principle that can guide the design and analysis of NS-DNNs: symbol correctness, the correctness of the intermediate symbols predicted by the neural layers with respect to a (generally unknown) ground-truth symbolic representation of the input data. We demonstrate that symbol correctness is a necessary property for NS-DNN explainability and transfer learning (despite being in general impossible to train for). Moreover, we show that the framework of symbol correctness provides a precise way to reason and communicate about model behavior at neural-symbolic boundaries, and gives insight into the fundamental tradeoffs faced by NS-DNN training algorithms. In doing so, we both identify significant points of ambiguity in prior work, and provide a framework to support further NS-DNN developments.	翻訳日:2024-02-07 16:59:48 公開日:2024-02-06
# グラフ上のトランスダクティブ報酬推論 Transductive Reward Inference on Graph ( http://arxiv.org/abs/2402.03661v1 ) ライセンス: Link先を確認	Bohao Qu, Xiaofeng Cao, Qing Guo, Yi Chang, Ivor W. Tsang, Chengqi Zhang	(参考訳) 本研究では,報酬情報伝播グラフ上のトランスダクティブ推論手法を提案し,オフライン強化学習においてラベルなしデータに対する報酬を効果的に推定することを可能にする。医療やロボティクスのように、環境との直接的な相互作用が高コストまたは非倫理的であり、報酬関数にほとんどアクセスできない実用的なシナリオでは、報酬推論が効果的なポリシーを学ぶための鍵となる。本研究では,グラフ上の情報伝播の文脈的特性に基づき,限られた数の人間による報酬アノテーションを活用してラベルなしデータに対する報酬を推論する報酬推論手法の開発に焦点を当てる。我々は、利用可能なデータと限られた報酬アノテーションの両方を利用して報酬伝播グラフを構築し、そのエッジ重みには報酬に関連するさまざまな影響要因を組み込む。続いて、構築したグラフをトランスダクティブ報酬推論に用い,ラベルなしデータに対する報酬を推定する。さらに,トランスダクティブ推論過程の反復において不動点が存在することを確立し,少なくとも局所最適解に収束することを示す。歩行とロボット操作タスクに関する経験的評価は,このアプローチの有効性を検証する。推定された報酬の適用により,オフライン強化学習タスクの性能が向上する。 In this study, we present a transductive inference approach on that reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is the key to learning effective policies in practical scenarios, while direct environmental interactions are either too costly or unethical and the reward functions are rarely accessible, such as in healthcare and robotics. Our research focuses on developing a reward inference method based on the contextual properties of information propagation on graphs that capitalizes on a constrained number of human reward annotations to infer rewards for unlabelled data. We leverage both the available data and limited reward annotations to construct a reward propagation graph, wherein the edge weights incorporate various influential factors pertaining to the rewards. Subsequently, we employ the constructed graph for transductive reward inference, thereby estimating rewards for unlabelled data. Furthermore, we establish the existence of a fixed point during several iterations of the transductive inference process and demonstrate its at least convergence to a local optimum. Empirical evaluations on locomotion and robotic manipulation tasks validate the effectiveness of our approach. 
The application of our inferred rewards improves the performance in offline reinforcement learning tasks.	翻訳日:2024-02-07 16:59:28 公開日:2024-02-06
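The idea of propagating a few annotated rewards over a graph until a fixed point is reached can be illustrated with a generic label-propagation sketch. This is not the paper's algorithm: the function name, the plain weighted-average update rule and the four-node chain graph are assumptions made for illustration only, and the paper's edge weights (which encode reward-related factors) are replaced here by unit weights.

```python
def propagate_rewards(weights, labeled, n_iters=200, tol=1e-10):
    """Generic label propagation over a weighted graph.

    weights: n x n list of non-negative edge weights (hypothetical input format).
    labeled: {node_index: annotated_reward}; these values are kept fixed.
    Returns a list with one inferred reward per node.
    """
    n = len(weights)
    rewards = [labeled.get(i, 0.0) for i in range(n)]
    for _ in range(n_iters):
        updated = []
        for i in range(n):
            if i in labeled:
                updated.append(labeled[i])  # clamp annotated nodes
                continue
            degree = sum(weights[i])
            if degree == 0:
                updated.append(rewards[i])  # isolated node: nothing to propagate
            else:
                # Weighted average of the neighbours' current rewards.
                updated.append(sum(w * r for w, r in zip(weights[i], rewards)) / degree)
        change = max(abs(a - b) for a, b in zip(updated, rewards))
        rewards = updated
        if change < tol:  # the iteration has reached a fixed point
            break
    return rewards


# Hypothetical chain graph 0 - 1 - 2 - 3 with unit edge weights.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
inferred = propagate_rewards(chain, labeled={0: 0.0, 3: 1.0})
```

On this toy chain the unlabelled nodes settle at values that interpolate between the two annotated endpoints, which is the fixed point of the averaging update. A real reward graph would be far larger and would need weights reflecting similarity between state-action pairs.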
# 事前学習・ファインチューニングのパラダイムにおけるクロスタスク線形性の出現 Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm ( http://arxiv.org/abs/2402.03660v1 ) ライセンス: Link先を確認	Zhanpeng Zhou, Zijun Chen, Yilan Chen, Bo Zhang, Junchi Yan	(参考訳) 事前学習・ファインチューニングのパラダイムは、現代のディープラーニングの主流となっている。本研究では,共通の事前学習済みチェックポイントから初期化され,異なるタスクでファインチューニングされたモデルにおける興味深い線形現象を発見し,これをクロスタスク線形性(CTL)と呼ぶ。具体的には、2つのファインチューニング済みモデルの重みを線形に補間すると、重み補間モデルの特徴は、各層において2つのファインチューニング済みモデルの特徴の線形補間とほぼ等しくなる。このようなクロスタスク線形性は、既存の文献では指摘されていない。我々は、同じ事前学習済みチェックポイントから始まるファインチューニング済みモデルに対してCTLが一貫して発生することを裏付ける包括的な実証的証拠を提供する。事前学習・ファインチューニングのパラダイムでは、ニューラルネットワークは本質的に、パラメータ空間から特徴空間への線形写像として機能すると推測する。この観点に基づき,本研究では,特にパラメータ空間での操作を特徴空間へ変換することによって,モデルマージ/編集の説明に関する新たな知見を明らかにする。さらに,CTLの出現の根底にある要因を深く掘り下げ,事前学習の影響を強調する。 The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed as Cross-Task Linearity (CTL). Specifically, if we linearly interpolate the weights of two finetuned models, the features in the weight-interpolated model are approximately equal to the linear interpolation of features in two finetuned models at each layer. Such cross-task linearity has not been noted in peer literature. We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks essentially function as linear maps, mapping from the parameter space to the feature space. Based on this viewpoint, our study unveils novel insights into explaining model merging/editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the underlying factors for the emergence of CTL, emphasizing the impact of pretraining.	翻訳日:2024-02-07 16:59:10 公開日:2024-02-06
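The CTL statement in the abstract above compares two quantities: the features of the weight-interpolated model, and the interpolation of the two models' features. The sketch below measures the gap between them for a single toy layer. It is only an illustration of the definition under stated assumptions: the helper names, the 2x2 weight matrices and the input are hypothetical, and the toy weights are arbitrary numbers rather than finetuned checkpoints.

```python
def linear_layer(weights, x):
    """Features of a bias-free linear layer: one dot product per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu_layer(weights, x):
    """Same layer followed by a ReLU non-linearity."""
    return [max(0.0, v) for v in linear_layer(weights, x)]


def lerp_vec(u, v, alpha):
    """Element-wise alpha * u + (1 - alpha) * v for two equal-length vectors."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(u, v)]


def lerp_mat(a, b, alpha):
    """Row-wise interpolation of two equally shaped weight matrices."""
    return [lerp_vec(row_a, row_b, alpha) for row_a, row_b in zip(a, b)]


def ctl_gap(theta_i, theta_j, x, alpha, features):
    """Largest absolute difference between the features of the weight-interpolated
    model and the interpolation of the two models' features."""
    merged = features(lerp_mat(theta_i, theta_j, alpha), x)
    mixed = lerp_vec(features(theta_i, x), features(theta_j, x), alpha)
    return max(abs(a - b) for a, b in zip(merged, mixed))


# Hypothetical toy weights and input.
theta_i = [[1.0, 2.0], [0.5, -1.0]]
theta_j = [[0.0, 1.0], [2.0, 3.0]]
x = [1.0, -2.0]

linear_gap = ctl_gap(theta_i, theta_j, x, alpha=0.3, features=linear_layer)
relu_gap = ctl_gap(theta_i, theta_j, x, alpha=0.3, features=relu_layer)
```

For the purely linear layer the gap vanishes up to floating-point error, because interpolation commutes with a linear map. With the ReLU the same arbitrary weights give a clearly non-zero gap, so the property is not automatic for non-linear networks. That contrast is what makes the abstract's empirical claim about finetuned models notable.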
# 自己回帰型大言語モデルを用いた説明可能な株価予測の学習 Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models ( http://arxiv.org/abs/2402.03659v1 ) ライセンス: Link先を確認	Kelvin J.L. Koa, Yunshan Ma, Ritchie Ng, Tat-Seng Chua	(参考訳) ストック予測を説明することは、従来の非生成的ディープラーニングモデルでは一般的に難しいタスクであり、重要なテキストに対する注意重みを視覚化することに限定されている。今日、Large Language Models (LLM) は、意思決定プロセスのための人間可読な説明を生成する既知の能力から、この問題に対する解決策を提示している。しかし、株価にカオス的なソーシャルテキストが与える影響を測る能力が必要となるため、株価予測の課題は依然としてllmsにとって困難である。この問題は説明コンポーネントの導入によって徐々に難しくなり、llmはなぜ特定の要因が他の要素よりも重要であるのかを口頭で説明する必要がある。一方で,このような課題に対してllmを微調整するには,トレーニングセット内の各ストック移動に対して,専門家による説明のサンプルが必要となる。これらの課題に対処するために,LLMが説明可能な株価予測を完全自律的に生成する方法を教えるために,自己回帰エージェントとPPO(Proximal Policy Optimization)を利用したSEP(Summarize-Explain-Predict)フレームワークを提案する。反射剤は自己推論によって過去の株価の動きを説明する方法を学び、PPOトレーナーは入力テキストから最も可能性の高い説明を生成するためにモデルを訓練する。 PPOトレーナーのトレーニングサンプルは、反射過程中に生成された応答であり、人間のアノテータの必要性を排除している。 SEPフレームワークを用いて,従来の深層学習法とLLM法の両方を予測精度,およびストック分類タスクに対するマシューズ相関係数で上回り得るLLMを微調整する。フレームワークの一般化能力を正当化するため、ポートフォリオ構築タスクでさらにテストし、さまざまなポートフォリオメトリクスを通してその効果を実証する。 Explaining stock predictions is generally a difficult task for traditional non-generative deep learning models, where explanations are limited to visualizing the attention weights on important texts. Today, Large Language Models (LLMs) present a solution to this problem, given their known capabilities to generate human-readable explanations for their decision-making process. However, the task of stock prediction remains challenging for LLMs, as it requires the ability to weigh the varying impacts of chaotic social texts on stock prices. The problem gets progressively harder with the introduction of the explanation component, which requires LLMs to explain verbally why certain factors are more important than the others. On the other hand, to fine-tune LLMs for such a task, one would need expert-annotated samples of explanation for every stock movement in the training set, which is expensive and impractical to scale. 
To tackle these issues, we propose our Summarize-Explain-Predict (SEP) framework, which utilizes a self-reflective agent and Proximal Policy Optimization (PPO) to let a LLM teach itself how to generate explainable stock predictions in a fully autonomous manner. The reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations from input texts. The training samples for the PPO trainer are also the responses generated during the reflective process, which eliminates the need for human annotators. Using our SEP framework, we fine-tune a LLM that can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient for the stock classification task. To justify the generalization capability of our framework, we further test it on the portfolio construction task, and demonstrate its effectiveness through various portfolio metrics.	翻訳日:2024-02-07 16:58:51 公開日:2024-02-06
# 対話における知覚強調グラフに基づくサルカズム記述 Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue ( http://arxiv.org/abs/2402.03658v1 ) ライセンス: Link先を確認	Kun Ouyang and Liqiang Jing and Xuemeng Song and Meng Liu and Yupeng Hu and Liqiang Nie	(参考訳) sed(sarcasm description in dialogue)は、複数のモーダリティ(発話、ビデオ、音声など)を含む、与えられたサルカスティックな対話に対して自然言語による説明を生成することを目的とした、新しい挑戦的なタスクである。既存の研究は、生成事前訓練された言語モデルであるBARTに基づいて大きな成功を収めてきたが、彼らは、発声、ビデオ、音声にまつわる感情を利用して、皮肉な説明の重要な手がかりを見落としている。実際、3つの大きな課題があるため、sedのパフォーマンスを高めるために感情を組み込むことは自明ではありません。 1) 発話トークンの感情に対する多様な影響 2)ビデオ音声の感情信号とBARTの埋め込み空間とのギャップ 3)発話,発話感情,映像音声感情のさまざまな関係これらの課題に対処するために, EDGE という新しい sEntiment-enhanceD Graph-based multimodal sarcasm Explanation フレームワークを提案する。特に,我々はまず,ヒューリスティックな発話感情改善戦略を考案した語彙誘導型発話感情推論モジュールを提案する。次に,マルチモーダル感情分析モデル JCA を拡張し,映像音声クリップ毎に共同感情ラベルを導出することにより,JCA-SI (Joint Cross Attention-based Sentiment Inference) というモジュールを開発する。その後, 発話, 発話感情, 音声感情間の意味関係を包括的にモデル化する文脈感グラフを考案し, 皮肉な説明生成を容易にする。一般公開されたデータセットWITSの大規模な実験は、最先端の手法よりもモデルの優位性を検証する。 Sarcasm Explanation in Dialogue (SED) is a new yet challenging task, which aims to generate a natural language explanation for the given sarcastic dialogue that involves multiple modalities (i.e., utterance, video, and audio). Although existing studies have achieved great success based on the generative pretrained language model BART, they overlook exploiting the sentiments residing in the utterance, video and audio, which are vital clues for sarcasm explanation. In fact, it is non-trivial to incorporate sentiments for boosting SED performance, due to three main challenges: 1) diverse effects of utterance tokens on sentiments; 2) gap between video-audio sentiment signals and the embedding space of BART; and 3) various relations among utterances, utterance sentiments, and video-audio sentiments. To tackle these challenges, we propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE. 
In particular, we first propose a lexicon-guided utterance sentiment inference module, where a heuristic utterance sentiment refinement strategy is devised. We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip. Thereafter, we devise a context-sentiment graph to comprehensively model the semantic relations among the utterances, utterance sentiments, and video-audio sentiments, to facilitate sarcasm explanation generation. Extensive experiments on the publicly released dataset WITS verify the superiority of our model over cutting-edge methods.	翻訳日:2024-02-07 16:58:20 公開日:2024-02-06
# Nested Low-Rank Approximationによるニューラルネットワークを用いた演算子SVD Operator SVD with Neural Networks via Nested Low-Rank Approximation ( http://arxiv.org/abs/2402.03655v1 ) ライセンス: Link先を確認	J. Jon Ryu, Xiangxiang Xu, H. S. Melihcan Erol, Yuheng Bu, Lizhong Zheng, Gregory W. Wornell	(参考訳) 与えられた線形作用素の固有値分解(EVD)を計算したり、その主要な固有値や固有関数を見つけることは、多くの機械学習および科学計算問題において基本的な課題である。高次元固有値問題に対して、固有関数をパラメータ化するためのニューラルネットワークの訓練は、古典的な数値線形代数手法の代替として有望であると考えられている。本稿では,停止特異値分解の低ランク近似解析に基づく新しい最適化フレームワークを提案し,それとともに,最大$l$特異値と特異関数を正しい順序で学習するためのネスティングと呼ばれる新しい手法を提案する。提案手法は,非制約最適化の定式化により,学習関数における所望の直交性を暗黙的かつ効率的に促進する。本稿では,計算物理学と機械学習のユースケースに対する最適化フレームワークの有効性を示す。 Computing eigenvalue decomposition (EVD) of a given linear operator, or finding its leading eigenvalues and eigenfunctions, is a fundamental task in many machine learning and scientific computing problems. For high-dimensional eigenvalue problems, training neural networks to parameterize the eigenfunctions is considered as a promising alternative to the classical numerical linear algebra techniques. This paper proposes a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition, accompanied by new techniques called nesting for learning the top-$L$ singular values and singular functions in the correct order. The proposed method promotes the desired orthogonality in the learned functions implicitly and efficiently via an unconstrained optimization formulation, which is easy to solve with off-the-shelf gradient-based optimization algorithms. We demonstrate the effectiveness of the proposed optimization framework for use cases in computational physics and machine learning.	翻訳日:2024-02-07 16:57:52 公開日:2024-02-06
# 敵対的生成ネットワークにおけるFIDとSIDメトリクスのレビュー Reviewing FID and SID Metrics on Generative Adversarial Networks ( http://arxiv.org/abs/2402.03654v1 ) ライセンス: Link先を確認	Ricardo de Deijn, Aishwarya Batra, Brandon Koch, Naseef Mansoor, Hema Makkena	(参考訳) generative adversarial network (GAN)モデルの成長は画像処理の能力を高め、多くの産業に現実的な画像変換を生み出す技術を提供している。しかし、最近この分野が確立されているため、この研究をさらに進める新たな評価指標が存在する。これまでの研究では、Fréchet Inception Distance (FID) が実世界のアプリケーションで画像から画像へのGANをテストする上で有効な指標であることが示されている。 2023年に提案されたSID(Signed Inception Distance)は、符号なし距離を許すことでFIDを拡張する。本稿では, Pix2PixおよびCycleGANモデル内のfaçades, cityscapes, mapsからなる公開データセットを使用する。トレーニング後、これらのモデルは、トレーニングされたモデルの生成性能を測定する両方のインセプション距離指標に基づいて評価される。以上の結果から,SIDは画像から画像へのGANにFIDを用いて示される能力を補完したり,あるいは超えたりする,効率的かつ効果的な指標であることが示唆された。 The growth of generative adversarial network (GAN) models has increased the ability of image processing and provides numerous industries with the technology to produce realistic image transformations. However, with the field being recently established, there are new evaluation metrics that can further this research. Previous research has shown the Fréchet Inception Distance (FID) to be an effective metric when testing these image-to-image GANs in real-world applications. Signed Inception Distance (SID), a metric introduced in 2023, expands on FID by allowing unsigned distances. This paper uses public datasets that consist of façades, cityscapes, and maps within Pix2Pix and CycleGAN models. After training, these models are evaluated on both inception distance metrics, which measure the generating performance of the trained models. Our findings indicate that SID is an efficient and effective metric that complements, or even exceeds, the ability shown by FID for image-to-image GANs.	翻訳日:2024-02-07 16:57:35 公開日:2024-02-06
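For readers unfamiliar with FID, the quantity it is built on is the Fréchet distance between two Gaussians, $\|\mu_1-\mu_2\|^2 + \mathrm{Tr}(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2})$. The sketch below assumes diagonal covariances, which reduces the trace term to a per-dimension sum; this simplification is an assumption made here so the example needs no linear-algebra library. Practical FID implementations use full covariance matrices of Inception features. The function name is hypothetical, and SID is not sketched because the abstract does not define it.

```python
import math


def frechet_distance_diag(mu1, var1, mu2, var2):
    """Squared Fréchet distance between two Gaussians with diagonal covariance.

    mu1, mu2: per-dimension means; var1, var2: per-dimension variances.
    With diagonal covariances, Tr(S1 + S2 - 2 (S1 S2)^(1/2)) becomes
    sum_i (v1_i + v2_i - 2 * sqrt(v1_i * v2_i)).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2) for v1, v2 in zip(var1, var2))
    return mean_term + cov_term


# Identical statistics give distance 0; a pure mean shift gives the squared shift.
same = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
shifted = frechet_distance_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
```

In an evaluation like the one described above, the two sets of statistics would come from features of real images and of generated images respectively.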
# TGXを用いた時間グラフ解析 Temporal Graph Analysis with TGX ( http://arxiv.org/abs/2402.03651v1 ) ライセンス: Link先を確認	Razieh Shirzadkhani, Shenyang Huang, Elahe Kooshafar, Reihaneh Rabbany, Farimah Poursafaei	(参考訳) 現実世界のネットワークは進化する関係を持ち、時間グラフとして最もよく捉えられている。しかし、既存のソフトウェアライブラリは、時間グラフの動的性質が無視される静的グラフのために主に設計されている。このギャップを埋めて、データ読み込み、データ処理、進化するグラフの自動パイプラインを含む時間的ネットワークの分析に特化して設計されたPythonパッケージであるTGXを紹介する。 TGXは、11の組込みデータセットと8つの外部テンポラルグラフベンチマーク(TGB)データセット、および.NET Frameworkの新しいデータセットへのアクセスを提供する。 csvフォーマット。データローディング以外にも、TGXは、時間グラフの離散化やノードサブサンプリングといったデータ処理機能を促進して、より大きなデータセットの処理を高速化する。網羅的な調査のために、TGXは、平均ノード度と、タイムスタンプ当たりのノード数とエッジの進化数を含む、さまざまな測定値を提供することで、ネットワーク分析を提供する。さらに、パッケージは、時間的エッジ外観(tea)や時間的エッジトラフィック(tet)プロットのような時間的パターンの進化を示す有意義な可視化プロットを統合する。 TGXパッケージは、時間グラフの特徴を調べるための堅牢なツールであり、ソーシャルネットワークの研究、引用ネットワークの研究、ユーザインタラクションの追跡など、さまざまな分野で使用することができる。コミュニティのフィードバックに基づいてTGXを継続的にサポートし、更新する予定です。 TGXは、https://github.com/ComplexData-MILA/TGXで公開されている。 Real-world networks, with their evolving relations, are best captured as temporal graphs. However, existing software libraries are largely designed for static graphs where the dynamic nature of temporal graphs is ignored. Bridging this gap, we introduce TGX, a Python package specially designed for analysis of temporal networks that encompasses an automated pipeline for data loading, data processing, and analysis of evolving graphs. TGX provides access to eleven built-in datasets and eight external Temporal Graph Benchmark (TGB) datasets as well as any novel datasets in the .csv format. Beyond data loading, TGX facilitates data processing functionalities such as discretization of temporal graphs and node subsampling to accelerate working with larger datasets. For comprehensive investigation, TGX offers network analysis by providing a diverse set of measures, including average node degree and the evolving number of nodes and edges per timestamp. 
Additionally, the package consolidates meaningful visualization plots indicating the evolution of temporal patterns, such as Temporal Edge Appearance (TEA) and Temporal Edge Traffic (TET) plots. The TGX package is a robust tool for examining the features of temporal graphs and can be used in various areas, such as the study of social networks and citation networks and the tracking of user interactions. We plan to continuously support and update TGX based on community feedback. TGX is publicly available at https://github.com/ComplexData-MILA/TGX.	翻訳日:2024-02-07 16:57:14 公開日:2024-02-06
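To make the ideas of discretization and per-timestamp statistics concrete, here is a minimal standard-library sketch. It deliberately does not use the TGX API; the function names, the `(src, dst, timestamp)` edge format, and the choice to treat edges as undirected when computing average degree are all assumptions made for illustration.

```python
from collections import defaultdict


def discretize(edges, bin_width):
    """Group (src, dst, timestamp) edges into snapshots of equal time width."""
    snapshots = defaultdict(set)
    for src, dst, timestamp in edges:
        # Integer division maps each timestamp to the index of its time bin.
        snapshots[timestamp // bin_width].add((src, dst))
    return dict(snapshots)


def snapshot_stats(snapshots):
    """Count nodes and edges per snapshot and compute average degree as 2E/N."""
    stats = {}
    for index, edge_set in sorted(snapshots.items()):
        nodes = {node for edge in edge_set for node in edge}
        stats[index] = {
            "nodes": len(nodes),
            "edges": len(edge_set),
            "avg_degree": 2 * len(edge_set) / len(nodes),
        }
    return stats


# Hypothetical interaction log with integer timestamps.
edges = [("a", "b", 0), ("b", "c", 5), ("a", "b", 7), ("c", "d", 12)]
stats = snapshot_stats(discretize(edges, bin_width=10))
```

A coarser `bin_width` yields fewer, denser snapshots, which is the trade-off that discretization exposes when working with larger datasets.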
# マニフォールド学習によるマルチ線形カーネル回帰とインプット Multilinear Kernel Regression and Imputation via Manifold Learning ( http://arxiv.org/abs/2402.03648v1 ) ライセンス: Link先を確認	Duc Thien Nguyen and Konstantinos Slavakis	(参考訳) 本稿では,データインプテーションのための新しい非パラメトリックフレームワークであるマルチリニアカーネル回帰(multil-krim)とインプテーション(imputation)を提案する。多様体学習によって動機づけられたMultiL-KRIMは、再現されたカーネルヒルベルト空間に埋め込まれたユーザ不明の滑らかな多様体の内または近くに位置する点雲としてのデータ特徴をモデル化する。グラフ-ラプラシア行列に基づく正規化子による低次元パターンを求める典型的な多様体学習経路とは異なり、MultiL-KRIMは、多様体への接空間の直感的な概念に基づいて構築し、損失関数のデータモデリング項に直接、ポイントクラウド隣人(回帰者)間の協調を組み込む。複数のカーネル関数はロバスト性とリッチな近似性を提供し、複数の行列因子は低ランクのモデリング、次元の縮小、データのトレーニング不要な合理化計算を提供する。 2つの重要なアプリケーションドメインはMultiL-KRIMの機能を示す: 時間変化グラフ信号(TVGS)リカバリと、高速な動的磁気共鳴イメージング(dMRI)データの再構成である。実データおよび合成データに対する大規模な数値実験は、MultiL-KRIMが前者よりも顕著なスピードアップを示し、より直感的で説明しやすいパイプラインで、一般的な「浅すぎる」データインプット技術よりも性能が優れていることを示している。 This paper introduces a novel nonparametric framework for data imputation, coined multilinear kernel regression and imputation via the manifold assumption (MultiL-KRIM). Motivated by manifold learning, MultiL-KRIM models data features as a point cloud located in or close to a user-unknown smooth manifold embedded in a reproducing kernel Hilbert space. Unlike typical manifold-learning routes, which seek low-dimensional patterns via regularizers based on graph-Laplacian matrices, MultiL-KRIM builds instead on the intuitive concept of tangent spaces to manifolds and incorporates collaboration among point-cloud neighbors (regressors) directly into the data-modeling term of the loss function. Multiple kernel functions are allowed to offer robustness and rich approximation properties, while multiple matrix factors offer low-rank modeling, integrate dimensionality reduction, and streamline computations with no need of training data. Two important application domains showcase the functionality of MultiL-KRIM: time-varying-graph-signal (TVGS) recovery, and reconstruction of highly accelerated dynamic-magnetic-resonance-imaging (dMRI) data. 
Extensive numerical tests on real and synthetic data demonstrate MultiL-KRIM's remarkable speedups over its predecessors, and outperformance over prevalent "shallow" data-imputation techniques, with a more intuitive and explainable pipeline than deep-image-prior methods.	翻訳日:2024-02-07 16:56:51 公開日:2024-02-06
# CAMBranch: ブランチのための拡張MILPによるコントラスト学習 CAMBranch: Contrastive Learning with Augmented MILPs for Branching ( http://arxiv.org/abs/2402.03647v1 ) ライセンス: Link先を確認	Jiacheng Lin, Meng Xu, Zhihua Xiong, Huangang Wang	(参考訳) 最近の進歩は、Mixed Integer Linear Programming(MILP)を解決するための分枝限定法(B&B)の分岐ポリシーを強化する機械学習フレームワークを導入している。これらの手法は主にStrong Branchingの模倣学習に依存しており、優れた性能を示している。しかし、模倣学習、特にStrong Branchingのための専門家サンプルの収集は時間のかかる作業である。この課題に対処するために,元のMILPから得られた限られた専門家データに変数シフトを適用してAugmented MILP(AMILP)を生成するフレームワーク,Contrastive Learning with Augmented MILPs for Branching(CAMBranch)を提案する。このアプローチは、かなりの数のラベル付きエキスパートサンプルの取得を可能にする。 CAMBranchはMILPとAMILPの両方を模倣学習に利用し、対照学習を用いてMILPの特徴を捉える能力を高め、分岐決定の質を向上させる。実験の結果、完全なデータセットの10%のみでトレーニングされたCAMBranchは優れた性能を示すことがわかった。アブレーション研究は我々の方法の有効性をさらに検証する。 Recent advancements have introduced machine learning frameworks to enhance the Branch and Bound (B&B) branching policies for solving Mixed Integer Linear Programming (MILP). These methods, primarily relying on imitation learning of Strong Branching, have shown superior performance. However, collecting expert samples for imitation learning, particularly for Strong Branching, is a time-consuming endeavor. To address this challenge, we propose Contrastive Learning with Augmented MILPs for Branching (CAMBranch), a framework that generates Augmented MILPs (AMILPs) by applying variable shifting to limited expert data from their original MILPs. This approach enables the acquisition of a considerable number of labeled expert samples. CAMBranch leverages both MILPs and AMILPs for imitation learning and employs contrastive learning to enhance the model's ability to capture MILP features, thereby improving the quality of branching decisions. Experimental results demonstrate that CAMBranch, trained with only 10% of the complete dataset, exhibits superior performance. Ablation studies further validate the effectiveness of our method.	翻訳日:2024-02-07 16:56:22 公開日:2024-02-06
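The idea behind variable shifting can be illustrated with a generic change of variables. For a problem $\min c^\top x$ subject to $Ax \le b$, substituting $x = x' + s$ gives the constraints $Ax' \le b - As$ and the objective $c^\top x' + c^\top s$, so feasible points and their ordering by objective value carry over. The sketch below shows only this textbook substitution; whether the paper's AMILP construction matches it in detail is an assumption, and all function names are hypothetical.

```python
def shift_milp(c, A, b, s):
    """Rewrite min c.x s.t. A x <= b under the substitution x = x_new + s.

    Returns (c, A, b_shifted, offset): the shifted problem is
    min c.x_new + offset s.t. A x_new <= b_shifted.
    """
    b_shifted = [bi - sum(aij * sj for aij, sj in zip(row, s)) for row, bi in zip(A, b)]
    offset = sum(ci * si for ci, si in zip(c, s))
    return c, A, b_shifted, offset


def is_feasible(A, b, x, tol=1e-9):
    """Check A x <= b row by row, with a small numerical tolerance."""
    return all(sum(aij * xj for aij, xj in zip(row, x)) <= bi + tol for row, bi in zip(A, b))


def objective(c, x):
    """Linear objective value c.x."""
    return sum(ci * xi for ci, xi in zip(c, x))


# Hypothetical two-variable instance; an integer shift keeps integrality intact.
c, A, b = [1, 2], [[1, 1], [1, -1]], [4, 1]
s = [1, -2]
x = [2, 1]                                   # feasible in the original problem
x_new = [xi - si for xi, si in zip(x, s)]    # the same point in shifted coordinates
_, _, b_shifted, offset = shift_milp(c, A, b, s)
```

Because the shifted instance is equivalent to the original, a label collected on the original (for example, an expert branching decision) can plausibly be reused for the shifted one, which is the motivation the abstract gives for augmenting scarce expert data.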
# Lens: ネットワークトラフィックの基礎モデル Lens: A Foundation Model for Network Traffic ( http://arxiv.org/abs/2402.03646v1 ) ライセンス: Link先を確認	Qineng Wang, Chen Qian, Xiaochang Li, Ziyu Yao, Huajie Shao	(参考訳) ネットワークトラフィック(ネットワークトラフィック)とは、インターネットやコンピュータを接続するシステムを通じて送信される情報の量である。ネットワークのセキュリティと管理を改善するには,ネットワークトラフィックの分析と理解が不可欠である。しかし、ネットワークトラフィックの分析は、異種ヘッダやセマンティクスに欠ける暗号化ペイロードなど、データパケットのユニークな特徴のため、大きな課題を生んでいる。トラフィックの潜在的セマンティクスを捉えるために、Transformerエンコーダやデコーダをベースとした事前学習技術を用いて、大規模トラフィックデータから表現を学習する研究がいくつかある。しかし、これらの手法は通常、トラフィック理解(分類)やトラフィック生成タスクでのみ優れている。この問題に対処するために,T5アーキテクチャを利用したネットワークトラフィックモデルLensを開発し,大規模未ラベルデータから事前学習を行う。生成能力を保ちながらグローバル情報をキャプチャするエンコーダ・デコーダ・フレームワークの強みを活かして,大規模ネットワークトラフィックから表現をよりよく学習することができる。事前学習性能をさらに向上するため,MSP(Masked Span Prediction),POP(Packet Order Prediction),HTP(Homologous Traffic Prediction)の3つの異なるタスクを統合した新しい損失を設計した。複数のベンチマークデータセットにおける評価結果は、提案するレンズが、トラフィック理解とトラフィック生成の両方に関連する下流タスクのベースラインを上回っていることを示している。とくに、現在の方法に比べて微調整のためのラベル付きデータもかなり少ない。 Network traffic refers to the amount of information being sent and received over the internet or any system that connects computers. Analyzing and understanding network traffic is vital for improving network security and management. However, the analysis of network traffic poses great challenges due to the unique characteristics of data packets, such as heterogeneous headers and encrypted payload lacking semantics. To capture the latent semantics of traffic, a few studies have adopted pre-training techniques based on the Transformer encoder or decoder to learn the representations from large-scale traffic data. However, these methods typically excel only in traffic understanding (classification) or traffic generation tasks. To address this issue, we develop Lens, a foundational network traffic model that leverages the T5 architecture to learn the pre-trained representations from large-scale unlabeled data. 
Harnessing the strength of the encoder-decoder framework, which captures the global information while preserving the generative ability, our model can better learn the representations from large-scale network traffic. To further enhance pre-training performance, we design a novel loss that integrates three distinct tasks, namely Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP). Evaluation results on multiple benchmark datasets demonstrate that the proposed Lens outperforms the baselines in most downstream tasks related to both traffic understanding and traffic generation. Notably, it also requires considerably less labeled data for fine-tuning compared to current methods.	翻訳日:2024-02-07 16:56:00 公開日:2024-02-06
# Stanceosaurus 2.0: ロシアとスペインの誤報へのスタンス分類 Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation ( http://arxiv.org/abs/2402.03642v1 ) ライセンス: Link先を確認	Anton Lavrouk, Ian Ligon, Tarek Naous, Jonathan Zheng, Alan Ritter, Wei Xu	(参考訳) スタンテオサウルス・コーパス(zheng et al., 2022)は、twitterから抽出された高品質で注釈付き、5方向の姿勢データを提供し、文化横断的および言語横断的誤情報の分析に適するように設計された。 Stanceosaurus 2.0イテレーションでは、このフレームワークをロシア語とスペイン語に拡張しています。前者は西側諸国との緊張が激化し、ウクライナへの激しい侵入が相次いだため、現在の重要性がある。一方、後者は巨大なコミュニティであり、主要なソーシャルメディアプラットフォームでは見過ごされてきた。 41件以上の偽情報のツイートを3,874件追加することで、これらの問題に焦点を当てた研究を支援することを目標としている。このデータの価値を実証するため,多言語BERTのゼロショット交叉移動を用いて,両言語で43のマクロF1スコアを持つStanceosaurusの初期研究と同等の結果を得た。これは多文化的誤情報を識別するための有効なツールとしてスタンス分類の有効性を強調する。 The Stanceosaurus corpus (Zheng et al., 2022) was designed to provide high-quality, annotated, 5-way stance data extracted from Twitter, suitable for analyzing cross-cultural and cross-lingual misinformation. In the Stanceosaurus 2.0 iteration, we extend this framework to encompass Russian and Spanish. The former is of current significance due to prevalent misinformation amid escalating tensions with the West and the violent incursion into Ukraine. The latter, meanwhile, represents an enormous community that has been largely overlooked on major social media platforms. By incorporating an additional 3,874 Spanish and Russian tweets over 41 misinformation claims, our objective is to support research focused on these issues. To demonstrate the value of this data, we employed zero-shot cross-lingual transfer on multilingual BERT, yielding results on par with the initial Stanceosaurus study with a macro F1 score of 43 for both languages. This underlines the viability of stance classification as an effective tool for identifying multicultural misinformation.	翻訳日:2024-02-07 16:55:34 公開日:2024-02-06
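The macro F1 score reported above averages per-class F1 without weighting by class frequency, so rare stance classes count as much as common ones. A minimal sketch follows; the label names in the example are placeholders, not the five stance classes of the corpus, and the score here is on a 0 to 1 scale (the abstract's figure of 43 is presumably on a 0 to 100 scale).

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in truth or prediction."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the class never occurs.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Placeholder labels for illustration only.
y_true = ["support", "refute", "refute", "neutral"]
y_pred = ["support", "refute", "neutral", "neutral"]
score = macro_f1(y_true, y_pred)
```

Here the per-class scores are 1, 2/3, and 2/3, so the macro average is 7/9.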
# torchmSAT: 最大満足度問題に対するGPU加速近似 torchmSAT: A GPU-Accelerated Approximation To The Maximum Satisfiability Problem ( http://arxiv.org/abs/2402.03640v1 ) ライセンス: Link先を確認	Abdelrahman Hosny, Sherief Reda	(参考訳) 離散構造解析における機械学習技術の顕著な成果は、組合せ最適化アルゴリズムへの統合に大きな注目を集めている。通常、これらの手法は、学習したモデルを解法ループ内に注入することで既存の解法を改善する。本研究では,最大満足度問題(MaxSAT)の解を近似できる単一微分可能関数を導出する。そこで我々は,我々の微分可能な関数をモデル化するための新しいニューラルネットワークアーキテクチャを提案する。このアプローチでは、トレーニングプロセスが解決アルゴリズムとして機能するため、ラベル付きデータやニューラルネットワークトレーニングフェーズが不要になる。さらに,GPUの計算能力を利用して計算を高速化する。 MaxSATインスタンスに挑戦する実験結果から,提案手法は既存の2つのMaxSATソルバよりも優れており,学習や基盤となるSATソルバへのアクセスを必要とせず,ソリューションコストの面で同等であることがわかった。 NPハード問題をMaxSATに還元できることを考えると、我々の新しい手法は、ニューラルネットワークGPUアクセラレーションの恩恵を受ける新しい世代の問題解決者への道を開くものである。 The remarkable achievements of machine learning techniques in analyzing discrete structures have drawn significant attention towards their integration into combinatorial optimization algorithms. Typically, these methodologies improve existing solvers by injecting learned models within the solving loop to enhance the efficiency of the search process. In this work, we derive a single differentiable function capable of approximating solutions for the Maximum Satisfiability Problem (MaxSAT). Then, we present a novel neural network architecture to model our differentiable function, and progressively solve MaxSAT using backpropagation. This approach eliminates the need for labeled data or a neural network training phase, as the training process functions as the solving algorithm. Additionally, we leverage the computational power of GPUs to accelerate these computations. Experimental results on challenging MaxSAT instances show that our proposed methodology outperforms two existing MaxSAT solvers, and is on par with another in terms of solution cost, without necessitating any training or access to an underlying SAT solver. 
Given that numerous NP-hard problems can be reduced to MaxSAT, our novel technique paves the way for a new generation of solvers poised to benefit from neural network GPU acceleration.	翻訳日:2024-02-07 16:55:11 公開日:2024-02-06
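To illustrate what a differentiable surrogate for MaxSAT can look like, the sketch below contrasts the discrete objective (number of satisfied clauses) with a standard probabilistic relaxation: if each variable is independently true with probability $p_i$, a clause is unsatisfied with probability equal to the product of its literals' failure probabilities. This relaxation is a common textbook device and is an assumption here; it is not claimed to be the function derived in the paper. Clauses use DIMACS-style signed integers, and the function names are hypothetical.

```python
def satisfied_clauses(clauses, assignment):
    """Count clauses with at least one true literal.

    clauses: lists of non-zero ints (k means variable k, -k its negation).
    assignment: dict mapping variable index to bool.
    """
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses
    )


def expected_satisfied(clauses, prob_true):
    """Expected number of satisfied clauses when variable k is True with prob_true[k]."""
    total = 0.0
    for clause in clauses:
        p_all_false = 1.0
        for lit in clause:
            p_lit = prob_true[abs(lit)] if lit > 0 else 1.0 - prob_true[abs(lit)]
            p_all_false *= 1.0 - p_lit
        total += 1.0 - p_all_false
    return total


clauses = [[1, -2], [2, 3], [-1, -3], [-2]]
hard = satisfied_clauses(clauses, {1: True, 2: False, 3: False})
soft_at_corner = expected_satisfied(clauses, {1: 1.0, 2: 0.0, 3: 0.0})
soft_uniform = expected_satisfied(clauses, {1: 0.5, 2: 0.5, 3: 0.5})
```

The relaxed objective agrees with the discrete count at 0/1 probabilities and is smooth in between, which is what makes gradient-based search over assignments possible in principle.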
# BEAM:多視点3Dオブジェクト検出のためのベータ分布レイデノイング BEAM: Beta Distribution Ray Denoising for Multi-view 3D Object Detection ( http://arxiv.org/abs/2402.03634v1 ) ライセンス: Link先を確認	Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou	(参考訳) 多視点3Dオブジェクト検出器は、深度情報の欠如による重複予測に苦慮し、偽陽性検出を行う。本研究では,DTR方式のマルチビュー3D検出器に適用可能な,新しいBeta Distribution Ray DenoisingアプローチであるBEAMを紹介した。カメラからオブジェクトへの光線を生成し、これらの光線に沿ってベータ分布系から空間デノジングクエリをサンプリングすることにより、BEAMは曖昧な深さから生じる空間的な硬い負のサンプルを識別する能力を高める。 BEAMは、トレーニング中に限界計算コストのみを追加し、推論速度を著しく保存するプラグイン・アンド・プレイ技術である。 NuScenesデータセットの大規模な実験とアブレーション研究は、強力なベースラインよりも大幅に改善され、最先端のStreamPETRよりも1.9%向上した。コードはhttps://github.com/LiewFeng/BEAM.comから入手できる。 Multi-view 3D object detectors struggle with duplicate predictions due to the lack of depth information, resulting in false positive detections. In this study, we introduce BEAM, a novel Beta Distribution Ray Denoising approach that can be applied to any DETR-style multi-view 3D detector to explicitly incorporate structure prior knowledge of the scene. By generating rays from cameras to objects and sampling spatial denoising queries from the Beta distribution family along these rays, BEAM enhances the model's ability to distinguish spatial hard negative samples arising from ambiguous depths. BEAM is a plug-and-play technique that adds only marginal computational costs during training, while impressively preserving the inference speed. Extensive experiments and ablation studies on the NuScenes dataset demonstrate significant improvements over strong baselines, outperforming the state-of-the-art method StreamPETR by 1.9% mAP. The code will be available at https://github.com/LiewFeng/BEAM.	翻訳日:2024-02-07 16:54:38 公開日:2024-02-06
# より深い理解のための能動的問合せを用いた言語モデルの構築 Empowering Language Models with Active Inquiry for Deeper Understanding ( http://arxiv.org/abs/2402.03719v1 ) ライセンス: Link先を確認	Jing-Cheng Pang, Heng-Bo Fan, Pengyuan Wang, Jia-Hao Xiao, Nan Tang, Si-Hang Yang, Chengxing Jia, Sheng-Jun Huang, Yang Yu	(参考訳) 大規模言語モデル(LLM)の台頭は、自然言語を通じて人工知能システムと対話する方法に革命をもたらした。しかし、LSMは不確実な意図のためにユーザクエリを誤解釈することが多く、あまり役に立たない。自然の人間との相互作用では、不明な情報を明らかにするために標的とした質問を通じて明確化が求められる。そこで本稿では,同じレベルの対話性を持つllmを支援すべく設計されたlamai(language model with active inquiry)を提案する。 LaMAIはアクティブな学習技術を活用して、最も有益な質問を提起し、動的双方向対話を促進する。このアプローチはコンテキストギャップを狭めるだけでなく、LCMの出力を洗練し、ユーザの期待とより密接に一致させる。 LLMが会話の文脈に制限がある様々な複雑なデータセットを対象とした実証研究は、LaMAIの有効性を実証している。解答精度は31.9%から50.9%に向上し、他の主要な問合せフレームワークを上回っている。さらに、人間の参加者を含むシナリオでは、lamaiは一貫して82%以上のケースにおいて、ベースラインメソッドに匹敵する応答を生成する。 LaMAIの適用性はさらに、様々なLLMとの統合の成功によって証明されており、対話型言語モデルの将来の可能性を強調している。 The rise of large language models (LLMs) has revolutionized the way that we interact with artificial intelligence systems through natural language. However, LLMs often misinterpret user queries because of their uncertain intention, leading to less helpful responses. In natural human interactions, clarification is sought through targeted questioning to uncover obscure information. Thus, in this paper, we introduce LaMAI (Language Model with Active Inquiry), designed to endow LLMs with this same level of interactive engagement. LaMAI leverages active learning techniques to raise the most informative questions, fostering a dynamic bidirectional dialogue. This approach not only narrows the contextual gap but also refines the output of the LLMs, aligning it more closely with user expectations. Our empirical studies, across a variety of complex datasets where LLMs have limited conversational context, demonstrate the effectiveness of LaMAI. The method improves answer accuracy from 31.9% to 50.9%, outperforming other leading question-answering frameworks. 
Moreover, in scenarios involving human participants, LaMAI consistently generates responses that are superior or comparable to baseline methods in more than 82% of the cases. The applicability of LaMAI is further evidenced by its successful integration with various LLMs, highlighting its potential for the future of interactive language models.	翻訳日:2024-02-07 16:46:51 公開日:2024-02-06
# 大規模言語モデルによる協調フレームワークによるロボットの自動開発 Automatic Robotic Development through Collaborative Framework by Large Language Models ( http://arxiv.org/abs/2402.03699v1 ) ライセンス: Link先を確認	Zhirong Luan and Yujun Lai	(参考訳) 大規模言語モデル(LLM)の驚くべきコード生成能力にもかかわらず、それらは複雑なタスクハンドリングの課題に直面している。高度に複雑な分野であるロボット開発は、本質的には、タスクアロケーションと協力的なチームワークに人間の関与を要求する。ロボット開発を促進するために,現実のロボット開発者に触発された革新的な自動協調フレームワークを提案する。このフレームワークは複数のLLMをアナリスト、プログラマ、テスターという異なる役割で用いる。アナリストはユーザー要件を深く掘り下げ、プログラマが正確なコードを作成できるようにし、テスターは実際のロボットアプリケーションのユーザフィードバックに基づいてパラメータを微調整する。各LLMは開発プロセス内で多様な重要なタスクに取り組む。明確なコラボレーションルールは、LLM間の現実のチームワークをエミュレートする。アナリスト、プログラマ、テスターは、戦略、コード、パラメータ調整を監督する結束したチームを形成する。この枠組みにより, 専門知識を必要とせず, 非専門家の参加のみで複雑なロボット開発を実現する。 Despite the remarkable code generation abilities of large language models (LLMs), they still face challenges in complex task handling. Robot development, a highly intricate field, inherently demands human involvement in task allocation and collaborative teamwork. To enhance robot development, we propose an innovative automated collaboration framework inspired by real-world robot developers. This framework employs multiple LLMs in distinct roles: analysts, programmers, and testers. Analysts delve deep into user requirements, enabling programmers to produce precise code, while testers fine-tune the parameters based on user feedback for practical robot application. Each LLM tackles diverse, critical tasks within the development process. Clear collaboration rules emulate real-world teamwork among LLMs. Analysts, programmers, and testers form a cohesive team overseeing strategy, code, and parameter adjustments. Through this framework, we achieve complex robot development without requiring specialized knowledge, relying solely on the participation of non-experts.	翻訳日:2024-02-07 16:46:29 公開日:2024-02-06
# 大規模局所学習係数の推定 Estimating the Local Learning Coefficient at Scale ( http://arxiv.org/abs/2402.03698v1 ) ライセンス: Link先を確認	Zach Furman, Edmund Lau	(参考訳) 局所学習係数(local learning coefficient, LLC)はモデル複雑性を定量化する原理的な方法であり、もともとは特異学習理論(SLT)を用いてベイズ統計の文脈から導かれた。局所学習係数を数値的に推定する手法はいくつか知られているが、現在のディープラーニングアーキテクチャやデータセットの規模には拡張されていない。 arXiv:2308.12108 [stat.ML] で開発された手法を用いて、最大100Mパラメータの深い線形ネットワーク(DLN)に対してLLCを正確かつ自己整合的に測定できることを実証的に示す。また, 推定LLCは, 理論量が持つ再スケーリング不変性を有することを示す。 The local learning coefficient (LLC) is a principled way of quantifying model complexity, originally derived in the context of Bayesian statistics using singular learning theory (SLT). Several methods are known for numerically estimating the local learning coefficient, but so far these methods have not been extended to the scale of modern deep learning architectures or data sets. Using a method developed in arXiv:2308.12108 [stat.ML], we empirically show how the LLC may be measured accurately and self-consistently for deep linear networks (DLNs) up to 100M parameters. We also show that the estimated LLC has the rescaling invariance that holds for the theoretical quantity.	翻訳日:2024-02-07 16:46:16 公開日:2024-02-06
# SHMC-Net: 精子頭部形態分類のためのマスク誘導機能融合ネットワーク SHMC-Net: A Mask-guided Feature Fusion Network for Sperm Head Morphology Classification ( http://arxiv.org/abs/2402.03697v1 ) ライセンス: Link先を確認	Nishchal Sapkota, Yejia Zhang, Sirui Li, Peixian Liang, Zhuo Zhao, Danny Z Chen	(参考訳) 男性不妊は世界の不妊患者の約3分の1を占める。頭部形態解析による精子異常の手動評価は、専門家の間で観察者の変動と診断上の相違の問題に遭遇する。その代わり、casa(computer-assisted semen analysis)は、低品質の精子画像、小さなデータセット、騒がしいクラスラベルに苦しむ。精子頭の形態分類のための新しいアプローチであるshmc-netを提案し,精子頭のセグメンテーションマスクを用いて精子画像の形態分類を導く。 SHMC-Netは、画像プリエントを用いて信頼性の高いセグメンテーションマスクを生成し、効率的なグラフベースの手法でオブジェクト境界を洗練し、精子頭作物とマスクネットワークをトレーニングする。ネットワークの中間段階では、画像とマスクの特徴を融合スキームで融合させ、形態的特徴をよりよく学習する。ノイズの多いクラスラベルの処理と小さなデータセットでのトレーニングの正規化のために、SHMC-NetはSoft Mixupを適用して、ミックスアップ拡張と損失関数を組み合わせた。 scian と hushem のデータセットで最先端の成果を達成し,事前トレーニングやコストのかかるセンシング手法を駆使した手法よりも優れています。 Male infertility accounts for about one-third of global infertility cases. Manual assessment of sperm abnormalities through head morphology analysis encounters issues of observer variability and diagnostic discrepancies among experts. Its alternative, Computer-Assisted Semen Analysis (CASA), suffers from low-quality sperm images, small datasets, and noisy class labels. We propose a new approach for sperm head morphology classification, called SHMC-Net, which uses segmentation masks of sperm heads to guide the morphology classification of sperm images. SHMC-Net generates reliable segmentation masks using image priors, refines object boundaries with an efficient graph-based method, and trains an image network with sperm head crops and a mask network with the corresponding masks. In the intermediate stages of the networks, image and mask features are fused with a fusion scheme to better learn morphological features. To handle noisy class labels and regularize training on small datasets, SHMC-Net applies Soft Mixup to combine mixup augmentation and a loss function. 
We achieve state-of-the-art results on SCIAN and HuSHeM datasets, outperforming methods that use additional pre-training or costly ensembling techniques.	翻訳日:2024-02-07 16:46:03 公開日:2024-02-06
# ConUNETR:3次元マイクロCT軟骨分割のためのコンディショナルトランスフォーマネットワーク ConUNETR: A Conditional Transformer Network for 3D Micro-CT Embryonic Cartilage Segmentation ( http://arxiv.org/abs/2402.03695v1 ) ライセンス: Link先を確認	Nishchal Sapkota, Yejia Zhang, Susan M. Motch Perrine, Yuhan Hsi, Sirui Li, Meng Wu, Greg Holmes, Abdul R. Abdulai, Ethylin W. Jabs, Joan T. Richtsmeier, Danny Z Chen	(参考訳) 軟骨および骨構造の形態発達の研究は、生命を脅かす骨格形態の早期発見に不可欠である。胚軟骨は数時間以内に急速な構造変化を起こし、複数の胚年齢層にわたって推測される深層学習に基づくセグメンテーションモデルの一般化を制限する生物学的変異と形態変化をもたらす。年齢グループごとに個別のモデルを取得することは高価で効果が低いが、直接転送(トレーニング中に見えない年齢を予測する)は形態変化による潜在的なパフォーマンス低下に悩まされる。本研究では, 形態学的に多様な情報を条件付き機構で蒸留するトランスフォーマーを用いた新しいセグメンテーションモデルを提案する。これにより、1つのモデルが複数の年齢グループで正確に軟骨を予測できる。実験では,他の競合セグメンテーションモデルと比較して,新しいモデルの優位性を示した。異なる変異を持つマウス軟骨データセットに関するさらなる研究は、モデルが良好に一般化し、年齢ベースの軟骨形態パターンを効果的に捉えていることを示している。 Studying the morphological development of cartilaginous and osseous structures is critical to the early detection of life-threatening skeletal dysmorphology. Embryonic cartilage undergoes rapid structural changes within hours, introducing biological variations and morphological shifts that limit the generalization of deep learning-based segmentation models that infer across multiple embryonic age groups. Obtaining individual models for each age group is expensive and less effective, while direct transfer (predicting an age unseen during training) suffers a potential performance drop due to morphological shifts. We propose a novel Transformer-based segmentation model with improved biological priors that better distills morphologically diverse information through conditional mechanisms. This enables a single model to accurately predict cartilage across multiple age groups. Experiments on the mice cartilage dataset show the superiority of our new model compared to other competitive segmentation models. Additional studies on a separate mice cartilage dataset with a distinct mutation show that our model generalizes well and effectively captures age-based cartilage morphology patterns.	
翻訳日:2024-02-07 16:45:40 公開日:2024-02-06
# ServeFlow: ネットワークトラフィック分析のためのFast-Slowモデルアーキテクチャ ServeFlow: A Fast-Slow Model Architecture for Network Traffic Analysis ( http://arxiv.org/abs/2402.03694v1 ) ライセンス: Link先を確認	Shinan Liu, Ted Shaowang, Gerry Wan, Jeewon Chae, Jonatas Marques, Sanjay Krishnan, Nick Feamster	(参考訳) インターネットが統合され、トラフィックが暗号化されるにつれて、ネットワークトラフィック分析はますます複雑な機械学習モデルを使用するようになっている。しかし、高帯域幅ネットワークでは、フローがモデル推論速度よりも早く到達できる。ネットワークフローの時間的性質は、他の高トラフィック機械学習アプリケーションで利用される単純なスケールアウトアプローチを制限する。そこで本稿では,ネットワークトラフィック分析タスクを対象とした機械学習モデルサービングのソリューションであるServeFlowを提案する。これは,収集するパケットの数と,個々のフローに適用するモデルを選択して,最小レイテンシ,高サービスレート,高精度のバランスを実現する。同じタスクでは、モデル間の推論時間は2.7x-136.3x異なり、パケット間待ち時間の中央値は推論時間より6-8桁高いことがよくあります。 ServeFlowは、76.3%のフローを16ms以下で推論することが可能で、これは、サービスレートを高め、同様の精度を維持しながら、エンドツーエンドのサービスレイテンシの中央値で40.5倍のスピードアップである。 1フローに何千もの特徴量があるとしても、16コアCPUのコモディティサーバ上で毎秒48.5k以上の新しいフローを処理し、都市レベルのネットワークバックボーンで観測される流量の桁数に合致する。 Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and traffic gets more encrypted. However, over high-bandwidth networks, flows can easily arrive faster than model inference rates. The temporal nature of network flows limits simple scale-out approaches leveraged in other high-traffic machine learning applications. Accordingly, this paper presents ServeFlow, a solution for machine-learning model serving aimed at network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply for individual flows to achieve a balance between minimal latency, high service rate, and high accuracy. We identify that on the same task, inference time across models can differ by 2.7x-136.3x, while the median inter-packet waiting time is often 6-8 orders of magnitude higher than the inference time! ServeFlow is able to make inferences on 76.3% of flows in under 16ms, which is a speed-up of 40.5x on the median end-to-end serving latency while increasing the service rate and maintaining similar accuracy. 
Even with thousands of features per flow, it achieves a service rate of over 48.5k new flows per second on a 16-core CPU commodity server, which matches the order of magnitude of flow rates observed on city-level network backbones.	翻訳日:2024-02-07 16:45:23 公開日:2024-02-06
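The fast-slow idea in the ServeFlow abstract can be illustrated with a minimal sketch. It assumes a confidence-threshold cascade, which the abstract does not specify; `serve_flow`, `toy_fast`, `toy_slow`, and the `threshold` value are hypothetical names chosen for illustration, not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model type: maps a flow's feature vector to (label, confidence).
Model = Callable[[Sequence[float]], Tuple[str, float]]


def serve_flow(features: Sequence[float], fast_model: Model,
               slow_model: Model, threshold: float = 0.9) -> Tuple[str, str]:
    """Answer with the fast model when it is confident, else escalate.

    Returns (label, path), where path records which model answered.
    The threshold rule is an assumption made for this sketch.
    """
    label, confidence = fast_model(features)
    if confidence >= threshold:
        return label, "fast"
    label, _ = slow_model(features)
    return label, "slow"


def toy_fast(features: Sequence[float]) -> Tuple[str, float]:
    # Toy stand-in: looks at the first feature only.
    score = features[0]
    label = "video" if score > 0.5 else "web"
    return label, abs(score - 0.5) * 2  # confidence in [0, 1]


def toy_slow(features: Sequence[float]) -> Tuple[str, float]:
    # Toy stand-in: uses every feature, so it is "slower" but more careful.
    mean = sum(features) / len(features)
    return ("video" if mean > 0.5 else "web"), 1.0


flows: List[List[float]] = [[0.99, 0.8], [0.55, 0.2], [0.02, 0.1]]
results = [serve_flow(f, toy_fast, toy_slow) for f in flows]
for flow, (label, path) in zip(flows, results):
    print(flow, label, path)
```

Only the middle flow is escalated, because the toy fast model is unsure about it; the other two are answered on the fast path.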
# 3Doodle: 3Dストロークによるオブジェクトのコンパクト抽象化 3Doodle: Compact Abstraction of Objects with 3D Strokes ( http://arxiv.org/abs/2402.03690v1 ) ライセンス: Link先を確認	Changwoon Choi, Jaeah Lee, Jaesik Park, Young Min Kim	(参考訳) フリーハンドのスケッチは長い間、物体の特徴を伝えるための効率的な表現として機能してきたが、しばしば主観的であり、現実的な表現からかなり逸脱している。さらに、スケッチは任意の視点で一貫性がなく、3D形状を捉えるのが困難である。対象オブジェクトのマルチビュー画像に対して記述的かつビュー一貫性のあるスケッチ画像を生成する3Doodleを提案する。本手法は,3次元ストロークの集合が3次元構造情報を効率よく表現し,ビューに一貫性のある2次元スケッチを描画できるという考えに基づいている。 2次元スケッチをビューに依存しないコンポーネントとビューに依存するコンポーネントの結合として表現する。 3次元の3次Bézier曲線はビューに依存しない3次元特徴線を示すが、超二次曲面の輪郭は様々な視点から見た体積の滑らかな輪郭を表す。我々のパイプラインは、3Dストロークプリミティブのパラメータを直接最適化し、知覚的損失を完全に微分可能な方法で最小化する。得られた3Dストロークのスパース集合は、様々なオブジェクトの必須の3D特性形状を含む抽象スケッチとして描画することができる。近年のスケッチ生成手法と比較して、3Doodleはオリジナル画像の概念を忠実に表現できることを示す。 While free-hand sketches have long served as an efficient representation to convey characteristics of an object, they are often subjective, deviating significantly from realistic representations. Moreover, sketches are not consistent for arbitrary viewpoints, making it hard to capture 3D shapes. We propose 3Doodle, generating descriptive and view-consistent sketch images given multi-view images of the target object. Our method is based on the idea that a set of 3D strokes can efficiently represent 3D structural information and render view-consistent 2D sketches. We express 2D sketches as a union of view-independent and view-dependent components. 3D cubic Bézier curves indicate view-independent 3D feature lines, while contours of superquadrics express a smooth outline of the volume of varying viewpoints. Our pipeline directly optimizes the parameters of 3D stroke primitives to minimize perceptual losses in a fully differentiable manner. The resulting sparse set of 3D strokes can be rendered as abstract sketches containing essential 3D characteristic shapes of various objects. We demonstrate that 3Doodle can faithfully express concepts of the original images compared with recent sketch generation approaches.	翻訳日:2024-02-07 16:45:01 公開日:2024-02-06
# 垂直連合学習におけるプライバシの脅威と防御に関する調査--モデルライフサイクルの観点から A Survey of Privacy Threats and Defense in Vertical Federated Learning: From Model Life Cycle Perspective ( http://arxiv.org/abs/2402.03688v1 ) ライセンス: Link先を確認	Lei Yu, Meng Han, Yiming Li, Changting Lin, Yao Zhang, Mingyang Zhang, Yan Liu, Haiqin Weng, Yuseok Jeon, Ka-Ho Chow, Stacy Patterson	(参考訳) Vertical Federated Learning(VFL)は、複数の参加者が同じサンプルを共有し、異なる特徴を持つ、共同で機械学習モデルをトレーニングする、連合学習パラダイムである。 VFLは生データを共有せずにコラボレーティブな機械学習を可能にするが、それでもさまざまなプライバシー上の脅威を受けやすい。本稿では,VFLにおけるプライバシ攻撃と防衛における最先端技術に関する総合的な調査を行う。本研究は,攻撃と防衛の両方に分類学を提供し,その特徴に基づいてオープン課題と今後の研究方向性について議論する。具体的には,機械学習のさまざまな段階で発生するプライバシの脅威と,それに対応する対策を掘り下げることで,モデルのライフサイクルを中心にして議論を行う。この調査は研究コミュニティのリソースとして機能するだけでなく、モデルのライフサイクルを通じてデータプライバシを保護するための明確なガイダンスと実用的な洞察を提供する。 Vertical Federated Learning (VFL) is a federated learning paradigm where multiple participants, who share the same set of samples but hold different features, jointly train machine learning models. Although VFL enables collaborative machine learning without sharing raw data, it is still susceptible to various privacy threats. In this paper, we conduct the first comprehensive survey of the state-of-the-art in privacy attacks and defenses in VFL. We provide taxonomies for both attacks and defenses, based on their characterizations, and discuss open challenges and future research directions. Specifically, our discussion is structured around the model's life cycle, by delving into the privacy threats encountered during different stages of machine learning and their corresponding countermeasures. This survey not only serves as a resource for the research community but also offers clear guidance and actionable insights for practitioners to safeguard data privacy throughout the model's life cycle.	翻訳日:2024-02-07 16:44:41 公開日:2024-02-06
# Pard:グラフ生成のための置換不変自己回帰拡散 Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation ( http://arxiv.org/abs/2402.03687v1 ) ライセンス: Link先を確認	Lingxiao Zhao, Xueying Ding, Leman Akoglu	(参考訳) グラフ生成は、順序付けに対する感度にもかかわらず、単純さと有効性のため、自己回帰モデルによって支配されている。しかし、拡散モデルは置換不変でありながら同等のパフォーマンスを提供するため、注目を集めている。現在のグラフ拡散モデルは1ショットでグラフを生成するが、最適なパフォーマンスを達成するには追加の機能と数千のデノゲーションステップが必要である。拡散モデルと自己回帰法を統合した置換不変自己回帰拡散モデルpardを提案する。 pardは、順序の感度なしで置換不変性を維持しながら、自己回帰モデルの有効性と効率を利用する。具体的には、集合とは対照的に、グラフの要素は完全に順序づけられておらず、ノードとエッジに一意な部分順序が存在することを示す。この部分順序で、PARDはブロックごとの自己回帰的なグラフを生成し、各ブロックの確率は同変ネットワークを持つ共有拡散モデルによって条件付きでモデル化される。表現性を確保しつつ効率を確保するため,PPGNと変換器を統合した高階グラフ変換器を提案する。 GPTと同様に、全てのブロックの並列トレーニングをサポートするために高階グラフ変換器を拡張する。余分な特徴がなければ、PARDは分子および非分子データセットの最先端のパフォーマンスを達成し、1.9M分子を含むMOSESのような大規模なデータセットにスケールする。 Graph generation has been dominated by autoregressive models due to their simplicity and effectiveness, despite their sensitivity to ordering. Yet diffusion models have garnered increasing attention, as they offer comparable performance while being permutation-invariant. Current graph diffusion models generate graphs in a one-shot fashion, but they require extra features and thousands of denoising steps to achieve optimal performance. We introduce PARD, a Permutation-invariant Auto Regressive Diffusion model that integrates diffusion models with autoregressive methods. PARD harnesses the effectiveness and efficiency of the autoregressive model while maintaining permutation invariance without ordering sensitivity. Specifically, we show that contrary to sets, elements in a graph are not entirely unordered and there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network. To ensure efficiency while being expressive, we further propose a higher-order graph transformer, which integrates transformer with PPGN. 
Like GPT, we extend the higher-order graph transformer to support parallel training of all blocks. Without any extra features, PARD achieves state-of-the-art performance on molecular and non-molecular datasets, and scales to large datasets like MOSES containing 1.9M molecules.	翻訳日:2024-02-07 16:44:24 公開日:2024-02-06
# Minds vs. Machines: 言語モデルによる含意検証の再考 Minds versus Machines: Rethinking Entailment Verification with Language Models ( http://arxiv.org/abs/2402.03686v1 ) ライセンス: Link先を確認	Soumya Sanyal, Tianyi Xiao, Jiacheng Liu, Wenya Wang, Xiang Ren	(参考訳) 人間は談話を理解するためにテキスト理解において多くの推論を行う。本稿では,人間と最先端の大規模言語モデル(LLM)間の推論判断の共通性と相違を理解することを目的とする。包括的にキュレートされた含意検証ベンチマークを利用して、さまざまな推論カテゴリで人間とLLMのパフォーマンスを評価する。本ベンチマークでは,3つのカテゴリ(NLI,コンテキストQA,根拠)のデータセットを多文の前提と異なる知識タイプに含め,複雑な推論インスタンスにおける推論能力の評価を行う。以上の結果から,LLMは長いコンテキストにわたるマルチホップ推論において優れており,人間は単純な演繹的推論を必要とするタスクに優れていた。これらの知見を活かして、GPT-3.5を上回りGPT-4に匹敵する微調整済みFlan-T5モデルを導入し、含意検証のための堅牢なオープンソースソリューションを提供する。実用的応用として、モデル生成説明における自己整合性を高めるための微調整モデルの有効性を示し、3つの多肢選択式質問応答データセットで平均6%の性能向上を得た。 Humans make numerous inferences in text comprehension to understand discourse. This paper aims to understand the commonalities and disparities in the inference judgments between humans and state-of-the-art Large Language Models (LLMs). Leveraging a comprehensively curated entailment verification benchmark, we evaluate both human and LLM performance across various reasoning categories. Our benchmark includes datasets from three categories (NLI, contextual QA, and rationales) that include multi-sentence premises and different knowledge types, thereby evaluating the inference capabilities in complex reasoning instances. Notably, our findings reveal LLMs' superiority in multi-hop reasoning across extended contexts, while humans excel in tasks necessitating simple deductive reasoning. Leveraging these insights, we introduce a fine-tuned Flan-T5 model that outperforms GPT-3.5 and rivals GPT-4, offering a robust open-source solution for entailment verification. As a practical application, we showcase the efficacy of our finetuned model in enhancing self-consistency in model-generated explanations, resulting in a 6% performance boost on average across three multiple-choice question-answering datasets.	翻訳日:2024-02-07 16:44:01 公開日:2024-02-06
# RL-VLM-F:ビジョン言語モデルからの強化学習 RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback ( http://arxiv.org/abs/2402.03681v1 ) ライセンス: Link先を確認	Yufei Wang, Zhanyi Sun, Jesse Zhang, Zhou Xian, Erdem Biyik, David Held, Zackory Erickson	(参考訳) 報酬工学は強化学習(rl)研究において長年の課題であり、効果的な報酬機能を設計するには、人間の努力と試行錯誤の反復プロセスがしばしば必要となる。本稿では,視覚言語基礎モデル(VLM)からのフィードバックを利用して,タスク目標のテキスト記述とエージェントの視覚観察のみを用いて,エージェントが新しいタスクを学習するための報酬関数を自動的に生成する手法であるRL-VLM-Fを提案する。提案手法の鍵となるのは,タスクゴールのテキスト記述に基づいて,エージェントのイメージ観察のペアよりも好みを与えるためにこれらのモデルをクエリし,そのモデルに生の報酬スコアを出力させるのではなく,好みラベルから報酬関数を学習することである。我々は、RL-VLM-Fが、古典的な制御を含む様々な領域にまたがる効果的な報酬とポリシー、および、厳密で明瞭で変形可能な物体の操作を、人間の監督なしに実現できることを実証した。 Reward engineering has long been a challenge in Reinforcement Learning (RL) research, as it often requires extensive human effort and iterative processes of trial-and-error to design effective reward functions. In this paper, we propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks, using only a text description of the task goal and the agent's visual observations, by leveraging feedbacks from vision language foundation models (VLMs). The key to our approach is to query these models to give preferences over pairs of the agent's image observations based on the text description of the task goal, and then learn a reward function from the preference labels, rather than directly prompting these models to output a raw reward score, which can be noisy and inconsistent. We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains - including classic control, as well as manipulation of rigid, articulated, and deformable objects - without the need for human supervision, outperforming prior methods that use large pretrained models for reward generation under the same assumptions.	翻訳日:2024-02-07 16:43:41 公開日:2024-02-06
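Learning a reward function from preference labels, as the RL-VLM-F abstract describes, can be sketched with a Bradley-Terry style logistic model. This is an assumption made for illustration: the abstract does not name the loss, the preference labels below are hand-written rather than produced by a VLM, and the linear reward and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear reward model (an assumption; the paper may use a neural network)."""
    return sum(w * f for w, f in zip(weights, features))


def fit_reward(pairs: List[Tuple[Sequence[float], Sequence[float]]],
               labels: List[int], dim: int,
               lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit reward weights from pairwise preferences.

    pairs[i] = (features_a, features_b); labels[i] = 1 if a is preferred,
    0 if b is preferred. Models P(a preferred) = sigmoid(r(a) - r(b)) and
    ascends the log-likelihood with plain stochastic gradient steps.
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for (a, b), y in zip(pairs, labels):
            diff = [ai - bi for ai, bi in zip(a, b)]
            p = sigmoid(reward(weights, diff))
            # Gradient of the log-likelihood with respect to the weights.
            weights = [w + lr * (y - p) * d for w, d in zip(weights, diff)]
    return weights


# Toy observations described by one feature, e.g. "progress toward the goal".
pairs = [([0.9], [0.1]), ([0.2], [0.7]), ([0.6], [0.4])]
labels = [1, 0, 1]  # stand-ins for preference labels
weights = fit_reward(pairs, labels, dim=1)
print(weights, reward(weights, [0.8]), reward(weights, [0.3]))
```

The learned reward ranks observations with more progress higher, even though no numeric reward score was ever provided, only pairwise preferences.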
# 強化学習エージェントのための論理仕様誘導動的タスクサンプリング Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents ( http://arxiv.org/abs/2402.03678v1 ) ライセンス: Link先を確認	Yash Shukla, Wenchang Gao, Vasanth Sarathy, Alvaro Velasquez, Robert Wright and Jivko Sinapov	(参考訳) 強化学習(rl)は、人工エージェントが多様な行動を学ぶために大きな進歩を遂げた。しかし、効果的な政策を学ぶには、しばしば多くの環境相互作用を必要とする。サンプル複雑性の問題を緩和するために、近年のアプローチでは、LTL$_f$(Linear Temporal Logic)式やReward Machines(RM)のような高レベルのタスク仕様を使用してエージェントの学習進捗をガイドしている。本稿では,エージェントを初期状態から高レベルタスク仕様に基づく目標状態へと導くためのrlポリシーのセットを学習し,環境相互作用の数を最小化しながら,論理仕様に基づく動的タスクサンプリング(lsts)と呼ばれる新しい手法を提案する。以前の作業とは異なり、lstsは環境ダイナミクスや報酬マシンに関する情報を仮定せず、ゴールポリシーを成功させる有望なタスクを動的にサンプリングする。我々は,LSTSをグリッドワールド上で評価し,最先端のRMやオートマトン誘導RLベースライン(Q-Learning for Reward Machines)や論理仕様(DIRL)など)と比較して,複雑なシーケンシャルな意思決定問題に対する時間対閾値性能の向上を実現することを示す。さらに,本手法は,部分的に観察可能なロボットタスクと連続制御ロボット操作タスクの両方において,RMおよびオートマトン誘導RLベースラインよりも優れていることを示す。 Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. 
We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and Automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and Automaton-guided RL baselines in terms of sample-efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.	翻訳日:2024-02-07 16:43:18 公開日:2024-02-06
# PPIretrievalを用いたタンパク質とタンパク質の効果的な相互作用探索 Effective Protein-Protein Interaction Exploration with PPIretrieval ( http://arxiv.org/abs/2402.03675v1 ) ライセンス: Link先を確認	Chenqing Hua, Connor Coley, Guy Wolf, Doina Precup, Shuangjia Zheng	(参考訳) 蛋白-タンパク質相互作用(ppis)は、シグナル伝達、輸送、免疫防御など多くの細胞機能を制御する上で重要である。多鎖タンパク質複合体構造の予測精度が向上するにつれて、大きな複雑な宇宙を効率的にナビゲートして潜在的なppisを同定することが課題となっている。本稿では,タンパク質-タンパク質相互作用探索のための最初の深層学習モデルであるPPIretrievalを提案する。 PPIretrievalは、その結合部位に未知のクエリタンパク質を付与すると、その結合部位とそれに対応する結合部位とを効果的に同定し、タンパク質-タンパク質複合体の形成を促進する。 Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transportation, and immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast complex universe to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration, which leverages existing PPI data to effectively search for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces. When provided with an unseen query protein with its associated binding site, PPIretrieval effectively identifies a potential binding partner along with its corresponding binding site in an embedding space, facilitating the formation of protein-protein complexes.	翻訳日:2024-02-07 16:42:48 公開日:2024-02-06
# 間接的推論としての大規模言語モデル--自動推論のための対偶と矛盾 Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning ( http://arxiv.org/abs/2402.03667v1 ) ライセンス: Link先を確認	Yanfang Zhang, Yiliu Sun, Yibing Zhan, Dapeng Tao, Dacheng Tao, Chen Gong	(参考訳) 近年,Large Language Models (LLM) の複雑な推論能力の向上に注目が集まっている。しかし,従来のチェーン・オブ・ソートや自己整合性といった手法は,主に直接推論(DR)の枠組みを踏襲しているため,DRによる解決が困難な現実的な課題の解決に苦慮する。そのため,本研究では,事実推論や数学的証明などのIR課題に対処するために,対偶と矛盾の論理を取り入れた新しい間接推論(IR)手法を提案する。具体的には,2つのステップから構成される。まず, LLMの理解性を高めるために, 対偶の論理的等価性を利用してデータと規則を補強する。第2に、論理的に元のDRプロセスと等価な矛盾による証明に基づいて、LLMを誘導するプロンプトテンプレートのセットを設計する。我々のIR法は単純だが有効であり、既存のDR法と簡単に統合でき、LLMの推論能力をさらに向上させることができる。 GPT-3.5-turbo や Gemini-pro などの一般的な LLM に関する実験結果から,従来の DR 法と比較すると,我々のIR 法は事実推論の総合的精度を27.33%,数学的証明を31.43%向上させることが示された。さらに,IR と DR を組み合わせる手法は,IR または DR のみを使用する手法を著しく上回っており,提案手法の有効性も示している。 Recently, increasing attention has been drawn to improving the ability of Large Language Models (LLMs) to perform complex reasoning. However, previous methods, such as Chain-of-Thought and Self-Consistency, mainly follow Direct Reasoning (DR) frameworks, so they struggle with the many real-world tasks that can hardly be solved via DR. Therefore, to strengthen the reasoning power of LLMs, this paper proposes a novel Indirect Reasoning (IR) method that employs the logic of contrapositives and contradictions to tackle IR tasks such as factual reasoning and mathematical proof. Specifically, our methodology comprises two steps. Firstly, we leverage the logical equivalence of contrapositive to augment the data and rules to enhance the comprehensibility of LLMs. Secondly, we design a set of prompt templates to trigger LLMs to conduct IR based on proof by contradiction that is logically equivalent to the original DR process. Our IR method is simple yet effective and can be straightforwardly integrated with existing DR methods to further boost the reasoning abilities of LLMs. 
The experimental results on popular LLMs, such as GPT-3.5-turbo and Gemini-pro, show that our IR method enhances the overall accuracy of factual reasoning by 27.33% and mathematical proof by 31.43%, when compared with traditional DR methods. Moreover, the methods combining IR and DR significantly outperform the methods solely using IR or DR, further demonstrating the effectiveness of our strategy.	翻訳日:2024-02-07 16:42:32 公開日:2024-02-06
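The first step of the indirect-reasoning method relies on the fact that a rule and its contrapositive are logically equivalent. The sketch below checks that equivalence by truth table and then augments a small rule set with contrapositives. The string-based rule encoding and the helper names are hypothetical; the paper's actual augmentation operates on natural-language prompts.

```python
from itertools import product
from typing import List, Tuple

Rule = Tuple[str, str]  # (premise, conclusion), read as "premise implies conclusion"


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


# Truth-table check: (p -> q) has the same value as (not q -> not p) in all cases.
equivalent = all(
    implies(p, q) == implies(not q, not p)
    for p, q in product([False, True], repeat=2)
)


def negate(atom: str) -> str:
    """Toggle a leading 'not ' on a symbolic atom (hypothetical encoding)."""
    return atom[4:] if atom.startswith("not ") else "not " + atom


def contrapositive(rule: Rule) -> Rule:
    premise, conclusion = rule
    return negate(conclusion), negate(premise)


def augment(rules: List[Rule]) -> List[Rule]:
    """Return the original rules followed by their contrapositives."""
    return rules + [contrapositive(r) for r in rules]


rules = [("rains", "wet_ground"), ("not studied", "fails_exam")]
augmented = augment(rules)
print(equivalent)
for premise, conclusion in augmented:
    print(f"{premise} -> {conclusion}")
```

Because the two forms are equivalent, adding the contrapositive changes nothing logically; it only exposes the rule in the direction an indirect argument needs.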
# QuEST: 効率的な選択ファインタニングによる低ビット拡散モデル量子化 QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning ( http://arxiv.org/abs/2402.03666v1 ) ライセンス: Link先を確認	Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan	(参考訳) 拡散モデルは画像生成タスクで著しく成功したが、実際のデプロイメントは高いメモリ消費と時間消費によって抑制されている。量子化は拡散モデル圧縮と加速の方法であるが、既存の手法はモデルが低ビットに量子化されると完全に失敗する。本稿では,不均衡な活性化分布,不正確な時間情報,特定のモジュールの摂動に対する脆弱性という,現在の手法の有効性を損なう量子化拡散モデルの3つの特性を明らかにする。分散不均衡に起因する高密度低ビット量子化の難しさを軽減するため,活性化分布に適応する量子化モデルを微調整する。この考え方に基づき、重要な時間情報を保持する層とビット幅の低減に敏感な層という2つの重要な種類の量子化層を識別し、性能劣化を効率良く緩和するために微調整する。提案手法がアクティベーション分布を変化させ、意味のある時間情報を提供し、より簡単で正確な量子化を容易にすることを実証的に検証する。本手法は,3つの高分解能画像生成タスクで評価され,様々なビット幅設定で最先端の性能を実現するとともに,フル4ビット(すなわちw4a4)の安定拡散で可読性画像を生成する最初の方法である。 Diffusion models have achieved remarkable success in image generation tasks, yet their practical deployment is restrained by the high memory and time consumption. While quantization paves a way for diffusion model compression and acceleration, existing methods totally fail when the models are quantized to low-bits. In this paper, we unravel three properties in quantized diffusion models that compromise the efficacy of current methods: imbalanced activation distributions, imprecise temporal information, and vulnerability to perturbations of specific modules. To alleviate the intensified low-bit quantization difficulty stemming from the distribution imbalance, we propose finetuning the quantized model to better adapt to the activation distribution. Building on this idea, we identify two critical types of quantized layers: those holding vital temporal information and those sensitive to reduced bit-width, and finetune them to mitigate performance degradation with efficiency. We empirically verify that our approach modifies the activation distribution and provides meaningful temporal information, facilitating easier and more accurate quantization. 
Our method is evaluated over three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings, as well as being the first method to generate readable images on full 4-bit (i.e., W4A4) Stable Diffusion.	翻訳日:2024-02-07 16:42:00 公開日:2024-02-06
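Why an imbalanced activation distribution hurts low-bit quantization can be seen with a generic uniform min-max quantizer. This sketch is not the QuEST method (which finetunes selected layers); it only illustrates the problem the abstract names, and the quantizer and the toy data are assumptions.

```python
from typing import List, Sequence


def quantize(values: Sequence[float], bits: int) -> List[float]:
    """Uniform min-max quantization to 2**bits levels, then dequantization."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in values]


def mean_abs_error(original: Sequence[float], approx: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(original, approx)) / len(original)


# Balanced toy "activations" spread evenly over [0, 1].
balanced = [i / 100 for i in range(101)]
# The same values plus one large outlier, mimicking an imbalanced distribution.
imbalanced = balanced + [50.0]

err_balanced = mean_abs_error(balanced, quantize(balanced, bits=4))
err_imbalanced = mean_abs_error(imbalanced, quantize(imbalanced, bits=4))
print(err_balanced, err_imbalanced)
```

With only 16 levels, the single outlier stretches the quantization range so far that every ordinary value collapses onto the lowest level, and the error grows by more than an order of magnitude.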
# AoSRNet:マルチ知識統合によるオールインワンのシーンリカバリネットワーク AoSRNet: All-in-One Scene Recovery Networks via Multi-knowledge Integration ( http://arxiv.org/abs/2402.03738v1 ) ライセンス: Link先を確認	Yuxu Lu, Dong Yang, Yuan Gao, Ryan Wen Liu, Jun Liu, Yu Guo	(参考訳) 非均質な撮像媒体における光の散乱と減衰、あるいは不整合な光強度は、収集された画像のコントラスト不足と色歪みを引き起こし、視覚駆動型スマートシティ、自動運転車、インテリジェントロボットなどの開発を制限する。本稿では,典型的な低視認性シーン(ヘイズ、砂塵、低照度など)における撮像装置の視認性を向上させるために,マルチ知識統合を用いたオールインワンシーン回復ネットワーク(AoSRNet)を提案する。ガンマ補正(GC)と最適化線形ストレッチ(OLS)を組み合わせてディテール拡張モジュール(DEM)とカラー復元モジュール(CRM)を作成する。さらに,GC非線形およびOLS線形変換による画像テクスチャ詳細の損失を軽減するために,マルチ受容野抽出モジュール(MEM)を提案する。最後に,DEM,CRM,MEMが生成する粗い特徴をエンコーダデコーダを通じて洗練し,最終的な復元画像を生成する。総合実験の結果,AoSRNetは他の最先端手法と比較して有効性と安定性を示した。ソースコードは \url{https://github.com/LouisYuxuLu/AoSRNet} で入手できる。 Scattering and attenuation of light in non-homogeneous imaging media or inconsistent light intensity will cause insufficient contrast and color distortion in the collected images, which limits the development of applications such as vision-driven smart cities, autonomous vehicles, and intelligent robots. In this paper, we propose an all-in-one scene recovery network via multi-knowledge integration (termed AoSRNet) to improve the visibility of imaging devices in typical low-visibility imaging scenes (e.g., haze, sand dust, and low light). It combines gamma correction (GC) and optimized linear stretching (OLS) to create the detail enhancement module (DEM) and color restoration module (CRM). Additionally, we suggest a multi-receptive field extraction module (MEM) to attenuate the loss of image texture details caused by GC nonlinear and OLS linear transformations. Finally, we refine the coarse features generated by DEM, CRM, and MEM through Encoder-Decoder to generate the final restored image. Comprehensive experimental results demonstrate the effectiveness and stability of AoSRNet compared to other state-of-the-art methods. The source code is available at \url{https://github.com/LouisYuxuLu/AoSRNet}.	翻訳日:2024-02-07 16:35:09 公開日:2024-02-06
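The two classical operations named in the AoSRNet abstract, gamma correction and linear stretching, can be sketched on a row of normalized pixel intensities. This is plain min-max stretching, not the paper's optimized variant, and the function names and sample values are assumptions for illustration.

```python
from typing import List, Optional, Sequence


def gamma_correct(pixels: Sequence[float], gamma: float) -> List[float]:
    """Nonlinear transform on intensities in [0, 1]; gamma < 1 brightens."""
    return [p ** gamma for p in pixels]


def linear_stretch(pixels: Sequence[float],
                   low: Optional[float] = None,
                   high: Optional[float] = None) -> List[float]:
    """Linearly map [low, high] onto [0, 1], clipping values outside it.

    Defaults to the observed minimum and maximum (plain min-max stretching).
    """
    lo = min(pixels) if low is None else low
    hi = max(pixels) if high is None else high
    if hi <= lo:
        return list(pixels)  # flat input: nothing to stretch
    return [min(1.0, max(0.0, (p - lo) / (hi - lo))) for p in pixels]


# A low-contrast, hazy-looking row: all intensities crowd around mid-gray.
hazy = [0.40, 0.45, 0.50, 0.55, 0.60]
brightened = gamma_correct(hazy, gamma=0.5)
stretched = linear_stretch(hazy)
print(brightened)
print(stretched)
```

Gamma correction changes brightness nonlinearly while stretching widens the occupied intensity range, which is why the abstract treats them as complementary sources of detail and color information.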
# 変分オートエンコーダによる異常検出の統計的検定 Statistical Test for Anomaly Detections by Variational Auto-Encoders ( http://arxiv.org/abs/2402.03724v1 ) ライセンス: Link先を確認	Daiki Miwa, Tomohiro Shiraishi, Vo Nguyen Le Duy, Teruyuki Katsuoka, Ichiro Takeuchi	(参考訳) 本研究では,変分オートエンコーダ(VAE)を用いた異常検出(AD)の信頼性評価について検討する。過去10年間で、VAEベースのADは、メソッド開発から応用研究まで、様々な観点から活発に研究されてきた。しかし, 医療診断などの重大な意思決定にADの結果を使用する場合には, 検出された異常の信頼性を確保する必要がある。本研究では,VAEベースADの統計的信頼性を統計的検定の枠組みで定量化する方法としてVAE-ADテストを提案する。 VAE-ADテストを用いて、VAEによって検出された異常領域の信頼性をp値の形で定量することができる。これは、p値が一定の閾値以下であるときに異常が宣言された場合、誤検出の確率を所望のレベルまで制御することができることを意味する。 VAE-ADテストは選択推論と呼ばれる新しい統計的推論フレームワークに基づいて構築されるため、その妥当性は有限標本で理論的に保証される。提案するVAE-ADテストの妥当性と有効性を示すために,人工データに関する数値実験と脳画像解析への応用を行った。 In this study, we consider the reliability assessment of anomaly detection (AD) using a Variational Autoencoder (VAE). Over the last decade, VAE-based AD has been actively studied from various perspectives, from method development to applied research. However, when the results of ADs are used in high-stakes decision-making, such as in medical diagnosis, it is necessary to ensure the reliability of the detected anomalies. In this study, we propose the VAE-AD Test as a method for quantifying the statistical reliability of VAE-based AD within the framework of statistical testing. Using the VAE-AD Test, the reliability of the anomaly regions detected by a VAE can be quantified in the form of p-values. This means that if an anomaly is declared when the p-value is below a certain threshold, it is possible to control the probability of false detection to a desired level. Since the VAE-AD Test is constructed based on a new statistical inference framework called selective inference, its validity is theoretically guaranteed in finite samples. To demonstrate the validity and effectiveness of the proposed VAE-AD Test, numerical experiments on artificial data and applications to brain image analysis are conducted.	翻訳日:2024-02-07 16:34:46 公開日:2024-02-06
# Rig3DGS: Casual Monocular Videoからコントロール可能なポートレイを作る Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos ( http://arxiv.org/abs/2402.03723v1 ) ライセンス: Link先を確認	Alfredo Rivero, ShahRukh Athar, Zhixin Shu, Dimitris Samaras	(参考訳) コントロール可能な3D人間の肖像画をカジュアルなスマートフォンビデオから作成することが非常に望ましい。最近の3Dガウススティング(3DGS)は、レンダリング品質とトレーニング効率が改善されている。しかし、高品質なレンダリングを実現するために、シングルビューキャプチャーから頭部の動きや表情を正確にモデル化し、切り離すことは依然として課題である。本稿では,この課題に対処するためにRig3DGSを紹介する。ダイナミックな主題を含むシーン全体を、標準空間における3Dガウスの集合を用いて表現する。頭部ポーズや表情などの一連の制御信号を用いて、学習した変形を伴って3次元空間に変換し、所望のレンダリングを生成する。我々の重要な革新は、慎重に設計された変形法であり、3次元形態素モデルから学習可能な先行モデルによって導かれる。このアプローチは、トレーニングにおいて非常に効率的であり、表情、頭の位置、様々なキャプチャ全体にわたるビュー合成の制御に効果的である。定量的および定性的な実験によって学習した変形の有効性を実証する。プロジェクトページはhttp://shahrukhathar.github.io/2024/02/05/Rig3DGS.htmlにある。 Creating controllable 3D human portraits from casual smartphone videos is highly desirable due to their immense value in AR/VR applications. The recent development of 3D Gaussian Splatting (3DGS) has shown improvements in rendering quality and training efficiency. However, it still remains a challenge to accurately model and disentangle head movements and facial expressions from a single-view capture to achieve high-quality renderings. In this paper, we introduce Rig3DGS to address this challenge. We represent the entire scene, including the dynamic subject, using a set of 3D Gaussians in a canonical space. Using a set of control signals, such as head pose and expressions, we transform them to the 3D space with learned deformations to generate the desired rendering. Our key innovation is a carefully designed deformation method which is guided by a learnable prior derived from a 3D morphable model. This approach is highly efficient in training and effective in controlling facial expressions, head positions, and view synthesis across various captures. We demonstrate the effectiveness of our learned deformation through extensive quantitative and qualitative experiments. 
The project page can be found at http://shahrukhathar.github.io/2024/02/05/Rig3DGS.html	翻訳日:2024-02-07 16:34:20 公開日:2024-02-06
# グラフLLMの類似性に基づく近傍選択 Similarity-based Neighbor Selection for Graph LLMs ( http://arxiv.org/abs/2402.03720v1 ) ライセンス: Link先を確認	Rui Li, Jiwei Li, Jiawei Han, Guoyin Wang	(参考訳) テキスト分散グラフ(TAGs)は、言語学習モデル(LLMs)による直接処理に特有の課題を提示するが、その広範な常識知識と頑健な推論能力は、TAGsにおけるノード分類に大きな可能性を秘めている。この分野での以前の研究は、データセット分割の不整合と高度なLCMの非活用によってさらに複雑化され、オーバー・スクワッシング、ヘテロフィリー、非効率的なグラフ情報統合といった問題に悩まされてきた。これらの課題に対処するために,類似性に基づく近傍選択 (sns) を導入する。 SNSはSimCSEと高度な隣人選別技術を用いて、選択した隣人の品質を効果的に改善し、グラフ表現を改善し、オーバースカッシングやヘテロフィリーといった問題を緩和する。さらに、インダクティブでトレーニングのないアプローチとして、SNSは従来のGNN手法よりも優れた一般化とスケーラビリティを示している。我々の総合的な実験は、標準データセット分割のプラクティスに固執し、SNSは、LLMとの単純な迅速な相互作用を通じて、バニラGNNを一貫して上回り、ノード分類におけるPubMedのようなデータセットの最先端結果、グラフ構造理解におけるLLMの可能性を示す。本研究は,LLMアプリケーションにおけるグラフ構造統合の重要性をさらに強調し,ノード分類の成功要因を明らかにした。コードはhttps://github.com/ruili33/SNSで入手できる。 Text-attributed graphs (TAGs) present unique challenges for direct processing by Language Learning Models (LLMs), yet their extensive commonsense knowledge and robust reasoning capabilities offer great promise for node classification in TAGs. Prior research in this field has grappled with issues such as over-squashing, heterophily, and ineffective graph information integration, further compounded by inconsistencies in dataset partitioning and underutilization of advanced LLMs. To address these challenges, we introduce Similarity-based Neighbor Selection (SNS). Using SimCSE and advanced neighbor selection techniques, SNS effectively improves the quality of selected neighbors, thereby improving graph representation and alleviating issues like over-squashing and heterophily. Besides, as an inductive and training-free approach, SNS demonstrates superior generalization and scalability over traditional GNN methods. 
Our comprehensive experiments, adhering to standard dataset partitioning practices, demonstrate that SNS, through simple prompt interactions with LLMs, consistently outperforms vanilla GNNs and achieves state-of-the-art results on datasets like PubMed in node classification, showcasing LLMs' potential in graph structure understanding. Our research further underscores the significance of graph structure integration in LLM applications and identifies key factors for their success in node classification. Code is available at https://github.com/ruili33/SNS.	翻訳日:2024-02-07 16:34:02 公開日:2024-02-06
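The core of similarity-based neighbor selection can be sketched as ranking a node's graph neighbors by embedding similarity and keeping the top k. The abstract says SNS uses SimCSE embeddings; here the embeddings are hand-written two-dimensional vectors, and `select_neighbors` and the node names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_neighbors(node: str, neighbors: List[str],
                     embeddings: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Keep the k graph neighbors whose text embeddings are closest to the node's."""
    ranked = sorted(
        neighbors,
        key=lambda n: cosine(embeddings[node], embeddings[n]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings standing in for sentence embeddings of each node's text.
embeddings = {
    "paper_a": [1.0, 0.0],
    "paper_b": [0.9, 0.1],
    "paper_c": [0.0, 1.0],
    "paper_d": [0.7, 0.7],
}
picked = select_neighbors("paper_a", ["paper_b", "paper_c", "paper_d"], embeddings, k=2)
print(picked)
```

Only the selected neighbors' text would then be placed in the LLM prompt, which keeps dissimilar (heterophilous) neighbors such as `paper_c` out of the context.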
# 映像に基づく衣服交換者再識別のための注意型形状と歩行表現学習 Attention-based Shape and Gait Representations Learning for Video-based Cloth-Changing Person Re-Identification ( http://arxiv.org/abs/2402.03716v1 ) ライセンス: Link先を確認	Vuong D. Nguyen, Samiha Mirza, Pranav Mantini, Shishir K. Shah	(参考訳) 現在最先端のビデオベースPerson Re-Identification (Re-ID)は、主にディープラーニングモデルによって抽出された外観特徴に依存している。これらの方法は、着替えた人が実世界のシナリオで長期分析に当てはまらないため、外観情報が信頼できない。本稿では、VCCRe-IDのための「注意に基づく形状と歩行表現学習」(ASGL)を提案することにより、ビデオベースの衣服交換者Re-ID(VCCRe-ID)の実践的問題に対処する。我々のASGLフレームワークは,空間時空間グラフアテンションネットワーク(ST-GAT)を用いて衣服不変歩行キューを学習することにより,衣服の変動下でのRe-ID性能を向上させる。提案するST-GATは,3次元スケルトンに基づく時空間時間グラフを考慮し,視点変化や閉塞下での歩行埋め込みの堅牢性を高めることができるマルチヘッドアテンションモジュールを備える。 ST-GATは重要な動き範囲を増幅し、ノイズポーズの影響を低減する。そして、マルチヘッド学習モジュールは、有効な局所時間的運動動態を効果的に予約する。また,GATを用いて身体形状の手がかりを学習することで,人物表現の識別力を高める。大規模VCCRe-IDデータセットの2つの実験により、提案するフレームワークは、ランク1の精度で12.2%、mAPで7.0%、最先端の手法より優れていることが示された。 Current state-of-the-art Video-based Person Re-Identification (Re-ID) primarily relies on appearance features extracted by deep learning models. These methods are not applicable for long-term analysis in real-world scenarios where persons have changed clothes, making appearance information unreliable. In this work, we deal with the practical problem of Video-based Cloth-Changing Person Re-ID (VCCRe-ID) by proposing "Attention-based Shape and Gait Representations Learning" (ASGL) for VCCRe-ID. Our ASGL framework improves Re-ID performance under clothing variations by learning clothing-invariant gait cues using a Spatial-Temporal Graph Attention Network (ST-GAT). Given the 3D-skeleton-based spatial-temporal graph, our proposed ST-GAT comprises multi-head attention modules, which are able to enhance the robustness of gait embeddings under viewpoint changes and occlusions. The ST-GAT amplifies the important motion ranges and reduces the influence of noisy poses. Then, the multi-head learning module effectively reserves beneficial local temporal dynamics of movement. 
We also boost the discriminative power of person representations by learning body shape cues using a GAT. Experiments on two large-scale VCCRe-ID datasets demonstrate that our proposed framework outperforms state-of-the-art methods by 12.2% in rank-1 accuracy and 7.0% in mAP.	翻訳日:2024-02-07 16:33:36 公開日:2024-02-06
# Clarify: 自然言語補正によるモデルロバストネスの改善 Clarify: Improving Model Robustness With Natural Language Corrections ( http://arxiv.org/abs/2402.03715v1 ) ライセンス: Link先を確認	Yoonho Lee, Michelle S. Lam, Helena Vasconcelos, Michael S. Bernstein, Chelsea Finn	(参考訳) 教師付き学習では、モデルは静的データセットから相関を抽出するために訓練される。これはしばしばハイレベルな誤解に依存するモデルにつながる。このような誤解を防ぐためには、トレーニングデータ以外の追加情報を提供しなければならない。既存の手法には、スパイラルな特徴のラベルやバランスの取れた分布からのラベル付きデータなど、追加のインスタンスレベルの監視形式が組み込まれている。このような戦略は、元のトレーニングデータに近いスケールで追加のアノテーションを必要とするため、大規模なデータセットでは、非常にコストがかかる可能性がある。モデルの誤解に対する目標とする自然言語フィードバックは、さらなる監視のより効率的な形式である、という仮説を立てる。モデル誤解をインタラクティブに修正する新しいインターフェースと方法であるClarifyを紹介した。 Clarifyを通じて、モデルの一貫性のある障害パターンを記述するための短いテキスト記述のみを提供する必要がある。そして、完全に自動化された方法で、トレーニングデータを再重み付けしたり、追加のターゲットデータを集めることで、トレーニングプロセスを改善するためにこのような記述を使用します。ユーザ調査の結果,非熟練ユーザは2つのデータセットにおいて,最悪のグループ精度を平均17.1%向上させることで,モデルの誤解をうまく記述できることがわかった。さらに,imagenetデータセットにおける31個の新規ハードサブポピュレーションの発見と修正を行い,マイノリティ分散精度を21.1%から28.7%に向上させた。 In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on high-level misconceptions. To prevent such misconceptions, we must necessarily provide additional information beyond the training data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labeled data from a balanced distribution. Such strategies can become prohibitively costly for large-scale datasets since they require additional annotation at a scale close to the original training data. We hypothesize that targeted natural language feedback about a model's misconceptions is a more efficient form of additional supervision. We introduce Clarify, a novel interface and method for interactively correcting model misconceptions. Through Clarify, users need only provide a short text description to describe a model's consistent failure patterns. 
Then, in an entirely automated way, we use such descriptions to improve the training process by reweighting the training data or gathering additional targeted data. Our user studies show that non-expert users can successfully describe model misconceptions via Clarify, improving worst-group accuracy by an average of 17.1% in two datasets. Additionally, we use Clarify to find and rectify 31 novel hard subpopulations in the ImageNet dataset, improving minority-split accuracy from 21.1% to 28.7%.	翻訳日:2024-02-07 16:33:08 公開日:2024-02-06
# ウェアラブルデバイスにおける位置不変およびデバイス非依存モーションアクティビティ認識の進歩 Advancing Location-Invariant and Device-Agnostic Motion Activity Recognition on Wearable Devices ( http://arxiv.org/abs/2402.03714v1 ) ライセンス: Link先を確認	Rebecca Adaimi, Abdelkareem Bedri, Jun Gong, Richard Kang, Joanna Arreaza-Taylor, Gerri-Michelle Pascual, Michael Ralph, and Gierad Laput	(参考訳) ウェアラブルセンサーは人々の生活に浸透し、インタラクティブシステムやアクティビティ認識においてインパクトのある応用をもたらしている。しかし、センシングの不均一性を扱う際には、プラットフォームごとにカスタムモデルが必要となるため、実践者は重大な障害に直面する。本稿では,センサ位置をまたいだモーションモデルの一般化可能性について総合的な評価を行う。我々の分析は、この課題を浮き彫りにし、あらゆるデバイスに組み込むことができる位置不変モデルを構築する上で重要な身体上の位置を特定する。このために、最大のマルチロケーションアクティビティデータセット(N=50、累積200時間)を導入し、公開する。また,センサ配置に関係なく,単一モデルで91.41%のフレームレベルF1スコアに到達する、デバイス上にデプロイ可能なモーションモデルも提示する。最後に,別の位置のデータからある位置のデータを合成することで,手間のかかるデータ収集タスクを軽減することを目的とした,クロスロケーションデータ合成について検討する。これらの貢献は,低障壁で位置不変なアクティビティ認識システムという我々のビジョンを前進させ,HCIとユビキタスコンピューティングの研究を促進する。 Wearable sensors have permeated into people's lives, ushering in impactful applications in interactive systems and activity recognition. However, practitioners face significant obstacles when dealing with sensing heterogeneities, requiring custom models for different platforms. In this paper, we conduct a comprehensive evaluation of the generalizability of motion models across sensor locations. Our analysis highlights this challenge and identifies key on-body locations for building location-invariant models that can be integrated on any device. For this, we introduce the largest multi-location activity dataset (N=50, 200 cumulative hours), which we make publicly available. We also present deployable on-device motion models reaching 91.41% frame-level F1-score from a single model irrespective of sensor placements. Lastly, we investigate cross-location data synthesis, aiming to alleviate the laborious data collection tasks by synthesizing data in one location given data from another. These contributions advance our vision of low-barrier, location-invariant activity recognition systems, catalyzing research in HCI and ubiquitous computing.	
翻訳日:2024-02-07 16:32:48 公開日:2024-02-06
# Leggett-Garg不等式を用いた単一システムによる認証ランダムネスの生成 Single system based generation of certified randomness using Leggett-Garg inequality ( http://arxiv.org/abs/2402.03712v1 ) ライセンス: Link先を確認	Pingal Pratyush Nath, Debashis Saha, Dipankar Home, Urbasi Sinha	(参考訳) ループホールフリーフォトニックアーキテクチャにおいて、レゲット・ガーグの不等式違反を利用して、半デバイス非依存な量子乱数生成のためのセキュアなスキームを理論的に定式化し、実験的に実証する。生成したランダム性の定量化は、解析的および数値的アプローチによって厳密に推定され、どちらも完全に一致している。 9,19,118ドルの真に予測不能なビットをセキュアに生成します。これは、単一のシステムの量子性を利用する信頼性の高い乱数生成器の、経験的に便利なクラスへの未探索の道を開く。 We theoretically formulate and experimentally demonstrate a secure scheme for semi-device-independent quantum random number generation by utilizing Leggett-Garg inequality violations, within a loophole-free photonic architecture. The quantification of the generated randomness is rigorously estimated by analytical as well as numerical approaches, both of which are in perfect agreement. We securely generate $9,19,118$ truly unpredictable bits. This opens up an unexplored avenue towards an empirically convenient class of reliable random number generators harnessing the quantumness of single systems.	翻訳日:2024-02-07 16:32:31 公開日:2024-02-06
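The abstract above relies on violations of the Leggett-Garg inequality. As a hedged illustration only, the sketch below evaluates the textbook three-time Leggett-Garg quantity K3 = C12 + C23 - C13 for an idealized two-level system measured at equally spaced times, where each two-time correlator is assumed to be cos(omega * tau). This is the standard textbook form, not necessarily the specific inequality, measurement scheme, or photonic setup used in the paper; the function name `k3` is hypothetical.

```python
import math


def k3(omega_tau: float) -> float:
    """Textbook three-time Leggett-Garg quantity K3 = C12 + C23 - C13.

    Assumption: an idealized precessing two-level system whose two-time
    correlators are C(t_i, t_j) = cos(omega * (t_j - t_i)), with equal
    spacing tau between the three measurement times.
    """
    c_adjacent = math.cos(omega_tau)     # C12 and C23 (time gap tau)
    c_far = math.cos(2.0 * omega_tau)    # C13 (time gap 2 * tau)
    return 2.0 * c_adjacent - c_far


# Macrorealist theories require K3 <= 1; the idealized qubit exceeds this.
MACROREALIST_BOUND = 1.0
violating_value = k3(math.pi / 3.0)
print(violating_value > MACROREALIST_BOUND)
```

In this idealized model the largest value occurs at omega * tau = pi / 3, where K3 reaches 3/2.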
# Listen, Chat, and Edit: テキストガイド付き音環境修正による聴覚体験の向上 Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience ( http://arxiv.org/abs/2402.03710v1 ) ライセンス: Link先を確認	Xilin Jiang, Cong Han, Yinghao Aaron Li, and Nima Mesgarani	(参考訳) 日常生活では、望ましい音にも望ましくない音にも遭遇するが、その有無や音量を制御できる範囲は限られている。本研究で提案する「Listen, Chat, and Edit」(LCE)は,ユーザが与えたテキスト命令に基づいて混合音中の各音源を修正する,新しいマルチモーダル混合音エディタである。 LCEの特徴は、ユーザフレンドリーなチャットインターフェースと、音源を分離することなく混合音中の複数の音源を同時に編集できる独自の機能にある。ユーザーはオープン語彙のテキストプロンプトを入力し、それを大規模言語モデルが解釈して、混合音を編集するためのセマンティックフィルタを作成する。その後、システムは混合音を構成要素に分解し、セマンティックフィルタを適用し、所望の出力に再構成する。音声と様々な音源を含む10万(100k)以上の混合音と、抽出、除去、音量制御といった多様な編集タスクのためのテキストプロンプトを備えた160時間のデータセットを開発した。実験では,全編集タスクにおける信号品質の大幅な向上と,音源の数や種類が異なるゼロショットシナリオにおける頑健な性能が示された。 In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume. Our work introduces "Listen, Chat, and Edit" (LCE), a novel multimodal sound mixture editor that modifies each sound source in a mixture based on user-provided text instructions. LCE distinguishes itself with a user-friendly chat interface and its unique ability to edit multiple sound sources simultaneously within a mixture, without needing to separate them. Users input open-vocabulary text prompts, which are interpreted by a large language model to create a semantic filter for editing the sound mixture. The system then decomposes the mixture into its components, applies the semantic filter, and reassembles it into the desired output. We developed a 160-hour dataset with over 100k mixtures, including speech and various audio sources, along with text prompts for diverse editing tasks like extraction, removal, and volume control. Our experiments demonstrate significant improvements in signal quality across all editing tasks and robust performance in zero-shot scenarios with varying numbers and types of sound sources.	翻訳日:2024-02-07 16:32:20 公開日:2024-02-06
# SISP:パンクロマティック衛星画像におけるきめ細かい船舶インスタンスセグメンテーションのためのベンチマークデータセット SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images ( http://arxiv.org/abs/2402.03708v1 ) ライセンス: Link先を確認	Pengming Feng, Mingjie Xie, Hongning Liu, Xuanjia Zhao, Guangjun He, Xueliang Zhang, Jian Guan	(参考訳) 衛星画像におけるきめ細かい船舶インスタンスセグメンテーションは、海上での活動を監視する上で非常に重要である。しかし、既存のデータセットは、きめ細かい情報やピクセル単位の位置アノテーションの不足、画像の多様性やバリエーションの不足に悩まされることが多く、このタスクの研究が制限されている。そこで本研究では,パンクロマティック衛星画像におけるきめ細かい船舶インスタンスセグメンテーションのためのベンチマークデータセットSISPを提案する。SISPは,1万枚のスライス画像にわたって,4つのきめ細かいカテゴリを持つ56,693個の十分にアノテーションされた船舶インスタンスを含み,すべての画像は解像度0.5mのSuperView-1衛星から収集されている。提案したSISPデータセットのターゲットは、高いクラス不均衡、様々なシーン、ターゲットの密度とスケールの大きなばらつき、高いクラス間類似度とクラス内多様性など、実際の衛星シーンと一致した特徴を持ち、そのためSISPデータセットは実世界のアプリケーションにより適している。さらに,衛星画像における船舶インスタンスセグメンテーションのベンチマーク手法として,Dynamic Feature Refinement-assist Instance segmentation network (DFRInst) を導入する。これは重要な特徴の明示的な表現を強化し,船舶インスタンスセグメンテーションの性能を向上させる。提案するSISPデータセット上で実験と解析を行い,ベンチマーク手法と複数の最先端手法を評価し,今後の研究を促進するためのベースラインを確立する。提案したデータセットとソースコードは、https://github.com/Justlovesmile/SISP で公開される予定である。 Fine-grained ship instance segmentation in satellite images holds considerable significance for monitoring maritime activities at sea. However, existing datasets often suffer from the scarcity of fine-grained information or pixel-wise localization annotations, as well as the insufficient image diversity and variations, thus limiting the research of this task. To this end, we propose a benchmark dataset for fine-grained Ship Instance Segmentation in Panchromatic satellite images, namely SISP, which contains 56,693 well-annotated ship instances with four fine-grained categories across 10,000 sliced images, and all the images are collected from SuperView-1 satellite with the resolution of 0.5m. 
Targets in the proposed SISP dataset have characteristics that are consistent with real satellite scenes, such as high class imbalance, various scenes, large variations in target densities and scales, and high inter-class similarity and intra-class diversity, all of which make the SISP dataset more suitable for real-world applications. In addition, we introduce a Dynamic Feature Refinement-assist Instance segmentation network, namely DFRInst, as the benchmark method for ship instance segmentation in satellite images, which can fortify the explicit representation of crucial features, thus improving the performance of ship instance segmentation. Experiments and analysis are performed on the proposed SISP dataset to evaluate the benchmark method and several state-of-the-art methods to establish baselines for facilitating future research. The proposed dataset and source codes will be available at: https://github.com/Justlovesmile/SISP.	翻訳日:2024-02-07 16:32:00 公開日:2024-02-06
# MMAUD: 最新の小型ドローンの脅威に対する総合的マルチモードアンチUAVデータセット MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats ( http://arxiv.org/abs/2402.03706v1 ) ライセンス: Link先を確認	Shenghai Yuan, Yizhuo Yang, Thien Hoang Nguyen, Thien-Minh Nguyen, Jianfei Yang, Fen Liu, Jianping Li, Han Wang, Lihua Xie	(参考訳) 有害なペイロードを輸送したり、単独で損害を与えたりする可能性を持つ小型無人航空機(UAV)がもたらす、変化し続ける課題に対応して、我々はMMAUD (a comprehensive Multi-Modal Anti-UAV Dataset) を紹介する。 MMAUDは、ドローン検出、UAVの種類の分類、軌道推定に焦点を当てることで、現代の脅威検出手法における重要なギャップに対処する。 MMAUDは、ステレオビジョン、様々なLiDAR、レーダー、オーディオアレイなど多様なセンサ入力を組み合わせている点で際立っている。また、サーマルやRGBを使って特定の視点から撮影されたデータセットよりも高い忠実度で現実世界のシナリオに対処するために不可欠な、独自の上空からの検出を提供する。さらに、MMAUDはLeicaで生成された正確なグラウンドトゥルースデータを提供し、信頼性を高め、アルゴリズムやモデルを確信をもって改良することを可能にする。これは他のデータセットには見られない。既存の研究の多くはデータセットを公開していないため、MMAUDは正確で効率的なソリューションを開発するための貴重なリソースとなる。提案するモダリティは費用対効果が高く適応性も高いため,ユーザは新しいUAV脅威検出ツールを実験し実装できる。我々のデータセットは、周囲の重機の音を取り入れることで現実世界のシナリオを忠実にシミュレートする。このアプローチはデータセットの適用性を高め、近接した車両運用中に直面する課題を正確に捉える。 MMAUDは、UAV脅威の検出、分類、軌道推定などの能力を進歩させる上で重要な役割を果たすことが期待される。私たちのデータセット、コード、設計は https://github.com/ntu-aris/MMAUD で公開される予定である。 In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which possess the potential to transport harmful payloads or independently cause damage, we introduce MMAUD: a comprehensive Multi-Modal Anti-UAV Dataset. MMAUD addresses a critical gap in contemporary threat detection methodologies by focusing on drone detection, UAV-type classification, and trajectory estimation. MMAUD stands out by combining diverse sensory inputs, including stereo vision, various Lidars, Radars, and audio arrays. It offers a unique overhead aerial detection vital for addressing real-world scenarios with higher fidelity than datasets captured on specific vantage points using thermal and RGB. Additionally, MMAUD provides accurate Leica-generated ground truth data, enhancing credibility and enabling confident refinement of algorithms and models, which has never been seen in other datasets. 
Most existing works do not disclose their datasets, making MMAUD an invaluable resource for developing accurate and efficient solutions. Our proposed modalities are cost-effective and highly adaptable, allowing users to experiment and implement new UAV threat detection tools. Our dataset closely simulates real-world scenarios by incorporating ambient heavy machinery sounds. This approach enhances the dataset's applicability, capturing the exact challenges faced during proximate vehicular operations. It is expected that MMAUD can play a pivotal role in advancing UAV threat detection, classification, trajectory estimation capabilities, and beyond. Our dataset, codes, and designs will be available in https://github.com/ntu-aris/MMAUD.	翻訳日:2024-02-07 16:31:31 公開日:2024-02-06
# FoolSDEdit: ターゲットの属性を意識して編集をステアリングする FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution ( http://arxiv.org/abs/2402.03705v1 ) ライセンス: Link先を確認	Qi Zhou, Dongxia Wang, Tianlin Li, Zhihong Xu, Yang Liu, Kui Ren, Wenhai Wang, Qing Guo	(参考訳) sdeditのような拡散モデルに基づく誘導画像合成手法は、ストローク画などの入力からリアルな画像を作成するのに優れている。しかし、既存の取り組みは主に画質に重点を置いており、しばしば重要な点を見下ろしている:拡散モデルは個々の画像ではなく、データ分布を表す。これは、ユーザーの意図に反するイメージを生成し、倫理的懸念を提起する低いが批判的な機会をもたらす。例えば、女性の特徴を持つストロークペインティングを入力したユーザは、SDEditから男性の顔を取得する可能性がある。この潜在的な脆弱性を明らかにするため,SDEdit は入力の属性特性を変えることなく,特定の属性(女性など)に一致した特定のデータ分布を生成する。本稿では,属性認識目的関数を用いたTAGA(Targeted Attribute Generative Attack)を提案し,入力ストローク絵に付加される対向雑音を最適化する。実験的な研究によると、従来の敵対的ノイズはTAGAと競合し、露光や動きのぼかしといった自然な摂動は、生成した画像の属性を容易に変化させる。効果的な攻撃を行うために、FoolSDEditを導入する: 共同対向露光とぼかし攻撃を設計し、ストローク絵に露出と動きのぼかしを追加し、それらをまとめて最適化する。我々は,ネットワークアーキテクチャ探索問題として,様々な摂動の実行戦略を最適化する。さまざまな摂動に対する多様な実行戦略を表すグラフであるsuperpertを作成します。訓練後、seditに対する効果的なtagaの実行戦略を最適化する。 2つのデータセットの総合的な実験は、SDEditがターゲット属性認識データ分布を生成することを説得し、ベースラインを著しく上回ることを示す。 Guided image synthesis methods, like SDEdit based on the diffusion model, excel at creating realistic images from user inputs such as stroke paintings. However, existing efforts mainly focus on image quality, often overlooking a key point: the diffusion model represents a data distribution, not individual images. This introduces a low but critical chance of generating images that contradict user intentions, raising ethical concerns. For example, a user inputting a stroke painting with female characteristics might, with some probability, get male faces from SDEdit. To expose this potential vulnerability, we aim to build an adversarial attack forcing SDEdit to generate a specific data distribution aligned with a specified attribute (e.g., female), without changing the input's attribute characteristics. 
We propose the Targeted Attribute Generative Attack (TAGA), using an attribute-aware objective function and optimizing the adversarial noise added to the input stroke painting. Empirical studies reveal that traditional adversarial noise struggles with TAGA, while natural perturbations like exposure and motion blur easily alter generated images' attributes. To execute effective attacks, we introduce FoolSDEdit: We design a joint adversarial exposure and blur attack, adding exposure and motion blur to the stroke painting and optimizing them together. We optimize the execution strategy of various perturbations, framing it as a network architecture search problem. We create the SuperPert, a graph representing diverse execution strategies for different perturbations. After training, we obtain the optimized execution strategy for effective TAGA against SDEdit. Comprehensive experiments on two datasets show our method compelling SDEdit to generate a targeted attribute-aware data distribution, significantly outperforming baselines.	翻訳日:2024-02-07 16:31:04 公開日:2024-02-06
# 離散時間・連続時間の離散デノイジング拡散の改善と統一 Improving and Unifying Discrete&Continuous-time Discrete Denoising Diffusion ( http://arxiv.org/abs/2402.03701v1 ) ライセンス: Link先を確認	Lingxiao Zhao, Xueying Ding, Lijun Yu, Leman Akoglu	(参考訳) 離散拡散モデルは、言語やグラフのような本質的に離散的なデータへの応用によって注目を集めている。離散時間の離散拡散は以前から確立されているが、連続時間の離散拡散の最初の枠組みは、最近になってCampbellら (2022) が導入したばかりである。しかし、その学習とサンプリングのプロセスは離散時間版とは大きく異なり、扱いやすくするために自明でない近似を必要とする。本稿ではまず, 離散拡散のためのより正確で最適化しやすい学習を可能にする、変分下界の一連の数学的単純化を示す。さらに, 正確かつ高速なサンプリングを可能にし、さらに重要なことに離散時間と連続時間の離散拡散のエレガントな統一を可能にする、逆方向のデノイジングのための簡潔な定式化を導出する。より単純な解析的定式化により、順方向および逆方向の確率はいずれも、多要素オブジェクトに対する要素ごとに異なるノイズ分布を含め、任意のノイズ分布を柔軟に扱うことができる。実験の結果,提案したUSD3 (Unified Simplified Discrete Denoising Diffusion) は,確立されたデータセット上ですべてのSOTAベースラインを上回った。統一コードは https://github.com/LingxiaoShawn/USD3 でオープンソースとして公開している。 Discrete diffusion models have seen a surge of attention with applications on naturally discrete data such as language and graphs. Although discrete-time discrete diffusion has been established for a while, only recently did Campbell et al. (2022) introduce the first framework for continuous-time discrete diffusion. However, their training and sampling processes differ significantly from the discrete-time version, necessitating nontrivial approximations for tractability. In this paper, we first present a series of mathematical simplifications of the variational lower bound that enable more accurate and easy-to-optimize training for discrete diffusion. In addition, we derive a simple formulation for backward denoising that enables exact and accelerated sampling, and importantly, an elegant unification of discrete-time and continuous-time discrete diffusion. Thanks to simpler analytical formulations, both forward and now also backward probabilities can flexibly accommodate any noise distribution, including different noise distributions for multi-element objects. Experiments show that our proposed USD3 (for Unified Simplified Discrete Denoising Diffusion) outperforms all SOTA baselines on established datasets. 
We open-source our unified code at https://github.com/LingxiaoShawn/USD3.	翻訳日:2024-02-07 16:30:32 公開日:2024-02-06
# GenLens: Visual GenAIモデル出力の体系的評価 GenLens: A Systematic Evaluation of Visual GenAI Model Outputs ( http://arxiv.org/abs/2402.03700v1 ) ライセンス: Link先を確認	Tica Lin, Hanspeter Pfister, Jui-Hsien Wang	(参考訳) コンピュータビジョンにおける生成AI(GenAI)モデルの迅速な開発は、その品質と公平性を保証するために効果的な評価方法を必要とする。既存のツールは、主にデータセットの品質保証とモデル説明可能性に焦点を当てており、モデル開発中にGenAI出力評価に大きなギャップを残しています。現在のプラクティスは、しばしば開発者の主観的な視覚的評価に依存します。本稿では、GenAIモデル開発者と産業環境で形式的な研究を行うことにより、このギャップを埋める。この結果から,モデル開発の初期段階におけるジェナイモデル出力の体系的評価を目的としたビジュアル解析インタフェースであるgenlensの開発に繋がった。 GenLensは、障害ケースの概要と注釈付け、イシュータグと分類のカスタマイズ、複数のユーザからのアノテーションの集約によるコラボレーション強化のための定量的なアプローチを提供する。モデル開発者によるユーザ調査によると、GenLensはワークフローを効果的に強化し、高い満足度と、それをプラクティスに統合する強い意図によって証明されている。本研究は、GenAI開発における堅牢な早期評価ツールの重要性を強調し、公正かつ高品質なGenAIモデルの進歩に寄与する。 The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which may lack scalability and generalizability. This paper bridges this gap by conducting a formative study with GenAI model developers in an industrial setting. Our findings led to the development of GenLens, a visual analytic interface designed for the systematic evaluation of GenAI model outputs during the early stages of model development. GenLens offers a quantifiable approach for overviewing and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to enhance collaboration. A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates and a strong intent to integrate it into their practices. 
This research underscores the importance of robust early-stage evaluation tools in GenAI development, contributing to the advancement of fair and high-quality GenAI models.	翻訳日:2024-02-07 16:30:12 公開日:2024-02-06
# Vision Superalignment: Vision Foundation Modelsのための弱から強の一般化 Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models ( http://arxiv.org/abs/2402.03749v1 ) ライセンス: Link先を確認	Jianyuan Guo, Hanting Chen, Chengcheng Wang, Kai Han, Chang Xu, Yunhe Wang	(参考訳) 大規模言語モデルの最近の進歩は、その異常でほぼ超人的な能力への関心を喚起し、研究者はこれらの能力を評価し最適化する方法を探究する。この文脈において、我々の論文は、より弱いモデルを用いてより強いモデルを監督する弱い一般化の概念に焦点を当て、前者の限界を超えて後者の能力を高めることを目的として、視覚基盤モデルの領域を掘り下げる。弱強監督のための新規かつ適応的に調整可能な損失関数を提案する。包括的実験は、少数ショット学習、移行学習、ノイズラベル学習、共通知識蒸留設定など、さまざまなシナリオにまたがる。私たちのアプローチは、強固な一般化によって設定されたパフォーマンスベンチマークを超えるだけでなく、データセット全体を微調整した強固なモデルの結果を超えます。この説得力のある証拠は、弱強一般化の有意義な可能性を強調し、その能力が視覚基盤モデルの性能を大幅に高めることを示した。コードはhttps://github.com/ggjy/vision_weak_to_strongで入手できる。 Recent advancements in large language models have sparked interest in their extraordinary and near-superhuman capabilities, leading researchers to explore methods for evaluating and optimizing these abilities, which is called superalignment. In this context, our paper delves into the realm of vision foundation models, focusing on the concept of weak-to-strong generalization, which involves using a weaker model to supervise a stronger one, aiming to enhance the latter's capabilities beyond the former's limits. We introduce a novel and adaptively adjustable loss function for weak-to-strong supervision. Our comprehensive experiments span various scenarios, including few-shot learning, transfer learning, noisy label learning, and common knowledge distillation settings. The results are striking: our approach not only exceeds the performance benchmarks set by strong-to-strong generalization but also surpasses the outcomes of fine-tuning strong models with whole datasets. This compelling evidence underscores the significant potential of weak-to-strong generalization, showcasing its capability to substantially elevate the performance of vision foundation models. The code is available at https://github.com/ggjy/vision_weak_to_strong.	
翻訳日:2024-02-07 16:21:24 公開日:2024-02-06
# PDE発見のための不変制約深層学習ネットワーク An invariance constrained deep learning network for PDE discovery ( http://arxiv.org/abs/2402.03747v1 ) ライセンス: Link先を確認	Chao Chen, Hui Li, Xiaowei Jin	(参考訳) データセットから偏微分方程式(PDE)の発見が注目されている。しかし, 導関数計算の難易度やノイズの乱れなどにより, 高ノイズのスパースデータからの制御方程式の発見はいまだに困難である。さらに、物理法則を満たすための図書館の選択原則をさらに研究する必要がある。不変性は方程式の基本的な法則の1つである。本研究では,PDEの発見のための分散制約付きディープラーニングネットワーク(ICNet)を提案する。時空間変換不変性(ガリレオ不変性)が物理法則の基本的な性質であることを考えると、ガリレオ変換の要件を満たすことができない候補をフィルタリングする。その後,ニューラルネットワークの損失関数に固定項と可能な項を組み込み,ノイズの多いスパースデータの影響を著しく抑制した。そして、学習可能パラメータを固定することなく冗長項をフィルタリングすることにより、ICNet法で発見された支配方程式を効果的に近似することができる。 2次元バーガース方程式、障害物上の2次元チャネルフロー方程式、および3次元頭蓋内動脈瘤方程式を選択し、流体力学におけるicnetの優位性を検証する。さらに、同様の不変性法を波動方程式(ローレンツ不変性)の発見に拡張し、シングルおよび結合されたクライン・ゴルドン方程式を用いて検証する。その結果, 物理制約付きICNet法は, スパースおよびノイズの多いデータから方程式を探索する際の優れた性能を示した。 The discovery of partial differential equations (PDEs) from datasets has attracted increased attention. However, the discovery of governing equations from sparse data with high noise is still very challenging due to the difficulty of derivatives computation and the disturbance of noise. Moreover, the selection principles for the candidate library to meet physical laws need to be further studied. The invariance is one of the fundamental laws for governing equations. In this study, we propose an invariance constrained deep learning network (ICNet) for the discovery of PDEs. Considering that temporal and spatial translation invariance (Galilean invariance) is a fundamental property of physical laws, we filter the candidates that cannot meet the requirement of the Galilean transformations. Subsequently, we embedded the fixed and possible terms into the loss function of neural network, significantly countering the effect of sparse data with high noise. Then, by filtering out redundant terms without fixing learnable parameters during the training process, the governing equations discovered by the ICNet method can effectively approximate the real governing equations. 
We select the 2D Burgers equation, the equation of 2D channel flow over an obstacle, and the equation of 3D intracranial aneurysm as examples to verify the superiority of the ICNet for fluid mechanics. Furthermore, we extend similar invariance methods to the discovery of wave equation (Lorentz Invariance) and verify it through Single and Coupled Klein-Gordon equation. The results show that the ICNet method with physical constraints exhibits excellent performance in governing equations discovery from sparse and noisy data.	翻訳日:2024-02-07 16:21:05 公開日:2024-02-06
# AIフィードバックによる強化学習を用いたビデオ用大規模マルチモーダルモデルのチューニング Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ( http://arxiv.org/abs/2402.03746v1 ) ライセンス: Link先を確認	Daechul Ahn, Yura Choi, Youngjae Yu, Dongyeop Kang and Jonghyun Choi	(参考訳) 近年の大規模言語モデルの発展はビデオ大マルチモーダルモデル(VLMM)の発展に影響を与えている。 VLMMの以前のアプローチには、命令調整されたデータセットを使用したSupervised Fine-Tuning (SFT)、ビジュアルエンコーダとLLMの統合、学習可能なモジュールの追加が含まれていた。ビデオとテキストのマルチモーダルアライメントは、主にテキストのみのデータと比較してマルチモーダル命令・トゥンデータのボリュームと品質が不足しているため、依然として困難である。本稿では,AIフィードバックからの強化学習(Reinforcement Learning from AI Feedback, RLAIF)と呼ばれる,マルチモーダルAIシステムを利用した新たなアライメント戦略を提案する。具体的には,映像コンテンツの理解を深めるために,嗜好フィードバック生成時のコンテキストとして詳細な映像記述を提供することにより,文脈対応報酬モデリングを提案する。我々のマルチモーダルRLAIFアプローチであるVLM-RLAIFはSFTモデルを含む既存の手法よりも優れています。私たちは、この分野のさらなる研究を促進するために、コード、モデル、データセットをオープンソース化することを約束します。 Recent advancements in large language models have influenced the development of video large multimodal models (VLMMs). The previous approaches for VLMMs involved Supervised Fine-Tuning (SFT) with instruction-tuned datasets, integrating LLM with visual encoders, and adding additional learnable modules. Video and text multimodal alignment remains challenging, primarily due to the deficient volume and quality of multimodal instruction-tune data compared to text-only data. We present a novel alignment strategy that employs multimodal AI system to oversee itself called Reinforcement Learning from AI Feedback (RLAIF), providing self-preference feedback to refine itself and facilitating the alignment of video and text modalities. In specific, we propose context-aware reward modeling by providing detailed video descriptions as context during the generation of preference feedback in order to enrich the understanding of video content. Demonstrating enhanced performance across diverse video benchmarks, our multimodal RLAIF approach, VLM-RLAIF, outperforms existing approaches, including the SFT model. 
We commit to open-sourcing our code, models, and datasets to foster further research in this area.	翻訳日:2024-02-07 16:20:40 公開日:2024-02-06
# INSIDE: LLMの内部状態は幻覚検出の力を保持している INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection ( http://arxiv.org/abs/2402.03744v1 ) ライセンス: Link先を確認	Chao Chen, Kai Liu, Ze Chen, Yi Gu, Yue Wu, Mingyuan Tao, Zhihang Fu, Jieping Ye	(参考訳) 知識の幻覚は、デプロイされたLLMのセキュリティと信頼性に対する幅広い懸念を引き起こしている。従来の幻覚検出の取り組みは、ロジットレベルの不確実性推定や言語レベルの自己整合性評価で行われてきたが、そこではトークン復号の過程で意味情報が必然的に失われる。そこで我々は,幻覚検出のために、LLMの内部状態に保持される密な意味情報を探索することを提案する (INSIDE: INternal States for hallucInation DEtection)。特に、応答の自己整合性をよりよく評価するために、単純で効果的なEigenScore指標を提案する。これは応答の共分散行列の固有値を利用して、密な埋め込み空間における意味的一貫性/多様性を測定する。さらに、自己整合性に基づく幻覚検出の観点から、内部状態における極端な活性化を切り詰めるテスト時特徴クリッピング手法を検討する。これは過信した生成を減らし、過信による幻覚の検出に有用である可能性がある。いくつかの一般的なLLMと質問応答(QA)ベンチマークで広範な実験とアブレーション実験を行い,提案手法の有効性を示した。 Knowledge hallucinations have raised widespread concerns about the security and reliability of deployed LLMs. Previous efforts in detecting hallucinations have operated at the level of logit-level uncertainty estimation or language-level self-consistency evaluation, where the semantic information is inevitably lost during the token-decoding procedure. Thus, we propose to explore the dense semantic information retained within LLMs' INternal States for hallucInation DEtection (INSIDE). In particular, a simple yet effective EigenScore metric is proposed to better evaluate responses' self-consistency, which exploits the eigenvalues of responses' covariance matrix to measure the semantic consistency/diversity in the dense embedding space. Furthermore, from the perspective of self-consistent hallucination detection, a test time feature clipping approach is explored to truncate extreme activations in the internal states, which reduces overconfident generations and potentially benefits the detection of overconfident hallucinations. Extensive experiments and ablation studies are performed on several popular LLMs and question-answering (QA) benchmarks, showing the effectiveness of our proposal.	
翻訳日:2024-02-07 16:20:22 公開日:2024-02-06
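The INSIDE abstract describes EigenScore only as a metric that uses the eigenvalues of the responses' covariance matrix to measure semantic diversity. The sketch below is one plausible reading of that sentence, not the authors' implementation. The assumptions are: embeddings of K sampled responses are centered by subtracting the mean response, the K x K Gram matrix is regularized with a small `alpha`, and the score is the mean log-eigenvalue, computed as a log-determinant divided by K. The names `eigenscore_sketch` and `alpha`, and the centering choice, are hypothetical.

```python
import math


def _logdet_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky.

    The log-determinant equals the sum of the log-eigenvalues, so no
    explicit eigendecomposition is needed.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][t] * lower[j][t] for t in range(j))
            if i == j:
                lower[i][i] = math.sqrt(s)
                logdet += 2.0 * math.log(lower[i][i])
            else:
                lower[i][j] = s / lower[j][j]
    return logdet


def eigenscore_sketch(embeddings, alpha=1e-3):
    """Mean log-eigenvalue of the regularized Gram matrix of K responses.

    Assumption: higher values mean more semantically diverse responses,
    which the abstract associates with weaker self-consistency.
    """
    k = len(embeddings)
    d = len(embeddings[0])
    mean = [sum(e[c] for e in embeddings) / k for c in range(d)]
    centered = [[e[c] - mean[c] for c in range(d)] for e in embeddings]
    gram = [
        [sum(a * b for a, b in zip(centered[i], centered[j])) for j in range(k)]
        for i in range(k)
    ]
    for i in range(k):
        gram[i][i] += alpha  # regularization keeps the matrix positive definite
    return _logdet_spd(gram) / k


# Toy embeddings (made up for illustration): identical vs. varied responses.
consistent = [[1.0, 0.0, 0.0]] * 4
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]]
print(eigenscore_sketch(diverse) > eigenscore_sketch(consistent))
```

In practice the embeddings would come from a model's internal states; the toy vectors here only show that identical responses yield the lowest score under these assumptions.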
# SUB-PLAY:部分観測型マルチエージェント強化学習システムに対する敵対的ポリシー SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems ( http://arxiv.org/abs/2402.03741v1 ) ライセンス: Link先を確認	Oubo Ma, Yuwen Pu, Linkang Du, Yang Dai, Ruo Wang, Xiaolei Liu, Yingcai Wu, Shouling Ji	(参考訳) マルチエージェント強化学習(MARL)の最近の進歩は、ドローンの群制御、ロボットアームによる協調操作、複数ターゲットの包囲など、幅広い応用の可能性を開いている。しかし、MARLの配備時における潜在的なセキュリティ上の脅威には、さらなる注意と徹底的な調査が必要である。最近の研究によると、攻撃者は被害者の脆弱性を迅速に悪用して敵対的ポリシーを生成し、特定のタスクにおいて被害者を失敗させることができる。例えば、超人レベルの囲碁AIの勝率を約20%にまで低下させる。これらの研究は主に2人のプレイヤーによる競争環境に焦点を当てており、攻撃者が完全なグローバル状態を観測できると仮定している。本研究は,複数エージェントの競争環境において,被害者の部分的な観測に制限された場合でも,攻撃者が敵対的ポリシーを生成できることを初めて明らかにする。具体的には,部分観測性の影響を軽減するために複数のサブゲームを構築するという概念を取り入れ、サブポリシー間で遷移を共有することで攻撃者の悪用能力を高める、新たなブラックボックス攻撃(SUB-PLAY)を提案する。広範な評価により、3つの典型的な部分観測性の制約下でのSUB-PLAYの有効性を示す。可視化の結果,敵対的ポリシーが被害者のポリシーネットワークに有意に異なる活性化を引き起こすことが示唆された。さらに、敵対的ポリシーによるセキュリティ上の脅威を軽減する方法を探ることを目的として3つの潜在的な防御策を評価し、競争環境にMARLを配備するための建設的な提言を行う。 Recent advances in multi-agent reinforcement learning (MARL) have opened up vast application prospects, including swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during the MARL deployment need more attention and thorough investigation. Recent research reveals that an attacker can rapidly exploit the victim's vulnerabilities and generate adversarial policies, leading to the victim's failure in specific tasks, for example, reducing the winning rate of a superhuman-level Go AI to around 20%. These studies predominantly focus on two-player competitive environments, assuming attackers possess complete global state observation. In this study, we unveil, for the first time, the capability of attackers to generate adversarial policies even when restricted to partial observations of the victims in multi-agent competitive environments. 
Specifically, we propose a novel black-box attack (SUB-PLAY), which incorporates the concept of constructing multiple subgames to mitigate the impact of partial observability and suggests the sharing of transitions among subpolicies to improve the exploitative ability of attackers. Extensive evaluations demonstrate the effectiveness of SUB-PLAY under three typical partial observability limitations. Visualization results indicate that adversarial policies induce significantly different activations of the victims' policy networks. Furthermore, we evaluate three potential defenses aimed at exploring ways to mitigate security threats posed by adversarial policies, providing constructive recommendations for deploying MARL in competitive environments.	翻訳日:2024-02-07 16:19:54 公開日:2024-02-06
# BotSSCL:自己教師付きコントラスト学習によるソーシャルボット検出 BotSSCL: Social Bot Detection with Self-Supervised Contrastive Learning ( http://arxiv.org/abs/2402.03740v1 ) ライセンス: Link先を確認	Mohammad Majid Akhtar, Navid Shadman Bhuiyan, Rahat Masood, Muhammad Ikram, Salil S. Kanhere	(参考訳) 「ソーシャルボット」とも呼ばれる自動アカウントの検出は、オンラインソーシャルネットワーク(OSN)にとってますます重要な関心事となっている。ソーシャルボットの検出にはいくつかの方法が提案されているが、大きな研究ギャップが残っている。第一に、現在のモデルは、本物のOSNユーザーを模倣しようとする高度なボットの検出に限界がある。第二に、これらの手法は操作されやすい単純なプロファイル特徴に依存することが多い。敵対的な操作に対する脆弱性に加えて、これらのモデルは一般化性に欠けており、あるデータセットで訓練して別のデータセットでテストすると性能が低下する。これらの課題に対処するために,自己教師付きコントラスト学習を用いた新しいソーシャルボット検出フレームワーク(BotSSCL)を提案する。本フレームワークは,コントラスト学習を利用して埋め込み空間でソーシャルボットと人間を区別し,線形分離性を向上させる。 BotSSCLが導出する高レベルな表現は、データ分布の変化に対する耐性を高め、一般化性を確保する。検出を回避するためにボットアカウントを操作する敵対的な試みに対して、BotSSCLの堅牢性を評価する。高度なボットを含む2つのデータセットでの実験は、BotSSCLが他の教師あり、教師なし、および自己教師付きのベースライン手法よりも優れていることを示している。 2つのデータセットにおいて、SOTAより約6%および約8%高い性能(F1)を達成した。さらに、BotSSCLは、あるデータセットで訓練して別のデータセットでテストした場合にも67%のF1を達成し、その一般化性を示している。最後に、BotSSCLは敵対者にとっての複雑さを増大させ、検出回避における敵対者の成功率を4%にとどめる。 The detection of automated accounts, also known as "social bots", has been an increasingly important concern for online social networks (OSNs). While several methods have been proposed for detecting social bots, significant research gaps remain. First, current models exhibit limitations in detecting sophisticated bots that aim to mimic genuine OSN users. Second, these methods often rely on simplistic profile features, which are susceptible to manipulation. In addition to their vulnerability to adversarial manipulations, these models lack generalizability, resulting in subpar performance when trained on one dataset and tested on another. To address these challenges, we propose a novel framework for social Bot detection with Self-Supervised Contrastive Learning (BotSSCL). Our framework leverages contrastive learning to distinguish between social bots and humans in the embedding space to improve linear separability. 
The high-level representations derived by BotSSCL enhance its resilience to variations in data distribution and ensure generalizability. We evaluate BotSSCL's robustness against adversarial attempts to manipulate bot accounts to evade detection. Experiments on two datasets featuring sophisticated bots demonstrate that BotSSCL outperforms other supervised, unsupervised, and self-supervised baseline methods. We achieve approx. 6% and approx. 8% higher (F1) performance than SOTA on both datasets. In addition, BotSSCL also achieves 67% F1 when trained on one dataset and tested with another, demonstrating its generalizability. Lastly, BotSSCL increases adversarial complexity and only allows 4% success to the adversary in evading detection.	翻訳日:2024-02-07 16:19:30 公開日:2024-02-06
# 差分プライベートな高次元バンディット Differentially Private High Dimensional Bandits ( http://arxiv.org/abs/2402.03737v1 ) ライセンス: Link先を確認	Apurv Shukla	(参考訳) パラメータベクトルが$s_{0}$-sparseであり、意思決定者が差分プライバシーの中央モデルと局所モデルの両方の下でプライバシー制約を受ける場合の、高次元の確率的文脈線形バンディット問題を考える。差分プライベートなLASSOバンディットアルゴリズムであるPrivateLASSOを提案する。 PrivateLASSOは2つのサブルーチンに基づいている。 (i)スパースなハードしきい値処理に基づくプライバシー機構と、(ii)パラメータ $\theta$ のサポートを特定するためのエピソード的しきい値規則である。ミニマックスのプライベート下界を証明し、標準的な仮定の下で、中央モデルにおけるPrivateLASSOのプライバシーと有用性の保証を確立する。 We consider a high-dimensional stochastic contextual linear bandit problem when the parameter vector is $s_{0}$-sparse and the decision maker is subject to privacy constraints under both central and local models of differential privacy. We present PrivateLASSO, a differentially private LASSO bandit algorithm. PrivateLASSO is based on two sub-routines: (i) a sparse hard-thresholding-based privacy mechanism and (ii) an episodic thresholding rule for identifying the support of the parameter $\theta$. We prove minimax private lower bounds and establish privacy and utility guarantees for PrivateLASSO for the central model under standard assumptions.	翻訳日:2024-02-07 16:19:04 公開日:2024-02-06
# 最大$s$-bundle問題に対する新しい限定法を用いた効果的な分枝限定アルゴリズム An Effective Branch-and-Bound Algorithm with New Bounding Methods for the Maximum $s$-Bundle Problem ( http://arxiv.org/abs/2402.03736v1 ) ライセンス: Link先を確認	Jinghui Xue, Jiongzhi Zheng, Mingming Jin and Kun He	(参考訳) 最大sバンドル問題(MBP)は、与えられたグラフ内の最大sバンドルを特定する問題である。グラフ G=(V, E) は、頂点連結度が少なくとも \|V\|-s であるとき s-バンドル (s-bundle) と呼ばれる。ここで頂点連結度とは、削除するとグラフが非連結または自明になる頂点の最小数である。 MBPはNP困難であり、頂点連結度を重視する多くの現実のシナリオに関連がある。 MBPの厳密アルゴリズムは主に分枝限定法(BnB)の枠組みに従っており、その性能は、最大sバンドルの濃度に対する上界の質と、グラフ縮約を伴う初期下界の質に大きく依存する。本研究では,グラフ分割技術を活用した分割ベースの上界(PUB)を導入し,既存のものよりも厳密な上界を実現する。下界を高めるために,クリーク上で短いランダムウォークを行い,より大きな初期解を生成することを提案する。さらに,グラフ縮約のための前処理に初期下界とPUBを用い,BnB探索過程における分岐の枝刈りにPUBを用いる新しいBnBアルゴリズムを提案する。多様なs値を用いた大規模な実験は、最先端のBnB MBPアルゴリズムに対する我々のアルゴリズムの顕著な進歩を示している。さらに、我々の初期下界は、他の緩和クリーク問題にも一般化できる。 The Maximum s-Bundle Problem (MBP) addresses the task of identifying a maximum s-bundle in a given graph. A graph G=(V, E) is called an s-bundle if its vertex connectivity is at least \|V\|-s, where the vertex connectivity equals the minimum number of vertices whose deletion yields a disconnected or trivial graph. MBP is NP-hard and holds relevance in numerous real-world scenarios emphasizing the vertex connectivity. Exact algorithms for MBP mainly follow the branch-and-bound (BnB) framework, whose performance heavily depends on the quality of the upper bound on the cardinality of a maximum s-bundle and the initial lower bound with graph reduction. In this work, we introduce a novel Partition-based Upper Bound (PUB) that leverages the graph partitioning technique to achieve a tighter upper bound compared to existing ones. To increase the lower bound, we propose to do short random walks on a clique to generate larger initial solutions. Then, we propose a new BnB algorithm that uses the initial lower bound and PUB in preprocessing for graph reduction, and uses PUB in the BnB search process for branch pruning. 
Extensive experiments with diverse s values demonstrate the significant progress of our algorithm over state-of-the-art BnB MBP algorithms. Moreover, our initial lower bound can also be generalized to other relaxation clique problems.	翻訳日:2024-02-07 16:18:54 公開日:2024-02-06
# 課題追跡システムにおけるChatGPTの有用性の検討:探索的研究 Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study ( http://arxiv.org/abs/2402.03735v1 ) ライセンス: Link先を確認	Joy Krishan Das, Saikat Mondal, Chanchal K.Roy	(参考訳) 問題追跡システムは、外部ユーザを取り入れ、ユーザの要求を満たすためにソフトウェアプロジェクトをカスタマイズするための主要なツールである。しかし、コントリビュータの数が限られており、各問題に対する最善のアプローチを特定するという課題は、しばしば効果的な解決を妨げる。最近、ChatGPTのようなAIツールを使って問題解決の効率を高める開発者が増えている。これまでの研究では、自動プログラム修復、デバッグ、コード生成といった分野でChatGPTの可能性を実証してきたが、開発者がChatGPTを明示的に利用してトラッキングシステムの問題を解決する方法については研究されていない。そこで本研究では,ChatGPTと開発者間のインタラクションを分析し,それらの活動を分析し,解決することを目的とした。さらに,ChatGPTが生成したコードが,クローン検出ツールNiCadを使用してプロジェクトのコードベースに統合されているかどうかを確認することで,コードの信頼性を評価する。私たちの調査によると、開発者は主にブレインストーミングソリューションにChatGPTを使用しているが、おそらく文献で強調されているように、ChatGPTで生成されたコードではなく、自分のコードを書くことを選択している。 Issue tracking systems serve as the primary tool for incorporating external users and customizing a software project to meet the users' requirements. However, the limited number of contributors and the challenge of identifying the best approach for each issue often impede effective resolution. Recently, an increasing number of developers are turning to AI tools like ChatGPT to enhance problem-solving efficiency. While previous studies have demonstrated the potential of ChatGPT in areas such as automatic program repair, debugging, and code generation, there is a lack of study on how developers explicitly utilize ChatGPT to resolve issues in their tracking system. Hence, this study aims to examine the interaction between ChatGPT and developers to analyze their prevalent activities and provide a resolution. In addition, we assess the code reliability by confirming if the code produced by ChatGPT was integrated into the project's codebase using the clone detection tool NiCad. 
Our investigation reveals that developers mainly use ChatGPT for brainstorming solutions but often opt to write their code instead of using ChatGPT-generated code, possibly due to concerns over the generation of "hallucinated code", as highlighted in the literature.	翻訳日:2024-02-07 16:18:28 公開日:2024-02-06
# 知識グラフにおけるDeep outdated Fact Detection Deep Outdated Fact Detection in Knowledge Graphs ( http://arxiv.org/abs/2402.03732v1 ) ライセンス: Link先を確認	Huiling Tu, Shuo Yu, Vidya Saikrishna, Feng Xia, Karin Verspoor	(参考訳) 知識グラフ(KGs)は、様々な領域にまたがる大きな可能性について、大きな注目を集めている。しかし、時代遅れの事実の問題はKGに挑戦し、現実世界の情報が進化するにつれて、その全体的な品質に影響を及ぼす。古い事実検出のための既存のソリューションは、しばしば手動認識に依存している。そこで本研究では,KGs内の古い事実を識別するための新しいディープラーニングベースのフレームワークであるDEAN(Deep outdatEd fAct detectioN)を提案する。 DEANは、実体と関係の包括的モデリングを通じて、事実間の暗黙的な構造情報をキャプチャすることで、自分自身を区別する。 DEANは遅延情報を効果的に発見するために、エンティティの数で重み付けされたR2N(Relations-to-Nodes)グラフに基づく対照的なアプローチを採用している。実験結果は,最先端のベースライン法よりもDEANの有効性と優位性を示した。 Knowledge graphs (KGs) have garnered significant attention for their vast potential across diverse domains. However, the issue of outdated facts poses a challenge to KGs, affecting their overall quality as real-world information evolves. Existing solutions for outdated fact detection often rely on manual recognition. In response, this paper presents DEAN (Deep outdatEd fAct detectioN), a novel deep learning-based framework designed to identify outdated facts within KGs. DEAN distinguishes itself by capturing implicit structural information among facts through comprehensive modeling of both entities and relations. To effectively uncover latent out-of-date information, DEAN employs a contrastive approach based on a pre-defined Relations-to-Nodes (R2N) graph, weighted by the number of entities. Experimental results demonstrate the effectiveness and superiority of DEAN over state-of-the-art baseline methods.	翻訳日:2024-02-07 16:18:08 公開日:2024-02-06
# 全結合スピンキャビティ系における離散時間結晶に対するパラメトリック共鳴の理論 Theory of parametric resonance for discrete time crystals in fully-connected spin-cavity systems ( http://arxiv.org/abs/2402.03729v1 ) ライセンス: Link先を確認	Roy D. Jara Jr., Dennis F. Salinel, Jayson G. Cosme	(参考訳) 全結合スピンキャビティ系を振動子様モデルにマッピングすることで、これらの系における離散時間結晶(DTC)形成に必要な条件をパラメトリック共鳴の観点から特定する。周期的に駆動される開放Dickeモデル(DM)を有効な線形および非線形振動子モデルにマッピングすることで非線形性と散逸の役割を明らかにし、調整可能な異方性を持つLipkin-Meshkov-Glick(LMG)モデルを用いて大域的対称性の破れの効果を解析する。系の非線形性は, 共鳴的に駆動されたときにダイナミクスが非有界になることを抑制することを示す。一方、散逸は周期倍化不安定性の振動振幅を一定に保ち、これはDTCの重要な特徴である。周期倍化応答のパラメトリック共鳴による活性化には, 駆動がない状態での大域的対称性の破れの存在が不可欠であることがわかった。それぞれの振動子モデルを用いて,両系においてDTC形成につながる共鳴周波数と振幅の解析的予測を与える。 We pinpoint the conditions necessary for discrete time crystal formation in fully-connected spin-cavity systems from the perspective of parametric resonance by mapping these systems onto oscillator-like models. We elucidate the role of nonlinearity and dissipation by mapping the periodically driven open Dicke model (DM) onto effective linear and nonlinear oscillator models, while we analyze the effect of global symmetry breaking using the Lipkin-Meshkov-Glick (LMG) model with tunable anisotropy. We show that the system's nonlinearity restrains the dynamics from becoming unbounded when driven resonantly. On the other hand, dissipation keeps the oscillation amplitude of the period-doubling instability fixed, which is a key feature of DTCs. The presence of global symmetry breaking in the absence of driving is found to be crucial in the parametric resonant activation of period-doubling response. We provide analytic predictions for the resonant frequencies and amplitudes leading to DTC formation for both systems using their respective oscillator models.	翻訳日:2024-02-07 16:17:52 公開日:2024-02-06
# 不均一学習モデルを用いた一貫した共同意思決定 Consistent Joint Decision-Making with Heterogeneous Learning Models ( http://arxiv.org/abs/2402.03728v1 ) ライセンス: Link先を確認	Hossein Rajaby Faghihi and Parisa Kordjamshidi	(参考訳) 本稿では,外部知識を活用しつつ,多様なモデルによる意思決定の一貫性を促進する新しい意思決定フレームワークを提案する。整数線形計画法(ILP)フレームワークを活用し,決定の事前確率,信頼度(不確実性),モデルの期待精度に関する情報を組み込むことにより,様々なモデルからの予測を,グローバルに正規化された比較可能な値にマッピングする。実験により、複数のデータセットにおいて、従来のベースラインよりも本アプローチが優れていることを示す。 This paper introduces a novel decision-making framework that promotes consistency among decisions made by diverse models while utilizing external knowledge. Leveraging the Integer Linear Programming (ILP) framework, we map predictions from various models into globally normalized and comparable values by incorporating information about decisions' prior probability, confidence (uncertainty), and the models' expected accuracy. Our empirical study demonstrates the superiority of our approach over conventional baselines on multiple datasets.	翻訳日:2024-02-07 16:17:35 公開日:2024-02-06
# インスタンス単位の自己注意型Hawkes過程によるGranger因果性の学習 Learning Granger Causality from Instance-wise Self-attentive Hawkes Processes ( http://arxiv.org/abs/2402.03726v1 ) ライセンス: Link先を確認	Dongxia Wu, Tsuyoshi Idé, Aurélie Lozano, Georgios Kollias, Jiří Navrátil, Naoki Abe, Yi-An Ma, Rose Yu	(参考訳) 本稿では,非同期で相互依存的な複数タイプのイベントシーケンスからGranger因果性を学習する問題に対処する。特に、インスタンスレベルの因果構造を教師なしで発見することに関心がある。インスタンスレベルの因果性は個々のイベント間の因果関係を特定し、意思決定のためのよりきめ細かい情報を提供する。文献における既存の研究は、強度関数の線形性のような強い仮定を必要とするか、必ずしもGranger因果性の要件を満たさないヒューリスティックに定義されたモデルパラメータを必要とする。本稿では,イベントインスタンスレベルでGranger因果性を直接推論できる新しいディープラーニングフレームワークであるISAHP(Instance-wise Self-Attentive Hawkes Processes)を提案する。 ISAHPは、Granger因果性の要件を満たす最初のニューラル点過程モデルである。トランスフォーマーの自己注意機構を利用して、Granger因果性の原理に整合させる。我々は、ISAHPが古典的モデルでは扱えない複雑なインスタンスレベルの因果構造を発見できることを実証的に示す。また、ISAHPは、タイプレベルの因果発見とインスタンスレベルのイベントタイプ予測を含むプロキシタスクにおいて、最先端のパフォーマンスを達成することを示す。 We address the problem of learning Granger causality from asynchronous, interdependent, multi-type event sequences. In particular, we are interested in discovering instance-level causal structures in an unsupervised manner. Instance-level causality identifies causal relationships among individual events, providing more fine-grained information for decision-making. Existing work in the literature either requires strong assumptions, such as linearity in the intensity function, or heuristically defined model parameters that do not necessarily meet the requirements of Granger causality. We propose Instance-wise Self-Attentive Hawkes Processes (ISAHP), a novel deep learning framework that can directly infer the Granger causality at the event instance level. ISAHP is the first neural point process model that meets the requirements of Granger causality. It leverages the self-attention mechanism of the transformer to align with the principles of Granger causality. We empirically demonstrate that ISAHP is capable of discovering complex instance-level causal structures that cannot be handled by classical models. 
We also show that ISAHP achieves state-of-the-art performance in proxy tasks involving type-level causal discovery and instance-level event type prediction.	翻訳日:2024-02-07 16:17:25 公開日:2024-02-06
# 自由フェルミオンのネガティビティに対する電荷相関子展開 Charge correlator expansion for free fermion negativity ( http://arxiv.org/abs/2402.03725v1 ) ライセンス: Link先を確認	Yang-Yang Tang	(参考訳) 対数ネガティビティ(logarithmic negativity)は、量子情報理論において広く用いられるエンタングルメント測度であり、レプリカトリックや相関行列との関係づけによって、量子多体系でも効率的に計算できる。本稿では,保存電荷を持つ自由フェルミオン系において,完全計数統計(FCS)の文脈におけるエンタングルメントエントロピーの場合と同様に,Rényiネガティビティおよび対数ネガティビティが連結電荷相関子によって展開できることを示す。この展開の急速な収束を、ランダムな全結合ハミルトニアンにおいて、特に局所ホッピングのみを持つ系について、数値検証により確認した。 Rényiネガティビティの極限から対数ネガティビティを得るレプリカトリックは、この方法では並進不変系に対してのみ有効であることがわかった。この展開を用いて、広範な自由フェルミオン系におけるネガティビティのスケーリング挙動を解析する。特に, 1+1次元自由フェルミオン系では, 本展開から得られるネガティビティのスケーリング挙動は, Toeplitz行列を用いた手法による既知の結果と一致している。これらの知見は, 自由フェルミオン系のエンタングルメント特性に関する洞察を与え, エンタングルメント測度の研究における展開手法の有効性を実証する。 Logarithmic negativity is a widely used entanglement measure in quantum information theories, which can also be efficiently computed in quantum many-body systems by replica trick or by relating to correlation matrices. In this paper, we demonstrate that in free-fermion systems with conserved charge, Rényi and logarithmic negativity can be expanded by connected charge correlators, analogous to the case for entanglement entropy in the context of full counting statistics (FCS). We confirm the rapid convergence of this expansion in random all-connected Hamiltonian through numerical verification, especially for systems with only local hopping. We find that the replica trick that gets logarithmic negativity from the limit of Rényi negativity is valid in this method only for translational invariant systems. Using this expansion, we analyze the scaling behavior of negativity in extensive free-fermion systems. In particular, in 1+1 dimensional free-fermion systems, we observe that the scaling behavior of negativity from our expansion is consistent with known results from the method with Toeplitz matrix. These findings provide insights into the entanglement properties of free-fermion systems, and demonstrate the efficacy of the expansion approach in studying entanglement measures.	翻訳日:2024-02-07 16:17:07 公開日:2024-02-06
# 平滑MDPにおけるノーリグレット強化学習 No-Regret Reinforcement Learning in Smooth MDPs ( http://arxiv.org/abs/2402.03792v1 ) ライセンス: Link先を確認	Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restell	(参考訳) 連続的な状態空間および/または行動空間を持つ問題に対して強化学習(RL)のノーリグレット保証を得ることは、依然としてこの分野における大きな未解決課題の1つである。最近、様々な解決策が提案されているが、非常に特殊な設定を除けば、一般的な問題は未解決のままである。本稿では,マルコフ決定過程 (MDP) に関する新しい構造的仮定,すなわち $\nu-$smoothness を導入し,これまで提案されてきた設定の大部分(線形MDPやリプシッツMDPなど)を一般化する。この困難なシナリオに対処するため、$\nu-$smooth MDP におけるリグレット最小化のための2つのアルゴリズムを提案する。どちらのアルゴリズムも、ルジャンドル多項式に基づく直交特徴写像を通してMDP表現を構築するという考え方に基づいている。第1のアルゴリズムである \textsc{Legendre-Eleanor} は、より弱い仮定の下でノーリグレット性を達成するが計算効率が悪い。一方、第2のアルゴリズムである \textsc{Legendre-LSVI} は、より小さな問題クラスに限られるものの、多項式時間で実行される。これらのリグレット特性を解析した後、RL理論の最先端の結果と比較し、我々のアルゴリズムが最良の保証を達成することを示す。 Obtaining no-regret guarantees for reinforcement learning (RL) in the case of problems with continuous state and/or action spaces is still one of the major open challenges in the field. Recently, a variety of solutions have been proposed, but besides very specific settings, the general problem remains unsolved. In this paper, we introduce a novel structural assumption on the Markov decision processes (MDPs), namely $\nu-$smoothness, that generalizes most of the settings proposed so far (e.g., linear MDPs and Lipschitz MDPs). To face this challenging scenario, we propose two algorithms for regret minimization in $\nu-$smooth MDPs. Both algorithms build upon the idea of constructing an MDP representation through an orthogonal feature map based on Legendre polynomials. The first algorithm, \textsc{Legendre-Eleanor}, achieves the no-regret property under weaker assumptions but is computationally inefficient, whereas the second one, \textsc{Legendre-LSVI}, runs in polynomial time, although for a smaller class of problems. After analyzing their regret properties, we compare our results with state-of-the-art ones from RL theory, showing that our algorithms achieve the best guarantees.	翻訳日:2024-02-07 16:10:18 公開日:2024-02-06
# より良いコード表現のためのバージョン履歴コンテキストのエンコーディング Encoding Version History Context for Better Code Representation ( http://arxiv.org/abs/2402.03773v1 ) ライセンス: Link先を確認	Huy Nguyen, Christoph Treude, Patanamon Thongtanunam	(参考訳) ソースコードを生成するAIツールの指数関数的な成長により、ソフトウェアを理解することが重要になっている。開発者がプログラムを理解すると、プログラムのドキュメントや過去のコードバージョンなどの情報を探すために追加のコンテキストを参照することができる。したがって、この追加の文脈情報を符号化することは、深層学習のためのコード表現にも役立つと論じる。最近の論文では、プログラム理解問題に対処するために、文脈データ(例えば呼び出し階層)をベクトル表現に組み込んでいる。これは、モデルによるプログラムの理解を深めるために、バージョン履歴のような追加のコンテキストを探求するさらなる研究を動機付ける。つまり、バージョン履歴からの洞察によって、コードの進化におけるパターンの認識、繰り返し発生する問題、過去のソリューションの有効性が実現される。本稿では、バージョン履歴から文脈情報をエンコードしてコードクローンを予測し、コード分類を行うことによる潜在的メリットの予備的な証拠を示す。我々は,astnnとcodebertという2つの代表的なディープラーニングモデルを用いて,異なるアグリゲーションによる追加コンテキストの組み合わせが下流アクティビティに有用かどうかを検証した。実験結果は,すべてのシナリオにおいて,バージョン履歴とソースコード表現を組み合わせることによる肯定的な影響を裏付けるものである。しかし,そのテクニックを一貫して実行するためには,コンテキスト,集約,モデルの異なる組み合わせを用いて,より大規模なコードベースを包括的に調査する必要がある。そこで本稿では,コード表現の改善と特定の状況における最適活用を目的とした,追加コンテキストの符号化のさまざまな側面を探求する研究課題を提案する。 With the exponential growth of AI tools that generate source code, understanding software has become crucial. When developers comprehend a program, they may refer to additional contexts to look for information, e.g. program documentation or historical code versions. Therefore, we argue that encoding this additional contextual information could also benefit code representation for deep learning. Recent papers incorporate contextual data (e.g. call hierarchy) into vector representation to address program comprehension problems. This motivates further studies to explore additional contexts, such as version history, to enhance models' understanding of programs. That is, insights from version history enable recognition of patterns in code evolution over time, recurring issues, and the effectiveness of past solutions. Our paper presents preliminary evidence of the potential benefit of encoding contextual information from the version history to predict code clones and perform code classification. 
We experiment with two representative deep learning models, ASTNN and CodeBERT, to investigate whether combining additional contexts with different aggregations may benefit downstream activities. The experimental result affirms the positive impact of combining version history into source code representation in all scenarios; however, to ensure the technique performs consistently, we need to conduct a holistic investigation on a larger code base using different combinations of contexts, aggregation, and models. Therefore, we propose a research agenda aimed at exploring various aspects of encoding additional context to improve code representation and its optimal utilisation in specific situations.	翻訳日:2024-02-07 16:09:51 公開日:2024-02-06
# bagged rewardからの強化学習:インスタンスレベルの報酬再分配のためのトランスフォーマーベースのアプローチ Reinforcement Learning from Bagged Reward: A Transformer-based Approach for Instance-Level Reward Redistribution ( http://arxiv.org/abs/2402.03771v1 ) ライセンス: Link先を確認	Yuting Tang and Xin-Qiang Cai and Yao-Xiang Ding and Qiyu Wu and Guoqing Liu and Masashi Sugiyama	(参考訳) 強化学習(RL)では、エージェントの動作毎に即時報酬信号を生成し、エージェントが累積報酬を最大化して最適なポリシーを得るように学習する。しかし、現実世界の多くのアプリケーションでは、即時報酬信号はエージェントによって取得できない。代わりに、学習者はバッグの端でのみ報酬を受け取り、バッグは完全な軌道の部分的なシーケンスとして定義される。この状況では、学習者はバッグ内の未知の即時報酬を探索する重大な困難に直面しなければならないが、これは既存のアプローチでは対処できない。本稿では、この状況を正式に研究するために、RLBR(Reinforcement Learning from Bagged Rewards)と呼ばれる新しいRL設定を導入する。本稿では,マルコフ決定過程(MDP)におけるRLBRと標準RLの関連性を確立するための理論的研究について述べる。そこで本研究では,袋内における報酬分布を効果的に解明するために,袋内における文脈的ニュアンスや時間的依存関係を解釈するセルフアテンション機構を用いた,トランスフォーマベースの報酬モデルである報奨袋トランス(rbt)を提案する。広汎な実験分析により,本手法の優位性,特に元のMDPの報酬分布を模倣する能力が示され,文脈的理解能力と環境力学への適応性を強調した。 In reinforcement Learning (RL), an instant reward signal is generated for each action of the agent, such that the agent learns to maximize the cumulative reward to obtain the optimal policy. However, in many real-world applications, the instant reward signals are not obtainable by the agent. Instead, the learner only obtains rewards at the ends of bags, where a bag is defined as a partial sequence of a complete trajectory. In this situation, the learner has to face the significant difficulty of exploring the unknown instant rewards in the bags, which could not be addressed by existing approaches, including those trajectory-based approaches that consider only complete trajectories and ignore the inner reward distributions. To formally study this situation, we introduce a novel RL setting termed Reinforcement Learning from Bagged Rewards (RLBR), where only the bagged rewards of sequences can be obtained. We provide the theoretical study to establish the connection between RLBR and standard RL in Markov Decision Processes (MDPs). 
To effectively explore the reward distributions within the bagged rewards, we propose a Transformer-based reward model, the Reward Bag Transformer (RBT), which uses the self-attention mechanism for interpreting the contextual nuances and temporal dependencies within each bag. Extensive experimental analyses demonstrate the superiority of our method, particularly in its ability to mimic the original MDP's reward distribution, highlighting its proficiency in contextual understanding and adaptability to environmental dynamics.	翻訳日:2024-02-07 16:09:24 公開日:2024-02-06
# Fed-CVLC: 可変長符号によるフェデレーション学習コミュニケーションの圧縮 Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes ( http://arxiv.org/abs/2402.03770v1 ) ライセンス: Link先を確認	Xiaoxin Su, Yipeng Zhou, Laizhong Cui, John C.S. Lui and Jiangchuan Liu	(参考訳) フェデレーション学習(fl)パラダイムでは、パラメータサーバ(ps)が、個々のクライアントが所有するプライベートデータに触らずに、複数のラウンドにわたってモデル収集、更新集約、モデル分散のために、分散参加者クライアントと同時通信する。 FLはデータのプライバシを保存することに魅力がありますが、PSと散在するクライアント間の通信は深刻なボトルネックになります。量子化やスパーシフィケーションのようなモデル圧縮アルゴリズムは提案されているが、一般に固定コード長を仮定しており、モデル更新の不均一性と可変性を反映していない。本稿では,解析と実験の両方を通して,FLの圧縮に可変長が有用であることを示す。そこで我々はFed-CVLC(Federated Learning Compression with Variable-Length Codes)を提案する。通信予算を考慮した損失関数(モデルユーティリティの最大化に相当)を最小化する最適調整戦略を開発する。さらに、Fed-CVLCは、量子化とスパーシフィケーションを橋渡しし、より柔軟な圧縮設計であることを示す。 Fed-CVLCは最先端のベースラインを著しく上回り、モデルの実用性は1.50%-5.44%向上し、通信トラフィックは16.67%-41.61%縮小した。 In Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds, without touching private data owned by individual clients. FL is appealing in preserving data privacy; yet the communication between the PS and scattered clients can be a severe bottleneck. Model compression algorithms, such as quantization and sparsification, have been suggested but they generally assume a fixed code length, which does not reflect the heterogeneity and variability of model updates. In this paper, through both analysis and experiments, we show strong evidences that variable-length is beneficial for compression in FL. We accordingly present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response of the dynamics of model updates. We develop optimal tuning strategy that minimizes the loss function (equivalent to maximizing the model utility) subject to the budget for communication. 
We further demonstrate that Fed-CVLC is indeed a general compression design that bridges quantization and sparsification, with greater flexibility. Extensive experiments have been conducted with public datasets to demonstrate that Fed-CVLC remarkably outperforms state-of-the-art baselines, improving model utility by 1.50%-5.44%, or shrinking communication traffic by 16.67%-41.61%.	翻訳日:2024-02-07 16:08:58 公開日:2024-02-06
# AttackNet: ライブネス検出のための専用設計の畳み込みニューラルネットワークアーキテクチャによる生体認証セキュリティの強化 AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection ( http://arxiv.org/abs/2402.03769v1 ) ライセンス: Link先を確認	Oleksandr Kuznetsov, Dmytro Zakharov, Emanuele Frontoni, Andrea Maranesi	(参考訳) バイオメトリック・セキュリティは、バイオメトリック・サンプルの完全性と信頼性が最重要となる、現代の本人確認および認証システムの基盤である。本稿では,バイオメトリックシステムにおけるスプーフィング脅威に対処するように設計された,専用設計の畳み込みニューラルネットワークアーキテクチャであるAttackNetを紹介する。深層学習手法に基づくこのモデルは,低レベル特徴抽出から高レベルパターン識別へシームレスに移行する,層状防御機構を提供する。 3つの特徴的なアーキテクチャフェーズがモデルの中核を形成し、それぞれが慎重に選択されたアクティベーション関数、正規化テクニック、およびドロップアウト層によって支えられ、敵対的攻撃に対する堅牢性とレジリエンスを確保する。多様なデータセットにまたがるベンチマークにより、現在のモデルと比較して優れたパフォーマンス指標を示す。さらに、詳細な比較分析によりモデルの有効性が強調され、最先端の手法との比較が示される。反復的な洗練とアーキテクチャ戦略を通じて、AttackNetはバイオメトリックセキュリティの未来を守るためのディープラーニングの可能性を強調している。 Biometric security is the cornerstone of modern identity verification and authentication systems, where the integrity and reliability of biometric samples is of paramount importance. This paper introduces AttackNet, a bespoke Convolutional Neural Network architecture, meticulously designed to combat spoofing threats in biometric systems. Rooted in deep learning methodologies, this model offers a layered defense mechanism, seamlessly transitioning from low-level feature extraction to high-level pattern discernment. Three distinctive architectural phases form the crux of the model, each underpinned by judiciously chosen activation functions, normalization techniques, and dropout layers to ensure robustness and resilience against adversarial attacks. Benchmarking our model across diverse datasets affirms its prowess, showcasing superior performance metrics in comparison to contemporary models. Furthermore, a detailed comparative analysis accentuates the model's efficacy, drawing parallels with prevailing state-of-the-art methodologies. 
Through iterative refinement and an informed architectural strategy, AttackNet underscores the potential of deep learning in safeguarding the future of biometric security.	翻訳日:2024-02-07 16:08:31 公開日:2024-02-06
# MobileVLM V2: ビジョン言語モデルの高速かつ強力なベースライン MobileVLM V2: Faster and Stronger Baseline for Vision Language Model ( http://arxiv.org/abs/2402.03766v1 ) ライセンス: Link先を確認	Xiangxiang Chu and Limeng Qiao and Xinyu Zhang and Shuang Xu and Fei Wei and Yang Yang and Xiaofei Sun and Yiming Hu and Xinyang Lin and Bo Zhang and Chunhua Shen	(参考訳) 我々は,MobileVLMを大幅に改良した視覚言語モデルファミリーであるMobileVLM V2を紹介し,新しいアーキテクチャ設計の繊細なオーケストレーション,モバイルVLMに適した改良されたトレーニングスキーム,豊富で高品質なデータセットキュレーションにより,VLMの性能を大幅に向上させることができることを示す。具体的には、MobileVLM V2 1.7Bは、標準VLMベンチマークにおいて、3Bスケールのはるかに大きなVLMと比較して、同等以上のパフォーマンスを達成する。特に、我々の3Bモデルは7B+スケールの様々なVLMより優れている。我々のモデルは https://github.com/Meituan-AutoML/MobileVLM でリリースされる。 We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, our 3B model outperforms a large variety of VLMs at the 7B+ scale. Our models will be released at https://github.com/Meituan-AutoML/MobileVLM .	翻訳日:2024-02-07 16:08:11 公開日:2024-02-06
# mod-slam:unbounded 3d scene reconstructionのための単眼高密度マッピング MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction ( http://arxiv.org/abs/2402.03762v1 ) ライセンス: Link先を確認	Heng Zhou, Zhetao Guo, Shuhong Liu, Lechen Zhang, Qihao Wang, Yuxiang Ren, Mingrui Li	(参考訳) ニューラルネットワークの暗黙的表現は、最近、同時局在化とマッピング(slam)を含む多くの分野で実証されている。現在のニューラルSLAMは境界シーンの再構成において理想的な結果が得られるが、これはRGB-D画像の入力に依存する。 rgb画像のみに基づくニューラルベースslamでは,シーンのスケールを正確に再構築することはできず,追跡中に蓄積されたエラーによりスケールドリフトに支障をきたす。このような制約を克服するために,世界的ポーズ最適化と3次元再構成を非有界シーンで実現可能な単眼的密集マッピング法 mod-slam を提案する。単眼深度推定によるシーン再構築の最適化とループ閉鎖検出によるカメラポーズの更新により、大規模シーンの詳細な再現が可能となる。これまでの作業と比べて、私たちのアプローチはより堅牢で、スケーラブルで、多用途です。実験の結果,MoD-SLAMのマッピング性能は,特に大きな境界のないシーンにおいて,従来のSLAM法よりも優れていた。 Neural implicit representations have recently been demonstrated in many fields including Simultaneous Localization And Mapping (SLAM). Current neural SLAM can achieve ideal results in reconstructing bounded scenes, but this relies on the input of RGB-D images. Neural-based SLAM based only on RGB images is unable to reconstruct the scale of the scene accurately, and it also suffers from scale drift due to errors accumulated during tracking. To overcome these limitations, we present MoD-SLAM, a monocular dense mapping method that allows global pose optimization and 3D reconstruction in real-time in unbounded scenes. Optimizing scene reconstruction by monocular depth estimation and using loop closure detection to update camera pose enable detailed and precise reconstruction on large scenes. Compared to previous work, our approach is more robust, scalable and versatile. Our experiments demonstrate that MoD-SLAM has more excellent mapping performance than prior neural SLAM methods, especially in large borderless scenes.	翻訳日:2024-02-07 16:07:59 公開日:2024-02-06
# 深層学習に基づく脳腫瘍手術用ハイパースペクトル画像の補正とアンミックス Deep Learning-Based Correction and Unmixing of Hyperspectral Images for Brain Tumor Surgery ( http://arxiv.org/abs/2402.03761v1 ) ライセンス: Link先を確認	David Black, Jaidev Gill, Andrew Xie, Benoit Liquet, Antonio Di leva, Walter Stummer, Eric Suero Molina	(参考訳) 蛍光誘導脳腫瘍切除のためのハイパースペクトルイメージング(HSI)は、ヒトでは識別できない組織の違いを可視化する。この増強は脳腫瘍の切除を最大化し、患者の予後を改善する。しかし、hsiの処理の多くは、フルオロフォアの存在量の正確な回復のためにモデル化されなければならない非線形波長依存現象を捉えることができない単純な線形法を用いている。そこで本研究では,非線形効果を考慮し,より正確な量の推定を行うことができる2つの深層学習モデルを提案する。どちらのモデルも、捕獲されたスペクトルを処理するためにオートエンコーダのようなアーキテクチャを使用する。 1つはプロトポルフィリンIX(PpIX)濃度ラベルで訓練されている。他方は半教師訓練を行い、まずハイパースペクトルアンミックスを学習し、その後、参照白色光反射スペクトルを用いて不均一な光学的・幾何学的性質の蛍光発光スペクトルを数ショットで補正する学習を行う。 PpIX 濃度と計算した PpIX 濃度 0.997 と 0.990 の Pearson 相関係数 (R 値) は, 従来の手法では 0.93 と 0.82 しか得られなかった。半教師ありアプローチのR値はそれぞれ0.98と0.91である。人間のデータでは、半教師付きモデルは古典的手法よりも質的により現実的な結果を与え、スペクトル反射率の鮮明な点を除去し、比較的均一であるべき生検に対するPpIX量の分散を減少させる。これらの結果から,蛍光誘導神経外科における深層学習によるHSIの改善が期待できる。 Hyperspectral Imaging (HSI) for fluorescence-guided brain tumor resection enables visualization of differences between tissues that are not distinguishable to humans. This augmentation can maximize brain tumor resection, improving patient outcomes. However, much of the processing in HSI uses simplified linear methods that are unable to capture the non-linear, wavelength-dependent phenomena that must be modeled for accurate recovery of fluorophore abundances. We therefore propose two deep learning models for correction and unmixing, which can account for the nonlinear effects and produce more accurate estimates of abundances. Both models use an autoencoder-like architecture to process the captured spectra. One is trained with protoporphyrin IX (PpIX) concentration labels. 
The other undergoes semi-supervised training, first learning hyperspectral unmixing self-supervised and then learning to correct fluorescence emission spectra for heterogeneous optical and geometric properties using a reference white-light reflectance spectrum in a few-shot manner. The models were evaluated against phantom and pig brain data with known PpIX concentration; the supervised model achieved Pearson correlation coefficients (R values) between the known and computed PpIX concentrations of 0.997 and 0.990, respectively, whereas the classical approach achieved only 0.93 and 0.82. The semi-supervised approach's R values were 0.98 and 0.91, respectively. On human data, the semi-supervised model gives qualitatively more realistic results than the classical method, better removing bright spots of specular reflectance and reducing the variance in PpIX abundance over biopsies that should be relatively homogeneous. These results show promise for using deep learning to improve HSI in fluorescence-guided neurosurgery.	翻訳日:2024-02-07 16:07:42 公開日:2024-02-06
# 仮想分類:マルチドメイン群衆カウントのためのドメイン固有知識の変調 Virtual Classification: Modulating Domain-Specific Knowledge for Multidomain Crowd Counting ( http://arxiv.org/abs/2402.03758v1 ) ライセンス: Link先を確認	Mingyue Guo, Binghui Chen, Zhaoyi Yan, Yaowei Wang, Qixiang Ye	(参考訳) マルチドメインの群衆カウントは、複数の多様なデータセットに対する汎用モデルを学習することを目的としている。しかし、ディープネットワークは、すべてのドメインではなく支配的なドメインの分布をモデル化することを好み、これはドメインバイアスとして知られている。本研究では,マルチドメインの群衆カウントにおけるドメインバイアス問題に対処するための,シンプルかつ効果的なModulating Domain-specific Knowledge Network (MDKNet)を提案する。 MDKNetは「変調」というアイデアを採用することで実現され、ディープネットワークが多様なデータセットの異なる分布をバランスよく、少ないバイアスでモデル化できるようにする。具体的には、情報フローをドメイン分布に適応するよう洗練するベース変調器として機能する、インスタンス固有バッチ正規化(IsBN)モジュールを提案する。さらに、ドメイン固有情報を正確に変調するために、ドメイン誘導仮想分類器(DVC)を導入し、ドメイン分離可能な潜在空間を学習する。この空間は、IsBN変調器の入力ガイダンスとして使われ、複数のデータセットの混合分布を適切に扱うことができる。 Shanghai-tech A/B、QNRF、NWPUなどの一般的なベンチマークで実施された大規模な実験は、マルチドメインの群衆カウントにおけるMDKNetの優位性とマルチドメイン学習に対する有効性を実証する。コードは \url{https://github.com/csguomy/MDKNet} で入手できる。 Multidomain crowd counting aims to learn a general model for multiple diverse datasets. However, deep networks prefer modeling distributions of the dominant domains instead of all domains, which is known as domain bias. In this study, we propose a simple-yet-effective Modulating Domain-specific Knowledge Network (MDKNet) to handle the domain bias issue in multidomain crowd counting. MDKNet is achieved by employing the idea of `modulating', enabling deep network balancing and modeling different distributions of diverse datasets with little bias. Specifically, we propose an Instance-specific Batch Normalization (IsBN) module, which serves as a base modulator to refine the information flow to be adaptive to domain distributions. To precisely modulate the domain-specific information, the Domain-guided Virtual Classifier (DVC) is then introduced to learn a domain-separable latent space. This space is employed as an input guidance for the IsBN modulator, such that the mixture distributions of multiple datasets can be well treated. 
Extensive experiments performed on popular benchmarks, including ShanghaiTech A/B, QNRF and NWPU, validate the superiority of MDKNet in tackling multidomain crowd counting and its effectiveness for multidomain learning. Code is available at https://github.com/csguomy/MDKNet.	翻訳日:2024-02-07 16:07:15 公開日:2024-02-06
# 直感的バイアス:Spurious ImagesはMLLMの幻覚に繋がる The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs ( http://arxiv.org/abs/2402.03757v1 ) ライセンス: Link先を確認	Tianyang Han, Qing Lian, Rui Pan, Renjie Pi, Jipeng Zhang, Shizhe Diao, Yong Lin, Tong Zhang	(参考訳) 大規模言語モデル (LLM) は近年顕著な進歩を遂げており、マルチモーダルな大規模言語モデル (MLLM) の出現により、視覚能力を備えたLLMが実現され、様々なマルチモーダルタスクにおける印象的なパフォーマンスがもたらされた。しかし、GPT-4Vのような強力なMLLMは、特定の画像やテキスト入力を提示しても驚くほど失敗する。本稿では,MLLMに非常に関連性があるが応答に相容れない画像からなるMLLMをバッフルする典型的な入力のクラスを特定し,MLLMが幻覚に悩まされる原因となる。この効果を定量化するために,スプリアスイメージの幻覚レベルを評価する最初のベンチマークであるCorrelationQAを提案する。このベンチマークには、13のカテゴリにわたる7,308のテキストイメージペアが含まれている。提案したCorrelationQAに基づいて,9つの主流MLLMを網羅的に分析し,この本能バイアスを様々な程度に普遍的に抱えることを示した。得られたベンチマークと評価結果が,誤解を招く画像の存在下でのMLLMの頑健さのより良い評価に役立つことを期待する。リソースは https://github.com/MasaiahHan/CorrelationQA で入手できる。 Large language models (LLMs) have recently experienced remarkable progress, where the advent of multi-modal large language models (MLLMs) has endowed LLMs with visual capabilities, leading to impressive performances in various multi-modal tasks. However, those powerful MLLMs such as GPT-4V still fail spectacularly when presented with certain image and text inputs. In this paper, we identify a typical class of inputs that baffles MLLMs, which consist of images that are highly relevant but inconsistent with answers, causing MLLMs to suffer from hallucination. To quantify the effect, we propose CorrelationQA, the first benchmark that assesses the hallucination level given spurious images. This benchmark contains 7,308 text-image pairs across 13 categories. Based on the proposed CorrelationQA, we conduct a thorough analysis on 9 mainstream MLLMs, illustrating that they universally suffer from this instinctive bias to varying degrees. We hope that our curated benchmark and evaluation results aid in better assessments of the MLLMs' robustness in the presence of misleading images. The resource is available at https://github.com/MasaiahHan/CorrelationQA.	翻訳日:2024-02-07 16:06:53 公開日:2024-02-06
# QuantAgent: 自己改善型大規模言語モデルによる取引における聖杯の探索 QuantAgent: Seeking Holy Grail in Trading by Self-Improving Large Language Model ( http://arxiv.org/abs/2402.03755v1 ) ライセンス: Link先を確認	Saizhuo Wang, Hang Yuan, Lionel M. Ni, Jian Guo	(参考訳) 大規模言語モデル(LLM)に基づく自律エージェントは、現実の課題に対処し、計画を立てる上で注目されているが、量的投資のような専門分野向けにこれらのエージェントを調整することは、依然として恐ろしい作業である。主な課題は、エージェントの学習プロセスのためのドメイン固有の知識ベースを効率的に構築し、統合することである。 This paper introduces a principled framework to address this challenge, comprising a two-layer loop. In the inner loop, the agent refines its responses by drawing from its knowledge base, while in the outer loop, these responses are tested in real-world scenarios to automatically enhance the knowledge base with new insights. We demonstrate that our approach enables the agent to progressively approximate optimal behavior with provable efficiency. Furthermore, we instantiate this framework through an autonomous agent for mining trading signals named QuantAgent. 実証的な結果は、実行可能な金融信号を発見し、財務予測の精度を高めるQuantAgentの能力を示している。 Autonomous agents based on Large Language Models (LLMs) that devise plans and tackle real-world challenges have gained prominence. However, tailoring these agents for specialized domains like quantitative investment remains a formidable task. The core challenge involves efficiently building and integrating a domain-specific knowledge base for the agent's learning process. This paper introduces a principled framework to address this challenge, comprising a two-layer loop. In the inner loop, the agent refines its responses by drawing from its knowledge base, while in the outer loop, these responses are tested in real-world scenarios to automatically enhance the knowledge base with new insights. We demonstrate that our approach enables the agent to progressively approximate optimal behavior with provable efficiency. Furthermore, we instantiate this framework through an autonomous agent for mining trading signals named QuantAgent. 
Empirical results showcase QuantAgent's capability in uncovering viable financial signals and enhancing the accuracy of financial forecasts.	翻訳日:2024-02-07 16:06:32 公開日:2024-02-06
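The two-layer loop described in the QuantAgent entry above can be pictured roughly as follows. This is only a toy sketch under stated assumptions, not the authors' implementation: the callables `propose`, `refine` and `evaluate`, and the list-based knowledge base, are hypothetical stand-ins for the LLM agent, its self-refinement step and the real-world test.

```python
def two_layer_loop(propose, refine, evaluate, n_outer=3, n_inner=2):
    """Toy two-layer loop: inner refinement from a knowledge base, outer evaluation."""
    knowledge_base = []  # (candidate, score) pairs gathered by the outer loop
    for _ in range(n_outer):
        candidate = propose(knowledge_base)
        for _ in range(n_inner):
            # Inner loop: refine the candidate using only the knowledge base.
            candidate = refine(candidate, knowledge_base)
        # Outer loop: test the candidate in the environment and record the result.
        score = evaluate(candidate)
        knowledge_base.append((candidate, score))
    # Return the best (candidate, score) pair seen so far.
    return max(knowledge_base, key=lambda pair: pair[1])

# Hypothetical toy problem: candidates are numbers, the best one is 3.0.
best = two_layer_loop(
    propose=lambda kb: max(kb, key=lambda p: p[1])[0] if kb else 0.0,
    refine=lambda x, kb: x + 0.5,
    evaluate=lambda x: -(x - 3.0) ** 2,
)
print(best)
```

The point of the split is that the inner loop is cheap (no environment access), while each outer iteration adds one new piece of evidence to the knowledge base.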
# 放射能レポート生成のための集中型視覚誘導ネットワーク Intensive Vision-guided Network for Radiology Report Generation ( http://arxiv.org/abs/2402.03754v1 ) ライセンス: Link先を確認	Fudan Zheng, Mengfei Li, Ying Wang, Weijiang Yu, Ruixuan Wang, Zhiguang Chen, Nong Xiao, and Yutong Lu	(参考訳) 医療業界への大きな応用可能性のために、自動x線検査レポート生成が急成長している。しかし、この問題に対処するための既存のコンピュータビジョンと自然言語処理アプローチは2つの側面に限られている。まず、画像特徴を抽出する際、視覚における多視点推論を無視し、スペースビューやチャンネルビューといった医療画像の単一視点構造をモデル化する。しかし、臨床医は日常診断において総合的な判断を多視点画像情報に頼っている。第二に、レポートを生成する際には、マルチモーダル情報による文脈推論を見落とし、検索手法を利用した純粋テキスト最適化に焦点を当てる。本研究の目的は,臨床医の視点をシミュレートし,より正確な報告を生成するモデルを提案することである。上記の特徴抽出の限界を考慮し,多視点視覚知覚をシミュレートし統合するための医用画像エンコーダにおけるグローバル集中注意(gia)モジュールを提案する。 GIAは、深度ビュー、空間ビュー、ピクセルビューの3種類の視覚知覚を学習することを目指している。一方,報告生成における上記の問題に対処するために,複数のモーダル信号を用いて正確な一致レポートを生成する方法,すなわち,予め予測された単語と地域認識された視覚コンテンツの統合方法について検討する。具体的には、視覚的知識誘導デコーダ(VKGD)を設計し、次の単語予測を支援するために、モデルが視覚情報や予測されたテキストにどれだけ依存する必要があるかを適応的に検討する。したがって、我々の最後の集中型ビジョン誘導ネットワーク(IVGN)フレームワークは、GIA誘導型ビジュアルエンコーダとVKGDを含んでいる。 IU X-RayとMIMIC-CXRの2つの一般的なデータセットを用いた実験は、他の最先端手法と比較して、我々の手法が優れていることを示す。 Automatic radiology report generation is booming due to its huge application potential for the healthcare industry. However, existing computer vision and natural language processing approaches to tackle this problem are limited in two aspects. First, when extracting image features, most of them neglect multi-view reasoning in vision and model single-view structure of medical images, such as space-view or channel-view. However, clinicians rely on multi-view imaging information for comprehensive judgment in daily clinical diagnosis. Second, when generating reports, they overlook context reasoning with multi-modal information and focus on pure textual optimization utilizing retrieval-based methods. We aim to address these two issues by proposing a model that better simulates clinicians' perspectives and generates more accurate reports. 
Given the above limitation in feature extraction, we propose a Globally-intensive Attention (GIA) module in the medical image encoder to simulate and integrate multi-view vision perception. GIA aims to learn three types of vision perception: depth view, space view, and pixel view. On the other hand, to address the above problem in report generation, we explore how to involve multi-modal signals to generate precisely matched reports, i.e., how to integrate previously predicted words with region-aware visual content in next word prediction. Specifically, we design a Visual Knowledge-guided Decoder (VKGD), which can adaptively consider how much the model needs to rely on visual information and previously predicted text to assist next word prediction. Hence, our final Intensive Vision-guided Network (IVGN) framework includes a GIA-guided Visual Encoder and the VKGD. Experiments on two commonly-used datasets IU X-Ray and MIMIC-CXR demonstrate the superior ability of our method compared with other state-of-the-art approaches.	翻訳日:2024-02-07 16:06:16 公開日:2024-02-06
# 不確実性に基づく集団変数を用いたロバストな分子データセットのサンプリング Enhanced sampling of robust molecular datasets with uncertainty-based collective variables ( http://arxiv.org/abs/2402.03753v1 ) ライセンス: Link先を確認	Aik Rui Tan, Johannes C. B. Dietschreit, Rafael Gomez-Bombarelli	(参考訳) 分子システムのアクセス可能な構成空間を表すデータセットを生成することは、機械学習原子間ポテンシャル(mlip)のロバスト性にとって重要である。しかし、多くの局所的なミニマとエネルギー障壁を持つ複雑なポテンシャルエネルギー表面(PES)を特徴とする分子系の複雑さは、大きな課題を呈している。ランダムサンプリングや徹底的な探索のような従来のデータ生成方法は、扱いにくいか、稀だが非常に有益な構成を捉えない。本研究では,MLモデル予測が最も不確実な構成空間の領域に着目し,化学関連データポイントの獲得を導くために,不確実性を集合変数(CV)として活用する手法を提案する。このアプローチでは、偏り分子動力学シミュレーションのためのcvとして単一のモデルからガウス混合モデルに基づく不確かさ計量を用いる。アラニンジペプチドベンチマークシステムにおいて, エネルギー障壁を克服し, 目に見えないエネルギーミニマを探索し, アクティブラーニングフレームワークで設定したデータセットを向上する手法の有効性を実証した。 Generating a data set that is representative of the accessible configuration space of a molecular system is crucial for the robustness of machine learned interatomic potentials (MLIP). However, the complexity of molecular systems, characterized by intricate potential energy surfaces (PESs) with numerous local minima and energy barriers, presents a significant challenge. Traditional methods of data generation, such as random sampling or exhaustive exploration, are either intractable or may not capture rare, but highly informative configurations. In this study, we propose a method that leverages uncertainty as the collective variable (CV) to guide the acquisition of chemically-relevant data points, focusing on regions of the configuration space where ML model predictions are most uncertain. This approach employs a Gaussian Mixture Model-based uncertainty metric from a single model as the CV for biased molecular dynamics simulations. The effectiveness of our approach in overcoming energy barriers and exploring unseen energy minima, thereby enhancing the data set in an active learning framework, is demonstrated on the alanine dipeptide benchmark system.	翻訳日:2024-02-07 16:05:45 公開日:2024-02-06
# 小型画像を用いた小型データセットにおける軽量ビジョントランスの事前学習 Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images ( http://arxiv.org/abs/2402.03752v1 ) ライセンス: Link先を確認	Jen Hong Tan	(参考訳) 軽量ビジョントランスフォーマー(ViT)は、小さな画像解像度のデータセット上で、ResNetのような畳み込みニューラルネットワーク(CNN)のパフォーマンスにマッチするか、超えるか? 本報告では,マスク付きオートエンコーダによる画像スケーリングの最小化により,プリトレーニングにより純粋なViTが優れた性能を発揮することを示す。 CIFAR-10とCIFAR-100データセットの実験では、パラメータが365万未満のViTモデルと、乗算累積(MAC)数が0.27G未満で、これらを「軽量」モデルとみなした。従来の手法とは異なり、CIFAR-10やCIFAR-100の画像を著しくスケールアップすることなく、類似の軽量トランスフォーマーベースアーキテクチャの最先端性能を実現する。この成果は、小さなデータセットを扱うだけでなく、元のスケールに近い画像を効果的に処理する上でも、我々のモデルの効率を裏付けるものである。 Can a lightweight Vision Transformer (ViT) match or exceed the performance of Convolutional Neural Networks (CNNs) like ResNet on small datasets with small image resolutions? This report demonstrates that a pure ViT can indeed achieve superior performance through pre-training, using a masked auto-encoder technique with minimal image scaling. Our experiments on the CIFAR-10 and CIFAR-100 datasets involved ViT models with fewer than 3.65 million parameters and a multiply-accumulate (MAC) count below 0.27G, qualifying them as 'lightweight' models. Unlike previous approaches, our method attains state-of-the-art performance among similar lightweight transformer-based architectures without significantly scaling up images from CIFAR-10 and CIFAR-100. This achievement underscores the efficiency of our model, not only in handling small datasets but also in effectively processing images close to their original scale.	翻訳日:2024-02-07 16:05:26 公開日:2024-02-06
# ディジタルツインモビリティプロファイリング : 時空間グラフ学習アプローチ Digital Twin Mobility Profiling: A Spatio-Temporal Graph Learning Approach ( http://arxiv.org/abs/2402.03750v1 ) ライセンス: Link先を確認	Xin Chen, Mingliang Hou, Tao Tang, Achhardeep Kaur and Feng Xia	(参考訳) ビッグデータ時代が到来すると、モビリティプロファイリングは膨大なモビリティデータを利用してインテリジェントな交通システムを構築するための有効な方法になってきた。モビリティプロファイリングは、モビリティデータから都市交通の潜在的なパターンを抽出でき、様々な交通関連アプリケーションにとって重要である。しかし、高いレベルの複雑さと膨大なデータによって、モビリティプロファイリングは大きな課題に直面している。デジタルツイン(dt)技術は、ネットワークの仮想表現を作成してその動作をシミュレートすることで、コスト効率とパフォーマンスを最適化した管理の道を開く。交通シナリオにおける複雑な時空間的特徴を捉えるため、時空間的相関表現の完成を支援するアライメント図を構築し、時空間的相互作用(時空間的相互作用)の微粒化を学習する。本稿では,移動ネットワークDTモデルを用いてノードプロファイルを学習するためのデジタルツインモビリティ・プロファイリング(DTMP)フレームワークを提案する。 3つの実世界のデータセットで広範な実験が行われた。実験によりDTMPの有効性が示された。 With the arrival of the big data era, mobility profiling has become a viable method of utilizing enormous amounts of mobility data to create an intelligent transportation system. Mobility profiling can extract potential patterns in urban traffic from mobility data and is critical for a variety of traffic-related applications. However, due to the high level of complexity and the huge amount of data, mobility profiling faces huge challenges. Digital Twin (DT) technology paves the way for cost-effective and performance-optimised management by digitally creating a virtual representation of the network to simulate its behaviour. In order to capture the complex spatio-temporal features in traffic scenario, we construct alignment diagrams to assist in completing the spatio-temporal correlation representation and design dilated alignment convolution network (DACN) to learn the fine-grained correlations, i.e., spatio-temporal interactions. We propose a digital twin mobility profiling (DTMP) framework to learn node profiles on a mobility network DT model. Extensive experiments have been conducted upon three real-world datasets. Experimental results demonstrate the effectiveness of DTMP.	翻訳日:2024-02-07 16:05:10 公開日:2024-02-06
# ソフトウェアパッチの自動記述生成 Automated Description Generation for Software Patches ( http://arxiv.org/abs/2402.03805v1 ) ライセンス: Link先を確認	Thanh Trong Vu, Tuan-Dung Bui, Thanh-Dat Do, Thu-Trang Nguyen, Hieu Dinh Vo, and Son Nguyen	(参考訳) ソフトウェアパッチは、コードベースの精製と進化、バグ、脆弱性、最適化に重要である。パッチ記述は変更の詳細な説明を提供し、開発者間の理解とコラボレーションを支援する。しかし、マニュアル記述の作成は、時間消費と品質と細部の違いの観点から課題を提起する。本稿では,パッチ記述生成を機械翻訳タスクとしてフレーミングすることで,これらの課題に対処するPATCHEXPLAINERを提案する。 PATCHEXPLAINERでは、重要な要素、歴史的文脈、統語規則の明示的な表現を活用する。さらに、PATCHEXPLAINERの翻訳モデルは、記述類似性を意識して設計されている。特に、このモデルは、グループにクラスタ化されたパッチ記述に存在する類似性を認識し、組み込むように明示的に訓練されており、同様のパッチ間で正確で一貫性のある記述を生成する能力を改善している。 2つの目的は類似性を最大化し、アフィリエイト群を正確に予測する。実世界のソフトウェアパッチの大規模なデータセットを用いた実験の結果、patchexplainerは、bleuの189%、正確な一致率5.7倍、セマンティックな類似度154%という、既存の手法を一貫して上回っており、ソフトウェアパッチ記述の生成に効果があることが判明しました。 Software patches are pivotal in refining and evolving codebases, addressing bugs, vulnerabilities, and optimizations. Patch descriptions provide detailed accounts of changes, aiding comprehension and collaboration among developers. However, manual description creation poses challenges in terms of time consumption and variations in quality and detail. In this paper, we propose PATCHEXPLAINER, an approach that addresses these challenges by framing patch description generation as a machine translation task. In PATCHEXPLAINER, we leverage explicit representations of critical elements, historical context, and syntactic conventions. Moreover, the translation model in PATCHEXPLAINER is designed with an awareness of description similarity. Particularly, the model is explicitly trained to recognize and incorporate similarities present in patch descriptions clustered into groups, improving its ability to generate accurate and consistent descriptions across similar patches. The dual objectives maximize similarity and accurately predict affiliating groups. 
Our experimental results on a large dataset of real-world software patches show that PATCHEXPLAINER consistently outperforms existing methods, with improvements up to 189% in BLEU, 5.7X in Exact Match rate, and 154% in Semantic Similarity, affirming its effectiveness in generating software patch descriptions.	翻訳日:2024-02-07 15:58:45 公開日:2024-02-06
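The PATCHEXPLAINER entry above mixes two kinds of figures: percentage improvements (189%, 154%) and a ratio (5.7X). As a hedged sketch, assuming the percentages are relative gains over a baseline score (the abstract does not spell this out), the snippet below shows how an Exact Match rate and a relative improvement could be computed. The example descriptions and scores are invented.

```python
def exact_match_rate(predictions, references):
    """Fraction of predictions identical to their reference (ignoring outer whitespace)."""
    matches = sum(1 for p, r in zip(predictions, references) if p.strip() == r.strip())
    return matches / len(references)

def relative_improvement(new, baseline):
    """Relative gain in percent; 189.0 means the new score is 2.89x the baseline."""
    return (new - baseline) / baseline * 100.0

# Invented patch descriptions, for illustration only.
predicted = ["fix null check in parser", "add unit tests"]
reference = ["fix null check in parser", "add unit test"]
print(exact_match_rate(predicted, reference))
print(relative_improvement(new=2.89, baseline=1.0))
```

Under this reading, a 189% improvement and a 2.89X ratio describe the same gain, so "5.7X" would correspond to a 470% relative improvement.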
# ReLU$^2$ Wins: Sparse LLMの効率的な活性化関数の発見 ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs ( http://arxiv.org/abs/2402.03804v1 ) ライセンス: Link先を確認	Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun	(参考訳) スパース計算は、非活性ニューロンの計算を動的にスキップすることで、低リソースシナリオにおけるLarge Language Models(LLM)の推論に魅力的なソリューションを提供する。従来のアプローチでは、活性化値のゼロを活用するReLUベースのLLMに重点を置いているが、ゼロアクティベーション値を超えたスパースLLMの範囲を広げている。我々は、ニューロン出力の等級と調整された等級しきい値によってニューロンの活性化を定義する一般的な方法を紹介し、非ReLU LLMもスパース活性化を示すことを示した。スパース計算における最も効率的なアクティベーション関数を見つけるために,スパーシティと性能のトレードオフ,スパーシティの予測率,ハードウェア親和性という3つの側面からLLMの疎さを調べるための体系的枠組みを提案する。我々は、ReLU、SwiGLU、ReGLU、ReLU$^2$といった異なるアクティベーション機能を利用したLLMの徹底的な実験を行う。その結果,ReLU$^2$モデルが3つの評価点すべてで優れており,スパースLLMの効率的な活性化機能としての可能性を強調した。今後の研究を促進するためにコードを公開します。 Sparse computation offers a compelling solution for the inference of Large Language Models (LLMs) in low-resource scenarios by dynamically skipping the computation of inactive neurons. While traditional approaches focus on ReLU-based LLMs, leveraging zeros in activation values, we broaden the scope of sparse LLMs beyond zero activation values. We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold, demonstrating that non-ReLU LLMs also exhibit sparse activation. To find the most efficient activation function for sparse computation, we propose a systematic framework to examine the sparsity of LLMs from three aspects: the trade-off between sparsity and performance, the predictivity of sparsity, and the hardware affinity. We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$. The results indicate that models employing ReLU$^2$ excel across all three evaluation aspects, highlighting its potential as an efficient activation function for sparse LLMs. We will release the code to facilitate future research.	翻訳日:2024-02-07 15:58:18 公開日:2024-02-06
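To make the idea in the ReLU$^2$ entry above concrete, here is a small sketch of the ReLU$^2$ activation, max(0, x)^2, together with a magnitude-threshold notion of sparsity. The threshold values and pre-activations are hypothetical; the paper's own thresholds are tailored per model and are not reproduced here.

```python
def relu2(x):
    """ReLU squared: max(0, x) ** 2."""
    return max(0.0, x) ** 2

def activation_sparsity(outputs, threshold):
    """Fraction of neuron outputs whose magnitude is at or below the threshold."""
    inactive = sum(1 for v in outputs if abs(v) <= threshold)
    return inactive / len(outputs)

# Hypothetical pre-activation values for six neurons.
pre_activations = [-1.2, 0.05, 0.3, -0.4, 2.0, 0.1]
outputs = [relu2(x) for x in pre_activations]

# threshold=0.0 counts only exact zeros (the classic ReLU-style definition);
# a small positive threshold also treats tiny outputs as inactive.
print(activation_sparsity(outputs, threshold=0.0))
print(activation_sparsity(outputs, threshold=0.05))
```

Squaring pushes small positive activations further towards zero, which is one intuitive reason a magnitude threshold can mark more neurons as skippable.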
# 顔検出:現状と研究の方向性 Face Detection: Present State and Research Directions ( http://arxiv.org/abs/2402.03796v1 ) ライセンス: Link先を確認	Purnendu Prabhat, Himanshu Gupta and Ajeet Kumar Vishwakarma	(参考訳) 人間のイメージを扱うコンピュータビジョンアプリケーションの大部分は、顔検出をコアコンポーネントとして使用している。顔検出には依然として問題がある。顔検出の精度と速度は向上する可能性がある。このレビュー論文は、この分野における進歩と、まだ取り組まなければならない重大な課題を示している。この論文は、顔検出の分野での研究プロジェクトとして取り上げることができる研究の方向性を提供する。 The majority of computer vision applications that handle images featuring humans use face detection as a core component. Face detection still has issues, despite much research on the topic. Face detection's accuracy and speed might yet be increased. This review paper shows the progress made in this area as well as the substantial issues that still need to be tackled. The paper provides research directions that can be taken up as research projects in the field of face detection.	翻訳日:2024-02-07 15:57:45 公開日:2024-02-06
# 深度誘導によるエネルギーベースドメイン適応セグメンテーション Energy-based Domain-Adaptive Segmentation with Depth Guidance ( http://arxiv.org/abs/2402.03795v1 ) ライセンス: Link先を確認	Jinjing Zhu, Zhedong Hu, Tae-Kyun Kim, and Lin Wang	(参考訳) セマンティックセグメンテーションのための非教師なしドメイン適応(UDA)のガイダンスとして,自己教師付き深度推定を活用する試みが近年行われている。しかし、先行芸術は、意味的特徴と深さ的特徴の相違、および特徴融合の信頼性を軽視し、したがって準最適セグメンテーション性能に繋がる。本稿では,エネルギーベースモデル(ebms)を用いたタスク適応的特徴の獲得と,自己教師付き深さ推定によるセマンティクスセグメンテーションのための信頼性の高い機能融合を実現する,smart(cross domain semantic segmentation based energy estimation)と呼ばれる新しいudaフレームワークを提案する。本フレームワークには,エネルギーベース機能融合(EB2F)とエネルギーベース信頼性融合評価(RFA)モジュールの2つの新しいコンポーネントが組み込まれている。 EB2Fモジュールは、機能融合を改善するためにホップフィールドエネルギーを用いて、その相違を明示的に測定し、低減することにより、タスク適応的な意味と深さの特徴を生成する。 RFAモジュールは、エネルギースコアを用いて特徴融合の信頼性を評価し、深さ誘導の有効性を向上させる。 2つのデータセットに対する大規模な実験により,本手法は先行研究よりも大きな性能向上を達成し,エネルギーベース学習手法の有効性を検証した。 Recent endeavors have been made to leverage self-supervised depth estimation as guidance in unsupervised domain adaptation (UDA) for semantic segmentation. Prior arts, however, overlook the discrepancy between semantic and depth features, as well as the reliability of feature fusion, thus leading to suboptimal segmentation performance. To address this issue, we propose a novel UDA framework called SMART (croSs doMain semAntic segmentation based on eneRgy esTimation) that utilizes Energy-Based Models (EBMs) to obtain task-adaptive features and achieve reliable feature fusion for semantic segmentation with self-supervised depth estimates. Our framework incorporates two novel components: energy-based feature fusion (EB2F) and energy-based reliable fusion Assessment (RFA) modules. The EB2F module produces task-adaptive semantic and depth features by explicitly measuring and reducing their discrepancy using Hopfield energy for better feature fusion. The RFA module evaluates the reliability of the feature fusion using an energy score to improve the effectiveness of depth guidance. 
Extensive experiments on two datasets demonstrate that our method achieves significant performance gains over prior works, validating the effectiveness of our energy-based learning approach.	翻訳日:2024-02-07 15:57:36 公開日:2024-02-06
# 知識データアライメントによる弱教師付き異常検出 Weakly Supervised Anomaly Detection via Knowledge-Data Alignment ( http://arxiv.org/abs/2402.03785v1 ) ライセンス: Link先を確認	Haihong Zhao, Chenyi Zi, Yang Liu, Chen Zhang, Yan Zhou and Jia Li	(参考訳) マルウェア検出、マネーロンダリング、デバイス障害検出、ネットワーク障害解析など、多数のWebベースのアプリケーションにおいて、異常検出(AD)が重要な役割を果たす。教師なし学習に依存するほとんどの手法は、ラベルの欠如により、十分な検出精度に達することが困難である。弱教師付き異常検出(weakly supervised anomaly detection, wsad)は、限られた数のラベル付き異常検出によって、モデルの性能を向上させるために導入された。それでも、不適切なラベル付きデータに基づいてトレーニングされたモデルが、目に見えない異常に一般化することは依然として困難である。本稿では、人間の専門家が一般的に要約したルール知識を統合し、限定されたラベル付きデータを補完する、新しい枠組みであるkdalign(知識データアライメント)を提案する。具体的には、これらのルールを知識空間に変換し、知識とデータのアライメントとして知識の組み入れを再キャストする。このアライメントを容易にするために、最適輸送(OT)技術を用いる。次に, OT 距離を WSAD 手法の本来の目的関数に付加的な損失項として組み込む。 5つの実世界のデータセットに対する総合的な実験結果から、提案したKDAlignフレームワークが最先端のフレームワークを著しく上回り、様々な異常なタイプで優れたパフォーマンスを実現していることが示された。 Anomaly detection (AD) plays a pivotal role in numerous web-based applications, including malware detection, anti-money laundering, device failure detection, and network fault analysis. Most methods, which rely on unsupervised learning, are hard to reach satisfactory detection accuracy due to the lack of labels. Weakly Supervised Anomaly Detection (WSAD) has been introduced with a limited number of labeled anomaly samples to enhance model performance. Nevertheless, it is still challenging for models, trained on an inadequate amount of labeled data, to generalize to unseen anomalies. In this paper, we introduce a novel framework Knowledge-Data Alignment (KDAlign) to integrate rule knowledge, typically summarized by human experts, to supplement the limited labeled data. Specifically, we transpose these rules into the knowledge space and subsequently recast the incorporation of knowledge as the alignment of knowledge and data. To facilitate this alignment, we employ the Optimal Transport (OT) technique. We then incorporate the OT distance as an additional loss term to the original objective function of WSAD methodologies. 
Comprehensive experimental results on five real-world datasets demonstrate that our proposed KDAlign framework markedly surpasses its state-of-the-art counterparts, achieving superior performance across various anomaly types.	翻訳日:2024-02-07 15:56:19 公開日:2024-02-06
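The KDAlign entry above adds an optimal transport (OT) distance to the original training objective as an extra loss term. The abstract does not say which OT solver is used, so the sketch below swaps in entropy-regularised Sinkhorn iterations, a common approximation. The weight `ot_weight`, the cost matrix and all numeric values are hypothetical.

```python
import math

def sinkhorn_distance(cost, a, b, reg=0.1, n_iters=200):
    """Entropy-regularised OT cost between histograms a and b (Sinkhorn iterations)."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P_ij = u_i * K_ij * v_j; the distance is the plan-weighted cost.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

# Hypothetical pairwise costs between 2 knowledge embeddings and 2 data embeddings.
cost = [[0.0, 1.0], [1.0, 0.0]]
ot_term = sinkhorn_distance(cost, a=[0.5, 0.5], b=[0.5, 0.5])

task_loss = 0.42   # placeholder for the original WSAD objective
ot_weight = 0.1    # hypothetical trade-off coefficient
total_loss = task_loss + ot_weight * ot_term
print(total_loss)
```

When knowledge and data embeddings are already aligned (low cost on the matching pairs), the OT term is close to zero and the total loss reduces to the original objective.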
# AirPhyNet:空気質予測のための物理誘導ニューラルネットワーク AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction ( http://arxiv.org/abs/2402.03784v1 ) ライセンス: Link先を確認	Kethmi Hirushini Hettige, Jiahao Ji, Shili Xiang, Cheng Long, Gao Cong, Jingyuan Wang	(参考訳) 大気質の予測とモデリングは公衆衛生と環境管理において重要な役割を担い、個人や当局は情報的決定を行う。従来のデータ駆動モデルはこの領域で有望性を示しているが、その長期的な予測精度は、特にスパースや不完全なデータを持つシナリオでは制限され、それらは多くの場合、確固とした物理的基盤を持たないブラックボックスのディープラーニング構造に依存しているため、予測における透明性と解釈性が低下する。本稿では,空気質予測のための物理誘導ニューラルネットワーク(AirPhyNet)という新しい手法を提案する。具体的には、空気粒子移動(拡散と対流)の2つの確立された物理原理を微分方程式ネットワークとして表現する。次に,物理知識をニューラルネットワークアーキテクチャに統合し,潜時表現を利用して大気質データ内の時空間関係をキャプチャするグラフ構造を用いる。 2つの実世界のベンチマークデータセットの実験によると、AirPhyNetは異なるリードタイム(24h, 48h, 72h)、スパースデータと突然の変化予測など、さまざまなテストシナリオの最先端モデルよりも優れており、予測エラーの最大10%削減を実現している。さらに,本モデルが粒子運動の基盤となる物理過程を捉え,実際の物理的意味を持つ正確な予測を生成することを検証した。 Air quality prediction and modelling plays a pivotal role in public health and environment management, for individuals and authorities to make informed decisions. Although traditional data-driven models have shown promise in this domain, their long-term prediction accuracy can be limited, especially in scenarios with sparse or incomplete data and they often rely on black-box deep learning structures that lack solid physical foundation leading to reduced transparency and interpretability in predictions. To address these limitations, this paper presents a novel approach named Physics guided Neural Network for Air Quality Prediction (AirPhyNet). Specifically, we leverage two well-established physics principles of air particle movement (diffusion and advection) by representing them as differential equation networks. Then, we utilize a graph structure to integrate physics knowledge into a neural network architecture and exploit latent representations to capture spatio-temporal relationships within the air quality data. 
Experiments on two real-world benchmark datasets demonstrate that AirPhyNet outperforms state-of-the-art models across different testing scenarios, including different lead times (24h, 48h, 72h), sparse data, and sudden-change prediction, reducing prediction errors by up to 10%. Moreover, a case study further validates that our model captures underlying physical processes of particle movement and generates accurate predictions with real physical meaning.	翻訳日:2024-02-07 15:55:32 公開日:2024-02-06
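The AirPhyNet entry above encodes diffusion and advection as differential equations on a graph. As a rough illustration of the diffusion half only, the sketch below takes one explicit Euler step of dx/dt = -k L x, where L is the graph Laplacian of a small undirected station graph. The graph, the coefficient `k` and the step size `dt` are assumptions, and this is not the authors' network; advection would additionally need directed, wind-dependent edges.

```python
def diffusion_step(conc, edges, k, dt):
    """One explicit Euler step of dx/dt = -k * L x on an undirected graph."""
    flux = [0.0] * len(conc)
    for i, j in edges:
        # Pollutant moves from the higher-concentration node to the lower one.
        diff = conc[j] - conc[i]
        flux[i] += diff
        flux[j] -= diff
    return [c + dt * k * f for c, f in zip(conc, flux)]

# Hypothetical chain of three monitoring stations: 0 - 1 - 2.
concentration = [10.0, 0.0, 0.0]
edges = [(0, 1), (1, 2)]
after = diffusion_step(concentration, edges, k=0.1, dt=1.0)
print(after)
```

Because every edge removes from one node exactly what it adds to the other, the total concentration is conserved by each step, which is the kind of physical constraint a purely data-driven model does not guarantee.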
# 弱教師付きプロンプト学習による低リソース医療画像分類の探索 Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning ( http://arxiv.org/abs/2402.03783v1 ) ライセンス: Link先を確認	Fudan Zheng, Jindong Cao, Weijiang Yu, Zhiguang Chen, Nong Xiao, Yutong Lu	(参考訳) 臨床補助診断を補助する医用画像認識の進歩は、アノテーションが高価で専門的な医療分野における低リソース化が課題となっている。この低リソース問題は、関連する医学的テキストプロンプトを介して、大規模な事前訓練された視覚言語モデルの転送可能な表現を活用することで緩和することができる。しかし、既存の事前訓練された視覚言語モデルでは、医師の負担を大幅に増大させる医療プロンプトを慎重に設計する必要がある。そこで本研究では,教師なしの視覚言語モデルと弱い教師なしプロンプト学習モデルを含む医学的プロンプトを自動的に生成する,弱い教師付きプロンプト学習法 medprompt を提案する。教師なし事前訓練された視覚言語モデルは、手作業による注釈なしで、医学画像と対応する医学テキストとの自然な相関を利用して事前訓練を行う。弱い教師付きプロンプト学習モデルでは、データセット内の画像のクラスのみを使用してプロンプト内の特定のクラスベクトルの学習を誘導する一方、プロンプト内の他のコンテキストベクトルの学習はガイダンスのマニュアルアノテーションを必要としない。私たちの知る限りでは、これが医療用プロンプトを自動生成する最初のモデルです。これらのプロンプトにより、トレーニング済みの視覚言語モデルは、手動のアノテーションと手動のプロンプト設計の強い専門家依存から解放することができる。実験の結果,我々の自動生成プロンプトを用いたモデルは,ゼロショット画像分類において,最小限のラベル付きサンプルしか持たないフルショット学習ハンドクラフトプロンプトよりも優れ,あるいは同等の精度に達することがわかった。提案するプロンプトジェネレータは軽量であり,任意のネットワークアーキテクチャに組み込むことができる。 Most advances in medical image recognition supporting clinical auxiliary diagnosis meet challenges due to the low-resource situation in the medical field, where annotations are highly expensive and professional. This low-resource problem can be alleviated by leveraging the transferable representations of large-scale pre-trained vision-language models via relevant medical text prompts. However, existing pre-trained vision-language models require domain experts to carefully design the medical prompts, which greatly increases the burden on clinicians. To address this problem, we propose a weakly supervised prompt learning method MedPrompt to automatically generate medical prompts, which includes an unsupervised pre-trained vision-language model and a weakly supervised prompt learning model. The unsupervised pre-trained vision-language model utilizes the natural correlation between medical images and corresponding medical texts for pre-training, without any manual annotations. 
The weakly supervised prompt learning model only utilizes the classes of images in the dataset to guide the learning of the specific class vector in the prompt, while the learning of other context vectors in the prompt requires no manual annotations for guidance. To the best of our knowledge, this is the first model to automatically generate medical prompts. With these prompts, the pre-trained vision-language model can be freed from the strong expert dependency of manual annotation and manual prompt design. Experimental results show that, with only a minimal number of labeled samples for few-shot learning, the model using our automatically generated prompts outperforms its counterparts that use hand-crafted prompts with full-shot learning, and reaches superior or comparable accuracy on zero-shot image classification. The proposed prompt generator is lightweight and therefore can be embedded into any network architecture.	翻訳日:2024-02-07 15:55:04 公開日:2024-02-06
# 言語間伝達のためのソフトプロンプトチューニング: 少ない方が多い場合 Soft Prompt Tuning for Cross-Lingual Transfer: When Less is More ( http://arxiv.org/abs/2402.03782v1 ) ライセンス: Link先を確認	Fred Philippy, Siwen Guo, Shohreh Haddadan, Cedric Lothritz, Jacques Klein, Tegawendé F. Bissyandé	(参考訳) SPT(Soft Prompt Tuning)は、学習可能な埋め込みやソフトプロンプトをPLMの入力層に挿入することで、学習済み言語モデル(PLM)を特定のタスクに適応させるパラメータ効率のよい手法である。本稿では,言語間移動におけるSPTの可能性について検討する。ソフトプロンプトとモデルパラメータの両方を微調整する言語間伝達に関する以前の研究とは異なり、モデルパラメータを凍結させ、ソフトプロンプトのみをトレーニングすることで、sptの本来の意図に固執する。これは、フルモデルファインチューニングの計算コストとストレージオーバーヘッドを低減させるだけでなく、SPTに固有のこのパラメータ効率が言語的に離れた言語への言語間転送性能を向上させることを実証する。さらに,プロンプトに関連する要因(長さやパラメータ化など)が言語間移動性能に与える影響についても検討する。 Soft Prompt Tuning (SPT) is a parameter-efficient method for adapting pre-trained language models (PLMs) to specific tasks by inserting learnable embeddings, or soft prompts, at the input layer of the PLM, without modifying its parameters. This paper investigates the potential of SPT for cross-lingual transfer. Unlike previous studies on SPT for cross-lingual transfer that often fine-tune both the soft prompt and the model parameters, we adhere to the original intent of SPT by keeping the model parameters frozen and only training the soft prompt. This not only reduces the computational cost and storage overhead relative to full-model fine-tuning; we also demonstrate that this very parameter efficiency intrinsic to SPT can enhance cross-lingual transfer performance to linguistically distant languages. Moreover, we explore how different factors related to the prompt, such as the length or its reparameterization, affect cross-lingual transfer performance.	翻訳日:2024-02-07 15:54:34 公開日:2024-02-06
# MolTC:言語モデルにおける分子関係モデリングを目指して MolTC: Towards Molecular Relational Modeling In Language Models ( http://arxiv.org/abs/2402.03781v1 ) ライセンス: Link先を確認	Junfeng Fang, Shuai Zhang, Chang Wu, Zhiyuan Liu, Sihang Li, Kun Wang, Wenjie Du, Xiang Wang, Xiangnan He	(参考訳) 分子間の相互作用を理解することを目的とした分子関係学習(MRL)は、生化学研究の進展において重要な役割を担っている。近年,膨大な知識リポジトリと高度な論理推論能力で知られる大規模言語モデル (LLM) の採用が,MRLの効率的かつ効果的な方法として注目されている。その可能性にもかかわらず、これらの手法は主としてテキストデータに依存しており、分子グラフに固有の構造情報の豊富さを十分に活用していない。さらに、統一されたフレームワークが存在しないことで、さまざまなデータセットで学習された相互作用の合理化の共有が妨げられるため、情報の活用が悪化する。これらの課題に対処するために、本研究では分子対のリッチなグラフィカルな情報を効率的に統合できるmoltc(chain-of-thought (cot) theory) に基づいた分子相互作用予測のための新しいllmベースのマルチモーダルフレームワークを提案する。統合MRLを実現するため,MollTCは,クロスデータセット情報交換のための動的パラメータ共有戦略を革新的に開発し,マルチ階層CoT原則を導入し,訓練パラダイムを洗練させる。 4000,000以上の分子対を含む12種類のデータセットを用いて実験を行い,現在のGNNおよびLLMベースラインよりも,本手法の優位性を実証した。その上、分子対話型インストラクションデータセットを総合的に構築し、moltcを含む生化学llmの開発を行っている。コードはhttps://github.com/MangoKiller/MolTCで入手できる。 Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising way for efficient and effective MRL. Despite their potential, these methods predominantly rely on the textual data, thus not fully harnessing the wealth of structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates the information underutilization, as it hinders the sharing of interaction rationale learned across diverse datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC, which can efficiently integrate rich graphical information of molecular pairs. 
To achieve unified MRL, MolTC innovatively develops a dynamic parameter-sharing strategy for cross-dataset information exchange, and introduces a Multi-hierarchical CoT principle to refine the training paradigm. Our experiments, conducted across twelve varied datasets involving over 4,000,000 molecular pairs, demonstrate the superiority of our method over current GNN and LLM-based baselines. On top of that, a comprehensive Molecular Interactive Instructions dataset is constructed for the development of biochemical LLMs, including our MolTC. Code is available at https://github.com/MangoKiller/MolTC.	翻訳日:2024-02-07 15:54:16 公開日:2024-02-06
# 公開プロパガンダ:人間のアノテーションと機械分類を比較したスタイリスティックな方法の分析 Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification ( http://arxiv.org/abs/2402.03780v1 ) ライセンス: Link先を確認	Géraud Faye, Benjamin Icard, Morgane Casanova, Julien Chanson, François Maine, François Bancilhon, Guillaume Gadek, Guillaume Gravier, Paul Égré	(参考訳) 本稿では,プロパガンダの言語とその様式的特徴について検討する。 Pseudo-Newsは、専門家機関によってプロパガンダソースとして特定されたウェブサイトから抽出されたニュース記事からなるマルチソース、多言語、マルチモーダルデータセットである。このセットの限られたサンプルは、通常のフランスの報道機関の論文とランダムに混同され、そのURLがマスクされ、11の異なるラベルを使って人による注釈実験が行われた。その結果,ヒトのアノテータは各ラベル間で2種類のプレスを確実に識別することができた。アノテーションが使用するキューを識別するための異なるNLP手法を提案し,それらを機械分類と比較する。これには、談話の曖昧さと主観性を測定するためのアナライザVAGO、ベースラインとして機能するTF-IDF、および2つのRoBERTaベースのモデル、構文を用いたCATS、構文と意味的特徴を組み合わせた1つのXGBoostの4つの異なる分類器が含まれる。キーワード: Propaganda, Fake News, 説明可能性, AIアライメント, Vagueness, 主観性, 誇張, スティリスティック分析 This paper investigates the language of propaganda and its stylistic features. It presents the PPN dataset, standing for Propagandist Pseudo-News, a multisource, multilingual, multimodal dataset composed of news articles extracted from websites identified as propaganda sources by expert agencies. A limited sample from this set was randomly mixed with papers from the regular French press, with their URLs masked, to conduct an annotation experiment by humans, using 11 distinct labels. The results show that human annotators were able to reliably discriminate between the two types of press across each of the labels. We propose different NLP techniques to identify the cues used by the annotators, and to compare them with machine classification. They include the analyzer VAGO to measure discourse vagueness and subjectivity, a TF-IDF to serve as a baseline, and four different classifiers: two RoBERTa-based models, CATS using syntax, and one XGBoost combining syntactic and semantic features. 
Keywords: Propaganda, Fake News, Explainability, AI alignment, Vagueness, Subjectivity, Exaggeration, Stylistic analysis	翻訳日:2024-02-07 15:53:50 公開日:2024-02-06
# EERO: 予算限定による効率的な分類のためのリジェクトオプションによる早期終了 EERO: Early Exit with Reject Option for Efficient Classification with limited budget ( http://arxiv.org/abs/2402.03779v1 ) ライセンス: Link先を確認	Florian Valade (LAMA), Mohamed Hebiri (LAMA), Paul Gay	(参考訳) 高度な機械学習モデルの複雑さの増大は、計算資源を効果的に管理するための革新的なアプローチを必要とする。このような方法のひとつがEarly Exit戦略であり、単純なデータインスタンスの処理パスを短縮するメカニズムを提供することで、適応的な計算を可能にする。本稿では,EEROを提案する。EEROは,各インスタンスの出口ヘッドをより適切に選択するために,早期終了の問題をリジェクトオプション付きの複数の分類器を使用する問題へと変換する新しい手法である。我々は、固定予算を保証するために指数重みによる集約を用いて、異なるヘッドでの終了確率を較正する。我々は,ベイズリスク,予算制約,ヘッド固有の予算消費などの要因を検討する。 Cifar と ImageNet のデータセット上で ResNet-18 モデルと ConvNext アーキテクチャを用いて実験を行った結果,提案手法は予算配分を効果的に管理するだけでなく,過剰思考(overthinking)のシナリオにおける正確性も向上することが示された。 The increasing complexity of advanced machine learning models requires innovative approaches to manage computational resources effectively. One such method is the Early Exit strategy, which allows for adaptive computation by providing a mechanism to shorten the processing path for simpler data instances. In this paper, we propose EERO, a new methodology to translate the problem of early exiting to a problem of using multiple classifiers with reject option in order to better select the exiting head for each instance. We calibrate the probabilities of exiting at the different heads using aggregation with exponential weights to guarantee a fixed budget. We consider factors such as Bayesian risk, budget constraints, and head-specific budget consumption. Experimental results, conducted using a ResNet-18 model and a ConvNext architecture on Cifar and ImageNet datasets, demonstrate that our method not only effectively manages budget allocation but also enhances accuracy in overthinking scenarios.	翻訳日:2024-02-07 15:53:27 公開日:2024-02-06
# コードレビューの自動化を改善する - 経験から学ぶ Improving Automated Code Reviews: Learning from Experience ( http://arxiv.org/abs/2402.03777v1 ) ライセンス: Link先を確認	Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Wachiraphan Charoenwet	(参考訳) 現代のコードレビューは、業界とオープンソースの両方で広く採用されている品質保証プロセスである。このプロセスは、経験豊富なレビュアーからのフィードバックから初心者が学ぶのに役立つが、レビュアーには大きなワークロードとストレスをもたらすことが多い。この負担を軽減するため、自動コードレビューの分野はプロセスを自動化することを目的としており、大きな言語モデルに人間のように、提出されたコードに対するレビューを提供するように教えている。最近のアプローチでは、大規模なコードレビューコーパスで、コードインテリジェント言語モデルを事前学習し、微調整した。しかし、これらの手法はトレーニングデータの品質評価を完全に活用することはなかった。実際、コードに対する高いレベルの経験や慣れ親しんだレビュアーは、他のものよりも深い洞察を提供するでしょう。本研究では,経験型オーバーサンプリング技術に基づいてトレーニングされた自動コードレビューモデルから,高品質なレビューを生成できるかどうかを検討する。定量的および定性的な評価により,経験意識によるオーバーサンプリングは,新たなデータを導入することなく,現在の最先端モデルが生成するレビューの正確性,情報レベル,有意義性を向上できることがわかった。その結果,現行のトレーニング戦略では,高品質なレビューが不十分であることが示唆された。この作業は、自動コードレビューモデルを強化するためのリソース効率のよい方法に光を当てています。 Modern code review is a critical quality assurance process that is widely adopted in both industry and open source software environments. This process can help newcomers learn from the feedback of experienced reviewers; however, it often brings a large workload and stress to reviewers. To alleviate this burden, the field of automated code reviews aims to automate the process, teaching large language models to provide reviews on submitted code, just as a human would. A recent approach pre-trained and fine-tuned the code intelligent language model on a large-scale code review corpus. However, such techniques did not fully utilise quality reviews amongst the training data. Indeed, reviewers with a higher level of experience or familiarity with the code will likely provide deeper insights than the others. In this study, we set out to investigate whether higher-quality reviews can be generated from automated code review models that are trained based on an experience-aware oversampling technique. 
Through our quantitative and qualitative evaluation, we find that experience-aware oversampling can increase the correctness, level of information, and meaningfulness of reviews generated by the current state-of-the-art model without introducing new data. The results suggest that a vast amount of high-quality reviews are underutilised with current training strategies. This work sheds light on resource-efficient ways to boost automated code review models.	翻訳日:2024-02-07 15:53:10 公開日:2024-02-06
# MOOCsグレーダーとしての大規模言語モデル Large Language Models As MOOCs Graders ( http://arxiv.org/abs/2402.03776v1 ) ライセンス: Link先を確認	Shahriar Golchin, Nikhil Garuda, Christopher Impey, Matthew Wenger	(参考訳) 大規模公開オンライン講座(MOOC)は、コンピュータとインターネットにアクセスできる世界中の誰に対しても、無料の教育への扉を開く。このような学習の民主化にもかかわらず、これらのコースの大規模な入学は、一人の教官が生徒全員の筆記課題を評価することはほぼ不可能であることを意味する。結果として、単純なルーブリックによって導かれるピアグレーディングが選択される方法となっている。便利だが、ピアグレーディングは信頼性と妥当性の点で不足することが多い。本研究では18の異なる設定を用いて,MOOCにおけるピアグレーディングを代替する大規模言語モデル(LLM)の実現可能性を検討する。具体的には,GPT-4 と GPT-3.5 の2つの最先端LLMに,3つの異なるコース,すなわち天文学入門,宇宙生物学,天文学の歴史と哲学にわたって焦点をあてる。 LLMを指導するためには、ゼロショット・チェーン・オブ・ソート (Zero-shot-CoT) プロンプト手法の変種に基づく3つの異なるプロンプトを使用する: (1) Zero-shot-CoTとインストラクター提供の正解の組み合わせ、(2) Zero-shot-CoTとインストラクター作成の解答およびルーブリックの併用、(3) Zero-shot-CoTとインストラクター提供の正解およびLLM生成のルーブリックの併用。その結果,Zero-shot-CoTはインストラクターが提供する回答やルーブリックと統合された場合,ピアグレーディングよりもインストラクターが割り当てたものとより整合した成績が得られた。しかし、天文学の歴史と哲学のコースは、他のコースとは対照的に、成績付けの点でより困難であることが示された。最後に,本研究は,特にルーブリックが明確に定義された科目において,MOOCの採点システムを自動化するための有望な方向性を示す。 Massive open online courses (MOOCs) unlock the doors to free education for anyone around the globe with access to a computer and the internet. Despite this democratization of learning, the massive enrollment in these courses means it is almost impossible for one instructor to assess every student's writing assignment. As a result, peer grading, often guided by a straightforward rubric, is the method of choice. While convenient, peer grading often falls short in terms of reliability and validity. In this study, using 18 distinct settings, we explore the feasibility of leveraging large language models (LLMs) to replace peer grading in MOOCs. Specifically, we focus on two state-of-the-art LLMs: GPT-4 and GPT-3.5, across three distinct courses: Introductory Astronomy, Astrobiology, and the History and Philosophy of Astronomy. 
To instruct LLMs, we use three different prompts based on a variant of the zero-shot chain-of-thought (Zero-shot-CoT) prompting technique: Zero-shot-CoT combined with instructor-provided correct answers; Zero-shot-CoT in conjunction with both instructor-formulated answers and rubrics; and Zero-shot-CoT with instructor-offered correct answers and LLM-generated rubrics. Our results show that Zero-shot-CoT, when integrated with instructor-provided answers and rubrics, produces grades that are more aligned with those assigned by instructors compared to peer grading. However, the History and Philosophy of Astronomy course proves to be more challenging in terms of grading as opposed to other courses. Finally, our study reveals a promising direction for automating grading systems for MOOCs, especially in subjects with well-defined rubrics.	翻訳日:2024-02-07 15:52:50 公開日:2024-02-06
# Transformerを用いた決定木アルゴリズムの学習 Learning a Decision Tree Algorithm with Transformers ( http://arxiv.org/abs/2402.03774v1 ) ライセンス: Link先を確認	Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao	(参考訳) 決定木は、解釈可能性と、特に表データにおいて高い予測性能を達成する能力で知られている。伝統的に、それらは再帰的なアルゴリズムによって構築され、ツリーの各ノードでデータを分割する。しかし、ローカルセグメントに最適化された決定木がグローバルな一般化をもたらすとは限らないため、最良の分割を特定することは難しい。これに対処するために,古典アルゴリズムからのフィルタ済み出力に基づいてトランスフォーマーベースのモデルをトレーニングし,分類のための強い決定木を生成するMetaTreeを提案する。具体的には、多くのデータセットに貪欲な決定木と最適化された決定木の両方を適合させます。次にMetaTreeをトレーニングして、強力な一般化パフォーマンスを実現するツリーを生成します。このトレーニングにより、MetaTreeはこれらのアルゴリズムをエミュレートするだけでなく、コンテキストに応じてその戦略をインテリジェントに適応させ、より優れた一般化パフォーマンスを達成することができる。 Decision trees are renowned for their interpretability and their capability to achieve high predictive performance, especially on tabular data. Traditionally, they are constructed through recursive algorithms, where they partition the data at every node in a tree. However, identifying the best partition is challenging, as decision trees optimized for local segments may not bring global generalization. To address this, we introduce MetaTree, which trains a transformer-based model on filtered outputs from classical algorithms to produce strong decision trees for classification. Specifically, we fit both greedy decision trees and optimized decision trees on a large number of datasets. We then train MetaTree to produce the trees that achieve strong generalization performance. This training enables MetaTree to not only emulate these algorithms, but also to intelligently adapt its strategy according to the context, thereby achieving superior generalization performance.	翻訳日:2024-02-07 15:52:10 公開日:2024-02-06
# RVFLに基づく非線形辞書学習へのSVDフリーアプローチ An SVD-free Approach to Nonlinear Dictionary Learning based on RVFL ( http://arxiv.org/abs/2402.03833v1 ) ライセンス: Link先を確認	G. Madhuri, Atul Negi	(参考訳) 本稿では,Random Vector Functional Link(RVFL)と呼ばれるフィードフォワードニューラルネットワークの理論を活用する非線形辞書学習アルゴリズムを提案する。提案したRVFLに基づく非線形辞書学習(RVFLDL)は,非線形スパース係数から高密度入力特徴へのスパース・トゥ・デンス特徴写像として辞書を学習する。カーネルに基づく非線形辞書学習手法は暗黙的特徴マップによって得られる特徴空間で動作し、特異値分解(SVD)のような計算コストの高い演算とは独立ではない。 RVFLは入力から出力層への重みを解析的に生成するので、RVFLベースの辞書のトレーニングはSVD計算から解放される。係数にはスパース性を誘導する馬蹄(horseshoe)事前分布を仮定し、初期ランダム辞書に対するスパース係数行列を生成する。入力スパース係数と辞書原子との高次依存関係は、スパース係数を非線形に変換し、強化された特徴として付加することによりトレーニングプロセスに組み込む。したがって、この方法は、辞書に非線形性を誘導しながら、スパース係数を高次元空間に投影する。 RVFL-netを用いて分類するために、分類行列は非線形スパース係数をラベルにマッピングする変換として学習される。画像分類と再構成の応用における手法の性能は他の非線形辞書学習法と同等である。実験により、RVFLDLは拡張性があり、他の非線形辞書学習法よりも優れた解を提供することが示された。 This paper presents a novel nonlinear dictionary learning algorithm leveraging the theory of a feed-forward neural network called Random Vector Functional Link (RVFL). The proposed RVFL-based nonlinear Dictionary Learning (RVFLDL) learns a dictionary as a sparse-to-dense feature map from nonlinear sparse coefficients to the dense input features. Kernel-based nonlinear dictionary learning methods operate in a feature space obtained by an implicit feature map, and they are not independent of computationally expensive operations like Singular Value Decomposition (SVD). Training the RVFL-based dictionary is free from SVD computation as RVFL generates weights from the input to the output layer analytically. A sparsity-inducing horseshoe prior is assumed on the coefficients to generate a sparse coefficient matrix with respect to an initial random dictionary. Higher-order dependencies between the input sparse coefficients and the dictionary atoms are incorporated into the training process by nonlinearly transforming the sparse coefficients and adding them as enhanced features. Thus the method projects sparse coefficients to a higher dimensional space while inducing nonlinearities into the dictionary. 
For classification using RVFL-net, a classifier matrix is learned as a transform that maps nonlinear sparse coefficients to the labels. The performance of the method illustrated in image classification and reconstruction applications is comparable to that of other nonlinear dictionary learning methods. Experiments show that RVFLDL is scalable and provides a solution better than those obtained using other nonlinear dictionary learning methods.	翻訳日:2024-02-07 15:44:25 公開日:2024-02-06
# 大規模言語モデルを用いた雇用市場領域におけるスキル抽出の再考 Rethinking Skill Extraction in the Job Market Domain using Large Language Models ( http://arxiv.org/abs/2402.03832v1 ) ライセンス: Link先を確認	Khanh Cao Nguyen, Mike Zhang, Syrielle Montariol, Antoine Bosselut	(参考訳) スキル抽出は、仕事の投稿や履歴書などの文書で言及されているスキルと資格を識別する。このタスクは、BIOタグを用いたシーケンスラベリングアプローチを使用して教師付きモデルをトレーニングすることで、一般的に取り組まれる。しかし、手動でアノテートしたデータへの依存は、そのようなアプローチの一般化可能性を制限する。さらに、共通のバイオ設定は、複雑なスキルパターンを捉えてあいまいな言及を処理できるモデルの能力を制限する。本稿では,6つの統一スキル抽出データセットのベンチマークを用いて,これらの課題を克服するためのインコンテキスト学習の利用について検討する。提案手法は,大規模言語モデル(LLM)の少数ショット学習機能を活用し,文からスキルを抽出する。 LLMは従来の教師付きモデルと性能的に同等ではないにもかかわらず、構文的に複雑なスキル記述をスキル抽出タスクでよりうまく扱えることを示す。 Skill Extraction involves identifying skills and qualifications mentioned in documents such as job postings and resumes. The task is commonly tackled by training supervised models using a sequence labeling approach with BIO tags. However, the reliance on manually annotated data limits the generalizability of such approaches. Moreover, the common BIO setting limits the ability of the models to capture complex skill patterns and handle ambiguous mentions. In this paper, we explore the use of in-context learning to overcome these challenges, on a benchmark of 6 uniformized skill extraction datasets. Our approach leverages the few-shot learning capabilities of large language models (LLMs) to identify and extract skills from sentences. We show that LLMs, despite not being on par with traditional supervised models in terms of performance, can better handle syntactically complex skill mentions in skill extraction tasks.	翻訳日:2024-02-07 15:44:03 公開日:2024-02-06
# OASim:自律運転のためのニューラルレンダリングに基づくオープンで適応的なシミュレータ OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving ( http://arxiv.org/abs/2402.03830v1 ) ライセンス: Link先を確認	Guohang Yan, Jiahao Pi, Jianfei Guo, Zhaotong Luo, Min Dou, Nianchen Deng, Qiusheng Huang, Daocheng Fu, Licheng Wen, Pinlong Cai, Xing Gao, Xinyu Cai, Bo Zhang, Xuemeng Yang, Yeqi Bai, Hongbin Zhou, Botian Shi	(参考訳) ディープラーニングとコンピュータビジョンの技術開発により、自動運転は交通安全と効率を改善する新しいソリューションを提供する。高品質なデータセットを構築することの重要性は、特に近年のエンドツーエンドの自動運転アルゴリズムの台頭とともに、自明である。データはアルゴリズムのクローズドループシステムにおいて中心的な役割を果たす。しかし、現実世界のデータ収集は高価で、時間がかかり、安全ではない。暗黙的レンダリング技術の開発と、生成モデルを用いた大規模データ生成に関する詳細な研究により、オープンかつ適応的なシミュレータであり、暗黙的ニューラルレンダリングに基づく自律運転データ生成装置であるOASimを提案する。 1) 神経暗黙的表面再構成技術による高品質なシーン再構成。 2)自走車及び参加車両の軌道編集。 (3) シーンに自由に選択して挿入できるリッチカーモデルライブラリ。 (4) 特定のセンサを選択してデータを生成するリッチセンサーモデルライブラリ。 (5)高度にカスタマイズ可能なデータ生成システムは,ユーザのニーズに応じてデータを生成することができる。カルラシミュレータ上での認識性能評価と実世界のデータ取得により,生成データの品質と忠実さを実証する。コードはhttps://github.com/PJLab-ADG/OASimで入手できる。 With deep learning and computer vision technology development, autonomous driving provides new solutions to improve traffic safety and efficiency. The importance of building high-quality datasets is self-evident, especially with the rise of end-to-end autonomous driving algorithms in recent years. Data plays a core role in the algorithm closed-loop system. However, collecting real-world data is expensive, time-consuming, and unsafe. With the development of implicit rendering technology and in-depth research on using generative models to produce data at scale, we propose OASim, an open and adaptive simulator and autonomous driving data generator based on implicit neural rendering. It has the following characteristics: (1) High-quality scene reconstruction through neural implicit surface reconstruction technology. (2) Trajectory editing of the ego vehicle and participating vehicles. (3) Rich vehicle model library that can be freely selected and inserted into the scene. 
(4) Rich sensor model library from which specified sensors can be selected to generate data. (5) A highly customizable data generation system that generates data according to user needs. We demonstrate the high quality and fidelity of the generated data through perception performance evaluation on the Carla simulator and real-world data acquisition. Code is available at https://github.com/PJLab-ADG/OASim.	翻訳日:2024-02-07 15:43:48 公開日:2024-02-06
# 神経最適輸送による分布の重心推定 Estimating Barycenters of Distributions with Neural Optimal Transport ( http://arxiv.org/abs/2402.03828v1 ) ライセンス: Link先を確認	Alexander Kolesov, Petr Mokrov, Igor Udovichenko, Milena Gazdieva, Gudmund Pammer, Evgeny Burnaev, Alexander Korotin	(参考訳) 確率測定の集合を考えると、実践者は基準分布を適切に集約する"平均"分布を見つける必要がある。そのような平均の理論的に魅力的な概念はワッサーシュタイン・バリーセンターであり、これは我々の研究の主焦点である。最適輸送(ot)の双対定式化を基盤として,ワッサースタイン・バリセンター問題を解くための新しいスケーラブルな手法を提案する。近年のNeural OTソルバをベースとして,二段階の対数学習目標を持ち,一般的なコスト関数に有効である。バリセンタタスクを利用する典型的な逆アルゴリズムは三段階最適化を利用しており、主に二次コストに重点を置いている。また,提案手法の理論的誤差境界を定め,その適用性および実例的シナリオと画像データ設定に対する有効性を示す。 Given a collection of probability measures, a practitioner sometimes needs to find an "average" distribution which adequately aggregates reference distributions. A theoretically appealing notion of such an average is the Wasserstein barycenter, which is the primal focus of our work. By building upon the dual formulation of Optimal Transport (OT), we propose a new scalable approach for solving the Wasserstein barycenter problem. Our methodology is based on the recent Neural OT solver: it has bi-level adversarial learning objective and works for general cost functions. These are key advantages of our method, since the typical adversarial algorithms leveraging barycenter tasks utilize tri-level optimization and focus mostly on quadratic cost. We also establish theoretical error bounds for our proposed approach and showcase its applicability and effectiveness on illustrative scenarios and image data setups.	翻訳日:2024-02-07 15:43:32 公開日:2024-02-06
# インボディードAIへの呼びかけ A call for embodied AI ( http://arxiv.org/abs/2402.03824v1 ) ライセンス: Link先を確認	Giuseppe Paolo, Jonas Gonzalez-Billandon, Balázs Kégl	(参考訳) 我々は、人工知能の追求における次の基本的なステップとして、Embodied AIを提案する。我々は、哲学、心理学、神経科学、ロボティクスといった様々な分野にまたがるエンボディメントの概念の進化を横切り、EAIが静的学習の古典的パラダイムとどのように区別するかを強調する。具体化aiの範囲を広げることにより,認知的アーキテクチャに基づく理論的枠組みを導入し,具体化エージェントの本質的構成要素として知覚,行動,記憶,学習を強調する。このフレームワークはFristonのアクティブな推論原則と一致しており、EAI開発に対する包括的なアプローチを提供する。 AIの分野での進歩にもかかわらず、新しいAI学習理論の定式化や高度なハードウェアの革新といった大きな課題が続いている。私たちの議論は、将来のEmbodied AI研究の基礎となるガイドラインを概説している。現実の環境における人間や他の知的なエンティティとのシームレスなコミュニケーション、コラボレーション、共存が可能なエンボダイドAIエージェントを作成することの重要性を強調し、我々はAIコミュニティを多面的な課題に対処し、AGIの探求に先立つ機会をつかむことを目指しています。 We propose Embodied AI as the next fundamental step in the pursuit of Artificial General Intelligence, juxtaposing it against current AI advancements, particularly Large Language Models. We traverse the evolution of the embodiment concept across diverse fields - philosophy, psychology, neuroscience, and robotics - to highlight how EAI distinguishes itself from the classical paradigm of static learning. By broadening the scope of Embodied AI, we introduce a theoretical framework based on cognitive architectures, emphasizing perception, action, memory, and learning as essential components of an embodied agent. This framework is aligned with Friston's active inference principle, offering a comprehensive approach to EAI development. Despite the progress made in the field of AI, substantial challenges, such as the formulation of a novel AI learning theory and the innovation of advanced hardware, persist. Our discussion lays down a foundational guideline for future Embodied AI research. 
Highlighting the importance of creating Embodied AI agents capable of seamless communication, collaboration, and coexistence with humans and other intelligent entities within real-world environments, we aim to steer the AI community towards addressing the multifaceted challenges and seizing the opportunities that lie ahead in the quest for AGI.	翻訳日:2024-02-07 15:43:17 公開日:2024-02-06
# RevOrder: 言語モデルにおける算術的強化のための新しい方法 RevOrder: A Novel Method for Enhanced Arithmetic in Language Models ( http://arxiv.org/abs/2402.03822v1 ) ライセンス: Link先を確認	Si Shen, Peijun Shen, Danhao Zhu	(参考訳) 本稿では,加算,減算,およびn桁×1桁(nD by 1D)の乗算タスクにおいて出力桁を逆順にすることで,大言語モデル(LLM)における算術演算の改善を目的とした新しい手法であるRevOrderを提案する。本手法は,方程式の複雑性を評価するために我々が導入した新しい指標であるシーケンシャル中間桁数(CSID)を $\mathcal{O}(1)$ にまで大幅に削減する。総合的なテストを通じて、RevOrderは基本的な算術演算において完全な精度を達成するだけでなく、除算タスク、特に従来のモデルが苦戦する大きな数を扱う場合におけるLLM性能を大幅に向上させる。 RevOrderの実装は、トレーニングと推論フェーズの両方に費用対効果がある。さらに、GSM8Kの数学タスク上でLLaMA2-7Bモデルを微調整するためにRevOrderを適用すると、方程式計算誤差が46%減少し、総合スコアが41.6から44.4に増加した。 This paper presents RevOrder, a novel technique aimed at improving arithmetic operations in large language models (LLMs) by reversing the output digits in addition, subtraction, and n-digit by 1-digit (nD by 1D) multiplication tasks. Our method significantly reduces the Count of Sequential Intermediate Digits (CSID) to $\mathcal{O}(1)$, a new metric we introduce to assess equation complexity. Through comprehensive testing, RevOrder not only achieves perfect accuracy in basic arithmetic operations but also substantially boosts LLM performance in division tasks, particularly with large numbers where traditional models struggle. Implementation of RevOrder is cost-effective for both training and inference phases. Moreover, applying RevOrder to fine-tune the LLaMA2-7B model on the GSM8K math task results in a considerable improvement, reducing equation calculation errors by 46% and increasing overall scores from 41.6 to 44.4.	翻訳日:2024-02-07 15:42:53 公開日:2024-02-06
# SMOTEの理論的および実験的研究:再バランス戦略の限界と比較 Theoretical and experimental study of SMOTE: limitations and comparisons of rebalancing strategies ( http://arxiv.org/abs/2402.03819v1 ) ライセンス: Link先を確認	Abdoulaye Sakho (LPSM), Erwan Scornet (LPSM), Emmanuel Malherbe	(参考訳) SMOTE(Synthetic Minority Oversampling Technique)は、不均衡なデータセットを扱うための一般的なリバランス戦略である。漸近的に、SMOTE(デフォルトパラメータを持つ)が元のマイノリティサンプルをコピーするだけで元の分布を再生することを示す。また,SMOTE密度はマイノリティ分布の支持境界付近で消失し,従って共通なBorderLine SMOTE戦略を正当化する。次に、2つの新しいSMOTE関連戦略を導入し、それらを最先端のリバランシング手順と比較する。データセットが高度に不均衡な場合にのみ、再バランス戦略が必要であることを示す。このようなデータセットに対して、SMOTE、提案、またはアンサンプ手順が最良の戦略である。 Synthetic Minority Oversampling Technique (SMOTE) is a common rebalancing strategy for handling imbalanced data sets. Asymptotically, we prove that SMOTE (with default parameter) regenerates the original distribution by simply copying the original minority samples. We also prove that SMOTE density vanishes near the boundary of the support of the minority distribution, therefore justifying the common BorderLine SMOTE strategy. Then we introduce two new SMOTE-related strategies, and compare them with state-of-the-art rebalancing procedures. We show that rebalancing strategies are only required when the data set is highly imbalanced. For such data sets, SMOTE, our proposals, or undersampling procedures are the best strategies.	翻訳日:2024-02-07 15:42:33 公開日:2024-02-06
# 単層グラフ畳み込みネットワークの漸近一般化誤差 Asymptotic generalization error of a single-layer graph convolutional network ( http://arxiv.org/abs/2402.03818v1 ) ライセンス: Link先を確認	O. Duranthon, L. Zdeborová	(参考訳) グラフ畳み込みネットワークは大きな実用的期待を示しているが、標本数関数としての一般化特性の理論的理解は、教師付き完全連結ニューラルネットワークのより広く研究されているケースと比較してまだ初期段階にある。本稿では,一層グラフ畳み込みネットワーク(GCN)の性能を,属性付き確率ブロックモデル(SBM)が高次元限界で生成したデータに基づいて予測する。従来,SBM(文脈的SBM)のリッジ回帰のみを考慮し,CSBMの任意の凸損失と正則化に一般化し,他のデータモデルであるニューラルプライアSBMに解析を加えてきた。また,高信号対雑音比の限界について検討し,GCNの収束率を詳細に検討し,一貫性はあるものの,いずれの場合においてもベイズ最適値に達しないことを示す。 While graph convolutional networks show great practical promises, the theoretical understanding of their generalization properties as a function of the number of samples is still in its infancy compared to the more broadly studied case of supervised fully connected neural networks. In this article, we predict the performances of a single-layer graph convolutional network (GCN) trained on data produced by attributed stochastic block models (SBMs) in the high-dimensional limit. Previously, only ridge regression on contextual-SBM (CSBM) has been considered in Shi et al. 2022; we generalize the analysis to arbitrary convex loss and regularization for the CSBM and add the analysis for another data model, the neural-prior SBM. We also study the high signal-to-noise ratio limit, detail the convergence rates of the GCN and show that, while consistent, it does not reach the Bayes-optimal rate for any of the considered cases.	翻訳日:2024-02-07 15:42:23 公開日:2024-02-06
# 投票に基づく合意モデル圧縮によるネットワーク内フェデレーション学習の迅速化 Expediting In-Network Federated Learning by Voting-Based Consensus Model Compression ( http://arxiv.org/abs/2402.03815v1 ) ライセンス: Link先を確認	Xiaoxin Su, Yipeng Zhou, Laizhong Cui and Song Guo	(参考訳) 近年,データプライバシの保護能力により,連合学習(FL)が勢いを増している。 FLによるモデルトレーニングを行うために、複数のクライアントがパラメータサーバとインターネットを介してモデル更新を交換する。通信速度を高速化するため,パラメータサーバの代わりにプログラマブルスイッチ(PS)を配置してクライアントのコーディネートを行う方法が検討されている。 PSをFLにデプロイする際の課題はメモリスペースの不足にあり、PS上でメモリ消費集約アルゴリズムの実行を禁止している。この課題を解決するために,クライアント投票とモデル集約という2つのフェーズからなるFediAC(Federated Learning Aggregation with Compression)アルゴリズムを提案する。前フェーズでは、クライアントがPSに重要なモデル更新指標を報告し、世界的な重要なモデル更新を見積もる。後者のフェーズでは、クライアントは集約のためにグローバルに重要なモデルの更新をPSにアップロードする。 FediACは、クライアント間のコンセンサス圧縮を保証するため、既存の作業よりもメモリスペースと通信トラフィックをはるかに少なく消費する。 PSは、モデル更新インデックスを第2フェーズで迅速に完全なアグリゲーションに調整する。最後に,fediacがモデル精度と通信トラフィックの面で最先端のベースラインを著しく上回っていることを示すために,公開データセットを用いて広範な実験を行った。 Recently, federated learning (FL) has gained momentum because of its capability in preserving data privacy. To conduct model training by FL, multiple clients exchange model updates with a parameter server via Internet. To accelerate the communication speed, it has been explored to deploy a programmable switch (PS) in lieu of the parameter server to coordinate clients. The challenge to deploy the PS in FL lies in its scarce memory space, prohibiting running memory consuming aggregation algorithms on the PS. To overcome this challenge, we propose Federated Learning in-network Aggregation with Compression (FediAC) algorithm, consisting of two phases: client voting and model aggregating. In the former phase, clients report their significant model update indices to the PS to estimate global significant model updates. In the latter phase, clients upload global significant model updates to the PS for aggregation. FediAC consumes much less memory space and communication traffic than existing works because the first phase can guarantee consensus compression across clients. 
The PS easily aligns model update indices to swiftly complete aggregation in the second phase. Finally, we conduct extensive experiments by using public datasets to demonstrate that FediAC remarkably surpasses the state-of-the-art baselines in terms of model accuracy and communication traffic.	翻訳日:2024-02-07 15:42:08 公開日:2024-02-06
# 非離散帯域を持つマスクグラフオートエンコーダ Masked Graph Autoencoder with Non-discrete Bandwidths ( http://arxiv.org/abs/2402.03814v1 ) ライセンス: Link先を確認	Ziwen Zhao, Yuhua Li, Yixiong Zou, Jiliang Tang, Ruixuan Li	(参考訳) マスケードグラフオートエンコーダは、まだ十分に研究されていない強力なグラフ自己教師学習手法として登場した。本稿では,グラフニューラルネットワークにおけるメッセージ伝達の観点から,既存の離散エッジマスキングとバイナリリンク再構成戦略が位相的に有意な表現を学習するには不十分であることを示す。これらの制限には、メッセージフローのブロッキング、過度なスムースネスに対する脆弱性、最適近傍識別性が含まれる。これらの理解に触発されて、離散ベルヌーイ分布の代わりに連続分布と分散確率分布からサンプリングされる非離散エッジマスクを探索する。これらのマスクは、各エッジに対して「バンド幅」と呼ばれる出力メッセージの量を制限する。本稿では,帯域幅マスキングとレイヤワイド帯域幅予測を用いた新しい,情報的かつ効果的なトポロジマスマスキンググラフ自動符号化手法を提案する。理論的にも経験的にも強力なグラフトポロジ学習能力を示す。提案するフレームワークは,自己教師付きリンク予測(離散エッジ再構成器を最大20%改善する)と,構造学習プリテキストのみを用いた多数のデータセットのノード分類の両方において,代表的なベースラインを上回っている。私たちの実装はhttps://github.com/newiz430/bandanaで利用可能です。 Masked graph autoencoders have emerged as a powerful graph self-supervised learning method that has yet to be fully explored. In this paper, we unveil that the existing discrete edge masking and binary link reconstruction strategies are insufficient to learn topologically informative representations, from the perspective of message propagation on graph neural networks. These limitations include blocking message flows, vulnerability to over-smoothness, and suboptimal neighborhood discriminability. Inspired by these understandings, we explore non-discrete edge masks, which are sampled from a continuous and dispersive probability distribution instead of the discrete Bernoulli distribution. These masks restrict the amount of output messages for each edge, referred to as "bandwidths". We propose a novel, informative, and effective topological masked graph autoencoder using bandwidth masking and a layer-wise bandwidth prediction objective. We demonstrate its powerful graph topological learning ability both theoretically and empirically. 
Our proposed framework outperforms representative baselines in both self-supervised link prediction (improving the discrete edge reconstructors by at most 20%) and node classification on numerous datasets, solely with a structure-learning pretext. Our implementation is available at https://github.com/Newiz430/Bandana.	翻訳日:2024-02-07 15:41:47 公開日:2024-02-06
# NKハイブリッド遺伝的アルゴリズムによるクラスタリング NK Hybrid Genetic Algorithm for Clustering ( http://arxiv.org/abs/2402.03813v1 ) ライセンス: Link先を確認	Renato Tinós, Liang Zhao, Francisco Chicano, Darrell Whitley	(参考訳) 本稿では,クラスタリングのためのNKハイブリッド遺伝的アルゴリズムを提案する。解を評価するために、ハイブリッドアルゴリズムはNKクラスタリング検証基準2 (NKCV2) を用いる。 NKCV2は、$N$個の小さなオブジェクトグループの配置に関する情報を使用する。各グループはデータセットの$K+1$個のオブジェクトで構成されている。実験結果から、小さな固定値の$K$を用いたNKCV2によって密度ベース領域を同定できることが示された。 NKCV2では、決定変数の関係が知られており、グレーボックス最適化を適用することができる。突然変異演算子,分割クロスオーバー,局所探索戦略が提案され,すべて決定変数間の関係に関する情報を用いている。分割クロスオーバーでは、評価関数は$q$個の独立コンポーネントに分解され、分割クロスオーバーは計算複雑性$O(N)$で、$2^q$個の可能な子孫の中から最良のものを決定論的に返す。 NKハイブリッド遺伝的アルゴリズムは任意の形状のクラスターの検出とクラスタ数の自動推定を可能にする。実験では、NKハイブリッド遺伝的アルゴリズムは、他の遺伝的アルゴリズムや最先端クラスタリングアルゴリズムと比較して非常に良い結果を得た。 The NK hybrid genetic algorithm for clustering is proposed in this paper. In order to evaluate the solutions, the hybrid algorithm uses the NK clustering validation criterion 2 (NKCV2). NKCV2 uses information about the disposition of $N$ small groups of objects. Each group is composed of $K+1$ objects of the dataset. Experimental results show that density-based regions can be identified by using NKCV2 with fixed small $K$. In NKCV2, the relationship between decision variables is known, which in turn allows us to apply gray box optimization. Mutation operators, a partition crossover, and a local search strategy are proposed, all using information about the relationship between decision variables. In partition crossover, the evaluation function is decomposed into $q$ independent components; partition crossover then deterministically returns the best among $2^q$ possible offspring with computational complexity $O(N)$. The NK hybrid genetic algorithm allows the detection of clusters with arbitrary shapes and the automatic estimation of the number of clusters. In the experiments, the NK hybrid genetic algorithm produced very good results when compared to another genetic algorithm approach and to state-of-the-art clustering algorithms.	翻訳日:2024-02-07 15:41:25 公開日:2024-02-06
# 高次元ガウス過程モデリングのための加法性と活性部分空間の組み合わせ Combining additivity and active subspaces for high-dimensional Gaussian process modeling ( http://arxiv.org/abs/2402.03809v1 ) ライセンス: Link先を確認	Mickael Binois (ACUMES), Victor Picheny	(参考訳) ガウス過程は、予測精度、分析的トラクタビリティ、不確実性定量化のための内蔵能力のために、回帰と分類のための広く受け入れられた手法である。しかし、変数の数が増えるたびに次元の呪いに悩まされる。この課題は一般に、問題に追加の構造を仮定することで対処され、望ましい選択肢は加法性または低固有次元である。高次元ガウス過程モデリングへの我々の貢献は、これらをマルチフィデリティ戦略と組み合わせ、合成関数やデータセットの実験を通じて利点を示すことである。 Gaussian processes are a widely embraced technique for regression and classification due to their good prediction accuracy, analytical tractability and built-in capabilities for uncertainty quantification. However, they suffer from the curse of dimensionality whenever the number of variables increases. This challenge is generally addressed by assuming additional structure in the problem, the preferred options being either additivity or low intrinsic dimensionality. Our contribution for high-dimensional Gaussian process modeling is to combine them with a multi-fidelity strategy, showcasing the advantages through experiments on synthetic functions and datasets.	翻訳日:2024-02-07 15:41:06 公開日:2024-02-06
# SDEMG : スコアに基づく表面筋電図信号の拡散モデル SDEMG: Score-based Diffusion Model for Surface Electromyographic Signal Denoising ( http://arxiv.org/abs/2402.03808v1 ) ライセンス: Link先を確認	Yu-Tung Liu, Kuan-Chen Wang, Kai-Chun Liu, Sheng-Yu Peng, Yu Tsao	(参考訳) 表面筋電図(sEMG)記録は、監視される筋肉が心臓に近いときに心電図(ECG)信号に影響される。既存のいくつかの手法では、ハイパスフィルタやテンプレートサブトラクションなどの信号処理に基づくアプローチが採用されており、ノイズの多いsEMG(ECG干渉付きsEMG)からクリーンなsEMG信号を復元する写像関数を導出する手法もある。近年,ノイズの多い入力データを用いた高品質で正確なサンプルを生成するために,スコアベース拡散モデルが導入された。本研究では,sEMG信号デノイジングのためのスコアベース拡散モデルとして,SDEMGと呼ばれる新しい手法を提案する。提案手法を評価するために,MIT-BIH Normal Sinus Rhythm DatabaseからのECG信号と,オープンアクセス可能なNon-Invasive Adaptive Prostheticsデータベースのデータを用いて,sEMG信号のノイズを低減する実験を行った。その結果,SDEMGは比較法より優れ,高品質なsEMG試料が得られた。 SDEMGのソースコードは、https://github.com/tonyliu0910/SDEMGで入手できる。 Surface electromyography (sEMG) recordings can be influenced by electrocardiogram (ECG) signals when the muscle being monitored is close to the heart. Several existing methods use signal-processing-based approaches, such as high-pass filter and template subtraction, while some derive mapping functions to restore clean sEMG signals from noisy sEMG (sEMG with ECG interference). Recently, the score-based diffusion model, a renowned generative model, has been introduced to generate high-quality and accurate samples with noisy input data. In this study, we propose a novel approach, termed SDEMG, as a score-based diffusion model for sEMG signal denoising. To evaluate the proposed SDEMG approach, we conduct experiments to reduce noise in sEMG signals, employing data from an openly accessible source, the Non-Invasive Adaptive Prosthetics database, along with ECG signals from the MIT-BIH Normal Sinus Rhythm Database. The experimental results indicate that SDEMG outperformed comparative methods and produced high-quality sEMG samples. The source code of the SDEMG framework is available at: https://github.com/tonyliu0910/SDEMG	翻訳日:2024-02-07 15:40:55 公開日:2024-02-06
# SEABO:オフライン模倣学習のための簡易検索手法 SEABO: A Simple Search-Based Method for Offline Imitation Learning ( http://arxiv.org/abs/2402.03807v1 ) ライセンス: Link先を確認	Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu	(参考訳) オフライン強化学習(RL)は、静的なオフラインデータセットから学習する能力と、環境とのインタラクションの必要性の排除によって、多くの注目を集めている。それでも、オフラインRLの成功は、報酬ラベルを付したオフライン遷移に大きく依存している。実際には、しばしば報酬関数を手作りする必要があるが、それは時に困難、労働集約的、あるいは非効率である。この課題に取り組むために,我々はオフライン模倣学習(IL)設定に着目し,エキスパートデータとラベルなしデータに基づいて報酬関数を得ることを目標とした。そこで本研究では,シンプルかつ効果的な検索ベースのオフラインIL手法であるSEABOを提案する。 SEABOは、エキスパートのデモンストレーションにおける最近傍に近い遷移に対してより大きな報酬を割り当て、そうでなければより小さな報酬を割り当てる。これらはすべて教師なし学習の方法で行われる。様々なD4RLデータセットに対する実験結果から、SEABOは1つのエキスパート軌道のみが与えられた場合でも、真の報酬を用いたオフラインRLアルゴリズムに匹敵する性能を達成することができ、多くのタスクにおいて従来の報酬学習やオフラインIL手法よりも優れることが示された。また,エキスパートのデモンストレーションが観測のみを含む場合にも,SEABOは有効であることを示す。私たちのコードはhttps://github.com/dmksjfl/SEABOで公開されています。 Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets and to eliminate the need to interact with the environment. Nevertheless, the success of offline RL relies heavily on the offline transitions annotated with reward labels. In practice, we often need to hand-craft the reward function, which is sometimes difficult, labor-intensive, or inefficient. To tackle this challenge, we set our focus on the offline imitation learning (IL) setting, and aim at getting a reward function based on the expert data and unlabeled data. To that end, we propose a simple yet effective search-based offline IL method, tagged SEABO. SEABO allocates a larger reward to the transition that is close to its closest neighbor in the expert demonstration, and a smaller reward otherwise, all in an unsupervised learning manner. Experimental results on a variety of D4RL datasets indicate that SEABO can achieve performance competitive with offline RL algorithms with ground-truth rewards, given only a single expert trajectory, and can outperform prior reward learning and offline IL methods across many tasks. 
Moreover, we demonstrate that SEABO also works well if the expert demonstrations contain only observations. Our code is publicly available at https://github.com/dmksjfl/SEABO.	翻訳日:2024-02-07 15:40:33 公開日:2024-02-06
# 信用決定のための説明可能な自動機械学習:金融工学におけるヒューマン人工知能コラボレーションの強化 Explainable Automated Machine Learning for Credit Decisions: Enhancing Human Artificial Intelligence Collaboration in Financial Engineering ( http://arxiv.org/abs/2402.03806v1 ) ライセンス: Link先を確認	Marc Schmitt	(参考訳) 本稿では,金融工学領域における説明可能な自動機械学習(AutoML)の統合について,特に信用決定への応用に焦点を当てて考察する。金融における人工知能(AI)の急速な進化は、洗練されたアルゴリズムによる意思決定と、これらのシステムの透明性の必要性のバランスを必要とする。 AutoMLがクレジットスコアリングのための堅牢な機械学習モデルの開発を合理化する一方で、説明可能なAI(XAI)手法、特にSHapley Additive exPlanations(SHAP)は、モデルの意思決定プロセスに関する洞察を提供する。この研究は、AutoMLとXAIの組み合わせが信用決定の効率性と正確性を高めるだけでなく、人間とAIシステムの信頼と協力を促進することを実証している。この調査結果は、AI主導の金融決定の透明性と説明責任を改善し、規制要件と倫理的考慮に沿うものとして、説明可能なAutoMLの可能性を強調している。 This paper explores the integration of Explainable Automated Machine Learning (AutoML) in the realm of financial engineering, specifically focusing on its application in credit decision-making. The rapid evolution of Artificial Intelligence (AI) in finance has necessitated a balance between sophisticated algorithmic decision-making and the need for transparency in these systems. The focus is on how AutoML can streamline the development of robust machine learning models for credit scoring, while Explainable AI (XAI) methods, particularly SHapley Additive exPlanations (SHAP), provide insights into the models' decision-making processes. This study demonstrates how the combination of AutoML and XAI not only enhances the efficiency and accuracy of credit decisions but also fosters trust and collaboration between humans and AI systems. The findings underscore the potential of explainable AutoML in improving the transparency and accountability of AI-driven financial decisions, aligning with regulatory requirements and ethical considerations.	翻訳日:2024-02-07 15:40:08 公開日:2024-02-06
# DistiLLM:大規模言語モデルの合理化された蒸留に向けて DistiLLM: Towards Streamlined Distillation for Large Language Models ( http://arxiv.org/abs/2402.03898v1 ) ライセンス: Link先を確認	Jongwoo Ko, Sungnyun Kim, Tianyi Chen, Se-Young Yun	(参考訳) 知識蒸留(KD)は、教師モデルをより小さな生徒モデルに圧縮するために広く用いられ、モデル能力を維持しながら推論コストとメモリフットプリントを低減する。しかし、現在の自己回帰シーケンスモデル(例えば、大規模言語モデル)のKD法は、標準化された目的関数を欠いている。さらに、トレーニングと推論のミスマッチに対処するために近年用いられている生徒生成出力は、計算コストを著しく高めている。これらの問題に対処するために、自己回帰言語モデルのためのより効果的かつ効率的なKDフレームワークであるDistiLLMを紹介する。 DistiLLMは,(1)理論的性質を明らかにして活用する新しいスキューKullback-Leibler分散損失,(2)生徒生成出力の利用効率向上を目的とした適応型オフポリシーアプローチの2つのコンポーネントから構成される。命令追従タスクを含む大規模な実験は、最近のKD法と比較して最大4.3$\times$のスピードアップを達成しつつ、高性能な生徒モデルを構築する上でDistiLLMの有効性を示す。 Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller student model, reducing its inference cost and memory footprint while preserving model capabilities. However, current KD methods for auto-regressive sequence models (e.g., large language models) lack a standardized objective function. Moreover, the recent use of student-generated outputs to address training-inference mismatches has significantly escalated computational costs. To tackle these issues, we introduce DistiLLM, a more effective and efficient KD framework for auto-regressive language models. DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, where we unveil and leverage its theoretical properties, and (2) an adaptive off-policy approach designed to enhance the efficiency in utilizing student-generated outputs. Extensive experiments, including instruction-following tasks, demonstrate the effectiveness of DistiLLM in building high-performing student models while achieving up to 4.3$\times$ speedup compared to recent KD methods.	翻訳日:2024-02-07 15:34:52 公開日:2024-02-06
# 直線と円を超えて:大規模言語モデルにおける幾何学的推論ギャップを明らかにする Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models ( http://arxiv.org/abs/2402.03877v1 ) ライセンス: Link先を確認	Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski	(参考訳) 大規模言語モデル(LLM)は、数学的およびアルゴリズム的なタスクにおいて、絶え間なく増加する能力を示すが、その幾何学的推論スキルは十分に研究されていない。本研究では,人間の数学的推論の発展における最も基本的なステップの1つである,作図による幾何学的問題解決におけるLLMの能力について検討する。我々の研究は、同様の分野での多くの成功にもかかわらず、最先端のLLMがこの領域で直面する顕著な課題を明らかにする。 LLMは対象の変数選択に偏りを示し、2次元空間的関係に苦慮し、しばしば物体とその配置を誤って表現し幻覚する。そこで本研究では,内部対話を行うことで既存の推論能力を高める,LLMベースのマルチエージェントシステムを定式化した枠組みを提案する。この研究は、幾何学的推論におけるLLMの現在の限界を強調し、自己補正、協調、多様な役割専門化を通じて幾何学的推論能力を改善する。 Large Language Models (LLMs) demonstrate ever-increasing abilities in mathematical and algorithmic tasks, yet their geometric reasoning skills are underexplored. We investigate LLMs' abilities in constructive geometric problem-solving, one of the most fundamental steps in the development of human mathematical reasoning. Our work reveals notable challenges that the state-of-the-art LLMs face in this domain despite many successes in similar areas. LLMs exhibit biases in target variable selection and struggle with 2D spatial relationships, often misrepresenting and hallucinating objects and their placements. To this end, we introduce a framework that formulates an LLM-based multi-agent system that enhances their existing reasoning potential by conducting an internal dialogue. This work underscores LLMs' current limitations in geometric reasoning and improves geometric reasoning capabilities through self-correction, collaboration, and diverse role specializations.	翻訳日:2024-02-07 15:34:34 公開日:2024-02-06
# BQP$^A$プロトコルと潜在グラフ分類器の幾何学量子機械学習 Geometric quantum machine learning of BQP$^A$ protocols and latent graph classifiers ( http://arxiv.org/abs/2402.03871v1 ) ライセンス: Link先を確認	Chukwudubem Umeano, Vincent E. Elfving, Oleksandr Kyriienko	(参考訳) 幾何学量子機械学習(GQML)は、効率的な解法プロトコルを学習するための問題対称性を埋め込むことを目的としている。しかし、(G)QMLが古典的なアナログから指数関数的に分離したプロトコルの構築に日常的に使用できるかどうかについては疑問が残る。このレターでは、ブール関数の学習特性に関するサイモンの問題を考察し、これは教師なし回路分類問題と関係があることを示す。幾何QMLのワークフローを用いて、Simonのアルゴリズムから学習し、いくつかのデータセット(oracle $A$)に関してBQP$^A\neq$BPPプロトコルの例を発見する。我々の重要な発見は、特定されたビットフリップおよび置換対称性に関するtwirlingに基づくブール関数埋め込みのための同変特徴マップの開発と、サンプリングの利点を持つ不変可観測性に基づく測定である。提案したワークフローは、変動回路を自明なアイデンティティ演算子として保ちながら、データ埋め込みと古典的な後処理の重要性を指摘する。次に,関数学習の直観を発展させ,向き付けられた計算ハイパーグラフとしてインスタンスを視覚化し,GQMLプロトコルがグローバルなトポロジ的特徴にアクセスして,単射関数と全射関数を区別する。最後に、他のbqp$^a$-タイプのプロトコルを学習する可能性について議論し、これはユニタリの線形結合として適用される埋め込みベースのoracles $a$を単純化する能力に依存すると推測する。 Geometric quantum machine learning (GQML) aims to embed problem symmetries for learning efficient solving protocols. However, the question remains if (G)QML can be routinely used for constructing protocols with an exponential separation from classical analogs. In this Letter we consider Simon's problem for learning properties of Boolean functions, and show that this can be related to an unsupervised circuit classification problem. Using the workflow of geometric QML, we learn from first principles Simon's algorithm, thus discovering an example of BQP$^A\neq$BPP protocol with respect to some dataset (oracle $A$). Our key findings include the development of an equivariant feature map for embedding Boolean functions, based on twirling with respect to identified bitflip and permutational symmetries, and measurement based on invariant observables with a sampling advantage. The proposed workflow points to the importance of data embeddings and classical post-processing, while keeping the variational circuit as a trivial identity operator. 
Next, developing the intuition for the function learning, we visualize instances as directed computational hypergraphs, and observe that the GQML protocol can access their global topological features for distinguishing bijective and surjective functions. Finally, we discuss the prospects for learning other BQP$^A$-type protocols, and conjecture that this depends on the ability of simplifying embeddings-based oracles $A$ applied as a linear combination of unitaries.	翻訳日:2024-02-07 15:34:17 公開日:2024-02-06
# ドイツ語の報道テキストでは、ジェンダー包括的言語の影響を受ける単語は1%未満である Less than one percent of words would be affected by gender-inclusive language in German press texts ( http://arxiv.org/abs/2402.03870v1 ) ライセンス: Link先を確認	Carolin Müller-Spitzer, Samira Ochs, Alexander Koplenig, Jan-Oliver Rüdiger, Sascha Wolfer	(参考訳) ジェンダーと言語に関する研究は、男女平等と非差別的な言語使用に関する社会的議論と密接に結びついている。心理言語学者はこの分野で大きな貢献をした。しかし、これらの事項を言語使用の文脈で研究するコーパスベースの研究はいまだに稀である。本研究は,ジェンダー非包括的テキストをジェンダー包括的テキストに書き換える場合,実際にどの程度のテキストを変更すべきかという問題に対処する。この量的尺度は重要な経験的洞察である。というのも、ジェンダー包括的なドイツ語の使用に対して繰り返される反論は、文章が長く複雑になりすぎるというものだからである。また、ジェンダー包括的言語は言語学習者に悪影響を及ぼすとも主張されている。しかし、このような効果は、ジェンダー包括的テキストがそうでないテキストと大きく異なる場合にのみ起こりうる。コーパス言語学的研究では、ドイツ語の報道テキストに手動で注釈を付け、変更すべき部分を特定した。その結果、平均して全てのトークンの1%未満がジェンダー包括的言語の影響を受けることがわかった。この小さな割合は、特に男性総称形を解釈する際の潜在的な複雑さを考慮すると、ジェンダー包括的なドイツ語が言語の理解と学習にとって大きな障壁となるかどうかに疑問を投げかける。 Research on gender and language is tightly knitted to social debates on gender equality and non-discriminatory language use. Psycholinguistic scholars have made significant contributions in this field. However, corpus-based studies that investigate these matters within the context of language use are still rare. In our study, we address the question of how much textual material would actually have to be changed if non-gender-inclusive texts were rewritten to be gender-inclusive. This quantitative measure is an important empirical insight, as a recurring argument against the use of gender-inclusive German is that it supposedly makes written texts too long and complicated. It is also argued that gender-inclusive language has negative effects on language learners. However, such effects are only likely if gender-inclusive texts are very different from those that are not gender-inclusive. In our corpus-linguistic study, we manually annotated German press texts to identify the parts that would have to be changed. Our results show that, on average, less than 1% of all tokens would be affected by gender-inclusive language. 
This small proportion calls into question whether gender-inclusive German presents a substantial barrier to understanding and learning the language, particularly when we take into account the potential complexities of interpreting masculine generics.	翻訳日:2024-02-07 15:33:48 公開日:2024-02-06
# 物理インフォームドニューラルネットワークにおける非線形レジームの課題 The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks ( http://arxiv.org/abs/2402.03864v1 ) ライセンス: Link先を確認	Andrea Bonfanti, Giuseppe Bruno, Cristina Cipriani	(参考訳) ニューラル・タンジェント・カーネル(NTK)の視点は、無限幅限界における物理情報ニューラルネットワーク(PINN)のトレーニング力学を調べるための貴重なアプローチである。我々はこの観点を活用し、PINNによって解決された非線形偏微分方程式(PDE)の事例に焦点を当てる。微分作用素の線型性に依存するNTKの異なる挙動に関する理論的結果を提供する。さらに,理論的な結果に触発されて,PINNの訓練に二階法を用いるという利点を強調した。さらに, 2次法の収束能力を考察し, スペクトルバイアスと緩やかな収束の課題に対処する。各理論結果は線形PDEと非線形PDEの数値例によって支持され、ベンチマークテストケースでのトレーニング方法を検証する。 The Neural Tangent Kernel (NTK) viewpoint represents a valuable approach to examine the training dynamics of Physics-Informed Neural Networks (PINNs) in the infinite width limit. We leverage this perspective and focus on the case of nonlinear Partial Differential Equations (PDEs) solved by PINNs. We provide theoretical results on the different behaviors of the NTK depending on the linearity of the differential operator. Moreover, inspired by our theoretical results, we emphasize the advantage of employing second-order methods for training PINNs. Additionally, we explore the convergence capabilities of second-order methods and address the challenges of spectral bias and slow convergence. Every theoretical result is supported by numerical examples with both linear and nonlinear PDEs, and we validate our training method on benchmark test cases.	翻訳日:2024-02-07 15:33:25 公開日:2024-02-06
# ハイブリッドコヒーレント状態における高次非古典性 Higher-order nonclassicalities in hybrid coherent states ( http://arxiv.org/abs/2402.03858v1 ) ライセンス: Link先を確認	Sandip Kumar Giri and Biswajit Sen	(参考訳) 量子技術の発展において、非古典的状態は、非古典性の適切な利用なしには量子的優位性を得ることができないため、重要な役割を担っている。本研究では,単光子付加コヒーレント状態 (SPAC) とコヒーレント状態 (CS) のコヒーレント重ね合わせであるハイブリッドコヒーレント状態 (HCS) を考える。本稿では,高次スクイージングと高次アンチバンチングに着目したHCSの高次非古典的特性について報告する。 hcsは実験的に実現可能であり、この工学化された量子状態は、所望の高次非古典的性質を持つ量子状態を生成するのに使うことができる。 In the development of quantum technologies, nonclassical states have been playing a pivotal role, as quantum advantage cannot be obtained without appropriate utilization of nonclassicality. In the present work, we consider a hybrid coherent state (HCS), which is a coherent superposition of the single-photon-added coherent (SPAC) state and a coherent state (CS). Here, we report higher-order nonclassical properties of HCS with a specific focus on higher-order squeezing and higher-order antibunching. It's shown that HCS is experimentally realizable, and this engineered quantum state can be used to produce quantum states with desired higher-order nonclassical properties.	翻訳日:2024-02-07 15:33:11 公開日:2024-02-06
# ポジションペーパー:モデル表現研究の新しい枠組みに向けて Position Paper: Toward New Frameworks for Studying Model Representations ( http://arxiv.org/abs/2402.03855v1 ) ライセンス: Link先を確認	Satvik Golechha, James Dao	(参考訳) mechanistic interpretability (mi)は、ニューラルネットワークが学習する正確なアルゴリズムをリバースエンジニアリングすることで、aiモデルを理解することを目的としている。 MIにおけるほとんどの研究は、自明でトークンに整合した振る舞いと能力を研究しています。しかし、ほとんどの能力はそれほど自明ではなく、分析の単位としてこれらのネットワーク内の隠れた表現の研究を提唱している。文献レビューを行い、特徴と行動の表現を形式化し、その重要性と評価を強調し、表現の機械的解釈可能性に関する基礎的な調査を行う。議論と探索の結果から,表現研究は重要かつ未研究の分野であり,現在MIで確立されている手法では表現の理解が不十分である,という立場を正当化し,表現研究の新たな枠組みに向けて研究コミュニティを推し進める。 Mechanistic interpretability (MI) aims to understand AI models by reverse-engineering the exact algorithms neural networks learn. Most works in MI so far have studied behaviors and capabilities that are trivial and token-aligned. However, most capabilities are not that trivial, which advocates for the study of hidden representations inside these networks as the unit of analysis. We do a literature review, formalize representations for features and behaviors, highlight their importance and evaluation, and perform some basic exploration in the mechanistic interpretability of representations. With discussion and exploratory results, we justify our position that studying representations is an important and under-studied field, and that currently established methods in MI are not sufficient to understand representations, thus pushing for the research community to work toward new frameworks for studying representations.	翻訳日:2024-02-07 15:32:56 公開日:2024-02-06
# ANLS* -- 生成型大規模言語モデルのためのユニバーサルドキュメント処理メトリック ANLS* -- A Universal Document Processing Metric for Generative Large Language Models ( http://arxiv.org/abs/2402.03848v1 ) ライセンス: Link先を確認	David Peer, Philemon Schöpf, Volckmar Nebendahl, Alexander Rietzler, Sebastian Stabinger	(参考訳) 伝統的に、識別モデルが文書分類や情報抽出といったタスクの主要な選択肢となっている。これらのモデルは、限定された定義済みのクラスに該当する予測を行い、バイナリ真または偽の評価を容易にし、F1スコアのようなメトリクスの直接計算を可能にする。しかし、生成型大規模言語モデル(GLLM)の最近の進歩により、ゼロショット能力が強化され、ダウンストリームデータセットと計算コストの高い微調整の必要性がなくなるため、この分野はシフトした。しかし、GLLM の評価は、識別モデルに使用される二項真偽の評価が、GLLM の予測には適用できないため、課題となる。本稿では,情報抽出や分類タスクを含む幅広いタスクを評価するために,ANLS*と呼ばれる生成モデルのための新しいメトリクスを提案する。 ANLS*メトリックは、既存のANLSメトリクスをドロップイン置換として拡張し、以前報告されたANLSスコアと互換性がある。また、ANLS*メトリックを用いて、7つの異なるデータセットと3つの異なるGLLMの評価を行い、提案手法の重要性を示した。また、SFTと呼ばれる文書のプロンプトを生成する新しい手法を、LATINなどの他のプロンプト技術に対してベンチマークする。 21件中15件で、SFTは他のテクニックより優れており、最先端の技術を改善している。ソースはhttps://github.com/deepopinion/anls_star_metricにある。 Traditionally, discriminative models have been the predominant choice for tasks like document classification and information extraction. These models make predictions that fall into a limited number of predefined classes, facilitating a binary true or false evaluation and enabling the direct calculation of metrics such as the F1 score. However, recent advancements in generative large language models (GLLMs) have prompted a shift in the field due to their enhanced zero-shot capabilities, which eliminate the need for a downstream dataset and computationally expensive fine-tuning. Yet, evaluating GLLMs presents a challenge as the binary true or false evaluation used for discriminative models is not applicable to the predictions made by GLLMs. This paper introduces a new metric for generative models called ANLS* for evaluating a wide variety of tasks, including information extraction and classification tasks. The ANLS* metric extends existing ANLS metrics as a drop-in replacement and is still compatible with previously reported ANLS scores. 
An evaluation of 7 different datasets and 3 different GLLMs using the ANLS* metric is also provided, demonstrating the importance of the proposed metric. We also benchmark a novel approach to generate prompts for documents, called SFT, against other prompting techniques such as LATIN. In 15 out of 21 cases, SFT outperforms other techniques and improves the state-of-the-art, sometimes by as much as $15$ percentage points. Sources are available at https://github.com/deepopinion/anls_star_metric	翻訳日:2024-02-07 15:32:41 公開日:2024-02-06
# 量子サポートベクトルマシンを用いた非溶血ペプチドの分類 Non-Hemolytic Peptide Classification Using A Quantum Support Vector Machine ( http://arxiv.org/abs/2402.03847v1 ) ライセンス: Link先を確認	Shengxin Zhuang, John Tanner, Yusen Wu, Du Q. Huynh, Wei Liu, Xavier F. Cadet, Nicolas Fontaine, Philippe Charton, Cedric Damour, Frederic Cadet, Jingbo Wang	(参考訳) 量子機械学習(QML)は、量子計算の最も有望な応用の1つである。しかし、データが古典的な性質を持つ場合に量子優位性が存在するかどうかはまだ不明であり、QMLの実用的な実世界の応用の探索は依然として続いている。本研究では,よく研究された強力なQMLモデルである量子サポートベクトルマシン(QSVM)を,ペプチドを溶血性または非溶血性のいずれかに分類する二値分類タスクに適用する。 3つのペプチドデータセットを用いて、QSVM、多くの古典的なSVM、および同一のペプチド分類タスクに関する既発表の最良の結果の性能を比較したところ、QSVMが最も優れていた。本研究の貢献は、(i)この特定のペプチド分類タスクへのQSVMの最初の適用、(ii)この分類タスクにおいて古典的機械学習モデルで得られた既発表の最良の結果をQSVMが上回ることの明示的な実証、(iii)この分類タスクにおいてQSVMが多くの(おそらくすべての)古典的SVMを上回りうることを示す実証的な結果である。この基礎研究は、計算生物学の分野で検証可能な量子優位性への道を開き、より安全な治療薬開発を促進する。 Quantum machine learning (QML) is one of the most promising applications of quantum computation. However, it is still unclear whether quantum advantages exist when the data is of a classical nature and the search for practical, real-world applications of QML remains active. In this work, we apply the well-studied quantum support vector machine (QSVM), a powerful QML model, to a binary classification task which classifies peptides as either hemolytic or non-hemolytic. Using three peptide datasets, we apply and contrast the performance of the QSVM, numerous classical SVMs, and the best published results on the same peptide classification task, out of which the QSVM performs best. The contributions of this work include (i) the first application of the QSVM to this specific peptide classification task, (ii) an explicit demonstration of QSVMs outperforming the best published results attained with classical machine learning models on this classification task, and (iii) empirical results showing that the QSVM is capable of outperforming many (and possibly all) classical SVMs on this classification task. 
This foundational work paves the way to verifiable quantum advantages in the field of computational biology and facilitates safer therapeutic development.	翻訳日:2024-02-07 15:32:15 公開日:2024-02-06
# 外乱検出のための隠れ外乱発生効率の向上 Efficient Generation of Hidden Outliers for Improved Outlier Detection ( http://arxiv.org/abs/2402.03846v1 ) ライセンス: Link先を確認	Jose Cribeiro-Ramallo, Vadim Arzamasov, Klemens B\"ohm	(参考訳) 外乱生成は重要な外乱検出タスクを解くのによく使われる手法である。現実的な振る舞いで外れ値を生成するのは困難です。一般的な既存の手法は、高次元空間における外れ値の'多重ビュー'特性を無視しやすい。この性質を考慮に入れている唯一の方法は、効率性と有効性に欠ける。本稿では,その特性を模倣した現実的な外れ値を生成する新しい外れ値生成手法であるBISECTを提案する。そのために、BISECTは、これらの現実的な外れ値を効率的に生成する方法を述べる新しい提案をこの記事に導入している。我々の手法は'複数ビュー'を再現する現在の手法よりも保証と複雑さが優れている。本研究では,bisectが生成する合成異常値を用いて,多種多様なデータセットにおける異常検出を効果的に強化する。例えば、BISECTとのオーバーサンプリングでは、ベースラインと比較してエラーを最大3倍削減した。 Outlier generation is a popular technique used for solving important outlier detection tasks. Generating outliers with realistic behavior is challenging. Popular existing methods tend to disregard the 'multiple views' property of outliers in high-dimensional spaces. The only existing method accounting for this property falls short in efficiency and effectiveness. We propose BISECT, a new outlier generation method that creates realistic outliers mimicking said property. To do so, BISECT employs a novel proposition introduced in this article stating how to efficiently generate said realistic outliers. Our method has better guarantees and complexity than the current methodology for recreating 'multiple views'. We use the synthetic outliers generated by BISECT to effectively enhance outlier detection in diverse datasets, for multiple use cases. For instance, oversampling with BISECT reduced the error by up to 3 times when compared with the baselines.	翻訳日:2024-02-07 15:31:42 公開日:2024-02-06
# 拡散モデルにおけるゲージ自由度、保守性および固有次元推定について On gauge freedom, conservativity and intrinsic dimensionality estimation in diffusion models ( http://arxiv.org/abs/2402.03845v1 ) ライセンス: Link先を確認	Christian Horvat and Jean-Pascal Pfister	(参考訳) 拡散モデルは、最近、高次元でのサンプリング品質と密度推定の観点から印象的な性能を示す生成モデルである。それらは、順方向の連続拡散過程と、時間依存ベクトル場によって記述され生成モデルとして使用される逆方向の連続ノイズ除去過程に依存している。拡散モデルのオリジナルの定式化において、このベクトル場はスコア関数(つまり、拡散過程における所定の時間における対数確率の勾配)であると仮定される。興味深いことに、現実的には、拡散モデルに関するほとんどの研究は、このベクトル場をニューラルネットワーク関数として実装し、あるエネルギー関数の勾配として制約しない(つまり、ほとんどの研究はベクトル場を保守的であるように制限しない)。このような制約がパフォーマンス向上につながるかどうかを実証的に調査する研究もあるが、矛盾する結果につながり、解析的な結果を提供できていない。本稿では,ベクトル場のモデリング自由度に関する3つの解析結果を示す。まず、保守的成分と、与えられた(ゲージ)自由度を満たす直交成分へのベクトル場の新たな分解を提案する。第二に, この直交分解により, 保守的成分が真のスコアと正確に等しい場合に, 正確な密度推定と正確なサンプリングが達成され, したがって保守性は必要でも十分でもないことを示した。最後に、データ多様体の局所的な情報を推測する際には、ベクトル場が保守的であるように制約することが望ましいことを示す。 Diffusion models are generative models that have recently demonstrated impressive performances in terms of sampling quality and density estimation in high dimensions. They rely on a forward continuous diffusion process and a backward continuous denoising process, which can be described by a time-dependent vector field and is used as a generative model. In the original formulation of the diffusion model, this vector field is assumed to be the score function (i.e. it is the gradient of the log-probability at a given time in the diffusion process). Curiously, on the practical side, most studies on diffusion models implement this vector field as a neural network function and do not constrain it to be the gradient of some energy function (that is, most studies do not constrain the vector field to be conservative). Even though some studies investigated empirically whether such a constraint will lead to a performance gain, they lead to contradicting results and failed to provide analytical results. Here, we provide three analytical results regarding the extent of the modeling freedom of this vector field. 
Firstly, we propose a novel decomposition of vector fields into a conservative component and an orthogonal component which satisfies a given (gauge) freedom. Secondly, from this orthogonal decomposition, we show that exact density estimation and exact sampling are achieved when the conservative component is exactly equal to the true score, and therefore conservativity is neither necessary nor sufficient to obtain exact density estimation and exact sampling. Finally, we show that when it comes to inferring local information of the data manifold, constraining the vector field to be conservative is desirable.	翻訳日:2024-02-07 15:31:20 公開日:2024-02-06
# 光学鋼ロープの非破壊損傷検出法 A new method for optical steel rope non-destructive damage detection ( http://arxiv.org/abs/2402.03843v1 ) ライセンス: Link先を確認	Yunqing Bao, Bin Hu	(参考訳) 本稿では,高高度環境(エアラルロープウェイ)における鋼ロープの非破壊損傷検出アルゴリズムを提案する。まず、rgbd-unetと呼ばれるセグメンテーションモデルは、複雑な背景から正確に鋼ロープを抽出するように設計されている。このモデルは、提案したCMAモジュールを通して色と深度情報を処理・結合する機能を備えている。第2に、VovNetV3.5と呼ばれる検出モデルは、通常の鋼ロープと異常鋼ロープを区別するために開発された。 VovNetアーキテクチャとDBBモジュールを統合してパフォーマンスを向上させる。また,セグメンテーションモデルの一般化能力を高めるために,新たなバックグラウンド拡張手法を提案する。セグメンテーションと検出モデルのトレーニングとテストのために、異なるシナリオで鋼ロープの画像を含むデータセットが作成されます。実験はベースラインモデルよりも大幅に改善された。提案するデータセットでは,検出モデルによる最大精度は0.975に達し,セグメンテーションモデルによる最大f測定値は0.948に達した。 This paper presents a novel algorithm for non-destructive damage detection for steel ropes in high-altitude environments (aerial ropeway). The algorithm comprises two key components: First, a segmentation model named RGBD-UNet is designed to accurately extract steel ropes from complex backgrounds. This model is equipped with the capability to process and combine color and depth information through the proposed CMA module. Second, a detection model named VovNetV3.5 is developed to differentiate between normal and abnormal steel ropes. It integrates the VovNet architecture with a DBB module to enhance performance. Besides, a novel background augmentation method is proposed to enhance the generalization ability of the segmentation model. Datasets containing images of steel ropes in different scenarios are created for the training and testing of both the segmentation and detection models. Experiments demonstrate a significant improvement over baseline models. On the proposed dataset, the highest accuracy achieved by the detection model reached 0.975, and the maximum F-measure achieved by the segmentation model reached 0.948.	翻訳日:2024-02-07 15:30:34 公開日:2024-02-06
# 信念のシーングラフ:期待の計算による部分的なシーンの拡張 Belief Scene Graphs: Expanding Partial Scenes with Objects through Computation of Expectation ( http://arxiv.org/abs/2402.03840v1 ) ライセンス: Link先を確認	Mario A.V. Saucedo, Akash Patel, Akshit Saradagi, Christoforos Kanellakis and George Nikolakopoulos	(参考訳) 本稿では,部分的な情報を用いた効率的な高レベルタスク計画を可能にする,部分的な3次元シーングラフのユーティリティ駆動拡張であるBelief Scene Graphsの概念を提案する。ロボットのミッションに適した新しいノード(ブラインドノードと呼ばれる)を戦略的に追加するために使用される、任意の3Dシーングラフ上の信念(期待とも呼ばれる)の計算のためのグラフベースの学習手法を提案する。本研究では,利用可能なトレーニングデータからヒストグラムを学習することにより,現実の信念/期待を合理的に近似する,相関情報に基づく期待値の計算法(CECI)を提案する。 3次元シーングラフのレポジトリからCECIを学ぶために,新しいグラフ畳み込みニューラルネットワーク(GCN)モデルを開発した。新たなCECIモデルのトレーニングには3Dシーングラフのデータベースが存在しないため,意味的注釈付き実生活3D空間に基づく3Dシーングラフデータセットを生成するための新しい手法を提案する。生成されたデータセットを用いて提案したCECIモデルをトレーニングし,提案手法の広範な検証を行う。我々は、期待を抽象表現に統合するためのコアコンポーネントとして、新しい概念であるBelief Scene Graphs(BSG)を確立した。この新しいコンセプトは、古典的な3Dシーングラフの概念の進化であり、さまざまなロボティクスミッションのタスク計画と最適化のための高度な推論を可能にすることを目的としている。全体のフレームワークの有効性は、オブジェクト検索のシナリオで評価され、見えない物体に関する人間の常識をエミュレートする実生活実験でもテストされている。 In this article, we propose the novel concept of Belief Scene Graphs, which are utility-driven extensions of partial 3D scene graphs, that enable efficient high-level task planning with partial information. We propose a graph-based learning methodology for the computation of belief (also referred to as expectation) on any given 3D scene graph, which is then used to strategically add new nodes (referred to as blind nodes) that are relevant for a robotic mission. We propose the method of Computation of Expectation based on Correlation Information (CECI), to reasonably approximate real Belief/Expectation, by learning histograms from available training data. A novel Graph Convolutional Neural Network (GCN) model is developed, to learn CECI from a repository of 3D scene graphs. As no database of 3D scene graphs exists for the training of the novel CECI model, we present a novel methodology for generating a 3D scene graph dataset based on semantically annotated real-life 3D spaces. 
The generated dataset is then utilized to train the proposed CECI model and for extensive validation of the proposed method. We establish the novel concept of Belief Scene Graphs (BSG) as a core component to integrate expectations into abstract representations. This new concept is an evolution of the classical 3D scene graph concept and aims to enable high-level reasoning for the task planning and optimization of a variety of robotics missions. The efficacy of the overall framework has been evaluated in an object search scenario, and has also been tested on a real-life experiment to emulate human common sense of unseen objects.	翻訳日:2024-02-07 15:29:44 公開日:2024-02-06
# ランダム特徴モデル--ナイーブな補完の成功を研究する方法 Random features models: a way to study the success of naive imputation ( http://arxiv.org/abs/2402.03839v1 ) ライセンス: Link先を確認	Alexis Ayme (LPSM (UMR_8001)), Claire Boyer (LPSM (UMR_8001), IUF), Aymeric Dieuleveut (CMAP), Erwan Scornet (LPSM (UMR_8001))	(参考訳) 定数による(ナイーブな)補完は、欠損データに対処するための最初の使いやすい手法であるため、実際にはまだ広く使われている。しかし、この単純な手法は、補完された入力が真の基礎データと大きく異なる可能性があるため、予測目的に対して大きなバイアスを引き起こすことが予想される。しかし、最近の研究では、データが完全にランダムに欠損している(MCAR)と仮定される場合、このバイアスは高次元線形予測器の文脈では小さいことが示唆されている。本稿では、バイアスが無視できるという直観を確認し、驚くべきことにナイーブな補完が非常に低い次元でも有効であることを示すことで、線形予測器に関する全体像を完成させる。この目的のために、観測される特徴の次元が変化する中で予測性能を研究するための厳密な枠組みを提供する、単一の基礎的なランダム特徴モデルを考える。これらの理論的結果に基づき、ゼロ補完されたデータに適用される確率的勾配(SGD)予測器に対する有限サンプル境界を確立する。これは大規模学習に特に適した戦略である。MCARの仮定が強すぎると思われる場合には、より複雑な欠損データシナリオでも同様の好ましい挙動が生じることを示す。 Constant (naive) imputation is still widely used in practice as this is a first easy-to-use technique to deal with missing data. Yet, this simple method could be expected to induce a large bias for prediction purposes, as the imputed input may strongly differ from the true underlying data. However, recent works suggest that this bias is low in the context of high-dimensional linear predictors when data is supposed to be missing completely at random (MCAR). 
This paper completes the picture for linear predictors by confirming the intuition that the bias is negligible and that surprisingly naive imputation also remains relevant in very low dimension. To this aim, we consider a unique underlying random features model, which offers a rigorous framework for studying predictive performances, whilst the dimension of the observed features varies. Building on these theoretical results, we establish finite-sample bounds on stochastic gradient (SGD) predictors applied to zero-imputed data, a strategy particularly well suited for large-scale learning. If the MCAR assumption appears to be strong, we show that similar favorable behaviors occur for more complex missing data scenarios.	翻訳日:2024-02-07 15:28:51 公開日:2024-02-06
# Sliced Wasserstein Weisfeiler-Lehmanグラフカーネルによるガウス過程の回帰 Gaussian process regression with Sliced Wasserstein Weisfeiler-Lehman graph kernels ( http://arxiv.org/abs/2402.03838v1 ) ライセンス: Link先を確認	Raphaël Carpintero Perez (CMAP), Sébastien da Veiga (ENSAI, CREST), Josselin Garnier (CMAP), Brian Staber	(参考訳) 教師付き学習は、偏微分方程式の解法や材料特性の予測といったタスクの複雑なパターンを効果的に抽出する能力によって、計算物理学の分野で大きな注目を集めている。伝統的に、このようなデータセットは、問題の幾何形状を表す多数のノードを持つメッシュ(グラフと見なされる)として与えられる入力と、数値解法で得られる対応する出力からなる。つまり、教師付き学習モデルは、連続値のノード属性を持つ大規模でスパースなグラフを処理できなければならない。本研究ではガウス過程の回帰に着目し,Sliced Wasserstein Weisfeiler-Lehman(SWWL)グラフカーネルを紹介する。既存のグラフカーネルとは対照的に、提案するSWWLカーネルは正定値性と劇的な計算複雑性の低減を備えており、これまで処理できなかったデータセットを処理できる。新しいカーネルは、入力グラフが数十のノードを持つ分子データセットのグラフ分類で最初に検証される。 SWWLカーネルの効率は、入力グラフが数万のノードからなる計算流体力学や固体力学におけるグラフ回帰で示される。 Supervised learning has recently garnered significant attention in the field of computational physics due to its ability to effectively extract complex patterns for tasks like solving partial differential equations, or predicting material properties. Traditionally, such datasets consist of inputs given as meshes with a large number of nodes representing the problem geometry (seen as graphs), and corresponding outputs obtained with a numerical solver. This means the supervised learning model must be able to handle large and sparse graphs with continuous node attributes. In this work, we focus on Gaussian process regression, for which we introduce the Sliced Wasserstein Weisfeiler-Lehman (SWWL) graph kernel. In contrast to existing graph kernels, the proposed SWWL kernel enjoys positive definiteness and a drastic complexity reduction, which makes it possible to process datasets that were previously impossible to handle. The new kernel is first validated on graph classification for molecular datasets, where the input graphs have a few tens of nodes. 
The efficiency of the SWWL kernel is then illustrated on graph regression in computational fluid dynamics and solid mechanics, where the input graphs are made up of tens of thousands of nodes.	翻訳日:2024-02-07 15:28:34 公開日:2024-02-06
# 強いレーザー照射下での単一イオンを通した熱輸送 Thermal transport through a single trapped ion under strong laser illumination ( http://arxiv.org/abs/2402.03937v1 ) ライセンス: Link先を確認	T. Tassis, F. Brito, F. L. Semião	(参考訳) 本研究では,レーザー励起によって駆動され,異なる温度で作動する熱貯水池と結合した単一閉じ込めイオン中での量子熱輸送の研究を行う。私たちの焦点は、異なるレーザーカップリングシナリオがシステムのダイナミクスに与える影響を理解することです。レーザー強度がイオンの電子的および運動的自由度が強く結合する状態に達すると、熱貯水池に対する現象論的モデルを用いる従来のアプローチは不十分になる。そのため、装束マスター方程式(DME)の定式化が重要となり、レーザー強度が熱輸送にどのように影響するかをより深く理解できるようになる。脱調および結合強度によって定義されるパラメータ空間内の熱電流を解析し、イオンの振動周波数とレーザーパラメータの影響を受け、熱輸送、残留コヒーレンス、システム特性の微妙な関係を明らかにする。また, 負の差熱伝導率や熱電流流の非対称性などの現象も明らかにし, この本質的量子技術の設定の熱特性について考察した。 In this work, we study quantum heat transport in a single trapped ion, driven by laser excitation and coupled to thermal reservoirs operating at different temperatures. Our focus lies in understanding how different laser coupling scenarios impact the system dynamics. As the laser intensity reaches a regime where the ion's electronic and motional degrees of freedom strongly couple, traditional approaches using phenomenological models for thermal reservoirs become inadequate. Therefore, the adoption of the dressed master equation (DME) formalism becomes crucial, enabling a deeper understanding of how distinct laser intensities influence heat transport. Analyzing the heat current within the parameter space defined by detuning and coupling strength, we observe intriguing circular patterns which are influenced by the ion's vibrational frequency and laser parameters, and reveal nuanced relationships between heat transport, residual coherence, and system characteristics. Our study also reveals phenomena such as negative differential heat conductivity and asymmetry in heat current flow, offering insights into the thermal properties of this essential quantum technology setup.	翻訳日:2024-02-07 15:20:58 公開日:2024-02-06
# 大規模言語モデルを拡張現実に組み込む - 包括性、エンゲージメント、プライバシの機会と課題 Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy ( http://arxiv.org/abs/2402.03907v1 ) ライセンス: Link先を確認	Efe Bozkir and Süleyman Özdel and Ka Hei Carrie Lau and Mengdi Wang and Hong Gao and Enkelejda Kasneci	(参考訳) 近年のコンピュータグラフィックス、ハードウェア、人工知能(AI)、人間とコンピュータの相互作用は、拡張現実(XR)デバイスや設定をより広く普及させる可能性がある。これらのデバイスとセットアップは、ユーザに対して、目やハンドトラッカーなど、さまざまな感覚モダリティを持つインタラクティブでエンゲージメント、没入感のあるエクスペリエンスを提供する一方で、多くの非プレイヤーキャラクターは、プリスクリプトされた方法で、あるいは従来のAI技術によって利用される。本稿では,LLMを仮想アバターや物語として組み込み,ユーザプロファイルに応じたプロンプトエンジニアリングや特定目的向けの微調整を行うことで,より包括的な体験を促進するために,XRに大規模言語モデル(LLM)を用いることを論じる。このような包含がXR使用の多様性を促進すると論じている。さらに,LLMの多機能な会話機能により,ユーザはXR環境とより関わりやすくなり,XRを日常的に利用できるようになるだろうと考えている。最後に,ユーザによるLLM環境提供情報とセンサによる生体計測データの組み合わせが,新たなプライバシ侵害につながる可能性があると推測する。このようなプライバシー侵害の可能性を研究する一方で、ユーザのプライバシーに関する懸念や好みについても調査する必要がある。要約すると、いくつかの課題があるにもかかわらず、LLMをXRに組み込むことは、いくつかの機会のある有望で新しい研究領域である。 Recent developments in computer graphics, hardware, artificial intelligence (AI), and human-computer interaction likely lead to extended reality (XR) devices and setups being more pervasive. While these devices and setups provide users with interactive, engaging, and immersive experiences with different sensing modalities, such as eye and hand trackers, many non-player characters are utilized in a pre-scripted way or by conventional AI techniques. In this paper, we argue for using large language models (LLMs) in XR by embedding them in virtual avatars or as narratives to facilitate more inclusive experiences through prompt engineering according to user profiles and fine-tuning the LLMs for particular purposes. We argue that such inclusion will facilitate diversity for XR use. In addition, we believe that with the versatile conversational capabilities of LLMs, users will engage more with XR environments, which might help XR be more used in everyday life. 
Lastly, we speculate that combining the information provided to LLM-powered environments by the users and the biometric data obtained through the sensors might lead to novel privacy invasions. While studying such possible privacy invasions, user privacy concerns and preferences should also be investigated. In summary, despite some challenges, embedding LLMs into XR is a promising and novel research area with several opportunities.	翻訳日:2024-02-07 15:20:41 公開日:2024-02-06
# 機械学習アルゴリズムを用いた従業員ターンオーバー分析 Employee Turnover Analysis Using Machine Learning Algorithms ( http://arxiv.org/abs/2402.03905v1 ) ライセンス: Link先を確認	Mahyar Karimi, Kamyar Seyedkazem Viliyani	(参考訳) 従業員の知識は組織資産である。ターンオーバーは明らかで隠れたコストと不可分な損害を課す可能性がある。このリスクを克服し緩和するには、従業員の状態を監視する必要がある。幸福機能の解析が複雑であるため、従業員の離職予測は機械学習技術に委譲することができる。本稿では,従業員の減少率について論じる。 AdaBoost、SVM、RandomForestの3つの異なる教師付き学習アルゴリズムは、従業員の属性の精度をベンチマークするために使用される。到達したモデルは予測分析を確立するのに役立ちます。 Employee's knowledge is an organization asset. Turnover may impose apparent and hidden costs and irreparable damages. To overcome and mitigate this risk, employee's condition should be monitored. Due to high complexity of analyzing well-being features, employee's turnover predicting can be delegated to machine learning techniques. In this paper, we discuss employee's attrition rate. Three different supervised learning algorithms comprising AdaBoost, SVM and RandomForest are used to benchmark employee attrition accuracy. Attained models can help out at establishing predictive analytics.	翻訳日:2024-02-07 15:20:16 公開日:2024-02-06
# Deep MSFOP: 教師なし形状マッチングのための深部関数写像における多重スペクトルフィルタ演算子保存 Deep MSFOP: Multiple Spectral filter Operators Preservation in Deep Functional Maps for Unsupervised Shape Matching ( http://arxiv.org/abs/2402.03904v1 ) ライセンス: Link先を確認	Feifan Luo, Qingsong Li, Ling Hu, Xinru Liu, Haojun Xu, Haibo Wang, Ting Li, Shengjun Liu	(参考訳) 本稿では,多スペクトルフィルタ演算子保存法 (MSFOP) という新しい制約を提案し,それに基づいて,形状マッチングのためのDeep MSFOPと呼ばれる効率的な深部関数写像アーキテクチャを開発した。基本的な考え方は、一般的なディスクリプタ保存制約を使う代わりに、複数のスペクトルフィルタ演算子を保存するためにマップが必要です。これにより、関数の周波数帯に含まれるより情報的な幾何情報を関数マップ計算に組み込むことができる。これは、ウェーブレット保存やLBO可換性といった以前の技術が、私たちの特別なケースであることを保証することができる。さらに,MSFOP制約を用いた地図の効率的な計算方法も開発しており,特に学習可能なフィルタ演算子を持つ深層学習に便利に組み込むことができる。以上の結果を利用して,機能地図と基本点マップを併用した,適切な教師なし損失を伴って,Deep MSFOPパイプラインを設計した。私たちの深い関数マップは、関数マップがより幾何学的に有益で、適切であることが保証され、計算は数値的に安定であるなど、顕著な利点があります。異なるデータセット上での広範な実験結果から,本手法は既存の最先端手法よりも優れており,特に非等長性や一貫性のないトポロジーデータセットのような困難な設定において優れていることが示された。 We propose a novel constraint called Multiple Spectral filter Operators Preservation (MSFOP) to compute functional maps and based on it, develop an efficient deep functional map architecture called Deep MSFOP for shape matching. The core idea is that, instead of using the general descriptor preservation constraint, we require our maps to preserve multiple spectral filter operators. This allows us to incorporate more informative geometrical information, contained in different frequency bands of functions, into the functional map computing. This can be confirmed by that some previous techniques like wavelet preservation and LBO commutativity are actually our special cases. Moreover, we also develop a very efficient way to compute the maps with MSFOP constraint, which can be conveniently embedded into the deep learning, especially having learnable filter operators. Utilizing the above results, we finally design our Deep MSFOP pipeline, equipped with a suitable unsupervised loss jointly penalizing the functional map and the underlying pointwise map. 
Our deep functional map has notable advantages, including that the functional map is more geometrically informative and guaranteed to be proper, and the computing is numerically stable. Extensive experimental results on different datasets demonstrate that our approach outperforms the existing state-of-the-art methods, especially in challenging settings like non-isometric and inconsistent topology datasets.	翻訳日:2024-02-07 15:20:08 公開日:2024-02-06
# 複合リターンは強化学習におけるばらつきを減らす Compound Returns Reduce Variance in Reinforcement Learning ( http://arxiv.org/abs/2402.03903v1 ) ライセンス: Link先を確認	Brett Daley, Martha White, Marlos C. Machado	(参考訳) $n$-step returnや$\lambda$-returnsといったマルチステップリターンは、強化学習(RL)メソッドのサンプル効率を改善するために一般的に使用される。多段階リターンの分散は、その長さの制限因子となり、あまりにも遠くに目を向けると分散が増加し、多段階学習の利点が逆転する。我々の研究では、分散を減らすために複合戻り値 -- $n$-step の重み付き平均値 -- が示される。与えられた$n$-stepの戻り値と同じ縮約係数を持つ任意の化合物が、厳密に分散を減少させることを初めて証明する。さらに,この分散還元特性が線形関数近似下での時間微分学習の有限サンプル複雑性を改善することを証明した。一般化合物のリターンは実装に費用がかかるため,ミニバッチ経験再生を用いた場合であっても,効率を保ちながら分散を低減できる2ブートストラップリターンを導入する。 2ブートストラップリターンが、計算コストをほとんど増やさずに、$n$-step deep RLエージェントのサンプル効率を向上させることができることを示す実験を行った。 Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods. The variance of the multistep returns becomes the limiting factor in their length; looking too far into the future increases variance and reverses the benefits of multistep learning. In our work, we demonstrate the ability of compound returns -- weighted averages of $n$-step returns -- to reduce variance. We prove for the first time that any compound return with the same contraction modulus as a given $n$-step return has strictly lower variance. We additionally prove that this variance-reduction property improves the finite-sample complexity of temporal-difference learning under linear function approximation. Because general compound returns can be expensive to implement, we introduce two-bootstrap returns which reduce variance while remaining efficient, even when using minibatched experience replay. We conduct experiments showing that two-bootstrap returns can improve the sample efficiency of $n$-step deep RL agents, with little additional computational cost.	翻訳日:2024-02-07 15:19:42 公開日:2024-02-06
# ドット積注意の可解モデルにおける位置学習と意味学習の相転移 A phase transition between positional and semantic learning in a solvable model of dot-product attention ( http://arxiv.org/abs/2402.03902v1 ) ライセンス: Link先を確認	Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová	(参考訳) ドット積注意層が位置注意行列(それぞれの位置に基づいてトークンが互いに結合する)と意味注意行列(その意味に基づいて相互に結合するトークンを含む)をどのように学習するかを検討する。アルゴリズム的なタスクの場合、同じ単純なアーキテクチャが位置的あるいは意味的メカニズムを使ってどのようにソリューションを実装するかを実験的に示します。理論的には,学習可能な結合・低ランク問合せとキー行列を持つ非線形セルフアテンション層の学習について検討する。高次元データの漸近的限界と膨大なトレーニングサンプルについて,非凸経験的損失景観における大域的最小値の閉形式的特徴付けを述べる。この最小限は位置的または意味的なメカニズムのいずれかに対応し、サンプルの複雑さが増大する前者から後者への初期相転移を示す。最後に,dot-product attention層を線形位置ベースラインと比較し,十分なデータにアクセス可能な意味的メカニズムを用いて,後者よりも優れていることを示す。 We investigate how a dot-product attention layer learns a positional attention matrix (with tokens attending to each other based on their respective positions) and a semantic attention matrix (with tokens attending to each other based on their meaning). For an algorithmic task, we experimentally show how the same simple architecture can learn to implement a solution using either the positional or semantic mechanism. On the theoretical side, we study the learning of a non-linear self-attention layer with trainable tied and low-rank query and key matrices. In the asymptotic limit of high-dimensional data and a comparably large number of training samples, we provide a closed-form characterization of the global minimum of the non-convex empirical loss landscape. We show that this minimum corresponds to either a positional or a semantic mechanism and evidence an emergent phase transition from the former to the latter with increasing sample complexity. Finally, we compare the dot-product attention layer to a linear positional baseline, and show that it outperforms the latter using the semantic mechanism provided it has access to sufficient data.	翻訳日:2024-02-07 15:19:21 公開日:2024-02-06
# バッチユニバーサル予測 Batch Universal Prediction ( http://arxiv.org/abs/2402.03901v1 ) ライセンス: Link先を確認	Marco Bondaschi, Michael Gastpar	(参考訳) 大型言語モデル(llm)は最近、人間のような英語文を生成するという驚くべき能力により、大きな人気を得ている。 LLMは基本的に予測子であり、過去の単語列の確率を推定する。したがって、普遍的な予測の観点からその性能を評価することは自然である。これを公平に行うために,古典的平均的後悔の修正としてバッチ後悔の概念を導入し,その漸近的価値について,記憶力のない情報源と1次マルコフ源について検討する。 Large language models (LLMs) have recently gained much popularity due to their surprising ability at generating human-like English sentences. LLMs are essentially predictors, estimating the probability of a sequence of words given the past. Therefore, it is natural to evaluate their performance from a universal prediction perspective. In order to do that fairly, we introduce the notion of batch regret as a modification of the classical average regret, and we study its asymptotical value for add-constant predictors, in the case of memoryless sources and first-order Markov sources.	翻訳日:2024-02-07 15:19:02 公開日:2024-02-06
# Pro-HAN:プロファイルに基づく音声言語理解のための異種グラフ注意ネットワーク Pro-HAN: A Heterogeneous Graph Attention Network for Profile-Based Spoken Language Understanding ( http://arxiv.org/abs/2402.03900v1 ) ライセンス: Link先を確認	Dechuan Teng, Chunlin Lu, Xiao Xu, Wanxiang Che, Libo Qin	(参考訳) 近年、プロファイルベースの音声言語理解(SLU)が注目され、ユーザ発話の曖昧さを解消するために、様々な種類の補足プロファイル情報(知識グラフ、ユーザプロファイル、コンテキスト認識)を組み込むことを目指している。しかし、既存のアプローチは、それらの相互関係を考慮せずに、あるいはそれらの内部で無関係で矛盾する情報を除外することなく、異なるプロファイル情報を別々にモデル化することができる。上記の問題に対処するために,複数のプロファイル情報にまたがる推論を行う異種グラフアテンションネットワーク pro-han を導入する。具体的には、複数のPro間の相互関係を捉えるために、Intra-Pro、Inter-Pro、utterance-Proの3種類のエッジを設計する。 ProSLUデータセットに新たな最先端技術を導入し、3つの指標すべてに対して約8%の改善を実現しました。さらに解析実験により,マルチソースプロファイル情報のモデリングにおける本手法の有効性が検証された。 Recently, Profile-based Spoken Language Understanding (SLU) has gained increasing attention, which aims to incorporate various types of supplementary profile information (i.e., Knowledge Graph, User Profile, Context Awareness) to eliminate the prevalent ambiguities in user utterances. However, existing approaches can only separately model different profile information, without considering their interrelationships or excluding irrelevant and conflicting information within them. To address the above issues, we introduce a Heterogeneous Graph Attention Network to perform reasoning across multiple Profile information, called Pro-HAN. Specifically, we design three types of edges, denoted as intra-Pro, inter-Pro, and utterance-Pro, to capture interrelationships among multiple Pros. We establish a new state-of-the-art on the ProSLU dataset, with an improvement of approximately 8% across all three metrics. Further analysis experiments also confirm the effectiveness of our method in modeling multi-source profile information.	翻訳日:2024-02-07 15:18:53 公開日:2024-02-06
# 視覚的質問応答推論の合理化 Convincing Rationales for Visual Question Answering Reasoning ( http://arxiv.org/abs/2402.03896v1 ) ライセンス: Link先を確認	Kun Li, George Vosselman, Michael Ying Yang	(参考訳) 視覚的質問応答(VQA)は、画像の内容に関する質問に対する回答を予測するという困難なタスクである。テキスト質問と視覚イメージの両方を深く理解する必要がある。先行研究は、予測された回答の精度を単純に計算することで、解答モデルを直接評価する。しかし、このような「ブラックボックス」システムでは、予測の背後にある内的推論は無視され、予測を信用できるかどうかさえわからない。場合によっては、不適切な視覚領域やテキストトークンに注目した場合でも、モデルが正しい答えを得られる場合があるため、モデルの信頼性が低く、非論理的になる。与えられた画像/質問対に対する予測解に加えて視覚的およびテキスト的合理性を生成するために、Convincing Rationales for VQA (CRVQA) を提案する。新しい出力がもたらす追加アノテーションを考えると、CRVQAは既存のVQAデータセットとそれらのビジュアルラベルから変換されたサンプルによって訓練され、評価される。広範な実験により、視覚的およびテキスト的合理性が回答の予測をサポートし、さらに精度を向上させることが示されている。さらに, ゼロショット評価設定において, CRVQAは汎用VQAデータセット上での競合性能を達成する。データセットとソースコードはhttps://github.com/lik1996/CRVQA2024でリリースされる。 Visual Question Answering (VQA) is a challenging task of predicting the answer to a question about the content of an image. It requires deep understanding of both the textual question and visual image. Prior works directly evaluate the answering models by simply calculating the accuracy of the predicted answers. However, the inner reasoning behind the prediction is disregarded in such a "black box" system, and we do not even know if one can trust the predictions. In some cases, the models still get the correct answers even when they focus on irrelevant visual regions or textual tokens, which makes the models unreliable and illogical. To generate both visual and textual rationales next to the predicted answer to the given image/question pair, we propose Convincing Rationales for VQA, CRVQA. Considering the extra annotations brought by the new outputs, CRVQA is trained and evaluated by samples converted from some existing VQA datasets and their visual labels. The extensive experiments demonstrate that the visual and textual rationales support the prediction of the answers, and further improve the accuracy. Furthermore, CRVQA achieves competitive performance on generic VQA datasets in the zero-shot evaluation setting. 
The dataset and source code will be released under https://github.com/lik1996/CRVQA2024.	翻訳日:2024-02-07 15:18:35 公開日:2024-02-06
# 自動運転のための予測水平条件:安全・快適・効率の最適化 Prediction Horizon Requirements for Automated Driving: Optimizing Safety, Comfort, and Efficiency ( http://arxiv.org/abs/2402.03893v1 ) ライセンス: Link先を確認	Manuel Muñoz Sánchez, Chris van der Ploeg, Robin Smit, Jos Elfring, Emilia Silvas, René van de Molengraft	(参考訳) 他の道路利用者の移動を予測することは、自動走行車(AV)の性能を改善する上で有益である。しかし、これらの予測に関連する時間軸とAV性能の関係はいまだ不明である。多くの軌道予測アルゴリズムが存在するにもかかわらず、様々な予測長がAV安全やその他の車両性能指標にどのように影響するかは研究されていない。本研究は, 安全性, 快適性, 効率性に着目し, 異なる予測地平線がAV性能に及ぼす影響を検討することによって, このギャップに対処する。最新のリスクベースの予測軌道プランナを用いて複数の実験を行い、最大20秒間予測をシミュレーションした。シミュレーションに基づいて、特定のAV性能基準とアプリケーションニーズに基づいて、必要最小限かつ最適予測地平線を特定するためのフレームワークを提案する。その結果,横断歩行者との衝突を防ぐために1.6秒の地平線が必要であり,7～8秒の地平線が最適効率を示し,最大15秒までの地平線が乗客の快適性を向上させることがわかった。提案手法は,歩行者を横断するアプリケーションのための一般的なガイドラインとして,11.8秒の予測水平線を目標とする。 Predicting the movement of other road users is beneficial for improving automated vehicle (AV) performance. However, the relationship between the time horizon associated with these predictions and AV performance remains unclear. Despite the existence of numerous trajectory prediction algorithms, no studies have been conducted on how varying prediction lengths affect AV safety and other vehicle performance metrics, resulting in undefined horizon requirements for prediction methods. Our study addresses this gap by examining the effects of different prediction horizons on AV performance, focusing on safety, comfort, and efficiency. Through multiple experiments using a state-of-the-art, risk-based predictive trajectory planner, we simulated predictions with horizons up to 20 seconds. Based on our simulations, we propose a framework for specifying the minimum required and optimal prediction horizons based on specific AV performance criteria and application needs. Our results indicate that a horizon of 1.6 seconds is required to prevent collisions with crossing pedestrians, horizons of 7-8 seconds yield the best efficiency, and horizons up to 15 seconds improve passenger comfort. 
We conclude that prediction horizon requirements are application-dependent, and recommend aiming for a prediction horizon of 11.8 seconds as a general guideline for applications involving crossing pedestrians.	翻訳日:2024-02-07 15:18:13 公開日:2024-02-06
# シリコン上の超伝導回路の損失とデコヒーレンス:電子スピン共鳴からの考察 Loss and decoherence in superconducting circuits on silicon: Insights from electron spin resonance ( http://arxiv.org/abs/2402.03889v1 ) ライセンス: Link先を確認	Aditya Jayaraman, Andrey V. Danilov, Jonas Bylander and Sergey E. Kubatkin	(参考訳) 量子計算や量子センシング用途に用いられる固体デバイスは、突発的で帯電した2レベルシステム(TLS)と常磁性スピンによる損失とノイズに悪影響を及ぼす。これら2つのノイズ源は相互接続され、回路性能への影響が増大する。我々は、窒化ニオブ(NbN)超伝導共振器を用いたオンチップ電子スピン共鳴(ESR)法を用いて、シリコンの表面スピンと後表面処理の効果を研究する。異なるスピン緩和時間で特徴付けられる2つの異なるスピン種を同定し, 種々の表面処理(アニール, フッ化水素酸)に対して選択的に反応する。 2つのスピン種のうちの1つだけが低出力(近傍単光子)励起におけるTLS制限共振器品質因子に大きな影響を与える。表面処理後のスピン密度の3～5倍減少を観測し、ESR分光法が量子系における損失とデコヒーレンスを緩和する戦略を開発する上で有効であることを示す。 Solid-state devices used for quantum computation and quantum sensing applications are adversely affected by loss and noise caused by spurious, charged two-level systems (TLS) and stray paramagnetic spins. These two sources of noise are interconnected, exacerbating the impact on circuit performance. We use an on-chip electron spin resonance (ESR) technique, with niobium nitride (NbN) superconducting resonators, to study surface spins on silicon and the effect of post-fabrication surface treatments. We identify two distinct spin species that are characterized by different spin-relaxation times and respond selectively to various surface treatments (annealing and hydrofluoric acid). Only one of the two spin species has a significant impact on the TLS-limited resonator quality factor at low-power (near single-photon) excitation. We observe a 3-to-5-fold reduction in the total density of spins after surface treatments, and demonstrate the efficacy of ESR spectroscopy in developing strategies to mitigate loss and decoherence in quantum systems.	翻訳日:2024-02-07 15:17:48 公開日:2024-02-06

# DeepTraderX:マルチスレッド市場シミュレーションにおけるDeep Learningによる従来型トレーディング戦略の整合化

DeepTraderX: Challenging Conventional Trading Strategies with Deep Learning in Multi-Threaded Market Simulations ( http://arxiv.org/abs/2403.18831v1 )

ライセンス: Link先を確認

Armand Mihai Cismaru,

(参考訳) 本稿では,DeepTraderX(DTX)について紹介し,その性能をマルチスレッド市場シミュレーションで示す。 DTXは、およそ500の模擬市場デーにおいて、他の戦略が生み出す価格を見ることでのみ学んでいる。これを行うことで、市場データから、入札または注文のどちらかの引用へのマッピングを成功させ、資産の配置を可能にした。歴史あるレベル2市場データ、すなわち特定のトレーダブル資産のリミット・オーダー・ブック(LOB)に基づいて、DTXは市場状態の$S$を各時点の$T$で処理し、市場注文の$P$を決定する。トレーニングとテストの両方で使用される市場データは、実歴史ある株式市場のデータに基づいて、ユニークな市場スケジュールから生成される。 DTXは文献の最良の戦略に対して広範囲に試験され、その結果は統計的分析によって検証された。この結果から,DTXの競合能力は,複雑なマルチスレッドシミュレーションを成功させる上で必要となる,シンプルなモデルの効率を重視した,非クラスな人的トレーダーを含むパブリックドメイントレーダーのパフォーマンスを上回るものが多い。これは、より効率的な金融市場を構築するために、"ブラックボックス"なディープラーニングシステムを活用する可能性を強調します。

In this paper, we introduce DeepTraderX (DTX), a simple Deep Learning-based trader, and present results that demonstrate its performance in a multi-threaded market simulation. In a total of about 500 simulated market days, DTX has learned solely by watching the prices that other strategies produce. By doing this, it has successfully created a mapping from market data to quotes, either bid or ask orders, to place for an asset. Trained on historical Level-2 market data, i.e., the Limit Order Book (LOB) for specific tradable assets, DTX processes the market state $S$ at each timestep $T$ to determine a price $P$ for market orders. The market data used in both training and testing was generated from unique market schedules based on real historic stock market data. DTX was tested extensively against the best strategies in the literature, with its results validated by statistical analysis. Our findings underscore DTX's capability to rival, and in many instances, surpass, the performance of public-domain traders, including those that outclass human traders, emphasising the efficiency of simple models, as this is required to succeed in intricate multi-threaded simulations. This highlights the potential of leveraging "black-box" Deep Learning systems to create more efficient financial markets.

翻訳日:2024-04-01 02:34:48 公開日:2024-02-06

# Linuxカーネル・アウト・オブ・メモリ・キラーのコミットメッセージに対するRationaleデータセットと解析

Rationale Dataset and Analysis for the Commit Messages of the Linux Kernel Out-of-Memory Killer ( http://arxiv.org/abs/2403.18832v1 )

ライセンス: Link先を確認

Mouna Dhaouadi, Bentley James Oakes, Michalis Famelis,

(参考訳) コードコミットメッセージには、開発者がなぜ変更をしたのかに関する有用な情報が含まれている。しかし、実世界のコードコミットメッセージにおける理性の存在と構造はよく研究されていない。ここでは、Linux Kernel Out-Of-Memory Killerコンポーネントのコードコミットメッセージを解析するためのラベル付きデータセットの作成について詳述する。我々は,存在,時間的進化,構造といった合理的情報の側面を研究する。私たちのデータセットのコミットの98.9%は、合理的な情報を持つ文を含み、経験豊富な開発者は、コミットの文の約60%に合理性を報告している。直面した課題について報告し、ラベル付けの例を示す。

Code commit messages can contain useful information on why a developer has made a change. However, the presence and structure of rationale in real-world code commit messages is not well studied. Here, we detail the creation of a labelled dataset to analyze the code commit messages of the Linux Kernel Out-Of-Memory Killer component. We study aspects of rationale information, such as presence, temporal evolution, and structure. We find that 98.9% of commits in our dataset contain sentences with rationale information, and that experienced developers report rationale in about 60% of the sentences in their commits. We report on the challenges we faced and provide examples for our labelling.

翻訳日:2024-04-01 02:34:48 公開日:2024-02-06

# QTFlow: RTL上のセキュリティ対応ハードウェア設計のための定量的タイミング感覚情報フロー

QTFlow: Quantitative Timing-Sensitive Information Flow for Security-Aware Hardware Design on RTL ( http://arxiv.org/abs/2401.17819v2 )

ライセンス: Link先を確認

Lennart M. Reimann, Anshul Prashar, Chiara Ghinami, Rebecca Pelke, Dominik Sisejkovic, Farhad Merchant, Rainer Leupers,

(参考訳) 現代のElectronic Design Automation (EDA) ツールでは、セキュリティはパワー、パフォーマンス、領域最適化の主な目標を後押しすることが多い。一般的に、セキュリティ分析は手動で行われるため、設計上の脆弱性は気づかないままである。セキュリティを意識したEDAツールは,パフォーマンスと領域を念頭に置いて,セキュリティ脅威の識別と削除を支援する。カットエッジ法は、設計構造における意図しない情報漏洩を特定するために、情報フロー解析を用いる。現在の情報漏洩検出方法は、定量的情報フロー分析を用いて漏洩を定量化する。しかし、シーケンシャル回路の扱いは、時間に依存しない性質、タイミングチャネルを見渡すこと、偽陽性を導入することなどにより、最先端技術に課題をもたらす。これを解決するために、設計フェーズ中にハードウェア情報漏洩を定量化する、タイミングに敏感なフレームワークQTFlowを紹介する。 QTFlowはオープンソースベンチマークの有効性を図示し、タイミングチャネルを自律的に識別し、現在の最先端技術と対比した場合に、時間に依存しない分析から生じるすべての偽陽性を低減します。

In contemporary Electronic Design Automation (EDA) tools, security often takes a backseat to the primary goals of power, performance, and area optimization. Commonly, the security analysis is conducted by hand, leading to vulnerabilities in the design remaining unnoticed. Security-aware EDA tools assist the designer in the identification and removal of security threats while keeping performance and area in mind. Cutting-edge methods employ information flow analysis to identify inadvertent information leaks in design structures. Current information leakage detection methods use quantitative information flow analysis to quantify the leaks. However, handling sequential circuits poses challenges for state-of-the-art techniques due to their time-agnostic nature, overlooking timing channels, and introducing false positives. To address this, we introduce QTFlow, a timing-sensitive framework for quantifying hardware information leakages during the design phase. Illustrating its effectiveness on open-source benchmarks, QTFlow autonomously identifies timing channels and diminishes all false positives arising from time-agnostic analysis when contrasted with current state-of-the-art techniques.

翻訳日:2024-03-25 12:08:11 公開日:2024-02-06

# GeoDataのプライバシーリスクに関する調査

Privacy risk in GeoData: A survey ( http://arxiv.org/abs/2402.03612v1 )

ライセンス: Link先を確認

Mahrokh Abdollahi Lorestani, Thilina Ranbaduge, Thierry Rakotoarivelo,

(参考訳) ユビキタスな位置情報サービスの利用により、大規模個人レベルの位置情報は位置情報認識デバイスを通じて広く収集されている。位置情報の公開は、匿名化や機密情報の推測、さらには物理的な脅威につながる可能性があるため、ユーザにとって重大なプライバシーリスクとなる。ジオプライバシーの懸念は、ユーザーアイデンティティの匿名化と位置情報の露出の問題に起因している。本研究では,地理データにおける個人のプライバシーを守るために提案されている異なるジオマスキング手法を分析した。本研究では,これらのテクニックを異なる次元に沿って特徴づける分類法を提案し,ジオマスキング技術の調査を行う。次に、現在の技術の欠点を強調し、今後の研究の道筋について論じる。

With the ubiquitous use of location-based services, large-scale individual-level location data has been widely collected through location-awareness devices. The exposure of location data constitutes a significant privacy risk to users as it can lead to de-anonymisation, the inference of sensitive information, and even physical threats. Geoprivacy concerns arise on the issues of user identity de-anonymisation and location exposure. In this survey, we analyse different geomasking techniques that have been proposed to protect the privacy of individuals in geodata. We present a taxonomy to characterise these techniques along different dimensions, and conduct a survey of geomasking techniques. We then highlight shortcomings of current techniques and discuss avenues for future research.

翻訳日:2024-03-18 07:48:02 公開日:2024-02-06

# コードに基づく推測に基づくロッシー暗号

Lossy Cryptography from Code-Based Assumptions ( http://arxiv.org/abs/2402.03633v1 )

ライセンス: Link先を確認

Quang Dao, Aayush Jain,

(参考訳) 過去数十年にわたって、二次的残留性、決定的ディフィー・ヘルマン(Decisional Diffie-Hellman)、Learning with Errors(Learning with Errors)といった様々な仮定から構築された、欠落または同型の性質を持つ高度な暗号プリミティブが急増してきた。これらのプリミティブは、複雑性クラス$SZK$(統計的ゼロ知識)の難しい問題を暗示している。このことは、コードベースの仮定から先進的プリミティブを構築するための障壁となる。そのような仮定が唯一知られているのは、準多項式時間で破られる非常に低いノイズレート$\frac{\log^2 n}{n}$のLearning Parity with Noise (LPN)である。そこで本研究では,複雑性クラス$BPP^{SZK}$に該当するDense-Sparse LPNというコードベースの仮定を提案する。我々の仮定は、平均ケース複雑性において、McElieceの暗号システムとランダム$k\mbox{-}$XORにインスパイアされたLPNの変種である。我々はこの仮定を利用して、損失の少ないトラップドア関数(Peikert-Waters STOC 08)を構築する。これは、最初の論文で格子ベースの構造に取って代わる最初の量子後代替を与える。基本的な暗号ツールであるロッシートラップドア関数は、ロッシープリミティブと非ロッシープリミティブの両方の幅広いスペクトルを可能にすることが知られている。特に,ノイズレート$\frac{\log^2 n}{n}$のLPNからの事前構成よりも,準多項式的にのみ安全な衝突耐性ハッシュ関数を実現する。

Over the past few decades, we have seen a proliferation of advanced cryptographic primitives with lossy or homomorphic properties built from various assumptions such as Quadratic Residuosity, Decisional Diffie-Hellman, and Learning with Errors. These primitives imply hard problems in the complexity class $SZK$ (statistical zero-knowledge); as a consequence, they can only be based on assumptions that are broken in $BPP^{SZK}$. This poses a barrier for building advanced primitives from code-based assumptions, as the only known such assumption is Learning Parity with Noise (LPN) with an extremely low noise rate $\frac{\log^2 n}{n}$, which is broken in quasi-polynomial time. In this work, we propose a new code-based assumption: Dense-Sparse LPN, that falls in the complexity class $BPP^{SZK}$ and is conjectured to be secure against subexponential time adversaries. Our assumption is a variant of LPN that is inspired by McEliece's cryptosystem and random $k\mbox{-}$XOR in average-case complexity. We leverage our assumption to build lossy trapdoor functions (Peikert-Waters STOC 08). This gives the first post-quantum alternative to the lattice-based construction in the original paper. Lossy trapdoor functions, being a fundamental cryptographic tool, are known to enable a broad spectrum of both lossy and non-lossy cryptographic primitives; our construction thus implies these primitives in a generic manner. In particular, we achieve collision-resistant hash functions with plausible subexponential security, improving over a prior construction from LPN with noise rate $\frac{\log^2 n}{n}$ that is only quasi-polynomially secure.

翻訳日:2024-03-18 07:48:02 公開日:2024-02-06
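
As an illustrative aside to the entry above, the standard Learning Parity with Noise (LPN) problem that the abstract starts from can be sketched in a few lines. This is a minimal sketch of plain LPN sample generation, not the paper's Dense-Sparse LPN variant. The names `lpn_samples` and `low_noise_rate`, the parameter values, and the choice of a base-2 logarithm for the noise rate $\frac{\log^2 n}{n}$ are assumptions made here for illustration only.

```python
import math
import random


def lpn_samples(n, m, noise_rate, rng):
    """Draw m LPN samples over GF(2) for a random n-bit secret.

    Returns (secret, rows, labels), where each label is the inner product
    <row, secret> mod 2, flipped independently with probability noise_rate.
    """
    secret = [rng.randrange(2) for _ in range(n)]
    rows = [[rng.randrange(2) for _ in range(n)] for _ in range(m)]
    labels = []
    for row in rows:
        parity = sum(a & s for a, s in zip(row, secret)) % 2
        noise = 1 if rng.random() < noise_rate else 0  # Bernoulli(noise_rate) bit
        labels.append(parity ^ noise)
    return secret, rows, labels


def low_noise_rate(n):
    """Noise rate log^2(n) / n (the log base is an assumption; base 2 is used here)."""
    return math.log2(n) ** 2 / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, for a reproducible illustration only
    n = 1024
    secret, rows, labels = lpn_samples(n, 8, low_noise_rate(n), rng)
    print("noise rate:", low_noise_rate(n))
    print("labels:", labels)
```

The hardness question is whether the secret can be recovered from `rows` and `labels` alone. With a noise rate of zero this is plain linear algebra over GF(2), and the abstract notes that the very low rate $\frac{\log^2 n}{n}$ is already breakable in quasi-polynomial time.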

# WhisperFuzz:プロセッサのタイミング脆弱性を検出するためのホワイトボックスファズ

WhisperFuzz: White-Box Fuzzing for Detecting and Locating Timing Vulnerabilities in Processors ( http://arxiv.org/abs/2402.03704v1 )

ライセンス: Link先を確認

Pallavi Borkar, Chen Chen, Mohamadreza Rostami, Nikhilesh Singh, Rahul Kande, Ahmad-Reza Sadeghi, Chester Rebeiro, Jeyavijayan Rajendran,

(参考訳) プロセッサのタイミング脆弱性は強力な脅威として浮上している。プロセッサがあらゆるコンピューティングシステムの基盤であるため、これらの欠陥を特定することは必須である。近年,ソフトウェア脆弱性の検出に用いられてきたファジィング技術は,プロセッサなどの大規模ハードウェア設計における脆弱性の発見に有望な結果を示している。研究者は、プロセッサのタイミング脆弱性を検出するためにブラックボックスまたはグレイボックスファジィを適応した。しかし、これらのタイミング脆弱性の場所や根本原因を特定することはできず、また、プロセッサのセキュリティに対するデザイナの信頼性を高めるためのカバレッジフィードバックも提供しない。既存のファジィの欠陥に対処するため,プロセッサのタイミング脆弱性を検出し,検出し,微構造的タイミング行動のカバレッジを評価するための静的解析を行う最初のホワイトボックスファジィであるWhisperFuzzを提案する。 WhisperFuzzは、プロセッサのタイミング動作、マイクロアーキテクチャの状態遷移の基本的な性質を使用して、タイミング脆弱性をローカライズする。 WhisperFuzzは、レジスタ転送レベル(RTL)のプロセッサ設計から自動的にマイクロアーキテクチャの状態遷移を抽出し、その設計をカバー範囲として状態遷移を監視する。さらに、WhisperFuzzは、DUT(Design-under-test)がテスト処理に要する時間を測定し、タイミングの脆弱性を示唆する小さな異常なバリエーションを特定する。 WhisperFuzzは、先進的なオープンソースRISC-Vプロセッサ(BOOM、Rocket Core、CVA6)で12の新たなタイミング脆弱性を検出する。そのうち8つはZkt拡張のゼロレイテンシ要件に違反しており、深刻なセキュリティ脆弱性と見なされている。さらに、WhisperFuzzは、新しい脆弱性と既存の脆弱性の位置も特定する。

Timing vulnerabilities in processors have emerged as a potent threat. As processors are the foundation of any computing system, identifying these flaws is imperative. Recently fuzzing techniques, traditionally used for detecting software vulnerabilities, have shown promising results for uncovering vulnerabilities in large-scale hardware designs, such as processors. Researchers have adapted black-box or grey-box fuzzing to detect timing vulnerabilities in processors. However, they cannot identify the locations or root causes of these timing vulnerabilities, nor do they provide coverage feedback to enable the designer's confidence in the processor's security. To address the deficiencies of the existing fuzzers, we present WhisperFuzz--the first white-box fuzzer with static analysis--aiming to detect and locate timing vulnerabilities in processors and evaluate the coverage of microarchitectural timing behaviors. WhisperFuzz uses the fundamental nature of processors' timing behaviors, microarchitectural state transitions, to localize timing vulnerabilities. WhisperFuzz automatically extracts microarchitectural state transitions from a processor design at the register-transfer level (RTL) and instruments the design to monitor the state transitions as coverage. Moreover, WhisperFuzz measures the time a design-under-test (DUT) takes to process tests, identifying any minor, abnormal variations that may hint at a timing vulnerability. WhisperFuzz detects 12 new timing vulnerabilities across advanced open-sourced RISC-V processors: BOOM, Rocket Core, and CVA6. Eight of these violate the zero latency requirements of the Zkt extension and are considered serious security vulnerabilities. Moreover, WhisperFuzz also pinpoints the locations of the new and the existing vulnerabilities.

翻訳日:2024-03-18 07:38:15 公開日:2024-02-06

# ゼロ知識証明機構によるブロックチェーンの安全性と効率の向上

Enhanced Security and Efficiency in Blockchain with Aggregated Zero-Knowledge Proof Mechanisms ( http://arxiv.org/abs/2402.03834v1 )

ライセンス: Link先を確認

Oleksandr Kuznetsov, Alex Rusnak, Anton Yezhov, Dzianis Kanonik, Kateryna Kuznetsova, Stanislav Karashchuk,

(参考訳) ブロックチェーン技術は、デジタルトランザクションにおけるデータの整合性とセキュリティを保証する革命的なツールとして登場した。しかしながら、ブロックチェーンシステム、特にEthereumにおけるデータ検証に対する現在のアプローチは、効率性と計算オーバーヘッドの面で課題に直面している。従来のMerkle Treeと暗号ハッシュ関数の使用は、有効ではあるが、特に大規模なデータセットでは、リソース消費が大幅に増加する。これは、ブロックチェーンネットワークにおけるより効率的なデータ検証方法の必要性という、既存の研究のギャップを浮き彫りにするものだ。本研究は,メルクルツリーの構造内にゼロ知識証明の革新的な集約スキームを提案することによって,このギャップに対処する。我々は,その生成と検証に必要な証明と計算資源を著しく削減するシステムを開発した。当社のアプローチは、ブロックチェーンデータ検証のパラダイムシフトであり、セキュリティと効率のバランスを取っています。提案手法の有効性を検証するため,実Ethereumブロックデータを用いて実験を行った。その結果、従来の手法と比較して、証明サイズと計算要求の大幅な削減が示され、検証プロセスはより効率的かつ経済的に実行可能となった。私たちのコントリビューションは、ブロックチェーンデータ検証のためのスケーラブルでセキュアなソリューションを提供するという、重要な研究の空白を埋めています。金融取引からサプライチェーン管理に至るまで、さまざまなアプリケーションにおけるブロックチェーンテクノロジの全体的なパフォーマンスと適応性を高めています。

Blockchain technology has emerged as a revolutionary tool in ensuring data integrity and security in digital transactions. However, the current approaches to data verification in blockchain systems, particularly in Ethereum, face challenges in terms of efficiency and computational overhead. The traditional use of Merkle Trees and cryptographic hash functions, while effective, leads to significant resource consumption, especially for large datasets. This highlights a gap in existing research: the need for more efficient methods of data verification in blockchain networks. Our study addresses this gap by proposing an innovative aggregation scheme for Zero-Knowledge Proofs within the structure of Merkle Trees. We develop a system that significantly reduces the size of the proof and the computational resources needed for its generation and verification. Our approach represents a paradigm shift in blockchain data verification, balancing security with efficiency. We conducted extensive experimental evaluations using real Ethereum block data to validate the effectiveness of our proposed scheme. The results demonstrate a drastic reduction in proof size and computational requirements compared to traditional methods, making the verification process more efficient and economically viable. Our contribution fills a critical research void, offering a scalable and secure solution for blockchain data verification. The implications of our work are far-reaching, enhancing the overall performance and adaptability of blockchain technology in various applications, from financial transactions to supply chain management.

翻訳日:2024-03-18 07:38:15 公開日:2024-02-06

# LIPSTICK: 論理ロックに対する破壊的かつ説明可能なグラフニューラルネットワークベースのOracle-Less攻撃

LIPSTICK: Corruptibility-Aware and Explainable Graph Neural Network-based Oracle-Less Attack on Logic Locking ( http://arxiv.org/abs/2402.04235v1 )

ライセンス: Link先を確認

Yeganeh Aghamohammadi, Amin Rezaei,

(参考訳) ゼロトラストのファブレスパラダイムでは、デザイナは半導体サプライチェーンに対するハードウェアベースの攻撃をますます懸念している。論理ロック(Logic locking)は、ハードウェアの知的財産の盗難と過剰生産を防ぐために、回路に追加のキー制御ゲートを追加する、信頼のための設計手法である。攻撃者は伝統的に論理ロックされた回路を攻撃するために託宣に依存してきたが、機械学習攻撃は託宣にアクセスしなくても秘密鍵を回収する能力を示している。本稿では、まず最先端の機械学習攻撃の限界について検討し、鍵ハミング距離を唯一のモデル導構造計量として用いることは必ずしも有用ではないと論じる。そこで我々は,回路の構造と動作を考慮に入れた,論理ロックに対するニューラルネットワークに基づくオラクルレス攻撃を開発し,訓練し,テストする。我々のモデルは、機械学習モデルがトレーニングプロセスで解釈したものと、それがどのように攻撃を成功させるかを分析するという意味で説明がつく。チップデザイナは、インクリメンタルな修正を避けながら、設計をセキュアにすることで、この情報を有益なものにすることができる。

In a zero-trust fabless paradigm, designers are increasingly concerned about hardware-based attacks on the semiconductor supply chain. Logic locking is a design-for-trust method that adds extra key-controlled gates in the circuits to prevent hardware intellectual property theft and overproduction. While attackers have traditionally relied on an oracle to attack logic-locked circuits, machine learning attacks have shown the ability to retrieve the secret key even without access to an oracle. In this paper, we first examine the limitations of state-of-the-art machine learning attacks and argue that the use of key Hamming distance as the sole model-guiding structural metric is not always useful. Then, we develop, train, and test a corruptibility-aware graph neural network-based oracle-less attack on logic locking that takes into consideration both the structure and the behavior of the circuits. Our model is explainable in the sense that we analyze what the machine learning model has interpreted in the training process and how it can perform a successful attack. Chip designers may find this information beneficial in securing their designs while avoiding incremental fixes.
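
The claim in the abstract above, that key Hamming distance alone can be a misleading metric, can be illustrated with a toy example. The following is a minimal sketch, not the authors' method: the three-input circuit, its two XOR key gates, and the definition of corruptibility as the fraction of input patterns whose output changes under a wrong key are all assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical toy circuit: f(x) = (x0 AND x1) OR x2, locked with two XOR key gates.
# The correct key (0, 0) restores the original function.
CORRECT_KEY = (0, 0)


def locked_circuit(x, key):
    """Evaluate the toy locked circuit for input bits x and key bits key."""
    x0, x1, x2 = x
    k0, k1 = key
    return ((x0 ^ k0) & x1) | (x2 ^ k1)


def key_hamming_distance(predicted, correct):
    """Fraction of key bits on which the predicted key differs from the correct key."""
    if len(predicted) != len(correct) or not correct:
        raise ValueError("keys must be non-empty and of equal length")
    return sum(a != b for a, b in zip(predicted, correct)) / len(correct)


def corruptibility(key):
    """Fraction of input patterns whose output differs from the correctly keyed circuit."""
    inputs = list(product((0, 1), repeat=3))
    wrong = sum(locked_circuit(x, key) != locked_circuit(x, CORRECT_KEY) for x in inputs)
    return wrong / len(inputs)


if __name__ == "__main__":
    for key in product((0, 1), repeat=2):
        print(key, key_hamming_distance(key, CORRECT_KEY), corruptibility(key))
```

In this toy circuit the wrong keys (1, 0) and (0, 1) have the same key Hamming distance from the correct key, yet they corrupt different fractions of the outputs, which is the kind of behavioral information a structure-only metric does not capture.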

翻訳日:2024-03-18 07:38:15 公開日:2024-02-06

# ブロックチェーンにおけるメルクルツリー:衝突確率とセキュリティへの影響に関する研究

Merkle Trees in Blockchain: A Study of Collision Probability and Security Implications ( http://arxiv.org/abs/2402.04367v1 )

ライセンス: Link先を確認

Oleksandr Kuznetsov, Alex Rusnak, Anton Yezhov, Kateryna Kuznetsova, Dzianis Kanonik, Oleksandr Domin,

(参考訳) ブロックチェーン技術の急速な進化の中で、データの整合性とセキュリティの確保が最重要である。この研究は、Ethereumのようなブロックチェーンアーキテクチャの基本コンポーネントであるMerkle Treesのセキュリティ面について詳しく説明している。我々は、ブロックチェーンシステム内のデータセキュリティに重大なリスクをもたらす潜在的な脆弱性である、ハッシュ衝突に対するMerkle Treesの感受性を批判的に検証する。広く応用されているにもかかわらず、Merkle Treesの衝突抵抗と、前像攻撃に対する堅牢性は十分に調査されていないため、ブロックチェーンセキュリティメカニズムの包括的な理解において、顕著なギャップが生じる。我々の研究は、理論的分析と実証的検証の巧妙なブレンドを通して、このギャップを埋めようとしている。本研究は,本樹における根の衝突確率を,ハッシュ長や経路長といった様々な要因を考慮し検討した。その結果,ルート長の増加とルート衝突の確率の上昇との間に直接的相関があることが判明し,潜在的なセキュリティ上の脆弱性が強調された。逆に、ハッシュ長の増加は衝突の可能性を著しく低下させ、セキュリティの強化における重要な役割を浮き彫りにする。私たちの研究から得られた洞察は、ブロックチェーンベースのシステムのセキュリティと運用の効率を高めることを目的として、ブロックチェーン開発者と研究者に貴重なガイダンスを提供する。

In the rapidly evolving landscape of blockchain technology, ensuring the integrity and security of data is paramount. This study delves into the security aspects of Merkle Trees, a fundamental component in blockchain architectures, such as Ethereum. We critically examine the susceptibility of Merkle Trees to hash collisions, a potential vulnerability that poses significant risks to data security within blockchain systems. Despite their widespread application, the collision resistance of Merkle Trees and their robustness against preimage attacks have not been thoroughly investigated, leading to a notable gap in the comprehensive understanding of blockchain security mechanisms. Our research endeavors to bridge this gap through a meticulous blend of theoretical analysis and empirical validation. We scrutinize the probability of root collisions in Merkle Trees, considering various factors such as hash length and path length within the tree. Our findings reveal a direct correlation between the increase in path length and the heightened probability of root collisions, thereby underscoring potential security vulnerabilities. Conversely, we observe that an increase in hash length significantly reduces the likelihood of collisions, highlighting its critical role in fortifying security. The insights garnered from our research offer valuable guidance for blockchain developers and researchers, aiming to bolster the security and operational efficacy of blockchain-based systems.
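
The two trends reported in the abstract above (longer paths raise the root collision probability, longer hashes lower it) can be sketched under a deliberately simple model. The code below is an illustration only and does not reproduce the paper's analysis: it assumes SHA-256 truncated to `hash_bits` as the node hash (Ethereum itself uses Keccak-256 and a Merkle Patricia trie), duplicates the last node on odd levels, and assumes that two distinct inputs collide independently at each level with probability 2^-n for an ideal n-bit hash.

```python
import hashlib
import math


def merkle_root(leaves: list[bytes], hash_bits: int = 256) -> bytes:
    """Compute a Merkle root using SHA-256 truncated to hash_bits (an assumed hash choice)."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    if hash_bits % 8 != 0 or not 8 <= hash_bits <= 256:
        raise ValueError("hash_bits must be a multiple of 8 between 8 and 256")
    n_bytes = hash_bits // 8

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()[:n_bytes]

    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node when the level is odd
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def path_collision_probability(hash_bits: int, path_length: int) -> float:
    """Probability of at least one collision along a path of path_length hash steps.

    Assumes independent per-step collision probability 2**-hash_bits (idealized model).
    expm1/log1p keep the result accurate when the per-step probability is tiny.
    """
    p = 2.0 ** -hash_bits
    return -math.expm1(path_length * math.log1p(-p))


if __name__ == "__main__":
    print(merkle_root([b"tx1", b"tx2", b"tx3"]).hex())
    for bits in (16, 32, 256):
        print(bits, [path_collision_probability(bits, depth) for depth in (1, 8, 32)])
```

Under this model the probability grows roughly linearly with path length and shrinks exponentially with hash length, which matches the direction of the findings stated in the abstract.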

翻訳日:2024-03-18 07:38:15 公開日:2024-02-06

# 説明可能な機械学習を用いたバス利用の空間的・時間的変動に及ぼす行動・構築環境・社会経済的特徴の影響の解明

Unveiling the influence of behavioural, built environment and socio-economic features on the spatial and temporal variability of bus use using explainable machine learning ( http://arxiv.org/abs/2403.05545v1 )

ライセンス: Link先を確認

Sui Tao, Francisco Rowe, Hongyu Shan,

(参考訳) 人々の旅行パターンの多様性を理解することが、交通計画と政策立案の鍵となる。しかし, 日々の交通機関の利用状況は, 地理的・時間的変動の程度と, どのような要因が完全には対処されていないかを示す。本研究は,中国北京のスマートカードデータに基づいて,ピーク時のバス利用の空間的・時間的変動を把握し,関連する文脈的特徴との関連性を調べるために,新しい指標を採用することで,これらの欠陥に対処することを目的とする。説明可能な機械学習を用いて,空間的・時間的変動と旅行頻度の非線形相互作用を明らかにした。さらに、都市中心部(>10km)への距離は、バス利用の空間的変動の増加と関連し、旅行の発端と目的地の分離は、空間的および時間的変動を減少させる。バス路線の高可用性は、より空間的変動性が高いが時間的変動性が低いことに関係している。一方,道路密度の低下と道路密度の上昇は,特に朝のバス利用の空間変動に関係している。これらの結果から,異なる建築環境が旅行時間や場所の柔軟性を適度に発揮していることが明らかとなった。インプリケーションは、より応答性が高く信頼性の高いトランジットシステムの運用と計画を行うために引き起こされる。

Understanding the variability of people's travel patterns is key to transport planning and policy-making. However, the extent to which daily transit use displays geographic and temporal variability, and the factors that contribute to it, have not been fully addressed. Drawing on smart card data in Beijing, China, this study seeks to address these deficits by adopting new indices to capture the spatial and temporal variability of bus use during peak hours and investigate their associations with relevant contextual features. Using explainable machine learning, our findings reveal a non-linear interaction between spatial and temporal variability and trip frequency. Furthermore, greater distance to the urban centres (>10 kilometres) is associated with increased spatial variability of bus use, while greater separation of trip origins and destinations from the subcentres reduces both spatial and temporal variability. Higher availability of bus routes is linked to higher spatial variability but lower temporal variability. Meanwhile, both lower and higher road density are associated with higher spatial variability of bus use, especially in the morning. These findings indicate that different built environment features moderate the flexibility of travel time and locations. Implications are derived to inform more responsive and reliable operation and planning of transit systems.

翻訳日:2024-03-18 06:19:57 公開日:2024-02-06

# AFCとAPCデータを組み合わせた公共交通ネットワークの統一運用

Unified Occupancy on a Public Transport Network through Combination of AFC and APC Data ( http://arxiv.org/abs/2403.05546v1 )

ライセンス: Link先を確認

Amir Dib, Noëlie Cherrier, Martin Graive, Baptiste Rérolle, Eglantine Schmitt,

(参考訳) 交通ネットワークにおいては、旅行者の習慣を把握し、提案を調整するために、船上での居住が鍵となる。伝統的に、オペレーターは典型的な作業日のライダーシップを評価するためにフィールドスタディに依存してきた。しかし、完全な時間的カバレッジを提供する自動運賃徴収(AFC)と自動旅客カウント(APC)データはしばしば利用可能であるが、未公開である。ただし、各データソースには独自のバイアスがあることに注意が必要だ。AFCデータは不正を考慮せず、すべての車両がAPCシステムを備えているわけではない。本稿では,AFC と APC のデータと部分的カバレッジを組み合わせることで,公共交通ネットワークのすべてのコースに占有率を推定する統合占有法を提案する。統一された職業は、他のコースがAPC尺度を持つラインのコースや、APCデータが全く利用できないラインのコースについて、APC情報の欠落を完了します。本手法の精度は、フランスの公共交通機関の実際のデータに基づいて評価される。

In a transport network, the onboard occupancy is key for gaining insights into travelers' habits and adjusting the offer. Traditionally, operators have relied on field studies to evaluate ridership of a typical workday. However, automated fare collection (AFC) and automatic passenger counting (APC) data, which provide complete temporal coverage, are often available but underexploited. It should be noted, however, that each data source comes with its own biases: AFC data may not account for fraud, while not all vehicles are equipped with APC systems. This paper introduces the unified occupancy method, a geostatistical model to extrapolate occupancy to every course of a public transportation network by combining AFC and APC data with partial coverage. Unified occupancy completes missing APC information for courses on lines where other courses have APC measures, as well as for courses on lines where no APC data is available at all. The accuracy of this method is evaluated on real data from several public transportation networks in France.

翻訳日:2024-03-18 06:19:57 公開日:2024-02-06

# 非プログラマのためのAI:プログラミングスキルを持たない学生のための講義における応用AI

AI for non-programmers: Applied AI in the lectures for students without programming skills ( http://arxiv.org/abs/2403.05547v1 )

ライセンス: Link先を確認

Julius Schöning, Tim Wawer, Kai-Michael Griese,

(参考訳) ChatGPTやWOMBO Dreamといったアプリケーションは、プログラミング知識のない学生に人工知能(AI)を使わせるのを容易にする。したがって、あらゆる分野においてAIの重要性が高まる中、プログラミング知識のないAIの学生を教育するためには革新的な戦略が必要である。この研究は、応用AIのための実践的な計画スクリプトを提示する。ドキュメント計画スクリプトは、AIアプリケーションパイプラインに基づいて、AIの概念と研究関連トピックをリンクする。これらのリンクは、新しいソリューション空間を開き、AIの可能性とリスクに対する学生の関心と理解を促進する。エネルギー管理の修士課程の講義シリーズは、AIを規律固有の講義にシームレスに統合する方法を示している。この目的のために、応用AIの計画スクリプトは、学習プログラムのトピックに適合するように適合する。この特定の教育シナリオにより、学生はAIアプリケーションパイプラインを使用して、規律固有のタスクステップを段階的に解決することができる。このように、応用AIのためのドクティク計画スクリプトの適用は、AIの理論概念の実践的な実装を示している。さらに、規律固有の講義でAIが使えるかどうかを評価するために使用できるチェックリストが提示される。将来のスキルとしてのAIは、学習の過程に関連するユースケースに基づいて、学生によって学習されなければならない。このような理由から、AI教育は、学習分野のためにプログラミングのバックグラウンドを持っていなくても、様々なカリキュラムにシームレスに適合すべきである。

Applications such as ChatGPT and WOMBO Dream make it easy to inspire students without programming knowledge to use artificial intelligence (AI). Therefore, given the increasing importance of AI in all disciplines, innovative strategies are needed to teach AI to students without programming knowledge so that AI can be integrated into their study modules as a future skill. This work presents a didactic planning script for applied AI. The didactic planning script is based on the AI application pipeline and links AI concepts with study-relevant topics. These linkages open up a new solution space and promote students' interest in and understanding of the potentials and risks of AI. An example lecture series for master's students in energy management shows how AI can be seamlessly integrated into discipline-specific lectures. To this end, the planning script for applied AI is adapted to fit the topic of the study program. This specific teaching scenario enables students to solve a discipline-specific task step by step using the AI application pipeline. Thus, the application of the didactic planning script for applied AI shows the practical implementation of the theoretical concepts of AI. In addition, a checklist is presented that can be used to assess whether AI can be used in the discipline-specific lecture. AI as a future skill must be learned by students based on use cases that are relevant to the course of studies. For this reason, AI education should fit seamlessly into various curricula, even if the students do not have a programming background due to their field of study.

翻訳日:2024-03-18 06:19:57 公開日:2024-02-06

# BERTを用いた過激派ソーシャルメディアにおける反ユダヤ的言論の進化のモニタリング

Monitoring the evolution of antisemitic discourse on extremist social media using BERT ( http://arxiv.org/abs/2403.05548v1 )

ライセンス: Link先を確認

Raza Ul Mustafa, Nathalie Japkowicz,

(参考訳) ソーシャルメディア上での人種差別や不寛容は、悪質なオンライン環境に寄与する。オンラインの反ユダヤ主義は、この研究で考慮された特定の憎しみのカテゴリーである。オンライン議論において、反ユダヤ主義のテーマとその関連する用語を追跡することは、参加者の感情やその進化をモニターし、憎しみのエスカレーションを防ぐための介入の道を提供するのに役立つかもしれない。オンライントラフィックの大量かつ絶え間ない進化のため、手動で会話を監視することは現実的ではない。代わりに、過激派ソーシャルメディアから反ユダヤ主義的テーマや用語を時間をかけて抽出し、その進化を捉える自動手法を提案する。教師付き学習はそのようなタスクには制限されないため、大規模な言語モデルを用いて投稿の文脈的類似性を評価する、教師なしのオンライン機械学習アプローチを作成しました。このメソッドは、同様のポストをまとめ、分割し、既存のテーマや新しいテーマからサブテーマが現れたときに、時間とともに追加のクラスタを生成する。各テーマ内で使用される反ユダヤ的用語は、各クラスタ内のポストから抽出される。実験により,本手法は既存の基準よりも優れており,関連する用語とともに,反ユダヤ的言説の中で発見されるテーマやサブテーマの種類が示されている。当社のアプローチは、社会プラットフォーム上での反ユダヤ主義以外のあらゆる憎悪の進化を監視するのに役立つと信じている。

Racism and intolerance on social media contribute to a toxic online environment which may spill offline to foster hatred, and eventually lead to physical violence. That is the case with online antisemitism, the specific category of hatred considered in this study. Tracking antisemitic themes and their associated terminology over time in online discussions could help monitor the sentiments of their participants and their evolution, and possibly offer avenues for intervention that may prevent the escalation of hatred. Due to the large volume and constant evolution of online traffic, monitoring conversations manually is impractical. Instead, we propose an automated method that extracts antisemitic themes and terminology from extremist social media over time and captures their evolution. Since supervised learning would be too limited for such a task, we created an unsupervised online machine learning approach that uses large language models to assess the contextual similarity of posts. The method clusters similar posts together, then divides clusters and creates additional ones over time when sub-themes emerge from existing ones or new themes appear. The antisemitic terminology used within each theme is extracted from the posts in each cluster. Our experiments show that our methodology outperforms existing baselines and demonstrate the kind of themes and sub-themes it discovers within antisemitic discourse along with their associated terminology. We believe that our approach will be useful for monitoring the evolution of all kinds of hatred beyond antisemitism on social platforms.

翻訳日:2024-03-18 06:19:57 公開日:2024-02-06

# スパイクニューラルネットワークのオンライン勾配推定のための前方直接フィードバックアライメント

Forward Direct Feedback Alignment for Online Gradient Estimates of Spiking Neural Networks ( http://arxiv.org/abs/2403.08804v1 )

ライセンス: Link先を確認

Florian Bacho, Dominique Chu,

(参考訳) 現在の最先端のニューラルネットワークトレーニングアルゴリズムに代わる、エネルギー効率の良い代替手段を見つけることに興味がある。スパイクニューラルネットワークは、ニューロモルフィックなハードウェアプラットフォーム上で効率的にエネルギーをシミュレートできるため、有望なアプローチである。しかし、これらのプラットフォームにはトレーニングアルゴリズムの設計に制限がある。最も重要なことは、バックプロパゲーションはそれらに実装できないことです。本稿では,新しいニューロモルフィックアルゴリズムである,SFDFAアルゴリズムを提案し,SNNのトレーニングにForward Direct Feedback Alignmentを適用した。 SFDFAは、出力と隠れたニューロンの間の重みをフィードバック接続として推定する。本研究の主な貢献は、シナプス後スパイク間のニューロン内依存性を考慮しつつ、スパイクの局所勾配をオンライン的に正確に計算し、ニューロモルフィックハードウェア互換性の動的システムを導出することである。 SFDFAアルゴリズムと多くの競合アルゴリズムを比較し,提案アルゴリズムが高い性能と収束率を達成することを示す。

There is an interest in finding energy-efficient alternatives to current state-of-the-art neural network training algorithms. Spiking neural networks (SNNs) are a promising approach, because they can be simulated energy-efficiently on neuromorphic hardware platforms. However, these platforms come with limitations on the design of the training algorithm. Most importantly, backpropagation cannot be implemented on them. We propose a novel neuromorphic algorithm, the Spiking Forward Direct Feedback Alignment (SFDFA) algorithm, an adaptation of Forward Direct Feedback Alignment to train SNNs. SFDFA estimates the weights between output and hidden neurons as feedback connections. The main contribution of this paper is to describe how exact local gradients of spikes can be computed in an online manner while taking into account the intra-neuron dependencies between post-synaptic spikes and derive a dynamical system for neuromorphic hardware compatibility. We compare the SFDFA algorithm with a number of competitor algorithms and show that the proposed algorithm achieves higher performance and convergence rates.

翻訳日:2024-03-18 05:40:54 公開日:2024-02-06

# 対向的特徴類似性学習による対向的ロバストディープフェイク検出

Adversarially Robust Deepfake Detection via Adversarial Feature Similarity Learning ( http://arxiv.org/abs/2403.08806v1 )

ライセンス: Link先を確認

Sarwar Khan,

(参考訳) ディープフェイク技術は、デジタルコンテンツの信頼性を懸念し、効果的な検出方法の開発を必要としている。しかし、ディープフェイクが普及し、敵の攻撃という形で新たな課題がもたらされた。敵は、検出モデルを騙して誤った出力を生成する、小さくて知覚できない摂動でディープフェイクビデオを操作できる。この重要な問題に対処するために,3つの基本的深い特徴学習パラダイムを統合したAdversarial Feature similarity Learning (AFSL)を導入する。サンプルと重みベクトルの類似性を最適化することにより、本手法は実例と偽例を区別することを目的としている。さらに,本研究の目的は,実物や偽物によらず,対角的摂動例と非摂動例の類似性を最大化することである。さらに,本手法では,実検体と偽検体との相違を最大化し,両者の明確な分離を確実にする正則化手法を提案する。 FaceForensics++、FaceShifter、DeeperForensicsなど、人気のあるディープフェイクデータセットに関する広範な実験により、提案手法は、他の標準的な対向トレーニングベースの防御方法よりも大幅に優れている。さらに, 敵攻撃からディープフェイク検出器を保護するためのアプローチの有効性を示す。

Deepfake technology has raised concerns about the authenticity of digital content, necessitating the development of effective detection methods. However, the widespread availability of deepfakes has given rise to a new challenge in the form of adversarial attacks. Adversaries can manipulate deepfake videos with small, imperceptible perturbations that can deceive the detection models into producing incorrect outputs. To tackle this critical issue, we introduce Adversarial Feature Similarity Learning (AFSL), which integrates three fundamental deep feature learning paradigms. By optimizing the similarity between samples and weight vectors, our approach aims to distinguish between real and fake instances. Additionally, we aim to maximize the similarity between both adversarially perturbed examples and unperturbed examples, regardless of their real or fake nature. Moreover, we introduce a regularization technique that maximizes the dissimilarity between real and fake samples, ensuring a clear separation between these two categories. With extensive experiments on popular deepfake datasets, including FaceForensics++, FaceShifter, and DeeperForensics, the proposed method outperforms other standard adversarial training-based defense methods significantly. This further demonstrates the effectiveness of our approach to protecting deepfake detectors from adversarial attacks.

翻訳日:2024-03-18 05:40:54 公開日:2024-02-06

# 多目的組合せ最適化問題に対する実効時アルゴリズム

Effective anytime algorithm for multiobjective combinatorial optimization problems ( http://arxiv.org/abs/2403.08807v1 )

ライセンス: Link先を確認

Miguel Ángel Domínguez-Ríos, Francisco Chicano, Enrique Alba,

(参考訳) 多目的最適化において、最適化アルゴリズムの結果は、意思決定者が選択した効率的な解の集合である。すべての効率的な解を短時間で計算できる訳ではなく、探索アルゴリズムを早めに停止させ、これまでに見いだされた解を解析することが一般的である。客観的な空間で十分に普及している効率的なソリューションのセットは、意思決定者に対して様々なソリューションを提供するのに好まれる。しかし、文学におけるいくつかの正確なアルゴリズムは、いつでも、そのようなよく普及した一連のソリューションを提供する能力をもって存在する:我々は、いつでもそれらをアルゴリズムと呼ぶ。そこで我々は,3つの新しいアイデアを組み合わせた多目的組合せ最適化のための新しい正確な随時アルゴリズムを提案する。提案アルゴリズムは, 既知ベンチマークの480インスタンスと, 総合非支配ベクトル生成率, ハイパーボリューム, 一般スプレッド, 加算エプシロンインジケータの4つの異なる性能測定値を用いて, 任意の多目的組合せ最適化のための最先端のアルゴリズムと比較した。総合的な実験的研究により、我々の提案は、ほとんどの事例において、以前のアルゴリズムよりも優れていたことが判明した。

In multiobjective optimization, the result of an optimization algorithm is a set of efficient solutions from which the decision maker selects one. It is common that not all the efficient solutions can be computed in a short time and the search algorithm has to be stopped prematurely to analyze the solutions found so far. A set of efficient solutions that are well-spread in the objective space is preferred to provide the decision maker with a great variety of solutions. However, just a few exact algorithms in the literature exist with the ability to provide such a well-spread set of solutions at any moment: we call them anytime algorithms. We propose a new exact anytime algorithm for multiobjective combinatorial optimization combining three novel ideas to enhance the anytime behavior. We compare the proposed algorithm with those in the state-of-the-art for anytime multiobjective combinatorial optimization using a set of 480 instances from different well-known benchmarks and four different performance measures: the overall non-dominated vector generation ratio, the hypervolume, the general spread and the additive epsilon indicator. A comprehensive experimental study reveals that our proposal outperforms the previous algorithms in most of the instances.
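
Two notions used in the abstract above, efficient (non-dominated) solutions and the hypervolume indicator, can be sketched for the bi-objective case. This is an illustrative sketch, not the proposed anytime algorithm: it assumes both objectives are minimized, and the reference point and sample points are hypothetical.

```python
def dominates(p, q):
    """True if p is at least as good as q in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def non_dominated(points):
    """Keep the points that no other point dominates, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(points, ref):
    """Area dominated by the front and bounded by the reference point (two objectives)."""
    # Only points that are strictly better than the reference point contribute.
    front = non_dominated([p for p in points if p[0] < ref[0] and p[1] < ref[1]])
    area = 0.0
    prev_f2 = ref[1]
    # Sorted by the first objective, the second objective decreases along the front.
    for f1, f2 in sorted(front):
        area += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area


if __name__ == "__main__":
    found_so_far = [(1, 3), (2, 2), (3, 1), (3, 3)]
    print(non_dominated(found_so_far))
    print(hypervolume_2d(found_so_far, ref=(4, 4)))
```

For an anytime algorithm, one would expect an indicator such as this to be evaluated on the solution set available at each interruption time, so that a larger value earlier indicates better anytime behavior.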

翻訳日:2024-03-18 05:40:54 公開日:2024-02-06

# 耐異常性を有する長距離水中航行のためのバイオン型データ駆動アプローチ

A Bionic Data-driven Approach for Long-distance Underwater Navigation with Anomaly Resistance ( http://arxiv.org/abs/2403.08808v1 )

ライセンス: Link先を確認

Songnan Yang, Xiaohui Zhang, Shiliang Zhang, Xuehui Ma, Wenqi Bai, Yushuai Li, Tingwen Huang,

(参考訳) 様々な動物が環境の手がかりを使って正確なナビゲーションをしている。地球の磁場は長距離動物相の移動において信頼できる情報源であることが証明されている。動物航法にインスパイアされたこの研究は、長距離水中航法のためのバイオニックでデータ駆動のアプローチを提案する。提案手法では,GPSシステムや地理地図を必要とせず,測地データを用いてナビゲーションを行う。特に,時間的注意に基づくLong Short-Term Memory(TA-LSTM)ネットワークを構築し,ナビゲーション中の方向角を予測する。地磁気異常の影響を緩和するため,最大線量推定に基づく異常の検出・定量化機構を開発した。開発機構をTA-LSTMと統合し、予測方向角を校正し、地磁気異常に対する耐性を得る。 WMMモデルから取得したデータを用いて,多様なナビゲーション条件を用いた数値シミュレーションを行い,本手法を検証した。シミュレーションの結果,地磁気異常に対するレジリエンスナビゲーションと,単一および複数目的地における水中ナビゲーションの精度と安定性が示された。

Various animals exhibit accurate navigation using environmental cues. The Earth's magnetic field has been proven to be a reliable information source in long-distance fauna migration. Inspired by animal navigation, this work proposes a bionic and data-driven approach for long-distance underwater navigation. The proposed approach uses measured geomagnetic data for the navigation, and requires no GPS systems or geographical maps. Particularly, we construct and train a Temporal Attention-based Long Short-Term Memory (TA-LSTM) network to predict the heading angle during the navigation. To mitigate the impact of geomagnetic anomalies, we develop a mechanism to detect and quantify the anomalies based on Maximum Likelihood Estimation. We integrate the developed mechanism with the TA-LSTM, and calibrate the predicted heading angles to gain resistance against geomagnetic anomalies. Using data retrieved from the WMM model, we conduct numerical simulations with diversified navigation conditions to test our approach. The simulation results demonstrate resilient navigation against geomagnetic anomalies by our approach, along with precision and stability of the underwater navigation in single- and multiple-destination missions.

翻訳日:2024-03-18 05:40:54 公開日:2024-02-06

# webotsによるコンテナ化(深層)強化学習のためのアーキテクチャ

An Architecture for Unattended Containerized (Deep) Reinforcement Learning with Webots ( http://arxiv.org/abs/2403.00765v1 )

ライセンス: Link先を確認

Tobias Haubold, Petra Linke

(参考訳) データサイエンスアプリケーションが業界で採用されるにつれて、ツールの世界は成熟し、そのようなアプリケーションのライフサイクルを促進し、関係者の生産性向上に関わる課題の解決策を提供する。 3d世界のエージェントによる強化学習は、まだ課題に直面している可能性がある。シミュレーションソフトウェアを使用するために必要な知識と、無人のトレーニングパイプラインでのスタンドアロンシミュレーションソフトウェアの利用。本稿では,ロボットロボットに関して,ロボットの強化学習エージェントを訓練するためのツールとアプローチをレビューし,仮想世界の創造者のためのシミュレーション環境とデータサイエンティストのためのモデル開発環境の分離は,あまり話題になっていないことを論じる。どちらも同じで、データサイエンティストはapiを直接扱うためにシミュレーションソフトウェアに関する知識を必要とします。さらに、仮想世界やデータサイエンティストの作者が同じファイルで作業することもある。私たちは、データサイエンティストがシミュレーションソフトウェアに関する知識を必要としないアプローチを説明することで、このトピックに貢献したいと考えています。本手法では,シミュレーションソフトウェアであるwebots,ロボットオペレーティングシステムを用いてシミュレーションロボットと通信し,シミュレーションソフトウェア自体とコンテナ技術を用いてシミュレーションをモデル開発環境から分離する。私たちは、データサイエンティストが扱うAPIと、意図しないトレーニングパイプラインでスタンドアロンのシミュレーションソフトウェアを使用することを強調しました。ロボットノに特有の部分と学習すべきロボットタスクを示す。

As data science applications gain adoption across industries, the tooling landscape matures to facilitate the life cycle of such applications and provide solutions to the challenges involved to boost the productivity of the people involved. Reinforcement learning with agents in a 3D world could still face challenges: the knowledge required to use simulation software as well as the utilization of standalone simulation software in unattended training pipelines. In this paper we review tools and approaches to train reinforcement learning agents for robots in 3D worlds with respect to the robot Robotino and argue that the separation of the simulation environment for creators of virtual worlds and the model development environment for data scientists is not a well covered topic. Often both are the same and data scientists require knowledge of the simulation software to work directly with its APIs. Moreover, sometimes creators of virtual worlds and data scientists even work on the same files. We want to contribute to that topic by describing an approach where data scientists don't require knowledge about the simulation software. Our approach uses the standalone simulation software Webots, the Robot Operating System to communicate with simulated robots as well as the simulation software itself and container technology to separate the simulation from the model development environment. We place emphasis on the APIs the data scientists work with and the use of standalone simulation software in unattended training pipelines. We show the parts that are specific to the Robotino and the robot task to learn.

翻訳日:2024-03-11 00:19:35 公開日:2024-02-06

# 拡張クエリによる言語生成のための検索プロセスの強化

Enhancing Retrieval Processes for Language Generation with Augmented Queries ( http://arxiv.org/abs/2402.16874v1 )

ライセンス: Link先を確認

Julien Pierre Edmond Ghali, Kosuke Shima, Koichi Moriyama, Atsuko Mutoh, Nobuhiro Inuzuka

(参考訳) スマートテクノロジーの急速な変化の中で、高度な言語モデルの台頭により、文書の検索がますます困難になっている。これらのモデルは、しばしば「幻覚」として知られる不正確な情報を提供するような困難に直面している。本研究は,実事実に基づく正確な応答をモデルに誘導するRAG(Retrieval-Augmented Generation)を通じてこの問題に対処することに焦点を当てる。スケーラビリティの問題を克服するために、この研究は、革新的なクエリ最適化プロセスを使用して、bertやorca2といった高度な言語モデルとユーザクエリを接続することを検討している。この研究は、3つのシナリオに展開されている。まずはRAGなしで、次に追加の助けなしで、最後に追加の助けなしで。コンパクトだが効率的なOrca2 7Bモデルを選択することは、コンピューティングリソースのスマートな利用を実証する。実験結果から,RAGによる初期言語モデルの性能向上,特にプロンプト強化時の性能向上が示唆された。異なるエンコーディング間の文書検索の一貫性は、言語モデル生成クエリの使用の有効性を強調する。 UMAP for BERTの導入により、強力な結果を維持しながら文書検索がさらに簡単になる。

In the rapidly changing world of smart technology, searching for documents has become more challenging due to the rise of advanced language models. These models sometimes face difficulties, like providing inaccurate information, commonly known as "hallucination." This research focuses on addressing this issue through Retrieval-Augmented Generation (RAG), a technique that guides models to give accurate responses based on real facts. To overcome scalability issues, the study explores connecting user queries with sophisticated language models such as BERT and Orca2, using an innovative query optimization process. The study unfolds in three scenarios: first, without RAG; second, without additional assistance; and finally, with extra help. Choosing the compact yet efficient Orca2 7B model demonstrates a smart use of computing resources. The empirical results indicate a significant improvement in the initial language model's performance under RAG, particularly when assisted with prompt augmenters. Consistency in document retrieval across different encodings highlights the effectiveness of using language model-generated queries. The introduction of UMAP for BERT further simplifies document retrieval while maintaining strong results.

翻訳日:2024-03-03 19:20:57 公開日:2024-02-06

# Mind the Gap: ピアグループからのセキュリティ逸脱に基づいたセキュアなサイバーリスクモデリング

Mind the Gap: Securely modeling cyber risk based on security deviations from a peer group ( http://arxiv.org/abs/2402.04166v1 )

ライセンス: Link先を確認

Taylor Reynolds, Sarah Scheffler, Daniel J. Weitzner, Angelina Wu

(参考訳) 組織が主に答えられなかったサイバーリスクについて、戦略的かつ長年にわたって疑問が2つある。両方の回答には、セキュリティ姿勢、インシデント、損失に関する業界全体のデータが必要である。現在、暗号コンピューティングのようなプライバシー強化技術(pets)は、機密性の高い入力データを非公開にしながら、組織のピアグループによるサイバーリスクメトリクスの安全な計算を可能にする。これらの新しい集計データが利用可能になると、アナリストはそれらをサイバーリスクモデルに統合し、より信頼できるリスクアセスメントを生成し、ピアグループと比較できるようにする方法が必要となる。本稿では,セキュアな計算から生じる新しい変数を用いて,ピアに対するサイバー姿勢のベンチマークを行い,特定の経済セクターにおけるサイバーリスクを推定する枠組みを提案する。本稿では,組織とその仲間間の重み付けされたセキュリティギャップを表す,Defense Gap Indexと呼ばれる新たなトップライン変数を導入し,過去の産業データに基づいて組織のセキュリティリスクを予測する。我々は,25の大企業から収集したデータを用いて特定の分野に適用し,業界ISAOと共同で業界リスクモデルを構築し,参加者に自身のリスク露出を推定するためのツールを提供し,セキュリティ姿勢を仲間とプライベートに比較する。

There are two strategic and longstanding questions about cyber risk that organizations largely have been unable to answer: What is an organization's estimated risk exposure and how does its security compare with peers? Answering both requires industry-wide data on security posture, incidents, and losses that, until recently, have been too sensitive for organizations to share. Now, privacy enhancing technologies (PETs) such as cryptographic computing can enable the secure computation of aggregate cyber risk metrics from a peer group of organizations while leaving sensitive input data undisclosed. As these new aggregate data become available, analysts need ways to integrate them into cyber risk models that can produce more reliable risk assessments and allow comparison to a peer group. This paper proposes a new framework for benchmarking cyber posture against peers and estimating cyber risk within specific economic sectors using the new variables emerging from secure computations. We introduce a new top-line variable called the Defense Gap Index representing the weighted security gap between an organization and its peers that can be used to forecast an organization's own security risk based on historical industry data. We apply this approach in a specific sector using data collected from 25 large firms, in partnership with an industry ISAO, to build an industry risk model and provide tools back to participants to estimate their own risk exposure and privately compare their security posture with their peers.

翻訳日:2024-02-18 14:32:23 公開日:2024-02-06

# 人間論における大規模言語モデルの限界

Limits of Large Language Models in Debating Humans ( http://arxiv.org/abs/2402.06049v1 )

ライセンス: Link先を確認

James Flamino, Mohammed Shahid Modi, Boleslaw K. Szymanski, Brendan Cross, Colton Mikolajczyk

(参考訳) 大規模言語モデル(llm)は、人間と巧みに対話する能力に顕著な期待を示してきた。その後、会話に関わる社会学的実験で人工的な南軍やサロゲートとしての使用の可能性は、エキサイティングな見通しである。しかし、このアイデアはどの程度有効か? 本論文は,LLMエージェントを現実人と組み合わせた事前登録研究により,現在のLLMの限界を検証しようとする試みである。この研究は、人間のみ、エージェントと人間、エージェントのみの3つの環境における議論に基づく意見合意形成に焦点を当てている。私たちのゴールは、LLMエージェントが人間にどのように影響するか、そして人間のように議論する能力について理解することです。 LLMは人間の生産性をブレンドし促進するが、議論では説得力に欠けており、最終的には人間の行動から逸脱する。我々は、これらの主要な失敗を解明し、LCMが議論者になる前にさらに進化する必要があることを期待する。

Large Language Models (LLMs) have shown remarkable promise in their ability to interact proficiently with humans. Consequently, their potential use as artificial confederates and surrogates in sociological experiments involving conversation is an exciting prospect. But how viable is this idea? This paper endeavors to test the limits of current-day LLMs with a pre-registered study integrating real people with LLM agents acting as people. The study focuses on debate-based opinion consensus formation in three environments: humans only, agents and humans, and agents only. Our goal is to understand how LLM agents influence humans, and how capable they are of debating like humans. We find that LLMs can blend in and facilitate human productivity but are less convincing in debate, with their behavior ultimately deviating from that of humans. We elucidate these primary failings and anticipate that LLMs must evolve further before being viable debaters.

翻訳日:2024-02-18 14:07:29 公開日:2024-02-06

# かなり進歩していますか? 不均衡回帰から見た化学反応収率予測の再検討

Are we making much progress? Revisiting chemical reaction yield prediction from an imbalanced regression perspective ( http://arxiv.org/abs/2402.05971v1 )

ライセンス: Link先を確認

Yihong Ma, Xiaobao Huang, Bozhao Nan, Nuno Moniz, Xiangliang Zhang, Olaf Wiest and Nitesh V. Chawla

(参考訳) 化学反応の収率は、化学反応中に消費される反応物に対して形成される目的生成物の割合を定量化する。正確な収率予測は、合成計画中に高収率反応を選択するための化学者の指針となり、ウェットラボの実験に時間と資源を割く前に貴重な洞察を提供する。近年の収率予測の進歩は収率範囲全体の全体的な性能改善に繋がったが、化学者にとってより大きな関心事である高収率反応の予測の強化には未解決の課題が残っている。本稿では, 高収率予測における性能差は, 低収率反応に偏った実世界のデータの不均衡分布に起因すると論じる。このデータ不均衡にもかかわらず、既存の収率予測法は、バランスの取れたトレーニング分布を仮定して、異なる収率範囲を等しく扱い続けている。 3つの実世界の収率予測データセットに関する広範囲な実験を通じて,反応収率予測を不均衡回帰問題として捉え直す必要性を強調する。最後に,簡易なコストセンシティブな再重み付け手法の導入により,過小表現されている高収率領域における収率予測モデルの性能が著しく向上することを示す。

The yield of a chemical reaction quantifies the percentage of the target product formed in relation to the reactants consumed during the chemical reaction. Accurate yield prediction can guide chemists toward selecting high-yield reactions during synthesis planning, offering valuable insights before dedicating time and resources to wet lab experiments. While recent advancements in yield prediction have led to overall performance improvement across the entire yield range, an open challenge remains in enhancing predictions for high-yield reactions, which are of greater concern to chemists. In this paper, we argue that the performance gap in high-yield predictions results from the imbalanced distribution of real-world data skewed towards low-yield reactions, often due to unreacted starting materials and inherent ambiguities in the reaction processes. Despite this data imbalance, existing yield prediction methods continue to treat different yield ranges equally, assuming a balanced training distribution. Through extensive experiments on three real-world yield prediction datasets, we emphasize the urgent need to reframe reaction yield prediction as an imbalanced regression problem. Finally, we demonstrate that incorporating simple cost-sensitive re-weighting methods can significantly enhance the performance of yield prediction models on underrepresented high-yield regions.
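
As a rough illustration of what "cost-sensitive re-weighting" can look like for an imbalanced regression target, the sketch below weights each sample by the inverse frequency of its yield bin and plugs those weights into a weighted mean squared error. This is not taken from the paper: the binning scheme, the function names `inverse_frequency_weights` and `weighted_mse`, and the toy yields are all assumptions made for illustration.

```python
from collections import Counter

def inverse_frequency_weights(yields, n_bins=10, y_max=100.0):
    """Weight each sample by the inverse frequency of its yield bin (illustrative scheme)."""
    # Map each yield in [0, y_max] to a bin index in [0, n_bins - 1].
    bins = [min(int(y / y_max * n_bins), n_bins - 1) for y in yields]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    # Normalize so the weights average to 1 and the overall loss scale is unchanged.
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]

def weighted_mse(y_true, y_pred, weights):
    """Mean squared error with per-sample weights."""
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)

# Toy yields in percent, skewed towards low values; the numbers are made up.
yields = [5.0, 8.0, 12.0, 15.0, 18.0, 22.0, 25.0, 31.0, 88.0, 95.0]
weights = inverse_frequency_weights(yields)
```

With these weights, a model that predicts well only in the crowded low-yield bins is penalized more heavily for its errors on the rare high-yield samples than it would be under a plain, unweighted loss.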

翻訳日:2024-02-18 14:07:00 公開日:2024-02-06

# ニューラル離散学習とレベル・オブ・エキスパートを用いた時空間力学系のモデリング

Modeling Spatio-temporal Dynamical Systems with Neural Discrete Learning and Levels-of-Experts ( http://arxiv.org/abs/2402.05970v1 )

ライセンス: Link先を確認

Kun Wang, Hao Wu, Guibin Zhang, Junfeng Fang, Yuxuan Liang, Yuankai Wu, Roger Zimmermann, Yang Wang

(参考訳) 本稿では,映像フレームのような観測順序に基づく時空間力学系の状態変化のモデル化と推定の問題について述べる。従来の数値シミュレーションシステムは、構成された偏微分方程式(PDE)の初期設定と正しさに大きく依存する。近年の取り組みは、ニューラルネットワークによるデータ駆動型PDEの発見に大きな成功をもたらしたが、特異なシナリオによる制限と、局所的な洞察の欠如により、より広い現実世界の文脈で効果的に実行できなくなる。そこで本研究では,一般的な物理プロセスの進化法則をデータ駆動方式で捉えるために,ユニバーサル・エキスパート・モジュール,すなわち光フロー推定コンポーネントを提案する。局所的なインサイトを高めるため,局所的な特性は内部の様々な情報に影響され,システム全体のマクロ的特性と矛盾する可能性があるため,より微細な物理パイプラインの設計に苦慮する。さらに、現在広く使われているニューラル離散学習を利用して、潜在空間の根底にある重要な特徴を明らかにし、このプロセスは解釈可能性をより良く注入し、これらの離散確率変数に対して強力な先行性を得るのに役立つ。提案手法が既存の sota ベースラインと比較して大きな性能マージンを達成することを示すために,広範な実験とアブレーションを実施している。

In this paper, we address the issue of modeling and estimating changes in the state of spatio-temporal dynamical systems based on a sequence of observations such as video frames. Traditional numerical simulation systems depend largely on the initial settings and correctness of the constructed partial differential equations (PDEs). Despite recent efforts yielding significant success in discovering data-driven PDEs with neural networks, the limitations posed by singular scenarios and the absence of local insights prevent them from performing effectively in a broader real-world context. To this end, this paper proposes a universal expert module, that is, an optical flow estimation component, to capture the evolution laws of general physical processes in a data-driven fashion. To enhance local insight, we painstakingly design a finer-grained physical pipeline, since local characteristics may be influenced by various internal contextual information, which may contradict the macroscopic properties of the whole system. Further, we harness currently popular neural discrete learning to unveil the underlying important features in its latent space; this process better injects interpretability, which can help us obtain a powerful prior over these discrete random variables. We conduct extensive experiments and ablations to demonstrate that the proposed framework achieves large performance margins compared with the existing SOTA baselines.

翻訳日:2024-02-18 14:06:37 公開日:2024-02-06

# Transformerの訓練における対称性の破れ

Breaking Symmetry When Training Transformers ( http://arxiv.org/abs/2402.05969v1 )

ライセンス: Link先を確認

Chunsheng Zuo, Michael Guerzhoy

(参考訳) 本稿では,位置エンコーディングと因果注意のいずれの機構も持たないTransformerアーキテクチャにおいて,出力トークン$n+1$の予測が入力トークン$1, 2, ..., n-1$の置換に対して不変であることを示す。通常、両方の機構が採用され、入力トークンに対する対称性は破れる。近年,位置エンコーディングなしでTransformerを訓練できることが示されている。これは因果注意機構によって実現されているはずである。本稿では,順序が重要な入力シーケンスをTransformerがモデル化できるという事実は因果接続機構によるものであるはずだ,という議論を詳述する。Transformerの垂直な「スライス」は,いずれも入力シーケンス内の同じ位置$k$を表すように促される。我々は、残差接続がここの現象に寄与すると仮定し、その証拠を示す。

As we show in this paper, the prediction for output token $n+1$ of Transformer architectures without one of the mechanisms of positional encodings and causal attention is invariant to permutations of input tokens $1, 2, ..., n-1$. Usually, both mechanisms are employed and the symmetry with respect to the input tokens is broken. Recently, it has been shown that one can train Transformers without positional encodings. This must be enabled by the causal attention mechanism. In this paper, we elaborate on the argument that the causal connection mechanism must be responsible for the fact that Transformers are able to model input sequences where the order is important. Vertical "slices" of Transformers are all encouraged to represent the same location $k$ in the input sequence. We hypothesize that residual connections contribute to this phenomenon, and demonstrate evidence for this.
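
The symmetry claim can be checked numerically on a toy model. The sketch below is an assumption-laden illustration, not the authors' code: it uses single-head attention with identity projections (q = k = v = x), two stacked layers, no residual connections, and no positional encodings, and it reads the abstract as saying that the invariance holds when neither mechanism is present. It compares the final-position output before and after swapping the earlier tokens, once without a mask and once with a causal mask.

```python
import math

def attention(xs, causal):
    """Single-head self-attention with identity projections (q = k = v = x)."""
    out = []
    for i, q in enumerate(xs):
        # With a causal mask, position i only sees positions 0..i.
        visible = xs[: i + 1] if causal else xs
        scores = [sum(a * b for a, b in zip(q, k)) for k in visible]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(e * v[d] for e, v in zip(exps, visible)) / z
                    for d in range(len(q))])
    return out

def last_output(xs, causal, n_layers=2):
    """Representation at the final position after n_layers attention layers."""
    h = xs
    for _ in range(n_layers):
        h = attention(h, causal)
    return h[-1]

def max_abs_diff(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

tokens = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # toy embeddings, no positional encoding
swapped = [tokens[1], tokens[0], tokens[2]]    # permute every token except the last

diff_no_mask = max_abs_diff(last_output(tokens, False), last_output(swapped, False))
diff_causal = max_abs_diff(last_output(tokens, True), last_output(swapped, True))
```

Without the mask, `diff_no_mask` is zero up to floating-point rounding, because each layer is permutation-equivariant. With the causal mask, the intermediate positions see different prefixes after the swap, so `diff_causal` is clearly non-zero once a second layer reads those intermediate representations.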

翻訳日:2024-02-18 14:06:15 公開日:2024-02-06

# 悪質な再構成可能な知的表面を含む物理層秘密鍵を用いた説明可能な逆学習フレームワーク

Explainable Adversarial Learning Framework on Physical Layer Secret Keys Combating Malicious Reconfigurable Intelligent Surface ( http://arxiv.org/abs/2402.06663v1 )

ライセンス: Link先を確認

Zhuangkun Wei, Wenxiu Hu, Weisi Guo

(参考訳) 再構成可能なインテリジェントサーフェス(RIS)の開発は、物理層セキュリティ(PLS)にとって諸刃の剣である。正当なRISは、物理層シークレットキー生成(PL-SKG)を強化するチャネルランダム性の増加を含む有益な影響をもたらすが、悪意のあるRISは正当なチャネルを汚染し、既存のPL-SKGの大半を破ることができる。本研究では,この中間者型の悪意あるRIS(MITM-RIS)による盗聴に対処するため,正当な当事者(AliceとBob)間の敵対的学習フレームワークを提案する。まず、正当なペアとMITM-RISの間の理論的な相互情報量のギャップを導出する。次に、AliceとBobはGAN(Generative Adversarial Network)を利用して、MITM-RISと相互情報量の重なりを持たない共通の特徴曲面を実現するよう学習する。さらに,シンボリックな説明可能AI(xAI)表現を用いて,ブラックボックスニューラルネットワークの信号処理的解釈を支援する。これらの支配的なニューロンのシンボリックな項は、特徴工学に基づく検証とPLS共通特徴空間の将来の設計に役立つ。シミュレーションの結果,提案したGANベースおよびシンボリックベースのPL-SKGは,正規ユーザ間で高い鍵一致率を達成でき,正規の特徴生成(NNや数式)の知識を持つMITM-RIS Eveに対しても耐性があることがわかった。これにより、将来の6Gにおいて信頼できない反射型デバイスがあっても無線通信を安全にする道が開かれる。

The development of reconfigurable intelligent surfaces (RIS) is a double-edged sword for physical layer security (PLS). Whilst a legitimate RIS can yield beneficial impacts, including increased channel randomness to enhance physical layer secret key generation (PL-SKG), a malicious RIS can poison legitimate channels and crack most existing PL-SKG schemes. In this work, we propose an adversarial learning framework between legitimate parties (namely Alice and Bob) to address this man-in-the-middle malicious RIS (MITM-RIS) eavesdropping. First, the theoretical mutual information gap between legitimate pairs and MITM-RIS is deduced. Then, Alice and Bob leverage generative adversarial networks (GANs) to learn to achieve a common feature surface that does not have mutual information overlap with MITM-RIS. Next, we aid signal processing interpretation of black-box neural networks by using a symbolic explainable AI (xAI) representation. These symbolic terms of dominant neurons aid feature engineering-based validation and future design of PLS common feature space. Simulation results show that our proposed GAN-based and symbolic-based PL-SKGs can achieve high key agreement rates between legitimate users, and are even resistant to an MITM-RIS Eve with knowledge of legitimate feature generation (NNs or formulas). This therefore paves the way to securing wireless communications with untrusted reflective devices in future 6G.

翻訳日:2024-02-18 13:56:38 公開日:2024-02-06

# 注意に基づくグラフデコーダの符号ランク制限

Sign Rank Limitations for Attention-Based Graph Decoders ( http://arxiv.org/abs/2402.06662v1 )

ライセンス: Link先を確認

Su Hyeong Lee and Qingqi Zhang and Risi Kondor

(参考訳) 内積ベースのデコーダは、潜在埋め込みから有意義なデータを抽出するために使用される最も影響力のあるフレームワークの1つである。しかし、そのようなデコーダは、多くの文献において表現能力の限界を示しており、特にグラフ再構成問題において顕著である。本稿では,グラフデータにおけるこの広く見られる現象を初めて理論的に解明し,内積の枠組みから逸脱することなく,この問題を回避するための簡単な修正を提案する。

Inner product-based decoders are among the most influential frameworks used to extract meaningful data from latent embeddings. However, such decoders have shown limitations in representation capacity in numerous works within the literature, which have been particularly notable in graph reconstruction problems. In this paper, we provide the first theoretical elucidation of this pervasive phenomenon in graph data, and suggest straightforward modifications to circumvent this issue without deviating from the inner product framework.

翻訳日:2024-02-18 13:56:09 公開日:2024-02-06

# 現実的な予測プロセスによる拡散による天気予報

Weather Prediction with Diffusion Guided by Realistic Forecast Processes ( http://arxiv.org/abs/2402.06666v1 )

ライセンス: Link先を確認

Zhanxiang Hua, Yutong He, Chengqian Ma, Alexandra Anderson-Frey

(参考訳) 天気予報は依然として重要かつ困難な領域であり、最近開発されたディープラーニング(DL)に基づくモデルが、従来の数値気象予測(NWP)モデルの性能に迫っている。しかしながら、これらのDLモデルは、しばしば複雑でリソース集約的であり、訓練後の柔軟性とNWP予測の取り込みにおいて制約に直面しており、非物理的な予測の可能性による信頼性の懸念につながっている。本研究では,気象予測に拡散モデル(DM)を適用した新しい手法を提案する。特に,本手法は,同じモデリングフレームワークを用いて,直接予測と反復予測の両方を実現できる。我々のモデルは、独立して予測を生成するだけでなく、サンプリングプロセス中に、リード時間が異なる場合でもNWP予測を統合できる。我々のモデルの柔軟性と制御性は、一般の気象コミュニティにとってより信頼性の高いDLシステムを可能にする。さらに,持続性と気候値のデータを取り入れることで,長期予測の安定性がさらに向上する。実験結果は本手法の実現可能性と一般化性を示しており,再学習を必要とせずに将来のより高度な拡散モデルへと向かう有望な方向性を示唆している。

Weather forecasting remains a crucial yet challenging domain, where recently developed models based on deep learning (DL) have approached the performance of traditional numerical weather prediction (NWP) models. However, these DL models, often complex and resource-intensive, face limitations in flexibility post-training and in incorporating NWP predictions, leading to reliability concerns due to potential unphysical predictions. In response, we introduce a novel method that applies diffusion models (DM) for weather forecasting. In particular, our method can achieve both direct and iterative forecasting with the same modeling framework. Our model is not only capable of generating forecasts independently but also uniquely allows for the integration of NWP predictions, even with varying lead times, during its sampling process. The flexibility and controllability of our model empower a more trustworthy DL system for the general weather community. Additionally, incorporating persistence and climatology data further enhances our model's long-term forecasting stability. Our empirical findings demonstrate the feasibility and generalizability of this approach, suggesting a promising direction for future, more sophisticated diffusion models without the need for retraining.

翻訳日:2024-02-18 13:38:41 公開日:2024-02-06

# 身体的AIの基礎的世界モデルにおける因果関係の本質的役割

The Essential Role of Causality in Foundation World Models for Embodied AI ( http://arxiv.org/abs/2402.06665v1 )

ライセンス: Link先を確認

Tarun Gupta, Wenbo Gong, Chao Ma, Nick Pawlowski, Agrin Hilmkil, Meyer Scetbon, Ade Famoti, Ashley Juan Llorens, Jianfeng Gao, Stefan Bauer, Danica Kragic, Bernhard Schölkopf, Cheng Zhang

(参考訳) 基礎モデルの最近の進歩、特に大規模マルチモーダルモデルや会話エージェントは、一般的に有能な具体化エージェントの可能性に関心を燃やしている。このようなエージェントは、多くの異なる現実世界環境で新しいタスクを実行する能力を必要とする。しかし、現在の基礎モデルは現実世界との物理的相互作用を正確にモデル化できないため、Embodied AIには不十分である。因果関係の研究は、可能な相互作用の結果を正確に予測するために不可欠である、検証世界モデルの構築に結びつく。本稿では,次世代のエンボディエージェントのための基礎世界モデルの構築に焦点をあて,それらの意義に関する新たな視点を示す。因果的考察の統合は,世界と有意義な物理的相互作用を促進する上で不可欠であると考えられる。最後に,この文脈における因果性に関する誤解を解き明かすとともに,今後の研究の展望を示す。

Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents would require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions with the real world and are thus not sufficient for Embodied AI. The study of causality lends itself to the construction of veridical world models, which are crucial for accurately predicting the outcomes of possible interactions. This paper focuses on the prospects of building foundation world models for the upcoming generation of embodied agents and presents a novel viewpoint on the significance of causality within these. We posit that integrating causal considerations is vital to facilitate meaningful physical interactions with the world. Finally, we demystify misconceptions about causality in this context and present our outlook for future research.

翻訳日:2024-02-18 13:38:25 公開日:2024-02-06

# No-Code AutoMLによる人間中心のAIプロダクトプロトタイプ - 概念フレームワーク、可能性、限界

Human-Centered AI Product Prototyping with No-Code AutoML: Conceptual Framework, Potentials and Limitations ( http://arxiv.org/abs/2402.07933v1 )

ライセンス: Link先を確認

Mario Truss, Marc Schmitt

(参考訳) 本稿では,AI製品プロトタイピングにおける課題に対する解決策としてNo-Code AutoMLを評価し,非専門家への予測不能と到達不能を特徴とし,概念的枠組みを提案する。このAI製品の複雑さは、人間中心のAI製品にとって不可欠なシームレスな実行と学際的なコラボレーションを妨げる。産業とイノベーションに関連して、戦略的意思決定と投資リスク軽減に影響を及ぼす。現在のアプローチは、AIプロダクトのアイデアの可能性と実現可能性に関する限られた洞察を提供する。この研究はDesign Science Researchを採用し、No-code AutoMLを使ったAIプロダクトプロトタイプのフレームワークを提供することで、課題を特定し、ソリューションとしてNo-code AutoMLを統合する。ケーススタディでは、AI製品開発に対する構造化されたアプローチを提供する非専門家をサポートする可能性を確認している。このフレームワークは、アクセシブルで解釈可能なプロトタイピングを促進し、アカデミック、マネージャ、意思決定者に恩恵を与える。 no-code AutoMLの戦略的統合は効率を高め、非専門家に権限を与え、承認された制限にもかかわらず、アーリーステージの決定を通知する。

This paper evaluates No-Code AutoML as a solution for challenges in AI product prototyping, characterized by unpredictability and inaccessibility to non-experts, and proposes a conceptual framework. This complexity of AI products hinders seamless execution and interdisciplinary collaboration crucial for human-centered AI products. Relevant to industry and innovation, it affects strategic decision-making and investment risk mitigation. Current approaches provide limited insights into the potential and feasibility of AI product ideas. Employing Design Science Research, the study identifies challenges and integrates no-code AutoML as a solution by presenting a framework for AI product prototyping with No-code AutoML. A case study confirms its potential in supporting non-experts, offering a structured approach to AI product development. The framework facilitates accessible and interpretable prototyping, benefiting academia, managers, and decision-makers. Strategic integration of no-code AutoML enhances efficiency, empowers non-experts, and informs early-stage decisions, albeit with acknowledged limitations.

翻訳日:2024-02-18 13:28:32 公開日:2024-02-06

# スキーマ開発のための人間・機械協調フレームワーク

A Human-Machine Collaboration Framework for the Development of Schemas ( http://arxiv.org/abs/2402.07932v1 )

ライセンス: Link先を確認

Nicos Isaak

(参考訳) Winograd Schema Challenge (WSC)は、マシンインテリジェンスのためのよく考えられたテストであり、人間の振る舞いを示すシステムの開発システムに光を当てることが提案されている。導入以来、AIコミュニティの焦点をテクノロジーからAI科学へと転換することを目的としていた。人間にとって一般的で自明な研究は、機械にとって、特に新しいスキーマを扱う必要がある場合、特に、明確な代名詞の解決を必要とするよく設計された文は、依然として困難であることを示している。研究者がチャレンジそのものに関心を持つようになるにつれて、これはおそらく、人間の専門家が合理的に開発できる範囲を超えて、多くのウィノグラードスキーマが利用可能になる必要があるだろう。このニーズに対処するために、人間と機械がチームメイトとしてどのように協力して新しいスキーマをゼロから設計できるかを明確に焦点をあてる新しいフレームワークを提案する。これは2つの最近の研究を組み合わせることで達成されている。 i)winventorは、高品質ではないが、大量のwinogradスキーマを開発するための機械駆動のアプローチで、 ii)WinoFlexiは、クラウドソーシングシステムで、クラウドワーカーが専門家とよく似た品質の限られた数のスキーマを開発することができる。提案手法は,人間と機械の知能を向上し,補完的な強みを生かした新しい協調プラットフォームを開発するための新たなロードマップを構築する。

The Winograd Schema Challenge (WSC), a seemingly well-thought-out test for machine intelligence, has been proposed to shed light on developing systems that exhibit human behavior. Since its introduction, it aimed to pivot the focus of the AI community from the technology to the science of AI. While common and trivial for humans, studies show that it is still challenging for machines, especially when they have to deal with novel schemas, that is, well-designed sentences that require the resolving of definite pronouns. As researchers have become increasingly interested in the challenge itself, this presumably necessitates the availability of an extensive collection of Winograd schemas, which goes beyond what human experts can reasonably develop themselves, especially after proposed ways of utilizing them as novel forms of CAPTCHAs. To address this necessity, we propose a novel framework that explicitly focuses on how humans and machines can collaborate as teammates to design novel schemas from scratch. This is being accomplished by combining two recent studies from the literature: i) Winventor, a machine-driven approach for the development of large amounts of Winograd schemas, albeit not of high quality, and ii) WinoFlexi, an online crowdsourcing system that allows crowd workers to develop a limited number of schemas often of similar quality to that of experts. Our proposal crafts a new road map toward developing a novel collaborative platform that amplifies human and machine intelligence by combining their complementary strengths.

翻訳日:2024-02-18 13:28:16 公開日:2024-02-06

# コンテキスト対応型の自動乗客計数データのデノイジング

Context-Aware Automated Passenger Counting Data Denoising ( http://arxiv.org/abs/2402.08688v1 )

ライセンス: Link先を確認

Noëlie Cherrier, Baptiste Rérolle, Martin Graive, Amir Dib, Eglantine Schmitt

(参考訳) 公共交通網における利用者数に関する信頼性が高く正確な知識は、公共交通事業者や公共団体がネットワークの利用状況を把握し、交通サービスの提供を最適化するために不可欠である。現在、乗客数を推定する手法がいくつか存在し、一部は自動化されている。そのうち、自動旅客カウント(APC)システムは、運行経路の各駅で車両に乗降する乗客を検知する。しかし、これらのシステムから得られるデータは、しばしばノイズが多く、偏りさえあるため、車内乗車人数の過小評価や過大評価に繋がる。本研究では,APCデータのロバスト性向上と解析の容易化を目的としたデノイジングアルゴリズムを提案する。提案手法は制約付き整数線形最適化であり,チケットデータと過去の利用者数データを利用して最適化をさらに制約し,誘導する。性能は、フランスのいくつかの公共交通網における他のデノイジング手法、これらのネットワークの1つで利用可能な手動カウント、およびシミュレーションデータと比較して評価される。

Reliable and accurate knowledge of ridership in public transportation networks is crucial for public transport operators and public authorities to be aware of their network's use and to optimize the transport offering. Several techniques to estimate ridership exist nowadays, some of them automated. Among them, Automatic Passenger Counting (APC) systems detect passengers entering and leaving the vehicle at each station of its course. However, data resulting from these systems are often noisy or even biased, resulting in under- or overestimation of onboard occupancy. In this work, we propose a denoising algorithm for APC data to improve their robustness and ease their analysis. The proposed approach consists of a constrained integer linear optimization, taking advantage of ticketing data and historical ridership data to further constrain and guide the optimization. The performance is assessed and compared to other denoising methods on several public transportation networks in France, to manual counts available on one of these networks, and on simulated data.

翻訳日:2024-02-18 13:14:26 公開日:2024-02-06

# 適応的な選択的シナプス減衰によるパラメータチューニング不要なデータ入力誤りのアンラーニング

Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening ( http://arxiv.org/abs/2402.10098v1 )

ライセンス: Link先を確認

Stefan Schoepf, Jack Foster, Alexandra Brintrup

(参考訳) データ入力は機械学習パイプラインの基本コンポーネントを構成するが、しばしばラベルエラーが発生する。このようなエラーを含むデータセットでモデルがトレーニングされた場合、そのパフォーマンスは低下する。これにより、モデルを完全に再トレーニングすることなく、誤ったデータの影響を効率よく学び、モデルのパフォーマンスを改善することが困難になる。間違ったエントリの正しいラベルが知られている場合、モデル編集方法が存在するが、誤ったデータに対する正しいラベルを知らないデータ入力エラーの場合に焦点を当てる。私たちの貢献は2倍です。まず,選択的シナプス減衰アンラーニング法の拡張を行い,パラメータチューニングの必要性を排除し,実践者が学べるようにした。本稿では,ResNet18とVision Transformerの未学習タスクにおける適応選択的シナプス減衰(ASSD)の性能を示す。次に,実世界データを用いたラベリング誤差を伴うサプライチェーン遅延予測問題において,様々なラベリング誤差のレベルをランダムに導入したasdの性能を示す。このアプローチの適用は、特にサプライチェーン管理のような、excelシートを介してデータ入力のかなりの部分が手動で発生し、エラーが発生しやすい産業環境では魅力的である。 ASSDは、一般的なアンラーニングベンチマークや、誤り訂正のための微調整に優れるエラー訂正問題に強い性能を示す。

Data entry constitutes a fundamental component of the machine learning pipeline, yet it frequently results in the introduction of labelling errors. When a model has been trained on a dataset containing such errors its performance is reduced. This leads to the challenge of efficiently unlearning the influence of the erroneous data to improve the model performance without needing to completely retrain the model. While model editing methods exist for cases in which the correct label for a wrong entry is known, we focus on the case of data entry errors where we do not know the correct labels for the erroneous data. Our contribution is twofold. First, we introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning, making unlearning accessible to practitioners. We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks. Second, we demonstrate the performance of ASSD in a supply chain delay prediction problem with labelling errors using real-world data where we randomly introduce various levels of labelling errors. The application of this approach is particularly compelling in industrial settings, such as supply chain management, where a significant portion of data entry occurs manually through Excel sheets, rendering it error-prone. ASSD shows strong performance on general unlearning benchmarks and on the error correction problem where it outperforms fine-tuning for error correction.

翻訳日:2024-02-18 12:40:20 公開日:2024-02-06

# Carthago Delenda Est:オンラインソーシャルメディアにおける影響工作のための協調競争的な間接情報拡散モデル

Carthago Delenda Est: Co-opetitive Indirect Information Diffusion Model for Influence Operations on Online Social Media ( http://arxiv.org/abs/2402.01905v2 )

ライセンス: Link先を確認

Jwen Fai Low, Benjamin C. M. Fung, Farkhund Iqbal, and Claude Fachkha

(参考訳) 信頼性が破綻している国家または非国家アクターにとって、ボットに頼って、帰属不能で、責任を問われず、一見草の根的だが実際には分散型である影響工作/情報工作(info ops)をソーシャルメディア上で行うことは、自らの利益を推進しながら信頼の欠如という問題を回避するのに役立つ。分散型info opsの計画や防御は、倫理的に問題の多いソーシャルメディア上での実地実験の代わりに、計算シミュレーションによって支援できる。本研究では,Twitterのようなソーシャルメディア上で競合する情報伝播の取り組みを扱うエージェントベースモデルであるDiluvsionを提案する。このモデルは、ある意見(スタンス)に対するユーザの信念が、絶え間なく流入する間接情報の洪水から生じる、錯覚かもしれない大衆的支持の認識に影響されることを重視する。この洪水は、ボットが自らのスタンスを広めようと競争する中で、調整なしに協調的に作り出されうる。実世界のデータに対して検証されたこのモデルは、スタンス採用に影響するエンゲージメント指標、社会的つながりによらない情報の拡散、拡散可能なスタンスとしての中立性、そしてメディアのフレーミング効果に類似しスタンス伝播に関して共生的であるテーマを考慮している点で、これまでのモデルよりも進歩している。Diluvsionモデルの強みは、1つのスタンスの採用の最大化、エコーチャンバーの作成、分極の誘発といった正統的なinfo opsと、テーマを普及させるためのトロイの木馬戦術として複数のスタンスを同時に支援するといった非正統的なinfo opsのシミュレーションで示される。

For a state or non-state actor whose credibility is bankrupt, relying on bots to conduct non-attributable, non-accountable, and seemingly-grassroots-but-decentralized-in-actuality influence/information operations (info ops) on social media can help circumvent the issue of trust deficit while advancing its interests. Planning and/or defending against decentralized info ops can be aided by computational simulations in lieu of ethically-fraught live experiments on social media. In this study, we introduce Diluvsion, an agent-based model for contested information propagation efforts on Twitter-like social media. The model emphasizes a user's belief in an opinion (stance) being impacted by the perception of potentially illusory popular support from constant incoming floods of indirect information, floods that can be cooperatively engineered in an uncoordinated manner by bots as they compete to spread their stances. Our model, which has been validated against real-world data, is an advancement over previous models because we account for engagement metrics in influencing stance adoption, non-social tie spreading of information, neutrality as a stance that can be spread, and themes that are analogous to media's framing effect and are symbiotic with respect to stance propagation. The strengths of the Diluvsion model are demonstrated in simulations of orthodox info ops, e.g., maximizing adoption of one stance; creating echo chambers; inducing polarization; and unorthodox info ops, e.g., simultaneous support of multiple stances as a Trojan horse tactic for the dissemination of a theme.

翻訳日:2024-02-09 18:25:07 公開日:2024-02-06

# 整数最適化によるテンソル補完

Tensor Completion via Integer Optimization ( http://arxiv.org/abs/2402.05141v1 )

ライセンス: Link先を確認

Xin Chen, Sukanya Kudva, Yongzheng Dai, Anil Aswani, Chen Chen

(参考訳) テンソル補完問題の主な課題は、計算能力と情報理論的なサンプル複雑性率との間の根本的な緊張関係である。過去のアプローチは、情報理論的な率を達成するものの対応する解を計算する実用的なアルゴリズムを欠くか、あるいは多項式時間アルゴリズムを持つものの低い推定誤差のために指数関数的に多くのサンプルを必要とするかのいずれかであった。本稿では, 線形回数のオラクルステップでの証明可能な収束(数値許容誤差の範囲内)と情報理論的な率の両方を達成することにより, この緊張を解消する新しいテンソル補完アルゴリズムを開発する。本手法は, ゲージに基づくテンソルノルムを用いて制約された凸最適化問題としてテンソル補完を定式化する。このノルムは、その単位球上の線形分離問題を整数線形最適化で解けるように定義されている。この洞察に基づく適応をフランク・ウルフ法の変種に組み込み、我々のアルゴリズムを構築する。最大1000万エントリを持つテンソルの数値実験を用いて,アルゴリズムが良好にスケールすることを示す。

The main challenge with the tensor completion problem is a fundamental tension between computation power and the information-theoretic sample complexity rate. Past approaches either achieve the information-theoretic rate but lack practical algorithms to compute the corresponding solution, or have polynomial-time algorithms that require an exponentially larger number of samples for low estimation error. This paper develops a novel tensor completion algorithm that resolves this tension by achieving both provable convergence (in numerical tolerance) in a linear number of oracle steps and the information-theoretic rate. Our approach formulates tensor completion as a convex optimization problem constrained using a gauge-based tensor norm, which is defined in a way that allows the use of integer linear optimization to solve linear separation problems over the unit ball in this new norm. Adaptations based on this insight are incorporated into a Frank-Wolfe variant to build our algorithm. We show that our algorithm scales well using numerical experiments on tensors with up to ten million entries.
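
For readers unfamiliar with Frank-Wolfe, the sketch below shows the generic pattern the abstract builds on: each iteration calls a linear minimization oracle over the feasible set and then moves towards the point it returns, so no projection is needed. It is not the paper's algorithm. Here the feasible set is the probability simplex and the oracle simply picks a vertex, whereas the paper's oracle is described as an integer linear program over the unit ball of a gauge-based tensor norm. The names `frank_wolfe_simplex`, `objective`, `gradient`, and the toy target are assumptions for illustration.

```python
def frank_wolfe_simplex(grad, dim, n_iters=2000):
    """Minimize a smooth convex function over the probability simplex (generic sketch)."""
    x = [1.0 / dim] * dim  # start at the centre of the simplex
    for k in range(n_iters):
        g = grad(x)
        # Linear minimization oracle: the simplex vertex with the smallest gradient entry.
        j = min(range(dim), key=lambda i: g[i])
        step = 2.0 / (k + 2.0)  # standard diminishing step size
        # Convex combination of the iterate and vertex e_j, so the iterate stays feasible.
        x = [(1.0 - step) * xi for xi in x]
        x[j] += step
    return x

target = [0.2, 0.5, 0.3]  # made-up point inside the simplex

def objective(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

def gradient(x):
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]

solution = frank_wolfe_simplex(gradient, dim=3)
```

The design point worth noting is that the only access to the constraint set is through the oracle call, which is why swapping in a different oracle (such as an integer program) changes the feasible set without changing the outer loop.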

翻訳日:2024-02-09 17:57:47 公開日:2024-02-06

# Tag-LLM:特殊ドメインのための汎用LLMの再利用

Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains ( http://arxiv.org/abs/2402.05140v1 )

ライセンス: Link先を確認

Junhong Shen, Neil Tenenholtz, James Brian Hall, David Alvarez-Melis, Nicolo Fusi

(参考訳) 大規模言語モデル(LLM)は、自然言語の理解と生成に顕著な能力を示した。しかし、その能力は、物理科学や生物医学などの事前学習コーパスで過小評価された高度に専門化された領域で低下した。本研究は、汎用LLMを専門分野の効率的なタスク解決に活用する方法を探る。 LLMの埋め込み層に付加される連続ベクトルとしてパラメータ化されるカスタム入力タグを学習するための,新しいモデルに依存しないフレームワークを提案する。ドメインタグは特殊表現(例えば化学式)を分離し、ドメイン関連コンテキストを提供するのに使われ、関数タグは特定の関数(例えば分子特性の予測)を表すのに使われ、関数解決命令は圧縮される。補助データとドメイン知識を用いて,これらのタグを学習するための3段階のプロトコルを開発した。タスク領域をタスク関数から明示的に分離することにより、入力タグの多様な組み合わせにより、ゼロショット一般化が可能となる。また、タンパク質や化学的性質の予測や薬物と標的の相互作用のモデリングなど、様々な専門分野におけるLLMの性能を高める。

Large Language Models (LLMs) have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in highly specialized domains underrepresented in the pretraining corpus, such as physical and biomedical sciences. This work explores how to repurpose general LLMs into effective task solvers for specialized domains. We introduce a novel, model-agnostic framework for learning custom input tags, which are parameterized as continuous vectors appended to the LLM's embedding layer, to condition the LLM. We design two types of input tags: domain tags are used to delimit specialized representations (e.g., chemical formulas) and provide domain-relevant context; function tags are used to represent specific functions (e.g., predicting molecular properties) and compress function-solving instructions. We develop a three-stage protocol to learn these tags using auxiliary data and domain knowledge. By explicitly disentangling task domains from task functions, our method enables zero-shot generalization to unseen problems through diverse combinations of the input tags. It also boosts LLM's performance in various specialized domains, such as predicting protein or chemical properties and modeling drug-target interactions, outperforming expert models tailored to these tasks.

翻訳日:2024-02-09 17:57:29 公開日:2024-02-06

# SceMQA: 学術大学入学レベルのマルチモーダル質問に対するベンチマーク

SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark ( http://arxiv.org/abs/2402.05138v1 )

ライセンス: Link先を確認

Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, Xiangliang Zhang

(参考訳) 本稿は,大学入学レベルの科学的マルチモーダル質問応答のための新しいベンチマークであるSceMQAを紹介する。これは、高校から大学入学前までの、既存のベンチマークでしばしば見過ごされる重要な教育段階を対象とする。 SceMQAは数学、物理学、化学、生物学などの中核的な科学分野に焦点を当てている。多肢選択式と自由記述式を併用し、AIモデルの能力を総合的に評価する。さらに,本ベンチマークでは,各問題に対する特定の知識ポイントと,各回答に対する詳細な説明を提供する。 SceMQAはまた、推論能力のより徹底的かつ正確な評価を促進するために、同一の文脈で質問が異なる問題を提示する。実験では,オープンソースおよびクローズドソースの最先端マルチモーダル大規模言語モデル (MLLM) を,様々な実験設定において評価した。その結果,最強のモデルでも精度は50%から60%に過ぎず,より有能なMLLMの開発にはさらなる研究と開発が必要であることが示された。ベンチマークと分析はhttps://scemqa.github.io/で利用可能になる予定です。

The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level. It addresses a critical educational phase often overlooked in existing benchmarks, spanning high school to pre-college levels. SceMQA focuses on core science subjects including Mathematics, Physics, Chemistry, and Biology. It features a blend of multiple-choice and free-response formats, ensuring a comprehensive evaluation of AI models' abilities. Additionally, our benchmark provides specific knowledge points for each problem and detailed explanations for each answer. SceMQA also uniquely presents problems with identical contexts but varied questions to facilitate a more thorough and accurate assessment of reasoning capabilities. In the experiment, we evaluate both open-source and closed-source state-of-the-art Multimodal Large Language Models (MLLMs) across various experimental settings. The results show that further research and development are needed to build more capable MLLMs, as highlighted by the accuracy of only 50% to 60% achieved by the strongest models. Our benchmark and analysis will be available at https://scemqa.github.io/

翻訳日:2024-02-09 17:57:07 公開日:2024-02-06

# LtU-ILI:天体物理学と宇宙論における暗黙の推論のためのオールインワンフレームワーク

LtU-ILI: An All-in-One Framework for Implicit Inference in Astrophysics and Cosmology ( http://arxiv.org/abs/2402.05137v1 )

ライセンス: Link先を確認

Matthew Ho, Deaglan J. Bartlett, Nicolas Chartier, Carolina Cuesta-Lazaro, Simon Ding, Axel Lapel, Pablo Lemos, Christopher C. Lovell, T. Lucas Makinen, Chirag Modi, Viraj Pandya, Shivam Pandey, Lucia A. Perez, Benjamin Wandelt, Greg L. Bryan

(参考訳) 本稿では、天体物理学と宇宙論における、高速かつユーザフレンドリで最先端の機械学習(ML)推論のためのコードベースであるLearning the Universe Implicit Likelihood Inference(LtU-ILI)パイプラインについて述べる。このパイプラインには、さまざまなニューラルアーキテクチャ、訓練スキーム、事前分布、密度推定器を実装するソフトウェアが含まれており、どんな研究ワークフローにも容易に適応できる。事後推定のカバレッジを評価するための包括的な検証指標が含まれており、推定結果の信頼性を高めている。さらにパイプラインは容易に並列化でき、モデリングのハイパーパラメータを効率的に探索できるように設計されている。その能力を示すために、X線測光からの銀河団質量の推定、物質のパワースペクトルとハロー点群からの宇宙論の推論、重力波信号における前駆天体の特徴付け、銀河の色と光度からの物理的なダストパラメータの推定、半解析的な銀河形成モデルの性質の確立など、天体物理学と宇宙論の幅広い問題への実応用を示す。また、実装した全手法の網羅的なベンチマークと比較や、天文学におけるML推論の課題と落とし穴についての議論も含む。すべてのコードとサンプルはhttps://github.com/maho3/ltu-iliで公開されている。

This paper presents the Learning the Universe Implicit Likelihood Inference (LtU-ILI) pipeline, a codebase for rapid, user-friendly, and cutting-edge machine learning (ML) inference in astrophysics and cosmology. The pipeline includes software for implementing various neural architectures, training schema, priors, and density estimators in a manner easily adaptable to any research workflow. It includes comprehensive validation metrics to assess posterior estimate coverage, enhancing the reliability of inferred results. Additionally, the pipeline is easily parallelizable, designed for efficient exploration of modeling hyperparameters. To demonstrate its capabilities, we present real applications across a range of astrophysics and cosmology problems, such as: estimating galaxy cluster masses from X-ray photometry; inferring cosmology from matter power spectra and halo point clouds; characterising progenitors in gravitational wave signals; capturing physical dust parameters from galaxy colors and luminosities; and establishing properties of semi-analytic models of galaxy formation. We also include exhaustive benchmarking and comparisons of all implemented methods as well as discussions about the challenges and pitfalls of ML inference in astronomical sciences. All code and examples are made publicly available at https://github.com/maho3/ltu-ili.

翻訳日:2024-02-09 17:56:49 公開日:2024-02-06

# LV-Eval: 256Kまでの5つのレベルを持つバランスのとれたロングコンテキストベンチマーク

LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K ( http://arxiv.org/abs/2402.05136v1 )

ライセンス: Link先を確認

Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, Yu Wang

(参考訳) 最先端の大規模言語モデル(LLM)は、256k以上のコンテキスト長をサポートすると主張している。対照的に、主流ベンチマークの平均コンテキスト長は不十分(5k-21k)であり、潜在的な知識リークと不正確なメトリクスに悩まされ、バイアスのある評価をもたらす。本稿では、5つの長さレベル(16k、32k、64k、128k、256k)を持ち最大256kワードに達する、挑戦的な長コンテキストベンチマークLV-Evalを紹介する。 LV-Evalは、シングルホップQAとマルチホップQAという2つの主要なタスクを備え、11のバイリンガルデータセットからなる。 LV-Evalの設計には、紛らわしい事実の挿入、キーワードと句の置換、キーワードリコールに基づくメトリックデザインという3つの重要な技法が組み込まれている。 LV-Evalの利点は、異なるコンテキスト長にわたる制御可能な評価、紛らわしい事実を含む挑戦的なテストインスタンス、知識リークの軽減、より客観的な評価である。 LV-Evalで10個のLLMを評価し、LV-Evalの構築に用いた技法に関するアブレーション研究を行った。その結果、以下のことが判明した。 (i)商用LLMは、主張されているコンテキスト長よりも短い長さレベルで評価した場合、一般的にオープンソースLLMよりも優れる。しかし、その全体的な性能は、より長いコンテキスト長を持つオープンソースのLLMに上回られる。 (ii)Yi-6B-200kのような超長文LLMは比較的穏やかな性能低下を示すが、その絶対性能は文脈長が短いLLMよりも必ずしも高いとは限らない。 (iii)LLMの性能は、紛らわしい情報の存在下で、特に「干し草の中の針」(needle in a haystack)の圧力試験において著しく低下する可能性がある。 (iv)知識リークや不正確な指標に関する問題は評価にバイアスをもたらし、これらの懸念はLV-Evalで緩和される。すべてのデータセットと評価コードは、https://github.com/infinigence/LVEvalでリリースされている。

State-of-the-art large language models (LLMs) are now claiming remarkable supported context lengths of 256k or even more. In contrast, the average context lengths of mainstream benchmarks are insufficient (5k-21k), and they suffer from potential knowledge leakage and inaccurate metrics, resulting in biased evaluation. This paper introduces LV-Eval, a challenging long-context benchmark with five length levels (16k, 32k, 64k, 128k, and 256k) reaching up to 256k words. LV-Eval features two main tasks, single-hop QA and multi-hop QA, comprising 11 bilingual datasets. The design of LV-Eval has incorporated three key techniques, namely confusing facts insertion, keyword and phrase replacement, and keyword-recall-based metric design. The advantages of LV-Eval include controllable evaluation across different context lengths, challenging test instances with confusing facts, mitigated knowledge leakage, and more objective evaluations. We evaluate 10 LLMs on LV-Eval and conduct ablation studies on the techniques used in LV-Eval construction. The results reveal that: (i) Commercial LLMs generally outperform open-source LLMs when evaluated within length levels shorter than their claimed context length. However, their overall performance is surpassed by open-source LLMs with longer context lengths. (ii) Extremely long-context LLMs, such as Yi-6B-200k, exhibit a relatively gentle degradation of performance, but their absolute performances may not necessarily be higher than those of LLMs with shorter context lengths. (iii) LLMs' performances can significantly degrade in the presence of confusing information, especially in the pressure test of "needle in a haystack". (iv) Issues related to knowledge leakage and inaccurate metrics introduce bias in evaluation, and these concerns are alleviated in LV-Eval. All datasets and evaluation codes are released at: https://github.com/infinigence/LVEval.

翻訳日:2024-02-09 17:56:27 公開日:2024-02-06

# CADReN:制御可能なクロスグラフノードインポート推定のためのコンテキストアンカー駆動リレーショナルネットワーク

CADReN: Contextual Anchor-Driven Relational Network for Controllable Cross-Graphs Node Importance Estimation ( http://arxiv.org/abs/2402.05135v1 )

ライセンス: Link先を確認

Zijie Zhong, Yunhui Zhang, Ziyi Chang, Zengchang Qin

(参考訳) ノード重要度推定(NIE)は、Retriever-Augmented Generationを通じて外部情報を大規模言語モデルに統合するために重要である。静的なシングルグラフの特徴に注目した従来の方法は、新しいグラフやユーザ固有の要件への適応性に欠ける。提案手法であるCADReNは、コンテキストアンカー(CA)機構を導入し、これらの制約に対処する。このアプローチにより、ネットワークは知識グラフ(KG)の構造的特徴と意味的特徴の両方を考慮して、CAに対するノードの重要性を評価することができる。広汎な実験により,CADReNはゼロショット予測能力を持つクロスグラフNIEタスクにおいて,より良い性能を実現することが示された。 CADReNは、シングルグラフNIEタスクにおける以前のモデルの性能と一致することが証明されている。さらに,NIEのクロスグラフ研究に特化して設計されたRIC200とWK1Kという2つの新しいデータセットをオープンソースとして公開し,今後の発展に有用なリソースを提供する。

Node Importance Estimation (NIE) is crucial for integrating external information into Large Language Models through Retriever-Augmented Generation. Traditional methods, focusing on static, single-graph characteristics, lack adaptability to new graphs and user-specific requirements. CADReN, our proposed method, addresses these limitations by introducing a Contextual Anchor (CA) mechanism. This approach enables the network to assess node importance relative to the CA, considering both structural and semantic features within Knowledge Graphs (KGs). Extensive experiments show that CADReN achieves better performance on the cross-graph NIE task, with zero-shot prediction ability. CADReN is also shown to match the performance of previous models on the single-graph NIE task. Additionally, we introduce and open-source two new datasets, RIC200 and WK1K, specifically designed for cross-graph NIE research, providing a valuable resource for future developments in this domain.

翻訳日:2024-02-09 17:55:54 公開日:2024-02-06

# パーソナライズされた人間のフィードバックからのパーソナライズド言語モデリング

Personalized Language Modeling from Personalized Human Feedback ( http://arxiv.org/abs/2402.05133v1 )

ライセンス: Link先を確認

Xinyu Li, Zachary C. Lipton, Liu Leqi

(参考訳) Reinforcement Learning from Human Feedback (RLHF) は、人間の好みに合わせて大きな言語モデルを微調整する、現在の支配的なフレームワークである。しかし、このフレームワークで開発されたアルゴリズムの前提は、人間のフィードバックに符号化されたユーザの好みが多様である場合に問題となる。本研究では,パーソナライズされた言語モデルを構築する手法の開発により,この問題に対処しようとする。まず、パーソナライズされた人間のフィードバックから学習するタスクを形式的に導入し、なぜバニラRLHFがこの文脈で問題となりうるのかを説明する。次に、ユーザモデルと言語(あるいは報酬)モデルを共同で学習する必要がある一般的なPersonalized-RLHF(P-RLHF)フレームワークを提案する。ユーザモデルはユーザ情報を取り込み、ユーザ表現を出力する。その構造は、フィードバックデータの背後にあるユーザの好みに関する我々の仮定をエンコードする。我々はパーソナライズされた報酬モデリングとパーソナライズされたDirect Preference Optimizationのための新しい学習目標を開発した。本手法の有効性を示すために,アノテーション付きの選好情報とアノテータ情報を含む実世界のテキスト要約データを用いてテストを行った。 GPT-J 6Bを微調整してパーソナライズされた言語(と報酬)モデルを獲得し、それらは個人の好みへの整合という点で非パーソナライズモデルを上回る。

Reinforcement Learning from Human Feedback (RLHF) is the current dominating framework to fine-tune large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data. We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimization. To demonstrate the efficacy of our method, we test it on real-world text summarization data with annotated preferences and annotator information. We fine-tune GPT-J 6B to obtain personalized language (and reward) models, which outperform non-personalized models in terms of aligning with individual preferences.

翻訳日:2024-02-09 17:55:36 公開日:2024-02-06

# 無線ネットワークにおける協調スペクトル学習のための媒体アクセス制御プロトコル

Medium Access Control protocol for Collaborative Spectrum Learning in Wireless Networks ( http://arxiv.org/abs/2111.12581v2 )

ライセンス: Link先を確認

Tomer Boyarski, Wenbo Wang, Amir Leshem

(参考訳) 近年,スペクトル協調のための学習アルゴリズムの提供に力を入れている。本稿では,高負荷ネットワークにおいて,最小限の後悔と高いスペクトル効率でスペクトル協調を実現するメディアアクセス制御プロトコルを提案する。アドホックネットワークにおけるスペクトル協調のための完全分散アルゴリズムを提案する。このアルゴリズムは、チャネル割り当てとアクセススケジューリングの問題を共同で解決する。アルゴリズムが最適対数的後悔を持つことを証明する。このアルゴリズムに基づき、アドホックネットワークにおけるアルゴリズムの分散実装を可能にする媒体アクセス制御プロトコルを提供する。このプロトコルは、単一チャネルオポチュニストキャリアセンシングを使用して、時間と周波数の低複雑さ分散オークションを実行する。また,有界フレームサイズや収束速度などの実践的実装問題についても論じる。アルゴリズムと最先端の分散媒体アクセス制御プロトコルを比較したコンピュータシミュレーションは,提案手法の大きな利点を示している。

In recent years there has been a growing effort to provide learning algorithms for spectrum collaboration. In this paper we present a medium access control protocol which allows spectrum collaboration with minimal regret and high spectral efficiency in highly loaded networks. We present a fully-distributed algorithm for spectrum collaboration in congested ad-hoc networks. The algorithm jointly solves both the channel allocation and access scheduling problems. We prove that the algorithm has an optimal logarithmic regret. Based on the algorithm we provide a medium access control protocol which allows distributed implementation of the algorithm in ad-hoc networks. The protocol utilizes single-channel opportunistic carrier sensing to carry out a low-complexity distributed auction in time and frequency. We also discuss practical implementation issues such as bounded frame size and speed of convergence. Computer simulations comparing the algorithm to state-of-the-art distributed medium access control protocols show the significant advantage of the proposed scheme.

翻訳日:2024-02-08 21:12:28 公開日:2024-02-06

# 半教師付き学習による脳内出血検出と分節化の一般化

Semi-supervised learning for generalizable intracranial hemorrhage detection and segmentation ( http://arxiv.org/abs/2105.00582v2 )

ライセンス: Link先を確認

Emily Lin, Esther Yuh

(参考訳) 目的: 分布外の頭部CT評価セットにおける頭蓋内出血の検出と分節化のための半教師付き学習モデルを開発し、評価すること。材料と方法: この後ろ向き研究では、半教師付き学習を用いて性能をブートストラップした。最初の"teacher"ディープラーニングモデルは、2010年から2017年にかけて米国の1施設で収集された、ピクセル単位でラベル付けされた457件の頭部CTスキャンで訓練され、RSNAとASNRによる25,000件の検査からなる別のラベルなしコーパスに対して擬似ラベルを生成するために使用された。 2つ目の"student"モデルは、このピクセルラベルと擬似ラベルを合わせたデータセットで訓練された。 93件のスキャンからなる検証セットでハイパーパラメータチューニングが行われた。分布外での汎化可能性を評価するため、インドで撮影された481件のスキャンからなるデータセットCQ500で、分類(n=481検査)と分節化(n=23検査、529画像)の両方をテストした。半教師付きモデルと、ラベル付きデータのみで訓練したベースラインモデルを、受信者動作特性曲線下面積(AUC)、Dice類似度係数(DSC)、平均精度(AP)の各指標を用いて比較した。結果: 半教師付きモデルは、CQ500における検査単位のAUCがベースラインと比較して統計的に有意に高かった(0.939 [0.938, 0.940] vs. 0.907 [0.906, 0.908])(p=0.009)。また、DSC (0.829 [0.825, 0.833] vs. 0.809 [0.803, 0.812]) (p=0.012) とPixel AP (0.848 [0.843, 0.853] vs. 0.828 [0.817, 0.828]) もベースラインより高い値を示した。結論: 半教師付き学習フレームワークにおけるラベルなしデータの追加は、教師付きベースラインと比較して、頭蓋内出血の検出と分節化においてより強い汎化可能性を示す。

Purpose: To develop and evaluate a semi-supervised learning model for intracranial hemorrhage detection and segmentation on an out-of-distribution head CT evaluation set. Materials and Methods: This retrospective study used semi-supervised learning to bootstrap performance. An initial "teacher" deep learning model was trained on 457 pixel-labeled head CT scans collected from one US institution from 2010-2017 and used to generate pseudo-labels on a separate unlabeled corpus of 25000 examinations from the RSNA and ASNR. A second "student" model was trained on this combined pixel- and pseudo-labeled dataset. Hyperparameter tuning was performed on a validation set of 93 scans. Testing for both classification (n=481 examinations) and segmentation (n=23 examinations, or 529 images) was performed on CQ500, a dataset of 481 scans performed in India, to evaluate out-of-distribution generalizability. The semi-supervised model was compared with a baseline model trained on only labeled data using area under the receiver operating characteristic curve (AUC), Dice similarity coefficient (DSC), and average precision (AP) metrics. Results: The semi-supervised model achieved statistically significantly higher examination AUC on CQ500 compared with the baseline (0.939 [0.938, 0.940] vs. 0.907 [0.906, 0.908]) (p=0.009). It also achieved a higher DSC (0.829 [0.825, 0.833] vs. 0.809 [0.803, 0.812]) (p=0.012) and Pixel AP (0.848 [0.843, 0.853] vs. 0.828 [0.817, 0.828]) compared to the baseline. Conclusion: The addition of unlabeled data in a semi-supervised learning framework demonstrates stronger generalizability potential for intracranial hemorrhage detection and segmentation compared with a supervised baseline.
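
The Dice similarity coefficient (DSC) used as the segmentation metric above can be illustrated with a minimal sketch. The function name `dice_coefficient`, the flat 0/1 mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for this example; none of them are taken from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Pixels marked positive in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total number of positive pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# One overlapping pixel out of two positives in each mask.
example_score = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

A real evaluation would compute this per examination over 2D or 3D masks; the flat list is used only to keep the sketch dependency-free.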

翻訳日:2024-02-08 21:12:16 公開日:2024-02-06

# 多変量確率CRPS学習と日頭電力価格への応用

Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices ( http://arxiv.org/abs/2303.10019v3 )

ライセンス: Link先を確認

Jonathan Berrisch, Florian Ziel

(参考訳) 本稿では,オンライン学習が可能なスムーズな手順により,量子と辺縁の依存関係を考慮し,多変量確率予測を結合(あるいは集約)する新しい手法を提案する。本稿では,基底行列を用いた次元性低減とペナルティ化平滑化の2つの平滑化手法について検討する。新しいオンライン学習アルゴリズムは、標準CRPS学習フレームワークを多変量次元に一般化する。これはBernstein Online Aggregation (BOA)に基づいており、最適な漸近学習特性をもたらす。この手順は水平アグリゲーション、すなわち量子的集合を用いる。本稿では,提案アルゴリズムの拡張の可能性と,既存文献に関連するネスト事例について,オンライン予測の組み合わせについて詳細に検討する。提案手法を24次元分布予測である日頭電力価格の予測に適用する。提案手法は,CRPS(Continuous Rank probability score)の観点から,均一な組み合わせよりも顕著な改善をもたらす。重みとハイパーパラメータの時間的進化について論じ, 推奨モデルの縮小版の結果を示す。提案アルゴリズムの高速なC++実装は、CRAN上のオープンソースのR-Package profocで提供される。

This paper presents a new method for combining (or aggregating or ensembling) multivariate probabilistic forecasts, considering dependencies between quantiles and marginals through a smoothing procedure that allows for online learning. We discuss two smoothing methods: dimensionality reduction using Basis matrices and penalized smoothing. The new online learning algorithm generalizes the standard CRPS learning framework into multivariate dimensions. It is based on Bernstein Online Aggregation (BOA) and yields optimal asymptotic learning properties. The procedure uses horizontal aggregation, i.e., aggregation across quantiles. We provide an in-depth discussion on possible extensions of the algorithm and several nested cases related to the existing literature on online forecast combination. We apply the proposed methodology to forecasting day-ahead electricity prices, which are 24-dimensional distributional forecasts. The proposed method yields significant improvements over uniform combination in terms of continuous ranked probability score (CRPS). We discuss the temporal evolution of the weights and hyperparameters and present the results of reduced versions of the preferred model. A fast C++ implementation of the proposed algorithm is provided in the open-source R-Package profoc on CRAN.
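
As a rough illustration of the quantities named in this abstract, the sketch below combines two experts' quantile forecasts with one weight per expert and quantile level, and scores the result with the pinball loss, whose average over a quantile grid (times two) approximates the CRPS. The function names, the three-level grid, and the fixed equal weights are assumptions for this example; the online weight updates (BOA) and the smoothing steps described in the abstract are not implemented here.

```python
def pinball_loss(y, q, tau):
    """Pinball (quantile) loss of quantile forecast q at level tau for outcome y."""
    return tau * (y - q) if y >= q else (1.0 - tau) * (q - y)


def approx_crps(y, quantiles, taus):
    """Approximate the CRPS as twice the mean pinball loss over a quantile grid."""
    losses = [pinball_loss(y, q, t) for q, t in zip(quantiles, taus)]
    return 2.0 * sum(losses) / len(losses)


def combine(forecasts, weights):
    """Combine expert quantile forecasts level by level.

    forecasts[e][k] is expert e's forecast at quantile level k;
    weights[e][k] is the weight of expert e at that level.
    """
    n_levels = len(forecasts[0])
    return [sum(w[k] * f[k] for w, f in zip(weights, forecasts)) for k in range(n_levels)]


taus = [0.1, 0.5, 0.9]                      # hypothetical quantile grid
expert_a = [8.0, 10.0, 12.0]                # hypothetical forecasts
expert_b = [9.0, 11.0, 13.0]
weights = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]  # uniform combination

combined = combine([expert_a, expert_b], weights)
score = approx_crps(10.0, combined, taus)
```

Letting the weights differ across quantile levels is what allows a combination to favor one expert in the tails and another near the median.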

翻訳日:2024-02-08 21:02:12 公開日:2024-02-06

# マルチクラスグラフニューラルネットワークにおける符号付き伝播の再検討

Revisiting Signed Propagation for Multi-Class Graph Neural Networks ( http://arxiv.org/abs/2301.08918v5 )

ライセンス: Link先を確認

Yoonhyuk Choi, Jiho Choi, Taewook Ko, Chong-Kwon Kim

(参考訳) 隣接ノードから情報を収集するメッセージパスグラフニューラルネットワーク(GNN)は、異種グラフ上で不適切なパフォーマンスを達成する。この問題を解決するための様々なスキームが提案され、異種縁に署名された情報を伝播することが注目されている。近年では、符号付き伝搬が常にバイナリクラスのシナリオでパフォーマンス改善につながるという理論的解析が提供されている。しかし、事前解析がマルチクラスベンチマークデータセットとうまく一致しないことに気付きました。メッセージパッシング(Message-passing):2つのノードが異なるクラスに属し、高い類似性を持つ場合、署名された伝搬は分離性を低下させることができる。 2) パラメータ更新: 署名された隣人の予測の不確実性(例えば衝突証拠)は、トレーニング中に増加し、アルゴリズムの安定性を阻害する。本研究は,マルチクラスグラフに基づく署名伝達を改善するための2つの新しい手法を提案する。提案手法はキャリブレーションとロバスト性を確保しつつ不確実性を低減させる。 6つのベンチマークグラフデータセットに対する広範な実験により,本定理の有効性を示す。

Message-passing Graph Neural Networks (GNNs), which collect information from adjacent nodes, achieve dismal performance on heterophilic graphs. Various schemes have been proposed to solve this problem, and propagating signed information on heterophilic edges has gained great attention. Recently, some works provided theoretical analysis that signed propagation always leads to performance improvement under a binary class scenario. However, we notice that prior analyses do not align well with multi-class benchmark datasets. This paper provides a new understanding of signed propagation for multi-class scenarios and points out two drawbacks in terms of message-passing and parameter update: (1) Message-passing: if two nodes belong to different classes but have a high similarity, signed propagation can decrease the separability. (2) Parameter update: the prediction uncertainty (e.g., conflict evidence) of signed neighbors increases during training, which can impede the stability of the algorithm. Based on this observation, we introduce two novel strategies for improving signed propagation under multi-class graphs. The proposed scheme combines calibration to secure robustness while reducing uncertainty. We show the efficacy of our theorem through extensive experiments on six benchmark graph datasets.

翻訳日:2024-02-08 21:01:43 公開日:2024-02-06

# RenderDiffusion:3次元再構成・塗装・生成のための画像拡散

RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation ( http://arxiv.org/abs/2211.09869v3 )

ライセンス: Link先を確認

Titas Anciukevicius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J. Mitra, Paul Guerrero

(参考訳) 拡散モデルは現在、条件付きおよび無条件画像生成の両方において最先端の性能を達成している。しかし、これまでの画像拡散モデルは、ビュー一貫性のある3D生成やシングルビューオブジェクト再構成のような3D理解に必要なタスクをサポートしていない。本稿では,単分子2次元監視のみを用いてトレーニングした3次元生成と推論のための最初の拡散モデルであるRenderDiffusionを提案する。提案手法の中心となるのは,シーンの中間的な3次元表現を生成・描画する新しい画像復調アーキテクチャである。これは拡散過程の中で強い誘導構造を強制し、2次元の監督しか必要とせず、3次元の一貫した表現を提供する。得られた3d表現は、任意のビューからレンダリングできる。 FFHQ,AFHQ,ShapeNet,CLEVRのデータセット上でRenderDiffusionを評価し,3Dシーンの生成と2D画像からの3Dシーンの推測の競合性能を示した。さらに、拡散ベースのアプローチでは、2dインペインティングを使って3dシーンを編集できます。

Diffusion models currently achieve state-of-the-art performance for both conditional and unconditional image generation. However, so far, image diffusion models do not support tasks required for 3D understanding, such as view-consistent 3D generation or single-view object reconstruction. In this paper, we present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision. Central to our method is a novel image denoising architecture that generates and renders an intermediate three-dimensional representation of a scene in each denoising step. This enforces a strong inductive structure within the diffusion process, providing a 3D consistent representation while only requiring 2D supervision. The resulting 3D representation can be rendered from any view. We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images. Additionally, our diffusion-based approach allows us to use 2D inpainting to edit 3D scenes.

翻訳日:2024-02-08 20:59:07 公開日:2024-02-06

# 局所差分プライベート機構の収縮

Contraction of Locally Differentially Private Mechanisms ( http://arxiv.org/abs/2210.13386v3 )

ライセンス: Link先を確認

Shahab Asoodeh and Huanyu Zhang

(参考訳) 局所差分プライベートな機構の収縮特性について検討する。具体的には、$\epsilon$-LDP機構$K$の出力分布$PK$と$QK$の間のダイバージェンスについて、対応する入力分布$P$と$Q$の間のダイバージェンスを用いた厳密な上界を導出する。我々の最初の主要な技術的結果は、$\chi^2$-ダイバージェンス$\chi^2(PK\|QK)$に対する、$\chi^2(P\|Q)$と$\varepsilon$を用いた鋭い上界を示す。また、KLダイバージェンスや二乗ヘリンジャー距離を含む大きなダイバージェンスの族についても同様の結果が成り立つことを示す。第2の主要な技術的結果は、全変動距離$\mathsf{TV}(P, Q)$と$\epsilon$を用いた$\chi^2(PK\|QK)$の上界を与える。次に、これらの上界を利用して、ミニマックス推定リスクを評価するための強力なツールであるvan Treesの不等式、Le Camの手法、Assouadの手法、および相互情報量法の局所プライベート版を確立する。これらの結果は、エントロピーや離散分布の推定、ノンパラメトリック密度推定、仮説検定といったいくつかの統計問題において、最先端技術よりも優れたプライバシー分析をもたらすことが示されている。

We investigate the contraction properties of locally differentially private mechanisms. More specifically, we derive tight upper bounds on the divergence between the output distributions $PK$ and $QK$ of an $\epsilon$-LDP mechanism $K$ in terms of a divergence between the corresponding input distributions $P$ and $Q$. Our first main technical result presents a sharp upper bound on the $\chi^2$-divergence $\chi^2(PK\|QK)$ in terms of $\chi^2(P\|Q)$ and $\varepsilon$. We also show that the same result holds for a large family of divergences, including KL-divergence and squared Hellinger distance. The second main technical result gives an upper bound on $\chi^2(PK\|QK)$ in terms of total variation distance $\mathsf{TV}(P, Q)$ and $\epsilon$. We then utilize these bounds to establish locally private versions of the van Trees inequality, Le Cam's, Assouad's, and the mutual information methods, which are powerful tools for bounding minimax estimation risks. These results are shown to lead to better privacy analyses than the state of the art in several statistical problems such as entropy and discrete distribution estimation, non-parametric density estimation, and hypothesis testing.
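
To make the notion of contraction concrete, the following sketch pushes two Bernoulli inputs $P$ and $Q$ through binary randomized response, a standard $\epsilon$-LDP mechanism, and compares $\chi^2(PK\|QK)$ with $\chi^2(P\|Q)$. The function names and the example values are assumptions for illustration. The factor $\tanh^2(\epsilon/2) = \left(\frac{e^\epsilon-1}{e^\epsilon+1}\right)^2$ checked at the end is an elementary property of this particular binary mechanism and is not quoted from the paper's theorems.

```python
import math


def randomized_response(p1, eps):
    """Return P(output = 1) when a Bernoulli(p1) input passes through
    binary randomized response that flips the bit with probability 1/(1+e^eps)."""
    flip = 1.0 / (1.0 + math.exp(eps))
    return p1 * (1.0 - flip) + (1.0 - p1) * flip


def chi2_bernoulli(p1, q1):
    """Chi-squared divergence chi^2(P || Q) between Bernoulli(p1) and Bernoulli(q1)."""
    diff_sq = (p1 - q1) ** 2
    return diff_sq / q1 + diff_sq / (1.0 - q1)


eps = 1.0          # hypothetical privacy level
p1, q1 = 0.9, 0.2  # hypothetical input distributions

before = chi2_bernoulli(p1, q1)
after = chi2_bernoulli(randomized_response(p1, eps), randomized_response(q1, eps))

ratio = after / before              # observed contraction of the divergence
factor = math.tanh(eps / 2.0) ** 2  # ((e^eps - 1) / (e^eps + 1))^2
```

Smaller values of `eps` shrink `factor` toward zero, which matches the intuition that stronger privacy makes the two output distributions harder to tell apart.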

翻訳日:2024-02-08 20:58:49 公開日:2024-02-06

# 絡み合い支援通信のためのフォールトトレラント符号化

Fault-tolerant Coding for Entanglement-Assisted Communication ( http://arxiv.org/abs/2210.02939v2 )

ライセンス: Link先を確認

Paula Belzig, Matthias Christandl, Alexander Müller-Hermes

(参考訳) チャネル容量は、ノイズの多いチャネル上で情報を確実に送信する最適な速度を定量化する。通常、キャパシティの研究は、送信側と受信側がエンコードとデコードに使用する回路が完全なノイズのないゲートからなると仮定している。しかし、量子チャネル上の通信の場合、この仮定は、デコヒーレンスの過程によって影響を受ける量子情報の脆弱さのために、長期的にも非現実的であると広く信じられている。そのため、ChristandlとMüller-Hermesは、量子チャネルのフォールトトレラントチャネル符号化、すなわちエンコーダ回路とデコーダ回路がノイズに影響を受けるコーディングスキームの研究を開始し、フォールトトレラント量子コンピューティングの技法を用いて古典的および量子的情報を送信するための符号化定理を確立した。ここでは,これらの手法を絡み合い支援通信の場合,特にゲートエラーがゼロに近づくと,耐故障能力が通常の容量に近づくことを示す。独立した関心を持つと思われる主なツールは、フォールトトレラントなエンタングルメント蒸留の導入である。さらに,他のフォールトトレラントな通信シナリオでも容易に適用できるように,使用されるテクニックのモジュール化にも重点を置いています。

Channel capacities quantify the optimal rates of sending information reliably over noisy channels. Usually, the study of capacities assumes that the circuits which sender and receiver use for encoding and decoding consist of perfectly noiseless gates. In the case of communication over quantum channels, however, this assumption is widely believed to be unrealistic, even in the long-term, due to the fragility of quantum information, which is affected by the process of decoherence. Christandl and Müller-Hermes have therefore initiated the study of fault-tolerant channel coding for quantum channels, i.e. coding schemes where encoder and decoder circuits are affected by noise, and have used techniques from fault-tolerant quantum computing to establish coding theorems for sending classical and quantum information in this scenario. Here, we extend these methods to the case of entanglement-assisted communication, in particular proving that the fault-tolerant capacity approaches the usual capacity when the gate error approaches zero. A main tool, which might be of independent interest, is the introduction of fault-tolerant entanglement distillation. We furthermore focus on the modularity of the techniques used, so that they can be easily adopted in other fault-tolerant communication scenarios.

翻訳日:2024-02-08 20:58:21 公開日:2024-02-06

# 確率的未発達連帯学習

Stochastic Unrolled Federated Learning ( http://arxiv.org/abs/2305.15371v2 )

ライセンス: Link先を確認

Samar Hadou, Navid NaderiAlizadeh, and Alejandro Ribeiro

(参考訳) アルゴリズムの展開は、学習ベースの最適化パラダイムとして登場し、学習可能なニューラルネットワークオプティマイザで断続的な反復アルゴリズムを展開する。本研究では,その収束を早めるために,アルゴリズムを連帯学習に拡張する手法である確率的連帯学習(surf)を提案する。提案手法は,この拡張の2つの課題,すなわち,非学習最適化者にデータセット全体を供給して,学習の降下方向と分散的な性質を見出す必要性に対処する。我々は,各階層に確率的ミニバッチを供給し,その収束を保証するために降下制約を課すことにより,従来の課題を回避する。本稿では,分散勾配降下(dgd)アルゴリズムをグラフニューラルネットワーク(gnn)ベースの未ロールアーキテクチャに展開することで,連合学習におけるトレーニングの分散性を維持することで,後者の課題に対処する。提案したアンロール最適化器がほぼ最適領域に無限に収束することを理論的に証明する。また,広範な数値実験を通じて,画像分類器の協調学習における提案手法の有効性を実証する。

Algorithm unrolling has emerged as a learning-based optimization paradigm that unfolds truncated iterative algorithms in trainable neural-network optimizers. We introduce Stochastic UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning in order to expedite its convergence. Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers to find a descent direction and the decentralized nature of federated learning. We circumvent the former challenge by feeding stochastic mini-batches to each unrolled layer and imposing descent constraints to guarantee its convergence. We address the latter challenge by unfolding the distributed gradient descent (DGD) algorithm in a graph neural network (GNN)-based unrolled architecture, which preserves the decentralized nature of training in federated learning. We theoretically prove that our proposed unrolled optimizer converges to a near-optimal region infinitely often. Through extensive numerical experiments, we also demonstrate the effectiveness of the proposed framework in collaborative training of image classifiers.
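
The distributed gradient descent (DGD) iteration that the abstract says is unfolded into layers can be sketched in its plain, non-learned form. In this assumed toy setup, agents sit on a ring, each agent $i$ holds the local loss $\frac{1}{2}(x - a_i)^2$, and every step mixes neighbors' iterates before taking a local gradient step. The ring topology, the equal mixing weights, the step size, and the function name `dgd_ring` are illustrative choices; the learned GNN-based unrolled optimizer and the descent constraints are not shown.

```python
def dgd_ring(targets, step=0.1, iters=200):
    """Plain DGD on a ring graph; agent i minimizes 0.5 * (x - targets[i])**2."""
    n = len(targets)
    x = [0.0] * n
    for _ in range(iters):
        # Consensus step: average own iterate with the two ring neighbors
        # (equal weights of 1/3 form a doubly stochastic mixing matrix).
        mixed = [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]
        # Local step: each agent uses only the gradient of its own loss.
        x = [mixed[i] - step * (x[i] - targets[i]) for i in range(n)]
    return x


targets = [1.0, 2.0, 3.0, 4.0]   # hypothetical local data, one value per agent
iterates = dgd_ring(targets)
network_average = sum(iterates) / len(iterates)
```

With a constant step size the network average settles at the global minimizer (the mean of the targets), while individual agents stay in a neighborhood of it; reaching a good region in a small, fixed number of such steps is the kind of behavior an unrolled optimizer is trained for.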

翻訳日:2024-02-08 20:51:16 公開日:2024-02-06

# スケールでの解釈可能性:アルパカにおける因果メカニズムの解明

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca ( http://arxiv.org/abs/2305.08809v3 )

ライセンス: Link先を確認

Zhengxuan Wu, Atticus Geiger, Thomas Icard, Christopher Potts, Noah D. Goodman

(参考訳) 大規模で汎用的な言語モデルの人間解釈可能な説明を得ることは、AI安全性の緊急の目標である。しかし、我々の解釈可能性手法が、モデル行動の根底にある因果ダイナミクスに忠実であり、未知の入力に頑健に一般化できることも同様に重要である。分散アライメント探索(DAS)は、因果抽象理論に基づく強力な勾配降下法であり、解釈可能なシンボルアルゴリズムと特定のタスクのために微調整された小さなディープラーニングモデルとの完全な整合性を発見した。本稿では,残ったブルートフォースサーチステップを学習パラメータに置き換え,Boundless DASと呼ぶアプローチにより,DASを大幅にスケールさせる。これにより、命令に従う間、大規模言語モデルにおける解釈可能な因果構造を効率的に探索できる。 Boundless DASを、単純な数値推論問題をそのままの状態で解くAlpacaモデル(7Bパラメータ)に適用する。 Boundless DASにより、2つの解釈可能なブール変数を持つ因果モデルを実装することでAlpacaがこれを行うことが分かる。さらに,これらの変数に対する神経表現のアライメントは,入力や命令の変化に対して頑健であることが判明した。これらの発見は、成長を続け、最も広く展開されている言語モデルの内部動作を忠実に理解するための第一歩である。私たちのツールはより大きなLLMに拡張可能で、https://github.com/stanfordnlp/pyveneで公開されています。

Obtaining human-interpretable explanations of large, general-purpose language models is an urgent goal for AI safety. However, it is just as important that our interpretability methods are faithful to the causal dynamics underlying model behavior and able to robustly generalize to unseen inputs. Distributed Alignment Search (DAS) is a powerful gradient descent method grounded in a theory of causal abstraction that has uncovered perfect alignments between interpretable symbolic algorithms and small deep learning models fine-tuned for specific tasks. In the present paper, we scale DAS significantly by replacing the remaining brute-force search steps with learned parameters -- an approach we call Boundless DAS. This enables us to efficiently search for interpretable causal structure in large language models while they follow instructions. We apply Boundless DAS to the Alpaca model (7B parameters), which, off the shelf, solves a simple numerical reasoning problem. With Boundless DAS, we discover that Alpaca does this by implementing a causal model with two interpretable boolean variables. Furthermore, we find that the alignment of neural representations with these variables is robust to changes in inputs and instructions. These findings mark a first step toward faithfully understanding the inner workings of our ever-growing and most widely deployed language models. Our tool is extensible to larger LLMs and is released publicly at https://github.com/stanfordnlp/pyvene.

翻訳日:2024-02-08 20:49:30 公開日:2024-02-06

# 開量子系に対する適応変分シミュレーション

Adaptive variational simulation for open quantum systems ( http://arxiv.org/abs/2305.06915v2 )

ライセンス: Link先を確認

Huo Chen, Niladri Gomes, Siyuan Niu and Wibe Albert de Jong

(参考訳) 量子ハードウェアは量子シミュレーションの新しい可能性を提供する。研究の多くはクローズド量子システムのシミュレーションに重点を置いているが、現実の量子システムは大部分がオープンである。したがって、オープン量子システムを効果的にシミュレートできる量子アルゴリズムを開発することが不可欠である。本稿では,lindblad方程式によって記述された開量子系ダイナミクスをシミュレートする適応変分量子アルゴリズムを提案する。このアルゴリズムは,シミュレーション精度を保ち,演算子の動的付加により資源効率の良いアンサーゼを構築するように設計されている。我々は、ノイズレスシミュレータとIBM量子プロセッサの両方におけるアルゴリズムの有効性を検証し、正確な解との定量的および定性的な整合性を観察する。また,必要資源のスケールをシステムサイズと精度で検討し,多項式の挙動を求める。その結果、近未来の量子プロセッサはオープン量子システムをシミュレートできることがわかった。

Emerging quantum hardware provides new possibilities for quantum simulation. While much of the research has focused on simulating closed quantum systems, real-world quantum systems are mostly open. Therefore, it is essential to develop quantum algorithms that can effectively simulate open quantum systems. Here we present an adaptive variational quantum algorithm for simulating open quantum system dynamics described by the Lindblad equation. The algorithm is designed to build resource-efficient ansatze through the dynamical addition of operators while maintaining the simulation accuracy. We validate the effectiveness of our algorithm on both noiseless simulators and IBM quantum processors and observe good quantitative and qualitative agreement with the exact solution. We also investigate the scaling of the required resources with system size and accuracy and find polynomial behavior. Our results demonstrate that near-future quantum processors are capable of simulating open quantum systems.

翻訳日:2024-02-08 20:49:03 公開日:2024-02-06

# 時間依存ハミルトニアンの密度行列のベクトル化とフォン・ノイマン方程式の量子シミュレーション

Vectorization of the density matrix and quantum simulation of the von Neumann equation of time-dependent Hamiltonians ( http://arxiv.org/abs/2306.08775v4 )

ライセンス: Link先を確認

Alejandro Kunold

(参考訳) リー代数の性質に基づいて、この研究はフォン・ノイマン方程式を量子シミュレーションに適した形で線形化するための一般的な枠組みを開発した。フォン・ノイマン方程式のこれらの線型化のうちの1つは、状態ベクトルが密度行列の列積要素となり、ハミルトニアン超作用素が$I\otimes H-H^\top \otimes I$、$I$が恒等行列、$H$が標準ハミルトニアンとなる標準的な場合に対応することを示す。この特定の形式はフォン・ノイマン方程式を線型化する方法のより広いクラスに属することが証明されており、それらはそれらの原型である代数によって分類することができる。特に、状態ベクトルの量子トモグラフィーを実質的に単純化する実密度行列係数を与えるエルミート代数に注意が払われる。この考え方に基づき,密度行列のダイナミクスをシミュレートする量子アルゴリズムを提案する。この手法は、パウリ弦によって形成される代数のユニークな性質とともに、トロタライズの使用を避けることができ、したがって回路深さを著しく減少させる。パウリの弦によって形成される代数の特別なケースを使ったとしても、アルゴリズムは他の代数に容易に適用できる。このアルゴリズムはIBMノイズ量子回路シミュレータを用いて2つのおもちゃハミルトンに対して実証される。

Based on the properties of Lie algebras, in this work we develop a general framework to linearize the von Neumann equation, rendering it in a form suitable for quantum simulations. We show that one of these linearizations of the von Neumann equation corresponds to the standard case in which the state vector becomes the column-stacked elements of the density matrix and the Hamiltonian superoperator takes the form $I\otimes H-H^\top \otimes I$, where $I$ is the identity matrix and $H$ is the standard Hamiltonian. It is proven that this particular form belongs to a wider class of ways of linearizing the von Neumann equation that can be categorized by the algebra from which they originated. Particular attention is paid to Hermitian algebras that yield real density matrix coefficients, substantially simplifying the quantum tomography of the state vector. Based on these ideas, a quantum algorithm to simulate the dynamics of the density matrix is proposed. It is shown that this method, along with the unique properties of the algebra formed by Pauli strings, makes it possible to avoid the use of Trotterization, hence considerably reducing the circuit depth. Even though we have used the special case of the algebra formed by the Pauli strings, the algorithm can be readily adapted to other algebras. The algorithm is demonstrated for two toy Hamiltonians using the IBM noisy quantum circuit simulator.
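
The standard linearization mentioned in this abstract rests on the identity $\mathrm{vec}(H\rho - \rho H) = (I\otimes H - H^\top\otimes I)\,\mathrm{vec}(\rho)$ for column-stacked $\mathrm{vec}$. The sketch below checks it numerically for one small case using plain Python lists. The helper names and the particular 2x2 matrices are arbitrary choices for illustration, and nothing here reproduces the paper's quantum algorithm.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def vec(m):
    """Column-stack a matrix into a flat list (column-major order)."""
    return [m[i][j] for j in range(len(m[0])) for i in range(len(m))]


def hamiltonian_superoperator(h):
    """Build I (x) H - H^T (x) I for a square matrix H."""
    d = len(h)
    identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    h_t = [list(col) for col in zip(*h)]
    left, right = kron(identity, h), kron(h_t, identity)
    return [[p - q for p, q in zip(row_l, row_r)] for row_l, row_r in zip(left, right)]


# Illustrative 2x2 Hermitian Hamiltonian and density matrix (arbitrary values).
H = [[1.0, 0.5 - 0.5j], [0.5 + 0.5j, -1.0]]
rho = [[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]]

# Left side: the superoperator applied to vec(rho).
L = hamiltonian_superoperator(H)
v = vec(rho)
lhs = [sum(L[r][c] * v[c] for c in range(len(v))) for r in range(len(L))]

# Right side: vec of the commutator [H, rho] = H rho - rho H.
h_rho, rho_h = matmul(H, rho), matmul(rho, H)
rhs = vec([[x - y for x, y in zip(ra, rb)] for ra, rb in zip(h_rho, rho_h)])

max_diff = max(abs(x - y) for x, y in zip(lhs, rhs))
```

The ordering matters: with row-stacking instead of column-stacking, the two Kronecker factors swap roles, so the convention has to be fixed before building the superoperator.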

翻訳日:2024-02-08 20:37:16 公開日:2024-02-06

# オフライン帯域におけるベイズレジスト最小化

Bayesian Regret Minimization in Offline Bandits ( http://arxiv.org/abs/2306.01237v2 )

ライセンス: Link先を確認

Mohammad Ghavamzadeh, Marek Petrik, Guy Tennenholtz

(参考訳) オフライン線形包帯におけるベイズ的後悔を最小限に抑える決定の仕方について検討する。先行研究は、報酬に対して最大低い信頼率(lcb)の行動を取ることを示唆している。我々は, LCB への依存は本質的にこの設定に欠陥があることを論じ, 効率的な円錐最適化解法を用いてベイズ後悔の上限を直接最小化するアルゴリズムを提案する。我々の限界は金融リスク対策との新たなつながりに重きを置いている。一致した下限を証明し、上限がきついことを示し、それらを最小化することで、LCBアプローチより優れていることが保証される。合成ドメインの数値計算の結果, LCBの最大化よりもアプローチが優れていることが確認された。

We study how to make decisions that minimize Bayesian regret in offline linear bandits. Prior work suggests that one must take actions with maximum lower confidence bound (LCB) on their reward. We argue that reliance on LCB is inherently flawed in this setting and propose a new algorithm that directly minimizes upper bounds on the Bayesian regret using efficient conic optimization solvers. Our bounds build heavily on new connections to monetary risk measures. Proving a matching lower bound, we show that our upper bounds are tight, and by minimizing them we are guaranteed to outperform the LCB approach. Our numerical results on synthetic domains confirm that our approach is superior to maximizing LCB.
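
For readers unfamiliar with the baseline this abstract argues against, the lower-confidence-bound (LCB) rule can be sketched as follows. This is only the generic rule of picking the action whose mean minus a multiple of its uncertainty is largest; the function name, the per-action Gaussian-style summary by mean and standard deviation, and the parameter `beta` are assumptions for this example, and the paper's own regret-bound minimization with conic solvers is not shown.

```python
def lcb_action(means, stds, beta=1.0):
    """Return the index of the action with the largest lower confidence bound."""
    scores = [m - beta * s for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical posterior summaries for two actions.
means = [1.0, 1.2]
stds = [0.1, 0.5]

pessimistic_choice = lcb_action(means, stds, beta=1.0)  # penalizes uncertainty
greedy_choice = lcb_action(means, stds, beta=0.0)       # ignores uncertainty
```

In this toy case the two rules disagree: the LCB rule prefers the well-estimated action, while the greedy rule prefers the action with the higher but more uncertain mean.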

翻訳日:2024-02-08 20:34:45 公開日:2024-02-06
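To make the baseline in the abstract above concrete, here is a minimal sketch of maximum-LCB action selection in a linear bandit. It illustrates only the prior approach that the paper argues against, not the proposed conic-optimization algorithm. The Gaussian posterior (`mean`, `cov`), the multiplier `z`, and all function names are assumptions made for illustration.

```python
import math

def lcb(action, mean, cov, z):
    """Lower confidence bound on the reward a^T theta, assuming theta ~ N(mean, cov)."""
    expected = sum(a * m for a, m in zip(action, mean))
    variance = sum(action[i] * cov[i][j] * action[j]
                   for i in range(len(action)) for j in range(len(action)))
    return expected - z * math.sqrt(variance)

def select_max_lcb(actions, mean, cov, z=1.645):
    """Index of the action with the largest lower confidence bound."""
    return max(range(len(actions)), key=lambda i: lcb(actions[i], mean, cov, z))

# Hypothetical two-action problem: action 1 has the higher posterior mean reward
# but much larger posterior uncertainty, so the LCB rule prefers action 0.
ACTIONS = [[1.0, 0.0], [0.0, 1.0]]
MEAN = [1.0, 1.2]
COV = [[0.01, 0.0], [0.0, 1.0]]

chosen = select_max_lcb(ACTIONS, MEAN, COV)
greedy = max(range(len(ACTIONS)),
             key=lambda i: sum(a * m for a, m in zip(ACTIONS[i], MEAN)))
```

In this toy instance the LCB rule and the greedy posterior-mean rule disagree, which is the kind of pessimism-driven choice the abstract calls into question.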

# AV2Wav: 音声・視覚音声強調のための連続自己教師あり特徴からの拡散に基づく再合成

AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement ( http://arxiv.org/abs/2309.08030v3 )

ライセンス: Link先を確認

Ju-Chieh Chou, Chung-Ming Chien, Karen Livescu

(参考訳) 音声強調システムは通常、クリーンな音声と騒がしい音声のペアを使って訓練される。オーディオ・ヴィジュアル音声強調(AVSE)では、音声・ヴィジュアル・データセットは、背景雑音や残響を伴う現実世界の環境で収集され、AVSEの開発を妨げている。本研究では,実世界の学習データの課題にもかかわらずクリーンな音声を生成できる再生型音声視覚音声強調手法であるAV2Wavを紹介する。ニューラルクオリティ推定器を用いて音声・視覚コーパスからほぼクリーンな音声のサブセットを取得し、このサブセット上で拡散モデルを訓練し、ノイズロバストトレーニングによりAV-HuBERTから連続音声表現に条件付き波形を生成する。韻律や話者情報を保持するために、離散表現よりも連続表現を用いる。このvocodingタスクだけで、モデルはマスキングベースのベースラインよりも音声強調を行うことができる。さらに, クリーン・ノイズ対の拡散モデルを微調整し, 性能向上を図る。提案手法は,自動測定と人間の聴力テストの両方においてマスキングベースのベースラインを上回り,聴力テストにおけるターゲット音声にほぼ近い品質である。オーディオサンプルはhttps://home.ttic.edu/~jcchou/demo/avse/avse_demo.htmlにある。

Speech enhancement systems are typically trained using pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data available; most audio-visual datasets are collected in real-world environments with background noise and reverberation, hampering the development of AVSE. In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data. We obtain a subset of nearly clean speech from an audio-visual corpus using a neural quality estimator, and then train a diffusion model on this subset to generate waveforms conditioned on continuous speech representations from AV-HuBERT with noise-robust training. We use continuous rather than discrete representations to retain prosody and speaker information. With this vocoding task alone, the model can perform speech enhancement better than a masking-based baseline. We further fine-tune the diffusion model on clean/noisy utterance pairs to improve the performance. Our approach outperforms a masking-based baseline in terms of both automatic metrics and a human listening test and is close in quality to the target speech in the listening test. Audio samples can be found at https://home.ttic.edu/~jcchou/demo/avse/avse_demo.html.

翻訳日:2024-02-08 20:27:27 公開日:2024-02-06

# 量子絡み合いの幾何学的意味を明らかにする:離散変数系と連続変数系

Unveiling the geometric meaning of quantum entanglement: discrete and continuous variable systems ( http://arxiv.org/abs/2307.16835v2 )

ライセンス: Link先を確認

Arthur Vesperini, Ghofrane Bel-Hadj-Aissa, Lorenzo Capra, and Roberto Franzosi

(参考訳) 量子状態の多様体はリッチで非自明な幾何学的構造を持つことを示す。我々は、多量子ビット量子系の射影ヒルベルト空間のフビニ・スタディ計量を導出し、リーマン計量構造を内挿し、この空間の状態の絡み合いと深い関係を調べる。尺度として, [1] で提案する絡み合い距離 e を予備的に適用する。 E(|psi>) は |psi> とその共役状態、すなわち状態 v^mu の間の平方距離の和の最小値である。 sigma^mu |psi>, v^mu は単位ベクトルであり、mu はパーティ数で実行される。 2つの状態が局所ユニタリ作用素の作用で同じ状態でないかどうかを決定する一般的な手法を導出する。我々は, 絡み合い距離が, 混合状態への凸屋根の拡大とともに, 絡み合い対策に必要な3つの条件を満たすことを証明した。 i) E(|psi>) =0 iff |psi> は完全に分離可能である。 ii) e は局所ユニタリ変換の下で不変である。三地方業務及び古典通信において、Eは増加しない。この性質には2つの異なる証明がある。また、2つの量子ビット純粋状態の場合、状態 |psi> の絡み合い距離は、この状態の2倍の2倍と一致することも示している。連続変数系に対する絡み合い距離の一般化を提案する。最後に,greenberger-horne-zeilinger状態,briegel raussendorf状態,w状態と結びついた3つの状態の絡み合いの大きさと同値類の性質の研究に幾何学的アプローチを適用した。連続変数を持つ系の場合の応用例として、2つの結合したグラウバーコヒーレント状態の系を考える。

We show that the manifold of quantum states is endowed with a rich and nontrivial geometric structure. We derive the Fubini-Study metric of the projective Hilbert space of a multi-qubit quantum system, endowing it with a Riemannian metric structure, and investigate its deep link with the entanglement of the states of this space. As a measure, we adopt the entanglement distance $E$ preliminarily proposed in [1]. Our analysis shows that entanglement has a geometric interpretation: $E(|\psi\rangle)$ is the minimum value of the sum of the squared distances between $|\psi\rangle$ and its conjugate states, namely the states $\mathbf{v}^\mu \cdot \boldsymbol{\sigma}^\mu |\psi\rangle$, where the $\mathbf{v}^\mu$ are unit vectors and $\mu$ runs over the parties. We derive a general method to determine when two states are not the same state up to the action of local unitary operators. We prove that the entanglement distance, along with its convex roof expansion to mixed states, fulfills the three conditions required for an entanglement measure: i) $E(|\psi\rangle) = 0$ iff $|\psi\rangle$ is fully separable; ii) $E$ is invariant under local unitary transformations; iii) $E$ does not increase under local operations and classical communication. Two different proofs are provided for this latter property. We also show that in the case of two-qubit pure states, the entanglement distance for a state $|\psi\rangle$ coincides with two times the square of the concurrence of this state. We propose a generalization of the entanglement distance to continuous variable systems. Finally, we apply the proposed geometric approach to the study of the entanglement magnitude and the equivalence-class properties of three families of states linked to the Greenberger-Horne-Zeilinger states, the Briegel-Raussendorf states and the W states. As an example of an application for the case of a system with continuous variables, we have considered a system of two coupled Glauber coherent states.

翻訳日:2024-02-08 20:26:43 公開日:2024-02-06
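The two-qubit claim in the abstract above (entanglement distance equals twice the squared concurrence) can be checked with a short sketch. It assumes the closed form $E = \sum_\mu \bigl(1 - |\mathbf{r}^\mu|^2\bigr)$, where $\mathbf{r}^\mu$ is the Bloch vector of qubit $\mu$; this is what the minimization over unit vectors gives if the squared distance is taken as $1 - |\langle\psi|\mathbf{v}\cdot\boldsymbol{\sigma}|\psi\rangle|^2$. That form, the normalization, and all function names are assumptions rather than the paper's stated definitions.

```python
import math

def bloch_norm_sq(psi, qubit):
    """Squared Bloch-vector length of one qubit of a two-qubit pure state.

    psi = [a00, a01, a10, a11], amplitude index = 2 * q0 + q1.
    """
    def amp(i, k):
        # i: value of the selected qubit, k: value of the traced-out qubit
        return psi[2 * i + k] if qubit == 0 else psi[2 * k + i]

    # Reduced density matrix of the selected qubit.
    rho = [[sum(amp(i, k) * amp(j, k).conjugate() for k in range(2))
            for j in range(2)] for i in range(2)]
    x = 2 * rho[0][1].real
    y = -2 * rho[0][1].imag
    z = (rho[0][0] - rho[1][1]).real
    return x * x + y * y + z * z

def entanglement_distance(psi):
    # Assumed closed form: sum over qubits of (1 - |Bloch vector|^2).
    return sum(1.0 - bloch_norm_sq(psi, q) for q in (0, 1))

def concurrence(psi):
    # Standard concurrence of a two-qubit pure state.
    return 2 * abs(psi[0] * psi[3] - psi[1] * psi[2])

s = 1 / math.sqrt(2)
bell = [s, 0, 0, s]               # maximally entangled
product = [1, 0, 0, 0]            # fully separable
generic = [0.5, 0.5j, 0.5, -0.5]  # normalized, partially entangled
```

Under these assumptions a separable state gives $E = 0$, a Bell state gives $E = 2$, and `entanglement_distance(psi)` agrees with `2 * concurrence(psi) ** 2` for the test states.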

# 一貫性のあるoracleによるシンプルなオンライン学習

Simple online learning with consistent oracle ( http://arxiv.org/abs/2308.08055v2 )

ライセンス: Link先を確認

Alexander Kozachinskiy, Tomasz Steifer

(参考訳) 学習アルゴリズムがクラスに「一貫性のあるオラクル(consistent oracle)」を介してのみアクセスできるモデルにおけるオンライン学習を考える。このオラクルは、任意の時点で、これまでに見たすべての例と一致する関数をクラスから返すことができる。このモデルは最近 Assos ら (COLT'23) によって検討された。これは、オンライン学習の標準的な方法が、計算的に難解な問題であるサブクラスのリトルストーン次元の計算に依存しているという事実に動機づけられている。 Assos らは、このモデルにおいて、リトルストーン次元 $d$ のクラスに対して、ある未指定の絶対定数 $C > 0$ について高々 $C^d$ 回の誤りしか犯さないオンライン学習アルゴリズムを与えた。我々は高々 $O(256^d)$ 回の誤りしか犯さない新しいアルゴリズムを与える。我々の証明は大幅に単純であり、リトルストーン次元のごく基本的な性質のみを用いる。また、このモデルには $3^d$ 回未満の誤りしか犯さないアルゴリズムが存在しないことも示す。

We consider online learning in the model where a learning algorithm can access the class only via the consistent oracle: an oracle that, at any moment, can give a function from the class that agrees with all examples seen so far. This model was recently considered by Assos et al. (COLT'23). It is motivated by the fact that standard methods of online learning rely on computing the Littlestone dimension of subclasses, a computationally intractable problem. Assos et al. gave an online learning algorithm in this model that makes at most $C^d$ mistakes on classes of Littlestone dimension $d$, for some absolute unspecified constant $C > 0$. We give a novel algorithm that makes at most $O(256^d)$ mistakes. Our proof is significantly simpler and uses only very basic properties of the Littlestone dimension. We also show that there exists no algorithm in this model that makes fewer than $3^d$ mistakes.

翻訳日:2024-02-08 20:12:51 公開日:2024-02-06

# 空間幾何学的推論を必要とするオブジェクトアセンブリタスクにおける視覚的表現のロバスト性評価

Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning ( http://arxiv.org/abs/2310.09943v3 )

ライセンス: Link先を確認

Chahyon Ku, Carl Winge, Ryan Diaz, Wentao Yuan, Karthik Desingh

(参考訳) 本稿では主に、オブジェクトアセンブリタスクのコンテキストにおける視覚表現の堅牢性の評価とベンチマークに焦点をあてる。具体的には、一般にpeg-in-holeタスクと呼ばれる幾何学的押出しと侵入を伴う物体のアライメントと挿入について検討する。成功組立のためにSE(3)空間のペグと穴形状を検出・オリエントするために必要な精度は大きな課題となる。そこで我々はヴィジュアル・エンコーダとして視覚前訓練モデルを利用するvisuomotor policy learningの汎用フレームワークを採用している。本研究は,両腕操作設定,特に把持変動に対して適用した場合のロバスト性について検討する。我々の定量的分析は、既存の事前学習モデルでは、このタスクに必要な視覚的特徴を捉えることができないことを示している。しかし、スクラッチから訓練されたビジュアルエンコーダは、凍結した事前訓練されたモデルよりも一貫して優れている。さらに、政策学習を大幅に改善する回転表現と関連する損失関数について論じる。本稿では,幾何学的・空間的推論を必要とする複雑な組み立て作業のロバスト性向上に特に焦点をあてた,visuomotor policy learningの進歩を評価するための新しいタスクシナリオを提案する。ビデオ、追加の実験、データセット、コードは https://bit.ly/geometric-peg-in-hole で入手できる。

This paper primarily focuses on evaluating and benchmarking the robustness of visual representations in the context of object assembly tasks. Specifically, it investigates the alignment and insertion of objects with geometrical extrusions and intrusions, commonly referred to as a peg-in-hole task. The accuracy required to detect and orient the peg and the hole geometry in SE(3) space for successful assembly poses significant challenges. Addressing this, we employ a general framework in visuomotor policy learning that utilizes visual pretraining models as vision encoders. Our study investigates the robustness of this framework when applied to a dual-arm manipulation setup, specifically to the grasp variations. Our quantitative analysis shows that existing pretrained models fail to capture the essential visual features necessary for this task. However, a visual encoder trained from scratch consistently outperforms the frozen pretrained models. Moreover, we discuss rotation representations and associated loss functions that substantially improve policy learning. We present a novel task scenario designed to evaluate the progress in visuomotor policy learning, with a specific focus on improving the robustness of intricate assembly tasks that require both geometrical and spatial reasoning. Videos, additional experiments, dataset, and code are available at https://bit.ly/geometric-peg-in-hole .

翻訳日:2024-02-08 20:01:48 公開日:2024-02-06

# FlorDB: 継続的トレーニングのためのマルチバージョン監視ロギング

FlorDB: Multiversion Hindsight Logging for Continuous Training ( http://arxiv.org/abs/2310.07898v2 )

ライセンス: Link先を確認

Rolando Garcia, Anusha Dandamudi, Gabriel Matute, Lehan Wan, Joseph Gonzalez, Joseph M. Hellerstein, Koushik Sen

(参考訳) プロダクション機械学習には継続的トレーニングが伴う。複数のバージョンのモデルを時間とともにホストし、多くの場合、複数のモデルバージョンを同時に実行する。モデルパフォーマンスが期待を満たさない場合、機械学習エンジニア(mles)は、多くの以前のバージョンのコードとトレーニングデータの探索と分析を通じて問題をデバッグし、根本原因を特定し、問題を緩和する。従来のデバッグとロギングツールは、実験的なマルチバージョンコンテキストの管理に不足することが多い。 FlorDBはMultiversion Hindsight Loggingを導入し、エンジニアは最新のバージョンのロギングステートメントを使用して過去のバージョンを問い合わせることができる。ログステートメントの伝搬は、コードベースの変更にかかわらず、過去のコードバージョンにロギングステートメントを一貫した注入を可能にする。ログステートメントがコードバージョンに伝播されると、multiversionhindsight loggingの残りの課題は、以前の実行時のチェックポイントに基づいて、新しいログステートメントを効率的に再生することである。最後に、すべてのバージョンのコードとデータのMLEデバッグを支援するために、一貫性のあるユーザエクスペリエンスが必要です。この目的のためにflordbは、履歴クエリを効率的に処理するための統一リレーショナルモデルを提示し、ログ履歴の包括的なビューを提供し、過去のコードのイテレーションの探索を簡単にする。本稿では,クエリベースのフィルタリングとチェックポイントベースの並列処理を有効活用し,そのスケーラビリティとリアルタイムクエリ応答能力を確認した多種多様なベンチマークの性能評価を行う。

Production Machine Learning involves continuous training: hosting multiple versions of models over time, often with many model versions running at once. When model performance does not meet expectations, Machine Learning Engineers (MLEs) debug issues by exploring and analyzing numerous prior versions of code and training data to identify root causes and mitigate problems. Traditional debugging and logging tools often fall short in managing this experimental, multi-version context. FlorDB introduces Multiversion Hindsight Logging, which allows engineers to use the most recent version's logging statements to query past versions, even when older versions logged different data. Log statement propagation enables consistent injection of logging statements into past code versions, regardless of changes to the codebase. Once log statements are propagated across code versions, the remaining challenge in Multiversion Hindsight Logging is to efficiently replay the new log statements based on checkpoints from previous runs. Finally, a coherent user experience is required to help MLEs debug across all versions of code and data. To this end, FlorDB presents a unified relational model for efficient handling of historical queries, offering a comprehensive view of the log history to simplify the exploration of past code iterations. We present a performance evaluation on diverse benchmarks confirming its scalability and the ability to deliver real-time query responses, leveraging query-based filtering and checkpoint-based parallelism for efficient replay.

翻訳日:2024-02-08 20:01:04 公開日:2024-02-06

# 置換不変な量子符号の族

A family of permutationally invariant quantum codes ( http://arxiv.org/abs/2310.05358v2 )

ライセンス: Link先を確認

Arda Aydin, Max A. Alekseyev, Alexander Barg

(参考訳) 任意の$t\ge 1$に対して$t$ Pauliエラーを補正する、置換不変コードの新しいファミリーを構築します。また,新しい系統の符号は,量子欠失誤差と自発的減衰誤差を補正することを示した。我々の構成は、以前に知られている変分不変量子符号のいくつかを特に含んでおり、これは超越ゲートも含んでいる。多くの場合、新しいファミリーの符号は、ポーリの誤りや削除に対する最もよく知られた明示的な置換的不変符号よりも短い。さらに、新しいコードファミリーには、新しい$((4,2,2))$Optimary Single-deletion-correctingコードが含まれています。別の結果として、置換的不変符号の条件を一般化し、以前の既知の結果から任意の数のエラーに対して$t=1$の$t$ pauliエラーを補正する。小さな$t$の場合、これらの条件はコンピュータによるコードの新しい例を構築するのに使うことができる。

We construct a new family of permutationally invariant codes that correct $t$ Pauli errors for any $t\ge 1$. We also show that codes in the new family correct quantum deletion errors as well as spontaneous decay errors. Our construction contains some of the previously known permutationally invariant quantum codes as particular cases, which also admit transversal gates. In many cases, the codes in the new family are shorter than the best previously known explicit permutationally invariant codes for Pauli errors and deletions. Furthermore, our new code family includes a new $((4,2,2))$ optimal single-deletion-correcting code. As a separate result, we generalize the conditions for permutationally invariant codes to correct $t$ Pauli errors from the previously known results for $t=1$ to any number of errors. For small $t$, these conditions can be used to construct new examples of codes by computer.

翻訳日:2024-02-08 20:00:02 公開日:2024-02-06

# 転送可能なグラフオートエンコーダによるネットワークアライメント

Network Alignment with Transferable Graph Autoencoders ( http://arxiv.org/abs/2310.03272v2 )

ライセンス: Link先を確認

Jiashu He, Charilaos I. Kanatsoulis, Alejandro Ribeiro

(参考訳) ネットワークアライメントは、異なるグラフのノード間の1対1の対応を確立し、ハイインパクトなドメインで多くのアプリケーションを見つけるタスクである。しかし、このタスクはNPハードであることが知られており、既存のアルゴリズムはグラフのサイズが大きくなるにつれてスケールアップしない。そこで我々は,アライメントタスクに適合した,強力でロバストなノード埋め込みを抽出することを目的とした,新しい一般化グラフオートエンコーダアーキテクチャを提案する。生成した埋め込みはグラフの固有値と固有ベクトルに関連付けられ、古典的なスペクトル法と比較してより正確なアライメントが得られることが証明される。また,提案フレームワークでは,転送学習とデータ拡張を利用して,再トレーニングすることなく大規模ネットワークアライメントを実現している。実世界のグラフとのネットワークとサブネットワークの連携に関する広範囲な実験は、提案手法の有効性とスケーラビリティを裏付ける証拠を提供する。

Network alignment is the task of establishing one-to-one correspondences between the nodes of different graphs and finds a plethora of applications in high-impact domains. However, this task is known to be NP-hard in its general form, and existing algorithms do not scale up as the size of the graphs increases. To tackle both challenges, we propose a novel generalized graph autoencoder architecture designed to extract powerful and robust node embeddings that are tailored to the alignment task. We prove that the generated embeddings are associated with the eigenvalues and eigenvectors of the graphs and can achieve more accurate alignment compared to classical spectral methods. Our proposed framework also leverages transfer learning and data augmentation to achieve efficient network alignment at a very large scale without retraining. Extensive experiments on both network and sub-network alignment with real-world graphs provide corroborating evidence supporting the effectiveness and scalability of the proposed approach.

翻訳日:2024-02-08 19:58:44 公開日:2024-02-06

# テキストから画像への拡散によるドメインの変換:ドメイン適応へのソースフリーアプローチ

Transcending Domains through Text-to-Image Diffusion: A Source-Free Approach to Domain Adaptation ( http://arxiv.org/abs/2310.01701v4 )

ライセンス: Link先を確認

Shivang Chopra, Suraj Kothawade, Houda Aynaou, Aman Chadha

(参考訳) ドメイン適応(da)は、モデルが関連するソースドメインから取得した情報を十分なラベル付きデータで適用することにより、不適切なアノテートデータを持つ対象ドメインにおけるモデルの性能を向上させる手法である。 HIPAA、COPPA、FERPAなどのデータプライバシ規制の実施が、ソースデータに直接アクセスする必要を回避しつつ、新しいドメインにモデルを適用することへの関心を高め、ソースフリードメイン適応(Source-free Domain Adaptation、SFDA)と呼ばれる問題を引き起こした。本稿では,対象ドメインのサンプルに基づいて訓練されたテキスト・画像拡散モデルを用いて,ソースデータを生成する新しいSFDAフレームワークを提案する。提案手法は,ラベル付き対象領域のサンプルに対してテキスト間拡散モデルをトレーニングし,事前学習したソースモデルを用いて微調整を行い,ソースデータに近いサンプルを生成する。最後に、ドメイン適応技術を用いて、人工的に生成されたソースデータを対象のドメインデータと整合させることにより、ターゲットのドメイン上でのモデルの性能が大幅に向上する。標準のoffice-31, office-home, visdaベンチマークにおける複数のベースラインとの比較を行い,sfdaタスクに対するアプローチの有効性を実証した。

Domain Adaptation (DA) is a method for enhancing a model's performance on a target domain with inadequate annotated data by applying the information the model has acquired from a related source domain with sufficient labeled data. The escalating enforcement of data-privacy regulations like HIPAA, COPPA, FERPA, etc. has sparked a heightened interest in adapting models to novel domains while circumventing the need for direct access to the source data, a problem known as Source-Free Domain Adaptation (SFDA). In this paper, we propose a novel framework for SFDA that generates source data using a text-to-image diffusion model trained on the target domain samples. Our method starts by training a text-to-image diffusion model on the labeled target domain samples, which is then fine-tuned using the pre-trained source model to generate samples close to the source data. Finally, we use Domain Adaptation techniques to align the artificially generated source data with the target domain data, resulting in significant performance improvements of the model on the target domain. Through extensive comparison against several baselines on the standard Office-31, Office-Home, and VisDA benchmarks, we demonstrate the effectiveness of our approach for the SFDA task.

翻訳日:2024-02-08 19:58:29 公開日:2024-02-06

# 動的ゴール認識フラグメントによる薬物発見

Drug Discovery with Dynamic Goal-aware Fragments ( http://arxiv.org/abs/2310.00841v2 )

ライセンス: Link先を確認

Seul Lee, Seanie Lee, Kenji Kawaguchi, Sung Ju Hwang

(参考訳) フラグメントに基づく薬物発見は、広大な化学領域における薬物候補の発見に有効な戦略であり、分子生成モデルに広く用いられている。しかし、そのようなモデルにおける既存の断片抽出法の多くは、対象の化学的性質を考慮せず、ヒューリスティックな規則に依存する。さらに、既存のフラグメントベースの生成モデルは、生成中に新たに発見されたゴール対応のフラグメントでフラグメント語彙を更新できない。そこで本研究では,創薬のための分子生成フレームワークであるgoal-aware fragment extraction, assembly and modified (geam)を提案する。 GEAMは3つのモジュールから構成されており、それぞれがゴール対応のフラグメント抽出、フラグメントアセンブリ、フラグメント修正を担当している。フラグメント抽出モジュールは、情報ボトルネック原理により、所望の目標プロパティに寄与する重要なフラグメントを識別し、有効ゴール対応フラグメント語彙を構築する。さらに、GEAMはフラグメント修正モジュールで最初の語彙を超える探索が可能であり、動的ゴール対応語彙更新によってさらに探索が強化される。 GEAMは, 薬物発見タスクにおける3つのモジュールの生成サイクルを通じて, 薬物候補を効果的に発見できることを実験的に実証した。

Fragment-based drug discovery is an effective strategy for discovering drug candidates in the vast chemical space, and has been widely employed in molecular generative models. However, many existing fragment extraction methods in such models do not take the target chemical properties into account or rely on heuristic rules. Additionally, the existing fragment-based generative models cannot update the fragment vocabulary with goal-aware fragments newly discovered during the generation. To this end, we propose a molecular generative framework for drug discovery, named Goal-aware fragment Extraction, Assembly, and Modification (GEAM). GEAM consists of three modules, each responsible for goal-aware fragment extraction, fragment assembly, and fragment modification. The fragment extraction module identifies important fragments contributing to the desired target properties with the information bottleneck principle, thereby constructing an effective goal-aware fragment vocabulary. Moreover, GEAM can explore beyond the initial vocabulary with the fragment modification module, and the exploration is further enhanced through the dynamic goal-aware vocabulary update. We experimentally demonstrate that GEAM effectively discovers drug candidates through the generative cycle of the three modules in various drug discovery tasks.

翻訳日:2024-02-08 19:57:44 公開日:2024-02-06

# FENDA-FL : 不均一な臨床データを用いた個人化フェデレーション学習

FENDA-FL: Personalized Federated Learning on Heterogeneous Clinical Datasets ( http://arxiv.org/abs/2309.16825v2 )

ライセンス: Link先を確認

Fatemeh Tavakoli, D.B. Emerson, Sana Ayromlou, John Jewell, Amrit Krishnan, Yuchong Zhang, Amol Verma, Fahad Razak

(参考訳) フェデレーテッド・ラーニング(FL)は、臨床環境での機械学習モデルのトレーニングと展開を頻繁に妨害するデータサイロを克服するための重要なアプローチとして、ますます認識されている。この研究は、3つの重要な方向に沿って臨床応用に焦点を当てたfl研究の発展に寄与している。まず、FLambyベンチマーク(du Terrail et al., 2022a)を拡張し、パーソナライズされたFL手法の評価を行い、元の結果よりも実質的な性能改善を示す。次に,実際の設定を反映し,複数の比較基準を提供するために,FLの総合的なチェックポイントと評価フレームワークを提案する。最後に,perfcl(zhang et al., 2022)の重要なアブレーションについて検討した。このアブレーションは、FL設定へのFENDA(Kim et al., 2016)の自然な拡張である。 flambyベンチマークとgeminiデータセット(verma et al., 2017)で実施した実験によると、このアプローチは異種臨床データに対して堅牢であり、perfclを含む既存のグローバルおよびパーソナライズされたfl技術を上回ることが多い。

Federated learning (FL) is increasingly being recognized as a key approach to overcoming the data silos that so frequently obstruct the training and deployment of machine-learning models in clinical settings. This work contributes to a growing body of FL research specifically focused on clinical applications along three important directions. First, we expand the FLamby benchmark (du Terrail et al., 2022a) to include evaluation of personalized FL methods and demonstrate substantive performance improvements over the original results. Next, we advocate for a comprehensive checkpointing and evaluation framework for FL to reflect practical settings and provide multiple comparison baselines. Finally, we study an important ablation of PerFCL (Zhang et al., 2022). This ablation is a natural extension of FENDA (Kim et al., 2016) to the FL setting. Experiments conducted on the FLamby benchmarks and GEMINI datasets (Verma et al., 2017) show that the approach is robust to heterogeneous clinical data and often outperforms existing global and personalized FL techniques, including PerFCL.

翻訳日:2024-02-08 19:57:25 公開日:2024-02-06

# 確率モデルに基づくメタ強化学習によるデータ効率の高いタスク一般化

Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning ( http://arxiv.org/abs/2311.07558v2 )

ライセンス: Link先を確認

Arjun Bhardwaj, Jonas Rothfuss, Bhavya Sukhija, Yarden As, Marco Hutter, Stelian Coros, Andreas Krause

(参考訳) 本稿では,モデルに基づくメタ強化学習(Meta-RL)アルゴリズムであるPACOH-RLを紹介する。 PACOH-RLメタ学習は動的モデルに先行し、最小の相互作用データを持つ新しい力学への迅速な適応を可能にする。既存のメタrlメソッドは豊富なメタラーニングデータを必要とするため、データ取得にコストがかかるロボティクスなどの設定での適用性が制限される。これを解決するため、PACOH-RLは、メタラーニングとタスク適応の段階において、正規化と疫学的不確実性の定量化を取り入れている。新しいダイナミクスに直面するとき、探索とデータ収集を効果的に導くために、これらの不確実性推定を使用する。全体として、以前のタスクや動的設定からのデータにアクセスしても、ポジティブな転送が可能になる。実験の結果,PACOH-RLはモデルベースRLおよびモデルベースMeta-RLベースラインよりも高い性能を示し,新しい動的条件に適応した。最後に、実車上では、多種多様なデータスカース条件下での効率的なRLポリシー適応の可能性を示す。

We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics, where data is costly to obtain. To address this, PACOH-RL incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages. When facing new dynamics, we use these uncertainty estimates to effectively guide exploration and data collection. Overall, this enables positive transfer, even when access to data from prior tasks or dynamic settings is severely limited. Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real robotic car, we showcase the potential for efficient RL policy adaptation in diverse, data-scarce conditions.

翻訳日:2024-02-08 19:49:18 公開日:2024-02-06

# 時間同期配電系統状態推定のためのディープニューラルネットワークの性能解析検証

Analytical Verification of Deep Neural Network Performance for Time-Synchronized Distribution System State Estimation ( http://arxiv.org/abs/2311.06973v2 )

ライセンス: Link先を確認

Behrouz Azimian, Shiva Moshtagh, Anamitra Pal, Shanshan Ma

(参考訳) 近年,リアルタイム観測不能な分散システムのためのディープニューラルネットワーク(DNN)を用いた時間同期状態推定器の成功例が報告されている。本稿では,入力測定における摂動関数として,その状態推定器の性能に関する解析的境界を与える。テストデータセットのみに基づいてパフォーマンスを評価することは、トレーニング済みのDNNが入力摂動を処理する能力を効果的に示すものではないことがすでに示されている。そこで我々はDNNの堅牢性と信頼性を解析的に検証し,それらを混合整数線形プログラミング(MILP)問題として扱う。 MILP定式化のスケーラビリティ制限に対処する際のバッチ正規化の能力も強調されている。このフレームワークは、修正されたieee 34ノードシステムと実世界の大規模分散システムに対する時間同期分布系状態推定を行い、いずれもマイクロファサー測定ユニットによって不完全に観測される。

Recently, we demonstrated the success of a time-synchronized state estimator using deep neural networks (DNNs) for real-time unobservable distribution systems. In this letter, we provide analytical bounds on the performance of that state estimator as a function of perturbations in the input measurements. It has already been shown that evaluating performance based only on the test dataset might not effectively indicate a trained DNN's ability to handle input perturbations. As such, we analytically verify the robustness and trustworthiness of DNNs to input perturbations by treating them as mixed-integer linear programming (MILP) problems. The ability of batch normalization to address the scalability limitations of the MILP formulation is also highlighted. The framework is validated by performing time-synchronized distribution system state estimation for a modified IEEE 34-node system and a real-world large distribution system, both of which are incompletely observed by micro-phasor measurement units.

翻訳日:2024-02-08 19:48:57 公開日:2024-02-06

# 無線ネットワークにおけるビデオキャッシングのためのリソースアウェア階層型フェデレート学習

Resource-Aware Hierarchical Federated Learning for Video Caching in Wireless Networks ( http://arxiv.org/abs/2311.06918v2 )

ライセンス: Link先を確認

Md Ferdous Pervej and Andreas F Molisch

(参考訳) ビデオキャッシングは、ユーザーが頻繁に要求する人気のコンテンツをローカルに保存することで、交通渋滞を著しく改善することができる。ユーザの要求が時間とともにどのように変化するかを学ぶためには,プライバシ保護手法が望ましい。そこで本研究では,コンテンツ要求が散発的であり,ユーザのデータセットは要求されたコンテンツの情報に基づいてのみ更新可能であるという現実的な仮定の下で,ユーザの今後のコンテンツ要求を予測するための,リソース対応階層型学習(RawHFL)ソリューションを提案する。部分的なクライアント参加の場合を考えると、まず、クライアントのローカルトレーニングラウンドに依存するグローバルグラデーションノルムの上限と、無線リンク上で蓄積されたグラデーションの受信の成功を導出する。遅延,エネルギー,無線リソースの制約の下で,RawHFLの収束をエネルギー効率よく促進する重み付きユーティリティ関数を最小化するために,クライアントの選択とその局所ラウンドとCPU周波数を最適化する。シミュレーション結果から,提案手法は予測精度と総エネルギー消費量の点で基準値を大きく上回ることがわかった。

Video caching can significantly alleviate backhaul traffic congestion by locally storing the popular content that users frequently request. A privacy-preserving method is desirable to learn how users' demands change over time. As such, this paper proposes a novel resource-aware hierarchical federated learning (RawHFL) solution to predict users' future content requests under the realistic assumptions that content requests are sporadic and users' datasets can only be updated based on the requested content's information. Considering a partial client participation case, we first derive the upper bound of the global gradient norm that depends on the clients' local training rounds and the successful reception of their accumulated gradients over the wireless links. Under delay, energy and radio resource constraints, we then optimize client selection and their local rounds and central processing unit (CPU) frequencies to minimize a weighted utility function that facilitates RawHFL's convergence in an energy-efficient way. Our simulation results show that the proposed solution significantly outperforms the considered baselines in terms of prediction accuracy and total energy expenditure.

翻訳日:2024-02-08 19:48:44 公開日:2024-02-06

# 強化学習におけるトンプソンサンプリングのためのベイズ回帰境界の改良

Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning ( http://arxiv.org/abs/2310.20007v2 )

ライセンス: Link先を確認

Ahmadreza Moradipari, Mohammad Pedramfar, Modjtaba Shokrian Zini, Vaneet Aggarwal

(参考訳) 本稿では,複数設定の強化学習におけるトンプソンサンプリングに対する最初のベイズ的後悔の限界を実証する。本稿では,サロゲート環境の離散セットを用いた学習問題を単純化し,後方整合性を用いた情報比率の高精度解析を提案する。これは、h$ がエピソードの長さ、$d_{l_1}$ が環境空間のコルモゴロフ $l_1-$dimensionであるような不均質な強化学習問題において、順序 $\widetilde{o}(h\sqrt{d_{l_1}t})$ の上限となる。次に、表、線形、有限混合といった様々な設定で$d_{l_1}$の具体的な境界を見つけ、その結果がどのようにそれらの種類の最初のものであるか、それとも最先端の技術を改善するかについて議論する。

In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We simplify the learning problem using a discrete set of surrogate environments, and present a refined analysis of the information ratio using posterior consistency. This leads to an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time-inhomogeneous reinforcement learning problem, where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1$-dimension of the space of environments. We then find concrete bounds of $d_{l_1}$ in a variety of settings, such as tabular, linear and finite mixtures, and discuss how our results are either the first of their kind or improve the state-of-the-art.

翻訳日:2024-02-08 19:46:56 公開日:2024-02-06

# 古典的なfrenet-serret装置から量子力学的進化の曲率とねじれまで。第2部。非定常ハミルトニアン

From the classical Frenet-Serret apparatus to the curvature and torsion of quantum-mechanical evolutions. Part II. Nonstationary Hamiltonians ( http://arxiv.org/abs/2311.18463v2 )

ライセンス: Link先を確認

Paul M. Alsing, Carlo Cafaro

(参考訳) 非定常ハミルトニアンの下で進化する状態ベクトルによって追跡される量子曲線の曲げとねじれの定量化に関する幾何学的視点を示す。具体的には, 定常ハミルトニアンの既存の幾何学的視点に基づき, 時変曲率とねじれ係数の両方が重要な役割を果たす時間依存量子力学的シナリオへの理論的構成の一般化について論じる。具体的には、シュロディンガー発展方程式を規定する時間依存ハミルトニアンの下で一元的に進化する平行移動純量子状態によってトレースされる射影ヒルベルト空間における量子軌道に対するフレネット・セルレート装置の量子バージョンを提案する。時変曲率係数は、接ベクトルと状態ベクトルの共変微分の2乗の大きさで指定され、量子曲線の曲げを測定する。時間変化のねじれ係数は、接ベクトルの共変微分の状態ベクトルへの射影の大きさの2乗、接ベクトルと状態ベクトルに直交し、さらに量子曲線のねじれを測定することによって与えられる。時間変化の設定は、統計的観点からよりリッチな構造を示す。例えば、時間に依存しない構成とは異なり、一般化された分散の概念は非定常ハミルトニアンの下で進化する量子状態によってトレースされる曲線のねじれの定義において非自明に入る。本手法の意義を物理的に説明するために, 正弦波振動時間依存ポテンシャルによって特定される, 完全に可溶な時間依存二状態rabi問題に適用する。

We present a geometric perspective on how to quantify the bending and the twisting of quantum curves traced by state vectors evolving under nonstationary Hamiltonians. Specifically, relying on the existing geometric viewpoint for stationary Hamiltonians, we discuss the generalization of our theoretical construct to time-dependent quantum-mechanical scenarios where both time-varying curvature and torsion coefficients play a key role. In particular, we present a quantum version of the Frenet-Serret apparatus for a quantum trajectory in projective Hilbert space traced out by a parallel-transported pure quantum state evolving unitarily under a time-dependent Hamiltonian specifying the Schrödinger evolution equation. The time-varying curvature coefficient is specified by the magnitude squared of the covariant derivative of the tangent vector to the state vector and measures the bending of the quantum curve. The time-varying torsion coefficient, instead, is given by the magnitude squared of the projection of the covariant derivative of the tangent vector to the state vector onto the subspace orthogonal to both the tangent vector and the state vector, and measures the twisting of the quantum curve. We find that the time-varying setting exhibits a richer structure from a statistical standpoint. For instance, unlike the time-independent configuration, we find that the notion of generalized variance enters nontrivially in the definition of the torsion of a curve traced out by a quantum state evolving under a nonstationary Hamiltonian. To physically illustrate the significance of our construct, we apply it to an exactly soluble time-dependent two-state Rabi problem specified by a sinusoidal oscillating time-dependent potential...

翻訳日:2024-02-08 19:35:39 公開日:2024-02-06

# lightgaussian: 15倍縮小200fpsの非有界3次元ガウス圧縮

LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS ( http://arxiv.org/abs/2311.17245v4 )

ライセンス: Link先を確認

Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang

(参考訳) ポイントベース技術を用いたリアルタイムニューラルレンダリングの最近の進歩は、3D表現の普及の道を開いた。しかし、3D Gaussian Splattingのような基本的なアプローチは、SfMポイントを数百万に拡大し、単一の無制限シーンに対してギガバイトレベルのディスクスペースを必要とすることがあり、大きなスケーラビリティ上の課題を生じさせ、スティング効率を妨げている。この課題に対処するために、我々は3Dガウスをより効率的でコンパクトなフォーマットに変換するために設計された新しい方法であるLightGaussianを紹介する。ネットワークプルーニングの概念からインスピレーションを得て、lightgaussianはシーンの再構築に寄与しないガウス人を特定し、プルーニングとリカバリのプロセスを採用し、視覚効果を保ちながらガウス数における冗長性を効果的に削減した。さらに、LightGaussianは、蒸留と擬似ビュー拡張を使用して球面調和を低い程度に蒸留し、反射性を維持しながらよりコンパクトな表現への知識伝達を可能にする。さらに,全ての属性を量子化するハイブリッド方式であるVecTree Quantizationを提案する。要約すると、LightGaussian は FPS を 139 から 215 に向上させ、Mip-NeRF 360, Tank と Temple のデータセット上の複雑なシーンの効率的な表現を可能にした。プロジェクトウェブサイト: https://lightgaussian.github.io/

Recent advancements in real-time neural rendering using point-based techniques have paved the way for the widespread adoption of 3D representations. However, foundational approaches like 3D Gaussian Splatting come with a substantial storage overhead caused by growing the SfM points to millions, often demanding gigabyte-level disk space for a single unbounded scene, posing significant scalability challenges and hindering the splatting efficiency. To address this challenge, we introduce LightGaussian, a novel method designed to transform 3D Gaussians into a more efficient and compact format. Drawing inspiration from the concept of Network Pruning, LightGaussian identifies Gaussians that are insignificant in contributing to the scene reconstruction and adopts a pruning and recovery process, effectively reducing redundancy in Gaussian counts while preserving visual effects. Additionally, LightGaussian employs distillation and pseudo-view augmentation to distill spherical harmonics to a lower degree, allowing knowledge transfer to more compact representations while maintaining reflectance. Furthermore, we propose a hybrid scheme, VecTree Quantization, to quantize all attributes, resulting in lower bitwidth representations with minimal accuracy losses. In summary, LightGaussian achieves an average compression rate of over 15x while boosting the FPS from 139 to 215, enabling an efficient representation of complex scenes on the Mip-NeRF 360 and Tanks and Temples datasets. Project website: https://lightgaussian.github.io/

翻訳日:2024-02-08 19:35:11 公開日:2024-02-06

# VALUED -- 視覚と論理的理解評価データセット

VALUED -- Vision and Logical Understanding Evaluation Dataset ( http://arxiv.org/abs/2311.12610v2 )

ライセンス: Link先を確認

Soumadeep Saha, Saptarshi Saha, Utpal Garain

(参考訳) コンピュータビジョンタスクの初期の成功から始まり、ディープラーニングベースの技術は、多くの領域で最先端のアプローチを追い越してきた。しかし、これらの手法が意味的文脈や論理的制約を捉えられず、答えに到達するために見かけ上の(スプリアスな)相関に頼ることが多いことが何度も示されてきた。重要なシナリオへのディープラーニング技術の適用は、ドメイン固有の制約の遵守に依存しているため、この問題に対処するためのいくつかの試みがなされている。この領域の徹底的な探索を妨げている制限のひとつは、豊富なルールを特徴とする適切なデータセットの欠如である。この問題に対処するため,200,000$+$の注釈付き画像と関連するルールセットからなるVALUE(Vision And Logical Understanding Evaluation)データセットを,人気ボードゲームであるチェスに基づいて提示する。キュレートされたルールセットは許容可能な予測の集合をかなり制約し、ローカライゼーションや列挙のような重要な意味的能力を探るように設計されている。標準的なメトリクスに加えて、論理的一貫性に関するパフォーマンスを測定するための追加メトリクスも提示される。我々は,このタスクにおいて人気のある最先端のビジョンモデルをいくつか分析し,標準メトリクスでの性能は称賛に値するものの,矛盾した結果が多数生じることを示す。これは,このデータセットが今後の研究にとって重要な課題となることを示している。

Starting with early successes in computer vision tasks, deep learning based techniques have since overtaken state-of-the-art approaches in a multitude of domains. However, it has been demonstrated time and again that these techniques fail to capture semantic context and logical constraints, instead often relying on spurious correlations to arrive at the answer. Since the application of deep learning techniques to critical scenarios is dependent on adherence to domain-specific constraints, several attempts have been made to address this issue. One limitation holding back a thorough exploration of this area is a lack of suitable datasets that feature a rich set of rules. To address this, we present the VALUE (Vision And Logical Understanding Evaluation) Dataset, consisting of 200,000$+$ annotated images and an associated rule set, based on the popular board game chess. The curated rule set considerably constrains the set of allowable predictions and is designed to probe key semantic abilities like localization and enumeration. Alongside standard metrics, additional metrics to measure performance with regard to logical consistency are presented. We analyze several popular and state-of-the-art vision models on this task and show that, although their performance on standard metrics is laudable, they produce a plethora of incoherent results, indicating that this dataset presents a significant challenge for future work.

翻訳日:2024-02-08 19:33:55 公開日:2024-02-06

# (なぜ) 私のプロンプトは悪化しているのか? 進化するLLM APIに対する回帰テストの再考

(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs ( http://arxiv.org/abs/2311.11123v2 )

ライセンス: Link先を確認

Wanqin Ma, Chenyang Yang, Christian Kästner

(参考訳) 大規模言語モデル(LLM)はますますソフトウェアアプリケーションに統合されている。下流のアプリケーション開発者は、サービスとして提供されるAPIを通じてLLMにアクセスすることが多い。しかし、LLM APIは、しばしば静かに更新され、非推奨化が予定されるため、ユーザーは進化するモデルに継続的に適応せざるを得ない。これは性能の回帰(低下)を引き起こし、毒性検出のケーススタディで示されたように、プロンプト設計の選択に影響を与える可能性がある。ケーススタディに基づき、進化するLLM APIに対する回帰テストの必要性を強調し、その概念を再検討する。 LLMの回帰テストには、正確性の概念の違い、プロンプトの脆さ、LLM APIの非決定性のために、従来のテストアプローチに根本的な変更が必要であると我々は主張する。

Large Language Models (LLMs) are increasingly integrated into software applications. Downstream application developers often access LLMs through APIs provided as a service. However, LLM APIs are often updated silently and scheduled to be deprecated, forcing users to continuously adapt to evolving models. This can cause performance regression and affect prompt design choices, as evidenced by our case study on toxicity detection. Based on our case study, we emphasize the need for and re-examine the concept of regression testing for evolving LLM APIs. We argue that regression testing LLMs requires fundamental changes to traditional testing approaches, due to different correctness notions, prompting brittleness, and non-determinism in LLM APIs.

翻訳日:2024-02-08 19:33:32 公開日:2024-02-06

# プロンプト工学から、ループの中の人間とのプロンプト科学へ

From Prompt Engineering to Prompt Science With Human in the Loop ( http://arxiv.org/abs/2401.04122v2 )

ライセンス: Link先を確認

Chirag Shah

(参考訳) LLMが私たちの生活の様々な側面に進出するにつれ、LLMの使用に関してより厳しい精査が求められる場のひとつが科学的研究である。研究目的のデータの生成や分析にLLMを使うことが普及しつつある。しかし、そのような応用がアドホックな決定やエンジニアリング的な解決策に損なわれている場合、それがその研究、その発見、またはその研究に基づく将来の研究にどのように影響するかを懸念する必要がある。研究でLLMを使うには、より科学的なアプローチが必要である。より体系的なプロンプトの構築を支援するための取り組みはいくつか進行中だが、それらはしばしば、十分な透明性、客観性、または厳密さをもって複製可能で一般化可能な知識を生み出すことよりも、望ましい結果を達成することに重点を置いている。本稿では,この問題に対処するため、質的手法によるコードブック構築に着想を得た新しい方法論を提案する。この方法論は、ループ内の人間と多段階の検証プロセスを用いて、データ分析にLLMを適用するための、より体系的で客観的で信頼できる方法の基礎を築く。具体的には、研究者のグループがラベル付け、討議、文書化という厳密なプロセスを経ることで、主観性を排除し、プロンプト生成プロセスに透明性と複製可能性をもたらす方法を示す。

As LLMs make their way into many aspects of our lives, one place that warrants increased scrutiny of LLM usage is scientific research. Using LLMs for generating or analyzing data for research purposes is gaining popularity. But when such application is marred by ad-hoc decisions and engineering solutions, we need to be concerned about how it may affect that research, its findings, or any future work based on that research. We need a more scientific approach to using LLMs in our research. While there are several active efforts to support more systematic construction of prompts, they are often focused more on achieving desirable outcomes than on producing replicable and generalizable knowledge with sufficient transparency, objectivity, or rigor. This article presents a new methodology, inspired by codebook construction through qualitative methods, to address that. Using humans in the loop and a multi-phase verification process, this methodology lays a foundation for a more systematic, objective, and trustworthy way of applying LLMs for analyzing data. Specifically, we show how a set of researchers can work through a rigorous process of labeling, deliberating, and documenting to remove subjectivity and bring transparency and replicability to the prompt generation process.

翻訳日:2024-02-08 19:23:55 公開日:2024-02-06

# DiarizationLM:大規模言語モデルを用いた話者ダイアリゼーション後処理

DiarizationLM: Speaker Diarization Post-Processing with Large Language Models ( http://arxiv.org/abs/2401.03506v4 )

ライセンス: Link先を確認

Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

(参考訳) 本稿では,大規模言語モデル(LLM)を利用して話者ダイアリゼーションシステムの出力を後処理するフレームワークであるDiarizationLMを紹介する。提案するフレームワークでは,ダイアリゼーションされた書き起こしの可読性の向上や,単語ダイアリゼーション誤り率(WDER)の低減など,さまざまな目標を達成することができる。この枠組みでは、自動音声認識(ASR)および話者ダイアリゼーションシステムの出力をコンパクトなテキスト形式で表現し、任意で微調整されたLLMへのプロンプトに含める。 LLMの出力は、所望の改善が施された精緻化済みのダイアリゼーション結果として用いることができる。後処理ステップであるため、このフレームワークは既存のコンポーネントを再トレーニングすることなく、任意の既製のASRおよび話者ダイアリゼーションシステムに容易に適用できる。実験の結果,微調整された PaLM 2-S モデルにより、WDER を Fisher 電話会話データセットで相対 55.5%、Callhome English データセットで相対 44.9% 低減できることがわかった。

In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system. Various goals can be achieved with the proposed framework, such as improving the readability of the diarized transcript, or reducing the word diarization error rate (WDER). In this framework, the outputs of the automatic speech recognition (ASR) and speaker diarization systems are represented as a compact textual format, which is included in the prompt to an optionally finetuned LLM. The outputs of the LLM can be used as the refined diarization results with the desired enhancement. As a post-processing step, this framework can be easily applied to any off-the-shelf ASR and speaker diarization systems without retraining existing components. Our experiments show that a finetuned PaLM 2-S model can reduce the WDER by rel. 55.5% on the Fisher telephone conversation dataset, and rel. 44.9% on the Callhome English dataset.
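The abstract reports relative WDER reductions without defining the metric. The sketch below illustrates the idea under a simplifying assumption that is not taken from the abstract: reference and hypothesis words are already aligned one-to-one and speaker labels are already mapped, so the error rate is the fraction of words attributed to the wrong speaker. The function names `simple_wder` and `relative_reduction` are hypothetical.

```python
def simple_wder(ref_speakers, hyp_speakers):
    """Fraction of aligned words whose hypothesized speaker differs from the reference.

    Simplified illustration: assumes word-for-word alignment and consistent
    speaker label mapping, which a full WDER computation would have to establish.
    """
    if len(ref_speakers) != len(hyp_speakers):
        raise ValueError("sequences must be aligned word-for-word")
    if not ref_speakers:
        return 0.0
    wrong = sum(1 for r, h in zip(ref_speakers, hyp_speakers) if r != h)
    return wrong / len(ref_speakers)


def relative_reduction(before, after):
    """Relative reduction of an error rate, e.g. 0.2 -> 0.1 gives 0.5 (50% rel.)."""
    return (before - after) / before


if __name__ == "__main__":
    baseline = simple_wder([1, 1, 2, 2], [1, 2, 1, 2])       # two of four words wrong
    post_processed = simple_wder([1, 1, 2, 2], [1, 1, 1, 2])  # one of four words wrong
    print(baseline, post_processed, relative_reduction(baseline, post_processed))
```

A "rel. 55.5%" figure in the abstract is a relative reduction of this kind, not an absolute drop in percentage points.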

翻訳日:2024-02-08 19:23:33 公開日:2024-02-06

# 教師なし深層学習画像検証法

Unsupervised Deep Learning Image Verification Method ( http://arxiv.org/abs/2312.14395v2 )

ライセンス: Link先を確認

Enoch Solomon, Abraham Woubie and Eyael Solomon Emiru

(参考訳) ディープラーニングは画像認識に広く使用されているが、通常は大量のラベル付きトレーニングデータを必要とし、それが常に容易に入手できるとは限らない。これにより、最先端の教師なし顔認証技術と比較すると、顕著な性能格差が生じる。本研究では,顔画像ベクトルを新しい表現に変換するオートエンコーダを利用して,このギャップを狭める手法を提案する。特に、オートエンコーダは、元の入力画像ベクトルではなく、近傍の顔画像ベクトルを再構成するように訓練される。これらの近傍顔画像ベクトルは、訓練顔画像ベクトルとのコサインスコアが最も高いものとして、教師なしプロセスにより選択される。提案手法は,Labeled Faces in the Wild (LFW) データセットにおいて、ベースラインシステムに対してEERで56%の相対的改善を達成する。これにより、コサインスコアリングとPLDAスコアリングのシステム間の性能ギャップを狭めることに成功した。

Although deep learning is commonly employed for image recognition, it usually requires a huge amount of labeled training data, which may not always be readily available. This leads to a noticeable performance disparity when compared to state-of-the-art unsupervised face verification techniques. In this work, we propose a method to narrow this gap by leveraging an autoencoder to convert the face image vector into a novel representation. Notably, the autoencoder is trained to reconstruct neighboring face image vectors rather than the original input image vectors. These neighbor face image vectors are chosen through an unsupervised process based on the highest cosine scores with the training face image vectors. The proposed method achieves a relative improvement of 56% in terms of EER over the baseline system on the Labeled Faces in the Wild (LFW) dataset. This successfully narrows the performance gap between cosine and PLDA scoring systems.
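The neighbor-selection step described above can be sketched as follows. This is a minimal illustration, assuming that each training vector gets exactly one reconstruction target (its highest-scoring neighbor, excluding itself); the abstract does not say how many neighbors are used, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine score between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbor_targets(vectors):
    """For each vector, return the index of the other vector with the highest cosine score.

    No labels are used, so the selection is unsupervised. An autoencoder would
    then be trained to map vectors[i] to vectors[targets[i]].
    """
    targets = []
    for i, u in enumerate(vectors):
        best_j, best_score = None, -2.0  # cosine scores lie in [-1, 1]
        for j, v in enumerate(vectors):
            if i == j:
                continue
            score = cosine(u, v)
            if score > best_score:
                best_j, best_score = j, score
        targets.append(best_j)
    return targets


if __name__ == "__main__":
    faces = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(nearest_neighbor_targets(faces))
```

The quadratic loop is only for readability; a real system with many face vectors would normally use a matrix product or an approximate nearest-neighbor index.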

翻訳日:2024-02-08 19:22:34 公開日:2024-02-06

# 最適統計透かしに向けて

Towards Optimal Statistical Watermarking ( http://arxiv.org/abs/2312.07930v3 )

ライセンス: Link先を確認

Baihe Huang and Hanlin Zhu and Banghua Zhu and Kannan Ramchandran and Michael I. Jordan and Jason D. Lee and Jiantao Jiao

(参考訳) 統計的ウォーターマーキングを仮説検定問題として定式化して研究する。これは従来のすべての統計的ウォーターマーキング手法を包含する一般的な枠組みである。我々の定式化の鍵は出力トークンと棄却域の結合(カップリング)であり、実際には擬似乱数生成器によって実現され、第一種過誤と第二種過誤の間の非自明なトレードオフを可能にする。一般の仮説検定設定における一様最強力(UMP: Uniformly Most Powerful)な透かしと、モデル非依存設定におけるミニマックスな第二種過誤を特徴付ける。出力が$n$トークンの系列である一般的なシナリオでは、小さな第一種・第二種過誤を保証するために必要なi.i.d.トークンの数について、ほぼ一致する上界と下界を確立する。トークン当たりの平均エントロピー$h$に関する我々のレート$\Theta(h^{-1} \log (1/h))$は、先行研究の$h^{-2}$というレートからの改善の余地を示している。さらに、ユーザが生成テキストに対して一定のクラスの摂動を加えることを許されるロバストな透かし問題を定式化し、線形計画問題を通じてロバストなUMP検定の最適な第二種過誤を特徴付ける。我々の知る限りでは、これはi.i.d.設定においてほぼ最適なレートを持つ、透かし問題に関する最初の体系的な統計的取り扱いであり、今後の研究の関心を引く可能性がある。

We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error. We characterize the Uniformly Most Powerful (UMP) watermark in the general hypothesis testing setting and the minimax Type II error in the model-agnostic setting. In the common scenario where the output is a sequence of $n$ tokens, we establish nearly matching upper and lower bounds on the number of i.i.d. tokens required to guarantee small Type I and Type II errors. Our rate of $\Theta(h^{-1} \log (1/h))$ with respect to the average entropy per token $h$ highlights potentials for improvement from the rate of $h^{-2}$ in the previous works. Moreover, we formulate the robust watermarking problem where the user is allowed to perform a class of perturbations on the generated texts, and characterize the optimal Type II error of robust UMP tests via a linear programming problem. To the best of our knowledge, this is the first systematic statistical treatment on the watermarking problem with near-optimal rates in the i.i.d. setting, which might be of interest for future works.
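To get a feel for the gap between the two rates quoted above, the following sketch evaluates $h^{-1}\log(1/h)$ and $h^{-2}$ for a few entropy values. The constants hidden by $\Theta(\cdot)$ are set to 1 here, which is purely an assumption for illustration, so the numbers show only how the two expressions scale, not actual token counts.

```python
import math


def new_rate(h):
    """h^{-1} * log(1/h), with the hidden constant assumed to be 1."""
    return math.log(1.0 / h) / h


def previous_rate(h):
    """h^{-2}, with the hidden constant assumed to be 1."""
    return 1.0 / (h * h)


if __name__ == "__main__":
    for h in (0.5, 0.1, 0.01):
        print(f"h={h}: new ~ {new_rate(h):.1f}, previous ~ {previous_rate(h):.1f}")
```

Because $h\log(1/h) < 1$ for every $h \in (0, 1)$, the first expression is smaller than the second throughout that range, and the ratio grows as the per-token entropy shrinks.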

翻訳日:2024-02-08 19:20:22 公開日:2024-02-06

# 拡散モデルにおけるニューラルネットワークに基づくスコア推定:最適化と一般化

Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization ( http://arxiv.org/abs/2401.15604v2 )

ライセンス: Link先を確認

Yinbin Han, Meisam Razaviyayn, Renyuan Xu

(参考訳) 拡散モデルがgansに匹敵する強力なツールとして登場し、忠実性、柔軟性、堅牢性を改善した高品質なサンプルを生成する。これらのモデルの鍵となる要素は、スコアマッチングを通じてスコア関数を学ぶことである。様々なタスクで経験的な成功にもかかわらず、勾配に基づくアルゴリズムが証明可能な精度でスコア関数を学習できるかどうかは不明である。この質問に答える第一歩として,勾配降下によって学習したニューラルネットワークを用いてスコア推定を解析するための数学的枠組みを確立した。本分析は,学習手順の最適化と一般化の両面をカバーする。特に,ノイズラベルを用いた回帰として,発声スコアマッチング問題を定式化するパラメトリック形式を提案する。標準教師付き学習装置と比較して、スコアマッチング問題は、非有界入力、ベクトル値出力、追加の時間変数などの異なる課題を導入し、既存のテクニックが直接適用されないようにする。本稿では、適切に設計されたニューラルネットワークアーキテクチャを用いて、スコア関数を、神経接核によって引き起こされる再生核ヒルベルト空間によって正確に近似できることを示す。さらに,勾配降下の早期停止ルールを適用し,ニューラルネットワークのトレーニングとカーネル回帰の結合引数を活用することで,観測にノイズが存在するにもかかわらずスコア関数を学習するための最初の一般化誤差(サンプル複雑性)境界を確立する。本研究は,ニューラルネットの新しいパラメトリック形式と,スコアマッチングと回帰分析の革新的な関連を基礎として,高度な統計・最適化手法の適用を促進する。

Diffusion models have emerged as a powerful tool rivaling GANs in generating high-quality samples with improved fidelity, flexibility, and robustness. A key component of these models is to learn the score function through score matching. Despite empirical success on various tasks, it remains unclear whether gradient-based algorithms can learn the score function with a provable accuracy. As a first step toward answering this question, this paper establishes a mathematical framework for analyzing score estimation using neural networks trained by gradient descent. Our analysis covers both the optimization and the generalization aspects of the learning procedure. In particular, we propose a parametric form to formulate the denoising score-matching problem as a regression with noisy labels. Compared to the standard supervised learning setup, the score-matching problem introduces distinct challenges, including unbounded input, vector-valued output, and an additional time variable, preventing existing techniques from being applied directly. In this paper, we show that with a properly designed neural network architecture, the score function can be accurately approximated by a reproducing kernel Hilbert space induced by neural tangent kernels. Furthermore, by applying an early-stopping rule for gradient descent and leveraging certain coupling arguments between neural network training and kernel regression, we establish the first generalization error (sample complexity) bounds for learning the score function despite the presence of noise in the observations. Our analysis is grounded in a novel parametric form of the neural network and an innovative connection between score matching and regression analysis, facilitating the application of advanced statistical and optimization techniques.
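The phrase "regression with noisy labels" can be made concrete with the standard denoising score-matching identity. The notation below ($\alpha_t$, $\sigma_t$, $s_\theta$) is assumed for illustration and is not taken from the abstract; the paper's own parametric form may differ. For a forward process with Gaussian perturbation,

$$x_t = \alpha_t x_0 + \sigma_t \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I),$$

the conditional score is available in closed form,

$$\nabla_{x_t} \log p(x_t \mid x_0) = -\frac{x_t - \alpha_t x_0}{\sigma_t^{2}} = -\frac{\varepsilon}{\sigma_t},$$

so the denoising score-matching objective reads

$$\min_{\theta} \; \mathbb{E}_{t,\, x_0,\, \varepsilon} \left\| s_\theta(x_t, t) + \frac{\varepsilon}{\sigma_t} \right\|^{2}.$$

This is a least-squares regression with input $(x_t, t)$ and label $-\varepsilon/\sigma_t$. The label is noisy in the sense that its conditional mean given $x_t$ is the true score $\nabla_{x_t} \log p_t(x_t)$, while each individual label deviates from it. The input $x_t$ is unbounded, the output is vector-valued, and $t$ enters as an extra variable, which matches the three difficulties listed in the abstract.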

翻訳日:2024-02-08 19:11:38 公開日:2024-02-06

# TA-RNN:電子健康記録のための注意に基づく時間認識リカレントニューラルネットワークアーキテクチャ

TA-RNN: an Attention-based Time-aware Recurrent Neural Network Architecture for Electronic Health Records ( http://arxiv.org/abs/2401.14694v2 )

ライセンス: Link先を確認

Mohammad Al Olaimat, Serdar Bozdag (for the Alzheimer's Disease Neuroimaging Initiative)

(参考訳) 動機:Electronic Health Records(EHR)は患者の医療史の総合的な資料である。 EHRは、深層学習(DL)のような高度な技術を活用するために不可欠であり、医療提供者が広範なデータを分析し、貴重な洞察を抽出し、正確でデータ駆動型の臨床決定を下すことができる。リカレントニューラルネットワーク(Recurrent Neural Networks, RNN)のようなDL手法を用いて, EHRを分析して疾患の進行をモデル化し, 診断を予測する。しかし、これらの手法は、臨床訪問間の不規則な時間間隔など、EHRデータに固有の不規則性には対処しない。さらに、ほとんどのDLモデルは解釈できない。本研究では,RNNをベースとした2つの解釈可能なDLアーキテクチャ,TA-RNN(Time-Aware RNN)とTA-RNN-Autoencoder(TA-RNN-AE)を提案する。本研究では,不規則な時間間隔の影響を軽減するため,訪問時間間の時間埋め込みを提案する。そこで本研究では,各訪問における訪問と特徴の間で動作する2レベルアテンション機構を提案する。結果: アルツハイマー病神経画像イニシアチブ (ADNI) と国立アルツハイマー病コーディネートセンター (NACC) データセットを用いて行った実験の結果, F2 と感度に基づく最先端およびベースラインアプローチと比較して,アルツハイマー病(AD)を予測するための提案モデルの優れた性能を示した。さらに、TA-RNNは、死亡予測のためのMIMIC-IIIデータセットにおいて優れた性能を示した。アブレーション実験では,時間埋め込みと注意機構を取り入れた予測性能が向上した。最後に注意重みの調査は、予測に影響力のある訪問や特徴を特定するのに役立った。

Motivation: Electronic Health Records (EHR) represent a comprehensive resource of a patient's medical history. EHR are essential for utilizing advanced technologies such as deep learning (DL), enabling healthcare providers to analyze extensive data, extract valuable insights, and make precise and data-driven clinical decisions. DL methods such as Recurrent Neural Networks (RNN) have been utilized to analyze EHR to model disease progression and predict diagnosis. However, these methods do not address some inherent irregularities in EHR data such as irregular time intervals between clinical visits. Furthermore, most DL models are not interpretable. In this study, we propose two interpretable DL architectures based on RNN, namely Time-Aware RNN (TA-RNN) and TA-RNN-Autoencoder (TA-RNN-AE) to predict patient's clinical outcome in EHR at next visit and multiple visits ahead, respectively. To mitigate the impact of irregular time intervals, we propose incorporating time embedding of the elapsed times between visits. For interpretability, we propose employing a dual-level attention mechanism that operates between visits and features within each visit. Results: The results of the experiments conducted on Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC) datasets indicated superior performance of proposed models for predicting Alzheimer's Disease (AD) compared to state-of-the-art and baseline approaches based on F2 and sensitivity. Additionally, TA-RNN showed superior performance on Medical Information Mart for Intensive Care (MIMIC-III) dataset for mortality prediction. In our ablation study, we observed enhanced predictive performance by incorporating time embedding and attention mechanisms. Finally, investigating attention weights helped identify influential visits and features in predictions.

翻訳日:2024-02-08 19:11:13 公開日:2024-02-06

# Varshni-Hellmannポテンシャルの近似束縛状態解

Approximate Bound States Solution of the Varshni-Hellmann Potential ( http://arxiv.org/abs/2401.11151v3 )

ライセンス: Link先を確認

N. Tazimi

(参考訳) 本稿では,Varshni-Hellmannポテンシャルの束縛状態問題を有用な手法で解く。本手法では,Varshni-Hellmannポテンシャルに対するSchrödinger方程式の束縛状態解をansatz法で求める。エネルギー固有値と対応する固有関数を得る。また、二体系の基底状態と励起状態の両方について、エネルギースペクトルの挙動を図示する。この結果と精密な数値との類似性は,本手法の効率性を示すものである。

In this paper, we solve the bound state problem for the Varshni-Hellmann potential via a useful technique. In our technique, we obtain the bound state solution of the Schrödinger equation for the Varshni-Hellmann potential via the ansatz method. We obtain the energy eigenvalues and the corresponding eigenfunctions. Also, the behavior of the energy spectra for both the ground and the excited states of two-body systems is illustrated graphically. The similarity of our results to the accurate numerical values is indicative of the efficiency of our technique.

翻訳日:2024-02-08 19:10:19 公開日:2024-02-06

# AIの実存的リスクの2つのタイプ:決定的リスクと累積的リスク

Two Types of AI Existential Risk: Decisive and Accumulative ( http://arxiv.org/abs/2401.07836v2 )

ライセンス: Link先を確認

Atoosa Kasirzadeh

(参考訳) AIによる実存的リスク(xリスク)に関する従来の議論は、一般的に、高度なAIシステム、特に人間レベルの知性に到達するか、それを超える可能性のあるシステムによって引き起こされる、突発的で深刻な出来事に焦点を当てている。これらの出来事は、人類の絶滅につながるか、あるいは人間の文明を回復不可能なところまで不可逆的に損なう深刻な結果をもたらす。しかし、この議論はしばしば、より小さいが相互に関連した一連の混乱を通じてAIのxリスクが段階的に現れ、時間とともに徐々に臨界しきい値を超えていくという深刻な可能性を見落としている。本稿では,従来の「決定的AI xリスク仮説」と「累積的AI xリスク仮説」を対比する。前者は、制御不能な超知能のようなシナリオを特徴とする、明白なAIによる乗っ取りの経路を想定しているのに対し、後者は、実存的破局に至る別の因果経路を示唆している。これには、深刻な脆弱性や経済・政治構造の体系的侵食など、AIによって引き起こされる重大な脅威が徐々に蓄積されることが含まれる。累積仮説は、漸進的なAIリスクがゆっくりと収束してレジリエンスを損ない、引き金となる事象が不可逆的な崩壊をもたらす「ゆでガエル」のシナリオを示唆する。システム分析を通じて,これら2つの仮説を区別する明確な仮定について検討する。そのうえで、累積的な視点は、AIリスクに関する一見相容れない視点を調和させると論じる。決定的経路と累積的経路というこれらの因果経路を区別することが,AIリスクのガバナンスや長期的なAI安全性に与える影響について論じる。

The conventional discourse on existential risks (x-risks) from AI typically focuses on abrupt, dire events caused by advanced AI systems, particularly those that might achieve or surpass human-level intelligence. These events have severe consequences that either lead to human extinction or irreversibly cripple human civilization to a point beyond recovery. This discourse, however, often neglects the serious possibility of AI x-risks manifesting incrementally through a series of smaller yet interconnected disruptions, gradually crossing critical thresholds over time. This paper contrasts the conventional "decisive AI x-risk hypothesis" with an "accumulative AI x-risk hypothesis." While the former envisions an overt AI takeover pathway, characterized by scenarios like uncontrollable superintelligence, the latter suggests a different causal pathway to existential catastrophes. This involves a gradual accumulation of critical AI-induced threats such as severe vulnerabilities and systemic erosion of econopolitical structures. The accumulative hypothesis suggests a boiling frog scenario where incremental AI risks slowly converge, undermining resilience until a triggering event results in irreversible collapse. Through systems analysis, this paper examines the distinct assumptions differentiating these two hypotheses. It is then argued that the accumulative view reconciles seemingly incompatible perspectives on AI risks. The implications of differentiating between these causal pathways -- the decisive and the accumulative -- for the governance of AI risks as well as long-term AI safety are discussed.

翻訳日:2024-02-08 19:09:43 公開日:2024-02-06

# LightHGNN: $100\times$高速な推論のためにハイパーグラフニューラルネットワークをMLPに蒸留する

LightHGNN: Distilling Hypergraph Neural Networks into MLPs for $100\times$ Faster Inference ( http://arxiv.org/abs/2402.04296v1 )

ライセンス: Link先を確認

Yifan Feng, Yihe Luo, Shihui Ying, Yue Gao

(参考訳) ハイパーグラフニューラルネットワーク(HGNN)は近年注目され,高次相関モデリングにおける優位性から良好な性能を示している。しかし、ハイパーグラフの高次モデリング能力は計算の複雑さを増大させ、実用的な産業展開を妨げることも指摘されている。実際、HGNNの効率的なデプロイにおける重要な障壁のひとつは、推論中の高次構造的依存関係である。本稿では,HGNNのハイパーグラフ依存性を解消し,計算複雑性を低減して推論速度を向上させるため,HGNNと推論効率のよいMulti-Layer Perceptron(MLP)のギャップを埋めることを提案する。具体的には、複雑性の低い高速推論のために、LightHGNNとLightHGNN$^+$を導入する。 LightHGNN は教師 HGNN から学生 MLP への知識をソフトラベルを通じて直接蒸留し、LightHGNN$^+$ はさらに、信頼性の高い高次相関を学生 MLP に明示的に注入することで、トポロジーを考慮した蒸留と過平滑化(over-smoothing)への耐性を実現する。 8つのハイパーグラフデータセットの実験では、ハイパーグラフの依存関係がなくても、提案されたLightHGNNはHGNNと同等以上の性能を達成し、バニラMLPを平均$16.3$上回った。 3つのグラフデータセットに関する広範な実験では、我々のLightHGNNが他のすべての手法と比較して平均で最良の性能を示した。 5.5wの頂点を持つ合成ハイパーグラフの実験は、LightHGNNがHGNNよりも$100\times$高速に動作可能であることを示しており、レイテンシに敏感なデプロイへの適性を示している。

Hypergraph Neural Networks (HGNNs) have recently attracted much attention and exhibited satisfactory performance due to their superiority in high-order correlation modeling. However, it is noticed that the high-order modeling capability of hypergraph also brings increased computation complexity, which hinders its practical industrial deployment. In practice, we find that one key barrier to the efficient deployment of HGNNs is the high-order structural dependencies during inference. In this paper, we propose to bridge the gap between the HGNNs and inference-efficient Multi-Layer Perceptron (MLPs) to eliminate the hypergraph dependency of HGNNs and thus reduce computational complexity as well as improve inference speed. Specifically, we introduce LightHGNN and LightHGNN$^+$ for fast inference with low complexity. LightHGNN directly distills the knowledge from teacher HGNNs to student MLPs via soft labels, and LightHGNN$^+$ further explicitly injects reliable high-order correlations into the student MLPs to achieve topology-aware distillation and resistance to over-smoothing. Experiments on eight hypergraph datasets demonstrate that even without hypergraph dependency, the proposed LightHGNNs can still achieve competitive or even better performance than HGNNs and outperform vanilla MLPs by $16.3$ on average. Extensive experiments on three graph datasets further show the average best performance of our LightHGNNs compared with all other methods. Experiments on synthetic hypergraphs with 5.5w vertices indicate LightHGNNs can run $100\times$ faster than HGNNs, showcasing their ability for latency-sensitive deployments.
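Distillation "via soft labels" can be illustrated with a generic knowledge-distillation loss. The sketch below is an assumption-laden example rather than the LightHGNN objective: it uses the KL divergence between temperature-softened teacher and student distributions, which is a common choice, and the abstract does not state which divergence or temperature the authors use.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) between softened class distributions for one vertex.

    The teacher (e.g. an HGNN) needs the hypergraph to produce its logits; the
    student (an MLP) sees only vertex features, so inference needs no hypergraph.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]
    print(soft_label_loss(teacher, [2.0, 0.5, -1.0]))  # matching student
    print(soft_label_loss(teacher, [0.0, 0.0, 0.0]))   # uninformed student
```

In training, this per-vertex loss would be averaged over vertices and usually combined with the ordinary cross-entropy on labeled vertices.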

翻訳日:2024-02-08 18:48:22 公開日:2024-02-06

# ECGスペクトログラムとディープラーニングを用いたパーソナリティ特性認識

Personality Trait Recognition using ECG Spectrograms and Deep Learning ( http://arxiv.org/abs/2402.04326v1 )

ライセンス: Link先を確認

Muhammad Mohsin Altaf, Saadat Ullah Khan, Muhammad Majd, Syed Muhammad Anwar

(参考訳) 本稿では,心電図(ECG)信号に深層学習(DL)手法を適用して性格特性を認識する革新的なアプローチを提案する。本研究は、外向性、神経症傾向、協調性、誠実性、開放性からなるビッグファイブ性格特性モデルの枠組みの中で、ECG由来のスペクトログラムが有用な特徴量となる可能性を探る。スペクトログラム生成のための最適なウィンドウサイズを決定し、特徴抽出と性格特性分類には畳み込みニューラルネットワーク(CNN)、具体的にはResNet-18と、Vision Transformer(ViT)を使用する。本研究では、公開されているASCERTAINデータセットを用いる。このデータセットはECG記録を含む各種の生理的信号からなり、感情価(valence)と覚醒度(arousal)のレベルで分類された映像刺激の提示中に58名の参加者から収集された。本研究の結果は性格特性の分類において顕著な性能を示し,異なるウィンドウサイズと性格特性にわたって一貫して0.9を超えるF1スコアを達成している。以上の結果から,ECG信号のスペクトログラムは性格特性認識に有用なモダリティであり,ResNet-18は異なる性格特性の識別に有効であることが示唆された。

This paper presents an innovative approach to recognizing personality traits using deep learning (DL) methods applied to electrocardiogram (ECG) signals. Within the framework of the Big Five personality traits model, encompassing extraversion, neuroticism, agreeableness, conscientiousness, and openness, the research explores the potential of ECG-derived spectrograms as informative features. Optimal window sizes for spectrogram generation are determined, and a convolutional neural network (CNN), specifically ResNet-18, and a vision transformer (ViT) are employed for feature extraction and personality trait classification. The study utilizes the publicly available ASCERTAIN dataset, which comprises various physiological signals, including ECG recordings, collected from 58 participants during the presentation of video stimuli categorized by valence and arousal levels. The outcomes of this study demonstrate noteworthy performance in personality trait classification, consistently achieving F1-scores exceeding 0.9 across different window sizes and personality traits. These results emphasize the viability of ECG signal spectrograms as a valuable modality for personality trait recognition, with ResNet-18 exhibiting effectiveness in discerning distinct personality traits.

翻訳日:2024-02-08 18:34:53 公開日:2024-02-06

# ドーパント系量子ドットの1次元鎖による電子輸送

Electron Transport Through a 1D Chain of Dopant-Based Quantum Dots ( http://arxiv.org/abs/2402.04300v1 )

ライセンス: Link先を確認

Sumedh Vangara

(参考訳) 強く相互作用する電子系は、Mott絶縁体的挙動やスピン液体性などの量子多体現象に対する洞察を与え、半導体の最適化を促進する。 Fermi-Hubbard モデルはそのようなシステムを研究するために使われる原型モデルである。しかし、近年の研究では、長距離相互作用を考慮に入れた拡張Fermi-Hubbardモデルの方が、特にハーフフィリングから遠い系に対して、より正確であることが示されている。本研究では,拡張Fermi-Hubbardモデルを用いて量子ドットの格子を通る電荷輸送を数学的に解析する。スピンレス電子とソース・ドレインバイアスを持つ一次元鎖を対象とし、基底状態と第一励起状態の間の遷移に焦点を当てる。鎖へのホッピングが鎖内のホッピングに近づくにつれて、準位反発により反交差(anticrossing)の期待エネルギー準位が低下する。鎖に沿った電荷密度の分布はホッピング、核、クーロンの各パラメーターによって特徴づけられ、新しいプラズモニック挙動が解析される。電子輸送における小さな摂動が、観測された系の一次元的な性質に対応するものとして同定される。この研究は、相関誘起バンドギャップの形成のようなシリコンドープ半導体の電子挙動をよりよく理解することにつながり、拡張Fermi-Hubbardモデルを量子多体系の研究のより正確な代替として利用するための扉を開く。

Strongly interacting electron systems can provide insight into quantum many-body phenomena, such as Mott insulating behavior and spin liquidity, facilitating semiconductor optimization. The Fermi-Hubbard model is the prototypical model used to study such systems. Recent research, however, has shown that the extended Fermi-Hubbard model, which accounts for long-range interactions, is more accurate, especially for systems far from half-filling. In this study, we use the extended Fermi-Hubbard model to mathematically analyze charge transport through a lattice of quantum dots. One-dimensional chains with spinless electrons and source-drain bias are observed, focusing on the transition between the ground state and the first excited state. Level repulsion decreases the expected energy levels of anticrossings as the hopping onto the chain tends to the hopping within the chain. The distribution of charge density along the chain is characterized in terms of the hopping, nuclear, and Coulomb parameters and novel plasmonic behavior is analyzed. Minor perturbations in electron transport are identified, corresponding to the one-dimensional nature of the observed systems. This research will lead to a better understanding of electron behavior in silicon-doped semiconductors, like the formation of correlation-induced band gaps, and open the door to using the extended Fermi-Hubbard model as a more accurate alternative to study quantum many-body systems.

翻訳日:2024-02-08 18:34:30 公開日:2024-02-06

# 多視点記号回帰

Multi-View Symbolic Regression ( http://arxiv.org/abs/2402.04298v1 )

ライセンス: Link先を確認

Etienne Russeil, Fabrício Olivetti de França, Konstantin Malanchev, Bogdan Burlacu, Emille E. O. Ishida, Marion Leroux, Clément Michelin, Guillaume Moinard, Emmanuel Gangler

(参考訳) 記号回帰(sr)は、説明変数の集合と応答変数の関係を表す解析式を探索する。現在のsrメソッドは、単一の実験から抽出された単一のデータセットを想定している。しかしながら、研究者はしばしば異なる設定で行われた実験から得られた複数の結果に直面する。従来のSR法では、各実験のパラメータが異なるため、基礎となる式を見つけることができない。本研究では,複数のデータセットを同時に考慮し,実験環境を模倣し,一般的なパラメトリック解を出力するマルチビューシンボリック回帰(mvsr)を提案する。このアプローチは、各独立データセットに評価された式を適合させ、すべてのデータセットを正確に適合できる関数 f(x; \theta) のパラメトリック族を返す。我々は、既知の表現から生成されたデータと、天文学、化学、経済から得られた実世界のデータを用いて、MvSRの有効性を実証する。その結果、MvSRは正しい表現をより頻繁に獲得し、ハイパーパラメーターの変化に対して堅牢であることがわかった。実世界のデータでは、集団の振る舞いを把握し、文献から既知の表現を回収し、有望な代替品を回収し、SRを幅広い実験シナリオに利用できるようにする。

Symbolic regression (SR) searches for analytical expressions representing the relationship between a set of explanatory and response variables. Current SR methods assume a single dataset extracted from a single experiment. Nevertheless, the researcher is frequently confronted with multiple sets of results obtained from experiments conducted with different setups. Traditional SR methods may fail to find the underlying expression since the parameters of each experiment can be different. In this work we present Multi-View Symbolic Regression (MvSR), which takes into account multiple datasets simultaneously, mimicking experimental environments, and outputs a general parametric solution. This approach fits the evaluated expression to each independent dataset and returns a parametric family of functions $f(x; \theta)$ simultaneously capable of accurately fitting all datasets. We demonstrate the effectiveness of MvSR using data generated from known expressions, as well as real-world data from astronomy, chemistry and economy, for which an a priori analytical expression is not available. Results show that MvSR obtains the correct expression more frequently and is robust to hyperparameter changes. In real-world data, it is able to grasp the group behaviour, recovering known expressions from the literature as well as promising alternatives, thus enabling the use of SR in a large range of experimental scenarios.
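The core evaluation step described above, fitting one candidate expression separately to each dataset and then judging it across all of them, can be sketched as follows. This is a toy illustration under stated assumptions: the candidate is fixed to the linear form $f(x; \theta) = \theta_0 x + \theta_1$, the per-dataset fit is ordinary least squares, and the per-dataset errors are aggregated with `max`. The abstract does not specify the aggregation rule, and a real SR system would search over many candidate expressions.

```python
def fit_line(xs, ys):
    """Least-squares fit of f(x; theta) = theta0 * x + theta1 to one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(xs, ys, theta):
    """Mean squared error of the fitted candidate on one dataset."""
    slope, intercept = theta
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def multi_view_score(datasets):
    """Fit the candidate to every view independently, then aggregate the errors.

    Returns the per-view parameters and the worst per-view error, so a candidate
    scores well only if its functional form fits every experiment.
    """
    thetas = [fit_line(xs, ys) for xs, ys in datasets]
    errors = [mse(xs, ys, theta) for (xs, ys), theta in zip(datasets, thetas)]
    return thetas, max(errors)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    view_a = (xs, [1.0, 3.0, 5.0, 7.0])      # generated with theta = (2, 1)
    view_b = (xs, [-1.0, -1.5, -2.0, -2.5])  # generated with theta = (-0.5, -1)
    print(multi_view_score([view_a, view_b]))
```

The two views share the functional form but not the parameters, which is the situation the abstract describes: a single pooled fit would fail, while per-view fitting recovers a small error for both.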

翻訳日:2024-02-08 18:34:04 公開日:2024-02-06

# 道路表面欠陥検出 -画像ベースから非画像ベースへ-

Road Surface Defect Detection -- From Image-based to Non-image-based: A Survey ( http://arxiv.org/abs/2402.04297v1 )

ライセンス: Link先を確認

Jongmin Yu, Jiaqi Jiang, Sebastiano Fichera, Paolo Paoletti, Lisa Layzell, Devansh Mehta, and Shan Luo

(参考訳) 交通安全の確保が不可欠であり,道路面欠陥の検出と防止が必要である。その結果,本研究への関心が高まり,様々な路面欠陥検出手法の開発に繋がった。道路欠陥検出方法は、入力データの種類や訓練方法によって、様々な方法で分類することができる。主なアプローチは画像ベースの手法で、ピクセル強度や表面テクスチャを分析して欠陥を識別する。その人気にもかかわらず、画像ベースの手法は天候や照明の変化に対する脆弱性の明確な制限を共有している。この問題に対処するために、レーザースキャナやLiDARなどの追加センサーの使用を検討し、スケールと体積の点で欠陥を検出するための明確な深度情報を提供してきた。しかし,画像以外のデータの探索は十分に研究されていない。本稿では,道路表面欠陥検出研究の包括的レビューを行い,入力データ型と手法に基づいて分類する。さらに,最近提案した非画像ベースの手法を概観し,これらの手法に関する課題と課題について考察した。

Ensuring traffic safety is crucial, which necessitates the detection and prevention of road surface defects. As a result, there has been a growing interest in the literature on the subject, leading to the development of various road surface defect detection methods. The methods for detecting road defects can be categorised in various ways depending on the input data types or training methodologies. The predominant approach involves image-based methods, which analyse pixel intensities and surface textures to identify defects. Despite their popularity, image-based methods share the distinct limitation of vulnerability to weather and lighting changes. To address this issue, researchers have explored the use of additional sensors, such as laser scanners or LiDARs, providing explicit depth information to enable the detection of defects in terms of scale and volume. However, the exploration of data beyond images has not been sufficiently investigated. In this survey paper, we provide a comprehensive review of road surface defect detection studies, categorising them based on input data types and methodologies used. Additionally, we review recently proposed non-image-based methods and discuss several challenges and open problems associated with these techniques.

翻訳日:2024-02-08 18:33:47 公開日:2024-02-06

# 誘導電気双極子系における修正された引力的逆二乗ポテンシャル

Modified attractive inverse-square potential in the induced electric dipole system ( http://arxiv.org/abs/2402.04294v1 )

ライセンス: Link先を確認

K. Bakke and J. G. G. S. Ramos

(参考訳) 内側半径を $r_{0}$ と表記した、拡張された非導電性円柱内の電荷の空間分布について検討する。本研究は, 電界と中性粒子の誘導電気双極子モーメントの複雑な相互作用から生じる, 修正された引力的逆二乗ポテンシャルの出現を明らかにした。この修正されたポテンシャルは、$r^{-1}$に比例する追加項を示す点で、従来の逆二乗ポテンシャルから明確に異なる。結果として、この複雑なシステム内での離散エネルギースペクトルの実現に関する説得力のある証拠を提示する。

We examine the spatial distribution of electric charges within an extended, non-conductive cylinder featuring an inner radius denoted as $r_{0}$. Our investigation unveils the emergence of a distinct modified attractive-inverse square potential, arising from the intricate interplay between the electric field and the induced electric dipole moment of a neutral particle. This modified potential notably departs from the conventional inverse-square potential, showcasing an additional term proportional to $r^{-1}$. As a result, we present compelling evidence for the realization of a discrete energy spectrum within this intricate system.

翻訳日:2024-02-08 18:33:30 公開日:2024-02-06

# 調和振動子による誘導電双極子系の魅力的な逆二乗ポテンシャルについて

On the attractive inverse-square potential in the induced electric dipole system under the influence of the harmonic oscillator ( http://arxiv.org/abs/2402.04293v1 )

ライセンス: Link先を確認

K. Bakke and J. G. G. S. Ramos

(参考訳) 我々は、高調波発振器の影響下で誘導電気双極子モーメント系における魅力的な2乗ポテンシャルに対するシュリンガー方程式の解析解を得る。電場配置が中性粒子に対して禁止領域を課すカットオフ点をもたらすとき、境界状態が存在することを示す。そして、$s$-wavesを扱うことにより、強電界レジームにおけるエネルギー固有値と調和振動子の角周波数の小さい値を得る。さらに、エネルギー固有値に関する議論を$s$-waveを超えて拡張する。

We obtain the analytical solutions to the Schrödinger equation for the attractive inverse-square potential in an induced electric dipole moment system under the influence of the harmonic oscillator. We show that bound states can exist when the electric field configuration brings a cut-off point that imposes a forbidden region for the neutral particle. Then, by dealing with $s$-waves, we obtain the energy eigenvalues in the strong electric field regime and for small values of the angular frequency of the harmonic oscillator. Further, we extend our discussion about the energy eigenvalues beyond the $s$-waves.

翻訳日:2024-02-08 18:33:21 公開日:2024-02-06

# AdaFlow: 可変適応型フローベースポリシによる模倣学習

AdaFlow: Imitation Learning with Variance-Adaptive Flow-Based Policies ( http://arxiv.org/abs/2402.04292v1 )

ライセンス: Link先を確認

Xixi Hu, Bo Liu, Xingchao Liu and Qiang Liu

(参考訳) 拡散に基づく模倣学習は、多モーダル意思決定における行動クローニング(BC)を改善するが、拡散過程の再帰により推論が著しく遅くなる。多様なアクションを生成する能力を維持しながら、効率的なポリシージェネレータを設計するよう促します。そこで本研究では,フローベース生成モデルに基づく模倣学習フレームワークであるAdaFlowを提案する。 AdaFlowは、確率フローとして知られる状態条件付き常微分方程式(ODEs)でポリシーを表す。トレーニング損失の条件分散とODEの離散化誤差との間の興味深い関係を明らかにする。そこで本研究では,AdaFlowを適応型意思決定器とし,多様性を犠牲にすることなく高速な推論を実現する分散適応ODEソルバを提案する。興味深いことに、アクション分布がユニモーダルである場合には、自動的にワンステップジェネレータに還元される。包括的実証評価の結果,AdaFlowは成功率,行動多様性,推論速度など,すべての領域で高いパフォーマンスを実現していることがわかった。コードは https://github.com/hxixixh/AdaFlow で入手できる。

Diffusion-based imitation learning improves Behavioral Cloning (BC) on multi-modal decision-making, but comes at the cost of significantly slower inference due to the recursion in the diffusion process. This motivates the design of efficient policy generators that retain the ability to generate diverse actions. To address this challenge, we propose AdaFlow, an imitation learning framework based on flow-based generative modeling. AdaFlow represents the policy with state-conditioned ordinary differential equations (ODEs), which are known as probability flows. We reveal an intriguing connection between the conditional variance of their training loss and the discretization error of the ODEs. With this insight, we propose a variance-adaptive ODE solver that can adjust its step size in the inference stage, making AdaFlow an adaptive decision-maker that offers rapid inference without sacrificing diversity. Interestingly, it automatically reduces to a one-step generator when the action distribution is uni-modal. Our comprehensive empirical evaluation shows that AdaFlow achieves high performance across all dimensions, including success rate, behavioral diversity, and inference speed. The code is available at https://github.com/hxixixh/AdaFlow
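
The idea of a variance-adaptive step size can be sketched in a few lines. This is a minimal illustration, not AdaFlow's actual solver: the step rule (`eta / sqrt(variance)`), the names `adaptive_euler`, `eta`, and `min_step`, and the use of plain Euler integration are all assumptions made here for clarity.

```python
import math

def adaptive_euler(x0, velocity, variance, eta=0.1, min_step=0.05):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with variance-dependent steps.

    Hypothetical rule: step = eta / sqrt(variance), clipped to [min_step, 1 - t].
    Zero variance collapses to a single step covering the whole interval.
    """
    x, t, steps = x0, 0.0, 0
    while t < 1.0 - 1e-12:
        remaining = 1.0 - t
        var = variance(x, t)
        if var <= 0.0:
            step = remaining  # deterministic case: jump straight to t=1
        else:
            step = min(remaining, max(min_step, eta / math.sqrt(var)))
        x = x + step * velocity(x, t)  # explicit Euler update
        t += step
        steps += 1
    return x, steps

# Zero variance: one step. High variance: many small steps.
x_one, n_one = adaptive_euler(0.0, lambda x, t: 2.0, lambda x, t: 0.0)
x_many, n_many = adaptive_euler(0.0, lambda x, t: 2.0, lambda x, t: 4.0)
```

With a zero variance estimate the loop takes exactly one step, which mirrors the abstract's remark that the policy reduces to a one-step generator for uni-modal action distributions.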

翻訳日:2024-02-08 18:33:12 公開日:2024-02-06

# BiLLM: LLMのトレーニング後の量子化の限界を押し上げる

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ( http://arxiv.org/abs/2402.04291v1 )

ライセンス: Link先を確認

Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi

(参考訳) 事前学習された大規模言語モデル(LLMs)は、例外的な汎用言語処理能力を示すが、メモリと計算資源に大きな要求がある。強力な圧縮技術として、バイナライゼーションはモデル重みをわずか1ビットに減らし、高価な計算とメモリ要求を低減させる。しかし、既存の量子化技術は、超低ビット幅でのLLM性能を維持するには不十分である。この課題に対応して,事前学習LLMに適した1ビット後量子化方式であるBiLLMを提案する。 LLMの重み分布に基づいて、BiLLMはまず顕著な重みを識別し、構造的に選択し、効果的な二値残差近似戦略により圧縮損失を最小化する。さらに,非顕著な重みのベル形状分布を考慮し,グループ化と二値化を正確に行うための最適分割探索を提案する。 BiLLMは、様々なLLMファミリーと評価指標にわたって、わずか1.08ビットの重みで初めて高精度な推論(例えば、LLaMA2-70Bで8.41パープレキシティ)を達成し、LLMのSOTA量子化法を大きなマージンで上回っている。さらに、BiLLMは、1つのGPU上で0.5時間以内に70億の重みを持つLLMのバイナライズプロセスを可能にし、良好な時間効率を示す。

Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources. As a powerful compression technology, binarization can reduce model weights to a mere 1 bit, lowering the expensive computation and memory requirements. However, existing quantization techniques fall short of maintaining LLM performance under ultra-low bit-widths. In response to this challenge, we present BiLLM, a groundbreaking 1-bit post-training quantization scheme tailored for pretrained LLMs. Based on the weight distribution of LLMs, BiLLM first identifies and structurally selects salient weights, and minimizes the compression loss through an effective binary residual approximation strategy. Moreover, considering the bell-shaped distribution of the non-salient weights, we propose an optimal splitting search to group and binarize them accurately. BiLLM achieves, for the first time, high-accuracy inference (e.g., 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families and evaluation metrics, outperforming SOTA quantization methods for LLMs by significant margins. Moreover, BiLLM enables the binarization of an LLM with 7 billion weights within 0.5 hours on a single GPU, demonstrating satisfactory time efficiency.
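
The binary residual approximation mentioned in the abstract can be illustrated on a plain list of weights. The sketch below is not BiLLM's implementation: it assumes the common scaling factor `alpha = mean(|w|)`, ignores the structural selection of salient weights and the splitting search, and uses function names invented here.

```python
def binarize(weights):
    """1-bit approximation w ~ alpha * sign(w), with alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1.0 if w >= 0 else -1.0 for w in weights]
    return alpha, signs

def residual_binarize(weights):
    """Two-term approximation: binarize w, then binarize the leftover residual."""
    alpha, signs = binarize(weights)
    residual = [w - alpha * s for w, s in zip(weights, signs)]
    alpha_r, signs_r = binarize(residual)
    return [alpha * s + alpha_r * sr for s, sr in zip(signs, signs_r)]

def squared_error(weights, approx):
    """Sum of squared differences between the weights and their approximation."""
    return sum((w - a) ** 2 for w, a in zip(weights, approx))

weights = [0.9, -1.4, 0.2, -0.1]
alpha, signs = binarize(weights)
single = [alpha * s for s in signs]
double = residual_binarize(weights)
```

Under these assumptions the second binary term never increases the squared error, since it removes `n * alpha_r ** 2` from the residual energy; that is why spending an extra bit on a small set of important weights can pay off.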

翻訳日:2024-02-08 18:32:52 公開日:2024-02-06

# CasCast:カスケードモデルによる高度な高分解能降水

CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling ( http://arxiv.org/abs/2402.04290v1 )

ライセンス: Link先を確認

Junchao Gong, Lei Bai, Peng Ye, Wanghan Xu, Na Liu, Jianhua Dai, Xiaokang Yang, Wanli Ouyang

(参考訳) 気象予報において,レーダデータに基づく降雨流しは重要な役割を担い,災害管理に幅広い影響を及ぼす。深層学習に基づく進歩にもかかわらず、降水ナキャスティングの2つの重要な課題はよく解決されていない。 (i)異なるスケールの複雑な降水系の進化のモデル化、二極度の降水量の正確な予測本研究では,メソスケール降水分布と小規模パターンの予測を分離するために,決定的かつ確率的な部分からなるカスケードフレームワークCasCastを提案する。次に,高分解能でカスケードフレームワークを訓練し,計算コストを低減しつつ極端事象の最適化を促進するために,フレーム誘導拡散トランスを用いて低次元潜在空間における確率的モデリングを行う。 3つのベンチマークレーダ降雨データセットに関する広範な実験は、cascastが競合性能を達成していることを示している。特にCasCastは、地域の極端降水流のベースライン(+91.8%)を大幅に上回っている。

Precipitation nowcasting based on radar data plays a crucial role in extreme weather prediction and has broad implications for disaster management. Although deep learning has driven progress, two key challenges of precipitation nowcasting are not yet well solved: (i) modeling the evolution of complex precipitation systems at different scales, and (ii) accurately forecasting extreme precipitation. In this work, we propose CasCast, a cascaded framework composed of a deterministic part and a probabilistic part that decouples the prediction of mesoscale precipitation distributions from that of small-scale patterns. We then explore training the cascaded framework at high resolution and conducting the probabilistic modeling in a low-dimensional latent space with a frame-wise-guided diffusion transformer, which enhances the optimization of extreme events while reducing computational costs. Extensive experiments on three benchmark radar precipitation datasets show that CasCast achieves competitive performance. In particular, CasCast significantly surpasses the baseline (by up to +91.8%) for regional extreme-precipitation nowcasting.

翻訳日:2024-02-08 18:32:26 公開日:2024-02-06

# 認知課題中の前頭前野fNIRS信号と大学学力テスト(CSAT)得点との関連性--量子アニーリング法による解析

Association between Prefrontal fNIRS signals during Cognitive tasks and College scholastic ability test (CSAT) scores: Analysis using a quantum annealing approach ( http://arxiv.org/abs/2402.04287v1 )

ライセンス: Link先を確認

Yeaju Kim, Junggu Choi, Bora Kim, Yongwan Park, Jihyun Cha, Jongkwan Choi, and Sanghoon Han

(参考訳) 学術的達成は知的能力の重要な尺度であり、潜在的な予測因子としての認知タスクの広範な研究を促す。機能近赤外分光法(fNIRS)のようなニューロイメージング技術は、脳の血行動態に関する洞察を与え、認知能力と学術的成果との関係を理解する。そこで本研究では,前頭前部fNIRS信号の解析により,認知課題と学業成績との関連性を検討した。 CSATスコアと相関する認知タスクを識別するために、fNIRSデータに新しい量子アニール(QA)特徴選択アルゴリズムを適用した。 2つの時間窓(10秒,60秒)におけるfNIRS信号から12の特徴(信号平均,中央値,分散,ピーク,ピーク数,ピークの和,傾き,最小値,尖度,歪度,標準偏差,二乗平均平方根)を抽出し,各特徴量条件の比較を行った。 QAベースおよびXGBoost回帰器アルゴリズムの特徴選択結果を比較し,前者の性能評価を行った。複数の線形回帰モデルを用いた3段階の検証プロセスにおいて,特徴変数とCSATスコアの相関係数,モデル適合度(調整R2),モデル予測誤差(RMSE)値を算出した。量子アニーラーは古典的な機械学習モデルに匹敵する性能を示し、言語流暢性、認識、コルシブロックタッピングタスクを含む特定の認知タスクは、学術的な成果と相関していた。グループ分析の結果、ロンドン塔課題およびNバック課題と高いCSATスコアとの間により強い関連が認められた。量子アニールアルゴリズムはfNIRSデータを用いた特徴選択において大きな可能性を持ち、新しい研究手法である。今後の研究は、学術的達成と認知能力の予測因子を探るべきである。

Academic achievement is a critical measure of intellectual ability, prompting extensive research into cognitive tasks as potential predictors. Neuroimaging technologies, such as functional near-infrared spectroscopy (fNIRS), offer insights into brain hemodynamics, allowing understanding of the link between cognitive performance and academic achievement. Herein, we explored the association between cognitive tasks and academic achievement by analyzing prefrontal fNIRS signals. A novel quantum annealer (QA) feature selection algorithm was applied to fNIRS data to identify cognitive tasks correlated with CSAT scores. Twelve features (signal mean, median, variance, peak, number of peaks, sum of peaks, slope, minimum, kurtosis, skewness, standard deviation, and root mean square) were extracted from fNIRS signals at two time windows (10- and 60-second) to compare results from various feature variable conditions. The feature selection results from the QA-based and XGBoost regressor algorithms were compared to validate the former's performance. In a three-step validation process using multiple linear regression models, correlation coefficients between the feature variables and the CSAT scores, model fitness (adjusted R2), and model prediction error (RMSE) values were calculated. The quantum annealer demonstrated comparable performance to classical machine learning models, and specific cognitive tasks, including verbal fluency, recognition, and the Corsi block tapping task, were correlated with academic achievement. Group analyses revealed stronger associations between Tower of London and N-back tasks with higher CSAT scores. Quantum annealing algorithms have significant potential in feature selection using fNIRS data, and represent a novel research approach. Future studies should explore predictors of academic achievement and cognitive ability.
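
The twelve per-window features listed in the abstract are standard summary statistics. The sketch below shows one plausible way to compute them with the Python standard library; the exact definitions (population variance, strict local maxima as peaks, least-squares slope against the sample index) are assumptions made here, since the abstract does not specify them.

```python
import math
import statistics

def extract_features(signal):
    """Summary statistics for one signal window with at least two samples."""
    n = len(signal)
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    # Local maxima: samples strictly greater than both neighbours.
    peaks = [signal[i] for i in range(1, n - 1)
             if signal[i - 1] < signal[i] > signal[i + 1]]
    # Least-squares slope of the signal against the sample index.
    x_mean = (n - 1) / 2
    slope = (sum((i - x_mean) * (y - mean) for i, y in enumerate(signal))
             / sum((i - x_mean) ** 2 for i in range(n)))

    def moment(k):
        # Standardised k-th central moment (0.0 for a constant signal).
        return sum(((y - mean) / std) ** k for y in signal) / n if std > 0 else 0.0

    return {
        "mean": mean, "median": statistics.median(signal),
        "variance": statistics.pvariance(signal), "peak": max(signal),
        "num_peaks": len(peaks), "sum_peaks": sum(peaks), "slope": slope,
        "minimum": min(signal), "kurtosis": moment(4), "skewness": moment(3),
        "std": std, "rms": math.sqrt(sum(y * y for y in signal) / n),
    }

features = extract_features([0.0, 1.0, 0.0, 2.0, 0.0])
```

A feature selector, quantum or classical, would then operate on one such dictionary per task, channel, and time window.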

翻訳日:2024-02-08 18:32:11 公開日:2024-02-06

# バイオインフォマティクスにおける基礎モデルの進展と可能性

Progress and Opportunities of Foundation Models in Bioinformatics ( http://arxiv.org/abs/2402.04286v1 )

ライセンス: Link先を確認

Qing Li, Zhihang Hu, Yixuan Wang, Lei Li, Yimin Fan, Irwin King, Le Song, Yu Li

(参考訳) バイオインフォマティクスは、人工知能(AI)の統合の増加、特に基礎モデル(FM)の採用によるパラダイムシフトを目撃している。これらのAI技術は急速に進歩し、注釈付きデータの不足やデータノイズの存在といったバイオインフォマティクスの歴史的課題に対処している。 fmsは、ラベル付きデータを実験的に決定する時間とコストのかかる性質のため、生物学的文脈において一般的なシナリオである、大規模でラベル付きデータを扱うのに特に適している。この特徴により、FMは様々な下流検証タスクにおいて顕著な成果を上げ、多様な生物学的実体を効果的に表現する能力を示すことができる。 fmsは計算生物学、特に深層学習の分野で新しい時代を迎えていることは間違いない。本調査の主な目的は,生物情報学におけるFMの体系的調査と要約を行い,その進化の追跡,研究状況,採用方法について述べることである。我々の焦点は、特定の生物学的問題に対するFMの応用であり、研究ニーズに対して適切なFMを選択するための研究コミュニティの指導を目的としています。現状の課題には,シーケンス解析,構造予測,関数アノテーション,マルチモーダル統合などがあり,従来の手法と比較する。さらに, fmsが直面するデータノイズ, モデル説明可能性, 潜在的なバイアスなど, 生物学における課題と限界についても検討した。最後に,今後の生物学的研究におけるFMの潜在的な開発経路と戦略を概説し,この急速に発展する分野におけるイノベーションと応用の段階を定めている。この包括的なレビューは学術的な資源としてだけでなく、生物学におけるfmsの今後の研究と応用のロードマップとしても機能する。

Bioinformatics has witnessed a paradigm shift with the increasing integration of artificial intelligence (AI), particularly through the adoption of foundation models (FMs). These AI techniques have rapidly advanced, addressing historical challenges in bioinformatics such as the scarcity of annotated data and the presence of data noise. FMs are particularly adept at handling large-scale, unlabeled data, a common scenario in biological contexts due to the time-consuming and costly nature of experimentally determining labeled data. This characteristic has allowed FMs to excel and achieve notable results in various downstream validation tasks, demonstrating their ability to represent diverse biological entities effectively. Undoubtedly, FMs have ushered in a new era in computational biology, especially in the realm of deep learning. The primary goal of this survey is to conduct a systematic investigation and summary of FMs in bioinformatics, tracing their evolution, current research status, and the methodologies employed. Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs. We delve into the specifics of the problem at hand including sequence analysis, structure prediction, function annotation, and multimodal integration, comparing the structures and advancements against traditional methods. Furthermore, the review analyses challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases. Finally, we outline potential development paths and strategies for FMs in future biological research, setting the stage for continued innovation and application in this rapidly evolving field. This comprehensive review serves not only as an academic resource but also as a roadmap for future explorations and applications of FMs in biology.

翻訳日:2024-02-08 18:31:33 公開日:2024-02-06

# PRES: スケーラブルメモリベースの動的グラフニューラルネットワークを目指して

PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks ( http://arxiv.org/abs/2402.04284v1 )

ライセンス: Link先を確認

Junwei Su, Difan Zou, Chuan Wu

(参考訳) メモリベースの動的グラフニューラルネットワーク(MDGNN)は、メモリモジュールを利用して長期の時間的依存関係を抽出、蒸留、記憶する動的グラフニューラルネットワークのファミリーであり、メモリレスニューラルネットワークよりも優れたパフォーマンスをもたらす。しかし、MDGNNのトレーニングは、絡み合った時間的および構造的依存関係を扱うという課題に直面し、正確な時間的パターンを捉えるために、データシーケンスの逐次的および時間的処理を必要とする。バッチトレーニングの間、同じバッチ内の時間的データポイントは並列に処理され、その時間的依存関係は無視される。この問題は時間的不連続(temporal discontinuity)と呼ばれ、効率的な時間的バッチサイズを制限し、データの並列性を制限し、産業アプリケーションにおけるMDGNNの柔軟性を低下させる。本稿では,時間的バッチサイズが大きいMDGNNの訓練における時間的不連続性に着目し,大規模MDGNNの効率的な訓練について検討する。まず,時間的バッチサイズがMDGNNトレーニングの収束に及ぼす影響について理論的研究を行った。そこで本研究では, 時間的不連続性の影響を軽減するため, メモリコヒーレンス学習目標と組み合わせた反復予測補正手法PRESを提案し, 一般化性能を犠牲にすることなく, MDGNNをはるかに大きな時間的バッチで訓練することができることを示した。実験の結果,MDGNNトレーニングでは,最大4倍の時間的バッチ(3.4倍高速化)が可能であった。

Memory-based Dynamic Graph Neural Networks (MDGNNs) are a family of dynamic graph neural networks that leverage a memory module to extract, distill, and memorize long-term temporal dependencies, leading to superior performance compared to memory-less counterparts. However, training MDGNNs faces the challenge of handling entangled temporal and structural dependencies, requiring sequential and chronological processing of data sequences to capture accurate temporal patterns. During the batch training, the temporal data points within the same batch will be processed in parallel, while their temporal dependencies are neglected. This issue is referred to as temporal discontinuity and restricts the effective temporal batch size, limiting data parallelism and reducing MDGNNs' flexibility in industrial applications. This paper studies the efficient training of MDGNNs at scale, focusing on the temporal discontinuity in training MDGNNs with large temporal batch sizes. We first conduct a theoretical study on the impact of temporal batch size on the convergence of MDGNN training. Based on the analysis, we propose PRES, an iterative prediction-correction scheme combined with a memory coherence learning objective to mitigate the effect of temporal discontinuity, enabling MDGNNs to be trained with significantly larger temporal batches without sacrificing generalization performance. Experimental results demonstrate that our approach enables up to a 4x larger temporal batch (3.4x speed-up) during MDGNN training.
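
Why larger temporal batches aggravate temporal discontinuity can be seen with a simple count. The sketch below is an illustration written for this listing, not part of PRES: it counts events that touch a node already updated earlier in the same batch, which is the situation where parallel processing reads out-of-date memory. The function name and the toy event list are hypothetical.

```python
def stale_events(events, batch_size):
    """Count events that touch a node already updated earlier in the same batch.

    `events` is a chronological list of (source, destination) interactions.
    Processing a batch in parallel means such events read memory that does not
    yet reflect the earlier in-batch interaction.
    """
    stale = 0
    for start in range(0, len(events), batch_size):
        seen = set()  # nodes touched so far within the current batch
        for src, dst in events[start:start + batch_size]:
            if src in seen or dst in seen:
                stale += 1
            seen.update((src, dst))
    return stale

events = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
counts = {size: stale_events(events, size) for size in (1, 2, 4)}
```

On this toy stream the number of affected events grows with the batch size, which is the trade-off between data parallelism and temporal accuracy that the abstract describes.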

翻訳日:2024-02-08 18:31:04 公開日:2024-02-06

# 分割データサイロ:独立プライベートソースからのマルチエージェント知覚のためのクロスドメイン学習

Breaking Data Silos: Cross-Domain Learning for Multi-Agent Perception from Independent Private Sources ( http://arxiv.org/abs/2402.04273v1 )

ライセンス: Link先を確認

Jinlong Li, Baolu Li, Xinyu Liu, Runsheng Xu, Jiaqi Ma, Hongkai Yu

(参考訳) 多エージェント認識システムにおける多様なエージェントは、異なる企業のものだ。各企業は、特徴抽出に同じ古典的なニューラルネットワークアーキテクチャベースのエンコーダを使用する。しかしながら、様々なエージェントを訓練するためのデータソースは、各企業で独立してプライベートであり、マルチエージェント知覚システムにおいて異なるエージェントを訓練するための異なるプライベートデータの分散ギャップをもたらす。以上の分布差によるデータサイロは、マルチエージェント知覚の大幅な性能低下をもたらす可能性がある。本稿では,既存のマルチエージェント知覚システムにおける分布ギャップの影響を徹底的に検討する。データサイロを断ち切るために、クロスドメイン学習のためのFeature Distribution-Aware Aggregation (FDA)フレームワークを導入し、上記の分散ギャップをマルチエージェント認識で緩和する。学習可能な機能補償モジュールと分散認識統計一貫性モジュールの2つの重要なコンポーネントで構成されており、どちらもマルチエージェント機能間の分散ギャップを最小化するために中間機能を強化することを目的としている。パブリックなOPV2VとV2XSetデータセットに関する集中的な実験は、既存のマルチエージェント認識システムに対する重要な拡張として、ポイントクラウドベースの3Dオブジェクト検出におけるFDAの有効性を裏付けるものだ。

The diverse agents in multi-agent perception systems may come from different companies. Each company might use the same classic neural-network-based encoder architecture for feature extraction. However, the data used to train the agents is independent and private to each company, which leads to a distribution gap between the private datasets used to train distinct agents in a multi-agent perception system. The data silos created by this distribution gap could result in a significant performance decline in multi-agent perception. In this paper, we thoroughly examine the impact of the distribution gap on existing multi-agent perception systems. To break the data silos, we introduce the Feature Distribution-aware Aggregation (FDA) framework for cross-domain learning to mitigate the distribution gap in multi-agent perception. FDA comprises two key components, a Learnable Feature Compensation Module and a Distribution-aware Statistical Consistency Module, both aimed at enhancing intermediate features to minimize the distribution gap among multi-agent features. Intensive experiments on the public OPV2V and V2XSet datasets underscore FDA's effectiveness in point cloud-based 3D object detection, presenting it as an invaluable augmentation to existing multi-agent perception systems.

翻訳日:2024-02-08 18:30:25 公開日:2024-02-06

# 限界保存・微分プライベート・合成データに基づく線形モデルにおける過剰リスクのバウンダリング

Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data ( http://arxiv.org/abs/2402.04375v1 )

ライセンス: Link先を確認

Yvonne Zhou, Mingyu Liang, Ivan Brugere, Dana Dachman-Soled, Danial Dervovic, Antigoni Polychroniadou, Min Wu

(参考訳) 機械学習(ml)の利用が増加すると、mlモデルがトレーニングデータセットに寄与した個人に関する情報を明かす可能性があるという懸念が高まっている。機密データの漏洩を防止するため,実学習データの代わりに差分プライベート(DP)合成トレーニングデータを用いてMLモデルを訓練する。合成データの鍵となる望ましい性質は、元の分布の低次限界を保存する能力である。本研究の主な貢献は, 連続損失関数とリプシッツ損失関数の合成データに基づく線形モデルの過大な経験的リスクに対する, 上層および下層の境界である。我々は理論結果とともに広範な実験を行う。

The growing use of machine learning (ML) has raised concerns that an ML model may reveal private information about an individual who has contributed to the training dataset. To prevent leakage of sensitive data, we consider using differentially-private (DP), synthetic training data instead of real training data to train an ML model. A key desirable property of synthetic data is its ability to preserve the low-order marginals of the original distribution. Our main contribution comprises novel upper and lower bounds on the excess empirical risk of linear models trained on such synthetic data, for continuous and Lipschitz loss functions. We perform extensive experimentation alongside our theoretical results.
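
To make "low-order marginals" concrete, the sketch below computes a two-way marginal and perturbs it with the standard Laplace mechanism. This is a generic illustration and an assumption on our part: the abstract does not say which mechanism produces the synthetic data, and the function names, the fixed seed, and the add/remove-one neighbouring relation are choices made here.

```python
import random
from collections import Counter
from itertools import product

def two_way_marginal(records, col_a, col_b):
    """Exact counts of each (value_a, value_b) pair across the records."""
    return Counter((row[col_a], row[col_b]) for row in records)

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_marginal(records, col_a, col_b, domain_a, domain_b, epsilon, seed=0):
    """Noisy counts over the full domain (sensitivity 1 under add/remove-one)."""
    rng = random.Random(seed)
    exact = two_way_marginal(records, col_a, col_b)
    # Noise is added to every cell, including empty ones, so that the set of
    # released cells does not itself depend on the data.
    return {cell: exact[cell] + laplace_noise(1.0 / epsilon, rng)
            for cell in product(domain_a, domain_b)}

records = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "y")]
noisy = noisy_marginal(records, 0, 1, ["a", "b"], ["x", "y"], epsilon=1.0)
```

Releasing several marginals this way would require splitting the privacy budget across them; the sketch covers a single marginal only.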

翻訳日:2024-02-08 18:22:16 公開日:2024-02-06

# 初学年の工学生の計算思考におけるスキル

Skills in computational thinking of engineering students of the first school year ( http://arxiv.org/abs/2402.04340v1 )

ライセンス: Link先を確認

Concepcion Varela, Carolina Rebollar, Olatz Garcia, Eugenio Bravo, Javier Bilbao

(参考訳) 私たちが生きているこのデジタル時代の世界では、学生が獲得しなければならない基本的な能力の1つは、コンピュータ思考(CT)の能力である。形式的な定義に関する一般的なコンセンサスはないが、コンピュータの有無に関わらず、あらゆる領域で発生する可能性のある問題の解決に必要なスキルと態度のセットとして、一般に理解されている。学生が取得したctスキルの計測と評価は基本であり、この目的のためには以前に検証した測定機器を使用する必要がある。本研究では,バスク大学工学部の新入生がCT(Critical Thinking, Algorithmic Thinking, Problem Solving, Cooperativity, Creativity)のスキルを身につけているかどうかを,事前に検証した手法を適用した。

In this world of the digital era, in which we are living, one of the fundamental competences that students must acquire is the competence in Computational Thinking (CT). Although there is no general consensus on a formal definition, there is a general understanding of it as a set of skills and attitudes necessary for the resolution, with or without a computer, of problems that may arise in any area of life. Measuring and evaluating which of the CT skills students have acquired is fundamental, and for this purpose, previously validated measuring instruments must be used. In this study, a previously validated instrument is applied to know if the new students in the Engineering Degrees of the University of the Basque Country have the following skills in CT: Critical Thinking, Algorithmic Thinking, Problem Solving, Cooperativity and Creativity.

翻訳日:2024-02-08 18:22:03 公開日:2024-02-06

# 振動ミラーからの両側光子放出と多光子絡み発生

Bilateral photon emission from a vibrating mirror and multiphoton entanglement generation ( http://arxiv.org/abs/2402.04339v1 )

ライセンス: Link先を確認

Alberto Mercurio, Enrico Russo, Fabio Mauceri, Salvatore Savasta, Franco Nori, Vincenzo Macrì, Rosario Lo Franco

(参考訳) 絡み合いは量子対応デバイスの開発において重要な役割を果たしている。重要な目的の1つは、例えば、閉じ込められた電磁場と相互作用する機械振動子を通じて達成される絡み合った状態の決定論的生成と分布である。本研究では,両面完全鏡を含む共振器について検討する。鏡はキャビティモードを2つの独立した閉じ込められた電磁場に分離するが、放射圧相互作用は全てのサブシステム間で高次な効果的な相互作用をもたらす。鏡の位置にも関連し、選択された共鳴条件によっては、2n$-photonの絡み合い生成と両側光子対の放出が研究されている。機械振動子の非古典的性質を実証し、これらの現象を制御する経路を提供し、量子技術における潜在的な応用を開拓する。今後は、マイクロ波や光子など、さまざまなエネルギースケールのサブシステムに、同様の統合デバイスを組み込むことが期待できる。

Entanglement plays a crucial role in the development of quantum-enabled devices. One significant objective is the deterministic creation and distribution of entangled states, achieved, for example, through a mechanical oscillator interacting with confined electromagnetic fields. In this study, we explore a cavity resonator containing a two-sided perfect mirror. Although the mirror separates the cavity modes into two independent confined electromagnetic fields, the radiation pressure interaction gives rise to high-order effective interactions across all subsystems. Depending on the chosen resonant conditions, which are also related to the position of the mirror, we study $2n$-photon entanglement generation and bilateral photon pair emission. Demonstrating the non-classical nature of the mechanical oscillator, we provide a pathway to control these phenomena, opening potential applications in quantum technologies. Looking ahead, similar integrated devices could be used to entangle subsystems across vastly different energy scales, such as microwave and optical photons.

翻訳日:2024-02-08 18:21:47 公開日:2024-02-06

# モノのインターネットにおける識別問題を解決する論理認識法

Logical recognition method for solving the problem of identification in the Internet of Things ( http://arxiv.org/abs/2402.04338v1 )

ライセンス: Link先を確認

Islambek Saymanov

(参考訳) 近年登場した論理代数法と価値論理の応用の新しい分野は、様々な対象や現象、医学的または技術的診断、近代的な機械の構築、テストの問題のチェックなどを認識することで、論理関数を機能空間全体に最適な拡張を構築することができる。例えば、論理認識システムでは、離散解析に基づく論理手法とそれに基づく命題計算は、独自の認識アルゴリズムを構築するために用いられる。一般の場合、論理認識法の使用は、認識される対象や現象の論理的特徴である変数が特徴空間全体にわたってk値関数の最適継続によって表現される論理的接続の存在を提供する。本研究の目的は、ある特徴空間からベクトルとして指定される非交差オブジェクトの論理的特徴とクラスを持つ参照テーブルからなるオブジェクト認識のための論理的手法を開発することである。この方法は、参照テーブルを至るところで定義されていない論理関数として考慮し、論理関数を機能空間全体への最適な継続を構築することで、クラス全体の空間への拡張を決定する。

A new area of application of methods of algebra of logic and to valued logic, which has emerged recently, is the problem of recognizing a variety of objects and phenomena, medical or technical diagnostics, constructing modern machines, checking test problems, etc., which can be reduced to constructing an optimal extension of the logical function to the entire feature space. For example, in logical recognition systems, logical methods based on discrete analysis and propositional calculus based on it are used to build their own recognition algorithms. In the general case, the use of a logical recognition method provides for the presence of logical connections expressed by the optimal continuation of a k-valued function over the entire feature space, in which the variables are the logical features of the objects or phenomena being recognized. The goal of this work is to develop a logical method for object recognition consisting of a reference table with logical features and classes of non-intersecting objects, which are specified as vectors from a given feature space. The method consists of considering the reference table as a logical function that is not defined everywhere and constructing an optimal continuation of the logical function to the entire feature space, which determines the extension of classes to the entire space.

翻訳日:2024-02-08 18:21:30 公開日:2024-02-06

# LegalLens: 非構造化テキストにおける法的違反の識別にLLMを活用する

LegalLens: Leveraging LLMs for Legal Violation Identification in Unstructured Text ( http://arxiv.org/abs/2402.04335v1 )

ライセンス: Link先を確認

Dor Bernsohn, Gil Semo, Yaron Vazana, Gila Hayat, Ben Hagag, Joel Niklaus, Rohit Saha, Kyryl Truskovskyi

(参考訳) 本研究では,非構造化テキストデータ中の法的な違反を検出するための1つと,潜在的に影響を受ける可能性のある個人とを関連付ける2つの主な課題に焦点を当てた。我々はLarge Language Models (LLM) を用いて2つのデータセットを構築し、その後ドメイン専門家のアノテーターによって検証された。どちらのタスクもクラスアクションケースのコンテキスト用に特別に設計されました。実験設計では、BERTファミリーとオープンソースLLMの微調整モデルが組み込まれ、クローズドソースLLMを使った少数ショット実験が行われた。結果、F1スコア62.69%(違反識別)と81.02%(被害者の関連付け)は、データセットと設定が両方のタスクに使用できることを示している。最後に,NLP(法定自然言語処理)分野のさらなる研究を進めるために,実験に使用されるデータセットとコードを公開する。

In this study, we focus on two main tasks: the first detects legal violations within unstructured textual data, and the second associates these violations with potentially affected individuals. We constructed two datasets using Large Language Models (LLMs) which were subsequently validated by domain expert annotators. Both tasks were designed specifically for the context of class-action cases. The experimental design incorporated fine-tuning models from the BERT family and open-source LLMs, and conducting few-shot experiments using closed-source LLMs. Our results, with an F1-score of 62.69% (violation identification) and 81.02% (associating victims), show that our datasets and setups can be used for both tasks. Finally, we publicly release the datasets and the code used for the experiments in order to advance further research in the area of legal natural language processing (NLP).

翻訳日:2024-02-08 18:21:10 公開日:2024-02-06

# インテリジェントトランスデューサを用いたホームオートメーションシステム

Home Automation System based on Intelligent Transducer Enablers ( http://arxiv.org/abs/2402.04334v1 )

ライセンス: Link先を確認

Manuel Suárez-Albela, Paula Fraga-Lamas, Tiago M. Fernández-Caramés, Adriana Dapena and Miguel González-López

(参考訳) 本稿では, 簡易かつ迅速にトランスデューサを識別・設定することを目的とした, HASITE (Intelligent Transducer Enablersをベースとしたホームオートメーションシステム) を提案する。これらの機能は、多くのトランスデューサがデプロイされる状況において特に有用である。 HASITEは、無線ネットワークと自己設定プロトコルと自己登録プロトコルの両方を用いることで、ホームオートメーションシステムのデプロイを簡単にする。これら3つの要素の応用により、hasiteは新しいトランスデューサをパワーアップするだけで追加することができる。異なる現実的なシナリオで実施されたテストによると、トランスデューサは13秒未満で使用可能である。さらに、すべてのHASITE機能はAPIを通じてアクセスすることができるため、サードパーティシステムとの統合も可能だ。例として、APIに基づいたAndroidアプリケーションが紹介されている。リモートのユーザは、ふつうのスマートフォンやタブレットを使ってトランスデューサと対話できる。

This paper presents a novel home automation system named HASITE (Home Automation System based on Intelligent Transducer Enablers), which has been specifically designed to identify and configure transducers easily and quickly. These features are especially useful in situations where many transducers are deployed, since their setup becomes a cumbersome task that consumes a significant amount of time and human resources. HASITE simplifies the deployment of a home automation system by using wireless networks and both self-configuration and self-registration protocols. Thanks to the application of these three elements, HASITE is able to add new transducers by just powering them up. According to the tests performed in different realistic scenarios, a transducer is ready to be used in less than 13 s. Moreover, all HASITE functionalities can be accessed through an API, which also allows for the integration of third-party systems. As an example, an Android application based on the API is presented. Remote users can use it to interact with transducers by just using a regular smartphone or a tablet.

翻訳日:2024-02-08 18:20:54 公開日:2024-02-06

# LESS: ターゲットのインストラクションチューニングのためのインフルエンシャルデータの選択

LESS: Selecting Influential Data for Targeted Instruction Tuning ( http://arxiv.org/abs/2402.04333v1 )

ライセンス: Link先を確認

Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, Danqi Chen

(参考訳) 命令チューニングは大規模言語モデル(llm)の強力な機能を解き放ち、汎用チャットボットを開発するために組み合わせデータセットを効果的に利用する。しかし、現実世界のアプリケーションは、しばしば特別なスキル(推論など)を必要とする。課題は、これらの広範囲なデータセットから最も関連性の高いデータを特定して、特定の能力を効果的に開発することである。 LESSは,データの影響を効果的に推定し,命令データ選択のための低ランクグレーディエント類似度探索を行うアルゴリズムである。重要なことに、LESSはAdamオプティマイザと可変長命令データを扱うために既存の影響定式化を適用する。 LESSはまず、低次元の勾配特徴を持つ再利用性が高く、転送可能な勾配データストアを構築し、その後、特定の機能を具現化した少数ショットの例と類似性に基づいてサンプルを選択する。実験の結果、LESSが選択したデータの5%のトレーニングは、さまざまな下流タスクにわたる完全なデータセットでのトレーニングよりも優れていることが示されている。さらに、選択されたデータは非常に転送性が高く、小さなモデルは、異なるファミリーのより大きなモデルやモデルのために有用なデータを選択するために利用することができる。定性的分析により,本手法は,下流アプリケーションに必要な推論スキルを示すデータを特定するために,表面形状の手がかりを超えていることがわかった。

Instruction tuning has unlocked powerful capabilities in large language models (LLMs), effectively using combined datasets to develop general-purpose chatbots. However, real-world applications often require a specialized suite of skills (e.g., reasoning). The challenge lies in identifying the most relevant data from these extensive datasets to effectively develop specific capabilities, a setting we frame as targeted instruction tuning. We propose LESS, an optimizer-aware and practically efficient algorithm to effectively estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection. Crucially, LESS adapts existing influence formulations to work with the Adam optimizer and variable-length instruction data. LESS first constructs a highly reusable and transferable gradient datastore with low-dimensional gradient features and then selects examples based on their similarity to few-shot examples embodying a specific capability. Experiments show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks. Furthermore, the selected data is highly transferable: smaller models can be leveraged to select useful data for larger models and models from different families. Our qualitative analysis shows that our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
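
The selection step described above, ranking training examples by the similarity of their gradient features to a few target examples, can be sketched as follows. This is a simplified illustration and not the LESS implementation: it assumes the low-dimensional gradient features are already computed, uses plain cosine similarity, and scores each training example by its best match over the target set. All names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length gradient feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def select_top_fraction(train_features, target_features, fraction=0.05):
    """Return indices of the training examples most similar to any target example."""
    scores = [max(cosine(f, t) for t in target_features) for f in train_features]
    keep = max(1, round(fraction * len(train_features)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:keep]

train = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
target = [[1.0, 0.0]]
chosen = select_top_fraction(train, target, fraction=0.5)
```

The default `fraction=0.05` echoes the 5% budget reported in the abstract; the optimizer-aware influence estimate that produces the features is the part this sketch leaves out.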

翻訳日:2024-02-08 18:20:38 公開日:2024-02-06

# 非本質ニューロンへのノイズ注入によるDNN対向性ロバスト性および効率性の向上

Enhance DNN Adversarial Robustness and Efficiency via Injecting Noise to Non-Essential Neurons ( http://arxiv.org/abs/2402.04325v1 )

ライセンス: Link先を確認

Zhenyu Liu, Garrett Gagnon, Swagath Venkataramani, Liu Liu

(参考訳) ディープニューラルネットワーク(dnn)は、医療や金融、自動車など、さまざまな産業に革命をもたらし、データ分析や意思決定において、並列性のない機能を提供する。変革的な影響にもかかわらず、DNNは敵攻撃に対する脆弱性と、より複雑で大規模なモデルに関連する計算コストの増加という、2つの重要な課題に直面している。本稿では,対向ロバスト性と実行効率を同時に向上する効果的な手法を提案する。雑音を均一に注入することでロバスト性を高める従来の研究とは異なり、各dnn層に戦略的に適用される非一様雑音注入アルゴリズムを導入することで、攻撃に現れる逆摂動を妨害する。近似手法を用いることで,本態性ニューロンを同定・保護し,非定常ニューロンにノイズを戦略的に導入する。実験の結果,本手法は攻撃シナリオ,モデルアーキテクチャ,データセットの堅牢性と効率性を両立させることができた。

Deep Neural Networks (DNNs) have revolutionized a wide range of industries, from healthcare and finance to automotive, by offering unparalleled capabilities in data analysis and decision-making. Despite their transforming impact, DNNs face two critical challenges: the vulnerability to adversarial attacks and the increasing computational costs associated with more complex and larger models. In this paper, we introduce an effective method designed to simultaneously enhance adversarial robustness and execution efficiency. Unlike prior studies that enhance robustness via uniformly injecting noise, we introduce a non-uniform noise injection algorithm, strategically applied at each DNN layer to disrupt adversarial perturbations introduced in attacks. By employing approximation techniques, our approach identifies and protects essential neurons while strategically introducing noise into non-essential neurons. Our experimental results demonstrate that our method successfully enhances both robustness and efficiency across several attack scenarios, model architectures, and datasets.
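
Non-uniform noise injection can be pictured on a single layer's activations. The sketch below is only a toy: it treats the largest-magnitude activations as the "essential" neurons, which is an assumption made here and not the approximation-based selection the paper describes, and the names `inject_noise`, `keep_ratio`, and `noise_std` are invented for illustration.

```python
import random

def inject_noise(activations, keep_ratio=0.5, noise_std=0.1, seed=0):
    """Perturb only the activations judged non-essential.

    Assumption: the largest-magnitude activations are the essential ones;
    the paper's approximation-based selection is not reproduced here.
    """
    rng = random.Random(seed)
    n_keep = max(1, int(keep_ratio * len(activations)))
    order = sorted(range(len(activations)),
                   key=lambda i: abs(activations[i]), reverse=True)
    essential = set(order[:n_keep])  # indices left untouched
    return [a if i in essential else a + rng.gauss(0.0, noise_std)
            for i, a in enumerate(activations)]

layer_output = [3.0, -0.1, 0.2, -2.5]
perturbed = inject_noise(layer_output, keep_ratio=0.5)
```

Here the two dominant activations pass through unchanged while the remaining ones receive Gaussian noise, which is the contrast with uniform noise injection drawn in the abstract.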

翻訳日:2024-02-08 18:20:16 公開日:2024-02-06

# ConsistI2V:画像対ビデオ生成のための視覚的一貫性の強化

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation ( http://arxiv.org/abs/2402.04324v1 )

ライセンス: Link先を確認

Weiming Ren, Harry Yang, Ge Zhang, Cong Wei, Xinrun Du, Stephen Huang, Wenhu Chen

(参考訳) Image-to-Video(I2V)生成は、初期フレーム(テキストプロンプトの他)を使用してビデオシーケンスを作成することを目的としている。 i2v世代における大きな課題は、ビデオ全体を通して視覚的な一貫性を維持することである: 既存の方法はしばしば、第一フレームから主題、背景、スタイルの整合性を保つのに苦労し、ビデオストーリー内で流動的で論理的に進歩することを保証する。これらの問題を緩和するために,I2V生成の視覚的一貫性を高める拡散法であるConsistI2Vを提案する。具体的には,(1)空間と運動の一貫性を維持するため,(2)第1フレームの低周波帯域からのノイズ初期化に着目し,レイアウトの一貫性を高める。これらの2つのアプローチにより、ConsistI2Vは高度に一貫したビデオを生成することができる。また、提案手法を拡張して、自動回帰長ビデオ生成とカメラモーション制御における一貫性向上の可能性を示す。本手法の有効性を検証するため,I2V生成のための総合評価ベンチマークであるI2V-Benchを提案する。自動評価と人間評価の結果から,既存の方法よりも consisti2v の方が優れていることが示された。

Image-to-video (I2V) generation aims to use the initial frame (alongside a text prompt) to create a video sequence. A grand challenge in I2V generation is to maintain visual consistency throughout the video: existing methods often struggle to preserve the integrity of the subject, background, and style from the first frame, as well as ensure a fluid and logical progression within the video narrative. To mitigate these issues, we propose ConsistI2V, a diffusion-based method to enhance visual consistency for I2V generation. Specifically, we introduce (1) spatiotemporal attention over the first frame to maintain spatial and motion consistency, (2) noise initialization from the low-frequency band of the first frame to enhance layout consistency. These two approaches enable ConsistI2V to generate highly consistent videos. We also extend the proposed approaches to show their potential to improve consistency in auto-regressive long video generation and camera motion control. To verify the effectiveness of our method, we propose I2V-Bench, a comprehensive evaluation benchmark for I2V generation. Our automatic and human evaluation results demonstrate the superiority of ConsistI2V over existing methods.

翻訳日:2024-02-08 18:19:58 公開日:2024-02-06

# 量子機械学習と光子計数のための3次元キャビティにおけるトランスモン量子ビットのキャラクタリゼーション

Characterization of a Transmon Qubit in a 3D Cavity for Quantum Machine Learning and Photon Counting ( http://arxiv.org/abs/2402.04322v1 )

ライセンス: Link先を確認

Alessandro D'Elia, Boulos Alfakes, Anas Alkhazaleh, Leonardo Banchi, Matteo Beretta, Stefano Carrazza, Fabio Chiarello, Daniele Di Gioacchino, Andrea Giachero, Felix Henrich, Alex Stephane Piedjou Komnang, Carlo Ligi, Giovanni Maccarrone, Massimo Macucci, Emanuele Palumbo, Andrea Pasquale, Luca Piersanti, Florent Ravaux, Alessio Rettaroli, Matteo Robbiati, Simone Tocci, Claudio Gatti

(参考訳) 本稿では,3次元キャビティにおける超伝導トランスモン量子ビットの量子機械学習および光子計数への応用について報告する。まず,3次元共振器に結合したトランペットキュービットの実現と特性について述べるとともに,シミュレーションフレームワークの詳細な記述と分散シフトやクビット不調和といった重要なパラメータの実験的測定について述べる。次に、単一量子ビットデバイス上に実装された量子機械学習アプリケーションについて報告し、プロトンのuクォークパルトン分布関数に適合することを示す。原稿の最後のセクションでは、同じ3次元共振器に結合した2つの量子ビットに基づく新しいマイクロ波光子検出方式を提案する。これは基本的にダークカウントを減少させ、アクシオンダークマター検索のようなアプリケーションを好む可能性がある。

In this paper we report the use of a superconducting transmon qubit in a 3D cavity for quantum machine learning and photon counting applications. We first describe the realization and characterization of a transmon qubit coupled to a 3D resonator, providing a detailed description of the simulation framework and of the experimental measurement of important parameters, like the dispersive shift and the qubit anharmonicity. We then report on a Quantum Machine Learning application implemented on the single-qubit device to fit the u-quark parton distribution function of the proton. In the final section of the manuscript we present a new microwave photon detection scheme based on two qubits coupled to the same 3D resonator. This could in principle decrease the dark count rate, favouring applications like axion dark matter searches.

翻訳日:2024-02-08 18:19:34 公開日:2024-02-06

# 量子シミュレーションのための改良フェルミオンハミルトン

Improved Fermion Hamiltonians for Quantum Simulation ( http://arxiv.org/abs/2402.04317v1 )

ライセンス: Link先を確認

Erik Gustafson, Ruth Van de Water

(参考訳) 我々は、ASQTADにインスパイアされたハミルトニアンと高度に改良されたスタッガードクォーク(HISQ)作用を開発し、これらのハミルトニアンが量子シミュレーションにどのように使用できるかを示した。これらの改良されたハミルトン多様体の時間発展のためのゲートコストと、1+1d格子シュウィンガーモデルを用いた格子間隔誤差の低減の実証を提供する。

We develop Hamiltonians inspired by the ASQTAD and highly improved staggered quark (HISQ) actions and show how these Hamiltonians can be used for quantum simulations. Gate costs for the time evolution of these improved Hamiltonians are provided, as well as a demonstration of the reduction of lattice spacing errors using the 1+1d lattice Schwinger model.

翻訳日:2024-02-08 18:19:21 公開日:2024-02-06

# きめ細かな報酬による引用文生成のための言語モデル

Training Language Models to Generate Text with Citations via Fine-grained Rewards ( http://arxiv.org/abs/2402.04315v1 )

ライセンス: Link先を確認

Chengyu Huang, Zeqiu Wu, Yushi Hu, Wenya Wang

(参考訳) 近年のLarge Language Models (LLM) はユーザクエリの応答に有用であることが証明されているが,幻覚の傾向があり,信頼性の低いソースへの参照が欠如しているため,その応答には信頼性が欠如していることが多い。これらの問題に対する直感的な解決策は、証拠として外部文書を参照するテキスト内引用を含めることである。以前の研究は、直接 LLM にインテキストの引用を生成するよう促してきたが、その性能は、特に小さな LLM の場合、満足には程遠い。本研究では, LLMに対して, 応答の正確性を確保しつつ, 支援的かつ関連性の高い引用を生成するための, 微粒な報酬を用いた効果的な学習フレームワークを提案する。また,これらの細粒度報酬を共通llm訓練戦略に適用する系統的分析を行い,従来の手法よりも有利な方法を示した。 ALCEベンチマークから得られた質問応答(QA)データセットについて広範な実験を行い、EXPERTQAを用いてモデルの一般化性を検証する。 LLaMA-2-7Bでは、細粒度の報酬がGPT-3.5-turboを上回り、ベースラインの中で最高の性能を達成する。

While recent Large Language Models (LLMs) have proven useful in answering user queries, they are prone to hallucination, and their responses often lack credibility due to missing references to reliable sources. An intuitive solution to these issues would be to include in-text citations referring to external documents as evidence. While previous works have directly prompted LLMs to generate in-text citations, their performances are far from satisfactory, especially when it comes to smaller LLMs. In this work, we propose an effective training framework using fine-grained rewards to teach LLMs to generate highly supportive and relevant citations, while ensuring the correctness of their responses. We also conduct a systematic analysis of applying these fine-grained rewards to common LLM training strategies, demonstrating its advantage over conventional practices. We conduct extensive experiments on Question Answering (QA) datasets taken from the ALCE benchmark and validate the model's generalizability using EXPERTQA. On LLaMA-2-7B, the incorporation of fine-grained rewards achieves the best performance among the baselines, even surpassing that of GPT-3.5-turbo.

翻訳日:2024-02-08 18:19:13 公開日:2024-02-06

# 半古典的ユークリッド重力に対する新しい境界条件

New Well-Posed Boundary Conditions for Semi-Classical Euclidean Gravity ( http://arxiv.org/abs/2402.04308v1 )

ライセンス: Link先を確認

Xiaoyi Liu, Jorge E. Santos, Toby Wiseman

(参考訳) 有限空洞における4次元ユークリッド重力を考える。アンダーソンはディリクレ条件が十分に仮定された楕円系を得られないことを示し、境界条件を示唆している。ここでは、1パラメータの境界条件族が存在し、定数$p$でパラメータ化され、適切なワイル再スケール境界計量が固定され、すべてよく表される楕円系を与える。アンダーソンとディリクレの境界条件は、これらの極限$p \to 0$と$\infty$と見ることができる。静的ユークリッド解に着目して、熱力学第一法則を導出する。球面空間境界に制限された充填は平坦な空間あるいはシュワルツシルト解であり、ディリクレの場合と同様の熱力学を持つ。平坦空間のサドルに関する滑らかなユークリッドのゆらぎを考える:$p > 1/6$ に対して、リヒネロヴィチ作用素のスペクトルは安定であり、その固有値は正の実部分を持つ。したがって、大きな$p$ を不備なディリクレ境界条件の正則化と見なすことができる。しかし、$p < 1/6$ の場合、球対称および静的セクターにおいても不安定なモードが存在する。そしてローレンツの署名に目を向ける。 p < 1/6$ の場合、この球面ユークリッド不安定性は境界自体の力学に付随するローレンツ不安定性と対になっていると理解できるかもしれない。しかし、球対称を壊す摂動を考えると、謎が発生する。 p > 1/6$でも動的に不安定なモードが多数存在し、我々が発見したユークリッドの安定性とは対照的である。したがって、安定な熱力学を持つが不安定な力学系を得るように見え、ユークリッド理論を議論する際に実装した滑らかさの標準的な仮定に疑問を呈する。

We consider four-dimensional Euclidean gravity in a finite cavity. Anderson has shown Dirichlet conditions do not yield a well-posed elliptic system, and has suggested boundary conditions that do. Here we point out that there exists a one-parameter family of boundary conditions, parameterized by a constant $p$, where a suitably Weyl rescaled boundary metric is fixed, and all give a well-posed elliptic system. Anderson and Dirichlet boundary conditions can be seen as the limits $p \to 0$ and $\infty$ of these. Focussing on static Euclidean solutions, we derive a thermodynamic first law. Restricting to a spherical spatial boundary, the infillings are flat space or the Schwarzschild solution, and have similar thermodynamics to the Dirichlet case. We consider smooth Euclidean fluctuations about the flat space saddle; for $p > 1/6$ the spectrum of the Lichnerowicz operator is stable -- its eigenvalues have positive real part. Thus we may regard large $p$ as a regularization of the ill-posed Dirichlet boundary conditions. However for $p < 1/6$ there are unstable modes, even in the spherically symmetric and static sector. We then turn to Lorentzian signature. For $p < 1/6$ we may understand this spherical Euclidean instability as being paired with a Lorentzian instability associated with the dynamics of the boundary itself. However, a mystery emerges when we consider perturbations that break spherical symmetry. Here we find a plethora of dynamically unstable modes even for $p > 1/6$, contrasting starkly with the Euclidean stability we found. Thus we seemingly obtain a system with stable thermodynamics, but unstable dynamics, calling into question the standard assumption of smoothness that we have implemented when discussing the Euclidean theory.

翻訳日:2024-02-08 18:18:51 公開日:2024-02-06

# Deep PCCT:Photon Counting Computed Tomography Deep Learning Applications Review

Deep PCCT: Photon Counting Computed Tomography Deep Learning Applications Review ( http://arxiv.org/abs/2402.04301v1 )

ライセンス: Link先を確認

Ana Carolina Alves, André Ferreira, Gijs Luijten, Jens Kleesiek, Behrus Puladi, Jan Egger, Victor Alves

(参考訳) 医用イメージングは、空間分解能の制限、電子ノイズからの干渉、雑音間のコントラスト比の低下などの課題に直面している。 Photon Counting Computed Tomography (PCCT) はその革新的な技術でこれらの問題に対処するソリューションとして登場した。このレビューは、PCCTの先臨床研究における最近の発展と応用を掘り下げ、従来の画像の限界を克服する可能性を強調している。例えば、pcctは乳房の微妙な異常の検出を改善することに顕著な効果を示しており、以前は達成できなかった詳細レベルを提供する。 PCCTの現在の文献を見ると、スキャナーの主な特徴とその様々な応用について、その技術に関する包括的な分析が示される。さらに、深層学習をpcctに統合し、放射線学的特徴の研究を行い、データ処理における成功例を提示している。これらの進歩を認めつつも、この分野の既存の課題を議論し、将来の研究と医療画像技術の改善への道を開く。近年PCCTが臨床レベルで統合されているため,本研究の論文は限られているが,その潜在的なメリットは様々な診断応用にまで及んでいる。

Medical imaging faces challenges such as limited spatial resolution, interference from electronic noise, and poor contrast-to-noise ratios. Photon Counting Computed Tomography (PCCT) has emerged as a solution, addressing these issues with its innovative technology. This review delves into the recent developments and applications of PCCT in pre-clinical research, emphasizing its potential to overcome traditional imaging limitations. For example, PCCT has demonstrated remarkable efficacy in improving the detection of subtle abnormalities in breast tissue, providing a level of detail previously unattainable. Examining the current literature on PCCT, this review presents a comprehensive analysis of the technology, highlighting the main features of scanners and their varied applications. In addition, it explores the integration of deep learning into PCCT, along with the study of radiomic features, presenting successful applications in data processing. While acknowledging these advances, it also discusses the existing challenges in this field, paving the way for future research and improvements in medical imaging technologies. Despite the limited number of articles on this subject, due to the recent integration of PCCT at a clinical level, its potential benefits extend to various diagnostic applications.

翻訳日:2024-02-08 18:18:17 公開日:2024-02-06

# 時間的ラベル雑音下での時系列からの学習

Learning from Time Series under Temporal Label Noise ( http://arxiv.org/abs/2402.04398v1 )

ライセンス: Link先を確認

Sujay Nagaraj, Walter Gerych, Sana Tonekaboni, Anna Goldenberg, Berk Ustun, Thomas Hartvigsen

(参考訳) 多くのシーケンシャルな分類タスクは、時間とともに変化するラベルノイズに影響される。このようなノイズは、ラベルの品質を改善、悪化、あるいは定期的に変化させる可能性がある。まず,時系列の逐次分類問題である時間ラベル雑音の提案と定式化を行った。この設定では、時間依存ノイズ関数によって破損しながら複数のラベルを順次記録する。まず,ラベルノイズ関数の時間的性質をモデル化することの重要性と,既存の手法が一貫して過小評価されることを示す。次に,データから直接時間ラベルノイズ関数を推定することにより,雑音耐性分類器を訓練する手法を提案する。提案手法は,実データと合成データを用いた多種多様な時間ラベルノイズ関数の存在下での最先端性能につながることを示す。

Many sequential classification tasks are affected by label noise that varies over time. Such noise can cause label quality to improve, worsen, or periodically change over time. We first propose and formalize temporal label noise, an unstudied problem for sequential classification of time series. In this setting, multiple labels are recorded in sequence while being corrupted by a time-dependent noise function. We first demonstrate the importance of modelling the temporal nature of the label noise function and how existing methods will consistently underperform. We then propose methods that can train noise-tolerant classifiers by estimating the temporal label noise function directly from data. We show that our methods lead to state-of-the-art performance in the presence of diverse temporal label noise functions using real and synthetic data.
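To make the setting concrete, the sketch below corrupts a binary label sequence with a flip probability that grows over time, and evaluates a forward-corrected negative log-likelihood. The linear schedule, its endpoints, and all function names are hypothetical. The abstract proposes estimating the noise function from data; here it is simply assumed known.

```python
import math
import random

def flip_prob(t, horizon, q_start=0.05, q_end=0.4):
    """Hypothetical time-dependent flip probability, rising linearly from q_start to q_end."""
    return q_start + (q_end - q_start) * t / max(1, horizon - 1)

def corrupt(labels, rng):
    """Flip each binary label independently with the flip probability of its time step."""
    horizon = len(labels)
    return [1 - y if rng.random() < flip_prob(t, horizon) else y for t, y in enumerate(labels)]

def forward_corrected_nll(p_clean, noisy_label, q):
    """NLL of an observed noisy label, given the model's clean-label probability and flip rate q."""
    p_noisy = (1 - q) * p_clean + q * (1 - p_clean)  # probability that the noisy label is 1
    p = p_noisy if noisy_label == 1 else 1 - p_noisy
    return -math.log(max(p, 1e-12))

# Toy usage: corrupt an alternating sequence and score one time step.
noisy = corrupt([0, 1] * 5, random.Random(0))
loss_at_last_step = forward_corrected_nll(0.9, noisy[-1], flip_prob(9, 10))
```

Using a single time-independent flip rate in `forward_corrected_nll` would misstate the noise at most time steps, which is the failure mode the abstract attributes to existing methods.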

翻訳日:2024-02-08 18:11:51 公開日:2024-02-06

# LLMベースのソフトウェアエンジニアリングの保証

Assured LLM-Based Software Engineering ( http://arxiv.org/abs/2402.04380v1 )

ライセンス: Link先を確認

Nadia Alshahwan, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang

(参考訳) 本稿では、人間とは独立してコードを改善するために、どのようにしてLarge Language Models(LLMs)を使用できるか、そして、改善されたコード – 元のコードの性質を後退させないことを保証するか、という疑問に対処する。 -検証可能な測定可能な方法でオリジナルを改善するか? この問題に対処するため,遺伝子改良にインスパイアされた生成とテストのアプローチである Assured LLM-based Software Engineering を提唱する。保証されたLLMSEは一連のセマンティックフィルタを適用し、これら2つの保証を満たしていないコードを破棄する。これはLLMの幻覚への適合性の潜在的な問題を克服する。 LLMを使って、どんな人間からも独立してコードを生成することができます。他のヒューマンエンジニアが生成したコードで行うように、人間は最終的なコードレビュアーの役割のみを担います。この記事では,2024年4月15日,ポルトガルのリスボンで開催されたInternational Workshop on Interpretability, Robustness, and Benchmarking in Neural Software EngineeringのMark Harman氏の基調講演の内容の概要を紹介する。

In this paper we address the following question: How can we use Large Language Models (LLMs) to improve code independently of a human, while ensuring that the improved code (1) does not regress the properties of the original code and (2) improves the original in a verifiable and measurable way? To address this question, we advocate Assured LLM-Based Software Engineering, a generate-and-test approach inspired by Genetic Improvement. Assured LLMSE applies a series of semantic filters that discard code that fails to meet these twin guarantees. This overcomes the potential problem of LLMs' propensity to hallucinate. It allows us to generate code using LLMs, independently of any human. The human plays the role only of final code reviewer, as they would do with code generated by other human engineers. This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
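The generate-and-test idea can be sketched as a chain of filters applied to candidate code. Everything below is a hypothetical illustration: the three filters (builds, passes regression cases, improves a metric), the function names, and the use of source length as the metric are assumptions, not the filters described in the keynote. The candidates are hard-coded strings standing in for LLM output.

```python
def builds(src):
    """Filter 1: the candidate must at least compile."""
    try:
        compile(src, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False

def passes_regression(src, name, cases):
    """Filter 2: the candidate must reproduce the original behaviour on known cases."""
    try:
        namespace = {}
        exec(src, namespace)
        fn = namespace[name]
        return all(fn(*args) == expected for args, expected in cases)
    except Exception:
        return False

def assured_filter(candidates, name, cases, metric, baseline):
    """Keep only candidates that build, do not regress, and measurably improve (lower metric)."""
    survivors = []
    for src in candidates:
        if builds(src) and passes_regression(src, name, cases) and metric(src) < baseline:
            survivors.append(src)
    return survivors

original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
candidates = [
    "def total(xs):\n    return sum(xs)\n",  # correct and shorter
    "def total(xs):\n    return 0\n",        # regresses behaviour
    "def total(xs) return sum(xs)",          # does not build
]
cases = [(([1, 2, 3],), 6), (([],), 0)]
kept = assured_filter(candidates, "total", cases, metric=len, baseline=len(original))
```

Only the first candidate survives, so a human reviewer sees one change that already meets both guarantees. Executing untrusted generated code with `exec` is acceptable only in a toy; a real pipeline would need sandboxing.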

翻訳日:2024-02-08 18:11:39 公開日:2024-02-06

# 安定な無機材料をテキストとして生成する微調整言語モデル

Fine-Tuned Language Models Generate Stable Inorganic Materials as Text ( http://arxiv.org/abs/2402.04379v1 )

ライセンス: Link先を確認

Nate Gruver, Anuroop Sriram, Andrea Madotto, Andrew Gordon Wilson, C. Lawrence Zitnick, Zachary Ulissi

(参考訳) 安定材料生成のための微調整型大規模言語モデルを提案する。テキストエンコードされた原子論データ上の微調整された大きな言語モデルは不規則であるが、実装は簡単であり、90%のサンプル構造は原子の位置と電荷の物理的制約に従う。学習MLポテンシャルと金標準DFT計算の両方から得られたエネルギーを用いて、我々の最強モデル(微調整LLaMA-2 70B)が、競合拡散モデルCDVAEの約2倍(49%対28%)で準安定であると予測された材料を生成することを示した。テキストプロンピングに固有の柔軟性があるため,安定素材の無条件生成,部分構造インフィルディング,テキスト条件生成を同時に行うことができる。最後に, 言語モデルが結晶構造の主要な対称性を捉える能力は, モデルスケールにより向上し, 事前学習されたllmのバイアスが原子学的データに驚くほど適していることを示す。

We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable material, infilling of partial structures and text-conditional generation. Finally, we show that language models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.

翻訳日:2024-02-08 18:11:07 公開日:2024-02-06

# $\texttt{NeRCC}$:Nested-Regression Coded Computing for Resilient Distributed Prediction Serving Systems

$\texttt{NeRCC}$: Nested-Regression Coded Computing for Resilient Distributed Prediction Serving Systems ( http://arxiv.org/abs/2402.04377v1 )

ライセンス: Link先を確認

Parsa Moradi, Mohammad Ali Maddah-Ali

(参考訳) ストラグラーに対する耐性は予測サービスシステムの重要な要素であり、事前訓練された機械学習モデルの入力データに対する推論を実行する。本稿では、近似符号化コンピューティングのための一般的なストラグラー耐性フレームワークとしてNeRCCを提案する。 nerccは,(1)エンコーディングレグレッションとサンプリング,(2)エンコードされたデータポイントの組合せとしてコード化されたデータポイントを生成する,(2)労働者のクラスタがコード化されたデータポイント上で推論を行う,(3)デコードレグレッションとサンプリング,(3)エンコードされたデータポイント上で利用可能な予測から元のデータポイントの予測をほぼ復元する,の3つのレイヤを含む。このフレームワークの全体的な目的は、符号化層と復号層における2つの回帰モデル間の相互関係を明らかにすることである。本稿では, 2つの正規化項への依存度を和らげることで, ネスト回帰問題の解法を提案する。 LeNet5、RepVGG、Vision Transformer(ViT)など、さまざまなデータセットとさまざまな機械学習モデルに関する広範な実験により、NeRCCは、幅広いストラグラーにおける元の予測を正確に近似し、最先端の技術を最大23%上回ることを示した。

Resilience against stragglers is a critical element of prediction serving systems, tasked with executing inferences on input data for a pre-trained machine-learning model. In this paper, we propose NeRCC, as a general straggler-resistant framework for approximate coded computing. NeRCC includes three layers: (1) encoding regression and sampling, which generates coded data points, as a combination of original data points, (2) computing, in which a cluster of workers run inference on the coded data points, (3) decoding regression and sampling, which approximately recovers the predictions of the original data points from the available predictions on the coded data points. We argue that the overall objective of the framework reveals an underlying interconnection between two regression models in the encoding and decoding layers. We propose a solution to the nested regressions problem by summarizing their dependence on two regularization terms that are jointly optimized. Our extensive experiments on different datasets and various machine learning models, including LeNet5, RepVGG, and Vision Transformer (ViT), demonstrate that NeRCC accurately approximates the original predictions in a wide range of stragglers, outperforming the state-of-the-art by up to 23%.

翻訳日:2024-02-08 18:10:36 公開日:2024-02-06

# 実データと代理データによる学習法則のスケーリング

Scaling laws for learning with real and surrogate data ( http://arxiv.org/abs/2402.04376v1 )

ライセンス: Link先を確認

Ayush Jain, Andrea Montanari and Eren Sasoglu

(参考訳) 大量の高品質なデータを収集することは、しばしば高価で非現実的であり、機械学習における重要なボトルネックである。ターゲットディストリビューションから、よりアクセスしやすい公開データセット、異なる状況下で収集されたデータ、または生成モデルによって合成されたデータを使って、小さなセットのn$データポイントを拡張できる。ぼやけた区別では、データを‘surrogate data’と呼ぶ。我々は,サロゲートデータをトレーニングに統合するための簡単なスキームを定義し,理論モデルと経験的研究の両方を用いてその振る舞いを探索する。主な発見は次のとおりです。 (i)$ integrated surrogate dataは、オリジナルのディストリビューションのテストエラーを大幅に削減できる。 (ii)$ この利益を得るためには、最適に重み付けされた経験的リスク最小化を使用することが不可欠である。 (iii)$ 実データと代理データの混合で訓練されたモデルのテストエラーは、スケーリング法則によってよく説明される。これは、代理データから最適な重み付けと利得を予測するために使用できる。

Collecting large quantities of high-quality data is often prohibitively expensive or impractical, and a crucial bottleneck in machine learning. One may instead augment a small set of $n$ data points from the target distribution with data from more accessible sources like public datasets, data collected under different circumstances, or synthesized by generative models. Blurring distinctions, we refer to such data as 'surrogate data'. We define a simple scheme for integrating surrogate data into training and use both theoretical models and empirical studies to explore its behavior. Our main findings are: $(i)$ Integrating surrogate data can significantly reduce the test error on the original distribution; $(ii)$ In order to reap this benefit, it is crucial to use optimally weighted empirical risk minimization; $(iii)$ The test error of models trained on mixtures of real and surrogate data is well described by a scaling law. This can be used to predict the optimal weighting and the gain from surrogate data.
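A minimal sketch of weighted empirical risk minimization on real plus surrogate data follows, for the simplest case of estimating a mean under squared loss. The convex-combination form of the objective, the weight name `alpha`, and the toy numbers are assumptions chosen for illustration; the abstract does not spell out the exact scheme.

```python
def weighted_risk(theta, real, surrogate, alpha):
    """(1 - alpha) * mean squared loss on real data + alpha * mean squared loss on surrogate data."""
    risk_real = sum((x - theta) ** 2 for x in real) / len(real)
    risk_sur = sum((x - theta) ** 2 for x in surrogate) / len(surrogate)
    return (1 - alpha) * risk_real + alpha * risk_sur

def weighted_erm_mean(real, surrogate, alpha):
    """Closed-form minimizer of weighted_risk: a convex combination of the two sample means."""
    mean_real = sum(real) / len(real)
    mean_sur = sum(surrogate) / len(surrogate)
    return (1 - alpha) * mean_real + alpha * mean_sur

# Toy usage: a few real points and some shifted surrogate points.
real = [1.0, 3.0]
surrogate = [4.0, 6.0]
theta_hat = weighted_erm_mean(real, surrogate, alpha=0.25)
```

With `alpha = 0` the surrogate data is ignored and with `alpha = 1` the real data is ignored. The abstract's point is that an intermediate, well-chosen weight is what yields the benefit.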

翻訳日:2024-02-08 18:09:54 公開日:2024-02-06

# 生成AIの世界 - ディープフェイクと大規模言語モデル

The World of Generative AI: Deepfakes and Large Language Models ( http://arxiv.org/abs/2402.04373v1 )

ライセンス: Link先を確認

Alakananda Mitra, Saraju P. Mohanty, and Elias Kougianos

(参考訳) 我々は、生成人工知能(GenAI)の時代に住んでいる。 Deepfakes and Large Language Models (LLM)はGenAIの2つの例である。特にディープフェイクは、誤った情報を広め、真実を変えることができるので、社会にとって恐ろしい脅威となる。 LLMは汎用言語を生成する強力な言語モデルである。しかし、その生成的な側面から、悪用された場合のリスクでもある。これらの技術の倫理的利用は大きな懸念事項である。この短い記事は、それらの相互関係を見つけようとしている。

We live in the era of Generative Artificial Intelligence (GenAI). Deepfakes and Large Language Models (LLMs) are two examples of GenAI. Deepfakes, in particular, pose an alarming threat to society as they are capable of spreading misinformation and changing the truth. LLMs are powerful language models that generate general-purpose language. However, due to their generative aspect, they can also pose a risk to people if used with ill intent. The ethical use of these technologies is a big concern. This short article explores the interrelationship between them.

翻訳日:2024-02-08 18:09:27 公開日:2024-02-06

# 歩行者横断決定は、雑音の多い視覚知覚下での最適意思決定によって説明できる

Pedestrian crossing decisions can be explained by bounded optimal decision-making under noisy visual perception ( http://arxiv.org/abs/2402.04370v1 )

ライセンス: Link先を確認

Yueyang Wang, Aravinda Ramakrishnan Srinivasan, Jussi P.P. Jokinen, Antti Oulasvirta, Gustav Markkula

(参考訳) 本稿では,計算的合理性理論に基づく歩行者横断決定のモデルを提案する。交差決定は、人間の認知的限界から生じる最適性に縛られ、境界的に最適であると仮定される。これまでの歩行者行動のモデルは「ブラックボックス」機械学習モデルか、認知的要因に関する明確な仮定を持つ機械的モデルであった。具体的には、機械的にノイズの多い人間の視覚知覚をモデル化し、交差する際の報酬を仮定するが、強化学習を用いて境界付き最適行動ポリシーを学習する。本モデルでは, 従来モデルよりも多くの経験的現象を再現し, 1) 接近する車両の到着までの時間が, 歩行者がギャップを受理するか否か, (2) 降車前を横断する速度と, (3) 降車前を横断する歩行者タイミングと, (4) 降車停止距離の横断タイミングに与える影響について検討した。特に, 速度依存的ギャップ受容などの意思決定における行動が, 視覚的知覚の制約に対する合理的適応の産物である可能性が示唆された。また,個人毎の認知的制約や報酬のパラメータを適合させることで,個人差をよりよく説明できる。結論として、RLモデルとメカニスティックモデルの両方を活用することで、歩行者行動に関する新たな洞察を与え、より正確でスケーラブルな歩行者モデルに有用な基盤を提供する。

This paper presents a model of pedestrian crossing decisions, based on the theory of computational rationality. It is assumed that crossing decisions are boundedly optimal, with bounds on optimality arising from human cognitive limitations. While previous models of pedestrian behaviour have been either 'black-box' machine learning models or mechanistic models with explicit assumptions about cognitive factors, we combine both approaches. Specifically, we model mechanistically noisy human visual perception and assumed rewards in crossing, but we use reinforcement learning to learn bounded optimal behaviour policy. The model reproduces a larger number of known empirical phenomena than previous models, in particular: (1) the effect of the time to arrival of an approaching vehicle on whether the pedestrian accepts the gap, the effect of the vehicle's speed on both (2) gap acceptance and (3) pedestrian timing of crossing in front of yielding vehicles, and (4) the effect on this crossing timing of the stopping distance of the yielding vehicle. Notably, our findings suggest that behaviours previously framed as 'biases' in decision-making, such as speed-dependent gap acceptance, might instead be a product of rational adaptation to the constraints of visual perception. Our approach also permits fitting the parameters of cognitive constraints and rewards per individual, to better account for individual differences. To conclude, by leveraging both RL and mechanistic modelling, our model offers novel insights about pedestrian behaviour, and may provide a useful foundation for more accurate and scalable pedestrian models.

翻訳日:2024-02-08 18:08:51 公開日:2024-02-06

# ニューラルネットワークは複雑さの増加統計を学習する

Neural Networks Learn Statistics of Increasing Complexity ( http://arxiv.org/abs/2402.04362v1 )

ライセンス: Link先を確認

Nora Belrose, Quintin Pope, Lucia Quirke, Alex Mallen, Xiaoli Fern

(参考訳) 分布の単純さバイアス(DSB)は、ニューラルネットワークがまずデータ分散の低次モーメントを学習し、次に高次相関に移行することを仮定する。本研究は,低次統計値がトレーニング開始直後のトレーニングセットと一致した最大エントロピー分布において,ネットワークが自動的に良好に学習し,その後にその能力を失うことを示すことによって,DSBに対する説得力のある新たな証拠を示す。また、トークン$n$-gramの周波数と埋め込みベクトルのモーメントの等価性を証明し、LLMのバイアスに関する経験的証拠を見つけることによって、DSBを離散領域に拡張する。最後に, 最適な移動手段を用いて, あるクラスの低次統計を手術的に編集し, 初期学習ネットワークが, 対象クラスから抽出されたかのように, 編集されたサンプルを処理していることを示す。コードはhttps://github.com/EleutherAI/features-across-timeで入手できる。

The distributional simplicity bias (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in training, then lose this ability later. We also extend the DSB to discrete domains by proving an equivalence between token $n$-gram frequencies and the moments of embedding vectors, and by finding empirical evidence for the bias in LLMs. Finally we use optimal transport methods to surgically edit the low-order statistics of one class to match those of another, and show that early-training networks treat the edited samples as if they were drawn from the target class. Code is available at https://github.com/EleutherAI/features-across-time.

翻訳日:2024-02-08 18:07:50 公開日:2024-02-06

# 適応推論:理論的限界と未探究の機会

Adaptive Inference: Theoretical Limits and Unexplored Opportunities ( http://arxiv.org/abs/2402.04359v1 )

ライセンス: Link先を確認

Soheil Hor, Ying Qian, Mert Pilanci, Amin Arbabian

(参考訳) 本稿では,適応推論アルゴリズムの効率と性能ゲイン機会サイズを定量化する最初の理論的枠組みを提案する。コンピュータビジョンおよび自然言語処理タスクにおける10-100倍の効率向上の可能性を示す実証的証拠により,性能上のペナルティを伴わずに実現可能な効率と性能向上のための新たな近似的および厳密な境界を提供する。さらに,適応推論状態空間の最適選択と設計を通じて,実現可能な効率の向上に関する洞察を提供する。

This paper introduces the first theoretical framework for quantifying the efficiency and performance gain opportunity size of adaptive inference algorithms. We provide new approximate and exact bounds for the achievable efficiency and performance gains, supported by empirical evidence demonstrating the potential for 10-100x efficiency improvements in both Computer Vision and Natural Language Processing tasks without incurring any performance penalties. Additionally, we offer insights on improving achievable efficiency gains through the optimal selection and design of adaptive inference state spaces.

翻訳日:2024-02-08 18:07:30 公開日:2024-02-06

# ダンス生成のための双方向自己回帰拡散モデル

Bidirectional Autoregressive Diffusion Model for Dance Generation ( http://arxiv.org/abs/2402.04356v1 )

ライセンス: Link先を確認

Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang

(参考訳) ダンスは人間の感情を表現するための強力な媒体として機能するが、人生のようなダンスの生成は依然としてかなりの課題である。近年、拡散モデルは様々な領域で顕著な生成能力を示した。彼らは、適応可能な多対多の性質のために、人間のモーション生成を約束します。それにもかかわらず、現在の拡散に基づく運動生成モデルは、局所的および双方向的な拡張による動きに焦点を絞らず、直接かつ一方向の運動列を直接生成することが多い。高品質な舞踊の動きを振る舞う際には、音楽的文脈だけでなく、近隣の音楽的な舞踊の動きも考慮する必要がある。そこで本研究では,音楽対ダンス生成のための双方向自己回帰拡散モデル(badm)を提案する。生成したダンス動作をよりスムーズにするため、局所運動強調のための局所情報デコーダを構築する。提案手法は,入力条件と近傍動作に基づいて新たな動きを生成可能とし,個々の動きスライスを反復的に予測し,すべての予測を集約する。生成されたダンスとビートとの同期性をさらに向上するため、ビート情報を入力として組み込んで、より優れた音楽整列ダンス動作を生成する。実験結果から,提案モデルが既存の一方向アプローチと比較して最先端性能を実現することを示す。

Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create entire motion sequences directly and unidirectionally, lacking focus on the motion with local and bidirectional enhancement. When choreographing high-quality dance movements, people need to take into account not only the musical context but also the nearby music-aligned dance motions. To authentically capture human behavior, we propose a Bidirectional Autoregressive Diffusion Model (BADM) for music-to-dance generation, where a bidirectional encoder is built to enforce that the generated dance is harmonious in both the forward and backward directions. To make the generated dance motion smoother, a local information decoder is built for local motion enhancement. The proposed framework is able to generate new motions based on the input conditions and nearby motions, which foresees individual motion slices iteratively and consolidates all predictions. To further refine the synchronicity between the generated dance and the beat, the beat information is incorporated as an input to generate better music-aligned dance movements. Experimental results demonstrate that the proposed model achieves state-of-the-art performance compared to existing unidirectional approaches on the prominent benchmark for music-to-dance generation.

翻訳日:2024-02-08 18:07:19 公開日:2024-02-06

# PQMass:確率質量推定を用いた生成モデルの品質の確率論的評価

PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation ( http://arxiv.org/abs/2402.04355v1 )

ライセンス: Link先を確認

Pablo Lemos, Sammy Sharief, Nikolay Malkin, Laurence Perreault-Levasseur, Yashar Hezaveh

(参考訳) 生成モデルの品質を評価するための包括的サンプルベース手法を提案する。提案手法は,2組のサンプルが同一分布から抽出される確率を推定し,単一生成モデルの性能を評価する統計的に厳密な手法や,同一データセット上で訓練された複数の競合モデルの比較を可能にする。この比較は、空間を重複しない領域に分割し、各領域のデータサンプル数を比較することで行うことができる。このメソッドは生成モデルとテストデータからのサンプルのみを必要とする。高次元データ上で直接機能することができ、次元の縮小の必要性を回避できる。特に,本手法は真の分布の密度に関する仮定に依存せず,訓練や補助モデルへの適合にも依存しない。代わりに、データ空間内の様々な部分領域にわたる密度(確率質量)の積分を近似することに焦点を当てている。

We propose a comprehensive sample-based method for assessing the quality of generative models. The proposed approach enables the estimation of the probability that two sets of samples are drawn from the same distribution, providing a statistically rigorous method for assessing the performance of a single generative model or the comparison of multiple competing models trained on the same dataset. This comparison can be conducted by dividing the space into non-overlapping regions and comparing the number of data samples in each region. The method only requires samples from the generative model and the test data. It is capable of functioning directly on high-dimensional data, obviating the need for dimensionality reduction. Significantly, the proposed method does not depend on assumptions regarding the density of the true distribution, and it does not rely on training or fitting any auxiliary models. Instead, it focuses on approximating the integral of the density (probability mass) across various sub-regions within the data space.
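The region-counting comparison described above can be sketched as follows. The regions here are nearest-centre (Voronoi) cells around given reference points, and the statistic is a standard Pearson chi-squared test of homogeneity. Both choices, and all function names, are assumptions for illustration rather than the paper's exact construction.

```python
def nearest(point, centers):
    """Index of the centre closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )

def region_counts(samples, centers):
    """Count how many samples fall in each non-overlapping nearest-centre region."""
    counts = [0] * len(centers)
    for s in samples:
        counts[nearest(s, centers)] += 1
    return counts

def chi2_two_sample(counts_x, counts_y):
    """Pearson chi-squared statistic comparing two count vectors over the same regions."""
    nx, ny = sum(counts_x), sum(counts_y)
    stat = 0.0
    for cx, cy in zip(counts_x, counts_y):
        if cx + cy == 0:
            continue  # empty regions carry no information
        pooled = (cx + cy) / (nx + ny)
        stat += (cx - nx * pooled) ** 2 / (nx * pooled)
        stat += (cy - ny * pooled) ** 2 / (ny * pooled)
    return stat

# Toy usage in one dimension with two regions.
centers = [[0.0], [10.0]]
generated = [[0.1], [0.2], [9.0]]
held_out = [[0.3], [8.5], [-0.4]]
stat = chi2_two_sample(region_counts(generated, centers), region_counts(held_out, centers))
```

A statistic near zero indicates that the two sample sets occupy the regions in similar proportions, while a large value indicates a mismatch. No density model or dimensionality reduction is involved, matching the abstract's description.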

翻訳日:2024-02-08 18:06:56 公開日:2024-02-06

# 3Dプリンターで制御されたシリンジポンプは、試薬を2本、アクティブ、レギュラブル、同時に供給する。免疫クロマトグラフィーテストストリップの製造

3D printer-controlled syringe pumps for dual, active, regulable and simultaneous dispensing of reagents. Manufacturing of immunochromatographic test strips ( http://arxiv.org/abs/2402.04354v1 )

ライセンス: Link先を確認

Gabriel Siano, Leandro Peretti, Juan Manuel Marquez, Nazarena Pujato, Leonardo Giovanini and Claudio Berli

(参考訳) 横流式免疫測定法 (lfia) は, 製造コスト, 単純性, 移植性といった複数の利点を組み合わせることで, インフラや高度に訓練された人材を必要とせずにバイオマーカーを検出できるため, 様々なアナライトの検出に世界中で広く用いられている。本稿では,特に試験線 (tl) と制御線 (cl) の形での試薬の制御および能動的投与に関して,実験室規模でのlfiaの製造プロセスに対する解決策を提供する。提案する3dプリンタの適応は簡単で、フリーで、多くの研究所が既にインフラに導入しているため、この課題を達成するため、3dプリンタをシリンジポンプ(sp)の制御にも応用した。 3Dプリンタの標準機能は、SPを切断し、エクストルーダを再接続することで容易に復元できる。さらに、3dプリンターの統一的な制御により、特定の高価な商用機器でのみ見られる4つの機能、デュアル、アクティブ、レギュレータブル、同時ディスペンサーが可能になる。提案手法では,3dプリンタで制御したspsで2本以上の線(cl,tl)を同時に供給することの課題に対処し,実験範囲内では線幅の規制などを行った。また,レプトスピローシス検出のためのLFIAの構築も,自動試薬ディスペンシングの実践例として示されている。

Lateral flow immunoassays (LFIA) are widely used worldwide for the detection of different analytes because they combine multiple advantages such as low production cost, simplicity, and portability, which allows biomarker detection without requiring infrastructure or highly trained personnel. Here we propose to provide solutions to the manufacturing process of LFIA at laboratory scale, particularly to the controlled and active dispensing of the reagents in the form of the Test Lines (TL) and the Control Lines (CL). To accomplish this task, we adapted a 3D printer to also control Syringe Pumps (SP), since the proposed adaptation of a 3D printer is easy and free, and many laboratories already have one in their infrastructure. In turn, the standard function of the 3D printer can be easily restored by disconnecting the SPs and reconnecting the extruder. Additionally, the unified control of the 3D printer enables dual, active, regulable and simultaneous dispensing, four features that are typically found only in certain high-cost commercial equipment. With the proposed setup, the challenge of simultaneously dispensing at least 2 lines (CL and TL) with SPs controlled by a 3D printer was addressed, including regulation of the width of dispensed lines within experimental limits. Also, the construction of an LFIA for the detection of leptospirosis is shown as a practical example of automated reagent dispensing.

Translated: 2024-02-08 18:06:41 Published: 2024-02-06

# The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry ( http://arxiv.org/abs/2402.04347v1 )

License: see the linked page

Michael Zhang, Kush Bhatia, Hermann Kumbong, and Christopher Ré

Linear attentions have shown potential for improving Transformer efficiency, reducing attention's quadratic complexity to linear in sequence length. This holds exciting promise for (1) training linear Transformers from scratch, (2) "finetuned-conversion" of task-specific Transformers into linear versions that recover task performance, and (3) "pretrained-conversion" of Transformers such as large language models into linear versions finetunable on downstream tasks. However, linear attentions often underperform standard softmax attention in quality. To close this performance gap, we find prior linear attentions lack key properties of softmax attention tied to good performance: low-entropy (or "spiky") weights and dot-product monotonicity. We further observe surprisingly simple feature maps that retain these properties and match softmax performance, but are inefficient to compute in linear attention. We thus propose Hedgehog, a learnable linear attention that retains the spiky and monotonic properties of softmax attention while maintaining linear complexity. Hedgehog uses simple trainable MLPs to produce attention weights mimicking softmax attention. Experiments show Hedgehog recovers over 99% of standard Transformer quality in train-from-scratch and finetuned-conversion settings, outperforming prior linear attentions up to 6 perplexity points on WikiText-103 with causal GPTs, and up to 8.7 GLUE score points on finetuned bidirectional BERTs. Hedgehog also enables pretrained-conversion. Converting a pretrained GPT-2 into a linear attention variant achieves state-of-the-art 16.7 perplexity on WikiText-103 for 125M subquadratic decoder models. We finally turn a pretrained Llama-2 7B into a viable linear attention Llama. With low-rank adaptation, Hedgehog-Llama2 7B achieves 28.1 higher ROUGE-1 points over the base standard attention model, where prior linear attentions lead to 16.5 point drops.
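
As a rough illustration of why linear attention scales linearly in sequence length, the sketch below computes the same kernelized (non-causal) attention two ways: once with an explicit n-by-n weight matrix, and once by summarizing keys and values a single time. The element-wise exponential `feature_map` is an assumption chosen for brevity; it is not Hedgehog's learned MLP feature map, and all function names here are hypothetical.

```python
import math

def feature_map(x):
    # Hypothetical stand-in: element-wise exp keeps every feature positive.
    # Hedgehog instead learns its feature map with a small trainable MLP.
    return [math.exp(v) for v in x]

def quadratic_attention(Q, K, V):
    """Kernelized attention with explicit pairwise weights: O(n^2) in length."""
    d_v = len(V[0])
    out = []
    for q in Q:
        fq = feature_map(q)
        weights = [sum(a * b for a, b in zip(fq, feature_map(k))) for k in K]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total
                    for c in range(d_v)])
    return out

def linear_attention(Q, K, V):
    """Same output in O(n): build sum_j phi(k_j) v_j^T once, reuse per query."""
    d_f, d_v = len(feature_map(K[0])), len(V[0])
    kv = [[0.0] * d_v for _ in range(d_f)]  # running sum of phi(k_j) v_j^T
    z = [0.0] * d_f                          # running sum of phi(k_j)
    for k, v in zip(K, V):
        fk = feature_map(k)
        for i in range(d_f):
            z[i] += fk[i]
            for c in range(d_v):
                kv[i][c] += fk[i] * v[c]
    out = []
    for q in Q:
        fq = feature_map(q)
        denom = sum(fq[i] * z[i] for i in range(d_f))
        out.append([sum(fq[i] * kv[i][c] for i in range(d_f)) / denom
                    for c in range(d_v)])
    return out

Q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.0]]
K = [[0.2, 0.1], [-0.3, 0.4], [0.0, -0.1]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
slow = quadratic_attention(Q, K, V)
fast = linear_attention(Q, K, V)
```

Both functions return the same values up to floating-point error; only the order of the sums changes, which is what removes the quadratic cost.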

Translated: 2024-02-08 18:06:14 Published: 2024-02-06

# Does Confidence Calibration Help Conformal Prediction? ( http://arxiv.org/abs/2402.04344v1 )

License: see the linked page

Huajun Xi, Jianguo Huang, Lei Feng, Hongxin Wei

Conformal prediction, as an emerging uncertainty quantification technique, constructs prediction sets that are guaranteed to contain the true label with high probability. Previous works usually employ temperature scaling to calibrate the classifier, assuming that confidence calibration can benefit conformal prediction. In this work, we first show that post-hoc calibration methods surprisingly lead to larger prediction sets with improved calibration, while over-confidence with small temperatures benefits the conformal prediction performance instead. Theoretically, we prove that high confidence reduces the probability of appending a new class in the prediction set. Inspired by the analysis, we propose a novel method, $\textbf{Conformal Temperature Scaling}$ (ConfTS), which rectifies the objective through the gap between the threshold and the non-conformity score of the ground-truth label. In this way, the new objective of ConfTS will optimize the temperature value toward an optimal set that satisfies the $\textit{marginal coverage}$. Experiments demonstrate that our method can effectively improve widely-used conformal prediction methods.
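
To make the interaction between temperature and prediction sets concrete, here is a minimal split-conformal sketch. It assumes the common score 1 - p(true label) and a softmax with a temperature parameter; the calibration logits, labels, and function names are invented for illustration, and the ConfTS objective itself is not implemented.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a smaller temperature sharpens the output."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def conformal_threshold(cal_logits, cal_labels, alpha, temperature=1.0):
    """Split-conformal threshold on the score 1 - p(true label)."""
    scores = sorted(
        1.0 - softmax(logits, temperature)[label]
        for logits, label in zip(cal_logits, cal_labels)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return 1.0  # too few calibration points: every class is included
    return scores[rank - 1]

def prediction_set(logits, threshold, temperature=1.0):
    """All classes whose score 1 - p(class) does not exceed the threshold."""
    probs = softmax(logits, temperature)
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

# Invented calibration data: 9 examples, 3 classes.
CAL_LOGITS = [
    [2.0, 0.5, 0.1], [0.2, 1.8, 0.3], [0.1, 0.4, 2.2],
    [1.5, 1.2, 0.1], [0.3, 0.2, 0.9], [2.5, 0.1, 0.2],
    [0.4, 2.0, 1.9], [1.0, 0.9, 0.8], [0.2, 0.1, 3.0],
]
CAL_LABELS = [0, 1, 2, 1, 2, 0, 2, 0, 2]
tau = conformal_threshold(CAL_LOGITS, CAL_LABELS, alpha=0.2)
confident_set = prediction_set([5.0, 0.0, 0.0], tau)
```

Passing a different `temperature` to both `conformal_threshold` and `prediction_set` is one simple way to explore how sharpening or flattening the probabilities changes the set sizes.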

Translated: 2024-02-08 18:05:28 Published: 2024-02-06

# Fast Online Changepoint Detection ( http://arxiv.org/abs/2402.04433v1 )

License: see the linked page

Fabrizio Ghezzi, Eduardo Rossi, Lorenzo Trapani

We study online changepoint detection in the context of a linear regression model. We propose a class of heavily weighted statistics based on the CUSUM process of the regression residuals, which are specifically designed to ensure timely detection of breaks occurring early on during the monitoring horizon. We subsequently propose a class of composite statistics, constructed using different weighting schemes; the decision rule to mark a changepoint is based on the largest statistic across the various weights, thus effectively working like a veto-based voting mechanism, which ensures fast detection irrespective of the location of the changepoint. Our theory is derived under a very general form of weak dependence, thus being able to apply our tests to virtually all time series encountered in economics, medicine, and other applied sciences. Monte Carlo simulations show that our methodologies are able to control the procedure-wise Type I Error, and have short detection delays in the presence of breaks.
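
The general shape of weighted CUSUM monitoring can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's statistics: the regression is intercept-only, the boundary `c * sqrt(m) * (1 + k/m) * (k/(m+k))**gamma` is a textbook-style weighted boundary, and the constant `c` and the weights in `gammas` are arbitrary placeholders rather than calibrated critical values. A changepoint is flagged as soon as any weight's boundary is crossed.

```python
import math

def composite_cusum_monitor(history, stream, c=2.0, gammas=(0.0, 0.25, 0.45)):
    """Return the first monitoring time k at which any weighted boundary is crossed."""
    m = len(history)
    mean = sum(history) / m  # intercept-only fit on the historical sample
    sigma = math.sqrt(sum((x - mean) ** 2 for x in history) / (m - 1))
    cusum = 0.0
    for k, x in enumerate(stream, start=1):
        cusum += x - mean                 # running sum of monitoring residuals
        stat = abs(cusum) / sigma
        for gamma in gammas:
            # Larger gamma lowers the boundary for small k (early breaks).
            boundary = c * math.sqrt(m) * (1 + k / m) * (k / (m + k)) ** gamma
            if stat > boundary:
                return k
    return None

history = [1.0, -1.0] * 10                 # stable training period
stable = [1.0, -1.0] * 20                  # no break during monitoring
shifted = [1.0, -1.0] * 5 + [4.0, 2.0] * 10  # mean shifts by +3 after 10 points
no_alarm = composite_cusum_monitor(history, stable)
alarm_time = composite_cusum_monitor(history, shifted)
```

In a real application each weight would come with its own critical value, and the residuals would come from the fitted regression rather than from a sample mean.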

Translated: 2024-02-08 17:57:49 Published: 2024-02-06

# Chatbot Meets Pipeline: Augment Large Language Model with Definite Finite Automaton ( http://arxiv.org/abs/2402.04411v1 )

License: see the linked page

Yiyou Sun and Junjie Hu and Wei Cheng and Haifeng Chen

This paper introduces the Definite Finite Automaton augmented large language model (DFA-LLM), a novel framework designed to enhance the capabilities of conversational agents using large language models (LLMs). Traditional LLMs face challenges in generating regulated and compliant responses in special scenarios with predetermined response guidelines, like emotional support and customer service. Our framework addresses these challenges by embedding a Definite Finite Automaton (DFA), learned from training dialogues, within the LLM. This structured approach enables the LLM to adhere to a deterministic response pathway, guided by the DFA. The advantages of DFA-LLM include an interpretable structure through human-readable DFA, context-aware retrieval for responses in conversations, and plug-and-play compatibility with existing LLMs. Extensive benchmarks validate DFA-LLM's effectiveness, indicating its potential as a valuable contribution to the conversational agent.
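
The idea of steering responses along a deterministic pathway can be pictured with a toy automaton. Everything in this sketch is hypothetical: the states, tags, and guideline strings are invented, and `tag_utterance` is a keyword lookup standing in for whatever tagging the framework actually performs with an LLM.

```python
class DialogueDFA:
    """Deterministic finite automaton over dialogue tags (illustrative only)."""

    def __init__(self, start, transitions, responses):
        self.state = start
        self.transitions = transitions  # maps (state, tag) -> next state
        self.responses = responses      # maps state -> response guideline

    def step(self, tag):
        # A missing transition means the dialogue left the known pathway.
        next_state = self.transitions.get((self.state, tag))
        if next_state is None:
            raise KeyError(f"no transition from {self.state!r} on tag {tag!r}")
        self.state = next_state
        return self.responses[next_state]


def tag_utterance(text):
    # Stand-in for an LLM-based tagger: a plain keyword lookup.
    lowered = text.lower()
    if "refund" in lowered:
        return "ask_refund"
    if "order" in lowered:
        return "give_order_id"
    return "other"


TRANSITIONS = {
    ("greeting", "ask_refund"): "collect_order",
    ("collect_order", "give_order_id"): "confirm_refund",
}
RESPONSES = {
    "collect_order": "Ask the customer for the order number.",
    "confirm_refund": "Confirm the refund and state the timeline.",
}

dfa = DialogueDFA("greeting", TRANSITIONS, RESPONSES)
first = dfa.step(tag_utterance("I would like a refund"))
second = dfa.step(tag_utterance("My order number is 1234"))
```

In a fuller system the guideline returned by `step` would be passed to the LLM as context for generating the actual reply.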

Translated: 2024-02-08 17:57:33 Published: 2024-02-06

# Towards Fair, Robust and Efficient Client Contribution Evaluation in Federated Learning ( http://arxiv.org/abs/2402.04409v1 )

License: see the linked page

Meiying Zhang, Huan Zhao, Sheldon Ebron, Kan Yang

The performance of clients in Federated Learning (FL) can vary due to various reasons. Assessing the contributions of each client is crucial for client selection and compensation. It is challenging because clients often have non-independent and identically distributed (non-iid) data, leading to potentially noisy or divergent updates. The risk of malicious clients amplifies the challenge especially when there's no access to clients' local data or a benchmark root dataset. In this paper, we introduce a novel method called Fair, Robust, and Efficient Client Assessment (FRECA) for quantifying client contributions in FL. FRECA employs a framework called FedTruth to estimate the global model's ground truth update, balancing contributions from all clients while filtering out impacts from malicious ones. This approach is robust against Byzantine attacks and incorporates a Byzantine-resilient aggregation algorithm. FRECA is also efficient, as it operates solely on local model updates and requires no validation operations or datasets. Our experimental results show that FRECA can accurately and efficiently quantify client contributions in a robust manner.

Translated: 2024-02-08 17:57:14 Published: 2024-02-06

# Detection Transformer for Teeth Detection, Segmentation, and Numbering in Oral Rare Diseases: Focus on Data Augmentation and Inpainting Techniques ( http://arxiv.org/abs/2402.04408v1 )

License: see the linked page

Hocine Kadi, Théo Sourget, Marzena Kawczynski, Sara Bendjama, Bruno Grollemund, Agnès Bloch-Zupan

In this work, we focused on deep learning image processing in the context of rare oral diseases, which pose challenges due to limited data availability. A crucial step involves teeth detection, segmentation and numbering in panoramic radiographs. To this end, we used a dataset consisting of 156 panoramic radiographs from individuals with rare oral diseases, labeled by experts. We trained the Detection Transformer (DETR) neural network for teeth detection, segmentation, and numbering across the 52 teeth classes. In addition, we used data augmentation techniques, including geometric transformations. Finally, we generated new panoramic images using inpainting techniques with stable diffusion, by removing teeth from a panoramic radiograph and integrating teeth into it. The results showed a mAP exceeding 0.69 for DETR without data augmentation. The mAP improved to 0.82 when data augmentation techniques were used. Furthermore, we observed promising performance when using new panoramic radiographs generated with the inpainting technique, with a mAP of 0.76.

Translated: 2024-02-08 17:56:54 Published: 2024-02-06

# Quadrature Coherence Scale of Linear Combinations of Gaussian Functions in Phase Space ( http://arxiv.org/abs/2402.04404v1 )

License: see the linked page

Anaelle Hertz, Aaron Z. Goldberg and Khabat Heshami

The quadrature coherence scale is a recently introduced measure that was shown to be an efficient witness of nonclassicality. It takes a simple form for pure and Gaussian states, but a general expression for mixed states tends to be prohibitively unwieldy. In this paper, we introduce a method for computing the quadrature coherence scale of quantum states characterized by Wigner functions expressible as linear combinations of Gaussian functions. Notable examples within this framework include cat states, GKP states, and states resulting from Gaussian transformations, measurements, and breeding protocols. In particular, we show that the quadrature coherence scale serves as a valuable tool for examining the scalability of nonclassicality in the presence of loss. Our findings lead us to put forth a conjecture suggesting that, subject to 50% loss or more, all pure states become classical. We also consider the quadrature coherence scale as a measure of quality of the output state of the breeding protocol.

Translated: 2024-02-08 17:56:34 Published: 2024-02-06

# Edge-Parallel Graph Encoder Embedding ( http://arxiv.org/abs/2402.04403v1 )

License: see the linked page

Ariel Lubonja (1), Cencheng Shen (2), Carey Priebe (1) and Randal Burns (1) ((1) Johns Hopkins University, (2) University of Delaware)

New algorithms for embedding graphs have reduced the asymptotic complexity of finding low-dimensional representations. One-Hot Graph Encoder Embedding (GEE) uses a single, linear pass over edges and produces an embedding that converges asymptotically to the spectral embedding. The scaling and performance benefits of this approach have been limited by a serial implementation in an interpreted language. We refactor GEE into a parallel program in the Ligra graph engine that maps functions over the edges of the graph and uses lock-free atomic instructions to prevent data races. On a graph with 1.8B edges, this results in a 500 times speedup over the original implementation and a 17 times speedup over a just-in-time compiled version.
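
For orientation, the single pass over edges can be written serially in a few lines. This sketch assumes an undirected, weighted edge list with each edge listed once, no self-loops, and a known class label for every node; the function name and the toy graph are made up. The two `+=` updates per edge are the places where a parallel version would need atomic additions.

```python
def graph_encoder_embedding(edges, labels, num_classes):
    """Serial one-hot graph encoder embedding: one linear pass over the edges."""
    n = len(labels)
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1                    # class sizes n_k
    Z = [[0.0] * num_classes for _ in range(n)]
    for u, v, w in edges:
        # Each endpoint accumulates its neighbour's class, scaled by 1 / n_k.
        Z[u][labels[v]] += w / counts[labels[v]]
        Z[v][labels[u]] += w / counts[labels[u]]
    return Z

# Toy path graph 0 - 1 - 2 - 3 with two classes.
EDGES = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]
LABELS = [0, 0, 1, 1]
Z = graph_encoder_embedding(EDGES, LABELS, num_classes=2)
```

Each row of `Z` is the embedding of one node, with one coordinate per class.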

Translated: 2024-02-08 17:56:17 Published: 2024-02-06

# Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning ( http://arxiv.org/abs/2402.04401v1 )

License: see the linked page

Zhaoxuan Tan, Qingkai Zeng, Yijun Tian, Zheyuan Liu, Bing Yin, Meng Jiang

Personalization in large language models (LLMs) is increasingly important, aiming to align LLM's interactions, content, and recommendations with individual user preferences. Recent advances in LLM personalization have spotlighted effective prompt design, by enriching user queries with non-parametric knowledge through behavior history retrieval and textual profiles. However, these approaches were limited due to a lack of model ownership, resulting in constrained customization and privacy issues. Moreover, they often failed to accurately capture user behavior patterns, especially in cases where user data were complex and dynamic. To address these shortcomings, we introduce One PEFT Per User (OPPU), which employs personalized parameter-efficient fine-tuning (PEFT) modules, to store user-specific behavior patterns and preferences. By plugging in users' personal PEFT parameters, they can own and use their LLMs personally. OPPU integrates parametric user knowledge in the personal PEFT parameters with the non-parametric knowledge acquired through retrieval and profile. This integration adapts individual LLMs to user behavior shifts. Experimental results demonstrate that OPPU significantly outperforms existing prompt-based methods across seven diverse tasks in the LaMP benchmark. Further in-depth studies reveal OPPU's enhanced capabilities in handling user behavior shifts, modeling users at different active levels, maintaining robustness across various user history formats, and displaying versatility with different PEFT methods.

Translated: 2024-02-08 17:56:05 Published: 2024-02-06

# CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines ( http://arxiv.org/abs/2402.04400v1 )

License: see the linked page

Chao Pang, Xinzhuo Jiang, Nishanth Parameshwar Pavinkurve, Krishna S. Kalluri, Elise L. Minto, Jason Patterson, Linying Zhang, George Hripcsak, Noémie Elhadad, Karthik Natarajan

Synthetic Electronic Health Records (EHR) have emerged as a pivotal tool in advancing healthcare applications and machine learning models, particularly for researchers without direct access to healthcare data. Although existing methods, like rule-based approaches and generative adversarial networks (GANs), generate synthetic data that resembles real-world EHR data, these methods often use a tabular format, disregarding temporal dependencies in patient histories and limiting data replication. Recently, there has been a growing interest in leveraging Generative Pre-trained Transformers (GPT) for EHR data. This enables applications like disease progression analysis, population estimation, counterfactual reasoning, and synthetic data generation. In this work, we focus on synthetic data generation and demonstrate the capability of training a GPT model using a particular patient representation derived from CEHR-BERT, enabling us to generate patient sequences that can be seamlessly converted to the Observational Medical Outcomes Partnership (OMOP) data format.

Translated: 2024-02-08 17:55:36 Published: 2024-02-06

# QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks ( http://arxiv.org/abs/2402.04396v1 )

License: see the linked page

Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, Christopher De Sa

Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low-precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression regimes ($\le$ 4 bits per weight) using three novel techniques. First, QuIP# improves the incoherence processing from QuIP by using the randomized Hadamard transform, which is faster and has better theoretical properties. Second, QuIP# uses vector quantization techniques to take advantage of the ball-shaped sub-Gaussian distribution that incoherent weights possess: specifically, we introduce a set of hardware-efficient codebooks based on the highly symmetric $E_8$ lattice, which achieves the optimal 8-dimension unit ball packing. Third, QuIP# uses fine-tuning to improve fidelity to the original model. Our experiments show that QuIP# outperforms existing PTQ methods, enables new behaviors in PTQ scaling, and supports fast inference.
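
The randomized Hadamard transform mentioned above can be sketched in one dimension: multiply by random signs, then apply an orthonormal fast Walsh-Hadamard transform. This is only an illustration of the transform itself on a single vector; how the method applies it to weight matrices is not reproduced here, and the function names are hypothetical.

```python
import math
import random

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]

def randomized_hadamard(x, signs):
    """Apply y = H S x, where S is a diagonal matrix of random +/-1 signs."""
    return fwht([s * v for s, v in zip(signs, x)])

def inverse_randomized_hadamard(y, signs):
    """Invert the transform: H is its own inverse and S squares to the identity."""
    return [s * v for s, v in zip(signs, fwht(y))]

rng = random.Random(0)
x = [8.0] + [0.0] * 7                       # one large outlier entry
signs = [rng.choice((-1.0, 1.0)) for _ in x]
y = randomized_hadamard(x, signs)           # outlier is spread over all entries
x_back = inverse_randomized_hadamard(y, signs)
```

The transform preserves the vector's norm while shrinking its largest entry, which is the intuition behind using it before quantization.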

Translated: 2024-02-08 17:55:16 Published: 2024-02-06

# The Howard-Harvard effect: Institutional reproduction of intersectional inequalities ( http://arxiv.org/abs/2402.04391v1 )

License: see the linked page

Diego Kozlowski, Thema Monroe-White, Vincent Larivière and Cassidy R. Sugimoto

The US higher education system concentrates the production of science and scientists within a few institutions. This has implications for minoritized scholars and the topics with which they are disproportionately associated. This paper examines topical alignment between institutions and authors of varying intersectional identities, and the relationship with prestige and scientific impact. We observe a Howard-Harvard effect, in which the topical profile of minoritized scholars is amplified in mission-driven institutions and diminished in prestigious institutions. Results demonstrate a consistent pattern of inequality in topics and research impact. Specifically, we observe statistically significant differences between minoritized scholars and White men in citations and journal impact. The aggregate research profile of prestigious US universities is highly correlated with the research profile of White men, and highly negatively correlated with the research profile of minoritized women. Furthermore, authors affiliated with more prestigious institutions are associated with increasing inequalities in both citations and journal impact. Academic institutions and funders are called to create policies to mitigate the systemic barriers that prevent the United States from achieving a fully robust scientific ecosystem.

Translated: 2024-02-08 17:54:52 Published: 2024-02-06

# Densely Multiplied Physics Informed Neural Network ( http://arxiv.org/abs/2402.04390v1 )

License: see the linked page

Feilong Jiang, Xiaonan Hou, Min Xia

Although physics-informed neural networks (PINNs) have shown great potential in dealing with nonlinear partial differential equations (PDEs), PINNs commonly suffer from insufficient precision or produce incorrect outcomes. Unlike most existing solutions, which try to enhance the ability of PINNs by optimizing the training process, this paper improves the neural network architecture to improve the performance of PINNs. We propose a densely multiplied PINN (DM-PINN) architecture, which multiplies the output of a hidden layer with the outputs of all the subsequent hidden layers. Without introducing more trainable parameters, this effective mechanism can significantly improve the accuracy of PINNs. The proposed architecture is evaluated on four benchmark examples (Allen-Cahn equation, Helmholtz equation, Burgers equation and 1D convection equation). Comparisons between the proposed architecture and different PINN structures demonstrate the superior performance of the DM-PINN in both accuracy and efficiency.

Translated: 2024-02-08 17:54:33 Published: 2024-02-06

# Denoising Diffusion Probabilistic Models in Six Simple Steps ( http://arxiv.org/abs/2402.04384v1 )

License: see the linked page

Richard E. Turner, Cristiana-Diana Diaconu, Stratis Markou, Aliaksandra Shysheya, Andrew Y. K. Foong and Bruno Mlodozeniec

Denoising Diffusion Probabilistic Models (DDPMs) are a very popular class of deep generative models that have been successfully applied to a diverse range of problems including image and video generation, protein and material synthesis, weather forecasting, and neural surrogates of partial differential equations. Despite their ubiquity, it is hard to find an introduction to DDPMs which is simple, comprehensive, clean and clear. The compact explanations necessary in research papers are not able to elucidate all of the different design steps taken to formulate the DDPM, and the rationale for the steps that are presented is often omitted to save space. Moreover, the expositions are typically presented from the variational lower bound perspective, which is unnecessary and arguably harmful as it obfuscates why the method is working and suggests generalisations that do not perform well in practice. On the other hand, perspectives that take the continuous-time limit are beautiful and general, but they have a high barrier to entry as they require background knowledge of stochastic differential equations and probability flow. In this note, we distill the formulation of the DDPM into six simple steps, each of which comes with a clear rationale. We assume that the reader is familiar with fundamental topics in machine learning including basic probabilistic modelling, Gaussian distributions, maximum likelihood estimation, and deep learning.
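
As a small companion to the abstract, the sketch below shows the standard DDPM forward (noising) process, in which a data point is scaled down and mixed with Gaussian noise according to a variance schedule. The linear schedule and its endpoint values are common defaults assumed here for illustration; they are not taken from the note, whose six steps may use a different parameterization.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T on a linear grid (an assumed default)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def noise_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I); return (x_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]
xt, eps = noise_sample(x0, 500, abar, rng)
```

A denoising network would then be trained to predict `eps` (or `x0`) from `xt` and the step index; that training loop is omitted here.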

Translated: 2024-02-08 17:54:15 Published: 2024-02-06

# FairWire: Fair Graph Generation ( http://arxiv.org/abs/2402.04383v1 )

License: see the linked page

O. Deniz Kose and Yanning Shen

Machine learning over graphs has recently attracted growing attention due to its ability to analyze and learn complex relations within critical interconnected systems. However, the disparate impact that is amplified by the use of biased graph structures in these algorithms has raised significant concerns for the deployment of them in real-world decision systems. In addition, while synthetic graph generation has become pivotal for privacy and scalability considerations, the impact of generative learning algorithms on the structural bias has not yet been investigated. Motivated by this, this work focuses on the analysis and mitigation of structural bias for both real and synthetic graphs. Specifically, we first theoretically analyze the sources of structural bias that result in disparity for the predictions of dyadic relations. To alleviate the identified bias factors, we design a novel fairness regularizer that offers a versatile use. Faced with the bias amplification in graph generation models that is brought to light in this work, we further propose a fair graph generation framework, FairWire, by leveraging our fair regularizer design in a generative model. Experimental results on real-world networks validate that the proposed tools herein deliver effective structural bias mitigation for both real and synthetic graphs.

Translated: 2024-02-08 17:53:51 Published: 2024-02-06

# Counterfactual Generation with Answer Set Programming ( http://arxiv.org/abs/2402.04382v1 )

License: see the linked page

Sopam Dasgupta, Farhad Shakerin, Joaquín Arias, Elmer Salazar, Gopal Gupta

Machine learning models that automate decision-making are increasingly being used in consequential areas such as loan approvals, pretrial bail approval, hiring, and many more. Unfortunately, most of these models are black-boxes, i.e., they are unable to reveal how they reach these prediction decisions. A need for transparency demands justification for such predictions. An affected individual might also desire explanations to understand why a decision was made. Ethical and legal considerations may further require informing the individual of changes in the input attribute that could be made to produce a desirable outcome. This paper focuses on the latter problem of automatically generating counterfactual explanations. We propose a framework Counterfactual Generation with s(CASP) (CFGS) that utilizes answer set programming (ASP) and the s(CASP) goal-directed ASP system to automatically generate counterfactual explanations from rules generated by rule-based machine learning (RBML) algorithms. In our framework, we show how counterfactual explanations are computed and justified by imagining worlds where some or all factual assumptions are altered/changed. More importantly, we show how we can navigate between these worlds, namely, go from our original world/scenario where we obtain an undesired outcome to the imagined world/scenario where we obtain a desired/favourable outcome.

Translated: 2024-02-08 17:53:30 Published: 2024-02-06

# IoT Network Traffic Analysis with Deep Learning ( http://arxiv.org/abs/2402.04469v1 )

License: see the linked page

Mei Liu and Leon Yang

As IoT networks become more complex and generate massive amounts of dynamic data, it is difficult to monitor and detect anomalies using traditional statistical methods and machine learning methods. Deep learning algorithms can process and learn from large amounts of data and can also be trained using unsupervised learning techniques, meaning they don't require labelled data to detect anomalies. This makes it possible to detect new and unknown anomalies that may not have been detected before. Also, deep learning algorithms can be automated and highly scalable; thereby, they can run continuously in the backend and make it achievable to monitor large IoT networks instantly. In this work, we conduct a literature review on the most recent works using deep learning techniques and implement a model using ensemble techniques on the KDD Cup 99 dataset. The experimental results showcase the impressive performance of our deep anomaly detection model, achieving an accuracy of over 98\%.

Translated: 2024-02-08 17:47:09 Published: 2024-02-06

# Structured Entity Extraction Using Large Language Models ( http://arxiv.org/abs/2402.04437v1 )

License: see the linked page

Haolun Wu, Ye Yuan, Liana Mikaelyan, Alexander Meulemans, Xue Liu, James Hensman, Bhaskar Mitra

Recent advances in machine learning have significantly impacted the field of information extraction, with Large Language Models (LLMs) playing a pivotal role in extracting structured information from unstructured text. This paper explores the challenges and limitations of current methodologies in structured entity extraction and introduces a novel approach to address these issues. We contribute to the field by first introducing and formalizing the task of Structured Entity Extraction (SEE), and then proposing the Approximate Entity Set OverlaP (AESOP) metric, designed to appropriately assess model performance on this task. We then propose a new model that harnesses the power of LLMs for enhanced effectiveness and efficiency by decomposing the entire extraction task into multiple stages. Quantitative evaluation and human side-by-side evaluation confirm that our model outperforms baselines, offering promising directions for future advancements in structured entity extraction.

Translated: 2024-02-08 17:46:56 Published: 2024-02-06

# Continuous Multidimensional Scaling ( http://arxiv.org/abs/2402.04436v1 )

License: see the linked page

Michael W. Trosset, Carey E. Priebe

Multidimensional scaling (MDS) is the act of embedding proximity information about a set of $n$ objects in $d$-dimensional Euclidean space. As originally conceived by the psychometric community, MDS was concerned with embedding a fixed set of proximities associated with a fixed set of objects. Modern concerns, e.g., those that arise in developing asymptotic theories for statistical inference on random graphs, more typically involve studying the limiting behavior of a sequence of proximities associated with an increasing set of objects. Standard results from the theory of point-to-set maps imply that, if $n$ is fixed, then the limit of the embedded structures is the embedded structure of the limiting proximities. But what if $n$ increases? It then becomes necessary to reformulate MDS so that the entire sequence of embedding problems can be viewed as a sequence of optimization problems in a fixed space. We present such a reformulation and derive some consequences.
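
To make the notion of an embedding problem as an optimization problem concrete, here is a minimal sketch of one common MDS objective, the raw stress. The abstract does not say which criterion the authors use, so the choice of raw stress, the function name, and the example proximities are assumptions made for illustration only.

```python
import math

def raw_stress(points, proximities):
    """Sum over pairs i < j of (Euclidean distance - target proximity)^2."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            total += (d - proximities[i][j]) ** 2
    return total

# Proximities for n = 3 objects: the side lengths of a 3-4-5 right triangle.
delta = [[0, 3, 5],
         [3, 0, 4],
         [5, 4, 0]]

exact = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]  # d = 2 embedding that reproduces delta
line = [(0.0,), (3.0,), (5.0,)]               # d = 1 embedding that cannot

stress_exact = raw_stress(exact, delta)
stress_line = raw_stress(line, delta)
```

For fixed $n$, an MDS solution minimizes such a criterion over all configurations of $n$ points in $d$ dimensions. The search space therefore changes with $n$, which is the difficulty the reformulation in the abstract is meant to remove.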

Translated: 2024-02-08 17:46:38 Published: 2024-02-06

# PreGIP: Watermarking the Pretraining of Graph Neural Networks for Deep Intellectual Property Protection ( http://arxiv.org/abs/2402.04435v1 )

License: see the linked page

Enyan Dai, Minhua Lin, Suhang Wang

Pretraining of Graph Neural Networks (GNNs) has shown great power in facilitating various downstream tasks. As pretraining generally requires huge amounts of data and computational resources, pretrained GNNs are high-value intellectual property (IP) of their legitimate owner. However, adversaries may illegally copy and deploy pretrained GNN models for their own downstream tasks. Although initial efforts have been made to watermark GNN classifiers for IP protection, these methods require the target classification task for watermarking and thus are not applicable to self-supervised pretraining of GNN models. Hence, in this work, we propose a novel framework named PreGIP to watermark the pretraining of a GNN encoder for IP protection while maintaining the high quality of the embedding space. PreGIP incorporates a task-free watermarking loss to watermark the embedding space of the pretrained GNN encoder. A finetuning-resistant watermark injection is further deployed. Theoretical analysis and extensive experiments show the effectiveness of PreGIP in IP protection and in maintaining high performance on downstream tasks.

Translated: 2024-02-08 17:46:21 Published: 2024-02-06

# Quantitative Metrics for Benchmarking Medical Image Harmonization ( http://arxiv.org/abs/2402.04426v1 )

License: see the linked page

Abhijeet Parida, Zhifan Jiang, Roger J. Packer, Robert A. Avery, Syed M. Anwar, Marius G. Linguraru

Image harmonization is an important preprocessing strategy to address domain shifts arising from data acquired using different machines and scanning protocols in medical imaging. However, benchmarking the effectiveness of harmonization techniques has been a challenge due to the lack of widely available standardized datasets with ground truths. In this context, we propose three metrics for medical images during harmonization, none of which requires ground truths: two intensity harmonization metrics and one anatomy preservation metric. Through extensive studies on a dataset with available harmonization ground truth, we demonstrate that our metrics are correlated with established image quality assessment metrics. We show how these novel metrics may be applied to real-world scenarios where no harmonization ground truth exists. Additionally, we provide insights into different interpretations of the metric values, shedding light on their significance in the context of the harmonization process. As a result of our findings, we advocate for the adoption of these quantitative harmonization metrics as a standard for benchmarking the performance of image harmonization techniques.

Translated: 2024-02-08 17:46:05 Published: 2024-02-06

# Smart Pipe System for a Shipyard 4.0 ( http://arxiv.org/abs/2402.04423v1 )

License: see the linked page

Paula Fraga-Lamas, Diego Noceda-Davila, Tiago M. Fernández-Caramés, Manuel A. Díaz-Bouza and Miguel Vilar-Montesinos

As a result of the progressive implementation of the Industry 4.0 paradigm, many industries are experiencing a revolution that shipyards cannot ignore. The application of Industry 4.0 principles to shipyards is therefore leading to the creation of Shipyards 4.0. For this reason, Navantia, one of the 10 largest shipbuilders in the world, is updating its whole inner workings to keep up with the near-future challenges that a Shipyard 4.0 will have to face. Such challenges can be divided into three groups: the vertical integration of production systems, the horizontal integration of a new generation of value creation networks, and the re-engineering of the entire production chain, making changes that affect the entire life cycle of each piece of a ship. Pipes, which exist in huge numbers and varied typologies on a ship, are one of the key pieces, and their monitoring constitutes a prospective cyber-physical system. Their improved identification, traceability, and indoor location, from production and throughout their life, can enhance shipyard productivity and safety. In order to perform such tasks, this article first conducts a thorough analysis of the shipyard environment. From this analysis, the essential hardware and software technical requirements are determined. Next, the concept of a smart pipe is presented and defined as an object able to periodically transmit signals that enable enhanced services in a shipyard. In order to build a smart pipe system, different technologies are selected and evaluated, concluding that passive and active RFID are currently the most appropriate technologies to create it. Furthermore, some promising indoor positioning results obtained in a pipe workshop are presented, showing that multi-antenna algorithms and Kalman filtering can help to stabilize the Received Signal Strength (RSS) and improve the overall accuracy of the system.
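
As a rough illustration of how Kalman filtering can stabilize RSS readings, the following sketch applies a scalar Kalman filter with a random-walk model to a short series of values. The function name, the noise variances, and the readings are made up for the example and are not taken from the paper. The multi-antenna part is not covered.

```python
def kalman_smooth(readings, process_var=0.01, measurement_var=4.0):
    """Scalar Kalman filter for a slowly varying RSS level (random-walk model)."""
    estimate = readings[0]          # initialise with the first reading
    error_var = measurement_var     # initial uncertainty of the estimate
    smoothed = [estimate]
    for z in readings[1:]:
        error_var += process_var                          # predict step
        gain = error_var / (error_var + measurement_var)  # Kalman gain
        estimate += gain * (z - estimate)                 # correct with the new reading
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed

# Made-up RSS readings in dBm from a tag at a fixed position.
rss = [-60.0, -66.0, -57.0, -63.0, -59.0, -65.0, -58.0, -62.0]
smoothed = kalman_smooth(rss)
```

A larger `measurement_var` relative to `process_var` makes the filter trust new readings less, so the output is smoother but reacts more slowly when a tag actually moves.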

Translated: 2024-02-08 17:45:49 Published: 2024-02-06

# Studying Vulnerable Code Entities in R ( http://arxiv.org/abs/2402.04421v1 )

License: see the linked page

Zixiao Zhao, Millon Madhur Das, Fatemeh H. Fard

Pre-trained Code Language Models (Code-PLMs) have shown many advancements and achieved state-of-the-art results for many software engineering tasks in the past few years. These models mainly target popular programming languages such as Java and Python, leaving out many others, such as R. Though R has a wide community of developers and users, little is known about the applicability of Code-PLMs to R. In this preliminary study, we aim to investigate the vulnerability of Code-PLMs for code entities in R. For this purpose, we use an R dataset of code and comment pairs and then apply CodeAttack, a black-box attack model that uses the structure of code to generate adversarial code samples. We investigate how the model can attack different entities in R. This is the first step towards understanding the importance of R token types compared to those of popular programming languages (e.g., Java). We limit our study to code summarization. Our results show that the most vulnerable code entity is the identifier, followed by some syntax tokens specific to R. The results can shed light on the importance of token types and help in developing models for code summarization and method name prediction for the R language.

Translated: 2024-02-08 17:45:21 Published: 2024-02-06

# Measuring machine learning harms from stereotypes: requires understanding who is being harmed by which errors in what ways ( http://arxiv.org/abs/2402.04420v1 )

License: see the linked page

Angelina Wang and Xuechunzi Bai and Solon Barocas and Su Lin Blodgett

As machine learning applications proliferate, we need an understanding of their potential for harm. However, current fairness metrics are rarely grounded in human psychological experiences of harm. Drawing on the social psychology of stereotypes, we use a case study of gender stereotypes in image search to examine how people react to machine learning errors. First, we use survey studies to show that not all machine learning errors reflect stereotypes, nor are they all equally harmful. Then, in experimental studies, we randomly expose participants to stereotype-reinforcing, stereotype-violating, and stereotype-neutral machine learning errors. We find that stereotype-reinforcing errors induce more experientially (i.e., subjectively) harmful experiences, while producing minimal changes in cognitive beliefs, attitudes, or behaviors. This experiential harm impacts women more than men. However, certain stereotype-violating errors are more experientially harmful for men, potentially due to perceived threats to masculinity. We conclude that harm cannot be the sole guide in fairness mitigation, and propose a nuanced perspective depending on who is experiencing what harm and why.

Translated: 2024-02-08 17:45:04 Published: 2024-02-06

# What limits performance of weakly supervised deep learning for chest CT classification? ( http://arxiv.org/abs/2402.04419v1 )

License: see the linked page

Fakrul Islam Tushar, Vincent M. D'Anniballe, Geoffrey D. Rubin, Joseph Y. Lo

Weakly supervised learning with noisy data has drawn attention in the medical imaging community due to the scarcity of high-quality disease labels. However, little is known about the limitations of such weakly supervised learning and the effect of these constraints on disease classification performance. In this paper, we test the effects of such weak supervision by examining model tolerance for three conditions. First, we examined model tolerance for noisy data by incrementally increasing error in the labels within the training data. Second, we assessed the impact of dataset size by varying the amount of training data. Third, we compared performance differences between binary and multi-label classification. Results demonstrated that the model could endure up to 10% added label error before experiencing a decline in disease classification performance. Disease classification performance rose steadily for all disease classes as the amount of training data was increased, before plateauing at 75% of the training data. Last, the binary model outperformed the multi-label model in every disease category. However, such interpretations may be misleading, as the binary model was heavily influenced by co-occurring diseases and may not have learned the specific features of the disease in the image. In conclusion, this study may help the medical imaging community understand the benefits and risks of weak supervision with noisy labels. Such studies demonstrate the need to build diverse, large-scale datasets and to develop explainable and responsible AI.
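
The first experiment, adding a controlled amount of label error, could be set up along the following lines. This is a minimal sketch that assumes binary labels and uniformly random flips. The abstract does not describe the authors' exact noise procedure, and the function name and data are hypothetical.

```python
import random

def add_label_noise(labels, error_rate, seed=0):
    """Flip a fixed fraction of binary labels, chosen uniformly at random."""
    rng = random.Random(seed)                  # seeded for reproducibility
    n_flip = round(error_rate * len(labels))
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]

clean = [0, 1] * 50                            # 100 made-up binary disease labels
noisy = add_label_noise(clean, 0.10)           # 10% added label error
n_changed = sum(a != b for a, b in zip(clean, noisy))
```

Repeating training with `error_rate` set to increasing values and recording the test metric at each level gives the kind of tolerance curve the abstract refers to.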

Translated: 2024-02-08 17:44:44 Published: 2024-02-06

# Decentralized Blockchain-based Robust Multi-agent Multi-armed Bandit ( http://arxiv.org/abs/2402.04417v1 )

License: see the linked page

Mengfan Xu, Diego Klabjan

We study a robust multi-agent multi-armed bandit problem where multiple clients or participants are distributed on a fully decentralized blockchain, with the possibility of some being malicious. The rewards of arms are homogeneous among the clients, following time-invariant stochastic distributions that are revealed to the participants only when the system is secure enough. The system's objective is to efficiently ensure the cumulative rewards gained by the honest participants. To this end, and to the best of our knowledge, we are the first to incorporate advanced techniques from blockchains, as well as novel mechanisms, into the system to design optimal strategies for honest participants. This accommodates various malicious behaviors while maintaining participant privacy. More specifically, we randomly select a pool of validators who have access to all participants, design a brand-new consensus mechanism based on digital signatures for these validators, invent a UCB-based strategy that requires less information from participants through secure multi-party computation, and design the chain-participant interaction and an incentive mechanism to encourage participation. Notably, we are the first to prove theoretical guarantees for the proposed algorithms through regret analyses in the context of optimality in blockchains. Unlike existing work that integrates blockchains with learning problems such as federated learning, which mainly focuses on numerical optimality, we demonstrate that the regret of honest participants is upper bounded by $\log T$. This is consistent with the multi-agent multi-armed bandit problem without malicious participants and the robust multi-agent multi-armed bandit problem with purely Byzantine attacks.
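
For readers unfamiliar with UCB-based strategies, the following sketch shows plain single-agent UCB1 on Bernoulli arms. It is not the authors' algorithm: there is no blockchain, no validators, no secure multi-party computation, and no malicious participants here. The arm means and horizon are made-up values.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return pull counts and cumulative pseudo-regret."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # total reward per arm
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # pull each arm once to initialise
        else:
            # empirical mean plus an exploration bonus that shrinks with more pulls
            scores = [sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
                      for a in range(k)]
            arm = scores.index(max(scores))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - arm_means[arm]
    return counts, regret

counts, regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
```

The exploration bonus is what keeps the number of pulls of each suboptimal arm logarithmic in the horizon, which is the source of $\log T$-type regret bounds for this family of strategies.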

Translated: 2024-02-08 17:44:07 Published: 2024-02-06

# A Data Centric Approach for Unsupervised Domain Generalization via Retrieval from Web Scale Multimodal Data ( http://arxiv.org/abs/2402.04416v1 )

License: see the linked page

Christopher Liao, Theodoros Tsiligkaridis, Brian Kulis

Domain generalization (DG) is an important problem that learns a model that can generalize to unseen test domains by leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring source data in the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (UDG) problem, which uses a large task-agnostic unlabeled source dataset, such as LAION-2B, during finetuning. Our framework does not explicitly assume any relationship between the source dataset and the target task. Instead, it relies only on the premise that the source dataset can be efficiently searched in a joint vision-language space. For this multimodal UDG setting, we propose a novel method to build a small ($<$100K) subset of the source data in three simple steps: (1) diversified retrieval using label names as queries, (2) rank pseudo-labeling, and (3) clustering to find representative samples. To demonstrate the value of studying the multimodal UDG problem, we compare our results against state-of-the-art source-free DG and zero-shot (ZS) methods on their respective benchmarks and show up to 10% improvement in accuracy on 20 diverse target datasets. Additionally, our multi-stage dataset construction method achieves a 3% improvement on average over nearest-neighbor retrieval. Code is available at https://github.com/Chris210634/mudg

Translated: 2024-02-08 17:43:17 Published: 2024-02-06

# Non-Markovian quantum dynamics from symmetric measurements ( http://arxiv.org/abs/2402.04415v1 )

License: see the linked page

Katarzyna Siudzińska

We use symmetric measurement operators to construct quantum channels that provide a further generalization of generalized Pauli channels. The resulting maps are bistochastic but, in general, no longer mixed unitary. We analyze their important properties, such as complete positivity and the ability to break quantum entanglement. In the main part, we consider the corresponding open quantum system dynamics with time-local generators. From the divisibility properties of the dynamical maps, we derive sufficient Markovianity and non-Markovianity conditions. As instructive examples, we present generators of P-divisible generalized Pauli dynamical maps that allow for more negativity in the decoherence rates.

Translated: 2024-02-08 17:41:45 Published: 2024-02-06

# The wave function of a photoelectron near the center of a quantum vortex ( http://arxiv.org/abs/2402.04414v1 )

License: see the linked page

N. V. Larionov, Yu. L. Kolesnikov

In a two-dimensional approximation, the probability density and current of a photoelectron near the location of a quantum vortex are theoretically investigated. The wave function in the momentum representation, which we found earlier, is simplified near its zero, which corresponds to the center of the vortex. This allows us to obtain a simple analytical expression for it, which has the structure of a Gaussian wave packet and contains basic information about the vortex. This expression is used to analyze the temporal evolution of a quantum vortex in both momentum and coordinate space. The effect of the intensity of an ionizing ultrashort laser pulse on the shape of a quantum vortex is also investigated.

Translated: 2024-02-08 17:41:34 Published: 2024-02-06

# The VampPrior Mixture Model ( http://arxiv.org/abs/2402.04412v1 )

License: see the linked page

Andrew Stirn and David A. Knowles

Current clustering priors for deep latent variable models (DLVMs) require defining the number of clusters a priori and are susceptible to poor initializations. Addressing these deficiencies could greatly benefit deep learning-based scRNA-seq analysis by performing integration and clustering simultaneously. We adapt the VampPrior (Tomczak & Welling, 2018) into a Dirichlet process Gaussian mixture model, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. We propose an inference procedure that alternates between variational inference and Empirical Bayes to cleanly distinguish variational and prior parameters. Using the VMM in a Variational Autoencoder attains highly competitive clustering performance on benchmark datasets. Augmenting scVI (Lopez et al., 2018), a popular scRNA-seq integration method, with the VMM significantly improves its performance and automatically arranges cells into biologically meaningful clusters.

Translated: 2024-02-08 17:41:17 Published: 2024-02-06

# Detecting Mode Collapse in Language Models via Narration ( http://arxiv.org/abs/2402.04477v1 )

License: see the linked page

Sil Hamilton

No two authors write alike. Personal flourishes invoked in written narratives, from lexicon to rhetorical devices, imply a particular author: what literary theorists label the implied or virtual author, distinct from the real author or narrator of a text. Early large language models trained on unfiltered training sets drawn from a variety of discordant sources yielded incoherent personalities, which was problematic for conversational tasks but proved useful for sampling literature from multiple perspectives. Successes in alignment research in recent years have allowed researchers to impose subjectively consistent personae on language models via instruction tuning and reinforcement learning from human feedback (RLHF), but whether aligned models retain the ability to model an arbitrary virtual author has received little scrutiny. By studying 4,374 stories sampled from three OpenAI language models, we show that successive versions of GPT-3 suffer from increasing degrees of "mode collapse", whereby overfitting the model during alignment constrains it from generalizing over authorship: models suffering from mode collapse become unable to assume a multiplicity of perspectives. Our method and results are significant for researchers seeking to employ language models in sociological simulations.

Translated: 2024-02-08 17:32:58 Published: 2024-02-06

# Dual-View Visual Contextualization for Web Navigation ( http://arxiv.org/abs/2402.04476v1 )

License: see the linked page

Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao

Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, making it hard to select the right (sequence of) actions. In this paper, we propose to contextualize HTML elements through their "dual views" in webpage screenshots: each HTML element has its corresponding bounding box and visual content in the screenshot. We build on the insight that web developers tend to arrange task-related elements nearby on webpages to enhance user experiences, and propose to contextualize each element with its neighbor elements, using both textual and visual features. The resulting representations of HTML elements are more informative for the agent when taking actions. We validate our method on the recently released Mind2Web dataset, which features diverse navigation domains and tasks on real-world websites. Our method consistently outperforms the baseline in all scenarios, including cross-task, cross-website, and cross-domain ones.
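
The notion of an element's neighbors in the screenshot can be illustrated with a small geometric sketch. Everything here is assumed for illustration: the element ids, the `(left, top, width, height)` box format, and the use of center-to-center distance. The abstract does not specify how neighbors are selected or how the textual and visual features are encoded.

```python
import math

def center(box):
    """Center of a bounding box given as (left, top, width, height)."""
    left, top, width, height = box
    return (left + width / 2.0, top + height / 2.0)

def nearest_neighbors(elements, target_id, k=2):
    """IDs of the k elements whose box centers are closest to the target's."""
    target = center(elements[target_id])
    others = [(math.dist(target, center(box)), eid)
              for eid, box in elements.items() if eid != target_id]
    return [eid for _, eid in sorted(others)[:k]]

# Hypothetical elements from one screenshot: id -> (left, top, width, height) in pixels.
elements = {
    "search_input": (100, 50, 300, 40),
    "search_button": (410, 50, 80, 40),
    "date_picker": (100, 100, 300, 40),
    "footer_link": (100, 900, 120, 20),
}
context = nearest_neighbors(elements, "search_input")
```

In a full system, the text and image crops of the elements in `context` would be fed to the model together with those of the target element. That part is outside the scope of this sketch.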

Translated: 2024-02-08 17:32:31 Published: 2024-02-06

# Reductive Quantum Phase Estimation ( http://arxiv.org/abs/2402.04471v1 )

License: see the linked page

Nicholas J.C. Papadopoulos, Jarrod T. Reilly, John Drew Wilson, Murray J. Holland

Estimating a quantum phase is a necessary task in a wide range of fields of quantum science. To accomplish this task, two well-known methods have been developed in distinct contexts, namely, Ramsey interferometry (RI) in atomic and molecular physics and quantum phase estimation (QPE) in quantum computing. We demonstrate that these canonical examples are instances of a larger class of phase estimation protocols, which we call reductive quantum phase estimation (RQPE) circuits. Here we present an explicit algorithm that allows one to create an RQPE circuit. This circuit distinguishes an arbitrary set of phases with fewer qubits and unitary applications, thereby solving a general class of quantum hypothesis testing problems to which RI and QPE belong. We further demonstrate a trade-off between measurement precision and phase distinguishability, which allows one to tune the circuit to be optimal for a specific application.
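
As background for the two canonical methods, the sketch below evaluates the standard outcome distribution of textbook QPE for a known eigenphase. It does not implement the RQPE construction, which the abstract does not detail. The function name and the phase convention (phase measured in turns, between 0 and 1) are choices made for this example.

```python
import cmath
import math

def qpe_distribution(phase, n_qubits):
    """Outcome probabilities of textbook QPE for an eigenphase in turns (0 <= phase < 1)."""
    size = 2 ** n_qubits
    probs = []
    for k in range(size):
        # amplitude of outcome k after the inverse quantum Fourier transform
        amp = sum(cmath.exp(2j * math.pi * j * (phase - k / size))
                  for j in range(size)) / size
        probs.append(abs(amp) ** 2)
    return probs

exact = qpe_distribution(0.25, 2)   # 1/4 is exactly representable on 2 qubits
fringe = qpe_distribution(0.1, 1)   # one qubit: a Ramsey-type fringe
```

With a single register qubit the probability of outcome 0 reduces to $\cos^2(\pi\varphi)$, the familiar Ramsey fringe, which gives a small-scale view of why the two methods can be treated as members of one family.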

翻訳日:2024-02-08 17:32:11 公開日:2024-02-06

# 人間ではなく、ロールプレイングツールとしてのAI言語モデル

AI language models as role-playing tools, not human participants ( http://arxiv.org/abs/2402.04470v1 )

ライセンス: Link先を確認

Zhicheng Lin

(参考訳) AIの進歩は、人間の参加者の代替として言語モデルの誤用を招いている。平均的な人間の心を垣間見るものとして、これらの統計アルゴリズムを根本的に誤認識し、言語モデルは柔軟なシミュレーションツールとして受け入れるべきであり、人間の特性自体を持たずに多様な振る舞いを模倣できると主張している。

Advances in AI invite misuse of language models as replacements for human participants. We argue that treating their responses as glimpses into an average human mind fundamentally mischaracterizes these statistical algorithms and that language models should be embraced as flexible simulation tools, able to mimic diverse behaviors without possessing human traits themselves.

翻訳日:2024-02-08 17:31:55 公開日:2024-02-06

# DySLIM:カオスシステムのための不変測度による動的安定学習

DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems ( http://arxiv.org/abs/2402.04467v1 )

ライセンス: Link先を確認

Yair Schiff, Zhong Yi Wan, Jeffrey B. Parker, Stephan Hoyer, Volodymyr Kuleshov, Fei Sha, Leonardo Zepeda-Núñez

(参考訳) 散逸したカオスシステムからの学習ダイナミクスは、その固有の不安定性のために、学習ダイナミクスの誤りを指数関数的に増幅するポジティブなリアプノフ指数によって形式化されるため、悪名高い。しかし、これらの系の多くはエルゴード性や引力を示す:コンパクトで非常に複雑な多様体で、軌跡は有限時間で収束し、不変測度、すなわち力学の作用の下で不変な確率分布をサポートし、システムの長期的な統計的挙動を規定する。本研究では, トラジェクタ間の不適合のみを対象とする一般的な手法と対照的に, トラジェクタの長さが増加するにつれて発散してしまう場合が多く, 不変測度の学習とダイナミクスを対象とする新しい枠組みを提案する。我々は,既存の学習目標と併用可能な効率的な目標を提示するために,このフレームワークを用いて提案する。 invariant measures(dyslim)目標による動的安定学習は、他の学習目標と比較して、ポイントアラウンドトラッキングと長期的な統計精度を達成するモデルトレーニングを可能にする。スケーラブルな正規化項で分布をターゲットとすることで、気候や気候モデルのようなゆっくりと変化する分布を示すより複雑なシステムにこのアプローチを拡張できることを期待する。

Learning dynamics from dissipative chaotic systems is notoriously difficult due to their inherent instability, as formalized by their positive Lyapunov exponents, which exponentially amplify errors in the learned dynamics. However, many of these systems exhibit ergodicity and an attractor: a compact and highly complex manifold, to which trajectories converge in finite-time, that supports an invariant measure, i.e., a probability distribution that is invariant under the action of the dynamics, which dictates the long-term statistical behavior of the system. In this work, we leverage this structure to propose a new framework that targets learning the invariant measure as well as the dynamics, in contrast with typical methods that only target the misfit between trajectories, which often leads to divergence as the trajectories' length increases. We use our framework to propose a tractable and sample efficient objective that can be used with any existing learning objectives. Our Dynamics Stable Learning by Invariant Measures (DySLIM) objective enables model training that achieves better point-wise tracking and long-term statistical accuracy relative to other learning objectives. By targeting the distribution with a scalable regularization term, we hope that this approach can be extended to more complex systems exhibiting slowly-variant distributions, such as weather and climate models.
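
The contrast between unstable trajectories and stable long-term statistics can be illustrated on a toy system. The sketch below is not the DySLIM objective and does not use the systems studied in the paper; it assumes the logistic map with r = 4 as a stand-in chaotic system, and `lyapunov_and_mean` is a hypothetical helper name. It estimates the Lyapunov exponent (the average log of the local stretching rate) together with the time average of the state, which is a statistic of the invariant measure.

```python
import math


def logistic(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def lyapunov_and_mean(x0, steps=10000, burn_in=100, r=4.0):
    """Estimate the Lyapunov exponent and the time-averaged state.

    The exponent is the orbit average of log|f'(x)| with f'(x) = r * (1 - 2x).
    """
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = logistic(x, r)
    log_sum = 0.0
    x_sum = 0.0
    for _ in range(steps):
        stretch = abs(r * (1.0 - 2.0 * x))
        log_sum += math.log(max(stretch, 1e-300))  # guard against log(0)
        x_sum += x
        x = logistic(x, r)
    return log_sum / steps, x_sum / steps
```

For r = 4 the exact exponent is ln 2, so a small initial error roughly doubles at every step, while the time average stays close to the mean of the invariant density (0.5) for almost any starting point. That is the kind of long-term statistic that remains learnable when point-wise tracking is not.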

翻訳日:2024-02-08 17:31:48 公開日:2024-02-06

# NVIDIA Holoscanにおける医療AIシステムのための決定論的エンドツーエンドレイテンシ

Towards Deterministic End-to-end Latency for Medical AI Systems in NVIDIA Holoscan ( http://arxiv.org/abs/2402.04466v1 )

ライセンス: Link先を確認

Soham Sinha, Shekhar Dwivedi, Mahdi Azizian

(参考訳) 医療機器へのAIとML技術の導入は、医療診断と治療に革命をもたらした。医療機器メーカーは、単一のプラットフォームに複数のアプリケーションを統合することで、AIとMLがもたらすメリットを最大化することを熱望している。しかし、独自の視覚化コンポーネントを備えた複数のAIアプリケーションの同時実行は、主にGPUリソースの競合による予測不可能なエンドツーエンドレイテンシにつながる。これを軽減するため、製造業者は通常、異なるAIアプリケーションのための別々のワークステーションをデプロイし、財務、エネルギー、メンテナンスコストを増大させる。本稿では、センサーデータと画像をストリーミングするリアルタイムAIシステムであるNVIDIAのHoloscanプラットフォームにおけるこれらの課題に対処する。計算タスクとグラフィックスタスクの両方を含む異種GPUワークロードに最適化されたシステム設計を提案する。我々の設計では、CUDA MPSを計算ワークロードの空間分割に利用し、計算処理とグラフィックス処理を別々のGPUに分離する。実世界の医療機器アプリケーションを用いた経験的評価により,様々な終末遅延決定指標の大幅な性能向上を示す。例えば、提案した設計では、単一GPUベースラインと比較して、最大レイテンシを21～30%削減し、最大5つの同時内視鏡ツールトラッキングAIアプリケーションに対して、レイテンシ分散フラットネスを17～25%改善している。デフォルトのマルチGPUセットアップに対して,GPU利用率を42%向上させることで,最大で6つの並列アプリケーションで最大遅延を35%削減する。本稿では、並列および異種gpuワークロードのパフォーマンス予測が重要な要件である医療システムを含むエッジコンピューティング領域におけるaiアプリケーションについて、明確な設計知見を提供する。

The introduction of AI and ML technologies into medical devices has revolutionized healthcare diagnostics and treatments. Medical device manufacturers are keen to maximize the advantages afforded by AI and ML by consolidating multiple applications onto a single platform. However, concurrent execution of several AI applications, each with its own visualization components, leads to unpredictable end-to-end latency, primarily due to GPU resource contentions. To mitigate this, manufacturers typically deploy separate workstations for distinct AI applications, thereby increasing financial, energy, and maintenance costs. This paper addresses these challenges within the context of NVIDIA's Holoscan platform, a real-time AI system for streaming sensor data and images. We propose a system design optimized for heterogeneous GPU workloads, encompassing both compute and graphics tasks. Our design leverages CUDA MPS for spatial partitioning of compute workloads and isolates compute and graphics processing onto separate GPUs. We demonstrate significant performance improvements across various end-to-end latency determinism metrics through empirical evaluation with real-world Holoscan medical device applications. For instance, the proposed design reduces maximum latency by 21-30% and improves latency distribution flatness by 17-25% for up to five concurrent endoscopy tool tracking AI applications, compared to a single-GPU baseline. Against a default multi-GPU setup, our optimizations decrease maximum latency by 35% for up to six concurrent applications by improving GPU utilization by 42%. This paper provides clear design insights for AI applications in the edge-computing domain including medical systems, where performance predictability of concurrent and heterogeneous GPU workloads is a critical requirement.

翻訳日:2024-02-08 17:31:24 公開日:2024-02-06

# BAdaCost: コストを伴うマルチクラスのブースティング

BAdaCost: Multi-class Boosting with Costs ( http://arxiv.org/abs/2402.04465v1 )

ライセンス: Link先を確認

Antonio Fernández-Baldera, José M. Buenaposada, Luis Baumela

(参考訳) マルチクラスコスト感性分類アルゴリズムであるBAdaCostを提案する。コストに敏感な複数クラスの弱い学習者を組み合わせて、Boostingフレームワーク内で強力な分類規則を得る。このアルゴリズムを導出するために,AdaBoost,SAMME,コストセンシティブなAdaBoost,PIBoostなどの様々な分類アルゴリズムで最適化された損失を一般化する,コストセンシティブなマルチクラス指数損失であるCMELを導入する。それゆえ、共通の理論的枠組みの下でそれらを統一する。実験では, BAdaCostが従来のマルチクラスコスト感性アプローチと比較して, 性能の大幅な向上を実証した。非対称多クラス分類における提案アルゴリズムの利点は、実用的多視点顔と車検出問題でも評価されている。

We present BAdaCost, a multi-class cost-sensitive classification algorithm. It combines a set of cost-sensitive multi-class weak learners to obtain a strong classification rule within the Boosting framework. To derive the algorithm, we introduce CMEL, a Cost-sensitive Multi-class Exponential Loss that generalizes the losses optimized in various classification algorithms such as AdaBoost, SAMME, Cost-sensitive AdaBoost and PIBoost, thereby unifying them under a common theoretical framework. In our experiments, we show that BAdaCost achieves significant gains in performance when compared to previous multi-class cost-sensitive approaches. The advantages of the proposed algorithm in asymmetric multi-class classification are also evaluated in practical multi-view face and car detection problems.
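
To make the notion of asymmetric multi-class costs concrete, here is a minimal sketch of a generic cost-sensitive decision rule. It is not the BAdaCost boosting procedure or the CMEL loss; it only shows how a cost matrix can change the predicted class. The function name and the convention `cost[i][j]` (cost of predicting class j when the true class is i) are assumptions made for this illustration.

```python
def min_expected_cost_class(probs, cost):
    """Return the class with the lowest expected cost and all expected costs.

    probs[i] is the estimated probability of true class i.
    cost[i][j] is the cost of predicting class j when the true class is i.
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    best = min(range(n), key=expected.__getitem__)
    return best, expected
```

With uniform 0/1 costs this rule reduces to picking the most probable class. Once one kind of error is made much more expensive, the decision can move away from the most probable class, which is the behaviour a cost-sensitive learner is meant to capture.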

翻訳日:2024-02-08 17:30:58 公開日:2024-02-06

# 人工知能の難解な10の課題

Ten Hard Problems in Artificial Intelligence We Must Get Right ( http://arxiv.org/abs/2402.04464v1 )

ライセンス: Link先を確認

Gavin Leech and Simson Garfinkel and Misha Yagudin and Alexander Briand and Aleksandr Zhuravlev

(参考訳) We explore the AI2050 "hard problems" that block the promise of AI and cause AI risks: (1) developing general capabilities of the systems; (2) assuring the performance of AI systems and their training processes; (3) aligning system goals with human goals; (4) enabling great applications of AI in real life; (5) addressing economic disruptions; (6) ensuring the participation of all; (7) at the same time ensuring socially responsible deployment; (8) addressing any geopolitical disruptions that AI causes; (9) promoting sound governance of the technology; and (10) managing the philosophical disruptions for humans living in the age of AI. それぞれの問題について、その領域を概説し、最近の重要な作業を特定し、今後の方向性を提案する。 (注:2023年1月までの文献をレビューする。)

We explore the AI2050 "hard problems" that block the promise of AI and cause AI risks: (1) developing general capabilities of the systems; (2) assuring the performance of AI systems and their training processes; (3) aligning system goals with human goals; (4) enabling great applications of AI in real life; (5) addressing economic disruptions; (6) ensuring the participation of all; (7) at the same time ensuring socially responsible deployment; (8) addressing any geopolitical disruptions that AI causes; (9) promoting sound governance of the technology; and (10) managing the philosophical disruptions for humans living in the age of AI. For each problem, we outline the area, identify significant recent work, and suggest ways forward. [Note: this paper reviews literature through January 2023.]

翻訳日:2024-02-08 17:30:43 公開日:2024-02-06

# リコメンダシステムにおけるAutoMLの可能性

The Potential of AutoML for Recommender Systems ( http://arxiv.org/abs/2402.04453v1 )

ライセンス: Link先を確認

Tobias Vente, Joeran Beel

(参考訳) Automated Machine Learning (AutoML)は、モデル圧縮、機械翻訳、コンピュータビジョンを含む機械学習(ML)の非常に高度な応用である。 Recommender Systems (RecSys) はMLの応用と見なすことができる。しかし、AutoMLはRecSysコミュニティではほとんど注目を集めていない。 AutoML技術を採用するのは、比較的単純なAutomated Recommender Systems(AutoRecSys)ライブラリのみである。しかし、これらのライブラリは学生プロジェクトに基づいており、automlライブラリの機能や完全な開発を提供していない。私たちは、推奨システムを実装したい経験の浅いユーザのシナリオでAutoMLライブラリがどのように機能するかを判断することにしました。我々は、平均予測基準を含む15のライブラリの60のAutoML、AutoRecSys、ML、RecSysアルゴリズムの予測性能を14の明示的なフィードバックRecSysデータセットで比較した。経験の浅いユーザの視点をシミュレートするために,アルゴリズムをデフォルトのハイパーパラメータで評価した。私たちはAutoMLとAutoRecSysライブラリが最善であることがわかった。 AutoMLライブラリは14のデータセットのうち6つ(43%)で最高に動作したが、必ずしも同じAutoMLライブラリが最高に動作していたわけではない。シングルベストのライブラリはAutoRecSysライブラリのAuto-Surpriseで、5つのデータセット(36%)で最高のパフォーマンスを示している。 3つのデータセット(21%)では、AutoMLライブラリはパフォーマンスが悪く、デフォルトパラメータを持つRecSysライブラリがベストだった。しかし、データセット毎に上位5つの配置の50%を取得すると、recsysアルゴリズムは平均してautomlに遅れる。 MLアルゴリズムは一般的に最悪だった。

Automated Machine Learning (AutoML) has greatly advanced applications of Machine Learning (ML) including model compression, machine translation, and computer vision. Recommender Systems (RecSys) can be seen as an application of ML. Yet AutoML has received little attention in the RecSys community, nor has RecSys received notable attention in the AutoML community. Only a few, relatively simple Automated Recommender Systems (AutoRecSys) libraries exist that adopt AutoML techniques. However, these libraries are based on student projects and do not offer the features and thorough development of AutoML libraries. We set out to determine how AutoML libraries perform in the scenario of an inexperienced user who wants to implement a recommender system. We compared the predictive performance of 60 AutoML, AutoRecSys, ML, and RecSys algorithms from 15 libraries, including a mean predictor baseline, on 14 explicit feedback RecSys datasets. To simulate the perspective of an inexperienced user, the algorithms were evaluated with default hyperparameters. We found that AutoML and AutoRecSys libraries performed best. AutoML libraries performed best for six of the 14 datasets (43%), but it was not always the same AutoML library performing best. The single-best library was the AutoRecSys library Auto-Surprise, which performed best on five datasets (36%). On three datasets (21%), AutoML libraries performed poorly, and RecSys libraries with default parameters performed best. However, although RecSys algorithms obtain 50% of all placements in the top five per dataset, they fall behind AutoML on average. ML algorithms generally performed the worst.

翻訳日:2024-02-08 17:30:31 公開日:2024-02-06

# 質量細胞計測のための細胞セグメンテーションモデルの限界を押し上げる

Pushing the limits of cell segmentation models for imaging mass cytometry ( http://arxiv.org/abs/2402.04446v1 )

ライセンス: Link先を確認

Kimberley M. Bird, Xujiong Ye, Alan M. Race, James M. Brown

(参考訳) imaging mass cytometry (imc) は比較的新しい生体組織を細胞内分解能でイメージングする技術である。近年、学習に基づくセグメンテーション手法により、細胞型と形態の正確な定量化が可能になっているが、一般的には、完全に注釈付き基底真理(gt)ラベルを持つ大規模データセットに依存している。本稿では,不完全ラベルが学習ベースセグメンテーションモデルに及ぼす影響について検討し,これらのモデルの組織タイプへの一般化性を評価する。以上の結果から,GTマスクから50%の細胞アノテーションを除去すると,DSCスコアは0.874に低下する(GTマスクで訓練したモデルによる0.889から)。これは、アノテーションの時間がパフォーマンスに悪影響を及ぼすことなく、少なくとも半分削減できることを意味する。さらに,不完全ラベルを用いた単発モデルの訓練では,多発組織型に比べてdscが0.031減少し,セグメンテーションの質的差異が無視できる。さらに、最悪のパフォーマンスモデル(5%の細胞アノテーションを含む)をブートストラッピングすると、10倍のDSCスコアが0.720から0.829に向上する。これらの知見は、トレーニング中に複数のIMC組織タイプの必要性を排除し、また、ラベルがほとんどないモデルが自分自身で改善する可能性も提供することを含む、同等のセグメンテーションモデルを作成するプロセスに、時間と作業が費やされる可能性があることを示唆している。ソースコードはgithubにある。 https://github.com/kimberley/isbi2024。

Imaging mass cytometry (IMC) is a relatively new technique for imaging biological tissue at subcellular resolution. In recent years, learning-based segmentation methods have enabled precise quantification of cell type and morphology, but typically rely on large datasets with fully annotated ground truth (GT) labels. This paper explores the effects of imperfect labels on learning-based segmentation models and evaluates the generalisability of these models to different tissue types. Our results show that removing 50% of cell annotations from GT masks only reduces the dice similarity coefficient (DSC) score to 0.874 (from 0.889 achieved by a model trained on fully annotated GT masks). This implies that annotation time can in fact be reduced by at least half without detrimentally affecting performance. Furthermore, training our single-tissue model on imperfect labels only decreases DSC by 0.031 on an unseen tissue type compared to its multi-tissue counterpart, with negligible qualitative differences in segmentation. Additionally, bootstrapping the worst-performing model (with 5% of cell annotations) a total of ten times improves its original DSC score of 0.720 to 0.829. These findings imply that less time and work can be put into the process of producing comparable segmentation models; this includes eliminating the need for multiple IMC tissue types during training, whilst also providing the potential for models with very few labels to improve on themselves. Source code is available on GitHub: https://github.com/kimberley/ISBI2024.
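
The metric reported throughout this abstract, the Dice similarity coefficient (DSC), is twice the overlap of two masks divided by the sum of their sizes. A minimal sketch follows, assuming the masks are given as flat sequences of 0/1 values. The function name and the convention for two empty masks are assumptions for this illustration and are not taken from the authors' repository.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two flat binary masks.

    DSC = 2 * |pred AND truth| / (|pred| + |truth|).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if size_sum == 0 else 2.0 * overlap / size_sum
```

A drop from 0.889 to 0.874, as quoted above, therefore corresponds to a small loss of overlap relative to the combined mask size.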

翻訳日:2024-02-08 17:30:00 公開日:2024-02-06

# 医師-AIコンサルテーションのワンショット分類における埋め込みの評価

Evaluating Embeddings for One-Shot Classification of Doctor-AI Consultations ( http://arxiv.org/abs/2402.04442v1 )

ライセンス: Link先を確認

Olumide Ebenezer Ojo, Olaronke Oluwayemisi Adebanji, Alexander Gelbukh, Hiram Calvo and Anna Feldman

(参考訳) 医療提供者と患者との効果的なコミュニケーションは、高品質な患者医療の提供に不可欠である。本研究では,医療相談における医師書きとAI生成のテキストを,最先端の埋め込みとワンショット分類システムを用いてどのように分類するかを検討する。 bag-of-words, character n-grams, word2vec, glove, fasttext, gpt2 embeddedsなどの埋め込みを解析することにより,ワンショット分類システムが医療相談の中で意味情報を取得する方法を検討する。その結果、埋め込みはテキストからセマンティックな特徴を信頼性と適応性でキャプチャできることがわかった。全体として、Word2Vec、GloVe、および character n-grams の埋め込みは良好に動作し、このタスクをターゲットにしたモデリングに適していることを示している。 GPT2埋め込みも顕著な性能を示しており、このタスクに合わせたモデルにも適していることを示している。当社の機械学習アーキテクチャは、トレーニングデータが少ない場合の健康会話の質を大幅に向上させ、患者と医療提供者間のコミュニケーションを改善しました。

Effective communication between healthcare providers and patients is crucial to providing high-quality patient care. In this work, we investigate how Doctor-written and AI-generated texts in healthcare consultations can be classified using state-of-the-art embeddings and one-shot classification systems. By analyzing embeddings such as bag-of-words, character n-grams, Word2Vec, GloVe, fastText, and GPT2 embeddings, we examine how well our one-shot classification systems capture semantic information within medical consultations. Results show that the embeddings are capable of capturing semantic features from text in a reliable and adaptable manner. Overall, Word2Vec, GloVe and Character n-grams embeddings performed well, indicating their suitability for modeling targeted to this task. GPT2 embedding also shows notable performance, indicating its suitability for models tailored to this task as well. Our machine learning architectures significantly improved the quality of health conversations when training data are scarce, improving communication between patients and healthcare providers.

翻訳日:2024-02-08 17:29:35 公開日:2024-02-06

# 全相関を用いた高次ニューラルネットワークノード相互作用の探索

Exploring higher-order neural network node interactions with total correlation ( http://arxiv.org/abs/2402.04440v1 )

ライセンス: Link先を確認

Thomas Kerby, Teresa White, Kevin Moon

(参考訳) 生態システム、コラボレーション、人間の脳などの領域では、変数は複雑な方法で相互作用する。しかし、高次変数相互作用(HOI)を正確に特徴付けることは、データ間でHOIが変化するとさらに悪化する難しい問題である。そこで本研究では,データ多様体に近接してデータポイントをクラスタリングし,局所的スケールでhoisをキャプチャする新しい手法corexを提案する。次に、全相関と呼ばれる相互情報の多変量バージョンを用いて、各クラスタ内のデータの潜在因子表現を構築し、局所的なHOIを学習する。我々はLocal CorExを用いて、合成および実世界のデータ中のHOIを探索し、データ構造に関する隠れた洞察を抽出する。最後に、トレーニングニューラルネットワークの内部動作の探索と解釈にLocal CorExが適していることを示します。

In domains such as ecological systems, collaborations, and the human brain, the variables interact in complex ways. Yet accurately characterizing higher-order variable interactions (HOIs) is a difficult problem that is further exacerbated when the HOIs change across the data. To solve this problem, we propose a new method called Local Correlation Explanation (CorEx) to capture HOIs at a local scale by first clustering data points based on their proximity on the data manifold. We then use a multivariate version of the mutual information, called the total correlation, to construct a latent factor representation of the data within each cluster to learn the local HOIs. We use Local CorEx to explore HOIs in synthetic and real-world data to extract hidden insights about the data structure. Lastly, we demonstrate Local CorEx's suitability to explore and interpret the inner workings of trained neural networks.
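
The total correlation mentioned above is the sum of the marginal entropies minus the joint entropy. The sketch below computes it from discrete samples with plug-in entropy estimates. It is not the Local CorEx method (no clustering and no latent factors), and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def entropy_bits(items):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable items."""
    counts = Counter(items)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def total_correlation_bits(samples):
    """Total correlation TC = sum_i H(X_i) - H(X_1, ..., X_d).

    samples is a list of equal-length tuples, one tuple per observation.
    """
    columns = list(zip(*samples))
    return sum(entropy_bits(col) for col in columns) - entropy_bits(samples)
```

A three-variable XOR is a compact example of a higher-order interaction: every pair of variables is independent, so each pairwise total correlation is zero, yet the three variables together carry one bit of total correlation.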

翻訳日:2024-02-08 17:29:14 公開日:2024-02-06

# Degenerate Clifford Algebrasにおける知識グラフの埋め込み

Embedding Knowledge Graphs in Degenerate Clifford Algebras ( http://arxiv.org/abs/2402.04870v1 )

ライセンス: Link先を確認

Louis Mozart Kamdem, Caglar Demir and Axel-Cyrille Ngonga

(参考訳) クリフォード代数は実数、複素数、四元数の自然な一般化である。これまでのところ、cl_{p,q}$(つまり、零基ベクトルを持たない代数)という形のクリフォード代数のみが知識グラフ埋め込みの文脈で研究されてきた。そこで本研究では,nilpotency index が 2 である nilpotent base vector について考察する。これらの空間において、$Cl_{p,q,r}$ は双対数に基づくアプローチ(これは $Cl_{p,q}$ でモデル化できない)を一般化し、実体埋め込みの現実部分と複素部分の間の高次相互作用が存在しないことから生じるパターンを捉えることができる。パラメータの発見には$p$,$q$,$r$の2つの新しいモデルを設計する。最初のモデルはgreedy検索を使用して$p$、$q$、$r$を最適化する。 2つ目は、ニューラルネットワークを用いて計算された入力知識グラフの埋め込みに基づいて$(p, q,r)$を予測する。 7つのベンチマークデータセットによる評価結果から, 零ベクトルが埋め込みの捕集に有効であることが示唆された。我々の技術との比較は、検証データで達成するmrを全てのデータセットの他のアプローチよりも一般化していることを示唆している。また、greedy検索は$p$、$q$、$r$の値が最適に近い値を見つけるのに十分であることを示す。

Clifford algebras are a natural generalization of the real numbers, the complex numbers, and the quaternions. So far, solely Clifford algebras of the form $Cl_{p,q}$ (i.e., algebras without nilpotent base vectors) have been studied in the context of knowledge graph embeddings. We propose to consider nilpotent base vectors with a nilpotency index of two. The resulting spaces, denoted $Cl_{p,q,r}$, allow generalizing over approaches based on dual numbers (which cannot be modelled using $Cl_{p,q}$) and capturing patterns that emanate from the absence of higher-order interactions between real and complex parts of entity embeddings. We design two new models for the discovery of the parameters $p$, $q$, and $r$. The first model uses a greedy search to optimize $p$, $q$, and $r$. The second predicts $(p, q, r)$ based on an embedding of the input knowledge graph computed using neural networks. The results of our evaluation on seven benchmark datasets suggest that nilpotent vectors can help capture embeddings better. Our comparison against the state of the art suggests that our approach generalizes better than other approaches on all datasets w.r.t. the MRR it achieves on validation data. We also show that a greedy search suffices to discover values of $p$, $q$ and $r$ that are close to optimal.
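
The simplest algebra with a nilpotent base vector of index two is the dual numbers, which correspond to $Cl_{0,0,1}$: a single generator $\epsilon$ with $\epsilon^2 = 0$. The sketch below shows only this arithmetic, not the embedding models proposed in the paper; the class name `Dual` and its fields are assumptions for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A dual number real + eps * e, where the base vector e satisfies e * e = 0."""

    real: float
    eps: float = 0.0

    def __add__(self, other):
        return Dual(self.real + other.real, self.eps + other.eps)

    def __mul__(self, other):
        # (a + b e)(c + d e) = ac + (ad + bc) e, because the e * e term vanishes.
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)
```

Because the product of two $\epsilon$ components vanishes, the product never contains an interaction between the two nilpotent parts. A non-degenerate $Cl_{p,q}$ cannot express this, since every one of its base vectors squares to $+1$ or $-1$.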

翻訳日:2024-02-08 15:17:44 公開日:2024-02-06

# 推定リーンとデータ適応予測

Assumption-lean and Data-adaptive Post-Prediction Inference ( http://arxiv.org/abs/2311.14220v3 )

ライセンス: Link先を確認

Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, and Qiongshi Lu

(参考訳) 現代の科学研究が直面する主な課題は金本位制のデータの入手が限られていることであり、費用と労力がかかる。機械学習(ML)の急速な発展により、科学者は容易に得られる共変量でこれらの金標準結果を予測するためにMLアルゴリズムに依存してきた。しかし、これらの予測結果は、予測手順によってもたらされた不正確さや不均質性を無視して、後続の統計分析で直接使用されることが多い。これはおそらく偽陽性の発見と無効な科学的結論をもたらす。本研究では、ML予測結果に基づいて、有効かつ強力な推論を可能にする仮定型およびデータ適応型ポストプレディション推論(POP-Inf)手法を提案する。その「推定リーン」特性は、幅広い統計量のML予測を仮定せずに信頼できる統計的推測を保証する。その"data-adaptive"機能は、ml-predictionの精度に関わらず、既存の予測後推論メソッドよりも効率性が向上する。シミュレーションと大規模ゲノムデータを用いて,本手法の優位性と適用性を示す。

A primary challenge facing modern scientific research is the limited availability of gold-standard data, which can be both costly and labor-intensive to obtain. With the rapid development of machine learning (ML), scientists have relied on ML algorithms to predict these gold-standard outcomes with easily obtained covariates. However, these predicted outcomes are often used directly in subsequent statistical analyses, ignoring imprecision and heterogeneity introduced by the prediction procedure. This will likely result in false positive findings and invalid scientific conclusions. In this work, we introduce an assumption-lean and data-adaptive Post-Prediction Inference (POP-Inf) procedure that allows valid and powerful inference based on ML-predicted outcomes. Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML-prediction, for a wide range of statistical quantities. Its "data-adaptive" feature guarantees an efficiency gain over existing post-prediction inference methods, regardless of the accuracy of ML-prediction. We demonstrate the superiority and applicability of our method through simulations and large-scale genomic data.
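
A rough sketch of the general idea, for the special case of estimating a mean, is given below. It is a generic weighted correction of the labeled-data mean by ML predictions and should not be read as the POP-Inf procedure itself. The abstract does not say how the weight is chosen, so the weight `omega` is simply a user-supplied assumption here, and all names are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def weighted_post_prediction_mean(y_labeled, pred_labeled, pred_unlabeled, omega):
    """Labeled mean plus omega times the shift in predictions between data sets.

    omega = 0 ignores the ML predictions entirely (classical labeled-only mean).
    omega = 1 applies the full prediction-based correction.
    """
    return mean(y_labeled) + omega * (mean(pred_unlabeled) - mean(pred_labeled))
```

In this form, a weight near zero falls back to the gold-standard data when predictions are poor, while a larger weight borrows more from the unlabeled predictions when they are informative. This is one way to read the "data-adaptive" property described above.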

翻訳日:2024-02-08 12:09:48 公開日:2024-02-06

# AIが生成した画像から人間のアートを区別できるのか?

Organic or Diffused: Can We Distinguish Human Art from AI-generated Images? ( http://arxiv.org/abs/2402.03214v2 )

ライセンス: Link先を確認

Anna Yoo Jeong Ha, Josephine Passananti, Ronik Bhaskar, Shawn Shan, Reid Southen, Haitao Zheng, Ben Y. Zhao

(参考訳) 生成AI画像の出現は、アートの世界を完全に破壊した。 aiが生成した画像を人間の芸術と区別することは、時間とともに影響が増大する困難な問題である。この問題に対処できないため、悪いアクターは、AIイメージを禁止したポリシーを掲げる人間芸術や企業に対してプレミアムを支払う個人を欺くことができる。また、コンテンツ所有者が著作権を確立し、モデルの崩壊を避けるためにトレーニングデータのキュレーションに関心を持つモデルトレーナーにとっても重要である。人間のアートとAIイメージを区別するアプローチには、教師付き学習によって訓練された分類器、拡散モデルをターゲットにした研究ツール、芸術技術に関する知識を使ったプロのアーティストによる識別など、さまざまなものがある。本稿では,これらのアプローチが現在の現代生成モデルに対して,良性と敵意の両方においてどのように機能するかを理解したい。実際の人間のアートを7つのスタイルでキュレートし、5つの生成モデルからマッチング画像を生成し、8つの検出器(180人の群衆、4000人以上のプロのアーティスト、13のエキスパートアーティストを含む5つの自動検出器と3つの異なる人間グループ)を適用する。 Hiveとエキスパートアーティストはどちらも非常にうまく機能するが、異なる方法で間違いを犯す(Hiveは敵の摂動に対して弱く、エキスパートアーティストは高い偽陽性を生成する)。これらの弱点は、モデルが進化し続けるにつれて残ると信じており、私たちのデータを使用して、人間と自動化された検出器のチームが、正確性と堅牢性の最高の組み合わせを提供する理由を実証しています。

The advent of generative AI images has completely disrupted the art world. Distinguishing AI generated images from human art is a challenging problem whose impact is growing over time. A failure to address this problem allows bad actors to defraud individuals paying a premium for human art and companies whose stated policies forbid AI imagery. It is also critical for content owners to establish copyright, and for model trainers interested in curating training data in order to avoid potential model collapse. There are several different approaches to distinguishing human art from AI images, including classifiers trained by supervised learning, research tools targeting diffusion models, and identification by professional artists using their knowledge of artistic techniques. In this paper, we seek to understand how well these approaches can perform against today's modern generative models in both benign and adversarial settings. We curate real human art across 7 styles, generate matching images from 5 generative models, and apply 8 detectors (5 automated detectors and 3 different human groups including 180 crowdworkers, 4000+ professional artists, and 13 expert artists experienced at detecting AI). Both Hive and expert artists do very well, but make mistakes in different ways (Hive is weaker against adversarial perturbations while Expert artists produce higher false positives). We believe these weaknesses will remain as models continue to evolve, and use our data to demonstrate why a combined team of human and automated detectors provides the best combination of accuracy and robustness.

翻訳日:2024-02-08 12:00:56 公開日:2024-02-06

# 検証回路の再利用による言語モデルの信頼性向上

Increasing Trust in Language Models through the Reuse of Verified Circuits ( http://arxiv.org/abs/2402.02619v2 )

ライセンス: Link先を確認

Philip Quirke, Clement Neo, Fazl Barez

(参考訳) 言語モデル(LM)は、幅広い予測タスクにますます使われていますが、それらのトレーニングは稀なエッジケースを無視し、信頼性を低下させます。ここでは、タスクアルゴリズムと回路実装を検証し、エッジケースを考慮し、既知の障害モードを含まない、厳格な信頼性基準を定義する。数学的および論理的に規定されたフレームワークを使用して構築すれば,トランスフォーマーモデルをこの標準を満たすように訓練できることが示される。本稿では n-桁整数加算のモデルを完全に検証する。検証されたモジュールの再利用性を示すために、訓練された整数加算モデルを未訓練モデルに挿入し、複合モデルを訓練して加算と減算の両方を実行する。両タスクの加算回路を広範囲に再利用し,より複雑な減算器モデルの検証を容易にする。本稿では,検証済みのタスクモジュールをLMに挿入することで,モデルの再利用を活かし,それらを用いた言語モデルの妥当性と信頼性を向上させる方法について論じる。検証回路の再利用により、言語モデルの安全性に向けた重要なステップであると考えられる、より複雑な複合モデルを検証する労力が削減される。

Language Models (LMs) are increasingly used for a wide range of prediction tasks, but their training can often neglect rare edge cases, reducing their reliability. Here, we define a stringent standard of trustworthiness whereby the task algorithm and circuit implementation must be verified, accounting for edge cases, with no known failure modes. We show that a transformer model can be trained to meet this standard if built using mathematically and logically specified frameworks. In this paper, we fully verify a model for n-digit integer addition. To exhibit the reusability of verified modules, we insert the trained integer addition model into an untrained model and train the combined model to perform both addition and subtraction. We find extensive reuse of the addition circuits for both tasks, easing verification of the more complex subtractor model. We discuss how inserting verified task modules into LMs can leverage model reuse to improve verifiability and trustworthiness of language models built using them. The reuse of verified circuits reduces the effort to verify more complex composite models which we believe to be a significant step towards safety of language models.
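
For context, the task being verified, n-digit integer addition, has a simple reference algorithm whose edge cases (carries that cascade through several digits) are what a verification effort must cover. The sketch below is that reference algorithm only, not the transformer circuits analysed in the paper; `add_digits` is a hypothetical name.

```python
def add_digits(a, b):
    """Add two equal-length digit lists (most significant digit first).

    Returns a digit list one longer than the inputs, so the final carry
    always has a slot (it is 0 when there is no overflow).
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of digits")
    carry = 0
    out = []
    for da, db in zip(reversed(a), reversed(b)):
        s = da + db + carry
        out.append(s % 10)   # digit that stays in this position
        carry = s // 10      # carry passed to the next position
    out.append(carry)
    return out[::-1]
```

For small n, the algorithm can be checked exhaustively against ordinary integer addition. Cascading carries such as 999 + 001 are the rare cases that sampling-based training or testing is most likely to miss.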

翻訳日:2024-02-08 11:59:26 公開日:2024-02-06

# グラフ基礎モデル

Graph Foundation Models ( http://arxiv.org/abs/2402.02216v2 )

ライセンス: Link先を確認

Haitao Mao, Zhikai Chen, Wenzhuo Tang, Jianan Zhao, Yao Ma, Tong Zhao, Neil Shah, Mikhail Galkin, Jiliang Tang

(参考訳) グラフ基礎モデル(GFM)は、グラフ領域における新しいトレンド研究トピックであり、異なるグラフやタスクを一般化可能なグラフモデルの開発を目指している。しかし、汎用的なGFMはまだ達成されていない。 GFMを構築する上で重要な課題は、さまざまな構造パターンを持つグラフ間でポジティブな転送を可能にする方法である。 cvおよびnlpドメインにおける既存の基礎モデルに着想を得て、グラフ上の不変性を符号化する基本転送可能な単位の「グラフ語彙」を提唱し、gfm開発の新たな展望を提案する。我々は,ネットワーク解析,理論的基礎,安定性といった重要な側面からグラフ語彙の構成を基礎づける。このような語彙的視点は、ニューラルスケーリング法則に従って将来のGFM設計を前進させる可能性がある。

Graph Foundation Model (GFM) is a new trending research topic in the graph domain, aiming to develop a graph model capable of generalizing across different graphs and tasks. However, a versatile GFM has not yet been achieved. The key challenge in building GFM is how to enable positive transfer across graphs with diverse structural patterns. Inspired by the existing foundation models in the CV and NLP domains, we propose a novel perspective for the GFM development by advocating for a "graph vocabulary", in which the basic transferable units underlying graphs encode the invariance on graphs. We ground the graph vocabulary construction from essential aspects including network analysis, theoretical foundations, and stability. Such a vocabulary perspective can potentially advance the future GFM design following the neural scaling laws.

翻訳日:2024-02-08 11:56:58 公開日:2024-02-06

# ソフトウェア工学のための協調エージェント

Collaborative Agents for Software Engineering ( http://arxiv.org/abs/2402.02172v2 )

ライセンス: Link先を確認

Daniel Tang and Zhenghan Chen and Kisub Kim and Yewei Song and Haoye Tian and Saad Ezzini and Yongfeng Huang and Jacques Klein and Tegawende F. Bissyande

(参考訳) コードレビューは協調的なプロセスであり、ソフトウェアの全体的な品質と信頼性を保証することを目的としています。これは大きなメリットを提供するが、組織におけるコードレビューの実装は、自動化をアピールするいくつかの課題に直面している。自動化されたコードレビューツールが開発されてからしばらく経ち、新しいaiモデルの採用によって改善されている。残念なことに、既存のメソッドは不足している。彼らはしばしば単一の入出力生成モデルをターゲットにしており、様々な視点を考慮したコードレビューのコラボレーションインタラクションをシミュレートできない。本稿では,コードレビューのための新しいマルチエージェントシステムであるCodeAgentを導入することにより,コードレビュー自動化の最先端技術について述べる。基本的に、CodeAgentはQA-Checker("Question-Answer Checking"の略)によって運営されている。 codeagentは自律的で、マルチエージェントで、大きな言語モデル駆動です。コードエージェントの有効性を実証するために,様々なタスクにおいてその能力を評価する実験を行った。 1)コード変更とコミットメッセージの不一致の検出。 2【コミットによる脆弱性導入の検出】 3) コードスタイルの遵守の検証。私たちのウェブサイトは \url{https://code-agent-new.vercel.app/index.html} でアクセスできます。

Code review is a heavily collaborative process, which aims at ensuring the overall quality and reliability of software. While it provides massive benefits, the implementation of code review in an organization faces several challenges that make its automation appealing. Automated code review tools have been around for a while and are now improving thanks to the adoption of novel AI models, which can help learn about standard practices and systematically check that the reviewed code adheres to them. Unfortunately, existing methods fall short: they often target a single input-output generative model, which cannot simulate the collaboration interactions in code review to account for various perspectives; they also underperform on various critical code review sub-tasks. In this paper, we advance the state of the art in code review automation by introducing CodeAgent, a novel multi-agent-based system for code review. Fundamentally, CodeAgent is steered by QA-Checker (short for "Question-Answer Checking"), a supervision agent, designed specifically to ensure that all agents' contributions remain relevant to the initial review question. CodeAgent is autonomous, multi-agent, and large language model-driven. To demonstrate the effectiveness of CodeAgent, we performed experiments to assess its capabilities in various tasks including 1) detection of inconsistencies between code changes and commit messages, 2) detection of vulnerability introduction by commits, and 3) validation of adherence to code style. Our website is accessible at https://code-agent-new.vercel.app/index.html.

翻訳日:2024-02-08 11:56:31 公開日:2024-02-06

# プライバシ保存および検証可能なreluネットワークに対する多項式近似について

On Polynomial Approximations for Privacy-Preserving and Verifiable ReLU Networks ( http://arxiv.org/abs/2011.05530v4 )

ライセンス: Link先を確認

Ramy E. Ali, Jinhyun So, A. Salman Avestimehr

(参考訳) ディープニューラルネットワーク(DNN)推論タスクを信頼できないクラウドにアウトソーシングすることで、データのプライバシと整合性に関する懸念が高まる。多項式ベースの計算にはプライバシーと整合性を保証する技術が数多く存在するが、DNNには多項式以外の計算が含まれる。これらの課題に対処するため、正則線形単位(ReLU)関数を多項式活性化関数に置き換えることで、プライバシー保護および検証可能な推論手法が提案されている。そのような手法は通常、整数係数を持つ多項式や有限体上の多項式を必要とする。そのような要求により、ReLU関数を平方関数に置き換えるいくつかの研究が提案された。本研究では,多項式を整数係数に制限してもルル関数を置換できるような正方関数は最適次数2多項式ではないことを実証的に示す。代わりに、次数2の多項式活性化関数を1次項で提案し、より優れたモデルに導くことを実証的に示す。 VGG-16などの各種アーキテクチャにおけるCIFARおよびTiny ImageNetデータセットの実験から,提案した関数は正方形関数と比較して最大10.4%精度が向上することが示された。

Outsourcing deep neural networks (DNNs) inference tasks to an untrusted cloud raises data privacy and integrity concerns. While there are many techniques to ensure privacy and integrity for polynomial-based computations, DNNs involve non-polynomial computations. To address these challenges, several privacy-preserving and verifiable inference techniques have been proposed based on replacing the non-polynomial activation functions such as the rectified linear unit (ReLU) function with polynomial activation functions. Such techniques usually require polynomials with integer coefficients or polynomials over finite fields. Motivated by such requirements, several works proposed replacing the ReLU function with the square function. In this work, we empirically show that the square function is not the best degree-2 polynomial that can replace the ReLU function even when restricting the polynomials to have integer coefficients. We instead propose a degree-2 polynomial activation function with a first order term and empirically show that it can lead to much better models. Our experiments on the CIFAR and Tiny ImageNet datasets on various architectures such as VGG-16 show that our proposed function improves the test accuracy by up to 10.4% compared to the square function.
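
One way to see why a first-order term can matter is to fit a degree-2 polynomial to ReLU by least squares. The sketch below does this on a uniform grid over [-1, 1] with real coefficients. It is only an illustration under those assumptions: it does not reproduce the integer-coefficient activation proposed in the paper, and the interval, grid, and function names are choices made here.

```python
def relu(x):
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def fit_quadratic(xs, ys):
    """Least-squares coefficients [c0, c1, c2] of y ~ c0 + c1*x + c2*x^2."""
    # Build the 3x3 normal equations as an augmented matrix.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


grid = [i / 100.0 for i in range(-100, 101)]
coeffs = fit_quadratic(grid, [relu(x) for x in grid])
```

On a symmetric interval, the odd part of ReLU is exactly x/2, so the fitted linear coefficient `coeffs[1]` comes out at one half. A pure square function has no such term and cannot represent this odd component at all.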

翻訳日:2024-02-07 23:36:28 公開日:2024-02-06

# ノイズチャネルにおける量子ランダムアクセスコード

Quantum Random Access Code in Noisy Channels ( http://arxiv.org/abs/2204.09485v2 )

ライセンス: Link先を確認

Breno Marques and Rafael A. da Silva

(参考訳) ランダムアクセスコード(RAC)通信プロトコルは、当事者間の通信が制限されている場合に特に有用である。本研究では、ノイズがない場合に量子ランダムアクセスコード(QRAC)が古典ランダムアクセスコード(CRAC)よりも有利であることを示した先行研究に基づき、ノイズチャネルがQRACの性能に与える影響と、ノイズチャネルが既知の場合に半正定値計画法で最適化したシーソー法を用いて損失を軽減する方法を検討する。

The random access code (RAC) communication protocol is particularly useful when the communication between parties is restricted. In this work, we build upon works that have previously proven the quantum random access code (QRAC), in the absence of noise, to be more advantageous than the classical random access code (CRAC), and investigate the effects of a noisy channel on QRAC performance and how the losses can be mitigated by using the see-saw method optimized by semi-definite programming when the noisy channel is known.
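
As a rough illustration of how noise erodes the quantum advantage, the sketch below uses the standard 2-to-1 RAC figures (average success 3/4 classically, cos^2(pi/8) with one qubit) and an assumed depolarizing channel that replaces the qubit with the maximally mixed state with probability `p_noise`. The noise model and the fixed (not re-optimized) encoding are assumptions; the see-saw optimization from the abstract is not reproduced here.

```python
import math

# Standard average success probabilities for the 2-to-1 random access code.
CLASSICAL_2TO1 = 0.75
QUANTUM_2TO1 = math.cos(math.pi / 8) ** 2  # about 0.8536


def noisy_success(p_noise, ideal=QUANTUM_2TO1):
    """Success probability under an assumed depolarizing channel.

    With probability p_noise the state becomes maximally mixed,
    so the receiver can only guess (success 1/2).
    """
    return (1 - p_noise) * ideal + p_noise * 0.5


def advantage_threshold(ideal=QUANTUM_2TO1, classical=CLASSICAL_2TO1):
    """Noise level above which the fixed QRAC no longer beats the CRAC."""
    return (ideal - classical) / (ideal - 0.5)


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.2, 0.3):
        print(p, round(noisy_success(p), 4))
    print("threshold:", round(advantage_threshold(), 4))
```

Under these assumptions the advantage disappears once the depolarizing probability exceeds 1 - 1/sqrt(2), which motivates re-optimizing the encoding and measurements when the channel is known.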

翻訳日:2024-02-07 21:53:34 公開日:2024-02-06

# IM-META:未知位相をもつネットワークにおけるノードメタデータによる影響最大化

IM-META: Influence Maximization Using Node Metadata in Networks With Unknown Topology ( http://arxiv.org/abs/2106.02926v3 )

ライセンス: Link先を確認

Cong Tran, Won-Yong Shin, Andreas Spitz

(参考訳) 複雑なネットワークの構造はしばしば不明であるため、ノードクエリの予算が小さいため、基盤となるネットワークの一部のみを探索することで、最も影響力のあるシードノードを特定することができる。本稿では、クエリやノードメタデータから情報を取得することで、未知のトポロジを持つネットワークにおける最大化(IM)に影響を与えるソリューションであるIM-METAを提案する。このようなメタデータの使用は、メタデータのノイズ性や接続性推論の不確実性のため、リスクがないため、シードノードとクエリノードの両方を見つけることを目的とした新しいIM問題を定式化する。 IM-METAでは,3つのステップを反復的に行う効果的な手法を開発した。 1) 収集したメタデータとエッジの関係を, シームズニューラルネットワークを用いて学習する。 2) 強化グラフを構築するために, 多数の不確かさエッジを選択する。 3)我々のトポロジ対応ランキング戦略を用いて,推定影響の最大化により,クエリの次のノードを特定する。実世界の4つのデータセットにおけるim-metaの実験的評価を通して,その実証を行った。 a)ノードクエリによるネットワーク探索の速度 b) 各モジュールの有効性 c) ベンチマーク手法に対する優位性 d) より困難な設定に対する堅牢性 e)ハイパーパラメータの感度,及び f)スケーラビリティ。

Since the structure of complex networks is often unknown, we may identify the most influential seed nodes by exploring only a part of the underlying network, given a small budget for node queries. We propose IM-META, a solution to influence maximization (IM) in networks with unknown topology by retrieving information from queries and node metadata. Since using such metadata is not without risk due to the noisy nature of metadata and uncertainties in connectivity inference, we formulate a new IM problem that aims to find both seed nodes and queried nodes. In IM-META, we develop an effective method that iteratively performs three steps: 1) we learn the relationship between collected metadata and edges via a Siamese neural network, 2) we select a number of inferred confident edges to construct a reinforced graph, and 3) we identify the next node to query by maximizing the inferred influence spread using our topology-aware ranking strategy. Through experimental evaluation of IM-META on four real-world datasets, we demonstrate a) the speed of network exploration via node queries, b) the effectiveness of each module, c) the superiority over benchmark methods, d) the robustness to more difficult settings, e) the hyperparameter sensitivity, and f) the scalability.

翻訳日:2024-02-07 21:53:22 公開日:2024-02-06

# InstaHideの2つのプライベート画像の混合におけるサンプル複雑さ

InstaHide's Sample Complexity When Mixing Two Private Images ( http://arxiv.org/abs/2011.11877v2 )

ライセンス: Link先を確認

Baihe Huang, Zhao Song, Runzhou Tao, Junze Yin, Ruizhe Zhang, Danyang Zhuo

(参考訳) ニューラルネットワークのトレーニングは通常、大量の機密データを必要とし、トレーニングデータのプライバシを保護する方法が、ディープラーニング研究において重要なトピックになっている。 InstaHideは、テスト精度に小さな影響しか与えず、トレーニングデータのプライバシを保護するための最先端のスキームだ。本稿では,instahideに対する最近の攻撃を体系的に研究し,これらの攻撃を理解し分析するための統一フレームワークを提案する。既存の攻撃は証明可能な保証を持たないか、1つのプライベートイメージのみを復元できる。それぞれのInstaHideイメージが2つのプライベートイメージの混合である現在のInstaHideチャレンジ設定では、証明可能な保証と最適なサンプル複雑さですべてのプライベートイメージを復元する新しいアルゴリズムを提案する。さらに,すべてのinstahide画像の検索における計算困難性も提供する。以上の結果から,InstaHideは2枚のプライベートイメージを混合しても,情報理論上は安全ではないが,最悪の場合,計算上は安全であることがわかった。

Training neural networks usually requires large amounts of sensitive training data, and how to protect the privacy of training data has thus become a critical topic in deep learning research. InstaHide is a state-of-the-art scheme to protect training data privacy with only minor effects on test accuracy, and its security has become a salient question. In this paper, we systematically study recent attacks on InstaHide and present a unified framework to understand and analyze these attacks. We find that existing attacks either do not have a provable guarantee or can only recover a single private image. On the current InstaHide challenge setup, where each InstaHide image is a mixture of two private images, we present a new algorithm to recover all the private images with a provable guarantee and optimal sample complexity. In addition, we also provide a computational hardness result on retrieving all InstaHide images. Our results demonstrate that InstaHide is not information-theoretically secure but computationally secure in the worst case, even when mixing two private images.

翻訳日:2024-02-07 21:53:02 公開日:2024-02-06

# ドメインシフトのための校正不確かさの学習:分散ロバスト学習アプローチ

Learning Calibrated Uncertainties for Domain Shift: A Distributionally Robust Learning Approach ( http://arxiv.org/abs/2010.05784v4 )

ライセンス: Link先を確認

Haoxuan Wang, Zhiding Yu, Yisong Yue, Anima Anandkumar, Anqi Liu, Junchi Yan

(参考訳) 本稿では,対象(テスト)分布とソース(トレーニング)分布が異なる領域シフトの下で校正不確かさを学習するためのフレームワークを提案する。このような領域シフトを、微分密度比推定器を用いて検出し、タスクネットワークと共に訓練し、ドメインシフトに関する調整されたソフトマックス予測形式を構成する。特に、密度比の推定は、ターゲット(テスト)サンプルのソース(トレーニング)分布との密接性を反映している。我々はタスクネットワークにおける予測の不確実性を調整するためにそれを用いる。この密度比を利用するという考え方は、相対的リスク最小化によるドメインシフトを考慮に入れた分布的ロバスト学習(DRL)フレームワークに基づいている。提案手法は,非教師付きドメイン適応 (UDA) や半教師付き学習 (SSL) などの下流タスクに有効な校正不確実性を生成する。これらのタスクでは、セルフトレーニングやFixMatchのようなメソッドが不確実性を使用して、再トレーニングのための確実な疑似ラベルを選択する。実験の結果,DRLの導入はドメイン間性能の大幅な向上につながることがわかった。また,推定密度比は人間の選択頻度と一致し,不確かさの指標との正の相関が示唆された。

We propose a framework for learning calibrated uncertainties under domain shifts, where the source (training) distribution differs from the target (test) distribution. We detect such domain shifts via a differentiable density ratio estimator and train it together with the task network, composing an adjusted softmax predictive form concerning domain shift. In particular, the density ratio estimation reflects the closeness of a target (test) sample to the source (training) distribution. We employ it to adjust the uncertainty of prediction in the task network. This idea of using the density ratio is based on the distributionally robust learning (DRL) framework, which accounts for the domain shift by adversarial risk minimization. We show that our proposed method generates calibrated uncertainties that benefit downstream tasks, such as unsupervised domain adaptation (UDA) and semi-supervised learning (SSL). On these tasks, methods like self-training and FixMatch use uncertainties to select confident pseudo-labels for re-training. Our experiments show that the introduction of DRL leads to significant improvements in cross-domain performance. We also show that the estimated density ratios align with human selection frequencies, suggesting a positive correlation with a proxy of human perceived uncertainties.
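
One simple way to let a density ratio adjust predictive uncertainty is to scale the logits by the ratio before the softmax, so that samples far from the source distribution (small ratio) receive flatter predictions. The sketch below assumes exactly that form; the function name and the use of a single scalar `density_ratio` (source density over target density) are illustrative assumptions, not the paper's exact predictive form.

```python
import math


def adjusted_softmax(logits, density_ratio):
    """Softmax over logits scaled by an estimated density ratio.

    density_ratio is assumed to be (source density / target density)
    at the input; a value near 0 yields a near-uniform prediction.
    """
    scaled = [density_ratio * z for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 0.0]
    for ratio in (1.0, 0.5, 0.1, 0.0):
        print(ratio, [round(p, 3) for p in adjusted_softmax(logits, ratio)])
```

A pseudo-label selector that keeps only predictions above a confidence cutoff would then automatically skip target samples with a low density ratio.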

翻訳日:2024-02-07 21:52:32 公開日:2024-02-06

# 非パラメトリックIVモデルにおける適応的・最適仮説テスト

Adaptive, Rate-Optimal Hypothesis Testing in Nonparametric IV Models ( http://arxiv.org/abs/2006.09587v4 )

ライセンス: Link先を確認

Christoph Breunig, Xiaohong Chen

(参考訳) 非パラメトリックインストゥルメンタル変数(npiv)モデルにおける構造関数に対する不等式(単調性、凸性など)と等式(パラメトリック、半パラメトリックなど)に対する新しい適応的仮説テストを提案する。実験統計は, 拘束型と非拘束型のNPIV推定器間の2次距離を改良した1次サンプルアナログに基づく。シーブチューニングパラメータとボンフェルロニ調整されたカイ二乗臨界値の計算量的・データ駆動的選択を提供する。本試験は,楽器の内在性と未知強度の存在下での代替関数の未知の滑らかさに適応する。テストの適応ミニマックスレートは$l^2$である。すなわち、合成ヌル上のタイプiの誤差と非パラメトリックな代替モデル上のタイプiiの誤差の和は、未知の正則性を持つnpivモデルに対する他の仮説テストによっては改善できない。 l^2$の信頼度セットは、適応テストの反転によって得られる。シミュレーションにより、我々の適応テストはNPIVモデルにおける単調性およびパラメトリックの制約に対する既存の非適応テストよりもはるかに大きいサイズと有限サンプルパワーを制御することを確認した。異なる製品需要とエンゲル曲線の形状制限を試験するための実証的応用について述べる。

We propose a new adaptive hypothesis test for inequality (e.g., monotonicity, convexity) and equality (e.g., parametric, semiparametric) restrictions on a structural function in a nonparametric instrumental variables (NPIV) model. Our test statistic is based on a modified leave-one-out sample analog of a quadratic distance between the restricted and unrestricted sieve NPIV estimators. We provide computationally simple, data-driven choices of sieve tuning parameters and Bonferroni adjusted chi-squared critical values. Our test adapts to the unknown smoothness of alternative functions in the presence of unknown degree of endogeneity and unknown strength of the instruments. It attains the adaptive minimax rate of testing in $L^2$. That is, the sum of its type I error uniformly over the composite null and its type II error uniformly over nonparametric alternative models cannot be improved by any other hypothesis test for NPIV models of unknown regularities. Confidence sets in $L^2$ are obtained by inverting the adaptive test. Simulations confirm that our adaptive test controls size and its finite-sample power greatly exceeds existing non-adaptive tests for monotonicity and parametric restrictions in NPIV models. Empirical applications to test for shape restrictions of differentiated products demand and of Engel curves are presented.

翻訳日:2024-02-07 21:52:12 公開日:2024-02-06

# クロスドメインFew-Shot学習における大規模マージン機構と擬似クエリセット

Large Margin Mechanism and Pseudo Query Set on Cross-Domain Few-Shot Learning ( http://arxiv.org/abs/2005.09218v2 )

ライセンス: Link先を確認

Jia-Fong Yeh and Hsin-Ying Lee and Bing-Chen Tsai and Yi-Rong Chen and Ping-Chia Huang and Winston H. Hsu

(参考訳) 近年では、数発の学習問題に注目が集まっている。以前のほとんどの作業のメソッドは、単一のドメインのデータセットでトレーニングとテストが行われたが、クロスドメインの少数ショット学習は、トレーニングフェーズとテストフェーズの間にあるさまざまなドメインのデータセットを処理する、少数ショット学習問題の真新しいブランチである。本稿では,共通対象,衛星画像,医用画像など4つの異なる領域のデータセットを微調整しながら,単一のデータセット上で事前学習(メタ訓練)されているという問題を解決するために,支援画像から疑似クエリ画像を生成し,顔認識の手法に触発された大きなマージン機構で特徴抽出モジュールを微調整する,新しい大マージン微調整法(lmm-pqs)を提案する。実験結果によると,LMM-PQSはベースラインモデルよりもかなりのマージンを越え,我々のアプローチが堅牢であり,事前学習されたモデルをデータが少ない新しい領域に容易に適応できることを示した。

In recent years, few-shot learning problems have received a lot of attention. While methods in most previous works were trained and tested on datasets in one single domain, cross-domain few-shot learning is a brand-new branch of few-shot learning problems, where models handle datasets in different domains between training and testing phases. In this paper, to solve the problem that the model is pre-trained (meta-trained) on a single dataset while fine-tuned on datasets in four different domains, including common objects, satellite images, and medical images, we propose a novel large margin fine-tuning method (LMM-PQS), which generates pseudo query images from support images and fine-tunes the feature extraction modules with a large margin mechanism inspired by methods in face recognition. According to the experiment results, LMM-PQS surpasses the baseline models by a significant margin and demonstrates that our approach is robust and can easily adapt pre-trained models to new domains with few data.

翻訳日:2024-02-07 21:51:50 公開日:2024-02-06

# Fake News, Disinformation, and Deepfakes: 分散型Ledgerテクノロジとブロックチェーンを活用して,ディジタル偽造と偽造現実に対処する

Fake News, Disinformation, and Deepfakes: Leveraging Distributed Ledger Technologies and Blockchain to Combat Digital Deception and Counterfeit Reality ( http://arxiv.org/abs/1904.05386v3 )

ライセンス: Link先を確認

Paula Fraga-Lamas, Tiago M. Fernández-Caramés

(参考訳) ユビキタスなディープフェイク、偽情報、偽情報、プロパガンダ、ポストトゥルースは、しばしば偽ニュースと呼ばれ、近代民主主義社会におけるインターネットとソーシャルメディアの役割に対する懸念を提起している。その急速な普及により、デジタル詐欺は個人または社会的コスト(例えば選挙の完全性を妨げるために)だけでなく、経済的損失(例えば株式市場のパフォーマンスに影響を及ぼす)や国家の安全へのリスクにつながる可能性がある。 Blockchainと他のDistributed Ledger Technologies(DLT)は、情報の保存と交換のためのピアツーピア安全なプラットフォームを作成しながら、透過的で不変で検証可能なトランザクションの記録を提供することによって、データの証明、信頼性、トレーサビリティを保証する。この概要は、デジタル詐欺と戦うためのDLTとブロックチェーンの可能性を探り、現在開発中のイニシアチブをレビューし、主要な課題を特定することを目的としている。さらに、将来の研究者が偽ニュースや偽情報、ディープフェイクに直面するために取り組まなければならない問題について、今日のオンラインメディアにおけるサイバー脅威に対するレジリエンス強化の不可欠な部分として、いくつかの推奨が列挙されている。

The rise of ubiquitous deepfakes, misinformation, disinformation, propaganda and post-truth, often referred to as fake news, raises concerns over the role of Internet and social media in modern democratic societies. Due to its rapid and widespread diffusion, digital deception has not only an individual or societal cost (e.g., to hamper the integrity of elections), but it can lead to significant economic losses (e.g., to affect stock market performance) or to risks to national security. Blockchain and other Distributed Ledger Technologies (DLTs) guarantee the provenance, authenticity and traceability of data by providing a transparent, immutable and verifiable record of transactions while creating a peer-to-peer secure platform for storing and exchanging information. This overview aims to explore the potential of DLTs and blockchain to combat digital deception, reviewing initiatives that are currently under development and identifying their main current challenges. Moreover, some recommendations are enumerated to guide future researchers on issues that will have to be tackled to face fake news, disinformation and deepfakes, as an integral part of strengthening the resilience against cyber-threats on today's online media.

翻訳日:2024-02-07 21:50:31 公開日:2024-02-06

# 超高速単光子レベルパルスキャラクタリゼーションのための可変電気光学せん断干渉法

Variable electro-optic shearing interferometry for ultrafast single-photon-level pulse characterization ( http://arxiv.org/abs/2207.14049v2 )

ライセンス: Link先を確認

Stanisław Kurzyna, Marcin Jastrzębski, Nicolas Fabre, Wojciech Wasilewski, Michał Lipka, Michał Parniak

(参考訳) 利用可能な多くの方法にもかかわらず、超高速パルスの特性化は、特に単光子レベルでの困難な試みである。本稿では、短時間フーリエ変換の大きさをマッピングするパルス特性化方式を提案する。多くのよく知られた解とは異なり、非線形効果は必要とせず、単光子レベルの測定に適している。本手法は,完全電子的実験制御が可能な電気光学変調器を用いて,一連の制御時間と周波数シフトを導入することに基づく。古典的および単光子レベルのパルスのスペクトル幅と時間幅を特徴付け,スペクトル位相と振幅の再構成に成功した。この方法は位相感度測定を実装することで拡張することができ、自然に部分的に不整合光に適している。

Despite the multitude of available methods, the characterisation of ultrafast pulses remains a challenging endeavour, especially at the single-photon level. We introduce a pulse characterisation scheme that maps the magnitude of its short-time Fourier transform. Contrary to many well-known solutions it does not require nonlinear effects and is therefore suitable for single-photon-level measurements. Our method is based on introducing a series of controlled time and frequency shifts, where the latter is performed via an electro-optic modulator allowing a fully-electronic experimental control. We characterized the full spectral and temporal width of a classical and single-photon-level pulse and successfully reconstructed their spectral phase and amplitude. The method can be extended by implementing a phase-sensitive measurement and is naturally well-suited to partially-incoherent light.

翻訳日:2024-02-07 21:43:15 公開日:2024-02-06

# ポテンシャルと密度空間における縮退の幾何学

Geometry of Degeneracy in Potential and Density Space ( http://arxiv.org/abs/2206.12366v3 )

ライセンス: Link先を確認

Markus Penz, Robert van Leeuwen

(参考訳) 先行研究[j. chem. phys. 155, 244111 (2021)]において、グラフで表される有限格子系における密度汎関数理論からホッヘンバーグ・コーンの定理の反例を発見した。ここで、これは非常に特異で稀な密度でのみ起こることを示し、縮退領域と呼ばれる縮退した基底状態から生じる密度集合が互いに接触したり、密度領域全体の境界に接触することを示す。退化領域は一般に、連続体設定においても代数多様体の凸包の形状であることが示されている。密度領域とそれらの生成するポテンシャルの間に生じる幾何学は分析され、他の形状の中でローマ表面を特徴付ける例で説明される。

In a previous work [J. Chem. Phys. 155, 244111 (2021)], we found counterexamples to the fundamental Hohenberg-Kohn theorem from density-functional theory in finite-lattice systems represented by graphs. Here, we demonstrate that this only occurs at very peculiar and rare densities, those where density sets arising from degenerate ground states, called degeneracy regions, touch each other or the boundary of the whole density domain. Degeneracy regions are shown to generally be in the shape of the convex hull of an algebraic variety, even in the continuum setting. The geometry arising between density regions and the potentials that create them is analyzed and explained with examples that, among other shapes, feature the Roman surface.

翻訳日:2024-02-07 21:42:30 公開日:2024-02-06

# 分布比較のためのヒルベルト曲線投影距離

Hilbert Curve Projection Distance for Distribution Comparison ( http://arxiv.org/abs/2205.15059v4 )

ライセンス: Link先を確認

Tao Li, Cheng Meng, Hongteng Xu, Jun Yu

(参考訳) 分散比較は、データ分類や生成モデリングといった多くの機械学習タスクにおいて中心的な役割を果たす。本研究では,Hilbert curve projection (HCP) distanceと呼ばれる新しい計量法を提案し,低複雑性の2つの確率分布間の距離を測定する。特に、まずヒルベルト曲線を用いた2つの高次元確率分布を投影し、それらのカップリングを求め、カップリングに従って元の空間におけるこれらの2つの分布間の移動距離を計算する。我々は, hcp距離が適切な計量であり, 有界台を持つ確率測度に対して well-defined であることを示す。さらに、$d$次元空間における$L_p$コストによる修正された経験的 HCP 距離が、$O(n^{-1/2\max\{d,p\}})$未満の速度でその集団に収束することを示した。次元の呪いを抑制するため、(学習可能な)部分空間射影を用いたhcp距離の2つの変種も開発する。合成データと実世界のデータの両方で実験したところ、我々のHCP距離はワッサーシュタイン距離の効果的なサロゲートとして機能し、スライスされたワッサーシュタイン距離の欠点を克服している。

Distribution comparison plays a central role in many machine learning tasks like data classification and generative modeling. In this study, we propose a novel metric, called Hilbert curve projection (HCP) distance, to measure the distance between two probability distributions with low complexity. In particular, we first project two high-dimensional probability distributions using Hilbert curve to obtain a coupling between them, and then calculate the transport distance between these two distributions in the original space, according to the coupling. We show that HCP distance is a proper metric and is well-defined for probability measures with bounded supports. Furthermore, we demonstrate that the modified empirical HCP distance with the $L_p$ cost in the $d$-dimensional space converges to its population counterpart at a rate of no more than $O(n^{-1/2\max\{d,p\}})$. To suppress the curse-of-dimensionality, we also develop two variants of the HCP distance using (learnable) subspace projections. Experiments on both synthetic and real-world data show that our HCP distance works as an effective surrogate of the Wasserstein distance with low complexity and overcomes the drawbacks of the sliced Wasserstein distance.
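
The two steps described above (project with a Hilbert curve to obtain a coupling, then measure transport cost in the original space) can be sketched for a toy case. The sketch assumes two equal-size, uniformly weighted samples in the unit square `[0, 1)^2`; the grid resolution `order` and all function names are illustrative choices, not the authors' implementation.

```python
import math


def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the curve orientation is preserved."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along the Hilbert curve on an n x n grid.

    n must be a power of two; x and y are integers in [0, n).
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hcp_distance(xs, ys, p=2, order=8):
    """Toy HCP-style distance between two equal-size 2D samples.

    Points are sorted by Hilbert index, coupled rank by rank, and the
    L_p transport cost is computed in the original 2D space.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    n = 1 << order

    def key(pt):
        gx = min(n - 1, int(pt[0] * n))
        gy = min(n - 1, int(pt[1] * n))
        return hilbert_index(n, gx, gy)

    xs_sorted = sorted(xs, key=key)
    ys_sorted = sorted(ys, key=key)
    total = 0.0
    for a, b in zip(xs_sorted, ys_sorted):
        total += math.hypot(a[0] - b[0], a[1] - b[1]) ** p
    return (total / len(xs)) ** (1.0 / p)


if __name__ == "__main__":
    a = [(0.1, 0.1), (0.9, 0.9), (0.2, 0.8)]
    b = [(0.15, 0.1), (0.85, 0.9), (0.2, 0.75)]
    print(hcp_distance(a, b))
```

Because the coupling comes from a sort, the cost is dominated by sorting rather than by solving an optimal transport problem, which is the source of the low complexity mentioned in the abstract.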

翻訳日:2024-02-07 21:42:15 公開日:2024-02-06

# EfficientViT:高分解能Dense予測のためのマルチスケールリニアアテンション

EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction ( http://arxiv.org/abs/2205.14756v6 )

ライセンス: Link先を確認

Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han

(参考訳) 高分解能高密度予測は、計算写真や自動運転など、多くの現実世界の応用を可能にする。しかし、計算コストが大きいため、最先端の高解像度の予測モデルをハードウェアデバイスに展開することは困難である。この研究は、新しいマルチスケール線形注意を持つ高解像度ビジョンモデルのファミリーであるEfficientViTを提示する。従来のソフトマックス, ハードウェア非効率大カーネル畳み込み, 複雑なトポロジ構造に依存した高分解能高密度予測モデルとは異なり, マルチスケール線形注意は, 軽量かつハードウェア効率の高い操作のみで, グローバル受容場とマルチスケール学習(高分解能高密度予測の2つの望ましい特徴)を実現する。そのため、EfficientViTは、モバイルCPU、エッジGPU、クラウドGPUなど、さまざまなハードウェアプラットフォーム上での大幅なスピードアップによって、これまでの最先端モデルよりも、顕著なパフォーマンス向上を実現している。 Cityscapesのパフォーマンスを損なうことなく、EfficientViTは最大13.9$\times$と6.2$\times$GPUレイテンシをSegFormerとSegNeXtで削減します。超高解像度では、EfficientViTはRestormer上で最大6.4倍のスピードアップを実現し、PSNRでは0.11dBのゲインを提供する。 Segment Anythingでは、EfficientViTはA100 GPU上で48.9倍高いスループットを提供すると同時に、COCO上でのゼロショットインスタンスセグメンテーションのパフォーマンスをわずかに向上させる。

High-resolution dense prediction enables many appealing real-world applications, such as computational photography and autonomous driving. However, the vast computational cost makes deploying state-of-the-art high-resolution dense prediction models on hardware devices difficult. This work presents EfficientViT, a new family of high-resolution vision models with novel multi-scale linear attention. Unlike prior high-resolution dense prediction models that rely on heavy softmax attention, hardware-inefficient large-kernel convolution, or complicated topology structure to obtain good performance, our multi-scale linear attention achieves the global receptive field and multi-scale learning (two desirable features for high-resolution dense prediction) with only lightweight and hardware-efficient operations. As such, EfficientViT delivers remarkable performance gains over previous state-of-the-art models with significant speedup on diverse hardware platforms, including mobile CPU, edge GPU, and cloud GPU. Without performance loss on Cityscapes, our EfficientViT provides up to 13.9$\times$ and 6.2$\times$ GPU latency reduction over SegFormer and SegNeXt, respectively. For super-resolution, EfficientViT delivers up to 6.4$\times$ speedup over Restormer while providing 0.11dB gain in PSNR. For Segment Anything, EfficientViT delivers 48.9$\times$ higher throughput on A100 GPU while achieving slightly better zero-shot instance segmentation performance on COCO.
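
To make the idea of linear attention concrete, here is a minimal single-head sketch that replaces the softmax similarity with a ReLU feature map, so that keys and values are summarized once and every query reuses the summary. It is a plain-Python illustration under assumed shapes (lists of row vectors); the multi-scale token aggregation described in the abstract is not included, and the function names are hypothetical.

```python
def relu_vec(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def relu_linear_attention(Q, K, V, eps=1e-12):
    """Single-head linear attention with a ReLU feature map.

    Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v (lists of lists).
    Cost grows linearly with the number of tokens because the
    d_k x d_v summary `kv` is built once and shared by all queries.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    kv = [[0.0] * d_v for _ in range(d_k)]  # sum_j relu(k_j)^T v_j
    k_sum = [0.0] * d_k                     # sum_j relu(k_j)
    for k, v in zip(K, V):
        rk = relu_vec(k)
        for a in range(d_k):
            k_sum[a] += rk[a]
            for b in range(d_v):
                kv[a][b] += rk[a] * v[b]
    out = []
    for q in Q:
        rq = relu_vec(q)
        denom = sum(rq[a] * k_sum[a] for a in range(d_k)) + eps
        out.append([
            sum(rq[a] * kv[a][b] for a in range(d_k)) / denom
            for b in range(d_v)
        ])
    return out


if __name__ == "__main__":
    Q = [[1.0, 0.0]]
    K = [[1.0, 0.0], [1.0, 0.0]]
    V = [[2.0], [4.0]]
    print(relu_linear_attention(Q, K, V))
```

The design point is the order of multiplication: computing the key-value summary first avoids forming the full query-by-key similarity matrix that softmax attention needs.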

翻訳日:2024-02-07 21:41:54 公開日:2024-02-06

# ガウス混合系のエントロピー近似の理論的誤差解析

Theoretical Error Analysis of Entropy Approximation for Gaussian Mixture ( http://arxiv.org/abs/2202.13059v4 )

ライセンス: Link先を確認

Takashi Furuya, Hiroyuki Kusumoto, Koichi Taniguchi, Naoya Kanno, Kazuma Suetake

(参考訳) ガウス混合分布は一般に一般確率分布を表すために用いられる。不確実性推定にガウス混合を用いる重要性はあるが、ガウス混合のエントロピーは解析的に計算することはできない。特に、Gal と Ghahramani [2016] は、非モダルガウス分布のエントロピーの和である近似エントロピーを提案した。この近似は次元に関係なく解析的に計算し易いが、理論的な保証はない。本稿では, 真のエントロピーと近似エントロピーの近似誤差を理論的に解析し, この近似が効果的に働くときに明らかにする。この誤差は、ガウス混合物の各ガウス成分がどれだけ離れているかによって制御される。このような分離を測定するために、ガウス混合のそれぞれのガウス成分の分散の和に対する平均間の距離の比率を導入し、その比率が無限になるにつれて誤差がゼロに収束することを示す。この収束状況は高次元空間においてより起こりやすい。したがって,この近似が高次元問題,特に重みを多用するニューラルネットワークのようなシナリオにおいて有効であることを保証できる。

Gaussian mixture distributions are commonly employed to represent general probability distributions. Despite the importance of using Gaussian mixtures for uncertainty estimation, the entropy of a Gaussian mixture cannot be analytically calculated. Notably, Gal and Ghahramani [2016] proposed the approximate entropy that is the sum of the entropies of unimodal Gaussian distributions. This approximation is easy to calculate analytically regardless of dimension, but it lacks theoretical guarantees. In this paper, we theoretically analyze the approximation error between the true entropy and the approximate one to reveal when this approximation works effectively. This error is controlled by how far apart the Gaussian components of the mixture are. To measure such separation, we introduce the ratios of the distances between the means to the sum of the variances of each Gaussian component of the Gaussian mixture, and we reveal that the error converges to zero as the ratios tend to infinity. This convergence situation is more likely to occur in higher-dimensional spaces. Therefore, our results provide a guarantee that this approximation works well in higher-dimensional problems, particularly in scenarios such as neural networks that involve a large number of weights.
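
The dependence of the error on component separation can be checked numerically in one dimension. In this sketch the approximate entropy is taken to be the weighted sum of component entropies plus the entropy of the mixing weights, which is the value the true entropy approaches for well-separated components; whether the paper defines its approximation in exactly this form is an assumption, and the numerical integration is only a reference check.

```python
import math


def gaussian_entropy(sigma):
    """Differential entropy of a 1D Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def approx_mixture_entropy(weights, sigmas):
    """Closed-form approximation (assumed form, see lead-in)."""
    comp = sum(w * gaussian_entropy(s) for w, s in zip(weights, sigmas))
    mix = -sum(w * math.log(w) for w in weights if w > 0)
    return comp + mix


def numeric_mixture_entropy(weights, means, sigmas, lo, hi, steps=20000):
    """Midpoint-rule estimate of the true mixture entropy on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        p = sum(
            w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(weights, means, sigmas)
        )
        if p > 0:
            total -= p * math.log(p) * dx
    return total


if __name__ == "__main__":
    w, s = [0.5, 0.5], [1.0, 1.0]
    approx = approx_mixture_entropy(w, s)
    for gap in (0.0, 2.0, 5.0, 20.0):
        true_h = numeric_mixture_entropy(w, [-gap, gap], s, -gap - 12, gap + 12)
        print(gap, round(approx - true_h, 4))
```

Running the loop shows the gap between the approximation and the numerical value shrinking as the means move apart, in line with the convergence statement in the abstract.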

翻訳日:2024-02-07 21:41:26 公開日:2024-02-06

# 群フェアネス下におけるベイズ最適分類器

Bayes-Optimal Classifiers under Group Fairness ( http://arxiv.org/abs/2202.09724v5 )

ライセンス: Link先を確認

Xianli Zeng and Edgar Dobriban and Guang Cheng

(参考訳) 機械学習のアルゴリズムは、社会福祉問題など、より高度な意思決定プロセスに統合されつつある。アルゴリズム予測から潜在的に異なる影響を緩和する必要があるため、公正な機械学習の分野において多くのアプローチが提案されている。しかし,様々な群フェアネス制約の下でベイズ最適分類器を特徴付ける基本的な問題は,いくつかの特別なケースでのみ研究されている。最適仮説テストのための古典的ネマン・ピアソンの議論(Neyman and Pearson, 1933; Shao, 2003)に基づいて、本論文は群フェアネスの下でベイズ最適分類器を導出するための統一的な枠組みを提供する。これにより、FairBayesと呼ばれるグループベースのしきい値設定手法を提案し、この手法は相違を直接制御し、本質的に最適なフェアネス精度トレードオフを実現する。これらの利点は徹底的な実験によって支えられている。

Machine learning algorithms are becoming integrated into more and more high-stakes decision-making processes, such as in social welfare issues. Due to the need of mitigating the potentially disparate impacts from algorithmic predictions, many approaches have been proposed in the emerging area of fair machine learning. However, the fundamental problem of characterizing Bayes-optimal classifiers under various group fairness constraints has only been investigated in some special cases. Based on the classical Neyman-Pearson argument (Neyman and Pearson, 1933; Shao, 2003) for optimal hypothesis testing, this paper provides a unified framework for deriving Bayes-optimal classifiers under group fairness. This enables us to propose a group-based thresholding method we call FairBayes, that can directly control disparity, and achieve an essentially optimal fairness-accuracy tradeoff. These advantages are supported by thorough experiments.
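
Group-based thresholding can be illustrated with a generic sketch that picks one score threshold per group so that each group has roughly the same positive rate (demographic parity). This is not the FairBayes procedure itself; the quantile-based rule, the function names, and the toy data are assumptions made for illustration.

```python
def group_thresholds(scores, groups, target_rate):
    """Pick a per-group threshold giving about target_rate positives.

    scores: predicted probabilities; groups: group label per sample.
    Ties at the threshold may push a group's rate slightly higher.
    """
    by_group = {}
    for s, g in zip(scores, groups):
        by_group.setdefault(g, []).append(s)
    thresholds = {}
    for g, vals in by_group.items():
        vals.sort(reverse=True)
        k = round(target_rate * len(vals))
        if k <= 0:
            thresholds[g] = float("inf")   # accept nobody
        elif k >= len(vals):
            thresholds[g] = vals[-1]       # accept everybody
        else:
            thresholds[g] = vals[k - 1]    # accept the k highest scores
    return thresholds


def predict(score, group, thresholds):
    """Classify as positive when the score reaches the group's threshold."""
    return 1 if score >= thresholds[group] else 0


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.5, 0.4, 0.1]
    groups = ["a"] * 4 + ["b"] * 4
    th = group_thresholds(scores, groups, target_rate=0.5)
    print(th)
    print([predict(s, g, th) for s, g in zip(scores, groups)])
```

With a single shared threshold of 0.55, group "a" would receive two positives and group "b" only one; per-group thresholds remove that disparity at some cost in accuracy, which is the tradeoff the abstract refers to.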

翻訳日:2024-02-07 21:40:49 公開日:2024-02-06

# 文字統計を用いた種子単語の選択

Selecting Seed Words for Wordle using Character Statistics ( http://arxiv.org/abs/2202.03457v3 )

ライセンス: Link先を確認

Nisansa de Silva

(参考訳) 単語推測ゲーム「wordle」は2022年1月に世界的な人気を博した。ゲームの目的は6回以内に5文字の英語単語を推測することである。各トライは、あるキャラクタがソリューションの一部であるかどうかを知らせる色を変えるタイルによってプレイヤーにヒントを与え、それがソリューションの一部である場合、それが正しい配置にあるかどうかを判断する。毎日の単語を解決するための最善の出発語と最善の戦略を見つけるために、多くの試みがなされている。本研究は,5文字単語の文字統計を用いて,最良3単語を決定する。

Wordle, a word-guessing game, rose to global popularity in January 2022. The goal of the game is to guess a five-letter English word within six tries. Each try provides the player with hints by means of colour-changing tiles, which indicate whether a given character is part of the solution and, if so, whether it is in the correct position. Numerous attempts have been made to find the best starting word and the best strategy to solve the daily Wordle. This study uses character statistics of five-letter words to determine the best three starting words.
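
A character-statistics approach to choosing seed words can be sketched as follows. The tiny `WORDS` list is a hypothetical stand-in for a real five-letter dictionary, and the greedy rule (score a word by the frequency of its not-yet-covered letters) is an assumed strategy for illustration, not necessarily the one used in the study.

```python
from collections import Counter

# Hypothetical miniature word list; a real run would load a full dictionary.
WORDS = ["arose", "until", "raise", "cloud", "stare", "pinch", "lemon", "tiger"]


def letter_counts(words):
    """Number of words containing each letter (each letter counted once per word)."""
    counts = Counter()
    for w in words:
        counts.update(set(w))
    return counts


def score(word, counts, used=frozenset()):
    """Sum of letter frequencies over the word's distinct, not-yet-used letters."""
    return sum(counts[c] for c in set(word) - used)


def pick_seeds(words, k=3):
    """Greedily pick up to k seed words that cover frequent new letters."""
    counts = letter_counts(words)
    used = set()
    seeds = []
    for _ in range(min(k, len(words))):
        candidates = [w for w in words if w not in seeds]
        # Ties are broken alphabetically via the (score, word) key.
        best = max(candidates, key=lambda w: (score(w, counts, used), w))
        seeds.append(best)
        used |= set(best)
    return seeds


if __name__ == "__main__":
    print(pick_seeds(WORDS))
```

Counting each letter once per word and discounting already-covered letters steers the later seed words toward letters the first guess did not test.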

翻訳日:2024-02-07 21:40:31 公開日:2024-02-06

# spooky pebbleゲームにおける厳密な境界: 測定値によるクビットのリサイクル

Tight Bounds on the Spooky Pebble Game: Recycling Qubits with Measurements ( http://arxiv.org/abs/2110.08973v2 )

ライセンス: Link先を確認

Niels Kornerup, Jonathan Sadun, David Soloveichik

(参考訳) ペブルゲームは、時間と空間のトレードオフを分析するための一般的なモデルである。特に可逆ペブルゲームは、グローバー探索のような量子アルゴリズムにおいて、重ね合わせ状態の入力に対する古典計算を効率的にシミュレートするためによく用いられる。しかし、可逆ペブルゲームは、不可逆な中間測定がもたらす追加の計算能力を利用できない。中間測定と適応的位相補正を交互に行う過程をモデル化したスプーキーペブルゲームは、可逆的なアプローチで達成できる範囲を超えて量子ビット数を削減する。スプーキーペブルゲームはシミュレーションの総空間(ビット+量子ビット)計算量を削減するわけではないが、量子ビットに格納しなければならない空間の量を削減する。本研究では、任意のペブル数上限のもとで、直線上のスプーキーペブルゲームに対する漸近的にタイトなトレードオフを証明し、スプーキーペブルゲームによって任意の古典的逐次計算をシミュレートする際のタイトな時間と量子ビット数のトレードオフを与える。例えば、すべての$\epsilon \in (0,1]$に対して、時間$T$と空間$S$を必要とする任意の古典計算は、量子コンピュータ上で$O(T/ \epsilon)$個のゲートと$O(T^{\epsilon}S^{1-\epsilon})$個の量子ビットのみで実装できる。これは、同じ量子ビット数での可逆ペブルゲームに対する既知の最良の上界である$O(2^{1/\epsilon} T)$ゲートを改善する。さらに、計算における細粒度のデータ依存性を捉える、より一般的な有向非巡回グラフ(DAG)上のスプーキーペブルゲームも考察し、このゲームが木の上で可逆ペブルゲームを上回りうることを示す。加えて、任意のDAGは、不可逆ペブルゲームで必要な数より高々1個多いペブルでペブリングできる。このことは、最大入次数2のDAG上でスプーキーペブルゲームをプレイするのに必要な最小ペブル数を求める問題が、近似についてもPSPACE困難であることを意味する。

Pebble games are popular models for analyzing time-space trade-offs. In particular, the reversible pebble game is often applied in quantum algorithms like Grover's search to efficiently simulate classical computation on inputs in superposition. However, the reversible pebble game cannot harness the additional computational power granted by irreversible intermediate measurements. The spooky pebble game, which models interleaved measurements and adaptive phase corrections, reduces the number of qubits beyond what reversible approaches can achieve. While the spooky pebble game does not reduce the total space (bits plus qubits) complexity of the simulation, it reduces the amount of space that must be stored in qubits. We prove asymptotically tight trade-offs for the spooky pebble game on a line with any pebble bound, giving a tight time-qubit tradeoff for simulating arbitrary classical sequential computation with the spooky pebble game. For example, for all $\epsilon \in (0,1]$, any classical computation requiring time $T$ and space $S$ can be implemented on a quantum computer using only $O(T/ \epsilon)$ gates and $O(T^{\epsilon}S^{1-\epsilon})$ qubits. This improves on the best known bound for the reversible pebble game with that number of qubits, which uses $O(2^{1/\epsilon} T)$ gates. We also consider the spooky pebble game on more general directed acyclic graphs (DAGs), capturing fine-grained data dependency in computation and show that this game can outperform the reversible pebble game on trees. Additionally any DAG can be pebbled with at most one more pebble than is needed in the irreversible pebble game, implying that finding the minimum number of pebbles necessary to play the spooky pebble game on a DAG with maximum in-degree two is PSPACE-hard to approximate.
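
The size of the improvement in the example above is easy to see by tabulating the leading factors. The sketch below drops all hidden big-O constants, so the numbers are only scaling factors, and the function names are hypothetical.

```python
def spooky_gate_factor(eps):
    """Multiplier on T in the spooky bound O(T / eps), constants ignored."""
    return 1.0 / eps


def reversible_gate_factor(eps):
    """Multiplier on T in the reversible bound O(2^(1/eps) * T), constants ignored."""
    return 2.0 ** (1.0 / eps)


def qubit_scale(T, S, eps):
    """Qubit scaling T^eps * S^(1 - eps) shared by both bounds, constants ignored."""
    return T ** eps * S ** (1 - eps)


if __name__ == "__main__":
    T, S = 2 ** 20, 2 ** 10
    for eps in (1.0, 0.5, 0.25, 0.1):
        print(
            eps,
            spooky_gate_factor(eps),
            reversible_gate_factor(eps),
            round(qubit_scale(T, S, eps)),
        )
```

At `eps = 0.1`, for instance, the gate overhead factor is 10 for the spooky bound versus 1024 for the reversible one at the same qubit scaling, which is why small `eps` (few qubits) is where measurements help most.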

翻訳日:2024-02-07 21:39:55 公開日:2024-02-06

# メタラーニング3次元形状分割関数

Meta-Learning 3D Shape Segmentation Functions ( http://arxiv.org/abs/2110.03854v2 )

ライセンス: Link先を確認

Yu Hao, Hao Huang, Shuaihang Yuan, Yi Fang

(参考訳) ディープニューラルネットワークを用いたロバストな3d形状セグメンテーション関数の学習は、強力なパラダイムとして登場し、各3d形状の一貫した部分セグメンテーションを生成する有望なパフォーマンスを提供する。 3次元形状分割関数を一般化するには、各関数空間上の事前のロバストな学習が必要であり、重要な3次元構造変化が存在する場合、形状の一貫した部分分割を可能にする。既存の一般化法は、大規模ラベル付きデータセット上の3次元形状セグメンテーション関数の広範なトレーニングに依存している。本稿では,3次元形状分割関数空間の学習をメタラーニング問題として定式化することを提案し,学習データのない新しい形状に素早く適応可能な3次元分割モデルを予測することを目的とした。より具体的には、各タスクを3d空間の入力点として部品ラベルを予測する形状条件付き3dセグメンテーション関数の教師なし学習と定義する。 3Dセグメンテーション機能は、パートラベルを必要とせずに自己監督型3D形状復元損失によって訓練される。また,3次元形状を入力とし,各3次元セグメンテーション関数空間上での事前予測を行うメタリーナーとして,補助深層ニューラルネットワークを導入する。実験では,メタ3DSegと呼ばれるメタ学習手法が,従来の3次元形状分割関数のためのディープニューラルネットワークの設計よりも,教師なしの3次元形状分割を改善することを示す。

Learning robust 3D shape segmentation functions with deep neural networks has emerged as a powerful paradigm, offering promising performance in producing a consistent part segmentation of each 3D shape. Generalizing across 3D shape segmentation functions requires robust learning of priors over the respective function space and enables consistent part segmentation of shapes in the presence of significant 3D structure variations. Existing generalization methods rely on extensive training of 3D shape segmentation functions on large-scale labeled datasets. In this paper, we propose to formalize the learning of a 3D shape segmentation function space as a meta-learning problem, aiming to predict a 3D segmentation model that can be quickly adapted to new shapes with no or limited training data. More specifically, we define each task as unsupervised learning of a shape-conditioned 3D segmentation function, which takes as input points in 3D space and predicts the part-segment labels. The 3D segmentation function is trained by a self-supervised 3D shape reconstruction loss without the need for part labels. Also, we introduce an auxiliary deep neural network as a meta-learner, which takes as input a 3D shape and predicts the prior over the respective 3D segmentation function space. We show in experiments that our meta-learning approach, denoted as Meta-3DSeg, leads to improvements on unsupervised 3D shape segmentation over the conventional designs of deep neural networks for 3D shape segmentation functions.

翻訳日:2024-02-07 21:39:15 公開日:2024-02-06

# 非凸損失関数上のワンショットフェデレーション学習における順序最適境界

Order Optimal Bounds for One-Shot Federated Learning over non-Convex Loss Functions ( http://arxiv.org/abs/2108.08677v3 )

ライセンス: Link先を確認

Arsalan Sharifnassab, Saber Salehkaleybar, S. Jamaloddin Golestani

(参考訳) 非凸損失関数上の未知分布から、$m$台のマシンがそれぞれ$n$個のサンプル関数を観測するワンショット設定でのフェデレーション学習問題を考察する。 $F:[-1,1]^d\to\mathbb{R}$ をこの未知分布に関する期待損失関数とする。目標は、$F$の最小化点の推定値を見つけることである。各マシンは、自身の観測に基づいて有界長$B$の信号を生成し、それをサーバに送る。サーバは全マシンの信号を収集し、$F$の最小化点の推定値を出力する。任意のアルゴリズムの期待損失は、対数係数を除いて $\max\big(1/(\sqrt{n}(mB)^{1/d}), 1/\sqrt{mn}\big)$ で下から抑えられることを示す。次に、期待損失が大きな$mn$に対して多重対数係数を除いてこの下界に一致する分散学習アルゴリズム、MRE-NC(Multi-Resolution Estimator for Non-Convex loss function)を提示することにより、この下界が$m$と$n$に関してオーダー最適であることを証明する。

We consider the problem of federated learning in a one-shot setting in which there are $m$ machines, each observing $n$ sample functions from an unknown distribution on non-convex loss functions. Let $F:[-1,1]^d\to\mathbb{R}$ be the expected loss function with respect to this unknown distribution. The goal is to find an estimate of the minimizer of $F$. Based on its observations, each machine generates a signal of bounded length $B$ and sends it to a server. The server collects signals of all machines and outputs an estimate of the minimizer of $F$. We show that the expected loss of any algorithm is lower bounded by $\max\big(1/(\sqrt{n}(mB)^{1/d}), 1/\sqrt{mn}\big)$, up to a logarithmic factor. We then prove that this lower bound is order optimal in $m$ and $n$ by presenting a distributed learning algorithm, called Multi-Resolution Estimator for Non-Convex loss function (MRE-NC), whose expected loss matches the lower bound for large $mn$ up to polylogarithmic factors.
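
The lower bound stated above has two competing terms, one driven by the communication budget and one by the total sample count. The following sketch evaluates it directly, ignoring constants and the logarithmic factor; the function name is hypothetical.

```python
import math


def lower_bound(m, n, B, d):
    """max(1 / (sqrt(n) * (m*B)^(1/d)), 1 / sqrt(m*n)), constants ignored.

    m: number of machines, n: samples per machine,
    B: signal length bound, d: dimension of the domain [-1, 1]^d.
    """
    communication_term = 1.0 / (math.sqrt(n) * (m * B) ** (1.0 / d))
    statistical_term = 1.0 / math.sqrt(m * n)
    return max(communication_term, statistical_term)


if __name__ == "__main__":
    m, n, B = 100, 4, 1
    for d in (1, 2, 4, 8):
        print(d, round(lower_bound(m, n, B, d), 4))
```

Comparing the two terms shows that the communication term dominates whenever `(m * B) ** (1 / d) < sqrt(m)`, so for a fixed signal length the bound degrades as the dimension `d` grows.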

翻訳日:2024-02-07 21:38:49 公開日:2024-02-06

# ソフトウェアに基づく対話システム:調査,分類,課題

Software-Based Dialogue Systems: Survey, Taxonomy and Challenges ( http://arxiv.org/abs/2106.10901v2 )

ライセンス: Link先を確認

Quim Motger, Xavier Franch and Jordi Marco

(参考訳) 人-コンピュータ相互作用の分野における自然言語インタフェースの利用は、専門の科学・産業研究を通じて激しい研究が進められている。この分野での最新のコントリビューションは、リカレントニューラルネットワークやコンテキスト認識戦略の可能性、ユーザ中心の設計アプローチといったディープラーニングアプローチを含む、コミュニティの関心を、会話エージェントやチャットボットとして知られるソフトウェアベースの対話システムへと引き戻すものだ。それにもかかわらず、この分野の新規性を考えると、関連するすべての研究の観点をカバーする会話エージェントの研究の現状に関する、一般的な文脈に依存しない概要が欠落している。本稿では,この文脈に動機づけられ,二次研究の体系的文献レビューを通して,対話型エージェント研究の現状について概説する。本研究は,最近の文献から得られた知識を,様々な領域,研究の焦点,文脈において明確に提示することで,徹底的な視点を育むように設計されている。そこで本研究では,対話エージェントの分野における異なる次元の包括的分類法を提案し,研究者を支援するとともに,自然言語インタフェースの分野における今後の研究の基盤となることを期待する。

The use of natural language interfaces in the field of human-computer interaction is undergoing intense study through dedicated scientific and industrial research. The latest contributions in the field, including deep learning approaches like recurrent neural networks, the potential of context-aware strategies and user-centred design approaches, have brought back the attention of the community to software-based dialogue systems, generally known as conversational agents or chatbots. Nonetheless, and given the novelty of the field, a generic, context-independent overview on the current state of research of conversational agents covering all research perspectives involved is missing. Motivated by this context, this paper reports a survey of the current state of research of conversational agents through a systematic literature review of secondary studies. The conducted research is designed to develop an exhaustive perspective through a clear presentation of the aggregated knowledge published by recent literature within a variety of domains, research focuses and contexts. As a result, this research proposes a holistic taxonomy of the different dimensions involved in the conversational agents' field, which is expected to help researchers and to lay the groundwork for future research in the field of natural language interfaces.

翻訳日:2024-02-07 21:38:25 公開日:2024-02-06

# 潜時空間探索と因果推論による未知通信システムへのアプローチ

Approaching an unknown communication system by latent space exploration and causal inference ( http://arxiv.org/abs/2303.10931v2 )

ライセンス: Link先を確認

Gašper Beguš and Andrej Leban, Shane Gero

(参考訳) 本稿では,教師なし深層生成モデルの潜在空間を探索し,データ中の有意義な性質を発見する手法を提案する。個々の潜在変数を極値へ操作する手法と因果推論に着想を得た手法を組み合わせ,CDEV(causal disentanglement with extreme values)と呼ぶアプローチとし,この手法がモデルの解釈可能性に関する洞察をもたらすことを示す。これにより、未知のデータのどの性質をモデルが有意義なものとして符号化しているかを検証でき、最も興味深く、かつ研究の進んでいない動物コミュニケーションシステムの一つであるマッコウクジラ(Physeter macrocephalus)のコミュニケーションシステムについての洞察を得ることができる。使用したネットワークアーキテクチャは、音声の有意義な表現を学習することが示されており、ここでは、正解(ground truth)が存在しない別の音声コミュニケーションシステムの性質を解読する学習メカニズムとして用いる。提案手法は, マッコウクジラが, 系列中のクリック数, そのタイミングの規則性, およびスペクトル平均や系列の音響的規則性などの音響特性を用いて, 情報を符号化していることを示唆している。これらの発見の一部は既存の仮説と一致しているが、他の発見は初めて提案されるものである。また,我々のモデルは,コミュニケーションシステム内のユニットの構造を支配する規則を明らかにし,学習中に提示されていない新規なデータを生成する際にその規則を適用していると主張する。本稿は,因果推論の手法を用いて深層ニューラルネットワークの出力を解釈することが,ほとんど知見のないデータに取り組むための有効な戦略になり得ることを示唆し、深層学習が仮説空間を絞り込めることを示すもう一つの事例を提示する。最後に、提案されたアプローチは他のアーキテクチャやデータセットにも拡張できる。

This paper proposes a methodology for discovering meaningful properties in data by exploring the latent space of unsupervised deep generative models. We combine manipulation of individual latent variables to extreme values with methods inspired by causal inference into an approach we call causal disentanglement with extreme values (CDEV) and show that this method yields insights for model interpretability. With this, we can test for what properties of unknown data the model encodes as meaningful, using it to glean insight into the communication system of sperm whales (Physeter macrocephalus), one of the most intriguing and understudied animal communication systems. The network architecture used has been shown to learn meaningful representations of speech; here, it is used as a learning mechanism to decipher the properties of another vocal communication system in which case we have no ground truth. The proposed methodology suggests that sperm whales encode information using the number of clicks in a sequence, the regularity of their timing, and audio properties such as the spectral mean and the acoustic regularity of the sequences. Some of these findings are consistent with existing hypotheses, while others are proposed for the first time. We also argue that our models uncover rules that govern the structure of units in the communication system and apply them while generating innovative data not shown during training. This paper suggests that an interpretation of the outputs of deep neural networks with causal inference methodology can be a viable strategy for approaching data about which little is known and presents another case of how deep learning can limit the hypothesis space. Finally, the proposed approach can be extended to other architectures and datasets.

翻訳日:2024-02-07 21:30:49 公開日:2024-02-06

# LocPoseNet:未知物体の姿勢推定のためのロバストな位置事前情報

LocPoseNet: Robust Location Prior for Unseen Object Pose Estimation ( http://arxiv.org/abs/2211.16290v3 )

ライセンス: Link先を確認

Chen Zhao, Yinlin Hu, Mathieu Salzmann

(参考訳) 標準的な6Dオブジェクト姿勢推定の設定では、オブジェクト位置の事前情報(location prior)が重要となる。この事前情報は、3Dオブジェクトの並進を初期化し、3Dオブジェクトの回転推定を容易にするために使用できる。残念ながら、この目的に使用される物体検出器は、未知の物体には一般化しない。したがって、未知物体に対する既存の6D姿勢推定法は、真値(ground truth)の物体位置が既知であると仮定するか、それが利用できない場合には不正確な結果をもたらす。本稿では,未知物体に対して位置の事前情報を頑健に学習できるLocPoseNetという手法を開発し,この問題に対処する。提案手法はテンプレートマッチング戦略に基づいており,参照カーネルを分配し,それらをクエリと畳み込むことでマルチスケール相関を効率的に計算することを提案する。次に,スケール認識特徴とスケールロバスト特徴を分離して異なる物体位置パラメータを予測する,新しい並進推定器を導入する。提案手法は,LINEMOD と GenMOP において,既存手法を大きく上回る性能を示す。さらに,難易度の高い合成データセットを構築し,様々なノイズ源に対する提案手法のロバスト性の高さを明らかにした。プロジェクトのWebサイトは https://sailor-z.github.io/projects/3DV2024_LocPoseNet.html である。

Object location prior is critical for the standard 6D object pose estimation setting. The prior can be used to initialize the 3D object translation and facilitate 3D object rotation estimation. Unfortunately, the object detectors that are used for this purpose do not generalize to unseen objects. Therefore, existing 6D pose estimation methods for unseen objects either assume the ground-truth object location to be known or yield inaccurate results when it is unavailable. In this paper, we address this problem by developing a method, LocPoseNet, able to robustly learn location prior for unseen objects. Our method builds upon a template matching strategy, where we propose to distribute the reference kernels and convolve them with a query to efficiently compute multi-scale correlations. We then introduce a novel translation estimator, which decouples scale-aware and scale-robust features to predict different object location parameters. Our method outperforms existing works by a large margin on LINEMOD and GenMOP. We further construct a challenging synthetic dataset, which allows us to highlight the better robustness of our method to various noise sources. Our project website is at: https://sailor-z.github.io/projects/3DV2024_LocPoseNet.html.

翻訳日:2024-02-07 21:30:20 公開日:2024-02-06

# RaLiBEV:アンカーボックス自由物体検出システムのためのレーダとLiDARのBEV融合学習

RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection Systems ( http://arxiv.org/abs/2211.06108v5 )

ライセンス: Link先を確認

Yanlong Yang, Jianan Liu, Tao Huang, Qing-Long Han, Gang Ma and Bing Zhu

(参考訳) 自動運転では、LiDARとレーダーは環境認識に不可欠である。 LiDARは正確な3D空間センシング情報を提供するが、霧のような悪天候には弱い。逆に、レーダー信号はその波長ゆえに雨や霧を透過できるが、ノイズの影響を受けやすい。最近の最先端の研究は、レーダーとLiDARの融合が悪天候下での頑健な検出につながることを明らかにしている。既存の研究では、畳み込みニューラルネットワークアーキテクチャを採用して各センサデータから特徴を抽出し、2つのブランチの特徴を整列・集約して物体検出結果を予測する。しかし,これらの手法はラベル割り当てと融合戦略の設計が単純であるため,予測されるバウンディングボックスの精度が低い。本稿では,レーダーのレンジ・方位ヒートマップとLiDAR点群から得られた特徴を融合して物体候補を推定する,鳥瞰図(BEV)融合学習に基づくアンカーボックスフリーの物体検出システムを提案する。前景・背景アンカーポイントの分類と、対応するバウンディングボックス回帰との整合性を促進するために、複数のラベル割り当て戦略を設計した。さらに,新しいインタラクティブ・トランスフォーマーモジュールを用いることで,提案する物体検出器の性能をさらに向上させる。最近公開されたOxford Radar RobotCarデータセットを用いて,提案手法の優れた性能を示す。「Clear+Foggy」の訓練条件下で、IoU(Intersection over Union)0.8における本システムの平均精度(AP)は,「Clear」と「Foggy」のテストにおいて,最先端手法をそれぞれ13.1%と19.0%上回った。

In autonomous driving, LiDAR and radar are crucial for environmental perception. LiDAR offers precise 3D spatial sensing information but struggles in adverse weather like fog. Conversely, radar signals can penetrate rain or mist due to their specific wavelength but are prone to noise disturbances. Recent state-of-the-art works reveal that the fusion of radar and LiDAR can lead to robust detection in adverse weather. The existing works adopt convolutional neural network architecture to extract features from each sensor data, then align and aggregate the two branch features to predict object detection results. However, these methods have low accuracy of predicted bounding boxes due to a simple design of label assignment and fusion strategies. In this paper, we propose a bird's-eye view fusion learning-based anchor box-free object detection system, which fuses the feature derived from the radar range-azimuth heatmap and the LiDAR point cloud to estimate possible objects. Different label assignment strategies have been designed to facilitate the consistency between the classification of foreground or background anchor points and the corresponding bounding box regressions. Furthermore, the performance of the proposed object detector is further enhanced by employing a novel interactive transformer module. The superior performance of the methods proposed in this paper has been demonstrated using the recently published Oxford Radar RobotCar dataset. Our system's average precision significantly outperforms the state-of-the-art method by 13.1% and 19.0% at Intersection over Union (IoU) of 0.8 under 'Clear+Foggy' training conditions for 'Clear' and 'Foggy' testing, respectively.
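
The reported gains are measured at an IoU threshold of 0.8, so it may help to recall how IoU is computed. The sketch below is illustrative only: it assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, which is a simplification chosen here and not a description of the box parameterization used in the paper, and the helper names are hypothetical.

```python
def iou_axis_aligned(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.8):
    """Count a prediction as correct when its IoU reaches the threshold."""
    return iou_axis_aligned(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 10.0, 10.0)
    print(iou_axis_aligned((0.0, 0.0, 10.0, 9.0), ground_truth))
    print(is_match((0.0, 0.0, 10.0, 7.0), ground_truth))
```

A threshold of 0.8 is strict: in this simplified setting a prediction that covers only 70% of an otherwise perfectly aligned box is already rejected.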

翻訳日:2024-02-07 21:29:56 公開日:2024-02-06

# pyRDDLGym:RDDLからGym環境へ

pyRDDLGym: From RDDL to Gym Environments ( http://arxiv.org/abs/2211.05939v5 )

ライセンス: Link先を確認

Ayal Taitler, Michael Gimelfarb, Jihwan Jeong, Sriram Gopalakrishnan, Martin Mladenov, Xiaotian Liu, Scott Sanner

(参考訳) 本稿では, RDDLの宣言的記述からOpenAI Gym環境を自動生成するためのPythonフレームワークであるpyRDDLGymを提案する。 RDDLにおける変数の離散時間ステップでの発展は条件付き確率関数によって記述され、これはGymのstepスキームに自然に適合する。さらに、RDDLはリフトされた(lifted)記述であるため、複数のエンティティや異なる構成をサポートするための環境の修正やスケールアップは、誤りを起こしやすい面倒な作業ではなく、容易なものになる。我々は,pyRDDLGymが、RDDLの独特な表現力によってベンチマークの容易かつ迅速な開発を可能にし,強化学習コミュニティの新たな風となることを期待する。 RDDL記述におけるモデルへの明示的なアクセスを提供することで、pyRDDLGymは、モデルの知識を活用しながら相互作用から学習するハイブリッドアプローチの研究も促進できる。本稿では、pyRDDLGymの設計と組み込みの例、およびフレームワークに取り入れたRDDL言語への追加について述べる。

We present pyRDDLGym, a Python framework for auto-generation of OpenAI Gym environments from RDDL declarative descriptions. The discrete time step evolution of variables in RDDL is described by conditional probability functions, which fits naturally into the Gym step scheme. Furthermore, since RDDL is a lifted description, the modification and scaling up of environments to support multiple entities and different configurations becomes trivial rather than a tedious process prone to errors. We hope that pyRDDLGym will serve as a new wind in the reinforcement learning community by enabling easy and rapid development of benchmarks due to the unique expressive power of RDDL. By providing explicit access to the model in the RDDL description, pyRDDLGym can also facilitate research on hybrid approaches for learning from interaction while leveraging model knowledge. We present the design and built-in examples of pyRDDLGym, and the additions made to the RDDL language that were incorporated into the framework.
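
To illustrate why a conditional probability function maps naturally onto a Gym-style `step`, here is a small self-contained sketch. It does not use pyRDDLGym or Gym at all: `ToyReservoirEnv`, its parameters, and its dynamics are hypothetical and exist only to show the pattern, with `step` returning the classic `(observation, reward, done, info)` tuple.

```python
import random


class ToyReservoirEnv:
    """Hypothetical toy environment; NOT part of pyRDDLGym or Gym."""

    def __init__(self, capacity=10, rain_prob=0.5, seed=0):
        self.capacity = capacity
        self.rain_prob = rain_prob
        self._rng = random.Random(seed)  # private RNG for reproducibility
        self.level = 0

    def reset(self):
        """Start every episode from a half-full reservoir."""
        self.level = self.capacity // 2
        return self.level

    def _sample_next_level(self, level, release):
        """Conditional probability function P(level' | level, release)."""
        rain = 1 if self._rng.random() < self.rain_prob else 0
        return max(0, min(self.capacity, level - release + rain))

    def step(self, release):
        """Advance one discrete time step by sampling the next state."""
        release = max(0, min(int(release), self.level))  # clip invalid actions
        self.level = self._sample_next_level(self.level, release)
        reward = -abs(self.level - self.capacity // 2)  # stay near half full
        return self.level, reward, False, {}


if __name__ == "__main__":
    env = ToyReservoirEnv(seed=1)
    observation = env.reset()
    for _ in range(5):
        observation, reward, done, info = env.step(1)
        print(observation, reward)
```

In this sketch the whole transition model lives in `_sample_next_level`; a declarative description plays the analogous role of specifying that function once, so that the stepping logic does not have to be hand-written per environment.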

翻訳日:2024-02-07 21:29:26 公開日:2024-02-06

# 非線形ポンププローブ分光における分数統計の署名

Signatures of fractional statistics in nonlinear pump-probe spectroscopy ( http://arxiv.org/abs/2210.16249v2 )

ライセンス: Link先を確認

Max McGinley, Michele Fava, S. A. Parameswaran

(参考訳) 二次元系の励起スペクトルにおけるエニオンの存在が、非線形分光量から推測できることを示す。特に,時間遅延を調整可能な2つの光パルスを試料に照射するポンププローブ分光について考察する。関連する応答係数は、第1パルスによって生成されたエニオンが第2パルスによって生成されたエニオンの周りをブレイドする際に獲得する統計位相に由来する、普遍的な形式を示す。この振る舞いは、非統計的相互作用や小さな非零温度を含む非普遍的な物理によっても定性的に変化しないことが示される。磁性系では、現在利用可能なテラヘルツ領域のプローブを用いて対象の信号を測定でき、量子スピン液体の探索における非線形分光技術の潜在的な有用性が浮き彫りになる。

We show that the presence of anyons in the excitation spectrum of a two-dimensional system can be inferred from nonlinear spectroscopic quantities. In particular, we consider pump-probe spectroscopy, where a sample is irradiated by two light pulses with an adjustable time delay between them. The relevant response coefficient exhibits a universal form that originates from the statistical phase acquired when anyons created by the first pulse braid around those created by the second. This behaviour is shown to be qualitatively unchanged by non-universal physics including non-statistical interactions and small nonzero temperatures. In magnetic systems, the signal of interest can be measured using currently available terahertz-domain probes, highlighting the potential usefulness of nonlinear spectroscopic techniques in the search for quantum spin liquids.

翻訳日:2024-02-07 21:29:09 公開日:2024-02-06

# 量子カオスの操作計量と時空間絡み合い構造

Operational Metric for Quantum Chaos and the Corresponding Spatiotemporal Entanglement Structure ( http://arxiv.org/abs/2210.14926v4 )

ライセンス: Link先を確認

Neil Dowling and Kavan Modi

(参考訳) カオス系は小さな摂動に非常に敏感であり、生物科学、物理科学、さらには社会科学にも至る所に存在する。これを基本原理として、量子カオスの操作的概念を構築する。すなわち、孤立した多体量子系の将来の状態が、その系の小さな部分に対する過去の多時刻操作に敏感であることを要求する。「敏感」とは、2つの異なる摂動から得られる状態が互いに容易に変換できないことを意味する。すなわち、関連する量は最終状態における摂動の影響の複雑さである。 Butterfly Flutter Fidelityと呼ぶこの直感的な計量から、多時刻量子プロセスの言語を用いて、カオスに関する一連の操作的条件、特に時空間エンタングルメントのスケーリングを特定する。我々の基準は、通常の概念に加えて、量子カオスのよく知られた診断法を既に含んでいる。これには、Peres-Loschmidt Echo、Dynamical Entropy、Tripartite Mutual Information、Local-Operator Entanglementが含まれる。したがって、これら既存の診断法を単一の構造内に統一する枠組みを提示する。さらに、ランダム回路から生成される発展など、いくつかのメカニズムがどのように量子カオスにつながるかを定量化する。本研究は,多体局在、測定誘起相転移、フロッケダイナミクスなどの多体動的現象を体系的に研究する道を開く。

Chaotic systems are highly sensitive to a small perturbation, and are ubiquitous throughout biological sciences, physical sciences and even social sciences. Taking this as the underlying principle, we construct an operational notion for quantum chaos. Namely, we demand that the future state of a many-body, isolated quantum system is sensitive to past multitime operations on a small subpart of that system. By `sensitive', we mean that the resultant states from two different perturbations cannot easily be transformed into each other. That is, the pertinent quantity is the complexity of the effect of the perturbation within the final state. From this intuitive metric, which we call the Butterfly Flutter Fidelity, we use the language of multitime quantum processes to identify a series of operational conditions on chaos, in particular the scaling of the spatiotemporal entanglement. Our criteria already contain the routine notions, as well as the well-known diagnostics for quantum chaos. This includes the Peres-Loschmidt Echo, Dynamical Entropy, Tripartite Mutual Information, and Local-Operator Entanglement. We hence present a unified framework for these existing diagnostics within a single structure. We also go on to quantify how several mechanisms lead to quantum chaos, such as evolution generated from random circuits. Our work paves the way to systematically study many-body dynamical phenomena like Many-Body Localization, measurement-induced phase transitions, and Floquet dynamics.

翻訳日:2024-02-07 21:28:56 公開日:2024-02-06

# ページ全体のランク付けに偏りのない学習

Whole Page Unbiased Learning to Rank ( http://arxiv.org/abs/2210.10718v2 )

ライセンス: Link先を確認

Haitao Mao, Lixin Zou, Yujia Zheng, Jiliang Tang, Xiaokai Chu, Jiashu Zhao, Qian Wang, Dawei Yin

(参考訳) 情報検索システムにおけるページ提示バイアス、特にクリック行動に対するバイアスは、暗黙のユーザフィードバックを用いたランキングモデルの性能向上を妨げる、よく知られた課題である。そこで、バイアスのあるクリックデータから偏りのないランキングモデルを学習するために、Unbiased Learning to Rank (ULTR) アルゴリズムが提案されてきた。しかし、既存のアルゴリズムの多くは、信頼バイアスなどの位置関連バイアスを緩和するように設計されており、検索結果ページ(SERP)の表示における他の特徴によって引き起こされるバイアス、例えばマルチメディアによって引き起こされる魅力バイアスを考慮していない。残念ながら、これらのバイアスは産業システムに広く存在し、不満足な検索体験につながる可能性がある。そこで本研究では,ページ全体のSERP特徴によって引き起こされるバイアスを同時に扱うことを目的とした,whole-page Unbiased Learning to Rank(WP-ULTR)という新たな問題を導入する。この問題には大きな課題がある。 (1)適切なユーザ行動モデル(ユーザ行動仮説)を見つけるのが難しく、(2)複雑なバイアスは既存のアルゴリズムでは扱えない。上記の課題に対処するために、BALと名付けたバイアス非依存(Bias Agnostic)のwhole-page unbiased learning to rankアルゴリズムを提案する。BALは、因果探索によってユーザ行動モデルを自動的に見つけ、特別な設計なしに複数のSERP特徴によって引き起こされるバイアスを緩和する。実世界のデータセットを用いた実験結果により,BALの有効性が検証された。

Page presentation biases in information retrieval systems, especially those affecting click behavior, are a well-known challenge that hinders improving ranking models' performance with implicit user feedback. Unbiased Learning to Rank (ULTR) algorithms have been proposed to learn an unbiased ranking model from biased click data. However, most existing algorithms are specifically designed to mitigate position-related bias, e.g., trust bias, without considering biases induced by other features of the search result page (SERP) presentation, e.g., attractive bias induced by multimedia. Unfortunately, those biases widely exist in industrial systems and may lead to an unsatisfactory search experience. Therefore, we introduce a new problem, i.e., whole-page Unbiased Learning to Rank (WP-ULTR), aiming to handle biases induced by whole-page SERP features simultaneously. It presents tremendous challenges: (1) a suitable user behavior model (user behavior hypothesis) can be hard to find; and (2) complex biases cannot be handled by existing algorithms. To address the above challenges, we propose a Bias Agnostic whole-page unbiased Learning to rank algorithm, named BAL, to automatically find the user behavior model with causal discovery and mitigate the biases induced by multiple SERP features with no specific design. Experimental results on a real-world dataset verify the effectiveness of BAL.

翻訳日:2024-02-07 21:28:35 公開日:2024-02-06

# 視聴覚および自己報告型パーソナリティ認識のためのディープラーニングモデルのオープンソースベンチマーク

An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition ( http://arxiv.org/abs/2210.09138v2 )

ライセンス: Link先を確認

Rongfan Liao and Siyang Song and Hatice Gunes

(参考訳) パーソナリティは、人間の日常生活や作業行動の多様さを決定づけ、人間の内外的状態を理解するのに不可欠である。近年,非言語的音声・視覚行動に基づく被験者の見かけのパーソナリティまたは自己報告のパーソナリティを予測するための自動パーソナリティ計算手法が多数開発されている。しかし、その大半は複雑なデータセット固有の前処理ステップやモデルトレーニングのトリックに苦しむ。一貫性のある実験的な設定の標準ベンチマークがないため、これらのパーソナリティコンピューティングモデルの実際の性能を適切に比較することは不可能であり、再現も困難である。本稿では,既存の8つのパーソナリティ・コンピューティングモデル(例えば,音声,視覚,音声視覚)と7つの標準ディープラーニングモデルについて,自己報告と見かけのパーソナリティ認識タスクの両方で公正かつ一貫した評価を行うための,最初の再現可能な音声・視覚ベンチマークフレームワークを提案する。また、一連のベンチマークモデルに基づいて、人格計算結果に対する短期・フレームレベルの予測を要約するための2つの長期モデリング戦略の影響についても検討する。結果は以下の通りである。 (i)ほとんどのベンチマークされたディープラーニングモデルによる顔行動から推定される見かけの性格特性は、自己報告されたものよりも信頼性が高い。 (ii)視覚モデルは、人格認識における音声モデルよりも優れたパフォーマンスをしばしば達成する。 (iii)非言語行動は、異なる性格特性の予測に異なる寄与をする。 (iv)再現されたパーソナリティ・コンピューティング・モデルは, 当初報告された結果よりも概して性能が悪かった。我々のベンチマークは https://github.com/liaorongfan/DeepPersonality で公開されています。

Personality determines a wide variety of human daily and working behaviours, and is crucial for understanding human internal and external states. In recent years, a large number of automatic personality computing approaches have been developed to predict either the apparent personality or self-reported personality of the subject based on non-verbal audio-visual behaviours. However, the majority of them suffer from complex and dataset-specific pre-processing steps and model training tricks. In the absence of a standardized benchmark with consistent experimental settings, it is not only impossible to fairly compare the real performances of these personality computing models, but it is also difficult to reproduce them. In this paper, we present the first reproducible audio-visual benchmarking framework to provide a fair and consistent evaluation of eight existing personality computing models (e.g., audio, visual and audio-visual) and seven standard deep learning models on both self-reported and apparent personality recognition tasks. Building upon a set of benchmarked models, we also investigate the impact of two previously-used long-term modelling strategies for summarising short-term/frame-level predictions on personality computing results. The results show that: (i) apparent personality traits, inferred from facial behaviours by most benchmarked deep learning models, show more reliability than self-reported ones; (ii) visual models frequently achieved better performance than audio models on personality recognition; (iii) non-verbal behaviours contribute differently in predicting different personality traits; and (iv) our reproduced personality computing models generally achieved worse performance than originally reported. Our benchmark is publicly available at https://github.com/liaorongfan/DeepPersonality.

翻訳日:2024-02-07 21:27:58 公開日:2024-02-06

# 崩壊しない2光子状態の多重測定

Multiple measurements on an uncollapsed entangled two-photon state ( http://arxiv.org/abs/2210.06045v2 )

ライセンス: Link先を確認

Dalibor Javůrek

(参考訳) 同時性の相対性と量子状態の崩壊の定義を合わせると、未崩壊の量子状態に対して複数の測定が行える実験的状況が生じる。量子状態の崩壊の時空分布を、量子系を測定する検出器の基準系と、検出器に対して運動する基準系において示す。これらの検討から、ある条件下では、同じ未崩壊の量子状態に対して複数の測定が許されることが導かれる。この手法の応用として、偏光とエネルギーにおいてエンタングルした光子対状態の測定を示す。未崩壊の光子対状態に対して2つの測定が行える条件を導出する。同じ未崩壊状態に対する複数の測定が許されることからは、重大な帰結が導かれる。例えば、この状況における両検出器による測定は相関しない。さらに、あらゆる保存則は個々の測定では破れうるが、平均値では破れない。この主張は、エネルギーにおいてエンタングルした2光子状態について証明される。これは、互いに相対的に静止した検出器によって観測された実験結果と矛盾している。提案する実験の結果が予測と一致しない場合、すなわち量子状態に対する測定結果が相関している場合には、この状況を適切に解決するために、コペンハーゲン解釈とは異なる量子状態の崩壊の新たな時空分布を提案しなければならない。

The relativity of simultaneity, together with the definition of a quantum state's collapse, results in experimental situations where multiple measurements can be taken on an uncollapsed quantum state. The space-time distribution of a quantum state's collapse is shown in the reference frame of a detector measuring the quantum system and in a reference frame moving relative to the detector. From their inspection it follows that, under certain conditions, multiple measurements are allowed on the same uncollapsed quantum state. An application of the developed approach is shown for the measurement of a photon-pair state entangled in polarization and energy. I derive conditions under which two measurements can be taken on the uncollapsed photon-pair state. Serious consequences follow from the allowance of multiple measurements on the same uncollapsed state. For example, the measurements taken by both detectors in this situation are uncorrelated. Moreover, all the conservation laws could be violated in individual measurements, but not in the mean value. This statement is proved for the two-photon state entangled in energy. This is in contradiction with experimental results observed by detectors at rest relative to each other. If the experimental results of the proposed experiment disagree with the predictions, that is, if results measured on the quantum state are correlated, a new space-time distribution of the quantum state's collapse, different from the Copenhagen interpretation, has to be proposed to properly resolve this situation.

翻訳日:2024-02-07 21:27:27 公開日:2024-02-06

# ViT-DD:セミスーパービジョンドライバディトラクション検出用マルチタスク・ビジョン・トランス

ViT-DD: Multi-Task Vision Transformer for Semi-Supervised Driver Distraction Detection ( http://arxiv.org/abs/2209.09178v4 )

ライセンス: Link先を確認

Yunsheng Ma and Ziran Wang

(参考訳) 現代の運転における交通安全の確保と事故の軽減は最重要課題であり、コンピュータビジョン技術はこの目標に大きく貢献する可能性がある。本稿では,運転者の注意散漫検出と運転者の感情認識の両方に関連するトレーニング信号から帰納的情報を取り入れた,運転者注意散漫検出のためのマルチモーダルVision Transformer(ViT-DD)を提案する。さらに,感情ラベルのないドライバデータをViT-DDのマルチタスクトレーニングプロセスにシームレスに統合できる自己学習アルゴリズムを開発した。実験結果から,提案するViT-DDは,運転者注意散漫検出において,SFDDDデータセットとAUCDDデータセットでそれぞれ既存の最先端手法を6.5%、0.9%上回ることが示された。

Ensuring traffic safety and mitigating accidents in modern driving is of paramount importance, and computer vision technologies have the potential to significantly contribute to this goal. This paper presents a multi-modal Vision Transformer for Driver Distraction Detection (termed ViT-DD), which incorporates inductive information from training signals related to both distraction detection and driver emotion recognition. Additionally, a self-learning algorithm is developed, allowing for the seamless integration of driver data without emotion labels into the multi-task training process of ViT-DD. Experimental results reveal that the proposed ViT-DD surpasses existing state-of-the-art methods for driver distraction detection by 6.5% and 0.9% on the SFDDD and AUCDD datasets, respectively.

翻訳日:2024-02-07 21:27:07 公開日:2024-02-06

# アニーリングパスの変分表現:単調埋め込み下のブレグマン情報

Variational Representations of Annealing Paths: Bregman Information under Monotonic Embedding ( http://arxiv.org/abs/2209.07481v3 )

ライセンス: Link先を確認

Rob Brekelmans, Frank Nielsen

(参考訳) 複雑な分布からのサンプリングや正規化定数の推定のためのマルコフ連鎖モンテカルロ法では、扱いやすい初期分布と関心のある目標密度とを橋渡しするアニーリングパスに沿った中間分布の列からサンプルをシミュレートすることが多い。先行研究は準算術平均を用いてアニーリングパスを構築し、得られる中間密度を、端点への期待ダイバージェンスを最小化するものとして解釈した。これらのアニーリングパスの変分表現を解析するために、引数に関する算術平均が単一の代表点への期待ブレグマンダイバージェンスを最小化することを示す既知の結果を拡張する。特に、ブレグマンダイバージェンスへの入力が単調な埋め込み関数によって変換される場合に、準算術平均に対する類似の結果を得る。本解析は,rho-tau表現ブレグマンダイバージェンスの枠組みを用いて、準算術平均、パラメトリック族、ダイバージェンス汎関数の間の相互作用を明らかにし,一般的なダイバージェンス汎関数をアニーリングパスに沿った中間密度と関連付ける。

Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest. Prior works have constructed annealing paths using quasi-arithmetic means, and interpreted the resulting intermediate densities as minimizing an expected divergence to the endpoints. To analyze these variational representations of annealing paths, we extend known results showing that the arithmetic mean over arguments minimizes the expected Bregman divergence to a single representative point. In particular, we obtain an analogous result for quasi-arithmetic means, when the inputs to the Bregman divergence are transformed under a monotonic embedding function. Our analysis highlights the interplay between quasi-arithmetic means, parametric families, and divergence functionals using the rho-tau representational Bregman divergence framework, and associates common divergence functionals with intermediate densities along an annealing path.
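
The known result mentioned above (the arithmetic mean minimizes the expected Bregman divergence to a representative point) and its quasi-arithmetic analogue can be checked numerically on scalars. This is a minimal sketch under stated assumptions, not the paper's rho-tau framework: the generator $\phi(t) = t\log t - t$, the squared-distance divergence on embedded inputs, the embedding $\rho = \log$, and all function names are choices made here for illustration.

```python
import math


def bregman_kl(x, y):
    """Bregman divergence generated by phi(t) = t*log(t) - t, for x, y > 0."""
    return x * math.log(x / y) - x + y


def bregman_sq_embedded(x, y, rho):
    """Squared-distance Bregman divergence applied to rho(x) and rho(y)."""
    return 0.5 * (rho(x) - rho(y)) ** 2


def quasi_arithmetic_mean(points, weights, rho, rho_inv):
    """rho^{-1}(sum_i w_i * rho(x_i)); rho = identity gives the arithmetic mean."""
    return rho_inv(sum(w * rho(x) for x, w in zip(points, weights)))


def grid_argmin(objective, lo, hi, steps):
    """Brute-force minimizer over an evenly spaced grid on [lo, hi]."""
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=objective)


if __name__ == "__main__":
    points, weights = [1.0, 4.0], [0.5, 0.5]

    # Expected divergence to a representative point mu (second argument).
    plain = lambda mu: sum(w * bregman_kl(x, mu) for x, w in zip(points, weights))
    embedded = lambda mu: sum(
        w * bregman_sq_embedded(x, mu, math.log) for x, w in zip(points, weights)
    )

    print("grid minimizer, plain:   ", grid_argmin(plain, 0.5, 5.0, 450))
    print("arithmetic mean:         ", sum(w * x for x, w in zip(points, weights)))
    print("grid minimizer, embedded:", grid_argmin(embedded, 0.5, 5.0, 450))
    print("quasi-arithmetic mean:   ",
          quasi_arithmetic_mean(points, weights, math.log, math.exp))
```

With $\rho = \log$ the quasi-arithmetic mean is the geometric mean, which is the scalar counterpart of the familiar geometric annealing path between two densities.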

翻訳日:2024-02-07 21:26:55 公開日:2024-02-06

# Semantic2Graph:ビデオにおけるアクションセグメンテーションのためのグラフベースのマルチモーダル機能融合

Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos ( http://arxiv.org/abs/2209.05653v5 )

ライセンス: Link先を確認

Junbin Zhang, Pei-Hsuan Tsai and Meng-Hsun Tsai

(参考訳) ビデオアクションセグメンテーションは多くの分野で広く応用されている。これまでの研究のほとんどは、この目的のためにビデオベースのビジョンモデルを使用していた。しかし、それらはビデオ内の長期的な依存関係を捉えるために、大きな受容野やLSTM、Transformerといった手法に依存することが多く、多大な計算資源を必要とする。この課題に対処するため、グラフベースのモデルが提案された。しかし、従来のグラフベースのモデルは精度が低い。そこで本研究では,Semantic2Graphというグラフ構造に基づく手法を導入し,ビデオの長期依存性をモデル化することで,計算コストを低減し,精度を高める。ビデオのグラフ構造をフレームレベルで構築する。時間的エッジは、ビデオ内の時間的関係と行動の順序をモデル化するために使用される。さらに,ビデオ内の行動における長期的・短期的な意味的関係を捉えるために,対応するエッジ重みを伴う正および負の意味的エッジを設計した。ノード属性は、ビデオコンテンツ、グラフ構造、ラベルテキストから抽出された豊富なマルチモーダル特徴からなり、視覚的、構造的、意味的な手がかりを含む。このマルチモーダル情報を効果的に統合するために,グラフニューラルネットワーク(GNN)モデルを用いてマルチモーダル特徴を融合し,ノードの行動ラベルを分類する。実験の結果、Semantic2Graphは、特にGTEAや50Saladsのようなベンチマークデータセットにおいて、最先端の手法を上回る性能を示した。複数のアブレーション実験により、モデル性能の向上における意味的特徴の有効性がさらに検証された。特に、Semantic2Graphに意味的エッジを組み込むことで、長期的な依存関係を低コストで捉えることができ、ビデオベースのビジョンモデルにおける計算資源の制約という課題への対処に有用であることが確認された。

Video action segmentation has been widely applied in many fields. Most previous studies employed video-based vision models for this purpose. However, they often rely on a large receptive field, LSTM or Transformer methods to capture long-term dependencies within videos, leading to significant computational resource requirements. To address this challenge, graph-based models were proposed. However, previous graph-based models are less accurate. Hence, this study introduces a graph-structured approach named Semantic2Graph to model long-term dependencies in videos, thereby reducing computational costs and raising accuracy. We construct a graph structure of the video at the frame level. Temporal edges are utilized to model the temporal relations and action order within videos. Additionally, we have designed positive and negative semantic edges, accompanied by corresponding edge weights, to capture both long-term and short-term semantic relationships in video actions. Node attributes encompass a rich set of multi-modal features extracted from video content, graph structures, and label text, encompassing visual, structural, and semantic cues. To synthesize this multi-modal information effectively, we employ a graph neural network (GNN) model to fuse multi-modal features for node action label classification. Experimental results demonstrate that Semantic2Graph outperforms state-of-the-art methods in terms of performance, particularly on benchmark datasets such as GTEA and 50Salads. Multiple ablation experiments further validate the effectiveness of semantic features in enhancing model performance. Notably, the inclusion of semantic edges in Semantic2Graph allows for the cost-effective capture of long-term dependencies, affirming its utility in addressing the challenges posed by computational resource constraints in video-based vision models.

翻訳日:2024-02-07 21:26:33 公開日:2024-02-06

# マルチスケール拡散・デノイジング集約機構による皮膚がん敵対的サンプルの復元

Reversing Skin Cancer Adversarial Examples by Multiscale Diffusive and Denoising Aggregation Mechanism ( http://arxiv.org/abs/2208.10373v3 )

ライセンス: Link先を確認

Yongwei Wang, Yuan Li, Zhiqi Shen, Yuhui Qiao

(参考訳) 信頼性の高い皮膚がん診断モデルは、早期スクリーニングや医療介入において重要な役割を担う。現在主流のコンピュータ支援皮膚がん分類システムは、ディープラーニングのアプローチを採用している。しかし、近年の研究では、これらが敵対的攻撃、すなわち皮膚がん診断モデルの性能を著しく低下させる、多くの場合知覚できない摂動に対して極めて脆弱であることが明らかにされている。これらの脅威を軽減するため,本研究は,皮膚がん画像における敵対的摂動をリバースエンジニアリングすることによる,シンプルで効果的かつ資源効率のよい防御フレームワークを提示する。具体的には、医用画像領域の識別的構造をより良く保存するために、まずマルチスケールの画像ピラミッドを構築する。敵対的効果を中和するために、等方性ガウス雑音を注入して異なるスケールの皮膚画像を段階的に拡散させ、敵対的サンプルをクリーン画像の多様体へと移動させる。さらに、敵対的ノイズを打ち消し、注入した冗長なノイズを抑制するため、隣接するスケールの画像情報を集約する新しいマルチスケールデノイジング機構を慎重に設計する。最大級の皮膚がん多クラス分類データセットであるISIC 2019において,本手法の防御効果を評価した。実験の結果,提案手法は異なる攻撃による敵対的摂動を効果的に打ち消すことができ,皮膚がん診断モデルの防御においていくつかの最先端手法を大きく上回ることが示された。

Reliable skin cancer diagnosis models play an essential role in early screening and medical intervention. Prevailing computer-aided skin cancer classification systems employ deep learning approaches. However, recent studies reveal their extreme vulnerability to adversarial attacks, often imperceptible perturbations that significantly reduce the performance of skin cancer diagnosis models. To mitigate these threats, this work presents a simple, effective, and resource-efficient defense framework by reverse engineering adversarial perturbations in skin cancer images. Specifically, a multiscale image pyramid is first established to better preserve discriminative structures in the medical imaging domain. To neutralize adversarial effects, skin images at different scales are then progressively diffused by injecting isotropic Gaussian noise to move the adversarial examples to the clean image manifold. Crucially, to further reverse adversarial noise and suppress redundant injected noise, a novel multiscale denoising mechanism is carefully designed that aggregates image information from neighboring scales. We evaluated the defensive effectiveness of our method on ISIC 2019, one of the largest skin cancer multiclass classification datasets. Experimental results demonstrate that the proposed method can successfully reverse adversarial perturbations from different attacks and significantly outperform some state-of-the-art methods in defending skin cancer diagnosis models.

翻訳日:2024-02-07 21:25:48 公開日:2024-02-06

# ARIEL: 敵対的グラフコントラスト学習

ARIEL: Adversarial Graph Contrastive Learning ( http://arxiv.org/abs/2208.06956v2 )

ライセンス: Link先を確認

Shengyu Feng, Baoyu Jing, Yada Zhu, Hanghang Tong

(参考訳) コントラスト学習はグラフ表現学習において効果的な教師なし手法であり、その重要な要素は正例と負例の構築にある。従来の方法は通常、グラフ内のノードの近接性を原則として利用する。近年,データ拡張に基づくコントラスト学習法が進歩して視覚領域で大きな力を発揮しており,この手法を画像からグラフに拡張した研究もある。しかし、画像上のデータ拡張とは異なり、グラフ上のデータ拡張ははるかに直感的でなく、高品質のコントラストサンプルを提供することがはるかに難しいため、改善の余地が大きい。本研究では、データ拡張のための敵対的グラフビューを導入することにより、合理的な制約の中で情報量の多いコントラストサンプルを抽出する、簡易かつ効果的な手法である敵対的グラフコントラスト学習(ARIEL)を提案する。安定した学習のために情報正則化(information regularization)と呼ぶ新しい手法を開発し,スケーラビリティのためにサブグラフサンプリングを用いる。また、各グラフインスタンスをスーパーノードとして扱うことで、本手法をノードレベルのコントラスト学習からグラフレベルへと一般化する。 ARIELは、実世界のデータセット上のノードレベルとグラフレベルの両方の分類タスクにおいて、現在のグラフコントラスト学習手法を一貫して上回る。さらに、ARIELが敵対的攻撃に対してより頑健であることを示す。

Contrastive learning is an effective unsupervised method in graph representation learning, and the key component of contrastive learning lies in the construction of positive and negative samples. Previous methods usually utilize the proximity of nodes in the graph as the principle. Recently, the data-augmentation-based contrastive learning method has advanced to show great power in the visual domain, and some works extended this method from images to graphs. However, unlike the data augmentation on images, the data augmentation on graphs is far less intuitive and much harder to provide high-quality contrastive samples, which leaves much space for improvement. In this work, by introducing an adversarial graph view for data augmentation, we propose a simple but effective method, Adversarial Graph Contrastive Learning (ARIEL), to extract informative contrastive samples within reasonable constraints. We develop a new technique called information regularization for stable training and use subgraph sampling for scalability. We generalize our method from node-level contrastive learning to the graph level by treating each graph instance as a super-node. ARIEL consistently outperforms the current graph contrastive learning methods for both node-level and graph-level classification tasks on real-world datasets. We further demonstrate that ARIEL is more robust in the face of adversarial attacks.

翻訳日:2024-02-07 21:25:24 公開日:2024-02-06

# 多重不変集合をもつ非線形系の持ち上げと再構成について

On the lifting and reconstruction of nonlinear systems with multiple invariant sets ( http://arxiv.org/abs/2304.11860v3 )

ライセンス: Link先を確認

Shaowu Pan and Karthik Duraisamy

(参考訳) クープマン作用素(Koopman operator)は、不変部分空間における可観測量の発展に着目することで、非線形ダイナミクスに対する線形な視点を与える。関心のある可観測量は通常、クープマン固有関数から線形に再構成される。過去数年間にクープマン作用素が広く使われてきたにもかかわらず、互いに素な不変集合(例えば孤立不動点の吸引域)を複数持つ力学系へのクープマン作用素の適用可能性については、いくつかの誤解がある。本稿ではまず、互いに素な複数の不変集合を持つ非線形系に対する、線形再構成に基づくクープマン作用素の仕組みについて簡単な説明を与える。次に、データ効率の良い方法でクープマン固有関数を構成するために、そのような不変集合間の離散対称性を利用することを議論する。最後に、クープマン作用素の学習に対称性を利用する利点を示すために、いくつかの数値例を示す。

The Koopman operator provides a linear perspective on non-linear dynamics by focusing on the evolution of observables in an invariant subspace. Observables of interest are typically linearly reconstructed from the Koopman eigenfunctions. Despite the broad use of Koopman operators over the past few years, there exist some misconceptions about the applicability of Koopman operators to dynamical systems with more than one disjoint invariant set (e.g., basins of attraction of isolated fixed points). In this work, we first provide a simple explanation for the mechanism of linear reconstruction-based Koopman operators of nonlinear systems with multiple disjoint invariant sets. Next, we discuss the use of discrete symmetry among such invariant sets to construct Koopman eigenfunctions in a data-efficient manner. Finally, several numerical examples are provided to illustrate the benefits of exploiting symmetry for learning the Koopman operator.
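
As a small illustration of the setting described in this abstract, the sketch below uses a toy system that is an assumption of this example, not taken from the paper: dx/dt = x - x^3, which has two stable fixed points (x = +1 and x = -1) with disjoint basins of attraction and a reflection symmetry x -> -x. A function that is constant on each basin is unchanged by the Koopman operator, so it is an eigenfunction with eigenvalue 1 for the time-t map. All function names are hypothetical.

```python
import math


def flow(x0: float, t: float) -> float:
    """Exact time-t flow map of dx/dt = x - x**3 (toy system assumed for this sketch)."""
    e = math.exp(t)
    return x0 * e / math.sqrt(1.0 + x0 * x0 * (e * e - 1.0))


def basin_label(x: float) -> int:
    """+1 on the basin of x = +1, -1 on the basin of x = -1, 0 at the unstable point x = 0."""
    return (x > 0) - (x < 0)


def koopman(g, t: float):
    """Koopman operator for the time-t map: (K_t g)(x) = g(flow(x, t))."""
    return lambda x: g(flow(x, t))


if __name__ == "__main__":
    K = koopman(basin_label, 0.7)
    for x in (-2.0, -0.3, 0.4, 1.5):
        # The basin label is invariant along trajectories: K_t g = 1 * g.
        print(x, basin_label(x), K(x))
        # Reflection symmetry of the dynamics: flow(-x, t) == -flow(x, t).
        print(flow(-x, 0.7) + flow(x, 0.7))
```

Because the flow commutes with the reflection, any eigenfunction found on one basin can be reflected to obtain one on the other basin with the same eigenvalue, which is one simple way a discrete symmetry can reduce the data needed.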

翻訳日:2024-02-07 21:17:51 公開日:2024-02-06

# MI-SegNet: 未知ドメインへの一般化のための相互情報量に基づく超音波(US)セグメンテーション

MI-SegNet: Mutual Information-Based US Segmentation for Unseen Domain Generalization ( http://arxiv.org/abs/2303.12649v3 )

ライセンス: Link先を確認

Yuan Bi, Zhongliang Jiang, Ricarda Clarenbach, Reza Ghotbi, Angelos Karlas, Nassir Navab

(参考訳) 学習に基づく医用画像セグメンテーションのドメイン間での汎化能力は、現在、ドメインシフトによる性能低下によって制限されており、特に超音波(US)イメージングで顕著である。US画像の品質は、検査技師、機器、設定によって異なる、注意深く調整された音響パラメータに大きく依存している。ドメイン間でのUS画像に対する汎化性を改善するために、解剖学的特徴表現とドメイン特徴表現を明示的に分離(disentangle)する、相互情報量(MI)に基づく新たなフレームワークMI-SegNetを提案する。これにより、ドメインに依存しない頑健なセグメンテーションが期待できる。分離に必要な特徴を抽出するために、2つのエンコーダを使用する。セグメンテーションは、その予測に解剖学的特徴マップのみを使用する。エンコーダに有意義な特徴表現を学習させるために、学習時にクロス再構成法を用いる。ドメインまたは解剖学のいずれかに特有の変換を適用し、それぞれの特徴抽出タスクにおいてエンコーダを導く。さらに、両方の特徴マップに共通して存在するMIにはペナルティを課し、特徴空間の分離をさらに促進する。パラメータや機器の異なる複数のデータセットで、提案するドメイン非依存セグメンテーション手法の汎化性を検証する。さらに、提案するMI-SegNetを最先端ネットワークと比較することで、事前学習モデルとしての有効性を示す。

Generalization capabilities of learning-based medical image segmentation across domains are currently limited by the performance degradation caused by the domain shift, particularly for ultrasound (US) imaging. The quality of US images heavily relies on carefully tuned acoustic parameters, which vary across sonographers, machines, and settings. To improve the generalizability on US images across domains, we propose MI-SegNet, a novel mutual information (MI) based framework to explicitly disentangle the anatomical and domain feature representations; therefore, robust domain-independent segmentation can be expected. Two encoders are employed to extract the relevant features for the disentanglement. The segmentation only uses the anatomical feature map for its prediction. In order to force the encoders to learn meaningful feature representations, a cross-reconstruction method is used during training. Transformations specific to either domain or anatomy are applied to guide the encoders in their respective feature extraction tasks. Additionally, any MI present in both feature maps is penalized to further promote separate feature spaces. We validate the generalizability of the proposed domain-independent segmentation approach on several datasets with varying parameters and machines. Furthermore, we demonstrate the effectiveness of the proposed MI-SegNet serving as a pre-trained model by comparing it with state-of-the-art networks.

翻訳日:2024-02-07 21:17:25 公開日:2024-02-06

# 遷移系を用いた非循環的議論フレームワークへの時間性と因果性の統合

Integrating Temporality and Causality into Acyclic Argumentation Frameworks using a Transition System ( http://arxiv.org/abs/2303.09197v2 )

ライセンス: Link先を確認

Y. Munro (1), C. Sarmiento (1), I. Bloch (1), G. Bourgne (1), M.-J. Lesot (1) ((1) Sorbonne Université, CNRS, LIP6, Paris, France)

(参考訳) 抽象議論の文脈において、時間性、すなわち論証が述べられる順序と、因果性を考慮する利点を示す。本研究では、非循環的な抽象議論フレームワークの概念をアクション言語に書き換える形式的手法を提案する。これにより、世界の変化をモデル化し、論証の提示とその帰結との間の因果関係を、直接的・間接的を問わず確立できる。解集合プログラミング(Answer Set Programming)による実装と、説明生成に向けた展望も示す。

In the context of abstract argumentation, we present the benefits of considering temporality, i.e. the order in which arguments are enunciated, as well as causality. We propose a formal method to rewrite the concepts of acyclic abstract argumentation frameworks into an action language, that allows us to model the evolution of the world, and to establish causal relationships between the enunciation of arguments and their consequences, whether direct or indirect. An Answer Set Programming implementation is also proposed, as well as perspectives towards explanations.

翻訳日:2024-02-07 21:17:04 公開日:2024-02-06

# 2次元拡散モデルにロバストテキスト-3次元生成のための3次元一貫性を知らせる

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation ( http://arxiv.org/abs/2303.07937v4 )

ライセンス: Link先を確認

Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim

(参考訳) テキストから3Dへの生成(text-to-3D)は、事前学習済みのテキストから2Dへの拡散モデルを用いてゼロショット設定でニューラル放射場(NeRF)を最適化する手法であるスコア蒸留の登場により、近年急速に進歩している。しかし、2次元拡散モデルにおける3次元認識の欠如により、スコア蒸留に基づく手法はもっともらしい3次元シーンを安定して再構成できない。そこで本研究では、事前学習済みの2次元拡散モデルに3次元認識を組み込み、スコア蒸留に基づく手法の頑健性と3次元整合性を高める新しいフレームワーク3DFuseを提案する。まず、与えられたテキストプロンプトに対する粗い3次元構造を構築し、次に、投影された視点固有の深度マップを拡散モデルの条件として利用する。さらに、頑健な生成のために、粗い3次元構造に含まれる誤差やスパース性を2次元拡散モデルが扱えるようにする学習戦略と、シーンのすべての視点で意味的一貫性を確保する手法を導入する。我々のフレームワークは先行技術の限界を超えるものであり、2次元拡散モデルによる3次元整合な生成に重要な示唆を与える。

Text-to-3D generation has shown rapid progress recently with the advent of score distillation, a methodology of using pretrained text-to-2D diffusion models to optimize a neural radiance field (NeRF) in the zero-shot setting. However, the lack of 3D awareness in the 2D diffusion models destabilizes score distillation-based methods and keeps them from reconstructing a plausible 3D scene. To address this issue, we propose 3DFuse, a novel framework that incorporates 3D awareness into pretrained 2D diffusion models, enhancing the robustness and 3D consistency of score distillation-based methods. We realize this by first constructing a coarse 3D structure of a given text prompt and then utilizing a projected, view-specific depth map as a condition for the diffusion model. Additionally, we introduce a training strategy that enables the 2D diffusion model to learn to handle the errors and sparsity within the coarse 3D structure for robust generation, as well as a method for ensuring semantic consistency throughout all viewpoints of the scene. Our framework surpasses the limitations of prior art, and has significant implications for 3D consistent generation of 2D diffusion models.

翻訳日:2024-02-07 21:16:55 公開日:2024-02-06

# ORCHNet: 果樹園における3次元LiDARに基づく位置認識のためのロバストグローバルな特徴集約アプローチ

ORCHNet: A Robust Global Feature Aggregation approach for 3D LiDAR-based Place recognition in Orchards ( http://arxiv.org/abs/2303.00477v2 )

ライセンス: Link先を確認

T. Barros, L. Garrote, P. Conde, M.J. Coombes, C. Liu, C. Premebida, U.J. Nunes

(参考訳) 農業環境におけるロバストで信頼性の高い位置認識とループ閉じ込み検出は、依然として未解決の問題である。特に果樹園は、圃場全体にわたって構造が類似しているため、難しいケーススタディである。本研究では、ロバスト性のための重要なモダリティと考えられる3次元LiDARデータを利用して、果樹園における位置認識問題に取り組む。そこで、3次元LiDARスキャンをグローバル記述子にマッピングする深層学習ベースのアプローチORCHNetを提案する。具体的には、複数の集約手法をロバストなグローバル記述子に融合する、新たなグローバル特徴集約アプローチを提案する。ORCHNetは、果樹園で収集された、夏と秋の季節のデータを含む実世界のデータで評価される。ロバスト性を評価するために、同一季節および季節をまたぐデータ上で、ORCHNetを最先端の集約手法と比較する。さらに、ORCHNetをループ閉じ込み検出器として用いる自己位置推定フレームワークの一部としても、提案手法を評価する。実験結果から、位置認識タスクにおいてORCHNetは他のアプローチを上回り、季節をまたいでもよりロバストであることが示された。自己位置推定に関しては、ORCHNetをループ検出器として統合することで、経路が木々の間を通り抜けるエッジケースが解決され、このタスクにおける提案手法の適用可能性が示された。コードは https://github.com/Cybonic/ORCHNet.git で公開される予定である。

Robust and reliable place recognition and loop closure detection in agricultural environments is still an open problem. In particular, orchards are a difficult case study due to structural similarity across the entire field. In this work, we address the place recognition problem in orchards using 3D LiDAR data, which is considered a key modality for robustness. Hence, we propose ORCHNet, a deep-learning-based approach that maps 3D-LiDAR scans to global descriptors. Specifically, this work proposes a new global feature aggregation approach, which fuses multiple aggregation methods into a robust global descriptor. ORCHNet is evaluated on real-world data collected in orchards, comprising data from the summer and autumn seasons. To assess the robustness, we compare ORCHNet with state-of-the-art aggregation approaches on data from the same season and across seasons. Moreover, we evaluate the proposed approach as part of a localization framework, where ORCHNet is used as a loop closure detector. The empirical results indicate that, on the place recognition task, ORCHNet outperforms the remaining approaches, and is also more robust across seasons. As for the localization, the edge cases where the path goes through the trees are solved when integrating ORCHNet as a loop detector, showing the potential applicability of the proposed approach in this task. The code will be publicly available at: https://github.com/Cybonic/ORCHNet.git

翻訳日:2024-02-07 21:16:35 公開日:2024-02-06

# ターゲット拡張による領域外ロバスト性

Out-of-Domain Robustness via Targeted Augmentations ( http://arxiv.org/abs/2302.11861v3 )

ライセンス: Link先を確認

Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang

(参考訳) あるドメイン群で学習したモデルは、例えば野生生物の監視モデルを新しいカメラ設置場所に展開する場合のように、未知のドメインで性能が低下することが多い。本研究では、領域外(OOD)汎化のためのデータ拡張を設計する原則について研究する。特に、ドメイン依存の特徴の一部が頑健である、すなわちドメインごとに変化する特徴の一部がOODでも予測に有効である、という現実的なシナリオに着目する。例えば、上記の野生生物モニタリングでは、画像の背景はカメラの場所によって異なるが、生息地の種類を示しており、撮影された動物の種の予測に役立つ。線形設定での理論解析に基づき、頑健な特徴を保ちながら、見かけ上の(spurious)ドメイン依存特徴を選択的にランダム化するターゲット拡張(targeted augmentations)を提案する。ターゲット拡張によってOOD性能が向上し、より少ないドメイン数でモデルがよりよく汎化できることを証明する。対照的に、ドメイン依存特徴をランダム化できない汎用的な拡張や、すべてのドメイン依存特徴をランダム化するドメイン不変な拡張といった既存のアプローチは、いずれもOOD性能が低い。実世界の3つのデータセットでの実験では、ターゲット拡張がOOD性能を3.2～15.2ポイント向上させ、新たな最高性能を達成した。

Models trained on one set of domains often suffer performance drops on unseen domains, e.g., when wildlife monitoring models are deployed in new camera locations. In this work, we study principles for designing data augmentations for out-of-domain (OOD) generalization. In particular, we focus on real-world scenarios in which some domain-dependent features are robust, i.e., some features that vary across domains are predictive OOD. For example, in the wildlife monitoring application above, image backgrounds vary across camera locations but indicate habitat type, which helps predict the species of photographed animals. Motivated by theoretical analysis on a linear setting, we propose targeted augmentations, which selectively randomize spurious domain-dependent features while preserving robust ones. We prove that targeted augmentations improve OOD performance, allowing models to generalize better with fewer domains. In contrast, existing approaches such as generic augmentations, which fail to randomize domain-dependent features, and domain-invariant augmentations, which randomize all domain-dependent features, both perform poorly OOD. In experiments on three real-world datasets, we show that targeted augmentations set new states-of-the-art for OOD performance by 3.2-15.2 percentage points.
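
The idea of a targeted augmentation in this abstract can be sketched on a toy record format. Everything below is an assumption made for illustration: the field names (`species`, `habitat`, `camera_artifact`), the example values, and the choice to resample the spurious field from a pooled set. It is not the augmentation used in the paper; it only shows the pattern of randomizing a spurious domain-dependent feature while leaving the label and a robust domain-dependent feature untouched.

```python
import random


def targeted_augment(example: dict, artifact_pool: list, rng: random.Random) -> dict:
    """Return a copy in which only the spurious field is re-drawn.

    The label ("species") and the robust domain-dependent field ("habitat")
    are preserved; "camera_artifact" is treated as spurious (hypothetical names).
    """
    augmented = dict(example)  # shallow copy so the input is not mutated
    augmented["camera_artifact"] = rng.choice(artifact_pool)
    return augmented


examples = [
    {"species": "zebra", "habitat": "savanna", "camera_artifact": "lens_flare"},
    {"species": "tapir", "habitat": "rainforest", "camera_artifact": "timestamp_overlay"},
]
# Pool of spurious values observed across all training domains.
artifact_pool = sorted({e["camera_artifact"] for e in examples})
rng = random.Random(0)
augmented = [targeted_augment(e, artifact_pool, rng) for e in examples]

if __name__ == "__main__":
    for before, after in zip(examples, augmented):
        print(before, "->", after)
```

A generic augmentation in this toy would leave `camera_artifact` alone, and a fully domain-invariant one would also randomize `habitat`; the abstract argues that both of those choices generalize worse out of domain.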

翻訳日:2024-02-07 21:16:11 公開日:2024-02-06

# 知識グラフによる推論のためのニューロシンボリックAI:サーベイ

Neurosymbolic AI for Reasoning over Knowledge Graphs: A Survey ( http://arxiv.org/abs/2302.07200v2 )

ライセンス: Link先を確認

Lauren Nicole DeLong, Ramon Fernández Mir, Jacques D. Fleuriot (The University of Edinburgh School of Informatics, Artificial Intelligence and its Applications Institute)

(参考訳) ニューロシンボリックAIは、シンボリック推論手法とディープラーニングを組み合わせて、補完的な利点を活用する研究の活発な領域である。知識グラフは異種・多関係的なデータを表現するための一般的な方法になりつつあるため、グラフ構造を推論する手法はこのニューロシンボリックパラダイムに従おうとしている。従来、そのようなアプローチは規則に基づく推論か、パターンを抽出できる代表的な数値埋め込みのいずれかを使用してきた。しかし、近年のいくつかの研究は、この二分法を橋渡しして、解釈性を促進し、競争力を保ち、専門家の知識を統合するモデルを作ろうと試みている。そこで我々は,知識グラフ上でニューロシンボリック推論タスクを行う手法を調査し,それらを分類できる新しい分類法を提案する。具体的には,(1)論理式埋め込みアプローチ,(2)論理制約付き埋め込みアプローチ,(3)規則学習アプローチの3つの主要なカテゴリを提案する。分類と並行して,より直接的な比較のために,アプローチの概要とソースコードへのリンクを提供する。最後に,これらの手法の特徴と限界について考察し,この研究分野が発展するであろういくつかの今後の方向性を提案する。

Neurosymbolic AI is an increasingly active area of research that combines symbolic reasoning methods with deep learning to leverage their complementary benefits. As knowledge graphs are becoming a popular way to represent heterogeneous and multi-relational data, methods for reasoning on graph structures have attempted to follow this neurosymbolic paradigm. Traditionally, such approaches have utilized either rule-based inference or generated representative numerical embeddings from which patterns could be extracted. However, several recent studies have attempted to bridge this dichotomy to generate models that facilitate interpretability, maintain competitive performance, and integrate expert knowledge. Therefore, we survey methods that perform neurosymbolic reasoning tasks on knowledge graphs and propose a novel taxonomy by which we can classify them. Specifically, we propose three major categories: (1) logically-informed embedding approaches, (2) embedding approaches with logical constraints, and (3) rule learning approaches. Alongside the taxonomy, we provide a tabular overview of the approaches and links to their source code, if available, for more direct comparison. Finally, we discuss the unique characteristics and limitations of these methods, then propose several prospective directions toward which this field of research could evolve.

翻訳日:2024-02-07 21:15:51 公開日:2024-02-06

# 割引付き適応的選好を持つエージェントに対するオンライン推薦

Online Recommendations for Agents with Discounted Adaptive Preferences ( http://arxiv.org/abs/2302.06014v2 )

ライセンス: Link先を確認

Arpit Agarwal, William Brown

(参考訳) エージェントの選好(推薦された項目に対する選択確率を表す)が、未知の$\textit{preference model}$に従って過去の選択の関数として変化する、バンディット型の推薦問題を考える。各ラウンドで、(全$n$個のうち)$k$個の項目からなるメニューをエージェントに提示し、エージェントはその中から1つの項目を選ぶ。我々は、エージェントの選択に対する敵対的損失について、ある$\textit{target set}$(項目単体の部分集合)に対するリグレットを最小化することを目指す。一様メモリのエージェントを扱ったAgarwal and Brown (2022)の設定を拡張し、ここでは、各ラウンドごとにエージェントのメモリベクトルに割引係数が適用される非一様メモリを許容する。「長期記憶」領域(実効的なメモリホライズンが$T$に対して劣線形にスケールする場合)では、任意の$\textit{smooth}$な選好モデルに対して、$\textit{everywhere instantaneously realizable distributions}$の集合(先行研究で定式化された「EIRD集合」)に対する効率的な劣線形リグレットが得られることを示す。さらに、メモリ重みの線形関数によって上下から抑えられる選好(これを「スケール有界」な選好と呼ぶ)に対して、項目単体のほぼ$\textit{entire}$(全体)に対して効率的な劣線形リグレットを達成するアルゴリズムを与える。一般に、EIRDを超えるターゲットへの拡張についてはNP困難性の結果を示す。「短期記憶」領域(メモリホライズンが定数の場合)では、損失の変化があまり頻繁でなければ、スケール有界な選好により、滑らかさがなくても単体のほぼ全体に対して効率的な劣線形リグレットが再び可能になることを示す。一方で、損失が一定であっても、任意の滑らかな選好モデルの下でEIRD集合と競合することには情報理論的な障壁があることを示す。

We consider a bandit recommendations problem in which an agent's preferences (representing selection probabilities over recommended items) evolve as a function of past selections, according to an unknown $\textit{preference model}$. In each round, we show a menu of $k$ items (out of $n$ total) to the agent, who then chooses a single item, and we aim to minimize regret with respect to some $\textit{target set}$ (a subset of the item simplex) for adversarial losses over the agent's choices. Extending the setting from Agarwal and Brown (2022), where uniform-memory agents were considered, here we allow for non-uniform memory in which a discount factor is applied to the agent's memory vector at each subsequent round. In the "long-term memory" regime (when the effective memory horizon scales with $T$ sublinearly), we show that efficient sublinear regret is obtainable with respect to the set of $\textit{everywhere instantaneously realizable distributions}$ (the "EIRD set", as formulated in prior work) for any $\textit{smooth}$ preference model. Further, for preferences which are bounded above and below by linear functions of memory weight (we call these "scale-bounded" preferences) we give an algorithm which obtains efficient sublinear regret with respect to nearly the $\textit{entire}$ item simplex. We show an NP-hardness result for expanding to targets beyond EIRD in general. In the "short-term memory" regime (when the memory horizon is constant), we show that scale-bounded preferences again enable efficient sublinear regret for nearly the entire simplex even without smoothness if losses do not change too frequently, yet we show an information-theoretic barrier for competing against the EIRD set under arbitrary smooth preference models even when losses are constant.

翻訳日:2024-02-07 21:15:29 公開日:2024-02-06

# 有限温度における量子忠実性に現れる量子相転移のシグネチャ

Signature of quantum phase transition manifested in quantum fidelity at finite temperature ( http://arxiv.org/abs/2302.01795v2 )

ライセンス: Link先を確認

Protyush Nandi, Sirshendu Bhattacharyya and Subinay Dasgupta

(参考訳) 量子相転移のシグネチャは一般に有限温度で消去される。非解析的行動を通じてこのシグネチャを運ぶために観測された少量の量は、低温のみに限られる。高温で適切な動的量を特定することを目的として、我々は最近、低温状態を超えた量子臨界点で非解析的シグネチャを持つ量子忠実度から関数を構築した。本稿では, 初期の研究を詳述し, 対応する速度関数の挙動と, 異なる次元の多体ハミルトニアンに対する非解析性の堅牢性を示す。また、我々の速度関数は、ゼロ温度での動的量子相転移(DQPT)の実証に使用されるものまで減少することを示した。さらに、DQPTとは異なり、速度関数の長い時間制限は平衡量子相転移を忠実に検出することができることが観察されている。

The signature of a quantum phase transition is generally wiped out at finite temperature. A few quantities that have been observed to carry this signature through a nonanalytic behavior are also limited to low temperatures only. With an aim to identify a suitable dynamical quantity at a high temperature, we have recently constructed a function from quantum fidelity, which has the potential to bear a nonanalytic signature at the quantum critical point beyond the low-temperature regime. In this paper, we elaborate on our earlier work and demonstrate the behavior of the corresponding rate function and the robustness of the nonanalyticity for a number of many-body Hamiltonians in different dimensions. We have also shown that our rate function reduces to that used in the demonstration of the dynamical quantum phase transition (DQPT) at zero temperature. It has been further observed that, unlike DQPT, the long-time limit of the rate function can faithfully detect the equilibrium quantum phase transition as well.

翻訳日:2024-02-07 21:14:47 公開日:2024-02-06

# 3次元LiDARのための凸包に基づく効率的な車両姿勢推定法

An Efficient Convex Hull-based Vehicle Pose Estimation Method for 3D LiDAR ( http://arxiv.org/abs/2302.01034v3 )

ライセンス: Link先を確認

Ningning Ding

(参考訳) LiDARによる車両姿勢推定は、自動運転の認識技術において不可欠である。しかし、LiDAR点群の観測が不完全でスパースであるため、既存の姿勢推定法では3次元LiDARに基づく満足のいく姿勢抽出を実現することが難しい。また、リアルタイム性能への要求が、姿勢推定タスクの難しさをさらに高めている。本稿では、凸包に基づく新しい車両姿勢推定手法を提案する。抽出した3次元クラスタを凸包に縮約することで、重要な輪郭情報を保持しながら、その後の計算負荷を低減する。続いて、探索ベースのアルゴリズムのために、最小遮蔽面積に基づく新しい基準を開発し、正確な姿勢推定を可能にする。さらに、この基準により、提案アルゴリズムは障害物回避に特に適したものとなる。提案アルゴリズムは、KITTIデータセットと、工業団地で取得して手動でラベル付けしたデータセットで検証される。その結果、提案手法はリアルタイムの速度を維持しつつ、古典的な姿勢推定法よりも高い精度を達成できることが示された。

Vehicle pose estimation with LiDAR is essential in the perception technology of autonomous driving. However, due to incomplete observation measurements and sparsity of the LiDAR point cloud, it is challenging to achieve satisfactory pose extraction based on 3D LiDAR with the existing pose estimation methods. In addition, the demand for real-time performance further increases the difficulty of the pose estimation task. In this paper, we propose a novel vehicle pose estimation method based on the convex hull. The extracted 3D cluster is reduced to the convex hull, reducing the subsequent computation burden while preserving essential contour information. Subsequently, a novel criterion based on the minimum occlusion area is developed for the search-based algorithm, enabling accurate pose estimation. Additionally, this criterion renders the proposed algorithm particularly well-suited for obstacle avoidance. The proposed algorithm is validated on the KITTI dataset and a manually labeled dataset acquired at an industrial park. The results demonstrate that our proposed method can achieve better accuracy than the classical pose estimation method while maintaining real-time speed.
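
The two-stage pipeline in this abstract (reduce the cluster to its convex hull, then run a search over candidate headings) can be sketched as follows. The abstract does not define the minimum-occlusion-area criterion, so this sketch swaps in the classical minimum-area bounding rectangle as the search criterion; that substitution, the 2D bird's-eye-view simplification, and all function names are assumptions of this example rather than the paper's method.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def box_area(hull, theta):
    """Area of the bounding rectangle of the hull whose sides are aligned with angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    us = [c * x + s * y for x, y in hull]
    vs = [-s * x + c * y for x, y in hull]
    return (max(us) - min(us)) * (max(vs) - min(vs))


def estimate_heading(points, step_deg=1.0):
    """Search headings in [0, 90) degrees; only hull vertices are evaluated per candidate."""
    hull = convex_hull(points)
    n_steps = int(round(90.0 / step_deg))
    best = min(range(n_steps), key=lambda k: box_area(hull, math.radians(k * step_deg)))
    return best * step_deg


if __name__ == "__main__":
    # A 4 m x 2 m grid of points rotated by 30 degrees stands in for a vehicle cluster.
    c, s = math.cos(math.radians(30.0)), math.sin(math.radians(30.0))
    cluster = [(c * 0.5 * i - s * 0.5 * j, s * 0.5 * i + c * 0.5 * j)
               for i in range(9) for j in range(5)]
    print(estimate_heading(cluster))
```

The computational saving the abstract mentions shows up in `box_area`: each candidate heading touches only the hull vertices instead of every point in the cluster. A rectangle fit is ambiguous modulo 90 degrees, which is why the search range stops there.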

翻訳日:2024-02-07 21:14:33 公開日:2024-02-06

# 継続的学習に関する包括的調査:理論・方法・応用

A Comprehensive Survey of Continual Learning: Theory, Method and Application ( http://arxiv.org/abs/2302.00487v3 )

ライセンス: Link先を確認

Liyuan Wang, Xingxing Zhang, Hang Su, Jun Zhu

(参考訳) 現実世界のダイナミクスに対処するためには、知的システムは生涯を通じて知識を段階的に獲得し、更新し、蓄積し、活用する必要がある。この能力は継続学習と呼ばれ、AIシステムが適応的に発展するための基盤を提供する。一般的な意味では、継続学習は破滅的忘却によって明確に制限される。すなわち、新しいタスクを学習すると、通常、古いタスクの性能が劇的に低下する。これに加えて、継続学習の理解と応用を大きく広げる進歩が近年ますます多く現れている。この方向への関心の高まりと広がりは、その現実的な重要性と複雑さを示している。本研究では、基本設定、理論的基礎、代表的手法、実践的応用を橋渡しする、継続学習に関する包括的な調査を行う。既存の理論的・実証的な結果に基づき、継続学習の一般的な目的を、資源効率の文脈において適切な安定性と可塑性のトレードオフ、および十分なタスク内・タスク間の汎化性を確保することとして要約する。次に、最新かつ詳細な分類法を提供し、代表的な手法が継続学習にどのように取り組むか、また現実的な応用における特定の課題にどのように適応されるかを広範囲に分析する。有望な方向性に関する詳細な議論を通じて、このような全体的な視点が、この分野およびその先における今後の探究を大いに促進できると信じている。

To cope with real-world dynamics, an intelligent system needs to incrementally acquire, update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as continual learning, provides a foundation for AI systems to develop themselves adaptively. In a general sense, continual learning is explicitly limited by catastrophic forgetting, where learning a new task usually results in a dramatic performance degradation of the old tasks. Beyond this, increasingly numerous advances have emerged in recent years that largely extend the understanding and application of continual learning. The growing and widespread interest in this direction demonstrates its realistic significance as well as complexity. In this work, we present a comprehensive survey of continual learning, seeking to bridge the basic settings, theoretical foundations, representative methods, and practical applications. Based on existing theoretical and empirical results, we summarize the general objectives of continual learning as ensuring a proper stability-plasticity trade-off and an adequate intra/inter-task generalizability in the context of resource efficiency. Then we provide a state-of-the-art and elaborated taxonomy, extensively analyzing how representative methods address continual learning, and how they are adapted to particular challenges in realistic applications. Through an in-depth discussion of promising directions, we believe that such a holistic perspective can greatly facilitate subsequent exploration in this field and beyond.

翻訳日:2024-02-07 21:14:18 公開日:2024-02-06

# 分数事後分布(fractional posteriors)を用いたセミパラメトリック推論

Semiparametric inference using fractional posteriors ( http://arxiv.org/abs/2301.08158v2 )

ライセンス: Link先を確認

Alice L'Huillier, Luke Travis, Ismaël Castillo and Kolyan Ray

(参考訳) ノンパラメトリック事前分布に基づく分数事後分布(fractional posterior)の、近似的に線形なセミパラメトリック汎関数に対する一般的なベルンシュタイン-フォン・ミーゼス定理を確立する。これを、多くのノンパラメトリックな設定と、ガウス過程事前分布を含むさまざまな事前分布のクラスについて例示する。分数事後分布の信用集合は信頼できるセミパラメトリックな不確実性定量化を与えうるが、そのサイズは過大になることを示す。これを是正するため、正則性条件の下で最適なサイズを持つ効率的な信頼集合となる、\textit{shifted-and-rescaled}(シフトと再スケーリングを施した)分数事後集合をさらに提案する。証明の一部として、収縮率の分数指数への依存性を精密化することで、分数事後分布に対する既存の収縮率の結果も改良する。

We establish a general Bernstein--von Mises theorem for approximately linear semiparametric functionals of fractional posterior distributions based on nonparametric priors. This is illustrated in a number of nonparametric settings and for different classes of prior distributions, including Gaussian process priors. We show that fractional posterior credible sets can provide reliable semiparametric uncertainty quantification, but have inflated size. To remedy this, we further propose a \textit{shifted-and-rescaled} fractional posterior set that is an efficient confidence set having optimal size under regularity conditions. As part of our proofs, we also refine existing contraction rate results for fractional posteriors by sharpening the dependence of the rate on the fractional exponent.
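
The "inflated size" of fractional posterior credible sets can be seen in a parametric toy case. A fractional posterior raises the likelihood to a power alpha in (0, 1) before combining it with the prior. The sketch below assumes a conjugate Gaussian-mean model with known noise level, which is far simpler than the semiparametric setting of the paper; the function name and parameters are hypothetical.

```python
import math


def fractional_posterior_normal(xs, sigma, prior_mean, prior_sd, alpha):
    """Fractional posterior of a Gaussian mean: prior times likelihood**alpha.

    Returns (posterior mean, posterior standard deviation).
    """
    n = len(xs)
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = alpha * n / sigma ** 2  # tempering scales the data precision by alpha
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * (sum(xs) / n))
    return post_mean, math.sqrt(post_var)


if __name__ == "__main__":
    xs = [0.1 * i for i in range(100)]
    mean_full, sd_full = fractional_posterior_normal(xs, 1.0, 0.0, 1e6, alpha=1.0)
    mean_frac, sd_frac = fractional_posterior_normal(xs, 1.0, 0.0, 1e6, alpha=0.25)
    # With a nearly flat prior the centres agree, but the spread grows like 1/sqrt(alpha).
    print(mean_full, mean_frac, sd_frac / sd_full)
```

In this toy model, shrinking the fractional credible interval by a factor of sqrt(alpha) around its centre recovers the width of the ordinary posterior interval. That is only meant to convey the intuition behind rescaling; the paper's shifted-and-rescaled set is defined for the semiparametric setting and is not reproduced here.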

翻訳日:2024-02-07 21:13:53 公開日:2024-02-06

# データソースに対する最適な正則化

Optimal Regularization for a Data Source ( http://arxiv.org/abs/2212.13597v3 )

ライセンス: Link先を確認

Oscar Leong, Eliza O'Reilly, Yong Sheng Soh and Venkat Chandrasekaran

(参考訳) 逆問題や統計的推定に対する最適化ベースのアプローチでは、データ忠実性を課す基準に、解の望ましい構造的性質を促進する正則化項を加えることが一般的である。適切な正則化項の選択は、通常、事前のドメイン知識と計算上の考慮の組み合わせによって行われる。凸正則化項は計算的に魅力的であるが、促進できる構造の種類には限界がある。一方、非凸正則化項は促進できる構造の形式においてより柔軟であり、いくつかの応用で強い経験的性能を示しているが、関連する最適化問題を解くという計算上の課題を伴う。本稿では、次の問いを調べることで、凸正則化の能力と限界を体系的に理解することを目指す。ある分布が与えられたとき、その分布から生成されるデータに対する最適な正則化項は何か。データソースのどのような性質が、最適な正則化項が凸であるかどうかを決めるのか。我々は、連続で、正斉次で、原点以外で正である汎関数によって定まる正則化項のクラスについて、これらの問いに取り組む。正則化項がデータ分布に対して最適であるとは、その正則化項をエネルギーとするギブス密度が、正則化項から導かれるすべてのギブス密度の中で母集団尤度を最大化する(同値には、クロスエントロピー損失を最小化する)ことをいう。我々が考える正則化項は星状体(star body)と1対1に対応するため、双対ブルン・ミンコフスキー理論を利用して、データ分布から導かれる動径関数が「計算上の十分統計量」に相当することを示す。これは、最適な正則化項を特定し、データソースが凸正則化にどの程度適しているかを評価するための鍵となる量である。

In optimization-based approaches to inverse problems and to statistical estimation, it is common to augment criteria that enforce data fidelity with a regularizer that promotes desired structural properties in the solution. The choice of a suitable regularizer is typically driven by a combination of prior domain information and computational considerations. Convex regularizers are attractive computationally but they are limited in the types of structure they can promote. On the other hand, nonconvex regularizers are more flexible in the forms of structure they can promote and they have showcased strong empirical performance in some applications, but they come with the computational challenge of solving the associated optimization problems. In this paper, we seek a systematic understanding of the power and the limitations of convex regularization by investigating the following questions: Given a distribution, what is the optimal regularizer for data drawn from the distribution? What properties of a data source govern whether the optimal regularizer is convex? We address these questions for the class of regularizers specified by functionals that are continuous, positively homogeneous, and positive away from the origin. We say that a regularizer is optimal for a data distribution if the Gibbs density with energy given by the regularizer maximizes the population likelihood (or equivalently, minimizes cross-entropy loss) over all regularizer-induced Gibbs densities. As the regularizers we consider are in one-to-one correspondence with star bodies, we leverage dual Brunn-Minkowski theory to show that a radial function derived from a data distribution is akin to a ``computational sufficient statistic'' as it is the key quantity for identifying optimal regularizers and for assessing the amenability of a data source to convex regularization.

翻訳日:2024-02-07 21:13:41 公開日:2024-02-06

# 有界報酬を持つ多腕バンディットのためのKullback-Leibler Maillard Sampling

Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards ( http://arxiv.org/abs/2304.14989v3 )

ライセンス: Link先を確認

Hao Qin, Kwang-Sung Jun and Chicheng Zhang

(参考訳) 各腕の報酬分布がすべて区間$[0,1]$上に台を持つ、$K$腕バンディット問題を研究する。この設定では、リグレット効率の良いランダム化探索アルゴリズムを設計することが課題となってきた。トンプソンサンプリングの魅力的な代替であるMaillard sampling~\cite{maillard13apprentissage}は、オフライン方策評価に有用な閉形式の行動確率を維持しながら、サブガウス報酬の設定~\cite{bian2022maillard}で競争力のあるリグレット保証を達成することが最近示されている。本研究では、KL型のギャップ依存リグレット上界を達成するためのMaillard samplingの自然な拡張である、Kullback-Leibler Maillard Sampling (KL-MS)アルゴリズムを提案する。KL-MSは、報酬がベルヌーイ分布に従う場合に漸近最適性を持ち、$O(\sqrt{\mu^*(1-\mu^*) K T \ln K} + K \ln T)$という形の最悪ケースのリグレット上界を持つことを示す。ここで、$\mu^*$は最適な腕の期待報酬、$T$は時間ホライズンの長さである。

We study $K$-armed bandit problems where the reward distributions of the arms are all supported on the $[0,1]$ interval. It has been a challenge to design regret-efficient randomized exploration algorithms in this setting. Maillard sampling~\cite{maillard13apprentissage}, an attractive alternative to Thompson sampling, has recently been shown to achieve competitive regret guarantees in the sub-Gaussian reward setting~\cite{bian2022maillard} while maintaining closed-form action probabilities, which is useful for offline policy evaluation. In this work, we propose the Kullback-Leibler Maillard Sampling (KL-MS) algorithm, a natural extension of Maillard sampling for achieving KL-style gap-dependent regret bound. We show that KL-MS enjoys the asymptotic optimality when the rewards are Bernoulli and has a worst-case regret bound of the form $O(\sqrt{\mu^*(1-\mu^*) K T \ln K} + K \ln T)$, where $\mu^*$ is the expected reward of the optimal arm, and $T$ is the time horizon length.
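
The closed-form action probabilities mentioned in this abstract can be sketched as follows. The abstract does not state the sampling rule, so the form below is an assumption modeled on Maillard sampling, with the squared empirical gap replaced by the binary KL divergence: arm a is drawn with probability proportional to exp(-N_a * kl(mean_a, best_mean)), where N_a is the pull count. Function names and the clipping constant are hypothetical.

```python
import math


def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """Binary relative entropy kl(p, q), with inputs clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ms_probabilities(means, counts):
    """Closed-form sampling probabilities under the assumed KL-MS rule.

    means[a]  : empirical mean reward of arm a, in [0, 1]
    counts[a] : number of times arm a has been pulled (assumed >= 1)
    """
    best = max(means)
    weights = [math.exp(-n * bernoulli_kl(m, best)) for m, n in zip(means, counts)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    probs = kl_ms_probabilities([0.9, 0.5, 0.2], [10, 10, 10])
    print(probs)
```

Because the probabilities are available in closed form, logged data can record the exact propensity of each action, which is the property the abstract highlights as useful for offline policy evaluation.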

翻訳日:2024-02-07 21:04:58 公開日:2024-02-06

# MUDiff:完全分子生成のための統一拡散

MUDiff: Unified Diffusion for Complete Molecule Generation ( http://arxiv.org/abs/2304.14621v3 )

ライセンス: Link先を確認

Chenqing Hua, Sitao Luan, Minkai Xu, Rex Ying, Jie Fu, Stefano Ermon, Doina Precup

(参考訳) 分子生成は、創薬や材料設計に利用される非常に重要な実用的問題であり、AI手法は有用な解決策を提供することが期待されている。しかし、既存の分子生成法は2次元グラフ構造か3次元幾何構造のいずれかに焦点を合わせており、2次元グラフは主にトポロジーを、3次元幾何は主に原子の空間配置を捉えるため、完全な分子を表現するには不十分である。分子をよりよく表現するには、これらの表現を組み合わせることが不可欠である。本稿では、離散的および連続的な拡散過程を組み合わせることで、原子の特徴、2次元の離散的な分子構造、3次元の連続的な分子座標を含む、分子の包括的な表現を生成する新しいモデルを提案する。拡散過程を用いることで、分子過程の確率的性質を捉え、さまざまな要因が分子構造に与える影響を探ることができる。さらに、拡散過程のデノイジングを行うための新しいグラフトランスフォーマーアーキテクチャを提案する。このトランスフォーマーは3次元の回転・並進同変性の制約に従い、原子座標の同変性を保ちながら、不変な原子およびエッジの表現を学習できる。このトランスフォーマーは、幾何学的変換に対して頑健な分子表現の学習に利用できる。実験と既存手法との比較を通じてモデルの性能を評価し、より安定で妥当な分子を生成できることを示す。我々のモデルは、安定で多様な分子を設計するための有望なアプローチであり、分子モデリングの幅広いタスクに適用できる。

Molecule generation is a very important practical problem, with uses in drug discovery and material design, and AI methods promise to provide useful solutions. However, existing methods for molecule generation focus either on 2D graph structure or on 3D geometric structure, which is not sufficient to represent a complete molecule as 2D graph captures mainly topology while 3D geometry captures mainly spatial atom arrangements. Combining these representations is essential to better represent a molecule. In this paper, we present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates, by combining discrete and continuous diffusion processes. The use of diffusion processes allows for capturing the probabilistic nature of molecular processes and exploring the effect of different factors on molecular structures. Additionally, we propose a novel graph transformer architecture to denoise the diffusion process. The transformer adheres to 3D roto-translation equivariance constraints, allowing it to learn invariant atom and edge representations while preserving the equivariance of atom coordinates. This transformer can be used to learn molecular representations robust to geometric transformations. We evaluate the performance of our model through experiments and comparisons with existing methods, showing its ability to generate more stable and valid molecules. Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.

翻訳日:2024-02-07 21:04:25 公開日:2024-02-06

# Expand-and-Cluster: ニューラルネットワークのパラメータ復元

Expand-and-Cluster: Parameter Recovery of Neural Networks ( http://arxiv.org/abs/2304.12794v3 )

ライセンス: Link先を確認

Flavio Martinelli, Berfin Simsek, Wulfram Gerstner and Johanni Brea

(参考訳) 入出力写像を調べることで、ニューラルネットワークのパラメータを同定できるだろうか。通常は、置換、過剰パラメータ化、活性化関数の対称性のために、一意な解は存在しない。しかし、各ニューロンへの入力重みベクトルは、活性化関数に応じて、符号またはスケーリングの自由度を除いて同定可能であることを示す。一般的に使用されるすべての活性化関数に対して、提案手法「Expand-and-Cluster」は、ターゲットネットワークのサイズとパラメータを2つの段階で同定する。 (i) 問題の非凸性を緩和するために、サイズを拡張した複数の生徒ネットワークを訓練し、ターゲットネットワークの写像を模倣させる。 (ii) ターゲットネットワークを同定するために、クラスタリング手法を用いて、生徒間で共有される重みベクトルを明らかにする。訓練済みの浅いネットワークと深いネットワークについて、ニューロン数のオーバーヘッドが10%未満でパラメータとサイズの復元に成功することを示し、難易度の異なる150の合成問題を分析することで「同定しやすさ」の軸を記述する。

Can we identify the parameters of a neural network by probing its input-output mapping? Usually, there is no unique solution because of permutation, overparameterisation and activation function symmetries. Yet, we show that the incoming weight vector of each neuron is identifiable up to sign or scaling, depending on the activation function. For all commonly used activation functions, our novel method 'Expand-and-Cluster' identifies the size and parameters of a target network in two phases: (i) to relax the non-convexity of the problem, we train multiple student networks of expanded size to imitate the mapping of the target network; (ii) to identify the target network, we employ a clustering procedure and uncover the weight vectors shared between students. We demonstrate successful parameter and size recovery of trained shallow and deep networks with less than 10% overhead in the neuron number and describe an 'ease-of-identifiability' axis by analysing 150 synthetic problems of variable difficulty.

翻訳日:2024-02-07 21:04:00 公開日:2024-02-06

# $L$-subexponential covariates におけるスパース線形回帰係数の推定

Estimation of sparse linear regression coefficients under $L$-subexponential covariates ( http://arxiv.org/abs/2304.11958v2 )

ライセンス: Link先を確認

Takeyuki Sasai

(参考訳) 共変量が $L$-subexponential な確率ベクトルからサンプリングされる場合の、線形回帰におけるスパース係数の推定に取り組む。このベクトルは、ガウス確率ベクトルよりも裾の重い分布のクラスに属する。先行研究では、ガウス確率ベクトルの場合と同様の誤差境界が確立されている。しかし、これらの手法は誤差境界を導出するために、ガウス確率ベクトルの場合よりも強い条件を必要とする。本研究では、共変量が $L$-subexponential な確率ベクトルから得られる場合に、より強い条件を課すことなく、ガウス確率ベクトルに対して得られるものと定数倍を除いて同一の誤差境界を示す。興味深いことに、我々は共変量ではなく裾の重いランダムノイズに対する頑健性で知られる $\ell_1$-penalized Huber 回帰を用いる。本研究は、$\ell_1$-penalized Huber 回帰法の新たな側面を明らかにするものと考える。

We tackle estimating sparse coefficients in a linear regression when the covariates are sampled from an $L$-subexponential random vector. This vector belongs to a class of distributions that exhibit heavier tails than Gaussian random vectors. Previous studies have established error bounds similar to those derived for Gaussian random vectors. However, these methods require stronger conditions than those used for Gaussian random vectors to derive the error bounds. In this study, we present an error bound identical to the one obtained for Gaussian random vectors up to constant factors without imposing stronger conditions, when the covariates are drawn from an $L$-subexponential random vector. Interestingly, we employ an $\ell_1$-penalized Huber regression, which is known for its robustness against heavy-tailed random noise rather than covariates. We believe that this study uncovers a new aspect of the $\ell_1$-penalized Huber regression method.
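
For orientation, an $\ell_1$-penalized Huber regression estimator is commonly written as below. This is the textbook form and not necessarily the exact objective or scaling analysed in the paper; the symbols $n$, $d$, $x_i$, $y_i$, $\tau$ and $\lambda$ are notation introduced here.

```latex
\hat{\beta} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^d}
  \sum_{i=1}^{n} H_\tau\!\left(y_i - x_i^\top \beta\right) + \lambda \lVert \beta \rVert_1,
\qquad
H_\tau(t) =
\begin{cases}
  \tfrac{1}{2} t^2, & |t| \le \tau, \\
  \tau |t| - \tfrac{1}{2} \tau^2, & |t| > \tau.
\end{cases}
```

The loss is quadratic for small residuals and linear for large ones, which limits the influence of any single large residual, while the $\ell_1$ term encourages sparse $\hat{\beta}$.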

翻訳日:2024-02-07 21:03:43 公開日:2024-02-06

# RMTによる100万トークン以上のTransformerのスケーリング

Scaling Transformer to 1M tokens and beyond with RMT ( http://arxiv.org/abs/2304.11062v2 )

ライセンス: Link先を確認

Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev

(参考訳) Transformerで解ける問題の範囲を広げる上での大きな制限は、入力サイズに対する計算量の2次スケーリングである。本研究では,計算量を線形スケーリングに抑えながら入力コンテキスト長を拡張するために,事前学習済みTransformerモデルの再帰的メモリ拡張について検討する。提案手法は,検索精度を高く保ちつつ,前例のない最大200万トークンのシーケンスの情報をメモリに格納できることを実証する。言語モデリングタスクを用いた実験では、処理された入力セグメントの数が増えるにつれてパープレキシティが改善する。これらの結果から,自然言語理解および生成タスクにおける長期依存性処理の強化や,メモリ集約型アプリケーションにおける大規模コンテキスト処理の実現に重要な可能性を持つ本手法の有効性が示唆された。

A major limitation for the broader scope of problems solvable by transformers is the quadratic scaling of computational complexity with input size. In this study, we investigate the recurrent memory augmentation of pre-trained transformer models to extend input context length while linearly scaling compute. Our approach demonstrates the capability to store information in memory for sequences of up to an unprecedented two million tokens while maintaining high retrieval accuracy. Experiments with language modeling tasks show perplexity improvement as the number of processed input segments increases. These results underscore the effectiveness of our method, which has significant potential to enhance long-term dependency handling in natural language understanding and generation tasks, as well as enable large-scale context processing for memory-intensive applications.
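
The linear-versus-quadratic contrast can be sketched in a few lines. This is a toy cost model under stated assumptions and does not implement RMT: `step` stands in for one transformer pass over a segment, the function names and `memory_len` are hypothetical, and the per-window cost simply counts token pairs.

```python
def split_into_segments(tokens, segment_len):
    """Cut a long token list into consecutive fixed-size segments."""
    return [tokens[i:i + segment_len] for i in range(0, len(tokens), segment_len)]

def run_with_memory(tokens, segment_len, memory, step):
    """Process segments in order, passing the memory state from one to the next."""
    for segment in split_into_segments(tokens, segment_len):
        memory = step(memory, segment)
    return memory

def full_attention_cost(n_tokens):
    """Pairwise attention over the whole input: quadratic in its length."""
    return n_tokens ** 2

def recurrent_attention_cost(n_tokens, segment_len, memory_len):
    """Pairwise attention inside each (memory + segment) window only."""
    n_segments = -(-n_tokens // segment_len)  # ceiling division
    return n_segments * (segment_len + memory_len) ** 2

# Toy stand-in for the transformer: the "memory" is a running sum of token ids.
total = run_with_memory(list(range(10)), segment_len=4, memory=0,
                        step=lambda mem, seg: mem + sum(seg))
```

With a fixed segment and memory size, doubling the input doubles the number of windows and therefore the cost, whereas full attention would quadruple it.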

翻訳日:2024-02-07 21:03:27 公開日:2024-02-06

# マージンに沿う : マージン化コミュニティの社会プラットフォームに関する倫理的懸念

Along the Margins: Marginalized Communities' Ethical Concerns about Social Platforms ( http://arxiv.org/abs/2304.08882v2 )

ライセンス: Link先を確認

Lauren Olson and Emitzá Guzmán and Florian Kunneman

(参考訳) 本稿では,周縁化されたコミュニティがソーシャルプラットフォームに対して抱く倫理的懸念を明らかにする。最近のプラットフォームの不正行為は、ソフトウェアチームがユーザーの懸念よりも株主の懸念を優先していることを示している。さらに、これらのプラットフォームの欠陥は、周縁化された人々にしばしば壊滅的な影響を及ぼす。まず、周縁化されたコミュニティの586のサブレディットをスクレイピングし、ソーシャルプラットフォームへの言及のデータセットを集約し、これらのデータにおける倫理的懸念への言及を手動でアノテーションした。その後,手動でアノテーションしたデータの傾向を分析し,自然言語処理(NLP)によって倫理的懸念をどの程度自動分類できるかを検証した。周縁化されたコミュニティの倫理的懸念は主に差別と誤った表象をめぐるものであり、現在のソフトウェア開発プラクティスの欠陥を明らかにしている。そのため、研究者や開発者は、我々の研究を利用してこれらの懸念をさらに調査し、現在のソフトウェアの欠陥を是正することができる。

In this paper, we identified marginalized communities' ethical concerns about social platforms. We performed this identification because recent platform malfeasance indicates that software teams prioritize shareholder concerns over user concerns. Additionally, these platform shortcomings often have devastating effects on marginalized populations. We first scraped 586 marginalized communities' subreddits, aggregated a dataset of their social platform mentions and manually annotated mentions of ethical concerns in these data. We subsequently analyzed trends in the manually annotated data and tested the extent to which ethical concerns can be automatically classified by means of natural language processing (NLP). We found that marginalized communities' ethical concerns predominantly revolve around discrimination and misrepresentation, and reveal deficiencies in current software development practices. As such, researchers and developers could use our work to further investigate these concerns and rectify current software flaws.

翻訳日:2024-02-07 21:02:57 公開日:2024-02-06

# 封建グラフ強化学習

Feudal Graph Reinforcement Learning ( http://arxiv.org/abs/2304.05099v2 )

ライセンス: Link先を確認

Tommaso Marzi, Arshjot Khehra, Andrea Cini, Cesare Alippi

(参考訳) グラフベースの表現と重み付けモジュールポリシーは、強化学習(RL)における構成可能な制御問題に対処するための顕著なアプローチである。しかし、最近のグラフ深層学習文献で示されているように、メッセージパッシング演算子は情報伝達のボトルネックを生じさせ、グローバルな調整を妨げる。ハイレベルな計画が必要なタスクでは、この問題は劇的になります。本研究では,階層的RLとピラミッド型メッセージパッシングアーキテクチャに頼って,このような課題に対処する新しい手法であるFeudal Graph Reinforcement Learning (FGRL)を提案する。特に、fgrlは、階層の上部から階層化されたグラフ構造を通じてハイレベルなコマンドが伝播するポリシーの階層を定義する。下層は物理系の形態を模倣し、上層はより抽象的なサブモジュールをキャプチャする。結果として得られたエージェントは、あるレベルのアクションが以下のレベルの目標を設定するポリシー委員会によって特徴づけられ、タスクの分解を包含する階層的な意思決定構造を実装する。提案手法をベンチマークmujoco環境上で評価し,fgrlが関連するベースラインと好適に比較できることを示す。さらに、コマンド伝搬機構の詳細な分析により、メッセージパッシング方式が階層的な意思決定方針の学習に有利であることを示す。

Graph-based representations and weight-sharing modular policies constitute prominent approaches to tackling composable control problems in Reinforcement Learning (RL). However, as shown by recent graph deep learning literature, message-passing operators can create bottlenecks in information propagation and hinder global coordination. The issue becomes dramatic in tasks where high-level planning is needed. In this work, we propose a novel methodology, named Feudal Graph Reinforcement Learning (FGRL), that addresses such challenges by relying on hierarchical RL and a pyramidal message-passing architecture. In particular, FGRL defines a hierarchy of policies where high-level commands are propagated from the top of the hierarchy down through a layered graph structure. The bottom layers mimic the morphology of the physical system, while the upper layers capture more abstract sub-modules. The resulting agents are then characterized by a committee of policies where actions at a certain level set goals for the level below, thus implementing a hierarchical decision-making structure that encompasses task decomposition. We evaluate the proposed framework on locomotion tasks on benchmark MuJoCo environments and show that FGRL compares favorably against relevant baselines. Furthermore, an in-depth analysis of the command propagation mechanism provides evidence that the introduced message-passing scheme favors the learning of hierarchical decision-making policies.

翻訳日:2024-02-07 21:02:27 公開日:2024-02-06

# 機械理解による子どものビデオの学習品質の定量化

Quantifying the Academic Quality of Children's Videos using Machine Comprehension ( http://arxiv.org/abs/2303.17201v2 )

ライセンス: Link先を確認

Sumeet Kumar, Mallikarjuna T., Ashiqur Khudabukhsh

(参考訳) youtube kids (ytk) は、何百万人もの子どもが毎日使っている最も人気のある子供向けアプリケーションの一つである。しかし、さまざまな研究がプラットフォーム上のビデオに対する懸念を強調している。 youtubeは先日,‘promoting learning’を含む高品質なガイドラインを提案し,ランキングチャネルで使用することを提案している。しかし、学習の概念は多面的であり、オンラインビデオの文脈で定義・測定することは困難である。本研究は、学校で教えられていることの学習に焦点を当て、子どものビデオの学術的品質を測定する方法を提案する。子どものビデオからの質問と回答の新しいデータセットを用いて、まず、学習の可読性(Reading Comprehension, RC)モデルを推定できることを示す。次に,多種多様な話題に関する中学校教科書質問の大規模データセットを用いて,rcモデルが正しく回答できる児童教科書質問数として上位チャネルの学術的品質を定量化する。トップ100のチャンネルに投稿された8万本のビデオを分析して、YTKのチャンネルの学術的品質を初めて詳細に分析した。

YouTube Kids (YTK) is one of the most popular kids' applications used by millions of kids daily. However, various studies have highlighted concerns about the videos on the platform, like the over-presence of entertaining and commercial content. YouTube recently proposed high-quality guidelines that include 'promoting learning' and proposed to use them in ranking channels. However, the concept of learning is multi-faceted, and it can be difficult to define and measure in the context of online videos. This research focuses on learning in terms of what's taught in schools and proposes a way to measure the academic quality of children's videos. Using a new dataset of questions and answers from children's videos, we first show that a Reading Comprehension (RC) model can estimate academic learning. Then, using a large dataset of middle school textbook questions on diverse topics, we quantify the academic quality of top channels as the number of children's textbook questions that an RC model can correctly answer. By analyzing over 80,000 videos posted on the top 100 channels, we present the first thorough analysis of the academic quality of channels on YTK.

翻訳日:2024-02-07 21:01:28 公開日:2024-02-06

# エキゾチック局所次元を用いた安定化符号

Stabilizer Codes with Exotic Local-dimensions ( http://arxiv.org/abs/2303.17000v2 )

ライセンス: Link先を確認

Lane G. Gunderman

(参考訳) 従来の安定化符号は素電力ローカルディメンション上で動作する。本研究では、局所次元不変条件を用いて安定化器の形式を拡張し、これらの標準局所次元から他のケースへ安定化器コードをインポートする。特に,従来の安定化符号は相空間と離散位相空間の制約を考慮することで,アナログ連続変数符号に利用できることを示す。これにより、このフレームワークは従来の安定化コードと同じ基盤に置かれる。これに続いて、先行アイデアの拡張を用いて、元来有限フィールド局所ディメンションで設計された安定化コードは、任意の積分領域に対して同じ$n$、$k$、$d$パラメータを持つコードに変換できることを示す。これは理論的な関心事であり、局所次元が数学的な環によってよりよく説明され、情報を保護するために従来の安定化符号を使うことを可能にするシステムにも利用できる。

Traditional stabilizer codes operate over prime power local-dimensions. In this work we extend the stabilizer formalism using the local-dimension-invariant setting to import stabilizer codes from these standard local-dimensions to other cases. In particular, we show that any traditional stabilizer code can be used for analog continuous-variable codes, and consider restrictions in phase space and discretized phase space. This puts the framework on an equal footing with traditional stabilizer codes. Following this, using extensions of prior ideas, we show that a stabilizer code originally designed with a finite field local-dimension can be transformed into a code with the same $n$, $k$, and $d$ parameters for any integral domain. This is of theoretical interest and can be of use for systems whose local-dimension is better described by mathematical rings, which permits the use of traditional stabilizer codes for protecting their information as well.
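
Why changing the local dimension is not automatic can be seen from a commutation check. The following is a minimal sketch, assuming the standard exponent-vector (symplectic) representation of generalized Pauli operators; the function names are illustrative, and the example only shows commutation, not the preservation of the distance $d$ that the paper also addresses.

```python
def symplectic_product(p, q):
    """Integer symplectic product of two generalized Paulis.
    Each Pauli is (x, z): the exponent vectors of X^x Z^z on every qudit."""
    (x1, z1), (x2, z2) = p, q
    return sum(a * b for a, b in zip(x1, z2)) - sum(a * b for a, b in zip(z1, x2))

def commute(p, q, d):
    """Two generalized Paulis commute in local dimension d iff the product is 0 mod d."""
    return symplectic_product(p, q) % d == 0

# Qubit generators XXXX and ZZZZ of the four-qubit [[4,2,2]] code.
xxxx = ((1, 1, 1, 1), (0, 0, 0, 0))
zzzz = ((0, 0, 0, 0), (1, 1, 1, 1))

# The same qubit operator as XXXX (since -1 = 1 mod 2), rewritten so that the
# integer symplectic product with ZZZZ is exactly zero.
xxxx_ldi = ((1, 1, -1, -1), (0, 0, 0, 0))
```

The original pair commutes for qubits but not for qutrits (`commute(xxxx, zzzz, 3)` is false), whereas the rewritten pair has an integer product of zero and therefore commutes in every local dimension.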

翻訳日:2024-02-07 21:01:08 公開日:2024-02-06

# 学習可能なグラフマッチング: データアソシエーションのための実践的パラダイム

Learnable Graph Matching: A Practical Paradigm for Data Association ( http://arxiv.org/abs/2303.15414v2 )

ライセンス: Link先を確認

Jiawei He, Zehao Huang, Naiyan Wang, Zhaoxiang Zhang

(参考訳) データアソシエーションは、複数のオブジェクト追跡、画像マッチング、ポイントクラウド登録など、多くのコンピュータビジョンタスクの中核にある。しかしながら、現在のデータアソシエーションソリューションには、主にビュー内コンテキスト情報を無視する、あるいは、深いアソシエーションモデルをエンドツーエンドでトレーニングする、最適化ベースの割り当て手法の利点をほとんど活用しない、あるいは、オフザシェルニューラルネットワークを使用して特徴を抽出する、といった、いくつかの欠陥がある。本稿では,これらの問題に対処するために,一般学習可能なグラフマッチング手法を提案する。特に、ビュー内関係を無向グラフとしてモデル化する。そして、データアソシエーションはグラフ間の一般的なグラフマッチング問題となる。さらに、エンドツーエンドの微分を可能にするため、元のグラフマッチング問題を2次連続プログラミングに緩和し、KKT条件と暗黙関数定理を備えたディープグラフニューラルネットワークにトレーニングを組み込む。 MOTタスクでは,複数のMOTデータセット上での最先端性能を実現する。画像マッチングでは,一般的な屋内データセットであるScanNetで最先端の手法より優れている。ポイントクラウドの登録については、競争結果も達成します。コードはhttps://github.com/jiaweihe1996/gmtrackerで入手できる。

Data association is at the core of many computer vision tasks, e.g., multiple object tracking, image matching, and point cloud registration. However, current data association solutions have some defects: they mostly ignore the intra-view context information; besides, they either train deep association models in an end-to-end way and hardly utilize the advantage of optimization-based assignment methods, or only use an off-the-shelf neural network to extract features. In this paper, we propose a general learnable graph matching method to address these issues. Specifically, we model the intra-view relationships as an undirected graph. Then data association turns into a general graph matching problem between graphs. Furthermore, to make optimization end-to-end differentiable, we relax the original graph matching problem into continuous quadratic programming and then incorporate training into a deep graph neural network with KKT conditions and the implicit function theorem. In the MOT task, our method achieves state-of-the-art performance on several MOT datasets. For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet. For point cloud registration, we also achieve competitive results. Code will be available at https://github.com/jiaweihe1996/GMTracker.

翻訳日:2024-02-07 21:00:54 公開日:2024-02-06

# 不均一連関学習におけるクライアントドリフト最小化のための適応的自己蒸留

Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning ( http://arxiv.org/abs/2305.19600v3 )

ライセンス: Link先を確認

M.Yashwanth, Gaurav Kumar Nayak, Arya Singh, Yogesh Simmhan, Anirban Chakraborty

(参考訳) Federated Learning(FL)は、クライアントがローカルトレーニングデータを共有せずに、局所的にトレーニングされたモデルを集約することで、グローバルモデルの共同トレーニングを可能にする機械学習パラダイムである。実際には、各クライアントが観測するローカルデータ分布にまたがる実質的な不均一性(例えばクラス不均衡)がしばしば存在する。このようなクライアント間の非IDデータ分散では、FLは、すべてのクライアントが自身のローカルな最適化にドリフトする'クライアント-ドリフト'問題に悩まされる。これにより、集約モデルの収束が遅くなり、性能が低下する。この制限に対処するために、クライアント側でのトレーニングモデルのための適応自己蒸留(ASD)に基づく新しい正規化手法を提案する。我々の正規化スキームは、グローバルモデルエントロピーとクライアントのラベル分布に基づいて、クライアントのトレーニングデータに適応的に調整する。提案した正規化は、既存の最先端のFLアルゴリズム上で容易に統合することができ、これらのオフ・ザ・シェルフ法の性能がさらに向上する。理論的には、ASDがクライアントのドリフトを減らし、その一般化能力を説明する。提案手法の有効性を,複数の実世界のベンチマーク実験により実証し,最先端手法よりも高い性能を示した。

Federated Learning (FL) is a machine learning paradigm that enables clients to jointly train a global model by aggregating the locally trained models without sharing any local training data. In practice, there can often be substantial heterogeneity (e.g., class imbalance) across the local data distributions observed by each of these clients. Under such non-iid data distributions across clients, FL suffers from the 'client-drift' problem where every client drifts to its own local optimum. This results in slower convergence and poor performance of the aggregated model. To address this limitation, we propose a novel regularization technique based on adaptive self-distillation (ASD) for training models on the client side. Our regularization scheme adaptively adjusts to the client's training data based on the global model entropy and the client's label distribution. The proposed regularization can be easily integrated atop existing, state-of-the-art FL algorithms, leading to a further boost in the performance of these off-the-shelf methods. We theoretically explain how ASD reduces client-drift and also explain its generalization ability. We demonstrate the efficacy of our approach through extensive experiments on multiple real-world benchmarks and show substantial gains in performance over state-of-the-art methods.
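
The shape of such a client-side objective can be sketched as follows. The abstract does not give the formula, so the weighting used here (larger when the global model is confident and when the label is rare on the client) is an assumption, as are the function names and the parameter `lam`.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def asd_style_loss(local_logits, global_logits, label, label_prior, lam=1.0):
    """Cross-entropy on the local label plus an adaptively weighted
    distillation term pulling the local model toward the global one."""
    p_local = softmax(local_logits)
    p_global = softmax(global_logits)
    cross_entropy = -math.log(p_local[label])
    # ASSUMED weighting: larger when the global model is confident (low entropy)
    # and when the label is rare on this client (small prior).
    weight = math.exp(-entropy(p_global)) / label_prior[label]
    return cross_entropy + lam * weight * kl_divergence(p_global, p_local)

prior = [0.9, 0.1]  # hypothetical client label distribution
same = asd_style_loss([2.0, 0.5], [2.0, 0.5], label=0, label_prior=prior)
drifted = asd_style_loss([2.0, 0.5], [0.5, 2.0], label=0, label_prior=prior)
```

When the local and global predictions agree, the extra term vanishes and only the cross-entropy remains; the further the local model drifts from the global one, the larger the penalty.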

翻訳日:2024-02-07 20:53:10 公開日:2024-02-06

# 点雲上の深層学習のための滑らかで正確な回転対称性

Smooth, exact rotational symmetrization for deep learning on point clouds ( http://arxiv.org/abs/2305.19302v3 )

ライセンス: Link先を確認

Sergey N. Pozdnyakov and Michele Ceriotti

(参考訳) 点雲は3Dオブジェクトの汎用表現であり、科学や工学に広く応用されている。入力として使用するディープラーニングモデルが数多く提案されている。化学・材料モデリングの分野は、モデルが実際に使用可能であるためには物理的制約の厳密な遵守が極めて望ましいため、特に困難である。これらの制約には、同一原子の翻訳、回転、置換に関する滑らかさと不変性が含まれる。これらの要件が厳密に満たされていない場合、モデルに優れた精度があるとしても、原子論シミュレーションはばかげた結果をもたらす可能性がある。その結果、設計空間を制限して不変性を実現する専用アーキテクチャが開発された。汎用のポイントクラウドモデルはより多様であるが、しばしば回転対称性を無視する。任意のモデルに回転同分散を付加し、他の全ての要求を保存できる一般対称性法を提案する。このアプローチは、設計空間の制約を緩和し、他の領域で効果的なアイデアを取り入れることで、原子スケールの機械学習スキームの開発を単純化する。このアイデアは,本質的同変ではないが,分子や固体のベンチマークデータセット上での最先端性能を実現するPoint Edge Transformer (PET) アーキテクチャを導入することで実証する。一般プロトコルのA-posteriori適用により,PETの精度は最小限に抑えられた。

Point clouds are versatile representations of 3D objects and have found widespread application in science and engineering. Many successful deep-learning models have been proposed that use them as input. The domain of chemical and materials modeling is especially challenging because exact compliance with physical constraints is highly desirable for a model to be usable in practice. These constraints include smoothness and invariance with respect to translations, rotations, and permutations of identical atoms. If these requirements are not rigorously fulfilled, atomistic simulations might lead to absurd outcomes even if the model has excellent accuracy. Consequently, dedicated architectures, which achieve invariance by restricting their design space, have been developed. General-purpose point-cloud models are more varied but often disregard rotational symmetry. We propose a general symmetrization method that adds rotational equivariance to any given model while preserving all the other requirements. Our approach simplifies the development of better atomic-scale machine-learning schemes by relaxing the constraints on the design space and making it possible to incorporate ideas that proved effective in other domains. We demonstrate this idea by introducing the Point Edge Transformer (PET) architecture, which is not intrinsically equivariant but achieves state-of-the-art performance on several benchmark datasets of molecules and solids. A-posteriori application of our general protocol makes PET exactly equivariant, with minimal changes to its accuracy.
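
The general idea of symmetrizing an unconstrained model can be illustrated with plain group averaging. This toy is much simpler than the paper's protocol, which handles continuous 3D rotations while keeping the model smooth: it works in 2D with the four-element rotation group C4, and `raw_model` is an arbitrary stand-in chosen here.

```python
def rotate90(points):
    """Rotate a 2D point cloud by 90 degrees about the origin."""
    return [(-y, x) for x, y in points]

def raw_model(points):
    """Stand-in for an unconstrained model: deliberately not rotation invariant."""
    return sum(x * x + 0.5 * x * y + x for x, y in points)

def symmetrized_model(points):
    """Average the raw model over the four rotations of the cyclic group C4."""
    total, current = 0.0, points
    for _ in range(4):
        total += raw_model(current)
        current = rotate90(current)
    return total / 4.0

cloud = [(1.0, 2.0), (-0.5, 0.25), (3.0, -1.0)]
```

`raw_model` changes when `cloud` is rotated, while `symmetrized_model` returns the same value for `cloud` and `rotate90(cloud)`, because rotating the input only permutes the four terms being averaged.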

翻訳日:2024-02-07 20:52:50 公開日:2024-02-06

# 非線形リカレントニューラルネットワークの逆近似理論

Inverse Approximation Theory for Nonlinear Recurrent Neural Networks ( http://arxiv.org/abs/2305.19190v4 )

ライセンス: Link先を確認

Shida Wang, Zhong Li and Qianxiao Li

(参考訳) 本研究では,recurrent neural network (rnns) を用いた非線形シーケンス-シーケンス関係の近似に対する逆近似定理を証明した。これはいわゆるベルンシュタイン型近似理論の結果であり、仮説空間によって効果的に近似できるという仮定の下で対象関数の性質を推論する。特に、非線形RNNによって安定に近似できる非線形シーケンス関係は、指数関数的に減衰するメモリ構造を持つ必要がある。これは線形rnnにおけるメモリの呪いを一般的な非線形設定に拡張し、長期記憶とのシーケンシャルな関係を学習するためのrnnアーキテクチャの本質的な制限を定量化する。そこで本研究では,その限界を克服する原理的パラメータ化手法を提案する。理論的結果は数値実験によって確認される。コードはhttps://github.com/radarfudan/curse-of-memoryでリリースされている。

We prove an inverse approximation theorem for the approximation of nonlinear sequence-to-sequence relationships using recurrent neural networks (RNNs). This is a so-called Bernstein-type result in approximation theory, which deduces properties of a target function under the assumption that it can be effectively approximated by a hypothesis space. In particular, we show that nonlinear sequence relationships that can be stably approximated by nonlinear RNNs must have an exponentially decaying memory structure - a notion that can be made precise. This extends the previously identified curse of memory in linear RNNs into the general nonlinear setting, and quantifies the essential limitations of the RNN architecture for learning sequential relationships with long-term memory. Based on the analysis, we propose a principled reparameterization method to overcome the limitations. Our theoretical results are confirmed by numerical experiments. The code has been released at https://github.com/radarFudan/Curse-of-memory.
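
Exponentially decaying memory is easy to observe in a small stable recurrence. The sketch below illustrates the phenomenon only and is not the theorem, which argues in the opposite direction (from approximability to memory decay); the scalar model, the weight `w = 0.8` and the input sequences are choices made here.

```python
import math

def run_rnn(inputs, w=0.8):
    """Scalar recurrence h_t = tanh(w * h_{t-1} + x_t), starting from h = 0."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w * h + x)
        states.append(h)
    return states

# Two input sequences that differ only in their first element.
base = [0.5] + [0.1] * 30
perturbed = [0.9] + [0.1] * 30
gaps = [abs(a - b) for a, b in zip(run_rnn(base), run_rnn(perturbed))]
```

Because `tanh` is 1-Lipschitz, the gap between the two hidden states shrinks by at least a factor of `w` per step, so the influence of the first input is bounded by `0.8 ** t` times its initial effect.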

翻訳日:2024-02-07 20:52:17 公開日:2024-02-06

# 確率的時系列予測のためのより良いバッチ

Better Batch for Deep Probabilistic Time Series Forecasting ( http://arxiv.org/abs/2305.17028v3 )

ライセンス: Link先を確認

Vincent Zhihao Zheng, Seongjin Choi, Lijun Sun

(参考訳) 深い確率的時系列予測は、非線形近似における優れた性能と、意思決定に価値ある不確実な定量化を提供する能力に注目されている。しかし、既存のモデルは、時間に依存しないエラープロセスを仮定し、シリアル相関を見越して問題を単純化することが多い。この制限を克服するため,確率予測精度を向上させるために,誤り自己相関を取り入れた革新的なトレーニング手法を提案する。本手法は,モデルトレーニングのためのD$連続時系列セグメントのコレクションとしてミニバッチを構築する。各ミニバッチ上で時間変化共分散行列を明示的に学習し、隣接する時間ステップ間の誤差相関を符号化する。学習された共分散行列は予測精度の向上と不確かさの定量化に利用できる。 2つの異なるニューラル予測モデルと複数の公開データセットで本手法を評価する。実験の結果,提案手法の有効性が検証され,予測精度が大幅に向上した。

Deep probabilistic time series forecasting has gained attention for its superior performance in nonlinear approximation and its capability to offer valuable uncertainty quantification for decision-making. However, existing models often oversimplify the problem by assuming a time-independent error process, overlooking serial correlation. To overcome this limitation, we propose an innovative training method that incorporates error autocorrelation to enhance probabilistic forecasting accuracy. Our method constructs a mini-batch as a collection of $D$ consecutive time series segments for model training. It explicitly learns a time-varying covariance matrix over each mini-batch, encoding error correlation among adjacent time steps. The learned covariance matrix can be used to improve prediction accuracy and enhance uncertainty quantification. We evaluate our method on two different neural forecasting models and multiple public datasets. Experimental results confirm the effectiveness of the proposed approach in improving the performance of both models across a range of datasets, resulting in notable improvements in predictive accuracy.
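
The effect of modelling error autocorrelation can be seen on a handful of residuals. This is a sketch under a strong simplifying assumption: instead of the learned, time-varying covariance matrix described above, it uses a fixed stationary AR(1) covariance $\sigma^2 \rho^{|i-j|}$, for which the Gaussian likelihood factorises in closed form. The function name and the residual values are made up.

```python
import math

def ar1_gaussian_nll(residuals, sigma=1.0, rho=0.0):
    """Negative log-likelihood of residuals under a zero-mean Gaussian whose
    covariance is sigma^2 * rho^|i-j| (stationary AR(1) errors)."""
    var = sigma ** 2
    # First residual: marginal N(0, sigma^2).
    nll = 0.5 * (math.log(2 * math.pi * var) + residuals[0] ** 2 / var)
    # Later residuals: N(rho * previous, sigma^2 * (1 - rho^2)).
    innovation_var = var * (1.0 - rho ** 2)
    for prev, cur in zip(residuals, residuals[1:]):
        innovation = cur - rho * prev
        nll += 0.5 * (math.log(2 * math.pi * innovation_var)
                      + innovation ** 2 / innovation_var)
    return nll

residuals = [1.0, 0.9, 1.1, 1.0]  # consecutive errors that move together
independent = ar1_gaussian_nll(residuals, rho=0.0)
correlated = ar1_gaussian_nll(residuals, rho=0.8)
```

For residuals that move together like these, the correlated model assigns a lower negative log-likelihood than the independent one, which is the kind of signal a training loss with error correlation can exploit.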

翻訳日:2024-02-07 20:52:03 公開日:2024-02-06

# 生物学的データを用いたグラフニューラルネットワークのサイズ一般化:スペクトルの観点からの考察と実践

Size Generalization of Graph Neural Networks on Biological Data: Insights and Practices from the Spectral Perspective ( http://arxiv.org/abs/2305.15611v3 )

ライセンス: Link先を確認

Gaotang Li, Yujun Yan, Danai Koutra

(参考訳) 本研究では,グラフの大きさによる分布変化を調査し,その学習データに対するグラフニューラルネットワーク(gnns)の一般化能力に与える影響を評価する。既存の文献では、gnnのサイズ汎化可能性について、主にアプリケーションドメインの相違とサイズ誘起分布シフトに関する基礎的な仮定によって、矛盾する結論を示している。私たちは実際の生物学的データセットに注目し、サイズによって引き起こされる分散シフトのタイプを特徴付けることを求めます。従来のアプローチと異なり、スペクトルの視点を採用し、サイズによって引き起こされるスペクトル差がサブグラフパターン(例えば、平均サイクル長)の違いと関係していることを明らかにする。従来の研究では, サブグラフ情報の取得におけるGNNの欠如が, 分布内一般化に悪影響を及ぼすことが確認されているが, トレーニング中に遭遇しない大規模テストグラフでは, この減少が顕著である。このようなスペクトル的洞察に基づいて,gnnがそれらの重要な部分グラフパターンを認識し,そのサイズ一般化可能性を高めるための,単純かつ効果的なモデル非依存戦略を導入する。実験の結果,提案手法はトレーニンググラフの2～10倍の大きさの大規模テストグラフ上でのグラフ分類性能を大幅に向上させ,F1スコアを最大8%向上させることができた。

We investigate size-induced distribution shifts in graphs and assess their impact on the ability of graph neural networks (GNNs) to generalize to larger graphs relative to the training data. Existing literature presents conflicting conclusions on GNNs' size generalizability, primarily due to disparities in application domains and underlying assumptions concerning size-induced distribution shifts. Motivated by this, we take a data-driven approach: we focus on real biological datasets and seek to characterize the types of size-induced distribution shifts. Diverging from prior approaches, we adopt a spectral perspective and identify that spectrum differences induced by size are related to differences in subgraph patterns (e.g., average cycle lengths). While previous studies have identified that the inability of GNNs to capture subgraph information negatively impacts their in-distribution generalization, our findings further show that this decline is more pronounced when evaluating on larger test graphs not encountered during training. Based on these spectral insights, we introduce a simple yet effective model-agnostic strategy, which makes GNNs aware of these important subgraph patterns to enhance their size generalizability. Our empirical results reveal that our proposed size-insensitive attention strategy substantially enhances graph classification performance on large test graphs, which are 2-10 times larger than the training graphs, resulting in an improvement in F1 scores by up to 8%.

翻訳日:2024-02-07 20:51:49 公開日:2024-02-06

# 熱力学量を用いた多体絡み合いの実験的検証

Experimental Verification of Many-Body Entanglement Using Thermodynamic Quantities ( http://arxiv.org/abs/2305.15012v2 )

ライセンス: Link先を確認

Jitendra Joshi, Mir Alimuddin, T S Mahesh, Manik Banik

(参考訳) 量子絡み合いの現象は、新しい量子技術を可能にするいくつかの重要なプロトコルの下にある。しかし、絡み合った状態は極めて繊細であり、しばしば外部環境の小さなゆらぎによって摂動する。したがって、このリソースを含むプロトコルの実装を成功させるには、絡み合いの証明が極めて重要である。本研究では,ある種の熱力学量を測定することで容易に検証できるマルチキュービットシステムの絡み合い基準を提案する。特に、この基準は、それぞれ大域的および局所的な相互作用の下で孤立量子系から抽出可能な最適な大域的および局所的な仕事の差に依存する。原理の証明として,核磁気共鳴(NMR)アーキテクチャを用いて最大10量子ビットの核スピンレジスタに関する提案手法を実証する。我々は、スター型トポロジーの系におけるノイズの多いベル対角状態とノイズの多いGreenberger-Horne-Zeilinger(GHZ)クラスの状態を用意し、熱力学の基準によってそれらの絡み合いを認証する。また, 多体システムにおいて, 状態に関する知識が部分的あるいは全く存在しない場合にも, 絡み合い認証方式を提案する。

The phenomenon of quantum entanglement underlies several important protocols that enable emerging quantum technologies. Entangled states, however, are extremely delicate and often get perturbed by tiny fluctuations in their external environment. Certification of entanglement is therefore crucial for the successful implementation of protocols involving this resource. In this work, we propose a set of entanglement criteria for multi-qubit systems that can be easily verified by measuring certain thermodynamic quantities. In particular, the criteria depend on the difference in optimal global and local works extractable from an isolated quantum system under global and local interactions, respectively. As a proof of principle, we demonstrate the proposed scheme on nuclear spin registers of up to 10 qubits using the Nuclear Magnetic Resonance architecture. We prepare noisy Bell-diagonal states and the noisy Greenberger-Horne-Zeilinger class of states in star-topology systems and certify their entanglement through our thermodynamic criteria. Along the same line, we also propose an entanglement certification scheme in many-body systems when only partial or even no knowledge about the state is available.

翻訳日:2024-02-07 20:51:23 公開日:2024-02-06

# 基準自由画像キャプション評価指標のロバスト性に関する検討

An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics ( http://arxiv.org/abs/2305.14998v2 )

ライセンス: Link先を確認

Saba Ahmadi, Aishwarya Agrawal

(参考訳) 近年,CLIPScore (Hessel et al., 2021), UMIC (Lee et al., 2021), PAC-S (Sarto et al., 2023) などの参照フリー指標が画像キャプションの自動参照フリー評価のために提案されている。我々の焦点は、語彙の重なりが大きい2つのキャプションを区別する必要があるシナリオにおいて、これらの指標の堅牢性を評価することである。以上の結果から,クリップスコア,umic,pac-sは,人間の判断と高い相関関係にあるものの,きめ細かい誤りの特定に苦慮していることが明らかとなった。すべての指標は視覚的な接地誤差に対して強い感度を示すが、キャプションに対する感受性は限定的である。さらに,すべての指標がキャプション内の画像関連物の大きさの変動に敏感であり,CLIPScoreとPAC-Sもキャプション内の画像関連物への言及数に敏感であることがわかった。キャプションの言語的側面については,すべての指標が否定の弱い理解を示し,CLIPScoreとPAC-Sはキャプションの構造に非常に敏感である。画像キャプションの非参照評価のさらなる改善が期待できる。

Recently, reference-free metrics such as CLIPScore (Hessel et al., 2021), UMIC (Lee et al., 2021), and PAC-S (Sarto et al., 2023) have been proposed for automatic reference-free evaluation of image captions. Our focus lies in evaluating the robustness of these metrics in scenarios that require distinguishing between two captions with high lexical overlap but very different meanings. Our findings reveal that despite their high correlation with human judgments, CLIPScore, UMIC, and PAC-S struggle to identify fine-grained errors. While all metrics exhibit strong sensitivity to visual grounding errors, their sensitivity to caption implausibility errors is limited. Furthermore, we found that all metrics are sensitive to variations in the size of image-relevant objects mentioned in the caption, while CLIPScore and PAC-S are also sensitive to the number of mentions of image-relevant objects in the caption. Regarding linguistic aspects of a caption, all metrics show weak comprehension of negation, and CLIPScore and PAC-S are insensitive to the structure of the caption to a great extent. We hope our findings will guide further improvements in reference-free evaluation of image captioning.
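
To make the robustness concern concrete, recall that CLIPScore (Hessel et al., 2021) is a rescaled, clipped cosine similarity between a caption embedding and an image embedding, with weight 2.5. The sketch below applies that formula to made-up three-dimensional vectors; real scores require embeddings from a CLIP model, which is not loaded here.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def clip_score(caption_emb, image_emb, w=2.5):
    """CLIPScore as defined by Hessel et al. (2021): w * max(cos(c, v), 0)."""
    return w * max(cosine(caption_emb, image_emb), 0.0)

# Made-up embeddings: two captions with high lexical overlap are assumed to
# land close together in embedding space.
image = [0.6, 0.8, 0.0]
caption_correct = [0.58, 0.80, 0.10]
caption_swapped = [0.60, 0.76, 0.15]
gap = clip_score(caption_correct, image) - clip_score(caption_swapped, image)
```

If two captions with different meanings receive nearly identical embeddings, the score gap between them is small, which is the failure mode a fine-grained robustness test looks for.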

翻訳日:2024-02-07 20:51:06 公開日:2024-02-06

# 不均衡最適輸送の半双対定式化による生成モデル

Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport ( http://arxiv.org/abs/2305.14777v3 )

ライセンス: Link先を確認

Jaemoo Choi, Jaewoong Choi, Myungjoo Kang

(参考訳) 最適輸送(OT)問題は、与えられたコスト関数を最小化しながら2つの分布をブリッジする輸送マップを調べる。この点において、扱いやすい事前分布とデータの間のOTは生成的モデリングタスクに利用されてきた。しかし、OTベースの手法は外れ値の影響を受けやすく、トレーニング中に最適化上の課題に直面する。本稿では,不均衡最適輸送(UOT)の半双対定式化に基づく新しい生成モデルを提案する。 OTとは異なり、UOTは分布マッチングの厳しい制約を緩和する。このアプローチは、外れ値に対する堅牢性、トレーニング中の安定性、より高速な収束を提供する。これらの特性を実験的に検証する。さらに,UOTにおける分布間のダイバージェンスの理論的上界について検討した。提案モデルは既存のOTベース生成モデルを上回り、CIFAR-10ではFIDスコアが2.97、CelebA-HQ-256では6.36である。コードは https://github.com/Jae-Moo/UOTM で入手できる。

The Optimal Transport (OT) problem seeks a transport map that bridges two distributions while minimizing a given cost function. In this regard, OT between a tractable prior distribution and data has been utilized for generative modeling tasks. However, OT-based methods are susceptible to outliers and face optimization challenges during training. In this paper, we propose a novel generative model based on the semi-dual formulation of Unbalanced Optimal Transport (UOT). Unlike OT, UOT relaxes the hard constraint on distribution matching. This approach provides better robustness against outliers, stability during training, and faster convergence. We validate these properties empirically through experiments. Moreover, we study the theoretical upper-bound of divergence between distributions in UOT. Our model outperforms existing OT-based generative models, achieving FID scores of 2.97 on CIFAR-10 and 6.36 on CelebA-HQ-256. The code is available at https://github.com/Jae-Moo/UOTM.
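
For reference, the unbalanced OT problem is usually stated as follows. This is the standard primal form from the UOT literature, written in notation chosen here ($\mu$, $\nu$, $c$, $D_{\Psi_1}$, $D_{\Psi_2}$); the semi-dual objective actually optimised in the paper is not reproduced.

```latex
C_{\mathrm{ub}}(\mu, \nu) =
  \inf_{\pi \in \mathcal{M}_+(\mathcal{X} \times \mathcal{Y})}
  \int_{\mathcal{X} \times \mathcal{Y}} c(x, y) \, d\pi(x, y)
  + D_{\Psi_1}(\pi_0 \,\|\, \mu) + D_{\Psi_2}(\pi_1 \,\|\, \nu)
```

Here $\pi_0$ and $\pi_1$ denote the marginals of the coupling $\pi$, and $D_{\Psi_1}$, $D_{\Psi_2}$ are divergence penalties. Classical OT corresponds to replacing the two penalties with the hard constraints $\pi_0 = \mu$ and $\pi_1 = \nu$, which is the constraint that UOT relaxes.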

翻訳日:2024-02-07 20:50:39 公開日:2024-02-06

# DirecT2V:大言語モデルはゼロショットテキスト・ビデオ生成のためのフレームレベルディレクトリである

DirecT2V: Large Language Models are Frame-Level Directors for Zero-Shot Text-to-Video Generation ( http://arxiv.org/abs/2305.14330v3 )

ライセンス: Link先を確認

Susung Hong, Junyoung Seo, Heeseong Shin, Sunghwan Hong, Seungryong Kim

(参考訳) AIGC(AIGC)のパラダイムでは、事前訓練されたテキスト・トゥ・イメージ(T2I)モデルからテキスト・トゥ・ビデオ(T2V)生成への知識の移行に注目が集まっている。その効果にもかかわらず、これらのフレームワークは、一貫性のある物語を維持し、単一の抽象ユーザプロンプトからシーン構成やオブジェクト配置のシフトを処理する上での課題に直面している。大規模言語モデル(LLM)が時間依存のフレーム単位のプロンプトを生成する能力について検討し,新しいフレームワークであるDirecT2Vを提案する。 DirecT2Vは命令で調整されたLCMをディレクターとして利用し、時間変化のあるコンテンツを含め、一貫したビデオ生成を容易にする。時間的一貫性を保ち、異なるオブジェクトへの値のマッピングを防止するため、新たな値マッピング法と、追加のトレーニングを必要としないデュアルソフトマックスフィルタリングを拡散モデルに装備する。実験結果は,抽象的ユーザのプロンプトから視覚的にコヒーレントかつストーリーフルな映像を生成できるフレームワークの有効性を検証し,ゼロショットビデオ生成の課題への対処に成功した。

In the paradigm of AI-generated content (AIGC), there has been increasing attention to transferring knowledge from pre-trained text-to-image (T2I) models to text-to-video (T2V) generation. Despite their effectiveness, these frameworks face challenges in maintaining consistent narratives and handling shifts in scene composition or object placement from a single abstract user prompt. Exploring the ability of large language models (LLMs) to generate time-dependent, frame-by-frame prompts, this paper introduces a new framework, dubbed DirecT2V. DirecT2V leverages instruction-tuned LLMs as directors, enabling the inclusion of time-varying content and facilitating consistent video generation. To maintain temporal consistency and prevent mapping the value to a different object, we equip a diffusion model with a novel value mapping method and dual-softmax filtering, which do not require any additional training. The experimental results validate the effectiveness of our framework in producing visually coherent and storyful videos from abstract user prompts, successfully addressing the challenges of zero-shot video generation.

翻訳日:2024-02-07 20:50:25 公開日:2024-02-06

# プライベート微調整のための選択的事前学習

Selective Pre-training for Private Fine-tuning ( http://arxiv.org/abs/2305.13865v2 )

ライセンス: Link先を確認

Da Yu, Sivakanth Gopi, Janardhan Kulkarni, Zinan Lin, Saurabh Naik, Tomasz Lukasz Religa, Jian Yin, Huishuai Zhang

(参考訳) 電子メールクライアントやワードプロセッサでテキスト予測モデルをトレーニングしたいとします。これらのモデルは、1時間に数十億の予測を処理し、ユーザデータのプライバシを保持し、メモリ、推論時間要件を満たし、推論コストを削減するために、特定のモデルサイズ制約に準拠しなければならない。小さく、速く、プライベートなドメイン固有言語モデルを構築することは、活発な研究分野である。本稿では,プライベートデータセットに導かれる公開データセットのサブセット上での注意深い事前トレーニングが,小さなDP言語モデルのトレーニングに不可欠であることを示す。標準ベンチマークでは、我々の新しいフレームワークでトレーニングされたモデルは最先端のパフォーマンスを実現し、文献のすべてのベースラインを改善する。パフォーマンスの改善に加えて、我々のフレームワークは、注意深い事前トレーニングとプライベートな微調整により、より小さなモデルは、プライベートデータにアクセスできないはるかに大きなモデルの性能と一致し、モデル圧縮と効率のツールとしてのプライベートラーニングの約束を強調します。医療、金融など多くのアプリケーションでは、プライベートデータセットは通常、公開データセットよりもはるかに高品質であり、本研究は、パイプライントレーニングのすべての段階でプライベートデータセットを活用する新しい方法を示し、ディープラーニング効率を向上させる。私たちのフレームワークをベースとした言語モデルは、1日に数十億件の予測(そして推論コストの面で数百万ドルを節約)を提供する複数の実世界のデプロイメントで使われてきました。

Suppose we want to train text prediction models in email clients or word processors. These models, which serve billions of predictions per hour, must preserve the privacy of user data and adhere to specific model size constraints to meet memory and inference-time requirements and to reduce inference cost. Building small, fast, and private domain-specific language models is a thriving area of research. In this work, we show that a careful pre-training on a subset of the public dataset that is guided by the private dataset is crucial to train small DP language models. On standard benchmarks, models trained with our new framework achieve state-of-the-art performance, improving upon all the baselines from the literature. Besides performance improvements, our framework also shows that with careful pre-training and private fine-tuning, smaller models can match the performance of much larger models that do not have access to private data, highlighting the promise of private learning as a tool for model compression and efficiency. In many applications such as health care, finance, etc., private datasets are usually of much higher quality than public datasets, and our work shows novel ways of utilizing private datasets at all the stages of the training pipeline to improve deep learning efficiency. Language models based on our framework have been used in multiple real-world deployments serving billions of predictions per day (and saving millions of dollars in terms of inference cost), highlighting the general applicability of our framework beyond academic benchmarks.

翻訳日:2024-02-07 20:50:00 公開日:2024-02-06

# 2回考える:質問応答モデルの予測ショートカットをなくす効率を計測する

Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models ( http://arxiv.org/abs/2305.06841v2 )

ライセンス: Link先を確認

Lukáš Mikula, Michal Štefánik, Marek Petrovič, Petr Sojka

(参考訳) 大規模言語モデル(LLM)が言語理解タスクの大部分を占める一方で、以前の研究は、これらの結果のいくつかがトレーニングデータセットの疑似相関のモデリングによって支えられていることを示している。著者は一般的に、同じタスクの分布外(OOD)データセットでモデルを評価することによってモデルのロバスト性を評価するが、これらのデータセットはトレーニングデータセットのバイアスを共有する可能性がある。本稿では,特定された任意の疑似特徴へのモデルの依存度を測定する簡易な方法を提案し、質問応答(QA)における様々な事前学習モデルとデバイアス法について、既知および新たに発見された多数の予測バイアスに対するロバスト性を評価する。既存のデバイアス法は、選択された疑似特徴への依存を軽減することができるが、これらの手法のOOD性能の向上は、バイアスのある特徴への依存の軽減によっては説明できないことを示し、異なるQAデータセット間でバイアスが共有されていることを示唆している。最後に、異なるQAデータセットでトレーニングされたモデルの性能が、同じバイアス特徴に同程度に依存していることを測定することで、これを裏付ける。これらの結果が、LMのロバスト性に関する報告を、特定の疑似特徴に対処する敵対的サンプルのレベルまで精緻化する将来の研究の動機となることを願っている。

While Large Language Models (LLMs) dominate a majority of language understanding tasks, previous work shows that some of these results are supported by modelling spurious correlations of training datasets. Authors commonly assess model robustness by evaluating their models on out-of-distribution (OOD) datasets of the same task, but these datasets might share the bias of the training dataset. We propose a simple method for measuring the scale of models' reliance on any identified spurious feature and assess the robustness towards a large set of known and newly found prediction biases for various pre-trained models and debiasing methods in Question Answering (QA). We find that while existing debiasing methods can mitigate reliance on a chosen spurious feature, the OOD performance gains of these methods cannot be explained by mitigated reliance on biased features, suggesting that biases are shared among different QA datasets. Finally, we provide evidence for this by showing that the performance of models trained on different QA datasets relies comparably on the same bias features. We hope these results will motivate future work to refine the reports of LMs' robustness to a level of adversarial samples addressing specific spurious features.

翻訳日:2024-02-07 20:49:36 公開日:2024-02-06

# 自己注意力学におけるクラスターの出現

The emergence of clusters in self-attention dynamics ( http://arxiv.org/abs/2305.05465v5 )

ライセンス: Link先を確認

Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet

(参考訳) 相互作用する粒子系としてトランスフォーマーを見ることにより,重みが時間に依存しない場合の学習表現の幾何学を記述する。トークンを表す粒子は、時間が無限大に近づくにつれて、特定の極限対象に向かってクラスタ化する傾向があることを示す。クラスタ位置は初期トークンによって決定され、Transformersが学習した表現のコンテキスト認識を裏付ける。力学系と偏微分方程式の手法を用いて、出現する極限対象の型は値行列のスペクトルに依存することを示す。さらに、一次元の場合、自己注意行列が低ランクのブール行列に収束することを証明する。これらの結果の組み合わせは、Transformersによって処理されるとトークン列にリーダーが現れるという、Vaswaniら[VSP'17]による経験的観察を数学的に裏付ける。

Viewing Transformers as interacting particle systems, we describe the geometry of learned representations when the weights are not time dependent. We show that particles, representing tokens, tend to cluster toward particular limiting objects as time tends to infinity. Cluster locations are determined by the initial tokens, confirming context-awareness of representations learned by Transformers. Using techniques from dynamical systems and partial differential equations, we show that the type of limiting object that emerges depends on the spectrum of the value matrix. Additionally, in the one-dimensional case we prove that the self-attention matrix converges to a low-rank Boolean matrix. The combination of these results mathematically confirms the empirical observation made by Vaswani et al. [VSP'17] that leaders appear in a sequence of tokens when processed by Transformers.

翻訳日:2024-02-07 20:49:12 公開日:2024-02-06

# ランダムコンパイルによるクロストーク誤差の緩和:超伝導量子コンピュータ上でのBCSモデルのシミュレーション

Mitigating crosstalk errors by randomized compiling: Simulation of the BCS model on a superconducting quantum computer ( http://arxiv.org/abs/2305.02345v3 )

ライセンス: Link先を確認

Hugo Perrin, Thibault Scoquart, Alexander Shnirman, Jörg Schmalian and Kyrylo Snizhko

(参考訳) 我々は、隣接する量子ビットの特別な処理を含むランダム化コンパイル(RC)プロトコルの拡張を開発・適用し、IBMQ量子コンピュータ(ibm_lagos および ibmq_ehningen)の超伝導量子ビットへの不完全なゲートの適用によって生じるクロストーク効果を劇的に低減する。CNOTの2量子ビットゲートに由来するクロストークエラーは、多くの量子コンピューティングプラットフォームにおけるエラーの重要な原因である。IBMQマシンの場合、その大きさは見過ごされることが多い。このRCプロトコルはクロストークによるコヒーレントノイズを脱分極ノイズチャネルに変換し,ノイズ推定回路などの確立されたエラー緩和スキームを用いて処理できるようにする。本手法を、超伝導に対するバーディーン・クーパー・シュリーファー(BCS)ハミルトニアンの非平衡力学の量子シミュレーションに適用する。このモデルは、クーパー対の長距離相互作用のために量子ハードウェア上でのシミュレーションが特に困難である。135個のCNOTゲートを用いて、トロッター化や量子ビットのデコヒーレンスではなく、クロストークがエラーを支配する領域で作業する。隣接する量子ビットのトワリングは、新しい量子ビットや回路を追加する必要なしにノイズ推定プロトコルを劇的に改善し、BCSモデルの定量的シミュレーションを可能にすることが示される。

We develop and apply an extension of the randomized compiling (RC) protocol that includes a special treatment of neighboring qubits and dramatically reduces crosstalk effects caused by the application of faulty gates on superconducting qubits in IBMQ quantum computers (ibm_lagos and ibmq_ehningen). Crosstalk errors, stemming from CNOT two-qubit gates, are a crucial source of errors on numerous quantum computing platforms. For the IBMQ machines, their magnitude is often overlooked. Our RC protocol turns coherent noise due to crosstalk into a depolarising noise channel that can then be treated using established error mitigation schemes, such as noise estimation circuits. We apply our approach to the quantum simulation of the non-equilibrium dynamics of the Bardeen-Cooper-Schrieffer (BCS) Hamiltonian for superconductivity, a particularly challenging model to simulate on quantum hardware because of the long-range interaction of Cooper pairs. With 135 CNOT gates, we work in a regime where crosstalk, as opposed to either Trotterization or qubit decoherence, dominates the error. Our twirling of neighboring qubits is shown to dramatically improve the noise estimation protocol without the need to add new qubits or circuits and allows for a quantitative simulation of the BCS model.

翻訳日:2024-02-07 20:49:01 公開日:2024-02-06

# 多値量子ニューロン

Multi-Valued Quantum Neurons ( http://arxiv.org/abs/2305.02018v5 )

ライセンス: Link先を確認

M. W. AlMasri

(参考訳) 多値量子論理は、真理値が単位円上に置かれた1の冪根として自然に表されるように体系的に定式化される。したがって、多値量子ニューロン(MVQN)は複素数体上の多値しきい値論理の原理に基づいている。MVQNの訓練は、単位円に沿った運動に還元される。多値量子ニューロンに基づく量子ニューラルネットワーク(QNN)は、複素数の重み、1の冪根で符号化された入力と出力、そして複素平面を単位円に写す活性化関数によって構築できる。このようなニューラルネットワークは、同じ数のニューロンと層を持つ二値入力に基づく量子ニューラルネットワークと比較して、高速な収束と高い機能性を持つ。我々の構成は量子系のエネルギースペクトルの解析に利用できる。実用的な応用としては、光の軌道角運動量(OAM)や分子スピンquditのような多準位系から構築された量子ニューラルネットワークが考えられる。

Multiple-valued quantum logic is formulated systematically such that the truth values are represented naturally as unique roots of unity placed on the unit circle. Consequently, the multi-valued quantum neuron (MVQN) is based on the principles of multiple-valued threshold logic over the field of complex numbers. Training an MVQN reduces to movement along the unit circle. A quantum neural network (QNN) based on multi-valued quantum neurons can be constructed with complex weights, inputs, and outputs encoded by roots of unity and an activation function that maps the complex plane into the unit circle. Such neural networks enjoy fast convergence and higher functionalities compared with quantum neural networks based on binary input with the same number of neurons and layers. Our construction can be used in analyzing the energy spectrum of quantum systems. Possible practical applications can be found using the quantum neural networks built from orbital angular momentum (OAM) of light or multi-level systems such as molecular spin qudits.
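
As a rough illustration of "truth values as roots of unity" and "an activation function that maps the complex plane into the unit circle", the following is a minimal classical sketch. It assumes the sector-based activation commonly used for multi-valued neurons (the plane is split into k equal angular sectors, each mapped to one k-th root of unity); the abstract does not spell out the exact activation, and the function names here are hypothetical.

```python
import cmath
import math


def root_of_unity(j: int, k: int) -> complex:
    """Return the j-th of the k-th roots of unity, exp(2*pi*i*j/k)."""
    return cmath.exp(2j * math.pi * j / k)


def mvn_activation(z: complex, k: int) -> complex:
    """Map a complex number to a k-th root of unity by angular sector.

    Assumed convention: if 2*pi*j/k <= arg(z) < 2*pi*(j+1)/k,
    the output is the j-th root of unity.
    """
    angle = cmath.phase(z) % (2 * math.pi)          # angle in [0, 2*pi)
    sector = int(angle // (2 * math.pi / k)) % k    # guard against rounding to k
    return root_of_unity(sector, k)


def neuron_output(weights, inputs, k: int) -> complex:
    """Weighted sum with complex weights (weights[0] is the bias), then activation."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return mvn_activation(z, k)


if __name__ == "__main__":
    k = 4  # four-valued logic: truth values 1, i, -1, -i
    inputs = [root_of_unity(0, k), root_of_unity(1, k)]
    print(neuron_output([0.0, 1.0, 1.0], inputs, k))
```

Because every output lies on the unit circle, adjusting the weights only changes which sector the weighted sum falls into, which is one way to read the statement that training reduces to movement along the unit circle.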

翻訳日:2024-02-07 20:48:33 公開日:2024-02-06

# CroSSL: 潜時マスキングによる時系列のクロスモーダル自己監視学習

CroSSL: Cross-modal Self-Supervised Learning for Time-series through Latent Masking ( http://arxiv.org/abs/2307.16847v2 )

ライセンス: Link先を確認

Shohreh Deldari, Dimitris Spathis, Mohammad Malekzadeh, Fahim Kawsar, Flora Salim, Akhil Mathur

(参考訳) マルチモーダル時系列の機械学習のためのラベル付きデータが限られていることは、この分野の進歩を大きく妨げている。自己教師付き学習(SSL)はラベルに頼ることなくデータ表現を学ぶための有望なアプローチである。しかし、既存のSSLメソッドは、負のペアの高価な計算を必要とし、通常は単一のモダリティのために設計されているため、汎用性が制限される。我々はCroSSL(Cross-modal SSL)を導入し、モダリティ固有のエンコーダによって生成された中間埋め込みをマスキングすることと、下流の分類器に供給できるグローバルな埋め込みへとクロスモーダルアグリゲータを通じて集約することの2つの新しい概念を提案する。CroSSLは、欠落した入力を処理するための事前のデータ前処理や、対照学習のための負のペアのサンプリングを必要とせずに、欠落したモダリティの処理とエンドツーエンドのクロスモーダル学習を可能にする。本研究では,加速度センサやジャイロスコープなどのモーションセンサと生体信号(心拍数,脳波,筋電図,眼電図,皮膚電気活動)を含む幅広いデータで本手法を評価し,様々なデータ型に対してマスキング比とマスキング戦略が与える影響,および学習した表現の欠損データに対する頑健性を検討した。全体として、CroSSLは、最小限のラベル付きデータを使用して以前のSSLと教師付きベンチマークより優れており、また、潜在マスキングがクロスモーダル学習をどのように改善するかについても光を当てている。私たちのコードは https://github.com/dr-bell/CroSSL でオープンソース化されています。

Limited availability of labeled data for machine learning on multimodal time-series extensively hampers progress in the field. Self-supervised learning (SSL) is a promising approach to learning data representations without relying on labels. However, existing SSL methods require expensive computations of negative pairs and are typically designed for single modalities, which limits their versatility. We introduce CroSSL (Cross-modal SSL), which puts forward two novel concepts: masking intermediate embeddings produced by modality-specific encoders, and their aggregation into a global embedding through a cross-modal aggregator that can be fed to downstream classifiers. CroSSL allows for handling missing modalities and end-to-end cross-modal learning without requiring prior data preprocessing for handling missing inputs or negative-pair sampling for contrastive learning. We evaluate our method on a wide range of data, including motion sensors such as accelerometers or gyroscopes and biosignals (heart rate, electroencephalograms, electromyograms, electrooculograms, and electrodermal) to investigate the impact of masking ratios and masking strategies for various data types and the robustness of the learned representations to missing data. Overall, CroSSL outperforms previous SSL and supervised benchmarks using minimal labeled data, and also sheds light on how latent masking can improve cross-modal learning. Our code is open-sourced at https://github.com/dr-bell/CroSSL

翻訳日:2024-02-07 20:40:40 公開日:2024-02-06

# ASCII-Artに基づく横断的タスクによるChatGPTの理解度:ASCII-Artの認識と生成に関するGPT3.5の能力は、完全には欠落していない

Testing the Depth of ChatGPT's Comprehension via Cross-Modal Tasks Based on ASCII-Art: GPT3.5's Abilities in Regard to Recognizing and Generating ASCII-Art Are Not Totally Lacking ( http://arxiv.org/abs/2307.16806v2 )

ライセンス: Link先を確認

David Bayani

(参考訳) リリースから8ヶ月にわたって、ChatGPTとその基盤となるモデルであるGPT3.5は、能力とアクセシビリティの強力な組み合わせにより、大きな注目を集めている。これらのモデルが持つ能力の範囲を調査する論文が数多く登場しているが、これらのネットワークに入力され、そこから抽出される情報は、自然言語テキストか、様式化されたコードライクな言語のいずれかであった。本研究は,真の人間レベルの知的エージェントが複数の信号モダリティにまたがって持つと期待される能力から着想を得たものである。本研究では,言語的な要約へと明示的に変換することなく、内容をASCIIアートとして入力に与える視覚的タスクに対する、GPT3.5の適性について検討する。視覚設定に典型的な様々な変換後の画像認識タスクにおけるモデルの性能分析,画像部品の知識の検証,画像生成に関する課題について実験を行った。

Over the eight months since its release, ChatGPT and its underlying model, GPT3.5, have garnered massive attention due to their potent mix of capability and accessibility. While a niche industry of papers has emerged examining the scope of capabilities these models possess, the information fed to and extracted from these networks has been either natural language text or stylized, code-like language. Drawing inspiration from the prowess we expect a truly human-level intelligent agent to have across multiple signal modalities, in this work we examine GPT3.5's aptitude for visual tasks, where the inputs feature content provided as ASCII-art without overt distillation into a lingual summary. We conduct experiments analyzing the model's performance on image recognition tasks after various transforms typical in visual settings, trials investigating knowledge of image parts, and tasks covering image generation.

翻訳日:2024-02-07 20:40:12 公開日:2024-02-06

# ソフトウェア工学におけるアダプタベース知識伝達のための事前学習言語モデルの利用

Utilization of Pre-trained Language Model for Adapter-based Knowledge Transfer in Software Engineering ( http://arxiv.org/abs/2307.08540v2 )

ライセンス: Link先を確認

Iman Saberi, Fatemeh Fard and Fuxiang Chen

(参考訳) CodeBERTのようなソフトウェア工学(SE)向けの事前学習言語モデル(PLM)は、大規模なコードコーパス上で事前学習されており、その学習済みの知識はPLMの微調整を通じて下流タスク(例えば、コードクローン検出)へうまく転移できることが示されている。自然言語処理(NLP)では、PLMに挿入されるコンパクトでパラメータ効率の良いモジュールであるアダプタを用いて、PLMの知識を転移する代替手段が探索されている。アダプタの使用は多くのNLPベースの下流タスクにおいて有望な結果を示しているが、SEベースの下流タスクへの応用と探索は限られている。本稿では,クローズテスト,コードクローン検出,コード要約など,複数の下流タスクに対するアダプタを用いた知識転移について検討する。これらのアダプタはコードコーパスでトレーニングされ、英語コーパスまたはコードコーパスで事前トレーニングされたPLMに挿入される。これらのPLMをそれぞれNL-PLM, C-PLMと呼ぶ。アダプタを持たないPLMと比べてNL-PLMを用いた場合に結果が改善することが観察され、アダプタがNL-PLMからSEタスクへ有用な知識を転移し活用できることが示唆された。その結果はC-PLMの結果と同等かそれ以上になる場合があり、しかもパラメータ数やトレーニング時間の観点からはより効率的である。興味深いことに、C-PLMに挿入されたアダプタは、通常、従来の微調整されたC-PLMよりも良い結果をもたらす。この結果はSEタスクのためのよりコンパクトなモデルを構築するための新しい方向を開く。

Software Engineering (SE) Pre-trained Language Models (PLMs), such as CodeBERT, are pre-trained on large code corpora, and their learned knowledge has been transferred successfully to downstream tasks (e.g., code clone detection) through fine-tuning. In Natural Language Processing (NLP), an alternative way of transferring the knowledge of PLMs is explored through the use of adapters, compact and parameter-efficient modules that are inserted into a PLM. Although the use of adapters has shown promising results in many NLP-based downstream tasks, their application and exploration in SE-based downstream tasks are limited. Here, we study knowledge transfer using adapters on multiple downstream tasks including cloze test, code clone detection, and code summarization. These adapters are trained on code corpora and are inserted into a PLM that is pre-trained on English corpora or code corpora. We call these PLMs NL-PLM and C-PLM, respectively. We observed an improvement in results using NL-PLM over a PLM that does not have adapters, which suggests that adapters can transfer and utilize useful knowledge from NL-PLM to SE tasks. The results are sometimes on par with or exceed the results of C-PLM, while being more efficient in terms of the number of parameters and training time. Interestingly, adapters inserted into a C-PLM generally yield better results than a traditional fine-tuned C-PLM. Our results open new directions to build more compact models for SE tasks.

翻訳日:2024-02-07 20:39:32 公開日:2024-02-06

# LLM比較評価:大規模言語モデルを用いたペアワイズ比較によるゼロショットNLG評価

LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models ( http://arxiv.org/abs/2307.07889v3 )

ライセンス: Link先を確認

Adian Liusie, Potsawee Manakul, Mark J. F. Gales

(参考訳) 大規模言語モデル(LLM)の現在の開発は、様々な自然言語タスクで印象的なゼロショット機能を実現している。これらのシステムの興味深い応用として、自然言語生成(NLG)の自動評価がある。これは非常に難しいが、実用上の利益が大きい領域である。本稿では,ゼロショットNLG評価におけるLLMの創発的能力を活用するための2つの選択肢、すなわち絶対スコア予測と、候補ペア間の相対比較を用いる比較評価について検討する。NLG評価において比較評価は広く研究されていないが、人間は個別に評価するよりも2つの選択肢を比較する方が直感的であることが多い。本研究は,絶対的な評価と比較した性能,プロンプトにおける位置バイアス,比較数の観点からの効率的なランキングという複数の視点から比較評価を検討する。LLM比較評価はNLG評価における単純で汎用的で効果的なアプローチであることを示す。FlanT5 や Llama2-chat のような中規模のオープンソース LLM では、プロンプトによるスコアリングよりも比較評価が優れており、多くの場合、最先端の手法に匹敵する性能を達成できる。さらに,ペアワイズ比較を行う場合,LLMはしばしば強い位置バイアスを示すことを実証し,さらに性能を向上させるデバイアス手法を提案する。

Current developments in large language models (LLMs) have enabled impressive zero-shot capabilities across various natural language tasks. An interesting application of these systems is in the automated assessment of natural language generation (NLG), a highly challenging area with great practical benefit. In this paper, we explore two options for exploiting the emergent abilities of LLMs for zero-shot NLG assessment: absolute score prediction, and comparative assessment which uses relative comparisons between pairs of candidates. Though comparative assessment has not been extensively studied in NLG assessment, we note that humans often find it more intuitive to compare two options rather than scoring each one independently. This work examines comparative assessment from multiple perspectives: performance compared to absolute grading; positional biases in the prompt; and efficient ranking in terms of the number of comparisons. We illustrate that LLM comparative assessment is a simple, general and effective approach for NLG assessment. For moderate-sized open-source LLMs, such as FlanT5 and Llama2-chat, comparative assessment is superior to prompt scoring, and in many cases can achieve performance competitive with state-of-the-art methods. Additionally, we demonstrate that LLMs often exhibit strong positional biases when making pairwise comparisons, and we propose debiasing methods that can further improve performance.
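
The comparative-assessment idea (rank candidates from pairwise judgements, and correct for positional bias) can be sketched as follows. This is an illustrative sketch only: `toy_compare` is a hypothetical stand-in for an LLM call, the quality numbers are made up, and averaging the two presentation orders is one simple debiasing choice, not necessarily the method proposed in the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical "true" qualities, used only by the toy comparator below.
TOY_QUALITY = {"a": 3.0, "b": 1.0, "c": 2.0}


def toy_compare(first: str, second: str) -> float:
    """Stand-in for an LLM judge: P(first is better than second).

    The constant +0.2 term mimics a positional bias that favours
    whichever candidate is shown first in the prompt.
    """
    diff = TOY_QUALITY[first] - TOY_QUALITY[second]
    return min(1.0, max(0.0, 0.5 + 0.1 * diff + 0.2))


def debiased_prob(compare: Callable[[str, str], float], a: str, b: str) -> float:
    """Average both presentation orders so a constant positional bias cancels."""
    return 0.5 * (compare(a, b) + (1.0 - compare(b, a)))


def rank_by_win_ratio(candidates: Sequence[str],
                      compare: Callable[[str, str], float]) -> List[str]:
    """Rank candidates by their summed (debiased) win probabilities over all pairs."""
    wins = {c: 0.0 for c in candidates}
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            p = debiased_prob(compare, a, b)
            wins[a] += p
            wins[b] += 1.0 - p
    return sorted(candidates, key=lambda c: wins[c], reverse=True)


if __name__ == "__main__":
    print(rank_by_win_ratio(["a", "b", "c"], toy_compare))
```

Comparing all pairs in both orders costs N(N-1) judge calls for N candidates, which is why the abstract lists efficient ranking in terms of the number of comparisons as a separate concern.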

翻訳日:2024-02-07 20:39:05 公開日:2024-02-06

# マルチスケール空間時間骨格マッチングによるワンショット行動認識

One-Shot Action Recognition via Multi-Scale Spatial-Temporal Skeleton Matching ( http://arxiv.org/abs/2307.07286v2 )

ライセンス: Link先を確認

Siyuan Yang, Jun Liu, Shijian Lu, Er Meng Hwa, Alex C. Kot

(参考訳) 単一のトレーニングサンプルで骨格行動認識モデルを学習することを目的としたワンショット骨格行動認識は,大規模な骨格行動データの収集とアノテーション付けの難しさから注目を集めている。しかし、既存のほとんどの研究は、特徴ベクトルを直接比較することで骨格シーケンスをマッチングしており、骨格データの空間構造や時間順序を無視している。本稿では,マルチスケールな時空間特徴マッチングによって骨格行動認識を行う新しいワンショット骨格行動認識技術を提案する。複数の空間的および時間的スケールでスケルトンデータを表現し、2つの視点から最適な特徴マッチングを実現する。ひとつはマルチスケールマッチングで、複数の空間的および時間的スケールでスケルトンデータのスケールごとの意味的関連性を同時に捉える。2つ目はクロススケールマッチングで、複数のスケールにまたがるサンプルごとの関連性を捉えることで、異なる動きの大きさと速度を扱う。3つの大規模データセット(NTU RGB+D, NTU RGB+D 120, PKU-MMD)に対する広範な実験により, 本手法は優れたワンショット骨格行動認識を達成し, 最先端手法を一貫して大きなマージンで上回ることが示された。

One-shot skeleton action recognition, which aims to learn a skeleton action recognition model with a single training sample, has attracted increasing interest due to the challenge of collecting and annotating large-scale skeleton action data. However, most existing studies match skeleton sequences by comparing their feature vectors directly, which neglects the spatial structures and temporal orders of skeleton data. This paper presents a novel one-shot skeleton action recognition technique that handles skeleton action recognition via multi-scale spatial-temporal feature matching. We represent skeleton data at multiple spatial and temporal scales and achieve optimal feature matching from two perspectives. The first is multi-scale matching, which captures the scale-wise semantic relevance of skeleton data at multiple spatial and temporal scales simultaneously. The second is cross-scale matching, which handles different motion magnitudes and speeds by capturing sample-wise relevance across multiple scales. Extensive experiments over three large-scale datasets (NTU RGB+D, NTU RGB+D 120, and PKU-MMD) show that our method achieves superior one-shot skeleton action recognition, and it outperforms the state-of-the-art consistently by large margins.

翻訳日:2024-02-07 20:38:43 公開日:2024-02-06

# StyleGAN3:並進と回転の同変性向上のための生成ネットワーク

StyleGAN3: Generative Networks for Improving the Equivariance of Translation and Rotation ( http://arxiv.org/abs/2307.03898v3 )

ライセンス: Link先を確認

Tianlei Zhu, Junqi Chen, Renzhe Zhu, Gaurav Gupta

(参考訳) StyleGANは、スタイルを用いて顔の姿勢やアイデンティティの特徴に、ノイズを用いて髪、しわ、肌の色、その他の詳細に影響を与えることができる。これらのうち、画像処理の結果はStyleGANのバージョンによって若干異なる。そこで、StyleGAN2 と StyleGAN3 の2つの改良版の性能差の比較が本研究の主な焦点となる。FFHQデータセットを使用し,FID,EQ-T,EQ-Rをモデル評価に使用した。その結果、StyleGAN3は同変性を改善するためのより良い生成ネットワークであることが判明した。私たちの発見は、アニメーションやビデオの作成にポジティブな影響を与えます。

StyleGAN can use style to affect facial posture and identity features, and noise to affect hair, wrinkles, skin color and other details. Among these, the outcomes of the image processing vary slightly between different versions of StyleGAN. As a result, the comparison of performance differences between StyleGAN2 and the two modified versions of StyleGAN3 is the main focus of this study. We used the FFHQ dataset and assessed the models with FID, EQ-T, and EQ-R. In the end, we found that StyleGAN3 is a better generative network for improving equivariance. Our findings have a positive impact on the creation of animation and videos.

翻訳日:2024-02-07 20:38:19 公開日:2024-02-06

# 予測状態表現の学習に有効なUCB型アルゴリズム

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations ( http://arxiv.org/abs/2307.00405v3 )

ライセンス: Link先を確認

Ruiquan Huang, Yingbin Liang, Jing Yang

(参考訳) マルコフ決定過程(MDP)と部分観測可能MDP(POMDP)を特別な場合として含む一般的な逐次意思決定問題は、時間とともに得られる観察と行動の履歴に基づいて一連の意思決定を行うことで累積報酬を最大化することを目的とする。近年の研究では、予測状態表現(PSR)によってモデル化された低ランク構造を持つ場合、逐次意思決定問題は統計的に学習可能であることが示されている。これらの進歩にもかかわらず、既存のアプローチは通常、計算的に扱いにくいオラクルやステップを含む。一方,バンディットやMDPにおいて計算効率の良い手法として成功を収めてきた上側信頼限界(UCB)に基づくアプローチは,こうしたより困難な設定における楽観的ボーナス設計の難しさから,より一般的なPSRでは研究されていない。本稿では,推定モデルと真のモデルの間の全変動距離を上から抑える新しいボーナス項を特徴とする,PSRに対する最初のUCB型アプローチを提案する。さらに,オンラインPSRとオフラインPSRの両方に対して設計したUCB型アルゴリズムのサンプル複雑度の境界を特徴付ける。PSRに対する既存のアプローチとは対照的に、我々のUCB型アルゴリズムは計算的な扱いやすさ、最終反復での準最適ポリシーの保証、モデル精度の保証を備えている。

The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time. Recent studies have shown that the sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). Despite these advancements, existing approaches typically involve oracles or steps that are computationally intractable. On the other hand, the upper confidence bound (UCB) based approaches, which have served successfully as computationally efficient methods in bandits and MDPs, have not been investigated for more general PSRs, due to the difficulty of optimistic bonus design in these more challenging settings. This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. We further characterize the sample complexity bounds for our designed UCB-type algorithms for both online and offline PSRs. In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.

翻訳日:2024-02-07 20:38:08 公開日:2024-02-06

# ベイズリスクの改善は競争で社会福祉を減らし得る

Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition ( http://arxiv.org/abs/2306.14670v3 )

ライセンス: Link先を確認

Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt, Nika Haghtalab

(参考訳) 機械学習モデルの規模が増加するにつれて、スケーリング則のようなトレンドは予測精度の一貫した下流での改善を予測している。しかし、これらのトレンドは独立した単一のモデルプロバイダの視点をとっており、現実にはプロバイダはユーザーを巡って互いに競い合うことが多い。本研究では,競争がこれらのスケーリングトレンドの振る舞いを根本的に変えうること、さらにはユーザ全体での予測精度が規模に対して非単調になったり低下したりしうることを示す。分類タスクの競争モデルを定義し、規模の増大の影響を研究するためのレンズとしてデータ表現を使用する。(ベイズリスクによって測定された)データ表現品質の改善が、競合するモデルプロバイダの市場において、ユーザ全体での予測精度(すなわち社会福祉)を低下させる多くの設定を見出した。我々の例は、単純な設定の閉形式の公式から、CIFAR-10の事前訓練された表現を伴うシミュレーションまで様々である。概念レベルでは、個々のモデルプロバイダにとって好ましいスケーリング傾向が、複数のモデルプロバイダを持つマーケットプレイスにおける社会福祉の下流での改善につながるとは限らないことを示唆する。

As the scale of machine learning models increases, trends such as scaling laws anticipate consistent downstream improvements in predictive accuracy. However, these trends take the perspective of a single model-provider in isolation, while in reality providers often compete with each other for users. In this work, we demonstrate that competition can fundamentally alter the behavior of these scaling trends, even causing overall predictive accuracy across users to be non-monotonic or decreasing with scale. We define a model of competition for classification tasks, and use data representations as a lens for studying the impact of increases in scale. We find many settings where improving data representation quality (as measured by Bayes risk) decreases the overall predictive accuracy across users (i.e., social welfare) for a marketplace of competing model-providers. Our examples range from closed-form formulas in simple settings to simulations with pretrained representations on CIFAR-10. At a conceptual level, our work suggests that favorable scaling trends for individual model-providers need not translate to downstream improvements in social welfare in marketplaces with multiple model providers.

翻訳日:2024-02-07 20:37:45 公開日:2024-02-06

# 均一電場及び磁場中の平面フェルミオンに対するフェルミオン凝縮と真空エネルギー運動量テンソル

Fermionic condensate and the vacuum energy-momentum tensor for planar fermions in homogeneous electric and magnetic fields ( http://arxiv.org/abs/2306.11402v3 )

ライセンス: Link先を確認

V. V. Parazian

(参考訳) 外部の一定かつ均質な電場と磁場の中で平面上に局在した、質量を持つフェルミオン量子場を考える。磁場は平面に垂直であり、電場は平面に平行である。ディラック方程式に対する完全な解の集合が提示される。真空状態の重要な物理特性として,フェルミオン凝縮とエネルギー運動量テンソルの期待値について検討した。繰り込みはHurwitz関数を用いて行われる。結果は、電場がゼロの場合に以前研究された結果と比較される。問題のパラメータの値の様々な領域における真空期待値の挙動について考察する。結果の応用には、長波長近似におけるディラックモデルにより記述されるグラフェンシートの電子サブシステムが含まれる。

We consider a massive fermionic quantum field localized on a plane in external constant and homogeneous electric and magnetic fields. The magnetic field is perpendicular to the plane and the electric field is parallel. The complete set of solutions to the Dirac equation is presented. As important physical characteristics of the vacuum state, the fermion condensate and the expectation value of the energy-momentum tensor are investigated. The renormalization is performed using the Hurwitz function. The results are compared with those previously studied in the case of zero electric field. We discuss the behavior of the vacuum expectation values in different regions for the values of the problem parameters. Applications of the results include the electronic subsystem of graphene sheet described by the Dirac model in the long-wavelength approximation.

翻訳日:2024-02-07 20:37:27 公開日:2024-02-06

# 多段階一般化による弱教師付き3次元物体検出

Weakly Supervised 3D Object Detection with Multi-Stage Generalization ( http://arxiv.org/abs/2306.05418v2 )

ライセンス: Link先を確認

Jiawei He, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang

(参考訳) 大規模モデルの急速な発展に伴い、データの必要性はますます重要になっている。特に3Dオブジェクト検出では、コストのかかる手動アノテーションがさらなる進歩を妨げている。アノテーションの負担を軽減するため,2次元アノテーションのみに基づいて3次元オブジェクト検出を実現するという課題について検討した。高度な3D再構成技術により、静的な3Dシーン全体を再構築することが可能になった。しかし、シーン全体から正確なオブジェクトレベルのアノテーションを抽出し、これらの限られたアノテーションをシーン全体に一般化することは、依然として課題である。本稿では,擬似ラベル生成と多段階一般化を包含するBA$^2$-Detと呼ばれる新しいパラダイムを提案する。再構成されたシーンレベルの点群からオブジェクトクラスタを得るためにDoubleClusteringアルゴリズムを考案し,一般化の3段階(完全から部分へ,静的から動的へ,近くから遠くへ)を展開することにより,モデルの検出能力をさらに向上させる。大規模なWaymo Open Datasetで実施された実験によると、BA$^2$-Detのパフォーマンスは10%のアノテーションを使用した完全教師あり手法と同等である。さらに、事前トレーニングのために大規模な生動画を使用すると、BA$^2$-DetはKITTIデータセットに対して20%の相対的な改善を達成できる。この手法は複雑なシーンでオープンセットの3Dオブジェクトを検出する可能性も大きい。プロジェクトページ: https://ba2det.site。

With the rapid development of large models, the need for data has become increasingly crucial. Especially in 3D object detection, costly manual annotations have hindered further advancements. To reduce the burden of annotation, we study the problem of achieving 3D object detection solely based on 2D annotations. Thanks to advanced 3D reconstruction techniques, it is now feasible to reconstruct the overall static 3D scene. However, extracting precise object-level annotations from the entire scene and generalizing these limited annotations to the entire scene remain challenges. In this paper, we introduce a novel paradigm called BA$^2$-Det, encompassing pseudo label generation and multi-stage generalization. We devise the DoubleClustering algorithm to obtain object clusters from reconstructed scene-level points, and further enhance the model's detection capabilities by developing three stages of generalization: progressing from complete to partial, static to dynamic, and close to distant. Experiments conducted on the large-scale Waymo Open Dataset show that the performance of BA$^2$-Det is on par with the fully-supervised methods using 10% annotations. Additionally, using large raw videos for pretraining, BA$^2$-Det can achieve a 20% relative improvement on the KITTI dataset. The method also has great potential for detecting open-set 3D objects in complex scenes. Project page: https://ba2det.site.

翻訳日:2024-02-07 20:37:16 公開日:2024-02-06

# アクティベーション最適化を用いたトロイの木馬モデル検出

Trojan Model Detection Using Activation Optimization ( http://arxiv.org/abs/2306.04877v2 )

ライセンス: Link先を確認

Mohamed E. Hussein, Sudharshan Subramaniam Janakiraman, Wael AbdAlmageed

(参考訳) 機械学習モデルのトレーニングは非常に高価であり、場合によっては手が届かないこともある。これは、例えば、データの制約(入手できない、あるいは大きすぎる)や計算能力の制約のためかもしれない。したがって、可能な限りオープンソースの事前学習モデルに頼るのが一般的である。しかし、このプラクティスはセキュリティの観点からは憂慮すべきものである。事前訓練されたモデルはトロイの木馬攻撃に感染している可能性があり、攻撃者はモデルにトリガーを埋め込んで、入力にトリガーが存在するときにモデルの動作を制御できるようにする。本稿では,トロイの木馬モデルを検出する新しい手法を提案する。本手法はアクティベーション最適化に基づいてモデルのシグネチャを生成する。次に、そのシグネチャからトロイの木馬モデルを検出するように分類器を訓練する。我々はこの手法を、TRojan Identification from Gradient-based Signatures の略としてTRIGSと呼ぶ。TRIGSは、畳み込みモデルの2つの公開データセットで最先端のパフォーマンスを達成する。さらに,視覚トランスフォーマーアーキテクチャに基づいた,ImageNetモデルの新たな挑戦的データセットも紹介する。TRIGSは新しいデータセットで最高のパフォーマンスを提供し、ベースラインメソッドを大きなマージンで上回る。また,本実験では,TRIGSは良好な性能を得るのに少量のクリーンなサンプルしか必要とせず、防御者が攻撃者のモデルアーキテクチャについて事前の知識を持たない場合でも十分に機能することを示した。私たちのデータセットはまもなくリリースされます。

Training machine learning models can be very expensive or even unaffordable. This may be, for example, due to data limitations (unavailability or being too large), or computational power limitations. Therefore, it is a common practice to rely on open-source pre-trained models whenever possible. However, this practice is alarming from a security perspective. Pre-trained models can be infected with Trojan attacks, in which the attacker embeds a trigger in the model such that the model's behavior can be controlled by the attacker when the trigger is present in the input. In this paper, we present a novel method for detecting Trojan models. Our method creates a signature for a model based on activation optimization. A classifier is then trained to detect a Trojan model given its signature. We call our method TRIGS for TRojan Identification from Gradient-based Signatures. TRIGS achieves state-of-the-art performance on two public datasets of convolutional models. Additionally, we introduce a new challenging dataset of ImageNet models based on the vision transformer architecture. TRIGS delivers the best performance on the new dataset, surpassing the baseline methods by a large margin. Our experiments also show that TRIGS requires only a small amount of clean samples to achieve good performance, and works reasonably well even if the defender does not have prior knowledge about the attacker's model architecture. Our dataset will be released soon.

翻訳日:2024-02-07 20:36:54 公開日:2024-02-06

# 高次元および置換不変異常検出

High-dimensional and Permutation Invariant Anomaly Detection ( http://arxiv.org/abs/2306.03933v4 )

ライセンス: Link先を確認

Vinicius Mikuni, Benjamin Nachman

(参考訳) 新しい物理過程の異常検出法は、高次元確率密度の学習が困難であるため、しばしば低次元空間に限られる。特に構成粒子レベルでは,一般的な密度推定法に置換不変性や可変長入力などの望ましい特性を組み込むことが困難となる。本研究では, 拡散モデルに基づく粒子物理学データ向けの、可変長入力を扱うために特別に設計された置換不変な密度推定器を提案する。本手法の有効性は,学習した密度を置換不変な異常検出スコアとして利用し,背景のみの仮説の下で尤度の低いジェットを効果的に同定することによって実証する。密度推定法を検証するため, 学習した密度の比を調べ、教師付き分類アルゴリズムにより得られた比と比較した。

Methods for anomaly detection of new physics processes are often limited to low-dimensional spaces due to the difficulty of learning high-dimensional probability densities. Particularly at the constituent level, incorporating desirable properties such as permutation invariance and variable-length inputs becomes difficult within popular density estimation methods. In this work, we introduce a permutation-invariant density estimator for particle physics data based on diffusion models, specifically designed to handle variable-length inputs. We demonstrate the efficacy of our methodology by utilizing the learned density as a permutation-invariant anomaly detection score, effectively identifying jets with low likelihood under the background-only hypothesis. To validate our density estimation method, we investigate the ratio of learned densities and compare to those obtained by a supervised classification algorithm.

翻訳日:2024-02-07 20:36:34 公開日:2024-02-06

# 3次元分子相互作用学習に向けたジェネラリスト同変トランスフォーマー

Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning ( http://arxiv.org/abs/2306.01474v4 )

ライセンス: Link先を確認

Xiangzhe Kong, Wenbing Huang, Yang Liu

(参考訳) 生物学や創薬における多くのプロセスは、タンパク質とタンパク質、タンパク質と小分子など、分子間の様々な3D相互作用を含んでいる。異なる分子は通常異なる粒度で表されるため、既存の手法では各種類の分子を異なるモデルで独立にエンコードしており、根底にある普遍的な相互作用物理学を学ぶことが難しい。本稿ではまず,任意の3次元複合体を集合の幾何学的グラフとして普遍的に表現することを提案し、全ての種類の分子を1つのモデルで符号化する道を開く。次に、ドメイン固有の階層とドメインに依存しない相互作用物理学の両方を効果的に捉えるためのジェネラリスト同変トランスフォーマー(GET)を提案する。具体的には、GETは二階層アテンションモジュール、フィードフォワードモジュール、レイヤ正規化モジュールで構成されており、各モジュールはE(3)同変であり、可変サイズの集合を扱うのに特化している。特に、従来のプーリングベースの階層モデルとは対照的に、GETはあらゆるレベルのきめ細かい情報を保持できる。タンパク質,小分子,RNA/DNA間の相互作用に関する広範な実験により,提案手法の有効性と、異なるドメインにわたる汎化能力が検証された。

Many processes in biology and drug discovery involve various 3D interactions between molecules, such as protein and protein, or protein and small molecule. Given that different molecules are usually represented at different granularities, existing methods usually encode each type of molecule independently with different models, which makes it difficult to learn the universal underlying interaction physics. In this paper, we first propose to universally represent an arbitrary 3D complex as a geometric graph of sets, shedding light on encoding all types of molecules with one model. We then propose a Generalist Equivariant Transformer (GET) to effectively capture both domain-specific hierarchies and domain-agnostic interaction physics. To be specific, GET consists of a bilevel attention module, a feed-forward module and a layer normalization module, where each module is E(3) equivariant and specialized for handling sets of variable sizes. Notably, in contrast to conventional pooling-based hierarchical models, our GET is able to retain fine-grained information of all levels. Extensive experiments on the interactions between proteins, small molecules and RNA/DNAs verify the effectiveness and generalization capability of our proposed method across different domains.

翻訳日:2024-02-07 20:36:21 公開日:2024-02-06

# 事前学習音声モデルのモデル伝達可能性の推定法

How to Estimate Model Transferability of Pre-Trained Speech Models? ( http://arxiv.org/abs/2306.01015v3 )

ライセンス: Link先を確認

Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath

(参考訳) 本研究では,微調整対象タスクに対する事前学習音声モデル(PSM)の転移可能性を推定する「スコアベースアセスメント」フレームワークを提案する。我々は,ベイズ尤度推定と最適輸送という2つの表現理論を活用し,抽出した表現を用いてPSM候補のランクスコアを生成する。提案手法は,時間的独立性の仮説を置くことで,候補モデルやレイヤを実際に微調整することなく,転移可能性スコアを効率的に計算する。公開データを用いて,一般的な教師付き音声モデル (Conformer RNN-Transducerなど) と自己教師付き音声モデル (HuBERTなど) をクロス層およびクロスモデル設定で評価する。実験の結果,提案する推定フレームワークと実際の微調整結果(ground truth)との間で,スピアマンの順位相関は高く,$p$値は低いことがわかった。提案する転移可能性フレームワークは計算時間と資源を少なくし,音声基礎モデルをチューニングするための資源節約と時間効率のアプローチとなる。

In this work, we introduce a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs) for fine-tuning target tasks. We leverage two representation theories, Bayesian likelihood estimation and optimal transport, to generate rank scores for the PSM candidates using the extracted representations. Our framework efficiently computes transferability scores without actual fine-tuning of candidate models or layers by making a temporal independence hypothesis. We evaluate some popular supervised speech models (e.g., Conformer RNN-Transducer) and self-supervised speech models (e.g., HuBERT) in cross-layer and cross-model settings using public data. Experimental results show a high Spearman's rank correlation and low $p$-value between our estimation framework and fine-tuning ground truth. Our proposed transferability framework requires less computational time and resources, making it a resource-saving and time-efficient approach for tuning speech foundation models.
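The abstract validates its scores by Spearman's rank correlation against fine-tuning results. As a minimal sketch of that evaluation step only (not of the transferability score itself), the following computes Spearman's coefficient from scratch; the function names and the example numbers are illustrative assumptions, not values from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical example: estimated scores vs. fine-tuned accuracy for 4 models.
estimated_scores = [0.2, 0.5, 0.7, 0.9]
finetuned_accuracy = [61.0, 70.5, 69.0, 80.2]
rho = spearman(estimated_scores, finetuned_accuracy)
```

A value of `rho` close to 1 would mean the cheap score orders the candidates almost the same way as actual fine-tuning does.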

翻訳日:2024-02-07 20:36:00 公開日:2024-02-06

# 自己教師付き学習における確率的多値論理演算による表現合成

Representation Synthesis by Probabilistic Many-Valued Logic Operation in Self-Supervised Learning ( http://arxiv.org/abs/2309.04148v2 )

ライセンス: Link先を確認

Hiroki Nakamura, Masashi Okada, Tadahiro Taniguchi

(参考訳) 本稿では,論理操作が可能な表現のための自己教師付き学習(SSL)手法を提案する。表現学習は画像生成や検索といった様々なタスクに適用されている。表現の論理制御性はこれらのタスクにとって重要である。自然言語を入力として表現の直感的な制御を可能にする方法がいくつか示されているが、表現間の論理操作による表現制御は実証されていない。表現合成を用いたSSL手法(例えば、要素ごとの平均や最大演算)が提案されているが、これらの手法で実行される操作は論理演算を含まない。本研究では,既存の表現合成を多値論理の確率的拡張上のOR演算に置き換えることで,論理操作可能な自己教師付き表現学習手法を提案する。表現は、画像内の各特徴の有無を示す真理値である特徴保有度の集合からなり、論理演算(例えば、OR、AND)を実現する。本手法は,両表現の特徴を併せ持つ表現や,両表現に共通する特徴のみを持つ表現を生成することができる。さらに、多値論理の真理値の確率分布によって特徴保有度を示すことにより、特徴の曖昧な存在を表現することを実現する。合成表現を用いた従来のSSL手法と比較して,本手法はシングルラベルとマルチラベルの分類タスクにおいて競合的に動作することを示した。さらに,MNIST と PascalVOC を用いた画像検索実験により,提案手法の表現を OR および AND 演算により操作可能であることを示した。

In this paper, we propose a new self-supervised learning (SSL) method for representations that enable logic operations. Representation learning has been applied to various tasks, such as image generation and retrieval. The logical controllability of representations is important for these tasks. Although some methods have been shown to enable the intuitive control of representations using natural languages as the inputs, representation control via logic operations between representations has not been demonstrated. Some SSL methods using representation synthesis (e.g., elementwise mean and maximum operations) have been proposed, but the operations performed in these methods do not incorporate logic operations. In this work, we propose a logic-operable self-supervised representation learning method by replacing the existing representation synthesis with the OR operation on the probabilistic extension of many-valued logic. The representations comprise a set of feature-possession degrees, which are truth values indicating the presence or absence of each feature in the image, and realize the logic operations (e.g., OR and AND). Our method can generate a representation that has the features of both representations or only those features common to both representations. In addition, the expression of the ambiguous presence of a feature is realized by indicating the feature-possession degree by the probability distribution of truth values of the many-valued logic. We showed that our method performs competitively in single and multi-label classification tasks compared with prior SSL methods using synthetic representations. Moreover, experiments on image retrieval using MNIST and PascalVOC showed that the representations of our method can be operated by OR and AND operations.
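To make the idea of logic operations on distributions over truth values concrete, here is a minimal sketch under two loudly stated assumptions that are not taken from the paper: OR and AND are the Gödel connectives (maximum and minimum of truth values), and the two inputs are independent. Each feature is a categorical distribution over ordered truth values such as {0, 0.5, 1}; the function names are hypothetical.

```python
def prob_or(p, q):
    """Distribution of max(A, B) for independent A ~ p and B ~ q.

    p[k] and q[k] are the probabilities of the k-th truth value, with the
    truth values sorted from "false" to "true".
    """
    cdf_p, cdf_q, total_p, total_q = [], [], 0.0, 0.0
    for a, b in zip(p, q):
        total_p += a
        total_q += b
        cdf_p.append(total_p)
        cdf_q.append(total_q)
    # P(max <= k) = P(A <= k) * P(B <= k) under independence.
    joint = [a * b for a, b in zip(cdf_p, cdf_q)]
    return [joint[0]] + [joint[k] - joint[k - 1] for k in range(1, len(joint))]


def prob_and(p, q):
    """Distribution of min(A, B) for independent A ~ p and B ~ q."""
    # P(min >= k) = P(A >= k) * P(B >= k) under independence.
    tail_p = [sum(p[k:]) for k in range(len(p))]
    tail_q = [sum(q[k:]) for k in range(len(q))]
    joint = [a * b for a, b in zip(tail_p, tail_q)]
    return [joint[k] - joint[k + 1] for k in range(len(joint) - 1)] + [joint[-1]]


# One feature, three truth values (false, ambiguous, true).
feature_in_image_a = [0.2, 0.3, 0.5]
feature_in_image_b = [0.5, 0.25, 0.25]
either = prob_or(feature_in_image_a, feature_in_image_b)
both = prob_and(feature_in_image_a, feature_in_image_b)
```

Under these assumptions, OR shifts probability mass toward "true" (a feature present in either image) and AND shifts it toward "false" (only features present in both), which mirrors the behaviour the abstract describes.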

翻訳日:2024-02-07 20:29:05 公開日:2024-02-06

# DECODE: 建物の歴史的データと環境要因を活用したデータ駆動エネルギー消費予測

DECODE: Data-driven Energy Consumption Prediction leveraging Historical Data and Environmental Factors in Buildings ( http://arxiv.org/abs/2309.02908v2 )

ライセンス: Link先を確認

Aditya Mishra, Haroon R. Lone, Aayush Mishra

(参考訳) 建物のエネルギー予測は、効率的なエネルギー管理において重要な役割を果たす。正確な予測は、グリッド内の最適なエネルギー消費と分配を達成するために不可欠である。本稿では,過去のエネルギーデータ,居住パターン,気象条件を用いて,建築エネルギー消費量を予測するための長期短期記憶モデル(lstm)を提案する。 LSTMモデルは、既存の予測モデルと比較して、住宅や商業ビルの正確な短・中・長期エネルギー予測を提供する。 LSTMモデルと線形回帰,決定木,ランダム林などの確立した予測手法を比較した。 LSTMモデルは、すべての指標において優れたパフォーマーとして現れます。これは例外的な予測精度を示し、R2スコアは0.97で、平均絶対誤差(MAE)は0.007である。開発したモデルのさらなる利点は、限られたデータセットでトレーニングしても効率的なエネルギー消費予測を実現する能力である。我々は,実世界のデータに対する厳密なトレーニングと評価を通じて,過剰フィッティング(分散)と低フィッティング(バイアス)に関する懸念に対処する。まとめると、我々の研究は代替手法より優れ、優れた効率、一般化可能性、信頼性で機能する堅牢なLSTMモデルを提供することでエネルギー予測に寄与する。

Energy prediction in buildings plays a crucial role in effective energy management. Precise predictions are essential for achieving optimal energy consumption and distribution within the grid. This paper introduces a Long Short-Term Memory (LSTM) model designed to forecast building energy consumption using historical energy data, occupancy patterns, and weather conditions. The LSTM model provides accurate short, medium, and long-term energy predictions for residential and commercial buildings compared to existing prediction models. We compare our LSTM model with established prediction methods, including linear regression, decision trees, and random forest. Encouragingly, the proposed LSTM model emerges as the superior performer across all metrics. It demonstrates exceptional prediction accuracy, boasting the highest R2 score of 0.97 and the most favorable mean absolute error (MAE) of 0.007. An additional advantage of our developed model is its capacity to achieve efficient energy consumption forecasts even when trained on a limited dataset. We address concerns about overfitting (variance) and underfitting (bias) through rigorous training and evaluation on real-world data. In summary, our research contributes to energy prediction by offering a robust LSTM model that outperforms alternative methods and operates with remarkable efficiency, generalizability, and reliability.
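The two metrics quoted in the abstract can be written out directly. The sketch below uses the standard definitions of mean absolute error and the coefficient of determination; the sample readings are made up for illustration and are unrelated to the paper's data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical normalized hourly consumption and a model's forecasts.
actual = [0.30, 0.42, 0.55, 0.47, 0.33]
forecast = [0.31, 0.40, 0.56, 0.45, 0.35]
mae = mean_absolute_error(actual, forecast)
r2 = r2_score(actual, forecast)
```

Note that MAE is expressed in the units of the target, so a value such as 0.007 is only interpretable relative to how the consumption data were scaled, whereas R2 is unitless.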

翻訳日:2024-02-07 20:28:22 公開日:2024-02-06

# OHQ:オンチップのハードウェア対応量子化

OHQ: On-chip Hardware-aware Quantization ( http://arxiv.org/abs/2309.01945v2 )

ライセンス: Link先を確認

Wei Huang, Haotong Qin, Yangdong Liu, Jingzhuo Liang, Yulun Zhang, Ying Li, Xianglong Liu

(参考訳) 量子化は、リソース制約のあるハードウェアに高度なディープモデルをデプロイするための最も有望なアプローチの1つとして現れます。 mixed-precision quantizationは、複数のビット幅アーキテクチャを活用して、量子化モデルの精度と効率性を解き放つ。しかし、既存の混合精度量子化は、膨大な計算オーバーヘッドを引き起こす網羅的な探索空間に苦しむ。本稿では,ハードウェア・アウェア・量子化(ohq)フレームワークを提案する。このフレームワークは,オンラインデバイスにアクセスせずにハードウェア・アウェアの複合精度量子化を行う。第一に、オンチップ量子化認識(OQA)パイプラインを構築し、ハードウェア上で量子化演算子の実際の効率指標を認識できるようにする。第二に、オンチップレベルの計算能力の制約下で演算子の精度指標を効率的に推定するMask-guided Quantization Estimation(MQE)技術を提案する。特に、量子化プロセスは、追加のコンピューティングデバイスやデータアクセスなしで、オンチップで完全に実行される。 ResNet-18とMobileNetV3では,それぞれ70%,73%の精度を実現した。 OHQは、デプロイメント時のINT8と比較して、レイテンシを15～30%改善する。

Quantization emerges as one of the most promising approaches for deploying advanced deep models on resource-constrained hardware. Mixed-precision quantization leverages multiple bit-width architectures to unleash the accuracy and efficiency potential of quantized models. However, existing mixed-precision quantization suffers from an exhaustive search space that causes immense computational overhead. The quantization process thus relies on separate high-performance devices rather than running locally, which also leads to a significant gap between the considered hardware metrics and the real deployment. In this paper, we propose an On-chip Hardware-aware Quantization (OHQ) framework that performs hardware-aware mixed-precision quantization without accessing online devices. First, we construct the On-chip Quantization Awareness (OQA) pipeline, which makes it possible to perceive the actual efficiency metrics of the quantization operator on the hardware. Second, we propose the Mask-guided Quantization Estimation (MQE) technique to efficiently estimate the accuracy metrics of operators under the constraints of on-chip-level computing power. By synthesizing network and hardware insights through linear programming, we obtain optimized bit-width configurations. Notably, the quantization process occurs entirely on-chip, without any additional computing devices or data access. We demonstrate accelerated inference after quantization for various architectures and compression ratios, achieving 70% and 73% accuracy for ResNet-18 and MobileNetV3, respectively. OHQ improves latency by 15~30% compared to INT8 on deployment.
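The bit-width selection step can be illustrated with a toy problem: pick one bit-width per layer so that total accuracy sensitivity is minimal while total latency stays within a budget. The abstract states that the authors solve this with linear programming; the sketch below deliberately swaps in a plain exhaustive search so that it needs only the standard library, and all sensitivity and latency numbers are hypothetical.

```python
from itertools import product


def choose_bit_widths(sensitivity, latency, budget):
    """Return the per-layer bit-widths with the lowest summed sensitivity
    whose summed latency does not exceed `budget`, or None if none fits.

    sensitivity[i][b] and latency[i][b] describe layer i quantized to b bits.
    """
    options = [sorted(layer) for layer in sensitivity]
    best_cost, best_config = None, None
    for config in product(*options):
        total_latency = sum(latency[i][b] for i, b in enumerate(config))
        if total_latency > budget:
            continue  # violates the hardware constraint
        cost = sum(sensitivity[i][b] for i, b in enumerate(config))
        if best_cost is None or cost < best_cost:
            best_cost, best_config = cost, config
    return best_config


# Two layers, each quantizable to 4 or 8 bits (made-up measurements).
layer_sensitivity = [{4: 0.9, 8: 0.1}, {4: 0.2, 8: 0.05}]
layer_latency = [{4: 1.0, 8: 2.0}, {4: 1.0, 8: 2.0}]
config = choose_bit_widths(layer_sensitivity, layer_latency, budget=3.0)
```

Exhaustive search grows exponentially with the number of layers, which is exactly the overhead the abstract criticizes; an integer or linear program over the same two tables scales far better.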

翻訳日:2024-02-07 20:28:04 公開日:2024-02-06

# WS-SfMLearner: カメラパラメータ不明の手術ビデオにおける自己教師付き単眼深度とエゴモーション推定

WS-SfMLearner: Self-supervised Monocular Depth and Ego-motion Estimation on Surgical Videos with Unknown Camera Parameters ( http://arxiv.org/abs/2308.11776v2 )

ライセンス: Link先を確認

Ange Lou and Jack Noble

(参考訳) 手術映像の深さ推定は多くの画像誘導手術において重要な役割を担っている。しかし,手術シーンの明るさやノイズの相違が原因で,手術映像に深度マップの真実データセットを作成するのが難しく,時間を要する。そのため,コンピュータビジョンコミュニティからは,高精度でロバストな自己監視深度とカメラの自我運動推定システムの構築が注目されている。いくつかの自己監督手法は、地上の真理深度マップやポーズの必要性を緩和するが、カメラ固有のパラメータがまだ必要であり、しばしば欠落しているか記録されていない。さらに,既存の作業におけるカメラ固有の予測手法は,データセットの品質に大きく依存する。本研究では,正確な深度マップとカメラポーズだけでなく,カメラ固有のパラメータを予測できる自己教師付き深度推定システムの構築を目標とした。我々は,カメラパラメータ予測のための補助的な監視を行うために,コストボリュームに基づく監視手法を提案した。実験の結果,提案手法は推定カメラパラメータ,エゴモーション,深さ推定の精度を改善した。

Depth estimation in surgical video plays a crucial role in many image-guided surgery procedures. However, it is difficult and time-consuming to create depth map ground truth datasets in surgical videos, due in part to inconsistent brightness and noise in the surgical scene. Therefore, building an accurate and robust self-supervised depth and camera ego-motion estimation system is gaining more attention from the computer vision community. Although several self-supervision methods alleviate the need for ground truth depth maps and poses, they still need known camera intrinsic parameters, which are often missing or not recorded. Moreover, the camera intrinsic prediction methods in existing works depend heavily on the quality of datasets. In this work, we aimed to build a self-supervised depth and ego-motion estimation system which can predict not only accurate depth maps and camera pose, but also camera intrinsic parameters. We proposed a cost-volume-based supervision scheme to give the system auxiliary supervision for camera parameter prediction. The experimental results showed that the proposed method improved the accuracy of estimated camera parameters, ego-motion, and depth estimation.

翻訳日:2024-02-07 20:26:35 公開日:2024-02-06

# SAMSNeRF: Segment Anything Model (SAM) は Neural Radiance Field (NeRF) によるダイナミックな手術シーンの再構築をガイドする

SAMSNeRF: Segment Anything Model (SAM) Guides Dynamic Surgical Scene Reconstruction by Neural Radiance Field (NeRF) ( http://arxiv.org/abs/2308.11774v2 )

ライセンス: Link先を確認

Ange Lou, Yamin Li, Xing Yao, Yike Zhang and Jack Noble

(参考訳) 手術映像からの手術シーンの正確な再構成は, 術中ナビゲーションや画像誘導ロボット手術自動化など, 様々な応用に不可欠である。しかし,従来のアプローチは主に深度推定に頼っているため,移動式手術器具による手術シーンの再構築には限界がある。この制限に対処し,すべてのフレームにおける手術器具の正確な3次元位置予測を行うため,Segment Anything Model (SAM) とNeRF(NeRF)技術を組み合わせたSAMSNeRFと呼ばれる新しいアプローチを提案する。提案手法は,NeRFによる動的手術シーン再構築の洗練を導くSAMを用いて,手術器具の正確なセグメンテーションマスクを生成する。腹腔鏡下手術ビデオにおける実験結果から,本手法は高忠実度ダイナミックな手術場面を再現し,手術器具の空間情報を正確に反映する。提案手法は手術時の手術器具の正確な3次元位置情報を外科医に提供することで,手術ナビゲーションと自動化を大幅に向上させることができる。

The accurate reconstruction of surgical scenes from surgical videos is critical for various applications, including intraoperative navigation and image-guided robotic surgery automation. However, previous approaches, mainly relying on depth estimation, have limited effectiveness in reconstructing surgical scenes with moving surgical tools. To address this limitation and provide accurate 3D position prediction for surgical tools in all frames, we propose a novel approach called SAMSNeRF that combines Segment Anything Model (SAM) and Neural Radiance Field (NeRF) techniques. Our approach generates accurate segmentation masks of surgical tools using SAM, which guides the refinement of the dynamic surgical scene reconstruction by NeRF. Our experimental results on public endoscopy surgical videos demonstrate that our approach successfully reconstructs high-fidelity dynamic surgical scenes and accurately reflects the spatial information of surgical tools. Our proposed approach can significantly enhance surgical navigation and automation by providing surgeons with accurate 3D position information of surgical tools during surgery. The source code will be released soon.

翻訳日:2024-02-07 20:26:17 公開日:2024-02-06

# 正則化された$(XP)^2$モデル

A Regularized $(XP)^2$ Model ( http://arxiv.org/abs/2308.11648v2 )

ライセンス: Link先を確認

Yu-Qi Chen and Zhao-Feng Ge

(参考訳) 古典的ハミルトニアン $H(x,p)=(x^2+a^2)(p^2+a^2)$ ($a^2>0$) によって記述される動的モデルを、古典力学、半古典力学、量子力学において検討する。高エネルギー$E$の極限では、位相経路は$(XP)^2$モデルのものに似ている。しかし、$a$のゼロでない値はレギュレータとして作用し、$x, p \sim 0$の領域に現れる特異点を取り除き、その結果、状態密度の対数的増加を特徴とする離散スペクトルが得られる。古典解は楕円関数によって記述され、周期は楕円積分によって決定される。半古典近似では、漸近リーマン・ジーゲル公式は複数の位相経路からの寄与の和として解釈できると推測する。量子化ハミルトニアンの3つの異なる形式を示し、それらを$\cosh 2x$型のポテンシャルを持つ標準的なシュレーディンガー方程式に再定式化する。これらの形式のスペクトルを数値的に評価し、エネルギー準位にわずかな違いがあることを明らかにした。そのうちの1つの興味深い形式は、シュレーディンガー方程式におけるハミルトニアンが古典版と同一である。そのような場合、固有値方程式は$i\infty$点でのマチュー関数の値の消滅として表すことができ、さらにマチュー関数は波動関数として表すことができる。

We investigate a dynamic model described by the classical Hamiltonian $H(x,p)=(x^2+a^2)(p^2+a^2)$, where $a^2>0$, in classical, semi-classical, and quantum mechanics. In the high-energy $E$ limit, the phase path resembles that of the $(XP)^2$ model. However, the non-zero value of $a$ acts as a regulator, removing the singularities that appear in the region where $x, p \sim 0$, resulting in a discrete spectrum characterized by a logarithmic increase in state density. Classical solutions are described by elliptic functions, with the period being determined by elliptic integrals. In semi-classical approximation, we speculate that the asymptotic Riemann-Siegel formula may be interpreted as summing over contributions from multiple phase paths. We present three different forms of quantized Hamiltonians, and reformulate them into the standard Schrödinger equation with $\cosh 2x$-like potentials. Numerical evaluations of the spectra for these forms are carried out and reveal minor differences in energy levels. Among them, one interesting form possesses a Hamiltonian in the Schrödinger equation that is identical to its classical version. In such scenarios, the eigenvalue equations can be expressed as the vanishing of the Mathieu functions' value at $i\infty$ points, and furthermore, the Mathieu functions can be represented as the wave functions.
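The regulating role of $a$ can be seen from a few elementary consequences of the stated Hamiltonian. The lines below are a short worked derivation from $H(x,p)$ alone, taking $a>0$ without loss of generality; they are not reproduced from the paper.

```latex
\begin{align}
H(x,p) &= (x^2+a^2)(p^2+a^2), \\
\dot{x} &= \frac{\partial H}{\partial p} = 2p\,(x^2+a^2), \\
\dot{p} &= -\frac{\partial H}{\partial x} = -2x\,(p^2+a^2), \\
H(x,p) &\ge a^4, \quad \text{with equality only at } x=p=0, \\
x_{\max} &= \frac{\sqrt{E-a^4}}{a}
  \quad \text{(turning point at } p=0 \text{ for energy } E \ge a^4\text{)}.
\end{align}
```

For $a^2>0$ every level set $H=E$ is therefore a closed curve with $|x|\le x_{\max}$ and, by the $x \leftrightarrow p$ symmetry, $|p|\le x_{\max}$. In the limit $a \to 0$ the level sets become the unbounded hyperbolas $x^2p^2=E$, which is consistent with the abstract's statement that a non-zero $a$ yields a discrete spectrum.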

翻訳日:2024-02-07 20:25:57 公開日:2024-02-06

# 思考のグラフ: 大きな言語モデルで精巧な問題を解決する

Graph of Thoughts: Solving Elaborate Problems with Large Language Models ( http://arxiv.org/abs/2308.09687v4 )

ライセンス: Link先を確認

Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler

(参考訳) Graph of Thoughts (GoT) を紹介する。これは、大規模言語モデル(LLM)におけるプロンプト機能を、Chain-of-ThoughtやTree of Thoughts (ToT)といったパラダイムによって提供されるものを超えて推進するフレームワークである。 GoTの鍵となるアイデアと主要な利点は、LLMによって生成された情報を任意のグラフとしてモデル化する能力であり、そこでは情報の単位(LLM思考)が頂点であり、エッジはこれらの頂点間の依存関係に対応する。このアプローチにより、任意のLLM思考を相乗的な結果に組み合わせ、思考のネットワーク全体の本質を蒸留したり、フィードバックループを用いて思考を強化することができる。 GoTが様々なタスクで最先端手法よりも優れていることを示す。例えば、ToTと比べてソートの品質を62%向上させ、同時にコストを31%以上削減する。我々は、GoTが新しい思考変換によって拡張可能であることを保証し、それによって新しいプロンプトスキームを先導することができる。この研究は、LLM推論を人間の思考や再帰などの脳機構に近づけるものであり、どちらも複雑なネットワークを形成する。

We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by >31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks.
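The data structure described in the abstract (thoughts as vertices, dependencies as edges, and an aggregation step that combines several thoughts) can be sketched without any model calls. Everything below is a hypothetical illustration: the class and method names are invented, and a deterministic merge of sorted lists stands in for what would be an LLM-generated aggregation in the real framework.

```python
from heapq import merge


class ThoughtGraph:
    """Minimal directed graph: each vertex is a thought, edges are dependencies."""

    def __init__(self):
        self.content = {}   # thought name -> payload
        self.parents = {}   # thought name -> names of thoughts it depends on

    def add(self, name, content, parents=()):
        """Add a thought that depends on already existing thoughts."""
        for parent in parents:
            if parent not in self.content:
                raise KeyError(f"unknown parent thought: {parent}")
        self.content[name] = content
        self.parents[name] = tuple(parents)
        return name

    def aggregate(self, name, parents, combine):
        """Create a new thought from several parents (more than one incoming
        edge, which a chain or a tree of thoughts cannot express)."""
        combined = combine([self.content[p] for p in parents])
        return self.add(name, combined, parents)


def merge_sorted(chunks):
    """Stand-in aggregation: merge already sorted sublists into one sorted list."""
    return list(merge(*chunks))


graph = ThoughtGraph()
graph.add("left", sorted([5, 1, 4]))
graph.add("right", sorted([3, 2]))
graph.aggregate("merged", ["left", "right"], merge_sorted)
```

The sorting example mirrors the task mentioned in the abstract: sub-results are produced independently and then combined in a single vertex with two incoming edges.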

翻訳日:2024-02-07 20:25:29 公開日:2024-02-06

# シングルコピー測定による$t$ドープ安定化状態の効率的な学習

Efficient learning of $t$-doped stabilizer states with single-copy measurements ( http://arxiv.org/abs/2308.07014v3 )

ライセンス: Link先を確認

Nai-Hui Chia, Ching-Yi Lai, Han-Hsuan Lin

(参考訳) 量子状態学習の主要な目的の1つは、量子回路から生成される状態の学習に時間効率の良いアルゴリズムを開発することである。初期の研究では、最大$\log(n)$個の非クリフォードゲートを持つクリフォード回路から生成される状態に対して、時間効率の良いアルゴリズムが示されている。しかし、これらのアルゴリズムはマルチコピー測定を必要とし、必要な量子メモリのために短期的には実装上の課題がある。それとは対照的に、計算基底でのシングルキュービット測定のみを使用することは、合理的な耐量子暗号の仮定の下で、1つの追加の$T$ゲートを持つクリフォード回路の出力分布でさえ学習するには不十分である。本研究では,最大$O(\log n)$個の非クリフォードゲートを持つクリフォード回路が生成する状態を学習するために,非適応的な単一コピー測定のみを用いる効率的な量子アルゴリズムを提案し,これまでの肯定的結果と否定的結果の間のギャップを埋める。

One of the primary objectives in the field of quantum state learning is to develop algorithms that are time-efficient for learning states generated from quantum circuits. Earlier investigations have demonstrated time-efficient algorithms for states generated from Clifford circuits with at most $\log(n)$ non-Clifford gates. However, these algorithms necessitate multi-copy measurements, posing implementation challenges in the near term due to the requisite quantum memory. On the contrary, using solely single-qubit measurements in the computational basis is insufficient in learning even the output distribution of a Clifford circuit with one additional $T$ gate under reasonable post-quantum cryptographic assumptions. In this work, we introduce an efficient quantum algorithm that employs only nonadaptive single-copy measurement to learn states produced by Clifford circuits with a maximum of $O(\log n)$ non-Clifford gates, filling a gap between the previous positive and negative results.

翻訳日:2024-02-07 20:25:11 公開日:2024-02-06

# GIT-Mol: グラフ、画像、テキストを用いた分子科学のためのマルチモーダル大言語モデル

GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text ( http://arxiv.org/abs/2308.06911v3 )

ライセンス: Link先を確認

Pengfei Liu, Yiming Ren, Jun Tao and Zhixiang Ren

(参考訳) 大規模な言語モデルは自然言語処理において大きな進歩を遂げ、分子のテキスト表現を処理することによって分子科学における革新的な応用を可能にした。しかし、既存の言語モデルは複雑な分子構造や画像でリッチな情報を捉えることができない。本稿では,グラフ,画像,テキスト情報を統合したマルチモーダルな大規模言語モデルであるGIT-Molを紹介する。マルチモーダルな分子データの統合を容易にするため,全てのモダリティを統一された潜在空間に整列させることができる新しいアーキテクチャであるGIT-Formerを提案する。特性予測の精度が5%～10%向上し, 分子生成の有効性が20.2%向上した。言語間の分子翻訳戦略により, 化合物名認識や化学反応予測など, より下流の課題を遂行できる可能性が示唆された。

Large language models have made significant strides in natural language processing, enabling innovative applications in molecular science by processing textual representations of molecules. However, most existing language models cannot capture the rich information in complex molecular structures or images. In this paper, we introduce GIT-Mol, a multi-modal large language model that integrates the Graph, Image, and Text information. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel architecture that is capable of aligning all modalities into a unified latent space. We achieve a 5%-10% accuracy increase in property prediction and a 20.2% boost in molecule generation validity compared to the baselines. With the any-to-language molecular translation strategy, our model has the potential to perform more downstream tasks, such as compound name recognition and chemical reaction prediction.

翻訳日:2024-02-07 20:24:53 公開日:2024-02-06

# 周期駆動システムのための対断駆動

Counterdiabatic Driving for Periodically Driven Systems ( http://arxiv.org/abs/2310.02728v2 )

ライセンス: Link先を確認

Paul Manuel Schindler and Marin Bukov

(参考訳) 周期駆動型システムは量子システムの特性を設計する上で有用な技術として登場し、量子シミュレーションの標準ツールボックスとして開発されている。このツールボックスを不完全な状態にしておくことは、強い周期ドライブにdressした状態の操作である。フロッケ制御の最先端はパラメータの断熱的変化である。しかし、これは実験におけるコヒーレンス時間の制限と矛盾する長いプロトコルを必要とする。非平衡量子物質を高速に制御するために、フロッケ系に着目した平衡から変分反断熱駆動の概念を一般化する。実効的なフロケ・ハミルトニアンに対する断熱ゲージポテンシャルの局所近似を求める非摂動的変分原理を導出する。これは、断熱体制から遠く離れたフロケ固有状態の遷移のない運転を可能にする。 2レベルFloquetバンドへの応用と周期駆動モデルとの相互作用について論じる。この技術により、非摂動光子共鳴を捕捉し、アクセス可能な制御項の局所性のような実験的な制限を尊重する高忠実度プロトコルを得ることができる。

Periodically driven systems have emerged as a useful technique to engineer the properties of quantum systems, and are in the process of being developed into a standard toolbox for quantum simulation. An outstanding challenge that leaves this toolbox incomplete is the manipulation of the states dressed by strong periodic drives. The state-of-the-art in Floquet control is the adiabatic change of parameters. Yet, this requires long protocols conflicting with the limited coherence times in experiments. To achieve fast control of nonequilibrium quantum matter, we generalize the notion of variational counterdiabatic driving away from equilibrium focusing on Floquet systems. We derive a nonperturbative variational principle to find local approximations to the adiabatic gauge potential for the effective Floquet Hamiltonian. It enables transitionless driving of Floquet eigenstates far away from the adiabatic regime. We discuss applications to two-level, Floquet band, and interacting periodically-driven models. The developed technique allows us to capture non-perturbative photon resonances and obtain high-fidelity protocols that respect experimental limitations like the locality of the accessible control terms.

翻訳日:2024-02-07 20:16:23 公開日:2024-02-06

# OceanGPT: 海洋科学タスクのための大規模言語モデル

OceanGPT: A Large Language Model for Ocean Science Tasks ( http://arxiv.org/abs/2310.02031v5 )

ライセンス: Link先を確認

Zhen Bi, Ningyu Zhang, Yida Xue, Yixin Ou, Daxiong Ji, Guozhou Zheng, Huajun Chen

(参考訳) 生命と生物多様性の宝庫である海洋を探究する海洋科学は、地球の表面の70%以上を海洋がカバーしていることを考えると、非常に重要である。近年,Large Language Models (LLM) の進歩が科学のパラダイムを変えつつある。他の領域での成功にもかかわらず、現在のLLMは海洋学者のようなドメインの専門家のニーズに応えられないことが多く、海洋科学のためのLLMのポテンシャルは十分に探究されていない。その本質的な理由は、海洋データの巨大で複雑な性質と、より高い粒度と知識の豊かさの必要性にあると考えられる。これらの問題を緩和するため,海洋分野における初のLLMであり,様々な海洋科学タスクに精通したOceanGPTを紹介する。マルチエージェント協調に基づいて命令を生成し,大量の海洋ドメイン命令データを自動的に取得する新しいフレームワークであるDoInstructを提案する。さらに,海洋分野におけるLLMの能力を評価するため,最初の海洋学ベンチマークであるOceanBenchを構築した。総合的な実験を通じて、OceanGPTは海洋科学のタスクに対してより高度な専門知識を示すだけでなく、海洋技術における予備的な身体性知能(embodied intelligence)能力も獲得することを示した。コード、データ、チェックポイントは近々 https://github.com/zjunlp/KnowLM で公開される。

Ocean science, which delves into the oceans that are reservoirs of life and biodiversity, is of great significance given that oceans cover over 70% of our planet's surface. Recently, advances in Large Language Models (LLMs) have transformed the paradigm in science. Despite the success in other domains, current LLMs often fall short in catering to the needs of domain experts like oceanographers, and the potential of LLMs for ocean science is under-explored. The intrinsic reason may be the immense and intricate nature of ocean data as well as the necessity for higher granularity and richness in knowledge. To alleviate these issues, we introduce OceanGPT, the first-ever LLM in the ocean domain, which is an expert in various ocean science tasks. We propose DoInstruct, a novel framework to automatically obtain a large volume of ocean domain instruction data, which generates instructions based on multi-agent collaboration. Additionally, we construct the first oceanography benchmark, OceanBench, to evaluate the capabilities of LLMs in the ocean domain. Through comprehensive experiments, OceanGPT not only shows a higher level of knowledge expertise for ocean science tasks but also gains preliminary embodied intelligence capabilities in ocean technology. Codes, data and checkpoints will soon be available at https://github.com/zjunlp/KnowLM.

翻訳日:2024-02-07 20:15:46 公開日:2024-02-06

# 一般費用のエネルギー誘導型連続エントロピーバリアセンター推定

Energy-Guided Continuous Entropic Barycenter Estimation for General Costs ( http://arxiv.org/abs/2310.01105v2 )

ライセンス: Link先を確認

Alexander Kolesov, Petr Mokrov, Igor Udovichenko, Milena Gazdieva, Gudmund Pammer, Anastasis Kratsios, Evgeny Burnaev, Alexander Korotin

(参考訳) 最適輸送(OT)バリセンターは、幾何学的性質を捉えながら確率分布を平均化する方法である。要するに、バリセンターのタスクは、OTの相違点が与えられた確率分布の集合の平均を取ることである。任意のOTコスト関数に対して連続的エントロピーOT(EOT)バリセンタを近似する新しいアルゴリズムを提案する。我々のアプローチは、最近MLコミュニティの注目を集めている弱いOTに基づくEOT問題の二重再構成に基づいている。新規性以外にも、我々の方法にはいくつかの利点がある。 (i)回収した溶液の品質境界を確立する。 (二)この手法は、関心事問題によく調整されたアルゴリズムの使用を可能にする、エネルギーベースモデル(EBM)学習手順と全く無関係である。 (iii)ミニマックス、強化、その他の複雑な技術的トリックを避けるための直感的な最適化スキームを提供する。検証には,非ユークリッドコスト関数を含むいくつかの低次元シナリオと画像空間の設定を検討する。さらに,事前学習した生成モデルで生成した画像多様体上でバリセンタを学習する実践的課題について検討し,実世界の応用への新たな方向について検討する。

Optimal transport (OT) barycenters are a mathematically grounded way of averaging probability distributions while capturing their geometric properties. In short, the barycenter task is to take the average of a collection of probability distributions w.r.t. given OT discrepancies. We propose a novel algorithm for approximating the continuous Entropic OT (EOT) barycenter for arbitrary OT cost functions. Our approach is built upon the dual reformulation of the EOT problem based on weak OT, which has recently gained the attention of the ML community. Beyond its novelty, our method enjoys several advantageous properties: (i) we establish quality bounds for the recovered solution; (ii) this approach seamlessly interconnects with the Energy-Based Models (EBMs) learning procedure, enabling the use of well-tuned algorithms for the problem of interest; (iii) it provides an intuitive optimization scheme avoiding min-max, reinforce and other intricate technical tricks. For validation, we consider several low-dimensional scenarios and image-space setups, including non-Euclidean cost functions. Furthermore, we investigate the practical task of learning the barycenter on an image manifold generated by a pretrained generative model, opening up new directions for real-world applications.

翻訳日:2024-02-07 20:15:24 公開日:2024-02-06

# リンク予測の再検討: データパースペクティブ

Revisiting Link Prediction: A Data Perspective ( http://arxiv.org/abs/2310.00793v2 )

ライセンス: Link先を確認

Haitao Mao, Juanhui Li, Harry Shomer, Bingheng Li, Wenqi Fan, Yao Ma, Tong Zhao, Neil Shah, Jiliang Tang

(参考訳) グラフの基本的なタスクであるリンク予測は、フレンドレコメンデーション、タンパク質分析、薬物相互作用予測など、様々なアプリケーションで必須であることが証明されている。しかし、データセットは複数のドメインにまたがるので、異なるリンク形成メカニズムを持つことができる。既存の文献の証拠は、すべてのデータセットに適した普遍的に最適なアルゴリズムが存在しないことを裏付けている。本稿では,データ中心の観点から,多様なデータセットにまたがるリンク予測の原理を探求する。リンク予測に不可欠な3つの基本的な要因は,局所的構造的近接,大域的構造的近接,特徴的近接である。それらの要因間の関係を解明し (i)大域構造近接は局所構造近接が不十分な場合にのみ有効である。 (ii) 特徴点と構造的近接点の間には不整合が認められる。このような非互換性は、特徴近接係数が支配するエッジにおいて、GNNのリンク予測(GNN4LP)が一貫して過小評価される。データの観点からのこれらの新たな洞察に触発され、より包括的な評価のために適切なベンチマークデータセットを選択するためのGNN4LPモデル設計とガイドラインの実践的指導を提供する。

Link prediction, a fundamental task on graphs, has proven indispensable in various applications, e.g., friend recommendation, protein analysis, and drug interaction prediction. However, since datasets span a multitude of domains, they could have distinct underlying mechanisms of link formation. Evidence in existing literature underscores the absence of a universally best algorithm suitable for all datasets. In this paper, we endeavor to explore principles of link prediction across diverse datasets from a data-centric perspective. We recognize three fundamental factors critical to link prediction: local structural proximity, global structural proximity, and feature proximity. We then unearth relationships among those factors where (i) global structural proximity only shows effectiveness when local structural proximity is deficient. (ii) The incompatibility can be found between feature and structural proximity. Such incompatibility leads to GNNs for Link Prediction (GNN4LP) consistently underperforming on edges where the feature proximity factor dominates. Inspired by these new insights from a data perspective, we offer practical instruction for GNN4LP model design and guidelines for selecting appropriate benchmark datasets for more comprehensive evaluations.

翻訳日:2024-02-07 20:15:05 公開日:2024-02-06

# グラフニューラルネットワークは最適な近似アルゴリズムか?

Are Graph Neural Networks Optimal Approximation Algorithms? ( http://arxiv.org/abs/2310.00526v5 )

ライセンス: Link先を確認

Morris Yau, Eric Lu, Nikolaos Karalias, Jessica Xu, Stefanie Jegelka

(参考訳) 本研究では,半定義型プログラミング(sdp)の強力なアルゴリズムツールを用いて,組合せ最適化問題に対する最適近似アルゴリズムをキャプチャするグラフニューラルネットワークアーキテクチャを設計する。具体的には, 多項式サイズのメッセージパッシングアルゴリズムは, ユニクゲーム・コンジェクチャを仮定した最大制約満足度問題に対して, 最も強力な多項式時間アルゴリズムを表現できることを示す。我々はこの結果を利用して、Max-Cut、Min-Vertex-Cover、Max-3-SATといったランドマーク組合せ最適化問題に対する高品質な近似解を得る効率的なグラフニューラルネットワークアーキテクチャOptGNNを構築する。提案手法は,実世界および合成データセットの幅広い領域において,解法や神経ベースラインに対して強い実験結果が得られる。最後に, コンベックス緩和を捉えた OptGNN の機能を活用し, 学習した OptGNN の埋め込みから最適解のバウンドを生成するアルゴリズムを設計する。

In this work we design graph neural network architectures that capture optimal approximation algorithms for a large class of combinatorial optimization problems, using powerful algorithmic tools from semidefinite programming (SDP). Concretely, we prove that polynomial-sized message-passing algorithms can represent the most powerful polynomial time algorithms for Max Constraint Satisfaction Problems assuming the Unique Games Conjecture. We leverage this result to construct efficient graph neural network architectures, OptGNN, that obtain high-quality approximate solutions on landmark combinatorial optimization problems such as Max-Cut, Min-Vertex-Cover, and Max-3-SAT. Our approach achieves strong empirical results across a wide range of real-world and synthetic datasets against solvers and neural baselines. Finally, we take advantage of OptGNN's ability to capture convex relaxations to design an algorithm for producing bounds on the optimal solution from the learned embeddings of OptGNN.
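For Max-Cut, the link between vector embeddings and an actual cut is a rounding step. The sketch below shows hyperplane rounding in the style of the classical semidefinite-programming approach: each node's vector is assigned a side according to the sign of its projection onto a direction. The embeddings and the direction here are hand-picked, hypothetical values standing in for what a trained network and a random draw would supply; this is not presented as OptGNN's own procedure.

```python
def cut_value(edges, side):
    """Total weight of edges whose endpoints fall on different sides."""
    return sum(w for u, v, w in edges if side[u] != side[v])


def hyperplane_round(embeddings, direction):
    """Assign each node +1 or -1 by the sign of <embedding, direction>."""
    side = {}
    for node, vector in embeddings.items():
        projection = sum(a * b for a, b in zip(vector, direction))
        side[node] = 1 if projection >= 0 else -1
    return side


# Unit-weight triangle; any cut separates at most two of its three edges.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
node_embeddings = {0: [1.0, 0.0], 1: [-1.0, 0.0], 2: [0.0, 1.0]}
assignment = hyperplane_round(node_embeddings, direction=[1.0, 0.0])
value = cut_value(triangle, assignment)
```

In practice the direction would be sampled at random several times and the best resulting cut kept.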

翻訳日:2024-02-07 20:14:48 公開日:2024-02-06

# HarmonyDream:世界モデル内でのタスクハーモニゼーション

HarmonyDream: Task Harmonization Inside World Models ( http://arxiv.org/abs/2310.00344v2 )

ライセンス: Link先を確認

Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long

(参考訳) モデルベース強化学習(MBRL)は、環境がどのように機能するかをモデル化し、典型的には2つのタスク、すなわち観察モデリングと報酬モデリングを包含する世界モデルを活用することで、サンプル効率の学習を約束する。本稿では,世界モデルにおいて各タスクが果たす役割について,専用の実証研究を通じてより深く理解し,見落としているサンプル効率のMBRLの可能性を明らかにする。我々の重要な洞察は、明示的なMBRLの一般的なアプローチは、観測モデルを通して環境の豊富な詳細を復元しようとするが、環境の複雑さと限られたモデル容量のために困難であるということである。一方で、暗黙のmbrlを支配しつつ、コンパクトなタスク中心のダイナミクスの学習に長けている報酬モデルは、よりリッチな学習信号なしでサンプル効率のよい学習には不十分である。これらの知見と発見に触発されて,世界モデル学習における2つのタスク間の動的平衡性を維持するために損失係数を自動的に調整する,シンプルで効果的なアプローチであるHarmonyDreamを提案する。実験の結果,HarmonyDreamをベースとしたMBRL法では,視覚ロボティクスの絶対性能が10%-69%向上し,Atari 100Kベンチマークに新たな最先端結果が得られた。

Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling. In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of sample-efficient MBRL by mitigating the domination of either observation or reward modeling. Our key insight is that while prevalent approaches of explicit MBRL attempt to restore abundant details of the environment via observation models, it is difficult due to the environment's complexity and limited model capacity. On the other hand, reward models, while dominating implicit MBRL and adept at learning compact task-centric dynamics, are inadequate for sample-efficient learning without richer learning signals. Motivated by these insights and discoveries, we propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization, i.e. a dynamic equilibrium between the two tasks in world model learning. Our experiments show that the base MBRL method equipped with HarmonyDream gains 10%-69% absolute performance boosts on visual robotic tasks and sets a new state-of-the-art result on the Atari 100K benchmark.
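The abstract says only that loss coefficients are adjusted automatically to keep the two tasks in balance. One simple way to obtain such behaviour, assumed here purely for illustration and possibly different from the paper's exact objective, is to give each loss $L$ a learnable scale $\sigma > 0$ and minimize $L/\sigma + \log\sigma$. Setting the derivative to zero gives $\sigma = L$, so each task ends up weighted by the inverse of its own loss magnitude.

```python
from math import log


def harmonized_term(loss, sigma):
    """Scaled loss plus a log penalty that stops sigma from growing freely."""
    return loss / sigma + log(sigma)


def optimal_sigma(loss):
    """Minimizer of harmonized_term in sigma:
    d/dsigma (L/sigma + log sigma) = -L/sigma**2 + 1/sigma = 0  =>  sigma = L."""
    return loss


def balanced_weights(losses):
    """Effective loss coefficients 1/sigma at the optimum, one per task."""
    return {name: 1.0 / optimal_sigma(value) for name, value in losses.items()}


# Hypothetical magnitudes: pixel reconstruction dwarfs the scalar reward loss.
task_losses = {"observation": 500.0, "reward": 0.5}
weights = balanced_weights(task_losses)
weighted = {name: weights[name] * task_losses[name] for name in task_losses}
```

With these made-up numbers the raw losses differ by a factor of 1000, yet after weighting both contribute equally, so neither observation modeling nor reward modeling dominates the world-model update.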

翻訳日:2024-02-07 20:14:33 公開日:2024-02-06

# 拡張ランダム化平滑化に対するリプシッツ分散マージントレードオフ

The Lipschitz-Variance-Margin Tradeoff for Enhanced Randomized Smoothing ( http://arxiv.org/abs/2309.16883v3 )

ライセンス: Link先を確認

Blaise Delattre, Alexandre Araujo, Quentin Barthélemy and Alexandre Allauzen

(参考訳) ディープニューラルネットワークの実際の応用は、ノイズの入力や敵対的な攻撃に直面すると不安定な予測によって妨げられる。この文脈では、認定半径はモデルの堅牢性の重要な指標である。しかし、関連する認定半径を持つ効率的な分類器をどう設計するか? ランダム化平滑化(randomized smoothing)は、ノイズを入力に注入することで、滑らかでロバストな分類器を得る、有望なフレームワークを提供する。本稿では,ランダムな平滑化過程推定におけるモンテカルロサンプリングによって生じる分散が,分類器の他の2つの重要な性質であるリプシッツ定数とマージンと密接に相互作用することを示す。より正確には、我々の研究は、滑らかな分類器と経験的分散の両方に対する基底分類器のリプシッツ定数の二重影響を強調している。さらに、証明されたロバスト半径を増やすために、基底分類器の確率ベクトルにロジットを変換して分散マージントレードオフを利用する方法を導入する。我々は、ランダム化平滑化のための拡張リプシッツ境界とともに、ベルンシュタインの濃度不等式を利用する。実験の結果,現在の手法と比較して精度が有意に向上した。新たな認証手順により,ランダム化平滑化に使用する事前学習モデルの使用が可能となり,ゼロショット方式で現在の認証半径を効果的に改善できる。

Real-life applications of deep neural networks are hindered by their unsteady predictions when faced with noisy inputs and adversarial attacks. In this context, the certified radius is a crucial indicator of the robustness of models. However, how can one design an efficient classifier with an associated certified radius? Randomized smoothing provides a promising framework by relying on noise injection into the inputs to obtain a smoothed and robust classifier. In this paper, we first show that the variance introduced by the Monte-Carlo sampling in the randomized smoothing procedure closely interacts with two other important properties of the classifier, i.e., its Lipschitz constant and margin. More precisely, our work emphasizes the dual impact of the Lipschitz constant of the base classifier on both the smoothed classifier and the empirical variance. Moreover, to increase the certified robust radius, we introduce a different way to convert logits to probability vectors for the base classifier to leverage the variance-margin trade-off. We use Bernstein's concentration inequality along with enhanced Lipschitz bounds for randomized smoothing. Experimental results show a significant improvement in certified accuracy compared to current state-of-the-art methods. Our novel certification procedure allows us to use pre-trained models with randomized smoothing, effectively improving the current certification radius in a zero-shot manner.
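
To make the role of the Monte-Carlo variance concrete, here is a minimal sketch of a variance-aware certification step. It is not the authors' procedure: the abstract only names Bernstein's inequality, so the Maurer-Pontil empirical Bernstein form, the soft scores in [0, 1], the binary-style radius `sigma * Phi^{-1}(p_lower)`, and all function names below are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def empirical_bernstein_lower(samples, delta):
    """Lower confidence bound (level 1 - delta) on the mean of samples in [0, 1].

    Uses the empirical Bernstein form: the penalty shrinks when the
    sample variance is small, which is what couples variance and radius.
    """
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # unbiased sample variance
    log_term = math.log(2.0 / delta)
    return mean - math.sqrt(2.0 * var * log_term / n) - 7.0 * log_term / (3.0 * (n - 1))


def certified_radius(p_lower, sigma):
    """L2 radius for Gaussian smoothing from a lower bound on the top-class score.

    Returns 0.0 (abstain) when the bound does not exceed 1/2.
    """
    if p_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(min(p_lower, 1.0 - 1e-12))


def smoothed_scores(base_score, x, sigma, n, rng):
    """Evaluate a base scoring function on n Gaussian-perturbed copies of x."""
    return [base_score([xi + rng.gauss(0.0, sigma) for xi in x]) for _ in range(n)]


def toy_score(z):
    """Hypothetical base classifier: a sigmoid of the coordinate sum, in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-sum(z)))


rng = random.Random(0)
scores = smoothed_scores(toy_score, [2.0, 1.0], sigma=0.25, n=500, rng=rng)
p_lower = empirical_bernstein_lower(scores, delta=0.001)
radius = certified_radius(p_lower, sigma=0.25)
```

For a fixed mean score, lower-variance samples give a tighter `p_lower` and hence a larger radius, which is the interaction the abstract describes.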

翻訳日:2024-02-07 20:14:06 公開日:2024-02-06

# 機械翻訳におけるパラダイムシフト:大規模言語モデルの翻訳性能の向上

A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models ( http://arxiv.org/abs/2309.11674v2 )

ライセンス: Link先を確認

Haoran Xu, Young Jin Kim, Amr Sharaf, Hany Hassan Awadalla

(参考訳) 生成型大規模言語モデル(LLM)は様々なNLPタスクにおいて顕著な進歩を遂げている。しかし、これらの進歩は翻訳タスク、特に従来の教師付きエンコーダ・デコーダ翻訳モデルより遅れている中程度のモデルサイズ(7Bまたは13Bパラメータ)では反映されていない。これまでの研究では、これらの中等度LSMの翻訳能力の改善が試みられてきたが、その利益は限られている。本研究では、従来の翻訳モデルが依存する豊富な並列データの必要性をなくし、翻訳タスク用に特別に設計されたllmのための新しい微調整手法を提案する。提案手法は,モノリンガルデータに対する初期微調整と,それに続く少数の高品質並列データに対する微調整の2段階からなる。本稿では,ALMA (Advanced Language Model-based trAnslator) として,この戦略によって開発された LLM を紹介する。 LLaMA-2を基礎モデルとして,WMT'21(2方向)およびWMT'22(8方向)テストデータセットから10の翻訳方向にわたるゼロショット性能に対して,12BLEUおよび12COMET以上の平均的改善を達成できることを示す。 NLLB-54BモデルやGPT-3.5-text-davinci-003よりは優れており、7Bまたは13Bパラメータのみである。この手法は機械翻訳における新しい訓練パラダイムの基礎を確立する。

Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially those with moderate model sizes (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data followed by subsequent fine-tuning on a small set of high-quality parallel data. We introduce the LLM developed through this strategy as Advanced Language Model-based trAnslator (ALMA). Based on LLaMA-2 as our underlying model, our results show that the model can achieve an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. The performance is significantly better than all prior work and even superior to the NLLB-54B model and GPT-3.5-text-davinci-003, with only 7B or 13B parameters. This method establishes the foundation for a novel training paradigm in machine translation.

翻訳日:2024-02-07 20:13:43 公開日:2024-02-06

# 人間の意思決定を改善するAI不確かさの定量化

Using AI Uncertainty Quantification to Improve Human Decision-Making ( http://arxiv.org/abs/2309.10852v2 )

ライセンス: Link先を確認

Laura R. Marusich, Jonathan Z. Bakdash, Yan Zhou, Murat Kantarcioglu

(参考訳) AI不確実性定量化(UQ)は、AI予測以外の人間の意思決定を改善する可能性がある。 AIと人間の意思決定に関する過去の研究の大部分は、モデル説明可能性と解釈可能性に集中しており、UQが人間の意思決定に与える影響についてはほとんど理解していない。 2つのオンライン行動実験において、厳格なスコアリングルールを用いて校正した事例レベルのUQにおける人的意思決定への影響を評価した。最初の実験では、AI予測のみと比較して、UQは意思決定性能に有益であることを示した。第2の実験で、UQは確率的情報の様々な表現にまたがって意思決定に一般化可能な利点があることを発見した。これらの結果から、AIのインスタンスレベルの高品質なUQの実装は、AI予測単独と比較して、実際のシステムによる意思決定を改善する可能性が示唆された。

AI Uncertainty Quantification (UQ) has the potential to improve human decision-making beyond AI predictions alone by providing additional probabilistic information to users. The majority of past research on AI and human decision-making has concentrated on model explainability and interpretability, with little focus on understanding the potential impact of UQ on human decision-making. We evaluated the impact on human decision-making for instance-level UQ, calibrated using a strict scoring rule, in two online behavioral experiments. In the first experiment, our results showed that UQ was beneficial for decision-making performance compared to only AI predictions. In the second experiment, we found UQ had generalizable benefits for decision-making across a variety of representations for probabilistic information. These results indicate that implementing high quality, instance-level UQ for AI may improve decision-making with real systems compared to AI predictions alone.
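
As a small illustration of what scoring probabilistic predictions involves, the sketch below uses the Brier score, a standard strictly proper scoring rule. The abstract does not say which rule the authors used, so the choice of the Brier score and the function names are assumptions.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_brier(p, q):
    """Expected Brier score of reporting p when the event truly occurs with probability q."""
    return q * (p - 1.0) ** 2 + (1.0 - q) * p ** 2


# Reporting the true probability (0.7) scores better in expectation
# than an overconfident report (0.9): lower is better.
honest = expected_brier(0.7, 0.7)
overconfident = expected_brier(0.9, 0.7)
```

Because the expected score is minimized only by reporting the true probability, a rule of this kind rewards calibrated instance-level uncertainty rather than confident-sounding predictions.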

翻訳日:2024-02-07 20:13:18 公開日:2024-02-06

# CC-SGG:学習シーングラフを用いたコーナーケースシナリオ生成

CC-SGG: Corner Case Scenario Generation using Learned Scene Graphs ( http://arxiv.org/abs/2309.09844v2 )

ライセンス: Link先を確認

George Drayson, Efimia Panagiotaki, Daniel Omeiza, Lars Kunze

(参考訳) コーナーケースシナリオは、自動運転車(AV)の安全性のテストと検証に不可欠なツールである。これらのシナリオは、自然主義的な運転データセットでは不十分であることが多いため、合成コーナーケースによるデータ拡張は、ユニークな状況下でのAVの安全な操作を大幅に強化する。しかし、合成的、しかし現実的なコーナーケースの生成は、大きな課題となる。本研究では,不均一グラフニューラルネットワーク(HGNN)に基づく新しい手法を導入し,通常の運転シナリオをコーナーケースに変換する。これを実現するために,我々はまず,通常の運転シーンの簡潔な表現をシーングラフとして生成し,その構造と特性を最小に操作する。我々のモデルはこれらのグラフを摂動させ、注意と三重埋め込みを用いてコーナーケースを生成する。入力グラフと摂動グラフはシミュレーションにインポートされ、コーナーケースシナリオを生成する。我々のモデルは入力シーングラフからコーナーケースを生成し、テストデータセットで89.9%の精度で予測することに成功した。さらに、ベースライン自律運転法で生成されたシナリオを検証し、ベースラインにとって重要な状況を効果的に生成するモデルの能力を実証する。

Corner case scenarios are an essential tool for testing and validating the safety of autonomous vehicles (AVs). As these scenarios are often insufficiently present in naturalistic driving datasets, augmenting the data with synthetic corner cases greatly enhances the safe operation of AVs in unique situations. However, the generation of synthetic, yet realistic, corner cases poses a significant challenge. In this work, we introduce a novel approach based on Heterogeneous Graph Neural Networks (HGNNs) to transform regular driving scenarios into corner cases. To achieve this, we first generate concise representations of regular driving scenes as scene graphs, minimally manipulating their structure and properties. Our model then learns to perturb those graphs to generate corner cases using attention and triple embeddings. The input and perturbed graphs are then imported back into the simulation to generate corner case scenarios. Our model successfully learned to produce corner cases from input scene graphs, achieving 89.9% prediction accuracy on our testing dataset. We further validate the generated scenarios on baseline autonomous driving methods, demonstrating our model's ability to effectively create critical situations for the baselines.

翻訳日:2024-02-07 20:13:03 公開日:2024-02-06

# ベータダイバージェンスを用いた深部非負行列因子分解

Deep Nonnegative Matrix Factorization with Beta Divergences ( http://arxiv.org/abs/2309.08249v2 )

ライセンス: Link先を確認

Valentin Leplat, Le Thi Khanh Hien, Akwum Onwunta, Nicolas Gillis

(参考訳) ディープ非負行列因子化(Deep Non negative Matrix Factorization, ディープNMF)は、最近、異なるスケールで複数の特徴層を抽出する貴重な手法として登場した。しかし、既存のディープNMFモデルとアルゴリズムは、主に最小二乗誤差に基づく評価が中心であり、多様なデータセットの近似の質を評価するのに最も適していないかもしれない。例えば、音声信号や文書などのデータ型を扱う場合、$\beta$-divergencesはより適切な選択肢を提供すると広く認識されている。本稿では,Kullback-Leiblerの発散に着目し,$\beta$-divergencesを用いて深部NMFの新しいモデルとアルゴリズムを開発する。次に,これらの手法を,顔の特徴の抽出,文書収集中の話題の同定,ハイパースペクトル画像中の資料の同定に応用する。

Deep Nonnegative Matrix Factorization (deep NMF) has recently emerged as a valuable technique for extracting multiple layers of features across different scales. However, all existing deep NMF models and algorithms have primarily centered their evaluation on the least squares error, which may not be the most appropriate metric for assessing the quality of approximations on diverse datasets. For instance, when dealing with data types such as audio signals and documents, it is widely acknowledged that $\beta$-divergences offer a more suitable alternative. In this paper, we develop new models and algorithms for deep NMF using some $\beta$-divergences, with a focus on the Kullback-Leibler divergence. Subsequently, we apply these techniques to the extraction of facial features, the identification of topics within document collections, and the identification of materials within hyperspectral images.
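
For reference, the elementwise $\beta$-divergence that replaces the least squares error can be sketched as follows. This uses the standard textbook definition, with Kullback-Leibler at $\beta = 1$ and Itakura-Saito at $\beta = 0$; the function names are illustrative, and the deep NMF models and algorithms themselves are not reproduced here.

```python
import math


def beta_divergence(x, y, beta):
    """Elementwise beta-divergence d_beta(x | y) for x > 0 and y > 0."""
    if beta == 1:  # Kullback-Leibler divergence (generalized form)
        return x * math.log(x / y) - x + y
    if beta == 0:  # Itakura-Saito divergence
        return x / y - math.log(x / y) - 1.0
    return (x ** beta + (beta - 1.0) * y ** beta - beta * x * y ** (beta - 1.0)) / (
        beta * (beta - 1.0)
    )


def matrix_beta_divergence(X, Y, beta):
    """Sum of elementwise divergences between a data matrix X and its approximation Y."""
    return sum(
        beta_divergence(x, y, beta)
        for row_x, row_y in zip(X, Y)
        for x, y in zip(row_x, row_y)
    )
```

With `beta=2` the expression reduces to half the squared error, which is the least squares criterion the abstract says earlier deep NMF work relied on.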

翻訳日:2024-02-07 20:12:14 公開日:2024-02-06

# 多経路長期船舶軌道予測によるより安全な海上環境の構築

Building a Safer Maritime Environment Through Multi-Path Long-Term Vessel Trajectory Forecasting ( http://arxiv.org/abs/2310.18948v3 )

ライセンス: Link先を確認

Gabriel Spadon, Jay Kumar, Matthew Smith, Sarah Vela, Romina Gehrmann, Derek Eden, Joshua van Berkel, Amilcar Soares, Ronan Fablet, Ronald Pelot, Stan Matwin

(参考訳) 海洋輸送は世界的な経済成長を達成する上で最重要であり、持続可能性と絶滅危惧種の保護に同時に生態的義務を負う。この点において、自動識別システム(ais)データは、船舶移動に関するリアルタイムストリーミングデータを提供することで、交通監視の強化に重要な役割を果たす。本研究では,AISデータ系列から長期の船舶軌道を予測することにより,船体衝突を防止するためのAISデータについて検討する。そこで我々は, 双方向長短期記憶ネットワーク(Bi-LSTM)を用いたエンコーダ・デコーダモデルアーキテクチャを開発し, 入力として1～3時間AISデータを用いて, 次の12時間の船舶軌道を予測した。我々は,各軌道の潜在的な経路や目的地を示す歴史的AISデータから構築した確率的特徴をモデルに提供する。このモデルでは,空間的特徴学習における畳み込みレイヤと,時間的特徴学習における時系列の最近の時間ステップの重要性を増大させる位置認識型注意機構を活用することで,船の軌道を予測する。確率的特徴は、それぞれの特徴タイプに対して約85%と75%のF1スコアを持ち、ニューラルネットワークへの情報拡張の有効性を示す。我々は、北大西洋右クジラ(NARW)の生息地として知られるセントローレンス湾で、我々のモデルを検証した。我々のモデルは、様々な技術と特徴を用いて、高いR2スコアを98%以上達成した。旋回や経路選択の間に複雑な決定をすることができるため、他のアプローチの中でも際立っている。本研究は,海洋生物種の保全のためのデータ工学および軌道予測モデルの可能性を明らかにする。

Maritime transportation is paramount in achieving global economic growth, entailing concurrent ecological obligations in sustainability and safeguarding endangered marine species, most notably preserving large whale populations. In this regard, the Automatic Identification System (AIS) data plays a significant role by offering real-time streaming data on vessel movement, allowing enhanced traffic monitoring. This study explores using AIS data to prevent vessel-to-whale collisions by forecasting long-term vessel trajectories from engineered AIS data sequences. For such a task, we have developed an encoder-decoder model architecture using Bidirectional Long Short-Term Memory Networks (Bi-LSTM) to predict the next 12 hours of vessel trajectories using 1 to 3 hours of AIS data as input. We feed the model with probabilistic features engineered from historical AIS data that refer to each trajectory's potential route and destination. The model then predicts the vessel's trajectory, considering these additional features by leveraging convolutional layers for spatial feature learning and a position-aware attention mechanism that increases the importance of recent timesteps of a sequence during temporal feature learning. The probabilistic features have an F1 Score of approximately 85% and 75% for each feature type, respectively, demonstrating their effectiveness in augmenting information to the neural network. We test our model on the Gulf of St. Lawrence, a region known to be the habitat of North Atlantic Right Whales (NARW). Our model achieved a high R2 score of over 98% using various techniques and features. It stands out among other approaches as it can make complex decisions during turnings and path selection. Our study highlights the potential of data engineering and trajectory forecasting models for marine life species preservation.

翻訳日:2024-02-07 20:05:17 公開日:2024-02-06

# 言語モデルにおける真さをモデル化するペルソナ

Personas as a Way to Model Truthfulness in Language Models ( http://arxiv.org/abs/2310.18168v5 )

ライセンス: Link先を確認

Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, He He

(参考訳) 大規模言語モデル(LLM)は、インターネット上の大量のテキストで訓練されており、そのテキストには世界に関する事実と誤解を招く情報の両方が含まれている。 LMの古典的な見方からは直観的ではないが、最近の研究は、文の真理値がモデルの表現から引き出せることを示した。本稿では,真理ラベルで訓練されていないLMが真理を知っているように見える理由を説明する。事前学習データは、出力が共通の特徴を共有する(非)真実なエージェントのグループによって生成され、それらが(非)真実なペルソナを形成すると仮定する。このデータで訓練することで、LMはその活性化空間においてペルソナを推論し、表現することができる。これにより、モデルは真実を虚偽から切り離し、生成の真実性を制御できる。我々は,(1)モデルの回答が生成される前に、それが真実かどうかをプローブできる,(2)事実の集合上でモデルを微調整すると,未知のトピックに対する真実性が改善される,という2つの観察を通してペルソナ仮説の証拠を示す。次に,算術を合成環境として用いることで,事前学習データの構造が、モデルが真実なペルソナを推論するために重要であることを示す。全体としては、モデルがデータの階層構造を利用して真実性のような抽象概念を学習できることが示唆されている。

Large language models (LLMs) are trained on vast amounts of text from the internet, which contains both factual and misleading information about the world. While unintuitive from a classic view of LMs, recent work has shown that the truth value of a statement can be elicited from the model's representations. This paper presents an explanation for why LMs appear to know the truth despite not being trained with truth labels. We hypothesize that the pretraining data is generated by groups of (un)truthful agents whose outputs share common features, and that they form an (un)truthful persona. By training on this data, LMs can infer and represent the persona in their activation space. This allows the model to separate truth from falsehoods and control the truthfulness of its generation. We show evidence for the persona hypothesis via two observations: (1) we can probe whether a model's answer will be truthful before it is generated; (2) finetuning a model on a set of facts improves its truthfulness on unseen topics. Next, using arithmetic as a synthetic environment, we show that structures of the pretraining data are crucial for the model to infer the truthful persona. Overall, our findings suggest that models can exploit hierarchical structures in the data to learn abstract concepts like truthfulness.

翻訳日:2024-02-07 20:04:47 公開日:2024-02-06

# 動作駆動型人間の運動予測のための方向認識脚運動学習

Orientation-Aware Leg Movement Learning for Action-Driven Human Motion Prediction ( http://arxiv.org/abs/2310.14907v2 )

ライセンス: Link先を確認

Chunzhi Gu, Chao Zhang, Shigeru Kuriyama

(参考訳) 行動駆動型人間の動作予測の課題は、与えられた行動ラベルを尊重しながら、観察されたシーケンスに基づいて将来の人間の動作を予測することである。人間の動きの確率性だけでなく、複数のアクションラベル間の滑らかで現実的な遷移をモデル化する必要がある。しかし、ほとんどのデータセットがそのような遷移データを含まないという事実は、このタスクを複雑にします。既存の作業は、単にスムーズな遷移を促進する前に滑らかさを学ぶことでこの問題に取り組むが、特に歴史と予測された動きが方向で著しく異なる場合、不自然な遷移をもたらす。本稿では,人間の動作遷移が現実的な脚の動きを取り入れて方向転換を処理し,それを動作条件付き対話型学習タスク(ACB)として活用し,遷移自然性を促進することを論じる。全てのトランジションをモデル化することは事実上不可能であるため、ACBはウォークやランなどのアクティブな歩行動作を持つ非常に少数のアクションクラスでのみ実行される。具体的には、まず動き拡散モデルを用いて、特定の将来の動作で目標動きを生成し、次に、観察と予測をスムーズに連結し、最終的に動き予測に対処した2段階予測戦略に従う。本手法はトレーニング中にラベル付き動作遷移データから完全に解放される。提案手法のロバスト性を示すため,1つのデータセット上でトレーニングした相互学習モデルを2つの大規模動きデータセットに一般化し,自然な遷移を生成する。 3つのベンチマークデータセットを総合的に評価した結果, 視覚的品質, 予測精度, 行動忠実度の観点から, 最先端の性能が得られた。

The task of action-driven human motion prediction aims to forecast future human motion based on the observed sequence while respecting the given action label. It requires modeling not only the stochasticity within human motion but the smooth yet realistic transition between multiple action labels. However, the fact that most datasets do not contain such transition data complicates this task. Existing work tackles this issue by learning a smoothness prior to simply promote smooth transitions, yet doing so can result in unnatural transitions especially when the history and predicted motions differ significantly in orientations. In this paper, we argue that valid human motion transitions should incorporate realistic leg movements to handle orientation changes, and cast it as an action-conditioned in-betweening (ACB) learning task to encourage transition naturalness. Because modeling all possible transitions is virtually unreasonable, our ACB is only performed on very few selected action classes with active gait motions, such as Walk or Run. Specifically, we follow a two-stage forecasting strategy by first employing the motion diffusion model to generate the target motion with a specified future action, and then producing the in-betweening to smoothly connect the observation and prediction to eventually address motion prediction. Our method is completely free from the labeled motion transition data during training. To show the robustness of our approach, we generalize our trained in-betweening learning model on one dataset to two unseen large-scale motion datasets to produce natural transitions. Extensive experimental evaluations on three benchmark datasets demonstrate that our method yields the state-of-the-art performance in terms of visual quality, prediction accuracy, and action faithfulness.

翻訳日:2024-02-07 20:04:24 公開日:2024-02-06

# LASER: 無線分散最適化における線形圧縮

LASER: Linear Compression in Wireless Distributed Optimization ( http://arxiv.org/abs/2310.13033v2 )

ライセンス: Link先を確認

Ashok Vardhan Makkuva, Marco Bondaschi, Thijs Vogels, Martin Jaggi, Hyeji Kim, Michael C. Gastpar

(参考訳) データ並列SGDは、分散最適化、特に大規模機械学習のためのデファクトアルゴリズムである。その利点にもかかわらず、通信のボトルネックは根強い問題の1つだ。これを緩和するほとんどの圧縮スキームは、ノイズのない通信リンクを仮定するか、実用的なタスクで良いパフォーマンスを達成できないかのいずれかである。本稿では,このギャップを埋めて LASER: LineAr CompreSsion in WirEless DistRibuted Optimization を紹介する。 LASERは勾配の固有の低ランク構造を利用し、ノイズのあるチャネル上で効率的に伝送する。古典的なSGDと同様の理論的保証を享受する一方で、LASERは様々な実用的なベンチマークでベースラインよりも一貫した利得を示している。特に、難易度の高いコンピュータビジョンとGPT言語モデリングタスクにおいて、最先端の圧縮スキームよりも優れている。後者では、ノイズの多いチャネルにおいてベースラインよりもパープレキシティが50〜64%改善する。

Data-parallel SGD is the de facto algorithm for distributed optimization, especially for large scale machine learning. Despite its merits, communication bottleneck is one of its persistent issues. Most compression schemes to alleviate this either assume noiseless communication links, or fail to achieve good performance on practical tasks. In this paper, we close this gap and introduce LASER: LineAr CompreSsion in WirEless DistRibuted Optimization. LASER capitalizes on the inherent low-rank structure of gradients and transmits them efficiently over the noisy channels. Whilst enjoying theoretical guarantees similar to those of the classical SGD, LASER shows consistent gains over baselines on a variety of practical benchmarks. In particular, it outperforms the state-of-the-art compression schemes on challenging computer vision and GPT language modeling tasks. On the latter, we obtain $50$-$64 \%$ improvement in perplexity over our baselines for noisy channels.
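
The idea of exploiting low-rank gradient structure over a noisy link can be sketched as below. This is an illustration only: the rank-1 power iteration, the additive Gaussian noise model, and every function name are assumptions, since the abstract does not specify LASER's actual factorization or channel handling.

```python
import random


def matvec(M, v):
    """Compute M @ v for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def rmatvec(M, u):
    """Compute M.T @ u for a list-of-rows matrix."""
    return [sum(M[i][j] * u[i] for i in range(len(M))) for j in range(len(M[0]))]


def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[a * b for b in v] for a in u]


def rank1_factors(G, iters=20):
    """Power iteration giving G ~= u v^T, with u of unit norm."""
    v = [1.0] * len(G[0])
    u = [0.0] * len(G)
    for _ in range(iters):
        u = matvec(G, v)
        norm_u = sum(a * a for a in u) ** 0.5
        if norm_u == 0.0:  # zero gradient or unlucky start: nothing to send
            return [0.0] * len(G), [0.0] * len(G[0])
        u = [a / norm_u for a in u]
        v = rmatvec(G, u)
    return u, v


def noisy_channel(values, noise_std, rng):
    """Toy channel model: add independent Gaussian noise to each transmitted value."""
    return [a + rng.gauss(0.0, noise_std) for a in values]


def transmit_rank1(G, noise_std, rng):
    """Send only the two factors (m + n numbers instead of m * n) and rebuild the gradient."""
    u, v = rank1_factors(G)
    return outer(noisy_channel(u, noise_std, rng), noisy_channel(v, noise_std, rng))


rng = random.Random(0)
gradient = outer([1.0, 2.0, 3.0], [4.0, 5.0])  # an exactly rank-1 toy gradient
received = transmit_rank1(gradient, noise_std=0.01, rng=rng)
```

The payoff of the factorized form is that noise is added to `m + n` transmitted values rather than `m * n`, at the cost of an approximation error whenever the gradient is not close to low rank.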

翻訳日:2024-02-07 20:03:34 公開日:2024-02-06

# Loci-Segmented: シーンセグメンテーション学習の改善

Loci-Segmented: Improving Scene Segmentation Learning ( http://arxiv.org/abs/2310.10410v3 )

ライセンス: Link先を確認

Manuel Traub, Frederic Becker, Adrian Sauter, Sebastian Otte, Martin V. Butz

(参考訳) 画像や映像からのシーンセグメンテーションのための現在のスロット指向アプローチは、提供された背景情報やスロット割り当てに依存している。本稿では,ロシ・セグメンツド(Loci-Segmented, Loci-s)という,これらの情報を必要としないセグメンテーションされた位置情報・ID追跡システムを提案する。シーンを動的に解釈可能な背景とスロットベースのオブジェクトエンコーディングに分割し、rgb、マスク、位置、深さ情報を分離する。その結果,MOViデータセットと,シーンセグメンテーションをターゲットとした別のデータセットコレクションにおいて,映像分解性能が大幅に向上したことが明らかとなった。このシステムのよく解釈可能な合成潜在エンコーディングは、下流タスクの基礎モデルとして機能する。

Current slot-oriented approaches for compositional scene segmentation from images and videos rely on provided background information or slot assignments. We present a segmented location and identity tracking system, Loci-Segmented (Loci-s), which requires neither. It learns to dynamically segment scenes into interpretable background and slot-based object encodings, separating RGB, mask, location, and depth information for each. The results reveal largely superior video decomposition performance on the MOVi datasets and on another established dataset collection targeting scene segmentation. The system's well-interpretable, compositional latent encodings may serve as a foundation model for downstream tasks.

翻訳日:2024-02-07 20:03:20 公開日:2024-02-06

# マルチボディニューラルシーンフロー

Multi-Body Neural Scene Flow ( http://arxiv.org/abs/2310.10301v2 )

ライセンス: Link先を確認

Kavisha Vidanapathirana, Shin-Fang Chng, Xueqian Li, Simon Lucey

(参考訳) 座標ネットワークをニューラル事前分布として用いるシーンフローのテスト時最適化は、単純さ、データセットバイアスの欠如、最先端のパフォーマンスなどによって人気を集めている。しかし, 座標ネットワークは, シーンフロー予測を空間的に滑らかになるよう暗黙的に正則化することで一般的な運動を捉えるものの, ニューラル事前分布だけでは実世界データに存在する多体剛体運動を識別できない。これを解決するために, 従来の研究のように各剛体の$SE(3)$パラメータを制約する煩雑で不安定な戦略を使わずに, 多体剛性を実現できることを示す。これは、剛体の流れ予測における等長性を促進するようシーンフロー最適化を正則化することで達成される。この戦略により、連続した流れ場を維持しながら、シーンフローの多体剛性が可能となり、点群の列をまたいだ密な長期のシーンフロー統合が可能になる。我々は,実世界のデータセットに関する広範囲な実験を行い,我々のアプローチが3次元シーンフローと長期的ポイントワイズ4次元軌道予測の最先端を上回っていることを実証する。コードは https://github.com/kavisha725/MBNSF で入手できる。

The test-time optimization of scene flow - using a coordinate network as a neural prior - has gained popularity due to its simplicity, lack of dataset bias, and state-of-the-art performance. We observe, however, that although coordinate networks capture general motions by implicitly regularizing the scene flow predictions to be spatially smooth, the neural prior by itself is unable to identify the underlying multi-body rigid motions present in real-world data. To address this, we show that multi-body rigidity can be achieved without the cumbersome and brittle strategy of constraining the $SE(3)$ parameters of each rigid body as done in previous works. This is achieved by regularizing the scene flow optimization to encourage isometry in flow predictions for rigid bodies. This strategy enables multi-body rigidity in scene flow while maintaining a continuous flow field, hence allowing dense long-term scene flow integration across a sequence of point clouds. We conduct extensive experiments on real-world datasets and demonstrate that our approach outperforms the state-of-the-art in 3D scene flow and long-term point-wise 4D trajectory prediction. The code is available at: https://github.com/kavisha725/MBNSF.
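
An isometry-encouraging regularizer of the kind the abstract mentions could look like the sketch below: it penalizes changes in pairwise distances among the points of one rigid body after the flow is applied. The exact loss used in the paper, and how points are grouped into bodies, are not given in the abstract, so this formulation and its names are assumptions.

```python
import math


def isometry_loss(points, flows):
    """Mean absolute change in pairwise distance when each point moves by its flow.

    points and flows are equal-length lists of 3D tuples belonging to one
    (assumed) rigid body. Any rigid motion yields a loss of zero.
    """
    warped = [tuple(p + f for p, f in zip(pt, fl)) for pt, fl in zip(points, flows)]
    total, pairs = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            before = math.dist(points[i], points[j])
            after = math.dist(warped[i], warped[j])
            total += abs(before - after)
            pairs += 1
    return total / pairs if pairs else 0.0


body = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
translation = [(0.5, -1.0, 2.0)] * len(body)
# Flow induced by a 90-degree rotation about the z-axis: (x, y, z) -> (-y, x, z).
rotation = [(-y - x, x - y, 0.0) for x, y, z in body]
stretch = [p for p in body]  # doubles every coordinate, so it is not rigid
```

Translations and rotations leave the loss at zero while the stretching flow is penalized, so no explicit $SE(3)$ parameters need to be estimated for the body.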

翻訳日:2024-02-07 20:03:08 公開日:2024-02-06

# 相関の崩壊からギブス状態の局所性と安定性へ

From decay of correlations to locality and stability of the Gibbs state ( http://arxiv.org/abs/2310.09182v2 )

ライセンス: Link先を確認

Ángela Capel, Massimo Moscolari, Stefan Teufel, Tom Wessel

(参考訳) 本稿では,ギブス状態が相関関係の崩壊を満足すると,局所摂動がギブス状態にのみ影響を及ぼすという意味で安定であり,局所的,すなわち局所的不明瞭性を満たすことを示す。これらの含意は任意の次元において真であり、ハミルトニアンの局所性のみを必要とし、リーブ・ロビンソン境界に依存する。そして、この結果は、相関の減衰が知られている高温度での短距離相互作用を持つ任意の次元の量子スピン系に明示的に適用する。さらに,変換不変かつ指数的に減衰する相互作用を持つ有限一次元スピンチェーンのギブス状態に適用し,有限次元相互作用の極限でゼロとなる閾値温度以上で相関の減衰が真であることを示す。我々の証明は、ギブス状態に対する量子信念伝播の局所性特性の詳細な解析に基づいている。

In this paper we show that whenever a Gibbs state satisfies decay of correlations, then it is stable, in the sense that local perturbations influence the Gibbs state only locally, and it is local, namely it satisfies local indistinguishability. These implications hold true in any dimensions, only require locality of the Hamiltonian and rely on Lieb-Robinson bounds. Then, we explicitly apply our results to quantum spin systems in any dimension with short-range interactions at high enough temperature, where decay of correlations is known to hold. Furthermore, our results are applied to Gibbs states of finite one-dimensional spin chains with translation-invariant and exponentially decaying interactions, for which we also show that decay of correlations holds true above a threshold temperature that goes to zero in the limit of finite-range interactions. Our proofs are based on a detailed analysis of the locality properties of the quantum belief propagation for Gibbs states.

翻訳日:2024-02-07 20:02:16 公開日:2024-02-06

# Web上での銃器密売行動分析のための自己教師型視覚学習

Self-supervised visual learning for analyzing firearms trafficking activities on the Web ( http://arxiv.org/abs/2310.07975v2 )

ライセンス: Link先を確認

Sotirios Konstantakos, Despina Ioanna Chalkiadaki, Ioannis Mademlis, Adamantia Anna Rebolledo Chrysochoou, Georgios Th. Papadopoulos

(参考訳) RGB画像からの視覚的な銃器の自動分類は、公共空間のセキュリティ、情報収集、法執行機関の調査に応用できる重要な現実世界の課題である。 World Wide Web(ソーシャルメディアやダークウェブサイトを含む)から大量にクロールされた画像に適用すると、オープンソースインテリジェンスのビッグデータを分析することで、犯罪的な銃器密売ネットワークを特定しようとするシステムの重要な構成要素となる。ディープニューラルネットワーク(DNN)は、これを実現するための最先端の方法論であり、畳み込みニューラルネットワーク(CNN)が一般的に使用されている。一般的な転移学習アプローチは、ImageNet-1kのような画像全体の分類のための大規模で汎用的なアノテーション付きデータセットで事前学習し、次に、視覚的銃器分類のためのより小さく、タスク固有のアノテーション付きダウンストリームデータセットでDNNを微調整する。ビジュアルトランスフォーマー(ViT)ニューラルアーキテクチャも、自己教師あり学習(SSL)アプローチも、この重要なタスクではこれまで評価されていない。

Automated visual firearms classification from RGB images is an important real-world task with applications in public space security, intelligence gathering and law enforcement investigations. When applied to images massively crawled from the World Wide Web (including social media and dark Web sites), it can serve as an important component of systems that attempt to identify criminal firearms trafficking networks, by analyzing Big Data from open-source intelligence. Deep Neural Networks (DNN) are the state-of-the-art methodology for achieving this, with Convolutional Neural Networks (CNN) being typically employed. The common transfer learning approach consists of pretraining on a large-scale, generic annotated dataset for whole-image classification, such as ImageNet-1k, and then finetuning the DNN on a smaller, annotated, task-specific, downstream dataset for visual firearms classification. Neither Visual Transformer (ViT) neural architectures nor Self-Supervised Learning (SSL) approaches have so far been evaluated on this critical task.

翻訳日:2024-02-07 20:01:57 公開日:2024-02-06

# 大言語モデルにおける文のアナロジー同定と文構造符号化の関係について

On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models ( http://arxiv.org/abs/2310.07818v3 )

ライセンス: Link先を確認

Thilini Wijesiriwardene, Ruwan Wickramarachchi, Aishwarya Naresh Reganti, Vinija Jain, Aman Chadha, Amit Sheth, Amitava Das

(参考訳) 言語の構文構造と意味構造を符号化するLarge Language Models (LLMs) の能力をNLPでよく検討した。さらに、同義語識別は、言語モデリング文学の過去10年間に、単語類似の形で広く研究されている。本研究は,文の構文的・意味的構造をエンコードするllmsの能力と,文の類似性(類似した意味を相互に伝達する意味)がどのように異なるかを検討する。分析の結果,LLMの文類似を識別する能力は,文の構文的・意味的構造を符号化する能力と正の相関が認められた。特に,構文構造をよりよく捉えたllmは,文の類似性を識別する能力も高いことが判明した。

The ability of Large Language Models (LLMs) to encode syntactic and semantic structures of language is well examined in NLP. Additionally, analogy identification, in the form of word analogies, has been extensively studied in the last decade of language modeling literature. In this work, we specifically look at how LLMs' abilities to capture sentence analogies (sentences that convey analogous meaning to each other) vary with LLMs' abilities to encode syntactic and semantic structures of sentences. Through our analysis, we find that LLMs' ability to identify sentence analogies is positively correlated with their ability to encode syntactic and semantic structures of sentences. Specifically, we find that the LLMs that capture syntactic structures better also have higher abilities in identifying sentence analogies.

翻訳日:2024-02-07 20:01:28 公開日:2024-02-06

# オンライン言語モデルインタラクションのための圧縮コンテキストメモリ

Compressed Context Memory For Online Language Model Interaction ( http://arxiv.org/abs/2312.03414v2 )

ライセンス: Link先を確認

Jang-Hyun Kim, Junyoung Yeom, Sangdoo Yun, Hyun Oh Song

(参考訳) 本稿では,コンテキストが継続的に拡大するオンラインシナリオにおけるTransformer言語モデルのためのコンテキストキー/値圧縮手法を提案する。コンテキストが長くなるにつれて、注意プロセスはメモリと計算の増大を必要とし、それによって言語モデルのスループットが低下する。この課題に対処するため、蓄積される注意キー/値ペアをコンパクトなメモリ空間に継続的に圧縮し、メモリ空間の限られた計算環境での言語モデル推論を容易にする圧縮コンテキストメモリシステムを提案する。私たちの圧縮プロセスでは、推論中に軽量な条件付きLoRAを言語モデルの前方パスに統合し、モデルの重み全体を微調整する必要はありません。再帰的圧縮プロセスを単一の並列化された前方計算としてモデル化することにより,効率的なトレーニングを実現する。会話,パーソナライゼーション,マルチタスク学習の評価を通じて,本手法がフルコンテキストモデルの性能レベルを$5\times$小さいコンテキストメモリサイズで達成できることを実証した。さらに,コンテキスト長が無制限のストリーミング環境において,スライディングウィンドウアプローチを上回る本アプローチの適用性を示す。コードは https://github.com/snu-mllab/context-memory で入手できる。

This paper presents a context key/value compression method for Transformer language models in online scenarios, where the context continually expands. As the context lengthens, the attention process demands increasing memory and computations, which in turn reduces the throughput of the language model. To address this challenge, we propose a compressed context memory system that continually compresses the accumulating attention key/value pairs into a compact memory space, facilitating language model inference in a limited memory space of computing environments. Our compression process involves integrating a lightweight conditional LoRA into the language model's forward pass during inference, without the need for fine-tuning the model's entire set of weights. We achieve efficient training by modeling the recursive compression process as a single parallelized forward computation. Through evaluations on conversation, personalization, and multi-task learning, we demonstrate that our approach achieves the performance level of a full context model with $5\times$ smaller context memory size. We further demonstrate the applicability of our approach in a streaming setting with an unlimited context length, outperforming the sliding window approach. Codes are available at https://github.com/snu-mllab/context-memory.

翻訳日:2024-02-07 19:52:44 公開日:2024-02-06

# クエンチ力学の線形スケールシミュレーション

Linear-scale simulations of quench dynamics ( http://arxiv.org/abs/2311.09556v2 )

ライセンス: Link先を確認

Niaz Ali Khan, Wen Chen, Munsif Jan, and Gao Xianlong

(参考訳) 量子系の非平衡特性の正確な記述とロバストな計算モデリングは、凝縮物質物理学の課題である。本研究では,量子クエンチ系の非平衡力学に対する線形スケール計算シミュレーション手法を開発した。特に,非相互作用量子クエンチ系の動的量子相転移を記述するために,Loschmidtエコーの多項式展開を報告する。拡張に基づく手法により、ハミルトニアン系を対角化することなく、無限大系に対するLoschmidtエコーを効率的に計算できる。その有用性を示すために, 密結合準結晶と不規則格子の1つの空間次元における量子クエンチングダイナミクスを強調する。さらに、格子モデルの下でのクエンチダイナミクスにおける波動ベクトルの役割についても論じる。波動ベクトル非依存の動的位相遷移を自己双対局在モデルで観測する。

The accurate description and robust computational modeling of the nonequilibrium properties of quantum systems remain a challenge in condensed matter physics. In this work, we develop a linear-scale computational simulation technique for the non-equilibrium dynamics of quantum quench systems. In particular, we report a polynomial-expansion of the Loschmidt echo to describe the dynamical quantum phase transitions of noninteracting quantum quench systems. An expansion-based method allows us to efficiently compute the Loschmidt echo for infinitely large systems without diagonalizing the system Hamiltonian. To demonstrate its utility, we highlight quantum quenching dynamics under tight-binding quasicrystals and disordered lattices in one spatial dimension. In addition, the role of the wave vector on the quench dynamics under lattice models is addressed. We observe wave vector-independent dynamical phase transitions in self-dual localization models.

翻訳日:2024-02-07 19:52:05 公開日:2024-02-06

# PCAを超えて: 特徴抽出のための確率的グラム・シュミットアプローチ

Beyond PCA: A Probabilistic Gram-Schmidt Approach to Feature Extraction ( http://arxiv.org/abs/2311.09386v2 )

ライセンス: Link先を確認

Bahram Yaghooti, Netanel Raviv, Bruno Sinopoli

Linear feature extraction in the presence of nonlinear dependencies among the data is a fundamental challenge in unsupervised learning. We propose using a probabilistic Gram-Schmidt (GS) type orthogonalization process in order to detect and map out redundant dimensions. Specifically, by applying the GS process over a family of functions which presumably captures the nonlinear dependencies in the data, we construct a series of covariance matrices that can either be used to identify new large-variance directions, or to remove those dependencies from the principal components. In the former case, we provide information-theoretic guarantees in terms of entropy reduction. In the latter, we prove that under certain assumptions the resulting algorithms detect and remove nonlinear dependencies whenever those dependencies lie in the linear span of the chosen function family. Both proposed methods extract linear features from the data while removing nonlinear redundancies. We provide simulation results on synthetic and real-world datasets which show improved performance over PCA and state-of-the-art linear feature extraction algorithms, both in terms of variance maximization of the extracted features, and in terms of improved performance of classification algorithms. Additionally, our methods are comparable to, and often outperform, the nonlinear method of kernel PCA.

Translated: 2024-02-07 19:51:52 Published: 2024-02-06
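The core idea of removing a dependency that lies in the span of a chosen function family can be illustrated with a toy sketch. This is not the authors' algorithm: it simply runs Gram-Schmidt with the empirical covariance as inner product, and the function family `{x, x^2}`, the helper names, and the synthetic data are all assumptions made for illustration.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Empirical covariance, used as the inner product between random variables."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def gram_schmidt_residual(target, functions):
    """Subtract from `target` its projection onto span(functions)."""
    basis = []
    for f in functions:
        g = [x - mean(f) for x in f]              # center the function values
        for b in basis:                           # orthogonalize against earlier elements
            coef = cov(g, b) / cov(b, b)
            g = [x - coef * y for x, y in zip(g, b)]
        if cov(g, g) > 1e-12:                     # skip numerically dependent functions
            basis.append(g)
    r = [x - mean(target) for x in target]
    for b in basis:                               # project out each orthogonal direction
        coef = cov(r, b) / cov(b, b)
        r = [x - coef * y for x, y in zip(r, b)]
    return r

random.seed(0)
x1 = [random.uniform(-1, 1) for _ in range(2000)]
x2 = [a * a + random.gauss(0, 0.05) for a in x1]  # x2 depends nonlinearly on x1
family = [x1, [a * a for a in x1]]                # assumed function family {x, x^2}
res = gram_schmidt_residual(x2, family)
var_before, var_after = cov(x2, x2), cov(res, res)
```

After the projection, `var_after` is dominated by the added noise, which is the sense in which the second coordinate is redundant given the first.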

# CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation

http://arxiv.org/abs/2311.08588v2

License: see the linked page

Weixiang Yan, Haitian Liu, Yunkun Wang, Yunzhe Li, Qian Chen, Wen Wang, Tingyu Lin, Weishan Zhao, Li Zhu, Shuiguang Deng, Hari Sundaram

Large Language Models (LLMs) have demonstrated remarkable performance on coding-related tasks, particularly in assisting humans with programming and facilitating programming automation. However, existing benchmarks for evaluating the code understanding and generation capacities of LLMs suffer from severe limitations. First, most benchmarks are deficient because they focus on a narrow range of popular programming languages and specific tasks, whereas real-world software development scenarios show a dire need to implement systems with multilingual programming environments to satisfy diverse requirements. Practical programming practices also strongly call for multi-task settings that test the coding capabilities of LLMs comprehensively and robustly. Second, most benchmarks also fail to consider the actual executability of the generated code and the consistency of its execution results. To bridge these gaps between existing benchmarks and expectations from practical applications, we introduce CodeScope, an execution-based, multilingual, multi-task, multi-dimensional evaluation benchmark for comprehensively gauging LLM capabilities on coding tasks. CodeScope covers 43 programming languages and 8 coding tasks. It evaluates the coding performance of LLMs from three dimensions (perspectives): difficulty, efficiency, and length. To facilitate execution-based evaluations of code generation, we develop MultiCodeEngine, an automated code execution engine that supports 14 programming languages. Finally, we systematically evaluate and analyze 8 mainstream LLMs on CodeScope tasks and demonstrate the superior breadth and challenges of CodeScope for evaluating LLMs on code understanding and generation tasks compared to other benchmarks. The CodeScope benchmark and datasets are publicly available at https://github.com/WeixiangYAN/CodeScope.

Translated: 2024-02-07 19:51:34 Published: 2024-02-06
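What "execution-based" evaluation means can be shown with a deliberately small sketch: run the generated code and score it by the fraction of test cases it passes. This is not MultiCodeEngine or any CodeScope API; `run_candidate` and the toy `add` task are hypothetical, and a real harness would isolate execution rather than call `exec` in-process.

```python
def run_candidate(source, func_name, test_cases):
    """Execute candidate source in a fresh namespace; return the fraction of tests passed."""
    namespace = {}
    try:
        exec(source, namespace)        # no sandboxing here; real harnesses isolate execution
        func = namespace[func_name]
    except Exception:
        return 0.0                     # code that fails to load scores zero
    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass                       # runtime errors count as failed tests
    return passed / len(test_cases)

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
scores = {
    "good": run_candidate(good, "add", cases),
    "bad": run_candidate(bad, "add", cases),
}
```

The `bad` candidate still passes one case by coincidence, which is why execution-based scores are usually reported over many test cases rather than a single one.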

# Quantized tensor networks for solving the Vlasov-Maxwell equations

http://arxiv.org/abs/2311.07756v2

License: see the linked page

Erika Ye, Nuno Loureiro

While the Vlasov-Maxwell equations provide an ab initio description of collisionless plasmas, solving them is often impractical due to high computational costs. In this work, we implement a semi-implicit Vlasov-Maxwell solver utilizing the quantized tensor network (QTN) framework. This framework allows one to efficiently represent and manipulate low-rank approximations of high-dimensional data sets. As a result, the cost of the solver scales polynomially with a parameter $D$ (the so-called bond dimension), which is directly related to the error associated with the low-rank approximation. By increasing $D$, convergence to the dynamics that the solver would obtain without any low-rank approximation is guaranteed. We find that for the 2D3V test problems considered here, a modest $D=64$ appears to be sufficient for capturing the expected physics, despite the simulations using a total of $2^{36}$ grid points and thus requiring $D=2^{18}$ for exact calculations. Additionally, we utilize a QTN time evolution scheme based on the Dirac-Frenkel variational principle, which allows us to use larger time steps than those prescribed by the Courant-Friedrichs-Lewy (CFL) constraint. As such, the QTN format appears to be a promising means of approximately solving the Vlasov-Maxwell equations at significantly reduced cost.

Translated: 2024-02-07 19:51:06 Published: 2024-02-06
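The relation between $2^{36}$ grid points and an exact bond dimension of $2^{18}$ can be checked with a short counting sketch. It assumes a single binary tensor train with one core per bit of the grid index, which is one common quantized layout and not necessarily the arrangement used in the paper; the function names are hypothetical.

```python
def qtt_bond_dims(num_bits, max_bond):
    """Bond dimensions of a binary tensor train with `num_bits` cores, capped at `max_bond`."""
    dims = []
    for k in range(1, num_bits):
        exact = 2 ** min(k, num_bits - k)    # bond dimension needed for an exact representation
        dims.append(min(exact, max_bond))
    return dims

def qtt_num_parameters(num_bits, max_bond):
    """Total number of entries over all cores; each core has shape (left, 2, right)."""
    bonds = [1] + qtt_bond_dims(num_bits, max_bond) + [1]
    return sum(bonds[i] * 2 * bonds[i + 1] for i in range(num_bits))

num_bits = 36                                 # 2**36 grid points in total
full_grid = 2 ** num_bits
exact_bond = max(qtt_bond_dims(num_bits, full_grid))
compressed = qtt_num_parameters(num_bits, 64)
```

Under these assumptions the largest exact bond dimension sits at the middle cut of the train, and capping the bond dimension at 64 reduces storage by several orders of magnitude relative to the full grid.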

# Superdiffusive magnetization transport in the XX spin chain with non-local dephasing

http://arxiv.org/abs/2311.07375v2

License: see the linked page

Marko Znidaric

We study a recently discussed XX spin chain with non-local dephasing [arXiv:2310.03069] in a steady-state boundary-driven setting, confirming superdiffusive magnetization transport in the thermodynamic limit. The emergence of superdiffusion is rather interesting, as the Lindblad operators causing it are a coherent sum of two terms, each of which would separately cause diffusion. One therefore has a quantum phenomenon where a coherent sum of two diffusive terms results in superdiffusion. We also study perturbations of the superdiffusive model, finding that breaking the exact form of the dissipators, as well as adding interactions to the XX chain, changes superdiffusion into diffusion.

Translated: 2024-02-07 19:50:41 Published: 2024-02-06

# Conditional Optimal Transport on Function Spaces

http://arxiv.org/abs/2311.05672v3

License: see the linked page

Bamdad Hosseini, Alexander W. Hsu, Amirhossein Taghvaei

We present a systematic study of conditional triangular transport maps in function spaces from the perspective of optimal transportation and with a view towards amortized Bayesian inference. More specifically, we develop a theory of constrained optimal transport problems that describe block-triangular Monge maps characterizing conditional measures, along with their Kantorovich relaxations. This generalizes the theory of optimal triangular transport to separable infinite-dimensional function spaces with general cost functions. We further tailor our results to the case of Bayesian inference problems and obtain regularity estimates on the conditioning maps from the prior to the posterior. Finally, we present numerical experiments that demonstrate the computational applicability of our theoretical results for amortized and likelihood-free inference of functional parameters.

Translated: 2024-02-07 19:50:26 Published: 2024-02-06

# CAFE: Carbon-Aware Federated Learning in Geographically Distributed Data Centers

http://arxiv.org/abs/2311.03615v2

License: see the linked page

Jieming Bian, Lei Wang, Shaolei Ren, Jie Xu

Training large-scale artificial intelligence (AI) models demands significant computational power and energy, leading to an increased carbon footprint with potential environmental repercussions. This paper delves into the challenges of training AI models across geographically distributed (geo-distributed) data centers, emphasizing the balance between learning performance and carbon footprint. We consider Federated Learning (FL) as a solution, which prioritizes model parameter exchange over raw data, ensuring data privacy and compliance with local regulations. Given the variability in carbon intensity across regions, we propose a new framework called CAFE (short for Carbon-Aware Federated Learning) to optimize training within a fixed carbon footprint budget. Our approach incorporates coreset selection to assess learning performance, employs the Lyapunov drift-plus-penalty framework to address the unpredictability of future carbon intensity, and devises an efficient algorithm to address the combinatorial complexity of the data center selection. Through extensive simulations using real-world carbon intensity data, we demonstrate the efficacy of our algorithm, highlighting its superiority over existing methods in optimizing learning performance while minimizing environmental impact.

Translated: 2024-02-07 19:50:12 Published: 2024-02-06
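The Lyapunov drift-plus-penalty idea mentioned in this abstract can be sketched generically: a virtual queue accumulates budget overshoot, and each round the controller trades utility against queue-weighted carbon cost using only currently observed intensities. This is a textbook-style illustration, not CAFE itself (which also involves coreset selection and data center selection); the function name, the two-option setup, and all numbers are assumptions.

```python
def drift_plus_penalty(carbon_intensity, utilities, energy, budget, V):
    """Pick one option per round, trading utility against a long-term carbon budget.

    carbon_intensity[t][k]: carbon intensity seen by option k in round t (observed online)
    utilities[k], energy[k]: assumed per-round utility and energy use of option k
    V: weight on utility; larger V favors utility over staying near the budget
    """
    rounds = len(carbon_intensity)
    per_round_budget = budget / rounds
    queue = 0.0                      # virtual queue tracking accumulated budget overshoot
    choices, total_carbon = [], 0.0
    for t in range(rounds):
        costs = [carbon_intensity[t][k] * energy[k] for k in range(len(utilities))]
        # drift-plus-penalty objective: minimize -V * utility + queue * carbon cost
        k_best = min(range(len(utilities)),
                     key=lambda k: -V * utilities[k] + queue * costs[k])
        choices.append(k_best)
        total_carbon += costs[k_best]
        queue = max(queue + costs[k_best] - per_round_budget, 0.0)
    return choices, total_carbon

# Option 0 skips the round; option 1 trains. Intensity alternates between low and high.
intensity = [[0.0, 1.0] if t % 2 == 0 else [0.0, 3.0] for t in range(20)]
choices, total = drift_plus_penalty(intensity, utilities=[0.0, 1.0],
                                    energy=[0.0, 1.0], budget=20.0, V=1.0)
```

No forecast of future intensity is needed: when the queue grows after a carbon-heavy round, the controller skips rounds until the overshoot has drained.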

# DeepInception: Hypnotize Large Language Model to Be Jailbreaker

http://arxiv.org/abs/2311.03191v3

License: see the linked page

Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, Bo Han

Despite remarkable success in various applications, large language models (LLMs) are vulnerable to adversarial jailbreaks that void their safety guardrails. However, previous studies of jailbreaks usually resort to brute-force optimization or extrapolations with a high computation cost, which might not be practical or effective. In this paper, inspired by the Milgram experiment on the power of authority to incite harmfulness, we disclose a lightweight method, termed DeepInception, which can easily hypnotize an LLM to be a jailbreaker. Specifically, DeepInception leverages the personification ability of the LLM to construct a novel nested scene in which to behave, which realizes an adaptive way to escape the usage control in a normal scenario. Empirically, DeepInception achieves jailbreak success rates competitive with previous counterparts and realizes a continuous jailbreak in subsequent interactions, which reveals the critical weakness of self-losing in both open- and closed-source LLMs such as Falcon, Vicuna-v1.5, Llama-2, and GPT-3.5-turbo/4. Our investigation appeals to people to pay more attention to the safety aspects of LLMs and to develop a stronger defense against their misuse risks. The code is publicly available at: https://github.com/tmlr-group/DeepInception.

Translated: 2024-02-07 19:49:48 Published: 2024-02-06

# Like an Open Book? Read Neural Network Architecture with Simple Power Analysis on 32-bit Microcontrollers

http://arxiv.org/abs/2311.01344v2

License: see the linked page

Raphael Joud, Pierre-Alain Moellic, Simon Pontie, Jean-Baptiste Rigaud

Model extraction is a growing concern for the security of AI systems. For deep neural network models, the architecture is the most important information an adversary aims to recover. Being a sequence of repeated computation blocks, neural network models deployed on edge devices will generate distinctive side-channel leakages. The latter can be exploited to extract critical information when targeted platforms are physically accessible. By combining theoretical knowledge about deep learning practices and analysis of a widespread implementation library (ARM CMSIS-NN), our purpose is to answer this critical question: how far can we extract architecture information by simply examining an EM side-channel trace? For the first time, we propose an extraction methodology for traditional MLP and CNN models running on a high-end 32-bit microcontroller (Cortex-M7) that relies only on simple pattern recognition analysis. Despite a few challenging cases, we claim that, contrary to parameter extraction, the complexity of the attack is relatively low, and we highlight the urgent need for practicable protections that could fit the strong memory and latency requirements of such platforms.

Translated: 2024-02-07 19:49:25 Published: 2024-02-06

# Language Model Training Paradigms for Clinical Feature Embeddings

http://arxiv.org/abs/2311.00768v2

License: see the linked page

Yurong Hu, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

In research areas with scarce data, representation learning plays a significant role. This work aims to enhance representation learning for clinical time series by deriving universal embeddings for clinical features, such as heart rate and blood pressure. We use self-supervised training paradigms for language models to learn high-quality clinical feature embeddings, achieving a finer granularity than existing time-step and patient-level representation learning. We visualize the learnt embeddings via unsupervised dimension reduction techniques and observe a high degree of consistency with prior clinical knowledge. We also evaluate the model performance on the MIMIC-III benchmark and demonstrate the effectiveness of using clinical feature embeddings. We publish our code online for replication.

Translated: 2024-02-07 19:49:08 Published: 2024-02-06

# Unmasking Bias in AI: A Systematic Review of Bias Detection and Mitigation Strategies in Electronic Health Record-based Models

http://arxiv.org/abs/2310.19917v2

License: see the linked page

Feng Chen, Liqin Wang, Julie Hong, Jiaqi Jiang, Li Zhou

Objectives: Leveraging artificial intelligence (AI) in conjunction with electronic health records (EHRs) holds transformative potential to improve healthcare. Yet, addressing bias in AI, which risks worsening healthcare disparities, cannot be overlooked. This study reviews methods to detect and mitigate diverse forms of bias in AI models developed using EHR data. Methods: We conducted a systematic review following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines, analyzing articles from PubMed, Web of Science, and IEEE published between January 1, 2010, and December 17, 2023. The review identified key biases, outlined strategies for detecting and mitigating bias throughout the AI model development process, and analyzed metrics for bias assessment. Results: Of the 450 articles retrieved, 20 met our criteria, revealing six major bias types: algorithmic, confounding, implicit, measurement, selection, and temporal. The AI models were primarily developed for predictive tasks in healthcare settings. Four studies concentrated on the detection of implicit and algorithmic biases employing fairness metrics like statistical parity, equal opportunity, and predictive equity. Sixteen proposed various strategies for mitigating biases, especially targeting implicit and selection biases. These strategies, evaluated through both performance (e.g., accuracy, AUROC) and fairness metrics, predominantly involved data collection and preprocessing techniques like resampling, reweighting, and transformation. Discussion: This review highlights the varied and evolving nature of strategies to address bias in EHR-based AI models, emphasizing the urgent need for standardized, generalizable, and interpretable methodologies to foster the creation of ethical AI systems that promote fairness and equity in healthcare.

Translated: 2024-02-07 19:48:58 Published: 2024-02-06
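Two of the fairness metrics named in this abstract, statistical parity and equal opportunity, have simple standard definitions for a binary classifier and a binary protected attribute. The sketch below computes both as between-group differences; the function names and the eight-sample toy data are assumptions for illustration and are not drawn from any reviewed study.

```python
def rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def statistical_parity_difference(y_pred, group):
    """P(pred = 1 | group = 1) - P(pred = 1 | group = 0)."""
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    return rate(g1) - rate(g0)

def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true-positive rates between the two groups."""
    tp1 = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    tp0 = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    return rate(tp1) - rate(tp0)

y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]
spd = statistical_parity_difference(y_pred, group)
eod = equal_opportunity_difference(y_true, y_pred, group)
```

A value of zero for either metric indicates parity on that criterion; the two can disagree, which is one reason reviews of this kind report several metrics side by side.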

# Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness

http://arxiv.org/abs/2310.19391v2

License: see the linked page

Ahmad-Reza Ehyaei, Golnoosh Farnadi, Samira Samadi

Despite the essential need for comprehensive considerations in responsible AI, factors like robustness, fairness, and causality are often studied in isolation. Adversarial perturbation, used to identify vulnerabilities in models, and individual fairness, which aims for equitable treatment of similar individuals, differ in their starting points, yet both depend on metrics to generate comparable input data instances. Previous attempts to define such joint metrics often lacked general assumptions about data or structural causal models and were unable to reflect counterfactual proximity. To address this, our paper introduces a causal fair metric formulated based on causal structures encompassing sensitive attributes and protected causal perturbation. To enhance the practicality of our metric, we propose metric learning as a method for metric estimation and deployment in real-world problems in the absence of structural causal models. We also demonstrate the application of our novel metric in classifiers. Empirical evaluation on real-world and synthetic datasets illustrates the effectiveness of our proposed metric in achieving an accurate classifier with fairness, resilience to adversarial perturbations, and a nuanced understanding of causal relationships.

Translated: 2024-02-07 19:48:25 Published: 2024-02-06

# A Survey of Federated Unlearning: A Taxonomy, Challenges and Future Directions

http://arxiv.org/abs/2310.19218v3

License: see the linked page

Yang Zhao, Jiaxi Yang, Yiling Tao, Lixu Wang, Xiaoxiao Li, Dusit Niyato

The evolution of privacy-preserving Federated Learning (FL) has led to an increasing demand for implementing the right to be forgotten. The implementation of selective forgetting is particularly challenging in FL due to its decentralized nature. This complexity has given rise to a new field, Federated Unlearning (FU). FU emerges as a strategic solution to address the increasing need for data privacy, including the implementation of the "right to be forgotten". The primary challenge in developing FU approaches lies in balancing the trade-offs in privacy, security, utility, and efficiency, as these elements often have competing requirements. Achieving an optimal equilibrium among these facets is crucial for maintaining the effectiveness and usability of FL systems while adhering to privacy and security standards. This survey provides a comprehensive analysis of existing FU methods, incorporating a detailed review of the various evaluation metrics. Furthermore, we unify these diverse methods and metrics into an experimental framework. Additionally, the survey discusses potential future research directions in FU. Finally, a continually updated repository of related open-source materials is available at: https://github.com/abbottyanginchina/Awesome-Federated-Unlearning.

Translated: 2024-02-07 19:48:02 Published: 2024-02-06

# Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency

http://arxiv.org/abs/2312.11509v4

License: see the linked page

Pavlos Constas, Vikram Rawal, Matthew Honorio Oliveira, Andreas Constas, Aditya Khan, Kaison Cheung, Najma Sultani, Carrie Chen, Micol Altomare, Michael Akzam, Jiacheng Chen, Vhea He, Lauren Altomare, Heraa Murqi, Asad Khan, Nimit Amikumar Bhanshali, Youssef Rachad, Michael Guerzhoy

We propose a reinforcement learning (RL)-based system that would automatically prescribe a hypothetical patient medication that may help the patient with their mental health-related speech disfluency, and adjust the medication and the dosages in response to zero-cost, frequent measurement of the patient's fluency. We demonstrate the components of the system: a module that detects and evaluates speech disfluency on a large dataset we built, and an RL algorithm that automatically finds good combinations of medications. To support the two modules, we collect data on the effect of psychiatric medications on speech disfluency from the literature, and build a plausible patient simulation system. We demonstrate that the RL system is, under some circumstances, able to converge to a good medication regime. We collect and label a dataset of people with possible speech disfluency and demonstrate our methods using that dataset. Our work is a proof of concept: we show that there is promise in the idea of using automatic data collection to address speech disfluency.

Translated: 2024-02-07 19:41:03 Published: 2024-02-06

# Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling

http://arxiv.org/abs/2312.03523v2

License: see the linked page

Talia Tseriotou, Ryan Sze-Yin Chan, Adam Tsakalidis, Iman Munire Bilal, Elena Kochkina, Terry Lyons, Maria Liakata

We present an open-source, pip-installable toolkit, Sig-Networks, the first of its kind for longitudinal language modelling. A central focus is the incorporation of Signature-based Neural Network models, which have recently shown success in temporal tasks. We apply and extend published research, providing a full suite of signature-based models. Their components can be used as PyTorch building blocks in future architectures. Sig-Networks enables task-agnostic dataset plug-in, seamless pre-processing for sequential data, parameter flexibility, and automated tuning across a range of models. We examine signature networks on three different NLP tasks of varying temporal granularity: counselling conversations, rumour stance switch, and mood changes in social media threads, showing SOTA performance in all three, and provide guidance for future tasks. We release the Toolkit as a PyTorch package with an introductory video and Git repositories for preprocessing and modelling, including sample notebooks on the modelled NLP tasks.

Translated: 2024-02-07 19:40:46 Published: 2024-02-06
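The path signature underlying these models can be computed by hand for a small example. The sketch below evaluates the first two signature levels of a piecewise-linear path using Chen's identity; it does not use or imitate the Sig-Networks API, and the function name and the two-segment path are assumptions for illustration.

```python
def signature_depth2(path):
    """First two signature levels of a piecewise-linear path (list of d-dimensional points)."""
    d = len(path[0])
    level1 = [0.0] * d                        # total increments
    level2 = [[0.0] * d for _ in range(d)]    # iterated integrals S^{ij}
    for p, q in zip(path, path[1:]):
        delta = [b - a for a, b in zip(p, q)]
        # Chen's identity for appending one linear segment:
        # S^{ij} <- S^{ij} + S^{i} * delta^{j} + delta^{i} * delta^{j} / 2
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2

path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]   # move right, then up
s1, s2 = signature_depth2(path)
levy_area = 0.5 * (s2[0][1] - s2[1][0])       # antisymmetric part of level 2
```

The level-2 terms are sensitive to the order of the moves (right then up differs from up then right), which is what makes signatures attractive as features for sequential data.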

# Quantum-secured single-pixel imaging under general spoofing attacks

http://arxiv.org/abs/2312.03465v2

License: see the linked page

Jaesung Heo, Taek Jeong, Nam Hun Park, Yonggi Jo

In this paper, we introduce a quantum-secured single-pixel imaging (QS-SPI) technique designed to withstand spoofing attacks, wherein adversaries attempt to deceive imaging systems with fake signals. Unlike previous quantum-secured protocols, which impose a threshold error rate that limits their operation even when true signals are present, our approach not only identifies spoofing attacks but also facilitates the reconstruction of a true image. Our method checks security by analyzing a specific mode correlation of a photon pair, which is independent of the mode used for image construction. Through this analysis, we can identify both the image region targeted by the attack and the type of spoofing attack, enabling reconstruction of the true image. A proof-of-principle demonstration employing the polarization correlation of a photon pair is provided, showcasing successful image reconstruction even when the spoofing signals are 2000 times stronger than the true signals. We expect our approach to be applied to quantum-secured signal processing such as quantum target detection or ranging.

Translated: 2024-02-07 19:40:26 Published: 2024-02-06

# Symmetry resolution of the computable cross-norm negativity of two disjoint intervals in the massless Dirac field theory

http://arxiv.org/abs/2312.02926v2

License: see the linked page

Andrea Bruno, Filiberto Ares, Sara Murciano, Pasquale Calabrese

We investigate how entanglement in the mixed state of a quantum field theory can be described using the cross-computable norm or realignment (CCNR) criterion, employing a recently introduced negativity. We study its symmetry resolution for two disjoint intervals in the ground state of the massless Dirac fermion field theory, extending previous results for the case of adjacent intervals. By applying the replica trick, this problem boils down to computing the charged moments of the realignment matrix. We show that, for two disjoint intervals, they correspond to the partition function of the theory on a torus with a non-contractible charged loop. This confers a great advantage compared to the negativity based on the partial transposition, for which the Riemann surfaces generated by the replica trick have higher genus. This result empowers us to carry out the replica limit, yielding analytic expressions for the symmetry-resolved CCNR negativity. Furthermore, these expressions also provide the symmetry decomposition of other related quantities, such as the operator entanglement of the reduced density matrix or the reflected entropy.

Translated: 2024-02-07 19:40:09 Published: 2024-02-06

# Geometric aspects of mixed quantum states inside the Bloch sphere

http://arxiv.org/abs/2312.02004v2

License: see the linked page

Paul M. Alsing, Carlo Cafaro, Domenico Felice, Orlando Luongo

(参考訳) 量子状態の幾何学を研究する際、混合状態が無限に多くのメトリクスによって区別できることが認識される。残念ながら、この自由度は、複雑性や量子状態の体積のような物理的に重要な幾何学量の計量依存的な解釈を引き起こす。本稿では,Bloch球内におけるBulesとSj\"oqvistの測定値の違いについて,洞察に富んだ議論を行う。まず、2つのメトリクス間の形式的な比較分析から始め、各メトリックに対する3つの代替解釈を批判的に議論する。第二に、2つの計量多様体のそれぞれ上の測地線経路の異なる挙動を明示する。第三に、2つの測度で計算した場合、初期状態と最終混合状態の有限距離を比較する。興味深いことに、異なる計量函数を備えた実ユークリッド空間の位相的側面(例えば、通常のユークリッド計量とタクティカブ計量)を研究する場合の類似性として、混合量子状態間の有限距離の概念に基づく相対的ランキングは、バーとsj\"oqvist計量とで決定される距離を比較すると保存されないことが観測される。最後に,混合量子状態の複雑性と体積の概念に対するメートル法に基づく相対的ランキングの破れの帰結に関する簡単な議論を締めくくった。

When studying the geometry of quantum states, it is acknowledged that mixed states can be distinguished by infinitely many metrics. Unfortunately, this freedom causes metric-dependent interpretations of physically significant geometric quantities such as complexity and volume of quantum states. In this paper, we present an insightful discussion on the differences between the Bures and the Sjöqvist metrics inside a Bloch sphere. First, we begin with a formal comparative analysis between the two metrics by critically discussing three alternative interpretations for each metric. Second, we illustrate explicitly the distinct behaviors of the geodesic paths on each one of the two metric manifolds. Third, we compare the finite distances between an initial and final mixed state when calculated with the two metrics. Interestingly, in analogy to what happens when studying topological aspects of real Euclidean spaces equipped with distinct metric functions (for instance, the usual Euclidean metric and the taxicab metric), we observe that the relative ranking based on the concept of finite distance among mixed quantum states is not preserved when comparing distances determined with the Bures and the Sjöqvist metrics. Finally, we conclude with a brief discussion on the consequences of this violation of a metric-based relative ranking on the concept of complexity and volume of mixed quantum states.
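
The Euclidean versus taxicab analogy mentioned in the abstract can be made concrete with a small sketch. It only illustrates how two metrics on the same space can rank distances differently; it does not compute the Bures or Sjöqvist metrics, and the points `a` and `b` are arbitrary choices made for illustration.

```python
import math

def euclidean(p, q):
    """Usual Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def taxicab(p, q):
    """Taxicab (L1) distance between two points."""
    return sum(abs(x - y) for x, y in zip(p, q))

origin = (0.0, 0.0)
a = (3.0, 3.0)  # illustrative point, not taken from the paper
b = (5.0, 0.0)  # illustrative point, not taken from the paper

# Which point is closer to the origin under each metric?
closer_euclidean = "a" if euclidean(origin, a) < euclidean(origin, b) else "b"
closer_taxicab = "a" if taxicab(origin, a) < taxicab(origin, b) else "b"
print(closer_euclidean, closer_taxicab)  # prints: a b
```

Here `a` is closer than `b` in the Euclidean metric (about 4.24 versus 5) but farther in the taxicab metric (6 versus 5), so the ranking flips.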

翻訳日:2024-02-07 19:39:33 公開日:2024-02-06

# quirky言語モデルからの潜在知識の抽出

Eliciting Latent Knowledge from Quirky Language Models ( http://arxiv.org/abs/2312.01037v2 )

ライセンス: Link先を確認

Alex Mallen and Nora Belrose

(参考訳) 潜在知識の引き出し(Eliciting Latent Knowledge, ELK)は、ネットワークの表向きの出力が誤っていたり誤解を招くものであったりする場合でも、世界の真の状態を頑健に追跡するパターンを、有能なニューラルネットワークのアクティベーションの中に見つけることを目的とする。ELK研究を進めるために、12のデータセットと、それに対応する一連の「quirky」な言語モデルを導入する。これらのモデルは、プロンプトにキーワード「Bob」が含まれる場合に限り、質問への回答で系統的な誤りを犯すようにLoRAで微調整されている。単純なプロービング手法により、こうした文脈でも、プローブの学習に使った問題より難しい問題であっても、正答に関するモデルの潜在知識を引き出せることを示す。これは、中間層のアクティベーションに存在する文脈非依存の知識表現によって可能になる。また、メカニスティックな異常検出手法により、不誠実な振る舞いを94%のAUROCで検出できることが分かった。以上の結果は、有能だが信頼できないモデルから信頼できる知識を引き出せる見込みを示し、ELK手法を実証的に調査する今後の研究を促進する。

Eliciting Latent Knowledge (ELK) aims to find patterns in a capable neural network's activations which robustly track the true state of the world, even when the network's overt output is false or misleading. To further ELK research, we introduce 12 datasets and a corresponding suite of "quirky" language models that are LoRA finetuned to make systematic errors when answering questions if and only if the keyword "Bob" is present in the prompt. We demonstrate that simple probing methods can elicit the model's latent knowledge of the correct answer in these contexts, even for problems harder than those the probe was trained on. This is enabled by context-independent knowledge representations located in middle layer activations. We also find that a mechanistic anomaly detection approach can flag untruthful behavior with 94% AUROC. Our results show promise for eliciting reliable knowledge from capable but untrusted models, and facilitate future research empirically investigating ELK methods.
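
The abstract reports detector quality as AUROC. As a reminder of what that number measures, here is a minimal sketch of the pairwise definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The scores and labels are made-up toy values, not data from the paper.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical anomaly scores; label 1 marks an untruthful answer.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auroc(toy_scores, toy_labels))  # prints 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 94% in context.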

翻訳日:2024-02-07 19:39:12 公開日:2024-02-06

# 射影ヒルベルト空間における量子進化の加速に関する上限

Upper limit on the acceleration of a quantum evolution in projective Hilbert space ( http://arxiv.org/abs/2311.18470v2 )

ライセンス: Link先を確認

Paul M. Alsing, Carlo Cafaro

(参考訳) 量子力学の幾何学的再定式化の文脈において、ハイゼンベルクの位置と運動量の不確定性関係が物理粒子の最大加速度の存在を導くことは注目に値する。量子粒子の最大加速度が、射影ヒルベルト空間における輸送速度の大きさと関連していることも知られている。本稿では、曲率とねじれの概念を用いた量子進化の幾何学的側面の研究に着想を得て、任意の有限次元射影ヒルベルト空間における輸送速度の変化率の上界を導出する。純粋な量子状態にある物理系の進化は、任意の時間依存エルミートハミルトニアン演算子によって支配されると仮定する。我々の導出は、L. D. Landau が量子力学的起源を持つ一般的な交換関係を用いて揺らぎの理論で得た不等式と同様に、ハイゼンベルクの不確定性関係の一般化に基づいている。射影空間における量子進化の加速度の2乗は、ハミルトニアン演算子の時間変化率の分散によって上から抑えられることを示す。さらに、例示のために、任意に時間変化する磁場中に置かれた単一スピン量子ビットという低次元の場合に注目し、射影ヒルベルト空間において最大加速度を与え、かつ曲率がゼロで測地効率が1となる磁場の最適な幾何学的配置について議論する。最後に、散逸効果を緩和するため、あるいはより短時間で目標状態を得るために量子系を高速に操作できる限界に対して、我々の上界が課す帰結について述べる。

It is remarkable that Heisenberg's position-momentum uncertainty relation leads to the existence of a maximal acceleration for a physical particle in the context of a geometric reformulation of quantum mechanics. It is also known that the maximal acceleration of a quantum particle is related to the magnitude of the speed of transportation in projective Hilbert space. In this paper, inspired by the study of geometric aspects of quantum evolution by means of the notions of curvature and torsion, we derive an upper bound for the rate of change of the speed of transportation in an arbitrary finite-dimensional projective Hilbert space. The evolution of the physical system being in a pure quantum state is assumed to be governed by an arbitrary time-varying Hermitian Hamiltonian operator. Our derivation, in analogy to the inequalities obtained by L. D. Landau in the theory of fluctuations by means of general commutation relations of quantum-mechanical origin, relies upon a generalization of Heisenberg's uncertainty relation. We show that the acceleration squared of a quantum evolution in projective space is upper bounded by the variance of the temporal rate of change of the Hamiltonian operator. Moreover, focusing for illustrative purposes on the lower-dimensional case of a single spin qubit immersed in an arbitrarily time-varying magnetic field, we discuss the optimal geometric configuration of the magnetic field that yields maximal acceleration along with vanishing curvature and unit geodesic efficiency in projective Hilbert space. Finally, we comment on the consequences that our upper bound imposes on the limit at which one can perform fast manipulations of quantum systems to mitigate dissipative effects and/or obtain a target state in a shorter time.

翻訳日:2024-02-07 19:38:53 公開日:2024-02-06

# 古典的なFrenet-Serret装置から量子力学的進化の曲率とねじれまで。第1部 定常ハミルトニアン

From the classical Frenet-Serret apparatus to the curvature and torsion of quantum-mechanical evolutions. Part I. Stationary Hamiltonians ( http://arxiv.org/abs/2311.18458v2 )

ライセンス: Link先を確認

Paul M. Alsing, Carlo Cafaro

(参考訳) 3次元ユークリッド空間における空間曲線のFrenet-Serret装置は、曲線の局所幾何学を決定することが知られている。特に、Frenet-Serret装置は、曲線の曲率やねじれ(捩率)を含む重要な幾何学的不変量を定める。量子情報科学においても、物理系に関する量子情報を符号化した量子状態を巧みに操作する際に、低い複雑性と高い効率が達成すべき本質的な特徴であると認識されている。本稿では、動的に発展する状態ベクトルが描く量子曲線の曲がりとねじれを定量化する方法について、幾何学的な視点を提案する。具体的には、シュレディンガー方程式を定める定常ハミルトニアンの下でユニタリに発展する、平行移動された純粋量子状態が射影ヒルベルト空間内に描く量子軌道に対して、Frenet-Serret装置の量子版を提案する。提案する一定の曲率係数は、状態ベクトルに対する接ベクトルの共変微分の大きさの2乗で与えられ、量子曲線の曲がりの有用な尺度となる。一方、提案する一定のねじれ係数は、接ベクトルの共変微分を、接ベクトルと状態ベクトルの両方に直交する方向へ射影したものの大きさの2乗で定義される。ねじれ係数は、量子曲線のねじれの便利な尺度を与える。注目すべきことに、提案する曲率係数とねじれ係数は、全く異なる方法で導入されたにもかかわらず、文献に既にあるものと一致することを示す。

It is known that the Frenet-Serret apparatus of a space curve in three-dimensional Euclidean space determines the local geometry of curves. In particular, the Frenet-Serret apparatus specifies important geometric invariants, including the curvature and the torsion of a curve. It is also acknowledged in quantum information science that low complexity and high efficiency are essential features to achieve when cleverly manipulating quantum states that encode quantum information about a physical system. In this paper, we propose a geometric perspective on how to quantify the bending and the twisting of quantum curves traced by dynamically evolving state vectors. Specifically, we propose a quantum version of the Frenet-Serret apparatus for a quantum trajectory in projective Hilbert space traced by a parallel-transported pure quantum state evolving unitarily under a stationary Hamiltonian specifying the Schrödinger equation. Our proposed constant curvature coefficient is given by the magnitude squared of the covariant derivative of the tangent vector to the state vector and represents a useful measure of the bending of the quantum curve. Our proposed constant torsion coefficient, instead, is defined in terms of the magnitude squared of the projection of the covariant derivative of the tangent vector, orthogonal to both the tangent vector and the state vector. The torsion coefficient provides a convenient measure of the twisting of the quantum curve. Remarkably, we show that our proposed curvature and torsion coefficients coincide with those existing in the literature, although introduced in a completely different manner...
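
For readers who want the classical starting point in concrete form, the sketch below computes the curvature and torsion of a space curve from its first three derivatives, using the textbook formulas κ = |r′ × r″| / |r′|³ and τ = (r′ × r″) · r‴ / |r′ × r″|². It covers only the classical Frenet-Serret quantities; the quantum coefficients proposed in the paper are not reproduced here. The helix and its parameters are illustrative choices.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def curvature_torsion(d1, d2, d3):
    """Curvature and torsion from the first, second and third derivatives."""
    c = cross(d1, d2)
    kappa = math.sqrt(dot(c, c)) / math.sqrt(dot(d1, d1)) ** 3
    tau = dot(c, d3) / dot(c, c)
    return kappa, tau

def helix_derivatives(a, b, t):
    """Derivatives of the helix r(t) = (a cos t, a sin t, b t)."""
    d1 = (-a * math.sin(t), a * math.cos(t), b)
    d2 = (-a * math.cos(t), -a * math.sin(t), 0.0)
    d3 = (a * math.sin(t), -a * math.cos(t), 0.0)
    return d1, d2, d3

# For a helix both values are constant: a/(a^2+b^2) and b/(a^2+b^2).
kappa, tau = curvature_torsion(*helix_derivatives(2.0, 1.0, 0.7))
print(kappa, tau)
```

With a = 2 and b = 1 the closed-form values are 0.4 and 0.2, independent of t, which is the classical analogue of the constant coefficients discussed in the abstract.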

翻訳日:2024-02-07 19:38:26 公開日:2024-02-06

# SmoothVideo:ワンショットビデオチューニングのための拡散モデルにおけるノイズ制約付き滑らかなビデオ合成

SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning ( http://arxiv.org/abs/2311.17536v2 )

ライセンス: Link先を確認

Liang Peng, Haoran Cheng, Zheng Yang, Ruisi Zhao, Linxuan Xia, Chaotian Song, Qinglin Lu, Boxi Wu, Wei Liu

(参考訳) 事前学習されたテキストから画像へのモデル(例えば Stable Diffusion)に基づき、特定のビデオ上でネットワークを微調整する最近のワンショットビデオチューニング手法は、その柔軟性からコミュニティで広く使われている。しかし、これらの手法は、一貫性の欠如や不整合が目立つビデオを生成することが多い。これらの制約に対処するため、本稿では、ビデオフレーム間に課す簡潔かつ効果的なノイズ制約を導入する。この制約は、時間的に隣接するフレーム間でノイズ予測を規制することを目的としており、その結果、滑らかな潜在表現が得られる。この制約は、訓練段階の損失項として簡単に組み込める。既存のワンショットビデオチューニング手法にこの損失を適用することで、生成されるビデオ全体の一貫性と滑らかさが大幅に向上する。さらに、現在のビデオ評価指標は滑らかさを十分に捉えていないと主張する。これに対処するため、詳細な特徴とその時間的ダイナミクスを考慮した新しい指標を導入する。実験結果は、さまざまなワンショットビデオチューニングのベースライン上で、より滑らかなビデオを生成するという本手法の有効性を裏付けている。ソースコードとビデオデモは https://github.com/SPengLiang/SmoothVideo で公開されている。

Recent one-shot video tuning methods, which fine-tune the network on a specific video based on pre-trained text-to-image models (e.g., Stable Diffusion), are popular in the community because of their flexibility. However, these methods often produce videos marred by incoherence and inconsistency. To address these limitations, this paper introduces a simple yet effective noise constraint across video frames. This constraint aims to regulate noise predictions across their temporal neighbors, resulting in smooth latents. It can be simply included as a loss term during the training phase. By applying the loss to existing one-shot video tuning methods, we significantly improve the overall consistency and smoothness of the generated videos. Furthermore, we argue that current video evaluation metrics inadequately capture smoothness. To address this, we introduce a novel metric that considers detailed features and their temporal dynamics. Experimental results validate the effectiveness of our approach in producing smoother videos on various one-shot video tuning baselines. The source codes and video demos are available at https://github.com/SPengLiang/SmoothVideo.
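
The abstract does not state the formula of the noise constraint, so the following is only one plausible shape such a loss term could take, written as an assumption: it penalizes changes in the prediction residual (predicted noise minus target noise) between temporally adjacent frames. The function name `temporal_smoothness_loss` and the flat-list frame representation are hypothetical and are not taken from the SmoothVideo code.

```python
def temporal_smoothness_loss(pred_noise, true_noise):
    """Mean squared change of the residual (pred - true) between adjacent frames.

    pred_noise, true_noise: lists of frames, each frame a flat list of floats.
    This is an assumed form, not the loss defined in the paper.
    """
    if len(pred_noise) != len(true_noise) or len(pred_noise) < 2:
        raise ValueError("need matching sequences of at least two frames")
    total = 0.0
    count = 0
    for t in range(len(pred_noise) - 1):
        frames = zip(pred_noise[t], pred_noise[t + 1], true_noise[t], true_noise[t + 1])
        for p0, p1, e0, e1 in frames:
            diff = (p1 - e1) - (p0 - e0)
            total += diff * diff
            count += 1
    return total / count

# Two single-value frames: the residual jumps from 0 to 1 between them.
example_loss = temporal_smoothness_loss([[0.0], [1.0]], [[0.0], [0.0]])
print(example_loss)
```

In training, a term like this would be added to the usual denoising loss with some weight; both the weight and the exact form are left open by the abstract.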

翻訳日:2024-02-07 19:38:02 公開日:2024-02-06

# フェデレーション・トランスファー・ラーニングによる基盤モデルのグラウンディング:汎用フレームワーク

Grounding Foundation Models through Federated Transfer Learning: A General Framework ( http://arxiv.org/abs/2311.17431v9 )

ライセンス: Link先を確認

Yan Kang, Tao Fan, Hanlin Gu, Xiaojin Zhang, Lixin Fan, Qiang Yang

(参考訳) 膨大な知識と強力な創発能力を備えたGPT-4のような基盤モデル(FM)は、さまざまな自然言語処理やコンピュータビジョンのタスクで目覚ましい成功を収めている。FMをドメイン固有のタスクに適応させたり、ドメイン固有の知識で拡張したりしてグラウンディングすることで、FMの潜在能力を最大限に活用できる。しかし、FMのグラウンディングは、主に計算資源の制約、データプライバシ、モデルの不均一性、モデルの所有権に起因するいくつかの課題に直面している。フェデレーション・ラーニングとトランスファー・ラーニングを組み合わせたフェデレーション・トランスファー・ラーニング(FTL)は、これらの課題に対処する有望な解決策を提供する。近年、FTLを活用したFMのグラウンディング(FTL-FMと呼ぶ)の必要性が、学術界と産業界の両方で強く高まっている。FTL-FM研究の力強い成長と、FTL-FMが産業応用に与えうる影響を動機として、我々は、フェデレーション・ラーニングの設定におけるFMのグラウンディング問題を定式化するFTL-FMフレームワークを提案し、このフレームワークに基づいて最先端のFTL-FM研究を分類する詳細な分類体系を構築し、提案した分類体系に基づいてFTL-FM研究を包括的に概観する。また、FTL-FMと従来のFM適応の各段階との対応関係を確立し、FMの実務者が自身の研究をFTL-FMと整合させられるようにする。さらに、FTL-FMでは効率とプライバシーが重要な関心事であるため、先進的な効率改善技術とプライバシー保護技術を概観する。最後に、FTL-FMの機会と今後の研究の方向性について議論する。

Foundation Models (FMs) such as GPT-4 encoded with vast knowledge and powerful emergent abilities have achieved remarkable success in various natural language processing and computer vision tasks. Grounding FMs by adapting them to domain-specific tasks or augmenting them with domain-specific knowledge enables us to exploit the full potential of FMs. However, grounding FMs faces several challenges, stemming primarily from constrained computing resources, data privacy, model heterogeneity, and model ownership. Federated Transfer Learning (FTL), the combination of federated learning and transfer learning, provides promising solutions to address these challenges. In recent years, the need for grounding FMs leveraging FTL, coined FTL-FM, has arisen strongly in both academia and industry. Motivated by the strong growth in FTL-FM research and the potential impact of FTL-FM on industrial applications, we propose an FTL-FM framework that formulates problems of grounding FMs in the federated learning setting, construct a detailed taxonomy based on the FTL-FM framework to categorize state-of-the-art FTL-FM works, and comprehensively overview FTL-FM works based on the proposed taxonomy. We also establish correspondences between FTL-FM and conventional phases of adapting FM so that FM practitioners can align their research works with FTL-FM. In addition, we overview advanced efficiency-improving and privacy-preserving techniques because efficiency and privacy are critical concerns in FTL-FM. Last, we discuss opportunities and future research directions of FTL-FM.

翻訳日:2024-02-07 19:37:43 公開日:2024-02-06

# 幻覚を超えて:幻覚を考慮した直接選好最適化によるLVLMの強化

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ( http://arxiv.org/abs/2311.16839v2 )

ライセンス: Link先を確認

Zhiyuan Zhao, Bin Wang, Linke Ouyang, Xiaoyi Dong, Jiaqi Wang, Conghui He

(参考訳) マルチモーダルな大規模言語モデルは近年大きな進歩を遂げているが、関連する画像の内容を不正確に記述したり、完全に捏造したりするテキストを生成する「幻覚問題」と呼ばれる共通の問題にいまだ悩まされている。本稿では、幻覚問題を選好選択タスクとして捉え直す新しい解決策、HA-DPO(Hallucination-Aware Direct Preference Optimization)を提案する。モデルは、同じ画像に対する2つの応答(正確なものと幻覚を含むもの)が提示されたとき、幻覚を含まない応答を優先するように訓練される。さらに本稿では、ポジティブ(非幻覚的)サンプルとネガティブ(幻覚的)サンプルのペアを構築する効率的なパイプラインを提案し、頑健な選好学習のための高品質でスタイルの一貫したデータセットを実現する。3つの主要なマルチモーダルモデルに適用したところ、HA-DPOは幻覚の問題を大幅に減らし、モデルの汎化能力を高めた。特に、HA-DPOで強化したMiniGPT-4モデルでは、POPEの精度が51.13%から86.13%(絶対値で35%の改善)に向上し、MMEのスコアが932.00から1326.46(相対値で42.32%の改善)に上昇した。コード、モデル、データセットは https://opendatalab.github.io/HA-DPO で公開されている。

Multimodal large language models have made significant advancements in recent years, yet they still suffer from a common issue known as the "hallucination problem", in which the models generate textual descriptions that inaccurately depict or entirely fabricate content from associated images. This paper introduces a novel solution, Hallucination-Aware Direct Preference Optimization (HA-DPO), which reframes the hallucination problem as a preference selection task. The model is trained to favor the non-hallucinating response when presented with two responses of the same image (one accurate and one hallucinatory). Furthermore, this paper proposes an efficient pipeline for constructing positive (non-hallucinatory) and negative (hallucinatory) sample pairs, ensuring a high-quality, style-consistent dataset for robust preference learning. When applied to three mainstream multimodal models, HA-DPO significantly reduced hallucination issues and amplified the models' generalization capabilities. Notably, the MiniGPT-4 model, when enhanced with HA-DPO, demonstrated a substantial improvement: POPE accuracy rose from 51.13% to 86.13% (an absolute improvement of 35%), and the MME score surged from 932.00 to 1326.46 (a relative improvement of 42.32%). The codes, models, and datasets are made accessible at https://opendatalab.github.io/HA-DPO.
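
To make the "preference selection" framing concrete, here is a minimal sketch of the standard Direct Preference Optimization loss for a single pair of responses. It assumes the usual DPO form, in which the loss is the negative log-sigmoid of a scaled log-probability margin measured against a frozen reference model; HA-DPO may differ in details that the abstract does not give. The argument names and the numeric log-probabilities are hypothetical.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    logp_*     : sequence log-probabilities under the model being trained
    ref_logp_* : the same quantities under the frozen reference model
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) written in a numerically stable way
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical values: the non-hallucinated response is "chosen".
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)    # model equals reference
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)   # model now favours the chosen response
print(neutral, improved)
```

The loss drops below log 2 once the model assigns relatively more probability to the accurate response than the reference does, which is the behaviour the abstract describes.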

翻訳日:2024-02-07 19:37:10 公開日:2024-02-06

# テキストプロンプトを用いた空間共変画像レジストレーション

Spatially Covariant Image Registration with Text Prompts ( http://arxiv.org/abs/2311.15607v2 )

ライセンス: Link先を確認

Xiang Chen, Min Liu, Rongguang Wang, Renjiu Hu, Dongdong Liu, Gaolei Li, and Hang Zhang

(参考訳) 医用画像は、構造化された解剖学的表現と空間的に不均一なコントラストによって特徴づけられることが多い。ニューラルネットワークで解剖学的な事前知識を活用すれば、リソースに制約のある臨床環境での有用性を大きく高められる。先行研究は画像セグメンテーションにこうした情報を利用してきたが、変形可能な画像レジストレーションでの進歩は控えめである。このギャップを埋めるために、空間共変フィルタと、視覚言語モデルで符号化されたテキストの解剖学的プロンプトを統合する新しい手法 textSCF を導入する。このアプローチは、解剖学的領域のテキスト埋め込みをフィルタの重みに対応づける陰関数を最適化し、畳み込み演算に典型的な並進不変性の制約を緩和する。textSCFは計算効率を高めるだけでなく、レジストレーション精度を維持または改善できる。解剖学的領域間の文脈的な相互作用を捉えることで、優れた領域間転移性と、レジストレーション中に構造的な不連続性を保持する能力を提供する。textSCFの性能は、被験者間の脳MRIおよび腹部CTのレジストレーションタスクで厳密に検証され、MICCAI Learn2Reg 2021 チャレンジで既存の最先端モデルを上回り、リーダーボードの首位に立った。腹部のレジストレーションでは、textSCFの大きいモデルは2番目に優れたモデルよりDiceスコアを11.3%改善し、小さいモデルは同程度の精度を保ちながら、ネットワークパラメータを89.13%、計算量を98.34%削減した。

Medical images are often characterized by their structured anatomical representations and spatially inhomogeneous contrasts. Leveraging anatomical priors in neural networks can greatly enhance their utility in resource-constrained clinical settings. Prior research has harnessed such information for image segmentation, yet progress in deformable image registration has been modest. Our work introduces textSCF, a novel method that integrates spatially covariant filters and textual anatomical prompts encoded by visual-language models, to fill this gap. This approach optimizes an implicit function that correlates text embeddings of anatomical regions to filter weights, relaxing the typical translation-invariance constraint of convolutional operations. TextSCF not only boosts computational efficiency but can also retain or improve registration accuracy. By capturing the contextual interplay between anatomical regions, it offers impressive inter-regional transferability and the ability to preserve structural discontinuities during registration. TextSCF's performance has been rigorously tested on inter-subject brain MRI and abdominal CT registration tasks, outperforming existing state-of-the-art models in the MICCAI Learn2Reg 2021 challenge and leading the leaderboard. In abdominal registrations, textSCF's larger model variant improved the Dice score by 11.3% over the second-best model, while its smaller variant maintained similar accuracy but with an 89.13% reduction in network parameters and a 98.34% decrease in computational operations.
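
Registration accuracy in the abstract is reported as a Dice score. As a small reminder of how that overlap measure is defined, the sketch below computes it for two label masks represented as sets of voxel indices; the representation and the toy masks are illustrative assumptions, not part of textSCF.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two sets of voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as a perfect match
    return 2.0 * len(a & b) / (len(a) + len(b))

# Toy masks: warped moving-image label versus fixed-image label.
warped = {1, 2, 3}
fixed = {2, 3, 4}
print(dice_score(warped, fixed))
```

A value of 1 means the warped and fixed labels coincide exactly and 0 means no overlap.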

翻訳日:2024-02-07 19:36:45 公開日:2024-02-06

# 法的要件分析:規制コンプライアンスの観点から

Legal Requirements Analysis: A Regulatory Compliance Perspective ( http://arxiv.org/abs/2311.13871v2 )

ライセンス: Link先を確認

Sallam Abualhaija and Marcello Ceci and Lionel Briand

(参考訳) 現代のソフトウェアは、多くの分野やアプリケーションの文脈で、日常的な活動に不可欠な一部となっている。人工知能(AI)を活用したインテリジェントな自動化の導入は、多くの分野でブレークスルーをもたらした。AIの有効性はいくつかの要因によるものであり、その一つが利用可能なデータの増加である。欧州連合(EU)の一般データ保護規則(GDPR)のような規制は、個人データの保護を確保するために導入されている。個人データを収集、処理、共有するソフトウェアシステムは、こうした規制を遵守しなければならない。規制に準拠したソフトウェアの開発は、適用される規制に定められた法的要件への対処に大きく依存しており、これはソフトウェア開発プロセスの要求工学(RE)フェーズにおける中心的な活動である。REは、法的要件を含め、開発対象システムの要件を規定し維持することを扱う。組織が個人データの処理のために実施するポリシーを記述した法的合意は、法的要件を引き出すための、規制に加わる追加の情報源となりうる。本章では、法的要件を分析するさまざまな方法を検討し、GDPRを例にそれらを示す。具体的には、規制から機械分析可能な表現を作成するための選択肢を説明し、規制に対するコンプライアンス検証を可能にする既存の自動化手段を調査し、さらに法的要件分析の現在の課題について考察する。

Modern software has been an integral part of everyday activities in many disciplines and application contexts. Introducing intelligent automation by leveraging artificial intelligence (AI) has led to breakthroughs in many fields. The effectiveness of AI can be attributed to several factors, among which is the increasing availability of data. Regulations such as the General Data Protection Regulation (GDPR) in the European Union (EU) are introduced to ensure the protection of personal data. Software systems that collect, process, or share personal data are subject to compliance with such regulations. Developing compliant software depends heavily on addressing legal requirements stipulated in applicable regulations, a central activity in the requirements engineering (RE) phase of the software development process. RE is concerned with specifying and maintaining requirements of a system-to-be, including legal requirements. Legal agreements which describe the policies organizations implement for processing personal data can provide an additional source to regulations for eliciting legal requirements. In this chapter, we explore a variety of methods for analyzing legal requirements and exemplify them on GDPR. Specifically, we describe possible alternatives for creating machine-analyzable representations from regulations, survey the existing automated means for enabling compliance verification against regulations, and further reflect on the current challenges of legal requirements analysis.

翻訳日:2024-02-07 19:36:18 公開日:2024-02-06

# パンデミックの最初の100日間 : 医薬品・行動・デジタル介入の相互作用-エージェントベースモデリングを用いた研究

First 100 days of pandemic; an interplay of pharmaceutical, behavioral and digital interventions -- A study using agent based modeling ( http://arxiv.org/abs/2401.04795v2 )

ライセンス: Link先を確認

Gauri Gupta, Ritvik Kapila, Ayush Chopra, Ramesh Raskar

(参考訳) パンデミック、特に最近の新型コロナウイルス感染症(COVID-19)の流行は、公衆衛生と世界経済の両方に影響を与えた。将来起こりうる流行に備えるには、疾患の進行と効率的な対応戦略を深く理解する必要がある。本稿では、複雑な感染動態を捉え、介入の影響を理解するうえでのエージェントベースモデル(ABM)の可能性を強調する。現実世界の政策導入における課題を反映した、現実的な医薬品介入、行動介入、デジタル介入をシミュレートし、パンデミック対応としてこれらの介入の包括的な組み合わせを提案する。これらのシミュレーションを用いて、ワシントン州キングス郡の実世界の社会人口統計データおよび地理センサスデータに基づき、大規模な人口における創発的な振る舞いの傾向を調べる。分析の結果、パンデミックの経過を左右するうえで最初の100日間が極めて重要であることが明らかになり、迅速な意思決定と効率的な政策立案の重要性が浮き彫りになった。さらに、行動介入とデジタル介入への投資が、感染者数と入院者数の総数を減らし、パンデミックのピークを遅らせることで、医薬品介入の負担を軽減できることを示す。また、同じ金額を接触追跡と自己隔離を伴う広範な検査に配分するほうが、予算のすべてをワクチン接種に費やすより費用効率が高いと推測する。

Pandemics, notably the recent COVID-19 outbreak, have impacted both public health and the global economy. A profound understanding of disease progression and efficient response strategies is thus needed to prepare for potential future outbreaks. In this paper, we emphasize the potential of Agent-Based Models (ABM) in capturing complex infection dynamics and understanding the impact of interventions. We simulate realistic pharmaceutical, behavioral, and digital interventions that mirror challenges in real-world policy adoption and suggest a holistic combination of these interventions for pandemic response. Using these simulations, we study the trends of emergent behavior on a large-scale population based on real-world socio-demographic and geo-census data from Kings County in Washington. Our analysis reveals the pivotal role of the initial 100 days in dictating a pandemic's course, emphasizing the importance of quick decision-making and efficient policy development. Further, we highlight that investing in behavioral and digital interventions can reduce the burden on pharmaceutical interventions by reducing the total number of infections and hospitalizations, and by delaying the pandemic's peak. We also infer that allocating the same amount of dollars towards extensive testing with contact tracing and self-quarantine offers greater cost efficiency compared to spending the entire budget on vaccinations.
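
A toy agent-based model helps show what "simulating interventions" means in practice. The sketch below is a deliberately small susceptible/infected/recovered simulation with random mixing, in which a `compliance` parameter stands in for a behavioral intervention by cancelling a fraction of contacts. Every parameter value is an assumption chosen for illustration and none is taken from the paper, whose model uses real census data and richer interventions.

```python
import random

def simulate_abm(n_agents=500, n_days=100, contacts_per_day=8, p_transmit=0.05,
                 recovery_days=10, compliance=0.0, seed=0):
    """Toy SIR agent-based model; returns daily (S, I, R) counts."""
    rng = random.Random(seed)
    S, I, R = 0, 1, 2
    state = [S] * n_agents
    days_infected = [0] * n_agents
    for i in rng.sample(range(n_agents), 5):  # five initial infections
        state[i] = I
    history = []
    for _day in range(n_days):
        newly_infected = []
        for i in range(n_agents):
            if state[i] != I:
                continue
            for _contact in range(contacts_per_day):
                if rng.random() < compliance:
                    continue  # contact avoided because of the intervention
                j = rng.randrange(n_agents)
                if state[j] == S and rng.random() < p_transmit:
                    newly_infected.append(j)
        for i in range(n_agents):
            if state[i] == I:
                days_infected[i] += 1
                if days_infected[i] >= recovery_days:
                    state[i] = R
        for j in newly_infected:
            if state[j] == S:
                state[j] = I
        history.append((state.count(S), state.count(I), state.count(R)))
    return history

baseline = simulate_abm(compliance=0.0)
with_intervention = simulate_abm(compliance=0.9)
print(max(i for _, i, _ in baseline), max(i for _, i, _ in with_intervention))
```

Comparing the two printed peak infection counts gives a rough feel for how reducing contacts reshapes the epidemic curve during the first 100 simulated days.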

翻訳日:2024-02-07 19:27:39 公開日:2024-02-06

# バイオマーカー選択のための多目的遺伝的アルゴリズムに適用した、系統的過大評価を補正する2段階オプティマイザ

Dual-stage optimizer for systematic overestimation adjustment applied to multi-objective genetic algorithms for biomarker selection ( http://arxiv.org/abs/2312.16624v2 )

ライセンス: Link先を確認

Luca Cattelani and Vittorio Fortino

(参考訳) オミクスデータから機械学習でバイオマーカーを発見する際の課題は、分子的特徴が豊富である一方でサンプルが不足していることにある。機械学習の特徴選択手法の多くは、最も効果的な組み合わせを決めるために、さまざまな特徴集合(モデル)を評価する必要がある。このプロセスは通常、検証データセットを用いて行われ、モデルの性能を最適化するためにさまざまな特徴集合を試す。評価には性能推定の誤差があり、選択の対象となるモデルが多い場合、最良とされたモデルはほぼ確実に過大評価されている。特徴選択手法によるバイオマーカーの同定は、予測能力と特徴数の簡潔さ(parsimony)の間のトレードオフを伴う多目的問題として扱える。遺伝的アルゴリズムは多目的最適化でよく使われる手法だが、多数の解を進化させるため過大評価に陥りやすい。単一目的問題においてモデルが選択された後に過大評価を減らす手法は提案されてきたが、最適化の過程で過大評価を減らし、モデル選択を改善し、より一般的な多目的の領域に適用できるアルゴリズムは存在しなかった。本稿では、元の推定値、その分散、および解の特徴集合のサイズから過大評価がどのように予測されるかを学習する、新しい多目的最適化ラッパーアルゴリズム DOSA-MO を提案する。DOSA-MOは最適化中に性能の期待値を調整し、解集合の構成を改善する。腎臓がんと乳がんの3つのトランスクリプトミクスデータセットを用いて、がんのサブタイプや患者の全生存期間を予測する際に、DOSA-MOが、取り置いたサンプル集合または外部サンプル集合において最先端の遺伝的アルゴリズムの性能を向上させることを確認した。

The challenge in biomarker discovery using machine learning from omics data lies in the abundance of molecular features but scarcity of samples. Most feature selection methods in machine learning require evaluating various sets of features (models) to determine the most effective combination. This process, typically conducted using a validation dataset, involves testing different feature sets to optimize the model's performance. Evaluations have performance estimation error, and when the selection involves many models, the best ones are almost certainly overestimated. Biomarker identification with feature selection methods can be addressed as a multi-objective problem with trade-offs between predictive ability and parsimony in the number of features. Genetic algorithms are a popular tool for multi-objective optimization, but they evolve numerous solutions and are thus prone to overestimation. Methods have been proposed to reduce the overestimation after a model has already been selected in single-objective problems, but no existing algorithm was capable of reducing the overestimation during the optimization, improving model selection, or being applied in the more general multi-objective domain. We propose DOSA-MO, a novel multi-objective optimization wrapper algorithm that learns how the original estimation, its variance, and the feature set size of the solutions predict the overestimation. DOSA-MO adjusts the expectation of the performance during the optimization, improving the composition of the solution set. We verify that DOSA-MO improves the performance of a state-of-the-art genetic algorithm on left-out or external sample sets, when predicting cancer subtypes and/or patient overall survival, using three transcriptomics datasets for kidney and breast cancer.
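
The claim that the best of many evaluated models is almost certainly overestimated can be checked with a few lines of simulation. In this sketch every candidate model has the same true score and each validation estimate adds independent Gaussian noise; picking the maximum estimate then overshoots the truth. The number of models, the true score and the noise level are arbitrary illustrative values, and the sketch shows only the selection bias itself, not the DOSA-MO correction.

```python
import random

def selection_overestimate(n_models=200, true_score=0.70, noise_sd=0.05, seed=0):
    """Return the best noisy estimate and its gap to the (shared) true score."""
    rng = random.Random(seed)
    estimates = [true_score + rng.gauss(0.0, noise_sd) for _ in range(n_models)]
    best_estimate = max(estimates)
    return best_estimate, best_estimate - true_score

best_estimate, overestimation = selection_overestimate()
print(best_estimate, overestimation)
```

The gap grows with the number of candidates and with the noise of each estimate, which is why optimizers that evaluate very many solutions are especially exposed to this effect.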

翻訳日:2024-02-07 19:27:15 公開日:2024-02-06

# 平均場不足減衰ランジュバン動力学とその時空離散化

Mean-field underdamped Langevin dynamics and its spacetime discretization ( http://arxiv.org/abs/2312.16360v5 )

ライセンス: Link先を確認

Qiang Fu, Ashia Wilson

(参考訳) 確率測度の空間上で定義された特別なクラスの非線形汎関数を最適化する、N粒子不足減衰(underdamped)ランジュバンアルゴリズムと呼ぶ新しい手法を提案する。この定式化に当てはまる問題の例には、平均場ニューラルネットワークの訓練、最大平均不一致(maximum mean discrepancy)の最小化、カーネル Stein 不一致の最小化がある。本アルゴリズムは、平均場不足減衰ランジュバン動力学の新しい時空離散化に基づいており、この動力学に対して新しい高速な混合保証を与える。さらに、本アルゴリズムが全変動距離で大域的に収束することを示し、動力学とその実用的な実装との間の理論的なギャップを埋める。

We propose a new method called the N-particle underdamped Langevin algorithm for optimizing a special class of non-linear functionals defined over the space of probability measures. Examples of problems with this formulation include training mean-field neural networks, maximum mean discrepancy minimization and kernel Stein discrepancy minimization. Our algorithm is based on a novel spacetime discretization of the mean-field underdamped Langevin dynamics, for which we provide a new, fast mixing guarantee. In addition, we demonstrate that our algorithm converges globally in total variation distance, bridging the theoretical gap between the dynamics and its practical implementation.
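
To give a feel for what an N-particle underdamped Langevin scheme looks like, here is a minimal sketch in one dimension. It uses a plain Euler-type time step, which is not the novel discretization proposed in the paper, and a toy energy whose gradient for particle i is x_i + lam * (x_i - mean(x)), so that particles interact through their empirical mean. The energy, step size `h`, friction `gamma` and all other values are assumptions chosen for illustration.

```python
import math
import random

def run_particles(n=100, steps=2000, h=0.01, gamma=1.0, lam=1.0, seed=0):
    """Simulate n interacting particles with positions x and velocities v."""
    rng = random.Random(seed)
    x = [rng.gauss(3.0, 1.0) for _ in range(n)]  # start away from the minimum at 0
    v = [0.0] * n
    noise_scale = math.sqrt(2.0 * gamma * h)
    for _ in range(steps):
        mean_x = sum(x) / n
        # Gradient of the toy energy: confinement plus attraction to the mean.
        grad = [xi + lam * (xi - mean_x) for xi in x]
        # Velocity update: friction, force and Brownian noise.
        v = [vi - h * (gamma * vi + gi) + noise_scale * rng.gauss(0.0, 1.0)
             for vi, gi in zip(v, grad)]
        # Position update driven by the new velocity.
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x

final_x = run_particles()
final_mean = sum(final_x) / len(final_x)
final_var = sum((xi - final_mean) ** 2 for xi in final_x) / len(final_x)
print(final_mean, final_var)
```

After the run the particle cloud has relaxed from its initial offset toward the minimizer at zero, with a spread set by the noise and the interaction strength.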

翻訳日:2024-02-07 19:26:48 公開日:2024-02-06

# 必要なのはスケーリングだけ - JAXで高速化した強化学習による自動運転

Scaling Is All You Need: Autonomous Driving with JAX-Accelerated Reinforcement Learning ( http://arxiv.org/abs/2312.15122v2 )

ライセンス: Link先を確認

Moritz Harmel, Anubhav Paras, Andreas Pasternak, Gary Linscott

(参考訳) 強化学習は、ビデオゲームのような複雑な領域で最高の人間をも上回ることが示されている。しかし、自動運転に必要な規模で強化学習の実験を行うことは極めて難しい。大規模な強化学習システムを構築し、多数のGPUに分散させることは困難である。実世界の車両で訓練しながら経験を収集することは、安全性とスケーラビリティの観点から現実的でない。そのため、実世界の運転から得た大量のデータを利用する、効率的で現実的な運転シミュレータが必要となる。我々はこれらの機能をまとめ、自動運転のための大規模な強化学習実験を行う。方策の性能が規模の拡大とともに向上することを示す。最も性能の高い方策は、自動運転向けの最先端の機械学習で得られた方策と比べて、走行の進捗率を25%改善しつつ、失敗率を64%削減する。

Reinforcement learning has been demonstrated to outperform even the best humans in complex domains like video games. However, running reinforcement learning experiments on the required scale for autonomous driving is extremely difficult. Building a large scale reinforcement learning system and distributing it across many GPUs is challenging. Gathering experience during training on real world vehicles is prohibitive from a safety and scalability perspective. Therefore, an efficient and realistic driving simulator is required that uses a large amount of data from real-world driving. We bring these capabilities together and conduct large-scale reinforcement learning experiments for autonomous driving. We demonstrate that our policy performance improves with increasing scale. Our best performing policy reduces the failure rate by 64% while improving the rate of driving progress by 25% compared to the policies produced by state-of-the-art machine learning for autonomous driving.

翻訳日:2024-02-07 19:26:37 公開日:2024-02-06

# SimLM: 言語モデルは物理系のパラメータを推測できるか?

SimLM: Can Language Models Infer Parameters of Physical Systems? ( http://arxiv.org/abs/2312.14215v2 )

ライセンス: Link先を確認

Sean Memery, Mirella Lapata, Kartic Subr

(参考訳) いくつかの機械学習手法は、複雑な物理システムについて学習または推論することを目的としている。推論に向けた一般的な第一歩は、システムの振る舞いの観察からそのパラメータを推定することである。本稿では、物理システムにおけるパラメータ推定について、大規模言語モデル(LLM)の性能を調べる。実験の結果、LLMは単純なシステムであっても、この課題に本質的には適していないことが示唆された。有望な探究の方向として、物理シミュレータを用いてLLMのコンテキストを補強することを提案する。物理シミュレーションを利用できる場合とできない場合について、単純な例でさまざまなLLMの性能を評価し比較する。

Several machine learning methods aim to learn or reason about complex physical systems. A common first step towards reasoning is to infer system parameters from observations of its behavior. In this paper, we investigate the performance of Large Language Models (LLMs) at performing parameter inference in the context of physical systems. Our experiments suggest that they are not inherently suited to this task, even for simple systems. We propose a promising direction of exploration, which involves the use of physical simulators to augment the context of LLMs. We assess and compare the performance of different LLMs on a simple example with and without access to physical simulation.
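
Parameter inference with access to a simulator can be illustrated without any language model. The sketch below takes a hypothetical system (an object in free fall), generates noisy height observations, and recovers the gravitational acceleration by running the simulator over a grid of candidate values and keeping the best fit. The system, the noise level and the candidate grid are illustrative assumptions and are not the setup used in the paper.

```python
import random

def simulate_fall(g, times, y0=100.0):
    """Heights of an object dropped from y0 under constant acceleration g."""
    return [y0 - 0.5 * g * t * t for t in times]

def infer_g(times, observed, candidates):
    """Pick the candidate g whose simulated trajectory best matches the data."""
    def squared_error(g):
        return sum((s - o) ** 2 for s, o in zip(simulate_fall(g, times), observed))
    return min(candidates, key=squared_error)

rng = random.Random(0)
times = [0.1 * k for k in range(1, 21)]           # observations over two seconds
true_g = 9.81
observed = [y + rng.gauss(0.0, 0.05) for y in simulate_fall(true_g, times)]
candidates = [5.0 + 0.01 * k for k in range(1001)]  # grid from 5.00 to 15.00
estimated_g = infer_g(times, observed, candidates)
print(estimated_g)
```

The simulator turns inference into a search over parameters, which is the kind of external tool the abstract proposes giving to an LLM rather than asking it to infer the parameter unaided.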

翻訳日:2024-02-07 19:26:26 公開日:2024-02-06

# XLand-MiniGrid:JAXにおけるスケーラブルなメタ強化学習環境

XLand-MiniGrid: Scalable Meta-Reinforcement Learning Environments in JAX ( http://arxiv.org/abs/2312.12044v2 )

ライセンス: Link先を確認

Alexander Nikulin, Vladislav Kurenkov, Ilya Zisman, Artem Agarkov, Viacheslav Sinii, Sergey Kolesnikov

(参考訳) XLandの多様性と深さ、そしてMiniGridのシンプルさとミニマリズムに着想を得て、メタ強化学習研究のためのツールとグリッドワールド環境のスイートであるXLand-MiniGridを紹介する。JAXで書かれたXLand-MiniGridは高いスケーラビリティを念頭に設計されており、GPUやTPUなどのアクセラレータ上で実行できる可能性があるため、限られたリソースでの大規模な実験を広く可能にする。環境に加えて、XLand-MiniGridは、難易度の異なる数百万のユニークなタスクからなる事前サンプリング済みのベンチマークと、適応エージェントの訓練をすぐに始められる使いやすいベースラインを提供する。さらに、スケーリングと汎化の予備的な分析を行い、ベースラインが訓練中に毎秒数百万ステップに達しうることを示し、提案したベンチマークが難しいものであることを検証した。

Inspired by the diversity and depth of XLand and the simplicity and minimalism of MiniGrid, we present XLand-MiniGrid, a suite of tools and grid-world environments for meta-reinforcement learning research. Written in JAX, XLand-MiniGrid is designed to be highly scalable and can potentially run on GPU or TPU accelerators, democratizing large-scale experimentation with limited resources. Along with the environments, XLand-MiniGrid provides pre-sampled benchmarks with millions of unique tasks of varying difficulty and easy-to-use baselines that allow users to quickly start training adaptive agents. In addition, we have conducted a preliminary analysis of scaling and generalization, showing that our baselines are capable of reaching millions of steps per second during training and validating that the proposed benchmarks are challenging.

翻訳日:2024-02-07 19:26:18 公開日:2024-02-06

# Transformerに関する数学的視点

A mathematical perspective on Transformers ( http://arxiv.org/abs/2312.10794v3 )

ライセンス: Link先を確認

Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet

(参考訳) Transformerは、大規模言語モデルの内部動作において中心的な役割を果たす。本研究では、Transformerを相互作用粒子系として解釈することに基づいて、Transformerを解析するための数学的枠組みを構築し、長時間経過するとクラスタが出現することを明らかにする。本研究は基礎となる理論を探求し、数学者と計算機科学者の双方に新しい視点を提供する。

Transformers play a central role in the inner workings of large language models. We develop a mathematical framework for analyzing Transformers based on their interpretation as interacting particle systems, which reveals that clusters emerge in long time. Our study explores the underlying theory and offers new perspectives for mathematicians as well as computer scientists.
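
The interacting-particle picture can be tried out numerically. The sketch below treats tokens as points on the unit sphere and moves each point along the tangential part of an attention-weighted average of all points, with weights proportional to exp(beta * <x_i, x_j>). This is a simplified model written under assumptions (identity query, key and value matrices, explicit Euler steps with re-normalization, initial points confined to one hemisphere), and it is not code from the paper.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [c / n for c in v]

def attention_step(points, beta, dt):
    """One explicit Euler step of the attention dynamics on the unit sphere."""
    updated = []
    for xi in points:
        weights = [math.exp(beta * dot(xi, xj)) for xj in points]
        z = sum(weights)
        avg = [sum(w * xj[k] for w, xj in zip(weights, points)) / z
               for k in range(len(xi))]
        radial = dot(avg, xi)
        tangent = [a - radial * c for a, c in zip(avg, xi)]  # project onto tangent space
        updated.append(normalize([c + dt * t for c, t in zip(xi, tangent)]))
    return updated

def min_pairwise_dot(points):
    return min(dot(p, q) for i, p in enumerate(points) for q in points[i + 1:])

rng = random.Random(0)
points = []
for _ in range(8):
    vec = [rng.gauss(0.0, 1.0) for _ in range(3)]
    vec[2] = abs(vec[2]) + 0.1  # keep every point in the open upper hemisphere
    points.append(normalize(vec))

spread_before = min_pairwise_dot(points)
for _ in range(300):
    points = attention_step(points, beta=1.0, dt=0.1)
spread_after = min_pairwise_dot(points)
print(spread_before, spread_after)
```

The smallest pairwise inner product rises toward 1 as the points gather, which is a toy version of the clustering in long time that the abstract refers to.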

翻訳日:2024-02-07 19:26:01 公開日:2024-02-06

# TiMix:効果的なビジョンランゲージ事前学習のためのテキスト対応画像ミキシング

TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training ( http://arxiv.org/abs/2312.08846v3 )

ライセンス: Link先を確認

Chaoya Jiang, Wei ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang

(参考訳) 自己教師ありマルチモーダル対照学習(SMCL)は、視覚と言語のモダリティを整合させることで、現代のビジョン・ランゲージ事前学習(VLP)モデルを大きく進歩させている。しかし、ウェブから収集したテキストと画像のペアにはノイズが含まれるため、SMCLで訓練データ量を拡大することは、計算コストとデータ効率の面で大きな障害となる。VLPのデータ効率を改善するために、ミックスベースのデータ拡張手法をSMCLに統合する Text-aware Image Mixing (TiMix) を提案する。これにより、計算オーバーヘッドを大きく増やすことなく、大幅な性能向上が得られる。相互情報量(MI)の観点からTiMixの理論的解析を行い、クロスモーダル対照学習における混合データサンプルが、対照損失の正則化項として暗黙的に機能することを示す。実験結果から、TiMixは、既存手法と比較した場合、訓練データ量が少なく訓練時間が短くても、下流タスクで同等の性能を示すことが分かる。本研究は、データ効率がよく計算面でも実現可能なVLPに向けたデータ混合の可能性を実証的かつ理論的に示し、実用的なシナリオでのVLPモデルのより広い普及に資する。

Self-supervised Multi-modal Contrastive Learning (SMCL) remarkably advances modern Vision-Language Pre-training (VLP) models by aligning visual and linguistic modalities. Due to noise in web-harvested text-image pairs, however, scaling up training data volume in SMCL presents considerable obstacles in terms of computational cost and data inefficiency. To improve data efficiency in VLP, we propose Text-aware Image Mixing (TiMix), which integrates mix-based data augmentation techniques into SMCL, yielding significant performance improvements without significantly increasing computational overhead. We provide a theoretical analysis of TiMix from a mutual information (MI) perspective, showing that mixed data samples for cross-modal contrastive learning implicitly serve as a regularizer for the contrastive loss. The experimental results demonstrate that TiMix exhibits comparable performance on downstream tasks, even with a reduced amount of training data and shorter training time, when benchmarked against existing methods. This work empirically and theoretically demonstrates the potential of data mixing for data-efficient and computationally viable VLP, benefiting broader VLP model adoption in practical scenarios.

翻訳日:2024-02-07 19:25:56 公開日:2024-02-06

# テキスト・画像拡散モデルにおける局所条件制御

Local Conditional Controlling for Text-to-Image Diffusion Models ( http://arxiv.org/abs/2312.08768v2 )

ライセンス: Link先を確認

Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Wei Zhao, Qinglin Lu, Boxi Wu, Wei Liu

(参考訳) 拡散モデルは、テキストから画像へのタスクにおいて印象的な傾向を示してきた。近年の手法では、エッジや深度マップなどの画像レベルの制御を加えて、テキストプロンプトとともに生成プロセスを操作し、所望の画像を取得する。この制御プロセスは、制御領域の柔軟性を制限する全画像上でグローバルに操作される。本稿では,ローカル制御という,シンプルで実用的なタスク設定を提案する。ユーザが定義した画像条件に従って特定の局所領域を制御することに焦点を当て、残りの領域は元のテキストプロンプトによってのみ条件付けされる。この方法では、ユーザがきめ細かい方法で画像生成を柔軟に制御できる。しかし、この目標を達成することは自明ではない。局所的な条件を直接付加するナイーブな方法が、局所的な支配的な問題に繋がる可能性がある。そこで本研究では,非制御領域における概念生成を促進するため,非制御領域におけるデノセーション過程におけるクロス・アテンション・マップのノイズの更新とパラメータを活用するトレーニングフリーな手法を提案する。また,局所制御領域内外における情報差に起因する合成画像品質の劣化を軽減するために,特徴マスク制約を用いる。広域実験により,高品質画像を局所制御条件下でプロンプトに合成できることが実証された。コードは https://github.com/YibooZhao/Local-Control で入手できる。

Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired images. This controlling process operates globally on the entire image, which limits the flexibility of control regions. In this paper, we introduce a new, simple yet practical task setting: local control. It focuses on controlling specific local areas according to user-defined image conditions, while the remaining areas are conditioned only by the original text prompt. This allows users to flexibly control image generation in a fine-grained way. However, achieving this goal is non-trivial. The naive approach of directly adding local conditions may lead to the local control dominance problem. To mitigate this problem, we propose a training-free method that leverages the updates of noised latents and parameters in the cross-attention map during the denoising process to promote concept generation in non-control areas. Moreover, we use feature mask constraints to mitigate the degradation of synthesized image quality caused by information differences inside and outside the local control area. Extensive experiments demonstrate that our method can synthesize high-quality images that follow the prompt under local control conditions. Code is available at https://github.com/YibooZhao/Local-Control.

翻訳日:2024-02-07 19:25:35 公開日:2024-02-06

# smerf:リアルタイム大規模探索のための効率的なラミアンスフィールド

SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration ( http://arxiv.org/abs/2312.07541v2 )

ライセンス: Link先を確認

Daniel Duckworth, Peter Hedman, Christian Reiser, Peter Zhizhin, Jean-François Thibert, Mario Lučić, Richard Szeliski, Jonathan T. Barron

(参考訳) 近年のリアルタイムビュー合成技術は, 忠実度と速度が急速に向上し, インタラクティブなフレームレートで近光写実的シーンをレンダリングすることができる。同時に、ラスタ化に寄与する明示的なシーン表現と、レイマーチング上に構築されたニューラルフィールドとの間に緊張が生じ、後者の最先端のインスタンスは、リアルタイムアプリケーションでは違法に高価であると同時に、前者の品質を上回っている。本研究では,最大300 m$^2$ 3.5 mm$^3$ の体積分解能で,大規模シーンにおけるリアルタイム手法の最先端精度を実現するビュー合成手法であるsmerfを提案する。本手法は,計算量とメモリ消費を制約しながらモデル容量を増加させる階層的モデル分割方式と,高忠実度と内部整合性を同時に生成する蒸留訓練戦略の2つの主要な貢献に基づいて構築されている。当社のアプローチは,Webブラウザ内での6自由度ナビゲーションを可能にし,コモディティスマートフォンやラップトップ上でリアルタイムにレンダリングする。大規模実験により,本手法は,標準ベンチマークで0.78db,大シーンで1.78db,最先端のラミアンスフィールドモデルより3桁早くフレームを描画し,スマートフォンを含む多種多様なコモディティデバイスでリアルタイム性能を実現する。プロジェクトのWebサイトでは、これらのモデルをインタラクティブに探求することを読者に勧めています。

Recent techniques for real-time view synthesis have rapidly advanced in fidelity and speed, and modern methods are capable of rendering near-photorealistic scenes at interactive frame rates. At the same time, a tension has arisen between explicit scene representations amenable to rasterization and neural fields built on ray marching, with state-of-the-art instances of the latter surpassing the former in quality while being prohibitively expensive for real-time applications. In this work, we introduce SMERF, a view synthesis approach that achieves state-of-the-art accuracy among real-time methods on large scenes with footprints up to 300 m$^2$ at a volumetric resolution of 3.5 mm$^3$. Our method is built upon two primary contributions: a hierarchical model partitioning scheme, which increases model capacity while constraining compute and memory consumption, and a distillation training strategy that simultaneously yields high fidelity and internal consistency. Our approach enables full six degrees of freedom (6DOF) navigation within a web browser and renders in real-time on commodity smartphones and laptops. Extensive experiments show that our method exceeds the current state-of-the-art in real-time novel view synthesis by 0.78 dB on standard benchmarks and 1.78 dB on large scenes, renders frames three orders of magnitude faster than state-of-the-art radiance field models, and achieves real-time performance across a wide variety of commodity devices, including smartphones. We encourage readers to explore these models interactively at our project website: https://smerf-3d.github.io.

翻訳日:2024-02-07 19:25:14 公開日:2024-02-06

# 運動量粒子最尤法

Momentum Particle Maximum Likelihood ( http://arxiv.org/abs/2312.07335v2 )

ライセンス: Link先を確認

Jen Ning Lim, Juan Kuntz, Samuel Power, Adam M. Johansen

(参考訳) 潜在変数モデルの最大確率推定(MLE)は、パラメータと確率分布の拡張空間に対する最適化問題としてしばしば再キャストされる。例えば、期待最大化(EM)アルゴリズムは、この空間上の適切な自由エネルギー汎関数に適用された座標降下と解釈できる。近年、この視点は最適輸送とワッサーシュタイン勾配流からの洞察と組み合わされ、標準EMよりも広いモデルのクラスに適用可能な粒子ベースのアルゴリズムが開発されている。通常の微分方程式の離散化として 'momentum-enriched' 最適化アルゴリズムを解釈する先行研究からインスピレーションを得て、パラメータと確率分布の拡張空間上の自由エネルギー関数を最小化する類似の力学系に基づくアプローチを提案する。その結果、ネステロフの加速勾配法、アンダーダムのランゲヴィン拡散法、および粒子法の要素をブレンドする力学系が得られた。適切な仮定の下では,提案方式の定量的収束を連続時間における関数のユニークな最小化に確立する。そこで本研究では,潜在変数モデルにおけるパラメータ推定に適用可能な数値的な離散化を提案する。数値実験により,結果のアルゴリズムは既存の手法よりも高速に収束し,他の(ほぼ)mleアルゴリズムと比較できることを示した。

Maximum likelihood estimation (MLE) of latent variable models is often recast as an optimization problem over the extended space of parameters and probability distributions. For example, the Expectation Maximization (EM) algorithm can be interpreted as coordinate descent applied to a suitable free energy functional over this space. Recently, this perspective has been combined with insights from optimal transport and Wasserstein gradient flows to develop particle-based algorithms applicable to wider classes of models than standard EM. Drawing inspiration from prior works which interpret 'momentum-enriched' optimisation algorithms as discretizations of ordinary differential equations, we propose an analogous dynamical systems-inspired approach to minimizing the free energy functional over the extended space of parameters and probability distributions. The result is a dynamical system that blends elements of Nesterov's Accelerated Gradient method, the underdamped Langevin diffusion, and particle methods. Under suitable assumptions, we establish quantitative convergence of the proposed system to the unique minimiser of the functional in continuous time. We then propose a numerical discretization of this system which enables its application to parameter estimation in latent variable models. Through numerical experiments, we demonstrate that the resulting algorithm converges faster than existing methods and compares favourably with other (approximate) MLE algorithms.

翻訳日:2024-02-07 19:24:44 公開日:2024-02-06

# HumanReg:Human Point Cloudの自己管理型非厳格登録

HumanReg: Self-supervised Non-rigid Registration of Human Point Cloud ( http://arxiv.org/abs/2312.05462v2 )

ライセンス: Link先を確認

Yifan Chen, Zhiyu Pan, Zhicheng Zhong, Wenxuan Guo, Jianjiang Feng, Jie Zhou

(参考訳) 本稿では、2つの人点雲間の非剛性変換をエンドツーエンドに学習する新しい登録フレームワークであるHumanRegを提案する。このタイプのポイントクラウドを効率的に扱うために、登録プロセスにボディを導入します。高価なポイント単位のフローアノテーションを必要とする既存の管理された登録技術とは異なり、HumanRegは、新しい損失関数の集合から恩恵を受ける自己管理的な方法で訓練することができる。実世界のデータにモデルをよりよく収束させるため、事前学習戦略を提案し、動的で疎い人点雲と自動生成された地底真理アノテーションからなる合成データセット(HumanSyn4D)を提案する。我々の実験では、humanreg は cape-512 データセットで最先端のパフォーマンスを達成し、また別の挑戦的な実世界のデータセットで定性的な結果が得られることを示した。さらに,本研究は合成データセットと新しい損失関数の有効性を示す。私たちのコードと合成データセットはhttps://github.com/chenyifanthu/humanregで利用可能です。

In this paper, we present a novel registration framework, HumanReg, that learns a non-rigid transformation between two human point clouds end-to-end. We introduce a body prior into the registration process to efficiently handle this type of point cloud. Unlike most existing supervised registration techniques that require expensive point-wise flow annotations, HumanReg can be trained in a self-supervised manner, benefiting from a set of novel loss functions. To make our model converge better on real-world data, we also propose a pretraining strategy and a synthetic dataset (HumanSyn4D) consisting of dynamic, sparse human point clouds and their auto-generated ground truth annotations. Our experiments show that HumanReg achieves state-of-the-art performance on the CAPE-512 dataset and obtains qualitative results on another, more challenging real-world dataset. Furthermore, our ablation studies demonstrate the effectiveness of our synthetic dataset and novel loss functions. Our code and synthetic dataset are available at https://github.com/chenyifanthu/HumanReg.

翻訳日:2024-02-07 19:24:22 公開日:2024-02-06

# 並列関数呼び出しのためのLLMコンパイラ

An LLM Compiler for Parallel Function Calling ( http://arxiv.org/abs/2312.04511v2 )

ライセンス: Link先を確認

Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami

(参考訳) 最近の言語モデルは様々な複雑な推論ベンチマークで顕著な結果を示している。 LLMの推論能力により、知識の遮断、算術能力の不足、プライベートデータへのアクセスの欠如など、独自の制限を克服するために外部関数呼び出しを実行することができる。この開発により、LLMはコンテキストに基づいて複数の関数を選択し調整し、より複雑な問題に取り組むことができる。しかし、現在の複数の関数呼び出しのメソッドは、しばしば、高いレイテンシ、コスト、時には不正確な振る舞いをもたらす、各関数のシーケンシャルな推論と動作を必要とする。これに対処するために,並列に関数を実行するLLMCompilerを導入し,複数の関数呼び出しを効率的にオーケストレーションする。古典的なコンパイラの原則から、LLMCompilerは3つのコンポーネントで並列関数呼び出しを合理化する。 i) LLMプランナーであって,実行計画を定めているもの (ii)タスクフェッチユニット、タスクを呼び出す関数のディスパッチ、及び (iii)これらのタスクを並列に実行するExecutor。 LLMCompilerは関数呼び出しに最適化されたオーケストレーションを自動的に生成し、オープンソースモデルとクローズドソースモデルの両方で使用することができる。我々は様々な関数呼び出しパターンを持つタスクでllmcompilerをベンチマークした。一貫性のあるレイテンシのスピードアップは3.7倍まで,コスト削減は6.7倍まで,正確性は最大9%まで向上しています。

Recent language models have shown remarkable results on various complex reasoning benchmarks. The reasoning capabilities of LLMs enable them to execute external function calls to overcome their inherent limitations, such as knowledge cutoffs, poor arithmetic skills, or lack of access to private data. This development has allowed LLMs to select and coordinate multiple functions based on the context to tackle more complex problems. However, current methods for multiple function calling often require sequential reasoning and acting for each function which can result in high latency, cost, and sometimes inaccurate behavior. To address this, we introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calling. Drawing from the principles of classical compilers, LLMCompiler streamlines parallel function calling with three components: (i) an LLM Planner, formulating execution plans; (ii) a Task Fetching Unit, dispatching function calling tasks; and (iii) an Executor, executing these tasks in parallel. LLMCompiler automatically generates an optimized orchestration for the function calls and can be used with both open-source and closed-source models. We have benchmarked LLMCompiler on a range of tasks with different patterns of function calling. We observe consistent latency speedup of up to 3.7x, cost savings of up to 6.7x, and accuracy improvement of up to ~9% compared to ReAct.
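
The three-component design described above can be sketched without any LLM. The following is a minimal illustration, not LLMCompiler's actual API: the plan format, the `$id` placeholder convention, and the names `run_plan`, `TOOLS`, and `PLAN` are all hypothetical, and the plan is hard-coded where a real system would have a planner generate it.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def is_ref(arg):
    """An argument of the form "$id" refers to the result of task `id`."""
    return isinstance(arg, str) and arg.startswith("$")

def run_plan(plan, tools, max_workers=4):
    """Run a dependency graph of tool calls, dispatching tasks as inputs resolve.

    plan:  {task_id: (tool_name, [arg, ...])}
    tools: {tool_name: callable}
    """
    results, pending, running = {}, dict(plan), {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # Task fetching: submit every task whose dependencies are resolved.
            for tid, (tool, args) in list(pending.items()):
                if all(a[1:] in results for a in args if is_ref(a)):
                    resolved = [results[a[1:]] if is_ref(a) else a for a in args]
                    running[pool.submit(tools[tool], *resolved)] = tid
                    del pending[tid]
            if not running:
                raise ValueError("plan has unsatisfiable dependencies")
            # Execution: independent tasks run in parallel; wait for any to finish.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results

# Stand-in tools; a real deployment would call search engines, calculators, etc.
TOOLS = {"length": lambda text: len(text), "add": lambda a, b: a + b}
# Tasks "1" and "2" are independent and can run concurrently; "3" needs both.
PLAN = {
    "1": ("length", ["alpha"]),
    "2": ("length", ["be"]),
    "3": ("add", ["$1", "$2"]),
}

if __name__ == "__main__":
    print(run_plan(PLAN, TOOLS))
```

In this sketch the latency benefit comes from tasks "1" and "2" overlapping in time, whereas a sequential reason-then-act loop would run them one after another.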

翻訳日:2024-02-07 19:24:02 公開日:2024-02-06

# 量子非局所性の多元的性質を解き明かす

Unmasking the Polygamous Nature of Quantum Nonlocality ( http://arxiv.org/abs/2312.04373v2 )

ライセンス: Link先を確認

Paweł Cieśliński, Lukas Knips, Mateusz Kowalczyk, Wiesław Laskowski, Tomasz Paterek, Tamás Vértesi, Harald Weinfurter

(参考訳) 量子力学は、ある観測値の統計に制限を課す。おそらく最も有名な例は不確実性原理である。複数のベルの不等式を同時に違反する同様のトレードオフも存在する。 3人の観察者の最も単純な場合、ベルの不平等に違反することは他の不平等に違反することを妨げることが示されている。ベル・モノガミーの形式は無符号原理と関連しており、全ての不等式を同時に違反することができないことは、その基本的な性質と見なされている。ここではベル単ガミーが普遍的に成り立たないことを示し、実際には三人の観測者に対してのみ単ガミー的な状況が存在する。したがって、量子非局所性の性質は真に多元的である。 3人以上の観測者に対して単元原理に従わない量子状態とタイトベル不等式を同定するための体系的手法を提案する。同定された多価不等式は、6光子ディック状態を用いたベル型相関の測定によって実験的に破られ、量子暗号や量子ネットワーク内の複数のノードの同時自己検査に利用することができる。

Quantum mechanics imposes limits on the statistics of certain observables. Perhaps the most famous example is the uncertainty principle. Similar trade-offs also exist for the simultaneous violation of multiple Bell inequalities. In the simplest case of three observers, it has been shown that violating one Bell inequality precludes the violation of any other inequality, a property called monogamy of Bell violations. Forms of Bell monogamy have been linked to the no-signalling principle, and the impossibility of simultaneously violating all inequalities is regarded as their fundamental property. Here we show that Bell monogamy does not hold universally and that, in fact, the only monogamous situation is that of three observers. Consequently, the nature of quantum nonlocality is truly polygamous. We present a systematic methodology for identifying quantum states and tight Bell inequalities that do not obey the monogamy principle for any number of observers greater than three. The identified polygamous inequalities are experimentally violated by the measurement of Bell-type correlations using six-photon Dicke states and may be exploited for quantum cryptography as well as simultaneous self-testing of multiple nodes in a quantum network.

翻訳日:2024-02-07 19:23:41 公開日:2024-02-06

# 教師なし類似度尺度を用いたソースコードクローン検出

Source Code Clone Detection Using Unsupervised Similarity Measures ( http://arxiv.org/abs/2401.09885v3 )

ライセンス: Link先を確認

Jorge Martinez-Gil

(参考訳) 近年,クローン検出やコード検索,レコメンデーションといったソフトウェア工学タスクの重要性から,ソースコードの類似性の評価が注目されている。本研究はソースコードクローン検出のための教師なし類似度尺度の比較分析を行う。目標は、現在の最先端技術、その強み、弱点を概観することである。そのため、既存の教師なし戦略をコンパイルし、ベンチマークデータセットでパフォーマンスを評価することで、ソフトウェアエンジニアが特定のユースケースに適した方法を選択するようにガイドします。この研究のソースコードはhttps://github.com/jorge-martinez-gil/codesimで入手できる。

Assessing similarity in source code has gained significant attention in recent years due to its importance in software engineering tasks such as clone detection and code search and recommendation. This work presents a comparative analysis of unsupervised similarity measures for source code clone detection. The goal is to give an overview of the current state-of-the-art techniques, their strengths, and their weaknesses. To do that, we compile the existing unsupervised strategies and evaluate their performance on a benchmark dataset to guide software engineers in selecting appropriate methods for their specific use cases. The source code of this study is available at https://github.com/jorge-martinez-gil/codesim.
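
As a rough illustration of what an unsupervised similarity measure looks like in practice, the sketch below scores code pairs without any training data. The two measures shown (token-set Jaccard and a token-sequence ratio from Python's `difflib`) are chosen for illustration only and are not necessarily among those the study evaluates; the tokenizer is a deliberately crude assumption.

```python
import difflib
import re

def tokens(code):
    """Split source text into identifiers, numbers, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def jaccard(a, b):
    """Overlap of the two token sets, ignoring order and repetition."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def sequence_ratio(a, b):
    """Order-aware similarity of the two token sequences."""
    return difflib.SequenceMatcher(None, tokens(a), tokens(b)).ratio()

if __name__ == "__main__":
    original = "int add(int a, int b) { return a + b; }"
    renamed = "int sum(int x, int y) { return x + y; }"
    unrelated = "print('hello')"
    for other in (renamed, unrelated):
        print(jaccard(original, other), sequence_ratio(original, other))
```

A clone detector built on such measures would flag a pair when the score exceeds a threshold; choosing that threshold is exactly the kind of decision a benchmark comparison is meant to inform.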

翻訳日:2024-02-07 19:16:50 公開日:2024-02-06

# 2次元の低オーバーヘッド量子コンピューティングのためのLDPC-cat符号

LDPC-cat codes for low-overhead quantum computing in 2D ( http://arxiv.org/abs/2401.09541v2 )

ライセンス: Link先を確認

Diego Ruiz, Jérémie Guillaud, Anthony Leverrier, Mazyar Mirrahimi, Christophe Vuillot

(参考訳) 量子低密度パリティチェック(qLDPC)コードは、フォールトトレラント量子コンピューティング(FTQC)アーキテクチャのオーバーヘッドを大幅に削減するための有望な構造である。しかし、これらのコードの既知のハードウェア実装はすべて、長距離量子ビット接続、高速安定化器、多層チップレイアウトなどの高度な技術を必要とする。フォールトトレランスのハードウェアオーバーヘッドを削減する別のアプローチは、ビットフリップエラーが指数関数的に設計によって抑制されるボソニックキャットキュービットを使用することである。本研究では,両手法を組み合わせて,位相フリップを補正する古典的LDPC符号を構成する猫量子ビットに基づくアーキテクチャを提案する。このような位相フリップLDPC符号を用いることで、2つの大きな利点が得られます。まず、2Dおよび低ウェイト安定化器における短距離量子ビット相互作用により、現在の超伝導回路技術と容易に互換性のあるコードの実装を実現する。第2に,局所接続を維持しつつ,猫キュービットの第2層を持つ論理ゲートのフォールトトレラントなユニバーサルセットの実装方法を示す。我々はこれらの古典符号の数値的ブルートフォース最適化を行い、アルゴリズムが関連する符号距離に最適な符号化レートの符号を求める。我々は、最良のコードのいくつかがセル・オートマトン構造から恩恵を受けていることを発見します。これにより、高いエンコーディングレートと距離を持つコードのファミリーを定義することができます。最後に,回路レベルの雑音下でのコードの性能を数値的に評価する。物理的フェイズフリップエラー確率$\epsilon \approx 0.1\%$と仮定すると、私たちの$[165+8\ell, 34+2\ell, 22]$コードファミリーは、合計論理的エラー確率(論理的位相フリップとビットフリップの両方を含む)と論理的キュービット$\epsilon_L \leq 10^{-8}$を758ドルのキャット量子ビットチップで符号化することができる。

Quantum low-density parity-check (qLDPC) codes are a promising construction for drastically reducing the overhead of fault-tolerant quantum computing (FTQC) architectures. However, all of the known hardware implementations of these codes require advanced technologies, such as long-range qubit connectivity, high-weight stabilizers, or multi-layered chip layouts. An alternative approach to reduce the hardware overhead of fault-tolerance is to use bosonic cat qubits where bit-flip errors are exponentially suppressed by design. In this work, we combine both approaches and propose an architecture based on cat qubits concatenated in classical LDPC codes correcting for phase-flips. We find that employing such phase-flip LDPC codes provides two major advantages. First, the hardware implementation of the code can be realised using short-range qubit interactions in 2D and low-weight stabilizers, which makes it readily compatible with current superconducting circuit technologies. Second, we demonstrate how to implement a fault-tolerant universal set of logical gates with a second layer of cat qubits while maintaining the local connectivity. We conduct a numerical brute-force optimisation of these classical codes to find the ones with the best encoding rate for algorithmically relevant code distances. We discover that some of the best codes benefit from a cellular automaton structure. This allows us to define families of codes with high encoding rates and distances. Finally, we numerically assess the performance of our codes under circuit-level noise. Assuming a physical phase-flip error probability $\epsilon \approx 0.1\%$, our $[165+8\ell, 34+2\ell, 22]$ code family allows encoding $100$ logical qubits with a total logical error probability (including both logical phase-flip and bit-flip) per cycle and per logical qubit $\epsilon_L \leq 10^{-8}$ on a $758$ cat qubit chip.
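
The code-family arithmetic in the last sentence can be checked directly. The sketch below only evaluates the $[n, k, d] = [165+8\ell, 34+2\ell, 22]$ formula quoted in the abstract; the function names are hypothetical. The abstract does not break down how the remaining cat qubits of the $758$-qubit chip are used, and the sketch makes no claim about them.

```python
def code_parameters(ell):
    """[n, k, d] for the family [165 + 8*ell, 34 + 2*ell, 22] quoted in the abstract."""
    return 165 + 8 * ell, 34 + 2 * ell, 22

def smallest_ell_for(logical_qubits):
    """Smallest non-negative ell whose code encodes at least `logical_qubits`."""
    return max(0, -(-(logical_qubits - 34) // 2))  # ceiling division

if __name__ == "__main__":
    ell = smallest_ell_for(100)
    n, k, d = code_parameters(ell)
    print(ell, n, k, d, k / n)
```

For $100$ logical qubits this gives $\ell = 33$ and a $[429, 100, 22]$ code, so $429$ of the $758$ cat qubits carry the code itself, at an encoding rate of $100/429 \approx 0.23$.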

翻訳日:2024-02-07 19:16:40 公開日:2024-02-06

# インフレのクリロフ複雑性

Inflationary Krylov complexity ( http://arxiv.org/abs/2401.09307v3 )

ライセンス: Link先を確認

Tao Li and Lei-Hua Liu

(参考訳) 本研究では,インフレーションにおける変形分散関係に対する曲率摂動のクリロフ複雑性を体系的に検討した。多くの量子重力フレームワークはこの種の分散関係を修正できるため、我々の分析は弦宇宙論、ループ重力、$\it e.t.c$に適用できる。 lanczosアルゴリズムに従い、非常に初期の宇宙は無限多体、最大カオス系であることがわかった。我々の数値は、標準分散関係のLanczos係数とLyapunov指数が主にスケール係数によって決定されることを示している。修正された場合については、運動量によってほぼ決定される。閉系の手法では、水平線が抜ける前にクリロフ複雑性が不規則な振動を示すことが分かる。修正されたケースは、地平線が存在すればより高速な成長を示す。開系のアプローチについては、Lanczos係数を$n$(主量子数)に比例させるだけで非常に堅牢な正確な波動関数を構築する。これに基づいて、Krylov複雑性とKrylovエントロピーは、弱散逸近似の下で閉じた系の場合、十分に回復可能であることを発見し、この分析により、Krylov複雑性の進化は元の状況と変わらないことを示した。また,インフレーション期は強い消散期であることがわかった。一方、我々の数値は、クリロフの複雑さがインフレーション期間中に増加することを明らかに示しています。しかし、小さなスケールでは、地平線が出てからピークとなるだろう。分析の結果,背景の劇的な変化(インフレーション)がクリロフ複雑性の進化に大きく影響することが明らかとなった。曲率摂動は量子レベルから古典レベルに遷移する。このデコヒーレンスがインフレーション中のクリロフの複雑さに大きな影響を与えると期待できる。

In this work, we systematically investigate the Krylov complexity of the curvature perturbation for a modified dispersion relation in inflation. Since many quantum gravitational frameworks can lead to this kind of modified dispersion relation, our analysis can be applied to string cosmology, loop gravity, etc. Following the Lanczos algorithm, we find that the very early universe is an infinite, many-body, and maximally chaotic system. Our numerics show that the Lanczos coefficient and Lyapunov index for the standard dispersion relation are mainly determined by the scale factor. For the modified case, they are nearly determined by the momentum. In the closed-system approach, we find that the Krylov complexity shows irregular oscillation before horizon exit. The modified case presents faster growth after horizon exit. In the open-system approach, we construct the exact wave function, which is very robust and requires only that the Lanczos coefficient be proportional to $n$ (the main quantum number). Based on it, we find that the Krylov complexity and Krylov entropy nicely recover the closed-system case under the weak dissipative approximation, in which our analysis shows that the evolution of Krylov complexity will not be the same as in the original situation. We also find that the inflationary period is a strongly dissipative system. Meanwhile, our numerics clearly show that the Krylov complexity grows during the whole inflationary period, but on small scales there is a peak after horizon exit. Our analysis reveals that the dramatic change in background (inflation) significantly impacts the evolution of Krylov complexity. Since the curvature perturbation transitions from the quantum level to the classical level, we expect that decoherence will strongly impact the Krylov complexity during inflation.

翻訳日:2024-02-07 19:16:02 公開日:2024-02-06

# マルコフ雑音を用いた確率近似と強化学習のためのode法

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise ( http://arxiv.org/abs/2401.07844v2 )

ライセンス: Link先を確認

Shuze Liu, Shuhang Chen, Shangtong Zhang

(参考訳) 確率近似(英: stochastic approximation)は、ベクトルを反復的に、漸進的に、そして確率的に更新するアルゴリズムのクラスである。確率近似アルゴリズムを解析する基本的な課題は、その安定性、すなわち確率ベクトル反復がほぼ確実に有界であることを示すことである。本稿では, マルティンゲール差分雑音設定からマルコフ雑音設定への安定性に対するボルカー・マインの定理を拡張し, 強化学習, 特に線形関数近似と適性トレースを用いたオフポリシー強化学習アルゴリズムに適用性を大幅に向上させた。我々の分析の中心は、少数の函数の変化の漸近速度の減少であり、これは大数の強い法則の形式とよく使われるV4リャプノフドリフト条件の両方によって示唆され、マルコフ鎖が有限で既約であれば自明に成り立つ。

Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of strong law of large numbers and a commonly used V4 Lyapunov drift condition and trivially holds if the Markov chain is finite and irreducible.
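
To make the Markovian noise setting concrete, the sketch below runs a scalar stochastic approximation iteration $x_{n+1} = x_n + \alpha_n (Y_{n+1} - x_n)$ in which the samples $Y_n$ come from a two-state Markov chain rather than being independent. The chain, its transition probabilities, the step size $\alpha_n = 1/(n+1)$, and the function name are all illustrative assumptions, not taken from the paper.

```python
import random

def run(steps=200_000, seed=0, p_up=0.3, p_down=0.1):
    """Scalar stochastic approximation driven by a two-state Markov chain.

    The chain moves 0 -> 1 with probability p_up and 1 -> 0 with probability
    p_down, so its stationary probability of state 1 is p_up / (p_up + p_down).
    """
    rng = random.Random(seed)
    y, x = 0, 0.0
    for n in range(steps):
        # Markovian noise: the next sample depends on the current state y.
        if y == 0:
            y = 1 if rng.random() < p_up else 0
        else:
            y = 0 if rng.random() < p_down else 1
        alpha = 1.0 / (n + 1)   # diminishing step size
        x += alpha * (y - x)    # incremental, stochastic update
    return x

if __name__ == "__main__":
    print(run())
```

The associated mean ODE is $\dot{x} = \pi(1) - x$ with $\pi(1) = 0.3 / 0.4 = 0.75$, and the iterates stay bounded and settle near that equilibrium even though consecutive noise samples are correlated.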

翻訳日:2024-02-07 19:15:35 公開日:2024-02-06

# 大規模言語モデルからのイベントシーケンス知識の蒸留

Distilling Event Sequence Knowledge From Large Language Models ( http://arxiv.org/abs/2401.07237v2 )

ライセンス: Link先を確認

Somin Wadhwa, Oktie Hassanzadeh, Debarun Bhattacharjya, Ken Barker, Jian Ni

(参考訳) イベントシーケンスモデルは、イベントの分析と予測に非常に有効であることが判明している。このようなモデルの構築には、豊富な高品質なイベントシーケンスデータが必要になる。しかし、特定のアプリケーションでは、クリーンな構造化されたイベントシーケンスは利用できず、自動シーケンス抽出はノイズが多く不完全なデータをもたらす。本研究では,確率的イベントモデル構築に効果的に使用できるイベントシーケンスを生成するための大規模言語モデル(llm)の利用を検討する。これは、LLMからイベントシーケンス知識を蒸留するメカニズムと見なすことができる。本手法は、因果関係を持つ事象概念の知識グラフ(KG)を用いて、因果関係生成のための生成言語モデルを導出する。提案手法は,入力KGの知識ギャップを埋めて,高品質なイベントシーケンスを生成することができることを示す。さらに,パターンマイニングや確率的イベントモデルから有用で複雑な構造化知識を発見するために,生成されたシーケンスをどのように活用するかを検討する。我々は、シーケンス生成コードと評価フレームワーク、およびイベントシーケンスデータのコーパスをリリースする。

Event sequence models have been found to be highly effective in the analysis and prediction of events. Building such models requires the availability of abundant high-quality event sequence data. In certain applications, however, clean structured event sequences are not available, and automated sequence extraction results in data that is too noisy and incomplete. In this work, we explore the use of Large Language Models (LLMs) to generate event sequences that can effectively be used for probabilistic event model construction. This can be viewed as a mechanism of distilling event sequence knowledge from LLMs. Our approach relies on a Knowledge Graph (KG) of event concepts with partial causal relations to guide the generative language model for causal event sequence generation. We show that our approach can generate high-quality event sequences, filling a knowledge gap in the input KG. Furthermore, we explore how the generated sequences can be leveraged to discover useful and more complex structured knowledge from pattern mining and probabilistic event models. We release our sequence generation code and evaluation framework, as well as a corpus of event sequence data.

翻訳日:2024-02-07 19:15:14 公開日:2024-02-06

# 付加量子化による大規模言語モデルの極端圧縮

Extreme Compression of Large Language Models via Additive Quantization ( http://arxiv.org/abs/2401.06118v2 )

ライセンス: Link先を確認

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, Dan Alistarh

(参考訳) 正確なオープン大言語モデル(LLM)の出現は、エンドユーザーデバイス上での実行を可能にするようなモデルの量子化技術への競争につながった。本稿では,Multi-Codebook Quantization(MCQ)における古典的手法の観点から,パラメータあたり2ビットから3ビットといった,極めて低ビット数を対象として定義されたLLM圧縮の問題を再考する。我々の研究は、MCQファミリーの古典的なアルゴリズムであるAdditive Quantizationの上に構築され、言語モデルの量子化に適応する。結果として得られたアルゴリズムは、LLM圧縮の最先端を推し進め、与えられた圧縮予算の精度において、最近提案されたすべての技術より優れている。例えば、Llama 2モデルをパラメータあたり2ビットに圧縮する場合、我々のアルゴリズムは、7Bモデルを6.93パープレキシティ(最高の先行処理に対して1.29改善、FP16から1.81ポイント)、13Bモデルを5.70パープレキシティ(.36改善)、70Bモデルを3.94パープレキシティ(.22改善)に量子化する。我々は,LLM量子化の今後の研究を促進するために,言語モデル AQLM をベースラインとして追加量子化の実装をリリースする。

The emergence of accurate open large language models (LLMs) has led to a race towards quantization techniques for such models enabling execution on end-user devices. In this paper, we revisit the problem of "extreme" LLM compression, defined as targeting extremely low bit counts such as 2 to 3 bits per parameter, from the point of view of classic methods in Multi-Codebook Quantization (MCQ). Our work builds on top of Additive Quantization, a classic algorithm from the MCQ family, and adapts it to the quantization of language models. The resulting algorithm advances the state-of-the-art in LLM compression, outperforming all recently-proposed techniques in terms of accuracy at a given compression budget. For instance, when compressing Llama 2 models to 2 bits per parameter, our algorithm quantizes the 7B model to 6.93 perplexity (a 1.29 improvement relative to the best prior work, and 1.81 points from FP16), the 13B model to 5.70 perplexity (a 0.36 improvement) and the 70B model to 3.94 perplexity (a 0.22 improvement) on WikiText2. We release our implementation of Additive Quantization for Language Models (AQLM) as a baseline to facilitate future research in LLM quantization.
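
The core idea of additive (multi-codebook) quantization is that a small group of weights is stored as a few codebook indices and reconstructed as the sum of the selected codewords. The sketch below shows only decoding and the resulting storage cost; the tiny codebooks, the group size, and the function names are illustrative assumptions rather than the released AQLM implementation, and learning the codebooks is out of scope.

```python
import math

def decode(codes, codebooks):
    """Reconstruct one weight group as the sum of one codeword per codebook."""
    group = [0.0] * len(codebooks[0][0])
    for idx, book in zip(codes, codebooks):
        group = [g + c for g, c in zip(group, book[idx])]
    return group

def bits_per_parameter(num_codebooks, codebook_size, group_size):
    """Index storage cost only; codebooks and any scales are ignored."""
    return num_codebooks * math.log2(codebook_size) / group_size

if __name__ == "__main__":
    # Two toy codebooks, each holding two codewords for a group of two weights.
    codebooks = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [-0.5, 0.5]],
    ]
    print(decode([1, 0], codebooks))
    print(bits_per_parameter(1, 2**16, 8))
```

Under these assumptions, one 16-bit index per group of 8 weights costs 2 bits per parameter, which is the regime the abstract calls "extreme" compression.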

翻訳日:2024-02-07 19:14:57 公開日:2024-02-06

# kernel fisher-rao flowを用いた単位時間サンプリング

Sampling in Unit Time with Kernel Fisher-Rao Flow ( http://arxiv.org/abs/2401.03892v2 )

ライセンス: Link先を確認

Aimee Maurais and Youssef Marzouk

(参考訳) 非正規化対象密度からサンプリングするための新しい平均場ODEと対応する相互作用粒子系(IPS)を導入する。 IPSは勾配のない閉形式であり、参照密度からサンプリングして(正規化されていない)ターゲット-参照密度比を計算する能力のみを必要とする。平均場ODEは、特定のフィッシャー-ラオ勾配流の経路である2つの密度の幾何学的混合に沿ってサンプルを輸送する速度場に対するポアソン方程式を解くことで得られる。速度場に rkhs ansatz を用いることでポアソン方程式を扱いやすくし, 得られた平均場 ode を有限標本上で離散化することができる。平均場ODEは、サンプル駆動最適輸送として知られるフレームワーク内でのモンゲ・アンプ・エル方程式の連続線型化の極限として離散時間の観点からも導出することができる。我々は,このアプローチの確率的変種を導入し,ipsが様々なターゲット分布から高品質なサンプルを生成できることを実証的に示す。

We introduce a new mean-field ODE and corresponding interacting particle systems (IPS) for sampling from an unnormalized target density. The IPS are gradient-free, available in closed form, and only require the ability to sample from a reference density and compute the (unnormalized) target-to-reference density ratio. The mean-field ODE is obtained by solving a Poisson equation for a velocity field that transports samples along the geometric mixture of the two densities, which is the path of a particular Fisher-Rao gradient flow. We employ an RKHS ansatz for the velocity field, which makes the Poisson equation tractable and enables discretization of the resulting mean-field ODE over finite samples. The mean-field ODE can additionally be derived from a discrete-time perspective as the limit of successive linearizations of the Monge-Ampère equations within a framework known as sample-driven optimal transport. We introduce a stochastic variant of our approach and demonstrate empirically that our IPS can produce high-quality samples from varied target distributions, outperforming comparable gradient-free particle systems and competitive with gradient-based alternatives.

翻訳日:2024-02-07 19:14:31 公開日:2024-02-06

# 現実のドラッグ発見のための量子コンピューティングパイプライン:アルゴリズムから量子ハードウェアへ

A Quantum Computing Pipeline for Real World Drug Discovery: From Algorithm to Quantum Hardware ( http://arxiv.org/abs/2401.03759v2 )

ライセンス: Link先を確認

Weitang Li, Zhi Yin, Xiaoran Li, Dongqiang Ma, Shuang Yi, Zhenxing Zhang, Chenji Zou, Kunliang Bu, Maochun Dai, Jie Yue, Yuzong Chen, Xiaojin Zhang, Shengyu Zhang

(参考訳) 量子コンピューティングは、古典的アプローチよりも優れた計算能力を持ち、医薬品を含む多くの科学領域に革命を起こす可能性を秘めている。しかし、量子コンピューティングの薬物発見への応用は主に概念実証研究に限られており、現実の薬物開発課題の複雑さを捉えるのに失敗することが多い。本研究では,創薬設計問題に対処するための高度な量子コンピューティングパイプラインを開発することにより,従来の研究から逸脱する。提案手法は, 量子計算の実用的応用を強調し, 実用化に向けて推進するものである。具体的には, 共有結合切断を伴うプロドラッグ活性化のためのギブス自由エネルギープロファイルの正確な決定と, 共有結合相互作用の正確なシミュレーションという, 薬物発見における2つの重要な課題に対処する汎用量子コンピューティングパイプラインを構築した。この研究は、薬物設計で遭遇する検証可能なシナリオ、特に2つのケーススタディに存在する共有結合問題に対する量子コンピューティングのベンチマークの先駆的な取り組みとなり、理論モデルから具体的応用へと移行する。本結果は,現実の薬物設計ワークフローに統合するための量子コンピューティングパイプラインの可能性を示す。

Quantum computing, with its superior computational capabilities compared to classical approaches, holds the potential to revolutionize numerous scientific domains, including pharmaceuticals. However, the application of quantum computing for drug discovery has primarily been limited to proof-of-concept studies, which often fail to capture the intricacies of real-world drug development challenges. In this study, we diverge from conventional investigations by developing an advanced quantum computing pipeline tailored to address genuine drug design problems. Our approach underscores the pragmatic application of quantum computation and propels it towards practical industrial adoption. We specifically construct our versatile quantum computing pipeline to address two critical tasks in drug discovery: the precise determination of Gibbs free energy profiles for prodrug activation involving covalent bond cleavage, and the accurate simulation of covalent bond interactions. This work serves as a pioneering effort in benchmarking quantum computing against veritable scenarios encountered in drug design, especially the covalent bonding issue present in both of the case studies, thereby transitioning from theoretical models to tangible applications. Our results demonstrate the potential of a quantum computing pipeline for integration into real world drug design workflows.

翻訳日:2024-02-07 19:14:11 公開日:2024-02-06

# サンプル効率の良いオフライン強化学習について:データ多様性、後方サンプリングなど

On Sample-Efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, and Beyond ( http://arxiv.org/abs/2401.03301v2 )

ライセンス: Link先を確認

Thanh Nguyen-Tang and Raman Arora

(参考訳) オフライン強化学習(offline reinforcement learning, RL)として知られる, 逐次的意思決定のための歴史的データセットからのサンプル効率学習を促進するものを理解することを目的とする。さらに,(値)関数近似を活用しながらサンプル効率を楽しむアルゴリズムにも興味を持っている。本稿では,これらの基本的な質問について述べる。 (i)オフラインRLにおける以前のカバレッジ尺度の概念を仮定したデータ多様性の概念の提案 (ii) この概念を用いて、バージョン空間(VS)、正規化最適化(RO)、後続サンプリング(PS)に基づくオフラインRLアルゴリズムの3つの異なるクラスを統一する。標準仮定の下では,VS-based, RO-based, PS-basedアルゴリズムにより, 有限および線形モデルクラスに対する最先端の準最適境界を回復し, サンプル効率を得る。この結果は、以前の研究がVSベースのアルゴリズムと比較してROベースのアルゴリズムの好ましくないサンプルの複雑さを示唆しているのに対して、後続サンプリングは、その爆発的な性質からオフラインRLではまれである。特に,提案するオフラインRLのためのモデルフリーPSベースアルゴリズムはnovelであり,自然界においてfrequentist(すなわち最悪の場合)である。

We seek to understand what facilitates sample-efficient learning from historical datasets for sequential decision-making, a problem that is popularly known as offline reinforcement learning (RL). Further, we are interested in algorithms that enjoy sample efficiency while leveraging (value) function approximation. In this paper, we address these fundamental questions by (i) proposing a notion of data diversity that subsumes the previous notions of coverage measures in offline RL and (ii) using this notion to unify three distinct classes of offline RL algorithms based on version spaces (VS), regularized optimization (RO), and posterior sampling (PS). We establish that VS-based, RO-based, and PS-based algorithms, under standard assumptions, achieve comparable sample efficiency, which recovers the state-of-the-art sub-optimality bounds for finite and linear model classes. This result is surprising, given that prior work suggested an unfavorable sample complexity of the RO-based algorithm compared to the VS-based algorithm, whereas posterior sampling is rarely considered in offline RL due to its explorative nature. Notably, our proposed model-free PS-based algorithm for offline RL is novel, with sub-optimality bounds that are frequentist (i.e., worst-case) in nature.

Translated: 2024-02-07 19:13:52 Published: 2024-02-06

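# VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model ( http://arxiv.org/abs/2401.02695v2 )

License: see the linked page

Pengying Wu, Yao Mu, Bingxian Wu, Yi Hou, Ji Ma, Shanghang Zhang, Chang Liu

(Reference translation) In household robotics, the Zero-Shot Object Navigation (ZSON) task requires an agent to traverse unfamiliar environments and find objects from novel categories without explicit prior training. This paper introduces VoroNav, a new semantic exploration framework that proposes the Reduced Voronoi Graph to extract exploratory paths and planning nodes from a semantic map built in real time. Using topological and semantic information, VoroNav designs text-based descriptions of paths and images that a large language model (LLM) can readily interpret. In particular, the method combines path and farsight descriptions to represent the environmental context, so that the LLM can apply commonsense reasoning to select navigation waypoints. Extensive evaluation on HM3D and HSSD shows that VoroNav surpasses existing benchmarks in both success rate and exploration efficiency (absolute improvement: +2.8% Success and +3.7% SPL on HM3D, +2.6% Success and +3.8% SPL on HSSD). Newly introduced metrics for obstacle avoidance proficiency and perceptual efficiency further support the improvements the method brings to ZSON planning. Project page: https://voro-nav.github.io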

In the realm of household robotics, the Zero-Shot Object Navigation (ZSON) task empowers agents to adeptly traverse unfamiliar environments and locate objects from novel categories without prior explicit training. This paper introduces VoroNav, a novel semantic exploration framework that proposes the Reduced Voronoi Graph to extract exploratory paths and planning nodes from a semantic map constructed in real time. By harnessing topological and semantic information, VoroNav designs text-based descriptions of paths and images that are readily interpretable by a large language model (LLM). In particular, our approach presents a synergy of path and farsight descriptions to represent the environmental context, enabling the LLM to apply commonsense reasoning to ascertain waypoints for navigation. Extensive evaluation on HM3D and HSSD validates that VoroNav surpasses existing benchmarks in both success rate and exploration efficiency (absolute improvement: +2.8% Success and +3.7% SPL on HM3D, +2.6% Success and +3.8% SPL on HSSD). Additionally, newly introduced metrics that evaluate obstacle avoidance proficiency and perceptual efficiency further corroborate the enhancements achieved by our method in ZSON planning. Project page: https://voro-nav.github.io

Translated: 2024-02-07 19:13:26 Published: 2024-02-06

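# Trajectory-Oriented Policy Optimization with Sparse Rewards

Trajectory-Oriented Policy Optimization with Sparse Rewards ( http://arxiv.org/abs/2401.02225v2 )

License: see the linked page

Guojian Wang, Faguo Wu, Xiao Zhang

(Reference translation) Mastering deep reinforcement learning (DRL) is difficult in tasks with sparse rewards. Such limited rewards only indicate whether the task is partially or fully completed, so the agent must take many exploratory actions before it receives meaningful feedback. As a result, most existing DRL exploration algorithms struggle to obtain practical policies within a reasonable time. To address this, we propose a method that uses offline demonstration trajectories for faster and more efficient online RL in sparse-reward environments. Our key insight is to treat offline demonstration trajectories as guidance rather than as targets for mere imitation, so that the method learns a policy whose state-action visitation distribution marginally matches that of the demonstrations. Specifically, we introduce a new trajectory distance based on maximum mean discrepancy (MMD) and cast policy optimization as a distance-constrained optimization problem. We then show that this problem can be reduced to a policy-gradient algorithm that integrates rewards shaped by the offline demonstrations. The algorithm is evaluated on a wide range of discrete and continuous control tasks with sparse and misleading rewards. The experiments show that it clearly outperforms baseline methods in diverse exploration and in learning an optimal policy.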

Mastering deep reinforcement learning (DRL) proves challenging in tasks featuring scant rewards. These limited rewards merely signify whether the task is partially or entirely accomplished, necessitating various exploration actions before the agent garners meaningful feedback. Consequently, the majority of existing DRL exploration algorithms struggle to acquire practical policies within a reasonable timeframe. To address this challenge, we introduce an approach leveraging offline demonstration trajectories for swifter and more efficient online RL in environments with sparse rewards. Our pivotal insight involves treating offline demonstration trajectories as guidance, rather than mere imitation, allowing our method to learn a policy whose distribution of state-action visitation marginally matches that of offline demonstrations. We specifically introduce a novel trajectory distance relying on maximum mean discrepancy (MMD) and cast policy optimization as a distance-constrained optimization problem. We then illustrate that this optimization problem can be streamlined into a policy-gradient algorithm, integrating rewards shaped by insights from offline demonstrations. The proposed algorithm undergoes evaluation across extensive discrete and continuous control tasks with sparse and misleading rewards. The experimental findings demonstrate the significant superiority of our proposed algorithm over baseline methods concerning diverse exploration and the acquisition of an optimal policy.

Translated: 2024-02-07 19:12:58 Published: 2024-02-06
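The abstract above builds its trajectory distance on maximum mean discrepancy (MMD) but does not spell out the construction. As a rough illustration only, the following sketch computes the standard biased estimate of squared MMD between two sets of state-action vectors with a Gaussian (RBF) kernel. The function names, the kernel choice, and the bandwidth are assumptions made here, not details taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    """
    def mean_kernel(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


if __name__ == "__main__":
    # Hypothetical state-action pairs flattened into 2-D vectors.
    policy_visits = [(0.0, 0.0), (1.0, 0.0)]
    demo_visits = [(0.1, 0.0), (1.1, 0.0)]
    print(mmd_squared(policy_visits, demo_visits))
```

A small value means the policy's visited state-action pairs resemble the demonstrations under the chosen kernel. A constraint of the form "MMD below a threshold" is one way to read the distance-constrained formulation, though the paper's exact constraint is not given in the abstract.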

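# PAC-Bayes-Chernoff bounds for unbounded losses

PAC-Bayes-Chernoff bounds for unbounded losses ( http://arxiv.org/abs/2401.01148v3 )

License: see the linked page

Ioar Casado, Luis A. Ortega, Andrés R. Masegosa and Aritz Pérez

(Reference translation) We introduce a new PAC-Bayes oracle bound for unbounded losses. The result can be understood as a PAC-Bayesian version of the Cramér-Chernoff bound. The proof technique relies on controlling the tails of certain random variables involving the Cramér transform of the loss. We highlight several applications of the main theorem. First, we show that the result naturally allows exact optimization of the free parameter in many PAC-Bayes bounds. Second, we recover and generalize previous results. Finally, we show that the approach can work with richer assumptions that lead to more informative and potentially tighter bounds. In this direction, we give a general bound under a new "model-dependent bounded CGF" assumption, from which we obtain bounds based on parameter norms and log-Sobolev inequalities. All these bounds can be minimized to obtain novel posteriors.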

We introduce a new PAC-Bayes oracle bound for unbounded losses. This result can be understood as a PAC-Bayesian version of the Cramér-Chernoff bound. The proof technique relies on controlling the tails of certain random variables involving the Cramér transform of the loss. We highlight several applications of the main theorem. First, we show that our result naturally allows exact optimization of the free parameter on many PAC-Bayes bounds. Second, we recover and generalize previous results. Finally, we show that our approach allows working with richer assumptions that result in more informative and potentially tighter bounds. In this direction, we provide a general bound under a new "model-dependent bounded CGF" assumption from which we obtain bounds based on parameter norms and log-Sobolev inequalities. All these bounds can be minimized to obtain novel posteriors.

Translated: 2024-02-07 19:12:39 Published: 2024-02-06
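For readers unfamiliar with the classical result this abstract builds on, here is a minimal numerical sketch of the ordinary Cramér-Chernoff bound, not the paper's PAC-Bayes version. For a centered random variable with cumulant generating function (CGF) Λ(λ), the Cramér transform is Λ*(a) = sup over λ ≥ 0 of (λa − Λ(λ)), and the deviation of the mean of n i.i.d. copies above a is bounded by exp(−nΛ*(a)). The function names, the search range `lam_max`, and the Gaussian example are assumptions chosen for illustration.

```python
import math
from typing import Callable


def cramer_transform(cgf: Callable[[float], float], a: float,
                     lam_max: float = 50.0, iters: int = 200) -> float:
    """Approximate sup_{0 <= lam <= lam_max} (lam * a - cgf(lam)) by ternary search.

    Ternary search is valid because a CGF is convex, so the objective is concave.
    """
    def objective(lam: float) -> float:
        return lam * a - cgf(lam)

    lo, hi = 0.0, lam_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    # The value at lam = 0 is 0, so the supremum is never negative.
    return max(0.0, objective((lo + hi) / 2.0))


def chernoff_tail_bound(cgf: Callable[[float], float], a: float, n: int = 1) -> float:
    """Upper bound exp(-n * Lambda*(a)) on P(mean of n centered samples >= a)."""
    return math.exp(-n * cramer_transform(cgf, a))


def gaussian_cgf(sigma: float) -> Callable[[float], float]:
    """CGF of a centered Gaussian: sigma^2 * lam^2 / 2."""
    return lambda lam: 0.5 * sigma ** 2 * lam ** 2


if __name__ == "__main__":
    # For a unit-variance Gaussian the transform is a^2 / 2 in closed form.
    print(cramer_transform(gaussian_cgf(1.0), 0.5))
    print(chernoff_tail_bound(gaussian_cgf(1.0), 0.5, n=8))
```

The Gaussian case is used only because its transform has the closed form a²/(2σ²), which makes the numerical search easy to check.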

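# Diffusion Models, Image Super-Resolution And Everything: A Survey

Diffusion Models, Image Super-Resolution And Everything: A Survey ( http://arxiv.org/abs/2401.00736v2 )

License: see the linked page

Brian B. Moser, Arundhati S. Shanbhag, Federico Raue, Stanislav Frolov, Sebastian Palacio and Andreas Dengel

(Reference translation) Diffusion models (DMs) have disrupted the field of image super-resolution (SR) and further closed the gap between image quality and human perceptual preferences. They are easy to train and can produce very high-quality samples whose realism exceeds that of earlier generative methods. Despite their promising results, they bring new challenges that need further research, such as high computational demands, comparability, lack of explainability, and color shifts. Unfortunately, entering this field is overwhelming because of the large number of publications. To address this, we provide a unified account of the theoretical foundations of DMs applied to image SR, together with a detailed analysis of the characteristics and methodologies specific to this domain, in contrast to broader existing reviews. The survey presents a cohesive understanding of DM principles and explores current research directions, including alternative input domains, conditioning techniques, guidance mechanisms, corruption spaces, and zero-shot learning approaches. By examining the evolution and current trends of image SR through the lens of DMs, it sheds light on existing challenges and charts possible future directions, aiming to inspire further innovation in this rapidly advancing area.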

Diffusion Models (DMs) have disrupted the image Super-Resolution (SR) field and further closed the gap between image quality and human perceptual preferences. They are easy to train and can produce very high-quality samples that exceed the realism of those produced by previous generative methods. Despite their promising results, they also come with new challenges that need further research: high computational demands, comparability, lack of explainability, color shifts, and more. Unfortunately, entry into this field is overwhelming because of the abundance of publications. To address this, we provide a unified recount of the theoretical foundations underlying DMs applied to image SR and offer a detailed analysis that underscores the unique characteristics and methodologies within this domain, distinct from broader existing reviews in the field. This survey articulates a cohesive understanding of DM principles and explores current research avenues, including alternative input domains, conditioning techniques, guidance mechanisms, corruption spaces, and zero-shot learning approaches. By offering a detailed examination of the evolution and current trends in image SR through the lens of DMs, this survey sheds light on the existing challenges and charts potential future directions, aiming to inspire further innovation in this rapidly advancing area.

Translated: 2024-02-07 19:12:24 Published: 2024-02-06

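# MR-GSM8K: A Meta-Reasoning Revolution in Large Language Model Evaluation

MR-GSM8K: A Meta-Reasoning Revolution in Large Language Model Evaluation ( http://arxiv.org/abs/2312.17080v3 )

License: see the linked page

Zhongshen Zeng, Pengguang Chen, Shu Liu, Haiyun Jiang, Jiaya Jia

(Reference translation) This paper proposes a new evaluation paradigm for large language models that challenges them to engage in meta-reasoning. The approach addresses serious shortcomings of existing math problem-solving benchmarks, which are traditionally used to evaluate the cognitive capabilities of agents. The paradigm shifts the focus from result-oriented evaluation, which often overlooks the reasoning process, to a more holistic evaluation that effectively distinguishes the cognitive capabilities of different models. For example, on our benchmark GPT-4 performs five times better than GPT-3.5. The significance of the new paradigm lies in its ability to reveal potential cognitive deficiencies in LLMs that current benchmarks such as GSM8K cannot uncover, because those benchmarks are saturated and do not effectively differentiate reasoning abilities. Our comprehensive analysis covers state-of-the-art math models from both the open-source and closed-source communities and reveals fundamental flaws in their training and evaluation approaches. The paper not only advocates a paradigm shift in the evaluation of LLMs but also contributes to the discussion on the trajectory towards Artificial General Intelligence (AGI). By promoting the adoption of meta-reasoning evaluation methods, we aim to assess the true cognitive abilities of LLMs more accurately.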

In this work, we introduce a novel evaluation paradigm for Large Language Models, one that challenges them to engage in meta-reasoning. This approach addresses critical shortcomings in existing math problem-solving benchmarks, traditionally used to evaluate the cognitive capabilities of agents. Our paradigm shifts the focus from result-oriented assessments, which often overlook the reasoning process, to a more holistic evaluation that effectively differentiates the cognitive capabilities among models. For example, in our benchmark, GPT-4 demonstrates a performance five times better than GPT-3.5. The significance of this new paradigm lies in its ability to reveal potential cognitive deficiencies in LLMs that current benchmarks, such as GSM8K, fail to uncover due to their saturation and lack of effective differentiation among varying reasoning abilities. Our comprehensive analysis includes several state-of-the-art math models from both open-source and closed-source communities, uncovering fundamental deficiencies in their training and evaluation approaches. This paper not only advocates for a paradigm shift in the assessment of LLMs but also contributes to the ongoing discourse on the trajectory towards Artificial General Intelligence (AGI). By promoting the adoption of meta-reasoning evaluation methods similar to ours, we aim to facilitate a more accurate assessment of the true cognitive abilities of LLMs.

Translated: 2024-02-07 19:12:01 Published: 2024-02-06

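# SMUTF: Schema Matching Using Generative Tags and Hybrid Features

SMUTF: Schema Matching Using Generative Tags and Hybrid Features ( http://arxiv.org/abs/2402.01685v2 )

License: see the linked page

Yu Zhang, Mei Di, Haozheng Luo, Chenwei Xu, Richard Tzong-Han Tsai

(Reference translation) SMUTF is a unique approach to large-scale tabular data schema matching (SM). It assumes that supervised learning does not affect performance in open-domain tasks and thereby achieves effective cross-domain matching. The system combines rule-based feature engineering, pre-trained language models, and generative large language models. In an innovative adaptation inspired by the Humanitarian Exchange Language, it assigns "generative tags" to each data column, which improves the effectiveness of SM. SMUTF is highly versatile and works seamlessly with existing pre-trained embeddings, classification methods, and generative models. Recognizing the lack of extensive public datasets for SM, we created the HDXSM dataset from public humanitarian data and open-sourced it. We believe it is the most exhaustive SM dataset currently available. In evaluations on various public datasets and the new HDXSM dataset, SMUTF surpasses existing state-of-the-art models in accuracy and efficiency, improving the F1 score by 11.84% and the AUC of ROC by 5.08%.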

We introduce SMUTF, a unique approach for large-scale tabular data schema matching (SM), which assumes that supervised learning does not affect performance in open-domain tasks, thereby enabling effective cross-domain matching. This system uniquely combines rule-based feature engineering, pre-trained language models, and generative large language models. In an innovative adaptation inspired by the Humanitarian Exchange Language, we deploy 'generative tags' for each data column, enhancing the effectiveness of SM. SMUTF exhibits extensive versatility, working seamlessly with any pre-existing pre-trained embeddings, classification methods, and generative models. Recognizing the lack of extensive, publicly available datasets for SM, we have created and open-sourced the HDXSM dataset from public humanitarian data. We believe this to be the most exhaustive SM dataset currently available. In evaluations across various public datasets and the novel HDXSM dataset, SMUTF demonstrated exceptional performance, surpassing existing state-of-the-art models in terms of accuracy and efficiency, and improving the F1 score by 11.84% and the AUC of ROC by 5.08%.

Translated: 2024-02-07 19:03:58 Published: 2024-02-06

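# LLsM: Generative Linguistic Steganography with Large Language Model

LLsM: Generative Linguistic Steganography with Large Language Model ( http://arxiv.org/abs/2401.15656v2 )

License: see the linked page

Yihao Wang and Ruiqi Song and Ru Zhang and Jianyi Liu and Lingxiao Li

(Reference translation) Linguistic steganography (LS) tasks aim to generate steganographic text (stego) based on secret information. Only authorized recipients can perceive that the text contains secrets and extract them, which protects privacy. However, the stego generated by existing schemes is hard to control, and it is difficult to make it carry specific discourse characteristics such as style. As a result, the stego is easy to detect, which compromises covert communication. This paper proposes LLsM, the first LS scheme based on a Large Language Model (LLM). We fine-tuned LLaMA2 on a large-scale constructed dataset that covers rich discourse characteristics, so that the fine-tuned LLM can generate text with a specific discourse in a controllable way. The discourse is then used as guiding information and fed to the fine-tuned LLM as a prompt together with the secret. The candidate pool built on this basis is range encoded, and the secret determines the interval. The common prefix of the beginning and end of this interval is the secret embedded at that moment. Experiments show that LLsM outperforms LS-task and related-task baselines in text quality, statistical analysis, discourse matching, and anti-steganalysis. In particular, LLsM's MAUVE metric surpasses some baselines by 70%-80%, and its anti-steganalysis performance is 30%-40% higher. We also present examples of longer stegos generated by LLsM, which indicate its potential advantage in long LS tasks.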

Linguistic Steganography (LS) tasks aim to generate steganographic text (stego) based on secret information. Only authorized recipients can perceive the existence of secrets in the texts and extract them, thereby preserving privacy. However, the controllability of the stego generated by existing schemes is poor, and it is difficult for the stego to carry specific discourse characteristics such as style. As a result, the stego is easily detectable, compromising covert communication. To address these problems, this paper proposes LLsM, the first LS scheme built on a Large Language Model (LLM). We fine-tuned LLaMA2 with a large-scale constructed dataset encompassing rich discourse characteristics, which enables the fine-tuned LLM to generate texts with specific discourse in a controllable manner. Then the discourse is used as guiding information and input into the fine-tuned LLM in the form of a prompt together with the secret. On this basis, the constructed candidate pool is range encoded, and the secret is used to determine the interval. The common prefix of this interval's beginning and ending is the secret embedded at this moment. Experiments show that LLsM outperforms prevalent LS-task and related-task baselines regarding text quality, statistical analysis, discourse matching, and anti-steganalysis. In particular, LLsM's MAUVE metric surpasses some baselines by 70%-80%, and its anti-steganalysis performance is 30%-40% higher. Notably, we also present examples of longer stegos generated by LLsM, showing its potential superiority in long LS tasks.

Translated: 2024-02-07 19:03:04 Published: 2024-02-06
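The embedding step described above (range-encode the candidate pool, let the secret pick an interval, and treat the common prefix of the interval's two ends as the embedded bits) can be illustrated with a toy single-step sketch. This is not the LLsM implementation: the function names, the fixed 8-bit precision, the zero-padding of short secrets, and the three-token pool are all assumptions made for illustration, and a real scheme would take the pool from the fine-tuned model's next-token distribution at every step.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (token, probability)


def build_intervals(candidates: List[Candidate], precision: int) -> List[Tuple[str, int, int]]:
    """Partition [0, 2**precision) into half-open integer intervals, one per token."""
    total = 1 << precision
    norm = sum(prob for _, prob in candidates)
    intervals = []
    start, cumulative = 0, 0.0
    for index, (token, prob) in enumerate(candidates):
        cumulative += prob / norm
        # The last interval always closes the range, so rounding leaves no gap.
        end = total if index == len(candidates) - 1 else int(round(cumulative * total))
        intervals.append((token, start, end))
        start = end
    return intervals


def common_prefix(low: int, high: int, precision: int) -> str:
    """Leading bits shared by the binary forms of low and high (both inclusive)."""
    low_bits = format(low, f"0{precision}b")
    high_bits = format(high, f"0{precision}b")
    length = 0
    while length < precision and low_bits[length] == high_bits[length]:
        length += 1
    return low_bits[:length]


def embed_step(candidates: List[Candidate], secret_bits: str, precision: int = 8) -> Tuple[str, str]:
    """Pick the token whose interval contains the secret; return (token, embedded bits)."""
    window = secret_bits[:precision].ljust(precision, "0")  # pad short secrets with zeros
    target = int(window, 2)
    for token, start, end in build_intervals(candidates, precision):
        if start <= target < end:
            return token, common_prefix(start, end - 1, precision)
    raise ValueError("no interval contains the target value")


def extract_step(candidates: List[Candidate], token: str, precision: int = 8) -> str:
    """Receiver side: rebuild the same intervals and read the bits carried by the token."""
    for candidate, start, end in build_intervals(candidates, precision):
        if candidate == token and end > start:
            return common_prefix(start, end - 1, precision)
    raise ValueError("token not found in the candidate pool")


if __name__ == "__main__":
    pool = [("the", 0.5), ("a", 0.25), ("one", 0.25)]
    token, embedded = embed_step(pool, "10110000")
    print(token, embedded, extract_step(pool, token))
```

In this toy pool a token with probability 0.25 carries two bits and a token with probability 0.5 carries one, which matches the intuition that less likely tokens encode more of the secret. The sender would drop the embedded bits from the front of the secret and repeat the step with the next candidate pool.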

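# Foregrounding Artist Opinions: A Survey Study on Transparency, Ownership, and Fairness in AI Generative Art

Foregrounding Artist Opinions: A Survey Study on Transparency, Ownership, and Fairness in AI Generative Art ( http://arxiv.org/abs/2401.15497v3 )

License: see the linked page

Juniper Lovato, Julia Zimmerman, Isabelle Smith, Peter Dodds, Jennifer Karson

(Reference translation) Generative artificial intelligence (AI) tools are used to create art-like outputs and to support the creative process. These tools can benefit artists, but they can also harm the art workforce and infringe artistic and intellectual property rights. Without explicit consent from artists, Generative AI creators scrape artists' digital work to train Generative AI models and produce art-like outputs at scale. These outputs now compete with human artists in the marketplace and are also used by some artists in their own generative processes. We surveyed 459 artists to investigate the tension between artists' opinions on the potential utility and harm of Generative AI art. The study surveys artists' opinions on the utility and threat of Generative AI art models, fair practices for disclosing artistic works used in AI art training models, ownership and rights of AI art derivatives, and fair compensation. We find that artists, by and large, think model creators should be required to disclose in detail what art and images they use to train their AI models. We also find that artists' opinions vary by professional status and practice, demographics, whether they have purchased art, and familiarity with and use of Generative AI. We hope the results will lead to more meaningful collaboration and alignment between the art community and Generative AI researchers and developers.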

Generative Artificial Intelligence (AI) tools are used to create art-like outputs and aid in the creative process. While these tools have potential benefits for artists, they also have the potential to harm the art workforce and infringe upon artistic and intellectual property rights. Without explicit consent from artists, Generative AI creators scrape artists' digital work to train Generative AI models and produce art-like model outputs at scale. These outputs are now being used to compete with human artists in the marketplace as well as being used by some artists in their generative processes to create art. We surveyed 459 artists to investigate the tension between artists' opinions on Generative AI art's potential utility and harm. This study surveys artists' opinions on the utility and threat of Generative AI art models, fair practices in the disclosure of artistic works in AI art training models, ownership and rights of AI art derivatives, and fair compensation. We find that artists, by and large, think that model creators should be required to disclose in detail what art and images they use to train their AI models. We also find that artists' opinions vary by professional status and practice, demographics, whether they have purchased art, and familiarity with and use of Generative AI. We hope the results of this work will further more meaningful collaboration and alignment between the art community and Generative AI researchers and developers.

Translated: 2024-02-07 19:02:39 Published: 2024-02-06

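# Location Agnostic Source-Free Domain Adaptive Learning to Predict Solar Power Generation

Location Agnostic Source-Free Domain Adaptive Learning to Predict Solar Power Generation ( http://arxiv.org/abs/2401.14422v2 )

License: see the linked page

Md Shazid Islam, A S M Jahid Hasan, Md Saydur Rahman, Jubair Yusuf, Md Saiful Islam Sajol, Farhana Akter Tumpa

(Reference translation) Predicting solar power generation is difficult because it depends on climatic characteristics that vary in space and time. The performance of a prediction model can differ from place to place because of changes in data distribution, so a model that works well in one region may not work in others. In addition, global warming is noticeably accelerating year-to-year changes in weather patterns. This can reduce the effectiveness of existing models over time, even within the same geographical region. This paper proposes a domain adaptive deep learning framework that estimates solar power generation from weather features and addresses these challenges. A feed-forward deep convolutional network model is trained in a supervised manner on a dataset from a known location and later used to predict the solar power of an unknown location. This adaptive data-driven approach shows notable advantages in computing speed, storage efficiency, and the ability to improve results in scenarios where state-of-the-art non-adaptive methods fail. Compared to the best-performing non-adaptive method, our method improves solar power prediction accuracy by 10.47%, 7.44%, and 5.11% for California (CA), Florida (FL), and New York (NY), respectively.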

The prediction of solar power generation is a challenging task due to its dependence on climatic characteristics that exhibit spatial and temporal variability. The performance of a prediction model may vary across different places due to changes in data distribution, resulting in a model that works well in one region but not in others. Furthermore, as a consequence of global warming, there is a notable acceleration in the alteration of weather patterns on an annual basis. This phenomenon introduces the potential for diminished efficacy of existing models, even within the same geographical region, as time progresses. In this paper, a domain adaptive deep learning-based framework is proposed to estimate solar power generation using weather features that can solve the aforementioned challenges. A feed-forward deep convolutional network model is trained for a known location dataset in a supervised manner and utilized to predict the solar power of an unknown location later. This adaptive data-driven approach exhibits notable advantages in terms of computing speed, storage efficiency, and its ability to improve outcomes in scenarios where state-of-the-art non-adaptive methods fail. Our method has shown an improvement of 10.47%, 7.44%, and 5.11% in solar power prediction accuracy compared to the best-performing non-adaptive method for California (CA), Florida (FL), and New York (NY), respectively.

Translated: 2024-02-07 19:01:58 Published: 2024-02-06

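# MTRGL: Effective Temporal Correlation Discerning through Multi-modal Temporal Relational Graph Learning

MTRGL: Effective Temporal Correlation Discerning through Multi-modal Temporal Relational Graph Learning ( http://arxiv.org/abs/2401.14199v2 )

License: see the linked page

Junwei Su, Shan Wu, Jinhui Li

(Reference translation) This study explores the synergy between deep learning and financial market applications, focusing on pair trading. This market-neutral strategy is integral to quantitative finance and is well suited to advanced deep-learning techniques. A key challenge in pair trading is discerning temporal correlations among entities, which requires integrating diverse data modalities. To this end, we introduce a new framework, Multi-modal Temporal Relation Graph Learning (MTRGL). MTRGL combines time series data and discrete features into a temporal graph and uses a memory-based temporal graph neural network. The approach reframes temporal correlation identification as a temporal graph link prediction task, which has shown empirical success. Our experiments on real-world datasets confirm the superior performance of MTRGL and highlight its promise for refining automated pair trading strategies.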

In this study, we explore the synergy of deep learning and financial market applications, focusing on pair trading. This market-neutral strategy is integral to quantitative finance and is apt for advanced deep-learning techniques. A pivotal challenge in pair trading is discerning temporal correlations among entities, necessitating the integration of diverse data modalities. Addressing this, we introduce a novel framework, Multi-modal Temporal Relation Graph Learning (MTRGL). MTRGL combines time series data and discrete features into a temporal graph and employs a memory-based temporal graph neural network. This approach reframes temporal correlation identification as a temporal graph link prediction task, which has shown empirical success. Our experiments on real-world datasets confirm the superior performance of MTRGL, emphasizing its promise in refining automated pair trading strategies.

Translated: 2024-02-07 19:01:35 Published: 2024-02-06
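To make "discerning temporal correlations among entities" concrete, the sketch below shows the conventional baseline for pair selection: a rolling Pearson correlation of log returns between two price series. It is not MTRGL, which the abstract describes as a temporal graph link prediction model. The function names, the window length, and the price series are assumptions introduced here.

```python
import math
from typing import List, Sequence


def log_returns(prices: Sequence[float]) -> List[float]:
    """Log return between each pair of consecutive prices."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def rolling_correlation(prices_a: Sequence[float], prices_b: Sequence[float],
                        window: int) -> List[float]:
    """Correlation of log returns over each sliding window of the given length."""
    returns_a, returns_b = log_returns(prices_a), log_returns(prices_b)
    return [
        pearson(returns_a[i:i + window], returns_b[i:i + window])
        for i in range(len(returns_a) - window + 1)
    ]


if __name__ == "__main__":
    # Made-up prices for two hypothetical assets.
    asset_a = [100.0, 102.0, 101.0, 105.0, 107.0, 104.0]
    asset_b = [50.0, 51.5, 50.2, 52.9, 53.1, 52.0]
    print(rolling_correlation(asset_a, asset_b, window=3))
```

A baseline like this sees only price history, whereas the framework in the abstract also combines discrete features and graph structure, which is the integration of data modalities the authors emphasize.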

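# Performance of a Sagnac interferometer to observe vacuum optical nonlinearity

Performance of a Sagnac interferometer to observe vacuum optical nonlinearity ( http://arxiv.org/abs/2401.13720v2 )

License: see the linked page

Aurélie Max Mailliet, Adrien E. Kraych, François Couchot, Xavier Sarazin, Elsa Baynard, Julien Demailly, Moana Pittman, Arache Djannati-Ataï, Sophie Kazamias, Scott Robertson, Marcel Urban

(Reference translation) In quantum electrodynamics, vacuum becomes a nonlinear optical medium: its optical index should be modified in the presence of intense external electromagnetic fields. The DeLLight project (Deflection of Light by Light) aims to observe this effect using intense focused femtosecond laser pulses delivered by LASERIX. A Sagnac interferometer is used to measure the deflection of a low-intensity focused pulse (probe) that crosses the vacuum index gradient induced by a high-intensity pulse (pump). A Sagnac interferometer working with femtosecond laser pulses has been developed for the DeLLight project. Compared to previous prototypes, the interferometer now includes the focusing of the probe beam in the interaction area. This article measures and characterizes the critical experimental parameters that limit the sensitivity of the interferometer, namely the extinction factor, the spatial resolution, and the waist at focus of the probe pulse. Future improvements are discussed.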

In Quantum Electrodynamics, vacuum becomes a nonlinear optical medium: its optical index should be modified in the presence of intense external electromagnetic fields. The DeLLight project (Deflection of Light by Light) aims to observe this effect using intense focused femtosecond laser pulses delivered by LASERIX. The principle is to measure with a Sagnac interferometer the deflection of a low-intensity focused pulse (probe) crossing the vacuum index gradient induced by a high-intensity pulse (pump). A Sagnac interferometer working with femtosecond laser pulses has been developed for the DeLLight project. Compared to previous prototypes, the interferometer now includes the focusing of the probe beam in the interaction area. In this article, we measure and characterize the critical experimental parameters limiting the sensitivity of the interferometer, namely the extinction factor, the spatial resolution, and the waist at focus of the probe pulse. We discuss future improvements.

Translated: 2024-02-07 19:01:20 Published: 2024-02-06

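# LPNL: Scalable Link Prediction with Large Language Models

LPNL: Scalable Link Prediction with Large Language Models ( http://arxiv.org/abs/2401.13227v2 )

License: see the linked page

Baolong Bi, Shenghua Liu, Yiwei Wang, Lingrui Mei and Xueqi Cheng

(Reference translation) Exploring the application of large language models (LLMs) to graph learning is an emerging endeavor. However, the vast amount of information in large graphs poses significant challenges. This work focuses on the link prediction task and introduces $\textbf{LPNL}$ (Link Prediction via Natural Language), an LLM-based framework designed for scalable link prediction on large-scale heterogeneous graphs. We design new prompts for link prediction that express graph details in natural language. We propose a two-stage sampling pipeline that extracts crucial information from the graph, and a divide-and-conquer strategy that keeps the input tokens within predefined limits. We fine-tune a T5 model with self-supervised learning designed for link prediction. Experiments show that LPNL outperforms multiple advanced baselines in link prediction tasks on large-scale graphs.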

Exploring the application of large language models (LLMs) to graph learning is an emerging endeavor. However, the vast amount of information inherent in large graphs poses significant challenges to this process. This work focuses on the link prediction task and introduces $\textbf{LPNL}$ (Link Prediction via Natural Language), a framework based on large language models designed for scalable link prediction on large-scale heterogeneous graphs. We design novel prompts for link prediction that articulate graph details in natural language. We propose a two-stage sampling pipeline to extract crucial information from the graphs, and a divide-and-conquer strategy to control the input tokens within predefined limits, addressing the challenge of overwhelming information. We fine-tune a T5 model based on our self-supervised learning designed for link prediction. Extensive experimental results demonstrate that LPNL outperforms multiple advanced baselines in link prediction tasks on large-scale graphs.

Translated: 2024-02-07 19:01:07 Published: 2024-02-06
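The abstract mentions a divide-and-conquer strategy that keeps the input within a token limit but gives no details. One plausible reading, sketched below under stated assumptions, is a tournament: split the candidate descriptions into groups that fit the budget, pick a winner per group, and repeat on the winners. Everything here is hypothetical, including the whitespace token counter, the `pick_best` callback (a stand-in for a model call), and the function names.

```python
from typing import Callable, List, Sequence


def count_tokens(text: str) -> int:
    """Crude whitespace token count; a real system would use the model's tokenizer."""
    return len(text.split())


def chunk_by_budget(items: Sequence[str], budget: int) -> List[List[str]]:
    """Greedily group items so that each group's total token count fits the budget."""
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for item in items:
        cost = count_tokens(item)
        if cost > budget:
            raise ValueError("a single candidate exceeds the token budget")
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(item)
        used += cost
    if current:
        chunks.append(current)
    return chunks


def divide_and_conquer(candidates: Sequence[str], budget: int,
                       pick_best: Callable[[List[str]], str]) -> str:
    """Run pick_best on budget-sized groups, then recurse on the group winners."""
    pool = list(candidates)
    if not pool:
        raise ValueError("no candidates given")
    while True:
        chunks = chunk_by_budget(pool, budget)
        if len(chunks) == 1:
            return pick_best(chunks[0])
        winners = [pick_best(chunk) for chunk in chunks]
        if len(winners) == len(pool):
            # Every group held one item, so another round would not shrink the pool.
            raise ValueError("budget too small to compare any two candidates")
        pool = winners


if __name__ == "__main__":
    # Hypothetical candidate-node descriptions; the longest one "wins" in this toy scorer.
    nodes = ["n1 paper graph", "n2 author", "n3 venue long description", "n4 x"]
    print(divide_and_conquer(nodes, budget=8, pick_best=lambda group: max(group, key=len)))
```

With a consistent scorer the tournament returns the same winner as a single call over all candidates, while no individual call sees more than `budget` tokens.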

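# Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs ( http://arxiv.org/abs/2401.11708v2 )

License: see the linked page

Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui

(Reference translation) Diffusion models have shown exceptional performance in text-to-image generation and editing. However, existing methods often struggle with complex text prompts that involve multiple objects with multiple attributes and relationships. This paper proposes a new training-free text-to-image generation and editing framework, Recaption, Plan and Generate (RPG), which uses the strong chain-of-thought reasoning ability of multimodal LLMs to improve the compositionality of text-to-image diffusion models. The method uses the MLLM as a global planner that decomposes the generation of a complex image into multiple simpler generation tasks within subregions. We propose complementary regional diffusion to enable region-wise compositional generation. We further integrate text-guided image generation and editing within RPG in a closed-loop fashion, which improves generalization. Extensive experiments show that RPG outperforms state-of-the-art text-to-image diffusion models such as DALL-E 3 and SDXL, particularly in multi-category object composition and text-image semantic alignment. The RPG framework is also widely compatible with various MLLM architectures (e.g., MiniGPT-4) and diffusion backbones (e.g., ControlNet). Our code is available at https://github.com/YangLing0818/RPG-DiffusionMaster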

Diffusion models have exhibited exceptional performance in text-to-image generation and editing. However, existing methods often face challenges when handling complex text prompts that involve multiple objects with multiple attributes and relationships. In this paper, we propose a brand new training-free text-to-image generation/editing framework, namely Recaption, Plan and Generate (RPG), harnessing the powerful chain-of-thought reasoning ability of multimodal LLMs to enhance the compositionality of text-to-image diffusion models. Our approach employs the MLLM as a global planner to decompose the process of generating complex images into multiple simpler generation tasks within subregions. We propose complementary regional diffusion to enable region-wise compositional generation. Furthermore, we integrate text-guided image generation and editing within the proposed RPG in a closed-loop fashion, thereby enhancing generalization ability. Extensive experiments demonstrate that our RPG outperforms state-of-the-art text-to-image diffusion models, including DALL-E 3 and SDXL, particularly in multi-category object composition and text-image semantic alignment. Notably, our RPG framework exhibits wide compatibility with various MLLM architectures (e.g., MiniGPT-4) and diffusion backbones (e.g., ControlNet). Our code is available at: https://github.com/YangLing0818/RPG-DiffusionMaster

Translated: 2024-02-07 19:00:50 Published: 2024-02-06

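# Counter-factual carving exponentially improves entangled-state fidelity

Counter-factual carving exponentially improves entangled-state fidelity ( http://arxiv.org/abs/2401.11407v2 )

License: see the linked page

Joshua Ramette, Josiah Sinclair, Vladan Vuletić

(Reference translation) We propose "counter-factual" carving, a new method that uses the "no-jump" evolution of a probe to generate high-fidelity entangled many-body states. The probe is coupled to a target ensemble of qubits and engineered to decay exponentially at a rate that depends on the target's collective spin, so that post-selecting on observing no probe decay precisely removes selected faster-decaying spin components. When the probe and the $N$-qubit target interact via a cavity mode of cooperativity $C$, counter-factual carving generates entangled states with infidelities of $e^{-C/N}$, an exponential improvement over previous carving schemes. Counter-factual carving can generate complex entangled states for applications in quantum metrology and quantum computing.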

We propose a new method, "counter-factual" carving, that uses the "no-jump" evolution of a probe to generate entangled many-body states of high fidelity. The probe is coupled to a target ensemble of qubits and engineered to exponentially decay at a rate depending on the target collective spin, such that post-selecting on observing no probe decay precisely removes select faster-decaying spin components. When probe and $N$-qubit target interact via a cavity mode of cooperativity $C$, counter-factual carving generates entangled states with infidelities of $e^{-C/N}$, an exponential improvement over previous carving schemes. Counter-factual carving can generate complex entangled states for applications in quantum metrology and quantum computing.

Translated: 2024-02-07 19:00:24 Published: 2024-02-06

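# The Best Ends by the Best Means: Ethical Concerns in App Reviews

The Best Ends by the Best Means: Ethical Concerns in App Reviews ( http://arxiv.org/abs/2401.11063v2 )

License: see the linked page

Lauren Olson, Neelam Tjikhoeri, Emitzá Guzmán

(Reference translation) This work analyzes ethical concerns found in users' app store reviews. We performed the study because ethical concerns in mobile applications (apps) are widespread, pose serious threats to end users and society, and lack systematic analysis and methods for detection and classification. In addition, app store reviews let practitioners collect user perspectives, which are crucial for identifying software flaws, from a large and geographically distributed audience. For the analysis, we collected five million user reviews, developed a set of ethical concerns that represent user preferences, and manually labeled a sample of the reviews. We found that (1) users frequently report ethical concerns about censorship, identity theft, and safety; (2) user reviews with ethical concerns are longer, more popular, and rated lower; and (3) there is high potential for automating the classification and filtering of these reviews. The results highlight the relevance of app store reviews for systematically considering ethical concerns during software evolution.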

This work analyzes ethical concerns found in users' app store reviews. We performed this study because ethical concerns in mobile applications (apps) are widespread, pose severe threats to end users and society, and lack systematic analysis and methods for detection and classification. In addition, app store reviews allow practitioners to collect users' perspectives, crucial for identifying software flaws, from a geographically distributed and large-scale audience. For our analysis, we collected five million user reviews, developed a set of ethical concerns representative of user preferences, and manually labeled a sample of these reviews. We found that (1) users frequently report ethical concerns about censorship, identity theft, and safety; (2) user reviews with ethical concerns are longer, more popular, and rated lower; and (3) there is high automation potential for the classification and filtering of these reviews. Our results highlight the relevance of using app store reviews for the systematic consideration of ethical concerns during software evolution.

Translated: 2024-02-07 19:00:06 Published: 2024-02-06

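# Critical Data Size of Language Models from a Grokking Perspective

Critical Data Size of Language Models from a Grokking Perspective ( http://arxiv.org/abs/2401.10463v2 )

License: see the linked page

Xuekai Zhu, Yao Fu, Bowen Zhou, Zhouhan Lin

(Reference translation) We explore the critical data size in language models, a threshold that marks a fundamental shift from quick memorization to slow generalization. We formalize the phase transition under the grokking configuration as the Data Efficiency Hypothesis and identify data insufficiency, sufficiency, and surplus regimes in the training dynamics of language models. We develop a grokking configuration that stably reproduces grokking on simplified language models by rescaling initialization and weight decay. We show that generalization occurs only when language models reach a critical size. We analyze grokking both sample-wise and model-wise and verify the proposed Data Efficiency Hypothesis. Our experiments reveal smoother phase transitions at the critical dataset size for language datasets. As the model size increases, this critical point also grows, which indicates that larger models need more data. The results deepen the understanding of language model training and offer a new perspective on the role of data in the learning mechanism of language models.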

We explore the critical data size in language models, a threshold that marks a fundamental shift from quick memorization to slow generalization. We formalize the phase transition under the grokking configuration into the Data Efficiency Hypothesis and identify data insufficiency, sufficiency, and surplus regimes in the training dynamics of language models. We develop a grokking configuration to reproduce grokking on simplistic language models stably by rescaling initialization and weight decay. We show that generalization occurs only when language models reach a critical size. We analyze grokking both sample-wise and model-wise, verifying the proposed Data Efficiency Hypothesis. Our experiments reveal smoother phase transitions occurring at the critical dataset size for language datasets. As the model size increases, this critical point also becomes larger, indicating that larger models require more data. Our results deepen the understanding of language model training, offering a novel perspective on the role of data in the learning mechanism of language models.

Translated: 2024-02-07 18:59:45 Published: 2024-02-06

# 原理に基づくグラフトランスフォーマーを目指して

Towards Principled Graph Transformers ( http://arxiv.org/abs/2401.10119v2 )

ライセンス: Link先を確認

Luis Müller and Daniel Kusuma and Christopher Morris

(参考訳) k次元Weisfeiler-Leman(k-WL)階層に基づくグラフ学習アーキテクチャは、理論的によく理解された表現力を提供する。しかし、そのようなアーキテクチャは現実のタスクにしっかりとした予測性能を持たず、実際の影響を限定することが多い。対照的に、グラフトランスフォーマーのようなグローバルな注意に基づくモデルは、実際には強力なパフォーマンスを示しているが、表現力とk-wl階層との比較は、特にこれらのアーキテクチャが表現力と予測性能のために位置エンコーディングや構造エンコーディングに依存しているため、依然として困難である。そこで本研究では,ノードではなくノードペアで動作するグローバルアテンションモデルであるEdge Transformerが,少なくとも3WLの表現力を持つことを示す。実験的に、Edge Transformerは、位置や構造的エンコーディングを頼らずに、予測性能に関する他の理論的に整合したアーキテクチャを上回ることを実証する。

Graph learning architectures based on the k-dimensional Weisfeiler-Leman (k-WL) hierarchy offer a theoretically well-understood expressive power. However, such architectures often fail to deliver solid predictive performance on real-world tasks, limiting their practical impact. In contrast, global attention-based models such as graph transformers demonstrate strong performance in practice, but comparing their expressive power with the k-WL hierarchy remains challenging, particularly since these architectures rely on positional or structural encodings for their expressivity and predictive performance. To address this, we show that the recently proposed Edge Transformer, a global attention model operating on node pairs instead of nodes, has at least 3-WL expressive power. Empirically, we demonstrate that the Edge Transformer surpasses other theoretically aligned architectures regarding predictive performance while not relying on positional or structural encodings.

翻訳日:2024-02-07 18:59:29 公開日:2024-02-06

# ニューラルODEの補間における深さと幅の相互作用

Interplay between depth and width for interpolation in neural ODEs ( http://arxiv.org/abs/2401.09902v3 )

ライセンス: Link先を確認

Antonio Álvarez-López, Arselane Hadj Slimane, Enrique Zuazua

(参考訳) ニューラル常微分方程式 (neural ODEs) は制御の観点から教師あり学習の自然な道具として登場したが、その最適なアーキテクチャの完全な理解はいまだ得られていない。本研究では、幅$p$と層遷移数$L$(事実上の深さは$L+1$)の相互作用について検討する。具体的には、$N$個の点対からなる有限データセット$D$、あるいは$\mathbb{R}^d$上の2つの確率測度を、ワッサーシュタイン誤差マージン$\varepsilon>0$の範囲内で補間する能力の観点から、モデルの表現力を評価する。その結果、$p$と$L$の間にはトレードオフがあり、データセット補間では$L$が$O(1+N/p)$、測度補間では$L=O\left(1+(p\varepsilon^d)^{-1}\right)$でスケールすることが分かった。$L=0$となる自律的な場合には別途の検討が必要であり、データセット補間に焦点を当ててこれを行う。$\varepsilon$-近似可制御性という緩和された問題を扱い、$\varepsilon\sim O(\log(p)p^{-1/d})$の誤差減衰を確立する。この減衰率は、$D$を補間するように構成したリプシッツベクトル場に普遍近似定理を適用した結果である。高次元の設定では、さらに$p=O(N)$個のニューロンで厳密な制御を達成するのにおそらく十分であることを示す。

Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and number of layer transitions $L$ (effectively the depth $L+1$). Specifically, we assess the model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. In the autonomous case, where $L=0$, a separate study is required, which we undertake focusing on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)p^{-1/d})$. This decay rate is a consequence of applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.

翻訳日:2024-02-07 18:59:10 公開日:2024-02-06
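
As a reading aid for the scaling laws quoted in the abstract above, the block below writes them out with explicit constants. The constants $C_1$ and $C_2$ are hypothetical placeholders for whatever the $O(\cdot)$ notation hides, and the ODE form is an assumption about the kind of model studied (a width-$p$ vector field with piecewise-constant parameters); neither is taken from the abstract itself.

```latex
% Assumed model form (NOT stated in the abstract): a width-p neural ODE whose
% parameters are piecewise constant in time and switch L times on [0, T].
\dot{x}(t) = W(t)\,\sigma\bigl(A(t)\,x(t) + b(t)\bigr), \qquad
W(t)\in\mathbb{R}^{d\times p},\quad A(t)\in\mathbb{R}^{p\times d},\quad b(t)\in\mathbb{R}^{p}.

% Trade-offs quoted in the abstract, with hypothetical constants C_1, C_2 > 0:
L \le C_1\Bigl(1 + \frac{N}{p}\Bigr) \quad\text{(dataset interpolation)}, \qquad
L \le C_2\Bigl(1 + \frac{1}{p\,\varepsilon^{d}}\Bigr) \quad\text{(measure interpolation)}.
```

Read this way, doubling the width $p$ roughly halves the $N/p$ term, which is the sense in which width can be traded against depth.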

# 大規模言語モデルがベクトルデータベースを満たすとき:調査

When Large Language Models Meet Vector Databases: A Survey ( http://arxiv.org/abs/2402.01763v2 )

ライセンス: Link先を確認

Zhi Jing, Yongye Su, Yikun Han, Bo Yuan, Haiyun Xu, Chunjiang Liu, Kehai Chen, Min Zhang

(参考訳) 本稿では,大規模言語モデル (LLMs) とベクトルデータベース (VecDBs) の相乗的ポテンシャルについて検討する。 LLMの普及に伴い、幻覚、時代遅れの知識、禁止的な商用アプリケーションコスト、メモリ問題など、多くの課題が伴う。 VecDBは、LLM操作に固有の高次元ベクトル表現を保存、取得、管理する効率的な手段を提供することによって、これらの問題の魅力的な解決策として浮上する。本稿では,LLM と VecDB の基本原理を概説し,LLM の機能強化に対するそれらの統合の影響を批判的に分析する。この議論は、先進的なデータ処理と知識抽出能力のためにLLMとVecDBの結合を最適化するためのさらなる研究を促進することを目的として、この領域における投機的将来の発展に関する議論へと展開する。

This survey explores the synergistic potential of Large Language Models (LLMs) and Vector Databases (VecDBs), a burgeoning and rapidly evolving research area. With the proliferation of LLMs comes a host of challenges, including hallucinations, outdated knowledge, prohibitive commercial application costs, and memory issues. VecDBs emerge as a compelling solution to these issues by offering an efficient means to store, retrieve, and manage the high-dimensional vector representations intrinsic to LLM operations. Through this nuanced review, we delineate the foundational principles of LLMs and VecDBs and critically analyze their integration's impact on enhancing LLM functionalities. This discourse extends into a discussion on the speculative future developments in this domain, aiming to catalyze further research into optimizing the confluence of LLMs and VecDBs for advanced data handling and knowledge extraction capabilities.

翻訳日:2024-02-07 18:50:16 公開日:2024-02-06
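
The store-and-retrieve role that the abstract above assigns to vector databases can be illustrated with a minimal sketch. It assumes cosine similarity as the ranking function and uses hand-written three-dimensional vectors in place of real embeddings; `ToyVectorStore` and its methods are hypothetical names for illustration, not the API of any actual vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: brute-force search, no indexing."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self._items.append((text, list(vector)))

    def search(self, query_vector, k=2):
        """Return the texts of the k stored vectors most similar to the query."""
        scored = [(cosine_similarity(query_vector, vec), text)
                  for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


# Toy vectors standing in for embeddings produced by some embedding model.
store = ToyVectorStore()
store.add("doc about cats", [1.0, 0.0, 0.0])
store.add("doc about dogs", [0.9, 0.1, 0.0])
store.add("doc about tax law", [0.0, 0.0, 1.0])

top = store.search([1.0, 0.05, 0.0], k=2)
print(top)
```

In a retrieval-augmented setup the returned texts would typically be placed in the LLM prompt; production systems replace the brute-force scan with an approximate nearest-neighbour index.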

# 戦略としての文字列としてのステート:ゲーム理論による言語モデルの操り方

States as Strings as Strategies: Steering Language Models with Game-Theoretic Solvers ( http://arxiv.org/abs/2402.01704v2 )

ライセンス: Link先を確認

Ian Gemp, Yoram Bachrach, Marc Lanctot, Roma Patel, Vibhavari Dasagi, Luke Marris, Georgios Piliouras, Siqi Liu, Karl Tuyls

(参考訳) ゲーム理論は、合理的エージェント間の戦略的相互作用の数学的モデルの研究である。言語は人間にとって重要な対話手段であるが、歴史的に対話とその戦略的動機を数学的にモデル化することは困難である。言語相互作用に関連するプレイヤー、戦略、報酬の適切なモデル(つまり、ゲーム理論の従来の象徴論理への結合)は、既存のゲーム理論アルゴリズムが言語空間における戦略的な解決策を提供することができる。言い換えれば、バインディングは対話における安定した合理的な会話戦略を計算するための経路を提供することができる。大規模言語モデル(llm)は、その生成能力が自然対話の現実的な人間のようなシミュレーションを可能にする点に到達している。様々な方法でそれらを促すことで、異なる出力発話に対して反応を制御できる。自然言語の表現力を活用することで、llmは現実世界のアプリケーションで基盤となる新しい対話シナリオを迅速に生成する上でも役立ちます。本研究では,対話からゲーム理論への結合の可能性と,既存の平衡探索アルゴリズムの一般化について述べる。さらに,提案するバインディングとともにllms生成機能を活用することで,ゲーム理論的なソリューション概念を学習し,テスト可能な,公式なゲームリポジトリを合成することができる。また, LLM によるゲーム生成, ゲーム理論解法, 模倣学習を組み合わせて, LLM の戦略能力向上のプロセスを構築する方法を示す。

Game theory is the study of mathematical models of strategic interactions among rational agents. Language is a key medium of interaction for humans, though it has historically proven difficult to model dialogue and its strategic motivations mathematically. A suitable model of the players, strategies, and payoffs associated with linguistic interactions (i.e., a binding to the conventional symbolic logic of game theory) would enable existing game-theoretic algorithms to provide strategic solutions in the space of language. In other words, a binding could provide a route to computing stable, rational conversational strategies in dialogue. Large language models (LLMs) have arguably reached a point where their generative capabilities can enable realistic, human-like simulations of natural dialogue. By prompting them in various ways, we can steer their responses towards different output utterances. Leveraging the expressivity of natural language, LLMs can also help us quickly generate new dialogue scenarios, which are grounded in real-world applications. In this work, we present one possible binding from dialogue to game theory as well as generalizations of existing equilibrium-finding algorithms to this setting. In addition, by exploiting LLMs' generation capabilities along with our proposed binding, we can synthesize a large repository of formally defined games in which one can study and test game-theoretic solution concepts. We also demonstrate how one can combine LLM-driven game generation, game-theoretic solvers, and imitation learning to construct a process for improving the strategic capabilities of LLMs.

翻訳日:2024-02-07 18:49:57 公開日:2024-02-06
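
To make the idea of binding dialogue to a game more concrete, here is a minimal sketch in which each player's strategies are prompt-style instructions (strings) and a brute-force search finds pure-strategy Nash equilibria. The two tones and the payoff numbers are hypothetical; in the setting the abstract above describes, payoffs would presumably come from evaluating LLM-generated dialogues, and the solvers would be more general than this enumeration.

```python
from itertools import product


def pure_nash_equilibria(actions_a, actions_b, payoff):
    """Enumerate pure-strategy Nash equilibria of a two-player game.

    payoff[(a, b)] is a pair (utility_for_player_a, utility_for_player_b).
    """
    equilibria = []
    for a, b in product(actions_a, actions_b):
        u_a, u_b = payoff[(a, b)]
        # Neither player can gain by unilaterally switching strategy.
        a_is_best = all(payoff[(alt, b)][0] <= u_a for alt in actions_a)
        b_is_best = all(payoff[(a, alt)][1] <= u_b for alt in actions_b)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria


# Strategies are instructions one might prepend to an LLM prompt (hypothetical).
tones = ["friendly", "firm"]

# Hypothetical payoffs with a prisoner's-dilemma structure.
payoff = {
    ("friendly", "friendly"): (3, 3),
    ("friendly", "firm"): (0, 4),
    ("firm", "friendly"): (4, 0),
    ("firm", "firm"): (1, 1),
}

equilibria = pure_nash_equilibria(tones, tones, payoff)
print(equilibria)
```

With these made-up payoffs the only stable pair is for both sides to be firm, even though mutual friendliness would leave both better off.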

# 靴センサの最近の進歩 : 医療におけるスマート・フットウェアの役割

Recent Innovations in Footwear Sensors: Role of Smart Footwear in Healthcare -- A Survey ( http://arxiv.org/abs/2402.01645v2 )

ライセンス: Link先を確認

Pradyumna G. R., Roopa B. Hegde, Bommegowda K. B., Anil Kumar Bhat, Ganesh R. Naik, Amit N. Pujari

(参考訳) スマートシューズは、パーソナライズされた健康モニタリングと補助技術の新時代を支えている。この靴はbluetoothなどの技術をデータ収集や無線伝送に活用し、gps追跡、障害物検出、フィットネストラッキングなどの機能を備えている。本稿では,スマートシュー技術の現状について概説するとともに,健康モニタリング,エネルギー収穫,視覚障害者支援機能,データ分析のためのディープラーニングの統合について述べる。本研究は、特に糖尿病患者に対する医療応用におけるスマートフットウェアの可能性と、この分野での現在進行中の研究について論じる。複雑な構造、不適合性、快適性、コストなど、現在の履物の問題も議論されている。

Smart shoes have ushered in a new era of personalised health monitoring and assistive technology. The shoe leverages technologies such as Bluetooth for data collection and wireless transmission and incorporates features such as GPS tracking, obstacle detection, and fitness tracking. This article provides an overview of the current state of smart shoe technology, highlighting the integration of advanced sensors for health monitoring, energy harvesting, assistive features for the visually impaired, and deep learning for data analysis. The study discusses the potential of smart footwear in medical applications, particularly for patients with diabetes, and the ongoing research in this field. Current footwear challenges are also discussed, including complex construction, poor fit, comfort, and high cost.

翻訳日:2024-02-07 18:49:34 公開日:2024-02-06

# 証明可能な性能保証を伴うAIエラー訂正のための弱教師付き学習器

Weakly Supervised Learners for Correction of AI Errors with Provable Performance Guarantees ( http://arxiv.org/abs/2402.00899v2 )

ライセンス: Link先を確認

Ivan Y. Tyukin, Tatiana Tyukina, Daniel van Helden, Zedong Zheng, Evgeny M. Mirkes, Oliver J. Sutton, Qinghua Zhou, Alexander N. Gorban, Penelope Allison

(参考訳) 本稿では、事前の(a priori)性能保証を備えた弱教師付きAIエラー訂正器を導入することにより、AIエラーを扱う新しい手法を提案する。これらのAI訂正器は、あらかじめ構築された基礎となる分類器の決定を承認または拒否することで、その決定を調整する役割を持つ補助的な写像である。決定の拒否は、決定を差し控えることを提案する信号として利用できる。本研究の重要な技術的焦点は、誤った決定の確率に対するバウンドを通じて、これらの新しいAI訂正器の性能保証を与えることである。これらのバウンドは分布に依存せず、データ次元に関する仮定にも依存しない。実験例では、訓練データが乏しい困難な実世界のタスクにおいて、本フレームワークを適用して画像分類器の性能を向上させる方法を示す。

We present a new methodology for handling AI errors by introducing weakly supervised AI error correctors with a priori performance guarantees. These AI correctors are auxiliary maps whose role is to moderate the decisions of some previously constructed underlying classifier by either approving or rejecting its decisions. The rejection of a decision can be used as a signal to suggest abstaining from making a decision. A key technical focus of the work is in providing performance guarantees for these new AI correctors through bounds on the probabilities of incorrect decisions. These bounds are distribution agnostic and do not rely on assumptions on the data dimension. Our empirical example illustrates how the framework can be applied to improve the performance of an image classifier in a challenging real-world task where training data are scarce.

翻訳日:2024-02-07 18:49:13 公開日:2024-02-06
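
The approve-or-reject role of a corrector described in the abstract above can be sketched as a thin wrapper around an existing classifier. Everything below is an illustrative assumption: the scalar threshold classifier is a toy, and the margin rule stands in for the weakly supervised correctors of the paper, whose construction and guarantees are not reproduced here.

```python
def threshold_classifier(x, threshold=0.5):
    """Toy underlying classifier on a single scalar feature."""
    return 1 if x >= threshold else 0


def make_margin_corrector(threshold, margin):
    """Hypothetical corrector: approve a decision only when the input lies
    at least `margin` away from the classifier's decision threshold."""
    def corrector(x, label):
        return abs(x - threshold) >= margin
    return corrector


def corrected_predict(classifier, corrector, x):
    """Return the classifier's label if approved, or None to signal abstention."""
    label = classifier(x)
    return label if corrector(x, label) else None


corrector = make_margin_corrector(threshold=0.5, margin=0.1)
decisions = [corrected_predict(threshold_classifier, corrector, x)
             for x in (0.05, 0.45, 0.55, 0.95)]
print(decisions)
```

Inputs close to the threshold are rejected and returned as `None`, which a downstream system could treat as "abstain and defer".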

# ビデオは効果的に使っていない: 更新されたドメイン適応ビデオセグメンテーションベースライン

We're Not Using Videos Effectively: An Updated Domain Adaptive Video Segmentation Baseline ( http://arxiv.org/abs/2402.00868v2 )

ライセンス: Link先を確認

Simar Kareer, Vivek Vijaykumar, Harsh Maheshwari, Prithvijit Chattopadhyay, Judy Hoffman, Viraj Prabhu

(参考訳) セマンティックセグメンテーションのための教師なしドメイン適応(DAS)については、ラベル付きソースドメインの画像で訓練したモデルをラベルなしターゲットドメインへ適応させようとする研究が数多く行われてきた。先行研究の大半はこれをフレームレベルのImage-DAS問題として扱ってきたが、一部のVideo-DAS研究は隣接フレームに存在する時間的信号をさらに活用しようとしている。しかし、Video-DASの研究は歴史的にImage-DASとは異なるベンチマーク群を対象としており、相互のベンチマーク比較はほとんど行われてこなかった。本研究ではこのギャップに取り組む。驚くべきことに、(1)データとモデルアーキテクチャを慎重に統制した後でも、最先端のImage-DAS手法(HRDAおよびHRDA+MIC)は確立されたVideo-DASベンチマークにおいてVideo-DAS手法を上回り(Viper$\rightarrow$CityscapesSeqで+14.5 mIoU、Synthia$\rightarrow$CityscapesSeqで+19.0 mIoU)、(2)Image-DASとVideo-DASの手法を単純に組み合わせても、データセット全体でわずかな改善しか得られないことが分かった。Image-DASとVideo-DASの間で進展がサイロ化することを避けるため、共通のベンチマーク上で幅広いVideo-DASおよびImage-DAS手法をサポートするコードベースをオープンソース化する。コードは https://github.com/SimarKareer/UnifiedVideoDA で利用可能。

There has been abundant work in unsupervised domain adaptation for semantic segmentation (DAS) seeking to adapt a model trained on images from a labeled source domain to an unlabeled target domain. While the vast majority of prior work has studied this as a frame-level Image-DAS problem, a few Video-DAS works have sought to additionally leverage the temporal signal present in adjacent frames. However, Video-DAS works have historically studied a distinct set of benchmarks from Image-DAS, with minimal cross-benchmarking. In this work, we address this gap. Surprisingly, we find that (1) even after carefully controlling for data and model architecture, state-of-the-art Image-DAS methods (HRDA and HRDA+MIC) outperform Video-DAS methods on established Video-DAS benchmarks (+14.5 mIoU on Viper$\rightarrow$CityscapesSeq, +19.0 mIoU on Synthia$\rightarrow$CityscapesSeq), and (2) naive combinations of Image-DAS and Video-DAS techniques only lead to marginal improvements across datasets. To avoid siloed progress between Image-DAS and Video-DAS, we open-source our codebase with support for a comprehensive set of Video-DAS and Image-DAS methods on a common benchmark. Code available at https://github.com/SimarKareer/UnifiedVideoDA

翻訳日:2024-02-07 18:49:03 公開日:2024-02-06
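
The gains in the abstract above are reported in mIoU (mean intersection over union). As a reminder of what that metric measures, here is a minimal sketch over flat lists of per-pixel class labels; the six-pixel example is a toy, and skipping classes that appear in neither list is one common convention rather than something the abstract specifies.

```python
def mean_iou(truth, prediction, num_classes):
    """Mean intersection-over-union across classes.

    Classes absent from both truth and prediction are skipped
    (an assumed convention; benchmarks may differ).
    """
    ious = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(truth, prediction) if t == c and p == c)
        fp = sum(1 for t, p in zip(truth, prediction) if t != c and p == c)
        fn = sum(1 for t, p in zip(truth, prediction) if t == c and p != c)
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy per-pixel labels for a six-pixel "image" with three classes.
truth = [0, 0, 1, 1, 2, 2]
prediction = [0, 1, 1, 1, 2, 0]

score = mean_iou(truth, prediction, num_classes=3)
print(score)
```

Here the per-class IoUs are 1/3, 2/3, and 1/2. Benchmark papers usually quote mIoU as a percentage, so a "+14.5 mIoU" gain refers to percentage points.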

# ポジションペーパー:大規模AIの時代におけるベイズ的深層学習

Position Paper: Bayesian Deep Learning in the Age of Large-Scale AI ( http://arxiv.org/abs/2402.00809v2 )

ライセンス: Link先を確認

Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, Jose Miguel Hernandez Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

(参考訳) ディープラーニング研究の現在の状況では、大規模な画像と言語データセットを含む教師付きタスクにおいて、高い予測精度を達成することに重点が置かれている。しかし、より広い視点から見れば、不確実性、活動的かつ継続的な学習、科学的なデータなど、見落とされがちなメトリクス、タスク、データタイプが、注意を喚起する。 Bayesian Deep Learning(BDL)は,これらのさまざまな設定にまたがってメリットを提供する,有望な道の1つである。本稿では,BDLが深層学習の能力を高めることができることを示唆する。 BDLの強みを再考し、既存の課題を認識し、これらの障害に対処するためのエキサイティングな研究方法を強調します。今後の議論は、大規模ファンデーションモデルをBDLと組み合わせて、その潜在能力を最大限に活用する方法に焦点を当てている。

In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.

翻訳日:2024-02-07 18:48:01 公開日:2024-02-06

# シーケンスモデリングのためのTransformerの表現力とメカニズムの理解

Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling ( http://arxiv.org/abs/2402.00522v2 )

ライセンス: Link先を確認

Mingze Wang, Weinan E

(参考訳) 長く、スパースで、複雑なメモリを持つシーケンスモデリングに対するTransformerの近似特性を体系的に研究する。ドット積自己注意、位置符号化、フィードフォワード層といったTransformerの各構成要素が表現力にどのように影響するかというメカニズムを調査し、明示的な近似率を確立することでそれらの複合効果を検討する。本研究は、層数やアテンションヘッド数といったTransformerの重要なパラメータの役割を明らかにし、これらの知見は代替アーキテクチャに対する自然な示唆も与える。

We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory. We investigate the mechanisms through which different components of Transformer, such as the dot-product self-attention, positional encoding and feed-forward layer, affect its expressive power, and we study their combined effects through establishing explicit approximation rates. Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads, and these insights also provide natural suggestions for alternative architectures.

翻訳日:2024-02-07 18:47:46 公開日:2024-02-06

# 量子エネルギーテレポーテーションにおける量子相関のロバスト性

Robustness of quantum correlation in quantum energy teleportation ( http://arxiv.org/abs/2402.00479v2 )

ライセンス: Link先を確認

Kazuki Ikeda and Adam Lowe

(参考訳) 本稿では、従来用いられてきたエンタングルメントエントロピーではなく量子ディスコードを用いて、量子エネルギーテレポーテーション(QET)プロトコルにおける量子相関の時間発展を示す。局所的な観測と条件付き操作を繰り返すQETプロトコルでは、混合状態が統計的に生成されるため、量子相関は非自明になる。本稿では、混合状態における量子相関の尺度として量子ディスコードを用い、テレポートされるエネルギーおよび相転移との関係を調べる。アリスとボブがQETを実行する過程では、アリスによる量子状態の測定によって両者の間のエンタングルメントは完全に壊れ、したがって量子相関は消えると予想される。しかし、この予想に反して、量子ディスコードを用いると、量子相関はQETの全過程を通じて消えないことが示される。種々の相構造におけるQETの量子相関のロバスト性を示すため、カイラル化学ポテンシャルと化学ポテンシャルの両方を持つ南部・ヨナ-ラシニオ(NJL)模型を含むいくつかのベンチマーク模型を用いて数値解析を行う。これらは、カイラリティ密度演算子に結合した左巻きクォークと右巻きクォークの間のカイラル不均衡を模した相構造の研究に有用である。調べたすべての場合において、量子ディスコードは相転移の秩序パラメータとして振る舞った。

We present the evolution of quantum correlation in the quantum energy teleportation (QET) protocol using quantum discord, instead of the traditionally used entanglement entropy. In the QET protocol, where local observations and conditional operations are repeated, quantum correlations become nontrivial because of the statistical creation of mixed states. In this paper, we use quantum discord as a measure of quantum correlation in mixed states and investigate its relationship to teleported energy and phase transitions. During the process of Alice and Bob performing QET, one would expect that the entanglement between Alice and Bob is completely broken by Alice's measurement of the quantum state, and thus the quantum correlation disappears. However, contrary to this expectation, it is shown using quantum discord that the quantum correlation does not disappear during the entire process of QET. To demonstrate the robustness of the quantum correlation in QET at various phase structures, we perform the numerical analysis using several benchmark models including the Nambu-Jona-Lasinio (NJL) model with both the chiral chemical potential and the chemical potential, which are useful to study the phase structures mimicking the chiral imbalance between left- and right-handed quarks coupled to the chirality density operator. In all cases we studied, the quantum discord behaved as an order parameter of the phase transition.

翻訳日:2024-02-07 18:47:36 公開日:2024-02-06

# マルチモーダルシリアル再生を用いた人・大言語モデルの抽象化の比較

Comparing Abstraction in Humans and Large Language Models Using Multimodal Serial Reproduction ( http://arxiv.org/abs/2402.03618v1 )

ライセンス: Link先を確認

Sreejan Kumar, Raja Marjieh, Byron Zhang, Declan Campbell, Michael Y. Hu, Umang Bhatt, Brenden Lake, Thomas L. Griffiths

(参考訳) 人間はノイズの多い感覚データから世界の有用な抽象概念を抽出する。連続再生(serial reproduction)は、ある人が刺激を観察し、それを次の人のために再現することで再現の連鎖を形成するという、伝言ゲームに似たパラダイムを通じて、人々が世界をどのように解釈するかを研究することを可能にする。過去の連続再生実験は通常単一の感覚モダリティを用いるが、人間はしばしば言語を通じて世界の抽象概念を互いに伝達する。抽象概念の形成に対する言語の効果を調べるため、視覚刺激を受け取った人にそれを言語形式で再現してもらい、またその逆も行うことで、新しいマルチモーダル連続再生フレームワークを実装した。人間とGPT-4の両方で単一モダリティの連鎖とマルチモーダルの連鎖を実行したところ、言語をモダリティとして加えることは、GPT-4の再現よりも人間の再現に大きな影響を与えることが分かった。これは、人間の視覚表現と言語表現がGPT-4のものよりも分離しやすいことを示唆している。

Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a paradigm similar to the game of telephone, where one person observes a stimulus and reproduces it for the next to form a chain of reproductions. Past serial reproduction experiments typically employ a single sensory modality, but humans often communicate abstractions of the world to each other through language. To investigate the effect of language on the formation of abstractions, we implement a novel multimodal serial reproduction framework by asking people who receive a visual stimulus to reproduce it in a linguistic format, and vice versa. We ran unimodal and multimodal chains with both humans and GPT-4 and find that adding language as a modality has a larger effect on human reproductions than on GPT-4's. This suggests human visual and linguistic representations are more dissociable than those of GPT-4.

翻訳日:2024-02-07 17:24:31 公開日:2024-02-06

# 大規模言語モデルを用いた実世界データからの避妊薬切り替え理由の特定

Identifying Reasons for Contraceptive Switching from Real-World Data Using Large Language Models ( http://arxiv.org/abs/2402.03597v1 )

ライセンス: Link先を確認

Brenda Y. Miao, Christopher YK Williams, Ebenezer Chinedu-Eneh, Travis Zack, Emily Alsentzer, Atul J. Butte, Irene Y. Chen

(参考訳) 処方避妊具は女性の生殖維持に重要な役割を果たす。米国では約5000万人の女性が避妊具を使用しており、避妊薬の選択と切り替えを駆動する要因を理解することは大きな関心事である。しかし、薬物交換に関連する多くの要因は、しばしば、構造化されていない臨床ノートにのみ記録され、抽出が困難である。本稿では,近年開発された大規模言語モデル GPT-4 (HIPAA準拠のMicrosoft Azure API) のゼロショット能力を評価し,UCSFインフォメーション・コモンズ臨床ノートデータセットから避妊薬のクラスを切り替える理由を明らかにする。 GPT-4は, 避妊開始時と停止時にそれぞれ0.849点, 0.881点のマイクロF1スコアのベースラインBERTベースモデルよりも優れている。 gpt-4抽出理由のヒトによる評価は91.4%の精度で、幻覚は最小であった。抽出された理由を用いて,非教師付きトピックモデリングアプローチを用いた切り替えの主な理由として,患者の嗜好,有害事象,保険を特定した。また, 特定の人口集団における避妊スイッチングの理由として, 「体重増加/ムード変化」と「保険カバレッジ」が不均等に見出された。私たちのコードと補足データはhttps://github.com/bmiao10/contraceptive-switchingで入手できます。

Prescription contraceptives play a critical role in supporting women's reproductive health. With nearly 50 million women in the United States using contraceptives, understanding the factors that drive contraceptive selection and switching is of significant interest. However, many factors related to medication switching are often only captured in unstructured clinical notes and can be difficult to extract. Here, we evaluate the zero-shot abilities of a recently developed large language model, GPT-4 (via HIPAA-compliant Microsoft Azure API), to identify reasons for switching between classes of contraceptives from the UCSF Information Commons clinical notes dataset. We demonstrate that GPT-4 can accurately extract reasons for contraceptive switching, outperforming baseline BERT-based models with microF1 scores of 0.849 and 0.881 for contraceptive start and stop extraction, respectively. Human evaluation of GPT-4-extracted reasons for switching showed 91.4% accuracy, with minimal hallucinations. Using extracted reasons, we identified patient preference, adverse events, and insurance as key reasons for switching using unsupervised topic modeling approaches. Notably, we also showed using our approach that "weight gain/mood change" and "insurance coverage" are disproportionately found as reasons for contraceptive switching in specific demographic populations. Our code and supplemental data are available at https://github.com/BMiao10/contraceptive-switching.

翻訳日:2024-02-07 17:24:15 公開日:2024-02-06
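
The extraction quality in the abstract above is reported as micro-averaged F1. A minimal sketch of that metric for a multi-label setting follows, pooling true positives, false positives, and false negatives across all notes before computing precision and recall. The label strings and the three toy notes are hypothetical and are not drawn from the study's data.

```python
def micro_f1(true_sets, predicted_sets):
    """Micro-averaged F1 over per-example label sets."""
    tp = fp = fn = 0
    for truth, predicted in zip(true_sets, predicted_sets):
        tp += len(truth & predicted)   # labels correctly extracted
        fp += len(predicted - truth)   # labels extracted but not annotated
        fn += len(truth - predicted)   # annotated labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations and model outputs for three clinical notes.
annotated = [{"adverse event"}, {"insurance", "patient preference"}, {"cost"}]
extracted = [{"adverse event"}, {"insurance"}, {"efficacy"}]

score = micro_f1(annotated, extracted)
print(round(score, 4))
```

Because counts are pooled before averaging, frequent labels dominate the micro-averaged score; macro-averaging would weight every label equally instead.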

# グラフ構造ピラミッド型全スライド画像表現

GRASP: GRAph-Structured Pyramidal Whole Slide Image Representation ( http://arxiv.org/abs/2402.03592v1 )

ライセンス: Link先を確認

Ali Khajegili Mirabadi, Graham Archibald, Amirali Darbandsari, Alberto Contreras-Sanz, Ramin Ebrahim Nakhli, Maryam Asadi, Allen Zhang, C. Blake Gilks, Peter Black, Gang Wang, Hossein Farahani, Ali Bashashati

(参考訳) がんのサブタイピングはデジタル病理学において最も難しい課題の1つであり、ギガピクセルの全スライド画像(WSI)を処理するマルチインスタンス学習(MIL)が近年の研究で注目されている。しかし、MILアプローチはWSIに含まれる倍率間および倍率内の情報を活用していない。本稿では、デジタル病理学におけるWSI処理のための新しいグラフ構造のマルチ倍率フレームワークGRASPを提案する。本手法は、WSIを扱う際の病理医の振る舞いを動的に模倣するよう設計されており、WSIの階層構造の恩恵を受ける。GRASPは従来のプーリング機構の代わりに収束ベースのノード集約を導入し、2つの異なるがんデータセットにおいて最先端手法をバランス精度で最大10%上回る一方、性能が最も近い最先端モデルと比べてパラメータ数は7分の1である。結果は、GRASPががんのサブタイピングにおいて異なる倍率を動的に見つけて参照し、さまざまなハイパーパラメータにわたって信頼性と安定性を持つことを示している。モデルの振る舞いは2人の専門の病理医によって評価され、モデルの動態の解釈可能性が確認された。また、GRASPが予測を行う際にグラフ内の異なる倍率やノードとどのように相互作用するかを説明する理論的基盤を、経験的証拠とともに提供する。GRASPの強力な特性と単純な構造が、デジタル病理学におけるWSI表現のための解釈可能な構造ベースの設計の発展を促すと考えている。さらに、希少な卵巣がんと膀胱がんの大規模グラフデータセットを2つ公開し、この分野に貢献する。

Cancer subtyping is one of the most challenging tasks in digital pathology, where Multiple Instance Learning (MIL) by processing gigapixel whole slide images (WSIs) has been in the spotlight of recent research. However, MIL approaches do not take advantage of inter- and intra-magnification information contained in WSIs. In this work, we present GRASP, a novel graph-structured multi-magnification framework for processing WSIs in digital pathology. Our approach is designed to dynamically emulate the pathologist's behavior in handling WSIs and benefits from the hierarchical structure of WSIs. GRASP, which introduces a convergence-based node aggregation instead of traditional pooling mechanisms, outperforms state-of-the-art methods over two distinct cancer datasets by a margin of up to 10% balanced accuracy, while being 7 times smaller than the closest-performing state-of-the-art model in terms of the number of parameters. Our results show that GRASP is dynamic in finding and consulting with different magnifications for subtyping cancers and is reliable and stable across different hyperparameters. The model's behavior has been evaluated by two expert pathologists confirming the interpretability of the model's dynamic. We also provide a theoretical foundation, along with empirical evidence, for our work, explaining how GRASP interacts with different magnifications and nodes in the graph to make predictions. We believe that the strong characteristics yet simple structure of GRASP will encourage the development of interpretable, structure-based designs for WSI representation in digital pathology. Furthermore, we publish two large graph datasets of rare Ovarian and Bladder cancers to contribute to the field.

翻訳日:2024-02-07 17:23:49 公開日:2024-02-06

# CAT-SAM:Segmentation Anything ModelのFew-Shot Adaptationのための条件調整ネットワーク

CAT-SAM: Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model ( http://arxiv.org/abs/2402.03631v1 )

ライセンス: Link先を確認

Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Shijian Lu

(参考訳) 最近のSegment Anything Model (SAM) は、一般画像のセグメンテーションにおいて顕著なゼロショット能力と柔軟な幾何学的プロンプトを示した。しかしSAMは、航空画像、医療画像、非RGB画像など、さまざまな非従来型の画像を扱う際にしばしば苦戦する。本稿では、少数ショットのターゲットサンプルだけでSAMをさまざまな非従来型のターゲットタスクに適応させる条件付きチューニングネットワークCAT-SAM(ConditionAl Tuning network)を提案する。CAT-SAMはSAM全体を凍結し、少数の学習可能なパラメータでマスクデコーダと画像エンコーダを同時に適応させる。中核となる設計は、重い画像エンコーダと軽量なマスクデコーダのデコーダ条件付き共同チューニングを可能にするプロンプトブリッジ構造である。このブリッジはマスクデコーダのプロンプトトークンを画像エンコーダへ写像し、エンコーダとデコーダの相乗的な適応を相互の利益とともに促進する。画像エンコーダに対する2つの代表的なチューニング戦略を開発し、入力空間に学習可能なプロンプトトークンを注入するものと、軽量なアダプタネットワークを挿入するものという2つのCAT-SAM変種を得る。11の非従来型タスクに対する広範な実験により、いずれのCAT-SAM変種も、非常に困難なワンショット適応設定の下でさえ一貫して優れたターゲットセグメンテーション性能を達成することが示された。プロジェクトページ: https://xiaoaoran.github.io/projects/CAT-SAM

The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks with just few-shot target samples. CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters. The core design is a prompt bridge structure that enables decoder-conditioned joint tuning of the heavyweight image encoder and the lightweight mask decoder. The bridging maps the prompt token of the mask decoder to the image encoder, fostering synergic adaptation of the encoder and the decoder with mutual benefits. We develop two representative tuning strategies for the image encoder which leads to two CAT-SAM variants: one injecting learnable prompt tokens in the input space and the other inserting lightweight adapter networks. Extensive experiments over 11 unconventional tasks show that both CAT-SAM variants achieve superior target segmentation performance consistently even under the very challenging one-shot adaptation setup. Project page: https://xiaoaoran.github.io/projects/CAT-SAM

翻訳日:2024-02-07 17:10:08 公開日:2024-02-06

# IDE由来の静的コンテキストのネイティブ統合によるLLMベースのコーディングツールの強化

Enhancing LLM-Based Coding Tools through Native Integration of IDE-Derived Static Context ( http://arxiv.org/abs/2402.03630v1 )

ライセンス: Link先を確認

Yichen Li and Yun Peng and Yintong Huo and Michael R. Lyu

(参考訳) 大規模言語モデル(LLM)は、Copilotのようなコードアシスタントサービスの開発において重要な役割を担っていることが証明されている。ファイル内のコンテキストでトレーニングされているため、現在のllmは単一のソースファイルのコード補完に非常に有効である。しかし、クロスファイル情報を必要とする大規模なソフトウェアプロジェクトに対して、リポジトリレベルのコード補完を行うことは困難である。 LLMベースのリポジトリレベルのコード補完に関する既存の研究は、ファイル間のコンテキストを特定し統合するが、LLMの低い精度と限られたコンテキスト長に悩まされている。本稿では,統合開発環境(IDE)がリポジトリレベルのコード補完のために,直接的かつ正確かつリアルタイムなクロスファイル情報を提供できることを論じる。我々は,IDEネイティブな静的コンテキストをクロスコンテキスト構築や自己修正のための診断結果に活用する,実践的なフレームワークであるIDECoderを提案する。 IDECoderは、リポジトリレベルのコード補完のLLMの機能を強化するために、IDEで利用可能なリッチなコンテキスト情報を利用する。我々はIDECoderの性能を検証するための予備実験を行い、この相乗効果が今後の探索に有望な傾向を示すことを観察した。

Large Language Models (LLMs) have achieved remarkable success in code completion, as evidenced by their essential roles in developing code assistant services such as Copilot. Being trained on in-file contexts, current LLMs are quite effective in completing code for single source files. However, it is challenging for them to conduct repository-level code completion for large software projects that require cross-file information. Existing research on LLM-based repository-level code completion identifies and integrates cross-file contexts, but it suffers from low accuracy and limited context length of LLMs. In this paper, we argue that Integrated Development Environments (IDEs) can provide direct, accurate and real-time cross-file information for repository-level code completion. We propose IDECoder, a practical framework that leverages IDE native static contexts for cross-context construction and diagnosis results for self-refinement. IDECoder utilizes the rich cross-context information available in IDEs to enhance the capabilities of LLMs for repository-level code completion. We conducted preliminary experiments to validate the performance of IDECoder and observed that this synergy represents a promising trend for future exploration.

翻訳日:2024-02-07 17:09:43 公開日:2024-02-06

# 個人推論のための線形化群精度に対する異なる影響

Disparate Impact on Group Accuracy of Linearization for Private Inference ( http://arxiv.org/abs/2402.03629v1 )

ライセンス: Link先を確認

Saswat Das, Marco Romanelli, Ferdinando Fioretto

(参考訳) 暗号化されたセキュアなデータに対するプライバシ保存推論の確保は、よく知られた計算上の課題である。非線形アクティベーションにおけるコストのかかる暗号計算のボトルネックを軽減するため、最近の手法では、ニューラルネットワークにおいてこれらのアクティベーションのターゲット部分の線形化が提案されている。この技術は、しばしば精度に無視できる影響で、ランタイムを著しく削減する。本稿では,このような計算的利点が公正コストの増大につながることを実証する。具体的には、ReLUアクティベーション数の減少が多数派に比べて少数派の精度を著しく低下させることがわかった。これらの観察を説明するために、決定境界の性質に関する制限された仮定の下での数学的解釈と、広く使われているデータセットやアーキテクチャにまたがるこの問題の流行を示す。最後に,線形化モデルの微調整手順を変更する簡単な手順が,効果的な緩和戦略として有効であることを示す。

Ensuring privacy-preserving inference on cryptographically secure data is a well-known computational challenge. To alleviate the bottleneck of costly cryptographic computations in non-linear activations, recent methods have suggested linearizing a targeted portion of these activations in neural networks. This technique results in significantly reduced runtimes with often negligible impacts on accuracy. In this paper, we demonstrate that such computational benefits may lead to increased fairness costs. Specifically, we find that reducing the number of ReLU activations disproportionately decreases the accuracy for minority groups compared to majority groups. To explain these observations, we provide a mathematical interpretation under restricted assumptions about the nature of the decision boundary, while also showing the prevalence of this problem across widely used datasets and architectures. Finally, we show how a simple procedure altering the fine-tuning step for linearized models can serve as an effective mitigation strategy.

翻訳日:2024-02-07 17:09:20 公開日:2024-02-06
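
The two ingredients of the abstract above, replacing some ReLU activations with the identity and comparing accuracy across groups, can be sketched as follows. The mask, the labels, the predictions, and the group names are toy values chosen only to exercise the functions; they are not results from the paper, and the paper's method for choosing which activations to linearize is not reproduced.

```python
def partially_linearized_relu(values, keep_relu):
    """Apply ReLU where keep_relu is True and the identity elsewhere.

    The identity branch is what makes a unit cheap under private inference.
    """
    return [max(0.0, v) if keep else v for v, keep in zip(values, keep_relu)]


def accuracy_by_group(labels, predictions, groups):
    """Accuracy computed separately for each group identifier."""
    correct, total = {}, {}
    for y, y_hat, g in zip(labels, predictions, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (1 if y == y_hat else 0)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(per_group):
    """Difference between the best and worst group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# The third unit is linearized, so its negative pre-activation passes through.
activations = partially_linearized_relu([-1.5, 2.0, -0.5], [True, True, False])

# Toy labels and predictions (illustrative only).
labels = [1, 0, 1, 1, 0, 1]
predictions = [1, 0, 1, 0, 0, 0]
groups = ["majority"] * 4 + ["minority"] * 2

per_group = accuracy_by_group(labels, predictions, groups)
gap = accuracy_gap(per_group)
print(activations, per_group, gap)
```

Tracking the gap before and after linearization, rather than overall accuracy alone, is the kind of check that would surface the disparate impact the abstract describes.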

# プロフェッショナルエージェント -- 大規模言語モデルを人間レベルの能力を持つ自律的な専門家へと進化させる

Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies ( http://arxiv.org/abs/2402.03628v1 )

ライセンス: Link先を確認

Zhixuan Chu, Yan Wang, Feng Zhu, Lu Yu, Longfei Li, Jinjie Gu

(参考訳) ChatGPT、PaLM、GPT-4のような大規模言語モデル(LLM)の出現は、自然言語処理における目覚ましい進歩を促し、人間のような言語の流暢さと推論能力を示している。本ポジションペーパーでは、LLMの能力を活用して、制御可能で専門的、対話的、かつプロフェッショナルレベルの能力を持つ自律エージェントを作成するアプリケーションフレームワークであるProfessional Agents(PAgents)の概念を紹介する。我々は、PAgentsが継続的に発展する専門知識を通じてプロフェッショナルサービスを再形成できると考える。提案するPAgentsフレームワークは、生成・進化・相乗のための3層アーキテクチャ、すなわち基盤ツール層、中間エージェント層、最上位シナジー層からなる。本稿は、LLMの有望な実世界応用に関する議論を促すことを目的としている。PAgentsの高度化と統合が進むことで、複雑な領域において専門的な熟達を示し、重要なニーズに応え、汎用人工知能を達成する可能性のあるAIシステムにつながりうると論じる。

The advent of large language models (LLMs) such as ChatGPT, PaLM, and GPT-4 has catalyzed remarkable advances in natural language processing, demonstrating human-like language fluency and reasoning capacities. This position paper introduces the concept of Professional Agents (PAgents), an application framework harnessing LLM capabilities to create autonomous agents with controllable, specialized, interactive, and professional-level competencies. We posit that PAgents can reshape professional services through continuously developed expertise. Our proposed PAgents framework entails a tri-layered architecture for genesis, evolution, and synergy: a base tool layer, a middle agent layer, and a top synergy layer. This paper aims to spur discourse on promising real-world applications of LLMs. We argue the increasing sophistication and integration of PAgents could lead to AI systems exhibiting professional mastery over complex domains, serving critical needs, and potentially achieving artificial general intelligence.

翻訳日:2024-02-07 17:09:04 公開日:2024-02-06

# Partially Recentralization Softmax Loss for Vision-Language Models Robustness ( http://arxiv.org/abs/2402.03627v1 )

License: see the linked page

Hao Wang, Xin Zhang, Jinzhe Jiang, Yaqian Zhao and Chen Li

As Large Language Models achieve breakthroughs in natural language processing (NLP) tasks, multimodal techniques have become extremely popular. However, multimodal NLP models have been shown to be vulnerable to adversarial attacks, where the outputs of a model can be dramatically changed by a perturbation to the input. While several defense techniques have been proposed for both computer vision and NLP models, the multimodal robustness of models has not been fully explored. In this paper, we study the adversarial robustness provided by modifying the loss function of pre-trained multimodal models so that it restricts the top-K softmax outputs. Based on the evaluation and scoring, our experiments show that after fine-tuning, the adversarial robustness of pre-trained models against popular attacks can be significantly improved. Further research is needed on topics such as output diversity, generalization, and the robustness-performance trade-off of this kind of loss function. Our code will be available after this paper is accepted.

Translated: 2024-02-07 17:08:46 Published: 2024-02-06

# Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time ( http://arxiv.org/abs/2402.03625v1 )

License: see the linked page

Sungyoon Kim, Mert Pilanci

In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of $O(\sqrt{\log n})$, where $n$ is the number of training samples. A simple application leads to a tractable polynomial-time algorithm that is guaranteed to solve the original non-convex problem up to a logarithmic factor. Moreover, under mild assumptions, we show that with random initialization of the parameters, local gradient methods almost surely converge to a point that has low training loss. Our result is an exponential improvement compared to existing results and sheds new light on understanding why local gradient methods work well.

Translated: 2024-02-07 17:08:28 Published: 2024-02-06

# Neural Network Approximators for Marginal MAP in Probabilistic Circuits ( http://arxiv.org/abs/2402.03621v1 )

License: see the linked page

Shivvrat Arya, Tahrima Rahman, Vibhav Gogate

Probabilistic circuits (PCs) such as sum-product networks efficiently represent large multi-variate probability distributions. They are preferred in practice over other probabilistic representations such as Bayesian and Markov networks because PCs can solve marginal inference (MAR) tasks in time that scales linearly in the size of the network. Unfortunately, the maximum-a-posteriori (MAP) and marginal MAP (MMAP) tasks remain NP-hard in these models. Inspired by the recent work on using neural networks for generating near-optimal solutions to optimization problems such as integer linear programming, we propose an approach that uses neural networks to approximate (M)MAP inference in PCs. The key idea in our approach is to approximate the cost of an assignment to the query variables using a continuous multilinear function, and then use the latter as a loss function. The two main benefits of our new method are that it is self-supervised and that, once the neural network is learned, it requires only linear time to output a solution. We evaluate our new approach on several benchmark datasets and show that it outperforms three competing linear-time approximations (max-product inference, max-marginal inference, and sequential estimation) that are used in practice to solve MMAP tasks in PCs.

Translated: 2024-02-07 17:08:14 Published: 2024-02-06

# Self-Discover: Large Language Models Self-Compose Reasoning Structures ( http://arxiv.org/abs/2402.03620v1 )

License: see the linked page

Pei Zhou, Jay Pujara, Xiang Ren, Xinyun Chen, Heng-Tze Cheng, Quoc V. Le, Ed H. Chi, Denny Zhou, Swaroop Mishra, Huaixiu Steven Zheng

We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x less inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.

Translated: 2024-02-07 17:07:54 Published: 2024-02-06

# Leveraging Large Language Models for Hybrid Workplace Decision Support ( http://arxiv.org/abs/2402.03616v1 )

License: see the linked page

Yujin Kim, Chin-Chia Hsu

Large Language Models (LLMs) hold the potential to perform a variety of text processing tasks and provide textual explanations for proposed actions or decisions. In the era of hybrid work, LLMs can provide intelligent decision support for workers who are designing their hybrid work plans. In particular, they can offer suggestions and explanations to workers balancing numerous decision factors, thereby enhancing their work experience. In this paper, we present a decision support model for workspaces in hybrid work environments, leveraging the reasoning skills of LLMs. We first examine an LLM's capability of making suitable workspace suggestions. We find that its reasoning extends beyond the guidelines in the prompt and that the LLM can manage the trade-off among the available resources in the workspaces. We conduct an extensive user study to understand workers' decision process for workspace choices and evaluate the effectiveness of the system. We observe that a worker's decision could be influenced by the LLM's suggestions and explanations. The participants in our study find the system to be convenient, regardless of whether reasons are provided or not. Our results show that employees can benefit from the LLM-empowered system for their workspace selection in a hybrid workplace.

Translated: 2024-02-07 17:07:30 Published: 2024-02-06

# Bayesian Factorised Granger-Causal Graphs For Multivariate Time-series Data ( http://arxiv.org/abs/2402.03614v1 )

License: see the linked page

He Zhao and Edwin V. Bonilla

We study the problem of automatically discovering Granger causal relations from observational multivariate time-series data. Vector autoregressive (VAR) models have been time-tested for this problem, including Bayesian variants and more recent developments using deep neural networks. Most existing VAR methods for Granger causality use sparsity-inducing penalties/priors or post-hoc thresholds to interpret their coefficients as Granger causal graphs. Instead, we propose a new Bayesian VAR model with a hierarchical graph prior over binary Granger causal graphs, separately from the VAR coefficients. We develop an efficient algorithm to infer the posterior over binary Granger causal graphs. Our method provides better uncertainty quantification, has fewer hyperparameters, and achieves better performance than competing approaches, especially on sparse multivariate time-series data.

Translated: 2024-02-07 17:07:12 Published: 2024-02-06
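
The abstract above contrasts its Bayesian model with the common baseline of fitting a VAR model and thresholding its coefficients post hoc. A minimal sketch of that baseline only (not the paper's method) is given here, assuming a first-order VAR, ordinary least squares, and a hand-picked threshold; the function names `fit_var1` and `granger_graph` and the synthetic data are illustrative assumptions.

```python
import numpy as np

def fit_var1(series: np.ndarray) -> np.ndarray:
    """Least-squares fit of x_t = A @ x_{t-1} + noise; series has shape (T, d)."""
    past, future = series[:-1], series[1:]
    # lstsq solves past @ B = future in the least-squares sense, so A = B.T
    coeffs, *_ = np.linalg.lstsq(past, future, rcond=None)
    return coeffs.T

def granger_graph(coeff_matrix: np.ndarray, threshold: float) -> np.ndarray:
    """Binary graph: entry [i, j] is True when series j helps predict series i."""
    return np.abs(coeff_matrix) > threshold

# Synthetic data (assumption): series 0 drives series 1; series 2 is independent.
rng = np.random.default_rng(0)
true_a = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.5, 0.0],
                   [0.0, 0.0, 0.5]])
x = np.zeros((2000, 3))
for t in range(1, len(x)):
    x[t] = true_a @ x[t - 1] + 0.1 * rng.standard_normal(3)

estimated_a = fit_var1(x)
graph = granger_graph(estimated_a, threshold=0.2)
```

The threshold is the weak point of this baseline: it is one more quantity to tune and it yields no uncertainty estimate for individual edges, which is the gap the abstract says a prior over binary graphs is meant to address.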

# RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents ( http://arxiv.org/abs/2402.03610v1 )

License: see the linked page

Tomoyuki Kagaya, Thong Jing Yuan, Yuxuan Lou, Jayashree Karlekar, Sugiri Pranata, Akira Kinose, Koki Oguri, Felix Wick, Yang You

Owing to recent advancements, Large Language Models (LLMs) can now be deployed as agents for increasingly complex decision-making applications in areas including robotics, gaming, and API integration. However, reflecting past experiences in current decision-making processes, an innate human behavior, continues to pose significant challenges. Addressing this, we propose the Retrieval-Augmented Planning (RAP) framework, designed to dynamically leverage past experiences corresponding to the current situation and context, thereby enhancing agents' planning capabilities. RAP distinguishes itself by being versatile: it excels in both text-only and multimodal environments, making it suitable for a wide range of tasks. Empirical evaluations demonstrate RAP's effectiveness, where it achieves SOTA performance in textual scenarios and notably enhances multimodal LLM agents' performance for embodied tasks. These results highlight RAP's potential in advancing the functionality and applicability of LLM agents in complex, real-world applications.

Translated: 2024-02-07 17:06:57 Published: 2024-02-06

# Sensitivity and Bandwidth of a Point-Source-Interferometry-based Inertial Measurement Unit Employing Large Momentum Transfer and Launched Atoms ( http://arxiv.org/abs/2402.03608v1 )

License: see the linked page

Jinyang Li, Timothy Kovachy, Jason Bonacum, Selim M. Shahriar

We analyze theoretically the sensitivity and bandwidth of accelerometry and rotation sensing with a point source interferometer employing large momentum transfer (LMT) with molasses-launched atoms. The launching process makes it possible to realize the LMT process without the need to physically change directions of the Raman pulses, thus significantly simplifying the apparatus and reducing the amount of time needed to make the measurements. These advantages become more important when this process is used for realizing an inertial measurement unit (IMU) that can measure rotation around and acceleration along each of the three axes. We describe an explicit scheme for such an IMU and determine the expected sensitivity and bandwidth thereof for experimentally accessible parameters.

Translated: 2024-02-07 17:06:41 Published: 2024-02-06

# Improving Contextual Congruence Across Modalities for Effective Multimodal Marketing using Knowledge-infused Learning ( http://arxiv.org/abs/2402.03607v1 )

License: see the linked page

Trilok Padhi, Ugur Kursuncu, Yaman Kumar, Valerie L. Shalin, Lane Peterson Fronczek

The prevalence of smart devices with the ability to capture moments in multiple modalities has enabled users to experience multimodal information online. However, Large Language Models (LLMs) and Large Vision Models (LVMs) are still limited in capturing holistic meaning with cross-modal semantic relationships. Without explicit commonsense knowledge (e.g., as a knowledge graph), Visual Language Models (VLMs) only learn implicit representations by capturing high-level patterns in vast corpora, missing essential contextual cross-modal cues. In this work, we design a framework to couple explicit commonsense knowledge in the form of knowledge graphs with large VLMs to improve the performance of a downstream task, predicting the effectiveness of multi-modal marketing campaigns. While the marketing application provides a compelling metric for assessing our methods, our approach enables the early detection of likely persuasive multi-modal campaigns and the assessment and augmentation of marketing theory.

Translated: 2024-02-07 17:06:30 Published: 2024-02-06

# A Review on Internet of Things for Defense and Public Safety ( http://arxiv.org/abs/2402.03599v1 )

License: see the linked page

Paula Fraga-Lamas, Tiago M. Fernández-Caramés, Manuel Suárez-Albela, Luis Castedo and Miguel González-López

The Internet of Things (IoT) is undeniably transforming the way that organizations communicate and organize everyday businesses and industrial procedures. Its adoption has proven well suited for sectors that manage a large number of assets and coordinate complex and distributed processes. This survey analyzes the great potential for applying IoT technologies (i.e., data-driven applications or embedded automation and intelligent adaptive systems) to revolutionize modern warfare and provide benefits similar to those in industry. It identifies scenarios where Defense and Public Safety (PS) could leverage better commercial IoT capabilities to deliver greater survivability to the warfighter or first responders, while reducing costs and increasing operation efficiency and effectiveness. This article reviews the main tactical requirements and the architecture, examining gaps and shortcomings in existing IoT systems across the military field and mission-critical scenarios. The review characterizes the open challenges for a broad deployment and presents a research roadmap for enabling an affordable IoT for defense and PS.

Translated: 2024-02-07 17:06:11 Published: 2024-02-06

# Efficient Solvers for Partial Gromov-Wasserstein ( http://arxiv.org/abs/2402.03664v1 )

License: see the linked page

Yikun Bai, Rocio Diaz Martin, Hengrong Du, Ashkan Shahbazi, and Soheil Kolouri

The partial Gromov-Wasserstein (PGW) problem facilitates the comparison of measures with unequal masses residing in potentially distinct metric spaces, thereby enabling unbalanced and partial matching across these spaces. In this paper, we demonstrate that the PGW problem can be transformed into a variant of the Gromov-Wasserstein problem, akin to the conversion of the partial optimal transport problem into an optimal transport problem. This transformation leads to two new solvers, mathematically and computationally equivalent, based on the Frank-Wolfe algorithm, that provide efficient solutions to the PGW problem. We further establish that the PGW problem constitutes a metric for metric measure spaces. Finally, we validate the effectiveness of our proposed solvers in terms of computation time and performance on shape-matching and positive-unlabeled learning problems, comparing them against existing baselines.

Translated: 2024-02-07 17:00:03 Published: 2024-02-06

# Symbol Correctness in Deep Neural Networks Containing Symbolic Layers ( http://arxiv.org/abs/2402.03663v1 )

License: see the linked page

Aaron Bembenek, Toby Murray

To handle AI tasks that combine perception and logical reasoning, recent work introduces Neurosymbolic Deep Neural Networks (NS-DNNs), which contain -- in addition to traditional neural layers -- symbolic layers: symbolic expressions (e.g., SAT formulas, logic programs) that are evaluated by symbolic solvers during inference. We identify and formalize an intuitive, high-level principle that can guide the design and analysis of NS-DNNs: symbol correctness, the correctness of the intermediate symbols predicted by the neural layers with respect to a (generally unknown) ground-truth symbolic representation of the input data. We demonstrate that symbol correctness is a necessary property for NS-DNN explainability and transfer learning (despite being in general impossible to train for). Moreover, we show that the framework of symbol correctness provides a precise way to reason and communicate about model behavior at neural-symbolic boundaries, and gives insight into the fundamental tradeoffs faced by NS-DNN training algorithms. In doing so, we both identify significant points of ambiguity in prior work, and provide a framework to support further NS-DNN developments.

Translated: 2024-02-07 16:59:48 Published: 2024-02-06

# Transductive Reward Inference on Graph ( http://arxiv.org/abs/2402.03661v1 )

License: see the linked page

Bohao Qu, Xiaofeng Cao, Qing Guo, Yi Chang, Ivor W. Tsang, Chengqi Zhang

In this study, we present a transductive inference approach on a reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in offline reinforcement learning. Reward inference is the key to learning effective policies in practical scenarios where direct environmental interactions are either too costly or unethical and the reward functions are rarely accessible, such as in healthcare and robotics. Our research focuses on developing a reward inference method based on the contextual properties of information propagation on graphs that capitalizes on a constrained number of human reward annotations to infer rewards for unlabelled data. We leverage both the available data and limited reward annotations to construct a reward propagation graph, wherein the edge weights incorporate various influential factors pertaining to the rewards. Subsequently, we employ the constructed graph for transductive reward inference, thereby estimating rewards for unlabelled data. Furthermore, we establish the existence of a fixed point during several iterations of the transductive inference process and demonstrate that it converges at least to a local optimum. Empirical evaluations on locomotion and robotic manipulation tasks validate the effectiveness of our approach. The application of our inferred rewards improves the performance in offline reinforcement learning tasks.

Translated: 2024-02-07 16:59:28 Published: 2024-02-06
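
To make the idea of propagating a few annotated rewards over a graph concrete, the following is a minimal sketch of generic label propagation with clamped labels. It is an illustrative stand-in, not the authors' algorithm: the abstract does not specify how edge weights are built or how the update is defined, so the unweighted chain graph, the function name `propagate_rewards`, and the averaging rule are all assumptions.

```python
import numpy as np

def propagate_rewards(weights, rewards, labelled, iterations=200):
    """Clamp annotated rewards; move every other node to the weighted mean of its neighbours."""
    weights = np.asarray(weights, dtype=float)
    labelled = np.asarray(labelled, dtype=bool)
    annotated = np.asarray(rewards, dtype=float)
    estimate = np.where(labelled, annotated, 0.0)
    transition = weights / weights.sum(axis=1, keepdims=True)  # row-normalise edge weights
    for _ in range(iterations):
        estimate = transition @ estimate
        estimate[labelled] = annotated[labelled]  # re-clamp the human annotations
    return estimate

# Chain graph 0 - 1 - 2 - 3 with rewards annotated only at the two ends.
chain = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
inferred = propagate_rewards(chain,
                             rewards=[0.0, 0.0, 0.0, 3.0],
                             labelled=[True, False, False, True])
```

On this chain the iteration settles at a fixed point where each unlabelled node equals the average of its two neighbours, which is the kind of fixed-point behaviour the abstract refers to for its own procedure.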

# Cross-Task Linearity Emerges in the Pretraining-Finetuning Paradigm ( http://arxiv.org/abs/2402.03660v1 )

License: see the linked page

Zhanpeng Zhou, Zijun Chen, Yilan Chen, Bo Zhang, Junchi Yan

The pretraining-finetuning paradigm has become the prevailing trend in modern deep learning. In this work, we discover an intriguing linear phenomenon in models that are initialized from a common pretrained checkpoint and finetuned on different tasks, termed Cross-Task Linearity (CTL). Specifically, if we linearly interpolate the weights of two finetuned models, the features in the weight-interpolated model are approximately equal to the linear interpolation of features in two finetuned models at each layer. Such cross-task linearity has not been noted in peer literature. We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We conjecture that in the pretraining-finetuning paradigm, neural networks essentially function as linear maps, mapping from the parameter space to the feature space. Based on this viewpoint, our study unveils novel insights into explaining model merging/editing, particularly by translating operations from the parameter space to the feature space. Furthermore, we delve deeper into the underlying factors for the emergence of CTL, emphasizing the impact of pretraining.

Translated: 2024-02-07 16:59:10 Published: 2024-02-06
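
The CTL statement above compares two quantities per layer: the features of the weight-interpolated model and the interpolation of the two models' features. A toy sketch of how one might measure that gap follows. It does not reproduce the paper's experiments; the two-layer ReLU network, the small random perturbations standing in for fine-tuning, and the names `features` and `interpolate` are assumptions made for illustration.

```python
import numpy as np

def features(params, x):
    """Return the hidden and output features of a toy two-layer ReLU network."""
    w1, w2 = params
    hidden = np.maximum(w1 @ x, 0.0)
    return hidden, w2 @ hidden

def interpolate(a, b, alpha):
    """Element-wise alpha * a + (1 - alpha) * b over a tuple of arrays."""
    return tuple(alpha * pa + (1.0 - alpha) * pb for pa, pb in zip(a, b))

rng = np.random.default_rng(0)
pretrained = (rng.standard_normal((16, 8)), rng.standard_normal((4, 16)))
# Assumption: two small perturbations stand in for fine-tuning on two tasks.
task_a = tuple(p + 0.01 * rng.standard_normal(p.shape) for p in pretrained)
task_b = tuple(p + 0.01 * rng.standard_normal(p.shape) for p in pretrained)

x = rng.standard_normal(8)
alpha = 0.5
merged_feats = features(interpolate(task_a, task_b, alpha), x)                # interpolate weights first
mixed_feats = interpolate(features(task_a, x), features(task_b, x), alpha)   # interpolate features

# Relative gap per layer; values near zero mean the two sides nearly coincide.
gaps = [np.linalg.norm(m - f) / np.linalg.norm(f)
        for m, f in zip(merged_feats, mixed_feats)]
```

In this toy setting the gap is small mainly because the two weight sets stay close to their shared starting point; whether the same holds after real fine-tuning is the empirical question the abstract addresses.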

# Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models ( http://arxiv.org/abs/2402.03659v1 )

License: see the linked page

Kelvin J.L. Koa, Yunshan Ma, Ritchie Ng, Tat-Seng Chua

Explaining stock predictions is generally a difficult task for traditional non-generative deep learning models, where explanations are limited to visualizing the attention weights on important texts. Today, Large Language Models (LLMs) present a solution to this problem, given their known capabilities to generate human-readable explanations for their decision-making process. However, the task of stock prediction remains challenging for LLMs, as it requires the ability to weigh the varying impacts of chaotic social texts on stock prices. The problem gets progressively harder with the introduction of the explanation component, which requires LLMs to explain verbally why certain factors are more important than the others. On the other hand, to fine-tune LLMs for such a task, one would need expert-annotated samples of explanation for every stock movement in the training set, which is expensive and impractical to scale. To tackle these issues, we propose our Summarize-Explain-Predict (SEP) framework, which utilizes a self-reflective agent and Proximal Policy Optimization (PPO) to let an LLM teach itself how to generate explainable stock predictions in a fully autonomous manner. The reflective agent learns how to explain past stock movements through self-reasoning, while the PPO trainer trains the model to generate the most likely explanations from input texts. The training samples for the PPO trainer are also the responses generated during the reflective process, which eliminates the need for human annotators. Using our SEP framework, we fine-tune an LLM that can outperform both traditional deep-learning and LLM methods in prediction accuracy and Matthews correlation coefficient for the stock classification task. To justify the generalization capability of our framework, we further test it on the portfolio construction task, and demonstrate its effectiveness through various portfolio metrics.

Translated: 2024-02-07 16:58:51 Published: 2024-02-06
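
The abstract above reports the Matthews correlation coefficient (MCC) alongside accuracy. For readers unfamiliar with the metric, here is a minimal sketch of the standard binary MCC formula, $(TP \cdot TN - FP \cdot FN) / \sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}$. The function name `mcc`, the 0/1 label encoding, and the convention of returning 0.0 when the denominator is zero are assumptions of this sketch, not details taken from the paper.

```python
import math

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy "price went up / down" labels: one upward move is missed.
score = mcc([1, 1, 0, 0], [1, 0, 0, 0])
```

Unlike plain accuracy, MCC stays near zero for a classifier that always predicts the majority direction, which is presumably why it is reported for the up/down stock classification task.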

# Sentiment-enhanced Graph-based Sarcasm Explanation in Dialogue ( http://arxiv.org/abs/2402.03658v1 )

License: see the linked page

Kun Ouyang and Liqiang Jing and Xuemeng Song and Meng Liu and Yupeng Hu and Liqiang Nie

Sarcasm Explanation in Dialogue (SED) is a new yet challenging task, which aims to generate a natural language explanation for the given sarcastic dialogue that involves multiple modalities (i.e., utterance, video, and audio). Although existing studies have achieved great success based on the generative pretrained language model BART, they overlook exploiting the sentiments residing in the utterance, video and audio, which are vital clues for sarcasm explanation. In fact, it is non-trivial to incorporate sentiments for boosting SED performance, due to three main challenges: 1) diverse effects of utterance tokens on sentiments; 2) gap between video-audio sentiment signals and the embedding space of BART; and 3) various relations among utterances, utterance sentiments, and video-audio sentiments. To tackle these challenges, we propose a novel sEntiment-enhanceD Graph-based multimodal sarcasm Explanation framework, named EDGE. In particular, we first propose a lexicon-guided utterance sentiment inference module, where a heuristic utterance sentiment refinement strategy is devised. We then develop a module named Joint Cross Attention-based Sentiment Inference (JCA-SI) by extending the multimodal sentiment analysis model JCA to derive the joint sentiment label for each video-audio clip. Thereafter, we devise a context-sentiment graph to comprehensively model the semantic relations among the utterances, utterance sentiments, and video-audio sentiments, to facilitate sarcasm explanation generation. Extensive experiments on the publicly released dataset WITS verify the superiority of our model over cutting-edge methods.

Translated: 2024-02-07 16:58:20 Published: 2024-02-06

# Operator SVD with Neural Networks via Nested Low-Rank Approximation ( http://arxiv.org/abs/2402.03655v1 )

License: see the linked page

J. Jon Ryu, Xiangxiang Xu, H. S. Melihcan Erol, Yuheng Bu, Lizhong Zheng, Gregory W. Wornell

(参考訳) 与えられた線形作用素の固有値分解(EVD)を計算したり、その主要な固有値や固有関数を見つけることは、多くの機械学習および科学計算問題において基本的な課題である。高次元固有値問題に対して、固有関数をパラメータ化するためのニューラルネットワークの訓練は、古典的な数値線形代数手法の代替として有望であると考えられている。本稿では,停止特異値分解の低ランク近似解析に基づく新しい最適化フレームワークを提案し,それとともに,最大$l$特異値と特異関数を正しい順序で学習するためのネスティングと呼ばれる新しい手法を提案する。提案手法は,非制約最適化の定式化により,学習関数における所望の直交性を暗黙的かつ効率的に促進する。本稿では,計算物理学と機械学習のユースケースに対する最適化フレームワークの有効性を示す。

Computing the eigenvalue decomposition (EVD) of a given linear operator, or finding its leading eigenvalues and eigenfunctions, is a fundamental task in many machine learning and scientific computing problems. For high-dimensional eigenvalue problems, training neural networks to parameterize the eigenfunctions is considered a promising alternative to classical numerical linear algebra techniques. This paper proposes a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition, accompanied by a new technique called nesting for learning the top-$L$ singular values and singular functions in the correct order. The proposed method promotes the desired orthogonality in the learned functions implicitly and efficiently via an unconstrained optimization formulation, which is easy to solve with off-the-shelf gradient-based optimization algorithms. We demonstrate the effectiveness of the proposed optimization framework for use cases in computational physics and machine learning.
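The low-rank approximation characterization of a truncated SVD can be illustrated in finite dimensions. The sketch below is not the paper's neural-network method; it only checks, for a random matrix, the Eckart-Young fact that the rank-$L$ truncated SVD leaves a squared Frobenius error equal to the sum of the discarded squared singular values. The matrix size, the seed, and the function names are assumptions made for illustration.

```python
import numpy as np

def truncated_svd(a, rank):
    """Return the leading `rank` singular triplets of matrix `a`."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]

def low_rank_error(a, rank):
    """Squared Frobenius error of the rank-`rank` truncated-SVD approximation."""
    u, s, vt = truncated_svd(a, rank)
    approx = (u * s) @ vt  # scale each column of u by its singular value
    return float(np.sum((a - approx) ** 2))

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # stand-in for a discretized linear operator
L = 2
singular_values = np.linalg.svd(A, compute_uv=False)
err = low_rank_error(A, L)
tail = float(np.sum(singular_values[L:] ** 2))  # energy in the discarded directions
```

Here `err` and `tail` agree up to floating-point error, which is the property the optimization framework builds on.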

翻訳日:2024-02-07 16:57:52 公開日:2024-02-06

# 敵対的生成ネットワークにおけるFIDとSIDメトリクスのレビュー

Reviewing FID and SID Metrics on Generative Adversarial Networks ( http://arxiv.org/abs/2402.03654v1 )

ライセンス: Link先を確認

Ricardo de Deijn, Aishwarya Batra, Brandon Koch, Naseef Mansoor, Hema Makkena

(参考訳) generative adversarial network (GAN)モデルの成長は画像処理の能力を高め、多くの産業に現実的な画像変換を生み出す技術を提供している。しかし、最近この分野が確立されているため、この研究をさらに進める新たな評価指標が存在する。これまでの研究では、Fréchet Inception Distance (FID) が実世界のアプリケーションで画像から画像へのGANをテストする上で有効な指標であることが示されている。 2023年に提案されたSID(Signed Inception Distance)は、符号なし距離を許すことでFIDを拡張する。本稿では, Pix2PixおよびCycleGANモデル内のfaçades, cityscapes, mapsからなる公開データセットを使用する。トレーニング後、これらのモデルは、トレーニングされたモデルの生成性能を測定する両方のInception Distance指標に基づいて評価される。以上の結果から,SIDは画像から画像へのGANにFIDを用いて示される能力を補完したり,あるいは超えたりする,効率的かつ効果的な指標であることが示唆された。

The growth of generative adversarial network (GAN) models has increased the ability of image processing and provides numerous industries with the technology to produce realistic image transformations. However, with the field being recently established, there are new evaluation metrics that can further this research. Previous research has shown the Fréchet Inception Distance (FID) to be an effective metric when testing these image-to-image GANs in real-world applications. Signed Inception Distance (SID), a metric introduced in 2023, expands on FID by allowing unsigned distances. This paper uses public datasets that consist of façades, cityscapes, and maps within Pix2Pix and CycleGAN models. After training, these models are evaluated on both inception distance metrics, which measure the generating performance of the trained models. Our findings indicate that SID is an efficient and effective metric that can complement, or even exceed, the ability shown using FID for image-to-image GANs.
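For readers unfamiliar with FID, the sketch below computes the standard Fréchet distance between two Gaussians, which is what FID evaluates on Inception features of real and generated images. It does not implement SID, whose definition the abstract does not give. The random feature array is a stand-in for Inception activations, and the function names are chosen for illustration.

```python
import numpy as np

def gaussian_stats(features):
    """Mean vector and covariance matrix of a (samples, dims) feature array."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)) for two Gaussians."""
    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals the sum of square roots of the eigenvalues of
    # S1 S2, which are real and non-negative for positive semi-definite inputs.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real_features = rng.standard_normal((200, 4))  # stand-in for Inception features
mu, sigma = gaussian_stats(real_features)
same = frechet_distance(mu, sigma, mu, sigma)
shifted = frechet_distance(mu, sigma, mu + np.array([1.0, 2.0, 0.0, 0.0]), sigma)
```

With identical statistics the distance is zero up to rounding, and shifting only the mean by a vector $d$ adds $\|d\|^2$, so `shifted` is close to 5.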

翻訳日:2024-02-07 16:57:35 公開日:2024-02-06

# TGXを用いた時間グラフ解析

Temporal Graph Analysis with TGX ( http://arxiv.org/abs/2402.03651v1 )

ライセンス: Link先を確認

Razieh Shirzadkhani, Shenyang Huang, Elahe Kooshafar, Reihaneh Rabbany, Farimah Poursafaei

(参考訳) 現実世界のネットワークは進化する関係を持ち、時間グラフとして最もよく捉えられている。しかし、既存のソフトウェアライブラリは、時間グラフの動的性質が無視される静的グラフのために主に設計されている。このギャップを埋めて、データ読み込み、データ処理、進化するグラフの分析の自動パイプラインを含む、時間的ネットワークの分析に特化して設計されたPythonパッケージであるTGXを紹介する。 TGXは、11の組込みデータセットと8つの外部テンポラルグラフベンチマーク(TGB)データセット、および.csv形式の任意の新しいデータセットへのアクセスを提供する。データローディング以外にも、TGXは、時間グラフの離散化やノードサブサンプリングといったデータ処理機能を促進して、より大きなデータセットの処理を高速化する。網羅的な調査のために、TGXは、平均ノード度と、タイムスタンプ当たりのノード数とエッジの進化数を含む、さまざまな測定値を提供することで、ネットワーク分析を提供する。さらに、パッケージは、時間的エッジ外観(TEA)や時間的エッジトラフィック(TET)プロットのような時間的パターンの進化を示す有意義な可視化プロットを統合する。 TGXパッケージは、時間グラフの特徴を調べるための堅牢なツールであり、ソーシャルネットワークの研究、引用ネットワークの研究、ユーザインタラクションの追跡など、さまざまな分野で使用することができる。コミュニティのフィードバックに基づいてTGXを継続的にサポートし、更新する予定です。 TGXは、https://github.com/ComplexData-MILA/TGXで公開されている。

Real-world networks, with their evolving relations, are best captured as temporal graphs. However, existing software libraries are largely designed for static graphs where the dynamic nature of temporal graphs is ignored. Bridging this gap, we introduce TGX, a Python package specially designed for analysis of temporal networks that encompasses an automated pipeline for data loading, data processing, and analysis of evolving graphs. TGX provides access to eleven built-in datasets and eight external Temporal Graph Benchmark (TGB) datasets as well as any novel datasets in the .csv format. Beyond data loading, TGX facilitates data processing functionalities such as discretization of temporal graphs and node subsampling to accelerate working with larger datasets. For comprehensive investigation, TGX offers network analysis by providing a diverse set of measures, including average node degree and the evolving number of nodes and edges per timestamp. Additionally, the package consolidates meaningful visualization plots indicating the evolution of temporal patterns, such as Temporal Edge Appearance (TEA) and Temporal Edge Traffic (TET) plots. The TGX package is a robust tool for examining the features of temporal graphs and can be used in various areas like studying social networks, citation networks, and tracking user interactions. We plan to continuously support and update TGX based on community feedback. TGX is publicly available at https://github.com/ComplexData-MILA/TGX.
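To make "discretization of temporal graphs" and "number of nodes and edges per timestamp" concrete, here is a minimal plain-Python sketch. It deliberately does not call the TGX API, which is not described in the abstract; the function names `discretize_edges` and `per_snapshot_counts` and the toy edge list are hypothetical.

```python
from collections import defaultdict

def discretize_edges(edges, bin_width):
    """Group (src, dst, timestamp) edges into fixed-width time bins (snapshots)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    start = min(t for _, _, t in edges)
    snapshots = defaultdict(set)
    for src, dst, t in edges:
        # Repeated edges inside one bin collapse into a single snapshot edge.
        snapshots[(t - start) // bin_width].add((src, dst))
    return dict(snapshots)

def per_snapshot_counts(snapshots):
    """Map each bin index to (number of active nodes, number of edges)."""
    counts = {}
    for bin_index, snapshot_edges in sorted(snapshots.items()):
        nodes = {node for edge in snapshot_edges for node in edge}
        counts[bin_index] = (len(nodes), len(snapshot_edges))
    return counts

edges = [("a", "b", 0), ("a", "b", 3), ("b", "c", 4), ("c", "d", 10), ("a", "d", 11)]
snapshots = discretize_edges(edges, bin_width=5)
counts = per_snapshot_counts(snapshots)
```

Coarser bins shrink the number of snapshots at the cost of temporal resolution, which is the trade-off behind speeding up work on larger datasets.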

翻訳日:2024-02-07 16:57:14 公開日:2024-02-06

# マニフォールド学習によるマルチ線形カーネル回帰とインプット

Multilinear Kernel Regression and Imputation via Manifold Learning ( http://arxiv.org/abs/2402.03648v1 )

ライセンス: Link先を確認

Duc Thien Nguyen and Konstantinos Slavakis

(参考訳) 本稿では,データインプテーションのための新しい非パラメトリックフレームワークであるマルチリニアカーネル回帰(multil-krim)とインプテーション(imputation)を提案する。多様体学習によって動機づけられたMultiL-KRIMは、再現されたカーネルヒルベルト空間に埋め込まれたユーザ不明の滑らかな多様体の内または近くに位置する点雲としてのデータ特徴をモデル化する。グラフ-ラプラシア行列に基づく正規化子による低次元パターンを求める典型的な多様体学習経路とは異なり、MultiL-KRIMは、多様体への接空間の直感的な概念に基づいて構築し、損失関数のデータモデリング項に直接、ポイントクラウド隣人(回帰者)間の協調を組み込む。複数のカーネル関数はロバスト性とリッチな近似性を提供し、複数の行列因子は低ランクのモデリング、次元の縮小、データのトレーニング不要な合理化計算を提供する。 2つの重要なアプリケーションドメインはMultiL-KRIMの機能を示す: 時間変化グラフ信号(TVGS)リカバリと、高速な動的磁気共鳴イメージング(dMRI)データの再構成である。実データおよび合成データに対する大規模な数値実験は、MultiL-KRIMが前者よりも顕著なスピードアップを示し、より直感的で説明しやすいパイプラインで、一般的な「浅すぎる」データインプット技術よりも性能が優れていることを示している。

This paper introduces a novel nonparametric framework for data imputation, coined multilinear kernel regression and imputation via the manifold assumption (MultiL-KRIM). Motivated by manifold learning, MultiL-KRIM models data features as a point cloud located in or close to a user-unknown smooth manifold embedded in a reproducing kernel Hilbert space. Unlike typical manifold-learning routes, which seek low-dimensional patterns via regularizers based on graph-Laplacian matrices, MultiL-KRIM builds instead on the intuitive concept of tangent spaces to manifolds and incorporates collaboration among point-cloud neighbors (regressors) directly into the data-modeling term of the loss function. Multiple kernel functions are allowed to offer robustness and rich approximation properties, while multiple matrix factors offer low-rank modeling, integrate dimensionality reduction, and streamline computations with no need of training data. Two important application domains showcase the functionality of MultiL-KRIM: time-varying-graph-signal (TVGS) recovery, and reconstruction of highly accelerated dynamic-magnetic-resonance-imaging (dMRI) data. Extensive numerical tests on real and synthetic data demonstrate MultiL-KRIM's remarkable speedups over its predecessors, and outperformance over prevalent "shallow" data-imputation techniques, with a more intuitive and explainable pipeline than deep-image-prior methods.

翻訳日:2024-02-07 16:56:51 公開日:2024-02-06

# CAMBranch: ブランチのための拡張MILPによるコントラスト学習

CAMBranch: Contrastive Learning with Augmented MILPs for Branching ( http://arxiv.org/abs/2402.03647v1 )

ライセンス: Link先を確認

Jiacheng Lin, Meng Xu, Zhihua Xiong, Huangang Wang

(参考訳) 最近の進歩は、Mixed Integer Linear Programming(MILP)を解決するためのブランチとバウンド(B&B)ブランチポリシーを強化する機械学習フレームワークを導入している。これらの手法は主にStrong Branchingの模倣学習に依存しており、優れた性能を示している。しかし、模倣学習、特に強い分岐のための専門家サンプルの収集は時間のかかる努力である。この課題に対処するために,元のMILPから得られた限られた専門家データに変数シフトを適用してAugmented MILP(AMILP)を生成するフレームワーク,CAMBranch(Contrastive Learning with Augmented MILPs for Branching)を提案する。このアプローチは、かなりの数のラベル付きエキスパートサンプルの取得を可能にする。 CAMBranchはMILPとAMILPの両方を模倣学習に利用し、対照的な学習を用いてMILPの特徴を捉え、分岐決定の質を向上させる。実験の結果、完全なデータセットの10%でトレーニングされたCAMBranchは優れた性能を示すことがわかった。アブレーション研究は我々の方法の有効性をさらに検証する。

Recent advancements have introduced machine learning frameworks to enhance the Branch and Bound (B&B) branching policies for solving Mixed Integer Linear Programming (MILP). These methods, primarily relying on imitation learning of Strong Branching, have shown superior performance. However, collecting expert samples for imitation learning, particularly for Strong Branching, is a time-consuming endeavor. To address this challenge, we propose Contrastive Learning with Augmented MILPs for Branching (CAMBranch), a framework that generates Augmented MILPs (AMILPs) by applying variable shifting to limited expert data from their original MILPs. This approach enables the acquisition of a considerable number of labeled expert samples. CAMBranch leverages both MILPs and AMILPs for imitation learning and employs contrastive learning to enhance the model's ability to capture MILP features, thereby improving the quality of branching decisions. Experimental results demonstrate that CAMBranch, trained with only 10% of the complete dataset, exhibits superior performance. Ablation studies further validate the effectiveness of our method.
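One plausible reading of "variable shifting" is the substitution $x = x' + s$, which turns the constraints $Ax \le b$ into $Ax' \le b - As$ and changes the objective only by the constant $c^\top s$. The sketch below illustrates that substitution on a toy problem. It is an assumption about the general idea, not the paper's exact AMILP construction; for integer variables the shift would also need to be integral.

```python
import numpy as np

def shift_milp(A, b, c, shift):
    """Rewrite min c@x s.t. A@x <= b under the substitution x = x_new + shift."""
    b_new = b - A @ shift          # constraints become A @ x_new <= b - A @ shift
    obj_offset = float(c @ shift)  # objective changes only by this constant
    return A, b_new, c, obj_offset

A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([8.0, 9.0])
c = np.array([-1.0, -1.0])
shift = np.array([1.0, -2.0])      # integral shift keeps integer variables integer

x = np.array([2.0, 3.0])           # a feasible point of the original problem
A2, b2, c2, offset = shift_milp(A, b, c, shift)
x_new = x - shift                  # the same point expressed in shifted variables
```

Because feasibility and objective ordering are preserved, a branching decision labelled on the original instance carries over to the shifted one, which is what would make such shifted copies usable as extra expert samples.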

翻訳日:2024-02-07 16:56:22 公開日:2024-02-06

# Lens: ネットワークトラフィックの基礎モデル

Lens: A Foundation Model for Network Traffic ( http://arxiv.org/abs/2402.03646v1 )

ライセンス: Link先を確認

Qineng Wang, Chen Qian, Xiaochang Li, Ziyu Yao, Huajie Shao

(参考訳) ネットワークトラフィック(ネットワークトラフィック)とは、インターネットやコンピュータを接続するシステムを通じて送信される情報の量である。ネットワークのセキュリティと管理を改善するには,ネットワークトラフィックの分析と理解が不可欠である。しかし、ネットワークトラフィックの分析は、異種ヘッダやセマンティクスに欠ける暗号化ペイロードなど、データパケットのユニークな特徴のため、大きな課題を生んでいる。トラフィックの潜在的セマンティクスを捉えるために、Transformerエンコーダやデコーダをベースとした事前学習技術を用いて、大規模トラフィックデータから表現を学習する研究がいくつかある。しかし、これらの手法は通常、トラフィック理解(分類)やトラフィック生成タスクでのみ優れている。この問題に対処するために,T5アーキテクチャを利用したネットワークトラフィックモデルLensを開発し,大規模未ラベルデータから事前学習を行う。生成能力を保ちながらグローバル情報をキャプチャするエンコーダ・デコーダ・フレームワークの強みを活かして,大規模ネットワークトラフィックから表現をよりよく学習することができる。事前学習性能をさらに向上するため,MSP(Masked Span Prediction),POP(Packet Order Prediction),HTP(Homologous Traffic Prediction)の3つの異なるタスクを統合した新しい損失を設計した。複数のベンチマークデータセットにおける評価結果は、提案するレンズが、トラフィック理解とトラフィック生成の両方に関連する下流タスクのベースラインを上回っていることを示している。とくに、現在の方法に比べて微調整のためのラベル付きデータもかなり少ない。

Network traffic refers to the amount of information being sent and received over the internet or any system that connects computers. Analyzing and understanding network traffic is vital for improving network security and management. However, the analysis of network traffic poses great challenges due to the unique characteristics of data packets, such as heterogeneous headers and encrypted payload lacking semantics. To capture the latent semantics of traffic, a few studies have adopted pre-training techniques based on the Transformer encoder or decoder to learn the representations from large-scale traffic data. However, these methods typically excel only in traffic understanding (classification) or traffic generation tasks. To address this issue, we develop Lens, a foundational network traffic model that leverages the T5 architecture to learn the pre-trained representations from large-scale unlabeled data. Harnessing the strength of the encoder-decoder framework, which captures the global information while preserving the generative ability, our model can better learn the representations from large-scale network traffic. To further enhance pre-training performance, we design a novel loss that integrates three distinct tasks, namely Masked Span Prediction (MSP), Packet Order Prediction (POP), and Homologous Traffic Prediction (HTP). Evaluation results on multiple benchmark datasets demonstrate that the proposed Lens outperforms the baselines in most downstream tasks related to both traffic understanding and traffic generation. Notably, it also requires considerably less labeled data for fine-tuning compared to current methods.

翻訳日:2024-02-07 16:56:00 公開日:2024-02-06

# Stanceosaurus 2.0: ロシアとスペインの誤報へのスタンス分類

Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation ( http://arxiv.org/abs/2402.03642v1 )

ライセンス: Link先を確認

Anton Lavrouk, Ian Ligon, Tarek Naous, Jonathan Zheng, Alan Ritter, Wei Xu

(参考訳) スタンテオサウルス・コーパス(zheng et al., 2022)は、twitterから抽出された高品質で注釈付き、5方向の姿勢データを提供し、文化横断的および言語横断的誤情報の分析に適するように設計された。 Stanceosaurus 2.0イテレーションでは、このフレームワークをロシア語とスペイン語に拡張しています。前者は西側諸国との緊張が激化し、ウクライナへの激しい侵入が相次いだため、現在の重要性がある。一方、後者は巨大なコミュニティであり、主要なソーシャルメディアプラットフォームでは見過ごされてきた。 41件以上の偽情報のツイートを3,874件追加することで、これらの問題に焦点を当てた研究を支援することを目標としている。このデータの価値を実証するため,多言語BERTのゼロショット交叉移動を用いて,両言語で43のマクロF1スコアを持つStanceosaurusの初期研究と同等の結果を得た。これは多文化的誤情報を識別するための有効なツールとしてスタンス分類の有効性を強調する。

The Stanceosaurus corpus (Zheng et al., 2022) was designed to provide high-quality, annotated, 5-way stance data extracted from Twitter, suitable for analyzing cross-cultural and cross-lingual misinformation. In the Stanceosaurus 2.0 iteration, we extend this framework to encompass Russian and Spanish. The former is of current significance due to prevalent misinformation amid escalating tensions with the West and the violent incursion into Ukraine. The latter, meanwhile, represents an enormous community that has been largely overlooked on major social media platforms. By incorporating an additional 3,874 Spanish and Russian tweets over 41 misinformation claims, our objective is to support research focused on these issues. To demonstrate the value of this data, we employed zero-shot cross-lingual transfer on multilingual BERT, yielding results on par with the initial Stanceosaurus study with a macro F1 score of 43 for both languages. This underlines the viability of stance classification as an effective tool for identifying multicultural misinformation.

翻訳日:2024-02-07 16:55:34 公開日:2024-02-06

# torchmSAT: 最大満足度問題に対するGPU加速近似

torchmSAT: A GPU-Accelerated Approximation To The Maximum Satisfiability Problem ( http://arxiv.org/abs/2402.03640v1 )

ライセンス: Link先を確認

Abdelrahman Hosny, Sherief Reda

(参考訳) 離散構造解析における機械学習技術の顕著な成果は、組合せ最適化アルゴリズムへの統合に大きな注目を集めている。通常、これらの手法は、学習したモデルを解法ループ内に注入することで既存の解法を改善する。本研究では,最大満足度問題(MaxSAT)の解を近似できる単一微分可能関数を導出する。そこで我々は,我々の微分可能な関数をモデル化するための新しいニューラルネットワークアーキテクチャを提案する。このアプローチでは、トレーニングプロセスが解決アルゴリズムとして機能するため、ラベル付きデータやニューラルネットワークトレーニングフェーズが不要になる。さらに,GPUの計算能力を利用して計算を高速化する。 MaxSATインスタンスに挑戦する実験結果から,提案手法は既存の2つのMaxSATソルバよりも優れており,学習や基盤となるSATソルバへのアクセスを必要とせず,ソリューションコストの面で同等であることがわかった。 NPハード問題をMaxSATに還元できることを考えると、我々の新しい手法は、ニューラルネットワークGPUアクセラレーションの恩恵を受ける新しい世代の問題解決者への道を開くものである。

The remarkable achievements of machine learning techniques in analyzing discrete structures have drawn significant attention towards their integration into combinatorial optimization algorithms. Typically, these methodologies improve existing solvers by injecting learned models within the solving loop to enhance the efficiency of the search process. In this work, we derive a single differentiable function capable of approximating solutions for the Maximum Satisfiability Problem (MaxSAT). Then, we present a novel neural network architecture to model our differentiable function, and progressively solve MaxSAT using backpropagation. This approach eliminates the need for labeled data or a neural network training phase, as the training process functions as the solving algorithm. Additionally, we leverage the computational power of GPUs to accelerate these computations. Experimental results on challenging MaxSAT instances show that our proposed methodology outperforms two existing MaxSAT solvers, and is on par with another in terms of solution cost, without necessitating any training or access to an underlying SAT solver. Given that numerous NP-hard problems can be reduced to MaxSAT, our novel technique paves the way for a new generation of solvers poised to benefit from neural network GPU acceleration.
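The idea that "the training process functions as the solving algorithm" can be illustrated with a generic continuous relaxation of MaxSAT. This is not the torchmSAT architecture. Each variable gets a probability through a sigmoid, a clause's unsatisfied score is the product of one minus its literal values, and plain gradient descent on the summed score plays the role of the solver. Clauses use DIMACS-style signed integers; the step count and learning rate are arbitrary assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def literal_values(clause, p):
    """Soft truth value of each literal: p[v] for x_v, 1 - p[v] for NOT x_v."""
    return [p[abs(l) - 1] if l > 0 else 1.0 - p[abs(l) - 1] for l in clause]

def relaxed_loss_and_grad(clauses, p):
    """Sum over clauses of prod(1 - literal value), and its gradient w.r.t. p."""
    loss, grad = 0.0, np.zeros_like(p)
    for clause in clauses:
        miss = [1.0 - v for v in literal_values(clause, p)]
        loss += float(np.prod(miss))
        for i, lit in enumerate(clause):
            others = float(np.prod(miss[:i] + miss[i + 1:]))
            # d(1 - value)/dp is -1 for a positive literal and +1 for a negated one.
            grad[abs(lit) - 1] += (-1.0 if lit > 0 else 1.0) * others
    return loss, grad

def count_satisfied(clauses, assignment):
    return sum(any(assignment[abs(l) - 1] == (l > 0) for l in c) for c in clauses)

def solve(clauses, num_vars, steps=200, lr=1.0):
    theta = np.zeros(num_vars)  # logits; every variable starts at p = 0.5
    for _ in range(steps):
        p = sigmoid(theta)
        _, grad_p = relaxed_loss_and_grad(clauses, p)
        theta -= lr * grad_p * p * (1.0 - p)  # chain rule through the sigmoid
    return sigmoid(theta) > 0.5               # round to a Boolean assignment

clauses = [[1, 2], [-1, 2], [1, -2]]
assignment = solve(clauses, num_vars=2)
satisfied = count_satisfied(clauses, assignment)
```

No labelled data is involved: the only learned quantities are the per-variable logits of the instance being solved. In a tensor framework the same loss would be batched across clauses, which is where GPU acceleration would enter.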

翻訳日:2024-02-07 16:55:11 公開日:2024-02-06

# BEAM:多視点3Dオブジェクト検出のためのベータ分布レイデノイング

BEAM: Beta Distribution Ray Denoising for Multi-view 3D Object Detection ( http://arxiv.org/abs/2402.03634v1 )

ライセンス: Link先を確認

Feng Liu, Tengteng Huang, Qianjing Zhang, Haotian Yao, Chi Zhang, Fang Wan, Qixiang Ye, Yanzhao Zhou

(参考訳) 多視点3Dオブジェクト検出器は、深度情報の欠如による重複予測に苦慮し、偽陽性検出を行う。本研究では,任意のDETR方式のマルチビュー3D検出器に適用可能で,シーンの構造的な事前知識を明示的に取り込む,新しいBeta Distribution Ray DenoisingアプローチであるBEAMを紹介する。カメラからオブジェクトへの光線を生成し、これらの光線に沿ってベータ分布族から空間デノイジングクエリをサンプリングすることにより、BEAMは曖昧な深さから生じる空間的なハードネガティブサンプルを識別するモデルの能力を高める。 BEAMは、トレーニング中にわずかな計算コストのみを追加し、推論速度を維持するプラグイン・アンド・プレイ技術である。 NuScenesデータセットの大規模な実験とアブレーション研究は、強力なベースラインよりも大幅な改善を示し、最先端のStreamPETRを1.9% mAP上回った。コードはhttps://github.com/LiewFeng/BEAM で公開される予定である。

Multi-view 3D object detectors struggle with duplicate predictions due to the lack of depth information, resulting in false positive detections. In this study, we introduce BEAM, a novel Beta Distribution Ray Denoising approach that can be applied to any DETR-style multi-view 3D detector to explicitly incorporate structural prior knowledge of the scene. By generating rays from cameras to objects and sampling spatial denoising queries from the Beta distribution family along these rays, BEAM enhances the model's ability to distinguish spatial hard negative samples arising from ambiguous depths. BEAM is a plug-and-play technique that adds only marginal computational costs during training, while impressively preserving the inference speed. Extensive experiments and ablation studies on the NuScenes dataset demonstrate significant improvements over strong baselines, outperforming the state-of-the-art method StreamPETR by 1.9% mAP. The code will be available at https://github.com/LiewFeng/BEAM.
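The sampling step described above can be sketched geometrically: points are drawn along the camera-to-object ray, with depth offsets taken from a Beta distribution rescaled to a signed range. The Beta parameters, the offset range, and the function name `sample_ray_queries` are illustrative assumptions; how BEAM actually parameterizes and uses these queries is not specified in the abstract.

```python
import numpy as np

def sample_ray_queries(camera, obj_center, n, alpha, beta, max_offset, rng):
    """Sample n 3D points on the camera->object ray with Beta-distributed depth offsets."""
    direction = obj_center - camera
    depth = np.linalg.norm(direction)
    unit = direction / depth
    # Beta samples lie in [0, 1]; rescale them to offsets in [-max_offset, max_offset].
    offsets = (rng.beta(alpha, beta, size=n) * 2.0 - 1.0) * max_offset
    return camera + (depth + offsets)[:, None] * unit

rng = np.random.default_rng(0)
camera = np.array([0.0, 0.0, 0.0])
obj_center = np.array([3.0, 4.0, 0.0])
queries = sample_ray_queries(camera, obj_center, n=8, alpha=2.0, beta=2.0,
                             max_offset=1.0, rng=rng)
```

Every sampled point projects to the same image location as the object but sits at a different depth, which is what makes such points hard negatives for a detector that lacks depth information.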

翻訳日:2024-02-07 16:54:38 公開日:2024-02-06

# より深い理解のための能動的問合せを用いた言語モデルの構築

Empowering Language Models with Active Inquiry for Deeper Understanding ( http://arxiv.org/abs/2402.03719v1 )

ライセンス: Link先を確認

Jing-Cheng Pang, Heng-Bo Fan, Pengyuan Wang, Jia-Hao Xiao, Nan Tang, Si-Hang Yang, Chengxing Jia, Sheng-Jun Huang, Yang Yu

(参考訳) 大規模言語モデル(LLM)の台頭は、自然言語を通じて人工知能システムと対話する方法に革命をもたらした。しかし、LLMは不確実な意図のためにユーザクエリを誤解釈することが多く、あまり役に立たない応答につながる。自然の人間との相互作用では、不明な情報を明らかにするために標的とした質問を通じて明確化が求められる。そこで本稿では,LLMに同じレベルの対話性を与えるよう設計されたLaMAI(Language Model with Active Inquiry)を提案する。 LaMAIはアクティブな学習技術を活用して、最も有益な質問を提起し、動的双方向対話を促進する。このアプローチはコンテキストギャップを狭めるだけでなく、LLMの出力を洗練し、ユーザの期待とより密接に一致させる。 LLMが会話の文脈に制限がある様々な複雑なデータセットを対象とした実証研究は、LaMAIの有効性を実証している。解答精度は31.9%から50.9%に向上し、他の主要な質問応答フレームワークを上回っている。さらに、人間の参加者を含むシナリオでは、LaMAIは一貫して82%以上のケースにおいて、ベースライン手法と同等以上の応答を生成する。 LaMAIの適用性はさらに、様々なLLMとの統合の成功によって証明されており、対話型言語モデルの将来の可能性を強調している。

The rise of large language models (LLMs) has revolutionized the way that we interact with artificial intelligence systems through natural language. However, LLMs often misinterpret user queries because of their uncertain intention, leading to less helpful responses. In natural human interactions, clarification is sought through targeted questioning to uncover obscure information. Thus, in this paper, we introduce LaMAI (Language Model with Active Inquiry), designed to endow LLMs with this same level of interactive engagement. LaMAI leverages active learning techniques to raise the most informative questions, fostering a dynamic bidirectional dialogue. This approach not only narrows the contextual gap but also refines the output of the LLMs, aligning it more closely with user expectations. Our empirical studies, across a variety of complex datasets where LLMs have limited conversational context, demonstrate the effectiveness of LaMAI. The method improves answer accuracy from 31.9% to 50.9%, outperforming other leading question-answering frameworks. Moreover, in scenarios involving human participants, LaMAI consistently generates responses that are superior or comparable to baseline methods in more than 82% of the cases. The applicability of LaMAI is further evidenced by its successful integration with various LLMs, highlighting its potential for the future of interactive language models.

翻訳日:2024-02-07 16:46:51 公開日:2024-02-06

# 大規模言語モデルによる協調フレームワークによるロボットの自動開発

Automatic Robotic Development through Collaborative Framework by Large Language Models ( http://arxiv.org/abs/2402.03699v1 )

ライセンス: Link先を確認

Zhirong Luan and Yujun Lai

(参考訳) 大きな言語モデル LLM の驚くべきコード生成能力にもかかわらず、それらは複雑なタスクハンドリングの課題に直面している。高度に複雑な分野であるロボット開発は、本質的には、タスクアロケーションと協力的なチームワークに人間の関与を要求する。ロボット開発を促進するために,現実のロボット開発に触発された革新的な自動協調フレームワークを提案する。このフレームワークは複数のllmを異なる役割アナリスト、プログラマ、テスターに採用している。アナリストはユーザー要件を深く掘り下げ、プログラマが正確なコードを作成できるようにし、テスタは実際のロボットアプリケーションのユーザフィードバックに基づいてパラメータを微調整する。各llmは開発プロセス内で多様な重要なタスクに取り組みます。明確なコラボレーションルールは、LLM間の現実のチームワークをエミュレートします。アナリスト、プログラマ、テスターは、戦略、コード、パラメータ調整を監督する結束したチームを形成します。この枠組みにより, 専門知識を必要とせず, 非専門家のみに頼り, 複雑なロボット開発を実現する。

Despite the remarkable code generation abilities of large language models (LLMs), they still face challenges in complex task handling. Robot development, a highly intricate field, inherently demands human involvement in task allocation and collaborative teamwork. To enhance robot development, we propose an innovative automated collaboration framework inspired by real-world robot developers. This framework employs multiple LLMs in distinct roles: analysts, programmers, and testers. Analysts delve deep into user requirements, enabling programmers to produce precise code, while testers fine-tune the parameters based on user feedback for practical robot application. Each LLM tackles diverse, critical tasks within the development process. Clear collaboration rules emulate real-world teamwork among LLMs. Analysts, programmers, and testers form a cohesive team overseeing strategy, code, and parameter adjustments. Through this framework, we achieve complex robot development without requiring specialized knowledge, relying solely on the participation of non-experts.

翻訳日:2024-02-07 16:46:29 公開日:2024-02-06

# 大規模局所学習係数の推定

Estimating the Local Learning Coefficient at Scale ( http://arxiv.org/abs/2402.03698v1 )

ライセンス: Link先を確認

Zach Furman, Edmund Lau

(参考訳) local learning coefficient (LLC) はモデル複雑性を定量化する原理的な方法であり、もともとは特異学習理論(SLT)を用いてベイズ統計の文脈から導かれた。局所学習係数を数値的に推定する手法はいくつか知られているが、現在のディープラーニングアーキテクチャやデータセットの規模には拡張されていない。 arXiv:2308.12108 [stat.ML] で開発された手法を用いて、深い線形ネットワーク(DLN)を最大100Mパラメータまで正確に自己整合的に測定する方法を実証的に示す。また, 推定LLCは, 理論量に対する再スケーリング不変性を有することを示す。

The local learning coefficient (LLC) is a principled way of quantifying model complexity, originally derived in the context of Bayesian statistics using singular learning theory (SLT). Several methods are known for numerically estimating the local learning coefficient, but so far these methods have not been extended to the scale of modern deep learning architectures or data sets. Using a method developed in arXiv:2308.12108 [stat.ML], we empirically show how the LLC may be measured accurately and self-consistently for deep linear networks (DLNs) up to 100M parameters. We also show that the estimated LLC has the rescaling invariance that holds for the theoretical quantity.

翻訳日:2024-02-07 16:46:16 公開日:2024-02-06

# SHMC-Net: 精子頭部形態分類のためのマスク誘導機能融合ネットワーク

SHMC-Net: A Mask-guided Feature Fusion Network for Sperm Head Morphology Classification ( http://arxiv.org/abs/2402.03697v1 )

ライセンス: Link先を確認

Nishchal Sapkota, Yejia Zhang, Sirui Li, Peixian Liang, Zhuo Zhao, Danny Z Chen

(参考訳) 男性不妊は世界の不妊患者の約3分の1を占める。頭部形態解析による精子異常の手動評価は、専門家の間で観察者の変動と診断上の相違の問題に遭遇する。その代わり、casa(computer-assisted semen analysis)は、低品質の精子画像、小さなデータセット、騒がしいクラスラベルに苦しむ。精子頭の形態分類のための新しいアプローチであるshmc-netを提案し,精子頭のセグメンテーションマスクを用いて精子画像の形態分類を導く。 SHMC-Netは、画像プリエントを用いて信頼性の高いセグメンテーションマスクを生成し、効率的なグラフベースの手法でオブジェクト境界を洗練し、精子頭作物とマスクネットワークをトレーニングする。ネットワークの中間段階では、画像とマスクの特徴を融合スキームで融合させ、形態的特徴をよりよく学習する。ノイズの多いクラスラベルの処理と小さなデータセットでのトレーニングの正規化のために、SHMC-NetはSoft Mixupを適用して、ミックスアップ拡張と損失関数を組み合わせた。 scian と hushem のデータセットで最先端の成果を達成し,事前トレーニングやコストのかかるセンシング手法を駆使した手法よりも優れています。

Male infertility accounts for about one-third of global infertility cases. Manual assessment of sperm abnormalities through head morphology analysis encounters issues of observer variability and diagnostic discrepancies among experts. Its alternative, Computer-Assisted Semen Analysis (CASA), suffers from low-quality sperm images, small datasets, and noisy class labels. We propose a new approach for sperm head morphology classification, called SHMC-Net, which uses segmentation masks of sperm heads to guide the morphology classification of sperm images. SHMC-Net generates reliable segmentation masks using image priors, refines object boundaries with an efficient graph-based method, and trains an image network with sperm head crops and a mask network with the corresponding masks. In the intermediate stages of the networks, image and mask features are fused with a fusion scheme to better learn morphological features. To handle noisy class labels and regularize training on small datasets, SHMC-Net applies Soft Mixup to combine mixup augmentation and a loss function. We achieve state-of-the-art results on SCIAN and HuSHeM datasets, outperforming methods that use additional pre-training or costly ensembling techniques.

翻訳日:2024-02-07 16:46:03 公開日:2024-02-06

# ConUNETR:3次元マイクロCT軟骨分割のためのコンディショナルトランスフォーマネットワーク

ConUNETR: A Conditional Transformer Network for 3D Micro-CT Embryonic Cartilage Segmentation ( http://arxiv.org/abs/2402.03695v1 )

ライセンス: Link先を確認

Nishchal Sapkota, Yejia Zhang, Susan M. Motch Perrine, Yuhan Hsi, Sirui Li, Meng Wu, Greg Holmes, Abdul R. Abdulai, Ethylin W. Jabs, Joan T. Richtsmeier, Danny Z Chen

(参考訳) 軟骨および骨構造の形態発達の研究は、生命を脅かす骨格形態の早期発見に不可欠である。胚軟骨は数時間以内に急速な構造変化を起こし、複数の胚年齢層にわたって推測される深層学習に基づくセグメンテーションモデルの一般化を制限する生物学的変異と形態変化をもたらす。年齢グループごとに個別のモデルを取得することは高価で効果が低いが、直接転送(トレーニング中に見えない年齢を予測する)は形態変化による潜在的なパフォーマンス低下に悩まされる。本研究では, 形態学的に多様な情報を条件付き機構で蒸留するトランスフォーマーを用いた新しいセグメンテーションモデルを提案する。これにより、1つのモデルが複数の年齢グループで正確に軟骨を予測できる。実験では,他の競合セグメンテーションモデルと比較して,新しいモデルの優位性を示した。異なる変異を持つマウス軟骨データセットに関するさらなる研究は、モデルが良好に一般化し、年齢ベースの軟骨形態パターンを効果的に捉えていることを示している。

Studying the morphological development of cartilaginous and osseous structures is critical to the early detection of life-threatening skeletal dysmorphology. Embryonic cartilage undergoes rapid structural changes within hours, introducing biological variations and morphological shifts that limit the generalization of deep learning-based segmentation models that infer across multiple embryonic age groups. Obtaining individual models for each age group is expensive and less effective, while direct transfer (predicting an age unseen during training) suffers a potential performance drop due to morphological shifts. We propose a novel Transformer-based segmentation model with improved biological priors that better distills morphologically diverse information through conditional mechanisms. This enables a single model to accurately predict cartilage across multiple age groups. Experiments on the mice cartilage dataset show the superiority of our new model compared to other competitive segmentation models. Additional studies on a separate mice cartilage dataset with a distinct mutation show that our model generalizes well and effectively captures age-based cartilage morphology patterns.

翻訳日:2024-02-07 16:45:40 公開日:2024-02-06

# ServeFlow: ネットワークトラフィック分析のための高速スローモデルアーキテクチャ

ServeFlow: A Fast-Slow Model Architecture for Network Traffic Analysis ( http://arxiv.org/abs/2402.03694v1 )

ライセンス: Link先を確認

Shinan Liu, Ted Shaowang, Gerry Wan, Jeewon Chae, Jonatas Marques, Sanjay Krishnan, Nick Feamster

(参考訳) インターネットが統合され、トラフィックが暗号化されるにつれて、ネットワークトラフィック分析はますます複雑な機械学習モデルを使用するようになっている。しかし、高帯域幅ネットワークでは、フローがモデル推論速度よりも早く到達できる。ネットワークフローの時間的性質は、他の高速機械学習アプリケーションで利用される単純なスケールアウトアプローチを制限する。そこで本稿では,ネットワークトラフィック分析タスクを対象とした機械学習モデルのServeFlowを提案する。これは,収集するパケットの数と,個々のフローに適用するモデルを選択して,最小レイテンシ,高サービスレート,高精度のバランスを実現する。同じタスクでは、モデル間の推論時間は2.7x-136.3xで、中央のパッケージ間待機時間は推論時間より6-8桁高いことがよくあります。 ServeFlowは、76.3%のフローを16ms以下で推論することが可能で、これは、サービスレートを高め、同様の精度を維持しながら、中央のエンドツーエンドサービスレイテンシで40.5倍のスピードアップである。 1フローに何千もの機能があるとしても、16コアのcpuコモディティサーバ上で毎秒48.5k以上の新しいフローを処理し、都市レベルのネットワークバックボーンで観測される流量の桁数に合致する。

Network traffic analysis increasingly uses complex machine learning models as the internet consolidates and traffic gets more encrypted. However, over high-bandwidth networks, flows can easily arrive faster than model inference rates. The temporal nature of network flows limits simple scale-out approaches leveraged in other high-traffic machine learning applications. Accordingly, this paper presents ServeFlow, a solution for machine-learning model serving aimed at network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply for individual flows to achieve a balance between minimal latency, high service rate, and high accuracy. We identify that on the same task, inference time across models can differ by 2.7x-136.3x, while the median inter-packet waiting time is often 6-8 orders of magnitude higher than the inference time! ServeFlow is able to make inferences on 76.3% of flows in under 16ms, which is a speed-up of 40.5x on the median end-to-end serving latency while increasing the service rate and maintaining similar accuracy. Even with thousands of features per flow, it achieves a service rate of over 48.5k new flows per second on a 16-core CPU commodity server, which matches the order of magnitude of flow rates observed on city-level network backbones.
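A fast-slow architecture of this kind is commonly realized as a confidence-gated cascade: a cheap model answers first, and only uncertain flows are escalated to an expensive model. The sketch below shows that generic pattern with toy stand-in models. The threshold, the features, and the routing rule are assumptions, and ServeFlow's real policy, which also decides how many packets to wait for, is not reproduced here.

```python
def fast_model(flow):
    """Cheap stand-in: looks only at the first packet size."""
    return ("video", 0.90) if flow[0] > 1000 else ("web", 0.55)

def slow_model(flow):
    """Expensive stand-in: uses the mean size of all collected packets."""
    mean_size = sum(flow) / len(flow)
    return ("video" if mean_size > 800 else "web", 0.99)

def cascade_predict(flow, fast, slow, threshold):
    """Return (label, path), escalating to the slow model only when unsure."""
    label, confidence = fast(flow)
    if confidence >= threshold:
        return label, "fast"
    label, _ = slow(flow)
    return label, "slow"

flows = [[1400, 1300, 1500], [200, 1200, 1400]]
results = [cascade_predict(f, fast_model, slow_model, threshold=0.8) for f in flows]
```

Raising the threshold sends more flows down the slow path, trading service rate for accuracy; the abstract's figure of 76.3% of flows served in under 16ms corresponds to the share handled without that escalation cost.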

翻訳日:2024-02-07 16:45:23 公開日:2024-02-06

# 3Doodle: 3Dストロークによるオブジェクトのコンパクト抽象化

3Doodle: Compact Abstraction of Objects with 3D Strokes ( http://arxiv.org/abs/2402.03690v1 )

ライセンス: Link先を確認

Changwoon Choi, Jaeah Lee, Jaesik Park, Young Min Kim

(参考訳) フリーハンドのスケッチは長い間、物体の特徴を伝えるための効率的な表現として機能してきたが、しばしば主観的であり、現実的な表現からかなり逸脱している。さらに、スケッチは任意の視点で一貫性がなく、3d形状を捉えるのが困難である。対象オブジェクトのマルチビュー画像に対して記述的かつビュー一貫性のあるスケッチ画像を生成する3Doooleを提案する。本手法は,3次元ストロークの集合が3次元構造情報を効率よく表現し,表示に一貫性のある2次元スケッチを描画できるという考えに基づいている。 2次元スケッチをビューに依存しないコンポーネントとビューに依存しないコンポーネントの結合として表現する。 3次元立方体Bエジエ曲線はビューに依存しない3次元特徴線を示すが、超四角形の輪郭は様々な視点の体積の滑らかな輪郭を表す。我々のパイプラインは、3Dストロークプリミティブのパラメータを直接最適化し、知覚的損失を完全に微分可能な方法で最小化する。得られた3dストロークのスパース集合は、様々なオブジェクトの必須の3d特性形状を含む抽象スケッチとして表現することができる。近年のスケッチ生成手法と比較して、3Doodleはオリジナル画像の概念を忠実に表現できることを示す。

While free-hand sketching has long served as an efficient representation to convey characteristics of an object, sketches are often subjective, deviating significantly from realistic representations. Moreover, sketches are not consistent for arbitrary viewpoints, making it hard to capture 3D shapes. We propose 3Doodle, generating descriptive and view-consistent sketch images given multi-view images of the target object. Our method is based on the idea that a set of 3D strokes can efficiently represent 3D structural information and render view-consistent 2D sketches. We express 2D sketches as a union of view-independent and view-dependent components. 3D cubic Bézier curves indicate view-independent 3D feature lines, while contours of superquadrics express a smooth outline of the volume of varying viewpoints. Our pipeline directly optimizes the parameters of 3D stroke primitives to minimize perceptual losses in a fully differentiable manner. The resulting sparse set of 3D strokes can be rendered as abstract sketches containing essential 3D characteristic shapes of various objects. We demonstrate that 3Doodle can faithfully express concepts of the original images compared with recent sketch generation approaches.
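The view-independent stroke primitive named above, a 3D cubic Bézier curve, has a standard closed form: $B(t) = (1-t)^3 P_0 + 3(1-t)^2 t P_1 + 3(1-t) t^2 P_2 + t^3 P_3$. The sketch below evaluates it and projects the result to 2D. The orthographic projection and the control points are simplifying assumptions; the paper's differentiable renderer and camera model are not described in the abstract.

```python
import numpy as np

def cubic_bezier(control_points, t):
    """Evaluate a cubic Bezier curve with four 3D control points at 1-D parameters t."""
    p0, p1, p2, p3 = np.asarray(control_points, dtype=float)
    t = np.asarray(t, dtype=float)[:, None]  # column vector so it broadcasts over xyz
    s = 1.0 - t
    return s**3 * p0 + 3 * s**2 * t * p1 + 3 * s * t**2 * p2 + t**3 * p3

def project_orthographic(points):
    """Toy camera: drop the z coordinate to get 2D sketch coordinates."""
    return points[:, :2]

control = [[0, 0, 0], [1, 2, 0], [3, 2, 1], [4, 0, 1]]  # hypothetical stroke
curve = cubic_bezier(control, np.linspace(0.0, 1.0, 5))
sketch_2d = project_orthographic(curve)
```

Because every curve point is a smooth function of the four control points, a loss computed on the rendered 2D stroke can be differentiated back to those twelve numbers, which is what makes direct optimization of stroke parameters possible.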

翻訳日:2024-02-07 16:45:01 公開日:2024-02-06

# A Survey of Privacy Threats and Defense in Vertical Federated Learning: From Model Life Cycle Perspective

http://arxiv.org/abs/2402.03688v1

License: see the linked page

Lei Yu, Meng Han, Yiming Li, Changting Lin, Yao Zhang, Mingyang Zhang, Yan Liu, Haiqin Weng, Yuseok Jeon, Ka-Ho Chow, Stacy Patterson

Vertical Federated Learning (VFL) is a federated learning paradigm where multiple participants, who share the same set of samples but hold different features, jointly train machine learning models. Although VFL enables collaborative machine learning without sharing raw data, it is still susceptible to various privacy threats. In this paper, we conduct the first comprehensive survey of the state-of-the-art in privacy attacks and defenses in VFL. We provide taxonomies for both attacks and defenses, based on their characterizations, and discuss open challenges and future research directions. Specifically, our discussion is structured around the model's life cycle, by delving into the privacy threats encountered during different stages of machine learning and their corresponding countermeasures. This survey not only serves as a resource for the research community but also offers clear guidance and actionable insights for practitioners to safeguard data privacy throughout the model's life cycle.

Translated: 2024-02-07 16:44:41 Published: 2024-02-06

# Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation

http://arxiv.org/abs/2402.03687v1

License: see the linked page

Lingxiao Zhao, Xueying Ding, Leman Akoglu

Graph generation has been dominated by autoregressive models due to their simplicity and effectiveness, despite their sensitivity to ordering. Yet diffusion models have garnered increasing attention, as they offer comparable performance while being permutation-invariant. Current graph diffusion models generate graphs in a one-shot fashion, but they require extra features and thousands of denoising steps to achieve optimal performance. We introduce PARD, a Permutation-invariant Autoregressive Diffusion model that integrates diffusion models with autoregressive methods. PARD harnesses the effectiveness and efficiency of the autoregressive model while maintaining permutation invariance without ordering sensitivity. Specifically, we show that, contrary to sets, elements in a graph are not entirely unordered and there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network. To ensure efficiency while being expressive, we further propose a higher-order graph transformer, which integrates the transformer with PPGN. Like GPT, we extend the higher-order graph transformer to support parallel training of all blocks. Without any extra features, PARD achieves state-of-the-art performance on molecular and non-molecular datasets, and scales to large datasets like MOSES containing 1.9M molecules.

Translated: 2024-02-07 16:44:24 Published: 2024-02-06

# Minds versus Machines: Rethinking Entailment Verification with Language Models

http://arxiv.org/abs/2402.03686v1

License: see the linked page

Soumya Sanyal, Tianyi Xiao, Jiacheng Liu, Wenya Wang, Xiang Ren

Humans make numerous inferences in text comprehension to understand discourse. This paper aims to understand the commonalities and disparities in the inference judgments between humans and state-of-the-art Large Language Models (LLMs). Leveraging a comprehensively curated entailment verification benchmark, we evaluate both human and LLM performance across various reasoning categories. Our benchmark includes datasets from three categories (NLI, contextual QA, and rationales) that include multi-sentence premises and different knowledge types, thereby evaluating the inference capabilities in complex reasoning instances. Notably, our findings reveal LLMs' superiority in multi-hop reasoning across extended contexts, while humans excel in tasks necessitating simple deductive reasoning. Leveraging these insights, we introduce a fine-tuned Flan-T5 model that outperforms GPT-3.5 and rivals GPT-4, offering a robust open-source solution for entailment verification. As a practical application, we showcase the efficacy of our fine-tuned model in enhancing self-consistency in model-generated explanations, resulting in a 6% performance boost on average across three multiple-choice question-answering datasets.

Translated: 2024-02-07 16:44:01 Published: 2024-02-06

# RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

http://arxiv.org/abs/2402.03681v1

License: see the linked page

Yufei Wang, Zhanyi Sun, Jesse Zhang, Zhou Xian, Erdem Biyik, David Held, Zackory Erickson

Reward engineering has long been a challenge in Reinforcement Learning (RL) research, as it often requires extensive human effort and iterative processes of trial-and-error to design effective reward functions. In this paper, we propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks, using only a text description of the task goal and the agent's visual observations, by leveraging feedback from vision language foundation models (VLMs). The key to our approach is to query these models to give preferences over pairs of the agent's image observations based on the text description of the task goal, and then learn a reward function from the preference labels, rather than directly prompting these models to output a raw reward score, which can be noisy and inconsistent. We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains - including classic control, as well as manipulation of rigid, articulated, and deformable objects - without the need for human supervision, outperforming prior methods that use large pretrained models for reward generation under the same assumptions.

Translated: 2024-02-07 16:43:41 Published: 2024-02-06
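The RL-VLM-F abstract above learns a reward function from pairwise preference labels rather than from raw reward scores. A common way to do this in preference-based RL is the Bradley-Terry model. The abstract does not say which objective the authors use, so the sketch below is an assumption, and the one-dimensional "observations" and the linear reward `w * x` are purely illustrative.

```python
import math


def preference_probability(reward_a, reward_b):
    """Bradley-Terry probability that observation A is preferred over B."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))


def preference_loss(reward_a, reward_b, label):
    """Cross-entropy against a label: 1.0 if A was preferred, 0.0 if B was."""
    p = preference_probability(reward_a, reward_b)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


# Toy data: each observation is a single number (say, distance to the goal),
# and the preferred observation in each pair is the one closer to the goal.
pairs = [((0.2,), (0.9,), 1.0), ((0.8,), (0.1,), 0.0), ((0.3,), (0.5,), 1.0)]

w = 0.0  # hypothetical reward model: reward(x) = w * x[0]
for _ in range(200):
    grad = 0.0
    for a, b, label in pairs:
        p = preference_probability(w * a[0], w * b[0])
        grad += (p - label) * (a[0] - b[0])  # d(loss)/dw for this pair
    w -= 0.5 * grad
```

After training, `w` is negative, so smaller distances receive higher reward, which matches the preference labels without any pair ever being given a numeric score.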

# Logical Specifications-guided Dynamic Task Sampling for Reinforcement Learning Agents

http://arxiv.org/abs/2402.03678v1

License: see the linked page

Yash Shukla, Wenchang Gao, Vasanth Sarathy, Alvaro Velasquez, Robert Wright and Jivko Sinapov

Reinforcement Learning (RL) has made significant strides in enabling artificial agents to learn diverse behaviors. However, learning an effective policy often requires a large number of environment interactions. To mitigate sample complexity issues, recent approaches have used high-level task specifications, such as Linear Temporal Logic (LTL$_f$) formulas or Reward Machines (RM), to guide the learning progress of the agent. In this work, we propose a novel approach, called Logical Specifications-guided Dynamic Task Sampling (LSTS), that learns a set of RL policies to guide an agent from an initial state to a goal state based on a high-level task specification, while minimizing the number of environmental interactions. Unlike previous work, LSTS does not assume information about the environment dynamics or the Reward Machine, and dynamically samples promising tasks that lead to successful goal policies. We evaluate LSTS on a gridworld and show that it achieves improved time-to-threshold performance on complex sequential decision-making problems compared to state-of-the-art RM and Automaton-guided RL baselines, such as Q-Learning for Reward Machines and Compositional RL from logical Specifications (DIRL). Moreover, we demonstrate that our method outperforms RM and Automaton-guided RL baselines in terms of sample efficiency, both in a partially observable robotic task and in a continuous control robotic manipulation task.

Translated: 2024-02-07 16:43:18 Published: 2024-02-06

# Effective Protein-Protein Interaction Exploration with PPIretrieval

http://arxiv.org/abs/2402.03675v1

License: see the linked page

Chenqing Hua, Connor Coley, Guy Wolf, Doina Precup, Shuangjia Zheng

Protein-protein interactions (PPIs) are crucial in regulating numerous cellular functions, including signal transduction, transportation, and immune defense. As the accuracy of multi-chain protein complex structure prediction improves, the challenge has shifted towards effectively navigating the vast complex universe to identify potential PPIs. Herein, we propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration, which leverages existing PPI data to effectively search for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces. When provided with an unseen query protein with its associated binding site, PPIretrieval effectively identifies a potential binding partner along with its corresponding binding site in an embedding space, facilitating the formation of protein-protein complexes.

Translated: 2024-02-07 16:42:48 Published: 2024-02-06

# Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning

http://arxiv.org/abs/2402.03667v1

License: see the linked page

Yanfang Zhang, Yiliu Sun, Yibing Zhan, Dapeng Tao, Dacheng Tao, Chen Gong

Recently, increasing attention has been drawn to improving the ability of Large Language Models (LLMs) to perform complex reasoning. However, previous methods, such as Chain-of-Thought and Self-Consistency, mainly follow Direct Reasoning (DR) frameworks, so they struggle with the many real-world tasks that can hardly be solved via DR. Therefore, to strengthen the reasoning power of LLMs, this paper proposes a novel Indirect Reasoning (IR) method that employs the logic of contrapositives and contradictions to tackle IR tasks such as factual reasoning and mathematical proof. Specifically, our methodology comprises two steps. Firstly, we leverage the logical equivalence of the contrapositive to augment the data and rules so that they are easier for LLMs to comprehend. Secondly, we design a set of prompt templates to trigger LLMs to conduct IR based on proof by contradiction, which is logically equivalent to the original DR process. Our IR method is simple yet effective and can be straightforwardly integrated with existing DR methods to further boost the reasoning abilities of LLMs. The experimental results on popular LLMs, such as GPT-3.5-turbo and Gemini-pro, show that our IR method enhances the overall accuracy of factual reasoning by 27.33% and mathematical proof by 31.43%, when compared with traditional DR methods. Moreover, the methods combining IR and DR significantly outperform the methods solely using IR or DR, further demonstrating the effectiveness of our strategy.

Translated: 2024-02-07 16:42:32 Published: 2024-02-06
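The indirect-reasoning abstract above rests on two standard logical facts: a rule is equivalent to its contrapositive, and a goal can be proved by showing that its negation contradicts the premises. The following truth-table check illustrates both for two propositional variables. It is only a sketch of the underlying logic, not the paper's prompt templates, and all function names are illustrative.

```python
from itertools import product


def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q


def equivalent(f, g):
    """True if two 2-variable Boolean formulas agree on every assignment."""
    return all(f(p, q) == g(p, q) for p, q in product([False, True], repeat=2))


def original(p, q):
    return implies(p, q)


def contrapositive(p, q):
    return implies(not q, not p)


def converse(p, q):
    return implies(q, p)


def refutes(premises, negated_goal):
    """Proof by contradiction: premises plus the negated goal are unsatisfiable."""
    return not any(
        all(f(p, q) for f in premises) and negated_goal(p, q)
        for p, q in product([False, True], repeat=2)
    )


# From "p -> q" and "p", the goal "q" follows: assuming "not q" is contradictory.
proved = refutes([original, lambda p, q: p], lambda p, q: not q)
```

The contrapositive passes the equivalence check while the converse does not, which is why only the former is a safe way to augment a rule set.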

# QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

http://arxiv.org/abs/2402.03666v1

License: see the linked page

Haoxuan Wang, Yuzhang Shang, Zhihang Yuan, Junyi Wu, Yan Yan

Diffusion models have achieved remarkable success in image generation tasks, yet their practical deployment is restrained by high memory and time consumption. While quantization paves a way for diffusion model compression and acceleration, existing methods fail completely when the models are quantized to low bit-widths. In this paper, we unravel three properties in quantized diffusion models that compromise the efficacy of current methods: imbalanced activation distributions, imprecise temporal information, and vulnerability to perturbations of specific modules. To alleviate the intensified low-bit quantization difficulty stemming from the distribution imbalance, we propose finetuning the quantized model to better adapt to the activation distribution. Building on this idea, we identify two critical types of quantized layers: those holding vital temporal information and those sensitive to reduced bit-width, and finetune them to mitigate performance degradation efficiently. We empirically verify that our approach modifies the activation distribution and provides meaningful temporal information, facilitating easier and more accurate quantization. Our method is evaluated over three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings, as well as being the first method to generate readable images on full 4-bit (i.e., W4A4) Stable Diffusion.

Translated: 2024-02-07 16:42:00 Published: 2024-02-06
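The QuEST abstract above names imbalanced activation distributions as one reason low-bit quantization breaks down. The sketch below shows that effect with a generic min-max uniform quantizer at 4 bits. It is an assumed textbook quantizer used for illustration, not the QuEST method, and the toy activation values are made up.

```python
def quantize_uniform(values, bits):
    """Min-max uniform quantization; returns (dequantized values, step size)."""
    levels = 2 ** bits
    lo, hi = min(values), max(values)
    step = (hi - lo) / (levels - 1)
    codes = [round((v - lo) / step) for v in values]  # integers in [0, levels - 1]
    return [lo + c * step for c in codes], step


def max_error(values, bits):
    """Largest absolute difference between a value and its quantized version."""
    dequantized, _ = quantize_uniform(values, bits)
    return max(abs(v - d) for v, d in zip(values, dequantized))


balanced = [i / 100 for i in range(101)]  # activations spread over [0, 1]
imbalanced = balanced + [50.0]            # the same values plus one outlier

err_balanced = max_error(balanced, bits=4)
err_imbalanced = max_error(imbalanced, bits=4)
```

With only 16 levels, a single outlier stretches the step size so far that every value in [0, 1] collapses onto the same level, so the error on the bulk of the distribution grows by more than an order of magnitude.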

# AoSRNet: All-in-One Scene Recovery Networks via Multi-knowledge Integration

http://arxiv.org/abs/2402.03738v1

License: see the linked page

Yuxu Lu, Dong Yang, Yuan Gao, Ryan Wen Liu, Jun Liu, Yu Guo

Scattering and attenuation of light in non-homogeneous imaging media, or inconsistent light intensity, cause insufficient contrast and color distortion in the collected images, which limits the development of applications such as vision-driven smart urban systems, autonomous vehicles, and intelligent robots. In this paper, we propose an all-in-one scene recovery network via multi-knowledge integration (termed AoSRNet) to improve the visibility of imaging devices in typical low-visibility imaging scenes (e.g., haze, sand dust, and low light). It combines gamma correction (GC) and optimized linear stretching (OLS) to create the detail enhancement module (DEM) and color restoration module (CRM). Additionally, we suggest a multi-receptive field extraction module (MEM) to attenuate the loss of image texture details caused by the nonlinear GC and linear OLS transformations. Finally, we refine the coarse features generated by DEM, CRM, and MEM through an encoder-decoder to generate the final restored image. Comprehensive experimental results demonstrate the effectiveness and stability of AoSRNet compared to other state-of-the-art methods. The source code is available at https://github.com/LouisYuxuLu/AoSRNet.

Translated: 2024-02-07 16:35:09 Published: 2024-02-06
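The AoSRNet abstract above builds its modules on two classical intensity transforms, gamma correction (nonlinear) and linear stretching. The sketch below shows the plain versions of both on a handful of pixel intensities in [0, 1]. How the paper "optimizes" the stretching bounds is not stated in the abstract, so using the row's minimum and maximum here is an assumption, as are the sample values.

```python
def gamma_correct(pixels, gamma):
    """Nonlinear brightness mapping for intensities in [0, 1]."""
    return [p ** gamma for p in pixels]


def linear_stretch(pixels, low, high):
    """Map [low, high] linearly onto [0, 1], clipping values outside the range."""
    span = high - low
    return [min(1.0, max(0.0, (p - low) / span)) for p in pixels]


# A hazy, low-contrast row of pixels squeezed into [0.4, 0.6].
row = [0.40, 0.45, 0.50, 0.55, 0.60]
stretched = linear_stretch(row, low=min(row), high=max(row))

# gamma < 1 brightens dark pixels more than bright ones.
brightened = gamma_correct([0.04, 0.25, 0.81], gamma=0.5)
```

Stretching restores the full contrast range, while gamma correction lifts shadows. Both are pointwise operations that can flatten local texture, which is the loss the abstract's multi-receptive field module is meant to counter.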

# Statistical Test for Anomaly Detections by Variational Auto-Encoders

http://arxiv.org/abs/2402.03724v1

License: see the linked page

Daiki Miwa, Tomohiro Shiraishi, Vo Nguyen Le Duy, Teruyuki Katsuoka, Ichiro Takeuchi

In this study, we consider the reliability assessment of anomaly detection (AD) using a Variational Autoencoder (VAE). Over the last decade, VAE-based AD has been actively studied from various perspectives, from method development to applied research. However, when AD results are used in high-stakes decision-making, such as in medical diagnosis, it is necessary to ensure the reliability of the detected anomalies. In this study, we propose the VAE-AD Test as a method for quantifying the statistical reliability of VAE-based AD within the framework of statistical testing. Using the VAE-AD Test, the reliability of the anomaly regions detected by a VAE can be quantified in the form of p-values. This means that if an anomaly is declared when the p-value is below a certain threshold, it is possible to control the probability of false detection to a desired level. Since the VAE-AD Test is constructed based on a new statistical inference framework called selective inference, its validity is theoretically guaranteed in finite samples. To demonstrate the validity and effectiveness of the proposed VAE-AD Test, numerical experiments on artificial data and applications to brain image analysis are conducted.

Translated: 2024-02-07 16:34:46 Published: 2024-02-06
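The VAE-AD abstract above relies on a general property of valid p-values: declaring an anomaly only when p is below a threshold alpha keeps the false detection probability at alpha. The simulation below checks that property for an ordinary two-sided z-test on anomaly-free data. It deliberately does not implement selective inference, which is what the paper uses to keep the p-value valid when the tested region is itself chosen by the VAE. The trial count and seed are arbitrary choices.

```python
import math
import random


def two_sided_p_value(z):
    """Two-sided p-value of a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_detection_rate(alpha, trials, seed=0):
    """Fraction of anomaly-free trials that are wrongly flagged at level alpha."""
    rng = random.Random(seed)
    flagged = sum(
        two_sided_p_value(rng.gauss(0.0, 1.0)) < alpha for _ in range(trials)
    )
    return flagged / trials


rate = false_detection_rate(alpha=0.05, trials=20000)
```

The observed rate stays close to the chosen alpha. If the statistic were computed on a region picked because it looked anomalous, without correcting for that selection, the same threshold would flag far more than alpha of the null cases.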

# Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos

http://arxiv.org/abs/2402.03723v1

License: see the linked page

Alfredo Rivero, ShahRukh Athar, Zhixin Shu, Dimitris Samaras

Creating controllable 3D human portraits from casual smartphone videos is highly desirable due to their immense value in AR/VR applications. The recent development of 3D Gaussian Splatting (3DGS) has shown improvements in rendering quality and training efficiency. However, it remains a challenge to accurately model and disentangle head movements and facial expressions from a single-view capture to achieve high-quality renderings. In this paper, we introduce Rig3DGS to address this challenge. We represent the entire scene, including the dynamic subject, using a set of 3D Gaussians in a canonical space. Using a set of control signals, such as head pose and expressions, we transform them to the 3D space with learned deformations to generate the desired rendering. Our key innovation is a carefully designed deformation method which is guided by a learnable prior derived from a 3D morphable model. This approach is highly efficient in training and effective in controlling facial expressions, head positions, and view synthesis across various captures. We demonstrate the effectiveness of our learned deformation through extensive quantitative and qualitative experiments. The project page can be found at http://shahrukhathar.github.io/2024/02/05/Rig3DGS.html

Translated: 2024-02-07 16:34:20 Published: 2024-02-06

# Similarity-based Neighbor Selection for Graph LLMs

http://arxiv.org/abs/2402.03720v1

License: see the linked page

Rui Li, Jiwei Li, Jiawei Han, Guoyin Wang

Text-attributed graphs (TAGs) present unique challenges for direct processing by Large Language Models (LLMs), yet their extensive commonsense knowledge and robust reasoning capabilities offer great promise for node classification in TAGs. Prior research in this field has grappled with issues such as over-squashing, heterophily, and ineffective graph information integration, further compounded by inconsistencies in dataset partitioning and underutilization of advanced LLMs. To address these challenges, we introduce Similarity-based Neighbor Selection (SNS). Using SimCSE and advanced neighbor selection techniques, SNS effectively improves the quality of selected neighbors, thereby improving graph representation and alleviating issues like over-squashing and heterophily. Moreover, as an inductive and training-free approach, SNS demonstrates superior generalization and scalability over traditional GNN methods. Our comprehensive experiments, adhering to standard dataset partitioning practices, demonstrate that SNS, through simple prompt interactions with LLMs, consistently outperforms vanilla GNNs and achieves state-of-the-art results on datasets like PubMed in node classification, showcasing LLMs' potential in graph structure understanding. Our research further underscores the significance of graph structure integration in LLM applications and identifies key factors for their success in node classification. Code is available at https://github.com/ruili33/SNS.

Translated: 2024-02-07 16:34:02 Published: 2024-02-06
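The SNS abstract above selects which neighbors of a node to show the LLM according to text similarity. A minimal sketch of that selection step is ranking neighbors by cosine similarity of their text embeddings and keeping the top k. The three-dimensional vectors below are hypothetical stand-ins for SimCSE embeddings, and the node names and the function `select_neighbors` are illustrative, not taken from the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_neighbors(node, neighbors, embeddings, k):
    """Return the k neighbors whose embeddings are most similar to the node's."""
    ranked = sorted(
        neighbors,
        key=lambda n: cosine(embeddings[node], embeddings[n]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical text embeddings for four papers in a citation graph.
embeddings = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
    "paper_d": [0.7, 0.0, 0.3],
}
chosen = select_neighbors(
    "paper_a", ["paper_b", "paper_c", "paper_d"], embeddings, k=2
)
```

Dropping the dissimilar neighbor (`paper_c` here) is the kind of filtering that can help on heterophilous graphs, where linked nodes often belong to different classes.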

# Attention-based Shape and Gait Representations Learning for Video-based Cloth-Changing Person Re-Identification

http://arxiv.org/abs/2402.03716v1

License: see the linked page

Vuong D. Nguyen, Samiha Mirza, Pranav Mantini, Shishir K. Shah

Current state-of-the-art Video-based Person Re-Identification (Re-ID) primarily relies on appearance features extracted by deep learning models. These methods are not applicable for long-term analysis in real-world scenarios where persons have changed clothes, making appearance information unreliable. In this work, we deal with the practical problem of Video-based Cloth-Changing Person Re-ID (VCCRe-ID) by proposing "Attention-based Shape and Gait Representations Learning" (ASGL) for VCCRe-ID. Our ASGL framework improves Re-ID performance under clothing variations by learning clothing-invariant gait cues using a Spatial-Temporal Graph Attention Network (ST-GAT). Given the 3D-skeleton-based spatial-temporal graph, our proposed ST-GAT comprises multi-head attention modules, which are able to enhance the robustness of gait embeddings under viewpoint changes and occlusions. The ST-GAT amplifies the important motion ranges and reduces the influence of noisy poses. Then, the multi-head learning module effectively preserves beneficial local temporal dynamics of movement. We also boost the discriminative power of person representations by learning body shape cues using a GAT. Experiments on two large-scale VCCRe-ID datasets demonstrate that our proposed framework outperforms state-of-the-art methods by 12.2% in rank-1 accuracy and 7.0% in mAP.

Translated: 2024-02-07 16:33:36 Published: 2024-02-06

# Clarify: Improving Model Robustness With Natural Language Corrections

http://arxiv.org/abs/2402.03715v1

License: see the linked page

Yoonho Lee, Michelle S. Lam, Helena Vasconcelos, Michael S. Bernstein, Chelsea Finn

In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on high-level misconceptions. To prevent such misconceptions, we must necessarily provide additional information beyond the training data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labeled data from a balanced distribution. Such strategies can become prohibitively costly for large-scale datasets since they require additional annotation at a scale close to the original training data. We hypothesize that targeted natural language feedback about a model's misconceptions is a more efficient form of additional supervision. We introduce Clarify, a novel interface and method for interactively correcting model misconceptions. Through Clarify, users need only provide a short text description to describe a model's consistent failure patterns. Then, in an entirely automated way, we use such descriptions to improve the training process by reweighting the training data or gathering additional targeted data. Our user studies show that non-expert users can successfully describe model misconceptions via Clarify, improving worst-group accuracy by an average of 17.1% in two datasets. Additionally, we use Clarify to find and rectify 31 novel hard subpopulations in the ImageNet dataset, improving minority-split accuracy from 21.1% to 28.7%.

Translated: 2024-02-07 16:33:08 Published: 2024-02-06

# Advancing Location-Invariant and Device-Agnostic Motion Activity Recognition on Wearable Devices

http://arxiv.org/abs/2402.03714v1

License: see the linked page

Rebecca Adaimi, Abdelkareem Bedri, Jun Gong, Richard Kang, Joanna Arreaza-Taylor, Gerri-Michelle Pascual, Michael Ralph, and Gierad Laput

Wearable sensors have permeated people's lives, ushering in impactful applications in interactive systems and activity recognition. However, practitioners face significant obstacles when dealing with sensing heterogeneities, requiring custom models for different platforms. In this paper, we conduct a comprehensive evaluation of the generalizability of motion models across sensor locations. Our analysis highlights this challenge and identifies key on-body locations for building location-invariant models that can be integrated into any device. For this, we introduce the largest multi-location activity dataset (N=50, 200 cumulative hours), which we make publicly available. We also present deployable on-device motion models reaching 91.41% frame-level F1-score from a single model irrespective of sensor placements. Lastly, we investigate cross-location data synthesis, aiming to alleviate the laborious data collection tasks by synthesizing data in one location given data from another. These contributions advance our vision of low-barrier, location-invariant activity recognition systems, catalyzing research in HCI and ubiquitous computing.

Translated: 2024-02-07 16:32:48 Published: 2024-02-06

# Single system based generation of certified randomness using Leggett-Garg inequality

http://arxiv.org/abs/2402.03712v1

License: see the linked page

Pingal Pratyush Nath, Debashis Saha, Dipankar Home, Urbasi Sinha

We theoretically formulate and experimentally demonstrate a secure scheme for semi-device-independent quantum random number generation by utilizing Leggett-Garg inequality violations, within a loophole-free photonic architecture. The quantification of the generated randomness is rigorously estimated by analytical as well as numerical approaches, both of which are in perfect agreement. We securely generate 919,118 truly unpredictable bits. This opens up an unexplored avenue towards an empirically convenient class of reliable random number generators harnessing the quantumness of single systems.

Translated: 2024-02-07 16:32:31 Published: 2024-02-06

# Listen, Chat, and Edit: テキストガイド付き音環境修正による聴覚体験の向上

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience ( http://arxiv.org/abs/2402.03710v1 )

ライセンス: Link先を確認

Xilin Jiang, Cong Han, Yinghao Aaron Li, and Nima Mesgarani

(参考訳) 日常生活では、望ましい音と望ましくない音の両方に遭遇するが、その有無や音量を制御できる余地は限られている。提案する「Listen, Chat, and Edit」(LCE)は,ユーザが入力したテキスト命令に基づいて混合音中の各音源を編集する,新しいマルチモーダル音声混合エディタである。LCEはユーザフレンドリーなチャットインターフェースと、複数の音源を混合音内で同時に編集できるユニークな機能を備え、それらを分離する必要がない。ユーザーはオープン語彙のテキストプロンプトを入力し、それが大規模言語モデルで解釈され、音の混合を編集するためのセマンティックフィルタが作成される。その後、システムは混合音をコンポーネントに分解し、セマンティックフィルタを適用し、それを所望の出力に再構成する。音声(スピーチ)と様々な音源を含む100k以上の混合データと、抽出、削除、音量制御といった様々な編集タスクのためのテキストプロンプトを備えた160時間のデータセットを開発した。本実験は,全編集タスクにおける信号品質の大幅な向上と,音源の数や種類が異なるゼロショットシナリオにおける頑健な性能を示す。

In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume. Our work introduces "Listen, Chat, and Edit" (LCE), a novel multimodal sound mixture editor that modifies each sound source in a mixture based on user-provided text instructions. LCE distinguishes itself with a user-friendly chat interface and its unique ability to edit multiple sound sources simultaneously within a mixture, without needing to separate them. Users input open-vocabulary text prompts, which are interpreted by a large language model to create a semantic filter for editing the sound mixture. The system then decomposes the mixture into its components, applies the semantic filter, and reassembles it into the desired output. We developed a 160-hour dataset with over 100k mixtures, including speech and various audio sources, along with text prompts for diverse editing tasks like extraction, removal, and volume control. Our experiments demonstrate significant improvements in signal quality across all editing tasks and robust performance in zero-shot scenarios with varying numbers and types of sound sources.
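
The decompose, filter, and reassemble steps can be sketched in a few lines. This is a minimal sketch under strong assumptions: the separated sources are taken as given (the real system would estimate them with a learned model), and the "semantic filter" is reduced to one gain per named source. The function name, the dictionary layout, and the gain convention are all hypothetical and not taken from the paper.

```python
from typing import Dict, List


def apply_semantic_filter(
    sources: Dict[str, List[float]],
    gains: Dict[str, float],
    default_gain: float = 1.0,
) -> List[float]:
    """Scale each named source by its gain and sum the results into one signal.

    A gain of 0.0 removes a source, 1.0 keeps it unchanged, and values above
    1.0 raise its volume. Sources not mentioned in `gains` use `default_gain`.
    """
    if not sources:
        raise ValueError("at least one source is required")
    length = len(next(iter(sources.values())))
    output = [0.0] * length
    for name, signal in sources.items():
        if len(signal) != length:
            raise ValueError(f"source {name!r} has a different length")
        gain = gains.get(name, default_gain)
        for i, sample in enumerate(signal):
            output[i] += gain * sample
    return output


if __name__ == "__main__":
    mixture_parts = {"speech": [1.0, 2.0], "dog_bark": [0.5, 0.5]}
    # Hypothetical instruction: "remove the dog barking"
    print(apply_semantic_filter(mixture_parts, {"dog_bark": 0.0}))
```

Extraction corresponds to setting `default_gain=0.0` and a gain of 1.0 for the requested source, so the three task types named in the abstract map onto the same call.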

翻訳日:2024-02-07 16:32:20 公開日:2024-02-06

# SISP:パンクロマティック衛星画像におけるきめ細かい船舶インスタンスセグメンテーションのためのベンチマークデータセット

SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images ( http://arxiv.org/abs/2402.03708v1 )

ライセンス: Link先を確認

Pengming Feng, Mingjie Xie, Hongning Liu, Xuanjia Zhao, Guangjun He, Xueliang Zhang, Jian Guan

(参考訳) 衛星画像におけるきめ細かい船舶インスタンスセグメンテーションは、海洋活動を監視する上で非常に重要である。しかし、既存のデータセットは、細粒度の情報やピクセル単位の位置アノテーションの不足、画像の多様性やバリエーションの不足に悩まされており、このタスクの研究は制限されている。そこで本研究では,パンクロマティック衛星画像におけるきめ細かい船舶インスタンスセグメンテーションのためのベンチマークデータセットSISPを提案する。SISPは1万枚のスライス画像に4つの細粒度カテゴリを持つ56,693個の船舶インスタンスを含み,すべての画像は解像度0.5mのSuperView-1衛星から収集されている。提案したSISPデータセットのターゲットは、クラスの高い不均衡、様々なシーン、ターゲット密度とスケールの大きなバリエーション、高いクラス間類似度とクラス内多様性など、実際の衛星シーンと一致した特徴を持ち、SISPデータセットは実世界のアプリケーションにより適している。さらに,衛星画像における船舶インスタンスセグメンテーションのベンチマーク手法として,動的特徴リファインメント・アシストインスタンスセグメンテーションネットワークDFRInstを導入し,重要な特徴の明示的な表現を強化して,船舶インスタンスセグメンテーションの性能を向上させる。提案するSISPデータセット上で実験と解析を行い,ベンチマーク手法と複数の最先端手法を評価し,今後の研究を促進するためのベースラインを確立する。提案するデータセットとソースコードは https://github.com/Justlovesmile/SISP で公開される予定である。

Fine-grained ship instance segmentation in satellite images holds considerable significance for monitoring maritime activities. However, existing datasets often suffer from a scarcity of fine-grained information or pixel-wise localization annotations, as well as insufficient image diversity and variation, which limits research on this task. To this end, we propose a benchmark dataset for fine-grained Ship Instance Segmentation in Panchromatic satellite images, namely SISP, which contains 56,693 well-annotated ship instances with four fine-grained categories across 10,000 sliced images; all the images are collected from the SuperView-1 satellite at a resolution of 0.5m. Targets in the proposed SISP dataset have characteristics that are consistent with real satellite scenes, such as high class imbalance, various scenes, large variations in target densities and scales, and high inter-class similarity and intra-class diversity, all of which make the SISP dataset more suitable for real-world applications. In addition, we introduce a Dynamic Feature Refinement-assist Instance segmentation network, namely DFRInst, as the benchmark method for ship instance segmentation in satellite images, which can fortify the explicit representation of crucial features, thus improving the performance of ship instance segmentation. Experiments and analysis are performed on the proposed SISP dataset to evaluate the benchmark method and several state-of-the-art methods to establish baselines for facilitating future research. The proposed dataset and source codes will be available at: https://github.com/Justlovesmile/SISP.

翻訳日:2024-02-07 16:32:00 公開日:2024-02-06

# MMAUD: 最新の小型ドローンの脅威に対する総合的マルチモードアンチUAVデータセット

MMAUD: A Comprehensive Multi-Modal Anti-UAV Dataset for Modern Miniature Drone Threats ( http://arxiv.org/abs/2402.03706v1 )

ライセンス: Link先を確認

Shenghai Yuan, Yizhuo Yang, Thien Hoang Nguyen, Thien-Minh Nguyen, Jianfei Yang, Fen Liu, Jianping Li, Han Wang, Lihua Xie

(参考訳) 有害なペイロードを輸送したり、単独で損傷を発生させる可能性を持つ小型無人航空機(UAV)がもたらす課題に対して、我々はMMAUD: a comprehensive Multi-Modal Anti-UAV Datasetを紹介する。MMAUDは、ドローン検出、UAV型分類、軌道推定に焦点を当てて、現代の脅威検出手法における重要なギャップに対処する。MMAUDはステレオビジョン、様々なライダー、レーダー、オーディオアレイなど様々な感覚入力を組み合わせることで際立っている。これは、サーマルとRGBを使って特定のヴァンテージポイントでキャプチャされたデータセットよりも高い忠実度で現実世界のシナリオに対処するのに必須の、ユニークなオーバーヘッド空中検出を提供する。さらに、MMAUDはLeicaで生成した正確なグラウンドトゥルースデータを提供し、信頼性を高め、他のデータセットでは見られないアルゴリズムやモデルの確実な改良を可能にする。既存の研究の多くはデータセットを公開していないため、MMAUDは正確で効率的なソリューションを開発するための貴重なリソースとなっている。提案するモダリティは費用対効果が高く適応性が高いため,UAV脅威検出ツールの実験と実装が可能である。我々のデータセットは周囲の重機音を取り入れることで現実世界のシナリオをシミュレートする。このアプローチはデータセットの適用性を高め、近接した車両運用中に直面する課題を正確に捉える。MMAUDは、UAV脅威の検出、分類、軌道推定機能などの進歩において重要な役割を果たすことが期待されている。私たちのデータセット、コード、設計は https://github.com/ntu-aris/MMAUD で公開される予定である。

In response to the evolving challenges posed by small unmanned aerial vehicles (UAVs), which possess the potential to transport harmful payloads or independently cause damage, we introduce MMAUD: a comprehensive Multi-Modal Anti-UAV Dataset. MMAUD addresses a critical gap in contemporary threat detection methodologies by focusing on drone detection, UAV-type classification, and trajectory estimation. MMAUD stands out by combining diverse sensory inputs, including stereo vision, various lidars, radars, and audio arrays. It offers a unique overhead aerial detection vital for addressing real-world scenarios with higher fidelity than datasets captured from specific vantage points using thermal and RGB. Additionally, MMAUD provides accurate Leica-generated ground truth data, enhancing credibility and enabling confident refinement of algorithms and models, which has never been seen in other datasets. Most existing works do not disclose their datasets, making MMAUD an invaluable resource for developing accurate and efficient solutions. Our proposed modalities are cost-effective and highly adaptable, allowing users to experiment and implement new UAV threat detection tools. Our dataset closely simulates real-world scenarios by incorporating ambient heavy machinery sounds. This approach enhances the dataset's applicability, capturing the exact challenges faced during proximate vehicular operations. It is expected that MMAUD can play a pivotal role in advancing UAV threat detection, classification, trajectory estimation capabilities, and beyond. Our dataset, codes, and designs will be available at https://github.com/ntu-aris/MMAUD.

翻訳日:2024-02-07 16:31:31 公開日:2024-02-06

# FoolSDEdit: ターゲットの属性を意識して編集をステアリングする

FoolSDEdit: Deceptively Steering Your Edits Towards Targeted Attribute-aware Distribution ( http://arxiv.org/abs/2402.03705v1 )

ライセンス: Link先を確認

Qi Zhou, Dongxia Wang, Tianlin Li, Zhihong Xu, Yang Liu, Kui Ren, Wenhai Wang, Qing Guo

(参考訳) sdeditのような拡散モデルに基づく誘導画像合成手法は、ストローク画などの入力からリアルな画像を作成するのに優れている。しかし、既存の取り組みは主に画質に重点を置いており、しばしば重要な点を見下ろしている:拡散モデルは個々の画像ではなく、データ分布を表す。これは、ユーザーの意図に反するイメージを生成し、倫理的懸念を提起する低いが批判的な機会をもたらす。例えば、女性の特徴を持つストロークペインティングを入力したユーザは、SDEditから男性の顔を取得する可能性がある。この潜在的な脆弱性を明らかにするため,SDEdit は入力の属性特性を変えることなく,特定の属性(女性など)に一致した特定のデータ分布を生成する。本稿では,属性認識目的関数を用いたTAGA(Targeted Attribute Generative Attack)を提案し,入力ストローク絵に付加される対向雑音を最適化する。実験的な研究によると、従来の敵対的ノイズはTAGAと競合し、露光や動きのぼかしといった自然な摂動は、生成した画像の属性を容易に変化させる。効果的な攻撃を行うために、FoolSDEditを導入する: 共同対向露光とぼかし攻撃を設計し、ストローク絵に露出と動きのぼかしを追加し、それらをまとめて最適化する。我々は,ネットワークアーキテクチャ探索問題として,様々な摂動の実行戦略を最適化する。さまざまな摂動に対する多様な実行戦略を表すグラフであるsuperpertを作成します。訓練後、seditに対する効果的なtagaの実行戦略を最適化する。 2つのデータセットの総合的な実験は、SDEditがターゲット属性認識データ分布を生成することを説得し、ベースラインを著しく上回ることを示す。

Guided image synthesis methods, like SDEdit based on the diffusion model, excel at creating realistic images from user inputs such as stroke paintings. However, existing efforts mainly focus on image quality, often overlooking a key point: the diffusion model represents a data distribution, not individual images. This introduces a low but critical chance of generating images that contradict user intentions, raising ethical concerns. For example, a user inputting a stroke painting with female characteristics might, with some probability, get male faces from SDEdit. To expose this potential vulnerability, we aim to build an adversarial attack forcing SDEdit to generate a specific data distribution aligned with a specified attribute (e.g., female), without changing the input's attribute characteristics. We propose the Targeted Attribute Generative Attack (TAGA), using an attribute-aware objective function and optimizing the adversarial noise added to the input stroke painting. Empirical studies reveal that traditional adversarial noise struggles with TAGA, while natural perturbations like exposure and motion blur easily alter generated images' attributes. To execute effective attacks, we introduce FoolSDEdit: We design a joint adversarial exposure and blur attack, adding exposure and motion blur to the stroke painting and optimizing them together. We optimize the execution strategy of various perturbations, framing it as a network architecture search problem. We create the SuperPert, a graph representing diverse execution strategies for different perturbations. After training, we obtain the optimized execution strategy for effective TAGA against SDEdit. Comprehensive experiments on two datasets show our method compelling SDEdit to generate a targeted attribute-aware data distribution, significantly outperforming baselines.

翻訳日:2024-02-07 16:31:04 公開日:2024-02-06

# 離散・連続時間離散化拡散の改善と統一

Improving and Unifying Discrete&Continuous-time Discrete Denoising Diffusion ( http://arxiv.org/abs/2402.03701v1 )

ライセンス: Link先を確認

Lingxiao Zhao, Xueying Ding, Lijun Yu, Leman Akoglu

(参考訳) 離散拡散モデルは言語やグラフのような自然に離散的なデータに適用することで注目されている。離散時間離散拡散はしばらく確立されてきたが、最近キャンベルら (2022) は連続時間離散拡散の最初の枠組みを導入した。しかし、それらのトレーニングとサンプリングプロセスは離散時間版とは大きく異なり、トラクタビリティの非自明な近似を必要とする。本稿では, 離散拡散のためのより正確で最適化しやすい学習を可能にする変分下界の一連の数学的単純化について述べる。さらに, 正確なサンプリングが可能であり, 離散時間および連続時間離散拡散のエレガントな統一を可能にする, 後方復調のための簡易な定式化を導出する。より単純な解析的定式化により、前方および後方の確率は、多元オブジェクトの異なるノイズ分布を含む任意のノイズ分布を柔軟に許容することができる。実験の結果,提案したUSD3 (Unified Simplified Discrete Denoising Diffusion) は,確立したデータセット上でのSOTAベースラインよりも優れていた。私たちは、統一コードをhttps://github.com/lingxiaoshawn/usd3でオープンソースにしました。

Discrete diffusion models have seen a surge of attention with applications on naturally discrete data such as language and graphs. Although discrete-time discrete diffusion has been established for a while, only recently Campbell et al. (2022) introduced the first framework for continuous-time discrete diffusion. However, their training and sampling processes differ significantly from the discrete-time version, necessitating nontrivial approximations for tractability. In this paper, we first present a series of mathematical simplifications of the variational lower bound that enable more accurate and easy-to-optimize training for discrete diffusion. In addition, we derive a simple formulation for backward denoising that enables exact and accelerated sampling, and importantly, an elegant unification of discrete-time and continuous-time discrete diffusion. Thanks to simpler analytical formulations, both forward and now also backward probabilities can flexibly accommodate any noise distribution, including different noise distributions for multi-element objects. Experiments show that our proposed USD3 (for Unified Simplified Discrete Denoising Diffusion) outperforms all SOTA baselines on established datasets. We open-source our unified code at https://github.com/LingxiaoShawn/USD3.
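
To make the phrase "accommodate any noise distribution" concrete, the following sketch shows a common interpolation-style forward process for categorical data: at each step a token keeps its value with probability alpha and is otherwise resampled from a fixed noise distribution. Whether this matches USD3's exact parameterization is an assumption; the function names and the example numbers are illustrative only. The point of the sketch is that composing such steps has a closed form, which is what makes the forward marginal cheap to evaluate.

```python
from typing import List


def transition_matrix(alpha: float, noise: List[float]) -> List[List[float]]:
    """One forward step: keep the state with prob. alpha, else resample from noise."""
    k = len(noise)
    return [
        [alpha * (1.0 if i == j else 0.0) + (1.0 - alpha) * noise[j] for j in range(k)]
        for i in range(k)
    ]


def matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Plain matrix product, used here to compose per-step transition matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def forward_marginal(x0: int, alphas: List[float], noise: List[float]) -> List[float]:
    """Closed form of q(x_t | x_0): alpha_bar * onehot(x0) + (1 - alpha_bar) * noise."""
    alpha_bar = 1.0
    for alpha in alphas:
        alpha_bar *= alpha  # cumulative keep-probability over all steps
    return [
        alpha_bar * (1.0 if j == x0 else 0.0) + (1.0 - alpha_bar) * noise[j]
        for j in range(len(noise))
    ]


if __name__ == "__main__":
    noise = [0.5, 0.25, 0.25]  # any distribution over the 3 categories
    print(forward_marginal(1, [0.9, 0.8], noise))
```

The closed form follows because the noise distribution sums to one, so multiplying two such matrices again gives a matrix of the same shape with the keep-probabilities multiplied.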

翻訳日:2024-02-07 16:30:32 公開日:2024-02-06

# GenLens: Visual GenAIモデル出力の体系的評価

GenLens: A Systematic Evaluation of Visual GenAI Model Outputs ( http://arxiv.org/abs/2402.03700v1 )

ライセンス: Link先を確認

Tica Lin, Hanspeter Pfister, Jui-Hsien Wang

(参考訳) コンピュータビジョンにおける生成AI(GenAI)モデルの迅速な開発は、その品質と公平性を保証するために効果的な評価方法を必要とする。既存のツールは、主にデータセットの品質保証とモデル説明可能性に焦点を当てており、モデル開発中にGenAI出力評価に大きなギャップを残しています。現在のプラクティスは、しばしば開発者の主観的な視覚的評価に依存します。本稿では、GenAIモデル開発者と産業環境で形式的な研究を行うことにより、このギャップを埋める。この結果から,モデル開発の初期段階におけるジェナイモデル出力の体系的評価を目的としたビジュアル解析インタフェースであるgenlensの開発に繋がった。 GenLensは、障害ケースの概要と注釈付け、イシュータグと分類のカスタマイズ、複数のユーザからのアノテーションの集約によるコラボレーション強化のための定量的なアプローチを提供する。モデル開発者によるユーザ調査によると、GenLensはワークフローを効果的に強化し、高い満足度と、それをプラクティスに統合する強い意図によって証明されている。本研究は、GenAI開発における堅牢な早期評価ツールの重要性を強調し、公正かつ高品質なGenAIモデルの進歩に寄与する。

The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which may lack scalability and generalizability. This paper bridges this gap by conducting a formative study with GenAI model developers in an industrial setting. Our findings led to the development of GenLens, a visual analytic interface designed for the systematic evaluation of GenAI model outputs during the early stages of model development. GenLens offers a quantifiable approach for overviewing and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to enhance collaboration. A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates and a strong intent to integrate it into their practices. This research underscores the importance of robust early-stage evaluation tools in GenAI development, contributing to the advancement of fair and high-quality GenAI models.

翻訳日:2024-02-07 16:30:12 公開日:2024-02-06

# Vision Superalignment: Vision Foundation Modelsのための弱から強の一般化

Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models ( http://arxiv.org/abs/2402.03749v1 )

ライセンス: Link先を確認

Jianyuan Guo, Hanting Chen, Chengcheng Wang, Kai Han, Chang Xu, Yunhe Wang

(参考訳) 大規模言語モデルの最近の進歩は、その異常でほぼ超人的な能力への関心を喚起し、研究者はこれらの能力を評価し最適化する方法を探究する。この文脈において、我々の論文は、より弱いモデルを用いてより強いモデルを監督する弱い一般化の概念に焦点を当て、前者の限界を超えて後者の能力を高めることを目的として、視覚基盤モデルの領域を掘り下げる。弱強監督のための新規かつ適応的に調整可能な損失関数を提案する。包括的実験は、少数ショット学習、移行学習、ノイズラベル学習、共通知識蒸留設定など、さまざまなシナリオにまたがる。私たちのアプローチは、強固な一般化によって設定されたパフォーマンスベンチマークを超えるだけでなく、データセット全体を微調整した強固なモデルの結果を超えます。この説得力のある証拠は、弱強一般化の有意義な可能性を強調し、その能力が視覚基盤モデルの性能を大幅に高めることを示した。コードはhttps://github.com/ggjy/vision_weak_to_strongで入手できる。

Recent advancements in large language models have sparked interest in their extraordinary and near-superhuman capabilities, leading researchers to explore methods for evaluating and optimizing these abilities, which is called superalignment. In this context, our paper delves into the realm of vision foundation models, focusing on the concept of weak-to-strong generalization, which involves using a weaker model to supervise a stronger one, aiming to enhance the latter's capabilities beyond the former's limits. We introduce a novel and adaptively adjustable loss function for weak-to-strong supervision. Our comprehensive experiments span various scenarios, including few-shot learning, transfer learning, noisy label learning, and common knowledge distillation settings. The results are striking: our approach not only exceeds the performance benchmarks set by strong-to-strong generalization but also surpasses the outcomes of fine-tuning strong models with whole datasets. This compelling evidence underscores the significant potential of weak-to-strong generalization, showcasing its capability to substantially elevate the performance of vision foundation models. The code is available at https://github.com/ggjy/vision_weak_to_strong.

翻訳日:2024-02-07 16:21:24 公開日:2024-02-06

# PDE発見のための不変制約深層学習ネットワーク

An invariance constrained deep learning network for PDE discovery ( http://arxiv.org/abs/2402.03747v1 )

ライセンス: Link先を確認

Chao Chen, Hui Li, Xiaowei Jin

(参考訳) データセットから偏微分方程式(PDE)の発見が注目されている。しかし, 導関数計算の難易度やノイズの乱れなどにより, 高ノイズのスパースデータからの制御方程式の発見はいまだに困難である。さらに、物理法則を満たすための図書館の選択原則をさらに研究する必要がある。不変性は方程式の基本的な法則の1つである。本研究では,PDEの発見のための分散制約付きディープラーニングネットワーク(ICNet)を提案する。時空間変換不変性(ガリレオ不変性)が物理法則の基本的な性質であることを考えると、ガリレオ変換の要件を満たすことができない候補をフィルタリングする。その後,ニューラルネットワークの損失関数に固定項と可能な項を組み込み,ノイズの多いスパースデータの影響を著しく抑制した。そして、学習可能パラメータを固定することなく冗長項をフィルタリングすることにより、ICNet法で発見された支配方程式を効果的に近似することができる。 2次元バーガース方程式、障害物上の2次元チャネルフロー方程式、および3次元頭蓋内動脈瘤方程式を選択し、流体力学におけるicnetの優位性を検証する。さらに、同様の不変性法を波動方程式(ローレンツ不変性)の発見に拡張し、シングルおよび結合されたクライン・ゴルドン方程式を用いて検証する。その結果, 物理制約付きICNet法は, スパースおよびノイズの多いデータから方程式を探索する際の優れた性能を示した。

The discovery of partial differential equations (PDEs) from datasets has attracted increased attention. However, the discovery of governing equations from sparse data with high noise is still very challenging due to the difficulty of computing derivatives and the disturbance of noise. Moreover, the selection principles for the candidate library to meet physical laws need to be further studied. Invariance is one of the fundamental laws for governing equations. In this study, we propose an invariance constrained deep learning network (ICNet) for the discovery of PDEs. Considering that temporal and spatial translation invariance (Galilean invariance) is a fundamental property of physical laws, we filter out the candidates that cannot meet the requirement of the Galilean transformations. Subsequently, we embed the fixed and possible terms into the loss function of the neural network, significantly countering the effect of sparse data with high noise. Then, by filtering out redundant terms without fixing learnable parameters during the training process, the governing equations discovered by the ICNet method can effectively approximate the real governing equations. We select the 2D Burgers equation, the equation of 2D channel flow over an obstacle, and the equation of 3D intracranial aneurysm as examples to verify the superiority of the ICNet for fluid mechanics. Furthermore, we extend similar invariance methods to the discovery of the wave equation (Lorentz invariance) and verify it through single and coupled Klein-Gordon equations. The results show that the ICNet method with physical constraints exhibits excellent performance in governing equations discovery from sparse and noisy data.
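
A short worked check, not taken from the paper, shows what a Galilean filter on candidate terms amounts to for the 1D viscous Burgers equation $\partial_t u + u\,\partial_x u = \nu\,\partial_{xx} u$. The boost velocity $c$ and the primed variables are notation introduced here for illustration.

```latex
\begin{align}
  x' &= x - ct, \qquad t' = t, \qquad u(x,t) = u'(x',t') + c, \\
  \partial_t u &= \partial_{t'} u' - c\,\partial_{x'} u', \qquad
  \partial_x u = \partial_{x'} u', \qquad
  \partial_{xx} u = \partial_{x'x'} u', \\
  \partial_t u + u\,\partial_x u
    &= \partial_{t'} u' - c\,\partial_{x'} u' + (u' + c)\,\partial_{x'} u'
     = \partial_{t'} u' + u'\,\partial_{x'} u'.
\end{align}
```

The $c$-dependent pieces cancel only when the time derivative and the advection term appear together, and the diffusion term is unchanged, so the equation keeps its form. An isolated candidate such as $u^2$ becomes $u'^2 + 2cu' + c^2$ and picks up $c$-dependent terms, which is the kind of candidate an invariance filter would discard.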

翻訳日:2024-02-07 16:21:05 公開日:2024-02-06

# AIフィードバックによる強化学習を用いたビデオ用大規模マルチモーダルモデルのチューニング

Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ( http://arxiv.org/abs/2402.03746v1 )

ライセンス: Link先を確認

Daechul Ahn, Yura Choi, Youngjae Yu, Dongyeop Kang and Jonghyun Choi

(参考訳) 近年の大規模言語モデルの発展はビデオ大マルチモーダルモデル(VLMM)の発展に影響を与えている。 VLMMの以前のアプローチには、命令調整されたデータセットを使用したSupervised Fine-Tuning (SFT)、ビジュアルエンコーダとLLMの統合、学習可能なモジュールの追加が含まれていた。ビデオとテキストのマルチモーダルアライメントは、主にテキストのみのデータと比較してマルチモーダル命令・トゥンデータのボリュームと品質が不足しているため、依然として困難である。本稿では,AIフィードバックからの強化学習(Reinforcement Learning from AI Feedback, RLAIF)と呼ばれる,マルチモーダルAIシステムを利用した新たなアライメント戦略を提案する。具体的には,映像コンテンツの理解を深めるために,嗜好フィードバック生成時のコンテキストとして詳細な映像記述を提供することにより,文脈対応報酬モデリングを提案する。我々のマルチモーダルRLAIFアプローチであるVLM-RLAIFはSFTモデルを含む既存の手法よりも優れています。私たちは、この分野のさらなる研究を促進するために、コード、モデル、データセットをオープンソース化することを約束します。

Recent advancements in large language models have influenced the development of video large multimodal models (VLMMs). Previous approaches for VLMMs involved Supervised Fine-Tuning (SFT) with instruction-tuned datasets, integrating an LLM with visual encoders, and adding additional learnable modules. Video and text multimodal alignment remains challenging, primarily due to the deficient volume and quality of multimodal instruction-tuning data compared to text-only data. We present a novel alignment strategy, called Reinforcement Learning from AI Feedback (RLAIF), that employs a multimodal AI system to oversee itself, providing self-preference feedback to refine itself and facilitating the alignment of video and text modalities. Specifically, we propose context-aware reward modeling by providing detailed video descriptions as context during the generation of preference feedback in order to enrich the understanding of video content. Demonstrating enhanced performance across diverse video benchmarks, our multimodal RLAIF approach, VLM-RLAIF, outperforms existing approaches, including the SFT model. We commit to open-sourcing our code, models, and datasets to foster further research in this area.
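
Preference feedback of this kind is usually turned into a training signal for a reward model through a pairwise loss. The sketch below uses the standard Bradley-Terry form, -log(sigmoid(r_chosen - r_rejected)); the abstract does not state which loss VLM-RLAIF uses, so this choice, along with the function names, is an assumption. The scalar rewards stand in for the outputs of a reward model that would, in the described setup, also see the video description as context.

```python
import math
from typing import List, Tuple


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise Bradley-Terry loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Written as a numerically stable softplus of the negative margin.
    """
    margin = reward_chosen - reward_rejected
    if margin >= 0.0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def batch_preference_loss(pairs: List[Tuple[float, float]]) -> float:
    """Mean loss over (reward_chosen, reward_rejected) pairs."""
    if not pairs:
        raise ValueError("at least one preference pair is required")
    return sum(preference_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward-model scores for two preference pairs.
    print(batch_preference_loss([(1.2, 0.3), (0.1, 0.4)]))
```

The loss is small when the preferred response already scores higher and grows roughly linearly when the ordering is wrong, which pushes the reward model toward the AI-provided preferences.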

翻訳日:2024-02-07 16:20:40 公開日:2024-02-06

# INSIDE: LLMの内部状態は幻覚検出の力を維持している

INSIDE: LLMs' Internal States Retain the Power of Hallucination Detection ( http://arxiv.org/abs/2402.03744v1 )

ライセンス: Link先を確認

Chao Chen, Kai Liu, Ze Chen, Yi Gu, Yue Wu, Mingyuan Tao, Zhihang Fu, Jieping Ye

(参考訳) 知識幻覚は、デプロイされたLLMのセキュリティと信頼性に対する幅広い懸念を引き起こしている。従来の幻覚検出の取り組みは、ロジットレベルの不確実性推定や言語レベルの自己整合性評価で行われてきたが、そこではトークン復号処理中に意味情報が必然的に失われる。そこで我々は,LLMの内部状態(INternal States)に保持される密な意味情報を幻覚検出(hallucInation DEtection)に活用するINSIDEを提案する。特に、応答の自己整合性をよりよく評価するために、単純で効果的なEigenScore指標を提案する。これは応答の共分散行列の固有値を利用して、密な埋め込み空間における意味的一貫性/多様性を測定する。さらに、自己整合性に基づく幻覚検出の観点から、内部状態における極端な活性化を切り詰めるテスト時特徴クリッピング手法を検討する。これは過信した生成を減らし、過信による幻覚の検出に有用である可能性がある。いくつかの一般的なLLMと質問応答(QA)ベンチマークで大規模な実験とアブレーション実験を行い,提案手法の有効性を示した。

Knowledge hallucination has raised widespread concerns for the security and reliability of deployed LLMs. Previous efforts in detecting hallucinations have been employed at logit-level uncertainty estimation or language-level self-consistency evaluation, where the semantic information is inevitably lost during the token-decoding procedure. Thus, we propose to explore the dense semantic information retained within LLMs' INternal States for hallucInation DEtection (INSIDE). In particular, a simple yet effective EigenScore metric is proposed to better evaluate responses' self-consistency, which exploits the eigenvalues of responses' covariance matrix to measure the semantic consistency/diversity in the dense embedding space. Furthermore, from the perspective of self-consistent hallucination detection, a test time feature clipping approach is explored to truncate extreme activations in the internal states, which reduces overconfident generations and potentially benefits the detection of overconfident hallucinations. Extensive experiments and ablation studies are performed on several popular LLMs and question-answering (QA) benchmarks, showing the effectiveness of our proposal.
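
One plausible reading of an eigenvalue-based consistency score can be sketched as follows: embed K sampled responses, center them, form their K x K Gram (covariance-like) matrix, add a small regularizer alpha, and average the log-eigenvalues via the log-determinant. The centering across responses, the value of alpha, and the function names are assumptions for illustration and may differ from the paper's exact definition. The embeddings here are toy vectors standing in for LLM internal states.

```python
import math
from typing import List


def log_det_spd(matrix: List[List[float]]) -> float:
    """Log-determinant of a symmetric positive-definite matrix via Cholesky.

    Equals the sum of the log-eigenvalues of the matrix.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    log_det = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][i] = math.sqrt(s)
                log_det += 2.0 * math.log(lower[i][i])
            else:
                lower[i][j] = s / lower[j][j]
    return log_det


def eigen_score(embeddings: List[List[float]], alpha: float = 1e-3) -> float:
    """Mean log-eigenvalue of the regularized Gram matrix of centered embeddings.

    Higher values mean the K responses are more spread out in embedding space,
    i.e. less self-consistent.
    """
    k = len(embeddings)
    d = len(embeddings[0])
    mean = [sum(e[c] for e in embeddings) / k for c in range(d)]
    centered = [[e[c] - mean[c] for c in range(d)] for e in embeddings]
    gram = [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) + (alpha if i == j else 0.0)
            for j in range(k)
        ]
        for i in range(k)
    ]
    return log_det_spd(gram) / k


if __name__ == "__main__":
    consistent = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
    diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    print(eigen_score(consistent), eigen_score(diverse))
```

Identical responses collapse the Gram matrix to the regularizer alone, giving the lowest possible score, while semantically scattered responses raise it; a threshold on the score would then flag likely hallucinations.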

翻訳日:2024-02-07 16:20:22 公開日:2024-02-06

# SUB-PLAY:部分観測型マルチエージェント強化学習システムに対する敵対的ポリシー

SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems ( http://arxiv.org/abs/2402.03741v1 )

ライセンス: Link先を確認

Oubo Ma, Yuwen Pu, Linkang Du, Yang Dai, Ruo Wang, Xiaolei Liu, Yingcai Wu, Shouling Ji

(参考訳) マルチエージェント強化学習(MARL)の最近の進歩は、ドローンの群れ制御、ロボットアームによる協調操作、マルチターゲットの囲い込みなど、膨大な応用可能性を開く。しかし、MARL配備時の潜在的なセキュリティ上の脅威には、より注意と徹底的な調査が必要である。最近の研究によると、攻撃者は被害者の脆弱性を迅速に利用し、敵のポリシーを生成でき、特定のタスクにおける被害者の失敗につながる。例えば、スーパーヒューマンレベルのGo AIの勝利率を約20%に削減する。彼らは主に2人のプレイヤーの競争環境に焦点を当てており、攻撃者が完全なグローバルな状態観察を持っていると仮定している。本研究は,複数エージェントの競争環境において,被害者の部分的観察に制限された場合でも,攻撃者が敵対的な政策を発生できることを初めて明らかにする。具体的には,部分的可観測性の影響を軽減するために,複数のサブゲームを構築するという概念を組み込んだ,新たなブラックボックス攻撃(サブプレイ)を提案する。 3つの典型的な部分的可観測限界下でのSUB-PLAYの有効性を示す。可視化の結果,敵対的政策が被害者の政策ネットワークの活性化を著しく引き起こすことが示唆された。さらに、敵対的政策によるセキュリティの脅威を軽減し、競争環境にMARLを配備するための建設的な勧告を提供することを目的とした3つの防衛策を評価する。

Recent advances in multi-agent reinforcement learning (MARL) have opened up vast application prospects, including swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during the MARL deployment need more attention and thorough investigation. Recent research reveals that an attacker can rapidly exploit the victim's vulnerabilities and generate adversarial policies, leading to the victim's failure in specific tasks. For example, such attacks can reduce the winning rate of a superhuman-level Go AI to around 20%. These studies predominantly focus on two-player competitive environments, assuming attackers possess complete global state observation. In this study, we unveil, for the first time, the capability of attackers to generate adversarial policies even when restricted to partial observations of the victims in multi-agent competitive environments. Specifically, we propose a novel black-box attack (SUB-PLAY), which incorporates the concept of constructing multiple subgames to mitigate the impact of partial observability and suggests the sharing of transitions among subpolicies to improve the exploitative ability of attackers. Extensive evaluations demonstrate the effectiveness of SUB-PLAY under three typical partial observability limitations. Visualization results indicate that adversarial policies induce significantly different activations of the victims' policy networks. Furthermore, we evaluate three potential defenses aimed at exploring ways to mitigate security threats posed by adversarial policies, providing constructive recommendations for deploying MARL in competitive environments.

翻訳日:2024-02-07 16:19:54 公開日:2024-02-06

# BotSSCL:自己監督型コントラスト学習によるソーシャルボット検出

BotSSCL: Social Bot Detection with Self-Supervised Contrastive Learning ( http://arxiv.org/abs/2402.03740v1 )

ライセンス: Link先を確認

Mohammad Majid Akhtar, Navid Shadman Bhuiyan, Rahat Masood, Muhammad Ikram, Salil S. Kanhere

(参考訳) 「ソーシャルボット」とも呼ばれる自動アカウントの検出は、オンラインソーシャルネットワーク(OSN)にとってますます重要な関心事となっている。ソーシャルボットの検出にはいくつかの方法が提案されているが、大きな研究ギャップが残っている。第一に、現在のモデルは本物のOSNユーザーを模倣する高度なボットを検出することに限界がある。第二に、これらの手法は操作の影響を受けやすい単純なプロファイル特徴に依存することが多い。敵対的な操作に対する脆弱性に加えて、これらのモデルは一般化性に欠けており、あるデータセットでトレーニングされ、別のデータセットでテストされた場合、性能が低下する。これらの課題に対処するために,自己教師付きコントラスト学習を用いた新しいソーシャルボット検出フレームワーク(BotSSCL)を提案する。本フレームワークは,埋め込み空間でソーシャルボットと人間を区別し,線形分離性を向上させるために,コントラスト学習を利用する。BotSSCLから得られるハイレベルな表現は、データ分布の変化に対するレジリエンスを高め、一般化性を確保する。ボットアカウントを操作して検出を回避しようとする敵対的な試みに対するBotSSCLの堅牢性を評価する。高度なボットを特徴とする2つのデータセットの実験は、BotSSCLが他の教師あり、教師なし、および自己教師付きのベースライン手法よりも優れていることを示している。2つのデータセットにおいて、SOTAより約6%および約8%高いF1性能を達成した。さらに、BotSSCLは、あるデータセットでトレーニングし、別のデータセットでテストした場合にも67%のF1を達成し、一般化性を示している。最後に、BotSSCLは敵対者にとっての複雑さを増大させ、検出の回避に成功する割合を4%にとどめる。

The detection of automated accounts, also known as "social bots", has been an increasingly important concern for online social networks (OSNs). While several methods have been proposed for detecting social bots, significant research gaps remain. First, current models exhibit limitations in detecting sophisticated bots that aim to mimic genuine OSN users. Second, these methods often rely on simplistic profile features, which are susceptible to manipulation. In addition to their vulnerability to adversarial manipulations, these models lack generalizability, resulting in subpar performance when trained on one dataset and tested on another. To address these challenges, we propose a novel framework for social Bot detection with Self-Supervised Contrastive Learning (BotSSCL). Our framework leverages contrastive learning to distinguish between social bots and humans in the embedding space to improve linear separability. The high-level representations derived by BotSSCL enhance its resilience to variations in data distribution and ensure generalizability. We evaluate BotSSCL's robustness against adversarial attempts to manipulate bot accounts to evade detection. Experiments on two datasets featuring sophisticated bots demonstrate that BotSSCL outperforms other supervised, unsupervised, and self-supervised baseline methods. We achieve approximately 6% and approximately 8% higher F1 than the state of the art (SOTA) on the two datasets. In addition, BotSSCL achieves 67% F1 when trained on one dataset and tested on another, demonstrating its generalizability. Lastly, BotSSCL increases adversarial complexity and allows the adversary only a 4% success rate in evading detection.
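
A contrastive objective of the sort the abstract alludes to can be sketched with the widely used NT-Xent (InfoNCE) loss: two augmented views of the same account should be close in embedding space and far from the other accounts in the batch. The abstract does not name the exact loss or the augmentations BotSSCL uses, so NT-Xent, the temperature value, and the function names are assumptions made for illustration.

```python
import math
from typing import List


def _cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def nt_xent_loss(
    view_a: List[List[float]], view_b: List[List[float]], temperature: float = 0.5
) -> float:
    """NT-Xent loss where view_a[i] and view_b[i] embed the same account."""
    if not view_a or len(view_a) != len(view_b):
        raise ValueError("both views need the same non-zero number of embeddings")
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same account
        logits = {
            k: _cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i
        }
        peak = max(logits.values())
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += log_sum_exp - logits[positive]  # cross-entropy with the positive as target
    return total / (2 * n)


if __name__ == "__main__":
    accounts = [[1.0, 0.0], [0.0, 1.0]]
    print(nt_xent_loss(accounts, [[1.0, 0.0], [0.0, 1.0]]))  # matching views
    print(nt_xent_loss(accounts, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched views
```

Minimizing this loss pulls matching views together, which is one way to obtain the linearly separable embedding space the abstract describes; a simple linear classifier can then be trained on top.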

翻訳日:2024-02-07 16:19:30 公開日:2024-02-06

# 微分的にプライベートな高次元バンディット

Differentially Private High Dimensional Bandits ( http://arxiv.org/abs/2402.03737v1 )

ライセンス: Link先を確認

Apurv Shukla

(参考訳) パラメータベクトルが$s_{0}$-sparseであり、決定メーカーが偏微分プライバシーの中央モデルと局所モデルの両方の下でプライバシー制約を受ける場合、高次元の確率的文脈線形バンディット問題を考える。差分プライベートなLASSO帯域幅アルゴリズムであるPrivateLASSOを提案する。 PrivateLASSOは2つのサブルーチンに基づいている。 (i)まばらなハードスレッディングに基づくプライバシー機構 (ii)パラメータ $\theta$ のサポートを特定するためのエピソディックしきい値規則。標準前提の下では,PrivateLASSOのプライバシと実用性を保証するために,Minimaxのプライベートなバウンダリを証明している。

We consider a high-dimensional stochastic contextual linear bandit problem when the parameter vector is $s_{0}$-sparse and the decision maker is subject to privacy constraints under both central and local models of differential privacy. We present PrivateLASSO, a differentially private LASSO bandit algorithm. PrivateLASSO is based on two sub-routines: (i) a sparse hard-thresholding-based privacy mechanism and (ii) an episodic thresholding rule for identifying the support of the parameter $\theta$. We prove minimax private lower bounds and establish privacy and utility guarantees for PrivateLASSO for the central model under standard assumptions.
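
The first sub-routine can be illustrated by a generic noisy hard-thresholding step: perturb each coordinate with Laplace noise, then keep only the s largest-magnitude coordinates. This is a sketch of the general idea only. The noise scale is a free parameter here, so the code does not by itself establish any formal privacy guarantee (that would require calibrating the noise to the sensitivity and privacy budget), and PrivateLASSO's actual mechanism may differ. All names are hypothetical.

```python
import random
from typing import List, Optional


def noisy_hard_threshold(
    vector: List[float],
    sparsity: int,
    noise_scale: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Add Laplace(noise_scale) noise per coordinate, then keep the top-`sparsity` entries."""
    if not 0 <= sparsity <= len(vector):
        raise ValueError("sparsity must be between 0 and len(vector)")
    if noise_scale < 0.0:
        raise ValueError("noise_scale must be non-negative")
    generator = rng if rng is not None else random.Random()

    def laplace() -> float:
        if noise_scale == 0.0:
            return 0.0
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        rate = 1.0 / noise_scale
        return generator.expovariate(rate) - generator.expovariate(rate)

    noisy = [x + laplace() for x in vector]
    ranked = sorted(range(len(noisy)), key=lambda i: abs(noisy[i]), reverse=True)
    keep = set(ranked[:sparsity])
    return [noisy[i] if i in keep else 0.0 for i in range(len(noisy))]


if __name__ == "__main__":
    estimate = [5.0, 0.1, -4.0, 0.2]
    print(noisy_hard_threshold(estimate, sparsity=2, noise_scale=0.5, rng=random.Random(0)))
```

With small noise the surviving coordinates coincide with the true support of a sparse parameter, which is the property an episodic support-identification rule would rely on.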

翻訳日:2024-02-07 16:19:04 公開日:2024-02-06

# 最大$s$-bundle問題に対する新しい境界法を用いた効果的な分岐・境界アルゴリズム

An Effective Branch-and-Bound Algorithm with New Bounding Methods for the Maximum $s$-Bundle Problem ( http://arxiv.org/abs/2402.03736v1 )

ライセンス: Link先を確認

Jinghui Xue, Jiongzhi Zheng, Mingming Jin and Kun He

(参考訳) 最大sバンドル問題(MBP)は、与えられたグラフ内の最大sバンドルを特定するタスクに対処する。グラフ g=(v, e) が s-バンドル (s-bundle) とは、頂点接続が少なくとも |v|-s であるとき、頂点接続が最小の頂点数に等しいときに言う。 MBPはNPハードであり、頂点接続性を強調する多くの現実シナリオに関連がある。 mbpの正確なアルゴリズムは、主に分枝結合(bnb)フレームワークに従っており、その性能は最大s束の濃度とグラフ還元による最初の下限の上限の品質に大きく依存している。本研究では,グラフ分割技術を活用した分割型上界(PUB)を導入し,既存のものに比べてより厳密な上界を実現する。下限を増加させるために,クリップ上で短いランダムウォークを行い,より大きな初期解を生成することを提案する。そこで我々は,グラフ削減のための前処理に初期下界とPUBを用いる新しいBnBアルゴリズムを提案し,分岐解析にBnB探索プロセスにPUBを用いる。多様なs値を用いた大規模な実験は、最先端のBnB MBPアルゴリズムに対する我々のアルゴリズムの顕著な進歩を示している。さらに、最初の下界は、他の緩和傾斜問題にも一般化できる。

The Maximum s-Bundle Problem (MBP) addresses the task of identifying a maximum s-bundle in a given graph. A graph G=(V, E) is called an s-bundle if its vertex connectivity is at least |V|-s, where the vertex connectivity equals the minimum number of vertices whose deletion yields a disconnected or trivial graph. MBP is NP-hard and holds relevance in numerous real-world scenarios emphasizing vertex connectivity. Exact algorithms for MBP mainly follow the branch-and-bound (BnB) framework, whose performance heavily depends on the quality of the upper bound on the cardinality of a maximum s-bundle and the initial lower bound with graph reduction. In this work, we introduce a novel Partition-based Upper Bound (PUB) that leverages the graph partitioning technique to achieve a tighter upper bound compared to existing ones. To increase the lower bound, we propose to do short random walks on a clique to generate larger initial solutions. Then, we propose a new BnB algorithm that uses the initial lower bound and PUB in preprocessing for graph reduction, and uses PUB in the BnB search process for branch pruning. Extensive experiments with diverse s values demonstrate the significant progress of our algorithm over state-of-the-art BnB MBP algorithms. Moreover, our initial lower bound can also be generalized to other relaxed clique problems.
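
The s-bundle definition above can be checked directly on small graphs. The sketch below is a brute-force illustration of the definition only: it enumerates vertex subsets, so it runs in exponential time and has nothing to do with the paper's branch-and-bound algorithm or its PUB bound. The adjacency-dictionary representation and function names are choices made here.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def is_connected(vertices: Iterable[Hashable], adj: Graph) -> bool:
    """True if the subgraph induced by `vertices` is connected (empty counts as connected)."""
    remaining = set(vertices)
    if not remaining:
        return True
    start = next(iter(remaining))
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w in remaining and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == remaining


def vertex_connectivity(adj: Graph) -> int:
    """Minimum number of vertices whose deletion leaves a disconnected or trivial graph."""
    nodes = list(adj)
    for k in range(len(nodes)):
        for removed in combinations(nodes, k):
            rest = set(nodes) - set(removed)
            if len(rest) <= 1 or not is_connected(rest, adj):
                return k
    return 0  # only reached for the empty graph


def is_s_bundle(adj: Graph, s: int) -> bool:
    """True if the vertex connectivity is at least |V| - s."""
    return vertex_connectivity(adj) >= len(adj) - s


if __name__ == "__main__":
    cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
    print(vertex_connectivity(cycle), is_s_bundle(cycle, 1), is_s_bundle(cycle, 2))
```

A 1-bundle is exactly a clique, and larger s values tolerate progressively weaker connectivity, which is why the s-bundle is described as a relaxation of the clique.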

翻訳日:2024-02-07 16:18:54 公開日:2024-02-06

# 課題追跡システムにおけるChatGPTの有用性の検討:探索的研究

Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study ( http://arxiv.org/abs/2402.03735v1 )

ライセンス: Link先を確認

Joy Krishan Das, Saikat Mondal, Chanchal K. Roy

(参考訳) 問題追跡システムは、外部ユーザを取り入れ、ユーザの要求を満たすためにソフトウェアプロジェクトをカスタマイズするための主要なツールである。しかし、コントリビュータの数が限られており、各問題に対する最善のアプローチを特定するという課題は、しばしば効果的な解決を妨げる。最近、ChatGPTのようなAIツールを使って問題解決の効率を高める開発者が増えている。これまでの研究では、自動プログラム修復、デバッグ、コード生成といった分野でChatGPTの可能性を実証してきたが、開発者がChatGPTを明示的に利用してトラッキングシステムの問題を解決する方法については研究されていない。そこで本研究では,ChatGPTと開発者間のインタラクションを分析し,それらの活動を分析し,解決することを目的とした。さらに,ChatGPTが生成したコードが,クローン検出ツールNiCadを使用してプロジェクトのコードベースに統合されているかどうかを確認することで,コードの信頼性を評価する。私たちの調査によると、開発者は主にブレインストーミングソリューションにChatGPTを使用しているが、おそらく文献で強調されているように、ChatGPTで生成されたコードではなく、自分のコードを書くことを選択している。

Issue tracking systems serve as the primary tool for incorporating external users and customizing a software project to meet the users' requirements. However, the limited number of contributors and the challenge of identifying the best approach for each issue often impede effective resolution. Recently, an increasing number of developers are turning to AI tools like ChatGPT to enhance problem-solving efficiency. While previous studies have demonstrated the potential of ChatGPT in areas such as automatic program repair, debugging, and code generation, there is a lack of research on how developers explicitly utilize ChatGPT to resolve issues in their tracking system. Hence, this study aims to examine the interaction between ChatGPT and developers to analyze their prevalent activities and provide a resolution. In addition, we assess the code reliability by confirming if the code produced by ChatGPT was integrated into the project's codebase using the clone detection tool NiCad. Our investigation reveals that developers mainly use ChatGPT for brainstorming solutions but often opt to write their own code instead of using ChatGPT-generated code, possibly due to concerns over the generation of "hallucinated code", as highlighted in the literature.

翻訳日:2024-02-07 16:18:28 公開日:2024-02-06

# 知識グラフにおけるDeep outdated Fact Detection

Deep Outdated Fact Detection in Knowledge Graphs ( http://arxiv.org/abs/2402.03732v1 )

ライセンス: Link先を確認

Huiling Tu, Shuo Yu, Vidya Saikrishna, Feng Xia, Karin Verspoor

(参考訳) 知識グラフ(KGs)は、様々な領域にまたがる大きな可能性について、大きな注目を集めている。しかし、時代遅れの事実の問題はKGに挑戦し、現実世界の情報が進化するにつれて、その全体的な品質に影響を及ぼす。古い事実検出のための既存のソリューションは、しばしば手動認識に依存している。そこで本研究では,KGs内の古い事実を識別するための新しいディープラーニングベースのフレームワークであるDEAN(Deep outdatEd fAct detectioN)を提案する。 DEANは、実体と関係の包括的モデリングを通じて、事実間の暗黙的な構造情報をキャプチャすることで、自分自身を区別する。 DEANは遅延情報を効果的に発見するために、エンティティの数で重み付けされたR2N(Relations-to-Nodes)グラフに基づく対照的なアプローチを採用している。実験結果は,最先端のベースライン法よりもDEANの有効性と優位性を示した。

Knowledge graphs (KGs) have garnered significant attention for their vast potential across diverse domains. However, the issue of outdated facts poses a challenge to KGs, affecting their overall quality as real-world information evolves. Existing solutions for outdated fact detection often rely on manual recognition. In response, this paper presents DEAN (Deep outdatEd fAct detectioN), a novel deep learning-based framework designed to identify outdated facts within KGs. DEAN distinguishes itself by capturing implicit structural information among facts through comprehensive modeling of both entities and relations. To effectively uncover latent out-of-date information, DEAN employs a contrastive approach based on a pre-defined Relations-to-Nodes (R2N) graph, weighted by the number of entities. Experimental results demonstrate the effectiveness and superiority of DEAN over state-of-the-art baseline methods.

翻訳日:2024-02-07 16:18:08 公開日:2024-02-06

# スピンキャビティ系における離散時間結晶に対するパラメトリック共鳴の理論

Theory of parametric resonance for discrete time crystals in fully-connected spin-cavity systems ( http://arxiv.org/abs/2402.03729v1 )

ライセンス: Link先を確認

Roy D. Jara Jr., Dennis F. Salinel, Jayson G. Cosme

(参考訳) 全連結スピンキャビティ系における離散時間結晶形成に必要な条件をパラメトリック共鳴の観点から特定し、これらの系を振動子様モデルにマッピングする。我々は、周期的に駆動されるオープンディックモデル(DM)を実効線形および非線形振動子モデルにマッピングし、リプキン-メシュコフ-グリックモデル(LMG)モデルを用いて大域対称性破壊の効果を解析する。系の非線形性は, 共振駆動時の非有界化を抑制することを示す。一方、消散は周期性不安定性の振動振幅を一定に保ち、これはDTCの重要な特徴である。周期共振応答のパラメトリック共振器活性化には, 駆動のない大域対称性の破れの存在が不可欠であることがわかった。各振動子モデルを用いて,両系の共振周波数とdtc形成につながる振幅の解析的予測を行う。

We pinpoint the conditions necessary for discrete time crystal formation in fully-connected spin-cavity systems from the perspective of parametric resonance by mapping these systems onto oscillator-like models. We elucidate the role of nonlinearity and dissipation by mapping the periodically driven open Dicke model (DM) onto effective linear and nonlinear oscillator models, while we analyze the effect of global symmetry breaking using the Lipkin-Meshkov-Glick (LMG) model with tunable anisotropy. We show that the system's nonlinearity restrains the dynamics from becoming unbounded when driven resonantly. On the other hand, dissipation keeps the oscillation amplitude of the period-doubling instability fixed, which is a key feature of DTCs. The presence of global symmetry breaking in the absence of driving is found to be crucial in the parametric resonant activation of period-doubling response. We provide analytic predictions for the resonant frequencies and amplitudes leading to DTC formation for both systems using their respective oscillator models.

翻訳日:2024-02-07 16:17:52 公開日:2024-02-06

# 不均一学習モデルを用いた一貫した共同意思決定

Consistent Joint Decision-Making with Heterogeneous Learning Models ( http://arxiv.org/abs/2402.03728v1 )

ライセンス: Link先を確認

Hossein Rajaby Faghihi and Parisa Kordjamshidi

(参考訳) 本稿では,外部知識を活用しつつ,多様なモデルによる意思決定の一貫性を促進する新しい意思決定フレームワークを提案する。整数線形計画法(ilp)フレームワークを活用することで,様々なモデルからの予測を,決定の事前確率,信頼度(不確実性),モデルの期待精度に関する情報を組み込むことにより,グローバルに正規化され,比較可能な値にマッピングする。実験により、従来の複数のデータセットのベースラインよりもアプローチが優れていることを示す。

This paper introduces a novel decision-making framework that promotes consistency among decisions made by diverse models while utilizing external knowledge. Leveraging the Integer Linear Programming (ILP) framework, we map predictions from various models into globally normalized and comparable values by incorporating information about decisions' prior probability, confidence (uncertainty), and the models' expected accuracy. Our empirical study demonstrates the superiority of our approach over conventional baselines on multiple datasets.

翻訳日:2024-02-07 16:17:35 公開日:2024-02-06

# インスタンス・ワイズ・セルフ・アテンティブ・ホークスプロセスによる粒状因果性学習

Learning Granger Causality from Instance-wise Self-attentive Hawkes Processes ( http://arxiv.org/abs/2402.03726v1 )

ライセンス: Link先を確認

Dongxia Wu, Tsuyoshi Idé, Aurélie Lozano, Georgios Kollias, Jiří Navrátil, Naoki Abe, Yi-An Ma, Rose Yu

(参考訳) 本稿では,非同期,相互依存型,複数タイプのイベントシーケンスからGranger因果関係を学習する問題に対処する。特に、インスタンスレベルの因果構造を教師なしで発見することに興味がある。インスタンスレベルの因果関係は個々のイベント間の因果関係を認識し、よりきめ細かい情報を提供する。文献における既存の研究は、強度関数の線形性のような強い仮定や、必ずしもグランジャー因果関係の要件を満たさないヒューリスティックに定義されたモデルパラメータを必要とする。本稿では,イベントインスタンスレベルでのグランジャー因果関係を直接推測可能な,新しいディープラーニングフレームワークであるisahp(instance-wise self-attentive hawkes processes)を提案する。 ISAHPは、Granger因果性の要求を満たす最初の神経点プロセスモデルである。変圧器の自己着脱機構を利用して、グレンジャー因果関係の原理に合致する。我々は、ISAHPが古典モデルでは扱えない複雑なインスタンスレベルの因果構造を発見することができることを実証的に実証した。また、ISAHPは、タイプレベルの因果発見とインスタンスレベルのイベントタイプ予測を含むプロキシタスクにおいて、最先端のパフォーマンスを達成することを示す。

We address the problem of learning Granger causality from asynchronous, interdependent, multi-type event sequences. In particular, we are interested in discovering instance-level causal structures in an unsupervised manner. Instance-level causality identifies causal relationships among individual events, providing more fine-grained information for decision-making. Existing work in the literature either requires strong assumptions, such as linearity in the intensity function, or heuristically defined model parameters that do not necessarily meet the requirements of Granger causality. We propose Instance-wise Self-Attentive Hawkes Processes (ISAHP), a novel deep learning framework that can directly infer the Granger causality at the event instance level. ISAHP is the first neural point process model that meets the requirements of Granger causality. It leverages the self-attention mechanism of the transformer to align with the principles of Granger causality. We empirically demonstrate that ISAHP is capable of discovering complex instance-level causal structures that cannot be handled by classical models. We also show that ISAHP achieves state-of-the-art performance in proxy tasks involving type-level causal discovery and instance-level event type prediction.

翻訳日:2024-02-07 16:17:25 公開日:2024-02-06

# 自由フェルミオン負性に対する電荷相関子展開

Charge correlator expansion for free fermion negativity ( http://arxiv.org/abs/2402.03725v1 )

ライセンス: Link先を確認

Yang-Yang Tang

(参考訳) 対数ネガティビティ(英: logarithmic negativity)は、量子情報理論において広く用いられるエンタングルメント測度であり、レプリカトリックや相関行列との関連付けによって、量子多体系でも効率的に計算できる。本稿では,保存電荷を持つ自由フェルミオン系において,完全計数統計(FCS)の文脈における絡み合いエントロピーの場合と類似した,連結電荷相関子によってRényiネガティビティおよび対数ネガティビティを展開できることを実証する。特に局所ホッピングしか持たない系について、ランダムな全連結ハミルトニアンにおけるこの展開の急速な収束を数値検証により確認した。 Rényiネガティビティの極限から対数ネガティビティを得るレプリカトリックは、この方法では並進不変系のみに有効である。この展開を用いて、広範囲な自由フェルミオン系におけるネガティビティのスケーリング挙動を解析する。特に, 1+1次元自由フェルミオン系では, 展開によるネガティビティのスケーリング挙動は, Toeplitz 行列を用いた手法の既知結果と一致している。これらの知見は, 自由フェルミオン系の絡み合い特性に関する知見を与え, 絡み合い測度の研究における展開手法の有効性を実証する。

Logarithmic negativity is a widely used entanglement measure in quantum information theory, which can also be efficiently computed in quantum many-body systems by the replica trick or by relating it to correlation matrices. In this paper, we demonstrate that in free-fermion systems with conserved charge, Rényi and logarithmic negativity can be expanded in terms of connected charge correlators, analogous to the case for entanglement entropy in the context of full counting statistics (FCS). We confirm the rapid convergence of this expansion in random all-connected Hamiltonians through numerical verification, especially for systems with only local hopping. We find that the replica trick that obtains logarithmic negativity from the limit of Rényi negativity is valid in this method only for translationally invariant systems. Using this expansion, we analyze the scaling behavior of negativity in extensive free-fermion systems. In particular, in 1+1 dimensional free-fermion systems, we observe that the scaling behavior of negativity from our expansion is consistent with known results from the Toeplitz matrix method. These findings provide insights into the entanglement properties of free-fermion systems, and demonstrate the efficacy of the expansion approach in studying entanglement measures.

翻訳日:2024-02-07 16:17:07 公開日:2024-02-06

# 平滑MDPにおけるノーリグレット強化学習

No-Regret Reinforcement Learning in Smooth MDPs ( http://arxiv.org/abs/2402.03792v1 )

ライセンス: Link先を確認

Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restell

(参考訳) 連続状態および/または連続行動空間を持つ問題において、強化学習(RL)のノーリグレット保証を得ることは、この分野における大きな未解決課題の1つである。最近、様々な解決策が提案されているが、非常に特定の設定を除けば、一般的な問題は未解決のままである。本稿では,マルコフ決定過程 (MDP) に関する新しい構造的仮定,すなわち $\nu$-smoothness を導入し,これまで提案されてきた設定の大部分を一般化する(線形MDPやリプシッツMDPなど)。この困難なシナリオに対処するため、我々は $\nu$-smooth MDP における後悔の最小化のための2つのアルゴリズムを提案する。どちらのアルゴリズムも、ルジャンドル多項式に基づく直交特徴写像を通してMDP表現を構築するという考え方に基づいている。第1のアルゴリズムである Legendre-Eleanor は、より弱い仮定の下でノーリグレット特性を達成するが計算効率は低く、第2のアルゴリズムである Legendre-LSVI は、より小さな問題クラスに対してではあるが多項式時間で実行される。後悔に関する特性を解析した上で、RL理論の最先端の結果と比較し、提案アルゴリズムが最高の保証を達成することを示す。

Obtaining no-regret guarantees for reinforcement learning (RL) in the case of problems with continuous state and/or action spaces is still one of the major open challenges in the field. Recently, a variety of solutions have been proposed, but besides very specific settings, the general problem remains unsolved. In this paper, we introduce a novel structural assumption on the Markov decision processes (MDPs), namely $\nu$-smoothness, that generalizes most of the settings proposed so far (e.g., linear MDPs and Lipschitz MDPs). To face this challenging scenario, we propose two algorithms for regret minimization in $\nu$-smooth MDPs. Both algorithms build upon the idea of constructing an MDP representation through an orthogonal feature map based on Legendre polynomials. The first algorithm, Legendre-Eleanor, achieves the no-regret property under weaker assumptions but is computationally inefficient, whereas the second one, Legendre-LSVI, runs in polynomial time, although for a smaller class of problems. After analyzing their regret properties, we compare our results with state-of-the-art ones from RL theory, showing that our algorithms achieve the best guarantees.

翻訳日:2024-02-07 16:10:18 公開日:2024-02-06

# より良いコード表現のためのバージョン履歴コンテキストのエンコーディング

Encoding Version History Context for Better Code Representation ( http://arxiv.org/abs/2402.03773v1 )

ライセンス: Link先を確認

Huy Nguyen, Christoph Treude, Patanamon Thongtanunam

(参考訳) ソースコードを生成するAIツールの指数関数的な成長により、ソフトウェアを理解することが重要になっている。開発者がプログラムを理解すると、プログラムのドキュメントや過去のコードバージョンなどの情報を探すために追加のコンテキストを参照することができる。したがって、この追加の文脈情報を符号化することは、深層学習のためのコード表現にも役立つと論じる。最近の論文では、プログラム理解問題に対処するために、文脈データ(例えば呼び出し階層)をベクトル表現に組み込んでいる。これは、モデルによるプログラムの理解を深めるために、バージョン履歴のような追加のコンテキストを探求するさらなる研究を動機付ける。つまり、バージョン履歴からの洞察によって、コードの進化におけるパターンの認識、繰り返し発生する問題、過去のソリューションの有効性が実現される。本稿では、バージョン履歴から文脈情報をエンコードしてコードクローンを予測し、コード分類を行うことによる潜在的メリットの予備的な証拠を示す。我々は,astnnとcodebertという2つの代表的なディープラーニングモデルを用いて,異なるアグリゲーションによる追加コンテキストの組み合わせが下流アクティビティに有用かどうかを検証した。実験結果は,すべてのシナリオにおいて,バージョン履歴とソースコード表現を組み合わせることによる肯定的な影響を裏付けるものである。しかし,そのテクニックを一貫して実行するためには,コンテキスト,集約,モデルの異なる組み合わせを用いて,より大規模なコードベースを包括的に調査する必要がある。そこで本稿では,コード表現の改善と特定の状況における最適活用を目的とした,追加コンテキストの符号化のさまざまな側面を探求する研究課題を提案する。

With the exponential growth of AI tools that generate source code, understanding software has become crucial. When developers comprehend a program, they may refer to additional contexts to look for information, e.g. program documentation or historical code versions. Therefore, we argue that encoding this additional contextual information could also benefit code representation for deep learning. Recent papers incorporate contextual data (e.g. call hierarchy) into vector representation to address program comprehension problems. This motivates further studies to explore additional contexts, such as version history, to enhance models' understanding of programs. That is, insights from version history enable recognition of patterns in code evolution over time, recurring issues, and the effectiveness of past solutions. Our paper presents preliminary evidence of the potential benefit of encoding contextual information from the version history to predict code clones and perform code classification. We experiment with two representative deep learning models, ASTNN and CodeBERT, to investigate whether combining additional contexts with different aggregations may benefit downstream activities. The experimental result affirms the positive impact of combining version history into source code representation in all scenarios; however, to ensure the technique performs consistently, we need to conduct a holistic investigation on a larger code base using different combinations of contexts, aggregation, and models. Therefore, we propose a research agenda aimed at exploring various aspects of encoding additional context to improve code representation and its optimal utilisation in specific situations.

翻訳日:2024-02-07 16:09:51 公開日:2024-02-06

# bagged rewardからの強化学習:インスタンスレベルの報酬再分配のためのトランスフォーマーベースのアプローチ

Reinforcement Learning from Bagged Reward: A Transformer-based Approach for Instance-Level Reward Redistribution ( http://arxiv.org/abs/2402.03771v1 )

ライセンス: Link先を確認

Yuting Tang and Xin-Qiang Cai and Yao-Xiang Ding and Qiyu Wu and Guoqing Liu and Masashi Sugiyama

(参考訳) 強化学習(RL)では、エージェントの動作毎に即時報酬信号を生成し、エージェントが累積報酬を最大化して最適なポリシーを得るように学習する。しかし、現実世界の多くのアプリケーションでは、即時報酬信号はエージェントによって取得できない。代わりに、学習者はバッグの端でのみ報酬を受け取り、バッグは完全な軌道の部分的なシーケンスとして定義される。この状況では、学習者はバッグ内の未知の即時報酬を探索する重大な困難に直面しなければならないが、これは既存のアプローチでは対処できない。本稿では、この状況を正式に研究するために、RLBR(Reinforcement Learning from Bagged Rewards)と呼ばれる新しいRL設定を導入する。本稿では,マルコフ決定過程(MDP)におけるRLBRと標準RLの関連性を確立するための理論的研究について述べる。そこで本研究では,袋内における報酬分布を効果的に解明するために,袋内における文脈的ニュアンスや時間的依存関係を解釈するセルフアテンション機構を用いた,トランスフォーマベースの報酬モデルである報奨袋トランス(rbt)を提案する。広汎な実験分析により,本手法の優位性,特に元のMDPの報酬分布を模倣する能力が示され,文脈的理解能力と環境力学への適応性を強調した。

In Reinforcement Learning (RL), an instant reward signal is generated for each action of the agent, such that the agent learns to maximize the cumulative reward to obtain the optimal policy. However, in many real-world applications, the instant reward signals are not obtainable by the agent. Instead, the learner only obtains rewards at the ends of bags, where a bag is defined as a partial sequence of a complete trajectory. In this situation, the learner has to face the significant difficulty of exploring the unknown instant rewards in the bags, which cannot be addressed by existing approaches, including trajectory-based approaches that consider only complete trajectories and ignore the inner reward distributions. To formally study this situation, we introduce a novel RL setting termed Reinforcement Learning from Bagged Rewards (RLBR), where only the bagged rewards of sequences can be obtained. We provide a theoretical study to establish the connection between RLBR and standard RL in Markov Decision Processes (MDPs). To effectively explore the reward distributions within the bagged rewards, we propose a Transformer-based reward model, the Reward Bag Transformer (RBT), which uses the self-attention mechanism for interpreting the contextual nuances and temporal dependencies within each bag. Extensive experimental analyses demonstrate the superiority of our method, particularly in its ability to mimic the original MDP's reward distribution, highlighting its proficiency in contextual understanding and adaptability to environmental dynamics.
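The bagged-reward setting described above can be made concrete with a toy example. This sketch assumes that a bag's reward is the sum of the instant rewards inside it, which the abstract does not state explicitly. The helper names `bag_rewards` and `uniform_redistribution` are hypothetical, and the uniform split is only a naive baseline for comparison, not the Reward Bag Transformer.

```python
def bag_rewards(instant_rewards, bag_lengths):
    """Collapse per-step rewards into one reward per bag.

    Assumption: a bag's reward is the sum of its instant rewards, and the
    bags are consecutive, non-overlapping pieces of the trajectory.
    """
    if sum(bag_lengths) != len(instant_rewards):
        raise ValueError("bag lengths must cover the whole trajectory")
    bagged = []
    start = 0
    for length in bag_lengths:
        bagged.append(sum(instant_rewards[start:start + length]))
        start += length
    return bagged


def uniform_redistribution(bagged, bag_lengths):
    """Naive baseline: spread each bag's reward evenly over its steps."""
    per_step = []
    for total, length in zip(bagged, bag_lengths):
        per_step.extend([total / length] * length)
    return per_step


# The learner never sees `instant`; it only observes `observed`.
instant = [1.0, 0.0, 2.0, 3.0, -1.0]
lengths = [2, 3]
observed = bag_rewards(instant, lengths)
print(observed)
print(uniform_redistribution(observed, lengths))
```

The uniform split preserves each bag's total but discards which step earned the reward, which is the information a learned redistribution model would try to recover.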

翻訳日:2024-02-07 16:09:24 公開日:2024-02-06

# Fed-CVLC: 可変長符号によるフェデレーション学習コミュニケーションの圧縮

Fed-CVLC: Compressing Federated Learning Communications with Variable-Length Codes ( http://arxiv.org/abs/2402.03770v1 )

ライセンス: Link先を確認

Xiaoxin Su, Yipeng Zhou, Laizhong Cui, John C.S. Lui and Jiangchuan Liu

(参考訳) フェデレーション学習(fl)パラダイムでは、パラメータサーバ(ps)が、個々のクライアントが所有するプライベートデータに触らずに、複数のラウンドにわたってモデル収集、更新集約、モデル分散のために、分散参加者クライアントと同時通信する。 FLはデータのプライバシを保存することに魅力がありますが、PSと散在するクライアント間の通信は深刻なボトルネックになります。量子化やスパーシフィケーションのようなモデル圧縮アルゴリズムは提案されているが、一般に固定コード長を仮定しており、モデル更新の不均一性と可変性を反映していない。本稿では,解析と実験の両方を通して,FLの圧縮に可変長が有用であることを示す。そこで我々はFed-CVLC(Federated Learning Compression with Variable-Length Codes)を提案する。通信予算を考慮した損失関数(モデルユーティリティの最大化に相当)を最小化する最適調整戦略を開発する。さらに、Fed-CVLCは、量子化とスパーシフィケーションを橋渡しし、より柔軟な圧縮設計であることを示す。 Fed-CVLCは最先端のベースラインを著しく上回り、モデルの実用性は1.50%-5.44%向上し、通信トラフィックは16.67%-41.61%縮小した。

In the Federated Learning (FL) paradigm, a parameter server (PS) concurrently communicates with distributed participating clients for model collection, update aggregation, and model distribution over multiple rounds, without touching private data owned by individual clients. FL is appealing in preserving data privacy; yet the communication between the PS and scattered clients can be a severe bottleneck. Model compression algorithms, such as quantization and sparsification, have been suggested but they generally assume a fixed code length, which does not reflect the heterogeneity and variability of model updates. In this paper, through both analysis and experiments, we show strong evidence that variable-length codes are beneficial for compression in FL. We accordingly present Fed-CVLC (Federated Learning Compression with Variable-Length Codes), which fine-tunes the code length in response to the dynamics of model updates. We develop an optimal tuning strategy that minimizes the loss function (equivalent to maximizing the model utility) subject to the budget for communication. We further demonstrate that Fed-CVLC is indeed a general compression design that bridges quantization and sparsification, with greater flexibility. Extensive experiments have been conducted with public datasets to demonstrate that Fed-CVLC remarkably outperforms state-of-the-art baselines, improving model utility by 1.50%-5.44%, or shrinking communication traffic by 16.67%-41.61%.

翻訳日:2024-02-07 16:08:58 公開日:2024-02-06

# attacknet: ライブネス検出のための畳み込みニューラルネットワークアーキテクチャによる生体認証セキュリティの強化

AttackNet: Enhancing Biometric Security via Tailored Convolutional Neural Network Architectures for Liveness Detection ( http://arxiv.org/abs/2402.03769v1 )

ライセンス: Link先を確認

Oleksandr Kuznetsov, Dmytro Zakharov, Emanuele Frontoni, Andrea Maranesi

(参考訳) バイオメトリック・セキュリティは、バイオメトリック・サンプルの完全性と信頼性が最重要となる、現代のアイデンティティ認証および認証システムの基盤である。本稿では,バイオメトリックシステムにおけるスプーフィング脅威に対処するように設計された,目覚ましい畳み込みニューラルネットワークアーキテクチャであるAttackNetを紹介する。深層学習手法を取り入れたこのモデルは,低レベル特徴抽出から高レベルパターン識別へシームレスに移行する,層状防御機構を提供する。 3つの特徴的なアーキテクチャフェーズがモデルの要点を形成し、それぞれが司法的に選択されたアクティベーション関数、正規化テクニック、およびドロップアウト層によって支えられ、敵の攻撃に対する堅牢性とレジリエンスを確保する。多様なデータセットにまたがってモデルをベンチマークすることで、現在のモデルと比較して優れたパフォーマンス指標を示す。さらに、詳細な比較分析によりモデルの有効性が強調され、最先端の手法と平行に描画される。反復的な洗練とアーキテクチャ戦略を通じて、AttackNetはバイオメトリックセキュリティの未来を守るためのディープラーニングの可能性を強調している。

Biometric security is the cornerstone of modern identity verification and authentication systems, where the integrity and reliability of biometric samples is of paramount importance. This paper introduces AttackNet, a bespoke Convolutional Neural Network architecture, meticulously designed to combat spoofing threats in biometric systems. Rooted in deep learning methodologies, this model offers a layered defense mechanism, seamlessly transitioning from low-level feature extraction to high-level pattern discernment. Three distinctive architectural phases form the crux of the model, each underpinned by judiciously chosen activation functions, normalization techniques, and dropout layers to ensure robustness and resilience against adversarial attacks. Benchmarking our model across diverse datasets affirms its prowess, showcasing superior performance metrics in comparison to contemporary models. Furthermore, a detailed comparative analysis accentuates the model's efficacy, drawing parallels with prevailing state-of-the-art methodologies. Through iterative refinement and an informed architectural strategy, AttackNet underscores the potential of deep learning in safeguarding the future of biometric security.

翻訳日:2024-02-07 16:08:31 公開日:2024-02-06

# mobilevlm v2: ビジョン言語モデルの高速かつ強力なベースライン

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model ( http://arxiv.org/abs/2402.03766v1 )

ライセンス: Link先を確認

Xiangxiang Chu and Limeng Qiao and Xinyu Zhang and Shuang Xu and Fei Wei and Yang Yang and Xiaofei Sun and Yiming Hu and Xinyang Lin and Bo Zhang and Chunhua Shen

(参考訳) 我々は,MobileVLM上で大幅に改良された視覚言語モデルであるMobileVLM V2を紹介し,新しいアーキテクチャ設計の繊細なオーケストレーション,モバイルVLMに適したトレーニングスキームの改善,高品質なデータセットキュレーションにより,VLMの性能を大幅に向上させることができることを示した。特に、MobileVLM V2 1.7Bは、標準VLMベンチマークにおいて、3Bスケールでのより大きなVLMよりも優れた、または低いパフォーマンスを達成する。特に、我々の3Bモデルは7B+スケールで様々なVLMより優れています。私たちのモデルはhttps://github.com/Meituan-AutoML/MobileVLMでリリースされます。

We introduce MobileVLM V2, a family of significantly improved vision language models upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an improved training scheme tailored for mobile VLMs, and rich high-quality dataset curation can substantially benefit VLMs' performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, our 3B model outperforms a large variety of VLMs at the 7B+ scale. Our models will be released at https://github.com/Meituan-AutoML/MobileVLM .

翻訳日:2024-02-07 16:08:11 公開日:2024-02-06

# mod-slam:unbounded 3d scene reconstructionのための単眼高密度マッピング

MoD-SLAM: Monocular Dense Mapping for Unbounded 3D Scene Reconstruction ( http://arxiv.org/abs/2402.03762v1 )

ライセンス: Link先を確認

Heng Zhou, Zhetao Guo, Shuhong Liu, Lechen Zhang, Qihao Wang, Yuxiang Ren, Mingrui Li

(参考訳) ニューラルネットワークの暗黙的表現は、最近、同時局在化とマッピング(slam)を含む多くの分野で実証されている。現在のニューラルSLAMは境界シーンの再構成において理想的な結果が得られるが、これはRGB-D画像の入力に依存する。 rgb画像のみに基づくニューラルベースslamでは,シーンのスケールを正確に再構築することはできず,追跡中に蓄積されたエラーによりスケールドリフトに支障をきたす。このような制約を克服するために,世界的ポーズ最適化と3次元再構成を非有界シーンで実現可能な単眼的密集マッピング法 mod-slam を提案する。単眼深度推定によるシーン再構築の最適化とループ閉鎖検出によるカメラポーズの更新により、大規模シーンの詳細な再現が可能となる。これまでの作業と比べて、私たちのアプローチはより堅牢で、スケーラブルで、多用途です。実験の結果,MoD-SLAMのマッピング性能は,特に大きな境界のないシーンにおいて,従来のSLAM法よりも優れていた。

Neural implicit representations have recently been demonstrated in many fields including Simultaneous Localization And Mapping (SLAM). Current neural SLAM can achieve ideal results in reconstructing bounded scenes, but this relies on the input of RGB-D images. Neural SLAM based only on RGB images is unable to reconstruct the scale of the scene accurately, and it also suffers from scale drift due to errors accumulated during tracking. To overcome these limitations, we present MoD-SLAM, a monocular dense mapping method that allows global pose optimization and 3D reconstruction in real-time in unbounded scenes. Optimizing scene reconstruction by monocular depth estimation and using loop closure detection to update camera pose enable detailed and precise reconstruction on large scenes. Compared to previous work, our approach is more robust, scalable and versatile. Our experiments demonstrate that MoD-SLAM achieves better mapping performance than prior neural SLAM methods, especially in large unbounded scenes.

翻訳日:2024-02-07 16:07:59 公開日:2024-02-06

# 深層学習に基づく脳腫瘍手術用ハイパースペクトル画像の補正とアンミックス

Deep Learning-Based Correction and Unmixing of Hyperspectral Images for Brain Tumor Surgery ( http://arxiv.org/abs/2402.03761v1 )

ライセンス: Link先を確認

David Black, Jaidev Gill, Andrew Xie, Benoit Liquet, Antonio Di Ieva, Walter Stummer, Eric Suero Molina

(参考訳) 蛍光誘導脳腫瘍切除のためのハイパースペクトルイメージング(HSI)は、ヒトでは識別できない組織の違いを可視化する。この増強は脳腫瘍の切除を最大化し、患者の予後を改善する。しかし、hsiの処理の多くは、フルオロフォアの存在量の正確な回復のためにモデル化されなければならない非線形波長依存現象を捉えることができない単純な線形法を用いている。そこで本研究では,非線形効果を考慮し,より正確な量の推定を行うことができる2つの深層学習モデルを提案する。どちらのモデルも、捕獲されたスペクトルを処理するためにオートエンコーダのようなアーキテクチャを使用する。 1つはプロトポルフィリンIX(PpIX)濃度ラベルで訓練されている。他方は半教師訓練を行い、まずハイパースペクトルアンミックスを学習し、その後、参照白色光反射スペクトルを用いて不均一な光学的・幾何学的性質の蛍光発光スペクトルを数ショットで補正する学習を行う。 PpIX 濃度と計算した PpIX 濃度 0.997 と 0.990 の Pearson 相関係数 (R 値) は, 従来の手法では 0.93 と 0.82 しか得られなかった。半教師ありアプローチのR値はそれぞれ0.98と0.91である。人間のデータでは、半教師付きモデルは古典的手法よりも質的により現実的な結果を与え、スペクトル反射率の鮮明な点を除去し、比較的均一であるべき生検に対するPpIX量の分散を減少させる。これらの結果から,蛍光誘導神経外科における深層学習によるHSIの改善が期待できる。

Hyperspectral Imaging (HSI) for fluorescence-guided brain tumor resection enables visualization of differences between tissues that are not distinguishable to humans. This augmentation can maximize brain tumor resection, improving patient outcomes. However, much of the processing in HSI uses simplified linear methods that are unable to capture the non-linear, wavelength-dependent phenomena that must be modeled for accurate recovery of fluorophore abundances. We therefore propose two deep learning models for correction and unmixing, which can account for the nonlinear effects and produce more accurate estimates of abundances. Both models use an autoencoder-like architecture to process the captured spectra. One is trained with protoporphyrin IX (PpIX) concentration labels. The other undergoes semi-supervised training, first learning hyperspectral unmixing in a self-supervised manner and then learning to correct fluorescence emission spectra for heterogeneous optical and geometric properties using a reference white-light reflectance spectrum in a few-shot manner. The models were evaluated against phantom and pig brain data with known PpIX concentration; the supervised model achieved Pearson correlation coefficients (R values) between the known and computed PpIX concentrations of 0.997 and 0.990, respectively, whereas the classical approach achieved only 0.93 and 0.82. The semi-supervised approach's R values were 0.98 and 0.91, respectively. On human data, the semi-supervised model gives qualitatively more realistic results than the classical method, better removing bright spots of specular reflectance and reducing the variance in PpIX abundance over biopsies that should be relatively homogeneous. These results show promise for using deep learning to improve HSI in fluorescence-guided neurosurgery.
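The R values quoted above are Pearson correlation coefficients between known and computed PpIX concentrations. As a reminder of what that metric measures, here is a minimal sketch in plain Python; the function name `pearson_r` and the numbers in the usage example are made up for illustration and are not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, up to the same 1/n factor,
    # which cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)


# Illustrative values only: hypothetical known vs. estimated concentrations.
known = [0.0, 1.0, 2.0, 3.0, 4.0]
estimated = [0.1, 0.9, 2.2, 2.8, 4.1]
print(round(pearson_r(known, estimated), 3))
```

Because the coefficient is invariant to scale and offset, a high R value indicates that the estimates track the known concentrations linearly, not that they match in absolute value.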

翻訳日:2024-02-07 16:07:42 公開日:2024-02-06

# 仮想分類:多領域群数に対するドメイン固有知識の変調

Virtual Classification: Modulating Domain-Specific Knowledge for Multidomain Crowd Counting ( http://arxiv.org/abs/2402.03758v1 )

ライセンス: Link先を確認

Mingyue Guo, Binghui Chen, Zhaoyi Yan, Yaowei Wang, Qixiang Ye

(参考訳) マルチドメインのクラウドカウントは、複数の多様なデータセットの一般的なモデルを学ぶことを目的としている。しかし、ディープネットワークは、すべてのドメインではなく支配的なドメインの分布のモデリングを好み、これはドメインバイアスとして知られている。本研究では,マルチドメインの集団カウントにおけるドメインバイアス問題に対処するための,シンプルかつ効果的な Modulating Domain-specific Knowledge Network (MDKNet)を提案する。 MDKNetは‘変調’というアイデアを採用し、ディープネットワークが多様なデータセットの異なる分布を少ないバイアスでバランスよくモデリングできるようにしている。具体的には、ドメイン分布に適応するよう情報フローを洗練するためのベースモジュレータとして機能する、インスタンス固有バッチ正規化(IsBN)モジュールを提案する。ドメイン固有情報を正確に変調するためにドメイン誘導仮想分類器(DVC)を導入し、ドメイン分離可能な潜在空間を学習する。この空間は、IsBN変調器の入力ガイダンスとして使われ、複数のデータセットの混合分布を適切に扱うことができる。Shanghai-tech A/B、QNRF、NWPUなどの一般的なベンチマークで実施された大規模な実験は、マルチドメインのクラウドカウントに取り組む上でのMDKNetの優位性とマルチドメイン学習の有効性を検証する。コードは https://github.com/csguomy/MDKNet で入手できる。

Multidomain crowd counting aims to learn a general model for multiple diverse datasets. However, deep networks prefer modeling distributions of the dominant domains instead of all domains, which is known as domain bias. In this study, we propose a simple yet effective Modulating Domain-specific Knowledge Network (MDKNet) to handle the domain bias issue in multidomain crowd counting. MDKNet employs the idea of "modulating", enabling a deep network to balance and model the different distributions of diverse datasets with little bias. Specifically, we propose an Instance-specific Batch Normalization (IsBN) module, which serves as a base modulator to refine the information flow to be adaptive to domain distributions. To precisely modulate the domain-specific information, the Domain-guided Virtual Classifier (DVC) is then introduced to learn a domain-separable latent space. This space is employed as an input guidance for the IsBN modulator, such that the mixture distributions of multiple datasets can be well treated. Extensive experiments performed on popular benchmarks, including Shanghai-tech A/B, QNRF and NWPU, validate the superiority of MDKNet in tackling multidomain crowd counting and its effectiveness for multidomain learning. Code is available at https://github.com/csguomy/MDKNet.

翻訳日:2024-02-07 16:07:15 公開日:2024-02-06

# 直感的バイアス:Spurious ImagesはMLLMの幻覚に繋がる

The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs ( http://arxiv.org/abs/2402.03757v1 )

ライセンス: Link先を確認

Tianyang Han, Qing Lian, Rui Pan, Renjie Pi, Jipeng Zhang, Shizhe Diao, Yong Lin, Tong Zhang

(参考訳) 大規模言語モデル (LLM) は近年顕著な進歩を遂げており、マルチモーダルな大規模言語モデル (MLLM) の出現により、視覚能力を備えたLLMが実現され、様々なマルチモーダルタスクにおける印象的なパフォーマンスがもたらされた。しかし、GPT-4Vのような強力なMLLMでも、特定の画像やテキスト入力を提示されると驚くほど失敗する。本稿では,関連性は高いが正答とは矛盾する画像からなる、MLLMを混乱させる典型的な入力のクラスを特定し,これがMLLMの幻覚の原因となることを示す。この効果を定量化するために,スプリアスイメージが与えられた場合の幻覚レベルを評価する最初のベンチマークであるCorrelationQAを提案する。このベンチマークには、13のカテゴリにわたる7,308のテキストイメージペアが含まれている。提案したCorrelationQAに基づいて,9つの主流MLLMを網羅的に分析し,それらがこの本能的バイアスを様々な程度で普遍的に抱えることを示した。得られたベンチマークと評価結果が,誤解を招く画像の存在下でのMLLMの頑健さのより良い評価に役立つことを期待する。リソースは https://github.com/MasaiahHan/CorrelationQA で入手できる。

Large language models (LLMs) have recently experienced remarkable progress, where the advent of multi-modal large language models (MLLMs) has endowed LLMs with visual capabilities, leading to impressive performances in various multi-modal tasks. However, even powerful MLLMs such as GPT-4V still fail spectacularly when presented with certain image and text inputs. In this paper, we identify a typical class of inputs that baffles MLLMs, which consist of images that are highly relevant but inconsistent with answers, causing MLLMs to suffer from hallucination. To quantify the effect, we propose CorrelationQA, the first benchmark that assesses the hallucination level given spurious images. This benchmark contains 7,308 text-image pairs across 13 categories. Based on the proposed CorrelationQA, we conduct a thorough analysis on 9 mainstream MLLMs, illustrating that they universally suffer from this instinctive bias to varying degrees. We hope that our curated benchmark and evaluation results aid in better assessments of the MLLMs' robustness in the presence of misleading images. The resource is available at https://github.com/MasaiahHan/CorrelationQA.

翻訳日:2024-02-07 16:06:53 公開日:2024-02-06

# QuantAgent: 自己改善型大規模言語モデルによる取引における聖杯の探索

QuantAgent: Seeking Holy Grail in Trading by Self-Improving Large Language Model ( http://arxiv.org/abs/2402.03755v1 )

ライセンス: Link先を確認

Saizhuo Wang, Hang Yuan, Lionel M. Ni, Jian Guo

(参考訳) 計画を立案し現実の課題に取り組む、大規模言語モデル(LLM)に基づく自律エージェントが注目を集めている。しかし、量的投資のような専門分野向けにこれらのエージェントを調整することは、依然として困難な課題である。主な課題は、エージェントの学習プロセスのためのドメイン固有の知識ベースを効率的に構築し、統合することである。本稿では、この課題に対処するため、2層のループからなる原理的な枠組みを提案する。内側のループでは、エージェントが知識ベースを参照して応答を洗練し、外側のループでは、これらの応答を実世界のシナリオで検証し、新たな知見によって知識ベースを自動的に強化する。このアプローチにより、エージェントが証明可能な効率で最適な振る舞いに漸近できることを示す。さらに、この枠組みを、取引シグナルをマイニングする自律エージェントQuantAgentとして具体化する。実証的な結果は、有効な金融シグナルを発見し、財務予測の精度を高めるQuantAgentの能力を示している。

Autonomous agents based on Large Language Models (LLMs) that devise plans and tackle real-world challenges have gained prominence. However, tailoring these agents for specialized domains like quantitative investment remains a formidable task. The core challenge involves efficiently building and integrating a domain-specific knowledge base for the agent's learning process. This paper introduces a principled framework to address this challenge, comprising a two-layer loop. In the inner loop, the agent refines its responses by drawing from its knowledge base, while in the outer loop, these responses are tested in real-world scenarios to automatically enhance the knowledge base with new insights. We demonstrate that our approach enables the agent to progressively approximate optimal behavior with provable efficiency. Furthermore, we instantiate this framework through an autonomous agent for mining trading signals named QuantAgent. Empirical results showcase QuantAgent's capability in uncovering viable financial signals and enhancing the accuracy of financial forecasts.

翻訳日:2024-02-07 16:06:32 公開日:2024-02-06

# 放射線科レポート生成のための集中的視覚誘導ネットワーク

Intensive Vision-guided Network for Radiology Report Generation ( http://arxiv.org/abs/2402.03754v1 )

ライセンス: Link先を確認

Fudan Zheng, Mengfei Li, Ying Wang, Weijiang Yu, Ruixuan Wang, Zhiguang Chen, Nong Xiao, and Yutong Lu

(参考訳) 放射線科レポートの自動生成は、医療業界における応用可能性の大きさから急速に発展している。しかし、この問題に取り組む既存のコンピュータビジョンおよび自然言語処理のアプローチには、2つの面で限界がある。第一に、画像特徴を抽出する際、その多くは視覚における多視点推論を無視し、空間ビューやチャネルビューといった医用画像の単一視点の構造のみをモデル化している。しかし、臨床医は日常の診断において、多視点の画像情報に基づいて総合的な判断を下している。第二に、レポートを生成する際、マルチモーダル情報による文脈推論を見落とし、検索ベースの手法を用いた純粋なテキスト最適化に注力している。本研究では,臨床医の視点をより良くシミュレートし,より正確なレポートを生成するモデルを提案することで、これら2つの問題に対処する。特徴抽出における上記の限界を踏まえ,多視点の視覚知覚をシミュレートし統合するために、医用画像エンコーダにGlobally-intensive Attention (GIA) モジュールを提案する。 GIAは、深度ビュー、空間ビュー、ピクセルビューの3種類の視覚知覚を学習することを目指している。一方,レポート生成における上記の問題に対処するために,マルチモーダル信号を取り入れて正確に対応するレポートを生成する方法,すなわち,次の単語の予測において、既に予測された単語と領域を考慮した視覚コンテンツをどのように統合するかを検討する。具体的には、Visual Knowledge-guided Decoder (VKGD) を設計し、次の単語の予測を支援するために、モデルが視覚情報と既に予測されたテキストにどの程度依存すべきかを適応的に考慮する。したがって、最終的なIntensive Vision-guided Network (IVGN) フレームワークは、GIA誘導型ビジュアルエンコーダとVKGDで構成される。 IU X-RayとMIMIC-CXRという2つの一般的なデータセットを用いた実験により、他の最先端手法と比べて我々の手法が優れていることが示された。

Automatic radiology report generation is booming due to its huge application potential for the healthcare industry. However, existing computer vision and natural language processing approaches to tackle this problem are limited in two aspects. First, when extracting image features, most of them neglect multi-view reasoning in vision and model single-view structure of medical images, such as space-view or channel-view. However, clinicians rely on multi-view imaging information for comprehensive judgment in daily clinical diagnosis. Second, when generating reports, they overlook context reasoning with multi-modal information and focus on pure textual optimization utilizing retrieval-based methods. We aim to address these two issues by proposing a model that better simulates clinicians' perspectives and generates more accurate reports. Given the above limitation in feature extraction, we propose a Globally-intensive Attention (GIA) module in the medical image encoder to simulate and integrate multi-view vision perception. GIA aims to learn three types of vision perception: depth view, space view, and pixel view. On the other hand, to address the above problem in report generation, we explore how to involve multi-modal signals to generate precisely matched reports, i.e., how to integrate previously predicted words with region-aware visual content in next word prediction. Specifically, we design a Visual Knowledge-guided Decoder (VKGD), which can adaptively consider how much the model needs to rely on visual information and previously predicted text to assist next word prediction. Hence, our final Intensive Vision-guided Network (IVGN) framework includes a GIA-guided Visual Encoder and the VKGD. Experiments on two commonly-used datasets IU X-Ray and MIMIC-CXR demonstrate the superior ability of our method compared with other state-of-the-art approaches.

翻訳日:2024-02-07 16:06:16 公開日:2024-02-06

# 不確実性に基づく集団変数を用いたロバストな分子データセットの拡張サンプリング

Enhanced sampling of robust molecular datasets with uncertainty-based collective variables ( http://arxiv.org/abs/2402.03753v1 )

ライセンス: Link先を確認

Aik Rui Tan, Johannes C. B. Dietschreit, Rafael Gomez-Bombarelli

(参考訳) 分子系のアクセス可能な配置空間を代表するデータセットを生成することは、機械学習原子間ポテンシャル(MLIP)の頑健性にとって重要である。しかし、多数の局所極小とエネルギー障壁を持つ複雑なポテンシャルエネルギー面(PES)を特徴とする分子系の複雑さは、大きな課題となっている。ランダムサンプリングや網羅的探索のような従来のデータ生成方法は、計算上扱いきれないか、稀だが非常に有益な配置を捉えられない可能性がある。本研究では,不確実性を集団変数(CV)として活用し、MLモデルの予測が最も不確実な配置空間の領域に着目して、化学的に意味のあるデータ点の獲得を導く手法を提案する。このアプローチでは、単一のモデルから得られるガウス混合モデルに基づく不確実性指標を、バイアス付き分子動力学シミュレーションのCVとして用いる。アラニンジペプチドのベンチマーク系において, エネルギー障壁を乗り越えて未探索のエネルギー極小を探索し, それによって能動学習の枠組みでデータセットを強化する本手法の有効性を実証した。

Generating a data set that is representative of the accessible configuration space of a molecular system is crucial for the robustness of machine learned interatomic potentials (MLIP). However, the complexity of molecular systems, characterized by intricate potential energy surfaces (PESs) with numerous local minima and energy barriers, presents a significant challenge. Traditional methods of data generation, such as random sampling or exhaustive exploration, are either intractable or may not capture rare, but highly informative configurations. In this study, we propose a method that leverages uncertainty as the collective variable (CV) to guide the acquisition of chemically-relevant data points, focusing on regions of the configuration space where ML model predictions are most uncertain. This approach employs a Gaussian Mixture Model-based uncertainty metric from a single model as the CV for biased molecular dynamics simulations. The effectiveness of our approach in overcoming energy barriers and exploring unseen energy minima, thereby enhancing the data set in an active learning framework, is demonstrated on the alanine dipeptide benchmark system.

翻訳日:2024-02-07 16:05:45 公開日:2024-02-06

# 小型画像を用いた小型データセットにおける軽量ビジョントランスの事前学習

Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images ( http://arxiv.org/abs/2402.03752v1 )

ライセンス: Link先を確認

Jen Hong Tan

(参考訳) 軽量なVision Transformer(ViT)は、画像解像度の小さい小規模データセットにおいて、ResNetのような畳み込みニューラルネットワーク(CNN)の性能に匹敵する、あるいはそれを上回ることができるだろうか。本報告では,画像のスケーリングを最小限に抑えたマスク付きオートエンコーダ手法による事前学習を通じて、純粋なViTが実際に優れた性能を達成できることを示す。 CIFAR-10とCIFAR-100データセットの実験では、パラメータ数が365万未満、乗算累積(MAC)数が0.27G未満のViTモデルを用いており、これらは「軽量」モデルに該当する。従来の手法とは異なり、本手法はCIFAR-10やCIFAR-100の画像を大幅に拡大することなく、類似の軽量なトランスフォーマーベースアーキテクチャの中で最先端の性能を達成する。この成果は、小規模データセットを扱うだけでなく、元のスケールに近い画像を効果的に処理する上でも、我々のモデルの効率性を裏付けるものである。

Can a lightweight Vision Transformer (ViT) match or exceed the performance of Convolutional Neural Networks (CNNs) like ResNet on small datasets with small image resolutions? This report demonstrates that a pure ViT can indeed achieve superior performance through pre-training, using a masked auto-encoder technique with minimal image scaling. Our experiments on the CIFAR-10 and CIFAR-100 datasets involved ViT models with fewer than 3.65 million parameters and a multiply-accumulate (MAC) count below 0.27G, qualifying them as 'lightweight' models. Unlike previous approaches, our method attains state-of-the-art performance among similar lightweight transformer-based architectures without significantly scaling up images from CIFAR-10 and CIFAR-100. This achievement underscores the efficiency of our model, not only in handling small datasets but also in effectively processing images close to their original scale.

翻訳日:2024-02-07 16:05:26 公開日:2024-02-06

# ディジタルツインモビリティプロファイリング : 時空間グラフ学習アプローチ

Digital Twin Mobility Profiling: A Spatio-Temporal Graph Learning Approach ( http://arxiv.org/abs/2402.03750v1 )

ライセンス: Link先を確認

Xin Chen, Mingliang Hou, Tao Tang, Achhardeep Kaur and Feng Xia

(参考訳) ビッグデータ時代の到来に伴い、モビリティプロファイリングは、膨大なモビリティデータを活用してインテリジェントな交通システムを構築するための有効な方法となっている。モビリティプロファイリングは、モビリティデータから都市交通の潜在的なパターンを抽出でき、様々な交通関連アプリケーションにとって重要である。しかし、複雑性の高さとデータ量の膨大さのために、モビリティプロファイリングは大きな課題に直面している。デジタルツイン(DT)技術は、ネットワークの仮想表現をデジタルに作成してその挙動をシミュレートすることで、コスト効率が高く性能を最適化した管理への道を開く。交通シナリオにおける複雑な時空間的特徴を捉えるため、時空間相関の表現を補完するアライメント図を構築し、きめ細かい相関、すなわち時空間的相互作用を学習するdilated alignment convolution network(DACN)を設計する。本稿では,モビリティネットワークのDTモデル上でノードプロファイルを学習するデジタルツインモビリティプロファイリング(DTMP)フレームワークを提案する。 3つの実世界のデータセットで広範な実験を行い、その結果からDTMPの有効性が示された。

With the arrival of the big data era, mobility profiling has become a viable method of utilizing enormous amounts of mobility data to create an intelligent transportation system. Mobility profiling can extract potential patterns in urban traffic from mobility data and is critical for a variety of traffic-related applications. However, due to the high level of complexity and the huge amount of data, mobility profiling faces huge challenges. Digital Twin (DT) technology paves the way for cost-effective and performance-optimised management by digitally creating a virtual representation of the network to simulate its behaviour. In order to capture the complex spatio-temporal features in traffic scenario, we construct alignment diagrams to assist in completing the spatio-temporal correlation representation and design dilated alignment convolution network (DACN) to learn the fine-grained correlations, i.e., spatio-temporal interactions. We propose a digital twin mobility profiling (DTMP) framework to learn node profiles on a mobility network DT model. Extensive experiments have been conducted upon three real-world datasets. Experimental results demonstrate the effectiveness of DTMP.

翻訳日:2024-02-07 16:05:10 公開日:2024-02-06

# ソフトウェアパッチの自動記述生成

Automated Description Generation for Software Patches ( http://arxiv.org/abs/2402.03805v1 )

ライセンス: Link先を確認

Thanh Trong Vu, Tuan-Dung Bui, Thanh-Dat Do, Thu-Trang Nguyen, Hieu Dinh Vo, and Son Nguyen

(参考訳) ソフトウェアパッチは、コードベースの改良と進化、およびバグ、脆弱性、最適化への対応において極めて重要である。パッチ記述は変更の詳細な説明を提供し、開発者間の理解とコラボレーションを支援する。しかし、手作業での記述作成は、所要時間の面でも、品質や詳細度のばらつきの面でも課題がある。本稿では,パッチ記述生成を機械翻訳タスクとして定式化することで,これらの課題に対処するPATCHEXPLAINERを提案する。 PATCHEXPLAINERでは、重要な要素、履歴上の文脈、構文上の慣習の明示的な表現を活用する。さらに、PATCHEXPLAINERの翻訳モデルは、記述の類似性を考慮して設計されている。特に、このモデルは、グループにクラスタ化されたパッチ記述に存在する類似性を認識し取り込むように明示的に訓練されており、類似したパッチに対して正確で一貫性のある記述を生成する能力が向上している。2つの目的関数は、類似性を最大化することと、所属グループを正確に予測することである。実世界のソフトウェアパッチの大規模データセットを用いた実験の結果、PATCHEXPLAINERは既存の手法を一貫して上回り、BLEUで最大189%、Exact Match率で最大5.7倍、Semantic Similarityで最大154%の改善を示し、ソフトウェアパッチ記述の生成における有効性が確認された。

Software patches are pivotal in refining and evolving codebases, addressing bugs, vulnerabilities, and optimizations. Patch descriptions provide detailed accounts of changes, aiding comprehension and collaboration among developers. However, manual description creation poses challenges in terms of time consumption and variations in quality and detail. In this paper, we propose PATCHEXPLAINER, an approach that addresses these challenges by framing patch description generation as a machine translation task. In PATCHEXPLAINER, we leverage explicit representations of critical elements, historical context, and syntactic conventions. Moreover, the translation model in PATCHEXPLAINER is designed with an awareness of description similarity. Particularly, the model is explicitly trained to recognize and incorporate similarities present in patch descriptions clustered into groups, improving its ability to generate accurate and consistent descriptions across similar patches. The dual objectives maximize similarity and accurately predict affiliating groups. Our experimental results on a large dataset of real-world software patches show that PATCHEXPLAINER consistently outperforms existing methods, with improvements up to 189% in BLEU, 5.7X in Exact Match rate, and 154% in Semantic Similarity, affirming its effectiveness in generating software patch descriptions.

翻訳日:2024-02-07 15:58:45 公開日:2024-02-06

# ReLU$^2$ Wins: Sparse LLMの効率的な活性化関数の発見

ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs ( http://arxiv.org/abs/2402.03804v1 )

ライセンス: Link先を確認

Zhengyan Zhang, Yixin Song, Guanghui Yu, Xu Han, Yankai Lin, Chaojun Xiao, Chenyang Song, Zhiyuan Liu, Zeyu Mi, Maosong Sun

(参考訳) スパース計算は、非活性ニューロンの計算を動的にスキップすることで、低リソースシナリオにおける大規模言語モデル(LLM)の推論に魅力的な解決策を提供する。従来のアプローチは、活性化値のゼロを利用するReLUベースのLLMに焦点を当てているが、本研究ではスパースLLMの範囲をゼロの活性化値以外にも広げる。我々は、ニューロン出力の大きさと調整された大きさのしきい値によってニューロンの活性化を定義する一般的な方法を導入し、非ReLUのLLMもスパースな活性化を示すことを明らかにした。スパース計算に最も効率的な活性化関数を見つけるために,スパース性と性能のトレードオフ,スパース性の予測可能性,ハードウェア親和性という3つの側面からLLMのスパース性を調べる体系的な枠組みを提案する。我々は、ReLU、SwiGLU、ReGLU、ReLU$^2$といった異なる活性化関数を用いたLLMについて徹底的な実験を行う。その結果,ReLU$^2$を用いたモデルが3つの評価側面すべてで優れており,スパースLLMの効率的な活性化関数としての可能性が明らかになった。今後の研究を促進するためにコードを公開する予定である。

Sparse computation offers a compelling solution for the inference of Large Language Models (LLMs) in low-resource scenarios by dynamically skipping the computation of inactive neurons. While traditional approaches focus on ReLU-based LLMs, leveraging zeros in activation values, we broaden the scope of sparse LLMs beyond zero activation values. We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold, demonstrating that non-ReLU LLMs also exhibit sparse activation. To find the most efficient activation function for sparse computation, we propose a systematic framework to examine the sparsity of LLMs from three aspects: the trade-off between sparsity and performance, the predictivity of sparsity, and the hardware affinity. We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$. The results indicate that models employing ReLU$^2$ excel across all three evaluation aspects, highlighting its potential as an efficient activation function for sparse LLMs. We will release the code to facilitate future research.
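
As a rough illustration of the threshold-based notion of activation described in the abstract, the following Python sketch compares ReLU with ReLU$^2$ and measures how many neuron outputs fall at or below a magnitude threshold. The function names, the toy pre-activation values, and the threshold of 0.1 are assumptions made for illustration; the paper's own procedure for choosing the threshold is not reproduced here.

```python
def relu(x: float) -> float:
    """Standard ReLU: max(0, x)."""
    return max(0.0, x)


def relu2(x: float) -> float:
    """Squared ReLU (ReLU^2): max(0, x) squared."""
    return relu(x) ** 2


def sparsity_ratio(outputs, threshold):
    """Fraction of neuron outputs whose magnitude is at or below the threshold.

    Such neurons are treated as inactive, so their computation could be skipped.
    """
    if not outputs:
        raise ValueError("outputs must not be empty")
    inactive = sum(1 for value in outputs if abs(value) <= threshold)
    return inactive / len(outputs)


# Hypothetical pre-activation values for one layer (not taken from the paper).
pre_activations = [-1.5, -0.2, 0.05, 0.3, 2.0]
relu_out = [relu(x) for x in pre_activations]
relu2_out = [relu2(x) for x in pre_activations]

print(sparsity_ratio(relu_out, threshold=0.1))
print(sparsity_ratio(relu2_out, threshold=0.1))
```

In this toy input, squaring pushes small positive outputs closer to zero, so more of them fall under the same threshold. The example says nothing about model quality, which the paper evaluates separately.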

翻訳日:2024-02-07 15:58:18 公開日:2024-02-06

# 顔検出:現状と研究の方向性

Face Detection: Present State and Research Directions ( http://arxiv.org/abs/2402.03796v1 )

ライセンス: Link先を確認

Purnendu Prabhat, Himanshu Gupta and Ajeet Kumar Vishwakarma

(参考訳) 人間が写った画像を扱うコンピュータビジョンアプリケーションの大部分は、顔検出を中核コンポーネントとして使用している。この分野では多くの研究が行われてきたが、顔検出には依然として課題が残っている。顔検出の精度と速度にはまだ向上の余地がある。このレビュー論文は、この分野における進歩と、今後取り組むべき重要な課題を示す。また、顔検出の分野で研究プロジェクトとして取り上げることができる研究の方向性を提供する。

The majority of computer vision applications that handle images featuring humans use face detection as a core component. Face detection still has issues, despite much research on the topic. Face detection's accuracy and speed might yet be increased. This review paper shows the progress made in this area as well as the substantial issues that still need to be tackled. The paper provides research directions that can be taken up as research projects in the field of face detection.

翻訳日:2024-02-07 15:57:45 公開日:2024-02-06

# 深度誘導によるエネルギーベースドメイン適応セグメンテーション

Energy-based Domain-Adaptive Segmentation with Depth Guidance ( http://arxiv.org/abs/2402.03795v1 )

ライセンス: Link先を確認

Jinjing Zhu, Zhedong Hu, Tae-Kyun Kim, and Lin Wang

(参考訳) セマンティックセグメンテーションのための教師なしドメイン適応(UDA)において、自己教師付き深度推定をガイダンスとして活用する試みが近年行われている。しかし、先行研究は、意味的特徴と深度特徴の不一致、および特徴融合の信頼性を見落としており、その結果セグメンテーション性能が最適に達していない。この問題に対処するため,本稿では,エネルギーベースモデル(EBM)を用いてタスク適応的な特徴を獲得し,自己教師付き深度推定を用いたセマンティックセグメンテーションのための信頼性の高い特徴融合を実現する,SMART(croSs doMain semAntic segmentation based on eneRgy esTimation)と呼ばれる新しいUDAフレームワークを提案する。本フレームワークには,エネルギーベース特徴融合(EB2F)とエネルギーベース信頼性融合評価(RFA)モジュールという2つの新しいコンポーネントが組み込まれている。 EB2Fモジュールは、より良い特徴融合のために、Hopfieldエネルギーを用いて意味的特徴と深度特徴の不一致を明示的に測定し低減することにより、タスク適応的な意味的特徴と深度特徴を生成する。 RFAモジュールは、エネルギースコアを用いて特徴融合の信頼性を評価し、深度ガイダンスの有効性を向上させる。 2つのデータセットでの広範な実験により,本手法は先行研究に比べて大きな性能向上を達成し,エネルギーベース学習手法の有効性が検証された。

Recent endeavors have been made to leverage self-supervised depth estimation as guidance in unsupervised domain adaptation (UDA) for semantic segmentation. Prior arts, however, overlook the discrepancy between semantic and depth features, as well as the reliability of feature fusion, thus leading to suboptimal segmentation performance. To address this issue, we propose a novel UDA framework called SMART (croSs doMain semAntic segmentation based on eneRgy esTimation) that utilizes Energy-Based Models (EBMs) to obtain task-adaptive features and achieve reliable feature fusion for semantic segmentation with self-supervised depth estimates. Our framework incorporates two novel components: energy-based feature fusion (EB2F) and energy-based reliable fusion Assessment (RFA) modules. The EB2F module produces task-adaptive semantic and depth features by explicitly measuring and reducing their discrepancy using Hopfield energy for better feature fusion. The RFA module evaluates the reliability of the feature fusion using an energy score to improve the effectiveness of depth guidance. Extensive experiments on two datasets demonstrate that our method achieves significant performance gains over prior works, validating the effectiveness of our energy-based learning approach.

翻訳日:2024-02-07 15:57:36 公開日:2024-02-06

# 知識データアライメントによる弱教師付き異常検出

Weakly Supervised Anomaly Detection via Knowledge-Data Alignment ( http://arxiv.org/abs/2402.03785v1 )

ライセンス: Link先を確認

Haihong Zhao, Chenyi Zi, Yang Liu, Chen Zhang, Yan Zhou and Jia Li

(参考訳) マルウェア検出、マネーロンダリング対策、デバイス障害検出、ネットワーク障害解析など、多数のWebベースのアプリケーションにおいて、異常検出(AD)は重要な役割を果たす。教師なし学習に依存するほとんどの手法は、ラベルがないために十分な検出精度を達成することが難しい。弱教師付き異常検出(Weakly Supervised Anomaly Detection, WSAD)は、限られた数のラベル付き異常サンプルを用いてモデルの性能を向上させるために導入された。それでも、不十分な量のラベル付きデータで訓練されたモデルが、未知の異常に一般化することは依然として困難である。本稿では、通常は人間の専門家によってまとめられるルール知識を統合し、限られたラベル付きデータを補完する、新しい枠組みであるKnowledge-Data Alignment(KDAlign)を提案する。具体的には、これらのルールを知識空間に変換し、知識の取り込みを知識とデータのアライメントとして捉え直す。このアライメントを行うために、最適輸送(OT)技術を用いる。次に, OT距離をWSAD手法の本来の目的関数に追加の損失項として組み込む。 5つの実世界のデータセットでの総合的な実験結果から、提案したKDAlignフレームワークが最先端の手法を著しく上回り、様々な異常タイプにわたって優れた性能を達成することが示された。

Anomaly detection (AD) plays a pivotal role in numerous web-based applications, including malware detection, anti-money laundering, device failure detection, and network fault analysis. Most methods, which rely on unsupervised learning, are hard to reach satisfactory detection accuracy due to the lack of labels. Weakly Supervised Anomaly Detection (WSAD) has been introduced with a limited number of labeled anomaly samples to enhance model performance. Nevertheless, it is still challenging for models, trained on an inadequate amount of labeled data, to generalize to unseen anomalies. In this paper, we introduce a novel framework Knowledge-Data Alignment (KDAlign) to integrate rule knowledge, typically summarized by human experts, to supplement the limited labeled data. Specifically, we transpose these rules into the knowledge space and subsequently recast the incorporation of knowledge as the alignment of knowledge and data. To facilitate this alignment, we employ the Optimal Transport (OT) technique. We then incorporate the OT distance as an additional loss term to the original objective function of WSAD methodologies. Comprehensive experimental results on five real-world datasets demonstrate that our proposed KDAlign framework markedly surpasses its state-of-the-art counterparts, achieving superior performance across various anomaly types.
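
To make the idea of adding an OT distance as an extra loss term concrete, here is a minimal Python sketch under strong simplifying assumptions: both point sets have the same size and uniform weights, the cost is the squared Euclidean distance, and the OT cost is found by brute force over all matchings. The embeddings, the function names, and the weight of 0.1 are hypothetical; the abstract does not say which OT solver or cost KDAlign uses, and a practical implementation would rely on a scalable solver instead.

```python
from itertools import permutations


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ot_distance_bruteforce(source, target):
    """Exact OT cost between two equally sized, uniformly weighted point sets.

    With uniform weights and equal sizes, some optimal plan is a one-to-one
    matching, so checking every permutation is exact (tiny inputs only).
    """
    if not source or len(source) != len(target):
        raise ValueError("point sets must be non-empty and of equal size")
    n = len(source)
    best = min(
        sum(squared_distance(source[i], target[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )
    return best / n


def total_loss(task_loss, ot_distance, weight):
    """Original objective plus the weighted OT alignment term."""
    return task_loss + weight * ot_distance


# Hypothetical 2-D embeddings of rules (knowledge) and of data samples.
knowledge_embeddings = [(0.0, 0.0), (1.0, 1.0)]
data_embeddings = [(1.0, 0.0), (2.0, 1.0)]

alignment = ot_distance_bruteforce(knowledge_embeddings, data_embeddings)
loss = total_loss(task_loss=0.5, ot_distance=alignment, weight=0.1)
print(alignment, loss)
```

The alignment term is zero when the two sets coincide up to reordering and grows as they drift apart, which is the property a training objective would exploit.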

翻訳日:2024-02-07 15:56:19 公開日:2024-02-06

# AirPhyNet:空気質予測のための物理誘導ニューラルネットワーク

AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction ( http://arxiv.org/abs/2402.03784v1 )

ライセンス: Link先を確認

Kethmi Hirushini Hettige, Jiahao Ji, Shili Xiang, Cheng Long, Gao Cong, Jingyuan Wang

(参考訳) 大気質の予測とモデリングは公衆衛生と環境管理において重要な役割を担い、個人や当局は情報的決定を行う。従来のデータ駆動モデルはこの領域で有望性を示しているが、その長期的な予測精度は、特にスパースや不完全なデータを持つシナリオでは制限され、それらは多くの場合、確固とした物理的基盤を持たないブラックボックスのディープラーニング構造に依存しているため、予測における透明性と解釈性が低下する。本稿では,空気質予測のための物理誘導ニューラルネットワーク(AirPhyNet)という新しい手法を提案する。具体的には、空気粒子移動(拡散と対流)の2つの確立された物理原理を微分方程式ネットワークとして表現する。次に,物理知識をニューラルネットワークアーキテクチャに統合し,潜時表現を利用して大気質データ内の時空間関係をキャプチャするグラフ構造を用いる。 2つの実世界のベンチマークデータセットの実験によると、AirPhyNetは異なるリードタイム(24h, 48h, 72h)、スパースデータと突然の変化予測など、さまざまなテストシナリオの最先端モデルよりも優れており、予測エラーの最大10%削減を実現している。さらに,本モデルが粒子運動の基盤となる物理過程を捉え,実際の物理的意味を持つ正確な予測を生成することを検証した。

Air quality prediction and modelling plays a pivotal role in public health and environment management, for individuals and authorities to make informed decisions. Although traditional data-driven models have shown promise in this domain, their long-term prediction accuracy can be limited, especially in scenarios with sparse or incomplete data and they often rely on black-box deep learning structures that lack solid physical foundation leading to reduced transparency and interpretability in predictions. To address these limitations, this paper presents a novel approach named Physics guided Neural Network for Air Quality Prediction (AirPhyNet). Specifically, we leverage two well-established physics principles of air particle movement (diffusion and advection) by representing them as differential equation networks. Then, we utilize a graph structure to integrate physics knowledge into a neural network architecture and exploit latent representations to capture spatio-temporal relationships within the air quality data. Experiments on two real-world benchmark datasets demonstrate that AirPhyNet outperforms state-of-the-art models for different testing scenarios including different lead time (24h, 48h, 72h), sparse data and sudden change prediction, achieving reduction in prediction errors up to 10%. Moreover, a case study further validates that our model captures underlying physical processes of particle movement and generates accurate predictions with real physical meaning.
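
The diffusion principle mentioned in the abstract can be illustrated with a small Python sketch: one explicit Euler step of graph diffusion, dx/dt = -k L x, where L is the graph Laplacian of a monitoring-station network. This is only a toy discretization under assumed names and values (`diffusion_step`, the three-station chain, the rate and step size); it omits advection and is not the differential equation network used in AirPhyNet.

```python
def diffusion_step(concentration, edges, rate, dt):
    """One explicit Euler step of dx/dt = -rate * L x on an undirected graph.

    concentration: pollutant level per station (list of floats).
    edges: pairs (i, j) of connected station indices, each listed once.
    """
    updated = list(concentration)
    for i, j in edges:
        # Pollutant flows from the higher-concentration node to the lower one.
        flow = rate * dt * (concentration[j] - concentration[i])
        updated[i] += flow
        updated[j] -= flow
    return updated


# Hypothetical chain of three stations: 0 - 1 - 2.
stations = [10.0, 0.0, 0.0]
links = [(0, 1), (1, 2)]

after_one_step = diffusion_step(stations, links, rate=1.0, dt=0.1)
print(after_one_step)
```

Each edge moves the same amount out of one node and into the other, so the total pollutant mass is conserved by construction, a physical constraint that a purely data-driven model does not guarantee.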

翻訳日:2024-02-07 15:55:32 公開日:2024-02-06

# 弱教師付きプロンプト学習による低リソース医療画像分類の探索

Exploring Low-Resource Medical Image Classification with Weakly Supervised Prompt Learning ( http://arxiv.org/abs/2402.03783v1 )

ライセンス: Link先を確認

Fudan Zheng, Jindong Cao, Weijiang Yu, Zhiguang Chen, Nong Xiao, Yutong Lu

(参考訳) 臨床の補助診断を支える医用画像認識の進歩の多くは、アノテーションが非常に高価で専門性を要するという、医療分野における低リソース状況のために課題に直面している。この低リソース問題は、関連する医学的テキストプロンプトを介して、大規模な事前学習済み視覚言語モデルの転移可能な表現を活用することで緩和できる。しかし、既存の事前学習済み視覚言語モデルでは、ドメインの専門家が医療プロンプトを慎重に設計する必要があり、臨床医の負担が大幅に増大する。この問題に対処するため,本研究では,医療プロンプトを自動生成する弱教師付きプロンプト学習法MedPromptを提案する。MedPromptは、教師なしの事前学習済み視覚言語モデルと、弱教師付きプロンプト学習モデルからなる。教師なしの事前学習済み視覚言語モデルは、手作業によるアノテーションなしで、医用画像と対応する医学テキストとの自然な相関を利用して事前学習を行う。弱教師付きプロンプト学習モデルでは、データセット内の画像のクラスのみを用いてプロンプト内の特定のクラスベクトルの学習を導く一方、プロンプト内の他のコンテキストベクトルの学習には手作業によるアノテーションを必要としない。私たちの知る限り、これは医療プロンプトを自動生成する最初のモデルである。これらのプロンプトにより、事前学習済み視覚言語モデルは、手作業によるアノテーションと手作業によるプロンプト設計という専門家への強い依存から解放される。実験の結果,我々の自動生成プロンプトを用いたモデルは,few-shot学習においてごく少数のラベル付きサンプルだけで、full-shot学習による手作りプロンプトを用いたモデルを上回り,ゼロショット画像分類では同等以上の精度に達することが示された。提案するプロンプト生成器は軽量であるため,任意のネットワークアーキテクチャに組み込むことができる。

Most advances in medical image recognition supporting clinical auxiliary diagnosis meet challenges due to the low-resource situation in the medical field, where annotations are highly expensive and professional. This low-resource problem can be alleviated by leveraging the transferable representations of large-scale pre-trained vision-language models via relevant medical text prompts. However, existing pre-trained vision-language models require domain experts to carefully design the medical prompts, which greatly increases the burden on clinicians. To address this problem, we propose a weakly supervised prompt learning method MedPrompt to automatically generate medical prompts, which includes an unsupervised pre-trained vision-language model and a weakly supervised prompt learning model. The unsupervised pre-trained vision-language model utilizes the natural correlation between medical images and corresponding medical texts for pre-training, without any manual annotations. The weakly supervised prompt learning model only utilizes the classes of images in the dataset to guide the learning of the specific class vector in the prompt, while the learning of other context vectors in the prompt requires no manual annotations for guidance. To the best of our knowledge, this is the first model to automatically generate medical prompts. With these prompts, the pre-trained vision-language model can be freed from the strong expert dependency of manual annotation and manual prompt design. Experimental results show that the model using our automatically generated prompts outperforms its full-shot learning hand-crafted prompts counterparts with only a minimal number of labeled samples for few-shot learning, and reaches superior or comparable accuracy on zero-shot image classification. The proposed prompt generator is lightweight and therefore can be embedded into any network architecture.

翻訳日:2024-02-07 15:55:04 公開日:2024-02-06

# 言語間伝達のためのソフトプロンプトチューニング: 少ない方が多い場合

Soft Prompt Tuning for Cross-Lingual Transfer: When Less is More ( http://arxiv.org/abs/2402.03782v1 )

ライセンス: Link先を確認

Fred Philippy, Siwen Guo, Shohreh Haddadan, Cedric Lothritz, Jacques Klein, Tegawendé F. Bissyandé

(参考訳) SPT(Soft Prompt Tuning)は、事前学習済み言語モデル(PLM)のパラメータを変更することなく、学習可能な埋め込み、すなわちソフトプロンプトをPLMの入力層に挿入することで、PLMを特定のタスクに適応させるパラメータ効率の高い手法である。本稿では,言語間転移におけるSPTの可能性について検討する。ソフトプロンプトとモデルパラメータの両方を微調整することが多い、言語間転移のためのSPTに関する従来の研究とは異なり、本研究ではモデルパラメータを凍結してソフトプロンプトのみを訓練することで、SPTの本来の意図に従う。これは、フルモデルのファインチューニングに伴う計算コストとストレージのオーバーヘッドを低減するだけでなく、SPTに固有のこのパラメータ効率こそが、言語的に遠い言語への言語間転移性能を向上させうることも示す。さらに,プロンプトに関連する要因(長さや再パラメータ化など)が言語間転移性能に与える影響についても検討する。

Soft Prompt Tuning (SPT) is a parameter-efficient method for adapting pre-trained language models (PLMs) to specific tasks by inserting learnable embeddings, or soft prompts, at the input layer of the PLM, without modifying its parameters. This paper investigates the potential of SPT for cross-lingual transfer. Unlike previous studies on SPT for cross-lingual transfer that often fine-tune both the soft prompt and the model parameters, we adhere to the original intent of SPT by keeping the model parameters frozen and only training the soft prompt. This does not only reduce the computational cost and storage overhead of full-model fine-tuning, but we also demonstrate that this very parameter efficiency intrinsic to SPT can enhance cross-lingual transfer performance to linguistically distant languages. Moreover, we explore how different factors related to the prompt, such as the length or its reparameterization, affect cross-lingual transfer performance.

翻訳日:2024-02-07 15:54:34 公開日:2024-02-06

# MolTC:言語モデルにおける分子関係モデリングを目指して

MolTC: Towards Molecular Relational Modeling In Language Models ( http://arxiv.org/abs/2402.03781v1 )

ライセンス: Link先を確認

Junfeng Fang, Shuai Zhang, Chang Wu, Zhiyuan Liu, Sihang Li, Kun Wang, Wenjie Du, Xiang Wang, Xiangnan He

(参考訳) 分子ペア間の相互作用を理解することを目的とした分子関係学習(MRL)は、生化学研究の進展において重要な役割を担っている。近年,膨大な知識リポジトリと高度な論理推論能力で知られる大規模言語モデル (LLM) の採用が,効率的かつ効果的なMRLの有望な方法として注目されている。その可能性にもかかわらず、これらの手法は主としてテキストデータに依存しており、分子グラフに固有の豊富な構造情報を十分に活用していない。さらに、統一されたフレームワークが存在しないため、さまざまなデータセットで学習された相互作用の根拠の共有が妨げられ、情報の活用不足が一層深刻になっている。これらの課題に対処するために、本研究では、分子ペアの豊富なグラフ情報を効率的に統合できる、Chain-of-Thought(CoT)理論に従った分子相互作用予測のための新しいLLMベースのマルチモーダルフレームワークMolTCを提案する。統一的なMRLを実現するため,MolTCは,データセット間の情報交換のための動的パラメータ共有戦略を開発し,訓練パラダイムを洗練させるためにマルチ階層CoT原則を導入する。 4,000,000以上の分子ペアを含む12種類のデータセットを用いた実験により,現在のGNNおよびLLMベースのベースラインに対する本手法の優位性を実証した。加えて、MolTCを含む生化学LLMの開発のために、包括的なMolecular Interactive Instructionsデータセットを構築した。コードはhttps://github.com/MangoKiller/MolTCで入手できる。

Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising way for efficient and effective MRL. Despite their potential, these methods predominantly rely on the textual data, thus not fully harnessing the wealth of structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates the information underutilization, as it hinders the sharing of interaction rationale learned across diverse datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC, which can efficiently integrate rich graphical information of molecular pairs. For achieving a unified MRL, MolTC innovatively develops a dynamic parameter-sharing strategy for cross-dataset information exchange, and introduces a Multi-hierarchical CoT principle to refine training paradigm. Our experiments, conducted across twelve varied datasets involving over 4,000,000 molecular pairs, demonstrate the superiority of our method over current GNN and LLM-based baselines. On the top of that, a comprehensive Molecular Interactive Instructions dataset is constructed for the development of biochemical LLM, including our MolTC. Code is available at https://github.com/MangoKiller/MolTC.

翻訳日:2024-02-07 15:54:16 公開日:2024-02-06

# プロパガンダの暴露:人間のアノテーションと機械分類を比較した文体的手がかりの分析

Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification ( http://arxiv.org/abs/2402.03780v1 )

ライセンス: Link先を確認

Géraud Faye, Benjamin Icard, Morgane Casanova, Julien Chanson, François Maine, François Bancilhon, Guillaume Gadek, Guillaume Gravier, Paul Égré

(参考訳) 本稿では,プロパガンダの言語とその文体的特徴について検討する。本稿で提示するPPN(Propagandist Pseudo-News)データセットは、専門機関によってプロパガンダの情報源として特定されたウェブサイトから抽出されたニュース記事からなる、マルチソース、多言語、マルチモーダルのデータセットである。このデータセットから抽出した限られたサンプルを、通常のフランスの報道機関の記事とランダムに混ぜ、URLをマスクした上で、11種類のラベルを用いた人間によるアノテーション実験を行った。その結果,人間のアノテータは各ラベルにおいて2種類の報道を確実に識別できることが示された。アノテータが用いた手がかりを特定し、機械分類と比較するために、複数のNLP手法を提案する。これには、談話の曖昧さと主観性を測定するアナライザVAGO、ベースラインとして用いるTF-IDF、および4つの異なる分類器、すなわち2つのRoBERTaベースのモデル、構文を用いるCATS、構文的特徴と意味的特徴を組み合わせたXGBoostが含まれる。キーワード: プロパガンダ, フェイクニュース, 説明可能性, AIアライメント, 曖昧さ, 主観性, 誇張, 文体分析

This paper investigates the language of propaganda and its stylistic features. It presents the PPN dataset, standing for Propagandist Pseudo-News, a multisource, multilingual, multimodal dataset composed of news articles extracted from websites identified as propaganda sources by expert agencies. A limited sample from this set was randomly mixed with papers from the regular French press, and their URL masked, to conduct an annotation-experiment by humans, using 11 distinct labels. The results show that human annotators were able to reliably discriminate between the two types of press across each of the labels. We propose different NLP techniques to identify the cues used by the annotators, and to compare them with machine classification. They include the analyzer VAGO to measure discourse vagueness and subjectivity, a TF-IDF to serve as a baseline, and four different classifiers: two RoBERTa-based models, CATS using syntax, and one XGBoost combining syntactic and semantic features. Keywords: Propaganda, Fake News, Explainability, AI alignment, Vagueness, Subjectivity, Exaggeration, Stylistic analysis

翻訳日:2024-02-07 15:53:50 公開日:2024-02-06

# EERO: 予算限定による効率的な分類のためのリジェクトオプションによる早期終了

EERO: Early Exit with Reject Option for Efficient Classification with limited budget ( http://arxiv.org/abs/2402.03779v1 )

ライセンス: Link先を確認

Florian Valade (LAMA), Mohamed Hebiri (LAMA), Paul Gay

(参考訳) 高度な機械学習モデルの複雑さの増大に伴い、計算資源を効果的に管理するための革新的なアプローチが必要となっている。そのような方法のひとつがEarly Exit戦略であり、単純なデータインスタンスの処理経路を短縮する仕組みを提供することで、適応的な計算を可能にする。本稿では,EEROを提案する。EEROは,各インスタンスに対する出口ヘッドをより適切に選択するために、早期終了の問題を、リジェクトオプション付きの複数の分類器を用いる問題に変換する新しい手法である。固定予算を保証するために、指数重みによる集約を用いて、各ヘッドで終了する確率を較正する。ベイズリスク、予算制約、ヘッド固有の予算消費といった要因を考慮する。 CifarとImageNetのデータセット上でResNet-18モデルとConvNextアーキテクチャを用いて実験を行った結果,提案手法は予算配分を効果的に管理するだけでなく,overthinking(過剰思考)のシナリオでの精度も向上することが示された。

The increasing complexity of advanced machine learning models requires innovative approaches to manage computational resources effectively. One such method is the Early Exit strategy, which allows for adaptive computation by providing a mechanism to shorten the processing path for simpler data instances. In this paper, we propose EERO, a new methodology that translates the problem of early exiting into a problem of using multiple classifiers with reject option in order to better select the exiting head for each instance. We calibrate the probabilities of exiting at the different heads using aggregation with exponential weights to guarantee a fixed budget. We consider factors such as Bayesian risk, budget constraints, and head-specific budget consumption. Experimental results, conducted using a ResNet-18 model and a ConvNext architecture on Cifar and ImageNet datasets, demonstrate that our method not only effectively manages budget allocation but also enhances accuracy in overthinking scenarios.
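
The early-exit-with-reject idea can be illustrated with a minimal sketch. It assumes fixed per-head confidence thresholds (the hypothetical `thresholds` argument); the paper instead calibrates exit probabilities with exponential-weights aggregation to meet a budget, which is not reproduced here.

```python
def early_exit_predict(head_probs, thresholds):
    """Return (predicted_class, exit_depth) for one instance.

    head_probs: per-head class-probability lists, ordered shallow to deep.
    thresholds: hypothetical confidence thresholds, one per non-final head.
    A head "rejects" when its top probability is below its threshold,
    and the instance moves on to the next, more expensive head.
    """
    last = len(head_probs) - 1
    for depth, probs in enumerate(head_probs):
        confidence = max(probs)
        # The final head can never reject: it must answer.
        if depth == last or confidence >= thresholds[depth]:
            return probs.index(confidence), depth


# An easy instance exits at the second head and skips the third.
print(early_exit_predict([[0.55, 0.45], [0.9, 0.1], [0.2, 0.8]], [0.8, 0.8]))
```

Lowering a threshold makes a head exit more often, which saves compute at the risk of more errors; this is the trade-off a budget constraint has to settle.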

翻訳日:2024-02-07 15:53:27 公開日:2024-02-06

# コードレビューの自動化を改善する - 経験から学ぶ

Improving Automated Code Reviews: Learning from Experience ( http://arxiv.org/abs/2402.03777v1 )

ライセンス: Link先を確認

Hong Yi Lin, Patanamon Thongtanunam, Christoph Treude, Wachiraphan Charoenwet

(参考訳) 現代のコードレビューは、業界とオープンソースの両方で広く採用されている品質保証プロセスである。このプロセスは、経験豊富なレビュアーからのフィードバックから初心者が学ぶのに役立つが、レビュアーには大きなワークロードとストレスをもたらすことが多い。この負担を軽減するため、自動コードレビューの分野はプロセスを自動化することを目的としており、大きな言語モデルに人間のように、提出されたコードに対するレビューを提供するように教えている。最近のアプローチでは、大規模なコードレビューコーパスで、コードインテリジェント言語モデルを事前学習し、微調整した。しかし、これらの手法はトレーニングデータの品質評価を完全に活用することはなかった。実際、コードに対する高いレベルの経験や慣れ親しんだレビュアーは、他のものよりも深い洞察を提供するでしょう。本研究では,経験型オーバーサンプリング技術に基づいてトレーニングされた自動コードレビューモデルから,高品質なレビューを生成できるかどうかを検討する。定量的および定性的な評価により,経験意識によるオーバーサンプリングは,新たなデータを導入することなく,現在の最先端モデルが生成するレビューの正確性,情報レベル,有意義性を向上できることがわかった。その結果,現行のトレーニング戦略では,高品質なレビューが不十分であることが示唆された。この作業は、自動コードレビューモデルを強化するためのリソース効率のよい方法に光を当てています。

Modern code review is a critical quality assurance process that is widely adopted in both industry and open source software environments. This process can help newcomers learn from the feedback of experienced reviewers; however, it often brings a large workload and stress to reviewers. To alleviate this burden, the field of automated code reviews aims to automate the process, teaching large language models to provide reviews on submitted code, just as a human would. A recent approach pre-trained and fine-tuned the code intelligent language model on a large-scale code review corpus. However, such techniques did not fully utilise quality reviews amongst the training data. Indeed, reviewers with a higher level of experience or familiarity with the code will likely provide deeper insights than the others. In this study, we set out to investigate whether higher-quality reviews can be generated from automated code review models that are trained based on an experience-aware oversampling technique. Through our quantitative and qualitative evaluation, we find that experience-aware oversampling can increase the correctness, level of information, and meaningfulness of reviews generated by the current state-of-the-art model without introducing new data. The results suggest that a vast amount of high-quality reviews are underutilised with current training strategies. This work sheds light on resource-efficient ways to boost automated code review models.

翻訳日:2024-02-07 15:53:10 公開日:2024-02-06

# MOOCsグレーダーとしての大規模言語モデル

Large Language Models As MOOCs Graders ( http://arxiv.org/abs/2402.03776v1 )

ライセンス: Link先を確認

Shahriar Golchin, Nikhil Garuda, Christopher Impey, Matthew Wenger

(参考訳) 大規模なオープン・オンライン・コース(moocs)は、世界中の誰でもコンピュータとインターネットにアクセスできる自由教育の扉を開ける。このような学習の民主化にもかかわらず、これらのコースの大規模な入学は、一人の教官が生徒全員の筆記課題を評価することはほぼ不可能であることを意味する。結果として、単純なルーブリックによって導かれるピアグレーティングが選択方法である。便利だが、ピアグレーディングは信頼性と妥当性の点で不足することが多い。本研究では18の異なる設定を用いて,MOOCにおけるピアグレーディングを代替する大規模言語モデル(LLM)の実現可能性を検討する。具体的には,GPT-4 と GPT-3.5 の3つの異なるコース,すなわち導入天文学,天文学,天文学史と哲学に焦点をあてる。 LLMを指導するためには、ゼロショットチェーン・オブ・シークレット (Zero-shot-CoT) の変種に基づく3つの異なるプロンプトを使用する: ゼロショット-CoTとインストラクターが提案した正解を組み合わせ、ゼロショット-CoTとインストラクターが生成した正解とLLMを併用するゼロショット-CoT。その結果,Zero-shot-CoTはインストラクターが提供する回答やルーブリックと統合された場合,ピアグレーティングよりもインストラクターが割り当てたものとより整合した成績が得られた。しかし、天文学コースの歴史と哲学は、他のコースとは対照的に、成績付けの点でより困難であることが証明されている。最後に,本研究は,特にルーブリックをよく定義した被験者において,moocのグレーティングシステムを自動化するための有望な方向性を示す。

Massive open online courses (MOOCs) unlock the doors to free education for anyone around the globe with access to a computer and the internet. Despite this democratization of learning, the massive enrollment in these courses means it is almost impossible for one instructor to assess every student's writing assignment. As a result, peer grading, often guided by a straightforward rubric, is the method of choice. While convenient, peer grading often falls short in terms of reliability and validity. In this study, using 18 distinct settings, we explore the feasibility of leveraging large language models (LLMs) to replace peer grading in MOOCs. Specifically, we focus on two state-of-the-art LLMs: GPT-4 and GPT-3.5, across three distinct courses: Introductory Astronomy, Astrobiology, and the History and Philosophy of Astronomy. To instruct LLMs, we use three different prompts based on a variant of the zero-shot chain-of-thought (Zero-shot-CoT) prompting technique: Zero-shot-CoT combined with instructor-provided correct answers; Zero-shot-CoT in conjunction with both instructor-formulated answers and rubrics; and Zero-shot-CoT with instructor-offered correct answers and LLM-generated rubrics. Our results show that Zero-shot-CoT, when integrated with instructor-provided answers and rubrics, produces grades that are more aligned with those assigned by instructors compared to peer grading. However, the History and Philosophy of Astronomy course proves to be more challenging in terms of grading as opposed to other courses. Finally, our study reveals a promising direction for automating grading systems for MOOCs, especially in subjects with well-defined rubrics.

翻訳日:2024-02-07 15:52:50 公開日:2024-02-06

# 変圧器を用いた決定木アルゴリズムの学習

Learning a Decision Tree Algorithm with Transformers ( http://arxiv.org/abs/2402.03774v1 )

ライセンス: Link先を確認

Yufan Zhuang, Liyuan Liu, Chandan Singh, Jingbo Shang, Jianfeng Gao

(参考訳) 決定木は、特に表データにおいて高い予測性能を達成するための解釈能力で有名である。伝統的に、それらは再帰的なアルゴリズムによって構築され、ツリーの各ノードでデータを分割する。しかし、ローカルセグメントに最適化された決定木がグローバルな一般化をもたらすことはないため、最良の分割を特定することは難しい。これに対処するために,古典アルゴリズムからのフィルタ出力に基づいてトランスフォーマティブベースのモデルをトレーニングし,分類のための強い決定木を生成するメタツリーを提案する。具体的には、多くのデータセットにグリージーな決定木と最適化された決定木の両方を適合させます。次にMetaTreeをトレーニングして、強力な一般化パフォーマンスを実現するツリーを生成します。このトレーニングにより、MetaTreeはこれらのアルゴリズムをエミュレートするだけでなく、コンテキストに応じてその戦略をインテリジェントに適応させることができる。

Decision trees are renowned for combining interpretability with high predictive performance, especially on tabular data. Traditionally, they are constructed through recursive algorithms that partition the data at every node in a tree. However, identifying the best partition is challenging, as decision trees optimized for local segments may not bring global generalization. To address this, we introduce MetaTree, which trains a transformer-based model on filtered outputs from classical algorithms to produce strong decision trees for classification. Specifically, we fit both greedy decision trees and optimized decision trees on a large number of datasets. We then train MetaTree to produce the trees that achieve strong generalization performance. This training enables MetaTree to not only emulate these algorithms, but also to intelligently adapt its strategy according to the context, thereby achieving superior generalization performance.
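
For context, the classical greedy step that the abstract contrasts with global generalization can be sketched as follows. This is the textbook Gini-based split search, not MetaTree itself; the function names are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_greedy_split(rows, labels):
    """Return (weighted_impurity, feature_index, threshold) of the best split.

    Tries every observed value of every feature as a "<= threshold" rule
    and keeps the one with the lowest weighted child impurity.
    """
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must send data to both children
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold)
    return best
```

The search only looks at the current node, which is why a locally best split need not yield the best tree overall.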

翻訳日:2024-02-07 15:52:10 公開日:2024-02-06

# RVFLに基づく非線形辞書学習へのSVDフリーアプローチ

An SVD-free Approach to Nonlinear Dictionary Learning based on RVFL ( http://arxiv.org/abs/2402.03833v1 )

ライセンス: Link先を確認

G.Madhuri, Atul Negi

(参考訳) 本稿では,Random Vector Functional Link(RVFL)と呼ばれるフィードフォワードニューラルネットワークの理論を活用する非線形辞書学習アルゴリズムを提案する。提案したRVFLに基づく非線形辞書学習(RVFLDL)は,非線形スパース係数から高密度入力特徴へのスパース・トゥ・デンス特徴写像として辞書を学習する。カーネルに基づく非線形辞書学習手法は暗黙的特徴マップによって得られる特徴空間で動作し、特異値分解(svd)のような計算コストの高い演算とは独立ではない。 RVFLは入力から出力層への重みを解析的に生成するので、RVFLベースの辞書のトレーニングはSVD計算から解放される。係数にスパース性誘導馬車前置を仮定し、スパース係数行列w.r.tを初期ランダム辞書を生成する。入力スパース係数と辞書原子との高次依存関係は、スパース係数を非線形に変換し、強化された特徴として付加することによりトレーニングプロセスに組み込む。したがって、この方法は、辞書に非線形性を誘導しながら、スパース係数を高次元空間に投影する。 RVFL-netを用いて分類するために、分類行列は非線形スパース係数をラベルにマッピングする変換として学習される。画像分類と再構成の応用における手法の性能は他の非線形辞書学習法と同等である。実験により、RVFLDLは拡張性があり、他の非線形辞書学習法よりも優れた解を提供することが示された。

This paper presents a novel nonlinear dictionary learning algorithm leveraging the theory of a feed-forward neural network called Random Vector Functional Link (RVFL). The proposed RVFL-based nonlinear Dictionary Learning (RVFLDL) learns a dictionary as a sparse-to-dense feature map from nonlinear sparse coefficients to the dense input features. Kernel-based nonlinear dictionary learning methods operate in a feature space obtained by an implicit feature map, and they are not independent of computationally expensive operations like Singular Value Decomposition (SVD). Training the RVFL-based dictionary is free from SVD computation as RVFL generates weights from the input to the output layer analytically. A sparsity-inducing horseshoe prior is assumed on the coefficients to generate a sparse coefficient matrix w.r.t. an initial random dictionary. Higher-order dependencies between the input sparse coefficients and the dictionary atoms are incorporated into the training process by nonlinearly transforming the sparse coefficients and adding them as enhanced features. Thus the method projects sparse coefficients to a higher dimensional space while inducing nonlinearities into the dictionary. For classification using RVFL-net, a classifier matrix is learned as a transform that maps nonlinear sparse coefficients to the labels. The performance of the method illustrated in image classification and reconstruction applications is comparable to that of other nonlinear dictionary learning methods. Experiments show that RVFLDL is scalable and provides a solution better than those obtained using other nonlinear dictionary learning methods.

翻訳日:2024-02-07 15:44:25 公開日:2024-02-06

# 大規模言語モデルを用いた雇用市場領域におけるスキル抽出の再考

Rethinking Skill Extraction in the Job Market Domain using Large Language Models ( http://arxiv.org/abs/2402.03832v1 )

ライセンス: Link先を確認

Khanh Cao Nguyen, Mike Zhang, Syrielle Montariol, Antoine Bosselut

(参考訳) スキル抽出は、仕事の投稿や履歴書などの文書で言及されているスキルと資格を識別する。このタスクは、BIOタグを用いたシーケンスラベリングアプローチを使用して教師付きモデルをトレーニングすることで、一般的に取り組まれる。しかし、手動でアノテートしたデータへの依存は、そのようなアプローチの一般化可能性を制限する。さらに、共通のバイオ設定は、複雑なスキルパターンを捉えてあいまいな言及を処理できるモデルの能力を制限する。本稿では,6つの統一スキル抽出データセットのベンチマークを用いて,これらの課題を克服するためのインコンテキスト学習の利用について検討する。提案手法は,大規模言語モデル(LLM)の少数ショット学習機能を活用し,文からスキルを抽出する。 LLMは従来の教師付きモデルと性能的に同等ではないにもかかわらず、構文的に複雑なスキル記述をスキル抽出タスクでよりうまく扱えることを示す。

Skill Extraction involves identifying skills and qualifications mentioned in documents such as job postings and resumes. The task is commonly tackled by training supervised models using a sequence labeling approach with BIO tags. However, the reliance on manually annotated data limits the generalizability of such approaches. Moreover, the common BIO setting limits the ability of the models to capture complex skill patterns and handle ambiguous mentions. In this paper, we explore the use of in-context learning to overcome these challenges, on a benchmark of 6 uniformized skill extraction datasets. Our approach leverages the few-shot learning capabilities of large language models (LLMs) to identify and extract skills from sentences. We show that LLMs, despite not being on par with traditional supervised models in terms of performance, can better handle syntactically complex skill mentions in skill extraction tasks.
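
The BIO sequence-labeling setup mentioned above can be made concrete with a small decoder. It is a generic sketch of BIO decoding and is not taken from the paper's code.

```python
def bio_to_spans(tokens, tags):
    """Collect skill mentions from parallel token and BIO tag lists."""
    spans = []
    current = []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new mention starts; close any mention still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O", or a stray "I" with no open mention.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


tokens = "Experience with Python and machine learning required".split()
tags = ["O", "O", "B", "O", "B", "I", "O"]
print(bio_to_spans(tokens, tags))  # ['Python', 'machine learning']
```

Because each token gets exactly one tag, contiguous mentions are the only thing this scheme can express. A phrase such as "Python and Java programming", where "programming" belongs to both skills, is one kind of complex pattern it cannot represent.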

翻訳日:2024-02-07 15:44:03 公開日:2024-02-06

# OASim:自律運転のためのニューラルレンダリングに基づくオープンで適応的なシミュレータ

OASim: an Open and Adaptive Simulator based on Neural Rendering for Autonomous Driving ( http://arxiv.org/abs/2402.03830v1 )

ライセンス: Link先を確認

Guohang Yan, Jiahao Pi, Jianfei Guo, Zhaotong Luo, Min Dou, Nianchen Deng, Qiusheng Huang, Daocheng Fu, Licheng Wen, Pinlong Cai, Xing Gao, Xinyu Cai, Bo Zhang, Xuemeng Yang, Yeqi Bai, Hongbin Zhou, Botian Shi

(参考訳) ディープラーニングとコンピュータビジョンの技術開発により、自動運転は交通安全と効率を改善する新しいソリューションを提供する。高品質なデータセットを構築することの重要性は、特に近年のエンドツーエンドの自動運転アルゴリズムの台頭とともに、自明である。データはアルゴリズムのクローズドループシステムにおいて中心的な役割を果たす。しかし、現実世界のデータ収集は高価で、時間がかかり、安全ではない。暗黙的レンダリング技術の開発と、生成モデルを用いた大規模データ生成に関する詳細な研究により、オープンかつ適応的なシミュレータであり、暗黙的ニューラルレンダリングに基づく自律運転データ生成装置であるOASimを提案する。 1) 神経暗黙的表面再構成技術による高品質なシーン再構成。 2)自走車及び参加車両の軌道編集。 (3) シーンに自由に選択して挿入できるリッチカーモデルライブラリ。 (4) 特定のセンサを選択してデータを生成するリッチセンサーモデルライブラリ。 (5)高度にカスタマイズ可能なデータ生成システムは,ユーザのニーズに応じてデータを生成することができる。カルラシミュレータ上での認識性能評価と実世界のデータ取得により,生成データの品質と忠実さを実証する。コードはhttps://github.com/PJLab-ADG/OASimで入手できる。

With deep learning and computer vision technology development, autonomous driving provides new solutions to improve traffic safety and efficiency. The importance of building high-quality datasets is self-evident, especially with the rise of end-to-end autonomous driving algorithms in recent years. Data plays a core role in the algorithm closed-loop system. However, collecting real-world data is expensive, time-consuming, and unsafe. With the development of implicit rendering technology and in-depth research on using generative models to produce data at scale, we propose OASim, an open and adaptive simulator and autonomous driving data generator based on implicit neural rendering. It has the following characteristics: (1) High-quality scene reconstruction through neural implicit surface reconstruction technology. (2) Trajectory editing of the ego vehicle and participating vehicles. (3) A rich vehicle model library whose models can be freely selected and inserted into the scene. (4) A rich sensor model library from which specified sensors can be selected to generate data. (5) A highly customizable data generation system that can generate data according to user needs. We demonstrate the high quality and fidelity of the generated data through perception performance evaluation on the CARLA simulator and real-world data acquisition. Code is available at https://github.com/PJLab-ADG/OASim.

翻訳日:2024-02-07 15:43:48 公開日:2024-02-06

# 神経最適輸送による分布の重心推定

Estimating Barycenters of Distributions with Neural Optimal Transport ( http://arxiv.org/abs/2402.03828v1 )

ライセンス: Link先を確認

Alexander Kolesov, Petr Mokrov, Igor Udovichenko, Milena Gazdieva, Gudmund Pammer, Evgeny Burnaev, Alexander Korotin

(参考訳) 確率測定の集合を考えると、実践者は基準分布を適切に集約する"平均"分布を見つける必要がある。そのような平均の理論的に魅力的な概念はワッサーシュタイン・バリーセンターであり、これは我々の研究の主焦点である。最適輸送(ot)の双対定式化を基盤として,ワッサースタイン・バリセンター問題を解くための新しいスケーラブルな手法を提案する。近年のNeural OTソルバをベースとして,二段階の対数学習目標を持ち,一般的なコスト関数に有効である。バリセンタタスクを利用する典型的な逆アルゴリズムは三段階最適化を利用しており、主に二次コストに重点を置いている。また,提案手法の理論的誤差境界を定め,その適用性および実例的シナリオと画像データ設定に対する有効性を示す。

Given a collection of probability measures, a practitioner sometimes needs to find an "average" distribution which adequately aggregates reference distributions. A theoretically appealing notion of such an average is the Wasserstein barycenter, which is the primary focus of our work. By building upon the dual formulation of Optimal Transport (OT), we propose a new scalable approach for solving the Wasserstein barycenter problem. Our methodology is based on the recent Neural OT solver: it has a bi-level adversarial learning objective and works for general cost functions. These are key advantages of our method, since the typical adversarial algorithms leveraging barycenter tasks utilize tri-level optimization and focus mostly on quadratic cost. We also establish theoretical error bounds for our proposed approach and showcase its applicability and effectiveness on illustrative scenarios and image data setups.
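
To make the notion of a Wasserstein barycenter tangible, here is a sketch of a classical special case, not the paper's neural method. In one dimension with quadratic cost, the barycenter's quantile function is the weighted average of the input quantile functions. For equal-size samples this reduces to averaging sorted values. The function name and the equal-size restriction are assumptions made for brevity.

```python
def barycenter_1d(samples, weights):
    """Quadratic-cost Wasserstein barycenter of 1-D empirical distributions.

    samples: list of equal-length lists of floats, one list per distribution.
    weights: non-negative weights summing to 1, one per distribution.
    Returns the barycenter's support points in ascending order.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    ordered = [sorted(s) for s in samples]
    # Average the i-th order statistics across distributions.
    return [sum(w * s[i] for w, s in zip(weights, ordered)) for i in range(n)]


print(barycenter_1d([[0.0, 1.0, 2.0], [12.0, 10.0, 11.0]], [0.5, 0.5]))  # [5.0, 6.0, 7.0]
```

The result keeps the shape of the inputs and sits between them, whereas naively mixing the two samples would give a bimodal distribution. No such closed form exists in higher dimensions or for general costs, which is the setting the abstract targets.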

翻訳日:2024-02-07 15:43:32 公開日:2024-02-06

# インボディードAIへの呼びかけ

A call for embodied AI ( http://arxiv.org/abs/2402.03824v1 )

ライセンス: Link先を確認

Giuseppe Paolo, Jonas Gonzalez-Billandon, Balázs Kégl

(参考訳) 我々は、人工知能の追求における次の基本的なステップとして、Embodied AIを提案する。我々は、哲学、心理学、神経科学、ロボティクスといった様々な分野にまたがるエンボディメントの概念の進化を横切り、EAIが静的学習の古典的パラダイムとどのように区別するかを強調する。具体化aiの範囲を広げることにより,認知的アーキテクチャに基づく理論的枠組みを導入し,具体化エージェントの本質的構成要素として知覚,行動,記憶,学習を強調する。このフレームワークはFristonのアクティブな推論原則と一致しており、EAI開発に対する包括的なアプローチを提供する。 AIの分野での進歩にもかかわらず、新しいAI学習理論の定式化や高度なハードウェアの革新といった大きな課題が続いている。私たちの議論は、将来のEmbodied AI研究の基礎となるガイドラインを概説している。現実の環境における人間や他の知的なエンティティとのシームレスなコミュニケーション、コラボレーション、共存が可能なエンボダイドAIエージェントを作成することの重要性を強調し、我々はAIコミュニティを多面的な課題に対処し、AGIの探求に先立つ機会をつかむことを目指しています。

We propose Embodied AI as the next fundamental step in the pursuit of Artificial General Intelligence, juxtaposing it against current AI advancements, particularly Large Language Models. We traverse the evolution of the embodiment concept across diverse fields - philosophy, psychology, neuroscience, and robotics - to highlight how EAI distinguishes itself from the classical paradigm of static learning. By broadening the scope of Embodied AI, we introduce a theoretical framework based on cognitive architectures, emphasizing perception, action, memory, and learning as essential components of an embodied agent. This framework is aligned with Friston's active inference principle, offering a comprehensive approach to EAI development. Despite the progress made in the field of AI, substantial challenges, such as the formulation of a novel AI learning theory and the innovation of advanced hardware, persist. Our discussion lays down a foundational guideline for future Embodied AI research. Highlighting the importance of creating Embodied AI agents capable of seamless communication, collaboration, and coexistence with humans and other intelligent entities within real-world environments, we aim to steer the AI community towards addressing the multifaceted challenges and seizing the opportunities that lie ahead in the quest for AGI.

翻訳日:2024-02-07 15:43:17 公開日:2024-02-06

# RevOrder: 言語モデルにおける算術的強化のための新しい方法

RevOrder: A Novel Method for Enhanced Arithmetic in Language Models ( http://arxiv.org/abs/2402.03822v1 )

ライセンス: Link先を確認

Si Shen, Peijun Shen, Danhao Zhu

(参考訳) 本稿では,大言語モデル(LLM)における算術演算の改善を目的とした新しい手法であるRevOrderを提案する。本手法は,方程式の複雑性を評価するための新しい指標である$\mathcal{o}(1)$ に対して,シーケンシャル中間桁 (csid) のカウントを大幅に削減する。総合的なテストを通じて、RevOrderは基本的な算術演算において完全な精度を達成するだけでなく、分割タスク、特に従来のモデルが苦戦する多数のタスクにおけるLLM性能を大幅に向上させる。 RevOrderの実装は、トレーニングと推論フェーズの両方に費用対効果がある。さらに、GSM8Kの数学タスク上でLLaMA2-7Bモデルを微調整するためにRevOrderを適用すると、方程式計算誤差が46%減少し、総合スコアが41.6から44.4に増加した。

This paper presents RevOrder, a novel technique aimed at improving arithmetic operations in large language models (LLMs) by reversing the output digits in addition, subtraction, and n-digit by 1-digit (nD by 1D) multiplication tasks. Our method significantly reduces the Count of Sequential Intermediate Digits (CSID), a new metric we introduce to assess equation complexity, to $\mathcal{O}(1)$. Through comprehensive testing, RevOrder not only achieves perfect accuracy in basic arithmetic operations but also substantially boosts LLM performance in division tasks, particularly with large numbers where traditional models struggle. Implementation of RevOrder is cost-effective for both training and inference phases. Moreover, applying RevOrder to fine-tune the LLaMA2-7B model on the GSM8K math task results in a considerable improvement, reducing equation calculation errors by 46% and increasing overall scores from 41.6 to 44.4.
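
The intuition behind reversing output digits can be sketched with plain addition. When digits are emitted least-significant first, each one depends only on the current column and a single carry, so nothing has to be worked out ahead of time. The function below is an illustration of that ordering only; the paper's exact output format is not reproduced.

```python
def add_reversed(a: int, b: int) -> str:
    """Return the digits of a + b, least-significant digit first."""
    digits = []
    carry = 0
    while a > 0 or b > 0 or carry:
        total = a % 10 + b % 10 + carry
        digits.append(str(total % 10))  # this digit is final as soon as it is computed
        carry = total // 10
        a //= 10
        b //= 10
    return "".join(digits) or "0"


print(add_reversed(478, 256))  # '437', i.e. 734 read right to left
```

Writing the same sum most-significant digit first would require knowing whether a carry propagates from every lower column before the first digit can be written.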

翻訳日:2024-02-07 15:42:53 公開日:2024-02-06

# SMOTEの理論的および実験的研究:再バランス戦略の限界と比較

Theoretical and experimental study of SMOTE: limitations and comparisons of rebalancing strategies ( http://arxiv.org/abs/2402.03819v1 )

ライセンス: Link先を確認

Abdoulaye Sakho (LPSM), Erwan Scornet (LPSM), Emmanuel Malherbe

(参考訳) SMOTE(Synthetic Minority Oversampling Technique)は、不均衡なデータセットを扱うための一般的なリバランス戦略である。漸近的に、SMOTE(デフォルトパラメータを持つ)が元のマイノリティサンプルをコピーするだけで元の分布を再生することを示す。また,SMOTE密度はマイノリティ分布の支持境界付近で消失し,従って共通なBorderLine SMOTE戦略を正当化する。次に、2つの新しいSMOTE関連戦略を導入し、それらを最先端のリバランシング手順と比較する。データセットが高度に不均衡な場合にのみ、再バランス戦略が必要であることを示す。このようなデータセットに対して、SMOTE、提案、またはアンサンプ手順が最良の戦略である。

Synthetic Minority Oversampling Technique (SMOTE) is a common rebalancing strategy for handling imbalanced data sets. Asymptotically, we prove that SMOTE (with default parameter) regenerates the original distribution by simply copying the original minority samples. We also prove that SMOTE density vanishes near the boundary of the support of the minority distribution, therefore justifying the common BorderLine SMOTE strategy. Then we introduce two new SMOTE-related strategies, and compare them with state-of-the-art rebalancing procedures. We show that rebalancing strategies are only required when the data set is highly imbalanced. For such data sets, SMOTE, our proposals, or undersampling procedures are the best strategies.
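
The generation step that these results concern can be sketched as follows. This is the standard SMOTE interpolation written from its usual description, with illustrative names; it is not the authors' code or any library's API.

```python
import math
import random


def smote_sample(minority, k, rng):
    """Draw one synthetic minority point.

    minority: list of equal-length tuples of floats (the minority class).
    k: number of nearest minority neighbours to choose from.
    rng: a random.Random instance, passed in for reproducibility.
    """
    center = rng.choice(minority)
    neighbours = sorted(
        (p for p in minority if p is not center),
        key=lambda p: math.dist(center, p),
    )[:k]
    neighbour = rng.choice(neighbours)
    u = rng.random()  # position along the segment from center to neighbour
    return tuple(c + u * (n - c) for c, n in zip(center, neighbour))


rng = random.Random(0)
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote_sample(corners, k=2, rng=rng))
```

Every synthetic point lies on a segment between two existing minority points, so it cannot fall outside their convex hull. That gives some intuition for the boundary result stated in the abstract.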

翻訳日:2024-02-07 15:42:33 公開日:2024-02-06

# 単層グラフ畳み込みネットワークの漸近一般化誤差

Asymptotic generalization error of a single-layer graph convolutional network ( http://arxiv.org/abs/2402.03818v1 )

ライセンス: Link先を確認

O. Duranthon, L. Zdeborová

(参考訳) グラフ畳み込みネットワークは大きな実用的期待を示しているが、標本数関数としての一般化特性の理論的理解は、教師付き完全連結ニューラルネットワークのより広く研究されているケースと比較してまだ初期段階にある。本稿では,一層グラフ畳み込みネットワーク(GCN)の性能を,属性付き確率ブロックモデル(SBM)が高次元限界で生成したデータに基づいて予測する。従来,SBM(文脈的SBM)のリッジ回帰のみを考慮し,CSBMの任意の凸損失と正則化に一般化し,他のデータモデルであるニューラルプライアSBMに解析を加えてきた。また,高信号対雑音比の限界について検討し,GCNの収束率を詳細に検討し,一貫性はあるものの,いずれの場合においてもベイズ最適値に達しないことを示す。

While graph convolutional networks show great practical promise, the theoretical understanding of their generalization properties as a function of the number of samples is still in its infancy compared to the more broadly studied case of supervised fully connected neural networks. In this article, we predict the performances of a single-layer graph convolutional network (GCN) trained on data produced by attributed stochastic block models (SBMs) in the high-dimensional limit. Previously, only ridge regression on contextual-SBM (CSBM) has been considered in Shi et al. 2022; we generalize the analysis to arbitrary convex loss and regularization for the CSBM and add the analysis for another data model, the neural-prior SBM. We also study the high signal-to-noise ratio limit, detail the convergence rates of the GCN and show that, while consistent, it does not reach the Bayes-optimal rate for any of the considered cases.

翻訳日:2024-02-07 15:42:23 公開日:2024-02-06

# 投票に基づく合意モデル圧縮によるネットワーク内フェデレーション学習の迅速化

Expediting In-Network Federated Learning by Voting-Based Consensus Model Compression ( http://arxiv.org/abs/2402.03815v1 )

ライセンス: Link先を確認

Xiaoxin Su, Yipeng Zhou, Laizhong Cui and Song Guo

(参考訳) 近年,データプライバシの保護能力により,連合学習(FL)が勢いを増している。 FLによるモデルトレーニングを行うために、複数のクライアントがパラメータサーバとインターネットを介してモデル更新を交換する。通信速度を高速化するため,パラメータサーバの代わりにプログラマブルスイッチ(PS)を配置してクライアントのコーディネートを行う方法が検討されている。 PSをFLにデプロイする際の課題はメモリスペースの不足にあり、PS上でメモリ消費集約アルゴリズムの実行を禁止している。この課題を解決するために,クライアント投票とモデル集約という2つのフェーズからなるFediAC(Federated Learning Aggregation with Compression)アルゴリズムを提案する。前フェーズでは、クライアントがPSに重要なモデル更新指標を報告し、世界的な重要なモデル更新を見積もる。後者のフェーズでは、クライアントは集約のためにグローバルに重要なモデルの更新をPSにアップロードする。 FediACは、クライアント間のコンセンサス圧縮を保証するため、既存の作業よりもメモリスペースと通信トラフィックをはるかに少なく消費する。 PSは、モデル更新インデックスを第2フェーズで迅速に完全なアグリゲーションに調整する。最後に,fediacがモデル精度と通信トラフィックの面で最先端のベースラインを著しく上回っていることを示すために,公開データセットを用いて広範な実験を行った。

Recently, federated learning (FL) has gained momentum because of its capability in preserving data privacy. To conduct model training by FL, multiple clients exchange model updates with a parameter server via the Internet. To accelerate communication, deploying a programmable switch (PS) in lieu of the parameter server to coordinate clients has been explored. The challenge of deploying the PS in FL lies in its scarce memory space, which prohibits running memory-consuming aggregation algorithms on the PS. To overcome this challenge, we propose the Federated Learning in-network Aggregation with Compression (FediAC) algorithm, consisting of two phases: client voting and model aggregating. In the former phase, clients report their significant model update indices to the PS to estimate global significant model updates. In the latter phase, clients upload global significant model updates to the PS for aggregation. FediAC consumes much less memory space and communication traffic than existing works because the first phase can guarantee consensus compression across clients. The PS easily aligns model update indices to swiftly complete aggregation in the second phase. Finally, we conduct extensive experiments by using public datasets to demonstrate that FediAC remarkably surpasses the state-of-the-art baselines in terms of model accuracy and communication traffic.
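
The two phases described above can be sketched in a few lines. The voting rule used here (each client votes for its `k` largest-magnitude coordinates and an index is kept when at least `min_votes` clients voted for it) and the final averaging are assumptions chosen for illustration. The abstract does not specify these details.

```python
from collections import Counter


def vote_indices(update, k):
    """Phase 1, client side: indices of the k largest-magnitude entries."""
    ranked = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    return set(ranked[:k])


def consensus_indices(client_updates, k, min_votes):
    """Phase 1, switch side: keep indices that enough clients voted for."""
    votes = Counter()
    for update in client_updates:
        votes.update(vote_indices(update, k))
    return sorted(i for i, count in votes.items() if count >= min_votes)


def aggregate(client_updates, indices):
    """Phase 2: average only the agreed-upon coordinates."""
    n = len(client_updates)
    return {i: sum(update[i] for update in client_updates) / n for i in indices}


updates = [
    [0.9, 0.0, -0.5, 0.1],
    [0.8, 0.1, 0.0, -0.7],
    [1.0, -0.6, 0.05, 0.0],
]
shared = consensus_indices(updates, k=2, min_votes=2)
print(shared, aggregate(updates, shared))
```

Because every client uploads the same index set in the second phase, the aggregator needs one accumulator per agreed index rather than one per distinct index any client might send. That is the memory saving the abstract attributes to consensus compression.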

翻訳日:2024-02-07 15:42:08 公開日:2024-02-06

# 非離散帯域を持つマスクグラフオートエンコーダ

Masked Graph Autoencoder with Non-discrete Bandwidths ( http://arxiv.org/abs/2402.03814v1 )

ライセンス: Link先を確認

Ziwen Zhao, Yuhua Li, Yixiong Zou, Jiliang Tang, Ruixuan Li

(参考訳) マスケードグラフオートエンコーダは、まだ十分に研究されていない強力なグラフ自己教師学習手法として登場した。本稿では,グラフニューラルネットワークにおけるメッセージ伝達の観点から,既存の離散エッジマスキングとバイナリリンク再構成戦略が位相的に有意な表現を学習するには不十分であることを示す。これらの制限には、メッセージフローのブロッキング、過度なスムースネスに対する脆弱性、最適近傍識別性が含まれる。これらの理解に触発されて、離散ベルヌーイ分布の代わりに連続分布と分散確率分布からサンプリングされる非離散エッジマスクを探索する。これらのマスクは、各エッジに対して「バンド幅」と呼ばれる出力メッセージの量を制限する。本稿では,帯域幅マスキングとレイヤワイド帯域幅予測を用いた新しい,情報的かつ効果的なトポロジマスマスキンググラフ自動符号化手法を提案する。理論的にも経験的にも強力なグラフトポロジ学習能力を示す。提案するフレームワークは,自己教師付きリンク予測(離散エッジ再構成器を最大20%改善する)と,構造学習プリテキストのみを用いた多数のデータセットのノード分類の両方において,代表的なベースラインを上回っている。私たちの実装はhttps://github.com/newiz430/bandanaで利用可能です。

Masked graph autoencoders have emerged as a powerful graph self-supervised learning method that has yet to be fully explored. In this paper, we unveil that the existing discrete edge masking and binary link reconstruction strategies are insufficient to learn topologically informative representations, from the perspective of message propagation on graph neural networks. These limitations include blocking message flows, vulnerability to over-smoothness, and suboptimal neighborhood discriminability. Inspired by these understandings, we explore non-discrete edge masks, which are sampled from a continuous and dispersive probability distribution instead of the discrete Bernoulli distribution. These masks restrict the amount of output messages for each edge, referred to as "bandwidths". We propose a novel, informative, and effective topological masked graph autoencoder using bandwidth masking and a layer-wise bandwidth prediction objective. We demonstrate its powerful graph topological learning ability both theoretically and empirically. Our proposed framework outperforms representative baselines in both self-supervised link prediction (improving the discrete edge reconstructors by at most 20%) and node classification on numerous datasets, solely with a structure-learning pretext. Our implementation is available at https://github.com/Newiz430/Bandana.

翻訳日:2024-02-07 15:41:47 公開日:2024-02-06

# nkハイブリッド遺伝的アルゴリズムによるクラスタリング

NK Hybrid Genetic Algorithm for Clustering ( http://arxiv.org/abs/2402.03813v1 )

ライセンス: Link先を確認

Renato Tinós, Liang Zhao, Francisco Chicano, Darrell Whitley

(参考訳) 本稿では,クラスタリングのためのNKハイブリッド遺伝的アルゴリズムを提案する。この解を評価するために、ハイブリッドアルゴリズムはnkクラスタリング検証基準2 (nkcv2) を用いる。 NKCV2は、オブジェクトの小さなグループN$の配置に関する情報を使用する。各グループはデータセットの$K+1$オブジェクトで構成されている。 NKCV2とK$の固定値を用いて密度ベース領域を同定できることを示す実験結果を得た。 NKCV2では、決定変数の関係が知られており、グレーボックス最適化を適用することができる。突然変異演算子,分割クロスオーバー,局所探索戦略が提案され,すべて決定変数間の関係に関する情報を用いている。分割クロスオーバーでは、評価関数は$q$独立コンポーネントに分解され、分割クロスオーバーは計算複雑性$o(n)$を持つ2^q$可能子孫の中で決定論的にベストを返す。 NKハイブリッド遺伝的アルゴリズムは任意の形状のクラスターの検出とクラスタ数の自動推定を可能にする。実験では、NKハイブリッド遺伝的アルゴリズムは、他の遺伝的アルゴリズムや最先端クラスタリングアルゴリズムと比較して非常に良い結果を得た。

The NK hybrid genetic algorithm for clustering is proposed in this paper. In order to evaluate the solutions, the hybrid algorithm uses the NK clustering validation criterion 2 (NKCV2). NKCV2 uses information about the disposition of $N$ small groups of objects. Each group is composed of $K+1$ objects of the dataset. Experimental results show that density-based regions can be identified by using NKCV2 with fixed small $K$. In NKCV2, the relationship between decision variables is known, which in turn allows us to apply gray box optimization. Mutation operators, a partition crossover, and a local search strategy are proposed, all using information about the relationship between decision variables. In partition crossover, the evaluation function is decomposed into $q$ independent components; partition crossover then deterministically returns the best among $2^q$ possible offspring with computational complexity $O(N)$. The NK hybrid genetic algorithm allows the detection of clusters with arbitrary shapes and the automatic estimation of the number of clusters. In the experiments, the NK hybrid genetic algorithm produced very good results when compared to another genetic algorithm approach and to state-of-the-art clustering algorithms.
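
A short sketch shows why partition crossover can return the best of $2^q$ offspring without enumerating them. It assumes the decomposition is already given: `components` lists disjoint variable groups, and the hypothetical `component_score` depends only on the variables in its group, with higher being better. How NKCV2 produces such a decomposition is not reproduced here.

```python
def partition_crossover(parent_a, parent_b, components, component_score):
    """Build the best offspring by choosing the better parent per component.

    components: list of disjoint lists of variable indices.
    component_score(variables, solution): partial score that reads only
    solution[v] for v in variables (higher is better).
    """
    child = list(parent_a)
    for variables in components:
        # Components are independent, so each choice can be made on its own.
        if component_score(variables, parent_b) > component_score(variables, parent_a):
            for v in variables:
                child[v] = parent_b[v]
    return child


def score(variables, solution):
    """Toy partial score: count of ones inside the component."""
    return sum(solution[v] for v in variables)


print(partition_crossover([1, 1, 0, 0], [0, 0, 1, 1], [[0, 1], [2, 3]], score))  # [1, 1, 1, 1]
```

Each component is evaluated once per parent, so the work grows with the number of variables rather than with $2^q$.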

翻訳日:2024-02-07 15:41:25 公開日:2024-02-06

# 高次元ガウス過程モデリングのための加法性と活性部分空間の組み合わせ

Combining additivity and active subspaces for high-dimensional Gaussian process modeling ( http://arxiv.org/abs/2402.03809v1 )

ライセンス: Link先を確認

Mickael Binois (ACUMES), Victor Picheny

(参考訳) ガウス過程は、予測精度、分析的トラクタビリティ、不確実性定量化のための内蔵能力のために、回帰と分類のための広く受け入れられた手法である。しかし、変数の数が増えるたびに次元の呪いに悩まされる。この課題は一般に、問題に付加的な構造を仮定することで解決され、望ましい選択肢は加法性または低固有次元である。高次元ガウス過程モデリングへの我々の貢献は、これらを多面的戦略と組み合わせ、合成関数やデータセットの実験を通じて利点を示すことである。

Gaussian processes are a widely embraced technique for regression and classification due to their good prediction accuracy, analytical tractability and built-in capabilities for uncertainty quantification. However, they suffer from the curse of dimensionality whenever the number of variables increases. This challenge is generally addressed by assuming additional structure in the problem, the preferred options being either additivity or low intrinsic dimensionality. Our contribution for high-dimensional Gaussian process modeling is to combine them with a multi-fidelity strategy, showcasing the advantages through experiments on synthetic functions and datasets.

翻訳日:2024-02-07 15:41:06 公開日:2024-02-06

# sdemg : スコアに基づく表面筋電図信号の拡散モデル

SDEMG: Score-based Diffusion Model for Surface Electromyographic Signal Denoising ( http://arxiv.org/abs/2402.03808v1 )

ライセンス: Link先を確認

Yu-Tung Liu, Kuan-Chen Wang, Kai-Chun Liu, Sheng-Yu Peng, Yu Tsao

(参考訳) 表面筋電図(sEMG)記録は、監視される筋肉が心臓に近いときに心電図(ECG)信号に影響される。既存のいくつかの手法では、ハイパスフィルタやテンプレートサブトラクションなどの信号処理に基づくアプローチが採用されているが、ノイズの多いsEMG(ECG干渉付きsEMG)からクリーンなsEMG信号を復元する関数が導出されている。近年,ノイズの多い入力データを用いた高品質で正確なサンプルを生成するために,スコアベース拡散モデルが導入された。本研究では,SDEMGと呼ばれる新しい手法を提案し,SEMG信号デノージングのためのスコアベース拡散モデルを提案する。提案手法を評価するために,mit-bih正規正弦波リズムデータベースからのecg信号とオープンアクセス可能な非侵襲適応義手データベースのデータを用いて,semg信号のノイズを低減する実験を行った。その結果,SDEMGは比較法より優れ,高品質なsEMG試料が得られた。 SDEMGのソースコードは、https://github.com/tonyliu0910/SDEMGで入手できる。

Surface electromyography (sEMG) recordings can be influenced by electrocardiogram (ECG) signals when the muscle being monitored is close to the heart. Several existing methods use signal-processing-based approaches, such as high-pass filtering and template subtraction, while some derive mapping functions to restore clean sEMG signals from noisy sEMG (sEMG with ECG interference). Recently, the score-based diffusion model, a renowned generative model, has been introduced to generate high-quality and accurate samples with noisy input data. In this study, we propose a novel approach, termed SDEMG, a score-based diffusion model for sEMG signal denoising. To evaluate the proposed SDEMG approach, we conduct experiments to reduce noise in sEMG signals, employing data from an openly accessible source, the Non-Invasive Adaptive Prosthetics database, along with ECG signals from the MIT-BIH Normal Sinus Rhythm Database. The experimental results indicate that SDEMG outperforms the comparative methods and produces high-quality sEMG samples. The source code of the SDEMG framework is available at: https://github.com/tonyliu0910/SDEMG

翻訳日:2024-02-07 15:40:55 公開日:2024-02-06

# SEABO:オフライン模倣学習のための簡易検索手法

SEABO: A Simple Search-Based Method for Offline Imitation Learning ( http://arxiv.org/abs/2402.03807v1 )

ライセンス: Link先を確認

Jiafei Lyu, Xiaoteng Ma, Le Wan, Runze Liu, Xiu Li, Zongqing Lu

(参考訳) オフライン強化学習(RL)は、静的なオフラインデータセットから学習する能力と、環境とのインタラクションの必要性の排除によって、多くの注目を集めている。それでも、オフラインRLの成功は、報酬ラベルを付したオフライン移行に大きく依存している。実際には、しばしば報酬関数を手作りする必要があるが、それは時に困難、労働集約的、あるいは非効率である。この課題に取り組むために,我々はオフライン模倣学習(IL)設定に着目し,専門家データとラベルなしデータに基づいて報奨機能を得ることを目標とした。そこで本研究では,検索ベースのオフラインIL手法であるSEABOを提案する。 SEABOは、専門家によるデモンストレーションにおいて、隣人に近い移行に対してより大きな報酬を割り当て、そうでなければ、すべて教師なしの学習方法で、より小さな報酬を割り当てる。様々なD4RLデータセットに対する実験結果から、SEABOは1つの専門的軌道のみを与えられた、オフラインRLアルゴリズムに対する競合的な性能を達成することができ、多くのタスクにおける事前報酬学習やオフラインILメソッドよりも優れることが示された。また,専門家による実証実験が観察のみを含む場合,SEABOは有効であることを示す。私たちのコードは https://github.com/dmksjfl/SEABO で公開されています。

Offline reinforcement learning (RL) has attracted much attention due to its ability to learn from static offline datasets and to eliminate the need to interact with the environment. Nevertheless, the success of offline RL relies heavily on the offline transitions annotated with reward labels. In practice, we often need to hand-craft the reward function, which is sometimes difficult, labor-intensive, or inefficient. To tackle this challenge, we set our focus on the offline imitation learning (IL) setting, and aim at getting a reward function based on the expert data and unlabeled data. To that end, we propose a simple yet effective search-based offline IL method, tagged SEABO. SEABO allocates a larger reward to the transition that is close to its closest neighbor in the expert demonstration, and a smaller reward otherwise, all in an unsupervised learning manner. Experimental results on a variety of D4RL datasets indicate that SEABO can achieve performance competitive with offline RL algorithms that use ground-truth rewards, given only a single expert trajectory, and can outperform prior reward learning and offline IL methods across many tasks. Moreover, we demonstrate that SEABO also works well if the expert demonstrations contain only observations. Our code is publicly available at https://github.com/dmksjfl/SEABO.
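
The nearest-neighbor reward idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the brute-force search, the exponential squashing, and the parameters `alpha` and `beta` are assumptions made here for illustration only.

```python
import math

def nearest_expert_distance(query, expert):
    """Euclidean distance from `query` to its nearest neighbour in `expert`."""
    return min(math.dist(query, e) for e in expert)

def search_based_reward(query, expert, alpha=1.0, beta=0.5):
    """Assumed reward shape: decays exponentially with the nearest-neighbour
    distance, normalised by the vector dimension (illustrative choice)."""
    d = nearest_expert_distance(query, expert)
    return alpha * math.exp(-beta * d / len(query))

# Toy transitions, each flattened into one (state, action) vector.
expert = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
unlabeled = [(0.1, 0.0, 1.0), (3.0, 4.0, 1.0)]

# Transitions close to the expert data receive the larger reward.
rewards = [search_based_reward(t, expert) for t in unlabeled]
```

For realistic dataset sizes a spatial index such as a KD-tree would replace the brute-force `min`; the labeled transitions could then be passed to any offline RL algorithm.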

翻訳日:2024-02-07 15:40:33 公開日:2024-02-06

# 信用決定のための説明可能な自動機械学習:金融工学におけるヒューマン人工知能コラボレーションの強化

Explainable Automated Machine Learning for Credit Decisions: Enhancing Human Artificial Intelligence Collaboration in Financial Engineering ( http://arxiv.org/abs/2402.03806v1 )

ライセンス: Link先を確認

Marc Schmitt

(参考訳) 本稿では,金融工学領域における説明可能な自動機械学習(AutoML)の統合について考察する。金融における人工知能(AI)の急速な進化は、洗練されたアルゴリズムによる意思決定と、これらのシステムの透明性の必要性のバランスを必要とする。 automlがクレジットスコアリングのための堅牢な機械学習モデルの開発を合理化する一方で、説明可能なai(xai)メソッド、特にshapley additive descriptions(shap)は、モデルの意思決定プロセスに関する洞察を提供する。この研究は、AutoMLとXAIの組み合わせが信用決定の効率性と正確性を高めるだけでなく、人間とAIシステムの信頼と協力を促進することを実証している。この調査結果は、AI主導の金融決定の透明性と説明責任を改善し、規制要件と倫理的考慮に従って、説明可能なAutoMLの可能性を強調している。

This paper explores the integration of Explainable Automated Machine Learning (AutoML) in the realm of financial engineering, specifically focusing on its application in credit decision-making. The rapid evolution of Artificial Intelligence (AI) in finance has necessitated a balance between sophisticated algorithmic decision-making and the need for transparency in these systems. The focus is on how AutoML can streamline the development of robust machine learning models for credit scoring, while Explainable AI (XAI) methods, particularly SHapley Additive exPlanations (SHAP), provide insights into the models' decision-making processes. This study demonstrates how the combination of AutoML and XAI not only enhances the efficiency and accuracy of credit decisions but also fosters trust and collaboration between humans and AI systems. The findings underscore the potential of explainable AutoML in improving the transparency and accountability of AI-driven financial decisions, aligning with regulatory requirements and ethical considerations.

翻訳日:2024-02-07 15:40:08 公開日:2024-02-06

# DistiLLM:大規模言語モデルのための合理化された蒸留に向けて

DistiLLM: Towards Streamlined Distillation for Large Language Models ( http://arxiv.org/abs/2402.03898v1 )

ライセンス: Link先を確認

Jongwoo Ko, Sungnyun Kim, Tianyi Chen, Se-Young Yun

(参考訳) 知識蒸留(KD)は、教師モデルをより小さな生徒モデルに圧縮するために広く用いられ、モデル能力を維持しながら推論コストとメモリフットプリントを低減する。しかし、自己回帰系列モデル(例えば、大規模言語モデル)に対する現在のKD手法は、標準化された目的関数を欠いている。さらに、近年、訓練と推論のミスマッチに対処するために生徒が生成した出力が利用されるようになり、計算コストが著しく増大している。これらの問題に対処するために、自己回帰言語モデルのための、より効果的で効率的なKDフレームワークであるDistiLLMを紹介する。 DistiLLMは、(1)理論的特性を明らかにし活用する新しいスキューKullback-Leiblerダイバージェンス損失、(2)生徒が生成した出力の利用効率を高めるよう設計された適応型オフポリシーアプローチ、の2つのコンポーネントから構成される。命令追従タスクを含む広範な実験により、最近のKD手法と比較して最大4.3$\times$の高速化を達成しつつ、高性能な生徒モデルを構築する上でのDistiLLMの有効性が示された。

Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller student model, reducing its inference cost and memory footprint while preserving model capabilities. However, current KD methods for auto-regressive sequence models (e.g., large language models) suffer from missing a standardized objective function. Moreover, the recent use of student-generated outputs to address training-inference mismatches has significantly escalated computational costs. To tackle these issues, we introduce DistiLLM, a more effective and efficient KD framework for auto-regressive language models. DistiLLM comprises two components: (1) a novel skew Kullback-Leibler divergence loss, where we unveil and leverage its theoretical properties, and (2) an adaptive off-policy approach designed to enhance the efficiency in utilizing student-generated outputs. Extensive experiments, including instruction-following tasks, demonstrate the effectiveness of DistiLLM in building high-performing student models while achieving up to 4.3$\times$ speedup compared to recent KD methods.
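
As a rough illustration of a skew Kullback-Leibler divergence, the sketch below uses the common definition in which the second argument is replaced by a mixture, KL(p || alpha*p + (1-alpha)*q). Whether this matches the exact loss used in DistiLLM is an assumption; the function names and the value of `alpha` are hypothetical.

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def skew_kl(p, q, alpha=0.1):
    """Skew KL (assumed definition): KL(p || alpha*p + (1-alpha)*q).
    For alpha > 0 the mixture is positive wherever p is, so the value
    stays finite even when q assigns zero probability to some outcome."""
    mix = [alpha * pi + (1 - alpha) * qi for pi, qi in zip(p, q)]
    return kl(p, mix)

teacher = [0.5, 0.5, 0.0]
student = [1.0, 0.0, 0.0]   # plain KL(teacher || student) would be infinite
loss = skew_kl(teacher, student)
```

Setting `alpha=0` recovers the ordinary KL divergence, while larger values trade fidelity to `q` for numerical stability.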

翻訳日:2024-02-07 15:34:52 公開日:2024-02-06

# 線と円を超えて:大規模言語モデルにおける幾何学的推論ギャップを明らかにする

Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models ( http://arxiv.org/abs/2402.03877v1 )

ライセンス: Link先を確認

Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski

(参考訳) 大規模言語モデル(LLM)は、数学的およびアルゴリズム的なタスクにおいて能力を伸ばし続けているが、その幾何学的推論スキルは十分に研究されていない。本研究では、人間の数学的推論の発達における最も基本的なステップの1つである、作図による幾何学的問題解決におけるLLMの能力を調査する。我々の研究は、類似分野での多くの成功にもかかわらず、最先端のLLMがこの領域で直面する顕著な課題を明らかにする。 LLMは対象変数の選択に偏りを示し、2次元の空間的関係に苦戦し、しばしば物体とその配置を誤って表現したり幻覚したりする。そこで本研究では、内部対話を行うことでLLMの既存の推論能力を高める、LLMベースのマルチエージェントシステムを定式化した枠組みを提案する。この研究は、幾何学的推論におけるLLMの現在の限界を浮き彫りにするとともに、自己修正、協調、多様な役割の専門化を通じて幾何学的推論能力を改善する。

Large Language Models (LLMs) demonstrate ever-increasing abilities in mathematical and algorithmic tasks, yet their geometric reasoning skills are underexplored. We investigate LLMs' abilities in constructive geometric problem-solving, one of the most fundamental steps in the development of human mathematical reasoning. Our work reveals notable challenges that the state-of-the-art LLMs face in this domain despite many successes in similar areas. LLMs exhibit biases in target variable selection and struggle with 2D spatial relationships, often misrepresenting and hallucinating objects and their placements. To this end, we introduce a framework that formulates an LLM-based multi-agent system that enhances their existing reasoning potential by conducting an internal dialogue. This work underscores LLMs' current limitations in geometric reasoning and improves geometric reasoning capabilities through self-correction, collaboration, and diverse role specializations.

翻訳日:2024-02-07 15:34:34 公開日:2024-02-06

# BQP$^A$プロトコルと潜在グラフ分類器の幾何学量子機械学習

Geometric quantum machine learning of BQP$^A$ protocols and latent graph classifiers ( http://arxiv.org/abs/2402.03871v1 )

ライセンス: Link先を確認

Chukwudubem Umeano, Vincent E. Elfving, Oleksandr Kyriienko

(参考訳) 幾何学量子機械学習(GQML)は、効率的な解法プロトコルを学習するための問題対称性を埋め込むことを目的としている。しかし、(G)QMLが古典的なアナログから指数関数的に分離したプロトコルの構築に日常的に使用できるかどうかについては疑問が残る。このレターでは、ブール関数の学習特性に関するサイモンの問題を考察し、これは教師なし回路分類問題と関係があることを示す。幾何QMLのワークフローを用いて、Simonのアルゴリズムから学習し、いくつかのデータセット(oracle $A$)に関してBQP$^A\neq$BPPプロトコルの例を発見する。我々の重要な発見は、特定されたビットフリップおよび置換対称性に関するtwirlingに基づくブール関数埋め込みのための同変特徴マップの開発と、サンプリングの利点を持つ不変可観測性に基づく測定である。提案したワークフローは、変動回路を自明なアイデンティティ演算子として保ちながら、データ埋め込みと古典的な後処理の重要性を指摘する。次に,関数学習の直観を発展させ,向き付けられた計算ハイパーグラフとしてインスタンスを視覚化し,GQMLプロトコルがグローバルなトポロジ的特徴にアクセスして,単射関数と全射関数を区別する。最後に、他のbqp$^a$-タイプのプロトコルを学習する可能性について議論し、これはユニタリの線形結合として適用される埋め込みベースのoracles $a$を単純化する能力に依存すると推測する。

Geometric quantum machine learning (GQML) aims to embed problem symmetries for learning efficient solving protocols. However, the question remains if (G)QML can be routinely used for constructing protocols with an exponential separation from classical analogs. In this Letter we consider Simon's problem for learning properties of Boolean functions, and show that this can be related to an unsupervised circuit classification problem. Using the workflow of geometric QML, we learn from first principles Simon's algorithm, thus discovering an example of BQP$^A\neq$BPP protocol with respect to some dataset (oracle $A$). Our key findings include the development of an equivariant feature map for embedding Boolean functions, based on twirling with respect to identified bitflip and permutational symmetries, and measurement based on invariant observables with a sampling advantage. The proposed workflow points to the importance of data embeddings and classical post-processing, while keeping the variational circuit as a trivial identity operator. Next, developing the intuition for the function learning, we visualize instances as directed computational hypergraphs, and observe that the GQML protocol can access their global topological features for distinguishing bijective and surjective functions. Finally, we discuss the prospects for learning other BQP$^A$-type protocols, and conjecture that this depends on the ability of simplifying embeddings-based oracles $A$ applied as a linear combination of unitaries.
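
For readers unfamiliar with Simon's problem, the following sketch builds a toy oracle and recovers its hidden bit string by classical brute force. It only illustrates the problem statement (a function with f(x) = f(x XOR s)); it is not the GQML protocol of the paper, and all function names are hypothetical.

```python
def make_simon_function(n, s):
    """Return f on n-bit integers with f(x) == f(x ^ s).
    For s != 0 the function is two-to-one; for s == 0 it is a bijection."""
    table, next_label = {}, 0
    for x in range(2 ** n):
        if x not in table:
            table[x] = table[x ^ s] = next_label
            next_label += 1
    return lambda x: table[x]

def find_period_classically(f, n):
    """Brute-force collision search; worst-case cost grows exponentially in n.
    Returns the hidden string s, or 0 if f has no collision (bijective case)."""
    seen = {}
    for x in range(2 ** n):
        y = f(x)
        if y in seen:
            return seen[y] ^ x
        seen[y] = x
    return 0

f = make_simon_function(3, 0b101)
hidden = find_period_classically(f, 3)
```

The exponential cost of the classical search is what makes the quantum algorithm's separation relative to the oracle interesting.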

翻訳日:2024-02-07 15:34:17 公開日:2024-02-06

# ドイツ語の報道テキストでは、ジェンダー包括的言語の影響を受ける単語は1%未満

Less than one percent of words would be affected by gender-inclusive language in German press texts ( http://arxiv.org/abs/2402.03870v1 )

ライセンス: Link先を確認

Carolin Müller-Spitzer, Samira Ochs, Alexander Koplenig, Jan-Oliver Rüdiger, Sascha Wolfer

(参考訳) ジェンダーと言語に関する研究は、男女平等と非差別的な言語使用に関する社会的議論と密接に結びついている。心理言語学者はこの分野で大きな貢献をしてきた。しかし、これらの事項を言語使用の文脈で調査するコーパスベースの研究はいまだに稀である。本研究は、ジェンダー包括的でないテキストをジェンダー包括的なテキストに書き換える場合、実際にどの程度のテキスト素材を変更する必要があるのかという問題に取り組む。この量的尺度は重要な経験的洞察である。というのも、ジェンダー包括的なドイツ語の使用に対して繰り返し持ち出される反論は、文章が長く複雑になりすぎるというものだからである。また、ジェンダー包括的言語は言語学習者に悪影響を及ぼすとも主張されている。しかし、そのような影響が生じうるのは、ジェンダー包括的なテキストがそうでないテキストと大きく異なる場合に限られる。本コーパス言語学的研究では、ドイツ語の報道テキストに手作業で注釈を付け、変更が必要な部分を特定した。その結果、ジェンダー包括的言語の影響を受けるのは、平均して全トークンの1%未満であることがわかった。この小さな割合は、特に男性総称形の解釈に伴う潜在的な複雑さを考慮すると、ジェンダー包括的なドイツ語が言語の理解と学習にとって大きな障壁となるのかどうかに疑問を投げかけるものである。

Research on gender and language is tightly knitted to social debates on gender equality and non-discriminatory language use. Psycholinguistic scholars have made significant contributions in this field. However, corpus-based studies that investigate these matters within the context of language use are still rare. In our study, we address the question of how much textual material would actually have to be changed if non-gender-inclusive texts were rewritten to be gender-inclusive. This quantitative measure is an important empirical insight, as a recurring argument against the use of gender-inclusive German is that it supposedly makes written texts too long and complicated. It is also argued that gender-inclusive language has negative effects on language learners. However, such effects are only likely if gender-inclusive texts are very different from those that are not gender-inclusive. In our corpus-linguistic study, we manually annotated German press texts to identify the parts that would have to be changed. Our results show that, on average, less than 1% of all tokens would be affected by gender-inclusive language. This small proportion calls into question whether gender-inclusive German presents a substantial barrier to understanding and learning the language, particularly when we take into account the potential complexities of interpreting masculine generics.

翻訳日:2024-02-07 15:33:48 公開日:2024-02-06

# 物理インフォームドニューラルネットワークにおける非線形レジームの課題

The Challenges of the Nonlinear Regime for Physics-Informed Neural Networks ( http://arxiv.org/abs/2402.03864v1 )

ライセンス: Link先を確認

Andrea Bonfanti, Giuseppe Bruno, Cristina Cipriani

(参考訳) ニューラル・タンジェント・カーネル(NTK)の視点は、無限幅限界における物理情報ニューラルネットワーク(PINN)のトレーニング力学を調べるための貴重なアプローチである。我々はこの観点を活用し、PINNによって解決された非線形偏微分方程式(PDE)の事例に焦点を当てる。微分作用素の線型性に依存するNTKの異なる挙動に関する理論的結果を提供する。さらに,理論的な結果に触発されて,PINNの訓練に二階法を用いるという利点を強調した。さらに, 2次法の収束能力を考察し, スペクトルバイアスと緩やかな収束の課題に対処する。各理論結果は線形PDEと非線形PDEの数値例によって支持され、ベンチマークテストケースでのトレーニング方法を検証する。

The Neural Tangent Kernel (NTK) viewpoint represents a valuable approach to examine the training dynamics of Physics-Informed Neural Networks (PINNs) in the infinite width limit. We leverage this perspective and focus on the case of nonlinear Partial Differential Equations (PDEs) solved by PINNs. We provide theoretical results on the different behaviors of the NTK depending on the linearity of the differential operator. Moreover, inspired by our theoretical results, we emphasize the advantage of employing second-order methods for training PINNs. Additionally, we explore the convergence capabilities of second-order methods and address the challenges of spectral bias and slow convergence. Every theoretical result is supported by numerical examples with both linear and nonlinear PDEs, and we validate our training method on benchmark test cases.

翻訳日:2024-02-07 15:33:25 公開日:2024-02-06

# ハイブリッドコヒーレント状態における高次非古典性

Higher-order nonclassicalities in hybrid coherent states ( http://arxiv.org/abs/2402.03858v1 )

ライセンス: Link先を確認

Sandip Kumar Giri and Biswajit Sen

(参考訳) 量子技術の発展において、非古典的状態は、非古典性の適切な利用なしには量子的優位性を得ることができないため、重要な役割を担っている。本研究では,単光子付加コヒーレント状態 (SPAC) とコヒーレント状態 (CS) のコヒーレント重ね合わせであるハイブリッドコヒーレント状態 (HCS) を考える。本稿では,高次スクイージングと高次アンチバンチングに着目したHCSの高次非古典的特性について報告する。 hcsは実験的に実現可能であり、この工学化された量子状態は、所望の高次非古典的性質を持つ量子状態を生成するのに使うことができる。

In the development of quantum technologies, nonclassical states have been playing a pivotal role, as quantum advantage cannot be obtained without appropriate utilization of nonclassicality. In the present work, we consider a hybrid coherent state (HCS), which is a coherent superposition of the single-photon-added coherent (SPAC) state and a coherent state (CS). Here, we report higher-order nonclassical properties of HCS with a specific focus on higher-order squeezing and higher-order antibunching. It is shown that HCS is experimentally realizable, and this engineered quantum state can be used to produce quantum states with desired higher-order nonclassical properties.

翻訳日:2024-02-07 15:33:11 公開日:2024-02-06

# ポジションペーパー:モデル表現研究の新しい枠組みに向けて

Position Paper: Toward New Frameworks for Studying Model Representations ( http://arxiv.org/abs/2402.03855v1 )

ライセンス: Link先を確認

Satvik Golechha, James Dao

(参考訳) mechanistic interpretability (mi)は、ニューラルネットワークが学習する正確なアルゴリズムをリバースエンジニアリングすることで、aiモデルを理解することを目的としている。 MIにおけるほとんどの研究は、自明でトークンに整合した振る舞いと能力を研究しています。しかし、ほとんどの能力はそれほど自明ではなく、分析の単位としてこれらのネットワーク内の隠れた表現の研究を提唱している。文献レビューを行い、特徴と行動の表現を形式化し、その重要性と評価を強調し、表現の機械的解釈可能性に関する基礎的な調査を行う。議論と探索の結果から,表現研究は重要かつ未研究の分野であり,現在MIで確立されている手法では表現の理解が不十分である,という立場を正当化し,表現研究の新たな枠組みに向けて研究コミュニティを推し進める。

Mechanistic interpretability (MI) aims to understand AI models by reverse-engineering the exact algorithms neural networks learn. Most works in MI so far have studied behaviors and capabilities that are trivial and token-aligned. However, most capabilities are not that trivial, which advocates for the study of hidden representations inside these networks as the unit of analysis. We do a literature review, formalize representations for features and behaviors, highlight their importance and evaluation, and perform some basic exploration in the mechanistic interpretability of representations. With discussion and exploratory results, we justify our position that studying representations is an important and under-studied field, and that currently established methods in MI are not sufficient to understand representations, thus pushing for the research community to work toward new frameworks for studying representations.

翻訳日:2024-02-07 15:32:56 公開日:2024-02-06

# anls* -- 生成型大規模言語モデルのためのユニバーサルドキュメント処理メトリック

ANLS* -- A Universal Document Processing Metric for Generative Large Language Models ( http://arxiv.org/abs/2402.03848v1 )

ライセンス: Link先を確認

David Peer, Philemon Schöpf, Volckmar Nebendahl, Alexander Rietzler, Sebastian Stabinger

(参考訳) 伝統的に、差別モデルが文書分類や情報抽出といったタスクの主要な選択肢となっている。これらのモデルは、限定された定義済みのクラスに該当する予測を行い、バイナリ真または偽の評価を容易にし、F1スコアのようなメトリクスの直接計算を可能にする。しかし、ジェネレーティブ大言語モデル(gllm)の最近の進歩により、ゼロショット能力が強化され、ダウンストリームデータセットと計算コストの高い微調整の必要性がなくなるため、この分野はシフトした。しかし、GLLM の評価は、識別モデルに使用される二項真偽の評価が、GLLM の予測には適用できないため、課題となる。本稿では,情報抽出や分類タスクを含む幅広いタスクを評価するために,anls*と呼ばれる生成モデルのための新しいメトリクスを提案する。 ANLS*メトリックは、既存のANLSメトリクスをドロップイン置換として拡張し、以前報告されたANLSスコアと互換性がある。また、ANLS*測定値を用いて、7つの異なるデータセットと3つの異なるGLLMの評価を行い、提案手法の重要性を示した。また、SFTと呼ばれる文書のプロンプトを生成する新しい手法を、LATINなどの他のプロンプト技術に対してベンチマークする。 21件中15件で、SFTは他のテクニックより優れており、最先端の技術を改善している。ソースはhttps://github.com/deepopinion/anls_star_metricにある。

Traditionally, discriminative models have been the predominant choice for tasks like document classification and information extraction. These models make predictions that fall into a limited number of predefined classes, facilitating a binary true or false evaluation and enabling the direct calculation of metrics such as the F1 score. However, recent advancements in generative large language models (GLLMs) have prompted a shift in the field due to their enhanced zero-shot capabilities, which eliminate the need for a downstream dataset and computationally expensive fine-tuning. Evaluating GLLMs nevertheless presents a challenge, as the binary true or false evaluation used for discriminative models is not applicable to the predictions made by GLLMs. This paper introduces a new metric for generative models called ANLS* for evaluating a wide variety of tasks, including information extraction and classification tasks. The ANLS* metric extends existing ANLS metrics as a drop-in replacement and is still compatible with previously reported ANLS scores. An evaluation of 7 different datasets and 3 different GLLMs using the ANLS* metric is also provided, demonstrating the importance of the proposed metric. We also benchmark a novel approach to generate prompts for documents, called SFT, against other prompting techniques such as LATIN. In 15 out of 21 cases, SFT outperforms other techniques and improves the state-of-the-art, sometimes by as much as $15$ percentage points. Sources are available at https://github.com/deepopinion/anls_star_metric
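
To make the starting point concrete, here is a minimal sketch of the classic ANLS score (average normalized Levenshtein similarity) for plain string answers. It is not the ANLS* extension proposed in the paper; the 0.5 threshold and the lowercasing step follow the conventional ANLS setup as understood here and should be treated as assumptions.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def nls(pred, truth, threshold=0.5):
    """Normalized similarity in [0, 1]; answers that are too far off score 0."""
    pred, truth = pred.strip().lower(), truth.strip().lower()
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 1.0
    dist = levenshtein(pred, truth) / longest
    return 1.0 - dist if dist < threshold else 0.0

def anls(predictions, ground_truths):
    """Average over questions of the best similarity to any accepted answer."""
    scores = [max(nls(p, t) for t in truths)
              for p, truths in zip(predictions, ground_truths)]
    return sum(scores) / len(scores)

score = anls(["invoice 42", "foo"], [["Invoice 42"], ["bar", "baz"]])
```

Unlike an exact-match check, this gives partial credit for near misses such as OCR-style typos, which is why it suits free-form generated answers.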

翻訳日:2024-02-07 15:32:41 公開日:2024-02-06

# 量子サポートベクトルマシンを用いた非溶血性ペプチドの分類

Non-Hemolytic Peptide Classification Using A Quantum Support Vector Machine ( http://arxiv.org/abs/2402.03847v1 )

ライセンス: Link先を確認

Shengxin Zhuang, John Tanner, Yusen Wu, Du Q. Huynh, Wei Liu, Xavier F. Cadet, Nicolas Fontaine, Philippe Charton, Cedric Damour, Frederic Cadet, Jingbo Wang

(参考訳) 量子機械学習(QML)は、量子計算の最も有望な応用の1つである。しかし、データが古典的な性質を持つ場合に量子優位性が存在するかどうかは依然として不明であり、QMLの実用的な実世界応用の探索は現在も続いている。本研究では、十分に研究された強力なQMLモデルである量子サポートベクトルマシン(QSVM)を、ペプチドを溶血性または非溶血性のいずれかに分類する二値分類タスクに適用する。 3つのペプチドデータセットを用いて、QSVM、多数の古典的SVM、および同一のペプチド分類タスクに関する既発表の最良の結果の性能を比較し、その中でQSVMが最も優れた性能を示した。本研究の貢献は、(i)この特定のペプチド分類タスクへのQSVMの最初の適用、(ii)この分類タスクにおいて古典的機械学習モデルで得られた既発表の最良の結果をQSVMが上回ることの明示的な実証、(iii)この分類タスクにおいてQSVMが多くの(おそらくすべての)古典的SVMを上回りうることを示す実証的結果、である。この基礎研究は、計算生物学の分野における検証可能な量子優位性への道を開き、より安全な治療法の開発を促進する。

Quantum machine learning (QML) is one of the most promising applications of quantum computation. However, it is still unclear whether quantum advantages exist when the data is of a classical nature and the search for practical, real-world applications of QML remains active. In this work, we apply the well-studied quantum support vector machine (QSVM), a powerful QML model, to a binary classification task which classifies peptides as either hemolytic or non-hemolytic. Using three peptide datasets, we apply and contrast the performance of the QSVM, numerous classical SVMs, and the best published results on the same peptide classification task, out of which the QSVM performs best. The contributions of this work include (i) the first application of the QSVM to this specific peptide classification task, (ii) an explicit demonstration of QSVMs outperforming the best published results attained with classical machine learning models on this classification task and (iii) empirical results showing that the QSVM is capable of outperforming many (and possibly all) classical SVMs on this classification task. This foundational work paves the way to verifiable quantum advantages in the field of computational biology and facilitates safer therapeutic development.

翻訳日:2024-02-07 15:32:15 公開日:2024-02-06

# 外れ値検出の改善のための隠れ外れ値の効率的な生成

Efficient Generation of Hidden Outliers for Improved Outlier Detection ( http://arxiv.org/abs/2402.03846v1 )

ライセンス: Link先を確認

Jose Cribeiro-Ramallo, Vadim Arzamasov, Klemens Böhm

(参考訳) 外れ値生成は、重要な外れ値検出タスクを解くためによく使われる手法である。現実的な振る舞いをする外れ値を生成することは困難である。一般的な既存手法は、高次元空間における外れ値の「複数ビュー」特性を無視しがちである。この特性を考慮している唯一の既存手法は、効率性と有効性に欠ける。本稿では、この特性を模倣した現実的な外れ値を生成する新しい外れ値生成手法であるBISECTを提案する。そのために、BISECTは、こうした現実的な外れ値を効率的に生成する方法を述べた、本稿で導入する新しい命題を用いる。我々の手法は、「複数ビュー」を再現する現行の手法よりも保証と計算量の面で優れている。BISECTが生成する合成外れ値を用いて、複数のユースケースについて、多様なデータセットにおける外れ値検出を効果的に強化する。例えば、BISECTによるオーバーサンプリングでは、ベースラインと比較して誤差を最大3倍削減した。

Outlier generation is a popular technique used for solving important outlier detection tasks. Generating outliers with realistic behavior is challenging. Popular existing methods tend to disregard the 'multiple views' property of outliers in high-dimensional spaces. The only existing method accounting for this property falls short in efficiency and effectiveness. We propose BISECT, a new outlier generation method that creates realistic outliers mimicking said property. To do so, BISECT employs a novel proposition introduced in this article stating how to efficiently generate said realistic outliers. Our method has better guarantees and complexity than the current methodology for recreating 'multiple views'. We use the synthetic outliers generated by BISECT to effectively enhance outlier detection in diverse datasets, for multiple use cases. For instance, oversampling with BISECT reduced the error by up to 3 times when compared with the baselines.

翻訳日:2024-02-07 15:31:42 公開日:2024-02-06

# 拡散モデルにおけるゲージ自由度、保守性および固有次元推定について

On gauge freedom, conservativity and intrinsic dimensionality estimation in diffusion models ( http://arxiv.org/abs/2402.03845v1 )

ライセンス: Link先を確認

Christian Horvat and Jean-Pascal Pfister

(参考訳) 拡散モデルは、最近、高次元でのサンプリング品質と密度推定の観点から印象的な性能を示す生成モデルである。それらは、時間依存ベクトル場によって記述され、生成モデルとして使用される前方連続拡散過程と後方連続分解過程に依存している。拡散モデルのオリジナルの定式化において、このベクトル場はスコア関数(つまり、拡散過程における所定の時間における対数確率の勾配)であると仮定される。興味深いことに、現実的には、拡散モデルに関するほとんどの研究は、このベクトル場をニューラルネットワーク関数として実装し、あるエネルギー関数の勾配として制約しない(つまり、ほとんどの研究はベクトル場を保守的であるように制限しない)。このような制約がパフォーマンス向上につながるかどうかを実証的に調査する研究もあるが、矛盾する結果につながり、分析結果の提供に失敗している。本稿では,ベクトル場のモデリング自由度に関する3つの解析結果を示す。まず、与えられた(ゲージ)自由を満たす保守的成分と直交成分にベクトル場の新たな分解を提案する。第二に, この直交分解により, 保存成分が真のスコアと正確に等しい場合, 正確な密度推定と精密サンプリングが可能であり, 保存性は必要でも十分でもないことを示した。最後に、データ多様体の局所的な情報を推測する際、ベクトル場が保守的であることを制約することが望ましいことを示す。

Diffusion models are generative models that have recently demonstrated impressive performances in terms of sampling quality and density estimation in high dimensions. They rely on a forward continuous diffusion process and a backward continuous denoising process, which can be described by a time-dependent vector field and is used as a generative model. In the original formulation of the diffusion model, this vector field is assumed to be the score function (i.e. it is the gradient of the log-probability at a given time in the diffusion process). Curiously, on the practical side, most studies on diffusion models implement this vector field as a neural network function and do not constrain it to be the gradient of some energy function (that is, most studies do not constrain the vector field to be conservative). Even though some studies investigated empirically whether such a constraint will lead to a performance gain, they lead to contradicting results and failed to provide analytical results. Here, we provide three analytical results regarding the extent of the modeling freedom of this vector field. Firstly, we propose a novel decomposition of vector fields into a conservative component and an orthogonal component which satisfies a given (gauge) freedom. Secondly, from this orthogonal decomposition, we show that exact density estimation and exact sampling are achieved when the conservative component is exactly equal to the true score and therefore conservativity is neither necessary nor sufficient to obtain exact density estimation and exact sampling. Finally, we show that when it comes to inferring local information of the data manifold, constraining the vector field to be conservative is desirable.
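
The notion of a conservative vector field can be checked numerically: a smooth field on all of R^n is the gradient of a scalar function exactly when its Jacobian is symmetric everywhere. The sketch below tests this with central finite differences on two toy fields. The fields and function names are illustrative assumptions, not objects taken from the paper.

```python
def jacobian(field, x, eps=1e-5):
    """Central-difference Jacobian J[i][j] = d field_i / d x_j at point x."""
    n = len(x)
    cols = []
    for j in range(n):
        hi, lo = list(x), list(x)
        hi[j] += eps
        lo[j] -= eps
        f_hi, f_lo = field(hi), field(lo)
        cols.append([(a - b) / (2 * eps) for a, b in zip(f_hi, f_lo)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def asymmetry(field, x):
    """Largest |J_ij - J_ji|; zero (up to rounding) for a conservative field."""
    J = jacobian(field, x)
    n = len(x)
    return max(abs(J[i][j] - J[j][i]) for i in range(n) for j in range(n))

# Score of a standard 2-D Gaussian: the gradient of -|x|^2 / 2.
score = lambda x: [-x[0], -x[1]]
# Same field plus a rotational term, which no scalar potential can produce.
rotated = lambda x: [-x[0] + x[1], -x[1] - x[0]]

point = [0.3, -1.2]
gap_score = asymmetry(score, point)
gap_rotated = asymmetry(rotated, point)
```

A neural network trained to approximate the score is generally not forced to pass this symmetry test, which is the modeling freedom the abstract refers to.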

翻訳日:2024-02-07 15:31:20 公開日:2024-02-06

# 光学鋼ロープの非破壊損傷検出法

A new method for optical steel rope non-destructive damage detection ( http://arxiv.org/abs/2402.03843v1 )

ライセンス: Link先を確認

Yunqing Bao, Bin Hu

(参考訳) 本稿では,高高度環境(エアラルロープウェイ)における鋼ロープの非破壊損傷検出アルゴリズムを提案する。まず、rgbd-unetと呼ばれるセグメンテーションモデルは、複雑な背景から正確に鋼ロープを抽出するように設計されている。このモデルは、提案したCMAモジュールを通して色と深度情報を処理・結合する機能を備えている。第2に、VovNetV3.5と呼ばれる検出モデルは、通常の鋼ロープと異常鋼ロープを区別するために開発された。 VovNetアーキテクチャとDBBモジュールを統合してパフォーマンスを向上させる。また,セグメンテーションモデルの一般化能力を高めるために,新たなバックグラウンド拡張手法を提案する。セグメンテーションと検出モデルのトレーニングとテストのために、異なるシナリオで鋼ロープの画像を含むデータセットが作成されます。実験はベースラインモデルよりも大幅に改善された。提案するデータセットでは,検出モデルによる最大精度は0.975に達し,セグメンテーションモデルによる最大f測定値は0.948に達した。

This paper presents a novel algorithm for non-destructive damage detection for steel ropes in high-altitude environments (aerial ropeway). The algorithm comprises two key components: First, a segmentation model named RGBD-UNet is designed to accurately extract steel ropes from complex backgrounds. This model is equipped with the capability to process and combine color and depth information through the proposed CMA module. Second, a detection model named VovNetV3.5 is developed to differentiate between normal and abnormal steel ropes. It integrates the VovNet architecture with a DBB module to enhance performance. Besides, a novel background augmentation method is proposed to enhance the generalization ability of the segmentation model. Datasets containing images of steel ropes in different scenarios are created for the training and testing of both the segmentation and detection models. Experiments demonstrate a significant improvement over baseline models. On the proposed dataset, the highest accuracy achieved by the detection model reached 0.975, and the maximum F-measure achieved by the segmentation model reached 0.948.

翻訳日:2024-02-07 15:30:34 公開日:2024-02-06

# 信念のシーングラフ:期待の計算による部分的なシーンの拡張

Belief Scene Graphs: Expanding Partial Scenes with Objects through Computation of Expectation ( http://arxiv.org/abs/2402.03840v1 )

ライセンス: Link先を確認

Mario A.V. Saucedo, Akash Patel, Akshit Saradagi, Christoforos Kanellakis and George Nikolakopoulos

(参考訳) 本稿では,部分的な情報を用いた効率的な高レベルタスク計画を可能にする,部分的な3次元シーングラフのユーティリティ駆動拡張であるBelief Scene Graphsの概念を提案する。ロボットのミッションに適した新しいノード(盲目ノードと呼ばれる)を戦略的に追加するために使用される、任意の3dシーングラフ上の信念(期待)の計算のためのグラフベースの学習手法を提案する。本研究では,利用可能なトレーニングデータからヒストグラムを学習することにより,現実の信念/期待を合理的に近似する相関情報(ceci)に基づく期待値の計算法を提案する。 3次元シーングラフのレポジトリからCECIを学ぶために,新しいグラフ畳み込みニューラルネットワーク(GCN)モデルを開発した。新たなCECIモデルのトレーニングには3Dシーングラフのデータベースが存在しないため,意味的注釈付き実生活3D空間に基づく3Dシーングラフデータセットを生成するための新しい手法を提案する。生成されたデータセットを用いて提案したCECIモデルをトレーニングし,提案手法の広範な検証を行う。我々は、期待を抽象表現に統合するためのコアコンポーネントとして、新しい概念である \textit{belief scene graphs} (bsg) を確立した。この新しいコンセプトは、古典的な3Dシーングラフの概念の進化であり、さまざまなロボティクスミッションのタスク計画と最適化のための高度な推論を可能にすることを目的としている。全体のフレームワークの有効性は、オブジェクト検索のシナリオで評価され、人間の目に見えないオブジェクトの常識をエミュレートする実生活実験でもテストされている。

In this article, we propose the novel concept of Belief Scene Graphs, which are utility-driven extensions of partial 3D scene graphs, that enable efficient high-level task planning with partial information. We propose a graph-based learning methodology for the computation of belief (also referred to as expectation) on any given 3D scene graph, which is then used to strategically add new nodes (referred to as blind nodes) that are relevant for a robotic mission. We propose the method of Computation of Expectation based on Correlation Information (CECI), to reasonably approximate real Belief/Expectation, by learning histograms from available training data. A novel Graph Convolutional Neural Network (GCN) model is developed, to learn CECI from a repository of 3D scene graphs. As no database of 3D scene graphs exists for the training of the novel CECI model, we present a novel methodology for generating a 3D scene graph dataset based on semantically annotated real-life 3D spaces. The generated dataset is then utilized to train the proposed CECI model and for extensive validation of the proposed method. We establish the novel concept of \textit{Belief Scene Graphs} (BSG), as a core component to integrate expectations into abstract representations. This new concept is an evolution of the classical 3D scene graph concept and aims to enable high-level reasoning for the task planning and optimization of a variety of robotics missions. The efficacy of the overall framework has been evaluated in an object search scenario, and has also been tested on a real-life experiment to emulate human common sense of unseen-objects.

翻訳日:2024-02-07 15:29:44 公開日:2024-02-06

# ランダムの特徴モデル--ナイーブ・インパテーションの成功を研究する方法

Random features models: a way to study the success of naive imputation ( http://arxiv.org/abs/2402.03839v1 )

ライセンス: Link先を確認

Alexis Ayme (LPSM (UMR_8001)), Claire Boyer (LPSM (UMR_8001), IUF), Aymeric Dieuleveut (CMAP), Erwan Scornet (LPSM (UMR_8001))

(参考訳) コンスタントな(ナイーブな)インプテーションは、データ欠落に対処するのに初めて簡単に使えるテクニックであるため、まだ広く使われている。しかし、この単純な手法は、インプット入力が真の基礎データと強く異なる可能性があるため、予測目的に対して大きなバイアスを引き起こすことが期待できる。しかし、最近の研究では、データが完全にランダム(MCAR)で欠落していると思われる場合、このバイアスは高次元線形予測器の文脈では低いことが示唆されている。 This paper completes the picture for linear predictors by confirming the intuition that the bias is negligible and that surprisingly naive imputation also remains relevant in very low dimension. To this aim, we consider a unique underlying random features model, which offers a rigorous framework for studying predictive performances, whilst the dimension of the observed features varies. Building on these theoretical results, we establish finite-sample bounds on stochastic gradient (SGD) predictors applied to zero-imputed data, a strategy particularly well suited for large-scale learning. If the MCAR assumption appears to be strong, we show that similar favorable behaviors occur for more complex missing data scenarios.

Constant (naive) imputation is still widely used in practice as this is a first easy-to-use technique to deal with missing data. Yet, this simple method could be expected to induce a large bias for prediction purposes, as the imputed input may strongly differ from the true underlying data. However, recent works suggest that this bias is low in the context of high-dimensional linear predictors when data is supposed to be missing completely at random (MCAR). This paper completes the picture for linear predictors by confirming the intuition that the bias is negligible and that surprisingly naive imputation also remains relevant in very low dimension. To this aim, we consider a unique underlying random features model, which offers a rigorous framework for studying predictive performances, whilst the dimension of the observed features varies. Building on these theoretical results, we establish finite-sample bounds on stochastic gradient (SGD) predictors applied to zero-imputed data, a strategy particularly well suited for large-scale learning. If the MCAR assumption appears to be strong, we show that similar favorable behaviors occur for more complex missing data scenarios.
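
The pipeline the abstract analyses, zero imputation followed by an SGD-trained linear predictor, can be sketched in a few lines. The toy data, learning rate, and epoch count below are assumptions chosen for illustration; missing entries are represented by `None`.

```python
import random

def zero_impute(row):
    """Naive (constant) imputation: replace each missing entry with 0."""
    return [0.0 if v is None else v for v in row]

def sgd_linear(X, y, lr=0.1, epochs=200, seed=0):
    """Plain SGD on the squared loss of a linear predictor without intercept."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            err = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
    return w

# Toy data with missing values; the targets are chosen so that a linear
# predictor fits the zero-imputed rows exactly (illustrative assumption).
rows = [[1.0, None], [None, 1.0], [1.0, 1.0], [2.0, None]]
targets = [2.0, -1.0, 1.0, 4.0]

X = [zero_impute(r) for r in rows]
weights = sgd_linear(X, targets)
```

Because each update touches a single row, the same loop extends to streaming or large-scale data, which is the setting the abstract highlights for zero-imputed SGD.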

翻訳日:2024-02-07 15:28:51 公開日:2024-02-06

# Sliced Wasserstein Weisfeiler-Lehmanグラフカーネルによるガウス過程の回帰

Gaussian process regression with Sliced Wasserstein Weisfeiler-Lehman graph kernels ( http://arxiv.org/abs/2402.03838v1 )

ライセンス: Link先を確認

Raphaël Carpintero Perez (CMAP), Sébastien da Veiga (ENSAI, CREST), Josselin Garnier (CMAP), Brian Staber

(参考訳) 教師付き学習は、偏微分方程式の解法や材料特性の予測といったタスクの複雑なパターンを効果的に抽出する能力によって、計算物理学の分野で大きな注目を集めている。伝統的に、このようなデータセットは、問題幾何を表す多数のノードが(グラフとして)メッシュとして与えられる入力と、数値解法で得られる対応する出力からなる。つまり、教師付き学習モデルは、ノード属性の連続した大きなスパースグラフを処理できなければならない。本研究ではガウス過程の回帰に着目し,スライスしたwasserstein weisfeiler-lehman(swwl)グラフカーネルを紹介する。既存のグラフカーネルとは対照的に、提案されているswlカーネルはポジティブな定性と劇的な複雑さの低減を享受しており、これまで処理できなかったデータセットを処理できる。新しいカーネルは、入力グラフが数十のノードを持つ分子データセットのグラフ分類で最初に検証される。 SWWLカーネルの効率は、数万のノードからなる入力グラフを構成する計算流体力学や固体力学におけるグラフ回帰に基づいて説明される。

Supervised learning has recently garnered significant attention in the field of computational physics due to its ability to effectively extract complex patterns for tasks like solving partial differential equations, or predicting material properties. Traditionally, such datasets consist of inputs given as meshes with a large number of nodes representing the problem geometry (seen as graphs), and corresponding outputs obtained with a numerical solver. This means the supervised learning model must be able to handle large and sparse graphs with continuous node attributes. In this work, we focus on Gaussian process regression, for which we introduce the Sliced Wasserstein Weisfeiler-Lehman (SWWL) graph kernel. In contrast to existing graph kernels, the proposed SWWL kernel enjoys positive definiteness and a drastic complexity reduction, which makes it possible to process datasets that were previously impossible to handle. The new kernel is first validated on graph classification for molecular datasets, where the input graphs have a few tens of nodes. The efficiency of the SWWL kernel is then illustrated on graph regression in computational fluid dynamics and solid mechanics, where the input graphs are made up of tens of thousands of nodes.

翻訳日:2024-02-07 15:28:34 公開日:2024-02-06
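
The sliced Wasserstein idea behind the SWWL kernel in the entry above can be illustrated with a small sketch. This is not the authors' implementation: the Weisfeiler-Lehman embedding step is skipped, graphs are stood in for by equal-size 2-D point clouds, and the names `wasserstein_1d`, `sliced_wasserstein`, `sw_kernel` and the parameters `n_proj`, `lam`, `seed` are assumptions made for illustration only.

```python
import math
import random

def wasserstein_1d(u, v, p=2):
    """p-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal coupling matches sorted values,
    so the distance reduces to comparing order statistics.
    """
    assert len(u) == len(v)
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(u), sorted(v)))
    return (cost / len(u)) ** (1.0 / p)

def sliced_wasserstein(xs, ys, n_proj=64, p=2, seed=0):
    """Monte Carlo sliced p-Wasserstein distance between 2-D point clouds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.uniform(0.0, math.pi)  # random projection direction
        d = (math.cos(theta), math.sin(theta))
        proj_x = [x[0] * d[0] + x[1] * d[1] for x in xs]
        proj_y = [y[0] * d[0] + y[1] * d[1] for y in ys]
        total += wasserstein_1d(proj_x, proj_y, p) ** p
    return (total / n_proj) ** (1.0 / p)

def sw_kernel(xs, ys, lam=1.0, **kw):
    """Gaussian-type kernel exp(-lam * SW_2^2); the exact form is an assumption."""
    return math.exp(-lam * sliced_wasserstein(xs, ys, p=2, **kw) ** 2)
```

Each projection costs only a sort, which is where the complexity reduction of sliced distances comes from. Whether the exponential form above matches the paper's exact kernel definition is an assumption.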

# 強いレーザー照射下での単一捕捉イオンを介した熱輸送

Thermal transport through a single trapped ion under strong laser illumination ( http://arxiv.org/abs/2402.03937v1 )

ライセンス: Link先を確認

T. Tassis, F. Brito, F. L. Semião

(参考訳) 本研究では、レーザー励起によって駆動され、異なる温度の熱浴に結合した単一捕捉イオンにおける量子熱輸送を調べる。焦点は、異なるレーザー結合のシナリオが系のダイナミクスにどのように影響するかを理解することにある。レーザー強度がイオンの電子自由度と運動自由度が強く結合する領域に達すると、熱浴に対して現象論的モデルを用いる従来のアプローチは不十分になる。そのため、ドレスト・マスター方程式(DME)の定式化を採用することが重要となり、レーザー強度の違いが熱輸送にどのように影響するかをより深く理解できるようになる。離調と結合強度で定義されるパラメータ空間内で熱流を解析すると、イオンの振動周波数とレーザーパラメータに影響される興味深い円形のパターンが観測され、熱輸送、残留コヒーレンス、系の特性の間の微妙な関係が明らかになる。また、負の微分熱伝導や熱流の非対称性といった現象も明らかにし、量子技術に不可欠なこの系の熱的性質についての知見を与える。

In this work, we study quantum heat transport in a single trapped ion, driven by laser excitation and coupled to thermal reservoirs operating at different temperatures. Our focus lies in understanding how different laser coupling scenarios impact the system dynamics. As the laser intensity reaches a regime where the ion's electronic and motional degrees of freedom strongly couple, traditional approaches using phenomenological models for thermal reservoirs become inadequate. Therefore, the adoption of the dressed master equation (DME) formalism becomes crucial, enabling a deeper understanding of how distinct laser intensities influence heat transport. Analyzing the heat current within the parameter space defined by detuning and coupling strength, we observe intriguing circular patterns which are influenced by the ion's vibrational frequency and laser parameters, and reveal nuanced relationships between heat transport, residual coherence, and system characteristics. Our study also reveals phenomena such as negative differential heat conductivity and asymmetry in heat current flow, offering insights into the thermal properties of this essential quantum technology setup.

翻訳日:2024-02-07 15:20:58 公開日:2024-02-06

# 大規模言語モデルを拡張現実に組み込む - 包括性、エンゲージメント、プライバシの機会と課題

Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy ( http://arxiv.org/abs/2402.03907v1 )

ライセンス: Link先を確認

Efe Bozkir and Süleyman Özdel and Ka Hei Carrie Lau and Mengdi Wang and Hong Gao and Enkelejda Kasneci

(参考訳) コンピュータグラフィックス、ハードウェア、人工知能(AI)、ヒューマンコンピュータインタラクションの近年の進展により、拡張現実(XR)のデバイスやセットアップはより広く普及すると考えられる。これらのデバイスやセットアップは、アイトラッカーやハンドトラッカーなどのさまざまなセンシングモダリティを用いて、インタラクティブで魅力的な没入型の体験をユーザに提供する一方で、多くのノンプレイヤーキャラクターは、あらかじめ用意されたスクリプトや従来のAI技術によって動作している。本稿では、大規模言語モデル(LLM)を仮想アバターに組み込む、あるいはナラティブとして用いる形でXRに導入し、ユーザプロファイルに応じたプロンプトエンジニアリングと特定目的向けのLLMの微調整を通じて、より包摂的な体験を促進することを提案する。我々は、このような包摂がXR利用の多様性を促進すると主張する。さらに、LLMの多彩な会話能力によって、ユーザはXR環境により深く関与するようになり、XRが日常生活でより使われるようになる可能性があると考える。最後に、ユーザがLLMを用いた環境に提供する情報と、センサを通じて得られる生体データとを組み合わせることで、新たなプライバシ侵害が生じる可能性があると推測する。このようなプライバシ侵害の可能性を研究する際には、プライバシに関するユーザの懸念や選好も調査すべきである。要約すると、いくつかの課題はあるものの、LLMをXRに組み込むことは、多くの機会を持つ有望で新しい研究領域である。

Recent developments in computer graphics, hardware, artificial intelligence (AI), and human-computer interaction likely lead to extended reality (XR) devices and setups being more pervasive. While these devices and setups provide users with interactive, engaging, and immersive experiences with different sensing modalities, such as eye and hand trackers, many non-player characters are utilized in a pre-scripted way or by conventional AI techniques. In this paper, we argue for using large language models (LLMs) in XR by embedding them in virtual avatars or as narratives to facilitate more inclusive experiences through prompt engineering according to user profiles and fine-tuning the LLMs for particular purposes. We argue that such inclusion will facilitate diversity for XR use. In addition, we believe that with the versatile conversational capabilities of LLMs, users will engage more with XR environments, which might help XR be more used in everyday life. Lastly, we speculate that combining the information provided to LLM-powered environments by the users and the biometric data obtained through the sensors might lead to novel privacy invasions. While studying such possible privacy invasions, user privacy concerns and preferences should also be investigated. In summary, despite some challenges, embedding LLMs into XR is a promising and novel research area with several opportunities.

翻訳日:2024-02-07 15:20:41 公開日:2024-02-06

# 機械学習アルゴリズムを用いた従業員ターンオーバー分析

Employee Turnover Analysis Using Machine Learning Algorithms ( http://arxiv.org/abs/2402.03905v1 )

ライセンス: Link先を確認

Mahyar Karimi, Kamyar Seyedkazem Viliyani

(参考訳) 従業員の知識は組織の資産である。離職は、目に見えるコストと隠れたコスト、そして回復不能な損害をもたらしうる。このリスクを克服し軽減するには、従業員の状態を監視する必要がある。ウェルビーイングに関する特徴量の分析は非常に複雑であるため、従業員の離職予測は機械学習技術に委ねることができる。本稿では、従業員の離職率について論じる。 AdaBoost、SVM、RandomForestの3つの教師付き学習アルゴリズムを用いて、離職予測の精度をベンチマークする。得られたモデルは、予測分析の確立に役立つ。

Employees' knowledge is an organizational asset. Turnover may impose apparent and hidden costs and irreparable damage. To overcome and mitigate this risk, employees' condition should be monitored. Because analyzing well-being features is highly complex, employee turnover prediction can be delegated to machine learning techniques. In this paper, we discuss the employee attrition rate. Three supervised learning algorithms, AdaBoost, SVM, and RandomForest, are used to benchmark employee attrition accuracy. The resulting models can help establish predictive analytics.

翻訳日:2024-02-07 15:20:16 公開日:2024-02-06

# Deep MSFOP: 教師なし形状マッチングのための深部関数写像における多重スペクトルフィルタ演算子保存

Deep MSFOP: Multiple Spectral filter Operators Preservation in Deep Functional Maps for Unsupervised Shape Matching ( http://arxiv.org/abs/2402.03904v1 )

ライセンス: Link先を確認

Feifan Luo, Qingsong Li, Ling Hu, Xinru Liu, Haojun Xu, Haibo Wang, Ting Li, Shengjun Liu

(参考訳) 本稿では、関数写像を計算するための新しい制約として複数スペクトルフィルタ演算子保存(Multiple Spectral filter Operators Preservation, MSFOP)を提案し、それに基づいて、形状マッチングのための効率的な深層関数写像アーキテクチャであるDeep MSFOPを開発する。基本的な考え方は、一般的な記述子保存制約を用いる代わりに、写像が複数のスペクトルフィルタ演算子を保存することを要求する点にある。これにより、関数の異なる周波数帯に含まれる、より情報量の多い幾何情報を関数写像の計算に取り込むことができる。このことは、ウェーブレット保存やLBO可換性といった従来の手法が、実際には本手法の特殊ケースであることからも確認できる。さらに、MSFOP制約の下で写像を計算する非常に効率的な方法も開発しており、特に学習可能なフィルタ演算子を用いる場合に、深層学習へ容易に組み込むことができる。以上の結果を用いて、関数写像とその基礎となる点ごとの写像を同時に罰する適切な教師なし損失を備えたDeep MSFOPパイプラインを設計する。本手法の深層関数写像には、関数写像がより幾何学的に有益で、適切(proper)であることが保証され、計算が数値的に安定であるといった顕著な利点がある。さまざまなデータセットでの広範な実験結果から、本手法は、特に非等長や位相が一致しないデータセットのような困難な設定において、既存の最先端手法を上回ることが示された。

We propose a novel constraint called Multiple Spectral filter Operators Preservation (MSFOP) to compute functional maps and, based on it, develop an efficient deep functional map architecture called Deep MSFOP for shape matching. The core idea is that, instead of using the general descriptor preservation constraint, we require our maps to preserve multiple spectral filter operators. This allows us to incorporate more informative geometrical information, contained in different frequency bands of functions, into the functional map computation. This is confirmed by the fact that some previous techniques, such as wavelet preservation and LBO commutativity, are special cases of ours. Moreover, we also develop a very efficient way to compute the maps with the MSFOP constraint, which can be conveniently embedded into deep learning, especially with learnable filter operators. Utilizing the above results, we finally design our Deep MSFOP pipeline, equipped with a suitable unsupervised loss jointly penalizing the functional map and the underlying pointwise map. Our deep functional map has notable advantages, including that the functional map is more geometrically informative and guaranteed to be proper, and the computation is numerically stable. Extensive experimental results on different datasets demonstrate that our approach outperforms the existing state-of-the-art methods, especially in challenging settings like non-isometric and inconsistent topology datasets.

翻訳日:2024-02-07 15:20:08 公開日:2024-02-06

# 複合リターンは強化学習におけるばらつきを減らす

Compound Returns Reduce Variance in Reinforcement Learning ( http://arxiv.org/abs/2402.03903v1 )

ライセンス: Link先を確認

Brett Daley, Martha White, Marlos C. Machado

(参考訳) $n$-stepリターンや$\lambda$-リターンといったマルチステップリターンは、強化学習(RL)手法のサンプル効率を改善するために広く用いられている。マルチステップリターンの分散は、その長さを制限する要因となる。すなわち、あまりに遠い将来まで見ると分散が増加し、マルチステップ学習の利点が失われる。本研究では、複合リターン($n$-stepリターンの重み付き平均)が分散を低減できることを示す。与えられた$n$-stepリターンと同じ縮小係数を持つ任意の複合リターンは、厳密に小さい分散を持つことを初めて証明する。さらに、この分散低減の性質が、線形関数近似の下での時間差分(TD)学習の有限サンプル複雑性を改善することを証明する。一般の複合リターンは実装コストが高くなりうるため、ミニバッチの経験再生を用いる場合でも効率を保ちながら分散を低減する2ブートストラップリターンを導入する。実験により、2ブートストラップリターンが、計算コストをほとんど増やさずに$n$-step深層RLエージェントのサンプル効率を改善できることを示す。

Multistep returns, such as $n$-step returns and $\lambda$-returns, are commonly used to improve the sample efficiency of reinforcement learning (RL) methods. The variance of the multistep returns becomes the limiting factor in their length; looking too far into the future increases variance and reverses the benefits of multistep learning. In our work, we demonstrate the ability of compound returns -- weighted averages of $n$-step returns -- to reduce variance. We prove for the first time that any compound return with the same contraction modulus as a given $n$-step return has strictly lower variance. We additionally prove that this variance-reduction property improves the finite-sample complexity of temporal-difference learning under linear function approximation. Because general compound returns can be expensive to implement, we introduce two-bootstrap returns which reduce variance while remaining efficient, even when using minibatched experience replay. We conduct experiments showing that two-bootstrap returns can improve the sample efficiency of $n$-step deep RL agents, with little additional computational cost.

翻訳日:2024-02-07 15:19:42 公開日:2024-02-06
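
To make the terms in the entry above concrete, here is a minimal sketch of an $n$-step return and of a compound return as a weighted average of $n$-step returns. The function names, the `weights` dictionary, and the toy numbers are assumptions for illustration, not the authors' code. The contraction modulus is taken to be $\sum_n w_n \gamma^n$, the standard value for a weighted average of $n$-step Bellman operators.

```python
def n_step_return(rewards, values, t, n, gamma):
    """G_t^(n): n discounted rewards from step t, then bootstrap from values[t + n]."""
    g = sum(gamma ** k * rewards[t + k] for k in range(n))
    return g + gamma ** n * values[t + n]

def compound_return(rewards, values, t, weights, gamma):
    """Weighted average of n-step returns; `weights` maps n -> weight, summing to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-12
    return sum(w * n_step_return(rewards, values, t, n, gamma)
               for n, w in weights.items())

def contraction_modulus(weights, gamma):
    """Sum of w_n * gamma^n; equals gamma^n for a single n-step return."""
    return sum(w * gamma ** n for n, w in weights.items())
```

An average of two $n$-step returns, such as `{1: 0.5, 4: 0.5}`, is how this sketch reads "two-bootstrap"; the weights and lengths are illustrative and not taken from the paper.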

# ドット積注意の可解モデルにおける位置学習と意味学習の間の相転移

A phase transition between positional and semantic learning in a solvable model of dot-product attention ( http://arxiv.org/abs/2402.03902v1 )

ライセンス: Link先を確認

Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborová

(参考訳) ドット積注意層が、位置注意行列(トークンがそれぞれの位置に基づいて互いに注意を向ける)と意味注意行列(トークンがその意味に基づいて互いに注意を向ける)をどのように学習するかを調べる。あるアルゴリズム的タスクについて、同じ単純なアーキテクチャが、位置的メカニズムと意味的メカニズムのいずれを用いても解を実装するように学習できることを実験的に示す。理論面では、学習可能な、重みを共有した低ランクのクエリ行列とキー行列を持つ非線形自己注意層の学習を調べる。高次元データと、それに見合う多数の訓練サンプルという漸近極限において、非凸な経験損失ランドスケープの大域的最小値を閉形式で特徴付ける。この最小値は位置的メカニズムまたは意味的メカニズムのいずれかに対応し、サンプル複雑性の増加に伴って前者から後者への創発的な相転移が生じることを示す。最後に、ドット積注意層を線形の位置的ベースラインと比較し、十分なデータが利用できる場合には、意味的メカニズムを用いることで後者を上回ることを示す。

We investigate how a dot-product attention layer learns a positional attention matrix (with tokens attending to each other based on their respective positions) and a semantic attention matrix (with tokens attending to each other based on their meaning). For an algorithmic task, we experimentally show how the same simple architecture can learn to implement a solution using either the positional or semantic mechanism. On the theoretical side, we study the learning of a non-linear self-attention layer with trainable tied and low-rank query and key matrices. In the asymptotic limit of high-dimensional data and a comparably large number of training samples, we provide a closed-form characterization of the global minimum of the non-convex empirical loss landscape. We show that this minimum corresponds to either a positional or a semantic mechanism and evidence an emergent phase transition from the former to the latter with increasing sample complexity. Finally, we compare the dot-product attention layer to a linear positional baseline, and show that it outperforms the latter using the semantic mechanism provided it has access to sufficient data.

翻訳日:2024-02-07 15:19:21 公開日:2024-02-06

# バッチユニバーサル予測

Batch Universal Prediction ( http://arxiv.org/abs/2402.03901v1 )

ライセンス: Link先を確認

Marco Bondaschi, Michael Gastpar

(参考訳) 大規模言語モデル(LLM)は、人間らしい英文を生成する驚くべき能力により、最近大きな人気を得ている。 LLMは本質的には予測器であり、過去が与えられたときの単語列の確率を推定する。したがって、ユニバーサル予測の観点からその性能を評価するのは自然である。これを公平に行うために、古典的な平均リグレットを修正したバッチリグレットの概念を導入し、無記憶情報源と1次マルコフ情報源の場合について、add-constant予測器に対するその漸近値を調べる。

Large language models (LLMs) have recently gained much popularity due to their surprising ability at generating human-like English sentences. LLMs are essentially predictors, estimating the probability of a sequence of words given the past. Therefore, it is natural to evaluate their performance from a universal prediction perspective. In order to do that fairly, we introduce the notion of batch regret as a modification of the classical average regret, and we study its asymptotical value for add-constant predictors, in the case of memoryless sources and first-order Markov sources.

翻訳日:2024-02-07 15:19:02 公開日:2024-02-06
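
The add-constant predictors mentioned in the entry above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the paper's code: the function names are hypothetical, and the regret computed here is the classical cumulative log-loss regret against the best memoryless model in hindsight, not the batch regret the paper introduces. Setting `beta = 1` gives the Laplace estimator and `beta = 1/2` the Krichevsky-Trofimov estimator.

```python
import math
from collections import Counter
from fractions import Fraction

def add_constant_probs(seq, alphabet, beta):
    """Probability the predictor assigned to each symbol just before seeing it.

    Before seq[t] is revealed, symbol a gets
    (count_a + beta) / (t + beta * |alphabet|).
    """
    counts = Counter()
    probs = []
    for t, sym in enumerate(seq):
        probs.append((counts[sym] + beta) / (t + beta * len(alphabet)))
        counts[sym] += 1
    return probs

def log_loss_bits(probs):
    """Cumulative log-loss in bits."""
    return -sum(math.log2(p) for p in probs)

def regret_vs_best_iid(seq, alphabet, beta):
    """Cumulative log-loss regret against the best i.i.d. model in hindsight."""
    n = len(seq)
    counts = Counter(seq)
    best = -sum(c * math.log2(c / n) for c in counts.values())
    return log_loss_bits(add_constant_probs(seq, alphabet, beta)) - best
```

Dividing the result of `regret_vs_best_iid` by `len(seq)` gives the average regret; passing `beta` as a `Fraction` keeps the per-symbol probabilities exact.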

# Pro-HAN:プロファイルに基づく音声言語理解のための異種グラフ注意ネットワーク

Pro-HAN: A Heterogeneous Graph Attention Network for Profile-Based Spoken Language Understanding ( http://arxiv.org/abs/2402.03900v1 )

ライセンス: Link先を確認

Dechuan Teng, Chunlin Lu, Xiao Xu, Wanxiang Che, Libo Qin

(参考訳) 近年、プロファイルベースの音声言語理解(SLU)が注目を集めている。これは、さまざまな種類の補助的なプロファイル情報(知識グラフ、ユーザプロファイル、コンテキストアウェアネス)を取り入れて、ユーザ発話に広く見られる曖昧さを解消することを目的とする。しかし、既存のアプローチは、異なるプロファイル情報を別々にモデル化することしかできず、それらの相互関係を考慮することも、その中の無関係な情報や矛盾する情報を除外することもない。上記の問題に対処するために、複数のプロファイル情報にまたがる推論を行う異種グラフ注意ネットワークPro-HANを導入する。具体的には、複数のプロファイル情報(Pro)間の相互関係を捉えるために、intra-Pro、inter-Pro、utterance-Proの3種類のエッジを設計する。 ProSLUデータセットで新たな最先端性能を達成し、3つの指標すべてで約8%の改善を得た。さらなる分析実験により、マルチソースのプロファイル情報のモデル化における本手法の有効性も確認された。

Recently, Profile-based Spoken Language Understanding (SLU) has gained increasing attention, which aims to incorporate various types of supplementary profile information (i.e., Knowledge Graph, User Profile, Context Awareness) to eliminate the prevalent ambiguities in user utterances. However, existing approaches can only separately model different profile information, without considering their interrelationships or excluding irrelevant and conflicting information within them. To address the above issues, we introduce a Heterogeneous Graph Attention Network to perform reasoning across multiple Profile information, called Pro-HAN. Specifically, we design three types of edges, denoted as intra-Pro, inter-Pro, and utterance-Pro, to capture interrelationships among multiple Pros. We establish a new state-of-the-art on the ProSLU dataset, with an improvement of approximately 8% across all three metrics. Further analysis experiments also confirm the effectiveness of our method in modeling multi-source profile information.

翻訳日:2024-02-07 15:18:53 公開日:2024-02-06

# 視覚的質問応答の推論のための説得力のある根拠

Convincing Rationales for Visual Question Answering Reasoning ( http://arxiv.org/abs/2402.03896v1 )

ライセンス: Link先を確認

Kun Li, George Vosselman, Michael Ying Yang

(参考訳) 視覚的質問応答(VQA)は、画像の内容に関する質問に対する回答を予測する困難なタスクであり、テキストの質問と画像の両方を深く理解する必要がある。先行研究は、予測された回答の精度を単純に計算することで、回答モデルを直接評価している。しかし、このような「ブラックボックス」システムでは、予測の背後にある内部の推論は無視されており、予測を信頼できるかどうかさえ分からない。場合によっては、モデルが無関係な視覚領域やテキストトークンに注目していても正しい答えが得られることがあり、これはモデルを信頼できない非論理的なものにする。与えられた画像と質問のペアに対して、予測された回答とともに視覚的およびテキストの根拠を生成するために、我々はConvincing Rationales for VQA(CRVQA)を提案する。新しい出力に伴う追加のアノテーションを考慮し、CRVQAは、既存のVQAデータセットとその視覚ラベルから変換したサンプルを用いて訓練・評価される。広範な実験により、視覚的およびテキストの根拠が回答の予測を裏付け、さらに精度を向上させることが示された。さらに、CRVQAはゼロショット評価の設定で、汎用のVQAデータセットにおいて競争力のある性能を達成する。データセットとソースコードは https://github.com/lik1996/CRVQA2024 で公開される予定である。

Visual Question Answering (VQA) is a challenging task of predicting the answer to a question about the content of an image. It requires deep understanding of both the textual question and visual image. Prior works directly evaluate the answering models by simply calculating the accuracy of the predicted answers. However, the inner reasoning behind the prediction is disregarded in such a "black box" system, and we do not even know if one can trust the predictions. In some cases, the models still get the correct answers even when they focus on irrelevant visual regions or textual tokens, which makes the models unreliable and illogical. To generate both visual and textual rationales next to the predicted answer to the given image/question pair, we propose Convincing Rationales for VQA, CRVQA. Considering the extra annotations brought by the new outputs, CRVQA is trained and evaluated by samples converted from some existing VQA datasets and their visual labels. The extensive experiments demonstrate that the visual and textual rationales support the prediction of the answers, and further improve the accuracy. Furthermore, CRVQA achieves competitive performance on generic VQA datasets in the zero-shot evaluation setting. The dataset and source code will be released under https://github.com/lik1996/CRVQA2024.

翻訳日:2024-02-07 15:18:35 公開日:2024-02-06

# 自動運転のための予測ホライズン要件:安全性・快適性・効率の最適化

Prediction Horizon Requirements for Automated Driving: Optimizing Safety, Comfort, and Efficiency ( http://arxiv.org/abs/2402.03893v1 )

ライセンス: Link先を確認

Manuel Muñoz Sánchez, Chris van der Ploeg, Robin Smit, Jos Elfring, Emilia Silvas, René van de Molengraft

(参考訳) 他の道路利用者の動きを予測することは、自動運転車(AV)の性能向上に役立つ。しかし、これらの予測の時間ホライズンとAVの性能との関係は依然として明らかでない。多数の軌道予測アルゴリズムが存在するにもかかわらず、予測長の違いがAVの安全性やその他の車両性能指標にどのように影響するかは研究されておらず、予測手法に求められるホライズンの要件は未定義のままである。本研究は、安全性、快適性、効率に着目し、異なる予測ホライズンがAVの性能に及ぼす影響を調べることで、このギャップに取り組む。最先端のリスクベースの予測軌道プランナを用いた複数の実験を通じて、最大20秒のホライズンを持つ予測をシミュレーションした。シミュレーションに基づき、特定のAV性能基準とアプリケーションのニーズに応じて、必要最小限の予測ホライズンと最適な予測ホライズンを定めるためのフレームワークを提案する。その結果、横断する歩行者との衝突を防ぐには1.6秒のホライズンが必要であり、7〜8秒のホライズンで効率が最も高くなり、15秒までのホライズンは乗客の快適性を向上させることが分かった。予測ホライズンの要件はアプリケーションに依存すると結論づけ、横断歩行者を含むアプリケーションに対する一般的な指針として、11.8秒の予測ホライズンを目標とすることを推奨する。

Predicting the movement of other road users is beneficial for improving automated vehicle (AV) performance. However, the relationship between the time horizon associated with these predictions and AV performance remains unclear. Despite the existence of numerous trajectory prediction algorithms, no studies have been conducted on how varying prediction lengths affect AV safety and other vehicle performance metrics, resulting in undefined horizon requirements for prediction methods. Our study addresses this gap by examining the effects of different prediction horizons on AV performance, focusing on safety, comfort, and efficiency. Through multiple experiments using a state-of-the-art, risk-based predictive trajectory planner, we simulated predictions with horizons up to 20 seconds. Based on our simulations, we propose a framework for specifying the minimum required and optimal prediction horizons based on specific AV performance criteria and application needs. Our results indicate that a horizon of 1.6 seconds is required to prevent collisions with crossing pedestrians, horizons of 7-8 seconds yield the best efficiency, and horizons up to 15 seconds improve passenger comfort. We conclude that prediction horizon requirements are application-dependent, and recommend aiming for a prediction horizon of 11.8 seconds as a general guideline for applications involving crossing pedestrians.

翻訳日:2024-02-07 15:18:13 公開日:2024-02-06

# シリコン上の超伝導回路の損失とデコヒーレンス:電子スピン共鳴からの考察

Loss and decoherence in superconducting circuits on silicon: Insights from electron spin resonance ( http://arxiv.org/abs/2402.03889v1 )

ライセンス: Link先を確認

Aditya Jayaraman, Andrey V. Danilov, Jonas Bylander and Sergey E. Kubatkin

(参考訳) 量子計算や量子センシングに用いられる固体デバイスは、スプリアスな荷電二準位系(TLS)と迷走常磁性スピンに起因する損失とノイズによって悪影響を受ける。これら2つのノイズ源は相互に関連しており、回路性能への影響を増大させる。我々は、窒化ニオブ(NbN)超伝導共振器を用いたオンチップ電子スピン共鳴(ESR)法により、シリコン上の表面スピンと作製後の表面処理の効果を調べる。スピン緩和時間が異なり、各種の表面処理(アニールとフッ化水素酸)に選択的に応答する2つの異なるスピン種を同定した。 2つのスピン種のうち、低パワー(単一光子に近い)励起においてTLSで制限される共振器の品質因子に大きな影響を与えるのは一方のみである。表面処理後にスピンの総密度が3〜5分の1に減少することを観測し、量子系における損失とデコヒーレンスを軽減する戦略の開発にESR分光法が有効であることを示す。

Solid-state devices used for quantum computation and quantum sensing applications are adversely affected by loss and noise caused by spurious, charged two-level systems (TLS) and stray paramagnetic spins. These two sources of noise are interconnected, exacerbating the impact on circuit performance. We use an on-chip electron spin resonance (ESR) technique, with niobium nitride (NbN) superconducting resonators, to study surface spins on silicon and the effect of post-fabrication surface treatments. We identify two distinct spin species that are characterized by different spin-relaxation times and respond selectively to various surface treatments (annealing and hydrofluoric acid). Only one of the two spin species has a significant impact on the TLS-limited resonator quality factor at low-power (near single-photon) excitation. We observe a 3-to-5-fold reduction in the total density of spins after surface treatments, and demonstrate the efficacy of ESR spectroscopy in developing strategies to mitigate loss and decoherence in quantum systems.

翻訳日:2024-02-07 15:17:48 公開日:2024-02-06

# 言語変化の原動力としての社会規範の変化:ドイツ連邦議会における言語とジェンダーをめぐる争い

Shifting social norms as a driving force for linguistic change: Struggles about language and gender in the German Bundestag ( http://arxiv.org/abs/2402.03887v1 )

ライセンス: Link先を確認

Carolin Müller-Spitzer, Samira Ochs

(参考訳) 本稿は、社会規範の変化に基づく言語変化、とりわけ言語とジェンダーをめぐる議論に焦点を当てる。この議論では、言語は「自然に」発達するものであり、ジェンダー包摂的な言語がしばしばそうだと主張されるような「深刻な介入」は、「有機的」とされる言語体系においては不適切で、「危険」でさえあるという主張が繰り返されている。しかし、そのような介入は前例のないものではない。社会的に動機づけられた言語変化のプロセスは、珍しいものでも新しいものでもない。本稿では、ドイツにおける重要な政治的・社会的空間の一つであるドイツ連邦議会に焦点を当てる。連邦議会の本会議における言語とジェンダーをめぐる他の争いを出発点として、言語とジェンダーが1980年代以降、ドイツ連邦議会で繰り返し問題となってきたことを示す。このことが連邦議会の言語実践にどのように反映されているかを、 a) ゲイとレズビアンを指す呼称、 b) Bürgerinnen und Bürger(女性市民と男性市民)のような対形式、 c) 女性形の呼びかけ語と人を表す名詞('Präsident'に加えて'Präsidentin')の使用によって示す。最後に、現在非常に白熱しているジェンダー包摂的な言語をめぐる議論、特にすべてのジェンダーアイデンティティを包含することを意図したアスタリスクやコロンなどのジェンダー記号を用いる新しい形式(Lehrer*innen, Lehrer:innen; male*female teachers)に対して、これら過去の言語をめぐる争いが持つ含意について論じる。

This paper focuses on language change based on shifting social norms, in particular with regard to the debate on language and gender. It is a recurring argument in this debate that language develops "naturally" and that "severe interventions" - such as gender-inclusive language is often claimed to be - in the allegedly "organic" language system are inappropriate and even "dangerous". Such interventions are, however, not unprecedented. Socially motivated processes of language change are neither unusual nor new. We focus in our contribution on one important political-social space in Germany, the German Bundestag. Taking other struggles about language and gender in the plenaries of the Bundestag as a starting point, our article illustrates that language and gender has been a recurring issue in the German Bundestag since the 1980s. We demonstrate how this is reflected in linguistic practices of the Bundestag, by the use of a) designations for gays and lesbians; b) pair forms such as Bürgerinnen und Bürger (female and male citizens); and c) female forms of addresses and personal nouns ('Präsidentin' in addition to 'Präsident'). Lastly, we will discuss implications of these earlier language battles for the currently very heated debate about gender-inclusive language, especially regarding new forms with gender symbols like the asterisk or the colon (Lehrer*innen, Lehrer:innen; male*female teachers) which are intended to encompass all gender identities.

翻訳日:2024-02-07 15:17:30 公開日:2024-02-06

# MOMENT: オープン時系列ファウンデーションモデルのファミリー

MOMENT: A Family of Open Time-series Foundation Models ( http://arxiv.org/abs/2402.03885v1 )

ライセンス: Link先を確認

Mononito Goswami, Konrad Szafer, Arjun Choudhry, Yifu Cai, Shuo Li, Artur Dubrawski

(参考訳) 汎用の時系列解析のためのオープンソース基盤モデルのファミリーであるMOMENTを紹介する。時系列データでの大規模モデルの事前学習は、(1)大規模でまとまった公開時系列リポジトリが存在しないこと、(2)時系列の特性が多様でマルチデータセット学習が煩雑になることから、困難である。さらに、(3)これらのモデルを評価するための実験的ベンチマーク、特にリソース、時間、教師情報が限られたシナリオでのベンチマークは、まだ初期段階にある。これらの課題に対処するため、Time-series Pileと呼ぶ大規模で多様な公開時系列のコレクションを構築し、大規模なマルチデータセット事前学習を可能にするために時系列固有の課題に体系的に取り組む。また、最近の研究に基づいて、教師情報が限られた設定でさまざまなタスクとデータセットについて時系列基盤モデルを評価するベンチマークを設計する。このベンチマークでの実験は、最小限のデータとタスク固有の微調整で、事前学習済みモデルが有効であることを示す。最後に、大規模な事前学習済み時系列モデルに関するいくつかの興味深い経験的観察を示す。コードは anonymous.4open.science/r/BETT-773F/ で匿名で公開されている。

We introduce MOMENT, a family of open-source foundation models for general-purpose time-series analysis. Pre-training large models on time-series data is challenging due to (1) the absence of a large and cohesive public time-series repository, and (2) diverse time-series characteristics which make multi-dataset training onerous. Additionally, (3) experimental benchmarks to evaluate these models, especially in scenarios with limited resources, time, and supervision, are still in their nascent stages. To address these challenges, we compile a large and diverse collection of public time-series, called the Time-series Pile, and systematically tackle time-series-specific challenges to unlock large-scale multi-dataset pre-training. Finally, we build on recent work to design a benchmark to evaluate time-series foundation models on diverse tasks and datasets in limited supervision settings. Experiments on this benchmark demonstrate the effectiveness of our pre-trained models with minimal data and task-specific fine-tuning. Finally, we present several interesting empirical observations about large pre-trained time-series models. Our code is available anonymously at anonymous.4open.science/r/BETT-773F/.

翻訳日:2024-02-07 15:16:57 公開日:2024-02-06

# リーマン多様体上の双レベル最適化の枠組み

A Framework for Bilevel Optimization on Riemannian Manifolds ( http://arxiv.org/abs/2402.03883v1 )

ライセンス: Link先を確認

Andi Han, Bamdev Mishra, Pratik Jawanpuria, Akiko Takeda

(参考訳) バイレベル最適化は、さまざまな応用分野で存在感を増している。本研究では、下位レベル問題と上位レベル問題の両方の変数がリーマン多様体上に制約されたバイレベル最適化問題を解くためのフレームワークを提案する。多様体上の超勾配推定戦略をいくつか示し、その推定誤差を調べる。提案する多様体上の超勾配降下アルゴリズムについて、収束解析と計算量解析を与える。また、これらの結果を確率的バイレベル最適化と一般のレトラクションの使用へ拡張する。さまざまな応用において、提案フレームワークの有用性を示す。

Bilevel optimization has seen an increasing presence in various domains of applications. In this work, we propose a framework for solving bilevel optimization problems where variables of both lower and upper level problems are constrained on Riemannian manifolds. We provide several hypergradient estimation strategies on manifolds and study their estimation error. We provide convergence and complexity analysis for the proposed hypergradient descent algorithm on manifolds. We also extend the developments to stochastic bilevel optimization and to the use of general retraction. We showcase the utility of the proposed framework on various applications.

翻訳日:2024-02-07 15:16:39 公開日:2024-02-06

# 十分な時間があれば、人間は異常なポーズの物体認識でディープネットワークに勝る

Humans Beat Deep Networks at Recognizing Objects in Unusual Poses, Given Enough Time ( http://arxiv.org/abs/2402.03973v1 )

ライセンス: Link先を確認

Netta Ollikka, Amro Abbas, Andrea Perin, Markku Kilpeläinen, Stéphane Deny

(参考訳) ディープラーニングは、いくつかの物体認識ベンチマークで人間との差を縮めつつある。ここでは、物体が普段とは異なる視点から見える難しい画像を対象に、この差を調べる。この条件で系統的に脆弱な最先端の事前学習済みネットワーク(EfficientNet, SWAG, ViT, SWIN, BEiT, ConvNext)とは対照的に、人間は異常なポーズの物体の認識に優れていることが分かった。注目すべきことに、画像の呈示時間を制限すると、人間の成績はディープネットワークのレベルまで低下する。これは、人間が異常なポーズの物体を識別する際に、(追加の時間を要する)付加的な心的プロセスが働いていることを示唆する。最後に、人間とネットワークの誤りパターンを分析すると、時間を制限された人間でさえ、フィードフォワード型のディープネットワークとは異なることが明らかになった。コンピュータビジョンシステムを人間の視覚系の頑健性の水準に引き上げるには、さらなる研究が必要であると結論づける。追加の観察時間の間に生じる心的プロセスの性質を理解することが、そのような頑健性を実現する鍵となるかもしれない。

Deep learning is closing the gap with humans on several object recognition benchmarks. Here we investigate this gap in the context of challenging images where objects are seen from unusual viewpoints. We find that humans excel at recognizing objects in unusual poses, in contrast with state-of-the-art pretrained networks (EfficientNet, SWAG, ViT, SWIN, BEiT, ConvNext) which are systematically brittle in this condition. Remarkably, as we limit image exposure time, human performance degrades to the level of deep networks, suggesting that additional mental processes (requiring additional time) take place when humans identify objects in unusual poses. Finally, our analysis of error patterns of humans vs. networks reveals that even time-limited humans are dissimilar to feed-forward deep networks. We conclude that more work is needed to bring computer vision systems to the level of robustness of the human visual system. Understanding the nature of the mental processes taking place during extra viewing time may be key to attain such robustness.

翻訳日:2024-02-07 15:10:22 公開日:2024-02-06

# 自転車ステーションの配置のためのメタヒューリスティクスの利用

Using metaheuristics for the location of bicycle stations ( http://arxiv.org/abs/2402.03945v1 )

ライセンス: Link先を確認

Christian Cintrano, Francisco Chicano, Enrique Alba

(参考訳) 本研究では、シェア自転車の返却・貸出を行うステーションを設置する最適な場所を見つける問題を解く。そのために、この問題を、最適化における代表的な施設配置問題であるp-メディアン問題としてモデル化する。 p-メディアン問題は、一連の顧客(市民)と最も近い施設(自転車ステーション)との距離を最小化するように、一連の施設(自転車ステーション)を配置することを目指す。遺伝的アルゴリズム、反復局所探索、粒子群最適化、焼きなまし法、可変近傍探索を用いて自転車ステーションの最適な位置を求め、それらの相対的な優位性を調べた。 iraceを用いて各アルゴリズムのパラメータを自動的に設定し、アルゴリズムを自動的に微調整する方法論を示す。また、実在の都市であるマラガ(スペイン)のさまざまなオープンデータソースから得た、異なる実データ(距離と重み)についても検討しており、最終的にはスマートシティのアプリケーションにつながることを期待している。得られた結果を、マラガで実際に導入されている配置と比較した。最後に、ステーションを追加することで、提案手法を用いて市内の既存システムをどのように改善できるかを分析した。

In this work, we solve the problem of finding the best locations to place stations for depositing/collecting shared bicycles. To do this, we model the problem as the p-median problem, that is a major existing localization problem in optimization. The p-median problem seeks to place a set of facilities (bicycle stations) in a way that minimizes the distance between a set of clients (citizens) and their closest facility (bike station). We have used a genetic algorithm, iterated local search, particle swarm optimization, simulated annealing, and variable neighbourhood search, to find the best locations for the bicycle stations and study their comparative advantages. We use irace to parameterize each algorithm automatically, to contribute with a methodology to fine-tune algorithms automatically. We have also studied different real data (distance and weights) from diverse open data sources from a real city, Malaga (Spain), hopefully leading to a final smart city application. We have compared our results with the implemented solution in Malaga. Finally, we have analyzed how we can use our proposal to improve the existing system in the city by adding more stations.

翻訳日:2024-02-07 15:10:02 公開日:2024-02-06
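
The p-median objective described in the entry above can be written down directly. The sketch below is illustrative only: it uses a plain first-improvement swap search as a stand-in, not any of the five metaheuristics the authors compare, and the toy instance (clients and candidate sites on a line, unit weights) and all function names are assumptions.

```python
from itertools import combinations

def p_median_cost(dist, weights, stations):
    """Weighted sum, over clients, of the distance to the nearest open station."""
    return sum(w * min(dist[c][s] for s in stations)
               for c, w in enumerate(weights))

def brute_force(dist, weights, p):
    """Exact optimum by enumeration; only viable for tiny instances."""
    sites = range(len(dist[0]))
    best = min(combinations(sites, p),
               key=lambda s: p_median_cost(dist, weights, s))
    return list(best), p_median_cost(dist, weights, best)

def swap_local_search(dist, weights, start):
    """First-improvement search over single station swaps."""
    n_sites = len(dist[0])
    current = set(start)
    best = p_median_cost(dist, weights, current)
    improved = True
    while improved:
        improved = False
        for out in sorted(current):
            for cand in range(n_sites):
                if cand in current:
                    continue
                trial = (current - {out}) | {cand}
                cost = p_median_cost(dist, weights, trial)
                if cost < best:
                    current, best, improved = trial, cost, True
                    break
            if improved:
                break
    return sorted(current), best

# Toy instance: clients and candidate sites share positions on a line.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
weights = [1] * len(positions)
```

On this toy instance the swap search started from sites `[0, 1]` reaches the same placement as enumeration, one station in the middle of each cluster. On real instances a swap search can stall in local optima, which is the usual motivation for metaheuristics such as those named in the abstract.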

# 大規模言語モデルによる隠れ世界の発見

Discovery of the Hidden World with Large Language Models ( http://arxiv.org/abs/2402.03941v1 )

ライセンス: Link先を確認

Chenxi Liu, Yongqiang Chen, Tongliang Liu, Mingming Gong, James Cheng, Bo Han, Kun Zhang

(参考訳) 科学は、既知の事実と観察の組み合わせから新しい因果知識を発見することから始まる。従来の因果探索アプローチは、因果関係を見つけるために、通常は人間の専門家が与える高品質な測定変数に主に依存している。しかし、現実世界の幅広いアプリケーションでは、因果変数は通常利用できない。世界に関する膨大な観測から豊富な知識を学習するように訓練された大規模言語モデル(LLM)の台頭は、生の観測データから高レベルの隠れ変数を発見することを支援する新たな機会をもたらす。そこで、COAT: Causal representatiOn AssistanTを導入する。 COATは、非構造化データから潜在的な因果因子を抽出する因子提案器(factor proposer)としてLLMを組み込む。さらに、LLMに指示して、データ値の収集に用いる追加情報(例えばアノテーション基準)を提供させたり、生の非構造化データを構造化データへと解析させたりすることもできる。アノテーションされたデータは因果学習モジュール(例えばFCIアルゴリズム)に入力され、このモジュールはデータに対する厳密な説明と、LLMによる因果因子の抽出をさらに改善するための有用なフィードバックの両方を与える。レビュー評価の分析と神経障害の診断という2つのケーススタディで、基礎にある因果系を明らかにするうえでのCOATの有効性を検証する。

Science originates with discovering new causal knowledge from a combination of known facts and observations. Traditional causal discovery approaches mainly rely on high-quality measured variables, usually given by human experts, to find causal relations. However, the causal variables are usually unavailable in a wide range of real-world applications. The rise of large language models (LLMs) that are trained to learn rich knowledge from the massive observations of the world, provides a new opportunity to assist with discovering high-level hidden variables from the raw observational data. Therefore, we introduce COAT: Causal representatiOn AssistanT. COAT incorporates LLMs as a factor proposer that extracts the potential causal factors from unstructured data. Moreover, LLMs can also be instructed to provide additional information used to collect data values (e.g., annotation criteria) and to further parse the raw unstructured data into structured data. The annotated data will be fed to a causal learning module (e.g., the FCI algorithm) that provides both rigorous explanations of the data, as well as useful feedback to further improve the extraction of causal factors by LLMs. We verify the effectiveness of COAT in uncovering the underlying causal system with two case studies of review rating analysis and neuropathic diagnosis.

翻訳日:2024-02-07 15:09:13 公開日:2024-02-06

# スピン量子ビットの完全自律チューニング

Fully autonomous tuning of a spin qubit ( http://arxiv.org/abs/2402.03931v1 )

ライセンス: Link先を確認

Jonas Schuff, Miguel J. Carballido, Madeleine Kotzagiannidis, Juan Carlos Calvo, Marco Caselli, Jacob Rawling, David L. Craig, Barnaby van Straaten, Brandon Severin, Federico Fedele, Simon Svab, Pierre Chevalier Kwon, Rafael S. Eggli, Taras Patlatiuk, Nathan Korda, Dominik Zumbühl, Natalia Ares

(参考訳) 20年以上にわたり、量子コンピューティングのための半導体量子ビットの研究は大きなブレークスルーをもたらしてきた。しかし、大規模な半導体量子回路の開発は、これらの回路を効率的にチューニングし動作させることの難しさによって、依然として制限されている。量子ビットの最適な動作条件の同定は複雑であり、広大なパラメータ空間の探索を伴う。これはまさに「干し草の山の中の針」を探す問題であり、デバイスのばらつきと製造上の不完全さのため、これまで完全な自動化を阻んできた。本研究では、接地されたデバイスから、量子ビット動作の成功を明確に示すラビ振動に至るまでの、半導体量子ビットの初の完全自律チューニングを示す。この自動化を、人間の介入なしに、Ge/Siコア/シェルナノワイヤデバイスで実証する。我々のアプローチは、ディープラーニング、ベイズ最適化、コンピュータビジョン技術を統合する。この自動化アルゴリズムは幅広い半導体量子ビットデバイスに適用でき、量子ビットの品質指標の統計的研究を可能にすると期待される。完全自動化の可能性を示す例として、アルゴリズムが見つけた量子ビットの1つについて、ラビ周波数とg因子がバリアゲート電圧にどのように依存するかを特徴付ける。スピン量子ビット動作の最初の実証から20年を経て、この重要な進歩は、これまで未開拓であった大規模な量子回路の動作をついに促進する態勢にある。

Spanning over two decades, the study of qubits in semiconductors for quantum computing has yielded significant breakthroughs. However, the development of large-scale semiconductor quantum circuits is still limited by challenges in efficiently tuning and operating these circuits. Identifying optimal operating conditions for these qubits is complex, involving the exploration of vast parameter spaces. This presents a real 'needle in the haystack' problem, which, until now, has resisted complete automation due to device variability and fabrication imperfections. In this study, we present the first fully autonomous tuning of a semiconductor qubit, from a grounded device to Rabi oscillations, a clear indication of successful qubit operation. We demonstrate this automation, achieved without human intervention, in a Ge/Si core/shell nanowire device. Our approach integrates deep learning, Bayesian optimization, and computer vision techniques. We expect this automation algorithm to apply to a wide range of semiconductor qubit devices, allowing for statistical studies of qubit quality metrics. As a demonstration of the potential of full automation, we characterise how the Rabi frequency and g-factor depend on barrier gate voltages for one of the qubits found by the algorithm. Twenty years after the initial demonstrations of spin qubit operation, this significant advancement is poised to finally catalyze the operation of large, previously unexplored quantum circuits.

翻訳日:2024-02-07 15:08:23 公開日:2024-02-06

# リーク, チート, リピート: クローズドソースLLMにおけるデータ汚染と評価の不正慣行

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs ( http://arxiv.org/abs/2402.03927v1 )

ライセンス: Link先を確認

Simone Balloccu, Patrícia Schmidtová, Mateusz Lango, and Ondřej Dušek

(参考訳) 自然言語処理(NLP)の研究は、Large Language Models(LLM)の使用にますます焦点を当てており、最も人気のあるモデルのいくつかは、完全に、あるいは部分的にクローズドソースである。モデルの詳細、特にトレーニングデータにアクセスできないことは、研究者の間でデータ汚染に関する懸念を繰り返し引き起こしてきた。この問題に対処する試みはいくつかあるが、それらは逸話的な証拠や試行錯誤にとどまっている。さらに、ユーザから得られるデータを使ってモデルが反復的に改善されるという、「間接的な」データリークの問題を見落としている。本研究では、現在最も広く使われているLLMであるOpenAIのGPT-3.5とGPT-4を用いた研究を対象に、データ汚染の観点から初めての体系的な分析を行う。255本の論文を分析し、OpenAIのデータ利用ポリシーを考慮することで、モデルのリリース後最初の1年間にこれらのモデルにリークしたデータの量を広範に記録する。これらのモデルは、全体として263のベンチマークから約470万件のサンプルにさらされていたことを報告する。同時に、不公平な、あるいは欠落したベースライン比較や再現性の問題など、レビューした論文に見られる多くの評価上の不正慣行を記録する。結果はhttps://leak-llm.github.io/で共同プロジェクトとして公開しており、他の研究者もこの取り組みに貢献できる。

Natural Language Processing (NLP) research is increasingly focusing on the use of Large Language Models (LLMs), with some of the most popular ones being either fully or partially closed-source. The lack of access to model details, especially regarding training data, has repeatedly raised concerns about data contamination among researchers. Several attempts have been made to address this issue, but they are limited to anecdotal evidence and trial and error. Additionally, they overlook the problem of \emph{indirect} data leaking, where models are iteratively improved by using data coming from users. In this work, we conduct the first systematic analysis of work using OpenAI's GPT-3.5 and GPT-4, the most prominently used LLMs today, in the context of data contamination. By analysing 255 papers and considering OpenAI's data usage policy, we extensively document the amount of data leaked to these models during the first year after the model's release. We report that these models have been globally exposed to $\sim$4.7M samples from 263 benchmarks. At the same time, we document a number of evaluation malpractices emerging in the reviewed papers, such as unfair or missing baseline comparisons and reproducibility issues. We release our results as a collaborative project on https://leak-llm.github.io/, where other researchers can contribute to our efforts.

翻訳日:2024-02-07 15:08:04 公開日:2024-02-06

# リターン整合型Decision Transformer

Return-Aligned Decision Transformer ( http://arxiv.org/abs/2402.03923v1 )

ライセンス: Link先を確認

Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra

(参考訳) オフライン強化学習における従来のアプローチは、リターンとも呼ばれる累積報酬を最大化する最適方策を学習することを目的としている。しかし、応用が広がるにつれて、リターンを最大化するだけでなく、実際のリターンを指定された目標リターンに一致させるエージェントを訓練し、エージェントの性能を制御できるようにすることがますます重要になっている。Decision Transformer(DT)は、教師あり学習を通じて、目標リターンで条件付けられた行動を生成する方策を最適化し、目標リターンを用いてエージェントを制御する機構を備えている。DTは実際のリターンを目標リターンに一致させるように設計されているにもかかわらず、DTでは実際のリターンと目標リターンの間に乖離があることを実験的に確認した。本稿では、実際のリターンと目標リターンを効果的に一致させるように設計されたReturn-Aligned Decision Transformer(RADT)を提案する。我々のモデルは、通常はリターン、状態、行動からなる従来の入力系列からリターンを切り離し、リターンと状態の関係、およびリターンと行動の関係を強化する。広範な実験により、RADTはDTベースの手法における実際のリターンと目標リターンの乖離を低減することが示された。

Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. However, as applications broaden, it becomes increasingly crucial to train agents that not only maximize the returns, but align the actual return with a specified target return, giving control over the agent's performance. Decision Transformer (DT) optimizes a policy that generates actions conditioned on the target return through supervised learning and is equipped with a mechanism to control the agent using the target return. Despite being designed to align the actual return with the target return, we have empirically identified a discrepancy between the actual return and the target return in DT. In this paper, we propose Return-Aligned Decision Transformer (RADT), designed to effectively align the actual return with the target return. Our model decouples returns from the conventional input sequence, which typically consists of returns, states, and actions, to enhance the relationships between returns and states, as well as returns and actions. Extensive experiments show that RADT reduces the discrepancies between the actual return and the target return of DT-based methods.
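
The quantities this abstract compares can be made concrete with a small sketch. The code below computes the returns-to-go that Decision-Transformer-style models are typically conditioned on, and the gap between an achieved return and a target return. It is an illustration only: the function names and the toy reward sequence are assumptions, and nothing here reproduces the RADT architecture.

```python
from typing import List


def returns_to_go(rewards: List[float]) -> List[float]:
    """Return-to-go at step t: the sum of rewards from t to the end of the episode."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def return_discrepancy(actual_return: float, target_return: float) -> float:
    """Absolute gap between the achieved return and the requested target return."""
    return abs(actual_return - target_return)


rewards = [1.0, 0.0, 2.0, 1.0]          # hypothetical episode
rtg = returns_to_go(rewards)            # rtg[0] is the actual return of the episode
gap = return_discrepancy(rtg[0], 5.0)   # distance from a hypothetical target of 5.0
```

A model that aligns well with its target would drive `gap` toward zero across a range of target values, not only for the maximum achievable return.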

翻訳日:2024-02-07 15:07:41 公開日:2024-02-06

# ベイズ最適化を支援する大規模言語モデル

Large Language Models to Enhance Bayesian Optimization ( http://arxiv.org/abs/2402.03921v1 )

ライセンス: Link先を確認

Tennison Liu and Nicolás Astorga and Nabeel Seedat and Mihaela van der Schaar

(参考訳) ベイズ最適化(BO)は、複雑で評価コストの高いブラックボックス関数を最適化するための強力なアプローチである。その重要性は、特にハイパーパラメータチューニングをはじめとする多くの応用で示されているが、その有効性は探索と活用の効率的なバランスに依存する。BO手法は大きく進歩してきたが、このバランスを取ることは依然として繊細な作業である。本稿では、大規模言語モデル(LLM)の能力をBOに統合する新しいアプローチであるLLAMBOを提案する。大まかに言えば、BO問題を自然言語で定式化することで、LLMが過去の評価結果に条件付けて有望な解を反復的に提案できるようにする。より具体的には、LLMの文脈理解、few-shot学習能力、ドメイン知識を組み合わせることで、モデルベースBOの様々な構成要素がどのように強化されるかを検討する。その結果、LLAMBOはゼロショットのウォームスタートに有効であり、特に観測がまばらな探索の初期段階において、サロゲートモデリングと候補サンプリングを改善することが示された。我々のアプローチは文脈内(in-context)で実行され、LLMのファインチューニングを必要としない。さらに、設計上モジュール化されており、個々の構成要素を既存のBOフレームワークに統合することも、エンドツーエンドの手法として一体的に機能させることもできる。ハイパーパラメータチューニング問題に対するLLAMBOの有効性を実験的に検証し、多様なベンチマーク、プロプライエタリなタスク、合成タスクにわたる強力な性能を示す。

Bayesian optimization (BO) is a powerful approach for optimizing complex and expensive-to-evaluate black-box functions. Its importance is underscored in many applications, notably including hyperparameter tuning, but its efficacy depends on efficiently balancing exploration and exploitation. While there has been substantial progress in BO methods, striking this balance still remains a delicate process. In this light, we present \texttt{LLAMBO}, a novel approach that integrates the capabilities of large language models (LLM) within BO. At a high level, we frame the BO problem in natural language terms, enabling LLMs to iteratively propose promising solutions conditioned on historical evaluations. More specifically, we explore how combining contextual understanding, few-shot learning proficiency, and domain knowledge of LLMs can enhance various components of model-based BO. Our findings illustrate that \texttt{LLAMBO} is effective at zero-shot warmstarting, and improves surrogate modeling and candidate sampling, especially in the early stages of search when observations are sparse. Our approach is performed in context and does not require LLM finetuning. Additionally, it is modular by design, allowing individual components to be integrated into existing BO frameworks, or function cohesively as an end-to-end method. We empirically validate \texttt{LLAMBO}'s efficacy on the problem of hyperparameter tuning, highlighting strong empirical performance across a range of diverse benchmarks, proprietary, and synthetic tasks.
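
To illustrate the exploration/exploitation balance mentioned above, here is a minimal sketch of expected improvement (EI), a classical acquisition function in model-based BO. EI is not named in the abstract and is used here only as a familiar example; the Gaussian surrogate prediction (`mean`, `std`) and the minimisation convention are assumptions.

```python
import math


def expected_improvement(mean: float, std: float, best_so_far: float) -> float:
    """Expected improvement for minimisation under a Gaussian surrogate prediction."""
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(best_so_far - mean, 0.0)
    z = (best_so_far - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a low predicted mean (exploitation),
    # second term rewards high uncertainty (exploration).
    return (best_so_far - mean) * cdf + std * pdf


certain_candidate = expected_improvement(mean=0.0, std=1.0, best_so_far=0.0)
uncertain_candidate = expected_improvement(mean=0.0, std=2.0, best_so_far=0.0)
```

With equal predicted means, the more uncertain candidate receives the larger score, which is how the acquisition function encourages exploration when observations are sparse.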

翻訳日:2024-02-07 15:07:20 公開日:2024-02-06

# Dynastic Potentialクロスオーバー演算子

Dynastic Potential Crossover Operator ( http://arxiv.org/abs/2402.03918v1 )

ライセンス: Link先を確認

Francisco Chicano, Gabriela Ochoa, Darrell Whitley, Renato Tinós

(参考訳) 2つの親解に対する最適組換え演算子は、各変数の値をいずれかの親から受け継ぐ解(遺伝子伝達特性)の中で最良の解を与える。解がビット列であれば、最適組換え演算子の子孫は、2つの親解を含む最小の超平面において最適である。この超平面の探索は一般に計算コストが高く、最悪の場合には指数時間を要する。しかし、目的関数の変数相互作用グラフが疎である場合、探索は多項式時間で行える。本稿では、多項式時間で動作し、低エピスタシスの組合せ問題に対しては最適組換え演算子のように振る舞う、Dynastic Potential Crossover(DPX)と呼ばれる組換え演算子を提案する。この演算子を、一様クロスオーバーやネットワーククロスオーバーといった従来のクロスオーバー演算子、および最近定義された2つの効率的な組換え演算子であるパーティションクロスオーバーと関節点パーティションクロスオーバーと、理論的にも実験的にも比較する。実験的な比較にはNKQ LandscapesとMAX-SATのインスタンスを用いる。DPXは子孫の品質において他のクロスオーバー演算子を上回り、軌跡ベースおよび集団ベースのメタヒューリスティクスに組み込んだ場合にもより良い結果をもたらすが、子孫の計算にはより多くの時間とメモリを必要とする。

An optimal recombination operator for two parent solutions provides the best solution among those that take the value for each variable from one of the parents (gene transmission property). If the solutions are bit strings, the offspring of an optimal recombination operator is optimal in the smallest hyperplane containing the two parent solutions. Exploring this hyperplane is computationally costly, in general, requiring exponential time in the worst case. However, when the variable interaction graph of the objective function is sparse, exploration can be done in polynomial time. In this paper, we present a recombination operator, called Dynastic Potential Crossover (DPX), that runs in polynomial time and behaves like an optimal recombination operator for low-epistasis combinatorial problems. We compare this operator, both theoretically and experimentally, with traditional crossover operators, like uniform crossover and network crossover, and with two recently defined efficient recombination operators: partition crossover and articulation points partition crossover. The empirical comparison uses NKQ Landscapes and MAX-SAT instances. DPX outperforms the other crossover operators in terms of quality of the offspring and provides better results included in a trajectory and a population-based metaheuristic, but it requires more time and memory to compute the offspring.
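
The exponential worst case described above can be seen in a brute-force sketch of optimal recombination for bit strings. This is not DPX: it simply enumerates every child that respects the gene transmission property, which costs 2^d objective evaluations for d differing bits. The function names and the toy objective are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence


def optimal_recombination_bruteforce(
    parent_a: Sequence[int],
    parent_b: Sequence[int],
    objective: Callable[[Sequence[int]], float],
) -> List[int]:
    """Best child (maximising `objective`) that takes every bit from one of the parents.

    Bits on which the parents agree are fixed; only the differing positions
    are free, so 2**d candidates are enumerated for d differing bits.
    """
    differing = [i for i, (a, b) in enumerate(zip(parent_a, parent_b)) if a != b]
    best_child = list(parent_a)
    best_value = objective(best_child)
    for choice in product((0, 1), repeat=len(differing)):
        child = list(parent_a)
        for position, bit in zip(differing, choice):
            child[position] = bit
        value = objective(child)
        if value > best_value:
            best_child, best_value = child, value
    return best_child


def toy_objective(bits: Sequence[int]) -> float:
    """Hypothetical objective: count ones, with a penalty when bits 0 and 1 are both set."""
    return sum(bits) - 3 * (bits[0] and bits[1])


parent_a = [1, 0, 0, 1]
parent_b = [0, 1, 1, 1]
child = optimal_recombination_bruteforce(parent_a, parent_b, toy_objective)
```

An operator that exploits a sparse variable interaction graph avoids this full enumeration, which is the setting the abstract targets.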

翻訳日:2024-02-07 15:06:52 公開日:2024-02-06

# コールドスタートにおけるExemplar-Freeインクリメンタル学習のための弾性的特徴統合

Elastic Feature Consolidation for Cold Start Exemplar-free Incremental Learning ( http://arxiv.org/abs/2402.03917v1 )

ライセンス: Link先を確認

Simone Magistri, Tomaso Trinci, Albin Soutif-Cormerais, Joost van de Weijer, Andrew D. Bagdanov

(参考訳) Exemplar-Free Class Incremental Learning(EFCIL)は、過去のタスクのデータにアクセスすることなく、一連のタスクから学習することを目的としている。本稿では、最初のタスクで利用できるデータが不十分で、高品質なバックボーンを学習できないという、困難なコールドスタートのシナリオを考察する。これはEFCILにとって特に難しい。高い可塑性が必要となり、その結果生じる特徴ドリフトを、exemplarを用いない設定では補償しにくいためである。この問題に対処するため、過去のタスクに強く関連する方向のドリフトを正則化することで特徴表現を統合し、プロトタイプを用いてタスク新近性バイアスを低減する、シンプルで効果的なアプローチを提案する。提案手法はElastic Feature Consolidation(EFC)と呼ばれ、Empirical Feature Matrix(EFM)に基づく、特徴ドリフトの扱いやすい2次近似を利用する。EFMは特徴空間に擬距離を誘導する。我々はこれを、重要な方向における特徴ドリフトの正則化と、新しい非対称クロスエントロピー損失で用いるガウスプロトタイプの更新に利用する。この損失は、プロトタイプのリハーサルと新しいタスクのデータとを効果的にバランスさせる。CIFAR-100、Tiny-ImageNet、ImageNet-Subset、ImageNet-1Kでの実験結果は、Elastic Feature Consolidationがモデルの可塑性を維持することで新しいタスクをより良く学習でき、最先端の手法を大きく上回ることを示している。

Exemplar-Free Class Incremental Learning (EFCIL) aims to learn from a sequence of tasks without having access to previous task data. In this paper, we consider the challenging Cold Start scenario in which insufficient data is available in the first task to learn a high-quality backbone. This is especially challenging for EFCIL since it requires high plasticity, which results in feature drift that is difficult to compensate for in the exemplar-free setting. To address this problem, we propose a simple and effective approach that consolidates feature representations by regularizing drift in directions highly relevant to previous tasks and employs prototypes to reduce task-recency bias. Our method, called Elastic Feature Consolidation (EFC), exploits a tractable second-order approximation of feature drift based on an Empirical Feature Matrix (EFM). The EFM induces a pseudo-metric in feature space which we use to regularize feature drift in important directions and to update Gaussian prototypes used in a novel asymmetric cross entropy loss which effectively balances prototype rehearsal with data from new tasks. Experimental results on CIFAR-100, Tiny-ImageNet, ImageNet-Subset and ImageNet-1K demonstrate that Elastic Feature Consolidation is better able to learn new tasks by maintaining model plasticity and significantly outperforms the state-of-the-art.

翻訳日:2024-02-07 15:06:29 公開日:2024-02-06

# 大規模言語モデルはソーシャルメディアの噂を検知できるか?

Can Large Language Models Detect Rumors on Social Media? ( http://arxiv.org/abs/2402.03916v1 )

ライセンス: Link先を確認

Qiang Liu, Xiang Tao, Junfei Wu, Shu Wu, Liang Wang

(参考訳) 本研究では、ソーシャルメディア上の噂検出に大規模言語モデル(LLM)を用いることを検討する。しかし、ニュース内容と多数のコメントを含むソーシャルメディア上の伝播情報全体をLLMに推論させることは難しい。LLMは複雑な伝播情報の中の重要な手がかりに集中できず、大量の冗長な情報に直面すると推論に支障をきたす可能性があるためである。そこで我々は、LLM-empowered Rumor Detection(LeRuD)アプローチを提案する。この手法では、ニュースとコメントの中の重要な手がかりについて推論するようLLMに教えるプロンプトを設計し、LLMの負担を軽減するために伝播情報全体をChain-of-Propagationに分割する。TwitterとWeiboのデータセットで広範な実験を行ったところ、LeRuDはいくつかの最先端の噂検出モデルを2.4%から7.6%上回った。また、LLMを用いることでLeRuDは訓練用データを必要としないため、few-shotやゼロショットのシナリオでより有望な噂検出能力を示す。

In this work, we investigate the use of Large Language Models (LLMs) for rumor detection on social media. However, it is challenging for LLMs to reason over the entire propagation information on social media, which contains news content and numerous comments, because LLMs may not concentrate on key clues in the complex propagation information and may have trouble reasoning when facing massive and redundant information. Accordingly, we propose an LLM-empowered Rumor Detection (LeRuD) approach, in which we design prompts to teach LLMs to reason over important clues in news and comments, and divide the entire propagation information into a Chain-of-Propagation to reduce the LLMs' burden. We conduct extensive experiments on the Twitter and Weibo datasets, and LeRuD outperforms several state-of-the-art rumor detection models by 2.4% to 7.6%. Meanwhile, by applying LLMs, LeRuD requires no data for training, and thus shows more promising rumor detection ability in few-shot or zero-shot scenarios.

翻訳日:2024-02-07 15:06:00 公開日:2024-02-06

# A/Bテストを加速するための、検出力を最大化する指標の学習

Learning Metrics that Maximise Power for Accelerated A/B-Tests ( http://arxiv.org/abs/2402.03915v1 )

ライセンス: Link先を確認

Olivier Jeunen and Aleksei Ustimenko

(参考訳) オンライン制御実験は、テクノロジー企業における確信を持った意思決定を可能にする重要なツールである。North Star指標(長期的な収益やユーザー定着率など)を定義し、A/Bテストでこの指標を統計的に有意に改善するシステム変種を優れているとみなす。North Star指標は通常、遅延があり感度が低い。その結果、実験のコストは高くなる。実験は長期間実行する必要があり、それでも第二種の過誤(すなわち偽陰性)が頻繁に起こる。我々は、North Starに対する統計的検出力を直接最大化する指標を短期的なシグナルから学習することで、この問題に取り組むことを提案する。既存の手法は、平均的な指標の感度が高くても第二種の過誤が改善するとは限らないという意味で過学習しやすいことを示し、代わりに、過去の実験のログに対して指標が生成したであろう$p$値を最小化することを提案する。このようなデータセットを、それぞれ1億6000万人以上の月間アクティブユーザーを持つ2つのソーシャルメディアアプリケーションから収集した。A/Bペアは合計153組以上である。実験の結果、学習した指標を単独で用いると統計的検出力を最大78%、North Starと併用すると最大210%向上できることが示された。あるいは、North Starが必要とするサンプルサイズの最小で12%にまで抑えたサンプルサイズで同じ統計的検出力を得ることもでき、実験のコストを大幅に削減できる。

Online controlled experiments are a crucial tool to allow for confident decision-making in technology companies. A North Star metric is defined (such as long-term revenue or user retention), and system variants that statistically significantly improve on this metric in an A/B-test can be considered superior. North Star metrics are typically delayed and insensitive. As a result, the cost of experimentation is high: experiments need to run for a long time, and even then, type-II errors (i.e. false negatives) are prevalent. We propose to tackle this by learning metrics from short-term signals that directly maximise the statistical power they harness with respect to the North Star. We show that existing approaches are prone to overfitting, in that higher average metric sensitivity does not imply improved type-II errors, and propose to instead minimise the $p$-values a metric would have produced on a log of past experiments. We collect such datasets from two social media applications with over 160 million Monthly Active Users each, totalling over 153 A/B-pairs. Empirical results show that we are able to increase statistical power by up to 78% when using our learnt metrics stand-alone, and by up to 210% when used in tandem with the North Star. Alternatively, we can obtain constant statistical power at a sample size that is down to 12% of what the North Star requires, significantly reducing the cost of experimentation.
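
The $p$-value that the proposed objective minimises can be illustrated with a textbook two-sample test. The sketch below uses a normal approximation for the difference in means; the abstract does not say which test the authors use, so the test choice, the function name, and the toy samples are all assumptions. It also assumes both samples have non-zero variance.

```python
import math
from statistics import mean, variance
from typing import Sequence


def two_sample_z_pvalue(control: Sequence[float], treatment: Sequence[float]) -> float:
    """Two-sided p-value for a difference in means (normal approximation)."""
    standard_error = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    z = (mean(treatment) - mean(control)) / standard_error
    # Two-sided tail of a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


control = [1.0, 2.0, 3.0, 4.0]              # hypothetical per-user metric values
no_effect = two_sample_z_pvalue(control, control)
large_effect = two_sample_z_pvalue(control, [x + 10.0 for x in control])
```

A more sensitive metric is one that, for the same true effect and sample size, yields smaller $p$-values, which is why lower $p$-values on past experiments translate into fewer type-II errors.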

翻訳日:2024-02-07 15:05:39 公開日:2024-02-06

# EscherNet: スケーラブルなビュー合成のための生成モデル

EscherNet: A Generative Model for Scalable View Synthesis ( http://arxiv.org/abs/2402.03908v1 )

ライセンス: Link先を確認

Xin Kong, Shikun Liu, Xiaoyang Lyu, Marwan Taher, Xiaojuan Qi, Andrew J. Davison

(参考訳) ビュー合成のためのマルチビュー条件付き拡散モデルであるEscherNetを紹介する。EscherNetは、暗黙的かつ生成的な3D表現を専用のカメラ位置符号化と組み合わせて学習し、任意の数の参照ビューとターゲットビューの間のカメラ変換を、正確かつ連続的に相対制御できる。EscherNetは、ビュー合成において卓越した汎用性、柔軟性、スケーラビリティを備える。3つの参照ビューから3つのターゲットビューへという固定数で訓練されているにもかかわらず、単一のコンシューマ向けGPU上で100以上の一貫したターゲットビューを同時に生成できる。その結果、EscherNetはゼロショットの新規ビュー合成に対応するだけでなく、単一画像と複数画像の3D再構成を自然に統一し、これらの多様なタスクを単一のまとまったフレームワークに統合する。広範な実験により、EscherNetは、個々の問題に特化した手法と比較しても、複数のベンチマークで最先端の性能を達成することが示された。この優れた汎用性は、3Dビジョンのためのスケーラブルなニューラルアーキテクチャの設計に新しい方向性を開く。プロジェクトページ: \url{https://kxhit.github.io/EscherNet}

We introduce EscherNet, a multi-view conditioned diffusion model for view synthesis. EscherNet learns implicit and generative 3D representations coupled with a specialised camera positional encoding, allowing precise and continuous relative control of the camera transformation between an arbitrary number of reference and target views. EscherNet offers exceptional generality, flexibility, and scalability in view synthesis -- it can generate more than 100 consistent target views simultaneously on a single consumer-grade GPU, despite being trained with a fixed number of 3 reference views to 3 target views. As a result, EscherNet not only addresses zero-shot novel view synthesis, but also naturally unifies single- and multi-image 3D reconstruction, combining these diverse tasks into a single, cohesive framework. Our extensive experiments demonstrate that EscherNet achieves state-of-the-art performance in multiple benchmarks, even when compared to methods specifically tailored for each individual problem. This remarkable versatility opens up new directions for designing scalable neural architectures for 3D vision. Project page: \url{https://kxhit.github.io/EscherNet}.

翻訳日:2024-02-07 15:04:53 公開日:2024-02-06

# マルチタスク学習における勾配アグリゲーションのベイズ的不確かさ

Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning ( http://arxiv.org/abs/2402.04005v1 )

ライセンス: Link先を確認

Idan Achituve, Idit Diamant, Arnon Netzer, Gal Chechik, Ethan Fetaya

(参考訳) 機械学習がより広く使われるようになるにつれて、複数の推論タスクを並行して実行する需要が高まっている。タスクごとに専用のモデルを実行するのは計算コストが高いため、マルチタスク学習(MTL)に大きな関心が寄せられている。MTLは、複数のタスクを効率的に解く単一のモデルを学習することを目指す。MTLモデルの最適化は、多くの場合、タスクごとに単一の勾配を計算し、それらを集約して統合された更新方向を得ることで行われる。しかし、これらのアプローチは、勾配の各次元における感度という重要な側面を考慮していない。本稿では、ベイズ推論を用いた新しい勾配集約手法を提案する。タスク固有のパラメータに確率分布を置くことで、タスクの勾配上の分布が誘導される。この有益な追加情報により、勾配の各次元の不確かさを定量化し、集約の際に考慮に入れることができる。様々なデータセットで本手法の利点を実験的に示し、最先端の性能を達成する。

As machine learning becomes more prominent there is a growing demand to perform several inference tasks in parallel. Running a dedicated model for each task is computationally expensive and therefore there is a great interest in multi-task learning (MTL). MTL aims at learning a single model that solves several tasks efficiently. Optimizing MTL models is often achieved by computing a single gradient per task and aggregating them for obtaining a combined update direction. However, these approaches do not consider an important aspect, the sensitivity in the gradient dimensions. Here, we introduce a novel gradient aggregation approach using Bayesian inference. We place a probability distribution over the task-specific parameters, which in turn induce a distribution over the gradients of the tasks. This additional valuable information allows us to quantify the uncertainty in each of the gradients dimensions, which can then be factored in when aggregating them. We empirically demonstrate the benefits of our approach in a variety of datasets, achieving state-of-the-art performance.
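
One simple way to factor per-dimension uncertainty into gradient aggregation is precision weighting, sketched below. This is an illustration of the general idea only and should not be read as the aggregation rule of the paper, which the abstract does not spell out. The function name and the toy means and variances are assumptions, and all variances are assumed to be strictly positive.

```python
from typing import List, Sequence


def precision_weighted_aggregate(
    grad_means: Sequence[Sequence[float]],
    grad_variances: Sequence[Sequence[float]],
) -> List[float]:
    """Combine per-task gradients dimension by dimension, weighting each by 1 / variance."""
    num_dims = len(grad_means[0])
    combined = []
    for d in range(num_dims):
        weights = [1.0 / task_var[d] for task_var in grad_variances]
        weighted_sum = sum(w * task_mean[d] for w, task_mean in zip(weights, grad_means))
        combined.append(weighted_sum / sum(weights))
    return combined


# Two hypothetical tasks, two gradient dimensions.
means = [[1.0, 4.0], [3.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 3.0]]   # task 2 is less certain in dimension 1
update = precision_weighted_aggregate(means, variances)
```

In the first dimension both tasks are equally certain, so the result is the plain average. In the second, the more certain task dominates, which a uniform average of the task gradients would not capture.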

翻訳日:2024-02-07 14:56:32 公開日:2024-02-06

# 複数の合成データセット上のアンサンブルのバイアス分散分解

A Bias-Variance Decomposition for Ensembles over Multiple Synthetic Datasets ( http://arxiv.org/abs/2402.03985v1 )

ライセンス: Link先を確認

Ossi Räisä, Antti Honkela

(参考訳) 近年の研究は、精度の向上から、より効果的なモデル選択や不確実性推定に至るまで、教師あり学習のために複数の合成データセットを生成する利点を示している。これらの利点には明確な実験的裏付けがあるが、理論的な理解は現在のところ非常に乏しい。本研究では、複数の合成データセットを用いるいくつかの設定についてバイアス分散分解を導出することで、理論的理解を深めることを目指す。我々の理論は、複数の合成データセットが高分散の下流予測器に対して特に有益であると予測し、平均二乗誤差とBrierスコアの場合に、適切な合成データセット数を選択するための簡単な経験則を与える。複数の実データセットと下流予測器について、多数の合成データセットにわたるアンサンブルの性能を評価することで、理論が実際にどのように機能するかを検証する。結果は理論に沿っており、我々の知見が実用上も有用であることを示している。

Recent studies have highlighted the benefits of generating multiple synthetic datasets for supervised learning, from increased accuracy to more effective model selection and uncertainty estimation. These benefits have clear empirical support, but the theoretical understanding of them is currently very light. We seek to increase the theoretical understanding by deriving bias-variance decompositions for several settings of using multiple synthetic datasets. Our theory predicts multiple synthetic datasets to be especially beneficial for high-variance downstream predictors, and yields a simple rule of thumb to select the appropriate number of synthetic datasets in the case of mean-squared error and Brier score. We investigate how our theory works in practice by evaluating the performance of an ensemble over many synthetic datasets for several real datasets and downstream predictors. The results follow our theory, showing that our insights are also practically relevant.
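
The intuition for why averaging helps high-variance predictors can be sketched with the law of total variance. This is a generic identity, not the decomposition derived in the paper. It assumes that the $m$ synthetic datasets are i.i.d. conditional on the real data $D$, and that $g_i(x)$ denotes the prediction of the model trained on the $i$-th synthetic dataset (both symbols are introduced here for illustration).

```latex
\operatorname{Var}\!\left[\frac{1}{m}\sum_{i=1}^{m} g_i(x)\right]
= \underbrace{\frac{1}{m}\,\mathbb{E}_{D}\!\left[\operatorname{Var}\left[g_1(x)\mid D\right]\right]}_{\text{shrinks as } m \text{ grows}}
+ \underbrace{\operatorname{Var}_{D}\!\left[\mathbb{E}\left[g_1(x)\mid D\right]\right]}_{\text{unaffected by } m}
```

Only the first term benefits from more synthetic datasets, so the gain is largest when the conditional variance of the downstream predictor is high.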

翻訳日:2024-02-07 14:56:16 公開日:2024-02-06

# 緩和された仮定の下での確率的最適化におけるAdamの収束について

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions ( http://arxiv.org/abs/2402.03982v1 )

ライセンス: Link先を確認

Yusu Hong and Junhong Lin

(参考訳) Adaptive Momentum Estimation(Adam)アルゴリズムは、様々なディープラーニングタスクの訓練に非常に効果的である。それにもかかわらず、Adamの理論的理解は限られており、特に、勾配が非有界でありうる、アフィン分散ノイズを伴う非凸で滑らかな設定において、素の(vanilla)形式に注目した場合はなおさらである。本稿では、これらの困難な条件下でvanilla Adamを研究する。アフィン分散ノイズ、有界ノイズ、サブガウスノイズを包含する包括的なノイズモデルを導入する。この一般的なノイズモデルの下で、Adamが高い確率で$\mathcal{O}(\text{poly}(\log T)/\sqrt{T})$のレートで定常点を見つけられることを示す。ここで$T$は総反復回数であり、このレートは確率的一次アルゴリズムの下界のレートと対数因子を除いて一致する。さらに重要な点として、Adamは問題のパラメータに依存したステップサイズの調整を必要とせず、同じ条件下で確率的勾配降下法よりも優れた適応性を持つことを明らかにする。また、非有界な滑らかさパラメータを許容し、多くの実用的な目的関数の滑らかさをより正確に捉えることが実験的に示されている一般化された滑らかさ条件の下で、Adamの確率的収束結果を与える。

The Adaptive Momentum Estimation (Adam) algorithm is highly effective in training various deep learning tasks. Despite this, there is limited theoretical understanding of Adam, especially when focusing on its vanilla form in non-convex smooth scenarios with potential unbounded gradients and affine variance noise. In this paper, we study vanilla Adam under these challenging conditions. We introduce a comprehensive noise model which governs affine variance noise, bounded noise and sub-Gaussian noise. We show that Adam can find a stationary point with a $\mathcal{O}(\text{poly}(\log T)/\sqrt{T})$ rate in high probability under this general noise model where $T$ denotes the total number of iterations, matching the lower rate of stochastic first-order algorithms up to logarithm factors. More importantly, we reveal that Adam is free of tuning step-sizes with any problem-parameters, yielding a better adaptation property than the Stochastic Gradient Descent under the same conditions. We also provide a probabilistic convergence result for Adam under a generalized smooth condition which allows unbounded smoothness parameters and has been illustrated empirically to more accurately capture the smooth property of many practical objective functions.
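
For readers unfamiliar with the "vanilla" form analysed here, the following is a minimal sketch of the standard Adam update for a single scalar parameter, with the usual bias correction. The hyperparameter defaults and the toy objective $f(x) = x^2$ are assumptions chosen for illustration; the noise models studied in the paper are not simulated.

```python
import math
from typing import Tuple


def adam_step(
    x: float,
    grad: float,
    m: float,
    v: float,
    t: int,
    lr: float = 0.1,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[float, float, float]:
    """One vanilla Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v


# Minimise f(x) = x**2, whose gradient is 2 * x, starting from x = 1.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2.0 * x, m, v, t)
```

On the first step the update has magnitude close to `lr` regardless of the gradient scale, because the bias-corrected ratio is approximately the sign of the gradient. This scale invariance is part of why the step size needs less problem-specific tuning.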

翻訳日:2024-02-07 14:56:02 公開日:2024-02-06

# 拡散に基づく運動挙動予測のための制御可能な多様サンプリング

Controllable Diverse Sampling for Diffusion Based Motion Behavior Forecasting ( http://arxiv.org/abs/2402.03981v1 )

ライセンス: Link先を確認

Yiming Xu, Hao Cheng, Monika Sester

(参考訳) 自動運転タスクにおいて、複雑な交通環境での軌道予測には、実世界の文脈条件への適合と行動のマルチモーダル性が求められる。既存の手法は主に、事前の仮定や、キュレーションされたデータで訓練された生成モデルに頼って、シーンの制約に縛られた道路エージェントの確率的な振る舞いを学習している。しかし、これらはデータの不均衡や単純化された事前分布に起因するモード平均化の問題にしばしば直面し、不安定な訓練や単一の正解による教師信号に起因するモード崩壊に陥ることさえある。これらの問題により、既存の手法では予測の多様性とシーン制約への適合性が失われる。これらの課題に対処するため、Controllable Diffusion Trajectory(CDT)と呼ばれる新しい軌道生成器を導入する。CDTは、地図情報と社会的相互作用をTransformerベースの条件付きデノイジング拡散モデルに統合し、将来の軌道の予測を導く。マルチモーダル性を確保するため、直進、右折、左折といった軌道のモードを指示する行動トークンを組み込む。さらに、正確な軌道の予測を促進するため、予測された終点を代替の行動トークンとしてCDTモデルに組み込む。Argoverse 2ベンチマークでの広範な実験により、CDTが複雑な都市環境において多様でシーンに適合した軌道の生成に優れていることが示された。

In autonomous driving tasks, trajectory prediction in complex traffic environments requires adherence to real-world context conditions and behavior multimodalities. Existing methods predominantly rely on prior assumptions or generative models trained on curated data to learn road agents' stochastic behavior bounded by scene constraints. However, they often face mode averaging issues due to data imbalance and simplistic priors, and could even suffer from mode collapse due to unstable training and single ground truth supervision. These issues lead the existing methods to a loss of predictive diversity and adherence to the scene constraints. To address these challenges, we introduce a novel trajectory generator named Controllable Diffusion Trajectory (CDT), which integrates map information and social interactions into a Transformer-based conditional denoising diffusion model to guide the prediction of future trajectories. To ensure multimodality, we incorporate behavioral tokens to direct the trajectory's modes, such as going straight, turning right or left. Moreover, we incorporate the predicted endpoints as an alternative behavioral token into the CDT model to facilitate the prediction of accurate trajectories. Extensive experiments on the Argoverse 2 benchmark demonstrate that CDT excels in generating diverse and scene-compliant trajectories in complex urban settings.

翻訳日:2024-02-07 14:55:36 公開日:2024-02-06

# クロスエントロピー対ラベル平滑化: Neural Collapseの観点から

Cross Entropy versus Label Smoothing: A Neural Collapse Perspective ( http://arxiv.org/abs/2402.03979v1 )

ライセンス: Link先を確認

Li Guo, Keith Ross, Zifan Zhao, Andriopoulos George, Shuyang Ling, Yufeng Xu, Zixuan Dong

(参考訳) ラベル平滑化損失は、ディープニューラルネットワークの過学習を軽減するために広く採用されている手法である。本稿では、訓練の終盤におけるモデルの振る舞いを特徴付ける強力な経験的・理論的枠組みであるNeural Collapse(NC)の観点から、ラベル平滑化を研究する。まず、ラベル平滑化で訓練したモデルは、Neural Collapse解へより速く収束し、より強いレベルのNeural Collapseに達することを実験的に示す。さらに、NC1が同じレベルであるとき、ラベル平滑化損失の下でのモデルはより強いNC2を示すことを示す。これらの知見は、ラベル平滑化損失の下での性能上の利点とモデルキャリブレーションの向上について有益な洞察を与える。次に、unconstrained feature modelを用いて、両方の損失関数の大域的最小解の閉形式解を導出し、ラベル平滑化の下でのモデルは条件数がより小さく、したがって理論上より速く収束することを示す。実験的証拠と理論的結果を組み合わせた本研究は、ラベル平滑化損失とクロスエントロピー損失の違いについて詳細な洞察を与えるだけでなく、強力なNeural Collapseの枠組みがDNNの理解を深めるためにどのように使えるかを示す一例ともなる。

Label smoothing loss is a widely adopted technique to mitigate overfitting in deep neural networks. This paper studies label smoothing from the perspective of Neural Collapse (NC), a powerful empirical and theoretical framework which characterizes model behavior during the terminal phase of training. We first show empirically that models trained with label smoothing converge faster to neural collapse solutions and attain a stronger level of neural collapse. Additionally, we show that at the same level of NC1, models under label smoothing loss exhibit intensified NC2. These findings provide valuable insights into the performance benefits and enhanced model calibration under label smoothing loss. We then leverage the unconstrained feature model to derive closed-form solutions for the global minimizers for both loss functions and further demonstrate that models under label smoothing have a lower conditioning number and, therefore, theoretically converge faster. Our study, combining empirical evidence and theoretical results, not only provides nuanced insights into the differences between label smoothing and cross-entropy losses, but also serves as an example of how the powerful neural collapse framework can be used to improve our understanding of DNNs.
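
The two losses being compared differ only in the target distribution. The sketch below builds the usual label-smoothing target, $(1-\alpha)$ times the one-hot vector plus $\alpha/K$ on every class, and evaluates cross-entropy against it. The smoothing value 0.1 and the predicted probabilities are assumptions for illustration; setting `alpha` to 0 recovers plain cross-entropy.

```python
import math
from typing import List, Sequence


def smoothed_targets(true_class: int, num_classes: int, alpha: float) -> List[float]:
    """(1 - alpha) * one-hot + alpha / K, the usual label-smoothing target."""
    uniform = alpha / num_classes
    return [
        (1.0 - alpha) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]


def cross_entropy(targets: Sequence[float], probs: Sequence[float]) -> float:
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0.0)


probs = [0.7, 0.2, 0.1]                                          # hypothetical softmax output
hard_loss = cross_entropy(smoothed_targets(0, 3, 0.0), probs)    # plain cross-entropy
smooth_loss = cross_entropy(smoothed_targets(0, 3, 0.1), probs)  # label-smoothed loss
```

Because the smoothed target puts mass on every class, the loss penalises predictions that push the probabilities of the wrong classes toward zero, which is the mechanism usually credited for the calibration benefit.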

翻訳日:2024-02-07 14:55:15 公開日:2024-02-06

# マルチエージェント深層強化学習における協調探索のための共同内発的動機付け

Joint Intrinsic Motivation for Coordinated Exploration in Multi-Agent Deep Reinforcement Learning ( http://arxiv.org/abs/2402.03972v1 )

ライセンス: Link先を確認

Maxime Toquebiau, Nicolas Bredeche, Faïz Benamar, Jae-Yun Jun

(参考訳) マルチエージェント深層強化学習(MADRL)の問題は、しばしば疎な報酬という課題に直面する。エージェント間の協調が必要な場合、この課題はさらに顕著になる。性能は1つのエージェントの行動だけでなく複数のエージェントの共同行動に依存するため、適切な解を見つけることは著しく難しくなる。このような状況では、エージェントの集団は、最も効率的な戦略を見極めるために、様々な共同戦略を積極的に探索することで利益を得られる。本稿では、エージェントが集団として新規な行動を示す戦略に報酬を与えるアプローチを提案する。集中学習・分散実行のパラダイムに従うマルチエージェント内発的動機付け手法であるJIM(Joint Intrinsic Motivation)を提示する。JIMは、連続的な環境で機能するように設計された新規性の集中的な尺度に基づいて、共同軌道に報酬を与える。最先端のMADRL手法の欠点を明らかにするために設計した合成環境と、シミュレーションされたロボットタスクの両方で、このアプローチの強みを示す。その結果、最適戦略が高度な協調を必要とするタスクを解くには、共同探索が不可欠であることが示された。

Multi-agent deep reinforcement learning (MADRL) problems often encounter the challenge of sparse rewards. This challenge becomes even more pronounced when coordination among agents is necessary. As performance depends not only on one agent's behavior but rather on the joint behavior of multiple agents, finding an adequate solution becomes significantly harder. In this context, a group of agents can benefit from actively exploring different joint strategies in order to determine the most efficient one. In this paper, we propose an approach for rewarding strategies where agents collectively exhibit novel behaviors. We present JIM (Joint Intrinsic Motivation), a multi-agent intrinsic motivation method that follows the centralized learning with decentralized execution paradigm. JIM rewards joint trajectories based on a centralized measure of novelty designed to function in continuous environments. We demonstrate the strengths of this approach both in a synthetic environment designed to reveal shortcomings of state-of-the-art MADRL methods, and in simulated robotic tasks. Results show that joint exploration is crucial for solving tasks where the optimal strategy requires a high level of coordination.

翻訳日:2024-02-07 14:54:49 公開日:2024-02-06

# 表形式データ: Attentionさえあればよいのか?

Tabular Data: Is Attention All You Need? ( http://arxiv.org/abs/2402.03970v1 )

ライセンス: Link先を確認

Guri Zabërgja, Arlind Kadra, Josif Grabocka

(参考訳) ディープラーニングはAIの分野に革命をもたらし、画像やテキストデータを扱う応用において顕著な成果を上げてきた。しかし、構造化された表形式データに対するニューラルネットワークの利点については、決定的な証拠がない。本稿では、表形式データ上で、ニューラルネットワークを勾配ブースティング決定木と比較するとともに、Transformerベースのアーキテクチャを残差接続を持つ従来の多層パーセプトロン(MLP)と比較する大規模な実証研究を行う。先行研究とは対照的に、我々の実験結果は、ニューラルネットワークが決定木に対して競争力を持つことを示している。さらに、表形式データセットにおいて、Transformerベースのアーキテクチャは、従来のMLPアーキテクチャのより単純な変種を上回らないと評価する。その結果として本稿は、将来の表形式データの応用にニューラルネットワークを導入する際に、研究者や実務家のコミュニティが十分な情報に基づいた選択を行う助けとなる。

Deep Learning has revolutionized the field of AI and led to remarkable achievements in applications involving image and text data. Unfortunately, there is inconclusive evidence on the merits of neural networks for structured tabular data. In this paper, we introduce a large-scale empirical study comparing neural networks against gradient-boosted decision trees on tabular data, but also transformer-based architectures against traditional multi-layer perceptrons (MLP) with residual connections. In contrast to prior work, our empirical findings indicate that neural networks are competitive against decision trees. Furthermore, we assess that transformer-based architectures do not outperform simpler variants of traditional MLP architectures on tabular datasets. As a result, this paper helps the research and practitioner communities make informed choices on deploying neural networks on future tabular data applications.

翻訳日:2024-02-07 14:54:35 公開日:2024-02-06

# 文脈内学習エージェントは非対称信念更新者である

In-context learning agents are asymmetric belief updaters ( http://arxiv.org/abs/2402.03969v1 )

ライセンス: Link先を確認

Johannes A. Schubert, Akshay K. Jagadish, Marcel Binz, Eric Schulz

(参考訳) 認知心理学から適応した3つの道具的学習課題を用いて,大規模言語モデル(LLM)の文脈内学習ダイナミクスについて検討した。 LLMは非対称な方法で信念を更新し、期待より悪い結果よりも期待より良い結果からより多くを学ぶ。さらに,反事実フィードバックについて学習する場合にはこの効果が逆転し,主体性(agency)が含意されない場合には消失することを示した。メタ強化学習から得られた理想化された文脈内学習エージェントを調べ,類似したパターンを観察することで,これらの知見を裏付ける。本研究の結果は,問題のフレーミングが学習の仕方に大きく影響することを明らかにし,文脈内学習の仕組みの理解に寄与する。これは人間の認知にも見られる現象である。

We study the in-context learning dynamics of large language models (LLMs) using three instrumental learning tasks adapted from cognitive psychology. We find that LLMs update their beliefs in an asymmetric manner and learn more from better-than-expected outcomes than from worse-than-expected ones. Furthermore, we show that this effect reverses when learning about counterfactual feedback and disappears when no agency is implied. We corroborate these findings by investigating idealized in-context learning agents derived through meta-reinforcement learning, where we observe similar patterns. Taken together, our results contribute to our understanding of how in-context learning works by highlighting that the framing of a problem significantly influences how learning occurs, a phenomenon also observed in human cognition.

翻訳日:2024-02-07 14:54:22 公開日:2024-02-06

# MPNNにおける特徴ベクトルの次元性について

On dimensionality of feature vectors in MPNNs ( http://arxiv.org/abs/2402.03966v1 )

ライセンス: Link先を確認

César Bravo, Alexander Kozachinskiy, Cristóbal Rojas

(参考訳) メッセージパッシンググラフニューラルネットワーク(MPNN)は識別能力においてWeisfeiler-Leman (WL) 同型テストと等しい、というMorrisら(AAAI'19)の古典的結果を再検討する。 Morrisらは、ReLU活性化関数と$O(n)$次元の特徴ベクトルを用いてシミュレーション結果を示した。ここで$n$はグラフのノード数である。最近、アーキテクチャにランダム性を導入することで、Aamandら(NeurIPS'22)はこの上界を$O(\log n)$次元の特徴ベクトルに改善したが、完全なシミュレーションは高い確率でしか保証されない。これらすべての構成において、WLテストとの等価性を保証するには、MPNNにおける特徴ベクトルの次元をグラフのサイズとともに増加させる必要がある。しかし、実際に使われるアーキテクチャは定数次元の特徴ベクトルを持つ。したがって、これらの結果が与える保証と、実際に使用されるアーキテクチャの特性との間にはギャップがある。本稿では、(sigmoidのような)任意の非多項式の解析的活性化関数に対して、MPNNがWLテストと等価であることを保証するには、グラフのサイズとは無関係に、次元$d=1$の特徴ベクトルで十分であることを示し、このギャップを閉じる。我々の主要な技術的洞察は、WLテストにおける多重集合をシミュレートするには、実数上ではなく有理数上での特徴ベクトルの線形独立性を利用するだけで十分であるということである。有理数の集合の可算性と解析関数の優れた性質により、特徴ベクトルの次元を増大させることなく、WLテストの反復を通じてシミュレーションの不変条件を維持できる。

We revisit the classical result of Morris et al. (AAAI'19) that message-passing graph neural networks (MPNNs) are equal in their distinguishing power to the Weisfeiler-Leman (WL) isomorphism test. Morris et al. show their simulation result with the ReLU activation function and $O(n)$-dimensional feature vectors, where $n$ is the number of nodes of the graph. Recently, by introducing randomness into the architecture, Aamand et al. (NeurIPS'22) were able to improve this bound to $O(\log n)$-dimensional feature vectors, although at the expense of guaranteeing perfect simulation only with high probability. In all these constructions, to guarantee equivalence to the WL test, the dimension of feature vectors in the MPNN has to increase with the size of the graphs. However, architectures used in practice have feature vectors of constant dimension. Thus, there is a gap between the guarantees provided by these results and the actual characteristics of architectures used in practice. In this paper we close this gap by showing that, for any non-polynomial analytic activation function (like the sigmoid), to guarantee that MPNNs are equivalent to the WL test, feature vectors of dimension $d=1$ are all we need, independently of the size of the graphs. Our main technical insight is that for simulating multi-sets in the WL test, it is enough to use linear independence of feature vectors over rationals instead of reals. Countability of the set of rationals together with nice properties of analytic functions allow us to carry out the simulation invariant over the iterations of the WL test without increasing the dimension of the feature vectors.

翻訳日:2024-02-07 14:54:07 公開日:2024-02-06
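
For readers unfamiliar with the WL test that the abstract above refers to, the following is a minimal sketch of 1-WL colour refinement. It is an illustration only, not code from the paper: the function names `wl_refine` and `wl_distinguishes` and the adjacency-dict graph format are assumptions made here. The multiset of neighbour colours is the object that, per the abstract, an MPNN with one-dimensional features can simulate.

```python
from collections import Counter

def wl_refine(adj):
    """1-WL colour refinement. adj maps each node to a list of its neighbours."""
    colors = {v: 0 for v in adj}  # start with one uniform colour
    for _ in range(len(adj)):     # at most n rounds are needed to stabilise
        # Signature = own colour plus the sorted multiset of neighbour colours.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        # Relabel distinct signatures with small integers.
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        refined = {v: palette[sigs[v]] for v in adj}
        if refined == colors:     # partition is stable
            break
        colors = refined
    return colors

def wl_distinguishes(adj_a, adj_b):
    """True if 1-WL tells the two graphs apart (hypothetical helper)."""
    # Refine both graphs jointly so that colour labels are comparable.
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in adj_a.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in adj_b.items()})
    colors = wl_refine(union)
    hist_a = Counter(c for (side, _), c in colors.items() if side == "a")
    hist_b = Counter(c for (side, _), c in colors.items() if side == "b")
    return hist_a != hist_b

# A 6-cycle and two disjoint triangles are both 2-regular, so 1-WL cannot separate them.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A path and a star on four nodes have different degree sequences.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

Here `wl_distinguishes(cycle6, triangles)` is false while `wl_distinguishes(path4, star4)` is true, which shows both a failure case and a success case of the test.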

# ポジションペーパー:偽りの火花に抗して、誇張されたAIの主張をしぼませる

Position Paper: Against Spurious Sparks-Dovelating Inflated AI Claims ( http://arxiv.org/abs/2402.03962v1 )

ライセンス: Link先を確認

Patrick Altmeyer, Andrew M. Demetriou, Antony Bartlett, Cynthia C. S. Liem

(参考訳) 人間は周囲の物体に「人間」のような性質を見る傾向がある。私たちは車に名前を付け、ペットや家電製品にさえ、まるで他の人間と同じように理解してくれるかのように話しかける。この行動は擬人化と呼ばれ、機械学習(ML)でも広がりを見せており、大規模言語モデル(LLM)に人間のような知性が認められると主張されている。本ポジションペーパーでは,職業上のインセンティブ,人間のバイアス,一般的な方法論の設定を考慮し,現在の汎用人工知能(AGI)の探求が,人間的な性質をLLMに過剰に帰属させる完璧な嵐(パーフェクトストーム)となっていることを論じる。いくつかの実験により、潜在空間において人間が解釈可能なパターンが見つかることは驚くべき結果ではないはずだと示す。また,メディアにおける一般的なAIの描かれ方も考慮し,学術コミュニティに対して,AI研究成果の解釈と伝達において特に慎重になり、学術的誠実性の原則を特に意識するよう呼びかける。

Humans have a tendency to see 'human'-like qualities in objects around them. We name our cars, and talk to pets and even household appliances, as if they could understand us as other humans do. This behavior, called anthropomorphism, is also seeing traction in Machine Learning (ML), where human-like intelligence is claimed to be perceived in Large Language Models (LLMs). In this position paper, considering professional incentives, human biases, and general methodological setups, we discuss how the current search for Artificial General Intelligence (AGI) is a perfect storm for over-attributing human-like qualities to LLMs. In several experiments, we demonstrate that the discovery of human-interpretable patterns in latent spaces should not be a surprising outcome. Also in consideration of common AI portrayal in the media, we call for the academic community to exercise extra caution, and to be extra aware of principles of academic integrity, in interpreting and communicating about AI research outcomes.

翻訳日:2024-02-07 14:53:34 公開日:2024-02-06

# 細胞オートマトンにおける自己再生と進化 : evoloops後25年

Self-Reproduction and Evolution in Cellular Automata: 25 Years after Evoloops ( http://arxiv.org/abs/2402.03961v1 )

ライセンス: Link先を確認

Hiroki Sayama and Chrystopher L. Nehaniv

(参考訳) 2024年は、クリス・ラングトンの自己再生ループの進化的変種であるevoloopsの出版25周年であり、変動と自然選択による自己再生生物のダーウィン進化が決定論的細胞オートマトン内で可能であることを証明した。過去数十年間、この人工生命の研究はいくつかの重要な発展を遂げてきた。しばらくの間、活動は比較的休眠状態にあったが、近年のオープンエンド進化への関心の高まりと連続セルオートマトンモデルの成功は、空間的に分散された計算媒体の中で時空間パターンを自己複製し進化させる方法に研究者の注意を呼び戻した。本稿は、過去25年間のこのトピックに関する関連文献のレビューを行い、これまでの主な成果、直面している課題、将来的な研究の方向性について紹介する。

The year 2024 marks the 25th anniversary of the publication of evoloops, an evolutionary variant of Chris Langton's self-reproducing loops which proved that Darwinian evolution of self-reproducing organisms by variation and natural selection is possible within deterministic cellular automata. Over the last few decades, this line of Artificial Life research has undergone several important developments. Although it experienced a relative dormancy of activities for a while, the recent rise of interest in open-ended evolution and the success of continuous cellular automata models have brought researchers' attention back to how to make spatio-temporal patterns self-reproduce and evolve within spatially distributed computational media. This article provides a review of the relevant literature on this topic over the past 25 years and highlights the major accomplishments made so far, the challenges being faced, and promising future research directions.

翻訳日:2024-02-07 14:53:15 公開日:2024-02-06

# 手続き的文書に対するスパースグラフ表現

Sparse Graph Representations for Procedural Instructional Documents ( http://arxiv.org/abs/2402.03957v1 )

ライセンス: Link先を確認

Shruti Singh and Rishabh Gupta

(参考訳) 文書類似性の計算は、重複排除、マッチング、レコメンデーションに応用される様々なNLPドメインにおいて重要なタスクである。文書類似性計算の従来の手法には、文書の表現の学習や、埋め込み上の類似性や距離関数の利用が含まれる。しかし、ペア間の類似点と相違点は個々の表現では効率的に捉えられない。 JCIG(Joint Concept Interaction Graph)のようなグラフ表現は、文書のペアを1つの無向重み付きグラフとして表現する。 JCIGは文書ペアをグラフとして解釈可能な形で表現できる。しかし、JCIGは無向であり、文書内の文の逐次的な流れを考慮しない。本稿では,逐次情報を取り込んだ有向かつスパースなJCIGとして文書ペアを表現することで文書の類似性をモデル化する2つの手法を提案する。無向エッジを有向エッジに置き換える、Supergenome SortingとHamiltonian Pathに着想を得た2つのアルゴリズムを提案する。我々のアプローチはまた、グラフをJCIGの最悪ケースである$O(n^2)$から$O(n)$本のエッジへとスパース化する。 SiameseエンコーダとGCNからなるスパース有向グラフモデルアーキテクチャが、逐次情報を含まないデータセットではベースラインに匹敵する結果を達成し、逐次情報を含む手順書データセットではベースラインを10ポイント上回ることを示す。

Computation of document similarity is a critical task in various NLP domains that has applications in deduplication, matching, and recommendation. Traditional approaches for document similarity computation include learning representations of documents and employing a similarity or a distance function over the embeddings. However, pairwise similarities and differences are not efficiently captured by individual representations. Graph representations such as Joint Concept Interaction Graph (JCIG) represent a pair of documents as a joint undirected weighted graph. JCIGs facilitate an interpretable representation of document pairs as a graph. However, JCIGs are undirected, and don't consider the sequential flow of sentences in documents. We propose two approaches to model document similarity by representing document pairs as a directed and sparse JCIG that incorporates sequential information. We propose two algorithms inspired by Supergenome Sorting and Hamiltonian Path that replace the undirected edges with directed edges. Our approach also sparsifies the graph to $O(n)$ edges from JCIG's worst case of $O(n^2)$. We show that our sparse directed graph model architecture consisting of a Siamese encoder and GCN achieves comparable results to the baseline on datasets not containing sequential information and beats the baseline by ten points on an instructional documents dataset containing sequential information.

翻訳日:2024-02-07 14:52:57 公開日:2024-02-06
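
The edge-count reduction from $O(n^2)$ to $O(n)$ mentioned in the abstract above can be illustrated with a toy example. This is not the paper's Supergenome Sorting or Hamiltonian Path algorithm, whose details the abstract does not give. It is a hedged sketch that assumes each concept node comes with the index of the first sentence mentioning it, and simply chains the nodes in that order; the function names and the recipe data are hypothetical.

```python
from itertools import combinations

def dense_undirected_edges(nodes):
    """All unordered node pairs: n*(n-1)/2 edges, the O(n^2) worst case."""
    return list(combinations(nodes, 2))

def sparse_directed_path(first_position):
    """Chain nodes in order of first mention: n-1 directed edges.

    first_position maps node -> index of the first sentence that mentions it
    (an assumed input format, chosen here for illustration).
    """
    order = sorted(first_position, key=lambda n: (first_position[n], n))
    return list(zip(order, order[1:]))

# Hypothetical concepts from a recipe-like instructional document.
positions = {"mix": 2, "preheat": 0, "bake": 5, "serve": 7}
dense = dense_undirected_edges(list(positions))
sparse = sparse_directed_path(positions)
```

With four concepts the dense graph has six undirected edges, while the directed chain keeps three and records that "preheat" precedes "mix", which precedes "bake" and then "serve".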

# 不均一な欠損下での複合サーベイサンプリングにおける混合マトリックス補完

Mixed Matrix Completion in Complex Survey Sampling under Heterogeneous Missingness ( http://arxiv.org/abs/2402.03954v1 )

ライセンス: Link先を確認

Xiaojun Mao, Hengfang Wang, Zhonglei Wang and Shu Yang

(参考訳) サンプルサイズが大きく、混合型の質問項目が増え続ける現代の調査は、堅牢でスケーラブルな分析方法を必要とする。本研究では,複合調査サンプリングにより得られ、各エントリが異なる正準指数型分布に従い、不均一な欠測を伴う混合データフレーム行列の復元を考える。この困難な課題に対処するため,第1段階ではロジスティック回帰によりエントリごとの欠測機構をモデル化し,第2段階では低ランク制約のもとで重み付き対数尤度を最大化して目標パラメータ行列を補完する2段階の手順を提案する。劣線形収束を達成する高速でスケーラブルな推定アルゴリズムを提案し,提案手法の推定誤差の上界を厳密に導出する。実験結果は理論的主張を裏付け,提案する推定量は既存手法と比べて優位性を示す。提案手法を全国健康栄養調査(National Health and Nutrition Examination Survey)データの分析に適用する。

Modern surveys with large sample sizes and growing mixed-type questionnaires require robust and scalable analysis methods. In this work, we consider recovering a mixed dataframe matrix, obtained by complex survey sampling, with entries following different canonical exponential distributions and subject to heterogeneous missingness. To tackle this challenging task, we propose a two-stage procedure: in the first stage, we model the entry-wise missing mechanism by logistic regression, and in the second stage, we complete the target parameter matrix by maximizing a weighted log-likelihood with a low-rank constraint. We propose a fast and scalable estimation algorithm that achieves sublinear convergence, and the upper bound for the estimation error of the proposed method is rigorously derived. Experimental results support our theoretical claims, and the proposed estimator shows its merits compared to other existing methods. The proposed method is applied to analyze the National Health and Nutrition Examination Survey data.

翻訳日:2024-02-07 14:52:36 公開日:2024-02-06

# 変形制約付きワーピングによるモデル属間の敵対的転移可能性の向上

Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping ( http://arxiv.org/abs/2402.03951v1 )

ライセンス: Link先を確認

Qinliang Lin, Cheng Luo, Zenghao Niu, Xilin He, Weicheng Xie, Yuanbo Hou, Linlin Shen, Siyang Song

(参考訳) 代理モデルによって生成された敵対的サンプルは、典型的には未知のターゲットシステムへの転移可能性が限られている。この問題に対処するために,多くの転移可能性向上手法(入力変換やモデル拡張など)が提案されている。しかし,代理モデルと異なるモデル属(model genus)を持つシステムへの攻撃では,性能が低い。本稿では,モデル属をまたぐ攻撃に効果的に適用できる,変形制約付きワーピング攻撃 (DeCoWA) と呼ばれる新規で汎用的な攻撃戦略を提案する。具体的には、DeCoWAはまず、弾性変形である変形制約付きワーピング(DeCoW)によって入力例を拡張し、拡張入力の豊富な局所的詳細を得る。ランダムな変形によってグローバルな意味が大きく歪むことを避けるため、DeCoWはさらに、新たな適応制御戦略によりワーピング変換の強さと方向を制約する。広範な実験により、CNNの代理モデル上でDeCoWAが生成した転移可能なサンプルは、画像分類、ビデオ行動認識、音声認識など、様々なタスクにおいてトランスフォーマーの性能を著しく低下させる(逆も同様)ことが示された。コードはhttps://github.com/LinQinLiang/DeCoWAで入手できる。

Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems. To address this problem, many transferability enhancement approaches (e.g., input transformation and model augmentation) have been proposed. However, they perform poorly when attacking systems whose model genus differs from that of the surrogate model. In this paper, we propose a novel and generic attacking strategy, called Deformation-Constrained Warping Attack (DeCoWA), that can be effectively applied to cross model genus attack. Specifically, DeCoWA first augments input examples via an elastic deformation, namely Deformation-Constrained Warping (DeCoW), to obtain rich local details of the augmented input. To avoid severe distortion of global semantics caused by random deformation, DeCoW further constrains the strength and direction of the warping transformation by a novel adaptive control strategy. Extensive experiments demonstrate that the transferable examples crafted by our DeCoWA on CNN surrogates can significantly hinder the performance of Transformers (and vice versa) on various tasks, including image classification, video action recognition, and audio recognition. Code is made available at https://github.com/LinQinLiang/DeCoWA.

翻訳日:2024-02-07 14:52:18 公開日:2024-02-06

# カーネルパケットの一般理論:状態空間モデルからコンパクト支持基底へ

A General Theory for Kernel Packets: from state space model to compactly supported basis ( http://arxiv.org/abs/2402.04022v1 )

ライセンス: Link先を確認

Liang Ding and Tuo Rui

(参考訳) ガウス過程 (GP) の状態空間 (SS) モデルによる定式化が、n個のデータ点に対する訓練時間と予測時間をともにO(n)に短縮できることはよく知られている。我々は、GPの$m$次元SSモデル定式化が、一般右カーネルパケット (KP) として導入する概念と等価であることを証明する。これはGPの共分散関数$K$に対する変換であり、任意の$t \leq t_1$、$0 \leq j \leq m-1$、および$m+1$個の連続する点$t_i$に対して$\sum_{i=0}^{m}a_iD_t^{(j)}K(t,t_i)=0$が成り立つ。ここで${D}_t^{(j)}f(t)$は$t$に作用する$j$階微分を表す。このアイデアをGPの後ろ向きSSモデル定式化に拡張すると、次の$m$個の連続点に対する左KPの概念が得られる: 任意の$t\geq t_{2m}$に対して$\sum_{i=0}^{m}b_i{D}_t^{(j)}K(t,t_{m+i})=0$。左右のKPを組み合わせることで、これらの共分散関数の適切な線形結合が$m$個のコンパクトな台を持つKP関数を与えることを証明できる: 任意の$t\not\in(t_0,t_{2m})$と$j=0,\cdots,m-1$に対して$\phi^{(j)}(t)=0$。 KPはGPの予測時間をさらにO(log n)、あるいはO(1)にまで短縮し、GPの微分を含むより一般的な問題にも適用できる。

It is well known that the state space (SS) model formulation of a Gaussian process (GP) can lower its training and prediction time both to O(n) for n data points. We prove that an $m$-dimensional SS model formulation of GP is equivalent to a concept we introduce as the general right Kernel Packet (KP): a transformation for the GP covariance function $K$ such that $\sum_{i=0}^{m}a_iD_t^{(j)}K(t,t_i)=0$ holds for any $t \leq t_1$, $0 \leq j \leq m-1$, and $m+1$ consecutive points $t_i$, where ${D}_t^{(j)}f(t)$ denotes the $j$-th order derivative acting on $t$. We extend this idea to the backward SS model formulation of the GP, leading to the concept of the left KP for the next $m$ consecutive points: $\sum_{i=0}^{m}b_i{D}_t^{(j)}K(t,t_{m+i})=0$ for any $t\geq t_{2m}$. By combining both left and right KPs, we can prove that a suitable linear combination of these covariance functions yields $m$ compactly supported KP functions: $\phi^{(j)}(t)=0$ for any $t\not\in(t_0,t_{2m})$ and $j=0,\cdots,m-1$. KPs further reduce the prediction time of GP to O(log n) or even O(1) and can be applied to more general problems involving the derivative of GPs.

翻訳日:2024-02-07 14:45:01 公開日:2024-02-06
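
The kernel packet identities in the abstract above can be checked numerically in the simplest case. The sketch below assumes the exponential kernel $K(t,s)=e^{-|t-s|}$ with $m=1$, so only $j=0$ is needed. This choice, and the helper names `right_kp_coeffs`, `compact_kp_coeffs` and `combo`, are assumptions made for illustration and do not come from the paper. It also assumes the right-KP condition is evaluated for $t$ at or below the smallest point.

```python
import math

def k_exp(t, s):
    """Exponential kernel K(t, s) = exp(-|t - s|) (assumed example kernel)."""
    return math.exp(-abs(t - s))

def right_kp_coeffs(t0, t1):
    """Coefficients with a0*K(t,t0) + a1*K(t,t1) = 0 for every t <= t0 < t1."""
    return (1.0, -math.exp(t1 - t0))

def compact_kp_coeffs(t0, t1, t2):
    """Coefficients (c0, 1, c2) whose combination vanishes outside (t0, t2).

    Left of t0 the combination is e^t * sum(c_i e^{-t_i}); right of t2 it is
    e^{-t} * sum(c_i e^{t_i}). Setting both sums to zero with c1 = 1 gives a
    2x2 linear system, solved here by Cramer's rule.
    """
    a, b, r1 = math.exp(-t0), math.exp(-t2), -math.exp(-t1)
    c, d, r2 = math.exp(t0), math.exp(t2), -math.exp(t1)
    det = a * d - b * c
    return ((r1 * d - b * r2) / det, 1.0, (a * r2 - c * r1) / det)

def combo(coeffs, points, t):
    """Evaluate sum_i coeffs[i] * K(t, points[i])."""
    return sum(c * k_exp(t, p) for c, p in zip(coeffs, points))

right_pts = (0.5, 1.5)
right = right_kp_coeffs(*right_pts)
pts = (0.0, 1.0, 2.0)
phi = compact_kp_coeffs(*pts)
```

Evaluating `combo(phi, pts, t)` gives zero (up to rounding) for `t` outside the open interval (0, 2) and a positive value at `t = 1`, matching the compact support on $(t_0, t_{2m})$ stated in the abstract for $m=1$.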

# DNNにおけるプライバシ漏洩: モデル反転攻撃と防御に関する調査

Privacy Leakage on DNNs: A Survey of Model Inversion Attacks and Defenses ( http://arxiv.org/abs/2402.04013v1 )

ライセンス: Link先を確認

Hao Fang and Yixiang Qiu and Hongyao Yu and Wenbo Yu and Jiawei Kong and Baoli Chong and Bin Chen and Xuan Wang and Shu-Tao Xia

(参考訳) モデル反転(Model Inversion, MI)攻撃は、事前訓練されたモデルへのアクセスを悪用することで、トレーニングデータに関するプライベート情報を暴露することを目的としている。これらの攻撃により、敵対者はプライベートなトレーニングデータと密接に一致する高忠実度データを再構築でき、プライバシーに関する重大な懸念が生じている。この分野の急速な進歩にもかかわらず、既存のMI攻撃と防御の包括的な概要は欠如している。このギャップを埋めるため,本稿ではこの分野を徹底的に調査し,総合的なサーベイを提示する。まず、機械学習のシナリオにおける従来のMIについて簡単にレビューする。次に,複数のモダリティと学習タスクにまたがる、ディープニューラルネットワーク(DNN)に対する近年の多数の攻撃と防御を詳細に分析・比較する。

Model Inversion (MI) attacks aim to disclose private information about the training data by abusing access to the pre-trained models. These attacks enable adversaries to reconstruct high-fidelity data that closely aligns with the private training data, which has raised significant privacy concerns. Despite the rapid advances in the field, we lack a comprehensive overview of existing MI attacks and defenses. To fill this gap, this paper thoroughly investigates this field and presents a holistic survey. Firstly, our work briefly reviews the traditional MI on machine learning scenarios. We then elaborately analyze and compare numerous recent attacks and defenses on Deep Neural Networks (DNNs) across multiple modalities and learning tasks.

翻訳日:2024-02-07 14:43:54 公開日:2024-02-06

# 教師あり学習とコントラスト学習の両方に対して同時に有効な効率的アベイラビリティ攻撃

Efficient Availability Attacks against Supervised and Contrastive Learning Simultaneously ( http://arxiv.org/abs/2402.04010v1 )

ライセンス: Link先を確認

Yihan Wang and Yifan Zhu and Xiao-Shan Gao

(参考訳) アベイラビリティ攻撃は、知覚できないノイズを生成し、公開前に学習不能な例(unlearnable examples)を作ることで、プライベートデータや商用データセットの不正使用を防止できる。理想的には、得られた学習不能性によって、アルゴリズムは使用可能なモデルをトレーニングできなくなる。教師あり学習(SL)アルゴリズムが失敗した場合、悪意のあるデータ収集者は、保護を回避するためにコントラスト学習(CL)アルゴリズムに頼る可能性がある。評価の結果,既存の手法のほとんどは,教師あり学習とコントラスト学習の両方に対する学習不能性を達成できず、データ保護にリスクをもたらすことがわかった。コントラスト誤差最小化に基づく最近の手法とは異なり、教師あり誤差の最小化または最大化のフレームワークにおいてコントラスト学習風のデータ拡張を用いることで、SLとCLの両方に有効な攻撃を得る。提案するAUEおよびAAP攻撃は,より少ない計算量で、SLおよびCLアルゴリズムにわたって最先端の最悪ケース学習不能性を達成し、実世界のアプリケーションにおける可能性を示す。

Availability attacks can prevent the unauthorized use of private data and commercial datasets by generating imperceptible noise and making unlearnable examples before release. Ideally, the obtained unlearnability prevents algorithms from training usable models. When supervised learning (SL) algorithms have failed, a malicious data collector possibly resorts to contrastive learning (CL) algorithms to bypass the protection. Through evaluation, we have found that most of the existing methods are unable to achieve both supervised and contrastive unlearnability, which poses risks to data protection. Different from recent methods based on contrastive error minimization, we employ contrastive-like data augmentations in supervised error minimization or maximization frameworks to obtain attacks effective for both SL and CL. Our proposed AUE and AAP attacks achieve state-of-the-art worst-case unlearnability across SL and CL algorithms with less computation consumption, showcasing prospects in real-world applications.

翻訳日:2024-02-07 14:43:22 公開日:2024-02-06

# パラメータ効率の良いファインチューニングのための低ランクアテンションサイドチューニング

Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning ( http://arxiv.org/abs/2402.04009v1 )

ライセンス: Link先を確認

Ningyuan Tang, Minghao Fu, Ke Zhu, Jianxin Wu

(参考訳) 大規模な事前訓練済みモデルを下流タスクに微調整する場合、パラメータ効率の良い微調整(PEFT)手法は、少ない訓練可能パラメータで事前訓練済みモデルを効果的に微調整できるが、GPUメモリ消費が多く、訓練速度が遅いという問題がある。これらの手法の学習可能パラメータは事前訓練済みモデルと絡み合っているため、凍結された事前訓練済みモデルのパラメータに関連する勾配を微調整中に計算し保存する必要がある。本稿では,パラメータだけでなく事前訓練済みネットワークの出力も凍結することで、訓練可能なモジュールを事前訓練済みモデルから切り離す低ランク注意サイドチューニング(LAST)を提案する。 LASTは低ランクの自己注意モジュールのみで構成されるサイドネットワークを訓練する。事前訓練済みモデルを凍結した特徴抽出器とみなすことで、サイドネットワークは事前訓練済みモデルから中間出力を受け取り、タスク固有の知識の学習に集中する。また、LASTは複数の最適化目標にわたって高度に並列化でき、例えば最適なハイパーパラメータの探索など、下流タスクへの適応において非常に効率的であることを示す。 LASTは、VTAB-1Kや他の視覚適応タスクにおいて、既存のPEFT法と比べておよそ30%のGPUメモリフットプリントと60%のトレーニング時間しか必要とせずに、従来の最先端手法を上回り、大幅に高い精度を達成する。

In finetuning a large pretrained model to downstream tasks, parameter-efficient fine-tuning (PEFT) methods can effectively finetune pretrained models with few trainable parameters, but suffer from high GPU memory consumption and slow training speed. Because learnable parameters from these methods are entangled with the pretrained model, gradients related to the frozen pretrained model's parameters have to be computed and stored during finetuning. We propose Low-rank Attention Side-Tuning (LAST), which disentangles the trainable module from the pretrained model by freezing not only parameters but also outputs of the pretrained network. LAST trains a side-network composed of only low-rank self-attention modules. By viewing the pretrained model as a frozen feature extractor, the side-network takes intermediate output from the pretrained model and focuses on learning task-specific knowledge. We also show that LAST can be highly parallel across multiple optimization objectives, making it very efficient in downstream task adaptation, for example, in finding optimal hyperparameters. LAST outperforms previous state-of-the-art methods on VTAB-1K and other visual adaptation tasks, achieving significantly higher accuracy with roughly only 30% of the GPU memory footprint and 60% of the training time of existing PEFT methods.

翻訳日:2024-02-07 14:43:03 公開日:2024-02-06

# アルゴリズム的思考連鎖を用いたllm学習データにおける雑音の影響の理解

Understanding the Effect of Noise in LLM Training Data with Algorithmic Chains of Thought ( http://arxiv.org/abs/2402.04004v1 )

ライセンス: Link先を確認

Alex Havrilla, Maia Iyer

(参考訳) 事前トレーニングと微調整の両方において、大規模言語モデル(LLM)は、品質が大きく異なる数兆トークンのテキストで訓練される。どちらの段階でも通常、「低品質」すなわちノイズの多いトレーニングサンプルをヒューリスティックにフィルタリングするが、ノイズの種類や強度が下流の性能にどう影響するかは、定量的にはほとんど知られていない。本研究では,アルゴリズム的に解けるタスクという高度に制御された設定において,思考連鎖(CoT)中のノイズがタスク性能に与える影響について検討する。まず、整数リスト上の任意の算術関数に対して、高度にカスタマイズ可能なノイズ付き実行トレースを生成するTraced Integer(TInt)フレームワークを開発する。次に2種類のノイズを定義する: 静的(static)ノイズはCoTトレースが計算された後に適用される局所的なノイズであり、動的(dynamic)ノイズはトレースの計算中に誤差を伝播させる大域的なノイズである。次に,データセットの汚染度と強度をさまざまに変えたノイズ付きデータセットを用いて、プロンプトおよび微調整を行った事前学習済みモデルのテスト性能を評価する。微調整されたモデルは、高レベルの静的ノイズに対して非常に頑健であるが、より低レベルの動的ノイズにははるかに苦戦する。対照的に、few-shotプロンプトを用いたモデルは静的ノイズに対してもより敏感であるように見える。最後に,この結果がノイズフィルタリングのベストプラクティスにどう影響するかを議論し,特に大域的誤差を伴う破壊的な動的ノイズを含むサンプルを除去することの重要性を強調する。

During both pretraining and fine-tuning, Large Language Models (LLMs) are trained on trillions of tokens of text of widely varying quality. Both phases of training typically involve heuristically filtering out "low-quality" or noisy training samples, yet little is known quantitatively about how the type or intensity of noise affects downstream performance. In this work, we study how noise in chain of thought (CoT) impacts task performance in the highly-controlled setting of algorithmically solvable tasks. First, we develop the Traced Integer (TInt) framework to generate highly customizable noised execution traces for any arithmetic function on lists of integers. We then define two types of noise: static noise, a local form of noise which is applied after the CoT trace is computed, and dynamic noise, a global form of noise which propagates errors in the trace as it is computed. We then evaluate the test performance of pretrained models both prompted and fine-tuned on noised datasets with varying levels of dataset contamination and intensity. We find fine-tuned models are extremely robust to high levels of static noise but struggle significantly more with lower levels of dynamic noise. In contrast, few-shot prompted models appear more sensitive to even static noise. We conclude with a discussion of how our findings impact noise filtering best-practices, in particular emphasizing the importance of removing samples containing destructive dynamic noise with global errors.

翻訳日:2024-02-07 14:42:37 公開日:2024-02-06
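
The distinction between static and dynamic noise in the abstract above can be made concrete with a toy trace. The sketch below is not the TInt framework: it assumes a running-sum task, and the functions `prefix_sum_trace`, `static_noise` and `dynamic_noise` are hypothetical names introduced here. Noise is injected deterministically at a chosen step so the two behaviours are easy to compare.

```python
def prefix_sum_trace(xs):
    """Clean chain-of-thought style trace: the running sum after each element."""
    trace, total = [], 0
    for x in xs:
        total += x
        trace.append(total)
    return trace

def static_noise(xs, step, delta):
    """Corrupt one entry AFTER the trace is computed; later steps stay correct."""
    trace = prefix_sum_trace(xs)
    trace[step] += delta
    return trace

def dynamic_noise(xs, step, delta):
    """Corrupt one step WHILE computing; the error propagates to later steps."""
    trace, total = [], 0
    for i, x in enumerate(xs):
        total += x
        if i == step:
            total += delta  # the error enters the running state
        trace.append(total)
    return trace

xs = [1, 2, 3, 4]
clean = prefix_sum_trace(xs)       # [1, 3, 6, 10]
local = static_noise(xs, 1, 5)     # [1, 8, 6, 10]
spread = dynamic_noise(xs, 1, 5)   # [1, 8, 11, 15]
```

In the static case the final answer is still 10 even though one intermediate step is wrong, whereas in the dynamic case every step after the error, including the final answer, is wrong.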

# 訓練データ帰属と損失ランドスケープの研究のための勾配スケッチ

Gradient Sketches for Training Data Attribution and Studying the Loss Landscape ( http://arxiv.org/abs/2402.03994v1 )

ライセンス: Link先を確認

Andrea Schioppa

(参考訳) 勾配やヘッセ・ベクトル積のランダム射影(スケッチ)は、相対的な幾何に関する正確な情報を保持しながら多数のベクトルを保存する必要があるアプリケーションにおいて重要な役割を果たす。 2つの重要なシナリオは、各トレーニング例の勾配を保存する必要がある訓練データ帰属(モデルの振る舞いを訓練データまで追跡すること)と、複数のヘッセ・ベクトル積を保存する必要があるヘッセ行列のスペクトルの研究(訓練ダイナミクスの分析のため)である。密行列を使用するスケッチは実装が容易だが、メモリ律速であり、現代のニューラルネットワークにはスケールしない。ニューラルネットワークの固有次元に関する研究に動機づけられ,スケーラブルなスケッチアルゴリズムの設計空間を提案・検討する。訓練データ帰属, ヘッセ行列のスペクトルの解析, 事前学習済み言語モデルを微調整する際の固有次元の計算という3つの応用で提案手法の有効性を示す。

Random projections or sketches of gradients and Hessian vector products play an essential role in applications where one needs to store many such vectors while retaining accurate information about their relative geometry. Two important scenarios are training data attribution (tracing a model's behavior to the training data), where one needs to store a gradient for each training example, and the study of the spectrum of the Hessian (to analyze the training dynamics), where one needs to store multiple Hessian vector products. While sketches that use dense matrices are easy to implement, they are memory bound and cannot be scaled to modern neural networks. Motivated by work on the intrinsic dimension of neural networks, we propose and study a design space for scalable sketching algorithms. We demonstrate the efficacy of our approach in three applications: training data attribution, the analysis of the Hessian spectrum and the computation of the intrinsic dimension when fine-tuning pre-trained language models.

翻訳日:2024-02-07 14:41:51 公開日:2024-02-06
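
As a point of reference for the abstract above, here is a minimal sketch of the dense Gaussian random projection that it describes as easy to implement but memory bound. It is the simple baseline, not the scalable algorithms the paper proposes. The function names and the tiny dimensions are assumptions chosen for illustration.

```python
import random

def make_sketch_matrix(k, d, seed=0):
    """Dense k x d Gaussian matrix with entries N(0, 1/k).

    Storing it costs k*d floats, which is the memory bottleneck for large d.
    """
    rng = random.Random(seed)
    scale = k ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def sketch(matrix, vec):
    """Project a d-dimensional vector (e.g. a gradient) down to k dimensions."""
    return [sum(r * v for r, v in zip(row, vec)) for row in matrix]

def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

S = make_sketch_matrix(k=4, d=10, seed=1)
g1 = [float(i) for i in range(10)]  # stand-ins for two per-example gradients
g2 = [1.0] * 10
s1, s2 = sketch(S, g1), sketch(S, g2)
```

With this scaling, `dot(s1, s2)` is an unbiased estimate of `dot(g1, g2)` over the random draw of the matrix, and the estimate tightens as `k` grows. The projection is linear, so sketches of sums are sums of sketches.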

# 空間群制約付き結晶生成

Space Group Constrained Crystal Generation ( http://arxiv.org/abs/2402.03992v1 )

ライセンス: Link先を確認

Rui Jiao, Wenbing Huang, Yu Liu, Deli Zhao, Yang Liu

(参考訳) 結晶は多くの科学・産業応用の基礎となっている。結晶生成に様々な学習ベースのアプローチが提案されているが、既存の手法は、結晶の幾何を記述する上で重要であり、多くの望ましい特性とも密接に関係する空間群制約をほとんど考慮していない。しかし、空間群制約は多様かつ非自明な形をとるため、その考慮は困難である。本稿では,空間群制約を,生成プロセスに手作業で組み込みやすい等価な定式化に帰着させる。特に、空間群制約を、格子行列の不変対数空間の基底制約と、分数座標のWyckoff位置制約の2つの部分に変換する。導出した制約に基づき、空間群制約をさらに考慮することで先行研究のDiffCSPを拡張した新しい拡散モデルDiffCSP++を提案する。いくつかの一般的なデータセットでの実験により、空間群制約を組み込むことの利点が検証され、DiffCSP++が結晶構造予測、ab initio結晶生成、および指定した空間群による制御可能な生成において有望な性能を達成することが示された。

Crystals are the foundation of numerous scientific and industrial applications. While various learning-based approaches have been proposed for crystal generation, existing methods seldom consider the space group constraint which is crucial in describing the geometry of crystals and closely relevant to many desirable properties. However, considering space group constraint is challenging owing to its diverse and nontrivial forms. In this paper, we reduce the space group constraint into an equivalent formulation that is more tractable to be handcrafted into the generation process. In particular, we translate the space group constraint into two parts: the basis constraint of the invariant logarithmic space of the lattice matrix and the Wyckoff position constraint of the fractional coordinates. Upon the derived constraints, we then propose DiffCSP++, a novel diffusion model that has enhanced a previous work DiffCSP by further taking space group constraint into account. Experiments on several popular datasets verify the benefit of the involvement of the space group constraint, and show that our DiffCSP++ achieves promising performance on crystal structure prediction, ab initio crystal generation and controllable generation with customized space groups.

翻訳日:2024-02-07 14:41:33 公開日:2024-02-06

# ニューラルランク崩壊:重み減衰と小さなクラス内変動が低ランクバイアスをもたらす

Neural Rank Collapse: Weight Decay and Small Within-Class Variability Yield Low-Rank Bias ( http://arxiv.org/abs/2402.03991v1 )

ライセンス: Link先を確認

Emanuele Zangrando, Piero Deidda, Simone Brugiapaglia, Nicola Guglielmi, Francesco Tudisco

(参考訳) 近年のディープラーニングの研究は、暗黙の低ランクバイアスの強い経験的および理論的証拠を示している。すなわち、ディープネットワークの重み行列は近似的に低ランクになる傾向があり、トレーニング中または利用可能な訓練済みモデルから比較的小さな特異値を取り除くことで、モデルの性能を維持あるいは改善しながらモデルサイズを大幅に削減できる可能性がある。しかし、ニューラルネットワークの低ランクバイアスに関する理論的研究の大部分は、過度に単純化されたディープ線形ネットワークを扱っている。本研究では,非線形活性化と重み減衰パラメータを持つ一般のネットワークを考察し,訓練済みネットワークの低ランクバイアスとネットワークのニューラルコラプス特性を結びつける,興味深いニューラルランク崩壊現象の存在を示す。すなわち、重み減衰パラメータが大きくなるにつれて、ネットワークの各層のランクは、それより前の層の隠れ空間埋め込みのクラス内変動に比例して減少する。理論的な知見は, この現象を示す一連の実験的評価によって裏付けられている。

Recent work in deep learning has shown strong empirical and theoretical evidence of an implicit low-rank bias: weight matrices in deep networks tend to be approximately low-rank and removing relatively small singular values during training or from available trained models may significantly reduce model size while maintaining or even improving model performance. However, the majority of the theoretical investigations around low-rank bias in neural networks deal with oversimplified deep linear networks. In this work, we consider general networks with nonlinear activations and the weight decay parameter, and we show the presence of an intriguing neural rank collapse phenomenon, connecting the low-rank bias of trained networks with networks' neural collapse properties: as the weight decay parameter grows, the rank of each layer in the network decreases proportionally to the within-class variability of the hidden-space embeddings of the previous layers. Our theoretical findings are supported by a range of experimental evaluations illustrating the phenomenon.

翻訳日:2024-02-07 14:41:13 公開日:2024-02-06

# サブサンプリングは魔法ではない:差分プライベートな確率的最適化で大きなバッチサイズが有効な理由

Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation ( http://arxiv.org/abs/2402.03990v1 )

ライセンス: Link先を確認

Ossi Räisä, Joonas Jälkö and Antti Honkela

(参考訳) 本研究では, バッチサイズがDP-SGDの総勾配変動に及ぼす影響について検討し, 大規模バッチサイズの有用性に関する理論的説明を求める。 DP-SGDは現代のDP深層学習の基礎であり、その特性は広く研究されており、近年の研究では大規模なバッチサイズが有用であることが実証されている。しかし、この利点の理論的な説明は、概してヒューリスティックである。まず,DP-SGDの全勾配分散をサブサンプリングおよびノイズ誘導分散に分解できることを示す。そして、無限個の反復の極限において、効果的なノイズ誘起分散がバッチサイズに不変であることを証明する。残りのサブサンプリング誘起分散はより大きなバッチサイズで減少するため、大きなバッチは有効な全勾配分散を減少させる。バッチサイズが小さくない場合,非漸近的レジームが実用的設定において有意であることを数値的に確認し,漸近的レジーム以外では,バッチサイズが大きいほど全勾配分散がさらに減少することを確認した。また,DP-SGDの1回の繰り返しに対して,大きなバッチサイズが有効なDPノイズの分散を減少させることを示す十分な条件も見出す。

We study the effect of the batch size on the total gradient variance in differentially private stochastic gradient descent (DP-SGD), seeking a theoretical explanation for the usefulness of large batch sizes. As DP-SGD is the basis of modern DP deep learning, its properties have been widely studied, and recent works have empirically found large batch sizes to be beneficial. However, theoretical explanations of this benefit are currently heuristic at best. We first observe that the total gradient variance in DP-SGD can be decomposed into subsampling-induced and noise-induced variances. We then prove that in the limit of an infinite number of iterations, the effective noise-induced variance is invariant to the batch size. The remaining subsampling-induced variance decreases with larger batch sizes, so large batches reduce the effective total gradient variance. We confirm numerically that the asymptotic regime is relevant in practical settings when the batch size is not small, and find that outside the asymptotic regime, the total gradient variance decreases even more with large batch sizes. We also find a sufficient condition that implies that large batch sizes similarly reduce effective DP noise variance for one iteration of DP-SGD.

翻訳日:2024-02-07 14:40:55 公開日:2024-02-06
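
To show where the two variance sources named in the abstract above enter, here is a minimal sketch of one DP-SGD gradient estimate in the commonly described form: clip each per-example gradient, sum, add Gaussian noise, and average. It assumes a fixed batch and plain Python lists, and it omits privacy accounting entirely, so it does not reproduce the paper's analysis. The function names are chosen here for illustration.

```python
import math
import random

def clip_gradient(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]

def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Noisy average of clipped per-example gradients for one batch.

    Which examples land in the batch is the subsampling-induced randomness;
    the Gaussian term below is the noise-induced randomness.
    """
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    sigma = noise_multiplier * clip_norm  # noise std on the summed gradient
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(per_example_grads) for x in noisy]

rng = random.Random(0)
batch = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
noiseless = dp_sgd_gradient(batch, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
```

In this form the per-coordinate noise variance of the averaged gradient is `(noise_multiplier * clip_norm / B) ** 2` for batch size `B`. How `noise_multiplier` must change with `B` to keep the same privacy guarantee is the accounting question that the abstract addresses and this sketch leaves out.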

# YOLOPointジョイントキーポイントとオブジェクト検出

YOLOPoint Joint Keypoint and Object Detection ( http://arxiv.org/abs/2402.03989v1 )

ライセンス: Link先を確認

Anton Backhaus, Thorsten Luettel, Hans-Joachim Wuensche

(参考訳) 未来のインテリジェントな車両は、周囲を理解し、安全に走行できなければならない。カメラベースの車両システムは、GNSSに依存しないSLAMやビジュアルオドメトリのための低レベルおよび高レベルのランドマークとして、キーポイントとオブジェクトを使用できる。そこで本稿では,YOLOv5とSuperPointを組み合わせて、リアルタイム動作が可能で高精度な単一のフォワードパスネットワークとすることで,画像内のキーポイントとオブジェクトを同時に検出する畳み込みニューラルネットワークモデルYOLOPointを提案する。共有バックボーンと軽量ネットワーク構造を使用することで、YOLOPointはHPatchesとKITTIの両ベンチマークで競争力のある性能を発揮する。

Intelligent vehicles of the future must be capable of understanding and navigating safely through their surroundings. Camera-based vehicle systems can use keypoints as well as objects as low- and high-level landmarks for GNSS-independent SLAM and visual odometry. To this end we propose YOLOPoint, a convolutional neural network model that simultaneously detects keypoints and objects in an image by combining YOLOv5 and SuperPoint to create a single forward-pass network that is both real-time capable and accurate. By using a shared backbone and a light-weight network structure, YOLOPoint is able to perform competitively on both the HPatches and KITTI benchmarks.

翻訳日:2024-02-07 14:40:33 公開日:2024-02-06

# REBORN: 教師なしASRの反復訓練による強化学習境界セグメンテーション

REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR ( http://arxiv.org/abs/2402.03988v1 )

ライセンス: Link先を確認

Liang-Hsuan Tseng, En-Pei Hu, Cheng-Han Chiang, Yuan Tseng, Hung-yi Lee, Lin-shan Lee, Shao-Hua Sun

(参考訳) 教師なし自動音声認識(ASR)は、ペア音声テキストデータの監督なしに、音声信号とその対応するテキスト書き起こしのマッピングを学習することを目的としている。音声信号中の単語/音素は、長さが可変で境界が不明な音声信号のセグメントで表現され、このセグメント構造により、特にペアデータなしで音声とテキストのマッピングが困難になる。本稿では,Reinforcement-Learned boundary Segmentation with Iterative Training for Unsupervised ASRを提案する。 ReBORNは、(1)音声信号におけるセグメント構造の境界を予測するセグメント化モデルを訓練し、(2)セグメント化モデルによってセグメント化されたセグメント構造である音素予測モデルを訓練し、音素転写を予測する。セグメンテーションモデルを訓練するための教師付きデータが入手できないため、強化学習を用いてセグメンテーションモデルを訓練し、低いパープレキシティで音素列予測をもたらすセグメンテーションを選択する。同じ環境では、リブリスピーチ、ティミット、および多言語リブリスピーチの5つの非英語の言語において、以前の教師なしasrモデルをすべて上回っています。我々は、REBORNが学習した境界が教師なしのASR性能を改善する理由を包括的に分析する。

Unsupervised automatic speech recognition (ASR) aims to learn the mapping between the speech signal and its corresponding textual transcription without the supervision of paired speech-text data. A word/phoneme in the speech signal is represented by a segment of speech signal with variable length and unknown boundary, and this segmental structure makes learning the mapping between speech and text challenging, especially without paired data. In this paper, we propose REBORN, Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR. REBORN alternates between (1) training a segmentation model that predicts the boundaries of the segmental structures in speech signals and (2) training the phoneme prediction model, whose input is a segmental structure segmented by the segmentation model, to predict a phoneme transcription. Since supervised data for training the segmentation model is not available, we use reinforcement learning to train the segmentation model to favor segmentations that yield phoneme sequence predictions with a lower perplexity. We conduct extensive experiments and find that under the same setting, REBORN outperforms all prior unsupervised ASR models on LibriSpeech, TIMIT, and five non-English languages in Multilingual LibriSpeech. We comprehensively analyze why the boundaries learned by REBORN improve the unsupervised ASR performance.

翻訳日:2024-02-07 14:40:22 公開日:2024-02-06

# TopoNav:スパースリワード環境における効率的な探索のためのトポロジカルナビゲーション

TopoNav: Topological Navigation for Efficient Exploration in Sparse Reward Environments ( http://arxiv.org/abs/2402.04061v1 )

ライセンス: Link先を確認

Jumman Hossain, Abu-Zaher Faridee, Nirmalya Roy, Jade Freeman, Timothy Gregory, Theron T. Trout

(参考訳) 未知の領域を探索する自律ロボットは、重要な課題に直面している。この課題は、伝統的な探査技術がしばしば失敗するスパース報酬環境を強化する。本稿では,ロボットがこれらの制約を克服し,効率,適応性,目標志向の探索を実現するための新しいフレームワークであるTopoNavを紹介する。 TopoNavの基本的なビルディングブロックは、アクティブトポロジカルマッピング、本質的な報酬機構、階層的客観的優先順位付けである。調査を通じて、TopoNavは重要な場所と経路をキャプチャする動的トポロジカルマップを構築している。内在的な報酬を利用して、このマップ内の指定されたサブゴールに向かってロボットを誘導し、スパースな報酬設定でも構造化された探索を促進する。効率的なナビゲーションを確保するため、TopoNavは階層的オブジェクト指向アクティブトポロジフレームワークを採用しており、ロボットは全体的な目標に集中しながら障害物回避のような緊急タスクを優先順位付けすることができる。実環境を再現するシミュレーション環境におけるTopoNavの有効性を示す。その結果, 探索効率, 航法精度, 予期せぬ障害物への適応性が大幅に向上し, 探索・救助, 環境モニタリング, 惑星探査など, 幅広い応用分野における自律探査に革命をもたらす可能性が示された。

Autonomous robots exploring unknown areas face a significant challenge -- navigating effectively without prior maps and with limited external feedback. This challenge intensifies in sparse reward environments, where traditional exploration techniques often fail. In this paper, we introduce TopoNav, a novel framework that empowers robots to overcome these constraints and achieve efficient, adaptable, and goal-oriented exploration. TopoNav's fundamental building blocks are active topological mapping, intrinsic reward mechanisms, and hierarchical objective prioritization. Throughout its exploration, TopoNav constructs a dynamic topological map that captures key locations and pathways. It utilizes intrinsic rewards to guide the robot towards designated sub-goals within this map, fostering structured exploration even in sparse reward settings. To ensure efficient navigation, TopoNav employs the Hierarchical Objective-Driven Active Topologies framework, enabling the robot to prioritize immediate tasks like obstacle avoidance while maintaining focus on the overall goal. We demonstrate TopoNav's effectiveness in simulated environments that replicate real-world conditions. Our results reveal significant improvements in exploration efficiency, navigational accuracy, and adaptability to unforeseen obstacles, showcasing its potential to revolutionize autonomous exploration in a wide range of applications, including search and rescue, environmental monitoring, and planetary exploration.

翻訳日:2024-02-07 14:32:41 公開日:2024-02-06

# 多変量時系列インプテーションのためのディープラーニング:調査

Deep Learning for Multivariate Time Series Imputation: A Survey ( http://arxiv.org/abs/2402.04059v1 )

ライセンス: Link先を確認

Jun Wang, Wenjie Du, Wei Cao, Keli Zhang, Wenjia Wang, Yuxuan Liang, Qingsong Wen

(参考訳) 至るところに存在する欠損値のために多変量時系列データは部分的にしか観測されず、時系列の完全性が損なわれ、効果的な時系列データ解析が妨げられる。近年の深層学習インプテーション法は、劣化した時系列データの品質向上に成功し、下流タスクの性能向上に寄与している。本稿では、最近提案された深層学習インプテーション法について総合的な調査を行う。まず、検討した手法の分類法を提案し、その強みと限界を強調することで、それらの手法の構造化されたレビューを行う。また、異なる手法について実験を行い、下流タスクにおける改善度を比較する。最後に、多変量時系列インプテーション研究の今後の課題について述べる。この作業のすべてのコードと構成は、定期的にメンテナンスされている多変量時系列インプテーションの論文リストを含め、GitHubリポジトリ https://github.com/WenjieDu/Awesome_Imputation にある。

Ubiquitous missing values leave multivariate time series data only partially observed, destroying the integrity of the series and hindering effective time series analysis. Recently, deep learning imputation methods have demonstrated remarkable success in elevating the quality of corrupted time series data, subsequently enhancing performance in downstream tasks. In this paper, we conduct a comprehensive survey of the recently proposed deep learning imputation methods. First, we propose a taxonomy for the reviewed methods, and then provide a structured review of these methods by highlighting their strengths and limitations. We also conduct empirical experiments to study different methods and compare their enhancement for downstream tasks. Finally, the open issues for future research on multivariate time series imputation are pointed out. All code and configurations of this work, including a regularly maintained multivariate time series imputation paper list, can be found in the GitHub repository https://github.com/WenjieDu/Awesome_Imputation.

翻訳日:2024-02-07 14:32:19 公開日:2024-02-06

# 学習アルゴリズムによるより柔軟なPAC-Bayesianメタラーニング

More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms ( http://arxiv.org/abs/2402.04054v1 )

ライセンス: Link先を確認

Hossein Zakerinia, Amin Behjati, Christoph H. Lampert

(参考訳) PAC-Bayesian理論を用いたメタラーニング手法の研究フレームワークを提案する。これまでの作業に比べて大きな利点は、タスク間の知識の伝達を実現する方法において、柔軟性を高めることである。以前のアプローチでは、モデル上の事前分布を学習することで、これは間接的にのみ起こりうる。対照的に、新しい一般化は、将来のタスクに使用するべき学習アルゴリズムを学習するよりも、メタラーニングのプロセスをずっと直接的に表現できることを証明している。私たちのフレームワークの柔軟性は、幅広いメタ学習メカニズムを分析し、新しいメカニズムを設計するのに適しています。理論的貢献以外は、我々のフレームワークが実用的なメタ学習メカニズムの予測品質を改善することを実証的に示す。

We introduce a new framework for studying meta-learning methods using PAC-Bayesian theory. Its main advantage over previous work is that it allows for more flexibility in how the transfer of knowledge between tasks is realized. For previous approaches, this could only happen indirectly, by means of learning prior distributions over models. In contrast, the new generalization bounds that we prove express the process of meta-learning much more directly as learning the learning algorithm that should be used for future tasks. The flexibility of our framework makes it suitable to analyze a wide range of meta-learning mechanisms and even design new mechanisms. Other than our theoretical contributions we also show empirically that our framework improves the prediction quality in practical meta-learning mechanisms.

翻訳日:2024-02-07 14:31:59 公開日:2024-02-06

# 順列型重みマッチングによる線形モード接続の解析

Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching ( http://arxiv.org/abs/2402.04051v1 )

ライセンス: Link先を確認

Akira Ito, Masanori Yamada, Atsutoshi Kumagai

(参考訳) 近年、Ainsworthらは、モデルパラメータの置換探索において重みマッチング(WM)を用いて$L_2$距離を最小化することで、線形モード接続(LMC)を満たす置換を効果的に同定できることを示した。LMCとは、異なるシードで独立に訓練された2つのモデル間の線形経路上で損失がほぼ一定に保たれる性質である。本稿では、WMを用いたLMCの理論解析を行う。これは確率勾配降下法の有効性の理解や、モデルマージなどへの応用において重要である。まず、WMが見つけた置換が2つのモデル間の$L_2$距離を著しく減少させるわけではなく、LMCの発生はWM自体による距離減少だけによるものではないことを実験的および理論的に示す。次に、置換は各層における重み行列の特異ベクトルの方向を変えることができるが、特異値は変えないことを示す理論的洞察を与える。この発見は、WMが見つけた置換が、主に大きな特異値に対応する特異ベクトルの方向をモデル間で揃えていることを示している。このアライメントにより、モデルの機能を決定する大きな特異値の特異ベクトルがマージ前後のモデル間で近くなるため、マージ後のモデルはマージ前のモデルと同様の機能を保持し、LMCを満たしやすくなる。最後に、WMと、データセット依存の置換探索法であるストレートスルー推定器(STE)との違いを分析し、特に3つ以上のモデルをマージする場合にWMがSTEより優れていることを示す。

Recently, Ainsworth et al. showed that using weight matching (WM) to minimize the $L_2$ distance in a permutation search of model parameters effectively identifies permutations that satisfy linear mode connectivity (LMC), in which the loss along a linear path between two independently trained models with different seeds remains nearly constant. This paper provides a theoretical analysis of LMC using WM, which is crucial for understanding stochastic gradient descent's effectiveness and its application in areas like model merging. We first experimentally and theoretically show that permutations found by WM do not significantly reduce the $L_2$ distance between two models and the occurrence of LMC is not merely due to distance reduction by WM in itself. We then provide theoretical insights showing that permutations can change the directions of the singular vectors, but not the singular values, of the weight matrices in each layer. This finding shows that permutations found by WM mainly align the directions of singular vectors associated with large singular values across models. This alignment brings the singular vectors with large singular values, which determine the model functionality, closer between pre-merged and post-merged models, so that the post-merged model retains functionality similar to the pre-merged models, making it easy to satisfy LMC. Finally, we analyze the difference between WM and straight-through estimator (STE), a dataset-dependent permutation search method, and show that WM outperforms STE, especially when merging three or more models.
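
The claim that permutations leave singular values unchanged can be checked with a few lines of plain Python. The sketch below uses hypothetical helper names and a toy two-layer ReLU network; it is an illustration of the underlying linear algebra, not the authors' code. Permuting the rows of a weight matrix `W` leaves the Gram matrix `W^T W` unchanged, and the singular values of `W` are the square roots of its eigenvalues. Permuting hidden units consistently in both layers also leaves the network output unchanged.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gram(w):
    """W^T W; its eigenvalues are the squared singular values of W."""
    return matmul(transpose(w), w)


def permute_rows(w, perm):
    return [w[p] for p in perm]


def permute_cols(w, perm):
    return [[row[p] for p in perm] for row in w]


def relu_mlp(w1, w2, x):
    """Two-layer network: w2 @ relu(w1 @ x), without biases."""
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    return [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w2]


# Toy weights: 2 inputs, 3 hidden units, 1 output.
w1 = [[1.0, -2.0], [0.5, 3.0], [-1.0, 1.0]]
w2 = [[1.0, 2.0, -1.0]]
perm = [2, 0, 1]  # reorder the hidden units

w1_perm = permute_rows(w1, perm)
w2_perm = permute_cols(w2, perm)
```

Here `gram(w1_perm)` equals `gram(w1)`, so the singular values agree, while the rows themselves (and hence the left singular vectors) are reordered.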

翻訳日:2024-02-07 14:31:47 公開日:2024-02-06

# ドットの接続:ブラックボックスビジョンランゲージモデルのための協調的微調整

Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models ( http://arxiv.org/abs/2402.04050v1 )

ライセンス: Link先を確認

Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

(参考訳) 事前訓練された視覚言語モデル(VLM)の出現に伴い、下流タスクのための微調整に多大な努力が注がれている。効率的な微調整手法の設計は進歩しているが、そのような手法はモデルのパラメータへのアクセスを必要とする。モデル所有者は所有権を保護するためにモデルをブラックボックスとして提供することが多く、このアクセスは困難になりうる。本稿では、入力プロンプトとモデルの出力予測にのみアクセス可能なブラックボックスVLMを下流タスクに微調整するための Collaborative Fine-Tuning (CraFT) アプローチを提案する。CraFTは、テキストプロンプトを学習するプロンプト生成モジュールと、残差スタイルで出力予測を強化する予測改善モジュールの2つのモジュールから構成される。さらに、これらのモジュール間の一貫した最適化を促進するために、補助的な予測一貫性損失を導入する。これらのモジュールは、新しい協調学習アルゴリズムによって最適化される。15のデータセットにまたがる少数ショット分類に関する広範な実験は、CraFTの優位性を示している。その結果、CraFTは16ショットのデータセットと8,000のクエリのみで約12%の改善を達成した。さらに、CraFTはより速く訓練でき、デプロイ時のメモリフットプリントは約1/80で済み、ホワイトボックス方式と比べた犠牲は1.62%にとどまる。

With the emergence of pretrained vision-language models (VLMs), considerable efforts have been devoted to fine-tuning them for downstream tasks. Despite the progress made in designing efficient fine-tuning methods, such methods require access to the model's parameters, which can be challenging as model owners often opt to provide their models as a black box to safeguard model ownership. This paper proposes a Collaborative Fine-Tuning (CraFT) approach for fine-tuning black-box VLMs to downstream tasks, where one only has access to the input prompts and the output predictions of the model. CraFT comprises two modules, a prompt generation module for learning text prompts and a prediction refinement module for enhancing output predictions in residual style. Additionally, we introduce an auxiliary prediction-consistent loss to promote consistent optimization across these modules. These modules are optimized by a novel collaborative training algorithm. Extensive experiments on few-shot classification over 15 datasets demonstrate the superiority of CraFT. The results show that CraFT achieves a decent gain of about 12% with 16-shot datasets and only 8,000 queries. Moreover, CraFT trains faster and uses only about 1/80 of the memory footprint for deployment, while sacrificing only 1.62% compared to the white-box method.

翻訳日:2024-02-07 14:31:15 公開日:2024-02-06

# 討論のllmシミュレーションにおける系統的バイアス

Systematic Biases in LLM Simulations of Debates ( http://arxiv.org/abs/2402.04049v1 )

ライセンス: Link先を確認

Amir Taubenfeld, Yaniv Dover, Roi Reichart, Ariel Goldstein

(参考訳) 近年の自然言語処理,特にLarge Language Models(LLM)の出現は,人間の行動を正確に再現する計算シミュレーションを構築する上で,エキサイティングな可能性をもたらしている。しかし, LLM は簡素な帰納規則を持たない複雑な統計的学習者であり, 予期せぬ行動を起こす傾向がある。本研究では,人間のインタラクションをシミュレートするLLMの限界,特に政治的議論をシミュレートするLLMの能力に注目した。以上の結果から,LLMエージェントが特定の政治的視点から議論される一方で,モデル固有の社会的バイアスに適合する傾向が示唆された。この傾向は、人間の間で確立された社会的ダイナミクスから逸脱しているように見える行動パターンをもたらす。我々は,llm内のバイアスを操作できる自動自己微調整法を用いてこれらの観察を補強し,エージェントがその後変化したバイアスと一致することを示す。これらの結果は、エージェントがより現実的なシミュレーションを作成するための重要なステップである、これらのバイアスを克服する手法を開発するためのさらなる研究の必要性を強調している。

Recent advancements in natural language processing, especially the emergence of Large Language Models (LLMs), have opened exciting possibilities for constructing computational simulations designed to replicate human behavior accurately. However, LLMs are complex statistical learners without straightforward deductive rules, making them prone to unexpected behaviors. In this study, we highlight the limitations of LLMs in simulating human interactions, particularly focusing on LLMs' ability to simulate political debates. Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases despite being directed to debate from certain political perspectives. This tendency results in behavioral patterns that seem to deviate from well-established social dynamics among humans. We reinforce these observations using an automatic self-fine-tuning method, which enables us to manipulate the biases within the LLM and demonstrate that agents subsequently align with the altered biases. These results underscore the need for further research to develop methods that help agents overcome these biases, a critical step toward creating more realistic simulations.

翻訳日:2024-02-07 14:30:47 公開日:2024-02-06

# ノードとエッジ属性の連成拡散によるグラフの生成モデリング

Generative Modeling of Graphs via Joint Diffusion of Node and Edge Attributes ( http://arxiv.org/abs/2402.04046v1 )

ライセンス: Link先を確認

Nimrod Berman, Eitan Kosman, Dotan Di Castro, Omri Azencot

(参考訳) グラフ生成は様々な工学と科学の分野に不可欠である。それでも、既存の方法論はエッジ属性の生成を見逃す傾向がある。しかし、エッジ属性が必須となる重要なアプリケーションを特定し、そのような状況では事前の手法が適さない可能性がある。さらに、自明な適応が利用可能であるが、グラフコンポーネント間の相互作用を適切にモデル化しないため、実験的な調査によってその有効性は限られている。そこで我々は,全てのグラフ成分を考慮したグラフ生成のためのノードとエッジの合同スコアベースモデルを提案する。私たちのアプローチは2つの重要なノベルティを提供します。 i)2つの成分に基づいてサンプルを生成する注目モジュールにノード属性とエッジ属性を結合する。 (ii)ノード、エッジおよび隣接情報はグラフ拡散過程において相互に依存する。本手法は,エッジ機能が重要となる実世界および合成データセットを含む挑戦的ベンチマークを用いて評価する。さらに、エッジ値を組み込んだ新しい合成データセットを導入する。さらに,グラフとして表現された交通シーンの生成という,本手法の利点を生かした新しいアプリケーションを提案する。本手法は他のグラフ生成法よりも優れており,エッジ関連尺度において大きな優位性を示す。

Graph generation is integral to various engineering and scientific disciplines. Nevertheless, existing methodologies tend to overlook the generation of edge attributes. However, we identify critical applications where edge attributes are essential, making prior methods potentially unsuitable in such contexts. Moreover, while trivial adaptations are available, empirical investigations reveal their limited efficacy as they do not properly model the interplay among graph components. To address this, we propose a joint score-based model of nodes and edges for graph generation that considers all graph components. Our approach offers two key novelties: (i) node and edge attributes are combined in an attention module that generates samples based on the two ingredients; and (ii) node, edge and adjacency information are mutually dependent during the graph diffusion process. We evaluate our method on challenging benchmarks involving real-world and synthetic datasets in which edge features are crucial. Additionally, we introduce a new synthetic dataset that incorporates edge values. Furthermore, we propose a novel application that greatly benefits from the method due to its nature: the generation of traffic scenes represented as graphs. Our method outperforms other graph generation methods, demonstrating a significant advantage in edge-related measures.

翻訳日:2024-02-07 14:30:28 公開日:2024-02-06

# グラフニューラルネットワークのための PAC-Bayesian Adversarially Robust Generalization bounds

PAC-Bayesian Adversarially Robust Generalization Bounds for Graph Neural Network ( http://arxiv.org/abs/2402.04038v1 )

ライセンス: Link先を確認

Tan Sun, Junhong Lin

(参考訳) グラフニューラルネットワーク(GNN)は、さまざまなグラフ関連タスクで人気を集めている。しかし、ディープニューラルネットワークと同様に、GNNも敵の攻撃に対して脆弱である。実証的研究は、敵の攻撃に対する効果的な防御アルゴリズムを確立する上で、敵の堅牢な一般化が重要な役割を果たすことを示した。本稿では, PAC-Bayesianフレームワークを用いて, 2種類の人気GNN, グラフ畳み込みネットワーク(GCN), メッセージパッシンググラフニューラルネットワークに対して, 逆向きに堅牢な一般化境界を提供する。その結果、グラフ上の拡散行列のスペクトルノルムと重みのスペクトルノルムと摂動係数が両モデルの堅牢な一般化境界を支配していることが明らかとなった。我々の境界は、(liao et al., 2020) で開発された結果の非自明な一般化であり、最大ノード次数の指数依存を避けながら、標準設定から逆設定までである。コーナリーとして、最大ノード次数への指数的依存を回避して(Liao et al., 2020)のバウンダリを改善する標準設定におけるGCNに対するより優れたPAC-Bayesianのロバストな一般化境界を導出する。

Graph neural networks (GNNs) have gained popularity for various graph-related tasks. However, similar to deep neural networks, GNNs are also vulnerable to adversarial attacks. Empirical studies have shown that adversarially robust generalization has a pivotal role in establishing effective defense algorithms against adversarial attacks. In this paper, we contribute by providing adversarially robust generalization bounds for two kinds of popular GNNs, graph convolutional network (GCN) and message passing graph neural network, using the PAC-Bayesian framework. Our results reveal that the spectral norm of the diffusion matrix on the graph and the spectral norm of the weights, as well as the perturbation factor, govern the robust generalization bounds of both models. Our bounds are nontrivial generalizations of the results developed in (Liao et al., 2020) from the standard setting to the adversarial setting while avoiding exponential dependence on the maximum node degree. As corollaries, we derive better PAC-Bayesian robust generalization bounds for GCN in the standard setting, which improve the bounds in (Liao et al., 2020) by avoiding exponential dependence on the maximum node degree.

翻訳日:2024-02-07 14:30:12 公開日:2024-02-06

# グラフ表現の証明可能なプライバシー脆弱性について

On provable privacy vulnerabilities of graph representations ( http://arxiv.org/abs/2402.04033v1 )

ライセンス: Link先を確認

Ruofan Wu, Guanhua Fang, Qiying Pan, Mingyang Zhang, Tengfei Liu, Weiqiang Wang, Wenbiao Zhao

(参考訳) グラフ表現学習(GRL)は複雑なネットワーク構造から洞察を抽出するために重要であるが、これらの表現の潜在的なプライバシー上の脆弱性によりセキュリティ上の懸念も生じている。本稿では,エッジ再構成攻撃により,高感度なトポロジ情報を推定できるグラフニューラルモデルの構造的脆弱性について検討する。本研究は,コサイン類似性に基づくエッジ再構築攻撃(COSERA)の理論的基盤に主に対応し,グラフサイズが大きくなるにつれて,スパース・エルドス・レニーグラフを無作為なランダムな特徴で完全に再構築できるという理論的および実証的な証拠を提供する。逆に,確率ブロックモデルの解析と実験により,スパーシティがcoseraの有効性にとって重要な要因であることを示す。最後に,COSERAに対するノイズアグリゲーション(NAG)機構を用いて生成した(確実に)プライベートグラフ表現のレジリエンスについて検討する。我々は、COSERAが、プライバシとユーティリティのトレードオフを解明する手段として機能する能力の有効性と欠如の両方を示す事例を実証的に記述する。

Graph representation learning (GRL) is critical for extracting insights from complex network structures, but it also raises security concerns due to potential privacy vulnerabilities in these representations. This paper investigates the structural vulnerabilities in graph neural models where sensitive topological information can be inferred through edge reconstruction attacks. Our research primarily addresses the theoretical underpinnings of cosine-similarity-based edge reconstruction attacks (COSERA), providing theoretical and empirical evidence that such attacks can perfectly reconstruct sparse Erdős-Rényi graphs with independent random features as graph size increases. Conversely, we establish that sparsity is a critical factor for COSERA's effectiveness, as demonstrated through analysis and experiments on stochastic block models. Finally, we explore the resilience of (provably) private graph representations produced via noisy aggregation (NAG) mechanism against COSERA. We empirically delineate instances wherein COSERA demonstrates both efficacy and deficiency in its capacity to function as an instrument for elucidating the trade-off between privacy and utility.
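
To make the idea of a cosine-similarity-based edge reconstruction attack concrete, here is a minimal sketch: an attacker who sees only node embeddings predicts an edge between every pair of nodes whose embeddings are sufficiently similar. The fixed `threshold` decision rule and all function names are assumptions for illustration; the abstract does not specify how the attack turns similarities into edge predictions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def reconstruct_edges(embeddings, threshold):
    """Predict an undirected edge (i, j), i < j, when similarity >= threshold."""
    predicted = set()
    for i, j in combinations(range(len(embeddings)), 2):
        if cosine(embeddings[i], embeddings[j]) >= threshold:
            predicted.add((i, j))
    return predicted
```

For example, `reconstruct_edges([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], 0.8)` returns `{(0, 1)}`, because only the first two embeddings point in nearly the same direction.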

翻訳日:2024-02-07 14:29:50 公開日:2024-02-06

# HEAM : 処理インメモリを用いたハッシュ埋め込み高速化

HEAM : Hashed Embedding Acceleration using Processing-In-Memory ( http://arxiv.org/abs/2402.04032v1 )

ライセンス: Link先を確認

Youngsuk Kim, Hyuk-Jae Lee, Chae Eun Rhee

(参考訳) 今日のデータセンターでは、パーソナライズドレコメンデーションシステムが、特に組み込み操作を行う場合に、大きなメモリ容量と高い帯域幅の必要性といった課題に直面している。従来のアプローチでは、DIMMベースのニアメモリ処理技術や、メモリバウンド問題に対処し、メモリ帯域幅を拡大する3DスタックDRAMが導入されていた。しかし、これらのソリューションはパーソナライズされたレコメンデーションシステムのサイズ拡大を扱う場合に不足する。レコメンデーションモデルは数十テラバイトを超えるサイズに成長し、従来の単一ノード推論サーバ上で効率的に動作することが困難になっている。組込みテーブルの容量を削減するために様々なアルゴリズムが提案されているが、メモリアクセスの増加やメモリ資源の非効率利用につながることが多い。本稿では,3次元スタックDRAMとDIMMを統合したヘテロジニアスメモリアーキテクチャであるHEAMについて紹介する。アーキテクチャは、従来のDIMM、ベースダイレベルのProcess-In-Memory(PIM)を備えた3DスタックDRAM、Look-Up-Tableを備えた銀行グループレベルのPIMで構成されている。この設定は、時間的局所性や埋め込みテーブル容量など、構成的埋め込みのユニークな側面を満たすように特別に設計されている。この設計は銀行アクセスを効果的に削減し、アクセス効率を向上し、全体のスループットを向上し、6.3倍の高速化と58.9%の省エネを実現している。

In today's data centers, personalized recommendation systems face challenges such as the need for large memory capacity and high bandwidth, especially when performing embedding operations. Previous approaches have relied on DIMM-based near-memory processing techniques or introduced 3D-stacked DRAM to address memory-bound issues and expand memory bandwidth. However, these solutions fall short when dealing with the expanding size of personalized recommendation systems. Recommendation models have grown to sizes exceeding tens of terabytes, making them challenging to run efficiently on traditional single-node inference servers. Although various algorithmic methods have been proposed to reduce embedding table capacity, they often result in increased memory access or inefficient utilization of memory resources. This paper introduces HEAM, a heterogeneous memory architecture that integrates 3D-stacked DRAM with DIMM to accelerate recommendation systems in which compositional embedding is utilized (a technique aimed at reducing the size of embedding tables). The architecture is organized into a three-tier memory hierarchy consisting of conventional DIMM, 3D-stacked DRAM with a base die-level Processing-In-Memory (PIM), and a bank group-level PIM incorporating a Look-Up-Table. This setup is specifically designed to accommodate the unique aspects of compositional embedding, such as temporal locality and embedding table capacity. This design effectively reduces bank access, improves access efficiency, and enhances overall throughput, resulting in a 6.3 times speedup and 58.9% energy savings compared to the baseline.
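
One well-known form of compositional embedding is the quotient-remainder trick, sketched below. It is shown here only to illustrate how two small tables can stand in for one large table; the abstract does not say which compositional scheme HEAM targets, so the class name, the element-wise product, and all parameters are assumptions.

```python
import random


class QREmbedding:
    """Quotient-remainder compositional embedding (illustrative sketch).

    Instead of one table with num_items rows, keep two small tables and
    combine one row from each to form the embedding of an item.
    """

    def __init__(self, num_items, num_buckets, dim, seed=0):
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        num_quotients = (num_items + num_buckets - 1) // num_buckets
        self.quotient_table = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_quotients)
        ]
        self.remainder_table = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_buckets)
        ]

    def slots(self, index):
        """Return the (quotient row, remainder row) pair used for an item."""
        return index // self.num_buckets, index % self.num_buckets

    def lookup(self, index):
        """Combine the two rows with an element-wise product."""
        q, r = self.slots(index)
        return [a * b for a, b in zip(self.quotient_table[q], self.remainder_table[r])]

    def num_rows(self):
        return len(self.quotient_table) + len(self.remainder_table)
```

With `QREmbedding(1000, 32, 8)`, the two tables hold 64 rows in total instead of 1000, yet every item index maps to a distinct pair of rows. Each lookup now touches two tables rather than one, which is the kind of extra memory access the abstract alludes to.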

翻訳日:2024-02-07 14:29:30 公開日:2024-02-06

# Polyp-DDPM:拡張セグメンテーションのための拡散型セマンティックポリープ合成

Polyp-DDPM: Diffusion-Based Semantic Polyp Synthesis for Enhanced Segmentation ( http://arxiv.org/abs/2402.04031v1 )

ライセンス: Link先を確認

Zolnamar Dorjsembe, Hsing-Kuo Pao and Furen Xiao

(参考訳) 本研究は、消化管ポリープのセグメンテーション向上を目的として、マスクで条件付けしてポリープの現実的な画像を生成する拡散ベース手法であるPolyp-DDPMを提案する。我々のアプローチは、医療画像に関連するデータ制限、高いアノテーションコスト、プライバシーに関する課題に対処する。異常領域を表すバイナリマスクであるセグメンテーションマスクで拡散モデルを条件付けすることにより、Polyp-DDPMは画像品質(フレシェインセプション距離(FID)スコアが78.47、既存手法は83.79以上)とセグメンテーション性能(インターセクションオーバーユニオン(IoU)が0.7156、ベースラインモデルの合成画像では0.6694未満、実データでは0.7067)の両面で最先端の手法を上回る。提案手法は、学習用に高品質で多様な合成データセットを生成し、ポリープセグメンテーションモデルを実画像に匹敵する水準まで強化するとともに、セグメンテーションモデルを改善するためのデータ拡張能力を高める。Polyp-DDPMのソースコードと事前訓練済みの重みは、https://github.com/mobaidoctor/polyp-ddpm で公開されている。

This study introduces Polyp-DDPM, a diffusion-based method for generating realistic images of polyps conditioned on masks, aimed at enhancing the segmentation of gastrointestinal (GI) tract polyps. Our approach addresses the challenges of data limitations, high annotation costs, and privacy concerns associated with medical images. By conditioning the diffusion model on segmentation masks (binary masks that represent abnormal areas), Polyp-DDPM outperforms state-of-the-art methods in terms of image quality (achieving a Frechet Inception Distance (FID) score of 78.47, compared to scores above 83.79) and segmentation performance (achieving an Intersection over Union (IoU) of 0.7156, versus less than 0.6694 for synthetic images from baseline models and 0.7067 for real data). Our method generates a high-quality, diverse synthetic dataset for training, thereby enhancing polyp segmentation models to be comparable with real images and offering greater data augmentation capabilities to improve segmentation models. The source code and pretrained weights for Polyp-DDPM are made publicly available at https://github.com/mobaidoctor/polyp-ddpm.
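
For readers unfamiliar with the segmentation metric quoted above, the following sketch computes Intersection over Union for two binary masks given as nested lists. The function name and the convention of returning 1.0 when both masks are empty are assumptions of this sketch, not details from the paper.

```python
def iou(pred_mask, true_mask):
    """Intersection over Union of two same-shaped binary masks."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly; treat that case as IoU = 1.0.
    return intersection / union if union else 1.0
```

For instance, `iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])` is 1/3: one pixel is marked in both masks and three pixels are marked in at least one.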

翻訳日:2024-02-07 14:29:00 公開日:2024-02-06

# 密度汎関数理論による再伝播による量子化学データのコスト削減

Reducing the Cost of Quantum Chemical Data By Backpropagating Through Density Functional Theory ( http://arxiv.org/abs/2402.04030v1 )

ライセンス: Link先を確認

Alexander Mathiasen, Hatem Helal, Paul Balanca, Adam Krzywaniak, Ali Parviz, Frederik Hvilshøj, Blazej Banaszewski, Carlo Luschi, Andrew William Fitzgibbon

(参考訳) 密度汎関数理論(DFT)は分子の量子化学的性質を正確に予測するが、$O(N_{\text{electrons}}^3)$としてスケールする。Schütt et al. (2019)は、ニューラルネットワーク(NN)でDFTを1000倍高速に近似することに成功した。おそらく、より大きな分子へのスケーリングで直面する最大の問題は、DFTラベルのコストである。例えば、PCQデータセット(Nakata & Shimazaki, 2017)の作成には何年もかかったが、それを用いるNNは1週間以内に訓練される。DFTはエネルギー$E(\cdot )$を「損失関数」として最小化することで分子をラベル付けする。我々は、$E(\cdot )$を損失関数としてNNを直接訓練することで、データセット作成を回避する。比較のために述べると、Schütt et al. (2019)はデータセットの作成に626時間を費やし、NNの訓練に160時間、合計786時間を要した。本手法は31時間以内に同等の性能を達成する。

Density Functional Theory (DFT) accurately predicts the quantum chemical properties of molecules, but scales as $O(N_{\text{electrons}}^3)$. Schütt et al. (2019) successfully approximate DFT 1000x faster with Neural Networks (NN). Arguably, the biggest problem one faces when scaling to larger molecules is the cost of DFT labels. For example, it took years to create the PCQ dataset (Nakata & Shimazaki, 2017) on which subsequent NNs are trained within a week. DFT labels molecules by minimizing energy $E(\cdot )$ as a "loss function." We bypass dataset creation by directly training NNs with $E(\cdot )$ as a loss function. For comparison, Schütt et al. (2019) spent 626 hours creating a dataset on which they trained their NN for 160h, for a total of 786h; our method achieves comparable performance within 31h.

翻訳日:2024-02-07 14:28:28 公開日:2024-02-06

# 正凹深部平衡モデル

Positive concave deep equilibrium models ( http://arxiv.org/abs/2402.04029v1 )

ライセンス: Link先を確認

Mateusz Gabor, Tomasz Piotrowski, Renato L. G. Cavalcante

(参考訳) Deep equilibrium(DEQ)モデルは、標準ニューラルネットワークのメモリ効率の良い代替として広く認識されており、言語モデリングやコンピュータビジョンタスクにおいて最先端の性能を実現している。これらのモデルは、出力を明示的に計算するのではなく、不動点方程式を解く。しかし、既存のDEQモデルは不動点の存在と一意性の形式的な保証を欠いていることが多く、不動点の計算に用いられる数値スキームの収束も形式的には確立されていない。結果として、DEQモデルは実際には不安定になりうる。これらの欠点に対処するために、正凹深部平衡(pcDEQ)モデルと呼ばれる新しいDEQモデルのクラスを導入する。非線形ペロン・フロベニウス理論に基づく我々のアプローチは、非負の重みと、正の象限上で凹である活性化関数を強制する。これらの制約を課すことで、凸解析における単調作用素理論に基づくものなど、DEQ文献でよく見られる追加の複雑な仮定に頼ることなく、不動点の存在と一意性を容易に保証できる。さらに、不動点は標準的な不動点アルゴリズムで計算でき、幾何学的収束の理論的保証を与える。これは特に訓練プロセスを単純化する。実験は、他の暗黙的モデルに対するpcDEQモデルの競争力を実証する。

Deep equilibrium (DEQ) models are widely recognized as a memory efficient alternative to standard neural networks, achieving state-of-the-art performance in language modeling and computer vision tasks. These models solve a fixed point equation instead of explicitly computing the output, which sets them apart from standard neural networks. However, existing DEQ models often lack formal guarantees of the existence and uniqueness of the fixed point, and the convergence of the numerical scheme used for computing the fixed point is not formally established. As a result, DEQ models are potentially unstable in practice. To address these drawbacks, we introduce a novel class of DEQ models called positive concave deep equilibrium (pcDEQ) models. Our approach, which is based on nonlinear Perron-Frobenius theory, enforces nonnegative weights and activation functions that are concave on the positive orthant. By imposing these constraints, we can easily ensure the existence and uniqueness of the fixed point without relying on additional complex assumptions commonly found in the DEQ literature, such as those based on monotone operator theory in convex analysis. Furthermore, the fixed point can be computed with the standard fixed point algorithm, and we provide theoretical guarantees of geometric convergence, which, in particular, simplifies the training process. Experiments demonstrate the competitiveness of our pcDEQ models against other implicit models.
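
The fixed-point view described above can be illustrated with a small sketch. It assumes a layer of the form `tanh(W x + b)` with nonnegative `W` and positive `b`; `tanh` is concave on the nonnegative reals, so this toy layer satisfies the stated constraints, but the exact layer form, the stopping rule, and all names are assumptions for illustration rather than the authors' architecture.

```python
import math


def layer(weights, bias, x):
    """One application of the map x -> tanh(W x + b), with W >= 0 and b > 0."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def solve_fixed_point(weights, bias, start=None, tol=1e-12, max_iter=1000):
    """Iterate x <- layer(x) until successive iterates differ by less than tol.

    Returns the approximate fixed point and the number of iterations used.
    """
    x = list(start) if start is not None else [0.0] * len(bias)
    for step in range(1, max_iter + 1):
        nxt = layer(weights, bias, x)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt, step
        x = nxt
    return x, max_iter


# Toy parameters: nonnegative weights and positive biases.
W = [[0.2, 0.1], [0.3, 0.4]]
b = [0.5, 0.5]
x_star, steps = solve_fixed_point(W, b)
```

Starting the iteration from a different nonnegative point, for example `solve_fixed_point(W, b, start=[5.0, 5.0])`, reaches the same `x_star` up to the tolerance, which is the uniqueness behaviour the abstract describes.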

翻訳日:2024-02-07 14:28:13 公開日:2024-02-06

# AlbNews: アルバニア語におけるトピックモデリングのための見出しのコーパス

AlbNews: A Corpus of Headlines for Topic Modeling in Albanian ( http://arxiv.org/abs/2402.04028v1 )

ライセンス: Link先を確認

Erion Çano, Dario Lamaj

(参考訳) アルバニア語のような低リソース言語で利用できるテキストコーパスの不足は、自然言語処理タスクの研究にとって深刻なハードルである。本稿では、アルバニア語のトピックラベル付きニュース見出し600件とラベルなしの見出し2600件を集めたAlbNewsを紹介する。このデータはトピックモデリング研究の実施に自由に利用できる。AlbNewsのサンプルで訓練した従来の機械学習分類器の初期分類スコアを報告する。これらの結果から、基本モデルはアンサンブル学習モデルより優れており、今後の実験のベースラインとして機能することが示唆された。

The scarcity of available text corpora for low-resource languages like Albanian is a serious hurdle for research in natural language processing tasks. This paper introduces AlbNews, a collection of 600 topically labeled news headlines and 2600 unlabeled ones in Albanian. The data can be freely used for conducting topic modeling research. We report the initial classification scores of some traditional machine learning classifiers trained with the AlbNews samples. These results show that basic models outrun the ensemble learning ones and can serve as a baseline for future experiments.

翻訳日:2024-02-07 14:27:50 公開日:2024-02-06

# 精神保健情報の誤り分析:多言語医療コミュニケーションにおける正確性、理解性、意味の評価

Google Translate Error Analysis for Mental Healthcare Information: Evaluating Accuracy, Comprehensibility, and Implications for Multilingual Healthcare Communication ( http://arxiv.org/abs/2402.04023v1 )

ライセンス: Link先を確認

Jaleh Delfani, Constantin Orasan, Hadeel Saadany, Ozlem Temizoz, Eleanor Taylor-Stilgoe, Diptesh Kanojia, Sabine Braun, Barbara Schouten

(参考訳) 本研究は,MHealthドメインのGT出力を英語からペルシア語,アラビア語,トルコ語,ルーマニア語,スペイン語に解析することにより,メンタルヘルス(Mhealth)情報の翻訳にGoogle Translate(GT)を使用することについて検討した。英国国立保健サービスウェブサイトのmhealth情報と王立精神科医大学の情報リーフレットからなる2つのデータセットが使用された。対象言語の母語話者はGT翻訳を手動で評価し、医学用語の正確性、理解性、重要な構文・意味的誤りに焦点を当てた。 GT出力分析は、特にアラビア語、ルーマニア語、ペルシア語の医学用語を正確に翻訳する際の課題を明らかにした。頻度の問題は様々な言語に広まり、主にアラビア語とスペイン語の理解に影響を及ぼした。批判的な誤りは、特にペルシャ語、トルコ語、ルーマニア語など、特定の文脈で発生した。長いテキストの翻訳では改善が見られるが、医療やメンタルヘルスの用語や流布の精度を高める必要がある一方で、よりシームレスなユーザーエクスペリエンスのためのフォーマットの問題にも対処する必要がある。この結果は、Mhealth翻訳にカスタマイズされた翻訳エンジンを使う必要性と、機械翻訳された医療コンテンツのみに依存する際の課題を強調し、多言語医療コミュニケーションにおける人間レビュアーの役割を強調した。

This study explores the use of Google Translate (GT) for translating mental healthcare (MHealth) information and evaluates its accuracy, comprehensibility, and implications for multilingual healthcare communication through analysing GT output in the MHealth domain from English to Persian, Arabic, Turkish, Romanian, and Spanish. Two datasets comprising MHealth information from the UK National Health Service website and information leaflets from The Royal College of Psychiatrists were used. Native speakers of the target languages manually assessed the GT translations, focusing on medical terminology accuracy, comprehensibility, and critical syntactic/semantic errors. GT output analysis revealed challenges in accurately translating medical terminology, particularly in Arabic, Romanian, and Persian. Fluency issues were prevalent across various languages, affecting comprehension, mainly in Arabic and Spanish. Critical errors arose in specific contexts, such as bullet-point formatting, specifically in Persian, Turkish, and Romanian. Although improvements are seen in longer-text translations, there remains a need to enhance accuracy in medical and mental health terminology and fluency, whilst also addressing formatting issues for a more seamless user experience. The findings highlight the need to use customised translation engines for MHealth translation and the challenges when relying solely on machine-translated medical content, emphasising the crucial role of human reviewers in multilingual healthcare communication.

翻訳日:2024-02-07 14:27:42 公開日:2024-02-06

# 科学言語モデリング:分子科学における大規模言語モデルの定量的評価

Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science ( http://arxiv.org/abs/2402.04119v1 )

ライセンス: Link先を確認

Pengfei Liu, Jun Tao, Zhixiang Ren

(参考訳) 効率的な分子モデリングと設計は、新しい分子の発見と探索に不可欠であり、深層学習法の導入はこの分野に革命をもたらした。特に、大規模言語モデル(LLM)は、自然言語処理(NLP)の観点から科学的問題に取り組むための新しいアプローチを提供し、科学言語モデリング(SLM)と呼ばれる研究パラダイムを導入している。しかし、モデルとデータモダリティのマッチングを定量化する方法と、モデルの知識学習の選好を特定する方法の2つの大きな問題が残る。これらの課題に対処するため、ChEBI-20-MMと呼ばれるマルチモーダルベンチマークを提案し、モデルとデータモダリティとの互換性と知識獲得を評価する1263の実験を行った。モーダル遷移確率行列を通じて、タスクに最も適したモダリティについての洞察を提供する。さらに,局所化特徴フィルタリングによって文脈固有の知識マッピングを発見する,統計的に解釈可能な手法を提案する。この先駆的な解析は学習機構の探索を提供し,分子科学におけるSLMの進歩への道を開く。

Efficient molecular modeling and design are crucial for the discovery and exploration of novel molecules, and the incorporation of deep learning methods has revolutionized this field. In particular, large language models (LLMs) offer a fresh approach to tackle scientific problems from a natural language processing (NLP) perspective, introducing a research paradigm called scientific language modeling (SLM). However, two key issues remain: how to quantify the match between model and data modalities and how to identify the knowledge-learning preferences of models. To address these challenges, we propose a multi-modal benchmark, named ChEBI-20-MM, and perform 1263 experiments to assess the model's compatibility with data modalities and knowledge acquisition. Through the modal transition probability matrix, we provide insights into the most suitable modalities for tasks. Furthermore, we introduce a statistically interpretable approach to discover context-specific knowledge mapping by localized feature filtering. Our pioneering analysis offers an exploration of the learning mechanism and paves the way for advancing SLM in molecular science.

翻訳日:2024-02-07 14:21:19 公開日:2024-02-06

# コードインスパイアオブザーバブル測定によるロバスト射影計測

Robust projective measurements through measuring code-inspired observables ( http://arxiv.org/abs/2402.04093v1 )

ライセンス: Link先を確認

Yingkai Ouyang

(参考訳) 量子測定は量子情報処理タスクの至るところで用いられるが、エラーによってその出力が信頼できなくなることがある。本稿では,符号に着想を得た可観測量を測定することで,頑健な射影測定を実現する手法を提案する。すなわち、射影的POVM、古典符号、および各可観測量が持ち得る測定結果の数に関する制約が与えられたとき、その測定がノイズのない設定における射影測定と等価になるような可換な可観測量を構成する。さらに、古典符号が $t$ 個のエラーを訂正できるならば、可観測量の測定の古典的な結果に生じた $t$ 個のエラーを訂正できる。本手法は量子データを量子誤り訂正符号に符号化する必要がないため、量子誤り訂正を使用しない近い将来の量子アルゴリズムに対する頑健な測定の構成に役立つ。さらに,提案手法は任意の射影的POVMに対して有効であり,非スタビライザー量子誤り訂正符号における頑健なシンドローム抽出手順を可能にする。

Quantum measurements are ubiquitous in quantum information processing tasks, but errors can render their outputs unreliable. Here, we present a scheme that implements a robust projective measurement through measuring code-inspired observables. Namely, given a projective POVM, a classical code and a constraint on the number of measurement outcomes each observable can have, we construct commuting observables whose measurement is equivalent to the projective measurement in the noiseless setting. Moreover, we can correct $t$ errors on the classical outcomes of the observables' measurement if the classical code corrects $t$ errors. Since our scheme does not require the encoding of quantum data onto a quantum error correction code, it can help construct robust measurements for near-term quantum algorithms that do not use quantum error correction. Moreover, our scheme works for any projective POVM, and hence can allow robust syndrome extraction procedures in non-stabilizer quantum error correction codes.
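
The error-correction step on the classical outcomes can be illustrated with a minimal sketch. It assumes, purely for illustration, that each of four POVM outcomes is labelled by a codeword of a length-5 binary code with minimum distance 3 (so $t = 1$), and that each bit is the outcome of one binary observable. The codebook and the names `CODEBOOK` and `decode_outcome` are hypothetical and are not taken from the paper.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical codebook: POVM outcome label -> codeword.
# Any two codewords differ in at least 3 positions, so 1 error is correctable.
CODEBOOK = {
    0: (0, 0, 0, 0, 0),
    1: (0, 1, 0, 1, 1),
    2: (1, 0, 1, 0, 1),
    3: (1, 1, 1, 1, 0),
}


def decode_outcome(observed_bits, codebook=CODEBOOK):
    """Return the outcome label whose codeword is nearest to the observed bits."""
    return min(codebook, key=lambda label: hamming_distance(codebook[label], observed_bits))


# One flipped bit (a faulty observable readout) is still decoded correctly.
noisy = (0, 1, 1, 1, 1)  # codeword for outcome 1 with the third bit flipped
print(decode_outcome(noisy))
```

Nearest-codeword decoding is used here only because it is the simplest decoder to state; the abstract does not say which decoder the scheme uses.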

翻訳日:2024-02-07 14:21:01 公開日:2024-02-06

# サイバーいじめ検出のための大規模言語モデルの利用

The Use of a Large Language Model for Cyberbullying Detection ( http://arxiv.org/abs/2402.04088v1 )

ライセンス: Link先を確認

Bayode Ogunleye, Babitha Dharmaraj

(参考訳) ソーシャルメディアの普及により、加害者がいじめを行う経路が増えた。残念ながら、サイバーいじめ(cyberbullying, CB)は、今日のサイバー世界でもっとも一般的な現象であり、市民の精神的および身体的健康に対する深刻な脅威である。これにより、オンラインフォーラム、ブログ、ソーシャルメディアプラットフォーム上のいじめコンテンツを防止し、社会への影響を管理する堅牢なシステムを開発する必要が生じる。この目的のために、機械学習(ML)アルゴリズムがいくつか提案されている。しかし、高いクラス不均衡や汎化の問題から、その性能は一貫していない。近年,BERT や RoBERTa のような大規模言語モデル (LLM) が,いくつかの自然言語処理 (NLP) タスクで最先端 (SOTA) の結果を達成している。残念ながら、LLMはCB検出に広く適用されていない。本稿では,これらのモデルを用いたサイバーいじめ(CB)検出について検討した。既存の研究(FormspringとTwitter)から新しいデータセット(D2)を用意した。データセットD1とD2の実験結果から,RoBERTaは他のモデルよりも優れていた。

The dominance of social media has added to the channels of bullying for perpetrators. Unfortunately, cyberbullying (CB) is the most prevalent phenomenon in today's cyber world, and is a severe threat to the mental and physical health of citizens. This creates the need to develop a robust system that keeps bullying content off online forums, blogs, and social media platforms to manage the impact on our society. Several machine learning (ML) algorithms have been proposed for this purpose. However, their performance is not consistent due to high class imbalance and generalisation issues. In recent years, large language models (LLMs) like BERT and RoBERTa have achieved state-of-the-art (SOTA) results in several natural language processing (NLP) tasks. Unfortunately, LLMs have not been applied extensively to CB detection. In our paper, we explored the use of these models for cyberbullying (CB) detection. We have prepared a new dataset (D2) from existing studies (Formspring and Twitter). Our experimental results for datasets D1 and D2 showed that RoBERTa outperformed other models.

翻訳日:2024-02-07 14:20:44 公開日:2024-02-06

# トレーニング不要のCLIPベース適応のための打ち負かしがたいベースライン

A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation ( http://arxiv.org/abs/2402.04087v1 )

ライセンス: Link先を確認

Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan

(参考訳) CLIP(Contrastive Language-Image Pretraining)はその目覚ましいゼロショット能力で人気を集めている。近年、下流タスクにおけるCLIPの性能を高めるために、プロンプト学習やアダプタなどの効率的な微調整手法の開発に焦点が当てられている。しかし、これらの手法には追加のトレーニング時間と計算資源が必要であり、限られたリソースを持つデバイスには望ましくない。本稿では,古典的なアルゴリズムであるガウス判別分析(GDA)を再検討し,CLIPの下流分類に適用する。通常、GDA は各クラスの特徴が同一の共分散を持つガウス分布に従うと仮定する。ベイズの公式を活用すれば、分類器はクラス平均と共分散で表現でき、これらは訓練を必要とせずにデータから推定できる。視覚的モダリティとテキスト的モダリティの両方からの知識を統合するため、CLIP内の元のゼロショット分類器とアンサンブルする。 17のデータセットにおける広範な結果から,本手法は,少数ショット分類,不均衡学習,分布外汎化において,最先端の手法を上回るか,同等の結果を達成することを確認した。さらに,本手法をbase-to-new汎化と教師なし学習に拡張し,競合するアプローチに対する優位性を改めて示す。コードは \url{https://github.com/mrflogs/ICLR24} で公開されている。

Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity. Recent research has focused on developing efficient fine-tuning methods, such as prompt learning and adapter, to enhance CLIP's performance in downstream tasks. However, these methods still require additional training time and computational resources, which is undesirable for devices with limited resources. In this paper, we revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP. Typically, GDA assumes that features of each class follow Gaussian distributions with identical covariance. By leveraging Bayes' formula, the classifier can be expressed in terms of the class means and covariance, which can be estimated from the data without the need for training. To integrate knowledge from both visual and textual modalities, we ensemble it with the original zero-shot classifier within CLIP. Extensive results on 17 datasets validate that our method surpasses or achieves comparable results with state-of-the-art methods on few-shot classification, imbalanced learning, and out-of-distribution generalization. In addition, we extend our method to base-to-new generalization and unsupervised learning, once again demonstrating its superiority over competing approaches. Our code is publicly available at \url{https://github.com/mrflogs/ICLR24}.
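
The training-free classifier described above can be sketched as follows. With a shared covariance $\Sigma$, class means $\mu_k$ and priors $\pi_k$, Bayes' formula gives a linear score $w_k^\top x + b_k$ with $w_k = \Sigma^{-1}\mu_k$ and $b_k = \log \pi_k - \tfrac{1}{2}\mu_k^\top \Sigma^{-1}\mu_k$. This is a minimal sketch, not the authors' code: it assumes 2-D features so that the covariance can be inverted in closed form, the names `fit_gda` and `predict` are illustrative, and the ensemble with the zero-shot CLIP classifier is omitted.

```python
import math


def fit_gda(features, labels):
    """Estimate class means, priors and a pooled 2x2 covariance; no gradient training."""
    classes = sorted(set(labels))
    n = len(features)
    means, priors = {}, {}
    for c in classes:
        rows = [x for x, y in zip(features, labels) if y == c]
        means[c] = [sum(col) / len(rows) for col in zip(*rows)]
        priors[c] = len(rows) / n

    # Pooled (shared) covariance of the class-centred features.
    sxx = sxy = syy = 0.0
    for x, y in zip(features, labels):
        dx, dy = x[0] - means[y][0], x[1] - means[y][1]
        sxx += dx * dx
        sxy += dx * dy
        syy += dy * dy
    sxx, sxy, syy = sxx / n, sxy / n, syy / n

    # Closed-form inverse of the 2x2 covariance (assumes it is non-singular).
    det = sxx * syy - sxy * sxy
    inv = [[syy / det, -sxy / det], [-sxy / det, sxx / det]]

    weights, biases = {}, {}
    for c in classes:
        m = means[c]
        w = [inv[0][0] * m[0] + inv[0][1] * m[1],
             inv[1][0] * m[0] + inv[1][1] * m[1]]
        weights[c] = w
        biases[c] = math.log(priors[c]) - 0.5 * (w[0] * m[0] + w[1] * m[1])
    return weights, biases


def predict(x, weights, biases):
    """Return the class with the highest linear discriminant score."""
    return max(weights, key=lambda c: weights[c][0] * x[0] + weights[c][1] * x[1] + biases[c])


# Toy usage with two well-separated classes.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (4, 4), (5, 4), (4, 5), (5, 5)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_gda(X, y)
print(predict((0.2, 0.3), w, b), predict((4.8, 4.1), w, b))
```

In practice the features would be high-dimensional CLIP embeddings and the covariance would need a general (possibly regularised) inverse; the 2-D restriction only keeps the sketch dependency-free.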

翻訳日:2024-02-07 14:20:28 公開日:2024-02-06

# デコヒーレンス下の量子相関

Quantum correlations under decoherence ( http://arxiv.org/abs/2402.04086v1 )

ライセンス: Link先を確認

S. V. Mousavi

(参考訳) X型状態で記述され、異方性ハイゼンベルクXY相互作用を介して相互作用する2つの結合量子ビットの系を取り上げ、ゼロ温度と有限温度の両方における環境デコヒーレンスの下で、量子エンタングルメントと、エンタングルメントを超えるいくつかの量子相関、すなわち局所量子不確定性と測定誘起非局所性の時間発展を調べる。エンタングルメントのよく知られた2つの定量化指標であるコンカレンスと対数ネガティビティの関係を論じる。測定誘起非局所性が相関コヒーレンスに等しいことを証明する。量子ビットと環境との相互作用により、独立な量子ビットではエンタングルメントの突然死が起こるが、他の相関はこの現象を示さない。エンタングルメント突然死の時刻は、ゼロ温度では解析的に、有限温度では数値的に計算する。通常の破壊的な役割とは対照的に、環境はいくつかの状況において建設的な役割を果たし、初期の量子相関がゼロであっても量子相関を誘起する。定常状態の量子相関は初期状態に依存せず、低い有限温度ではすべて非ゼロのままであることがわかった。エンタングルメントを超える量子相関は、エンタングルメントよりも温度に対して頑健であることがわかった。ゼロ温度の定常状態では、局所量子不確定性は他の相関よりも小さい。

Taking a system of two coupled qubits described by a X-shaped state and interacting through an anisotropic Heisenberg XY interaction, we examine the evolution of quantum entanglement and a few quantum correlations beyond entanglement, local quantum uncertainty and measurement-induced nonlocality, under the environmental decoherence both for zero and finite temperatures. The relation between concurrence and log negativity as two well-known quantifiers of entanglement is argued. It will be proven that measurement-induced nonlocality equals correlated coherence. The interaction of qubits with the environment causes quantum entanglement to suddenly die for independent qubits, but other correlations do not experience this phenomenon. The time of entanglement sudden death is calculated analytically for zero temperature, while numerically for finite temperatures. Contrary to its usual destructive role, the environment plays a constructive role in some situations, inducing quantum correlations even when the initial quantum correlations are zero. The steady state quantum correlations, being independent of the initial state, are found to remain all non-zero for low, finite temperatures. It is found that quantum correlations beyond entanglement are more robust against temperature than entanglement. The zero-temperature steady state exhibits less local quantum uncertainty than the other correlations.
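
For a two-qubit X-shaped state, concurrence has a well-known closed form, $C = 2\max\{0,\ |\rho_{23}| - \sqrt{\rho_{11}\rho_{44}},\ |\rho_{14}| - \sqrt{\rho_{22}\rho_{33}}\}$, in the basis $|00\rangle, |01\rangle, |10\rangle, |11\rangle$. The sketch below evaluates it; the function name and the argument order are illustrative choices, not notation from the paper. Entanglement sudden death corresponds to both differences becoming non-positive at a finite time, at which point the maximum is clamped to zero.

```python
import math


def x_state_concurrence(rho11, rho22, rho33, rho44, rho14, rho23):
    """Concurrence of a two-qubit X state from its non-zero density-matrix entries."""
    c1 = abs(rho23) - math.sqrt(rho11 * rho44)
    c2 = abs(rho14) - math.sqrt(rho22 * rho33)
    return 2.0 * max(0.0, c1, c2)


# Bell state (|00> + |11>)/sqrt(2): maximally entangled.
print(x_state_concurrence(0.5, 0.0, 0.0, 0.5, 0.5, 0.0))

# Maximally mixed state: no entanglement.
print(x_state_concurrence(0.25, 0.25, 0.25, 0.25, 0.0, 0.0))
```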

翻訳日:2024-02-07 14:20:02 公開日:2024-02-06

# マルチヘッド注意層の証明可能な学習

Provably learning a multi-head attention layer ( http://arxiv.org/abs/2402.04084v1 )

ライセンス: Link先を確認

Sitan Chen, Yuanzhi Li

(参考訳) マルチヘッドアテンション層は、従来のフィードフォワードモデルと一線を画すトランスフォーマーアーキテクチャの重要なコンポーネントの1つである。 Given a sequence length $k$, attention matrices $\mathbf{\Theta}_1,\ldots,\mathbf{\Theta}_m\in\mathbb{R}^{d\times d}$, and projection matrices $\mathbf{W}_1,\ldots,\mathbf{W}_m\in\mathbb{R}^{d\times d}$, the corresponding multi-head attention layer $F: \mathbb{R}^{k\times d}\to \mathbb{R}^{k\times d}$ transforms length-$k$ sequences of $d$-dimensional tokens $\mathbf{X}\in\mathbb{R}^{k\times d}$ via $F(\mathbf{X}) \triangleq \sum^m_{i=1} \mathrm{softmax}(\mathbf{X}\mathbf{\Theta}_i\mathbf{X}^\top)\mathbf{X}\mathbf{W}_i$. 本研究では、ランダムな例からマルチヘッドアテンション層を証明可能な形で学習する研究を開始し、この問題に対して最初の非自明な上界と下界を与える: - $\{\mathbf{W}_i, \mathbf{\Theta}_i\}$ が特定の非退化条件を満たすとき、$\{\pm 1\}^{k\times d}$ から一様に抽出されたランダムなラベル付き例から $F$ を小さな誤差で学習する $(dk)^{O(m^3)}$ 時間アルゴリズムを与える。 - 計算量的下界を証明し、最悪の場合、$m$ への指数的依存は避けられないことを示す。大規模言語モデルにおけるトークンの離散的な性質を模倣するためにブール値の $\mathbf{X}$ に焦点を当てるが、我々の手法は標準的な連続的設定(例えばガウス分布)にも自然に拡張できる。提案アルゴリズムは,例を用いて未知のパラメータを含む凸体を彫り出すことを中心としており,ガウス分布の代数的性質および回転不変性を主に活用するフィードフォワードネットワーク学習のための既存の証明可能なアルゴリズムとは大きく異なる。対照的に,本解析は主に入力分布とその「スライス」に対する様々な上側・下側の裾確率の評価に依存しているため,より柔軟である。

The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. Given a sequence length $k$, attention matrices $\mathbf{\Theta}_1,\ldots,\mathbf{\Theta}_m\in\mathbb{R}^{d\times d}$, and projection matrices $\mathbf{W}_1,\ldots,\mathbf{W}_m\in\mathbb{R}^{d\times d}$, the corresponding multi-head attention layer $F: \mathbb{R}^{k\times d}\to \mathbb{R}^{k\times d}$ transforms length-$k$ sequences of $d$-dimensional tokens $\mathbf{X}\in\mathbb{R}^{k\times d}$ via $F(\mathbf{X}) \triangleq \sum^m_{i=1} \mathrm{softmax}(\mathbf{X}\mathbf{\Theta}_i\mathbf{X}^\top)\mathbf{X}\mathbf{W}_i$. In this work, we initiate the study of provably learning a multi-head attention layer from random examples and give the first nontrivial upper and lower bounds for this problem: - Provided $\{\mathbf{W}_i, \mathbf{\Theta}_i\}$ satisfy certain non-degeneracy conditions, we give a $(dk)^{O(m^3)}$-time algorithm that learns $F$ to small error given random labeled examples drawn uniformly from $\{\pm 1\}^{k\times d}$. - We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable. We focus on Boolean $\mathbf{X}$ to mimic the discrete nature of tokens in large language models, though our techniques naturally extend to standard continuous settings, e.g. Gaussian. Our algorithm, which is centered around using examples to sculpt a convex body containing the unknown parameters, is a significant departure from existing provable algorithms for learning feedforward networks, which predominantly exploit algebraic and rotation invariance properties of the Gaussian distribution. In contrast, our analysis is more flexible as it primarily relies on various upper and lower tail bounds for the input distribution and "slices" thereof.
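
The layer definition $F(\mathbf{X}) = \sum_{i=1}^m \mathrm{softmax}(\mathbf{X}\mathbf{\Theta}_i\mathbf{X}^\top)\mathbf{X}\mathbf{W}_i$ can be evaluated directly, as in the following dependency-free sketch. It assumes the softmax is applied row-wise, which is the usual convention but is not stated in the abstract, and the helper names are illustrative. It only computes the forward map $F$ and does not implement the learning algorithm.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def row_softmax(a):
    """Softmax applied independently to each row (assumed convention)."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def multi_head_attention(x, thetas, ws):
    """F(X) = sum_i softmax(X Theta_i X^T) X W_i for X of shape k x d."""
    k, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(k)]
    xt = transpose(x)
    for theta, w in zip(thetas, ws):
        scores = matmul(matmul(x, theta), xt)             # k x k attention scores
        head = matmul(matmul(row_softmax(scores), x), w)  # k x d output of one head
        out = [[o + h for o, h in zip(orow, hrow)] for orow, hrow in zip(out, head)]
    return out


# Boolean-valued tokens in {+1, -1}, as in the setting studied in the abstract.
X = [[1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
print(multi_head_attention(X, [zero, zero], [eye, eye]))
```

With $\mathbf{\Theta}_i = 0$ every attention row is uniform, so each head outputs the column means of $\mathbf{X}$ in every row, which makes the example easy to check by hand.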

翻訳日:2024-02-07 14:19:39 公開日:2024-02-06

# 最適住宅価格予測アルゴリズム:XGBoost

An Optimal House Price Prediction Algorithm: XGBoost ( http://arxiv.org/abs/2402.04082v1 )

ライセンス: Link先を確認

Hemlata Sharma, Hitesh Harsora, Bayode Ogunleye

(参考訳) 住宅価格の正確な予測は不動産や住宅ローンなど様々な分野の基本的な要件である。資産価値はその物理的特性によって決定されるだけでなく、その周辺地域の影響を強く受けていると広く認識されている。予算制約のバランスをとりながら、個人の多様な住宅ニーズを満たすことは、不動産開発にとって主要な関心事である。そこで我々は,住宅価格予測問題を回帰課題として扱い,独立変数の意義を表現可能な機械学習技術を用いて検討した。米国アイオワ州エイムズシティの住宅データを用いて、住宅価格予測のためのサポートベクトル回帰器、ランダム森林回帰器、XGBoost、多層パーセプトロン、複数線形回帰アルゴリズムを比較した。その後,住宅コストに影響を与える要因を特定した。以上の結果から,XGBoostは住宅価格予測に最適であることがわかった。

An accurate prediction of house prices is a fundamental requirement for various sectors including real estate and mortgage lending. It is widely recognized that a property value is not solely determined by its physical attributes but is significantly influenced by its surrounding neighbourhood. Meeting the diverse housing needs of individuals while balancing budget constraints is a primary concern for real estate developers. To this end, we addressed the house price prediction problem as a regression task and thus employed various machine learning techniques capable of expressing the significance of independent variables. We made use of the housing dataset of Ames City in Iowa, USA to compare support vector regressor, random forest regressor, XGBoost, multilayer perceptron and multiple linear regression algorithms for house price prediction. Afterwards, we identified the key factors that influence housing costs. Our results show that XGBoost is the best performing model for house price prediction.

翻訳日:2024-02-07 14:18:37 公開日:2024-02-06

# データ拡張による重み空間ネットワークの汎化性能の向上

Improved Generalization of Weight Space Networks via Augmentations ( http://arxiv.org/abs/2402.04081v1 )

ライセンス: Link先を確認

Aviv Shamsian, Aviv Navon, David W. Zhang, Yan Zhang, Ethan Fetaya, Gal Chechik, Haggai Maron

(参考訳) ニューラルネットワークが他のニューラルネットワークの重みを処理する深層重み空間(DWS)での学習は、新たな研究方向であり、2Dおよび3Dのニューラルフィールド(INR、NeRF)への応用や、他のタイプのニューラルネットワークに関する推論への応用がある。残念ながら、重み空間モデルは大幅に過学習する傾向がある。我々は、この過学習の理由を実証的に分析し、主要な理由の一つがDWSデータセットの多様性の欠如であることを見出した。与えられたオブジェクトは多くの異なる重み構成で表現できるが、典型的なINRトレーニングセットは、同じオブジェクトを表すINR間のばらつきを捉えていない。そこで本研究では,重み空間におけるデータ拡張の戦略を検討し,重み空間に適合させたMixUp手法を提案する。これらの手法の有効性を2つの設定で示す。分類では、最大10倍のデータを用いた場合と同程度に性能が向上する。自己教師付きコントラスト学習では、下流の分類において5～10%という大幅な向上が得られる。

Learning in deep weight spaces (DWS), where neural networks process the weights of other neural networks, is an emerging research direction, with applications to 2D and 3D neural fields (INRs, NeRFs), as well as making inferences about other types of neural networks. Unfortunately, weight space models tend to suffer from substantial overfitting. We empirically analyze the reasons for this overfitting and find that a key reason is the lack of diversity in DWS datasets. While a given object can be represented by many different weight configurations, typical INR training sets fail to capture variability across INRs that represent the same object. To address this, we explore strategies for data augmentation in weight spaces and propose a MixUp method adapted for weight spaces. We demonstrate the effectiveness of these methods in two setups. In classification, they improve performance similarly to having up to 10 times more data. In self-supervised contrastive learning, they yield substantial 5-10% gains in downstream classification.
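
As a point of reference, plain MixUp applied to flattened weight vectors looks like the sketch below. This is an assumption-laden baseline rather than the paper's method: the abstract does not describe how MixUp is adapted to weight spaces, and direct interpolation ignores the fact that permuting hidden neurons yields different weight vectors for the same function. The names `mixup_weights` and `sample_lambda` and the default `alpha` are illustrative.

```python
import random


def sample_lambda(alpha=0.2, rng=random):
    """Draw the mixing coefficient from Beta(alpha, alpha), as in standard MixUp."""
    return rng.betavariate(alpha, alpha)


def mixup_weights(w_a, y_a, w_b, y_b, lam):
    """Convex combination of two flattened weight vectors and their one-hot labels."""
    mixed_w = [lam * a + (1.0 - lam) * b for a, b in zip(w_a, w_b)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed_w, mixed_y


# Two toy "networks" (flattened weights) representing objects of different classes.
w1, y1 = [1.0, 2.0], [1.0, 0.0]
w2, y2 = [5.0, 6.0], [0.0, 1.0]
print(mixup_weights(w1, y1, w2, y2, lam=0.25))
```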

翻訳日:2024-02-07 14:18:22 公開日:2024-02-06

# オフライン強化学習のためのQアンサンブルを用いたエントロピー規則化拡散政策

Entropy-regularized Diffusion Policy with Q-Ensembles for Offline Reinforcement Learning ( http://arxiv.org/abs/2402.04080v1 )

ライセンス: Link先を確認

Ruoqi Zhang, Ziwei Luo, Jens Sjölund, Thomas B. Schön, Per Mattsson

(参考訳) 本稿では,オフライン強化学習(RL)のための拡散ポリシーの高度な訓練技術について述べる。中心にあるのは平均回帰型の確率微分方程式(SDE)であり、複雑な行動分布を標準ガウス分布に変換したうえで、典型的な拡散ポリシーと同様に、対応する逆時間SDEを用いて環境状態に条件付けられた行動をサンプリングする。このようなSDEには、ポリシーの対数確率を計算するために使用できる解があることを示し、オフラインデータセットの探索を改善するエントロピー正則化項を得る。また,分布外データ点に起因する不正確な価値関数の影響を軽減するために,より堅牢な方策改善に向けて,Qアンサンブルの下側信頼限界を学習することを提案する。オフラインRLにおいてエントロピー正則化拡散ポリシーとQアンサンブルを組み合わせることで,D4RLベンチマークのほとんどのタスクで最先端の性能を達成する。コードは \href{https://github.com/ruoqizzz/Entropy-Regularized-Diffusion-Policy-with-QEnsemble}{https://github.com/ruoqizzz/Entropy-Regularized-Diffusion-Policy-with-QEnsemble} で公開されている。

This paper presents advanced techniques of training diffusion policies for offline reinforcement learning (RL). At the core is a mean-reverting stochastic differential equation (SDE) that transfers a complex action distribution into a standard Gaussian and then samples actions conditioned on the environment state with a corresponding reverse-time SDE, like a typical diffusion policy. We show that such an SDE has a solution that we can use to calculate the log probability of the policy, yielding an entropy regularizer that improves the exploration of offline datasets. To mitigate the impact of inaccurate value functions from out-of-distribution data points, we further propose to learn the lower confidence bound of Q-ensembles for more robust policy improvement. By combining the entropy-regularized diffusion policy with Q-ensembles in offline RL, our method achieves state-of-the-art performance on most tasks in D4RL benchmarks. Code is available at \href{https://github.com/ruoqizzz/Entropy-Regularized-Diffusion-Policy-with-QEnsemble}{https://github.com/ruoqizzz/Entropy-Regularized-Diffusion-Policy-with-QEnsemble}.
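
A lower confidence bound over a Q-ensemble is commonly computed as the ensemble mean minus a multiple of the ensemble standard deviation. The sketch below shows that form; it is an assumption, since the abstract does not give the exact estimator, and the name `q_lower_confidence_bound` and the coefficient `beta` are illustrative.

```python
import statistics


def q_lower_confidence_bound(q_values, beta=1.0):
    """Pessimistic value estimate: ensemble mean minus beta times the ensemble std."""
    mean = statistics.fmean(q_values)
    std = statistics.pstdev(q_values)
    return mean - beta * std


# Ensemble members agree: the bound equals their common value.
print(q_lower_confidence_bound([1.0, 1.0, 1.0]))

# Ensemble members disagree (e.g. on an out-of-distribution action): the bound is lowered.
print(q_lower_confidence_bound([0.0, 2.0], beta=1.0))
```

Penalising disagreement in this way makes actions with uncertain value estimates less attractive during policy improvement, which matches the motivation stated in the abstract.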

翻訳日:2024-02-07 14:18:04 公開日:2024-02-06

# メトロノームスピンは時間結晶ダイナミクスを安定化させる

A metronome spin stabilizes time-crystalline dynamics ( http://arxiv.org/abs/2402.04078v1 )

ライセンス: Link先を確認

Niklas Euler, Adrian Braemer, Luca Benn, Martin Gärttner

(参考訳) 各スピンを角度 $\pi(1-\epsilon_i)$ だけ回転させる時間周期駆動を受ける、乱れのない量子イジング鎖を調べる。すべてのスピンが同じ偏差 $\epsilon$ を受け、系が完全偏極状態に初期化されている場合、ダイナミクスは時間結晶的であることが知られており、系の磁化は鎖の長さとともに指数関数的に増大する時間スケールにわたって周期倍化振動を示す。本研究では,スピンごとに異なる偏差 $\epsilon$ の効果について検討する。 1つのスピンの $\epsilon$ を小さくすると、時空間秩序の寿命が劇的に延びることがわかり、これが「メトロノーム」スピンという名前の由来である。平均ハミルトニアン描像における摂動論的議論を用いて、巨視的なバルク磁化を持つ初期状態についてこの観察を説明する。さらに、ランダムなビット列の初期状態の場合には、トポロジカルなエッジモードの寿命が延びることを報告し、これも同じ描像で理解できる。最後に,メトロノームスピンが鎖の直接の一部ではない変更された幾何学について論じ,検討した2つのシナリオでダイナミクスへの影響が異なることを示す。本研究の結果は,空間的に変化する駆動の影響下でFloquet系に現れる複雑なダイナミクスを明らかにし,Floquetエンジニアリングの新たな道を開くものである。

We investigate a disorder-free quantum Ising chain subject to a time-periodic drive that rotates each spin by an angle $\pi(1-\epsilon_i)$. In case all spins experience the same deviation $\epsilon$ and the system is initialized in a fully polarized state, the dynamics is known to be time-crystalline: the magnetization of the system exhibits period-doubled oscillations for timescales that grow exponentially with the length of the chain. In this work, we study the effect of a deviation $\epsilon$ that differs between spins. We find that reducing $\epsilon$ for a single spin drastically enhances the lifetime of spatio-temporal order, suggesting the name "metronome" spin. Employing perturbative arguments in an average Hamiltonian picture, we explain this observation for initial states with macroscopic bulk magnetization. Furthermore, in the case of random bitstring initial states, we report the enhancement of the lifetime of a topological edge mode, which can also be understood in the same picture. Finally, we discuss an altered geometry in which the metronome spin is not directly part of the chain, affecting the dynamics in different ways in the two scenarios considered. Our findings unveil the intricate dynamics that emerge in Floquet systems under the influence of a spatially varying drive, thereby uncovering new avenues for Floquet engineering.

翻訳日:2024-02-07 14:17:41 公開日:2024-02-06

# 教師・生徒型大規模言語モデルを用いた放射線腫瘍学の症状抽出のための反復的プロンプト改良

Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models ( http://arxiv.org/abs/2402.04075v1 )

ライセンス: Link先を確認

Reza Khanmohammadi, Ahmed I Ghanem, Kyle Verdecchia, Ryan Hall, Mohamed Elshaikh, Benjamin Movsas, Hassan Bagher-Ebadian, Indrin Chetty, Mohammad M. Ghassemi, Kundan Thind

(参考訳) 本研究では,臨床ノートからの前立腺癌放射線治療症状の抽出を改善するために,大規模言語モデル(LLM)を用いた新しい教師・生徒アーキテクチャを提案する。生徒モデルであるMixtralがまず症状を抽出し、続いて教師モデルであるGPT-4がMixtralの性能に基づいてプロンプトを改良する。この反復的なプロセスでは、12の症状にまたがる294件の単一症状臨床ノートを用い、エポックあたり最大16ラウンドの改良を行った。その結果, 単一症状ノートと複数症状ノートのいずれにおいても症状の抽出は有意に改善した。 59件の単一症状ノートでは、正解率(accuracy)は0.51から0.71に、適合率(precision)は0.52から0.82に、再現率(recall)は0.52から0.72に、F1スコアは0.49から0.73に向上した。 375件の複数症状ノートでは、正解率は0.24から0.43に、適合率は0.6から0.76に、再現率は0.24から0.43に、F1スコアは0.20から0.44に上昇した。これらの結果から, 放射線腫瘍学におけるLLM利用に対する高度なプロンプトエンジニアリングの有効性が示された。

This study introduces a novel teacher-student architecture utilizing Large Language Models (LLMs) to improve prostate cancer radiotherapy symptom extraction from clinical notes. Mixtral, the student model, initially extracts symptoms, followed by GPT-4, the teacher model, which refines prompts based on Mixtral's performance. This iterative process involved 294 single symptom clinical notes across 12 symptoms, with up to 16 rounds of refinement per epoch. Results showed significant improvements in extracting symptoms from both single and multi-symptom notes. For 59 single symptom notes, accuracy increased from 0.51 to 0.71, precision from 0.52 to 0.82, recall from 0.52 to 0.72, and F1 score from 0.49 to 0.73. In 375 multi-symptom notes, accuracy rose from 0.24 to 0.43, precision from 0.6 to 0.76, recall from 0.24 to 0.43, and F1 score from 0.20 to 0.44. These results demonstrate the effectiveness of advanced prompt engineering in LLMs for radiation oncology use.

翻訳日:2024-02-07 14:17:17 公開日:2024-02-06

# X線超蛍光のエルミート確率法

Hermitian stochastic methodology for X-ray superfluorescence ( http://arxiv.org/abs/2402.04069v1 )

ライセンス: Link先を確認

Stasis Chuchurka, Vladislav Sukharnikov, and Nina Rohringer

(参考訳) 最近導入されたX線増幅自発放出のダイナミクスをモデル化するための理論的枠組みは、他の位相空間サンプリング技術と同様に、量子エミッタの密度行列と放射場の確率的サンプリングに基づいている。第一原理に基づいており価値ある理論的洞察を与える一方で、元の確率微分方程式は発散と数値不安定性を示す。本稿では,確率的成分を摂動的に取り扱うことにより,この問題を解決する。改良された定式化は自発放出の特性を正確に再現し、自発放出、増幅自発放出、非線形領域を含む、近軸幾何における集団的X線放出の全ての段階の記述に普遍的に適用できる。数値例を通して,1次元近似における超蛍光の重要な特徴を解析する。重要なことに、基礎となる確率方程式の個々の実現は、超蛍光の個々の実験観測として完全に解釈できる。

A recently introduced theoretical framework for modeling the dynamics of X-ray amplified spontaneous emission is based on stochastic sampling of the density matrix of quantum emitters and the radiation field, similarly to other phase-space sampling techniques. While based on first principles and providing valuable theoretical insights, the original stochastic differential equations exhibit divergences and numerical instabilities. Here, we resolve this issue by accounting for the stochastic components perturbatively. The refined formalism accurately reproduces the properties of spontaneous emission and proves universally applicable for describing all stages of collective X-ray emission in paraxial geometry, including spontaneous emission, amplified spontaneous emission, and the non-linear regime. Through numerical examples, we analyze key features of superfluorescence in the one-dimensional approximation. Importantly, single realizations of the underlying stochastic equations can be fully interpreted as individual experimental observations of superfluorescence.

翻訳日:2024-02-07 14:16:55 公開日:2024-02-06

# Retrieve to Explain: 言語モデルによるエビデンス駆動の予測

Retrieve to Explain: Evidence-driven Predictions with Language Models ( http://arxiv.org/abs/2402.04068v1 )

ライセンス: Link先を確認

Ravi Patel (1), Angus Brayne (1), Rogier Hintzen (1), Daniel Jaroslawicz (1), Georgiana Neculae (1), Dane Corneil (1) ((1) BenevolentAI)

(参考訳) 機械学習モデル、特に言語モデルは、内部の挙動を調べることが難しいことで知られている。ブラックボックスモデルは、モデルトレーニングの問題と有害なバイアスの両方を覆い隠す可能性がある。ヒューマン・イン・ザ・ループのプロセスでは、不透明な予測は信頼の欠如を招き、モデルが有効に機能していてもその影響力を制限する。これらの問題に対処するために、Retrieve to Explain (R2E)を紹介する。 R2Eは検索に基づく言語モデルであり、文書コーパス中のエビデンスに基づいて、研究上の問いに対する事前定義された回答候補の集合に優先順位を付け、シャープレイ値を用いて最終的な予測に対する各エビデンスの相対的重要性を特定する。 R2Eは再訓練することなく新しいエビデンスに適応でき、自然言語へのテンプレート化を通じて構造化データを組み込むことができる。公表された科学文献からの創薬標的同定というユースケースで評価を行い,本モデルが臨床試験結果の予測において業界標準の遺伝学に基づくアプローチを上回ることを示す。

Machine learning models, particularly language models, are notoriously difficult to introspect. Black-box models can mask both issues in model training and harmful biases. For human-in-the-loop processes, opaque predictions can drive lack of trust, limiting a model's impact even when it performs effectively. To address these issues, we introduce Retrieve to Explain (R2E). R2E is a retrieval-based language model that prioritizes amongst a pre-defined set of possible answers to a research question based on the evidence in a document corpus, using Shapley values to identify the relative importance of pieces of evidence to the final prediction. R2E can adapt to new evidence without retraining, and incorporate structured data through templating into natural language. We assess on the use case of drug target identification from published scientific literature, where we show that the model outperforms an industry-standard genetics-based approach on predicting clinical trial outcomes.
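
The attribution idea can be illustrated with an exact Shapley computation over a small set of evidence items. This is a generic sketch rather than R2E's implementation: the `value` function stands in for a model score computed from a subset of retrieved evidence, the evidence names and scores are invented for the example, and exact enumeration over all orderings is only feasible for a handful of items.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}


# Hypothetical score: the prediction is only supported when two passages co-occur.
def toy_score(evidence):
    return 1.0 if {"passage_a", "passage_b"} <= evidence else 0.0


print(shapley_values(["passage_a", "passage_b", "passage_c"], toy_score))
```

In the toy example the two jointly necessary passages share the credit equally and the irrelevant passage receives none, which is the kind of relative-importance signal the abstract describes.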

翻訳日:2024-02-07 14:16:39 公開日:2024-02-06

# 自律道路補修のための空間的・チャネル的注意による多種道路欠陥検出とセグメンテーション

Multi-class Road Defect Detection and Segmentation using Spatial and Channel-wise Attention for Autonomous Road Repairing ( http://arxiv.org/abs/2402.04064v1 )

ライセンス: Link先を確認

Jongmin Yu, Chen Bene Chi, Sebastiano Fichera, Paolo Paoletti, Devansh Mehta, and Shan Luo

(参考訳) 道路舗装の検出とセグメンテーションは、自律的な道路修復システムの開発に不可欠である。しかし, 道路舗装画像のテクスチャ的単純さ, 欠陥ジオメトリの多様性, クラス間の形態的曖昧さなどにより, 複数クラスの欠陥検出とセグメンテーションを同時に行うインスタンスセグメンテーション手法の開発は困難である。道路欠陥検出とセグメント化のための新しいエンドツーエンド手法を提案する。提案手法は,空間的およびチャネル的次元にわたるグローバルな表現を学習するための,複数の空間的およびチャネル的注意ブロックから構成される。これらの注目ブロックを通じて、道路欠陥の形態情報(空間的特徴)のよりグローバルに一般化された表現と、画像の色と深度情報を学ぶことができる。提案手法の有効性を実証するため,9つの道路欠陥クラスを付加した新たに収集したデータセット上で,様々なアブレーション実験と先行手法との比較を行った。実験の結果,提案手法は既存の道路欠陥検出法やセグメント化法よりも優れていることがわかった。

Road pavement detection and segmentation are critical for developing autonomous road repair systems. However, developing an instance segmentation method that simultaneously performs multi-class defect detection and segmentation is challenging due to the textural simplicity of road pavement images, the diversity of defect geometries, and the morphological ambiguity between classes. We propose a novel end-to-end method for multi-class road defect detection and segmentation. The proposed method comprises multiple spatial and channel-wise attention blocks that learn global representations across spatial and channel-wise dimensions. Through these attention blocks, more globally generalised representations of morphological information (spatial characteristics) of road defects and colour and depth information of images can be learned. To demonstrate the effectiveness of our framework, we conducted various ablation studies and comparisons with prior methods on a newly collected dataset annotated with nine road defect classes. The experiments show that our proposed method outperforms existing state-of-the-art methods for multi-class road defect detection and segmentation.

翻訳日:2024-02-07 14:16:22 公開日:2024-02-06

# リレーショナルハイパーグラフによるリンク予測

Link Prediction with Relational Hypergraphs ( http://arxiv.org/abs/2402.04062v1 )

ライセンス: Link先を確認

Xingyue Huang, Miguel Romero Orth, Pablo Barceló, Michael M. Bronstein, İsmail İlkan Ceylan

(参考訳) 知識グラフとのリンク予測は、グラフ機械学習において徹底的に研究されており、成功したアプリケーションとグラフニューラルネットワークアーキテクチャの豊かな展望につながっている。それでも、これらのアーキテクチャの成功をリレーショナルハイパーグラフとリンクするために転送することは依然として困難である。関係のハイパーエッジが存在するため、リンク予測は、様々な選択のために$k$のノード間でタスクとなり、すべての関係がバイナリ($k=2$)であるナレッジグラフとのリンク予測よりもかなり難しい。本稿では,関係ハイパーグラフとリンク予測を行う2つのフレームワークを提案し,対応する関係性Weisfeiler-Lemanアルゴリズムおよびいくつかの自然な論理形式を用いてモデルアーキテクチャの表現力を徹底的に解析する。広範な経験的分析を通じて,提案するモデルアーキテクチャのパワーを,様々なリレーショナルハイパーグラフベンチマークで検証した。結果として得られたモデルアーキテクチャは、インダクティブリンク予測のベースラインを実質的に上回り、トランスダクティブリンク予測の最先端結果に繋がる。そこで本研究では,グラフニューラルネットワークの完全関係構造への応用を解き明かす。

Link prediction with knowledge graphs has been thoroughly studied in graph machine learning, leading to a rich landscape of graph neural network architectures with successful applications. Nonetheless, it remains challenging to transfer the success of these architectures to link prediction with relational hypergraphs. The presence of relational hyperedges makes link prediction a task between $k$ nodes for varying choices of $k$, which is substantially harder than link prediction with knowledge graphs, where every relation is binary ($k=2$). In this paper, we propose two frameworks for link prediction with relational hypergraphs and conduct a thorough analysis of the expressive power of the resulting model architectures via corresponding relational Weisfeiler-Leman algorithms, and also via some natural logical formalisms. Through extensive empirical analysis, we validate the power of the proposed model architectures on various relational hypergraph benchmarks. The resulting model architectures substantially outperform every baseline for inductive link prediction, and lead to state-of-the-art results for transductive link prediction. Our study therefore unlocks applications of graph neural networks to fully relational structures.

翻訳日:2024-02-07 14:16:03 公開日:2024-02-06

# 原子層堆積と高温熱処理による3次元超伝導ニオブ共振器の2準位系散逸の低減

Reducing two-level system dissipations in 3D superconducting Niobium resonators by atomic layer deposition and high temperature heat treatment ( http://arxiv.org/abs/2402.04137v1 )

ライセンス: Link先を確認

Yasmine Kalboussi, Baptiste Delatte, Sarra Bira, Kassiogé Dembele, Xiaoyan Li, Frederic Miserque, Nathalie Brun, Michael Walls, Jean-luc Maurice, Diana Dragoe, Jocelyne Leroy, David Longuevergne, Aurélie Gentils, Stéphanie Jublot-Leclerc, Gregoire Julien, Fabien Eozenou, Matthieu Baudrier, Luc Maurice and Thomas Proslier

(参考訳) 超伝導量子ビットは、世界の計算能力に革命をもたらそうとしている量子コンピューティングの主要な技術プラットフォームとして登場した。それにもかかわらず、計算上信頼できる量子ビット回路の製造には量子コヒーレンス寿命の増大が必要であり、これは主に超伝導薄膜と隣接する誘電体領域に存在する2準位系(TLS)欠陥の散逸によって制限される。本稿では、10 nmの酸化アルミニウム(Al2O3)薄膜の原子層堆積(ALD)と、それに続く650 °Cでの数時間の高真空(HV)熱処理により、3次元超伝導高周波(SRF)ニオブ共振器の2準位系損失が低減されることを実証する。 By probing the effect of several heat treatments on Al2O3-coated niobium samples by X-ray photoelectron spectroscopy (XPS) plus scanning and conventional high resolution transmission electron microscopy (STEM/HRTEM) coupled with electron energy loss spectroscopy (EELS) and energy-dispersive X-ray spectroscopy (EDX), we witness a dissolution of niobium native oxides and the modification of the Al2O3-Nb interface, which correlates with the enhancement of the quality factor at low fields of two 1.3 GHz niobium cavities coated with 10 nm of Al2O3.

Superconducting qubits have arisen as a leading technology platform for quantum computing which is on the verge of revolutionizing the world's calculation capacities. Nonetheless, the fabrication of computationally reliable qubit circuits requires increasing the quantum coherence lifetimes, which are predominantly limited by the dissipations of two-level system (TLS) defects present in the thin superconducting film and the adjacent dielectric regions. In this paper, we demonstrate the reduction of two-level system losses in three-dimensional superconducting radio frequency (SRF) niobium resonators by atomic layer deposition (ALD) of a 10 nm aluminum oxide (Al2O3) thin film followed by a high vacuum (HV) heat treatment at 650 °C for a few hours. By probing the effect of several heat treatments on Al2O3-coated niobium samples by X-ray photoelectron spectroscopy (XPS) plus scanning and conventional high resolution transmission electron microscopy (STEM/HRTEM) coupled with electron energy loss spectroscopy (EELS) and energy-dispersive X-ray spectroscopy (EDX), we witness a dissolution of niobium native oxides and the modification of the Al2O3-Nb interface, which correlates with the enhancement of the quality factor at low fields of two 1.3 GHz niobium cavities coated with 10 nm of Al2O3.

翻訳日:2024-02-07 14:07:57 公開日:2024-02-06

# 重力メトロロジートライアングル

A gravitational metrological triangle ( http://arxiv.org/abs/2402.04135v1 )

ライセンス: Link先を確認

Claus Lammerzahl (Univ. Bremen) and Sebastian Ulbricht (PTB Braunschweig)

(参考訳) アインシュタインの一般相対性理論の弱場極限とマクスウェルの電磁気学理論の数学的構造の類似性に動機づけられ、ジョセフソン効果と量子ホール効果の重力的類似物が存在することを示す。これらの効果を組み合わせることで、量子・電気メトロロジートライアングルの重力的類似物が導かれる。重力メトロロジートライアングルは計量学(メトロロジー)に応用できる可能性があり、プランク定数と基本粒子の質量との関係を調べるために利用できる。これにより、弱い等価原理の量子テストが可能になる。さらに、重力メトロロジートライアングルと量子・電気メトロロジートライアングルの類似性は、量子力学の普遍性をテストするのに利用できる。

Motivated by the similarity of the mathematical structure of Einstein's General Relativity in its weak field limit and of Maxwell's theory of electrodynamics, it is shown that there are gravitational analogues of the Josephson effect and the quantum Hall effect. These effects can be combined to derive a gravitational analogue of the quantum/electric metrological triangle. The gravitational metrological triangle may have applications in metrology and could be used to investigate the relation of the Planck constant to fundamental particle masses. This allows for quantum tests of the Weak Equivalence Principle. Moreover, the similarity of the gravitational and the quantum/electrical metrological triangle can be used to test the universality of quantum mechanics.

翻訳日:2024-02-07 14:07:32 公開日:2024-02-06

# 周期駆動アーベルモデルにおける非可換エニオン

Non-Abelian anyons in a periodically-driven Abelian model ( http://arxiv.org/abs/2402.04131v1 )

ライセンス: Link先を確認

Francesco Petiziol

(参考訳) 本研究では、時間周期的な駆動を受けるアーベル型のトポロジカル秩序モデルから非可換エニオンが出現し得ることを、駆動されたトーリック符号モデルにおけるイジングエニオンという具体例で示す。トーリック符号は、相互セミオンである、すなわち互いを$\pi$フラックスとみなすフェルミオン的およびボソン的な準粒子を持つ。フロケ変調はフェルミオン的準粒子のみに作用し、トポロジカルに非自明なバンド構造を誘起するように設計されており、高周波領域で駆動に支援された$p+ip$ペアリングを実現する。その結果、フェルミオンはボソン的準粒子に束縛されたフロケ・マヨラナゼロモードへと分数化し、ボソン的準粒子は非可換な性質を獲得する。これらの性質は、準粒子描像における高周波展開と、厳密な準エネルギースペクトルおよび非可換交換位相の数値計算によって解析される。本研究は、駆動されたトポロジカル秩序量子物質の非平衡物理に光を当て、人工的に設計された量子系における非可換な振る舞いの観測を容易にする可能性がある。

We show that non-Abelian anyons can emerge from an Abelian topologically-ordered model subject to time-periodic driving, with the specific example of Ising anyons in a driven toric-code model. The toric code possesses fermionic and bosonic quasiparticles which are mutual semions, namely they see each other as $\pi$ fluxes. The Floquet modulation addresses the fermionic quasiparticles only and is designed to induce a topologically non-trivial band structure, realizing drive-assisted $p+ip$ pairing in the high-frequency regime. As a result, the fermions fractionalize into Floquet-Majorana zero modes hosted by the bosonic quasiparticles, which then develop non-Abelian character. These properties are analyzed through high-frequency expansions in a quasiparticle picture and, numerically, via computation of the exact quasienergy spectra and of the non-Abelian exchange phases. Our findings shed light on the nonequilibrium physics of driven topologically-ordered quantum matter and may facilitate the observation of non-Abelian behaviour in engineered quantum systems.

翻訳日:2024-02-07 14:07:03 公開日:2024-02-06

# OVOR: リハーサルフリーなクラスインクリメンタル学習のための仮想外れ値正則化を用いたOnePrompt

OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning ( http://arxiv.org/abs/2402.04129v1 )

ライセンス: Link先を確認

Wei-Cheng Huang, Chun-Fu Chen, Hsiang Hsu

(参考訳) 近年の研究では、学習可能なプロンプトとともに大規模な事前学習モデルを用いることで、クラスインクリメンタル学習(CIL)におけるリハーサルフリーの手法が、代表的なリハーサルベースの手法よりも優れた性能を達成できることが示されている。リハーサルフリーのCIL手法は、異なるタスクのクラスが一緒に学習されないため、それらの区別に苦労する。本研究では、仮想外れ値に基づく正則化手法を提案し、分類器の決定境界を引き締めることで、異なるタスク間のクラスの混同を軽減する。最近のプロンプトベースの手法は、以前のタスクの知識が新しいタスクの知識で上書きされるのを防ぐために、タスク固有のプロンプトのプールを必要とすることが多く、プールから適切なプロンプトを検索・合成するための追加計算が生じる。本論文で明らかにするように、この追加コストは精度を犠牲にすることなく取り除くことができる。簡素化したプロンプトベースの手法が、はるかに少ない学習可能パラメータと低い推論コストで、プロンプトプールを備えた従来の最先端(SOTA)手法に匹敵する結果を達成できることを示す。提案する正則化手法は様々なプロンプトベースの手法と互換性があり、ImageNet-RおよびCIFAR-100ベンチマークにおいて従来のSOTAリハーサルフリーCIL手法の精度を向上させた。ソースコードは https://github.com/jpmorganchase/ovor で公開されている。

Recent works have shown that by using large pre-trained models along with learnable prompts, rehearsal-free methods for class-incremental learning (CIL) settings can achieve superior performance to prominent rehearsal-based ones. Rehearsal-free CIL methods struggle with distinguishing classes from different tasks, as those are not trained together. In this work we propose a regularization method based on virtual outliers to tighten decision boundaries of the classifier, such that confusion of classes among different tasks is mitigated. Recent prompt-based methods often require a pool of task-specific prompts, in order to prevent overwriting knowledge of previous tasks with that of the new task, leading to extra computation in querying and composing an appropriate prompt from the pool. This additional cost can be eliminated, without sacrificing accuracy, as we reveal in the paper. We illustrate that a simplified prompt-based method can achieve results comparable to previous state-of-the-art (SOTA) methods equipped with a prompt pool, using far fewer learnable parameters and lower inference cost. Our regularization method has demonstrated its compatibility with different prompt-based methods, boosting the accuracy of those previous SOTA rehearsal-free CIL methods on the ImageNet-R and CIFAR-100 benchmarks. Our source code is available at https://github.com/jpmorganchase/ovor.

翻訳日:2024-02-07 14:06:19 公開日:2024-02-06

# マルチパス量子プロセストモグラフィ: 精密度と正確度の向上

Multipass Quantum Process Tomography: Precision and Accuracy Enhancement ( http://arxiv.org/abs/2402.04128v1 )

ライセンス: Link先を確認

Stancho G. Stanchev and Nikolay V. Vitanov

(参考訳) 状態準備・測定(SPAM)誤差、読み出し誤差、ショットノイズに起因する誤差を軽減することで、量子プロセストモグラフィ(QPT)の精密度と正確度を向上させる手法を提案する。単一のゲートのみにQPTを行う代わりに、同一ゲートを複数回適用した系列に対してQPTを行うことを提案する。この手法では、マルチパス過程の標準的なQPTによってパウリ転送行列(PTM)を測定し、そこから単一過程のPTMを2つの方法で導出する。1つは誤差が小さい場合に理論上厳密な結果を与える反復的アプローチであり、もう1つはシルベスター方程式の求解に基づく線形化アプローチである。これら2つのアプローチの有効性を、ibmq_qasm_simulator を用いた IBM Quantum 上のシミュレーションにより検証する。ランダム化ベンチマーク型の手法と比較して、提案手法は単一の数値(忠実度)ではなくPTM全体を与える。標準的なQPTと比較して、提案手法はSPAM誤差、読み出し誤差、ショットノイズ誤差を大幅に低減するため、はるかに高い正確度と精密度でPTMを与える。提案手法を用いて、量子プロセッサ ibmq_manila(Falcon r5.11L)上のCNOTゲートのPTMと忠実度を実験的に決定する。

We introduce a method to enhance the precision and accuracy of Quantum Process Tomography (QPT) by mitigating the errors caused by state preparation and measurement (SPAM), readout and shot noise. Instead of performing QPT solely on a single gate, we propose performing QPT on a sequence of multiple applications of the same gate. The method involves measuring the Pauli transfer matrix (PTM) of the multipass process by standard QPT and then deducing the single-process PTM by two alternative approaches: an iterative approach which in theory delivers the exact result for small errors, and a linearized approach based on solving the Sylvester equation. We examine the efficiency of these two approaches through simulations on IBM Quantum using ibmq_qasm_simulator. Compared to the Randomized Benchmarking type of methods, the proposed method delivers the entire PTM rather than a single number (fidelity). Compared to standard QPT, our method delivers the PTM with much higher accuracy and precision because it greatly reduces the SPAM, readout and shot noise errors. We use the proposed method to experimentally determine the PTM and the fidelity of the CNOT gate on the quantum processor ibmq_manila (Falcon r5.11L).

翻訳日:2024-02-07 14:05:52 公開日:2024-02-06

# 原子カー媒体による偏光スクイーズド光の生成におけるシーディングの役割

Role of seeding in the generation of polarization squeezed light by atomic Kerr medium ( http://arxiv.org/abs/2402.04127v1 )

ライセンス: Link先を確認

Eduardo C. Lima, Breno Marques, Marcelo Martinelli and Luciano S. Cruz

(参考訳) 量子状態の生成と特性評価は、多くの量子技術応用の基本要素である。本研究では、カー媒体と光の相互作用による偏光量子状態の生成と、その結果の直交偏光シーディングへの依存性について調べた。Ti:サファイアレーザーで生成したコヒーレント状態から出発し、$^{87}$Rbの温かい蒸気セルとの相互作用により$-5.2\pm 0.5$ dB(検出量子効率の補正後は$6.4\pm 0.6$ dB)のノイズ圧縮が得られた。

Quantum state production and characterization are fundamental elements for many quantum technological applications. In this work, we studied the generation of polarization quantum states by interacting light with a Kerr medium and the dependency of the outcome on orthogonal polarization seeding. Starting from coherent states produced by a Ti:Sapphire laser, interaction with a $^{87}$Rb warm vapor cell led to noise compression of $-5.2\pm 0.5$ dB ($6.4\pm 0.6$ dB after correction of the detection quantum efficiency).

翻訳日:2024-02-07 14:05:27 公開日:2024-02-06

# SCAFFLSA: 連合線形確率近似と時間差分学習における不均一性バイアスの定量化と除去

SCAFFLSA: Quantifying and Eliminating Heterogeneity Bias in Federated Linear Stochastic Approximation and Temporal Difference Learning ( http://arxiv.org/abs/2402.04114v1 )

ライセンス: Link先を確認

Paul Mangold, Sergey Samsonov, Safwan Labbi, Ilya Levin, Reda Alami, Alexey Naumov, Eric Moulines

(参考訳) 本稿では、連合線形確率近似(FedLSA)アルゴリズムの非漸近解析を行う。異種エージェントによる局所学習がもたらすバイアスを明示的に定量化し、アルゴリズムのサンプル複雑性を調べる。FedLSAの通信複雑性は所望の精度$\epsilon$に対して多項式的にスケールし、これが連合の利点を制限することを示す。これを克服するため、制御変量を用いて局所学習のバイアスを補正するFedLSAの新たな変種SCAFFLSAを提案し、統計的不均一性に関する仮定なしにその収束を証明する。提案手法を線形関数近似を用いた連合時間差分学習に適用し、対応する複雑性の改善を解析する。

In this paper, we perform a non-asymptotic analysis of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the bias introduced by local training with heterogeneous agents, and investigate the sample complexity of the algorithm. We show that the communication complexity of FedLSA scales polynomially with the desired precision $\epsilon$, which limits the benefits of federation. To overcome this, we propose SCAFFLSA, a novel variant of FedLSA, that uses control variates to correct the bias of local training, and prove its convergence without assumptions on statistical heterogeneity. We apply the proposed methodology to federated temporal difference learning with linear function approximation, and analyze the corresponding complexity improvements.
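
The heterogeneity bias mentioned in the abstract, and the way control variates can remove it, can be seen on a deterministic scalar toy problem. The sketch below is an illustration under loud assumptions, not the authors' algorithm: it drops all stochastic noise, uses two hypothetical agents with scalar systems `a[c] * theta = b[c]`, and applies a SCAFFOLD-style control-variate update whose exact form is assumed here rather than taken from the paper.

```python
def federated_lsa(a, b, eta, local_steps, rounds, control_variates):
    """Deterministic scalar toy: agent c locally solves a[c] * theta = b[c].

    Without control variates this is plain local training plus averaging.
    With them, each agent keeps a correction xi[c] (assumed SCAFFOLD-style).
    """
    n = len(a)
    theta = 0.0
    xi = [0.0] * n  # per-agent control variates; stay at zero when disabled
    for _round in range(rounds):
        local = []
        for c in range(n):
            th = theta
            for _step in range(local_steps):
                th -= eta * (a[c] * th - b[c] - xi[c])
            local.append(th)
        theta = sum(local) / n  # server averages the local iterates
        if control_variates:
            for c in range(n):
                xi[c] += (theta - local[c]) / (eta * local_steps)
    return theta


# Hypothetical heterogeneous agents (values chosen only for illustration).
a, b = [1.0, 3.0], [1.0, 0.0]
target = sum(b) / sum(a)  # solution of the averaged system
plain = federated_lsa(a, b, eta=0.1, local_steps=2, rounds=200, control_variates=False)
corrected = federated_lsa(a, b, eta=0.1, local_steps=2, rounds=200, control_variates=True)
```

In this toy setting `plain` settles away from `target` because each agent drifts towards its own local solution between averaging rounds, while `corrected` converges to `target`. The stochastic setting analysed in the paper is more involved.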

翻訳日:2024-02-07 14:05:10 公開日:2024-02-06

# Behind the Screen: ChatGPTのダークパーソナリティ特性と陰謀論的信念の調査

Behind the Screen: Investigating ChatGPT's Dark Personality Traits and Conspiracy Beliefs ( http://arxiv.org/abs/2402.04110v1 )

ライセンス: Link先を確認

Erik Weber, Jérôme Rutinowski, Markus Pauly

(参考訳) ChatGPTは不透明な振る舞いで知られている。本稿では、GPT-3.5とGPT-4のダークパーソナリティ特性と陰謀論的信念を詳細に分析し、この点を明らかにすることを試みる。Dark Factor Test、Mach-IV尺度、Generic Conspiracy Belief Scale、Conspiracy Mentality Scaleなど、様々な心理テストと質問紙を用いた。GPT-3.5とGPT-4の差を調べるため、平均スコア、標準偏差、有意性検定を計算して回答を分析した。人間を対象とした研究で相互依存性が示されている特性については、相関を検討した。さらに、各質問紙で特徴的な回答行動を示した集団に対応するシステムロールを与え、それらのロールに関連する特徴をモデルが回答に反映できるかを調べた。ダークパーソナリティ特性と陰謀論的信念はどちらのモデルでも特に顕著ではなく、GPT-3.5とGPT-4の差もわずかであった。しかし、GPT-4は情報隠蔽を信じる傾向が顕著であった。GPT-4がGPT-3.5よりもはるかに大きなデータセットで訓練されていることを考えると、これは特に興味深い。この場合、データへの露出の増加は、情報が統制されているという信念の強さと相関しているようである。極端な政治的立場を割り当てると、陰謀論への信念が高まった。テストの実施順序はモデルの回答と観測された相関に影響し、一種の文脈記憶を示している。

ChatGPT is notorious for its intransparent behavior. This paper tries to shed light on this, providing an in-depth analysis of the dark personality traits and conspiracy beliefs of GPT-3.5 and GPT-4. Different psychological tests and questionnaires were employed, including the Dark Factor Test, the Mach-IV Scale, the Generic Conspiracy Belief Scale, and the Conspiracy Mentality Scale. The responses were analyzed by computing average scores, standard deviations, and significance tests to investigate differences between GPT-3.5 and GPT-4. For traits that have been shown to be interdependent in human studies, correlations were considered. Additionally, system roles corresponding to groups that have shown distinct answering behavior in the corresponding questionnaires were applied to examine the models' ability to reflect characteristics associated with these roles in their responses. Dark personality traits and conspiracy beliefs were not particularly pronounced in either model, with little difference between GPT-3.5 and GPT-4. However, GPT-4 showed a pronounced tendency to believe in information withholding. This is particularly intriguing given that GPT-4 is trained on a significantly larger dataset than GPT-3.5. Apparently, in this case an increased data exposure correlates with a greater belief in the control of information. An assignment of extreme political affiliations increased the belief in conspiracy theories. Test sequencing affected the models' responses and the observed correlations, indicating a form of contextual memory.

翻訳日:2024-02-07 14:04:55 公開日:2024-02-06

# 列車管理システムにおける非構造化テキストを用いた階層的遅延帰属分類

Hierarchical Delay Attribution Classification using Unstructured Text in Train Management Systems ( http://arxiv.org/abs/2402.04108v1 )

ライセンス: Link先を確認

Anton Borg, Per Lingvall, Martin Svensson

(参考訳) EU指令は、列車遅延の体系的なフォローアップを規定している。スウェーデンではスウェーデン運輸局が適切な遅延帰属コードを登録し割り当てている。しかし、この遅延帰属コードは手動で割り当てられ、これは複雑なタスクである。本稿では,イベント記述に基づいて遅延帰属符号を割り当てる機械学習に基づく意思決定支援について検討する。このテキストはtf-idfを用いて変換され、ランダムフォレストとサポートベクターマシンの2つのモデルがランダム一様分類器とスウェーデンの輸送管理の分類性能に対して評価される。さらに、問題は階層的かつ平坦なアプローチとしてモデル化されている。その結果,階層的アプローチはフラットアプローチよりも優れた性能を示す。どちらのアプローチもランダムな一様分類器よりも優れているが、手動分類よりも悪い。

EU directives stipulate a systematic follow-up of train delays. In Sweden, the Swedish Transport Administration registers and assigns an appropriate delay attribution code. However, this delay attribution code is assigned manually, which is a complex task. In this paper, a machine learning-based decision support for assigning delay attribution codes based on event descriptions is investigated. The text is transformed using TF-IDF, and two models, Random Forest and Support Vector Machine, are evaluated against a random uniform classifier and the classification performance of the Swedish Transport Administration. Further, the problem is modeled as both a hierarchical and flat approach. The results indicate that a hierarchical approach performs better than a flat approach. Both approaches perform better than the random uniform classifier but perform worse than the manual classification.
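
The TF-IDF transformation mentioned in the abstract can be illustrated with a small standard-library sketch. It assumes whitespace tokenisation, raw term frequency, and the plain `log(N / df)` inverse document frequency; the event descriptions are invented for illustration and are not taken from the Swedish Transport Administration data. The abstract does not say which TF-IDF variant or library the authors used.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per whitespace-tokenised document.

    Uses raw term frequency and idf = log(N / df); libraries often add
    smoothing and vector normalisation on top of this basic form.
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical event descriptions, invented for illustration.
events = [
    "signal failure at station",
    "signal failure on track",
    "passenger illness at station",
]
weights = tfidf(events)
```

Terms that occur in only one description, such as `track`, receive a higher weight than terms shared across descriptions, which is what makes the vectors useful as classifier input.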

翻訳日:2024-02-07 14:04:33 公開日:2024-02-06

# 偏りのない大言語モデルにおける暗黙的バイアスの測定

Measuring Implicit Bias in Explicitly Unbiased Large Language Models ( http://arxiv.org/abs/2402.04105v1 )

ライセンス: Link先を確認

Xuechunzi Bai, Angelina Wang, Ilia Sucholutsky, Thomas L. Griffiths

(参考訳) 大規模言語モデル(LLM)は明示的なバイアステストに合格しても、平等主義的な信念を支持しながら微妙な偏見を示す人間と同様に、暗黙のバイアスを持ち得る。このような暗黙のバイアスの測定には課題がある。LLMがますますプロプライエタリになるにつれて、埋め込みにアクセスして既存のバイアス尺度を適用することができなくなる可能性がある。さらに、暗黙のバイアスが問題になるのは、主にこれらのシステムが実際に下す決定に影響する場合である。我々は、心理学に着想を得た2つのバイアス尺度を導入することで、これら両方の課題に取り組む。1つは暗黙のバイアスを明らかにするプロンプトベースの手法であるLLM Implicit Association Test (IAT) Bias、もう1つは意思決定タスクにおける微妙な差別を検出するLLM Decision Biasである。これらの尺度を用いて、4つの社会的領域(人種、性別、宗教、健康)と21のカテゴリ(武器、罪悪感、科学、キャリアなど)にわたり、6つのLLMに人間に似たステレオタイプバイアスが広く存在することを見出した。プロンプトベースの暗黙バイアス尺度は埋め込みベースの手法と相関するが、LLM Decision Biasで測定される下流の挙動をより良く予測する。この尺度は、絶対評価ではなく相対評価のほうが暗黙のバイアスとより強く関連することを示す心理学の知見に動機づけられており、LLMに個人間の選択を求めることに基づいている。心理学に基づくプロンプトベースの尺度を用いることで、標準ベンチマークでは明示的なバイアスを示さないプロプライエタリなLLMにおいて、微妙なバイアスや差別を効果的に明らかにできる。

Large language models (LLMs) can pass explicit bias tests but still harbor implicit biases, similar to humans who endorse egalitarian beliefs yet exhibit subtle biases. Measuring such implicit biases can be a challenge: as LLMs become increasingly proprietary, it may not be possible to access their embeddings and apply existing bias measures; furthermore, implicit biases are primarily a concern if they affect the actual decisions that these systems make. We address both of these challenges by introducing two measures of bias inspired by psychology: LLM Implicit Association Test (IAT) Bias, which is a prompt-based method for revealing implicit bias; and LLM Decision Bias for detecting subtle discrimination in decision-making tasks. Using these measures, we found pervasive human-like stereotype biases in 6 LLMs across 4 social domains (race, gender, religion, health) and 21 categories (weapons, guilt, science, career among others). Our prompt-based measure of implicit bias correlates with embedding-based methods but better predicts downstream behaviors measured by LLM Decision Bias. This measure is based on asking the LLM to decide between individuals, motivated by psychological results indicating that relative not absolute evaluations are more related to implicit biases. Using prompt-based measures informed by psychology allows us to effectively expose nuanced biases and subtle discrimination in proprietary LLMs that do not show explicit bias on standard benchmarks.

翻訳日:2024-02-07 14:04:22 公開日:2024-02-06

# 英国小売市場における顧客セグメンテーションのためのクラスタリングアルゴリズムの検討

An Exploration of Clustering Algorithms for Customer Segmentation in the UK Retail Market ( http://arxiv.org/abs/2402.04103v1 )

ライセンス: Link先を確認

Jeen Mary John, Olamilekan Shobayo, Bayode Ogunleye

(参考訳) 近年、オンライン購入に対する人々の意識が大きく高まっている。これによりオンライン小売プラットフォームが生まれ、顧客の購買行動をより深く理解する必要性が高まった。小売企業は大量の顧客購入を処理する必要に迫られており、より正確で効率的な顧客セグメンテーションを行うための高度なアプローチが求められる。顧客セグメンテーションは、顧客中心のサービスを支援し、収益性を高めるマーケティング分析ツールである。本稿では、小売市場における意思決定プロセスを改善するための顧客セグメンテーションモデルの構築を目指す。そのために、UCI機械学習リポジトリから取得した英国のオンライン小売データセットを使用した。このデータセットは541,909件の顧客レコードと8つの特徴量で構成される。本研究では、顧客価値を定量化するためにRFM(recency, frequency, monetary)フレームワークを採用した。その後、K-meansクラスタリング、ガウス混合モデル(GMM)、DBSCAN(density-based spatial clustering of applications with noise)、凝集型クラスタリング、BIRCH(balanced iterative reducing and clustering using hierarchies)といった最先端(SOTA)のクラスタリングアルゴリズムを比較した。その結果、GMMはシルエットスコア0.80で他のアプローチを上回った。

Recently, people's awareness of online purchases has significantly risen. This has given rise to online retail platforms and the need for a better understanding of customer purchasing behaviour. Retail companies are pressed with the need to deal with a high volume of customer purchases, which requires sophisticated approaches to perform more accurate and efficient customer segmentation. Customer segmentation is a marketing analytical tool that aids customer-centric service and thus enhances profitability. In this paper, we aim to develop a customer segmentation model to improve decision-making processes in the retail market industry. To achieve this, we employed a UK-based online retail dataset obtained from the UCI machine learning repository. The retail dataset consists of 541,909 customer records and eight features. Our study adopted the RFM (recency, frequency, and monetary) framework to quantify customer values. Thereafter, we compared several state-of-the-art (SOTA) clustering algorithms, namely, K-means clustering, the Gaussian mixture model (GMM), density-based spatial clustering of applications with noise (DBSCAN), agglomerative clustering, and balanced iterative reducing and clustering using hierarchies (BIRCH). The results showed the GMM outperformed other approaches, with a Silhouette Score of 0.80.
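
As a rough illustration of the RFM step described above, the following sketch derives recency, frequency, and monetary values from raw transactions. The `(customer_id, purchase_date, amount)` layout and the sample rows are hypothetical and are not the schema or contents of the UCI dataset; the resulting triples are the kind of feature vector that a clustering algorithm such as K-means or a GMM would then consume.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    snapshot: reference date from which recency is measured.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        previous = last_purchase.get(customer_id)
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1      # one count per transaction row
        monetary[customer_id] += amount  # total spend
    return {
        cid: ((snapshot - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# Illustrative rows only (hypothetical customers and amounts).
sample = [
    ("C1", date(2011, 12, 1), 20.0),
    ("C1", date(2011, 12, 8), 35.5),
    ("C2", date(2011, 11, 1), 10.0),
]
table = rfm_table(sample, snapshot=date(2011, 12, 10))
```

Here frequency counts transaction rows; counting distinct invoices instead is an equally plausible convention, and the abstract does not say which one the authors used.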

翻訳日:2024-02-07 14:03:50 公開日:2024-02-06

# 静的マルウェア検出におけるマルチCNNを用いたセクション解析

Use of Multi-CNNs for Section Analysis in Static Malware Detection ( http://arxiv.org/abs/2402.04102v1 )

ライセンス: Link先を確認

Tony Quertier, Grégoire Barrué

(参考訳) 既存のマルウェア検出の研究は、検出率にのみ焦点をあてている。しかし、場合によっては、アルゴリズムの結果を理解することや、アナリストのためにファイル内でどこで調査するかなど、より多くの情報を得ることも重要です。そこで本研究では,ポータブルファイル解析のための新しいモデルを提案する。提案手法は,ファイルを異なるセクションに分割し,各セクションを画像に変換することで,特定のセクションを具体的に扱うために畳み込みニューラルネットワークを訓練する。そして、cnnが返したこれらのスコアをすべて使用して最終検出スコアを計算し、最終スコアにおける各セクションの重要性の分析を改善するモデルを使用します。

Existing research on malware detection focuses almost exclusively on the detection rate. However, in some cases, it is also important to understand the results of our algorithm, or to obtain more information, such as where in the file an analyst should investigate. To this end, we propose a new model to analyze Portable Executable files. Our method consists of splitting each file into its sections and transforming each section into an image, in order to train convolutional neural networks that each handle one identified section. We then use the scores returned by the CNNs to compute a final detection score, using models that enable us to improve our analysis of the importance of each section in the final score.
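
One common way to turn a binary section into an image is to treat each byte as a grayscale pixel and wrap the byte stream into rows of fixed width. The abstract does not specify the conversion the authors use, so the sketch below is an assumption; `bytes_to_image`, the width of 16, and the section contents are all hypothetical, and a real pipeline would obtain the section bytes from a PE parser.

```python
def bytes_to_image(data, width=16):
    """Lay out raw bytes row by row as a 2D grid of 0-255 grayscale values.

    The last row is zero-padded so that every row has `width` pixels.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    padding = (-len(data)) % width
    padded = bytes(data) + bytes(padding)  # bytes(n) yields n zero bytes
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Hypothetical section contents standing in for parsed PE sections.
sections = {
    ".text": bytes(range(40)),
    ".data": b"\x00\xff" * 10,
}
images = {name: bytes_to_image(raw, width=16) for name, raw in sections.items()}
```

Each entry of `images` could then be resized and fed to the CNN dedicated to that section name, with the per-section scores combined by a separate model as the abstract describes.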

翻訳日:2024-02-07 14:03:30 公開日:2024-02-06

# VRMM: ボリュメトリックかつ再照明可能なモーファブル頭部モデル

VRMM: A Volumetric Relightable Morphable Head Model ( http://arxiv.org/abs/2402.04101v1 )

ライセンス: Link先を確認

Haotian Yang, Mingwu Zheng, Chongyang Ma, Yu-Kun Lai, Pengfei Wan, Haibin Huang

(参考訳) 本稿では,3次元顔モデリングに先立って,新しい容積・パラメトリック顔モデルであるVRMMを紹介する。最近のボリューム事前モデルは、3Dモーフブルモデル(3DMM)のような従来の手法よりも改善されているが、モデル学習やパーソナライズされた再構築では課題に直面している。私たちのVRMMは、アイデンティティ、表現、照明の潜在空間を、低次元の表現に効率的に切り離し、エンコードする新しいトレーニングフレームワークを使用することで、これらを克服しています。このフレームワークは、自己教師型学習で設計されており、データトレーニングの制約を著しく減らし、実際はより実現可能である。学習したVRMMは、リライト機能を提供し、包括的な表現範囲を含んでいる。我々は,アバター生成,顔の再構成,アニメーションなどの様々な応用を通じて,VRMMの汎用性と有効性を示す。さらに,VRMMをベースとした新規な保存型パーソナライゼーションフレームワークにより,生成ボリュームモデルにおけるオーバーフィットの問題に対処する。このようなアプローチは、1つのポートレート入力からでも正確な3D顔の復元を可能にする。実験では,VRMMが3次元顔モデリングの分野を大幅に強化する可能性を示した。

In this paper, we introduce the Volumetric Relightable Morphable Model (VRMM), a novel volumetric and parametric facial prior for 3D face modeling. While recent volumetric prior models offer improvements over traditional methods like 3D Morphable Models (3DMMs), they face challenges in model learning and personalized reconstructions. Our VRMM overcomes these by employing a novel training framework that efficiently disentangles and encodes latent spaces of identity, expression, and lighting into low-dimensional representations. This framework, designed with self-supervised learning, significantly reduces the constraints for training data, making it more feasible in practice. The learned VRMM offers relighting capabilities and encompasses a comprehensive range of expressions. We demonstrate the versatility and effectiveness of VRMM through various applications like avatar generation, facial reconstruction, and animation. Additionally, we address the common issue of overfitting in generative volumetric models with a novel prior-preserving personalization framework based on VRMM. Such an approach enables accurate 3D face reconstruction from even a single portrait input. Our experiments showcase the potential of VRMM to significantly enhance the field of 3D face modeling.

翻訳日:2024-02-07 14:03:19 公開日:2024-02-06

# Deep Image Priorの解析と自己誘導を利用した画像再構成

Analysis of Deep Image Prior and Exploiting Self-Guidance for Image Reconstruction ( http://arxiv.org/abs/2402.04097v1 )

ライセンス: Link先を確認

Shijun Liang, Evan Bell, Qing Qu, Rongrong Wang, Saiprasad Ravishankar

(参考訳) 不完全あるいは劣化した測定値から高品質な画像を復元できることから、Deep Image Prior(DIP)は画像修復や磁気共鳴画像法(MRI)を含む医用画像の逆問題で広く用いられている。しかし、従来のDIPは深刻な過適合とスペクトルバイアスの影響を受ける。本研究ではまず、異なるアーキテクチャについてカーネル領域における基礎ネットワークの学習ダイナミクスを解析することで、DIPがアンダーサンプリングされた撮像測定値から情報を復元する仕組みを分析する。この分析は、DIPに基づく復元の重要な基礎的性質を明らかにする。現在の研究は、参照画像をネットワーク入力として用いると、ランダム入力を用いる場合と比べて画像再構成におけるDIPの性能が向上することを示唆している。しかし、適切な参照画像を得るには教師情報が必要であり、実用上の困難が伴う。この障害を克服するため、学習データを必要とせずにネットワークの重みと入力の両方を同時に最適化する自己駆動型の再構成プロセスを導入する。提案手法は、ネットワーク入力と再構成画像の両方の頑健で安定した同時推定を可能にする新しいデノイザ正則化項を組み込んでいる。自己誘導型の提案手法は、MR画像再構成の性能において元のDIPと最新の教師あり手法の両方を上回り、画像インペインティングにおいても従来のDIPベースの手法を上回ることを示す。

The ability of deep image prior (DIP) to recover high-quality images from incomplete or corrupted measurements has made it popular in inverse problems in image restoration and medical imaging including magnetic resonance imaging (MRI). However, conventional DIP suffers from severe overfitting and spectral bias effects. In this work, we first provide an analysis of how DIP recovers information from undersampled imaging measurements by analyzing the training dynamics of the underlying networks in the kernel regime for different architectures. This study sheds light on important underlying properties for DIP-based recovery. Current research suggests that incorporating a reference image as network input can enhance DIP's performance in image reconstruction compared to using random inputs. However, obtaining suitable reference images requires supervision, and raises practical difficulties. In an attempt to overcome this obstacle, we further introduce a self-driven reconstruction process that concurrently optimizes both the network weights and the input while eliminating the need for training data. Our method incorporates a novel denoiser regularization term which enables robust and stable joint estimation of both the network input and reconstructed image. We demonstrate that our self-guided method surpasses both the original DIP and modern supervised methods in terms of MR image reconstruction performance and outperforms previous DIP-based schemes for image inpainting.

翻訳日:2024-02-07 14:03:00 公開日:2024-02-06

# 変分シャープレイネットワーク: 不確実性定量化を備えた自己説明型シャープレイ値への確率的アプローチ

Variational Shapley Network: A Probabilistic Approach to Self-Explaining Shapley values with Uncertainty Quantification ( http://arxiv.org/abs/2402.04211v1 )

ライセンス: Link先を確認

Mert Ketenci, Iñigo Urteaga, Victor Alfonso Rodriguez, Noémie Elhadad, Adler Perotte

(参考訳) シャープレイ値は、モデルの意思決定過程を解明するための機械学習(ML)の基礎的ツールとして登場した。広く採用され、説明可能性の重要な公理を満たす独自の能力を持つにもかかわらず、その推定には計算上の課題が残っている。具体的には、($i$)入力特徴の組み合わせのすべての可能な部分集合についてモデルを評価すること、($ii$)モデルの周辺分布を推定すること、($iii$)説明のばらつきに対処すること、である。本稿では、シャープレイ値の計算を大幅に単純化し、単一のフォワードパスしか必要としない、新しい自己説明型の手法を提案する。シャープレイ値の決定論的な扱いを限界と捉え、説明に内在する不確実性を捉える確率的枠組みの導入を検討する。他の手法と異なり、本手法は周辺分布の推定に観測データ空間を直接用いず、代わりに、新しいマスク付きニューラルネットワークアーキテクチャが生成する特徴ごとの潜在埋め込み空間から得られる適応可能なベースライン値を用いる。シミュレーションデータと実データでの評価は、本手法の頑健な予測性能と説明性能を裏付けている。

Shapley values have emerged as a foundational tool in machine learning (ML) for elucidating model decision-making processes. Despite their widespread adoption and unique ability to satisfy essential explainability axioms, computational challenges persist in their estimation when ($i$) evaluating a model over all possible subsets of input feature combinations, ($ii$) estimating model marginals, and ($iii$) addressing variability in explanations. We introduce a novel, self-explaining method that simplifies the computation of Shapley values significantly, requiring only a single forward pass. Recognizing the deterministic treatment of Shapley values as a limitation, we explore incorporating a probabilistic framework to capture the inherent uncertainty in explanations. Unlike alternatives, our technique does not rely directly on the observed data space to estimate marginals; instead, it uses adaptable baseline values derived from a latent, feature-specific embedding space, generated by a novel masked neural network architecture. Evaluations on simulated and real datasets underscore our technique's robust predictive and explanatory performance.
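
To make challenge ($i$) concrete, the sketch below computes exact Shapley values by enumerating every feature subset, replacing absent features with fixed baseline values. This is the classical brute-force definition, not the single-forward-pass method proposed in the paper; `toy_model` and the zero baseline are hypothetical choices made only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model at x, with absent features set to baseline.

    Enumerates every coalition, so the cost grows as 2**len(x).
    """
    n = len(x)

    def value(coalition):
        masked = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model: linear part plus one interaction term.
def toy_model(z):
    return 2.0 * z[0] + z[1] + z[0] * z[2]


phi = shapley_values(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The attributions sum to the difference between the model output at `x` and at the baseline, and the interaction term is split evenly between the two features involved. With even a few dozen features the number of model evaluations becomes prohibitive, which is the cost that approximation methods try to avoid.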

翻訳日:2024-02-07 13:56:18 公開日:2024-02-06

# マルチモーダル大言語モデルを用いた顔のスポーフィングと偽造検出のための評価ベンチマーク

SHIELD: An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models ( http://arxiv.org/abs/2402.04178v1 )

ライセンス: Link先を確認

Yichen Shi, Yuhao Gao, Yingxin Lai, Hongyang Wang, Jun Feng, Lei He, Jun Wan, Changsheng Chen, Zitong Yu, Xiaochun Cao

(参考訳) マルチモーダル大規模言語モデル(MLLM)は、強力な視覚意味表現と言語推論能力に基づき、様々な視覚分野(汎用オブジェクト認識やグラウンディングなど)で優れた問題解決能力を示している。しかし、MLLMが微妙な視覚的なスプーフィング/偽造の手がかりに敏感であるかどうか、また顔攻撃検出(顔スプーフィング検出や偽造検出など)の分野でどの程度の性能を示すかは未解明である。本稿では、顔スプーフィングと偽造検出におけるMLLMの能力を評価するために、SHIELDという新しいベンチマークを導入する。具体的には、この2つの顔セキュリティタスクにおいてマルチモーダル顔データを評価するために、真偽形式と多肢選択形式の質問を設計する。顔アンチスプーフィングタスクでは、4種類のプレゼンテーション攻撃(印刷攻撃、リプレイ攻撃、剛体マスク、紙マスク)のもとで、3つのモダリティ(RGB、赤外線、深度)を評価する。顔偽造検出タスクでは、視覚と音響の両モダリティでGANベースおよび拡散ベースのデータを評価する。各質問は、標準設定と思考の連鎖(COT)設定のもとで、ゼロショットテストと少数ショットテストの両方にかけられる。その結果、MLLMは顔セキュリティ領域において大きな可能性を持ち、解釈可能性、マルチモーダルで柔軟な推論、顔スプーフィングと偽造の同時検出といった点で従来の専用モデルよりも有利であることが示された。さらに、顔画像のタスク固有およびタスク非関連の様々な属性を記述・判断するためのMulti-Attribute Chain of Thought(MA-COT)パラダイムを開発し、微妙なスプーフィング/偽造の手がかりを発見するための豊富なタスク関連知識を提供する。顔アンチスプーフィング単独、顔偽造検出単独、および同時検出タスクにおける広範な実験により、提案するMA-COTの有効性が示された。プロジェクトは https://github.com/laiyingxin2/SHIELD で公開されている。

Multimodal large language models (MLLMs) have demonstrated remarkable problem-solving capabilities in various vision fields (e.g., generic object recognition and grounding) based on strong visual semantic representation and language reasoning ability. However, whether MLLMs are sensitive to subtle visual spoof/forged clues and how they perform in the domain of face attack detection (e.g., face spoofing and forgery detection) is still unexplored. In this paper, we introduce a new benchmark, namely SHIELD, to evaluate the ability of MLLMs on face spoofing and forgery detection. Specifically, we design true/false and multiple-choice questions to evaluate multimodal face data in these two face security tasks. For the face anti-spoofing task, we evaluate three different modalities (i.e., RGB, infrared, depth) under four types of presentation attacks (i.e., print attack, replay attack, rigid mask, paper mask). For the face forgery detection task, we evaluate GAN-based and diffusion-based data with both visual and acoustic modalities. Each question is subjected to both zero-shot and few-shot tests under standard and chain of thought (COT) settings. The results indicate that MLLMs hold substantial potential in the face security domain, offering advantages over traditional specific models in terms of interpretability, multimodal flexible reasoning, and joint face spoof and forgery detection. Additionally, we develop a novel Multi-Attribute Chain of Thought (MA-COT) paradigm for describing and judging various task-specific and task-irrelevant attributes of face images, which provides rich task-related knowledge for subtle spoof/forged clue mining. Extensive experiments in separate face anti-spoofing, separate face forgery detection, and joint detection tasks demonstrate the effectiveness of the proposed MA-COT. The project is available at https://github.com/laiyingxin2/SHIELD

翻訳日:2024-02-07 13:55:57 公開日:2024-02-06

# 大規模言語モデルの下流タスク性能のスケーリング法則

Scaling Laws for Downstream Task Performance of Large Language Models ( http://arxiv.org/abs/2402.04177v1 )

ライセンス: Link先を確認

Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo

(参考訳) スケーリング則は、大規模言語モデル(LLM)の設計の指針となる重要な洞察を与える。既存の研究は主に、事前学習(上流)損失のスケーリング則に焦点を当ててきた。しかし、LLMを教師なしデータセットで事前学習し、下流タスクで微調整する転移学習の設定では、下流の性能も重要になることが多い。本研究では、LLMを機械翻訳タスク向けに微調整する転移学習の設定におけるスケーリング挙動を調べる。具体的には、事前学習データの選択とそのサイズが、下流のクロスエントロピーとBLEUスコアという2つの指標で評価される下流性能(翻訳品質)にどのように影響するかを調べる。実験の結果、微調整データセットのサイズと、事前学習データと下流データの分布の整合性が、スケーリング挙動に大きく影響することが示された。十分に整合している場合、下流のクロスエントロピーとBLEUスコアはどちらも、事前学習データが増えるにつれて単調に改善する。このような場合、対数則を用いて下流のBLEUスコアを高い精度で予測できることを示す。一方で、中程度の不整合により、事前学習を増やすとBLEUスコアが変動または悪化するのに対し、下流のクロスエントロピーは単調に改善する場合もある。これらの観察を分析することで、適切な事前学習データを選択するための新たな実践的知見を提供する。

Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we study the scaling behavior in a transfer learning setting, where LLMs are finetuned for machine translation tasks. Specifically, we investigate how the choice of the pretraining data and its size affect downstream performance (translation quality) as judged by two metrics: downstream cross-entropy and BLEU score. Our experiments indicate that the size of the finetuning dataset and the distribution alignment between the pretraining and downstream data significantly influence the scaling behavior. With sufficient alignment, both downstream cross-entropy and BLEU score improve monotonically with more pretraining data. In such cases, we show that it is possible to predict the downstream BLEU score with good accuracy using a log-law. However, there are also cases where moderate misalignment causes the BLEU score to fluctuate or get worse with more pretraining, whereas downstream cross-entropy monotonically improves. By analyzing these observations, we provide new practical insights for choosing appropriate pretraining data.

翻訳日:2024-02-07 13:55:18 公開日:2024-02-06

# COPS: リアルタイムスミッシング検出のためのコンパクトなオンデバイスパイプライン

COPS: A Compact On-device Pipeline for real-time Smishing detection ( http://arxiv.org/abs/2402.04173v1 )

ライセンス: Link先を確認

Harichandana B S S, Sumit Kumar, Manjunath Bhimappa Ujjinakoppa, Barath Raj Kandur Raja

(参考訳) スマートフォンは私たちの日常生活に欠かせないものとなり、コミュニケーションからオンラインショッピングまで、ほとんどあらゆることができる。しかし、利用の増加に伴い、モバイルデバイスを狙ったサイバー犯罪が急増している。特にスミッシング攻撃は近年顕著に増加している。攻撃者が平均ライフサイクル15時間未満の新しい偽ウェブサイトを毎日作成していることが、この問題をさらに悪化させている。このため、悪意のあるURLのデータベースを維持するという標準的な手法は有効でなくなっている。そこで本研究では、不正なメッセージやURLの特徴をインテリジェントに識別し、ユーザにリアルタイムで警告する新しいオンデバイスパイプラインCOPSを提案する。COPSは、スミッシングおよびURLフィッシング検出のための、サイズ3.46MBのDisentangled Variational Autoencoderに基づく検出モジュールを備えた軽量パイプラインであり、オープンデータセット上でベンチマークを行う。2つのタスクでそれぞれ98.15%と99.5%の精度を達成し、偽陰性率と偽陽性率はわずか0.037と0.015であった。リソースの制約されたデバイス上でリアルタイムの警告を保証できるという利点も加わり、従来の研究を上回っている。

Smartphones have become indispensable in our daily lives and can do almost everything, from communication to online shopping. However, with the increased usage, cybercrime aimed at mobile devices is rocketing. Smishing attacks, in particular, have seen a significant upsurge in recent years. This problem is further exacerbated by perpetrators creating new deceptive websites daily, with an average life cycle of under 15 hours. This renders the standard practice of keeping a database of malicious URLs ineffective. To this end, we propose COPS, a novel on-device pipeline that intelligently identifies features of fraudulent messages and URLs to alert the user in real time. COPS is a lightweight pipeline with a detection module based on a Disentangled Variational Autoencoder of size 3.46MB for smishing and URL phishing detection, and we benchmark it on open datasets. We achieve accuracies of 98.15% and 99.5% on the two tasks, respectively, with false negative and false positive rates of a mere 0.037 and 0.015, outperforming previous works with the added advantage of ensuring real-time alerts on resource-constrained devices.
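
To make the reported metrics concrete, here is a minimal sketch of how accuracy, false negative rate, and false positive rate are computed from a confusion matrix. The counts below are made up for illustration and are not the paper's data; `detection_rates` is a hypothetical helper name. Note that the rates are expressed as fractions (as in the abstract's 0.037 and 0.015), not percentages.

```python
def detection_rates(tp, fp, tn, fn):
    """Compute standard binary-detection metrics from confusion-matrix counts.

    tp: malicious items flagged as malicious
    fp: benign items flagged as malicious
    tn: benign items left unflagged
    fn: malicious items that were missed
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_negative_rate": fn / (fn + tp),  # share of malicious items missed
        "false_positive_rate": fp / (fp + tn),  # share of benign items flagged
    }

# Hypothetical counts: 1000 malicious and 1000 benign messages.
rates = detection_rates(tp=963, fp=15, tn=985, fn=37)
```

For a smishing detector, the false negative rate measures missed attacks, while the false positive rate measures how often legitimate messages trigger an alert.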

翻訳日:2024-02-07 13:54:57 公開日:2024-02-06

# 3次元RRDB-GANを用いた放射線診断における3次元体積超解像

3D Volumetric Super-Resolution in Radiology Using 3D RRDB-GAN ( http://arxiv.org/abs/2402.04171v1 )

ライセンス: Link先を確認

Juhyung Ha, Nian Wang, Surendra Maharjan, Xuhong Zhang

(参考訳) 放射線画像の3次元超解像のための3D Residual-in-Residual Dense Block GAN(3D RRDB-GAN)を提案する。 3D RRDB-GANの重要な側面は、2.5D知覚損失関数の統合である。モデルの有効性は,マウス脳mrh,oasis,hcp1200,msd-task-6など,多種多様なデータセットを対象とした4倍の超解像実験により評価した。これらの評価は、LPIPSやFIDのような定量的メトリクスと、サンプル視覚化による質的評価の両方を包含し、詳細な画像解析におけるモデルの有効性を実証する。 3D RRDB-GANは、特に医療画像の深度、明度、容積の詳細を豊かにすることで、医療画像に重要な貢献をしている。その応用は、包括的3次元視点から複雑な医用画像の解釈と分析を強化することを約束している。

This study introduces the 3D Residual-in-Residual Dense Block GAN (3D RRDB-GAN) for 3D super-resolution of radiology imagery. A key aspect of 3D RRDB-GAN is the integration of a 2.5D perceptual loss function, which contributes to improved volumetric image quality and realism. The effectiveness of our model was evaluated through 4x super-resolution experiments across diverse datasets, including Mice Brain MRH, OASIS, HCP1200, and MSD-Task-6. These evaluations, encompassing both quantitative metrics like LPIPS and FID and qualitative assessments through sample visualizations, demonstrate the model's effectiveness in detailed image analysis. The 3D RRDB-GAN offers a significant contribution to medical imaging, particularly by enriching the depth, clarity, and volumetric detail of medical images. Its application shows promise in enhancing the interpretation and analysis of complex medical imagery from a comprehensive 3D perspective.

翻訳日:2024-02-07 13:54:38 公開日:2024-02-06

# 状況対応トラヒックルール例外に対するインフォームド強化学習

Informed Reinforcement Learning for Situation-Aware Traffic Rule Exceptions ( http://arxiv.org/abs/2402.04168v1 )

ライセンス: Link先を確認

Daniel Bogdoll, Jing Qin, Moritz Nekolla, Ahmed Abouelazm, Tim Joseph, J. Marius Zöllner

(参考訳) 強化学習は、有望な進歩を持つ非常に活発な研究分野である。しかし、自動運転の分野では、しばしば非常に単純なシナリオが検討されている。一般的なアプローチでは、非解釈制御コマンドをアクションスペースとして使用し、構造を欠いた非構造化報酬設計を用いる。本稿では,構造化ルールブックを知識源として統合したインフォームド強化学習を提案する。我々はトラジェクタを学習し,状況に応じた報奨設計を施し,エージェントが制御されたトラフィックルール例外を必要とする状況を学習できる動的報酬を与える。我々の方法は任意のRLモデルに適用できる。近年のモデルベースエージェントを用いた複雑なシナリオの完成率の向上に成功している。

Reinforcement Learning is a highly active research field with promising advancements. In the field of autonomous driving, however, often only very simple scenarios are examined. Common approaches use non-interpretable control commands as the action space and unstructured reward designs. In this work, we introduce Informed Reinforcement Learning, where a structured rulebook is integrated as a knowledge source. We learn trajectories and assess them with a situation-aware reward design, leading to a dynamic reward which allows the agent to learn situations which require controlled traffic rule exceptions. Our method is applicable to arbitrary RL models. We successfully demonstrate high completion rates of complex scenarios with recent model-based agents.

翻訳日:2024-02-07 13:54:21 公開日:2024-02-06

# MLのための温度計算:双曲型モデル埋め込みへの応用

Tempered Calculus for ML: Application to Hyperbolic Model Embedding ( http://arxiv.org/abs/2402.04163v1 )

ライセンス: Link先を確認

Richard Nock and Ehsan Amid and Frank Nielsen and Alexander Soen and Manfred K. Warmuth

(参考訳) MLで使用されるほとんどの数学的歪みは、本質的には、$f$-divergences, Bregman divergences, (正規化された)最適輸送距離、積分確率測度、測地線距離などである。本稿では,これらの歪みを改善するための基礎理論とツールを公表し,機械学習の要件に対処する。まずリーマン積分の一般化から始め、厳密に加法的ではないがより一般的には非指数統計力学のように$t$-加法的である関数をカプセル化する。特に、これはボルテラ積積分を特別な場合として回復させる。次に、(ユークリッド)微分の拡張を用いて計算の基礎定理を一般化する。これは、より具体的な定理のシリーズとともに、計量性、双曲性、エンコーディングといった幾何学的およびML関連の特性に特に重点を置いて、歪み測度の基本的な特性を簡単な方法で設計、変更、あるいは変更する方法を示す結果の基盤となる。我々は、最近MLで注目を集めた問題、すなわち「チープ」による双曲的埋め込みと、双曲的対ユークリッド的スケールによる正確なエンコーディングにどのように適用するかを示す。我々は、Poincaréディスクモデルが非常に魅力的な機能を持つ新しいアプリケーションを公開し、我々の理論は、ログロス(trees)とロジスティックロス(combination)を使って訓練された決定木の強化された組み合わせのための、モデル埋め込みである。

Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc. In this paper, we unveil a grounded theory and tools which can help improve these distortions to better cope with ML requirements. We start with a generalization of Riemann integration that also encapsulates functions that are not strictly additive but are, more generally, $t$-additive, as in nonextensive statistical mechanics. Notably, this recovers Volterra's product integral as a special case. We then generalize the Fundamental Theorem of calculus using an extension of the (Euclidean) derivative. This, along with a series of more specific Theorems, serves as a basis for results showing how one can specifically design, alter, or change fundamental properties of distortion measures in a simple way, with a special emphasis on geometric- and ML-related properties, namely metricity, hyperbolicity, and encoding. We show how to apply it to a problem that has recently gained traction in ML: hyperbolic embeddings with a "cheap" and accurate encoding along the hyperbolic vs Euclidean scale. We unveil a new application for which the Poincaré disk model has very appealing features, and our theory comes in handy: model embeddings for boosted combinations of decision trees, trained using the log-loss (trees) and logistic loss (combinations).
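
For orientation, the notion of $t$-additivity borrowed from nonextensive statistical mechanics can be illustrated with the standard tempered (Tsallis) logarithm. The notation below follows the usual conventions of that literature and is an assumption about conventions, not a transcription of the paper's own definitions.

```latex
% Tempered logarithm (t \neq 1); the ordinary logarithm is recovered as t \to 1.
\[
  \log_t(z) = \frac{z^{1-t} - 1}{1 - t}, \qquad \lim_{t \to 1} \log_t(z) = \ln z .
\]
% The product rule is no longer additive but t-additive:
\[
  \log_t(ab) = \log_t(a) + \log_t(b) + (1-t)\,\log_t(a)\,\log_t(b).
\]
% Check: with u = a^{1-t} and v = b^{1-t}, the right-hand side equals
% \frac{(u-1) + (v-1) + (u-1)(v-1)}{1-t} = \frac{uv - 1}{1-t} = \log_t(ab).
```

At $t = 1$ the cross term vanishes and the familiar additive rule $\ln(ab) = \ln a + \ln b$ is recovered.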

翻訳日:2024-02-07 13:54:10 公開日:2024-02-06

# Markovへの注意: Markov Chainsによるトランスフォーマーの原則分析フレームワーク

Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains ( http://arxiv.org/abs/2402.04161v1 )

ライセンス: Link先を確認

Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar

(参考訳) 近年、注目に基づくトランスフォーマーは自然言語を含む様々な分野において大きな成功を収めている。彼らの成功の背後にある重要な要素は、生成前訓練の手順であり、これらのモデルが自動回帰的な方法で大きなテキストコーパスで訓練される。この現象を解明するために,マルコフ連鎖のレンズによる変圧器の逐次モデリング能力について,理論と系統実験の両方で研究できる新しい枠組みを提案する。自然言語のマルコビアン性に触発されて、データをマルコビアンソースとしてモデル化し、このフレームワークを用いて、データ分散特性、トランスフォーマアーキテクチャ、学習分布、最終的なモデル性能の間の相互作用を体系的に研究する。特に, 単一層トランスの損失景観を理論的に特徴付け, 特定のデータ特性と変圧器アーキテクチャに基づいて, 大域的ミニマと悪い局所ミニマの存在を示す。実験により,実験結果と理論的な結果が一致していることが実証された。我々は,より高次マルコフ連鎖とより深いアーキテクチャの広い文脈でこれらの知見をさらに調査し,この領域におけるオープン問題を概説する。コードは https://github.com/Bond1995/Markov で入手できる。

In recent years, attention-based transformers have achieved tremendous success across a variety of disciplines including natural languages. A key ingredient behind their success is the generative pretraining procedure, during which these models are trained on a large text corpus in an auto-regressive manner. To shed light on this phenomenon, we propose a new framework that allows both theory and systematic experiments to study the sequential modeling capabilities of transformers through the lens of Markov chains. Inspired by the Markovianity of natural languages, we model the data as a Markovian source and utilize this framework to systematically study the interplay between the data-distributional properties, the transformer architecture, the learnt distribution, and the final model performance. In particular, we theoretically characterize the loss landscape of single-layer transformers and show the existence of global minima and bad local minima contingent upon the specific data characteristics and the transformer architecture. Backed by experiments, we demonstrate that our theoretical findings are in congruence with the empirical results. We further investigate these findings in the broader context of higher order Markov chains and deeper architectures, and outline open problems in this arena. Code is available at https://github.com/Bond1995/Markov.
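
As a rough illustration of what "modelling the data as a Markovian source" can look like, the sketch below samples a binary first-order Markov chain and computes two simple reference predictors from the sample: the empirical transition (bigram) probabilities and the marginal fraction of ones. The parameter names `p` and `q`, the helper names, and the choice of a binary chain are assumptions made for illustration; this is not code from the linked repository.

```python
import random

def sample_binary_markov(p, q, length, seed=0):
    """Sample a binary chain with P(0 -> 1) = p and P(1 -> 0) = q."""
    rng = random.Random(seed)
    # Start from the stationary distribution, where P(state = 1) = p / (p + q).
    state = 1 if rng.random() < p / (p + q) else 0
    seq = [state]
    for _ in range(length - 1):
        flip_prob = p if state == 0 else q
        if rng.random() < flip_prob:
            state = 1 - state
        seq.append(state)
    return seq

def bigram_estimate(seq):
    """Empirical transition matrix: est[a][b] approximates P(next = b | current = a)."""
    counts = [[0, 0], [0, 0]]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

def fraction_of_ones(seq):
    """Marginal (context-free) estimate of P(state = 1)."""
    return sum(seq) / len(seq)

seq = sample_binary_markov(p=0.2, q=0.3, length=50_000)
est = bigram_estimate(seq)
marginal = fraction_of_ones(seq)
```

A predictor that uses the previous symbol should approach the transition probabilities in `est`, whereas one that ignores context can do no better than the marginal; comparing a trained model's outputs against these two references is one simple way to see which solution it has found.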

翻訳日:2024-02-07 13:53:42 公開日:2024-02-06

# プロンプティングによるプラグアンドプレイコントローラのハーネス化

Harnessing the Plug-and-Play Controller by Prompting ( http://arxiv.org/abs/2402.04160v1 )

ライセンス: Link先を確認

Hao Wang, Lei Sha

(参考訳) 制御可能なテキスト生成は、現実世界のアプリケーションで特定の制約を満たすテキストを生成することに焦点を当てた自然言語生成(NLG)における成長分野である。プラグ・アンド・プレイ・コントローラ(ppc)のような以前のアプローチは、生成したテキストの特性を柔軟に制御することを目的としていた。しかし、これらの手法は言語モデルの復号プロセスの整合性を損なうことが多く、結果としてスムーズなテキスト生成は少なくなった。あるいは、複数の属性プロンプトを使用して生成されたテキストを望ましい属性に整列させる手法もあるが、このアプローチでは各属性に対してプロンプト設計が必要であり、言語モデルのサイズに依存していた。本稿では,事前学習言語モデル(PLM)を用いたテキスト生成におけるフレキシブル属性制御手法を提案する。提案手法は、生成過程をPPCで導くことにより、生成したテキストの流速を高めることを目的としている。重要なアイデアは、プロンプトを変更して生成されたテキストの分布を動的に調整し、言語モデルの出力空間を効果的に制限し、所望の属性に影響を与えることである。本研究では, PLM と PPC の円滑な連携を実現するために, 動的適応フィードバックを用いた強化学習 (RLDAF) という新しいモデル微調整手法を提案する。この微調整プロセスは、PPC制御プロセス中に実行される生成アクションに基づいて、言語モデルのパラメータの小さなサブセットに適応する。 PLMとPPCの調和したコラボレーションによって、推論中のテキスト生成の滑らかさが向上する。 sst2データセットで広範な実験を行い,提案手法は,テキストフラレンシや属性一貫性など,さまざまな評価指標における従来のアプローチを上回った。

Controllable text generation is a growing field within natural language generation (NLG) that focuses on producing text that meets specific constraints in real-world applications. Previous approaches, such as plug-and-play controllers (PPCs), aimed to steer the properties of generated text in a flexible manner. However, these methods often compromised the integrity of the language model's decoding process, resulting in less smooth text generation. Alternatively, other techniques utilized multiple attribute prompts to align the generated text with desired attributes, but this approach required prompt design for each attribute and was dependent on the size of the language model. This paper introduces a novel method for flexible attribute control in text generation using pre-trained language models (PLMs). The proposed approach aims to enhance the fluency of generated text by guiding the generation process with PPCs. The key idea is to dynamically adjust the distribution of generated text by modifying prompts, effectively constraining the output space of the language model and influencing the desired attribute. To enable smooth cooperation between the PLM and the PPC, our work innovatively proposes a new model fine-tuning method: Reinforcement Learning with Dynamic Adjust Feedback (RLDAF). This fine-tuning process adapts a small subset of the language model's parameters based on the generating actions taken during the PPC control process. The resulting harmonious collaboration between the PLM and PPC leads to improved smoothness in text generation during inference. Extensive experiments were conducted on the SST2 dataset, and the proposed method outperformed previous approaches in various evaluation metrics, including text fluency and attribute consistency.

翻訳日:2024-02-07 13:53:20 公開日:2024-02-06

# Read to Play (R2-Play):マルチモーダルゲーム指導による決定変換器

Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction ( http://arxiv.org/abs/2402.04154v1 )

ライセンス: Link先を確認

Yonggang Jin, Ge Zhang, Hao Zhao, Tianyu Zheng, Jiawei Guo, Liuyu Xiang, Shawn Yue, Stephen W. Huang, Wenhu Chen, Zhaofeng He and Jie Fu

(参考訳) 汎用エージェントの開発は、人工知能の長年の目標である。

Developing a generalist agent is a longstanding objective in artificial intelligence. Previous efforts utilizing extensive offline datasets from various tasks demonstrate remarkable performance in multitasking scenarios within Reinforcement Learning. However, these works encounter challenges in extending their capabilities to new tasks. Recent approaches integrate textual guidance or visual trajectory into decision networks to provide task-specific contextual cues, representing a promising direction. However, it is observed that relying solely on textual guidance or visual trajectory is insufficient for accurately conveying the contextual information of tasks. This paper explores enhanced forms of task guidance for agents, enabling them to comprehend gameplay instructions, thereby facilitating a "read-to-play" capability. Drawing inspiration from the success of multimodal instruction tuning in visual tasks, we treat the visual-based RL task as a long-horizon vision task and construct a set of multimodal game instructions to incorporate instruction tuning into a decision transformer. Experimental results demonstrate that incorporating multimodal game instructions significantly enhances the decision transformer's multitasking and generalization capabilities.

翻訳日:2024-02-07 13:52:52 公開日:2024-02-06

# 量子力学における意識と崩壊へのプロセスベースアプローチ

Towards a process-based approach to consciousness and collapse in quantum mechanics ( http://arxiv.org/abs/2402.04152v1 )

ライセンス: Link先を確認

Raoni Arroyo and Lauro de Matos Nunes Filho and Frederik Moreira dos Santos

(参考訳) 量子力学の解釈によれば、測定過程における人間の意識の因果的役割は「測定問題」と呼ばれる基礎的な問題を解決するために求められる。伝統的に、この解釈は物質双対論のメタ物理と結びついている。このように、量子力学のこの解釈は双対論者の心体問題を継承する。我々の作業仮説は、意識に対するプロセスベースのアプローチは、心身問題に対するホワイトヘッドの解に依拠し、意識とその量子力学の解釈における役割についてより良いメタ物理的理解をもたらすというものである。この記事は、科学のメタフィジカルスにおけるこのような研究プログラムのキックオフです。

According to a particular interpretation of quantum mechanics, the causal role of human consciousness in the measuring process is called upon to solve a foundational problem called the "measurement problem". Traditionally, this interpretation is tied up with the metaphysics of substance dualism. As such, this interpretation of quantum mechanics inherits the dualist's mind-body problem. Our working hypothesis is that a process-based approach to the consciousness-causes-collapse interpretation (CCCI) -- leaning on Whitehead's solution to the mind-body problem -- offers a better metaphysical understanding of consciousness and its role in interpreting quantum mechanics. This article is the kickoff for such a research program in the metaphysics of science.

翻訳日:2024-02-07 13:52:39 公開日:2024-02-06

# 潜在変数ガウス過程による解釈可能なマルチソースデータ融合

Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process ( http://arxiv.org/abs/2402.04146v1 )

ライセンス: Link先を確認

Sandipp Krishnan Ravi, Yigitcan Comlek, Wei Chen, Arjun Pathak, Vipul Gupta, Rajnikant Umretiya, Andrew Hoffman, Ghanshyam Pilania, Piyush Pandita, Sayan Ghosh, Nathaniel Mckeever, Liping Wang

(参考訳) 人工知能(AI)と機械学習(ML)の出現により、科学と工学の様々な分野が、データ駆動サロゲートを利用して、多くの情報ソース(データ)から複雑なシステムをモデル化してきた。この増殖は、特定の機能を実行するように設計された優れたシステムの開発にかかわるコストと時間の大幅な削減につながった。このようなサロゲートの高い命題は、論文、特許、オープンレポジトリ、その他のリソースなど、複数のデータソースを広範囲に融合して構築されている。しかし、システム最適化中に下流に影響を及ぼす可能性のある情報ソースの既知のおよび未知の物理パラメータの品質と包括性の違いにはあまり注意が払われていない。この問題を解決するために,LVGP(Latent Variable Gaussian Process)に基づくマルチソースデータ融合フレームワークを提案する。個々のデータソースは、物理的に解釈可能な潜在空間にマッピングされる特徴的なカテゴリ変数としてタグ付けされ、ソース認識データ融合モデリングの開発を可能にする。さらに、LVGPの潜伏変数に基づく相似性尺度を導入し、データソースの違いを研究し、理解する。提案手法は、2つの数学的(表現パラボラ問題、2D Ackley関数)と2つの材料科学(FeCrAlおよびSmCoFe合金の設計)のケーススタディを用いて実証および解析を行った。ケーススタディから,シングルソースおよびソースを意識しないMLモデルと比較して,提案したマルチソースデータ融合フレームワークは,スパースデータ問題に対するより良い予測,ソースに対する解釈可能性,異なるソース間の相関や関係を利用してモデリング能力を向上させることができる。

With the advent of artificial intelligence (AI) and machine learning (ML), various science and engineering communities have leveraged data-driven surrogates to model complex systems from numerous sources of information (data). This proliferation has led to a significant reduction in the cost and time involved in developing superior systems designed to perform specific functionalities. A high proportion of such surrogates are built by extensively fusing multiple sources of data, be it published papers, patents, open repositories, or other resources. However, not much attention has been paid to the differences in quality and comprehensiveness of the known and unknown underlying physical parameters of the information sources that could have downstream implications during system optimization. Towards resolving this issue, a multi-source data fusion framework based on Latent Variable Gaussian Process (LVGP) is proposed. The individual data sources are tagged as a characteristic categorical variable that is mapped into a physically interpretable latent space, allowing the development of source-aware data fusion modeling. Additionally, a dissimilarity metric based on the latent variables of LVGP is introduced to study and understand the differences in the sources of data. The proposed approach is demonstrated on and analyzed through two mathematical (representative parabola problem, 2D Ackley function) and two materials science (design of FeCrAl and SmCoFe alloys) case studies. From the case studies, it is observed that compared to using single-source and source-unaware ML models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems, interpretability regarding the sources, and enhanced modeling capabilities by taking advantage of the correlations and relationships among different sources.
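
The idea of tagging each data source with a categorical variable that is mapped into a latent space can be sketched as a Gaussian-type covariance function over both the numerical inputs and the latent source positions. Everything below is an illustrative assumption: the latent coordinates are fixed by hand (in an actual LVGP they would be estimated together with the other hyperparameters), the source labels are placeholders, and using the Euclidean distance between latent positions as the dissimilarity is just one natural choice, not necessarily the metric defined in the paper.

```python
import math

def lvgp_kernel(x1, s1, x2, s2, latent, scales):
    """Gaussian-type correlation between (x1, source s1) and (x2, source s2).

    latent maps each source label to its position in the latent space;
    scales holds one roughness parameter per numerical input.
    """
    numeric_term = sum(w * (a - b) ** 2 for w, a, b in zip(scales, x1, x2))
    z1, z2 = latent[s1], latent[s2]
    source_term = sum((a - b) ** 2 for a, b in zip(z1, z2))
    return math.exp(-numeric_term - source_term)

def source_dissimilarity(latent, s1, s2):
    """Distance between two sources in the latent space."""
    return math.dist(latent[s1], latent[s2])

# Hypothetical 2D latent positions for three data sources.
latent = {"source_A": (0.0, 0.0), "source_B": (0.1, 0.0), "source_C": (1.5, 0.5)}
scales = [1.0, 0.5]
x = [0.2, 0.4]

k_same = lvgp_kernel(x, "source_A", x, "source_A", latent, scales)
k_near = lvgp_kernel(x, "source_A", x, "source_B", latent, scales)
k_far = lvgp_kernel(x, "source_A", x, "source_C", latent, scales)
```

Sources that sit close together in the latent space share information strongly (high correlation at the same input), while distant sources contribute little to each other's predictions, which is what makes the latent positions readable as a map of how the sources differ.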

翻訳日:2024-02-07 13:52:26 公開日:2024-02-06

# マルチラインAI支援コードオーサリング

Multi-line AI-assisted Code Authoring ( http://arxiv.org/abs/2402.04141v1 )

ライセンス: Link先を確認

Omer Dunay and Daniel Cheng and Adam Tait and Parth Thakkar and Peter C Rigby and Andy Chiu and Imad Ahmad and Arun Ganesan and Chandra Maddila and Vijayaraghavan Murali and Ali Tayyebi and Nachiappan Nagappan

(参考訳) CodeComposeは、大規模言語モデル(LLM)を活用したAI支援のコードオーサリングツールで、Metaの10万人の開発者にインライン提案を提供する。本稿では,単一行の提案表示から複数行の提案まで,製品のスケールアップ方法について述べる。この進化によって、開発者のためにこれらの提案のユーザビリティを改善する上で、いくつかのユニークな課題を克服する必要がありました。まず、LLMの提案が開発者の既存のコードの周りを常に動き回っており、そうでなければ生産性と満足度が低下します。第2に、マルチラインの提案は、生成にかなり時間がかかるため、ユーザによるレイテンシの認識を減らすために、いくつかの革新的な投資を行いました。これらのモデルホスト最適化により、複数行提案遅延が2.5倍になった。最後に,マルチライン提案がユーザエクスペリエンスに与える影響を理解し,これをシングルライン提案と対比するために,10万のエンジニアを対象に実験を行った。私たちの実験は (i)受理された文字の42%が複数行の提案である(ただし表示された提案は16%) (ii)複数行の提案により、9%から17%のユーザが保存したキーストロークの割合がほぼ倍増した。マルチラインのCodeComposeはMetaの全エンジニアに展開されており、エンジニアの1%未満がマルチラインの提案をオプトアウトしている。

CodeCompose is an AI-assisted code authoring tool powered by large language models (LLMs) that provides inline suggestions to tens of thousands of developers at Meta. In this paper, we present how we scaled the product from displaying single-line suggestions to multi-line suggestions. This evolution required us to overcome several unique challenges in improving the usability of these suggestions for developers. First, we discuss how multi-line suggestions can have a 'jarring' effect, as the LLM's suggestions constantly move around the developer's existing code, which would otherwise result in decreased productivity and satisfaction. Second, multi-line suggestions take significantly longer to generate; hence we present several innovative investments we made to reduce the perceived latency for users. These model-hosting optimizations sped up multi-line suggestion latency by 2.5x. Finally, we conduct experiments on tens of thousands of engineers to understand how multi-line suggestions impact the user experience and contrast this with single-line suggestions. Our experiments reveal that (i) multi-line suggestions account for 42% of total characters accepted (despite only accounting for 16% of displayed suggestions) and (ii) multi-line suggestions almost doubled the percentage of keystrokes saved for users, from 9% to 17%. Multi-line CodeCompose has been rolled out to all engineers at Meta, and less than 1% of engineers have opted out of multi-line suggestions.

翻訳日:2024-02-07 13:51:59 公開日:2024-02-06

# 法的推論の進歩:半自動調停プロセス(SAAPs)によるグローバル法学における複雑度とバイアスをナビゲートするAIの統合

Advancing Legal Reasoning: The Integration of AI to Navigate Complexities and Biases in Global Jurisprudence with Semi-Automated Arbitration Processes (SAAPs) ( http://arxiv.org/abs/2402.04140v1 )

ライセンス: Link先を確認

Michael De'Shazer

(参考訳) 本研究は,米国,英国,ルワンダ,スウェーデン,香港の5カ国にまたがる裁判所判決の分析に対する新たなアプローチからなる。本研究はまた、人工知能(ai)と法的分析における最新の進歩の交点を探究し、人間のバイアスを識別し、様々な司法管轄区域における法律の一貫した適用を確保することを目的として、ai(特別に生成的なai)の役割を強調し、裁判所判断の自動化、有効性、一貫性のある多面的議論を促進する。本稿では,高度言語モデル (ALMs) と新たに導入された人間とAIの協調的枠組みを組み込むことにより,法律の実践において,高度言語モデル (ALMs) を用いた地上理論に基づく研究設計を分析することを目的とする。 ShiRLEYは、AIベースのアプリケーション(OpenAIのGPT技術上に構築されている)の名前であり、さまざまな法的判断における論理的矛盾とバイアスを検出することに焦点を当てている。 ShiRLEY分析は集約され、SAM(ALM)と呼ばれる比較指向のAIベースのアプリケーションとともに、ShiRLEYバイアス検出における相対偏差を識別する。さらに、ALM,SARAを介して半自律仲裁プロセス中にCRITICを生成する。上記のAIアプリケーション(SAM in together with ShiRLEY)で識別されるバイアスと定性的ニュアンスを、ビジネスと人権の仲裁規則に基づいて批判的に評価するAI仲裁器の利用において、新しいアプローチが導入された。この半自動仲裁プロセス(SAAP)は、AIと人間による協調分析のハイブリッドシステムを通じて、曖昧な議論に反する「理解」を確実にすることで、法的判断の完全性と公正性を維持することを目的としている。

This study presents a novel approach toward the analysis of court judgments spanning five countries: the United States, the United Kingdom, Rwanda, Sweden and Hong Kong. This study also explores the intersection of the latest advancements in artificial intelligence (AI) and legal analysis, emphasizing the role of AI (specifically generative AI) in identifying human biases and facilitating automated, valid, and coherent multisided argumentation of court judgments with the goal of ensuring consistent application of laws in and across various jurisdictions. By incorporating Advanced Language Models (ALMs) and a newly introduced human-AI collaborative framework, this paper seeks to analyze Grounded Theory-based research design with ALMs in the practice of law. SHIRLEY is the name of the AI-based application (built on top of OpenAI's GPT technology), focusing on detecting logical inconsistencies and biases across various legal decisions. SHIRLEY analysis is aggregated and is accompanied by a comparison-oriented AI-based application called SAM (also an ALM) to identify relative deviations in SHIRLEY bias detections. Further, a CRITIC is generated within a semi-autonomous arbitration process via the ALM, SARA. A novel approach is introduced in the utilization of an AI arbitrator to critically evaluate biases and qualitative-in-nature nuances identified by the aforementioned AI applications (SAM in concert with SHIRLEY), based on the Hague Rules on Business and Human Rights Arbitration. This Semi-Automated Arbitration Process (SAAP) aims to uphold the integrity and fairness of legal judgments by ensuring a nuanced debate-resultant "understanding" through a hybrid system of AI and human-based collaborative analysis.

翻訳日:2024-02-07 13:51:38 公開日:2024-02-06

# シングルイメージデハージングのためのU字型視覚マンバ

U-shaped Vision Mamba for Single Image Dehazing ( http://arxiv.org/abs/2402.04139v1 )

ライセンス: Link先を確認

Zhuoran Zheng and Chen Wu

(参考訳) 現在、トランスフォーマーは画像デハジングで最も一般的なアーキテクチャであるが、計算の複雑さが大きいため、長距離依存を扱う能力はリソース制約のあるデバイスに限定されている。この課題に対処するために、効率的なシングルイメージデハージングネットワークであるUVM-Net(Vision Mamba)を導入する。長いシーケンスを処理できることで知られる新しいディープシーケンスモデルであるState Space Sequence Models (SSM) にインスパイアされた我々は、畳み込み層の局所的特徴抽出能力と、長距離依存関係をキャプチャするSSMの機能を統合するBi-SSMブロックを設計した。本手法の有効性を実験的に検証した。本手法は,画像デハジングや画像復元作業において,より効率的な長距離依存性モデリング手法を提供する。コードのURLは https://github.com/zzr-idam である。

Currently, the Transformer is the most popular architecture for image dehazing, but due to its large computational complexity, its ability to handle long-range dependency is limited on resource-constrained devices. To tackle this challenge, we introduce the U-shaped Vision Mamba (UVM-Net), an efficient single-image dehazing network. Inspired by State Space Sequence Models (SSMs), a new class of deep sequence models known for their ability to handle long sequences, we design a Bi-SSM block that integrates the local feature extraction ability of the convolutional layer with the ability of the SSM to capture long-range dependencies. Extensive experimental results demonstrate the effectiveness of our method. Our method provides a more efficient approach to long-range dependency modeling for image dehazing as well as other image restoration tasks. The URL of the code is https://github.com/zzr-idam.
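
The way a state space sequence model carries information across long ranges can be sketched with a minimal discrete linear recurrence. This is a deliberately simplified illustration with a scalar state and fixed parameters `a`, `b`, `c` (all assumptions); it is not the UVM-Net or Bi-SSM block, and practical SSM layers use vector states and learned, often input-dependent, parameters.

```python
def ssm_scan(xs, a, b, c):
    """Run the recurrence h_t = a * h_{t-1} + b * x_t, y_t = c * h_t over a sequence."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x   # the state summarizes the whole history in O(1) memory
        ys.append(c * h)
    return ys

# Impulse response: a single input at position 0 keeps influencing later
# outputs, decaying geometrically as a ** t.
impulse_response = ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0)
```

Each step costs a constant amount of work, so the whole sequence is processed in time linear in its length, in contrast to the quadratic pairwise comparisons of standard self-attention.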

翻訳日:2024-02-07 13:51:05 公開日:2024-02-06

# NLPにおける「Typological Diversity」とは何か?

What is 'Typological Diversity' in NLP? ( http://arxiv.org/abs/2402.04222v1 )

ライセンス: Link先を確認

Esther Ploeger, Wessel Poelman, Miryam de Lhoneux, Johannes Bjerva

(参考訳) NLP研究コミュニティは英語以外の言語への関心を高め、多言語NLPの大幅な改善をもたらした。しかし、これらの改善は世界の言語の小さなサブセットにのみ適用される。これを拡張するために、言語間の一般化可能な多言語パフォーマンス向上を目指す論文が増えている。この目的のために、言語型学は、広範囲の言語にまたがる一般化を暗示する広範な類型学的なサンプルに基づいて、言語選択を動機付けるために一般的に用いられる。これらの選択はしばしば「類型的に多様な」と表現される。本研究では,「ティポロジー多様性」に関する主張を含むNLP研究を体系的に研究する。このような主張には明確な定義や基準は存在しない。我々は,いくつかの軸に沿って言語選択の多様性を近似する指標を導入し,結果が論文によって大きく異なることを発見した。さらに,歪んだ言語選択は多言語性能を過大評価する可能性があることを示した。言語サンプルの多様性を実証的に正当化する「タイポロジー多様性」の運用を含めることを推奨する。

The NLP research community has devoted increased attention to languages beyond English, resulting in considerable improvements for multilingual NLP. However, these improvements only apply to a small subset of the world's languages. Aiming to extend this, an increasing number of papers aspires to enhance generalizable multilingual performance across languages. To this end, linguistic typology is commonly used to motivate language selection, on the basis that a broad typological sample ought to imply generalization across a broad range of languages. These selections are often described as being 'typologically diverse'. In this work, we systematically investigate NLP research that includes claims regarding 'typological diversity'. We find there are no set definitions or criteria for such claims. We introduce metrics to approximate the diversity of language selection along several axes and find that the results vary considerably across papers. Furthermore, we show that skewed language selection can lead to overestimated multilingual performance. We recommend future work to include an operationalization of 'typological diversity' that empirically justifies the diversity of language samples.
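
One simple way to approximate the diversity of a language sample is the mean pairwise distance between typological feature vectors. The sketch below is a hedged illustration only: the abstract does not specify the paper's metrics, the binary feature vectors are made-up placeholders rather than real typological data, and `feature_distance` and `mean_pairwise_distance` are hypothetical helper names.

```python
from itertools import combinations

def feature_distance(u, v):
    """Fraction of features on which two languages differ (normalized Hamming distance)."""
    return sum(a != b for a, b in zip(u, v)) / len(u)

def mean_pairwise_distance(sample):
    """Average distance over all unordered pairs of languages in the sample."""
    pairs = list(combinations(sample.values(), 2))
    return sum(feature_distance(u, v) for u, v in pairs) / len(pairs)

# Placeholder feature vectors (not real typological data).
narrow_sample = {"lang_a": [1, 0, 1, 1], "lang_b": [1, 0, 1, 0], "lang_c": [1, 0, 1, 1]}
broad_sample = {"lang_a": [1, 0, 1, 1], "lang_d": [0, 1, 0, 0], "lang_e": [0, 1, 1, 0]}

narrow_score = mean_pairwise_distance(narrow_sample)
broad_score = mean_pairwise_distance(broad_sample)
```

Reporting a score of this kind alongside a language selection is one possible form the recommended operationalization could take, since it lets readers compare samples of the same size on a common scale.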

翻訳日:2024-02-07 13:44:57 公開日:2024-02-06

# $2+1D$ SU(2)ゲージ理論におけるせん断粘度の古典および量子計算

Classical and Quantum Computing of Shear Viscosity for $2+1D$ SU(2) Gauge Theory ( http://arxiv.org/abs/2402.04221v1 )

ライセンス: Link先を確認

Francesco Turro and Xiaojun Yao

(参考訳) 格子ハミルトニアンの定式化を用いて,$(2+1)$次元SU(2)ゲージ理論に対するせん断粘度の非摂動的計算を行う。応力-エネルギーテンソルの遅延グリーン関数は、局所ヒルベルト空間の打ち切りを伴う格子ハミルトニアンの厳密対角化による実時間発展から計算され、せん断粘度は久保公式によって得られる。連続極限を取るとき、結合のくりこみ群フローは考慮するが、追加の演算子くりこみは考慮しない。せん断粘度とエントロピー密度の比$\frac{\eta}{s}$は、局所的な電気表現を$j_{\rm max}=\frac{1}{2}$で打ち切った$4\times4$六角格子上のいくつかの温度において、よく知られたホログラフィック結果$\frac{1}{4\pi}$と整合する。また、スペクトル関数と周波数の比$\frac{\rho^{xy}(\omega)}{\omega}$は、周波数が小さいときにピーク構造を示す。より大きな格子上で$j_{\rm max}=\frac{1}{2}$を超える場合、厳密対角化法と単純な行列積状態による古典シミュレーション法はいずれも指数関数的に増大する資源を必要とする。そこで,遅延グリーン関数を計算する量子計算法を開発し,$j_{\rm max}$の打ち切り,有限サイズ効果,トロッター誤差など計算の様々な系統誤差を解析する。我々はQuantinuumエミュレータとIBMシミュレータの両方で、小さな格子に対して量子回路を試験し、古典計算と整合した結果を得る。

We perform a nonperturbative calculation of the shear viscosity for $(2+1)$-dimensional SU(2) gauge theory by using the lattice Hamiltonian formulation. The retarded Green's function of the stress-energy tensor is calculated from real time evolution via exact diagonalization of the lattice Hamiltonian with a local Hilbert space truncation and the shear viscosity is obtained via the Kubo formula. When taking the continuum limit, we account for the renormalization group flow of the coupling but no additional operator renormalization. We find the ratio of the shear viscosity and the entropy density $\frac{\eta}{s}$ is consistent with a well-known holographic result $\frac{1}{4\pi}$ at several temperatures on a $4\times4$ hexagonal lattice with the local electric representation truncated at $j_{\rm max}=\frac{1}{2}$. We also find the ratio of the spectral function and frequency $\frac{\rho^{xy}(\omega)}{\omega}$ exhibits a peak structure when the frequency is small. Both the exact diagonalization method and the simple matrix product state classical simulation method require exponentially growing resources beyond $j_{\rm max}=\frac{1}{2}$ on bigger lattices. We therefore develop a quantum computing method to calculate the retarded Green's function and analyze various systematics of the calculation, including $j_{\rm max}$ truncation, finite size effects, and Trotter errors. We test our quantum circuit on both the Quantinuum emulator and the IBM simulator for a small lattice and obtain results consistent with the classical computing ones.
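
The chain from the retarded Green's function to the shear viscosity can be written schematically as follows. Prefactors, spatial integrations, and sign conventions differ between references and are deliberately left as proportionalities here, so this is a sketch of the structure of the Kubo relation under assumed conventions rather than the paper's exact expressions.

```latex
% Retarded correlator of the off-diagonal stress-energy component:
\[
  G_R^{xy}(t) = -i\,\theta(t)\,\big\langle [\,T^{xy}(t),\, T^{xy}(0)\,] \big\rangle .
\]
% Spectral function and Kubo relation (normalization is convention dependent):
\[
  \rho^{xy}(\omega) \propto -\operatorname{Im} G_R^{xy}(\omega),
  \qquad
  \eta \propto \lim_{\omega \to 0} \frac{\rho^{xy}(\omega)}{\omega} .
\]
% Holographic reference value, in units with \hbar = k_B = 1:
\[
  \frac{\eta}{s} = \frac{1}{4\pi} \approx 0.0796 .
\]
```

This is why the small-frequency behavior of $\frac{\rho^{xy}(\omega)}{\omega}$ is the quantity of interest: its zero-frequency limit determines the viscosity.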

翻訳日:2024-02-07 13:44:40 公開日:2024-02-06

# 無線ビデオキャッシングネットワークにおけるリソースアウェア階層型フェデレート学習

Resource-Aware Hierarchical Federated Learning in Wireless Video Caching Networks ( http://arxiv.org/abs/2402.04216v1 )

ライセンス: Link先を確認

Md Ferdous Pervej and Andreas F. Molisch

(参考訳) ワイヤレスビデオキャッシングネットワークにto-be-requestのコンテンツを様々なレベルで格納することで、いくつかの人気ファイルの動画トラフィックに起因するトラフィックを軽減できる。典型的には、コンテンツサービスプロバイダ(CSP)がコンテンツを所有し、ユーザは(無線)インターネットサービスプロバイダ(ISP)を使用して、CSPから好みのコンテンツを要求する。これらの関係者はプライベート情報やビジネスシークレットを公開しないため、従来の手法はユーザの将来の要求の動的変化を予測できない可能性がある。そこで本研究では,ユーザの今後のコンテンツ要求を予測するためのリソース対応階層型学習(RawHFL)ソリューションを提案する。ユーザが要求されたコンテンツに基づいて、ローカルトレーニングデータセットを更新できる実用的なデータ取得技術が使用されている。また,ネットワークなどの計算資源は限定的であり,モデルの学習には一部の利用者しか参加していないことから,提案アルゴリズムの収束限界を導出する。この境界に基づいて,実資源制約下で効率的にrawhflエネルギを訓練するための制御可能なパラメータを共同で構成するための重み付きユーティリティ関数を最小化する。提案アルゴリズムは, 既存のベースラインよりも, 試験精度とエネルギーコストの面で優れていることが検証された。

Backhaul traffic congestion caused by the video traffic of a few popular files can be alleviated by storing the to-be-requested content at various levels in wireless video caching networks. Typically, content service providers (CSPs) own the content, and the users request their preferred content from the CSPs using their (wireless) internet service providers (ISPs). As these parties do not reveal their private information and business secrets, traditional techniques may not be readily used to predict the dynamic changes in users' future demands. Motivated by this, we propose a novel resource-aware hierarchical federated learning (RawHFL) solution for predicting users' future content requests. A practical data acquisition technique is used that allows each user to update its local training dataset based on its requested content. Besides, since networking and other computational resources are limited, considering that only a subset of the users participate in the model training, we derive the convergence bound of the proposed algorithm. Based on this bound, we minimize a weighted utility function for jointly configuring the controllable parameters to train the RawHFL energy efficiently under practical resource constraints. Our extensive simulation results validate the proposed algorithm's superiority, in terms of test accuracy and energy cost, over existing baselines.
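
The hierarchical structure of such a federated scheme can be sketched as two rounds of weighted averaging: clients are aggregated at their edge server, and edge models are then aggregated centrally. The sketch below is plain hierarchical federated averaging with sample-count weights, which is an assumption made for illustration; the abstract does not specify RawHFL's actual aggregation rule, client selection, or resource optimization, and the function names and toy model vectors are hypothetical.

```python
def weighted_average(models, weights):
    """Coordinate-wise weighted average of equally sized parameter vectors."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total for i in range(dim)]

def hierarchical_aggregate(cells):
    """Aggregate client models per cell (edge level), then across cells (central level).

    cells: list of cells, each a list of (model_vector, num_local_samples) pairs.
    """
    edge_models, edge_sizes = [], []
    for clients in cells:
        models = [model for model, _ in clients]
        sizes = [n for _, n in clients]
        edge_models.append(weighted_average(models, sizes))
        edge_sizes.append(sum(sizes))
    return weighted_average(edge_models, edge_sizes)

# Toy example: two cells, three participating clients in total.
cells = [
    [([1.0, 0.0], 10), ([3.0, 2.0], 30)],
    [([5.0, 4.0], 60)],
]
global_model = hierarchical_aggregate(cells)
```

With sample-count weights at both levels, the two-stage result equals a single weighted average over all participating clients; the hierarchy matters in practice because edge servers can aggregate several times between the costlier central rounds.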

翻訳日:2024-02-07 13:44:13 公開日:2024-02-06

# 対称性がマクロ量子系の熱力学を形作る

Symmetry shapes thermodynamics of macroscopic quantum systems ( http://arxiv.org/abs/2402.04214v1 )

ライセンス: Link先を確認

Vasco Cavina, Ariane Soret, Timur Aslyamov, Krzysztof Ptaszyński, Massimiliano Esposito

(参考訳) 基礎となる対称性群に基づく量子系の熱力学への系統的アプローチを導出する。系のエントロピーは、その密度行列の詳細とは概ね独立な群論的量を用いて記述できることを示す。我々はこの手法を、相互作用する$N$個の同一の$d$レベル量子系からなる一般的な系に適用する。置換不変性を用いることで、大きな$N$に対して、エントロピーはモデルの微視的詳細とは完全に独立で、置換群 $\text{S}_N$ の既約表現の大きさにのみ依存するレート関数 $s(\boldsymbol{x})$ を持つ普遍的な大偏差挙動を示すことがわかる。さらに、分配関数は自由エネルギー $f(\boldsymbol{x})=e(\boldsymbol{x})-\beta^{-1}s(\boldsymbol{x})$ を持つ大偏差原理を満たすことが示される。ここで $e(\boldsymbol{x})$ は、群表現論によって決定される特定の部分空間の基底状態エネルギーにのみ依存するレート関数である。この理論を、熱揺らぎと量子揺らぎの相互作用を示す相転移の最小モデルである横磁場キュリー・ワイスモデルに適用する。

We derive a systematic approach to the thermodynamics of quantum systems based on the underlying symmetry groups. We show that the entropy of a system can be described in terms of group-theoretical quantities that are largely independent of the details of its density matrix. We apply our technique to generic $N$ identical interacting $d$-level quantum systems. Using permutation invariance, we find that, for large $N$, entropy displays a universal large deviation behavior with a rate function $s(\boldsymbol{x})$ that is completely independent of the microscopic details of the model, but depends only on the size of the irreducible representations of the permutation group $\text{S}_N$. In turn, the partition function is shown to satisfy a large deviation principle with a free energy $f(\boldsymbol{x})=e(\boldsymbol{x})-\beta^{-1}s(\boldsymbol{x})$, where $e(\boldsymbol{x})$ is a rate function that only depends on the ground state energy of particular subspaces determined by group representation theory. We apply our theory to the transverse-field Curie-Weiss model, a minimal model of phase transition exhibiting an interplay of thermal and quantum fluctuations.

翻訳日:2024-02-07 13:43:50 公開日:2024-02-06
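
As a reading aid, the large deviation statement in the abstract above can be written out explicitly. This is only a sketch: it assumes that at macroscopic coordinate $\boldsymbol{x}$ the entropy and energy scale as $N s(\boldsymbol{x})$ and $N e(\boldsymbol{x})$, and the symbol $Z_N$ for the partition function and the Laplace-type evaluation of the integral are assumptions, not notation taken from the abstract.

```latex
\begin{align}
Z_N &\asymp \int d\boldsymbol{x}\; e^{N\left[s(\boldsymbol{x}) - \beta e(\boldsymbol{x})\right]}
     = \int d\boldsymbol{x}\; e^{-N\beta f(\boldsymbol{x})},
     \qquad f(\boldsymbol{x}) = e(\boldsymbol{x}) - \beta^{-1} s(\boldsymbol{x}), \\
-\lim_{N\to\infty} \frac{1}{\beta N} \ln Z_N &= \min_{\boldsymbol{x}} f(\boldsymbol{x}).
\end{align}
```

Under these assumptions the free energy per system is fixed by the minimizer of $f$, so the symmetry-determined $s(\boldsymbol{x})$ enters only through its competition with $e(\boldsymbol{x})$.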

# 量子プロセスにおける情報フローの定量化

Quantifying information flow in quantum processes ( http://arxiv.org/abs/2402.04213v1 )

ライセンス: Link先を確認

Leonardo Santos, Zhen-Peng Xu, Jyrki Piilo, Otfried Gühne

(参考訳) 本稿では,一般量子プロセスにおける情報フローを定量化する枠組みを提案する。そこで本稿では,量子チャネルのシグナリングパワーを紹介し,その動作特性について考察する。この関数は高次写像への拡張をサポートし、一般的な量子因果ネットワークや不確定因果順序のプロセスにおける情報フローの評価を可能にする。さらに,初期システム環境相関の存在下でも適用可能なオープンシステムにおける情報ダイナミクスへの厳密なアプローチを提供し,古典情報と量子情報バックフローの区別を可能にした。

We present a framework for quantifying information flow within general quantum processes. For this purpose, we introduce the signaling power of quantum channels and discuss its relevant operational properties. This function supports extensions to higher order maps, enabling the evaluation of information flow in general quantum causal networks and also processes with indefinite causal order. Furthermore, our results offer a rigorous approach to information dynamics in open systems that applies also in the presence of initial system-environment correlations, and allows for the distinction between classical and quantum information backflow.

翻訳日:2024-02-07 13:43:10 公開日:2024-02-06

# 量子コンピュータ上の一般混合量子状態の準備

Preparing general mixed quantum states on quantum computers ( http://arxiv.org/abs/2402.04212v1 )

ライセンス: Link先を確認

Douglas F. Pinto, Lucas Friedrich, Jonas Maziero

(参考訳) 量子状態の準備は、量子通信プロトコル、量子コンピューティング、物理システム内の量子相関やその他のリソースの探索など、様々な領域における重要なサブルーチンとして機能する。先行研究 [M. B. Pozzobom and J. Maziero, Quantum Inf. Process. 18, 142 (2019)] および [E. R. Gårding et al., Entropy 23, 797 (2021)] で導入されたプロトコルに基づき、[F. Shahbeigi, M. Karimi and V. Karimipour, Phys. Scr. 97, 025101 (2022)] の著者らは、当初は2量子ビットの混合ベル対角状態のために考案された方法論を拡張することで、2量子ビットの混合X-real状態を量子コンピュータ上で準備できることを示した。本稿では、これらの量子回路内の見過ごされていたパターンを掘り下げ、より広い範囲を包含するようにアプローチを一般化する。量子情報プロセッサを用いた$d$次元の混合量子状態の準備に適したアルゴリズムを提示し、混合状態の準備手法を大きく前進させる。提案アルゴリズムの有効性を検証するため、Xおよび非Xの混合2量子ビット状態と、1、2、3量子ビットにまたがる任意のランダム密度行列を用いた包括的な試験を行った。

The preparation of quantum states serves as a pivotal subroutine across various domains, including quantum communication protocols, quantum computing, and the exploration of quantum correlations and other resources within physical systems. Building upon the protocols introduced in previous works [M. B. Pozzobom and J. Maziero, Quantum Inf. Process. 18, 142 (2019)] and [E. R. Gårding et al., Entropy 23, 797 (2021)], the authors of [F. Shahbeigi, M. Karimi and V. Karimipour, Phys. Scr. 97, 025101 (2022)] demonstrated the capability to prepare mixed two-qubit X-real states on quantum computers by extending the methodology initially devised for mixed two-qubit Bell-diagonal states. In this article, we delve into an overlooked pattern within these quantum circuits, allowing us to generalize the approach to encompass a broader scope. Presenting an algorithm tailored for the preparation of $d$-dimensional mixed quantum states using quantum information processors, we offer a significant advancement in mixed state preparation methodologies. To validate the efficacy of our algorithm, we conducted comprehensive tests utilizing both X and non-X mixed two-qubit states, as well as arbitrary random density matrices spanning one, two, and three qubits.

翻訳日:2024-02-07 13:42:54 公開日:2024-02-06

# 「タスク成功」だけでは不十分: 望ましくないエージェント行動を捉える行動批評器としてのビデオ言語モデルの利用について

"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors ( http://arxiv.org/abs/2402.04210v1 )

ライセンス: Link先を確認

Lin Guan, Yifan Zhou, Denis Liu, Yantian Zha, Heni Ben Amor, Subbarao Kambhampati

(参考訳) 大規模な生成モデルは有意義な候補解をサンプリングするのに有用であるが、しばしばタスクの制約やユーザの好みを見落としている。モデルが外部検証器と結合され、最終解が検証フィードバックに従って反復的または漸進的に導出される場合、その能力はよりよく活用される。身体性AI(embodied AI)の文脈では、検証は多くの場合、命令で指定された目標条件が満たされたかどうかの評価のみを伴う。しかしながら、これらのエージェントを日常生活にシームレスに統合するには、単なるタスクの成功を超えた幅広い制約や嗜好を考慮することが不可欠である(例えば、ロボットは大きな変形を避けるために、パンを慎重に把持する必要がある)。しかし、ロボットタスクの範囲に限りがないことを考えると、囲碁や定理証明のような明示的知識タスクで使われるものと同様のスクリプト化された検証器を構築することは不可能である。健全な検証器が利用できない場合、ほぼ全知といえる大規模な視覚言語モデル(VLM)を、ビデオ中の望ましくないロボットの振る舞いを捉えるためのスケーラブルな行動批評器(Behavior Critic)として使用できるだろうか? これに答えるため、我々はまず、目標には到達するが望ましくないロボットポリシーの多様な事例を含むベンチマークを構築する。次に、VLM批評器を包括的に評価し、その強みと失敗モードをより深く理解する。評価に基づいて、VLMによる批評を効果的に活用するためのガイドラインを提供し、そのフィードバックをポリシー改善の反復的なプロセスに統合する実践的な方法を示す。データセットとコードベースは https://guansuns.github.io/pages/vlm-critic で公開されている。

Large-scale generative models are shown to be useful for sampling meaningful candidate solutions, yet they often overlook task constraints and user preferences. Their full power is better harnessed when the models are coupled with external verifiers and the final solutions are derived iteratively or progressively according to the verification feedback. In the context of embodied AI, verification often solely involves assessing whether goal conditions specified in the instructions have been met. Nonetheless, for these agents to be seamlessly integrated into daily life, it is crucial to account for a broader range of constraints and preferences beyond bare task success (e.g., a robot should grasp bread with care to avoid significant deformations). However, given the unbounded scope of robot tasks, it is infeasible to construct scripted verifiers akin to those used for explicit-knowledge tasks like the game of Go and theorem proving. This begs the question: when no sound verifier is available, can we use large vision and language models (VLMs), which are approximately omniscient, as scalable Behavior Critics to catch undesirable robot behaviors in videos? To answer this, we first construct a benchmark that contains diverse cases of goal-reaching yet undesirable robot policies. Then, we comprehensively evaluate VLM critics to gain a deeper understanding of their strengths and failure modes. Based on the evaluation, we provide guidelines on how to effectively utilize VLM critiques and showcase a practical way to integrate the feedback into an iterative process of policy refinement. The dataset and codebase are released at: https://guansuns.github.io/pages/vlm-critic.

翻訳日:2024-02-07 13:41:43 公開日:2024-02-06

# 非集中治療患者の急性腎障害予測: 後ろ向き外部・内部検証研究

Acute kidney injury prediction for non-critical care patients: a retrospective external and internal validation study ( http://arxiv.org/abs/2402.04209v1 )

ライセンス: Link先を確認

Esra Adiyeke, Yuanfang Ren, Benjamin Shickel, Matthew M. Ruppert, Ziyuan Guan, Sandra L. Kane-Gill, Raghavan Murugan, Nabihah Amatullah, Britney A. Stottlemyer, Tiffany L. Tran, Dan Ricketts, Christopher M Horvat, Parisa Rashidi, Azra Bihorac, Tezcan Ozrazgat-Baslanti

(参考訳) 背景: 腎排泄機能の低下である急性腎障害(AKI)は、入院の最大18%で発生する。AKIの進行は腎臓に不可逆的な損傷をもたらす可能性がある。方法: この後ろ向きコホート研究は、ピッツバーグ大学医療センター(UPMC)(n = 46,815)とフロリダ大学ヘルス(UFH)(n = 127,202)の非集中治療室に入院した成人患者を対象とする。今後48時間以内のステージ2以上のAKIへの進行を予測するために、深層学習モデルと従来の機械学習モデルを開発し比較した。各施設のローカルモデル(UFHで訓練したUFHモデル、UPMCで訓練したUPMCモデル)と、両施設の患者からなる開発コホートを用いた別個のモデル(UFH-UPMCモデル)を訓練した。各施設でモデルを内部的および外部的に検証し、性別と人種によるサブグループ分析を行った。結果: ステージ2以上のAKIは、UFHとUPMCの患者のそれぞれ3%(n=3,257)と8%(n=2,296)で発生した。受信者動作特性曲線下面積(AUROC)は、UFHテストコホートでは0.77(UPMCモデル)から0.81(UFHモデル)、UPMCテストコホートでは0.79(UFHモデル)から0.83(UPMCモデル)の範囲であった。UFH-UPMCモデルのAUROCは、UFHテストコホートで0.81(95%信頼区間[CI] [0.80, 0.83])、UPMCテストコホートで0.82(95% CI [0.81, 0.84])であり、適合率-再現率曲線下面積(AUPRC)は、UFHテストコホートで0.06(95% CI [0.05, 0.06])、UPMCテストコホートで0.13(95% CI [0.11, 0.15])であった。動態的推定糸球体濾過率、腎毒性薬物負荷、血中尿素窒素は、モデルおよび医療センター全体で最も影響の大きい上位3つの特徴であり続けた。結論: 各施設で開発されたモデルは、他の施設でテストした場合に識別能がわずかに低下したが、影響の大きい上位の特徴は、モデルと施設の間で同一のままであった。

Background: Acute kidney injury (AKI), the decline of kidney excretory function, occurs in up to 18% of hospitalized admissions. Progression of AKI may lead to irreversible kidney damage. Methods: This retrospective cohort study includes adult patients admitted to a non-intensive care unit at the University of Pittsburgh Medical Center (UPMC) (n = 46,815) and University of Florida Health (UFH) (n = 127,202). We developed and compared deep learning and conventional machine learning models to predict progression to Stage 2 or higher AKI within the next 48 hours. We trained local models for each site (UFH Model trained on UFH, UPMC Model trained on UPMC) and a separate model with a development cohort of patients from both sites (UFH-UPMC Model). We internally and externally validated the models on each site and performed subgroup analyses across sex and race. Results: Stage 2 or higher AKI occurred in 3% (n=3,257) and 8% (n=2,296) of UFH and UPMC patients, respectively. Area under the receiver operating characteristic curve (AUROC) values for the UFH test cohort ranged between 0.77 (UPMC Model) and 0.81 (UFH Model), while AUROC values ranged between 0.79 (UFH Model) and 0.83 (UPMC Model) for the UPMC test cohort. The UFH-UPMC Model achieved an AUROC of 0.81 (95% confidence interval [CI] [0.80, 0.83]) for the UFH and 0.82 (95% CI [0.81, 0.84]) for the UPMC test cohorts, and an area under the precision-recall curve (AUPRC) of 0.06 (95% CI [0.05, 0.06]) for the UFH and 0.13 (95% CI [0.11, 0.15]) for the UPMC test cohorts. Kinetic estimated glomerular filtration rate, nephrotoxic drug burden and blood urea nitrogen remained the top three features with the highest influence across the models and health centers. Conclusion: Locally developed models displayed marginally reduced discrimination when tested on another institution, while the top set of influencing features remained the same across the models and sites.

翻訳日:2024-02-07 13:40:58 公開日:2024-02-06
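
The two discrimination metrics reported in the abstract above, AUROC and AUPRC, can be computed from labels and predicted scores alone. The following is a minimal pure-Python sketch for illustration; the function names `auroc` and `average_precision` are hypothetical, the toy labels and scores are made up, and the AUPRC is approximated by the step-wise average precision under the assumption that no two scores are tied. It is not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve; assumes no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Toy data, not from the study.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(auroc(labels, scores), average_precision(labels, scores))
```

A model that ranks patients at random is expected to score about 0.5 on AUROC but only about the event rate on AUPRC, so AUPRC values are best read against the outcome prevalence of each cohort (3% and 8% here) rather than against 0.5.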

# 大規模事前学習ニューラルネットワークにおける人型幾何学的抽象化

Human-Like Geometric Abstraction in Large Pre-trained Neural Networks ( http://arxiv.org/abs/2402.04203v1 )

ライセンス: Link先を確認

Declan Campbell, Sreejan Kumar, Tyler Giallanza, Thomas L. Griffiths, Jonathan D. Cohen

(参考訳) 人間は抽象構造を認識し、操作する優れた能力を有しており、特に幾何学の領域で顕著である。認知科学における最近の研究は、ニューラルネットワークがこの能力を共有していないことを示唆しており、人間の幾何学的能力は人間の精神表現における離散的なシンボル構造に由来すると結論付けている。しかしながら、人工知能(AI)の進歩は、モデルサイズとトレーニングデータの量の両方で標準アーキテクチャをスケールアップした後、ニューラルネットワークがより人間的な推論を示すようになることを示唆している。本研究では,幾何学的視覚処理に関する認知科学における経験的結果を再検討し,幾何学的視覚処理における3つの主要なバイアスを同定する。我々は、人間のバイアスを調査する文献からタスクをテストし、AIで使用される大規模なトレーニング済みニューラルネットワークモデルにより、より人間的な抽象幾何学的処理が示されることを示した。

Humans possess a remarkable capacity to recognize and manipulate abstract structure, which is especially apparent in the domain of geometry. Recent research in cognitive science suggests neural networks do not share this capacity, concluding that human geometric abilities come from discrete symbolic structure in human mental representations. However, progress in artificial intelligence (AI) suggests that neural networks begin to demonstrate more human-like reasoning after scaling up standard architectures in both model size and amount of training data. In this study, we revisit empirical results in cognitive science on geometric visual processing and identify three key biases in geometric visual processing: a sensitivity towards complexity, regularity, and the perception of parts and relations. We test tasks from the literature that probe these biases in humans and find that large pre-trained neural network models used in AI demonstrate more human-like abstract geometric processing.

翻訳日:2024-02-07 13:40:04 公開日:2024-02-06

# 情報システムとソフトウェア工学:コンバージェンスの場合

Information Systems and Software Engineering: The Case for Convergence ( http://arxiv.org/abs/2402.04200v1 )

ライセンス: Link先を確認

Brian Fitzgerald

(参考訳) 情報システム(IS)とソフトウェア工学(SE)の分野は、これまでの歴史的発展において驚くほど多くの類似点を共有している。これらの類似点を以下に簡単に概説する。両分野の主要ジャーナルにおける10年間(2001-2010年)の出版物の分析からも、研究トピックにかなりの重複があることがわかる。若い学問分野として両者が直面する課題を考えると、両分野がこれまで以上に密接に交流することで得られるものは大きい可能性がある。本稿は、そのような交流を促し、それがデザインの領域でどのように有益に行われうるかを示す。最後に、ISとSEの分野間の交流を刺激し促進しうるいくつかの実践的な取り組みを提案する。

The Information Systems (IS) and Software Engineering (SE) fields share a remarkable number of similarities in their historical evolution to date. These similarities are briefly outlined below. An analysis of 10 years (2001-2010) of publications in the primary journals in both fields also reveals a good deal of overlap in research topics. Given the challenges faced by both as young disciplines, there is potentially much to gain from a closer interaction between both fields than has traditionally been the case. This article seeks to encourage such interaction, and illustrates how this might usefully occur in the area of design. It concludes by proposing a number of practical initiatives that could stimulate and facilitate interaction between the IS and SE fields.

翻訳日:2024-02-07 13:39:47 公開日:2024-02-06

# Instance by Instance: マルチインスタンス3D登録のための反復フレームワーク

Instance by Instance: An Iterative Framework for Multi-instance 3D Registration ( http://arxiv.org/abs/2402.04195v1 )

ライセンス: Link先を確認

Xinyue Cao, Xiyu Zhang, Yuxin Cheng, Zhaoshuai Qi, Yanning Zhang, Jiaqi Yang

(参考訳) マルチインスタンス登録は、コンピュータビジョンとロボティクスにおける難しい問題であり、ある物体の複数のインスタンスを標準座標系に登録する必要がある。本稿では,マルチインスタンス3D登録(MI-3DReg)のための,インスタンス・バイ・インスタンス(IBI)と呼ばれる最初の反復フレームワークを提案する。これは、与えられたシナリオ内のすべてのインスタンスを、最も簡単なものから始めて、より難しいものへと順に登録する。反復プロセスを通じて外れ値が継続的に除去され、残りのより困難なインスタンスのインライア率が増加する。IBIフレームワークの下で、ロバストなMI-3DRegを実現するために、疎から密への対応に基づくマルチインスタンス登録法(IBI-S2DC)をさらに提案する。合成および実データセットに対する実験は、IBIの有効性を実証し、IBI-S2DCが新たな最先端性能を持つことを示唆している。例えば、我々のMHF1は、合成/実データセットにおいて既存の最先端手法ECCよりも12.02%/12.35%高い。

Multi-instance registration is a challenging problem in computer vision and robotics, where multiple instances of an object need to be registered in a standard coordinate system. In this work, we propose the first iterative framework called instance-by-instance (IBI) for multi-instance 3D registration (MI-3DReg). It successively registers all instances in a given scenario, starting from the easiest and progressing to more challenging ones. Throughout the iterative process, outliers are eliminated continuously, leading to an increasing inlier rate for the remaining and more challenging instances. Under the IBI framework, we further propose a sparse-to-dense-correspondence-based multi-instance registration method (IBI-S2DC) to achieve robust MI-3DReg. Experiments on the synthetic and real datasets have demonstrated the effectiveness of IBI and suggested the new state-of-the-art performance of IBI-S2DC, e.g., our MHF1 is 12.02%/12.35% higher than the existing state-of-the-art method ECC on the synthetic/real datasets.

翻訳日:2024-02-07 13:39:37 公開日:2024-02-06

# ストラグラーを回避するための非集中型学習における勾配符号化

Gradient Coding in Decentralized Learning for Evading Stragglers ( http://arxiv.org/abs/2402.04193v1 )

ライセンス: Link先を確認

Chengxi Li and Mikael Skoglund

(参考訳) 本稿では,ストラグラーが存在する状況での非集中型(decentralized)学習問題を考える。デバイスが冗長なトレーニングデータを用いて符号化した勾配を送信する勾配符号化技術は、ストラグラーを回避するために分散(distributed)学習向けに開発されてきたが、それらの技術を非集中型学習のシナリオに直接適用することは困難である。この問題に対処するために,勾配符号化を用いた新しいゴシップベースの非集中型学習手法(GOCO)を提案する。提案手法では, ストラグラーの悪影響を避けるために, 確率的勾配符号化の枠組みに基づく符号化勾配を用いてパラメータベクトルを局所的に更新し, その後ゴシップ方式で平均化する。強凸損失関数に対するGOCOの収束性能を解析する。また,ベースライン手法と比較して提案手法が学習性能の面で優れていることを示すシミュレーション結果を提供する。

In this paper, we consider a decentralized learning problem in the presence of stragglers. Although gradient coding techniques have been developed for distributed learning to evade stragglers, where the devices send encoded gradients with redundant training data, it is difficult to apply those techniques directly to decentralized learning scenarios. To deal with this problem, we propose a new gossip-based decentralized learning method with gradient coding (GOCO). In the proposed method, to avoid the negative impact of stragglers, the parameter vectors are updated locally using encoded gradients based on the framework of stochastic gradient coding and then averaged in a gossip-based manner. We analyze the convergence performance of GOCO for strongly convex loss functions. We also provide simulation results to demonstrate the superiority of the proposed method in terms of learning performance compared with the baseline methods.

翻訳日:2024-02-07 13:39:17 公開日:2024-02-06
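
The "update locally, then average in a gossip-based manner" loop described in the abstract above can be illustrated with a toy example. This sketch is not GOCO: it deliberately omits the gradient coding itself (no redundant data, no encoded gradients) and only shows gossip averaging on a ring in which a straggling node skips its local step. The ring topology, the equal 1/3 mixing weights, the scalar quadratic loss, and all function names are assumptions made for illustration.

```python
import random


def ring_gossip(params):
    """One gossip round on a ring: each node averages itself with its two neighbours."""
    n = len(params)
    return [
        (params[(i - 1) % n] + params[i] + params[(i + 1) % n]) / 3.0
        for i in range(n)
    ]


def local_step(params, targets, lr, straggle_prob, rng):
    """Gradient step on the local loss 0.5 * (x - t)^2; a straggler skips its update."""
    updated = []
    for x, t in zip(params, targets):
        if rng.random() < straggle_prob:
            updated.append(x)  # straggler: no local update this round
        else:
            updated.append(x - lr * (x - t))
    return updated


def train(targets, rounds=200, lr=0.1, straggle_prob=0.3, seed=0):
    """Alternate local updates and gossip averaging, starting from zero."""
    rng = random.Random(seed)
    params = [0.0] * len(targets)
    for _ in range(rounds):
        params = local_step(params, targets, lr, straggle_prob, rng)
        params = ring_gossip(params)
    return params


if __name__ == "__main__":
    print(train([1.0, 2.0, 3.0, 4.0]))
```

Because the mixing weights are symmetric and each node's weights sum to one, a gossip round leaves the network-wide average unchanged while shrinking the disagreement between nodes. In this toy version a straggler simply contributes nothing in that round, whereas the method in the abstract uses encoded gradients built from redundant data to counter that loss.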

# オープンソースプロジェクトにおけるインシビリティ(非礼): ロックされたGitHubイシュースレッドの包括的な注釈付きデータセット

Incivility in Open Source Projects: A Comprehensive Annotated Dataset of Locked GitHub Issue Threads ( http://arxiv.org/abs/2402.04183v1 )

ライセンス: Link先を確認

Ramtin Ehsani, Mia Mohammad Imran, Robert Zita, Kostadin Damevski, Preetha Chatterjee

(参考訳) オープンソースソフトウェア(OSS)開発のダイナミックな環境では、イシューの議論におけるインシビリティ(非礼な言動)を理解し対処することが、健全で生産的なコラボレーションを促進する上で不可欠である。本稿では、213のOSSプロジェクトから収集された、404件のロックされたGitHubイシューディスカッションスレッドと5961件の個別コメントからなるキュレート済みデータセットを提示する。我々は、Tone Bearing Discussion Features(TBDF)を用いてコメントに様々なカテゴリのインシビリティを注釈付けし、各イシュースレッドに対してインシビリティのトリガー、ターゲット、結果を注釈付けした。我々のデータセットでは、Bitter frustration(苦いフラストレーション)、Impatience(いらだち)、Mocking(嘲笑)が最も多く見られるTBDFであった。インシビリティの最も一般的なトリガー、ターゲット、結果は、それぞれ「ツール/コードの使用の失敗またはエラーメッセージ」、「人」、「以降の議論の打ち切り」である。このデータセットは、OSSにおけるインシビリティを分析し、そのような振る舞いを検出し緩和する自動化ツールを改善するための貴重なリソースとなりうる。

In the dynamic landscape of open source software (OSS) development, understanding and addressing incivility within issue discussions is crucial for fostering healthy and productive collaborations. This paper presents a curated dataset of 404 locked GitHub issue discussion threads and 5961 individual comments, collected from 213 OSS projects. We annotated the comments with various categories of incivility using Tone Bearing Discussion Features (TBDFs), and, for each issue thread, we annotated the triggers, targets, and consequences of incivility. We observed that Bitter frustration, Impatience, and Mocking are the most prevalent TBDFs exhibited in our dataset. The most common triggers, targets, and consequences of incivility include Failed use of tool/code or error messages, People, and Discontinued further discussion, respectively. This dataset can serve as a valuable resource for analyzing incivility in OSS and improving automated tools to detect and mitigate such behavior.

翻訳日:2024-02-07 13:39:02 公開日:2024-02-06

# アンサンブルモデル予測安全認定による強化学習

Reinforcement Learning with Ensemble Model Predictive Safety Certification ( http://arxiv.org/abs/2402.04182v1 )

ライセンス: Link先を確認

Sven Gronauer, Tom Haider, Felippe Schmoeller da Roza, Klaus Diepold

(参考訳) 強化学習アルゴリズムは学習するために探索を必要とする。しかし、教師なしの探索は、安全クリティカルなタスクへのそのようなアルゴリズムの展開を妨げ、現実世界の配備を制限する。本稿では,モデルベース深部強化学習とチューブベースモデル予測制御を組み合わせることで,学習エージェントが行う動作を補正し,安全制約違反を最小限に抑える,Ensemble Model Predictive Safety Certificationというアルゴリズムを提案する。本手法は,セーフコントローラが生成するオフラインデータのみを必要とすることで,実際のシステムに関する事前知識の量を削減することを目的とする。その結果,強化学習法に比べて制約違反が有意に少ないことがわかった。

Reinforcement learning algorithms need exploration to learn. However, unsupervised exploration prevents the deployment of such algorithms on safety-critical tasks and limits real-world deployment. In this paper, we propose a new algorithm called Ensemble Model Predictive Safety Certification that combines model-based deep reinforcement learning with tube-based model predictive control to correct the actions taken by a learning agent, keeping safety constraint violations at a minimum through planning. Our approach aims to reduce the amount of prior knowledge about the actual system by requiring only offline data generated by a safe controller. Our results show that we can achieve significantly fewer constraint violations than comparable reinforcement learning methods.

翻訳日:2024-02-07 13:38:42 公開日:2024-02-06

# AnyTool: 大規模API呼び出しのための自己反省型・階層型エージェント

AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls ( http://arxiv.org/abs/2402.04253v1 )

ライセンス: Link先を確認

Yu Du, Fangyun Wei, Hongyang Zhang

(参考訳) 我々はAnyToolを紹介する。AnyToolは、ユーザクエリに対処する際の膨大なツール群の活用に革命をもたらすように設計された大規模言語モデルエージェントである。Rapid APIから16,000以上のAPIを利用し、これらのAPIのサブセットがクエリを解決できるという仮定の下で動作する。AnyToolは主に3つの要素からなる。階層構造を持つAPIレトリバー、選択されたAPI候補のセットを使用してユーザクエリの解決を目指すソルバー、そして初期の解が実行不可能であると判明した場合にAnyToolを再起動する自己反省(self-reflection)機構である。AnyToolはGPT-4の関数呼び出し機能を利用しており、外部モジュールを訓練する必要がない。また,先行研究によって導入された評価プロトコルを再検討し,人為的に高いパスレートにつながるこのプロトコルの制限を特定する。実用的なアプリケーションシナリオをよりよく反映するように評価プロトコルを改訂することで、AnyToolBenchと呼ばれる追加のベンチマークを導入する。さまざまなデータセットに対する実験は、ToolLLMやツール利用向けに調整されたGPT-4の変種といった強力なベースラインに対するAnyToolの優位性を示している。例えば、AnyToolはToolBenchの平均パスレートでToolLLMを35.4%上回る。コードは https://github.com/dyabel/AnyTool で公開される予定である。

We introduce AnyTool, a large language model agent designed to revolutionize the utilization of a vast array of tools in addressing user queries. We utilize over 16,000 APIs from Rapid API, operating under the assumption that a subset of these APIs could potentially resolve the queries. AnyTool primarily incorporates three elements: an API retriever with a hierarchical structure, a solver aimed at resolving user queries using a selected set of API candidates, and a self-reflection mechanism, which re-activates AnyTool if the initial solution proves impracticable. AnyTool is powered by the function calling feature of GPT-4, eliminating the need for training external modules. We also revisit the evaluation protocol introduced by previous works and identify a limitation in this protocol that leads to an artificially high pass rate. By revising the evaluation protocol to better reflect practical application scenarios, we introduce an additional benchmark, termed AnyToolBench. Experiments across various datasets demonstrate the superiority of our AnyTool over strong baselines such as ToolLLM and a GPT-4 variant tailored for tool utilization. For instance, AnyTool outperforms ToolLLM by +35.4% in terms of average pass rate on ToolBench. Code will be available at https://github.com/dyabel/AnyTool.

翻訳日:2024-02-07 13:31:58 公開日:2024-02-06

# EVA-CLIP-18B: CLIPを180億パラメータに拡張

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters ( http://arxiv.org/abs/2402.04252v1 )

ライセンス: Link先を確認

Quan Sun, Jinsheng Wang, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Xinlong Wang

(参考訳) 対照的言語画像事前学習(CLIP)のスケールアップは、視覚モデルとマルチモーダルモデルの両方を強化する上で不可欠である。本稿では、180億のパラメータを持ち、現在までに最大かつ最強のオープンソースCLIPモデルであるEVA-CLIP-18Bを提示する。EVA-CLIP-18Bは、わずか60億のトレーニングサンプルを見ただけで、広く認知されている27の画像分類ベンチマークの平均で80.7%という卓越したゼロショットtop-1精度を達成し、前身のEVA-CLIP(50億パラメータ)や他のオープンソースCLIPモデルを大きく上回る。注目すべきことに、LAION-2BとCOYO-700Mからの20億の画像テキストペアというトレーニングデータセットを一定に保ったにもかかわらず、EVA-CLIPのモデルサイズのスケーリングに伴う一貫した性能向上が観察された。このデータセットは公開されており、他の最先端のCLIPモデルで使用される社内データセット(DFN-5B、WebLI-10Bなど)よりもはるかに小さい。EVA-CLIP-18Bは、EVAスタイルの弱から強への(weak-to-strong)視覚モデルスケーリングの可能性を示している。モデルの重みを公開することで、視覚モデルとマルチモーダル基盤モデルの今後の研究を促進することを願っている。

Scaling up contrastive language-image pretraining (CLIP) is critical for empowering both vision and multimodal models. We present EVA-CLIP-18B, the largest and most powerful open-source CLIP model to date, with 18-billion parameters. With only 6-billion training samples seen, EVA-CLIP-18B achieves an exceptional 80.7% zero-shot top-1 accuracy averaged across 27 widely recognized image classification benchmarks, outperforming its forerunner EVA-CLIP (5-billion parameters) and other open-source CLIP models by a large margin. Remarkably, we observe a consistent performance improvement with the model size scaling of EVA-CLIP, despite maintaining a constant training dataset of 2-billion image-text pairs from LAION-2B and COYO-700M. This dataset is openly available and much smaller than the in-house datasets (e.g., DFN-5B, WebLI-10B) employed in other state-of-the-art CLIP models. EVA-CLIP-18B demonstrates the potential of EVA-style weak-to-strong visual model scaling. With our model weights made publicly available, we hope to facilitate future research in vision and multimodal foundation models.

翻訳日:2024-02-07 13:31:37 公開日:2024-02-06

# 参照集約による線形時間最小ベイズリスクデコード

Linear-time Minimum Bayes Risk Decoding with Reference Aggregation ( http://arxiv.org/abs/2402.04251v1 )

ライセンス: Link先を確認

Jannis Vamvas and Rico Sennrich

(参考訳) 最小ベイズリスク(MBR)復号は、機械翻訳の品質を向上させることが示されているテキスト生成手法であるが、サンプリングベースの近似を用いても計算コストが高い。多数のサンプル系列を必要とするだけでなく、効用指標のペアワイズ計算が必要であり、これは2次の計算量を持つ。本稿では,集約された参照表現に対して計算したスコアを用いて,ペアワイズの指標スコアを近似することを提案する。これにより、効用推定の計算量は$O(n^2)$から$O(n)$に変わり、MBR復号による品質向上の大部分が経験的に保たれる。ソースコードは https://github.com/ZurichNLP/mbr で公開している。

Minimum Bayes Risk (MBR) decoding is a text generation technique that has been shown to improve the quality of machine translations, but is expensive, even if a sampling-based approximation is used. Besides requiring a large number of sampled sequences, it requires the pairwise calculation of a utility metric, which has quadratic complexity. In this paper, we propose to approximate pairwise metric scores with scores calculated against aggregated reference representations. This changes the complexity of utility estimation from $O(n^2)$ to $O(n)$, while empirically preserving most of the quality gains of MBR decoding. We release our source code at https://github.com/ZurichNLP/mbr

翻訳日:2024-02-07 13:31:09 公開日:2024-02-06
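
The step from $O(n^2)$ to $O(n)$ utility evaluations described in the abstract above can be sketched on a toy example. The code below is an illustration only, not the authors' implementation: it swaps in a simple bag-of-words overlap F1 as the utility metric, averages the word counts of all samples into one aggregate reference, and uses hypothetical function names throughout.

```python
from collections import Counter


def bag(text):
    """Lower-cased bag-of-words counts for one candidate."""
    return Counter(text.lower().split())


def overlap_f1(hyp, ref):
    """F1 between two bag-of-words count maps (reference counts may be fractional)."""
    match = sum(min(count, ref.get(word, 0.0)) for word, count in hyp.items())
    hyp_total = sum(hyp.values())
    ref_total = sum(ref.values())
    if match == 0 or hyp_total == 0 or ref_total == 0:
        return 0.0
    precision = match / hyp_total
    recall = match / ref_total
    return 2 * precision * recall / (precision + recall)


def mbr_pairwise(samples):
    """Standard sampling-based MBR: n * n utility calls, samples double as references."""
    bags = [bag(s) for s in samples]
    scores = [sum(overlap_f1(h, r) for r in bags) / len(bags) for h in bags]
    return max(range(len(samples)), key=scores.__getitem__)


def mbr_aggregate(samples):
    """Reference aggregation: average the reference counts once, then n utility calls."""
    bags = [bag(s) for s in samples]
    aggregate = Counter()
    for b in bags:
        for word, count in b.items():
            aggregate[word] += count / len(bags)
    scores = [overlap_f1(h, aggregate) for h in bags]
    return max(range(len(samples)), key=scores.__getitem__)


if __name__ == "__main__":
    samples = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "a dog barked loudly",
        "the cat sat on the mat today",
    ]
    print(mbr_pairwise(samples), mbr_aggregate(samples))
```

Both functions return the index of the selected sample. The aggregate score is an approximation of the pairwise average, so the two selections are not guaranteed to coincide for every input or metric.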

# HarmBench: 自動レッドチーミングとロバストな拒否のための標準化された評価フレームワーク

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ( http://arxiv.org/abs/2402.04249v1 )

ライセンス: Link先を確認

Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks

(参考訳) 自動レッドチーミングは、大規模言語モデル(LLM)の悪用に伴うリスクを発見し緩和する上で大いに有望であるが、この分野には新しい手法を厳密に評価するための標準化された評価フレームワークが欠けている。この問題に対処するために、自動レッドチーミングのための標準化された評価フレームワークであるHarmBenchを紹介する。レッドチーミングの評価でこれまで考慮されていなかったいくつかの望ましい特性を特定し、これらの基準を満たすようにHarmBenchを体系的に設計する。HarmBenchを用いて、18のレッドチーミング手法と33の対象LLMおよび防御手法の大規模比較を行い、新たな知見を得た。また,幅広い攻撃に対するLLMのロバスト性を大幅に向上させる高効率な敵対的訓練手法を導入し,HarmBenchが攻撃と防御の共同開発をどのように可能にするかを示す。HarmBenchは https://github.com/centerforaisafety/HarmBench でオープンソースとして公開している。

Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. We identify several desirable properties previously unaccounted for in red teaming evaluations and systematically design HarmBench to meet these criteria. Using HarmBench, we conduct a large-scale comparison of 18 red teaming methods and 33 target LLMs and defenses, yielding novel insights. We also introduce a highly efficient adversarial training method that greatly enhances LLM robustness across a wide range of attacks, demonstrating how HarmBench enables codevelopment of attacks and defenses. We open source HarmBench at https://github.com/centerforaisafety/HarmBench.

翻訳日:2024-02-07 13:30:56 公開日:2024-02-06

# Mambaは学習方法を学ぶことができるか? 文脈内学習課題の比較研究

Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks ( http://arxiv.org/abs/2402.04248v1 )

ライセンス: Link先を確認

Jongho Park, Jaeseung Park, Zheyang Xiong, Nayoung Lee, Jaewoong Cho, Samet Oymak, Kangwook Lee, Dimitris Papailiopoulos

(参考訳) Mamba (Gu & Dao, 2023) のような状態空間モデル(SSM)は、ゲーティング、畳み込み、入力依存のトークン選択を取り入れてマルチヘッドアテンションの2次コストを緩和することで、言語モデリングにおけるTransformerネットワークの代替として提案されている。SSMは競争力のある性能を示すが、パラメータ最適化なしでタスクの実行を可能にする現代の言語モデルの注目すべき創発的特性である文脈内学習(ICL)の能力は、Transformerに比べて十分に調べられていない。本研究では,Mambaに焦点を当ててSSMのICL性能を様々なタスクでTransformerモデルと比較評価する。その結果、SSMは標準的な回帰ICLタスクにおいてTransformerと同等の性能を示し、スパースパリティ学習のようなタスクではTransformerを上回った。しかし、SSMは非標準的な検索機能を含むタスクでは及ばない。これらの制約に対処するために,Mambaとアテンションブロックを組み合わせたハイブリッドモデルMambaFormerを導入し,個々のモデルがそれぞれ単独では苦戦するタスクにおいて両者を上回ることを示す。この結果は,ハイブリッドアーキテクチャが言語モデルのICLを強化する有望な道筋であることを示唆している。

State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention. Although SSMs exhibit competitive performance, their in-context learning (ICL) capabilities, a remarkable emergent property of modern language models that enables task execution without parameter optimization, remain underexplored compared to Transformers. In this study, we evaluate the ICL performance of SSMs, focusing on Mamba, against Transformer models across various tasks. Our results show that SSMs perform comparably to Transformers in standard regression ICL tasks, while outperforming them in tasks like sparse parity learning. However, SSMs fall short in tasks involving non-standard retrieval functionality. To address these limitations, we introduce a hybrid model, MambaFormer, that combines Mamba with attention blocks, surpassing individual models in tasks where they struggle independently. Our findings suggest that hybrid architectures offer promising avenues for enhancing ICL in language models.

翻訳日:2024-02-07 13:30:38 公開日:2024-02-06

# 自律性よりも安全を優先する:科学におけるLLMエージェントのリスク

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science ( http://arxiv.org/abs/2402.04247v1 )

ライセンス: Link先を確認

Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein

(参考訳) 大規模言語モデル(llm)を用いた知的エージェントは、自律的に実験を行い、様々な分野にわたる科学的発見を促進することに有望である。彼らの能力は有望だが、安全を慎重に考慮する必要がある新たな脆弱性も導入している。しかし、これらの脆弱性の包括的な調査は行われていないため、文献に顕著なギャップがある。本報告では,科学領域におけるllmベースのエージェントの脆弱性を徹底的に検証し,その悪用に伴う潜在的なリスクを明らかにし,安全対策の必要性を強調することで,このギャップを埋める。まず、ユーザ意図、特定の科学的領域、およびそれらが外部環境に与える影響を考慮し、科学的LLMエージェントに固有の潜在的なリスクを概観することから始める。そして、これらの脆弱性の起源を調べ、制限された既存の作業のスコーピングレビューを提供します。そこで本研究では,人間による規制,エージェント・アライメント,環境フィードバック(エージェント・レギュレーション)の理解を含む三進フレームワークを提案する。さらに,これらの問題を効果的に解決するための改良されたモデル,堅牢なベンチマーク,包括的な規制の開発を提唱する科学エージェントの保護に関連する限界と課題を強調した。

Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines. While their capabilities are promising, they also introduce novel vulnerabilities that demand careful consideration for safety. However, there exists a notable gap in the literature, as there has been no comprehensive exploration of these vulnerabilities. This position paper fills this gap by conducting a thorough examination of vulnerabilities in LLM-based agents within scientific domains, shedding light on potential risks associated with their misuse and emphasizing the need for safety measures. We begin by providing a comprehensive overview of the potential risks inherent to scientific LLM agents, taking into account user intent, the specific scientific domain, and their potential impact on the external environment. Then, we delve into the origins of these vulnerabilities and provide a scoping review of the limited existing works. Based on our analysis, we propose a triadic framework involving human regulation, agent alignment, and an understanding of environmental feedback (agent regulation) to mitigate these identified risks. Furthermore, we highlight the limitations and challenges associated with safeguarding scientific agents and advocate for the development of improved models, robust benchmarks, and comprehensive regulations to address these issues effectively.

翻訳日:2024-02-07 13:30:15 公開日:2024-02-06

# カシミールポラリトンによる超振動遷移の理論

Theory of Supervibronic Transitions via Casimir Polaritons ( http://arxiv.org/abs/2402.04246v1 )

ライセンス: Link先を確認

Tao E. Li

(参考訳) 電子から振動の自由度への遠隔エネルギー移動経路は、振動の強い結合条件下で赤外線光学キャビティ内で同定される。このメカニズムは動的カシミール効果に依存しており、分子の突然の電子遷移によって真の赤外線光子が生成される。さらに、振動ポラリトンの形成により、励起された光子エネルギーは、散逸が起こる前に振動の自由度に移される。解析解と数値シミュレーションの両方で、この電子と振動のエネルギー移動の大きさは分子の数に二乗依存し、振動キャビティデチューニングに共鳴することがわかった。この「超振動」遷移過程において、分子当たりの振動エネルギーの利得がマクロ的な限界において意味を持つため、この過程は室温での従来の振動強い結合装置を用いて観察される可能性がある。

A remote energy transfer pathway from electronic to vibrational degrees of freedom is identified inside an infrared optical cavity under vibrational strong coupling conditions. This mechanism relies on the dynamical Casimir effect, whereby real infrared photons are generated due to a sudden electronic transition of molecules. Moreover, the formation of vibrational polaritons enables the excited photon energy to be transferred to the vibrational degrees of freedom before any dissipation occurs. Both analytic solutions and numerical simulations reveal that the magnitude of this electronic to vibrational energy transfer depends quadratically on the number of molecules and resonantly on the vibration-cavity detuning. During this "supervibronic" transition process, because the vibrational energy gain per molecule can be meaningful in the macroscopic limit, this process may potentially be observed using conventional vibrational strong coupling devices at room temperature.

翻訳日:2024-02-07 13:29:52 公開日:2024-02-06

# 超伝導量子ビットアレイにおける宇宙線誘起相関誤差の直接的証拠

Direct evidence for cosmic-ray-induced correlated errors in superconducting qubit array ( http://arxiv.org/abs/2402.04245v1 )

ライセンス: Link先を確認

Xue-Gang Li, Jun-Hua Wang, Yao-Yao Jiang, Guang-Ming Xue, Xiao-Xia Cai, Jun Zhou, Ming Gong, Zhao-Feng Liu, Shuang-Yu Zheng, Deng-Ke Ma, Mo Chen, Wei-Jie Sun, Shuang Yang, Fei Yan, Yi-Rong Jin, Xue-Feng Ding and Hai-Feng Yu

(参考訳) 相関誤差は量子誤差補正に大きな影響を与え、空間と時間の両方において異なる量子ビットでエラーが発生するという仮定に挑戦する。超伝導量子ビットは複数の量子ビットにまたがる相関誤差に悩まされ、これは電離放射線や宇宙線に起因する可能性がある。しかしながら、この関係に関する直接的な証拠と定量的な理解は、現在不足している。本研究では,マルチキュービット同時充電パリティジャンプを連続的に監視し,相関誤差を検出し,マルチキュービット同時ビットフリップよりも頻度の高い値を求める。次に, 希釈冷凍機において試料箱直下に2つの宇宙線ミューオン検出器を配置し, ミューオンによって引き起こされた超伝導キュービットアレイ内の相関誤差を良好に観測する。また,冷蔵庫に鉛遮蔽層を導入することで,他の相関誤差のほとんどがガンマ線によって引き起こされることを明らかにした。さらに, クビット中の準粒子の組換え速度が高い超伝導膜は, 相関誤差の持続時間を削減するのに有効であることがわかった。本研究は,ガンマ線とミューオンが超伝導量子計算に与える影響を実験的に証明し,量子誤差補正のための緩和戦略に関する実践的知見を提供する。さらに,我々のプロセッサにおけるミューオン誘起相関誤差の平均発生率は約0.40 min$^{-1}$cm$^{-2}$であり,0.506 min$^{-1}$cm$^{-2}$のミューオン検出器で検出されたミューオン事象率に匹敵する。これは、超伝導量子ビットアレイを高エネルギー物理学の分野における低エネルギーしきい値センサとしての可能性を示す。

Correlated errors can significantly impact quantum error correction, as they challenge the assumption that errors occur in different qubits independently in both space and time. Superconducting qubits have been found to suffer correlated errors across multiple qubits, which could be attributable to ionizing radiation and cosmic rays. Nevertheless, direct evidence and a quantitative understanding of this relationship are currently lacking. In this work, we propose to continuously monitor multi-qubit simultaneous charge-parity jumps to detect correlated errors and find that they occur more frequently than multi-qubit simultaneous bit flips. Then, we propose to position two cosmic-ray muon detectors directly beneath the sample box in a dilution refrigerator and successfully observe the correlated errors in a superconducting qubit array triggered by muons. By introducing a lead shielding layer on the refrigerator, we also reveal that the majority of other correlated errors are primarily induced by gamma rays. Furthermore, we find that using a superconducting film with a higher quasiparticle recombination rate in the qubits helps reduce the duration of correlated errors. Our results provide experimental evidence of the impact of gamma rays and muons on superconducting quantum computation and offer practical insights into mitigation strategies for quantum error correction. In addition, we observe that the average occurrence rate of muon-induced correlated errors in our processor is approximately 0.40 min$^{-1}$cm$^{-2}$, which is comparable to the muon event rate of 0.506 min$^{-1}$cm$^{-2}$ detected by the muon detector. This demonstrates the potential applications of superconducting qubit arrays as low-energy threshold sensors in the field of high-energy physics.
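
To make the word "comparable" concrete, the short calculation below takes the two rates quoted in the abstract and computes their ratio. Only the two numbers come from the abstract; the variable names are illustrative, and nothing here reproduces the authors' analysis.

```python
# Rates quoted in the abstract, both in events per minute per square centimetre.
qubit_array_rate = 0.40      # muon-induced correlated errors seen by the processor
muon_detector_rate = 0.506   # muon events seen by the dedicated detector

# Fraction of the detector's muon rate that shows up as correlated errors.
ratio = qubit_array_rate / muon_detector_rate
print(f"qubit array / detector rate = {ratio:.2f}")
```

Under these figures the qubit array registers roughly four fifths of the rate seen by the detector, which is the sense in which the two are comparable.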

翻訳日:2024-02-07 13:29:38 公開日:2024-02-06

# CAST: 効率的なトランスフォーマーのためのサロゲートトークンを用いたクラスタリング自己注意

CAST: Clustering Self-Attention using Surrogate Tokens for Efficient Transformers ( http://arxiv.org/abs/2402.04239v1 )

ライセンス: Link先を確認

Adjorn van Engelenhoven, Nicola Strisciuglio, Estefanía Talavera

(参考訳) Transformerアーキテクチャは、幅広いタスクのための強力なツールであることが示されている。メモリ使用量と計算時間は入力シーケンスの長さと2乗的に増加するため、トランスフォーマーの適用が制限される。本研究では,注目計算を最適化し,効率的なトランスフォーマーを実現するために,サロゲートトークン(cast)を用いたクラスタリング方式を提案する。 CASTは学習可能なサロゲートトークンを使用してクラスタ親和性行列を構築し、入力シーケンスをクラスタ化し、新しいクラスタ要約を生成する。各クラスタ内のセルフアテンションは、他のクラスタのクラスタサマリーと結合され、入力シーケンス全体にわたって情報フローを可能にする。 CASTは、複雑性を$O(N^2)$から$O(\alpha N)$に減らして効率を向上する。 castは長距離シーケンスモデリングタスクにおけるベースライントランスフォーマーよりも性能が優れ、また他の効率的なトランスフォーマーよりも時間とメモリ効率が向上することを示した。

The Transformer architecture has been shown to be a powerful tool for a wide range of tasks. It is based on the self-attention mechanism, which is an inherently computationally expensive operation with quadratic computational complexity: memory usage and compute time increase quadratically with the length of the input sequences, thus limiting the application of Transformers. In this work, we propose a novel Clustering self-Attention mechanism using Surrogate Tokens (CAST), to optimize the attention computation and achieve efficient transformers. CAST utilizes learnable surrogate tokens to construct a cluster affinity matrix, used to cluster the input sequence and generate novel cluster summaries. The self-attention from within each cluster is then combined with the cluster summaries of other clusters, enabling information flow across the entire input sequence. CAST improves efficiency by reducing the complexity from $O(N^2)$ to $O(\alpha N)$, where $N$ is the sequence length and $\alpha$ is a constant determined by the number of clusters and samples per cluster. We show that CAST performs better than or comparable to the baseline Transformers on long-range sequence modeling tasks, while also achieving better time and memory efficiency than other efficient transformers.
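
The complexity claim above can be illustrated with a rough count of query-key pairs. This is a sketch under a stated assumption: each token attends to the tokens in its own cluster plus one summary per cluster, so the per-token cost is `cluster_size + num_clusters`. That reading of $\alpha$, and the function names, are assumptions made for illustration; the abstract only says that $\alpha$ depends on the number of clusters and the samples per cluster.

```python
def full_attention_pairs(n):
    """Query-key pairs scored by standard self-attention on n tokens."""
    return n * n


def clustered_attention_pairs(n, num_clusters, cluster_size):
    """Query-key pairs under the ASSUMED cost model: every token attends to
    cluster_size tokens in its own cluster and to num_clusters summaries."""
    return n * (cluster_size + num_clusters)


n = 4096
full = full_attention_pairs(n)
clustered = clustered_attention_pairs(n, num_clusters=16, cluster_size=256)
print(full, clustered, full / clustered)
```

With these illustrative sizes the clustered count is about fifteen times smaller than the full count, and it grows linearly in `n` as long as the cluster count and cluster size are held fixed.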

翻訳日:2024-02-07 13:29:10 公開日:2024-02-06

# 可変カプラを用いたパラメトリック共振エンタングリングゲートの誤差バジェット

Error budget of parametric resonance entangling gate with a tunable coupler ( http://arxiv.org/abs/2402.04238v1 )

ライセンス: Link先を確認

Eyob A. Sete, Vinay Tripathi, Joseph A. Valery, Daniel Lidar, and Josh Y. Mutus

(参考訳) 波長可変カプラアーキテクチャにおけるパラメトリック共振ゲートの実験的な誤差バジェットを解析する。インコヒーレント誤差、リーク誤差、振幅誤差、位相誤差など、様々な誤差源を特定し、特徴づける。2量子ビットゲート時間を変化させることで、これらの誤差のダイナミクスとゲート忠実度への影響を調べる。インコヒーレント誤差がゲート忠実度に与える影響を正確に捉えるために、ゲート動作条件下での量子ビットのコヒーレンス時間を測定する。その結果、主に量子ビットの緩和と白色雑音による位相緩和(dephasing)に起因するインコヒーレント誤差が、2量子ビットゲートの忠実度を制限していることがわかった。さらに、リークランダム化ベンチマーキングによる評価から、非計算状態へのリークが2量子ビットゲートの非忠実度に対する2番目に大きな寄与であることを示す。ここで開発した誤差バジェット手法は、他のタイプのゲート実装にも効果的に適用できる。

We analyze the experimental error budget of parametric resonance gates in a tunable coupler architecture. We identify and characterize various sources of errors, including incoherent, leakage, amplitude, and phase errors. By varying the two-qubit gate time, we explore the dynamics of these errors and their impact on the gate fidelity. To accurately capture the impact of incoherent errors on gate fidelity, we measure the coherence times of qubits under gate operating conditions. Our findings reveal that the incoherent errors, mainly arising from qubit relaxation and dephasing due to white noise, limit the fidelity of the two-qubit gates. Moreover, we demonstrate that leakage to noncomputational states is the second largest contributor to the two-qubit gate infidelity, as characterized using leakage-randomized benchmarking. The error budgeting methodology we developed here can be effectively applied to other types of gate implementations.

翻訳日:2024-02-07 13:28:46 公開日:2024-02-06

# CogCoM: 操作の連鎖を通じて細部に踏み込む大規模視覚言語モデルの訓練

CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations ( http://arxiv.org/abs/2402.04236v1 )

ライセンス: Link先を確認

Ji Qi, Ming Ding, Weihan Wang, Yushi Bai, Qingsong Lv, Wenyi Hong, Bin Xu, Lei Hou, Juanzi Li, Yuxiao Dong, Jie Tang

(参考訳) VLM(Vision-Language Models)は、視覚的な指示を回答に整合させる広範なトレーニングによって、幅広い実用性を示してきた。しかし、この結論志向のアライメントにより、モデルは重要な視覚的推論を無視し、細部の把握を要する視覚問題での失敗や忠実でない応答につながる。本稿では、VLMが一連の操作によって問題を解くことを可能にするメカニズムであるChain of Manipulationsを提案する。各操作は、事前のトレーニングによって獲得した内在的能力(例えばグラウンディング)や、人間のような行動(例えばズームイン)の模倣に基づく、視覚入力に対する操作を指す。このメカニズムは、VLMが根拠のある視覚的推論に基づいて忠実な応答を生成することを促し、解釈可能な経路上でユーザーがエラー原因を追跡できるようにする。そこで我々は、この推論メカニズムを備えた、メモリベースの互換アーキテクチャを持つ汎用17B VLMであるCogCoMをトレーニングする。実験の結果、本モデルは3つのカテゴリの8つのベンチマークで最先端の性能を達成し、またこのデータを用いた少数のトレーニングステップで競争力のある性能に素早く到達することが示された。コードとデータはhttps://github.com/THUDM/CogCoMで公開されている。

Vision-Language Models (VLMs) have demonstrated their widespread viability thanks to extensive training in aligning visual instructions to answers. However, this conclusive alignment leads models to ignore critical visual reasoning, and further results in failures on meticulous visual problems and unfaithful responses. In this paper, we propose Chain of Manipulations, a mechanism that enables VLMs to solve problems with a series of manipulations, where each manipulation refers to an operation on the visual input, either from intrinsic abilities (e.g., grounding) acquired through prior training or from imitating human-like behaviors (e.g., zoom in). This mechanism encourages VLMs to generate faithful responses with evidential visual reasoning, and permits users to trace error causes in the interpretable paths. We thus train CogCoM, a general 17B VLM with a memory-based compatible architecture endowed with this reasoning mechanism. Experiments show that our model achieves state-of-the-art performance across 8 benchmarks from 3 categories, and that a limited number of training steps on the data quickly yields competitive performance. The code and data are publicly available at https://github.com/THUDM/CogCoM.

翻訳日:2024-02-07 13:28:31 公開日:2024-02-06

# 生成エージェントは感情を予測できるか?

Can Generative Agents Predict Emotion? ( http://arxiv.org/abs/2402.04232v1 )

ライセンス: Link先を確認

Ciaran Regan, Nanami Iwahashi, Shogo Tanaka, Mizuki Oka

(参考訳) 大規模言語モデル(LLM)は多くの人間のような能力を示しているが、LLMの共感的理解と感情状態はまだ人間のそれと整合していない。本研究では、新しいイベントを知覚することで生成型LLMエージェントの感情状態がどのように変化するかを調査し、新しい体験を過去の記憶と比較する新しいアーキテクチャを導入する。この比較を通じて、エージェントは文脈の中で新しい体験を理解する能力を得る。これは感情の評価理論(appraisal theory)によれば感情の生成に不可欠である。まず、エージェントは新しい体験を時系列テキストデータとして知覚する。新しい入力を知覚するたびに、エージェントは関連する過去の記憶の要約(規範と呼ぶ)を生成し、新しい体験をこの規範と比較する。この比較を通じて、エージェントが文脈の中で新しい体験にどのように反応するかを分析できる。感情(affect)の検査であるPANASをエージェントに実施し、新しい出来事を知覚した後のエージェントの感情状態を捉える。最後に、新しい体験はエージェントの記憶に追加され、将来の規範の作成に使用される。感情を強く喚起する状況から自然言語で複数の体験を作成し、提案するアーキテクチャを幅広いシナリオでテストする。結果は一様ではなく、文脈の導入がエージェントの感情的アライメントを改善する場合もあるが、さらなる研究と人間の評価者との比較が必要であることが示唆された。本論文が生成エージェントのアライメントに向けた新たな一歩となることを願っている。

Large Language Models (LLMs) have demonstrated a number of human-like abilities; however, the empathic understanding and emotional state of LLMs are yet to be aligned with those of humans. In this work, we investigate how the emotional state of generative LLM agents evolves as they perceive new events, introducing a novel architecture in which new experiences are compared to past memories. Through this comparison, the agent gains the ability to understand new experiences in context, which according to the appraisal theory of emotion is vital in emotion creation. First, the agent perceives new experiences as time series text data. After perceiving each new input, the agent generates a summary of past relevant memories, referred to as the norm, and compares the new experience to this norm. Through this comparison we can analyse how the agent reacts to the new experience in context. The PANAS, a test of affect, is administered to the agent, capturing the emotional state of the agent after the perception of the new event. Finally, the new experience is added to the agent's memory to be used in the creation of future norms. By creating multiple experiences in natural language from emotionally charged situations, we test the proposed architecture on a wide range of scenarios. The mixed results suggest that introducing context can occasionally improve the emotional alignment of the agent, but further study and comparison with human evaluators is necessary. We hope that this paper is another step towards the alignment of generative agents.

翻訳日:2024-02-07 13:28:07 公開日:2024-02-06

# MusicRL: 音楽生成を人間の好みに合わせる

MusicRL: Aligning Music Generation to Human Preferences ( http://arxiv.org/abs/2402.04229v1 )

ライセンス: Link先を確認

Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour and Andrea Agostinelli

(参考訳) 人間のフィードバックによってファインチューニングされた最初の音楽生成システムであるMusicRLを提案する。テキストから音楽を生成するモデルの評価は特に主観的である。音楽性の概念やキャプションの背後にある具体的な意図はユーザーに依存するためである(例えば「アップビートなワークアウト音楽」のようなキャプションは、レトロなギターソロにもテクノポップのビートにも対応しうる)。このため、こうしたモデルの教師付きトレーニングが難しくなるだけでなく、デプロイ後のファインチューニングに継続的な人間のフィードバックを組み込むことも必要になる。MusicRLは、離散オーディオトークンを扱う事前訓練済みの自己回帰型MusicLM(Agostinelli et al., 2023)モデルを、シーケンスレベルの報酬を最大化するように強化学習でファインチューニングしたものである。我々は、選ばれた評価者の協力を得て、テキストへの忠実さ(text adherence)とオーディオ品質に特化した報酬関数を設計し、それらを用いてMusicLMをMusicRL-Rへとファインチューニングする。さらにMusicLMをユーザーに提供し、300,000対のペアワイズ選好からなる大規模なデータセットを収集する。Reinforcement Learning from Human Feedback (RLHF)を用いて、人間のフィードバックを大規模に取り入れた最初のテキスト・音楽モデルであるMusicRL-Uを訓練する。人間による評価では、MusicRL-RとMusicRL-Uの両方がベースラインより好まれた。最終的に、2つのアプローチを組み合わせたMusicRL-RUが、人間の評価者によれば最良のモデルとなる。アブレーション研究は、人間の選好に影響する音楽的属性に光を当て、テキストへの忠実さと品質はその一部しか説明しないことを示している。これは音楽鑑賞における主観性の大きさを浮き彫りにし、音楽生成モデルのファインチューニングに人間のリスナーをさらに関与させることを求めるものである。

We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are user-dependent (e.g. a caption such as "upbeat work-out music" can map to a retro guitar solo or a techno pop beat). Not only does this make supervised training of such models challenging, but it also calls for integrating continuous human feedback in their post-deployment finetuning. MusicRL is a pretrained autoregressive MusicLM (Agostinelli et al., 2023) model of discrete audio tokens finetuned with reinforcement learning to maximise sequence-level rewards. We design reward functions related specifically to text-adherence and audio quality with help from selected raters, and use those to finetune MusicLM into MusicRL-R. We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences. Using Reinforcement Learning from Human Feedback (RLHF), we train MusicRL-U, the first text-to-music model that incorporates human feedback at scale. Human evaluations show that both MusicRL-R and MusicRL-U are preferred to the baseline. Ultimately, MusicRL-RU combines the two approaches and results in the best model according to human raters. Ablation studies shed light on the musical attributes influencing human preferences, indicating that text adherence and quality only account for a part of it. This underscores the prevalence of subjectivity in musical appreciation and calls for further involvement of human listeners in the finetuning of music generation models.

翻訳日:2024-02-07 13:27:44 公開日:2024-02-06

# ニューロダイナミックモデルを用いた新しい魚型自己適応アプローチに基づく群ロボットの知的集団脱出

Intelligent Collective Escape of Swarm Robots Based on a Novel Fish-inspired Self-adaptive Approach with Neurodynamic Models ( http://arxiv.org/abs/2402.04228v1 )

ライセンス: Link先を確認

Junfei Li, Simon X. Yang

(参考訳) 魚群は、単純な個体間相互作用を通じて、集団移動や捕食者からの動的な逃避といった高効率な集団行動を示す。魚の群れ行動は、群ロボットの制御アーキテクチャを設計するうえで良い着想源となることが多い。本稿では、群ロボットの集団逃避のための、魚に着想を得た新しい自己適応手法を提案する。さらに、生物に着想を得たニューラルネットワーク(BINN)を導入し、引力と斥力の組み合わせによって衝突のない逃避軌道を生成する。また、動的環境に対処するため、変化する環境下での群ロボットの自己適応性能を向上させる神経力学ベースの自己適応機構を提案する。シミュレーションと実験の結果、魚の逃避行動と同様に、群ロボットが脅威から集団で離れられることが示された。いくつかの比較研究により、提案手法がシステム性能の有効性と効率、および複雑な環境における柔軟性と頑健性を大幅に改善できることが示された。

Fish schools exhibit highly efficient group behaviors, such as collective migration and dynamic escape from predators, through simple individual interactions. The schooling behavior of fish is often a good inspiration for designing control architectures for swarm robots. In this paper, a novel fish-inspired self-adaptive approach is proposed for the collective escape of swarm robots. In addition, a bio-inspired neural network (BINN) is introduced to generate collision-free escape trajectories for the robots through the combination of attractive and repulsive forces. Furthermore, to cope with dynamic environments, a neurodynamics-based self-adaptive mechanism is proposed to improve the self-adaptive performance of the swarm robots in changing environments. Simulation and experimental results show that, similar to fish escape maneuvers, the swarm robots are capable of collectively moving away from threats. Several comparison studies demonstrate that the proposed approach can significantly improve the effectiveness and efficiency of system performance, as well as the flexibility and robustness in complex environments.

翻訳日:2024-02-07 13:27:15 公開日:2024-02-06

# 最大絡み合い混合状態を含む絡み合い浄化の性能

Performance of entanglement purification including maximally entangled mixed states ( http://arxiv.org/abs/2402.04226v1 )

ライセンス: Link先を確認

Juan Mauricio Torres, József Zsolt Bernád, Rocío Gómez-Rosas

(参考訳) 遠方の量子システム間の絡み合いは、量子通信を実装するための重要な資源である。この性質は外部剤の影響を受け、効率的な絡み合い浄化プロトコルを用いて修復することができる。本研究では,通常のcnot(control-not)ゲートを置き換える2つの2量子ビット演算に基づく絡み合い浄化プロトコルを提案する。これらの演算は一般化された量子測度から生じ、正の演算子評価測度(POVM)における測度演算子として理解することができる。さらに、コアプロトコルの2つのバリエーションが導入され、特定のシナリオでより実用的なことが示されている。プロトコルの性能は、ベル状態に到達する全体的な成功確率と精製可能な状態の数の観点から研究されている。ランク2の状態に基づいて,最大絡み合う状態(MEMS)の場合に,数値計算を用いて拡張・洗練する成功確率の解析式を得ることができる。また,ベル対角線状態に基づく浄化プロトコルと比較して,手順が概ね便利であることを示すために,より一般的な3つの状態を考える。最後に、初期乱数状態を用いてプロトコルをテストする。いずれの場合も、CNOTベースの浄化プロトコルと比較して、我々のスキームを用いて、より大きな性能と大量の精製状態が見つかる。

Entanglement between distant quantum systems is a critical resource for implementing quantum communication. This property is affected by external agents and can be restored by employing efficient entanglement purification protocols. In this work, we propose an entanglement purification protocol based on two entangling two-qubit operations that replace the usual controlled-NOT (CNOT) gate. These operations arise from a generalized quantum measurement and can be understood as measurement operators in a positive operator-valued measure (POVM). Furthermore, two variants of the core protocol are introduced and shown to be more practical in certain scenarios. The performance of the protocols is studied in terms of the overall success probability of reaching a Bell state and the number of purifiable states. Based on rank-two states, we obtain analytical expressions for the success probability, which we extend and refine using numerical calculations to the case of maximally entangled mixed states (MEMS). We also consider more general rank-three states to show that our procedure is in general more convenient compared to purification protocols based on Bell diagonal states. Finally, we test the protocols using initial random states. In all cases, we find better performance and a larger number of purifiable states using our schemes compared to the CNOT-based purification protocol.

翻訳日:2024-02-07 13:26:59 公開日:2024-02-06

# GaMeS: メッシュベースのガウススプラッティングの適応と修正

GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting ( http://arxiv.org/abs/2402.01459v2 )

ライセンス: Link先を確認

Joanna Waczyńska, Piotr Borycki, Sławomir Tadeja, Jacek Tabor, Przemysław Spurek

(参考訳) 近年、画像レンダリングのためのニューラルネットワークベースの手法が数多く導入されている。例えば、広く研究されているNeural Radiance Fields(NeRF)は、ニューラルネットワークを使って3Dシーンを表現し、少数の2D画像からの現実的なビュー合成を可能にする。しかし、ほとんどのNeRFモデルは長いトレーニング時間と推論時間に制約される。これに対し、Gaussian Splatting(GS)は、3Dシーン内の点の画像画素への寄与をガウス分布で近似してレンダリングする新しい最先端技術であり、高速なトレーニングと高速なリアルタイムレンダリングを実現する。GSの欠点は、数十万のガウス成分を条件付けする必要があるため、その条件付けに対する明確なアプローチが存在しないことである。そこで本研究では、メッシュとガウス分布のハイブリッドであり、すべてのガウススプラットを物体表面(メッシュ)に固定するGaussian Mesh Splatting(GaMeS)モデルを導入する。本手法独自の貢献は、メッシュ上の位置のみに基づいてガウススプラットを定義することで、アニメーション中の位置、スケール、回転の自動調整を可能にすることである。その結果、リアルタイムのビュー生成において高品質なレンダリングが得られる。さらに、事前定義されたメッシュがない場合でも、学習プロセス中に初期メッシュを微調整できることを実証する。

In recent years, a range of neural network-based methods for image rendering have been introduced. For instance, widely-researched neural radiance fields (NeRF) rely on a neural network to represent 3D scenes, allowing for realistic view synthesis from a small number of 2D images. However, most NeRF models are constrained by long training and inference times. In comparison, Gaussian Splatting (GS) is a novel, state-of-the-art technique for rendering points in a 3D scene by approximating their contribution to image pixels through Gaussian distributions, warranting fast training and swift, real-time rendering. A drawback of GS is the absence of a well-defined approach for its conditioning due to the necessity to condition several hundred thousand Gaussian components. To solve this, we introduce the Gaussian Mesh Splatting (GaMeS) model, a hybrid of a mesh and a Gaussian distribution, that pins all Gaussian splats on the object surface (mesh). The unique contribution of our method is defining Gaussian splats solely based on their location on the mesh, allowing for automatic adjustments in position, scale, and rotation during animation. As a result, we obtain high-quality renders with real-time view generation. Furthermore, we demonstrate that in the absence of a predefined mesh, it is possible to fine-tune the initial mesh during the learning process.
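
The idea of defining a splat by its location on the mesh can be sketched for the position component alone. The example assumes, purely for illustration, that a splat center is stored as fixed barycentric weights on one triangle, so that moving the triangle's vertices moves the center with it. The abstract does not state this parameterization, the name `splat_center` is hypothetical, and scale and rotation are not covered here.

```python
def splat_center(triangle, weights):
    """Return the convex combination of three 3D vertices.

    triangle: three (x, y, z) tuples; weights: three non-negative numbers
    summing to 1 (barycentric weights, an assumption of this sketch).
    """
    if len(triangle) != 3 or len(weights) != 3:
        raise ValueError("expected three vertices and three weights")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return tuple(sum(w * v[k] for w, v in zip(weights, triangle)) for k in range(3))


# A triangle in its rest pose, and the same triangle lifted by 2 along z.
rest = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
moved = tuple((x, y, z + 2.0) for x, y, z in rest)

weights = (0.5, 0.25, 0.25)          # stored once, reused for every pose
center_rest = splat_center(rest, weights)
center_moved = splat_center(moved, weights)
print(center_rest, center_moved)
```

Because only the weights are stored, editing or animating the mesh updates the center without touching any per-splat parameter, which is the kind of automatic adjustment the abstract describes.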

翻訳日:2024-02-07 11:38:28 公開日:2024-02-06

# Skip \n: 大規模視覚言語モデルにおける幻覚を低減する簡単な手法

Skip \n: A simple method to reduce hallucination in Large Vision-Language Models ( http://arxiv.org/abs/2402.01345v2 )

ライセンス: Link先を確認

Zongbo Han, Zechen Bai, Haiyang Mei, Qianli Xu, Changqing Zhang, Mike Zheng Shou

(参考訳) 大規模視覚言語モデル(LVLM)の最近の進歩は、人間の言語による視覚情報理解における印象的な能力を示している。これらの進歩にもかかわらず、LVLMは視覚情報に存在しないオブジェクトのテキスト記述を生成するなど、マルチモーダル幻覚の課題に直面している。しかし、マルチモーダル幻覚の根本原因はいまだに解明されていない。本稿では,LVLMの固有バイアスが幻覚の重要な要因である可能性を示唆する新しい視点を提案する。具体的には,学習データ中の「\n\n」の前後の内容が有意な意味変化を示す場合,段落に関する意味変化バイアスを系統的に同定する。このパターンは、「\n\n」に続く内容が幻覚的記述の少ない先行内容と明らかに異なることを推測し、「\n\n」に続く幻覚的記述の確率を増大させる。我々は,この仮説を複数の公開LVLM上で検証した。また、生成した記述に「\n\n」を意図的に挿入すると、より幻覚が引き起こされる。そこで,LVLMの幻覚を効果的に緩和するために,'\n'の出力をスキップすることで簡単な手法を提案する。

Recent advancements in large vision-language models (LVLMs) have demonstrated impressive capability in visual information understanding with human language. Despite these advances, LVLMs still face challenges with multimodal hallucination, such as generating text descriptions of objects that are not present in the visual information. However, the underlying fundamental reasons of multimodal hallucinations remain poorly explored. In this paper, we propose a new perspective, suggesting that the inherent biases in LVLMs might be a key factor in hallucinations. Specifically, we systematically identify a semantic shift bias related to paragraph breaks (\n\n), where the content before and after '\n\n' in the training data frequently exhibits significant semantic changes. This pattern leads the model to infer that the contents following '\n\n' should be obviously different from the preceding contents with less hallucinatory descriptions, thereby increasing the probability of hallucinatory descriptions subsequent to the '\n\n'. We have validated this hypothesis on multiple publicly available LVLMs. Besides, we find that deliberately inserting '\n\n' into the generated description can induce more hallucinations. A simple method is proposed to effectively mitigate the hallucination of LVLMs by skipping the output of '\n'.
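
One way to picture "skipping the output of '\n'" is to forbid the newline token at each decoding step. The sketch below does this on a toy vocabulary by setting the newline logit to negative infinity before a greedy pick. This masking approach, the helper names, and the toy numbers are assumptions made for illustration; the abstract does not say how the authors implement the skip.

```python
import math


def mask_newline_logits(logits, newline_token_ids):
    """Return a copy of logits in which the given token ids can never be chosen."""
    banned = set(newline_token_ids)
    return [-math.inf if i in banned else x for i, x in enumerate(logits)]


def greedy_pick(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy vocabulary and next-token scores (illustrative values only).
vocab = ["a", "\n", "cat", "."]
logits = [0.1, 2.0, 1.5, 0.3]

unmasked_choice = vocab[greedy_pick(logits)]
masked_choice = vocab[greedy_pick(mask_newline_logits(logits, [1]))]
print(repr(unmasked_choice), repr(masked_choice))
```

Without the mask the toy model would emit the newline; with it, decoding continues with the next most likely token, so the description never reaches a paragraph break.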

翻訳日:2024-02-07 11:38:05 公開日:2024-02-06

# AGILE: 要素分解から学んだアプローチベースのGrasp推論

AGILE: Approach-based Grasp Inference Learned from Element Decomposition ( http://arxiv.org/abs/2402.01303v2 )

ライセンス: Link先を確認

MohammadHossein Koosheshi, Hamed Hosseini, Mehdi Tale Masouleh, Ahmad Kalhor, Mohammad Reza Hairi Yazdi

(参考訳) この把持検出の専門家であるヒトは、手対象の位置情報を考慮して物体を把握できる。本研究は,ロボットマニピュレータが物体に対するグリッパーの接近状況に応じて,物体を最も最適な方法で把握し,同一の学習を可能にする手法を提案する。深層学習を基盤として,提案手法は2つの主要段階からなる。ネットワークを未知のオブジェクトに一般化するために、提案するアプローチに基づく把持推論は、グリッパーの特定のアプローチに対して1つ以上の注釈付き把持を持つオブジェクトをその主部分に分割する要素分解段階を含む。その後、把握検出ネットワークは、マスクr−cnnによる分解された要素と、グリッパーの接近に関する情報を利用して、グリッパーが接近した要素と最も最適な把持を検出する。ネットワークをトレーニングするために,coppeliasimシミュレーション環境で収集したロボット把持データセットを紹介する。データセットは10の異なるオブジェクトを含み、注釈付き要素分解マスクと矩形を把握している。提案手法は,コッペリアシムシミュレーション環境において,被写体に対する90%の把握成功率と見えない被写体に対する78%を取得する。最後に、シミュレーションから現実への領域適応は、シミュレーションで収集したトレーニングセットに変換を適用し、データセットを拡大することにより、デルタパラレルロボットと2本指グリップパーを用いて、70%の物理的把握成功性能が得られる。

Humans, as experts in grasp detection, can grasp objects by taking into account hand-object positioning information. This work proposes a method to enable a robot manipulator to learn the same skill, grasping objects in the most optimal way according to how the gripper has approached the object. Built on deep learning, the proposed method consists of two main stages. In order to generalize the network to unseen objects, the proposed Approach-based Grasping Inference involves an element decomposition stage to split an object into its main parts, each with one or more annotated grasps for a particular approach of the gripper. Subsequently, a grasp detection network utilizes the elements decomposed by Mask R-CNN and the information on the approach of the gripper in order to detect the element the gripper has approached and the most optimal grasp. In order to train the networks, the study introduces a robotic grasping dataset collected in the CoppeliaSim simulation environment. The dataset involves 10 different objects with annotated element decomposition masks and grasp rectangles. The proposed method acquires a 90% grasp success rate on seen objects and 78% on unseen objects in the CoppeliaSim simulation environment. Lastly, simulation-to-reality domain adaptation is performed by applying transformations on the training set collected in simulation and augmenting the dataset, which results in a 70% physical grasp success performance using a Delta parallel robot and a 2-fingered gripper.

翻訳日:2024-02-07 11:37:44 公開日:2024-02-06

# 間接拡散誘導によるスパースビュー一般化可能NeRFの不確実性の制御

Taming Uncertainty in Sparse-view Generalizable NeRF via Indirect Diffusion Guidance ( http://arxiv.org/abs/2402.01217v2 )

ライセンス: Link先を確認

Yaokun Li, Chao Gou, Guang Tan

(参考訳) ニューラルラジアンス場(NeRF)は,新規な視点の合成に有効であることを示す。しかし、その濃密な入力とシーン固有の最適化への依存は、その広い適用範囲を制限している。一般化可能なNeRF(Gen-NeRF)は、この問題に対処することを目的としているが、しばしば不確実性に満ちたスパース入力を持つ未観測領域でぼやけたアーティファクトを生成する。本稿では,Gen-NeRFの不確実性を低減することを目的としている。我々は、この不確実性を効果的に緩和できないNeRFは、生成能力の欠如に起因すると仮定する。そこで我々は, 間接拡散誘導型NeRFフレームワークであるID-NeRFを革新的に提案し, 誘導に先立って蒸留拡散を利用することにより, 生成的視点からこの不確実性に対処する。具体的には, 先行手法のように不整合サンプリングと直接的に規則化することで生じるモデルの混乱を避けるために, 拡散誘導潜在空間を通して学習された暗黙的関数に本質的に欠けている想像力を間接的に注入する手法を導入する。各種ベンチマークによる実証評価は,スパース入力による不確実性処理において,提案手法の優れた性能を示す。

Neural Radiance Fields (NeRF) have demonstrated effectiveness in synthesizing novel views. However, their reliance on dense inputs and scene-specific optimization has limited their broader applicability. Generalizable NeRFs (Gen-NeRF), while intended to address this, often produce blurring artifacts in unobserved regions with sparse inputs, which are full of uncertainty. In this paper, we aim to diminish the uncertainty in Gen-NeRF for plausible renderings. We assume that NeRF's inability to effectively mitigate this uncertainty stems from its inherent lack of generative capacity. Therefore, we innovatively propose an Indirect Diffusion-guided NeRF framework, termed ID-NeRF, to address this uncertainty from a generative perspective by leveraging a distilled diffusion prior as guidance. Specifically, to avoid model confusion caused by directly regularizing with inconsistent samplings as in previous methods, our approach introduces a strategy to indirectly inject the inherently missing imagination into the learned implicit function through a diffusion-guided latent space. Empirical evaluation across various benchmarks demonstrates the superior performance of our approach in handling uncertainty with sparse inputs.

翻訳日:2024-02-07 11:37:20 公開日:2024-02-06

# 言語と方策の双方向適応によるオープンエンドな身体性エージェントの構築

Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation ( http://arxiv.org/abs/2401.00006v3 )

ライセンス: Link先を確認

Shaopeng Zhai, Jie Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, Jing Hou, Yu Qiao and Yu Liu

(参考訳) 大規模言語モデル(LLM)と強化学習(RL)の統合による身体性エージェントの構築は、人間とAIのインタラクションに革命をもたらした。研究者は言語による指示を活用して、オープンエンドなタスクの意思決定を計画できるようになった。しかし、既存の研究はオープンエンド性の要件を満たすうえで課題に直面している。通常、LLMまたはRLモデルの一方を固定された他方に適応するように訓練するため、新しいスキルの探索が制限され、人間とAIのインタラクションの有効性が妨げられる。この目的のために、2段階からなる協調学習フレームワークOpenPALを提案する。(1) 事前学習済みのLLMをファインチューニングして人間の指示を計画のための目標に変換し、意思決定のための方策を目標条件付きで訓練する。(2) 協調学習によってLLMと方策を整合させ、指示のオープンエンド性を達成する。オープンエンドのFPSゲームであるContraを用いて実験を行い、OpenPALで訓練したエージェントが任意の指示を理解できるだけでなく、効率的に実行できることを示した。これらの結果から、OpenPALは実用的なシナリオにおいてオープンエンドな身体性エージェントを構築できる可能性があることが示唆された。

Building embodied agents by integrating Large Language Models (LLMs) and Reinforcement Learning (RL) has revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks. However, existing research faces challenges in meeting the requirement of open-endedness. Existing approaches typically train either the LLM or the RL model to adapt to a fixed counterpart, limiting exploration of novel skills and hindering the efficacy of human-AI interaction. To this end, we present OpenPAL, a co-training framework comprising two stages: (1) fine-tuning a pre-trained LLM to translate human instructions into goals for planning, and training a goal-conditioned policy for decision-making; (2) co-training to align the LLM and policy, achieving instruction open-endedness. We conducted experiments using Contra, an open-ended FPS game, demonstrating that an agent trained with OpenPAL not only comprehends arbitrary instructions but also exhibits efficient execution. These results suggest that OpenPAL holds the potential to construct open-ended embodied agents in practical scenarios.

翻訳日:2024-02-07 11:37:00 公開日:2024-02-06

# クラスタースイッチバック実験:時空間干渉下における近似最適速度

Clustered Switchback Experiments: Near-Optimal Rates Under Spatiotemporal Interference ( http://arxiv.org/abs/2312.15574v3 )

ライセンス: Link先を確認

Su Jia, Nathan Kallus, Christina Lee Yu

(参考訳) 我々は、非定常性、ユニット間(空間的)干渉、およびキャリーオーバー効果(時間的干渉)が存在する状況での実験を考察し、グローバル平均処置効果(GATE)、すなわち全ユニットを常に処置に曝した場合と常に対照に曝した場合の平均結果の差を推定する。空間的干渉はグラフによって記述され、ユニットの結果はその近傍の処置割り当てに依存すると仮定する。時間的干渉は隠れマルコフ決定過程によって記述され、どちらの処置(行動)の下でも遷移核が急速混合条件を満たすと仮定する。本稿では、ユニットをクラスタに、時間ステップをブロックにグループ化し、クラスタとブロックの各組み合わせ全体に単一のランダムな処置を割り当てるクラスタ型スイッチバック設計を提案する。この設計の下で、良好なクラスタリングを許容するグラフに対して、打ち切り露出マッピングHorvitz-Thompson推定量が$\tilde O(1/NT)$の平均二乗誤差(MSE)を達成し、対数項を除いて$\Omega(1/NT)$の下界と一致することを示す。この結果は、Hu, Wager 2022の$N=1$設定(およびそこで平均差推定量に対して示されたMSE上界の改善)と、Ugander et al 2013およびLeung 2022の$T=1$設定を同時に一般化する。シミュレーション研究により、我々のアプローチの良好な性能が確認された。

We consider experimentation in the presence of non-stationarity, inter-unit (spatial) interference, and carry-over effects (temporal interference), where we wish to estimate the global average treatment effect (GATE), the difference between average outcomes having exposed all units at all times to treatment or to control. We suppose spatial interference is described by a graph, where a unit's outcome depends on its neighborhood's treatment assignments, and that temporal interference is described by a hidden Markov decision process, where the transition kernel under either treatment (action) satisfies a rapid mixing condition. We propose a clustered switchback design, where units are grouped into clusters and time steps are grouped into blocks and each whole cluster-block combination is assigned a single random treatment. Under this design, we show that for graphs that admit good clustering, a truncated exposure-mapping Horvitz-Thompson estimator achieves $\tilde O(1/NT)$ mean-squared error (MSE), matching an $\Omega(1/NT)$ lower bound up to logarithmic terms. Our results simultaneously generalize the $N=1$ setting of Hu and Wager 2022 (and improve on the MSE bound shown therein for difference-in-means estimators) as well as the $T=1$ settings of Ugander et al. 2013 and Leung 2022. Simulation studies validate the favorable performance of our approach.
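
The clustered switchback design described above can be sketched as a randomization routine: one coin flip per (cluster, block) pair, shared by every unit in the cluster for every time step in the block. The function name, the fair-coin default `p=0.5`, and the dictionary layout are assumptions for illustration; the abstract does not specify the treatment probability, and the estimator itself is not shown.

```python
import random


def clustered_switchback_assignment(unit_cluster, num_steps, block_len, p=0.5, seed=0):
    """Return a list z where z[t][unit] is 1 (treatment) or 0 (control).

    unit_cluster maps each unit to its cluster label. Time steps
    0..num_steps-1 are grouped into consecutive blocks of block_len steps.
    """
    rng = random.Random(seed)
    clusters = sorted(set(unit_cluster.values()))
    num_blocks = -(-num_steps // block_len)  # ceiling division
    # One independent Bernoulli(p) draw per (cluster, block) combination.
    coin = {
        (c, b): int(rng.random() < p)
        for c in clusters
        for b in range(num_blocks)
    }
    return [
        {u: coin[(c, t // block_len)] for u, c in unit_cluster.items()}
        for t in range(num_steps)
    ]


unit_cluster = {"a": 0, "b": 0, "c": 1, "d": 1}
z = clustered_switchback_assignment(unit_cluster, num_steps=6, block_len=3, seed=1)
for t, row in enumerate(z):
    print(t, row)
```

Setting `block_len` to the full horizon leaves a purely spatial cluster randomization, while putting all units in one cluster leaves a plain switchback, which mirrors how the abstract presents the design as covering both special cases.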

Translated: 2024-02-07 11:36:38 Published: 2024-02-06

# Do LLMs exhibit human-like response biases? A case study in survey design

Do LLMs exhibit human-like response biases? A case study in survey design ( http://arxiv.org/abs/2311.04076v5 )

License: see the linked page

Lindia Tjuatja, Valerie Chen, Sherry Tongshuang Wu, Ameet Talwalkar, Graham Neubig

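(Reference translation) As large language models (LLMs) become more capable, there is growing excitement about using them as proxies for humans in real-world tasks where subjective labels are desired, such as surveys and opinion polling. A widely cited barrier to adopting LLMs as human proxies in subjective tasks is their sensitivity to prompt wording; humans, however, also show sensitivity to changes in instructions, in the form of response biases. We investigate the extent to which LLMs reflect human response biases. We look to survey design, where human response biases caused by changes in the wording of "prompts" have been studied extensively in the social psychology literature. Drawing on these works, we design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. A comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior, particularly models that have undergone RLHF. Furthermore, even when a model shows a significant change in the same direction as humans, it is also sensitive to perturbations that do not elicit significant changes in humans. These results highlight the pitfalls of using LLMs as human proxies and underscore the need for finer-grained characterization of model behavior. Our code, dataset, and collected samples are available at https://github.com/lindiatjuatja/BiasMonkey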

As large language models (LLMs) become more capable, there is growing excitement about the possibility of using LLMs as proxies for humans in real-world tasks where subjective labels are desired, such as in surveys and opinion polling. One widely-cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is their sensitivity to prompt wording - but interestingly, humans also display sensitivities to instruction changes in the form of response biases. We investigate the extent to which LLMs reflect human response biases, if at all. We look to survey design, where human response biases caused by changes in the wordings of "prompts" have been extensively explored in social psychology literature. Drawing from these works, we design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior, particularly in models that have undergone RLHF. Furthermore, even if a model shows a significant change in the same direction as humans, we find that they are sensitive to perturbations that do not elicit significant changes in humans. These results highlight the pitfalls of using LLMs as human proxies, and underscore the need for finer-grained characterizations of model behavior. Our code, dataset, and collected samples are available at https://github.com/lindiatjuatja/BiasMonkey

Translated: 2024-02-07 11:36:07 Published: 2024-02-06

# AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection ( http://arxiv.org/abs/2310.18961v5 )

License: see the linked page

Qihang Zhou, Guansong Pang, Yu Tian, Shibo He, Jiming Chen

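(Reference translation) Zero-shot anomaly detection (ZSAD) requires a detection model trained on auxiliary data to detect anomalies without any training sample from the target dataset. It is a crucial task when training data is inaccessible because of concerns such as data privacy, yet it is challenging because the model must generalize to anomalies across different domains, where the appearance of foreground objects, abnormal regions, and background features (for example, defects on different products or tumors in different organs) can vary significantly. Recently, large pre-trained vision-language models (VLMs) such as CLIP have shown strong zero-shot recognition ability in various vision tasks, including anomaly detection. However, their ZSAD performance is weak because VLMs focus on modeling the class semantics of foreground objects rather than the abnormality or normality in the images. This paper introduces a new approach, AnomalyCLIP, that adapts CLIP for accurate ZSAD across different domains. The key insight of AnomalyCLIP is to learn object-agnostic text prompts that capture generic normality and abnormality in an image regardless of its foreground objects. This lets the model focus on abnormal image regions rather than object semantics, enabling generalized normality and abnormality recognition on diverse types of objects. Large-scale experiments on 17 real-world anomaly detection datasets show that AnomalyCLIP achieves superior zero-shot performance in detecting and segmenting anomalies in datasets with highly diverse class semantics from various defect inspection and medical imaging domains. Code will be made available at https://github.com/zqhang/AnomalyCLIP.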

Zero-shot anomaly detection (ZSAD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. It is a crucial task when training data is not accessible due to various concerns, eg, data privacy, yet it is challenging since the models need to generalize to anomalies across different domains where the appearance of foreground objects, abnormal regions, and background features, such as defects/tumors on different products/organs, can vary significantly. Recently large pre-trained vision-language models (VLMs), such as CLIP, have demonstrated strong zero-shot recognition ability in various vision tasks, including anomaly detection. However, their ZSAD performance is weak since the VLMs focus more on modeling the class semantics of the foreground objects rather than the abnormality/normality in the images. In this paper we introduce a novel approach, namely AnomalyCLIP, to adapt CLIP for accurate ZSAD across different domains. The key insight of AnomalyCLIP is to learn object-agnostic text prompts that capture generic normality and abnormality in an image regardless of its foreground objects. This allows our model to focus on the abnormal image regions rather than the object semantics, enabling generalized normality and abnormality recognition on diverse types of objects. Large-scale experiments on 17 real-world anomaly detection datasets show that AnomalyCLIP achieves superior zero-shot performance of detecting and segmenting anomalies in datasets of highly diverse class semantics from various defect inspection and medical imaging domains. Code will be made available at https://github.com/zqhang/AnomalyCLIP.

Translated: 2024-02-07 11:35:42 Published: 2024-02-06

# Memory-Assisted Sub-Prototype Mining for Universal Domain Adaptation

Memory-Assisted Sub-Prototype Mining for Universal Domain Adaptation ( http://arxiv.org/abs/2310.05453v3 )

License: see the linked page

Yuxiang Lai (1 and 2), Yi Zhou (1 and 2), Xinghong Liu (1 and 2), Tao Zhou (3) ((1) School of Computer Science and Engineering, Southeast University, China (2) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China (3) School of Computer Science and Engineering, Nanjing University of Science and Technology, China)

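(Reference translation) Universal domain adaptation aims to align classes and reduce the feature gap between the same categories in the source and target domains. Target-private categories are treated as the unknown class during adaptation because they do not appear in the source domain. However, most existing methods overlook the intra-class structure within a category, especially when there is significant concept shift between samples that belong to the same category. Forcing samples with large concept shift together can hurt adaptation performance. Moreover, from an interpretability standpoint, it is unreasonable to align visual features with significant differences, such as fighter jets and civil aircraft, into the same category. Unfortunately, because of such semantic ambiguity and annotation cost, categories are not always labeled in detail, which makes precise adaptation difficult for the model. To address these issues, we propose a Memory-Assisted Sub-Prototype Mining (MemSPM) method that learns the differences between samples of the same category and mines sub-classes when significant concept shift exists between them. In doing so, the model learns a more reasonable feature space that improves transferability and reflects the inherent differences among samples annotated as the same category. We evaluate MemSPM in multiple scenarios, including UniDA, OSDA, and PDA. The method achieves state-of-the-art performance on four benchmarks in most cases.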

Universal domain adaptation aims to align the classes and reduce the feature gap between the same category of the source and target domains. The target private category is set as the unknown class during the adaptation process, as it is not included in the source domain. However, most existing methods overlook the intra-class structure within a category, especially in cases where there exists significant concept shift between the samples belonging to the same category. When samples with large concept shift are forced to be pushed together, it may negatively affect the adaptation performance. Moreover, from the interpretability aspect, it is unreasonable to align visual features with significant differences, such as fighter jets and civil aircraft, into the same category. Unfortunately, due to such semantic ambiguity and annotation cost, categories are not always classified in detail, making it difficult for the model to perform precise adaptation. To address these issues, we propose a novel Memory-Assisted Sub-Prototype Mining (MemSPM) method that can learn the differences between samples belonging to the same category and mine sub-classes when there exists significant concept shift between them. By doing so, our model learns a more reasonable feature space that enhances the transferability and reflects the inherent differences among samples annotated as the same category. We evaluate the effectiveness of our MemSPM method over multiple scenarios, including UniDA, OSDA, and PDA. Our method achieves state-of-the-art performance on four benchmarks in most cases.

Translated: 2024-02-07 11:35:10 Published: 2024-02-06

# Enhancing Graph Transformers with Hierarchical Distance Structural Encoding

Enhancing Graph Transformers with Hierarchical Distance Structural Encoding ( http://arxiv.org/abs/2308.11129v3 )

License: see the linked page

Yuankai Luo

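(Reference translation) Graph transformers need strong inductive biases to derive meaningful attention scores. Yet current methods often fall short in capturing longer ranges, hierarchical structures, or community structures, which are common in graphs such as molecules, social networks, and citation networks. This paper presents a Hierarchical Distance Structural Encoding (HDSE) method that models node distances in a graph with a focus on its multi-level, hierarchical nature. We introduce a framework that integrates HDSE into the attention mechanism of existing graph transformers and can be applied together with other positional encodings. To apply graph transformers with HDSE to large-scale graphs, we further propose a hierarchical global attention mechanism with linear complexity. We prove theoretically that HDSE is superior to shortest path distances in terms of expressivity and generalization. Empirically, graph transformers with HDSE excel in graph classification and regression on 7 graph-level datasets, and in node classification on 12 large-scale graphs, including graphs with up to a billion nodes.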

Graph transformers need strong inductive biases to derive meaningful attention scores. Yet, current methods often fall short in capturing longer ranges, hierarchical structures, or community structures, which are common in various graphs such as molecules, social networks, and citation networks. This paper presents a Hierarchical Distance Structural Encoding (HDSE) method to model node distances in a graph, focusing on its multi-level, hierarchical nature. We introduce a novel framework to seamlessly integrate HDSE into the attention mechanism of existing graph transformers, allowing for simultaneous application with other positional encodings. To apply graph transformer with HDSE to large-scale graphs, we further propose a hierarchical global attention mechanism with linear complexity. We theoretically prove the superiority of HDSE over shortest path distances in terms of expressivity and generalization. Empirically, we demonstrate that graph transformers with HDSE excel in graph classification, regression on 7 graph-level datasets, and node classification on 12 large-scale graphs, including those with up to a billion nodes.
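To make the idea of a multi-level node distance concrete, here is a minimal sketch. It is not the HDSE construction from the paper: it assumes a single, user-supplied coarsening (`cluster_of`), an undirected graph stored with both edge directions, and a two-level result (node-level and cluster-level hop counts). All function names are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def coarsen(adjacency, cluster_of):
    """Cluster-level graph: two clusters are linked if any of their members are."""
    coarse = {c: set() for c in cluster_of.values()}
    for u, neighbours in adjacency.items():
        for v in neighbours:
            cu, cv = cluster_of[u], cluster_of[v]
            if cu != cv:
                coarse[cu].add(cv)
                coarse[cv].add(cu)
    return coarse


def hierarchical_distance(adjacency, cluster_of, u, v):
    """Return (node-level distance, cluster-level distance); None if unreachable."""
    fine = bfs_distances(adjacency, u).get(v)
    coarse_graph = coarsen(adjacency, cluster_of)
    coarse = bfs_distances(coarse_graph, cluster_of[u]).get(cluster_of[v])
    return fine, coarse
```

Two nodes that are many hops apart can still be close at the coarse level, which is the kind of information a plain shortest-path encoding does not separate out.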

Translated: 2024-02-07 11:34:45 Published: 2024-02-06

# Language is All a Graph Needs

Language is All a Graph Needs ( http://arxiv.org/abs/2308.07134v5 )

License: see the linked page

Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, Yongfeng Zhang

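(Reference translation) The emergence of large-scale pre-trained language models has revolutionized various AI research domains. Transformer-based large language models (LLMs) have gradually replaced CNNs and RNNs, unifying the fields of computer vision and natural language processing. Compared with independent data samples such as images, videos, or texts, graphs usually contain rich structural and relational information. Meanwhile, language, especially natural language, is one of the most expressive mediums and excels at describing complex structures. However, existing work on incorporating graph problems into the generative language modeling framework remains very limited. Given the rising prominence of LLMs, it becomes essential to explore whether LLMs can also replace GNNs as the foundation model for graphs. This paper proposes InstructGLM (Instruction-finetuned Graph Language Model), which uses highly scalable prompts based on natural language instructions. Natural language is used to describe the multi-scale geometric structure of the graph, and an LLM is then instruction-finetuned to perform graph tasks, which enables generative graph learning. The method surpasses all GNN baselines on the ogbn-arxiv, Cora, and PubMed datasets, which underscores its effectiveness and sheds light on generative LLMs as a new foundation model for graph machine learning. Our code is open-sourced at https://github.com/agiresearch/InstructGLM.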

The emergence of large-scale pre-trained language models has revolutionized various AI research domains. Transformers-based Large Language Models (LLMs) have gradually replaced CNNs and RNNs to unify fields of computer vision and natural language processing. Compared with independent data samples such as images, videos or texts, graphs usually contain rich structural and relational information. Meanwhile, language, especially natural language, being one of the most expressive mediums, excels in describing complex structures. However, existing work on incorporating graph problems into the generative language modeling framework remains very limited. Considering the rising prominence of LLMs, it becomes essential to explore whether LLMs can also replace GNNs as the foundation model for graphs. In this paper, we propose InstructGLM (Instruction-finetuned Graph Language Model) with highly scalable prompts based on natural language instructions. We use natural language to describe multi-scale geometric structure of the graph and then instruction finetune an LLM to perform graph tasks, which enables Generative Graph Learning. Our method surpasses all GNN baselines on ogbn-arxiv, Cora and PubMed datasets, underscoring its effectiveness and sheds light on generative LLMs as new foundation model for graph machine learning. Our code is open-sourced at https://github.com/agiresearch/InstructGLM.
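As a rough illustration of describing graph structure in natural language, the sketch below turns a node's 1-hop and 2-hop neighbourhood into a prompt. The wording of the template, the function names, and the label list are invented for this example; the actual InstructGLM prompts are not reproduced here.

```python
def describe_node(adjacency, node, titles=None):
    """Describe a node's 1-hop and 2-hop neighbourhood in plain English."""
    titles = titles or {}

    def name(v):
        return titles.get(v, f"node {v}")

    one_hop = sorted(adjacency.get(node, ()))
    # 2-hop neighbours: neighbours of neighbours, minus the node and its 1-hop set.
    two_hop = sorted(
        {w for v in one_hop for w in adjacency.get(v, ())} - set(one_hop) - {node}
    )
    parts = [
        f"{name(node)} is connected within one hop to: "
        + (", ".join(name(v) for v in one_hop) or "nothing")
        + "."
    ]
    if two_hop:
        parts.append(
            "Within two hops it also reaches: "
            + ", ".join(name(v) for v in two_hop)
            + "."
        )
    return " ".join(parts)


def build_prompt(adjacency, node, labels, titles=None):
    """Wrap the description in a hypothetical node-classification instruction."""
    return (
        describe_node(adjacency, node, titles)
        + f" Which category does node {node} belong to? Options: "
        + ", ".join(labels)
        + "."
    )
```

Adding or dropping the 2-hop sentence is one simple way to vary the scale of structure the prompt exposes to the language model.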

Translated: 2024-02-07 11:34:26 Published: 2024-02-06

# Linear Alignment of Vision-language Models for Image Captioning

Linear Alignment of Vision-language Models for Image Captioning ( http://arxiv.org/abs/2307.05591v3 )

License: see the linked page

Fabian Paischer, Markus Hofmarcher, Sepp Hochreiter, Thomas Adler

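(Reference translation) Recently, vision-language models such as CLIP have advanced the state of the art in a variety of multi-modal tasks, including image captioning and caption evaluation. Many approaches adapt CLIP-style models to a downstream task by training a mapping network between CLIP and a language model. This is costly because it usually involves computing gradients for large models. We propose a more efficient training protocol that fits a linear mapping between the image and text embeddings of CLIP via a closed-form solution. This bypasses the need for gradient computation and yields a lightweight captioning method called ReCap, which can be trained up to 1000 times faster than existing lightweight methods. We also propose two new learning-based image-captioning metrics that build on CLIP score together with the linear mapping. In addition, we combine ReCap with the new metrics to design an iterative datastore-augmentation loop (DAL) based on synthetic captions. We evaluate ReCap on MS-COCO, Flickr30k, VizWiz, and MSRVTT. ReCap achieves performance comparable to state-of-the-art lightweight methods on established metrics while outperforming them on the new metrics, which are better aligned with human ratings on Flickr8k-Expert and Flickr8k-Crowdflower. Finally, we show that ReCap transfers well to other domains and that DAL leads to a performance boost.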

Recently, vision-language models like CLIP have advanced the state of the art in a variety of multi-modal tasks including image captioning and caption evaluation. Many approaches adapt CLIP-style models to a downstream task by training a mapping network between CLIP and a language model. This is costly as it usually involves calculating gradients for large models. We propose a more efficient training protocol that fits a linear mapping between image and text embeddings of CLIP via a closed-form solution. This bypasses the need for gradient computation and results in a lightweight captioning method called ReCap, which can be trained up to 1000 times faster than existing lightweight methods. Moreover, we propose two new learning-based image-captioning metrics that build on CLIP score along with our linear mapping. Furthermore, we combine ReCap with our new metrics to design an iterative datastore-augmentation loop (DAL) based on synthetic captions. We evaluate ReCap on MS-COCO, Flickr30k, VizWiz, and MSRVTT. ReCap achieves performance comparable to state-of-the-art lightweight methods on established metrics while outperforming them on our new metrics, which are better aligned with human ratings on Flickr8k-Expert and Flickr8k-Crowdflower. Finally, we demonstrate that ReCap transfers well to other domains and that our DAL leads to a performance boost.

Translated: 2024-02-07 11:34:03 Published: 2024-02-06

# Matrix Information Theory for Self-Supervised Learning

Matrix Information Theory for Self-Supervised Learning ( http://arxiv.org/abs/2305.17326v5 )

License: see the linked page

Yifan Zhang, Zhiquan Tan, Jingqin Yang, Weiran Huang, Yang Yuan

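(Reference translation) The maximum entropy encoding framework provides a unified perspective on many non-contrastive learning methods such as SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, an approach that uses matrix information theory to interpret the maximum entropy encoding loss as a matrix uniformity loss. Matrix-SSL further enhances the maximum entropy encoding method by incorporating a matrix alignment loss that directly aligns the covariance matrices of the different branches. Experimental results show that Matrix-SSL outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, on transfer learning tasks on MS-COCO, the method outperforms previous SOTA methods such as MoCo v2 and BYOL by up to 3.3% while pre-training for only 400 epochs instead of 800. We also try to bring representation learning into the language modeling regime: fine-tuning a 7B model with a matrix cross-entropy loss reaches 72.3% on the GSM8K dataset, a margin of 3.1% over the standard cross-entropy loss. Code is available at https://github.com/yifanzhang-pro/Matrix-SSL.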

The maximum entropy encoding framework provides a unified perspective for many non-contrastive learning methods like SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss. Furthermore, Matrix-SSL enhances the maximum entropy encoding method by seamlessly incorporating matrix alignment loss, directly aligning covariance matrices in different branches. Experimental results reveal that Matrix-SSL outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, when performing transfer learning tasks on MS-COCO, our method outperforms previous SOTA methods such as MoCo v2 and BYOL up to 3.3% with only 400 epochs compared to 800 epochs pre-training. We also try to introduce representation learning into the language modeling regime, achieving 72.3% on the GSM8K dataset by fine-tuning a 7B model using matrix cross-entropy loss, with a margin of 3.1% over the standard cross-entropy loss. Code available at https://github.com/yifanzhang-pro/Matrix-SSL.

Translated: 2024-02-07 11:33:39 Published: 2024-02-06

# Evaluating Large Language Models in Analysing Classroom Dialogue

Evaluating Large Language Models in Analysing Classroom Dialogue ( http://arxiv.org/abs/2402.02380v2 )

License: see the linked page

Yun Long, Haifeng Luo, Yu Zhang

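(Reference translation) This study explores the application of large language models (LLMs), specifically GPT-4, to the analysis of classroom dialogue, a crucial research task for both teaching diagnosis and quality improvement. Recognizing that traditional qualitative methods in educational research are knowledge-intensive and labor-intensive, the study investigates the potential of LLMs to streamline and enhance the analysis process. It uses datasets from a middle school that cover classroom dialogues in mathematics and Chinese classes. These dialogues were manually coded by educational experts and then analyzed using a customised GPT-4 model. The study compares the manual annotations with the outputs of GPT-4 to evaluate the model's efficacy in analyzing educational dialogue. Time efficiency, inter-coder agreement, and inter-coder reliability between human coders and GPT-4 are evaluated. The results indicate substantial time savings with GPT-4 and a high degree of coding consistency between the model and human coders, with some discrepancies in specific codes. These findings highlight the strong potential of LLMs in teaching evaluation and facilitation.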

This study explores the application of Large Language Models (LLMs), specifically GPT-4, in the analysis of classroom dialogue, a crucial research task for both teaching diagnosis and quality improvement. Recognizing the knowledge-intensive and labor-intensive nature of traditional qualitative methods in educational research, this study investigates the potential of LLM to streamline and enhance the analysis process. The study involves datasets from a middle school, encompassing classroom dialogues across mathematics and Chinese classes. These dialogues were manually coded by educational experts and then analyzed using a customised GPT-4 model. This study focuses on comparing manual annotations with the outputs of GPT-4 to evaluate its efficacy in analyzing educational dialogues. Time efficiency, inter-coder agreement, and inter-coder reliability between human coders and GPT-4 are evaluated. Results indicate substantial time savings with GPT-4, and a high degree of consistency in coding between the model and human coders, with some discrepancies in specific codes. These findings highlight the strong potential of LLM in teaching evaluation and facilitation.
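Agreement between a human coder and a model on the same utterances is often summarised with a chance-corrected statistic. The sketch below computes Cohen's kappa as one common choice; the abstract does not say which statistic the study used, so treat this as an assumption, and the example codes are made up.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders labelling the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(codes_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected agreement if both coders labelled independently at their own rates.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)
```

For instance, `cohens_kappa(human_codes, model_codes)` on two equal-length lists of code labels returns 1.0 for perfect agreement and values near 0 when agreement is no better than chance.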

Translated: 2024-02-07 11:26:29 Published: 2024-02-06

# Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues

Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues ( http://arxiv.org/abs/2402.02327v2 )

License: see the linked page

Tianxiang Chen, Zhentao Tan, Tao Gong, Qi Chu, Yue Wu, Bin Liu, Le Lu, Jieping Ye, Nenghai Yu

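(Reference translation) How to make audio and vision interact effectively has attracted considerable interest in multi-modality research. Recently, a new audio-visual segmentation (AVS) task has been proposed, which aims to segment the sounding objects in video frames under the guidance of audio cues. However, most existing AVS methods are hindered by a modality imbalance in which visual features tend to dominate those of the audio modality, because audio cues are integrated in a unidirectional and insufficient way. This imbalance skews the feature representation toward the visual side, impedes the learning of joint audio-visual representations, and can cause segmentation inaccuracies. To address this issue, we propose AVSAC. The approach features a Bidirectional Audio-Visual Decoder (BAVD) with integrated bidirectional bridges, which strengthens audio cues and fosters continuous interplay between the audio and visual modalities. This bidirectional interaction narrows the modality imbalance and supports more effective learning of integrated audio-visual representations. In addition, we present a strategy for audio-visual frame-wise synchrony as fine-grained guidance for BAVD. This strategy increases the share of auditory components in the visual features and contributes to more balanced audio-visual representation learning. Extensive experiments show that the method sets new benchmarks in AVS performance.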

How to effectively interact audio with vision has garnered considerable interest within the multi-modality research field. Recently, a novel audio-visual segmentation (AVS) task has been proposed, aiming to segment the sounding objects in video frames under the guidance of audio cues. However, most existing AVS methods are hindered by a modality imbalance where the visual features tend to dominate those of the audio modality, due to a unidirectional and insufficient integration of audio cues. This imbalance skews the feature representation towards the visual aspect, impeding the learning of joint audio-visual representations and potentially causing segmentation inaccuracies. To address this issue, we propose AVSAC. Our approach features a Bidirectional Audio-Visual Decoder (BAVD) with integrated bidirectional bridges, enhancing audio cues and fostering continuous interplay between audio and visual modalities. This bidirectional interaction narrows the modality imbalance, facilitating more effective learning of integrated audio-visual representations. Additionally, we present a strategy for audio-visual frame-wise synchrony as fine-grained guidance of BAVD. This strategy enhances the share of auditory components in visual features, contributing to a more balanced audio-visual representation learning. Extensive experiments show that our method attains new benchmarks in AVS performance.

Translated: 2024-02-07 11:26:11 Published: 2024-02-06

# Dynamic Incremental Optimization for Best Subset Selection

Dynamic Incremental Optimization for Best Subset Selection ( http://arxiv.org/abs/2402.02322v2 )

License: see the linked page

Shaogang Ren, Xiaoning Qian

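(Reference translation) Best subset selection is considered the 'gold standard' for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-smooth, non-convex problem. This paper investigates the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual algorithm is developed based on the structures of the primal and dual problems. By combining dual range estimation with an incremental strategy, the algorithm can reduce redundant computation and improve the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions.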

Best subset selection is considered the `gold standard' for many sparse learning problems. A variety of optimization techniques have been proposed to attack this non-smooth non-convex problem. In this paper, we investigate the dual forms of a family of $\ell_0$-regularized problems. An efficient primal-dual algorithm is developed based on the primal and dual problem structures. By leveraging the dual range estimation along with the incremental strategy, our algorithm potentially reduces redundant computation and improves the solutions of best subset selection. Theoretical analysis and experiments on synthetic and real-world datasets validate the efficiency and statistical properties of the proposed solutions.

Translated: 2024-02-07 11:25:49 Published: 2024-02-06

# Causal Bayesian Optimization via Exogenous Distribution Learning

Causal Bayesian Optimization via Exogenous Distribution Learning ( http://arxiv.org/abs/2402.02277v2 )

License: see the linked page

Shaogang Ren, Xiaoning Qian

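(Reference translation) Maximizing a target variable as an operational objective in a structured causal model is an important problem. Existing Causal Bayesian Optimization (CBO) methods either rely on hard interventions that alter the causal structure to maximize the reward, or introduce action nodes to endogenous variables so that the data generation mechanisms are adjusted to achieve the objective. This paper introduces a new method that learns the distribution of the exogenous variables, which existing methods typically ignore or marginalize out through expectation. Exogenous distribution learning improves the approximation accuracy of the structured causal model in a surrogate model that is usually trained with limited observational data. Moreover, the learned exogenous distribution extends existing CBO to general causal schemes beyond Additive Noise Models (ANM). Recovering the exogenous variables makes it possible to use a more flexible prior for noise or unobserved hidden variables. A new CBO method is developed that leverages the learned exogenous distribution. Experiments on different datasets and applications show the benefits of the proposed method.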

Maximizing a target variable as an operational objective in a structured causal model is an important problem. Existing Causal Bayesian Optimization (CBO) methods either rely on hard interventions that alter the causal structure to maximize the reward; or introduce action nodes to endogenous variables so that the data generation mechanisms are adjusted to achieve the objective. In this paper, a novel method is introduced to learn the distribution of exogenous variables, which is typically ignored or marginalized through expectation by existing methods. Exogenous distribution learning improves the approximation accuracy of structured causal models in a surrogate model that is usually trained with limited observational data. Moreover, the learned exogenous distribution extends existing CBO to general causal schemes beyond Additive Noise Models (ANM). The recovery of exogenous variables allows us to use a more flexible prior for noise or unobserved hidden variables. A new CBO method is developed by leveraging the learned exogenous distribution. Experiments on different datasets and applications show the benefits of our proposed method.

Translated: 2024-02-07 11:25:42 Published: 2024-02-06

# Vanilla Bayesian Optimization Performs Great in High Dimensions

Vanilla Bayesian Optimization Performs Great in High Dimensions ( http://arxiv.org/abs/2402.02229v2 )

License: see the linked page

Carl Hvarfner and Erik Orm Hellsten and Luigi Nardi

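(Reference translation) High-dimensional problems have long been considered the Achilles' heel of Bayesian optimization algorithms. Spurred by the curse of dimensionality, a large collection of algorithms aims to make Bayesian optimization more performant in this setting, commonly by imposing various simplifying assumptions on the objective. This paper identifies the degeneracies that make vanilla Bayesian optimization poorly suited to high-dimensional tasks, and shows how existing algorithms address these degeneracies through the lens of lowering model complexity. It also proposes an enhancement to the prior assumptions typical of vanilla Bayesian optimization algorithms, which reduces the complexity to manageable levels without imposing structural restrictions on the objective. The modification, a simple scaling of the Gaussian process lengthscale prior with the dimensionality, reveals that standard Bayesian optimization works drastically better in high dimensions than previously thought, clearly outperforming existing state-of-the-art algorithms on multiple commonly considered real-world high-dimensional tasks.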

High-dimensional problems have long been considered the Achilles' heel of Bayesian optimization algorithms. Spurred by the curse of dimensionality, a large collection of algorithms aim to make it more performant in this setting, commonly by imposing various simplifying assumptions on the objective. In this paper, we identify the degeneracies that make vanilla Bayesian optimization poorly suited to high-dimensional tasks, and further show how existing algorithms address these degeneracies through the lens of lowering the model complexity. Moreover, we propose an enhancement to the prior assumptions that are typical to vanilla Bayesian optimization algorithms, which reduces the complexity to manageable levels without imposing structural restrictions on the objective. Our modification - a simple scaling of the Gaussian process lengthscale prior with the dimensionality - reveals that standard Bayesian optimization works drastically better than previously thought in high dimensions, clearly outperforming existing state-of-the-art algorithms on multiple commonly considered real-world high-dimensional tasks.
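A small numerical sketch can show why tying the lengthscale to the dimensionality matters. It rests on assumptions not stated in the abstract: inputs uniform on the unit cube, an RBF kernel, a fixed lengthscale instead of a prior, and a scaling factor of $\sqrt{D}$. The function names are hypothetical.

```python
import math
import random


def mean_sq_distance(dim, pairs=2000, seed=0):
    """Monte Carlo estimate of E||x - y||^2 for independent uniform points in [0, 1]^dim."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        total += sum((rng.random() - rng.random()) ** 2 for _ in range(dim))
    return total / pairs


def rbf(sq_dist, lengthscale):
    """RBF kernel value for a given squared distance."""
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def typical_correlation(dim, base_lengthscale=0.5, scale_with_dim=False):
    """Kernel value at the expected squared distance dim / 6 between two random points."""
    factor = math.sqrt(dim) if scale_with_dim else 1.0
    return rbf(dim / 6.0, base_lengthscale * factor)
```

Because the expected squared distance grows linearly in the dimension, a fixed lengthscale drives the correlation between typical points toward zero, so the surrogate treats almost every pair of points as unrelated. Scaling the lengthscale by $\sqrt{D}$ keeps `typical_correlation` constant across dimensions in this toy setting.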

Translated: 2024-02-07 11:25:25 Published: 2024-02-06

# GPT-4V as Traffic Assistant: An In-depth Look at Vision Language Model on Complex Traffic Events

GPT-4V as Traffic Assistant: An In-depth Look at Vision Language Model on Complex Traffic Events ( http://arxiv.org/abs/2402.02205v2 )

License: see the linked page

Xingcheng Zhou, Alois C. Knoll

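(Reference translation) The recognition and understanding of traffic incidents, particularly traffic accidents, is a topic of paramount importance for intelligent transportation systems and intelligent vehicles. This area has continually drawn extensive attention from both academia and industry. Identifying and comprehending complex traffic events is highly challenging, mainly because of the intricate nature of traffic environments, diverse observational perspectives, and the multifaceted causes of accidents. These factors have persistently impeded the development of effective solutions. The advent of large vision-language models (VLMs) such as GPT-4V has introduced new approaches to this problem. This paper explores GPT-4V on a set of representative traffic incident videos and examines the model's capacity to understand these complex traffic situations. GPT-4V shows remarkable cognitive, reasoning, and decision-making ability in certain classic traffic events. At the same time, we identify limitations of GPT-4V that constrain its understanding in more intricate scenarios. These limitations merit further exploration and resolution.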

The recognition and understanding of traffic incidents, particularly traffic accidents, is a topic of paramount importance in the realm of intelligent transportation systems and intelligent vehicles. This area has continually captured the extensive focus of both the academic and industrial sectors. Identifying and comprehending complex traffic events is highly challenging, primarily due to the intricate nature of traffic environments, diverse observational perspectives, and the multifaceted causes of accidents. These factors have persistently impeded the development of effective solutions. The advent of large vision-language models (VLMs) such as GPT-4V, has introduced innovative approaches to addressing this issue. In this paper, we explore the ability of GPT-4V with a set of representative traffic incident videos and delve into the model's capacity of understanding these complex traffic situations. We observe that GPT-4V demonstrates remarkable cognitive, reasoning, and decision-making ability in certain classic traffic events. Concurrently, we also identify certain limitations of GPT-4V, which constrain its understanding in more intricate scenarios. These limitations merit further exploration and resolution.

Translated: 2024-02-07 11:25:03 Published: 2024-02-06

# DeCoF: Generated Video Detection via Frame Consistency

DeCoF: Generated Video Detection via Frame Consistency ( http://arxiv.org/abs/2402.02085v2 )

License: see the linked page

Long Ma, Jiajia Zhang, Hongping Deng, Ningyu Zhang, Yong Liao, Haiyang Yu

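(Reference translation) The rising quality of videos produced by advanced video generation methods creates new security challenges for society, which makes generated video detection an urgent research priority. To foster collaborative research in this area, we construct the first open-source dataset built explicitly for generated video detection, giving the community a resource for benchmarking and improving detection methods. Through a series of carefully designed probe experiments, the study explores the significance of temporal and spatial artifacts in developing general and robust detectors for generated video. Based on the principle of video frame consistency, we introduce a simple yet effective detection model (DeCoF) that eliminates the impact of spatial artifacts when learning generalizable features. Extensive experiments demonstrate the efficacy of DeCoF in detecting videos produced by unseen video generation models and confirm its strong generalization across several commercial proprietary models.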

The escalating quality of video generated by advanced video generation methods leads to new security challenges in society, which makes generated video detection an urgent research priority. To foster collaborative research in this area, we construct the first open-source dataset explicitly for generated video detection, providing a valuable resource for the community to benchmark and improve detection methodologies. Through a series of carefully designed probe experiments, our study explores the significance of temporal and spatial artifacts in developing general and robust detectors for generated video. Based on the principle of video frame consistency, we introduce a simple yet effective detection model (DeCoF) that eliminates the impact of spatial artifacts during generalizing feature learning. Our extensive experiments demonstrate the efficacy of DeCoF in detecting videos produced by unseen video generation models and confirm its powerful generalization capabilities across several commercial proprietary models.

Translated: 2024-02-07 11:24:47 Published: 2024-02-06

# The Landscape and Challenges of HPC Research and LLMs

The Landscape and Challenges of HPC Research and LLMs ( http://arxiv.org/abs/2402.02018v2 )

License: see the linked page

Le Chen, Nesreen K. Ahmed, Akash Dutta, Arijit Bhattacharjee, Sixing Yu, Quazi Ishtiaque Mahmud, Waqwoya Abebe, Hung Phan, Aishwarya Sarkar, Branden Butler, Niranjan Hasabnis, Gal Oren, Vy A. Vo, Juan Pablo Munoz, Theodore L. Willke, Tim Mattson, Ali Jannesari

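(Reference translation) Recently, language models (LMs), especially large language models (LLMs), have revolutionized the field of deep learning. Both encoder-decoder models and prompt-based techniques have shown immense potential for natural language processing and code-based tasks. Over the past several years, many research labs and institutions have invested heavily in high-performance computing, approaching or breaching exascale performance levels. This paper posits that adapting and using such language model-based techniques for tasks in high-performance computing (HPC) would be very beneficial. The study presents the reasoning behind this position and highlights how existing ideas can be improved and adapted for HPC tasks.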

Recently, language models (LMs), especially large language models (LLMs), have revolutionized the field of deep learning. Both encoder-decoder models and prompt-based techniques have shown immense potential for natural language processing and code-based tasks. Over the past several years, many research labs and institutions have invested heavily in high-performance computing, approaching or breaching exascale performance levels. In this paper, we posit that adapting and utilizing such language model-based techniques for tasks in high-performance computing (HPC) would be very beneficial. This study presents our reasoning behind the aforementioned position and highlights how existing ideas can be improved and adapted for HPC tasks.

Translated: 2024-02-07 11:24:02 Published: 2024-02-06

# Simulation-Enhanced Data Augmentation for Machine Learning Pathloss Prediction

Simulation-Enhanced Data Augmentation for Machine Learning Pathloss Prediction ( http://arxiv.org/abs/2402.01969v2 )

License: see the linked page

Ahmed P. Mohamed, Byunghyun Lee, Yaguang Zhang, Max Hollingsworth, C. Robert Anderson, James V. Krogmeier, David J. Love

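(Reference translation) Machine learning (ML) offers a promising solution to pathloss prediction. However, its effectiveness can be degraded by the limited availability of data. To alleviate this, the paper introduces a simulation-enhanced data augmentation method for ML pathloss prediction. The method integrates synthetic data generated by a cellular coverage simulator with independently collected real-world datasets. These datasets were collected through an extensive measurement campaign in different environments, including farms, hilly terrain, and residential areas. This comprehensive data collection provides vital ground truth for model training. A set of channel features was engineered, including geographical attributes derived from LiDAR datasets. These features were then used to train the prediction model, which uses CatBoost, a highly efficient and robust gradient boosting ML algorithm. As the study shows, integrating synthetic data significantly improves the generalizability of the model across environments, with an improvement of approximately 12dB in mean absolute error in the best-case scenario. The analysis also shows that even a small fraction of measurements added to the simulation training set, with proper data balance, can significantly enhance the model's performance.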

Machine learning (ML) offers a promising solution to pathloss prediction. However, its effectiveness can be degraded by the limited availability of data. To alleviate these challenges, this paper introduces a novel simulation-enhanced data augmentation method for ML pathloss prediction. Our method integrates synthetic data generated from a cellular coverage simulator and independently collected real-world datasets. These datasets were collected through an extensive measurement campaign in different environments, including farms, hilly terrains, and residential areas. This comprehensive data collection provides vital ground truth for model training. A set of channel features was engineered, including geographical attributes derived from LiDAR datasets. These features were then used to train our prediction model, incorporating the highly efficient and robust gradient boosting ML algorithm, CatBoost. The integration of synthetic data, as demonstrated in our study, significantly improves the generalizability of the model in different environments, achieving a remarkable improvement of approximately 12dB in terms of mean absolute error for the best-case scenario. Moreover, our analysis reveals that even a small fraction of measurements added to the simulation training set, with proper data balance, can significantly enhance the model's performance.

Translated: 2024-02-07 11:23:49 Published: 2024-02-06

# Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization ( http://arxiv.org/abs/2402.01965v2 )

License: see the linked page

Fangzhao Zhang, Mert Pilanci

Diffusion models are becoming widely used in state-of-the-art image, video, and audio generation. Score-based diffusion models stand out among these methods; they require estimating the score function of the input data distribution. In this study, we present a theoretical framework for analyzing two-layer neural network-based diffusion models by reframing score matching and denoising score matching as convex optimization. Although existing diffusion theory is mainly asymptotic, we characterize the exact predicted score function and establish a convergence result for neural network-based diffusion models with finite data. This work contributes to understanding what neural network-based diffusion models learn in non-asymptotic settings.

Translated: 2024-02-07 11:23:31 Published: 2024-02-06

# Calibrated Uncertainty Quantification for Operator Learning via Conformal Prediction ( http://arxiv.org/abs/2402.01960v2 )

License: see the linked page

Ziqi Ma, Kamyar Azizzadenesheli, Anima Anandkumar

Operator learning has been increasingly adopted in scientific and engineering applications, many of which require calibrated uncertainty quantification. Since the output of operator learning is a continuous function, quantifying uncertainty simultaneously at all points in the domain is challenging. Current methods consider calibration at a single point or over one scalar function, or make strong assumptions such as Gaussianity. We propose a risk-controlling quantile neural operator, a distribution-free, finite-sample functional calibration conformal prediction method. We provide a theoretical calibration guarantee on the coverage rate, defined as the expected percentage of points on the function domain whose true value lies within the predicted uncertainty ball. Empirical results on a 2D Darcy flow and a 3D car surface pressure prediction task validate our theoretical results, demonstrating calibrated coverage and efficient uncertainty bands that outperform baseline methods. In particular, on the 3D problem, our method is the only one that meets the target calibration percentage (the percentage of test samples for which the uncertainty estimates are calibrated) of 98%.
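
The coverage rate defined in the abstract can be made concrete with a small sketch. This is only an illustration, not the authors' implementation: it assumes the function is discretized on a finite grid with a pointwise band radius, and the names `coverage_rate`, `is_calibrated`, `truth`, `pred`, `radius`, and the default `target` are hypothetical.

```python
def coverage_rate(truth, pred, radius):
    """Fraction of grid points whose true value lies inside pred +/- radius.

    truth, pred and radius are equal-length sequences of floats, one entry
    per grid point of the discretized function domain (an assumption of
    this sketch, not something stated in the abstract).
    """
    if not (len(truth) == len(pred) == len(radius)) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    covered = sum(1 for t, p, r in zip(truth, pred, radius) if abs(t - p) <= r)
    return covered / len(truth)


def is_calibrated(truth, pred, radius, target=0.95):
    """True when this sample's coverage rate reaches the chosen target level."""
    return coverage_rate(truth, pred, radius) >= target
```

Under these assumptions, the "calibration percentage" mentioned in the abstract would correspond to the share of test samples for which `is_calibrated` returns `True`.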

Translated: 2024-02-07 11:23:21 Published: 2024-02-06

# Large Language Model Agent for Hyper-Parameter Optimization ( http://arxiv.org/abs/2402.01881v2 )

License: see the linked page

Siyi Liu, Chen Gao, Yong Li

Hyperparameter optimization is critical in modern machine learning, requiring expert knowledge, numerous trials, and high computational and human resources. Despite the advancements in Automated Machine Learning (AutoML), challenges in terms of trial efficiency, setup complexity, and interoperability still persist. To address these issues, we introduce a novel paradigm that leverages Large Language Models (LLMs) to automate hyperparameter optimization across diverse machine learning tasks, which we name AgentHPO (short for LLM Agent-based Hyperparameter Optimization). Specifically, AgentHPO processes the task information autonomously, conducts experiments with specific hyperparameters (HPs), and iteratively optimizes them based on historical trials. Compared to traditional AutoML methods, this human-like optimization process largely reduces the number of required trials, simplifies the setup process, and enhances interpretability and user trust. Extensive empirical experiments conducted on 12 representative machine learning tasks indicate that AgentHPO not only matches but often surpasses the best human trials in terms of performance while simultaneously providing explainable results. Further analysis sheds light on the strategies employed by the LLM in optimizing these tasks, highlighting its effectiveness and adaptability in various scenarios.

Translated: 2024-02-07 11:23:03 Published: 2024-02-06

# QPP and HPPK: Unifying Non-Commutativity for Quantum-Secure Cryptography with Galois Permutation Group ( http://arxiv.org/abs/2402.01852v2 )

License: see the linked page

Randy Kuang

In response to the evolving landscape of quantum computing and the escalating vulnerabilities in classical cryptographic systems, our paper introduces a unified cryptographic framework. Rooted in the work of Kuang et al., we leverage two novel primitives: the Quantum Permutation Pad (QPP) for symmetric key encryption and the Homomorphic Polynomial Public Key (HPPK) for Key Encapsulation Mechanism (KEM) and Digital Signatures (DS). Our approach confronts the challenges posed by quantum advancements. Utilizing the Galois Permutation Group's matrix representations and inheriting its bijective and non-commutative properties, QPP achieves quantum-secure symmetric key encryption, extending Shannon's perfect secrecy to both classical and quantum-native systems. Meanwhile, HPPK, free from NP-hard problems, fortifies symmetric encryption for the plain public key. It accomplishes this by concealing the mathematical structure through modular multiplications or arithmetic representations of the Galois Permutation Group over hidden rings, harnessing their partial homomorphic properties. This allows for secure computation on encrypted data during secret encapsulations, bolstering the security of the plain public key. The integration of KEM and DS within HPPK cryptography yields compact key, cipher, and signature sizes, demonstrating exceptional performance. This paper unifies QPP and HPPK under the Galois Permutation Group, marking a significant advancement in laying the groundwork for quantum-resistant cryptographic protocols. Our contribution propels the development of secure communication systems in the era of quantum computing.

Translated: 2024-02-07 11:22:42 Published: 2024-02-06

# LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks ( http://arxiv.org/abs/2402.01817v2 )

License: see the linked page

Subbarao Kambhampati, Karthik Valmeekam, Lin Guan, Kaya Stechly, Mudit Verma, Siddhant Bhambri, Lucas Saldyt, Anil Murthy

There is considerable confusion about the role of Large Language Models (LLMs) in planning and reasoning tasks. On one side are over-optimistic claims that LLMs can indeed do these tasks with just the right prompting or self-verification strategies. On the other side are perhaps over-pessimistic claims that all LLMs are good for in planning/reasoning tasks is to act as mere translators of the problem specification from one syntactic format to another, and to ship the problem off to external symbolic solvers. In this position paper, we take the view that both these extremes are misguided. We argue that auto-regressive LLMs cannot, by themselves, do planning or self-verification (which is after all a form of reasoning), and shed some light on the reasons for misunderstandings in the literature. We will also argue that LLMs should be viewed as universal approximate knowledge sources that have much more meaningful roles to play in planning/reasoning tasks beyond simple front-end/back-end format translators. We present a vision of **LLM-Modulo Frameworks** that combine the strengths of LLMs with external model-based verifiers in a tighter bi-directional interaction regime. We will show how the models driving the external verifiers themselves can be acquired with the help of LLMs. We will also argue that rather than simply pipelining LLMs and symbolic components, this LLM-Modulo Framework provides a better neuro-symbolic approach that offers tighter integration between LLMs and symbolic components, and allows extending the scope of model-based planning/reasoning regimes towards more flexible knowledge, problem, and preference specifications.
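
One way to picture the bi-directional interaction between an LLM and external verifiers is a generate-and-test loop. The sketch below is an assumption-laden illustration rather than the authors' framework: `propose` stands in for an LLM call, the verifiers are plain callables, and every name (`llm_modulo_loop`, `toy_propose`, `toy_verifier`, `max_rounds`) is hypothetical.

```python
def llm_modulo_loop(propose, verifiers, max_rounds=5):
    """Generate-test loop: a proposer suggests plans, verifiers critique them.

    propose(feedback) -> candidate plan; it stands in for an LLM call.
    verifiers: callables returning a list of critique strings (empty = accepted).
    Returns the first plan that every verifier accepts, or None.
    """
    feedback = []
    for _ in range(max_rounds):
        plan = propose(feedback)
        # Collect critiques from all verifiers and feed them back next round.
        feedback = [msg for check in verifiers for msg in check(plan)]
        if not feedback:
            return plan  # every verifier signed off
    return None


# Toy usage: the "LLM" is a stub that repairs its plan from the critiques.
def toy_propose(feedback):
    plan = ["pick up block", "stack block"]
    if any("unstack" in msg for msg in feedback):
        plan.insert(0, "unstack block")
    return plan


def toy_verifier(plan):
    return [] if plan and plan[0] == "unstack block" else ["must unstack first"]
```

In this sketch the proposer never has to be correct on its own; the loop only returns a plan after the external checks pass, which is the division of labor the abstract argues for.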

Translated: 2024-02-07 11:22:13 Published: 2024-02-06

# Large Language Models for Time Series: A Survey ( http://arxiv.org/abs/2402.01801v2 )

License: see the linked page

Xiyuan Zhang, Ranak Roy Chowdhury, Rajesh K. Gupta, Jingbo Shang

Large Language Models (LLMs) have seen significant use in domains such as natural language processing and computer vision. Going beyond text, images, and graphics, LLMs present significant potential for the analysis of time series data, benefiting domains such as climate, IoT, healthcare, traffic, audio, and finance. This survey paper provides an in-depth exploration and a detailed taxonomy of the various methodologies employed to harness the power of LLMs for time series analysis. We address the inherent challenge of bridging the gap between LLMs' original text data training and the numerical nature of time series data, and explore strategies for transferring and distilling knowledge from LLMs to numerical time series analysis. We detail various methodologies, including (1) direct prompting of LLMs, (2) time series quantization, (3) alignment techniques, (4) utilization of the vision modality as a bridging mechanism, and (5) the combination of LLMs with tools. Additionally, this survey offers a comprehensive overview of the existing multimodal time series and text datasets and delves into the challenges and future opportunities of this emerging field. We maintain an up-to-date GitHub repository which includes all the papers and datasets discussed in the survey.
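
As a rough illustration of methodology (2), time series quantization, the sketch below maps real values to a small vocabulary of integer tokens by uniform binning. The survey covers several quantization schemes; uniform binning is simply the easiest one to show, and the names `quantize_series`, `to_prompt`, and `num_bins` are hypothetical.

```python
def quantize_series(values, num_bins=10):
    """Map each real value to a token id in [0, num_bins - 1] by uniform binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant series collapses to one token
    width = (hi - lo) / num_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def to_prompt(values, num_bins=10):
    """Render the token ids as a space-separated string for a text model."""
    return " ".join(str(t) for t in quantize_series(values, num_bins))
```

The resulting string can be placed in a text prompt, at the cost of losing resolution inside each bin.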

Translated: 2024-02-07 11:21:39 Published: 2024-02-06

# DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models ( http://arxiv.org/abs/2402.03300v2 )

License: see the linked page

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y.K. Li, Y. Wu, Daya Guo

Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors. First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
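
The 60.9% figure relies on self-consistency, which is commonly described as sampling several solutions and taking a majority vote over their final answers. A minimal sketch of that vote follows; the function name `self_consistency` and the tie-breaking rule are assumptions of this example, not details taken from the paper.

```python
from collections import Counter


def self_consistency(answers):
    """Return the most frequent final answer among sampled solutions.

    Ties are broken in favor of the answer that appeared first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]
```

In the setting of the abstract, `answers` would hold the 64 final answers extracted from the sampled solutions to one problem.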

Translated: 2024-02-07 11:14:52 Published: 2024-02-06

# Cool-chic video: Learned video coding with 800 parameters ( http://arxiv.org/abs/2402.03179v2 )

License: see the linked page

Thomas Leguay, Théo Ladune, Pierrick Philippe, Olivier Déforges

We propose a lightweight learned video codec with 900 multiplications per decoded pixel and 800 parameters overall. To the best of our knowledge, this is one of the neural video codecs with the lowest decoding complexity. It is built upon the overfitted image codec Cool-chic and supplements it with an inter coding module to leverage the video's temporal redundancies. The proposed model is able to compress videos using both low-delay and random access configurations and achieves rate-distortion performance close to AVC while outperforming other overfitted codecs such as FFNeRV. The system is made open-source: orange-opensource.github.io/Cool-Chic.

Translated: 2024-02-07 11:14:32 Published: 2024-02-06

# Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization ( http://arxiv.org/abs/2402.03161v2 )

License: see the linked page

Yang Jin, Zhicheng Sun, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu

In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos. Compared to static images, video poses unique challenges for effective large-scale pre-training due to the modeling of its spatiotemporal dynamics. In this paper, we address such limitations in video-language pre-training with an efficient video decomposition that represents each video as keyframes and temporal motions. These are then adapted to an LLM using well-designed tokenizers that discretize visual and temporal information as a few tokens, thus enabling unified generative pre-training of videos, images, and text. At inference, the generated tokens from the LLM are carefully recovered to the original continuous pixel space to create various video content. Our proposed framework is capable of both comprehending and generating image and video content, as demonstrated by its competitive performance across 13 multimodal benchmarks in image and video understanding and generation. Our code and models will be available at https://video-lavit.github.io.

Translated: 2024-02-07 11:14:21 Published: 2024-02-06

# EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models ( http://arxiv.org/abs/2402.03049v2 )

License: see the linked page

Yixin Ou, Ningyu Zhang, Honghao Gui, Ziwen Xu, Shuofei Qiao, Yida Xue, Runnan Fang, Kangwei Liu, Lei Li, Zhen Bi, Guozhou Zheng, Huajun Chen

In recent years, instruction tuning has gained increasing attention and emerged as a crucial technique to enhance the capabilities of Large Language Models (LLMs). To construct high-quality instruction datasets, many instruction processing approaches have been proposed, aiming to achieve a delicate balance between data quantity and data quality. Nevertheless, due to inconsistencies that persist among various instruction processing methods, there is no standard open-source instruction processing implementation framework available for the community, which hinders practitioners from further development and progress. To facilitate instruction processing research and development, we present EasyInstruct, an easy-to-use instruction processing framework for LLMs, which modularizes instruction generation, selection, and prompting, while also considering their combination and interaction. EasyInstruct is publicly released and actively maintained at https://github.com/zjunlp/EasyInstruct, along with a running demo app at https://huggingface.co/spaces/zjunlp/EasyInstruct for a quick start, calling for broader research centered on instruction data.

Translated: 2024-02-07 11:14:03 Published: 2024-02-06

# Data-induced multiscale losses and efficient multirate gradient descent schemes ( http://arxiv.org/abs/2402.03021v2 )

License: see the linked page

Juncai He, Liangchen Liu, and Yen-Hsi Richard Tsai

This paper investigates the impact of multiscale data on machine learning algorithms, particularly in the context of deep learning. A dataset is multiscale if its distribution shows large variations in scale across different directions. This paper reveals multiscale structures in the loss landscape, including in its gradients and Hessians, that are inherited from the data. Correspondingly, it introduces a novel gradient descent approach, drawing inspiration from multiscale algorithms used in scientific computing. This approach seeks to transcend empirical learning rate selection, offering a more systematic, data-informed strategy to enhance training efficiency, especially in the later stages.
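
To see why scale differences across directions matter, consider a toy quadratic loss whose curvature differs by two orders of magnitude between two coordinates. The sketch below is not the paper's scheme (the abstract does not give its details); it only contrasts a single learning rate with one rate per direction, and all names and constants are hypothetical.

```python
def gradient_descent(scales, rates, start, steps):
    """Minimize 0.5 * sum(s_i * w_i**2) with one learning rate per coordinate."""
    w = list(start)
    for _ in range(steps):
        # The gradient of the quadratic along coordinate i is s_i * w_i.
        w = [wi - lr * s * wi for wi, s, lr in zip(w, scales, rates)]
    return w


scales = [100.0, 1.0]  # curvatures two orders of magnitude apart
# A single rate must stay small enough for the stiff direction ...
single = gradient_descent(scales, [0.01, 0.01], [1.0, 1.0], steps=50)
# ... whereas per-direction rates can match each curvature.
multi = gradient_descent(scales, [0.01, 1.0], [1.0, 1.0], steps=50)
```

With the shared rate, the flat direction has barely moved after 50 steps, whereas the per-direction rates drive both coordinates to the minimum. That gap is the kind of inefficiency a data-informed multirate strategy aims to remove.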

Translated: 2024-02-07 11:13:42 Published: 2024-02-06

# GitBug-Java: A Reproducible Benchmark of Recent Java Bugs ( http://arxiv.org/abs/2402.02961v2 )

License: see the linked page

André Silva, Nuno Saavedra, Martin Monperrus

Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with contemporary development practices. Moreover, reproducibility, a key scientific principle, has been lacking in bug-fix benchmarks. To address these gaps, we present GitBug-Java, a reproducible benchmark of recent Java bugs. GitBug-Java features 199 bugs extracted from the 2023 commit history of 55 notable open-source repositories. The methodology for building GitBug-Java ensures the preservation of bug-fixes in fully reproducible environments. We publish GitBug-Java at https://github.com/gitbugactions/gitbug-java.

Translated: 2024-02-07 11:13:31 Published: 2024-02-06

# Bayes-Optimal Fair Classification with Linear Disparity Constraints via Pre-, In-, and Post-processing ( http://arxiv.org/abs/2402.02817v2 )

License: see the linked page

Xianli Zeng, Guang Cheng and Edgar Dobriban

Machine learning algorithms may have disparate impacts on protected groups. To address this, we develop methods for Bayes-optimal fair classification, aiming to minimize classification error subject to given group fairness constraints. We introduce the notion of *linear disparity measures*, which are linear functions of a probabilistic classifier, and *bilinear disparity measures*, which are also linear in the group-wise regression functions. We show that several popular disparity measures (the deviations from demographic parity, equality of opportunity, and predictive equality) are bilinear. We find the form of Bayes-optimal fair classifiers under a single linear disparity measure by uncovering a connection with the Neyman-Pearson lemma. For bilinear disparity measures, Bayes-optimal fair classifiers become group-wise thresholding rules. Our approach can also handle multiple fairness constraints (such as equalized odds), and the common scenario in which the protected attribute cannot be used at the prediction phase. Leveraging our theoretical results, we design methods that learn fair Bayes-optimal classifiers under bilinear disparity constraints. Our methods cover three popular approaches to fairness-aware classification, via pre-processing (Fair Up- and Down-Sampling), in-processing (Fair Cost-Sensitive Classification), and post-processing (a Fair Plug-In Rule). Our methods control disparity directly while achieving near-optimal fairness-accuracy tradeoffs. We show empirically that our methods compare favorably to existing algorithms.
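
The statement that fair classifiers become group-wise thresholding rules can be illustrated with a toy example that measures the demographic parity gap before and after adjusting one group's threshold. This is a hand-picked sketch, not the paper's estimator: the scores, the group labels `"a"` and `"b"`, the thresholds, and the function names are all hypothetical.

```python
def groupwise_predict(scores, groups, thresholds):
    """Predict 1 when a score reaches the threshold assigned to its group."""
    return [int(s >= thresholds[g]) for s, g in zip(scores, groups)]


def demographic_parity_gap(preds, groups):
    """Absolute difference in positive-prediction rates between groups a and b."""
    def rate(g):
        members = [p for p, gi in zip(preds, groups) if gi == g]
        return sum(members) / len(members)
    return abs(rate("a") - rate("b"))


scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
groups = ["a", "a", "a", "b", "b", "b"]
# One shared threshold leaves the groups with different positive rates ...
shared = groupwise_predict(scores, groups, {"a": 0.5, "b": 0.5})
# ... while a lower threshold for group b equalizes them in this toy data.
adjusted = groupwise_predict(scores, groups, {"a": 0.5, "b": 0.25})
```

Here the thresholds are chosen by hand; the point of the paper is that the optimal ones can be derived from the disparity constraint rather than guessed.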

Translated: 2024-02-07 11:13:20 Published: 2024-02-06

# Rethinking Optimization and Architecture for Tiny Language Models ( http://arxiv.org/abs/2402.02791v2 )

License: see the linked page

Yehui Tang, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Liu, Shangling Jui, Kai Han, Yunhe Wang

The power of large language models (LLMs) has been demonstrated through large amounts of data and computing resources. However, the application of language models on mobile devices faces huge challenges in computation and memory costs; that is, tiny language models with high performance are urgently required. Because the training process is highly complex, many details of optimizing language models are seldom studied carefully. In this study, based on a tiny language model with 1B parameters, we carefully design a series of empirical studies to analyze the effect of each component. Three perspectives are mainly discussed, i.e., neural architecture, parameter initialization, and optimization strategy. Several design formulas are empirically shown to be especially effective for tiny language models, including tokenizer compression, architecture tweaking, parameter inheritance, and multiple-round training. We then train PanGu-$\pi$-1B Pro and PanGu-$\pi$-1.5B Pro on 1.6T multilingual corpora, following the established formulas. Experimental results demonstrate that the improved optimization and architecture yield a notable average improvement of 8.87 on benchmark evaluation sets for PanGu-$\pi$-1B Pro. In addition, PanGu-$\pi$-1.5B Pro surpasses a range of SOTA models with larger model sizes, validating its superior performance. The code is available at https://github.com/YuchuanTian/RethinkTinyLM.

Translated: 2024-02-07 11:12:50 Published: 2024-02-06

# Contrastive Diffuser: Planning Towards High Return States via Contrastive Learning ( http://arxiv.org/abs/2402.02772v2 )

License: see the linked page

Yixiang Shan, Zhengbang Zhu, Ting Long, Qifan Liang, Yi Chang, Weinan Zhang, Liang Yin

Applying diffusion models in reinforcement learning for long-term planning has gained much attention recently. Several diffusion-based methods have successfully leveraged the modeling capabilities of diffusion for arbitrary distributions. These methods generate subsequent trajectories for planning and have demonstrated significant improvement. However, they are limited by their plain base distributions and by overlooking the diversity of samples, in which different states have different returns. They simply leverage diffusion to learn the distribution of the offline dataset and generate trajectories whose states share the same distribution as the offline dataset. As a result, the probability of these models reaching high-return states is largely dependent on the dataset distribution. Even when equipped with a guidance model, performance is still suppressed. To address these limitations, in this paper, we propose a novel method called CDiffuser, which devises a return contrast mechanism to pull the states in generated trajectories towards high-return states while pushing them away from low-return states, thereby improving the base distribution. Experiments on 14 commonly used D4RL benchmarks demonstrate the effectiveness of our proposed method.

Translated: 2024-02-07 11:12:28 Published: 2024-02-06

# "It's how you do things that matters": Attending to Process to Better Serve Indigenous Communities with Language Technologies ( http://arxiv.org/abs/2402.02639v2 )

License: see the linked page

Ned Cooper, Courtney Heldreth, Ben Hutchinson

Indigenous languages are historically under-served by Natural Language Processing (NLP) technologies, but this is changing for some languages with the recent scaling of large multilingual models and an increased focus by the NLP community on endangered languages. This position paper explores ethical considerations in building NLP technologies for Indigenous languages, based on the premise that such projects should primarily serve Indigenous communities. We report on interviews with 17 researchers working in or with Aboriginal and/or Torres Strait Islander communities on language technology projects in Australia. Drawing on insights from the interviews, we recommend practices for NLP researchers to increase attention to the process of engagements with Indigenous communities, rather than focusing only on decontextualised artefacts.

Translated: 2024-02-07 11:12:12 Published: 2024-02-06

# Obstacle Avoidance Deep Reinforcement Learning-Based Trajectory Planner with Robust Low-Level Control for Robotic Manipulators ( http://arxiv.org/abs/2402.02551v2 )

License: see the linked page

Mehdi Heydari Shahna, Seyed Adel Alizadeh Kolagar, Jouni Mattila

In robotics, contemporary strategies are learning-based, characterized by a complex black-box nature and a lack of interpretability, which may pose challenges in ensuring stability and safety. To address these issues, we propose integrating an obstacle-free deep reinforcement learning (DRL) trajectory planner with a novel auto-tuning low- and joint-level control strategy, all while actively engaging in the learning phase through interactions with the environment. This approach circumvents the complexities associated with computations while also addressing nonrepetitive and random obstacle avoidance tasks. First, a model-free DRL agent is employed to plan velocity-bounded and obstacle-free motion for a manipulator with n degrees of freedom (DoF) in task space through joint-level reasoning. This plan is then input into a robust subsystem-based adaptive controller, which produces the necessary torques, while the Cuckoo Search Optimization (CSO) algorithm tunes the control gains to minimize the time required to reach the target, the time taken to stabilize, the maximum deviation from the desired value, and the persistent tracking error in the steady state. This approach guarantees that position and velocity errors converge exponentially to zero in an unfamiliar environment, despite unknown robotic manipulator modeling. Theoretical assertions are validated through the presentation of simulation outcomes.

翻訳日:2024-02-07 11:11:57 公開日:2024-02-06

# 教師なし画像インスタンスセグメンテーションのための深いスペクトル改善

Deep Spectral Improvement for Unsupervised Image Instance Segmentation ( http://arxiv.org/abs/2402.02474v2 )

ライセンス: Link先を確認

Farnoosh Arefi, Amir M. Mansourian, Shohreh Kasaei

(参考訳) 深層スペクトル法は,自己教師付き学習を用いて特徴を抽出し,アフィニティ行列のラプラシアンを利用して固有セグメント(eigensegments)を得ることにより,画像分解プロセスをグラフ分割タスクとして再構成する。しかし、深層スペクトル法の文脈では、他のタスクに比べてインスタンスセグメンテーションにはあまり注意が払われていない。本稿では,自己教師付きバックボーンから抽出した特徴マップのすべてのチャネルが,インスタンスセグメンテーションに十分な情報を含んでいるわけではないという事実を扱う。実際、一部のチャネルはノイズが多く、タスクの精度を妨げている。そこで本研究では,ノイズチャネルリダクション (NCR) と偏差ベースチャネルリダクション (DCR) の2つのチャネルリダクションモジュールを提案する。 NCRはノイズである可能性が低いエントロピーの低いチャネルを保持し、DCRは効果的なインスタンスセグメンテーションに十分な情報を持たない標準偏差の低いチャネルを除去する。さらに, 深層スペクトル法で一般的に用いられるドット積は, 特徴マップの値に敏感であるためインスタンスセグメンテーションには適さず, 不正確なインスタンスセグメントを生じさせる可能性があることを示す。この問題に対処するために、Bray-Curtis over Chebyshev (BoC)と呼ばれる新しい類似度指標を提案する。BoCは特徴の値に加えてその分布も考慮し、インスタンスセグメンテーションのためのより堅牢な類似度尺度を提供する。 Youtube-VIS2019データセットにおける定量的および定性的な結果は、提案したチャネルリダクション手法と、アフィニティ行列の作成に従来のドット積の代わりにBoCを用いることによる改善を示している。これらの改善は、平均Intersection over Unionと抽出されたインスタンスセグメントの両面で観察され、インスタンスセグメンテーション性能の向上を示している。コードは、https://github.com/farnooshar/SpecUnIISで入手できる。

Deep spectral methods reframe the image decomposition process as a graph partitioning task by extracting features using self-supervised learning and utilizing the Laplacian of the affinity matrix to obtain eigensegments. However, instance segmentation has received less attention compared to other tasks within the context of deep spectral methods. This paper addresses the fact that not all channels of the feature map extracted from a self-supervised backbone contain sufficient information for instance segmentation purposes. In fact, some channels are noisy and hinder the accuracy of the task. To overcome this issue, this paper proposes two channel reduction modules: Noise Channel Reduction (NCR) and Deviation-based Channel Reduction (DCR). The NCR retains channels with lower entropy, as they are less likely to be noisy, while DCR prunes channels with low standard deviation, as they lack sufficient information for effective instance segmentation. Furthermore, the paper demonstrates that the dot product, commonly used in deep spectral methods, is not suitable for instance segmentation due to its sensitivity to feature map values, potentially leading to incorrect instance segments. A new similarity metric called Bray-Curtis over Chebyshev (BoC) is proposed to address this issue. It takes into account the distribution of features in addition to their values, providing a more robust similarity measure for instance segmentation. Quantitative and qualitative results on the Youtube-VIS2019 dataset highlight the improvements achieved by the proposed channel reduction methods and the use of BoC instead of the conventional dot product for creating the affinity matrix. These improvements are observed in terms of mean Intersection over Union and extracted instance segments, demonstrating enhanced instance segmentation performance. The code is available at: https://github.com/farnooshar/SpecUnIIS
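
The two channel filters described in the abstract can be illustrated with a small stand-alone sketch. It is not the authors' implementation: the histogram-based entropy estimate, the bin count, and the fixed thresholds `max_entropy` and `min_std` are all assumptions made here for illustration, and each channel is treated as a flat list of numbers.

```python
import math
from collections import Counter


def channel_entropy(values, bins=8):
    """Shannon entropy (in bits) of a histogram over one channel's values.

    The histogram estimate and the bin count are assumptions of this sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant channel has zero entropy
    width = (hi - lo) / bins
    # Assign each value to a bin; the maximum value falls into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def channel_std(values):
    """Population standard deviation of one channel."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def reduce_channels(channels, max_entropy, min_std):
    """Return the indices of channels that survive both filters."""
    kept = []
    for index, channel in enumerate(channels):
        if channel_entropy(channel) > max_entropy:
            continue  # NCR-style step: drop high-entropy (likely noisy) channels
        if channel_std(channel) < min_std:
            continue  # DCR-style step: drop low-deviation (uninformative) channels
        kept.append(index)
    return kept


channels = [
    [0, 0, 0, 0, 1, 1, 1, 1],                  # two clear levels: kept
    [0.5] * 8,                                 # constant: removed by the deviation filter
    [0, 1, 2, 3, 4, 5, 6, 7],                  # spread over every bin: removed by the entropy filter
]
kept = reduce_channels(channels, max_entropy=2.0, min_std=0.1)
```

With these toy inputs only the first channel passes both filters. In practice the thresholds would have to be chosen per backbone; the abstract does not say how.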

翻訳日:2024-02-07 11:11:30 公開日:2024-02-06

# TopoX: トポロジカルドメインでの機械学習のためのPythonパッケージスイート

TopoX: A Suite of Python Packages for Machine Learning on Topological Domains ( http://arxiv.org/abs/2402.02441v2 )

ライセンス: Link先を確認

Mustafa Hajij, Mathilde Papillon, Florian Frantzen, Jens Agerberg, Ibrahem AlJabea, Ruben Ballester, Claudio Battiloro, Guillermo Bernárdez, Tolga Birdal, Aiden Brent, Peter Chin, Sergio Escalera, Odin Hoff Gardaa, Gurusankar Gopalakrishnan, Devendra Govil, Josef Hoppe, Maneel Reddy Karri, Jude Khouja, Manuel Lecha, Neal Livesay, Jan Meißner, Soham Mukherjee, Alexander Nikitin, Theodore Papamarkou, Jaro Prílepok, Karthikeyan Natesan Ramamurthy, Paul Rosen, Aldo Guzmán-Sáenz, Alessandro Salatiello, Shreyas N. Samaga, Michael T. Schaub, Luca Scofano, Indro Spinelli, Lev Telyatnikov, Quang Truong, Robin Walters, Maosheng Yang, Olga Zaghen, Ghada Zamzmi, Ali Zia, Nina Miolane

(参考訳) グラフを拡張するトポロジカルドメイン(ハイパーグラフ、単体複体、セル複体、パス複体、組合せ複体)上での計算と機械学習のための、信頼性が高くユーザフレンドリーなビルディングブロックを提供するPythonソフトウェアスイートであるtopoxを紹介する。 topoxは以下の3つのパッケージで構成されている: toponetxは、ノード、エッジ、高次セルの操作を含む、これらのドメインの構築と計算を容易にする。 topoembedxは、node2vecのような一般的なグラフベースの埋め込みアルゴリズムに似た、トポロジカルドメインをベクトル空間に埋め込む方法を提供する。 topomodelxはPyTorch上に構築されており、トポロジカルドメイン上のニューラルネットワークのための高次メッセージパッシング関数の包括的なツールボックスを提供する。 topoxの広範囲にドキュメント化され、ユニットテストされたソースコードは、MITライセンス下でhttps://github.com/pyt-teamで入手できる。

We introduce topox, a Python software suite that provides reliable and user-friendly building blocks for computing and machine learning on topological domains that extend graphs: hypergraphs, simplicial, cellular, path and combinatorial complexes. topox consists of three packages: toponetx facilitates constructing and computing on these domains, including working with nodes, edges and higher-order cells; topoembedx provides methods to embed topological domains into vector spaces, akin to popular graph-based embedding algorithms such as node2vec; topomodelx is built on top of PyTorch and offers a comprehensive toolbox of higher-order message passing functions for neural networks on topological domains. The extensively documented and unit-tested source code of topox is available under MIT license at https://github.com/pyt-team.
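
To make "nodes, edges and higher-order cells" concrete, the following stand-alone sketch builds the face set of a small simplicial complex with the standard library only. It deliberately does not use the TopoX packages; the function names `simplicial_closure` and `cells_by_rank` are hypothetical and do not correspond to any TopoX API.

```python
from itertools import combinations


def simplicial_closure(maximal_simplices):
    """Return every face of the given simplices as a set of sorted vertex tuples.

    A simplicial complex is closed under taking subsets, so a triangle
    contributes its three edges and three nodes as well as itself.
    """
    faces = set()
    for simplex in maximal_simplices:
        vertices = sorted(set(simplex))
        for size in range(1, len(vertices) + 1):
            faces.update(combinations(vertices, size))
    return faces


def cells_by_rank(faces):
    """Group faces by rank: 0 for nodes, 1 for edges, 2 for triangles, and so on."""
    ranks = {}
    for face in faces:
        ranks.setdefault(len(face) - 1, []).append(face)
    return {rank: sorted(cells) for rank, cells in sorted(ranks.items())}


# One filled triangle (1, 2, 3) plus a dangling edge (3, 4).
ranks = cells_by_rank(simplicial_closure([(1, 2, 3), (3, 4)]))
```

Here `ranks[0]` holds the four nodes, `ranks[1]` the four edges, and `ranks[2]` the single triangle, which is the higher-order cell that a plain graph cannot represent.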

翻訳日:2024-02-07 11:11:00 公開日:2024-02-06

# 超高速な道路セグメンテーションのための低レベル表現の活用

Exploiting Low-level Representations for Ultra-Fast Road Segmentation ( http://arxiv.org/abs/2402.02430v2 )

ライセンス: Link先を確認

Huan Zhou, Feng Xue, Yucong Li, Shi Gong, Yiqun Li, Yu Zhou

(参考訳) 組込みプラットフォーム上でリアルタイム性と精度を両立することは、常に道路セグメンテーション手法の目標であった。そのため、多くの軽量ネットワークが提案されている。しかし、これらは道路が「もの」(特定の識別可能な物体)ではなく「スタッフ」(背景や環境の要素)であるという事実を無視しており、このことが、高レベル特徴ではなく低レベル特徴で道路を表現する可能性を探るきっかけとなった。意外なことに、主流ネットワークモデルの最初のステージは、セグメンテーションのために道路のほとんどのピクセルを表現するのに十分である。そこで我々は,低レベル特徴支配型の道路セグメンテーションネットワーク(LFD-RoadSeg)を提案する。具体的には、LFD-RoadSegはバイラテラル構造を採用している。空間詳細ブランチは、まずResNet-18の第1ステージによって道路の低レベル特徴表現を抽出するように設計されている。低レベル特徴において道路と誤認されるテクスチャのない領域を抑制するために、コンテキスト意味ブランチはコンテキスト特徴を高速に抽出するように設計されている。この目的のために、第2ブランチでは、入力画像を非対称にダウンサンプリングし、ResNet-18の第3ステージに匹敵する受容野をより短い時間で実現する集約モジュールを設計する。最後に、低レベル特徴から道路をセグメント化するために、低レベル表現とコンテキスト特徴の間の画素単位のアテンションを計算し、このアテンションによって非道路の低レベル応答を抑制する選択的融合モジュールを提案する。 KITTI-Roadでは、LFD-RoadSegは最大F1値(MaxF)95.21%、平均精度93.71%を達成し、単一のTITAN Xpで238FPS、Jetson TX2で54FPSに到達した。モデルサイズはわずか936kパラメータである。ソースコードはhttps://github.com/zhouhuan-hust/LFD-RoadSegで入手できる。

Achieving real-time performance and accuracy on embedded platforms has always been the pursuit of road segmentation methods. To this end, many lightweight networks have been proposed. However, they ignore the fact that roads are "stuff" (background or environmental elements) rather than "things" (specific identifiable objects), which inspires us to explore the feasibility of representing roads with low-level instead of high-level features. Surprisingly, we find that the primary stage of mainstream network models is sufficient to represent most pixels of the road for segmentation. Motivated by this, we propose a Low-level Feature Dominated Road Segmentation network (LFD-RoadSeg). Specifically, LFD-RoadSeg employs a bilateral structure. The spatial detail branch is first designed to extract a low-level feature representation of the road using the first stage of ResNet-18. To suppress texture-less regions mistaken as the road in the low-level feature, the context semantic branch is then designed to extract the context feature in a fast manner. To this end, in the second branch, we asymmetrically downsample the input image and design an aggregation module to achieve comparable receptive fields to the third stage of ResNet-18 but with less time consumption. Finally, to segment the road from the low-level feature, a selective fusion module is proposed to calculate pixel-wise attention between the low-level representation and context feature, and suppress the non-road low-level response by this attention. On KITTI-Road, LFD-RoadSeg achieves a maximum F1-measure (MaxF) of 95.21% and an average precision of 93.71%, while reaching 238 FPS on a single TITAN Xp and 54 FPS on a Jetson TX2, all with a compact model size of just 936k parameters. The source code is available at https://github.com/zhouhuan-hust/LFD-RoadSeg.
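
The maximum F1-measure (MaxF) quoted above is the best F1 score obtained when the per-pixel road confidence is thresholded at different values. The sketch below shows that computation on a handful of made-up pixels; the function names and the threshold grid are assumptions of this example, not the benchmark's official evaluation code.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 score when pixels with score >= threshold are predicted as road."""
    tp = fp = fn = 0
    for score, is_road in zip(scores, labels):
        predicted_road = score >= threshold
        if predicted_road and is_road:
            tp += 1
        elif predicted_road and not is_road:
            fp += 1
        elif not predicted_road and is_road:
            fn += 1
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def max_f1(scores, labels, thresholds):
    """Largest F1 score over the candidate thresholds."""
    return max(f1_at_threshold(scores, labels, t) for t in thresholds)


# Four toy pixels: road confidence and ground truth (1 = road).
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1, 1, 0, 1]
best = max_f1(scores, labels, thresholds=[0.1, 0.3, 0.5, 0.7])
```

For these toy values the lowest threshold wins: every pixel is predicted as road, giving precision 3/4, recall 1, and an F1 of 6/7.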

翻訳日:2024-02-07 11:10:43 公開日:2024-02-06

# Aligner: 弱から強への補正による効率的なアライメントの実現

Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction ( http://arxiv.org/abs/2402.02416v2 )

ライセンス: Link先を確認

Jiaming Ji, Boyuan Chen, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, Yaodong Yang

(参考訳) 大規模言語モデル(LLM)のアライメントへの取り組みは、主にRLHF(Reinforcement Learning from Human Feedback)によって行われる。しかし、RLHFは報酬モデルの訓練やアクター・クリティックのエンジニアリングといった大きな課題に直面しており、重要なことにLLMパラメータへのアクセスが必要である。ここでは、アライメントされた回答とされていない回答の間の補正残差を学習することにより、RLHFプロセス全体をバイパスする新しい効率的なアライメントパラダイムであるAlignerを紹介する。Alignerにはいくつかの大きな利点がある。第1に、教師付き学習によってクエリ・回答・補正データセットで訓練される自己回帰seq2seqモデルであり、最小限のリソースでパラメータ効率の高いアライメントソリューションを提供する。第2に、Alignerは弱から強への一般化(weak-to-strong generalization)を促進し、Alignerの監督信号による大規模な事前学習モデルの微調整は、大きな性能向上を示す。第3に、Alignerはモデルに依存しないプラグアンドプレイモジュールとして機能し、さまざまなオープンソースモデルやAPIベースのモデルに直接適用できる。注目すべきことに、Aligner-7Bは11種類のLLMを平均で有用性21.9%、無害性23.8%改善している(GPT-4は17.5%と26.9%)。 (強い)Llama2-70Bを(弱い)Aligner-13Bの監督で微調整すると、Llama2を有用性で8.2%、無害性で61.6%改善できる。データセットとコードはhttps://aligner2024.github.ioを参照。

Efforts to align Large Language Models (LLMs) are mainly conducted via Reinforcement Learning from Human Feedback (RLHF) methods. However, RLHF encounters major challenges including training reward models, actor-critic engineering, and, importantly, the need for access to LLM parameters. Here we introduce Aligner, a new efficient alignment paradigm that bypasses the whole RLHF process by learning the correctional residuals between the aligned and the unaligned answers. Our Aligner offers several key advantages. Firstly, it is an autoregressive seq2seq model that is trained on the query-answer-correction dataset via supervised learning; this offers a parameter-efficient alignment solution with minimal resources. Secondly, the Aligner facilitates weak-to-strong generalization; finetuning large pretrained models with Aligner's supervisory signals yields a strong performance boost. Thirdly, Aligner functions as a model-agnostic plug-and-play module, allowing for its direct application on different open-source and API-based models. Remarkably, Aligner-7B improves 11 different LLMs by 21.9% in helpfulness and 23.8% in harmlessness on average (GPT-4 by 17.5% and 26.9%). When finetuning (strong) Llama2-70B with (weak) Aligner-13B's supervision, we can improve Llama2 by 8.2% in helpfulness and 61.6% in harmlessness. See our dataset and code at https://aligner2024.github.io
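
The plug-and-play arrangement described above, where a separate corrector rewrites the answer of an unmodified upstream model, can be sketched as a simple composition of two callables. Everything in this example is hypothetical: the type aliases, the function names, and the toy stand-ins are illustrations only and are not taken from the Aligner code base.

```python
from typing import Callable

# Hypothetical type aliases; neither corresponds to a real library API.
UpstreamModel = Callable[[str], str]    # query -> draft answer
Corrector = Callable[[str, str], str]   # (query, draft answer) -> corrected answer


def answer_with_corrector(query: str, upstream: UpstreamModel, corrector: Corrector) -> str:
    """Get a draft from the upstream model, then let the corrector rewrite it.

    The upstream model is only called, never modified, so it could equally be
    an open-weights model or a remote API.
    """
    draft = upstream(query)
    return corrector(query, draft)


# Toy stand-ins so the sketch runs without any real model.
def toy_upstream(query: str) -> str:
    return f"draft answer to: {query}"


def toy_corrector(query: str, draft: str) -> str:
    return draft.replace("draft", "corrected")


result = answer_with_corrector("What is RLHF?", toy_upstream, toy_corrector)
```

Because the corrector sees both the query and the draft, it only has to learn the difference between the two answers, which is the "correctional residual" idea in the abstract.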

翻訳日:2024-02-07 11:10:05 公開日:2024-02-06

PDF登録状況（公開日: 20240206）