Fugu-MT: arXiv Paper Translations

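This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.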

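The operator of this site does not guarantee the quality of the site (including all information and translations) and accepts no responsibility for any consequences arising from the use of the site (including all information and translations).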

Papers with a publication date of 2023-10-30.

Title	Authors	Abstract	Publication date / translation date
# Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust ( http://arxiv.org/abs/2310.10298v2 ) License: see the linked page	Jie Zhou, Mingshen Sun, John Criswell	Rust is one of the most promising systems programming languages to fundamentally solve the memory safety issues that have plagued low-level software for over forty years. However, to accommodate the scenarios where Rust's type rules might be too restrictive for certain systems programming and where programmers opt for performance over security checks, Rust opens security escape hatches allowing writing unsafe source code or calling unsafe libraries. Consequently, unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program as they run in the same monolithic address space as the safe Rust. This problem can be mitigated by isolating unsafe memory objects (those accessed by unsafe code) and sandboxing memory accesses to the unsafe memory. One category of prior work utilizes existing program analysis frameworks on LLVM IR to identify unsafe memory objects and accesses. However, these approaches suffer from prolonged analysis time and low precision. In this paper, we tackle these two challenges using summary-based whole-program analysis on Rust's MIR. The summary-based analysis computes information on demand so as to save analysis time. Performing analysis on Rust's MIR exploits the rich high-level type information inherent to Rust, which is unavailable in LLVM IR. This manuscript is a preliminary study of ongoing research. We have prototyped a whole-program analysis for identifying both unsafe heap allocations and memory accesses to those unsafe heap objects. We report the overhead and the efficacy of the analysis in this paper.	Translated: 2024-03-19 02:23:27 Published: 2023-10-30
# US Microelectronics Packaging Ecosystem: Challenges and Opportunities ( http://arxiv.org/abs/2310.11651v3 ) License: see the linked page	Rouhan Noor, Himanandhan Reddy Kottur, Patrick J Craig, Liton Kumar Biswas, M Shafkat M Khan, Nitin Varshney, Hamed Dalir, Elif Akçalı, Bahareh Ghane Motlagh, Charles Woychik, Yong-Kyu Yoon, Navid Asadizanjani	The semiconductor industry is experiencing a significant shift from traditional methods of shrinking devices and reducing costs. Chip designers actively seek new technological solutions to enhance cost-effectiveness while incorporating more features into the silicon footprint. One promising approach is Heterogeneous Integration (HI), which involves advanced packaging techniques to integrate independently designed and manufactured components using the most suitable process technology. However, adopting HI introduces design and security challenges. To enable HI, research and development of advanced packaging is crucial. Existing research raises possible security threats in the advanced packaging supply chain, as most of the Outsourced Semiconductor Assembly and Test (OSAT) facilities and vendors are offshore. To deal with the increasing demand for semiconductors and to ensure a secure semiconductor supply chain, there are sizable efforts from the United States (US) government to bring semiconductor fabrication facilities onshore. However, the US-based advanced packaging capabilities must also be ramped up to fully realize the vision of establishing a secure, efficient, resilient semiconductor supply chain. Our effort aims to identify the possible bottlenecks and weak links in the advanced packaging supply chain based in the US.	Translated: 2024-03-19 02:13:39 Published: 2023-10-30
# Analyzing eyebrow region for morphed image detection ( http://arxiv.org/abs/2310.19290v1 ) License: see the linked page	Abdullah Zafar, Christoph Busch	Facial images in passports are designated as primary identifiers for the verification of travelers according to the International Civil Aviation Organization (ICAO). Hence, it is important to ascertain the sanctity of the facial images stored in the electronic Machine-Readable Travel Document (eMRTD). With the introduction of automated border control (ABC) systems that rely on face recognition for the verification of travelers, it is even more crucial to have a system to ensure that the image stored in the eMRTD is free from any alteration that can hinder or abuse the normal working of a facial recognition system. One such attack against these systems is the face-morphing attack. Even though many techniques exist to detect morphed images, morphing algorithms are also improving to evade these detections. In this work, we analyze the eyebrow region for morphed image detection. The proposed method is based on analyzing the frequency content of the eyebrow region. The method was evaluated on two datasets that each consisted of morphed images created using two algorithms. The findings suggest that the proposed method can serve as a valuable tool in morphed image detection, and can be used in various applications where image authenticity is critical.	Translated: 2024-03-18 23:51:32 Published: 2023-10-30
# Incorporating Zero-Knowledge Succinct Non-interactive Argument of Knowledge for Blockchain-based Identity Management with off-chain computations ( http://arxiv.org/abs/2310.19452v1 ) License: see the linked page	Pranay Kothari, Deepak Chopra, Manjot Singh, Shivam Bhardwaj, Rudresh Dwivedi	In today's world, secure and efficient biometric authentication is of keen importance. Traditional authentication methods are no longer considered reliable due to their susceptibility to cyber-attacks. Biometric authentication, particularly fingerprint authentication, has emerged as a promising alternative, but it raises concerns about the storage and use of biometric data, as well as centralized storage, which could make it vulnerable to cyber-attacks. In this paper, a novel blockchain-based fingerprint authentication system is proposed that integrates zk-SNARKs, which are zero-knowledge proofs that enable secure and efficient authentication without revealing sensitive biometric information. A KNN-based approach on the FVC2002, FVC2004 and FVC2006 datasets is used to generate a cancelable template for secure, faster, and robust biometric registration and authentication, which is stored using the InterPlanetary File System. The proposed approach provides an average accuracy of 99.01%, 98.97% and 98.52% over the FVC2002, FVC2004 and FVC2006 datasets respectively for fingerprint authentication. Incorporation of zk-SNARK facilitates smaller proof size. Overall, the proposed method has the potential to provide a secure and efficient solution for blockchain-based identity management.	Translated: 2024-03-18 23:51:32 Published: 2023-10-30
# Iris: Dynamic Privacy Preserving Search in Structured Peer-to-Peer Networks ( http://arxiv.org/abs/2310.19634v1 ) License: see the linked page	Angeliki Aktypi, Kasper Rasmussen	In structured peer-to-peer networks like Chord, users retrieve the information they seek by asking other nodes in the network for it. Revealing the search target to other nodes makes structured peer-to-peer networks unsuitable for applications that demand query privacy, i.e., hiding the query's target from the intermediate nodes that take part in the routing. This paper studies the query privacy of structured P2P networks, particularly the Chord protocol. We initially observe that already proposed privacy notions, such as $k$-anonymity, do not allow us to reason about the privacy guarantees of a query in Chord in the presence of a strong adversary. Thus, we introduce a new privacy notion that we call $(\alpha,\delta)$-privacy that allows us to evaluate the privacy guarantees even when considering the worst-case scenario regarding an attacker's background knowledge. We then design Iris, an algorithm that allows a requester to conceal the target of a query in Chord from the intermediate nodes that take part in the routing. Iris achieves this by having the requester query for addresses other than the target, such that reaching each one of them brings the requester closer to the target address. We perform a security analysis of the proposed algorithm, based on the privacy notion we introduce. We also develop a prototype of the algorithm in Matlab and evaluate its performance. Our analysis proves Iris to be $(\alpha,\delta)$-private while introducing a modest performance overhead.	Translated: 2024-03-18 23:51:32 Published: 2023-10-30
# Student Certificate Sharing System Using Blockchain and NFTs ( http://arxiv.org/abs/2310.20036v1 ) License: see the linked page	Prakhyat Khati, Ajay Kumar Shrestha, Julita Vassileva	In this paper, we propose a certificate sharing system based on blockchain that gives students authority and control over their academic certificates. Our strategy involves developing blockchain-based NFT certifications that can be shared with institutions or employers using blockchain addresses. Students may access the data created by each individual institute in a single platform, filter the view of the relevant courses according to their requirements, and mint their certificate metadata as NFTs. This method provides accountability of access, comprehensive records that are permanently maintained in IPFS, and verifiable provenance for creating, distributing, and accessing certificates. It also makes it possible to share certificates more safely and efficiently. By incorporating trust factors through data provenance, our system provides a countermeasure against issues such as fake and duplicate certificates. It addresses the challenge of traditional certificate verification processes, which are lengthy and manual. With this system, students can manage and validate their academic credentials from multiple institutions in one location while ensuring authenticity and confidentiality using digital signatures and hashing for data protection against unauthorized access. Overall, our suggested system ensures data safety, accountability, and confidentiality while offering a novel approach to certificate distribution.	Translated: 2024-03-18 23:51:32 Published: 2023-10-30
# Preserving The Safety And Confidentiality Of Data Mining Information In Health Care: A literature review ( http://arxiv.org/abs/2312.00016v1 ) License: see the linked page	Robinson Onyemechi Oturugbum	Massive volumes of data are produced daily due to the rapid development of the Internet of Things, which has now permeated the healthcare industry. Recent advances in data mining have spawned a new field of study dubbed privacy-preserving data mining (PPDM). PPDM techniques enable the extraction of actionable insight from enormous volumes of data while safeguarding the privacy of individual information and benefiting society as a whole. Medical research has taken a new course as a result of data mining with healthcare data to detect diseases earlier and improve patient care. Data integration necessitates the sharing of sensitive patient information. However, substantial privacy issues are raised in connection with the storage and transmission of potentially sensitive information. Disclosing sensitive information infringes on patients' privacy. This paper aims to conduct a review of related work on privacy-preserving mechanisms, data protection regulations, and mitigating tactics. The review concluded that no single strategy outperforms all others. Hence, future research should focus on adequate techniques for privacy solutions in the age of massive medical data and the standardization of evaluation standards.	Translated: 2024-03-18 13:35:06 Published: 2023-10-30
# Uncertainty Quantification in Machine Learning Based Segmentation: A Post-Hoc Approach for Left Ventricle Volume Estimation in MRI ( http://arxiv.org/abs/2312.02167v1 ) License: see the linked page	F. Terhag, P. Knechtges, A. Basermann, R. Tempone	Recent studies have confirmed that cardiovascular diseases remain responsible for the highest death toll among non-communicable diseases. Accurate left ventricular (LV) volume estimation is critical for valid diagnosis and management of various cardiovascular conditions, but poses a significant challenge due to inherent uncertainties associated with segmentation algorithms in magnetic resonance imaging (MRI). Recent machine learning advancements, particularly U-Net-like convolutional networks, have facilitated automated segmentation for medical images, but struggle under certain pathologies and/or different scanner vendors and imaging protocols. This study proposes a novel methodology for post-hoc uncertainty estimation in LV volume prediction using Itô stochastic differential equations (SDEs) to model path-wise behavior for the prediction error. The model describes the area of the left ventricle along the heart's long axis. The method is agnostic to the underlying segmentation algorithm, facilitating its use with various existing and future segmentation technologies. The proposed approach provides a mechanism for quantifying uncertainty, enabling medical professionals to intervene for unreliable predictions. This is of utmost importance in critical applications such as medical diagnosis, where prediction accuracy and reliability can directly impact patient outcomes. The method is also robust to dataset changes, enabling application for medical centers with limited access to labeled data. Our findings highlight the proposed uncertainty estimation methodology's potential to enhance automated segmentation robustness and generalizability, paving the way for more reliable and accurate LV volume estimation in clinical settings as well as opening new avenues for uncertainty quantification in biomedical image segmentation, providing promising directions for future research.	Translated: 2024-01-15 15:12:14 Published: 2023-10-30
# Open Domain Knowledge Extraction for Knowledge Graphs ( http://arxiv.org/abs/2312.09424v1 ) License: see the linked page	Kun Qian, Anton Belyi, Fei Wu, Samira Khorshidi, Azadeh Nikfarjam, Rahul Khot, Yisi Sang, Katherine Luna, Xianqi Chu, Eric Choi, Yash Govind, Chloe Seivwright, Yiwen Sun, Ahmed Fakhry, Theo Rekatsinas, Ihab Ilyas, Xiaoguang Qi, Yunyao Li	The quality of a knowledge graph directly impacts the quality of downstream applications (e.g. the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from the open web at scale. ODKE utilizes a wide range of extraction models and supports both streaming and batch processing at different latencies. We reflect on the challenges and design decisions made and share lessons learned when building and deploying ODKE to grow an industry-scale open domain knowledge graph.	Translated: 2024-01-15 14:24:17 Published: 2023-10-30
# Addressing The Knapsack Challenge Through Cultural Algorithm Optimization ( http://arxiv.org/abs/2401.03324v1 ) License: see the linked page	Mohammad Saleh Vahdatpour	The "0-1 knapsack problem" stands as a classical combinatorial optimization conundrum, necessitating the selection of a subset of items from a given set. Each item possesses inherent values and weights, and the primary objective is to formulate a selection strategy that maximizes the total value while adhering to a predefined capacity constraint. In this research paper, we introduce a novel variant of Cultural Algorithms tailored specifically for solving 0-1 knapsack problems, a well-known combinatorial optimization challenge. Our proposed algorithm incorporates a belief space to refine the population and introduces two vital functions for dynamically adjusting the crossover and mutation rates during the evolutionary process. Through extensive experimentation, we provide compelling evidence of the algorithm's remarkable efficiency in consistently locating the global optimum, even in knapsack problems characterized by high dimensions and intricate constraints.	Translated: 2024-01-15 09:18:31 Published: 2023-10-30
# SeamlessNeRF: Stitching Part NeRFs with Gradient Propagation ( http://arxiv.org/abs/2311.16127v1 ) License: see the linked page	Bingchen Gong, Yuehao Wang, Xiaoguang Han, Qi Dou	Neural Radiance Fields (NeRFs) have emerged as promising digital mediums of 3D objects and scenes, sparking a surge in research to extend the editing capabilities in this domain. The task of seamless editing and merging of multiple NeRFs, resembling "Poisson blending" in 2D image editing, remains a critical operation that is under-explored by existing work. To fill this gap, we propose SeamlessNeRF, a novel approach for seamless appearance blending of multiple NeRFs. Specifically, we aim to optimize the appearance of a target radiance field in order to harmonize its merge with a source field. We propose a well-tailored optimization procedure for blending, which is constrained by 1) pinning the radiance color in the intersecting boundary area between the source and target fields and 2) maintaining the original gradient of the target. Extensive experiments validate that our approach can effectively propagate the source appearance from the boundary area to the entire target field through the gradients. To the best of our knowledge, SeamlessNeRF is the first work that introduces gradient-guided appearance editing to radiance fields, offering solutions for seamless stitching of 3D objects represented in NeRFs.	Translated: 2023-12-03 13:30:09 Published: 2023-10-30
# A Hierarchical Training Paradigm for Antibody Structure-sequence Co-design ( http://arxiv.org/abs/2311.16126v1 ) License: see the linked page	Fang Wu, Stan Z. Li	Therapeutic antibodies are an essential and rapidly expanding drug modality. The binding specificity between antibodies and antigens is decided by complementarity-determining regions (CDRs) at the tips of these Y-shaped proteins. In this paper, we propose a hierarchical training paradigm (HTP) for antibody sequence-structure co-design. HTP consists of four levels of training stages, each corresponding to a specific protein modality within a particular protein domain. Through carefully crafted tasks in different stages, HTP seamlessly and effectively integrates geometric graph neural networks (GNNs) with large-scale protein language models to excavate evolutionary information from not only geometric structures but also vast antibody and non-antibody sequence databases, which determines ligand binding pose and strength. Empirical experiments show that HTP sets a new state-of-the-art performance in the co-design problem as well as in fixed-backbone design. Our research offers a hopeful path to unleash the potential of deep generative architectures and seeks to illuminate the way forward for the antibody sequence and structure co-design challenge.	Translated: 2023-12-03 13:29:46 Published: 2023-10-30
# "Just a little bit on the outside for the whole time": Social belonging confidence and the persistence of Machine Learning and Artificial Intelligence students ( http://arxiv.org/abs/2311.10745v1 ) License: see the linked page	Katherine Mao, Sharon Ferguson, James Magarian, Alison Olechowski	The growing field of machine learning (ML) and artificial intelligence (AI) presents a unique and unexplored case within persistence research, meaning it is unclear how past findings from engineering will apply to this developing field. We conduct an exploratory study to gain an initial understanding of persistence in this field and identify fruitful directions for future work. One factor that has been shown to predict persistence in engineering is belonging; we study belonging through the lens of confidence, and discuss how attention to social belonging confidence may help to increase diversity in the profession. In this research paper, we conduct a small set of interviews with students in ML/AI courses. Thematic analysis of these interviews revealed initial differences in how students see a career in ML/AI, which diverge based on interest and programming confidence. We identified how exposure and initiation, the interpretation of ML and AI field boundaries, and beliefs about the skills required to succeed might influence students' intentions to persist. We discuss differences in how students describe being motivated by social belonging and the importance of close mentorship. We motivate further persistence research in ML/AI with particular focus on social belonging and close mentorship, the role of intersectional identity, and introductory ML/AI courses.	Translated: 2023-11-27 00:44:51 Published: 2023-10-30
# Advancing a Model of Students' Intentional Persistence in Machine Learning and Artificial Intelligence ( http://arxiv.org/abs/2311.10744v1 ) License: see the linked page	Sharon Ferguson, Katherine Mao, James Magarian, Alison Olechowski	Machine Learning (ML) and Artificial Intelligence (AI) are powering the applications we use, the decisions we make, and the decisions made about us. We have seen numerous examples of non-equitable outcomes, from facial recognition algorithms to recidivism algorithms, when they are designed without diversity in mind. Thus, we must take action to promote diversity among those in this field. A critical step in this work is understanding why some students who choose to study ML/AI later leave the field. While the persistence of diverse populations has been studied in engineering, there is a lack of research investigating factors that influence persistence in ML/AI. In this work, we present the advancement of a model of intentional persistence in ML/AI by surveying students in ML/AI courses. We examine persistence across demographic groups, such as gender, international student status, student loan status, and visible minority status. We investigate independent variables that distinguish ML/AI from other STEM fields, such as the varying emphasis on non-technical skills, the ambiguous ethical implications of the work, and the highly competitive and lucrative nature of the field. Our findings suggest that short-term intentional persistence is associated with academic enrollment factors such as major and level of study. Long-term intentional persistence is correlated with measures of professional role confidence. Unique to our study, we show that wanting your work to have a positive social benefit is a negative predictor of long-term intentional persistence, and women generally care more about this. We provide recommendations to educators to meaningfully discuss ML/AI ethics in classes and encourage the development of interpersonal skills to help increase diversity in the field.	Translated: 2023-11-27 00:44:27 Published: 2023-10-30
# KG-FRUS: a Novel Graph-based Dataset of 127 Years of US Diplomatic Relations ( http://arxiv.org/abs/2311.01606v1 ) License: see the linked page	Gökberk Özsoy, Luis Salamanca, Matthew Connelly, Raymond Hicks, Fernando Pérez-Cruz	In the current paper, we present the KG-FRUS dataset, comprised of more than 300,000 US government diplomatic documents encoded in a Knowledge Graph (KG). We leverage the data of the Foreign Relations of the United States (FRUS) (available as XML files) to extract information about the documents and the individuals and countries mentioned within them. We use the extracted entities, and associated metadata, to create a graph-based dataset. Further, we supplement the created KG with additional entities and relations from Wikidata. The relations in the KG capture the synergies and dynamics required to study and understand the complex fields of diplomacy, foreign relations, and politics. This goes well beyond a simple collection of documents, which neglects the relations between entities in the text. We showcase a range of possibilities of the current dataset by illustrating different approaches to probe the KG. In the paper, we exemplify how to use a query language to answer simple research questions and how to use graph algorithms such as Node2Vec and PageRank, which benefit from the complete graph structure. More importantly, the chosen structure provides total flexibility for continuously expanding and enriching the graph. Our solution is general, so the proposed pipeline for building the KG can encode other original corpora of time-dependent and complex phenomena. Overall, we present a mechanism to create KG databases providing a more versatile representation of time-dependent related text data and a particular application to the all-important FRUS database.	Translated: 2023-11-12 19:56:39 Published: 2023-10-30
# テキスト予測のための忠実でロバストな局所解釈可能性 Faithful and Robust Local Interpretability for Textual Predictions ( http://arxiv.org/abs/2311.01605v1 ) ライセンス: Link先を確認	Gianluigi Lopardo, Frederic Precioso, Damien Garreau	(参考訳) 機械学習モデルが信頼され、重要なドメインにデプロイされるためには、解釈可能性が不可欠である。しかし、既存のテキストモデルを解釈する手法はしばしば複雑であり、数学的基礎が固まっておらず、その性能は保証されていない。本稿では,テキスト上の予測を解釈する新しい方法であるFRED(Faithful and Robust Explainer for textual Documents)を提案する。 FREDは、削除された際の予測に大きな影響を及ぼすドキュメントのキーワードを識別する。解釈可能な分類器に関する形式的定義と理論的解析を通じてFREDの信頼性を確立する。さらに、最先端手法に対する経験的評価は、テキストモデルに対する洞察を提供することにおけるFREDの有効性を示している。 Interpretability is essential for machine learning models to be trusted and deployed in critical domains. However, existing methods for interpreting text models are often complex, lack solid mathematical foundations, and their performance is not guaranteed. In this paper, we propose FRED (Faithful and Robust Explainer for textual Documents), a novel method for interpreting predictions over text. FRED identifies key words in a document that significantly impact the prediction when removed. We establish the reliability of FRED through formal definitions and theoretical analyses on interpretable classifiers. Additionally, our empirical evaluation against state-of-the-art methods demonstrates the effectiveness of FRED in providing insights into text models.	翻訳日:2023-11-12 19:56:17 公開日:2023-10-30
# グリーンウォッシングの検出に言語モデルを活用する Leveraging Language Models to Detect Greenwashing ( http://arxiv.org/abs/2311.01469v1 ) ライセンス: Link先を確認	Avalon Vinella, Margaret Capetz, Rebecca Pattichis, Christina Chance, and Reshmi Ghosh	(参考訳) 近年、気候変動による影響が大衆の関心を惹きつけている。その結果、企業は公的なイメージを強化するために持続可能性レポートで環境への取り組みを強調している。しかし、このような報告書のレビューに厳格な規制がないことは、グリーンウォッシングの可能性を許している。本研究では,グリーンウォッシングリスクについて生成したラベルを用いて言語モデルを学習する新しい手法を提案する。本研究の主な貢献は,グリーンウォッシングリスクを定量化するための数学的定式化,この問題に対する微調整されたClimateBERTモデル,結果の比較分析である。持続可能性レポートからなるテストセットでは, 最良のモデルが平均精度スコア86.34%, F1スコア0.67を達成し, 提案手法が本課題に対する探索の有望な方向を示すことを示した。 In recent years, climate change repercussions have increasingly captured public interest. Consequently, corporations are emphasizing their environmental efforts in sustainability reports to bolster their public image. Yet, the absence of stringent regulations in review of such reports allows potential greenwashing. In this study, we introduce a novel methodology to train a language model on generated labels for greenwashing risk. Our primary contributions are a mathematical formulation to quantify greenwashing risk, a fine-tuned ClimateBERT model for this problem, and a comparative analysis of results. On a test set comprising sustainability reports, our best model achieved an average accuracy score of 86.34% and F1 score of 0.67, demonstrating that our methods show a promising direction of exploration for this task.	翻訳日:2023-11-12 19:55:54 公開日:2023-10-30
# あなたは次に何をすべきかを覚えています Remember what you did so you know what to do next ( http://arxiv.org/abs/2311.01468v1 ) ライセンス: Link先を確認	Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin, Marjorie Freedman, Ralph Weischedel	(参考訳) 小学校理科実験用テキストゲームシミュレータであるScienceWorldにおいて、中規模大言語モデル(GPT-J 6Bパラメータ)を用いて、シミュレーションロボットが30種類の目標を達成する計画を作成する。以前に出版された経験的研究によると、大型言語モデル(LLM)は強化学習と比較して不適合である(Wang et al., 2022)。マルコフの仮定(前のステップの1つ)を用いて、LLMは強化学習に基づくアプローチを1.4倍に向上させる。 LLMの入力バッファをできるだけ多くの事前ステップで満たすと、改善は3.5倍になる。トレーニングデータのわずか6.5%のトレーニングでも、強化学習に基づくアプローチよりも2.2倍の改善が見られた。実験の結果、30種類のアクションに対して、パフォーマンスが広範囲に分散していることが判明した。 2023年、Lin et al.(2023年)は、OpenAIの大規模LLMを補完する小さなLLM(T5-large)を用いて、ScienceWorldで優れた結果を得るための2部アプローチ(SwiftSage)を実演した。我々の6-BパラメータであるシングルステージGPT-Jは、GPT-Jよりも29倍のパラメータを持つGPT-3.5ターボを組み込んだSwiftSageの2段アーキテクチャの性能と一致する。 We explore using a moderately sized large language model (GPT-J 6B parameters) to create a plan for a simulated robot to achieve 30 classes of goals in ScienceWorld, a text game simulator for elementary science experiments. Previously published empirical work claimed that large language models (LLMs) are a poor fit (Wang et al., 2022) compared to reinforcement learning. Using the Markov assumption (a single previous step), the LLM outperforms the reinforcement learning-based approach by a factor of 1.4. When we fill the LLM's input buffer with as many prior steps as possible, improvement rises to 3.5x. Even when training on only 6.5% of the training data, we observe a 2.2x improvement over the reinforcement-learning-based approach. Our experiments show that performance varies widely across the 30 classes of actions, indicating that averaging over tasks can hide significant performance issues. In work contemporaneous with ours, Lin et al. (2023) demonstrated a two-part approach (SwiftSage) that uses a small LLM (T5-large) complemented by OpenAI's massive LLMs to achieve outstanding results in ScienceWorld. 
Our 6B-parameter, single-stage GPT-J matches the performance of SwiftSage's two-stage architecture when it incorporates GPT-3.5 turbo, which has 29 times more parameters than GPT-J.	翻訳日:2023-11-12 19:55:41 公開日:2023-10-30
# 非iidデータ上の差分プライベートフェデレーションクラスタリング Differentially Private Federated Clustering over Non-IID Data ( http://arxiv.org/abs/2301.00955v3 ) ライセンス: Link先を確認	Yiwei Li, Shuai Wang, Chong-Yung Chi, Tony Q. S. Quek	(参考訳) 本稿では,大規模クライアント上に分散した未ラベルデータサンプルをパラメータサーバのオーケストレーション下で有限クラスタに正確に分割することを目的とした,フェデレーションクラスタリング(FedC)問題について検討する。クラスタセントロイドを示す実変数と,各データサンプルのクラスタメンバシップを示すバイナリ変数を含むNPハード最適化問題であるが,ソフトクラスタリングソリューションにより,FedC問題を1つの凸制約のみで非凸最適化問題に変換する。そこで,DP-FedCと呼ばれる差分プライバシ(DP)技術を用いた新しいFedCアルゴリズムを提案する。さらに, プライバシ保護と収束率の理論的解析により, 提案するdp-fedcの設計指針として理想的に機能する非識別・独立分散(非i.i.d.)データに対して, 提案するdp-fedcの様々な特性が得られた。次に, 提案するdp-fedcの有効性と, 最先端のfemcアルゴリズムよりも優れた性能, 提示されたすべての解析結果との一貫性を実証するために, 2つの実データを用いた実験結果を提示した。 In this paper, we investigate federated clustering (FedC) problem, that aims to accurately partition unlabeled data samples distributed over massive clients into finite clusters under the orchestration of a parameter server, meanwhile considering data privacy. Though it is an NP-hard optimization problem involving real variables denoting cluster centroids and binary variables denoting the cluster membership of each data sample, we judiciously reformulate the FedC problem into a non-convex optimization problem with only one convex constraint, accordingly yielding a soft clustering solution. Then a novel FedC algorithm using differential privacy (DP) technique, referred to as DP-FedC, is proposed in which partial clients participation and multiple local model updating steps are also considered. Furthermore, various attributes of the proposed DP-FedC are obtained through theoretical analyses of privacy protection and convergence rate, especially for the case of non-identically and independently distributed (non-i.i.d.) data, that ideally serve as the guidelines for the design of the proposed DP-FedC. 
Experimental results on two real datasets are then provided to demonstrate the efficacy of the proposed DP-FedC, its much superior performance over some state-of-the-art FedC algorithms, and its consistency with all the presented analytical results.	翻訳日:2023-11-12 19:54:54 公開日:2023-10-30
# 安全かつパーソナライズ可能な自動運転車開発のための選好学習アプローチ A Preference Learning Approach to Develop Safe and Personalizable Autonomous Vehicles ( http://arxiv.org/abs/2311.02099v1 ) ライセンス: Link先を確認	Ruya Karagulle and Nikos Arechiga and Andrew Best and Jonathan DeCastro and Necmiye Ozay	(参考訳) 本研究は,自動運転車の交通規則遵守を保証する選好学習手法を提案する。本手法では,トラフィックルールを記述する信号時相論理(stl)の優先順位順序付けを学習フレームワークに組み込む。パラメトリック重み付き信号時間論理(PWSTL)を利用して、ペア比較に基づく安全保証優先学習の問題を定式化し、この学習問題を解決するためのアプローチを提案する。提案手法は, 与えられたPWSTL式を重み付けし, これらの重み付けにより, 優先信号が非優先値よりも重み付けされた量的満足度測定値であることを示す。提案手法により得られた重みの有意な評価は,重み付きSTL式に導かれる。本手法は,停止標識と横断歩道を含む2つの異なる運転シナリオにおいて,被験者実験により性能を実証する。提案手法は,既存の選好学習手法と比較して,嗜好を捉えて比較し,安全性を考慮すれば,特に勝っている。 This work introduces a preference learning method that ensures adherence to traffic rules for autonomous vehicles. Our approach incorporates priority ordering of signal temporal logic (STL) formulas, describing traffic rules, into a learning framework. By leveraging the parametric weighted signal temporal logic (PWSTL), we formulate the problem of safety-guaranteed preference learning based on pairwise comparisons, and propose an approach to solve this learning problem. Our approach finds a feasible valuation for the weights of the given PWSTL formula such that, with these weights, preferred signals have weighted quantitative satisfaction measures greater than their non-preferred counterparts. The feasible valuation of weights given by our approach leads to a weighted STL formula which can be used in correct-and-custom-by-construction controller synthesis. We demonstrate the performance of our method with human subject studies in two different simulated driving scenarios involving a stop sign and a pedestrian crossing. Our approach yields competitive results compared to existing preference learning methods in terms of capturing preferences, and notably outperforms them when safety is considered.	翻訳日:2023-11-12 19:44:50 公開日:2023-10-30
# BSDAR: ニューラルキーフレーズ生成における注意報奨付きビーム探索復号 BSDAR: Beam Search Decoding with Attention Reward in Neural Keyphrase Generation ( http://arxiv.org/abs/1909.09485v2 ) ライセンス: Link先を確認	Iftitahu Ni'mah, Vlado Menkovski, Mykola Pechenizkiy	(参考訳) 本研究は, ニューラルキーフレーズ生成における2つの共通デコード問題, シーケンス長バイアスとビーム多様性について検討した。そこで本研究では,単語レベルとngramレベルの報酬関数に基づくビーム探索復号手法を導入し,Seq2Seq推論をテスト時に制約・洗練する。その結果,提案手法は,より短くほぼ同一のシーケンスに偏るアルゴリズムのバイアスを克服し,ソーステキストに存在するキーフレーズと存在しないキーフレーズの両方を生成する際の復号性能を大幅に向上させることが示された。 This study mainly investigates two common decoding problems in neural keyphrase generation: sequence length bias and beam diversity. To tackle the problems, we introduce a beam search decoding strategy based on word-level and ngram-level reward functions to constrain and refine Seq2Seq inference at test time. Results show that our simple proposal can overcome the algorithm bias toward shorter and nearly identical sequences, resulting in a significant improvement of the decoding performance on generating keyphrases that are present and absent in source text.	翻訳日:2023-11-03 18:54:09 公開日:2023-10-30
# テキスト・音声・音声・生理信号を用いた機械学習による共感検出 Empathy Detection Using Machine Learning on Text, Audiovisual, Audio or Physiological Signals ( http://arxiv.org/abs/2311.00721v1 ) ライセンス: Link先を確認	Md Rakibul Hasan, Md Zakir Hossain, Shreya Ghosh, Susannah Soon, Tom Gedeon	(参考訳) 共感とは、個人が他人を理解する能力を示す社会的スキルである。過去数年間、共感は、Affective Computing、Cognitive Science and Psychologyに限らず、様々な分野から注目を集めてきた。共感は文脈に依存した用語であり、共感を検知または認識することは、社会、医療、教育に潜在的な応用をもたらす。広範かつ重なり合う話題であるにもかかわらず、機械学習を活用した共感検出研究の道筋は、全体論的な文学的観点からは未検討のままである。この目的のために,10の有名なデータベースから801の論文を体系的に収集,スクリーニングし,選択した54の論文を分析した。本論文は,共感検出システムの入力モダリティ,すなわちテキスト,オーディオ視覚,オーディオ,生理的信号に基づいてグループ化する。本稿では,モダリティ固有の前処理とネットワークアーキテクチャ設計プロトコル,一般的なデータセット記述と可用性の詳細,評価プロトコルについて検討する。我々はさらに,新たな探索方法を促進するコンピュータベースの共感ドメインにおける潜在的応用,展開課題,研究ギャップについても論じる。私たちは、私たちの仕事は、文化、多様性、多言語主義を含む、プライバシーを保護し、偏見のない共感システムを開発するための一歩だと信じています。 Empathy is a social skill that indicates an individual's ability to understand others. Over the past few years, empathy has drawn attention from various disciplines, including but not limited to Affective Computing, Cognitive Science and Psychology. Empathy is a context-dependent term; thus, detecting or recognising empathy has potential applications in society, healthcare and education. Despite being a broad and overlapping topic, the avenue of empathy detection studies leveraging Machine Learning remains underexplored from a holistic literature perspective. To this end, we systematically collect and screen 801 papers from 10 well-known databases and analyse the selected 54 papers. We group the papers based on input modalities of empathy detection systems, i.e., text, audiovisual, audio and physiological signals. We examine modality-specific pre-processing and network architecture design protocols, popular dataset descriptions and availability details, and evaluation protocols. 
We further discuss the potential applications, deployment challenges and research gaps in the Affective Computing-based empathy domain, which can facilitate new avenues of exploration. We believe that our work is a stepping stone to developing a privacy-preserving and unbiased empathic system inclusive of culture, diversity and multilingualism that can be deployed in practice to enhance the overall well-being of human life.	翻訳日:2023-11-03 16:20:52 公開日:2023-10-30
# 機械学習ポテンシャルにおける構造的・コンフォメーション的多様性の役割 Role of Structural and Conformational Diversity for Machine Learning Potentials ( http://arxiv.org/abs/2311.00862v1 ) ライセンス: Link先を確認	Nikhil Shenoy, Prudencio Tossou, Emmanuel Noutahi, Hadrien Mary, Dominique Beaini, Jiarui Ding	(参考訳) 機械学習の原子間ポテンシャル(mlips)の分野では、データバイアス、特にコンフォメーションと構造的多様性の間の複雑な関係を理解し、モデル一般化は量子力学(qm)データ生成作業の品質向上に不可欠である。この2つの異なる実験により、データセットサイズが一定である固定的予算1と、構造的多様性を変化させつつ、固定的な構造的多様性に焦点をあてた固定的分子集合1とを探索する。その結果,一般化指標におけるニュアンスパターンが明らかになった。特に、最適構造とコンフォーメーションの一般化には、構造とコンフォーメーションの多様性の慎重なバランスが必要であるが、既存のQMデータセットはそのトレードオフを満たしていない。さらに,モデル展開における適用可能性ドメイン定義の重要性を強調しながら,トレーニング分布を超えて一般化するmlipモデルの限界を強調する。これらの知見は、QMデータ生成のための貴重な洞察とガイドラインを提供する。 In the field of Machine Learning Interatomic Potentials (MLIPs), understanding the intricate relationship between data biases, specifically conformational and structural diversity, and model generalization is critical in improving the quality of Quantum Mechanics (QM) data generation efforts. We investigate these dynamics through two distinct experiments: a fixed budget one, where the dataset size remains constant, and a fixed molecular set one, which focuses on fixed structural diversity while varying conformational diversity. Our results reveal nuanced patterns in generalization metrics. Notably, for optimal structural and conformational generalization, a careful balance between structural and conformational diversity is required, but existing QM datasets do not meet that trade-off. Additionally, our results highlight the limitation of the MLIP models at generalizing beyond their training distribution, emphasizing the importance of defining applicability domain during model deployment. These findings provide valuable insights and guidelines for QM data generation efforts.	翻訳日:2023-11-03 15:27:14 公開日:2023-10-30
# 定数円における量子後零知識へのブラックボックスアプローチ A Black-Box Approach to Post-Quantum Zero-Knowledge in Constant Rounds ( http://arxiv.org/abs/2011.02670v4 ) ライセンス: Link先を確認	Nai-Hui Chia and Kai-Min Chung and Takashi Yamakawa	(参考訳) 最近のセミナルな研究で、ビタンスキーとシュムエリ(STOC '20)は、NPが量子攻撃に対して安全であることを示す定ラウンドゼロ知識引数を初めて構築した。しかし、それらの構造は古典的なものに比べていくつかの欠点がある。具体的には、それらの構成は計算の健全性しか達成せず、エラー(QLWE仮定)と量子完全同型暗号(QFHE)の存在による学習の量子困難性の強い仮定を必要とし、非ブラックボックスシミュレーションに依存している。本稿では、これらの問題をゼロ知識の概念を「$\epsilon$-zero-knowledge」と呼ぶものに弱めるコストで解決する。具体的には, 統計的健全性とブラックボックスの$\epsilon$-zero-knowledge を満たす NP に対して, 衝突するハッシュ関数の存在を前提として, 一定のラウンド・インタラクティブな NP の証明を構築する。興味深いことに、この構成はGoldreich と Kahan (JoC '96) による古典的プロトコルの適応版にすぎないが、量子敵に対する$\epsilon$-zero-knowledgeプロパティの証明には新しいアイデアが必要である。量子攻撃に対するブラックボックス $\epsilon$-zero-knowledge と計算の健全性を満たすnpの一定の円環的対話的議論を量子後一方向関数の存在を仮定するだけで構成する。この結果の核心となるのは、シミュレータが悪意のある検証者のコミットメッセージを抽出し、検証者の内部状態を適切な意味でシミュレートすることのできる新しい量子巻き戻し技術である。 In a recent seminal work, Bitansky and Shmueli (STOC '20) gave the first construction of a constant round zero-knowledge argument for NP secure against quantum attacks. However, their construction has several drawbacks compared to the classical counterparts. Specifically, their construction only achieves computational soundness, requires strong assumptions of quantum hardness of learning with errors (QLWE assumption) and the existence of quantum fully homomorphic encryption (QFHE), and relies on non-black-box simulation. In this paper, we resolve these issues at the cost of weakening the notion of zero-knowledge to what is called $\epsilon$-zero-knowledge. Concretely, we construct the following protocols: - We construct a constant round interactive proof for NP that satisfies statistical soundness and black-box $\epsilon$-zero-knowledge against quantum attacks assuming the existence of collapsing hash functions, which is a quantum counterpart of collision-resistant hash functions. 
Interestingly, this construction is just an adapted version of the classical protocol by Goldreich and Kahan (JoC '96) though the proof of $\epsilon$-zero-knowledge property against quantum adversaries requires novel ideas. - We construct a constant round interactive argument for NP that satisfies computational soundness and black-box $\epsilon$-zero-knowledge against quantum attacks only assuming the existence of post-quantum one-way functions. At the heart of our results is a new quantum rewinding technique that enables a simulator to extract a committed message of a malicious verifier while simulating verifier's internal state in an appropriate sense.	翻訳日:2023-11-02 18:53:58 公開日:2023-10-30
# 連続条件生成敵対的ネットワーク:新しい経験的損失とラベル入力機構 Continuous Conditional Generative Adversarial Networks: Novel Empirical Losses and Label Input Mechanisms ( http://arxiv.org/abs/2011.07466v9 ) ライセンス: Link先を確認	Xin Ding and Yongwei Wang and Zuheng Xu and William J. Welch and Z. Jane Wang	(参考訳) 本研究では,連続的なスカラー条件(回帰ラベルと呼ぶ)に基づく画像生成のための最初の生成モデルとして,CcGAN(Continuous Conditional Generative Adversarial Network)を提案する。既存の条件付きGAN(cGAN)は主にカテゴリ条件(クラスラベルなど)向けに設計されている。回帰ラベルによる条件付けは数学的に異なり、2つの根本的な問題を引き起こす。(P1) 一部の回帰ラベルには実画像がごくわずか(あるいは皆無)しか存在しない可能性があるため、cGAN損失の既存の経験的バージョン(経験的cGAN損失)の最小化は実際にはしばしば失敗する。(P2) 回帰ラベルはスカラーで無限に存在するため、従来のラベル入力方法は適用できない。提案するCcGANは, (S1) 既存の経験的cGAN損失を連続シナリオに適合するように再構成し, (S2) ナイーブラベル入力 (NLI) 法と改良されたラベル入力 (ILI) 法を提案してジェネレータと判別器に回帰ラベルを組み込むことで, 上記の問題をそれぞれ解決する。 (S1) における再構成は、それぞれhard vicinal discriminator loss (HVDL) とsoft vicinal discriminator loss (SVDL) と呼ばれる2つの新しい経験的判別器損失と、新しい経験的ジェネレータ損失をもたらす。 HVDLとSVDLで訓練された判別器の誤差境界は、本研究の軽微な仮定の下で導出される。 2つの新しいベンチマークデータセット(RC-49とCell-200)と新しい評価基準(Sliding Fréchet Inception Distance)も提案されている。 Circular 2-D Gaussians, RC-49, UTKFace, Cell-200, Steering Angleのデータセットを用いた実験により, CcGANが与えられた回帰ラベルを条件とする画像分布から, 多様な高品質なサンプルを生成できることを示す。さらに、これらの実験では、CcGANは視覚的および定量的にcGANを著しく上回る。 This work proposes the continuous conditional generative adversarial network (CcGAN), the first generative model for image generation conditional on continuous, scalar conditions (termed regression labels). 
Existing conditional GANs (cGANs) are mainly designed for categorical conditions (e.g., class labels); conditioning on regression labels is mathematically distinct and raises two fundamental problems: (P1) Since there may be very few (even zero) real images for some regression labels, minimizing existing empirical versions of cGAN losses (a.k.a. empirical cGAN losses) often fails in practice; (P2) Since regression labels are scalar and infinitely many, conventional label input methods are not applicable. The proposed CcGAN solves the above problems, respectively, by (S1) reformulating existing empirical cGAN losses to be appropriate for the continuous scenario; and (S2) proposing a naive label input (NLI) method and an improved label input (ILI) method to incorporate regression labels into the generator and the discriminator. The reformulation in (S1) leads to two novel empirical discriminator losses, termed the hard vicinal discriminator loss (HVDL) and the soft vicinal discriminator loss (SVDL) respectively, and a novel empirical generator loss. The error bounds of a discriminator trained with HVDL and SVDL are derived under mild assumptions in this work. Two new benchmark datasets (RC-49 and Cell-200) and a novel evaluation metric (Sliding Fréchet Inception Distance) are also proposed for this continuous scenario. Our experiments on the Circular 2-D Gaussians, RC-49, UTKFace, Cell-200, and Steering Angle datasets show that CcGAN is able to generate diverse, high-quality samples from the image distribution conditional on a given regression label. Moreover, in these experiments, CcGAN substantially outperforms cGAN both visually and quantitatively.	翻訳日:2023-11-02 18:44:29 公開日:2023-10-30
# 予期せぬ敵に対するロバスト性テスト Testing Robustness Against Unforeseen Adversaries ( http://arxiv.org/abs/1908.08016v4 ) ライセンス: Link先を確認	Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks	(参考訳) 敵対的ロバストネスの研究は主にL_p摂動に焦点を当てており、ほとんどの防御はトレーニング時とテスト時で同一の敵を想定して開発されている。しかし、現実世界のアプリケーションでは、開発者はシステムが直面する攻撃や破損の全範囲を把握できる見込みは低い。さらに、最悪ケースの入力は多様である可能性が高く、L_pボールに制約される必要はない。研究と現実のこの相違を狭めるために、新しい18の非L_p攻撃を含む、予期せぬ敵に対するモデルの堅牢性を評価するためのフレームワークであるImageNet-UAを導入する。 ImageNet-UAでうまく機能するためには、ディフェンスは一般化ギャップを克服し、トレーニング中に遭遇しない多様な攻撃に対して堅牢でなければならない。大規模な実験では、既存のロバストネス指標が予期せぬ敵に対するロバストネスを捉えていないこと、標準的なロバストネス技術が代替トレーニング戦略に劣ること、新しい手法が予期せぬ敵に対するロバストネスを改善できることが判明した。我々は,機械学習システムの最悪ケースの動作を改善するためのコミュニティの有用なツールとして,ImageNet-UAを提案する。 Adversarial robustness research primarily focuses on L_p perturbations, and most defenses are developed with identical training-time and test-time adversaries. However, in real-world applications developers are unlikely to have access to the full range of attacks or corruptions their system will face. Furthermore, worst-case inputs are likely to be diverse and need not be constrained to the L_p ball. To narrow this discrepancy between research and reality, we introduce ImageNet-UA, a framework for evaluating model robustness against a range of unforeseen adversaries, including eighteen new non-L_p attacks. To perform well on ImageNet-UA, defenses must overcome a generalization gap and be robust to diverse attacks not encountered during training. In extensive experiments, we find that existing robustness measures do not capture unforeseen robustness, that standard robustness techniques are beaten by alternative training strategies, and that novel methods can improve unforeseen robustness. We present ImageNet-UA as a useful tool for the community for improving the worst-case behavior of machine learning systems.	翻訳日:2023-11-02 05:30:05 公開日:2023-10-30
# 低リソース音声コマンド認識のための類似性を用いたニューラルモデル再構成 Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition ( http://arxiv.org/abs/2110.03894v5 ) ライセンス: Link先を確認	Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao	(参考訳) 本研究では,低リソース音声コマンド認識(SCR)のための新しいAR手法を提案し,AR-SCRシステムを構築する。 ARプロシージャは(ターゲットドメインから)音響信号を修正して(ソースドメインから)事前訓練されたSCRモデルを再利用することを目的としている。ソースドメインとターゲットドメインのラベルミスマッチを解消し、arの安定性をさらに高めるため、クラスをアライメントするための新しい類似性に基づくラベルマッピング手法を提案する。さらに、トランスファーラーニング(TL)技術と元のARプロセスを組み合わせることで、モデル適応性を向上させる。提案したAR-SCRシステムは,アラビア語,リトアニア語,マンダリン語を含む3つの低リソースSCRデータセットを用いて評価した。実験結果から、大規模な英語データセットで事前訓練されたAMを用いて、提案したAR-SCRシステムは、アラビア語およびリトアニア語の音声コマンドデータセット上で、限られた訓練データのみを用いて、現在の最先端の結果を上回ります。 In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system. The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model (from the source domain). To solve the label mismatches between source and target domains, and further improve the stability of AR, we propose a novel similarity-based label mapping technique to align classes. In addition, the transfer learning (TL) technique is combined with the original AR process to improve the model adaptation capability. We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech. Experimental results show that with a pretrained AM trained on a large-scale English dataset, the proposed AR-SCR system outperforms the current state-of-the-art results on Arabic and Lithuanian speech commands datasets, with only a limited amount of training data.	翻訳日:2023-11-02 05:25:17 公開日:2023-10-30
# アルミニウム超伝導共振器の2レベル飽和下での異常損失低減 Anomalous Loss Reduction Below Two-Level System Saturation in Aluminum Superconducting Resonators ( http://arxiv.org/abs/2109.11742v5 ) ライセンス: Link先を確認	Tamin Tai, Jingnan Cai, Steven M. Anlage	(参考訳) 超伝導共振器は量子コンピューティングのためのキュービットリードアウトや運動インダクタンス検出器など多くの用途で広く使われている。これらの共振器は、多くの損失とノイズ機構、特に、少数の光子と低温状態において主な損失源となる2レベル系(TLS)による消音の影響を受けやすい。本研究では, 容量結合型半波長コプラナー導波路共振器について検討した。意外なことに, 共振器の損失は低励磁温度とTLS飽和度以下の温度で減少することが観察された。この挙動は、TLSの離散アンサンブルにおけるTLSと共振光子周波数の遅延を減らし、TLSの温度と電力を低下させることによるTLS共鳴応答帯域の減少に起因する。 TLSの応答帯域幅が共振器からの遅延よりも小さい場合、共振器応答が小さくなり、損失が減少する。より高い励起力では、損失は一般化トンネルモデル(GTM)の予測と一致する対数的パワー依存に従う。離散TLSアンサンブルとGTMを組み合わせたモデルを提案し、測定した共振器内部損失の温度と電力依存性を合理的パラメータと一致させる。 Superconducting resonators are widely used in many applications such as qubit readout for quantum computing, and kinetic inductance detectors. These resonators are susceptible to numerous loss and noise mechanisms, especially the dissipation due to two-level systems (TLS) which become the dominant source of loss in the few-photon and low temperature regime. In this study, capacitively-coupled aluminum half-wavelength coplanar waveguide resonators are investigated. Surprisingly, the loss of the resonators was observed to decrease with a lowering temperature at low excitation powers and temperatures below the TLS saturation. This behavior is attributed to the reduction of the TLS resonant response bandwidth with decreasing temperature and power to below the detuning between the TLS and the resonant photon frequency in a discrete ensemble of TLS. When response bandwidths of TLS are smaller than their detunings from the resonance, the resonant response and thus the loss is reduced. At higher excitation powers, the loss follows a logarithmic power dependence, consistent with predictions from the generalized tunneling model (GTM). 
A model combining the discrete TLS ensemble with the GTM is proposed and matches the temperature and power dependence of the measured internal loss of the resonator with reasonable parameters.	翻訳日:2023-11-02 05:24:25 公開日:2023-10-30
# FUTURE-AI: 医療画像における信頼できる人工知能のための原則とコンセンサス勧告 FUTURE-AI: Guiding Principles and Consensus Recommendations for Trustworthy Artificial Intelligence in Medical Imaging ( http://arxiv.org/abs/2109.09658v5 ) ライセンス: Link先を確認	Karim Lekadir, Richard Osuala, Catherine Gallin, Noussair Lazrak, Kaisar Kushibar, Gianna Tsakou, Susanna Aussó, Leonor Cerdá Alberich, Kostas Marias, Manolis Tsiknakis, Sara Colantonio, Nickolas Papanikolaou, Zohaib Salahuddin, Henry C Woodruff, Philippe Lambin, Luis Martí-Bonmatí	(参考訳) 人工知能(AI)の最近の進歩は、今日の臨床システムによって生成される膨大なデータと相まって、画像再構成、医用画像分割、画像ベースの診断、治療計画を含む、医療画像のバリューチェーン全体にわたる画像AIソリューションの開発につながっている。医療画像におけるAIの成功と将来の可能性にもかかわらず、多くの利害関係者は、複雑で不透明で、重要な臨床応用において理解、利用、信頼が難しいと認識されている画像AIソリューションの潜在的なリスクと倫理的影響を懸念している。これらの懸念とリスクにもかかわらず、医療画像における将来のAI開発を信頼、安全性、採用の向上へと導くための具体的なガイドラインやベストプラクティスは今のところ存在しない。このギャップを埋めるため,本稿では,欧州の5つの大規模な健康イメージングAIプロジェクトから蓄積された経験,コンセンサス,ベストプラクティスから導かれた,慎重に選択した指針を紹介する。これらの指針はFUTURE-AIと呼ばれ、その構成要素は (i)公平性、(ii)普遍性、(iii)トレーサビリティ、(iv)ユーザビリティ、(v)堅牢性、(vi)説明可能性である。ステップバイステップアプローチでは、これらのガイドラインは、技術的、臨床的、倫理的に信頼できるAIソリューションを仕様化、開発、評価し、臨床実践にデプロイするための具体的な勧告のフレームワークにさらに変換される。 The recent advancements in artificial intelligence (AI), combined with the extensive amount of data generated by today's clinical systems, have led to the development of imaging AI solutions across the whole value chain of medical imaging, including image reconstruction, medical image segmentation, image-based diagnosis and treatment planning. Notwithstanding the successes and future potential of AI in medical imaging, many stakeholders are concerned about the potential risks and ethical implications of imaging AI solutions, which are perceived as complex, opaque, and difficult to comprehend, utilise, and trust in critical clinical applications. Despite these concerns and risks, there are currently no concrete guidelines and best practices for guiding future AI developments in medical imaging towards increased trust, safety and adoption. 
To bridge this gap, this paper introduces a careful selection of guiding principles drawn from the accumulated experiences, consensus, and best practices from five large European projects on AI in Health Imaging. These guiding principles are named FUTURE-AI and its building blocks consist of (i) Fairness, (ii) Universality, (iii) Traceability, (iv) Usability, (v) Robustness and (vi) Explainability. In a step-by-step approach, these guidelines are further translated into a framework of concrete recommendations for specifying, developing, evaluating, and deploying technically, clinically and ethically trustworthy AI solutions into clinical practice.	翻訳日:2023-11-02 05:24:02 公開日:2023-10-30
# 相対論的Jackiw-Nairエニオンの不確かさ関係--第一原理の導出 Uncertainty Relations for the Relativistic Jackiw-Nair Anyon: A First Principles Derivation ( http://arxiv.org/abs/2107.09342v2 ) ライセンス: Link先を確認	Joydeep Majhi (ISI, Kolkata), Subir Ghosh (ISI, Kolkata)	(参考訳) 本稿では,JackiwとNair [1] がエニオンのモデルとして提唱した任意スピンの相対論的粒子モデルに対する $position-position$ と $position-momentum$ (Heisenberg) 不確かさ関係を,純粋に量子力学的な枠組みで明示的に計算した。これは(シュワルツの不等式を通じて)エニオンが2次元 \textit{noncommutative} 空間に存在するという予想を支持する。我々は,最近構築されたエニオン波動関数 [6] を用い,文献 [7] の枠組みにおいて,エニオン座標間の非自明な不確かさ関係 ${\sqrt{\Delta x^2\Delta y^2}}=\hbar\bar{\Theta}_{xy}$ を計算した。また、エニオンに対するハイゼンベルク(位置-運動量)の不確かさ関係も計算する。最後に、同一の \textit{formalism} を電子に適用すると自明な位置の不確かさ関係が得られ、電子が3次元の可換空間に存在することと一致することを示した。 In this paper we have explicitly computed the $position-position$ and $position-momentum$ (Heisenberg) Uncertainty Relations for the model of relativistic particles with arbitrary spin, proposed by Jackiw and Nair [1] as a model for Anyon, in a purely quantum mechanical framework. This supports (via Schwarz inequality) the conjecture that anyons live in a 2-dimensional \textit{noncommutative} space. We have computed the non-trivial uncertainty relation between anyon coordinates, ${\sqrt{\Delta x^2\Delta y^2}}=\hbar\bar{\Theta}_{xy}$, using the recently constructed anyon wave function [6], in the framework of [7]. We also compute the Heisenberg (position-momentum) uncertainty relation for anyons. Lastly we show that the identical \textit{formalism}, when applied to electrons, yields a trivial position uncertainty relation, consistent with their living in a 3-dimensional commutative space.	翻訳日:2023-11-02 05:23:20 公開日:2023-10-30
# 連続最適化による因果構造学習におけるエントロピーに基づく損失の役割について On the Role of Entropy-based Loss for Learning Causal Structures with Continuous Optimization ( http://arxiv.org/abs/2106.02835v4 ) ライセンス: Link先を確認	Weilin Chen, Jie Qiao, Ruichu Cai, Zhifeng Hao	(参考訳) 観測データからの因果発見は多くの科学分野において重要であるが難しい課題である。近年, NOTEARSと呼ばれる非組合せ的な有向非巡回制約を用いた手法が, 因果構造学習問題を最小二乗損失を用いた連続最適化問題として定式化している。最小二乗損失関数は標準ガウス雑音仮定の下では十分正当化されるが、仮定が成り立たない場合には限界がある。本研究では,ガウス雑音の仮定違反が因果方向の同定を妨げることを理論的に示し,線形の場合には因果の向きが因果強度と雑音の分散によって,非線形の場合には強い非ガウス雑音によって完全に決定されてしまうことを示す。そこで,任意の雑音分布下で尤度スコアと理論的に整合する,より一般的なエントロピーに基づく損失を提案する。提案手法の有効性を検証するために合成データと実世界のデータの両方について広範な実験評価を行い,提案手法が構造ハミング距離,偽発見率,真陽性率の各指標において最良であることを示す。 Causal discovery from observational data is an important but challenging task in many scientific fields. Recently, a method with a non-combinatorial directed acyclic constraint, called NOTEARS, formulated the causal structure learning problem as a continuous optimization problem using least-square loss. Though the least-square loss function is well justified under the standard Gaussian noise assumption, it is limited if the assumption does not hold. In this work, we theoretically show that the violation of the Gaussian noise assumption will hinder the causal direction identification, making the causal orientation fully determined by the causal strength as well as the variances of noises in the linear case and by the strong non-Gaussian noises in the nonlinear case. Consequently, we propose a more general entropy-based loss that is theoretically consistent with the likelihood score under any noise distribution. We run extensive empirical evaluations on both synthetic data and real-world data to validate the effectiveness of the proposed method and show that our method achieves the best results on the Structural Hamming Distance, False Discovery Rate, and True Positive Rate metrics.	翻訳日:2023-11-02 05:22:41 公開日:2023-10-30
# Evolving parametrized Loss for Image Classification Learning on Small Datasets ( http://arxiv.org/abs/2103.08249v2 ) License: see link	Zhaoyang Hai, Xiabi Liu	This paper proposes a meta-learning approach that evolves a parametrized loss function, called the Meta-Loss Network (MLN), for training image classifiers on small datasets. In our approach, the MLN is embedded in the classification learning framework as a differentiable objective function. The MLN is evolved with an Evolutionary Strategy (ES) algorithm into an optimized loss function, such that a classifier optimized to minimize this loss achieves good generalization. A classifier learns on a small training dataset to minimize the MLN with Stochastic Gradient Descent (SGD), and the MLN is then evolved using the precision of the small-dataset-updated classifier on a large validation dataset. To evaluate our approach, the MLN is trained on a large number of small-sample learning tasks sampled from FashionMNIST and tested on validation tasks sampled from FashionMNIST and CIFAR10. Experimental results demonstrate that the MLN effectively improves generalization compared to the classical cross-entropy and mean squared error losses.	Translated: 2023-11-02 05:22:21 Published: 2023-10-30
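The outer loop described in the entry above (perturb the loss parameters, score them, keep improvements) can be sketched with a generic (1+λ) evolution strategy. Everything here is a toy under stated assumptions: `toy_fitness` is a hypothetical stand-in for "validation precision of a classifier trained with the candidate loss", and the paper's actual ES variant and hyperparameters are not given in the abstract.

```python
import random


def evolve(fitness, theta, sigma=0.1, offspring=20, generations=50, seed=0):
    """(1+lambda) evolution strategy: keep the parent unless a child scores higher."""
    rng = random.Random(seed)
    parent, parent_fit = list(theta), fitness(theta)
    for _ in range(generations):
        # Each child is the parent plus independent Gaussian noise.
        children = [[t + rng.gauss(0.0, sigma) for t in parent]
                    for _ in range(offspring)]
        best_child = max(children, key=fitness)
        child_fit = fitness(best_child)
        if child_fit > parent_fit:
            parent, parent_fit = best_child, child_fit
    return parent, parent_fit


def toy_fitness(theta):
    """Hypothetical fitness with its maximum (zero) at theta = [1, -1]."""
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 1.0) ** 2)


best, best_fit = evolve(toy_fitness, [0.0, 0.0])
print(best, best_fit)
```

In the setting of the paper, each fitness evaluation would itself involve an inner SGD training run, which is why the fitness is treated as a black box here.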
# Reservoir Computing with Magnetic Thin Films ( http://arxiv.org/abs/2101.12700v2 ) License: see link	Matthew Dale, David Griffin, Richard F. L. Evans, Sarah Jenkins, Simon O'Keefe, Angelika Sebald, Susan Stepney, Fernando Torre, Martin Trefzer	Advances in artificial intelligence are driven by technologies inspired by the brain, but these technologies are orders of magnitude less powerful and energy efficient than biological systems. Inspired by the nonlinear dynamics of neural networks, new unconventional computing hardware has emerged with the potential to exploit natural phenomena and gain efficiency, in a similar manner to biological systems. Physical reservoir computing demonstrates this with a variety of unconventional systems, from optical-based to memristive systems. Reservoir computers provide a nonlinear projection of the task input into a high-dimensional feature space by exploiting the system's internal dynamics. A trained readout layer then combines features to perform tasks, such as pattern recognition and time-series analysis. Despite progress, achieving state-of-the-art performance without external signal processing to the reservoir remains challenging. Here we perform an initial exploration of three magnetic materials in thin-film geometries via microscale simulation.
Our results reveal that basic spin properties of magnetic films generate the required nonlinear dynamics and memory to solve machine learning tasks (although there would be practical challenges in exploiting these particular materials in physical implementations). The method of exploration can be applied to other materials, so this work opens up the possibility of testing different materials, from relatively simple (alloys) to significantly complex (antiferromagnetic reservoirs).	Translated: 2023-11-02 05:21:37 Published: 2023-10-30
# Unsupervised Performance Analysis of 3D Face Alignment with a Statistically Robust Confidence Test ( http://arxiv.org/abs/2004.06550v6 ) License: see link	Mostafa Sadeghi, Xavier Alameda-Pineda and Radu Horaud	This paper addresses the problem of analysing the performance of 3D face alignment (3DFA), or facial landmark localization. This task is usually supervised, based on annotated datasets. Nevertheless, in the particular case of 3DFA, the annotation process is rarely error-free, which strongly biases the results. Alternatively, unsupervised performance analysis (UPA) is investigated. The core ingredient of the proposed methodology is the robust estimation of the rigid transformation between predicted landmarks and model landmarks. It is shown that the rigid mapping thus computed is affected neither by non-rigid facial deformations, due to variabilities in expression and in identity, nor by landmark localization errors, due to various perturbations.
The guiding idea is to apply the estimated rotation, translation and scale to a set of predicted landmarks in order to map them onto a mathematical home for the shape embedded in these landmarks (including possible errors). UPA proceeds as follows: (i) 3D landmarks are extracted from a 2D face using the 3DFA method under investigation; (ii) these landmarks are rigidly mapped onto a canonical (frontal) pose, and (iii) a statistically robust confidence score is computed for each landmark. This allows one to assess whether the mapped landmarks lie inside (inliers) or outside (outliers) a confidence volume. An experimental evaluation protocol that uses publicly available datasets and several 3DFA software packages associated with published articles is described in detail. The results show that the proposed analysis is consistent with supervised metrics and that it can be used to measure the accuracy of both predicted landmarks and automatically annotated 3DFA datasets, to detect errors and to eliminate them. Source code and supplemental materials for this paper are publicly available at https://team.inria.fr/robotlearn/upa3dfa/.	Translated: 2023-11-02 05:20:02 Published: 2023-10-30
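Step (ii) above relies on estimating a rotation, translation and scale between two landmark sets. The sketch below shows the idea in a deliberately simplified setting: it is 2D instead of 3D, and it uses ordinary least squares instead of the robust estimator the paper describes. Points are encoded as complex numbers so that scale and rotation combine into one complex factor; all function names are assumptions for illustration.

```python
def fit_similarity_2d(source, target):
    """Least-squares scale + rotation + translation mapping source onto target.

    Points are complex numbers x + 1j*y. Returns (a, t) such that
    target ~ a * source + t, where abs(a) is the scale and the argument
    of a is the rotation angle.
    """
    n = len(source)
    mean_s = sum(source) / n
    mean_t = sum(target) / n
    num = sum((s - mean_s).conjugate() * (q - mean_t)
              for s, q in zip(source, target))
    den = sum(abs(s - mean_s) ** 2 for s in source)
    a = num / den
    return a, mean_t - a * mean_s


def residuals(source, target, a, t):
    """Distance between each mapped source landmark and its target."""
    return [abs(a * s + t - q) for s, q in zip(source, target)]


model = [0j, 1 + 0j, 1 + 1j, 1j]
predicted = [2j * p + (3 - 1j) for p in model]  # scale 2, rotate 90 degrees, shift
a, t = fit_similarity_2d(model, predicted)
print(a, t, residuals(model, predicted, a, t))
```

Per-landmark residuals like these are the raw material for an inlier/outlier decision; a plain least-squares fit is itself pulled by outliers, which is the reason the paper insists on a robust estimate.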
# Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation ( http://arxiv.org/abs/2203.16487v6 ) License: see link	Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui	We propose Speculative Decoding (SpecDec), for the first time ever, to formally study exploiting the idea of speculative execution to accelerate autoregressive (AR) decoding. Speculative Decoding has two innovations: Spec-Drafter -- an independent model specially optimized for efficient and accurate drafting -- and Spec-Verification -- a reliable method for verifying the drafted tokens efficiently in the decoding paradigm. Experimental results on various seq2seq tasks, including machine translation and abstractive summarization, show our approach can achieve around $5\times$ speedup for the popular Transformer architectures with generation quality comparable to beam search decoding, refreshing the impression that the draft-then-verify paradigm introduces only $1.4\times \sim 2\times$ speedup. In addition to the remarkable speedup, we also demonstrate 3 additional advantages of SpecDec, revealing its practical value for accelerating generative models in real-world applications. Our models and code are available at https://github.com/hemingkx/SpecDec.	Translated: 2023-11-02 05:12:16 Published: 2023-10-30
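The draft-then-verify paradigm in the entry above can be sketched as a toy loop over integer tokens. This is a simplified illustration under stated assumptions: it uses the strictest exact-match acceptance rule, not the paper's Spec-Verification criterion; it calls the target model once per token, whereas a real system verifies a whole block in one parallel pass; and `toy_target` and `toy_draft` are hypothetical stand-ins for the two models.

```python
def speculative_decode(target_next, draft_next, prefix, max_len, block=4):
    """Greedy draft-then-verify decoding over integer tokens.

    target_next and draft_next map a token list to the next token. The
    result always equals plain greedy decoding with target_next.
    """
    tokens = list(prefix)
    while len(tokens) < max_len:
        # Draft a block of tokens with the cheap model.
        draft = []
        for _ in range(block):
            draft.append(draft_next(tokens + draft))
        # Verify: accept drafted tokens up to the first disagreement.
        accepted = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # correct the mismatch and stop
                break
            accepted.append(tok)
        tokens.extend(accepted)
    return tokens[:max_len]


def toy_target(tokens):
    """Hypothetical 'large model': a fixed arithmetic rule."""
    return (tokens[-1] * 2 + 1) % 11


def toy_draft(tokens):
    """Hypothetical 'drafter': agrees with the target except at every third position."""
    return toy_target(tokens) if len(tokens) % 3 else 0


print(speculative_decode(toy_target, toy_draft, [1], 12))
```

Each loop iteration emits at least one verified token, so the speedup in practice comes from how many drafted tokens survive verification per target-model pass.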
# Treatment Learning Causal Transformer for Noisy Image Classification ( http://arxiv.org/abs/2203.15529v2 ) License: see link	Chao-Han Huck Yang, I-Te Danny Hung, Yi-Chieh Liu, Pin-Yu Chen	Current top-notch deep learning (DL) based vision models are primarily based on exploring and exploiting the inherent correlations between training data samples and their associated labels. However, a known practical challenge is their degraded performance against "noisy" data, induced by different circumstances such as spurious correlations, irrelevant contexts, domain shift, and adversarial attacks. In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy by jointly estimating their treatment effects. Motivated by causal variational inference, we propose a transformer-based architecture, Treatment Learning Causal Transformer (TLT), that uses a latent generative model to estimate robust feature representations from current observational input for noisy image classification. Depending on the estimated noise level (modeled as a binary treatment factor), TLT assigns the corresponding inference network trained by the designed causal loss for prediction.
We also create new noisy image datasets incorporating a wide range of noise factors (e.g., object masking, style transfer, and adversarial perturbation) for performance benchmarking. The superior performance of TLT in noisy image classification is further validated by several refutation evaluation metrics. As a by-product, TLT also improves visual salience methods for perceiving noisy images.	Translated: 2023-11-02 05:11:52 Published: 2023-10-30
# Cascaded Sparse Feature Propagation Network for Interactive Segmentation ( http://arxiv.org/abs/2203.05145v3 ) License: see link	Chuyu Zhang, Chuanyang Hu, Hui Ren, Yongfei Liu, and Xuming He	We aim to tackle the problem of point-based interactive segmentation, in which the key challenge is to propagate the user-provided annotations to unlabeled regions efficiently. Existing methods tackle this challenge by utilizing computationally expensive fully connected graphs or transformer architectures that sacrifice important fine-grained information required for accurate segmentation. To overcome these limitations, we propose a cascade sparse feature propagation network that learns a click-augmented feature representation for propagating user-provided information to unlabeled regions. The sparse design of our network enables efficient information propagation on high-resolution features, resulting in more detailed object segmentation. We validate the effectiveness of our method through comprehensive experiments on various benchmarks, and the results demonstrate the superior performance of our approach. Code is available at https://github.com/kleinzcy/CSFPN.	Translated: 2023-11-02 05:11:20 Published: 2023-10-30
# Addition and Differentiation of ZX-diagrams ( http://arxiv.org/abs/2202.11386v3 ) License: see link	Emmanuel Jeandel and Simon Perdrix and Margarita Veshchezerova	The ZX-calculus is a powerful framework for reasoning in quantum computing. In particular, it provides a compact representation of matrices of interest. A peculiar property of the ZX-calculus is the absence of a formal sum allowing linear combinations of arbitrary ZX-diagrams. The universality of the formalism guarantees, however, that for any two ZX-diagrams, the sum of their interpretations can be represented by a ZX-diagram. We introduce a general, inductive definition of the addition of ZX-diagrams, relying on the construction of controlled diagrams. Based on this addition technique, we provide an inductive differentiation of ZX-diagrams. Indeed, given a ZX-diagram with variables in the description of its angles, one can differentiate the diagram according to one of these variables. Differentiation is ubiquitous in quantum mechanics and quantum computing (e.g. for solving optimization problems). Technically, differentiation of ZX-diagrams is strongly related to summation, as witnessed by the product rules. We also introduce an alternative, non-inductive differentiation technique based on the isolation of the variables. Finally, we apply our results to deduce a diagram for an Ising Hamiltonian.	Translated: 2023-11-02 05:10:26 Published: 2023-10-30
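The link between differentiation and summation noted in the entry above is the ordinary product rule: because sequential composition and the tensor product are both bilinear, differentiating a composite of parameterized diagrams yields a sum of diagrams. In standard linear-map notation (an illustration with assumed symbols $D_1$, $D_2$, not the paper's diagrammatic rules):

```latex
\frac{\partial}{\partial\theta}\bigl(D_2(\theta)\circ D_1(\theta)\bigr)
  = \frac{\partial D_2}{\partial\theta}\circ D_1(\theta)
  + D_2(\theta)\circ\frac{\partial D_1}{\partial\theta},
\qquad
\frac{\partial}{\partial\theta}\bigl(D_1(\theta)\otimes D_2(\theta)\bigr)
  = \frac{\partial D_1}{\partial\theta}\otimes D_2(\theta)
  + D_1(\theta)\otimes\frac{\partial D_2}{\partial\theta}.
```

This is why an inductive definition of the derivative needs a diagrammatic notion of addition first.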
# Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study ( http://arxiv.org/abs/2202.09545v3 ) License: see link	Yuecong Xu, Jianfei Yang, Haozhi Cao, Jianxiong Yin, Zhenghua Chen, Xiaoli Li, Zhengguo Li, Qianwen Xu	While action recognition (AR) has gained large improvements with the introduction of large-scale video datasets and the development of deep neural networks, AR models robust to challenging environments in real-world scenarios are still under-explored. We focus on the task of action recognition in dark environments, which can be applied to fields such as surveillance and autonomous driving at night. Intuitively, current deep networks along with visual enhancement techniques should be able to handle AR in dark environments; however, it is observed that this is not always the case in practice. To dive deeper into exploring solutions for AR in dark environments, we launched the UG2+ Challenge Track 2 (UG2-2) in IEEE CVPR 2021, with the goal of evaluating and advancing the robustness of AR models in dark environments. The challenge builds and expands on top of a novel ARID dataset, the first dataset for the task of dark video AR, and guides models to tackle such a task in both fully and semi-supervised manners.
Baseline results utilizing current AR models and enhancement methods are reported, justifying the challenging nature of this task with substantial room for improvements. Thanks to the active participation from the research community, notable advances have been made in participants' solutions, while analysis of these solutions helped better identify possible directions to tackle the challenge of AR in dark environments.	Translated: 2023-11-02 05:09:47 Published: 2023-10-30
# Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport ( http://arxiv.org/abs/2202.06208v3 ) License: see link	Fang Wu, Nicolas Courty, Shuting Jin, Stan Z. Li	Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential in accelerating the discovery of new substances with desired properties.	Translated: 2023-11-02 05:09:22 Published: 2023-10-30
# Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement ( http://arxiv.org/abs/2202.00011v3 ) License: see link	Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava	Video compression is a central feature of the modern internet, powering technologies from social media to video conferencing. While video compression continues to mature, for many compression settings, quality loss is still noticeable. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth-constrained or otherwise unstable connections. In this work, we develop a deep learning architecture capable of restoring detail to compressed videos, which leverages the underlying structure and motion information embedded in the video bitstream. We show that this improves restoration accuracy compared to prior compression correction methods and is competitive on rate-distortion with recent deep-learning-based video compression methods while achieving higher throughput. Furthermore, we condition our model on quantization data which is readily available in the bitstream. This allows our single model to handle a variety of different compression quality settings, which required an ensemble of models in prior work.	Translated: 2023-11-02 05:09:09 Published: 2023-10-30
# Learning with Subset Stacking ( http://arxiv.org/abs/2112.06251v3 ) License: see link	S. İlker Birbil, Sinan Yildirim, Kaya Gökalp, M. Hakan Akyüz	We propose a new regression algorithm that learns from a set of input-output pairs. Our algorithm is designed for populations where the relation between the input variables and the output variable exhibits a heterogeneous behavior across the predictor space. The algorithm starts with generating subsets that are concentrated around random points in the input space. This is followed by training a local predictor for each subset. Those predictors are then combined in a novel way to yield an overall predictor. We call this algorithm "LEarning with Subset Stacking" or LESS, due to its resemblance to the method of stacking regressors. We compare the testing performance of LESS with state-of-the-art methods on several datasets. Our comparison shows that LESS is a competitive supervised learning method. Moreover, we observe that LESS is also efficient in terms of computation time and it allows a straightforward parallel implementation.	Translated: 2023-11-02 05:08:41 Published: 2023-10-30
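The three steps named in the entry above (subsets around random points, one local predictor per subset, a combination step) can be sketched for one-dimensional inputs as follows. This is a loose illustration, not the LESS algorithm itself: the local models are plain least-squares lines, and the combination is a simple distance-weighted average that is swapped in here as an assumption in place of the paper's stacking step. All names and defaults are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x in one dimension."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def fit_less_sketch(xs, ys, n_subsets=5, subset_size=10, seed=0):
    """Train one local line per subset centred on a randomly chosen sample."""
    rng = random.Random(seed)
    models = []
    for anchor in rng.sample(xs, n_subsets):
        # The subset is the subset_size samples closest to the anchor.
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - anchor))[:subset_size]
        a, b = fit_line([xs[i] for i in nearest], [ys[i] for i in nearest])
        models.append((anchor, a, b))
    return models


def predict(models, x, bandwidth=1.0):
    """Blend local predictions with weights that decay with distance to each anchor."""
    weights = [math.exp(-abs(x - anchor) / bandwidth) for anchor, _, _ in models]
    total = sum(weights)
    return sum(w * (a + b * x) for w, (_, a, b) in zip(weights, models)) / total


xs = [i * 0.5 for i in range(40)]
ys = [abs(x - 10.0) for x in xs]  # slope changes sign at x = 10
models = fit_less_sketch(xs, ys)
print(predict(models, 3.0), predict(models, 17.0))
```

Because each subset is fitted independently, the local training loop is the part that parallelizes trivially, which is consistent with the remark on parallel implementation in the abstract.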
# Lipschitz Bandits with Batched Feedback ( http://arxiv.org/abs/2110.09722v6 ) License: see link	Yasong Feng, Zengfeng Huang, Tianyu Wang	In this paper, we study Lipschitz bandit problems with batched feedback, where the expected reward is Lipschitz and the reward observations are communicated to the player in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that optimally solves this problem. Specifically, we show that for a $T$-step problem with Lipschitz reward of zooming dimension $d_z$, our algorithm achieves the theoretically optimal (up to logarithmic factors) regret rate $\widetilde{\mathcal{O}}\left(T^{\frac{d_z+1}{d_z+2}}\right)$ using only $\mathcal{O}\left(\log\log T\right)$ batches. We also provide a complexity analysis for this problem. Our theoretical lower bound implies that $\Omega(\log\log T)$ batches are necessary for any algorithm to achieve the optimal regret. Thus, BLiN achieves the optimal regret rate (up to logarithmic factors) using minimal communication.	Translated: 2023-11-02 05:07:36 Published: 2023-10-30
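To give a feel for how a doubly logarithmic number of batches can cover a horizon of T steps, the sketch below builds the geometric grid that is common in the batched-bandit literature, where batch i ends near T^(1 − 2^(−i)). This grid is an illustrative assumption and not BLiN's actual batch schedule, which the abstract does not specify.

```python
def batch_grid(horizon):
    """Batch end points t_i ~ horizon ** (1 - 2 ** -i), closed off at the horizon.

    The grid stops once the remaining factor horizon ** (2 ** -i) is at most 2,
    so the number of batches grows like log2(log2(horizon)).
    """
    ends = []
    i = 1
    while horizon ** (2.0 ** -i) > 2.0:
        ends.append(round(horizon ** (1.0 - 2.0 ** -i)))
        i += 1
    ends.append(horizon)
    return ends


for horizon in (10 ** 6, 10 ** 12):
    grid = batch_grid(horizon)
    print(horizon, len(grid), grid)
```

Squaring the horizon from 10^6 to 10^12 adds only a single batch to this grid, which is the behaviour that an O(log log T) bound describes.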
# Robustness Evaluation of Deep Unsupervised Learning Algorithms for Intrusion Detection Systems ( http://arxiv.org/abs/2207.03576v2 ) License: see link	D'Jeff Kanda Nkashama, Arian Soltani, Jean-Charles Verdier, Marc Frappier, Pierre-Martin Tardif, Froduald Kabanza	Recently, advances in deep learning have been observed in various fields, including computer vision, natural language processing, and cybersecurity. Machine learning (ML) has demonstrated its ability as a potential tool for anomaly detection-based intrusion detection systems to build secure computer networks. ML approaches are increasingly more widely adopted than heuristic approaches for cybersecurity because they learn directly from data. Data is critical for the development of ML systems, and becomes a potential target for attackers. Data poisoning, or contamination, is one of the most common techniques used to fool ML models through data. This paper evaluates the robustness of six recent deep learning algorithms for intrusion detection on contaminated data. Our experiments suggest that the state-of-the-art algorithms used in this study are sensitive to data contamination and reveal the importance of self-defense against data perturbation when developing novel models, especially for intrusion detection systems.	Translated: 2023-11-02 05:00:16 Published: 2023-10-30
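A contamination experiment of the kind described in the entry above needs training sets in which a controlled fraction of the supposedly normal data is in fact attack traffic. The helper below is a minimal sketch of that setup; the function name, the label convention (0 = normal, 1 = attack) and the sampling scheme are assumptions, since the abstract does not describe the exact protocol.

```python
import random


def contaminate(normal, attacks, ratio, seed=0):
    """Return a shuffled list of (sample, label) pairs with `ratio` attack samples.

    All normal samples are kept; enough attacks are drawn so that they make
    up the requested fraction of the final training set.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_attacks / (len(normal) + n_attacks) = ratio for n_attacks.
    n_attacks = round(len(normal) * ratio / (1.0 - ratio))
    if n_attacks > len(attacks):
        raise ValueError("not enough attack samples for this ratio")
    mixed = [(x, 0) for x in normal]
    mixed += [(x, 1) for x in rng.sample(attacks, n_attacks)]
    rng.shuffle(mixed)
    return mixed


train = contaminate(list(range(90)), list(range(1000, 1050)), ratio=0.10)
print(len(train), sum(label for _, label in train))
```

Sweeping `ratio` over a few values and retraining an unsupervised detector on each set gives a robustness curve; the labels would be hidden from the detector and used only for scoring.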
# Augmentation-Aware Self-Supervision for Data-Efficient GAN Training ( http://arxiv.org/abs/2205.15677v3 ) License: see link	Liang Hou, Qi Cao, Yige Yuan, Songtao Zhao, Chongyang Ma, Siyuan Pan, Pengfei Wan, Zhongyuan Wang, Huawei Shen, Xueqi Cheng	Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting. Previously proposed differentiable augmentation demonstrates improved data efficiency of training GANs. However, the augmentation implicitly introduces undesired invariance to augmentation for the discriminator since it ignores the change of semantics in the label space caused by data transformation, which may limit the representation learning ability of the discriminator and ultimately affect the generative modeling performance of the generator. To mitigate the negative impact of invariance while inheriting the benefits of data augmentation, we propose a novel augmentation-aware self-supervised discriminator that predicts the augmentation parameter of the augmented data. Particularly, the prediction targets of real data and generated data are required to be distinguished since they are different during training.
We further encourage the generator to adversarially learn from the self-supervised discriminator by generating augmentation-predictable real and not fake data. This formulation connects the learning objective of the generator and the arithmetic-harmonic mean divergence under certain assumptions. We compare our method with state-of-the-art (SOTA) methods using the class-conditional BigGAN and unconditional StyleGAN2 architectures on data-limited CIFAR-10, CIFAR-100, FFHQ, LSUN-Cat, and five low-shot datasets. Experimental results demonstrate significant improvements of our method over SOTA methods in training data-efficient GANs.	Translated: 2023-11-02 04:57:31 Published: 2023-10-30
# Exact solution for the quantum and private capacities of bosonic dephasing channels ( http://arxiv.org/abs/2205.05736v2 ) License: see link	Ludovico Lami, Mark M. Wilde	The capacities of noisy quantum channels capture the ultimate rates of information transmission across quantum communication lines, and the quantum capacity plays a key role in determining the overhead of fault-tolerant quantum computation platforms. In the case of bosonic systems, central to many applications, no closed formulas for these capacities were known for bosonic dephasing channels, a key class of non-Gaussian channels modelling, e.g., noise affecting superconducting circuits or fiber-optic communication channels. Here we provide the first exact calculation of the quantum, private, two-way assisted quantum, and secret-key agreement capacities of all bosonic dephasing channels. We prove that they are equal to the relative entropy of the distribution underlying the channel to the uniform distribution. Our result solves a problem that has been open for over a decade, having been posed originally by [Jiang & Chen, Quantum and Nonlinear Optics 244, 2010].	Translated: 2023-11-02 04:57:04 Published: 2023-10-30
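The closed form stated in the entry above can be written out explicitly. The notation is assumed here for illustration: $p$ is the probability density of the random phase on $[-\pi,\pi]$ that defines the dephasing channel $\mathcal{N}_p$, $u(\phi)=1/(2\pi)$ is the uniform density, and $Q$, $P$, $Q_{\leftrightarrow}$, $K$ denote the quantum, private, two-way assisted quantum, and secret-key agreement capacities.

```latex
Q(\mathcal{N}_p) = P(\mathcal{N}_p) = Q_{\leftrightarrow}(\mathcal{N}_p) = K(\mathcal{N}_p)
  = D(p\,\|\,u)
  = \int_{-\pi}^{\pi} p(\phi)\,\log_2\frac{p(\phi)}{1/(2\pi)}\,\mathrm{d}\phi
  = \log_2(2\pi) - h(p),
\qquad
h(p) = -\int_{-\pi}^{\pi} p(\phi)\,\log_2 p(\phi)\,\mathrm{d}\phi .
```

A uniform phase distribution gives $h(p)=\log_2(2\pi)$ and therefore zero capacity, while a sharply peaked $p$ (weak dephasing) gives a large one.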
# INSPIRE: Distributed Bayesian Optimization for ImproviNg SPatIal REuse in Dense WLANs ( http://arxiv.org/abs/2204.10184v3 ) License: see link	Anthony Bardou, Thomas Begin	WLANs, which have overtaken wired networks to become the primary means of connecting devices to the Internet, are prone to performance issues due to the scarcity of space in the radio spectrum. As a response, IEEE 802.11ax and subsequent amendments aim at increasing the spatial reuse of a radio channel by allowing the dynamic update of two key parameters in wireless transmission: the transmission power (TX_POWER) and the sensitivity threshold (OBSS_PD). In this paper, we present INSPIRE, a distributed solution performing local Bayesian optimizations based on Gaussian processes to improve the spatial reuse in WLANs. INSPIRE makes no explicit assumptions about the topology of WLANs and favors altruistic behaviors of the access points, leading them to find adequate configurations of their TX_POWER and OBSS_PD parameters for the "greater good" of the WLANs. We demonstrate the superiority of INSPIRE over other state-of-the-art strategies using the ns-3 simulator and two examples inspired by real-life deployments of dense WLANs. Our results show that, in only a few seconds, INSPIRE is able to drastically increase the quality of service of operational WLANs by improving their fairness and throughput.	Translated: 2023-11-02 04:55:57 Published: 2023-10-30
# Studying Drowsiness Detection Performance while Driving through Scalable Machine Learning Models using Electroencephalography ( http://arxiv.org/abs/2209.04048v3 ) License: see link	José Manuel Hidalgo Rogel, Enrique Tomás Martínez Beltrán, Mario Quiles Pérez, Sergio López Bernal, Gregorio Martínez Pérez, Alberto Huertas Celdrán	- Background / Introduction: Driver drowsiness is a significant concern and one of the leading causes of traffic accidents. Advances in cognitive neuroscience and computer science have enabled the detection of drivers' drowsiness using Brain-Computer Interfaces (BCIs) and Machine Learning (ML). However, the literature lacks a comprehensive evaluation of drowsiness detection performance using a heterogeneous set of ML algorithms, and it is necessary to study the performance of scalable ML models suitable for groups of subjects. - Methods: To address these limitations, this work presents an intelligent framework employing BCIs and features based on electroencephalography for detecting drowsiness in driving scenarios. The SEED-VIG dataset is used to evaluate the best-performing models for individual subjects and groups.
- Results: Results show that Random Forest (RF) outperformed other models used in the literature, such as Support Vector Machine (SVM), with a 78% f1-score for individual models. Regarding scalable models, RF reached a 79% f1-score, demonstrating the effectiveness of these approaches. This publication highlights the relevance of exploring a diverse set of ML algorithms and scalable approaches suitable for groups of subjects to improve drowsiness detection systems and ultimately reduce the number of accidents caused by driver fatigue. - Conclusions: The lessons learned from this study show that not only SVM but also other models not sufficiently explored in the literature are relevant for drowsiness detection. Additionally, scalable approaches are effective in detecting drowsiness, even when new subjects are evaluated. Thus, the proposed framework presents a novel approach for detecting drowsiness in driving scenarios using BCIs and ML.	翻訳日:2023-11-02 04:45:53 公開日:2023-10-30
# OOV-STR用視覚言語適応型相互デコーダ Vision-Language Adaptive Mutual Decoder for OOV-STR ( http://arxiv.org/abs/2209.00859v2 ) ライセンス: Link先を確認	Jinshui Hu, Chenyu Liu, Qiandong Yan, Xuyang Zhu, Jiajia Wu, Jun Du, Lirong Dai	(参考訳) 近年の研究では、一般的な語彙内(IV)シーンテキスト認識において深層学習モデルが大きな成功を収めている。しかし、現実のシナリオでは、語彙外(OOV)の単語は非常に重要であり、SOTA認識モデルは通常、OOVの設定で性能が悪い。学習された言語事前知識はOOV性能に限界があるという直感に触発されて、視覚言語適応型相互デコーダ(VLAMD)というフレームワークを設計し、OOVの問題に部分的に対処する。 VLAMDは3つの主要なコンポーネントから構成される。まず,2つの視覚のみのモジュールを適応的に結合したアテンションベースLSTMデコーダを構築し,視覚と言語のバランスが取れたメインブランチを得る。次に,共通視覚および言語先行表現学習のための補助的クエリベース自己回帰トランスフォーマ復号ヘッドを追加する。最後に、これらの2つの設計を、より多様な言語モデリングのための双方向トレーニングと組み合わせ、より堅牢な結果を得るために相互に逐次復号を行う。提案手法は,ECCV 2022 TiE Workshop の OOV-ST Challenge の Cropped Word Recognition Task において,IV+OOV と OOV の設定に対して,それぞれ70.31\% と59.61\% の単語精度を達成し,両設定で1位を獲得した。 Recent works have shown huge success of deep learning models for common in-vocabulary (IV) scene text recognition. However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance and SOTA recognition models usually perform poorly on OOV settings. Inspired by the intuition that the learned language prior has limited OOV performance, we design a framework named Vision Language Adaptive Mutual Decoder (VLAMD) to partly tackle OOV problems. VLAMD consists of three main components. Firstly, we build an attention-based LSTM decoder with two adaptively merged visual-only modules, yielding a vision-language balanced main branch. Secondly, we add an auxiliary query-based autoregressive transformer decoding head for common visual and language prior representation learning. Finally, we couple these two designs with bidirectional training for more diverse language modeling, and do mutual sequential decoding to get more robust results. Our approach achieved 70.31\% and 59.61\% word accuracy on IV+OOV and OOV settings respectively on Cropped Word Recognition Task of OOV-ST Challenge at ECCV 2022 TiE Workshop, where we got 1st place on both settings.	翻訳日:2023-11-02 04:45:05 公開日:2023-10-30
# 反応に対する一般介入による不変表現の学習 Learning Invariant Representations under General Interventions on the Response ( http://arxiv.org/abs/2208.10027v3 ) ライセンス: Link先を確認	Kang Du and Yu Xiang	(参考訳) 近年、異なる環境から特徴と応答のペアを観察することが一般的になっている。その結果、分散シフトによって異なる分布を持つデータに学習した予測器を適用する必要がある。 1つの原理的なアプローチは、トレーニングとテストモデルを記述するために構造因果モデルを採用することである。しかし、この原則は、応答がインターバルされたときに実践的な設定で違反する可能性がある。自然の疑問は、目に見えない環境で予測を促進するために他の形の不変性を特定することができるかどうかである。そこで本研究では, 線形構造因果モデル (SCM) に焦点をあて, 付加的な特徴を通じて介入を捕捉する明示的な関係性である不変マッチング特性 (IMP) を導入し, 応答に対する一般的な介入と予測器の統一的な処理を可能にする, 新たな不変形の不変性を実現する。本手法の漸近的一般化誤差を離散的および連続的な環境条件下で解析し,半パラメトリック変動係数モデルに関連付けて連続ケースを処理した。新型コロナウイルスのデータセットを含む様々な実験環境において,既存の手法と比較して競合性能を示すアルゴリズムを提案する。 It has become increasingly common nowadays to collect observations of feature and response pairs from different environments. As a consequence, one has to apply learned predictors to data with a different distribution due to distribution shifts. One principled approach is to adopt the structural causal models to describe training and test models, following the invariance principle which says that the conditional distribution of the response given its predictors remains the same across environments. However, this principle might be violated in practical settings when the response is intervened. A natural question is whether it is still possible to identify other forms of invariance to facilitate prediction in unseen environments. To shed light on this challenging scenario, we focus on linear structural causal models (SCMs) and introduce invariant matching property (IMP), an explicit relation to capture interventions through an additional feature, leading to an alternative form of invariance that enables a unified treatment of general interventions on the response as well as the predictors. We analyze the asymptotic generalization errors of our method under both the discrete and continuous environment settings, where the continuous case is handled by relating it to the semiparametric varying coefficient models. 
We present algorithms that show competitive performance compared to existing methods over various experimental settings including a COVID dataset.	翻訳日:2023-11-02 04:44:38 公開日:2023-10-30
# 開量子系の緩和における時間的臨界スケーリングの兆候 Indication of critical scaling in time during the relaxation of an open quantum system ( http://arxiv.org/abs/2208.05164v2 ) ライセンス: Link先を確認	Ling-Na Wu, Jens Nettersheim, Julian Fe\ss, Alexander Schnell, Sabrina Burgardt, Silvia Hiebel, Daniel Adam, Andr\'e Eckardt and Artur Widera	(参考訳) 相転移は、温度や外部磁場のような連続的な制御パラメータに対する物理系の特異な応答挙動に対応する。相関長の発散に伴う連続相転移付近で, 微視的系詳細に依存しない臨界指数を持つ普遍的パワーロースケーリング挙動が見いだされる。近年、動的量子相転移と普遍的スケーリングが予測され、クエンチ後の孤立量子系の非平衡力学でも観察されており、そこでは時間が制御パラメータの役割を果たす。しかしながら、環境への散逸的な接触によって力学が駆動されるオープンシステムにおいて、そのような時間に関する臨界現象のシグネチャは、これまで捉えられていなかった。本稿では,混合状態によって記述された開量子系の緩和ダイナミクスにおいて,時間に対する臨界スケーリングも起こりうることを示す結果を報告する。ルビジウム原子の超低温ボースガスへのスピン交換による散逸結合によって誘導される個々のセシウム原子の大きな原子スピンの緩和ダイナミクスを実験的に測定した。初期状態が平衡から遠い場合、スピン状態のエントロピーは時間内にピークに達し、その最大値に過渡的に近づき、最終的にその低い平衡値に緩和される。さらに,数値シミュレーションに基づく有限サイズスケーリング解析により,これが大きなシステムサイズの極限における散逸系の時間に関する臨界点に対応することを示した。この臨界点は臨界時刻における特徴的長さの発散によって示され、システムの詳細とは独立な臨界指数によって特徴づけられる。 Phase transitions correspond to the singular behavior of physical systems in response to continuous control parameters like temperature or external fields. Near continuous phase transitions, associated with the divergence of a correlation length, universal power-law scaling behavior with critical exponents independent of microscopic system details is found. Recently, dynamical quantum phase transitions and universal scaling have been predicted and also observed in the non-equilibrium dynamics of isolated quantum systems after a quench, with time playing the role of the control parameter. However, signatures of such critical phenomena in time in open systems, whose dynamics is driven by the dissipative contact to an environment, were so far elusive. Here, we present results indicating that critical scaling with respect to time can also occur during the relaxation dynamics of an open quantum system described by mixed states. 
We experimentally measure the relaxation dynamics of the large atomic spin of individual Caesium atoms induced by the dissipative coupling via spin-exchange processes to an ultracold Bose gas of Rubidium atoms. For initial states far from equilibrium, the entropy of the spin state is found to peak in time, transiently approaching its maximum possible value, before eventually relaxing to its lower equilibrium value. Moreover, a finite-size scaling analysis based on numerical simulations shows that it corresponds to a critical point with respect to time of the dissipative system in the limit of large system sizes. It is signalled by the divergence of a characteristic length at a critical time, characterized by critical exponents that are found to be independent of system details.	翻訳日:2023-11-02 04:43:45 公開日:2023-10-30
# 20量子ビット量子シミュレータの複素状態再構成 Reconstructing complex states of a 20-qubit quantum simulator ( http://arxiv.org/abs/2208.04862v3 ) ライセンス: Link先を確認	Murali K. Kurmapu, V.V. Tiunova, E.S. Tiunov, Martin Ringbauer, Christine Maier, Rainer Blatt, Thomas Monz, Aleksey K. Fedorov, A.I. Lvovsky	(参考訳) 量子コンピュータとシミュレーターの開発に成功するための前提条件は、それらが生成する量子状態を測定することによって得られる物理的過程の正確な理解である。しかしながら、従来の量子状態推定に必要なリソースは、システムサイズと指数関数的にスケールし、代替アプローチの必要性を強調している。ここでは、有意にエンタングルした多量子ビット量子状態の効率的な再構成法を示す。行列積状態 ansatz の変分バージョンを用いて、20量子ビットのトラップイオンイジング型量子シミュレータで生成された量子状態のトモグラフィー(純状態近似)を行い、わずか27個の基底で、各基底につき1000回の測定により取得したデータを用いた。我々は、ニューラルネットワークの量子状態表現に基づく手法、すなわち制限ボルツマンマシンおよび自己回帰アーキテクチャを備えたフィードフォワードニューラルネットワークと比較して、優れた状態再構成品質とより高速な収束を観察する。本研究の結果は,多体量子系のクエンチダイナミクスによって生成される複素状態の効率的な実験的キャラクタリゼーションへの道を開く。 A prerequisite to the successful development of quantum computers and simulators is precise understanding of physical processes occurring therein, which can be achieved by measuring the quantum states they produce. However, the resources required for traditional quantum-state estimation scale exponentially with the system size, highlighting the need for alternative approaches. Here we demonstrate an efficient method for reconstruction of significantly entangled multi-qubit quantum states. Using a variational version of the matrix product state ansatz, we perform the tomography (in the pure-state approximation) of quantum states produced in a 20-qubit trapped-ion Ising-type quantum simulator, using the data acquired in only 27 bases with 1000 measurements in each basis. We observe superior state reconstruction quality and faster convergence compared to the methods based on neural network quantum state representations: restricted Boltzmann machines and feedforward neural networks with autoregressive architecture. Our results pave the way towards efficient experimental characterization of complex states produced by the quench dynamics of many-body quantum systems.	翻訳日:2023-11-02 04:43:24 公開日:2023-10-30
# R\'enyiおよび近似差分プライバシーのためのシャッフルによるより強力なプライバシー増幅 Stronger Privacy Amplification by Shuffling for R\'enyi and Approximate Differential Privacy ( http://arxiv.org/abs/2208.04591v2 ) ライセンス: Link先を確認	Vitaly Feldman and Audra McMillan and Kunal Talwar	(参考訳) 差分プライバシーのシャッフルモデルは、標準的なローカルモデルと中央モデルの中間にある信頼モデルとして注目されている[EFMRTT19; CSUZZ19]。このモデルの主な結果は、局所的にランダム化されたデータをランダムにシャッフルすることで、差分プライバシーの保証が増幅されるというものである。このような増幅は、データが匿名で提供されるシステムにとって、はるかに強力なプライバシー保証を意味する[BEMMRLRKTS17]。本研究では,シャッフルによるプライバシー増幅に関する最先端の結果を,理論と数値の両面で改善する。最初の貢献は、LDPランダム化器のシャッフル出力に対するR\'enyi差分プライバシーパラメータの、初の漸近的に最適な解析である。第2の貢献は、シャッフルによるプライバシーの増幅に関する新たな分析です。この分析は[FMT20]の技法を改良し、全てのパラメータ設定においてより厳密な数値境界をもたらす。 The shuffle model of differential privacy has gained significant interest as an intermediate trust model between the standard local and central models [EFMRTT19; CSUZZ19]. A key result in this model is that randomly shuffling locally randomized data amplifies differential privacy guarantees. Such amplification implies substantially stronger privacy guarantees for systems in which data is contributed anonymously [BEMMRLRKTS17]. In this work, we improve the state-of-the-art privacy amplification by shuffling results both theoretically and numerically. Our first contribution is the first asymptotically optimal analysis of the R\'enyi differential privacy parameters for the shuffled outputs of LDP randomizers. Our second contribution is a new analysis of privacy amplification by shuffling. This analysis improves on the techniques of [FMT20] and leads to tighter numerical bounds in all parameter settings.	翻訳日:2023-11-02 04:43:09 公開日:2023-10-30
# ギャップ量子多体系からの急速に混合されたマルコフ鎖 A rapidly mixing Markov chain from any gapped quantum many-body system ( http://arxiv.org/abs/2207.07044v2 ) ライセンス: Link先を確認	Sergey Bravyi, Giuseppe Carleo, David Gosset, Yinchen Liu	(参考訳) 分布 $\pi(x)=|\langle x|\psi\rangle|^2$ からビット文字列 $x$ をサンプリングする計算タスクを考える。ここで $\psi$ は局所ハミルトニアン $H$ の一意な基底状態である。我々の主な結果は、$H$の逆スペクトルギャップと、定常状態$\pi$を持つ関連する連続時間マルコフ連鎖の混合時間との直接的な関係を記述する。マルコフ連鎖は、基底状態振幅の比$\langle y|\psi\rangle/\langle x|\psi\rangle$が効率良く計算可能であり、$H$のスペクトルギャップがシステムサイズの逆多項式以上であり、連鎖の開始状態が効率よくチェックできる穏やかな技術的条件を満たす場合に、効率的に実装できる。これは、符号問題のないハミルトニアンとマルコフ連鎖の間の既知の関係を拡張する。この一般化を可能にするツールは、フェルミオン符号問題に対処するために以前は量子モンテカルロシミュレーションで使われていたいわゆる固定ノードハミルトニアン構成である。提案したサンプリングアルゴリズムを数値的に実装し,最大56量子ビットのHaldane-Shastry Hamiltonian基底状態からサンプリングする。我々は、固定ノードハミルトニアンに基づくマルコフ連鎖が標準のメトロポリス・ハスティングス・マルコフ連鎖よりも高速に混合されることを経験的に観察する。 We consider the computational task of sampling a bit string $x$ from a distribution $\pi(x)=|\langle x|\psi\rangle|^2$, where $\psi$ is the unique ground state of a local Hamiltonian $H$. Our main result describes a direct link between the inverse spectral gap of $H$ and the mixing time of an associated continuous-time Markov Chain with steady state $\pi$. The Markov Chain can be implemented efficiently whenever ratios of ground state amplitudes $\langle y|\psi\rangle/\langle x|\psi\rangle$ are efficiently computable, the spectral gap of $H$ is at least inverse polynomial in the system size, and the starting state of the chain satisfies a mild technical condition that can be efficiently checked. This extends a previously known relationship between sign-problem free Hamiltonians and Markov chains. The tool which enables this generalization is the so-called fixed-node Hamiltonian construction, previously used in Quantum Monte Carlo simulations to address the fermionic sign problem. We implement the proposed sampling algorithm numerically and use it to sample from the ground state of the Haldane-Shastry Hamiltonian with up to 56 qubits. 
We observe empirically that our Markov chain based on the fixed-node Hamiltonian mixes more rapidly than the standard Metropolis-Hastings Markov chain.	翻訳日:2023-11-02 04:42:47 公開日:2023-10-30
# 大規模言語モデルのコンセプタ支援型デバイアス Conceptor-Aided Debiasing of Large Language Models ( http://arxiv.org/abs/2211.11087v3 ) ライセンス: Link先を確認	Li S. Yifei, Lyle Ungar, Jo\~ao Sedoc	(参考訳) 事前訓練された大規模言語モデル(LLM)は、トレーニングコーパスの社会的バイアスを反映している。この問題を軽減するために多くの方法が提案されているが、デバイアスに失敗したり、モデルの精度を犠牲にしたりすることが多い。我々は,BERT や GPT などの LLM のバイアス部分空間を同定し,除去するために,ソフトプロジェクション手法であるコンセプタ(conceptor)を用いる。コンセプタの適用方法として, (1) コンセプタのNOT演算を用いた後処理によるバイアス部分空間投影と, (2) トレーニング中のすべてのレイヤにコンセプタ投影を明示的に組み込む新しいアーキテクチャであるconceptor-intervened BERT (CI-BERT) の2つを提案する。コンセプタによる後処理は,GLUEベンチマークでLLMの性能を維持しつつ,最先端(SoTA)のデバイアス結果を実現する。さらに、様々なシナリオにおいてロバストであり、既存のバイアス部分空間上のAND演算により交差バイアスを効率的に緩和することができる。 CI-BERTのトレーニングはすべてのレイヤのバイアスを考慮に入れ、バイアス軽減では後処理版を上回りうるが、CI-BERTは言語モデルの精度を低下させる。また,バイアス部分空間を慎重に構築することの重要性を示す。最善の結果は、偏りのある単語のリストから外れ値を削除し、それらを(OR演算によって)組み合わせ、それらの埋め込みをよりクリーンなコーパスの文から計算することで得られる。 Pre-trained large language models (LLMs) reflect the inherent social biases of their training corpus. Many methods have been proposed to mitigate this issue, but they often fail to debias or they sacrifice model accuracy. We use conceptors--a soft projection method--to identify and remove the bias subspace in LLMs such as BERT and GPT. We propose two methods of applying conceptors: (1) bias subspace projection by post-processing by the conceptor NOT operation; and (2) a new architecture, conceptor-intervened BERT (CI-BERT), which explicitly incorporates the conceptor projection into all layers during training. We find that conceptor post-processing achieves state-of-the-art (SoTA) debiasing results while maintaining LLMs' performance on the GLUE benchmark. Further, it is robust in various scenarios and can mitigate intersectional bias efficiently by its AND operation on the existing bias subspaces. Although CI-BERT's training takes all layers' bias into account and can beat its post-processing counterpart in bias mitigation, CI-BERT reduces the language model accuracy. We also show the importance of carefully constructing the bias subspace. 
The best results are obtained by removing outliers from the list of biased words, combining them (via the OR operation), and computing their embeddings using the sentences from a cleaner corpus.	翻訳日:2023-11-02 04:35:28 公開日:2023-10-30
# MLIC:学習画像圧縮のためのマルチ参照エントロピーモデル MLIC: Multi-Reference Entropy Model for Learned Image Compression ( http://arxiv.org/abs/2211.07273v7 ) ライセンス: Link先を確認	Wei Jiang, Jiayu Yang, Yongqi Zhai, Peirong Ning, Feng Gao, Ronggang Wang	(参考訳) 近年,学習画像圧縮は著しい性能を達成している。潜在表現の分布を推定するエントロピーモデルは、レート歪み性能の向上に重要な役割を果たしている。しかし、ほとんどのエントロピーモデルは1次元の相関のみを捉えるが、潜在表現はチャネル方向、局所空間、大域空間の相関を含む。この問題に対処するため、Multi-Reference Entropy Model (MEM) と高度なバージョンMEM$^+$を提案する。これらのモデルは潜在表現に存在する異なる種類の相関を捉える。具体的には、まず潜在表現をスライスに分割する。現在のスライスを復号する際には、予め復号されたスライスをコンテキストとして使用し、直前に復号されたスライスのアテンションマップを用いて、現在のスライスにおける大域的相関を予測する。ローカルコンテキストをキャプチャするために,性能劣化を回避する2つの拡張チェッカーボードコンテキストキャプチャ技術を導入する。 MEM と MEM$^+$ に基づいて,画像圧縮モデル MLIC と MLIC$^+$ を提案する。広範な実験評価により,我々のMLICおよびMLIC$^+$モデルは最先端の性能を達成し,PSNRで測定した場合,VTM-17.0と比較してKodakデータセット上でBDレートをそれぞれ$8.05\%$と$11.39\%$削減することが示された。私たちのコードはhttps://github.com/JiangWeibeta/MLICで利用可能です。 Recently, learned image compression has achieved remarkable performance. The entropy model, which estimates the distribution of the latent representation, plays a crucial role in boosting rate-distortion performance. However, most entropy models only capture correlations in one dimension, while the latent representation contains channel-wise, local spatial, and global spatial correlations. To tackle this issue, we propose the Multi-Reference Entropy Model (MEM) and the advanced version, MEM$^+$. These models capture the different types of correlations present in latent representation. Specifically, we first divide the latent representation into slices. When decoding the current slice, we use previously decoded slices as context and employ the attention map of the previously decoded slice to predict global correlations in the current slice. To capture local contexts, we introduce two enhanced checkerboard context capturing techniques that avoid performance degradation. Based on MEM and MEM$^+$, we propose image compression models MLIC and MLIC$^+$. 
Extensive experimental evaluations demonstrate that our MLIC and MLIC$^+$ models achieve state-of-the-art performance, reducing BD-rate by $8.05\%$ and $11.39\%$ on the Kodak dataset compared to VTM-17.0 when measured in PSNR. Our code is available at https://github.com/JiangWeibeta/MLIC.	翻訳日:2023-11-02 04:34:40 公開日:2023-10-30
# 自然な注釈付き単語セグメンテーションデータとしての音声における単語境界の抽出 Mining Word Boundaries in Speech as Naturally Annotated Word Segmentation Data ( http://arxiv.org/abs/2210.17122v2 ) ライセンス: Link先を確認	Lei Zhang, Zhenghua Li, Shilin Zhou, Chen Gong, Zhefeng Wang, Baoxing Huai, Min Zhang	(参考訳) 中国語単語セグメンテーション(CWS)のための自然な注釈付きデータ探索の初期の研究や、音声とテキスト処理の統合に関する最近の研究から着想を得たこの研究は、初めてパラレル音声/テキストデータから単語境界をマイニングすることを提案する。まず、実験で使用したCWSデータに関連する2つのインターネットソースから、並列音声/テキストデータを収集する。そして,文字レベルのアライメントを取得し,隣接する文字間の停止時間に応じて単語境界を決定するための単純なヒューリスティックなルールを設計する。最後に,モデルトレーニングに自然に付加したデータをより有効に活用できる,効果的な完全列学習戦略を提案する。実験によると、このアプローチはクロスドメインと低リソースの両方のシナリオでcwsのパフォーマンスを著しく向上させる。 Inspired by early research on exploring naturally annotated data for Chinese word segmentation (CWS), and also by recent research on integration of speech and text processing, this work for the first time proposes to mine word boundaries from parallel speech/text data. First we collect parallel speech/text data from two Internet sources that are related with CWS data used in our experiments. Then, we obtain character-level alignments and design simple heuristic rules for determining word boundaries according to pause duration between adjacent characters. Finally, we present an effective complete-then-train strategy that can better utilize extra naturally annotated data for model training. Experiments demonstrate our approach can significantly boost CWS performance in both cross-domain and low-resource scenarios.	翻訳日:2023-11-02 04:34:11 公開日:2023-10-30
# 最小エントロピー結合を用いた完全安全ステガノグラフィ Perfectly Secure Steganography Using Minimum Entropy Coupling ( http://arxiv.org/abs/2210.14889v4 ) ライセンス: Link先を確認	Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier	(参考訳) ステガノグラフィ(Steganography)とは、敵の第三者が隠された意味があることに気づかないような、秘密情報を無害な内容に符号化する実践である。この問題は古典的にセキュリティ文献で研究されてきたが、生成モデルの最近の進歩は、スケーラブルなステガノグラフィ技術を開発するセキュリティ研究者と機械学習研究者の間で共通の関心を呼んでいる。本研究では,Cachin (1998) のステガノグラフィの情報理論モデルにおいて, ステガノグラフィの手続きが完全に安全であるのは, それが結合によって誘導される場合, かつその場合に限ることを示す。さらに,完全に安全な手続きの中で,手続きが情報スループットを最大化するのは,それが最小エントロピー結合によって誘導される場合, かつその場合に限ることを示す。これらの知見から、私たちの知る限り、任意のカバーテキスト分布に対して完全なセキュリティ保証を達成する最初のステガノグラフィアルゴリズムが得られる。実証的検証として, GPT-2, WaveRNN, Image Transformer を通信チャネルとして用いて, 最小エントロピー結合に基づくアプローチを, 算術符号, Meteor, 適応動的グループ化の3つの現代的ベースラインと比較した。最小エントロピー結合に基づくアプローチは、より強いセキュリティ制約にもかかわらず、より優れたエンコーディング効率を実現する。これらの結果から, 最小エントロピー結合のレンズを通して情報理論的ステガノグラフィを捉えることは自然である可能性が示唆された。 Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning. While this problem has classically been studied in security literature, recent advances in generative models have led to a shared interest among security and machine learning researchers in developing scalable steganography techniques. In this work, we show that a steganography procedure is perfectly secure under Cachin (1998)'s information-theoretic model of steganography if and only if it is induced by a coupling. Furthermore, we show that, among perfectly secure procedures, a procedure maximizes information throughput if and only if it is induced by a minimum entropy coupling. These insights yield what are, to the best of our knowledge, the first steganography algorithms to achieve perfect security guarantees for arbitrary covertext distributions. 
To provide empirical validation, we compare a minimum entropy coupling-based approach to three modern baselines -- arithmetic coding, Meteor, and adaptive dynamic grouping -- using GPT-2, WaveRNN, and Image Transformer as communication channels. We find that the minimum entropy coupling-based approach achieves superior encoding efficiency, despite its stronger security constraints. In aggregate, these results suggest that it may be natural to view information-theoretic steganography through the lens of minimum entropy coupling.	翻訳日:2023-11-02 04:33:57 公開日:2023-10-30
# 有向非巡回グラフ上のトランスフォーマー Transformers over Directed Acyclic Graphs ( http://arxiv.org/abs/2210.13148v6 ) ライセンス: Link先を確認	Yuankai Luo, Veronika Thost, Lei Shi	(参考訳) トランスフォーマーモデルは最近、グラフ表現学習で人気を博し、通常のグラフニューラルネットワークでキャプチャされたもの以上の複雑な関係を学習する可能性がある。主な研究課題は、グラフの構造バイアスをトランスフォーマーアーキテクチャにどのように注入するかであり、非方向の分子グラフや近年ではより大きなネットワークグラフにもいくつかの提案がなされている。本稿では,有向非巡回グラフ (DAG) 上のトランスフォーマーについて検討し,(1) トランスフォーマーの通常の二次的複雑性よりもはるかに効率的で,同時にDAG構造を忠実に捉えた注意機構,(2) 前者を補完するDAGの部分的順序の位置エンコーディングを提案する。我々は、ソースコードグラフから引用ネットワークのノードへの分類に至るまで、さまざまなタスクに対する我々のアプローチを厳格に評価し、グラフトランスフォーマーを一般的にDAGに適合したグラフニューラルネットワークを上回り、品質と効率の両面でSOTAグラフトランスフォーマーの性能を向上させるという2つの重要な側面において有効であることを示す。 Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also for larger network graphs. In this paper, we study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs: (1) An attention mechanism that is considerably more efficient than the regular quadratic complexity of transformers and at the same time faithfully captures the DAG structure, and (2) a positional encoding of the DAG's partial order, complementing the former. We rigorously evaluate our approach over various types of tasks, ranging from classifying source code graphs to nodes in citation networks, and show that it is effective in two important aspects: in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.	翻訳日:2023-11-02 04:33:30 公開日:2023-10-30
# 多行動方策勾配について On Many-Actions Policy Gradient ( http://arxiv.org/abs/2210.13011v5 ) ライセンス: Link先を確認	Michal Nauman and Marek Cygan	(参考訳) 状態ごとに多数の行動サンプルを用いる確率的方策勾配 (SPG) の分散について検討した。我々は,比例的に延長した軌道を用いる単一行動エージェントと比較して,多行動SPGがより低い分散をもたらす条件を定める多行動最適性条件を導出する。 SPGの文脈における多行動サンプリングに動的モデルを活用するモデルベース多行動(MBMA)を提案する。 MBMAは、多行動SPGの既存の実装に関連する問題に対処し、モデルシミュレーションロールアウトの状態から推定されるSPGと比べて、より低いバイアスと同等の分散をもたらす。 MBMAのバイアスと分散の構造は理論によって予測されるものと一致している。その結果, MBMAはモデルフリー, 多行動, モデルベースのオンポリシーSPGベースラインと比較して, 一連の連続行動環境においてサンプル効率の向上とより高いリターンを実現している。 We study the variance of stochastic policy gradients (SPGs) with many action samples per state. We derive a many-actions optimality condition, which determines when many-actions SPG yields lower variance as compared to a single-action agent with proportionally extended trajectory. We propose Model-Based Many-Actions (MBMA), an approach leveraging dynamics models for many-actions sampling in the context of SPG. MBMA addresses issues associated with existing implementations of many-actions SPG and yields lower bias and comparable variance to SPG estimated from states in model-simulated rollouts. We find that MBMA bias and variance structure matches that predicted by theory. As a result, MBMA achieves improved sample efficiency and higher returns on a range of continuous action environments as compared to model-free, many-actions, and model-based on-policy SPG baselines.	翻訳日:2023-11-02 04:33:08 公開日:2023-10-30
# 臨床固有表現認識のための事前学習言語モデルの価値の検討 Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition ( http://arxiv.org/abs/2210.12770v4 ) ライセンス: Link先を確認	Samuel Belkadi and Lifeng Han and Yuping Wu and Goran Nenadic	(参考訳) 自然言語処理(NLP)の分野では,一般あるいはドメイン固有データから限られたリソースを持つ特定のタスクへの微調整事前学習言語モデル(PLM)の実践が人気を集めている。本研究では,この仮定を再考し,臨床NLP,特に薬物とその関連属性に対する固有表現認識について検討する。我々は,スクラッチからトレーニングした Transformer モデルと微調整された BERT ベースの LLM,すなわち BERT, BioBERT, ClinicalBERT を比較した。さらに、文脈学習を促進するために追加のCRF層がそのようなモデルに与える影響を検討する。我々はモデル開発と評価にn2c2-2018共有タスクデータを使用する。実験の結果は 1) CRF層は全ての言語モデルを改善した。 2) マクロ平均F1スコアを用いたBIO-strictスパンレベル評価では、微調整LLMは0.83以上のスコアを得たが、スクラッチからトレーニングしたTransformerCRFモデルも0.78以上を達成し、学習パラメータが39.80\%少ないなど、はるかに低いコストで同等の性能を示した。 3) 重み付き平均F1スコアを用いたBIO-strictスパンレベル評価では, ClinicalBERT-CRF, BERT-CRF, TransformerCRFのスコア差はより小さく, それぞれ97.59\%/97.44\%/96.84\%であった。 4) より優れたデータ分布のためのダウンサンプリングによる効率的なトレーニングの適用により、トレーニングコストとデータの必要性はさらに低減され、フルデータセット使用時と比べて約0.02ポイント低い程度の同様のスコアが維持される。我々のモデルは https://github.com/HECTA-UoM/TransformerCRF でホストされます。 The practice of fine-tuning Pre-trained Language Models (PLMs) from general or domain-specific data to a specific task with limited resources has gained popularity within the field of natural language processing (NLP). In this work, we re-visit this assumption and carry out an investigation in clinical NLP, specifically Named Entity Recognition on drugs and their related attributes. We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs, namely BERT, BioBERT, and ClinicalBERT. Furthermore, we examine the impact of an additional CRF layer on such models to encourage contextual learning. We use n2c2-2018 shared task data for model development and evaluations. The experimental outcomes show that 1) CRF layers improved all language models; 2) referring to BIO-strict span level evaluation using macro-average F1 score, although the fine-tuned LLMs achieved 0.83+ scores, the TransformerCRF model trained from scratch achieved 0.78+, demonstrating comparable performances with much lower cost - e.g.
with 39.80\% fewer training parameters; 3) referring to BIO-strict span-level evaluation using weighted-average F1 score, ClinicalBERT-CRF, BERT-CRF, and TransformerCRF exhibited lower score differences, with 97.59\%/97.44\%/96.84\% respectively; 4) applying efficient training by down-sampling for better data distribution further reduced the training cost and need for data, while maintaining similar scores - i.e. around 0.02 points lower compared to using the full dataset. Our models will be hosted at https://github.com/HECTA-UoM/TransformerCRF	翻訳日:2023-11-02 04:32:55 公開日:2023-10-30
# SimSCOOD:微調整ソースコードモデルにおける分布外一般化の体系的解析 SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models ( http://arxiv.org/abs/2210.04802v2 ) ライセンス: Link先を確認	Hossein Hajipour, Ning Yu, Cristian-Alexandru Staicu, Mario Fritz	(参考訳) ソースコードモデルの事前トレーニングには、大規模なコードデータセットがアクセスしやすくなっている。しかしながら、微調整フェーズでは、特定の下流タスクのコード分布を完全にカバーする代表的なトレーニングデータを得ることは、タスク固有の性質と限定的なラベル付けリソースのため、依然として困難である。さらに、事前学習モデルの微調整は、以前に獲得した事前学習知識の忘却を招きうる。これらは、予期せぬモデル推論挙動を伴う分布外(OOD)一般化問題につながるが、まだ体系的に研究されていない。本稿では、ソースコードデータ特性の異なる次元に沿って様々なOODシナリオをシミュレートする最初の体系的アプローチを提案し、それらのシナリオにおける微調整モデルの挙動について検討する。完全微調整法と低ランク適応(LoRA)微調整法を含む,異なる微調整法下でのモデルの挙動について検討する。最先端の4つの事前学習モデル上で実施し,2つのコード生成タスクに適用した総合的な解析により,OOD一般化問題に起因する複数の障害モードを明らかにした。さらに分析の結果,LoRA微調整は様々なシナリオにおいて完全微調整よりも一貫してOOD一般化性能が大幅に優れていることが判明した。 Large code datasets have become increasingly accessible for pre-training source code models. However, for the fine-tuning phase, obtaining representative training data that fully covers the code distribution for specific downstream tasks remains challenging due to the task-specific nature and limited labeling resources. Moreover, fine-tuning pretrained models can result in forgetting previously acquired pre-training knowledge. These lead to out-of-distribution (OOD) generalization issues with unexpected model inference behaviors that have not been systematically studied yet. In this paper, we contribute the first systematic approach that simulates various OOD scenarios along different dimensions of source code data properties and study the fine-tuned model behaviors in such scenarios. We investigate the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods. Our comprehensive analysis, conducted on four state-of-the-art pretrained models and applied to two code generation tasks, exposes multiple failure modes attributed to OOD generalization issues. 
Additionally, our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.	翻訳日:2023-11-02 04:31:44 公開日:2023-10-30
# VoLTA:弱教師付き局所特徴アライメントを用いた視覚言語トランスフォーマー VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment ( http://arxiv.org/abs/2210.04135v3 ) ライセンス: Link先を確認	Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun and Rama Chellappa	(参考訳) 視覚言語事前学習(VLP)は、最近、様々なユニモーダルおよびマルチモーダルダウンストリームアプリケーションに非常に効果的であることが証明された。しかしながら、既存のほとんどのエンドツーエンドVLP法は、オブジェクト検出、セグメンテーション、参照表現理解などのきめ細かい領域レベルのタスクをうまく処理するために、高解像度の画像・テキスト・ボックスデータを使用している。残念ながら、正確なバウンディングボックスアノテーションを備えた高解像度画像は、大規模に収集して教師信号として用いるには費用がかかる。本稿では,VoLTA(Vision-Language Transformer with weakly-supervised local-feature Alignment)を提案する。これは画像キャプションデータのみを用いながらきめ細かい領域レベルの画像理解を達成し,高価なボックスアノテーションの使用を不要にする新しいVLPパラダイムである。 VoLTAは、グラフ最適輸送に基づく弱教師付きアライメントをローカルイメージパッチとテキストトークンに適用し、明示的で自己正規化され、解釈可能な低レベルマッチング基準を得る。さらにVoLTAは、事前学習中にマルチモーダルフュージョンをユニモーダルバックボーンに深く押し込み、フュージョン固有のトランスフォーマー層を取り除くことで、メモリ要件をさらに削減する。広範囲の視覚および視覚言語ダウンストリームタスクに対する広範な実験は、粗粒度のダウンストリーム性能を損なうことなく、細粒度アプリケーションにおけるVoLTAの有効性を実証しており、はるかに多くのキャプションとボックスアノテーションを用いる手法をしばしば上回る。 Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accurate bounding box annotations are expensive to collect and use for supervision at scale. In this work, we propose VoLTA (Vision-Language Transformer with weakly-supervised local-feature Alignment), a new VLP paradigm that only utilizes image-caption data but achieves fine-grained region-level image understanding, eliminating the use of expensive box annotations. VoLTA adopts graph optimal transport-based weakly-supervised alignment on local image patches and text tokens to germinate an explicit, self-normalized, and interpretable low-level matching criterion. 
In addition, VoLTA pushes multi-modal fusion deep into the uni-modal backbones during pre-training and removes fusion-specific transformer layers, further reducing memory requirements. Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream performance, often outperforming methods using significantly more caption and box annotations.	翻訳日:2023-11-02 04:31:06 公開日:2023-10-30
# Learnware: Small Models Do Big ( http://arxiv.org/abs/2210.03647v3 ) License: see link	Zhi-Hua Zhou, Zhi-Hao Tan	There are complaints about current machine learning techniques, such as the requirement of a huge amount of training data and proficient training skills, the difficulty of continual learning, the risk of catastrophic forgetting, the leakage of private or proprietary data, etc. Most research efforts have focused on one of these issues separately, paying less attention to the fact that most issues are entangled in practice. The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed those issues, while becoming a serious source of carbon emissions. This article offers an overview of the learnware paradigm, which attempts to spare users from building machine learning models from scratch, with the hope of reusing small models to do things even beyond their original purposes, where the key ingredient is the specification, which enables a trained model to be adequately identified for reuse according to the requirements of future users who know nothing about the model in advance.	Translation date: 2023-11-02 04:30:38 Publication date: 2023-10-30
# Many-body Approximation for Non-negative Tensors ( http://arxiv.org/abs/2209.15338v3 ) License: see link	Kazu Ghalamkari, Mahito Sugiyama, Yoshinobu Kawahara	We present an alternative approach to decompose non-negative tensors, called many-body approximation. Traditional decomposition methods assume low-rankness in the representation, resulting in difficulties in global optimization and target rank selection. We avoid these problems by energy-based modeling of tensors, where a tensor and its mode correspond to a probability distribution and a random variable, respectively. Our model can be globally optimized in terms of KL divergence minimization by taking into account the interaction between variables (that is, modes), which can be tuned more intuitively than ranks. Furthermore, we visualize interactions between modes as tensor networks and reveal a nontrivial relationship between many-body approximation and low-rank approximation. We demonstrate the effectiveness of our approach in tensor completion and approximation.	Translation date: 2023-11-02 04:30:20 Publication date: 2023-10-30
# Synergetic quantum error mitigation by randomized compiling and zero-noise extrapolation for the variational quantum eigensolver ( http://arxiv.org/abs/2212.11198v2 ) License: see link	Tomochika Kurita, Hammam Qassim, Masatoshi Ishii, Hirotaka Oshima, Shintaro Sato, Joseph Emerson	We propose a quantum error mitigation strategy for the variational quantum eigensolver (VQE) algorithm. We find, via numerical simulation, that very small amounts of coherent noise in VQE can cause substantially large errors that are difficult to suppress by conventional mitigation methods, and yet our proposed mitigation strategy is able to significantly reduce these errors. The proposed strategy is a combination of previously reported techniques, namely randomized compiling (RC) and zero-noise extrapolation (ZNE). Intuitively, randomized compiling turns coherent errors in the circuit into stochastic Pauli errors, which facilitates extrapolation to the zero-noise limit when evaluating the cost function. Our numerical simulation of VQE for small molecules shows that the proposed strategy can mitigate energy errors induced by various types of coherent noise by up to two orders of magnitude.	Translation date: 2023-11-02 04:23:01 Publication date: 2023-10-30
# Meta-Learned Kernel For Blind Super-Resolution Kernel Estimation ( http://arxiv.org/abs/2212.07886v2 ) License: see link	Royson Lee, Rui Li, Stylianos I. Venieris, Timothy Hospedales, Ferenc Huszár, Nicholas D. Lane	Recent image degradation estimation methods have enabled single-image super-resolution (SR) approaches to better upsample real-world images. Among these methods, explicit kernel estimation approaches have demonstrated unprecedented performance at handling unknown degradations. Nonetheless, a number of limitations constrain their efficacy when used by downstream SR models. Specifically, this family of methods yields i) excessive inference time due to long per-image adaptation times and ii) inferior image fidelity due to kernel mismatch. In this work, we introduce a learning-to-learn approach that meta-learns from the information contained in a distribution of images, thereby enabling significantly faster adaptation to new images with substantially improved performance in both kernel estimation and image fidelity. Specifically, we meta-train a kernel-generating GAN, named MetaKernelGAN, on a range of tasks, such that when a new image is presented, the generator starts from an informed kernel estimate and the discriminator starts with a strong capability to distinguish between patch distributions. Compared with state-of-the-art methods, our experiments show that MetaKernelGAN better estimates the magnitude and covariance of the kernel, leading to state-of-the-art blind SR results within a similar computational regime when combined with a non-blind SR model. Through supervised learning of an unsupervised learner, our method maintains the generalizability of the unsupervised learner, improves the optimization stability of kernel estimation, and hence image adaptation, and leads to faster inference, with a speedup of between 14.24x and 102.1x over existing methods.	Translation date: 2023-11-02 04:21:58 Publication date: 2023-10-30
# On the Sensitivity of Reward Inference to Misspecified Human Models ( http://arxiv.org/abs/2212.04717v2 ) License: see link	Joey Hong and Kush Bhatia and Anca Dragan	Inferring reward functions from human behavior is at the center of value alignment - aligning AI objectives with what we, humans, actually want. But doing so relies on models of how humans behave given their objectives. After decades of research in cognitive science, neuroscience, and behavioral economics, obtaining accurate human models remains an open research topic. This begs the question: how accurate do these models need to be in order for the reward inference to be accurate? On the one hand, if small errors in the model can lead to catastrophic error in inference, the entire framework of reward learning seems ill-fated, as we will never have perfect models of human behavior. On the other hand, if as our models improve, we can have a guarantee that reward accuracy also improves, this would show the benefit of more work on the modeling side. We study this question both theoretically and empirically. We do show that it is unfortunately possible to construct small adversarial biases in behavior that lead to arbitrarily large errors in the inferred reward. However, and arguably more importantly, we are also able to identify reasonable assumptions under which the reward inference error can be bounded linearly in the error in the human model. Finally, we verify our theoretical insights in discrete and continuous control tasks with simulated and human data.	Translation date: 2023-11-02 04:21:20 Publication date: 2023-10-30
# Confidence-Conditioned Value Functions for Offline Reinforcement Learning ( http://arxiv.org/abs/2212.04607v2 ) License: see link	Joey Hong and Aviral Kumar and Sergey Levine	Offline reinforcement learning (RL) promises the ability to learn effective policies solely using existing, static datasets, without any costly online interaction. To do so, offline RL methods must handle distributional shift between the dataset and the learned policy. The most common approach is to learn conservative, or lower-bound, value functions, which underestimate the return of out-of-distribution (OOD) actions. However, such methods exhibit one notable drawback: policies optimized on such value functions can only behave according to a fixed, possibly suboptimal, degree of conservatism. This can be alleviated if we instead are able to learn policies for varying degrees of conservatism at training time and devise a method to dynamically choose one of them during evaluation. To do so, in this work, we propose learning value functions that additionally condition on the degree of conservatism, which we dub confidence-conditioned value functions. We derive a new form of a Bellman backup that simultaneously learns Q-values for any degree of confidence with high probability. By conditioning on confidence, our value functions enable adaptive strategies during online evaluation by controlling for confidence level using the history of observations thus far. This approach can be implemented in practice by conditioning the Q-function from existing conservative algorithms on the confidence. We theoretically show that our learned value functions produce conservative estimates of the true value at any desired confidence. Finally, we empirically show that our algorithm outperforms existing conservative offline RL algorithms on multiple discrete control domains.	Translation date: 2023-11-02 04:20:59 Publication date: 2023-10-30
# Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks ( http://arxiv.org/abs/2212.03447v2 ) License: see link	Fang Wu, Lirong Wu, Dragomir Radev, Jinbo Xu, Stan Z. Li	Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained due to the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale in a broad range of applications. Several previous studies consider combining these different protein modalities to promote the representation power of geometric neural networks, but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin and can be generalized to complex tasks.	Translation date: 2023-11-02 04:20:32 Publication date: 2023-10-30
# Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases ( http://arxiv.org/abs/2212.02648v3 ) License: see link	Mazda Moayeri, Wenxiao Wang, Sahil Singla, Soheil Feizi	We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues. Instead of requiring costly changes to one's data or model training, our method better utilizes the data one already has by sorting them. Specifically, we rank images within their classes based on spuriosity (the degree to which common spurious cues are present), proxied via deep neural features of an interpretable network. With spuriosity rankings, it is easy to identify minority subpopulations (i.e. low spuriosity images) and assess model bias as the gap in accuracy between high and low spuriosity images. One can even efficiently remove a model's bias at little cost to accuracy by finetuning its classification head on low spuriosity images, resulting in fairer treatment of samples regardless of spuriosity. We demonstrate our method on ImageNet, annotating $5000$ class-feature dependencies ($630$ of which we find to be spurious) and generating a dataset of $325k$ soft segmentations for these features along the way. Having computed spuriosity rankings via the identified spurious neural features, we assess biases for $89$ diverse models and find that class-wise biases are highly correlated across models. Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.	Translation date: 2023-11-02 04:20:12 Publication date: 2023-10-30
# One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning ( http://arxiv.org/abs/2212.00124v3 ) License: see link	Marc Rigter, Bruno Lacerda, Nick Hawes	Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is too costly or dangerous. In such safety-critical settings, decision-making should take into consideration the risk of catastrophic outcomes. In other words, decision-making should be risk-sensitive. Previous works on risk in offline RL combine offline RL techniques, to avoid distributional shift, with risk-sensitive RL algorithms, to achieve risk-sensitivity. In this work, we propose risk-sensitivity as a mechanism to jointly address both of these issues. Our model-based approach is risk-averse to both epistemic and aleatoric uncertainty. Risk-aversion to epistemic uncertainty prevents distributional shift, as areas not covered by the dataset have high epistemic uncertainty. Risk-aversion to aleatoric uncertainty discourages actions that may result in poor outcomes due to environment stochasticity. Our experiments show that our algorithm achieves competitive performance on deterministic benchmarks, and outperforms existing approaches for risk-sensitive objectives in stochastic domains.	Translation date: 2023-11-02 04:19:28 Publication date: 2023-10-30
# Approximating Intersections and Differences Between Linear Statistical Shape Models Using Markov Chain Monte Carlo ( http://arxiv.org/abs/2211.16314v2 ) License: see link	Maximilian Weiherer, Finn Klein, Bernhard Egger	To date, the comparison of Statistical Shape Models (SSMs) is often solely performance-based, carried out by means of simplistic metrics such as compactness, generalization, or specificity. Any similarities or differences between the actual shape spaces can neither be visualized nor quantified. In this paper, we present a new method to qualitatively compare two linear SSMs in dense correspondence by computing approximate intersection spaces and set-theoretic differences between the (hyper-ellipsoidal) allowable shape domains spanned by the models. To this end, we approximate the distribution of shapes lying in the intersection space using Markov chain Monte Carlo and subsequently apply Principal Component Analysis (PCA) to the posterior samples, eventually yielding a new SSM of the intersection space. We estimate differences between linear SSMs in a similar manner; here, however, the resulting spaces are no longer convex and we do not apply PCA but instead use the posterior samples for visualization. We showcase the proposed algorithm qualitatively by computing and analyzing intersection spaces and differences between publicly available face models, focusing on gender-specific male and female as well as identity and expression models. Our quantitative evaluation based on SSMs built from synthetic and real-world data sets provides detailed evidence that the introduced method is able to recover ground-truth intersection spaces and differences accurately.	Translation date: 2023-11-02 04:19:11 Publication date: 2023-10-30
# Outlier-Robust Gromov-Wasserstein for Graph Data ( http://arxiv.org/abs/2302.04610v2 ) License: see link	Lemin Kong, Jiajin Li, Jianheng Tang, Anthony Man-Cho So	Gromov-Wasserstein (GW) distance is a powerful tool for comparing and aligning probability distributions supported on different metric spaces. Recently, GW has become the main modeling technique for aligning heterogeneous data for a wide range of graph learning tasks. However, the GW distance is known to be highly sensitive to outliers, which can result in large inaccuracies if the outliers are given the same weight as other samples in the objective function. To mitigate this issue, we introduce a new and robust version of the GW distance called RGW. RGW features optimistically perturbed marginal constraints within a Kullback-Leibler divergence-based ambiguity set. To make the benefits of RGW more accessible in practice, we develop a computationally efficient and theoretically provable procedure using the Bregman proximal alternating linearized minimization algorithm. Through extensive experimentation, we validate our theoretical results and demonstrate the effectiveness of RGW on real-world graph learning tasks, such as subgraph matching and partial shape correspondence.	Translation date: 2023-11-02 04:10:26 Publication date: 2023-10-30
# Thresholds in the Robustness of Error Mitigation in Noisy Quantum Dynamics ( http://arxiv.org/abs/2302.04278v2 ) License: see link	Pradeep Niroula, Sarang Gopalakrishnan, Michael J. Gullans	Extracting useful information from noisy near-term quantum simulations requires error mitigation strategies. A broad class of these strategies relies on precise characterization of the noise source. We study the robustness of such strategies when the noise is imperfectly characterized. We adapt an Imry-Ma argument to predict the existence of a threshold in the robustness of error mitigation for random spatially local circuits in spatial dimensions $D \geq 2$: noise characterization disorder below the threshold rate allows for error mitigation up to times that scale with the number of qubits. For one-dimensional circuits, by contrast, mitigation fails at an $\mathcal{O}(1)$ time for any imperfection in the characterization of disorder. As a result, error mitigation is only a practical method for sufficiently well-characterized noise. We discuss further implications for tests of quantum computational advantage, fault-tolerant probes of measurement-induced phase transitions, and quantum algorithms in near-term devices.	Translation date: 2023-11-02 04:09:40 Publication date: 2023-10-30
# The Contextual Lasso: Sparse Linear Models via Deep Neural Networks ( http://arxiv.org/abs/2302.00878v3 ) License: see link	Ryan Thompson, Amir Dezfouli, Robert Kohn	Sparse linear models are one of several core tools for interpretable machine learning, a field of emerging importance as predictive models permeate decision-making in many domains. Unfortunately, sparse linear models are far less flexible as functions of their input features than black-box models like deep neural networks. With this capability gap in mind, we study a not-uncommon situation where the input features dichotomize into two groups: explanatory features, which are candidates for inclusion as variables in an interpretable model, and contextual features, which select from the candidate variables and determine their effects. This dichotomy leads us to the contextual lasso, a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features. The fitting process learns this function nonparametrically via a deep neural network. To attain sparse coefficients, we train the network with a novel lasso regularizer in the form of a projection layer that maps the network's output onto the space of $\ell_1$-constrained linear models. An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso without sacrificing the predictive power of a standard deep neural network.	Translation date: 2023-11-02 04:09:07 Publication date: 2023-10-30
# Test-Time Amendment with a Coarse Classifier for Fine-Grained Classification ( http://arxiv.org/abs/2302.00368v2 ) License: see link	Kanishk Jain, Shyamgopal Karthik, Vineet Gandhi	We investigate the problem of reducing mistake severity for fine-grained classification. Fine-grained classification can be challenging, mainly due to the requirement of domain expertise for accurate annotation. However, humans are particularly adept at performing coarse classification as it requires relatively low levels of expertise. To this end, we present a novel approach for Post-Hoc Correction called Hierarchical Ensembles (HiE) that utilizes label hierarchy to improve the performance of fine-grained classification at test-time using the coarse-grained predictions. By only requiring the parents of leaf nodes, our method significantly reduces average mistake severity while improving top-1 accuracy on the iNaturalist-19 and tieredImageNet-H datasets, achieving a new state-of-the-art on both benchmarks. We also investigate the efficacy of our approach in the semi-supervised setting. Our approach brings notable gains in top-1 accuracy while significantly decreasing the severity of mistakes as training data decreases for the fine-grained classes. The simplicity and post-hoc nature of HiE render it practical to use with any off-the-shelf trained model to improve its predictions further.	Translation date: 2023-11-02 04:08:42 Publication date: 2023-10-30
# The geometry of hidden representations of large transformer models ( http://arxiv.org/abs/2302.00294v2 ) License: see link	Lucrezia Valeriani, Diego Doimo, Francesca Cuturello, Alessandro Laio, Alessio Ansuini, Alberto Cazzaniga	Large transformers are powerful architectures used for self-supervised data analysis across various data types, including protein sequences, images, and text. In these models, the semantic structure of the dataset emerges from a sequence of transformations between one representation and the next. We characterize the geometric and statistical properties of these representations and how they change as we move through the layers. By analyzing the intrinsic dimension (ID) and neighbor composition, we find that the representations evolve similarly in transformers trained on protein language tasks and image reconstruction tasks. In the first layers, the data manifold expands, becoming high-dimensional, and then contracts significantly in the intermediate layers. In the last part of the model, the ID remains approximately constant or forms a second shallow peak. We show that the semantic information of the dataset is better expressed at the end of the first peak, and this phenomenon can be observed across many models trained on diverse datasets. Based on our findings, we point out an explicit strategy to identify, without supervision, the layers that maximize semantic content: representations at intermediate layers corresponding to a relative minimum of the ID profile are more suitable for downstream learning tasks.	Translation date: 2023-11-02 04:08:27 Publication date: 2023-10-30
# Self-Consistent Velocity Matching of Probability Flows ( http://arxiv.org/abs/2301.13737v3 ) License: see link	Lingxiao Li, Samuel Hurault, Justin Solomon	We present a discretization-free scalable framework for solving a large class of mass-conserving partial differential equations (PDEs), including the time-dependent Fokker-Planck equation and the Wasserstein gradient flow. The main observation is that the time-varying velocity field of the PDE solution needs to be self-consistent: it must satisfy a fixed-point equation involving the probability flow characterized by the same velocity field. Instead of directly minimizing the residual of the fixed-point equation with neural parameterization, we use an iterative formulation with a biased gradient estimator that bypasses significant computational obstacles with strong empirical performance. Compared to existing approaches, our method does not suffer from temporal or spatial discretization, covers a wider range of PDEs, and scales to high dimensions. Experimentally, our method recovers analytical solutions accurately when they are available and achieves superior performance in high dimensions with less training time compared to alternatives.	Translation date: 2023-11-02 04:08:08 Publication date: 2023-10-30
# Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data ( http://arxiv.org/abs/2301.12321v5 ) License: see link	Jang-Hyun Kim, Sangdoo Yun, Hyun Oh Song	Diagnosing and cleaning data is a crucial step for building robust machine learning systems. However, identifying problems within large-scale datasets with real-world distributions is challenging due to the presence of complex issues such as label errors, under-representation, and outliers. In this paper, we propose a unified approach for identifying the problematic data by utilizing a largely ignored source of information: a relational structure of data in the feature-embedded space. To this end, we present scalable and effective algorithms for detecting label errors and outlier data based on the relational graph structure of data. We further introduce a visualization tool that provides contextual information of a data point in the feature-embedded space, serving as an effective tool for interactively diagnosing data. We evaluate the label error and outlier/out-of-distribution (OOD) detection performances of our approach on the large-scale image, speech, and language domain tasks, including ImageNet, ESC-50, and SST2. Our approach achieves state-of-the-art detection performance on all tasks considered and demonstrates its effectiveness in debugging large-scale real-world datasets across various domains. We release code at https://github.com/snu-mllab/Neural-Relation-Graph.	Translation date: 2023-11-02 04:07:48 Publication date: 2023-10-30
# アクティブラーニング評価の落とし穴を探る--有意義なパフォーマンス評価のための体系的枠組み Navigating the Pitfalls of Active Learning Evaluation: A Systematic Framework for Meaningful Performance Assessment ( http://arxiv.org/abs/2301.10625v2 ) ライセンス: Link先を確認	Carsten T. Lüth, Till J. Bungert, Lukas Klein, Paul F. Jaeger	(参考訳) Active Learning (AL)は、ラベルなしデータのプールから最も情報性の高いサンプルをインタラクティブに選択することで、ラベル付けの負担を軽減することを目的としている。近年,ALクエリ手法の改良に関する研究が盛んに行われているが,半教師付き(Semi-SL)や自己教師付き学習(Self-SL)といった新たなパラダイムや,分類器構成の簡易な最適化と比較して,ALの有効性を疑問視する研究もある。このように、今日のAL文献は一貫性のない矛盾した状況を示しており、実践者はALを自身のタスクに使用すべきかどうか、またどのように使用すべきかについて確信を持てないままである。本研究では,AL法の体系的かつ現実的な評価が欠如していることから,この不整合が生じると主張する。具体的には,AL評価に必要な微妙な考察を反映した,現在の文献における5つの重要な落とし穴を明らかにする。さらに,これらの落とし穴を克服し,AL手法の性能に関する有意義な記述を可能にする評価フレームワークを提案する。本プロトコルの妥当性を示すために,様々なデータセット,クエリメソッド,AL設定,トレーニングパラダイムにまたがる画像分類に関する大規模実証研究とベンチマークを提案する。本研究は,文献上の矛盾点を明らかにするとともに,実践者に対して実践的な推奨を行うことを可能にした。ベンチマークは https://github.com/IML-DKFZ/realistic-al でホストされている。 Active Learning (AL) aims to reduce the labeling burden by interactively selecting the most informative samples from a pool of unlabeled data. While there has been extensive research on improving AL query methods in recent years, some studies have questioned the effectiveness of AL compared to emerging paradigms such as semi-supervised (Semi-SL) and self-supervised learning (Self-SL), or a simple optimization of classifier configurations. Thus, today's AL literature presents an inconsistent and contradictory landscape, leaving practitioners uncertain about whether and how to use AL in their tasks. In this work, we make the case that this inconsistency arises from a lack of systematic and realistic evaluation of AL methods. Specifically, we identify five key pitfalls in the current literature that reflect the delicate considerations required for AL evaluation. Further, we present an evaluation framework that overcomes these pitfalls and thus enables meaningful statements about the performance of AL methods. 
To demonstrate the relevance of our protocol, we present a large-scale empirical study and benchmark for image classification spanning various data sets, query methods, AL settings, and training paradigms. Our findings clarify the inconsistent picture in the literature and enable us to give hands-on recommendations for practitioners. The benchmark is hosted at https://github.com/IML-DKFZ/realistic-al .	翻訳日:2023-11-02 04:06:51 公開日:2023-10-30
# 多クラス分類における量子ニューラルネットワークの課題依存パワー Problem-Dependent Power of Quantum Neural Networks on Multi-Class Classification ( http://arxiv.org/abs/2301.01597v3 ) ライセンス: Link先を確認	Yuxuan Du, Yibo Yang, Dacheng Tao, Min-Hsiu Hsieh	(参考訳) 量子ニューラルネットワーク(QNN)は物理世界を理解する上で重要なツールとなっているが、その利点と限界は完全には理解されていない。特定の符号化方法を持つQNNの中には、古典的なサロゲートによって効率的にシミュレートできるものもあるが、量子メモリを持つものは古典的な分類器よりも優れている。本稿では,マルチクラス分類タスクにおける量子ニューラルネットワーク分類器(qcs)の問題依存パワーを体系的に検討する。予測リスクの分析により, 分類器の訓練損失と一般化誤差を共同で評価する指標として, 訓練損失が一般化能力よりもパワーを支配すること, 第二に, 深層神経分類器の二重発光リスク曲線とは対照的に, qcsはu字型のリスク曲線をとること, の2つの重要な知見を明らかにした。また、最適QCとヘルストローム境界と等角的タイトフレームとの固有接続を明らかにする。そこで本研究では,学習課題における古典的分類器よりもQCの方が有効かどうかを探索するために,損失ダイナミクスを用いた手法を提案する。画像データセットにおける多層パーセプトロン上のqcsの優位性と畳み込みニューラルネットワークの限界を説明するための手法の有効性を数値実験により証明した。我々の研究はQNNの課題依存力に光を当て、その潜在的なメリットを評価するための実践的なツールを提供する。 Quantum neural networks (QNNs) have become an important tool for understanding the physical world, but their advantages and limitations are not fully understood. Some QNNs with specific encoding methods can be efficiently simulated by classical surrogates, while others with quantum memory may perform better than classical classifiers. Here we systematically investigate the problem-dependent power of quantum neural classifiers (QCs) on multi-class classification tasks. Through the analysis of expected risk, a measure that weighs the training loss and the generalization error of a classifier jointly, we identify two key findings: first, the training loss dominates the power rather than the generalization ability; second, QCs undergo a U-shaped risk curve, in contrast to the double-descent risk curve of deep neural classifiers. We also reveal the intrinsic connection between optimal QCs and the Helstrom bound and the equiangular tight frame. Using these findings, we propose a method that uses loss dynamics to probe whether a QC may be more effective than a classical classifier on a particular learning task. 
Numerical results demonstrate the effectiveness of our approach to explain the superiority of QCs over multilayer Perceptron on parity datasets and their limitations over convolutional neural networks on image datasets. Our work sheds light on the problem-dependent power of QNNs and offers a practical tool for evaluating their potential merit.	翻訳日:2023-11-02 04:06:25 公開日:2023-10-30
# 非凸非凹ミニマックス最適化のためのユニバーサル勾配降下上昇法 Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave Minimax Optimization ( http://arxiv.org/abs/2212.12978v5 ) ライセンス: Link先を確認	Taoli Zheng, Linglingzhi Zhu, Anthony Man-Cho So, Jose Blanchet, Jiajin Li	(参考訳) 非凸-非凹ミニマックス最適化は、機械学習における幅広い応用により、過去10年間、大きな注目を集めてきた。既存のアルゴリズムの多くは、主(双対)関数の凸性 (resp. 凹性) のような片側の情報や、Polyak-\L{}ojasiewicz (P\L{}) 条件や Kurdyka-\L{}ojasiewicz (K\L{}) 条件のような特定の構造に依存している。しかし、これらの正則性条件の検証は実際には困難である。この課題を克服するために,主変数と双対変数の更新を自然にバランスさせる、普遍的に適用可能な新しい単一ループアルゴリズムである二重平滑化勾配降下上昇法 (DS-GDA) を提案する。すなわち、同じハイパーパラメータを持つDS-GDAは、片側K\L{}特性を持つ非凸-凹、凸-非凹、非凸-非凹問題を一様に解くことができ、$\mathcal{O}(\epsilon^{-4})$ の複雑性で収束する。 K\L{}指数が既知の場合、よりシャープな(最適でさえある)反復複雑性が得られる。具体的には、指数 $\theta\in(0,1)$ の片側 K\L{} 条件の下で、DS-GDA は $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ の反復複雑性で収束する。これらはいずれも文献における対応する最良の結果と一致している。さらに, DS-GDA は P\L{} 条件, K\L{} 条件, 弱いMinty変分不等式条件などの正則性条件がなくても, 一般の非凸-非凹問題に実際に適用可能であることを示す。文献にある ``Forsaken''、``Bilinearly-coupled minimax''、``Sixth-order polynomial''、``PolarGame'' などの難しい非凸-非凹の例において、提案するDS-GDAはいずれもリミットサイクルを解消できる。我々の知る限り、これはこれらすべての手強い問題で収束を達成する最初の一階アルゴリズムである。 Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning. Most existing algorithms rely on one-sided information, such as the convexity (resp. concavity) of the primal (resp. dual) functions, or other specific structures, such as the Polyak-\L{}ojasiewicz (P\L{}) and Kurdyka-\L{}ojasiewicz (K\L{}) conditions. However, verifying these regularity conditions is challenging in practice. To meet this challenge, we propose a novel universally applicable single-loop algorithm, the doubly smoothed gradient descent ascent method (DS-GDA), which naturally balances the primal and dual updates. That is, DS-GDA with the same hyperparameters is able to uniformly solve nonconvex-concave, convex-nonconcave, and nonconvex-nonconcave problems with one-sided K\L{} properties, achieving convergence with $\mathcal{O}(\epsilon^{-4})$ complexity. 
Sharper (even optimal) iteration complexity can be obtained when the K\L{} exponent is known. Specifically, under the one-sided K\L{} condition with exponent $\theta\in(0,1)$, DS-GDA converges with an iteration complexity of $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$. They all match the corresponding best results in the literature. Moreover, we show that DS-GDA is practically applicable to general nonconvex-nonconcave problems even without any regularity conditions, such as the P\L{} condition, K\L{} condition, or weak Minty variational inequalities condition. For various challenging nonconvex-nonconcave examples in the literature, including ``Forsaken'', ``Bilinearly-coupled minimax'', ``Sixth-order polynomial'', and ``PolarGame'', the proposed DS-GDA can all get rid of limit cycles. To the best of our knowledge, this is the first first-order algorithm to achieve convergence on all of these formidable problems.	翻訳日:2023-11-02 04:06:00 公開日:2023-10-30
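
As a rough illustration of why smoothing can remove the cycling behaviour mentioned above, the following sketch compares plain simultaneous gradient descent ascent with a doubly smoothed variant on the bilinear toy problem f(x, y) = x * y. The auxiliary variables `z` and `v`, the regularised objective F = f + (r/2)(x - z)^2 - (r/2)(y - v)^2, the simultaneous update order, and every step size are assumptions chosen for this toy case; they are not taken from the authors' implementation or parameter schedule.

```python
import math


def plain_gda(steps, eta=0.5, x=1.0, y=1.0):
    """Simultaneous gradient descent (on x) / ascent (on y) for f(x, y) = x * y."""
    for _ in range(steps):
        gx, gy = y, x  # df/dx = y, df/dy = x
        x, y = x - eta * gx, y + eta * gy
    return math.hypot(x, y)  # distance from the saddle point (0, 0)


def smoothed_gda(steps, h=0.5, r=1.0, beta=0.5, x=1.0, y=1.0):
    """Toy doubly smoothed GDA on F = x*y + r/2*(x - z)**2 - r/2*(y - v)**2.

    z and v are slowly moving anchors for x and y (hypothetical names).
    All four variables are updated simultaneously from the previous iterate.
    """
    z, v = x, y
    for _ in range(steps):
        gx = y + r * (x - z)  # dF/dx
        gy = x - r * (y - v)  # dF/dy
        x_new = x - h * gx          # descent step on x
        y_new = y + h * gy          # ascent step on y
        z_new = z + beta * (x - z)  # anchor z drifts toward x
        v_new = v + beta * (y - v)  # anchor v drifts toward y
        x, y, z, v = x_new, y_new, z_new, v_new
    return math.hypot(x, y)


if __name__ == "__main__":
    print("plain GDA distance after 50 steps:", plain_gda(50))
    print("smoothed GDA distance after 1000 steps:", smoothed_gda(1000))
```

In this toy run the plain iterate moves away from the saddle point at every step, while the smoothed iterate contracts toward it. The sketch says nothing about the complexity bounds stated in the abstract.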
# マルチモーダルインタラクションの定量化とモデル化:情報分解フレームワーク Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework ( http://arxiv.org/abs/2302.12247v4 ) ライセンス: Link先を確認	Paul Pu Liang, Yun Cheng, Xiang Fan, Chun Kai Ling, Suzanne Nie, Richard Chen, Zihao Deng, Nicholas Allen, Randy Auerbach, Faisal Mahmood, Ruslan Salakhutdinov, Louis-Philippe Morency	(参考訳) 近年のマルチモーダルアプリケーションへの関心の高まりにより、様々なモダリティから情報を表現・統合するためのデータセットや手法が広く選択された。これらの経験的な進歩にもかかわらず、基礎的な研究の疑問が残る: マルチモーダルなタスクを解決するのに必要な相互作用をどのように定量化できるか? その後、これらの相互作用を捉えるのに最も適したマルチモーダルモデルは何ですか? これらの質問に答えるために,入力モダリティと出力タスクを関連付ける冗長性,特異性,相乗効果の程度を定量化する情報理論的手法を提案する。これら3つの測度をマルチモーダル分布(略してPID)のPID統計と呼び、高次元分布にスケールするこれらのPID統計に対する2つの新しい推定値を導入する。 PID推定を検証するために、PIDが知られている合成データセットと、PID推定を人間のアノテーションと比較する大規模マルチモーダルベンチマークの両方で広範な実験を行う。最後に,(1)マルチモーダルデータセット内のインタラクションの定量化,(2)マルチモーダルモデルでキャプチャされたインタラクションの定量化,(3)モデル選択のための原則的アプローチ,(4)病理学,ムード予測,ロボット知覚における3つの実世界のケーススタディにおいて有用性を示す。 The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different modalities. Despite these empirical advances, there remain fundamental research questions: How can we quantify the interactions that are necessary to solve a multimodal task? Subsequently, what are the most suitable multimodal models to capture these interactions? To answer these questions, we propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task. We term these three measures as the PID statistics of a multimodal distribution (or PID for short), and introduce two new estimators for these PID statistics that scale to high-dimensional distributions. To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks where PID estimations are compared with human annotations. 
Finally, we demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies engaging with domain experts in pathology, mood prediction, and robotic perception where our framework helps to recommend strong multimodal models for each application.	翻訳日:2023-11-02 03:58:37 公開日:2023-10-30
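
To make the notions of redundancy and synergy more concrete, the sketch below computes ordinary mutual information on two tiny hand-built distributions. It is not one of the paper's PID estimators; it only shows the signature that a decomposition has to explain. Under XOR each modality alone carries no information about the label although the pair determines it (synergy), whereas with copied inputs either modality alone suffices (redundancy). All function and variable names are illustrative.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(pairs):
    """I(A; B) in bits, treating the listed (a, b) samples as equally likely."""
    n = len(pairs)
    p_ab = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
        for (a, b), c in p_ab.items()
    )


def information_profile(samples):
    """Return I(X1;Y), I(X2;Y) and I(X1,X2;Y) for (x1, x2, y) samples."""
    i1 = mutual_information([(x1, y) for x1, x2, y in samples])
    i2 = mutual_information([(x2, y) for x1, x2, y in samples])
    i12 = mutual_information([((x1, x2), y) for x1, x2, y in samples])
    return i1, i2, i12


# Synergy: Y = X1 XOR X2, so each modality alone says nothing about Y.
xor_samples = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
# Redundancy: X2 copies X1 and Y = X1, so either modality alone suffices.
copy_samples = [(a, a, a) for a in (0, 1)]

if __name__ == "__main__":
    print("XOR :", information_profile(xor_samples))
    print("copy:", information_profile(copy_samples))
```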
# アンバウンドマシン・アンラーニングに向けて Towards Unbounded Machine Unlearning ( http://arxiv.org/abs/2302.09880v3 ) ライセンス: Link先を確認	Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, Eleni Triantafillou	(参考訳) ディープマシンアンラーニング(deep machine unlearning)は、訓練済みのニューラルネットワークからトレーニングセットのサブセットを‘削除’する問題である。この問題は非常にタイムリーで多くの応用があり、バイアスの除去(RB)、(訓練済みモデルにおける誤ラベルデータに起因する)混乱の解消(RC)、ユーザのプライバシ(UP)を保護するためにユーザが‘忘れられる権利’を行使できるようにすること、といった重要なタスクを含む。本論文は,異なるアプリケーション(RB,RC,UP)それぞれが独自の要件,‘忘却’の定義,忘却品質に関する指標を持つという観点からアンラーニングを研究する,我々の知る限り初めての研究である。 UPについては,強力なメンバーシップ推論攻撃をアンラーニング向けに適応させた新たな手法を提案する。また、新しいアンラーニングアルゴリズムであるSCRUBを提案する。SCRUBは、RB、RC、UPの異なるアプリケーション依存の指標にわたって、忘却品質で一貫してトップの性能を示す唯一の手法である。同時に、SCRUBはモデルユーティリティ(すなわち保持されたデータに対する精度と汎化)を測定する指標でも一貫してトップパフォーマーであり、以前の研究よりも効率的である。以上は、これまでの最先端技術に対する包括的な実証的評価によって裏付けられている。 Deep machine unlearning is the problem of `removing' from a trained neural network a subset of its training set. This problem is very timely and has many applications, including the key tasks of removing biases (RB), resolving confusion (RC) (caused by mislabelled data in trained models), as well as allowing users to exercise their `right to be forgotten' to protect User Privacy (UP). This paper is the first, to our knowledge, to study unlearning for different applications (RB, RC, UP), with the view that each has its own desiderata, definitions for `forgetting' and associated metrics for forget quality. For UP, we propose a novel adaptation of a strong Membership Inference Attack for unlearning. We also propose SCRUB, a novel unlearning algorithm, which is the only method that is consistently a top performer for forget quality across the different application-dependent metrics for RB, RC, and UP. At the same time, SCRUB is also consistently a top performer on metrics that measure model utility (i.e. accuracy on retained data and generalization), and is more efficient than previous work. 
The above are substantiated through a comprehensive empirical evaluation against previous state-of-the-art.	翻訳日:2023-11-02 03:57:21 公開日:2023-10-30
# DP-SGDにおける境界学習データ再構成 Bounding Training Data Reconstruction in DP-SGD ( http://arxiv.org/abs/2302.07225v3 ) ライセンス: Link先を確認	Jamie Hayes, Saeed Mahloujifar, Borja Balle	(参考訳) 異なるプライベートトレーニングは、通常はメンバーシップ推論攻撃に対する保証として解釈される保護を提供する。この保証はプロキシによって、完全なトレーニング例を抽出しようとするレコンストラクション攻撃など、他の脅威にも拡張される。最近の研究は、もしメンバーシップ攻撃から保護する必要がなく、訓練データ再構成から保護したいというなら、これらのより野心的な攻撃から保護するためにノイズが少ないため、プライベートモデルの有用性を改善することができるという証拠を提供している。さらに,私的深層学習の標準アルゴリズムであるDP-SGDの文脈でこれを検証し,DP-SGDに対する再構築攻撃の成功と,我々の限界の予測に実証的に一致する攻撃に上限を与える。これら2つの結果は,dp-sgdのプライバシパラメータの設定方法について,レコンストラクション攻撃から保護するための詳細な調査の扉を開くものだ。最後に, DP-SGDパラメータの異なる設定を同一のDP保証に導いた場合, 復元における成功率が著しく異なることを示すために, DP保証だけでは再建攻撃に対する保護を制御できない可能性が示唆された。 Differentially private training offers a protection which is usually interpreted as a guarantee against membership inference attacks. By proxy, this guarantee extends to other threats like reconstruction attacks attempting to extract complete training examples. Recent works provide evidence that if one does not need to protect against membership attacks but instead only wants to protect against training data reconstruction, then utility of private models can be improved because less noise is required to protect against these more ambitious attacks. We investigate this further in the context of DP-SGD, a standard algorithm for private deep learning, and provide an upper bound on the success of any reconstruction attack against DP-SGD together with an attack that empirically matches the predictions of our bound. Together, these two results open the door to fine-grained investigations on how to set the privacy parameters of DP-SGD in practice to protect against reconstruction attacks. Finally, we use our methods to demonstrate that different settings of the DP-SGD parameters leading to the same DP guarantees can result in significantly different success rates for reconstruction, indicating that the DP guarantee alone might not be a good proxy for controlling the protection against reconstruction attacks.	
翻訳日:2023-11-02 03:56:20 公開日:2023-10-30
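
For readers unfamiliar with the DP-SGD parameters the abstract refers to, the sketch below shows one update in the commonly described form: clip each per-example gradient, sum, add Gaussian noise scaled by the clipping bound, then average. The function names, the toy gradients, and the parameter values are assumptions made for illustration; nothing here reproduces the paper's reconstruction bound or attack.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each gradient, sum, add Gaussian noise, average."""
    batch = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    sigma = noise_multiplier * max_norm  # noise std scales with the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [p - lr * n / batch for p, n in zip(params, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
    print(dp_sgd_step([0.0, 0.0], grads, lr=1.0, max_norm=1.0,
                      noise_multiplier=1.1, rng=rng))
```

The clipping bound, noise multiplier, and batch size are the knobs that can yield the same DP guarantee in different combinations, which is the situation the last sentence of the abstract describes.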
# 平均Hölder滑らかさを用いた近最適学習 Near-optimal learning with average Hölder smoothness ( http://arxiv.org/abs/2302.06005v3 ) ライセンス: Link先を確認	Steve Hanneke, Aryeh Kontorovich, Guy Kornowski	(参考訳) 我々は、Ashlagi et al. (COLT 2021) によって提案された平均リプシッツ滑らかさの概念を、Hölder滑らかさに拡張することで一般化する。この関数の「実効的な滑らかさ」の尺度は基礎となる分布に依存し、古典的な「最悪ケース」のHölder定数よりも劇的に小さくなりうる。我々は, 実現可能(realizable)および不可知(agnostic, ノイズあり)の回帰設定の両方を考察し, 平均Hölder滑らかさの観点からリスクの上界と下界を証明する。これらのレートは, 平均リプシッツ滑らかさの特殊な場合においても, 既知のレートの双方を改善する。さらに,我々の下界は実現可能な設定において対数因子を除いてタイトであり,したがってミニマックスレートが確立される。アルゴリズムの観点からは, 平均滑らかさの概念は未知の基礎分布に対して定義されるため, 学習者は関数クラスの明示的な表現を持たず, ERMを実行できない。それにもかかわらず、我々は(ほぼ)最適な学習率を両方とも達成する異なる学習アルゴリズムを提供する。我々の結果は任意の全有界距離空間で成り立ち、その内在幾何学の観点で述べられている。総じて,Hölder滑らかさの古典的な最悪ケース概念は,本質的にその平均に置き換えることができ,かなりシャープな保証が得られることを示した。 We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021) by extending it to Hölder smoothness. This measure of the "effective smoothness" of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic "worst-case" Hölder constant. We consider both the realizable and the agnostic (noisy) regression settings, proving upper and lower risk bounds in terms of the average Hölder smoothness; these rates improve upon both previously known rates even in the special case of average Lipschitz smoothness. Moreover, our lower bound is tight in the realizable setting up to log factors, thus we establish the minimax rate. From an algorithmic perspective, since our notion of average smoothness is defined with respect to the unknown underlying distribution, the learner does not have an explicit representation of the function class, hence is unable to execute ERM. Nevertheless, we provide distinct learning algorithms that achieve both (nearly) optimal learning rates. Our results hold in any totally bounded metric space, and are stated in terms of its intrinsic geometry. Overall, our results show that the classic worst-case notion of Hölder smoothness can be essentially replaced by its average, yielding considerably sharper guarantees.	
翻訳日:2023-11-02 03:54:26 公開日:2023-10-30
# 人間とロボットのコラボレーションアプリケーションのための学習データと深層学習によるマルチユーザ行動認識に向けて Towards Multi-User Activity Recognition through Facilitated Training Data and Deep Learning for Human-Robot Collaboration Applications ( http://arxiv.org/abs/2302.05763v3 ) ライセンス: Link先を確認	Francesco Semeraro, Jon Carberry and Angelo Cangelosi	(参考訳) HRI(Human-robot Interaction)研究は、ロボットが複数の人間のユーザと同時に対話するマルチパーティシナリオに、段階的に対処している。逆に、研究はまだ人間とロボットのコラボレーションの初期段階にある。このようなコラボレーションを扱うために機械学習技術を使用するには、典型的なHRCセットアップよりも生成しにくいデータが必要である。本研究は,非Dydic HRCアプリケーションの並列タスクのシナリオを概説する。これらの概念に基づいて,シングルユーザに関連するデータを収集し,後処理でマージすることで,複数ユーザの活動に関するデータ収集の代替手法を提案し,ペア設定の録音に係わる労力を削減する。このステートメントを検証するために、シングルユーザのアクティビティの3dスケルトンポーズが収集され、ペアにマージされた。その後、このようなデータポイントを用いて長期記憶ネットワーク(LSTM)と時空間グラフ畳み込みネットワーク(STGCN)からなる変動オートエンコーダ(VAE)を別々にトレーニングし、両者の協調活動を認識する。その結果、同じ設定で記録されたユーザのグループに関するトレーニングデータと比較すると、この方法で収集したデータをHRC設定のペアに利用し、同様のパフォーマンスを得ることが可能であり、これらのデータの生成にまつわる技術的困難を軽減できることがわかった。関連コードと収集されたデータは公開されている。 Human-robot interaction (HRI) research is progressively addressing multi-party scenarios, where a robot interacts with more than one human user at the same time. Conversely, research is still at an early stage for human-robot collaboration. The use of machine learning techniques to handle such type of collaboration requires data that are less feasible to produce than in a typical HRC setup. This work outlines scenarios of concurrent tasks for non-dyadic HRC applications. Based upon these concepts, this study also proposes an alternative way of gathering data regarding multi-user activity, by collecting data related to single users and merging them in post-processing, to reduce the effort involved in producing recordings of pair settings. To validate this statement, 3D skeleton poses of activity of single users were collected and merged in pairs. 
After this, such datapoints were used to separately train a long short-term memory (LSTM) network and a variational autoencoder (VAE) composed of spatio-temporal graph convolutional networks (STGCN) to recognise the joint activities of the pairs of people. The results showed that it is possible to make use of data collected in this way for pair HRC settings and get similar performances compared to using training data regarding groups of users recorded under the same settings, relieving from the technical difficulties involved in producing these data. The related code and collected data are publicly available.	翻訳日:2023-11-02 03:54:03 公開日:2023-10-30
# Jaccard Metric Losses: ソフトラベルによるJaccard Indexの最適化 Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels ( http://arxiv.org/abs/2302.05666v4 ) ライセンス: Link先を確認	Zifu Wang and Xuefei Ning and Matthew B. Blaschko	(参考訳) iou(intersection over union)損失はjaccardインデックスを直接最適化するサロゲートである。損失関数の一部としてのIoU損失の活用は、クロスエントロピー損失のみのような画素単位の損失を最適化するよりもセグメンテーションタスクにおいて優れた性能を示した。しかし, ソフトラベルを処理できないため, ラベル平滑化, 知識蒸留, 半教師付き学習といった重要な訓練技術をサポートするために, 損失の柔軟性の欠如が確認された。ハードラベルを用いた標準設定では,Jaccard Metric Losses(JML)というソフトなJaccard損失と同じだが,ソフトなラベルと完全に互換性がある。 JMLをラベル平滑化,知識蒸留,半教師付き学習の3つの顕著なユースケースに適用し,モデルの精度と校正性を示す。実験により,4つのセマンティックセグメンテーションデータセット(Cityscapes, PASCAL VOC, ADE20K, DeepGlobe Land)と13のアーキテクチャ間のクロスエントロピー損失に対する一貫した改善が示された。驚くべきことに、私たちの直接的なアプローチは、最先端の知識蒸留と半教師付き学習方法を大きく上回っている。コードは \href{https://github.com/zifuwanggg/jdtlosses}{https://github.com/zifuwanggg/jdtlosses} で入手できる。 Intersection over Union (IoU) losses are surrogates that directly optimize the Jaccard index. Leveraging IoU losses as part of the loss function have demonstrated superior performance in semantic segmentation tasks compared to optimizing pixel-wise losses such as the cross-entropy loss alone. However, we identify a lack of flexibility in these losses to support vital training techniques like label smoothing, knowledge distillation, and semi-supervised learning, mainly due to their inability to process soft labels. To address this, we introduce Jaccard Metric Losses (JMLs), which are identical to the soft Jaccard loss in standard settings with hard labels but are fully compatible with soft labels. We apply JMLs to three prominent use cases of soft labels: label smoothing, knowledge distillation and semi-supervised learning, and demonstrate their potential to enhance model accuracy and calibration. 
Our experiments show consistent improvements over the cross-entropy loss across 4 semantic segmentation datasets (Cityscapes, PASCAL VOC, ADE20K, DeepGlobe Land) and 13 architectures, including classic CNNs and recent vision transformers. Remarkably, our straightforward approach significantly outperforms state-of-the-art knowledge distillation and semi-supervised learning methods. The code is available at https://github.com/zifuwanggg/JDTLosses.	翻訳日:2023-11-02 03:52:51 公開日:2023-10-30
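
The difference between a soft Jaccard loss and a soft-label-compatible variant can be sketched in a few lines. The second function below uses an L1-norm form, 1 - (|x| + |y| - |x - y|) / (|x| + |y| + |x - y|), which has the two properties stated in the abstract: it equals the soft Jaccard loss on hard labels and behaves sensibly on soft labels. It should be read as an assumption for illustration, not as the authors' exact definition or their released code.

```python
def soft_jaccard_loss(pred, target):
    """1 - <x, y> / (|x|_1 + |y|_1 - <x, y>) for non-negative vectors."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target) - inter
    return 1.0 - inter / union


def jaccard_metric_loss(pred, target):
    """L1-norm form: 1 - (|x| + |y| - |x - y|) / (|x| + |y| + |x - y|)."""
    total = sum(pred) + sum(target)
    diff = sum(abs(p - t) for p, t in zip(pred, target))
    return 1.0 - (total - diff) / (total + diff)


if __name__ == "__main__":
    pred = [0.9, 0.2, 0.7]
    hard = [1.0, 0.0, 1.0]
    soft = [0.8, 0.1, 0.1]
    # Hard labels: both losses agree.
    print(soft_jaccard_loss(pred, hard), jaccard_metric_loss(pred, hard))
    # Soft labels with a perfect prediction: only the L1 form reaches zero.
    print(soft_jaccard_loss(soft, soft), jaccard_metric_loss(soft, soft))
```

With hard labels the two functions agree because |x - y| = |x| + |y| - 2<x, y> when every target is 0 or 1. With a soft target, the soft Jaccard loss stays positive even when the prediction equals the target, which is the incompatibility the abstract points to.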
# AIシステムによるマニピュレーションの特徴付け Characterizing Manipulation from AI Systems ( http://arxiv.org/abs/2303.09387v3 ) ライセンス: Link先を確認	Micah Carroll, Alan Chan, Henry Ashton, David Krueger	(参考訳) 操作は、ソーシャルメディア、広告、チャットボットなど、多くのドメインで共通の関心事である。 AIシステムは世界とのインタラクションをより仲介するので、システム設計者の意図なしにAIシステムが人間を操作できる程度を理解することが重要である。我々の研究は、AIシステムのコンテキストにおける操作の定義と測定における課題を明らかにする。第一に、私たちは他の分野からの操作に関する先行文献を構築し、インセンティブ、意図、危害、隠ぺいの概念に依存する操作の可能な概念の空間を特徴づける。各要因の運用方法についての提案をレビューする。第2に,人間(または他のエージェント)を意図的にかつ秘密的に変化させるインセンティブを追求しているかのように振る舞う場合,システムはマニピュレーションである,という特徴に基づく操作の定義を提案する。第3に,マニピュレーションと関連する概念(デセプションや強制など)との関係について論じる。最後に、いくつかのアプリケーションにおける操作の運用をコンテキスト化します。全体的な評価では、AIシステムによる操作の定義と測定にいくつかの進歩があったが、多くのギャップが残っている。コンセンサスの定義や測定のための信頼できるツールがないため、システム設計者の意図なしにAIシステムが人間の操作を学ぶ可能性を排除することはできない。このような操作は、人間の自律性に重大な脅威をもたらし、それを軽減するための予防措置が保証されていることを示唆している。 Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interactions with the world, it is important to understand the degree to which AI systems might manipulate humans without the intent of the system designers. Our work clarifies challenges in defining and measuring manipulation in the context of AI systems. Firstly, we build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation, which we find to depend upon the concepts of incentives, intent, harm, and covertness. We review proposals on how to operationalize each factor. Second, we propose a definition of manipulation based on our characterization: a system is manipulative if it acts as if it were pursuing an incentive to change a human (or another agent) intentionally and covertly. Third, we discuss the connections between manipulation and related concepts, such as deception and coercion. Finally, we contextualize our operationalization of manipulation in some applications. 
Our overall assessment is that while some progress has been made in defining and measuring manipulation from AI systems, many gaps remain. In the absence of a consensus definition and reliable tools for measurement, we cannot rule out the possibility that AI systems learn to manipulate humans without the intent of the system designers. We argue that such manipulation poses a significant threat to human autonomy, suggesting that precautionary actions to mitigate it are warranted.	翻訳日:2023-11-02 03:45:23 公開日:2023-10-30
# 点クラウドのためのパラメトリック表面制約アップサンプラーネットワーク Parametric Surface Constrained Upsampler Network for Point Cloud ( http://arxiv.org/abs/2303.08240v2 ) ライセンス: Link先を確認	Pingping Cai and Zhenyao Wu and Xinyi Wu and Song Wang	(参考訳) スパースポイント表現を与えられたクリーンで高密度なポイントクラウドを生成することを目的としたポイントクラウドアップサンプラーの設計は、コンピュータビジョンにおける根本的な挑戦的な問題である。一連の試みは、ディープニューラルネットワークを介してポイントツーポイントマッピング関数を確立することによって、この目標を達成する。しかし、これらのアプローチは表面レベルの明示的な制約が欠如しているため、異常点を生じやすい。この問題を解決するために,ニューラルネットワークにバイコビック関数と回転関数で表されるパラメトリック曲面を学習させ,そこで新たに生成された点を基底面に拘束することにより,新しいサーフェス正規化器をアップサンプラーネットワークに導入する。これらの設計は、2つの異なるネットワークに統合され、レイヤポイントクラウドのアップサンプリングとポイントクラウドのコンプリートによる評価の利点を活かす。両課題の最先端実験結果から,提案手法の有効性が示された。実装コードはhttps://github.com/corecai163/PSCUで公開される。 Designing a point cloud upsampler, which aims to generate a clean and dense point cloud given a sparse point representation, is a fundamental and challenging problem in computer vision. A line of attempts achieves this goal by establishing a point-to-point mapping function via deep neural networks. However, these approaches are prone to produce outlier points due to the lack of explicit surface-level constraints. To solve this problem, we introduce a novel surface regularizer into the upsampler network by forcing the neural network to learn the underlying parametric surface represented by bicubic functions and rotation functions, where the new generated points are then constrained on the underlying surface. These designs are integrated into two different networks for two tasks that take advantages of upsampling layers - point cloud upsampling and point cloud completion for evaluation. The state-of-the-art experimental results on both tasks demonstrate the effectiveness of the proposed method. The implementation code will be available at https://github.com/corecai163/PSCU.	翻訳日:2023-11-02 03:44:43 公開日:2023-10-30
# コンピュータグラフィックス画像の主観的・客観的品質評価 Subjective and Objective Quality Assessment for in-the-Wild Computer Graphics Images ( http://arxiv.org/abs/2303.08050v3 ) ライセンス: Link先を確認	Zicheng Zhang, Wei Sun, Yingjie Zhou, Jun Jia, Zhichao Zhang, Jing Liu, Xiongkuo Min, and Guangtao Zhai	(参考訳) コンピュータグラフィックス画像(CGI)は、コンピュータプログラムによって人工的に生成され、ゲームやストリーミングメディアなどの様々なシナリオにおいて広く認識されている。実際には、CGIの品質は、生産期間中のレンダリングの低下、マルチメディアアプリケーションの送信時に必然的な圧縮アーティファクト、構成と設計の低下による美的品質の低下に常に悩まされている。しかし、コンピュータグラフィックス画像品質評価(CGIQA)の課題に対処する研究はほとんど行われていない。ほとんどの画像品質評価(IQA)メトリクスは、自然シーン画像(NSI)のために開発され、合成歪みを持つNSIからなるデータベース上で検証される。 NSIとCGIの品質評価のギャップを埋めるため,6,000のCGI(CGIQA-6k)からなる大規模CGIQAデータベースを構築し,CGIの正確な知覚評価を得るために,よく制御された実験環境において主観的な実験を行う。そこで本研究では,歪みと審美的品質の表現を両立し,効果的な深層学習に基づくno-reference (nr) iqaモデルを提案する。実験の結果,提案手法は構築されたCGIQA-6kデータベースや他のCGIQA関連データベース上で,最先端のNR IQA手法よりも優れていた。データベースはhttps://github.com/zzc-1998/cgiqa6kでリリースされる。 Computer graphics images (CGIs) are artificially generated by means of computer programs and are widely perceived under various scenarios, such as games, streaming media, etc. In practice, the quality of CGIs consistently suffers from poor rendering during production, inevitable compression artifacts during the transmission of multimedia applications, and low aesthetic quality resulting from poor composition and design. However, few works have been dedicated to dealing with the challenge of computer graphics image quality assessment (CGIQA). Most image quality assessment (IQA) metrics are developed for natural scene images (NSIs) and validated on databases consisting of NSIs with synthetic distortions, which are not suitable for in-the-wild CGIs. To bridge the gap between evaluating the quality of NSIs and CGIs, we construct a large-scale in-the-wild CGIQA database consisting of 6,000 CGIs (CGIQA-6k) and carry out the subjective experiment in a well-controlled laboratory environment to obtain the accurate perceptual ratings of the CGIs. 
Then, we propose an effective deep learning-based no-reference (NR) IQA model by utilizing both distortion and aesthetic quality representation. Experimental results show that the proposed method outperforms all other state-of-the-art NR IQA methods on the constructed CGIQA-6k database and other CGIQA-related databases. The database is released at https://github.com/zzc-1998/CGIQA6K.	翻訳日:2023-11-02 03:44:23 公開日:2023-10-30
# 損失検査による物体検出データセットにおけるラベル誤りの同定 Identifying Label Errors in Object Detection Datasets by Loss Inspection ( http://arxiv.org/abs/2303.06999v2 ) ライセンス: Link先を確認	Marius Schubert, Tobias Riedlinger, Karsten Kahl, Daniel Kr\"oll, Sebastian Schoenen, Sini\v{s}a \v{S}egvi\'c, Matthias Rottmann	(参考訳) 教師付きオブジェクト検出のためのデータセットのラベル付けは退屈で時間を要する作業である。エラーはアノテーション中に簡単に導入でき、レビュー中に見落とされ、不正確なベンチマークとノイズラベルに基づいてトレーニングされたディープニューラルネットワークのパフォーマンス劣化をもたらす。本稿では,まず,オブジェクト検出データセットにおけるラベル誤り検出手法のベンチマークとラベルエラー検出手法とベースラインをいくつか紹介する。 4種類のランダムに導入されたラベルエラーを列車上でシミュレートし,よくラベルされたオブジェクト検出データセットをテストセットとした。ラベル誤り検出法では,2段階の物体検出器が与えられると仮定し,両者の分類と回帰損失の総和を考察する。損失は、後者を検出することを目的として、予測とシミュレートされたラベルエラーを含むノイズラベルに対して計算される。我々は,本手法を3つのベースラインと比較した。深層学習のないナイーブな手法,対象検出器のスコア,分類ソフトマックス分布のエントロピーである。すべてのベースラインを上回り、検討したメソッドの中で、4つのタイプのラベルエラーを効率的に検出する唯一の方法であることを実証します。さらに実際のラベルエラーを検知し a) オブジェクト検出において一般的に使用されるテストデータセットについて b) プロプライエタリなデータセット。いずれの場合も偽陽性率が低い、すなわちラベルエラーを精度良く検出する。 a)71.5%まで、及び b) 97%であった。 Labeling datasets for supervised object detection is a dull and time-consuming task. Errors can be easily introduced during annotation and overlooked during review, yielding inaccurate benchmarks and performance degradation of deep neural networks trained on noisy labels. In this work, we for the first time introduce a benchmark for label error detection methods on object detection datasets as well as a label error detection method and a number of baselines. We simulate four different types of randomly introduced label errors on train and test sets of well-labeled object detection datasets. For our label error detection method we assume a two-stage object detector to be given and consider the sum of both stages' classification and regression losses. The losses are computed with respect to the predictions and the noisy labels including simulated label errors, aiming at detecting the latter. 
We compare our method to three baselines: a naive one without deep learning, the object detector's score and the entropy of the classification softmax distribution. We outperform all baselines and demonstrate that among the considered methods, ours is the only one that detects label errors of all four types efficiently. Furthermore, we detect real label errors a) on commonly used test datasets in object detection and b) on a proprietary dataset. In both cases we achieve low false positives rates, i.e., we detect label errors with a precision for a) of up to 71.5% and for b) with 97%.	翻訳日:2023-11-02 03:43:56 公開日:2023-10-30
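
The loss-inspection idea can be illustrated on a toy classification problem: compute the loss of each prediction against its (possibly noisy) label and review the highest-loss samples first. The paper sums classification and regression losses from both stages of a two-stage object detector; the sketch below keeps only a classification term, and the probabilities, labels, and function names are made up for illustration.

```python
import math


def cross_entropy(probs, label):
    """Classification loss of one prediction against its (possibly noisy) label."""
    return -math.log(max(probs[label], 1e-12))


def entropy(probs):
    """Softmax entropy, one of the baselines named in the abstract."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def rank_label_error_candidates(predictions, noisy_labels):
    """Return sample indices sorted by descending loss; high loss suggests a label error."""
    losses = [cross_entropy(p, y) for p, y in zip(predictions, noisy_labels)]
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)


# Toy data: sample 2 is confidently predicted as class 2 but labelled 0.
predictions = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
    [0.34, 0.33, 0.33],
]
noisy_labels = [0, 1, 0, 0]

if __name__ == "__main__":
    print("review order by loss:", rank_label_error_candidates(predictions, noisy_labels))
    print("most uncertain by entropy:",
          max(range(len(predictions)), key=lambda i: entropy(predictions[i])))
```

On this toy data the loss ranking puts the mislabelled sample first, whereas the entropy baseline points at the merely ambiguous sample 3, since entropy never looks at the label.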
# SHAP-IQ:任意次数のShapley相互作用の統一近似 SHAP-IQ: Unified Approximation of any-order Shapley Interactions ( http://arxiv.org/abs/2303.01179v3 ) ライセンス: Link先を確認	Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, Barbara Hammer	(参考訳) 説明可能な人工知能(XAI)の研究において、あらゆるブラックボックスモデルの特徴属性を決定するためにShapley値(SV)が適用される。Shapley相互作用指標はSVを拡張して任意次数の特徴相互作用を定義する。一意なShapley相互作用指標の定義は未解決の研究課題であり、これまで3つの定義が提案されてきたが、これらは公理の選択によって異なる。さらに、各定義には特定の近似技術が必要である。本稿では,任意の基数相互作用指標(CII)、すなわち線形性・対称性・ダミー公理を満たす相互作用指標に対するShapley相互作用を効率よく計算するためのサンプリングベース近似器であるSHAPley Interaction Quantification (SHAP-IQ)を提案する。 SHAP-IQは、新しい表現に基づいており、既存の手法とは対照的に、近似品質の理論的保証と点推定の分散の推定を提供する。 SVの特殊な場合,本手法はSVの新規な表現を明らかにし,計算を大幅に単純化したUnbiased KernelSHAPに対応する。本稿では,言語,画像分類,高次元合成モデルを説明することにより,計算効率と有効性を示す。 Predominantly in explainable artificial intelligence (XAI) research, the Shapley value (SV) is applied to determine feature attributions for any black box model. Shapley interaction indices extend the SV to define any-order feature interactions. Defining a unique Shapley interaction index is an open research question and, so far, three definitions have been proposed, which differ by their choice of axioms. Moreover, each definition requires a specific approximation technique. Here, we propose SHAPley Interaction Quantification (SHAP-IQ), an efficient sampling-based approximator to compute Shapley interactions for arbitrary cardinal interaction indices (CII), i.e. interaction indices that satisfy the linearity, symmetry and dummy axiom. SHAP-IQ is based on a novel representation and, in contrast to existing methods, we provide theoretical guarantees for its approximation quality, as well as estimates for the variance of the point estimates. For the special case of SV, our approach reveals a novel representation of the SV and corresponds to Unbiased KernelSHAP with a greatly simplified calculation. We illustrate the computational efficiency and effectiveness by explaining language, image classification and high-dimensional synthetic models.	翻訳日:2023-11-02 03:40:47 公開日:2023-10-30
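As background for the Shapley value (SV) that the SHAP-IQ abstract builds on, the following is a minimal sketch that computes exact SVs for a tiny cooperative game by enumerating every coalition. It is not SHAP-IQ itself, which the abstract describes as a sampling-based approximator; the game `toy_value` and all function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, value):
    """Exact Shapley values for players 0..n-1 of the set function `value`."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(coalition):
    # Hypothetical game: additive main effects plus a bonus of 4
    # that is only paid when players 0 and 1 are both present.
    main = {0: 1.0, 1: 2.0, 2: 3.0}
    total = sum(main[p] for p in coalition)
    if 0 in coalition and 1 in coalition:
        total += 4.0
    return total


phi = shapley_values(3, toy_value)
```

In this toy game the pairwise bonus is split evenly between players 0 and 1, and the values sum to the worth of the full coalition. The enumeration visits all 2^(n-1) coalitions per player, which is the cost that sampling-based approximators are meant to avoid.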
# WEARDA:人間の活動監視のためのウェアラブルセンサーデータの記録 WEARDA: Recording Wearable Sensor Data for Human Activity Monitoring ( http://arxiv.org/abs/2303.00064v2 ) ライセンス: Link先を確認	Richard M.K. van Dijk, Daniela Gawehns and Matthijs van Leeuwen	(参考訳) 本稿では,オープンソースのウェアラブルセンサデータ取得ソフトウェアであるweardaを提案する。 WEARDAはスマートウォッチによる人間の活動データ取得を促進しており、主に透明性、完全な制御、生のセンサーデータへのアクセスを必要とする研究者を対象としている。これは4つのセンサー(三軸加速度計、三軸ジャイロスコープ、気圧計、GPS)の生データを同時に記録する機能を提供する。 Tizen OSを搭載したSamsungのスマートウォッチが選ばれた 1)スマートウォッチソフトウェアAPIに必要な機能。 2) ソフトウェア開発ツールとアクセス可能なドキュメントの可用性。 3) 必要なセンサを有すること、及び 4) 対象ユーザグループによる受け入れのためのケースデザインの要件。 WEARDAは、効率的でエラーのないデータ収集を保証するための準備、計測、物流、プライバシー保護、再現性に関する5つの実践的な課題に対処する。ソフトウェアパッケージは最初、"コミュニティの中心にあるDementia Back"プロジェクトのために作成され、そのコンテキストでうまく使われています。 We present WEARDA, the open source WEARable sensor Data Acquisition software package. WEARDA facilitates the acquisition of human activity data with smartwatches and is primarily aimed at researchers who require transparency, full control, and access to raw sensor data. It provides functionality to simultaneously record raw data from four sensors -- tri-axis accelerometer, tri-axis gyroscope, barometer, and GPS -- which should enable researchers to, for example, estimate energy expenditure and mine movement trajectories. A Samsung smartwatch running the Tizen OS was chosen because of 1) the required functionalities of the smartwatch software API, 2) the availability of software development tools and accessible documentation, 3) having the required sensors, and 4) the requirements on case design for acceptance by the target user group. WEARDA addresses five practical challenges concerning preparation, measurement, logistics, privacy preservation, and reproducibility to ensure efficient and errorless data collection. The software package was initially created for the project "Dementia back at the heart of the community", and has been successfully used in that context.	翻訳日:2023-11-02 03:40:26 公開日:2023-10-30
# アウトソース機械学習タスクの低コスト結果検証のための生成フレームワーク A Generative Framework for Low-Cost Result Validation of Outsourced Machine Learning Tasks ( http://arxiv.org/abs/2304.00083v3 ) ライセンス: Link先を確認	Abhinav Kumar, Miguel A. Guirao Aguilera, Reza Tourani, Satyajayant Misra	(参考訳) 機械学習(ML)の人気が高まり、さまざまなセンシティブなドメインにデプロイされるようになり、MLのセキュリティとプライバシを重視した大きな研究がもたらされた。しかしながら、自動運転など一部のアプリケーションでは、アウトソースされたMLワークロードの整合性検証がより重要になっている。マルチパーティ計算や証明ベースシステムといった既存のソリューションは、計算オーバーヘッドがかなり大きいため、リアルタイムアプリケーションには適さない。我々は、アウトソースされたMLワークロードのリアルタイム検証のための新しいフレームワークであるFidesを提案する。 Fidesは、信頼された実行環境内で実行中に対応するサービスモデルを検証するための、空間を動的に蒸留し微調整する、新しい、効率的な蒸留技術である、Greedy Distillation Transfer Learningを特徴としている。 fideは、統計分析と分岐測定を使用して、サービスモデルが攻撃されている場合に高い確率で識別するクライアント側の攻撃検出モデルを備えている。 Fidesはまた、攻撃が特定されるたびに元のクラスを予測する再分類機能を提供する。攻撃検出と再分類モデルの訓練のための生成的逆ネットワークフレームワークを考案した。評価の結果,fideは攻撃検出で最大98%,再分類で94%の精度を達成した。 The growing popularity of Machine Learning (ML) has led to its deployment in various sensitive domains, which has resulted in significant research focused on ML security and privacy. However, in some applications, such as autonomous driving, integrity verification of the outsourced ML workload is more critical--a facet that has not received much attention. Existing solutions, such as multi-party computation and proof-based systems, impose significant computation overhead, which makes them unfit for real-time applications. We propose Fides, a novel framework for real-time validation of outsourced ML workloads. Fides features a novel and efficient distillation technique--Greedy Distillation Transfer Learning--that dynamically distills and fine-tunes a space and compute-efficient verification model for verifying the corresponding service model while running inside a trusted execution environment. Fides features a client-side attack detection model that uses statistical analysis and divergence measurements to identify, with a high likelihood, if the service model is under attack. 
Fides also offers a re-classification functionality that predicts the original class whenever an attack is identified. We devised a generative adversarial network framework for training the attack detection and re-classification models. The evaluation shows that Fides achieves an accuracy of up to 98% for attack detection and 94% for re-classification.	翻訳日:2023-11-02 03:32:47 公開日:2023-10-30
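The Fides abstract mentions a client-side detector based on "statistical analysis and divergence measurements" without naming a specific divergence. As a rough sketch of the general idea only, the code below compares the softmax outputs of a service model and a verification model with the KL divergence and flags large disagreement. The choice of KL, the threshold of 0.5, and every name here are assumptions for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def looks_attacked(service_logits, verifier_logits, threshold=0.5):
    # Hypothetical decision rule: flag the response when the two models'
    # predictive distributions diverge by more than `threshold`.
    p = softmax(verifier_logits)
    q = softmax(service_logits)
    return kl_divergence(p, q) > threshold


# The two models roughly agree here, so nothing is flagged.
agree = looks_attacked([2.0, 0.1, -1.0], [2.1, 0.0, -1.2])
# The service model favours a different class, so the response is flagged.
disagree = looks_attacked([-1.0, 0.1, 2.0], [2.1, 0.0, -1.2])
```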
# BERT4ETH:Ethereum不正検出のための事前学習済みTransformer BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection ( http://arxiv.org/abs/2303.18138v2 ) ライセンス: Link先を確認	Sihao Hu, Zhen Zhang, Bingqiao Luo, Shengliang Lu, Bingsheng He, Ling Liu	(参考訳) 様々な詐欺がEthereum上で拡散しているため、これらの悪意のある活動から保護し、脆弱なユーザーが被害に遭わないようにすることが不可欠である。現在の研究はグラフベースの不正検出アプローチのみに依存しているが、反復性が高く、偏った分布を持ち、異種性のあるEthereumトランザクションを扱うのには適していない可能性がある。これらの課題に対処するために、Ethereum上でさまざまな不正行為を検出するためのアカウント表現抽出器として機能する汎用の事前学習済みTransformerエンコーダBERT4ETHを提案する。 BERT4ETHは、Ethereumトランザクション固有の動的シーケンシャルパターンをキャプチャするTransformerの優れたモデリング機能を備えており、EthereumのBERTモデルを事前学習する際の課題に、反復性削減、スキュー緩和、異種性モデリングという3つの実践的で効果的な戦略で対処する。実験により,BERT4ETHは,フィッシングアカウントの検出や匿名化解除タスクにおいて,最先端の手法を大きく上回る性能を示した。 BERT4ETHのコードはhttps://github.com/git-disl/BERT4ETHで入手できる。 As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities to protect susceptible users from being victimized. While current studies solely rely on graph-based fraud detection approaches, it is argued that they may not be well-suited for dealing with highly repetitive, skew-distributed and heterogeneous Ethereum transactions. To address these challenges, we propose BERT4ETH, a universal pre-trained Transformer encoder that serves as an account representation extractor for detecting various fraud behaviors on Ethereum. BERT4ETH features the superior modeling capability of Transformer to capture the dynamic sequential patterns inherent in Ethereum transactions, and addresses the challenges of pre-training a BERT model for Ethereum with three practical and effective strategies, namely repetitiveness reduction, skew alleviation and heterogeneity modeling. Our empirical evaluation demonstrates that BERT4ETH outperforms state-of-the-art methods with significant enhancements in terms of the phishing account detection and de-anonymization tasks. The code for BERT4ETH is available at: https://github.com/git-disl/BERT4ETH.	翻訳日:2023-11-02 03:32:26 公開日:2023-10-30
# アダプティブリファインメントとカントロビッチ計量によるデータ駆動抽象化 [拡張版] Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version] ( http://arxiv.org/abs/2303.17618v4 ) ライセンス: Link先を確認	Adrien Banse, Licio Romao, Alessandro Abate, Raphaël M. Jungers	(参考訳) 本稿では,動的システムのスマートでスケーラブルな抽象化のための適応的改良手順を提案する。我々の手法は将来の出力の観測に依存する状態空間の分割に依存している。ただし、この知識は適応的で非対称な方法で動的に構築される。最適構造を学ぶために,マルコフ連鎖間のカントロビッチに触発された計量を定義し,損失関数として用いる。本手法はデータ駆動型の枠組みに適しているが、それに限定されない。また、上記のマルコフ連鎖間の計量の性質について研究し、より広い目的に応用できると考えている。この計量を近似するアルゴリズムを提案し,従来の線形計画法を用いる場合よりも計算複雑性がはるかに優れていることを示す。 We introduce an adaptive refinement procedure for smart and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains, and we use it as a loss function. Our technique lends itself to data-driven frameworks, but is not restricted to them. We also study properties of the above-mentioned metric between Markov chains, which we believe could be applied for wider purposes. We propose an algorithm to approximate it, and we show that our method yields a much better computational complexity than using classical linear programming techniques.	翻訳日:2023-11-02 03:32:06 公開日:2023-10-30
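For readers unfamiliar with the Kantorovich (Wasserstein-1) distance that inspires the metric in this abstract, here is a minimal sketch for the simplest case: two discrete distributions on the same sorted one-dimensional support, where the distance has a closed form and needs no linear program. It is not the paper's metric between Markov chains; the function name and the examples are illustrative assumptions.

```python
def kantorovich_1d(points, p, q):
    """W1 distance between distributions p and q on sorted 1-D `points`.

    In one dimension the optimal transport cost equals the integral of the
    absolute difference of the two cumulative distribution functions.
    """
    dist = 0.0
    cdf_gap = 0.0
    for k in range(len(points) - 1):
        cdf_gap += p[k] - q[k]  # running difference of the two CDFs
        dist += abs(cdf_gap) * (points[k + 1] - points[k])
    return dist


# All mass moves from 0 to 2, so the cost is 2.
d_far = kantorovich_1d([0.0, 1.0, 2.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
# The whole distribution shifts one step to the right, so the cost is 1.
d_shift = kantorovich_1d([0.0, 1.0, 2.0], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```

On general supports the same quantity is usually posed as a linear program over transport plans, which is the classical baseline the abstract compares against.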
# 負サンプリングを超えた効率的な分散表現 Efficient distributed representations beyond negative sampling ( http://arxiv.org/abs/2303.17475v2 ) ライセンス: Link先を確認	Lorenzo Dall'Amico and Enrico Maria Belliardo	(参考訳) 本稿では,埋め込みとも呼ばれる分散表現を学習するための効率的な手法について述べる。これはWord2Vecアルゴリズムで導入され、後にいくつかの研究で採用されたものと類似した目的関数を最小化することで実現される。最適化における計算上のボトルネックは、サンプルサイズに対して2次的にスケールする数の演算を必要とするsoftmax正規化定数の計算である。この複雑さは大規模なデータセットには不適であり、負のサンプリングは一般的な回避策であり、サンプルサイズに関して線形時間で分散表現を得ることができる。しかし、負のサンプリングは損失関数の変更を伴うため、当初提案されたものと異なる最適化問題を解くことになる。我々の貢献は、softmax正規化定数を線形時間で推定できることを示し、分散表現を学習するための効率的な最適化戦略を設計できるようにすることである。単語とノードの埋め込みに関連する2つの一般的なアプリケーションで近似をテストする。その結果, 負のサンプリングと比較して, 精度の面で競合する性能を, 著しく短い計算時間で達成できることが示された。 This article describes an efficient method to learn distributed representations, also known as embeddings. This is accomplished by minimizing an objective function similar to the one introduced in the Word2Vec algorithm and later adopted in several works. The computational bottleneck of the optimization is the calculation of the softmax normalization constants, which requires a number of operations scaling quadratically with the sample size. This complexity is unsuited for large datasets and negative sampling is a popular workaround, allowing one to obtain distributed representations in linear time with respect to the sample size. Negative sampling consists, however, in a change of the loss function and hence solves a different optimization problem from the one originally proposed. Our contribution is to show that the softmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy to learn distributed representations. We test our approximation on two popular applications related to word and node embeddings. The results show accuracy competitive with negative sampling at a remarkably lower computational time.	翻訳日:2023-11-02 03:31:55 公開日:2023-10-30
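To make the contrast in this abstract concrete, the sketch below places a full softmax loss, whose normalization constant sums over every item, next to a Word2Vec-style negative sampling loss that only touches a few sampled items. It illustrates the two standard objectives only and does not implement the paper's linear-time estimator; the function names, embedding sizes and sampled indices are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax_loss(center, context_idx, embeddings):
    """Negative log-likelihood under a full softmax.

    The log-normalizer needs one score per item, so a single evaluation costs
    O(N) and a pass over N centers costs O(N^2).
    """
    scores = [dot(center, e) for e in embeddings]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[context_idx]


def negative_sampling_loss(center, context_idx, embeddings, negatives):
    """Negative sampling loss: a different objective that avoids the normalizer."""
    def log_sigmoid(x):
        return -math.log1p(math.exp(-x))

    loss = -log_sigmoid(dot(center, embeddings[context_idx]))
    for j in negatives:
        loss -= log_sigmoid(-dot(center, embeddings[j]))
    return loss


rng = random.Random(0)
embeddings = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(50)]
center = embeddings[0]
full = softmax_loss(center, 3, embeddings)
sampled = negative_sampling_loss(center, 3, embeddings, [7, 19, 42])
```

The two losses are not on the same scale and have different minimizers, which is the point the abstract makes when it says negative sampling solves a different optimization problem.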
# 複素値ニューラルネットワークを用いた最適近似 Optimal approximation using complex-valued neural networks ( http://arxiv.org/abs/2303.16813v2 ) ライセンス: Link先を確認	Paul Geuchen, Felix Voigtlaender	(参考訳) 複素値ニューラルネットワーク(CVNN)は近年、リカレントニューラルネットワークの安定性の向上や、MRIフィンガープリンティングなどの複素値入力を伴うタスクの性能向上など、有望な実証的成功を示している。実数値の場合におけるDeep Learningの圧倒的な成功は、成長しつつある数学的基盤によって支えられているが、複素値の場合にはそのような基盤が依然としてほとんど欠けている。そこで,CVNNの近似特性を調べることで,その表現力を解析する。本研究の結果は、広く使われているmodReLUおよび複素カーディオイド活性化関数を含む幅広い活性化関数に適用できる、CVNNに対する初の定量的近似限界を与える。正確には、この結果は、ある空でない開集合上で滑らかだが多重調和でない任意の活性化関数に適用できる。これは滑らかで非多項式の活性化関数のクラスの複素設定への自然な一般化である。我々の主な結果は、$C^k$-関数の近似誤差が$m \to \infty$のとき$m^{-k/(2n)}$でスケールすることを示している。ここで、$m$はニューロンの数、$k$は対象関数の滑らかさ、$n$は(複素)入力次元である。自然な連続性の仮定の下で、この速度が最適であることを示し、この仮定を外した場合の最適性についてもさらに議論する。さらに,連続近似法を用いて$C^k$-関数を近似する問題は必然的に次元の呪いに苦しむことを証明した。 Complex-valued neural networks (CVNNs) have recently shown promising empirical success, for instance for increasing the stability of recurrent neural networks and for improving the performance in tasks with complex-valued inputs, such as in MRI fingerprinting. While the overwhelming success of Deep Learning in the real-valued case is supported by a growing mathematical foundation, such a foundation is still largely lacking in the complex-valued case. We thus analyze the expressivity of CVNNs by studying their approximation properties. Our results yield the first quantitative approximation bounds for CVNNs that apply to a wide class of activation functions including the popular modReLU and complex cardioid activation functions. Precisely, our results apply to any activation function that is smooth but not polyharmonic on some non-empty open set; this is the natural generalization of the class of smooth and non-polynomial activation functions to the complex setting. Our main result shows that the error for the approximation of $C^k$-functions scales as $m^{-k/(2n)}$ for $m \to \infty$ where $m$ is the number of neurons, $k$ the smoothness of the target function and $n$ is the (complex) input dimension.
Under a natural continuity assumption, we show that this rate is optimal; we further discuss the optimality when dropping this assumption. Moreover, we prove that the problem of approximating $C^k$-functions using continuous approximation methods unavoidably suffers from the curse of dimensionality.	翻訳日:2023-11-02 03:31:42 公開日:2023-10-30
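The two activation functions named in this abstract have short standard definitions, sketched below for a single complex input together with the quoted error scaling. The definitions follow the usual forms of modReLU (shrink the modulus by a bias, keep the phase) and the complex cardioid (scale by a phase-dependent factor); the helper `rate`, which drops all constants, and the sample inputs are illustrative assumptions.

```python
import cmath
import math


def modrelu(z, b):
    """modReLU: rescale the modulus to ReLU(|z| + b) and keep the phase."""
    r = abs(z)
    if r == 0.0 or r + b <= 0.0:
        return 0j
    return (r + b) * z / r


def cardioid(z):
    """Complex cardioid: scale z by (1 + cos(arg z)) / 2."""
    return 0.5 * (1.0 + math.cos(cmath.phase(z))) * z


def rate(m, k, n):
    """Error scaling m^(-k/(2n)) from the abstract, with constants omitted."""
    return m ** (-k / (2 * n))


a = modrelu(3 + 4j, -1.0)    # modulus 5 shrinks to 4, phase unchanged
b = modrelu(0.3 + 0.4j, -1.0)  # modulus 0.5 is below the threshold, output is 0
c = cardioid(2 + 0j)         # positive real axis is passed through unchanged
```

The exponent shows the dimension dependence directly: doubling the complex input dimension `n` halves the exponent, so many more neurons are needed for the same error.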
# 非線型部分可観測系に対する確率的逆最適制御は知覚の不確実性と行動コストを分離する Probabilistic inverse optimal control for non-linear partially observable systems disentangles perceptual uncertainty and behavioral costs ( http://arxiv.org/abs/2303.16698v2 ) ライセンス: Link先を確認	Dominik Straub, Matthias Schultheis, Heinz Koeppl, Constantin A. Rothkopf	(参考訳) 逆最適制御は、逐次的な意思決定タスクにおける行動を特徴づけるのに使うことができる。しかし、既存の研究のほとんどは完全に観測可能なシステムや線形システムに限定されているか、あるいは行動信号が既知であることを必要とする。本稿では、行動信号が観測されない部分観測可能な確率的非線形系に対する逆最適制御の確率論的アプローチを導入し、逆最適制御に対する以前のアプローチを最大因果エントロピー定式化と統一する。エージェントの知覚・運動系のノイズ特性の明示的なモデルと局所線形化手法を用いて,単一のフォワードパス内で計算できるモデルパラメータの近似尤度関数を導出する。 2つの古典的な制御課題と2つの人間の行動課題について、確率的かつ部分観測可能なバージョンで定量的評価を行った。重要な点として,アクティブセンシングやアクティブラーニングのような不確実性下での逐次意思決定では認識的行動と実用的行動が絡み合っているにもかかわらず,本手法は知覚的要因と行動コストを分離できることを示す。提案手法は、模倣学習から感覚運動神経科学まで幅広い応用性を有する。 Inverse optimal control can be used to characterize behavior in sequential decision-making tasks. Most existing work, however, is limited to fully observable or linear systems, or requires the action signals to be known. Here, we introduce a probabilistic approach to inverse optimal control for partially observable stochastic non-linear systems with unobserved action signals, which unifies previous approaches to inverse optimal control with maximum causal entropy formulations. Using an explicit model of the noise characteristics of the sensory and motor systems of the agent in conjunction with local linearization techniques, we derive an approximate likelihood function for the model parameters, which can be computed within a single forward pass. We present quantitative evaluations on stochastic and partially observable versions of two classic control tasks and two human behavioral tasks. Importantly, we show that our method can disentangle perceptual factors and behavioral costs despite the fact that epistemic and pragmatic actions are intertwined in sequential decision-making under uncertainty, such as in active sensing and active learning.
The proposed method has broad applicability, ranging from imitation learning to sensorimotor neuroscience.	翻訳日:2023-11-02 03:31:14 公開日:2023-10-30
# CONFIDE:PDEのコンテキスト有限差分モデリング CONFIDE: Contextual Finite Differences Modelling of PDEs ( http://arxiv.org/abs/2303.15827v2 ) ライセンス: Link先を確認	Ori Linial, Orly Avner, Dotan Di Castro	(参考訳) 本稿では、学習したコンテキストに基づいて、未知のダイナミクスによって生成されたデータサンプルから明示的なPDEを推論する手法を提案する。トレーニングフェーズは、方程式の形式に関する知識を微分スキームと統合し、推論フェーズは、データサンプルに適合し、信号予測とデータ説明の両方を可能にするPDEを生成する。提案手法とSOTAアプローチを比較した広範な実験結果と,予測誤差と説明可能性の観点から提案手法の異なる変種を検討したアブレーション実験を含む。 We introduce a method for inferring an explicit PDE from a data sample generated by previously unseen dynamics, based on a learned context. The training phase integrates knowledge of the form of the equation with a differential scheme, while the inference phase yields a PDE that fits the data sample and enables both signal prediction and data explanation. We include results of extensive experimentation, comparing our method to SOTA approaches, together with ablation studies that examine different flavors of our solution in terms of prediction error and explainability.	翻訳日:2023-11-02 03:30:34 公開日:2023-10-30
# ニューラルスケーリングの量子化モデル The Quantization Model of Neural Scaling ( http://arxiv.org/abs/2303.13506v2 ) ライセンス: Link先を確認	Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark	(参考訳) ニューラルスケーリング則の量子化モデルを提案し、モデルサイズとデータサイズに対する損失の観測されたべき乗則的な減少と、スケールに伴う新しい能力の突然の出現の両方を説明する。このモデルは、ネットワークの知識とスキルが離散的なチャンク($\textbf{quanta}$)に「量子化」されるという、我々が量子化仮説(Quantization Hypothesis)と呼ぶものから導出される。使用頻度の高い順に量子が学習されると、使用頻度におけるべき乗則が、観測された損失のべき乗則スケーリングを説明することを示す。この予測をトイデータセット上で検証し,大規模言語モデルにおけるスケーリング曲線の分解について検討する。言語モデルの勾配を用いて、モデルの振る舞いを多様なスキル(量子)の集合に自動的に分解する。暫定的な結果として、トレーニング分布でこれらの量子が使用される頻度は、言語モデルの経験的スケーリング指数に対応するべき乗則におおよそ従っており、これは我々の理論の予測と一致する。 We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale. We derive this model from what we call the Quantization Hypothesis, where network knowledge and skills are "quantized" into discrete chunks ($\textbf{quanta}$). We show that when quanta are learned in order of decreasing use frequency, then a power law in use frequencies explains observed power law scaling of loss. We validate this prediction on toy datasets, then study how scaling curves decompose for large language models. Using language model gradients, we automatically decompose model behavior into a diverse set of skills (quanta). We tentatively find that the frequency at which these quanta are used in the training distribution roughly follows a power law corresponding with the empirical scaling exponent for language models, a prediction of our theory.	翻訳日:2023-11-02 03:30:01 公開日:2023-10-30
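The core argument of this abstract can be checked numerically with a toy calculation. The sketch below assumes, for illustration only, that quantum k is used with frequency proportional to k^-(alpha+1), that quanta are learned in order of decreasing frequency, and that every sample relying on an unlearned quantum contributes the same loss. Under those assumptions the residual loss after learning n quanta should fall off roughly as n^-alpha. The function name, the truncation at 200,000 quanta and the value of alpha are all hypothetical choices.

```python
import math


def residual_loss(n_learned, alpha, n_total=200_000):
    """Share of samples whose quantum is not yet learned, for p_k ~ k^-(alpha+1)."""
    weights = [k ** -(alpha + 1.0) for k in range(1, n_total + 1)]
    z = sum(weights)
    # Quanta are learned most-frequent first, so the tail is what remains.
    return sum(weights[n_learned:]) / z


alpha = 0.5
loss_10 = residual_loss(10, alpha)
loss_100 = residual_loss(100, alpha)
# Slope of the loss curve on a log-log plot between n = 10 and n = 100.
slope = math.log(loss_100 / loss_10) / math.log(100 / 10)
```

The fitted slope comes out close to `-alpha`, with a small deviation caused by truncating the sum at a finite number of quanta.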
# 画像としての時系列:不規則にサンプリングされた時系列の視覚トランスフォーマー Time Series as Images: Vision Transformer for Irregularly Sampled Time Series ( http://arxiv.org/abs/2303.12799v2 ) ライセンス: Link先を確認	Zekun Li, Shiyang Li, Xifeng Yan	(参考訳) 不規則にサンプリングされた時系列は、特に医学領域においてますます普及している。これらの不規則性を扱うための様々な特殊な手法が開発されているが、その複雑な力学と顕著なスパース性を効果的にモデル化することは依然として課題である。本稿では,不規則にサンプリングされた時系列を線グラフ画像に変換し,画像分類と同様に強力な事前学習済みVision Transformerを用いて時系列分類を行う新しい視点を提案する。この手法は特殊なアルゴリズム設計を大幅に単純化するだけでなく、時系列モデリングの普遍的なフレームワークとして機能する可能性も示す。注目すべきは、その単純さにもかかわらず、私たちのアプローチは、いくつかの一般的な医療および人間の活動データセットにおいて最先端の特殊アルゴリズムよりも優れていることです。特にテスト中に変数の一部が省略される厳密なleave-sensors-out設定では、様々な程度の観測欠損に対して強い頑健性を示し、たとえ半分の変数がマスクされていたとしても、主要な特殊ベースラインに対して絶対F1スコアで42.8%の大幅な改善を達成している。コードとデータはhttps://github.com/Leezekun/ViTSTで入手できる。 Irregularly sampled time series are increasingly prevalent, particularly in medical domains. While various specialized methods have been developed to handle these irregularities, effectively modeling their complex dynamics and pronounced sparsity remains a challenge. This paper introduces a novel perspective by converting irregularly sampled time series into line graph images, then utilizing powerful pre-trained vision transformers for time series classification in the same way as image classification. This method not only largely simplifies specialized algorithm designs but also presents the potential to serve as a universal framework for time series modeling. Remarkably, despite its simplicity, our approach outperforms state-of-the-art specialized algorithms on several popular healthcare and human activity datasets. Especially in the rigorous leave-sensors-out setting where a portion of variables is omitted during testing, our method exhibits strong robustness against varying degrees of missing observations, achieving an impressive improvement of 42.8% in absolute F1 score points over leading specialized baselines even with half the variables masked. Code and data are available at https://github.com/Leezekun/ViTST	翻訳日:2023-11-02 03:28:49 公開日:2023-10-30
# FedML-HE: 準同型暗号に基づく効率的なプライバシー保護フェデレーション学習システム FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System ( http://arxiv.org/abs/2303.10837v2 ) ライセンス: Link先を確認	Weizhao Jin, Yuhang Yao, Shanshan Han, Carlee Joe-Wong, Srivatsan Ravi, Salman Avestimehr, Chaoyang He	(参考訳) Federated Learningは、ローカルデータの代わりにローカルモデルの更新を集約することで、分散デバイス上で機械学習モデルを訓練する。しかし、サーバ上で集約されるローカルモデルが反転攻撃によって機密性の高い個人情報を明らかにする可能性があるため、プライバシの懸念が生じる。そのため、準同型暗号(HE)のようなプライバシ保護手法がFLトレーニングに必要となる。 HEのプライバシー上の優位性にもかかわらず、その適用は特に基盤モデルにおいて非現実的なオーバーヘッドに悩まされている。本稿では,HEベースの効率的で安全なモデル集約を備えた,最初の実用的なフェデレーション学習システムであるFedML-HEを提案する。 FedML-HEは機密性の高いパラメータを選択的に暗号化し、トレーニング中の計算と通信のオーバーヘッドを大幅に削減しつつ、カスタマイズ可能なプライバシ保護を提供する。最適化されたシステムは,特に大規模な基盤モデルにおいて大幅なオーバーヘッド削減を示し(例えばResNet-50では約10倍,BERTでは最大約40倍),スケーラブルなHEベースFLデプロイメントの可能性を示している。 Federated Learning trains machine learning models on distributed devices by aggregating local model updates instead of local data. However, privacy concerns arise as the aggregated local models on the server may reveal sensitive personal information by inversion attacks. Privacy-preserving methods, such as homomorphic encryption (HE), then become necessary for FL training. Despite HE's privacy advantages, its applications suffer from impractical overheads, especially for foundation models. In this paper, we present FedML-HE, the first practical federated learning system with efficient HE-based secure model aggregation. FedML-HE proposes to selectively encrypt sensitive parameters, significantly reducing both computation and communication overheads during training while providing customizable privacy preservation. Our optimized system demonstrates considerable overhead reduction, particularly for large foundation models (e.g., ~10x reduction for ResNet-50, and up to ~40x reduction for BERT), demonstrating the potential for scalable HE-based FL deployment.	翻訳日:2023-11-02 03:27:59 公開日:2023-10-30
# オブジェクト認識同変基本反応拡散モデルによる正確な遷移状態生成 Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model ( http://arxiv.org/abs/2304.06174v3 ) ライセンス: Link先を確認	Chenru Duan, Yuanqi Du, Haojun Jia, and Heather J. Kulik	(参考訳) 遷移状態 (TS) 探索は反応機構の解明と反応ネットワークの探索に重要である。しかし、正確な3次元TS構造を探すには、ポテンシャルエネルギー面の複雑さのために多くの計算集約的な量子化学計算が必要である。そこで我々は, 素反応における反応物, TS, および生成物の構造の組を生成するために, 全ての物理対称性と制約を満たすオブジェクト認識SE(3)同変拡散モデルを開発した。反応物と生成物が与えられると、このモデルは量子化学に基づく最適化に必要な数時間ではなく、数秒でTS構造を生成する。生成されたTS構造は、真のTSと比較して根平均二乗偏差の中央値が0.08 Åとなる。不確実性定量化のための信頼度スコアリングモデルを用いて、最も難しい反応の14%のみで量子化学に基づく最適化を行うことで、反応速度推定に必要な精度(2.6 kcal/mol)に近づく。提案手法は未知の機構を持つ大規模反応ネットワークの構築に有用であると考えられる。 Transition state (TS) search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D TS structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here, we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating sets of structures - reactant, TS, and product - in an elementary reaction. Provided reactant and product, this model generates a TS structure in seconds instead of hours required when performing quantum chemistry-based optimizations. The generated TS structures achieve a median of 0.08 Å root mean square deviation compared to the true TS. With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction rate estimation (2.6 kcal/mol) by only performing quantum chemistry-based optimizations on 14% of the most challenging reactions. We envision the proposed approach useful in constructing large reaction networks with unknown mechanisms.	翻訳日:2023-11-02 03:21:09 公開日:2023-10-30
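The headline accuracy figure in this abstract is a root mean square deviation (RMSD) between generated and reference structures. As a minimal sketch of that metric, the function below computes RMSD between two coordinate lists with identical atom ordering. It assumes the structures are already superimposed; the abstract does not say how alignment is done, so no alignment step (such as a Kabsch fit) is included, and the three-atom coordinates are made up for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered 3D coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of atoms")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical reference structure (in Å) and a copy displaced by 0.08 Å in z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
generated = [(x, y, z + 0.08) for (x, y, z) in reference]
deviation = rmsd(reference, generated)
```

Because every atom is displaced by the same amount here, the RMSD equals that displacement; in practice a rigid shift like this would be removed by alignment before the metric is reported.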
# DreamPose:Stable Diffusionによるファッション画像からビデオへの合成 DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion ( http://arxiv.org/abs/2304.06025v4 ) ライセンス: Link先を確認	Johanna Karras, Aleksander Holynski, Ting-Chun Wang, Ira Kemelmacher-Shlizerman	(参考訳) 静止画像からアニメーション・ファッション・ビデオを生成する拡散ベースの手法であるDreamPoseを提案する。画像と人間の身体ポーズのシーケンスが与えられると、本手法は人間の動きと布の動きの両方を含むビデオを合成する。これを実現するために,事前学習済みのテキストから画像へのモデル(Stable Diffusion)を,新たな微調整戦略,追加されたコンディショニング信号をサポートする一連のアーキテクチャ変更,時間的一貫性を促進する技術を用いて,ポーズと画像で誘導されるビデオ合成モデルに変換する。 UBC Fashionデータセットのファッションビデオのコレクションでファインチューニングを行う。本手法をさまざまな衣料品のスタイルやポーズで評価し, ファッションビデオのアニメーションにおいて最先端の結果が得られることを実証する。ビデオの結果はプロジェクトページで公開している。 We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel fine-tuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page.	翻訳日:2023-11-02 03:20:55 公開日:2023-10-30
# 衛星映像超解像のための局所-グローバル時間差学習 Local-Global Temporal Difference Learning for Satellite Video Super-Resolution ( http://arxiv.org/abs/2304.04421v2 ) ライセンス: Link先を確認	Yi Xiao, Qiangqiang Yuan, Kui Jiang, Xianyu Jin, Jiang He, Liangpei Zhang, Chia-wen Lin	(参考訳) 光フローベースおよびカーネルベースのアプローチは、衛星ビデオ超解法(VSR)における時間的補償のために広く研究されている。しかし、これらの手法は大規模または複雑なシナリオ、特に衛星ビデオでは一般化されていない。本稿では,その時間的差異を有効かつ効果的な時間的補償に活用することを提案する。フレーム内の局所的および大域的時間的情報を完全に活用するために, 短期的および長期的時間的不整合を体系的にモデル化した。具体的には、隣接フレーム間のRGB差分マップから局所的な動き表現を抽出するための短期時間差分モジュール(S-TDM)を考案し、より正確なテクスチャ表現の手がかりを得る。フレーム列全体の大域的依存性を調べるために、時間的特徴の変調を導くために、前方セグメントと後方セグメントの差を組み込んで活性化する長期時間的差分モジュール(l-tdm)が提案されている。さらに,対象フレームの空間分布と時間補正結果との相互作用を豊かにするための差分補償ユニット(dcu)を提案する。 5つのメインストリームビデオ衛星に対して厳密な客観的・主観評価を行った結果,本手法は最先端のアプローチに好適な効果を示した。コードはhttps://github.com/XY-boy/LGTDで入手できる。 Optical-flow-based and kernel-based approaches have been extensively explored for temporal compensation in satellite Video Super-Resolution (VSR). However, these techniques are less generalized in large-scale or complex scenarios, especially in satellite videos. In this paper, we propose to exploit the well-defined temporal difference for efficient and effective temporal compensation. To fully utilize the local and global temporal information within frames, we systematically modeled the short-term and long-term temporal discrepancies since we observed that these discrepancies offer distinct and mutually complementary properties. Specifically, we devise a Short-term Temporal Difference Module (S-TDM) to extract local motion representations from RGB difference maps between adjacent frames, which yields more clues for accurate texture representation. To explore the global dependency in the entire frame sequence, a Long-term Temporal Difference Module (L-TDM) is proposed, where the differences between forward and backward segments are incorporated and activated to guide the modulation of the temporal feature, leading to a holistic global compensation. 
Moreover, we further propose a Difference Compensation Unit (DCU) to enrich the interaction between the spatial distribution of the target frame and temporal compensated results, which helps maintain spatial consistency while refining the features to avoid misalignment. Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches. Code will be available at https://github.com/XY-boy/LGTD	翻訳日:2023-11-02 03:20:41 公開日:2023-10-30
# 異なる形態のデモンストレーションからのロボットマニピュレーションの学習 Learning Robot Manipulation from Cross-Morphology Demonstration ( http://arxiv.org/abs/2304.03833v2 ) ライセンス: Link先を確認	Gautam Salhotra, I-Chun Arthur Liu, Gaurav Sukhatme	(参考訳) 一部のLearning from Demonstrations (LfD)手法は、教師と生徒の行動空間における小さなミスマッチを扱う。ここでは,教師の形態が生徒と大きく異なる場合を扱う。我々のフレームワークであるMorphological Adaptation in Imitation Learning (MAIL)はこのギャップを埋め、大きく異なる形態を持つ他のエージェントによるデモンストレーションからエージェントを訓練することを可能にする。 MAILは、望まれるソリューションへの$\textit{some}$ガイダンスを提供する限り、最適以下のデモから学ぶ。剛体障害物と相互作用する3次元の布操作を含む,剛体および変形可能な物体を用いた操作タスクでMAILを実証する。 2つのエンドエフェクタを有する模擬エージェントによるデモンストレーションを用いて,1つのエンドエフェクタを持つロボットの視覚制御ポリシを訓練する。 MAILは、LfDおよび非LfDベースラインに対して、正規化した性能指標を最大で24%改善する。実機のFranka Pandaロボットにデプロイされ、物体の特性(サイズ、回転、並進)と布固有の特性(色、厚さ、サイズ、材料)のさまざまなバリエーションを扱う。概要はhttps://uscresl.github.io/mail にある。 Some Learning from Demonstrations (LfD) methods handle small mismatches in the action spaces of the teacher and student. Here we address the case where the teacher's morphology is substantially different from that of the student. Our framework, Morphological Adaptation in Imitation Learning (MAIL), bridges this gap allowing us to train an agent from demonstrations by other agents with significantly different morphologies. MAIL learns from suboptimal demonstrations, so long as they provide $\textit{some}$ guidance towards a desired solution. We demonstrate MAIL on manipulation tasks with rigid and deformable objects including 3D cloth manipulation interacting with rigid obstacles. We train a visual control policy for a robot with one end-effector using demonstrations from a simulated agent with two end-effectors. MAIL shows up to 24% improvement in a normalized performance metric over LfD and non-LfD baselines. It is deployed to a real Franka Panda robot, handles multiple variations in properties for objects (size, rotation, translation), and cloth-specific properties (color, thickness, size, material). An overview is on https://uscresl.github.io/mail .	翻訳日:2023-11-02 03:19:53 公開日:2023-10-30
# log-concaveサンプリングのためのクエリ下界 Query lower bounds for log-concave sampling ( http://arxiv.org/abs/2304.02599v2 ) ライセンス: Link先を確認	Sinho Chewi, Jaume de Dios Pont, Jerry Li, Chen Lu, Shyam Narayanan	(参考訳) log-concaveサンプリングは近年顕著なアルゴリズムの進歩をみせたが、このタスクの下界を証明するという対応する問題は捉えにくいままであり、下界はこれまで次元1でしか知られていなかった。本研究では, 次のクエリ下界を確立する。(1) 次元$d\ge 2$の強log-concaveかつlog-smoothな分布からのサンプリングには$\Omega(\log \kappa)$クエリが必要であり、これは任意の定数次元においてシャープである。(2) 次元$d$のガウス分布(したがって次元$d$の一般のlog-concaveかつlog-smoothな分布)からのサンプリングには$\widetilde \Omega(\min(\sqrt\kappa \log d, d))$クエリが必要であり、これはガウス分布のクラスに対してほぼシャープである。ここで$\kappa$はターゲット分布の条件数を表す。本証明は,(1)幾何学的測度論における掛谷予想の研究に触発されたマルチスケール構成と,(2)ブロックKrylovアルゴリズムがこの問題に最適であることを示す新しい還元,および行列・ベクトルクエリの文献で開発されたWishart行列に基づく下界手法との関係に依拠する。 Log-concave sampling has witnessed remarkable algorithmic advances in recent years, but the corresponding problem of proving lower bounds for this task has remained elusive, with lower bounds previously known only in dimension one. In this work, we establish the following query lower bounds: (1) sampling from strongly log-concave and log-smooth distributions in dimension $d\ge 2$ requires $\Omega(\log \kappa)$ queries, which is sharp in any constant dimension, and (2) sampling from Gaussians in dimension $d$ (hence also from general log-concave and log-smooth distributions in dimension $d$) requires $\widetilde \Omega(\min(\sqrt\kappa \log d, d))$ queries, which is nearly sharp for the class of Gaussians. Here $\kappa$ denotes the condition number of the target distribution. Our proofs rely upon (1) a multiscale construction inspired by work on the Kakeya conjecture in geometric measure theory, and (2) a novel reduction that demonstrates that block Krylov algorithms are optimal for this problem, as well as connections to lower bound techniques based on Wishart matrices developed in the matrix-vector query literature.	翻訳日:2023-11-02 03:18:16 公開日:2023-10-30
# EduceLab-Scrolls:X線CTによるHerculaneum Papyriからのテキストの復元 EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT ( http://arxiv.org/abs/2304.02084v3 ) ライセンス: Link先を確認	Stephen Parsons, C. Seth Parker, Christy Chapman, Mami Hayashida, W. Brent Seales	(参考訳) X線CT画像を用いたHerculaneum papyriの隠れテキストを明らかにするための完全なソフトウェアパイプラインを提案する。この拡張された仮想アンラッピングパイプラインは、機械学習と、3D画像と2D画像をリンクする新しい幾何学的フレームワークを組み合わせる。 educelab-scrollsは、この問題に対する20年の研究努力を表す包括的なオープンデータセットです。 EduceLab-Scrollsには、小さな断片と無傷のロールスクロールの両方のボリュームX線CT画像が含まれている。データセットには、インク検出モデルの教師付きトレーニングに使用される2Dイメージラベルも含まれている。ラベリングは、スクロールフラグメントのスペクトル写真と、同じフラグメントのX線CT画像との整列を可能とし、画像空間とモダリティの間の機械学習可能なマッピングを作成する。このアライメントは、X線CTで「見えない」炭素インクを検出するための教師あり学習を可能にする。私たちの知る限り、これはこの種のデータセットとしては初めてのもので、ヘリテージドメインでリリースされた最大のデータセットです。本手法は, スクロール断片の正確なテキスト行を, 既知の地底真理で明らかにすることができる。露見されたテキストは、視覚的確認、定量的画像計測、学術的レビューを用いて検証される。 educelab-scrollsは今回初めて、ここで紹介するherculaneum papyriの隠されたテキストを発見した。研究が進むにつれて、educelab-scrollsデータセットがよりテキスト的な発見を生み出すことを期待している。 We present a complete software pipeline for revealing the hidden texts of the Herculaneum papyri using X-ray CT images. This enhanced virtual unwrapping pipeline combines machine learning with a novel geometric framework linking 3D and 2D images. We also present EduceLab-Scrolls, a comprehensive open dataset representing two decades of research effort on this problem. EduceLab-Scrolls contains a set of volumetric X-ray CT images of both small fragments and intact, rolled scrolls. The dataset also contains 2D image labels that are used in the supervised training of an ink detection model. Labeling is enabled by aligning spectral photography of scroll fragments with X-ray CT images of the same fragments, thus creating a machine-learnable mapping between image spaces and modalities. This alignment permits supervised learning for the detection of "invisible" carbon ink in X-ray CT, a task that is "impossible" even for human expert labelers. 
To our knowledge, this is the first aligned dataset of its kind and is the largest dataset ever released in the heritage domain. Our method is capable of revealing accurate lines of text on scroll fragments with known ground truth. Revealed text is verified using visual confirmation, quantitative image metrics, and scholarly review. EduceLab-Scrolls has also enabled the discovery, for the first time, of hidden texts from the Herculaneum papyri, which we present here. We anticipate that the EduceLab-Scrolls dataset will generate more textual discovery as research continues.	翻訳日:2023-11-02 03:17:41 公開日:2023-10-30
# Generating Continual Human Motion in Diverse 3D Scenes ( http://arxiv.org/abs/2304.02061v3 ) License: see link	Aymen Mir, Xavier Puig, Angjoo Kanazawa, Gerard Pons-Moll	We introduce a method to synthesize animator-guided human motion across 3D scenes. Given a set of sparse (3 or 4) joint locations (such as the location of a person's hand and two feet) and a seed motion sequence in a 3D scene, our method generates a plausible motion sequence starting from the seed motion while satisfying the constraints imposed by the provided keypoints. We decompose the continual motion synthesis problem into walking along paths and transitioning in and out of the actions specified by the keypoints, which enables long generation of motions that satisfy scene constraints without explicitly incorporating scene information. Our method is trained only using scene-agnostic mocap data. As a result, our approach is deployable across 3D scenes with various geometries. For achieving plausible continual motion synthesis without drift, our key contribution is to generate motion in a goal-centric canonical coordinate frame where the next immediate target is situated at the origin. Our model can generate long sequences of diverse actions such as grabbing, sitting and leaning chained together in arbitrary order, demonstrated on scenes of varying geometry: HPS, Replica, Matterport, ScanNet and scenes represented using NeRFs. Several experiments demonstrate that our method outperforms existing methods that navigate paths in 3D scenes.	Translated: 2023-11-02 03:17:16 Published: 2023-10-30
# Non-Fermi Liquids from Dipolar Symmetry Breaking ( http://arxiv.org/abs/2304.01181v3 ) License: see link	Amogh Anakru, Zhen Bi	The emergence of fractonic topological phases and novel universality classes for quantum dynamics highlights the importance of dipolar symmetry in condensed matter systems. In this work, we study the properties of symmetry-breaking phases of the dipolar symmetries in fermionic models in various spatial dimensions. In such systems, fermions obtain energy dispersion through dipole condensation. Due to the nontrivial commutation between the translation symmetry and dipolar symmetry, the Goldstone modes of the dipolar condensate are strongly coupled to the dispersive fermions and naturally give rise to non-Fermi liquids at low energies. The IR description of the dipolar symmetry-breaking phase is analogous to the well-known theory of a Fermi surface coupled to an emergent U(1) gauge field. We also discuss the crossover behavior when the dipolar symmetry is slightly broken and the cases with anisotropic dipolar conservation.	Translated: 2023-11-02 03:16:21 Published: 2023-10-30
# Diffusion map particle systems for generative modeling ( http://arxiv.org/abs/2304.00200v2 ) License: see link	Fengyi Li, Youssef Marzouk	We propose a novel diffusion map particle system (DMPS) for generative modeling, based on diffusion maps and Laplacian-adjusted Wasserstein gradient descent (LAWGD). Diffusion maps are used to approximate the generator of the corresponding Langevin diffusion process from samples, and hence to learn the underlying data-generating manifold. On the other hand, LAWGD enables efficient sampling from the target distribution given a suitable choice of kernel, which we construct here via a spectral approximation of the generator, computed with diffusion maps. Our method requires no offline training and minimal tuning, and can outperform other approaches on data sets of moderate dimension.	Translated: 2023-11-02 03:16:06 Published: 2023-10-30
# Artificial General Intelligence (AGI) for Education ( http://arxiv.org/abs/2304.12479v3 ) License: see link	Ehsan Latif, Gengchen Mai, Matthew Nyaaba, Xuansheng Wu, Ninghao Liu, Guoyu Lu, Sheng Li, Tianming Liu, and Xiaoming Zhai	Artificial general intelligence (AGI) has gained global recognition as a future technology due to the emergence of breakthrough large language models and chatbots such as GPT-4 and ChatGPT, respectively. AGI, which aims to replicate human intelligence through computer systems, is one of the critical technologies with the potential to revolutionize the field of education. Conventional AI models are typically designed for a limited range of tasks, demand significant amounts of domain-specific data for training, and may not always consider intricate interpersonal dynamics in education. In contrast, AGI, driven by the recent large pre-trained models, represents a significant leap in the capability of machines to perform tasks that require human-level intelligence, such as reasoning, problem-solving, decision-making, and even understanding human emotions and social interactions. This work reviews AGI's key concepts, capabilities, scope, and potential within future education, including setting educational goals, designing pedagogy and curriculum, and performing assessments. We also provide rich discussions over various ethical issues in education faced by AGI and how AGI will affect human educators. The development of AGI necessitates interdisciplinary collaborations between educators and AI engineers to advance research and application efforts.	Translated: 2023-11-02 03:08:44 Published: 2023-10-30
# Attractive Solution of Binary Bose Mixtures: Liquid-Vapor Coexistence and Critical Point ( http://arxiv.org/abs/2304.12334v2 ) License: see link	Gabriele Spada, Sebastiano Pilati and Stefano Giorgini	We study the thermodynamic behavior of attractive binary Bose mixtures using exact path-integral Monte-Carlo methods. Our focus is on the regime of interspecies interactions where the ground state is in a self-bound liquid phase, stabilized by beyond mean-field effects. We calculate the isothermal curves in the pressure vs density plane for different values of the attraction strength and establish the extent of the coexistence region between liquid and vapor using the Maxwell construction. Notably, within the coexistence region, Bose-Einstein condensation occurs in a discontinuous way as the density jumps from the normal gas to the superfluid liquid phase. Furthermore, we determine the critical point where the line of first-order transition ends and investigate the behavior of the density discontinuity in its vicinity. We also point out that the density discontinuity at the transition could be observed in experiments on mixtures in traps.	Translated: 2023-11-02 03:08:23 Published: 2023-10-30
# Learning Dictionaries from Physical-Based Interpolation for Water Network Leak Localization ( http://arxiv.org/abs/2304.10932v2 ) License: see link	Paul Irofti and Luis Romero-Ben and Florin Stoican and Vicenç Puig	This article presents a leak localization methodology based on state estimation and learning. The first is handled by an interpolation scheme, whereas dictionary learning is considered for the second stage. The novel proposed interpolation technique exploits the physics of the interconnections between hydraulic heads of neighboring nodes in water distribution networks. Additionally, residuals are directly interpolated instead of hydraulic head values. The results of applying the proposed method to a well-known case study (Modena) demonstrated the improvements of the new interpolation method with respect to a state-of-the-art approach, both in terms of interpolation error (considering state and residual estimation) and posterior localization.	Translated: 2023-11-02 03:08:08 Published: 2023-10-30
# Long-Lived Singlet State in an Oriented Phase and its Survival across the Phase Transition Into an Isotropic Phase ( http://arxiv.org/abs/2304.10459v3 ) License: see link	Vishal Varma and T S Mahesh	Long-lived singlet states (LLS) of nuclear spin pairs have been extensively studied and utilized in the isotropic phase via liquid state NMR. However, there are hardly any reports of LLS in the anisotropic phase that allows contribution from the dipolar coupling in addition to the scalar coupling, thereby opening many exciting possibilities. Here we report observing LLS in a pair of nuclear spins partially oriented in the nematic phase of a liquid crystal solvent. The spins are strongly interacting via the residual dipole-dipole coupling. We observe LLS in the oriented phase living up to three times longer than the usual spin-lattice relaxation time constant ($T_1$). Upon heating, the system undergoes a phase transition from nematic into isotropic phase, wherein the LLS is up to five times longer lived than the corresponding $T_1$. Interestingly, the LLS prepared in the oriented phase can survive the transition from the nematic to the isotropic phase. As an application of LLS in the oriented phase, we utilize its longer life to measure the small translational diffusion coefficient of solute molecules in the liquid crystal solvent. Finally, we propose utilizing the phase transition to lock or unlock access to LLS.	Translated: 2023-11-02 03:07:57 Published: 2023-10-30
# Learning Sample Difficulty from Pre-trained Models for Reliable Prediction ( http://arxiv.org/abs/2304.10127v2 ) License: see link	Peng Cui, Dan Zhang, Zhijie Deng, Yinpeng Dong, Jun Zhu	Large-scale pre-trained models have achieved remarkable success in many applications, but how to leverage them to improve the prediction reliability of downstream models is undesirably under-explored. Moreover, modern neural networks have been found to be poorly calibrated and make overconfident predictions regardless of inherent sample difficulty and data uncertainty. To address this issue, we propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization. Pre-trained models that have been exposed to large-scale datasets and do not overfit the downstream training classes enable us to measure each training sample's difficulty via feature-space Gaussian modeling and relative Mahalanobis distance computation. Importantly, by adaptively penalizing overconfident prediction based on the sample difficulty, we simultaneously improve accuracy and uncertainty calibration across challenging benchmarks (e.g., +0.55% ACC and -3.7% ECE on ImageNet1k using ResNet34), consistently surpassing competitive baselines for reliable prediction. The improved uncertainty estimate further improves selective classification (abstaining from erroneous predictions) and out-of-distribution detection.	Translated: 2023-11-02 03:07:39 Published: 2023-10-30
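The abstract above mentions scoring sample difficulty with feature-space Gaussian modeling and a relative Mahalanobis distance. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes a diagonal covariance shared across classes (a simplification chosen here to stay within the standard library), and the names `fit_gaussians` and `relative_mahalanobis` are hypothetical. A larger score is read here as "harder".

```python
from statistics import fmean

EPS = 1e-6  # variance floor to avoid division by zero (an assumption of this sketch)


def _mean(points):
    """Per-dimension mean of a list of feature vectors."""
    return [fmean(col) for col in zip(*points)]


def fit_gaussians(features, labels):
    """Fit class means with a shared diagonal variance, plus one class-agnostic Gaussian."""
    class_means = {
        y: _mean([f for f, l in zip(features, labels) if l == y])
        for y in set(labels)
    }
    dims = range(len(features[0]))
    # Pooled within-class variance, one value per feature dimension.
    shared_var = [
        fmean((f[d] - class_means[y][d]) ** 2 for f, y in zip(features, labels)) + EPS
        for d in dims
    ]
    # "Background" Gaussian fitted to all features, ignoring labels.
    bg_mean = _mean(features)
    bg_var = [fmean((f[d] - bg_mean[d]) ** 2 for f in features) + EPS for d in dims]
    return class_means, shared_var, bg_mean, bg_var


def _sq_mahalanobis(x, mean, var):
    """Squared Mahalanobis distance under a diagonal covariance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def relative_mahalanobis(x, label, model):
    """Distance to the sample's own class minus distance to the background Gaussian."""
    class_means, shared_var, bg_mean, bg_var = model
    return _sq_mahalanobis(x, class_means[label], shared_var) - _sq_mahalanobis(
        x, bg_mean, bg_var
    )
```

In this sketch the features would come from a frozen pre-trained encoder. How the resulting score enters the entropy regularizer is not specified in the abstract and is left out.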
# Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing ( http://arxiv.org/abs/2304.08315v3 ) License: see link	Lucie-Aimée Kaffee, Arnav Arora, Zeerak Talat, Isabelle Augenstein	Dual use, the intentional, harmful reuse of technology and scientific artefacts, is a problem yet to be well-defined within the context of Natural Language Processing (NLP). However, as NLP technologies continue to advance and become increasingly widespread in society, their inner workings have become increasingly opaque. Therefore, understanding dual use concerns and potential ways of limiting them is critical to minimising the potential harms of research and development. In this paper, we conduct a survey of NLP researchers and practitioners to understand the depth of the problem and their perspective on it, as well as to assess existing available support. Based on the results of our survey, we offer a definition of dual use that is tailored to the needs of the NLP community. The survey revealed that a majority of researchers are concerned about the potential dual use of their research but only take limited action toward it. In light of the survey results, we discuss the current state and potential means for mitigating dual use in NLP and propose a checklist that can be integrated into existing conference ethics frameworks, e.g., the ACL ethics checklist.	Translated: 2023-11-02 03:05:53 Published: 2023-10-30
# Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the Heads ( http://arxiv.org/abs/2304.07705v3 ) License: see link	Yu Zhang, Huaming Chen, Wei Bao, Zhongzheng Lai, Zao Zhang, Dong Yuan	With the rapid development of deep learning, object detection and tracking play a vital role in today's society. Being able to identify and track all the pedestrians in a dense crowd scene with computer vision approaches is a typical challenge in this field, also known as the Multiple Object Tracking (MOT) challenge. Modern trackers are required to operate on more and more complicated scenes. According to the MOT20 challenge result, pedestrians are 4 times denser than in the MOT17 challenge. Hence, improving the ability to detect and track in extremely crowded scenes is the aim of this work. In light of the occlusion issue with the human body, the heads are usually easier to identify. In this work, we have designed a joint head and body detector in an anchor-free style to boost the detection recall and precision performance of pedestrians in both small and medium sizes. Innovatively, our model does not require information on the statistical head-body ratio for common pedestrian detection for training. Instead, the proposed model learns the ratio dynamically. To verify the effectiveness of the proposed model, we evaluate the model with extensive experiments on different datasets, including MOT20, CrowdHuman, and HT21 datasets. As a result, our proposed method significantly improves both the recall and precision rate on small- and medium-sized pedestrians and achieves state-of-the-art results in these challenging datasets.	Translated: 2023-11-02 03:05:12 Published: 2023-10-30
# Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis ( http://arxiv.org/abs/2304.07504v2 ) License: see link	Dachao Lin, Yuze Han, Haishan Ye, Zhihua Zhang	We study finite-sum distributed optimization problems involving a master node and $n-1$ local nodes under the popular $\delta$-similarity and $\mu$-strong convexity conditions. We propose two new algorithms, SVRS and AccSVRS, motivated by previous works. The non-accelerated SVRS method combines the techniques of gradient sliding and variance reduction and achieves a better communication complexity of $\tilde{\mathcal{O}}(n {+} \sqrt{n}\delta/\mu)$ compared to existing non-accelerated algorithms. Applying the framework proposed in Katyusha X, we also develop a directly accelerated version named AccSVRS with the $\tilde{\mathcal{O}}(n {+} n^{3/4}\sqrt{\delta/\mu})$ communication complexity. In contrast to existing results, our complexity bounds are entirely smoothness-free and exhibit superiority in ill-conditioned cases. Furthermore, we establish a nearly matched lower bound to verify the tightness of our AccSVRS method.	Translated: 2023-11-02 03:04:52 Published: 2023-10-30
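As a rough reading aid, the two communication bounds quoted in the abstract above can be compared directly. This uses only those two bounds and ignores the logarithmic factors hidden in $\tilde{\mathcal{O}}$. The accelerated second term is the smaller one exactly when the problem is sufficiently ill-conditioned:

$$
n^{3/4}\sqrt{\delta/\mu} \;\le\; \sqrt{n}\,\delta/\mu
\iff n^{1/4} \le \sqrt{\delta/\mu}
\iff \delta/\mu \ge \sqrt{n}.
$$

When $\delta/\mu < \sqrt{n}$, both second terms are below $n$ (since $\sqrt{n}\,\delta/\mu < n$ and $n^{3/4}\sqrt{\delta/\mu} < n$), so under this reading both bounds reduce to $\tilde{\mathcal{O}}(n)$.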
# Dimension-free mixing times of Gibbs samplers for Bayesian hierarchical models ( http://arxiv.org/abs/2304.06993v2 ) License: see link	Filippo Ascolani and Giacomo Zanella	Gibbs samplers are popular algorithms to approximate posterior distributions arising from Bayesian hierarchical models. Despite their popularity and good empirical performance, however, there are still relatively few quantitative results on their convergence properties, e.g. much fewer than for gradient-based sampling methods. In this work we analyse the behaviour of total variation mixing times of Gibbs samplers targeting hierarchical models using tools from Bayesian asymptotics. We obtain dimension-free convergence results under random data-generating assumptions, for a broad class of two-level models with generic likelihood function. Specific examples with Gaussian, binomial and categorical likelihoods are discussed.	Translated: 2023-11-02 03:03:23 Published: 2023-10-30
# The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation ( http://arxiv.org/abs/2305.06156v2 ) License: see link	Dung Nguyen Manh, Nam Le Hai, Anh T. V. Dau, Anh Minh Nguyen, Khanh Nghiem, Jin Guo, Nghi D. Q. Bui	We present The Vault, a dataset of high-quality code-text pairs in multiple programming languages for training large language models to understand and generate code. We present methods for thoroughly extracting samples that use both rule-based and deep learning-based methods to ensure that they contain high-quality pairs of code and text, resulting in a dataset of 43 million high-quality code-text pairs. Our extensive evaluations on common coding tasks including code generation, code search and code summarization show that when fine-tuning Code Large Language Models on The Vault, such models outperform the same models trained on other datasets such as CodeSearchNet. We also provide detailed analyses of our datasets to assess the effects of various programming languages and docstrings on the performance of such models.	Translated: 2023-11-02 02:55:54 Published: 2023-10-30
# Demystifying Softmax Gating Function in Gaussian Mixture of Experts ( http://arxiv.org/abs/2305.03288v2 ) License: see link	Huy Nguyen and TrungTin Nguyen and Nhat Ho	Understanding the parameter estimation of softmax gating Gaussian mixture of experts has remained a long-standing open problem in the literature. It is mainly due to three fundamental theoretical challenges associated with the softmax gating function: (i) the identifiability only up to the translation of parameters; (ii) the intrinsic interaction via partial differential equations between the softmax gating and the expert functions in the Gaussian density; (iii) the complex dependence between the numerator and denominator of the conditional density of softmax gating Gaussian mixture of experts. We resolve these challenges by proposing novel Voronoi loss functions among parameters and establishing the convergence rates of maximum likelihood estimator (MLE) for solving parameter estimation in these models. When the true number of experts is unknown and over-specified, our findings show a connection between the convergence rate of the MLE and a solvability problem of a system of polynomial equations.	Translated: 2023-11-02 02:54:38 Published: 2023-10-30
# When Do Neural Nets Outperform Boosted Trees on Tabular Data? ( http://arxiv.org/abs/2305.02997v3 ) License: see link	Duncan McElfresh, Sujay Khandagale, Jonathan Valverde, Vishak Prasad C, Benjamin Feuer, Chinmay Hegde, Ganesh Ramakrishnan, Micah Goldblum, Colin White	Tabular data is one of the most commonly used types of data in machine learning. Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this work, we take a step back and question the importance of this debate. To this end, we conduct the largest tabular data analysis to date, comparing 19 algorithms across 176 datasets, and we find that the 'NN vs. GBDT' debate is overemphasized: for a surprisingly high number of datasets, either the performance difference between GBDTs and NNs is negligible, or light hyperparameter tuning on a GBDT is more important than choosing between NNs and GBDTs. A remarkable exception is the recently-proposed prior-data fitted network, TabPFN: although it is effectively limited to training sets of size 3000, we find that it outperforms all other algorithms on average, even when randomly sampling 3000 training datapoints. Next, we analyze dozens of metafeatures to determine what properties of a dataset make NNs or GBDTs better-suited to perform well. For example, we find that GBDTs are much better than NNs at handling skewed or heavy-tailed feature distributions and other forms of dataset irregularities. Our insights act as a guide for practitioners to determine which techniques may work best on their dataset. Finally, with the goal of accelerating tabular data research, we release the TabZilla Benchmark Suite: a collection of the 36 'hardest' of the datasets we study. Our benchmark suite, codebase, and all raw results are available at https://github.com/naszilla/tabzilla.	Translated: 2023-11-02 02:54:24 Published: 2023-10-30
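The abstract above notes that TabPFN is evaluated on larger datasets by randomly sampling 3000 training datapoints. A minimal sketch of such a subsampling step might look as follows. The function name `subsample_training_set`, the fixed seed, and the use of uniform sampling without replacement are assumptions of this sketch; the abstract does not say how the authors sample.

```python
import random


def subsample_training_set(rows, labels, max_size=3000, seed=0):
    """Return at most max_size (row, label) pairs, drawn uniformly without replacement."""
    if len(rows) <= max_size:
        return list(rows), list(labels)
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    # Sort the sampled indices to keep the original row order.
    idx = sorted(rng.sample(range(len(rows)), max_size))
    return [rows[i] for i in idx], [labels[i] for i in idx]
```

Sampling indices rather than rows keeps features and labels aligned. A stratified variant may be preferable for imbalanced datasets, but that is beyond what the abstract states.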
# Weakly-supervised Micro- and Macro-expression Spotting Based on Multi-level Consistency ( http://arxiv.org/abs/2305.02734v2 ) License: see link	Wang-Wang Yu, Kai-Fu Yang, Hong-Mei Yan, Yong-Jie Li	Most micro- and macro-expression spotting methods in untrimmed videos suffer from the burden of video-wise collection and frame-wise annotation. Weakly-supervised expression spotting (WES) based on video-level labels can potentially mitigate the complexity of frame-level annotation while achieving fine-grained frame-level spotting. However, we argue that existing weakly-supervised methods are based on multiple instance learning (MIL) involving inter-modality, inter-sample, and inter-task gaps. The inter-sample gap is primarily from the sample distribution and duration. Therefore, we propose a novel and simple WES framework, MC-WES, using multi-consistency collaborative mechanisms that include modal-level saliency, video-level distribution, label-level duration and segment-level feature consistency strategies to implement fine frame-level spotting with only video-level labels to alleviate the above gaps and merge prior knowledge. The modal-level saliency consistency strategy focuses on capturing key correlations between raw images and optical flow. The video-level distribution consistency strategy utilizes the difference of sparsity in temporal distribution. The label-level duration consistency strategy exploits the difference in the duration of facial muscles. The segment-level feature consistency strategy emphasizes that features under the same labels maintain similarity. Experimental results on three challenging datasets -- CAS(ME)$^2$, CAS(ME)$^3$, and SAMM-LV -- demonstrate that MC-WES is comparable to state-of-the-art fully-supervised methods.	Translated: 2023-11-02 02:53:54 Published: 2023-10-30
# Unlimiformer: Long-Range Transformers with Unlimited Length Input ( http://arxiv.org/abs/2305.01625v3 ) License: see link	Amanda Bertsch, Uri Alon, Graham Neubig, Matthew R. Gormley	Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single k-nearest-neighbor (kNN) index, while the returned kNN distances are the attention dot-product scores. This kNN index can be kept on either the GPU or CPU memory and queried in sub-linear time; this way, we can index practically unlimited input sequences, while every attention head in every decoder layer retrieves its top-k keys, instead of attending to every key. We evaluate Unlimiformer on several long-document and book-summarization benchmarks, showing that it can process even 500k token-long inputs from the BookSum dataset, without any input truncation at test time. We demonstrate that Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code. We make our code and models publicly available at https://github.com/abertsch72/unlimiformer.	Translated: 2023-11-02 02:53:23 Published: 2023-10-30
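The abstract above describes attention heads that retrieve their top-k keys by dot-product score instead of attending to every key. The sketch below illustrates only that top-k step and is not the Unlimiformer code. A brute-force scan stands in for the kNN index, so it is not sub-linear, and the function name `topk_attention` and the plain-list vectors are assumptions made for illustration.

```python
import heapq
import math


def topk_attention(query, keys, values, k):
    """Attend only to the k keys with the largest dot product with the query."""
    # Dot-product scores; a real system would obtain these from a kNN index.
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    top = heapq.nlargest(k, range(len(keys)), key=scores.__getitem__)
    # Softmax restricted to the retrieved keys (shifted by the max for stability).
    max_score = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - max_score) for i in top]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, top)) / total
        for d in range(dim)
    ]
```

With `k` equal to the number of keys, this reduces to ordinary softmax attention over all keys. Smaller `k` drops the lowest-scoring keys, which usually carry little softmax weight.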
# ChatGPTで生成されたコードは本当に正しいか? コード生成のための大規模言語モデルの厳密な評価 Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation ( http://arxiv.org/abs/2305.01210v3 ) ライセンス: Link先を確認	Jiawei Liu and Chunqiu Steven Xia and Yuyao Wang and Lingming Zhang	(参考訳) プログラム合成は、コードを生成するためにLLM(Large Language Models)の力を直接利用することに焦点を当てた最近のアプローチで長い間研究されてきた。コード合成における様々なllmのパフォーマンスを測定するために、キュレートされた合成問題とテストケースを伴うプログラミングベンチマークが使用される。しかし、これらのテストケースは、生成されたコードの機能的正確性を完全に評価するために、量と品質の両方で制限することができる。 LLMの時代、生成されたコードは本当に正しいのでしょうか? そこで我々は,LLM合成コードの機能的正しさを厳格に評価するコード合成評価フレームワークであるEvalPlusを提案する。 EvalPlusは、LLMと突然変異ベースの戦略を駆使した自動テスト入力ジェネレータによって新たに生成された大量のテストケースで、所定の評価データセットを拡張している。 EvalPlusは一般的なものですが、人気のあるHumanEvalベンチマークのテストケースを80倍拡張してHumanEval+を構築します。 26の人気のあるLCM(例えば、GPT-4とChatGPT)に対する我々の広範な評価は、HumanEval+がLLMによって合成された未検出の誤りコードを大量に取得でき、パス@kを19.3-28.9%まで削減できることを示している。また、テストの不十分さが誤判定につながることもわかりました。例えば、WizardCoder-CodeLlamaとPhind-CodeLlamaはいずれもHumanEval+でChatGPTを上回っている。我々の研究は、従来の一般的なコード合成評価結果が、コード合成のためのLLMの真の性能を正確に反映しているだけでなく、自動テストによってそのようなベンチマークを改善するための新たな方向性も示している。我々は、将来のLLM-for-codeリサーチを促進・加速するために、ツール、拡張データセット、およびすべてのLCM生成コードをhttps://github.com/evalplus/evalplusでオープンソース化しました。 Program synthesis has been long studied with recent approaches focused on directly using the power of Large Language Models (LLMs) to generate code. Programming benchmarks, with curated synthesis problems and test-cases, are used to measure the performance of various LLMs on code synthesis. However, these test-cases can be limited in both quantity and quality for fully assessing the functional correctness of the generated code. Such limitation in the existing benchmarks begs the following question: In the era of LLMs, is the code generated really correct? To answer this, we propose EvalPlus -- a code synthesis evaluation framework to rigorously benchmark the functional correctness of LLM-synthesized code. 
EvalPlus augments a given evaluation dataset with large amounts of test-cases newly produced by an automatic test input generator, powered by both LLM- and mutation-based strategies. While EvalPlus is general, we extend the test-cases of the popular HumanEval benchmark by 80x to build HumanEval+. Our extensive evaluation across 26 popular LLMs (e.g., GPT-4 and ChatGPT) demonstrates that HumanEval+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing the pass@k by up to 19.3-28.9%. We also surprisingly found that test insufficiency can lead to mis-ranking. For example, both WizardCoder-CodeLlama and Phind-CodeLlama now outperform ChatGPT on HumanEval+, while neither of them could on HumanEval. Our work not only indicates that prior popular code synthesis evaluation results do not accurately reflect the true performance of LLMs for code synthesis, but also opens up a new direction to improve such programming benchmarks through automated testing. We have open-sourced our tools, enhanced datasets as well as all LLM-generated code at https://github.com/evalplus/evalplus to facilitate and accelerate future LLM-for-code research.	翻訳日:2023-11-02 02:53:00 公開日:2023-10-30
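The EvalPlus abstract above reports its results as reductions in pass@k. As a rough illustration of how that metric responds when stricter tests reject more samples, the sketch below uses the unbiased pass@k estimator commonly used with HumanEval. The abstract itself does not state the formula, so treating it as the one used here is an assumption, and the sample counts in the usage note are made up for illustration.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate pass@k from n generated samples, of which c pass all tests.

    Computes 1 - C(n - c, k) / C(n, k): the probability that at least one
    of k samples drawn without replacement from the n is correct.
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For a hypothetical problem with 10 samples, `pass_at_k(10, 5, 1)` is 0.5 if five samples pass the original tests; if extra tests expose two of them as wrong, `pass_at_k(10, 3, 1)` falls to about 0.3.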
# 大規模言語モデルを用いた単体テスト生成に関する実証的研究 An Empirical Study of Using Large Language Models for Unit Test Generation ( http://arxiv.org/abs/2305.00418v2 ) ライセンス: Link先を確認	Mohammed Latif Siddiq, Joanna C. S. Santos, Ridwanul Hasan Tanvir, Noshin Ulfat, Fahmid Al Rifat, and Vinicius Carvalho Lopes	(参考訳) コード生成モデルは、コードコメント、既存のコード、または両方の組み合わせからプロンプトを受け取り、コードを生成する。コード生成モデル(github copilotなど)は、実際にはますます採用されているが、微調整なしでユニットテスト生成にうまく使えるかどうかは不明だ。我々は,このギャップを埋めるために3つの生成モデル(Codex, GPT-3.5-Turbo, StarCoder)がいかにうまくテストケースを生成するかを検討した。 HumanEval と Evosuite SF110 の2つのベンチマークを用いて,単体テスト生成プロセスにおけるコンテキスト生成の効果を検討した。モデルのコンパイル率,テストの正確性,カバレッジ,テストの臭いなどに基づいて評価した。 CodexモデルはHumanEvalデータセットの80%以上のカバレッジを達成したが、EvoSuite SF110ベンチマークの2%以上のカバレッジを持つモデルはない。生成されたテストは、Duplicated AssertsやEmpty Testsといったテストの臭いにも悩まされた。 A code generation model generates code by taking a prompt from a code comment, existing code, or a combination of both. Although code generation models (e.g. GitHub Copilot) are increasingly being adopted in practice, it is unclear whether they can successfully be used for unit test generation without fine-tuning. We investigated how well three generative models (Codex, GPT-3.5-Turbo, and StarCoder) can generate test cases to fill this gap. We used two benchmarks (HumanEval and Evosuite SF110) to investigate the context generation's effect in the unit test generation process. We evaluated the models based on compilation rates, test correctness, coverage, and test smells. We found that the Codex model achieved above 80% coverage for the HumanEval dataset, but no model had more than 2% coverage for the EvoSuite SF110 benchmark. The generated tests also suffered from test smells, such as Duplicated Asserts and Empty Tests.	翻訳日:2023-11-02 02:52:05 公開日:2023-10-30
# PUNR:ニュースレコメンデーションのためのユーザ行動モデリングによる事前学習 PUNR: Pre-training with User Behavior Modeling for News Recommendation ( http://arxiv.org/abs/2304.12633v2 ) ライセンス: Link先を確認	Guangyuan Ma, Hongtao Liu, Xing Wu, Wanhui Qian, Zhepeng Lv, Qing Yang, Songlin Hu	(参考訳) ニュースレコメンデーションは、ユーザーの行動に基づいてクリック行動を予測することを目的としている。ユーザの表現を効果的にモデル化する方法は、望ましいニュースを推奨するキーとなる。既存の作品は、主に監督された微調整段階の改善に焦点を当てている。しかし、ユーザ表現に最適化された PLM ベースの教師なし事前学習手法がまだ存在しない。本研究では,ユーザ行動マスキングとユーザ行動生成という2つのタスクを備えた教師なし事前学習パラダイムを提案する。まず,ユーザ行動マスキング事前学習タスクを導入し,その状況行動に基づいてマスキングユーザ行動の復元を行う。このようにして、このモデルはより強く、より包括的なユーザーニュースリーディングパターンを捉えることができる。さらに,ユーザエンコーダから派生したユーザ表現ベクトルを強化するために,新しいユーザ行動生成事前学習タスクを導入する。上記の事前学習したユーザモデリングエンコーダを用いて、下流の微調整でニュースやユーザ表現を得る。実世界のニュースベンチマークの評価では、既存のベースラインよりも大幅にパフォーマンスが向上している。 News recommendation aims to predict click behaviors based on user behaviors. How to effectively model the user representations is the key to recommending preferred news. Existing works are mostly focused on improvements in the supervised fine-tuning stage. However, there is still a lack of PLM-based unsupervised pre-training methods optimized for user representations. In this work, we propose an unsupervised pre-training paradigm with two tasks, i.e. user behavior masking and user behavior generation, both towards effective user behavior modeling. Firstly, we introduce the user behavior masking pre-training task to recover the masked user behaviors based on their contextual behaviors. In this way, the model could capture a much stronger and more comprehensive user news reading pattern. Besides, we incorporate a novel auxiliary user behavior generation pre-training task to enhance the user representation vector derived from the user encoder. We use the above pre-trained user modeling encoder to obtain news and user representations in downstream fine-tuning. Evaluations on the real-world news benchmark show significant performance improvements over existing baselines.	翻訳日:2023-11-02 02:51:48 公開日:2023-10-30
# 比較推論のための事前学習言語モデル Pre-training Language Models for Comparative Reasoning ( http://arxiv.org/abs/2305.14457v2 ) ライセンス: Link先を確認	Mengxia Yu, Zhihan Zhang, Wenhao Yu, Meng Jiang	(参考訳) 比較推論は、対象、概念または実体を比較して結論を引き出す過程であり、基本的な認知能力を構成する。本稿では,テキストに対する比較推論能力を高めるための,事前学習型言語モデルのための新しいフレームワークを提案する。比較推論を必要とするNLPタスクにはアプローチがあるが、コストのかかる手動データラベリングと、異なるタスクに対する限定的な一般化性に悩まされている。本手法では,構造化データと非構造化データの両方を活用する,テキストベースのエンティティ比較のためのスケーラブルなデータ収集手法を提案する。さらに, 比較推論に関する3つの新しい目的を通して, 事前学習言語モデルの枠組みを提案する。比較質問応答,質問生成,要約などの下流タスクの評価は,特に低リソース条件下で,我々の事前学習フレームワークが言語モデルの比較推論能力を大幅に向上させることを示す。この研究は、比較推論のための最初の統合ベンチマークもリリースしている。 Comparative reasoning is a process of comparing objects, concepts, or entities to draw conclusions, which constitutes a fundamental cognitive ability. In this paper, we propose a novel framework to pre-train language models for enhancing their abilities of comparative reasoning over texts. While there have been approaches for NLP tasks that require comparative reasoning, they suffer from costly manual data labeling and limited generalizability to different tasks. Our approach introduces a novel method of collecting scalable data for text-based entity comparison, which leverages both structured and unstructured data. Moreover, we present a framework of pre-training language models via three novel objectives on comparative reasoning. Evaluation on downstream tasks including comparative question answering, question generation, and summarization shows that our pre-training framework significantly improves the comparative reasoning abilities of language models, especially under low-resource conditions. This work also releases the first integrated benchmark for comparative reasoning.	翻訳日:2023-11-02 02:45:02 公開日:2023-10-30
# 確率時空間ダイナミクスのための同変ニューラルシミュレータ Equivariant Neural Simulators for Stochastic Spatiotemporal Dynamics ( http://arxiv.org/abs/2305.14286v2 ) ライセンス: Link先を確認	Koen Minartz, Yoeri Poels, Simon Koop, Vlado Menkovski	(参考訳) ニューラルネットワークは、高次元力学系のスケーラブルなデータ駆動シミュレーションのツールとして、特に数値解法が実現不可能あるいは計算コストが高い環境で登場している。特に、決定論的ニューラルネットワークシミュレータにドメイン対称性を組み込むことで、精度、サンプル効率、パラメータ効率を大幅に改善できることが示されている。しかし、確率的現象をシミュレートできる確率的神経シミュレータに対称性を組み込むには、同変関数近似ではなく、軌道上の同変分布を生成するモデルが必要である。本稿では,同変分布の自己回帰的確率論的モデリングの枠組みであるEquivariant Probabilistic Neural Simulation (EPNS)を提案する。我々はepnsを用いて確率的n体系と確率的細胞動力学のモデルを設計する。実験の結果,EPNSは既存のニューラルネットワークを用いた確率的シミュレーション法よりもかなり優れていた。具体的には,epnに等価性を導入することで,シミュレーション品質,データ効率,ロールアウト安定性,不確実性定量化が向上することを示す。 EPNSは様々な領域における効率的なデータ駆動確率シミュレーションのための有望な手法である。 Neural networks are emerging as a tool for scalable data-driven simulation of high-dimensional dynamical systems, especially in settings where numerical methods are infeasible or computationally expensive. Notably, it has been shown that incorporating domain symmetries in deterministic neural simulators can substantially improve their accuracy, sample efficiency, and parameter efficiency. However, to incorporate symmetries in probabilistic neural simulators that can simulate stochastic phenomena, we need a model that produces equivariant distributions over trajectories, rather than equivariant function approximations. In this paper, we propose Equivariant Probabilistic Neural Simulation (EPNS), a framework for autoregressive probabilistic modeling of equivariant distributions over system evolutions. We use EPNS to design models for a stochastic n-body system and stochastic cellular dynamics. Our results show that EPNS considerably outperforms existing neural network-based methods for probabilistic simulation. More specifically, we demonstrate that incorporating equivariance in EPNS improves simulation quality, data efficiency, rollout stability, and uncertainty quantification. 
We conclude that EPNS is a promising method for efficient and effective data-driven probabilistic simulation in a diverse range of domains.	翻訳日:2023-11-02 02:44:47 公開日:2023-10-30
# プレゼンテーションバイアス下におけるマルチモーダル学習の反事実強化 Counterfactual Augmentation for Multimodal Learning Under Presentation Bias ( http://arxiv.org/abs/2305.14083v2 ) ライセンス: Link先を確認	Victoria Lin, Louis-Philippe Morency, Dimitrios Dimitriadis, Srinagesh Sharma	(参考訳) 現実世界の機械学習システムでは、ラベルはシステムが奨励したいユーザー行動に由来することが多い。時間とともに、新しいモデルは新しいトレーニング例と機能が利用可能になるようにトレーニングされなければなりません。しかし、ユーザーとモデルの間のフィードバックループは将来のユーザの振る舞いをバイアスし、新しいモデルをトレーニングする能力を損なうラベルにプレゼンテーションバイアスを引き起こす。本稿では,生成したデファクトラベルを用いて提示バイアスを補正する新しい因果的手法である,デファクト拡張を提案する。実証実験により,非補正モデルと既存バイアス補正手法の双方と比較して,デファクト改善により下流性能が向上することが示された。モデル分析はさらに、生成された偽物はオラクルの設定において真の偽物と密接に一致していることを示している。 In real-world machine learning systems, labels are often derived from user behaviors that the system wishes to encourage. Over time, new models must be trained as new training examples and features become available. However, feedback loops between users and models can bias future user behavior, inducing a presentation bias in the labels that compromises the ability to train new models. In this paper, we propose counterfactual augmentation, a novel causal method for correcting presentation bias using generated counterfactual labels. Our empirical evaluations demonstrate that counterfactual augmentation yields better downstream performance compared to both uncorrected models and existing bias-correction methods. Model analyses further indicate that the generated counterfactuals align closely with true counterfactuals in an oracle setting.	翻訳日:2023-11-02 02:44:28 公開日:2023-10-30
# snekhorn:対称エントロピーアフィニティによる次元縮小 SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities ( http://arxiv.org/abs/2305.13797v2 ) ライセンス: Link先を確認	Hugues Van Assel, Titouan Vayer, R\'emi Flamary, Nicolas Courty	(参考訳) 機械学習における多くのアプローチは、データセットのサンプル間の類似性を符号化する重み付きグラフに依存している。ポピュラー次元還元 (dr) アルゴリズム t-sne で特に用いられるエントロピーアフィニティ (eas) は、そのようなグラフの具体例である。不均質なサンプリング密度に対するロバスト性を確保するため、easは各サンプルにカーネル帯域幅パラメータを割り当て、親和性行列の各行のエントロピーが、指数関数がパープレキシティとして知られる特定の値で一定に保たれるようにした。 EAは本質的に非対称で行ワイド確率であるが、行ワイドなエントロピーと確率性の両方に反するヒューリスティックな対称性の手法を実行した後、DRアプローチで使用される。本研究では,最適な輸送問題としてのEAの新たな特徴を明らかにし,二重昇華を用いて効率的に計算できる自然な対称性を実現する。対応する新規親和性行列は、クラスタリング性能の点で対称確率正規化の利点を生かし、また各行のエントロピーを効果的に制御することにより、ノイズレベルの変化に対して特に堅牢である。次に,この新しい親和性行列を利用した新しいdrアルゴリズムsnekhornを提案する。我々は,合成データと実世界のデータの両方についていくつかの指標を用いて,最先端のアプローチよりも明らかに優れていることを示す。 Many approaches in machine learning rely on a weighted graph to encode the similarities between samples in a dataset. Entropic affinities (EAs), which are notably used in the popular Dimensionality Reduction (DR) algorithm t-SNE, are particular instances of such graphs. To ensure robustness to heterogeneous sampling densities, EAs assign a kernel bandwidth parameter to every sample in such a way that the entropy of each row in the affinity matrix is kept constant at a specific value, whose exponential is known as perplexity. EAs are inherently asymmetric and row-wise stochastic, but they are used in DR approaches after undergoing heuristic symmetrization methods that violate both the row-wise constant entropy and stochasticity properties. In this work, we uncover a novel characterization of EA as an optimal transport problem, allowing a natural symmetrization that can be computed efficiently using dual ascent. The corresponding novel affinity matrix derives advantages from symmetric doubly stochastic normalization in terms of clustering performance, while also effectively controlling the entropy of each row thus making it particularly robust to varying noise levels. 
We then present a new DR algorithm, SNEkhorn, that leverages this new affinity matrix. We show its clear superiority to state-of-the-art approaches with several indicators on both synthetic and real-world datasets.	翻訳日:2023-11-02 02:43:52 公開日:2023-10-30
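The SNEkhorn abstract above describes entropic affinities as assigning each sample a kernel bandwidth so that the entropy of its affinity row matches the logarithm of a chosen perplexity. The sketch below illustrates only that classic, asymmetric row-wise calibration (as in t-SNE), not the symmetric optimal-transport version the paper proposes. The function names are hypothetical, and it assumes 1 < perplexity < number of neighbours and a unique nearest neighbour.

```python
import math

def row_entropy(sq_dists, beta):
    """Shannon entropy (in nats) of p_j proportional to exp(-beta * d_j)."""
    m = min(sq_dists)  # shift distances for numerical stability
    w = [math.exp(-beta * (d - m)) for d in sq_dists]
    z = sum(w)
    return -sum((x / z) * math.log(x / z) for x in w if x > 0)

def fit_beta(sq_dists, perplexity, iters=100):
    """Bisection on the precision beta so that exp(entropy) equals perplexity."""
    target = math.log(perplexity)
    lo, hi = 0.0, 1.0
    # Entropy decreases as beta grows; enlarge hi until it undershoots target.
    while row_entropy(sq_dists, hi) > target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if row_entropy(sq_dists, mid) > target:
            lo = mid  # distribution still too flat: increase beta
        else:
            hi = mid  # distribution too peaked: decrease beta
    return 0.5 * (lo + hi)
```

Running `fit_beta` independently per sample yields row-stochastic but asymmetric affinities, which is the starting point the abstract says is then symmetrized heuristically in existing DR methods.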
# 知覚テスト:マルチモーダルビデオモデルの診断ベンチマーク Perception Test: A Diagnostic Benchmark for Multimodal Video Models ( http://arxiv.org/abs/2305.13786v2 ) ライセンス: Link先を確認	Viorica P\u{a}tr\u{a}ucean, Lucas Smaira, Ankush Gupta, Adri\`a Recasens Continente, Larisa Markeeva, Dylan Banarse, Skanda Koppula, Joseph Heyward, Mateusz Malinowski, Yi Yang, Carl Doersch, Tatiana Matejovicova, Yury Sulsky, Antoine Miech, Alex Frechette, Hanna Klimczak, Raphael Koster, Junlin Zhang, Stephanie Winkler, Yusuf Aytar, Simon Osindero, Dima Damen, Andrew Zisserman, Jo\~ao Carreira	(参考訳) 本研究では,事前学習したマルチモーダルモデル(Flamingo,SeViLA,GPT-4)の知覚と推論能力を評価するために,新しいマルチモーダルビデオベンチマークである知覚テストを提案する。計算タスク(例えば分類、検出、追跡)に焦点を当てた既存のベンチマークと比較すると、知覚テストは、ビデオ、音声、テキストのモダリティにまたがるスキル(記憶、抽象、物理学、意味論)と推論の種類(記述、説明、予測、反事実)に焦点を当て、包括的で効率的な評価ツールを提供する。このベンチマークは、ゼロショット/少数ショットまたは限定的な微調整方式で、転送機能の事前訓練されたモデルを探索する。これらの目的のために、知覚テストでは、世界中の約100人の参加者によって撮影された知覚的に興味深い状況を示すために設計された、平均23秒の11.6kの現実世界ビデオが導入されている。ビデオには6種類のラベル(マルチチョイスと接地ビデオ、オブジェクトとポイントトラック、テンポラルアクションとサウンドセグメント)が密にアノテートされており、言語と非言語の両方の評価を可能にする。ベンチマークの微調整とバリデーションの分割(cc-by license)は、保持テストの分割を備えたチャレンジサーバに加えて、公開されている(cc-by license)。最先端のビデオqaモデルと比較した人間のベースラインの結果は、パフォーマンスの実質的な差(91.4%対46.2%)を示し、マルチモーダルビデオ理解の改善の余地があることを示唆している。 dataset、baseline code、challenge serverはhttps://github.com/deepmind/perception_testで利用可能である。 We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, SeViLA, or GPT-4). Compared to existing benchmarks that focus on computational tasks (e.g. classification, detection or tracking), the Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities, to provide a comprehensive and efficient evaluation tool. The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime. 
For these purposes, the Perception Test introduces 11.6k real-world videos, 23s average length, designed to show perceptually interesting situations, filmed by around 100 participants worldwide. The videos are densely annotated with six types of labels (multiple-choice and grounded video question-answers, object and point tracks, temporal action and sound segments), enabling both language and non-language evaluations. The fine-tuning and validation splits of the benchmark are publicly available (CC-BY license), in addition to a challenge server with a held-out test split. Human baseline results compared to state-of-the-art video QA models show a substantial gap in performance (91.4% vs 46.2%), suggesting that there is significant room for improvement in multimodal video understanding. Dataset, baseline code, and challenge server are available at https://github.com/deepmind/perception_test	翻訳日:2023-11-02 02:43:26 公開日:2023-10-30
# 双方向デコードのためのフレームワーク:形態的インフレクションのケーススタディ A Framework for Bidirectional Decoding: Case Study in Morphological Inflection ( http://arxiv.org/abs/2305.12580v2 ) ライセンス: Link先を確認	Marc E. Canby and Julia Hockenmaier	(参考訳) 左右方向の出力を生成するトランスフォーマベースのエンコーダ-デコーダモデルがシーケンス-シーケンスタスクの標準となっている。本稿では,"outside-in"からシーケンスを生成するデコードのためのフレームワークを提案する。各ステップにおいて,モデルが左,右,あるいは左,右のシーケンスに結合するトークンを生成するように選択する。これは従来の双方向デコーダよりも原則的だと主張する。本提案は,様々なモデルアーキテクチャをサポートし,潜在順序変数を辺化する動的プログラミングアルゴリズムなど,いくつかのトレーニング手法を含む。提案手法は2022年と2023年の共有タスクに最先端(sota)をセットし,次のシステムでは平均精度4.7ポイント,2.7ポイントをそれぞれ上回った。このモデルは長いシーケンスで特にうまく動作し、stemとaffixからなる単語のスプリットポイントを暗黙的に学習でき、ユニークな補題が少ないデータセットのベースラインよりもパフォーマンスが良い(ただし補題ごとに多くの例がある)。 Transformer-based encoder-decoder models that generate outputs in a left-to-right fashion have become standard for sequence-to-sequence tasks. In this paper, we propose a framework for decoding that produces sequences from the "outside-in": at each step, the model chooses to generate a token on the left, on the right, or join the left and right sequences. We argue that this is more principled than prior bidirectional decoders. Our proposal supports a variety of model architectures and includes several training methods, such as a dynamic programming algorithm that marginalizes out the latent ordering variable. Our model sets state-of-the-art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy respectively. The model performs particularly well on long sequences, can implicitly learn the split point of words composed of stem and affix, and performs better relative to the baseline on datasets that have fewer unique lemmas (but more examples per lemma).	翻訳日:2023-11-02 02:42:40 公開日:2023-10-30
# ReLUネットワークの多相最適化ダイナミクスとリッチ非線形挙動の理解 Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks ( http://arxiv.org/abs/2305.12467v4 ) ライセンス: Link先を確認	Mingze Wang, Chao Ma	(参考訳) ReLUニューラルネットワークのトレーニングプロセスはしばしば複雑な非線形現象を示す。モデルの非線形性と損失の非凸性は理論解析に重大な課題をもたらす。したがって、ニューラルネットワークの最適化力学に関するこれまでの理論研究は、局所解析(訓練終了など)や近似線形モデル(ニューラル・タンジェント・カーネルなど)に重点を置いていた。本研究では, 線形分離可能なデータに基づいて, グラディエントフローにより学習した2層ReLUネットワークの学習過程を理論的に解析する。この特定の環境では、ランダム初期化から最終収束までの最適化過程全体を解析する。研究した比較的単純なモデルとデータにもかかわらず、学習プロセス全体とは4つの異なるフェーズがあることがわかりました。特定の非線形挙動は、初期凝縮、サドル・トゥ・プラトー力学、プラトーエスケープ、活性化パターンの変化、複雑さの増加による学習など、理論的に正確に識別・捕獲することができる。 The training process of ReLU neural networks often exhibits complicated nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose significant challenges for theoretical analysis. Therefore, most previous theoretical works on the optimization dynamics of neural networks focus either on local analysis (like the end of training) or approximate linear models (like Neural Tangent Kernel). In this work, we conduct a complete theoretical characterization of the training process of a two-layer ReLU network trained by Gradient Flow on a linearly separable data. In this specific setting, our analysis captures the whole optimization process starting from random initialization to final convergence. Despite the relatively simple model and data that we studied, we reveal four different phases from the whole training process showing a general simplifying-to-complicating learning trend. Specific nonlinear behaviors can also be precisely identified and captured theoretically, such as initial condensation, saddle-to-plateau dynamics, plateau escape, changes of activation patterns, learning with increasing complexity, etc.	翻訳日:2023-11-02 02:42:21 公開日:2023-10-30
# ReTAG: 推論を考慮した表から分析的テキストへの生成 ReTAG: Reasoning Aware Table to Analytic Text Generation ( http://arxiv.org/abs/2305.11826v2 ) ライセンス: Link先を確認	Deepanway Ghosal and Preksha Nema and Aravindan Raghuveer	(参考訳) 表要約のタスクは、表全体または表内でハイライトされた特定のセル集合を簡潔かつ正確に表すテキストを生成することである。表からテキストへの生成技術は大きく進歩したが、モデルは依然として、表に含まれる情報を文で繰り返すだけの記述的な要約を生成することが多い。一般的な表からテキストへのベンチマーク(ToTTo (Parikh et al., 2020)およびInfoTabs (Gupta et al., 2020))の分析を通じて、理想的な要約を生成するには、複数の種類の推論と、表の範囲を超えた知識へのアクセスが必要であることを観察した。このギャップに対処するために,ベクトル量子化を用いて様々な種類の分析的推論を出力に注入する、表と推論を考慮したモデルReTAGを提案する。 ReTAGは、表からテキストへの生成タスクにおいて、ToTToとInfoTabsの関連するスライスでPARENTメトリックを最先端のベースラインより2.2%、2.9%改善する。人間による評価により、ReTAGの出力は、強力な表認識モデルに比べて最大12%忠実で分析的であることがわかった。我々の知る限り、ReTAGは構造を考慮したsequence-to-sequenceモデルの中で複数の推論手法を制御可能な形で利用し、複数の表からテキストへのタスクで最先端の性能を上回る最初のモデルである。我々は、ToTToおよびInfoTabsデータセットを各参照文で用いられる推論カテゴリで拡張し、35.6Kの分析的インスタンスと55.9kの記述的インスタンスをオープンソース化する。 The task of table summarization involves generating text that both succinctly and accurately represents the table or a specific set of highlighted cells within a table. While significant progress has been made in table-to-text generation techniques, models still mostly generate descriptive summaries, which reiterate the information contained within the table in sentences. Through analysis of popular table-to-text benchmarks (ToTTo (Parikh et al., 2020) and InfoTabs (Gupta et al., 2020)), we observe that in order to generate the ideal summary, multiple types of reasoning are needed, coupled with access to knowledge beyond the scope of the table. To address this gap, we propose ReTAG, a table- and reasoning-aware model that uses vector quantization to infuse different types of analytical reasoning into the output. ReTAG achieves 2.2%, 2.9% improvement on the PARENT metric in the relevant slice of ToTTo and InfoTabs for the table-to-text generation task over state-of-the-art baselines. 
Through human evaluation, we observe that output from ReTAG is up to 12% more faithful and analytical compared to a strong table-aware model. To the best of our knowledge, ReTAG is the first model that can controllably use multiple reasoning methods within a structure-aware sequence-to-sequence model to surpass state-of-the-art performance in multiple table-to-text tasks. We extend the ToTTo and InfoTabs datasets with the reasoning categories used in each reference sentence, and open-source 35.6K analytical and 55.9k descriptive instances.	翻訳日:2023-11-02 02:41:43 公開日:2023-10-30
# ToolkenGPT: ツール埋め込みによる大量ツールによる凍結言語モデルの拡張 ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings ( http://arxiv.org/abs/2305.11554v3 ) ライセンス: Link先を確認	Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu	(参考訳) 大規模言語モデル(LLM)を外部ツールで拡張することは、複雑な問題を解決するための有望なアプローチとして現れている。しかし、ツールのデモデータでLLMを微調整する従来の手法は、コストが高く、事前定義されたツールセットに制限される可能性がある。最近のインコンテキスト学習パラダイムはこれらの問題を緩和するが、コンテキスト長の制限により少数のデモしか与えられず、ツールの理解が最適とは言えないものになる。さらに、選択可能なツールが多数ある場合、インコンテキスト学習は全く機能しない可能性がある。本稿では,両者の利点を組み合わせた代替手法として$\textbf{ToolkenGPT}$を提案する。我々のアプローチは、各$\underline{tool}$をto$\underline{ken}$ ($\textit{toolken}$)として表現してその埋め込みを学習し、通常の単語トークンを生成するのと同じ方法でツール呼び出しを可能にする。toolkenが起動されると、LLMはツールを実行するための引数を補完するように促される。 ToolkenGPTは、toolkenの集合をオンザフライで拡張することで、任意の数のツールをプラグインできる柔軟性を提供する。さらに、toolken埋め込みの学習に大量のデモデータを利用できるため、ツールの使用が改善される。数値推論,知識に基づく質問応答,具体化された計画生成など多様な領域において,我々のアプローチはLLMをツールで効果的に拡張し,最新の各種ベースラインを大幅に上回る。 ToolkenGPTは、複雑なシナリオにおいて、大規模なツールセットから関連するツールを使用する有望な能力を示す。 Augmenting large language models (LLMs) with external tools has emerged as a promising approach to solving complex problems. However, traditional methods, which finetune LLMs with tool demonstration data, can be both costly and restricted to a predefined set of tools. The recent in-context learning paradigm alleviates these issues, but the limited context length only allows for a few shots of demonstrations, leading to suboptimal understandings of the tools. Moreover, when there are numerous tools to choose from, in-context learning could completely fail to work. In this paper, we propose an alternative approach, $\textbf{ToolkenGPT}$, which combines the benefits of both sides. Our approach represents each $\underline{tool}$ as a to$\underline{ken}$ ($\textit{toolken}$) and learns an embedding for it, enabling tool calls in the same way as generating a regular word token. Once a toolken is triggered, the LLM is prompted to complete arguments for the tool to execute. 
ToolkenGPT offers the flexibility to plug in an arbitrary number of tools by expanding the set of toolkens on the fly. In addition, it improves tool use by allowing extensive demonstration data for learning the toolken embeddings. In diverse domains, including numerical reasoning, knowledge-based question answering, and embodied plan generation, our approach effectively augments LLMs with tools and substantially outperforms various latest baselines. ToolkenGPT demonstrates the promising ability to use relevant tools from a large tool set in complex scenarios.	翻訳日:2023-11-02 02:41:15 公開日:2023-10-30
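The ToolkenGPT abstract above describes representing each tool as an extra token with its own learned embedding, so that calling a tool looks like predicting an ordinary next token. The sketch below shows one way such a combined prediction step could look. All names (`next_token`, the `<add>` and `<mul>` toolkens, the toy embeddings) are hypothetical, and scoring by a plain dot product against the hidden state is an assumption, not a description of the released code.

```python
def next_token(hidden, word_emb, toolken_emb):
    """Pick the highest-scoring token among word tokens and toolkens.

    `word_emb` stands for the frozen output embeddings of the language
    model and `toolken_emb` for the separately learned tool embeddings
    (both are dicts mapping token name to vector).
    """
    # Toolkens are simply appended to the candidate vocabulary.
    candidates = {**word_emb, **toolken_emb}
    scores = {
        name: sum(h * e for h, e in zip(hidden, emb))
        for name, emb in candidates.items()
    }
    best = max(scores, key=scores.get)
    # The flag tells the caller to switch to argument completion for the tool.
    return best, best in toolken_emb
```

Because the word embeddings are untouched, registering a new tool in this sketch only means adding one entry to `toolken_emb`, which mirrors the abstract's point about expanding the set of toolkens on the fly.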
# TextDiffuser: テキストペイントとしての拡散モデル TextDiffuser: Diffusion Models as Text Painters ( http://arxiv.org/abs/2305.10855v5 ) ライセンス: Link先を確認	Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei	(参考訳) 拡散モデルは印象的な生成能力で注目を集めているが、現在は正確で一貫性のあるテキストのレンダリングに苦戦している。この問題に対処するために,テキストディフューザを導入し,背景に忠実な視覚的魅力のあるテキストによる画像生成に焦点を当てた。 TextDiffuserは、まず、Transformerモデルがテキストプロンプトから抽出されたキーワードのレイアウトを生成し、次に拡散モデルがテキストプロンプトと生成されたレイアウトに条件付き画像を生成する。さらに,文字認識や検出,文字レベルのセグメンテーションアノテーションを含む1000万のイメージテキストペアを含む,ocrアノテーションを備えた最初の大規模テキストイメージデータセットであるmario-10mをコントリビュートする。我々はさらにMARIO-Evalベンチマークを収集し、テキストのレンダリング品質を評価する包括的なツールとして機能する。実験とユーザスタディにより,テキストプロンプトだけで高品質なテキスト画像を作成し,テキストテンプレート画像と併用し,不完全な画像の再構築を行う,柔軟性と制御性を示す。コード、モデル、データセットは \url{https://aka.ms/textdiffuser} で入手できる。 Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds. TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from text prompts, and then diffusion models generate images conditioned on the text prompt and the generated layout. Additionally, we contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs with text recognition, detection, and character-level segmentation annotations. We further collect the MARIO-Eval benchmark to serve as a comprehensive tool for evaluating text rendering quality. Through experiments and user studies, we show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text. The code, model, and dataset will be available at \url{https://aka.ms/textdiffuser}.	
翻訳日:2023-11-02 02:40:45 公開日:2023-10-30
# 量子測定の結果に依存する金融クレームの評価 Valuation of a Financial Claim Contingent on the Outcome of a Quantum Measurement ( http://arxiv.org/abs/2305.10239v2 ) ライセンス: Link先を確認	Lane P. Hughston and Leandro S\'anchez-Betancourt	(参考訳) 時刻$0$に金融契約を結ぶ合理的なエージェントを考える。その支払いは、ある時刻$T>0$における量子測定によって決定される。量子系の状態は、ハイゼンベルク表現において既知の密度行列$\hat p$で与えられる。エージェントは、そのような契約を結ぶために時刻$0$でいくら支払う意思があるだろうか? 有限次元ヒルベルト空間の場合、このような各クレームは観測可能量$\hat X_T$で表され、$\hat X_T$の固有値は、測定で対応する結果が得られたときに支払われる金額を決定する。妥当な公理の下で、零空間に関して物理的状態$\hat p$と同値な価格状態$\hat q$が存在し、価格関数$\Pi_{0T}$が任意のクレーム$\hat X_T$に対して$\Pi_{0T}(\hat X_T) = P_{0T}\,{\rm tr} ( \hat q \hat X_T) $の形をとることを証明する。ここで$P_{0T}$は1期間の割引係数である。「同値」とは、$\hat p$と$\hat q$が同じ零空間を共有することを意味する。すなわち、任意の$|\xi \rangle \in \mathcal H$に対して、$\langle \bar \xi | \hat p | \xi \rangle = 0$であることと$\langle \bar \xi | \hat q | \xi \rangle = 0$であることは同値である。ある種の最適化問題のクラスを導入し,与えられた測定に基づくクレームに対する最適な契約支払構造を求める。次に,このような設定におけるコッヘン・シュペッカーの定理の意味を考察し,そのような契約のポートフォリオを構成する問題を検討する。最後に,複数期間の契約について考察する。 We consider a rational agent who at time $0$ enters into a financial contract for which the payout is determined by a quantum measurement at some time $T>0$. The state of the quantum system is given in the Heisenberg representation by a known density matrix $\hat p$. How much will the agent be willing to pay at time $0$ to enter into such a contract? In the case of a finite dimensional Hilbert space, each such claim is represented by an observable $\hat X_T$ where the eigenvalues of $\hat X_T$ determine the amount paid if the corresponding outcome is obtained in the measurement. We prove, under reasonable axioms, that there exists a pricing state $\hat q$ which is equivalent to the physical state $\hat p$ on null spaces such that the pricing function $\Pi_{0T}$ takes the form $\Pi_{0T}(\hat X_T) = P_{0T}\,{\rm tr} ( \hat q \hat X_T) $ for any claim $\hat X_T$, where $P_{0T}$ is the one-period discount factor. 
By "equivalent" we mean that $\hat p$ and $\hat q$ share the same null space: thus, for any $|\xi \rangle \in \mathcal H$ one has $\langle \bar \xi | \hat p | \xi \rangle = 0$ if and only if $\langle \bar \xi | \hat q | \xi \rangle = 0$. We introduce a class of optimization problems and solve for the optimal contract payout structure for a claim based on a given measurement. Then we consider the implications of the Kochen-Specker theorem in such a setting and we look at the problem of forming portfolios of such contracts. Finally, we consider multi-period contracts.	翻訳日:2023-11-02 02:40:22 公開日:2023-10-30
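The abstract above gives the pricing rule $\Pi_{0T}(\hat X_T) = P_{0T}\,{\rm tr}(\hat q \hat X_T)$. A small numerical sketch can make the formula concrete. The two-outcome system, the pricing state, the payouts and the discount factor below are all invented for illustration, and the helper names are hypothetical; real-valued matrices are assumed for simplicity.

```python
def trace_of_product(a, b):
    """tr(a b) for square matrices given as nested lists of real numbers."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))

def price(discount, q, x):
    """Pricing rule from the abstract: P_0T * tr(q X_T)."""
    return discount * trace_of_product(q, x)

# Illustrative two-outcome example (all numbers are assumptions):
q_hat = [[0.25, 0.0], [0.0, 0.75]]   # pricing state, trace 1
x_hat = [[100.0, 0.0], [0.0, 20.0]]  # claim paying 100 or 20
value = price(0.95, q_hat, x_hat)    # 0.95 * (0.25*100 + 0.75*20)
```

When the pricing state and the claim are diagonal in the same basis, as here, the trace reduces to a discounted weighted average of the payouts, with the weights supplied by the pricing state rather than by the physical state.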
# 自然言語におけるグラフ問題の解ける言語モデル Can Language Models Solve Graph Problems in Natural Language? ( http://arxiv.org/abs/2305.10037v2 ) ライセンス: Link先を確認	Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, Yulia Tsvetkov	(参考訳) 大規模言語モデル(LLM)は、ロボット工学の計画、マルチホップ質問応答や知識探索、構造化コモンセンス推論など、暗黙のグラフィカルな構造を持つ様々なタスクに採用されている。 LLMは、これらのタスクの最先端を構造的含意で進めてきたが、LLMがグラフや構造のテキスト記述を明示的に処理し、それらを接地された概念空間にマッピングし、構造化された操作を行うことができるかどうかはまだ未定である。この目的のために,自然言語で設計したグラフ型問題解決の総合ベンチマークであるnlgraph(natural language graph)を提案する。 NLGraphには29,370の問題が含まれており、接続や最短経路といった単純なタスクから、最大フローやグラフニューラルネットワークのシミュレーションといった複雑な問題まで、複雑な8つのグラフ推論タスクをカバーする。 llms (gpt-3/4) をnlgraphベンチマーク上で様々なプロンプトアプローチで評価し,それを見出す。 1)言語モデルは予備的グラフ推論能力を示す。 2)高度なプロンプトとインコンテキスト学習の利点は,より複雑なグラフ問題において減少する。 3) LLMは, グラフや問題設定の急激な相関に直面すると, 当然脆弱である。次に,自然言語グラフ問題を解決するための2つの命令に基づく手法である build-a-graph prompting と algorithmic prompting を提案する。ビルド・ア・グラフとアルゴリズムは、複数のタスクや設定において、NLGraph上のLLMのパフォーマンスを3.07%から16.85%向上させる一方で、言語モデルを用いたセットアップにおいて最も複雑なグラフ推論タスクをどう解決するかは、オープンな研究課題である。 NLGraphベンチマークと評価コードはhttps://github.com/Arthur-Heng/NLGraphで公開されている。 Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures, such as planning in robotics, multi-hop question answering or knowledge probing, structured commonsense reasoning, and more. While LLMs have advanced the state-of-the-art on these tasks with structure implications, whether LLMs could explicitly process textual descriptions of graphs and structures, map them to grounded conceptual spaces, and perform structured operations remains underexplored. To this end, we propose NLGraph (Natural Language Graph), a comprehensive benchmark of graph-based problem solving designed in natural language. NLGraph contains 29,370 problems, covering eight graph reasoning tasks with varying complexity from simple tasks such as connectivity and shortest path up to complex problems such as maximum flow and simulating graph neural networks. 
We evaluate LLMs (GPT-3/4) with various prompting approaches on the NLGraph benchmark and find that 1) language models do demonstrate preliminary graph reasoning abilities, 2) the benefit of advanced prompting and in-context learning diminishes on more complex graph problems, while 3) LLMs are also (un)surprisingly brittle in the face of spurious correlations in graph and problem settings. We then propose Build-a-Graph Prompting and Algorithmic Prompting, two instruction-based approaches to enhance LLMs in solving natural language graph problems. Build-a-Graph and Algorithmic prompting improve the performance of LLMs on NLGraph by 3.07% to 16.85% across multiple tasks and settings, while how to solve the most complicated graph reasoning tasks in our setup with language models remains an open research question. The NLGraph benchmark and evaluation code are available at https://github.com/Arthur-Heng/NLGraph.	翻訳日:2023-11-02 02:39:21 公開日:2023-10-30
# 幾何学的ペナルティの有界性を保証するリーマンmin-max最適化の高速化手法 Accelerated Methods for Riemannian Min-Max Optimization Ensuring Bounded Geometric Penalties ( http://arxiv.org/abs/2305.16186v2 ) ライセンス: Link先を確認	David Mart\'inez-Rubio, Christophe Roux, Christopher Criscitiello, Sebastian Pokutta	(参考訳) 本研究では,$\min_x \max_y f(x, y)$という形式の最適化問題を研究する。ここで$f(x, y)$は積リーマン多様体$\mathcal{M} \times \mathcal{N}$上で定義され,$\mu_x, \mu_y \geq 0$に対して,$x$について$\mu_x$-強測地的凸(g-convex),$y$について$\mu_y$-強g-凹である。$f$が$(L_x, L_y, L_{xy})$-平滑で,$\mathcal{M}$と$\mathcal{N}$がアダマール多様体である場合の高速化手法を設計する。そのために,独立した興味の対象となる新しいg-凸最適化の結果を導入する。すなわち,距離射影付きリーマン勾配降下法の大域的線形収束を示し,幾何学的定数を減らすことで既存の高速化手法を改善する。さらに,リーマンmin-maxの場合に適用される2つの先行研究について,反復点があらかじめ指定されたコンパクト集合に留まるという仮定を取り除くことで,その解析を完成させる。 In this work, we study optimization problems of the form $\min_x \max_y f(x, y)$, where $f(x, y)$ is defined on a product Riemannian manifold $\mathcal{M} \times \mathcal{N}$ and is $\mu_x$-strongly geodesically convex (g-convex) in $x$ and $\mu_y$-strongly g-concave in $y$, for $\mu_x, \mu_y \geq 0$. We design accelerated methods when $f$ is $(L_x, L_y, L_{xy})$-smooth and $\mathcal{M}$, $\mathcal{N}$ are Hadamard. To that end, we introduce new g-convex optimization results, of independent interest: we show global linear convergence for metric-projected Riemannian gradient descent and improve existing accelerated methods by reducing geometric constants. Additionally, we complete the analysis of two previous works applying to the Riemannian min-max case by removing an assumption about iterates staying in a pre-specified compact set.	翻訳日:2023-11-02 02:31:55 公開日:2023-10-30
# 競合間の戦略的データ共有 Strategic Data Sharing between Competitors ( http://arxiv.org/abs/2305.16052v3 ) ライセンス: Link先を確認	Nikita Tsoy and Nikola Konstantinov	(参考訳) 協調学習技術は近年大きく進歩し、複数の組織にまたがってプライベートモデルトレーニングを可能にしている。この機会にもかかわらず、競合他社とのデータ共有を考えると、企業はジレンマに直面する。コラボレーションは企業の機械学習モデルを改善することができるが、競合他社に利益をもたらし、利益を減少させる可能性がある。本稿では,このデータ共有トレードオフを分析するための汎用フレームワークを提案する。フレームワークは3つのコンポーネントで構成されており、それぞれ、企業の生産決定、モデル品質に対する追加データの影響、データ共有交渉プロセスである。次に,従来の経済理論に基づく市場モデルに基づく枠組みのインスタンス化を行い,協調的インセンティブに影響を与える重要な要因を明らかにする。その結果,市場条件がデータ共有インセンティブに与える影響が示唆された。特に、企業の製品間の類似性や、難しい学習タスクがコラボレーションを促進するという点で、競争が減少していることが分かりました。 Collaborative learning techniques have significantly advanced in recent years, enabling private model training across multiple organizations. Despite this opportunity, firms face a dilemma when considering data sharing with competitors -- while collaboration can improve a company's machine learning model, it may also benefit competitors and hence reduce profits. In this work, we introduce a general framework for analyzing this data-sharing trade-off. The framework consists of three components, representing the firms' production decisions, the effect of additional data on model quality, and the data-sharing negotiation process, respectively. We then study an instantiation of the framework, based on a conventional market model from economic theory, to identify key factors that affect collaboration incentives. Our findings indicate a profound impact of market conditions on the data-sharing incentives. In particular, we find that reduced competition, in terms of the similarities between the firms' products, and harder learning tasks foster collaboration.	翻訳日:2023-11-02 02:31:22 公開日:2023-10-30
# Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models ( http://arxiv.org/abs/2305.15618v2 ) ライセンス: Link先を確認	Zhong Yi Wan, Ricardo Baptista, Yi-fan Chen, John Anderson, Anudhyan Boral, Fei Sha, Leonardo Zepeda-Núñez	(参考訳) 非ペアデータを用いた統計的ダウンスケーリングのための2段階確率的フレームワークを提案する。統計的ダウンスケーリングは、低分解能データを偏りの粗い数値スキームから高忠実度スキームに整合した高分解能データに変換する確率写像を求める。私たちのフレームワークは、2つのトランスフォーメーションを構成することによってこの問題に取り組みます。 (i)最適輸送写像によるバイアス除去のステップ、及び (ii)事後条件サンプリングを用いた確率拡散モデルによって達成されるアップサンプリングステップ。このアプローチは、ペアデータを必要としない条件分布を特徴付け、バイアスサンプルから関連する物理統計を忠実に復元する。本研究では, 気象・気候の数値シミュレーションにおける中核的な問題である1次元および2次元流体流問題に対する提案手法の有用性を実証する。提案手法は,8倍,16倍の解像度をアップサンプリングすることで,低解像度入力からリアルな高解像度出力を生成する。さらに,本手法は,入力と出力の低周波内容が一致しない場合でも,物理量の統計値と正しく一致している。本研究のコードは https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/probabilistic_diffusion で公開されている。 We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map to transform low-resolution data from a biased coarse-grained numerical scheme to high-resolution data that is consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optimal transport map, and (ii) an upsampling step achieved by a probabilistic diffusion model with a posteriori conditional sampling. This approach characterizes a conditional distribution without needing paired data, and faithfully recovers relevant physical statistics from biased samples. We demonstrate the utility of the proposed approach on one- and two-dimensional fluid flow problems, which are representative of the core difficulties present in numerical simulations of weather and climate.
Our method produces realistic high-resolution outputs from low-resolution inputs by upsampling resolutions of 8x and 16x. Moreover, our procedure correctly matches the statistics of physical quantities, even when the low-frequency content of the inputs and outputs does not match, a crucial but difficult-to-satisfy assumption needed by current state-of-the-art alternatives. Code for this work is available at: https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/probabilistic_diffusion.	翻訳日:2023-11-02 02:31:07 公開日:2023-10-30
# Momentumがエラーフィードバックを改善! Momentum Provably Improves Error Feedback! ( http://arxiv.org/abs/2305.15155v2 ) ライセンス: Link先を確認	Ilyas Fatkhullin, Alexander Tyurin, Peter Richtárik	(参考訳) 分散環境で機械学習モデルをトレーニングする際の通信オーバーヘッドが高いため、現代のアルゴリズムは損失のある通信圧縮に依存している。しかし、未処理の場合、圧縮による誤差が伝播し、指数的発散を含む非常に不安定な挙動を引き起こす可能性がある。約10年前、Seide氏らは、この問題を緩和するための非常に効果的なヒューリスティックとして、EF14と呼ばれるエラーフィードバック(EF)機構を提案した。しかし、過去10年間のEF分野における着実なアルゴリズム的・理論的進歩にもかかわらず、我々の理解は完璧には程遠い。この作業では、最も差し迫った問題のひとつに対処します。特に、標準的な非凸設定では、EFのすべての既知の変種は収束するために非常に大きなバッチサイズに依存しており、実用上は現実的でない場合がある。我々は、この問題を理論的にも現実的にも取り除く驚くほど単純な修正を提案する: Richtárik et al. [2021] による EF の最新版である EF21 への Polyak の運動量の適用。 EF21-SGDMと命名したこのアルゴリズムは,従来の誤りフィードバックアルゴリズムの標準滑らか性および有界分散仮定に基づく通信とサンプルの複雑さを改善し,有界勾配の相似性などのより強い仮定を必要としない。さらに, 複雑度をさらに向上させるダブルモーメント方式を提案する。本手法から圧縮を除去した場合でも,本手法は新規であり,ポリアックの運動量に富む非凸確率最適化の研究には独立した手法である。 Due to the high communication overhead when training machine learning models in a distributed environment, modern algorithms invariably rely on lossy communication compression. However, when untreated, the errors caused by compression propagate, and can lead to severely unstable behavior, including exponential divergence. Almost a decade ago, Seide et al. [2014] proposed an error feedback (EF) mechanism, which we refer to as EF14, as an immensely effective heuristic for mitigating this issue. However, despite steady algorithmic and theoretical advances in the EF field in the last decade, our understanding is far from complete. In this work we address one of the most pressing issues. In particular, in the canonical nonconvex setting, all known variants of EF rely on very large batch sizes to converge, which can be prohibitive in practice. We propose a surprisingly simple fix which removes this issue both theoretically and in practice: the application of Polyak's momentum to the latest incarnation of EF due to Richtárik et al. [2021], known as EF21.
Our algorithm, for which we coin the name EF21-SGDM, improves the communication and sample complexities of previous error feedback algorithms under standard smoothness and bounded variance assumptions, and does not require any further strong assumptions such as bounded gradient dissimilarity. Moreover, we propose a double momentum version of our method that improves the complexities even further. Our proof seems to be novel even when compression is removed from the method, and as such, our proof technique is of independent interest in the study of nonconvex stochastic optimization enriched with Polyak's momentum.	翻訳日:2023-11-02 02:29:56 公開日:2023-10-30
# 表データによる深部異常検出のための個別入力 Beyond Individual Input for Deep Anomaly Detection on Tabular Data ( http://arxiv.org/abs/2305.15121v5 ) ライセンス: Link先を確認	Hugo Thimonier, Fabrice Popineau, Arpad Rimmel and Bich-Li\^en Doan	(参考訳) 異常検出は金融、医療、サイバーセキュリティなど多くの分野において不可欠である。本稿では,教師付きタスクのために最初に提案された非パラメトリックトランスフォーマ(npts)を利用して,特徴量とサンプル値の両方の依存関係をキャプチャする,新しい深層異常検出法を提案する。再構成に基づくフレームワークでは,NPTをトレーニングし,通常のサンプルのマスキング特徴を再構築する。非パラメトリックな方法では、推論中にトレーニングセット全体を活用し、マスクした特徴を再構成して異常スコアを生成するモデルの能力を利用する。私たちの知る限りでは、グラフデータセット上の異常検出のために、機能機能とサンプルサンプルの依存関係をうまく組み合わせる最初の試みである。本手法は,31個のベンチマーク表型データセットを用いた広範囲な実験により,f1-score と auroc の2.4%,1.2% の既存手法を上回り,最先端の性能を実現することを実証した。本研究は,両依存のモデル化が表データにおける異常検出に重要であることを示す。 Anomaly detection is vital in many domains, such as finance, healthcare, and cybersecurity. In this paper, we propose a novel deep anomaly detection method for tabular data that leverages Non-Parametric Transformers (NPTs), a model initially proposed for supervised tasks, to capture both feature-feature and sample-sample dependencies. In a reconstruction-based framework, we train the NPT to reconstruct masked features of normal samples. In a non-parametric fashion, we leverage the whole training set during inference and use the model's ability to reconstruct the masked features to generate an anomaly score. To the best of our knowledge, this is the first work to successfully combine feature-feature and sample-sample dependencies for anomaly detection on tabular datasets. Through extensive experiments on 31 benchmark tabular datasets, we demonstrate that our method achieves state-of-the-art performance, outperforming existing methods by 2.4% and 1.2% in terms of F1-score and AUROC, respectively. Our ablation study provides evidence that modeling both types of dependencies is crucial for anomaly detection on tabular data.	翻訳日:2023-11-02 02:29:29 公開日:2023-10-30
# OPC UAを用いた強化学習の活用に関するミニレビュー A Mini Review on the utilization of Reinforcement Learning with OPC UA ( http://arxiv.org/abs/2305.15113v2 ) ライセンス: Link先を確認	Simon Schindler, Martin Uray, Stefan Huber	(参考訳) 強化学習(Reinforcement Learning, RL)は、ロボット工学、自然言語処理、ゲームプレイといった様々な分野に適用された強力な機械学習パラダイムである。逐次的な意思決定問題の解決を目的としており、その設計上、経験から学び、変化する動的環境に適応できる。これらの能力により、産業における複雑なプロセスの制御と最適化の第一候補となる。この可能性を完全に活用する鍵は、既存の産業システムへのRLのシームレスな統合である。産業用通信標準であるOpen Platform Communications Unified Architecture (OPC UA)はこのギャップを埋める可能性がある。しかし、RLとOPC UAは異なる分野のものであるため、研究者は2つの技術間のギャップを埋める必要がある。この研究は、このギャップを埋めるために、両方の技術の技術的な概要を簡潔に提供し、RLとOPC UAをどのように組み合わせて適用するかについての洞察を得るために、半網羅的な文献レビューを実施している。この調査では、RLとOPC UAの交点において、3つの主要な研究トピックが特定されている。文献レビューの結果は、RLは産業プロセスの制御と最適化のための有望な技術であるが、現実のシナリオに適度に少ない労力で展開するために必要な標準化されたインターフェースを持っていないことを示している。 Reinforcement Learning (RL) is a powerful machine learning paradigm that has been applied in various fields such as robotics, natural language processing, and game playing, achieving state-of-the-art results. Designed to solve sequential decision-making problems, it is able to learn from experience and therefore adapt to changing dynamic environments. These capabilities make it a prime candidate for controlling and optimizing complex processes in industry. The key to fully exploiting this potential is the seamless integration of RL into existing industrial systems. The industrial communication standard Open Platform Communications Unified Architecture (OPC UA) could bridge this gap. However, since RL and OPC UA are from different fields, there is a need for researchers to bridge the gap between the two technologies. This work serves to bridge this gap by providing a brief technical overview of both technologies and carrying out a semi-exhaustive literature review to gain insights on how RL and OPC UA are applied in combination. With this survey, three main research topics have been identified at the intersection of RL and OPC UA.
The results of the literature review show that RL is a promising technology for the control and optimization of industrial processes, but does not yet have the necessary standardized interfaces to be deployed in real-world scenarios with reasonably low effort.	翻訳日:2023-11-02 02:29:09 公開日:2023-10-30
# 実世界情報検索シナリオにおけるLCMのテーブル・ツー・テキスト生成能力の検討 Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios ( http://arxiv.org/abs/2305.14987v2 ) ライセンス: Link先を確認	Yilun Zhao, Haowei Zhang, Shengyun Si, Linyong Nan, Xiangru Tang, Arman Cohan	(参考訳) タブラルデータは様々な産業で広く使われており、ユーザが情報検索の目的を理解し、操作するのにかなりの時間と労力を要する。大規模言語モデル(LLM)の進歩は、ユーザ効率を向上させる大きな可能性を示している。しかし、テーブル情報探索のための実世界の応用におけるLLMの採用は、いまだに未定である。本稿では,2つの実世界情報探索シナリオ内の4つのデータセットを用いて,異なるLLMのテーブル・トゥ・テキスト機能について検討する。 LogicNLGや、新たに構築したデータインサイト生成用のLoTNLGデータセット、FeTaQAやクエリベースの生成用のF2WTQデータセットなどです。 3つの研究課題について調査を行い,テーブル・ツー・テキスト生成,自動評価,フィードバック生成におけるllmの性能評価を行った。実験結果から,現在の高性能LCM(特にGPT-4)は,実世界のシナリオにおいて,ユーザの情報検索を目的としたテーブル・ツー・テキスト・ジェネレータ,評価器,フィードバック・ジェネレータとして効果的に機能することが示唆された。しかし、他のオープンソース LLM (Tulu と LLaMA-2) と GPT-4 の間には大きな性能差がある。私たちのデータとコードはhttps://github.com/yale-nlp/LLM-T2Tで公開されています。 Tabular data is prevalent across various industries, necessitating significant time and effort for users to understand and manipulate for their information-seeking purposes. The advancements in large language models (LLMs) have shown enormous potential to improve user efficiency. However, the adoption of LLMs in real-world applications for table information seeking remains underexplored. In this paper, we investigate the table-to-text capabilities of different LLMs using four datasets within two real-world information seeking scenarios. These include the LogicNLG and our newly-constructed LoTNLG datasets for data insight generation, along with the FeTaQA and our newly-constructed F2WTQ datasets for query-based generation. We structure our investigation around three research questions, evaluating the performance of LLMs in table-to-text generation, automated evaluation, and feedback generation, respectively. 
Experimental results indicate that the current high-performing LLM, specifically GPT-4, can effectively serve as a table-to-text generator, evaluator, and feedback generator, facilitating users' information seeking purposes in real-world scenarios. However, a significant performance gap still exists between other open-sourced LLMs (e.g., Tulu and LLaMA-2) and GPT-4 models. Our data and code are publicly available at https://github.com/yale-nlp/LLM-T2T.	翻訳日:2023-11-02 02:28:49 公開日:2023-10-30
# コントラスト視覚言語モデルにおけるテキストエンコーダのボトルネック構成性 Text encoders bottleneck compositionality in contrastive vision-language models ( http://arxiv.org/abs/2305.14897v2 ) ライセンス: Link先を確認	Amita Kamath, Jack Hessel, Kai-Wei Chang	(参考訳) CLIPのような高性能視覚言語(VL)モデルは、単一のベクトルを使ってキャプションを表現する。このボトルネックで、言語に関する情報はどの程度失われていますか? 最初にCompPromptsをキュレートします。これは、VLモデルがキャプチャできるべき構成的なイメージキャプションのセットです(例えば、シングルオブジェクト、オブジェクト+プロパティ、複数の対話オブジェクト)。そして,複数のVLモデルによって生成された単一ベクトルテキスト表現からキャプションを再構築することを目的とした,テキストのみの回復プローブを訓練する。このアプローチではイメージを必要とせず、以前の作業よりも広い範囲のシーンでテストすることができます。私たちはそれを見つけました 1) CLIP のテキストエンコーダは,オブジェクト関係,属性オブジェクト関連,カウント,否定など,よりコンポジション的な入力では不足している。 2)一部のテキストエンコーダは,他よりも著しく優れている。 3) テキストのみのリカバリ性能はcontroledimcaps上でマルチモーダルマッチング性能を予測する: きめ細かい合成画像とキャプションからなる新しい評価ベンチマーク。具体的には, テキストのみの回復性は, コントラッシブVLモデルにおける構成因子のモデル化に必要である(ただし十分ではない)ことを示唆する。データセットとコードをリリースします。 Performant vision-language (VL) models like CLIP represent captions using a single vector. How much information about language is lost in this bottleneck? We first curate CompPrompts, a set of increasingly compositional image captions that VL models should be able to capture (e.g., single object, to object+property, to multiple interacting objects). Then, we train text-only recovery probes that aim to reconstruct captions from single-vector text representations produced by several VL models. This approach does not require images, allowing us to test on a broader range of scenes compared to prior work. We find that: 1) CLIP's text encoder falls short on more compositional inputs, including object relationships, attribute-object association, counting, and negations; 2) some text encoders work significantly better than others; and 3) text-only recovery performance predicts multi-modal matching performance on ControlledImCaps: a new evaluation benchmark we collect and release consisting of fine-grained compositional images and captions. 
Specifically, our results suggest text-only recoverability is a necessary (but not sufficient) condition for modeling compositional factors in contrastive VL models. We release our datasets and code.	翻訳日:2023-11-02 02:27:42 公開日:2023-10-30
# 等角化グラフニューラルネットワークによるグラフ上の不確かさの定量化 Uncertainty Quantification over Graph with Conformalized Graph Neural Networks ( http://arxiv.org/abs/2305.14535v2 ) ライセンス: Link先を確認	Kexin Huang, Ying Jin, Emmanuel Cand\`es, Jure Leskovec	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データに基づく強力な機械学習予測モデルである。しかし、GNNには厳密な不確実性評価がなく、エラーのコストが重要な設定での信頼性の高いデプロイメントを制限している。本稿では,共形予測(CP)をグラフベースモデルに拡張した共形GNN(CF-GNN)を提案する。グラフ内のエンティティが与えられると、cf-gnnは、事前に定義されたカバレッジ確率(例えば90%)を持つ真のラベルを含む予測セット/インターバルを生成する。我々は,グラフデータに対するCPの有効性を実現するための置換不変条件を確立し,テスト時間カバレッジを正確に評価する。また,有効範囲の他に,実用上の予測セットサイズ/インターバル長の削減が重要である。予測の更新を学習し、より効率的な予測セット/インターバルを生成するトポロジー対応出力補正モデルを開発する動機となる、非コンフォーマリティスコアとネットワーク構造の間の鍵接続を観察した。大規模実験の結果,CF-GNNは予め定義された目標範囲の範囲を達成できる一方で,予測セット/インターバルサイズを最大74%削減できることがわかった。また、様々な生およびネットワーク機能に対する十分な条件付きカバレッジを実証的に達成する。 Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data. However, GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant. We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates. Given an entity in the graph, CF-GNN produces a prediction set/interval that provably contains the true label with pre-defined coverage probability (e.g. 90%). We establish a permutation invariance condition that enables the validity of CP on graph data and provide an exact characterization of the test-time coverage. Moreover, besides valid coverage, it is crucial to reduce the prediction set size/interval length for practical use. We observe a key connection between non-conformity scores and network structures, which motivates us to develop a topology-aware output correction model that learns to update the prediction and produces more efficient prediction sets/intervals. 
Extensive experiments show that CF-GNN achieves any pre-defined target marginal coverage while significantly reducing the prediction set/interval size by up to 74% over the baselines. It also empirically achieves satisfactory conditional coverage over various raw and network features.	翻訳日:2023-11-02 02:26:58 公開日:2023-10-30
# 事前学習済みTransformerにおける創発的モジュラリティ Emergent Modularity in Pre-trained Transformers ( http://arxiv.org/abs/2305.18390v2 ) ライセンス: Link先を確認	Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou	(参考訳) この研究は、人間の脳によく見られる特徴であり、汎用知能に欠かせない機能である、事前訓練されたトランスフォーマーにおけるモジュラリティの存在を調べる。人間の脳になぞらえて、モジュラリティの2つの主要な特性を考える。 (1)ニューロンの機能的特殊化:各ニューロンが主に特定の機能に特化しているかどうかを評価し,その答えがイエスであることを確かめる。 (2)機能に基づくニューロングループ化: 機能によってニューロンをモジュールに分類する構造を探索し, 各モジュールが対応する機能のために機能する。考えられる膨大な量の構造を考えると、我々は期待できる候補としてMixture-of-Expertsに注目し、ニューロンを専門家に分割し、通常異なる入力に対して異なる専門家を活性化する。実験の結果,特定の機能に特化しているニューロンがクラスター化されている機能の専門家がいることがわかった。さらに、機能専門家のアクティベーションの摂動は、対応する機能に大きく影響する。最後に,事前学習中にモジュール構造がどのように出現するかを調べ,モジュール構造が早期に安定化し,ニューロン安定化よりも高速であることが判明した。トランスフォーマーはまずモジュール構造を構築し、次に細粒度のニューロン機能を学ぶことを示唆する。コードとデータは https://github.com/THUNLP/modularity-analysis で公開されています。 This work examines the presence of modularity in pre-trained Transformers, a feature commonly found in human brains and thought to be vital for general intelligence. In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes. (2) function-based neuron grouping: we explore finding a structure that groups neurons into modules by function, so that each module serves its corresponding function. Given the enormous number of possible structures, we focus on Mixture-of-Experts as a promising candidate, which partitions neurons into experts and usually activates different experts for different inputs. Experimental results show that there are functional experts, in which the neurons specialized in a certain function are clustered. Moreover, perturbing the activations of functional experts significantly affects the corresponding function.
Finally, we study how modularity emerges during pre-training, and find that the modular structure is stabilized at the early stage, which is faster than neuron stabilization. It suggests that Transformers first construct the modular structure and then learn fine-grained neuron functions. Our code and data are available at https://github.com/THUNLP/modularity-analysis.	翻訳日:2023-11-02 02:19:28 公開日:2023-10-30
# 信頼を超えて:信頼できるモデルは非定型性も考慮すべきである Beyond Confidence: Reliable Models Should Also Consider Atypicality ( http://arxiv.org/abs/2305.18262v2 ) ライセンス: Link先を確認	Mert Yuksekgonul, Linjun Zhang, James Zou, Carlos Guestrin	(参考訳) ほとんどの機械学習モデルは予測に自信を与えることができるが、予測の信頼性を理解するには自信が不十分である。例えば、入力がトレーニングデータセットで十分に表現されていない場合や、入力が本質的に曖昧である場合、モデルは信頼度の低い予測を出すことがある。本研究では,サンプルやクラスが非典型的(希少)であるかとモデル予測の信頼性の関係について検討する。まず,非定型性は誤較正と正確性に強く関連していることを示す。特に,非定型入力や非定型クラスの予測が過度に信頼され,精度が低いことを実証的に示す。これらの知見を用いて,不確かさの定量化とモデル性能の向上を,識別型ニューラルネットワークと大規模言語モデルに適用した。本報告では,非定型性を用いることで,異なる皮膚トーン群にまたがる皮膚病変分類器の性能が向上することを示す。全体として,モデルの信頼性だけでなく,不確実性の定量化や性能向上にも非定型性を用いるべきである。以上の結果から, 簡易な非定型性推定器が有意な価値をもたらすことが示唆された。 While most machine learning models can provide confidence in their predictions, confidence is insufficient to understand a prediction's reliability. For instance, the model may have a low-confidence prediction if the input is not well-represented in the training dataset or if the input is inherently ambiguous. In this work, we investigate the relationship between how atypical (rare) a sample or a class is and the reliability of a model's predictions. We first demonstrate that atypicality is strongly related to miscalibration and accuracy. In particular, we empirically show that predictions for atypical inputs or atypical classes are more overconfident and have lower accuracy. Using these insights, we show that incorporating atypicality improves uncertainty quantification and model performance for discriminative neural networks and large language models. In a case study, we show that using atypicality improves the performance of a skin lesion classifier across different skin tone groups without having access to the group attributes. Overall, we propose that models should use not only confidence but also atypicality to improve uncertainty quantification and performance. Our results demonstrate that simple post-hoc atypicality estimators can provide significant value.	翻訳日:2023-11-02 02:19:07 公開日:2023-10-30
# テキスト駆動画像変換のための条件スコアガイダンス Conditional Score Guidance for Text-Driven Image-to-Image Translation ( http://arxiv.org/abs/2305.18007v2 ) ライセンス: Link先を確認	Hyunsoo Lee, Minsoo Kang, Bohyung Han	(参考訳) 本稿では,事前訓練されたテキスト・画像拡散モデルに基づくテキスト駆動画像変換のための新しいアルゴリズムを提案する。本手法は,修正テキストで定義されたソース画像の関心領域を選択的に編集し,残りの部分を保存し,対象画像を生成することを目的とする。目標プロンプトのみに依存する既存の手法とは対照的に、特定の翻訳タスクに対応するために調整されたソース画像とソーステキストプロンプトの両方を考慮に入れる新しいスコア関数を導入する。この目的のために、条件スコア関数を基準スコアと目標画像生成のためのガイド語に分解し、原則的に導出する。指導項の勾配計算には,後方分布のガウス分布を仮定し,その平均と分散を推定し,追加の訓練をすることなく勾配を調整できる。さらに,条件付きスコアガイダンスの品質向上のために,ソースとターゲットの潜伏者から得られた2つのクロスアテンションマップを組み合わせた,シンプルで効果的なミックスアップ手法を取り入れた。この戦略は、ソース画像における不変部分とターゲットプロンプトに整列した編集領域との望ましい融合を促進するのに有効であり、高忠実なターゲット画像を生成する。総合的な実験により,様々なタスクにおいて優れた画像から画像への翻訳性能を実現することを実証した。 We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our method aims to generate a target image by selectively editing the regions of interest in a source image, defined by a modifying text, while preserving the remaining parts. In contrast to existing techniques that solely rely on a target prompt, we introduce a new score function that additionally considers both the source image and the source text prompt, tailored to address specific translation tasks. To this end, we derive the conditional score function in a principled manner, decomposing it into the standard score and a guiding term for target image generation. For the gradient computation of the guiding term, we assume a Gaussian distribution of the posterior distribution and estimate its mean and variance to adjust the gradient without additional training. In addition, to improve the quality of the conditional score guidance, we incorporate a simple yet effective mixup technique, which combines two cross-attention maps derived from the source and target latents. 
This strategy is effective for promoting a desirable fusion of the invariant parts in the source image and the edited regions aligned with the target prompt, leading to high-fidelity target image generation. Through comprehensive experiments, we demonstrate that our approach achieves outstanding image-to-image translation performance on various tasks.	翻訳日:2023-11-02 02:18:19 公開日:2023-10-30
# GMSF:グローバルマッチングシーンフロー GMSF: Global Matching Scene Flow ( http://arxiv.org/abs/2305.17432v2 ) ライセンス: Link先を確認	Yushan Zhang, Johan Edstedt, Bastian Wandt, Per-Erik Forss\'en, Maria Magnusson, Michael Felsberg	(参考訳) 我々は点雲からのシーンフロー推定の課題に取り組む。ソースとターゲットポイントクラウドが与えられた場合、目標はソースポイントクラウドの各ポイントからターゲットへの変換を見積もることであり、結果として3dモーションベクトルフィールドが生成される。従来主流であったシーンフロー推定手法では,多段階的な細粒化や再帰的なアーキテクチャが必要であった。対照的に,この問題に対処するために,単発グローバルマッチングの簡易化を提案する。私たちの重要な発見は、ポイントペア間の信頼性の高い機能類似性が不可欠であり、正確なシーンフローを推定するのに十分であることです。そこで本研究では, 高精度かつロバストな特徴表現に不可欠な, ハイブリッドな局所-グローバル-クロストランスフォーマーアーキテクチャを用いて特徴抽出ステップを分解する。大規模な実験により,提案したGlobal Matching Scene Flow (GMSF) が,複数のシーンフロー推定ベンチマークに新たな最先端を設定できることが示されている。 FlyingThings3Dでは、オクルージョンポイントが存在するため、GMSFは前回の最高パフォーマンスの27.4%から5.6%に減らす。 KITTI Scene Flowでは微調整が不要であり,提案手法は最先端の性能を示す。 Waymo-Openデータセットでは、提案手法は従来の手法よりも大きなマージンで優れている。コードはhttps://github.com/zhangyushan3/gmsfで入手できる。 We tackle the task of scene flow estimation from point clouds. Given a source and a target point cloud, the objective is to estimate a translation from each point in the source point cloud to the target, resulting in a 3D motion vector field. Previous dominant scene flow estimation methods require complicated coarse-to-fine or recurrent architectures as a multi-stage refinement. In contrast, we propose a significantly simpler single-scale one-shot global matching to address the problem. Our key finding is that reliable feature similarity between point pairs is essential and sufficient to estimate accurate scene flow. We thus propose to decompose the feature extraction step via a hybrid local-global-cross transformer architecture which is crucial to accurate and robust feature representations. Extensive experiments show that the proposed Global Matching Scene Flow (GMSF) sets a new state-of-the-art on multiple scene flow estimation benchmarks. On FlyingThings3D, with the presence of occlusion points, GMSF reduces the outlier percentage from the previous best performance of 27.4% to 5.6%. 
On KITTI Scene Flow, without any fine-tuning, our proposed method shows state-of-the-art performance. On the Waymo-Open dataset, the proposed method outperforms previous methods by a large margin. The code is available at https://github.com/ZhangYushan3/GMSF.	翻訳日:2023-11-02 02:17:56 公開日:2023-10-30
# 高解像度画像の脳活動からのデコードに対するコントラスト, 態度, 難易度 Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities ( http://arxiv.org/abs/2305.17214v2 ) ライセンス: Link先を確認	Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, Marie-Francine Moens	(参考訳) 機能的磁気共鳴画像(fmri)によって記録された神経反応からの視覚刺激の復号は、認知神経科学と機械学習の興味深い交点を示し、人間の視覚知覚の理解と非侵襲的脳-機械インターフェイスの構築を約束する。しかし、この課題はfMRI信号のノイズの性質と脳の視覚表現の複雑なパターンによって困難である。これらの課題を軽減するために,2相fMRI表現学習フレームワークを導入する。第1フェーズでは、double-contrastive Mask Auto-encoderを提案してfMRI機能学習者を事前訓練し、識別表現を学習する。第2フェーズは、画像オートエンコーダからのガイダンスにより、視覚再構成に最も有用な神経活性化パターンに、特徴学習者が出席するようにチューニングする。最適化されたfMRI特徴学習者は、脳活動から画像刺激を再構成するために潜時拡散モデルを設定する。実験により,50-way-top-1のセマンティック分類精度において,従来の最先端手法よりも39.34%,高解像度かつセマンティックな画像を生成する上で,モデルが優れていることを示す。本研究は,非侵襲的脳-機械インタフェースの開発に寄与し,その可能性を探究するものである。 Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception and building non-invasive brain-machine interfaces. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder. The optimized fMRI feature learner then conditions a latent diffusion model to reconstruct image stimuli from brain activities. 
Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. Our research invites further exploration of the decoding task's potential and contributes to the development of non-invasive brain-machine interfaces.	翻訳日:2023-11-02 02:16:54 公開日:2023-10-30
# 過去を想像して未来を推測する Inferring the Future by Imagining the Past ( http://arxiv.org/abs/2305.17195v2 ) ライセンス: Link先を確認	Kartik Chandra, Tony Chen, Tzu-Mao Li, Jonathan Ragan-Kelley, Josh Tenenbaum	(参考訳) 漫画本の1枚のパネルは、現在キャラクターがどこにいるかだけでなく、彼らの動き、モチベーション、感情、次に何をすべきかを描写することができる。より一般に、人間は、これまで見たことのない状況でも静的なスナップショットから、過去の出来事と将来の出来事の複雑なシーケンスを日常的に推測する。本稿では,人間がこのような迅速かつ柔軟な推論を行う方法をモデル化する。認知科学における長い研究に基づいて、我々はモンテカルロのアルゴリズムを提供し、その推論は様々な領域における人間の直観とよく相関する。私たちの重要な技術的洞察は、推論問題とモンテカルロ経路追跡の驚くべき関係であり、コンピュータグラフィックスコミュニティから何十年ものアイデアを、一見無関係な心のタスクに応用することができます。 A single panel of a comic book can say a lot: it can depict not only where the characters currently are, but also their motions, their motivations, their emotions, and what they might do next. More generally, humans routinely infer complex sequences of past and future events from a static snapshot of a dynamic scene, even in situations they have never seen before. In this paper, we model how humans make such rapid and flexible inferences. Building on a long line of work in cognitive science, we offer a Monte Carlo algorithm whose inferences correlate well with human intuitions in a wide variety of domains, while only using a small, cognitively-plausible number of samples. Our key technical insight is a surprising connection between our inference problem and Monte Carlo path tracing, which allows us to apply decades of ideas from the computer graphics community to this seemingly-unrelated theory of mind task.	翻訳日:2023-11-02 02:16:29 公開日:2023-10-30
# 3つのタワー:事前学習によるフレキシブルコントラスト学習 Three Towers: Flexible Contrastive Learning with Pretrained Image Models ( http://arxiv.org/abs/2305.16999v3 ) ライセンス: Link先を確認	Jannik Kossen, Mark Collier, Basil Mustafa, Xiao Wang, Xiaohua Zhai, Lucas Beyer, Andreas Steiner, Jesse Berent, Rodolphe Jenatton, Efi Kokiopoulou	(参考訳) 本稿では,視覚言語モデルのコントラスト学習を改善するためのフレキシブルな手法である3つのタワー(3t)を提案する。対照的なモデルは通常、ゼロからトレーニングされるが、LiT (Zhai et al., 2022) は、最近、事前訓練された分類器の埋め込みによる性能向上を示している。しかし、ライトはイメージタワーを凍結した埋め込みに置き換え、イメージタワーを対照的に訓練することの利点を除いた。 3tでは,事前学習された組込みとコントラストトレーニングの両方の恩恵を受ける,より柔軟なストラテジーを提案する。これを実現するため,凍結した既設埋設塔を含む第3の塔を導入し,この第3の塔と主画像テキスト塔との整合を奨励する。経験的に、3TはLiTとCLIPスタイルの検索タスクのベースラインを一貫して改善する。分類において、3Tはオフスクラッチベースラインよりも確実に改善され、JFT事前トレーニングモデルではLiTと比較して性能が劣るが、ImageNet-21kとPlaces365事前トレーニングではLiTより優れている。 We introduce Three Towers (3T), a flexible method to improve the contrastive learning of vision-language models by incorporating pretrained image classifiers. While contrastive models are usually trained from scratch, LiT (Zhai et al., 2022) has recently shown performance gains from using pretrained classifier embeddings. However, LiT directly replaces the image tower with the frozen embeddings, excluding any potential benefits from training the image tower contrastively. With 3T, we propose a more flexible strategy that allows the image tower to benefit from both pretrained embeddings and contrastive training. To achieve this, we introduce a third tower that contains the frozen pretrained embeddings, and we encourage alignment between this third tower and the main image-text towers. Empirically, 3T consistently improves over LiT and the CLIP-style from-scratch baseline for retrieval tasks. For classification, 3T reliably improves over the from-scratch baseline, and while it underperforms relative to LiT for JFT-pretrained models, it outperforms LiT for ImageNet-21k and Places365 pretraining.	翻訳日:2023-11-02 02:16:14 公開日:2023-10-30
# 拡散モデルは視覚・言語推論器か? Are Diffusion Models Vision-And-Language Reasoners? ( http://arxiv.org/abs/2305.16397v2 ) ライセンス: Link先を確認	Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy	(参考訳) テキスト条件付き画像生成モデルは最近、デノイジング拡散プロセスを用いて膨大な定性的成功を示している。しかし、識別的視覚・言語モデルとは異なり、これらの拡散に基づく生成モデルを用いて合成性などの高レベル現象の自動細粒度定量的評価を行うことは非自明な課題である。この目標に向けて、私たちは2つのイノベーションを導入します。まず、DiffusionITMと呼ばれる新しい手法を用いて、任意の画像テキストマッチング(ITM)タスクに対して拡散モデル(この場合、Stable Diffusion)を変換する。第2に,7つの複雑な視覚言語タスク,バイアス評価,詳細な分析を備えた生成的判別評価ベンチマーク(GDBench)を紹介する。Stable Diffusion + DiffusionITMは多くのタスクで競争力があり、CLIPよりもCLEVRやWinogroundのようなコンポジションタスクで優れています。生成能力を保ちながらMS-COCOを微調整し, 転送設定により構成性能をさらに向上する。また, 拡散モデルにおける定型バイアスを測定し, Stable Diffusion 2.1は, ほとんどがStable Diffusion 1.5よりも偏りが少ないことを見出した。全体として,本研究の結果は,識別的・生成的モデル評価を近づけるエキサイティングな方向を示している。間もなくコードとベンチマークのセットアップをリリースします。 Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we introduce two innovations. First, we transform diffusion-based models (in our case, Stable Diffusion) for any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench) with 7 complex vision-and-language tasks, bias evaluation and detailed analysis. We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground. We further boost its compositional performance with a transfer setup by fine-tuning on MS-COCO while retaining generative capabilities. We also measure the stereotypical bias in diffusion models, and find that Stable Diffusion 2.1 is, for the most part, less biased than Stable Diffusion 1.5.
Overall, our results point in an exciting direction bringing discriminative and generative model evaluation closer. We will release code and benchmark setup soon.	翻訳日:2023-11-02 02:15:53 公開日:2023-10-30
# Scan and Snap: 1層トランスにおけるトレーニングダイナミクスとトークン構成の理解 Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer ( http://arxiv.org/abs/2305.16380v4 ) ライセンス: Link先を確認	Yuandong Tian, Yiping Wang, Beidi Chen, Simon Du	(参考訳) トランスフォーマーアーキテクチャは、複数の研究領域で顕著なパフォーマンスを示し、多くのニューラルネットワークモデルのバックボーンとなっている。しかし、その仕組みについては理解が限られている。特に、単純な予測損失により、勾配 \emph{training dynamics} からどのように表現が現れるかは謎のままである。本稿では, 1層自己着脱層と1層デコーダ層を有する1層変圧器について,次のトークン予測タスクに対するsgdトレーニングダイナミクスを数学的に厳密に解析する。自己注意層が入力トークンを結合する方法の動的プロセスのブラックボックスを開き、基礎となる帰納バイアスの性質を明らかにする。より具体的に言うと (a)位置符号化なし。 (b)長い入力シーケンス、及び (c)デコーダ層は自己アテンション層よりも早く学習し、自己アテンションが \emph{discriminative scan algorithm} として機能することを証明する。異なるトークンの中では、トレーニングセット内のキーとクエリトークンの間の低いから高い共起の順序に従って、徐々に注目の重みを減らします。興味深いことに、この手順は勝者の獲得に繋がらないが、2つの層の学習速度によって制御され、(ほとんど)固定されたトークンの組み合わせを残している 'emph{phase transition} によって減速する。合成および実世界データ(wikitext)上でのこの \textbf{\emph{scan and snap}} ダイナミクスを検証する。 Transformer architecture has shown impressive performance in multiple research domains and has become the backbone of many neural network models. However, there is limited understanding on how it works. In particular, with a simple predictive loss, how the representation emerges from the gradient \emph{training dynamics} remains a mystery. In this paper, for 1-layer transformer with one self-attention layer plus one decoder layer, we analyze its SGD training dynamics for the task of next token prediction in a mathematically rigorous manner. We open the black box of the dynamic process of how the self-attention layer combines input tokens, and reveal the nature of underlying inductive bias. 
More specifically, with the assumption (a) no positional encoding, (b) long input sequence, and (c) the decoder layer learns faster than the self-attention layer, we prove that self-attention acts as a \emph{discriminative scanning algorithm}: starting from uniform attention, it gradually attends more to distinct key tokens for a specific next token to be predicted, and pays less attention to common key tokens that occur across different next tokens. Among distinct tokens, it progressively drops attention weights, following the order of low to high co-occurrence between the key and the query token in the training set. Interestingly, this procedure does not lead to winner-takes-all, but decelerates due to a \emph{phase transition} that is controllable by the learning rates of the two layers, leaving (almost) fixed token combination. We verify this \textbf{\emph{scan and snap}} dynamics on synthetic and real-world data (WikiText).	翻訳日:2023-11-02 02:15:17 公開日:2023-10-30
# 協調学習と最適化における競争相手の正直感 Incentivizing Honesty among Competitors in Collaborative Learning and Optimization ( http://arxiv.org/abs/2305.16272v3 ) ライセンス: Link先を確認	Florian E. Dorner, Nikola Konstantinov, Georgi Pashaliev, Martin Vechev	(参考訳) 協調学習技術は、単一のエンティティのデータでトレーニングされたモデルよりも優れた機械学習モデルのトレーニングを可能にする可能性がある。しかし、多くの場合、このような協力的なスキームの潜在的な参加者は、最善のレコメンデーションを提供することで顧客を引き付けようとする企業のような下流のタスクの競合である。これは他の参加者のモデルを傷つける不名誉なアップデートをインセンティブにし、コラボレーションのメリットを損なう可能性がある。本研究では,このようなインタラクションをモデル化したゲームを定式化し,このフレームワークにおける2つの学習タスクについて検討する。プレイヤーアクションの自然なクラスについて、合理的なクライアントは、その更新を強く操作し、学習を妨げていることを示す。次に、正直なコミュニケーションを動機づけ、完全協調に匹敵する学習品質を確保するメカニズムを提案する。最後に、標準の非凸フェデレーション学習ベンチマークにおけるインセンティブスキームの有効性を実証的に実証する。私たちの研究は、不正なクライアントのインセンティブや行動を明確にモデル化し、悪意のあるクライアントと仮定するのではなく、協調学習のための強力な堅牢性を保証することを示しています。 Collaborative learning techniques have the potential to enable training machine learning models that are superior to models trained on a single entity's data. However, in many cases, potential participants in such collaborative schemes are competitors on a downstream task, such as firms that each aim to attract customers by providing the best recommendations. This can incentivize dishonest updates that damage other participants' models, potentially undermining the benefits of collaboration. In this work, we formulate a game that models such interactions and study two learning tasks within this framework: single-round mean estimation and multi-round SGD on strongly-convex objectives. For a natural class of player actions, we show that rational clients are incentivized to strongly manipulate their updates, preventing learning. We then propose mechanisms that incentivize honest communication and ensure learning quality comparable to full cooperation. Lastly, we empirically demonstrate the effectiveness of our incentive scheme on a standard non-convex federated learning benchmark. 
Our work shows that explicitly modeling the incentives and actions of dishonest clients, rather than assuming them malicious, can enable strong robustness guarantees for collaborative learning.	翻訳日:2023-11-02 02:14:47 公開日:2023-10-30
# ジャンプ拡散モデルによるトランス次元生成モデル Trans-Dimensional Generative Modeling via Jump Diffusion Models ( http://arxiv.org/abs/2305.16261v2 ) ライセンス: Link先を確認	Andrew Campbell, William Harvey, Christian Weilbach, Valentin De Bortoli, Tom Rainforth, Arnaud Doucet	(参考訳) 本稿では,各データポイントの状態と次元を共同でモデル化することにより,異なる次元のデータを自然に扱う新しい生成モデルを提案する。生成過程は、異なる次元空間の間をジャンプするジャンプ拡散過程として定式化される。まず, 時間反転生成過程を生成する次元と, 近似する学習のための新しいエビデンスの下限学習目標を導出する前に, フォワードノジング過程を壊す次元を定義する。時間反転生成過程に対する学習近似をシミュレーションし、状態値と次元を共同生成することにより、様々な次元のデータをサンプリングする効果的な方法を提供する。我々は,様々な次元の分子およびビデオデータセットに対する我々のアプローチを実証し,実験時間拡散誘導インプテーションタスクとの適合性の向上と,状態値と次元を別々に生成する固定次元モデルとの補間能力の向上を報告した。 We propose a new class of generative models that naturally handle data of varying dimensionality by jointly modeling the state and dimension of each datapoint. The generative process is formulated as a jump diffusion process that makes jumps between different dimensional spaces. We first define a dimension destroying forward noising process, before deriving the dimension creating time-reversed generative process along with a novel evidence lower bound training objective for learning to approximate it. Simulating our learned approximation to the time-reversed generative process then provides an effective way of sampling data of varying dimensionality by jointly generating state values and dimensions. We demonstrate our approach on molecular and video datasets of varying dimensionality, reporting better compatibility with test-time diffusion guidance imputation tasks and improved interpolation capabilities versus fixed dimensional models that generate state values and dimensions separately.	翻訳日:2023-11-02 02:14:25 公開日:2023-10-30
# 機械振動子とキャビティ-マグノンポラリトンの強結合の観測 Observation of strong coupling between a mechanical oscillator and a cavity-magnon polariton ( http://arxiv.org/abs/2307.11328v2 ) ライセンス: Link先を確認	Rui-Chang Shen, Jie Li, Wei-Jiang Wu, Xuan Zuo, Yi-Pu Wang, Shi-Yao Zhu, J. Q. You	(参考訳) キャビティマグノメカニクス(CMM)は新興分野であり、過去10年間、多くの注目を集めてきた。マイクロ波共振器光子、マグノン、振動フォノン間のコヒーレントカップリングを扱う。これまでのCMM実験はすべて、弱い結合状態で行われた。これはシステムの様々な応用を著しく制限する。ここでは, 強結合系におけるCMMシステムを実証し, 関連する正規モード分割を観察する。この状態において、機械振動子は、強く結合されたキャビティ光子とマグノンによって形成されるキャビティ・マグノン・ポラリトンに強く結合され、ポラリトン・メカニクスの協調性は$4\times10^3$に達し、従来のCMM実験よりも3桁改善される。この系は三重強結合領域にあり、系の通常のモードはマイクロ波光子、マグノン、フォノンのハイブリッド化である。これは、コヒーレント完全吸収によってポラリトンモードの崩壊速度を著しく減少させることで達成され、崩壊速度は4桁減少する。この研究は、フォノン、光子、マグノンの量子状態のコヒーレントな制御と測定への道を開き、マルチパータイトハイブリッドシステムにおけるリッチな強結合効果の研究のための新しいプラットフォームを提供する。 Cavity magnomechanics (CMM) is an emerging field and has received much attention in the past decade. It deals with coherent couplings among microwave cavity photons, magnons and vibration phonons. So far, all previous CMM experiments have been operated in the weak-coupling regime. This considerably limits the various prospective applications of the system. Here, we demonstrate the CMM system in the strong-coupling regime and observe the associated normal-mode splitting. In this regime, the mechanical oscillator is strongly coupled to a cavity-magnon polariton that is formed by strongly coupled cavity photons and magnons, and the polariton-mechanics cooperativity reaches $4\times10^3$, an improvement of three orders of magnitude over previous CMM experiments. The system is then in the triple-strong-coupling regime and the normal modes of the system are the hybridization of microwave photons, magnons and phonons. This is achieved by significantly reducing the decay rate of the polariton mode using coherent perfect absorption and the decay rate is reduced by four orders of magnitude. 
The work paves the way towards coherent control and measurement of the quantum states of phonons, photons and magnons, and provides a new platform for the study of rich strong-coupling effects in multipartite hybrid systems.	翻訳日:2023-11-02 02:07:03 公開日:2023-10-30
# 汎用化工学設計知識の育成に向けて Towards Populating Generalizable Engineering Design Knowledge ( http://arxiv.org/abs/2307.06985v3 ) ライセンス: Link先を確認	L Siddharth, Jianxi Luo	(参考訳) 汎用的な工学的設計知識を蓄積することを目指して,特許文書中の文から<head entity, relationship, tail entity>という形の事実を抽出する手法を提案する。これらの事実は特許文書の内外で組み合わせて知識グラフを形成し、設計知識を表現し保存するためのスキームとして機能する。工学設計文学における既存の手法は、事実ではなく統計的近似である三重項をポップアップさせるために予め定義された関係を利用することが多い。提案手法では,文からエンティティと関係を識別するためにタガーを訓練する。エンティティのペアが与えられた場合、特定の関係トークンを特定するために別のタグをトレーニングします。これらのタガーをトレーニングするために、44,227文のデータセットとそれに対応する事実を手作業で構築する。提案手法を2つの推奨アプローチに対してベンチマークする。本手法は,ファンシステムに関連する特許に含まれる文から事実を抽出することで適用する。これらの事実を用いて知識ベースを構築し、ドメインオントロジーをどのように構築し、サブシステムのコンテキスト化された知識を視覚化できるかを示す。次に,ファンシステムにおいて重要な問題に対する知識ベースを探索する。回答を知識グラフに整理し,ChatGPTの問題点に対する意見の比較検討を行う。 Aiming to populate generalizable engineering design knowledge, we propose a method to extract facts of the form <head entity, relationship, tail entity> from sentences found in patent documents. These facts could be combined within and across patent documents to form knowledge graphs that serve as schemes for representing as well as storing design knowledge. Existing methods in engineering design literature often utilise a set of predefined relationships to populate triples that are statistical approximations rather than facts. In our method, we train a tagger to identify both entities and relationships from a sentence. Given a pair of entities, we train another tagger to identify the specific relationship tokens. For training these taggers, we manually construct a dataset of 44,227 sentences and corresponding facts. We benchmark our method against two typically recommended approaches. We apply our method by extracting facts from sentences found in patents related to fan systems. We build a knowledge base using these facts to demonstrate how domain ontologies could be constructed and contextualised knowledge of subsystems could be visualised. We then search the knowledge base for key issues prevailing in fan systems. 
We organize the responses into knowledge graphs and hold a comparative discussion against the opinions about the key issues from ChatGPT.	翻訳日:2023-11-02 02:06:21 公開日:2023-10-30
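To make the data structure in the abstract above concrete, here is a small sketch of how facts of the form <head entity, relationship, tail entity> could be combined into a knowledge graph. The example triples about fan systems are invented for illustration and do not come from the paper's dataset; the function names are likewise hypothetical.

```python
from collections import defaultdict


def build_knowledge_graph(facts):
    """Group (head, relationship, tail) facts by head entity."""
    graph = defaultdict(list)
    for head, relationship, tail in facts:
        graph[head].append((relationship, tail))
    return dict(graph)


def facts_about(graph, entity):
    """Return the outgoing (relationship, tail) pairs of an entity, or an empty list."""
    return graph.get(entity, [])


# Hypothetical facts of the kind that might be extracted from patent sentences.
example_facts = [
    ("fan", "comprises", "impeller"),
    ("impeller", "mounted on", "shaft"),
    ("fan", "driven by", "motor"),
]
example_graph = build_knowledge_graph(example_facts)
```

Facts from different documents can be merged simply by concatenating their triple lists before building the graph, which is one plausible reading of combining facts "within and across patent documents".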
# Soft Gripping: 信頼性の特定 Soft Gripping: Specifying for Trustworthiness ( http://arxiv.org/abs/2307.01159v2 ) ライセンス: Link先を確認	Dhaminda B. Abeywickrama, Nguyen Hao Le, Greg Chance, Peter D. Winter, Arianna Manzini, Alix J. Partridge, Jonathan Ives, John Downer, Graham Deacon, Jonathan Rossiter, Kerstin Eder, Shane Windsor	(参考訳) ソフトロボティクス(soft robotics)は、エンジニアがさまざまなアプリケーションで使える柔軟なデバイスを作る新しい技術である。ソフトロボットを広く採用するためには、その信頼性を保証することが不可欠である。信頼性を示すためには、仕様を定式化し、信頼できるものを定義する必要があります。しかし、ソフトロボティクスにおいて最も成熟した分野の一つであるソフトロボットグリッパーでさえ、ソフトロボティクスのコミュニティは、フォーメーション仕様にほとんど関心を示さなかった。本稿では,ソフトロボットシステムの開発における仕様開発の重要性について検討し,食料品のピックアップ・アンド・プレースタスクのためのソフトグリッパーの広範な例を示す。提案された仕様は、信頼性、安全性、適応性、予測可能性、倫理、規制など、機能的および非機能的要件の両方をカバーする。また,ソフトグリップの設計において,第一級の目的として検証可能性を促進する必要性を強調した。 Soft robotics is an emerging technology in which engineers create flexible devices for use in a variety of applications. In order to advance the wide adoption of soft robots, ensuring their trustworthiness is essential; if soft robots are not trusted, they will not be used to their full potential. In order to demonstrate trustworthiness, a specification needs to be formulated to define what is trustworthy. However, even for soft robotic grippers, which is one of the most mature areas in soft robotics, the soft robotics community has so far given very little attention to formulating specifications. In this work, we discuss the importance of developing specifications during development of soft robotic systems, and present an extensive example specification for a soft gripper for pick-and-place tasks for grocery items. The proposed specification covers both functional and non-functional requirements, such as reliability, safety, adaptability, predictability, ethics, and regulations. We also highlight the need to promote verifiability as a first-class objective in the design of a soft gripper.	翻訳日:2023-11-02 02:05:46 公開日:2023-10-30
# 動的システムの楽観的アクティブ探索 Optimistic Active Exploration of Dynamical Systems ( http://arxiv.org/abs/2306.12371v2 ) ライセンス: Link先を確認	Bhavya Sukhija, Lenart Treven, Cansu Sancaktar, Sebastian Blaes, Stelian Coros, Andreas Krause	(参考訳) 強化学習アルゴリズムは、通常、特定のタスクを解決するためのポリシーを最適化しようとする。推定モデルが大域的にダイナミクスを近似し,ゼロショットで複数のダウンストリームタスクを解決できるように,未知の力学系を探索するにはどうすればよいのか? 本稿では,この課題に対して,アクティブな探索のためのアルゴリズムであるOPAXを開発した。 OPAXは、よく校正された確率モデルを用いて、未知のダイナミクスに関する認識的不確かさを定量化する。それは妥当なダイナミクスに関して楽観的に、未知のダイナミクスと状態観察の間の情報ゲインを最大化する。提案手法では, 結果の最適化問題を各エピソードで標準手法を用いて解くことができる最適制御問題に還元する方法を示す。一般モデルに対してアルゴリズムを解析し,ガウス過程のダイナミクスの場合,この種のものとしては初となるサンプル複雑性の上界を与え,認識的不確かさがゼロに収束することを示す。実験では,複数の環境でOPAXと他のヒューリスティックな探索手法との比較を行った。実験の結果,OPAXは理論的に健全であるだけでなく,新しい下流タスクのゼロショット計画にも有効であることがわかった。 Reinforcement learning algorithms commonly seek to optimize policies for solving one particular task. How should we explore an unknown dynamical system such that the estimated model globally approximates the dynamics and allows us to solve multiple downstream tasks in a zero-shot manner? In this paper, we address this challenge, by developing an algorithm -- OPAX -- for active exploration. OPAX uses well-calibrated probabilistic models to quantify the epistemic uncertainty about the unknown dynamics. It optimistically -- w.r.t. plausible dynamics -- maximizes the information gain between the unknown dynamics and state observations. We show how the resulting optimization problem can be reduced to an optimal control problem that can be solved at each episode using standard approaches. We analyze our algorithm for general models, and, in the case of Gaussian process dynamics, we give a first-of-its-kind sample complexity bound and show that the epistemic uncertainty converges to zero. In our experiments, we compare OPAX with other heuristic active exploration approaches on several environments. Our experiments show that OPAX is not only theoretically sound but also performs well for zero-shot planning on novel downstream tasks.	翻訳日:2023-11-02 02:05:29 公開日:2023-10-30
# HiNeRV:階層的エンコーディングに基づくニューラル表現によるビデオ圧縮 HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation ( http://arxiv.org/abs/2306.09818v2 ) ライセンス: Link先を確認	Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull	(参考訳) 学習ベースのビデオ圧縮は、現在一般的な研究テーマであり、従来の標準ビデオコーデックと競合する可能性を提供している。この文脈では、Implicit Neural Representations (INR) は以前、画像とビデオのコンテンツを表現し、圧縮するために用いられ、他の方法と比較して復号速度が比較的高い。しかし、既存のINRベースの手法では、ビデオ圧縮の最先端技術に匹敵する性能を達成できなかった。これは主に、その表現能力を制限する、採用されているネットワークアーキテクチャの単純さによる。本稿では,軽量層と新しい階層的位置符号化を組み合わせたINRであるHiNeRVを提案する。我々は,奥行き方向畳み込み層,MLP層,補間層を用いて,高容量で深く広いネットワークアーキテクチャを構築する。 HiNeRVはまた、フレームとパッチの両方でビデオをエンコードする統一表現であり、既存のメソッドよりも高いパフォーマンスと柔軟性を提供する。さらに、HiNeRVに基づくビデオコーデックと、トレーニング、プルーニング、量子化のための洗練されたパイプラインを構築し、非可逆モデル圧縮時のHiNeRVのパフォーマンスをよりよく保存する。提案手法は,ビデオ圧縮のためのUVGデータセットとMCL-JCVデータセットの両方で評価され,既存のすべてのINRベースラインを大幅に上回り、学習ベースコーデックと比較しても競争力のある性能を示した(UVGデータセットにおいて、PSNRで測定してHNeRVに対し72.3%、DCVCに対し43.4%のビットレート削減)。 Learning-based video compression is currently a popular research topic, offering the potential to compete with conventional standard video codecs. In this context, Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content, demonstrating relatively high decoding speed compared to other methods. However, existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression. This is mainly due to the simplicity of the employed network architectures, which limit their representation capability. In this paper, we propose HiNeRV, an INR that combines lightweight layers with novel hierarchical positional encodings. We employ depth-wise convolutional, MLP and interpolation layers to build a deep and wide network architecture with high capacity. HiNeRV is also a unified representation encoding videos in both frames and patches at the same time, which offers higher performance and flexibility than existing methods. 
We further build a video codec based on HiNeRV and a refined pipeline for training, pruning and quantization that can better preserve HiNeRV's performance during lossy model compression. The proposed method has been evaluated on both UVG and MCL-JCV datasets for video compression, demonstrating significant improvement over all existing INR baselines and competitive performance when compared to learning-based codecs (72.3% overall bit rate saving over HNeRV and 43.4% over DCVC on the UVG dataset, measured in PSNR).	翻訳日:2023-11-02 02:05:08 公開日:2023-10-30
# TensorNet:分子ポテンシャルの効率的な学習のためのモンテカルトテンソル表現 TensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials ( http://arxiv.org/abs/2306.06482v2 ) ライセンス: Link先を確認	Guillem Simeon, Gianni de Fabritiis	(参考訳) 分子系表現のための効率的な機械学習モデルの開発は、科学研究において重要である。我々は、デカルトテンソル表現を利用する革新的なo(3)同値なメッセージパッシングニューラルネットワークアーキテクチャであるtensornetを紹介する。カルトテンソル原子埋め込みを用いて、行列積演算により特徴混合を単純化する。さらに、これらのテンソルを回転群既約表現にコスト効率良く分解することで、必要に応じてスカラー、ベクトル、テンソルの分離処理が可能になる。高階球面テンソルモデルと比較して、TensorNetはパラメータが大幅に少ない最先端の性能を示す。小さな分子ポテンシャルエネルギーの場合、これは単一の相互作用層でも達成できる。これらの特性の結果として、モデルの計算コストは大幅に削減される。さらに、ポテンシャルエネルギーと力の上のベクトルとテンソル分子量の正確な予測が可能となる。要約すると、TensorNetのフレームワークは最先端の同変モデルの設計のための新しい空間を開く。 The development of efficient machine learning models for molecular systems representation is becoming crucial in scientific research. We introduce TensorNet, an innovative O(3)-equivariant message-passing neural network architecture that leverages Cartesian tensor representations. By using Cartesian tensor atomic embeddings, feature mixing is simplified through matrix product operations. Furthermore, the cost-effective decomposition of these tensors into rotation group irreducible representations allows for the separate processing of scalars, vectors, and tensors when necessary. Compared to higher-rank spherical tensor models, TensorNet demonstrates state-of-the-art performance with significantly fewer parameters. For small molecule potential energies, this can be achieved even with a single interaction layer. As a result of all these properties, the model's computational cost is substantially decreased. Moreover, the accurate prediction of vector and tensor molecular quantities on top of potential energies and forces is possible. In summary, TensorNet's framework opens up a new space for the design of state-of-the-art equivariant models.	翻訳日:2023-11-02 02:04:38 公開日:2023-10-30
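The decomposition mentioned in the abstract above can be illustrated for a single rank-2 Cartesian tensor. The sketch below uses the standard split of a 3x3 matrix into an isotropic (scalar) part, an antisymmetric (vector-like) part, and a symmetric traceless (tensor) part. It is a plain-Python illustration of that textbook identity, not code from TensorNet, and the function names are hypothetical.

```python
def decompose_rank2(x):
    """Split a 3x3 tensor into isotropic, antisymmetric and symmetric traceless parts."""
    size = 3
    trace = sum(x[i][i] for i in range(size))
    # Scalar part: one third of the trace on the diagonal.
    isotropic = [
        [trace / 3.0 if i == j else 0.0 for j in range(size)] for i in range(size)
    ]
    # Vector-like part: the antisymmetric component.
    antisymmetric = [
        [(x[i][j] - x[j][i]) / 2.0 for j in range(size)] for i in range(size)
    ]
    # Tensor part: the symmetric component with the trace removed.
    symmetric_traceless = [
        [(x[i][j] + x[j][i]) / 2.0 - isotropic[i][j] for j in range(size)]
        for i in range(size)
    ]
    return isotropic, antisymmetric, symmetric_traceless


def add_matrices(*matrices):
    """Element-wise sum of equally sized matrices."""
    rows = len(matrices[0])
    cols = len(matrices[0][0])
    return [
        [sum(m[i][j] for m in matrices) for j in range(cols)] for i in range(rows)
    ]


example = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
iso_part, anti_part, sym_part = decompose_rank2(example)
```

The three parts have 1, 3 and 5 independent components respectively, which together account for the 9 entries of the original matrix, so they can be processed separately and summed back without loss.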
# PoET:配列配列としてのタンパク質ファミリーの生成モデル PoET: A generative model of protein families as sequences-of-sequences ( http://arxiv.org/abs/2306.06156v2 ) ライセンス: Link先を確認	Timothy F. Truong Jr, Tristan Bepler	(参考訳) 生成タンパク質言語モデルは、望ましい機能を持つ新しいタンパク質を設計する自然な方法である。しかしながら、現在のモデルでは、特定の関心ファミリーからタンパク質を生産することは困難であるか、特定の関心ファミリーから大きな多重配列アライメント(MSA)を訓練する必要があるため、家族間での伝達学習の恩恵を受けられない。この問題に対処するために、我々は、何千万もの天然タンパク質配列の配列として関連タンパク質の集合を生成することを学ぶタンパク質ファミリー全体の自己回帰生成モデルである、$\textbf{P}$r$\textbf{o}$tein $\textbf{E}$volutionary $\textbf{T}$ransformer (PoET)を提案する。 PoETは、関心のあるタンパク質ファミリーで条件付けられた任意の変更を生成し、スコア付けするための検索強化言語モデルとして使用することができ、短いコンテキスト長から外挿して、小さなファミリーでもうまく一般化することができる。これはユニークなトランスフォーマー層によって実現されており、シーケンス間の順序を不変に保ちながらシーケンス内でトークンを逐次モデル化することで、トレーニング中に使用されるもの以上のコンテキスト長にスケールすることができる。 PoETは、深部突然変異スキャンデータセットに関する広範な実験において、既存のタンパク質言語モデルと変異関数予測のための進化的シーケンスモデルより優れており、すべてのMSA深さのタンパク質間の変異効果予測を改善している。 Generative protein language models are a natural way to design new proteins with desired functions. However, current models are either difficult to direct to produce a protein from a specific family of interest, or must be trained on a large multiple sequence alignment (MSA) from the specific family of interest, making them unable to benefit from transfer learning across families. To address this, we propose $\textbf{P}$r$\textbf{o}$tein $\textbf{E}$volutionary $\textbf{T}$ransformer (PoET), an autoregressive generative model of whole protein families that learns to generate sets of related proteins as sequences-of-sequences across tens of millions of natural protein sequence clusters. PoET can be used as a retrieval-augmented language model to generate and score arbitrary modifications conditioned on any protein family of interest, and can extrapolate from short context lengths to generalize well even for small families. 
This is enabled by a unique Transformer layer; we model tokens sequentially within sequences while attending between sequences order invariantly, allowing PoET to scale to context lengths beyond those used during training. PoET outperforms existing protein language models and evolutionary sequence models for variant function prediction in extensive experiments on deep mutational scanning datasets, improving variant effect prediction across proteins of all MSA depths.	翻訳日:2023-11-02 02:04:28 公開日:2023-10-30
# 量的回帰による反事実推論の進展 Advancing Counterfactual Inference through Quantile Regression ( http://arxiv.org/abs/2306.05751v2 ) ライセンス: Link先を確認	Shaoan Xie, Biwei Huang, Bin Gu, Tongliang Liu, Kun Zhang	(参考訳) 因果的影響を理解し、利用するためには、反事実的な「what if」問合せに対処する能力が不可欠である。従来の反事実推論は通常、構造因果モデルが利用可能であると仮定する。しかし、実際にはそのような因果モデルはしばしば未知であり、識別できない可能性がある。本稿では,与えられた因果モデルや条件分布の直接推定を必要とせずに,定性的因果構造と観測データに基づく信頼性の高い反事実推論を行うことを目的とする。我々は、反事実推論を拡張量子回帰問題として再考し、ディープニューラルネットワークを用いて一般的な因果関係とデータ分布を捉える。提案手法は, 既存のデータと比較して優れた統計効率を示し, さらに, 推定値の非認識データへの一般化の可能性を高め, 一般化誤差の上限を与える。複数のデータセットで実施した実証実験の結果は、我々の理論的な主張に対する説得力のある支持を提供する。 The capacity to address counterfactual "what if" inquiries is crucial for understanding and making use of causal influences. Traditional counterfactual inference usually assumes the availability of a structural causal model. Yet, in practice, such a causal model is often unknown and may not be identifiable. This paper aims to perform reliable counterfactual inference based on the (learned) qualitative causal structure and observational data, without necessitating a given causal model or even the direct estimation of conditional distributions. We re-cast counterfactual reasoning as an extended quantile regression problem, implemented with deep neural networks to capture general causal relationships and data distributions. The proposed approach offers superior statistical efficiency compared to existing ones, and further, it enhances the potential for generalizing the estimated counterfactual outcomes to previously unseen data, providing an upper bound on the generalization error. Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.	翻訳日:2023-11-02 02:03:56 公開日:2023-10-30
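As background for the abstract above, the basic building block of quantile regression is the pinball (quantile) loss. The sketch below shows only that generic loss and how minimising it recovers an empirical quantile; it does not reproduce the paper's extended formulation or its neural network, and the function names are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball loss at quantile level tau, with 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


def best_constant_quantile(samples, tau):
    """Pick the sample value that minimises the pinball loss as a constant prediction."""
    return min(
        samples,
        key=lambda q: pinball_loss(samples, [q] * len(samples), tau),
    )
```

For example, with the samples 1.0 through 10.0 and `tau=0.25`, the constant that minimises the loss is 3.0, which sits at the lower quartile of the data.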
# 知識集約型タスクにおける小言語モデルの知識強化推論蒸留 Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks ( http://arxiv.org/abs/2305.18395v2 ) ライセンス: Link先を確認	Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang	(参考訳) 大規模言語モデル(LLM)は、知識の複雑な理解を必要とする知識集約的推論タスクにおいて、有望な性能を示す。しかし、LLMの実際のアプリケーションへの展開は、高い計算要求とデータプライバシに関する懸念のために困難である可能性がある。従来の研究は、ラベル付きデータで微調整したり、LLMを蒸留することで、タスク固有小言語モデル(LM)の構築に重点を置いてきた。しかしながら、これらのアプローチは、必要となる知識を記憶する小さなLMの能力に制限があるため、知識集約的推論タスクには不向きである。記憶に関する理論的解析に動機づけられ、外部知識ベースから検索した知識で拡張したうえで、LLMから得られる根拠(rationale)を生成するように小さなLMを微調整する新しい手法であるKARD(Knowledge-Augmented Reasoning Distillation)を提案する。さらに,根拠生成に関連する文書を得るためのニューラルリランカも提案する。我々は、KARDが知識集約推論データセットであるMedQA-USMLE、StrategyQA、OpenbookQAにおいて、小さなT5およびGPTモデルの性能を著しく向上させることを示す。特に,本手法により250MのT5モデルは、MedQA-USMLEとStrategyQAの両ベンチマークにおいて、12倍のパラメータを持つ微調整済み3Bモデルを上回る性能を達成する。 Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to their high computational requirements and concerns on data privacy. Previous studies have focused on building task-specific small Language Models (LMs) by fine-tuning them with labeled data or distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks due to the limited capacity of small LMs in memorizing the knowledge required. Motivated by our theoretical analysis on memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales obtained from LLMs with augmented knowledge retrieved from an external knowledge base. Moreover, we further propose a neural reranker to obtain documents relevant to rationale generation. 
We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets, namely MedQA-USMLE, StrategyQA, and OpenbookQA. Notably, our method enables the 250M T5 models to outperform the fine-tuned 3B models, which have 12 times as many parameters, on both MedQA-USMLE and StrategyQA benchmarks.	翻訳日:2023-11-02 02:02:30 公開日:2023-10-30
# グローバル深層学習による治療反応予測と患者特異的薬物動態予測 Forecasting Response to Treatment with Global Deep Learning and Patient-Specific Pharmacokinetic Priors ( http://arxiv.org/abs/2309.13135v4 ) ライセンス: Link先を確認	Willa Potosnak, Cristian Challu, Kin G. Olivares, Artur Dubrawski	(参考訳) 予後の早期発見や患者のモニタリングには,医療時系列の予測が不可欠である。しかし、ノイズや間欠的なデータのために予測が難しい場合がある。これらの課題は、薬物投与などの外因性要因によって引き起こされる変化点によって、しばしば悪化する。これらの課題に対処するために,患者固有の治療効果の深層学習モデルを示す,新しいグローバルローカルアーキテクチャと薬物動態エンコーダを提案する。現実的にシミュレーションされた実世界データと実世界データの両方を用いて,血糖予測タスクの精度向上に向けたアプローチの有効性を示す。我々のグローバルローカルアーキテクチャは患者固有のモデルよりも9.2-14.6%改善している。さらに、我々の薬物動態エンコーダは、シミュレーションデータでは4.4%、実世界のデータでは2.1%で代替符号化技術よりも改善されている。提案手法は, 予期せぬ治療反応に対する早期警告の発行や, 薬物吸収および除去特性の観点から, 患者固有の治療効果を特徴付けるなど, 臨床実践において有益である。 Forecasting healthcare time series is crucial for early detection of adverse outcomes and for patient monitoring. Forecasting, however, can be difficult in practice due to noisy and intermittent data. The challenges are often exacerbated by change points induced via extrinsic factors, such as the administration of medication. To address these challenges, we propose a novel hybrid global-local architecture and a pharmacokinetic encoder that informs deep learning models of patient-specific treatment effects. We showcase the efficacy of our approach in achieving significant accuracy gains for a blood glucose forecasting task using both realistically simulated and real-world data. Our global-local architecture improves over patient-specific models by 9.2-14.6%. Additionally, our pharmacokinetic encoder improves over alternative encoding techniques by 4.4% on simulated data and 2.1% on real-world data. The proposed approach can have multiple beneficial applications in clinical practice, such as issuing early warnings about unexpected treatment responses, or helping to characterize patient-specific treatment effects in terms of drug absorption and elimination characteristics.	翻訳日:2023-11-02 01:54:04 公開日:2023-10-30
# テンソルネットワークによる位相双対性 Topological dualities via tensor networks ( http://arxiv.org/abs/2309.13118v2 ) ライセンス: Link先を確認	C. Wille, J. Eisert, A. Altland	(参考訳) トーリック符号の基底状態、二次元クラスd超伝導体の基底状態、および二次元イジングモデルの分割和は互いに双対である。この双対性は、物理学の様々な分野に共通するシステム、すなわち、長い範囲の絡み合った位相秩序、(位相)バンド絶縁体、そして古典的な統計力学を結び付けるため、目覚ましい。フェルミオン系とボソニック系をつなぐ双対性構成は本質的に非局所的であり、1次元への次元還元、共形場理論法、作用素代数など様々なアプローチで対処されている。本研究では,この双対性に対する一元的アプローチを提案し,その主主人公がテンソルネットワーク(tn)であり,中間翻訳者の役割を仮定する。双対性のネットに4番目のノードを導入すると、以下の利点が得られる: 定式化は、双対性のすべてのリンクが等しい基底で扱われること、(場の理論的なアプローチとは異なり)格子の精度で定式化されること、相関関数のマッピングにおいて鍵となる特徴、そしてそれらの可能な数値的実装である。最後に、ボソンからフェルミオンへの通過は、直感的で技術的に便利な形式を仮定する2次元のTNフレームワークで完全に定式化される。本稿では, 位相遷移, 点・線欠陥, 位相境界モード, およびシステムクラス間のマッピング下での他の構造の運命を探ることにより, 形式化の予測可能性を示す。物質リーダシップを念頭に置いて,tnsの概念への最小限の親和性のみを前提として,教育的に構築を紹介する。 The ground state of the toric code, that of the two-dimensional class D superconductor, and the partition sum of the two-dimensional Ising model are dual to each other. This duality is remarkable inasmuch as it connects systems commonly associated to different areas of physics -- that of long range entangled topological order, (topological) band insulators, and classical statistical mechanics, respectively. Connecting fermionic and bosonic systems, the duality construction is intrinsically non-local, a complication that has been addressed in a plethora of different approaches, including dimensional reduction to one dimension, conformal field theory methods, and operator algebra. In this work, we propose a unified approach to this duality, whose main protagonist is a tensor network (TN) assuming the role of an intermediate translator. Introducing a fourth node into the net of dualities offers several advantages: the formulation is integrative in that all links of the duality are treated on an equal footing, (unlike in field theoretical approaches) it is formulated with lattice precision, a feature that becomes key in the mapping of correlation functions, and their possible numerical implementation. 
Finally, the passage from bosons to fermions is formulated entirely within the two-dimensional TN framework where it assumes an intuitive and technically convenient form. We illustrate the predictive potential of the formalism by exploring the fate of phase transitions, point and line defects, topological boundary modes, and other structures under the mapping between system classes. Having condensed matter readerships in mind, we introduce the construction pedagogically in a manner assuming only minimal familiarity with the concept of TNs.	翻訳日:2023-11-02 01:53:45 公開日:2023-10-30
# 長距離相互作用系におけるロバスト量子多体傷の理論 Theory of robust quantum many-body scars in long-range interacting systems ( http://arxiv.org/abs/2309.12504v2 ) ライセンス: Link先を確認	Alessio Lerose, Tommaso Parolini, Rosario Fazio, Dmitry A. Abanin, Silvia Pappalardi	(参考訳) 量子多体傷(Quantum many-body scars、QMBS)は、特別な非平衡初期状態に対する熱化の違反に関連する量子多体系の例外的なエネルギー固有状態である。彼らの様々な体系的構成は局所ハミルトニアンパラメータの微調整を必要とする。本研究では、長距離相互作用する量子スピン系の設定が、一般に堅牢なQMBSをホストすることを示す。我々は、可解な置換対称極限$\alpha=0$からスピンスピン相互作用のパワー-ロー減衰指数$\alpha$を上げる際のスペクトル特性を解析する。まず、カオスのスペクトル符号が無限小$\alpha$に対して現れるにもかかわらず、大きな集合スピンを持つ$\alpha=0$エネルギー固有状態の塔は、$\alpha$の増加とともに滑らかに変形し、特徴的なQMBS特性を示すことを数値的に証明する。より大きな系におけるこれらの状態の性質と運命を明らかにするために、スピンハミルトニアンを相対論的量子回転子に非線型結合した広範なボソニックモードにマッピングする解析的アプローチを導入する。相互作用する不純物モデルの固有状態を正確に解き、原ハミルトニアンの大スピンセクターにおける自己整合局在を$0<\alpha<d$で示す。本理論は, 任意の系サイズに対するqmbの安定性機構を明らかにし, 動的臨界点近傍や半古典的カオスの存在を予測し, 長距離量子イジングチェーンにおいて数値的に検証する。副生成物として、Floquet-prethermalization定理を超えて、周期駆動下での加熱の有無の予測基準が$0<\alpha<d$である。この作業のより広い視点は、ここで開発された技術ツールボックスの独立した応用から、実験ルートの通知から、メトロロジー的に有用なマルチパートの絡み合いまで幅広い。 Quantum many-body scars (QMBS) are exceptional energy eigenstates of quantum many-body systems associated with violations of thermalization for special non-equilibrium initial states. Their various systematic constructions require fine-tuning of local Hamiltonian parameters. In this work we demonstrate that the setting of long-range interacting quantum spin systems generically hosts robust QMBS. We analyze spectral properties upon raising the power-law decay exponent $\alpha$ of spin-spin interactions from the solvable permutationally-symmetric limit $\alpha=0$. First, we numerically establish that despite spectral signatures of chaos appear for infinitesimal $\alpha$, the towers of $\alpha=0$ energy eigenstates with large collective spin are smoothly deformed as $\alpha$ is increased, and exhibit characteristic QMBS features. 
To elucidate the nature and fate of these states in larger systems, we introduce an analytical approach based on mapping the spin Hamiltonian onto a relativistic quantum rotor non-linearly coupled to an extensive set of bosonic modes. We exactly solve for the eigenstates of this interacting impurity model, and show their self-consistent localization in large-spin sectors of the original Hamiltonian for $0<\alpha<d$. Our theory unveils the stability mechanism of such QMBS for arbitrary system size and predicts instances of its breakdown e.g. near dynamical critical points or in presence of semiclassical chaos, which we verify numerically in long-range quantum Ising chains. As a byproduct, we find a predictive criterion for presence or absence of heating under periodic driving for $0<\alpha<d$, beyond existing Floquet-prethermalization theorems. Broader perspectives of this work range from independent applications of the technical toolbox developed here to informing experimental routes to metrologically useful multipartite entanglement.	翻訳日:2023-11-02 01:53:13 公開日:2023-10-30
# 選択アーティファクトとしてのベル相関 Bell Correlations as Selection Artefacts ( http://arxiv.org/abs/2309.10969v2 ) ライセンス: Link先を確認	Huw Price and Ken Wharton	(参考訳) ベル相関は、当該実験の初期状態の通常の制御によって生じる特別な種類の選択アーティファクトとして生じうることを示す。これにより、直接的な空間的(spacelike)な因果関係や影響に頼ることなく、非局所性が説明される。この議論は、(arXiv:2101.05370v4 [quant-ph], arXiv:2212.06986 [quant-ph]) における以前の提案を2つの主な点で改善する。 (i) 実際のベル実験への適用を示すこと、および (ii) 逆向き因果(retrocausality)の仮定を不要にすること。 We show that Bell correlations may arise as a special sort of selection artefact, produced by ordinary control of the initial state of the experiments concerned. This accounts for nonlocality, without recourse to any direct spacelike causality or influence. The argument improves an earlier proposal in (arXiv:2101.05370v4 [quant-ph], arXiv:2212.06986 [quant-ph]) in two main respects: (i) in demonstrating its application in a real Bell experiment; and (ii) in avoiding the need for a postulate of retrocausality.	翻訳日:2023-11-02 01:52:38 公開日:2023-10-30
# テキストから画像への空間制御のためのマスキング・アテンション拡散指導 Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation ( http://arxiv.org/abs/2308.06027v2 ) ライセンス: Link先を確認	Yuki Endo	(参考訳) テキストから画像への合成は,最近の拡散モデルの発展に伴い,高品質な結果が得られた。しかし、テキスト入力だけでは空間的曖昧性が高く、ユーザー制御性は限られている。既存の手法では、視覚誘導(スケッチやセマンティックマスクなど)の追加による空間制御が可能だが、注釈付き画像による追加の訓練が必要となる。本稿では,拡散モデルのさらなる訓練を行わずにテキスト対画像生成を空間的に制御する手法を提案する。本手法は,クロスアテンションマップが単語と画素の位置関係を反映しているという知見に基づく。我々の目的は、与えられたセマンティックマスクやテキストプロンプトに従ってアテンションマップを制御することである。この目的のために、まず、意味領域から計算された定数マップと交差注意マップを直接置き換える簡単なアプローチを探求する。いくつかの先行研究は、クロスアテンションマップを直接操作することで、テキストと画像の拡散モデルのトレーニング不要な空間制御を可能にする。しかし、これらのアプローチは、操作された注意マップが拡散モデルによって学習された実際のものとは程遠いため、与えられたマスクに対する誤解に苦しめられている。この問題に対処するために,拡散モデルに入力された雑音画像を操作することで,各単語や画素への注意を間接的に制御することで,セマンティックマスクに忠実な画像を生成するマスク注意誘導を提案する。 masked-attention guidanceは、事前訓練されたオフザシェルフ拡散モデル(例えば、安定拡散)に容易に統合でき、テキスト誘導画像編集のタスクに適用できる。実験により,本手法は質的および定量的にベースラインよりも高精度な空間制御が可能となった。 Text-to-image synthesis has achieved high-quality results with recent advances in diffusion models. However, text input alone has high spatial ambiguity and limited user controllability. Most existing methods allow spatial control through additional visual guidance (e.g., sketches and semantic masks) but require additional training with annotated images. In this paper, we propose a method for spatially controlling text-to-image generation without further training of diffusion models. Our method is based on the insight that the cross-attention maps reflect the positional relationship between words and pixels. Our aim is to control the attention maps according to given semantic masks and text prompts. To this end, we first explore a simple approach of directly swapping the cross-attention maps with constant maps computed from the semantic regions. Some prior works also allow training-free spatial control of text-to-image diffusion models by directly manipulating cross-attention maps. 
However, these approaches still suffer from misalignment to given masks because manipulated attention maps are far from actual ones learned by diffusion models. To address this issue, we propose masked-attention guidance, which can generate images more faithful to semantic masks via indirect control of attention to each word and pixel by manipulating noise images fed to diffusion models. Masked-attention guidance can be easily integrated into pre-trained off-the-shelf diffusion models (e.g., Stable Diffusion) and applied to the tasks of text-guided image editing. Experiments show that our method enables more accurate spatial control than baselines qualitatively and quantitatively.	翻訳日:2023-11-02 01:51:47 公開日:2023-10-30
# 視覚変換器を用いたマルチモーダルからモノモーダルリンパ腫サブタイプモデルへの知識伝達フレームワーク A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models ( http://arxiv.org/abs/2308.01328v2 ) ライセンス: Link先を確認	Bilel Guetarni, Feryal Windal, Halim Benhabiles, Marianne Petit, Romain Dubois, Emmanuelle Leteurtre, Dominique Collard	(参考訳) リンパ腫の亜型を決定することは、生存可能性を高めるためにより良い治療を目標とする患者にとって重要なステップである。この文脈では、遺伝子発現技術に基づく既存のゴールド標準診断法は、高いコストと時間を要するため、アクセシビリティが困難である。 ihc(免疫組織化学)技術に基づく代替診断法(whoが推奨する)は存在するが、同様の制限があり、正確性は低い。深層学習モデルによるWSI(Whole Slide Image)分析では、既存の代替手法よりも安価で高速ながん診断の新しい方向性が示された。本研究では,高分解能wsisとdlbcl(diffuse large b-cell lymphoma)癌サブタイプを区別するためのビジョントランスフォーマティブに基づく枠組みを提案する。この目的のために,様々なWSIモダリティから分類器モデルを訓練するためのマルチモーダルアーキテクチャを提案する。そして,このモデルを知識蒸留機構を用いて,モノモーダル分類器の学習を効率的に進める。 157人の患者を対象に行った実験では, がん分類に関する最新の6つの手法を上回って, モノモーダル分類モデルの有望な性能を示した。さらに、本実験データから推定したパワーロー曲線から、適切な数の追加患者からのトレーニングデータが増えると、我々のモデルは、ICH技術と同等の精度で診断できる可能性が示唆された。 Determining lymphoma subtypes is a crucial step for better patients treatment targeting to potentially increase their survival chances. In this context, the existing gold standard diagnosis method, which is based on gene expression technology, is highly expensive and time-consuming making difficult its accessibility. Although alternative diagnosis methods based on IHC (immunohistochemistry) technologies exist (recommended by the WHO), they still suffer from similar limitations and are less accurate. WSI (Whole Slide Image) analysis by deep learning models showed promising new directions for cancer diagnosis that would be cheaper and faster than existing alternative methods. In this work, we propose a vision transformer-based framework for distinguishing DLBCL (Diffuse Large B-Cell Lymphoma) cancer subtypes from high-resolution WSIs. To this end, we propose a multi-modal architecture to train a classifier model from various WSI modalities. 
We then exploit this model through a knowledge distillation mechanism for efficiently driving the learning of a mono-modal classifier. Our experimental study conducted on a dataset of 157 patients shows the promising performance of our mono-modal classification model, outperforming six recent state-of-the-art methods dedicated to cancer classification. Moreover, the power-law curve, estimated on our experimental data, suggests that with more training data from a reasonable number of additional patients, our model has the potential to achieve diagnostic accuracy comparable to that of IHC technologies.	翻訳日:2023-11-02 01:50:26 公開日:2023-10-30
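As a rough illustration of the knowledge distillation step mentioned in the abstract above, the following Python sketch computes a generic temperature-softened distillation loss between a teacher (standing in for the multi-modal model) and a student (standing in for the mono-modal model). The abstract does not state which loss the authors use, so the function names, the temperature value, and the example logits are all assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    Hypothetical generic loss for illustration; not taken from the paper.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor is the usual rescaling that keeps gradient magnitudes comparable.
    return (temperature ** 2) * kl


# Made-up logits: one student agrees with the teacher, the other does not.
teacher = [2.0, 0.5, -1.0]
student_close = [1.8, 0.6, -0.9]
student_far = [-1.0, 0.5, 2.0]
loss_close = distillation_loss(teacher, student_close)
loss_far = distillation_loss(teacher, student_far)
```

In this sketch a student whose predictions track the teacher's receives a smaller loss than one that disagrees, which is the signal a distillation scheme of this kind would use to guide the mono-modal classifier.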
# MLIC++: 学習画像圧縮のための線形複雑性マルチリファレンスエントロピーモデリング MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression ( http://arxiv.org/abs/2307.15421v3 ) ライセンス: Link先を確認	Wei Jiang, Ronggang Wang	(参考訳) 近年,チャネルワイド,局所空間,大域空間相関を捉えるマルチ参照エントロピーモデルが提案されている。以前の研究では、大域的相関の捕捉にアテンションを採用しているが、その二次複雑性は高解像度画像符号化の可能性を制限する。本稿では,softmax 操作の分解を通じて,線形複雑性で大域的相関を捉える手法を提案する。これに基づき,マルチ参照エントロピーモデリングのための線形複雑度を持つ学習画像圧縮手法MLIC$^{++}$を提案する。我々のMLIC$^{++}$はより効率的で、PSNRで測定した場合のVTM-17.0と比較して、KodakデータセットのBDレートを13.39%削減する。コードはhttps://github.com/JiangWeibeta/MLICで入手できる。 Recently, the multi-reference entropy model has been proposed, which captures channel-wise, local spatial, and global spatial correlations. Previous works adopt attention for global correlation capturing; however, its quadratic complexity limits the potential of high-resolution image coding. In this paper, we propose capturing global correlations with linear complexity via a decomposition of the softmax operation. Based on this, we propose MLIC$^{++}$, a learned image compression method with linear complexity for multi-reference entropy modeling. Our MLIC$^{++}$ is more efficient and it reduces BD-rate by 13.39% on the Kodak dataset compared to VTM-17.0 when measured in PSNR. Code is available at https://github.com/JiangWeibeta/MLIC.	翻訳日:2023-11-02 01:49:59 公開日:2023-10-30
# モデルベースツリーマルコフモデルを用いた透明シーケンスモデルに向けて Toward Transparent Sequence Models with Model-Based Tree Markov Model ( http://arxiv.org/abs/2307.15367v2 ) ライセンス: Link先を確認	Chan Hsu, Wei-Chun Huang, Jun-Ting Wu, Chih-Yuan Li, Yihuang Kang	(参考訳) 本研究では,シーケンスデータに適用した複雑なブラックボックス機械学習モデルにおける解釈可能性の問題に対処する。モデルベース木隠れセミマルコフモデル(MOB-HSMM)は,高死亡リスク事象の検出と集中治療室(ICU)の死亡リスクに関連する隠れパターンの発見を目的とした,本質的に解釈可能なモデルである。このモデルは、Deep Neural Networks (DNN)から抽出した知識を活用し、明確な説明を提供しながら予測性能を向上させる。実験の結果,モデルベースツリー(MOB木)の性能はLSTMを用いて逐次パターンを学習し,MOB木に転送することで向上した。 MOB-HSMMでHidden Semi-Markov Model (HSMM) とMOBツリーを統合することで、利用可能な情報を用いて潜在的および説明可能なシーケンスを明らかにすることができる。 In this study, we address the interpretability issue in complex, black-box Machine Learning models applied to sequence data. We introduce the Model-Based tree Hidden Semi-Markov Model (MOB-HSMM), an inherently interpretable model aimed at detecting high mortality risk events and discovering hidden patterns associated with the mortality risk in Intensive Care Units (ICU). This model leverages knowledge distilled from Deep Neural Networks (DNN) to enhance predictive performance while offering clear explanations. Our experimental results indicate the improved performance of Model-Based trees (MOB trees) via employing LSTM for learning sequential patterns, which are then transferred to MOB trees. Integrating MOB trees with the Hidden Semi-Markov Model (HSMM) in the MOB-HSMM enables uncovering potential and explainable sequences using available information.	翻訳日:2023-11-02 01:49:42 公開日:2023-10-30
# 超広帯域における量子情報の多重処理 Multiplexed Processing of Quantum Information Across an Ultra-wide Optical Bandwidth ( http://arxiv.org/abs/2310.17819v2 ) ライセンス: Link先を確認	Alon Eldan, Ofek Gilon, Asher Lagimi, Elai Forman, Avi Pe'er	(参考訳) 量子情報処理は量子技術の基礎である。量子情報のプロトコルは、セキュアな通信(量子鍵分布)、テレポート量子状態、および量子計算の中心となる2つの遠隔者間で秘密を共有する。様々な量子通信プロトコルがすでに実現され、商用化されているが、その通信速度は一般的には、利用可能な量子光学光源(10-100 THz)の光帯域よりも低いMHzからGHzの範囲における測定装置の狭い電子帯域幅によって制限されている。本稿では、パラメトリックホモダイン検出による全チャネルの同時測定により、これらのブロードバンドソースを並列に多重周波数チャネル上に並列に処理する効率的な方法を提案する。具体的には、多重連続可変量子鍵分布(CV-QKD)と多重連続可変量子テレポーテーションプロトコルの2つの基本プロトコルを提案する。そこで本研究では,23以上の非相関スペクトルチャネルに対するqkdの検証に成功し,いずれにおいても盗聴を検知する能力を示した。これらの多重化手法(および類似)は、数百のチャネル上で並列に量子処理を実行し、量子プロトコルのスループットを桁違いに増加させる可能性がある。 Quantum information processing is the foundation of quantum technology. Protocols of quantum information share secrets between two distant parties for secure communication (quantum key distribution), teleport quantum states, and stand at the heart of quantum computation. While various protocols of quantum communication have already been realized, and even commercialized, their communication speed is generally low, limited by the narrow electronic bandwidth of the measurement apparatus in the MHz-to-GHz range, which is orders-of-magnitude lower than the optical bandwidth of available quantum optical sources (10-100 THz). We present and demonstrate an efficient method to process quantum information with such broadband sources in parallel over multiplexed frequency channels using parametric homodyne detection for simultaneous measurement of all the channels. Specifically, we propose two basic protocols: A multiplexed Continuous-Variable Quantum Key Distribution (CV-QKD) and A multiplexed continuous-variable quantum teleportation protocol. We demonstrate the multiplexed CV-QKD protocol in a proof-of-principle experiment, where we successfully carry out QKD over 23 uncorrelated spectral channels and show the ability to detect eavesdropping in any of them. 
These multiplexed methods (and similar ones) will enable quantum processing to be carried out in parallel over hundreds of channels, potentially increasing the throughput of quantum protocols by orders of magnitude.	翻訳日:2023-11-02 01:41:14 公開日:2023-10-30
# 正準量子化はGKSL力学につながるか? Does canonical quantization lead to GKSL dynamics? ( http://arxiv.org/abs/2310.17061v2 ) ライセンス: Link先を確認	T. Koide and F. Nicacio	(参考訳) 熱力学的に一貫した熱緩和過程を記述するためのブラウン運動の一般化された古典モデルを導入する。このモデルに正準量子化を適用すると、密度演算子の量子方程式が得られる。この方程式は定常解として熱平衡状態を持つが、時間進化は必ずしも完全正のトレース保存(CPTP)写像であるとは限らない。しかし、高調波振動子ポテンシャルの適用においては、CPTPマップの要件はパラメータの選択によって適切に満たされ、その後、詳細なバランス条件を満たすGorini-Kossakowski-Sudarshan-Lindblad(GKSL)方程式を再現する。この結果は、熱緩和過程における量子古典的対応を示唆し、デコヒーレンスの研究に新たな洞察を与える。 We introduce a generalized classical model of Brownian motion for describing thermal relaxation processes which is thermodynamically consistent. Applying the canonical quantization to this model, a quantum equation for the density operator is obtained. This equation has a thermal equilibrium state as its stationary solution, but the time evolution is not necessarily a Completely Positive and Trace-Preserving (CPTP) map. In the application to the harmonic oscillator potential, however, the requirement of the CPTP map is shown to be satisfied by choosing parameters appropriately and then our equation reproduces a Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) equation satisfying the detailed balance condition. This result suggests a quantum-classical correspondence in thermal relaxation processes and will provide a new insight to the study of decoherence.	翻訳日:2023-11-02 01:40:52 公開日:2023-10-30
# ディープスパースネットワークのためのハイブリッド粒度特徴相互作用選択に向けて Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network ( http://arxiv.org/abs/2310.15342v2 ) ライセンス: Link先を確認	Fuyuan Lyu, Xing Tang, Dugang Liu, Chen Ma, Weihong Luo, Liang Chen, Xiuqiang He, Xue Liu	(参考訳) ディープスパースネットワークは,高次元スパース特徴を有する予測タスクのためのニューラルネットワークアーキテクチャとして広く研究されている。従来の手法は主に粗粒度空間における特徴相互作用の探索方法に重点を置いていたが、より細かい粒度にはあまり注意が払われていない。本研究では,深層スパースネットワークにおける特徴フィールドと特徴値の両方を対象とする,ハイブリッド粒度の特徴相互作用選択手法を提案する。このような拡張空間を探索するために,その場で(on the fly)計算される分解空間を提案する。そこで我々はOptFeatureと呼ばれる選択アルゴリズムを開発し,特徴フィールドと特徴値の両方から特徴相互作用を同時に効率的に選択する。 3つの大規模な実世界のベンチマークデータセットの実験の結果、OptFeatureは精度と効率の点でよく機能していることが示された。さらなる研究が我々の方法の実現性を支持している。 Deep sparse networks are widely investigated as a neural network architecture for prediction tasks with high-dimensional sparse features, with which feature interaction selection is a critical component. While previous methods primarily focus on how to search feature interaction in a coarse-grained space, less attention has been given to a finer granularity. In this work, we introduce a hybrid-grained feature interaction selection approach that targets both feature field and feature value for deep sparse networks. To explore such expansive space, we propose a decomposed space which is calculated on the fly. We then develop a selection algorithm called OptFeature, which efficiently selects the feature interaction from both the feature field and the feature value simultaneously. Results from experiments on three large real-world benchmark datasets demonstrate that OptFeature performs well in terms of accuracy and efficiency. Additional studies support the feasibility of our method.	翻訳日:2023-11-02 01:40:12 公開日:2023-10-30
# 熱的パーセル効果とキャビティ誘起の散逸の繰り込み Thermal Purcell effect and cavity-induced renormalization of dissipations ( http://arxiv.org/abs/2310.15184v2 ) ライセンス: Link先を確認	Giuliano Chiriacò	(参考訳) 近年、キャビティ内に置かれた量子材料の性質と相を操作するためのツールとして、光学キャビティに大きな関心が寄せられている。パーセル効果のため、キャビティは光子相空間を変化させ、そのため材料内の電磁遷移速度を変化させ、光子環境との熱放射の交換速度を変化させる。ここでは, 物質が吸収する放射熱パワーの簡単な表式を導出し, キャビティの存在下でそれがどう変化するかを調べ, 適切なキャビティ形状では劇的に増強されることを示す。この効果を典型的なエネルギー散逸過程と比較し, キャビティに結合した材料の温度への影響を判定する基準を与え, それを1T-TaS$_2$に適用する。 In recent years there has been great interest towards optical cavities as a tool to manipulate the properties and phases of embedded quantum materials. Due to the Purcell effect, a cavity changes the photon phase space and thus the rate of electromagnetic transitions within the material, modifying the exchange rate of heat radiation with the photon environment. Here, I derive a simple expression for the radiative heat power absorbed by the material, investigate how it changes in the presence of a cavity and show that it is enhanced dramatically for appropriate cavity geometries. I compare this effect with typical energy dissipation processes, provide a criterion to establish its impact on the temperature of a material coupled to the cavity and apply it to 1T-TaS$_2$.	翻訳日:2023-11-02 01:39:56 公開日:2023-10-30
# エネルギー効率の良い基地局セルスイッチングのための適応動的プログラミング Adaptive Dynamic Programming for Energy-Efficient Base Station Cell Switching ( http://arxiv.org/abs/2310.12999v2 ) ライセンス: Link先を確認	Junliang Luo, Yi Tian Xu, Di Wu, Michael Jenkin, Xue Liu, Gregory Dudek	(参考訳) 次世代セルラーネットワークの需要の増加、環境・規制上の懸念、地政学的緊張から生じる潜在的なエネルギー危機などにより、無線ネットワークにおける省エネルギーの重要性が高まっている。本稿では,基地局のセルをオン/オフしてネットワーク電力消費量を削減し,qos(quality of service)メトリクスを維持しつつ,オンライン最適化と組み合わせた近似動的プログラミング(adp)ベースの手法を提案する。各状態-動作ペアに与えられた多層パーセプトロン(mlp)を用いて消費電力を予測し、最適な期待電力を節約した動作を選択するためのadpの値関数を近似する。 QoSを劣化させることなく最大の電力消費を抑えるため、QoSを予測するための別のMLPとハンドオーバを予測するための長期短期メモリ(LSTM)をオンライン最適化アルゴリズムに組み込み、QoS履歴に基づいてセル切替動作をフィルタリングする適応QoS閾値を生成する。本手法の性能は,動的トラヒックパターンを用いた実世界シナリオを用いた実用ネットワークシミュレータを用いて評価する。 Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-gen cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method coupled with online optimization to switch on/off the cells of base stations to reduce network power consumption while maintaining adequate Quality of Service (QoS) metrics. We use a multilayer perceptron (MLP) given each state-action pair to predict the power consumption to approximate the value function in ADP for selecting the action with optimal expected power saved. To save the largest possible power consumption without deteriorating QoS, we include another MLP to predict QoS and a long short-term memory (LSTM) for predicting handovers, incorporated into an online optimization algorithm producing an adaptive QoS threshold for filtering cell switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator with various real-world scenarios with dynamic traffic patterns.	翻訳日:2023-11-02 01:39:41 公開日:2023-10-30
# 頻度・重大度データを用いた保険価格決定のためのニューラルネットワーク:データ前処理から技術関税へのベンチマーク研究 Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff ( http://arxiv.org/abs/2310.12671v2 ) ライセンス: Link先を確認	Freek Holvoet, Katrien Antonio and Roel Henckaerts	(参考訳) 保険会社は通常、クレームの頻度と重大度データをモデル化するための一般化線形モデルに目を向ける。他の分野での成功により、アクチュアルなツールボックス内で機械学習技術が人気を集めている。本論文は,深層学習構造を用いた機械学習による周波数分割保険価格に関する文献に寄与する。本稿では,複数種類の入力特徴が存在する場合に,頻度と重大度を目標とした4つの保険データセットに関するベンチマーク研究を行う。本研究では,バイナリ入力データに対する一般化線形モデル,勾配ブースト木モデル,フィードフォワードニューラルネットワーク(ffnn)および複合型アクチュアルニューラルネットワーク(cann)の性能比較を行った。我々のCANNは、それぞれGLMとGBMと確立されたベースライン予測とニューラルネットワークの補正を組み合わせる。本稿では, 郵便番号, 数値, カテゴリー共変量などの表型保険データに典型的に存在する複数の入力特徴に着目して, データ前処理のステップを説明する。オートエンコーダはニューラルネットワークにカテゴリ変数を埋め込むのに使われ、周波数重大設定でその潜在的な利点を探る。最後に,ニューラルネットの頻度と重大度モデルのためのグローバルサーロゲートモデルを構築した。これらのサロゲートは、FFNNやCANNが捉えた重要な洞察をGLMに翻訳することができる。そのため、技術的関税表は、実際に容易に展開できるものである。 Insurers usually turn to generalized linear models for modelling claim frequency and severity data. Due to their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of: a generalized linear model on binned input data, a gradient-boosted tree model, a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). Our CANNs combine a baseline prediction established with a GLM and GBM, respectively, with a neural network correction. We explain the data preprocessing steps with specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes, numeric and categorical covariates. 
Autoencoders are used to embed the categorical variables into the neural network and we explore their potential advantages in a frequency-severity setting. Finally, we construct global surrogate models for the neural nets' frequency and severity models. These surrogates enable the translation of the essential insights captured by the FFNNs or CANNs to GLMs. As such, a technical tariff table results that can easily be deployed in practice.	翻訳日:2023-11-02 01:39:20 公開日:2023-10-30
# コスト効果のあるTCR-Epitope結合親和性予測のためのアクティブラーニングフレームワーク Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction ( http://arxiv.org/abs/2310.10893v2 ) ライセンス: Link先を確認	Pengfei Zhang, Seojin Bang and Heewook Lee	(参考訳) T細胞受容体(TCR)は、宿主細胞表面に提示されるエピトープ配列を認識して脅威に応答する免疫系の重要な構成要素である。近年,機械/深層学習によるTCRとエピトープの結合親和性の計算的予測が注目されている。しかし、その成功は注釈付きtcr-epitopeペアの大規模なコレクションの欠如によって妨げられている。結合親和性を示すには、高価で時間を要するウェットラブの評価が必要である。アノテーションコストを削減するため,アクティブラーニングとTCR-epitopeバインディング親和性予測モデルを組み込んだActiveTCRを提案する。ラベル付きトレーニングペアの小さなセットから始めると、ActiveTCRはアノテーションの'worth'であるラベル付きTCR-epitopeペアを反復検索する。アノテーションのコストを最小化しながら、パフォーマンスの向上を最大化する。 4つのクエリ戦略をランダムサンプリングベースラインと比較し,activetcrがアノテーションコストを約40%削減できることを実証した。さらに,tcr-epitopeペアの基底的真理ラベルをクエリ戦略に提供することで,モデル性能を損なうことなく,すでに注釈付きペアの40%以上の冗長性を識別し,低減できることを示した。本研究はtcr-epitope結合親和性予測のためのデータ最適化に関する最初の体系的調査である。 T cell receptors (TCRs) are critical components of adaptive immune systems, responsible for responding to threats by recognizing epitope sequences presented on host cell surface. Computational prediction of binding affinity between TCRs and epitope sequences using machine/deep learning has attracted intense attention recently. However, its success is hindered by the lack of large collections of annotated TCR-epitope pairs. Annotating their binding affinity requires expensive and time-consuming wet-lab evaluation. To reduce annotation cost, we present ActiveTCR, a framework that incorporates active learning and TCR-epitope binding affinity prediction models. Starting with a small set of labeled training pairs, ActiveTCR iteratively searches for unlabeled TCR-epitope pairs that are ''worth'' for annotation. It aims to maximize performance gains while minimizing the cost of annotation. We compared four query strategies with a random sampling baseline and demonstrated that ActiveTCR reduces annotation costs by approximately 40%. 
Furthermore, we showed that providing ground truth labels of TCR-epitope pairs to query strategies can help identify and reduce more than 40% redundancy among already annotated pairs without compromising model performance, enabling users to train equally powerful prediction models with less training data. Our work is the first systematic investigation of data optimization for TCR-epitope binding affinity prediction.	翻訳日:2023-11-02 01:39:01 公開日:2023-10-30
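The iterative query loop described in the ActiveTCR abstract above can be sketched, under heavy simplification, as pool-based active learning with uncertainty sampling. The toy below replaces TCR-epitope pairs with points on a line and the binding-affinity model with a 1-D threshold classifier; the names `fit_threshold`, `active_learning`, and `oracle`, as well as the hidden boundary value, are hypothetical and not taken from the paper.

```python
def fit_threshold(labeled):
    """Fit a 1-D threshold: midpoint between the largest negative and smallest positive."""
    largest_negative = max(x for x, y in labeled if y == 0)
    smallest_positive = min(x for x, y in labeled if y == 1)
    return (largest_negative + smallest_positive) / 2.0


def active_learning(pool, oracle, seed_items, budget):
    """Repeatedly label the pool item the current model is least certain about."""
    labeled = [(x, oracle(x)) for x in seed_items]
    unlabeled = [x for x in pool if x not in seed_items]
    for _ in range(budget):
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: the point nearest the decision boundary.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # the costly annotation step
    return fit_threshold(labeled), labeled


HIDDEN_BOUNDARY = 0.623  # stands in for the unknown ground truth


def oracle(x):
    """Toy annotator; in the paper's setting this would be a wet-lab assay."""
    return int(x > HIDDEN_BOUNDARY)


pool = [i / 200 for i in range(201)]
threshold, labeled = active_learning(pool, oracle, seed_items=[0.0, 1.0], budget=8)
```

With only ten labels in total, this toy learner locates the hidden boundary closely, because each query is spent where the model is least certain. A random-sampling baseline, as used for comparison in the abstract, would replace the `min(...)` selection with a random draw from `unlabeled`.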
# min max相関クラスタリングのための4近似アルゴリズム A 4-approximation algorithm for min max correlation clustering ( http://arxiv.org/abs/2310.09196v2 ) ライセンス: Link先を確認	Holger Heidrich, Jannik Irmai, Bjoern Andres	(参考訳) 本稿では,min max相関クラスタリング問題に対する下限法を提案し,この手法に基づき,完全グラフのための組合せ4近似アルゴリズムを提案する。これは、組合せアルゴリズム(davies et al., 2023)のための線形プログラム定式化(kalhan et al., 2019)と40を用いて、以前の最もよく知られた5の近似保証を改善する。我々はこのアルゴリズムをヒューリスティックな結合によって拡張し、いくつかのベンチマークデータセット上でのソリューション品質と実行時の技術状況を改善することを実証的に示す。 We introduce a lower bounding technique for the min max correlation clustering problem and, based on this technique, a combinatorial 4-approximation algorithm for complete graphs. This improves upon the previous best known approximation guarantees of 5, using a linear program formulation (Kalhan et al., 2019), and 40, for a combinatorial algorithm (Davies et al., 2023). We extend this algorithm by a greedy joining heuristic and show empirically that it improves the state of the art in solution quality and runtime on several benchmark datasets.	翻訳日:2023-11-02 01:38:37 公開日:2023-10-30
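For readers unfamiliar with the objective, the short Python sketch below evaluates the min max correlation clustering cost of a given clustering on a complete signed graph. It assumes the standard formulation, in which the cost is the largest number of disagreeing edges (positive edges cut, or negative edges kept inside a cluster) incident to any single vertex; the abstract itself does not spell this out. The sketch only scores clusterings and is not the 4-approximation algorithm; `max_disagreement` and the example graph are illustrative assumptions.

```python
from itertools import combinations


def max_disagreement(n, positive_edges, clusters):
    """Largest number of disagreeing edges at any vertex of a complete signed graph.

    Vertices are 0..n-1; every pair not listed in positive_edges is a negative edge.
    """
    positive = {frozenset(edge) for edge in positive_edges}
    cluster_of = {v: idx for idx, members in enumerate(clusters) for v in members}
    disagreements = [0] * n
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive
        if same_cluster != is_positive:  # cut positive edge or uncut negative edge
            disagreements[u] += 1
            disagreements[v] += 1
    return max(disagreements)


# A triangle of positive edges plus one pendant positive edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
cost_triangle = max_disagreement(4, edges, [{0, 1, 2}, {3}])
cost_singletons = max_disagreement(4, edges, [{0}, {1}, {2}, {3}])
cost_one_cluster = max_disagreement(4, edges, [{0, 1, 2, 3}])
```

On this small graph, keeping the triangle together and isolating vertex 3 gives cost 1, merging everything gives cost 2, and all singletons give cost 3, so the first clustering is the best of the three under this objective.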
# 連続変数、離散変数、カテゴリー変数を混合した制約付き最適化問題に対するベイズ的品質・多様性アプローチ Bayesian Quality-Diversity approaches for constrained optimization problems with mixed continuous, discrete and categorical variables ( http://arxiv.org/abs/2310.05955v2 ) ライセンス: Link先を確認	Loic Brevault and Mathieu Balesdent	(参考訳) 航空宇宙工学、土木工学、エネルギー工学などの複雑な設計問題では、設計するシステムの振る舞いや性能を予測するために、数値的なコストのかかるシミュレーションコードを使用する必要がある。システムの設計を行うために、これらのコードは最適化プロセスに組み込まれ、設計制約を満たしながら最適な設計を提供する。近年,デザイン空間の探索を強化し,特徴関数に関して最適な多角化ソリューションの集合を提供するために,品質多様性と呼ばれる新しいアプローチが提案されている。これらの特徴関数はトレードオフを評価するのに有用である。さらに、複雑なエンジニアリング設計問題には、最適化問題における技術的な選択を考慮に入れられるような、連続的、離散的、カテゴリー的な設計変数が混在することが多い。本稿では,連続的,離散的,カテゴリー的ベイズ最適化戦略に基づく新しい品質多様性手法を提案する。このアプローチは、離散的な選択と制約を扱いながら、古典的な品質多様性アプローチに比べて計算コストを削減できる。提案手法の性能は, 解析的問題のベンチマークと, 航空宇宙システムを扱う産業設計最適化問題に基づいて評価される。 Complex engineering design problems, such as those involved in aerospace, civil, or energy engineering, require the use of numerically costly simulation codes in order to predict the behavior and performance of the system to be designed. To perform the design of the systems, these codes are often embedded into an optimization process to provide the best design while satisfying the design constraints. Recently, new approaches, called Quality-Diversity, have been proposed in order to enhance the exploration of the design space and to provide a set of optimal diversified solutions with respect to some feature functions. These functions are interesting to assess trade-offs. Furthermore, complex engineering design problems often involve mixed continuous, discrete, and categorical design variables, which allow technological choices to be taken into account in the optimization problem. In this paper, a new Quality-Diversity methodology based on mixed continuous, discrete and categorical Bayesian optimization strategy is proposed. This approach reduces the computational cost with respect to classical Quality-Diversity approaches while dealing with discrete choices and constraints. 
The performance of the proposed method is assessed on a benchmark of analytical problems as well as on an industrial design optimization problem dealing with aerospace systems.	翻訳日:2023-11-02 01:38:13 公開日:2023-10-30
# 貨幣の新しい経済・金融理論 A new economic and financial theory of money ( http://arxiv.org/abs/2310.04986v4 ) ライセンス: Link先を確認	Michael E. Glinsky and Sharon Sievert	(参考訳) 本論文は,電子通貨を含む経済・金融理論を根本的に改革する。電子通貨の評価は、割引キャッシュフローのミクロ経済理論ではなく、マクロ経済理論と金融政策の基本方程式に基づいて行われる。サブエコノミーの有形資産に付随する取引的エクイティとしての電子通貨の考え方は、主にサブエコノミーの無形資産に付随する株式としての株式の考え方とは対照的に発展する。この見解は、実質的な(電子通貨の流動性のために)金融(電子通貨供給及び価値安定化)及び財政(投資及び運用)政策の調整を行う機関として、電子通貨管理会社によって策定される。評価と意思決定で使用されるリスクモデルは、ディスカウント率につながるユビキタスで不適切な指数的リスクモデルではなく、真のリスクを捉えるマルチタイムスケールモデルになります。意思決定は、多スケールリスクモデルと、Deep Reinforcement Learning、Generative Pretrained Transformers、その他の人工知能(DRL/GPT/AI)を利用したシステムコントローラによって与えられるシステム応答関数に基づいて、真のシステム制御の観点からアプローチされる。最後に、サブエコノミーは、短期的な利用に関連する安定平衡と、マルチスケールのシステム応答関数とDRL/GPT/AIに基づくアクティブな非線形制御で安定化する必要がある不安定平衡の両方を持つ非線形複素物理系と見なされる。 This paper fundamentally reformulates economic and financial theory to include electronic currencies. The valuation of the electronic currencies will be based on macroeconomic theory and the fundamental equation of monetary policy, not the microeconomic theory of discounted cash flows. The view of electronic currency as a transactional equity associated with tangible assets of a sub-economy will be developed, in contrast to the view of stock as an equity associated mostly with intangible assets of a sub-economy. The view will be developed of the electronic currency management firm as an entity responsible for coordinated monetary (electronic currency supply and value stabilization) and fiscal (investment and operational) policies of a substantial (for liquidity of the electronic currency) sub-economy. The risk model used in the valuations and the decision-making will not be the ubiquitous, yet inappropriate, exponential risk model that leads to discount rates, but will be multi time scale models that capture the true risk. 
The decision-making will be approached from the perspective of true systems control based on a system response function given by the multi scale risk model and system controllers that utilize the Deep Reinforcement Learning, Generative Pretrained Transformers, and other methods of Artificial Intelligence (DRL/GPT/AI). Finally, the sub-economy will be viewed as a nonlinear complex physical system with both stable equilibriums that are associated with short-term exploitation, and unstable equilibriums that need to be stabilized with active nonlinear control based on the multi scale system response functions and DRL/GPT/AI.	翻訳日:2023-11-02 01:37:55 公開日:2023-10-30
# 光学結合ナノ粒子の非エルミートダイナミクスと非相反性 Non-Hermitian dynamics and nonreciprocity of optically coupled nanoparticles ( http://arxiv.org/abs/2310.02610v2 ) ライセンス: Link先を確認	Manuel Reisenbauer, Henning Rudolph, Livia Egyed, Klaus Hornberger, Anton V. Zasedatelev, Murad Abuzarli, Benjamin A. Stickler, Uroš Delić	(参考訳) フォトニック、原子、電気、光機械のプラットフォームで観察される非エルミート力学は、センシング応用や信号処理に大きな可能性を秘めている。近年, 浮遊ナノ粒子間の完全に調整可能な非相反光学相互作用が実証されている。ここでは、このチューナビリティを用いて、非相反的かつ非線形に相互作用する2つのナノ粒子の集団的非エルミートダイナミクスの研究を行う。我々はパリティ時間対称性の破れを観察し、十分に強い結合では、粒子が安定なリミットサイクルに沿って運動する集団的な機械的レーザー発振遷移を観察する。この研究は、ツイーザーアレイ内の個々のサイトの動的制御によって調整された非平衡多粒子集団効果の研究の道を開く。 Non-Hermitian dynamics, as observed in photonic, atomic, electrical, and optomechanical platforms, holds great potential for sensing applications and signal processing. Recently, fully tunable nonreciprocal optical interaction has been demonstrated between levitated nanoparticles. Here, we use this tunability to investigate the collective non-Hermitian dynamics of two nonreciprocally and nonlinearly interacting nanoparticles. We observe parity-time symmetry breaking and, for sufficiently strong coupling, a collective mechanical lasing transition, where the particles move along stable limit cycles. This work opens up a research avenue of nonequilibrium multi-particle collective effects, tailored by the dynamic control of individual sites in a tweezer array.	翻訳日:2023-11-02 01:37:27 公開日:2023-10-30
# 言語モデルトレーニングのための人体フィードバックの微粒化 Fine-Grained Human Feedback Gives Better Rewards for Language Model Training ( http://arxiv.org/abs/2306.01693v2 ) ライセンス: Link先を確認	Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi	(参考訳) 言語モデル(LM)は、しばしば偽、有毒、無関係な出力を生成するなど、望ましくないテキスト生成の振る舞いを示す。人間のフィードバックからの強化学習(RLHF) – LM出力に対する人間の嗜好判断が学習信号に変換される – は、これらの問題に対処する上での約束を最近示した。しかし、このような全体論的フィードバックは、長いテキスト出力に関する限られた情報を伝えるものであり、出力のどの側面がユーザーの好みに影響を与えているかを示すものではない。本稿では, 明快な訓練信号として, きめ細かい人間のフィードバック(例えば, 文は偽で, サブ文は無関係)を用いる。我々は,(1)各セグメント(文など)が生成されてから報酬を与える密度,(2)異なるフィードバックタイプ(事実的誤り,不適切性,情報不完全性など)に関連付けられた複数の報酬モデルを統合する,2つの点で微細な報酬関数からのトレーニングと学習を可能にするフレームワークであるFine-Grained RLHFを紹介する。我々は,このような報酬関数による学習が,自動評価と人的評価の両方で支持されるパフォーマンス向上につながることを示すために,解毒および長文質問応答の実験を行った。さらに、細粒度報酬モデルの異なる組み合わせを用いて、LMの挙動をカスタマイズできることを示す。すべてのデータ、人間のフィードバック、コードをhttps://FineGrainedRLHF.github.ioで公開しています。 Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human preference judgments on LM outputs are transformed into a learning signal - has recently shown promise in addressing these issues. However, such holistic feedback conveys limited information on long text outputs; it does not indicate which aspects of the outputs influenced user preference; e.g., which parts contain what type(s) of errors. In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal. 
We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness). We conduct experiments on detoxification and long-form question answering to illustrate how learning with such reward functions leads to improved performance, supported by both automatic and human evaluation. Additionally, we show that LM behaviors can be customized using different combinations of fine-grained reward models. We release all data, collected human feedback, and code at https://FineGrainedRLHF.github.io.	翻訳日:2023-11-01 23:54:27 公開日:2023-10-30
# Flip-Flop言語モデリングによる注意グラフの抽出 Exposing Attention Glitches with Flip-Flop Language Modeling ( http://arxiv.org/abs/2306.00946v2 ) ライセンス: Link先を確認	Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang	(参考訳) なぜ大規模な言語モデルは事実的不正確さを出力し、誤った推論を示すのか? これらのモデルの脆さ、特に推論の長い連鎖を実行する場合、現在、知識、実践的思考、抽象的思考を一貫性を持って合成する高度な能力を支払うために避けられない価格であるように思える。この根本的な未解決問題を理解するため、本研究は、トランスフォーマーアーキテクチャの帰納的バイアスが断続的にロバストな推論を捉えることができない、注意欠陥の現象を識別し、分析する。この問題を分離するために,ニューラルネットワークモデルの外挿挙動を探索するために設計された合成ベンチマークのパラメトリックなファミリであるフリップフロップ言語モデリング(FFLM)を導入する。この単純な生成タスクは、長い範囲の依存に対してバイナリシンボルをコピーするモデルを必要とします。トランスフォーマーfflmは散発的な推論エラーの長い尾に苦しむことが分かり、その一部は様々な正規化技術を用いて排除できる。予備的な機構解析により,残差エラーの診断と解決が困難になる可能性が示唆された。我々は,自然のLLMにおける閉領域幻覚に注意点が関与していると仮定する。 Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, currently seems to be an inevitable price to pay for their advanced capabilities of coherently synthesizing knowledge, pragmatics, and abstract thought. Towards making sense of this fundamentally unsolved problem, this work identifies and analyzes the phenomenon of attention glitches, in which the Transformer architecture's inductive biases intermittently fail to capture robust reasoning. To isolate the issue, we introduce flip-flop language modeling (FFLM), a parametric family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models. This simple generative task requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques. Our preliminary mechanistic analyses show why the remaining errors may be very difficult to diagnose and resolve. We hypothesize that attention glitches account for (some of) the closed-domain hallucinations in natural LLMs.	
翻訳日:2023-11-01 23:52:48 公開日:2023-10-30
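The flip-flop task described in the abstract above (copy a binary symbol across a long gap, ignoring what lies in between) can be illustrated with a small generator. This is only a sketch under stated assumptions: the instruction names `w` (write), `r` (read) and `i` (ignore), the sampling probabilities, and the helper names `flip_flop_sequence` and `is_valid` are hypothetical choices for illustration, not taken from the abstract.

```python
import random


def flip_flop_sequence(length, p_ignore=0.8, p_read=0.1, seed=None):
    """Return `length` (instruction, bit) pairs; the first is always a write.

    Assumed vocabulary: "w" stores a bit, "i" carries a distractor bit,
    "r" must be followed by the most recently written bit.
    """
    rng = random.Random(seed)
    memory = rng.randint(0, 1)
    seq = [("w", memory)]
    for _ in range(length - 1):
        u = rng.random()
        if u < p_ignore:
            # Distractor: the bit is random and must not affect later reads.
            seq.append(("i", rng.randint(0, 1)))
        elif u < p_ignore + p_read:
            # Read: the correct answer is whatever was written last.
            seq.append(("r", memory))
        else:
            memory = rng.randint(0, 1)
            seq.append(("w", memory))
    return seq


def is_valid(seq):
    """Check that every read returns the most recently written bit."""
    memory = None
    for op, bit in seq:
        if op == "w":
            memory = bit
        elif op == "r" and bit != memory:
            return False
    return True
```

Raising `p_ignore` lengthens the gap between a write and the read that depends on it, which is the long-range dependency the benchmark is meant to probe.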
# 高次相互作用のための相互作用測度、分割格子および核テスト Interaction Measures, Partition Lattices and Kernel Tests for High-Order Interactions ( http://arxiv.org/abs/2306.00904v2 ) ライセンス: Link先を確認	Zhaolu Liu, Robert L. Peach, Pedro A.M. Mediano, and Mauricio Barahona	(参考訳) 対関係にのみ依存するモデルは、社会経済、生態学、生物医学システムなど、様々な領域で見られる複雑な多変量データの完全な統計構造を捉えることができないことが多い。 2つ以上の変数からなるグループ間の非自明な依存関係は、そのようなシステムの分析とモデリングにおいて重要な役割を果たすが、データからそのような高次相互作用を抽出することは依然として困難である。ここでは、d$-order (d \geq 2$) 相互作用測度の階層を導入し、ジョイント確率分布の可能な因子化をますます包含し、非パラメトリックなカーネルベースのテストを定義し、d$-order相互作用の統計的意義を体系的に確立する。また、相互作用測度とその複合置換試験の導出を解明する格子理論との数学的関係を確立し、単純錯体とカーネル行列遠心率の関連を明らかにするとともに、計算効率を高める手段を提供する。本研究は,合成データおよび神経画像データへの応用により,数値的に結果を示す。 Models that rely solely on pairwise relationships often fail to capture the complete statistical structure of the complex multivariate data found in diverse domains, such as socio-economic, ecological, or biomedical systems. Non-trivial dependencies between groups of more than two variables can play a significant role in the analysis and modelling of such systems, yet extracting such high-order interactions from data remains challenging. Here, we introduce a hierarchy of $d$-order ($d \geq 2$) interaction measures, increasingly inclusive of possible factorisations of the joint probability distribution, and define non-parametric, kernel-based tests to establish systematically the statistical significance of $d$-order interactions. We also establish mathematical links with lattice theory, which elucidate the derivation of the interaction measures and their composite permutation tests; clarify the connection of simplicial complexes with kernel matrix centring; and provide a means to enhance computational efficiency. We illustrate our results numerically with validations on synthetic data, and through an application to neuroimaging data.	翻訳日:2023-11-01 23:52:28 公開日:2023-10-30
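For orientation, the simplest member of the hierarchy above, the pairwise case ($d = 2$), can be tested with a standard kernel independence statistic (HSIC) built from a centred kernel matrix and calibrated by permutation. The sketch below is not the paper's $d$-order construction; the Gaussian kernel, the bandwidth, and all function names are assumptions made for illustration.

```python
import math
import random


def gaussian_gram(xs, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in xs]
            for a in xs]


def centre(K):
    """Double-centre K, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]


def permutation_test(xs, ys, n_perm=200, seed=0):
    """Return (HSIC statistic, permutation p-value) for paired samples."""
    rng = random.Random(seed)
    n = len(xs)
    Kc = centre(gaussian_gram(xs))
    L = gaussian_gram(ys)

    def statistic(idx):
        # trace(H K H L) / n^2, with the rows and columns of L permuted by idx.
        return sum(Kc[i][j] * L[idx[i]][idx[j]]
                   for i in range(n) for j in range(n)) / n ** 2

    observed = statistic(list(range(n)))
    idx = list(range(n))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # break the pairing between xs and ys
        if statistic(idx) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Only one of the two Gram matrices needs centring, because $\mathrm{tr}(HKHL) = \mathrm{tr}(HKH \cdot HLH)$ when $H$ is the centring matrix.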
# 拡散モデルにおける負転移の対応 Addressing Negative Transfer in Diffusion Models ( http://arxiv.org/abs/2306.00354v2 ) ライセンス: Link先を確認	Hyojun Go, JinYoung Kim, Yunsung Lee, Seunghyun Lee, Shinhyeok Oh, Hyeongdon Moon, Seungtaek Choi	(参考訳) 拡散に基づく生成モデルは様々な領域で顕著な成功を収めている。マルチタスク学習(MTL)の形式を表現するために、異なるノイズレベルを同時に含むタスクの認知に関する共有モデルを訓練する。しかし、MTLの観点からの拡散モデルの解析と改善はいまだに未検討である。特に、mtlはよく知られた負の伝達現象につながり、タスク間の衝突によって特定のタスクのパフォーマンスが低下することがある。本稿では,MTL の観点から拡散訓練を解析し,(O1) 雑音レベルの差が大きくなるにつれてタスク間のタスク親和性が低下し,(O2) 負の伝達が拡散訓練においても生じるという2つの重要な観察結果を示す。これらの観測に基づいて、負の伝達を緩和することで拡散訓練を強化することを目指している。これを実現するために,既存のMLL手法の活用を提案するが,膨大なタスクが存在するため,タスク毎の損失や勾配を計算するのに計算コストがかかる。この課題に対処するために,タスクを小さなタスククラスタにクラスタ化し,MTLメソッドを適用することを提案する。具体的には、(O2)に基づいて、クラスタ内のタスク間の時間的近接を強制するために間隔クラスタリングを用いる。本研究では,信号対雑音比,時間ステップ,タスク親和性を用いて,動的計画法を用いて区間クラスタリングを解決できることを示す。本手法は,mtl法の効率的な計算を可能にすることにより,拡散モデルにおける負の伝達問題に対処する。提案手法のクラスタリングとMTL手法の統合を様々な実験により検証し,拡散モデルのサンプル品質の向上を実証した。プロジェクトのページは \href{https://gohyojun15.github.io/ant_diffusion/}{url} で閲覧できます。 Diffusion-based generative models have achieved remarkable success in various domains. It trains a shared model on denoising tasks that encompass different noise levels simultaneously, representing a form of multi-task learning (MTL). However, analyzing and improving diffusion models from an MTL perspective remains under-explored. In particular, MTL can sometimes lead to the well-known phenomenon of negative transfer, which results in the performance degradation of certain tasks due to conflicts between tasks. In this paper, we first aim to analyze diffusion training from an MTL standpoint, presenting two key observations: (O1) the task affinity between denoising tasks diminishes as the gap between noise levels widens, and (O2) negative transfer can arise even in diffusion training. Building upon these observations, we aim to enhance diffusion training by mitigating negative transfer. 
To achieve this, we propose leveraging existing MTL methods, but the huge number of denoising tasks makes it computationally expensive to calculate the necessary per-task loss or gradient. To address this challenge, we propose clustering the denoising tasks into small task clusters and applying MTL methods to them. Specifically, based on (O2), we employ interval clustering to enforce temporal proximity among denoising tasks within clusters. We show that interval clustering can be solved using dynamic programming, utilizing signal-to-noise ratio, timestep, and task affinity for clustering objectives. Through this, our approach addresses the issue of negative transfer in diffusion models by allowing for efficient computation of MTL methods. We validate the proposed clustering and its integration with MTL methods through various experiments, demonstrating improved sample quality of diffusion models. Our project page is available at https://gohyojun15.github.io/ANT_diffusion/.	翻訳日:2023-11-01 23:52:08 公開日:2023-10-30
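The interval clustering step mentioned in the abstract above can be sketched as a classic dynamic program over contiguous partitions. This is a minimal illustration, not the authors' implementation: the within-interval cost (sum of squared deviations of a per-timestep value such as a timestep index or log-SNR) and the function name `interval_clustering` are assumptions.

```python
def interval_clustering(values, k):
    """Split `values` (ordered by timestep) into k contiguous intervals.

    Minimises the total within-interval sum of squared deviations and
    returns (list of half-open (start, end) index pairs, total cost).
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Prefix sums give the cost of any interval in O(1).
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations from the mean over values[i:j]."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[c][j]: minimal cost of splitting values[:j] into c intervals.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = best[c - 1][i] + cost(i, j)
                if candidate < best[c][j]:
                    best[c][j] = candidate
                    cut[c][j] = i

    # Walk the stored cut points backwards to recover the intervals.
    bounds, j = [], n
    for c in range(k, 0, -1):
        i = cut[c][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return bounds, best[k][n]
```

Because clusters are forced to be contiguous in the timestep order, the search space is small enough for an exact solution in O(k n^2) time, which is what makes dynamic programming applicable here.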
# スペクトル調和:自己監督学習におけるスペクトル埋め込みと行列補完 Spectal Harmonics: Bridging Spectral Embedding and Matrix Completion in Self-Supervised Learning ( http://arxiv.org/abs/2305.19818v2 ) ライセンス: Link先を確認	Marina Munkhoeva, Ivan Oseledets	(参考訳) 自己監督的な手法は、ラベルの形で明らかな監督なしにデータのセマンティクスを尊重する学習表現に対する、一見ヒューリスティックなアプローチによって大きな注目を集めた。現代の自己監督表現学習法で使われる損失の動物園の作業について、一貫性と理論的に根拠のある理解を構築するために、文学の集団がすでに出版されている。本稿では,ラプラス演算子の観点からの理解を提供し,拡張過程に起因する帰納的バイアスを低ランク行列補完問題に結びつける。この目的のために,低ランク行列補完の結果を利用して,最新のssl手法の収束と,その下流性能に影響を与える重要な特性を理論的に解析する。 Self-supervised methods received tremendous attention thanks to their seemingly heuristic approach to learning representations that respect the semantics of the data without any apparent supervision in the form of labels. A growing body of literature is already being published in an attempt to build a coherent and theoretically grounded understanding of the workings of a zoo of losses used in modern self-supervised representation learning methods. In this paper, we attempt to provide an understanding from the perspective of a Laplace operator and connect the inductive bias stemming from the augmentation process to a low-rank matrix completion problem. To this end, we leverage the results from low-rank matrix completion to provide theoretical analysis on the convergence of modern SSL methods and a key property that affects their downstream performance.	翻訳日:2023-11-01 23:51:39 公開日:2023-10-30
# トンネル効果:深層ニューラルネットワークにおけるデータ表現の構築 The Tunnel Effect: Building Data Representations in Deep Neural Networks ( http://arxiv.org/abs/2305.19753v2 ) ライセンス: Link先を確認	Wojciech Masarczyk, Mateusz Ostaszewski, Ehsan Imani, Razvan Pascanu, Piotr Miłoś, Tomasz Trzciński	(参考訳) ディープニューラルネットワークは、さまざまなタスクにまたがる顕著な効果で広く知られており、深層ネットワークは暗黙的により複雑なデータ表現を学ぶというコンセンサスがある。本稿では,教師付き画像分類のために訓練された十分に深いネットワークが,結果のデータ表現に異なる形で寄与する2つの部分に分かれることを示す。最初のレイヤは線形に分離可能な表現を生成し、続くレイヤは the tunnel と呼ばれ、これらの表現を圧縮し、全体的なパフォーマンスに最小限の影響を与える。総合的な実験研究を通じてトンネルの挙動を探究し,訓練過程の初期段階に現れることを強調する。その深さは、ネットワークの容量とタスクの複雑さの関係に依存する。さらに,このトンネルは分布外一般化を損なうことを示し,継続的な学習にその意義について考察する。 Deep neural networks are widely known for their remarkable effectiveness across various tasks, with the consensus that deeper networks implicitly learn more complex data representations. This paper shows that sufficiently deep networks trained for supervised image classification split into two distinct parts that contribute to the resulting data representations differently. The initial layers create linearly-separable representations, while the subsequent layers, which we refer to as the tunnel, compress these representations and have a minimal impact on the overall performance. We explore the tunnel's behavior through comprehensive empirical studies, highlighting that it emerges early in the training process. Its depth depends on the relation between the network's capacity and task complexity. Furthermore, we show that the tunnel degrades out-of-distribution generalization and discuss its implications for continual learning.	翻訳日:2023-11-01 23:51:09 公開日:2023-10-30
# 単一生成フローネットワークによるグラフィカル構造とパラメータのジョイントベイズ推定 Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network ( http://arxiv.org/abs/2305.19366v2 ) ライセンス: Link先を確認	Tristan Deleu, Mizu Nishikawa-Toomey, Jithendaraa Subramanian, Nikolay Malkin, Laurent Charlin, Yoshua Bengio	(参考訳) 離散的および構造化されたサンプル空間上の生成モデルのクラスである生成フローネットワーク(GFlowNets)は、ベイジアンネットワークの有向非巡回グラフ(DAG)上の境界後部分布を推定する問題に対して、観測のデータセットを与えられた。本稿では, この枠組みを非離散標本空間に拡張する最近の進歩に基づき, ベイズネットワークの構造だけでなく, 条件付き確率分布のパラメータにも乗じて, 結合後部を近似する手法を提案する。我々は,サンプリングポリシが2段階のプロセスに従う単一のGFlowNetを用いて,DAGを1回に1つのエッジに順次生成し,全構造が知られると対応するパラメータを選択する。パラメータは後方分布に含まれるため,ベイジアンネットワークの局所確率モデルに対する柔軟性が向上し,ニューラルネットワークによってパラメータ化される非線形モデルにも適用できる。本手法は jsp-gfn と呼ばれ, シミュレーションデータと実データの両方において既存の手法と好適に比較しながら, 関節後方の正確な近似を提供する。 Generative Flow Networks (GFlowNets), a class of generative models over discrete and structured sample spaces, have been previously applied to the problem of inferring the marginal posterior distribution over the directed acyclic graph (DAG) of a Bayesian Network, given a dataset of observations. Based on recent advances extending this framework to non-discrete sample spaces, we propose in this paper to approximate the joint posterior over not only the structure of a Bayesian Network, but also the parameters of its conditional probability distributions. We use a single GFlowNet whose sampling policy follows a two-phase process: the DAG is first generated sequentially one edge at a time, and then the corresponding parameters are picked once the full structure is known. Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models of the Bayesian Network, making our approach applicable even to non-linear models parametrized by neural networks. We show that our method, called JSP-GFN, offers an accurate approximation of the joint posterior, while comparing favorably against existing methods on both simulated and real data.	
翻訳日:2023-11-01 23:50:56 公開日:2023-10-30
# SheetCopilot: 大規模言語モデルによるソフトウェア生産性の次のレベルへ SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models ( http://arxiv.org/abs/2305.19308v2 ) ライセンス: Link先を確認	Hongxin Li, Jingran Su, Yuntao Chen, Qing Li, Zhaoxiang Zhang	(参考訳) コンピュータのエンドユーザーは、表データ処理やプロジェクトスケジュールスケジューリングといった日々のタスクを何十億時間も完了させてきた。これらのタスクのほとんどは反復的でエラーを起こしやすいが、ほとんどのエンドユーザーはこうした面倒な作業を自動化するスキルが欠けている。大規模言語モデル(LLM)の出現により、自然言語ユーザ要求によるソフトウェア指向が到達可能な目標となっている。本研究では,自然言語処理とスプレッドシート制御を併用して要求を満たすシートコパイロットエージェントを提案する。本稿では,スプレッドシートソフトウェア機能の抽象化として,アトミックアクションのセットを提案する。我々はさらに、LLMがスプレッドシートと堅牢に対話するための状態マシンベースのタスク計画フレームワークを設計する。 221のスプレッドシート制御タスクを含む代表データセットをキュレートし、ソフトウェア制御タスクにおけるLLMの能力を厳格にベンチマークするための完全自動評価パイプラインを確立する。当社の SheetCopilot は,単一世代のタスクの 44.3 % を正しく完了し,強力なコード生成ベースラインを広いマージンで上回っている。プロジェクトページ:https://sheetcopilot.github.io/ Computer end users have spent billions of hours completing daily tasks like tabular data processing and project timeline scheduling. Most of these tasks are repetitive and error-prone, yet most end users lack the skill to automate this burdensome work. With the advent of large language models (LLMs), directing software with natural language user requests has become a reachable goal. In this work, we propose a SheetCopilot agent that takes a natural language task and controls a spreadsheet to fulfill the requirements. We propose a set of atomic actions as an abstraction of spreadsheet software functionalities. We further design a state machine-based task planning framework for LLMs to robustly interact with spreadsheets. We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline for rigorously benchmarking the ability of LLMs in software control tasks. Our SheetCopilot correctly completes 44.3% of tasks for a single generation, outperforming the strong code generation baseline by a wide margin. Our project page: https://sheetcopilot.github.io/.	翻訳日:2023-11-01 23:50:35 公開日:2023-10-30
# NetHackはハッキングが難しい NetHack is Hard to Hack ( http://arxiv.org/abs/2305.19240v2 ) ライセンス: Link先を確認	Ulyana Piterbarg, Lerrel Pinto, Rob Fergus	(参考訳) ニューラルポリシー学習法は,アタリゲームからシミュレーションロコモーションに至るまで,様々な制御問題において顕著な成果を上げている。しかし、これらの手法は特に、一般的なダンジョンクローラーゲームであるNetHackのようなマルチモーダルな観察を伴うオープンな環境において、長期的タスクで苦労する。興味深いことに、NeurIPS 2021 NetHack Challengeは、シンボリックエージェントが中央値のゲームスコアで4倍以上のニューラルアプローチを上回りました。本稿では,この性能格差の背景にある理由を考察し,nethackのニューラルポリシー学習に関する広範な研究を行う。本研究は,勝利の象徴的エージェントを解析し,コードベースを拡張して内部戦略の選択を追跡し,最大規模のデモデータセットを生成する。このデータセットを用いて検討する (i)行動階層の長所 (ii)ニューラルアーキテクチャの強化、及び (iii)強化学習と模倣学習の統合。我々の調査では、従来の完全なニューラルネットワークポリシーを127%のオフライン設定で、中央値のオンライン設定で25%超える最先端のニューラルエージェントを作成しました。しかし,優れたシンボリックモデルやトップヒューマンプレイヤーでパフォーマンスギャップを埋めるには,単にスケーリングが不十分であることも示している。 Neural policy learning methods have achieved remarkable results in various control problems, ranging from Atari games to simulated locomotion. However, these methods struggle in long-horizon tasks, especially in open-ended environments with multi-modal observations, such as the popular dungeon-crawler game, NetHack. Intriguingly, the NeurIPS 2021 NetHack Challenge revealed that symbolic agents outperformed neural approaches by over four times in median game score. In this paper, we delve into the reasons behind this performance gap and present an extensive study on neural policy learning for NetHack. To conduct this study, we analyze the winning symbolic agent, extending its codebase to track internal strategy selection in order to generate one of the largest available demonstration datasets. Utilizing this dataset, we examine (i) the advantages of an action hierarchy; (ii) enhancements in neural architecture; and (iii) the integration of reinforcement learning with imitation learning. Our investigations produce a state-of-the-art neural agent that surpasses previous fully neural policies by 127% in offline settings and 25% in online settings on median game score. 
However, we also demonstrate that mere scaling is insufficient to bridge the performance gap with the best symbolic models or even the top human players.	翻訳日:2023-11-01 23:50:17 公開日:2023-10-30
# 常時画像生成のためのネスト拡散過程 Nested Diffusion Processes for Anytime Image Generation ( http://arxiv.org/abs/2305.19066v3 ) ライセンス: Link先を確認	Noam Elata, Bahjat Kawar, Tomer Michaeli, Michael Elad	(参考訳) 拡散モデルは、画像生成における最先端のモデルであり、生成プロセスを多くの細かなデノイジングステップに分解することで高品質な画像を合成する。優れた性能にもかかわらず、拡散モデルは計算コストが高く、多くの神経機能評価(NFE)を必要とする。本研究では,完了前に任意のタイミングで停止した場合に実行可能画像を生成する,任意の時間拡散に基づく手法を提案する。既存の事前学習拡散モデルを用いて、生成スキームを2つのネスト拡散過程として再構成し、生成した画像の高速反復精錬を可能にする。 ImageNetとStable Diffusionを用いたテキスト・ツー・イメージ生成実験において,本手法の中間生成品質が元の拡散モデルを大きく上回る一方で,最終的な生成結果と同等であることを示す。我々は,Nested Diffusionの適用性について,逆問題の解決や,サンプリングプロセス全体を通じてユーザの介入を可能とすることで,テキストベースの迅速なコンテンツ作成など,いくつかの設定で説明する。 Diffusion models are the current state-of-the-art in image generation, synthesizing high-quality images by breaking down the generation process into many fine-grained denoising steps. Despite their good performance, diffusion models are computationally expensive, requiring many neural function evaluations (NFEs). In this work, we propose an anytime diffusion-based method that can generate viable images when stopped at arbitrary times before completion. Using existing pretrained diffusion models, we show that the generation scheme can be recomposed as two nested diffusion processes, enabling fast iterative refinement of a generated image. In experiments on ImageNet and Stable Diffusion-based text-to-image generation, we show, both qualitatively and quantitatively, that our method's intermediate generation quality greatly exceeds that of the original diffusion model, while the final generation result remains comparable. We illustrate the applicability of Nested Diffusion in several settings, including for solving inverse problems, and for rapid text-based content creation by allowing user intervention throughout the sampling process.	翻訳日:2023-11-01 23:49:56 公開日:2023-10-30
# 画像セグメンテーションにおけるトポロジー認識の不確かさ Topology-Aware Uncertainty for Image Segmentation ( http://arxiv.org/abs/2306.05671v3 ) ライセンス: Link先を確認	Saumya Gupta, Yikai Zhang, Xiaoling Hu, Prateek Prasanna and Chao Chen	(参考訳) 比較的弱い信号と複雑な幾何学・トポロジーのため, 血管や道路網などの曲線構造のセグメンテーションは困難である。大規模なアノテーションを容易かつ加速するためには、専門家による証明読取のような半自動的なアプローチを採用する必要がある。本研究では,このようなタスクに対する不確実性評価に焦点をあて,高い不確かさとエラー発生構造を人間のアノテータが検証できるようにする。ピクセルワイズ不確実性マップを提供する既存の多くの作品とは異なり、我々は、例えば小さな接続や枝などの位相構造の単位における不確かさを推定することが重要であると規定している。これを実現するために、我々は、トポロジカルデータ解析、特に離散モース理論(DMT)のツールを活用し、まず構造を捉え、その不確実性を推論する。この不確かさをモデル化するために,(1)隣接構造物を考慮しながら構造物の不確かさを推定する共同予測モデル(構造間不確実性)を提案し,(2)その表現を摂動・歩行スキームでサンプリングし,各構造物内固有の不確かさをモデル化する新しい確率的dmtを提案する。様々な2次元および3次元データセットにおいて,本手法は既存手法と比較して構造的不確実性マップを生成する。コードはhttps://github.com/saumya-gupta-26/struct-uncertaintyで利用可能 Segmentation of curvilinear structures such as vasculature and road networks is challenging due to relatively weak signals and complex geometry/topology. To facilitate and accelerate large scale annotation, one has to adopt semi-automatic approaches such as proofreading by experts. In this work, we focus on uncertainty estimation for such tasks, so that highly uncertain, and thus error-prone structures can be identified for human annotators to verify. Unlike most existing works, which provide pixel-wise uncertainty maps, we stipulate it is crucial to estimate uncertainty in the units of topological structures, e.g., small pieces of connections and branches. To achieve this, we leverage tools from topological data analysis, specifically discrete Morse theory (DMT), to first capture the structures, and then reason about their uncertainties. 
To model the uncertainty, we (1) propose a joint prediction model that estimates the uncertainty of a structure while taking the neighboring structures into consideration (inter-structural uncertainty); (2) propose a novel Probabilistic DMT to model the inherent uncertainty within each structure (intra-structural uncertainty) by sampling its representations via a perturb-and-walk scheme. On various 2D and 3D datasets, our method produces better structure-wise uncertainty maps compared to existing works. Code available at https://github.com/Saumya-Gupta-26/struct-uncertainty	翻訳日:2023-11-01 23:44:03 公開日:2023-10-30
# mri脳腫瘍セグメンテーションのための新しい信頼感誘発クラス活性化マッピング A Novel Confidence Induced Class Activation Mapping for MRI Brain Tumor Segmentation ( http://arxiv.org/abs/2306.05476v3 ) ライセンス: Link先を確認	Yu-Jen Chen, Yiyu Shi, Tsung-Yi Ho	(参考訳) 磁気共鳴イメージング(MRI)は、脳腫瘍のセグメンテーションにおいて一般的に用いられる技術であり、患者の評価や治療計画に重要である。ラベル付けプロセスが専門知識に頼りにくくするために,クラスアクティベーションマッピング(CAM)を用いた弱教師付きセマンティックセマンティックセグメンテーション(WSSS)法が提案されている。しかし、現在のCAMベースのWSSSメソッドは、勾配やトレーニング可能なパラメータなどの内部ニューラルネットワーク情報を使用してオブジェクトのローカライゼーションマップを生成し、それによってサブ最適解が得られる。これらの問題に対処するために,各特徴マップの重み付けを目標クラスの信頼度を用いて算出する信頼誘導型CAM(Cfd-CAM)を提案する。 2つの脳腫瘍データセットに対する実験により、Cfd-CAMは、同じレベルの監督下で既存の最先端の手法よりも優れていることが示された。総じて,提案するcfd-camアプローチは脳腫瘍の分画精度を向上し,他の医用画像診断のためのwsss法の開発に有用な知見を与える。 Magnetic resonance imaging (MRI) is a commonly used technique for brain tumor segmentation, which is critical for evaluating patients and planning treatment. To make the labeling process less laborious and dependent on expertise, weakly-supervised semantic segmentation (WSSS) methods using class activation mapping (CAM) have been proposed. However, current CAM-based WSSS methods generate the object localization map using internal neural network information, such as gradient or trainable parameters, which can lead to suboptimal solutions. To address these issues, we propose the confidence-induced CAM (Cfd-CAM), which calculates the weight of each feature map by using the confidence of the target class. Our experiments on two brain tumor datasets show that Cfd-CAM outperforms existing state-of-the-art methods under the same level of supervision. Overall, our proposed Cfd-CAM approach improves the accuracy of brain tumor segmentation and may provide valuable insights for developing better WSSS methods for other medical imaging tasks.	翻訳日:2023-11-01 23:42:56 公開日:2023-10-30
# 要因的コントラスト学習 - マルチビュー冗長性を超えて Factorized Contrastive Learning: Going Beyond Multi-view Redundancy ( http://arxiv.org/abs/2306.05268v2 ) ライセンス: Link先を確認	Paul Pu Liang, Zihao Deng, Martin Ma, James Zou, Louis-Philippe Morency, Ruslan Salakhutdinov	(参考訳) 多様なマルチモーダルタスクにおいて、コントラスト学習は、ペアリング情報(画像キャプチャやビデオオーディオペアなど)のみを含む豊富なラベルなしデータから表現をうまく学習できるため、特に魅力的なアプローチとなっている。これらのアプローチを支えるのは、マルチビュー冗長性(multi-view redundancy)の仮定である。しかし、多くの現実の環境では、タスク関連情報はモダリティ・ユニクティックな領域にも含まれている: 1つのモダリティにのみ存在するが、タスクに関係している情報である。下流タスクに関連する共有情報とユニークな情報の両方をキャプチャするために、自己組織化されたマルチモーダル表現をどのように学べるか? 本稿では,マルチビュー冗長性を超えた新しいマルチモーダル表現学習法であるFacterCLを提案する。 factorclは,(1)タスク関連情報を共有表現とユニークな表現に分解する,(2)mi下限を最大化しタスク関連情報を取得し,mi上限を最小化することでタスク関連情報を削除する,(3)ラベル無しでタスク関連情報を近似するマルチモーダルデータ拡張,の3つの新たなコントリビューションから構築されている。大規模な実世界のデータセットでは、FacterCLは共有情報とユニークな情報の両方をキャプチャし、6つのベンチマークで最先端の結果を達成する In a wide range of multimodal tasks, contrastive learning has become a particularly appealing approach since it can successfully learn representations from abundant unlabeled data with only pairing information (e.g., image-caption or video-audio pairs). Underpinning these approaches is the assumption of multi-view redundancy - that shared information between modalities is necessary and sufficient for downstream tasks. However, in many real-world settings, task-relevant information is also contained in modality-unique regions: information that is only present in one modality but still relevant to the task. How can we learn self-supervised multimodal representations to capture both shared and unique information relevant to downstream tasks? This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy. 
FactorCL is built from three new contributions: (1) factorizing task-relevant information into shared and unique representations, (2) capturing task-relevant information via maximizing mutual information (MI) lower bounds and removing task-irrelevant information via minimizing MI upper bounds, and (3) multimodal data augmentations to approximate task relevance without labels. On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results on six benchmarks.	翻訳日:2023-11-01 23:42:35 公開日:2023-10-30
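As background for contribution (2) above, one widely used mutual-information lower bound is InfoNCE: for a batch of $N$ paired samples and a critic score matrix $s$, $I(X;Y) \geq \log N + \frac{1}{N}\sum_i \log \frac{e^{s_{ii}}}{\sum_j e^{s_{ij}}}$. The abstract does not say which estimator FactorCL uses, so treating InfoNCE as the lower bound here is an assumption, and `info_nce_bound` is a hypothetical helper name.

```python
import math


def info_nce_bound(scores):
    """InfoNCE lower bound on mutual information from critic scores.

    scores[i][j] is the critic score for pairing sample i of one modality
    with sample j of the other; the diagonal holds the true pairs.
    """
    n = len(scores)
    total = 0.0
    for i, row in enumerate(scores):
        # Numerically stable log-sum-exp over all candidates in the row.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(s - m) for s in row))
        total += row[i] - log_norm
    return math.log(n) + total / n
```

The bound can never exceed $\log N$, so larger batches are needed before it can certify a large amount of shared information.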
# ラクダはどこまで行けますか。オープンリソースのインストラクションチューニングの現状を探る How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources ( http://arxiv.org/abs/2306.04751v2 ) ライセンス: Link先を確認	Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi	(参考訳) 本研究では,オープン命令追従データセットを用いた命令チューニング言語モデルの最近の進歩について検討する。オープンモデルは最先端のプロプライエタリモデルと同等であるという最近の主張にもかかわらず、これらの主張はしばしば限定的な評価を伴っており、ボード全体の比較と様々なリソースの有用性の決定が困難である。我々は、6.7Bから65Bのパラメータから、手作業によるキュレート(OpenAssistantなど)から合成・蒸留(Alpacaなど)までの12の命令データセットをトレーニングし、それらの事実的知識、推論、多言語性、コーディング、そして、自動的、モデルベース、人間ベースのメトリクスの収集を通じて、それらを体系的に評価する。さらに、高品質なオープンリソースの組み合わせを微調整した命令調整モデルスイートであるT\"uluを紹介します。我々の実験では、異なる命令チューニングデータセットは特定のスキルを解明または拡張できるが、単一のデータセット(または組み合わせ)はすべての評価で最高のパフォーマンスを提供する。興味深いことに、モデルと人間の嗜好に基づく評価は、ベンチマークベースの評価で表されるモデル能力の違いを反映せず、本研究で実施されるシステム評価のタイプの必要性が示唆されている。評価の結果,ChatGPTの性能は平均87%,GPT-4性能は73%であり,このギャップを埋めるためには,より良いベースモデルの構築と指導訓練データの構築にさらなる投資が必要であることが示唆された。我々は、65B T\"uluを完全に微調整したモデルと、将来の研究を促進するためのコード、データ、評価フレームワークをhttps://github.com/allenai/open-instructでリリースしています。 In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with state-of-the-art proprietary models, these claims are often accompanied by limited evaluation, making it difficult to compare models across the board and determine the utility of various resources. We provide a large set of instruction-tuned models from 6.7B to 65B parameters in size, trained on 12 instruction datasets ranging from manually curated (e.g., OpenAssistant) to synthetic and distilled (e.g., Alpaca) and systematically evaluate them on their factual knowledge, reasoning, multilinguality, coding, and open-ended instruction following abilities through a collection of automatic, model-based, and human-based metrics. 
We further introduce Tülu, our best-performing instruction-tuned model suite finetuned on a combination of high-quality open resources. Our experiments show that different instruction-tuning datasets can uncover or enhance specific skills, while no single dataset (or combination) provides the best performance across all evaluations. Interestingly, we find that model and human preference-based evaluations fail to reflect differences in model capabilities exposed by benchmark-based evaluations, suggesting the need for the type of systemic evaluation performed in this work. Our evaluations show that the best model in any given evaluation reaches on average 87% of ChatGPT performance, and 73% of GPT-4 performance, suggesting that further investment in building better base models and instruction-tuning data is required to close the gap. We release our instruction-tuned models, including a fully finetuned 65B Tülu, along with our code, data, and evaluation framework at https://github.com/allenai/open-instruct to facilitate future research.	翻訳日:2023-11-01 23:42:09 公開日:2023-10-30
# 生成モデル評価指標の欠陥の暴露と拡散モデルの不公平な処理 Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models ( http://arxiv.org/abs/2306.04675v2 ) ライセンス: Link先を確認	George Stein, Jesse C. Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Leigh Ross, Valentin Villecroze, Zhaoyan Liu, Anthony L. Caterini, J. Eric T. Taylor, Gabriel Loaiza-Ganem	(参考訳) 我々は,セマンティックな画像データセットにまたがる多種多様な生成モデルを体系的に研究し,それらの評価に用いる特徴抽出器と指標を理解し,改善する。心理物理学におけるベストプラクティスを用いて、生成標本に対する人間のイメージリアリズムの知覚を計測し、これまでで最大の生成モデル評価実験を行い、既存の測定基準が人間の評価と強く相関しないことを見出した。生成モデルの全体的なパフォーマンス、忠実性、多様性、ラリティ、記憶力を評価するための17の現代的な指標と比較すると、人間によって判断される拡散モデルの最先端の知覚的実在性は、fidのような一般的に報告されている指標には反映されないことが分かる。この相違は生成標本の多様性によって説明されないが、一つの原因はインセプションV3への過剰依存である。これらの欠陥に対処するために,個別のネットワークで符号化された意味情報がトレーニング手順に強く依存していることを発見し,DINOv2-ViT-L/14が生成モデルのよりリッチな評価を可能にすることを示す。次に,生成モデルがcifar10のような単純で小さなデータセットのトレーニング例を記憶しているが,imagenetのような複雑なデータセットでは必ずしもそうではないことを示す。しかし,本実験では,現在の計測値が記憶を適切に検出できないことを示しており,記憶を不適合やモード縮小といった他の現象と区別することはできない。生成モデルのさらなる開発と評価を容易にするため、生成した画像データセット、人体評価データ、モジュールライブラリをリリースし、https://github.com/layer6ai-labs/dgm-evalで9つの異なるエンコーダに対して17の共通メトリクスを計算します。 We systematically study a wide variety of generative models spanning semantically-diverse image datasets to understand and improve the feature extractors and metrics used to evaluate them. Using best practices in psychophysics, we measure human perception of image realism for generated samples by conducting the largest experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations. Comparing to 17 modern metrics for evaluating the overall performance, fidelity, diversity, rarity, and memorization of generative models, we find that the state-of-the-art perceptual realism of diffusion models as judged by humans is not reflected in commonly reported metrics such as FID. This discrepancy is not explained by diversity in generated samples, though one cause is over-reliance on Inception-V3. 
We address these flaws through a study of alternative self-supervised feature extractors, find that the semantic information encoded by individual networks strongly depends on their training procedure, and show that DINOv2-ViT-L/14 allows for much richer evaluation of generative models. Next, we investigate data memorization, and find that generative models do memorize training examples on simple, smaller datasets like CIFAR10, but not necessarily on more complex datasets like ImageNet. However, our experiments show that current metrics do not properly detect memorization: none in the literature is able to separate memorization from other phenomena such as underfitting or mode shrinkage. To facilitate further development of generative models and their evaluation we release all generated image datasets, human evaluation data, and a modular library to compute 17 common metrics for 9 different encoders at https://github.com/layer6ai-labs/dgm-eval.	翻訳日:2023-11-01 23:41:06 公開日:2023-10-30
# マルチモーダル核融合相互作用:人間と自動定量化の研究 Multimodal Fusion Interactions: A Study of Human and Automatic Quantification ( http://arxiv.org/abs/2306.04125v2 ) ライセンス: Link先を確認	Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency	(参考訳) 異種信号のマルチモーダル融合を実現するためには、各モーダルが個別にタスクに有用な情報を提供し、この情報が他のモーダルの存在下でどのように変化するかを理解する必要がある。本稿では,(1)アノテータが第1,第2,両モダリティをアノテートする部分ラベル,(2)アノテータが第1,第2,第2のモダリティをアノテートする対物ラベルと,(2)アノテータが第1のモダリティをアノテートして,第2のモダリティをアノテートする部分ラベル,の2つのカテゴリをアノテートする方法の比較検討を行った。さらに、(3)情報分解に基づく別の分類法を提案し、アノテータが冗長性の度合いを注釈する: モダリティが個々に同時に同じ予測を与える範囲、一様性: 1つのモダリティが他方がしない予測を可能にする範囲、および相乗性: 2つのモダリティがそれぞれのモダリティを使用しない予測を行うことができる範囲。実験とアノテーションを通じて,各アプローチのいくつかの機会と限界を強調し,部分的および対実的ラベルのアノテーションを情報分解に自動的に変換する手法を提案する。 In order to perform multimodal fusion of heterogeneous signals, we need to understand their interactions: how each modality individually provides information useful for a task and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how humans annotate two categorizations of multimodal interactions: (1) partial labels, where different annotators annotate the label given the first, second, and both modalities, and (2) counterfactual labels, where the same annotator annotates the label given the first modality before asking them to explicitly reason about how their answer changes when given the second. We further propose an alternative taxonomy based on (3) information decomposition, where annotators annotate the degrees of redundancy: the extent to which modalities individually and together give the same predictions, uniqueness: the extent to which one modality enables a prediction that the other does not, and synergy: the extent to which both modalities enable one to make a prediction that one would not otherwise make using individual modalities. 
Through experiments and annotations, we highlight several opportunities and limitations of each approach and propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition, yielding an accurate and efficient method for quantifying multimodal interactions.	翻訳日:2023-11-01 23:40:06 公開日:2023-10-30
# ビジョンファウンデーションモデルによるラベルなしシーン理解に向けて Towards Label-free Scene Understanding by Vision Foundation Models ( http://arxiv.org/abs/2306.03899v2 ) ライセンス: Link先を確認	Runnan Chen, Youquan Liu, Lingdong Kong, Nenglun Chen, Xinge Zhu, Yuexin Ma, Tongliang Liu, Wenping Wang	(参考訳) Contrastive Vision-Language Pre-Training (CLIP) や Segment Anything (SAM) のような視覚基礎モデルは、画像分類やセグメンテーションタスクにおいて印象的なゼロショット性能を示している。しかし, ラベルなしシーン理解のためのCLIPとSAMの組み入れはまだ検討されていない。本稿では,ラベル付きデータなしで2次元世界と3次元世界を理解可能にするビジョン基盤モデルの可能性を検討する。主な課題は、非常にノイズの多い擬似ラベルの下でネットワークを効果的に監視することであり、これはCLIPによって生成され、2Dから3Dドメインへの伝播中にさらに悪化する。これらの課題に対処するために,CLIPとSAMの強みを利用して同時に2Dと3Dネットワークを監督するクロスモダリティノイズスーパービジョン(CNS)手法を提案する。特に,コトレイン2Dおよび3Dネットワークに対して予測整合性正則化を導入し,さらにSAMの頑健な特徴表現を用いた遅延空間整合性を示す。屋内および屋外の多様なデータセットを用いた実験は,2次元および3次元オープン環境の理解において,本手法の優れた性能を示す。 2dネットワークと3dネットワークは、scannet上で28.4\%と33.5\%miouでラベルなしセマンティクスセグメンテーションを実現し、それぞれ4.7\%と7.9\%を改善した。 nuImages と nuScenes のデータセットでは、それぞれ 22.1\% と 26.8\% であり、3.5\% と 6.0\% の改善がある。コードは利用可能。 (https://github.com/runnanchen/Label-Free-Scene-Understanding)。 Vision foundation models such as Contrastive Vision-Language Pre-training (CLIP) and Segment Anything (SAM) have demonstrated impressive zero-shot performance on image classification and segmentation tasks. However, the incorporation of CLIP and SAM for label-free scene understanding has yet to be explored. In this paper, we investigate the potential of vision foundation models in enabling networks to comprehend 2D and 3D worlds without labelled data. The primary challenge lies in effectively supervising networks under extremely noisy pseudo labels, which are generated by CLIP and further exacerbated during the propagation from the 2D to the 3D domain. To tackle these challenges, we propose a novel Cross-modality Noisy Supervision (CNS) method that leverages the strengths of CLIP and SAM to supervise 2D and 3D networks simultaneously. 
In particular, we introduce a prediction consistency regularization to co-train 2D and 3D networks, then further impose the networks' latent space consistency using the SAM's robust feature representation. Experiments conducted on diverse indoor and outdoor datasets demonstrate the superior performance of our method in understanding 2D and 3D open environments. Our 2D and 3D network achieves label-free semantic segmentation with 28.4\% and 33.5\% mIoU on ScanNet, improving 4.7\% and 7.9\%, respectively. For nuImages and nuScenes datasets, the performance is 22.1\% and 26.8\% with improvements of 3.5\% and 6.0\%, respectively. Code is available. (https://github.com/runnanchen/Label-Free-Scene-Understanding).	翻訳日:2023-11-01 23:39:07 公開日:2023-10-30
# ランダム分布シフトによる学習 Learning under random distributional shifts ( http://arxiv.org/abs/2306.02948v2 ) ライセンス: Link先を確認	Kirk Bansak, Elisabeth Paulson, Dominik Rothenhäusler	(参考訳) 分布シフトのある設定で予測を生成する既存のアプローチの多くは、分布シフトを適切な表現における敵対的あるいは低ランクなものとしてモデル化している。しかし、様々な現実の環境では、人口と環境の多くの小さなランダムな変化の重ね合わせによって、変化が起こるかもしれない。したがって,共変量空間の任意の変化を捉えたランダム分布シフトモデルと,共変量と結果の関係に対する密集したランダムショックモデルを考える。この設定では、関心の長期的な結果を直接予測する標準的なアプローチ、短期的なプロキシ結果を直接予測するプロキシアプローチ、長期的なポリシー結果と(短期的な)プロキシ結果の両方を利用するハイブリッドアプローチなど、いくつかの代替予測戦略の利点と欠点を特徴づける。ハイブリッドアプローチは分散シフトの強さとプロキシ関係の強さに頑健であることを示す。本研究では,この手法を難民申請者の割り当てと幼児教育という2つのハイインパクト領域のデータセットに適用する。どちらの設定でも、提案手法は現在の手法よりも平均二乗誤差がかなり低いことが分かる。 Many existing approaches for generating predictions in settings with distribution shift model distribution shifts as adversarial or low-rank in suitable representations. In various real-world settings, however, we might expect shifts to arise through the superposition of many small and random changes in the population and environment. Thus, we consider a class of random distribution shift models that capture arbitrary changes in the underlying covariate space, and dense, random shocks to the relationship between the covariates and the outcomes. In this setting, we characterize the benefits and drawbacks of several alternative prediction strategies: the standard approach that directly predicts the long-term outcome of interest, the proxy approach that directly predicts a shorter-term proxy outcome, and a hybrid approach that utilizes both the long-term policy outcome and (shorter-term) proxy outcome(s). We show that the hybrid approach is robust to the strength of the distribution shift and the proxy relationship. We apply this method to datasets in two high-impact domains: asylum-seeker assignment and early childhood education. In both settings, we find that the proposed approach results in substantially lower mean-squared error than current approaches.	翻訳日:2023-11-01 23:38:12 公開日:2023-10-30
# 分散シフト下におけるビデオ自己教師型学習の隠れダイナミクスの解明 Uncovering the Hidden Dynamics of Video Self-supervised Learning under Distribution Shifts ( http://arxiv.org/abs/2306.02014v2 ) ライセンス: Link先を確認	Pritam Sarkar, Ahmad Beirami, Ali Etemad	(参考訳) ビデオ自己教師型学習(VSSL)は近年大きな進歩を遂げている。しかし、分布シフトの異なる形でのこれらのモデルの正確な挙動とダイナミクスはまだ分かっていない。本稿では, 様々な形態の自然分布変化に対応する6種類の自己監督手法(v-SimCLR, v-MoCo, v-BYOL, v-SimSiam, v-DINO, v-MAE)の挙動を総合的に検討する。 (i)コンテキストシフト。 (ii)視点転換。 (iii)俳優交代。 (iv) ソースシフト。 (v)未知クラスへの一般化可能性(ゼロショット) (vi)オープンセット認識。この広範な研究を行うために,利用可能な公開データセットと一連の評価プロトコルを用いて17の分散および分散ベンチマークペアからなるテストベッドを,意図したシフトで異なるメソッドをストレステストするために慎重に作成する。本研究は,VSSL手法の興味深い発見と興味深い挙動を明らかにするものである。例えば、ビデオモデルは一般的にコンテキストシフトに苦しむが、v-MAEと教師付き学習はより堅牢性を示す。また,v-MAEは時間的学習者であり,v-SimCLRとv-MoCoは視点変化に対して強い性能を示す。オープンセット認識の概念を研究する際,事前学習したVSSLエンコーダを微調整することなく使用した場合,クローズドセットとオープンセット認識性能のトレードオフに気づく。私たちの研究が,実世界のさまざまなシナリオを対象としたロバストなビデオ表現学習フレームワークの開発に貢献できることを願っています。プロジェクトページとコードは、https://pritamqu.github.io/ood-vssl。 Video self-supervised learning (VSSL) has made significant progress in recent years. However, the exact behavior and dynamics of these models under different forms of distribution shift are not yet known. In this paper, we comprehensively study the behavior of six popular self-supervised methods (v-SimCLR, v-MoCo, v-BYOL, v-SimSiam, v-DINO, v-MAE) in response to various forms of natural distribution shift, i.e., (i) context shift, (ii) viewpoint shift, (iii) actor shift, (iv) source shift, (v) generalizability to unknown classes (zero-shot), and (vi) open-set recognition. To perform this extensive study, we carefully craft a test bed consisting of 17 in-distribution and out-of-distribution benchmark pairs using available public datasets and a series of evaluation protocols to stress-test the different methods under the intended shifts. Our study uncovers a series of intriguing findings and interesting behaviors of VSSL methods. 
For instance, we observe that while video models generally struggle with context shifts, v-MAE and supervised learning exhibit more robustness. Moreover, our study shows that v-MAE is a strong temporal learner, whereas contrastive methods, v-SimCLR and v-MoCo, exhibit strong performances against viewpoint shifts. When studying the notion of open-set recognition, we notice a trade-off between closed-set and open-set recognition performance if the pretrained VSSL encoders are used without finetuning. We hope that our work will contribute to the development of robust video representation learning frameworks for various real-world scenarios. The project page and code are available at: https://pritamqu.github.io/OOD-VSSL.	翻訳日:2023-11-01 23:37:43 公開日:2023-10-30
# PLASTIC: 有効強化学習のための入力とラベルの塑性の改善 PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning ( http://arxiv.org/abs/2306.10711v2 ) ライセンス: Link先を確認	Hojoon Lee, Hanseul Cho, Hyunseung Kim, Daehoon Gwak, Joonkee Kim, Jaegul Choo, Se-Young Yun, Chulhee Yun	(参考訳) 強化学習(RL)では、特にデータ取得が高価でリスクの高いシナリオにおいて、サンプル効率の向上が不可欠である。原則として、オフポリシーrlアルゴリズムは、環境インタラクション毎に複数の更新を可能にすることにより、サンプル効率を向上させることができる。しかしながら、これらの複数の更新は、しばしば、可塑性の喪失と呼ばれる以前の相互作用に過度に適合するモデルにつながる。本研究は, この現象の原因を, 塑性を2つの側面に分けて検討した。入力可塑性(英: Input plasticity)とは、入力データの変更に対するモデルの適応性、および入力-出力関係の進化に対するモデルの適応性を示すラベル可塑性である。 cifar-10データセットの合成実験により、より滑らかなロスランドスケープの発見は入力可塑性を増加させ、一方、洗練された勾配伝播はラベル可塑性を改善することが判明した。これらの知見を活かしてPLASTICアルゴリズムを導入し,両問題に対処する手法を調和的に組み合わせた。最小限のアーキテクチャ変更により、PLASTICはAtari-100kやDeepmind Control Suiteといったベンチマーク上での競合性能を達成した。この結果は、RLの試料効率を高めるためにモデルの可塑性を維持することの重要性を強調している。コードはhttps://github.com/dojeon-ai/plasticで入手できる。 In Reinforcement Learning (RL), enhancing sample efficiency is crucial, particularly in scenarios when data acquisition is costly and risky. In principle, off-policy RL algorithms can improve sample efficiency by allowing multiple updates per environment interaction. However, these multiple updates often lead the model to overfit to earlier interactions, which is referred to as the loss of plasticity. Our study investigates the underlying causes of this phenomenon by dividing plasticity into two aspects. Input plasticity, which denotes the model's adaptability to changing input data, and label plasticity, which denotes the model's adaptability to evolving input-output relationships. Synthetic experiments on the CIFAR-10 dataset reveal that finding smoother minima of loss landscape enhances input plasticity, whereas refined gradient propagation improves label plasticity. Leveraging these findings, we introduce the PLASTIC algorithm, which harmoniously combines techniques to address both concerns. 
With minimal architectural modifications, PLASTIC achieves competitive performance on benchmarks including Atari-100k and Deepmind Control Suite. This result emphasizes the importance of preserving the model's plasticity to elevate the sample efficiency in RL. The code is available at https://github.com/dojeon-ai/plastic.	翻訳日:2023-11-01 23:29:56 公開日:2023-10-30
# 明示的制約を考慮した学習ダイナミクスのための安定化ニューラル微分方程式 Stabilized Neural Differential Equations for Learning Dynamics with Explicit Constraints ( http://arxiv.org/abs/2306.09739v2 ) ライセンス: Link先を確認	Alistair White, Niki Kilbertus, Maximilian Gelbrecht, Niklas Boers	(参考訳) データから動的システムを学ぶための多くの手法が最近導入された。しかしながら、推論力学が、保護法や許可されたシステム状態の制限といった既知の制約を確実に維持することはまだ困難である。本稿では, 線形微分方程式に対する任意の多様体制約を強制する手法である安定化ニューラル微分方程式(SNDE)を提案する。我々のアプローチは安定化項に基づいており、元の力学に加えると、制約多様体は漸近的に安定である。その単純さから,本手法はすべての共通神経微分方程式(nde)モデルと適合し,広く適用可能である。実験的な評価では、SNDEは既存の手法よりも優れており、NDEトレーニングに組み込むことができる制約の種類を広くしている。 Many successful methods to learn dynamical systems from data have recently been introduced. However, ensuring that the inferred dynamics preserve known constraints, such as conservation laws or restrictions on the allowed system states, remains challenging. We propose stabilized neural differential equations (SNDEs), a method to enforce arbitrary manifold constraints for neural differential equations. Our approach is based on a stabilization term that, when added to the original dynamics, renders the constraint manifold provably asymptotically stable. Due to its simplicity, our method is compatible with all common neural differential equation (NDE) models and broadly applicable. In extensive empirical evaluations, we demonstrate that SNDEs outperform existing methods while broadening the types of constraints that can be incorporated into NDE training.	翻訳日:2023-11-01 23:29:16 公開日:2023-10-30
# QH9:QM9分子の量子ハミルトン予測ベンチマーク QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules ( http://arxiv.org/abs/2306.09549v2 ) ライセンス: Link先を確認	Haiyang Yu, Meng Liu, Youzhi Luo, Alex Strasser, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji	(参考訳) 教師付き機械学習アプローチは、密度汎関数理論(DFT)のような第一原理計算手法の代用として、電子構造予測の加速にますます利用されている。多くの量子化学データセットは化学的性質と原子力に焦点を当てているが、物理系と化学特性の量子状態を決定する最も重要かつ基本的な物理量であるため、ハミルトン行列の正確かつ効率的な予測を達成する能力は非常に望ましい。本研究では、QM9データセットに基づいて、2,399の分子動力学軌道と130,831の安定な分子ジオメトリに対して正確なハミルトン行列を提供するために、QH9と呼ばれる新しい量子ハミルトンデータセットを生成する。様々な分子を用いてベンチマークタスクを設計することにより、現在の機械学習モデルは任意の分子に対するハミルトン行列を予測する能力を有することを示す。 QH9データセットとベースラインモデルの両方がオープンソースベンチマークを通じてコミュニティに提供されており、機械学習手法の開発や、科学および技術応用のための分子および材料設計の加速に非常に有用である。私たちのベンチマークはhttps://github.com/divelab/AIRS/tree/main/OpenDFT/QHBenchで公開されています。 Supervised machine learning approaches have been increasingly used in accelerating electronic structure prediction as surrogates of first-principle computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, the ability to achieve accurate and efficient prediction of the Hamiltonian matrix is highly desired, as it is the most important and fundamental physical quantity that determines the quantum states of physical systems and chemical properties. In this work, we generate a new Quantum Hamiltonian dataset, named as QH9, to provide precise Hamiltonian matrices for 2,399 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. 
Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench.	翻訳日:2023-11-01 23:29:04 公開日:2023-10-30
# 除去に基づく特徴属性のロバスト性について On the Robustness of Removal-Based Feature Attributions ( http://arxiv.org/abs/2306.07462v2 ) ライセンス: Link先を確認	Chris Lin, Ian Covert, Su-In Lee	(参考訳) 複雑な機械学習モデルによる予測を説明するため、重要点を入力特徴に割り当てる多くの特徴属性法が開発されている。最近の研究の中には、入力やモデル摂動に敏感であることを示すことによって、これらの手法の堅牢性に挑戦するものもある。しかし,従来の帰属ロバスト性は,主に勾配に基づく特徴帰属に焦点が当てられているが,現在,除去に基づく帰属法のロバスト性はよく分かっていない。このギャップを埋めるために、我々は除去に基づく特徴属性の堅牢性特性を理論的に特徴づける。具体的には,これらの手法の統一的な解析を行い,入力とモデルの両方の摂動の設定下で,無傷と摂動の差の上界を導出する。合成データと実世界のデータを用いた実験結果は,理論結果の妥当性を検証し,モデルのリプシッツ正則性向上による帰属ロバスト性の向上など,その実践的意義を実証した。 To explain predictions made by complex machine learning models, many feature attribution methods have been developed that assign importance scores to input features. Some recent work challenges the robustness of these methods by showing that they are sensitive to input and model perturbations, while other work addresses this issue by proposing robust attribution methods. However, previous work on attribution robustness has focused primarily on gradient-based feature attributions, whereas the robustness of removal-based attribution methods is not currently well understood. To bridge this gap, we theoretically characterize the robustness properties of removal-based feature attributions. Specifically, we provide a unified analysis of such methods and derive upper bounds for the difference between intact and perturbed attributions, under settings of both input and model perturbations. Our empirical results on synthetic and real-world data validate our theoretical results and demonstrate their practical implications, including the ability to increase attribution robustness by improving the model's Lipschitz regularity.	翻訳日:2023-11-01 23:27:40 公開日:2023-10-30
# FLSL: 機能レベルの自己教師型学習 FLSL: Feature-level Self-supervised Learning ( http://arxiv.org/abs/2306.06203v2 ) ライセンス: Link先を確認	Qing Su, Anton Netchaev, Hai Li, and Shihao Ji	(参考訳) 現在の自己教師型学習(SSL)手法(例えば、SimCLR, DINO, VICReg, MOCOv3)は、主にインスタンスレベルでの表現を目標としており、オブジェクト検出やセグメンテーションなどの高密度な予測タスクには適さない。共同埋め込みとクラスタリングにトランスフォーマーを用いることにより,FLSL(Feature-Level Self-supervised Learning)と呼ばれる2レベル特徴クラスタリングSSL法を提案する。 FLSL問題の形式的定義を示し、平均シフトおよびk平均視点から目的を構築する。 FLSLは目覚しいセマンティッククラスタ表現を促進し,ビュー内およびビュー間特徴クラスタリングに適した埋め込みスキームを学習する。実験の結果、FLSLは高密度予測タスクにおいて大幅に改善し、対象検出では44.9 (+2.8)% APと46.5% AP、MS-COCOでは40.8 (+2.3)% APと42.1% APを達成した。 FLSL は UAVDT 上の UAV17 オブジェクト検出や DAVIS 2017 上のビデオインスタンスセグメンテーションなど,既存の SSL メソッドよりも一貫して優れている。ソースコードはhttps://github.com/isl-cv/flslで入手できる。 Current self-supervised learning (SSL) methods (e.g., SimCLR, DINO, VICReg, MOCOv3) primarily target instance-level representations and do not generalize well to dense prediction tasks, such as object detection and segmentation. Towards aligning SSL with dense predictions, this paper demonstrates for the first time the underlying mean-shift clustering process of Vision Transformers (ViT), which aligns well with natural image semantics (e.g., a world of objects and stuffs). By employing transformer for joint embedding and clustering, we propose a two-level feature clustering SSL method, coined Feature-Level Self-supervised Learning (FLSL). We present the formal definition of the FLSL problem and construct the objectives from the mean-shift and k-means perspectives. We show that FLSL promotes remarkable semantic cluster representations and learns an embedding scheme amenable to intra-view and inter-view feature clustering. Experiments show that FLSL yields significant improvements in dense prediction tasks, achieving 44.9 (+2.8)% AP and 46.5% AP in object detection, as well as 40.8 (+2.3)% AP and 42.1% AP in instance segmentation on MS-COCO, using Mask R-CNN with ViT-S/16 and ViT-S/8 as backbone, respectively. 
FLSL consistently outperforms existing SSL methods across additional benchmarks, including UAV17 object detection on UAVDT, and video instance segmentation on DAVIS 2017. We conclude by presenting visualizations and various ablation studies to better understand the success of FLSL. The source code is available at https://github.com/ISL-CV/FLSL.	翻訳日:2023-11-01 23:26:09 公開日:2023-10-30
# Intensity Profile Projection:動的ネットワークのための連続時間表現学習フレームワーク Intensity Profile Projection: A Framework for Continuous-Time Representation Learning for Dynamic Networks ( http://arxiv.org/abs/2306.06155v2 ) ライセンス: Link先を確認	Alexander Modell, Ian Gallagher, Emma Ceccherini, Nick Whiteley and Patrick Rubin-Delanchy	(参考訳) 連続時間動的ネットワークデータのための新しい表現学習フレームワークIntensity Profile Projectionを提案する。 2つのエンティティ(i,j$)間の時間スタンプ(t$)の相互作用を表すトリプル$(i,j,t)$を与えられた場合、我々の手順は各ノードに対して連続時間軌跡を返す。このフレームワークは3つの段階から構成される:例えば、カーネルの滑らか化によるペアエント関数の推定、強度再構成誤差を最小化するプロジェクションの学習、学習されたプロジェクションを通して進化するノード表現の構築。軌道は構造的コヒーレンスと時間的コヒーレンスという2つの性質を満たしており、これは信頼できる推論の基本的なものである。さらに,推定軌跡の誤差を厳密に制御できる推定理論を考案し,ノイズに敏感な追従解析でもその表現が利用できることを示す。この理論はまた、偏分散トレードオフとしての平滑化の役割を解明し、ネットワーク全体の「ボーリング強度」のアルゴリズムを考慮すると、信号対雑音比が増加するにつれて平滑化のレベルをいかに低減できるかを示す。 We present a new representation learning framework, Intensity Profile Projection, for continuous-time dynamic network data. Given triples $(i,j,t)$, each representing a time-stamped ($t$) interaction between two entities ($i,j$), our procedure returns a continuous-time trajectory for each node, representing its behaviour over time. The framework consists of three stages: estimating pairwise intensity functions, e.g. via kernel smoothing; learning a projection which minimises a notion of intensity reconstruction error; and constructing evolving node representations via the learned projection. The trajectories satisfy two properties, known as structural and temporal coherence, which we see as fundamental for reliable inference. Moreover, we develop estimation theory providing tight control on the error of any estimated trajectory, indicating that the representations could even be used in quite noise-sensitive follow-on analyses. The theory also elucidates the role of smoothing as a bias-variance trade-off, and shows how we can reduce the level of smoothing as the signal-to-noise ratio increases on account of the algorithm 'borrowing strength' across the network.	翻訳日:2023-11-01 23:25:35 公開日:2023-10-30
# T2I-CompBench: オープンワールドコンポジションテキスト画像生成のための総合ベンチマーク T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation ( http://arxiv.org/abs/2307.06350v2 ) ライセンス: Link先を確認	Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, Xihui Liu	(参考訳) 最近のテキストから画像へのモデルによって高品質な画像を生成する素晴らしい能力にもかかわらず、現在のアプローチでは、異なる属性と関係を持つオブジェクトを複雑で一貫性のあるシーンに効果的に構成するのに苦労することが多い。 T2I-CompBenchは3つのカテゴリ(属性バインディング、オブジェクト関係、複雑な構成)と6つのサブカテゴリ(カラーバインディング、形状バインディング、テクスチャバインディング、空間関係、非空間関係、複雑な構成)から6000のコンポジションテキストプロンプトからなるオープンワールドコンポジションテキスト画像生成のための総合ベンチマークである。さらに,合成テキストから画像への生成を評価するために特別に設計された評価指標をいくつか提案し,マルチモーダルllmの可能性と限界について検討する。本稿では,プリトレーニングされたテキスト対画像モデルの合成テキスト生成能力を高めるために,報酬駆動サンプル選択(gors)による生成モデルの微調整を提案する。従来のt2i-compbench法をベンチマークし,提案手法の有効性を検証するため,広範な実験と評価を行った。プロジェクトページはhttps://karine-h.github.io/t2i-compbench/。 Despite the stunning ability to generate high-quality images by recent text-to-image models, current approaches often struggle to effectively compose objects with different attributes and relationships into a complex and coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed to evaluate compositional text-to-image generation and explore the potential and limitations of multimodal LLMs for evaluation. We introduce a new approach, Generative mOdel fine-tuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. 
Extensive experiments and evaluations are conducted to benchmark previous methods on T2I-CompBench, and to validate the effectiveness of our proposed evaluation metrics and GORS approach. Project page is available at https://karine-h.github.io/T2I-CompBench/.	翻訳日:2023-11-01 23:18:05 公開日:2023-10-30
# テキスト記述は視覚学習のための圧縮的・不変表現である Text Descriptions are Compressive and Invariant Representations for Visual Learning ( http://arxiv.org/abs/2307.04317v2 ) ライセンス: Link先を確認	Zhili Feng, Anna Bair, J. Zico Kolter	(参考訳) 現代の画像分類は、分類決定を構成する直感的な視覚的特徴に関する情報を直接含まない、大きな識別ネットワークを介してクラスを直接予測することに基づいている。近年、CLIPのような視覚言語モデル(VLM)の研究は、画像クラスの自然言語記述を規定する手段を提供しているが、一般的には各クラスに単一の記述を提供することに焦点を当てている。本研究では,クラスごとの視覚的特徴に対する人間の理解に則った代替手法が,頑健な数ショット学習環境において魅力的な性能を提供できることを示す。特に,新しい手法である「textit{SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors)}を導入する。この手法はまず,まず大規模言語モデル(LLM)を用いて各クラスの視覚的記述を自動的に生成し,次にVLMを用いて各画像の視覚的特徴埋め込みに変換し,最後に,各特徴の関連部分集合を選択して各画像の分類を行う。我々のアプローチの中核は、情報理論上、これらの記述的特徴は、vlmトレーニングプロセスが不変表現学習のために明示的に設計されていないにもかかわらず、従来の画像埋め込みよりもドメインシフトに不変であるという事実です。これらの不変記述機能は、より良い入力圧縮スキームを構成する。ファインチューニングと組み合わせることで、SLR-AVDは、分布内および分布外の両方において既存の最先端のファインチューニング手法より優れていることを示す。 Modern image classification is based upon directly predicting classes via large discriminative networks, which do not directly contain information about the intuitive visual features that may constitute a classification decision. Recently, work in vision-language models (VLM) such as CLIP has provided ways to specify natural language descriptions of image classes, but typically focuses on providing single descriptions for each class. In this work, we demonstrate that an alternative approach, in line with humans' understanding of multiple visual features per class, can also provide compelling performance in the robust few-shot learning setting. In particular, we introduce a novel method, \textit{SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors)}. This method first automatically generates multiple visual descriptions of each class via a large language model (LLM), then uses a VLM to translate these descriptions to a set of visual feature embeddings of each image, and finally uses sparse logistic regression to select a relevant subset of these features to classify each image. 
Core to our approach is the fact that, information-theoretically, these descriptive features are more invariant to domain shift than traditional image embeddings, even though the VLM training process is not explicitly designed for invariant representation learning. These invariant descriptive features also compose a better input compression scheme. When combined with finetuning, we show that SLR-AVD is able to outperform existing state-of-the-art finetuning approaches on both in-distribution and out-of-distribution performance.	翻訳日:2023-11-01 23:17:11 公開日:2023-10-30
# 3分間の人間フィードバックを用いた拡散モデルの検閲サンプリング Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback ( http://arxiv.org/abs/2307.02770v2 ) ライセンス: Link先を確認	TaeHo Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu	(参考訳) 拡散モデルは最近、高品質な画像生成で顕著な成功を収めている。しかし、事前学習された拡散モデルは、良い画像を生成できるという意味で部分的な不一致を示すことがあるが、望ましくない画像を出力することもある。もしそうなら、単に悪い画像を生成するのを防ぎ、このタスクを検閲と呼びます。本研究では,最小の人間フィードバックに基づいて学習した報酬モデルを用いて,事前学習した拡散モデルを用いた検閲生成法を提案する。検閲は極端に人的フィードバック効率で達成でき、ほんの数分のフィードバックで生成されたラベルだけで十分であることを示す。コードは https://github.com/tetrzim/diffusion-human-feedback で利用可能。 Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment in the sense that the model can generate good images, but it sometimes outputs undesirable images. If so, we simply need to prevent the generation of the bad images, and we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency and that labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.	翻訳日:2023-11-01 23:16:45 公開日:2023-10-30
# スライスワッサーシュタイン一般化測地学による高速最適輸送 Fast Optimal Transport through Sliced Wasserstein Generalized Geodesics ( http://arxiv.org/abs/2307.01770v2 ) ライセンス: Link先を確認	Guillaume Mahey, Laetitia Chapel, Gilles Gasso, Clément Bonet, Nicolas Courty	(参考訳) ワッサースタイン距離(wasserstein distance, wd)と関連する最適輸送計画は、確率測度が懸かっている多くの応用において有用であることが証明されている。本稿では,2つの入力分布の最適1次元投影により誘導される輸送マップに基づく,2乗WDの新たなプロキシであるmin-SWGGを提案する。 min-swgg と wasserstein の一般化測地学との接続を描き、ピボット測度を直線上で支持する。特に、ライン上でサポートされている分布の1つの場合において、正確なワッサースタイン距離に対する新しい閉形式を提供し、勾配降下最適化に適応可能な高速計算スキームを導出する。 min-SWGG は WD の上限であり,Sliced-Wasserstein と同様の複雑性を有し,関連する輸送計画を提供するという付加的な特徴を有することを示す。また、距離性、弱収束、計算および位相的性質などの理論的性質についても検討する。実験的な証拠は、勾配流、形状マッチング、画像の着色など、様々な文脈におけるmin-SWGGの利点を支持する。 Wasserstein distance (WD) and the associated optimal transport plan have been proven useful in many applications where probability measures are at stake. In this paper, we propose a new proxy of the squared WD, coined min-SWGG, that is based on the transport map induced by an optimal one-dimensional projection of the two input distributions. We draw connections between min-SWGG and Wasserstein generalized geodesics in which the pivot measure is supported on a line. We notably provide a new closed form for the exact Wasserstein distance in the particular case where one of the distributions is supported on a line, allowing us to derive a fast computational scheme that is amenable to gradient descent optimization. We show that min-SWGG is an upper bound of WD and that it has a complexity similar to that of Sliced-Wasserstein, with the additional feature of providing an associated transport plan. We also investigate some theoretical properties such as metricity, weak convergence, computational and topological properties. Empirical evidence supports the benefits of min-SWGG in various contexts, including gradient flows, shape matching, and image colorization.	翻訳日:2023-11-01 23:16:34 公開日:2023-10-30
# アイデンティティ効果学習におけるグラフニューラルネットワークの一般化限界 Generalization Limits of Graph Neural Networks in Identity Effects Learning ( http://arxiv.org/abs/2307.00134v2 ) ライセンス: Link先を確認	Giuseppe Alessio D'Inverno and Simone Brugiapaglia and Mirco Ravanelli	(参考訳) グラフニューラルネットワーク(GNN)は、さまざまなグラフドメインでデータ駆動学習を行う強力なツールとして登場した。それらは通常、メッセージパス機構に基づいており、表現力の点で同等であることが証明されたグラフ同型に対するWeisfeiler-Lehman (WL)テストと密接に関連している直感的な定式化で人気を高めている。本研究では,物体が2つの同一成分からなるか否かを判断するタスク,いわゆるアイデンティティ効果の学習の文脈において,新たな一般化特性とgnnの基本限界を確立する。本研究の目的は,GNNが単純な認知タスクを遂行する際の能力を理解することであり,計算言語学や化学への応用の可能性にある。 2つのケーススタディを分析しました (i)二文字の単語は、一線表現のような直交符号化を利用する場合、確率勾配降下により訓練されたGNNが、見知らぬ文字に一般化できないことを示す。 (ii)二環グラフ、すなわち2つのサイクルからなるグラフは、GNNとWLテストの接続を利用して正の存在結果を示す。我々の理論解析は広範な数値研究によって裏付けられている。 Graph Neural Networks (GNNs) have emerged as a powerful tool for data-driven learning on various graph domains. They are usually based on a message-passing mechanism and have gained increasing popularity for their intuitive formulation, which is closely linked to the Weisfeiler-Lehman (WL) test for graph isomorphism to which they have been proven equivalent in terms of expressive power. In this work, we establish new generalization properties and fundamental limits of GNNs in the context of learning so-called identity effects, i.e., the task of determining whether an object is composed of two identical components or not. Our study is motivated by the need to understand the capabilities of GNNs when performing simple cognitive tasks, with potential applications in computational linguistics and chemistry. We analyze two case studies: (i) two-letter words, for which we show that GNNs trained via stochastic gradient descent are unable to generalize to unseen letters when utilizing orthogonal encodings like one-hot representations; (ii) dicyclic graphs, i.e., graphs composed of two cycles, for which we present positive existence results leveraging the connection between GNNs and the WL test. 
Our theoretical analysis is supported by an extensive numerical study.	翻訳日:2023-11-01 23:15:54 公開日:2023-10-30
# プロンプトによるパーソナライズドコールドスタート勧告に向けて Towards Personalized Cold-Start Recommendation with Prompts ( http://arxiv.org/abs/2306.17256v3 ) ライセンス: Link先を確認	Xuansheng Wu, Huachi Zhou, Yucheng Shi, Wenlin Yao, Xiao Huang, Ninghao Liu	(参考訳) レコメンダシステムは,過去の行動に基づいて,ユーザの興味に沿った情報発見を支援する上で,重要な役割を担っている。しかし、ユーザとコンテンツのインタラクションの履歴が利用できない場合、パーソナライズドレコメンデーションシステムの開発は困難になり、システムコールドスタートレコメンデーション問題として知られる問題に繋がる。この問題は、ユーザーエンゲージメントが不十分なスタートアップ企業やプラットフォームで特に顕著である。従来の研究では、新しいユーザやアイテムを推薦できるが、同じドメイン内の歴史的なユーザとアイテムのインタラクションでトレーニングされているため、私たちの問題は解決できない。このギャップを埋めるため,本研究では,事前学習した言語モデルの能力を活用した革新的かつ効果的なアプローチを提案する。提案手法は,ユーザプロファイルや項目属性の情報を含む自然言語の感情分析に変換され,迅速な学習によって感情極性が予測される。言語モデルに格納された広範な知識を利用することで、歴史的ユーザ・アイテム相互作用の記録なしで予測を行うことができる。また,提案手法を冷間開始条件下で評価するためのベンチマークも導入し,本手法の有効性を実証した。私たちの知る限りでは、システムコールドスタートレコメンデーション問題に取り組む最初の研究である。メソッドのベンチマークと実装は https://github.com/JacksonWuxs/PromptRec で公開されている。 Recommender systems play a crucial role in helping users discover information that aligns with their interests based on their past behaviors. However, developing personalized recommendation systems becomes challenging when historical records of user-item interactions are unavailable, leading to what is known as the system cold-start recommendation problem. This issue is particularly prominent in start-up businesses or platforms with insufficient user engagement history. Previous studies focus on user or item cold-start scenarios, where systems could make recommendations for new users or items but are still trained with historical user-item interactions in the same domain, which cannot solve our problem. To bridge the gap, our research introduces an innovative and effective approach, capitalizing on the capabilities of pre-trained language models. We transform the recommendation process into sentiment analysis of natural languages containing information of user profiles and item attributes, where the sentiment polarity is predicted with prompt learning. 
By harnessing the extensive knowledge housed within language models, the prediction can be made without historical user-item interaction records. A benchmark is also introduced to evaluate the proposed method under the cold-start setting, and the results demonstrate the effectiveness of our method. To the best of our knowledge, this is the first study to tackle the system cold-start recommendation problem. The benchmark and implementation of the method are available at https://github.com/JacksonWuxs/PromptRec.	翻訳日:2023-11-01 23:15:33 公開日:2023-10-30
# 拡散確率モデルのスパイキング Spiking Denoising Diffusion Probabilistic Models ( http://arxiv.org/abs/2306.17046v3 ) ライセンス: Link先を確認	Jiahang Cao, Ziqing Wang, Hanzhong Guo, Hao Cheng, Qiang Zhang, Renjing Xu	(参考訳) スパイキングニューラルネットワーク(SNN)は、人工ニューラルネットワーク(ANN)と比較して、二元的および生物駆動的な性質のため、超低エネルギー消費と高い生物学的可視性を有する。これまでの研究は主に分類タスクにおけるsnsの性能向上に重点を置いてきたが、snsの生成可能性は比較的未解明のままである。本稿では,SNN を用いた新しい生成モデルである Spking Denoising Diffusion Probabilistic Models (SDDPM) について述べる。 SNNのエネルギー効率をフル活用するために,ANNに匹敵する性能を実現する純粋にスパイクされたU-Netアーキテクチャを提案する。広範な実験結果から,提案手法は生成タスクの最先端化を達成し,他のsnベースの生成モデルよりも大幅に優れ,cifar-10とcelebaデータセットでは最大12倍,6倍の改善が得られた。さらに,トレーニングフリーでパフォーマンスをさらに2.69%向上させることができるしきい値誘導戦略を提案する。 SDDPMはSNN生成の分野での大きな進歩を象徴し、新たな視点と潜在的な探索の道のりを注入している。私たちのコードはhttps://github.com/AndyCao1125/SDDPMで利用可能です。 Spiking neural networks (SNNs) have ultra-low energy consumption and high biological plausibility due to their binary and bio-driven nature compared with artificial neural networks (ANNs). While previous research has primarily focused on enhancing the performance of SNNs in classification tasks, the generative potential of SNNs remains relatively unexplored. In our paper, we put forward Spiking Denoising Diffusion Probabilistic Models (SDDPM), a new class of SNN-based generative models that achieve high sample quality. To fully exploit the energy efficiency of SNNs, we propose a purely Spiking U-Net architecture, which achieves comparable performance to its ANN counterpart using only 4 time steps, resulting in significantly reduced energy consumption. Extensive experimental results reveal that our approach achieves state-of-the-art on the generative tasks and substantially outperforms other SNN-based generative models, achieving up to 12x and 6x improvement on the CIFAR-10 and the CelebA datasets, respectively. Moreover, we propose a threshold-guided strategy that can further improve the performances by 2.69% in a training-free manner. 
The SDDPM symbolizes a significant advancement in the field of SNN generation, injecting new perspectives and potential avenues of exploration. Our code is available at https://github.com/AndyCao1125/SDDPM.	翻訳日:2023-11-01 23:14:38 公開日:2023-10-30
# 深部微分型メッシュ変形による腹部臓器の分節 Abdominal organ segmentation via deep diffeomorphic mesh deformations ( http://arxiv.org/abs/2306.15515v2 ) ライセンス: Link先を確認	Fabian Bongratz, Anne-Marie Rickmann, Christian Wachinger	(参考訳) CTとMRIによる腹部臓器の分節は,手術計画とコンピュータ支援ナビゲーションシステムにとって必須の要件である。腹部臓器の形状,大きさ,位置の多様性が高いため,困難である。テンプレートに対する点対応の腹部形状の3次元数値表現は、その定量的および統計的解析においてさらに重要である。近年,テンプレートベースの表面抽出法は,体積走査によるメッシュ再構築に期待できる進歩を見せている。しかし, 様々な臓器やデータセットに対する深層学習に基づくアプローチの一般化は, 臨床環境への展開にとって重要な要素であり, まだ評価されていない。このギャップを埋めて, 肝臓, 腎臓, 膵臓, 脾臓分節に対するテンプレートベースのメッシュ再構成法を応用した。手動注記CTおよびMRIデータを用いた実験により,従来の手法を異なる形状のオルガンに限定的に一般化し,小さなデータセット上での弱い性能を示すことができた。我々はこれらの問題を、新しい微分型メッシュデフォーメーションアーキテクチャと改善されたトレーニングスキームで緩和する。結果として得られたUNetFlowは4つの器官すべてによく当てはまり、新しいデータに基づいて簡単に微調整できる。さらに,ボクセルとメッシュの出力を整列させてセグメンテーション精度を高める,単純な登録ベースの後処理を提案する。 Abdominal organ segmentation from CT and MRI is an essential prerequisite for surgical planning and computer-aided navigation systems. It is challenging due to the high variability in the shape, size, and position of abdominal organs. Three-dimensional numeric representations of abdominal shapes with point-wise correspondence to a template are further important for quantitative and statistical analyses thereof. Recently, template-based surface extraction methods have shown promising advances for direct mesh reconstruction from volumetric scans. However, the generalization of these deep learning-based approaches to different organs and datasets, a crucial property for deployment in clinical environments, has not yet been assessed. We close this gap and employ template-based mesh reconstruction methods for joint liver, kidney, pancreas, and spleen segmentation. Our experiments on manually annotated CT and MRI data reveal limited generalization capabilities of previous methods to organs of different geometry and weak performance on small datasets. We alleviate these issues with a novel deep diffeomorphic mesh-deformation architecture and an improved training scheme. 
The resulting method, UNetFlow, generalizes well to all four organs and can be easily fine-tuned on new data. Moreover, we propose a simple registration-based post-processing that aligns voxel and mesh outputs to boost segmentation accuracy.	翻訳日:2023-11-01 23:12:47 公開日:2023-10-30
# 状態のみ列からの非マルコフ決定過程の学習 Learning non-Markovian Decision-Making from State-only Sequences ( http://arxiv.org/abs/2306.15156v3 ) ライセンス: Link先を確認	Aoyang Qin, Feng Gao, Qing Li, Song-Chun Zhu, Sirui Xie	(参考訳) 従来の模倣学習は、デモンストレータの行動にアクセスできることを前提とするが、これらの運動信号は自然な環境では観測できないことが多い。さらに、こうした設定における逐次的な意思決定行動は、標準的なマルコフ決定過程(MDP)の仮定から逸脱しうる。これらの課題に対処するために、方策が状態遷移生成器の潜在空間におけるエネルギーベースの事前分布となる非マルコフ決定過程(nMDP)を用いて、状態のみの系列の深層生成モデリングを検討する。モデルベース模倣を実現するために最尤推定法を開発し、これには事前分布からの短期(short-run)MCMCサンプリングと事後分布に対する重点サンプリングが含まれる。学習したモデルは「推論としての意思決定」を可能にする。すなわち、モデルフリーの方策実行は事前分布からのサンプリングと等価であり、モデルベースの計画は方策から初期化した事後分布からのサンプリングである。非マルコフ制約付きの典型的な経路計画タスクで提案手法の有効性を実証し、MuJoCoスイートの難しい領域で学習モデルが高い性能を示すことを示した。 Conventional imitation learning assumes access to the actions of demonstrators, but these motor signals are often non-observable in naturalistic settings. Additionally, sequential decision-making behaviors in these settings can deviate from the assumptions of a standard Markov Decision Process (MDP). To address these challenges, we explore deep generative modeling of state-only sequences with a non-Markov Decision Process (nMDP), where the policy is an energy-based prior in the latent space of the state transition generator. We develop maximum likelihood estimation to achieve model-based imitation, which involves short-run MCMC sampling from the prior and importance sampling for the posterior. The learned model enables decision-making as inference: model-free policy execution is equivalent to prior sampling, and model-based planning is posterior sampling initialized from the policy. We demonstrate the efficacy of the proposed method in a prototypical path planning task with non-Markovian constraints and show that the learned model exhibits strong performance in challenging domains from the MuJoCo suite.	翻訳日:2023-11-01 23:12:27 公開日:2023-10-30
# InterCode: 実行フィードバックによるインタラクティブコーディングの標準化とベンチマーク InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback ( http://arxiv.org/abs/2306.14898v3 ) ライセンス: Link先を確認	John Yang, Akshara Prabhakar, Karthik Narasimhan, Shunyu Yao	(参考訳) 人間は基本的にインタラクティブな方法でコードを書き、エラーを修正し、曖昧さを解決し、タスクを分解するために一定の実行フィードバックに頼る。 LLMは最近、有望なコーディング機能を示したが、現在のコーディングベンチマークは、主に静的命令からコードへのシーケンスのトランスダクションプロセスを検討しており、エラーの伝播や生成されたコードと最終的な実行環境との切り離しが可能である。このギャップに対処するため、対話型コーディングの軽量でフレキシブルで使いやすいフレームワークであるInterCodeを標準強化学習(RL)環境として導入し、コードをアクションとして、実行フィードバックを観察する。私たちのフレームワークは言語とプラットフォームに依存しず、自己完結型のDocker環境を使用して安全で再現可能な実行を提供し、従来のseq2seqコーディングメソッドと互換性があり、インタラクティブなコード生成のための新しいメソッドの開発を可能にします。私たちはInterCodeを使って、静的なNL2Bash、Spider、MBPPデータセットからのデータを活用する、アクションスペースとしてBash、SQL、Pythonで3つのインタラクティブなコード環境を作成しています。我々は、ReActやPlan & Solveといった様々なプロンプト戦略で構成された複数の最先端LLMを評価することで、InterCodeの生存性をテストベッドとして示す。その結果,インタラクティブなコード生成の利点が示され,コード理解と生成能力向上のための難解なベンチマークとしてインターコードの利用が期待できることを示した。 intercodeは簡単に拡張できるように設計されているが、capture the flagのような新しいタスクを作成するのにも使える。コードとデータを持つプロジェクトサイト: https://intercode-benchmark.github.io Humans write code in a fundamentally interactive manner and rely on constant execution feedback to correct errors, resolve ambiguities, and decompose tasks. While LLMs have recently exhibited promising coding capabilities, current coding benchmarks mostly consider a static instruction-to-code sequence transduction process, which has the potential for error propagation and a disconnect between the generated code and its final execution environment. To address this gap, we introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning (RL) environment, with code as actions and execution feedback as observations. 
Our framework is language and platform agnostic, uses self-contained Docker environments to provide safe and reproducible execution, and is compatible out-of-the-box with traditional seq2seq coding methods, while enabling the development of new methods for interactive code generation. We use InterCode to create three interactive code environments with Bash, SQL, and Python as action spaces, leveraging data from the static NL2Bash, Spider, and MBPP datasets. We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies such as ReAct and Plan & Solve. Our results showcase the benefits of interactive code generation and demonstrate that InterCode can serve as a challenging benchmark for advancing code understanding and generation capabilities. InterCode is designed to be easily extensible and can even be used to create new tasks such as Capture the Flag, a popular coding puzzle that is inherently multi-step and involves multiple programming languages. Project site with code and data: https://intercode-benchmark.github.io	翻訳日:2023-11-01 23:12:08 公開日:2023-10-30
# 位相依存ハンベリー・ブラウン・ツイス効果 Phase Dependent Hanbury-Brown and Twiss effect ( http://arxiv.org/abs/2308.11459v2 ) ライセンス: Link先を確認	Xuan Tang, Yunxiao Zhang, Xueshi Guo, Liang Cui, Xiaoying Li, Z. Y. Ou	(参考訳) ハンベリー・ブラウン・ツイス(HBT)効果は恒星強度干渉法の基礎である。しかし、これは位相に依存しない2光子干渉効果である。本稿では,強度相関測定の前に2つの位相コヒーレントな入力場をコヒーレントな補助場と混合することでHBT干渉計を拡張し,位相に敏感な2光子干渉を実現して,入力場の完全な複素二次コヒーレンス関数を測定する。この実用的な手法は、光領域における天文学的応用のための合成開口イメージングへの道を開く。パルス入力場についても、リモートセンシングや測距への応用の可能性に向けて試験している。さらに,より現実的なcw広帯域アンチバンチ光場を用いて,最近提案された絡み合いに基づく望遠鏡方式を実装するための条件を議論する。 The Hanbury-Brown and Twiss (HBT) effect is the foundation for stellar intensity interferometry. However, it is a phase-insensitive two-photon interference effect. In this paper, we extend the HBT interferometer by mixing two phase-coherent input fields with coherent auxiliary fields before intensity correlation measurement and achieve phase-sensitive two-photon interference so as to measure the complete complex second-order coherence function of the input fields. This practical scheme paves the way for synthetic aperture imaging for astronomical applications in the optical regime. Pulsed input fields are also tested for potential remote sensing and ranging applications. We discuss the condition to implement the recently proposed entanglement-based telescopy scheme with the more realistic cw broadband anti-bunched light fields.	翻訳日:2023-11-01 23:05:08 公開日:2023-10-30
# PsyMo: 歩行から自己申告された心理的特性を推定するためのデータセット PsyMo: A Dataset for Estimating Self-Reported Psychological Traits from Gait ( http://arxiv.org/abs/2308.10631v2 ) ライセンス: Link先を確認	Adrian Cosma, Emilian Radoi	(参考訳) 運動や外見などの外的要因からの心理的特性推定は、心理学において困難で長期にわたる問題であり、主にエンボディメントの心理学理論に基づいている。これまでのところ、この問題に対処する試みは、侵襲的な身体装着型センサーを用いた非公開の小規模データセットを利用している。心理的特性推定のための自動システムの潜在的な応用には、職業的疲労と心理状態の推定、マーケティングと広告が含まれる。本研究では,歩行パターンに現れる心理的手がかりを探索するための新しい多目的マルチモーダルデータセットであるPsyMo (Psychological traits from Motion) を提案する。被験者312名から7種類の歩行変化と6種類のカメラアングルで歩行シーケンスを収集した。歩行シーケンスとあわせて、参加者は6つの心理的質問紙に記入し、パーソナリティ、自尊感情、疲労、攻撃性、精神健康に関する合計17の心理測定属性が得られた。心理特性推定のための2つの評価プロトコルを提案する。歩行から自己報告された心理的特徴を推定すると同時に、このデータセットは歩行認識のためのベンチマーク手法の代替として使用できる。被験者の身元に関するすべての手がかりを匿名化し,シルエット,2D/3Dヒト骨格,3D SMPLヒトメッシュのみを一般公開した。 Psychological trait estimation from external factors such as movement and appearance is a challenging and long-standing problem in psychology, and is principally based on the psychological theory of embodiment. To date, attempts to tackle this problem have utilized private small-scale datasets with intrusive body-attached sensors. Potential applications of an automated system for psychological trait estimation include estimation of occupational fatigue and psychology, and marketing and advertisement. In this work, we propose PsyMo (Psychological traits from Motion), a novel, multi-purpose and multi-modal dataset for exploring psychological cues manifested in walking patterns. We gathered walking sequences from 312 subjects in 7 different walking variations and 6 camera angles. In conjunction with walking sequences, participants filled in 6 psychological questionnaires, totalling 17 psychometric attributes related to personality, self-esteem, fatigue, aggressiveness and mental health. We propose two evaluation protocols for psychological trait estimation. Alongside the estimation of self-reported psychological traits from gait, the dataset can be used as a drop-in replacement to benchmark methods for gait recognition. 
We anonymize all cues related to the identity of the subjects and publicly release only silhouettes, 2D / 3D human skeletons and 3D SMPL human meshes.	翻訳日:2023-11-01 23:04:36 公開日:2023-10-30
# ベイズデータ選択によるモデル学習の高速化 Towards Accelerated Model Training via Bayesian Data Selection ( http://arxiv.org/abs/2308.10544v2 ) ライセンス: Link先を確認	Zhijie Deng, Peng Cui, Jun Zhu	(参考訳) 現実のシナリオにおけるミスラベル付き、重複、バイアス付きのデータは、長期間のトレーニングにつながり、モデル収束を妨げます。簡単あるいはハードなサンプルを優先順位付けする従来のソリューションは、このような多様性を同時に扱う柔軟性を欠いている。最近の研究は、モデルの一般化損失に対するデータの影響を調べることによって、より合理的なデータ選択原則を提案している。しかし、その実践的な採用は、より原則的な近似と追加のホールドアウトデータに依存している。本研究は, 軽量ベイズ処理を活用し, 大規模事前学習モデルを用いた既定ゼロショット予測器を組み込むことにより, この問題を解決した。結果として得られるアルゴリズムは効率的で実装が容易です。我々は,オンラインバッチ選択シナリオにおいて,データノイズと不均衡がかなり大きい難易度ベンチマークについて広範な実証研究を行い,競合ベースラインよりも優れたトレーニング効率を観察する。特に、挑戦的なwebvisionベンチマークにおいて、本手法は、リードデータ選択法よりもトレーニングイテレーションをかなり少なくして、同様の予測性能を達成することができる。 Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety simultaneously. Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss. However, its practical adoption relies on less principled approximations and additional holdout data. This work solves these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models. The resulting algorithm is efficient and easy to implement. We perform extensive empirical studies on challenging benchmarks with considerable data noise and imbalance in the online batch selection scenario, and observe superior training efficiency over competitive baselines. Notably, on the challenging WebVision benchmark, our method can achieve similar predictive performance with significantly fewer training iterations than leading data selection methods.	翻訳日:2023-11-01 23:04:16 公開日:2023-10-30
# 任意の次元と異なる次元に対する絡み合い証人の簡単な構成 A simple construction of Entanglement Witnesses for arbitrary and different dimensions ( http://arxiv.org/abs/2308.07019v3 ) ライセンス: Link先を確認	Vahid Jannesary, Vahid Karimipour	(参考訳) 異なる次元の空間間の様々な正の写像の集合を生成するための簡単なアプローチを提案する。提案手法は,$d_1 \times d_2$次元のシステムに適したエンタングルメントウィットネスの構築を可能にする。この方法では、選択された所望の測定集合のみからなる絡み合い証人を構成できる。具体例を用いて,本手法の有効性と一般性を示す。また、与えられた状態が正の部分的転置(ppt)絡み合い状態である場合を含む、与えられた状態の絡み合いを目撃するために適切な絡み合い証人を識別する方法を2つの例で示している。 We present a simple approach for generation of a diverse set of positive maps between spaces of different dimensions. The proposed method enables the construction of Entanglement Witnesses tailored for systems in $d_1 \times d_2$ dimensions. With this method, it is possible to construct Entanglement Witnesses that consist solely of a chosen set of desired measurements. We demonstrate the effectiveness and generality of our approach using concrete examples. It is also demonstrated in two examples, how an appropriate entanglement witness can be identified for witnessing the entanglement of a given state, including a case when the given state is a Positive Partial Transpose (PPT) entangled state.	翻訳日:2023-11-01 23:04:01 公開日:2023-10-30
# 協調フィルタリングにおける損失関数の理解を深める Toward a Better Understanding of Loss Functions for Collaborative Filtering ( http://arxiv.org/abs/2308.06091v2 ) ライセンス: Link先を確認	Seongmin Park, Mincheol Yoon, Jae-woong Lee, Hogun Park, Jongwuk Lee	(参考訳) 協調フィルタリング(CF)は現代の推薦システムにおいて重要な手法である。 CFモデルの学習プロセスは通常、インタラクションエンコーダ、損失関数、ネガティブサンプリングの3つのコンポーネントで構成される。多くの既存の研究で洗練された相互作用エンコーダを設計するために様々なcfモデルが提案されているが、最近の研究は損失関数の再構成が著しい性能向上を達成できることを示している。本稿では,既存の損失関数の関係を考察する。我々の数学的解析によると、以前の損失関数はアライメントと均一性関数として解釈できる。 (i)アライメントがユーザとアイテムの表現と一致すること、 (ii)均一性は、ユーザとアイテムの分布を分散させる。この分析に触発されて、Margin-aware Alignment and Weighted Uniformity (MAWU)と呼ばれるデータセットのユニークなパターンを考慮したアライメントと均一性の設計を改善する新しい損失関数を提案する。 mawuの鍵となる新しさは2つあります。 (i)マージン認識アライメント(ma)は、ユーザ/項目固有の人気バイアスを軽減し、 (II)重み付き均一性(WU)は、ユーザとアイテムの均一性の重要性を調整し、データセット固有の特性を反映する。広範な実験の結果、mawuを搭載したmfとlightgcnは、3つのパブリックデータセットで様々な損失関数を持つ最先端cfモデルに匹敵するか優れていることが示された。 Collaborative filtering (CF) is a pivotal technique in modern recommender systems. The learning process of CF models typically consists of three components: interaction encoder, loss function, and negative sampling. Although many existing studies have proposed various CF models to design sophisticated interaction encoders, recent work shows that simply reformulating the loss functions can achieve significant performance gains. This paper delves into analyzing the relationship among existing loss functions. Our mathematical analysis reveals that the previous loss functions can be interpreted as alignment and uniformity functions: (i) the alignment matches user and item representations, and (ii) the uniformity disperses user and item distributions. Inspired by this analysis, we propose a novel loss function that improves the design of alignment and uniformity considering the unique patterns of datasets called Margin-aware Alignment and Weighted Uniformity (MAWU). 
The key novelty of MAWU is two-fold: (i) margin-aware alignment (MA) mitigates user/item-specific popularity biases, and (ii) weighted uniformity (WU) adjusts the significance between user and item uniformities to reflect the inherent characteristics of datasets. Extensive experimental results show that MF and LightGCN equipped with MAWU are comparable or superior to state-of-the-art CF models with various loss functions on three public datasets.	翻訳日:2023-11-01 23:03:49 公開日:2023-10-30
# 強結合ボゾン系における高速量子状態転送と絡み合い形成 Fast quantum state transfer and entanglement preparation in strongly coupled bosonic systems ( http://arxiv.org/abs/2308.05511v2 ) ライセンス: Link先を確認	Yilun Xu, Daoquan Zhu, Feng-Xiao Sun, Qiongyi He, Wei Zhang	(参考訳) 線形ボゾン系における総励起数の保存を保証する連続U(1)ゲージ対称性は、回転波近似(RWA)が破綻する強結合領域において破れる。本稿では, RWAを超えるXX型結合を持つ多モードボゾン系の解析解を導出し, 高速かつ高忠実度の量子状態転送(QST)と絡み合い形成(EP)を実装する新しい手法を提案する。このスキームは、大域的U(1)対称性の破れにかかわらず励起数が変化しないように定めた結合強度とパルス持続時間で実現できる。 QSTタスクでは、いくつかの典型的な量子状態を検討し、この手法が熱雑音や実験シーケンスの不完全性に対して堅牢であることを示す。 EPタスクでは、最短の準備時間内でベル状態およびW型状態を生成するために、このスキームをうまく実施する。 Continuous U(1) gauge symmetry, which guarantees the conservation of the total excitations in linear bosonic systems, is broken in the strong-coupling regime, where the rotating-wave approximation (RWA) fails. Here we develop analytic solutions for multi-mode bosonic systems with XX-type couplings beyond RWA, and propose a novel scheme to implement high-fidelity quantum state transfer (QST) and entanglement preparation (EP) with high speed. The scheme can be realized with designated coupling strength and pulse duration with which the excitation number remains unchanged regardless of the breakdown of the global U(1) symmetry. In the QST tasks, we consider several typical quantum states and demonstrate that this method is robust against thermal noise and imperfections of the experimental sequence. In the EP tasks, the scheme is successfully implemented for the preparation of Bell states and W-type states within the shortest preparation time.	翻訳日:2023-11-01 23:03:25 公開日:2023-10-30
# グラフクラスタリングのためのホモフィリエンハンス構造学習 Homophily-enhanced Structure Learning for Graph Clustering ( http://arxiv.org/abs/2308.05309v3 ) ライセンス: Link先を確認	Ming Gu, Gaoming Yang, Sheng Zhou, Ning Ma, Jiawei Chen, Qiaoyu Tan, Meihan Liu, Jiajun Bu	(参考訳) グラフクラスタリングはグラフ解析の基本課題であり、グラフニューラルネットワーク(GNN)の最近の進歩は印象的な結果を示している。既存のGNNベースのグラフクラスタリング手法の成功にもかかわらず、それらはしばしばグラフ構造の品質を見落としている。グラフ構造学習は、欠落したリンクを追加し、スプリアス接続を取り除くことで、入力グラフの精細化を可能にする。しかしながら、グラフ構造学習におけるこれまでの取り組みは、主に教師付き設定を中心に行われており、正解ラベルがないため、クラスタリングタスクに直接適用することはできない。このギャップを埋めるために,グラフクラスタリングのための新しい手法である homophily-enhanced structure learning (HoLe) を提案する。我々のモチベーションは、グラフ構造内のホモフィリーの度合いをわずかに高めるだけで、GNNとクラスタリングの結果が著しく改善されるという観察に由来する。この目的を実現するために,階層相関推定とクラスタ認識スパース化という2つのクラスタリング指向構造学習モジュールを開発した。前者のモジュールは、潜在空間とクラスタリング空間からのガイダンスを利用して、より正確なペアワイズノード関係の推定を可能にし、後者は類似度行列とクラスタリング割り当てに基づいてスパース化された構造を生成する。さらに,ホモフィリエンハンス構造学習とGNNベースのクラスタリングを交互に訓練する共同最適化手法を考案し,両者の相互効果を強化する。さまざまなタイプとスケールの7つのベンチマークデータセットに関する広範な実験が、さまざまなクラスタリングメトリクスを通じて、最先端のベースラインに対するHoLeの優位性を示している。 Graph clustering is a fundamental task in graph analysis, and recent advances in utilizing graph neural networks (GNNs) have shown impressive results. Despite the success of existing GNN-based graph clustering methods, they often overlook the quality of graph structure, which is inherent in real-world graphs due to their sparse and multifarious nature, leading to subpar performance. Graph structure learning allows refining the input graph by adding missing links and removing spurious connections. However, previous endeavors in graph structure learning have predominantly centered around supervised settings, and cannot be directly applied to our specific clustering tasks due to the absence of ground-truth labels. To bridge the gap, we propose a novel method called homophily-enhanced structure learning for graph clustering (HoLe). 
Our motivation stems from the observation that subtly enhancing the degree of homophily within the graph structure can significantly improve GNNs and clustering outcomes. To realize this objective, we develop two clustering-oriented structure learning modules, i.e., hierarchical correlation estimation and cluster-aware sparsification. The former module enables a more accurate estimation of pairwise node relationships by leveraging guidance from latent and clustering spaces, while the latter one generates a sparsified structure based on the similarity matrix and clustering assignments. Additionally, we devise a joint optimization approach alternating between training the homophily-enhanced structure learning and GNN-based clustering, thereby enforcing their reciprocal effects. Extensive experiments on seven benchmark datasets of various types and scales, across a range of clustering metrics, demonstrate the superiority of HoLe against state-of-the-art baselines.	翻訳日:2023-11-01 23:03:09 公開日:2023-10-30
# TSMD:静的カラーメッシュ品質評価のためのデータベース TSMD: A Database for Static Color Mesh Quality Assessment Study ( http://arxiv.org/abs/2308.01940v2 ) ライセンス: Link先を確認	Qi Yang, Joel Jung, Haiqiang Wang, Xiaozhong Xu, and Shan Liu	(参考訳) テクスチャマップを備えた静的メッシュは、現代の工業や製造業で広く使われており、大量のデータによってメッシュ圧縮コミュニティでかなりの注目を集めている。静的メッシュ圧縮アルゴリズムと客観的品質指標の研究を容易にするために,リッチな視覚特性を持つ42の参照メッシュを含むtencent - static mesh dataset (tsmd) を開発した。 210の歪んだサンプルは、6月23日にalliance for open media volumetric visual media groupからリリースされた多角形静的メッシュコーディングの提案のために開発されたロスリー圧縮スキームによって生成される。 74名の視聴者から主観的スコアを収集するために, クラウドソーシングによる主観的実験を行った。データセットは、そのサンプル多様性と平均世論スコア(mos)の精度を検証するために分析を行い、異質な性質と信頼性を確立する。最先端の客観的メトリクスは、新しいデータセットで評価される。ピアソンとスピアーマンの相関関係は0.75程度と報告されており、不均一なデータセットで通常観測される結果から逸脱し、より堅牢なメトリクスのさらなる開発の必要性を示している。メッシュ、PVS、ビットストリーム、MOSを含むTSMDは、以下の場所で公開されている。 Static meshes with texture map are widely used in modern industrial and manufacturing sectors, attracting considerable attention in the mesh compression community due to its huge amount of data. To facilitate the study of static mesh compression algorithm and objective quality metric, we create the Tencent - Static Mesh Dataset (TSMD) containing 42 reference meshes with rich visual characteristics. 210 distorted samples are generated by the lossy compression scheme developed for the Call for Proposals on polygonal static mesh coding, released on June 23 by the Alliance for Open Media Volumetric Visual Media group. Using processed video sequences, a large-scale, crowdsourcing-based, subjective experiment was conducted to collect subjective scores from 74 viewers. The dataset undergoes analysis to validate its sample diversity and Mean Opinion Scores (MOS) accuracy, establishing its heterogeneous nature and reliability. State-of-the-art objective metrics are evaluated on the new dataset. Pearson and Spearman correlations around 0.75 are reported, deviating from results typically observed on less heterogeneous datasets, demonstrating the need for further development of more robust metrics. 
The TSMD, including meshes, PVSs, bitstreams, and MOS, is made publicly available at the following location: https://multimedia.tencent.com/resources/tsmd.	翻訳日:2023-11-01 23:02:35 公開日:2023-10-30
# AIバリューチェーンの倫理 The Ethics of AI Value Chains ( http://arxiv.org/abs/2307.16787v2 ) ライセンス: Link先を確認	Blair Attard-Frost, David Gray Widder	(参考訳) AI倫理に関心を持つ研究者、実践者、政策立案者は、さまざまな状況や活動規模にわたるAIシステムの研究と介入に、より統合的なアプローチを必要とする。本稿では,AIバリューチェーンを,必要を満たす統合的概念として提示する。 AIバリューチェーンをより明確に理論化し、概念的にサプライチェーンと区別するために、我々は、バリューチェーンとAIバリューチェーンの理論を戦略的管理、サービスサイエンス、経済地理学、産業、政府、応用研究文献からレビューする。次に、AIバリューチェーンに関連する倫理的懸念をカバーする67のソースのサンプルの統合的レビューを行います。統合的レビューの結果に基づいて、研究者、実践者、政策立案者がAI開発をより倫理的に進め、AIバリューチェーンをまたいだ利用を進めるための4つの今後の方向性を推奨します。私たちのレビューと勧告は、aiバリューチェーンの倫理を研究し、介入しようとする研究課題、産業課題、政策課題の進展に寄与します。 Researchers, practitioners, and policymakers with an interest in AI ethics need more integrative approaches for studying and intervening in AI systems across many contexts and scales of activity. This paper presents AI value chains as an integrative concept that satisfies that need. To more clearly theorize AI value chains and conceptually distinguish them from supply chains, we review theories of value chains and AI value chains from the strategic management, service science, economic geography, industry, government, and applied research literature. We then conduct an integrative review of a sample of 67 sources that cover the ethical concerns implicated in AI value chains. Building upon the findings of our integrative review, we recommend four future directions that researchers, practitioners, and policymakers can take to advance more ethical practices of AI development and use across AI value chains. Our review and recommendations contribute to the advancement of research agendas, industrial agendas, and policy agendas that seek to study and intervene in the ethics of AI value chains.	翻訳日:2023-11-01 23:01:30 公開日:2023-10-30
# 機械学習のための物理システムにおけるサンプリングノイズ対策 -基本限界と固有タスク- Tackling Sampling Noise in Physical Systems for Machine Learning Applications: Fundamental Limits and Eigentasks ( http://arxiv.org/abs/2307.16083v2 ) ライセンス: Link先を確認	Fangjun Hu, Gerasimos Angelatos, Saeed A. Khan, Marti Vives, Esin Türeci, Leon Bello, Graham E. Rowlands, Guilhem J. Ribeill, Hakan E. Türeci	(参考訳) 学習に使用する物理系の表現能力は,抽出した出力に不可避的に存在するノイズによって制限される。ノイズは古典領域と量子領域の両方の物理系に存在するが、学習に対するノイズの正確な影響はよく分かっていない。教師付き学習に着目し,有限サンプリング雑音下での一般物理系の可解表現能力(REC)を評価する数学的枠組みを提案し,その極値である固有タスクを抽出する手法を提案する。固有タスクは、与えられた物理システムが最小限の誤差で近似できる関数のネイティブセットである。量子系のRECは、量子測定の基本理論によって制限されることを示し、任意の有限サンプリング物理系のRECに対する厳密な上界を得る。次に,低雑音固有タスクの抽出が,分類などの機械学習タスクの性能向上につながり、過学習に対する頑健性を示すという実証的証拠を提供する。また,測定された量子系における相関が、固有タスクのノイズを低減することで学習能力を高めることを示唆する解析を示す。これらの結果の適用性は超伝導量子プロセッサの実験で実証されている。我々の発見は量子機械学習とセンシングの応用に幅広い影響を及ぼす。 The expressive capacity of physical systems employed for learning is limited by the unavoidable presence of noise in their extracted outputs. Though present in physical systems across both the classical and quantum regimes, the precise impact of noise on learning remains poorly understood. Focusing on supervised learning, we present a mathematical framework for evaluating the resolvable expressive capacity (REC) of general physical systems under finite sampling noise, and provide a methodology for extracting its extrema, the eigentasks. Eigentasks are a native set of functions that a given physical system can approximate with minimal error. We show that the REC of a quantum system is limited by the fundamental theory of quantum measurement, and obtain a tight upper bound for the REC of any finitely-sampled physical system. We then provide empirical evidence that extracting low-noise eigentasks can lead to improved performance for machine learning tasks such as classification, displaying robustness to overfitting. We present analyses suggesting that correlations in the measured quantum system enhance learning capacity by reducing noise in eigentasks. 
The applicability of these results in practice is demonstrated with experiments on superconducting quantum processors. Our findings have broad implications for quantum machine learning and sensing applications.	翻訳日:2023-11-01 23:01:12 公開日:2023-10-30
# 量子回路オートエンコーダ Quantum Circuit AutoEncoder ( http://arxiv.org/abs/2307.08446v2 ) ライセンス: Link先を確認	Jun Wu, Hao Fu, Mingzheng Zhu, Haiyue Zhang, Wei Xie and Xiang-Yang Li	(参考訳) 量子オートエンコーダは、量子状態に格納された情報を圧縮するための量子ニューラルネットワークモデルである。しかし、新しい量子情報技術では、多くのタスクのために量子回路に格納された情報を処理する必要がある。本稿では,古典的および量子オートエンコーダの考え方を一般化した量子回路オートエンコーダ(QCAE)のモデルを導入し,量子回路内の情報を圧縮・符号化する。我々はQCAEの包括的なプロトコルを提供し、その実装のために変分量子アルゴリズム varQCAE を設計する。我々は、このモデルについて、損失のない圧縮条件を導出し、その回復率の上下境界を確立することによって理論的に解析する。最後に, varQCAEを3つの実用的なタスクに適用し, 1) 量子回路内の情報を効果的に圧縮し, (2) 量子回路の異常を検知し, (3) 量子デバイスにおける非偏極ノイズを軽減することを示す。このことは,量子回路の他の情報処理タスクにも応用可能であることを示唆する。 Quantum autoencoder is a quantum neural network model for compressing information stored in quantum states. However, one needs to process information stored in quantum circuits for many tasks in the emerging quantum information technology. In this work, generalizing the ideas of classical and quantum autoencoder, we introduce the model of Quantum Circuit AutoEncoder (QCAE) to compress and encode information within quantum circuits. We provide a comprehensive protocol for QCAE and design a variational quantum algorithm, varQCAE, for its implementation. We theoretically analyze this model by deriving conditions for lossless compression and establishing both upper and lower bounds on its recovery fidelity. Finally, we apply varQCAE to three practical tasks and numerical results show that it can effectively (1) compress the information within quantum circuits, (2) detect anomalies in quantum circuits, and (3) mitigate the depolarizing noise in quantum devices. This suggests that our algorithm is potentially applicable to other information processing tasks for quantum circuits.	翻訳日:2023-11-01 23:00:14 公開日:2023-10-30
# S-QGPU:分散量子コンピューティングのための共有量子ゲート処理ユニット S-QGPU: Shared Quantum Gate Processing Unit for Distributed Quantum Computing ( http://arxiv.org/abs/2309.08736v2 ) ライセンス: Link先を確認	Shengwang Du, Yufei Ding, Chunming Qiao	(参考訳) 本稿では,個々の小型量子コンピュータを共有量子ゲート処理ユニット(s-qgpu)に接続する分散量子コンピューティング(dqc)アーキテクチャを提案する。 S-QGPUは、リモートゲート操作のためのハイブリッド2ビットゲートモジュールからなる。各量子コンピュータが専用の通信キュービットを備えている従来のDQCシステムとは対照的に、S-QGPUはリモートゲート操作のためにリソース(例えば通信キュービット)を効果的にプールし、ローカルな量子コンピュータだけでなく、全体の分散システムのコストを大幅に削減する。予備解析とシミュレーションにより,S-QGPUの遠隔ゲート操作のための共有資源が資源利用の効率化を図っている。システム内の全ての計算キュービット(データキュービットとも呼ばれる)が同時遠隔ゲート操作を必要とするわけではない場合、S-QGPUベースのDQCアーキテクチャは通信キュービットを少なくし、全体的なコストを削減できる。あるいは、同じ数の通信キュービットで、特にバーストモードで発生する場合に、より多くの同時リモートゲート操作をより効率的にサポートすることができる。 We propose a distributed quantum computing (DQC) architecture in which individual small-sized quantum computers are connected to a shared quantum gate processing unit (S-QGPU). The S-QGPU comprises a collection of hybrid two-qubit gate modules for remote gate operations. In contrast to conventional DQC systems, where each quantum computer is equipped with dedicated communication qubits, S-QGPU effectively pools the resources (e.g., the communication qubits) together for remote gate operations, and thus significantly reduces the cost of not only the local quantum computers but also the overall distributed system. Our preliminary analysis and simulation show that S-QGPU's shared resources for remote gate operations enable efficient resource utilization. When not all computing qubits (also called data qubits) in the system require simultaneous remote gate operations, S-QGPU-based DQC architecture demands fewer communication qubits, further decreasing the overall cost. Alternatively, with the same number of communication qubits, it can support a larger number of simultaneous remote gate operations more efficiently, especially when these operations occur in a burst mode.	翻訳日:2023-11-01 22:53:00 公開日:2023-10-30
# クラスタ化マルチエージェント線形バンディット Clustered Multi-Agent Linear Bandits ( http://arxiv.org/abs/2309.08710v2 ) ライセンス: Link先を確認	Hamza Cherkaoui and Merwan Barlier and Igor Colin	(参考訳) 本稿では,マルチエージェント線形確率バンディット問題の具体例であるクラスタ型マルチエージェント線形バンディットについて述べる。この設定において,エージェント間の効率的な協調を利用して全体の最適化問題を高速化する新しいアルゴリズムを提案する。本手法では、ネットワークコントローラがネットワークの基盤となるクラスタ構造を推定し、同一グループ内のエージェント間での経験の共有を最適化する。後悔最小化問題とクラスタリング品質の両方について理論的解析を行う。合成データと実データの両方における最先端アルゴリズムに対する実証的な評価を通じて,我々の手法の有効性を実証する。提案アルゴリズムは、真のクラスタ分割を復元しつつ、後悔最小化を大幅に改善する。 We address in this paper a particular instance of the multi-agent linear stochastic bandit problem, called clustered multi-agent linear bandits. In this setting, we propose a novel algorithm leveraging an efficient collaboration between the agents in order to accelerate the overall optimization problem. In this contribution, a network controller is responsible for estimating the underlying cluster structure of the network and optimizing the experience sharing among agents within the same groups. We provide a theoretical analysis for both the regret minimization problem and the clustering quality. Through empirical evaluation against state-of-the-art algorithms on both synthetic and real data, we demonstrate the effectiveness of our approach: our algorithm significantly improves regret minimization while managing to recover the true underlying cluster partitioning.	翻訳日:2023-11-01 22:52:41 公開日:2023-10-30
# ニューラル特徴学習におけるParetoフロンティア: データ、計算、幅、運 Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck ( http://arxiv.org/abs/2309.03800v2 ) ライセンス: Link先を確認	Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang	(参考訳) 現代のディープラーニングでは、アルゴリズム上の選択(幅、深さ、学習率など)が微妙なリソースのトレードオフを左右することが知られている。本研究は,これらの複雑さが,計算統計的ギャップの存在下での特徴学習においてどのように必然的に生じるかを検討する。まず,多層パーセプトロンの勾配に基づく学習に対して統計的クエリの下限が成り立つ教師付き分類問題であるオフラインスパースパリティ学習を検討する。この下限は、多資源のトレードオフフロンティアとして解釈することができる: 学習が成功するのは、十分に豊か(大きなモデル)、知識が豊富(大きなデータセット)、忍耐強い(多くのトレーニングイテレーション)、あるいは幸運(多くのランダムな推測)である場合に限られる。この設定において、疎初期化とネットワーク幅の増大がサンプル効率を著しく向上させることを、理論と実験の両面から示す。ここで、幅は並列探索の役割を担っている: 疎な特徴をよりサンプル効率よく学習する「宝くじ(lottery ticket)」ニューロンを見つける確率を増幅する。最後に,合成スパースパリティタスクは,軸に沿った特徴学習を必要とする実問題に対するプロキシとして有用でありうることを示す。幅が広く疎初期化されたMLPモデルを用いて,表形式データの分類ベンチマークにおけるサンプル効率の向上を実証した。これらのネットワークは、調整済みのランダムフォレストを上回ることもある。 In modern deep learning, algorithmic choices (such as width, depth, and learning rate) are known to modulate nuanced resource tradeoffs. This work investigates how these complexities necessarily arise for feature learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. 
We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests.	翻訳日:2023-11-01 22:51:36 公開日:2023-10-30
# ハードサンプルリマイニング戦略によるロバスト植物病診断に向けて Towards Robust Plant Disease Diagnosis with Hard-sample Re-mining Strategy ( http://arxiv.org/abs/2309.01903v2 ) ライセンス: Link先を確認	Quan Huu Cap, Atsushi Fukuda, Satoshi Kagiwada, Hiroyuki Uga, Nobusuke Iwasaki, Hitoshi Iyatomi	(参考訳) 豊富なアノテーション情報により、オブジェクト検出に基づく自動植物病診断システム(例えば、YOLOベースのシステム)は、病気の位置を検出できることや優れた分類性能など、分類ベースのシステム(例えば、EfficientNetベースのシステム)に対する利点を持つことが多い。これらの検出システムの欠点の1つは、実際の症状が存在しない無注釈の健康データの扱いである。実際には、健康な植物データは多くの病気データと非常によく似ている。したがって、これらのモデルはしばしば、健康な画像に対して誤検出ボックスを生成する。加えて、検出モデルのために新しいデータにラベルを付けるのは通常時間がかかる。 HSM (Hard-sample mining) は、誤検出ボックスを新しいトレーニングサンプルとして使用することで、モデルを再訓練する一般的な手法である。しかしながら、任意の量のハードサンプルを盲目的に選択して再訓練すると、病気データと健康データとの類似性が高いため、他の病気の診断性能が低下する。本稿では,適切なレベルでハードサンプルのトレーニング画像を戦略的に選択することで、健康データの診断性能を高めると同時に病気データの性能も向上させることを目的とした,ハードサンプルリマイニング(HSReM)と呼ばれる簡易かつ効果的なトレーニング戦略を提案する。実圃場で収集した8クラスのキュウリと10クラスのトマトの2つの実用的なデータセット(42.7Kと35.6Kの画像)に基づく実験により、我々のHSReMトレーニング戦略は、大規模な未知データに対する全体的な診断性能を大幅に改善することを示した。具体的には、HSReM戦略を用いて訓練されたオブジェクト検出モデルは、分類に基づく最先端のEfficientNetV2-Largeモデルとオリジナルのオブジェクト検出モデルよりも優れた結果を得ただけでなく、複数の評価指標においてHSM戦略を用いたモデルよりも優れていた。 With rich annotation information, object detection-based automated plant disease diagnosis systems (e.g., YOLO-based systems) often provide advantages over classification-based systems (e.g., EfficientNet-based), such as the ability to detect disease locations and superior classification performance. One drawback of these detection systems is dealing with unannotated healthy data with no real symptoms present. In practice, healthy plant data appear to be very similar to many disease data. Thus, those models often produce mis-detected boxes on healthy images. In addition, labeling new data for detection models is typically time-consuming. Hard-sample mining (HSM) is a common technique for re-training a model by using the mis-detected boxes as new training samples. 
However, blindly selecting an arbitrary number of hard samples for re-training degrades diagnostic performance for other diseases because of the high similarity between disease and healthy data. In this paper, we propose a simple but effective training strategy called hard-sample re-mining (HSReM), which is designed to enhance diagnostic performance on healthy data and simultaneously improve performance on disease data by strategically selecting hard-sample training images at an appropriate level. Experiments based on two practical in-field eight-class cucumber and ten-class tomato datasets (42.7K and 35.6K images) show that our HSReM training strategy leads to a substantial improvement in the overall diagnostic performance on large-scale unseen data. Specifically, the object detection model trained using the HSReM strategy not only achieved superior results as compared to the classification-based state-of-the-art EfficientNetV2-Large model and the original object detection model, but also outperformed the model using the HSM strategy in multiple evaluation metrics.	翻訳日:2023-11-01 22:51:12 公開日:2023-10-30
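As a rough illustration of plain hard-sample mining (HSM) as the abstract above describes it, the sketch below collects boxes predicted on healthy images (false positives by construction) as new training samples. The `Detection` type, the `min_score` threshold, and the `max_samples` cap are hypothetical names introduced here for illustration; this is not the HSReM selection rule, whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    image_id: str   # hypothetical identifier of a healthy image
    label: str      # disease class the detector (wrongly) predicted
    score: float    # detector confidence


def mine_hard_samples(detections_on_healthy: List[Detection],
                      min_score: float = 0.5,
                      max_samples: Optional[int] = None) -> List[Detection]:
    """Keep the most confident false positives found on healthy images."""
    # Every box on a healthy image is a mis-detection; filter by confidence.
    hard = [d for d in detections_on_healthy if d.score >= min_score]
    # Most confident mistakes first.
    hard.sort(key=lambda d: d.score, reverse=True)
    # Optionally cap how many hard samples are fed back into training.
    return hard if max_samples is None else hard[:max_samples]


detections = [
    Detection("img1", "mildew", 0.9),
    Detection("img2", "mosaic", 0.3),
    Detection("img3", "mildew", 0.7),
]
selected = mine_hard_samples(detections, min_score=0.5, max_samples=1)
```

The cap matters because, as the abstract notes, the amount of hard samples fed back into training affects performance on the disease classes; how to choose that amount is the question HSReM addresses.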
# NLLB-CLIP -- 予算に基づく列車動作多言語画像検索モデル NLLB-CLIP -- train performant multilingual image retrieval model on a budget ( http://arxiv.org/abs/2309.01859v2 ) ライセンス: Link先を確認	Alexander Visheratin	(参考訳) 今日では、大規模コンピューティング資源の助けを借りて、学術機関や産業機関によって開発された大規模モデルの指数関数的増加は、そのような資源にアクセスできない人が貴重な科学的貢献を得られるかどうかという疑問を提起している。そこで我々は,1000ドルの限られた予算を持つ多言語画像検索の課題を解決することを試みた。その結果,NLLBモデルからテキストエンコーダを用いたNLLB-CLIP-CLIPモデルを提案する。このモデルをトレーニングするために、LAION COCOデータセットから派生した201言語でキャプション付き106,246の良質な画像の自動生成データセットを使用した。様々なサイズの画像とテキストエンコーダを用いて複数のモデルを訓練し、トレーニング中にモデルの異なる部分を凍結させた。既存の評価データセットと、新たに作成されたxtd200とflickr30k-200データセットを用いて、トレーニングモデルを徹底的に分析した。我々は,NLLB-CLIPが最先端モデルに匹敵する品質であり,低リソース言語ではかなり優れていることを示す。 Today, the exponential rise of large models developed by academic and industrial institutions with the help of massive computing resources raises the question of whether someone without access to such resources can make a valuable scientific contribution. To explore this, we tried to solve the challenging task of multilingual image retrieval having a limited budget of $1,000. As a result, we present NLLB-CLIP - CLIP model with a text encoder from the NLLB model. To train the model, we used an automatically created dataset of 106,246 good-quality images with captions in 201 languages derived from the LAION COCO dataset. We trained multiple models using image and text encoders of various sizes and kept different parts of the model frozen during the training. We thoroughly analyzed the trained models using existing evaluation datasets and newly created XTD200 and Flickr30k-200 datasets. We show that NLLB-CLIP is comparable in quality to state-of-the-art models and significantly outperforms them on low-resource languages.	翻訳日:2023-11-01 22:50:28 公開日:2023-10-30
# NAS-X: ツイストによるニューラル適応平滑化 NAS-X: Neural Adaptive Smoothing via Twisting ( http://arxiv.org/abs/2308.14864v2 ) ライセンス: Link先を確認	Dieterich Lawson, Michael Li, Scott Linderman	(参考訳) 逐次潜在変数モデル(SLVM)は統計学や機械学習において必須のツールであり、医療から神経科学まで幅広い応用がある。柔軟性が増すにつれて、解析的推論とモデル学習は難しくなり、近似メソッドが必要となる。本稿では,smc(s smoothing sequential monte carlo)を用いて再重み付けウェイクスリープ(reweighted wake-sleep, rws)を逐次設定に拡張したニューラルアダプティブスライディング(nas-x)を提案する。 RWS と滑らかな SMC を組み合わせることで、NAS-X は低バイアスおよび低分散勾配推定を提供し、離散変数モデルと連続変数モデルの両方に適合する。従来の手法よりもNAS-Xの理論的利点を説明し、神経力学の力学モデルへの挑戦を含む様々なタスクにおいてこれらの利点を実証的に探求する。これらの実験により,NAS-X は従来の VI- および RWS に基づく推論とモデル学習の手法を著しく上回り,より低いパラメータ誤差とより厳密な近距離境界を達成した。 Sequential latent variable models (SLVMs) are essential tools in statistics and machine learning, with applications ranging from healthcare to neuroscience. As their flexibility increases, analytic inference and model learning can become challenging, necessitating approximate methods. Here we introduce neural adaptive smoothing via twisting (NAS-X), a method that extends reweighted wake-sleep (RWS) to the sequential setting by using smoothing sequential Monte Carlo (SMC) to estimate intractable posterior expectations. Combining RWS and smoothing SMC allows NAS-X to provide low-bias and low-variance gradient estimates, and fit both discrete and continuous latent variable models. We illustrate the theoretical advantages of NAS-X over previous methods and explore these advantages empirically in a variety of tasks, including a challenging application to mechanistic models of neuronal dynamics. These experiments show that NAS-X substantially outperforms previous VI- and RWS-based methods in inference and model learning, achieving lower parameter error and tighter likelihood bounds.	翻訳日:2023-11-01 22:50:02 公開日:2023-10-30
# SGMM: モーメントの一般化法に対する確率近似 SGMM: Stochastic Approximation to Generalized Method of Moments ( http://arxiv.org/abs/2308.13564v2 ) ライセンス: Link先を確認	Xiaohong Chen, Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin, Myunghyun Song	(参考訳) 本稿では,(過大な)モーメント制限モデルに対する推定と推論のための新しいアルゴリズムである確率的一般化モーメント法(sgmm)を提案する。我々のSGMMは、人気のあるHansen (1982) (オフライン) GMMに代わる新しい確率近似であり、ストリーミングデータセットをリアルタイムに処理できる高速でスケーラブルな実装を提供する。ほぼ確実な収束と、非効率的なオンライン2SLSと効率的なSGMMに対する(機能的な)中心極限定理を確立する。さらに,SGMMフレームワークにシームレスに統合可能なDurbin-Wu-HausmanおよびSargan-Hansenテストのオンライン版を提案する。大規模なモンテカルロシミュレーションでは、サンプルのサイズが大きくなるにつれて、SGMMは推定精度の点で標準(オフライン)GMMと一致し、計算効率が向上し、大規模なデータセットとオンラインデータセットの両方で実用的価値が示される。サンプルサイズが大きい2つのよく知られた実験例を用いて,概念実証によるアプローチの有効性を実証した。 We introduce a new class of algorithms, Stochastic Generalized Method of Moments (SGMM), for estimation and inference on (overidentified) moment restriction models. Our SGMM is a novel stochastic approximation alternative to the popular Hansen (1982) (offline) GMM, and offers fast and scalable implementation with the ability to handle streaming datasets in real time. We establish the almost sure convergence, and the (functional) central limit theorem for the inefficient online 2SLS and the efficient SGMM. Moreover, we propose online versions of the Durbin-Wu-Hausman and Sargan-Hansen tests that can be seamlessly integrated within the SGMM framework. Extensive Monte Carlo simulations show that as the sample size increases, the SGMM matches the standard (offline) GMM in terms of estimation accuracy and gains over computational efficiency, indicating its practical value for both large-scale and online datasets. We demonstrate the efficacy of our approach by a proof of concept using two well known empirical examples with large sample sizes.	翻訳日:2023-11-01 22:48:48 公開日:2023-10-30
# 時間関数による量子状態の一意性 Uniqueness of quantum state over time function ( http://arxiv.org/abs/2308.12752v2 ) ライセンス: Link先を確認	Seok Hyung Lie and Nelly H. Y. Ng	(参考訳) 従来の量子理論の枠組みには、量子チャネルによって因果関係を、多部量子状態によって非因果関係を表すという点で、空間と時間の間に基本的な非対称性が存在する。このような区別は古典的確率論には存在しない。この対称性を量子理論に導入するために、量子系の動的記述が時間とともに静的な量子状態によってカプセル化されるような新しい枠組みが最近提案されている。特に、FullwoodとParzygnatは、Jordan積に基づく状態超時間関数をそのような量子超時間関数の有望な候補として提案し、Horsmanらによるno-goの結果で必要とされる全ての公理を満たすことを示した。しかし、公理が時間関数に対して一意な状態を誘導するかどうかは明らかでない。本研究では,従来提案されていた公理が時間関数で一意な状態にならないことを示す。そこで我々は,2点を超える任意の時空領域上の量子状態を記述するのにより適した,操作的動機づけのある別の公理集合を提案する。これにより、Fullwood-Parzygnatの時間関数が、全ての操作公理を満たす本質的に一意な関数として確立される。 A fundamental asymmetry exists within the conventional framework of quantum theory between space and time, in terms of representing causal relations via quantum channels and acausal relations via multipartite quantum states. Such a distinction does not exist in classical probability theory. In an effort to introduce this symmetry to quantum theory, a new framework has recently been proposed, such that the dynamical description of a quantum system can be encapsulated by a static quantum state over time. In particular, Fullwood and Parzygnat recently proposed the state over time function based on the Jordan product as a promising candidate for such a quantum state over time function, by showing that it satisfies all the axioms required in the no-go result by Horsman et al. However, it was unclear if the axioms induce a unique state over time function. In this work, we demonstrate that the previously proposed axioms cannot yield a unique state over time function. In response, we propose an alternative set of axioms that is operationally motivated and better suited to describe quantum states over any spacetime regions beyond two points. By doing so, we establish the Fullwood-Parzygnat state over time function as the essentially unique function satisfying all these operational axioms.	翻訳日:2023-11-01 22:48:07 公開日:2023-10-30
# RefEgo:Ego4Dの自己認識から得られる表現理解データを参照 RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D ( http://arxiv.org/abs/2308.12035v2 ) ライセンス: Link先を確認	Shuhei Kurita, Naoki Katsura, Eri Onami	(参考訳) 一対一の視点からシーンオブジェクトのテキスト表現を接地することは、周囲を認識し、直感的なテキスト指示に従って振る舞うエージェントの開発において本当に要求される能力である。このような能力は、ガラスデバイスや自律ロボットが現実世界の参照対象をローカライズする必要がある。しかし、画像の通常の参照表現理解タスクでは、データセットは主にwebクローラーデータに基づいて構築されており、現実世界のさまざまなオブジェクトのテキスト表現を接地するタスクにおいて、多様な現実世界の構造を反映していない。近年,ego4dの大規模エゴセントリックビデオデータセットが提案されている。 Ego4Dは、ショッピング、料理、ウォーキング、トーキー、製造など、屋内および屋外の多くの状況を含む世界中の多様な現実世界のシーンをカバーしている。 ego4dのエゴセントリックビデオに基づいて、ビデオベースの参照表現理解データセットrefegoの広範なカバレッジを構築しました。我々のデータセットは、ビデオベースの参照式理解アノテーションに12K以上のビデオクリップと41時間を含む。実験では、最先端の2D参照表現理解モデルとオブジェクト追跡アルゴリズムを併用し、困難な状況下でもビデオワイド参照オブジェクト追跡を実現する:ビデオの途中で参照オブジェクトがフレーム外になる、あるいはビデオに複数の類似オブジェクトが提示される。コードはhttps://github.com/shuheikurita/refegoで入手できる。 Grounding textual expressions on scene objects from first-person views is a truly demanding capability in developing agents that are aware of their surroundings and behave following intuitive text instructions. Such capability is of necessity for glass-devices or autonomous robots to localize referred objects in the real-world. In the conventional referring expression comprehension tasks of images, however, datasets are mostly constructed based on the web-crawled data and don't reflect diverse real-world structures on the task of grounding textual expressions in diverse objects in the real world. Recently, a massive-scale egocentric video dataset of Ego4D was proposed. Ego4D covers around the world diverse real-world scenes including numerous indoor and outdoor situations such as shopping, cooking, walking, talking, manufacturing, etc. Based on egocentric videos of Ego4D, we constructed a broad coverage of the video-based referring expression comprehension dataset: RefEgo. Our dataset includes more than 12k video clips and 41 hours for video-based referring expression comprehension annotation. 
In experiments, we combine the state-of-the-art 2D referring expression comprehension models with the object tracking algorithm, achieving the video-wise referred object tracking even in difficult conditions: the referred object becomes out-of-frame in the middle of the video or multiple similar objects are presented in the video. Codes are available at https://github.com/shuheikurita/RefEgo	翻訳日:2023-11-01 22:47:46 公開日:2023-10-30
# オブジェクトパーマンスによるオフライン追跡 Offline Tracking with Object Permanence ( http://arxiv.org/abs/2310.01288v2 ) ライセンス: Link先を確認	Xianzhong Liu, Holger Caesar	(参考訳) 自動走行データセットの手動ラベリングに要するコストを削減すべく、オフライン認識システムを用いてデータセットを自動的にラベリングする。しかし、物体は時間的にオクルードされることがある。このようなデータセットのオクルージョンシナリオは、オフラインのオートラベルでは未検討のままである。本研究では,隠蔽対象トラックに着目したオフライン追跡モデルを提案する。オブジェクト永続性(object permanence)という概念を利用しており、もはや観測されていなくてもオブジェクトは存在し続ける。このモデルには、標準的なオンライントラッカー、閉塞前後のトラックレットを関連付ける再識別(Re-ID)モジュール、断片化されたトラックを補完するトラック補完モジュールの3つの部分が含まれている。 Re-IDモジュールとトラック完了モジュールは、ベクトル化されたマップを入力の1つとして使用し、オクルージョンで追跡結果を洗練する。モデルは、閉塞された対象軌跡を効果的に回収することができる。従来のオンライン追跡結果を45%のIDSと2%のAMOTAで改善し、3Dマルチオブジェクトトラッキングにおける最先端のパフォーマンスを実現する。 To reduce the expensive labor cost for manual labeling autonomous driving datasets, an alternative is to automatically label the datasets using an offline perception system. However, objects might be temporally occluded. Such occlusion scenarios in the datasets are common yet underexplored in offline autolabeling. In this work, we propose an offline tracking model that focuses on occluded object tracks. It leverages the concept of object permanence which means objects continue to exist even if they are not observed anymore. The model contains three parts: a standard online tracker, a re-identification (Re-ID) module that associates tracklets before and after occlusion, and a track completion module that completes the fragmented tracks. The Re-ID module and the track completion module use the vectorized map as one of the inputs to refine the tracking results with occlusion. The model can effectively recover the occluded object trajectories. It achieves state-of-the-art performance in 3D multi-object tracking by improving over the original online tracking result by 45% IDS and 2% AMOTA on the vehicle tracks.	翻訳日:2023-11-01 22:40:24 公開日:2023-10-30
# ミニバッチSGDと局所SGDの安定性と一般化 Stability and Generalization for Minibatch SGD and Local SGD ( http://arxiv.org/abs/2310.01139v2 ) ライセンス: Link先を確認	Yunwen Lei, Tao Sun, Mingrui Liu	(参考訳) データの規模が大きくなることで、最適化のスピードアップに並列性を活用する人気が高まっている。ミニバッチ確率勾配降下(ミニバッチSGD)と局所SGDは並列最適化の2つの一般的な方法である。既存の理論的研究は、最適化誤差によって測定される機械の数に関して、これらの手法の線形高速化を示している。比較として、これらの手法の安定性と一般化はあまり研究されていない。本稿では,ミニバッチと局所SGDの安定性と一般化解析を行い,新しい予測分散分解を導入して学習可能性を理解する。トレーニングエラーを安定性解析に組み込むことで、過パラメータモデルの一般化にいかに役立つかを示す。最適リスク境界を達成するために,ミニバッチと局所SGDの両方が線形スピードアップを達成することを示す。 The increasing scale of data propels the popularity of leveraging parallelism to speed up the optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. The existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors. As a comparison, the stability and generalization of these methods are much less studied. In this paper, we study the stability and generalization analysis of minibatch and local SGD to understand their learnability by introducing a novel expectation-variance decomposition. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. We show both minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.	翻訳日:2023-11-01 22:39:47 公開日:2023-10-30
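To make the two parallel schemes named in the abstract above concrete, here is a minimal sketch on a toy one-dimensional problem. The objective (noisy squared distance to a hypothetical `target`), the function names, and all hyperparameters are assumptions made for illustration and are not taken from the paper. Minibatch SGD averages gradients evaluated at the same iterate on every step; local SGD lets each machine take several steps on its own copy and averages the models once per round.

```python
import random


def stochastic_grad(w, rng, target=3.0):
    # Gradient of 0.5 * (w - z)**2 at a noisy sample z ~ N(target, 1).
    return w - rng.gauss(target, 1.0)


def minibatch_sgd(machines, rounds, local_steps, lr, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds * local_steps):
        # Every step: all machines evaluate a gradient at the SAME iterate,
        # and the averaged gradient drives one shared update.
        g = sum(stochastic_grad(w, rng) for _ in range(machines)) / machines
        w -= lr * g
    return w


def local_sgd(machines, rounds, local_steps, lr, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds):
        local_models = []
        for _ in range(machines):
            # Each machine starts from the shared model and runs on its own.
            w_local = w
            for _ in range(local_steps):
                w_local -= lr * stochastic_grad(w_local, rng)
            local_models.append(w_local)
        # Communicate once per round: average the models, not the gradients.
        w = sum(local_models) / machines
    return w


w_minibatch = minibatch_sgd(machines=8, rounds=50, local_steps=10, lr=0.05)
w_local = local_sgd(machines=8, rounds=50, local_steps=10, lr=0.05)
```

Both runs use the same number of gradient evaluations per machine; they differ only in how often the machines synchronize, which is the distinction the stability analysis has to account for.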
# ResolvNet: マルチスケール一貫性を備えたグラフ畳み込みネットワーク ResolvNet: A Graph Convolutional Network with multi-scale Consistency ( http://arxiv.org/abs/2310.00431v2 ) ライセンス: Link先を確認	Christian Koke, Abhishek Saroha, Yuesong Shen, Marvin Eisenberger, Daniel Cremers	(参考訳) 現在、グラフ学習コミュニティでよく知られている事実として、ボトルネックの存在は、グラフニューラルネットワークが長距離情報を伝播する能力を著しく制限している。今のところ評価されていないのは、直観に反して、強く連結されたサブグラフの存在も、一般的なアーキテクチャにおける情報フローを厳しく制限する可能性があることだ。この観測により,マルチスケール一貫性の概念が導入された。ノードレベルでは、この概念は与えられたグラフ上で接続が変化しても接続された伝播グラフの保持を指す。グラフレベルでは、マルチスケールの一貫性は、異なる解像度で同じオブジェクトを記述する異なるグラフが同様の特徴ベクトルを割り当てるべきという事実を指す。我々が示すように、両方の特性は、一般的なグラフニューラルネットワークアーキテクチャでは満たされない。これらの欠点を補うために,レゾルベントの数学的概念に基づくフレキシブルグラフニューラルネットワークResolvNetを導入する。我々はそのマルチスケール一貫性を理論的に厳密に確立し、実世界データでの広範な実験で検証する。このResolvNetアーキテクチャに基づくネットワークは、マルチスケール設定の内外において、多くのタスクでベースラインを大きく上回る。 It is by now a well known fact in the graph learning community that the presence of bottlenecks severely limits the ability of graph neural networks to propagate information over long distances. What so far has not been appreciated is that, counter-intuitively, also the presence of strongly connected sub-graphs may severely restrict information flow in common architectures. Motivated by this observation, we introduce the concept of multi-scale consistency. At the node level this concept refers to the retention of a connected propagation graph even if connectivity varies over a given graph. At the graph-level, multi-scale consistency refers to the fact that distinct graphs describing the same object at different resolutions should be assigned similar feature vectors. As we show, both properties are not satisfied by popular graph neural network architectures. To remedy these shortcomings, we introduce ResolvNet, a flexible graph neural network based on the mathematical concept of resolvents. 
We rigorously establish its multi-scale consistency theoretically and verify it in extensive experiments on real-world data. Here, networks based on the ResolvNet architecture prove expressive, significantly outperforming baselines on many tasks, both inside and outside the multi-scale setting.	翻訳日:2023-11-01 22:39:35 公開日:2023-10-30
# SMPLer-X:表現力のある人体ポーズ・形状推定のスケールアップ SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation ( http://arxiv.org/abs/2309.17448v2 ) ライセンス: Link先を確認	Zhongang Cai, Wanqi Yin, Ailing Zeng, Chen Wei, Qingping Sun, Yanjun Wang, Hui En Pang, Haiyi Mei, Mingyuan Zhang, Lei Zhang, Chen Change Loy, Lei Yang, Ziwei Liu	(参考訳) 表現的人間のポーズと形状推定(EHPS)は、身体、手、顔の動きのキャプチャを多数の応用で統一する。有望な進歩にもかかわらず、現在の最先端の手法は依然として限られたトレーニングデータセットに大きく依存している。本研究では,ViT-Hugeをバックボーンとし,さまざまなデータソースから最大450万インスタンスをトレーニングする,最初のジェネラリスト基盤モデル(SMPLer-Xと呼ばれる)へのEHPSのスケールアップについて検討する。ビッグデータと大規模モデルにより、SMPLer-Xは、さまざまなテストベンチマークにまたがる強力なパフォーマンスと、目に見えない環境への優れた転送性を示す。 1) データのスケーリングには,32のEHPSデータセットに対して,単一のデータセットでトレーニングしたモデルでは処理できない幅広いシナリオを含む,体系的な調査を行う。さらに重要なのは、広範なベンチマークプロセスから得られた洞察を活かして、トレーニングスキームを最適化し、EHPS能力の大きな飛躍につながるデータセットを選択することです。 2) モデルスケーリングでは,EHPSにおけるモデルサイズのスケーリング法則を研究するために,視覚変換器を利用する。さらに,我々の微調整戦略はSMPLer-Xを専門モデルに変え,さらなる性能向上を実現した。 AGORA (107.2 mm NMVE)、UBody (57.4 mm PVE)、EgoBody (63.6 mm PVE)、EHF (62.3 mm PVE) などの7つのベンチマークに対して、我々の基礎モデルSMPLer-Xは一貫して最先端の結果を提供する。ホームページ:https://caizhongang.github.io/projects/SMPLer-X/ Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods still depend largely on a confined set of training datasets. In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources. With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments. 1) For the data scaling, we perform a systematic investigation on 32 EHPS datasets, including a wide range of scenarios that a model trained on any single dataset cannot handle. 
More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. 2) For the model scaling, we take advantage of vision transformers to study the scaling law of model sizes in EHPS. Moreover, our finetuning strategy turns SMPLer-X into specialist models, allowing them to achieve further performance boosts. Notably, our foundation model SMPLer-X consistently delivers state-of-the-art results on seven benchmarks such as AGORA (107.2 mm NMVE), UBody (57.4 mm PVE), EgoBody (63.6 mm PVE), and EHF (62.3 mm PVE without finetuning). Homepage: https://caizhongang.github.io/projects/SMPLer-X/	翻訳日:2023-11-01 22:38:50 公開日:2023-10-30
# reusability report: 生物学的に不均一な神経構造を有する前立腺癌の成層化 Reusability report: Prostate cancer stratification with diverse biologically-informed neural architectures ( http://arxiv.org/abs/2309.16645v2 ) ライセンス: Link先を確認	Christian Pedersen, Tiberiu Tesileanu, Tinghui Wu, Siavash Golkar, Miles Cranmer, Zijun Zhang, Shirley Ho	(参考訳) elmarakeby et al., "biologically informed deep neural network for prostate cancer discovery"では、生物学的にインフォームドされたフィードフォワードニューラルネットワークであるsparse connections (p-net)が、前立腺癌の状態をモデル化するために提示された。 Elmarakebyらが実施した研究の再現性について,元のコードベースと,より最新のライブラリを使用した独自の再実装の両方を用いて検証した。 reactomeの生物学的経路によるネットワークスパーシフィケーションの寄与を定量化し,p-netの優れた性能にその重要性を確認した。さらに,生体情報をネットワークに組み込むためのニューラルアーキテクチャやアプローチについても検討した。同じトレーニングデータ上で3種類のグラフニューラルネットワークを実験し,各モデル間の臨床予測の一致について検討した。分析の結果、異なるアーキテクチャを持つディープニューラルネットワークは、特定のニューラルアーキテクチャの異なる初期化にまたがる個々の患者に対して、誤った予測を行うことがわかった。これは、異なる神経アーキテクチャがデータの異なる側面に敏感であることを示唆している。 In Elmarakeby et al., "Biologically informed deep neural network for prostate cancer discovery", a feedforward neural network with biologically informed, sparse connections (P-NET) was presented to model the state of prostate cancer. We verified the reproducibility of the study conducted by Elmarakeby et al., using both their original codebase, and our own re-implementation using more up-to-date libraries. We quantified the contribution of network sparsification by Reactome biological pathways, and confirmed its importance to P-NET's superior performance. Furthermore, we explored alternative neural architectures and approaches to incorporating biological information into the networks. We experimented with three types of graph neural networks on the same training data, and investigated the clinical prediction agreement between different models. Our analyses demonstrated that deep neural networks with distinct architectures make incorrect predictions for individual patient that are persistent across different initializations of a specific neural architecture. 
This suggests that different neural architectures are sensitive to different aspects of the data, an important yet under-explored challenge for clinical prediction tasks.	翻訳日:2023-11-01 22:37:55 公開日:2023-10-30
# ADGym: 深部異常検出のための設計選択 ADGym: Design Choices for Deep Anomaly Detection ( http://arxiv.org/abs/2309.15376v2 ) ライセンス: Link先を確認	Minqi Jiang, Chaochuan Hou, Ao Zheng, Songqiao Han, Hailiang Huang, Qingsong Wen, Xiyang Hu, Yue Zhao	(参考訳) ディープラーニング(DL)技術は、金融、医療サービス、クラウドコンピューティングなど、さまざまな分野における異常検出(AD)に成功している。しかしながら、現在の研究の多くは、損失関数やネットワークアーキテクチャといった個々の設計選択の貢献を解剖することなく、ディープADアルゴリズム全体を概観する傾向にある。この見解は、新たに設計された損失関数、ネットワークアーキテクチャ、学習パラダイムなど、データ前処理のような予備的なステップの価値を低下させる傾向にある。本稿では,このギャップを埋めるために,2つの重要な疑問を提起する。 (i)異常検出には,深層ad手法のどの設計選択が不可欠か? (ii) 汎用的で既存のソリューションに頼るのではなく、任意のADデータセットに対して最適な設計選択を自動的に選択する方法。これらの問題に対処するため,より深い手法でAD設計要素を包括的に評価し,自動選択するプラットフォームであるADGymを紹介した。我々の広範な実験により、既存のリードメソッドのみに頼るだけでは不十分であることが判明した。対照的にADGymを用いて開発されたモデルは、現在の最先端技術を大きく上回っている。 Deep learning (DL) techniques have recently found success in anomaly detection (AD) across various fields such as finance, medical services, and cloud computing. However, most of the current research tends to view deep AD algorithms as a whole, without dissecting the contributions of individual design choices like loss functions and network architectures. This view tends to diminish the value of preliminary steps like data preprocessing, as more attention is given to newly designed loss functions, network architectures, and learning paradigms. In this paper, we aim to bridge this gap by asking two key questions: (i) Which design choices in deep AD methods are crucial for detecting anomalies? (ii) How can we automatically select the optimal design choices for a given AD dataset, instead of relying on generic, pre-existing solutions? To address these questions, we introduce ADGym, a platform specifically crafted for comprehensive evaluation and automatic selection of AD design elements in deep methods. Our extensive experiments reveal that relying solely on existing leading methods is not sufficient. In contrast, models developed using ADGym significantly surpass current state-of-the-art techniques.	翻訳日:2023-11-01 22:36:55 公開日:2023-10-30
# グラフコントラスト学習のための確率的学習 Provable Training for Graph Contrastive Learning ( http://arxiv.org/abs/2309.13944v2 ) ライセンス: Link先を確認	Yue Yu, Xiao Wang, Mengmei Zhang, Nian Liu, Chuan Shi	(参考訳) グラフコントラスト学習(gcl)はラベルのない拡張グラフからノード埋め込みを学ぶための一般的なトレーニングアプローチとして登場した。正のノード対間の類似性を最大化しつつ、負のノード対間の類似性を最小化するという鍵原理は確立されているが、いくつかの根本的な問題はいまだ不明である。複雑なグラフ構造を考えると、いくつかのノードは一貫してよく訓練されているか? あるいは、グラフを拡張せずに原則に違反しているノードがあるのでしょうか? これらのノードを区別し、GCLのトレーニングをさらにガイドする方法? これらの疑問に答えるために、まず、GCLのトレーニングがすべてのノードで実際に不均衡であることを示す実験的な証拠を提示する。この問題に対処するために、ノードが拡張範囲に関連するgclの原理に従う方法の下界である計量「ノードコンパクト性」を提案する。さらに,正規化として二元クロスエントロピーに積分できるバウンド伝搬によって,理論的にノードコンパクト性の形式を導出する。そこで本稿では,GCL の原則に従うノード埋め込みを符号化するための GCL のトレーニングを正規化するための PrOvable Training (POT) を提案する。さまざまなベンチマークに関する広範な実験を通じて、POTは既存のGCLアプローチを一貫して改善し、フレンドリーなプラグインとして機能する。 Graph Contrastive Learning (GCL) has emerged as a popular training approach for learning node embeddings from augmented graphs without labels. Despite the key principle that maximizing the similarity between positive node pairs while minimizing it between negative node pairs is well established, some fundamental problems are still unclear. Considering the complex graph structure, are some nodes consistently well-trained and following this principle even with different graph augmentations? Or are there some nodes more likely to be untrained across graph augmentations and violate the principle? How to distinguish these nodes and further guide the training of GCL? To answer these questions, we first present experimental evidence showing that the training of GCL is indeed imbalanced across all nodes. To address this problem, we propose the metric "node compactness", which is the lower bound of how a node follows the GCL principle related to the range of augmentations. We further derive the form of node compactness theoretically through bound propagation, which can be integrated into binary cross-entropy as a regularization. 
To this end, we propose PrOvable Training (POT) for GCL, which regularizes the training of GCL to encode node embeddings that better follow the GCL principle. Through extensive experiments on various benchmarks, POT consistently improves the existing GCL approaches, serving as a friendly plugin.	翻訳日:2023-11-01 22:36:22 公開日:2023-10-30
# シャープネス認識の最小化と安定性の限界 Sharpness-Aware Minimization and the Edge of Stability ( http://arxiv.org/abs/2309.12488v4 ) ライセンス: Link先を確認	Philip M. Long and Peter L. Bartlett	(参考訳) 最近の実験では、ステップサイズ$\eta$の勾配降下(gd)を持つニューラルネットワークを訓練する場合、損失のヘッセンの演算子ノルムはおよそ2/\eta$に達するまで増加することが示されている。 2/\eta$の量は、損失の局所二次近似を考慮して「安定性の最先端」と呼ばれる。我々は,GD の変種である SAM (Sharpness-Aware Minimization) の「安定性の端」に到達するための同様の計算を行う。 GDの場合とは異なり、結果のSAM-辺は勾配のノルムに依存する。 3つのディープラーニングトレーニングタスクを用いて、SAMは、この分析によって同定された安定性の端で動作していることを実証的に確認する。 Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.	翻訳日:2023-11-01 22:35:37 公開日:2023-10-30
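The $2/\eta$ threshold for plain GD mentioned in the abstract above can be checked on the simplest local quadratic model. The sketch below is an illustration under that assumption only (a one-dimensional loss $f(x) = \tfrac{1}{2}\lambda x^2$, with hypothetical function and variable names); it does not reproduce the SAM edge derived in the paper. Each GD step multiplies $x$ by $1 - \eta\lambda$, so the iterates shrink when $\lambda < 2/\eta$ and grow when $\lambda > 2/\eta$.

```python
def gd_on_quadratic(curvature, step_size, steps=50, x0=1.0):
    """Run GD on f(x) = 0.5 * curvature * x**2 and return the final |x|."""
    x = x0
    for _ in range(steps):
        # The gradient of f at x is curvature * x.
        x -= step_size * curvature * x
    return abs(x)


eta = 0.1
edge = 2.0 / eta  # curvature at which |1 - eta * curvature| equals 1

# Just below the edge the per-step factor is about -0.8: iterates contract.
below = gd_on_quadratic(0.9 * edge, eta)
# Just above the edge the per-step factor is about -1.2: iterates blow up.
above = gd_on_quadratic(1.1 * edge, eta)
```

In this toy model the sign of $x$ flips on every step near the edge, which is the oscillation that the local quadratic argument predicts.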
# DDMT:多変量時系列異常検出のための拡散マスク変換器モデル DDMT: Denoising Diffusion Mask Transformer Models for Multivariate Time Series Anomaly Detection ( http://arxiv.org/abs/2310.08800v2 ) ライセンス: Link先を確認	Chaocheng Yang and Tingyin Wang and Xuanhui Yan	(参考訳) 多変量時系列における異常検出は時系列研究において重要な課題として現れており、不正検出、故障診断、システム状態推定など様々な分野で重要な研究が行われている。再構成に基づくモデルは近年,時系列データの異常検出に有望な可能性を示している。しかし,データ規模や次元の急激な増加により,時系列再構成におけるノイズ・弱同一性マッピング(WIM)の問題がますます顕著になっている。そこで我々は,Adaptive Dynamic Neighbor Mask (ADNM) 機構を導入し,それを Transformer and Denoising Diffusion Model に統合し,多変量時系列異常検出のための新しいフレームワークである Denoising Diffusion Mask Transformer (DDMT) を開発した。 ADNMモジュールは、データ再構成時に入力と出力の特徴間の情報漏洩を軽減し、再構築時にWIMの問題を軽減する。 Denoising Diffusion Transformer (DDT)は、Denoising Diffusion Modelのための内部ニューラルネットワーク構造としてTransformerを使用している。時系列データの段階的生成過程を学習し、データの確率分布をモデル化し、正常なデータパターンをキャプチャし、ノイズを除去して時系列データを段階的に復元し、異常の明確な回復をもたらす。我々の知る限り、これは多変量時系列異常検出のためのデノイング拡散モデルと変換器を組み合わせた最初のモデルである。 5種類の多変量時系列異常検出データセットを用いて実験を行った。その結果, 時系列データの異常を効果的に識別し, 異常検出時の最先端性能を実現することができた。 Anomaly detection in multivariate time series has emerged as a crucial challenge in time series research, with significant research implications in various fields such as fraud detection, fault diagnosis, and system state estimation. Reconstruction-based models have shown promising potential in recent years for detecting anomalies in time series data. However, due to the rapid increase in data scale and dimensionality, the issues of noise and Weak Identity Mapping (WIM) during time series reconstruction have become increasingly pronounced. To address this, we introduce a novel Adaptive Dynamic Neighbor Mask (ADNM) mechanism and integrate it with the Transformer and Denoising Diffusion Model, creating a new framework for multivariate time series anomaly detection, named Denoising Diffusion Mask Transformer (DDMT). 
The ADNM module is introduced to mitigate information leakage between input and output features during data reconstruction, thereby alleviating the problem of WIM during reconstruction. The Denoising Diffusion Transformer (DDT) employs the Transformer as an internal neural network structure for Denoising Diffusion Model. It learns the stepwise generation process of time series data to model the probability distribution of the data, capturing normal data patterns and progressively restoring time series data by removing noise, resulting in a clear recovery of anomalies. To the best of our knowledge, this is the first model that combines Denoising Diffusion Model and the Transformer for multivariate time series anomaly detection. Experimental evaluations were conducted on five publicly available multivariate time series anomaly detection datasets. The results demonstrate that the model effectively identifies anomalies in time series data, achieving state-of-the-art performance in anomaly detection.	翻訳日:2023-11-01 22:28:22 公開日:2023-10-30
# 進化的動的最適化と機械学習 Evolutionary Dynamic Optimization and Machine Learning ( http://arxiv.org/abs/2310.08748v2 ) ライセンス: Link先を確認	Abdennour Boulesnane	(参考訳) 進化計算(Evolutionary Computation, EC)は、人工知能の強力な分野として出現し、徐々に発展する自然のメカニズムに触発されている。しかし、ECアプローチは、停滞、多様性喪失、計算複雑性、人口の初期化、早期収束といった課題に直面していることが多い。これらの限界を克服するために、研究者は学習アルゴリズムと進化的手法を統合した。この統合は、反復探索中にECアルゴリズムによって生成された貴重なデータを活用し、検索空間と人口動態に関する洞察を提供する。同様に、進化的アルゴリズムと機械学習(ML)の関係は相反するものであり、ECメソッドはノイズ、不正確、動的目的関数によって特徴づけられる複雑なMLタスクを最適化する特別な機会を提供する。進化機械学習(EML)として知られるこれらのハイブリッド技術は、MLプロセスの様々な段階に適用されている。 EC技術はデータバランシング、機能選択、モデルのトレーニング最適化といったタスクにおいて重要な役割を果たす。さらにMLタスクは、進化的動的最適化(EDO)が価値のある動的最適化を必要とすることが多い。本稿では,EDOとMLの相互統合を包括的に検討する。この研究の目的は、進化的学習コミュニティへの関心を刺激し、この分野における革新的な貢献を促すことである。 Evolutionary Computation (EC) has emerged as a powerful field of Artificial Intelligence, inspired by nature's mechanisms of gradual development. However, EC approaches often face challenges such as stagnation, diversity loss, computational complexity, population initialization, and premature convergence. To overcome these limitations, researchers have integrated learning algorithms with evolutionary techniques. This integration harnesses the valuable data generated by EC algorithms during iterative searches, providing insights into the search space and population dynamics. Similarly, the relationship between evolutionary algorithms and Machine Learning (ML) is reciprocal, as EC methods offer exceptional opportunities for optimizing complex ML tasks characterized by noisy, inaccurate, and dynamic objective functions. These hybrid techniques, known as Evolutionary Machine Learning (EML), have been applied at various stages of the ML process. EC techniques play a vital role in tasks such as data balancing, feature selection, and model training optimization. Moreover, ML tasks often require dynamic optimization, for which Evolutionary Dynamic Optimization (EDO) is valuable. 
This paper presents the first comprehensive exploration of reciprocal integration between EDO and ML. The study aims to stimulate interest in the evolutionary learning community and inspire innovative contributions in this domain.	翻訳日:2023-11-01 22:27:53 公開日:2023-10-30
# ラベル比率から学ぶ: 信念伝達による教師付き学習者のブートストラップ Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation ( http://arxiv.org/abs/2310.08056v2 ) ライセンス: Link先を確認	Shreyas Havaldar, Navodita Sharma, Shubhi Sareen, Karthikeyan Shanmugam, Aravindan Raghuveer	(参考訳) Label Proportions(LLP)からの学習(Learning from Label Proportions)は、トレーニング中のバッグと呼ばれるインスタンスのグループに対して、アグリゲートレベルのラベルしか利用できない学習問題である。この設定は、プライバシー上の考慮から広告や医療といった領域で発生する。そこで本研究では,2つの主要なステップを反復的に実行する新しいアルゴリズムフレームワークを提案する。イテレーション毎に最初のステップ(Pseudo Labeling)として、バイナリインスタンスラベルを組み込んだGibbsディストリビューションを定義します。 a) 類似の共変量を持つインスタンスが類似のラベルを持つべきという制約により、共変量情報 b)バッグレベル集約ラベル。次に,Belief Propagation (BP) を用いてギブス分布を疎外し,擬似ラベルを得る。第2のステップ(改良の埋め込み)では、擬似ラベルを使用して学習者の監督を行い、よりよい埋め込みを得る。さらに、第2ステップの埋め込みを次のイテレーションの新しい共変数として使用して、2つのステップを繰り返す。最後のイテレーションでは、擬似ラベルを使用して分類器を訓練する。本アルゴリズムは,表型および画像型のLLPバイナリ分類問題に対して,複数のSOTAベースライン(最大15%)に対して強い利得を示す。我々は,100万個のサンプルであっても,Belief Propagationによる標準的な教師あり学習よりも計算オーバーヘッドが最小限に抑えられたこれらの改善を実現する。 Learning from Label Proportions (LLP) is a learning problem where only aggregate level labels are available for groups of instances, called bags, during training, and the aim is to get the best performance at the instance-level on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that iteratively performs two main steps. For the first step (Pseudo Labeling) in every iteration, we define a Gibbs distribution over binary instance labels that incorporates a) covariate information through the constraint that instances with similar covariates should have similar labels and b) the bag level aggregated label. We then use Belief Propagation (BP) to marginalize the Gibbs distribution to obtain pseudo labels. In the second step (Embedding Refinement), we use the pseudo labels to provide supervision for a learner that yields a better embedding. 
Further, we iterate on the two steps again by using the second step's embeddings as new covariates for the next iteration. In the final iteration, a classifier is trained using the pseudo labels. Our algorithm displays strong gains against several SOTA baselines (up to 15%) for the LLP binary classification problem on various dataset types, both tabular and image. We achieve these improvements with minimal computational overhead above standard supervised learning due to Belief Propagation, for large bag sizes, even for a million samples.	翻訳日:2023-11-01 22:27:33 公開日:2023-10-30
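The LLP setting described above, where only bag-level label proportions are observed, can be illustrated with a minimal Python sketch. It is not the paper's Belief Propagation method: it only shows how bag proportions arise from instance labels and how a generic squared-error proportion loss could compare predictions against them. The function names `bag_proportions` and `proportion_loss` and the toy bags are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def bag_proportions(labels: Sequence[int], bags: Dict[str, List[int]]) -> Dict[str, float]:
    """Return the fraction of positive (1) labels in each bag.

    `bags` maps a bag name to the indices of the instances it contains.
    In LLP, only these fractions are visible to the learner during training.
    """
    return {name: sum(labels[i] for i in idx) / len(idx) for name, idx in bags.items()}


def proportion_loss(
    probs: Sequence[float],
    bags: Dict[str, List[int]],
    targets: Dict[str, float],
) -> float:
    """Mean squared gap between predicted and observed bag proportions.

    This is a generic LLP baseline loss (an assumption for illustration),
    not the Gibbs-distribution objective of the abstract above.
    """
    total = 0.0
    for name, idx in bags.items():
        # Average the per-instance positive probabilities inside the bag.
        predicted = sum(probs[i] for i in idx) / len(idx)
        total += (predicted - targets[name]) ** 2
    return total / len(bags)
```

A learner that predicts 0.5 for every instance is penalized on any bag whose true proportion differs from one half, while predictions that reproduce the instance labels incur zero loss.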
# 共変量シフトによるテストサンプルの少ないフェアネス精度トレードオフの改善 Improving Fairness-Accuracy tradeoff with few Test Samples under Covariate Shift ( http://arxiv.org/abs/2310.07535v2 ) ライセンス: Link先を確認	Shreyas Havaldar, Jatin Chauhan, Karthikeyan Shanmugam, Jay Nandy, Aravindan Raghuveer	(参考訳) テストデータの共変量は、モデルの精度と公平性の両方を著しく低下させることができる。このような状況下で、異なるセンシティブなグループ間で公平性を確保することは、刑事司法のような社会的意味合いによって最重要となる。ラベルのないテストサンプルとラベル付きトレーニングセットの小さなセットのみが利用可能な、教師なしの体制の下で運用します。この問題に対して、私たちは3つの貢献をします。まず,新しい複合重み付きエントロピーに基づく予測精度を目標とし,フェアネスの表現マッチング損失を最適化した。我々は、いくつかの標準データセットの公平性・正確性トレードオフに関して、損失定式化による最適化がパレート意味で多くの最先端ベースラインを上回っていることを実験的に検証する。第二の貢献は、Asymmetric Covariate Shift(非対称共変量シフト)という新しい設定である。非対称共変量シフト (asymmetric covariate shift) は、ある群の共変量の分布が他の群に比べて著しく変化し、支配的な群が過剰に表現されたときに起こる。この設定は現在のベースラインでは極めて困難であるが,提案手法がベースラインを大きく上回っていることを示す。第3の貢献は理論であり、トレーニングセットにおける予測損失と重み付きエントロピー項が共変量シフトの下でのテスト損失を近似することを示す。経験的および形式的サンプル複雑性境界により、この未知のテスト損失に対する近似は、他の多くのベースラインに影響を及ぼす重要サンプリング分散に依存しないことを示す。 Covariate shift in the test data can significantly downgrade both the accuracy and the fairness performance of the model. Ensuring fairness across different sensitive groups in such settings is of paramount importance due to societal implications like criminal justice. We operate under the unsupervised regime where only a small set of unlabeled test samples along with a labeled training set is available. Towards this problem, we make three contributions. First is a novel composite weighted entropy based objective for prediction accuracy which is optimized along with a representation matching loss for fairness. We experimentally verify that optimizing with our loss formulation outperforms a number of state-of-the-art baselines in the pareto sense with respect to the fairness-accuracy tradeoff on several standard datasets. Our second contribution is a new setting we term Asymmetric Covariate Shift that, to the best of our knowledge, has not been studied before. 
Asymmetric covariate shift occurs when the distribution of covariates of one group shifts significantly compared to the other groups, which happens when a dominant group is over-represented. While this setting is extremely challenging for current baselines, we show that our proposed method significantly outperforms them. Our third contribution is theoretical, where we show that our weighted entropy term along with prediction loss on the training set approximates test loss under covariate shift. Empirically and through formal sample complexity bounds, we show that this approximation to the unseen test loss does not depend on importance sampling variance which affects many other baselines.	翻訳日:2023-11-01 22:27:10 公開日:2023-10-30
# 経路ベル試験による長距離量子相関の証明 Certifying long-range quantum correlations through routed Bell tests ( http://arxiv.org/abs/2310.07484v3 ) ライセンス: Link先を確認	Edwin Peter Lobo, Jef Pauwels, and Stefano Pironio	(参考訳) 伝送チャネルの損失は距離とともに増大し、量子非局所性のフォトニクスの実証とその応用にとって大きな障害となる。最近、Chaturvedi, Viola, and Pawlowski (CVP) [arXiv:2211.14231] は、量子非局所性を証明できる範囲を拡張することを目的として、標準ベルの実験のバリエーションを導入した。我々が「ローテッドベル実験」と呼ぶこれらの実験では、ボブは量子粒子を2つの可能な経路に沿って経路付けし、2つの異なる位置で測定することができる。ショートパスのベル違反は、ロングパスの非局所的相関を検出するために必要な条件を弱めるべきである。実際、CVPはルーティングされたベル実験において、検出効率が任意に低い場合でも、リモートデバイスの結果を古典的に規定できないような量子相関が存在することを示した。本稿では,CVPが考慮した相関関係を古典的に規定することはできないが,遠隔デバイスへの量子システムの伝送を必要としないことを示す。これにより、ルート付きベル実験において「短距離」および「長距離」量子相関の概念が定義される。これらの相関は、非可換多項式最適化のための標準半定義型プログラミング階層によって特徴づけられることを示す。次に、短距離量子相関を除外できる条件について検討する。我々は、遠隔装置の臨界検出効率に基本的な低値が存在することを指摘し、経路ベル実験は任意に広い距離で長距離量子非局所性を示すことができないことを示唆する。しかし,経路付きベル実験により検出効率の閾値が低下することが判明した。しかし、改善はCVPの分析によって示唆されるものよりも大幅に小さい。 Losses in the transmission channel, which increase with distance, pose a major obstacle to photonics demonstrations of quantum nonlocality and its applications. Recently, Chaturvedi, Viola, and Pawlowski (CVP) [arXiv:2211.14231] introduced a variation of standard Bell experiments with the goal of extending the range over which quantum nonlocality can be demonstrated. In these experiments, which we call 'routed Bell experiments', Bob can route his quantum particle along two possible paths and measure it at two distinct locations - one near and another far from the source. The idea is that a Bell violation in the short-path should weaken the conditions required to detect nonlocal correlations in the long-path. Indeed, CVP showed that there are quantum correlations in routed Bell experiments such that the outcomes of the remote device cannot be classically predetermined, even when its detection efficiency is arbitrarily low. 
In this paper, we show that the correlations considered by CVP, though they cannot be classically predetermined, do not require the transmission of quantum systems to the remote device. This leads us to define the concept of 'short-range' and 'long-range' quantum correlations in routed Bell experiments. We show that these correlations can be characterized through standard semidefinite programming hierarchies for non-commutative polynomial optimization. We then explore the conditions under which short-range quantum correlations can be ruled out. We point out that there exist fundamental lower bounds on the critical detection efficiency of the distant device, implying that routed Bell experiments cannot demonstrate long-range quantum nonlocality at arbitrarily large distances. However, we do find that routed Bell experiments allow for reducing the detection efficiency threshold. The improvements, though, are significantly smaller than those suggested by CVP's analysis.	翻訳日:2023-11-01 22:26:42 公開日:2023-10-30
# OptiMUS: MIPソルバーと大規模言語モデルを用いた最適化モデリング OptiMUS: Optimization Modeling Using MIP Solvers and large language models ( http://arxiv.org/abs/2310.06116v2 ) ライセンス: Link先を確認	Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell	(参考訳) 最適化問題は製造や流通から医療に至るまで、様々な分野に広がっている。しかし、そのような問題の多くは、最先端の解法で最適に解くのではなく、手でヒューリスティックに解き明かされ、これらの問題を定式化し解決するのに必要な専門知識は、最適化ツールや技術の普及を妨げている。我々は,自然言語記述からMILP問題を定式化し,解決するために設計された大規模言語モデル(LLM)ベースのエージェントであるOptiMUSを紹介する。 OptiMUSは、数学的モデルの開発、ソルバコードの記述とデバッギング、テストの開発、生成したソリューションの有効性の検証を行うことができる。エージェントをベンチマークするために,線形プログラミング(LP)と混合整数線形プログラミング(MILP)の新たなデータセットであるNLP4LPを提案する。実験の結果,OptiMUS は基本的な LLM プロンプト戦略の約2倍の問題を解くことがわかった。 OptiMUSコードとNLP4LPデータセットは https://github.com/teshnizi/OptiMUS で入手できる。 Optimization problems are pervasive across various sectors, from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers, as the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. We introduce OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve MILP problems from their natural language descriptions. OptiMUS is capable of developing mathematical models, writing and debugging solver code, developing tests, and checking the validity of generated solutions. To benchmark our agent, we present NLP4LP, a novel dataset of linear programming (LP) and mixed integer linear programming (MILP) problems. Our experiments demonstrate that OptiMUS solves nearly twice as many problems as a basic LLM prompting strategy. OptiMUS code and NLP4LP dataset are available at https://github.com/teshnizi/OptiMUS.	翻訳日:2023-11-01 22:25:19 公開日:2023-10-30
# 大規模言語モデル学習のためのメモリコストと通信コストの再考 Rethinking Memory and Communication Cost for Efficient Large Language Model Training ( http://arxiv.org/abs/2310.06003v2 ) ライセンス: Link先を確認	Chan Wu, Hanxiao Zhang, Lin Ju, Jinjing Huang, Youshao Xiao, Zhaoxin Huan, Siyuan Li, Fanzhuang Meng, Lei Liang, Xiaolu Zhang and Jun Zhou	(参考訳) 近年,大規模言語モデル学習のための分散戦略が提案されている。しかし、これらの手法はメモリ消費と通信コストのトレードオフを限定的に解決した。本稿では,大規模な言語モデルの学習速度に及ぼすメモリ消費と通信コストの影響を再考し,部分冗長最適化器(PaRO)を用いたメモリ通信バランス戦略を提案する。 PaROは、微粒なシャーディング戦略により、小メモリ冗長性によるグループ間通信の量と頻度を削減し、様々なトレーニングシナリオにおけるトレーニング効率を向上させる包括的なオプションを提供する。さらに,大規模言語モデル学習において,ノード間やスイッチ間の通信効率を高めるために,階層オーバーラップリング(HO-Ring)通信トポロジを提案する。実験の結果,PaROはSOTA法に比べて1.19x-2.50倍のトレーニングスループットを向上し,ほぼ線形スケーラビリティを実現することがわかった。 hoリングアルゴリズムは従来のリングアルゴリズムと比較して通信効率を36.5%向上させる。 Recently, various distributed strategies for large language model training have been proposed. However, these methods provided limited solutions for the trade-off between memory consumption and communication cost. In this paper, we rethink the impact of memory consumption and communication costs on the training speed of large language models, and propose a memory-communication balanced strategy set Partial Redundancy Optimizer (PaRO). PaRO provides comprehensive options which reduces the amount and frequency of inter-group communication with minor memory redundancy by fine-grained sharding strategy, thereby improving the training efficiency in various training scenarios. Additionally, we propose a Hierarchical Overlapping Ring (HO-Ring) communication topology to enhance communication efficiency between nodes or across switches in large language model training. Our experiments demonstrate that PaRO significantly improves training throughput by 1.19x-2.50x compared to the SOTA method and achieves a near-linear scalability. The HO-Ring algorithm improves communication efficiency by 36.5% compared to the traditional Ring algorithm.	翻訳日:2023-11-01 22:24:42 公開日:2023-10-30
# テキスト関連性測定のための埋め込みを探る:オンラインコメントにおける感覚と関連性を明らかにする Exploring Embeddings for Measuring Text Relatedness: Unveiling Sentiments and Relationships in Online Comments ( http://arxiv.org/abs/2310.05964v2 ) ライセンス: Link先を確認	Anthony Olakangil, Cindy Wang, Justin Nguyen, Qunbo Zhou, Kaavya Jethwa, Jason Li, Aryan Narendra, Nishk Patel, Arjun Rajaram	(参考訳) 新型コロナウイルス(COVID-19)のパンデミックでインターネット利用が70%増加した後、世界中でソーシャルメディアを利用している人が増えている。 Twitter、Meta Threads、YouTube、Redditといったアプリケーションはますます普及しており、世論が表現されないデジタル空間はほとんど残っていない。本稿では,様々なソーシャルメディアプラットフォームにおけるコメント間の感情的・意味的関係を考察するとともに,各メディアプラットフォーム間での意見共有の重要性について考察する。研究者、政治家、ビジネス代表者が世界中のユーザー間で共有された感情の経路を辿ることができる。本稿では,これらのオンラインプラットフォーム上でユーザコメントから抽出されたテキストの関連度を測定する複数の手法を提案する。単語間のセマンティックな関係を捉え、ウェブ全体の感情を分析する埋め込みを活用することで、世論全体の関連を明らかにすることができる。この研究は、YouTube、Reddit、Twitterなどの既存のデータセットを利用している。我々は、双方向エンコーダ表現(BERT)のような人気のある自然言語処理モデルを利用して、感情を分析し、コメント埋め込み間の関係を探索した。さらに,様々なソーシャルメディアプラットフォームにまたがるコメント埋め込みにおける意味的関係を見つけるために,クラスタリングとkl-divergenceを活用することを目的としている。我々の分析は、オンラインコメントの相互接続性をより深く理解し、大きな相互接続脳として機能するインターネットの概念を調査する。 After the COVID-19 pandemic caused internet usage to grow by 70%, there has been an increased number of people all across the world using social media. Applications like Twitter, Meta Threads, YouTube, and Reddit have become increasingly pervasive, leaving almost no digital space where public opinion is not expressed. This paper investigates sentiment and semantic relationships among comments across various social media platforms, as well as discusses the importance of shared opinions across these different media platforms, using word embeddings to analyze components in sentences and documents. It allows researchers, politicians, and business representatives to trace a path of shared sentiment among users across the world. This research paper presents multiple approaches that measure the relatedness of text extracted from user comments on these popular online platforms. 
By leveraging embeddings, which capture semantic relationships between words and help analyze sentiments across the web, we can uncover connections regarding public opinion as a whole. The study utilizes pre-existing datasets from YouTube, Reddit, Twitter, and more. We made use of popular natural language processing models like Bidirectional Encoder Representations from Transformers (BERT) to analyze sentiments and explore relationships between comment embeddings. Additionally, we aim to utilize clustering and KL divergence to find semantic relationships within these comment embeddings across various social media platforms. Our analysis will enable a deeper understanding of the interconnectedness of online comments and will investigate the notion of the internet functioning as a large interconnected brain.	翻訳日:2023-11-01 22:24:25 publication date:2023-10-30
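Two of the relatedness measures this abstract mentions, similarity between comment embeddings and KL divergence, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the vectors stand in for embeddings from a model such as BERT, the distributions are assumed to be discrete and normalized, and the function names are hypothetical rather than taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (both must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same support.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

Cosine similarity is symmetric, whereas KL divergence is not, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; which direction to use is a modeling choice.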
# 注意パラダイムを超越する:地理空間ソーシャルメディアデータからの表現学習 Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data ( http://arxiv.org/abs/2310.05378v2 ) ライセンス: Link先を確認	Nick DiSanto, Anthony Corso, Benjamin Sanders, Gavin Harding	(参考訳) トランスフォーマーは、研究の基盤として注目駆動アーキテクチャを開拓してきたが、文脈情報への依存は、テキストのテーマを暗黙的に学習する能力の限界を浮き彫りにした。本研究では,分散パターンの源泉としてソーシャルメディアデータを調査し,パフォーマンスベンチマークのヒューリスティックパラダイムに挑戦する。複雑な長期的依存関係の取得に依存するネットワークとは対照的に、オンラインデータのモデルは本質的に構造を欠き、集約の基盤となるパターンを学習せざるを得ない。これらの抽象的関係を適切に表現するために、この研究は経験的ソーシャルメディアコーパスを要素成分に分解し、人口密度の場所をまたいだ20億以上のツイートを分析した。 Twitterデータにおける位置と頂点の関係を探索し、各都市固有の単語モデルを用いて、それぞれの表現を評価する。これは、隠れた洞察が高度なアルゴリズムの欠如なしに発見できることを示し、ノイズの多いデータの中でも、地理的な位置がオンラインコミュニケーションにかなりの影響を与えることを示す。この証拠は、地理空間コミュニケーションのパターンとその社会科学における意義に関する明確な洞察を示している。また、複雑なモデルは自然言語におけるパターン認識の前提条件であり、抽象的理解よりも絶対的解釈可能性の受容に疑問を呈する発展途上の景観と整合する。この研究は、洗練されたフレームワークと無形関係の分離を橋渡しし、構造モデルと客観的推論をブレンドするシステムへの道を開く。 While transformers have pioneered attention-driven architectures as a cornerstone of research, their dependence on explicitly contextual information underscores limitations in their abilities to tacitly learn overarching textual themes. This study investigates social media data as a source of distributed patterns, challenging the heuristic paradigm of performance benchmarking. In stark contrast to networks that rely on capturing complex long-term dependencies, models of online data inherently lack structure and are forced to learn underlying patterns in the aggregate. To properly represent these abstract relationships, this research dissects empirical social media corpora into their elemental components and analyzes over two billion tweets across population-dense locations. Exploring the relationship between location and vernacular in Twitter data, we employ Bag-of-Words models specific to each city and evaluate their respective representation. 
This demonstrates that hidden insights can be uncovered without the crutch of advanced algorithms and demonstrates that even amidst noisy data, geographic location has a considerable influence on online communication. This evidence presents tangible insights regarding geospatial communication patterns and their implications in social science. It also challenges the notion that intricate models are prerequisites for pattern recognition in natural language, aligning with the evolving landscape that questions the embrace of absolute interpretability over abstract understanding. This study bridges the divide between sophisticated frameworks and intangible relationships, paving the way for systems that blend structured models with conjectural reasoning.	翻訳日:2023-11-01 22:23:31 公開日:2023-10-30
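The city-specific Bag-of-Words models mentioned in this abstract can be illustrated with a short Python sketch. It assumes whitespace tokenization and lower-casing, which the abstract does not specify, and compares two cities' word counts with cosine similarity as one possible choice of comparison; the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def bag_of_words(texts: Iterable[str]) -> Counter:
    """Count lower-cased, whitespace-separated tokens over a collection of texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def bow_cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors.

    Returns 0.0 when either bag is empty, so the function never divides by zero.
    """
    dot = sum(a[word] * b[word] for word in set(a) & set(b))
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Building one `Counter` per city and comparing the pairs gives a simple, fully interpretable measure of how much two locations share vocabulary, with no learned parameters involved.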
# 分布に基づく軌道クラスタリング Distribution-Based Trajectory Clustering ( http://arxiv.org/abs/2310.05123v2 ) ライセンス: Link先を確認	Zi Jing Wang, Ye Zhu, Kai Ming Ting	(参考訳) 軌道クラスタリングは、軌道データの共通パターンの発見を可能にする。現在の軌道クラスタリングの方法は、2つの軌道間の相似性を測定するために2つの点間の距離測度に依存する。距離測定には高い計算コストと低い忠実度という2つの課題がある。既存のクラスタリングアルゴリズムが採用する距離測定とは独立に、別の課題がある。本稿では,最近の分散カーネル(IDK)を3つの課題に対処するための主要なツールとして用いることを提案する。 TIDKCと呼ばれる新しいIDKベースのクラスタリングアルゴリズムは、軌道類似度測定とクラスタリングに分散カーネルをフル活用する。 TIDKCは不規則な形状と線形時間における密度の異なる非線形分離性クラスターを同定する。ランダム初期化に依存しず、外れ値に対して堅牢である。 7つの大規模実世界の軌跡データセットの広範な評価により、IDKは従来の深層学習に基づく距離測定よりも、軌跡内の複雑な構造を捉えるのに効果的であることが示された。さらに,提案したTIDKCは,既存のトラジェクトリクラスタリングアルゴリズムよりもクラスタリング性能と効率が優れている。 Trajectory clustering enables the discovery of common patterns in trajectory data. Current methods of trajectory clustering rely on a distance measure between two points in order to measure the dissimilarity between two trajectories. The distance measures employed have two challenges: high computational cost and low fidelity. Independent of the distance measure employed, existing clustering algorithms have another challenge: either effectiveness issues or high time complexity. In this paper, we propose to use a recent Isolation Distributional Kernel (IDK) as the main tool to meet all three challenges. The new IDK-based clustering algorithm, called TIDKC, makes full use of the distributional kernel for trajectory similarity measuring and clustering. TIDKC identifies non-linearly separable clusters with irregular shapes and varied densities in linear time. It does not rely on random initialisation and is robust to outliers. An extensive evaluation on 7 large real-world trajectory datasets confirms that IDK is more effective in capturing complex structures in trajectories than traditional and deep learning-based distance measures. Furthermore, the proposed TIDKC has superior clustering performance and efficiency to existing trajectory clustering algorithms.	翻訳日:2023-11-01 22:23:05 公開日:2023-10-30
# BioBridge:知識グラフによるバイオメディカル基礎モデルのブリッジ BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs ( http://arxiv.org/abs/2310.03320v3 ) ライセンス: Link先を確認	Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N. Ioannidis, Huzefa Rangwala, Rishita Anubhai	(参考訳) 基盤モデル(FMs)は、大量のラベルのないデータを活用し、幅広いタスクで優れたパフォーマンスを示すことができる。しかし、生体医学領域向けに開発されたFMsは、独立に訓練され、タンパク質配列のみ、小分子構造のみ、臨床データのみのタスクに使用されている。このようなバイオメディカルFMの限界を克服するため,新しいパラメータ効率学習フレームワークであるBioBridgeを提案し,独立に訓練された単一モーダルFMを橋渡しし,マルチモーダルな動作を確立する。 BioBridgeは、知識グラフ(KG)を使用して、基礎となる単一モーダルFMを微調整することなく、1つの単一モーダルFMともう1つの間の変換を学習する。実験の結果,BioBridgeは,クロスモーダル検索タスクにおいて,最高のベースラインKG埋め込み手法(平均76.3%)を克服できることが示された。また、BioBridgeは、未知のモダリティや関係を外挿することで、ドメイン外一般化能力を示す。また,BioBridgeは,生物医学的マルチモーダル質問応答を支援できる汎用レトリバーとして自らを提示し,新規医薬品の誘導生成を促進する。 Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves it by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also find that BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we show that BioBridge presents itself as a general purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs.	翻訳日:2023-11-01 22:22:51 公開日:2023-10-30
# 3次元物理系における対称性の破断学習のための緩和オクタヘドラル群畳み込み Relaxed Octahedral Group Convolution for Learning Symmetry Breaking in 3D Physical Systems ( http://arxiv.org/abs/2310.02299v3 ) ライセンス: Link先を確認	Rui Wang, Robin Walters, Tess E.Smidt	(参考訳) 深部等変モデルでは、サンプル効率と一般化を改善するために対称性を用いる。しかし、これらのモデルの多くにおける完全対称性の仮定は、特にデータがそのような対称性と完全に一致しない場合に制限的である。そこで本稿では,3次元物理系をモデル化するための緩和八面体群畳み込みを導入する。このフレキシブルな畳み込み法は、モデルがデータと整合する最も高いレベルの等値を維持し、物理的システムの微妙な対称性を破る要因を発見できるようにする。実験により,本手法は相転移における対称性破壊要因の洞察を与えるだけでなく,流体超解像タスクにおいて優れた性能を達成できることを示す。 Deep equivariant models use symmetries to improve sample efficiency and generalization. However, the assumption of perfect symmetry in many of these models can sometimes be restrictive, especially when the data does not perfectly align with such symmetries. Thus, we introduce relaxed octahedral group convolution for modeling 3D physical systems in this paper. This flexible convolution technique provably allows the model to both maintain the highest level of equivariance that is consistent with data and discover the subtle symmetry-breaking factors in the physical systems. Empirical results validate that our approach can not only provide insights into the symmetry-breaking factors in phase transitions but also achieves superior performance in fluid super-resolution tasks.	翻訳日:2023-11-01 22:22:29 公開日:2023-10-30
# 自己回帰率観察によるウェアラブル医療の効率的不均衡を考慮したフェデレーション学習手法 An Efficient Imbalance-Aware Federated Learning Approach for Wearable Healthcare with Autoregressive Ratio Observation ( http://arxiv.org/abs/2310.14784v2 ) ライセンス: Link先を確認	Wenhao Yan, He Li, Kaoru Ota, Mianxiong Dong	(参考訳) ウェアラブルセンシング技術やモバイルエッジコンピューティングの進歩により、広く利用可能な医療サービスが普及しています。人々の健康情報は、スマートフォンやウェアラブルバンドなどのエッジデバイスによって収集され、サーバーのさらなる分析を行い、異常な状況に対する提案や警告を送信します。近年のフェデレーション学習では、ローカルデバイス上でプライベートデータをトレーニングし、モデルを共同で更新することが可能になる。しかしながら、健康状態データの不均質な分布は、クラス不均衡によるパフォーマンスのモデル化に重大なリスクをもたらす可能性がある。一方、FLトレーニングはサーバとのみグラデーションを共有することで実現されているため、トレーニングデータはほとんどアクセスできない。クラス不均衡に対する従来の解決策は、連合学習には役立ちません。本研究では,フェデレーション学習シナリオにおけるクラス不均衡の課題に対処するために,新しいフェデレーション学習フレームワークfedimtを提案する。 FedImTには、アグリゲーションの各ラウンドでデータ構成を推定するオンラインスキームが含まれており、その後、複数の推定のバリエーションを追跡するための自己減衰反復を導入し、少数クラスの損失計算のバランスを迅速に調整する。実験は、余剰エネルギー消費やプライバシーリスクを回避することなく、不均衡問題を解決するためのFedImTの有効性を示す。 Widely available healthcare services are now getting popular because of advancements in wearable sensing techniques and mobile edge computing. People's health information is collected by edge devices such as smartphones and wearable bands for further analysis on servers, then send back suggestions and alerts for abnormal conditions. The recent emergence of federated learning allows users to train private data on local devices while updating models collaboratively. However, the heterogeneous distribution of the health condition data may lead to significant risks to model performance due to class imbalance. Meanwhile, as FL training is powered by sharing gradients only with the server, training data is almost inaccessible. The conventional solutions to class imbalance do not work for federated learning. In this work, we propose a new federated learning framework FedImT, dedicated to addressing the challenges of class imbalance in federated learning scenarios. 
FedImT contains an online scheme that can estimate the data composition during each round of aggregation, then introduces a self-attenuating iterative equivalent to track variations of multiple estimations and promptly tweak the balance of the loss computing for minority classes. Experiments demonstrate the effectiveness of FedImT in solving the imbalance problem without extra energy consumption and avoiding privacy risks.	翻訳日:2023-11-01 22:15:15 公開日:2023-10-30
# スケーラブルなデータ表現と分類のための解釈可能なルールの学習 Learning Interpretable Rules for Scalable Data Representation and Classification ( http://arxiv.org/abs/2310.14336v2 ) ライセンス: Link先を確認	Zhuo Wang, Wei Zhang, Ning Liu, Jianyong Wang	(参考訳) 規則に基づくモデル、例えば決定木は、透明な内部構造と優れたモデル表現性のために高いモデル解釈性を必要とするシナリオで広く使われている。しかし、ルールベースのモデルは、特に大きなデータセットでは、離散的なパラメータや構造のために最適化が難しい。アンサンブルメソッドとファジィ/ソフトルールは一般的にパフォーマンスを改善するために使用されるが、モデルの解釈性を犠牲にしている。スケーラビリティと解釈性の両方を得るために,データ表現と分類のための解釈可能な非ファジィルールを自動的に学習する,ルールベース表現学習器(RRL)という新しい分類器を提案する。非微分可能RRLを効果的に訓練するために、連続空間に投影し、勾配降下を用いて離散モデルを直接最適化できる勾配グラフトと呼ばれる新しい訓練方法を提案する。論理アクティベーション関数の新たな設計は、RRLのスケーラビリティを高め、エンドツーエンドで連続的な特徴を離散化できるようにするためにも考案されている。 10個の小さなデータセットと4つの大きなデータセットの網羅的な実験により、RRLは競争的解釈可能なアプローチよりも優れており、異なるシナリオにおける分類精度とモデルの複雑さのトレードオフを得るために容易に調整できることを示した。私たちのコードは https://github.com/12wang3/rrl で公開されている。 Rule-based models, e.g., decision trees, are widely used in scenarios demanding high model interpretability for their transparent inner structures and good model expressivity. However, rule-based models are hard to optimize, especially on large data sets, due to their discrete parameters and structures. Ensemble methods and fuzzy/soft rules are commonly used to improve performance, but they sacrifice the model interpretability. To obtain both good scalability and interpretability, we propose a new classifier, named Rule-based Representation Learner (RRL), that automatically learns interpretable non-fuzzy rules for data representation and classification. To train the non-differentiable RRL effectively, we project it to a continuous space and propose a novel training method, called Gradient Grafting, that can directly optimize the discrete model using gradient descent. A novel design of logical activation functions is also devised to increase the scalability of RRL and enable it to discretize the continuous features end-to-end. 
Exhaustive experiments on ten small and four large data sets show that RRL outperforms the competitive interpretable approaches and can be easily adjusted to obtain a trade-off between classification accuracy and model complexity for different scenarios. Our code is available at: https://github.com/12wang3/rrl.	翻訳日:2023-11-01 22:14:36 公開日:2023-10-30
# 言語的動機づけによる手話セグメンテーション Linguistically Motivated Sign Language Segmentation ( http://arxiv.org/abs/2310.13960v2 ) ライセンス: Link先を確認	Amit Moryossef, Zifan Jiang, Mathias M\"uller, Sarah Ebling, Yoav Goldberg	(参考訳) 手話セグメンテーションは手話処理システムにおいて重要なタスクである。これは、サイン認識、転写、機械翻訳などの下流タスクを可能にする。本研究では,個々の記号への分割と,複数の記号からなる大きな単位からなる句への分割という2種類の分割について考察する。これら2つのタスクを協調的にモデル化する新しい手法を提案する。本手法は手話コーパスに見られる言語的手がかりに動機づけられている。我々は、主要なIOタグ付けスキームをBIOタグに置き換えて、継続的な署名を行う。句境界において韻律が重要な役割を果たすことを考慮し,光フロー機能の利用について検討する。また,手形と3次元手形正規化の広範囲な解析を行う。署名境界のモデル化には,BIOタグの導入が必要である。オプティカルフローによるプロソディの明示的にエンコーディングは、浅いモデルのセグメンテーションを改善するが、深いモデルではその貢献は無視できる。モデル上における復号アルゴリズムの注意深いチューニングは、セグメンテーション品質をさらに向上させる。最終モデルは、ゼロショット設定下であっても、異なる署名付き言語でドメイン外のビデオコンテンツに一般化されることを実証する。光流と3次元ハンド正規化を含め、この文脈でモデルのロバスト性を高めることが観察される。 Sign language segmentation is a crucial task in sign language processing systems. It enables downstream tasks such as sign recognition, transcription, and machine translation. In this work, we consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases, larger units comprising several signs. We propose a novel approach to jointly model these two tasks. Our method is motivated by linguistic cues observed in sign language corpora. We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing. Given that prosody plays a significant role in phrase boundaries, we explore the use of optical flow features. We also provide an extensive analysis of hand shapes and 3D hand normalization. We find that introducing BIO tagging is necessary to model sign boundaries. Explicitly encoding prosody by optical flow improves segmentation in shallow models, but its contribution is negligible in deeper models. Careful tuning of the decoding algorithm atop the models further improves the segmentation quality. We demonstrate that our final models generalize to out-of-domain video content in a different signed language, even under a zero-shot setting. 
We observe that including optical flow and 3D hand normalization enhances the robustness of the model in this context.	翻訳日:2023-11-01 22:14:14 公開日:2023-10-30
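The motivation for replacing IO tags with BIO tags in continuous signing can be shown with a small Python sketch. It assumes segments are given as half-open frame ranges `(start, end)`; the helper names and the toy example are hypothetical and do not come from the paper's code.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def segments_to_tags(num_frames: int, segments: List[Segment], scheme: str = "BIO") -> List[str]:
    """Convert frame segments into per-frame tags under the IO or BIO scheme."""
    tags = ["O"] * num_frames
    for start, end in segments:
        if start >= end:
            continue  # skip empty segments
        for frame in range(start, end):
            tags[frame] = "I"
        if scheme == "BIO":
            tags[start] = "B"  # mark the first frame of every segment
    return tags


def tags_to_segments(tags: List[str]) -> List[Segment]:
    """Recover segments from per-frame tags; works for both IO and BIO input."""
    segments: List[Segment] = []
    start = None
    for frame, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                segments.append((start, frame))  # a new "B" closes the previous segment
            start = frame
        elif tag == "O" and start is not None:
            segments.append((start, frame))
            start = None
    if start is not None:
        segments.append((start, len(tags)))
    return segments
```

With two back-to-back signs such as `(0, 3)` and `(3, 5)`, IO tagging yields one unbroken run of `I` frames that decodes to a single merged segment, while the `B` tag at frame 3 keeps the boundary recoverable under BIO.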
# 大規模言語モデルはなぜ正しい連鎖を生成するのか? Why Can Large Language Models Generate Correct Chain-of-Thoughts? ( http://arxiv.org/abs/2310.13571v2 ) ライセンス: Link先を確認	Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, Haitham Bou-Ammar	(参考訳) 本稿では,大規模言語モデル(LLM)の能力について述べる。本研究では,LLMを効果的に誘導し,コヒーレントな思考連鎖を生成する方法について検討する。これを実現するために,自然言語生成に適した2階層階層型グラフィカルモデルを提案する。この枠組み内では、真の言語に由来するものと比較して、LLM生成された思考の連鎖の可能性を測る魅力的な幾何学的収束率を確立する。本研究は、推論能力を要求するタスクにおけるパフォーマンス向上を説明する(潜在的に)適切な思考列を生成するllmの能力に関する理論的正当性を提供する。 This paper delves into the capabilities of large language models (LLMs), specifically focusing on advancing the theoretical comprehension of chain-of-thought prompting. We investigate how LLMs can be effectively induced to generate a coherent chain of thoughts. To achieve this, we introduce a two-level hierarchical graphical model tailored for natural language generation. Within this framework, we establish a compelling geometrical convergence rate that gauges the likelihood of an LLM-generated chain of thoughts compared to those originating from the true language. Our findings provide a theoretical justification for the ability of LLMs to produce the correct sequence of thoughts (potentially) explaining performance gains in tasks demanding reasoning skills.	翻訳日:2023-11-01 22:13:32 公開日:2023-10-30
# DistillCSE: 文埋め込みのための蒸留コントラスト学習 DistillCSE: Distilled Contrastive Learning for Sentence Embeddings ( http://arxiv.org/abs/2310.13499v2 ) ライセンス: Link先を確認	Jiahao Xu and Wei Shao and Lihui Chen and Lemao Liu	(参考訳) 本稿では,知識蒸留による自己学習パラダイムの下で,コントラスト学習を行うDistillCSEフレームワークを提案する。 DistillCSEの潜在的な利点は、自給自足機能である: ベースモデルを使用してさらなる監視信号を提供することで、知識蒸留を通じてより強力なモデルを学ぶことができる。しかしながら、知識蒸留の標準的な実装によるバニラ蒸留は、過度な過剰フィットによる限界的な改善しか達成できない。さらに定量的に分析した結果, 標準知識蒸留は, コントラスト学習の本質から, 教師モデルのロジットに比較的大きなばらつきがあることが明らかになった。そこで本研究では,高分散によって引き起こされる問題を緩和するため,グループ・Pシャッフル戦略を暗黙の正規化として提案し,複数の教師成分から平均ロジットを抽出した。標準ベンチマークによる実験では、提案法が多くの強力なベースライン法を上回り、新たな最先端性能をもたらすことが示されている。 This paper proposes the DistillCSE framework, which performs contrastive learning under the self-training paradigm with knowledge distillation. The potential advantage of DistillCSE is its self-enhancing feature: using a base model to provide additional supervision signals, a stronger model may be learned through knowledge distillation. However, the vanilla DistillCSE through the standard implementation of knowledge distillation only achieves marginal improvements due to severe overfitting. The further quantitative analyses demonstrate the reason that the standard knowledge distillation exhibits a relatively large variance of the teacher model's logits due to the essence of contrastive learning. To mitigate the issue induced by high variance, this paper accordingly proposed two simple yet effective solutions for knowledge distillation: a Group-P shuffling strategy as an implicit regularization and the averaging logits from multiple teacher components. Experiments on standard benchmarks demonstrate that the proposed DistillCSE outperforms many strong baseline methods and yields a new state-of-the-art performance.	翻訳日:2023-11-01 22:13:21 公開日:2023-10-30
# バンディットゲームにおける近似情報最大化 Approximate information maximization for bandit games ( http://arxiv.org/abs/2310.12563v2 ) ライセンス: Link先を確認	Alex Barbier-Chebbah (IP, CNRS, UPCit\'e), Christian L. Vestergaard (IP, CNRS, UPCit\'e), Jean-Baptiste Masson (IP, CNRS, UPCit\'e), Etienne Boursier (INRIA Saclay)	(参考訳) エントロピー最大化と自由エネルギー最小化は、様々な物理系の力学をモデル化するための一般的な物理原理である。例えば、自由エネルギー原理を用いた脳内意思決定のモデル化、情報ボトルネック原理による隠れ変数へのアクセス時の精度・複雑さトレードオフの最適化(Tishby et al., 2000)、情報最大化を用いたランダム環境におけるナビゲーション(Vergassola et al., 2007)などがある。この原理に基づいて,システム内のキー変数の情報に対する近似を最大化する新しい帯域幅アルゴリズムを提案する。この目的のために,エントロピーの近似解析物理学に基づく表現を開発し,各動作の情報ゲインを予測し,情報ゲインが最も大きいものを選択する。この手法は古典的なバンディット設定において強力なパフォーマンスをもたらす。経験的成功により,ガウス報酬を伴う二本腕バンディット問題に対する漸近的最適性を証明する。システムの性質をグローバルな物理関数に包含する能力のため、このアプローチはより複雑な帯域幅設定に効率的に適応することができ、マルチアーム帯域幅問題に対する情報最大化アプローチのさらなる研究を求めることができる。 Entropy maximization and free energy minimization are general physical principles for modeling the dynamics of various physical systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigation in random environments using information maximization (Vergassola et al., 2007). Built on this principle, we propose a new class of bandit algorithms that maximize an approximation to the information of a key variable within the system. To this end, we develop an approximated analytical physics-based representation of an entropy to forecast the information gain of each action and greedily choose the one with the largest information gain. This method yields strong performances in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. 
Owing to its ability to encompass the system's properties in a global physical functional, this approach can be efficiently adapted to more complex bandit settings, calling for further investigation of information maximization approaches for multi-armed bandit problems.	翻訳日:2023-11-01 22:12:13 公開日:2023-10-30
# 大規模言語モデルにおけるファクチュアル知識の体系的評価 Systematic Assessment of Factual Knowledge in Large Language Models ( http://arxiv.org/abs/2310.11638v3 ) ライセンス: Link先を確認	Linhao Luo, Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari	(参考訳) 従来の研究では,大規模言語モデル(LLM)に格納された知識を評価するために,既存の質問応答ベンチマークに頼っていた。しかし、このアプローチは、主に事前学習データと重複するジェネリックドメインに焦点を当てているため、事実的知識カバレッジに関する制限がある。本稿では,知識グラフ(KG)を利用して,LLMの事実知識を体系的に評価する枠組みを提案する。本フレームワークは,所定のKGに格納された事実から,質問の集合と期待された回答を自動的に生成し,これらの質問に対するLLMの精度を評価する。汎用ドメインと特定ドメインのKGを用いて,最先端のLCMを体系的に評価した。この実験は、ChatGPTがすべてのドメインで一貫してトップパフォーマーであることを示している。また, LLMの性能は命令の微調整, ドメイン, 質問の複雑さに左右され, 相手のコンテキストに左右される傾向がある。 Previous studies have relied on existing question-answering benchmarks to evaluate the knowledge stored in large language models (LLMs). However, this approach has limitations regarding factual knowledge coverage, as it mostly focuses on generic domains which may overlap with the pretraining data. This paper proposes a framework to systematically assess the factual knowledge of LLMs by leveraging knowledge graphs (KGs). Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions. We systematically evaluate the state-of-the-art LLMs with KGs in generic and specific domains. The experiment shows that ChatGPT is consistently the top performer across all domains. We also find that LLMs performance depends on the instruction finetuning, domain and question complexity and is prone to adversarial context.	翻訳日:2023-11-01 22:11:49 公開日:2023-10-30
# フェデレーション多目的学習 Federated Multi-Objective Learning ( http://arxiv.org/abs/2310.09866v2 ) ライセンス: Link先を確認	Haibo Yang, Zhuqing Liu, Jia Liu, Chaosheng Dong, Michinari Momma	(参考訳) 近年、多目的最適化(MOO)は多くのマルチエージェントマルチタスク学習アプリケーションを支える基礎的な問題として現れている。しかし,MOO文学における既存のアルゴリズムは,マルチエージェントマルチタスク学習アプリケーションの分散性やデータプライバシ要求を満足しない集中型学習設定に限定されている。これにより、複数のクライアントがMOO問題を分散的かつ協調的に解決し、トレーニングデータをプライベートに保ちながら、新しいFMOL(Federated Multi-Objective Learning)フレームワークを提案することができる。特に,我々のFMOLフレームワークは,異なるクライアント間で異なる目的関数のセットを提供して,MOOの定式化を初めてフェデレート学習パラダイムに発展させ,一般化する幅広いアプリケーションをサポートする。このfmolフレームワークのために,federated multi-gradient descent averaging (fmgda) と federated stochastic multi-gradient descent averaging (fsmgda) と呼ばれる2つの新しいfederated multi-objective optimization (fmoo) アルゴリズムを提案する。両方のアルゴリズムは、局所的な更新によって通信コストを著しく削減し、一方、単目的フェデレーション学習においてアルゴリズムのアルゴリズムと同等の収束率を達成する。また,提案したFMOOアルゴリズムの有効性についても検討した。 In recent years, multi-objective optimization (MOO) emerges as a foundational problem underpinning many multi-agent multi-task learning applications. However, existing algorithms in MOO literature remain limited to centralized learning settings, which do not satisfy the distributed nature and data privacy needs of such multi-agent multi-task learning applications. This motivates us to propose a new federated multi-objective learning (FMOL) framework with multiple clients distributively and collaboratively solving an MOO problem while keeping their training data private. Notably, our FMOL framework allows a different set of objective functions across different clients to support a wide range of applications, which advances and generalizes the MOO formulation to the federated learning paradigm for the first time. For this FMOL framework, we propose two new federated multi-objective optimization (FMOO) algorithms called federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA). 
Both algorithms allow local updates to significantly reduce communication costs, while achieving the same convergence rates as their algorithmic counterparts in single-objective federated learning. Our extensive experiments also corroborate the efficacy of our proposed FMOO algorithms.	翻訳日:2023-11-01 22:11:12 公開日:2023-10-30
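The FMOL abstract above names its algorithms after multi-gradient descent but does not spell out the update rule. As a rough illustration only, the sketch below shows the classical two-objective multi-gradient step: average each objective's gradient over clients, then take the minimum-norm convex combination of the two averages as a common descent direction. The function names, the client gradients, and the plain averaging step are all assumptions made for this example and are not taken from the paper.

```python
def min_norm_direction(g1, g2):
    """Return (lam, d) where d = lam*g1 + (1-lam)*g2 has minimum Euclidean norm, lam in [0, 1]."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(x * x for x in diff)
    if denom == 0.0:
        # Identical gradients: any convex combination equals g1.
        return 1.0, list(g1)
    # Unconstrained minimizer of ||g2 + lam*(g1 - g2)||^2, then clipped to [0, 1].
    lam = -sum(x * b for x, b in zip(diff, g2)) / denom
    lam = min(1.0, max(0.0, lam))
    d = [lam * a + (1.0 - lam) * b for a, b in zip(g1, g2)]
    return lam, d


def average(vectors):
    """Element-wise mean of equally long vectors (hypothetical stand-in for server-side averaging)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Hypothetical gradient estimates reported by two clients for two objectives.
client_grads_obj1 = [[1.0, 0.0], [1.0, 0.0]]
client_grads_obj2 = [[0.0, 2.0], [0.0, 0.0]]

g1 = average(client_grads_obj1)
g2 = average(client_grads_obj2)
lam, direction = min_norm_direction(g1, g2)
print(lam, direction)
```

When the two averaged gradients conflict, the resulting direction does not increase either objective to first order. How the paper combines this with local updates and stochastic gradients is not stated in the abstract.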
# impress:拡散型生成aiにおける無許可データ使用に対する知覚不能摂動のレジリエンス評価 IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI ( http://arxiv.org/abs/2310.19248v1 ) ライセンス: Link先を確認	Bochuan Cao, Changjiang Li, Ting Wang, Jinyuan Jia, Bo Li, Jinghui Chen	(参考訳) 安定拡散やDALL-E 2のような拡散に基づく画像生成モデルは、与えられた画像から学習し、プロンプトからのガイダンスに従って高品質なサンプルを生成することができる。例えば、オリジナルのアートワークに基づいてアーティストのスタイルを模倣したアートなイメージを制作したり、偽のコンテンツのためにオリジナル画像を悪意を持って編集したりすることができる。しかし、そのような能力は、元の画像の所有者から適切な許可を得ることなく、重大な倫理的な問題を引き起こす。これに対し、拡散モデルを誤解し、新しいサンプルを適切に生成できないように設計された、知覚不能な摂動を追加することで、そのような不正なデータ使用から元の画像を保護するいくつかの試みがなされている。本研究では, IMPRESSと呼ばれる摂動浄化プラットフォームを導入し, 非受容性摂動の有効性を保護策として評価する。 IMPRESSは、知覚不能な摂動は、元の画像と拡散再構成された画像の間に認識不能な不整合をもたらす可能性があり、これは、画像の浄化のための新しい最適化戦略を考案するために使用することができ、これは、原画像の不正なデータ使用(例えば、スタイル模倣、悪意ある編集)から保護を弱める可能性がある。提案するIMPRESSプラットフォームは,現代の保護手法を包括的に評価し,将来の保護手法の評価プラットフォームとして利用することができる。 Diffusion-based image generation models, such as Stable Diffusion or DALL-E 2, are able to learn from given images and generate high-quality samples following the guidance from prompts. For instance, they can be used to create artistic images that mimic the style of an artist based on his/her original artworks or to maliciously edit the original images for fake content. However, such ability also brings serious ethical issues without proper authorization from the owner of the original images. In response, several attempts have been made to protect the original images from such unauthorized data usage by adding imperceptible perturbations, which are designed to mislead the diffusion model and make it unable to properly generate new samples. In this work, we introduce a perturbation purification platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure. 
IMPRESS is based on the key observation that imperceptible perturbations could lead to a perceptible inconsistency between the original image and the diffusion-reconstructed image. This inconsistency can be used to devise a new optimization strategy for purifying the image, which may weaken the protection of the original image from unauthorized data usage (e.g., style mimicking, malicious editing). The proposed IMPRESS platform offers a comprehensive evaluation of several contemporary protection methods, and can be used as an evaluation platform for future protection methods.	翻訳日:2023-11-01 22:04:50 公開日:2023-10-30
# CodeFusion: コード生成のための事前トレーニング付き拡散モデル CodeFusion: A Pre-trained Diffusion Model for Code Generation ( http://arxiv.org/abs/2310.17680v2 ) ライセンス: Link先を確認	Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen	(参考訳) 最後のコード行しか変更できない開発者が、それが正しくなる前に、スクラッチから関数を書き始める頻度を想像してください。自然言語からコードを生成するための自動回帰モデルにも同じような制限がある。すなわち、以前に生成したトークンの再考が容易ではない。符号化された自然言語で条件付けられた完全なプログラムを反復的にデノイズすることにより,この制限に対処する,事前学習された拡散コード生成モデルであるCodeFusionを導入する。我々は,Bash,Python,Microsoft Excel条件書式(CF)ルールに対して,自然言語のタスクからコード生成までのCodeFusionを評価する。実験の結果、CodeFusion(75Mパラメータ)は最先端の自己回帰システム(350M-175Bパラメータ)とトップ1の精度で同等に動作し、多様性と品質のバランスが良いため、トップ3とトップ5の精度ではそれらを上回ることがわかった。 Imagine a developer who can only change their last line of code: how often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python, and Microsoft Excel conditional formatting (CF) rules. Experiments show that CodeFusion (75M parameters) performs on par with state-of-the-art auto-regressive systems (350M-175B parameters) in top-1 accuracy and outperforms them in top-3 and top-5 accuracy due to its better balance in diversity versus quality.	翻訳日:2023-11-01 22:03:54 公開日:2023-10-30
# FormaT5: 自然言語を用いた条件付きテーブルフォーマッティングのための回避と例 FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language ( http://arxiv.org/abs/2310.17306v2 ) ライセンス: Link先を確認	Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Elnaz Nouri, Mohammad Raza, Gust Verbruggen	(参考訳) フォーマッティングは、視覚化、プレゼンテーション、分析のためのテーブルの重要な特性である。スプレッドシートソフトウェアは、データに依存した条件付きフォーマット(CF)ルールを書くことで自動的にテーブルをフォーマットできる。このようなルールを書くことは、基礎となるロジックを理解し実装する必要があるため、ユーザにとってしばしば困難である。 FormaT5は、対象のテーブルと所望のフォーマットロジックの自然言語記述が与えられたときにCFルールを生成できるトランスフォーマーベースのモデルである。これらのタスクのユーザ記述は、しばしば不特定または曖昧であり、コード生成システムは、望ましいルールを1ステップで正確に学習することが困難である。この問題に対処し、引数エラーを最小限に抑えるため、FormaT5は回避(abstention)目的を通じてプレースホルダーを予測することを学ぶ。これらのプレースホルダーは、第2のモデルで満たされるか、あるいはフォーマットすべき行の例が利用できる場合には、プログラム・バイ・サンプル・システムで満たすことができる。 FormaT5を多種多様な実シナリオで評価するために、我々は4つの異なるソースから収集された実世界の記述を含む1053のCFタスクの広範なベンチマークを作成する。私たちはこの分野の研究を促進するためにベンチマークをリリースします。回避と充填により、FormaT5は例の有無にかかわらず、ベンチマークにおいて8つの異なるニューラルアプローチを上回ります。本研究は、ドメイン固有の学習システムを構築することの価値を示す。 Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users as it requires them to understand and implement the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are often under-specified or ambiguous, making it harder for code generation systems to accurately learn the desired rule in a single step. To tackle this problem of under-specification and minimise argument errors, FormaT5 learns to predict placeholders through an abstention objective. These placeholders can then be filled by a second model or, when examples of rows that should be formatted are available, by a programming-by-example system. 
To evaluate FormaT5 on diverse and real scenarios, we create an extensive benchmark of 1053 CF tasks, containing real-world descriptions collected from four different sources. We release our benchmarks to encourage research in this area. Abstention and filling allow FormaT5 to outperform 8 different neural approaches on our benchmarks, both with and without examples. Our results illustrate the value of building domain-specific learning systems.	翻訳日:2023-11-01 22:03:02 公開日:2023-10-30
# RDBench:リレーショナルデータベースのためのMLベンチマーク RDBench: ML Benchmark for Relational Databases ( http://arxiv.org/abs/2310.16837v2 ) ライセンス: Link先を確認	Zizhao Zhang, Yi Yang, Lutong Zou, He Wen, Tao Feng, Jiaxuan You	(参考訳) 高品質なデータセットと標準化された評価指標から恩恵を受け、機械学習(ML)は持続的な進歩と広範なアプリケーションを実現した。しかし、機械学習をリレーショナルデータベース(RDB)に適用する一方で、十分に確立されたベンチマークが存在しないことは、MLの開発にとって大きな障害である。この問題に対処するため,我々は,複数のテーブルを含むrdb上で再現可能なml研究を促進するための標準ベンチマークであるrdbench(ml benchmark for relational databases)を紹介する。 RDBenchは、さまざまなスケール、ドメイン、リレーショナル構造のRDBデータセットを4つのレベルに分類する。特に、さまざまなMLドメインに対するRDBenchの採用を単純化するために、RDBenchは、グラフデータ、均質グラフ、異質グラフを含む3種類のインターフェースを公開し、その基盤となるタスク定義を共有する。 RDBenchは、RDB予測タスクの下で、XGBoostからGraph Neural Networksまで、さまざまなドメインからのMLメソッド間の有意義な比較を可能にする。 rdbデータセットごとに複数の分類と回帰タスクを設計、同じデータセット上で平均結果を報告し、実験結果のロバスト性をさらに向上させる。 RDBenchはDBGymで実装されている。DBGymはデータベース上のML研究とアプリケーションのためのユーザフレンドリーなプラットフォームで、RDBenchを使った新しいMLメソッドのベンチマークを容易に行える。 Benefiting from high-quality datasets and standardized evaluation metrics, machine learning (ML) has achieved sustained progress and widespread applications. However, while applying machine learning to relational databases (RDBs), the absence of a well-established benchmark remains a significant obstacle to the development of ML. To address this issue, we introduce ML Benchmark For Relational Databases (RDBench), a standardized benchmark that aims to promote reproducible ML research on RDBs that include multiple tables. RDBench offers diverse RDB datasets of varying scales, domains, and relational structures, organized into 4 levels. Notably, to simplify the adoption of RDBench for diverse ML domains, for any given database, RDBench exposes three types of interfaces including tabular data, homogeneous graphs, and heterogeneous graphs, sharing the same underlying task definition. For the first time, RDBench enables meaningful comparisons between ML methods from diverse domains, ranging from XGBoost to Graph Neural Networks, under RDB prediction tasks. 
We design multiple classification and regression tasks for each RDB dataset and report averaged results over the same dataset, further enhancing the robustness of the experimental findings. RDBench is implemented with DBGym, a user-friendly platform for ML research and application on databases, which makes it easy to benchmark new ML methods with RDBench.	翻訳日:2023-11-01 22:01:37 公開日:2023-10-30
# Data Provenance Initiative: AIにおけるデータセットライセンスと属性の大規模監査 The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI ( http://arxiv.org/abs/2310.16787v2 ) ライセンス: Link先を確認	Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Deb Roy, Sara Hooker	(参考訳) 膨大な、多様な、一貫性のないデータセットで言語モデルをトレーニングするレースは、実践者に対する法的および倫理的リスクに対する懸念を高めている。データの透明性と理解を脅かすこれらのプラクティスを是正するために、法律と機械学習の専門家の間で、1800以上のテキストデータセットを体系的に監査し追跡するための、複数の学際的な取り組みを招集する。私たちは、ソース、クリエーター、一連のライセンス条件、プロパティ、以降の使用から、これらのデータセットの系統をトレースするためのツールと標準を開発します。私たちのランドスケープ分析は、より低いリソース言語、より創造的なタスク、よりリッチなトピックの多様性、より新しい、より合成的なトレーニングデータといった重要なカテゴリを独占するクローズドデータセットによる、商業的にオープンなデータセットとクローズドデータセットの組成と焦点の急激な分割を強調しています。このことは、異なるライセンス条件下で利用できるデータの種類がより深く分断され、著作権と公正使用に関する司法的法的解釈への含意が高まったことを示している。また,広く使用されているデータセットホスティングサイトでは,ライセンスの欠落が72%以上,エラーレートが50%以上,ライセンスの誤分類が頻発している。これは、多くの最近のブレークスルーを駆動する最も人気のあるデータセットの誤帰と情報利用の危機を示している。データセットの透明性と責任ある使用に関する継続的な改善への貢献として、私たちは、最もポピュラーなオープンソースの微調整データコレクションであるwww.dataprovenance.orgのために、データプロヴァンスをトレースしてフィルタできるインタラクティブuiであるdata provenance explorerを使って、監査全体をリリースします。 The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace 1800+ text datasets. We develop tools and standards to trace the lineage of these datasets, from their source, creators, series of license conditions, properties, and subsequent use. 
Our landscape analysis highlights the sharp divides in composition and focus of commercially open vs closed datasets, with closed datasets monopolizing important categories: lower resource languages, more creative tasks, richer topic variety, newer and more synthetic training data. This points to a deepening divide in the types of data that are made available under different license conditions, and heightened implications for jurisdictional legal interpretations of copyright and fair use. We also observe frequent miscategorization of licenses on widely used dataset hosting sites, with license omission of 72%+ and error rates of 50%+. This points to a crisis in misattribution and informed use of the most popular datasets driving many recent breakthroughs. As a contribution to ongoing improvements in dataset transparency and responsible use, we release our entire audit, with an interactive UI, the Data Provenance Explorer, which allows practitioners to trace and filter on data provenance for the most popular open source finetuning data collections: www.dataprovenance.org.	翻訳日:2023-11-01 22:00:54 公開日:2023-10-30
# 光による超伝導量子ビットのコヒーレント制御 Coherent control of a superconducting qubit using light ( http://arxiv.org/abs/2310.16155v2 ) ライセンス: Link先を確認	Hana K. Warner, Jeffrey Holzgrafe, Beatriz Yankelevich, David Barton, Stefano Poletto, C. J. Xin, Neil Sinclair, Di Zhu, Eyob Sete, Brandon Langley, Emma Batson, Marco Colangelo, Amirhassan Shams-Ansari, Graham Joe, Karl K. Berggren, Liang Jiang, Matthew Reagor, and Marko Loncar	(参考訳) 量子科学と技術は、低損失および低ノイズ通信チャネルに接続された量子プロセッサのネットワークに依存する強力な計算資源の実現を約束している [1,2]。極低温環境で動作する超伝導マイクロ波量子ビット (3-8ghz) は、その強いジョセフソン非線形性と低損失 [3] のために量子プロセッサノードの有望な候補として現れているが、空間的に分離されたプロセッサノード間の情報は、低損失光ファイバを伝搬する通信光子 (200 thz) を介して室温で伝達される可能性が高い。したがって、これらの異なる周波数間の量子情報の変換 [4-10] は、各プラットフォームの利点を量子資源と対向させることで活用することが重要である。ここでは超伝導量子ビットのコヒーレント光制御を示す。我々は、最大1.18%の変換効率(1.16%の協調性)で動作し、量子コヒーレンス時間 (800 ns) に影響を与えずに超伝導量子ビット内のラビ振動 (2.27 mhz) を示すマイクロ波光量子トランスデューサを開発した。最後に,ネットワーク量子プロセッサノードへのトランスデューサの利用に関する展望について述べる。 Quantum science and technology promise the realization of a powerful computational resource that relies on a network of quantum processors connected with low loss and low noise communication channels capable of distributing entangled states [1,2]. While superconducting microwave qubits (3-8 GHz) operating in cryogenic environments have emerged as promising candidates for quantum processor nodes due to their strong Josephson nonlinearity and low loss [3], the information between spatially separated processor nodes will likely be carried at room temperature via telecommunication photons (200 THz) propagating in low loss optical fibers. Transduction of quantum information [4-10] between these disparate frequencies is therefore critical to leverage the advantages of each platform by interfacing quantum resources. Here, we demonstrate coherent optical control of a superconducting qubit. 
We achieve this by developing a microwave-optical quantum transducer that operates with up to 1.18% conversion efficiency (1.16% cooperativity) and demonstrate optically-driven Rabi oscillations (2.27 MHz) in a superconducting qubit without impacting qubit coherence times (800 ns). Finally, we discuss outlooks towards using the transducer to network quantum processor nodes.	翻訳日:2023-11-01 21:59:39 公開日:2023-10-30
# claimscan-2023: uncovering truth in social media via claim detection and identification of claims spans Overview of the CLAIMSCAN-2023: Uncovering Truth in Social Media through Claim Detection and Identification of Claim Spans ( http://arxiv.org/abs/2310.19267v1 ) ライセンス: Link先を確認	Megha Sundriyal and Md Shad Akhtar and Tanmoy Chakraborty	(参考訳) コンテンツ作成と情報交換の大幅な増加は、非常に有利なオンラインソーシャルメディアプラットフォームの開発によって実現されている。しかし、これらのプラットフォームは偽情報、プロパガンダ、偽ニュースを広める人々にとっての場所になっている。主張は世界の認識を形成するのに不可欠ですが、悲しいことに、偽情報を広める人々によって人を騙すために頻繁に使われています。この問題に対処するため、ソーシャルメディアの巨人はコンテンツモデレーターを使って偽ニュースを現実世界からフィルタリングしている。しかし、情報の量が多いため、偽ニュースを効果的に識別することは困難である。したがって、そのような主張をするソーシャルメディア投稿を自動的に特定し、その妥当性を確認し、信頼性と虚偽の主張を区別することが重要になっている。そこで我々は2023年の情報検索評価フォーラム(FIRE'2023)でCLAIMSCANを紹介した。主な目的は、ソーシャルメディア投稿がクレームを構成するかどうかを決定するタスクAと、クレームを構成するポスト内の単語やフレーズを正確に識別するタスクBである。タスクaは40の登録を受け取り、このタイムリーな課題に強い関心と関与を示した。一方、タスクBは28チームから参加し、誤報のデジタル時代における重要性を強調した。 A significant increase in content creation and information exchange has been made possible by the quick development of online social media platforms, which has been very advantageous. However, these platforms have also become a haven for those who disseminate false information, propaganda, and fake news. Claims are essential in forming our perceptions of the world, but sadly, they are frequently used to trick people by those who spread false information. To address this problem, social media giants employ content moderators to filter out fake news from the actual world. However, the sheer volume of information makes it difficult to identify fake news effectively. Therefore, it has become crucial to automatically identify social media posts that make such claims, check their veracity, and differentiate between credible and false claims. In response, we presented CLAIMSCAN in the 2023 Forum for Information Retrieval Evaluation (FIRE'2023). 
The primary objectives centered on two crucial tasks: Task A, determining whether a social media post constitutes a claim, and Task B, precisely identifying the words or phrases within the post that form the claim. Task A received 40 registrations, demonstrating a strong interest and engagement in this timely challenge. Meanwhile, Task B attracted participation from 28 teams, highlighting its significance in the digital era of misinformation.	翻訳日:2023-11-01 21:50:56 公開日:2023-10-30
# グラフニューラルネットワーク理解のためのメタデータ駆動アプローチ A Metadata-Driven Approach to Understand Graph Neural Networks ( http://arxiv.org/abs/2310.19263v1 ) ライセンス: Link先を確認	Ting Wei Li, Qiaozhu Mei, Jiaqi Ma	(参考訳) グラフニューラルネットワーク(GNN)は様々なアプリケーションで顕著な成功を収めているが、そのパフォーマンスはグラフデータセットの特定のデータ特性に敏感である。 GNNの限界を理解するための現在の文献は、主にネットワーク科学やグラフ理論からヒューリスティックスとドメイン知識を活用してGNNの振る舞いをモデル化するモデル駆動(model-driven)アプローチを採用しており、時間がかかり非常に主観的である。本研究では、グラフ学習ベンチマークの利用可能性の高まりを動機として、GNNのグラフデータ特性に対する感度を解析するためのメタデータ駆動(metadata-driven)アプローチを提案する。多様なデータセットにまたがってGNN性能のベンチマークから得られたメタデータを多変量スパース回帰解析し,重要なデータ特性の集合を得る。データ駆動手法の有効性を検証するため,特定されたデータ特性の一つである次数分布に着目し,理論解析や制御実験を通じて,この特性がGNNの性能に与える影響について検討する。次数分布がより均衡したデータセットは,ノード表現の線形分離性が向上し,GNNの性能が向上することを示す。また, 次数分布の異なる合成データセットを用いて制御実験を行い, 実験結果が理論値とよく一致した。理論的解析と制御実験の両方により,提案手法がGNNの重要データ特性の同定に有効であることを検証した。 Graph Neural Networks (GNNs) have achieved remarkable success in various applications, but their performance can be sensitive to specific data properties of the graph datasets they operate on. Current literature on understanding the limitations of GNNs has primarily employed a model-driven approach that leverages heuristics and domain knowledge from network science or graph theory to model the GNN behaviors, which is time-consuming and highly subjective. In this work, we propose a metadata-driven approach to analyze the sensitivity of GNNs to graph data properties, motivated by the increasing availability of graph learning benchmarks. We perform a multivariate sparse regression analysis on the metadata derived from benchmarking GNN performance across diverse datasets, yielding a set of salient data properties. To validate the effectiveness of our data-driven approach, we focus on one identified data property, the degree distribution, and investigate how this property influences GNN performance through theoretical analysis and controlled experiments. 
Our theoretical findings reveal that datasets with more balanced degree distribution exhibit better linear separability of node representations, thus leading to better GNN performance. We also conduct controlled experiments using synthetic datasets with varying degree distributions, and the results align well with our theoretical findings. Collectively, both the theoretical analysis and controlled experiments verify that the proposed metadata-driven approach is effective in identifying critical data properties for GNNs.	翻訳日:2023-11-01 21:50:25 公開日:2023-10-30
# diversify & conquer: out-of-distribution disagreementによる成果指向カリキュラムrl Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement ( http://arxiv.org/abs/2310.19261v1 ) ライセンス: Link先を確認	Daesol Cho, Seungjae Lee, and H. Jin Kim	(参考訳) 強化学習 (Reinforcement Learning, RL) はしばしば、エージェントが環境の特性や外部報酬といったドメイン知識にアクセスせずに探索すべき非情報探索問題の課題に直面している。これらの課題に対処するため、本研究では、D2C(Diversify for Disagreement & Conquer)と呼ばれるカリキュラムRLの新しいアプローチを提案する。従来のカリキュラム学習法とは異なり、D2Cは所望の成果の少数の例しか必要とせず、その幾何学や所望の成果例の分布に関わらず、どんな環境でも機能する。提案手法は,目標条件分類器の多様化を行い,訪れた結果状態と所望の結果状態の類似性を識別し,未探索領域を定量化し,任意の目標条件固有報酬信号を単純かつ直感的に設計できるようにする。提案手法は両部マッチングを用いて,順応した中間目標の列を生成するカリキュラム学習目標を定義し,エージェントが探索されていない領域を自動的に探索・征服することを可能にする。本研究は,d2cが,任意に分布した望ましい成果例においても,定量的・質的側面において,事前のカリキュラムrl法を上回っていることを示す実験結果を示す。 Reinforcement learning (RL) often faces the challenges of uninformed search problems where the agent should explore without access to the domain knowledge such as characteristics of the environment or external rewards. To tackle these challenges, this work proposes a new approach for curriculum RL called Diversify for Disagreement & Conquer (D2C). Unlike previous curriculum learning methods, D2C requires only a few examples of desired outcomes and works in any environment, regardless of its geometry or the distribution of the desired outcome examples. The proposed method performs diversification of the goal-conditional classifiers to identify similarities between visited and desired outcome states and ensures that the classifiers disagree on states from out-of-distribution, which enables quantifying the unexplored region and designing an arbitrary goal-conditioned intrinsic reward signal in a simple and intuitive way. The proposed method then employs bipartite matching to define a curriculum learning objective that produces a sequence of well-adjusted intermediate goals, which enable the agent to automatically explore and conquer the unexplored region. 
We present experimental results demonstrating that D2C outperforms prior curriculum RL methods in both quantitative and qualitative aspects, even with arbitrarily distributed desired outcome examples.	翻訳日:2023-11-01 21:49:57 公開日:2023-10-30
# 教師なしデータ取得によるオブジェクト検出のためのオンラインソースフリードメイン適応の改善 Improving Online Source-free Domain Adaptation for Object Detection by Unsupervised Data Acquisition ( http://arxiv.org/abs/2310.19258v1 ) ライセンス: Link先を確認	Xiangyu Shi, Yanyuan Qiao, Qi Wu, Lingqiao Liu, Feras Dayoub	(参考訳) 移動ロボットにおける効果的な物体検出は、多様な不慣れな環境での展開によって挑戦される。 Online Source-Free Domain Adaptation (O-SFDA)は、ターゲットドメインからのラベルなしデータのストリームを使用して、リアルタイムなモデル適応を提供する。しかし、モバイルロボティクスにおけるキャプチャーフレームのすべてが、特に強いドメインシフトがある場合、適応に有用な情報を含んでいるわけではない。本稿では,非教師付きデータ取得による移動ロボットの適応物体検出のためのO-SFDAの改良手法を提案する。本手法は,オンライントレーニングプロセスに含まれる最も情報に富む未ラベル標本を優先する。実世界のデータセットに対する実証的な評価により,我々の手法は既存のO-SFDA技術よりも優れており,移動ロボットの適応物体検出を改善するための教師なしデータ取得の可能性を示す。 Effective object detection in mobile robots is challenged by deployment in diverse and unfamiliar environments. Online Source-Free Domain Adaptation (O-SFDA) offers real-time model adaptation using a stream of unlabeled data from a target domain. However, not all captured frames in mobile robotics contain information that is beneficial for adaptation, particularly when there is a strong domain shift. This paper introduces a novel approach to enhance O-SFDA for adaptive object detection in mobile robots via unsupervised data acquisition. Our methodology prioritizes the most informative unlabeled samples for inclusion in the online training process. Empirical evaluation on a real-world dataset reveals that our method outperforms existing state-of-the-art O-SFDA techniques, demonstrating the viability of unsupervised data acquisition for improving adaptive object detection in mobile robots.	翻訳日:2023-11-01 21:49:32 公開日:2023-10-30
# マルチビューインスタンスキャプチャによるインスタンス検出のための高分解能データセット A High-Resolution Dataset for Instance Detection with Multi-View Instance Capture ( http://arxiv.org/abs/2310.19257v1 ) ライセンス: Link先を確認	Qianqian Shen, Yunhan Zhao, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong	(参考訳) インスタンス検出(insdet)は、ロボット工学とコンピュータビジョンにおける長期にわたる問題であり、乱雑なシーンでオブジェクトインスタンス(いくつかの視覚的な例で事前に定義されている)を検出することを目的としている。現実的な重要性があるにもかかわらず、その進歩は、事前定義されたクラスに属するオブジェクトを検出するObject Detectionによって隠れている。主な理由は、現在のinsdetデータセットが現在の標準でスケールが小さすぎるためである。例えば、人気のInsDetデータセットGMU(2016年に公開された)は、2014年に公開された有名なオブジェクト検出データセットであるCOCO(80クラス)よりもはるかに少ない23インスタンスしかありません。私たちは新しいInsDetデータセットとプロトコルを導入する動機があります。トレーニングデータは、マルチビューのインスタンスキャプチャと、フリーボックスアノテーションでインスタンスイメージをペーストしてトレーニングイメージを合成可能な、多様なシーンイメージで構成されています。次に,100のオブジェクトインスタンスのマルチビューキャプチャと,高解像度(6k x 8k)テストイメージを含む実世界データベースをリリースする。第3に,insdetのベースライン手法を大規模に検討し,その性能を分析し,今後の課題を示唆する。予想外のクラス非依存のセグメンテーションモデル(segment anything model, sam)と自己教師付き特徴表現であるdinov2は、オブジェクト検出器(例えばfasterrcnnとretinanet)を再利用するエンドツーエンドトレーニングされたinsdetモデルよりも10 ap以上優れたパフォーマンスを実現しています。 Instance detection (InsDet) is a long-lasting problem in robotics and computer vision, aiming to detect object instances (predefined by some visual examples) in a cluttered scene. Despite its practical significance, its advancement is overshadowed by Object Detection, which aims to detect objects belonging to some predefined classes. One major reason is that current InsDet datasets are too small in scale by today's standards. For example, the popular InsDet dataset GMU (published in 2016) has only 23 instances, far less than COCO (80 classes), a well-known object detection dataset published in 2014. We are motivated to introduce a new InsDet dataset and protocol. First, we define a realistic setup for InsDet: training data consists of multi-view instance captures, along with diverse scene images allowing synthesizing training images by pasting instance images on them with free box annotations. 
Second, we release a real-world database, which contains multi-view capture of 100 object instances, and high-resolution (6k x 8k) testing images. Third, we extensively study baseline methods for InsDet on our dataset, analyze their performance and suggest future work. Somewhat surprisingly, using the off-the-shelf class-agnostic segmentation model (Segment Anything Model, SAM) and the self-supervised feature representation DINOv2 performs the best, achieving >10 AP better than end-to-end trained InsDet models that repurpose object detectors (e.g., FasterRCNN and RetinaNet).	翻訳日:2023-11-01 21:49:19 公開日:2023-10-30
# フローベース分布ロバスト最適化 Flow-based Distributionally Robust Optimization ( http://arxiv.org/abs/2310.19253v1 ) ライセンス: Link先を確認	Chen Xu, Jonghyeok Lee, Xiuyuan Cheng, Yao Xie	(参考訳) 本稿では,Wassersteinの不確実性集合を用いたフローベース分布ロバスト最適化 (DRO) 問題を解くために,FlowDROと呼ばれる計算効率のよいフレームワークを提案する。計算量的に困難である無限次元最適化問題に取り組むために,データ分布と対象分布との間の連続時間可逆移動写像であるフローベースモデルを活用し,Wasserstein近位勾配流型アルゴリズムを開発した。実際には、勾配降下によりブロックで漸進的に訓練されたニューラルネットワークの列によって輸送マップをパラメータ化する。計算フレームワークは一般的であり,大規模なサンプルサイズを持つ高次元データを扱うことができ,様々な用途に有用である。本稿では, 敵対的学習, 分布ロバストな仮説テスト, およびデータ駆動型分布摂動差分プライバシーの新しいメカニズムにおける利用を実証し, 提案手法は実際の高次元データに対して強い経験的性能を与える。 We present a computationally efficient framework, called FlowDRO, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, when requiring the worst-case distribution (also called the Least Favorable Distribution, LFD) to be continuous so that the algorithm can be scalable to problems with larger sample sizes and achieve better generalization capability for the induced robust algorithms. To tackle the computationally challenging infinite-dimensional optimization problem, we leverage flow-based models, continuous-time invertible transport maps between the data distribution and the target distribution, and develop a Wasserstein proximal gradient flow type of algorithm. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. Our computational framework is general, can handle high-dimensional data with large sample sizes, and can be useful for various applications. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy, where the proposed method gives strong empirical performance on real high-dimensional data.	翻訳日:2023-11-01 21:48:46 公開日:2023-10-30
# セマンティックセグメンテーションの評価基準の再検討--細粒度IoU (Intersection over Union) の最適化と評価 Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union ( http://arxiv.org/abs/2310.19252v1 ) ライセンス: Link先を確認	Zifu Wang and Maxim Berman and Amal Rannen-Triki and Philip H.S. Torr and Devis Tuia and Tinne Tuytelaars and Luc Van Gool and Jiaqian Yu and Matthew B. Blaschko	(参考訳) 意味セグメンテーションデータセットは、しばしば2種類の不均衡を示す: あるクラスが他のクラスよりも頻繁に現れるクラス不均衡(class imbalance)と、あるオブジェクトが他のオブジェクトよりも多くのピクセルを占有するサイズ不均衡(size imbalance)である。これにより、従来の評価基準は多数派クラス (例えば、全体のピクセル単位の精度) と大きなオブジェクト (例えば、平均ピクセル単位の精度とデータセット単位の平均IoU) に偏りがちになる。これらの欠点に対処するため,我々は,細粒度mIoUと,それに対応する最悪の指標を用いて,より包括的なセグメンテーション手法の評価を行う。これらのきめ細かいメトリクスは、大きなオブジェクトに対するバイアスの低減、よりリッチな統計情報、モデルとデータセット監査に関する貴重な洞察を提供する。さらに,12種類の自然および空中のセグメンテーションデータセットについて,提案する指標を用いて15の現代ニューラルネットワークを訓練し,評価する,広範なベンチマーク研究を行った。ベンチマークでは,1つの測定値に基づかないことの必要性を強調し,細粒度mIoUが大きな物体への偏りを減少させることを確認した。さらに,アーキテクチャ設計と損失関数が果たす重要な役割を特定し,細粒度メトリクスを最適化するベストプラクティスを導出する。コードは https://github.com/zifuwanggg/JDTLosses で入手できる。 Semantic segmentation datasets often exhibit two types of imbalance: class imbalance, where some classes appear more frequently than others, and size imbalance, where some objects occupy more pixels than others. This causes traditional evaluation metrics to be biased towards majority classes (e.g. overall pixel-wise accuracy) and large objects (e.g. mean pixel-wise accuracy and per-dataset mean intersection over union). To address these shortcomings, we propose the use of fine-grained mIoUs along with corresponding worst-case metrics, thereby offering a more holistic evaluation of segmentation techniques. These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing. 
Furthermore, we undertake an extensive benchmark study, where we train and evaluate 15 modern neural networks with the proposed metrics on 12 diverse natural and aerial segmentation datasets. Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects. Moreover, we identify the crucial role played by architecture designs and loss functions, which lead to best practices in optimizing fine-grained metrics. The code is available at \href{https://github.com/zifuwanggg/JDTLosses}{https://github.com/zifuwanggg/JDTLosses}.	翻訳日:2023-11-01 21:48:26 公開日:2023-10-30
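The bias towards large objects described in the abstract above can be illustrated with a small Python sketch. It compares an IoU computed from pixel counts pooled over the whole dataset with an IoU computed per image and then averaged. The function names, the toy pixel counts, and the plain per-image average are assumptions made for illustration; they are not taken from the JDTLosses repository, and the paper's exact fine-grained definitions may differ.

```python
def iou(tp, fp, fn):
    """Intersection over union from pixel counts; None if the class never occurs."""
    denom = tp + fp + fn
    return tp / denom if denom else None


def per_dataset_iou(images):
    """Pool the pixel counts of all images first, then compute a single IoU."""
    tp = sum(img["tp"] for img in images)
    fp = sum(img["fp"] for img in images)
    fn = sum(img["fn"] for img in images)
    return iou(tp, fp, fn)


def per_image_scores(images):
    """One IoU per image, skipping images where the class is absent."""
    scores = (iou(img["tp"], img["fp"], img["fn"]) for img in images)
    return [s for s in scores if s is not None]


def per_image_iou(images):
    """Average of the per-image IoUs (a finer-grained view than pooling)."""
    scores = per_image_scores(images)
    return sum(scores) / len(scores) if scores else None


def worst_case_iou(images):
    """Lowest per-image IoU, a simple worst-case summary."""
    scores = per_image_scores(images)
    return min(scores) if scores else None


# Toy counts for one class: a large, well-segmented object and a small, poorly segmented one.
images = [
    {"tp": 9000, "fp": 500, "fn": 500},  # large object, IoU 0.9
    {"tp": 10, "fp": 20, "fn": 70},      # small object, IoU 0.1
]

print(per_dataset_iou(images))
print(per_image_iou(images))
print(worst_case_iou(images))
```

With these toy counts the pooled score is about 0.89, while the per-image mean is 0.5 and the worst case is 0.1: the small object barely moves the pooled number, which is the size bias the abstract refers to.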
# 事前訓練型レコメンダシステム:因果脱バイアスの観点から Pre-trained Recommender Systems: A Causal Debiasing Perspective ( http://arxiv.org/abs/2310.19251v1 ) ライセンス: Link先を確認	Ziqian Lin, Hao Ding, Nghia Hoang, Branislav Kveton, Anoop Deoras, Hao Wang	(参考訳) 事前学習されたビジョン/言語モデルに関する最近の研究は、AIにおける新しい有望なソリューション構築パラダイムの実践的な利点を実証している。一般的なタスク空間を記述する広いデータに基づいてモデルを事前学習し、トレーニングデータが著しく制限されている場合(例えばゼロまたは少数ショットの学習シナリオ)に、幅広い下流タスクを解決するためにうまく適応できる。このような進展にインスパイアされた本論文では,事前学習モデルの観点からは,このようなパラダイムをレコメンダシステムのコンテキストに適用する可能性や課題について考察する。特に,異なるドメインから抽出された汎用ユーザ・イテムインタラクションデータに基づいて,汎用的なインタラクションパターンを学習することにより,汎用的なインタラクションパターンをキャプチャする汎用レコメンデータを提案する。しかし、セマンティック空間において強い適合性を持つビジョン/言語データとは異なり、異なるドメイン(例えば、異なる国や異なるeコマースプラットフォーム)にまたがるレコメンデーションデータの基礎となる普遍的なパターンは、しばしば、ユーザとアイテムの文化的な違いと、異なるeコマースプラットフォームの使用によって暗黙的に課されるドメイン内およびドメイン横断のバイアスによって引き起こされる。実験で示したように、データ内の不均一なバイアスは、事前学習されたモデルの有効性を阻害する傾向がある。この課題に対処するため,我々は,階層型ベイズ深層学習モデルであるPreRecを用いて,因果脱バイアスの観点を導入し,定式化する。実世界データを用いた実験により,提案モデルが,クロスマーケットシナリオとクロスプラットフォームシナリオの両方において,ゼロ・マイ・ショット学習環境でのレコメンデーション性能を大幅に向上できることを示した。 Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired by such progress, we investigate in this paper the possibilities and challenges of adapting such a paradigm to the context of recommender systems, which is less investigated from the perspective of pre-trained model. In particular, we propose to develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains, which can then be fast adapted to improve few-shot learning performance in unseen new domains (with limited data). 
However, unlike vision/language data which share strong conformity in the semantic space, universal patterns underlying recommendation data collected across different domains (e.g., different countries or different E-commerce platforms) are often occluded by both in-domain and cross-domain biases implicitly imposed by the cultural differences in their user and item bases, as well as their uses of different e-commerce platforms. As shown in our experiments, such heterogeneous biases in the data tend to hinder the effectiveness of the pre-trained model. To address this challenge, we further introduce and formalize a causal debiasing perspective, which is substantiated via a hierarchical Bayesian deep learning model, named PreRec. Our empirical studies on real-world data show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings under both cross-market and cross-platform scenarios.	翻訳日:2023-11-01 21:47:56 公開日:2023-10-30
# 表データ用エンドツーエンド機械学習パイプラインにおける有用性と公平性のための差分プライベート合成データの評価 Assessment of Differentially Private Synthetic Data for Utility and Fairness in End-to-End Machine Learning Pipelines for Tabular Data ( http://arxiv.org/abs/2310.19250v1 ) ライセンス: Link先を確認	Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres and Rafael de Sousa	(参考訳) differentially private (dp) 合成データセットは、個々のデータプロバイダのプライバシーを維持しながらデータを共有するためのソリューションである。エンドツーエンドの機械学習パイプラインでDP合成データを活用することの効果を理解することは、医療や人道的行動といった分野に影響を及ぼす。本研究では,機械学習パイプラインにおいて,合成データが実際の表データを置き換えることができる範囲を調査し,機械学習モデルのトレーニングと評価に最も有効な合成データ生成技術を特定する。そこで本研究では,個人別合成データが下流の分類課題に与える影響について,実用性や公平性の観点から検討する。私たちの分析は包括的であり、主要な2種類の合成データ生成アルゴリズム(マージンベースとganベース)の代表を含んでいる。私たちの知識を最大限に活用するために、私たちの仕事は最初です。 i) 実データが合成データに基づいて訓練された機械学習モデルの実用性と公正性をテストするために利用できると想定しない訓練・評価フレームワークを提案する。 (ii)機械学習モデルのトレーニングに使用する有用性と公平性の観点から、合成データセット生成アルゴリズムの最も広範な分析を行う。 (iii) 公正性のいくつかの異なる定義を含む。本研究は, グラフデータに対するモデルトレーニングユーティリティに関して, GANベースの合成データジェネレータをはるかに上回っていることを示す。実際、限界ベースのアルゴリズムが生成するデータを用いてトレーニングされたモデルは、実データを用いてトレーニングされたモデルと同様の実用性を示すことができる。また,実データを用いて学習したモデルに類似した実用性と公正性を同時に達成できるモデルを,境界モデルによる合成データ生成MWEM PGMで訓練できることも明らかにした。 Differentially private (DP) synthetic data sets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines impacts areas such as health care and humanitarian action, where data is scarce and regulated by restrictive privacy laws. In this work, we investigate the extent to which synthetic data can replace real, tabular data in machine learning pipelines and identify the most effective synthetic data generation techniques for training and evaluating machine learning models. We investigate the impacts of differentially private synthetic data on downstream classification tasks from the point of view of utility as well as fairness. 
Our analysis is comprehensive and includes representatives of the two main types of synthetic data generation algorithms: marginal-based and GAN-based. To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic data set generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness. Our findings demonstrate that marginal-based synthetic data generators surpass GAN-based ones regarding model training utility for tabular data. Indeed, we show that models trained using data generated by marginal-based algorithms can exhibit similar utility to models trained using real data. Our analysis also reveals that the marginal-based synthetic data generator MWEM PGM can train models that simultaneously achieve utility and fairness characteristics close to those obtained by models trained with real data.	翻訳日:2023-11-01 21:47:19 公開日:2023-10-30
# 不確実性誘導境界学習による社会的事象検出 Uncertainty-guided Boundary Learning for Imbalanced Social Event Detection ( http://arxiv.org/abs/2310.19247v1 ) ライセンス: Link先を確認	Jiaqian Ren and Hao Peng and Lei Jiang and Zhiwei Liu and Jia Wu and Zhengtao Yu and Philip S. Yu	(参考訳) 現実世界の社会イベントは通常、厳しい階級不均衡の分布を示し、訓練された検出モデルが深刻な一般化の課題に遭遇する。ほとんどの研究は周波数の観点からこの問題を解決し、テールクラスの表現や分類器学習を強調する。私たちの観察では、クラスのララリティと比較すると、トレーニングの行き届いた深層学習ネットワークから推定された不確かさは、モデルのパフォーマンスをよりよく反映する。この目的のために、不均衡なイベント検出タスクに対して、新しい不確実性誘導型クラス不均衡学習フレームワーク - UCL$_{SED}$とその変種 - UCL-EC$_{SED}$を提案する。モデル一般化をこれらの不確実なクラスに拡張することにより、全体的なモデル性能を向上させることを目指している。性能劣化は、典型的には、誤分類サンプルを隣り合うクラスとして扱うことから来ており、潜時空間における境界学習と高品質不確実性推定による分類器学習に焦点を当てている。まず,不均衡データに対する識別可能な表現分布を操作するために,新しい不確実性誘導型コントラスト学習損失,すなわちuclとその変種であるucl-ecを設計した。訓練中、全てのクラス、特に不確実なクラスは、特徴空間における明確な分離可能な境界を適応的に調整するよう強制する。第二に, より堅牢で正確なクラス不確実性を得るために, 追加校正法の監督のもと, デンプスター・シェーファー理論を通した多視点証拠分類器の結果を組み合わせる。 event2012\_100, events2018\_100, crisislext\_7の3つの深刻な不均衡なソーシャルイベントデータセットについて実験を行った。我々のモデルは、ほとんど全てのクラス、特に不確実なクラスにおいて、社会イベントの表現と分類タスクを大幅に改善する。 Real-world social events typically exhibit a severe class-imbalance distribution, which makes the trained detection model encounter a serious generalization challenge. Most studies solve this problem from the frequency perspective and emphasize the representation or classifier learning for tail classes. While in our observation, compared to the rarity of classes, the calibrated uncertainty estimated from well-trained evidential deep learning networks better reflects model performance. To this end, we propose a novel uncertainty-guided class imbalance learning framework - UCL$_{SED}$, and its variant - UCL-EC$_{SED}$, for imbalanced social event detection tasks. We aim to improve the overall model performance by enhancing model generalization to those uncertain classes. 
Considering performance degradation usually comes from misclassifying samples as their confusing neighboring classes, we focus on boundary learning in latent space and classifier learning with high-quality uncertainty estimation. First, we design a novel uncertainty-guided contrastive learning loss, namely UCL and its variant - UCL-EC, to manipulate distinguishable representation distribution for imbalanced data. During training, they force all classes, especially uncertain ones, to adaptively adjust a clear separable boundary in the feature space. Second, to obtain more robust and accurate class uncertainty, we combine the results of multi-view evidential classifiers via the Dempster-Shafer theory under the supervision of an additional calibration method. We conduct experiments on three severely imbalanced social event datasets including Events2012\_100, Events2018\_100, and CrisisLexT\_7. Our model significantly improves social event representation and classification tasks in almost all classes, especially those uncertain ones.	翻訳日:2023-11-01 21:46:56 公開日:2023-10-30
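The abstract above combines multi-view evidential classifiers through Dempster-Shafer theory. The Python sketch below shows one commonly used form of that idea: each view's non-negative evidence is turned into per-class belief masses plus an uncertainty mass, and two such opinions are fused with a reduced Dempster combination rule. Whether UCL$_{SED}$ uses exactly this rule is an assumption, as are the function names and the toy evidence values; the calibration step mentioned in the abstract is not modelled.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (beliefs, uncertainty).

    Assumes Dirichlet parameters alpha_k = e_k + 1, so the masses sum to one.
    """
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty


def combine(opinion_a, opinion_b):
    """Fuse two opinions with a reduced Dempster combination rule."""
    beliefs_a, u_a = opinion_a
    beliefs_b, u_b = opinion_b
    # Conflict: mass the two views assign to different classes.
    conflict = sum(
        ba * bb
        for i, ba in enumerate(beliefs_a)
        for j, bb in enumerate(beliefs_b)
        if i != j
    )
    scale = 1.0 - conflict
    if scale <= 0.0:
        raise ValueError("the two opinions are in total conflict")
    beliefs = [
        (ba * bb + ba * u_b + bb * u_a) / scale
        for ba, bb in zip(beliefs_a, beliefs_b)
    ]
    uncertainty = u_a * u_b / scale
    return beliefs, uncertainty


# Hypothetical evidence from two views over three event classes.
view_a = opinion_from_evidence([8.0, 1.0, 1.0])
view_b = opinion_from_evidence([4.0, 0.0, 2.0])
fused_beliefs, fused_uncertainty = combine(view_a, view_b)

print(fused_beliefs, fused_uncertainty)
```

In this example both views favour the first class, and the fused uncertainty is lower than either view's own uncertainty; the fused masses still sum to one.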
# 単一チャネル用潜在変数モデルのためのスペクトル正則化フレームワーク A spectral regularisation framework for latent variable models designed for single channel applications ( http://arxiv.org/abs/2310.19246v1 ) ライセンス: Link先を確認	Ryan Balshaw, P. Stephan Heyns, Daniel N. Wilke, Stephan Schmidt	(参考訳) 潜在変数モデル(LVM)は一般的に、観測データ内の基盤となる依存関係、パターン、隠れた構造を捉えるために使用される。ソース複製は、単一チャネルのLVMアプリケーションに共通するデータのハンケル化前処理ステップの副産物であり、実用的なLVM利用を妨げる。本稿では,spectrally-regularised-LVMsというPythonパッケージを紹介する。提案パッケージは、新しいスペクトル正則化項の追加により、ソース複製問題に対処する。このパッケージは、単一チャネルのLVMアプリケーションでスペクトル正則化を行うためのフレームワークを提供するため、スペクトル正則化を用いたLVMの調査と利用が容易になる。これは、候補となるLVM目的関数の記号的あるいは明示的な表現を、LVMパラメータ推定プロセス中にスペクトル正則化を使用するフレームワークに組み込むことによって達成される。このパッケージの目的は、スペクトル正則化を組み込み、単一チャネルの時系列アプリケーションに対応する一貫した線形LVM最適化フレームワークを提供することである。 Latent variable models (LVMs) are commonly used to capture the underlying dependencies, patterns, and hidden structure in observed data. Source duplication is a by-product of the data hankelisation pre-processing step common to single channel LVM applications, which hinders practical LVM utilisation. In this article, a Python package titled spectrally-regularised-LVMs is presented. The proposed package addresses the source duplication issue via the addition of a novel spectral regularisation term. This package provides a framework for spectral regularisation in single channel LVM applications, thereby making it easier to investigate and utilise LVMs with spectral regularisation. This is achieved via the use of symbolic or explicit representations of potential LVM objective functions which are incorporated into a framework that uses spectral regularisation during the LVM parameter estimation process. The objective of this package is to provide a consistent linear LVM optimisation framework which incorporates spectral regularisation and caters to single channel time-series applications.	翻訳日:2023-11-01 21:46:26 公開日:2023-10-30
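The "data hankelisation" step named in the abstract above can be sketched in a few lines of Python. A single-channel signal is cut into overlapping windows that are stacked as rows, so that entry (i, j) of the matrix equals sample i + j of the signal. The function name `hankelise` and the window length are assumptions for illustration; this is not the API of the spectrally-regularised-LVMs package.

```python
def hankelise(signal, window):
    """Stack overlapping windows of a 1-D signal as the rows of a Hankel matrix.

    Row i holds signal[i : i + window], so entry (i, j) equals signal[i + j].
    """
    if not 1 <= window <= len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    return [list(signal[i:i + window]) for i in range(len(signal) - window + 1)]


# A short single-channel signal and its hankelised form.
signal = [1, 2, 3, 4, 5]
matrix = hankelise(signal, window=3)

for row in matrix:
    print(row)
```

Each row is a one-sample shift of the row above it, so the rows overlap heavily. That redundancy is a plausible reading of why an LVM fitted to such a matrix can return near-duplicate sources, although the abstract itself does not spell out the mechanism.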
# FetusMapV2:3次元超音波による胎児電位推定の強化 FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound ( http://arxiv.org/abs/2310.19293v1 ) ライセンス: Link先を確認	Chaoyu Chen, Xin Yang, Yuhao Huang, Wenlong Shi, Yan Cao, Mingyuan Luo, Xindi Hu, Lei Zhue, Lequan Yu, Kejuan Yue, Yuanji Zhang, Yi Xiong, Dong Ni, Weijun Huang	(参考訳) 3次元超音波(us)における胎児のポーズ推定は、関連する胎児解剖学的ランドマークのセットを同定することを含む。その主な目的は、胎児に関する包括的情報をランドマーク接続を通して提供し、生体計測、平面の局在化、胎児の動き監視といった様々な重要な応用に役立てることである。しかし、3Dの胎児のポーズを正確に推定するには、画像品質の低下、高次元データを扱うための限られたGPUメモリ、対称的または曖昧な解剖学的構造、胎児のポーズのかなりのバリエーションなど、いくつかの課題がある。本研究では,上記の課題を克服するための新しい3次元胎児ポーズ推定フレームワーク(fetusmapv2)を提案する。私たちの貢献は3倍です。まず,gpuメモリが制限された場合,入力画像の解像度を向上し,より良好な結果が得られるような,相補的なネットワーク構造とアクティベーションのないgpuメモリ管理手法を検討するヒューリスティックスキームを提案する。第2に、対称構造と類似の解剖構造による混乱を軽減するために、新しいペアロスを設計する。隠れた分類タスクをランドマークのローカライゼーションタスクから切り離し、モデル学習を徐々に簡単にする。最後に, 比較的安定したランドマークを選択し, 自己教師付き学習方式を提案する。大規模胎児usデータセットにおける広範囲な実験と多種多様な応用により,1巻あたり22のランドマークを含む1000のボリュームが,他の強力な競合相手よりも優れていることが証明された。 Fetal pose estimation in 3D ultrasound (US) involves identifying a set of associated fetal anatomical landmarks. Its primary objective is to provide comprehensive information about the fetus through landmark connections, thus benefiting various critical applications, such as biometric measurements, plane localization, and fetal movement monitoring. However, accurately estimating the 3D fetal pose in US volume has several challenges, including poor image quality, limited GPU memory for tackling high dimensional data, symmetrical or ambiguous anatomical structures, and considerable variations in fetal poses. In this study, we propose a novel 3D fetal pose estimation framework (called FetusMapV2) to overcome the above challenges. Our contribution is three-fold. First, we propose a heuristic scheme that explores the complementary network structure-unconstrained and activation-unreserved GPU memory management approaches, which can enlarge the input image resolution for better results under limited GPU memory. 
Second, we design a novel Pair Loss to mitigate confusion caused by symmetrical and similar anatomical structures. It separates the hidden classification task from the landmark localization task and thus progressively eases model learning. Last, we propose a shape priors-based self-supervised learning by selecting the relatively stable landmarks to refine the pose online. Extensive experiments and diverse applications on a large-scale fetal US dataset including 1000 volumes with 22 landmarks per volume demonstrate that our method outperforms other strong competitors.	翻訳日:2023-11-01 21:38:05 公開日:2023-10-30
# 時間知覚質問応答のための変換器への時間グラフの融合 Fusing Temporal Graphs into Transformers for Time-Sensitive Question Answering ( http://arxiv.org/abs/2310.19292v1 ) ライセンス: Link先を確認	Xin Su, Phillip Howard, Nagib Hakim, Steven Bethard	(参考訳) 長い文書から時間に敏感な質問に答えるには、質問や文書の時間的推論が必要である。重要な疑問は、大きな言語モデルが提供されたテキスト文書のみを使用してそのような推論を実行できるのか、それとも他のシステムから抽出された追加の時間情報から恩恵を受けられるのかである。本研究では、既存の時間情報抽出システムを用いて、質問や文書における事象、時間、時間関係の時間グラフを構築する。次に、これらのグラフをTransformerモデルに融合するための様々なアプローチを検討する。実験結果から,入力テキストに時間グラフを融合する手法は,微調整の有無にかかわらずトランスフォーマーモデルの時間的推論能力を大幅に向上させることが示された。さらに,提案手法はグラフ畳み込みに基づくアプローチよりも優れており,SituatedQAとTimeQAの3つの分割による新しい最先端性能を確立している。 Answering time-sensitive questions from long documents requires temporal reasoning over the times in questions and documents. An important open question is whether large language models can perform such reasoning solely using a provided text document, or whether they can benefit from additional temporal information extracted using other systems. We address this research question by applying existing temporal information extraction systems to construct temporal graphs of events, times, and temporal relations in questions and documents. We then investigate different approaches for fusing these graphs into Transformer models. Experimental results show that our proposed approach for fusing temporal graphs into input text substantially enhances the temporal reasoning capabilities of Transformer models with or without fine-tuning. Additionally, our proposed method outperforms various graph convolution-based approaches and establishes a new state-of-the-art performance on SituatedQA and three splits of TimeQA.	翻訳日:2023-11-01 21:37:38 公開日:2023-10-30
# AMLNet:非回帰型マルチ水平時系列予測のための対向的相互学習ニューラルネットワーク AMLNet: Adversarial Mutual Learning Neural Network for Non-AutoRegressive Multi-Horizon Time Series Forecasting ( http://arxiv.org/abs/2310.19289v1 ) ライセンス: Link先を確認	Yang Lin	(参考訳) 多様な領域で重要なマルチホライゾン時系列予測は、高い精度とスピードを要求する。 AutoRegressive(AR)モデルは短期的な予測では優れているが、地平線が広がるにつれて速度とエラーの問題に悩まされる。非自動回帰(NAR)モデルは長期的な予測に適合するが、相互依存に苦慮し、非現実的な結果をもたらす。我々は、オンライン知識蒸留(KD)アプローチにより現実的な予測を実現する革新的なNARモデルであるAMLNetを紹介する。 AMLNetは、深いARデコーダと深いNARデコーダを協調的に訓練し、より浅いNARデコーダに知識を与えるアンサンブル教師として機能することで、ARモデルとNARモデルの長所を活用する。この知識伝達は2つの重要なメカニズムによって促進される。 1) 結果駆動型KDは教師モデルからのKD損失の寄与を動的に重み付けし、浅いNARデコーダがアンサンブルの多様性を組み込むことを可能にする。 2) ヒント駆動型KDは, モデルに隠された状態から有意な洞察を抽出し, 蒸留する。大規模な実験では、従来のARやNARモデルよりもAMLNetの方が優れていることが示され、精度を高め、計算を高速化するマルチホライゾン時系列予測のための有望な道を示す。 Multi-horizon time series forecasting, crucial across diverse domains, demands high accuracy and speed. While AutoRegressive (AR) models excel in short-term predictions, they suffer speed and error issues as the horizon extends. Non-AutoRegressive (NAR) models suit long-term predictions but struggle with interdependence, yielding unrealistic results. We introduce AMLNet, an innovative NAR model that achieves realistic forecasts through an online Knowledge Distillation (KD) approach. AMLNet harnesses the strengths of both AR and NAR models by training a deep AR decoder and a deep NAR decoder in a collaborative manner, serving as ensemble teachers that impart knowledge to a shallower NAR decoder. This knowledge transfer is facilitated through two key mechanisms: 1) outcome-driven KD, which dynamically weights the contribution of KD losses from the teacher models, enabling the shallow NAR decoder to incorporate the ensemble's diversity; and 2) hint-driven KD, which employs adversarial training to extract valuable insights from the model's hidden states for distillation. 
Extensive experimentation showcases AMLNet's superiority over conventional AR and NAR models, thereby presenting a promising avenue for multi-horizon time series forecasting that enhances accuracy and expedites computation.	翻訳日:2023-11-01 21:37:23 公開日:2023-10-30
# EDiffSR: リモートセンシング画像超解像のための効率的な拡散確率モデル EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution ( http://arxiv.org/abs/2310.19288v1 ) ライセンス: Link先を確認	Yi Xiao, Qiangqiang Yuan, Kui Jiang, Jiang He, Xianyu Jin, and Liangpei Zhang	(参考訳) 近年,畳み込みネットワークは,MSE損失などの回帰目標を最小化することで,リモートセンシング画像超解像(SR)において顕著な発展を遂げている。しかし、印象的な性能は達成したものの、これらの手法は過度に平滑化された結果による視覚品質の低下に苦しむことが多い。敵対的生成ネットワークは複雑な詳細を推測できる可能性があるが、容易に崩壊し、望ましくないアーティファクトをもたらす。これらの問題を緩和するために,本稿では,効率的なリモートセンシング画像SRのための拡散確率モデル(DPM)を初めて導入し,EDiffSRと呼ぶ。 EDiffSRは訓練が容易で、知覚的に好ましい画像を生成するというDPMの利点を維持している。具体的には,ノイズ予測に重いUNetを使用する従来の手法と異なり,簡略化したチャネル注意と単純なゲート操作によって優れたノイズ予測性能を達成する効率的な活性化ネットワーク (EANet) を開発し,計算予算を劇的に削減する。さらに,提案するEDiffSRにより価値の高い事前知識を導入するために,より充実した条件の抽出を助ける実用的な条件付き事前情報強化モジュール (CPEM) を開発した。 LR画像の拡大により条件を直接生成するほとんどのDPMベースのSRモデルとは異なり、提案したCPEMは正確なSRのためにより情報量の多い手がかりを保持するのに役立つ。 4つのリモートセンシングデータセットでの広範な実験により、EDiffSRは、シミュレーション画像と実世界のリモートセンシング画像の両方で、定量的にも定性的にも視覚的に好ましい画像を復元できることを示した。 EDiffSRのコードはhttps://github.com/XY-boy/EDiffSRで公開予定である。 Recently, convolutional networks have achieved remarkable development in remote sensing image Super-Resolution (SR) by minimizing the regression objectives, e.g., MSE loss. However, despite achieving impressive performance, these methods often suffer from poor visual quality with over-smooth issues. Generative adversarial networks have the potential to infer intricate details, but they are easy to collapse, resulting in undesirable artifacts. To mitigate these issues, in this paper, we first introduce Diffusion Probabilistic Model (DPM) for efficient remote sensing image SR, dubbed EDiffSR. EDiffSR is easy to train and maintains the merits of DPM in generating perceptually pleasant images. 
Specifically, different from previous works using heavy UNet for noise prediction, we develop an Efficient Activation Network (EANet) to achieve favorable noise prediction performance by simplified channel attention and simple gate operation, which dramatically reduces the computational budget. Moreover, to introduce more valuable prior knowledge into the proposed EDiffSR, a practical Conditional Prior Enhancement Module (CPEM) is developed to help extract an enriched condition. Unlike most DPM-based SR models that directly generate conditions by amplifying LR images, the proposed CPEM helps to retain more informative cues for accurate SR. Extensive experiments on four remote sensing datasets demonstrate that EDiffSR can restore visual-pleasant images on simulated and real-world remote sensing images, both quantitatively and qualitatively. The code of EDiffSR will be available at https://github.com/XY-boy/EDiffSR	翻訳日:2023-11-01 21:36:58 公開日:2023-10-30
# ブロックチェーンによる半分散フェデレーション学習のスケーラビリティと信頼性向上--信頼ペナリゼーションと非同期機能 Enhancing Scalability and Reliability in Semi-Decentralized Federated Learning With Blockchain: Trust Penalization and Asynchronous Functionality ( http://arxiv.org/abs/2310.19287v1 ) ライセンス: Link先を確認	Ajay Kumar Shrestha, Faijan Ahamad Khan, Mohammed Afaan Shaikh, Amir Jaberzadeh and Jason Geng	(参考訳) ブロックチェーン技術の統合を活用することにより,分散フェデレート学習におけるスケーラビリティと信頼性の課題に対処する,革新的なアプローチを提案する。本稿では,信頼ペナリゼーション機構による参加ノードの信頼性向上と,効率的かつロバストなモデル更新のための非同期機能の実現に着目する。半分散型フェデレートラーニングとブロックチェーン(SDFL-B)を組み合わせることで、データのプライバシーを損なうことなく、公正でセキュアで透明な機械学習環境の構築を目指している。本研究は,スケーラブルで信頼性の高いsdfl-bシステムを育成する上で,このアプローチの利点を示す総合的なシステムアーキテクチャ,方法論,実験結果,議論を提案する。 The paper presents an innovative approach to address the challenges of scalability and reliability in Distributed Federated Learning by leveraging the integration of blockchain technology. The paper focuses on enhancing the trustworthiness of participating nodes through a trust penalization mechanism while also enabling asynchronous functionality for efficient and robust model updates. By combining Semi-Decentralized Federated Learning with Blockchain (SDFL-B), the proposed system aims to create a fair, secure and transparent environment for collaborative machine learning without compromising data privacy. The research presents a comprehensive system architecture, methodologies, experimental results, and discussions that demonstrate the advantages of this novel approach in fostering scalable and reliable SDFL-B systems.	翻訳日:2023-11-01 21:36:31 公開日:2023-10-30
# 単体複体上のランダムウォークによるグラフニューラルネットワークの強化 Facilitating Graph Neural Networks with Random Walk on Simplicial Complexes ( http://arxiv.org/abs/2310.19285v1 ) ライセンス: Link先を確認	Cai Zhou and Xiyuan Wang and Muhan Zhang	(参考訳) ノードレベルのランダムウォークは、グラフニューラルネットワークの改善に広く使用されている。しかし、エッジ上のランダムウォークや、より一般に$k$-単体上のランダムウォークにはあまり注意が払われていない。本稿では,単体複体 (SC) の異なる次数の単体上のランダムウォークが,GNNの理論的表現力をいかに高めるかを系統的に分析する。第一に、$0$-単体すなわちノードレベルにおいて、ランダムウォークを橋渡しとして、既存の位置符号化 (PE) と構造符号化 (SE) の手法の間の関係を確立する。第二に、$1$-単体すなわちエッジレベルにおいて、エッジレベルのランダムウォークとHodge $1$-ラプラシアンを結び付け、それぞれに対応するエッジPEを設計する。空間領域では、エッジレベルのランダムウォークを直接利用してEdgeRWSEを構築する。 Hodge $1$-ラプラシアンのスペクトル解析に基づいて、置換同変で表現力の高いエッジレベルの位置符号化である Hodge1Lap を提案する。第三に,本理論を高次単体上のランダムウォークに一般化し,ランダムウォークとHodgeラプラシアンに基づいて単体上のPEを設計する一般原理を提案する。幅広い単体ネットワークを統一するために、レベル間ランダムウォークも導入する。ランダムウォークに基づく提案手法の有効性を広範な実験で検証した。 Node-level random walk has been widely used to improve Graph Neural Networks. However, there is limited attention to random walk on edge and, more generally, on $k$-simplices. This paper systematically analyzes how random walk on different orders of simplicial complexes (SC) facilitates GNNs in their theoretical expressivity. First, on $0$-simplices or node level, we establish a connection between existing positional encoding (PE) and structure encoding (SE) methods through the bridge of random walk. Second, on $1$-simplices or edge level, we bridge edge-level random walk and Hodge $1$-Laplacians and design corresponding edge PE respectively. In the spatial domain, we directly make use of edge level random walk to construct EdgeRWSE. Based on the spectral analysis of Hodge $1$-Laplacians, we propose Hodge1Lap, a permutation equivariant and expressive edge-level positional encoding. Third, we generalize our theory to random walk on higher-order simplices and propose the general principle to design PE on simplices based on random walk and Hodge Laplacians. Inter-level random walk is also introduced to unify a wide range of simplicial networks. 
Extensive experiments verify the effectiveness of our random walk-based methods.	翻訳日:2023-11-01 21:36:14 公開日:2023-10-30
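The node-level case that the abstract above starts from can be made concrete with a small Python sketch of a random-walk structural encoding: for each node, the probability that a simple random walk returns to it after 1, 2, ..., k steps, read off the diagonal of powers of the transition matrix. This is a sketch of the well-known node-level encoding only; it is not the paper's EdgeRWSE or Hodge1Lap, and the function names and toy graphs are assumptions for illustration.

```python
def transition_matrix(adj):
    """Row-normalise an adjacency matrix: P[i][j] = A[i][j] / degree(i)."""
    matrix = []
    for row in adj:
        degree = sum(row)
        if degree == 0:
            raise ValueError("isolated nodes have no random-walk transition")
        matrix.append([value / degree for value in row])
    return matrix


def matmul(a, b):
    """Plain dense matrix product for small lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def return_probabilities(adj, steps):
    """For every node, the return probabilities after 1..steps random-walk steps."""
    p = transition_matrix(adj)
    power = p
    encoding = [[] for _ in adj]
    for _ in range(steps):
        for node in range(len(adj)):
            encoding[node].append(power[node][node])
        power = matmul(power, p)
    return encoding


# Two toy graphs on three nodes: a triangle and a path 0 - 1 - 2.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

triangle_rwse = return_probabilities(triangle, steps=3)
path_rwse = return_probabilities(path, steps=3)

print(triangle_rwse[0])
print(path_rwse[0], path_rwse[1])
```

On the triangle a walk can return after three steps, while on the path every odd-step return probability is zero, so the encoding separates nodes that sit on a cycle from nodes that do not. The end and middle nodes of the path also receive different vectors.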
# rTsfNet: マルチヘッド3次元回転と時系列特徴抽出によるIMUベース人間活動認識のためのDNNモデル rTsfNet: a DNN model with Multi-head 3D Rotation and Time Series Feature Extraction for IMU-based Human Activity Recognition ( http://arxiv.org/abs/2310.19283v1 ) ライセンス: Link先を確認	Yu Enokibori	(参考訳) 本稿では,マルチヘッド3次元回転と時系列特徴抽出を備えたDNNモデルであるrTsfNetを,IMUに基づく人間活動認識(HAR)のための新しいDNNモデルとして提案する。 rTsfNetは、DNN内で3次元回転パラメータを導出することで、特徴を導出すべき3次元基底を自動的に選択する。次に、多くの研究者の知恵である時系列特徴(TSF)を導出し、MLPを用いてHARを実現する。 CNNを使用しないモデルでありながら、よく管理されたベンチマーク条件の下で、対象とする活動が異なる複数のデータセット(UCI HAR、PAMAP2、Daphnet、OPPORTUNITY)において既存のモデルよりも高い精度を達成した。 This paper proposes rTsfNet, a DNN model with Multi-head 3D Rotation and Time Series Feature Extraction, as a new DNN model for IMU-based human activity recognition (HAR). rTsfNet automatically selects 3D bases from which features should be derived by deriving 3D rotation parameters within the DNN. Then, time series features (TSFs), the wisdom of many researchers, are derived and realize HAR using MLP. Although it does not use a CNN, it achieved higher accuracy than existing models under well-managed benchmark conditions and multiple datasets: UCI HAR, PAMAP2, Daphnet, and OPPORTUNITY, which target different activities.	翻訳日:2023-11-01 21:35:52 公開日:2023-10-30
# トーリックカラビ・ヤウ3次元多様体の最小体積公式のための機械学習正規化 Machine Learning Regularization for the Minimum Volume Formula of Toric Calabi-Yau 3-folds ( http://arxiv.org/abs/2310.19276v1 ) ライセンス: Link先を確認	Eugene Choi, Rak-Kyeong Seong	(参考訳) 佐々木・アインシュタイン5次元多様体の最小体積に対する明示的な公式の集合を示す。これらの5次元多様体上の円錐はトーリック・カラビ・ヤウ3次元多様体である。これらのトーリックカラビ・ヤウ3次元多様体は、4d n=1 超対称ゲージ理論の無限類と関連付けられ、トーリックカラビ・ヤウ3次元多様体を推定するd3ブレーンの世界体積理論として実現される。 AdS/CFT対応の下では、佐々木・アインシュタイン基底の最小体積は対応する4d N=1超等角体理論の中心電荷に逆比例する。最小体積の公式は、トーリック・カラビ・ヤウ3次元多様体の幾何学的不変量の観点から表される。これらの明確な結果は、最小体積を決定する機械学習の以前の応用を超えて進歩する機械学習正規化技術を実装することで導かれる。さらに、機械学習正規化を用いることで、最小体積に対して解釈可能かつ説明可能な式を提示できる。我々の研究は、広範なトーリック・カラビ・ヤウ3次元多様体の集合であっても、最小体積を顕著な精度で近似することを確認する。 We present a collection of explicit formulas for the minimum volume of Sasaki-Einstein 5-manifolds. The cone over these 5-manifolds is a toric Calabi-Yau 3-fold. These toric Calabi-Yau 3-folds are associated with an infinite class of 4d N=1 supersymmetric gauge theories, which are realized as worldvolume theories of D3-branes probing the toric Calabi-Yau 3-folds. Under the AdS/CFT correspondence, the minimum volume of the Sasaki-Einstein base is inversely proportional to the central charge of the corresponding 4d N=1 superconformal field theories. The presented formulas for the minimum volume are in terms of geometric invariants of the toric Calabi-Yau 3-folds. These explicit results are derived by implementing machine learning regularization techniques that advance beyond previous applications of machine learning for determining the minimum volume. Moreover, the use of machine learning regularization allows us to present interpretable and explainable formulas for the minimum volume. Our work confirms that, even for extensive sets of toric Calabi-Yau 3-folds, the proposed formulas approximate the minimum volume with remarkable accuracy.	翻訳日:2023-11-01 21:35:36 公開日:2023-10-30
# グラフニューラルネットワークによる岩石の有効弾性率の予測 Prediction of Effective Elastic Moduli of Rocks using Graph Neural Networks ( http://arxiv.org/abs/2310.19274v1 ) ライセンス: Link先を確認	Jaehong Chung, Rasool Ahmad, WaiChing Sun, Wei Cai, Tapan Mukerji	(参考訳) 本研究では,デジタルCTスキャン画像から岩石の効率的な弾性変調を予測するためのグラフニューラルネットワーク(GNN)に基づくアプローチを提案する。マッパーアルゴリズムを用いて3dデジタル岩盤画像をグラフデータセットに変換し,本質的な幾何学的情報をカプセル化する。これらのグラフは、訓練後、弾性率を予測するのに有効である。 gnnモデルでは,様々なサブキューブ次元から導出される様々なグラフサイズにわたるロバストな予測能力を示す。テストデータセットでうまく機能するだけでなく、見えない岩や探索されていないサブキューブサイズの予測精度も高い。畳み込みニューラルネットワーク (CNN) との比較解析により, 未知の岩石特性の予測において, GNNの優れた性能が示された。さらに、微細構造のグラフ表現は、gpuメモリ要求(cnnのグリッド表現と比較)を大幅に削減し、バッチサイズ選択の柔軟性を高める。本研究は, 岩盤特性の予測精度を高め, ディジタル岩盤解析の効率化におけるGNNモデルの可能性を示す。 This study presents a Graph Neural Networks (GNNs)-based approach for predicting the effective elastic moduli of rocks from their digital CT-scan images. We use the Mapper algorithm to transform 3D digital rock images into graph datasets, encapsulating essential geometrical information. These graphs, after training, prove effective in predicting elastic moduli. Our GNN model shows robust predictive capabilities across various graph sizes derived from various subcube dimensions. Not only does it perform well on the test dataset, but it also maintains high prediction accuracy for unseen rocks and unexplored subcube sizes. Comparative analysis with Convolutional Neural Networks (CNNs) reveals the superior performance of GNNs in predicting unseen rock properties. Moreover, the graph representation of microstructures significantly reduces GPU memory requirements (compared to the grid representation for CNNs), enabling greater flexibility in the batch size selection. This work demonstrates the potential of GNN models in enhancing the prediction accuracy of rock properties and boosting the efficiency of digital rock analysis.	翻訳日:2023-11-01 21:35:11 公開日:2023-10-30
# メモリ摂動方程式:データに対するモデルの感度を理解する The Memory Perturbation Equation: Understanding Model's Sensitivity to Data ( http://arxiv.org/abs/2310.19273v1 ) ライセンス: Link先を確認	Peter Nickl, Lu Xu, Dharmesh Tailor, Thomas M\"ollenhoff, Mohammad Emtiyaz Khan	(参考訳) モデルのトレーニングデータに対する感度を理解することは重要であるが、特にトレーニング中は困難でコストもかかる。このような問題を単純化するために,モデルの摂動に対する感度をトレーニングデータに関連付けるメモリ・摂動方程式(MPE)を提案する。ベイズ原理を用いて導かれた MPE は、既存の感度測定を統一し、モデルやアルゴリズムの多種多様に一般化し、感度に関する有用な特性を明らかにする。実験の結果, 訓練中に得られた感度推定は, テストデータの一般化を忠実に予測できることがわかった。提案方程式は,ロバスト・適応学習の今後の研究に有用であると考えられる。 Understanding model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE) which relates model's sensitivity to perturbation in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide-variety of models and algorithms, and unravels useful properties regarding sensitivities. Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. The proposed equation is expected to be useful for future research on robust and adaptive learning.	翻訳日:2023-11-01 21:34:43 公開日:2023-10-30
# NPCL:不確かさを意識した連続学習のためのニューラルプロセス NPCL: Neural Processes for Uncertainty-Aware Continual Learning ( http://arxiv.org/abs/2310.19272v1 ) ライセンス: Link先を確認	Saurav Jha and Dong Gong and He Zhao and Lina Yao	(参考訳) 連続学習(CL)は、新しいタスクによる忘れを制限しながら、ストリーミングデータ上でディープニューラルネットワークを効率的にトレーニングすることを目的としている。しかし、タスク間の干渉が少なくて伝達可能な知識を学習することは困難であり、予測の不確実性を測定することができないため、実世界のCLモデルの展開は制限される。これらの問題に対処するため,我々は,様々なタスクを関数上の確率分布にエンコードし,信頼性の高い不確実性推定を提供するメタリーナーのクラスであるneural process (nps) を用いたclタスクの処理を提案する。具体的には,タスク固有のモジュールを階層的潜在変数モデルに配置したNP-based CL approach (NPCL)を提案する。学習された潜在分布の正規化子を調整し、忘れを緩和する。 NPCLの不確実性推定機能は、CLのタスクヘッド/モジュール推論問題に対処するためにも使用できる。実験の結果,NPCLは従来のCLアプローチよりも優れていた。 NPCLにおける不確実性推定の有効性を検証し、新しいデータを特定し、インスタンスレベルのモデルの信頼性を評価する。コードは \url{https://github.com/srvCodes/NPCL} で入手できる。 Continual learning (CL) aims to train deep neural networks efficiently on streaming data while limiting the forgetting caused by new tasks. However, learning transferable knowledge with less interference between tasks is difficult, and real-world deployment of CL models is limited by their inability to measure predictive uncertainties. To address these issues, we propose handling CL tasks with neural processes (NPs), a class of meta-learners that encode different tasks into probabilistic distributions over functions all while providing reliable uncertainty estimates. Specifically, we propose an NP-based CL approach (NPCL) with task-specific modules arranged in a hierarchical latent variable model. We tailor regularizers on the learned latent distributions to alleviate forgetting. The uncertainty estimation capabilities of the NPCL can also be used to handle the task head/module inference challenge in CL. Our experiments show that the NPCL outperforms previous CL approaches. We validate the effectiveness of uncertainty estimation in the NPCL for identifying novel data and evaluating instance-level model confidence. Code is available at \url{https://github.com/srvCodes/NPCL}.	翻訳日:2023-11-01 21:34:21 公開日:2023-10-30
# 勤勉トロールの愛への学習--対話安全タスクにおけるレーダ効果の会計 Learning to love diligent trolls: Accounting for rater effects in the dialogue safety task ( http://arxiv.org/abs/2310.19271v1 ) ライセンス: Link先を確認	Michael John Ilagan	(参考訳) チャットボットは攻撃的な発話を発生させるリスクがあり、避けなければならない。デプロイ後、チャットボットを継続的に改善する方法の1つは、ライブユーザからのフィードバックから発話/ラベルペアをソースすることだ。しかし、ユーザの中には、間違ったラベルでトレーニング例を提供するトロールがいる。ロールオフトレーニングデータには、ユーザ集約クロスバリデーション(cv)エラーの高いトレーニング例が削除されている。しかし、CVは高価であり、協調攻撃においては、CVはトロルの数と一貫性に圧倒される可能性がある。本研究は,自動エッセイ評価(AES)における方法論にインスパイアされたソリューションを提案することにより,両方の制約に対処する。 GPU計算を必要としないため、LCAは安価である。実験では, トロルが多数である場合でも, AESライクなソリューションは, トロルが一貫した場合には高い精度でトレーニングラベルを推測できることがわかった。 Chatbots have the risk of generating offensive utterances, which must be avoided. Post-deployment, one way for a chatbot to continuously improve is to source utterance/label pairs from feedback by live users. However, among users are trolls, who provide training examples with incorrect labels. To de-troll training data, previous work removed training examples that have high user-aggregated cross-validation (CV) error. However, CV is expensive; and in a coordinated attack, CV may be overwhelmed by trolls in number and in consistency among themselves. In the present work, I address both limitations by proposing a solution inspired by methodology in automated essay scoring (AES): have multiple users rate each utterance, then perform latent class analysis (LCA) to infer correct labels. As it does not require GPU computations, LCA is inexpensive. In experiments, I found that the AES-like solution can infer training labels with high accuracy when trolls are consistent, even when trolls are the majority.	翻訳日:2023-11-01 21:33:51 公開日:2023-10-30
# リーマン対称空間上の不変核:調和解析的アプローチ Invariant kernels on Riemannian symmetric spaces: a harmonic-analytic approach ( http://arxiv.org/abs/2310.19270v1 ) ライセンス: Link先を確認	Nathael Da Costa, Cyrus Mostajeran, Juan-Pablo Ortega, Salem Said	(参考訳) この研究は、古典ガウス核が非ユークリッド対称空間上で定義されるとき、パラメータの選択に対して正定でないことを証明することを目的としている。この目的を達成するために,新しい幾何学的および解析的議論を考案した。これらはガウス核の正定値の厳密な特徴づけであり、これは完備だが、数値計算によって扱われる低次元のシナリオは限られている。これらの結果の中心となるのは $L^p$-Godement の定理(ここで $p = 1,2$)であり、非コンパクト型の対称空間上で定義されるカーネルが正定値となるための検証可能な必要十分条件を提供する。ボークナー・ゴッジメントの定理(Bochner-Godement theorem)と呼ばれる有名な定理は、既にそのような条件を与えており、その範囲でははるかに一般的であるが、特に適用が難しい。ガウス核との接続を超えて、この研究の新しい結果は対称空間上の不変核の研究のための青写真を書き、将来の多くの応用を示唆する特定の調和解析ツールをもたらす。 This work aims to prove that the classical Gaussian kernel, when defined on a non-Euclidean symmetric space, is never positive-definite for any choice of parameter. To achieve this goal, the paper develops new geometric and analytical arguments. These provide a rigorous characterization of the positive-definiteness of the Gaussian kernel, which is complete but for a limited number of scenarios in low dimensions that are treated by numerical computations. Chief among these results are the $L^p$-Godement theorems (where $p = 1,2$), which provide verifiable necessary and sufficient conditions for a kernel defined on a symmetric space of non-compact type to be positive-definite. A celebrated theorem, sometimes called the Bochner-Godement theorem, already gives such conditions and is far more general in its scope, but is especially hard to apply. Beyond the connection with the Gaussian kernel, the new results in this work lay out a blueprint for the study of invariant kernels on symmetric spaces, bringing forth specific harmonic analysis tools that suggest many future applications.	翻訳日:2023-11-01 21:33:10 公開日:2023-10-30
# Redditのナラティブにおける道徳判断:社会常識と言語信号による道徳的火花の調査 Moral Judgments in Narratives on Reddit: Investigating Moral Sparks via Social Commonsense and Linguistic Signals ( http://arxiv.org/abs/2310.19268v1 ) ライセンス: Link先を確認	Ruijie Xi, Munindar P. Singh	(参考訳) オンラインのソーシャルインタラクションの現実性が高まる中、ソーシャルメディアは実生活のモラルシナリオを評価する前例のない手段を提供する。著者やコメンテーターが、誰が非難に値するかを道徳的な判断で共有するRedditの記事を調べます。我々は,(1)社会的常識を活性化する出来事,(2)言語的シグナルなど,道徳的判断に影響する要因を調査するために,計算手法を用いる。この目的のために、我々は、道徳的判断を動機付けるものを示すために、コメンテーターが含むオリジナルの投稿からモラル的火花と呼ぶ抜粋に焦点を当てる。 24,672件以上の投稿と175,988件のコメントを調べると、出来事に関連した否定的な個人的特徴(例えば未熟さや無礼さ)が注目され、非難を喚起し、モラルの火花と責任性の関係を示唆する。さらに、コメンテータの認知過程に影響を及ぼす言語は、出来事や文字を描写することで、抜粋の可能性が道徳的な火花となり、事実や具体的記述はこの効果を阻害しがちである。 Given the increasing realism of social interactions online, social media offers an unprecedented avenue to evaluate real-life moral scenarios. We examine posts from Reddit, where authors and commenters share their moral judgments on who is blameworthy. We employ computational techniques to investigate factors influencing moral judgments, including (1) events activating social commonsense and (2) linguistic signals. To this end, we focus on excerpts, which we term moral sparks, from original posts that commenters include to indicate what motivates their moral judgments. By examining over 24,672 posts and 175,988 comments, we find that event-related negative personal traits (e.g., immature and rude) attract attention and stimulate blame, implying a dependent relationship between moral sparks and blameworthiness. Moreover, language that impacts commenters' cognitive processes to depict events and characters enhances the probability of an excerpt becoming a moral spark, while factual and concrete descriptions tend to inhibit this effect.	翻訳日:2023-11-01 21:32:46 公開日:2023-10-30
# TempME: Motif Discoveryによる時間グラフニューラルネットワークの説明可能性を目指して TempME: Towards the Explainability of Temporal Graph Neural Networks via Motif Discovery ( http://arxiv.org/abs/2310.19324v1 ) ライセンス: Link先を確認	Jialin Chen, Rex Ying	(参考訳) 時空グラフは時変相互作用を伴う動的システムのモデル化に広く使われている。現実のシナリオでは、動的システムにおける未来の相互作用を生成するメカニズムは、典型的には時間的モチーフとして知られるグラフ内の一連の反復的なサブ構造によって制御される。現在の時間グラフニューラルネットワーク(TGNN)の成功と普及にもかかわらず、時間的モチーフがモデルから特定の予測を誘導する重要な指標として認識されているかは定かではない。この課題に対処するために、TGNNの予測を導く最も重要な時間的モチーフを明らかにする、TempME(Temporal Motifs Explainer)と呼ばれる新しいアプローチを提案する。情報ボトルネックの原理から、TempMEは最もインタラクションに関連するモチーフを抽出し、含んでいる情報の量を最小化し、説明の空間性と簡潔性を維持する。 TempMEによる説明のイベントは、既存のアプローチよりも時空間的相関が強く、より理解可能な洞察を提供する。広範な実験によりテンポムの優位性が検証され、6つの実世界のデータセットで説明精度が最大8.21%向上し、現在のtgnnの予測平均精度が最大22.96%向上した。 Temporal graphs are widely used to model dynamic systems with time-varying interactions. In real-world scenarios, the underlying mechanisms of generating future interactions in dynamic systems are typically governed by a set of recurring substructures within the graph, known as temporal motifs. Despite the success and prevalence of current temporal graph neural networks (TGNN), it remains uncertain which temporal motifs are recognized as the significant indications that trigger a certain prediction from the model, which is a critical challenge for advancing the explainability and trustworthiness of current TGNNs. To address this challenge, we propose a novel approach, called Temporal Motifs Explainer (TempME), which uncovers the most pivotal temporal motifs guiding the prediction of TGNNs. Derived from the information bottleneck principle, TempME extracts the most interaction-related motifs while minimizing the amount of contained information to preserve the sparsity and succinctness of the explanation. Events in the explanations generated by TempME are verified to be more spatiotemporally correlated than those of existing approaches, providing more understandable insights. 
Extensive experiments validate the superiority of TempME, with up to 8.21% increase in terms of explanation accuracy across six real-world datasets and up to 22.96% increase in boosting the prediction Average Precision of current TGNNs.	翻訳日:2023-11-01 21:25:50 公開日:2023-10-30
# ProNet:マルチホリゾン時系列予測のためのプログレッシブニューラルネットワーク ProNet: Progressive Neural Network for Multi-Horizon Time Series Forecasting ( http://arxiv.org/abs/2310.19322v1 ) ライセンス: Link先を確認	Yang Lin	(参考訳) 本稿では,マルチ水平時系列予測のための新しいディープラーニング手法であるProNetを紹介し,自己回帰(AR)と非自己回帰(NAR)戦略を適応的にブレンドする。本手法では,予測水平線をセグメントに分割し,非自己回帰的に各セグメントの最も重要なステップを予測し,残りのステップを自己回帰的に行う。分節過程は潜時変数に依存しており、変動推論によって個々の時間ステップの重要性を効果的に捉えている。 ARモデルと比較して、ProNetは顕著なアドバンテージを示し、ARイテレーションを少なくし、予測速度を高速化し、エラーの蓄積を軽減している。一方、NARモデルと比較すると、ProNetは出力空間における予測の相互依存性を考慮に入れ、予測精度が向上する。 4つの大規模データセットを包含する包括的評価およびアブレーション研究により,ProNetの有効性が示され,精度と予測速度,最先端ARおよびNAR予測モデルよりも優れた性能を示す。 In this paper, we introduce ProNet, a novel deep learning approach designed for multi-horizon time series forecasting, adaptively blending autoregressive (AR) and non-autoregressive (NAR) strategies. Our method involves dividing the forecasting horizon into segments, predicting the most crucial steps in each segment non-autoregressively, and the remaining steps autoregressively. The segmentation process relies on latent variables, which effectively capture the significance of individual time steps through variational inference. In comparison to AR models, ProNet showcases remarkable advantages, requiring fewer AR iterations, resulting in faster prediction speed, and mitigating error accumulation. On the other hand, when compared to NAR models, ProNet takes into account the interdependency of predictions in the output space, leading to improved forecasting accuracy. Our comprehensive evaluation, encompassing four large datasets, and an ablation study, demonstrate the effectiveness of ProNet, highlighting its superior performance in terms of accuracy and prediction speed, outperforming state-of-the-art AR and NAR forecasting models.	翻訳日:2023-11-01 21:25:24 公開日:2023-10-30
# D4Explainer:離散化拡散による分散GNN説明 D4Explainer: In-Distribution GNN Explanations via Discrete Denoising Diffusion ( http://arxiv.org/abs/2310.19321v1 ) ライセンス: Link先を確認	Jialin Chen, Shirley Wu, Abhijit Gupta, Rex Ying	(参考訳) グラフニューラルネットワーク(GNN)の広範な展開は、モデル監査と信頼できるグラフ学習の確保において重要な役割を果たす、その説明可能性に大きな関心を喚起する。 GNNの説明可能性の目的は、モデル予測に最も大きな影響を与える基礎となるグラフ構造を識別することである。生成した説明が、特にGNNのアウト・オブ・ディストリビューションデータに対する脆弱性のために、イン・ディストリビューション特性の信頼性が要求される。残念ながら、一般的な説明可能性法は、生成した説明を元のグラフの構造に制約する傾向にあり、したがって分配性の重要性を軽視し、信頼性に欠ける説明をもたらす。これらの課題に対処するため、我々はD4Explainerを提案する。D4Explainerは、偽物とモデルレベルの説明シナリオの両方に対して、分散GNN説明を提供する新しいアプローチである。提案したD4Explainerは、生成グラフ分布学習を最適化目標に組み込む。 1) 与えられたインスタンスの分配特性に適合する多様な反事実グラフの集合を生成し、 2)特定のクラス予測に寄与する最も識別的なグラフパターンを特定し、モデルレベルの説明に役立てる。 d4explainerは、反事実とモデルレベルの説明を組み合わせる最初の統一フレームワークである。合成および実世界のデータセットで実施された実証的な評価は、D4Explainerによって達成された最先端のパフォーマンスを、説明精度、忠実性、多様性、堅牢性の観点から、説得力のある証拠を提供する。 The widespread deployment of Graph Neural Networks (GNNs) sparks significant interest in their explainability, which plays a vital role in model auditing and ensuring trustworthy graph learning. The objective of GNN explainability is to discern the underlying graph structures that have the most significant impact on model predictions. Ensuring that explanations generated are reliable necessitates consideration of the in-distribution property, particularly due to the vulnerability of GNNs to out-of-distribution data. Unfortunately, prevailing explainability methods tend to constrain the generated explanations to the structure of the original graph, thereby downplaying the significance of the in-distribution property and resulting in explanations that lack reliability. To address these challenges, we propose D4Explainer, a novel approach that provides in-distribution GNN explanations for both counterfactual and model-level explanation scenarios. 
The proposed D4Explainer incorporates generative graph distribution learning into the optimization objective, which accomplishes two goals: 1) generate a collection of diverse counterfactual graphs that conform to the in-distribution property for a given instance, and 2) identify the most discriminative graph patterns that contribute to a specific class prediction, thus serving as model-level explanations. It is worth mentioning that D4Explainer is the first unified framework that combines both counterfactual and model-level explanations. Empirical evaluations conducted on synthetic and real-world datasets provide compelling evidence of the state-of-the-art performance achieved by D4Explainer in terms of explanation accuracy, faithfulness, diversity, and robustness.	翻訳日:2023-11-01 21:25:04 公開日:2023-10-30
# 効率的純探索のためのデュアル指向アルゴリズムの設計 Dual-Directed Algorithm Design for Efficient Pure Exploration ( http://arxiv.org/abs/2310.19319v1 ) ライセンス: Link先を確認	Chao Qin and Wei You	(参考訳) 確率的逐次適応実験の文脈における純粋探索問題を考える。意思決定者の目標は、最小限の測定努力で高い信頼性で代替案に関する質問に正確に答えることである。典型的なクエリ質問は、最も優れたパフォーマンスを持つ選択肢を特定し、ランク付けと選択の問題、あるいは機械学習文献における最善のアーム識別に導くことである。我々は, 固定精度設定に着目し, 試料の最適配置に対する強い収束の概念の観点から, 最適性の十分条件を導出する。双対変数を用いて、割り当てが最適であるために必要な条件を特徴付ける。双対変数を用いることで、原始変数のみに依存する最適条件の組合せ構造をバイパスすることができる。注目すべきは、これらの最適条件は、最初ベストアーム識別のために提案されたトップ2のアルゴリズム設計原則の拡張を可能にすることである。さらに, 最適性条件は, 候補の情報ゲインに基づいて, 候補集合から適応的に選択する情報指向選択規則を, 単純かつ効率的な選択規則として導出する。アルゴリズムアプローチを実装するための広いコンテキストについて概説する。我々は,情報指向の選択と組み合わせることで,gaussian best-arm 同定に最適化されたトップツートンプソンサンプリングが(漸近的に)最適であることを示す。我々のアルゴリズムは、$\epsilon$-best-armの識別と閾値バンディット問題に最適である。また,本解析は,純粋探索問題に対するトンプソンサンプリングの適応を導く一般原則も導いた。数値実験は,提案アルゴリズムの既存のアルゴリズムと比較して,例外的な効率性を示す。 We consider pure-exploration problems in the context of stochastic sequential adaptive experiments with a finite set of alternative options. The goal of the decision-maker is to accurately answer a query question regarding the alternatives with high confidence with minimal measurement efforts. A typical query question is to identify the alternative with the best performance, leading to ranking and selection problems, or best-arm identification in the machine learning literature. We focus on the fixed-precision setting and derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples. Using dual variables, we characterize the necessary and sufficient conditions for an allocation to be optimal. The use of dual variables allows us to bypass the combinatorial structure of the optimality conditions that relies solely on primal variables. Remarkably, these optimality conditions enable an extension of the top-two algorithm design principle, initially proposed for best-arm identification. 
Furthermore, our optimality conditions give rise to a straightforward yet efficient selection rule, termed information-directed selection, which adaptively picks from a candidate set based on information gain of the candidates. We outline the broad contexts where our algorithmic approach can be implemented. We establish that, paired with information-directed selection, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature. Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems. Our analysis also leads to a general principle to guide adaptations of Thompson sampling for pure-exploration problems. Numerical experiments highlight the exceptional efficiency of our proposed algorithms relative to existing ones.	翻訳日:2023-11-01 21:24:39 公開日:2023-10-30
# L2T-DLN:動的損失ネットワークによる学習 L2T-DLN: Learning to Teach with Dynamic Loss Network ( http://arxiv.org/abs/2310.19313v1 ) ライセンス: Link先を確認	Zhoyang Hai, Liyuan Pan, Xiabi Liu, Zhengzheng Liu, Mirna Yunita	(参考訳) 教育の概念が機械学習コミュニティに導入されることにより、教師モデルは動的損失関数を使用して学生モデルのトレーニングを教えるようになる。動的には、適応的損失関数を学生モデル学習の異なるフェーズに設定することを意図している。既存の作品における教師モデル 1) 単に学生モデルの現状に基づいて損失関数を決定するだけで、すなわち、教師の経験を無視する。 2)学生モデルの状態(例えば、訓練イテレーション番号と訓練/評価セットからの損失/正確性)のみを利用するが、損失関数の状態は無視する。本稿では,まず,記憶単位を用いた教師モデルの設計により,時間的課題として損失調整を定式化し,教師モデルの経験から生徒の学習を誘導する。そして、動的損失ネットワークを用いて、教師と生徒モデルとの相互作用を高めるために、教師の学習を支援するために、損失の状態を追加して利用することができる。広範な実験により,本手法は学生の学習を増強し,分類,物体検出,意味セグメンテーションシナリオを含む実世界課題における様々な深層モデルの性能を向上させることを実証した。 With the concept of teaching being introduced to the machine learning community, a teacher model starts using dynamic loss functions to teach the training of a student model. The dynamic loss is intended to adapt the loss function to different phases of student model learning. In existing works, the teacher model 1) merely determines the loss function based on the present states of the student model, i.e., disregards the experience of the teacher; 2) only utilizes the states of the student model, e.g., training iteration number and loss/accuracy from training/validation sets, while ignoring the states of the loss function. In this paper, we first formulate the loss adjustment as a temporal task by designing a teacher model with memory units, which enables the student learning to be guided by the experience of the teacher model. Then, with a dynamic loss network, we can additionally use the states of the loss to assist the teacher learning in enhancing the interactions between the teacher and the student model. Extensive experiments demonstrate that our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation scenarios.	翻訳日:2023-11-01 21:24:12 公開日:2023-10-30
# スパース状態の効率的生成のための単純量子アルゴリズム A simple quantum algorithm to efficiently prepare sparse states ( http://arxiv.org/abs/2310.19309v1 ) ライセンス: Link先を確認	Debora Ramacciotti, Andreea-Iulia Lefterovici, Antonio F. Rotundo	(参考訳) 状態準備は、多くのアルゴリズムが提案されている量子計算の基本的なルーチンである。中でも最も単純なのがGrover-Rudolphアルゴリズムである。本稿では,準備状態がスパースである場合に,本アルゴリズムの性能を解析する。ゲートの複雑性は状態の非零振幅数において線形であり、キュービット数では2次であることを示す。次に,量子ビット数への依存性を線形にするために,アルゴリズムの簡単な修正を導入する。これはスパース状態準備のための最もよく知られたアルゴリズムと競合する。 State preparation is a fundamental routine in quantum computation, for which many algorithms have been proposed. Among them, perhaps the simplest one is the Grover-Rudolph algorithm. In this paper, we analyse the performance of this algorithm when the state to prepare is sparse. We show that the gate complexity is linear in the number of non-zero amplitudes in the state and quadratic in the number of qubits. We then introduce a simple modification of the algorithm, which makes the dependence on the number of qubits also linear. This is competitive with the best known algorithms for sparse state preparation.	翻訳日:2023-11-01 21:23:54 公開日:2023-10-30
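As a rough illustration of why sparsity helps in the entry above, the following minimal sketch computes only the classical rotation angles that a Grover-Rudolph style preparation would need for a sparse state. It is not the paper's implementation: the function names `grover_rudolph_angles` and `reconstruct` are hypothetical, amplitudes are assumed real and non-negative, and no quantum gates are generated. The point it shows is that only prefixes of basis states with non-zero amplitude need a rotation, so a state with d non-zero amplitudes on n qubits needs at most n·d of them.

```python
import math
from collections import defaultdict


def grover_rudolph_angles(amps, n):
    """Return {prefix: theta} for a sparse n-qubit state.

    amps maps n-character bitstrings to real, non-negative amplitudes
    (an assumption of this sketch; phases are ignored).
    """
    # Probability mass carried by every prefix of every basis state with
    # non-zero amplitude. Only these prefixes ever need a rotation.
    prefix_prob = defaultdict(float)
    for bits, a in amps.items():
        for k in range(n + 1):
            prefix_prob[bits[:k]] += a * a

    angles = {}
    for prefix, p in prefix_prob.items():
        if len(prefix) == n:
            continue  # full bitstrings are leaves, nothing left to rotate
        p0 = prefix_prob.get(prefix + "0", 0.0)
        # Choose theta so that cos^2(theta/2) = P(next bit = 0 | prefix).
        angles[prefix] = 2.0 * math.acos(min(1.0, math.sqrt(p0 / p)))
    return angles


def reconstruct(angles, n, tol=1e-12):
    """Replay the rotations classically to recover the amplitudes."""
    state = {"": 1.0}
    for _ in range(n):
        nxt = {}
        for prefix, a in state.items():
            theta = angles[prefix]
            c, s = math.cos(theta / 2.0), math.sin(theta / 2.0)
            if c > tol:
                nxt[prefix + "0"] = a * c
            if s > tol:
                nxt[prefix + "1"] = a * s
        state = nxt
    return state


amps = {"000": 0.6, "101": 0.8}
angles = grover_rudolph_angles(amps, n=3)
print(len(angles), "rotations for", len(amps), "non-zero amplitudes")
print(reconstruct(angles, n=3))
```

Each rotation is conditioned on a prefix of up to n-1 bits, which is consistent with the gate count quoted in the abstract (linear in the number of non-zero amplitudes, quadratic in the number of qubits), although the exact gate decomposition is not given in this listing.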
# ベルマン完全性がない:モデルに基づく回帰条件付き教師付き学習による軌道ステッチ Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning ( http://arxiv.org/abs/2310.19308v1 ) ライセンス: Link先を確認	Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du	(参考訳) q$-learningのようなオフポリシー動的プログラミング(dp)技術は、シーケンシャルな意思決定問題を解決する重要な技術であることが証明されている。しかし、関数近似の存在下では、そのようなアルゴリズムは収束することが保証されておらず、しばしば、考慮された関数クラスにおいてベルマン完全性が欠如しているため、DPベースの手法の成功にとって重要な条件である。本稿では,回帰条件付き教師付き学習(return-conditioned supervised learning,rcsl)に基づくオフポリシー学習手法がベルマン完全性という課題を回避できることを示す。関数近似器として2層多層パーセプトロンを用いる場合, 一定の層幅がrcslに十分である一方で, ベルマン完全性を満たすために, 状態空間サイズと線形に層幅を成長させる必要がある。これらの結果は, ほぼ最適データセットを用いた環境におけるDP法と比較して, RCSL法の優れた経験的性能を説明するための一歩となる。さらに、最適部分データセットから学習するために、RCSLメソッドに異なる軌道からセグメントを縫合する動的プログラミング機能を与えるMBRCSLという単純なフレームワークを提案する。 MBRCSLは、学習された動的モデルと前方サンプリングを利用して、全ての動的プログラミングアルゴリズムを悩ませるベルマン完全性の必要性を回避しつつ、軌道縫合を達成する。これらの主張を裏付ける理論解析と実験評価の両方を提案し、いくつかのシミュレーションロボット問題に対して最先端のモデルフリーおよびモデルベースオフラインrlアルゴリズムを上回っている。 Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be an important technique for solving sequential decision-making problems. However, in the presence of function approximation such algorithms are not guaranteed to converge, often diverging due to the absence of Bellman-completeness in the function classes considered, a crucial condition for the success of DP-based methods. In this paper, we show how off-policy learning techniques based on return-conditioned supervised learning (RCSL) are able to circumvent these challenges of Bellman completeness, converging under significantly more relaxed assumptions inherited from supervised learning. We prove there exists a natural environment in which if one uses two-layer multilayer perceptron as the function approximator, the layer width needs to grow linearly with the state space size to satisfy Bellman-completeness while a constant layer width is enough for RCSL. 
These findings take a step towards explaining the superior empirical performance of RCSL methods compared to DP-based methods in environments with near-optimal datasets. Furthermore, in order to learn from sub-optimal datasets, we propose a simple framework called MBRCSL, granting RCSL methods the ability of dynamic programming to stitch together segments from distinct trajectories. MBRCSL leverages learned dynamics models and forward sampling to accomplish trajectory stitching while avoiding the need for Bellman completeness that plagues all dynamic programming algorithms. We propose both theoretical analysis and experimental evaluation to back these claims, outperforming state-of-the-art model-free and model-based offline RL algorithms across several simulated robotics problems.	翻訳日:2023-11-01 21:23:46 公開日:2023-10-30
# 極端機械力場に対する計画・探索的アプローチ A Planning-and-Exploring Approach to Extreme-Mechanics Force Fields ( http://arxiv.org/abs/2310.19306v1 ) ライセンス: Link先を確認	Pengjie Shi and Zhiping Xu	(参考訳) 強い格子歪みや破壊時の結合破壊のような極端な機械的プロセスは、自然と工学においてユビキタスであり、しばしば構造が破滅的な破壊を引き起こす。しかし, き裂の核生成と成長を理解するには, き裂先端の原子準位構造から荷重が印加される構造的特徴まで幅広い多スケール特性が必要である。分子シミュレーションは、クラックフロントにおける進行的なミクロ構造変化を解決する重要なツールを提供し、機械的エネルギー散逸、クラックパスの選択、動的不安定性(例えば、キンキング、分岐)などのプロセスの探索に広く用いられている。原子位置に基づく局所的記述子に基づく実験力場と結合順序は, 非線形, 異方性応力-ひずみ関係, エッジのエネルギー密度に対しても, 破壊の予測を満足させるものではない。したがって、高忠実な力場はひずみのテンソルの性質と破壊時の希少事象のエネルギーを含み、残念ながら最先端の経験的力場と機械学習の力場の両方では考慮されていない。第一原理計算によって生成されたデータに基づいて, ひずみ状態空間の事前サンプリングとアクティブラーニング技術を組み合わせて, 臨界結合距離における遷移状態の探索を行い, 破壊力場nn-f$^3$を開発した。 NN-F$^3$の能力は、モデル問題としてh-BNおよびツイスト二層グラフェンの破断を研究することによって実証される。シミュレーションの結果,最近の実験結果を確認し,極端機械過程の予測において第一原理計算から電子構造の知識を含める必要性を浮き彫りにした。 Extreme mechanical processes such as strong lattice distortion and bond breakage during fracture are ubiquitous in nature and engineering, which often lead to catastrophic failure of structures. However, understanding the nucleation and growth of cracks is challenged by their multiscale characteristics spanning from atomic-level structures at the crack tip to the structural features where the load is applied. Molecular simulations offer an important tool to resolve the progressive microstructural changes at crack fronts and are widely used to explore processes therein, such as mechanical energy dissipation, crack path selection, and dynamic instabilities (e.g., kinking, branching). Empirical force fields developed based on local descriptors based on atomic positions and the bond orders do not yield satisfying predictions of fracture, even for the nonlinear, anisotropic stress-strain relations and the energy densities of edges. 
High-fidelity force fields thus should include the tensorial nature of strain and the energetics of rare events during fracture, which, unfortunately, have not been taken into account in both the state-of-the-art empirical and machine-learning force fields. Based on data generated by first-principles calculations, we develop a neural network-based force field for fracture, NN-F$^3$, by combining pre-sampling of the space of strain states and active-learning techniques to explore the transition states at critical bonding distances. The capability of NN-F$^3$ is demonstrated by studying the rupture of h-BN and twisted bilayer graphene as model problems. The simulation results confirm recent experimental findings and highlight the necessity to include the knowledge of electronic structures from first-principles calculations in predicting extreme mechanical processes.	翻訳日:2023-11-01 21:23:16 公開日:2023-10-30
# 財務異常検出のための縦・水平分割データによるプライバシー保護フェデレーション学習 Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection ( http://arxiv.org/abs/2310.19304v1 ) ライセンス: Link先を確認	Swanand Ravindra Kadhe, Heiko Ludwig, Nathalie Baracaldo, Alan King, Yi Zhou, Keith Houck, Ambrish Rawat, Mark Purcell, Naoise Holohan, Mikio Takeuchi, Ryo Kawahara, Nir Drucker, Hayim Shaul, Eyal Kushnir, Omri Soceanu	(参考訳) 金融異常の証拠を効果的に検出するには、支払いネットワークシステム(PNS)やパートナー銀行など、多様なデータを所有している複数のエンティティ間の協調が必要である。これらの金融機関間の信頼は規制と競争によって制限される。フェデレートラーニング(FL)は、データを垂直または水平に分割する場合に、エンティティが協調的にモデルをトレーニングすることを可能にする。しかし、実世界の金融異常検出シナリオでは、データは上下に分割されるため、既存のFLアプローチをプラグ・アンド・プレイで使用することはできない。我々の新しいソリューションであるPV4FADは、完全同型暗号化(HE)、セキュアマルチパーティ計算(SMPC)、差分プライバシ(DP)、ランダム化技術を組み合わせて、トレーニング中のプライバシと精度をバランスさせ、モデル展開時の推論脅威を防止する。我々のソリューションは、HEおよびSMPCを介して入力プライバシを提供し、DPを介して推測時間攻撃に対するプライバシを出力する。具体的には、正直だが厳密な脅威モデルでは、銀行はpnsトランザクションについてセンシティブな特徴を学ばず、pnsは銀行のデータセットに関する情報を学ばず、予測ラベルしか学ばないことを示す。また,推論中にアウトプットプライバシを保護するdp機構を開発し,解析する。提案手法は,分散DPを満足しながら,バンク単位のノイズレベルを著しく低減し,高ユーティリティモデルを生成する。高い精度を確保するため,本手法では,特にランダムフォレストをアンサンブルモデルとして作成する。これにより,アンサンブルのよく知られた特性を利用して分散を低減し,精度を向上させることができる。私たちのソリューションは、米国プライバシ・エンハンシング・テクノロジーズ(PET)賞チャレンジの第1フェーズで2位を獲得しました。 The effective detection of evidence of financial anomalies requires collaboration among multiple entities who own a diverse set of data, such as a payment network system (PNS) and its partner banks. Trust among these financial institutions is limited by regulation and competition. Federated learning (FL) enables entities to collaboratively train a model when data is either vertically or horizontally partitioned across the entities. However, in real-world financial anomaly detection scenarios, the data is partitioned both vertically and horizontally and hence it is not possible to use existing FL approaches in a plug-and-play manner. 
Our novel solution, PV4FAD, combines fully homomorphic encryption (HE), secure multi-party computation (SMPC), differential privacy (DP), and randomization techniques to balance privacy and accuracy during training and to prevent inference threats at model deployment time. Our solution provides input privacy through HE and SMPC, and output privacy against inference time attacks through DP. Specifically, we show that, in the honest-but-curious threat model, banks do not learn any sensitive features about PNS transactions, and the PNS does not learn any information about the banks' dataset but only learns prediction labels. We also develop and analyze a DP mechanism to protect output privacy during inference. Our solution generates high-utility models by significantly reducing the per-bank noise level while satisfying distributed DP. To ensure high accuracy, our approach produces an ensemble model, in particular, a random forest. This enables us to take advantage of the well-known properties of ensembles to reduce variance and increase accuracy. Our solution won second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.	翻訳日:2023-11-01 21:22:46 公開日:2023-10-30
# chat-gptを用いた対話推薦のためのユーザニーズの抽出 Extracting user needs with Chat-GPT for dialogue recommendation ( http://arxiv.org/abs/2310.19303v1 ) ライセンス: Link先を確認	Yugen Sato, Taisei Nakajima, Tatsuki Kawamoto, Tomohiro Takagi	(参考訳) chatgptのような大規模言語モデル(llm)はますます洗練され、人間のような能力を発揮し、様々な日常業務において人間を助ける上で不可欠な役割を担っている。 AIの重要な応用は、対話型レコメンデーションシステムで、人間の問い合わせに応答し、ユーザに合わせたレコメンデーションを行う。ほとんどの従来の対話型レコメンデーションシステムでは、言語モデルは対話モデルとしてのみ使用され、別個のレコメンデーションシステムが存在する。これは対話システムとして使われる言語モデルが推薦システムとして機能する能力を持っていないためである。そこで我々は,対話システムとしての非常に高い推論能力と高品質な文を生成する能力を有するOpenAIのChat-GPTを用いて,推薦機能を備えた対話システムの構築を実現し,システムの有効性を検証する。 Large-scale language models (LLMs), such as ChatGPT, are becoming increasingly sophisticated and exhibit human-like capabilities, playing an essential role in assisting humans in a variety of everyday tasks. An important application of AI is interactive recommendation systems that respond to human inquiries and make recommendations tailored to the user. In most conventional interactive recommendation systems, the language model is used only as a dialogue model, and there is a separate recommendation system. This is due to the fact that the language model used as a dialogue system does not have the capability to serve as a recommendation system. Therefore, we will realize the construction of a dialogue system with recommendation capability by using OpenAI's Chat-GPT, which has a very high inference capability as a dialogue system and the ability to generate high-quality sentences, and verify the effectiveness of the system.	翻訳日:2023-11-01 21:22:16 公開日:2023-10-30
# ROME:ビジュアルコモンセンスを超えた推論のための事前学習型視覚言語モデルの評価 ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense ( http://arxiv.org/abs/2310.19301v1 ) ライセンス: Link先を確認	Kankan Zhou, Eason Lai, Wei Bin Au Yeong, Kyriakos Mouratidis, Jing Jiang	(参考訳) 人間は常識を超えた推論能力を持っている。例えば、空の魚のボウルの隣のテーブルに横たわる金魚の非日常的なイメージを考えると、人間は魚が魚のボウルの中にいないと断固として判断する。しかしこのケースは、視覚的な入力にもかかわらず、魚がボウルの中にいるという一般的なシナリオに向け、視覚言語モデルでは異なるかもしれない。本稿では,最先端の視覚言語モデルが直観的コンテンツを正しく解釈する推論能力を持っているかどうかを評価するために,rome(reasoning beyond commonsense knowledge)という新しい探索データセットを提案する。 ROMEには、色、形状、材料、サイズ、位置関係に関する常識的知識に反するイメージが含まれている。最先端の事前学習された視覚言語モデルの実験により、これらのモデルのほとんどは依然として直観に反するシナリオを解釈できないことが判明した。我々は、ROMEが視覚言語研究における常識知識以上の推論に関するさらなる調査を加速することを期待している。 Humans possess a strong capability for reasoning beyond common sense. For example, given an unconventional image of a goldfish laying on the table next to an empty fishbowl, a human would effortlessly determine that the fish is not inside the fishbowl. The case, however, may be different for a vision-language model, whose reasoning could gravitate towards the common scenario that the fish is inside the bowl, despite the visual input. In this paper, we introduce a novel probing dataset named ROME (reasoning beyond commonsense knowledge) to evaluate whether the state-of-the-art pre-trained vision-language models have the reasoning capability to correctly interpret counter-intuitive content. ROME contains images that defy commonsense knowledge with regards to color, shape, material, size and positional relation. Experiments on the state-of-the-art pre-trained vision-language models reveal that most of these models are still largely incapable of interpreting counter-intuitive scenarios. We hope that ROME will spur further investigations on reasoning beyond commonsense knowledge in vision-language research.	翻訳日:2023-11-01 21:21:59 公開日:2023-10-30
# 動的治療のためのステージアウェア学習 Stage-Aware Learning for Dynamic Treatments ( http://arxiv.org/abs/2310.19300v1 ) ライセンス: Link先を確認	Hanwen Ye, Wenzhuo Zhou, Ruoqing Zhu, Annie Qu	(参考訳) 動的治療体制(DTR)の最近の進歩は、個人のニーズに合わせて調整され、期待される臨床利益を最大化できる強力な最適な治療探索アルゴリズムを提供する。しかし、既存のアルゴリズムは最適な治療、特に長期にわたる意思決定を伴う慢性疾患においてサンプルサイズ不足に苦しむ可能性がある。これらの課題に対処するため、我々は、DTRを、観察された治療軌跡と、決定段階を越えて最適な体制によって得られるものとの整合性の優先順位付けに焦点をあてて推定する、新しい個別化学習手法を提案する。観測軌道が最適処理と完全に一致しなければならないという制約を緩和することにより,逆確率重み付き手法のサンプル効率と安定性を大幅に改善する。特に,提案手法は,一般的な成果重み付け学習フレームワークを具体例として含む,より汎用的なフレームワークを構築している。さらに,決定段階間の不均一性を明示的に考慮するための注意機構とともに,段階重要度スコアの概念を導入する。我々はフィッシャー整合性や有限サンプル性能境界を含む提案手法の理論的性質を確立する。本手法を広範囲なシミュレーション環境において実証的に評価し,その実例について検討した。 Recent advances in dynamic treatment regimes (DTRs) provide powerful optimal treatment searching algorithms, which are tailored to individuals' specific needs and able to maximize their expected clinical benefits. However, existing algorithms could suffer from insufficient sample size under optimal treatments, especially for chronic diseases involving long stages of decision-making. To address these challenges, we propose a novel individualized learning method which estimates the DTR with a focus on prioritizing alignment between the observed treatment trajectory and the one obtained by the optimal regime across decision stages. By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of inverse probability weighted based methods. In particular, the proposed learning scheme builds a more general framework which includes the popular outcome weighted learning framework as a special case of ours. Moreover, we introduce the notion of stage importance scores along with an attention mechanism to explicitly account for heterogeneity among decision stages. We establish the theoretical properties of the proposed approach, including the Fisher consistency and finite-sample performance bound. 
Empirically, we evaluate the proposed method in extensive simulated environments and in a real case study for the COVID-19 pandemic.	翻訳日:2023-11-01 21:21:42 公開日:2023-10-30
# 生成モデルにおける公平性の測定について On Measuring Fairness in Generative Models ( http://arxiv.org/abs/2310.19297v1 ) ライセンス: Link先を確認	Christopher T. H. Teo, Milad Abdollahzadeh, Ngai-Man Cheung	(参考訳) 近年,公平な生成モデルへの関心が高まっている。本研究は, フェアネス測定の詳細な研究を初めて行い, 公正な生成モデルにおいて, ゲージングの進行に重要な要素となる。我々は3つの貢献をした。まず,高精度な属性分類器(SA)を用いた場合においても,既存の公正度測定フレームワークにかなりの測定誤差があることを明らかにする。これらの結果は、以前報告された公平性の改善に疑問を投げかけた。第2に, クラシファイア・エラー・アウェア計測(CLEAM)を提案する。これは統計モデルを用いて, SA分類器の不正確さを推定する新しいフレームワークである。提案したCLEAMは測定誤差を大幅に低減する(例: StyleGAN2のGenderに関して4.98% $\rightarrow$ 0.62%)。さらに、CLEAMは最小限の追加オーバーヘッドでこれを達成する。第3に,CLEAMを用いて重要なテキスト・画像生成器とGANの公平性を計測し,これらのモデルにかなりのバイアスを生じさせ,それらのアプリケーションに対する懸念を提起する。コードとより多くのリソース: https://sutd-visual-computing-group.github.io/CLEAM/ Recently, there has been increased interest in fair generative models. In this work, we conduct, for the first time, an in-depth study on fairness measurement, a critical component in gauging progress on fair generative models. We make three contributions. First, we conduct a study that reveals that the existing fairness measurement framework has considerable measurement errors, even when highly accurate sensitive attribute (SA) classifiers are used. These findings cast doubts on previously reported fairness improvements. Second, to address this issue, we propose CLassifier Error-Aware Measurement (CLEAM), a new framework which uses a statistical model to account for inaccuracies in SA classifiers. Our proposed CLEAM reduces measurement errors significantly, e.g., 4.98% $\rightarrow$ 0.62% for StyleGAN2 w.r.t. Gender. Additionally, CLEAM achieves this with minimal additional overhead. Third, we utilize CLEAM to measure fairness in important text-to-image generators and GANs, revealing considerable biases in these models that raise concerns about their applications. Code and more resources: https://sutd-visual-computing-group.github.io/CLEAM/.	翻訳日:2023-11-01 21:21:23 公開日:2023-10-30
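To make the measurement-error point in the entry above concrete, here is a small sketch of how an imperfect sensitive-attribute classifier biases a measured class proportion, and how a textbook misclassification correction undoes it. This is explicitly not CLEAM's statistical model, which this listing does not describe; the function names and the per-class accuracies `acc0` and `acc1` are assumptions chosen for illustration.

```python
def measured_proportion(p_true, acc0, acc1):
    """Expected fraction of samples labelled class 0 by the classifier.

    p_true: true fraction of class-0 samples among generated data.
    acc0, acc1: classifier accuracy on class 0 and class 1 (assumed known).
    """
    # Class-0 samples labelled correctly plus class-1 samples mislabelled as 0.
    return p_true * acc0 + (1.0 - p_true) * (1.0 - acc1)


def corrected_proportion(p_measured, acc0, acc1):
    """Invert measured_proportion to estimate the true class-0 fraction."""
    denom = acc0 + acc1 - 1.0
    if abs(denom) < 1e-12:
        raise ValueError("classifier is no better than chance; cannot correct")
    p = (p_measured - (1.0 - acc1)) / denom
    return min(1.0, max(0.0, p))  # clip to a valid proportion


# A perfectly balanced generator measured with a fairly accurate classifier.
p_hat = measured_proportion(0.5, acc0=0.98, acc1=0.90)
print(p_hat)                                    # naive measurement
print(corrected_proportion(p_hat, 0.98, 0.90))  # after correction
```

Even with per-class accuracies of 98% and 90%, the naive measurement is off by about four percentage points in this toy setting, which is the kind of error the abstract argues can mask or mimic a fairness improvement.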
# 量子状態の複素値ウィグナーエントロピー Complex-valued Wigner entropy of a quantum state ( http://arxiv.org/abs/2310.19296v1 ) ライセンス: Link先を確認	Nicolas J. Cerf, Anaelle Hertz, Zacharie Van Herstraeten	(参考訳) 量子状態のウィグナー関数が負の値を持つことは一般的な知識であり、真の確率密度と見なすことはできない。ここでは、負のウィグナー関数に拡張される位相空間におけるエントロピー的汎関数を見つけることの難しさを調べ、任意のウィグナー関数に付随する複素値エントロピーを定義するメリットを提唱する。複素ウィグナーエントロピー (complex wigner entropy) と呼ばれるこの量は、複素平面におけるウィグナー函数のシャノン微分エントロピーの解析的継続によって定義される。複素ウィグナーエントロピーは興味深い性質を持ち、特に実部と虚部はガウスユニタリの下で不変である(位相空間における変位、回転、スキーズ)。その実部はガウスの畳み込みの下でのウィグナー函数の進化を考える際に物理的に関係があるが、その虚部は単にウィグナー函数の負の体積に比例する。最後に、任意のウィグナー関数の複素値フィッシャー情報を定義する。これは(拡張ド・ブルーエンの同一性によって)(状態がガウスの付加雑音を受けるとき)複素ウィグナーエントロピーの時間微分と結びついている。全体として、複素平面は位相空間における準確率分布のエントロピー特性を分析するための適切な枠組みをもたらすことが期待される。 It is common knowledge that the Wigner function of a quantum state may admit negative values, so that it cannot be viewed as a genuine probability density. Here, we examine the difficulty in finding an entropy-like functional in phase space that extends to negative Wigner functions and then advocate the merits of defining a complex-valued entropy associated with any Wigner function. This quantity, which we call the complex Wigner entropy, is defined via the analytic continuation of Shannon's differential entropy of the Wigner function in the complex plane. We show that the complex Wigner entropy enjoys interesting properties, especially its real and imaginary parts are both invariant under Gaussian unitaries (displacements, rotations, and squeezing in phase space). Its real part is physically relevant when considering the evolution of the Wigner function under a Gaussian convolution, while its imaginary part is simply proportional to the negative volume of the Wigner function. Finally, we define the complex-valued Fisher information of any Wigner function, which is linked (via an extended de Bruijn's identity) to the time derivative of the complex Wigner entropy when the state undergoes Gaussian additive noise. 
Overall, it is anticipated that the complex plane yields a proper framework for analyzing the entropic properties of quasiprobability distributions in phase space.	翻訳日:2023-11-01 21:21:04 公開日:2023-10-30
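The claim that the imaginary part of the complex Wigner entropy is proportional to the negative volume can be made plausible with a short calculation. The derivation below is a sketch under one stated assumption: on the region where $W<0$ the logarithm is continued as $\ln W = \ln|W| + i\pi$. The branch choice and the symbols $h_c$ and $\mathcal{N}$ are notation introduced here, not taken from the abstract.

```latex
\begin{align}
  h_c(W) &= -\int W(x,p)\,\ln W(x,p)\,\mathrm{d}x\,\mathrm{d}p \\
         &= -\int W \ln|W| \,\mathrm{d}x\,\mathrm{d}p
            \;-\; i\pi \int_{W<0} W \,\mathrm{d}x\,\mathrm{d}p \\
         &= \underbrace{-\int W \ln|W| \,\mathrm{d}x\,\mathrm{d}p}_{\operatorname{Re} h_c}
            \;+\; i\,\underbrace{\pi\,\mathcal{N}}_{\operatorname{Im} h_c},
  \qquad
  \mathcal{N} := \int_{W<0} |W| \,\mathrm{d}x\,\mathrm{d}p .
\end{align}
```

With the opposite branch ($\ln W = \ln|W| - i\pi$) the imaginary part changes sign but remains proportional to $\mathcal{N}$.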
# ROAM: 最適化されたオペレータオーダとメモリレイアウトによるメモリ効率の大きなDNNトレーニング ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout ( http://arxiv.org/abs/2310.19295v1 ) ライセンス: Link先を確認	Huiyao Shu and Ang Wang and Ziji Shi and Hanyu Zhao and Yong Li and Lu Lu	(参考訳) ディープラーニングモデルのサイズが拡大するにつれ、トレーニングのメモリ要件は急増している。オフロード、再計算、圧縮といったハイレベルなテクニックはメモリのプレッシャーを軽減するが、オーバーヘッドも伴う。しかし、適切な演算子実行順序とテンソルメモリレイアウトを含むメモリ効率の高い実行プランは、モデルのメモリ効率を大幅に向上させ、ハイレベルな技術によるオーバーヘッドを低減することができる。本稿では,演算子順序とテンソルメモリレイアウトを最適化したメモリ効率実行計画の導出のために,計算グラフレベルで動作するROAMを提案する。まずモデル構造とメモリ負荷の訓練を慎重に検討し,これまで十分にサポートされていなかった大規模複雑なグラフの最適化を支援するための高度な理論を提案する。さらに,タスク分割を自動的に探索する効率的な木に基づくアルゴリズムを提案し,課題を解決するために高い性能と有効性を提供する。実験の結果、ROAMはPytorchと2つの最先端手法と比較して35.7%、13.3%、27.2%の大幅なメモリ削減を実現し、53.7倍のスピードアップを実現している。 GPT2-XLの拡張による評価は、ROAMのスケーラビリティをさらに検証する。 As deep learning models continue to increase in size, the memory requirements for training have surged. While high-level techniques like offloading, recomputation, and compression can alleviate memory pressure, they also introduce overheads. However, a memory-efficient execution plan that includes a reasonable operator execution order and tensor memory layout can significantly increase the models' memory efficiency and reduce overheads from high-level techniques. In this paper, we propose ROAM which operates on computation graph level to derive memory-efficient execution plan with optimized operator order and tensor memory layout for models. We first propose sophisticated theories that carefully consider model structure and training memory load to support optimization for large complex graphs that have not been well supported in the past. An efficient tree-based algorithm is further proposed to search task divisions automatically, along with delivering high performance and effectiveness to solve the problem. 
Experiments show that ROAM achieves a substantial memory reduction of 35.7%, 13.3%, and 27.2% compared to PyTorch and two state-of-the-art methods, respectively, and offers a remarkable 53.7x speedup. The evaluation conducted on the expansive GPT2-XL further validates ROAM's scalability.	翻訳日:2023-11-01 21:20:40 公開日:2023-10-30
# カラー同変畳み込みネットワーク Color Equivariant Convolutional Networks ( http://arxiv.org/abs/2310.19368v1 ) ライセンス: Link先を確認	Attila Lengyel, Ombretta Strafforello, Robert-Jan Bruintjes, Alexander Gielisse, Jan van Gemert	(参考訳) 色は、畳み込みニューラルネットワーク(cnns)がオブジェクト認識に容易に活用できる重要な視覚的手がかりである。しかし、cnnは、偶発的な記録条件によってもたらされた色の変化の間にデータの不均衡がある場合に苦労する。色不変性はこの問題に対処するが、識別力の犠牲となるすべての色情報を除去するコストがかかる。本稿では,カラー情報を保持しつつ,色スペクトル間の形状特徴共有を可能にする,新しいディープラーニングビルディングブロックであるカラー等変畳み込み(CEConvs)を提案する。ニューラルネットワークにおける色相のパラメータ共有を組み込むことにより、等分散の概念を幾何変換から測光変換へ拡張する。 CEConvsの利点は、様々なタスクに対するダウンストリーム性能と、列車-テストの分散シフトを含む色の変化に対する堅牢性の改善である。我々のアプローチは、ResNetsのような既存のアーキテクチャにシームレスに統合することができ、CNNにおけるカラーベースのドメインシフトに対処するための有望なソリューションを提供する。 Color is a crucial visual cue readily exploited by Convolutional Neural Networks (CNNs) for object recognition. However, CNNs struggle if there is data imbalance between color variations introduced by accidental recording conditions. Color invariance addresses this issue but does so at the cost of removing all color information, which sacrifices discriminative power. In this paper, we propose Color Equivariant Convolutions (CEConvs), a novel deep learning building block that enables shape feature sharing across the color spectrum while retaining important color information. We extend the notion of equivariance from geometric to photometric transformations by incorporating parameter sharing over hue-shifts in a neural network. We demonstrate the benefits of CEConvs in terms of downstream performance to various tasks and improved robustness to color changes, including train-test distribution shifts. Our approach can be seamlessly integrated into existing architectures, such as ResNets, and offers a promising solution for addressing color-based domain shifts in CNNs.	翻訳日:2023-11-01 21:12:48 公開日:2023-10-30
# フロッケ非平衡グリーン関数とフロッケ量子マスター方程式:電子-電子相互作用の役割 Floquet non-equilibrium Green's function and Floquet quantum master equation for electronic transport: The role of electron-electron interactions ( http://arxiv.org/abs/2310.19362v1 ) ライセンス: Link先を確認	Vahid Mosallanejad, Yu Wang, and Wenjie Dou	(参考訳) 非平衡グリーン関数(NEGF)と量子マスター方程式(QME)は電子輸送のアプローチの2つの主要なクラスである。外部周期場との相互作用により駆動される量子ドットの輸送特性に対するこれらの形式の様々なフロケ版について論じる。最初にFloquet NEGFの2つのバージョンを導出した。また、相互作用系に対するFloquet NEGFフォーマリズムのアンサッツについても検討する。さらに,弱い相互作用状態においてFloquet QMEの2つのバージョンを導出した。各手法を用いて,数演算子と電流演算子の期待値の評価について詳述する。本研究は, 周期駆動を受ける2準位系を通る輸送について, これらの手法を検討した。 4つの方法すべての結果は高い一貫性を示している。我々はこれらのフロケ量子輸送法が光に曝される分子接合の研究に有用であると期待する。 Non-equilibrium Green's function (NEGF) and quantum master equation (QME) are two main classes of approaches for electronic transport. We discuss various Floquet variants of these formalisms for transport properties of a quantum dot driven via interaction with an external periodic field. We first derived two versions of the Floquet NEGF. We also explore an ansatz of the Floquet NEGF formalism for the interacting systems. In addition, we derived two versions of Floquet QME in the weak interaction regime. With each method, we elaborate on the evaluation of the expectation values of the number and current operators. We examined these methods for transport through a two-level system that is subject to periodic driving. The results of all four methods show great consistency. We expect these Floquet quantum transport methods to be useful in studying molecular junctions exposed to light.	翻訳日:2023-11-01 21:12:32 公開日:2023-10-30
# バランス、不均衡、リバランス:minimaxゲームの観点からのロバストオーバーフィットの理解 Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective ( http://arxiv.org/abs/2310.19360v1 ) ライセンス: Link先を確認	Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, Yisen Wang	(参考訳) 敵対的訓練(AT)はおそらく、頑健な特徴を抽出するための最先端のアルゴリズムである。しかし、最近の研究者はATが特に学習率(LR)の減衰後、深刻な過適合問題に悩まされていることに気づいた。本稿では,モデルトレーナーと攻撃者の間の動的ミニマックスゲームとして,敵対的トレーニングを見て,この現象を説明する。具体的には, LR減衰がトレーナーに強い記憶能力を与えることでミニマックスゲームのバランスをどのように損なうかを分析し, 非ロバスト特徴を記憶した結果, このような不均衡が強靭なオーバーフィッティングを引き起こすことを示す。この理解を広範囲な実験で検証し、2人のゲームプレーヤーのダイナミクスから強固なオーバーフィットの全体像を提供する。この理解は、トレーナーの能力の正規化や攻撃強度の向上によって、2人のプレイヤーを再バランスさせることで、堅牢なオーバーフィッティングを緩和するきっかけとなる。実験により、提案したReBalanced Adversarial Training (ReBAT) は、良好な頑健性を達成し、非常に長い訓練の後でも、頑健なオーバーフィッティングに苦しむことはないことが示された。コードは https://github.com/PKU-ML/ReBAT で入手できる。 Adversarial Training (AT) has become arguably the state-of-the-art algorithm for extracting robust features. However, researchers have recently noticed that AT suffers from severe robust overfitting problems, particularly after learning rate (LR) decay. In this paper, we explain this phenomenon by viewing adversarial training as a dynamic minimax game between the model trainer and the attacker. Specifically, we analyze how LR decay breaks the balance of the minimax game by empowering the trainer with a stronger memorization ability, and show such imbalance induces robust overfitting as a result of memorizing non-robust features. We validate this understanding with extensive experiments, and provide a holistic view of robust overfitting from the dynamics of both game players. This understanding further inspires us to alleviate robust overfitting by rebalancing the two players by either regularizing the trainer's capacity or improving the attack strength. Experiments show that the proposed ReBalanced Adversarial Training (ReBAT) can attain good robustness and does not suffer from robust overfitting even after very long training. 
Code is available at https://github.com/PKU-ML/ReBAT.	翻訳日:2023-11-01 21:12:22 公開日:2023-10-30
# 複数インスタンス学習におけるインスタンスラベル相関の導入:病理組織像における癌検出への応用 Introducing instance label correlation in multiple instance learning. Application to cancer detection on histopathological images ( http://arxiv.org/abs/2310.19359v1 ) ライセンス: Link先を確認	Pablo Morales-Álvarez, Arne Schmidt, José Miguel Hernández-Lobato, Rafael Molina	(参考訳) 近年では,マルチインスタンス学習(MIL)の弱い教師付きパラダイムが,さまざまな分野で広く普及している。パラダイム的な例は計算病理学であり、全スライド画像に対するパッチレベルのラベルの欠如は、教師付きモデルの適用を妨げる。ガウス過程(GP)に基づく確率的MIL法は, 優れた不確実性推定能力により有望な結果を得た。しかし、これらは1つの重要な事実を考慮しない汎用的MIL手法であり、(病理)画像では、近隣のパッチのラベルに相関が期待できる。本研究では,VGPMIL-PRと呼ばれる最先端のGPベースのMIL法を拡張し,その相関性を利用する。そこで我々は統計物理学イジングモデルに触発された新しい結合項を開発した。すべてのモデルパラメータを推定するために変分推論を使用します。興味深いことに、Ising項の強度を調節する重みがなくなると、VGPMIL-PRの定式化が回復する。提案手法の性能は,前立腺癌検出の現実的な2つの問題において評価される。我々のモデルは、他の最先端確率的MIL法よりも優れた結果が得られることを示す。我々はまた、新しいIsing項の影響を洞察するために、異なる可視化と分析も提供する。これらの知見は、提案されたモデルの他の研究分野への応用を促進することが期待されている。 In recent years, the weakly supervised paradigm of multiple instance learning (MIL) has become very popular in many different areas. A paradigmatic example is computational pathology, where the lack of patch-level labels for whole-slide images prevents the application of supervised models. Probabilistic MIL methods based on Gaussian Processes (GPs) have obtained promising results due to their excellent uncertainty estimation capabilities. However, these are general-purpose MIL methods that do not take into account one important fact: in (histopathological) images, the labels of neighboring patches are expected to be correlated. In this work, we extend a state-of-the-art GP-based MIL method, which is called VGPMIL-PR, to exploit such correlation. To do so, we develop a novel coupling term inspired by the statistical physics Ising model. We use variational inference to estimate all the model parameters. Interestingly, the VGPMIL-PR formulation is recovered when the weight that regulates the strength of the Ising term vanishes. 
The performance of the proposed method is assessed in two real-world problems of prostate cancer detection. We show that our model achieves better results than other state-of-the-art probabilistic MIL methods. We also provide different visualizations and analysis to gain insights into the influence of the novel Ising term. These insights are expected to facilitate the application of the proposed model to other research areas.	翻訳日:2023-11-01 21:12:00 公開日:2023-10-30
# 局所ランダム量子回路は任意のアーキテクチャ上の近似設計を形成する Local random quantum circuits form approximate designs on arbitrary architectures ( http://arxiv.org/abs/2310.19355v1 ) ライセンス: Link先を確認	Shivan Mittal, Nicholas Hunter-Jones	(参考訳) エッジが許容される$2$-qudit相互作用を決定する任意の連結グラフ上のランダム量子回路(RQC)を考える。以前の研究は、1D、完全、および$D$次元のグラフ上の局所次元$q$の$n$-qudit回路が近似ユニタリな設計を成し、多項式的に多くのゲートの後にユニタリ群$U(q^n)$上のハール測度に近い分布からユニタリを生成することを確立してきた。ここで、これらの結果を拡張して、幅広いグラフのクラス上の $O(\mathrm{poly}(n,k))$ 個のゲートからなる RQC が近似ユニタリな $k$-designs を形成することを示す。有界な次数と高さの全域木を持つグラフ上の RQC は、$O(\|E\|n\,\mathrm{poly}(k))$ ゲートの後に $k$-designs となる。ここで $\|E\|$ はグラフの辺の数である。さらに, RQC が多項式回路サイズで近似設計を生成するグラフのより大きなクラスを特定する。 $k \leq 4$ に対して、ある最大次数のグラフ上の RQC が $O(\|E\|n)$ ゲートの後に設計を成すことを示し、明示的な定数を与える。我々は局所ハミルトニアンのスペクトルギャップから回路サイズの境界を決定する。この目的のために、正規グラフ上のフラストレーションフリーハミルトニアンのギャップを有界化するための有限サイズ(Knabe)法を任意の連結グラフに拡張する。さらに,任意のグラフ上のハミルトニアンのスペクトルギャップを決定するための検出可能性補題に基づく新しい手法を提案する。第1法は[Commun. Math. Phys. 291, 257 (2009)]の簡潔な代替証明を提供し,第2法は,任意の連結アーキテクチャ上のRQCが準多項式回路サイズで近似設計を成すことを示す。 We consider random quantum circuits (RQC) on arbitrary connected graphs whose edges determine the allowed $2$-qudit interactions. Prior work has established that such $n$-qudit circuits with local dimension $q$ on 1D, complete, and $D$-dimensional graphs form approximate unitary designs, that is, they generate unitaries from distributions close to the Haar measure on the unitary group $U(q^n)$ after polynomially many gates. Here, we extend those results by proving that RQCs comprised of $O(\mathrm{poly}(n,k))$ gates on a wide class of graphs form approximate unitary $k$-designs. We prove that RQCs on graphs with spanning trees of bounded degree and height form $k$-designs after $O(\|E\|n\,\mathrm{poly}(k))$ gates, where $\|E\|$ is the number of edges in the graph. Furthermore, we identify larger classes of graphs for which RQCs generate approximate designs in polynomial circuit size. For $k \leq 4$, we show that RQCs on graphs of certain maximum degrees form designs after $O(\|E\|n)$ gates, providing explicit constants. 
We determine our circuit size bounds from the spectral gaps of local Hamiltonians. To that end, we extend the finite-size (or Knabe) method for bounding gaps of frustration-free Hamiltonians on regular graphs to arbitrary connected graphs. We further introduce a new method based on the Detectability Lemma for determining the spectral gaps of Hamiltonians on arbitrary graphs. Our methods have wider applicability as the first method provides a succinct alternative proof of [Commun. Math. Phys. 291, 257 (2009)] and the second method proves that RQCs on any connected architecture form approximate designs in quasi-polynomial circuit size.	翻訳日:2023-11-01 21:11:38 公開日:2023-10-30
# 物体検出のための半教師あり領域一般化 Semi- and Weakly-Supervised Domain Generalization for Object Detection ( http://arxiv.org/abs/2310.19351v1 ) ライセンス: Link先を確認	Ryosuke Furuta, Yoichi Sato	(参考訳) トレーニングとテストデータでドメインが大きく異なる場合、オブジェクト検出器はうまく動作しない。この問題を解決するために,複数の領域の接地ラベルを用いたトレーニングデータを必要とする領域一般化手法が提案されている。しかし、クラスラベルだけでなく、バウンディングボックスにも注釈を付けなければならないため、オブジェクト検出のためにこれらのデータを集めるのに時間と労力がかかります。高価なアノテーションを必要とせずに、オブジェクト検出におけるドメインギャップを克服するために、半教師付きドメイン一般化オブジェクト検出(SS-DGOD)と弱い教師付きDGOD(WS-DGOD)という2つの新しい問題設定を提案する。複数のドメインからのラベル付きデータを必要とする従来のドメインの一般化とは対照的に、SS-DGODとWS-DGODは1つのドメインからのみラベル付きデータを必要とし、トレーニングのために複数のドメインからラベル付きまたは弱いラベル付きデータを必要とする。対象検出器は、教師から出力される擬似ラベルを用いて、未ラベルまたは弱ラベルのデータに基づいて学生ネットワークを訓練する同じ学習フレームワークを用いて、提案した設定で効果的に訓練できることを示す。実験の結果,提案手法で学習した対象検出器は,あるラベル付きドメインデータでトレーニングされたベースライン検出器を著しく上回っており,非教師付きドメイン適応(uda)設定で訓練されたものと同等かそれ以上の性能を発揮することがわかった。 Object detectors do not work well when domains largely differ between training and testing data. To solve this problem, domain generalization approaches, which require training data with ground-truth labels from multiple domains, have been proposed. However, it is time-consuming and labor-intensive to collect those data for object detection because not only class labels but also bounding boxes must be annotated. To overcome the problem of domain gap in object detection without requiring expensive annotations, we propose to consider two new problem settings: semi-supervised domain generalizable object detection (SS-DGOD) and weakly-supervised DGOD (WS-DGOD). In contrast to the conventional domain generalization for object detection that requires labeled data from multiple domains, SS-DGOD and WS-DGOD require labeled data only from one domain and unlabeled or weakly-labeled data from multiple domains for training. We show that object detectors can be effectively trained on the proposed settings with the same student-teacher learning framework, where a student network is trained with pseudo labels output from a teacher on the unlabeled or weakly-labeled data. 
The experimental results demonstrate that the object detectors trained on the proposed settings significantly outperform baseline detectors trained on one labeled domain data and perform comparably to or better than those trained on unsupervised domain adaptation (UDA) settings, while ours do not use target domain data for training in contrast to UDA.	翻訳日:2023-11-01 21:10:58 公開日:2023-10-30
# 日本SimCSE技術報告 Japanese SimCSE Technical Report ( http://arxiv.org/abs/2310.19349v1 ) ライセンス: Link先を確認	Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda	(参考訳) simcseで微調整された日本語文埋め込みモデルの開発について報告する。文埋め込み研究のベースラインとして使用可能な日本語の文埋め込みモデルが不足していることから,24の日本語・多言語モデル,5つの教師付きデータセット,4つの教師なしデータセットを含む日本語文埋め込みに関する広範な実験を行った。本報告では,日本語SimCSEの詳細なトレーニング設定と評価結果について述べる。 We report the development of Japanese SimCSE, Japanese sentence embedding models fine-tuned with SimCSE. Since there is a lack of sentence embedding models for Japanese that can be used as a baseline in sentence embedding research, we conducted extensive experiments on Japanese sentence embeddings involving 24 pre-trained Japanese or multilingual language models, five supervised datasets, and four unsupervised datasets. In this report, we provide the detailed training setup for Japanese SimCSE and their evaluation results.	翻訳日:2023-11-01 21:10:29 公開日:2023-10-30
# ドープファンデルワールス反強磁性体(Ni,Cd)PS3における量子多体磁気励起子の迅速抑制 Rapid suppression of quantum many-body magnetic exciton in doped van der Waals antiferromagnet (Ni,Cd)PS3 ( http://arxiv.org/abs/2310.19348v1 ) ライセンス: Link先を確認	Junghyun Kim, Woongki Na, Jonghyeon Kim, Pyeongjae Park, Kaixuan Zhang, Inho Hwang, Young-Woo Son, Jae Hoon Kim, Hyeonsik Cheong, Je-Geun Park	(参考訳) ファンデルワールス反強磁性体NiPS3で発見されたユニークな磁気励起子は、Zhang-Rice一重項励起状態とZhang-Rice三重項基底状態の2つの量子多体状態の間に生じる。同時に、この励起子に由来する発光のスペクトル幅は0.4 meVと非常に狭い。 NiPS3の磁気励起子の極端なコヒーレンスを含むこれらの異常な性質は、多くの疑問を呈している。 2つの実験手法と理論的研究により, Ni1-xCdxPS3を用いてドーピング効果を調べた。実験の結果,磁気励起子は数%のCdドーピングで劇的に抑制された。これらすべてが生じる一方で、励起子の幅は徐々にしか増加せず、反強磁性基底状態は堅牢である。これらの結果は、コヒーレント磁気励起子の前提条件としての格子の均一性の隠れた重要性を強調している。最後に、壊れた電荷移動が、本来は均一なコヒーレント磁気励起子の形成を(Ni,Cd)PS3において妨げるというエキサイティングなシナリオが現れる。 The unique magnetic exciton discovered in the van der Waals antiferromagnet NiPS3 arises between two quantum many-body states: a Zhang-Rice singlet excited state and a Zhang-Rice triplet ground state. Simultaneously, the spectral width of the photoluminescence originating from this exciton is exceedingly narrow, at 0.4 meV. These extraordinary properties, including the extreme coherence of the magnetic exciton in NiPS3, beg many questions. We studied doping effects in Ni1-xCdxPS3 using two experimental techniques and theoretical studies. Our experimental results show that the magnetic exciton is drastically suppressed upon a few percent of Cd doping. All this happens while the width of the exciton only gradually increases and the antiferromagnetic ground state remains robust. These results highlight the hidden importance of lattice uniformity as a prerequisite for a coherent magnetic exciton. Finally, an exciting scenario emerges: the broken charge transfer forbids the otherwise uniform formation of the coherent magnetic exciton in (Ni,Cd)PS3.	翻訳日:2023-11-01 21:10:20 公開日:2023-10-30
# LLMの理解と実装能力の相違によるテキスト要約の現実的整合性の改善 Improving Factual Consistency of Text Summarization by Adversarially Decoupling Comprehension and Embellishment Abilities of LLMs ( http://arxiv.org/abs/2310.19347v1 ) ライセンス: Link先を確認	Huawen Feng, Yan Fan, Xiong Liu, Ting-En Lin, Zekun Yao, Yuchuan Wu, Fei Huang, Yongbin Li, Qianli Ma	(参考訳) 大規模言語モデル(llm)によるテキスト要約の最近の進歩にもかかわらず、それらはテキスト生成において「幻覚」として知られる元の記事と事実上矛盾する要約を生成することが多い。従来の小さなモデル(例えばBART、T5)とは異なり、現在のLLMは愚かなミスを少なくするが、原因や効果を示唆する、誤った詳細を追加する、過度に一般化するなど、より洗練されたものを作る。これらの幻覚は従来の手法による検出が困難であり、テキスト要約の事実整合性を改善する上で大きな課題となる。本稿では,LLM(DECENT)の包括的・包括的NT能力を阻害する逆デカップリング手法を提案する。さらに, LLMの学習過程において, 真偽に対する感度の不足を補うために, 探索に基づくパラメータ効率の手法を採用した。このように、LLMはエンプレッシングや理解に混同されることが少なく、より正確に命令を実行でき、幻覚を識別する能力が向上する。実験の結果, llmsに基づくテキスト要約の信頼性が有意に向上した。 Despite the recent progress in text summarization made by large language models (LLMs), they often generate summaries that are factually inconsistent with original articles, known as "hallucinations" in text generation. Unlike previous small models (e.g., BART, T5), current LLMs make fewer silly mistakes but more sophisticated ones, such as imposing cause and effect, adding false details, and overgeneralizing, etc. These hallucinations are challenging to detect through traditional methods, which poses great challenges for improving the factual consistency of text summarization. In this paper, we propose an adversarially DEcoupling method to disentangle the Comprehension and EmbellishmeNT abilities of LLMs (DECENT). Furthermore, we adopt a probing-based parameter-efficient technique to cover the shortage of sensitivity for true and false in the training process of LLMs. In this way, LLMs are less confused about embellishing and understanding, thus can execute the instructions more accurately and have enhanced abilities to distinguish hallucinations. Experimental results show that DECENT significantly improves the reliability of text summarization based on LLMs.	翻訳日:2023-11-01 21:09:59 公開日:2023-10-30
# テストスイートタスク: MuST-SHE と INES を用いたMT におけるジェンダーフェアネスの評価 Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES ( http://arxiv.org/abs/2310.19345v1 ) ライセンス: Link先を確認	Beatrice Savoldi and Marco Gaido and Matteo Negri and Luisa Bentivogli	(参考訳) WMT-2023"テストスイート"共有タスクの一部として,MuST-SHE-WMT23とINESの2つのテストスイートの評価結果を要約する。 en-de と de-en の言語ペアに焦点をあてることで、私たちはこれらの新しく作られたテストスイートを利用して、女性形と男性形の性を翻訳し、ジェンダー包括的な翻訳を生成するシステムの能力を調査します。さらに,テストスイートに関連する指標について議論し,人間による評価によって検証する。以上の結果から,女性と男性の両方の性別形態を自然主義的なジェンダー現象に正しく翻訳する上で,システムは合理的かつ同等のパフォーマンスを達成できることが示唆された。一方、翻訳における包括的言語フォームの生成は、全ての評価されたMTモデルの挑戦的なタスクとして現れ、今後の改善とトピックの研究の余地を示す。 As part of the WMT-2023 "Test suites" shared task, in this paper we summarize the results of two test suites evaluations: MuST-SHE-WMT23 and INES. By focusing on the en-de and de-en language pairs, we rely on these newly created test suites to investigate systems' ability to translate feminine and masculine gender and produce gender-inclusive translations. Furthermore, we discuss metrics associated with our test suites and validate them by means of human evaluations. Our results indicate that systems achieve reasonable and comparable performance in correctly translating both feminine and masculine gender forms for naturalistic gender phenomena. In contrast, the generation of inclusive language forms in translation emerges as a challenging task for all the evaluated MT models, indicating room for future improvements and research on the topic.	翻訳日:2023-11-01 21:09:37 公開日:2023-10-30
# 知識伝達によるラベルのみモデル反転攻撃 Label-Only Model Inversion Attacks via Knowledge Transfer ( http://arxiv.org/abs/2310.19342v1 ) ライセンス: Link先を確認	Ngoc-Bao Nguyen, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, Ngai-Man Cheung	(参考訳) モデル反転(MI)攻撃では、敵は機械学習(ML)モデルへのアクセスを悪用し、プライベートトレーニングデータを推論して再構築する。ホワイトボックスとブラックボックスのセットアップでは、敵がそれぞれ完全なモデルまたはモデルのソフトアウトプットにアクセスするという顕著な進歩がなされている。しかし、最も難しいが実際に重要な設定では、非常に限定的な研究がある: ラベルのみのMI攻撃、敵は信頼度スコアや他のモデル情報なしで、モデルの予測ラベル(ハードラベル)へのアクセスしかできない。本研究ではラベルのみのMI攻撃に対する新しいアプローチであるLOKTを提案する。我々のアイデアは、不透明なターゲットモデルから代理モデルへの知識の伝達に基づいている。その後,これらのサロゲートモデルを用いて,先進的なホワイトボックス攻撃を活用できる。本稿では、生成モデルに基づく知識伝達を提案し、効果的な知識伝達のための新しいモデルであるTarget Model-assisted ACGAN(T-ACGAN)を提案する。提案手法はラベルのみのMIをより扱いやすいホワイトボックス設定にキャストする。提案手法に基づくサロゲートモデルがMIのターゲットモデルの効果的なプロキシとなることをサポートする分析を提供する。実験の結果,本手法は既存のSOTAラベルのみのMI攻撃を全MIベンチマークで15%以上上回った。さらに,提案手法はクエリ予算の観点から好適な比較を行う。私たちの研究は、最小限の情報(ハードラベル)が露出しても、MLモデルに対するプライバシの脅威が高まることを浮き彫りにしている。私たちのコード、デモ、モデル、再構築されたデータは、プロジェクトページで利用可能です。 In a model inversion (MI) attack, an adversary abuses access to a machine learning (ML) model to infer and reconstruct private training data. Remarkable progress has been made in the white-box and black-box setups, where the adversary has access to the complete model or the model's soft output respectively. However, there is very limited study in the most challenging but practically important setup: Label-only MI attacks, where the adversary only has access to the model's predicted label (hard label) without confidence scores or any other model information. In this work, we propose LOKT, a novel approach for label-only MI attacks. Our idea is based on transfer of knowledge from the opaque target model to surrogate models. Subsequently, using these surrogate models, our approach can harness advanced white-box attacks. 
We propose knowledge transfer based on generative modelling, and introduce a new model, Target model-assisted ACGAN (T-ACGAN), for effective knowledge transfer. Our method casts the challenging label-only MI into the more tractable white-box setup. We provide analysis to support that surrogate models based on our approach serve as effective proxies for the target model for MI. Our experiments show that our method significantly outperforms existing SOTA label-only MI attacks by more than 15% across all MI benchmarks. Furthermore, our method compares favorably in terms of query budget. Our study highlights rising privacy threats for ML models even when minimal information (i.e., hard labels) is exposed. Our code, demo, models and reconstructed data are available at our project page: https://ngoc-nguyen-0.github.io/lokt/	翻訳日:2023-11-01 21:09:20 公開日:2023-10-30
# Skywork: よりオープンなバイリンガル基礎モデル Skywork: A More Open Bilingual Foundation Model ( http://arxiv.org/abs/2310.19341v1 ) ライセンス: Link先を確認	Tianwen Wei, Liang Zhao, Lichang Zhang, Bo Zhu, Lijie Wang, Haihua Yang, Biye Li, Cheng Cheng, Weiwei Lü, Rui Hu, Chenxia Li, Liu Yang, Xilin Luo, Xuejie Wu, Lunan Liu, Wenjun Cheng, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Lei Lin, Xiaokun Wang, Yutuan Ma, Chuanhai Dong, Yanqi Sun, Yifu Chen, Yongyi Peng, Xiaojuan Liang, Shuicheng Yan, Han Fang, Yahui Zhou	(参考訳) 本報告では、英語と中国語のテキストから得た3.2兆以上のトークンのコーパスで訓練された大規模言語モデル(LLM)のファミリーであるSkywork-13Bについて述べる。このバイリンガル基礎モデルは、同等の規模のLLMとしては、現在までに最も広範に訓練され、公開されているものである。汎用トレーニングとドメイン特化強化トレーニングをそれぞれターゲットとした,セグメンテーションコーパスを用いた2段階のトレーニング手法を提案する。我々のモデルは,一般的なベンチマークに優れるだけでなく,多様なドメインにおける中国語の言語モデリングにおいて最先端の性能も達成できることを示す。さらに, 新しい漏洩検出手法を提案し, テストデータ汚染がLLMコミュニティによるさらなる調査を要する差し迫った問題であることを示す。今後の研究を進めるため,我々はSkywork-13Bをトレーニングの中間段階で取得したチェックポイントと共にリリースする。われわれはSkyPileコーパスの一部もリリースしている。これは1500億以上のウェブテキストのトークンを集めたもので、現時点で最大の高品質なオープン中国語事前学習コーパスである。 Skywork-13Bとオープンコーパスが、高品質のLLMへのアクセスを民主化するための貴重なオープンソースリソースになることを期待しています。 In this technical report, we present Skywork-13B, a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts. This bilingual foundation model is the most extensively trained and openly published LLM of comparable size to date. We introduce a two-stage training methodology using a segmented corpus, targeting general purpose training and then domain-specific enhancement training, respectively. We show that our model not only excels on popular benchmarks, but also achieves state-of-the-art performance in Chinese language modeling on diverse domains. Furthermore, we propose a novel leakage detection method, demonstrating that test data contamination is a pressing issue warranting further investigation by the LLM community. 
To spur future research, we release Skywork-13B along with checkpoints obtained during intermediate stages of the training process. We are also releasing part of our SkyPile corpus, a collection of over 150 billion tokens of web text, which is the largest high quality open Chinese pre-training corpus to date. We hope Skywork-13B and our open corpus will serve as a valuable open-source resource to democratize access to high-quality LLMs.	翻訳日:2023-11-01 21:08:56 公開日:2023-10-30
# スケーラブルな2分間フィードバック:継続的フィードバック機器としてのデジタル・講義対応調査 Scalable Two-Minute Feedback: Digital, Lecture-Accompanying Survey as a Continuous Feedback Instrument ( http://arxiv.org/abs/2310.19334v1 ) ライセンス: Link先を確認	Armin Egetenmeier, Sven Strickroth	(参考訳) コースや講義内容に関する詳細なフィードバックは改善に不可欠であり、リフレクションのツールとしても機能する。しかし、フィードバックをタイムリーに収集し分析することが教師にとって課題となるため、フィードバックの方法は散発的にのみ使われることが多い。また、学生の現在の状況や学期中の労働負荷の変化も考慮しないことが多い。総合的な調査では,学生のストレスを定量的に測定し,質的な部分で参加者の反射に対処し,2つの教育機関で改善のための一般的な提案(いわゆるOne-Minute Paperに基づく)を収集するための形式的フィードバックとして,デジタル調査形式を用いた。学期中のフィードバックは定性的に評価され、メタレベルと特別な特徴(例えば、学生の労働倫理や他のコースのリフレクション)について議論される。結果は、低いが一定のフィードバック率を示している。回答は主に講義内容や組織的側面の話題を取り上げ、講義内の問題を集中的に報告するために使用された。さらに,大規模言語モデルとしての人工知能(AI)サポートを検証し,教師に対するオープンエンド応答を要約する有望な結果を示した。最後に、講師の経験を反映させ、その結果と改善の可能性について考察する。 Detailed feedback on courses and lecture content is essential for their improvement and also serves as a tool for reflection. However, feedback methods are often only used sporadically, especially in mass courses, because collecting and analyzing feedback in a timely manner is often a challenge for teachers. Moreover, the current situation of the students or the changing workload during the semester are usually not taken into account either. For a holistic investigation, the article used a digital survey format as formative feedback which attempts to measure student stress in a quantitative part and to address the participants' reflection in a qualitative part, as well as to collect general suggestions for improvement (based on the so-called One-Minute Paper) at two educational institutions. The feedback during the semester is evaluated qualitatively and discussed on a meta-level and special features (e.g. reflections on student work ethic or other courses) are addressed. The results show a low, but constant rate of feedback. Responses mostly cover topics of the lecture content or organizational aspects and were intensively used to report issues within the lecture. 
In addition, artificial intelligence (AI) support in the form of a large language model was tested and showed promising results in summarizing the open-ended responses for the teacher. Finally, the experiences from the lecturers are reflected upon and the results as well as possibilities for improvement are discussed.	翻訳日:2023-11-01 21:08:36 公開日:2023-10-30
# 不均一相互作用をもつ量子スピン鎖の固有状態熱化とその分解 Eigenstate Thermalization and its breakdown in Quantum Spin Chains with Inhomogeneous Interactions ( http://arxiv.org/abs/2310.19333v1 ) ライセンス: Link先を確認	Ding-Zu Wang, Hao Zhu, Jian Cui, Javier Argüello-Luengo, Maciej Lewenstein, Guo-Feng Zhang, Piotr Sierant, Shi-Ju Ran	(参考訳) 固有状態熱化仮説 (ETH) は、孤立量子多体系におけるエルゴディディティと熱化の基準を確立する成功理論である。本研究では,線形不均一相互作用を持つスピン-$1/2$ XXZ鎖の熱化特性について検討する。不均質な相互作用の導入は、量子カオスと熱化の開始に繋がるが、十分に強い不均一性のために阻害される。 ETHを発現させ,相互作用の強度の変化による分解を示すため,不均一なXXZスピン鎖の固有状態における局所可観測体の行列要素のエネルギーレベルと特性の統計を探索する。さらに, エンタングルメントエントロピーの力学と生存確率について検討し, 熱化とその破壊を考察した。超低温原子系における線形不均一相互作用でXXZ鎖を実験的に実現する方法を概説する。以上の結果から,不均一性の挿入によるETHの出現機構が明らかとなり,強い相互作用が存在する場合の量子ダイナミクスの停止が示唆された。 The eigenstate thermalization hypothesis (ETH) is a successful theory that establishes the criteria for ergodicity and thermalization in isolated quantum many-body systems. In this work, we investigate the thermalization properties of spin-$1/2$ XXZ chain with linearly-inhomogeneous interactions. We demonstrate that introduction of the inhomogeneous interactions leads to an onset of quantum chaos and thermalization, which, however, becomes inhibited for sufficiently strong inhomogeneity. To exhibit ETH, and to display its breakdown upon varying the strength of interactions, we probe statistics of energy levels and properties of matrix elements of local observables in eigenstates of the inhomogeneous XXZ spin chain. Moreover, we investigate the dynamics of the entanglement entropy and the survival probability which further evidence the thermalization and its breakdown in the considered model. We outline a way to experimentally realize the XXZ chain with linearly-inhomogeneous interactions in systems of ultracold atoms. Our results highlight a mechanism of emergence of ETH due to insertion of inhomogeneities in an otherwise integrable system and illustrate the arrest of quantum dynamics in presence of strong interactions.	翻訳日:2023-11-01 21:08:13 公開日:2023-10-30
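The abstract above mentions probing the statistics of energy levels to diagnose quantum chaos. One widely used diagnostic of this kind is the mean ratio of consecutive level spacings; whether the paper uses exactly this quantity is an assumption of this sketch, and the function name is hypothetical. The input is any list of eigenvalues taken from a single symmetry sector.

```python
def mean_gap_ratio(levels):
    """Mean ratio of consecutive level spacings, r = min(s_n, s_{n+1}) / max(s_n, s_{n+1}).

    levels: iterable of real eigenvalues (any order) from one symmetry sector.
    Pairs of spacings that are both zero (exact degeneracies) are skipped.
    """
    energies = sorted(levels)
    spacings = [b - a for a, b in zip(energies, energies[1:])]
    ratios = []
    for s1, s2 in zip(spacings, spacings[1:]):
        larger = max(s1, s2)
        if larger > 0:
            ratios.append(min(s1, s2) / larger)
    if not ratios:
        raise ValueError("need at least three levels with a nonzero spacing")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Toy spectrum only; a real analysis would diagonalize the spin-chain Hamiltonian.
    print(mean_gap_ratio([0.0, 1.0, 3.0, 3.5, 6.0]))
```

For reference, uncorrelated (Poisson) levels give a mean ratio of about 0.386, while the Gaussian orthogonal ensemble, typical of chaotic systems with time-reversal symmetry, gives about 0.531.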
# 量子テレポーテーションに基づく半量子プロキシブラインド署名 Semiquantum proxy blind signature based on quantum teleportation ( http://arxiv.org/abs/2310.19327v1 ) ライセンス: Link先を確認	Xiao Tan, Tian-Yu Ye	(参考訳) 本稿では,X状態に基づく量子テレポーテーションを用いた新しい半量子プロキシブラインド署名方式を提案する。そこでは,元のメッセージ所有者,プロキシ署名者,第三者が完全な量子能力を持つ量子参加者であり,元の署名者と署名検証者は限定的な量子能力を持つ半量子参加者である。我々のプロトコルは、完全なブラインド性、偽造不可能性、否認防止性を備えるだけでなく、盗聴者からの攻撃にも抵抗できることがわかった。従来の多くの量子プロキシブラインド署名プロトコルと比較すると、元の署名者と署名検証者はともに量子能力に制限のある半量子参加者であるため、我々のプロトコルは必要な量子リソースが少なく、現実の実装が容易になる可能性がある。 In this paper, we propose a novel semiquantum proxy blind signature scheme with quantum teleportation based on X states, where the original message owner, the proxy signer and the third party are quantum participants with complete quantum capabilities, while the original signer and the signature verifier are semiquantum participants with limited quantum capabilities. It turns out that our protocol not only has complete blindness, unforgeability, and non-repudiation but can also resist the attack behavior from an eavesdropper. Compared with many previous quantum proxy blind signature protocols, our protocol may need fewer quantum resources and be easier to implement in reality, since both the original signer and the signature verifier are semiquantum participants with limited quantum capabilities.	翻訳日:2023-11-01 21:07:54 公開日:2023-10-30
# 英語における難解な質問生成のための軽量手法 A Lightweight Method to Generate Unanswerable Questions in English ( http://arxiv.org/abs/2310.19403v1 ) ライセンス: Link先を確認	Vagrant Gautam, Miaoran Zhang, Dietrich Klakow	(参考訳) 利用可能な情報で質問に答えられない場合、質問応答のための堅牢なシステム(QA)は _not_ を知って答えるべきである。これを行うQAモデルを構築する方法の1つは、アノテータを採用するか、あるいは解決不可能な質問生成のための自動メソッドを通じて作成される、解決不可能な質問からなる追加のトレーニングデータである。既存の自動アプローチのモデルの複雑さが正当化されていないことを示すため、英語の難解な質問生成のためのより単純なデータ拡張手法について検討する。従来の最先端技術と比較すると、トレーニング不要で軽量な戦略によって生成されたデータは、より優れたモデル(BERT-largeでSQuAD 2.0データに+1.6 F1ポイント)となり、より人力的な関連性と可読性が高い。我々は,複数のエンコーダモデルにまたがる拡張を行わず,異なる量の生成データとTydiQA-MinSpanデータ(BERT-largeで+9.3 F1ポイント)を用いて,このアプローチの生の利点を定量化する。我々の結果は、スワップを将来の作業の単純だが強力なベースラインとして確立する。 If a question cannot be answered with the available information, robust systems for question answering (QA) should know _not_ to answer. One way to build QA models that do this is with additional training data comprised of unanswerable questions, created either by employing annotators or through automated methods for unanswerable question generation. To show that the model complexity of existing automated approaches is not justified, we examine a simpler data augmentation method for unanswerable question generation in English: performing antonym and entity swaps on answerable questions. Compared to the prior state-of-the-art, data generated with our training-free and lightweight strategy results in better models (+1.6 F1 points on SQuAD 2.0 data with BERT-large), and has higher human-judged relatedness and readability. We quantify the raw benefits of our approach compared to no augmentation across multiple encoder models, using different amounts of generated data, and also on TydiQA-MinSpan data (+9.3 F1 points with BERT-large). Our results establish swaps as a simple but strong baseline for future work.	翻訳日:2023-11-01 21:00:34 公開日:2023-10-30
# playtest:ゲーム用のゲーム化テストジェネレータ PlayTest: A Gamified Test Generator for Games ( http://arxiv.org/abs/2310.19402v1 ) ライセンス: Link先を確認	Patric Feldmeier, Philipp Straubinger, Gordon Fraser	(参考訳) ゲームは通常段階的に作成され、同じシナリオを繰り返しテストする必要がある。そこで我々は,Playtestと呼ばれるゲームにカプセル化することで,このゲームテストプロセスを緩和することを目的としている。 playtestはプレイヤーのアクションに基づいて価値あるテストケースを自動生成する。開発プロセス中のプレイテストフェーズにおいて,ツールを通じて各ゲームにアクセスできるようにすることにより,ゲームテストタスクをクラウドソーシングするPlaytestの利用を想定する。 Games are usually created incrementally, requiring repeated testing of the same scenarios, which is a tedious and error-prone task for game developers. Therefore, we aim to alleviate this game testing process by encapsulating it into a game called Playtest, which transforms the tiring testing process into a competitive game with a purpose. Playtest automates the generation of valuable test cases based on player actions, without the players even realising it. We envision the use of Playtest to crowdsource the task of testing games by giving players access to the respective games through our tool in the playtesting phases during the development process.	翻訳日:2023-11-01 21:00:01 公開日:2023-10-30
# LightSAGE:買い物客の推薦における大規模項目検索のためのグラフニューラルネットワーク LightSAGE: Graph Neural Networks for Large Scale Item Retrieval in Shopee's Advertisement Recommendation ( http://arxiv.org/abs/2310.19394v1 ) ライセンス: Link先を確認	Dang Minh Nguyen, Chenfei Wang, Yan Shen, Yifan Zeng	(参考訳) グラフニューラルネットワーク(GNN)は、推薦問題におけるアイテム検索のトレンドソリューションである。しかし最近の報告では、新しいモデルアーキテクチャに重点を置いている。これは、GNNを産業環境に適用する際のギャップを生じさせる可能性がある。グラフの構築とデータ空間の扱いに加えて、プロジェクト全体の成功においても重要な役割を果たす。本稿では,GNNの大規模eコマースアイテム検索への応用について報告する。グラフの構築、モデリング、データスキューネスの処理において、単純で新しくてインパクトのあるテクニックを紹介します。具体的には,強信号ユーザ行動と高精度協調フィルタリング(cf)アルゴリズムを組み合わせることで,高品質な項目グラフを構築する。そこで我々はLightSAGEと呼ばれる新しいGNNアーキテクチャを開発し、ベクトル探索のための高品質なアイテムの埋め込みを生成する。最後に、広告(ads)システムにおいて重要となるコールドスタートおよびロングテールアイテムを扱う複数の戦略を設計する。本モデルでは,オフライン評価の改善やオンラインa/bテストを実施し,shopeeのレコメンデーション広告システムのメイントラフィックにデプロイする。 Graph Neural Network (GNN) is the trending solution for item retrieval in recommendation problems. Most recent reports, however, focus heavily on new model architectures. This may bring some gaps when applying GNN in the industrial setup, where, besides the model, constructing the graph and handling data sparsity also play critical roles in the overall success of the project. In this work, we report how GNN is applied for large-scale e-commerce item retrieval at Shopee. We introduce our simple yet novel and impactful techniques in graph construction, modeling, and handling data skewness. Specifically, we construct high-quality item graphs by combining strong-signal user behaviors with high-precision collaborative filtering (CF) algorithm. We then develop a new GNN architecture named LightSAGE to produce high-quality items' embeddings for vector search. Finally, we design multiple strategies to handle cold-start and long-tail items, which are critical in an advertisement (ads) system. Our models bring improvement in offline evaluations, online A/B tests, and are deployed to the main traffic of Shopee's Recommendation Advertisement system.	翻訳日:2023-11-01 20:59:50 公開日:2023-10-30
# 前庭神経鞘腫に対する臨床ガイドライン駆動自動線形特徴抽出法 A Clinical Guideline Driven Automated Linear Feature Extraction for Vestibular Schwannoma ( http://arxiv.org/abs/2310.19392v1 ) ライセンス: Link先を確認	Navodini Wijethilake, Steve Connor, Anna Oviedova, Rebecca Burger, Tom Vercauteren, Jonathan Shapey	(参考訳) 前庭神経鞘腫は、平衡神経の1つから成長する良性脳腫瘍である。患者は手術、放射線手術、あるいは保存的な「待機とスキャン」戦略で治療される。臨床医は通常、手作業で抽出した線形測定値を使って臨床意思決定を支援する。本研究の目的は,深層学習に基づくセグメンテーションを用いて,計算アルゴリズムにより関連する臨床特徴を抽出し,このプロセスを自動化し改善することである。我々の知る限り,本研究は,局所臨床ガイドラインを再現する自動アプローチを提案する最初の研究である。深層学習ベースセグメンテーションにより,T2強調MRIでは腫瘍の管外領域と腫瘍全体についてそれぞれ0.8124 +- 0.2343,0.8969 +- 0.0521のDiceスコアが得られ,T1強調MRIでは0.8222 +- 0.2108,0.9049 +- 0.0646が得られた。そこで本稿では, 腫瘍の管外部分の大きさに基づいて, 分割領域から最も適切な最大線形測定値を選択し, 抽出する新しいアルゴリズムを提案する。このツールを用いて、臨床医は、臨床意思決定の補助として機能する腫瘍進展に関する視覚ガイドと関連するメトリクスを提供される。本研究は,イギリスにおける第3次神経外科専門施設に紹介された50例から得られた187件のスキャンデータを用いた。専門神経放射線医が手動で抽出した測定値は,自動測定値と有意な相関を示した(p < 0.0001)。 Vestibular Schwannoma is a benign brain tumour that grows from one of the balance nerves. Patients may be treated by surgery, radiosurgery or with a conservative "wait-and-scan" strategy. Clinicians typically use manually extracted linear measurements to aid clinical decision making. This work aims to automate and improve this process by using deep learning based segmentation to extract relevant clinical features through computational algorithms. To the best of our knowledge, our study is the first to propose an automated approach to replicate local clinical guidelines. Our deep learning based segmentation provided Dice-scores of 0.8124 +- 0.2343 and 0.8969 +- 0.0521 for extrameatal and whole tumour regions respectively for T2 weighted MRI, whereas 0.8222 +- 0.2108 and 0.9049 +- 0.0646 were obtained for T1 weighted MRI. We propose a novel algorithm to choose and extract the most appropriate maximum linear measurement from the segmented regions based on the size of the extrameatal portion of the tumour. Using this tool, clinicians will be provided with a visual guide and related metrics relating to tumour progression that will function as a clinical decision aid. In this study, we utilize 187 scans obtained from 50 patients referred to a tertiary specialist neurosurgical service in the United Kingdom. The measurements extracted manually by an expert neuroradiologist indicated a significant correlation with the automated measurements (p < 0.0001).	翻訳日:2023-11-01 20:59:33 公開日:2023-10-30
# 因果的公平性:因果関係の橋渡し、個々人の公平性、敵対的堅牢性 Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness ( http://arxiv.org/abs/2310.19391v1 ) ライセンス: Link先を確認	Ahmad-Reza Ehyaei, Golnoosh Farnadi, Samira Samadi	(参考訳) 敵対的摂動は機械学習モデルの脆弱性を暴露するために使用され、一方個々の公平性の概念は、機密属性に関係なく公平な扱いを保証することを目的としている。最初の違いにもかかわらず、両方の概念は類似した入力データインスタンスを生成するためにメトリクスに依存している。これらの指標は、特にデータが因果構造から派生する場合にはデータの特徴と一致し、反事実的近接を反映するように設計されるべきである。このようなメトリクスを定義する以前の試みは、しばしばデータや構造的因果モデルに関する一般的な仮定を欠いている。本研究では,機密属性を含む因果構造に基づいて定式化された因果フェアメトリックを提案する。ロバストネス分析のために、保護因果摂動の概念が提示される。さらに,実世界の問題に対するメトリック推定とデプロイメントの手法を提案することにより,メトリック学習を考察した。紹介されたメトリクスは、敵対的訓練、公正学習、アルゴリズム的リコース、因果強化学習に応用できる。 Adversarial perturbation is used to expose vulnerabilities in machine learning models, while the concept of individual fairness aims to ensure equitable treatment regardless of sensitive attributes. Despite their initial differences, both concepts rely on metrics to generate similar input data instances. These metrics should be designed to align with the data's characteristics, especially when it is derived from causal structure and should reflect counterfactual proximity. Previous attempts to define such metrics often lack general assumptions about data or structural causal models. In this research, we introduce a causal fair metric formulated based on causal structures that encompass sensitive attributes. For robustness analysis, the concept of protected causal perturbation is presented. Additionally, we delve into metric learning, proposing a method for metric estimation and deployment in real-world problems. The introduced metric has applications in the fields of adversarial training, fair learning, algorithmic recourse, and causal reinforcement learning.	翻訳日:2023-11-01 20:59:10 公開日:2023-10-30
# 暗黙多様体ガウス過程回帰 Implicit Manifold Gaussian Process Regression ( http://arxiv.org/abs/2310.19390v1 ) ライセンス: Link先を確認	Bernardo Fichera, Viacheslav Borovitskiy, Andreas Krause, Aude Billard	(参考訳) ガウス過程の回帰は、よく校正された不確実性推定を提供し、小さなデータセットやスパースデータセットを扱う能力によって広く利用されている。しかし、それは高次元データに苦しむ。このテクニックを高次元にスケールする方法の1つは、データが実際に存在する暗黙の低次元多様体を、多様体仮説によって仮定されるように活用することである。以前の作業では、通常、多様体構造が明示的に与えられること、すなわちメッシュによって与えられるか、球面のようなよく知られた多様体の1つであることが知られていることを要求する。対照的に,本論文では,データ(ラベル付き,ラベルなし)から直接暗黙の構造を完全に微分可能な方法で推定できるガウス過程回帰手法を提案する。得られたモデルについて、仮定多様体上の Mat\'ern ガウス過程への収束について論じる。我々の手法は数十万のデータポイントまでスケールし、高次元設定における標準ガウス過程回帰の予測性能とキャリブレーションを改善しうる。 Gaussian process regression is widely used because of its ability to provide well-calibrated uncertainty estimates and handle small or sparse datasets. However, it struggles with high-dimensional data. One possible way to scale this technique to higher dimensions is to leverage the implicit low-dimensional manifold upon which the data actually lies, as postulated by the manifold hypothesis. Prior work ordinarily requires the manifold structure to be explicitly provided though, i.e. given by a mesh or known to be one of the well-known manifolds like the sphere. In contrast, in this paper we propose a Gaussian process regression technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way. For the resulting model, we discuss its convergence to the Mat\'ern Gaussian process on the assumed manifold. Our technique scales up to hundreds of thousands of data points, and may improve the predictive performance and calibration of the standard Gaussian process regression in high-dimensional settings.	翻訳日:2023-11-01 20:58:54 公開日:2023-10-30
# オセロは解決した Othello is Solved ( http://arxiv.org/abs/2310.19387v1 ) ライセンス: Link先を確認	Hiroki Takizawa	(参考訳) オセロのゲームは世界で最も複雑で人気のあるゲームの1つであり、まだ計算学的に解決されていない。オセロは、およそ10オクテデシリオン(10の58乗)のゲーム記録と10オクテリオン(10の28乗)のゲームポジションを持っている。オセロを解くという課題は、どちらのプレイヤーもミスを起こさない場合のゲームの結果を決定することであり、長い間コンピュータ科学における大きな挑戦であった。 本論文は、オセロが解決され、両プレーヤーによる完璧なプレーが引き分けにつながることが計算的に証明されたことを報告する。強力なオセロソフトウェアは、ヒューリスティックに設計された探索技術を使って長い間構築されてきた。ゲームの解決は、ソフトウェアがゲームを完璧にプレイできるソリューションを提供する。 The game of Othello is one of the world's most complex and popular games that has yet to be computationally solved. Othello has roughly ten octodecillion (10 to the 58th power) possible game records and ten octillion (10 to the 28th power) possible game positions. The challenge of solving Othello, determining the outcome of a game with no mistake made by either player, has long been a grand challenge in computer science. This paper announces a significant milestone: Othello is now solved, and it is computationally proved that perfect play by both players leads to a draw. Strong Othello software has long been built using heuristically designed search techniques. Solving a game provides the solution which enables software to play the game perfectly.	翻訳日:2023-11-01 20:58:37 公開日:2023-10-30
# ニューラルエミュレータを用いたサブグリッドスケールダイナミックスのグラディエントフリーオンライン学習 Gradient-free online learning of subgrid-scale dynamics with neural emulators ( http://arxiv.org/abs/2310.19385v1 ) ライセンス: Link先を確認	Hugo Frezat, Guillaume Balarac, Julien Le Sommer, Ronan Fablet	(参考訳) 本稿では,非微分型数値解法に対する$\textit{a posteriori}$損失関数を用いて,オンライン上で機械学習に基づくサブグリッドパラメータ化を学習する汎用アルゴリズムを提案する。提案手法では, ニューラルネットワークを用いて, 時間積分ステップによる勾配伝播を可能にするために, 低減状態空間ソルバの近似を学習する。このアルゴリズムは、元の解法の勾配を計算することなく、オンライン戦略の利点のほとんどを回復することができる。近似バイアスの伝播を最小化するために,各損失量と神経エミュレータとパラメトリゼーション成分を別々に訓練する必要があることを実証した。 In this paper, we propose a generic algorithm to train machine learning-based subgrid parametrizations online, i.e., with $\textit{a posteriori}$ loss functions for non-differentiable numerical solvers. The proposed approach leverages neural emulators to train an approximation of the reduced state-space solver, which is then used to allow gradient propagation through temporal integration steps. The algorithm is able to recover most of the benefit of online strategies without having to compute the gradient of the original solver. It is demonstrated that training the neural emulator and parametrization components separately with respective loss quantities is necessary in order to minimize the propagation of some approximation bias.	翻訳日:2023-11-01 20:58:25 公開日:2023-10-30
# 深部随時仮説テスト Deep anytime-valid hypothesis testing ( http://arxiv.org/abs/2310.19384v1 ) ライセンス: Link先を確認	Teodora Pandeva and Patrick Forr\'e and Aaditya Ramdas and Shubhanshu Shekhar	(参考訳) 本研究では,非パラメトリックテスト問題に対する強力な逐次的仮説テストを構築するための汎用フレームワークを提案する。これらの問題の帰無仮説は、データ分布上の2つの既知の演算子の作用を用いて抽象形式で定義される。この抽象化により、2サンプルテスト、独立テスト、条件付き独立テストのような古典的なタスクを統一的に扱うことができ、機械学習(ML)モデルの敵対的堅牢性のテストのような現代の問題も扱える。提案するフレームワークは,従来のバッチテストよりも次のような利点がある。 1)オンラインデータストリームを継続的に監視し、帰無仮説に対する証拠を効率的に集約する。 2) 多重検定の補正を必要とせず、タイプIエラーの厳密な制御を提供する。 3) 問題の未知の難しさにサンプルサイズ要件を適応させる。逐次テストの設計のためのゲーム理論的アプローチであるtesting-by-bettingフレームワークにおいて,MLモデルの表現能力を活用するための原則的アプローチを開発した。合成および実世界のデータセットに関する実証的な結果は、我々の一般的なフレームワークを用いてインスタンス化されたテストが、いくつかのタスクにおける特別なベースラインと競合していることを示している。 We propose a general framework for constructing powerful, sequential hypothesis tests for a large class of nonparametric testing problems. The null hypothesis for these problems is defined in an abstract form using the action of two known operators on the data distribution. This abstraction allows for a unified treatment of several classical tasks, such as two-sample testing, independence testing, and conditional-independence testing, as well as modern problems, such as testing for adversarial robustness of machine learning (ML) models. Our proposed framework has the following advantages over classical batch tests: 1) it continuously monitors online data streams and efficiently aggregates evidence against the null, 2) it provides tight control over the type I error without the need for multiple testing correction, 3) it adapts the sample size requirement to the unknown hardness of the problem. We develop a principled approach of leveraging the representation capability of ML models within the testing-by-betting framework, a game-theoretic approach for designing sequential tests. Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines on several tasks.	翻訳日:2023-11-01 20:58:12 公開日:2023-10-30
# 実測実験における補正ベルと非文脈不等式 Corrected Bell and Noncontextuality Inequalities for Realistic Experiments ( http://arxiv.org/abs/2310.19383v1 ) ライセンス: Link先を確認	Kim Vall\'ee, Pierre-Emmanuel Emeriau, Boris Bourdoncle, Adel Sohbi, Shane Mansfield, Damian Markham	(参考訳) 文脈性は量子相関の特徴である。非古典的現象としての基礎的な観点から、また量子優位の資源として応用的な視点から重要である。一般には隠れた変数を用いて定義され、パラメータ独立性と決定論の仮定との矛盾を強いる。前者は非シグナリングまたは非擾乱の経験的性質、後者は測定シャープネスの経験的性質によって正当化することができる。しかし、現実的な実験では、どちらの経験的性質も正確には成り立たないため、非古典性の形式としての文脈性に対する反論や、想定される量子上の利点に対する潜在的な脆弱性が生じる可能性がある。両性質を定量化するための尺度を導入し,対応する仮定の定量化された緩和を導入する。我々は、既知の文脈性の尺度である文脈分数の連続性を証明し、その雑音に対する堅牢性を保証する。次に、これらの緩和が文脈性をどの程度説明できるかを、文脈分数(あるいは任意の非文脈性不等式)への補正項を通じて制限し、実験の不完全性に対して堅牢な真の文脈性の概念に到達する。さらに,この結果が様々な確立した結果や実験的な設定に適用あるいは関連付けるのに十分な一般性を持つことを示す。 Contextuality is a feature of quantum correlations. It is crucial from a foundational perspective as a nonclassical phenomenon, and from an applied perspective as a resource for quantum advantage. It is commonly defined in terms of hidden variables, for which it forces a contradiction with the assumptions of parameter-independence and determinism. The former can be justified by the empirical property of non-signalling or non-disturbance, and the latter by the empirical property of measurement sharpness. However, in realistic experiments neither empirical property holds exactly, which leads to possible objections to contextuality as a form of nonclassicality, and potential vulnerabilities for supposed quantum advantages. We introduce measures to quantify both properties, and introduce quantified relaxations of the corresponding assumptions. We prove the continuity of a known measure of contextuality, the contextual fraction, which ensures its robustness to noise. We then bound the extent to which these relaxations can account for contextuality, via correction terms to the contextual fraction (or to any noncontextuality inequality), culminating in a notion of genuine contextuality, which is robust to experimental imperfections. We then show that our result is general enough to apply or relate to a variety of established results and experimental setups.	翻訳日:2023-11-01 20:57:55 公開日:2023-10-30
# 機械学習ショートカットによる公開データの保護 Protecting Publicly Available Data With Machine Learning Shortcuts ( http://arxiv.org/abs/2310.19381v1 ) ライセンス: Link先を確認	Nicolas M. M\"uller, Maximilian Burgert, Pascal Debus, Jennifer Williams, Philip Sperl, Konstantin B\"ottinger	(参考訳) 機械学習(ml)ショートカットやスプリアス相関はデータセット内のアーティファクトであり、非常に優れたトレーニングとテストパフォーマンスをもたらすが、モデルの一般化能力は著しく制限される。このようなショートカットはドメイン内テストパフォーマンスの良さから気づかないほど不気味なものです。本稿では,異なるショートカットの影響について検討し,簡単なショートカットであっても説明可能なAI手法により検出が難しいことを示す。私たちはこの事実を利用して、オンラインデータベースをクローラーから守るためのアプローチを設計します。デートプラットフォーム、衣料品メーカー、中古車ディーラーなどのプロバイダは、大規模にデータポイントをつかんで再送する専門化されたクローリング業界を扱わなければなりません。 MLショートカットを意図的に追加することで、抑止力を実現できることを示す。このようなデータセットはMLのユースケースでは使用できないため、クローラやインターネットからの不正なデータの使用を回避できる。 3つのユースケースから得られた実世界データを用いて,提案手法では収集したデータは使用できないが,ショートカットは人間の知覚では認識が困難であることを示す。したがって,提案手法は不正なデータクローリングに対する積極的な保護となる。 Machine-learning (ML) shortcuts or spurious correlations are artifacts in datasets that lead to very good training and test performance but severely limit the model's generalization capability. Such shortcuts are insidious because they go unnoticed due to good in-domain test performance. In this paper, we explore the influence of different shortcuts and show that even simple shortcuts are difficult to detect by explainable AI methods. We then exploit this fact and design an approach to defend online databases against crawlers: providers such as dating platforms, clothing manufacturers, or used car dealers have to deal with a professionalized crawling industry that grabs and resells data points on a large scale. We show that a deterrent can be created by deliberately adding ML shortcuts. Such augmented datasets are then unusable for ML use cases, which deters crawlers and the unauthorized use of data from the internet. Using real-world data from three use cases, we show that the proposed approach renders such collected data unusable, while the shortcut is at the same time difficult to notice in human perception. 
Thus, our proposed approach can serve as a proactive protection against illegitimate data crawling.	翻訳日:2023-11-01 20:57:32 公開日:2023-10-30
# TransXNet: 視覚認識のためのDual Dynamic Token Mixerによるグローバルおよびローカルダイナミクスの学習 TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition ( http://arxiv.org/abs/2310.19380v1 ) ライセンス: Link先を確認	Meng Lou, Hong-Yu Zhou, Sibei Yang, Yizhou Yu	(参考訳) 近年の研究では,帰納バイアスの導入と一般化性能の向上を目的として,Transformerに畳み込みを取り入れている。しかし、従来の畳み込みの静的な性質は、入力のバリエーションに動的に適応することを妨げるため、自己注意が注意行列を動的に計算するのに対し、畳み込みと自己注意の間に表現の相違が生じる。さらに、畳み込みと自己アテンションからなるトークンミキサーを積み重ねてディープネットワークを形成すると、畳み込みの静的性質は、自己アテンションによって以前に生成された特徴を畳み込みカーネルに融合させるのを妨げる。これら2つの制限は、構築されたネットワークの準最適な表現能力をもたらす。そこで本研究では,グローバルな情報と局所的な詳細を入力依存的に集約する軽量なDual Dynamic Token Mixer (D-Mixer)を提案する。 D-Mixerは、効率的なグローバルアテンションモジュールと入力依存の奥行き畳み込みを均等に分割した特徴セグメントに別々に適用し、ネットワークに強い帰納バイアスと拡張された有効受容場を与える。我々は,新しいハイブリッドCNN-TransformerビジョンバックボーンネットワークであるTransXNetを設計する上で,基本的なビルディングブロックとしてD-Mixerを使用している。 ImageNet-1Kの画像分類タスクでは、TransXNet-TはSwin-Tをtop-1精度で0.3\%上回り、必要な計算コストは半分以下である。さらに、TransXNet-SとTransXNet-Bは優れたモデルスケーラビリティを示し、それぞれ83.8\%と84.6\%のtop-1精度を実現した。さらに,提案するネットワークアーキテクチャは,様々な密な予測タスクにおいて強力な一般化能力を示し,より低い計算コストで他の最先端ネットワークを上回る。 Recent studies have integrated convolution into transformers to introduce inductive bias and improve generalization performance. However, the static nature of conventional convolution prevents it from dynamically adapting to input variations, resulting in a representation discrepancy between convolution and self-attention as self-attention calculates attention matrices dynamically. Furthermore, when stacking token mixers that consist of convolution and self-attention to form a deep network, the static nature of convolution hinders the fusion of features previously generated by self-attention into convolution kernels. These two limitations result in a sub-optimal representation capacity of the constructed networks. To find a solution, we propose a lightweight Dual Dynamic Token Mixer (D-Mixer) that aggregates global information and local details in an input-dependent way. D-Mixer works by applying an efficient global attention module and an input-dependent depthwise convolution separately on evenly split feature segments, endowing the network with strong inductive bias and an enlarged effective receptive field. We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network that delivers compelling performance. In the ImageNet-1K image classification task, TransXNet-T surpasses Swin-T by 0.3\% in top-1 accuracy while requiring less than half of the computational cost. Furthermore, TransXNet-S and TransXNet-B exhibit excellent model scalability, achieving top-1 accuracy of 83.8\% and 84.6\% respectively, with reasonable computational costs. Additionally, our proposed network architecture demonstrates strong generalization capabilities in various dense prediction tasks, outperforming other state-of-the-art networks while having lower computational costs.	翻訳日:2023-11-01 20:57:12 公開日:2023-10-30
# イメージジェネレータのハイブリッドドメイン適応 Few-shot Hybrid Domain Adaptation of Image Generators ( http://arxiv.org/abs/2310.19378v1 ) ライセンス: Link先を確認	Hengjia Li, Yang Liu, Linxuan Xia, Yuqi Lin, Tu Zheng, Zheng Yang, Wenxiao Wang, Xiaohui Zhong, Xiaobo Ren, Xiaofei He	(参考訳) 事前学習されたジェネレータは、複数のターゲットドメインのハイブリッドに適応し、それらの統合された属性で画像を生成することができるか? 本研究では、Few-shot Hybrid Domain Adaptation (HDA)という新しいタスクを導入する。ソースジェネレータといくつかのターゲットドメインを与えられたhdaは、ソースドメインの特性をオーバーライドすることなく、すべてのターゲットドメインの統合属性を保持する適応型ジェネレータの獲得を目指している。ドメイン適応(DA)と比較して、HDAはジェネレータをより複合的で拡張可能なドメインに適応するための柔軟性と汎用性を提供します。同時に、HDAは、ターゲットドメインの個々の画像のみにアクセスでき、ハイブリッドドメインの認証画像が欠如しているため、DAよりも多くの課題を提示します。この問題に対処するために、異なるドメインの画像を直接分離可能なサブ空間にエンコードする差別化フレームワークを導入する。 HDAを実現するために,距離損失と方向損失からなる新たな方向空間損失を提案する。特に、距離損失は、生成された画像からすべての対象部分空間までの距離を減らすことにより、すべての対象領域の属性をブレンドする。方向損失は、垂直部分空間に沿って適応を導くことによって、ソース領域からの特性を保存する。実験により、本手法は、セマンティクス類似性、画像忠実性、ドメイン間の一貫性においてベースラインメソッドを上回る1つの適応型ジェネレータにおいて、多数のドメイン固有の属性を得ることができることを示した。 Can a pre-trained generator be adapted to the hybrid of multiple target domains and generate images with integrated attributes of them? In this work, we introduce a new task -- Few-shot Hybrid Domain Adaptation (HDA). Given a source generator and several target domains, HDA aims to acquire an adapted generator that preserves the integrated attributes of all target domains, without overriding the source domain's characteristics. Compared with Domain Adaptation (DA), HDA offers greater flexibility and versatility to adapt generators to more composite and expansive domains. Simultaneously, HDA also presents more challenges than DA as we have access only to images from individual target domains and lack authentic images from the hybrid domain. To address this issue, we introduce a discriminator-free framework that directly encodes different domains' images into well-separable subspaces. To achieve HDA, we propose a novel directional subspace loss comprised of a distance loss and a direction loss. 
Concretely, the distance loss blends the attributes of all target domains by reducing the distances from generated images to all target subspaces. The direction loss preserves the characteristics from the source domain by guiding the adaptation along the perpendicular to subspaces. Experiments show that our method can obtain numerous domain-specific attributes in a single adapted generator, which surpasses the baseline methods in semantic similarity, image fidelity, and cross-domain consistency.	翻訳日:2023-11-01 20:56:38 公開日:2023-10-30
# 二次元共形場理論における状態準備としての不均一クエンチ Inhomogeneous quenches as state preparation in two-dimensional conformal field theories ( http://arxiv.org/abs/2310.19376v1 ) ライセンス: Link先を確認	Masahiro Nozaki, Kotaro Tamaoka, Mao Tian Tan	(参考訳) システムが特徴のない状態に進化しない非平衡過程は、非平衡現象における新しい中心対象の1つである。本稿では,2次元共形場理論($2$d CFTs)における短距離エンタングル状態、すなわち正則化を伴う境界状態から出発し,M\"obius/SSDと呼ばれる不均一ハミルトニアンで系を時間発展させる。この論文で考慮されたCFTの詳細にかかわらず、M\"obius時間発展の間、エンタングルメントエントロピーは量子リバイバルと呼ばれる周期運動を示す。 SSD時間発展の間、一部のサブシステムを除いて、長時間領域では、エンタングルメントエントロピーと相互情報量が真空状態のものと近似される。サブシステムが真空状態に冷える時間領域は$t_1 \gg \mathcal{O}(L\sqrt{l_A})$であり、ここで$t_1$、$L$、$l_A$はそれぞれ時間、システムサイズ、サブシステムサイズである。この結果は, SSDハミルトニアンにより誘導される不均一なクエンチが, ほぼ真空状態の準備として用いられうることを示唆している。さらに,本論文で検討した系の重力双対を提案し,一般化する。加えて,不均一なクエンチと連続的マルチスケールエンタングルメント繰り込みアンザッツ (cMERA) との関係について考察する。 The non-equilibrium process where the system does not evolve to the featureless state is one of the new central objects in the non-equilibrium phenomena. In this paper, starting from the short-range entangled state in the two-dimensional conformal field theories ($2$d CFTs), the boundary state with a regularization, we evolve the system with the inhomogeneous Hamiltonians called M\"obius/SSD ones. Regardless of the details of CFTs considered in this paper, during the M\"obius evolution, the entanglement entropy exhibits the periodic motion called quantum revival. During SSD time evolution, except for some subsystems, in the large time regime, entanglement entropy and mutual information are approximated by those for the vacuum state. We argue the time regime for the subsystem to cool down to vacuum one is $t_1 \gg \mathcal{O}(L\sqrt{l_A})$, where $t_1$, $L$, and $l_A$ are time, system, and subsystem sizes. This finding suggests the inhomogeneous quench induced by the SSD Hamiltonian may be used as the preparation for the approximately-vacuum state. Furthermore, we propose the gravity dual of the systems considered in this paper and generalize it. In addition, we discuss the relation between the inhomogeneous quenches and continuous multi-scale entanglement renormalization ansatz (cMERA).	翻訳日:2023-11-01 20:56:13 公開日:2023-10-30
# シーン特異的融合モジュールによるrgb-xオブジェクト検出 RGB-X Object Detection via Scene-Specific Fusion Modules ( http://arxiv.org/abs/2310.19372v1 ) ライセンス: Link先を確認	Sri Aditya Deevi, Connor Lee, Lu Gan, Sushruth Nagesh, Gaurav Pandey, and Soon-Jo Chung	(参考訳) マルチモーダル深度センサー融合は、自動運転車が周囲の環境をあらゆる天候下で視覚的に理解することを可能にする可能性がある。しかし、既存の深層センサー融合法では、通常、統合されたマルチモーダル特徴を持つ畳み込みアーキテクチャを採用しており、トレーニングには大きなコアギスタードマルチモーダルデータセットを必要とする。本研究では,シーン固有の融合モジュールを介し,事前学習した単一モードモデルの活用と融合が可能な,効率的かつモジュール化されたRGB-X融合ネットワークを提案する。実験では,rgb-thermalおよびrgb-gatedデータセットにおける既存の手法と比較して,少量の追加パラメータのみを用いて融合を行う方法が優れていることを示す。私たちのコードはhttps://github.com/dsriaditya999/RGBXFusionで利用可能です。 Multimodal deep sensor fusion has the potential to enable autonomous vehicles to visually understand their surrounding environments in all weather conditions. However, existing deep sensor fusion methods usually employ convoluted architectures with intermingled multimodal features, requiring large coregistered multimodal datasets for training. In this work, we present an efficient and modular RGB-X fusion network that can leverage and fuse pretrained single-modal models via scene-specific fusion modules, thereby enabling joint input-adaptive network architectures to be created using small, coregistered multimodal datasets. Our experiments demonstrate the superiority of our method compared to existing works on RGB-thermal and RGB-gated datasets, performing fusion using only a small amount of additional parameters. Our code is available at https://github.com/dsriaditya999/RGBXFusion.	翻訳日:2023-11-01 20:55:43 公開日:2023-10-30
# 区間値データと区間値関数データの順序分類 Ordinal classification for interval-valued data and interval-valued functional data ( http://arxiv.org/abs/2310.19433v1 ) ライセンス: Link先を確認	Aleix Alcacer, Marina Mart\'inez-Garcia, Irene Epifanio	(参考訳) 順序分類の目的は、観測された入力の集合から出力の順序ラベルを予測することである。区間値データとは、間隔の形でのデータを指す。順序分類問題において、初めて区間値データと区間値関数データが入力と見なされる。区間データと区間値関数データに対する6つの順序分類器を提案する。 3つはパラメトリックであり、1つは順序二項分解、もう1つは順序ロジスティック回帰に基づいている。他の3つの手法は、インターバルデータにおけるインターバルデータとカーネル間の距離の利用に基づいている。方法の1つは、順序分類に$k$-nearest-neighbor法を用いる。別の方法はカーネル主成分分析と順序分類器を考える。そして、最善を尽くす方法である第6の方法は、カーネルによって誘導される順序ランダムフォレストを使用する。それらは、人間の地球開発や気象データに関する合成およびオリジナルの実データを用いた広範囲な実験研究において、na\"iveなアプローチと比較される。その結果,順序と区間値情報を考慮すると精度が向上することがわかった。ソースコードとデータセットはhttps://github.com/aleixalcacer/ocfivdで入手できる。 The aim of ordinal classification is to predict the ordered labels of the output from a set of observed inputs. Interval-valued data refers to data in the form of intervals. For the first time, interval-valued data and interval-valued functional data are considered as inputs in an ordinal classification problem. Six ordinal classifiers for interval data and interval-valued functional data are proposed. Three of them are parametric, one of them is based on ordinal binary decompositions and the other two are based on ordered logistic regression. The other three methods are based on the use of distances between interval data and kernels on interval data. One of the methods uses the weighted $k$-nearest-neighbor technique for ordinal classification. Another method considers kernel principal component analysis plus an ordinal classifier. And the sixth method, which is the method that performs best, uses a kernel-induced ordinal random forest. They are compared with na\"ive approaches in an extensive experimental study with synthetic and original real data sets, about human global development, and weather data. The results show that considering ordering and interval-valued information improves the accuracy. 
The source code and data sets are available at https://github.com/aleixalcacer/OCFIVD.	翻訳日:2023-11-01 20:47:22 公開日:2023-10-30
# ロボット操作のためのディープポリシーネットワークの決定を説明する Explaining the Decisions of Deep Policy Networks for Robotic Manipulations ( http://arxiv.org/abs/2310.19432v1 ) ライセンス: Link先を確認	Seongun Kim, Jaesik Choi	(参考訳) ディープポリシーネットワークは、ロボットが行動を学び、エンド・ツー・エンドの方法で様々な現実世界の複雑なタスクを解決できるようにする。しかし、行動の理由を提供するための透明性が欠如している。したがって、そのようなブラックボックスモデルは、実際にロボットを配置する際の信頼性が低く、破壊的な動作をもたらすことが多い。透明性を高めるためには,各入力特徴が与えられた行動決定にどの程度寄与するかを考慮し,ロボットの動作を説明することが重要である。本稿では,入力帰属法による深い政策モデルの明示的な分析を行い,各入力特徴がロボットの政策モデルの判断にどの程度影響するかを説明する。そこで本研究では,ロボットポリシネットワークに入力帰属法を適用するための2つの方法を提案する。(1) エンドエフェクタ運動に対するモータトルクの影響を反映するために,各関節トルクの重要度を測定し,(2) 負の入力と深いポリシネットワークの出力を適切に処理するための関連伝搬法を修正する。我々の知る限りでは、ロボット操作のためにオンラインのディープポリシーネットワークにおけるマルチモーダルセンサ入力の入力属性の動的変化を特定する最初のレポートである。 Deep policy networks enable robots to learn behaviors to solve various real-world complex tasks in an end-to-end fashion. However, they lack transparency to provide the reasons of actions. Thus, such a black-box model often results in low reliability and disruptive actions during the deployment of the robot in practice. To enhance its transparency, it is important to explain robot behaviors by considering the extent to which each input feature contributes to determining a given action. In this paper, we present an explicit analysis of deep policy models through input attribution methods to explain how and to what extent each input feature affects the decisions of the robot policy models. To this end, we present two methods for applying input attribution methods to robot policy networks: (1) we measure the importance factor of each joint torque to reflect the influence of the motor torque on the end-effector movement, and (2) we modify a relevance propagation method to handle negative inputs and outputs in deep policy networks properly. To the best of our knowledge, this is the first report to identify the dynamic changes of input attributions of multi-modal sensor inputs in deep policy networks online for robotic manipulation.	翻訳日:2023-11-01 20:47:06 公開日:2023-10-30
# 実行不可能な計画の自動検出による信頼性の高い行動合成のための拡散プランナーの改良 Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans ( http://arxiv.org/abs/2310.19427v1 ) ライセンス: Link先を確認	Kyowoon Lee, Seongun Kim and Jaesik Choi	(参考訳) 拡散に基づく計画法は、軌道拡散モデルを訓練し、補助的な誘導関数を用いてサンプリングした軌道を条件付けることで、長期的でスパース報酬のタスクにおいて有望な結果を示してきた。しかし、生成モデルとしての性質上、拡散モデルは実行可能な計画を生成することが保証されておらず、実行の失敗を招き、安全性が重要な応用でプランナーを利用することを妨げている。本研究では、誤りを起こしやすい計画に改良のためのガイダンスを与えることで、拡散モデルが生成した信頼性の低い計画を改良する新しい手法を提案する。そのために、拡散モデルが生成した個々の計画の品質を評価する、復元ギャップ（restoration gap）という新しい指標を提案する。復元ギャップはギャップ予測器によって推定され、この予測器が拡散プランナーを改良するための復元ギャップガイダンスを生成する。さらに、準最適なギャップ予測器から生じうる敵対的な改良ガイダンスを防ぎ、実行不可能な計画のさらなる改良を可能にする帰属マップ正則化器を提案する。長期的な計画を必要とするオフライン制御設定の3つのベンチマークで、提案手法の有効性を示す。また、ギャップ予測器の帰属マップを提示し、誤りを起こしやすい遷移を強調することで、提案手法が説明可能性を備え、生成された計画をより深く理解できることを示す。 Diffusion-based planning has shown promising results in long-horizon, sparse-reward tasks by training trajectory diffusion models and conditioning the sampled trajectories using auxiliary guidance functions. However, due to their nature as generative models, diffusion models are not guaranteed to generate feasible plans, resulting in failed execution and precluding planners from being useful in safety-critical applications. In this work, we propose a novel approach to refine unreliable plans generated by diffusion models by providing refining guidance to error-prone plans. To this end, we suggest a new metric named restoration gap for evaluating the quality of individual plans generated by the diffusion model. A restoration gap is estimated by a gap predictor which produces restoration gap guidance to refine a diffusion planner. We additionally present an attribution map regularizer to prevent adversarial refining guidance that could be generated from the sub-optimal gap predictor, which enables further refinement of infeasible plans. We demonstrate the effectiveness of our approach on three different benchmarks in offline control settings that require long-horizon planning. 
We also illustrate that our approach presents explainability by presenting the attribution maps of the gap predictor and highlighting error-prone transitions, allowing for a deeper understanding of the generated plans.	翻訳日:2023-11-01 20:46:45 公開日:2023-10-30
# 人工知能と人文科学の限界 Artificial intelligence and the limits of the humanities ( http://arxiv.org/abs/2310.19425v1 ) ライセンス: Link先を確認	Włodzisław Duch	(参考訳) 現代世界における文化の複雑さは、いまや人間の理解を超えている。認知科学は、メンタルモデルに基づく伝統的な説明に疑問を投げかけている。人文科学の中核的な主題は、その重要性を失う可能性がある。人文科学はデジタル時代に適応しなければならない。人文科学の新しい学際的な分野が出現している。情報への即時アクセスは、知識への即時アクセスに置き換えられるだろう。人間の認知的限界と、人工知能の発展および地球規模の課題に対処するために必要な学際研究によって開かれる機会を理解することが、人文科学の再活性化の鍵となる。人工知能は、芸術から政治学、哲学に至るまで人文科学を根本的に変え、これらの学問分野を学生にとって魅力的なものにし、現在の限界を超えることを可能にする。 The complexity of cultures in the modern world is now beyond human comprehension. Cognitive sciences cast doubts on the traditional explanations based on mental models. The core subjects in humanities may lose their importance. Humanities have to adapt to the digital age. New, interdisciplinary branches of humanities emerge. Instant access to information will be replaced by instant access to knowledge. Understanding the cognitive limitations of humans and the opportunities opened by the development of artificial intelligence and interdisciplinary research necessary to address global challenges is the key to the revitalization of humanities. Artificial intelligence will radically change humanities, from art to political sciences and philosophy, making these disciplines attractive to students and enabling them to go beyond current limitations.	翻訳日:2023-11-01 20:46:20 公開日:2023-10-30
# 教師なしスキル発見のための変分カリキュラム強化学習 Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills ( http://arxiv.org/abs/2310.19424v1 ) ライセンス: Link先を確認	Seongun Kim, Kyowoon Lee, Jaesik Choi	(参考訳) 相互情報(MI)の最大化や変動エンパワーメントを通じて,タスク指向の報酬関数を使わずに複雑なスキルを自律的に獲得するための,有望なフレームワークとして,相互情報に基づく強化学習(RL)が提案されている。しかしながら、トレーニングスキルの順序がサンプル効率に大きく影響するという事実から、複雑なスキルの習得は依然として困難である。そこで本研究では,変分カリキュラムRL (VCRL) と命名する本質的な報酬関数を持つ目標条件付きRLにおいて,変分エンパワーメントをカリキュラム学習として再放送する。そこで本稿では,情報理論に基づく教師なしスキル発見のための新しい手法として,VUVC(Value Uncertainty Variational Curriculum)を提案する。規則性条件下では、VUVCは、均一なカリキュラムに比べて訪問状態のエントロピーの増加を加速させる。複雑なナビゲーションおよびロボット操作作業におけるアプローチの有効性を,サンプル効率と状態カバレッジ速度の観点から検証した。また,本手法によって得られたスキルが,実世界のロボットナビゲーションタスクをゼロショットで達成し,これらのスキルをグローバルプランナーに組み込むことにより,さらに性能が向上することを示す。 Mutual information-based reinforcement learning (RL) has been proposed as a promising framework for retrieving complex skills autonomously without a task-oriented reward function through mutual information (MI) maximization or variational empowerment. However, learning complex skills is still challenging, due to the fact that the order of training skills can largely affect sample efficiency. Inspired by this, we recast variational empowerment as curriculum learning in goal-conditioned RL with an intrinsic reward function, which we name Variational Curriculum RL (VCRL). From this perspective, we propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC). We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum. We validate the effectiveness of our approach on complex navigation and robotic manipulation tasks in terms of sample efficiency and state coverage speed. 
We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup and that incorporating these skills with a global planner further increases the performance.	翻訳日:2023-11-01 20:46:12 公開日:2023-10-30
# 平均BERTによる言語教育 : 低リソース環境における潜伏ブートストラップの効果 Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings ( http://arxiv.org/abs/2310.19420v1 ) ライセンス: Link先を確認	David Samuel	(参考訳) 本稿では,言語モデルの事前学習における代替的自己スーパービジョン手法である潜在ブートストラップの利用について検討する。離散サブワードで自己スーパービジョンを使用する典型的な方法とは異なり、潜在ブートストラップはよりリッチな監視信号にコンテキスト化された埋め込みを利用する。限られた資源から言語知識を得る上で,このアプローチがいかに効果的かを評価する実験を行う。具体的には,2つの小さなコーパスの事前学習と4つの言語ベンチマークの評価を含む,BabyLM共有タスクに基づく実験を行った。 This paper explores the use of latent bootstrapping, an alternative self-supervision technique, for pretraining language models. Unlike the typical practice of using self-supervision on discrete subwords, latent bootstrapping leverages contextualized embeddings for a richer supervision signal. We conduct experiments to assess how effective this approach is for acquiring linguistic knowledge from limited resources. Specifically, our experiments are based on the BabyLM shared task, which includes pretraining on two small curated corpora and an evaluation on four linguistic benchmarks.	翻訳日:2023-11-01 20:45:51 公開日:2023-10-30
# 固有ベクトル継続と投影型エミュレータ Eigenvector Continuation and Projection-Based Emulators ( http://arxiv.org/abs/2310.19419v1 ) ライセンス: Link先を確認	Thomas Duguet, Andreas Ekström, Richard J. Furnstahl, Sebastian König, Dean Lee	(参考訳) 固有ベクトル継続(Eigenvector continuation)は、異なるパラメータ集合の固有ベクトルスナップショットから導出した基底による部分空間射影を用いる、パラメトリック固有値問題の計算手法である。これは、縮約基底法(reduced-basis method)と呼ばれる、より広範な部分空間射影技法のクラスの一部である。このコロキウム論文では、固有ベクトル継続および投影型エミュレータの開発、理論、応用について述べる。基礎概念を紹介し、基礎理論と収束特性を議論し、量子系への最近の応用と今後の展望を示す。 Eigenvector continuation is a computational method for parametric eigenvalue problems that uses subspace projection with a basis derived from eigenvector snapshots from different parameter sets. It is part of a broader class of subspace-projection techniques called reduced-basis methods. In this colloquium article, we present the development, theory, and applications of eigenvector continuation and projection-based emulators. We introduce the basic concepts, discuss the underlying theory and convergence properties, and present recent applications for quantum systems and future prospects.	翻訳日:2023-11-01 20:45:41 公開日:2023-10-30
# GaitFormer: ノイズの多いマルチタスク学習による歩行表現の学習 GaitFormer: Learning Gait Representations with Noisy Multi-Task Learning ( http://arxiv.org/abs/2310.19418v1 ) ライセンス: Link先を確認	Adrian Cosma, Emilian Radoi	(参考訳) 歩行分析は、被験者の協力に頼らずに個人識別を行うための信頼できる方法であることが示されている。歩行は、短期間では大きく変化しない生体特徴であり、個人に固有のものと見なすことができる。これまでの歩行分析の研究は主に個人識別と人口統計学的属性の推定に焦点を当てており、外見に基づく手法が依拠する歩行者属性の多くは考慮されてこなかった。本研究では、歩行に基づく人物識別に加えて、動きのパターンのみから歩行者属性を識別することを検討する。 42の外観属性が自動的に付与された217Kの匿名化トラックレットを含む、歩行分析システムの事前学習用としては最大のデータセットであるDenseGaitを提案する。 DenseGaitはビデオストリームを自動処理して構築され、現実世界に存在する歩行の共変量を幅広く含む。データセットは研究コミュニティに公開する。さらに、TransformerベースのモデルであるGaitFormerを提案する。このモデルはDenseGait上でマルチタスク方式で事前学習した後、手動でアノテーションされたデータを一切使わずに、CASIA-Bで92.5%、FVGで85.33%の精度を達成する。これは、類似の手法と比較して、それぞれ+14.2%と+9.67%の精度向上に相当する。さらに、GaitFormerは動きのパターンのみを用いて、性別情報と多数の外観属性を正確に識別できる。実験を再現するコードは公開されている。 Gait analysis is proven to be a reliable way to perform person identification without relying on subject cooperation. Walking is a biometric that does not significantly change in short periods of time and can be regarded as unique to each person. So far, the study of gait analysis focused mostly on identification and demographics estimation, without considering many of the pedestrian attributes that appearance-based methods rely on. In this work, alongside gait-based person identification, we explore pedestrian attribute identification solely from movement patterns. We propose DenseGait, the largest dataset for pretraining gait analysis systems containing 217K anonymized tracklets, annotated automatically with 42 appearance attributes. DenseGait is constructed by automatically processing video streams and offers the full array of gait covariates present in the real world. We make the dataset available to the research community. Additionally, we propose GaitFormer, a transformer-based model that after pretraining in a multi-task fashion on DenseGait, achieves 92.5% accuracy on CASIA-B and 85.33% on FVG, without utilizing any manually annotated data. This corresponds to a +14.2% and +9.67% accuracy increase compared to similar methods. 
Moreover, GaitFormer is able to accurately identify gender information and a multitude of appearance attributes utilizing only movement patterns. The code to reproduce the experiments is made publicly available.	翻訳日:2023-11-01 20:45:32 公開日:2023-10-30
# 量子多体問題解決に向けた量子実験データの機械学習 Machine learning on quantum experimental data toward solving quantum many-body problems ( http://arxiv.org/abs/2310.19416v1 ) ライセンス: Link先を確認	Gyungmin Cho, Dohun Kim	(参考訳) 量子ハードウェアの実装の進歩により、古典的コンピュータによるエミュレーションでは難解なデータの獲得が可能となった。これらのデータと古典的機械学習(ML)アルゴリズムの統合は、あいまいなパターンを明らかにする可能性を秘めている。このハイブリッドアプローチは、古典的コンピュータのみを用いた場合と比較して、効率よく解ける問題のクラスを拡大するが、現在の量子コンピュータにおけるノイズの出現により、制限された問題を解くために実現されている。ここでは、与えられたハミルトニアンの基底状態の性質の予測や量子位相の分類など、多体物理学における興味のある問題へのハイブリッドアプローチの適用性を拡張する。 127量子ビットの超伝導量子ハードウェア上で,様々なエラー低減手法を用いて実験を行い,量子コンピュータから洗練されたデータを取得することができた。これにより,最大44キュービットのシステムに対して,古典的MLアルゴリズムの実装を成功させることができた。量子実験データ処理における古典的MLアルゴリズムのスケーラビリティと有効性を検証する。 Advancements in the implementation of quantum hardware have enabled the acquisition of data that are intractable for emulation with classical computers. The integration of classical machine learning (ML) algorithms with these data holds potential for unveiling obscure patterns. Although this hybrid approach extends the class of efficiently solvable problems compared to using only classical computers, this approach has been realized for solving restricted problems because of the prevalence of noise in current quantum computers. Here, we extend the applicability of the hybrid approach to problems of interest in many-body physics, such as predicting the properties of the ground state of a given Hamiltonian and classifying quantum phases. By performing experiments with various error-reducing procedures on superconducting quantum hardware with 127 qubits, we managed to acquire refined data from the quantum computer. This enabled us to demonstrate the successful implementation of classical ML algorithms for systems with up to 44 qubits. Our results verify the scalability and effectiveness of the classical ML algorithms for processing quantum experimental data.	翻訳日:2023-11-01 20:45:08 公開日:2023-10-30
# CARPE-ID: 個人化ロボット支援のための連続適応型再識別 CARPE-ID: Continuously Adaptable Re-identification for Personalized Robot Assistance ( http://arxiv.org/abs/2310.19413v1 ) ライセンス: Link先を確認	Federico Rollo, Andrea Zunino, Nikolaos Tsagarakis, Enrico Mingo Hoffman, Arash Ajoudani	(参考訳) 今日のHuman-Robot Interaction(HRI)のシナリオでは、ロボットが最も近くにいる人と協力する、あるいはシーンに人間が1人しかいないと仮定する傾向が一般的である。しかし、工場の作業現場のような現実的なシナリオでは、こうした仮定は成り立たない場合があり、混雑した環境でロボットが特定の対象人物を認識する必要がある。この要件を満たすために、本研究では、継続的な視覚適応技術に基づく人物再識別モジュールを提案する。これにより、外観の変化や部分的または完全なオクルージョンがあっても、ロボットが適切な人物とシームレスに協調できる。実験室環境で記録したビデオを用いたフレームワーク単体でのテストと、HRIシナリオ、すなわち移動ロボットによる人物追従タスクでのテストを行う。対象者には、オクルージョンや服装の変化といった難しいケースをテストするため、追跡中に外観を変え、カメラの視野から姿を消すよう依頼した。提案フレームワークを最先端のマルチオブジェクトトラッキング(MOT)手法の1つと比較した結果、CARPE-IDは(2つの限界ケースを除く)すべてのケースで、実験を通じて選択された各対象を正確に追跡できることが示された。一方、最先端のMOTはビデオ1本あたり平均4回のトラッキングエラーを起こす。 In today's Human-Robot Interaction (HRI) scenarios, a prevailing tendency exists to assume that the robot shall cooperate with the closest individual or that the scene involves merely a singular human actor. However, in realistic scenarios, such as shop floor operations, such an assumption may not hold and personalized target recognition by the robot in crowded environments is required. To fulfil this requirement, in this work, we propose a person re-identification module based on continual visual adaptation techniques that ensure the robot's seamless cooperation with the appropriate individual even subject to varying visual appearances or partial or complete occlusions. We test the framework singularly using recorded videos in a laboratory environment and an HRI scenario, i.e., a person-following task by a mobile robot. The targets are asked to change their appearance during tracking and to disappear from the camera field of view to test the challenging cases of occlusion and outfit variations. 
We compare our framework with one of the state-of-the-art Multi-Object Tracking (MOT) methods and the results show that the CARPE-ID can accurately track each selected target throughout the experiments in all the cases (except two limit cases). At the same time, the s-o-t-a MOT has a mean of 4 tracking errors for each video.	翻訳日:2023-11-01 20:44:36 公開日:2023-10-30
# マンモグラム画像を用いたヒューリスティック支援Trans-Res-U-NetとマルチスケールDenseNetによるインテリジェント乳癌診断 Intelligent Breast Cancer Diagnosis with Heuristic-assisted Trans-Res-U-Net and Multiscale DenseNet using Mammogram Images ( http://arxiv.org/abs/2310.19411v1 ) ライセンス: Link先を確認	Muhammad Yaqub, Feng Jinchao	(参考訳) 乳癌(BC)は女性のがん関連死亡の大きな要因であり、患者の予後を最善にするうえで早期発見が極めて重要である。マンモグラフィは乳房の異常を特定し診断するための重要なツールであるが、悪性の腫瘤病変を正確に鑑別することは依然として困難である。この問題に対処するため、マンモグラフィ画像を用いたBCスクリーニングのための新しい深層学習手法を提案する。提案モデルは、確立されたベンチマークソースからのデータ収集、Atrous Convolution-based Attentive and Adaptive Trans-Res-UNet (ACA-ATRUNet)アーキテクチャによる画像セグメンテーション、Atrous Convolution-based Attentive and Adaptive Multi-scale DenseNet (ACA-AMDN)モデルによるBC識別の3つの段階からなる。 ACA-ATRUNetおよびACA-AMDNモデルのハイパーパラメータは、Modified Mussel Length-based Eurasian Oystercatcher Optimization (MML-EOO)アルゴリズムを用いて最適化される。複数の指標を用いて性能を評価し、従来手法との比較分析を示す。実験の結果、提案するBC検出フレームワークは疾患の早期発見において優れた精度を達成し、マンモグラフィに基づくスクリーニング手法を向上させる可能性があることが示された。 Breast cancer (BC) significantly contributes to cancer-related mortality in women, underscoring the criticality of early detection for optimal patient outcomes. Mammography is a key tool for identifying and diagnosing breast abnormalities; however, accurately distinguishing malignant mass lesions remains challenging. To address this issue, we propose a novel deep learning approach for BC screening utilizing mammography images. Our proposed model comprises three distinct stages: data collection from established benchmark sources, image segmentation employing an Atrous Convolution-based Attentive and Adaptive Trans-Res-UNet (ACA-ATRUNet) architecture, and BC identification via an Atrous Convolution-based Attentive and Adaptive Multi-scale DenseNet (ACA-AMDN) model. The hyperparameters within the ACA-ATRUNet and ACA-AMDN models are optimised using the Modified Mussel Length-based Eurasian Oystercatcher Optimization (MML-EOO) algorithm. Performance evaluation, leveraging multiple metrics, is conducted, and a comparative analysis against conventional methods is presented. 
Our experimental findings reveal that the proposed BC detection framework attains superior precision rates in early disease detection, demonstrating its potential to enhance mammography-based screening methodologies.	翻訳日:2023-11-01 20:44:12 公開日:2023-10-30
# 生成されたディストリビューションは、生成モデルに対するメンバーシップ推論攻撃に必要なすべてである Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models ( http://arxiv.org/abs/2310.19410v1 ) ライセンス: Link先を確認	Minxing Zhang, Ning Yu, Rui Wen, Michael Backes, Yang Zhang	(参考訳) 生成モデルは様々な視覚的創作タスクで革命的な成功を収めてきたが、同時に、訓練データの個人情報を漏洩するという脅威にさらされている。クエリ画像を訓練データセットのメンバーか非メンバーかに分類することで生成モデルのプライバシ脆弱性を示す、いくつかのメンバーシップ推論攻撃(MIA)が提案されている。しかし、これらの攻撃には、シャドウモデルやホワイトボックスアクセスを必要とすること、拡散モデル固有の性質を無視するか、あるいはそれだけに注目していることなど、大きな制限があり、複数の生成モデルへの一般化を妨げている。対照的に、我々は、敵対的生成ネットワーク、[変分]オートエンコーダ、暗黙関数、新興の拡散モデルなど、様々な生成モデルに対する初の一般化されたメンバーシップ推論攻撃を提案する。ターゲット生成器から生成された分布と補助的な非メンバーデータセットのみを利用するため、ターゲット生成器をブラックボックスとして扱い、そのアーキテクチャやアプリケーションシナリオには依存しない。実験により、すべての生成モデルが我々の攻撃に対して脆弱であることを検証した。例えば、CIFAR-10とCelebAで訓練されたDDPM、DDIM、FastDPMに対して攻撃AUC $>0.99$を達成する。また、VQGAN、LDM(テキスト条件付き生成)、LIIFに対する攻撃はAUC $>0.90$を達成する。以上の結果から、生成モデルを設計・公開する際には、このようなプライバシ漏洩リスクに注意するようコミュニティに呼びかける。 Generative models have demonstrated revolutionary success in various visual creation tasks, but in the meantime, they have been exposed to the threat of leaking private information of their training data. Several membership inference attacks (MIAs) have been proposed to exhibit the privacy vulnerability of generative models by classifying a query image as a training dataset member or nonmember. However, these attacks suffer from major limitations, such as requiring shadow models and white-box access, and either ignoring or only focusing on the unique property of diffusion models, which block their generalization to multiple generative models. In contrast, we propose the first generalized membership inference attack against a variety of generative models such as generative adversarial networks, [variational] autoencoders, implicit functions, and the emerging diffusion models. 
We leverage only generated distributions from target generators and auxiliary non-member datasets, therefore regarding target generators as black boxes and agnostic to their architectures or application scenarios. Experiments validate that all the generative models are vulnerable to our attack. For instance, our work achieves attack AUC $>0.99$ against DDPM, DDIM, and FastDPM trained on CIFAR-10 and CelebA. And the attack against VQGAN, LDM (for the text-conditional generation), and LIIF achieves AUC $>0.90$. As a result, we appeal to our community to be aware of such privacy leakage risks when designing and publishing generative models.	翻訳日:2023-11-01 20:43:43 公開日:2023-10-30
# 廃棄物分別のための資源制約付きセマンティックセグメンテーション Resource Constrained Semantic Segmentation for Waste Sorting ( http://arxiv.org/abs/2310.19407v1 ) ライセンス: Link先を確認	Elisa Cascina, Andrea Pellegrino, Lorenzo Tozzi	(参考訳) 本研究は、増加する廃棄物の環境への影響を最小限に抑えるため、資源回収施設(Materials Recovery Facilities)における効率的な廃棄物分別戦略の必要性に取り組む。産業環境でリサイクル可能な廃棄物をセグメンテーションするための、資源制約付きセマンティックセグメンテーションモデルを提案する。目標は、処理能力の限られたエッジアプリケーションに適した、10MBのメモリ制約に収まるモデルを開発することである。 ICNet、BiSeNet(Xception39バックボーン)、ENetの3つのネットワークで実験を行う。上記の制約を踏まえ、より大きなネットワークに量子化とプルーニングを適用し、平均IoUへの影響をわずかに抑えながら良好な結果を得た。さらに、Focal損失とLovász損失を組み合わせることで暗黙のクラス不均衡に対処し、クロスエントロピー損失関数よりも高い性能を得る手法を提案する。 This work addresses the need for efficient waste sorting strategies in Materials Recovery Facilities to minimize the environmental impact of rising waste. We propose resource-constrained semantic segmentation models for segmenting recyclable waste in industrial settings. Our goal is to develop models that fit within a 10MB memory constraint, suitable for edge applications with limited processing capacity. We perform the experiments on three networks: ICNet, BiSeNet (Xception39 backbone), and ENet. Given the aforementioned limitation, we implement quantization and pruning techniques on the broader nets, achieving positive results while marginally impacting the Mean IoU metric. Furthermore, we propose a combination of Focal and Lovász loss that addresses the implicit class imbalance resulting in better performance compared with the Cross-entropy loss function.	翻訳日:2023-11-01 20:43:15 公開日:2023-10-30
# 効果的な畳み込みネットワークの設計による物体検出のためのレーダー・ライダー融合 Radar-Lidar Fusion for Object Detection by Designing Effective Convolution Networks ( http://arxiv.org/abs/2310.19405v1 ) ライセンス: Link先を確認	Farzeen Munir, Shoaib Azam, Tomasz Kucner, Ville Kyrki, Moongu Jeon	(参考訳) 物体検出は知覚システムの中核的な構成要素であり、安全な経路計画を確保するために自車両に周囲の情報を提供する。カメラとLidarは知覚システムを大きく進歩させたが、悪天候下ではその性能が制限されることがある。対照的に、ミリ波技術によりレーダーはそのような状況でも効果的に機能する。しかし、レーダーのデータはスパースであるため、レーダーだけに頼って知覚システムを構築しても環境を完全には捉えられない。これに対処するために、センサーフュージョン戦略が導入されてきた。物体検出を強化するために、レーダーとLidarのデータを統合するデュアルブランチフレームワークを提案する。主ブランチはレーダー特徴の抽出に注力し、補助ブランチはLidar特徴を抽出する。これらを加法的注意(additive attention)を用いて結合する。その後、統合した特徴を新たなParallel Forked Structure (PFS)で処理し、スケールの変動に対応する。次に、領域提案ヘッドを物体検出に用いる。COCOメトリクスを用いて、Radiateデータセット上で提案手法の有効性を評価した。その結果、好天時と悪天候時のそれぞれで、最先端の手法を$1.89\%$と$2.61\%$上回った。これは、特に厳しい気象条件において、正確な物体検出と位置特定を達成するうえでのレーダーとLidarの融合の価値を裏付けている。 Object detection is a core component of perception systems, providing the ego vehicle with information about its surroundings to ensure safe route planning. While cameras and Lidar have significantly advanced perception systems, their performance can be limited in adverse weather conditions. In contrast, millimeter-wave technology enables radars to function effectively in such conditions. However, relying solely on radar for building a perception system doesn't fully capture the environment due to the data's sparse nature. To address this, sensor fusion strategies have been introduced. We propose a dual-branch framework to integrate radar and Lidar data for enhanced object detection. The primary branch focuses on extracting radar features, while the auxiliary branch extracts Lidar features. These are then combined using additive attention. Subsequently, the integrated features are processed through a novel Parallel Forked Structure (PFS) to manage scale variations. A region proposal head is then utilized for object detection. We evaluated the effectiveness of our proposed method on the Radiate dataset using COCO metrics. 
The results show that it surpasses state-of-the-art methods by $1.89\%$ and $2.61\%$ in favorable and adverse weather conditions, respectively. This underscores the value of radar-Lidar fusion in achieving precise object detection and localization, especially in challenging weather conditions.	翻訳日:2023-11-01 20:43:01 公開日:2023-10-30
# 多エージェント協調学習システムのための後悔最小化アルゴリズム Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems ( http://arxiv.org/abs/2310.19468v1 ) ライセンス: Link先を確認	Jialin Yi	(参考訳) MACL(Multi-Agent Cooperative Learning)システムは、複数の学習エージェントが協力して共通のタスクを遂行する人工知能(AI)システムである。様々な領域(例えば、交通制御、クラウドコンピューティング、ロボティクス)におけるMACLシステムの最近の実証的な成功は、逐次意思決定問題のためのMACLシステムの設計と分析に関する活発な研究を引き起こした。意思決定問題に対する学習アルゴリズムの重要な指標の1つは後悔(regret)、すなわち達成可能な最大の報酬とアルゴリズムが実際に得る報酬との差である。低後悔の学習アルゴリズムを備えたMACLシステムの設計と開発は、大きな経済的価値を生み出しうる。本論文では、様々な逐次意思決定問題に対するMACLシステムを解析する。具体的には、第3章と第4章で、全情報フィードバックまたはバンディットフィードバックを伴う協調型マルチエージェント多腕バンディット問題を調査する。この問題では、複数の学習エージェントが通信ネットワークを介して情報を交換でき、エージェントは自らが選択した行動の報酬だけを観測できる。第5章では、分散環境でのオンライン凸最適化における通信と後悔のトレードオフを考察する。第6章では、適応的な漸進的マッチングを用いて、未知だが固定の型を持つエージェントから生産性の高いチームを形成する方法を論じる。以上の問題に対して、実行可能な学習アルゴリズムの後悔の下界を示し、この下界を達成する効率的なアルゴリズムを提供する。第3章、第4章、第5章で示す後悔の境界は、後悔が通信ネットワークの接続性や通信遅延にどのように依存するかを定量化しており、MACLシステムにおける通信プロトコルの設計に有用な指針を与える。 A Multi-Agent Cooperative Learning (MACL) system is an artificial intelligence (AI) system where multiple learning agents work together to complete a common task. Recent empirical success of MACL systems in various domains (e.g. traffic control, cloud computing, robotics) has sparked active research into the design and analysis of MACL systems for sequential decision making problems. One important metric of the learning algorithm for decision making problems is its regret, i.e. the difference between the highest achievable reward and the actual reward that the algorithm gains. The design and development of a MACL system with low-regret learning algorithms can create huge economic values. In this thesis, I analyze MACL systems for different sequential decision making problems. 
Concretely, Chapters 3 and 4 investigate the cooperative multi-agent multi-armed bandit problems, with full-information or bandit feedback, in which multiple learning agents can exchange their information through a communication network and the agents can only observe the rewards of the actions they choose. Chapter 5 considers the communication-regret trade-off for online convex optimization in the distributed setting. Chapter 6 discusses how to form highly productive teams for agents based on their unknown but fixed types using adaptive incremental matchings. For the above problems, I present the regret lower bounds for feasible learning algorithms and provide efficient algorithms that achieve these bounds. The regret bounds I present in Chapters 3, 4 and 5 quantify how the regret depends on the connectivity of the communication network and the communication delay, thus giving useful guidance on the design of the communication protocol in MACL systems.	翻訳日:2023-11-01 20:36:00 公開日:2023-10-30
# ニューラルインプシット関数の混合による生成的ニューラルフィールド Generative Neural Fields by Mixtures of Neural Implicit Functions ( http://arxiv.org/abs/2310.19464v1 ) ライセンス: Link先を確認	Tackgeun You and Mijeong Kim and Jungtaek Kim and Bohyung Han	(参考訳) 本稿では,暗黙的ベースネットワークの線形結合によって表現される生成的ニューラルネットワークの学習手法を提案する。提案アルゴリズムは,メタラーニングや自動デコーディングのパラダイムを採用することにより,暗黙のニューラルネットワーク表現とその係数を潜在空間で学習する。提案手法は, モデル平均化により, 推定するネットワークのサイズを小さく保ちながら, ベースネットワークの数を増やすことにより, 生成するニューラルネットワークの容量を容易に拡大する。したがって、モデルを用いたインスタンスのサンプリングは、レイテンシとメモリフットプリントの点で効率的である。さらに,対象タスクの拡散確率モデルをカスタマイズして潜時混合係数をサンプリングし,最終モデルで目に見えないデータを効果的に生成する。提案手法は,画像,ボクセルデータ,NeRFシーンの様々なベンチマークにおいて,特定のモダリティやドメインの高度な設計を伴わずに,競合生成性能を実現する。 We propose a novel approach to learning the generative neural fields represented by linear combinations of implicit basis networks. Our algorithm learns basis networks in the form of implicit neural representations and their coefficients in a latent space by either conducting meta-learning or adopting auto-decoding paradigms. The proposed method easily enlarges the capacity of generative neural fields by increasing the number of basis networks while maintaining the size of a network for inference to be small through their weighted model averaging. Consequently, sampling instances using the model is efficient in terms of latency and memory footprint. Moreover, we customize denoising diffusion probabilistic model for a target task to sample latent mixture coefficients, which allows our final model to generate unseen data effectively. Experiments show that our approach achieves competitive generation performance on diverse benchmarks for images, voxel data, and NeRF scenes without sophisticated designs for specific modalities and domains.	翻訳日:2023-11-01 20:35:32 公開日:2023-10-30
# ゴールまでのコストを推定するためではなく、順位付けのために計画ヒューリスティックを最適化する Optimize Planning Heuristics to Rank, not to Estimate Cost-to-Goal ( http://arxiv.org/abs/2310.19463v1 ) ライセンス: Link先を確認	Leah Chrestien, Tomás Pevný, Stefan Edelkamp, Antonín Komenda	(参考訳) プランニングのための模倣学習では、解決済みの問題インスタンスの集合に対してヒューリスティック関数のパラメータを最適化する。本研究は、前向き探索アルゴリズム(主にA*と貪欲最良優先探索)に対して、返される最適経路上の状態のみを展開する、厳密に最適効率なヒューリスティックの必要十分条件を再検討する。次に、与えられた前向き探索アルゴリズムの変種に合わせた、ランキングに基づく損失関数の族を提案する。さらに、学習理論の観点から、ゴールまでのコスト$h^*$の最適化が不必要に難しい理由を議論する。多様な問題に対する実験的比較は、導出した理論を明確に支持している。 In imitation learning for planning, parameters of heuristic functions are optimized against a set of solved problem instances. This work revisits the necessary and sufficient conditions of strictly optimally efficient heuristics for forward search algorithms, mainly A* and greedy best-first search, which expand only states on the returned optimal path. It then proposes a family of loss functions based on ranking tailored for a given variant of the forward search algorithm. Furthermore, from a learning theory point of view, it discusses why optimizing cost-to-goal $h^*$ is unnecessarily difficult. The experimental comparison on a diverse set of problems unequivocally supports the derived theory.	翻訳日:2023-11-01 20:35:15 公開日:2023-10-30
# ハードウェア不自由通信システムにおける拡散確率モデル -無線生成AIに向けて- Denoising Diffusion Probabilistic Models for Hardware-Impaired Communication Systems: Towards Wireless Generative AI ( http://arxiv.org/abs/2310.19460v1 ) ライセンス: Link先を確認	Mehdi Letafati, Samad Ali, Matti Latva-aho	(参考訳) ChatGPTや拡散モデルのような最先端のジェネレーティブモデルによる卓越した成果により、生成AIは、さまざまな産業や学術領域で大きな注目を集めている。本稿では,ハードウェア不整形トランシーバを用いた実用的な有限精度無線通信システムについて,拡散確率モデル(DDPM)を提案する。 DDPMの背後にある直感は、いわゆる「デノイング」ステップでデータ生成プロセスを分解することである。 DDPMベースの受信機は、ハードウェア障害(HWI)、チャネル歪み、量子化誤差などの現実的な非理想に直面する実用的な無線通信方式を提案する。提案手法は低SNR下でのネットワークレジリエンス,HWIレベルと量子化誤差の相違によるほぼ不変な再構成性能,非ガウス雑音に対するロバストなアウト・オブ・ディストリビューション性能を実現する。さらに,コサイン類似性と平均二乗誤差(MSE)の観点から,従来のディープニューラルネットワーク(DNN)ベースの受信機と比較して25dB以上の改善が見られた。 Thanks to the outstanding achievements from state-of-the-art generative models like ChatGPT and diffusion models, generative AI has gained substantial attention across various industrial and academic domains. In this paper, denoising diffusion probabilistic models (DDPMs) are proposed for a practical finite-precision wireless communication system with hardware-impaired transceivers. The intuition behind DDPM is to decompose the data generation process over the so-called "denoising" steps. Inspired by this, a DDPM-based receiver is proposed for a practical wireless communication scheme that faces realistic non-idealities, including hardware impairments (HWI), channel distortions, and quantization errors. It is shown that our approach provides network resilience under low-SNR regimes, near-invariant reconstruction performance with respect to different HWI levels and quantization errors, and robust out-of-distribution performance against non-Gaussian noise. Moreover, the reconstruction performance of our scheme is evaluated in terms of cosine similarity and mean-squared error (MSE), highlighting more than 25 dB improvement compared to the conventional deep neural network (DNN)-based receivers.	翻訳日:2023-11-01 20:34:46 公開日:2023-10-30
# Entanglement-based quantum digital signatures over deployed campus network ( http://arxiv.org/abs/2310.19457v1 ) License: check the linked page	Joseph C. Chapman, Muneer Alshowkan, Bing Qi, Nicholas A. Peters	The quantum digital signature protocol offers a replacement for most aspects of the public-key digital signatures ubiquitous in today's digital world. A major advantage of a quantum digital signature protocol is that it can have information-theoretic security, whereas public-key cryptography cannot. Here we demonstrate and characterize hardware to implement entanglement-based quantum digital signatures over our campus network. Over 25 hours, we collect measurements on our campus network, where we measure sufficiently low quantum bit error rates (<5% in most cases), which in principle enable quantum digital signatures over up to 50 km, as shown in rigorous simulation accompanied by a noise model developed specifically for our implementation. These results show quantum digital signatures can be successfully employed over deployed fiber. While the current implementation of our entanglement-based approach has a low signature rate, feasible upgrades would significantly increase the signature rate. In addition, our reported method provides great flexibility in the number of users.	Translated: 2023-11-01 20:34:25 Published: 2023-10-30
# MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation ( http://arxiv.org/abs/2310.19454v1 ) License: check the linked page	Chandrani Kumari and Rahul Siddharthan	We provide new algorithms for two tasks relating to heterogeneous tabular datasets: clustering, and synthetic data generation. Tabular datasets typically consist of heterogeneous data types (numerical, ordinal, categorical) in columns, but may also have hidden cluster structure in their rows: for example, they may be drawn from heterogeneous (geographical, socioeconomic, methodological) sources, such that the outcome variable they describe (such as the presence of a disease) may depend not only on the other variables but on the cluster context. Moreover, sharing of biomedical data is often hindered by patient confidentiality laws, and there is current interest in algorithms to generate synthetic tabular data from real data, for example via deep learning. We demonstrate a novel EM-based clustering algorithm, MMM ("Madras Mixture Model"), that outperforms standard algorithms in determining clusters in synthetic heterogeneous data, and recovers structure in real data.
Based on this, we demonstrate a synthetic tabular data generation algorithm, MMMsynth, that pre-clusters the input data, and generates cluster-wise synthetic data assuming cluster-specific data distributions for the input columns. We benchmark this algorithm by testing the performance of standard ML algorithms when they are trained on synthetic data and tested on real published datasets. Our synthetic data generation algorithm outperforms other tabular-data generators from the literature, and approaches the performance of training purely with real data.	Translated: 2023-11-01 20:34:07 Published: 2023-10-30
# ALT: Towards Fine-grained Alignment between Language and CTR Models for Click-Through Rate Prediction ( http://arxiv.org/abs/2310.19453v1 ) License: check the linked page	Hangyu Wang, Jianghao Lin, Xiangyang Li, Bo Chen, Chenxu Zhu, Ruiming Tang, Weinan Zhang, Yong Yu	Click-through rate (CTR) prediction serves as a core function module in various personalized online services. According to the data modality and input format, the models for CTR prediction can be mainly classified into two categories. The first comprises traditional CTR models that take as input the one-hot encoded ID features of the tabular modality and aim to capture collaborative signals via feature interaction modeling. The second category takes as input the sentences of the textual modality obtained by hard prompt templates, where pretrained language models (PLMs) are adopted to extract the semantic knowledge. These two lines of research generally focus on different characteristics of the same input data (i.e., textual and tabular modalities), forming a distinct complementary relationship with each other.
Therefore, in this paper, we propose to conduct fine-grained feature-level Alignment between Language and CTR models (ALT) for CTR prediction. Apart from the common CLIP-like instance-level contrastive learning, we further design a novel joint reconstruction pretraining task for both masked language and tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes feature-level interaction and alignment via sufficient mutual information extraction between the dual modalities. Moreover, we propose three different finetuning strategies with the option to train the aligned language and CTR models separately or jointly for downstream CTR prediction tasks, thus accommodating the varying efficacy and efficiency requirements of industrial applications. Extensive experiments on three real-world datasets demonstrate that ALT outperforms SOTA baselines, and is highly compatible with various language and CTR models.	Translated: 2023-11-01 20:33:40 Published: 2023-10-30
# Large-Scale Application of Fault Injection into PyTorch Models -- an Extension to PyTorchFI for Validation Efficiency ( http://arxiv.org/abs/2310.19449v1 ) License: check the linked page	Ralf Graafe, Qutub Syed Sha, Florian Geissler, Michael Paulitsch	Transient or permanent faults in hardware can render the output of Neural Networks (NN) incorrect without user-specific traces of the error, i.e., silent data errors (SDE). On the other hand, modern NNs also possess an inherent redundancy that can tolerate specific faults. To establish a safety case, it is necessary to distinguish and quantify both types of corruptions. To study the effects of hardware (HW) faults on software (SW) in general and NN models in particular, several fault injection (FI) methods have been established in recent years. Current FI methods focus on the methodology of injecting faults but often fall short of accounting for large-scale FI tests, where many fault locations based on a particular fault model need to be analyzed in a short time. Results need to be concise, repeatable, and comparable.
To address these requirements and enable fault injection as the default component in a machine learning development cycle, we introduce a novel fault injection framework called PyTorchALFI (Application Level Fault Injection for PyTorch) based on PyTorchFI. PyTorchALFI provides an efficient way to define randomly generated and reusable sets of faults to inject into PyTorch models, defines complex test scenarios, enhances data sets, and generates test KPIs while tightly coupling fault-free, faulty, and modified NNs. In this paper, we provide details about the definition of test scenarios, software architecture, and several examples of how to use the new framework to apply iterative changes in fault location and number, compare different model modifications, and analyze test results.	Translated: 2023-11-01 20:32:52 Published: 2023-10-30
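The notion of a randomly generated but reusable fault set, as described in the entry above, can be sketched in plain Python. This is not the PyTorchALFI or PyTorchFI API: the function names, the single-bit-flip fault model, and the use of a flat list of float32 weights are all assumptions made for illustration.

```python
import random
import struct

def flip_bit_float32(value, bit):
    """Return `value`, stored as IEEE-754 float32, with one bit flipped.

    Bit 0 is the least significant mantissa bit, bit 31 is the sign bit.
    """
    if not 0 <= bit <= 31:
        raise ValueError("bit must be in [0, 31]")
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped

def make_fault_set(num_weights, num_faults, seed):
    """Draw (weight index, bit position) pairs; the same seed gives the same set."""
    rng = random.Random(seed)
    return [(rng.randrange(num_weights), rng.randrange(32)) for _ in range(num_faults)]

def inject(weights, fault_set):
    """Return a faulty copy of `weights`; the fault-free list is left untouched."""
    faulty = list(weights)
    for index, bit in fault_set:
        faulty[index] = flip_bit_float32(faulty[index], bit)
    return faulty

weights = [1.0, 0.5, -0.25, 2.0]
faults = make_fault_set(len(weights), num_faults=2, seed=42)
print(faults)                   # identical on every run with seed=42
print(inject(weights, faults))  # compare against the untouched `weights`
```

Fixing the seed is one simple way to make results repeatable and comparable across model variants, since every variant then sees the same fault locations.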
# Towards Grouping in Large Scenes with Occlusion-aware Spatio-temporal Transformers ( http://arxiv.org/abs/2310.19447v1 ) License: check the linked page	Jinsong Zhang and Lingfeng Gu and Yu-Kun Lai and Xueyang Wang and Kun Li	Group detection, especially for large-scale scenes, has many potential applications for public safety and smart cities. Existing methods fail to cope with frequent occlusions in large-scale scenes with multiple people, and struggle to utilize spatio-temporal information effectively. In this paper, we propose an end-to-end framework, GroupTransformer, for group detection in large-scale scenes. To deal with the frequent occlusions caused by multiple people, we design an occlusion encoder to detect and suppress severely occluded person crops. To explore the potential spatio-temporal relationship, we propose spatio-temporal transformers to simultaneously extract trajectory information and fuse inter-person features in a hierarchical manner. Experimental results on both large-scale and small-scale scenes demonstrate that our method achieves better performance compared with state-of-the-art methods. On large-scale scenes, our method significantly boosts the performance in terms of precision and F1 score by more than 10%. On small-scale scenes, our method still improves the F1 score by more than 5%. The project page with code can be found at http://cic.tju.edu.cn/faculty/likun/projects/GroupTrans.	Translated: 2023-11-01 20:32:20 Published: 2023-10-30
# A Federated Learning Framework for Stenosis Detection ( http://arxiv.org/abs/2310.19445v1 ) License: check the linked page	Mariachiara Di Cosmo, Giovanna Migliorelli, Matteo Francioni, Andi Mucaj, Alessandro Maolo, Alessandro Aprile, Emanuele Frontoni, Maria Chiara Fiorentino, and Sara Moccia	This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA). Two heterogeneous datasets from two institutions were considered: Dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy); Dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature. Stenosis detection was performed by using a Faster R-CNN model. In our FL framework, only the weights of the model backbone were shared among the two client institutions, using Federated Averaging (FedAvg) for weight aggregation. We assessed the performance of stenosis detection using Precision (Prec), Recall (Rec), and F1 score (F1). Our results showed that the FL framework does not substantially affect client 2's performance, which was already good with local training; for client 1, instead, the FL framework increases performance over the local model by +3.76%, +17.21% and +10.80% in Prec, Rec and F1, respectively, reaching Prec = 73.56, Rec = 67.01 and F1 = 70.13.
With such results, we showed that FL may enable multicentric studies relevant to automatic stenosis detection in CA by addressing data heterogeneity from various institutions, while preserving patient privacy.	Translated: 2023-11-01 20:32:03 Published: 2023-10-30
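The weight aggregation and the reported metrics in the stenosis detection entry above can be checked with a short sketch. It assumes the common form of FedAvg in which each client's parameters are weighted by its number of training samples; the abstract does not say which weighting the authors used, and the two-parameter "backbones" below are toy values, not real model weights.

```python
def fed_avg(client_weights, client_sizes):
    """Sample-size-weighted average of per-client parameter lists."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Toy backbone parameters for the two clients; the image counts are the
# dataset sizes quoted in the abstract (1219 and 7492).
client_1 = [1.0, 0.0]
client_2 = [0.0, 1.0]
print(fed_avg([client_1, client_2], [1219, 7492]))

# Consistency check of the client 1 figures quoted in the abstract.
print(round(f1_score(73.56, 67.01), 2))  # 70.13
```

Under this weighting the larger dataset dominates the shared backbone, which is one plausible reason the smaller client could gain more from federation, though the abstract does not make that claim.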
# One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation ( http://arxiv.org/abs/2310.19444v1 ) License: check the linked page	Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu	Knowledge distillation (KD) has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. However, most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family, particularly the hint-based approaches. By using centered kernel alignment (CKA) to compare the learned features between heterogeneous teacher and student models, we observe significant feature divergence. This divergence illustrates the ineffectiveness of previous hint-based methods in cross-architecture distillation. To tackle the challenge of distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures. Specifically, we project intermediate features into an aligned latent space such as the logits space, where architecture-specific information is discarded.
Additionally, we introduce an adaptive target enhancement scheme to prevent the student from being disturbed by irrelevant information. Extensive experiments with various architectures, including CNN, Transformer, and MLP, demonstrate the superiority of our OFA-KD framework in enabling distillation between heterogeneous architectures. Specifically, when equipped with our OFA-KD, the student models achieve notable performance improvements, with a maximum gain of 8.0% on the CIFAR-100 dataset and 0.7% on the ImageNet-1K dataset. PyTorch code and checkpoints can be found at https://github.com/Hao840/OFAKD.	Translated: 2023-11-01 20:31:36 Published: 2023-10-30
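Distillation in the logits space, which the entry above relies on, is often written as a temperature-softened KL divergence between teacher and student outputs. The sketch below shows that standard formulation only; it is not the OFA-KD loss or its adaptive target enhancement, and the function names and the temperature of 4.0 are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

teacher = [2.0, 1.0, 0.0]
print(kd_loss(teacher, [2.0, 1.0, 0.0]))  # student matches the teacher: loss near zero
print(kd_loss(teacher, [0.0, 1.0, 2.0]))  # student disagrees: loss is positive
```

Because the loss only sees class logits, it is indifferent to whether the teacher and student are CNNs, Transformers, or MLPs, which is the property that makes the logits space attractive for cross-architecture distillation.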
# Dynamic Gaussian Splatting from Markerless Motion Capture can Reconstruct Infants Movements ( http://arxiv.org/abs/2310.19441v1 ) License: check the linked page	R. James Cotton and Colleen Peyton	Easy access to precise 3D tracking of movement could benefit many aspects of rehabilitation. A challenge to achieving this goal is that while there are many datasets and pretrained algorithms for able-bodied adults, algorithms trained on these datasets often fail to generalize to clinical populations including people with disabilities, infants, and neonates. Reliable movement analysis of infants and neonates is important as spontaneous movement behavior is an important indicator of neurological function and neurodevelopmental disability, which can help guide early interventions. We explored the application of dynamic Gaussian splatting to sparse markerless motion capture (MMC) data. Our approach leverages semantic segmentation masks to focus on the infant, significantly improving the initialization of the scene. Our results demonstrate the potential of this method in rendering novel views of scenes and tracking infant movements. This work paves the way for advanced movement analysis tools that can be applied to diverse clinical populations, with a particular emphasis on early detection in infants.	Translated: 2023-11-01 20:31:03 Published: 2023-10-30
# Asymmetric Diffusion Based Channel-Adaptive Secure Wireless Semantic Communications ( http://arxiv.org/abs/2310.19439v1 ) License: check the linked page	Xintian Ren, Jun Wu, Hansong Xu, Qianqian Pan	Semantic communication has emerged as a new deep learning-based communication paradigm that drives the research of end-to-end data transmission in tasks like image classification and image reconstruction. However, the security problem caused by semantic attacks has not been well explored, resulting in vulnerabilities within semantic communication systems exposed to potential semantic perturbations. In this paper, we propose a secure semantic communication system, DiffuSeC, which leverages the diffusion model and deep reinforcement learning (DRL) to address this issue. With the diffusing module at the sender end and the asymmetric denoising module at the receiver end, DiffuSeC mitigates the perturbations added by semantic attacks, including data source attacks and channel attacks. To further improve the robustness under unstable channel conditions caused by semantic attacks, we develop a DRL-based channel-adaptive diffusion step selection scheme to achieve stable performance under fluctuating environments. A timestep synchronization scheme is designed for diffusion timestep coordination between the two ends.
Simulation results demonstrate that the proposed DiffuSeC shows higher robust accuracy than previous works under a wide range of channel conditions, and can quickly adjust the model state according to signal-to-noise ratios (SNRs) in unstable environments.	Translated: 2023-11-01 20:30:46 Published: 2023-10-30
# Are Natural Domain Foundation Models Useful for Medical Image Classification? ( http://arxiv.org/abs/2310.19522v1 ) License: check the linked page	Joana Palés Huix and Adithya Raju Ganeshan and Johan Fredin Haslum and Magnus Söderberg and Christos Matsoukas and Kevin Smith	The deep learning field is converging towards the use of general foundation models that can be easily adapted for diverse tasks. While this paradigm shift has become common practice within the field of natural language processing, progress has been slower in computer vision. In this paper we attempt to address this issue by investigating the transferability of various state-of-the-art foundation models to medical image classification tasks. Specifically, we evaluate the performance of five foundation models, namely SAM, SEEM, DINOv2, BLIP, and OpenCLIP, across four well-established medical imaging datasets. We explore different training settings to fully harness the potential of these models. Our study shows mixed results. DINOv2, in particular, consistently outperforms the standard practice of ImageNet pretraining. However, other foundation models failed to consistently beat this established baseline, indicating limitations in their transferability to medical image classification tasks.	Translated: 2023-11-01 20:22:21 Published: 2023-10-30
# A General Neural Causal Model for Interactive Recommendation ( http://arxiv.org/abs/2310.19519v1 ) License: check the linked page	Jialin Liu, Xinyan Su, Peng Zhou, Xiangyu Zhao, Jun Li	Survivor bias in observational data leads the optimization of recommender systems towards local optima. Currently, most solutions re-mine existing human-system collaboration patterns to maximize longer-term satisfaction by reinforcement learning. However, from the causal perspective, mitigating survivor effects requires answering a counterfactual problem, which is generally unidentifiable and inestimable. In this work, we propose a neural causal model to achieve counterfactual inference. Specifically, we first build a learnable structural causal model based on its available graphical representations, which qualitatively characterizes the preference transitions. Mitigation of the survivor bias is achieved through counterfactual consistency. To identify the consistency, we use the Gumbel-max function as structural constraints. To estimate the consistency, we apply reinforcement optimizations, and use Gumbel-Softmax as a trade-off to get a differentiable function. Both theoretical and empirical studies demonstrate the effectiveness of our solution.	Translated: 2023-11-01 20:22:06 Published: 2023-10-30
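The two devices named in the entry above, Gumbel-max and Gumbel-Softmax, can be sketched generically. The code below shows only the textbook tricks, not the paper's structural causal model; the function names, the temperature `tau`, and the example probabilities are assumptions for illustration.

```python
import math
import random

def sample_gumbel_noise(n, rng):
    """Draw n independent standard Gumbel samples."""
    noise = []
    for _ in range(n):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
        noise.append(-math.log(-math.log(u)))
    return noise

def gumbel_max(logits, noise):
    """Index of the largest noisy logit: an exact categorical sample."""
    return max(range(len(logits)), key=lambda i: logits[i] + noise[i])

def gumbel_softmax(logits, noise, tau=0.5):
    """Smooth relaxation of gumbel_max; approaches one-hot as tau shrinks."""
    scaled = [(l + g) / tau for l, g in zip(logits, noise)]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]

rng = random.Random(0)
probs = [0.7, 0.2, 0.1]
logits = [math.log(p) for p in probs]

# Empirical check: choice 0 should be picked roughly 70% of the time.
draws = 20000
hits = sum(gumbel_max(logits, sample_gumbel_noise(3, rng)) == 0 for _ in range(draws))
print(hits / draws)

# Counterfactual flavour: keep the noise fixed and change only the logits.
noise = sample_gumbel_noise(3, rng)
print(gumbel_max(logits, noise), gumbel_max([0.0, 5.0, 0.0], noise))
print(gumbel_softmax(logits, noise))
```

Separating the noise from the logits is what makes the counterfactual reading possible: the same noise vector can be replayed under different logits to ask what would have been chosen, while the softmax variant gives a differentiable stand-in for the argmax.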
# Generating Context-Aware Natural Answers for Questions in 3D Scenes ( http://arxiv.org/abs/2310.19516v1 ) License: check the linked page	Mohammed Munzer Dwedari, Matthias Niessner, Dave Zhenyu Chen	3D question answering is a young field in 3D vision-language that is yet to be explored. Previous methods are limited to a pre-defined answer space and cannot generate answers naturally. In this work, we pivot the question answering task to a sequence generation task to generate free-form natural answers for questions in 3D scenes (Gen3DQA). To this end, we optimize our model directly on the language rewards to secure the global sentence semantics. Here, we also adapt a pragmatic language understanding reward to further improve the sentence quality. Our method sets a new SOTA on the ScanQA benchmark (CIDEr score 72.22/66.57 on the test sets).	Translated: 2023-11-01 20:21:52 Published: 2023-10-30
# Transformer-based nowcasting of radar composites from satellite images for severe weather ( http://arxiv.org/abs/2310.19515v1 ) License: check the linked page	Çağlar Küçük and Apostolos Giannakos and Stefan Schneider and Alexander Jann	Weather radar data are critical for nowcasting and an integral component of numerical weather prediction models. While weather radar data provide valuable information at high resolution, their ground-based nature limits their availability, which impedes large-scale applications. In contrast, meteorological satellites cover larger domains but with coarser resolution. However, with the rapid advancements in data-driven methodologies and modern sensors aboard geostationary satellites, new opportunities are emerging to bridge the gap between ground- and space-based observations, ultimately leading to more skillful weather prediction with high accuracy. Here, we present a Transformer-based model for nowcasting ground-based radar image sequences using satellite data up to two hours lead time. Trained on a dataset reflecting severe weather conditions, the model predicts radar fields occurring under different weather phenomena and shows robustness against rapidly growing/decaying fields and complex field structures.
Model interpretation reveals that the infrared channel centered at 10.3 $\mu$m (C13) contains skillful information for all weather conditions, while lightning data have the highest relative feature importance in severe weather conditions, particularly at shorter lead times. The model can support precipitation nowcasting across large domains without an explicit need for radar towers, enhance numerical weather prediction and hydrological models, and provide a radar proxy for data-scarce regions. Moreover, the open-source framework facilitates progress towards operational data-driven nowcasting.	Translated: 2023-11-01 20:21:42 Published: 2023-10-30
# Inverse folding for antibody sequence design using deep learning ( http://arxiv.org/abs/2310.19513v1 ) License: check the linked page	Frédéric A. Dreyer, Daniel Cutting, Constantin Schneider, Henry Kenlay, Charlotte M. Deane	We consider the problem of antibody sequence design given 3D structural information. Building on previous work, we propose a fine-tuned inverse folding model that is specifically optimised for antibody structures and outperforms generic protein models on sequence recovery and structure robustness when applied to antibodies, with notable improvement on the hypervariable CDR-H3 loop. We study the canonical conformations of complementarity-determining regions and find improved encoding of these loops into known clusters. Finally, we consider the applications of our model to drug discovery and binder design and evaluate the quality of proposed sequences using physics-based methods.	Translated: 2023-11-01 20:21:19 Published: 2023-10-30
# VideoCrafter1: Open Diffusion Models for High-Quality Video Generation ( http://arxiv.org/abs/2310.19512v1 ) License: check the linked page	Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan	Video generation has increasingly gained interest in both academia and industry. Although commercial tools can generate plausible videos, there is a limited number of open-source models available for researchers and engineers. In this work, we introduce two diffusion models for high-quality video generation, namely text-to-video (T2V) and image-to-video (I2V) models. T2V models synthesize a video based on a given text input, while I2V models incorporate an additional image input. Our proposed T2V model can generate realistic and cinematic-quality videos with a resolution of $1024 \times 576$, outperforming other open-source T2V models in terms of quality. The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style. This model is the first open-source I2V foundation model capable of transforming a given image into a video clip while maintaining content preservation constraints. We believe that these open-source video generation models will contribute significantly to the technological advancements within the community.	Translated: 2023-11-01 20:21:07 Published: 2023-10-30
# Photophysics of O-band and transition metal color centers in monolithic silicon for quantum communications ( http://arxiv.org/abs/2310.19510v1 ) License: check the linked page	Murat Can Sarihan, Jiahui Huang, Jin Ho Kang, Cody Fan, Wei Liu, Khalifa M. Azizur-Rahman, Baolai Liang, Chee Wei Wong	Color centers at the low-dispersion O-band wavelengths are an essential resource for long-lifetime quantum network nodes toward memory-assisted quantum communications using energy-time entanglement. In this work, we explore the process of developing T centers and other color center defects to improve qubit storage and radiative efficiency while examining the photoluminescence dynamics. We have extended the $TX_{0}$ lifetime of T centers by 65% to 1.56 $\mu$s. Furthermore, we discover the presence of a $\mathrm{Cu}_n^m$-related doublet emission around 1312 nm, close to the zero-dispersion wavelength, with a spin degeneracy resulting in a magnetic-field-induced broadening by 25% under 0.5 T, which can be an alternative to T centers as a high-fidelity spin-photon interface.	Translated: 2023-11-01 20:20:49 Published: 2023-10-30
# SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity ( http://arxiv.org/abs/2310.19509v1 ) License: see the link	Haitao Xu, Songwei Liu, Yuyang Xu, Shuai Wang, Jiashi Li, Chenqian Yan, Liangqiang Li, Lean Fu, Xin Pan, Fangmin Chen	To address the challenge of increasing network size, researchers have developed sparse models through network pruning. However, maintaining model accuracy while achieving significant speedups on general computing devices remains an open problem. In this paper, we present a novel mobile inference acceleration framework SparseByteNN, which leverages fine-grained kernel sparsity to achieve real-time execution as well as high accuracy. Our framework consists of two parts: (a) A fine-grained kernel sparsity schema with a sparsity granularity between structured pruning and unstructured pruning. It designs multiple sparse patterns for different operators. Combined with our proposed whole network rearrangement strategy, the schema achieves a high compression rate and high precision at the same time. (b) Inference engine co-optimized with the sparse pattern. The conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet-v1 outperform strong dense baselines on the efficiency-accuracy curve. Experimental results on Qualcomm 855 show that for 30% sparse MobileNet-v1, SparseByteNN achieves 1.27x speedup over the dense version and 1.29x speedup over the state-of-the-art sparse inference engine MNN with a slight accuracy drop of 0.224%. The source code of SparseByteNN will be available at https://github.com/lswzjuer/SparseByteNN	Translated: 2023-11-01 20:20:35 Published: 2023-10-30
# On Modifying the Variational Quantum Singular Value Decomposition Algorithm ( http://arxiv.org/abs/2310.19504v1 ) License: see the link	Jezer Jojo, Ankit Khandelwal, M Girish Chandra	In this work, we discuss two modifications that can be made to a known variational quantum singular value decomposition algorithm popular in the literature. The first is a change to the objective function which hints at improved performance of the algorithm and decreases the depth of the circuits. The second modification introduces a new way of computing expectation values of general matrices, which is a key step in the algorithm. We then benchmark this modified algorithm and compare the performance of our new objective function with the existing one.	Translated: 2023-11-01 20:20:08 Published: 2023-10-30
# Deep Learning for Visual Navigation of Underwater Robots ( http://arxiv.org/abs/2310.19495v1 ) License: see the link	M. Sunbeam	This paper aims to briefly survey deep learning methods for visual navigation of underwater robotics. The scope of this paper includes the visual perception of underwater robotics with deep learning methods, the available visual underwater datasets, imitation learning, and reinforcement learning methods for navigation. Additionally, relevant works will be categorized under the imitation learning or deep learning paradigm for underwater robots for clarity of the training methodologies in the current landscape. Literature that uses deep learning algorithms to process non-visual data for underwater navigation will not be considered, except as contrasting examples.	Translated: 2023-11-01 20:19:36 Published: 2023-10-30
# Generator Identification for Linear SDEs with Additive and Multiplicative Noise ( http://arxiv.org/abs/2310.19491v1 ) License: see the link	Yuanyuan Wang, Xi Geng, Wei Huang, Biwei Huang, Mingming Gong	In this paper, we present conditions for identifying the generator of a linear stochastic differential equation (SDE) from the distribution of its solution process with a given fixed initial state. These identifiability conditions are crucial in causal inference using linear SDEs as they enable the identification of the post-intervention distributions from the observational distribution. Specifically, we derive a sufficient and necessary condition for identifying the generator of linear SDEs with additive noise, as well as a sufficient condition for identifying the generator of linear SDEs with multiplicative noise. We show that the conditions derived for both types of SDEs are generic. Moreover, we offer geometric interpretations of the derived identifiability conditions to enhance their understanding. To validate our theoretical results, we perform a series of simulations, which support and substantiate the established findings.	Translated: 2023-11-01 20:19:28 Published: 2023-10-30
# Adaptive Meta-Learning-Based KKL Observer Design for Nonlinear Dynamical Systems ( http://arxiv.org/abs/2310.19489v1 ) License: see the link	Lukas Trommer, Halil Yigit Oksuz	The theory of Kazantzis-Kravaris/Luenberger (KKL) observer design introduces a methodology that uses a nonlinear transformation map and its left inverse to estimate the state of a nonlinear system through the introduction of a linear observer state space. Data-driven approaches using artificial neural networks have demonstrated the ability to accurately approximate these transformation maps. This paper presents a novel approach to observer design for nonlinear dynamical systems through meta-learning, a concept in machine learning that aims to optimize learning models for fast adaptation to a distribution of tasks through an improved focus on the intrinsic properties of the underlying learning problem. We introduce a framework that leverages information from measurements of the system output to design a learning-based KKL observer capable of online adaptation to a variety of system conditions and attributes. To validate the effectiveness of our approach, we present comprehensive experimental results for the estimation of nonlinear system states with varying initial conditions and internal parameters, demonstrating high accuracy, generalization capability, and robustness against noise.	Translated: 2023-11-01 20:19:17 Published: 2023-10-30
# VDIP-TGV: Blind Image Deconvolution via Variational Deep Image Prior Empowered by Total Generalized Variation ( http://arxiv.org/abs/2310.19477v1 ) License: see the link	Tingting Wu, Zhiyan Du, Zhi Li, Feng-Lei Fan, Tieyong Zeng	Recovering clear images from blurry ones with an unknown blur kernel is a challenging problem. Deep image prior (DIP) proposes to use the deep network as a regularizer for a single image rather than as a supervised model, which achieves encouraging results in the nonblind deblurring problem. However, since the relationship between images and the network architectures is unclear, it is hard to find a suitable architecture to provide sufficient constraints on the estimated blur kernels and clean images. Also, DIP uses the sparse maximum a posteriori (MAP), which is insufficient to enforce the selection of the recovery image. Recently, variational deep image prior (VDIP) was proposed to impose constraints on both blur kernels and recovery images and take the standard deviation of the image into account during the optimization process by the variational principle. However, we empirically find that VDIP struggles with processing image details and tends to generate suboptimal results when the blur kernel is large. Therefore, we combine total generalized variation (TGV) regularization with VDIP in this paper to overcome these shortcomings of VDIP. TGV is a flexible regularization that utilizes the characteristics of partial derivatives of varying orders to regularize images at different scales, reducing oil painting artifacts while maintaining sharp edges. The proposed VDIP-TGV effectively recovers image edges and details by supplementing extra gradient information through TGV. Additionally, this model is solved by the alternating direction method of multipliers (ADMM), which effectively combines traditional algorithms and deep learning methods. Experiments show that our proposed VDIP-TGV surpasses various state-of-the-art models quantitatively and qualitatively.	Translated: 2023-11-01 20:19:00 Published: 2023-10-30
# Grokking Tickets: Lottery Tickets Accelerate Grokking ( http://arxiv.org/abs/2310.19470v1 ) License: see the link	Gouki Minegishi, Yusuke Iwasawa and Yutaka Matsuo	Grokking is one of the most surprising puzzles in neural network generalization: a network first reaches a memorization solution with perfect training accuracy and poor generalization, but with further training, it reaches a perfectly generalized solution. We aim to analyze the mechanism of grokking from the perspective of the lottery ticket hypothesis, identifying the process of finding the lottery tickets (good sparse subnetworks) as the key to describing the transitional phase between memorization and generalization. We refer to these subnetworks as ''Grokking tickets'', which are identified via magnitude pruning after perfect generalization. First, using ''Grokking tickets'', we show that the lottery tickets drastically accelerate grokking compared to the dense networks on various configurations (MLP and Transformer, and arithmetic and image classification tasks). Additionally, to verify that ''Grokking tickets'' are a more critical factor than weight norms, we compared the ''good'' subnetworks with a dense network having the same L1 and L2 norms. Results show that the subnetworks generalize faster than the controlled dense model. In further investigations, we discovered that at an appropriate pruning rate, grokking can be achieved even without weight decay. We also show that speedup does not happen when using tickets identified at the memorization solution or at the transition between memorization and generalization, or when pruning networks at initialization (Random pruning, Grasp, SNIP, and Synflow). The results indicate that the weight norm of network parameters is not enough to explain the process of grokking, and they point to the importance of finding good subnetworks for describing the transition from memorization to generalization. The implementation code can be accessed via this link: https://github.com/gouki510/Grokking-Tickets	Translated: 2023-11-01 20:18:28 Published: 2023-10-30
# CreoleVal: Multilingual Multitask Benchmarks for Creoles ( http://arxiv.org/abs/2310.19567v1 ) License: see the link	Heather Lent and Kushal Tatariya and Raj Dabre and Yiyi Chen and Marcell Fekete and Esther Ploeger and Li Zhou and Hans Erik Heje and Diptesh Kanojia and Paul Belony and Marcel Bollmann and Loïc Grobol and Miryam de Lhoneux and Daniel Hershcovich and Michel DeGraff and Anders Søgaard and Johannes Bjerva	Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and other highly-resourced languages imply a significant potential for transfer learning, this potential is hampered due to this lack of annotated data. In this work we present CreoleVal, a collection of benchmark datasets spanning 8 different NLP tasks, covering up to 28 Creole languages; it is an aggregate of brand new development datasets for machine comprehension, relation classification, and machine translation for Creoles, in addition to a practical gateway to a handful of preexisting benchmarks. For each benchmark, we conduct baseline experiments in a zero-shot setting in order to further ascertain the capabilities and limitations of transfer learning for Creoles. Ultimately, the goal of CreoleVal is to empower research on Creoles in NLP and computational linguistics. We hope this resource will contribute to technological inclusion for Creole language users around the globe.	Translated: 2023-11-01 20:09:56 Published: 2023-10-30
# Synthetic dimensions for topological and quantum phases: Perspective ( http://arxiv.org/abs/2310.19549v1 ) License: see the link	Javier Argüello-Luengo, Utso Bhattacharya, Alessio Celi, Ravindra W. Chhajlany, Tobias Grass, Marcin Płodzień, Debraj Rakshit, Tymoteusz Salamon, Paolo Stornati, Leticia Tarruell, and Maciej Lewenstein	In this Perspective article we report on recent progress on studies of synthetic dimensions, mostly, but not only, based on the research realized around the Barcelona groups (ICFO, UAB), Donostia (DIPC), Poznań (UAM), Kraków (UJ), and Allahabad (HRI). The concept of synthetic dimensions works particularly well in atomic physics, quantum optics, and photonics, where the internal degrees of freedom (Zeeman sublevels of the ground state, metastable excited states, or motional states for atoms, and angular momentum states or transverse modes for photons) provide the synthetic space. We describe our attempts to design quantum simulators with synthetic dimensions, to mimic curved spaces, artificial gauge fields, lattice gauge theories, twistronics, quantum random walks, and more.	Translated: 2023-11-01 20:09:39 Published: 2023-10-30
# Approximation Theory, Computing, and Deep Learning on the Wasserstein Space ( http://arxiv.org/abs/2310.19548v1 ) License: see the link	Massimo Fornasier and Pascal Heid and Giacomo Enrico Sodini	The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. In this study, we delve into the challenging problem of the numerical approximation of Sobolev-smooth functions defined on probability spaces. Our particular focus centers on the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body of literature focused on efficiently approximating pointwise evaluations, we chart a new course to define functional approximants by adopting three machine learning-based approaches: 1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials. 2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces. 3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional's Euler-Lagrange equation. As a theoretical contribution, we furnish explicit and quantitative bounds on generalization errors for each of these solutions. In the proofs, we leverage the theory of metric Sobolev spaces and we combine it with techniques of optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness appropriately designed neural networks to serve as basis functions. These networks undergo training using diverse methodologies. This approach allows us to obtain approximating functions that can be rapidly evaluated after training. Consequently, our constructive solutions significantly enhance the evaluation speed at equal accuracy, surpassing that of state-of-the-art methods by several orders of magnitude.	Translated: 2023-11-01 20:09:22 Published: 2023-10-30
# Expansion of one-, two- and three-body matrix elements on a generic spherical basis for nuclear ab initio calculations ( http://arxiv.org/abs/2310.19547v1 ) License: see the link	Alberto Scalesi, Carlo Barbieri, Enrico Vigezzi	Ab initio studies of atomic nuclei are based on Hamiltonians including one-, two- and three-body operators with very complicated structures. Traditionally, matrix elements of such operators are expanded on a Harmonic Oscillator single-particle basis, which allows for a simple separation of the center-of-mass motion from the intrinsic one. A few recent investigations have shown that the use of different single-particle bases can bring significant advantages to numerical nuclear structure computations. In this work, the complete analytical expression of the Hamiltonian matrix elements expanded on a generic spherical basis is presented for the first time. This will allow systematic studies aimed at the determination of optimal nuclear bases.	Translated: 2023-11-01 20:08:59 Published: 2023-10-30
# MENTOR: Human Perception-Guided Pretraining for Iris Presentation Detection ( http://arxiv.org/abs/2310.19545v1 ) License: see the link	Colton R. Crum, Adam Czajka	Incorporating human salience into the training of CNNs has boosted performance in difficult tasks such as biometric presentation attack detection. However, collecting human annotations is a laborious task, not to mention the questions of how and where (in the model architecture) to efficiently incorporate this information into the model's training once annotations are obtained. In this paper, we introduce MENTOR (huMan pErceptioN-guided preTraining fOr iris pResentation attack detection), which addresses both of these issues through two unique rounds of training. First, we train an autoencoder to learn human saliency maps given an input iris image (both real and fake examples). Once this representation is learned, we utilize the trained autoencoder in two different ways: (a) as a pre-trained backbone for an iris presentation attack detector, and (b) as a human-inspired annotator of salient features on unknown data. We show that MENTOR's benefits are threefold: (a) a significant boost in iris PAD performance when using the human perception-trained encoder's weights compared to general-purpose weights (e.g. ImageNet-sourced, or random), (b) the capability of generating an infinite number of human-like saliency maps for unseen iris PAD samples to be used in any human saliency-guided training paradigm, and (c) an increase in the efficiency of iris PAD model training. Source code and weights are offered along with the paper.	Translated: 2023-11-01 20:08:48 Published: 2023-10-30
# Exploiting Image-Related Inductive Biases in Single-Branch Visual Tracking ( http://arxiv.org/abs/2310.19542v1 ) License: see the link	Chuanming Tang, Kai Wang, Joost van de Weijer, Jianlin Zhang, Yongmei Huang	Despite achieving state-of-the-art performance in visual tracking, recent single-branch trackers tend to overlook the weak prior assumptions associated with the Vision Transformer (ViT) encoder and inference pipeline. Moreover, the effectiveness of discriminative trackers remains constrained due to the adoption of the dual-branch pipeline. To tackle the inferior effectiveness of the vanilla ViT, we propose an Adaptive ViT Model Prediction tracker (AViTMP) to bridge the gap between single-branch network and discriminative models. Specifically, in the proposed encoder AViT-Enc, we introduce an adaptor module and joint target state embedding to enrich the dense embedding paradigm based on ViT. Then, we combine AViT-Enc with a dense-fusion decoder and a discriminative target model to predict accurate location. Further, to mitigate the limitations of conventional inference practice, we present a novel inference pipeline called CycleTrack, which bolsters the tracking robustness in the presence of distractors via bidirectional cycle tracking verification. Lastly, we propose a dual-frame update inference strategy that adeptly handles significant challenges in long-term scenarios. In the experiments, we evaluate AViTMP on ten tracking benchmarks for a comprehensive assessment, including LaSOT, LaSOTExtSub, AVisT, etc. The experimental results unequivocally establish that AViTMP attains state-of-the-art performance, especially on long-time tracking and robustness.	Translated: 2023-11-01 20:08:24 Published: 2023-10-30
# IterInv: Iterative Inversion for Pixel-Level T2I Models ( http://arxiv.org/abs/2310.19540v1 ) License: see the link	Chuanming Tang, Kai Wang, Joost van de Weijer	Large-scale text-to-image diffusion models have been a ground-breaking development in generating convincing images following an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques rely on DDIM inversion as a common practice based on the Latent Diffusion Models (LDM). However, the large pretrained T2I models working on the latent space as LDM suffer from losing details due to the first compression stage with an autoencoder mechanism. Instead, another mainstream T2I pipeline working on the pixel level, such as Imagen and DeepFloyd-IF, avoids this problem. They are commonly composed of several stages, normally with a text-to-image stage followed by several super-resolution stages. In this case, the DDIM inversion is unable to find the initial noise to generate the original image given that the super-resolution diffusion models are not compatible with the DDIM technique. According to our experimental findings, iteratively concatenating the noisy image as the condition is the root of this problem. Based on this observation, we develop an iterative inversion (IterInv) technique for this stream of T2I models and verify IterInv with the open-source DeepFloyd-IF model. By combining our method IterInv with a popular image editing method, we prove the application prospects of IterInv. The code will be released at https://github.com/Tchuanm/IterInv.git	Translated: 2023-11-01 20:07:58 Published: 2023-10-30
# A Novel Representation to Improve Team Problem Solving in Real-Time ( http://arxiv.org/abs/2310.19539v1 ) License: see the link	Alex Doboli	This paper proposes a novel representation to support computing metrics that help understand and improve, in real time, a team's behavior during real-life problem solving. Even though teams are important in modern activities, there is little computing aid to improve their activity. The representation captures the different mental images developed, enhanced, and utilized during solving. A case study illustrates the representation.	Translated: 2023-11-01 20:07:35 Published: 2023-10-30
# Quantum Lego and XP Stabilizer Codes ( http://arxiv.org/abs/2310.19538v1 ) License: see the link	Ruohan Shen, Yixu Wang and ChunJun Cao	We apply the recent graphical framework of ''quantum lego'' to XP stabilizer codes where the stabilizer group is generally non-abelian. We show that the idea of operator matching continues to hold for such codes and is sufficient for generating all their XP symmetries provided the resulting code is XP. We provide an efficient classical algorithm for tracking these symmetries under tensor contraction or conjoining. This constitutes a partial extension of the algorithm implied by the Gottesman-Knill theorem beyond Pauli stabilizer states and Clifford operations. Because conjoining transformations generate quantum operations that are universal, the XP symmetries obtained from these algorithms do not uniquely identify the resulting tensors in general. Using this extended framework, we provide a novel XP stabilizer code with higher distance and a $[[8,1,2]]$ code with fault-tolerant $T$ gate. For XP regular codes, we also construct a tensor-network-based maximum likelihood decoder for any i.i.d. single qubit error channel.	Translated: 2023-11-01 20:07:30 Published: 2023-10-30
# On consequences of finetuning on data with highly discriminative features ( http://arxiv.org/abs/2310.19537v1 ) License: see the link	Wojciech Masarczyk, Tomasz Trzciński, Mateusz Ostaszewski	In the era of transfer learning, training neural networks from scratch is becoming obsolete. Transfer learning leverages prior knowledge for new tasks, conserving computational resources. While its advantages are well-documented, we uncover a notable drawback: networks tend to prioritize basic data patterns, forsaking valuable pre-learned features. We term this behavior "feature erosion" and analyze its impact on network performance and internal representations.	Translated: 2023-11-01 20:07:16 Published: 2023-10-30
# 逆バッチ逆強化学習 : 対話的勧告のための不完全な実証から振り返る Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation ( http://arxiv.org/abs/2310.19536v1 ) ライセンス: Link先を確認	Jialin Liu, Xinyan Su, Zeyu He, Xiangyu Zhao, Jun Li	(参考訳) 報酬はユーザの満足度を測る指標であり、インタラクティブなレコメンデーションシステムでは制限要因として機能する。本研究では,強化学習の基礎となる学習報酬問題(LTR)に焦点を当てた。従来のアプローチでは、報酬を得るための追加の手順を導入し、最適化の複雑さを増大させるか、ユーザとエージェントのインタラクションが完璧なデモを提供すると仮定する。理想的には、構成実証を用いて報酬と政策の両方を最適化する統一的なアプローチを採用することを目指している。しかし、この要件は、報酬が本質的に政治におけるユーザーのフィードバックを定量化するのに対し、推薦エージェントは政治外の将来的な累積評価を近似するため、課題となる。この課題に取り組むために,要求される特性を実現する新しいバッチ逆強化学習パラダイムを提案する。 LTRとレコメンダエージェント評価を併用するために,ディスカウントされた定常分布補正を利用する。構成要件を満たすために,保存を通じて悲観主義の概念を取り入れる。具体的には,ベルマン変換を用いてバニラ補正を修正し,KL正則化を適用した。実世界の2つのデータセットを用いて経験的研究を行い,提案手法は相対的に有効性(2.3\%)と効率(11.53\%)を向上することを示した。 Rewards serve as a measure of user satisfaction and act as a limiting factor in interactive recommender systems. In this research, we focus on the problem of learning to reward (LTR), which is fundamental to reinforcement learning. Previous approaches either introduce additional procedures for learning to reward, thereby increasing the complexity of optimization, or assume that user-agent interactions provide perfect demonstrations, which is not feasible in practice. Ideally, we aim to employ a unified approach that optimizes both the reward and policy using compositional demonstrations. However, this requirement presents a challenge since rewards inherently quantify user feedback on-policy, while recommender agents approximate off-policy future cumulative valuation. To tackle this challenge, we propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties. Our method utilizes discounted stationary distribution correction to combine LTR and recommender agent evaluation. To fulfill the compositional requirement, we incorporate the concept of pessimism through conservation. 
Specifically, we modify the vanilla correction using a Bellman transformation and enforce KL regularization to constrain consecutive policy updates. We conduct empirical studies on two real-world datasets that represent two compositional coverage settings; the results show that the proposed method relatively improves both effectiveness (2.3%) and efficiency (11.53%).	翻訳日:2023-11-01 20:07:11 公開日:2023-10-30
# レガシビデオコンテンツの再生:双方向情報伝達によるデインターレース Revitalizing Legacy Video Content: Deinterlacing with Bidirectional Information Propagation ( http://arxiv.org/abs/2310.19535v1 ) ライセンス: Link先を確認	Zhaowei Gao, Mingyang Song, Christopher Schroers, Yang Zhang	(参考訳) 古いcrt表示技術と限られた伝送帯域のため、初期のフィルムやテレビ放送ではインターレース走査が一般的であった。これは各フィールドが情報の半分しか含まないことを意味する。現代のディスプレイはフルフレームを必要とするため、これはデインターレースの研究、すなわちレガシービデオコンテンツの欠落した情報を復元するきっかけとなった。本稿では,アニメーションコンテンツとライブアクションコンテンツを分離する深層学習手法を提案する。提案手法は,空間と時間の両方で情報を活用するために,複数スケールにわたる双方向時空間情報伝搬を支援する。より具体的には,アライメント,融合,整流などの機能改良を行うフローガイドリファインメントブロック(frb)を設計する。さらに,複数のフィールドを同時に処理し,フレーム単位の処理時間を短縮し,リアルタイム処理を可能にする。実験の結果,提案手法は既存手法と比較して優れた性能を示した。 Due to old CRT display technology and limited transmission bandwidth, early film and TV broadcasts commonly used interlaced scanning. This meant each field contained only half of the information. Since modern displays require full frames, this has spurred research into deinterlacing, i.e. restoring the missing information in legacy video content. In this paper, we present a deep-learning-based method for deinterlacing animated and live-action content. Our proposed method supports bidirectional spatio-temporal information propagation across multiple scales to leverage information in both space and time. More specifically, we design a Flow-guided Refinement Block (FRB) which performs feature refinement including alignment, fusion, and rectification. Additionally, our method can process multiple fields simultaneously, reducing per-frame processing time, and potentially enabling real-time processing. Our experimental results demonstrate that our proposed method achieves superior performance compared to existing methods.	翻訳日:2023-11-01 20:06:43 公開日:2023-10-30
# 生成言語モデルにおける学習困難度軽減のための情報エントロピー損失 InfoEntropy Loss to Mitigate Bias of Learning Difficulties for Generative Language Models ( http://arxiv.org/abs/2310.19531v1 ) ライセンス: Link先を確認	Zhenpeng Su, Xing Wu, Xue Bai, Zijia Lin, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu	(参考訳) 生成言語モデルは、通常、前のものから次のトークン(サブワード/ワード/フレーズ)を予測することによって、大きなテキストコーパスで事前訓練される。最近の研究は、下流タスクにおける大規模な生成言語モデルの印象的な性能を実証している。しかし、既存の生成言語モデルは、訓練中にテキストコーパスに固有の課題、すなわち頻繁なトークンと頻繁なトークンの不均衡を無視している。これは、言語モデルが一般的で簡単に学習できるトークンに支配され、希少で難解なトークンを見渡すことができる。そこで我々は,情報エントロピー損失(InfoEntropy Loss)関数を提案する。学習中,語彙上の予測確率分布の情報エントロピーに応じて,to-be-learnedトークンの学習難易度を動的に評価することができる。その後、トレーニング損失を適応的にスケーリングし、モデルをより理解の難しいトークンに集中させようとする。 Pileデータセットでは、生成言語モデルを436M、1.1B、6.7Bパラメータで訓練する。提案されたInfoEntropy Lossを組み込んだモデルでは、ダウンストリームベンチマークで一貫したパフォーマンス向上が期待できる。 Generative language models are usually pretrained on large text corpus via predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent works have demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge in text corpus during training, i.e., the imbalance between frequent tokens and infrequent ones. It can lead a language model to be dominated by common and easy-to-learn tokens, thereby overlooking the infrequent and difficult-to-learn ones. To alleviate that, we propose an Information Entropy Loss (InfoEntropy Loss) function. During training, it can dynamically assess the learning difficulty of a to-be-learned token, according to the information entropy of the corresponding predicted probability distribution over the vocabulary. Then it scales the training loss adaptively, trying to lead the model to focus more on the difficult-to-learn tokens. On the Pile dataset, we train generative language models at different scales of 436M, 1.1B, and 6.7B parameters. 
Experiments reveal that models incorporating the proposed InfoEntropy Loss can gain consistent performance improvement on downstream benchmarks.	翻訳日:2023-11-01 20:06:29 公開日:2023-10-30
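The entropy-based loss scaling described in the InfoEntropy Loss abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact scaling rule, so the weight `entropy ** gamma`, the exponent `gamma`, and all function names here are hypothetical choices made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_weighted_loss(logits, target, gamma=1.0):
    """Cross-entropy for one token, scaled by the entropy of the prediction.

    Hypothetical weighting (an assumption of this sketch):
    weight = entropy ** gamma, treated as a plain constant.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    weight = entropy(probs) ** gamma
    return weight * cross_entropy

# A confident, low-entropy prediction contributes far less to the loss
# than an uncertain, high-entropy one.
easy = entropy_weighted_loss([8.0, 0.0, 0.0, 0.0], target=0)
hard = entropy_weighted_loss([0.5, 0.4, 0.3, 0.2], target=0)
```

In this sketch a token whose predicted distribution is nearly one-hot receives a weight close to zero, while a token with a near-uniform prediction keeps most of its cross-entropy, which is one simple way to shift training emphasis toward difficult-to-learn tokens.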
# Decoupled Actor-Critic Decoupled Actor-Critic ( http://arxiv.org/abs/2310.19527v1 ) ライセンス: Link先を確認	Michal Nauman and Marek Cygan	(参考訳) アクタ-クリティックな手法は、一見無矛盾な2つの問題の停滞状態にある。まず、過大評価に対する批判的傾向は、低バウンドq値を用いて最適化された保守的政策から時間差目標をサンプリングする必要がある。第2に、不確実性に直面した楽観的な政策は、後悔のレベルを低くすることを示している。そこで我々は,この二分法を治療するために,DAC(Decoupled Actor-Critic)を提案する。 DACは、時間差学習に使用される保守的なアクターと、探索に使用される楽観的なアクターという、2つの異なるアクターをグラデーションバックプロパゲーションによって学習する。我々は,DeepMind制御タスクにおいて,低リプレイ率と高リプレイ率の条件下でDACを試験し,複数の設計選択を補正する。計算オーバーヘッドは最小限だが、DACは最先端の性能とロコモーションタスクのサンプル効率を達成する。 Actor-Critic methods are in a stalemate of two seemingly irreconcilable problems. Firstly, critic proneness towards overestimation requires sampling temporal-difference targets from a conservative policy optimized using lower-bound Q-values. Secondly, well-known results show that policies that are optimistic in the face of uncertainty yield lower regret levels. To remedy this dichotomy, we propose Decoupled Actor-Critic (DAC). DAC is an off-policy algorithm that learns two distinct actors by gradient backpropagation: a conservative actor used for temporal-difference learning and an optimistic actor used for exploration. We test DAC on DeepMind Control tasks in low and high replay ratio regimes and ablate multiple design choices. Despite minimal computational overhead, DAC achieves state-of-the-art performance and sample efficiency on locomotion tasks.	翻訳日:2023-11-01 20:06:11 公開日:2023-10-30
# ホットダイヤモンド基板へのイオン注入によるカラーセンターの高密度アンサンブルの効率的な作製 Efficient fabrication of high-density ensembles of color centers via ion implantation on a hot diamond substrate ( http://arxiv.org/abs/2310.19526v1 ) ライセンス: Link先を確認	E. Nieto Hernandez, G. Andrini, A. Crnjac, M. Brajkovic, F. Picariello, E. Corte, V. Pugliese, M. Matijević, P. Aprà, V. Varzi, J. Forneris, M. Genovese, Z. Siketic, M. Jaksic, S. Ditalia Tchernij	(参考訳) ダイヤモンドの窒素空洞(NV)中心は量子計測やセンシングを含む量子技術にとって有望なシステムである。外部界に対する高い感度を達成するための有望な戦略は、ダイヤモンド格子に導入された放射線損傷の量によってイオン注入による製造が上限のNV中心の大きなアンサンブルの活用に依存している。本研究は,熱標的基板(>550 °C)にMeV N2+イオンを高流動注入することにより,NV中心の密度を増大させるアプローチを示す。以上の結果から, 高温注入はダイヤモンドからグラファイト相への不可逆変換に必要な空隙密度閾値を増大させ, 高密度アンサンブルを実現することができることがわかった。さらに, MeV N2+およびMg+イオンで様々な温度で注入されたダイヤモンド基板上に色中心の形成効率を調べた結果, NV中心とマグネシウム空孔(MgV)中心の両方の形成効率は, 注入温度とともに増加することが明らかとなった。 Nitrogen-Vacancy (NV) centers in diamond are promising systems for quantum technologies, including quantum metrology and sensing. A promising strategy for the achievement of high sensitivity to external fields relies on the exploitation of large ensembles of NV centers, whose fabrication by ion implantation is upper limited by the amount of radiation damage introduced in the diamond lattice. In this work we demonstrate an approach to increase the density of NV centers upon the high-fluence implantation of MeV N2+ ions on a hot target substrate (>550 °C). Our results show that, with respect to room-temperature implantation, the high-temperature process increases the vacancy density threshold required for the irreversible conversion of diamond to a graphitic phase, thus enabling higher-density ensembles. Furthermore, the formation efficiency of color centers was investigated on diamond substrates implanted at varying temperatures with MeV N2+ and Mg+ ions, revealing that the formation efficiency of both NV centers and magnesium-vacancy (MgV) centers increases with the implantation temperature.	翻訳日:2023-11-01 20:05:56 公開日:2023-10-30
# DPATD:Dual-Phase Audio Transformer for Denoising DPATD: Dual-Phase Audio Transformer for Denoising ( http://arxiv.org/abs/2310.19588v1 ) ライセンス: Link先を確認	Junhui Li, Pu Wang, Jialu Li, Xinzhe Wang, Youshan Zhang	(参考訳) 近年の高性能トランスフォーマーベース音声強調モデルでは,時間領域法が時間周波数領域法と同等の性能を達成できることが示されている。しかし、時間領域音声強調システムは、通常、多数の時間ステップからなる入力音声シーケンスを受け取り、非常に長いシーケンスをモデル化し、適切な動作を訓練することは困難である。本稿では,より小さな音声チャンクを入力として利用し,上記の課題に対処するために,音声情報の効率的な活用を実現する。本研究では,二重位相オーディオトランスフォーマ(dpatd)を提案する。トランスフォーマ層を深層構造に整理し,クリーンなオーディオシーケンスを学習する新しいモデルである。 DPATDは音声入力を小さなチャンクに分割し、入力長は元のシーケンス長の平方根に比例することができる。メモリに圧縮された説明可能な注意は効率的で、頻繁に使用される自己注意モジュールよりも早く収束する。我々のモデルは最先端の手法よりも優れています。 Recent high-performance transformer-based speech enhancement models demonstrate that time domain methods could achieve similar performance as time-frequency domain methods. However, time-domain speech enhancement systems typically receive input audio sequences consisting of a large number of time steps, making it challenging to model extremely long sequences and train models to perform adequately. In this paper, we utilize smaller audio chunks as input to achieve efficient utilization of audio information to address the above challenges. We propose a dual-phase audio transformer for denoising (DPATD), a novel model to organize transformer layers in a deep structure to learn clean audio sequences for denoising. DPATD splits the audio input into smaller chunks, where the input length can be proportional to the square root of the original sequence length. Our memory-compressed explainable attention is efficient and converges faster compared to the frequently used self-attention module. Extensive experiments demonstrate that our model outperforms state-of-the-art methods.	翻訳日:2023-11-01 19:57:36 公開日:2023-10-30
# gc-mvsnet:マルチビュー、マルチスケール、幾何学的一貫性のあるマルチビューステレオ GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View Stereo ( http://arxiv.org/abs/2310.19583v1 ) ライセンス: Link先を確認	Vibhas K. Vats, Sripad Joshi, David J. Crandall, Md. Alimoor Reza, Soon-heung Jung	(参考訳) 従来のマルチビューステレオ(MVS)手法は、測光的および幾何的整合性制約に大きく依存するが、より新しい機械学習ベースのMVS法は、後処理ステップとしてのみ複数のソースビューにまたがる幾何的整合性をチェックする。本稿では,学習中に異なるスケールで複数のソースビューにまたがる参照ビュー深度マップの幾何学的一貫性を明示的に奨励する新しいアプローチを提案する(図1参照)。この幾何整合性損失を加えることで、幾何的不整合画素を明示的にペナル化することで学習を著しく加速し、訓練の繰り返し要求を他のMVS手法のほぼ半分に削減する。広範な実験により,dtu と blendedmvs データセットにおける新たな最先端技術と,タンク・テンプルベンチマークの競合結果が得られた。我々の知る限り、GC-MVSNetは学習中にマルチビュー、マルチスケールの幾何的一貫性を強制する最初の試みである。 Traditional multi-view stereo (MVS) methods rely heavily on photometric and geometric consistency constraints, but newer machine learning-based MVS methods check geometric consistency across multiple source views only as a post-processing step. In this paper, we present a novel approach that explicitly encourages geometric consistency of reference view depth maps across multiple source views at different scales during learning (see Fig. 1). We find that adding this geometric consistency loss significantly accelerates learning by explicitly penalizing geometrically inconsistent pixels, reducing the training iteration requirements to nearly half that of other MVS methods. Our extensive experiments show that our approach achieves a new state-of-the-art on the DTU and BlendedMVS datasets, and competitive results on the Tanks and Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt to enforce multi-view, multi-scale geometric consistency during learning.	翻訳日:2023-11-01 19:57:20 公開日:2023-10-30
# 会話を通して見る:拡散モデルに基づく音声・視覚音声分離 Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model ( http://arxiv.org/abs/2310.19581v1 ) ライセンス: Link先を確認	Suyeon Lee, Chaeyoung Jung, Youngjoon Jang, Jaehun Kim, Joon Son Chung	(参考訳) 本研究の目的は,視覚手がかりを用いた混合音声から対象話者の声を抽出することである。音声と音声の分離に関する既存の研究は、その性能を有望な知性で実証している。そこで本研究では,自然サンプル生成能力で知られる拡散機構に基づく音声・視覚音声分離モデルであるavdiffussを提案する。拡散の2つのモードを効果的に融合させるため,クロスアテンションに基づく特徴融合機構を提案する。このメカニズムは、音声生成における音声・視覚対応から音声情報を統合するための音声領域に特化している。このようにして、融合プロセスは過剰な計算要求なしに、特徴の高時間分解を維持できる。提案手法は,VoxCeleb2 と LRS3 の2つのベンチマークを用いて,より自然な音声を生成する。 The objective of this work is to extract target speaker's voice from a mixture of voices using visual cues. Existing works on audio-visual speech separation have demonstrated their performance with promising intelligibility, but maintaining naturalness remains a challenge. To address this issue, we propose AVDiffuSS, an audio-visual speech separation model based on a diffusion mechanism known for its capability in generating natural samples. For an effective fusion of the two modalities for diffusion, we also propose a cross-attention-based feature fusion mechanism. This mechanism is specifically tailored for the speech domain to integrate the phonetic information from audio-visual correspondence in speech generation. In this way, the fusion process maintains the high temporal resolution of the features, without excessive computational requirements. We demonstrate that the proposed framework achieves state-of-the-art results on two benchmarks, including VoxCeleb2 and LRS3, producing speech with notably better naturalness.	翻訳日:2023-11-01 19:56:52 公開日:2023-10-30
# 単眼3次元顔再建における知覚形状損失 A Perceptual Shape Loss for Monocular 3D Face Reconstruction ( http://arxiv.org/abs/2310.19580v1 ) ライセンス: Link先を確認	Christopher Otto, Prashanth Chandran, Gaspard Zoss, Markus Gross, Paulo Gotardo, Derek Bradley	(参考訳) モノクロ3D顔の再構成は広範にわたるトピックであり、既存のアプローチでは、高速ニューラルネットワーク推論またはオフラインの顔形状の反復的再構成によってこの問題に取り組む。どちらの場合でも、注意深く設計されたエネルギー関数は最小化され、一般的には測光損失、ランドマーク再投影損失などの損失項が含まれる。本研究では,特定の画像から得られる3次元顔の復元の質を人間がどのように認識するかに着想を得た,単眼顔撮影のための新しい損失関数を提案する。シェーディングが人間の視覚系における3次元形状の強い指標となることは広く知られている。そこで,新しい「知覚的」形状損失は,シェーディング手がかりのみを用いて3次元顔推定の質を判定することを目的としている。私たちの損失は、入力された顔画像とジオメトリ推定のシェードレンダリングを取り込んで、シェードレンダリングが与えられた画像にどの程度適合するかを知覚的に評価するスコアを予測する、識別器スタイルのニューラルネットワークとして実装されます。この「批判的」ネットワークはRGB画像と幾何学的レンダリングだけで動作し、シーン内のアルベドや照明を見積もる必要がない。さらに、この損失は画像空間で完全に動作しており、メッシュトポロジーに非依存である。新しい知覚的形状損失と従来の3d顔の最適化やディープニューラルネットワークの回帰といったエネルギー用語を組み合わせることで、最先端の結果を改善できることを示す。 Monocular 3D face reconstruction is a wide-spread topic, and existing approaches tackle the problem either through fast neural network inference or offline iterative reconstruction of face geometry. In either case carefully-designed energy functions are minimized, commonly including loss terms like a photometric loss, a landmark reprojection loss, and others. In this work we propose a new loss function for monocular face capture, inspired by how humans would perceive the quality of a 3D face reconstruction given a particular image. It is widely known that shading provides a strong indicator for 3D shape in the human visual system. As such, our new 'perceptual' shape loss aims to judge the quality of a 3D face estimate using only shading cues. Our loss is implemented as a discriminator-style neural network that takes an input face image and a shaded render of the geometry estimate, and then predicts a score that perceptually evaluates how well the shaded render matches the given image. 
This 'critic' network operates on the RGB image and geometry render alone, without requiring an estimate of the albedo or illumination in the scene. Furthermore, our loss operates entirely in image space and is thus agnostic to mesh topology. We show how our new perceptual shape loss can be combined with traditional energy terms for monocular 3D face optimization and deep neural network regression, improving upon current state-of-the-art results.	翻訳日:2023-11-01 19:56:35 公開日:2023-10-30
# skip-wavenet:レーダーエコーグラム中のfirn層を追跡するwaveletベースのマルチスケールアーキテクチャ Skip-WaveNet: A Wavelet based Multi-scale Architecture to Trace Firn Layers in Radar Echograms ( http://arxiv.org/abs/2310.19574v1 ) ライセンス: Link先を確認	Debvrat Varshney, Masoud Yari, Oluwanisola Ibikunle, Jilu Li, John Paden, Maryam Rahnemoonfar	(参考訳) 空中レーダーセンサーから生成したエコーグラムは、氷床の上のほこりの層を捉えている。これらの層の正確な追跡は,氷冠融解の海面上昇への寄与を調べるために必要となる積雪率を計算するために重要である。しかし、地下層を検出するためにレーダエコーグラムを自動的に処理することは難しい問題である。本研究では,これらのレーダエコーグラムのためのウェーブレットに基づくマルチスケールディープラーニングアーキテクチャを開発し,ファーン層の検出を改善する。ウェーブレットベースアーキテクチャは, 最適データセットスケール (ods) と最適画像スケール (ois) のf-scoreを, 非ウェーブレットアーキテクチャよりもそれぞれ3.99%, 3.7%改善することを示す。さらに,提案するスキップウェーブネットアーキテクチャは,各イテレーションで新たなウェーブレットを生成し,最先端のfirn層検出ネットワークと比較して高い汎用性を実現し,平均絶対誤差3.31ピクセル,94.3%の平均精度で層深さを推定する。このようなネットワークは科学者によってファーン層を辿り、年間降雪率を計算し、氷床の表面の質量収支を推定し、地球規模の海面上昇を予測するために利用できる。 Echograms created from airborne radar sensors capture the profile of firn layers present on top of an ice sheet. Accurate tracking of these layers is essential to calculate the snow accumulation rates, which are required to investigate the contribution of polar ice cap melt to sea level rise. However, automatically processing the radar echograms to detect the underlying firn layers is a challenging problem. In our work, we develop wavelet-based multi-scale deep learning architectures for these radar echograms to improve firn layer detection. We show that wavelet based architectures improve the optimal dataset scale (ODS) and optimal image scale (OIS) F-scores by 3.99% and 3.7%, respectively, over the non-wavelet architecture. Further, our proposed Skip-WaveNet architecture generates new wavelets in each iteration, achieves higher generalizability as compared to state-of-the-art firn layer detection networks, and estimates layer depths with a mean absolute error of 3.31 pixels and 94.3% average precision. 
Such a network can be used by scientists to trace firn layers, calculate the annual snow accumulation rates, estimate the resulting surface mass balance of the ice sheet, and help project global sea level rise.	翻訳日:2023-11-01 19:56:10 公開日:2023-10-30
# ブーストツリーを用いた語彙データに基づくモデル不確かさに基づく能動的学習 Model Uncertainty based Active Learning on Tabular Data using Boosted Trees ( http://arxiv.org/abs/2310.19573v1 ) ライセンス: Link先を確認	Sharath M Shankaranarayana	(参考訳) 教師付き機械学習は、モデルトレーニングのための適切なラベル付きデータの可用性に依存している。ラベル付きデータは人間のアノテーションによって取得されるが、これは面倒でコストのかかるプロセスであり、しばしば主題の専門家を必要とする。アクティブラーニングは機械学習のサブフィールドであり、モデルトレーニングのための最も価値のあるデータインスタンスを選択し、人間のアノテータからのみラベルをクエリすることで、ラベル付きデータを効率的に取得するのに役立つ。近年、特に深層ニューラルネットワークに基づくモデルにおいて、アクティブラーニングの分野で多くの研究が行われている。 image/textual/multimodalデータを扱う際にはディープラーニングが輝くが、勾配向上手法は表データよりもはるかに優れた結果が得られる傾向にある。本研究では,ブースト木を用いた表データに対するアクティブラーニングについて検討する。アクティブラーニングにおける不確実性に基づくサンプリングは、最も一般的に使用されるクエリ戦略であり、これらのインスタンスのラベルは、現在のモデル予測が最大限に不確実なシーケンシャルにクエリされる。エントロピーはしばしば不確実性を測定するための選択である。しかし、エントロピーは必ずしもモデルの不確かさの尺度ではない。モデル不確実性を計測し、それをアクティブな学習に活用する深層学習には多くの研究があるが、神経以外のネットワークモデルについては、まだ研究されていない。そこで本研究では,強化木を用いたモデル不確実性手法の有効性について検討する。このモデルの不確実性を生かして、表データの回帰タスクに対するアクティブラーニングにおける不確実性に基づくサンプリングを提案する。さらに,回帰課題に対するコスト効率のよいアクティブラーニング手法と,分類課題に対するコスト効率のよいアクティブラーニング手法を提案する。 Supervised machine learning relies on the availability of good labelled data for model training. Labelled data is acquired by human annotation, which is a cumbersome and costly process, often requiring subject matter experts. Active learning is a sub-field of machine learning which helps in obtaining the labelled data efficiently by selecting the most valuable data instances for model training and querying the labels only for those instances from the human annotator. Recently, a lot of research has been done in the field of active learning, especially for deep neural network based models. Although deep learning shines when dealing with image/textual/multimodal data, gradient boosting methods still tend to achieve much better results on tabular data. In this work, we explore active learning for tabular data using boosted trees. 
Uncertainty based sampling in active learning is the most commonly used querying strategy, wherein the labels of those instances are sequentially queried for which the current model prediction is maximally uncertain. Entropy is often the choice for measuring uncertainty. However, entropy is not exactly a measure of model uncertainty. Although there has been a lot of work in deep learning for measuring model uncertainty and employing it in active learning, it is yet to be explored for non-neural network models. To this end, we explore the effectiveness of boosted trees based model uncertainty methods in active learning. Leveraging this model uncertainty, we propose an uncertainty based sampling in active learning for regression tasks on tabular data. Additionally, we also propose a novel cost-effective active learning method for regression tasks along with an improved cost-effective active learning method for classification tasks.	翻訳日:2023-11-01 19:55:46 公開日:2023-10-30
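The uncertainty-based query step for regression mentioned in the abstract above could look something like the sketch below. The abstract does not say how model uncertainty is obtained from boosted trees, so this example substitutes a generic stand-in: the variance of predictions across several independently trained models. The ensemble, the function name `select_queries`, and the toy numbers are all assumptions for illustration, not the paper's method.

```python
from statistics import pvariance

def select_queries(ensemble_predictions, k):
    """Return indices of the k unlabeled instances whose predictions
    disagree most (largest variance) across ensemble members.

    ensemble_predictions[m][i] is member m's prediction for instance i.
    """
    n_instances = len(ensemble_predictions[0])
    scores = []
    for i in range(n_instances):
        member_preds = [preds[i] for preds in ensemble_predictions]
        scores.append(pvariance(member_preds))
    # Rank instances from most to least uncertain.
    ranked = sorted(range(n_instances), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy predictions from three hypothetical regressors on four unlabeled rows.
preds = [
    [1.0, 5.0, 2.0, 0.0],
    [1.1, 3.0, 2.0, 4.0],
    [0.9, 7.0, 2.0, 8.0],
]
picked = select_queries(preds, k=2)
```

The selected rows would then be sent to the human annotator, the model retrained on the enlarged labelled set, and the loop repeated until the labelling budget is spent.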
# インコンテキスト学習のためのデモ再生による入力ラベルマッピングの改善 Improving Input-label Mapping with Demonstration Replay for In-context Learning ( http://arxiv.org/abs/2310.19572v1 ) ライセンス: Link先を確認	Zhuocheng Gong, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan	(参考訳) In-context Learning(ICL)は、入力にいくつかの入力ラベルを付加して、モデルパラメータを直接調整することなく、下流のNLPタスクに対するモデルの理解を強化する、大規模な自己回帰言語モデルの出現する能力である。 ICLの有効性は、大きな言語モデル(LLM)の強力な言語モデリング能力によるもので、インコンテキストのデモンストレーションに基づいて入力とラベルのマッピングを学習することができる。有望な結果を得たにもかかわらず、ICLにおける言語モデリングの因果性は、注意を後方のみに制限する、すなわちトークンは以前のトークンにのみ対応し、完全な入力ラベル情報の取得に失敗し、モデルの性能を制限している。本稿では,スライディング因果注意法(RdSca)を用いた新たなICL手法を提案する。具体的には、後続のデモンストレーションを複製してフロントに結合し、モデルが因果制限下でも後続の情報を‘オブザーバ’できるようにします。さらに、情報漏洩を避けるために、因果注意をカスタマイズするスライディング因果注意を導入する。実験の結果,本手法はICL実験における入力ラベルマッピングを大幅に改善することがわかった。また,先行研究の未調査領域であるトレーニングなしで因果的注意をカスタマイズする方法について,詳細な分析を行った。 In-context learning (ICL) is an emerging capability of large autoregressive language models where a few input-label demonstrations are appended to the input to enhance the model's understanding of downstream NLP tasks, without directly adjusting the model parameters. The effectiveness of ICL can be attributed to the strong language modeling capabilities of large language models (LLMs), which enable them to learn the mapping between input and labels based on in-context demonstrations. Despite achieving promising results, the causal nature of language modeling in ICL restricts the attention to be backward only, i.e., a token only attends to its previous tokens, failing to capture the full input-label information and limiting the model's performance. In this paper, we propose a novel ICL method called Repeated Demonstration with Sliding Causal Attention (RdSca). Specifically, we duplicate later demonstrations and concatenate them to the front, allowing the model to 'observe' the later information even under the causal restriction. Besides, we introduce sliding causal attention, which customizes causal attention to avoid information leakage. 
Experimental results show that our method significantly improves the input-label mapping in ICL demonstrations. We also conduct an in-depth analysis of how to customize the causal attention without training, which has been an unexplored area in previous research.	翻訳日:2023-11-01 19:55:18 公開日:2023-10-30
# DataZoo: トラフィック分類実験の合理化 DataZoo: Streamlining Traffic Classification Experiments ( http://arxiv.org/abs/2310.19568v1 ) ライセンス: Link先を確認	Jan Luxemburk, Karel Hynek	(参考訳) コンピュータビジョンや自然言語処理などの機械学習コミュニティは、開発を加速するために多数の支援ツールとベンチマークデータセットを開発してきた。対照的に、ネットワークトラフィック分類分野は、ほとんどのタスクの標準ベンチマークデータセットが欠落しており、利用可能なサポートソフトウェアはスコープが限られている。本稿では,このギャップに対処し,ネットワークトラフィック分類におけるデータセット管理の合理化と,評価設定における潜在的なミスの空間削減を目的としたツールセットであるDataZooを紹介する。 DataZooは、CESNET-QUIC22、CESNET-TLS22、CESNET-TLS-Year22という3つの広範なデータセットにアクセスするための標準化されたAPIを提供する。さらに、時間的およびサービス関連要因を考慮して、機能スケーリングと現実的なデータセットパーティショニングの方法も含まれている。 DataZooツールセットは、現実的な評価シナリオの作成を簡単にし、分類方法のクロスコンペア化と結果の再現を容易にする。 The machine learning communities, such as those around computer vision or natural language processing, have developed numerous supportive tools and benchmark datasets to accelerate the development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supportive software is rather limited in scope. This paper aims to address the gap and introduces DataZoo, a toolset designed to streamline dataset management in network traffic classification and to reduce the space for potential mistakes in the evaluation setup. DataZoo provides a standardized API for accessing three extensive datasets -- CESNET-QUIC22, CESNET-TLS22, and CESNET-TLS-Year22. Moreover, it includes methods for feature scaling and realistic dataset partitioning, taking into consideration temporal and service-related factors. The DataZoo toolset simplifies the creation of realistic evaluation scenarios, making it easier to cross-compare classification methods and reproduce results.	翻訳日:2023-11-01 19:54:54 公開日:2023-10-30
# 複数の観測者がKSコンテキスト性を検出することができるか? Can multiple observers detect KS-contextuality? ( http://arxiv.org/abs/2310.19564v1 ) ライセンス: Link先を確認	Arthur C. R. Dutra, Roberto D. Baldijão, Marcelo Terra Cunha	(参考訳) KS-コンテキスト性は量子論の重要な特徴である。以前の研究では、複数の独立したオブザーバが同じシステム上で連続的に測定するセットアップにおいて、$N$-cycle KS-contextualityがなくなりました。この現象は、状態が劣化し、量子資源が枯渇する追加観測者の測定として説明できる。この説明は、状態に依存しない文脈性はそのようなシステムで生き残るべきであることを意味する。本稿では,この現象はそうではないことを示す。この結果は,公共システムにおけるペレスメルミン非文脈性不等式を破ろうとするオブザーバーをシミュレートすることで達成した。さらに, 状況に依存しない場合においても文脈性が失われることを説明するため, 設定の分析的記述を提供する。最終的に、これらの結果は、状態に依存しない文脈性は、ある文脈の測定の間にあるシステムに何が起こるかとは独立ではないことを示している。 KS-contextuality is a crucial feature of quantum theory. Previous research demonstrated the vanishing of $N$-cycle KS-contextuality in setups where multiple independent observers measure sequentially on the same system, which we call Public Systems. This phenomenon can be explained as the additional observers' measurements degrading the state and depleting the quantum resource. This explanation would imply that state-independent contextuality should survive in such a system. In this paper, we show that this is not the case. We achieved this result by simulating an observer trying to violate the Peres-Mermin noncontextuality inequality in a Public System. Additionally, we provide an analytical description of our setup, explaining the loss of contextuality even in the state-independent case. Ultimately, these results show that state-independent contextuality is not independent of what happens to the system in-between the measurements of a context.	翻訳日:2023-11-01 19:54:37 公開日:2023-10-30
# 多様体上のロボット学習のための非パラメトリック回帰 Non-parametric regression for robot learning on manifolds ( http://arxiv.org/abs/2310.19561v1 ) ライセンス: Link先を確認	P. C. Lopez-Custodio, K. Bharath, A. Kucukyilmaz, and S. P. Preston	(参考訳) ロボット学習のためのツールの多くはユークリッドデータのために設計された。しかし、ロボット工学における多くの応用は多様体値データを含む。一般的な例は向き付けであり、これは3-by-3回転行列あるいは四元数として表すことができ、その空間は非ユークリッド多様体である。ロボット学習では、多様体値データはしばしば多様体を適切なユークリッド空間に関連付けるか、あるいは1つまたは複数の接空間にデータを投影することによって処理される。これらのアプローチは予測精度の低さと畳み込みアルゴリズムをもたらす可能性がある。本稿では,多様体内で直接作用する回帰に対する「内在的」なアプローチを提案する。これは多様体上の適切な確率分布を取ることを含み、そのパラメータを時間のような予測変数の関数とし、カーネルを組み込んだ「局所的確率」法によりその関数を非パラメトリックに推定する。我々はこの手法をカーネル化推定と呼ぶ。アプローチは概念的には単純であり、一般に異なる多様体に適用できる。ロボット工学のアプリケーションでよく見られる3種類の多様体値データを用いて実装する。これらの実験の結果はプロジェクションベースアルゴリズムよりも予測精度がよい。 Many of the tools available for robot learning were designed for Euclidean data. However, many applications in robotics involve manifold-valued data. A common example is orientation; this can be represented as a 3-by-3 rotation matrix or a quaternion, the spaces of which are non-Euclidean manifolds. In robot learning, manifold-valued data are often handled by relating the manifold to a suitable Euclidean space, either by embedding the manifold or by projecting the data onto one or several tangent spaces. These approaches can result in poor predictive accuracy, and convoluted algorithms. In this paper, we propose an "intrinsic" approach to regression that works directly within the manifold. It involves taking a suitable probability distribution on the manifold, letting its parameter be a function of a predictor variable, such as time, then estimating that function non-parametrically via a "local likelihood" method that incorporates a kernel. We name the method kernelised likelihood estimation. The approach is conceptually simple, and generally applicable to different manifolds. We implement it with three different types of manifold-valued data that commonly appear in robotics applications. 
The results of these experiments show better predictive accuracy than projection-based algorithms.	翻訳日:2023-11-01 19:54:20 公開日:2023-10-30
# 物理視聴覚コモンセンス推論のための不連続反事実学習 Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning ( http://arxiv.org/abs/2310.19559v1 ) ライセンス: Link先を確認	Changsheng Lv and Shuai Zhang and Yapeng Tian and Mengshi Qi and Huadong Ma	(参考訳) 本稿では,物理視聴覚コモンセンス推論のためのDCL(Disentangled Counterfactual Learning)アプローチを提案する。このタスクは、ビデオとオーディオの両方の入力に基づいて物体の物理常識を推論することを目的としており、主な課題は人間の推論能力を模倣する方法である。現在の手法のほとんどは、マルチモーダルデータにおける異なる特徴を十分に活用できず、モデルの因果推論能力の欠如は、暗黙の物理的知識の推論の進歩を妨げる。これらの問題に対処するために,本提案手法では,可変オートエンコーダ (vae) を応用し,相互情報をコントラスト損失関数で最大化する不連続シーケンシャルエンコーダ (disentangled sequential encoder) による潜在空間内の静的(時間不変)および動的(時変)要素に映像を分離する。さらに,異なる物体間の物理的知識関係のモデル化により,モデルの推論能力を増強する対実的学習モジュールを導入する。提案手法は,任意のベースラインに組み込むことができるプラグアンドプレイモジュールである。実験では,提案手法はベースライン法を改良し,最先端の性能を実現する。ソースコードはhttps://github.com/andy20178/dclで入手できます。 In this paper, we propose a Disentangled Counterfactual Learning (DCL) approach for physical audiovisual commonsense reasoning. The task aims to infer objects' physics commonsense based on both video and audio input, with the main challenge being how to imitate the reasoning ability of humans. Most of the current methods fail to take full advantage of different characteristics in multi-modal data, and the lack of causal reasoning ability in models impedes progress in inferring implicit physical knowledge. To address these issues, our proposed DCL method decouples videos into static (time-invariant) and dynamic (time-varying) factors in the latent space by the disentangled sequential encoder, which adopts a variational autoencoder (VAE) to maximize the mutual information with a contrastive loss function. Furthermore, we introduce a counterfactual learning module to augment the model's reasoning ability by modeling physical knowledge relationships among different objects under counterfactual intervention. Our proposed method is a plug-and-play module that can be incorporated into any baseline. 
In experiments, we show that our proposed method improves baseline methods and achieves state-of-the-art performance. Our source code is available at https://github.com/Andy20178/DCL.	翻訳日:2023-11-01 19:54:02 公開日:2023-10-30
# モデルスパーシフィケーションを用いた非凸・非スムース問題に対するプライバシ保存型連立初等双次学習 Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification ( http://arxiv.org/abs/2310.19558v1 ) ライセンス: Link先を確認	Yiwei Li, Chien-Wei Huang, Shuai Wang, Chong-Yung Chi, Tony Q. S. Quek	(参考訳) フェデレートラーニング(FL)は、クライアントのデータを共有することなく、パラメータサーバ(PS)のオーケストレーションの下で、大規模な分散クライアント上でモデルをトレーニングする、急速に成長する研究領域として認識されている。本稿では,非凸性および非平滑性損失関数を特徴とするフェデレーション問題を,FLアプリケーションで広く普及しているが,非凸性と非平滑性の性質が複雑であり,通信効率とプライバシ保護の矛盾が原因で対処が困難である。本稿では,非凸および非滑らかなFL問題に適した双方向モデルスペーシフィケーションを備えた新しいフェデレーション原始双対アルゴリズムを提案し,高いプライバシー保証のために差分プライバシを適用した。その独特な洞察力といくつかのプライバシーと収束分析は、flアルゴリズム設計ガイドラインにも提示されている。実世界のデータに対する広範囲な実験を行い,提案アルゴリズムの有効性と最先端のflアルゴリズムよりも優れた性能を実証し,解析結果と特性の検証を行った。 Federated learning (FL) has been recognized as a rapidly growing research area, where the model is trained over massively distributed clients under the orchestration of a parameter server (PS) without sharing clients' data. This paper delves into a class of federated problems characterized by non-convex and non-smooth loss functions, that are prevalent in FL applications but challenging to handle due to their intricate non-convexity and non-smoothness nature and the conflicting requirements on communication efficiency and privacy protection. In this paper, we propose a novel federated primal-dual algorithm with bidirectional model sparsification tailored for non-convex and non-smooth FL problems, and differential privacy is applied for strong privacy guarantee. Its unique insightful properties and some privacy and convergence analyses are also presented for the FL algorithm design guidelines. Extensive experiments on real-world data are conducted to demonstrate the effectiveness of the proposed algorithm and much superior performance than some state-of-the-art FL algorithms, together with the validation of all the analytical results and properties.	翻訳日:2023-11-01 19:53:39 公開日:2023-10-30
# 効率的なプレトレーニングによる映像基礎モデルの構築 Harvest Video Foundation Models via Efficient Post-Pretraining ( http://arxiv.org/abs/2310.19554v1 ) ライセンス: Link先を確認	Yizhuo Li, Kunchang Li, Yinan He, Yi Wang, Yali Wang, Limin Wang, Yu Qiao, Ping Luo	(参考訳) ビデオデータの冗長性や高品質なビデオ言語データセットの欠如のため、ビデオ言語基盤モデルの構築は費用がかかり難い。本稿では,画像から映像ファンデーションモデルを取り出すための効率的なフレームワークを提案する。提案手法は,入力ビデオパッチをランダムにドロップし,プレトレーニング後の入力テキストをマスクアウトすることで,直感的に簡単である。パッチドロップはトレーニング効率を大幅に向上させ、テキストマスキングはクロスモーダル融合の学習を強制する。提案手法の有効性を検証するために,ゼロショットタスク,ビデオ質問応答,ビデオテキスト検索など,幅広い下流課題において広範囲な実験を行った。その単純さにもかかわらず、本手法は、事前訓練されたビデオ基盤モデルに匹敵する最先端のパフォーマンスを実現する。この手法は非常に効率的で、8gpuで1日未満でトレーニングでき、プリトレーニングデータとしてwebvid-10mだけを必要とする。当社の手法は,ビデオファンデーションモデルをシンプルかつ強力なものにし,構築時に有用な洞察を提供し,事前学習された大規模モデルをよりアクセスし,持続可能なものにすることを願っている。これはInternVideoプロジェクト \url{https://github.com/OpenGVLab/InternVideo} の一部である。 Building video-language foundation models is costly and difficult due to the redundant nature of video data and the lack of high-quality video-language datasets. In this paper, we propose an efficient framework to harvest video foundation models from image ones. Our method is intuitively simple by randomly dropping input video patches and masking out input text during the post-pretraining procedure. The patch dropping boosts the training efficiency significantly and text masking enforces the learning of cross-modal fusion. We conduct extensive experiments to validate the effectiveness of our method on a wide range of video-language downstream tasks including various zero-shot tasks, video question answering, and video-text retrieval. Despite its simplicity, our method achieves state-of-the-art performances, which are comparable to some heavily pretrained video foundation models. Our method is extremely efficient and can be trained in less than one day on 8 GPUs, requiring only WebVid-10M as pretraining data. 
We hope our method can serve as a simple yet strong counterpart for prevalent video foundation models, provide useful insights when building them, and make large pretrained models more accessible and sustainable. This is part of the InternVideo project (https://github.com/OpenGVLab/InternVideo).	翻訳日:2023-11-01 19:53:19 公開日:2023-10-30
# RayDF:マルチビュー整合性を持つニューラルな地表面距離場 RayDF: Neural Ray-surface Distance Fields with Multi-view Consistency ( http://arxiv.org/abs/2310.19629v1 ) ライセンス: Link先を確認	Zhuoman Liu, Bo Yang	(参考訳) 本稿では,連続3次元形状表現の問題について検討する。既存の成功手法の大半は座標に基づく暗黙的神経表現である。しかし、新しいビューを描画したり、明示的な表面点を復元するのに非効率である。少数の研究がレイベースの神経関数として3次元形状を定式化し始めたが、多視点幾何整合性の欠如により学習された構造は劣っている。これらの課題に対処するために、RayDFと呼ばれる新しいフレームワークを提案する。主な構成要素は3つある。 1)単純光線面距離場。 2)新しい2線視認性分類器,及び 3) 学習された線面距離を多視点形状に整合させるマルチビュー一貫性最適化モジュール。提案手法を3つの公開データセット上で広範に評価し,既存の座標ベースおよびレイベースベースラインを明らかに超越した,合成および挑戦的な実世界の3Dシーンにおける3次元表面点再構成における顕著な性能を示した。最も注目すべきは,800x800深度の画像を描画する座標ベースの手法よりも1000倍高速で,3次元形状表現の精度が向上している点である。私たちのコードとデータはhttps://github.com/vlar-group/raydfで入手できます。 In this paper, we study the problem of continuous 3D shape representations. The majority of existing successful methods are coordinate-based implicit neural representations. However, they are inefficient to render novel views or recover explicit surface points. A few works start to formulate 3D shapes as ray-based neural functions, but the learned structures are inferior due to the lack of multi-view geometry consistency. To tackle these challenges, we propose a new framework called RayDF. It consists of three major components: 1) the simple ray-surface distance field, 2) the novel dual-ray visibility classifier, and 3) a multi-view consistency optimization module to drive the learned ray-surface distances to be multi-view geometry consistent. We extensively evaluate our method on three public datasets, demonstrating remarkable performance in 3D surface point reconstruction on both synthetic and challenging real-world 3D scenes, clearly surpassing existing coordinate-based and ray-based baselines. Most notably, our method achieves a 1000x faster speed than coordinate-based methods to render an 800x800 depth image, showing the superiority of our method for 3D shape representation. Our code and data are available at https://github.com/vLAR-group/RayDF	翻訳日:2023-11-01 19:46:02 公開日:2023-10-30
# トランスフォーメーション対トラディション:芸術と人文科学のための人工知能(AGI) Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities ( http://arxiv.org/abs/2310.19626v1 ) ライセンス: Link先を確認	Zhengliang Liu, Yiwei Li, Qian Cao, Junwen Chen, Tianze Yang, Zihao Wu, John Hale, John Gibbs, Khaled Rasheed, Ninghao Liu, Gengchen Mai, and Tianming Liu	(参考訳) 人工知能(AGI)の最近の進歩、特に大きな言語モデルと創造的な画像生成システムは、芸術や人文科学にまたがる様々なタスクにおいて印象的な能力を示している。しかし、AGIの急速な進化は、文化的に重要なこれらの領域における責任ある展開について批判的な疑問を提起している。本稿では,芸術と人文科学に関連するテキスト,グラフィック,オーディオ,ビデオに対するAGIの応用と意義を包括的に分析する。詩から歴史,マーケティング,映画,コミュニケーションから古典芸術まで幅広い分野における最先端システムとその利用について調査した。我々は, AGIシステムにおける事実性, 毒性, バイアス, 公衆安全に関する重大な懸念を概説し, 緩和戦略を提案する。この論文は、真理や人間の尊厳を損なうことなく、agiが創造性、知識、文化的価値を促進することを保証するために、マルチステイクホルダーの協力を主張する。私たちのタイムリーな貢献は、急速に発展する分野をまとめ、人間の繁栄を中心とした責任ある進歩を提唱しながら、有望な方向性を強調します。この分析は、AGIの技術能力と永続的な社会財との整合性に関するさらなる研究の基盤となる。 Recent advances in artificial general intelligence (AGI), particularly large language models and creative image generation systems have demonstrated impressive capabilities on diverse tasks spanning the arts and humanities. However, the swift evolution of AGI has also raised critical questions about its responsible deployment in these culturally significant domains traditionally seen as profoundly human. This paper provides a comprehensive analysis of the applications and implications of AGI for text, graphics, audio, and video pertaining to arts and the humanities. We survey cutting-edge systems and their usage in areas ranging from poetry to history, marketing to film, and communication to classical art. We outline substantial concerns pertaining to factuality, toxicity, biases, and public safety in AGI systems, and propose mitigation strategies. The paper argues for multi-stakeholder collaboration to ensure AGI promotes creativity, knowledge, and cultural values without undermining truth or human dignity. 
Our timely contribution summarizes a rapidly developing field, highlighting promising directions while advocating for responsible progress centering on human flourishing. The analysis lays the groundwork for further research on aligning AGI's technological capacities with enduring social goods.	翻訳日:2023-11-01 19:45:34 公開日:2023-10-30
# タンパク質言語モデルの学習後量子化の探索 Exploring Post-Training Quantization of Protein Language Models ( http://arxiv.org/abs/2310.19624v1 ) ライセンス: Link先を確認	Shuang Peng, Fei Yang, Ning Sun, Sheng Chen, Yanfeng Jiang, Aimin Pan	(参考訳) esm-1bやesm-2のような教師なしタンパク質言語モデル(proteinlms)の最近の進歩は、さまざまなタンパク質予測タスクで期待されている。しかし、これらのモデルは、高い計算要求、重要なメモリ要求、遅延のために問題に直面し、限られたリソースを持つデバイスでの使用を制限する。そこで本研究では,ProteinLMのポストトレーニング量子化(PTQ)について検討し,ESM-2ProteinLMをベースとしたAlphaFoldの簡易版であるESMFoldに着目した。我々の研究は、たんぱく質の全重みと活性化を定量化する最初の試みである。典型的な均一量子化法はESMFoldでは不十分であり、8ビット量子化ではTMスコアが大幅に低下する。 esmfold,特に層正規化前の高度に非対称なアクティベーション範囲について,幅広い量子化実験を行い,低ビット固定点形式を用いた表現の困難さを明らかにした。これらの課題に対処するために,不斉アクティベーション値の分数次線形量子化を利用して正確な近似を保証する新しいPTQ法を提案する。タンパク質構造予測タスクにおける本手法の有効性を実証し,ESMFoldを精度良く低ビット幅まで正確に定量化できることを示した。さらに,本手法を接触予測タスクに適用し,その汎用性を示した。本研究は,タンパク質膜に対する革新的PTQ法を導入し,特定の量子化課題に対処し,タンパク質関連アプリケーションに重要な意味を持つより効率的なタンパク質膜の開発につながる可能性がある。 Recent advancements in unsupervised protein language models (ProteinLMs), like ESM-1b and ESM-2, have shown promise in different protein prediction tasks. However, these models face challenges due to their high computational demands, significant memory needs, and latency, restricting their usage on devices with limited resources. To tackle this, we explore post-training quantization (PTQ) for ProteinLMs, focusing on ESMFold, a simplified version of AlphaFold based on ESM-2 ProteinLM. Our study is the first attempt to quantize all weights and activations of ProteinLMs. We observed that the typical uniform quantization method performs poorly on ESMFold, causing a significant drop in TM-Score when using 8-bit quantization. We conducted extensive quantization experiments, uncovering unique challenges associated with ESMFold, particularly highly asymmetric activation ranges before Layer Normalization, making representation difficult using low-bit fixed-point formats. 
To address these challenges, we propose a new PTQ method for ProteinLMs, utilizing piecewise linear quantization for asymmetric activation values to ensure accurate approximation. We demonstrated the effectiveness of our method in protein structure prediction tasks, demonstrating that ESMFold can be accurately quantized to low-bit widths without compromising accuracy. Additionally, we applied our method to the contact prediction task, showcasing its versatility. In summary, our study introduces an innovative PTQ method for ProteinLMs, addressing specific quantization challenges and potentially leading to the development of more efficient ProteinLMs with significant implications for various protein-related applications.	翻訳日:2023-11-01 19:44:40 公開日:2023-10-30
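The contrast drawn in the abstract above, between uniform quantization and piecewise linear quantization of asymmetric activation ranges, can be illustrated with a small sketch. This is not the authors' method: the function names, the two-region split, the breakpoint at 1.0, and the synthetic activation values are all assumptions chosen only to show why a long one-sided tail hurts a single uniform grid.

```python
def uniform_quantize(x, lo, hi, bits):
    """Round x to the nearest of 2**bits evenly spaced levels on [lo, hi]."""
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    clipped = min(max(x, lo), hi)
    return lo + round((clipped - lo) / step) * step


def piecewise_quantize(x, lo, breakpoint, hi, bits):
    """Two-region piecewise linear quantization (illustrative assumption).

    One bit selects the region; the remaining bits quantize uniformly inside it.
    """
    if x <= breakpoint:
        return uniform_quantize(x, lo, breakpoint, bits - 1)
    return uniform_quantize(x, breakpoint, hi, bits - 1)


def mean_abs_error(values, quantizer):
    """Average absolute difference between values and their quantized versions."""
    return sum(abs(v - quantizer(v)) for v in values) / len(values)


# Hypothetical asymmetric activations: a dense cluster in [-1, 1) and a long positive tail.
activations = [i / 100 - 1.0 for i in range(200)] + [10.0 * k for k in range(1, 11)]
lo, hi = min(activations), max(activations)

uniform_err = mean_abs_error(activations, lambda v: uniform_quantize(v, lo, hi, 8))
piecewise_err = mean_abs_error(
    activations, lambda v: piecewise_quantize(v, lo, 1.0, hi, 8)
)
print(uniform_err, piecewise_err)
```

With these made-up values, the single 8-bit grid spends most of its levels on the sparse tail, so the dense cluster is represented coarsely. The piecewise variant gives the cluster its own fine grid and has a lower mean error on this data.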
# 疎正準相関推定のためのベイズ的手法 A Bayesian Methodology for Estimation for Sparse Canonical Correlation ( http://arxiv.org/abs/2310.19621v1 ) ライセンス: Link先を確認	Siddhesh Kulkarni, Subhadip Pal, Jeremy T. Gaskins	(参考訳) 共同研究に参加した被験者ごとに異なる実験から得られた多視点高次元データの統合的統計解析を行うことは困難である。標準相関解析 (CCA) は、そのようなデータセット間の関係を識別するための統計的手続きである。その文脈において、構造化スパースCA(Structured Sparse CCA, ScSCCA)は、対応するCCA方向ベクトルをスパースと仮定して、異なるデータモダリティ間の相互関係の堅牢なモデリングを目的とした、急速に発展する方法論分野である。急速に成長している統計方法論開発地域であるが、ベイズパラダイムで関連する方法論を開発する必要がある。本稿では,ベイズ無限因子モデルを用いて,モデリングフレームワークの2つの異なるレベルでのスパース性を促進することにより,頑健な推定を実現することを目的とした,新しいscscca手法を提案する。まず, 潜時変荷重行列のレベルにおいて, スパーシリティを促進するために, 乗算半コーシー法を用いる。さらに,グラフィカルホースシューの事前構造や対角構造を用いて,共分散行列のさらなるスパース性を促進する。提案手法と他の頻繁に使用されるCCA法の性能を比較するために複数のシミュレーションを行い,乳がん研究から得られたマルチオミクスデータを解析するために,本手法を適用した。 It can be challenging to perform an integrative statistical analysis of multi-view high-dimensional data acquired from different experiments on each subject who participated in a joint study. Canonical Correlation Analysis (CCA) is a statistical procedure for identifying relationships between such data sets. In that context, Structured Sparse CCA (ScSCCA) is a rapidly emerging methodological area that aims for robust modeling of the interrelations between the different data modalities by assuming the corresponding CCA directional vectors to be sparse. Although it is a rapidly growing area of statistical methodology development, there is a need for developing related methodologies in the Bayesian paradigm. In this manuscript, we propose a novel ScSCCA approach where we employ a Bayesian infinite factor model and aim to achieve robust estimation by encouraging sparsity in two different levels of the modeling framework. Firstly, we utilize a multiplicative Half-Cauchy process prior to encourage sparsity at the level of the latent variable loading matrices. Additionally, we promote further sparsity in the covariance matrix by using graphical horseshoe prior or diagonal structure. 
We conduct multiple simulations to compare the performance of the proposed method with that of other frequently used CCA procedures, and we apply the developed procedures to analyze multi-omics data arising from a breast cancer study.	翻訳日:2023-11-01 19:44:04 公開日:2023-10-30
# 大規模軌道モデルはスケーラブルな運動予測器とプランナーである Large Trajectory Models are Scalable Motion Predictors and Planners ( http://arxiv.org/abs/2310.19620v1 ) ライセンス: Link先を確認	Qiao Sun, Shiduo Zhang, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao, Hang Zhao	(参考訳) 運動予測と計画は自動運転において重要なタスクであり、最近の取り組みは機械学習ベースのアプローチに移行している。課題には、多様な道路トポロジの理解、長期にわたる交通力学の推論、異種行動の解釈、大規模連続状態空間におけるポリシーの生成などが含まれる。モデルスケーリングによる類似の複雑さに対処する大規模言語モデルの成功に触発されて、我々はState Transformer (STR)と呼ばれるスケーラブルなトラジェクトリモデルを導入した。 strは、観測、状態、動作を一つの統一シーケンスモデリングタスクに配置することで、動き予測と動き計画の問題を再構成する。単純なモデル設計では、STRは両問題におけるベースラインアプローチを一貫して上回っている。実験結果から,STRなどの大型軌道モデル(LTM)は,優れた適応性と学習効率を示すことにより,スケーリング法則に従うことが明らかとなった。定性的な結果は、LTMがトレーニングデータ分布から大きく分岐するシナリオにおいて、妥当な予測を行うことができることを示している。 LTMはまた、明確な損失設計やコストの高い高レベルのアノテーションなしで、長期計画のための複雑な推論を行うことを学ぶ。 Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models in addressing similar complexities through model scaling, we introduce a scalable trajectory model called State Transformer (STR). STR reformulates the motion prediction and motion planning problems by arranging observations, states, and actions into one unified sequence modeling task. With a simple model design, STR consistently outperforms baseline approaches in both problems. Remarkably, experimental results reveal that large trajectory models (LTMs), such as STR, adhere to the scaling laws by presenting outstanding adaptability and learning efficiency. Qualitative results further demonstrate that LTMs are capable of making plausible predictions in scenarios that diverge significantly from the training data distribution. 
LTMs also learn to make complex reasonings for long-term planning, without explicit loss designs or costly high-level annotations.	翻訳日:2023-11-01 19:43:41 公開日:2023-10-30
# 大規模言語モデルにおける心の選別理論の全体像に向けて Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models ( http://arxiv.org/abs/2310.19619v1 ) ライセンス: Link先を確認	Ziqiao Ma, Jacob Sansom, Run Peng, Joyce Chai	(参考訳) 大規模言語モデル(llm)は、心の理論(tom)の潜在的出現に関して、かなりの関心と議論を生み出した。最近のいくつかの調査では、これらのモデルに堅牢なToMが欠如していることが判明し、新しいベンチマークの開発に対する需要が高まっている。本稿では,(1)ToMの全体像を分類するにはどうすればよいのか,という2つの道路封鎖問題に答える。 2) マシンToMのより効果的な評価プロトコルとは何か? 心理学的な研究の後、機械のToMを7つの精神状態カテゴリーに分類し、既存のベンチマークでToMの未調査側面を特定する。 ToMの総合的かつ位置的評価により、ToMを個々の構成要素に分解し、LLMを物理的に環境に配置し、人間との相互作用において社会的に位置するエージェントとして扱う。このような位置評価は、精神状態をより包括的に評価し、近道やデータ漏洩のリスクを軽減する可能性がある。さらに,概念実証としてグリッド・ワールド・セットアップにおけるパイロット・スタディを提案する。このポジションペーパーは将来,ToM と LLM を統合し,研究者がToM のランドスケープで作業を行うための直感的な手段となることを期待する。プロジェクトページ:https://github.com/Mars-tin/awesome-theory-of-mind Large Language Models (LLMs) have generated considerable interest and debate regarding their potential emergence of Theory of Mind (ToM). Several recent inquiries reveal a lack of robust ToM in these models and pose a pressing demand to develop new benchmarks, as current ones primarily focus on different aspects of ToM and are prone to shortcuts and data leakage. In this position paper, we seek to answer two road-blocking questions: (1) How can we taxonomize a holistic landscape of machine ToM? (2) What is a more effective evaluation protocol for machine ToM? Following psychological studies, we taxonomize machine ToM into 7 mental state categories and delineate existing benchmarks to identify under-explored aspects of ToM. We argue for a holistic and situated evaluation of ToM to break ToM into individual components and treat LLMs as an agent who is physically situated in environments and socially situated in interactions with humans. Such situated evaluation provides a more comprehensive assessment of mental states and potentially mitigates the risk of shortcuts and data leakage. We further present a pilot study in a grid world setup as a proof of concept. 
We hope this position paper can facilitate future research to integrate ToM with LLMs and offer an intuitive means for researchers to better position their work in the landscape of ToM. Project page: https://github.com/Mars-tin/awesome-theory-of-mind	翻訳日:2023-11-01 19:43:21 公開日:2023-10-30
# 抑制性神経回路はシナプス可塑性のサインを制御する Dis-inhibitory neuronal circuits can control the sign of synaptic plasticity ( http://arxiv.org/abs/2310.19614v1 ) ライセンス: Link先を確認	Julian Rossbroich, Friedemann Zenke	(参考訳) 神経回路がどのように信用割り当てを達成するかは、システム神経科学において未解決の課題である。様々な研究により、多層ネットワークによるバックプロパゲートエラー信号の解法が提案されている。これらの純粋に機能的に動機づけられたモデルは、シナプス可塑性の徴候を決定する局所的エラー信号を表すために異なる神経細胞のコンパートメントを仮定する。しかし、この明示的な誤り変調は、主にシナプス後活動に依存する現象学的可塑性モデルと矛盾する。本稿では,適応制御理論の枠組みで導かれる可解なマイクロ回路モデルとヘビー学習規則が,この不一致をいかに解消するかを示す。誤りがトップダウン非抑制シナプス求心性にコード化されていると仮定すると、繰り返し抑制がヘビアン可塑性に明示的に影響を及ぼすと、誤り修飾学習は回路レベルで自然に現れる。同じ学習規則は、抑制がない場合の可塑性を実験的に観察し、いくつかの非線形分離可能なベンチマークでエラーのバックプロパゲーション(bp)に比較可能である。本研究は, 機能的および実験的に観察された可塑性規則のギャップを埋め, 励起可塑性の抑制に関する具体的な予測を行う。 How neuronal circuits achieve credit assignment remains a central unsolved question in systems neuroscience. Various studies have suggested plausible solutions for back-propagating error signals through multi-layer networks. These purely functionally motivated models assume distinct neuronal compartments to represent local error signals that determine the sign of synaptic plasticity. However, this explicit error modulation is inconsistent with phenomenological plasticity models in which the sign depends primarily on postsynaptic activity. Here we show how a plausible microcircuit model and Hebbian learning rule derived within an adaptive control theory framework can resolve this discrepancy. Assuming errors are encoded in top-down dis-inhibitory synaptic afferents, we show that error-modulated learning emerges naturally at the circuit level when recurrent inhibition explicitly influences Hebbian plasticity. The same learning rule accounts for experimentally observed plasticity in the absence of inhibition and performs comparably to back-propagation of error (BP) on several non-linearly separable benchmarks. Our findings bridge the gap between functional and experimentally observed plasticity rules and make concrete predictions on inhibitory modulation of excitatory plasticity.	
翻訳日:2023-11-01 19:42:59 公開日:2023-10-30
# 部分ベイズニューラルネットワークのFeynman-Kacトレーニングについて On Feynman-Kac training of partial Bayesian neural networks ( http://arxiv.org/abs/2310.19608v1 ) ライセンス: Link先を確認	Zheng Zhao, Sebastian Mair, Thomas B. Schön, Jens Sjölund	(参考訳) 近年,パラメータのサブセットのみを確率的と考える部分ベイズニューラルネットワーク (pBNNs) が,完全なベイズニューラルネットワークと競合することが示された。しかし、pBNNはしばしば潜在変数空間において多重モードであり、パラメトリックモデルに近似することは困難である。そこで本研究では,Feynman-Kacモデルのシミュレーションとして,pBNNのトレーニングを定式化した,効率的なサンプリングベーストレーニング戦略を提案する。次に,このモデルのパラメータと潜在後続分布を同時に計算可能な計算コストで推定できる逐次モンテカルロサンプリングの変種について述べる。我々は,様々な合成データと実世界のデータセットについて,提案手法が予測性能の面での最先端を上回っていることを示す。 Recently, partial Bayesian neural networks (pBNNs), which only consider a subset of the parameters to be stochastic, were shown to perform competitively with full Bayesian neural networks. However, pBNNs are often multi-modal in the latent-variable space and thus challenging to approximate with parametric models. To address this problem, we propose an efficient sampling-based training strategy, wherein the training of a pBNN is formulated as simulating a Feynman-Kac model. We then describe variations of sequential Monte Carlo samplers that allow us to simultaneously estimate the parameters and the latent posterior distribution of this model at a tractable computational cost. We show on various synthetic and real-world datasets that our proposed training scheme outperforms the state of the art in terms of predictive performance.	翻訳日:2023-11-01 19:42:39 公開日:2023-10-30
# 抽象的議論による事例ベース推論における事例関連性の学習に関する技術報告 Technical Report on the Learning of Case Relevance in Case-Based Reasoning with Abstract Argumentation ( http://arxiv.org/abs/2310.19607v1 ) ライセンス: Link先を確認	Guilherme Paulino-Passos, Francesca Toni	(参考訳) ケースベース推論は、いくつかの法的設定において重要な役割を果たすことが知られている。本稿では,最近の事例ベース推論のアプローチに注目し,議論が事例を表現し,事例間の結果の不一致と関連性の概念による攻撃結果を示す抽象的議論のインスタンス化が支持する。この文脈では、関連性はケース間の特異性の形式に結びついている。我々は,意思決定木を駆使して,ケースベース推論と抽象的議論(aa-cbr)の組み合わせと,法的場面における予測のためのケース関連学習について検討する。具体的には,aa-cbr と decision-tree-based learning の2つの法定データセットについて,決定木との比較で比較検討を行った。また,AA-CBRによるケース関係の学習により,決定木よりもコンパクトな表現が得られ,認知に難渋する説明を得る上で有益であることが示唆された。 Case-based reasoning is known to play an important role in several legal settings. In this paper we focus on a recent approach to case-based reasoning, supported by an instantiation of abstract argumentation whereby arguments represent cases and attack between arguments results from outcome disagreement between cases and a notion of relevance. In this context, relevance is connected to a form of specificity among cases. We explore how relevance can be learnt automatically in practice with the help of decision trees, and explore the combination of case-based reasoning with abstract argumentation (AA-CBR) and learning of case relevance for prediction in legal settings. Specifically, we show that, for two legal datasets, AA-CBR and decision-tree-based learning of case relevance perform competitively in comparison with decision trees. We also show that AA-CBR with decision-tree-based learning of case relevance results in a more compact representation than their decision tree counterparts, which could be beneficial for obtaining cognitively tractable explanations.	翻訳日:2023-11-01 19:42:24 公開日:2023-10-30
# Deep Kalman Filters can Filter Deep Kalman Filters Can Filter ( http://arxiv.org/abs/2310.19603v1 ) ライセンス: Link先を確認	Blanka Hovart, Anastasis Kratsios, Yannick Limmer, Xuwei Yang	(参考訳) ディープカルマンフィルタ(Deep Kalman Filter、DKF)は、逐次データからガウス確率測度を生成するニューラルネットワークモデルのクラスである。 DKFはカルマンフィルタにインスパイアされたものの、確率的フィルタリング問題と具体的な理論的関係が欠如しているため、数学ファイナンスにおけるボンドとオプション価格のモデルキャリブレーションなど、従来のモデルベースフィルタが使用されている領域に適用性に制限される。本研究では,非マルコフ的および条件付きガウス的信号過程の条件法則を概略実装できる連続時間DKFのクラスを示すことで,ディープラーニングの数学的基礎に対処する。近似結果は、与えられたコンパクトなパスの集合に対して計算された最悪のケース2-ワッサーシュタイン距離によって近似誤差を定量化する。 Deep Kalman filters (DKFs) are a class of neural network models that generate Gaussian probability measures from sequential data. Though DKFs are inspired by the Kalman filter, they lack concrete theoretical ties to the stochastic filtering problem, thus limiting their applicability to areas where traditional model-based filters have been used, e.g., model calibration for bond and option prices in mathematical finance. We address this issue in the mathematical foundations of deep learning by exhibiting a class of continuous-time DKFs which can approximately implement the conditional law of a broad class of non-Markovian and conditionally Gaussian signal processes given noisy continuous-time measurements. Our approximation results hold uniformly over sufficiently regular compact subsets of paths, where the approximation error is quantified by the worst-case 2-Wasserstein distance computed uniformly over the given compact set of paths.	翻訳日:2023-11-01 19:42:07 公開日:2023-10-30
# スピンスピン結合を持つ2量子ビット量子ラビモデルにおける熱力学極限 Thermodynamic Limit in the Two-qubit Quantum Rabi Model with Spin-Spin Coupling ( http://arxiv.org/abs/2310.19595v1 ) ライセンス: Link先を確認	R. Grimaudo, A. Messina, A. Sergi, E. Solano, and D. Valenti	(参考訳) 同じ量子化場モードに結合された2つの相互作用量子ビットからなる量子系において、2次の超放射量子相転移が生じることを明らかにする。スピンスピン相互作用を持つ可積分な2量子ビット量子ラビモデルに対して,熱力学極限に類する適切な極限を導入する。すなわち、この極限は、スピンとモードの周波数比に関係なく、スピンスピン結合およびスピンモード結合のモード周波数に対する比が無限大となることで定まる。 The occurrence of a second-order superradiant quantum phase transition is brought to light in a quantum system consisting of two interacting qubits coupled to the same quantized field mode. We introduce an appropriate thermodynamic-like limit for the integrable two-qubit quantum Rabi model with spin-spin interaction. Namely, it is determined by the infinite ratios of the spin-spin and the spin-mode couplings to the mode frequency, regardless of the spin-to-mode frequency ratios.	翻訳日:2023-11-01 19:41:24 公開日:2023-10-30
# エキスパートアドバイスを用いた局所定常データの予測 Prediction of Locally Stationary Data Using Expert Advice ( http://arxiv.org/abs/2310.19591v1 ) ライセンス: Link先を確認	Vladimir V'yugin, Vladimir Trunov	(参考訳) 継続的機械学習の問題を研究する。ゲーム理論的アプローチの枠組みでは、次の予測を計算する際に、データフローを生成するソースの確率的性質に関する仮定を用いない。ソースはアナログ的、アルゴリズム的、確率的のいずれでもよく、そのパラメータはランダムな時点で変化しうる。予測モデルの構築には、データ生成の性質に関する構造的仮定のみを用いる。局所定常時系列のオンライン予測アルゴリズムを提示し、提案アルゴリズムの効率の推定を得る。 The problem of continuous machine learning is studied. Within the game-theoretic approach, computing the next forecast uses no assumptions about the stochastic nature of the source that generates the data flow: the source can be analog, algorithmic, or probabilistic, and its parameters can change at random times. Building the prognostic model uses only structural assumptions about the nature of data generation. An online forecasting algorithm for a locally stationary time series is presented. An estimate of the efficiency of the proposed algorithm is obtained.	翻訳日:2023-11-01 19:41:18 公開日:2023-10-30
# シャープ解によって特徴づけられる偏微分方程式を解く演算子学習による物理インフォームドニューラルネットワーク Operator Learning Enhanced Physics-informed Neural Networks for Solving Partial Differential Equations Characterized by Sharp Solutions ( http://arxiv.org/abs/2310.19590v1 ) ライセンス: Link先を確認	Bin Lin, Zhiping Mao, Zhicheng Wang, George Em Karniadakis	(参考訳) 物理インフォームドニューラルネットワーク(PINN)は偏微分方程式(PDE)の前方および逆問題の解法として有望なアプローチとして示されている。一方、ディープ・オペレーター・ネットワーク(DeepONet)やフーリエ・ニューラル・オペレータ(FNO)などの手法を含むニューラル・オペレーター・アプローチは、PDEの近似ソリューションとして広く採用されている。それでも、シャープなソリューションからなる問題を解決することは、この2つのアプローチを採用する際に大きな課題となる。そこで本研究では,演算子学習強化物理インフォームドニューラルネットワーク(OL-PINN)と呼ばれる新しいフレームワークを提案する。まず,deeponetを用いて,鋭い解を特徴とするpdesに関連する平滑な問題の集合について解演算子を学習する。その後、トレーニング済みのDeepONetをPINNと統合し、ターゲットのシャープな解問題を解決する。本稿では, 非線形拡散反応方程式, バーガーズ方程式, 圧縮不能なナビエ・ストークス方程式などの様々な問題をレイノルズ数で解くことで, OL-PINNの有効性を示す。提案手法はバニラピンと比較すると,強い一般化能力を達成するために少数の残差点しか必要としない。さらに、堅牢なトレーニングプロセスを確保しながら、精度を大幅に向上させる。さらに、OL-PINNは逆問題を解決するためにPINNの利点を継承する。この目的のために,ol-pinn法を部分境界条件のみを用いて解くことに応用し,古典的数値解法では解くことが困難であり,不適切な問題やより複雑な逆問題を解く能力を示す。 Physics-informed Neural Networks (PINNs) have been shown as a promising approach for solving both forward and inverse problems of partial differential equations (PDEs). Meanwhile, the neural operator approach, including methods such as Deep Operator Network (DeepONet) and Fourier neural operator (FNO), has been introduced and extensively employed in approximating solution of PDEs. Nevertheless, to solve problems consisting of sharp solutions poses a significant challenge when employing these two approaches. To address this issue, we propose in this work a novel framework termed Operator Learning Enhanced Physics-informed Neural Networks (OL-PINN). Initially, we utilize DeepONet to learn the solution operator for a set of smooth problems relevant to the PDEs characterized by sharp solutions. Subsequently, we integrate the pre-trained DeepONet with PINN to resolve the target sharp solution problem. 
We showcase the efficacy of OL-PINN by successfully addressing various problems, such as the nonlinear diffusion-reaction equation, the Burgers equation and the incompressible Navier-Stokes equation at high Reynolds number. Compared with the vanilla PINN, the proposed method requires only a small number of residual points to achieve a strong generalization capability. Moreover, it substantially enhances accuracy, while also ensuring a robust training process. Furthermore, OL-PINN inherits the advantage of PINN for solving inverse problems. To this end, we apply the OL-PINN approach for solving problems with only partial boundary conditions, which usually cannot be solved by the classical numerical methods, showing its capacity in solving ill-posed problems and consequently more complex inverse problems.	翻訳日:2023-11-01 19:41:08 公開日:2023-10-30
# ゲージ同変非線形メッセージパッシングを用いたメッシュ上のモデリングダイナミクス Modeling Dynamics over Meshes with Gauge Equivariant Nonlinear Message Passing ( http://arxiv.org/abs/2310.19589v1 ) ライセンス: Link先を確認	Jung Yeon Park, Lawson L.S. Wong, Robin Walters	(参考訳) 非ユークリッド多様体上のデータは、しばしば表面メッシュとして離散化され、コンピュータグラフィックスや生物学的および物理的システムに自然に現れる。特に、多様体上の偏微分方程式(PDE)の解は、基礎となる幾何学に批判的に依存する。グラフニューラルネットワークはPDEにうまく適用されているが、曲面幾何学を取り入れておらず、多様体の局所ゲージ対称性を考慮していない。あるいは、メッシュ上のゲージ同変畳み込みおよび注意アーキテクチャに関する最近の研究は、基礎となる幾何学を活用するが、複雑な非線形力学を持つ表面PDEのモデル化では不十分である。これらの問題に対処するため、非線形メッセージパッシングを用いた新しいゲージ同変アーキテクチャを提案する。我々の新しいアーキテクチャは、複雑で非線形なドメイン上の畳み込みネットワークや注意ネットワークよりも高い性能を実現する。しかし、非メッシュの場合と同様に、設計上のトレードオフは、異なるタスクに対して畳み込み、注意、またはメッセージパッシングのネットワークを好む。 Data over non-Euclidean manifolds, often discretized as surface meshes, naturally arise in computer graphics and biological and physical systems. In particular, solutions to partial differential equations (PDEs) over manifolds depend critically on the underlying geometry. While graph neural networks have been successfully applied to PDEs, they do not incorporate surface geometry and do not consider local gauge symmetries of the manifold. Alternatively, recent works on gauge equivariant convolutional and attentional architectures on meshes leverage the underlying geometry but underperform in modeling surface PDEs with complex nonlinear dynamics. To address these issues, we introduce a new gauge equivariant architecture using nonlinear message passing. Our novel architecture achieves higher performance than either convolutional or attentional networks on domains with highly complex and nonlinear dynamics. However, similar to the non-mesh case, design trade-offs favor convolutional, attentional, or message passing networks for different tasks; we investigate in which circumstances our message passing method provides the most benefit.	翻訳日:2023-11-01 19:40:38 公開日:2023-10-30
# DrM: 休眠率最小化による視覚強化学習の習得 DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization ( http://arxiv.org/abs/2310.19668v1 ) ライセンス: Link先を確認	Guowei Xu, Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Zhecheng Yuan, Tianying Ji, Yu Luo, Xiaoyu Liu, Jiaxin Yuan, Pu Hua, Shuzhen Li, Yanjie Ze, Hal Daum\'e III, Furong Huang, Huazhe Xu	(参考訳) 視覚強化学習(RL)は連続制御タスクにおいて有望である。その進歩にもかかわらず、現在のアルゴリズムは、サンプル効率、漸近的性能、ランダム種の選択に対する堅牢性など、事実上あらゆるパフォーマンス面で満足できない。本稿では、初期訓練中に持続的不活性を示すエージェントである既存の視覚的RL法の主な欠点を特定し、効果的に探索する能力を制限する。さらに,この重要な観察により,運動的不活発な探索に対するエージェントの傾きと,その政策ネットワークにおける神経活動の欠如との間に有意な相関が明らかとなった。この不活性を定量化するために、RLエージェントのネットワークにおける不活性を測定するために、休眠比を計量として採用する。また, 報酬信号によらず, 休眠比がエージェントの活動レベルのスタンドアロン指標として機能することを実証的に認識する。上記の知見を生かしたdrmは,エージェントの探索・探索トレードオフを積極的に最小化することにより,3つのコアメカニズムを用いてガイドする手法である。実験によると、DrMはDeepMind Control Suite、MetaWorld、Adroitを含む3つの連続制御ベンチマーク環境において、壊れた種(合計76種)なしでサンプル効率と漸近性能を大幅に改善する。最も重要なことは、drmはdeepmindコントロールスイートの犬とマニピュレータドメインの両方のタスクを一貫して解決する最初のモデルフリーなアルゴリズムである。 Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite its progress, current algorithms are still unsatisfactory in virtually every aspect of the performance such as sample efficiency, asymptotic performance, and their robustness to the choice of random seeds. In this paper, we identify a major shortcoming in existing visual RL methods that is the agents often exhibit sustained inactivity during early training, thereby limiting their ability to explore effectively. Expanding upon this crucial observation, we additionally unveil a significant correlation between the agents' inclination towards motorically inactive exploration and the absence of neuronal activity within their policy networks. To quantify this inactivity, we adopt dormant ratio as a metric to measure inactivity in the RL agent's network. Empirically, we also recognize that the dormant ratio can act as a standalone indicator of an agent's activity level, regardless of the received reward signals. 
Leveraging the aforementioned insights, we introduce DrM, a method that uses three core mechanisms to guide agents' exploration-exploitation trade-offs by actively minimizing the dormant ratio. Experiments demonstrate that DrM achieves significant improvements in sample efficiency and asymptotic performance with no broken seeds (76 seeds in total) across three continuous control benchmark environments, including DeepMind Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first model-free algorithm that consistently solves tasks in both the Dog and Manipulator domains from the DeepMind Control Suite as well as three dexterous hand manipulation tasks without demonstrations in Adroit, all based on pixel observations.	翻訳日:2023-11-01 19:34:16 公開日:2023-10-30
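The abstract above names the dormant ratio as its inactivity metric but does not define it. The sketch below assumes a commonly used definition: a neuron counts as dormant when its mean absolute activation, divided by the average of that quantity over its layer, is at most a threshold `tau`. The function name, the nested-list input layout, the default `tau=0.1`, and the sample activations are all hypothetical and are not taken from the DrM paper.

```python
def dormant_ratio(layer_activations, tau=0.1):
    """Fraction of neurons whose normalized mean |activation| is <= tau.

    layer_activations: list of layers; each layer is a list of samples;
    each sample is a list with one activation per neuron.
    """
    dormant = 0
    total = 0
    for samples in layer_activations:
        n_neurons = len(samples[0])
        # Mean absolute activation of each neuron over the input samples.
        mean_abs = [
            sum(abs(sample[j]) for sample in samples) / len(samples)
            for j in range(n_neurons)
        ]
        layer_mean = sum(mean_abs) / n_neurons
        for m in mean_abs:
            # Normalize by the layer average so the threshold is scale-free.
            score = m / layer_mean if layer_mean > 0 else 0.0
            if score <= tau:
                dormant += 1
        total += n_neurons
    return dormant / total


# Hypothetical activations: one layer, two input samples, four neurons.
layer = [[0.0, 2.0, 1.0, 0.01], [0.0, 4.0, 1.0, 0.03]]
print(dormant_ratio([layer], tau=0.1))  # prints 0.5
```

Here the first and fourth neurons fall below the threshold, so half of the layer is counted as dormant. Under this assumed definition the metric needs only forward activations and no reward signal, which fits the abstract's remark that it can act as a standalone indicator of activity.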
# 神経拡散反応過程による動的テンソル分解 Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes ( http://arxiv.org/abs/2310.19666v1 ) ライセンス: Link先を確認	Zheng Wang, Shikai Fang, Shibo Li, Shandian Zhe	(参考訳) テンソル分解は多方向データ解析の重要なツールである。実際には、データはしばしばスパースされ、リッチな時間情報と関連付けられる。しかし、既存の手法はしばしば時間情報を過小評価し、わずかに観察されたテンソルエントリ内の構造的知識を無視する。これらの制限を克服し、その基盤となる時間構造をよりよく捉えるために、Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE)を提案する。各テンソルモードにおけるエンティティの動的埋め込みを推定するニューラル拡散-反応プロセスを開発した。具体的には、観測されたテンソルエントリに基づいて、エンティティ間の相関をエンコードする多成分グラフを構築する。グラフ拡散プロセスを構築し、相関したエンティティの埋め込み軌道を共進化させ、ニューラルネットワークを用いて個々のエンティティに対する反応プロセスを構築する。このようにして、我々のモデルは、異なる実体に対する埋め込みの進化において、共通性と個性の両方を捉えることができる。次に、ニューラルネットワークを用いて入力値を埋め込み軌道の非線形関数としてモデル化する。モデル推定にはODEソルバを組み合わせて確率的ミニバッチ学習アルゴリズムを開発する。本稿では,各ミニバッチの処理コストのバランスをとるための階層化サンプリング手法を提案する。我々はシミュレーション研究と実世界のアプリケーションの両方において,このアプローチの利点を示す。コードはhttps://github.com/wzhut/dynamic-tensor-decomposition-via-neural-diffusion-reaction-processesで入手できる。 Tensor decomposition is an important tool for multiway data analysis. In practice, the data is often sparse yet associated with rich temporal information. Existing methods, however, often under-use the time information and ignore the structural knowledge within the sparsely observed tensor entries. To overcome these limitations and to better capture the underlying temporal structure, we propose Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE). We develop a neural diffusion-reaction process to estimate dynamic embeddings for the entities in each tensor mode. Specifically, based on the observed tensor entries, we build a multi-partite graph to encode the correlation between the entities. We construct a graph diffusion process to co-evolve the embedding trajectories of the correlated entities and use a neural network to construct a reaction process for each individual entity. In this way, our model can capture both the commonalities and personalities during the evolution of the embeddings for different entities. 
We then use a neural network to model the entry value as a nonlinear function of the embedding trajectories. For model estimation, we combine ODE solvers to develop a stochastic mini-batch learning algorithm. We propose a stratified sampling method to balance the cost of processing each mini-batch so as to improve the overall efficiency. We show the advantage of our approach in both simulation study and real-world applications. The code is available at https://github.com/wzhut/Dynamic-Tensor-Decomposition-via-Neural-Diffusion-Reaction-Processes.	翻訳日:2023-11-01 19:33:43 公開日:2023-10-30
# Classification of the anyon sectors of Kitaev's quantum double model ( http://arxiv.org/abs/2310.19661v1 ) License: see link	Alex Bols, Siddharth Vadnerkar	We give a complete classification of the anyon sectors of Kitaev's quantum double model on the infinite triangular lattice and for finite gauge group $G$, including the non-abelian case. As conjectured, the anyon sectors of the model correspond precisely to the irreducible representations of the quantum double algebra of $G$. Our proof consists of two main parts. In the first part, we construct for each irreducible representation of the quantum double algebra a pure state and show that the GNS representations of these pure states are pairwise disjoint anyon sectors. In the second part, we show that any anyon sector is unitarily equivalent to one of the anyon sectors constructed in the first part. The first part of the proof crucially uses a description of the states in question as string-net condensates. Purity is shown by characterising these states as the unique states that satisfy appropriate sets of local constraints. At the core of the proof is the fact that certain groups of local gauge transformations act freely and transitively on collections of local string-nets. For the second part, we show that any anyon sector contains a pure state that satisfies all but a finite number of these constraints. Using known techniques we can then construct a pure state in the anyon sector that satisfies all but one of these constraints. Finally, we show explicitly that any such state must be a vector state in one of the anyon sectors constructed in the first part.	Translated: 2023-11-01 19:33:20 Published: 2023-10-30
# Interpretable-by-Design Text Classification with Iteratively Generated Concept Bottleneck ( http://arxiv.org/abs/2310.19660v1 ) License: see link	Josh Magnus Ludan, Qing Lyu, Yue Yang, Liam Dugan, Mark Yatskar, Chris Callison-Burch	Deep neural networks excel in text classification tasks, yet their application in high-stakes domains is hindered by their lack of interpretability. To address this, we propose Text Bottleneck Models (TBMs), an intrinsically interpretable text classification framework that offers both global and local explanations. Rather than directly predicting the output label, TBMs predict categorical values for a sparse set of salient concepts and use a linear layer over those concept values to produce the final prediction. These concepts can be automatically discovered and measured by a Large Language Model (LLM), without the need for human curation. On 12 diverse datasets, using GPT-4 for both concept generation and measurement, we show that TBMs can rival the performance of established black-box baselines such as few-shot GPT-4 and finetuned DeBERTa, while falling short against finetuned GPT-3.5. Overall, our findings suggest that TBMs are a promising new framework that enhances interpretability, with minimal performance tradeoffs, particularly for general-domain text.	Translated: 2023-11-01 19:32:55 Published: 2023-10-30
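As a rough illustration of the bottleneck idea described in the Text Bottleneck Models entry above, the sketch below maps a text to a few categorical concept values and applies a linear layer over them. Everything named here is hypothetical: the concept list, the weights, and the `measure_concepts` stub are invented for illustration. In the paper an LLM generates and measures the concepts; here a keyword lookup stands in only so that the example runs on its own.

```python
# Hypothetical concepts for a toy sentiment task (not taken from the paper).
CONCEPTS = ["praise", "complaint", "recommendation"]

# Stand-in for LLM-based concept measurement: (positive cues, negative cues).
KEYWORDS = {
    "praise": (["great", "excellent"], ["mediocre"]),
    "complaint": (["broken", "refund"], []),
    "recommendation": (["recommend"], ["avoid"]),
}

# Hypothetical linear layer: one weight per concept for each class.
WEIGHTS = {
    "negative": [-1.0, 2.0, -1.5],
    "positive": [1.5, -2.0, 1.0],
}


def measure_concepts(text):
    """Return one categorical value in {-1, 0, +1} per concept."""
    words = text.lower().split()
    values = []
    for concept in CONCEPTS:
        positive, negative = KEYWORDS[concept]
        if any(w in words for w in positive):
            values.append(1)
        elif any(w in words for w in negative):
            values.append(-1)
        else:
            values.append(0)
    return values


def predict(text):
    """Predict a label and return per-concept contributions to its logit."""
    values = measure_concepts(text)
    contributions = {
        label: [w * v for w, v in zip(weights, values)]
        for label, weights in WEIGHTS.items()
    }
    logits = {label: sum(c) for label, c in contributions.items()}
    label = max(logits, key=logits.get)
    return label, contributions[label]
```

Because the final layer is linear, the returned contributions act as a local explanation for one prediction, while the weight table itself plays the role of a global explanation.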
# Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection ( http://arxiv.org/abs/2310.19658v1 ) License: see link	Noah Ziems, Gang Liu, John Flanagan, Meng Jiang	Network intrusion detection (NID) systems that leverage machine learning have been shown to have strong performance in practice when used to detect malicious network traffic. Decision trees in particular offer a strong balance between performance and simplicity, but require users of NID systems to have background knowledge in machine learning to interpret. In addition, they are unable to provide additional outside information as to why certain features may be important for classification. In this work, we explore the use of large language models (LLMs) to provide explanations and additional background knowledge for decision tree NID systems. Further, we introduce a new human evaluation framework for decision tree explanations, which leverages automatically generated quiz questions that measure human evaluators' understanding of decision tree inference. Finally, we show LLM-generated decision tree explanations correlate highly with human ratings of readability, quality, and use of background knowledge while simultaneously providing better understanding of decision boundaries.	Translated: 2023-11-01 19:32:33 Published: 2023-10-30
# Domain Generalization in Computational Pathology: Survey and Guidelines ( http://arxiv.org/abs/2310.19656v1 ) License: see link	Mostafa Jahanifar, Manahil Raza, Kesi Xu, Trinh Vuong, Rob Jewsbury, Adam Shephard, Neda Zamanitajeddin, Jin Tae Kwak, Shan E Ahmed Raza, Fayyaz Minhas, Nasir Rajpoot	Deep learning models have exhibited exceptional effectiveness in Computational Pathology (CPath) by tackling intricate tasks across an array of histology image analysis applications. Nevertheless, the presence of out-of-distribution data (stemming from a multitude of sources such as disparate imaging devices and diverse tissue preparation methods) can cause domain shift (DS). DS decreases the generalization of trained models to unseen datasets with slightly different data distributions, prompting the need for innovative domain generalization (DG) solutions. Recognizing the potential of DG methods to significantly influence diagnostic and prognostic models in cancer studies and clinical practice, we present this survey along with guidelines on achieving DG in CPath. We rigorously define various DS types, systematically review and categorize existing DG approaches and resources in CPath, and provide insights into their advantages, limitations, and applicability. We also conduct thorough benchmarking experiments with 28 cutting-edge DG algorithms to address a complex DG problem. Our findings suggest that careful experiment design and a CPath-specific stain augmentation technique can be very effective. However, there is no one-size-fits-all solution for DG in CPath. Therefore, we establish clear guidelines for detecting and managing DS depending on different scenarios. While most of the concepts, guidelines, and recommendations are given for applications in CPath, we believe that they are applicable to most medical image analysis tasks as well.	Translated: 2023-11-01 19:32:18 Published: 2023-10-30
# MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval ( http://arxiv.org/abs/2310.19654v1 ) License: see link	Youbo Lei, Feifei He, Chen Chen, Yingbin Mo, Si Jia Li, Defeng Xie, Haonan Lu	With the success of large-scale visual-language pretraining models and the wide application of image-text retrieval in industry areas, reducing the model size and streamlining their terminal-device deployment have become urgently necessary. The mainstream model structures for image-text retrieval are single-stream and dual-stream, both aiming to close the semantic gap between visual and textual modalities. Dual-stream models excel at offline indexing and fast inference, while single-stream models achieve more accurate cross-modal alignment by employing adequate feature fusion. We propose a multi-teacher cross-modality alignment distillation (MCAD) technique to integrate the advantages of single-stream and dual-stream models. By incorporating the fused single-stream features into the image and text features of the dual-stream model, we formulate new modified teacher features and logits. Then, we conduct both logit and feature distillation to boost the capability of the student dual-stream model, achieving high retrieval performance without increasing inference complexity. Extensive experiments demonstrate the remarkable performance and high efficiency of MCAD on image-text retrieval tasks. Furthermore, we implement a mobile CLIP model on Snapdragon chips with only 93M running memory and 30ms search latency, without apparent performance degradation of the original large CLIP.	Translated: 2023-11-01 19:31:53 Published: 2023-10-30
# Upgrading VAE Training With Unlimited Data Plans Provided by Diffusion Models ( http://arxiv.org/abs/2310.19653v1 ) License: see link	Tim Z. Xiao, Johannes Zenn, Robert Bamler	Variational autoencoders (VAEs) are popular models for representation learning but their encoders are susceptible to overfitting (Cremer et al., 2018) because they are trained on a finite training set instead of the true (continuous) data distribution $p_{\mathrm{data}}(\mathbf{x})$. Diffusion models, on the other hand, avoid this issue by keeping the encoder fixed. This makes their representations less interpretable, but it simplifies training, enabling accurate and continuous approximations of $p_{\mathrm{data}}(\mathbf{x})$. In this paper, we show that overfitting encoders in VAEs can be effectively mitigated by training on samples from a pre-trained diffusion model. These results are somewhat unexpected as recent findings (Alemohammad et al., 2023; Shumailov et al., 2023) observe a decay in generative performance when models are trained on data generated by another generative model. We analyze generalization performance, amortization gap, and robustness of VAEs trained with our proposed method on three different data sets. We find improvements in all metrics compared to both normal training and conventional data augmentation methods, and we show that a modest amount of samples from the diffusion model suffices to obtain these gains.	Translated: 2023-11-01 19:31:30 Published: 2023-10-30
# Dynamics of Instruction Tuning: Each Ability of Large Language Models Has Its Own Growth Pace ( http://arxiv.org/abs/2310.19651v1 ) License: see link	Chiyu Song, Zhanchao Zhou, Jianhao Yan, Yuejiao Fei, Zhenzhong Lan, Yue Zhang	Instruction tuning is a burgeoning method to elicit the general intelligence of Large Language Models (LLMs). However, the creation of instruction data is still largely heuristic, leading to significant variation in quality and distribution across existing datasets. Experimental conclusions drawn from these datasets are also inconsistent, with some studies emphasizing the importance of scaling instruction numbers, while others argue that a limited number of samples suffice. To better understand data construction guidelines, we deepen our focus from the overall model performance to the growth of each underlying ability, such as creative writing, code generation, and logical reasoning. We systematically investigate the effects of data volume, parameter size, and data construction methods on the development of various abilities, using hundreds of model checkpoints (7B to 33B) fully instruction-tuned on a new collection of over 40k human-curated instruction data. This proposed dataset is stringently quality-controlled and categorized into ten distinct LLM abilities. Our study reveals three primary findings: (i) Despite data volume and parameter scale directly impacting models' overall performance, some abilities are more responsive to their increases and can be effectively trained using limited data, while some are highly resistant to these changes. (ii) Human-curated data strongly outperforms synthetic data from GPT-4 in efficiency and can constantly enhance model performance with volume increases, which is not achievable with synthetic data. (iii) Instruction data brings powerful cross-ability generalization, with evaluation results on out-of-domain data mirroring the first two observations. Furthermore, we demonstrate how these findings can guide more efficient data construction, leading to practical performance improvements on public benchmarks.	Translated: 2023-11-01 19:30:59 Published: 2023-10-30
# KeyGen2Vec: Learning Document Embedding via Multi-label Keyword Generation in Question-Answering ( http://arxiv.org/abs/2310.19650v1 ) License: see link	Iftitahu Ni'mah and Samaneh Khoshrou and Vlado Menkovski and Mykola Pechenizkiy	Representing documents in a high-dimensional embedding space while preserving the structural similarity between document sources has been an ultimate goal for many works on text representation learning. Current embedding models, however, mainly rely on the availability of label supervision to increase the expressiveness of the resulting embeddings. In contrast, unsupervised embeddings are cheap, but they often cannot capture implicit structure in the target corpus, particularly for samples that come from a different distribution than the pretraining source. Our study aims to loosen the dependency on label supervision by learning document embeddings via a Sequence-to-Sequence (Seq2Seq) text generator. Specifically, we reformulate the keyphrase generation task as multi-label keyword generation in community-based Question Answering (cQA). Our empirical results show that KeyGen2Vec in general is superior to a multi-label keyword classifier by up to 14.7% based on Purity, Normalized Mutual Information (NMI), and F1-Score metrics. Interestingly, although in general the absolute advantage of learning embeddings through label supervision is highly positive across evaluation datasets, KeyGen2Vec is shown to be competitive with a classifier that exploits topic label supervision in Yahoo! cQA with a larger number of latent topic labels.	Translated: 2023-11-01 19:30:29 Published: 2023-10-30
# Fast swap regret minimization and applications to approximate correlated equilibria ( http://arxiv.org/abs/2310.19647v1 ) License: see link	Binghui Peng and Aviad Rubinstein	We give a simple and computationally efficient algorithm that, for any constant $\varepsilon>0$, obtains $\varepsilon T$-swap regret within only $T = \mathsf{polylog}(n)$ rounds; this is an exponential improvement compared to the super-linear number of rounds required by the state-of-the-art algorithm, and resolves the main open problem of [Blum and Mansour 2007]. Our algorithm has an exponential dependence on $\varepsilon$, but we prove a new, matching lower bound. Our algorithm for swap regret implies faster convergence to $\varepsilon$-Correlated Equilibrium ($\varepsilon$-CE) in several regimes: For normal form two-player games with $n$ actions, it implies the first uncoupled dynamics that converges to the set of $\varepsilon$-CE in polylogarithmic rounds; a $\mathsf{polylog}(n)$-bit communication protocol for $\varepsilon$-CE in two-player games (resolving an open problem mentioned by [Babichenko-Rubinstein'2017, Goos-Rubinstein'2018, Ganor-CS'2018]); and an $\tilde{O}(n)$-query algorithm for $\varepsilon$-CE (resolving an open problem of [Babichenko'2020] and obtaining the first separation between $\varepsilon$-CE and $\varepsilon$-Nash equilibrium in the query complexity model). For extensive-form games, our algorithm implies a PTAS for $\mathit{normal}$ $\mathit{form}$ $\mathit{correlated}$ $\mathit{equilibria}$, a solution concept often conjectured to be computationally intractable (e.g. [Stengel-Forges'08, Fujii'23]).	Translated: 2023-11-01 19:30:04 Published: 2023-10-30
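To make the quantity in the swap regret entry above concrete, the sketch below evaluates swap regret and ordinary (external) regret for a recorded sequence of plays. It is not the paper's algorithm, only the standard definition: swap regret compares the incurred loss with the best loss achievable by remapping each action $i$ to a replacement $\phi(i)$, and the best remapping can be chosen separately for each action. The function names and the input layout (`plays[t][i]` is the probability of action `i` in round `t`, `losses[t][i]` its loss) are assumptions made for this example.

```python
def swap_regret(plays, losses):
    """Swap regret of a sequence of mixed plays against per-round losses."""
    n = len(losses[0])
    total = 0.0
    for i in range(n):
        # Loss actually incurred on the probability mass placed on action i.
        incurred = sum(p[i] * l[i] for p, l in zip(plays, losses))
        # Best single replacement j for all of the mass placed on action i.
        best = min(
            sum(p[i] * l[j] for p, l in zip(plays, losses)) for j in range(n)
        )
        total += incurred - best
    return total


def external_regret(plays, losses):
    """Regret against the best single fixed action in hindsight."""
    n = len(losses[0])
    incurred = sum(
        p[i] * l[i] for p, l in zip(plays, losses) for i in range(n)
    )
    best_fixed = min(sum(l[j] for l in losses) for j in range(n))
    return incurred - best_fixed
```

Swap regret is never smaller than external regret, since a constant remapping is one of the allowed swaps. In the abstract's terms, a learner has $\varepsilon T$-swap regret when `swap_regret` over $T$ rounds is at most $\varepsilon T$.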
# DistNet2D: Leveraging long-range temporal information for efficient segmentation and tracking ( http://arxiv.org/abs/2310.19641v1 ) License: see link	Jean Ollion, Martin Maliet, Caroline Giuglaris, Elise Vacher and Maxime Deforet	Extracting long tracks and lineages from videomicroscopy requires an extremely low error rate, which is challenging on complex datasets of dense or deforming cells. Leveraging temporal context is key to overcoming this challenge. We propose DistNet2D, a new deep neural network (DNN) architecture for 2D cell segmentation and tracking that leverages both mid- and long-term temporal context. DistNet2D considers seven frames at the input and uses a post-processing procedure that exploits information from the entire movie to correct segmentation errors. DistNet2D outperforms two recent methods on two experimental datasets, one containing densely packed bacterial cells and the other containing eukaryotic cells. It has been integrated into an ImageJ-based graphical user interface for 2D data visualization, curation, and training. Finally, we demonstrate the performance of DistNet2D on correlating the size and shape of cells with their transport properties over large statistics, for both bacterial and eukaryotic cells.	Translated: 2023-11-01 19:29:28 Published: 2023-10-30
# Leave No Stone Unturned: Mine Extra Knowledge for Imbalanced Facial Expression Recognition ( http://arxiv.org/abs/2310.19636v1 ) License: see link	Yuhang Zhang, Yaqi Li, Lixiong Qin, Xuannan Liu, Weihong Deng	Facial expression data is characterized by a significant imbalance, with most collected data showing happy or neutral expressions and fewer instances of fear or disgust. This imbalance poses challenges to facial expression recognition (FER) models, hindering their ability to fully understand various human emotional states. Existing FER methods typically report overall accuracy on highly imbalanced test sets but exhibit low performance in terms of the mean accuracy across all expression classes. In this paper, our aim is to address the imbalanced FER problem. Existing methods primarily focus on learning knowledge of minor classes solely from minor-class samples. However, we propose a novel approach to extract extra knowledge related to the minor classes from both major and minor class samples. Our motivation stems from the belief that FER resembles a distribution learning task, wherein a sample may contain information about multiple classes. For instance, a sample from the major class surprise might also contain useful features of the minor class fear. Inspired by that, we propose a novel method that leverages re-balanced attention maps to regularize the model, enabling it to extract transformation-invariant information about the minor classes from all training samples. Additionally, we introduce re-balanced smooth labels to regulate the cross-entropy loss, guiding the model to pay more attention to the minor classes by utilizing the extra information regarding the label distribution of the imbalanced training data. Extensive experiments on different datasets and backbones show that the two proposed modules work together to regularize the model and achieve state-of-the-art performance under the imbalanced FER task. Code is available at https://github.com/zyh-uaiaaaa.	Translated: 2023-11-01 19:29:09 Published: 2023-10-30
# Bidirectional Captioning for Clinically Accurate and Interpretable Models ( http://arxiv.org/abs/2310.19635v1 ) License: see link	Keegan Quigley, Miriam Cha, Josh Barua, Geeticka Chauhan, Seth Berkowitz, Steven Horng, Polina Golland	Vision-language pretraining has been shown to produce high-quality visual encoders which transfer efficiently to downstream computer vision tasks. While generative language models have gained widespread attention, image captioning has thus far been mostly overlooked as a form of cross-modal pretraining in favor of contrastive learning, especially in medical image analysis. In this paper, we experiment with bidirectional captioning of radiology reports as a form of pretraining and compare the quality and utility of learned embeddings with those from contrastive pretraining methods. We optimize a CNN encoder, transformer decoder architecture named RadTex for the radiology domain. Results show that not only does captioning pretraining yield visual encoders that are competitive with contrastive pretraining (CheXpert competition multi-label AUC of 89.4%), but also that our transformer decoder is capable of generating clinically relevant reports (captioning macro-F1 score of 0.349 using CheXpert labeler) and responding to prompts with targeted, interactive outputs.	Translated: 2023-11-01 19:28:41 Published: 2023-10-30
# Convolutional Neural Networks for Automatic Detection of Intact Adenovirus from TEM Imaging with Debris, Broken and Artefacts Particles ( http://arxiv.org/abs/2310.19630v1 ) License: see link	Olivier Rukundo, Andrea Behanova, Riccardo De Feo, Seppo Ronkko, Joni Oja, Jussi Tohka	Regular monitoring of the primary particles and purity profiles of a drug product during development and manufacturing processes is essential for manufacturers to avoid product variability and contamination. Transmission electron microscopy (TEM) imaging helps manufacturers predict how changes affect particle characteristics and purity for virus-based gene therapy vector products and intermediates. Since intact particles can characterize efficacious products, it is beneficial to automate the detection of intact adenovirus against a non-intact-viral background mixed with debris, broken, and artefact particles. In the presence of such particles, detecting intact adenoviruses becomes more challenging. To overcome this challenge, we developed a software tool for semi-automatic annotation and segmentation of adenoviruses and a software tool for automatic segmentation and detection of intact adenoviruses in TEM imaging systems. The developed semi-automatic tool exploited conventional image analysis techniques while the automatic tool was built based on convolutional neural networks and image analysis techniques. Our quantitative and qualitative evaluations showed outstanding true positive detection rates compared to false positive and false negative rates: adenoviruses were detected without being mistaken for real debris, broken adenoviruses, or staining artefacts.	Translated: 2023-11-01 19:28:18 Published: 2023-10-30
# Deep-learning-based decomposition of overlapping-sparse images: application at the vertex of neutrino interactions ( http://arxiv.org/abs/2310.19695v1 ) License: see link	Saúl Alonso-Monsalve, Davide Sgalaberna, Xingyu Zhao, Adrien Molines, Clark McGrew, André Rubbia	Image decomposition plays a crucial role in various computer vision tasks, enabling the analysis and manipulation of visual content at a fundamental level. Overlapping images, which occur when multiple objects or scenes partially occlude each other, pose unique challenges for decomposition algorithms. The task intensifies when working with sparse images, where the scarcity of meaningful information complicates the precise extraction of components. This paper presents a solution that leverages the power of deep learning to accurately extract individual objects within multi-dimensional overlapping-sparse images, with a direct application in high-energy physics with decomposition of overlaid elementary particles obtained from imaging detectors. In particular, the proposed approach tackles a highly complex yet unsolved problem: identifying and measuring independent particles at the vertex of neutrino interactions, where one expects to observe detector images with multiple indiscernible overlapping charged particles. By decomposing the image of the detector activity at the vertex through deep learning, it is possible to infer the kinematic parameters of the identified low-momentum particles (which otherwise would remain neglected) and enhance the reconstructed energy resolution of the neutrino event. We also present an additional step, which can be tuned directly on detector data, combining the above method with a fully-differentiable generative model to improve the image decomposition further and, consequently, the resolution of the measured parameters, achieving unprecedented results. This improvement is crucial for precisely measuring the parameters that govern neutrino flavour oscillations and searching for asymmetries between matter and antimatter.	Translated: 2023-11-01 19:21:18 Published: 2023-10-30
# 長期時空間モデリングのための畳み込み状態空間モデル Convolutional State Space Models for Long-Range Spatiotemporal Modeling ( http://arxiv.org/abs/2310.19694v1 ) ライセンス: Link先を確認	Jimmy T.H. Smith, Shalini De Mello, Jan Kautz, Scott W. Linderman, Wonmin Byeon	(参考訳) 複雑な空間相関と長距離時間依存を同時にモデル化する必要があるため、長時空間列を効果的にモデル化することは困難である。 convlstmsは、再帰的なニューラルネットワークでテンソル値の状態を更新することでこれに対処するが、それらのシーケンシャルな計算によってトレーニングが遅くなる。対照的に、トランスフォーマーは時空間列全体を並列に処理し、トークンに圧縮することができる。しかしながら、注意のコストは2倍にスケールし、拡張性はより長いシーケンスに制限される。本稿では、先行手法の課題に対処し、ConvLSTMのテンソルモデリングのアイデアとS4やS5のような状態空間メソッドのロングシーケンスモデリングのアプローチを組み合わせた畳み込み状態空間モデル(ConvSSM)を導入する。まず,並列スキャンを畳み込み再帰に適用し,下位並列化と高速な自己回帰生成を実現する方法を示す。次に,長距離依存関係をモデル化するためのパラメータ化と初期化戦略の動機となるconvssmとssmのダイナミクスの等価性を確立する。その結果、ConvS5は、長距離時空間モデリングのための効率的なConvSSM変種である。 ConvS5 は Transformers と ConvLSTM を長距離移動MNIST 実験で上回り、ConvLSTM より3倍速く、Transformers より400倍速くサンプルを生成する。さらに、ConvS5はDMLab、Minecraft、Habitatの予測ベンチマークに挑戦する最先端のメソッドのパフォーマンスと一致し、長い時空間シーケンスをモデリングするための新しい方向を可能にする。 Effectively modeling long spatiotemporal sequences is challenging due to the need to model complex spatial correlations and long-range temporal dependencies simultaneously. ConvLSTMs attempt to address this by updating tensor-valued states with recurrent neural networks, but their sequential computation makes them slow to train. In contrast, Transformers can process an entire spatiotemporal sequence, compressed into tokens, in parallel. However, the cost of attention scales quadratically in length, limiting their scalability to longer sequences. Here, we address the challenges of prior methods and introduce convolutional state space models (ConvSSM) that combine the tensor modeling ideas of ConvLSTM with the long sequence modeling approaches of state space methods such as S4 and S5. First, we demonstrate how parallel scans can be applied to convolutional recurrences to achieve subquadratic parallelization and fast autoregressive generation. 
We then establish an equivalence between the dynamics of ConvSSMs and SSMs, which motivates parameterization and initialization strategies for modeling long-range dependencies. The result is ConvS5, an efficient ConvSSM variant for long-range spatiotemporal modeling. ConvS5 significantly outperforms Transformers and ConvLSTM on a long horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers. In addition, ConvS5 matches or exceeds the performance of state-of-the-art methods on challenging DMLab, Minecraft and Habitat prediction benchmarks and enables new directions for modeling long spatiotemporal sequences.	翻訳日:2023-11-01 19:20:52 公開日:2023-10-30
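The ConvSSM abstract above says that parallel scans can be applied to convolutional recurrences. A minimal sketch of the underlying idea, under simplifying assumptions: a scalar linear recurrence `x_t = a_t * x_{t-1} + b_t` stands in for the convolutional one, and the function names are illustrative, not taken from the paper. Because composing affine maps is associative, the states can be computed in a logarithmic number of combine rounds instead of one step at a time.

```python
def combine(left, right):
    """Compose two affine maps x -> a*x + b, applying `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)


def sequential_states(a, b, x0=0.0):
    """Reference implementation: step through the recurrence one element at a time."""
    states, x = [], x0
    for a_t, b_t in zip(a, b):
        x = a_t * x + b_t
        states.append(x)
    return states


def scan_states(a, b, x0=0.0):
    """Inclusive scan by recursive doubling; each round could run in parallel."""
    elems = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        nxt = list(elems)
        for i in range(step, n):
            # elems[i - step] covers the earlier block, elems[i] the later one.
            nxt[i] = combine(elems[i - step], elems[i])
        elems = nxt
        step *= 2
    # elems[t] now maps the initial state x0 directly to x_t.
    return [a_t * x0 + b_t for a_t, b_t in elems]
```

The loop here runs on a single thread; the point is only that every round touches independent elements, which is what makes a hardware-parallel scan possible.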
# 量子ドットセルオートマタを用いた非同期シーケンス回路における静的ハザードの除去 Elimination of Static Hazards in Asynchronous Sequential Circuits using Quantum dot Cellular Automata ( http://arxiv.org/abs/2310.19692v1 ) ライセンス: Link先を確認	Angshuman Khan, Chiradeep Mukherjee, Ankan Kumar Chakraborty, Ratna Chakrabarty, Debashis De	(参考訳) 新興技術の中でも、高速・低消費電力動作と高い実装密度を兼ね備えるのは量子ドットセルオートマタ(Quantum-dot Cellular Automata, QCA)であり、QCAはセル内の電子間の静電相互作用を利用する。既存の文献には、QCA回路のハザードフリー設計に関する検討が不足している。ハザードは曖昧で予測不能な出力を生じさせるが、回避することが可能である。本研究では、ハザードを含む非同期順序回路とハザードフリーの非同期順序回路の両方を取り上げ、キンクエネルギーの観点から比較したうえで、より優れた回路を提案する。回路シミュレーションはQCADesignerツールで検証した。 There is nowhere else in emerging technology, but in Quantum-dot Cellular Automata, one can find high speed, low power operation, and high packaging density, which deals with electrostatic interaction between electrons within a cell. Literature survey lacks in hazards free design of QCA circuit. Hazards create ambiguous and unpredictable output, which can be avoided. This work considers both hazards and hazards-free asynchronous sequential circuits; both are compared in terms of kink energy, and a better one has been proposed. The circuit simulation has been verified in the QCADesigner tool.	翻訳日:2023-11-01 19:20:06 公開日:2023-10-30
# 因果関係は対実フェアネスとロバスト予測とグループフェアネスを結びつける Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness ( http://arxiv.org/abs/2310.19691v1 ) ライセンス: Link先を確認	Jacy Reese Anthis and Victor Veitch	(参考訳) 反事実的公平性は、異なる人種や性別のような異なる保護されたクラスがある場合、aiや他のアルゴリズムシステムによって同じ方法で分類されるように要求される。これは米国法体系に反映される直感的な基準であるが、反事実は現実世界のデータでは直接観察できないため、その使用は制限されている。一方、グループフェアネスの指標(例えば、人口比率や等化確率)は直感的ではないが、より容易に観察できる。本稿では, 対実フェアネス, 頑健な予測, グループフェアネスのギャップを埋めるために, $\textit{causal context}$ を用いる。まず, 公平性と正確性の間には, 必ずしも根本的なトレードオフが存在するとは限らないことを示すことにより, 反事実的公正さを動機づける。第2に,データ生成過程の因果グラフと,グループフェアネスメトリクスが反事実フェアネスと等価である場合の対応関係を考案する。第3に,3つの共通フェアネスコンテキストにおいて,ラベル選択,予測者の選択が,それぞれ人口差パリティ,等化オッズ,キャリブレーションと等価であることを示す。対実フェアネスは、比較的単純なグループフェアネスの測定によってテストすることができる。 Counterfactual fairness requires that a person would have been classified in the same way by an AI or other algorithmic system if they had a different protected class, such as a different race or gender. This is an intuitive standard, as reflected in the U.S. legal system, but its use is limited because counterfactuals cannot be directly observed in real-world data. On the other hand, group fairness metrics (e.g., demographic parity or equalized odds) are less intuitive but more readily observed. In this paper, we use $\textit{causal context}$ to bridge the gaps between counterfactual fairness, robust prediction, and group fairness. First, we motivate counterfactual fairness by showing that there is not necessarily a fundamental trade-off between fairness and accuracy because, under plausible conditions, the counterfactually fair predictor is in fact accuracy-optimal in an unbiased target distribution. Second, we develop a correspondence between the causal graph of the data-generating process and which, if any, group fairness metrics are equivalent to counterfactual fairness. 
Third, we show that in three common fairness contexts (measurement error, selection on label, and selection on predictors), counterfactual fairness is equivalent to demographic parity, equalized odds, and calibration, respectively. Counterfactual fairness can sometimes be tested by measuring relatively simple group fairness metrics.	翻訳日:2023-11-01 19:19:49 公開日:2023-10-30
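The abstract above refers to group fairness metrics such as demographic parity and equalized odds as quantities that are readily observed. A minimal sketch of how two of them could be measured for a binary classifier and a binary protected attribute; the function names and the toy data in the usage note are illustrative assumptions, not from the paper.

```python
def rate(values):
    """Mean of a list of 0/1 values; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def demographic_parity_gap(y_pred, group):
    """|P(pred=1 | group=0) - P(pred=1 | group=1)|."""
    r0 = rate([p for p, g in zip(y_pred, group) if g == 0])
    r1 = rate([p for p, g in zip(y_pred, group) if g == 1])
    return abs(r0 - r1)


def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group gap in P(pred=1 | true label), over both labels."""
    gaps = []
    for label in (0, 1):
        r0 = rate([p for t, p, g in zip(y_true, y_pred, group)
                   if g == 0 and t == label])
        r1 = rate([p for t, p, g in zip(y_true, y_pred, group)
                   if g == 1 and t == label])
        gaps.append(abs(r0 - r1))
    return max(gaps)
```

With `y_true = [1, 0, 1, 0, 1, 0, 1, 0]`, `y_pred = [1, 0, 1, 0, 1, 1, 0, 0]` and `group = [0, 0, 0, 0, 1, 1, 1, 1]`, both groups receive positive predictions at the same rate, yet the error rates differ between groups, so the two metrics disagree.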
# 変分境界による非逆分布アライメントの実現に向けて Towards Practical Non-Adversarial Distribution Alignment via Variational Bounds ( http://arxiv.org/abs/2310.19690v1 ) ライセンス: Link先を確認	Ziyu Gong, Ben Usman, Han Zhao, David I. Inouye	(参考訳) 分布アライメントは、フェアネスとロバストネスの応用で不変表現を学ぶのに使うことができる。ほとんどの先行研究は対向アライメント法を頼っているが、結果として生じるミニマックス問題は不安定で最適化が難しい。非敵対的可能性に基づくアプローチは、モデルの可逆性を必要とするか、潜在する事前に制約を課すか、あるいはアライメントのための一般的なフレームワークが欠如している。これらの制約を克服するために,任意のモデルパイプラインに適用可能な非逆vaeに基づくアライメント手法を提案する。我々は、vaeのような目的を持つが異なる視点を持つアライメント上界(ノイズ境界を含む)のセットを開発する。提案手法を,理論上も経験的にも,従来のVAEベースのアライメント手法と比較する。最後に,新たなアライメント損失により,標準不変表現学習パイプラインの敵意損失を,元のアーキテクチャを変更せずに置き換えることができることを実証し,非逆アライメント手法の適用性を大幅に拡大することを示した。 Distribution alignment can be used to learn invariant representations with applications in fairness and robustness. Most prior works resort to adversarial alignment methods but the resulting minimax problems are unstable and challenging to optimize. Non-adversarial likelihood-based approaches either require model invertibility, impose constraints on the latent prior, or lack a generic framework for alignment. To overcome these limitations, we propose a non-adversarial VAE-based alignment method that can be applied to any model pipeline. We develop a set of alignment upper bounds (including a noisy bound) that have VAE-like objectives but with a different perspective. We carefully compare our method to prior VAE-based alignment approaches both theoretically and empirically. Finally, we demonstrate that our novel alignment losses can replace adversarial losses in standard invariant representation learning pipelines without modifying the original architectures -- thereby significantly broadening the applicability of non-adversarial alignment methods.	翻訳日:2023-11-01 19:19:07 公開日:2023-10-30
# デジタル空間における感情分析:レビューの概観 Sentiment Analysis in Digital Spaces: An Overview of Reviews ( http://arxiv.org/abs/2310.19687v1 ) ライセンス: Link先を確認	Laura E.M. Ayravainen, Joanne Hinds, Brittany I. Davidson	(参考訳) 感性分析(SA)は一般的にデジタルテキストデータに適用され、意見や感情に対する洞察を明らかにする。多くの体系的レビューは既存の研究を要約しているが、しばしば有効性や科学的実践についての議論を見落としている。本稿では,2,275の初等研究を含む38の体系的レビューを合成したレビューの概要を紹介する。我々は,システムレビュー手法と報告基準の厳格さと品質を評価するための,目覚しい品質評価フレームワークを考案した。その結果,多様なアプリケーションや手法,報告の厳密さ,課題が時間の経過とともに明らかとなった。今後の研究や実践者がこれらの問題にどう対処できるかを議論し、その重要性を多くのアプリケーションで強調する。 Sentiment analysis (SA) is commonly applied to digital textual data, revealing insight into opinions and feelings. Many systematic reviews have summarized existing work, but often overlook discussions of validity and scientific practices. Here, we present an overview of reviews, synthesizing 38 systematic reviews, containing 2,275 primary studies. We devise a bespoke quality assessment framework designed to assess the rigor and quality of systematic review methodologies and reporting standards. Our findings show diverse applications and methods, limited reporting rigor, and challenges over time. We discuss how future research and practitioners can address these issues and highlight their importance across numerous applications.	翻訳日:2023-11-01 19:18:37 公開日:2023-10-30
# 入力再構成は回帰U-Netモデルの不確かさを直接推定するために使用できるか? --頭頸部癌患者に対する陽子線治療の線量予測への応用 Can input reconstruction be used to directly estimate uncertainty of a regression U-Net model? -- Application to proton therapy dose prediction for head and neck cancer patients ( http://arxiv.org/abs/2310.19686v1 ) ライセンス: Link先を確認	Margerie Huet-Dastarac, Dan Nguyen, Steve Jiang, John Lee, Ana Barragan Montero	(参考訳) 深層学習モデルの不確実性を信頼性が高く効率的な方法で推定することは、文献で多くの異なる解が提案されているものの、依然として未解決の問題である。最も一般的な方法は、モンテカルロ・ドロップアウト (MCDO) やディープ・アンサンブル (DE) のようなベイズ近似に基づいているが、推論時間が長く(複数回の推論パスを必要とする)、分布外 (OOD) データの検出には機能しない場合がある(分布内 (ID) データとOODデータで同程度の不確実性を示す)。医療応用のような安全性が重要な環境では、誤った予測が患者の安全を脅かす可能性があるため、OODデータを検出できる正確かつ高速な不確実性推定手法が重要である。本研究では,代替となる直接的な不確実性推定法を提案し,回帰型U-Netアーキテクチャに適用する。この方法は、ボトルネックから入力を再構成する分岐を追加するものである。入力再構成誤差はモデルの不確実性の代理指標として使用できる。概念実証として, 本手法を頭頸部癌患者の陽子線治療の線量予測に適用した。この応用における本手法の精度,時間短縮効果,OOD検出を解析し,一般的なMCDOおよびDEと比較した。入力再構成法は、予測誤差とのピアソン相関係数 (0.620) がDEおよびMCDO(0.447から0.612の間)よりも高かった。また,本手法はOODの識別も容易にする(Zスコア34.05)。回帰タスクと同時に不確実性を推定するため、必要な時間や計算資源が少ない。 Estimating the uncertainty of deep learning models in a reliable and efficient way has remained an open problem, where many different solutions have been proposed in the literature. Most common methods are based on Bayesian approximations, like Monte Carlo dropout (MCDO) or Deep ensembling (DE), but they have a high inference time (i.e. require multiple inference passes) and might not work for out-of-distribution detection (OOD) data (i.e. similar uncertainty for in-distribution (ID) and OOD). In safety critical environments, like medical applications, accurate and fast uncertainty estimation methods, able to detect OOD data, are crucial, since wrong predictions can jeopardize patient safety. In this study, we present an alternative direct uncertainty estimation method and apply it for a regression U-Net architecture. The method consists in the addition of a branch from the bottleneck which reconstructs the input. 
The input reconstruction error can be used as a surrogate of the model uncertainty. For the proof-of-concept, our method is applied to proton therapy dose prediction in head and neck cancer patients. Accuracy, time-gain, and OOD detection are analyzed for our method in this particular application and compared with the popular MCDO and DE. The input reconstruction method showed a higher Pearson correlation coefficient with the prediction error (0.620) than DE and MCDO (between 0.447 and 0.612). Moreover, our method allows an easier identification of OOD (Z-score of 34.05). It estimates the uncertainty simultaneously with the regression task and therefore requires less time and computational resources.	翻訳日:2023-11-01 19:18:27 公開日:2023-10-30
# DGFN: 二重生成フローネットワーク DGFN: Double Generative Flow Networks ( http://arxiv.org/abs/2310.19685v1 ) ライセンス: Link先を確認	Elaine Lau, Nikhil Vemgal, Doina Precup, Emmanuel Bengio	(参考訳) 深層学習は薬物発見の有効なツールとして現れており、予測モデルと生成モデルの両方に応用される可能性がある。 Generative Flow Networks (GFlowNets/GFNs) は、多種多様な候補を生成する能力、特に小さな分子生成タスクで認識される手法である。本稿では、DGFN(Double GFlowNets)を紹介する。強化学習とDouble Deep Q-Learningからインスピレーションを得て,これらのトラジェクトリをサンプリングするターゲットネットワークを導入し,メインネットワークをこれらのトラジェクトリで更新する。実験の結果、dgfnsはスパース報酬ドメインと高次元状態空間の探索を効果的に促進することが明らかとなった。 Deep learning is emerging as an effective tool in drug discovery, with potential applications in both predictive and generative models. Generative Flow Networks (GFlowNets/GFNs) are a recently introduced method recognized for the ability to generate diverse candidates, in particular in small molecule generation tasks. In this work, we introduce double GFlowNets (DGFNs). Drawing inspiration from reinforcement learning and Double Deep Q-Learning, we introduce a target network used to sample trajectories, while updating the main network with these sampled trajectories. Empirical results confirm that DGFNs effectively enhance exploration in sparse reward domains and high-dimensional state spaces, both challenging aspects of de-novo design in drug discovery.	翻訳日:2023-11-01 19:17:50 公開日:2023-10-30
# 深層学習を用いた突入誘導問題の密度推定 Density Estimation for Entry Guidance Problems using Deep Learning ( http://arxiv.org/abs/2310.19684v1 ) ライセンス: Link先を確認	Jens A. Rataczak, Davide Amato, Jay W. McMahon	(参考訳) 本研究は、惑星突入誘導問題で用いる大気密度プロファイルを推定するための深層学習手法を提案する。長・短期記憶(LSTM)ニューラルネットワークを訓練し、突入機上で利用可能な測定値と、機体が飛行している大気の密度プロファイルとの間の写像を学習させる。測定値には、球面座標による状態表現、デカルト座標系で検出された加速度成分、および表面圧力の測定値が含まれる。ネットワークの訓練データは、まず、指数密度モデルを用いる完全数値予測子修正子誘導(FNPEG)アルゴリズムによる火星突入ミッションのモンテカルロ解析を行って生成し、真値の密度プロファイルはMarsGRAMからサンプリングする。 LSTMネットワークの予測をFNPEGアルゴリズムに統合できるよう改善するため、カリキュラム学習手順を開発した。訓練されたLSTMは、機体がこれから飛行する密度プロファイルを予測することも、すでに飛行した密度プロファイルを再構成することもできる。 FNPEGアルゴリズムの性能は、指数モデル、1次フェーディングメモリフィルタを付加した指数モデル、LSTMネットワークの3つの密度推定手法で評価する。その結果、ノイズを含む測定とノイズのない測定のいずれを考慮した場合でも、LSTMモデルを用いると他の2つの手法よりも優れた終端精度が得られることが示された。 This work presents a deep-learning approach to estimate atmospheric density profiles for use in planetary entry guidance problems. A long short-term memory (LSTM) neural network is trained to learn the mapping between measurements available onboard an entry vehicle and the density profile through which it is flying. Measurements include the spherical state representation, Cartesian sensed acceleration components, and a surface-pressure measurement. Training data for the network is initially generated by performing a Monte Carlo analysis of an entry mission at Mars using the fully numerical predictor-corrector guidance (FNPEG) algorithm that utilizes an exponential density model, while the truth density profiles are sampled from MarsGRAM. A curriculum learning procedure is developed to refine the LSTM network's predictions for integration within the FNPEG algorithm. The trained LSTM is capable of both predicting the density profile through which the vehicle will fly and reconstructing the density profile through which it has already flown. 
The performance of the FNPEG algorithm is assessed for three different density estimation techniques: an exponential model, an exponential model augmented with a first-order fading-memory filter, and the LSTM network. Results demonstrate that using the LSTM model results in superior terminal accuracy compared to the other two techniques when considering both noisy and noiseless measurements.	翻訳日:2023-11-01 19:17:29 公開日:2023-10-30
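One of the baselines named in the abstract above is an exponential density model augmented with a first-order fading-memory filter. A minimal sketch of what such a baseline could look like, under stated assumptions: the filter tracks a scale factor between sensed and modeled density, and the reference density, scale height, and gain `beta` are hypothetical placeholder values rather than the paper's settings.

```python
import math


def exponential_density(altitude_m, rho0=0.02, scale_height_m=11_100.0):
    """Exponential atmosphere rho0 * exp(-h / H); constants are placeholders."""
    return rho0 * math.exp(-altitude_m / scale_height_m)


def fading_memory_update(k_prev, rho_sensed, rho_model, beta=0.9):
    """First-order fading-memory filter on the density scale factor.

    The new estimate blends the previous factor with the latest
    sensed-to-model density ratio; beta near 1 means slower adaptation.
    """
    ratio = rho_sensed / rho_model
    return beta * k_prev + (1.0 - beta) * ratio


def corrected_density(altitude_m, k):
    """Model density scaled by the filtered correction factor."""
    return k * exponential_density(altitude_m)
```

If the true atmosphere is consistently 20% denser than the exponential model, repeated updates drive the factor from 1.0 toward 1.2.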
# 時系列オンラインブートストラップ An Online Bootstrap for Time Series ( http://arxiv.org/abs/2310.19683v1 ) ライセンス: Link先を確認	Nicolai Palm and Thomas Nagler	(参考訳) ブートストラップのような再サンプリング手法は、機械学習の分野で有用であることが証明されている。しかし, 従来のブートストラップ法の適用性は, 時系列や空間的相関観測など, 依存データの大きなストリームを扱う場合に制限される。本稿では,データの依存性を考慮した新しいブートストラップ手法を提案する。この方法は、ますます依存する重みの自己回帰配列に基づいている。一般条件下でのブートストラップ方式の理論的妥当性を実証する。提案手法の有効性をシミュレーションにより実証し, 複雑なデータ依存関係が存在する場合でも信頼性の高い不確実性定量化を実現することを示す。我々の研究は、古典的な再サンプリング技術と現代のデータ分析の要求のギャップを埋め、動的でデータ豊富な環境における研究者や実践者にとって貴重なツールを提供する。 Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series or spatially correlated observations. In this paper, we propose a novel bootstrap method that is designed to account for data dependencies and can be executed online, making it particularly suitable for real-time applications. This method is based on an autoregressive sequence of increasingly dependent resampling weights. We prove the theoretical validity of the proposed bootstrap scheme under general conditions. We demonstrate the effectiveness of our approach through extensive simulations and show that it provides reliable uncertainty quantification even in the presence of complex data dependencies. Our work bridges the gap between classical resampling techniques and the demands of modern data analysis, providing a valuable tool for researchers and practitioners in dynamic, data-rich environments.	翻訳日:2023-11-01 19:17:06 公開日:2023-10-30
# 事前学習型言語モデルをニューラルネットワーク翻訳に統合する Integrating Pre-trained Language Model into Neural Machine Translation ( http://arxiv.org/abs/2310.19680v1 ) ライセンス: Link先を確認	Soon-Jae Hwang, Chang-Sung Jeong	(参考訳) ニューラルネットワーク翻訳(NMT)は、広範囲の研究・開発を通じて自然言語処理において重要な技術となっている。しかし、高品質なバイリンガル言語ペアデータの不足は、NMTの性能向上に依然として大きな課題をもたらしている。近年,この問題を解決するために,事前学習言語モデル(PLM)の文脈情報の利用を検討している。しかし, PLM モデルと NMT モデルの不整合性の問題は未解決のままである。本研究では PLM 統合 NMT (PiNMT) モデルを提案する。 PiNMTモデルは、3つの重要なコンポーネント、PLM Multi Layer Converter、Embedding Fusion、Cosine Alignmentで構成され、それぞれがNMTに効果的なPLM情報を提供する上で重要な役割を果たす。さらに,本論文では,個別学習率と2段階学習という2つのトレーニング戦略についても紹介する。提案したPiNMTモデルとトレーニング戦略を実装することで,IWSLT'14 En$\leftrightarrow$Deデータセット上で最先端のパフォーマンスを実現した。本研究の結果は,非互換性を克服し,性能を向上させるため,PLMとNMTを効率的に統合する新たなアプローチを示すものである。 Neural Machine Translation (NMT) has become a significant technology in natural language processing through extensive research and development. However, the deficiency of high-quality bilingual language pair data still poses a major challenge to improving NMT performance. Recent studies are exploring the use of contextual information from pre-trained language model (PLM) to address this problem. Yet, the issue of incompatibility between PLM and NMT model remains unresolved. This study proposes a PLM-integrated NMT (PiNMT) model to overcome the identified problems. The PiNMT model consists of three critical components, PLM Multi Layer Converter, Embedding Fusion, and Cosine Alignment, each playing a vital role in providing effective PLM information to NMT. Furthermore, two training strategies, Separate Learning Rates and Dual Step Training, are also introduced in this paper. By implementing the proposed PiNMT model and training strategy, we achieved state-of-the-art performance on the IWSLT'14 En$\leftrightarrow$De dataset. This study's outcomes are noteworthy as they demonstrate a novel approach for efficiently integrating PLM with NMT to overcome incompatibility and enhance performance.	翻訳日:2023-11-01 19:16:51 公開日:2023-10-30
# HyPE: 相対的位置エンコーディングのための双曲的ビアーゼによる注意 HyPE: Attention with Hyperbolic Biases for Relative Positional Encoding ( http://arxiv.org/abs/2310.19676v1 ) ライセンス: Link先を確認	Giorgio Angelotti	(参考訳) Transformerベースのアーキテクチャでは、アテンション機構は入力シーケンスのトークンに関して本質的に置換不変である。シーケンシャルな順序を課すため、トークンの位置は固定または学習可能なパラメータを持つスキームを使って符号化される。本稿では,双曲関数の特性を利用してトークンの相対位置を符号化するHyPE(Hyperbolic Positional Encoding)を提案する。このアプローチは、マスクの$O(L^2)$値を格納する必要なく注意機構をバイアスし、$L$は入力シーケンスの長さである。 HyPEは予備連結演算と行列乗法を活用し、ソフトマックス計算にバイアスを間接的に組み込んだ相対距離の符号化を容易にする。この設計はflashattention-2との互換性を確保し、エンコーディング内で学習可能なパラメータの勾配バックプロパゲーションをサポートする。分析によって,HyPEはALiBiの注意バイアスを近似し,事前学習中に遭遇する長さを超えるコンテキストに対して有望な一般化機能を提供することを示した。今後の研究の方向性としてHyPEの実験的評価を提案する。 In Transformer-based architectures, the attention mechanism is inherently permutation-invariant with respect to the input sequence's tokens. To impose sequential order, token positions are typically encoded using a scheme with either fixed or learnable parameters. We introduce Hyperbolic Positional Encoding (HyPE), a novel method that utilizes hyperbolic functions' properties to encode tokens' relative positions. This approach biases the attention mechanism without the necessity of storing the $O(L^2)$ values of the mask, with $L$ being the length of the input sequence. HyPE leverages preliminary concatenation operations and matrix multiplications, facilitating the encoding of relative distances indirectly incorporating biases into the softmax computation. This design ensures compatibility with FlashAttention-2 and supports the gradient backpropagation for any potential learnable parameters within the encoding. We analytically demonstrate that, by careful hyperparameter selection, HyPE can approximate the attention bias of ALiBi, thereby offering promising generalization capabilities for contexts extending beyond the lengths encountered during pretraining. The experimental evaluation of HyPE is proposed as a direction for future research.	翻訳日:2023-11-01 19:16:16 公開日:2023-10-30
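The HyPE abstract above states that the method can approximate the attention bias of ALiBi without storing the $O(L^2)$ mask values. For reference, a minimal sketch of the explicit causal ALiBi bias that serves as that approximation target; this shows ALiBi only, not HyPE's hyperbolic construction, whose exact form is not given in the abstract. The slope schedule is the geometric one commonly used when the number of heads is a power of two.

```python
def alibi_slopes(num_heads):
    """Geometric per-head slopes 2^(-8h/n) for h = 1..n."""
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(seq_len, slope):
    """Explicit L x L causal bias: -slope * (i - j) for j <= i, -inf otherwise."""
    neg_inf = float("-inf")
    return [
        [-slope * (i - j) if j <= i else neg_inf for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

The bias is added to the attention logits before the softmax. Building it as a dense matrix, as done here, is exactly the quadratic storage the abstract says HyPE avoids.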
# 階層型階層型深層学習による共同画像圧縮と分類 A Principled Hierarchical Deep Learning Approach to Joint Image Compression and Classification ( http://arxiv.org/abs/2310.19675v1 ) ライセンス: Link先を確認	Siyu Qi, Achintha Wijesinghe, Lahiru D. Chamain, Zhi Ding	(参考訳) 低コストセンサーを含むディープラーニング(DL)の応用の中で、リモート画像分類はエッジセンサーとクラウド分類器を分離する物理チャネルを含む。従来のDLモデルは、センサーのエンコーダとエッジサーバのデコーダ+分類器に分割する必要がある。重要な課題は、接続チャネルが制限されたレート/容量を持つ場合、そのような分散モデルを効果的に訓練することである。我々のゴールは、エンコーダのラテントが低チャネル帯域を必要とするようにDLモデルを最適化し、高い分類精度で特徴情報を提供することである。本研究は,エンコーダを誘導し,コンパクトで差別的で,一般的な拡張/変換に適した特徴を抽出する3段階共同学習戦略を提案する。エンドツーエンド(E2E)トレーニングの前に,初期スクリーニングフェーズを通じて潜時次元を最適化する。単一プリデプロイエンコーダによる調整可能なビットレートを得るために、エントロピーに基づく量子化および/または手動トランケーションを潜在表現に適用する。 CIFAR-10では最大1.5%,CIFAR-100では3%,従来のE2Eクロスエントロピートレーニングでは3%の精度向上が得られた。 Among applications of deep learning (DL) involving low cost sensors, remote image classification involves a physical channel that separates edge sensors and cloud classifiers. Traditional DL models must be divided between an encoder for the sensor and the decoder + classifier at the edge server. An important challenge is to effectively train such distributed models when the connecting channels have limited rate/capacity. Our goal is to optimize DL models such that the encoder latent requires low channel bandwidth while still delivers feature information for high classification accuracy. This work proposes a three-step joint learning strategy to guide encoders to extract features that are compact, discriminative, and amenable to common augmentations/transformations. We optimize latent dimension through an initial screening phase before end-to-end (E2E) training. To obtain an adjustable bit rate via a single pre-deployed encoder, we apply entropy-based quantization and/or manual truncation on the latent representations. Tests show that our proposed method achieves accuracy improvement of up to 1.5% on CIFAR-10 and 3% on CIFAR-100 over conventional E2E cross-entropy training.	翻訳日:2023-11-01 19:15:55 公開日:2023-10-30
# 協調的評価:大規模言語モデルと人間によるオープンエンド世代評価の相乗効果を探る Collaborative Evaluation: Exploring the Synergy of Large Language Models and Humans for Open-ended Generation Evaluation ( http://arxiv.org/abs/2310.19740v1 ) ライセンス: Link先を確認	Qintong Li, Leyang Cui, Lingpeng Kong, Wei Bi	(参考訳) 自動メトリクスは、しばしば人間の判断と弱い相関を示すため、人間は創造性を要求する拡張自然言語生成タスク(nlg)の評価に広く関わっている。大規模言語モデル(LLM)は最近、人間の評価に代わるスケーラブルで費用対効果の高い代替品として登場した。しかしながら、人間とLLMの両方には、固有の主観性と信頼できない判断、特に多様なタスク要求に合わせた適応可能なメトリクスを必要とするオープンなタスクに制限がある。人間とllmベースの評価器の相乗効果を探求し、未完成のnlgタスクにおける既存の一貫性のない評価基準の課題に対処するために、タスク固有の基準のチェックリストの設計とllmが初期イデオレーションを生成するテキストの詳細な評価を含む共同評価パイプラインcoevalを提案する。我々は,コエバルにおけるLLMとヒトの相互効果について,一連の実験を行った。その結果, llms を利用することで, coeval は長文を効果的に評価し, かなりの時間を節約し, 評価異常を低減できることがわかった。人間の精査は依然として役割を担っており、LLM評価スコアの約20%を究極の信頼性のために更新している。 Humans are widely involved in the evaluation of open-ended natural language generation tasks (NLG) that demand creativity, as automatic metrics often exhibit weak correlations with human judgments. Large language models (LLMs) recently have emerged as a scalable and cost-effective alternative to human evaluations. However, both humans and LLMs have limitations, i.e., inherent subjectivity and unreliable judgments, particularly for open-ended tasks that require adaptable metrics tailored to diverse task requirements. To explore the synergy between humans and LLM-based evaluators and address the challenges of existing inconsistent evaluation criteria in open-ended NLG tasks, we propose a Collaborative Evaluation pipeline CoEval, involving the design of a checklist of task-specific criteria and the detailed evaluation of texts, in which LLM generates initial ideation, and then humans engage in scrutiny. We conducted a series of experiments to investigate the mutual effects between LLMs and humans in CoEval. Results show that, by utilizing LLMs, CoEval effectively evaluates lengthy texts, saving significant time and reducing human evaluation outliers. 
Human scrutiny still plays a role, revising around 20% of LLM evaluation scores for ultimate reliability.	翻訳日:2023-11-01 19:08:44 公開日:2023-10-30
# 大規模言語モデルにおける敵攻撃と防御--古くて新しい脅威 Adversarial Attacks and Defenses in Large Language Models: Old and New Threats ( http://arxiv.org/abs/2310.19737v1 ) ライセンス: Link先を確認	Leo Schwinn and David Dobre and Stephan G\"unnemann and Gauthier Gidel	(参考訳) 過去10年間、ニューラルネットワークの堅牢性向上を目的とした広範な研究が続けられてきたが、この問題は未解決のままである。ここでの大きな障害の1つは、欠陥防衛評価による新しい防衛アプローチの頑健さの過大評価である。欠陥のある堅牢性評価は、その後の作業で修正を必要とし、研究を危険に遅らせ、誤ったセキュリティ感覚を提供する。この文脈では、自然言語処理における差し迫った敵国軍競争、特にChatGPT、Google Bard、Anthropic's Claudeといった、クローズドソースのLarge Language Models(LLMs)に関する大きな課題に直面します。我々は,新しいアプローチの堅牢性評価を改善し,欠陥評価の量を削減するための第1の前提条件を提供する。さらに,LLMに対する埋め込み空間攻撃を,オープンソースモデルで悪意のあるコンテンツを生成するための新たな脅威モデルとして認識する。最後に、最近提案された防御について、llm特有のベストプラクティスがなければ、新しいアプローチの堅牢さを過大評価することが容易であることを示す。 Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains vastly unsolved. Here, one major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations. Flawed robustness evaluations necessitate rectifications in subsequent works, dangerously slowing down the research and providing a false sense of security. In this context, we will face substantial challenges associated with an impending adversarial arms race in natural language processing, specifically with closed-source Large Language Models (LLMs), such as ChatGPT, Google Bard, or Anthropic's Claude. We provide a first set of prerequisites to improve the robustness assessment of new approaches and reduce the amount of faulty evaluations. Additionally, we identify embedding space attacks on LLMs as another viable threat model for the purposes of generating malicious content in open-sourced models. Finally, we demonstrate on a recently proposed defense that, without LLM-specific best practices in place, it is easy to overestimate the robustness of a new approach.	翻訳日:2023-11-01 19:08:20 公開日:2023-10-30
# 選好フィードバックを用いた差分プライベートな報酬推定 Differentially Private Reward Estimation with Preference Feedback ( http://arxiv.org/abs/2310.19733v1 ) ライセンス: Link先を確認	Sayak Ray Chowdhury, Xingyu Zhou and Nagarajan Natarajan	(参考訳) 選好に基づくフィードバックからの学習は、生成モデルを人間の関心に整合させる有望なアプローチとして、最近かなりの注目を集めている。数値的な報酬に頼る代わりに、生成モデルは人間のフィードバックによる強化学習(RLHF)を用いて訓練される。これらのアプローチは、まず人間のラベル付け者から、通常は2つの可能なアクションのペア比較という形でフィードバックを収集し、次にこれらの比較を用いて報酬モデルを推定し、最後に推定した報酬モデルに基づく方策を採用する。このパイプラインのいずれかのステップに対する敵対的攻撃は、ラベル付け者の私的で機微な情報を明らかにする可能性がある。本研究では,ラベル差分プライバシ(DP)の概念を採用し,個々のラベル付け者のプライバシを保護しつつ,選好に基づくフィードバックから報酬を推定する問題に焦点をあてる。具体的には、潜在報酬パラメータ $\theta^* \in \mathbb{R}^d$ を含むペア比較フィードバックに対するパラメトリックなBradley-Terry-Luce(BTL)モデルを考える。標準的なミニマックス推定の枠組みにおいて、DPのローカルモデルと中央モデルの両方の下で、$\theta^*$ の推定誤差に対するタイトな上界と下界を与える。プライバシー予算 $\epsilon$ とサンプル数 $n$ が与えられたとき、ラベルDPを保証するための追加コストは、ローカルモデルでは $\Theta \big(\frac{1}{e^\epsilon-1}\sqrt{\frac{d}{n}}\big)$、より弱い中央モデルでは $\Theta\big(\frac{\text{poly}(d)}{\epsilon n} \big)$ であることを示す。これらの理論結果を裏付ける合成データ上のシミュレーションを行う。 Learning from preference-based feedback has recently gained considerable traction as a promising approach to align generative models with human interests. Instead of relying on numerical rewards, the generative models are trained using reinforcement learning with human feedback (RLHF). These approaches first solicit feedback from human labelers typically in the form of pairwise comparisons between two possible actions, then estimate a reward model using these comparisons, and finally employ a policy based on the estimated reward model. An adversarial attack in any step of the above pipeline might reveal private and sensitive information of human labelers. In this work, we adopt the notion of label differential privacy (DP) and focus on the problem of reward estimation from preference-based feedback while protecting the privacy of each individual labeler. Specifically, we consider the parametric Bradley-Terry-Luce (BTL) model for such pairwise comparison feedback involving a latent reward parameter $\theta^* \in \mathbb{R}^d$. 
Within a standard minimax estimation framework, we provide tight upper and lower bounds on the error in estimating $\theta^*$ under both local and central models of DP. We show, for a given privacy budget $\epsilon$ and number of samples $n$, that the additional cost to ensure label-DP under local model is $\Theta \big(\frac{1}{ e^\epsilon-1}\sqrt{\frac{d}{n}}\big)$, while it is $\Theta\big(\frac{\text{poly}(d)}{\epsilon n} \big)$ under the weaker central model. We perform simulations on synthetic data that corroborate these theoretical results.	翻訳日:2023-11-01 19:07:37 公開日:2023-10-30
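The abstract above combines the Bradley-Terry-Luce (BTL) comparison model with label differential privacy under the local model. A minimal sketch of the two standard building blocks, assuming a linear reward $\theta^\top x$ and binary randomized response as the local mechanism; the abstract does not say which mechanism or estimator the authors use, and all function names here are illustrative.

```python
import math
import random


def btl_prob(theta, x_i, x_j):
    """BTL probability that item i is preferred to item j under reward theta . x."""
    z = sum(t * (a - b) for t, a, b in zip(theta, x_i, x_j))
    return 1.0 / (1.0 + math.exp(-z))


def keep_probability(epsilon):
    """Probability of reporting the true label under randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(label, epsilon, rng):
    """Report the true 0/1 label with prob e^eps / (1 + e^eps), else flip it."""
    return label if rng.random() < keep_probability(epsilon) else 1 - label


def debias(noisy_label, epsilon):
    """Unbiased estimate of the true label from one randomized report (epsilon > 0)."""
    p = keep_probability(epsilon)
    return (noisy_label - (1.0 - p)) / (2.0 * p - 1.0)
```

The debiasing step divides by $2p - 1 = (e^\epsilon - 1)/(e^\epsilon + 1)$, so the variance of the corrected labels grows as $\epsilon$ shrinks, which gives some intuition for why stronger local privacy makes estimation harder.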
# ViR:ビジョン保持ネットワーク ViR: Vision Retention Networks ( http://arxiv.org/abs/2310.19731v1 ) ライセンス: Link先を確認	Ali Hatamizadeh, Michael Ranzinger, Jan Kautz	(参考訳) 視覚変換器(ViT)は、長距離空間依存のモデリングや大規模トレーニングのスケーラビリティに特有な能力を持つため、近年、多くの人気を集めている。自己注意機構の訓練並列性は、優れた性能を維持する上で重要な役割を果たすが、その二次的な複雑さは、高速な推論を必要とする多くのシナリオにおけるViTの適用を妨げている。この効果は、入力特徴の自動回帰モデリングを必要とするアプリケーションにおいてさらに顕著である。自然言語処理(nlp)では、ジェネレーティブなアプリケーションで効率的な推論を可能にする再帰的定式化を伴う並列化モデルが提案されている。そこで本研究では,この傾向に触発されたビジョン保持ネットワーク(vir)と呼ばれる新しいコンピュータビジョンモデルを提案する。特に、ViRは、大きなシーケンス長を処理する際の柔軟な定式化のため、高解像度の画像を必要とするタスクにおいて、画像スループットとメモリ消費に好適にスケールする。 ViRは、認識タスクのための一般的なビジョンバックボーンにおいて、並列性と繰り返しの等価性を実現する最初の試みである。異なるデータセットサイズと様々な画像解像度を用いた広範囲な実験により、ViRの有効性を検証し、競争性能を達成した。私たちのコードと事前訓練されたモデルは公開されます。 Vision Transformers (ViTs) have attracted a lot of popularity in recent years, due to their exceptional capabilities in modeling long-range spatial dependencies and scalability for large scale training. Although the training parallelism of self-attention mechanism plays an important role in retaining great performance, its quadratic complexity baffles the application of ViTs in many scenarios which demand fast inference. This effect is even more pronounced in applications in which autoregressive modeling of input features is required. In Natural Language Processing (NLP), a new stream of efforts have proposed parallelizable models with recurrent formulation that allows for efficient inference in generative applications. Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance. In particular, ViR scales favorably for image throughput and memory consumption in tasks that require higher-resolution images due to its flexible formulation in processing large sequence lengths. 
The ViR is the first attempt to realize dual parallel and recurrent equivalency in a general vision backbone for recognition tasks. We have validated the effectiveness of ViR through extensive experiments with different dataset sizes and various image resolutions and achieved competitive performance. Our code and pretrained models will be made publicly available.	翻訳日:2023-11-01 19:07:03 公開日:2023-10-30
# コンディショナルトランスフォーマによる医療インストラクションの生成 Generating Medical Instructions with Conditional Transformer ( http://arxiv.org/abs/2310.19727v1 ) ライセンス: Link先を確認	Samuel Belkadi and Nicolo Micheletti and Lifeng Han and Warren Del-Pinto and Goran Nenadic	(参考訳) 現実世界の医療指導へのアクセスは、医療研究と医療の品質改善に不可欠である。しかし、実際の医学的指示へのアクセスは、表現される情報の繊細な性質のため、しばしば制限される。さらに、これらの命令をトレーニングや微調整の自然言語処理(NLP)モデルに手動でラベル付けするのも面倒でコストがかかる。本稿では,新たなタスク固有モデルアーキテクチャである Label-To-Text-Transformer (LT3) を導入し,医薬品とその属性の語彙リストなどの提供ラベルに基づく合成医療命令を生成する。 LT3はMIMIC-IIIデータベースから抽出された膨大な量の医療指示に基づいて訓練され、モデルが貴重な合成医療指示を作成できる。 LT3の性能を,最先端の事前学習言語モデル(PLM)であるT5と対比して評価し,生成されたテキストの品質と多様性を分析した。生成された合成データをデプロイして、n2c2-2018データセット上で名前付きエンティティ認識(NER)タスクのためのSpacyNERモデルをトレーニングする。実験の結果, 合成データを用いたモデルでは, 薬物, 頻度, 経路, 強度, 形状のラベル認識において96-98%のF1スコアが得られることがわかった。 LT3 コードとデータは https://github.com/HECTA-UoM/Label-To-Text-Transformer で共有される。 Access to real-world medical instructions is essential for medical research and healthcare quality improvement. However, access to real medical instructions is often limited due to the sensitive nature of the information expressed. Additionally, manually labelling these instructions for training and fine-tuning Natural Language Processing (NLP) models can be tedious and expensive. We introduce a novel task-specific model architecture, Label-To-Text-Transformer (LT3), tailored to generate synthetic medical instructions based on provided labels, such as a vocabulary list of medications and their attributes. LT3 is trained on a vast corpus of medical instructions extracted from the MIMIC-III database, allowing the model to produce valuable synthetic medical instructions. We evaluate LT3's performance by contrasting it with a state-of-the-art Pre-trained Language Model (PLM), T5, analysing the quality and diversity of generated texts. We deploy the generated synthetic data to train the SpacyNER model for the Named Entity Recognition (NER) task over the n2c2-2018 dataset. 
The experiments show that the model trained on synthetic data can achieve a 96-98% F1 score at Label Recognition on Drug, Frequency, Route, Strength, and Form. LT3 codes and data will be shared at https://github.com/HECTA-UoM/Label-To-Text-Transformer	翻訳日:2023-11-01 19:06:43 公開日:2023-10-30
# シンプルなモデルへの道は騒音から始まる A Path to Simpler Models Starts With Noise ( http://arxiv.org/abs/2310.19726v1 ) ライセンス: Link先を確認	Lesia Semenova, Harry Chen, Ronald Parr, Cynthia Rudin	(参考訳) ラショモン集合は、与えられたデータセット上でほぼ等しく作用するモデルの集合であり、ラショモン比は、ラショモン集合に属する所定の仮説空間内の全てのモデルの分数である。ラショモン比は、刑事司法、医療、貸付、教育、その他の分野における表型データセットにおいて、しばしば大きく、より単純なモデルがより複雑なモデルと同じレベルの精度を達成することができるかという実践的な意味合いを持つ。オープンな疑問は、なぜラショモン比が大きくなるのかである。本研究では,学習過程におけるデータ生成過程のメカニズムと,学習過程においてアナリストが通常行う選択とを組み合わせることで,ラショモン比のサイズを決定する手法を提案する。具体的には、noisierデータセットが、実践者がモデルをトレーニングする方法を通じて、より大きなrashomon比につながることを実証する。さらに,ラショモン集合の異なる分類パターン間の予測平均差を捉え,ラベルノイズが増大する理由をモチベーションとして,パターン多様性(pattern diversity)という尺度を導入する。その結果、単純なモデルが複雑でノイズの多いデータセット上でブラックボックスモデルと同様に振る舞う傾向がある理由が説明できる。 The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets.	翻訳日:2023-11-01 19:06:18 公開日:2023-10-30
# support matrix machine: レビュー Support matrix machine: A review ( http://arxiv.org/abs/2310.19717v1 ) ライセンス: Link先を確認	Anuradha Kumari, Mushir Akhtar, Rupal Shah, M. Tanveer	(参考訳) サポートベクトルマシン(SVM)は、分類と回帰問題に対する機械学習の領域で最も研究されているパラダイムの1つである。ベクトル化された入力データに依存する。しかし、実世界のデータの大部分は行列形式で存在し、行列をベクトルに変換することによってSVMへの入力として与えられる。再構成の過程は、行列データに固有の空間相関を阻害する。また、行列をベクトルに変換することで高次元の入力データが得られるため、計算が複雑になる。行列入力データの分類におけるこれらの課題を克服するために,サポートマトリックスマシン (smm) を提案する。これは行列入力データを扱うのに適した新しい手法の1つである。 SMM法は、核ノルムとフロベニウスノルムの組み合わせであるスペクトル弾性ネット特性を用いて、行列データの構造情報を保存する。本稿は,SMMモデルの開発について,初心者と専門家双方による詳細な要約として使用可能な,初めて詳細な分析を行う。本稿では,ロバスト,スパース,クラス不均衡,マルチクラス分類モデルなど,多数のsmm変種について考察する。また、SMMモデルの適用状況を分析し、SMMアルゴリズムを前進させる動機となる将来的な研究の道筋や可能性について概説する。 Support vector machine (SVM) is one of the most studied paradigms in the realm of machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of the real-world data exists in matrix format, which is given as input to SVM by reshaping the matrices into vectors. The process of reshaping disrupts the spatial correlations inherent in the matrix data. Also, converting matrices into vectors results in input data with a high dimensionality, which introduces significant computational complexity. To overcome these issues in classifying matrix input data, support matrix machine (SMM) is proposed. It represents one of the emerging methodologies tailored for handling matrix input data. The SMM method preserves the structural information of the matrix data by using the spectral elastic net property which is a combination of the nuclear norm and Frobenius norm. This article provides the first in-depth analysis of the development of the SMM model, which can be used as a thorough summary by both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class imbalance, and multi-class classification models. 
We also analyze the applications of the SMM model and conclude the article by outlining potential future research avenues and possibilities that may motivate academics to advance the SMM algorithm.	翻訳日:2023-11-01 19:05:32 公開日:2023-10-30
# オンライン不信生態系の複雑さとその進化 Complexity of the Online Distrust Ecosystem and its Evolution ( http://arxiv.org/abs/2310.19710v1 ) ライセンス: Link先を確認	Lucia Illari, Nicholas J. Restrepo, Neil F. Johnson	(参考訳) 集団的不信(およびそれに伴う誤情報・偽情報)は、我々の時代の最も複雑な現象の1つである。例えば、医学的専門知識、気候変動科学、民主的な選挙結果への不信、さらには現在のイスラエル・ハマス紛争やウクライナ・ロシア紛争における事実確認済みの出来事への不信さえある。では、オンライン不信エコシステムはなぜこれほど回復力があるのか? パンデミックの最中とその後でどのように進化したのか? この期間、Facebookの緩和政策はどれくらいうまくいったのか? 我々は、パンデミック以前にはワクチンへの不信のみに注目していた合計約1億人のユーザーからなる、相互接続されたコミュニティ(Facebookページ)のFacebookネットワークを分析した。 2019年から2023年までのこの動的ネットワークをマッピングすると、閉鎖を含むFacebookの緩和キャンペーンの後、ネットワークが急速に自己修復したことがわかる。これは、COVID期間中のFacebookによる対策強化(例:2020年11月)は効果がなかったという我々の以前の発見を裏付け、拡張するものである。今後の介入は,複数の話題,複数の地理的尺度にまたがって共鳴するように選ばなければならない。最近の多くの研究と異なり、我々の研究結果は、厳密な科学的研究における正確性が証明されておらず、それゆえそうした研究の結論に疑問を生じさせるサードパーティのブラックボックスツールには依存していない。 Collective human distrust (and its associated mis-disinformation) is one of the most complex phenomena of our time, e.g. distrust of medical expertise, or climate change science, or democratic election outcomes, and even distrust of fact-checked events in the current Israel-Hamas and Ukraine-Russia conflicts. So what makes the online distrust ecosystem so resilient? How has it evolved during and since the pandemic? And how well have Facebook mitigation policies worked during this time period? We analyze a Facebook network of interconnected in-built communities (Facebook pages) totaling roughly 100 million users who pre-pandemic were just focused on distrust of vaccines. Mapping out this dynamical network from 2019 to 2023, we show that it has quickly self-healed in the wake of Facebook's mitigation campaigns which include shutdowns. This confirms and extends our earlier finding that Facebook's ramp-ups during COVID were ineffective (e.g. November 2020). Our findings show that future interventions must be chosen to resonate across multiple topics and across multiple geographical scales. 
Unlike many recent studies, our findings do not rely on third-party black-box tools whose accuracy for rigorous scientific research is unproven, hence raising doubts about such studies' conclusions, nor is our network built using fleeting hyperlink mentions which have questionable relevance.	翻訳日:2023-11-01 19:05:12 公開日:2023-10-30
# ニューラルネットワークの知識編集に関する調査研究 A Survey on Knowledge Editing of Neural Networks ( http://arxiv.org/abs/2310.19704v1 ) ライセンス: Link先を確認	Vittorio Mazzia, Alessandro Pedrani, Andrea Caciolai, Kay Rottmann, Davide Bernardi	(参考訳) 深層ニューラルネットワークは、学界や業界でますます普及し、さまざまな分野や関連するタスクで人間のパフォーマンスと一致し、追い越すようになっている。しかし、人間と同じように、最大のニューラルネットワークでさえ間違いを犯し、世界が経つにつれて一度正しい予測が無効になる可能性がある。ミスや最新の情報を考慮したサンプルによるデータセットの強化は、実用アプリケーションでは一般的な回避策となっている。しかしながら、破滅的な忘れというよく知られた現象は、ニューラルネットワークパラメータの暗黙的に記憶された知識の正確な変化を達成する上で課題となり、しばしば望ましい振る舞いを達成するために完全なモデルの再訓練が必要となる。これは高価で信頼性がなく、大規模な自己教師型事前トレーニングの現在のトレンドと相容れないため、データ変更にニューラルネットワークモデルを適用するためのより効率的で効果的な方法を見つける必要がある。このニーズに対処するために、知識編集は、事前学習されたタスクにおけるモデル行動に影響を与えることなく、信頼性、データ効率、高速な目標モデルの変更を可能にすることを目的とした、新しい研究分野として浮上している。本調査では,最近の人工知能研究分野について概説する。まず、ニューラルネットワークを編集し、共通の枠組みで形式化し、継続的学習のようなより悪名高い研究分野と区別する問題を紹介する。次に、これまでに提案されている最も関連する知識編集手法とデータセットのレビューを行い、正規化技法、メタラーニング、直接モデル編集、アーキテクチャ戦略の4つの異なるファミリーに分類する。最後に,他の研究分野との交点と今後の研究の方向性について概説する。 Deep neural networks are becoming increasingly pervasive in academia and industry, matching and surpassing human performance on a wide variety of fields and related tasks. However, just as humans, even the largest artificial neural networks make mistakes, and once-correct predictions can become invalid as the world progresses in time. Augmenting datasets with samples that account for mistakes or up-to-date information has become a common workaround in practical applications. However, the well-known phenomenon of catastrophic forgetting poses a challenge in achieving precise changes in the implicitly memorized knowledge of neural network parameters, often requiring a full model re-training to achieve desired behaviors. That is expensive, unreliable, and incompatible with the current trend of large self-supervised pre-training, making it necessary to find more efficient and effective methods for adapting neural network models to changing data. 
To address this need, knowledge editing is emerging as a novel area of research that aims to enable reliable, data-efficient, and fast changes to a pre-trained target model, without affecting model behaviors on previously learned tasks. In this survey, we provide a brief review of this recent artificial intelligence field of research. We first introduce the problem of editing neural networks, formalize it in a common framework and differentiate it from more notorious branches of research such as continuous learning. Next, we provide a review of the most relevant knowledge editing approaches and datasets proposed so far, grouping works under four different families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, we outline some intersections with other fields of research and potential directions for future works.	翻訳日:2023-11-01 19:04:31 公開日:2023-10-30
# 臨界(1+1)-d 3状態ポッツ模型における位相欠陥の格子実現 Lattice Realizations of Topological Defects in the critical (1+1)-d Three-State Potts Model ( http://arxiv.org/abs/2310.19703v1 ) ライセンス: Link先を確認	Madhav Sinha, Fei Yan, Linnea Grans-Samuelsson, Ananda Roy and Hubert Saleur	(参考訳) 位相的/完全透過的欠陥は2次元共形場理論(CFT)の対称性の解析において基礎的な役割を果たす。本研究では,これらの欠陥に対するスピン鎖による正則化を提案し,三状態ポッツCFTの場合について解析を行った。特に、すべてのプリミティブ欠陥に対する格子バージョンが提示され、残りの欠陥はプリミティブ欠陥の融合によって得られる。これらの欠陥は、周期的境界条件を持ち、それ以外は一様なスピン鎖の2つの所与のサイト周辺に修正された相互作用を導入することによって得られる。プリミティブ欠陥は1つを除いて格子上で位相的であり、残りの1つはスケーリング極限においてのみ位相的である。格子モデルは, 厳密対角化法と密度行列繰り込み群法を組み合わせて解析する。異なる欠陥ハミルトニアンに対する低エネルギースペクトルと、欠陥の周囲に対称に位置するブロックの絡み合いエントロピーを計算する。後者は、様々な欠陥を特徴付ける$g$関数を計算する便利な方法を提供する。最後に、「交差チャンネル」におけるライン演算子の固有値と異なる欠陥ラインの融合についても解析する。結果はすべて共形場理論の期待と一致している。 Topological/perfectly-transmissive defects play a fundamental role in the analysis of the symmetries of two dimensional conformal field theories (CFTs). In the present work, spin chain regularizations for these defects are proposed and analyzed in the case of the three-state Potts CFT. In particular, lattice versions for all the primitive defects are presented, with the remaining defects obtained from the fusion of the primitive ones. The defects are obtained by introducing modified interactions around two given sites of an otherwise homogeneous spin chain with periodic boundary condition. The various primitive defects are topological on the lattice except for one, which is topological only in the scaling limit. The lattice models are analyzed using a combination of exact diagonalization and density matrix renormalization group techniques. Low-lying energy spectra for different defect Hamiltonians as well as entanglement entropy of blocks located symmetrically around the defects are computed. The latter provides a convenient way to compute the $g$-function which characterizes various defects. Finally, the eigenvalues of the line operators in the "crossed channel" and fusion of different defect lines are also analyzed. 
The results are all in agreement with expectations from conformal field theory.	翻訳日:2023-11-01 19:04:02 公開日:2023-10-30
# フォトニック結晶空洞中の量子ドットからの通信波長におけるパーセル増強単一光子 Purcell-Enhanced Single Photons at Telecom Wavelengths from a Quantum Dot in a Photonic Crystal Cavity ( http://arxiv.org/abs/2310.19701v1 ) ライセンス: Link先を確認	Catherine L. Phillips, Alistair J. Brash, Max Godsland, Nicholas J. Martin, Andrew Foster, Anna Tomlinson, Rene Dost, Nasser Babazadeh, Elisa M. Sala, Luke Wilson, Jon Heffernan, Maurice S. Skolnick, A. Mark Fox	(参考訳) 量子ドット(QD)は、様々な低損失の通信帯域にまたがる可変発光により、既存のファイバーネットワークと互換性があるため、通信波長帯の単一光子源として有望な候補である。フォトニック構造への集積に適しているため、パーセル効果を通じて輝度を高めることができ、効率的な量子通信技術を支える。本研究は, 液滴エピタキシーMOVPEを用いて作製した、通信Cバンド内で動作するInAs/InP QDに焦点を当てる。低モード体積のフォトニック結晶空洞内でのQDの相互作用により、パーセル因子5に起因する340psの短い放射寿命を観測した。試料温度のその場制御により,QDの発光波長の温度チューニングと,25Kまでの温度で保たれる単一光子放射純度の両方を示す。これらの結果は,QDをベースとした寒剤不要(cryogen-free)のCバンド単一光子源の実現可能性を示唆し,量子通信技術への適用性を支持する。 Quantum dots are promising candidates for telecom single photon sources due to their tunable emission across the different low-loss telecommunications bands, making them compatible with existing fiber networks. Their suitability for integration into photonic structures allows for enhanced brightness through the Purcell effect, supporting efficient quantum communication technologies. Our work focuses on InAs/InP QDs created via droplet epitaxy MOVPE to operate within the telecoms C-band. We observe a short radiative lifetime of 340 ps, arising from a Purcell factor of 5, owing to interaction of the QD within a low-mode-volume photonic crystal cavity. Through in-situ control of the sample temperature, we show both temperature tuning of the QD's emission wavelength and a preserved single photon emission purity at temperatures up to 25 K. These findings suggest the viability of QD-based, cryogen-free, C-band single photon sources, supporting applicability in quantum communication technologies.	翻訳日:2023-11-01 19:03:44 公開日:2023-10-30
# プロンプティングとプリフィックスチューニングはいつ行われるのか? 能力と限界の理論 When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations ( http://arxiv.org/abs/2310.19698v1 ) ライセンス: Link先を確認	Aleksandar Petrov, Philip H.S. Torr, Adel Bibi	(参考訳) 文脈に基づく微調整手法は、プロンプト、文脈内学習、ソフト・プロンプト(プロンプト・チューニング)、プレフィックス・チューニング(プレフィックス・チューニング)などがあり、パラメータのごく一部で完全な微調整の性能とよく一致するため人気がある。実験的な成功にもかかわらず、これらの手法がモデルの内部計算と表現力の限界にどのように影響するかについての理論的理解はほとんどない。連続埋め込み空間は離散トークン空間よりも表現力が高いが,ソフトプロンプトやプレフィックスチューニングは,学習可能なパラメータの数が同じであっても,完全な微調整よりも厳密に表現力に乏しいことを示す。具体的には、コンテキストベースの微調整はコンテンツ上の相対的注意パターンを変えることができず、注意層の出力を一定の方向に偏らせるだけである。これは、プロンプト、インコンテキスト学習、ソフトプロンプト、プレフィックスチューニングといったテクニックは、事前訓練されたモデルに存在するスキルを効果的に誘発することができるが、新しい注意パターンを必要とする新しいタスクを学べないことを示唆している。 Context-based fine-tuning methods, including prompting, in-context learning, soft prompting (also known as prompt tuning), and prefix-tuning, have gained popularity due to their ability to often match the performance of full fine-tuning with a fraction of the parameters. Despite their empirical successes, there is little theoretical understanding of how these techniques influence the internal computation of the model and their expressiveness limitations. We show that despite the continuous embedding space being more expressive than the discrete token space, soft-prompting and prefix-tuning are strictly less expressive than full fine-tuning, even with the same number of learnable parameters. Concretely, context-based fine-tuning cannot change the relative attention pattern over the content and can only bias the outputs of an attention layer in a fixed direction. This suggests that while techniques like prompting, in-context learning, soft prompting, and prefix-tuning can effectively elicit skills present in the pretrained model, they cannot learn novel tasks that require new attention patterns.	翻訳日:2023-11-01 19:03:22 公開日:2023-10-30
# $e^{\text{RPCA}}$:指数型分布族に対するロバスト主成分分析 $e^{\text{RPCA}}$: Robust Principal Component Analysis for Exponential Family Distributions ( http://arxiv.org/abs/2310.19787v1 ) ライセンス: Link先を確認	Xiaojun Zheng, Simon Mak, Liyan Xie, Yao Xie	(参考訳) ロバスト主成分分析(RPCA)は,大きくスパースな外れ値によって破損したデータ行列から低ランク構造を復元する手法として広く用いられている。これらの破損は、遮蔽(オクルージョン)、悪意のある改ざん、その他の異常の原因から生じる可能性があり、こうした破損と低ランクの背景とを同時に同定することは、プロセス監視と診断に不可欠である。しかし、既存のRPCA手法とその拡張は、データ行列の基盤となる確率分布をほとんど考慮していない。この分布は多くのアプリケーションで既知であり、非常に非ガウス的でありうる。そこで我々は,そのような分布が指数型分布族に属する場合に、低ランク行列とスパース行列への所望の分解を実行できるRobust Principal Component Analysis for Exponential Family distributions($e^{\text{RPCA}}$)という新しい手法を提案する。効率的な$e^{\text{RPCA}}$分解のために、交互方向乗数法に基づく新しい最適化アルゴリズムを提案する。 $e^{\text{RPCA}}$の有効性は、鋼板の欠陥検出と、アトランタ大都市圏における犯罪活動の監視という2つのアプリケーションで実証される。 Robust Principal Component Analysis (RPCA) is a widely used method for recovering low-rank structure from data matrices corrupted by significant and sparse outliers. These corruptions may arise from occlusions, malicious tampering, or other causes for anomalies, and the joint identification of such corruptions with low-rank background is critical for process monitoring and diagnosis. However, existing RPCA methods and their extensions largely do not account for the underlying probabilistic distribution for the data matrices, which in many applications are known and can be highly non-Gaussian. We thus propose a new method called Robust Principal Component Analysis for Exponential Family distributions ($e^{\text{RPCA}}$), which can perform the desired decomposition into low-rank and sparse matrices when such a distribution falls within the exponential family. We present a novel alternating direction method of multiplier optimization algorithm for efficient $e^{\text{RPCA}}$ decomposition. The effectiveness of $e^{\text{RPCA}}$ is then demonstrated in two applications: the first for steel sheet defect detection, and the second for crime activity monitoring in the Atlanta metropolitan area.	翻訳日:2023-11-01 18:55:49 公開日:2023-10-30
# 分類するか、分類するかを学ぶか? 一般カテゴリー発見のための自己符号化 Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery ( http://arxiv.org/abs/2310.19776v1 ) ライセンス: Link先を確認	Sarah Rastegar, Hazel Doughty, Cees G. M. Snoek	(参考訳) テスト時に新しいカテゴリを発表するという試みでは、事前定義されたカテゴリセットによって制限される従来の教師付き認識モデルの固有の制限に直面する。自己教師とオープンワールドの学習の領域において、テスト時のカテゴリ発見への進歩は行われてきたが、重要でしばしば見過ごされる疑問が続いている。本稿では、最適化のレンズを通して \textit{category} を概念化し、よく定義された問題に対する最適な解と見なす。このユニークな概念化を生かして,テスト時に未知のカテゴリを発見できる,新しい,効率的かつ自己管理的な手法を提案する。このアプローチの健全な特徴は、個々のデータインスタンスに最小長のカテゴリコードを割り当てることであり、実世界のデータセットでよく見られる暗黙のカテゴリ階層をカプセル化する。この機構により、カテゴリの粒度の制御が強化され、より詳細なカテゴリを扱うためのモデルが組み合わされる。試行錯誤による評価は, テスト時に未知のカテゴリを管理する上でのソリューションの有効性を実証するものである。さらに、我々の提案を理論的根拠で補強し、その最適性の証明を提供する。私たちのコードは、 \url{https://github.com/sarahrastegar/infosieve} で利用可能です。 In the quest for unveiling novel categories at test time, we confront the inherent limitations of traditional supervised recognition models that are restricted by a predefined category set. While strides have been made in the realms of self-supervised and open-world learning towards test-time category discovery, a crucial yet often overlooked question persists: what exactly delineates a \textit{category}? In this paper, we conceptualize a \textit{category} through the lens of optimization, viewing it as an optimal solution to a well-defined problem. Harnessing this unique conceptualization, we propose a novel, efficient and self-supervised method capable of discovering previously unknown categories at test time. A salient feature of our approach is the assignment of minimum length category codes to individual data instances, which encapsulates the implicit category hierarchy prevalent in real-world datasets. This mechanism affords us enhanced control over category granularity, thereby equipping our model to handle fine-grained categories adeptly. 
Experimental evaluations, bolstered by state-of-the-art benchmark comparisons, testify to the efficacy of our solution in managing unknown categories at test time. Furthermore, we fortify our proposition with a theoretical foundation, providing proof of its optimality. Our code is available at: \url{https://github.com/SarahRastegar/InfoSieve}.	翻訳日:2023-11-01 18:55:25 公開日:2023-10-30
# 説明可能な人工知能(XAI) 2.0:オープンチャレンジのマニフェストと学際研究の方向性 Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions ( http://arxiv.org/abs/2310.19775v1 ) ライセンス: Link先を確認	Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andr\'es P\'aez, Wojciech Samek, Johannes Schneider, Timo Speith, Simone Stumpf	(参考訳) 不透明な人工知能(AI)に基づくシステムは、さまざまな現実世界のアプリケーションで繁栄を続けているため、これらのブラックボックスモデルを理解することが最重要になっている。これに対し、説明可能なAI(XAI)は、様々な領域にまたがる実践的、倫理的利益の研究分野として登場した。本稿は,XAIの進歩と実世界のシナリオへの応用に加えて,より広い視点と協調的な取り組みの必要性を強調し,XAI内の課題に対処するものである。我々は,様々な分野の専門家を集めてオープンな問題を特定し,研究課題の同期化に努め,xaiの実用化を加速する。協力的な議論と学際的な協力の促進により、私たちは、XAIを前進させ、その継続的な成功に寄与することを目指しています。我々の目標は、XAIを進めるための包括的な提案を行うことです。この目標を達成するために,我々は,27のオープン問題のマニフェストを9つのカテゴリに分類した。これらの課題は、XAIの複雑さとニュアンスをカプセル化し、将来の研究のためのロードマップを提供する。各問題に対して,利害関係者の集合的知性を活用するために,有望な研究指針を提供する。 As systems based on opaque Artificial Intelligence (AI) continue to flourish in diverse real-world applications, understanding these black box models has become paramount. In response, Explainable AI (XAI) has emerged as a field of research with practical and ethical benefits across various domains. This paper not only highlights the advancements in XAI and its application in real-world scenarios but also addresses the ongoing challenges within XAI, emphasizing the need for broader perspectives and collaborative efforts. We bring together experts from diverse fields to identify open problems, striving to synchronize research agendas and accelerate XAI in practical applications. By fostering collaborative discussion and interdisciplinary cooperation, we aim to propel XAI forward, contributing to its continued success. Our goal is to put forward a comprehensive proposal for advancing XAI. 
To achieve this goal, we present a manifesto of 27 open problems categorized into nine categories. These challenges encapsulate the complexities and nuances of XAI and offer a road map for future research. For each problem, we provide promising research directions in the hope of harnessing the collective intelligence of interested stakeholders.	翻訳日:2023-11-01 18:55:04 公開日:2023-10-30
# 絡み合い支援古典コミュニケーションのための符号 Codes for entanglement-assisted classical communication ( http://arxiv.org/abs/2310.19774v1 ) ライセンス: Link先を確認	Tushita Prasad, Markus Grassl	(参考訳) 本稿では,固定数の消去や誤りを訂正できる新しい絡み合い支援古典的通信方式を提案する。このスキームは、最大絡み合ったペアによって補助される量子チャネル上で古典的な情報を伝達する。このような課題を古典的な問題に還元して達成するための一般的な枠組みを確立する。私たちは、利用可能な絡み合いの量に基づいて、直接コーディングやスーパーセンスコーディングを使用します。この結果、2つの古典的チャンネルが組み合わさる。このシナリオでは、明示的な符号化スキームを示す。我々は,提案手法を特定の境界値と比較し,そのスキームが最適である特定のパラメータの範囲を求める。提示されたスキームは容易に実現できる。実験で実証されたスーパーデンス符号化の実装のみが必要となる。 We present a new entanglement assisted classical communication scheme which can correct a fixed number of erasures or errors. The scheme transmits classical information over a quantum channel assisted by maximally entangled pairs. We establish a general framework to accomplish such a task by reducing it to a classical problem. We use direct coding or super-dense coding based on the amount of entanglement available. This results in a combination of two classical channels. For this scenario we present an explicit encoding scheme. We compare our scheme with specific bounds and find certain ranges of parameters where the scheme is optimal. The presented scheme can easily be realized. It requires only the implementation of super-dense coding which has been demonstrated successfully in experiments.	翻訳日:2023-11-01 18:54:45 公開日:2023-10-30
# MM-VID:GPT-4V(ision)による映像理解の促進 MM-VID: Advancing Video Understanding with GPT-4V(ision) ( http://arxiv.org/abs/2310.19773v1 ) ライセンス: Link先を確認	Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang	(参考訳) 本稿では、GPT-4Vの能力を利用する統合システムMM-VIDと、視覚、音声、音声の特殊なツールを組み合わせて、高度な映像理解を促進する。 MM-VIDは、長いビデオや、1時間以内のコンテンツの推論や複数のエピソードにまたがるストーリーラインの把握といった複雑なタスクによって引き起こされる課題に対処するように設計されている。 mm-vidはgpt-4vでビデオからスクリプトまで生成し、マルチモーダル要素を長いテキストスクリプトに書き込む。生成されたスクリプトは、文字の動き、アクション、表現、対話を詳述し、ビデオ理解を実現するための大きな言語モデル(LLM)の道を開く。これにより、音声記述、文字識別、マルチモーダルハイレベル理解などの高度な機能を実現する。実験により,様々なビデオ長の異なる動画ジャンルに対するMM-VIDの有効性が示された。また,ゲームやグラフィックユーザインタフェースなど,インタラクティブな環境にも適用可能な可能性を示した。 We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding. MM-VID is designed to address the challenges posed by long-form videos and intricate tasks such as reasoning within hour-long content and grasping storylines spanning multiple episodes. MM-VID uses a video-to-script generation with GPT-4V to transcribe multimodal elements into a long textual script. The generated script details character movements, actions, expressions, and dialogues, paving the way for large language models (LLMs) to achieve video understanding. This enables advanced capabilities, including audio description, character identification, and multimodal high-level comprehension. Experimental results demonstrate the effectiveness of MM-VID in handling distinct video genres with various video lengths. Additionally, we showcase its potential when applied to interactive environments, such as video games and graphic user interfaces.	翻訳日:2023-11-01 18:54:36 公開日:2023-10-30
# 動的メタサーフェスアンテナを用いた非視線ユーザトラッキングのための自己回帰注意型ニューラルネットワーク Autoregressive Attention Neural Networks for Non-Line-of-Sight User Tracking with Dynamic Metasurface Antennas ( http://arxiv.org/abs/2310.19767v1 ) ライセンス: Link先を確認	Kyriakos Stylianopoulos, Murat Bayraktar, Nuria Gonz\'alez Prelcic, George C. Alexandropoulos	(参考訳) 次世代無線ネットワークにおけるユーザのローカライゼーションと追跡は、ダイナミックメタサーフェスアンテナ(DMA)のような技術によって革新される可能性がある。一般的に提案されているアルゴリズム的アプローチは、比較的支配的なLine-of-Sight(LoS)パスの仮定に依存するか、DMA要素数に匹敵する長さのパイロット送信シーケンスを必要とする。本稿では,ユーザトラッキングのための2段階の機械学習に基づくアプローチを提案する。新たに提案するアテンションベースニューラルネットワーク(nn)は,ユーザモビリティパターンによらず,ノイズの多いチャネル応答を潜在的なユーザ位置にマッピングするように訓練された。このアーキテクチャは、高次元の周波数応答信号から情報を抽出するために特に修正された顕著な視覚トランスの修正を構成する。第2段階として、過去のユーザ位置に対するnnの予測を学習可能な自己回帰モデルに通し、時間相関チャネル情報を利用して最終位置予測を得る。チャネル推定手法は、部分的に接続された無線周波数チェーンを持つDMA受信アーキテクチャを活用し、パイロット数が減少する。屋外光線追跡のシナリオに対する数値的な評価は、LoSブロックにもかかわらず、様々なマルチパス設定で高い位置精度を達成することができることを示している。 User localization and tracking in the upcoming generation of wireless networks have the potential to be revolutionized by technologies such as the Dynamic Metasurface Antennas (DMAs). Commonly proposed algorithmic approaches rely on assumptions about relatively dominant Line-of-Sight (LoS) paths, or require pilot transmission sequences whose length is comparable to the number of DMA elements, thus, leading to limited effectiveness and considerable measurement overheads in blocked LoS and dynamic multipath environments. In this paper, we present a two-stage machine-learning-based approach for user tracking, specifically designed for non-LoS multipath settings. A newly proposed attention-based Neural Network (NN) is first trained to map noisy channel responses to potential user positions, regardless of user mobility patterns. This architecture constitutes a modification of the prominent vision transformer, specifically modified for extracting information from high-dimensional frequency response signals. 
As a second stage, the NN's predictions for the past user positions are passed through a learnable autoregressive model to exploit the time-correlated channel information and obtain the final position predictions. The channel estimation procedure leverages a DMA receive architecture with partially-connected radio frequency chains, which results in a reduced number of pilots. The numerical evaluation over an outdoor ray-tracing scenario illustrates that despite LoS blockage, this methodology is capable of achieving high position accuracy across various multipath settings.	翻訳日:2023-11-01 18:54:19 公開日:2023-10-30
# 誘導コヒーレンスに基づく干渉計の1次コヒーレンスと経路識別性との相補性 Complementarity relationship between first-order coherence and path distinguishability in an interferometer based on induced coherence ( http://arxiv.org/abs/2310.19765v1 ) ライセンス: Link先を確認	Gerard J. Machado, Lluc Sendra, Adam Vall\'es, Juan P. Torres	(参考訳) 誘導コヒーレンス(英語版)の概念に基づく干渉計を考えると、異なる二階非線形結晶に由来する2つの信号光子が干渉することができる。 2つの干渉信号光子間の1次コヒーレンスと、それらが原点となる非線形結晶に関する識別情報を定量化するパラメータを関連付ける相補性関係を導出する。驚くべきことに、導出した関係は単光子系を超えており、任意の光子フラックス速度に有効である。導出相補性関係の妥当性を検証した低光子流束レジームの実験結果を示す。 We consider an interferometer based on the concept of induced coherence, where two signal photons that originate in different second-order nonlinear crystals can interfere. We derive a complementarity relationship that links the first-order coherence between the two interfering signal photons with a parameter that quantifies the distinguishing information regarding the nonlinear crystal where they originated. Astonishingly, the derived relationship goes beyond the single-photon regime and is valid for any photon flux rate generated. We show experimental results in the low photon-flux regime that confirm the validity of the derived complementarity relationship.	翻訳日:2023-11-01 18:53:52 公開日:2023-10-30
# ニューラルPDEにおける自己回帰ルネサンス Autoregressive Renaissance in Neural PDE Solvers ( http://arxiv.org/abs/2310.19763v1 ) ライセンス: Link先を確認	Yolanne Yi Ran Lee	(参考訳) ニューラル偏微分方程式(PDE)の分野における最近の発展は、ニューラル作用素に強く重点を置いている。しかし、ICLR 2022で発表されたBrandstetterらによる論文"Message Passing Neural PDE Solver"では、自己回帰モデルを再検討し、最先端のフーリエニューラル演算子と従来のPDEソルバの両方に匹敵する、あるいは優れたメッセージパッシンググラフニューラルネットワークを、その一般化能力と性能で設計している。このブログ記事は、自動回帰モデルにおける不安定性の一般的な問題と、メッセージパッシンググラフニューラルネットワークアーキテクチャの設計選択に対処するために使用される戦略について詳しく説明している。 Recent developments in the field of neural partial differential equation (PDE) solvers have placed a strong emphasis on neural operators. However, the paper "Message Passing Neural PDE Solver" by Brandstetter et al. published in ICLR 2022 revisits autoregressive models and designs a message passing graph neural network that is comparable with or outperforms both the state-of-the-art Fourier Neural Operator and traditional classical PDE solvers in its generalization capabilities and performance. This blog post delves into the key contributions of this work, exploring the strategies used to address the common problem of instability in autoregressive models and the design choices of the message passing graph neural network architecture.	翻訳日:2023-11-01 18:53:31 公開日:2023-10-30
# 格子場理論からのリアルタイムスピンシステム Real-time Spin Systems from Lattice Field Theory ( http://arxiv.org/abs/2310.19761v1 ) ライセンス: Link先を確認	Neill C. Warrington	(参考訳) 熱浴におけるスピン系の実時間ダイナミクスを計算するための格子場理論法を構築する。これは、シュウィンガー・ケルディッシュによるタカノの以前の研究と機能的分化技術に基づいて行われる。一般スピンハミルトニアンに対してシュウィンガー・ケルディシュ経路積分を導出し、簡単なシステム上でその方法を実証する。我々の経路積分には符号問題があり、一般にシステムサイズにおいて指数的な実行時間を必要とするが、線形記憶だけを必要とする。後者は、この方法を両方の指数関数である正確な対角化よりも有利にすることができる。我々の経路積分は、符号問題を減らす手法である輪郭変形に適応できる。 We construct a lattice field theory method for computing the real-time dynamics of spin systems in a thermal bath. This is done by building on previous work of Takano with Schwinger-Keldysh and functional differentiation techniques. We derive a Schwinger-Keldysh path integral for generic spin Hamiltonians, then demonstrate the method on a simple system. Our path integral has a sign problem, which generally requires exponential run time in the system size, but requires only linear storage. The latter may place this method at an advantage over exact diagonalization, which is exponential in both. Our path integral is amenable to contour deformations, a technique for reducing sign problems.	翻訳日:2023-11-01 18:53:08 公開日:2023-10-30
# 機械学習モデルを用いた疫病発生予測 Epidemic outbreak prediction using machine learning models ( http://arxiv.org/abs/2310.19760v1 ) ライセンス: Link先を確認	Akshara Pramod, JS Abhishek, Dr. Suganthi K	(参考訳) 今日の世界では、新興・再興感染症の流行リスクが高まっている。近年の医療技術の進歩により、地域内での流行発生の予測が可能になった。流行発生の早期予測は、事態を制御下に保つために必要な医薬品や物流を当局が準備するのに大いに役立つ。本稿では,機械学習と深層学習のアルゴリズムを用いて,米国ニューヨーク州における流行(インフルエンザ,肝炎,マラリア)の発生予測を試み,アウトブレイク時に同地域の当局や医療機関に警告できるポータルを作成した。このアルゴリズムは、過去のデータを使って5週間先までの症例数を予測する。Google検索トレンド、ソーシャルメディアデータ、気象データなどの非臨床要因も、アウトブレイクの確率を予測するために用いた。 In today's world, the risk of emerging and re-emerging epidemics has increased. The recent advancement in healthcare technology has made it possible to predict an epidemic outbreak in a region. Early prediction of an epidemic outbreak greatly helps the authorities to be prepared with the necessary medications and logistics required to keep things in control. In this article, we try to predict the epidemic outbreak (influenza, hepatitis and malaria) for the state of New York, USA using machine and deep learning algorithms, and a portal has been created for the same which can alert the authorities and health care organizations of the region in case of an outbreak. The algorithm takes historical data to predict the possible number of cases for 5 weeks into the future. Non-clinical factors like Google search trends, social media data and weather data have also been used to predict the probability of an outbreak.	翻訳日:2023-11-01 18:52:49 公開日:2023-10-30
# 英国における鉄道旅行コスト計算ツール Rail journey cost calculator for Great Britain ( http://arxiv.org/abs/2310.19754v1 ) ライセンス: Link先を確認	Federico Botta	(参考訳) 病院や仕事のある地域など、様々な場所のアクセシビリティは、交通システム、都市環境、そして様々な人々が到達できるサービスや機会の不平等を理解する上で重要である。多くの場合、この領域の研究は、特定時間枠内である地域に住む人々が特定の目的地に到達できるかどうかという問題に焦点が当てられている。しかし、こうした移動の費用や、それが手頃であるかどうかは、しばしば省略されるか、同程度には考慮されない。ここでは,英国における鉄道旅行のコストを分析するために,Pythonパッケージと関連するデータセットを紹介する。我々は、これを構築するのに使った元のデータセット、それを分析するために開発したPythonパッケージ、生成した出力データセットを示します。私たちは、本研究によって、研究者、政策立案者、その他の利害関係者が、鉄道旅行のコスト、これに起因する地理的または社会的不平等、輸送システムの改善方法に関する問いを調査できるようになることを想定しています。 Accessibility of different places, such as hospitals or areas with jobs, is important in understanding transportation systems, urban environments, and potential inequalities in what services and opportunities different people can reach. Often, research in this area is framed around the question of whether people living in an area are able to reach certain destinations within a prespecified time frame. However, the cost of such journeys, and whether they are affordable, is often omitted or not considered to the same level. Here, we present a Python package and an associated data set which allow the analysis of the cost of train journeys in Great Britain. We present the original data set we used to construct this, the Python package we developed to analyse it, and the output data set which we generated. We envisage our work to allow researchers, policy makers, and other stakeholders, to investigate questions around the cost of train journeys, any geographical or social inequalities arising from this, and how the transport system could be improved.	翻訳日:2023-11-01 18:52:22 公開日:2023-10-30
# CLIPを用いたゼロショット視覚分類のためのモーダル内プロキシ学習 Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP ( http://arxiv.org/abs/2310.19752v1 ) ライセンス: Link先を確認	Qi Qian, Yuanhong Xu, Juhua Hu	(参考訳) 視覚言語による事前学習メソッド、例えばCLIPは、クラス名のテキスト埋め込みによるクラスプロキシで、視覚的な分類において印象的なゼロショットのパフォーマンスを示している。しかし、テキストと視覚空間の間のモダリティギャップは、準最適性能をもたらす可能性がある。理論的には、CLIPのコントラスト損失を最小化してもギャップを十分に削減できず、ビジョンタスクの最適なプロキシはビジョン空間にのみ存在しうることを示す。そこで,未ラベルの目標視データから,ゼロショット転送のためのテキストプロキシの助けを借りて,ビジョンプロキシを直接学習することを提案する。さらに,本理論解析により,テキストプロキシが取得した擬似ラベルをさらに洗練し,視覚のモード内プロキシ学習(InMaP)を容易にするための戦略を開発した。広範囲な下流タスクの実験により,提案手法の有効性と効率性が確認された。具体的には、InMaPは単一のGPU上で1分以内にビジョンプロキシを取得することができ、CLIPが事前トレーニングしたViT-L/14@336でImageNet上でのゼロショット精度を77.02\%から80.21\%に改善することができる。コードは \url{https://github.com/idstcv/InMaP} で入手できる。 Vision-language pre-training methods, e.g., CLIP, demonstrate an impressive zero-shot performance on visual categorizations with the class proxy from the text embedding of the class name. However, the modality gap between the text and vision space can result in a sub-optimal performance. We theoretically show that the gap cannot be reduced sufficiently by minimizing the contrastive loss in CLIP and the optimal proxy for vision tasks may reside only in the vision space. Therefore, given unlabeled target vision data, we propose to learn the vision proxy directly with the help from the text proxy for zero-shot transfer. Moreover, according to our theoretical analysis, strategies are developed to further refine the pseudo label obtained by the text proxy to facilitate the intra-modal proxy learning (InMaP) for vision. Experiments on extensive downstream tasks confirm the effectiveness and efficiency of our proposal. Concretely, InMaP can obtain the vision proxy within one minute on a single GPU while improving the zero-shot accuracy from $77.02\%$ to $80.21\%$ on ImageNet with ViT-L/14@336 pre-trained by CLIP. Code is available at \url{https://github.com/idstcv/InMaP}.	翻訳日:2023-11-01 18:51:51 公開日:2023-10-30
# ソーシャルメディアにおけるスタンス検出のためのチェーンオブソート埋め込み Chain-of-Thought Embeddings for Stance Detection on Social Media ( http://arxiv.org/abs/2310.19750v1 ) ライセンス: Link先を確認	Joseph Gatto, Omar Sharif, Sarah Masud Preum	(参考訳) ソーシャルメディアでのスタンス検出は大規模言語モデル(llm)では困難であり、オンライン会話における新しいスラングや口語は、しばしば暗黙のスタンスラベルを含んでいる。 CoT(Chain-of-Thought)プロンプトは、最近、スタンス検出タスクのパフォーマンスを改善することが示されている。しかし、cotプロンプトは暗黙のスタンス識別に苦しむ。この課題は、モデルがさまざまなトピックに関連するスラングや進化する知識に慣れるまでに、多くのサンプルが最初に理解することが難しいためである。本研究では,COT推論を埋め込み,従来のRoBERTaを用いた姿勢検出パイプラインに統合することにより,姿勢検出タスクにおけるCOT性能を向上させるCOT埋め込みを導入することで,この問題に対処する。私たちの分析は 1)テキストエンコーダは、COT出力ラベルを歪ませるような小さなエラーや幻覚を伴うCOT推論を利用することができる。 2)テキストエンコーダは,サンプルの予測がドメイン固有のパターンに大きく依存する場合,COT推論の誤解を招く可能性がある。本モデルはソーシャルメディアから収集した複数の姿勢検出データセット上でのSOTA性能を実現する。 Stance detection on social media is challenging for Large Language Models (LLMs), as emerging slang and colloquial language in online conversations often contain deeply implicit stance labels. Chain-of-Thought (COT) prompting has recently been shown to improve performance on stance detection tasks -- alleviating some of these issues. However, COT prompting still struggles with implicit stance identification. This challenge arises because many samples are initially challenging to comprehend before a model becomes familiar with the slang and evolving knowledge related to different topics, all of which need to be acquired through the training data. In this study, we address this problem by introducing COT Embeddings which improve COT performance on stance detection tasks by embedding COT reasonings and integrating them into a traditional RoBERTa-based stance detection pipeline. Our analysis demonstrates that 1) text encoders can leverage COT reasonings with minor errors or hallucinations that would otherwise distort the COT output label. 2) Text encoders can overlook misleading COT reasoning when a sample's prediction heavily depends on domain-specific patterns. 
Our model achieves SOTA performance on multiple stance detection datasets collected from social media.	翻訳日:2023-11-01 18:51:24 公開日:2023-10-30
# この特性のよいところを教えてください:セグメントパーソナライズされた画像収集の要約にレビューを活用する Tell Me What Is Good About This Property: Leveraging Reviews For Segment-Personalized Image Collection Summarization ( http://arxiv.org/abs/2310.19743v1 ) ライセンス: Link先を確認	Monika Wysoczanska, Moran Beladev, Karen Lastmann Assaraf, Fengjun Wang, Ofri Kleinfeld, Gil Amsalem, Hadas Harush Boker	(参考訳) 画像収集要約技術は、画像ギャラリーのコンパクトな表現を、その意味的コンテンツをキャプチャする画像の慎重に選択されたサブセットを通して提示することを目的としている。しかし、webコンテンツに関しては、ユーザの特定の意図や好みに応じて、理想的な選択が異なります。これはBooking.comで特に重要であり、ユーザの期待に沿うプロパティとその視覚的要約を提示することが重要である。この課題に対処するために、プロパティレビューを分析し、ユーザが言及する最も重要な側面を抽出することで、プロパティビジュアルの要約におけるユーザ意図を考察する。視覚的な要約にレビューからの洞察を取り入れることで、関連コンテンツをユーザに提示することで要約を強化する。さらに、コストのかかるアノテーションを必要とせずに実現します。人間の知覚研究を含む我々の実験は、ノンパーソナライズとイメージベースのクラスタリングベースラインよりもクロスサムマライザとして生み出される、クロスモーダルアプローチの優位性を示しています。 Image collection summarization techniques aim to present a compact representation of an image gallery through a carefully selected subset of images that captures its semantic content. When it comes to web content, however, the ideal selection can vary based on the user's specific intentions and preferences. This is particularly relevant at Booking.com, where presenting properties and their visual summaries that align with users' expectations is crucial. To address this challenge, we consider user intentions in the summarization of property visuals by analyzing property reviews and extracting the most significant aspects mentioned by users. By incorporating the insights from reviews in our visual summaries, we enhance the summaries by presenting the relevant content to a user. Moreover, we achieve it without the need for costly annotations. Our experiments, including human perceptual studies, demonstrate the superiority of our cross-modal approach, which we coin as CrossSummarizer over the no-personalization and image-based clustering baselines.	翻訳日:2023-11-01 18:50:54 公開日:2023-10-30
# 位相変調によるアレイ内の個別原子制御 Individual-atom control in array through phase modulation ( http://arxiv.org/abs/2310.19741v1 ) ライセンス: Link先を確認	Guoqing Wang, Wenchao Xu, Changhao Li, Vladan Vuletić, Paola Cappellaro	(参考訳) 低クロストークを維持しながら並列ゲート操作を実行することは、中性原子配列を強力な量子コンピュータやシミュレータに変換するための重要なステップである。クロストーク抑制のために制御ビームを小さな領域に強く集束させることは通常困難であり、特定の遷移に対して不完全な分極を引き起こしうる。本研究では, 位相変調連続駆動による単一キュービットゲートの設計手法を導入することで, この問題に対処する。異なる量子ビットは、変調パラメータを調整するだけで個別に高精度にアドレスでき、これによりクロストーク効果が著しく抑制される。格子構造に配置すると、最適クロストーク抑制による個別制御を実現する。追加のアドレッシング光または多重変調周波数の補助により、並列ゲート演算の2つの効率的な実装を開発する。その結果、複雑な波面設計や高出力レーザービームを必要とせず、低エラーのパラレルゲート操作で原子アレイプラットフォームをスケールアップする方法が得られた。 Performing parallel gate operations while retaining low crosstalk is an essential step in transforming neutral atom arrays into powerful quantum computers and simulators. Tightly focusing control beams in small areas for crosstalk suppression is typically challenging and can lead to imperfect polarization for certain transitions. We tackle such a problem by introducing a method to engineer single qubit gates through phase-modulated continuous driving. Distinct qubits can be individually addressed to high accuracy by simply tuning the modulation parameters, which significantly suppresses crosstalk effects. When arranged in a lattice structure, individual control with optimal crosstalk suppression is achieved. With the assistance of additional addressing light or multiple modulation frequencies, we develop two efficient implementations of parallel-gate operations. Our results pave the way to scaling up atom-array platforms with low-error parallel-gate operations, without requiring complicated wavefront design or high-power laser beams.	翻訳日:2023-11-01 18:50:35 公開日:2023-10-30
# DEFT: 現実世界のハンド・ポリシーのためのデクサラス・ファイン・チューニング DEFT: Dexterous Fine-Tuning for Real-World Hand Policies ( http://arxiv.org/abs/2310.19797v1 ) ライセンス: Link先を確認	Aditya Kannan, Kenneth Shaw, Shikhar Bahl, Pragna Mannam, Deepak Pathak	(参考訳) デクスタリティはしばしば複雑な操作の基盤として見なされる。人間は、食べ物作りから操作ツールまで、さまざまなスキルを手を使って実行することができる。本稿では,これらの課題,特に軟質で変形可能な物体や,複雑で比較的長い水平なタスクについて検討する。しかし、そのような振る舞いをスクラッチから学ぶことはデータ非効率である。これを回避するために,実世界で直接実行される人間による事前処理を活用する新しいアプローチDEFT(DExterous Fine-Tuning for Hand Policies)を提案する。これらの先行性を改善するために、DEFTは効率的なオンライン最適化手順を必要とする。人間の学習とオンラインの微調整を統合し、ソフトなロボットハンドと組み合わせることで、DEFTはさまざまなタスクにまたがって成功を示し、汎用的な巧妙な操作に向けた堅牢でデータ効率のよい経路を確立する。ビデオの検索結果はhttps://dexterous-finetuning.github.ioでご覧ください。 Dexterity is often seen as a cornerstone of complex manipulation. Humans are able to perform a host of skills with their hands, from making food to operating tools. In this paper, we investigate these challenges, especially in the case of soft, deformable objects as well as complex, relatively long-horizon tasks. However, learning such behaviors from scratch can be data inefficient. To circumvent this, we propose a novel approach, DEFT (DExterous Fine-Tuning for Hand Policies), that leverages human-driven priors, which are executed directly in the real world. In order to improve upon these priors, DEFT involves an efficient online optimization procedure. With the integration of human-based learning and online fine-tuning, coupled with a soft robotic hand, DEFT demonstrates success across various tasks, establishing a robust, data-efficient pathway toward general dexterous manipulation. Please see our website at https://dexterous-finetuning.github.io for video results.	翻訳日:2023-11-01 18:43:52 公開日:2023-10-30
# Syntheseusを用いた逆合成アルゴリズムの再評価 Re-evaluating Retrosynthesis Algorithms with Syntheseus ( http://arxiv.org/abs/2310.19796v1 ) ライセンス: Link先を確認	Krzysztof Maziarz, Austin Tripp, Guoqing Liu, Megan Stanley, Shufang Xie, Piotr Gaiński, Philipp Seidl, Marwin Segler	(参考訳) 分子の合成の計画(レトロシンセシスとも呼ばれる)は近年、機械学習と化学のコミュニティで注目を集めている。着実に進歩しているように見えるにもかかわらず、不完全なベンチマークと一貫性のない比較が既存の技術の体系的な欠点を覆い隠していると主張する。そこで本研究では,syntheseusというベンチマークライブラリを提案する。このライブラリはデフォルトでベストプラクティスを促進し,単一ステップおよび複数ステップのレトロシンセシスアルゴリズムの一貫性のある有意義な評価を可能にする。syntheseusを用いて過去のレトロシンセシスアルゴリズムを再評価し, 慎重に評価すると最先端モデルのランクが変化することがわかった。最後に、この分野の今後の研究に向けた指針を示す。 The planning of how to synthesize molecules, also known as retrosynthesis, has been a growing focus of the machine learning and chemistry communities in recent years. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques. To remedy this, we present a benchmarking library called syntheseus which promotes best practice by default, enabling consistent meaningful evaluation of single-step and multi-step retrosynthesis algorithms. We use syntheseus to re-evaluate a number of previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes when evaluated carefully. We end with guidance for future works in this area.	翻訳日:2023-11-01 18:43:38 公開日:2023-10-30
# SimMMDG: マルチモーダルドメイン一般化のためのシンプルで効果的なフレームワーク SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization ( http://arxiv.org/abs/2310.19795v1 ) ライセンス: Link先を確認	Hao Dong, Ismail Nejjar, Han Sun, Eleni Chatzi, Olga Fink	(参考訳) 実世界のシナリオでは、ドメイン一般化(DG)を達成するには、未知のターゲット分布に一般化するモデルが必要であるため、大きな課題が提示される。未知のマルチモーダル分布への一般化は、異なるモダリティによって示される異なる性質のためにさらに困難をもたらす。マルチモーダルシナリオにおけるドメイン一般化の課題を克服するために,単純かつ効果的なマルチモーダルdgフレームワークであるsimmmdgを提案する。異なるモダリティから同じ埋め込み空間へのマッピング機能はモデルの一般化を妨げると論じている。これに対処するために、各モダリティ内の機能をモダリティ固有のコンポーネントとモダリティ共有コンポーネントに分割することを提案する。我々は,モダリティ共有特徴に対する教師付きコントラスト学習を用いて,共同性を確保し,多様性を促進するためにモダリティ固有の特徴に距離制約を課す。さらに,学習された機能を正規化するクロスモーダル翻訳モジュールを導入し,欠落モダリティ一般化にも利用できる。本稿では,EPIC-KitchensデータセットとHuman-Animal-Cartoon(HAC)データセットを用いたマルチモーダルDGを理論的に支持し,高い性能を実現していることを示す。私たちのソースコードとhacデータセットはhttps://github.com/donghao51/simmmdgで利用可能です。 In real-world scenarios, achieving domain generalization (DG) presents significant challenges as models are required to generalize to unknown target distributions. Generalizing to unseen multi-modal distributions poses even greater difficulties due to the distinct properties exhibited by different modalities. To overcome the challenges of achieving domain generalization in multi-modal scenarios, we propose SimMMDG, a simple yet effective multi-modal DG framework. We argue that mapping features from different modalities into the same embedding space impedes model generalization. To address this, we propose splitting the features within each modality into modality-specific and modality-shared components. We employ supervised contrastive learning on the modality-shared features to ensure they possess joint properties and impose distance constraints on modality-specific features to promote diversity. In addition, we introduce a cross-modal translation module to regularize the learned features, which can also be used for missing-modality generalization. 
We demonstrate that our framework is theoretically well-supported and achieves strong performance in multi-modal DG on the EPIC-Kitchens dataset and the novel Human-Animal-Cartoon (HAC) dataset introduced in this paper. Our source code and HAC dataset are available at https://github.com/donghao51/SimMMDG.	翻訳日:2023-11-01 18:43:24 公開日:2023-10-30
# 線形モデルのためのロバスト因果バンディット Robust Causal Bandits for Linear Models ( http://arxiv.org/abs/2310.19794v1 ) ライセンス: Link先を確認	Zirui Yan, Arpan Mukherjee, Burak Var{\i}c{\i}, Ali Tajer	(参考訳) 因果系における報酬関数を最適化するための実験の逐次設計は、因果包帯(CB)における介入のシーケンシャル設計によって効果的にモデル化することができる。 CBに関する既存の文献では、因果モデルが時間とともに一定であることが重要な仮定である。しかし、この仮定は、常に時間モデルゆらぎを経る複雑なシステムでは必ずしも成り立たない。本稿では,このようなモデル変動に対するCBの堅牢性について述べる。焦点は線形構造方程式モデル(SEM)による因果系である。 SEMと時間変化の前・後統計モデルは、すべて不明である。累積的後悔(cumulative regret)は設計基準として採用され、その目的は、因果モデル全体とそのゆらぎを認識したオラクルに対して、最小の累積後悔を引き起こす一連の介入を設計することである。第一に, 既存手法ではモデル偏差の例さえあれば, 後悔する部分線形性が維持できないことが判明した。特に、モデルの偏差を持つインスタンス数が$t^\frac{1}{2l}$で、$t$が時間軸であり、$l$がグラフの最長因果経路である場合、既存のアルゴリズムは、$t$で線形後悔する。次に、ロバストなcbアルゴリズムを設計し、その後悔を解析し、後悔の上位及び情報理論的下限を設定する。具体的には、$N$ノードと最大次数$d$のグラフにおいて、モデル偏差$C$の一般的な測度の下で、累積後悔は$\tilde{\mathcal{O}}(d^{L-\frac{1}{2}}(\sqrt{NT} + NC))$で上界、下界$\Omega(d^{\frac{L}{2}-2}\max\{\sqrt{T},d^2C\})$で下界となる。これらの境界を比較すると、提案アルゴリズムは$C$が$o(\sqrt{T})$であるときにほぼ最適な$\tilde{\mathcal{O}}(\sqrt{T})$後悔を達成し、より広い範囲の$C$に対してサブ線形後悔を維持する。 Sequential design of experiments for optimizing a reward function in causal systems can be effectively modeled by the sequential design of interventions in causal bandits (CBs). In the existing literature on CBs, a critical assumption is that the causal models remain constant over time. However, this assumption does not necessarily hold in complex systems, which constantly undergo temporal model fluctuations. This paper addresses the robustness of CBs to such model fluctuations. The focus is on causal systems with linear structural equation models (SEMs). The SEMs and the time-varying pre- and post-interventional statistical models are all unknown. Cumulative regret is adopted as the design criteria, based on which the objective is to design a sequence of interventions that incur the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations. 
First, it is established that the existing approaches fail to maintain regret sub-linearity with even a few instances of model deviation. Specifically, when the number of instances with model deviation is as few as $T^\frac{1}{2L}$, where $T$ is the time horizon and $L$ is the longest causal path in the graph, the existing algorithms will have linear regret in $T$. Next, a robust CB algorithm is designed, and its regret is analyzed, where upper and information-theoretic lower bounds on the regret are established. Specifically, in a graph with $N$ nodes and maximum degree $d$, under a general measure of model deviation $C$, the cumulative regret is upper bounded by $\tilde{\mathcal{O}}(d^{L-\frac{1}{2}}(\sqrt{NT} + NC))$ and lower bounded by $\Omega(d^{\frac{L}{2}-2}\max\{\sqrt{T},d^2C\})$. Comparing these bounds establishes that the proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$ and maintains sub-linear regret for a broader range of $C$.	翻訳日:2023-11-01 18:42:59 公開日:2023-10-30
# 勾配流をもつガウス型マルチインデックスモデルの学習について On Learning Gaussian Multi-index Models with Gradient Flow ( http://arxiv.org/abs/2310.19793v1 ) ライセンス: Link先を確認	Alberto Bietti, Joan Bruna and Loucas Pillaud-Vivien	(参考訳) 高次元ガウスデータに対するマルチインデックス回帰問題における勾配流れについて検討する。マルチインデックス関数は、未知の低ランク線形射影と任意の未知の低次元リンク関数からなる。そのため、ニューラルネットワークにおける特徴学習の自然なテンプレートを構成する。低階射影をパラメトリする部分空間よりも、非パラメトリックモデルで低次元リンク関数を無限に高速に学習する2時間スケールのアルゴリズムを考える。部分空間相関行列上で生じる行列半群構造を適切に活用することにより、結果として生じるグラスマン人口勾配流れのグローバル収束を確立し、関連する「サドル・ツー・サドル」ダイナミクスの定量的記述を提供する。特に、各サドルに関連する時間スケールは、ターゲットリンク関数の適切なエルミート分解の観点から明確に特徴づけることができる。これらのポジティブな結果とは対照的に、リンク関数が知られ固定されている場合の関連する \emph{planted} 問題は、実際には勾配流れのダイナミクスが高い確率で捕捉されるような大まかな最適化のランドスケープを持っていることも示している。 We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection. By appropriately exploiting the matrix semigroup structure arising over the subspace correlation matrices, we establish global convergence of the resulting Grassmannian population gradient flow dynamics, and provide a quantitative description of its associated `saddle-to-saddle' dynamics. Notably, the timescales associated with each saddle can be explicitly characterized in terms of an appropriate Hermite decomposition of the target link function. In contrast with these positive results, we also show that the related \emph{planted} problem, where the link function is known and fixed, in fact has a rough optimization landscape, in which gradient flow dynamics might get trapped with high probability.	翻訳日:2023-11-01 18:42:14 公開日:2023-10-30
# Eval4NLP 2023 大規模言語モデルを説明可能な指標として示す作業 The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics ( http://arxiv.org/abs/2310.19792v1 ) ライセンス: Link先を確認	Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger	(参考訳) パラメータの数の増加と事前学習データにより、生成型大規模言語モデル(LLM)は、タスクに関連する最小あるいは全くの例でタスクを解く顕著な能力を示した。特に、LLMはテキスト生成タスクにおいて評価指標としてうまく採用されている。本研究では,機械翻訳(MT)と要約評価のためのプロンプトとスコア抽出を参加者に求めるEval4NLP 2023共有タスクを提案する。具体的には,許可されたllmのリストを選択し,プロンプトに焦点を合わせるために微調整を禁止する,新たなコンペティション設定を提案する。参加者のアプローチの概要を述べるとともに,MTの3つの言語対と要約データセットにまたがる新しい参照なしテストセットについて評価する。特に、タスクの制限にもかかわらず、最高のパフォーマンスのシステムは、GEMBAやComet-Kiwi-XXLといった大規模モデルで開発された最近の参照なしメトリクスと同等かそれ以上の結果を得る。最後に,個別のトラックとして,llmによる説明の可能性について,小規模の人間による評価を行う。 With an increasing number of parameters and pre-training data, generative large language models (LLMs) have shown remarkable capabilities to solve tasks with minimal or no task-related examples. Notably, LLMs have been successfully employed as evaluation metrics in text generation tasks. Within this context, we introduce the Eval4NLP 2023 shared task that asks participants to explore prompting and score extraction for machine translation (MT) and summarization evaluation. Specifically, we propose a novel competition setting in which we select a list of allowed LLMs and disallow fine-tuning to ensure a focus on prompting. We present an overview of participants' approaches and evaluate them on a new reference-free test set spanning three language pairs for MT and a summarization dataset. Notably, despite the task's restrictions, the best-performing systems achieve results on par with or even surpassing recent reference-free metrics developed using larger models, including GEMBA and Comet-Kiwi-XXL. Finally, as a separate track, we perform a small-scale human evaluation of the plausibility of explanations given by the LLMs.	翻訳日:2023-11-01 18:41:56 公開日:2023-10-30
# lilo: 圧縮と文書化による解釈可能なライブラリの学習 LILO: Learning Interpretable Libraries by Compressing and Documenting Code ( http://arxiv.org/abs/2310.19791v1 ) ライセンス: Link先を確認	Gabriel Grand, Lionel Wong, Matthew Bowers, Theo X. Olausson, Muxin Liu, Joshua B. Tenenbaum, Jacob Andreas	(参考訳) 大規模言語モデル(LLM)はコード生成に優れていますが、ソフトウェア開発の重要な側面はリファクタリングのテクニックです。本稿では,特定の問題領域に合わせたライブラリを構築するために,反復的に合成,圧縮,文書化を行う神経シンボリックフレームワークであるliloを紹介する。 LILOは、LLM誘導プログラム合成と、Stitchからの自動リファクタリングにおける最近のアルゴリズム的な進歩を組み合わせたものだ。これらの抽象化を解釈するために、文脈的使用例に基づいて自然言語名や文書を推論するAuto-Doc(Auto-Docmentation)手順を導入する。人間の可読性の改善に加えて、AutoDocはLILOのシンセサイザーが学習した抽象化を解釈し、デプロイするのを手助けすることで、パフォーマンスを向上させる。文字列編集,シーン推論,グラフィック合成の3つの帰納的プログラム合成ベンチマークでLILOを評価する。最先端のライブラリ学習アルゴリズムDreamCoderを含む既存のニューラルおよびシンボリックメソッドと比較して、LILOはより複雑なタスクを解決し、言語知識に根ざしたリッチなライブラリを学ぶ。 While large language models (LLMs) now excel at code generation, a key aspect of software development is the art of refactoring: consolidating code into libraries of reusable and readable programs. In this paper, we introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch: a symbolic compression system that efficiently identifies optimal lambda abstractions across large code corpora. To make these abstractions interpretable, we introduce an auto-documentation (AutoDoc) procedure that infers natural language names and docstrings based on contextual examples of usage. In addition to improving human readability, we find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions. We evaluate LILO on three inductive program synthesis benchmarks for string editing, scene reasoning, and graphics composition. 
Compared to existing neural and symbolic methods - including the state-of-the-art library learning algorithm DreamCoder - LILO solves more complex tasks and learns richer libraries that are grounded in linguistic knowledge.	翻訳日:2023-11-01 18:41:37 公開日:2023-10-30
# DiffEnc:学習エンコーダを用いた変分拡散 DiffEnc: Variational Diffusion with a Learned Encoder ( http://arxiv.org/abs/2310.19789v1 ) ライセンス: Link先を確認	Beatrix M. G. Nielsen, Anders Christensen, Andrea Dittadi, Ole Winther	(参考訳) 拡散モデルは階層的変分オートエンコーダ(vaes)と見なすことができる: 生成過程における条件分布のパラメータ共有と階層上の独立項としての損失の効率的な計算である。モデルに柔軟性を加えながらこれらの利点を維持する拡散モデルに対する2つの変更を検討する。まず,拡散過程におけるデータと深さに依存した平均関数を導入することにより,拡散損失が変化する。提案するフレームワークであるDiffEncは,CIFAR-10における最先端の可能性を実現する。次に、逆エンコーダ法と生成過程のノイズ分散の比を1に固定されるのではなく、自由ウェイトパラメータとする。有限深度階層に対して、エビデンスローバウンド(ELBO)は、重み付け拡散損失アプローチの目的として、および推論に特化してノイズスケジュールを最適化するために使用することができる。一方、無限深さ階層では、重みパラメータは 1 で十分定義された ELBO を持つ必要がある。 Diffusion models may be viewed as hierarchical variational autoencoders (VAEs) with two improvements: parameter sharing for the conditional distributions in the generative process and efficient computation of the loss as independent terms over the hierarchy. We consider two changes to the diffusion model that retain these advantages while adding flexibility to the model. Firstly, we introduce a data- and depth-dependent mean function in the diffusion process, which leads to a modified diffusion loss. Our proposed framework, DiffEnc, achieves state-of-the-art likelihood on CIFAR-10. Secondly, we let the ratio of the noise variance of the reverse encoder process and the generative process be a free weight parameter rather than being fixed to 1. This leads to theoretical insights: For a finite depth hierarchy, the evidence lower bound (ELBO) can be used as an objective for a weighted diffusion loss approach and for optimizing the noise schedule specifically for inference. For the infinite-depth hierarchy, on the other hand, the weight parameter has to be 1 to have a well-defined ELBO.	翻訳日:2023-11-01 18:41:14 公開日:2023-10-30
# 予算を固定した局所最適最良腕識別法 Locally Optimal Best Arm Identification with a Fixed Budget ( http://arxiv.org/abs/2310.19788v1 ) ライセンス: Link先を確認	Masahiro Kato	(参考訳) 本研究は, 最良治療アーム, 期待結果の高い治療アームの同定に関する課題について検討する。最良治療アームの同定と誤認の確率の低下を目標とし,様々な研究分野において,\emph{best arm identification} (BAI) や順序最適化など様々な名称で検討してきた。実験では,治療アロケーションラウンドの数を固定した。各ラウンドにおいて、意思決定者は、処理アームを実験ユニットに割り当て、対応する結果を観察し、処理アーム間のばらつきが異なるガウス分布に従う。実験の最後には、観察に基づいて最適な治療アームの見積もりとして、治療アームの1つを推奨する。意思決定者の目標は、最良の治療アームを誤認する可能性を最小化する実験を設計することである。この目的を念頭に、我々は、最善と準最適治療腕の期待結果のギャップがゼロに近づく小ギャップ体制の下で、誤同定の確率の低い境界を開発する。そして、この分散が知られていると仮定して、我々は、Neyman (1934) が提案した Neyman 割り当ての拡張である Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) 戦略と Bubeck et al. (2011) が提案した Uniform-EBA 戦略を設計する。 GNA-EBA戦略は, サンプルサイズが小ギャップ体制下で無限に近づくにつれて, 誤同定の確率が下界と一致するため, 漸近的に最適であることを示す。局所的な漸近的最適戦略は、その性能が小ギャップ体制によって特徴づけられる制限された状況の中で下界と一致しているためである。 This study investigates the problem of identifying the best treatment arm, a treatment arm with the highest expected outcome. We aim to identify the best treatment arm with a lower probability of misidentification, which has been explored under various names across numerous research fields, including \emph{best arm identification} (BAI) and ordinal optimization. In our experiments, the number of treatment-allocation rounds is fixed. In each round, a decision-maker allocates a treatment arm to an experimental unit and observes a corresponding outcome, which follows a Gaussian distribution with a variance different among treatment arms. At the end of the experiment, we recommend one of the treatment arms as an estimate of the best treatment arm based on the observations. The objective of the decision-maker is to design an experiment that minimizes the probability of misidentifying the best treatment arm. With this objective in mind, we develop lower bounds for the probability of misidentification under the small-gap regime, where the gaps of the expected outcomes between the best and suboptimal treatment arms approach zero. 
Then, assuming that the variances are known, we design the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy, which is an extension of the Neyman allocation proposed by Neyman (1934) and the Uniform-EBA strategy proposed by Bubeck et al. (2011). For the GNA-EBA strategy, we show that the strategy is asymptotically optimal because its probability of misidentification aligns with the lower bounds as the sample size approaches infinity under the small-gap regime. We refer to such optimal strategies as locally asymptotic optimal because their performance aligns with the lower bounds within restricted situations characterized by the small-gap regime.	翻訳日:2023-11-01 18:40:53 公開日:2023-10-30
# ビジョン言語モデルで"アップ"とは何か? 空間的推論における苦戦の調査 What's "up" with vision-language models? Investigating their struggle with spatial reasoning ( http://arxiv.org/abs/2310.19785v1 ) ライセンス: Link先を確認	Amita Kamath, Jack Hessel, Kai-Wei Chang	(参考訳) 最近の視覚言語(VL)モデルは強力だが、「右」と「左」を確実に区別できるだろうか? このような基本的な空間関係のモデル理解を定量化するために、3つの新しいコーパスをキュレートする。これらのテストは、VQAv2のような既存のデータセットよりも正確に空間的推論を分離します。例えば、私たちのWhat'sUpベンチマークには、オブジェクトの空間的関係だけを変化させ、そのアイデンティティを固定し続ける一連の写真が含まれています(図1:モデルは、テーブルの下の犬の通常のケースだけでなく、同じテーブルの上にある同じ犬も理解する必要があります)。18のVLモデルを評価したところ、いずれも性能が低いことがわかった。例えば、VQAv2で微調整され、VQAv2では人間と同等に近いBLIPは、我々のベンチマークでは56%の精度にとどまり、人間の99%を大きく下回る。私たちはこの驚くべき行動の原因を研究することで結論付けます。 1) LAION-2Bのような一般的な視覚言語事前学習コーパスには、空間関係を学習するための信頼できるデータがほとんど含まれていない。 2) 前置詞を含むインスタンスのアップウェイトや我々のコーパスでの微調整のような基本的なモデリング介入は、ベンチマークがもたらす課題に対処するには不十分である。これらのコーパスがさらなる研究を促進することを期待しており、データとコードを https://github.com/amitakamath/whatsup_vlms で公開しています。 Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"? We curate three new corpora to quantify model comprehension of such basic spatial relations. These tests isolate spatial reasoning more precisely than existing datasets like VQAv2, e.g., our What'sUp benchmark contains sets of photographs varying only the spatial relations of objects, keeping their identity fixed (see Figure 1: models must comprehend not only the usual case of a dog under a table, but also, the same dog on top of the same table). We evaluate 18 VL models, finding that all perform poorly, e.g., BLIP finetuned on VQAv2, which nears human parity on VQAv2, achieves 56% accuracy on our benchmarks vs. humans at 99%. We conclude by studying causes of this surprising behavior, finding: 1) that popular vision-language pretraining corpora like LAION-2B contain little reliable data for learning spatial relationships; and 2) that basic modeling interventions like up-weighting preposition-containing instances or fine-tuning on our corpora are not sufficient to address the challenges our benchmarks pose. 
We are hopeful that these corpora will facilitate further research, and we release our data and code at https://github.com/amitakamath/whatsup_vlms.	翻訳日:2023-11-01 18:39:33 公開日:2023-10-30
# CustomNet: テキスト・画像拡散モデルにおける可変視点によるゼロショットオブジェクトのカスタマイズ CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2310.19784v1 ) ライセンス: Link先を確認	Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan	(参考訳) 画像生成にカスタマイズされたオブジェクトを組み込むことは、テキスト・画像生成において魅力的な特徴である。しかし、既存の最適化ベースおよびエンコーダベースの方法は、時間消費最適化、不十分なアイデンティティ保存、一般的なコピーペースト効果などの欠点によって妨げられている。これらの制限を克服するために、私たちは、オブジェクトのカスタマイズプロセスに3Dの新しいビュー合成機能を明示的に組み込んだ新しいオブジェクトカスタマイズアプローチであるCustomNetを紹介します。この統合により、空間的位置関係と視点の調整が容易になり、オブジェクトのアイデンティティを効果的に保存しながら多様な出力が得られる。さらに,既存の3次元画像合成手法の限界を克服し,テキスト記述やユーザ定義画像による位置制御やフレキシブルな背景制御を実現するための繊細な設計を提案する。さらに私たちは、現実世界のオブジェクトや複雑なバックグラウンドをよりうまく処理できるデータセット構築パイプラインを活用します。これらの設計を取り入れた本手法は,テスト時間最適化なしでゼロショットオブジェクトのカスタマイズを容易にし,視点,位置,背景を同時制御する。その結果、CustomNetはアイデンティティ保護の強化を保証し、多様な調和した出力を生成する。 Incorporating a customized object into image generation presents an attractive feature in text-to-image generation. However, existing optimization-based and encoder-based methods are hindered by drawbacks such as time-consuming optimization, insufficient identity preservation, and a prevalent copy-pasting effect. To overcome these limitations, we introduce CustomNet, a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process. This integration facilitates the adjustment of spatial position relationships and viewpoints, yielding diverse outputs while effectively preserving object identity. Moreover, we introduce delicate designs to enable location control and flexible background control through textual descriptions or specific user-defined images, overcoming the limitations of existing 3D novel view synthesis methods. We further leverage a dataset construction pipeline that can better handle real-world objects and complex backgrounds. 
Equipped with these designs, our method facilitates zero-shot object customization without test-time optimization, offering simultaneous control over the viewpoints, location, and background. As a result, our CustomNet ensures enhanced identity preservation and generates diverse, harmonious outputs.	翻訳日:2023-11-01 18:39:06 公開日:2023-10-30
# ジェネリック回路アーキテクチャにおける近似t設計 Approximate t-designs in generic circuit architectures ( http://arxiv.org/abs/2310.19783v1 ) ライセンス: Link先を確認	Daniel Belkin, James Allen, Soumik Ghosh, Christopher Kang, Sophia Lin, James Sud, Fred Chong, Bill Fefferman, and Bryan K. Clark	(参考訳) ユニタリ t-デザインは、最初の t モーメントが最大ランダムに見えるユニタリ群上の分布である。以前の研究は、特定のランダム量子回路アンサンブルがt設計を近似する深さのいくつかの上界を確立した。ここで、これらの境界はハールランダム二箇所ゲートの任意の固定アーキテクチャに拡張可能であることを示す。これは、そのようなアーキテクチャのスペクトルギャップと1Dブリックワークアーキテクチャのギャップを関連付けることで達成される。我々の境界は、回路のブロックがサイト上に接続されたグラフを形成するのに必要な典型的な層数のみを通して、アーキテクチャの詳細に依存する。この量が幅に依存しない場合、回路は線形深さで近似t設計を形成する。また、固定アーキテクチャ上の対応する分布の性質の観点から、非決定論的アーキテクチャに暗黙的な境界を与える。 Unitary t-designs are distributions on the unitary group whose first t moments appear maximally random. Previous work has established several upper bounds on the depths at which certain specific random quantum circuit ensembles approximate t-designs. Here we show that these bounds can be extended to any fixed architecture of Haar-random two-site gates. This is accomplished by relating the spectral gaps of such architectures to those of 1D brickwork architectures. Our bound depends on the details of the architecture only via the typical number of layers needed for a block of the circuit to form a connected graph over the sites. When this quantity is independent of width, the circuit forms an approximate t-design in linear depth. We also give an implicit bound for nondeterministic architectures in terms of properties of the corresponding distribution over fixed architectures.	翻訳日:2023-11-01 18:38:46 公開日:2023-10-30
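For reference, the standard definition behind the abstract above can be written out as follows. This is a sketch in our own notation, not taken from the paper: $\nu$ denotes the circuit ensemble, $\mu_H$ the Haar measure on $U(d)$, and $\varepsilon$ an error tolerance. The norm used for the approximate case differs between papers and is left unspecified here.

```latex
% Exact unitary t-design: nu reproduces the first t moments of the Haar measure.
\[
  \mathbb{E}_{U \sim \nu}\!\left[ U^{\otimes t} A \,(U^{\dagger})^{\otimes t} \right]
  = \int_{U(d)} U^{\otimes t} A \,(U^{\dagger})^{\otimes t} \, d\mu_H(U)
  \quad \text{for all operators } A .
\]
% Approximate t-design: the two t-th moment maps differ by at most epsilon in a chosen norm.
\[
  \left\| \Phi^{(t)}_{\nu} - \Phi^{(t)}_{\mu_H} \right\| \le \varepsilon ,
  \qquad
  \Phi^{(t)}_{\nu}(A) := \mathbb{E}_{U \sim \nu}\!\left[ U^{\otimes t} A \,(U^{\dagger})^{\otimes t} \right].
\]
```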

# Rustの安全でないメモリアクセスを識別するための高速な概要ベース全プログラム解析

Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust ( http://arxiv.org/abs/2310.10298v2 )

ライセンス: Link先を確認

Jie Zhou, Mingshen Sun, John Criswell

(参考訳) Rustは、40年以上にわたって低レベルのソフトウェアを悩ませてきたメモリ安全性の問題を根本的に解決する、最も有望なシステムプログラミング言語の1つである。しかし、Rustの型ルールが特定のシステムプログラミングに対して制限的すぎるシナリオや、プログラマがセキュリティチェックよりもパフォーマンスを選択するシナリオに対応するため、Rustは安全でないソースコードを書いたり、安全でないライブラリを呼び出したりできるセキュリティ上の回避ハッチを開放している。その結果、安全でないRustコードや直接リンクされた安全でない外部ライブラリは、それ自体がメモリ安全違反を引き起こすだけでなく、安全なRustと同じモノリシックなアドレス空間で実行されるため、プログラム全体を侵害する可能性がある。この問題は、安全でないメモリオブジェクト(安全でないコードによってアクセスされるもの)を分離し、安全でないメモリへのアクセスをサンドボックス化することで緩和できる。先行研究の1つのカテゴリでは、LLVM IR上の既存のプログラム解析フレームワークを使用して、安全でないメモリオブジェクトとアクセスを識別している。しかし、これらは長い解析時間と低い精度という限界に悩まされている。本稿では,RustのMIR上での要約に基づくプログラム全体の解析を用いて,これら2つの課題に対処する。要約に基づく解析は、解析時間を節約するために必要に応じて情報を計算する。RustのMIR上で解析を行うことで、LLVM IRでは利用できないRust固有のリッチな高レベル型情報を活用できる。本稿は、現在進行中の研究の予備的な報告である。我々は、安全でないヒープ割り当てと、それらの安全でないヒープオブジェクトへのメモリアクセスの両方を識別するためのプログラム全体の解析をプロトタイプ化した。本稿では,解析のオーバーヘッドと有効性について報告する。

Rust is one of the most promising systems programming languages to fundamentally solve the memory safety issues that have plagued low-level software for over forty years. However, to accommodate scenarios where Rust's type rules might be too restrictive for certain systems programming and where programmers opt for performance over security checks, Rust opens security escape hatches that allow writing unsafe source code or calling unsafe libraries. Consequently, unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program, as they run in the same monolithic address space as the safe Rust code. This problem can be mitigated by isolating unsafe memory objects (those accessed by unsafe code) and sandboxing memory accesses to the unsafe memory. One category of prior work utilizes existing program analysis frameworks on LLVM IR to identify unsafe memory objects and accesses. However, these approaches suffer from prolonged analysis time and low precision. In this paper, we tackle these two challenges using summary-based whole-program analysis on Rust's MIR. The summary-based analysis computes information on demand so as to save analysis time. Performing analysis on Rust's MIR exploits the rich high-level type information inherent to Rust, which is unavailable in LLVM IR. This manuscript is a preliminary study of ongoing research. We have prototyped a whole-program analysis for identifying both unsafe heap allocations and memory accesses to those unsafe heap objects. We report the overhead and the efficacy of the analysis in this paper.

翻訳日:2024-03-19 02:23:27 公開日:2023-10-30

# 米国マイクロエレクトロニクスパッケージング生態系 : 課題と機会

US Microelectronics Packaging Ecosystem: Challenges and Opportunities ( http://arxiv.org/abs/2310.11651v3 )

ライセンス: Link先を確認

Rouhan Noor, Himanandhan Reddy Kottur, Patrick J Craig, Liton Kumar Biswas, M Shafkat M Khan, Nitin Varshney, Hamed Dalir, Elif Akçalı, Bahareh Ghane Motlagh, Charles Woychik, Yong-Kyu Yoon, Navid Asadizanjani

(参考訳) 半導体産業は、デバイスの縮小とコスト削減という従来の方法から大きく変化している。チップデザイナーは、シリコンフットプリントにより多くの機能を追加しながら、コスト効率を高める新しい技術ソリューションを積極的に求めている。 Heterogeneous Integration (HI) は、最も適切なプロセス技術を用いて、独立して設計された、製造されたコンポーネントを統合する高度なパッケージング技術である。しかし、HIの採用には設計とセキュリティの課題が伴う。 HIを有効にするためには、先進的な包装の研究開発が不可欠である。既存の研究は、アウトソース半導体アセンブリーおよびテスト(OSAT)施設やベンダーのほとんどがオフショアにあるため、先進的な包装サプライチェーンにおけるセキュリティ上の脅威を提起している。半導体の需要の増加に対処し、セキュアな半導体サプライチェーンを確保するために、米国政府による半導体製造設備のオンショア化に向けた大規模な取り組みがある。しかし、セキュアで効率的でレジリエントな半導体サプライチェーンを確立するというビジョンを完全に実現するために、米国の先進的なパッケージング能力も強化されなければならない。当社の取り組みは、米国に本拠を置く先進的なパッケージサプライチェーンにおけるボトルネックと弱いリンクを特定することを目的としていた。

The semiconductor industry is experiencing a significant shift from traditional methods of shrinking devices and reducing costs. Chip designers actively seek new technological solutions to enhance cost-effectiveness while incorporating more features into the silicon footprint. One promising approach is Heterogeneous Integration (HI), which involves advanced packaging techniques to integrate independently designed and manufactured components using the most suitable process technology. However, adopting HI introduces design and security challenges. To enable HI, research and development of advanced packaging is crucial. Existing research raises possible security threats in the advanced packaging supply chain, as most Outsourced Semiconductor Assembly and Test (OSAT) facilities and vendors are offshore. To deal with the increasing demand for semiconductors and to ensure a secure semiconductor supply chain, the United States (US) government is making sizable efforts to bring semiconductor fabrication facilities onshore. However, US-based advanced packaging capabilities must also be ramped up to fully realize the vision of establishing a secure, efficient, and resilient semiconductor supply chain. Our effort aims to identify the possible bottlenecks and weak links in the advanced packaging supply chain based in the US.

翻訳日:2024-03-19 02:13:39 公開日:2023-10-30

# モーフィング画像検出のための眉領域の解析

Analyzing eyebrow region for morphed image detection ( http://arxiv.org/abs/2310.19290v1 )

ライセンス: Link先を確認

Abdullah Zafar, Christoph Busch

(参考訳) 国際民間航空機関(ICAO)によると、パスポートの顔画像は旅行者の確認のための主要な識別子として指定されている。したがって、eMRTD(Electronic Machine-Readable Travel Document)に格納されている顔画像の正当性を確認することが重要である。自動境界制御(ABC)システムの導入により,eMRTDに格納された画像が正常な顔認証システムの動作を妨げたり悪用したりするような変化を防止できるようなシステムを実現することがさらに重要である。このようなシステムに対する攻撃の1つは、顔変形攻撃である。モーフィング画像を検出する技術は数多く存在するが、モーフィングアルゴリズムもこれらの検出を避けるために改善されている。そこで本研究では,形態的画像検出のためのアイブロウ領域を解析する。提案手法は,眼窩領域の周波数を解析することに基づく。この手法は2つのデータセットで評価され,それぞれが2つのアルゴリズムを用いて生成した形態素画像から成っている。提案手法は画像検出に有用なツールであり,画像の信頼性が重要となる様々なアプリケーションに適用可能であることが示唆された。

Facial images in passports are designated as primary identifiers for the verification of travelers according to the International Civil Aviation Organization (ICAO). Hence, it is important to ascertain the sanctity of the facial images stored in the electronic Machine-Readable Travel Document (eMRTD). With the introduction of automated border control (ABC) systems that rely on face recognition for the verification of travelers, it is even more crucial to have a system to ensure that the image stored in the eMRTD is free from any alteration that can hinder or abuse the normal working of a facial recognition system. One such attack against these systems is the face-morphing attack. Even though many techniques exist to detect morphed images, morphing algorithms are also improving to evade these detections. In this work, we analyze the eyebrow region for morphed image detection. The proposed method is based on analyzing the frequency content of the eyebrow region. The method was evaluated on two datasets that each consisted of morphed images created using two algorithms. The findings suggest that the proposed method can serve as a valuable tool in morphed image detection, and can be used in various applications where image authenticity is critical.

翻訳日:2024-03-18 23:51:32 公開日:2023-10-30

# オフチェーン計算を用いたブロックチェーンベースのアイデンティティ管理へのゼロ知識簡潔非対話型知識論証(zk-SNARK)の組み込み

Incorporating Zero-Knowledge Succinct Non-interactive Argument of Knowledge for Blockchain-based Identity Management with off-chain computations ( http://arxiv.org/abs/2310.19452v1 )

ライセンス: Link先を確認

Pranay Kothari, Deepak Chopra, Manjot Singh, Shivam Bhardwaj, Rudresh Dwivedi

(参考訳) 今日の世界では、安全で効率的な生体認証が極めて重要である。従来の認証手法は、サイバー攻撃を受けやすいため、もはや信頼性が低いとみなされている。バイオメトリック認証、特に指紋認証は有望な代替手段として登場したが、生体認証データの保存と使用に対する懸念や、サイバー攻撃に脆弱な集中型ストレージへの懸念が高まっている。本稿では,zk-SNARKを組み込んだブロックチェーンベースの指紋認証システムを提案する。 FVC2002、FVC2004、FVC2006データセットに対するKNNベースのアプローチは、惑星間ファイルシステムを使用して格納される安全で高速で堅牢な生体認証と認証のためのキャンセル可能なテンプレートを生成するために使用される。提案手法は指紋認証のためにそれぞれFVC2002、FVC2004、FVC2006データセットに対して99.01%、98.97%、98.52%の平均精度を提供する。 zk-SNARKの導入は、より小さな証明サイズを促進する。全体として、提案手法はブロックチェーンベースのID管理のためのセキュアで効率的なソリューションを提供する可能性がある。

In today's world, secure and efficient biometric authentication is of keen importance. Traditional authentication methods are no longer considered reliable due to their susceptibility to cyber-attacks. Biometric authentication, particularly fingerprint authentication, has emerged as a promising alternative, but it raises concerns about the storage and use of biometric data, as well as centralized storage, which could make it vulnerable to cyber-attacks. In this paper, a novel blockchain-based fingerprint authentication system is proposed that integrates zk-SNARKs, which are zero-knowledge proofs that enable secure and efficient authentication without revealing sensitive biometric information. A KNN-based approach on the FVC2002, FVC2004 and FVC2006 datasets is used to generate a cancelable template for secure, faster, and robust biometric registration and authentication which is stored using the Interplanetary File System. The proposed approach provides an average accuracy of 99.01%, 98.97% and 98.52% over the FVC2002, FVC2004 and FVC2006 datasets respectively for fingerprint authentication. Incorporation of zk-SNARK facilitates smaller proof size. Overall, the proposed method has the potential to provide a secure and efficient solution for blockchain-based identity management.

翻訳日:2024-03-18 23:51:32 公開日:2023-10-30

# Iris: 構造化ピアツーピアネットワークにおける動的プライバシ保護検索

Iris: Dynamic Privacy Preserving Search in Structured Peer-to-Peer Networks ( http://arxiv.org/abs/2310.19634v1 )

ライセンス: Link先を確認

Angeliki Aktypi, Kasper Rasmussen

(参考訳) Chordのような構造化ピアツーピアネットワークでは、ユーザーは検索した情報をネットワークから他のノードに尋ねることで、探している情報を取得することができる。検索対象を他のノードに展開することで、クエリのプライバシを必要とするアプリケーション、すなわちルーティングに参加する中間ノードからクエリのターゲットを隠すアプリケーションには、構造化されたピアツーピアネットワークが適さない。本稿では,構造化P2Pネットワーク,特にChordプロトコルのクエリプライバシについて検討する。当初私たちは、$k$-anonymityなどのすでに提案されているプライバシー概念が、強い敵の存在下でのChordにおけるクエリのプライバシー保証を説明できないことを観察しました。したがって、攻撃者の背景知識に関する最悪のシナリオを考慮しても、プライバシ保証を評価することができる、$(\alpha,\delta)$-privacyと呼ぶ新しいプライバシの概念を導入する。次に、リクエストがChord内のクエリのターゲットをルーティングに参加する中間ノードから隠せるアルゴリズムであるIrisを設計する。 Irisは、それぞれのアドレスに到達できるように、ターゲットアドレス以外の要求者クエリを持つことで、要求者がターゲットアドレスに近づくことができる。提案アルゴリズムのセキュリティ解析は,提案するプライバシー概念に基づいて行う。また,このアルゴリズムのプロトタイプをMatlabで開発し,その性能評価を行った。我々の分析では、Irisが$(\alpha,\delta)$-privateであることが証明されている。

In structured peer-to-peer networks like Chord, users retrieve the information they seek by querying other nodes in the network. Revealing the search target to other nodes makes structured peer-to-peer networks unsuitable for applications that demand query privacy, i.e., hiding the query's target from the intermediate nodes that take part in the routing. This paper studies the query privacy of structured P2P networks, particularly the Chord protocol. We initially observe that previously proposed privacy notions, such as $k$-anonymity, do not allow us to reason about the privacy guarantees of a query in Chord in the presence of a strong adversary. Thus, we introduce a new privacy notion that we call $(\alpha,\delta)$-privacy, which allows us to evaluate the privacy guarantees even when considering the worst-case scenario regarding an attacker's background knowledge. We then design Iris, an algorithm that allows a requester to conceal the target of a query in Chord from the intermediate nodes that take part in the routing. Iris achieves this by having the requester query for addresses other than the target, chosen so that reaching each of them brings the requester closer to the target address. We perform a security analysis of the proposed algorithm, based on the privacy notion we introduce. We also develop a prototype of the algorithm in Matlab and evaluate its performance. Our analysis proves Iris to be $(\alpha,\delta)$-private while introducing a modest performance overhead.

翻訳日:2024-03-18 23:51:32 公開日:2023-10-30
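To make the privacy problem in the abstract above concrete, the following is a minimal sketch of a plain Chord lookup (not Iris, whose details the abstract does not give). It assumes a toy 6-bit identifier ring with global knowledge of all node IDs. The function names (`successor`, `finger_table`, `lookup`) and the node set are illustrative choices, not taken from the paper. The point is that every node on the routing path is handed the target key.

```python
from bisect import bisect_left

M = 6            # identifier bits (hypothetical small ring)
RING = 2 ** M    # identifiers live in [0, 64)


def successor(node_ids, key):
    """First node at or clockwise after `key`; `node_ids` must be sorted."""
    i = bisect_left(node_ids, key % RING)
    return node_ids[i % len(node_ids)]


def in_interval(x, a, b, closed_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if closed_right."""
    if closed_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b  # interval wraps past zero (or spans the ring if a == b)


def finger_table(node_ids, n):
    """finger[i] = successor(n + 2^i) for i = 0 .. M-1."""
    return [successor(node_ids, n + 2 ** i) for i in range(M)]


def lookup(node_ids, start, key):
    """Route `key` from `start`; return (nodes that saw the key, responsible node)."""
    seen_by = [start]
    current = start
    while True:
        succ = successor(node_ids, current + 1)
        if in_interval(key, current, succ, closed_right=True):
            return seen_by, succ
        # Forward the query to the closest known finger that precedes the key.
        nxt = current
        for f in reversed(finger_table(node_ids, current)):
            if in_interval(f, current, key):
                nxt = f
                break
        if nxt == current:  # defensive fallback: no closer finger is known
            return seen_by, succ
        current = nxt
        seen_by.append(current)


nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
path, owner = lookup(nodes, start=8, key=54)
print(path, owner)  # prints [8, 42, 51] 56
```

In this run, nodes 42 and 51 both learn that node 8 is looking for key 54, which is exactly the leakage that query-privacy schemes for Chord try to avoid.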

# ブロックチェーンとNFTを用いた学生証書共有システム

Student Certificate Sharing System Using Blockchain and NFTs ( http://arxiv.org/abs/2310.20036v1 )

ライセンス: Link先を確認

Prakhyat Khati, Ajay Kumar Shrestha, Julita Vassileva

(参考訳) 本稿では,ブロックチェーンに基づく証明書共有システムを提案する。私たちの戦略は、ブロックチェーンアドレスを使用して機関や雇用者と共有可能なブロックチェーンベースのNFT認証を開発することです。学生は個々の機関が作成したデータに単一のプラットフォームでアクセスし、要求に応じて関連するコースのビューをフィルタリングし、証明書のメタデータをNFTとしてミントする。この方法は、アクセスのアカウンタビリティ、IPFSで永久に保持される包括的なレコード、証明書の作成、配布、アクセスのための検証可能な証明を提供する。また、証明書をより安全かつ効率的に共有することができる。データ証明を通じて信頼要因を組み込むことで,偽証明書や重複証明書などの問題に対する対策を行う。これは、手作業の長い従来の認証検証プロセスの課題に対処する。このシステムにより、学生は、デジタル署名による認証と機密性の確保と、不正アクセスに対するデータ保護のハッシュ化を図りながら、複数の機関の学術的資格を1カ所で管理し、検証することができる。全体として,提案システムは,証明書配布に対する新たなアプローチを提供しながら,データの安全性,説明可能性,機密性を保証する。

In this paper, we propose a certificate sharing system based on blockchain that gives students authority and control over their academic certificates. Our strategy involves developing blockchain-based NFT certifications that can be shared with institutions or employers using blockchain addresses. Students may access the data created by each individual institute in a single platform, filter the view of the relevant courses according to their requirements, and mint their certificate metadata as NFTs. This method provides accountability of access, comprehensive records that are permanently maintained in IPFS, and verifiable provenance for creating, distributing, and accessing certificates. It also makes it possible to share certificates more safely and efficiently. By incorporating trust factors through data provenance, our system provides a countermeasure against issues such as fake and duplicate certificates. It addresses the challenge of traditional certificate verification processes, which are lengthy and manual. With this system, students can manage and validate their academic credentials from multiple institutions in one location while ensuring authenticity and confidentiality using digital signatures and hashing for data protection against unauthorized access. Overall, our suggested system ensures data safety, accountability, and confidentiality while offering a novel approach to certificate distribution.

翻訳日:2024-03-18 23:51:32 公開日:2023-10-30

# 医療におけるデータマイニング情報の安全性と機密性の保護:文献レビュー

Preserving The Safety And Confidentiality Of Data Mining Information In Health Care: A literature review ( http://arxiv.org/abs/2312.00016v1 )

ライセンス: Link先を確認

Robinson Onyemechi Oturugbum

(参考訳) 毎日大量のデータが生成されるのは、物のインターネットが急速に発達し、今では医療産業に浸透しているからだ。データマイニングの最近の進歩は、プライバシー保護データマイニング(PPDM)と呼ばれる研究の新たな分野を生み出している。 PPDM技術やアプローチは、個人情報のプライバシーを守り、社会全体の利益を保ちながら、膨大な量のデータから実行可能な洞察を抽出することを可能にする。データ統合は、センシティブな患者情報の共有を必要とする。しかし、潜在的に機密性の高い情報の保存と送信に関して、かなりのプライバシー問題が提起されている。機密情報の開示は患者のプライバシーを侵害する。本稿では,プライバシ保護機構,データ保護規制,緩和戦略に関する関連研究のレビューを行う。レビューでは、他のどの戦略よりも優れた戦略はないと結論付けている。したがって、今後の研究は、大量の医療データの時代におけるプライバシソリューションの適切な技術と評価基準の標準化に焦点を当てるべきである。

Massive volumes of data are produced daily owing to the rapid development of the Internet of Things, which has now permeated the healthcare industry. Recent advances in data mining have spawned a new field of study dubbed privacy-preserving data mining (PPDM). PPDM techniques enable the extraction of actionable insights from enormous volumes of data while safeguarding the privacy of individual information and benefiting society as a whole. Medical research has taken a new course as a result of data mining with healthcare data to detect diseases earlier and improve patient care. Data integration necessitates the sharing of sensitive patient information. However, substantial privacy issues are raised in connection with the storage and transmission of potentially sensitive information. Disclosing sensitive information infringes on patients' privacy. This paper aims to conduct a review of related work on privacy-preserving mechanisms, data protection regulations, and mitigating tactics. The review concludes that no single strategy outperforms all others. Hence, future research should focus on adequate techniques for privacy solutions in the age of massive medical data and on the standardization of evaluation standards.

翻訳日:2024-03-18 13:35:06 公開日:2023-10-30

# 機械学習に基づくセグメンテーションにおける不確かさの定量化:MRIにおける左室容積推定のためのポストホックアプローチ

Uncertainty Quantification in Machine Learning Based Segmentation: A Post-Hoc Approach for Left Ventricle Volume Estimation in MRI ( http://arxiv.org/abs/2312.02167v1 )

ライセンス: Link先を確認

F. Terhag, P. Knechtges, A. Basermann, R. Tempone

(参考訳) 近年の研究では、心臓血管疾患が非感染性疾患の死亡率が最高であることが確認されている。左室(LV)容積推定は各種心血管疾患の診断・管理に重要であるが,磁気共鳴画像(MRI)におけるセグメンテーションアルゴリズムに係わる不確実性から重要な課題である。近年の機械学習の進歩、特にU-Netのような畳み込みネットワークは、医療画像の自動セグメンテーションを促進するが、特定の病理や異なるスキャナーベンダーやイメージングプロトコルで苦労している。本研究では,予測誤差のパスワイズ挙動をモデル化するために,伊藤確率微分方程式(SDE)を用いたLV容積予測におけるポストホック不確実性推定手法を提案する。このモデルは、心臓の長軸に沿って左室の面積を記述している。この方法は、基礎となるセグメンテーションアルゴリズムとは無関係であり、様々な既存および将来のセグメンテーション技術での使用を容易にする。提案手法は不確かさを定量化するメカニズムを提供し、医療専門家が信頼できない予測に介入できるようにする。これは、予測精度と信頼性が患者の結果に直接影響を及ぼす医療診断などの重要な応用において最も重要である。この手法はデータセットの変更にも堅牢であり、ラベル付きデータへのアクセスが制限された医療センターへの応用を可能にする。提案する不確実性推定手法は, 自動セグメンテーションの堅牢性と一般化性を高める可能性を示し, 臨床現場におけるより信頼性が高く正確なLV容積推定への道を開くとともに, バイオメディカル画像セグメンテーションにおける不確実性定量化のための新たな道を開くとともに, 今後の研究に有望な方向性を提供する。

Recent studies have confirmed that cardiovascular diseases remain responsible for the highest death toll amongst non-communicable diseases. Accurate left ventricular (LV) volume estimation is critical for valid diagnosis and management of various cardiovascular conditions, but poses a significant challenge due to inherent uncertainties associated with segmentation algorithms in magnetic resonance imaging (MRI). Recent machine learning advancements, particularly U-Net-like convolutional networks, have facilitated automated segmentation for medical images, but struggle under certain pathologies and/or different scanner vendors and imaging protocols. This study proposes a novel methodology for post-hoc uncertainty estimation in LV volume prediction using Itô stochastic differential equations (SDEs) to model path-wise behavior for the prediction error. The model describes the area of the left ventricle along the heart's long axis. The method is agnostic to the underlying segmentation algorithm, facilitating its use with various existing and future segmentation technologies. The proposed approach provides a mechanism for quantifying uncertainty, enabling medical professionals to intervene for unreliable predictions. This is of utmost importance in critical applications such as medical diagnosis, where prediction accuracy and reliability can directly impact patient outcomes. The method is also robust to dataset changes, enabling application for medical centers with limited access to labeled data. Our findings highlight the proposed uncertainty estimation methodology's potential to enhance automated segmentation robustness and generalizability, paving the way for more reliable and accurate LV volume estimation in clinical settings as well as opening new avenues for uncertainty quantification in biomedical image segmentation, providing promising directions for future research.

翻訳日:2024-01-15 15:12:14 公開日:2023-10-30

# 知識グラフのためのオープンドメイン知識抽出

Open Domain Knowledge Extraction for Knowledge Graphs ( http://arxiv.org/abs/2312.09424v1 )

ライセンス: Link先を確認

Kun Qian, Anton Belyi, Fei Wu, Samira Khorshidi, Azadeh Nikfarjam, Rahul Khot, Yisi Sang, Katherine Luna, Xianqi Chu, Eric Choi, Yash Govind, Chloe Seivwright, Yiwen Sun, Ahmed Fakhry, Theo Rekatsinas, Ihab Ilyas, Xiaoguang Qi, Yunyao Li

(参考訳) 知識グラフの品質は、下流アプリケーションの品質に直接影響する(例えば、グラフを使用した回答可能な質問の数など)。ナレッジグラフを構築する際の課題のひとつは、グラフのエンティティと事実の完全性と鮮度を保証することだ。本稿では,オープンWebから高品質なエンティティや事実を大規模にソースする,スケーラブルで拡張可能なフレームワークODKEを紹介する。ODKEは幅広い抽出モデルを使用し、異なるレイテンシでストリーミング処理とバッチ処理の両方をサポートする。私たちは、業界規模のオープンドメイン知識グラフを成長させるためにODKEの構築とデプロイで学んだ課題と設計上の決定を反映します。

The quality of a knowledge graph directly impacts the quality of downstream applications (e.g., the number of answerable questions using the graph). One ongoing challenge when building a knowledge graph is to ensure completeness and freshness of the graph's entities and facts. In this paper, we introduce ODKE, a scalable and extensible framework that sources high-quality entities and facts from the open web at scale. ODKE utilizes a wide range of extraction models and supports both streaming and batch processing at different latencies. We reflect on the challenges and design decisions made and share lessons learned when building and deploying ODKE to grow an industry-scale open domain knowledge graph.

翻訳日:2024-01-15 14:24:17 公開日:2023-10-30

# 文化アルゴリズム最適化によるKnapsackチャレンジへの取り組み

Addressing The Knapsack Challenge Through Cultural Algorithm Optimization ( http://arxiv.org/abs/2401.03324v1 )

ライセンス: Link先を確認

Mohammad Saleh Vahdatpour

(参考訳) 「0-1ナップサック問題」は古典的な組合せ最適化の問題であり、与えられた集合から項目のサブセットを選択する必要がある。各項目は固有の値と重みを持ち、主な目的は予め定義されたキャパシティ制約に固執しながら総価値を最大化する選択戦略を定式化することである。本稿では,0-1ナップサック問題の解法に特化して設計された新しい文化アルゴリズムについて紹介する。提案アルゴリズムは,集団を洗練するための信念空間を取り入れ,進化過程における交叉率と突然変異率を動的に調節する2つの重要な機能を導入する。大規模な実験を通じて、高次元と複雑な制約によって特徴づけられるナップサック問題においても、アルゴリズムが常にグローバルな最適解を探索する際の顕著な効率を示す。

The "0-1 knapsack problem" stands as a classical combinatorial optimization conundrum, necessitating the selection of a subset of items from a given set. Each item possesses inherent values and weights, and the primary objective is to formulate a selection strategy that maximizes the total value while adhering to a predefined capacity constraint. In this research paper, we introduce a novel variant of Cultural Algorithms tailored specifically for solving 0-1 knapsack problems, a well-known combinatorial optimization challenge. Our proposed algorithm incorporates a belief space to refine the population and introduces two vital functions for dynamically adjusting the crossover and mutation rates during the evolutionary process. Through extensive experimentation, we provide compelling evidence of the algorithm's remarkable efficiency in consistently locating the global optimum, even in knapsack problems characterized by high dimensions and intricate constraints.

翻訳日:2024-01-15 09:18:31 公開日:2023-10-30
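The problem statement in the abstract above (pick a subset of items that maximizes total value under a capacity limit) can be illustrated with a short sketch. Note that this is the textbook dynamic-programming solution, not the paper's cultural algorithm, whose belief space and rate-adjustment functions the abstract does not specify. The function name `knapsack_01` and the sample items are our own assumptions. An exact solver like this is the kind of baseline a heuristic's "global optimum" claims can be checked against on small instances.

```python
def knapsack_01(values, weights, capacity):
    """Exact 0-1 knapsack via dynamic programming.

    Returns (best_total_value, sorted list of chosen item indices).
    Weights and capacity are assumed to be non-negative integers.
    """
    n = len(values)
    # best[i][c] = maximum value achievable using the first i items within capacity c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, w = values[i - 1], weights[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]              # option 1: skip item i-1
            if w <= c:                               # option 2: take it if it fits
                best[i][c] = max(best[i][c], best[i - 1][c - w] + v)

    # Walk the table backwards to recover which items were taken.
    chosen = []
    c = capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= weights[i - 1]
    chosen.reverse()
    return best[n][capacity], chosen


result = knapsack_01(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)
print(result)  # prints (220, [1, 2])
```

The table has (n + 1) × (capacity + 1) entries, so this approach is only practical when the capacity is a moderately sized integer, which is one reason metaheuristics are studied for large instances.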

# SeamlessNeRF: 勾配伝搬による部分NeRFのスチッチ化

SeamlessNeRF: Stitching Part NeRFs with Gradient Propagation ( http://arxiv.org/abs/2311.16127v1 )

ライセンス: Link先を確認

Bingchen Gong and Yuehao Wang and Xiaoguang Han and Qi Dou

(参考訳) Neural Radiance Fields(NeRF)は、3Dオブジェクトとシーンのデジタルメディアとして登場し、この領域で編集機能を拡張する研究が急増した。複数NeRFのシームレスな編集とマージのタスクは、2D画像編集における「Poisson blending」に似ており、既存の作業で探索されていない重要な操作のままである。このギャップを埋めるために、複数のNeRFをシームレスに混合する新しいアプローチであるSeamlessNeRFを提案する。具体的には,ターゲット放射界の外観を最適化し,音源場との調和を図ることを目的としている。本稿では,ブレンディングの最適化手法を提案する。 1)光源と対象フィールドとの交差境界領域における放射色をピン留めする。 2) 目標の本来の勾配を維持すること。広範な実験により,我々のアプローチは,勾配を通じて境界領域から対象フィールド全体へのソースの出現を効果的に伝達できることを検証した。われわれの知る限り、SeamlessNeRFは放射輝度場にグラデーションガイド付き外観編集を導入する最初の作品であり、NeRFで表現された3Dオブジェクトをシームレスに縫い合わせるためのソリューションを提供する。

Neural Radiance Fields (NeRFs) have emerged as promising digital mediums of 3D objects and scenes, sparking a surge in research to extend the editing capabilities in this domain. The task of seamless editing and merging of multiple NeRFs, resembling "Poisson blending" in 2D image editing, remains a critical operation that is under-explored by existing work. To fill this gap, we propose SeamlessNeRF, a novel approach for seamless appearance blending of multiple NeRFs. Specifically, we aim to optimize the appearance of a target radiance field in order to harmonize its merge with a source field. We propose a well-tailored optimization procedure for blending, which is constrained by 1) pinning the radiance color in the intersecting boundary area between the source and target fields and 2) maintaining the original gradient of the target. Extensive experiments validate that our approach can effectively propagate the source appearance from the boundary area to the entire target field through the gradients. To the best of our knowledge, SeamlessNeRF is the first work that introduces gradient-guided appearance editing to radiance fields, offering solutions for seamless stitching of 3D objects represented in NeRFs.

翻訳日:2023-12-03 13:30:09 公開日:2023-10-30

# 抗体構造系列共設計のための階層的学習パラダイム

A Hierarchical Training Paradigm for Antibody Structure-sequence Co-design ( http://arxiv.org/abs/2311.16126v1 )

ライセンス: Link先を確認

Fang Wu, Stan Z. Li

(参考訳) 治療抗体は必須であり、急速に拡大する薬物モダリティである。抗体と抗原の結合特異性は、これらのY型タンパク質の先端における相補性決定領域(CDR)によって決定される。本稿では,抗体配列構造共設計のための階層的訓練パラダイム(HTP)を提案する。HTPは4段階のトレーニングステージからなり、それぞれ特定のタンパク質ドメイン内の特定のタンパク質モダリティに対応する。異なる段階のタスクを慎重に作成することで、HTPは幾何グラフニューラルネットワーク(GNN)を大規模タンパク質言語モデルとシームレスかつ効果的に統合し、幾何学構造だけでなく、巨大な抗体や非抗体配列データベースから進化情報を抽出し、リガンド結合のポーズと強度を決定する。実証実験により、HTPは、共同設計問題と固定バックボーン設計において、新しい最先端性能を設定できることが示されている。我々の研究は、深い生成的アーキテクチャの可能性を解き明かし、抗体配列と構造共設計の課題への道のりを照らそうとしている。

Therapeutic antibodies are an essential and rapidly expanding drug modality. The binding specificity between antibodies and antigens is decided by complementarity-determining regions (CDRs) at the tips of these Y-shaped proteins. In this paper, we propose a hierarchical training paradigm (HTP) for antibody sequence-structure co-design. HTP consists of four levels of training stages, each corresponding to a specific protein modality within a particular protein domain. Through carefully crafted tasks in different stages, HTP seamlessly and effectively integrates geometric graph neural networks (GNNs) with large-scale protein language models to excavate evolutionary information from not only geometric structures but also vast antibody and non-antibody sequence databases, which determines ligand binding pose and strength. Empirical experiments show that HTP sets a new state of the art in the co-design problem as well as in fixed-backbone design. Our research offers a hopeful path to unleash the potential of deep generative architectures and seeks to illuminate the way forward for the antibody sequence and structure co-design challenge.

翻訳日:2023-12-03 13:29:46 公開日:2023-10-30

# 「外を少しだけ見て」--社会的帰属自信と機械学習と人工知能の学生の永続性

"Just a little bit on the outside for the whole time": Social belonging confidence and the persistence of Machine Learning and Artificial Intelligence students ( http://arxiv.org/abs/2311.10745v1 )

ライセンス: Link先を確認

Katherine Mao, Sharon Ferguson, James Magarian, Alison Olechowski

(参考訳) 機械学習(ML)と人工知能(AI)の成長分野は、永続研究においてユニークで未探索なケースを示しており、この発展分野にエンジニアリングによる過去の発見がどの程度適用されるのかは不明である。我々は,この分野での持続性の最初の理解を得るために探索的研究を行い,将来的な仕事の有益な方向を特定する。工学における永続性を予測できる要因の一つとして,信頼のレンズを通した存在を考察し,社会的存在の信頼に対する関心が,職業の多様性を高めるのにどう役立つかについて議論する。本稿では,ML/AI講座の学生へのインタビューを小規模に実施する。これらのインタビューのテーマ分析から、学生がML/AIのキャリアをどう見ているかは、興味やプログラミングの自信に基づいて異なることが判明した。実験では,露出と開始,MLとAIのフィールド境界の解釈,成功に必要なスキルの信念が,学生の持続への意図にどのように影響するかを確認した。学生が社会的帰属によって動機づけられることと、密接なメンターシップの重要性の相違について論じる。 ML/AIにおけるより永続的な研究の動機は、特に社会的帰属と密接なメンターシップ、交差点アイデンティティの役割、そして入門的なML/AIコースである。

The growing field of machine learning (ML) and artificial intelligence (AI) presents a unique and unexplored case within persistence research, meaning it is unclear how past findings from engineering will apply to this developing field. We conduct an exploratory study to gain an initial understanding of persistence in this field and identify fruitful directions for future work. One factor that has been shown to predict persistence in engineering is belonging; we study belonging through the lens of confidence, and discuss how attention to social belonging confidence may help to increase diversity in the profession. In this research paper, we conduct a small set of interviews with students in ML/AI courses. Thematic analysis of these interviews revealed initial differences in how students see a career in ML/AI, which diverge based on interest and programming confidence. We identified how exposure and initiation, the interpretation of ML and AI field boundaries, and beliefs about the skills required to succeed might influence students' intentions to persist. We discuss differences in how students describe being motivated by social belonging and the importance of close mentorship. We motivate further persistence research in ML/AI with particular focus on social belonging and close mentorship, the role of intersectional identity, and introductory ML/AI courses.

翻訳日:2023-11-27 00:44:51 公開日:2023-10-30

# 機械学習と人工知能における学生の意図的持続性モデルの構築

Advancing a Model of Students' Intentional Persistence in Machine Learning and Artificial Intelligence ( http://arxiv.org/abs/2311.10744v1 )

ライセンス: Link先を確認

Sharon Ferguson, Katherine Mao, James Magarian, Alison Olechowski

(参考訳) 機械学習(ML)と人工知能(AI)は、私たちが使用しているアプリケーション、意思決定、そして私たちに関する決定を支えています。多様性を念頭に置かずに設計された場合、顔認識アルゴリズムから再犯予測アルゴリズムに至るまで,不平等な結果の例を数多く見てきた。したがって、この分野における多様性を促進するための行動をとる必要がある。この研究における重要なステップは、ML/AIを学ぶことを選んだ一部の学生が後に現場を離れる理由を理解することである。多様な集団の持続性は工学的に研究されているが、ML/AIの持続性に影響を与える要因を研究する研究は乏しい。本研究では,ML/AIコースの学生を対象に,ML/AIにおける意図的永続性モデルの構築について述べる。性別,国際学生状況,学生ローン状況,可視的マイノリティ状態などの集団間の持続性について検討した。我々は、ML/AIを他のSTEM分野と区別する独立した変数、例えば、非技術スキルに対する様々な重点、仕事の曖昧な倫理的意味、そしてこの分野の競争的で収益性の高い性質について検討する。以上より,短期的意図的持続性は,学術的入学要因に関連していると考えられた。長期的持続性は、職業的役割の信頼性の尺度と相関する。私たちの研究に特有ののは、自分の仕事をポジティブな社会的利益にしたいというのは、長期的な意図的な持続性の負の予測要因であるということです。我々は,学級におけるML/AI倫理を有意義に議論し,分野の多様性を高めるために対人スキルの発達を促すことを教育者に勧める。

Machine Learning (ML) and Artificial Intelligence (AI) are powering the applications we use, the decisions we make, and the decisions made about us. We have seen numerous examples of non-equitable outcomes, from facial recognition algorithms to recidivism algorithms, when they are designed without diversity in mind. Thus, we must take action to promote diversity among those in this field. A critical step in this work is understanding why some students who choose to study ML/AI later leave the field. While the persistence of diverse populations has been studied in engineering, there is a lack of research investigating factors that influence persistence in ML/AI. In this work, we present the advancement of a model of intentional persistence in ML/AI by surveying students in ML/AI courses. We examine persistence across demographic groups, such as gender, international student status, student loan status, and visible minority status. We investigate independent variables that distinguish ML/AI from other STEM fields, such as the varying emphasis on non-technical skills, the ambiguous ethical implications of the work, and the highly competitive and lucrative nature of the field. Our findings suggest that short-term intentional persistence is associated with academic enrollment factors such as major and level of study. Long-term intentional persistence is correlated with measures of professional role confidence. Unique to our study, we show that wanting your work to have a positive social benefit is a negative predictor of long-term intentional persistence, and women generally care more about this. We provide recommendations to educators to meaningfully discuss ML/AI ethics in classes and encourage the development of interpersonal skills to help increase diversity in the field.

翻訳日:2023-11-27 00:44:27 公開日:2023-10-30

# KG-FRUS:米国外交関係127年間のグラフベースの新しいデータセット

KG-FRUS: a Novel Graph-based Dataset of 127 Years of US Diplomatic Relations ( http://arxiv.org/abs/2311.01606v1 )

ライセンス: Link先を確認

Gökberk Özsoy, Luis Salamanca, Matthew Connelly, Raymond Hicks and Fernando Pérez-Cruz

(参考訳) 本稿では,米国政府の外交文書を知識グラフ(kg)にエンコードした30万以上の国文書からなるkg-frusデータセットを提案する。我々は、米国の外交関係(frus)のデータ(xmlファイルとして利用可能)を利用して、文書やその中に言及されている個人や国に関する情報を抽出する。抽出されたエンティティと関連するメタデータを使用して、グラフベースのデータセットを作成します。さらに、生成したKGをWikidataから追加のエンティティと関係を補足する。 kgにおける関係は、外交、外交、政治といった複雑な分野の研究と理解に必要なシナジーとダイナミクスを捉えている。これは、テキスト内のエンティティ間の関係を無視する、単純なドキュメントのコレクションを越えている。我々は、現在のデータセットの様々な可能性を示し、kgを探索する異なるアプローチを図示する。本稿では、単純な研究質問に対するクエリ言語の使用方法と、完全なグラフ構造から恩恵を受けるNode2VecやPageRankといったグラフアルゴリズムの使用方法を例示する。さらに重要なことに、選択された構造は、グラフを継続的に拡張し、拡張するための完全な柔軟性を提供します。提案したKG構築パイプラインは、時間に依存した複雑な現象の他の元のコーパスを符号化することができる。全体として、時間依存関連テキストデータのより汎用的な表現を提供するKGデータベース作成機構と、全重要FRUSデータベースへの特定の応用について述べる。

In the current paper, we present the KG-FRUS dataset, comprising more than 300,000 US government diplomatic documents encoded in a Knowledge Graph (KG). We leverage the data of the Foreign Relations of the United States (FRUS) (available as XML files) to extract information about the documents and the individuals and countries mentioned within them. We use the extracted entities and associated metadata to create a graph-based dataset. Further, we supplement the created KG with additional entities and relations from Wikidata. The relations in the KG capture the synergies and dynamics required to study and understand the complex fields of diplomacy, foreign relations, and politics. This goes well beyond a simple collection of documents, which neglects the relations between entities in the text. We showcase a range of possibilities of the current dataset by illustrating different approaches to probe the KG. In the paper, we exemplify how to use a query language to answer simple research questions and how to use graph algorithms such as Node2Vec and PageRank, which benefit from the complete graph structure. More importantly, the chosen structure provides total flexibility for continuously expanding and enriching the graph. Our solution is general, so the proposed pipeline for building the KG can encode other original corpora of time-dependent and complex phenomena. Overall, we present a mechanism to create KG databases providing a more versatile representation of time-dependent related text data and a particular application to the all-important FRUS database.
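
To illustrate what running a graph algorithm such as PageRank over a document/entity graph can look like, here is a minimal sketch. The node names (`doc_1`, `person_A`, `country_X`) and the edge layout are hypothetical and are not taken from KG-FRUS; the function is a plain power-iteration PageRank, not the authors' pipeline.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1.0 - damping + damping * dangling) / len(nodes) for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical toy graph: documents point to the persons and countries they mention.
toy_edges = [
    ("doc_1", "person_A"), ("doc_1", "country_X"),
    ("doc_2", "person_A"), ("doc_2", "country_Y"),
    ("doc_3", "person_A"),
]
scores = pagerank(toy_edges)
most_central = max(scores, key=scores.get)
```

In this toy graph the entity mentioned by every document ends up with the highest score, which is the kind of signal that depends on the full graph structure rather than on any single document.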

翻訳日:2023-11-12 19:56:39 公開日:2023-10-30

# テキスト予測のための忠実でロバストな局所解釈可能性

Faithful and Robust Local Interpretability for Textual Predictions ( http://arxiv.org/abs/2311.01605v1 )

ライセンス: Link先を確認

Gianluigi Lopardo, Frederic Precioso, Damien Garreau

(参考訳) 機械学習モデルの信頼性と重要なドメインへのデプロイには、解釈可能性が不可欠である。しかし、既存のテキストモデルを解釈する手法はしばしば複雑であり、数学的基礎が固まっておらず、その性能は保証されていない。本稿では,テキスト上の予測を解釈する新しい方法であるfred(faithful and robust explanationer for textual documents)を提案する。 FREDは、削除された際の予測に大きな影響を及ぼすドキュメントのキーワードを識別する。解釈可能な分類器に関する形式的定義と理論的解析を通じてフレッドの信頼性を確立する。さらに、最先端手法に対する経験的評価は、テキストモデルに対する洞察を提供することにおけるfredの有効性を示している。

Interpretability is essential for machine learning models to be trusted and deployed in critical domains. However, existing methods for interpreting text models are often complex, lack solid mathematical foundations, and their performance is not guaranteed. In this paper, we propose FRED (Faithful and Robust Explainer for textual Documents), a novel method for interpreting predictions over text. FRED identifies key words in a document that significantly impact the prediction when removed. We establish the reliability of FRED through formal definitions and theoretical analyses on interpretable classifiers. Additionally, our empirical evaluation against state-of-the-art methods demonstrates the effectiveness of FRED in providing insights into text models.
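
The idea of scoring words by how much the prediction changes when they are removed can be sketched as follows. This is a deliberately simplified single-word removal loop under stated assumptions, not FRED's actual procedure; `toy_predict` and the `POSITIVE` word list are hypothetical stand-ins for a real text classifier.

```python
def removal_importance(tokens, predict):
    """Score each token by how much the prediction drops when it is removed."""
    baseline = predict(tokens)
    drops = []
    for i, token in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        drops.append((token, baseline - predict(reduced)))
    # Largest drop first: these tokens matter most to the prediction.
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in classifier: fraction of tokens found in a positive-word list.
POSITIVE = {"great", "excellent"}


def toy_predict(tokens):
    return sum(t in POSITIVE for t in tokens) / max(len(tokens), 1)


ranking = removal_importance("the food was great".split(), toy_predict)
```

Any callable that maps a token list to a score can be passed as `predict`, so the same loop would work with a real model wrapped in a function.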

翻訳日:2023-11-12 19:56:17 公開日:2023-10-30

# グリーンウォッシングの検出に言語モデルを活用する

Leveraging Language Models to Detect Greenwashing ( http://arxiv.org/abs/2311.01469v1 )

ライセンス: Link先を確認

Avalon Vinella, Margaret Capetz, Rebecca Pattichis, Christina Chance, and Reshmi Ghosh

(参考訳) 近年、気候変動による影響が大衆の関心を惹きつけている。その結果、企業は公的なイメージを強化するために持続可能性レポートへの環境取り組みを強調している。しかし、このような報告書のレビューに厳格な規制がないことは、グリーンウォッシングの可能性を秘めている。本研究では,グリーンウォッシングリスクを考慮に入れたラベルを用いた言語モデル学習手法を提案する。本研究の主な貢献は,緑化リスクを定量化するための数学的定式化,この問題に対する微調整式CurrentBERTモデル,結果の比較分析である。持続可能性レポートからなるテストセットでは, 平均精度スコア86.34%, F1スコア0.67を達成し, 提案手法が本課題に対する探索の有望な方向を示すことを示した。

In recent years, climate change repercussions have increasingly captured public interest. Consequently, corporations are emphasizing their environmental efforts in sustainability reports to bolster their public image. Yet, the absence of stringent regulations for reviewing such reports leaves room for greenwashing. In this study, we introduce a novel methodology to train a language model on generated labels for greenwashing risk. Our primary contributions are a mathematical formulation to quantify greenwashing risk, a fine-tuned ClimateBERT model for this problem, and a comparative analysis of results. On a test set of sustainability reports, our best model achieved an average accuracy of 86.34% and an F1 score of 0.67, suggesting that our methods are a promising direction of exploration for this task.

翻訳日:2023-11-12 19:55:54 公開日:2023-10-30

# あなたは次に何をすべきかを覚えています

Remember what you did so you know what to do next ( http://arxiv.org/abs/2311.01468v1 )

ライセンス: Link先を確認

Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin, Marjorie Freedman, Ralph Weischedel

(参考訳) 小学校理科実験用テキストゲームシミュレータであるScienceWorldにおいて、中規模大言語モデル(GPT-J 6Bパラメータ)を用いて、シミュレーションロボットが30種類の目標を達成する計画を作成する。以前に出版された経験的研究によると、大型言語モデル(LLM)は強化学習と比較して不適合である(Wang et al., 2022)。マルコフの仮定(前のステップの1つ)を用いて、LLMは強化学習に基づくアプローチを1.4倍に向上させる。 LLMの入力バッファをできるだけ多くの事前ステップで満たすと、改善は3.5倍になる。トレーニングデータのわずか6.5%のトレーニングでも、強化学習に基づくアプローチよりも2.2倍の改善が見られた。実験の結果、30種類のアクションに対して、パフォーマンスが広範囲に分散していることが判明した。 2023年、Lin et al.(2023年)は、OpenAIの大規模LLMを補完する小さなLLM(T5-large)を用いて、ScienceWorldで優れた結果を得るための2部アプローチ(SwiftSage)を実演した。我々の6-BパラメータであるシングルステージGPT-Jは、GPT-Jよりも29倍のパラメータを持つGPT-3.5ターボを組み込んだSwiftSageの2段アーキテクチャの性能と一致する。

We explore using a moderately sized large language model (GPT-J, 6B parameters) to create a plan for a simulated robot to achieve 30 classes of goals in ScienceWorld, a text game simulator for elementary science experiments. Previously published empirical work claimed that large language models (LLMs) are a poor fit compared to reinforcement learning (Wang et al., 2022). Using the Markov assumption (a single previous step), the LLM outperforms the reinforcement-learning-based approach by a factor of 1.4. When we fill the LLM's input buffer with as many prior steps as possible, the improvement rises to 3.5x. Even when training on only 6.5% of the training data, we observe a 2.2x improvement over the reinforcement-learning-based approach. Our experiments show that performance varies widely across the 30 classes of actions, indicating that averaging over tasks can hide significant performance issues. In work contemporaneous with ours, Lin et al. (2023) demonstrated a two-part approach (SwiftSage) that uses a small LLM (T5-large) complemented by OpenAI's massive LLMs to achieve outstanding results in ScienceWorld. Our 6B-parameter, single-stage GPT-J matches the performance of SwiftSage's two-stage architecture when it incorporates GPT-3.5 Turbo, which has 29 times more parameters than GPT-J.
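
The difference between conditioning on a single previous step and filling the input buffer with as many prior steps as fit can be sketched as below. The prompt layout, the `build_prompt` helper, and the whitespace token counter are all assumptions made for illustration; the abstract does not describe the actual input format.

```python
def build_prompt(goal, history, max_tokens, count_tokens=lambda s: len(s.split())):
    """Prepend as many of the most recent steps as fit in the token budget."""
    budget = max_tokens - count_tokens(goal)
    kept = []
    for step in reversed(history):  # walk from the newest step backwards
        cost = count_tokens(step)
        if cost > budget:
            break
        kept.append(step)
        budget -= cost
    # Restore chronological order and place the goal last.
    return "\n".join(list(reversed(kept)) + [goal])


history = ["open door", "go to kitchen", "pick up thermometer"]
# Markov-style input: only the single previous step is visible.
markov_prompt = build_prompt("Goal: boil water", history[-1:], max_tokens=50)
# Full-history input: every prior step fits in a generous budget.
full_prompt = build_prompt("Goal: boil water", history, max_tokens=50)
# Tight budget: the oldest step is dropped first.
tight_prompt = build_prompt("Goal: boil water", history, max_tokens=9)
```

Dropping the oldest steps first is one possible truncation policy; a real system would use the model's own tokenizer instead of whitespace splitting.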

翻訳日:2023-11-12 19:55:41 公開日:2023-10-30

# 非iidデータ上の差分プライベートフェデレーションクラスタリング

Differentially Private Federated Clustering over Non-IID Data ( http://arxiv.org/abs/2301.00955v3 )

ライセンス: Link先を確認

Yiwei Li, Shuai Wang, Chong-Yung Chi, Tony Q. S. Quek

(参考訳) 本稿では,大規模クライアント上に分散した未ラベルデータサンプルをパラメータサーバのオーケストレーション下で有限クラスタに正確に分割することを目的とした,フェデレーションクラスタリング(FedC)問題について検討する。クラスタセントロイドを示す実変数と,各データサンプルのクラスタメンバシップを示すバイナリ変数を含むNPハード最適化問題であるが,ソフトクラスタリングソリューションにより,FedC問題を1つの凸制約のみで非凸最適化問題に変換する。そこで,DP-FedCと呼ばれる差分プライバシ(DP)技術を用いた新しいFedCアルゴリズムを提案する。さらに, プライバシ保護と収束率の理論的解析により, 提案するdp-fedcの設計指針として理想的に機能する非識別・独立分散(非i.i.d.)データに対して, 提案するdp-fedcの様々な特性が得られた。次に, 提案するdp-fedcの有効性と, 最先端のfemcアルゴリズムよりも優れた性能, 提示されたすべての解析結果との一貫性を実証するために, 2つの実データを用いた実験結果を提示した。

In this paper, we investigate the federated clustering (FedC) problem, which aims to accurately partition unlabeled data samples distributed over massive clients into finite clusters under the orchestration of a parameter server while also preserving data privacy. Though it is an NP-hard optimization problem involving real variables denoting cluster centroids and binary variables denoting the cluster membership of each data sample, we judiciously reformulate the FedC problem into a non-convex optimization problem with only one convex constraint, accordingly yielding a soft clustering solution. Then a novel FedC algorithm using the differential privacy (DP) technique, referred to as DP-FedC, is proposed, in which partial client participation and multiple local model updating steps are also considered. Furthermore, various attributes of the proposed DP-FedC are obtained through theoretical analyses of privacy protection and convergence rate, especially for the case of non-identically and independently distributed (non-i.i.d.) data, which ideally serve as guidelines for the design of the proposed DP-FedC. Then some experimental results on two real datasets are provided to demonstrate the efficacy of the proposed DP-FedC, its much superior performance over some state-of-the-art FedC algorithms, and its consistency with all the presented analytical results.
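
As a rough illustration of how a client-side update is commonly made differentially private, the sketch below clips an update vector and adds Gaussian noise. This clip-then-noise recipe is a widely used pattern and is an assumption here; the abstract does not state which mechanism DP-FedC uses. The names `clip_norm` and `noise_std` are hypothetical, and calibrating `noise_std` to a privacy budget is not shown.

```python
import math
import random


def privatize_update(update, clip_norm, noise_std, rng):
    """Clip an update vector to an L2 norm bound, then add Gaussian noise."""
    norm = math.sqrt(sum(x * x for x in update))
    # Scale down only if the update exceeds the norm bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    return [x + rng.gauss(0.0, noise_std) for x in clipped]


rng = random.Random(0)
# A hypothetical local centroid update with L2 norm 5, clipped to norm 1.
noisy = privatize_update([3.0, 4.0], clip_norm=1.0, noise_std=0.1, rng=rng)
# With zero noise the result is just the clipped vector.
noiseless = privatize_update([3.0, 4.0], clip_norm=1.0, noise_std=0.0, rng=rng)
```

Clipping bounds how much any single client can move the shared model, which is what allows the added noise to translate into a formal privacy guarantee.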

翻訳日:2023-11-12 19:54:54 公開日:2023-10-30

# 安全かつパーソナライズ可能な自動運転車開発のための選好学習アプローチ

A Preference Learning Approach to Develop Safe and Personalizable Autonomous Vehicles ( http://arxiv.org/abs/2311.02099v1 )

ライセンス: Link先を確認

Ruya Karagulle and Nikos Arechiga and Andrew Best and Jonathan DeCastro and Necmiye Ozay

(参考訳) 本研究は,自動運転車の交通規則遵守を保証する選好学習手法を提案する。本手法では,トラフィックルールを記述する信号時相論理(stl)の優先順位順序付けを学習フレームワークに組み込む。パラメトリック重み付き信号時間論理(PWSTL)を利用して、ペア比較に基づく安全保証優先学習の問題を定式化し、この学習問題を解決するためのアプローチを提案する。提案手法は, 与えられたPWSTL式を重み付けし, これらの重み付けにより, 優先信号が非優先値よりも重み付けされた量的満足度測定値であることを示す。提案手法により得られた重みの有意な評価は,重み付きSTL式に導かれる。本手法は,停止標識と横断歩道を含む2つの異なる運転シナリオにおいて,被験者実験により性能を実証する。提案手法は,既存の選好学習手法と比較して,嗜好を捉えて比較し,安全性を考慮すれば,特に勝っている。

This work introduces a preference learning method that ensures adherence to traffic rules for autonomous vehicles. Our approach incorporates priority ordering of signal temporal logic (STL) formulas, describing traffic rules, into a learning framework. By leveraging the parametric weighted signal temporal logic (PWSTL), we formulate the problem of safety-guaranteed preference learning based on pairwise comparisons, and propose an approach to solve this learning problem. Our approach finds a feasible valuation for the weights of the given PWSTL formula such that, with these weights, preferred signals have weighted quantitative satisfaction measures greater than their non-preferred counterparts. The feasible valuation of weights given by our approach leads to a weighted STL formula which can be used in correct-and-custom-by-construction controller synthesis. We demonstrate the performance of our method with human subject studies in two different simulated driving scenarios involving a stop sign and a pedestrian crossing. Our approach yields competitive results compared to existing preference learning methods in terms of capturing preferences, and notably outperforms them when safety is considered.

翻訳日:2023-11-12 19:44:50 公開日:2023-10-30

# bsdar: ニューラルキーフレーズ生成における注意報奨付きビーム探索復号

BSDAR: Beam Search Decoding with Attention Reward in Neural Keyphrase Generation ( http://arxiv.org/abs/1909.09485v2 )

ライセンス: Link先を確認

Iftitahu Ni'mah, Vlado Menkovski, Mykola Pechenizkiy

(参考訳) 本研究は, ニューラルキーフレーズ生成における2つの共通デコード問題, シーケンス長バイアスとビーム多様性について検討した。そこで本研究では,単語レベルとngramレベルの報酬関数に基づくビーム探索復号手法を導入し,seq2seq推論をテスト時に制約・洗練する。その結果,提案手法はアルゴリズムのバイアスを克服し,より短く,ほぼ同一のシーケンスに到達し,ソーステキストに存在しないキーフレーズを生成する際の復号性能が大幅に向上した。

This study mainly investigates two common decoding problems in neural keyphrase generation: sequence length bias and beam diversity. To tackle these problems, we introduce a beam search decoding strategy based on word-level and n-gram-level reward functions to constrain and refine Seq2Seq inference at test time. Results show that our simple proposal can overcome the algorithm's bias toward shorter and nearly identical sequences, resulting in a significant improvement in decoding performance when generating keyphrases that are present in or absent from the source text.

翻訳日:2023-11-03 18:54:09 公開日:2023-10-30

# テキスト・音声・音声・生理信号を用いた機械学習による共感検出

Empathy Detection Using Machine Learning on Text, Audiovisual, Audio or Physiological Signals ( http://arxiv.org/abs/2311.00721v1 )

ライセンス: Link先を確認

Md Rakibul Hasan, Md Zakir Hossain, Shreya Ghosh, Susannah Soon, Tom Gedeon

(参考訳) 共感とは、個人が他人を理解する能力を示す社会的スキルである。過去数年間、共感は、Affective Computing、Cognitive Science and Psychologyに限らず、様々な分野から注目を集めてきた。共感は文脈に依存した用語であり、共感を検知または認識することは、社会、医療、教育に潜在的な応用をもたらす。広範かつ重なり合う話題であるにもかかわらず、機械学習を活用した共感検出研究の道筋は、全体論的な文学的観点からは未検討のままである。この目的のために,10の有名なデータベースから801の論文を体系的に収集,スクリーニングし,選択した54の論文を分析した。本論文は,共感検出システムの入力モダリティ,すなわちテキスト,オーディオ視覚,オーディオ,生理的信号に基づいてグループ化する。本稿では,モダリティ固有の前処理とネットワークアーキテクチャ設計プロトコル,一般的なデータセット記述と可用性の詳細,評価プロトコルについて検討する。我々はさらに,新たな探索方法を促進するコンピュータベースの共感ドメインにおける潜在的応用,展開課題,研究ギャップについても論じる。私たちは、私たちの仕事は、文化、多様性、多言語主義を含む、プライバシーを保護し、偏見のない共感システムを開発するための一歩だと信じています。

Empathy is a social skill that indicates an individual's ability to understand others. Over the past few years, empathy has drawn attention from various disciplines, including but not limited to Affective Computing, Cognitive Science and Psychology. Empathy is a context-dependent term; thus, detecting or recognising empathy has potential applications in society, healthcare and education. Despite being a broad and overlapping topic, the avenue of empathy detection studies leveraging Machine Learning remains underexplored from a holistic literature perspective. To this end, we systematically collect and screen 801 papers from 10 well-known databases and analyse the selected 54 papers. We group the papers based on input modalities of empathy detection systems, i.e., text, audiovisual, audio and physiological signals. We examine modality-specific pre-processing and network architecture design protocols, popular dataset descriptions and availability details, and evaluation protocols. We further discuss the potential applications, deployment challenges and research gaps in the Affective Computing-based empathy domain, which can facilitate new avenues of exploration. We believe that our work is a stepping stone to developing a privacy-preserving and unbiased empathic system inclusive of culture, diversity and multilingualism that can be deployed in practice to enhance the overall well-being of human life.

翻訳日:2023-11-03 16:20:52 公開日:2023-10-30

# 機械学習ポテンシャルにおける構造的・コンフォメーション的多様性の役割

Role of Structural and Conformational Diversity for Machine Learning Potentials ( http://arxiv.org/abs/2311.00862v1 )

ライセンス: Link先を確認

Nikhil Shenoy, Prudencio Tossou, Emmanuel Noutahi, Hadrien Mary, Dominique Beaini, Jiarui Ding

(参考訳) 機械学習の原子間ポテンシャル(mlips)の分野では、データバイアス、特にコンフォメーションと構造的多様性の間の複雑な関係を理解し、モデル一般化は量子力学(qm)データ生成作業の品質向上に不可欠である。この2つの異なる実験により、データセットサイズが一定である固定的予算1と、構造的多様性を変化させつつ、固定的な構造的多様性に焦点をあてた固定的分子集合1とを探索する。その結果,一般化指標におけるニュアンスパターンが明らかになった。特に、最適構造とコンフォーメーションの一般化には、構造とコンフォーメーションの多様性の慎重なバランスが必要であるが、既存のQMデータセットはそのトレードオフを満たしていない。さらに,モデル展開における適用可能性ドメイン定義の重要性を強調しながら,トレーニング分布を超えて一般化するmlipモデルの限界を強調する。これらの知見は、QMデータ生成のための貴重な洞察とガイドラインを提供する。

In the field of Machine Learning Interatomic Potentials (MLIPs), understanding the intricate relationship between data biases, specifically conformational and structural diversity, and model generalization is critical to improving the quality of Quantum Mechanics (QM) data generation efforts. We investigate these dynamics through two distinct experiments: a fixed-budget one, where the dataset size remains constant, and a fixed-molecular-set one, which keeps structural diversity fixed while varying conformational diversity. Our results reveal nuanced patterns in generalization metrics. Notably, for optimal structural and conformational generalization, a careful balance between structural and conformational diversity is required, but existing QM datasets do not meet that trade-off. Additionally, our results highlight the limitations of MLIP models in generalizing beyond their training distribution, emphasizing the importance of defining the applicability domain during model deployment. These findings provide valuable insights and guidelines for QM data generation efforts.

翻訳日:2023-11-03 15:27:14 公開日:2023-10-30

# 定数円における量子後零知識へのブラックボックスアプローチ

A Black-Box Approach to Post-Quantum Zero-Knowledge in Constant Rounds ( http://arxiv.org/abs/2011.02670v4 )

ライセンス: Link先を確認

Nai-Hui Chia and Kai-Min Chung and Takashi Yamakawa

(参考訳) 最近のセミナルな研究で、ビタンスキーとシュムエリ(STOC '20)は、NPが量子攻撃に対して安全であることを示す定ラウンドゼロ知識引数を初めて構築した。しかし、それらの構造は古典的なものに比べていくつかの欠点がある。具体的には、それらの構成は計算の健全性しか達成せず、エラー(QLWE仮定)と量子完全同型暗号(QFHE)の存在による学習の量子困難性の強い仮定を必要とし、非ブラックボックスシミュレーションに依存している。本稿では、これらの問題をゼロ知識の概念を「$\epsilon$-zero-knowledge」と呼ぶものに弱めるコストで解決する。具体的には, 統計的健全性とブラックボックスの$\epsilon$-zero-knowledge を満たす NP に対して, 衝突するハッシュ関数の存在を前提として, 一定のラウンド・インタラクティブな NP の証明を構築する。興味深いことに、この構成はGoldreich と Kahan (JoC '96) による古典的プロトコルの適応版にすぎないが、量子敵に対する$\epsilon$-zero-knowledgeプロパティの証明には新しいアイデアが必要である。量子攻撃に対するブラックボックス $\epsilon$-zero-knowledge と計算の健全性を満たすnpの一定の円環的対話的議論を量子後一方向関数の存在を仮定するだけで構成する。この結果の核心となるのは、シミュレータが悪意のある検証者のコミットメッセージを抽出し、検証者の内部状態を適切な意味でシミュレートすることのできる新しい量子巻き戻し技術である。

In a recent seminal work, Bitansky and Shmueli (STOC '20) gave the first construction of a constant-round zero-knowledge argument for NP secure against quantum attacks. However, their construction has several drawbacks compared to its classical counterparts. Specifically, their construction only achieves computational soundness, requires the strong assumptions of quantum hardness of learning with errors (QLWE assumption) and the existence of quantum fully homomorphic encryption (QFHE), and relies on non-black-box simulation. In this paper, we resolve these issues at the cost of weakening the notion of zero-knowledge to what is called $\epsilon$-zero-knowledge. Concretely, we construct the following protocols:
- We construct a constant-round interactive proof for NP that satisfies statistical soundness and black-box $\epsilon$-zero-knowledge against quantum attacks, assuming the existence of collapsing hash functions, which are a quantum counterpart of collision-resistant hash functions. Interestingly, this construction is just an adapted version of the classical protocol by Goldreich and Kahan (JoC '96), though the proof of the $\epsilon$-zero-knowledge property against quantum adversaries requires novel ideas.
- We construct a constant-round interactive argument for NP that satisfies computational soundness and black-box $\epsilon$-zero-knowledge against quantum attacks, assuming only the existence of post-quantum one-way functions.
At the heart of our results is a new quantum rewinding technique that enables a simulator to extract a committed message of a malicious verifier while simulating the verifier's internal state in an appropriate sense.

翻訳日:2023-11-02 18:53:58 公開日:2023-10-30

# 連続条件生成逆数ネットワーク:新しい経験的損失とラベル入力機構

Continuous Conditional Generative Adversarial Networks: Novel Empirical Losses and Label Input Mechanisms ( http://arxiv.org/abs/2011.07466v9 )

ライセンス: Link先を確認

Xin Ding and Yongwei Wang and Zuheng Xu and William J. Welch and Z. Jane Wang

(参考訳) 本研究では,連続的,スカラーな条件(終末回帰ラベル)に基づく画像生成条件生成モデルとして,CcGAN(Continuous Conditional Generative Adversarial Network)を提案する。 Existing conditional GANs (cGANs) are mainly designed for categorical conditions (eg, class labels); conditioning on regression labels is mathematically distinct and raises two fundamental problems:(P1) Since there may be very few (even zero) real images for some regression labels, minimizing existing empirical versions of cGAN losses (aka empirical cGAN losses) often fails in practice;(P2) Since regression labels are scalar and infinitely many, conventional label input methods are not applicable. 提案するccganは, (s1) 既存の経験的cgan損失を連続シナリオに適合するように再構成し, (s2) ナイーブラベル入力 (nli) 法と改良されたラベル入力 (ili) 法を提案し, ジェネレータと判別器に回帰ラベルを組み込む。 s1) における再構成は、2つの新しい経験的判別器損失をもたらし、それぞれhard vicinal discriminator loss (hvdl) とsoft vicinal discriminator loss (svdl) と呼ばれる。 HVDLとSVDLで訓練された判別器の誤差境界は、本研究の軽微な仮定の下で導出される。 2つの新しいベンチマークデータセット(RC-49とCell-200)と新しい評価基準(Sliding Fr\'echet Inception Distance)も提案されている。 CcGANは,Circular 2-D Gaussian, RC-49, UTKFace, Cell-200, Steering Angleのデータセットを用いて, 与えられた回帰ラベルに基づいて画像分布条件から, 多様な高品質なサンプルを生成することができることを示す。さらに、これらの実験では、CcGANは視覚的および定量的にcGANを著しく上回る。

This work proposes the continuous conditional generative adversarial network (CcGAN), the first generative model for image generation conditional on continuous, scalar conditions (termed regression labels). Existing conditional GANs (cGANs) are mainly designed for categorical conditions (e.g., class labels); conditioning on regression labels is mathematically distinct and raises two fundamental problems: (P1) since there may be very few (even zero) real images for some regression labels, minimizing existing empirical versions of cGAN losses (a.k.a. empirical cGAN losses) often fails in practice; (P2) since regression labels are scalar and infinitely many, conventional label input methods are not applicable. The proposed CcGAN solves the above problems, respectively, by (S1) reformulating existing empirical cGAN losses to be appropriate for the continuous scenario; and (S2) proposing a naive label input (NLI) method and an improved label input (ILI) method to incorporate regression labels into the generator and the discriminator. The reformulation in (S1) leads to two novel empirical discriminator losses, termed the hard vicinal discriminator loss (HVDL) and the soft vicinal discriminator loss (SVDL), respectively, and a novel empirical generator loss. The error bounds of a discriminator trained with HVDL and SVDL are derived under mild assumptions in this work. Two new benchmark datasets (RC-49 and Cell-200) and a novel evaluation metric (Sliding Fréchet Inception Distance) are also proposed for this continuous scenario. Our experiments on the Circular 2-D Gaussians, RC-49, UTKFace, Cell-200, and Steering Angle datasets show that CcGAN is able to generate diverse, high-quality samples from the image distribution conditional on a given regression label. Moreover, in these experiments, CcGAN substantially outperforms cGAN both visually and quantitatively.
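
Problem (P1) can be eased by borrowing real samples whose labels lie near the target label, which is the intuition behind "vicinal" losses. The sketch below shows one hard and one soft way to weight neighboring labels. The weighting forms and the parameter names `kappa` and `nu` are assumptions chosen for illustration; the abstract does not give the exact definitions of HVDL or SVDL.

```python
import math


def hard_vicinal_weights(labels, target, kappa):
    """Weight 1 for samples whose label lies within kappa of the target, else 0."""
    return [1.0 if abs(y - target) <= kappa else 0.0 for y in labels]


def soft_vicinal_weights(labels, target, nu):
    """Weights that decay smoothly with squared label distance from the target."""
    return [math.exp(-nu * (y - target) ** 2) for y in labels]


# Hypothetical normalized regression labels of four real images.
labels = [0.10, 0.48, 0.52, 0.90]
hard = hard_vicinal_weights(labels, target=0.50, kappa=0.05)
soft = soft_vicinal_weights(labels, target=0.50, nu=100.0)
```

Even though no image carries the exact label 0.50, the two nearby images contribute to the loss at that label, while distant ones are excluded or strongly down-weighted.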

翻訳日:2023-11-02 18:44:29 公開日:2023-10-30

# 予期せぬ敵に対するロバスト性テスト

Testing Robustness Against Unforeseen Adversaries ( http://arxiv.org/abs/1908.08016v4 )

ライセンス: Link先を確認

Max Kaufmann, Daniel Kang, Yi Sun, Steven Basart, Xuwang Yin, Mantas Mazeika, Akul Arora, Adam Dziedzic, Franziska Boenisch, Tom Brown, Jacob Steinhardt, Dan Hendrycks

(参考訳) adversarial robustness researchは主にl_p摂動に焦点を当てており、ほとんどの防御はトレーニングタイムとテストタイムの逆境で開発されている。しかし、現実世界のアプリケーションでは、開発者はシステムが直面する攻撃や汚職の全範囲にアクセスできない。さらに、最悪のケース入力は多様であり、L_pボールに制約される必要はない。研究と現実のこの相違を狭めるために、新しい18の非L_p攻撃を含む、予期せぬ敵に対するモデルの堅牢性を評価するためのフレームワークであるImageNet-UAを導入する。 ImageNet-UAでうまく機能するためには、ディフェンスは一般化ギャップを克服し、トレーニング中に遭遇しない多様な攻撃に対して堅牢でなければならない。大規模な実験では、既存のロバストネス対策が予期せぬロバストネスを捉えていないこと、標準ロバストネス技術が代替トレーニング戦略に勝っていること、新しい手法が予期せぬロバストネスを改善できることが判明した。我々は,機械学習システムの最悪の動作を改善するためのコミュニティの有用なツールとして,ImageNet-UAを提案する。

Adversarial robustness research primarily focuses on L_p perturbations, and most defenses are developed with identical training-time and test-time adversaries. However, in real-world applications, developers are unlikely to have access to the full range of attacks or corruptions their system will face. Furthermore, worst-case inputs are likely to be diverse and need not be constrained to the L_p ball. To narrow this discrepancy between research and reality, we introduce ImageNet-UA, a framework for evaluating model robustness against a range of unforeseen adversaries, including eighteen new non-L_p attacks. To perform well on ImageNet-UA, defenses must overcome a generalization gap and be robust to diverse attacks not encountered during training. In extensive experiments, we find that existing robustness measures do not capture unforeseen robustness, that standard robustness techniques are beaten by alternative training strategies, and that novel methods can improve unforeseen robustness. We present ImageNet-UA as a useful tool for the community for improving the worst-case behavior of machine learning systems.

翻訳日:2023-11-02 05:30:05 公開日:2023-10-30

# 低リソース音声コマンド認識のための類似性を用いたニューラルモデル再構成

Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition ( http://arxiv.org/abs/2110.03894v5 )

ライセンス: Link先を確認

Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao

(参考訳) 本研究では,低リソース音声コマンド認識(SCR)のための新しいAR手法を提案し,AR-SCRシステムを構築する。 ARプロシージャは(ターゲットドメインから)音響信号を修正して(ソースドメインから)事前訓練されたSCRモデルを再利用することを目的としている。ソースドメインとターゲットドメインのラベルミスマッチを解消し、arの安定性をさらに高めるため、クラスをアライメントするための新しい類似性に基づくラベルマッピング手法を提案する。さらに、トランスファーラーニング(TL)技術と元のARプロセスを組み合わせることで、モデル適応性を向上させる。提案したAR-SCRシステムは,アラビア語,リトアニア語,マンダリン語を含む3つの低リソースSCRデータセットを用いて評価した。実験結果から、大規模な英語データセットで事前訓練されたAMを用いて、提案したAR-SCRシステムは、アラビア語およびリトアニア語の音声コマンドデータセット上で、限られた訓練データのみを用いて、現在の最先端の結果を上回ります。

In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR) and build an AR-SCR system. The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model (from the source domain). To resolve the label mismatches between source and target domains and to further improve the stability of AR, we propose a novel similarity-based label mapping technique to align classes. In addition, the transfer learning (TL) technique is combined with the original AR process to improve the model adaptation capability. We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech. Experimental results show that, with a pretrained acoustic model (AM) trained on a large-scale English dataset, the proposed AR-SCR system outperforms the current state-of-the-art results on Arabic and Lithuanian speech command datasets with only a limited amount of training data.

翻訳日:2023-11-02 05:25:17 公開日:2023-10-30

# アルミニウム超伝導共振器の2レベル飽和下での異常損失低減

Anomalous Loss Reduction Below Two-Level System Saturation in Aluminum Superconducting Resonators ( http://arxiv.org/abs/2109.11742v5 )

ライセンス: Link先を確認

Tamin Tai, Jingnan Cai, Steven M. Anlage

(参考訳) 超伝導共振器は量子コンピューティングのためのキュービットリードアウトや運動インダクタンス検出器など多くの用途で広く使われている。これらの共振器は、多くの損失とノイズ機構、特に、少数の光子と低温状態において主な損失源となる2レベル系(TLS)による消音の影響を受けやすい。本研究では, 容量結合型半波長コプラナー導波路共振器について検討した。意外なことに, 共振器の損失は低励磁温度とTLS飽和度以下の温度で減少することが観察された。この挙動は、TLSの離散アンサンブルにおけるTLSと共振光子周波数の遅延を減らし、TLSの温度と電力を低下させることによるTLS共鳴応答帯域の減少に起因する。 TLSの応答帯域幅が共振器からの遅延よりも小さい場合、共振器応答が小さくなり、損失が減少する。より高い励起力では、損失は一般化トンネルモデル(GTM)の予測と一致する対数的パワー依存に従う。離散TLSアンサンブルとGTMを組み合わせたモデルを提案し、測定した共振器内部損失の温度と電力依存性を合理的パラメータと一致させる。

Superconducting resonators are widely used in many applications, such as qubit readout for quantum computing and kinetic inductance detectors. These resonators are susceptible to numerous loss and noise mechanisms, especially the dissipation due to two-level systems (TLS), which becomes the dominant source of loss in the few-photon and low-temperature regime. In this study, capacitively coupled aluminum half-wavelength coplanar waveguide resonators are investigated. Surprisingly, the loss of the resonators was observed to decrease with decreasing temperature at low excitation powers and temperatures below the TLS saturation. This behavior is attributed to the reduction of the TLS resonant response bandwidth with decreasing temperature and power to below the detuning between the TLS and the resonant photon frequency in a discrete ensemble of TLS. When the response bandwidths of the TLS are smaller than their detunings from the resonance, the resonant response, and thus the loss, is reduced. At higher excitation powers, the loss follows a logarithmic power dependence, consistent with predictions from the generalized tunneling model (GTM). A model combining the discrete TLS ensemble with the GTM is proposed and matches the temperature and power dependence of the measured internal loss of the resonator with reasonable parameters.

翻訳日:2023-11-02 05:24:25 公開日:2023-10-30

# future-ai: 医療画像における信頼できる人工知能のための原則とコンセンサス勧告

FUTURE-AI: Guiding Principles and Consensus Recommendations for Trustworthy Artificial Intelligence in Medical Imaging ( http://arxiv.org/abs/2109.09658v5 )

ライセンス: Link先を確認

Karim Lekadir, Richard Osuala, Catherine Gallin, Noussair Lazrak, Kaisar Kushibar, Gianna Tsakou, Susanna Aussó, Leonor Cerdá Alberich, Kostas Marias, Manolis Tsiknakis, Sara Colantonio, Nickolas Papanikolaou, Zohaib Salahuddin, Henry C Woodruff, Philippe Lambin, Luis Martí-Bonmatí

(参考訳) 人工知能(AI)の最近の進歩は、今日の臨床システムによって生成される膨大なデータと相まって、画像再構成、医用画像分割、画像ベースの診断、治療計画を含む、医療画像のバリューチェーン全体にわたる画像AIソリューションの開発につながっている。医療画像におけるaiの成功と将来の可能性にかかわらず、多くの利害関係者は、複雑で不透明で、重要な臨床応用に対する理解、利用、信頼が難しいと認識されるaiソリューションの潜在的なリスクと倫理的意味を懸念している。これらの懸念とリスクにもかかわらず、医療画像における将来のAI開発を信頼、安全性、採用を高めるための具体的なガイドラインやベストプラクティスは今のところ存在しない。このギャップを埋めるため,本稿では,欧州の5つの大規模健康イメージングプロジェクトから蓄積された経験,コンセンサス,ベストプラクティスから導かれた指針の慎重に選択する。これらの指針はfuture-aiと呼ばれ、その構成要素は (i)公平さ。 (ii)普遍性 (iii)トレーサビリティ (4)ユーザビリティ (v)堅牢性と (vi)説明可能。ステップバイステップアプローチでは、これらのガイドラインは、技術的、臨床的、倫理的に信頼できるAIソリューションを臨床実践に特定、開発、評価、デプロイするための具体的な勧告のフレームワークにさらに変換される。

The recent advancements in artificial intelligence (AI), combined with the extensive amount of data generated by today's clinical systems, have led to the development of imaging AI solutions across the whole value chain of medical imaging, including image reconstruction, medical image segmentation, image-based diagnosis and treatment planning. Notwithstanding the successes and future potential of AI in medical imaging, many stakeholders are concerned about the potential risks and ethical implications of imaging AI solutions, which are perceived as complex, opaque, and difficult to comprehend, utilise, and trust in critical clinical applications. Despite these concerns and risks, there are currently no concrete guidelines and best practices for guiding future AI developments in medical imaging towards increased trust, safety and adoption. To bridge this gap, this paper introduces a careful selection of guiding principles drawn from the accumulated experiences, consensus, and best practices from five large European projects on AI in Health Imaging. These guiding principles are named FUTURE-AI, and their building blocks consist of (i) Fairness, (ii) Universality, (iii) Traceability, (iv) Usability, (v) Robustness and (vi) Explainability. In a step-by-step approach, these guidelines are further translated into a framework of concrete recommendations for specifying, developing, evaluating, and deploying technically, clinically and ethically trustworthy AI solutions into clinical practice.

翻訳日:2023-11-02 05:24:02 公開日:2023-10-30

# 相対論的ジャッキー・ネアエノンの不確かさ関係--第一原理の導出

Uncertainty Relations for the Relativistic Jackiw-Nair Anyon: A First Principles Derivation ( http://arxiv.org/abs/2107.09342v2 )

ライセンス: Link先を確認

Joydeep Majhi (ISI, Kolkata), Subir Ghosh (ISI, Kolkata)

(参考訳) 本稿では,jackiw と nair ref によって提唱された相対論的粒子モデルに対する $position-position$ と $position-momentum$ (heisenberg) 不確かさ関係を明示的に計算した。 [1]anyonのモデルとして,純粋に量子力学的な枠組みを用いた。これは(シュワルツの不等式を通じて)任意の存在が 2-次元 \textit{noncommutative} 空間に存在するという予想を支持する。我々は最近構築したanyon波動関数refを用いて、anyon座標である${\sqrt{\delta x^2\delta y^2}}=\hbar\bar{\theta}_{xy}$の非自明な不確かさ関係を計算した。 [6]refの枠組みにおいて。 [7]. また、アノンに対するハイゼンベルクの不確かさ関係を計算する。最後に、電子に適用すると、同一の \textit{formalism} が自明な位置の不確実性関係を生じさせ、3次元の可換空間での生活と一致することを示した。

In this paper we explicitly compute the position-position and position-momentum (Heisenberg) uncertainty relations for the model of relativistic particles with arbitrary spin proposed by Jackiw and Nair (ref. [1]) as a model for the anyon, in a purely quantum mechanical framework. This supports (via the Schwarz inequality) the conjecture that anyons live in a 2-dimensional noncommutative space. We compute the non-trivial uncertainty relation between anyon coordinates, ${\sqrt{\Delta x^2\Delta y^2}}=\hbar\bar{\Theta}_{xy}$, using the recently constructed anyon wave function (ref. [6]), in the framework of ref. [7]. We also compute the Heisenberg (position-momentum) uncertainty relation for anyons. Lastly, we show that the identical formalism, when applied to electrons, yields a trivial position uncertainty relation, consistent with their living in a 3-dimensional commutative space.

Translation date: 2023-11-02 05:23:20 Publication date: 2023-10-30

# On the Role of Entropy-based Loss for Learning Causal Structures with Continuous Optimization ( http://arxiv.org/abs/2106.02835v4 )

License: see the linked page

Weilin Chen, Jie Qiao, Ruichu Cai, Zhifeng Hao

Causal discovery from observational data is an important but challenging task in many scientific fields. Recently, a method with a non-combinatorial directed acyclic constraint, called NOTEARS, formulated the causal structure learning problem as a continuous optimization problem using a least-squares loss. Though the least-squares loss function is well justified under the standard Gaussian noise assumption, it is limited if the assumption does not hold. In this work, we theoretically show that violating the Gaussian noise assumption hinders causal direction identification, making the causal orientation fully determined by the causal strength as well as the variances of noises in the linear case and by the strong non-Gaussian noises in the nonlinear case. Consequently, we propose a more general entropy-based loss that is theoretically consistent with the likelihood score under any noise distribution. We run extensive empirical evaluations on both synthetic data and real-world data to validate the effectiveness of the proposed method and show that our method achieves the best results on the Structural Hamming Distance, False Discovery Rate, and True Positive Rate metrics.
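The continuous formulation mentioned in this abstract replaces a combinatorial search over graphs with a smooth constraint on a weight matrix. The following is a minimal sketch, assuming the trace-of-matrix-exponential constraint h(W) = tr(exp(W ∘ W)) − d commonly associated with NOTEARS and a plain least-squares loss. The function names (`mat_mul`, `acyclicity`, `least_squares_loss`) and the truncated power series are illustrative choices, not the authors' code, and the entropy-based loss proposed in the paper is not reproduced here.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w, terms=20):
    """h(W) = tr(exp(W * W)) - d, where * is the elementwise product.

    exp() is approximated by a truncated power series. h is (numerically)
    zero when the weighted graph W has no directed cycle, positive otherwise.
    """
    d = len(w)
    sq = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]
    # term holds (W*W)^k / k!, starting from the identity (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)
    for k in range(1, terms):
        term = mat_mul(term, sq)
        term = [[x / k for x in row] for row in term]
        trace += sum(term[i][i] for i in range(d))
    return trace - d


def least_squares_loss(x, w):
    """(1 / 2n) * ||X - X W||^2 for the linear model X ~ X W (rows are samples)."""
    n, d = len(x), len(w)
    total = 0.0
    for row in x:
        for j in range(d):
            pred = sum(row[i] * w[i][j] for i in range(d))
            total += (row[j] - pred) ** 2
    return total / (2 * n)


dag = [[0.0, 1.5], [0.0, 0.0]]     # single edge 0 -> 1
cyclic = [[0.0, 1.5], [0.5, 0.0]]  # edges 0 -> 1 and 1 -> 0
print(acyclicity(dag), acyclicity(cyclic))
```

In an optimizer, `least_squares_loss` would be minimized subject to `acyclicity(w) == 0`. The abstract's point is that the choice of loss in that objective matters once the noise is not standard Gaussian.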

Translation date: 2023-11-02 05:22:41 Publication date: 2023-10-30

# Evolving parametrized Loss for Image Classification Learning on Small Datasets ( http://arxiv.org/abs/2103.08249v2 )

License: see the linked page

Zhaoyang Hai, Xiabi Liu

This paper proposes a meta-learning approach to evolving a parametrized loss function, called the Meta-Loss Network (MLN), for image classification learning on small datasets. In our approach, the MLN is embedded in the framework of classification learning as a differentiable objective function. The MLN is evolved with the Evolutionary Strategy (ES) algorithm into an optimized loss function, such that a classifier optimized to minimize this loss achieves good generalization. A classifier learns on a small training dataset to minimize the MLN with Stochastic Gradient Descent (SGD), and then the MLN is evolved using the precision of the small-dataset-updated classifier on a large validation dataset. To evaluate our approach, the MLN is trained on a large number of small-sample learning tasks sampled from FashionMNIST and tested on validation tasks sampled from FashionMNIST and CIFAR10. Experimental results demonstrate that the MLN effectively improves generalization compared to the classical cross-entropy error and mean squared error.

Translation date: 2023-11-02 05:22:21 Publication date: 2023-10-30

# Reservoir Computing with Magnetic Thin Films ( http://arxiv.org/abs/2101.12700v2 )

License: see the linked page

Matthew Dale, David Griffin, Richard F. L. Evans, Sarah Jenkins, Simon O'Keefe, Angelika Sebald, Susan Stepney, Fernando Torre, Martin Trefzer

Advances in artificial intelligence are driven by technologies inspired by the brain, but these technologies are orders of magnitude less powerful and energy efficient than biological systems. Inspired by the nonlinear dynamics of neural networks, new unconventional computing hardware has emerged with the potential to exploit natural phenomena and gain efficiency, in a similar manner to biological systems. Physical reservoir computing demonstrates this with a variety of unconventional systems, from optical-based to memristive systems. Reservoir computers provide a nonlinear projection of the task input into a high-dimensional feature space by exploiting the system's internal dynamics. A trained readout layer then combines features to perform tasks such as pattern recognition and time-series analysis. Despite progress, achieving state-of-the-art performance without external signal processing to the reservoir remains challenging. Here we perform an initial exploration of three magnetic materials in thin-film geometries via microscale simulation. Our results reveal that basic spin properties of magnetic films generate the nonlinear dynamics and memory required to solve machine learning tasks (although there would be practical challenges in exploiting these particular materials in physical implementations). The method of exploration can be applied to other materials, so this work opens up the possibility of testing different materials, from relatively simple (alloys) to significantly complex (antiferromagnetic reservoirs).

Translation date: 2023-11-02 05:21:37 Publication date: 2023-10-30

# Unsupervised Performance Analysis of 3D Face Alignment with a Statistically Robust Confidence Test ( http://arxiv.org/abs/2004.06550v6 )

License: see the linked page

Mostafa Sadeghi, Xavier Alameda-Pineda and Radu Horaud

This paper addresses the problem of analysing the performance of 3D face alignment (3DFA), or facial landmark localization. This task is usually supervised, based on annotated datasets. Nevertheless, in the particular case of 3DFA, the annotation process is rarely error-free, which strongly biases the results. As an alternative, unsupervised performance analysis (UPA) is investigated. The core ingredient of the proposed methodology is the robust estimation of the rigid transformation between predicted landmarks and model landmarks. It is shown that the rigid mapping thus computed is affected neither by non-rigid facial deformations, due to variabilities in expression and in identity, nor by landmark localization errors, due to various perturbations. The guiding idea is to apply the estimated rotation, translation and scale to a set of predicted landmarks in order to map them onto a mathematical home for the shape embedded in these landmarks (including possible errors). UPA proceeds as follows: (i) 3D landmarks are extracted from a 2D face using the 3DFA method under investigation; (ii) these landmarks are rigidly mapped onto a canonical (frontal) pose; and (iii) a statistically robust confidence score is computed for each landmark. This makes it possible to assess whether the mapped landmarks lie inside (inliers) or outside (outliers) a confidence volume. An experimental evaluation protocol that uses publicly available datasets and several 3DFA software packages associated with published articles is described in detail. The results show that the proposed analysis is consistent with supervised metrics and that it can be used to measure the accuracy of both predicted landmarks and automatically annotated 3DFA datasets, and to detect and eliminate errors. Source code and supplemental materials for this paper are publicly available at https://team.inria.fr/robotlearn/upa3dfa/.

Translation date: 2023-11-02 05:20:02 Publication date: 2023-10-30

# Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation ( http://arxiv.org/abs/2203.16487v6 )

License: see the linked page

Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui

For the first time, we propose Speculative Decoding (SpecDec) to formally study exploiting the idea of speculative execution to accelerate autoregressive (AR) decoding. Speculative Decoding has two innovations: Spec-Drafter, an independent model specially optimized for efficient and accurate drafting, and Spec-Verification, a reliable method for verifying the drafted tokens efficiently in the decoding paradigm. Experimental results on various seq2seq tasks, including machine translation and abstractive summarization, show that our approach can achieve around $5\times$ speedup for the popular Transformer architectures with generation quality comparable to beam search decoding, refreshing the impression that the draft-then-verify paradigm introduces only a $1.4\times \sim 2\times$ speedup. In addition to the remarkable speedup, we also demonstrate 3 additional advantages of SpecDec, revealing its practical value for accelerating generative models in real-world applications. Our models and code are available at https://github.com/hemingkx/SpecDec.
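The draft-then-verify loop named in this abstract can be illustrated with a toy example. This is a minimal sketch, assuming greedy decoding and a strict-match acceptance rule; `target_next`, `draft_next`, `speculative_decode`, and `greedy_decode` are hypothetical stand-ins and are not the Spec-Drafter or Spec-Verification components from the paper. In a real system the drafted positions would be checked in one parallel forward pass of the AR model, which is where the speedup comes from; here that pass is simulated sequentially.

```python
from typing import Callable, List, Tuple

Token = int
NextFn = Callable[[List[Token]], Token]


def target_next(prefix: List[Token]) -> Token:
    """Toy stand-in for the slow AR model: next token from a fixed rule."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_next(prefix: List[Token]) -> Token:
    """Toy stand-in for the fast drafter: wrong whenever len(prefix) % 4 == 0."""
    guess = target_next(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def speculative_decode(prompt: List[Token], draft: NextFn, target: NextFn,
                       max_new: int = 12, block: int = 4) -> Tuple[List[Token], int]:
    """Greedy draft-then-verify decoding; returns (tokens, verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1) Draft a block of tokens cheaply.
        drafted: List[Token] = []
        for _ in range(block):
            drafted.append(draft(out + drafted))
        # 2) Verify: keep the longest prefix the target model agrees with.
        accepted: List[Token] = []
        for tok in drafted:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch, drop the rest
                break
            accepted.append(tok)
        out.extend(accepted)
        rounds += 1
    return out[: len(prompt) + max_new], rounds


def greedy_decode(prompt: List[Token], target: NextFn, max_new: int = 12) -> List[Token]:
    """Plain token-by-token greedy decoding with the target model, for comparison."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


prompt = [3, 7]
spec, rounds = speculative_decode(prompt, draft_next, target_next)
print(spec == greedy_decode(prompt, target_next), rounds)
```

Because every accepted token is checked against the target model, the output matches plain greedy decoding, while the number of verification rounds is smaller than the number of generated tokens.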

Translation date: 2023-11-02 05:12:16 Publication date: 2023-10-30

# Treatment Learning Causal Transformer for Noisy Image Classification ( http://arxiv.org/abs/2203.15529v2 )

License: see the linked page

Chao-Han Huck Yang, I-Te Danny Hung, Yi-Chieh Liu, Pin-Yu Chen

Current top-notch deep learning (DL) based vision models are primarily based on exploring and exploiting the inherent correlations between training data samples and their associated labels. However, a known practical challenge is their degraded performance on "noisy" data, induced by different circumstances such as spurious correlations, irrelevant contexts, domain shift, and adversarial attacks. In this work, we incorporate the binary information of "existence of noise" as a treatment into image classification tasks to improve prediction accuracy by jointly estimating their treatment effects. Motivated by causal variational inference, we propose a transformer-based architecture, Treatment Learning Causal Transformer (TLT), that uses a latent generative model to estimate robust feature representations from the current observational input for noisy image classification. Depending on the estimated noise level (modeled as a binary treatment factor), TLT assigns the corresponding inference network, trained with the designed causal loss, for prediction. We also create new noisy image datasets incorporating a wide range of noise factors (e.g., object masking, style transfer, and adversarial perturbation) for performance benchmarking. The superior performance of TLT in noisy image classification is further validated by several refutation evaluation metrics. As a by-product, TLT also improves visual salience methods for perceiving noisy images.

Translation date: 2023-11-02 05:11:52 Publication date: 2023-10-30

# Cascaded Sparse Feature Propagation Network for Interactive Segmentation ( http://arxiv.org/abs/2203.05145v3 )

License: see the linked page

Chuyu Zhang, Chuanyang Hu, Hui Ren, Yongfei Liu, and Xuming He

We aim to tackle the problem of point-based interactive segmentation, in which the key challenge is to propagate the user-provided annotations to unlabeled regions efficiently. Existing methods tackle this challenge by utilizing computationally expensive fully connected graphs or transformer architectures that sacrifice important fine-grained information required for accurate segmentation. To overcome these limitations, we propose a cascaded sparse feature propagation network that learns a click-augmented feature representation for propagating user-provided information to unlabeled regions. The sparse design of our network enables efficient information propagation on high-resolution features, resulting in more detailed object segmentation. We validate the effectiveness of our method through comprehensive experiments on various benchmarks, and the results demonstrate the superior performance of our approach. Code is available at https://github.com/kleinzcy/CSFPN.

Translation date: 2023-11-02 05:11:20 Publication date: 2023-10-30

# Addition and Differentiation of ZX-diagrams ( http://arxiv.org/abs/2202.11386v3 )

License: see the linked page

Emmanuel Jeandel and Simon Perdrix and Margarita Veshchezerova

The ZX-calculus is a powerful framework for reasoning in quantum computing. In particular, it provides a compact representation of matrices of interest. A peculiar property of the ZX-calculus is the absence of a formal sum allowing linear combinations of arbitrary ZX-diagrams. The universality of the formalism guarantees, however, that for any two ZX-diagrams, the sum of their interpretations can be represented by a ZX-diagram. We introduce a general, inductive definition of the addition of ZX-diagrams, relying on the construction of controlled diagrams. Based on this addition technique, we provide an inductive differentiation of ZX-diagrams. Indeed, given a ZX-diagram with variables in the description of its angles, one can differentiate the diagram with respect to one of these variables. Differentiation is ubiquitous in quantum mechanics and quantum computing (e.g., for solving optimization problems). Technically, differentiation of ZX-diagrams is strongly related to summation, as witnessed by the product rules. We also introduce an alternative, non-inductive differentiation technique based instead on the isolation of the variables. Finally, we apply our results to deduce a diagram for an Ising Hamiltonian.

Translation date: 2023-11-02 05:10:26 Publication date: 2023-10-30

# Going Deeper into Recognizing Actions in Dark Environments: A Comprehensive Benchmark Study ( http://arxiv.org/abs/2202.09545v3 )

License: see the linked page

Yuecong Xu, Jianfei Yang, Haozhi Cao, Jianxiong Yin, Zhenghua Chen, Xiaoli Li, Zhengguo Li, Qianwen Xu

While action recognition (AR) has improved greatly with the introduction of large-scale video datasets and the development of deep neural networks, AR models robust to challenging environments in real-world scenarios are still under-explored. We focus on the task of action recognition in dark environments, which can be applied to fields such as surveillance and autonomous driving at night. Intuitively, current deep networks along with visual enhancement techniques should be able to handle AR in dark environments; however, it is observed that this is not always the case in practice. To dive deeper into exploring solutions for AR in dark environments, we launched the UG2+ Challenge Track 2 (UG2-2) at IEEE CVPR 2021, with the goal of evaluating and advancing the robustness of AR models in dark environments. The challenge builds on and expands the novel ARID dataset, the first dataset for the task of dark video AR, and guides models to tackle this task in both fully and semi-supervised manners. Baseline results utilizing current AR models and enhancement methods are reported, confirming the challenging nature of this task, with substantial room for improvement. Thanks to the active participation of the research community, notable advances have been made in participants' solutions, while analysis of these solutions helped better identify possible directions for tackling the challenge of AR in dark environments.

Translation date: 2023-11-02 05:09:47 Publication date: 2023-10-30

# Improving Molecular Representation Learning with Metric Learning-enhanced Optimal Transport ( http://arxiv.org/abs/2202.06208v3 )

License: see the linked page

Fang Wu, Nicolas Courty, Shuting Jin, Stan Z. Li

Training data are usually limited or heterogeneous in many chemical and biological applications. Existing machine learning models for chemistry and materials science fail to consider generalizing beyond training domains. In this article, we develop a novel optimal transport-based algorithm termed MROT to enhance their generalization capability for molecular regression problems. MROT learns a continuous label of the data by measuring a new metric of domain distances and a posterior variance regularization over the transport plan to bridge the chemical domain gap. Among downstream tasks, we consider basic chemical regression tasks in unsupervised and semi-supervised settings, including chemical property prediction and materials adsorption selection. Extensive experiments show that MROT significantly outperforms state-of-the-art models, showing promising potential for accelerating the discovery of new substances with desired properties.

Translation date: 2023-11-02 05:09:22 Publication date: 2023-10-30

# Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement ( http://arxiv.org/abs/2202.00011v3 )

License: see the linked page

Max Ehrlich, Jon Barker, Namitha Padmanabhan, Larry Davis, Andrew Tao, Bryan Catanzaro, Abhinav Shrivastava

Video compression is a central feature of the modern internet, powering technologies from social media to video conferencing. While video compression continues to mature, for many compression settings quality loss is still noticeable. These settings nevertheless have important applications to the efficient transmission of videos over bandwidth-constrained or otherwise unstable connections. In this work, we develop a deep learning architecture capable of restoring detail to compressed videos, which leverages the underlying structure and motion information embedded in the video bitstream. We show that this improves restoration accuracy compared to prior compression correction methods and is competitive with recent deep-learning-based video compression methods on rate-distortion while achieving higher throughput. Furthermore, we condition our model on quantization data, which is readily available in the bitstream. This allows our single model to handle a variety of different compression quality settings, which required an ensemble of models in prior work.

Translation date: 2023-11-02 05:09:09 Publication date: 2023-10-30

# Learning with Subset Stacking ( http://arxiv.org/abs/2112.06251v3 )

License: see the linked page

S. İlker Birbil, Sinan Yildirim, Kaya Gökalp, M. Hakan Akyüz

We propose a new regression algorithm that learns from a set of input-output pairs. Our algorithm is designed for populations where the relation between the input variables and the output variable exhibits heterogeneous behavior across the predictor space. The algorithm starts by generating subsets that are concentrated around random points in the input space. This is followed by training a local predictor for each subset. Those predictors are then combined in a novel way to yield an overall predictor. We call this algorithm "LEarning with Subset Stacking", or LESS, due to its resemblance to the method of stacking regressors. We compare the testing performance of LESS with state-of-the-art methods on several datasets. Our comparison shows that LESS is a competitive supervised learning method. Moreover, we observe that LESS is also efficient in terms of computation time and allows a straightforward parallel implementation.
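The three steps in this abstract (generate subsets around random points, train a local predictor per subset, combine the local predictors) can be sketched as follows. This is a rough illustration only, assuming one-dimensional inputs, nearest-neighbour subsets, local least-squares lines, and a distance-weighted average as the combination rule. The combination rule and all names (`fit_line`, `fit_less`, `predict_less`) are assumptions made for the example; the paper's own combination step is not reproduced.

```python
import math
import random


def fit_line(points):
    """Ordinary least-squares line y = a + b*x through (x, y) pairs; returns (a, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    b = sum((x - mx) * (y - my) for x, y in points) / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def fit_less(data, n_subsets=8, subset_size=15, seed=0):
    """Fit one local line per subset centred on a randomly chosen sample."""
    rng = random.Random(seed)
    models = []
    for cx, _ in rng.sample(data, n_subsets):
        # Subset = the samples closest to the chosen centre.
        subset = sorted(data, key=lambda p: abs(p[0] - cx))[:subset_size]
        models.append((cx, fit_line(subset)))
    return models


def predict_less(models, x, bandwidth=0.5):
    """Combine local predictions, weighting each by closeness of x to its centre."""
    weights = [math.exp(-((x - cx) / bandwidth) ** 2) for cx, _ in models]
    preds = [a + b * x for _, (a, b) in models]
    total = sum(weights)
    if total == 0.0:  # x is far from every centre: fall back to a plain average
        return sum(preds) / len(preds)
    return sum(w * p for w, p in zip(weights, preds)) / total


# Heterogeneous relation: the slope changes sign at x = 0.
xs = [i / 10 for i in range(-30, 31)]
data = [(x, abs(x)) for x in xs]
models = fit_less(data)
print(predict_less(models, -2.0), predict_less(models, 2.0))
```

Each local fit depends only on its own subset, so the loop in `fit_less` could be run in parallel, which is consistent with the abstract's remark about a straightforward parallel implementation.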

Translation date: 2023-11-02 05:08:41 Publication date: 2023-10-30

# Lipschitz Bandits with Batched Feedback ( http://arxiv.org/abs/2110.09722v6 )

License: see the linked page

Yasong Feng, Zengfeng Huang, Tianyu Wang

In this paper, we study Lipschitz bandit problems with batched feedback, where the expected reward is Lipschitz and the reward observations are communicated to the player in batches. We introduce a novel landscape-aware algorithm, called Batched Lipschitz Narrowing (BLiN), that optimally solves this problem. Specifically, we show that for a $T$-step problem with Lipschitz reward of zooming dimension $d_z$, our algorithm achieves the theoretically optimal (up to logarithmic factors) regret rate $\widetilde{\mathcal{O}}\left(T^{\frac{d_z+1}{d_z+2}}\right)$ using only $\mathcal{O}\left(\log\log T\right)$ batches. We also provide a complexity analysis for this problem. Our theoretical lower bound implies that $\Omega(\log\log T)$ batches are necessary for any algorithm to achieve the optimal regret. Thus, BLiN achieves the optimal regret rate (up to logarithmic factors) using minimal communication.
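As a concrete reading of the stated rate (an illustrative instance, not a result taken from the paper), setting the zooming dimension to $d_z = 1$ or $d_z = 2$ gives:

```latex
\[
\widetilde{\mathcal{O}}\!\left(T^{\frac{d_z+1}{d_z+2}}\right)\Big|_{d_z=1}
  = \widetilde{\mathcal{O}}\!\left(T^{2/3}\right),
\qquad
\widetilde{\mathcal{O}}\!\left(T^{\frac{d_z+1}{d_z+2}}\right)\Big|_{d_z=2}
  = \widetilde{\mathcal{O}}\!\left(T^{3/4}\right).
\]
```

The exponent $(d_z+1)/(d_z+2)$ approaches 1 as $d_z$ grows, so the guarantee weakens in higher zooming dimension. The batch count $\mathcal{O}(\log\log T)$ grows very slowly with $T$: for $T = 10^6$ and natural logarithms, $\log\log T \approx 2.6$, ignoring the constant hidden in the $\mathcal{O}$ notation.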

Translation date: 2023-11-02 05:07:36 Publication date: 2023-10-30

# Robustness Evaluation of Deep Unsupervised Learning Algorithms for Intrusion Detection Systems ( http://arxiv.org/abs/2207.03576v2 )

License: see the linked page

D'Jeff Kanda Nkashama, Arian Soltani, Jean-Charles Verdier, Marc Frappier, Pierre-Martin Tardif, Froduald Kabanza

Recently, advances in deep learning have been observed in various fields, including computer vision, natural language processing, and cybersecurity. Machine learning (ML) has demonstrated its ability as a potential tool for anomaly detection-based intrusion detection systems to build secure computer networks. ML approaches are increasingly adopted over heuristic approaches for cybersecurity because they learn directly from data. Data is critical for the development of ML systems and is therefore a potential target for attackers. Data poisoning or contamination is one of the most common techniques used to fool ML models through data. This paper evaluates the robustness of six recent deep learning algorithms for intrusion detection on contaminated data. Our experiments suggest that the state-of-the-art algorithms used in this study are sensitive to data contamination and reveal the importance of self-defense against data perturbation when developing novel models, especially for intrusion detection systems.

Translation date: 2023-11-02 05:00:16 Publication date: 2023-10-30

# Augmentation-Aware Self-Supervision for Data-Efficient GAN Training ( http://arxiv.org/abs/2205.15677v3 )

License: see the linked page

Liang Hou, Qi Cao, Yige Yuan, Songtao Zhao, Chongyang Ma, Siyuan Pan, Pengfei Wan, Zhongyuan Wang, Huawei Shen, Xueqi Cheng

Training generative adversarial networks (GANs) with limited data is challenging because the discriminator is prone to overfitting. Previously proposed differentiable augmentation demonstrates improved data efficiency of training GANs. However, the augmentation implicitly introduces undesired invariance to augmentation for the discriminator, since it ignores the change of semantics in the label space caused by data transformation, which may limit the representation learning ability of the discriminator and ultimately affect the generative modeling performance of the generator. To mitigate the negative impact of invariance while inheriting the benefits of data augmentation, we propose a novel augmentation-aware self-supervised discriminator that predicts the augmentation parameter of the augmented data. In particular, the prediction targets of real data and generated data are required to be distinguished, since they are different during training. We further encourage the generator to learn adversarially from the self-supervised discriminator by generating augmentation-predictable real and not fake data. This formulation connects the learning objective of the generator and the arithmetic-harmonic mean divergence under certain assumptions. We compare our method with state-of-the-art (SOTA) methods using the class-conditional BigGAN and unconditional StyleGAN2 architectures on data-limited CIFAR-10, CIFAR-100, FFHQ, LSUN-Cat, and five low-shot datasets. Experimental results demonstrate significant improvements of our method over SOTA methods in training data-efficient GANs.

Translation date: 2023-11-02 04:57:31 Publication date: 2023-10-30

# Exact solution for the quantum and private capacities of bosonic dephasing channels ( http://arxiv.org/abs/2205.05736v2 )

License: see the linked page

Ludovico Lami, Mark M. Wilde

The capacities of noisy quantum channels capture the ultimate rates of information transmission across quantum communication lines, and the quantum capacity plays a key role in determining the overhead of fault-tolerant quantum computation platforms. In the case of bosonic systems, central to many applications, no closed formulas for these capacities were known for bosonic dephasing channels, a key class of non-Gaussian channels modelling, e.g., noise affecting superconducting circuits or fiber-optic communication channels. Here we provide the first exact calculation of the quantum, private, two-way assisted quantum, and secret-key agreement capacities of all bosonic dephasing channels. We prove that they are equal to the relative entropy of the distribution underlying the channel to the uniform distribution. Our result solves a problem that has been open for over a decade, having been posed originally by [Jiang & Chen, Quantum and Nonlinear Optics 244, 2010].

Translation date: 2023-11-02 04:57:04 Publication date: 2023-10-30

# INSPIRE: Distributed Bayesian Optimization for ImproviNg SPatIal REuse in Dense WLANs ( http://arxiv.org/abs/2204.10184v3 )

License: see the linked page

Anthony Bardou, Thomas Begin

(参考訳) 有線ネットワークを抜いてデバイスをインターネットに接続する主要な手段となったWLANは、無線帯域の空間不足により性能上の問題が発生する傾向にある。応答として、IEEE 802.11axとその後の修正は、送信電力(TX_POWER)と感度閾値(OBSS_PD)の2つのキーパラメータの動的更新を可能にすることで、無線チャネルの空間的再利用を高めることを目的としている。本稿では,WLANにおける空間再利用を改善するために,ガウス過程に基づく局所ベイズ最適化を行う分散ソリューションINSPIREを提案する。 INSPIREは、WLANのトポロジについて明確な仮定をせず、アクセスポイントの利他的振る舞いを好んでおり、それによって、WLANの"より優れた"ために、TX_POWERとOBSS_PDパラメータの適切な構成を見つけることができる。我々は,ns-3シミュレータを用いた他の最先端戦略よりもINSPIREの方が優れていることを示す。この結果から,INSPIREは,その公平性とスループットを向上することにより,運用用WLANのサービス品質を大幅に向上させることができることがわかった。

WLANs, which have overtaken wired networks to become the primary means of connecting devices to the Internet, are prone to performance issues due to the scarcity of space in the radio spectrum. As a response, IEEE 802.11ax and subsequent amendments aim at increasing the spatial reuse of a radio channel by allowing the dynamic update of two key parameters in wireless transmission: the transmission power (TX_POWER) and the sensitivity threshold (OBSS_PD). In this paper, we present INSPIRE, a distributed solution performing local Bayesian optimizations based on Gaussian processes to improve the spatial reuse in WLANs. INSPIRE makes no explicit assumptions about the topology of WLANs and favors altruistic behaviors of the access points, leading them to find adequate configurations of their TX_POWER and OBSS_PD parameters for the "greater good" of the WLANs. We demonstrate the superiority of INSPIRE over other state-of-the-art strategies using the ns-3 simulator and two examples inspired by real-life deployments of dense WLANs. Our results show that, in only a few seconds, INSPIRE is able to drastically increase the quality of service of operational WLANs by improving their fairness and throughput.

翻訳日:2023-11-02 04:55:57 公開日:2023-10-30

# 脳電図を用いたスケーラブルな機械学習モデルによる眠気検出性能の検討

Studying Drowsiness Detection Performance while Driving through Scalable Machine Learning Models using Electroencephalography ( http://arxiv.org/abs/2209.04048v3 )

ライセンス: Link先を確認

José Manuel Hidalgo Rogel, Enrique Tomás Martínez Beltrán, Mario Quiles Pérez, Sergio López Bernal, Gregorio Martínez Pérez, Alberto Huertas Celdrán

(参考訳) 背景 / 導入: ドライバーの眠気は重要な関心事であり、交通事故の主な原因の1つです。認知神経科学とコンピュータ科学の進歩により、Brain-Computer Interfaces (BCI) と Machine Learning (ML) を用いたドライバーの眠気の検出が可能になった。しかし,不均質なMLアルゴリズムを用いた快適度検出性能の総合評価には欠けており,対象者のグループに適したスケーラブルなMLモデルの性能について検討する必要がある。方法:これらの制約に対処するため、この研究はBCIを用いたインテリジェントな枠組みを示し、脳波に基づいて運転シナリオの眠気を検出する。 SEED-VIGデータセットは、個人とグループにとって最高のパフォーマンスモデルを評価するために使用される。結果: ランダムフォレスト (RF) は,SVM (Support Vector Machine) などの文献において,個々のモデルに対して78%のf1スコアで,他のモデルよりも優れていた。スケーラブルモデルに関して、RFは79%のf1スコアに達し、これらのアプローチの有効性を実証した。本論文は,多種多様なmlアルゴリズムと,被検者の集団が眠気検出システムを改善し,最終的には運転者の疲労による事故数を減らすのに適したスケーラブルなアプローチを検討することの関連性を強調する。結論:本研究から得られた教訓は,SVMだけでなく,文献で十分に調査されていない他のモデルも,眠気検出に関係していることを示している。さらに,新しい被験者が評価された場合でも,スケーラブルなアプローチは眠気の検出に有効である。そこで,提案フレームワークは,BCIとMLを用いた運転シナリオの眠気を検出する新しい手法を提案する。

- Background / Introduction: Driver drowsiness is a significant concern and one of the leading causes of traffic accidents. Advances in cognitive neuroscience and computer science have enabled the detection of drivers' drowsiness using Brain-Computer Interfaces (BCIs) and Machine Learning (ML). However, the literature lacks a comprehensive evaluation of drowsiness detection performance using a heterogeneous set of ML algorithms, and it is necessary to study the performance of scalable ML models suitable for groups of subjects. - Methods: To address these limitations, this work presents an intelligent framework employing BCIs and features based on electroencephalography for detecting drowsiness in driving scenarios. The SEED-VIG dataset is used to evaluate the best-performing models for individual subjects and groups. - Results: Results show that Random Forest (RF) outperformed other models used in the literature, such as Support Vector Machine (SVM), with a 78% f1-score for individual models. Regarding scalable models, RF reached a 79% f1-score, demonstrating the effectiveness of these approaches. This publication highlights the relevance of exploring a diverse set of ML algorithms and scalable approaches suitable for groups of subjects to improve drowsiness detection systems and ultimately reduce the number of accidents caused by driver fatigue. - Conclusions: The lessons learned from this study show that not only SVM but also other models not sufficiently explored in the literature are relevant for drowsiness detection. Additionally, scalable approaches are effective in detecting drowsiness, even when new subjects are evaluated. Thus, the proposed framework presents a novel approach for detecting drowsiness in driving scenarios using BCIs and ML.

翻訳日:2023-11-02 04:45:53 公開日:2023-10-30
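
The drowsiness results above are reported as F1-scores. As a reminder of what that metric measures, here is a minimal sketch for a binary labeling; the label encoding (1 = drowsy, 0 = alert) and the example vectors are hypothetical and are not taken from the paper or the SEED-VIG dataset.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 = drowsy, 0 = alert.
example_f1 = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```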

# OOV-STR用視覚言語適応型相互デコーダ

Vision-Language Adaptive Mutual Decoder for OOV-STR ( http://arxiv.org/abs/2209.00859v2 )

ライセンス: Link先を確認

Jinshui Hu, Chenyu Liu, Qiandong Yan, Xuyang Zhu, Jiajia Wu, Jun Du, Lirong Dai

(参考訳) 近年の研究では、語彙(IV)シーンのテキスト認識に共通する深層学習モデルが大きな成功を収めている。しかし、現実のシナリオでは、語彙外(oov)の単語は非常に重要であり、sota認識モデルは通常、oovの設定で性能が悪い。学習言語がOOVプリフォームを制限していたという直感に触発されて、視覚言語適応型相互デコーダ(VLAMD)というフレームワークを設計し、OOVの問題に部分的に対処する。 VLAMDは3つの主要なコンポンジェントから構成される。まず,2つの視覚のみのモジュールを適応的に結合したアテンションベースLSTMデコーダを構築し,視覚言語によるバランスの取れたメインブランチを生成する。次に,共通視覚および言語先行表現学習のための補助的クエリベース自己回帰トランスフォーマ復号ヘッドを追加する。最後に、これらの2つの設計を、より多様な言語モデリングのための双方向トレーニングと組み合わせ、より堅牢な結果を得るために相互に逐次復号を行う。提案手法は,ECCV 2022 TiE Workshop の OOV-ST Challenge において,IV+OOV と OOV の設定に対して,70.31\% と59.61\% の単語精度を達成した。

Recent works have shown huge success of deep learning models for common in-vocabulary (IV) scene text recognition. However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance, and SOTA recognition models usually perform poorly in OOV settings. Inspired by the intuition that the learned language prior has limited OOV performance, we design a framework named Vision Language Adaptive Mutual Decoder (VLAMD) to partly tackle OOV problems. VLAMD consists of three main components. Firstly, we build an attention-based LSTM decoder with two adaptively merged visual-only modules, yielding a vision-language balanced main branch. Secondly, we add an auxiliary query-based autoregressive transformer decoding head for common visual and language prior representation learning. Finally, we couple these two designs with bidirectional training for more diverse language modeling, and perform mutual sequential decoding to obtain more robust results. Our approach achieved 70.31% and 59.61% word accuracy on the IV+OOV and OOV settings, respectively, in the Cropped Word Recognition Task of the OOV-ST Challenge at the ECCV 2022 TiE Workshop, where we took 1st place in both settings.

翻訳日:2023-11-02 04:45:05 公開日:2023-10-30

# 反応に対する一般介入による不変表現の学習

Learning Invariant Representations under General Interventions on the Response ( http://arxiv.org/abs/2208.10027v3 )

ライセンス: Link先を確認

Kang Du and Yu Xiang

(参考訳) 近年、異なる環境から特徴と応答のペアを観察することが一般的になっている。その結果、分散シフトによって異なる分布を持つデータに学習した予測器を適用する必要がある。 1つの原理的なアプローチは、トレーニングとテストモデルを記述するために構造因果モデルを採用することである。しかし、この原則は、応答がインターバルされたときに実践的な設定で違反する可能性がある。自然の疑問は、目に見えない環境で予測を促進するために他の形の不変性を特定することができるかどうかである。そこで本研究では, 線形構造因果モデル (SCM) に焦点をあて, 付加的な特徴を通じて介入を捕捉する明示的な関係性である不変マッチング特性 (IMP) を導入し, 応答に対する一般的な介入と予測器の統一的な処理を可能にする, 新たな不変形の不変性を実現する。本手法の漸近的一般化誤差を離散的および連続的な環境条件下で解析し,半パラメトリック変動係数モデルに関連付けて連続ケースを処理した。新型コロナウイルスのデータセットを含む様々な実験環境において,既存の手法と比較して競合性能を示すアルゴリズムを提案する。

It has become increasingly common nowadays to collect observations of feature and response pairs from different environments. As a consequence, one has to apply learned predictors to data with a different distribution due to distribution shifts. One principled approach is to adopt the structural causal models to describe training and test models, following the invariance principle which says that the conditional distribution of the response given its predictors remains the same across environments. However, this principle might be violated in practical settings when the response is intervened. A natural question is whether it is still possible to identify other forms of invariance to facilitate prediction in unseen environments. To shed light on this challenging scenario, we focus on linear structural causal models (SCMs) and introduce invariant matching property (IMP), an explicit relation to capture interventions through an additional feature, leading to an alternative form of invariance that enables a unified treatment of general interventions on the response as well as the predictors. We analyze the asymptotic generalization errors of our method under both the discrete and continuous environment settings, where the continuous case is handled by relating it to the semiparametric varying coefficient models. We present algorithms that show competitive performance compared to existing methods over various experimental settings including a COVID dataset.

翻訳日:2023-11-02 04:44:38 公開日:2023-10-30

# 開量子系の緩和における時間的臨界スケーリングの適応

Indication of critical scaling in time during the relaxation of an open quantum system ( http://arxiv.org/abs/2208.05164v2 )

ライセンス: Link先を確認

Ling-Na Wu, Jens Nettersheim, Julian Feß, Alexander Schnell, Sabrina Burgardt, Silvia Hiebel, Daniel Adam, André Eckardt and Artur Widera

(参考訳) 相転移は、温度や外部磁場のような連続的な制御パラメータに対応する物理系の特異な挙動に対応する。相関長の発散に伴う連続相転移付近で, 微視的系詳細に依存しない臨界指数を持つ普遍的パワーロースケーリング挙動が見いだされる。近年、動的量子相転移と普遍的スケーリングが予測され、クエンチ後の孤立量子系の非平衡力学でも観察され、制御パラメーターの役割は時間とともに果たされた。しかしながら、環境への散逸的な接触によって力学が駆動されるオープンシステムにおいて、そのような臨界現象の時相のシグネチャは、これまでになく明白であった。本稿では,混合状態によって記述された開量子系の緩和ダイナミクスにおいて,時間に対する臨界スケーリングも起こりうることを示す。ルビジウム原子の超低温ボースガスへのスピン交換による散逸結合によって誘導される個々のセシウム原子の大きな原子スピンの緩和ダイナミクスを実験的に測定した。初期状態が平衡から遠い場合、スピン状態のエントロピーは時間内にピークに達し、その最大値に過渡的に近づき、最終的にその低い平衡値に緩和される。さらに,数値シミュレーションに基づく有限次元スケーリング解析により,大きなシステムサイズ限界における散逸系の時間に関する臨界点に対応することを示した。臨界時刻における特徴的長さのばらつきによって信号が伝達され、システムの詳細とは独立な臨界指数によって特徴づけられる。

Phase transitions correspond to the singular behavior of physical systems in response to continuous control parameters like temperature or external fields. Near continuous phase transitions, associated with the divergence of a correlation length, universal power-law scaling behavior with critical exponents independent of microscopic system details is found. Recently, dynamical quantum phase transitions and universal scaling have been predicted and also observed in the non-equilibrium dynamics of isolated quantum systems after a quench, with time playing the role of the control parameter. However, signatures of such critical phenomena in time in open systems, whose dynamics is driven by the dissipative contact to an environment, were so far elusive. Here, we present results indicating that critical scaling with respect to time can also occur during the relaxation dynamics of an open quantum system described by mixed states. We experimentally measure the relaxation dynamics of the large atomic spin of individual Caesium atoms induced by the dissipative coupling via spin-exchange processes to an ultracold Bose gas of Rubidium atoms. For initial states far from equilibrium, the entropy of the spin state is found to peak in time, transiently approaching its maximum possible value, before eventually relaxing to its lower equilibrium value. Moreover, a finite-size scaling analysis based on numerical simulations shows that it corresponds to a critical point with respect to time of the dissipative system in the limit of large system sizes. It is signalled by the divergence of a characteristic length at a critical time, characterized by critical exponents that are found to be independent of system details.

翻訳日:2023-11-02 04:43:45 公開日:2023-10-30

# 20量子ビット量子シミュレータの複素状態再構成

Reconstructing complex states of a 20-qubit quantum simulator ( http://arxiv.org/abs/2208.04862v3 )

ライセンス: Link先を確認

Murali K. Kurmapu, V.V. Tiunova, E.S. Tiunov, Martin Ringbauer, Christine Maier, Rainer Blatt, Thomas Monz, Aleksey K. Fedorov, A.I. Lvovsky

(参考訳) 量子コンピュータとシミュレーターの開発に成功するための前提条件は、それらが生成する量子状態を測定することによって得られる物理的過程の正確な理解である。しかしながら、従来の量子状態推定に必要なリソースは、システムサイズと指数関数的にスケールし、代替アプローチの必要性を強調している。ここでは、大きく絡み合った多ビット量子状態の効率的な再構成法を示す。行列積状態 ansatz の変分バージョンを用いて、20量子ビットのトラップイオンイジング型量子シミュレータで生成された量子状態のトモグラフィー(純状態近似)を行い、各基底で1000個の測定値を持つ27塩基で取得したデータを用いた。我々は、ニューラルネットワークの量子状態表現に基づく手法と比較して、優れた状態再構成品質とより高速な収束を観察する:制限ボルツマンマシンと自己回帰アーキテクチャを備えたフィードフォワードニューラルネットワーク。本研究では,多体量子系のクエンチダイナミクスによって生成される複素状態の効率的な実験的キャラクタリゼーションへの道を開く。

A prerequisite to the successful development of quantum computers and simulators is precise understanding of physical processes occurring therein, which can be achieved by measuring the quantum states they produce. However, the resources required for traditional quantum-state estimation scale exponentially with the system size, highlighting the need for alternative approaches. Here we demonstrate an efficient method for reconstruction of significantly entangled multi-qubit quantum states. Using a variational version of the matrix product state ansatz, we perform the tomography (in the pure-state approximation) of quantum states produced in a 20-qubit trapped-ion Ising-type quantum simulator, using the data acquired in only 27 bases with 1000 measurements in each basis. We observe superior state reconstruction quality and faster convergence compared to the methods based on neural network quantum state representations: restricted Boltzmann machines and feedforward neural networks with autoregressive architecture. Our results pave the way towards efficient experimental characterization of complex states produced by the quench dynamics of many-body quantum systems.

翻訳日:2023-11-02 04:43:24 公開日:2023-10-30

# Rényiのシャッフルによるより強力なプライバシー増幅と近似微分プライバシー

Stronger Privacy Amplification by Shuffling for Rényi and Approximate Differential Privacy ( http://arxiv.org/abs/2208.04591v2 )

ライセンス: Link先を確認

Vitaly Feldman and Audra McMillan and Kunal Talwar

(参考訳) 差分プライバシーのシャッフルモデルは、標準的なローカルモデルと中央モデル(EFMRTT19; CSUZZ19)の中間信頼モデルとして注目されている。このモデルの主な結果は、ランダムにランダムにランダムにデータをシャッフルすることで、差分プライバシーの保証を増幅する。このような増幅は、データが匿名で貢献されるシステムにとって、はるかに強力なプライバシー保証を意味する[BEMMRLRKTS17]。本研究では,理論と数値の両方で結果のシャッフルを行うことで,美術プライバシ増幅の状況を改善する。最初の貢献は、ldpランダム化器のシャッフル出力に対するRényi微分プライバシーパラメータの漸近的最適解析である。第2の貢献は、シャッフルによるプライバシーの増幅に関する新たな分析です。この分析は[FMT20]の技法を改良し、全てのパラメータ設定においてより厳密な数値境界をもたらす。

The shuffle model of differential privacy has gained significant interest as an intermediate trust model between the standard local and central models [EFMRTT19; CSUZZ19]. A key result in this model is that randomly shuffling locally randomized data amplifies differential privacy guarantees. Such amplification implies substantially stronger privacy guarantees for systems in which data is contributed anonymously [BEMMRLRKTS17]. In this work, we improve the state-of-the-art results on privacy amplification by shuffling, both theoretically and numerically. Our first contribution is the first asymptotically optimal analysis of the Rényi differential privacy parameters for the shuffled outputs of LDP randomizers. Our second contribution is a new analysis of privacy amplification by shuffling. This analysis improves on the techniques of [FMT20] and leads to tighter numerical bounds in all parameter settings.

翻訳日:2023-11-02 04:43:09 公開日:2023-10-30

# ギャップ量子多体系からの急速に混合されたマルコフ鎖

A rapidly mixing Markov chain from any gapped quantum many-body system ( http://arxiv.org/abs/2207.07044v2 )

ライセンス: Link先を確認

Sergey Bravyi, Giuseppe Carleo, David Gosset, Yinchen Liu

(参考訳) 分布 $\pi(x)=|\langle x|\psi\rangle|^2$ からビット文字列 $x$ をサンプリングする計算タスクを考える。我々の主な結果は、逆スペクトルギャップの$H$と、関連する連続時間マルコフ連鎖と定常状態の$\pi$との混合時間との直接リンクを記述する。マルコフ連鎖は、基底状態振幅の比$\langle y|\psi\rangle/\langle x|\psi\rangle$が効率良く計算可能であり、$H$のスペクトルギャップはシステムサイズにおける少なくとも逆多項式であり、連鎖の開始状態は、効率よくチェックできる穏やかな技術的条件を満たす。これは、サインプロブレム自由ハミルトニアンとマルコフ連鎖の間の既知の関係を拡張する。この一般化を可能にするツールは、フェルミオン符号問題に対処するために以前は量子モンテカルロシミュレーションで使われていたいわゆる固定ノードハミルトン構成である。提案したサンプリングアルゴリズムを数値的に実装し,56量子ビットのHaldane-Shastry Hamiltonian基底状態からサンプリングする。我々は、固定ノードハミルトニアンに基づくマルコフ連鎖が標準のメトロポリス・ハスティングス・マルコフ連鎖よりも高速に混合されることを経験的に観察する。

We consider the computational task of sampling a bit string $x$ from a distribution $\pi(x)=|\langle x|\psi\rangle|^2$, where $\psi$ is the unique ground state of a local Hamiltonian $H$. Our main result describes a direct link between the inverse spectral gap of $H$ and the mixing time of an associated continuous-time Markov Chain with steady state $\pi$. The Markov Chain can be implemented efficiently whenever ratios of ground state amplitudes $\langle y|\psi\rangle/\langle x|\psi\rangle$ are efficiently computable, the spectral gap of $H$ is at least inverse polynomial in the system size, and the starting state of the chain satisfies a mild technical condition that can be efficiently checked. This extends a previously known relationship between sign-problem free Hamiltonians and Markov chains. The tool which enables this generalization is the so-called fixed-node Hamiltonian construction, previously used in Quantum Monte Carlo simulations to address the fermionic sign problem. We implement the proposed sampling algorithm numerically and use it to sample from the ground state of Haldane-Shastry Hamiltonian with up to 56 qubits. We observe empirically that our Markov chain based on the fixed-node Hamiltonian mixes more rapidly than the standard Metropolis-Hastings Markov chain.

翻訳日:2023-11-02 04:42:47 公開日:2023-10-30
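
The Markov chain abstract above compares against a standard Metropolis-Hastings chain and notes that only ratios of ground-state amplitudes are needed. The sketch below illustrates that baseline, not the authors' fixed-node construction: it samples bit strings from $\pi(x)=|\langle x|\psi\rangle|^2$ using single-bit-flip proposals. The two-qubit amplitude table `TOY_PSI` and the helper `toy_ratio` are hypothetical stand-ins for an efficiently computable amplitude ratio.

```python
import random

def metropolis_bitstrings(amplitude_ratio, start, n_steps, seed=0):
    """Sample bit strings from pi(x) = |psi(x)|^2 using only amplitude ratios."""
    rng = random.Random(seed)
    x = tuple(start)
    samples = []
    for _ in range(n_steps):
        k = rng.randrange(len(x))              # symmetric proposal: flip one bit
        y = x[:k] + (1 - x[k],) + x[k + 1:]
        ratio = amplitude_ratio(y, x)          # psi(y) / psi(x)
        if rng.random() < min(1.0, abs(ratio) ** 2):
            x = y                              # accept the proposed string
        samples.append(x)
    return samples

# Hypothetical 2-qubit state with one negative amplitude; the squares sum to 1.
TOY_PSI = {(0, 0): 0.8, (0, 1): 0.4, (1, 0): -0.4, (1, 1): 0.2}

def toy_ratio(y, x):
    return TOY_PSI[y] / TOY_PSI[x]

samples = metropolis_bitstrings(toy_ratio, start=(0, 0), n_steps=50000, seed=1)
frequency_00 = samples.count((0, 0)) / len(samples)  # target probability is 0.64
```

Because the acceptance rule depends only on the squared magnitude of the ratio, the sign of an amplitude does not affect this baseline chain.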

# 大規模言語モデルの概念支援型デバイアス

Conceptor-Aided Debiasing of Large Language Models ( http://arxiv.org/abs/2211.11087v3 )

ライセンス: Link先を確認

Li S. Yifei, Lyle Ungar, João Sedoc

(参考訳) 事前訓練された大規模言語モデル(LLM)は、トレーニングコーパスの社会的バイアスを反映している。この問題を軽減するために多くの方法が提案されているが、デビアスに失敗したり、モデルの精度を犠牲にしたりすることが多い。我々は,BERT や GPT などの LLM のバイアス部分空間を同定し,除去するためのソフトプロジェクション手法である概念を用いた。提案手法は, コンセプタ非操作による後処理によるバイアス部分空間投影と, (2) トレーニング中のすべてのレイヤにコンセプタ投影を明示的に組み込む新しいアーキテクチャであるconceptor-intervened bert (ci-bert) を提案する。 GLUEベンチマークでは,LLMの性能を維持しつつ,最先端(SoTA)のデバイアス結果を実現する。さらに、様々なシナリオにおいてロバストであり、既存のバイアス部分空間上のAND演算により交差点バイアスを効率的に緩和することができる。 CI-BERTのトレーニングはすべてのレイヤのバイアスを考慮に入れ、バイアス軽減で後処理に勝てるが、CI-BERTは言語モデルの精度を低下させる。また,バイアス部分空間を慎重に構築することの重要性を示す。最善の結果は、偏りのある単語のリストから外れ値を削除し、それらを(or操作によって)組み合わせ、それらの埋め込みをよりクリーンなコーパスから計算することで得られる。

Pre-trained large language models (LLMs) reflect the inherent social biases of their training corpus. Many methods have been proposed to mitigate this issue, but they often fail to debias or they sacrifice model accuracy. We use conceptors, a soft projection method, to identify and remove the bias subspace in LLMs such as BERT and GPT. We propose two methods of applying conceptors: (1) bias subspace projection by post-processing with the conceptor NOT operation; and (2) a new architecture, conceptor-intervened BERT (CI-BERT), which explicitly incorporates the conceptor projection into all layers during training. We find that conceptor post-processing achieves state-of-the-art (SoTA) debiasing results while maintaining LLMs' performance on the GLUE benchmark. Further, it is robust in various scenarios and can mitigate intersectional bias efficiently by its AND operation on the existing bias subspaces. Although CI-BERT's training takes all layers' bias into account and can beat its post-processing counterpart in bias mitigation, CI-BERT reduces the language model accuracy. We also show the importance of carefully constructing the bias subspace. The best results are obtained by removing outliers from the list of biased words, combining them (via the OR operation), and computing their embeddings using the sentences from a cleaner corpus.

翻訳日:2023-11-02 04:35:28 公開日:2023-10-30
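
The conceptor abstract above refers to NOT, AND, and OR operations on conceptors. For orientation, these are the definitions commonly used in the conceptor literature; they are not quoted from this abstract. The symbols are assumptions for illustration: $X$ is a matrix whose $n$ columns are embeddings of bias-related words, $\alpha$ is the aperture hyperparameter, and the AND formula as written assumes $C$ and $B$ are invertible.

```latex
\begin{aligned}
R &= \tfrac{1}{n} X X^{\top}, \qquad C = R\,(R + \alpha^{-2} I)^{-1},\\
\neg C &= I - C,\\
C \wedge B &= \left(C^{-1} + B^{-1} - I\right)^{-1},\\
C \vee B &= \neg\left(\neg C \wedge \neg B\right),\\
x_{\text{debiased}} &= (\neg C)\,x = (I - C)\,x .
\end{aligned}
```

Read this way, post-processing with the NOT operation softly damps the directions in which the bias-word embeddings vary most, rather than removing a hard subspace.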

# MLIC:学習画像圧縮のためのマルチ参照エントロピーモデル

MLIC: Multi-Reference Entropy Model for Learned Image Compression ( http://arxiv.org/abs/2211.07273v7 )

ライセンス: Link先を確認

Wei Jiang, Jiayu Yang, Yongqi Zhai, Peirong Ning, Feng Gao, Ronggang Wang

(参考訳) 近年,学習画像の圧縮性能は著しく向上している。潜在表現の分布を推定するエントロピーモデルは、速度分散性能の向上に重要な役割を果たしている。しかし、ほとんどのエントロピーモデルは1次元の相関のみを捉えるが、潜在表現はチャネル回り、局所空間、大域的な空間相関を含む。この問題に対処するため、Multi-Reference Entropy Model (MEM) と高度なバージョンMEM$^+$を提案する。これらのモデルは潜在表現に存在する異なる種類の相関を捉える。具体的には、まず潜在表現をスライスに分割する。現在のスライスを復号する際には、予め復号されたスライスをコンテキストとして使用し、それまでのスライスのアテンションマップを用いて、現在のスライスにおける大域的相関を予測する。ローカルコンテキストをキャプチャするために,性能劣化を回避する2つの拡張チェッカーボードコンテキストキャプチャ技術を導入する。 MEM と MEM$^+$ に基づいて,画像圧縮モデル MLIC と MLIC$^+$ を提案する。我々のMLICおよびMLIC$^+$モデルは、PSNRで測定されたVTM-17.0と比較して、Kodakデータセット上でのBDレートが8.05\%$と11.39\%$に減少する。私たちのコードはhttps://github.com/jiangweibeta/mlicで利用可能です。

Recently, learned image compression has achieved remarkable performance. The entropy model, which estimates the distribution of the latent representation, plays a crucial role in boosting rate-distortion performance. However, most entropy models only capture correlations in one dimension, while the latent representation contains channel-wise, local spatial, and global spatial correlations. To tackle this issue, we propose the Multi-Reference Entropy Model (MEM) and the advanced version, MEM$^+$. These models capture the different types of correlations present in the latent representation. Specifically, we first divide the latent representation into slices. When decoding the current slice, we use previously decoded slices as context and employ the attention map of the previously decoded slice to predict global correlations in the current slice. To capture local contexts, we introduce two enhanced checkerboard context capturing techniques that avoid performance degradation. Based on MEM and MEM$^+$, we propose image compression models MLIC and MLIC$^+$. Extensive experimental evaluations demonstrate that our MLIC and MLIC$^+$ models achieve state-of-the-art performance, reducing BD-rate by $8.05\%$ and $11.39\%$ on the Kodak dataset compared to VTM-17.0 when measured in PSNR. Our code is available at https://github.com/JiangWeibeta/MLIC.

翻訳日:2023-11-02 04:34:40 公開日:2023-10-30
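
The MLIC abstract above mentions checkerboard context capturing. The sketch below shows only the plain two-pass checkerboard split that such techniques build on, not the enhanced variants from the paper, which the abstract does not specify. The choice of which parity is decoded first is an assumption.

```python
def checkerboard_passes(height, width):
    """Split latent positions into anchors (pass 1) and non-anchors (pass 2)."""
    anchors, non_anchors = [], []
    for i in range(height):
        for j in range(width):
            # Assumption: positions with even (i + j) are decoded first.
            (anchors if (i + j) % 2 == 0 else non_anchors).append((i, j))
    return anchors, non_anchors

def four_neighbors(pos, height, width):
    """In-bounds up/down/left/right neighbors of a position."""
    i, j = pos
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in candidates if 0 <= a < height and 0 <= b < width]

anchors, non_anchors = checkerboard_passes(4, 4)
anchor_set = set(anchors)
# Every non-anchor position sees only already-decoded anchors around it.
context_is_available = all(
    n in anchor_set for pos in non_anchors for n in four_neighbors(pos, 4, 4)
)
```

The appeal of this layout is that the second pass can be decoded in parallel, since all of its local context comes from the first pass.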

# 自然な注釈付き単語セグメンテーションデータとしての音声における単語境界の抽出

Mining Word Boundaries in Speech as Naturally Annotated Word Segmentation Data ( http://arxiv.org/abs/2210.17122v2 )

ライセンス: Link先を確認

Lei Zhang, Zhenghua Li, Shilin Zhou, Chen Gong, Zhefeng Wang, Baoxing Huai, Min Zhang

(参考訳) 中国語単語セグメンテーション(CWS)のための自然な注釈付きデータ探索の初期の研究や、音声とテキスト処理の統合に関する最近の研究から着想を得たこの研究は、初めてパラレル音声/テキストデータから単語境界をマイニングすることを提案する。まず、実験で使用したCWSデータに関連する2つのインターネットソースから、並列音声/テキストデータを収集する。そして,文字レベルのアライメントを取得し,隣接する文字間の停止時間に応じて単語境界を決定するための単純なヒューリスティックなルールを設計する。最後に,モデルトレーニングに自然に付加したデータをより有効に活用できる,効果的な完全列学習戦略を提案する。実験によると、このアプローチはクロスドメインと低リソースの両方のシナリオでcwsのパフォーマンスを著しく向上させる。

Inspired by early research on exploring naturally annotated data for Chinese word segmentation (CWS), and also by recent research on integration of speech and text processing, this work for the first time proposes to mine word boundaries from parallel speech/text data. First, we collect parallel speech/text data from two Internet sources that are related to the CWS data used in our experiments. Then, we obtain character-level alignments and design simple heuristic rules for determining word boundaries according to pause duration between adjacent characters. Finally, we present an effective complete-then-train strategy that can better utilize extra naturally annotated data for model training. Experiments demonstrate our approach can significantly boost CWS performance in both cross-domain and low-resource scenarios.

翻訳日:2023-11-02 04:34:11 公開日:2023-10-30
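
The word-boundary mining step described above (boundaries determined from pause duration between adjacent characters) can be sketched as a simple threshold rule. The function name, the input layout, and the 0.1-second threshold are hypothetical; the abstract does not give the actual rules.

```python
def boundaries_from_pauses(chars, pauses, threshold=0.1):
    """Return indices i such that a word boundary is placed before chars[i].

    pauses[i] is the silence in seconds between chars[i] and chars[i + 1],
    as obtained from a character-level speech/text alignment.
    """
    if not chars:
        return []
    if len(pauses) != len(chars) - 1:
        raise ValueError("need exactly one pause per adjacent character pair")
    return [i + 1 for i, pause in enumerate(pauses) if pause >= threshold]

# Hypothetical alignment for a four-character sentence.
boundaries = boundaries_from_pauses(list("我爱北京"), [0.15, 0.20, 0.00])
```

A pause is evidence for a boundary, but the lack of a pause does not prove the absence of one, so output like this is best read as partial annotation.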

# 最小エントロピー結合を用いた完全安全ステガノグラフィ

Perfectly Secure Steganography Using Minimum Entropy Coupling ( http://arxiv.org/abs/2210.14889v4 )

ライセンス: Link先を確認

Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier

(参考訳) ステガノグラフィ(Steganography)とは、敵の第三者が隠された意味があることに気づかないような、秘密情報を無害な内容に符号化する実践である。この問題は古典的にセキュリティ文献で研究されてきたが、生成モデルの最近の進歩は、スケーラブルなステガノグラフィ技術を開発するセキュリティ研究者と機械学習研究者の間で共通の関心を呼んでいる。本研究は,1998年のCachin (1998) の情報理論モデルにおいて, ステガノグラフィーの手法が完全に安全であることを示し, 結合によって誘導される場合に限る。さらに,完全セキュアな手順の中で,最小エントロピー結合によって引き起こされる場合に限り,手続きが情報スループットを最大化することを示す。これらの知見は、私たちの知る限り、任意のカバーテキスト分布に対する完全なセキュリティ保証を達成するための最初のステガノグラフィーアルゴリズムとなる。 GPT-2, WaveRNN, Image Transformer を通信チャネルとして用いて, エントロピー結合に基づく最小のアプローチを, 算術符号, Meteor, 適応動的グループ化の3つの現代ベースラインと比較した。最小エントロピー結合に基づくアプローチは、より強いセキュリティ制約にもかかわらず、より優れたエンコーディング効率を実現する。これらの結果から, 最小エントロピー結合レンズを通して情報理論ステガノグラフィを見ることは自然である可能性が示唆された。

Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning. While this problem has classically been studied in security literature, recent advances in generative models have led to a shared interest among security and machine learning researchers in developing scalable steganography techniques. In this work, we show that a steganography procedure is perfectly secure under Cachin (1998)'s information-theoretic model of steganography if and only if it is induced by a coupling. Furthermore, we show that, among perfectly secure procedures, a procedure maximizes information throughput if and only if it is induced by a minimum entropy coupling. These insights yield what are, to the best of our knowledge, the first steganography algorithms to achieve perfect security guarantees for arbitrary covertext distributions. To provide empirical validation, we compare a minimum entropy coupling-based approach to three modern baselines -- arithmetic coding, Meteor, and adaptive dynamic grouping -- using GPT-2, WaveRNN, and Image Transformer as communication channels. We find that the minimum entropy coupling-based approach achieves superior encoding efficiency, despite its stronger security constraints. In aggregate, these results suggest that it may be natural to view information-theoretic steganography through the lens of minimum entropy coupling.

翻訳日:2023-11-02 04:33:57 公開日:2023-10-30
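
The steganography abstract above relies on couplings of two distributions with low joint entropy. Computing an exact minimum entropy coupling is hard in general, so the sketch below uses a simple greedy heuristic (repeatedly matching the largest remaining masses). It is an illustration of what a low-entropy coupling looks like, not necessarily the procedure the authors use, and the example marginals are hypothetical.

```python
import math

def greedy_coupling(p, q, tol=1e-12):
    """Build a joint distribution with marginals p and q, favoring few large cells."""
    p, q = list(p), list(q)
    joint = {}
    while True:
        i = max(range(len(p)), key=p.__getitem__)  # largest remaining mass in p
        j = max(range(len(q)), key=q.__getitem__)  # largest remaining mass in q
        mass = min(p[i], q[j])
        if mass <= tol:
            break
        joint[(i, j)] = joint.get((i, j), 0.0) + mass
        p[i] -= mass
        q[j] -= mass
    return joint

def entropy_bits(probs):
    return -sum(x * math.log2(x) for x in probs if x > 0)

p_example = [0.5, 0.5]
q_example = [0.5, 0.25, 0.25]
joint = greedy_coupling(p_example, q_example)
coupled_entropy = entropy_bits(joint.values())
independent_entropy = entropy_bits(p_example) + entropy_bits(q_example)
```

For these marginals the greedy coupling has lower joint entropy than the independent product, which is the property a coupling-based encoder exploits.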

# 有向非巡回グラフ上のトランスフォーマー

Transformers over Directed Acyclic Graphs ( http://arxiv.org/abs/2210.13148v6 )

ライセンス: Link先を確認

Yuankai Luo, Veronika Thost, Lei Shi

(参考訳) トランスフォーマーモデルは最近、グラフ表現学習で人気を博し、通常のグラフニューラルネットワークでキャプチャされたもの以上の複雑な関係を学習する可能性がある。主な研究課題は、グラフの構造バイアスをトランスフォーマーアーキテクチャにどのように注入するかであり、非方向の分子グラフや近年ではより大きなネットワークグラフにもいくつかの提案がなされている。本稿では,有向非巡回グラフ (DAG) 上のトランスフォーマーについて検討し,(1) トランスフォーマーの通常の二次的複雑性よりもはるかに効率的で,同時にDAG構造を忠実に捉えた注意機構,(2) 前者を補完するDAGの部分的順序の位置エンコーディングを提案する。我々は、ソースコードグラフから引用ネットワークのノードへの分類に至るまで、さまざまなタスクに対する我々のアプローチを厳格に評価し、グラフトランスフォーマーを一般的にDAGに適合したグラフニューラルネットワークを上回り、品質と効率の両面でSOTAグラフトランスフォーマーの性能を向上させるという2つの重要な側面において有効であることを示す。

Transformer models have recently gained popularity in graph representation learning as they have the potential to learn complex relationships beyond the ones captured by regular graph neural networks. The main research question is how to inject the structural bias of graphs into the transformer architecture, and several proposals have been made for undirected molecular graphs and, recently, also for larger network graphs. In this paper, we study transformers over directed acyclic graphs (DAGs) and propose architecture adaptations tailored to DAGs: (1) An attention mechanism that is considerably more efficient than the regular quadratic complexity of transformers and at the same time faithfully captures the DAG structure, and (2) a positional encoding of the DAG's partial order, complementing the former. We rigorously evaluate our approach over various types of tasks, ranging from classifying source code graphs to nodes in citation networks, and show that it is effective in two important aspects: in making graph transformers generally outperform graph neural networks tailored to DAGs and in improving SOTA graph transformer performance in terms of both quality and efficiency.

翻訳日:2023-11-02 04:33:30 公開日:2023-10-30
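
The two adaptations named in the DAG transformer abstract above (attention that follows the DAG structure, and a positional encoding of the partial order) can be illustrated with plain graph routines. The sketch assumes, as one plausible reading, that attention is restricted to node pairs related by reachability and that node depth serves as the position; the abstract does not spell out the exact form.

```python
from collections import deque

def dag_depths_and_mask(n_nodes, edges):
    """Return (depth per node, boolean attention mask) for a DAG given as edge pairs."""
    successors = [[] for _ in range(n_nodes)]
    indegree = [0] * n_nodes
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # Kahn's algorithm: topological order plus longest-path depth.
    depth = [0] * n_nodes
    order = []
    queue = deque(i for i in range(n_nodes) if indegree[i] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            depth[v] = max(depth[v], depth[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != n_nodes:
        raise ValueError("graph contains a cycle")

    # Descendant sets, filled in reverse topological order.
    descendants = [set() for _ in range(n_nodes)]
    for u in reversed(order):
        for v in successors[u]:
            descendants[u].add(v)
            descendants[u] |= descendants[v]

    # A pair may attend if it is the same node or one reaches the other.
    mask = [
        [i == j or j in descendants[i] or i in descendants[j] for j in range(n_nodes)]
        for i in range(n_nodes)
    ]
    return depth, mask

depth, mask = dag_depths_and_mask(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```

In this diamond-shaped example, the two middle nodes share a depth but are not comparable, so the mask keeps them from attending to each other.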

# 多行動政策のグラディエントについて

On Many-Actions Policy Gradient ( http://arxiv.org/abs/2210.13011v5 )

ライセンス: Link先を確認

Michal Nauman and Marek Cygan

(参考訳) 確率的政策勾配 (SPGs) と状態毎のアクションサンプルのばらつきについて検討した。我々は,多作用のspgが分散を生じさせる時期を決定する多作用最適条件を,比例伸長軌道を持つ単作用剤と比較して導出する。 SPGの文脈における多行動サンプリングに動的モデルを活用するモデルベース多行動(MBMA)を提案する。 MBMAは、マルチアクションSPGの既存の実装に関連する問題に対処し、モデルシミュレーションロールアウトの状態から推定される低いバイアスとSPGに匹敵する分散をもたらす。 MBMAバイアスと分散構造は理論によって予測されるものと一致している。その結果, MBMAはモデルフリー, 多アクション, モデルベースSPGベースラインと比較して, サンプル効率の向上と, 一連の連続行動環境のリターンの向上を実現している。

We study the variance of stochastic policy gradients (SPGs) with many action samples per state. We derive a many-actions optimality condition, which determines when many-actions SPG yields lower variance as compared to a single-action agent with proportionally extended trajectory. We propose Model-Based Many-Actions (MBMA), an approach leveraging dynamics models for many-actions sampling in the context of SPG. MBMA addresses issues associated with existing implementations of many-actions SPG and yields lower bias and comparable variance to SPG estimated from states in model-simulated rollouts. We find that MBMA bias and variance structure matches that predicted by theory. As a result, MBMA achieves improved sample efficiency and higher returns on a range of continuous action environments as compared to model-free, many-actions, and model-based on-policy SPG baselines.

翻訳日:2023-11-02 04:33:08 公開日:2023-10-30

# 臨床名付きエンティティ認識のための事前学習言語モデルの価値の検討

Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition ( http://arxiv.org/abs/2210.12770v4 )

ライセンス: Link先を確認

Samuel Belkadi and Lifeng Han and Yuping Wu and Goran Nenadic

(参考訳) 自然言語処理(NLP)の分野では,一般あるいはドメイン固有データから限られたリソースを持つ特定のタスクへの微調整事前学習言語モデル(PLM)の実践が人気を集めている。本研究では,この仮定を再考し,臨床NLP,特に薬物とその関連属性に対する名前付きエンティティ認識について検討する。我々は,スクラッチからトレーニングした Transformer モデルと細調整された BERT ベースの LLM,すなわち BERT, BioBERT, ClinicalBERT を比較した。さらに、文脈学習を促進するために追加のCRF層がそのようなモデルに与える影響を検討する。我々はモデル開発と評価にn2c2-2018共有タスクデータを使用する。実験の結果は 1) CRF層は全ての言語モデルを改善した。 2) マクロ平均F1スコアを用いたBIO制限スパンレベル評価について、微調整LDMは0.83以上のスコアを得たが、TransformerCRFモデルは、スクラッチからトレーニングされた0.78以上のスコアを得た。 3) 重み付き平均値を用いた生体制限スパンレベル評価では, 臨床用bert-crf, bert-crf, およびtransformrcrfがそれぞれ97.59\%/97.44\%/96.84\%と低いスコアを示した。 4) より優れたデータ分散のためのダウンサンプリングによる効率的なトレーニングの適用により、トレーニングコストとデータの必要性はさらに低減され、同様のスコアが維持される。我々のモデルは \url{https://github.com/HECTA-UoM/TransformerCRF} でホストされます。

The practice of fine-tuning Pre-trained Language Models (PLMs) from general or domain-specific data to a specific task with limited resources has gained popularity within the field of natural language processing (NLP). In this work, we re-visit this assumption and carry out an investigation in clinical NLP, specifically Named Entity Recognition on drugs and their related attributes. We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs, namely BERT, BioBERT, and ClinicalBERT. Furthermore, we examine the impact of an additional CRF layer on such models to encourage contextual learning. We use n2c2-2018 shared task data for model development and evaluations. The experimental outcomes show that 1) CRF layers improved all language models; 2) referring to BIO-strict span-level evaluation using macro-average F1 score, although the fine-tuned LLMs achieved 0.83+ scores, the TransformerCRF model trained from scratch achieved 0.78+, demonstrating comparable performances with much lower cost, e.g. with 39.80% fewer training parameters; 3) referring to BIO-strict span-level evaluation using weighted-average F1 score, ClinicalBERT-CRF, BERT-CRF, and TransformerCRF exhibited lower score differences, with 97.59%/97.44%/96.84% respectively; 4) applying efficient training by down-sampling for better data distribution further reduced the training cost and need for data, while maintaining similar scores, i.e. around 0.02 points lower compared to using the full dataset. Our models will be hosted at https://github.com/HECTA-UoM/TransformerCRF

翻訳日:2023-11-02 04:32:55 公開日:2023-10-30
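
The clinical NER abstract above reports both macro-average and weighted-average F1, and the two can differ widely when label frequencies are skewed. The sketch below shows the arithmetic only; the labels, per-class scores, and support counts are made-up illustrations, not results from the paper.

```python
def macro_and_weighted_f1(per_class):
    """per_class maps a label to (f1, support); returns (macro, weighted) averages."""
    macro = sum(f1 for f1, _ in per_class.values()) / len(per_class)
    total_support = sum(support for _, support in per_class.values())
    weighted = sum(f1 * support for f1, support in per_class.values()) / total_support
    return macro, weighted

# Hypothetical per-class results: one frequent, easy label and one rare, hard label.
macro_f1, weighted_f1 = macro_and_weighted_f1({"Drug": (0.9, 900), "ADE": (0.5, 100)})
```

Macro averaging treats every label equally, so rare and difficult labels pull it down, while weighted averaging is dominated by the frequent labels.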

# SimSCOOD:微調整ソースコードモデルにおける分布外一般化の体系的解析

SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models ( http://arxiv.org/abs/2210.04802v2 )

ライセンス: Link先を確認

Hossein Hajipour, Ning Yu, Cristian-Alexandru Staicu, Mario Fritz

(参考訳) ソースコードモデルの事前トレーニングには、大規模なコードデータセットがアクセスしやすくなっている。しかしながら、微調整フェーズでは、特定の下流タスクのコード分布を完全にカバーする代表的なトレーニングデータを得ることは、タスク固有の性質と限定的なラベル付けリソースのため、依然として困難である。さらに、事前学習モデルの微調整は、事前に獲得した事前学習知識を忘れることになる。これらは、まだ体系的に研究されていない予期せぬモデル推論行動による分散(ood)一般化問題につながる。本稿では、ソースコードデータ特性の異なる次元に沿って様々なOODシナリオをシミュレートする最初の体系的アプローチを提案し、それらのシナリオにおける微調整モデル挙動について検討する。完全微調整法とローランド適応法(LoRA)微調整法を含む,異なる微調整法下でのモデルの挙動について検討する。最先端の4つの事前学習モデル上で実施し,2つのコード生成タスクに適用した総合的な解析を行い,ood一般化問題に起因する複数の障害モードを明らかにした。さらに分析の結果,LoRAファインチューニングは様々なシナリオにおける全ファインチューニングよりも,OODの一般化性能が大幅に向上していることが判明した。

Large code datasets have become increasingly accessible for pre-training source code models. However, for the fine-tuning phase, obtaining representative training data that fully covers the code distribution for specific downstream tasks remains challenging due to the task-specific nature and limited labeling resources. Moreover, fine-tuning pretrained models can result in forgetting previously acquired pre-training knowledge. These lead to out-of-distribution (OOD) generalization issues with unexpected model inference behaviors that have not been systematically studied yet. In this paper, we contribute the first systematic approach that simulates various OOD scenarios along different dimensions of source code data properties and study the fine-tuned model behaviors in such scenarios. We investigate the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods. Our comprehensive analysis, conducted on four state-of-the-art pretrained models and applied to two code generation tasks, exposes multiple failure modes attributed to OOD generalization issues. Additionally, our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios.

翻訳日:2023-11-02 04:31:44 公開日:2023-10-30

# VoLTA: 弱教師付き局所特徴アライメントを用いた視覚言語トランスフォーマー

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment ( http://arxiv.org/abs/2210.04135v3 )

ライセンス: Link先を確認

Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun and Rama Chellappa

(参考訳) 視覚言語事前学習(VLP)は、最近、様々なユニモーダルおよびマルチモーダルダウンストリームアプリケーションに非常に効果的であることが証明された。しかしながら、既存のほとんどのエンドツーエンドVLP法は、オブジェクト検出、セグメンテーション、参照表現理解などのきめ細かい領域レベルのタスクで高い性能を得るために、高解像度の画像・テキスト・ボックスデータを使用している。残念ながら、正確なバウンディングボックスアノテーションを備えた高解像度画像は、収集して大規模に教師信号として利用するには費用がかかる。本稿では,VoLTA(Vision-Language Transformer with weakly-supervised local-feature Alignment)を提案する。これは、画像キャプションデータのみを利用しながら、きめ細かい領域レベルの画像理解を実現し、高価なボックスアノテーションを不要にする新しいVLPパラダイムである。VoLTAは、グラフ最適輸送に基づく弱教師付きアライメントをローカルイメージパッチとテキストトークンに採用し、明示的で自己正規化され、解釈可能な低レベルマッチング基準を生み出す。さらにVoLTAは、事前学習中にマルチモーダルフュージョンをユニモーダルバックボーンに深く押し込み、フュージョン固有のトランスフォーマー層を取り除き、メモリ要件をさらに削減する。広範囲の視覚および視覚言語ダウンストリームタスクに対する広範な実験は、粗粒度のダウンストリーム性能を損なうことなく、細粒度アプリケーションにおけるVoLTAの有効性を実証しており、はるかに多くのキャプションとボックスアノテーションを用いる手法をしばしば上回る。

Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accurate bounding box annotations are expensive to collect and use for supervision at scale. In this work, we propose VoLTA (Vision-Language Transformer with weakly-supervised local-feature Alignment), a new VLP paradigm that only utilizes image-caption data but achieves fine-grained region-level image understanding, eliminating the use of expensive box annotations. VoLTA adopts graph optimal transport-based weakly-supervised alignment on local image patches and text tokens to germinate an explicit, self-normalized, and interpretable low-level matching criterion. In addition, VoLTA pushes multi-modal fusion deep into the uni-modal backbones during pre-training and removes fusion-specific transformer layers, further reducing memory requirements. Extensive experiments on a wide range of vision- and vision-language downstream tasks demonstrate the effectiveness of VoLTA on fine-grained applications without compromising the coarse-grained downstream performance, often outperforming methods using significantly more caption and box annotations.

翻訳日:2023-11-02 04:31:06 公開日:2023-10-30

# Learnware: 小さなモデルで大きなことを成し遂げる

Learnware: Small Models Do Big ( http://arxiv.org/abs/2210.03647v3 )

ライセンス: Link先を確認

Zhi-Hua Zhou, Zhi-Hao Tan

(参考訳) 現在の機械学習技術には、大量のトレーニングデータと熟練したトレーニングスキルの必要性、継続的な学習の難しさ、破滅的忘却のリスク、データのプライバシや機密情報の漏洩など、不満がある。ほとんどの研究は、これらの問題の1つに個別に焦点を合わせており、ほとんどの問題が実際には絡み合っているという事実にあまり注意を払っていない。自然言語処理やコンピュータビジョンの応用で目覚ましい成果を上げてきた、主流のビッグモデルパラダイムは、これらの問題にまだ対応していない一方で、炭素排出の深刻な発生源となっている。本稿では,ユーザが機械学習モデルをスクラッチから構築しなくても済むようにすることを目指すlearnwareパラダイムの概要を紹介する。このパラダイムは,小さなモデルを再利用して本来の目的を超えた用途にも役立てることを期待するものであり,その鍵となる要素は仕様(specification)である。仕様によって,モデルについて事前に何も知らない将来のユーザの要求に応じて,トレーニング済みモデルを適切に識別し再利用できるようになる。

There are complaints about current machine learning techniques, such as the requirement of a huge amount of training data and proficient training skills, the difficulty of continual learning, the risk of catastrophic forgetting, and the leaking of private or proprietary data. Most research efforts have focused on one of these issues separately, paying less attention to the fact that most issues are entangled in practice. The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed these issues, while becoming a serious source of carbon emissions. This article offers an overview of the learnware paradigm, which attempts to free users from building machine learning models from scratch, with the hope of reusing small models to do things even beyond their original purposes. The key ingredient is the specification, which enables a trained model to be adequately identified for reuse according to the requirements of future users who know nothing about the model in advance.

翻訳日:2023-11-02 04:30:38 公開日:2023-10-30

# 非負のテンソルに対する多体近似

Many-body Approximation for Non-negative Tensors ( http://arxiv.org/abs/2209.15338v3 )

ライセンス: Link先を確認

Kazu Ghalamkari, Mahito Sugiyama, Yoshinobu Kawahara

(参考訳) 多体近似と呼ばれる非負のテンソルを分解する別の方法を提案する。伝統的な分解法は表現の低ランク性を前提としており、大域的な最適化と目標ランクの選択が困難になる。我々は、テンソルとそのモードが確率分布と確率変数に対応するような、エネルギーに基づくテンソルのモデリングによってこれらの問題を回避する。我々のモデルは、階数よりも直感的に調整できる変数間の相互作用(つまりモード)を考慮し、KLの発散最小化の観点からグローバルに最適化することができる。さらに,モード間の相互作用をテンソルネットワークとして可視化し,多体近似と低ランク近似の非自明な関係を明らかにする。テンソル完備化と近似におけるアプローチの有効性を示す。

We present an alternative approach to decompose non-negative tensors, called many-body approximation. Traditional decomposition methods assume low-rankness in the representation, resulting in difficulties in global optimization and target rank selection. We avoid these problems by energy-based modeling of tensors, where a tensor and its mode correspond to a probability distribution and a random variable, respectively. Our model can be globally optimized in terms of the KL divergence minimization by taking into account the interaction between variables (that is, modes), which can be tuned more intuitively than ranks. Furthermore, we visualize interactions between modes as tensor networks and reveal a nontrivial relationship between many-body approximation and low-rank approximation. We demonstrate the effectiveness of our approach in tensor completion and approximation.

翻訳日:2023-11-02 04:30:20 公開日:2023-10-30

# 変分量子固有解法に対するランダム化コンパイルとゼロノイズ外挿による相乗的量子誤差緩和

Synergetic quantum error mitigation by randomized compiling and zero-noise extrapolation for the variational quantum eigensolver ( http://arxiv.org/abs/2212.11198v2 )

ライセンス: Link先を確認

Tomochika Kurita, Hammam Qassim, Masatoshi Ishii, Hirotaka Oshima, Shintaro Sato, Joseph Emerson

(参考訳) 本稿では,変分量子固有解法(VQE)アルゴリズムの量子誤差軽減戦略を提案する。数値シミュレーションにより,vqeのコヒーレントノイズは,従来の緩和法では抑制しにくいような大きな誤差を生じさせる可能性があるが,提案手法では,これらの誤差を著しく低減できることがわかった。提案手法は従来報告されていたランダム化コンパイル(RC)とゼロノイズ外挿(ZNE)の組み合わせである。直感的には、ランダム化コンパイルは、回路内のコヒーレントエラーを確率的ポーリ誤差に変換し、コスト関数を評価する際にゼロノイズ限界への外挿を容易にする。小分子に対するvqeの数値シミュレーションにより,提案手法は,様々な種類のコヒーレントノイズによるエネルギー誤差を最大2桁緩和できることを示した。

We propose a quantum error mitigation strategy for the variational quantum eigensolver (VQE) algorithm. We find, via numerical simulation, that very small amounts of coherent noise in VQE can cause substantially large errors that are difficult to suppress by conventional mitigation methods, and yet our proposed mitigation strategy is able to significantly reduce these errors. The proposed strategy is a combination of previously reported techniques, namely randomized compiling (RC) and zero-noise extrapolation (ZNE). Intuitively, randomized compiling turns coherent errors in the circuit into stochastic Pauli errors, which facilitates extrapolation to the zero-noise limit when evaluating the cost function. Our numerical simulation of VQE for small molecules shows that the proposed strategy can mitigate energy errors induced by various types of coherent noise by up to two orders of magnitude.
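
As a rough illustration of the zero-noise extrapolation step described above, the sketch below fits a straight line to energies measured at artificially amplified noise levels and reads off the intercept at zero noise. The function name, the noise scale factors, and the energy values are hypothetical, and the linear fit is an assumption made for illustration; the abstract does not say which extrapolation model the authors use.

```python
def zero_noise_extrapolate(scales, energies):
    """Least-squares linear fit E(s) = a + b*s; returns the intercept a = E(0)."""
    n = len(scales)
    if n < 2 or n != len(energies):
        raise ValueError("need at least two (scale, energy) pairs")
    mean_s = sum(scales) / n
    mean_e = sum(energies) / n
    var_s = sum((s - mean_s) ** 2 for s in scales)
    if var_s == 0:
        raise ValueError("noise scale factors must not all be equal")
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(scales, energies))
    slope = cov / var_s
    # Intercept of the fitted line, i.e. the extrapolated zero-noise energy.
    return mean_e - slope * mean_s


# Hypothetical measurements: the energy drifts upward as noise is scaled up.
scales = [1.0, 2.0, 3.0]
energies = [-1.10, -1.05, -1.00]
estimate = zero_noise_extrapolate(scales, energies)
```

In this picture, randomized compiling matters because it makes the noise behave stochastically, so that the measured energy varies smoothly with the scale factor and a simple fit like this one has a chance of being accurate.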

翻訳日:2023-11-02 04:23:01 公開日:2023-10-30

# ブラインド超解像カーネル推定のためのメタラーニングカーネル

Meta-Learned Kernel For Blind Super-Resolution Kernel Estimation ( http://arxiv.org/abs/2212.07886v2 )

ライセンス: Link先を確認

Royson Lee, Rui Li, Stylianos I. Venieris, Timothy Hospedales, Ferenc Husz\'ar, Nicholas D. Lane

(参考訳) 近年の画像劣化推定手法により,単一画像超解像(SR)手法は実世界の画像をより良くアップサンプリングできるようになった。これらの手法のうち、明示的なカーネル推定手法は未知の劣化を扱う上で前例のない性能を示した。それでも、下流SRモデルで使用する場合、いくつかの制限が有効性を制約している。具体的には、この系統の手法には、i)画像ごとの適応時間が長いことによる過大な推論時間と、ii)カーネルミスマッチによる画像忠実度の低下という問題がある。本研究では,画像の分布に含まれる情報からメタ学習する学習のための学習(learning-to-learn)アプローチを導入し,カーネル推定と画像忠実度の両方の性能を大幅に向上させるとともに,新たな画像への適応を著しく高速化する。具体的には, カーネル生成GANであるMetaKernelGANを様々なタスクでメタ学習し, 新しい画像が提示されたときに, 生成器が情報に基づくカーネル推定から開始し, 識別器がパッチ分布を識別する強力な能力を持った状態から開始できるようにする。最先端の手法と比較して,MetaKernelGANはカーネルの大きさと共分散をよりよく推定し,非ブラインドSRモデルと組み合わせた場合,同程度の計算量の範囲で最先端のブラインドSR結果が得られることを実験で示した。教師なし学習器を教師あり学習することで、教師なし学習器の一般化性を維持しつつ、カーネル推定の最適化安定性、ひいては画像適応を改善し、既存の手法に対して14.24倍から102.1倍の高速な推論を実現する。

Recent image degradation estimation methods have enabled single-image super-resolution (SR) approaches to better upsample real-world images. Among these methods, explicit kernel estimation approaches have demonstrated unprecedented performance at handling unknown degradations. Nonetheless, a number of limitations constrain their efficacy when used by downstream SR models. Specifically, this family of methods yields i) excessive inference time due to long per-image adaptation times and ii) inferior image fidelity due to kernel mismatch. In this work, we introduce a learning-to-learn approach that meta-learns from the information contained in a distribution of images, thereby enabling significantly faster adaptation to new images with substantially improved performance in both kernel estimation and image fidelity. Specifically, we meta-train a kernel-generating GAN, named MetaKernelGAN, on a range of tasks, such that when a new image is presented, the generator starts from an informed kernel estimate and the discriminator starts with a strong capability to distinguish between patch distributions. Compared with state-of-the-art methods, our experiments show that MetaKernelGAN better estimates the magnitude and covariance of the kernel, leading to state-of-the-art blind SR results within a similar computational regime when combined with a non-blind SR model. Through supervised learning of an unsupervised learner, our method maintains the generalizability of the unsupervised learner, improves the optimization stability of kernel estimation, and hence image adaptation, and leads to faster inference, with a speedup of between 14.24x and 102.1x over existing methods.

翻訳日:2023-11-02 04:21:58 公開日:2023-10-30

# 誤指定された人間モデルに対する報酬推定の感度について

On the Sensitivity of Reward Inference to Misspecified Human Models ( http://arxiv.org/abs/2212.04717v2 )

ライセンス: Link先を確認

Joey Hong and Kush Bhatia and Anca Dragan

(参考訳) 人間の振る舞いから報酬関数を推論することは、価値の整合、すなわちAIの目標を私たち人間が実際に望むものと整合させることの中心にある。しかし、それを行うには、目的が与えられたときに人間がどのように振る舞うかのモデルに依存する。認知科学、神経科学、行動経済学における何十年もの研究を経ても、正確な人間モデルを得ることは未解決の研究課題である。そこで次の疑問が生じる。報酬の推定が正確になるためには、これらのモデルはどの程度正確である必要があるのか。一方で、モデルの小さな誤差が推論の破滅的な誤りにつながるのであれば、人間の行動の完全なモデルは決して得られない以上、報酬学習の枠組み全体が見込みのないものに思える。他方で、モデルが改善されるにつれて報酬の精度も向上するという保証が得られるならば、モデリング側の研究をさらに進める意義が示される。我々はこの問題を理論的にも経験的にも研究する。残念ながら、推定された報酬に任意に大きな誤差を生じさせる、行動における小さな敵対的バイアスを構成できることを示す。しかし、おそらくより重要なことに、報酬推定の誤差が人間モデルの誤差に対して線形に抑えられるような合理的な仮定も特定できる。最後に、シミュレーションデータおよび人間のデータを用いて、離散および連続制御タスクにおいて理論的知見を検証する。

Inferring reward functions from human behavior is at the center of value alignment - aligning AI objectives with what we, humans, actually want. But doing so relies on models of how humans behave given their objectives. After decades of research in cognitive science, neuroscience, and behavioral economics, obtaining accurate human models remains an open research topic. This begs the question: how accurate do these models need to be in order for the reward inference to be accurate? On the one hand, if small errors in the model can lead to catastrophic error in inference, the entire framework of reward learning seems ill-fated, as we will never have perfect models of human behavior. On the other hand, if as our models improve, we can have a guarantee that reward accuracy also improves, this would show the benefit of more work on the modeling side. We study this question both theoretically and empirically. We do show that it is unfortunately possible to construct small adversarial biases in behavior that lead to arbitrarily large errors in the inferred reward. However, and arguably more importantly, we are also able to identify reasonable assumptions under which the reward inference error can be bounded linearly in the error in the human model. Finally, we verify our theoretical insights in discrete and continuous control tasks with simulated and human data.

翻訳日:2023-11-02 04:21:20 公開日:2023-10-30

# オフライン強化学習のための信頼度決定値関数

Confidence-Conditioned Value Functions for Offline Reinforcement Learning ( http://arxiv.org/abs/2212.04607v2 )

ライセンス: Link先を確認

Joey Hong and Aviral Kumar and Sergey Levine

(参考訳) オフライン強化学習(RL)は、コストのかかるオンラインインタラクションなしに、既存の静的データセットのみを使用して効果的なポリシを学習できる可能性を持つ。そのためには、オフラインRL手法はデータセットと学習されたポリシーの間の分布シフトを扱わなければならない。最も一般的なアプローチは、分布外(OOD)アクションのリターンを過小評価する、保守的な(下界の)価値関数を学習することである。しかし、このような手法には顕著な欠点が1つある。そのような価値関数に対して最適化されたポリシーは、固定された、おそらくは準最適な保守性の度合いに従ってしか振る舞えない。トレーニング時に様々な保守性の度合いに対するポリシーを学習し、評価時にそのうちの1つを動的に選択する方法を考案できれば、この欠点は軽減できる。そこで本研究では,保守性の度合いにも条件付けられた価値関数の学習を提案し,これを信頼度条件付き価値関数と呼ぶ。我々はベルマンバックアップの新しい形式を導出し、任意の信頼度に対するQ値を高い確率で同時に学習する。信頼度に条件付けることで,我々の価値関数は,それまでの観測履歴を用いて信頼度レベルを制御し,オンライン評価における適応的戦略を可能にする。このアプローチは,既存の保守的アルゴリズムのQ関数を信頼度に条件付けることで実装できる。理論的には,学習された価値関数が任意の所望の信頼度で真の価値の保守的な推定を与えることを示す。最後に,本アルゴリズムが複数の離散制御領域において既存の保守的オフラインRLアルゴリズムよりも優れていることを実証的に示す。

Offline reinforcement learning (RL) promises the ability to learn effective policies solely using existing, static datasets, without any costly online interaction. To do so, offline RL methods must handle distributional shift between the dataset and the learned policy. The most common approach is to learn conservative, or lower-bound, value functions, which underestimate the return of out-of-distribution (OOD) actions. However, such methods exhibit one notable drawback: policies optimized on such value functions can only behave according to a fixed, possibly suboptimal, degree of conservatism. This can be alleviated if we instead are able to learn policies for varying degrees of conservatism at training time and devise a method to dynamically choose one of them during evaluation. To do so, in this work, we propose learning value functions that additionally condition on the degree of conservatism, which we dub confidence-conditioned value functions. We derive a new form of a Bellman backup that simultaneously learns Q-values for any degree of confidence with high probability. By conditioning on confidence, our value functions enable adaptive strategies during online evaluation by controlling for confidence level using the history of observations thus far. This approach can be implemented in practice by conditioning the Q-function from existing conservative algorithms on the confidence. We theoretically show that our learned value functions produce conservative estimates of the true value at any desired confidence. Finally, we empirically show that our algorithm outperforms existing conservative offline RL algorithms on multiple discrete control domains.

翻訳日:2023-11-02 04:20:59 公開日:2023-10-30

# 事前学習したタンパク質言語モデルの幾何学的深層学習ネットワークへの統合

Integration of Pre-trained Protein Language Models into Geometric Deep Learning Networks ( http://arxiv.org/abs/2212.03447v2 )

ライセンス: Link先を確認

Fang Wu, Lirong Wu, Dragomir Radev, Jinbo Xu, Stan Z. Li

(参考訳) 幾何学的深層学習は、最近、非ユークリッド領域で大きな成功を収め、大きな生体分子の3次元構造を学習することが、別の研究領域として浮上している。しかし、その有効性は構造データが限られているため、大きく制約されている。一方、1Dシークエンスで訓練されたタンパク質言語モデルでは、広範囲のアプリケーションで拡張性を示す。以前のいくつかの研究では、これらの異なるタンパク質様相を組み合わせることで幾何学的ニューラルネットワークの表現力を促進するが、それらの利点を包括的に理解することはできなかった。本研究では,よく訓練されたタンパク質言語モデルから得られた知識を,いくつかの最先端幾何学的ネットワークに統合し,タンパク質-タンパク質界面予測,モデル品質評価,タンパク質-タンパク質剛体ドッキング,結合親和性予測など,さまざまなタンパク質表現学習ベンチマークを評価する。以上の結果から,ベースラインを20%上回る総合的な改善が見られた。強い証拠は、タンパク質言語モデルの知識の組み入れが幾何ネットワークの能力を大幅に向上させ、複雑なタスクに一般化できることを示唆している。

Geometric deep learning has recently achieved great success in non-Euclidean domains, and learning on 3D structures of large biomolecules is emerging as a distinct research area. However, its efficacy is largely constrained due to the limited quantity of structural data. Meanwhile, protein language models trained on substantial 1D sequences have shown burgeoning capabilities with scale in a broad range of applications. Several previous studies consider combining these different protein modalities to promote the representation power of geometric neural networks, but fail to present a comprehensive understanding of their benefits. In this work, we integrate the knowledge learned by well-trained protein language models into several state-of-the-art geometric networks and evaluate a variety of protein representation learning benchmarks, including protein-protein interface prediction, model quality assessment, protein-protein rigid-body docking, and binding affinity prediction. Our findings show an overall improvement of 20% over baselines. Strong evidence indicates that the incorporation of protein language models' knowledge enhances geometric networks' capacity by a significant margin and can be generalized to complex tasks.

翻訳日:2023-11-02 04:20:32 公開日:2023-10-30

# Spuriosity Rankings: バイアスの測定と軽減のためのデータのソート

Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases ( http://arxiv.org/abs/2212.02648v3 )

ライセンス: Link先を確認

Mazda Moayeri, Wenxiao Wang, Sahil Singla, Soheil Feizi

(参考訳) 本稿では,スプリアスな手がかりへの依存によって生じるモデルバイアスを測定・緩和する,簡易かつ効果的な方法を提案する。データやモデルのトレーニングにコストのかかる変更を必要とせず、既に持っているデータをソートすることでよりうまく利用する。具体的には、解釈可能なネットワークの深層ニューラル特徴を代理指標として、spuriosity(一般的なスプリアスな手がかりが存在する程度)に基づいてクラス内の画像をランク付けする。spuriosityランキングを用いると、少数派のサブポピュレーション(すなわちspuriosityの低い画像)を容易に特定でき、spuriosityの高い画像と低い画像の精度の差としてモデルバイアスを評価できる。さらに、spuriosityの低い画像で分類ヘッドを微調整することで、精度をほとんど犠牲にせずにモデルのバイアスを効率的に除去することもでき、spuriosityによらずサンプルをより公平に扱えるようになる。ImageNet上で本手法を実証し、5000個のクラス・特徴依存関係にアノテーションを付け(そのうち630個がスプリアスであると判明)、その過程でこれらの特徴に対する32.5万個のソフトセグメンテーションのデータセットを作成した。同定されたスプリアスなニューラル特徴を用いてspuriosityランキングを計算した上で、89個の多様なモデルのバイアスを評価し、クラスごとのバイアスがモデル間で高い相関関係にあることを見出した。以上の結果から,スプリアスな特徴への依存によるモデルバイアスは,モデルのトレーニング方法よりも,モデルが何でトレーニングされたかにはるかに強く影響されることが示唆された。

We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues. Instead of requiring costly changes to one's data or model training, our method better utilizes the data one already has by sorting them. Specifically, we rank images within their classes based on spuriosity (the degree to which common spurious cues are present), proxied via deep neural features of an interpretable network. With spuriosity rankings, it is easy to identify minority subpopulations (i.e. low spuriosity images) and assess model bias as the gap in accuracy between high and low spuriosity images. One can even efficiently remove a model's bias at little cost to accuracy by finetuning its classification head on low spuriosity images, resulting in fairer treatment of samples regardless of spuriosity. We demonstrate our method on ImageNet, annotating 5000 class-feature dependencies (630 of which we find to be spurious) and generating a dataset of 325k soft segmentations for these features along the way. Having computed spuriosity rankings via the identified spurious neural features, we assess biases for 89 diverse models and find that class-wise biases are highly correlated across models. Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.
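
The bias measure described above, the accuracy gap between high and low spuriosity images, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the per-image spuriosity scores and correctness flags are made-up inputs, the function name is hypothetical, and taking the top and bottom `k` images of a class is one possible way to define "high" and "low"; the abstract does not fix these details.

```python
def spuriosity_gap(samples, k):
    """Accuracy on the k most spurious images minus accuracy on the k least.

    samples: list of (spuriosity_score, is_correct) pairs for a single class.
    """
    if k <= 0 or 2 * k > len(samples):
        raise ValueError("k must be positive and at most half the sample count")
    ranked = sorted(samples, key=lambda pair: pair[0])  # ascending spuriosity
    low, high = ranked[:k], ranked[-k:]

    def accuracy(group):
        return sum(1 for _, correct in group if correct) / len(group)

    return accuracy(high) - accuracy(low)


# Hypothetical class: the model is right whenever the spurious cue is strong.
samples = [(0.9, True), (0.8, True), (0.7, True),
           (0.2, False), (0.1, True), (0.05, False)]
gap = spuriosity_gap(samples, k=3)
```

A large positive gap suggests the model leans on the spurious cue for this class; a gap near zero suggests it does not.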

翻訳日:2023-11-02 04:20:12 公開日:2023-10-30

# すべてを支配するリスク:モデルベースオフライン強化学習におけるリスクに敏感な視点

One Risk to Rule Them All: A Risk-Sensitive Perspective on Model-Based Offline Reinforcement Learning ( http://arxiv.org/abs/2212.00124v3 )

ライセンス: Link先を確認

Marc Rigter, Bruno Lacerda, Nick Hawes

(参考訳) オフライン強化学習(RL)は、オンライン探索がコストや危険性の面で許容できない安全クリティカルなドメインに適している。このような安全クリティカルな設定では、意思決定は破滅的な結果のリスクを考慮するべきである。言い換えれば、意思決定はリスクに敏感であるべきである。オフラインRLにおけるリスクに関する従来の研究は、分布シフトを避けるためのオフラインRL技術と、リスク感受性を実現するためのリスク感受性RLアルゴリズムを組み合わせている。本研究では,これら2つの問題に共同で対処するメカニズムとしてリスク感受性を提案する。我々のモデルベースのアプローチは、認識論的(epistemic)不確実性と偶然的(aleatoric)不確実性の両方に対してリスク回避的である。データセットがカバーしていない領域は認識論的不確実性が高いため、認識論的不確実性へのリスク回避は分布シフトを防ぐ。偶然的不確実性へのリスク回避は、環境の確率性により悪い結果をもたらす可能性のある行動を抑制する。実験により,本アルゴリズムは決定論的ベンチマークにおいて競争力のある性能を達成し,確率的領域におけるリスクに敏感な目的に対して既存のアプローチを上回ることを示した。

Offline reinforcement learning (RL) is suitable for safety-critical domains where online exploration is too costly or dangerous. In such safety-critical settings, decision-making should take into consideration the risk of catastrophic outcomes. In other words, decision-making should be risk-sensitive. Previous works on risk in offline RL combine together offline RL techniques, to avoid distributional shift, with risk-sensitive RL algorithms, to achieve risk-sensitivity. In this work, we propose risk-sensitivity as a mechanism to jointly address both of these issues. Our model-based approach is risk-averse to both epistemic and aleatoric uncertainty. Risk-aversion to epistemic uncertainty prevents distributional shift, as areas not covered by the dataset have high epistemic uncertainty. Risk-aversion to aleatoric uncertainty discourages actions that may result in poor outcomes due to environment stochasticity. Our experiments show that our algorithm achieves competitive performance on deterministic benchmarks, and outperforms existing approaches for risk-sensitive objectives in stochastic domains.

翻訳日:2023-11-02 04:19:28 公開日:2023-10-30

# マルコフ連鎖モンテカルロを用いた線形統計形状モデル間の共通部分と差の近似

Approximating Intersections and Differences Between Linear Statistical Shape Models Using Markov Chain Monte Carlo ( http://arxiv.org/abs/2211.16314v2 )

ライセンス: Link先を確認

Maximilian Weiherer, Finn Klein, Bernhard Egger

(参考訳) 現在まで、統計形状モデル(SSM)の比較は、コンパクト性、一般化、特異性といった単純な指標を用いて行われる、単に性能に基づくものであることが多い。実際の形状空間間の類似性や違いは可視化も定量化もできない。本稿では,密な対応関係にある2つの線形SSMを定性的に比較する新しい手法として,近似的な共通部分空間と,モデルが張る(超楕円体状の)許容形状領域間の集合論的な差を計算する方法を提案する。この目的のために、マルコフ連鎖モンテカルロを用いて共通部分空間に属する形状の分布を近似し、その後、事後サンプルに主成分分析(PCA)を適用することで、最終的に共通部分空間の新しいSSMを得る。線形SSM間の差も同様の方法で推定するが、この場合、結果として得られる空間はもはや凸ではないため、PCAを適用する代わりに事後サンプルを可視化に用いる。提案アルゴリズムを,公開されている顔モデル間の共通部分空間と差を計算・解析することで定性的に示し,性別別の男性・女性モデルおよびアイデンティティ・表情モデルに着目する。合成データと実世界のデータから構築したSSMに基づく定量的評価により,本手法が正解(ground truth)の共通部分空間と差を正確に復元できることを示す。

To date, the comparison of Statistical Shape Models (SSMs) is often solely performance-based, carried out by means of simplistic metrics such as compactness, generalization, or specificity. Any similarities or differences between the actual shape spaces can neither be visualized nor quantified. In this paper, we present a new method to qualitatively compare two linear SSMs in dense correspondence by computing approximate intersection spaces and set-theoretic differences between the (hyper-ellipsoidal) allowable shape domains spanned by the models. To this end, we approximate the distribution of shapes lying in the intersection space using Markov chain Monte Carlo and subsequently apply Principal Component Analysis (PCA) to the posterior samples, eventually yielding a new SSM of the intersection space. We estimate differences between linear SSMs in a similar manner; here, however, the resulting spaces are no longer convex and we do not apply PCA but instead use the posterior samples for visualization. We showcase the proposed algorithm qualitatively by computing and analyzing intersection spaces and differences between publicly available face models, focusing on gender-specific male and female as well as identity and expression models. Our quantitative evaluation based on SSMs built from synthetic and real-world data sets provides detailed evidence that the introduced method is able to recover ground-truth intersection spaces and differences accurately.

翻訳日:2023-11-02 04:19:11 公開日:2023-10-30

# グラフデータのための外れ値にロバストなGromov-Wasserstein

Outlier-Robust Gromov-Wasserstein for Graph Data ( http://arxiv.org/abs/2302.04610v2 )

ライセンス: Link先を確認

Lemin Kong, Jiajin Li, Jianheng Tang, Anthony Man-Cho So

(参考訳) Gromov-Wasserstein (GW) 距離は、異なる計量空間上で支持される確率分布を比較し整列させるための強力なツールである。近年,GWは多様なグラフ学習タスクにおいて異種データを整列させるための主要なモデリング手法となっている。しかし、GW距離は外れ値に非常に敏感であることが知られており、目的関数において外れ値に他のサンプルと同じ重みが与えられると、大きな誤差が生じる可能性がある。この問題を軽減するため、我々はRGWと呼ばれるGW距離の新しいロバスト版を導入する。RGWは、Kullback-Leibler divergenceに基づく曖昧性集合の中で楽観的に摂動された周辺制約を特徴とする。RGWの利点を実用上より利用しやすくするために,Bregman proximal alternating linearized minimizationアルゴリズムを用いた,計算効率が高く理論的保証のある手順を開発した。広範な実験を通じて,理論的結果を検証し,部分グラフマッチングや部分形状対応などの実世界のグラフ学習タスクにおけるRGWの有効性を示す。

Gromov-Wasserstein (GW) distance is a powerful tool for comparing and aligning probability distributions supported on different metric spaces. Recently, GW has become the main modeling technique for aligning heterogeneous data for a wide range of graph learning tasks. However, the GW distance is known to be highly sensitive to outliers, which can result in large inaccuracies if the outliers are given the same weight as other samples in the objective function. To mitigate this issue, we introduce a new and robust version of the GW distance called RGW. RGW features optimistically perturbed marginal constraints within a Kullback-Leibler divergence-based ambiguity set. To make the benefits of RGW more accessible in practice, we develop a computationally efficient and theoretically provable procedure using Bregman proximal alternating linearized minimization algorithm. Through extensive experimentation, we validate our theoretical results and demonstrate the effectiveness of RGW on real-world graph learning tasks, such as subgraph matching and partial shape correspondence.

翻訳日:2023-11-02 04:10:26 公開日:2023-10-30

# 雑音量子力学における誤差緩和のロバスト性における閾値

Thresholds in the Robustness of Error Mitigation in Noisy Quantum Dynamics ( http://arxiv.org/abs/2302.04278v2 )

ライセンス: Link先を確認

Pradeep Niroula, Sarang Gopalakrishnan, Michael J. Gullans

(参考訳) ノイズの多い短期量子シミュレーションから有用な情報を抽出するには、エラー軽減戦略が必要である。これらの戦略の幅広いクラスは、ノイズ源の正確な特性評価に依存している。ノイズの特性評価が不完全である場合の、このような戦略のロバスト性について検討する。Imry-Maの議論を応用して、空間次元 $D \geq 2$ のランダムな空間的局所回路に対する誤差緩和のロバスト性にしきい値が存在することを予測する。すなわち、ノイズ特性評価の乱れがしきい値以下であれば、量子ビット数に応じてスケールする時間まで誤差緩和が可能である。対照的に、1次元の回路では、乱れの特性評価にいかなる不完全性があっても、緩和は $\mathcal{O}(1)$ の時間で失敗する。その結果,誤差緩和は,十分によく特性評価されたノイズに対してのみ実用的な方法であることがわかった。量子計算の優位性の検証, 測定誘起相転移のフォールトトレラントなプローブ, および短期デバイスにおける量子アルゴリズムへのさらなる含意について議論する。

Extracting useful information from noisy near-term quantum simulations requires error mitigation strategies. A broad class of these strategies relies on precise characterization of the noise source. We study the robustness of such strategies when the noise is imperfectly characterized. We adapt an Imry-Ma argument to predict the existence of a threshold in the robustness of error mitigation for random spatially local circuits in spatial dimensions $D \geq 2$: noise characterization disorder below the threshold rate allows for error mitigation up to times that scale with the number of qubits. For one-dimensional circuits, by contrast, mitigation fails at an $\mathcal{O}(1)$ time for any imperfection in the characterization of disorder. As a result, error mitigation is only a practical method for sufficiently well-characterized noise. We discuss further implications for tests of quantum computational advantage, fault-tolerant probes of measurement-induced phase transitions, and quantum algorithms in near-term devices.

翻訳日:2023-11-02 04:09:40 公開日:2023-10-30

# 文脈ラッソ:ディープニューラルネットワークによるスパース線形モデル

The Contextual Lasso: Sparse Linear Models via Deep Neural Networks ( http://arxiv.org/abs/2302.00878v3 )

ライセンス: Link先を確認

Ryan Thompson, Amir Dezfouli, Robert Kohn

(参考訳) スパース線形モデル(Sparse linear model)は、解釈可能な機械学習のための中核的なツールの1つであり、予測モデルが多くの領域の意思決定に浸透するにつれて、この分野の重要性は高まっている。残念ながら、スパース線形モデルは、ディープニューラルネットワークのようなブラックボックスモデルよりも、入力特徴の関数としてはるかに柔軟性が低い。この能力ギャップを念頭に置いて、入力特徴が2つのグループに分かれる、珍しくない状況を考察する。すなわち、解釈可能なモデルに変数として含める候補となる説明的特徴と、候補変数から選択を行いその効果を決定する文脈的特徴である。この二分法から、文脈的特徴の関数としてスパースパターンと係数が変化するように説明的特徴にスパース線形モデルを適合させる新しい統計推定器であるcontextual lassoが導かれる。フィッティングプロセスは、ディープニューラルネットワークを介してこの関数をノンパラメトリックに学習する。スパースな係数を得るために、ネットワークの出力を$\ell_1$制約付き線形モデルの空間に写像する射影層の形をした新しいlasso正則化器を用いてネットワークを訓練する。実データと合成データに関する広範な実験から、学習されたモデルは高い透明性を保ったまま、標準的なディープニューラルネットワークの予測力を犠牲にすることなく、通常のlassoよりもスパースになり得ることが示唆される。

Sparse linear models are one of several core tools for interpretable machine learning, a field of emerging importance as predictive models permeate decision-making in many domains. Unfortunately, sparse linear models are far less flexible as functions of their input features than black-box models like deep neural networks. With this capability gap in mind, we study a not-uncommon situation where the input features dichotomize into two groups: explanatory features, which are candidates for inclusion as variables in an interpretable model, and contextual features, which select from the candidate variables and determine their effects. This dichotomy leads us to the contextual lasso, a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features. The fitting process learns this function nonparametrically via a deep neural network. To attain sparse coefficients, we train the network with a novel lasso regularizer in the form of a projection layer that maps the network's output onto the space of $\ell_1$-constrained linear models. An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso without sacrificing the predictive power of a standard deep neural network.
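
To give a feel for why a projection onto an $\ell_1$-constrained set yields sparse coefficients, the sketch below implements the standard sort-based Euclidean projection of a single coefficient vector onto an $\ell_1$ ball. This is an assumption-laden illustration, not the paper's projection layer: the function name and the `radius` parameter are hypothetical, and the abstract does not say how the constraint is defined or how the layer is computed inside the network.

```python
def project_onto_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : sum(|w_i|) <= radius}."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already feasible, nothing to do
    magnitudes = sorted((abs(x) for x in v), reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, m in enumerate(magnitudes, start=1):
        cumulative += m
        candidate = (cumulative - radius) / j
        if m - candidate > 0:
            theta = candidate  # keep the threshold from the last valid index
    # Soft-threshold every entry; small coefficients become exactly zero.
    return [(1.0 if x >= 0 else -1.0) * max(abs(x) - theta, 0.0) for x in v]


coefficients = project_onto_l1_ball([3.0, -1.0, 0.5], radius=2.0)
```

The soft-thresholding in the last step is what produces exact zeros, which is the same mechanism that makes the ordinary lasso sparse.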

翻訳日:2023-11-02 04:09:07 公開日:2023-10-30

# きめ細かい分類のための粗分類器によるテスト時間修正

Test-Time Amendment with a Coarse Classifier for Fine-Grained Classification ( http://arxiv.org/abs/2302.00368v2 )

ライセンス: Link先を確認

Kanishk Jain, Shyamgopal Karthik, Vineet Gandhi

(参考訳) 細粒度分類における誤りの重大度を低減する問題について検討する。細粒度分類は、主に正確なアノテーションにドメインの専門知識を必要とするため困難である。一方で、粗い分類は比較的低いレベルの専門知識しか必要としないため、人間が特に得意とする。そこで本研究では,ラベル階層を利用し,粗粒度の予測を用いてテスト時に細粒度分類の性能を改善する,階層的アンサンブル(HiE)と呼ばれる新しいポストホック補正手法を提案する。葉ノードの親のみを必要とする本手法は,iNaturalist-19およびtieredImageNet-Hデータセットにおいて,平均的な誤りの重大度を有意に低減しつつTop-1精度を改善し,両方のベンチマークで新たな最先端を達成した。また,半教師付き設定における本手法の有効性についても検討した。提案手法は,細粒度クラスのトレーニングデータが減少するにつれて,Top-1精度を顕著に向上させつつ,誤りの重大度を大幅に低下させる。HiEは単純でポストホックな手法であるため,任意の既製の学習済みモデルと組み合わせてその予測をさらに改善する用途に実用的である。

We investigate the problem of reducing mistake severity for fine-grained classification. Fine-grained classification can be challenging, mainly due to the requirement of domain expertise for accurate annotation. However, humans are particularly adept at performing coarse classification as it requires relatively low levels of expertise. To this end, we present a novel approach for Post-Hoc Correction called Hierarchical Ensembles (HiE) that utilizes label hierarchy to improve the performance of fine-grained classification at test-time using the coarse-grained predictions. By only requiring the parents of leaf nodes, our method significantly reduces avg. mistake severity while improving top-1 accuracy on the iNaturalist-19 and tieredImageNet-H datasets, achieving a new state-of-the-art on both benchmarks. We also investigate the efficacy of our approach in the semi-supervised setting. Our approach brings notable gains in top-1 accuracy while significantly decreasing the severity of mistakes as training data decreases for the fine-grained classes. The simplicity and post-hoc nature of HiE renders it practical to be used with any off-the-shelf trained model to improve its predictions further.
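
One way to picture a post-hoc correction that uses only the parents of leaf nodes is to reweight each fine-grained probability by the coarse classifier's probability for that class's parent and then renormalize. The sketch below does exactly that; the multiplicative rule, the function name, and the toy labels are assumptions made for illustration, since the abstract does not spell out how HiE combines the two predictions.

```python
def amend_with_coarse(fine_probs, coarse_probs, parent_of):
    """Reweight fine-grained class probabilities by their parent's coarse probability."""
    weighted = {
        label: prob * coarse_probs[parent_of[label]]
        for label, prob in fine_probs.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("coarse and fine predictions have no overlapping support")
    return {label: weight / total for label, weight in weighted.items()}


# Hypothetical two-level hierarchy and predictions for one test image.
parent_of = {"beagle": "dog", "husky": "dog", "tabby": "cat"}
fine_probs = {"beagle": 0.30, "husky": 0.25, "tabby": 0.45}  # top-1: tabby
coarse_probs = {"dog": 0.8, "cat": 0.2}                      # coarse model says dog
amended = amend_with_coarse(fine_probs, coarse_probs, parent_of)
top1 = max(amended, key=amended.get)
```

In this toy case the fine-grained model alone would predict a cat breed, while the amended prediction stays within the dog subtree, so even a wrong leaf would be a less severe mistake in the hierarchy.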

翻訳日:2023-11-02 04:08:42 公開日:2023-10-30

# 大規模トランスフォーマーモデルの隠れ表現の幾何学

The geometry of hidden representations of large transformer models ( http://arxiv.org/abs/2302.00294v2 )

ライセンス: Link先を確認

Lucrezia Valeriani, Diego Doimo, Francesca Cuturello, Alessandro Laio, Alessio Ansuini, Alberto Cazzaniga

(参考訳) 大きなトランスは、タンパク質配列、画像、テキストなど、さまざまなデータタイプにわたる自己教師型データ分析に使用される強力なアーキテクチャである。これらのモデルでは、データセットのセマンティクス構造は、ある表現と次の表現の間の変換のシーケンスから現れる。これらの表現の幾何学的および統計的性質と、層を移動するときにどのように変化するかを特徴付ける。内在次元(ID)と周辺組成を解析することにより、タンパク質言語タスクと画像再構成タスクで訓練されたトランスフォーマーにおいて、これらの表現が同様に進化することがわかった。最初の層では、データ多様体は拡大し、高次元となり、次いで中間層で著しく収縮する。モデルの最後の部分では、idはほぼ一定か、あるいは第2の浅いピークを形成する。その結果、データセットの意味情報は最初のピークの終わりによりよく表現され、この現象は多様なデータセットで訓練された多くのモデルで観察できることがわかった。以上より,idプロファイルの相対的最小値に対応する中間層での表現は,下流の学習タスクにより適している,意味的コンテンツの最大化を監督せずに識別する明示的な戦略を指摘した。

Large transformers are powerful architectures used for self-supervised data analysis across various data types, including protein sequences, images, and text. In these models, the semantic structure of the dataset emerges from a sequence of transformations between one representation and the next. We characterize the geometric and statistical properties of these representations and how they change as we move through the layers. By analyzing the intrinsic dimension (ID) and neighbor composition, we find that the representations evolve similarly in transformers trained on protein language tasks and image reconstruction tasks. In the first layers, the data manifold expands, becoming high-dimensional, and then contracts significantly in the intermediate layers. In the last part of the model, the ID remains approximately constant or forms a second shallow peak. We show that the semantic information of the dataset is better expressed at the end of the first peak, and this phenomenon can be observed across many models trained on diverse datasets. Based on our findings, we point out an explicit strategy to identify, without supervision, the layers that maximize semantic content: representations at intermediate layers corresponding to a relative minimum of the ID profile are more suitable for downstream learning tasks.

翻訳日:2023-11-02 04:08:27 公開日:2023-10-30

# 確率フローの自己整合的速度マッチング

Self-Consistent Velocity Matching of Probability Flows ( http://arxiv.org/abs/2301.13737v3 )

ライセンス: Link先を確認

Lingxiao Li, Samuel Hurault, Justin Solomon

(参考訳) 本稿では,時間依存型フォッカー・プランク方程式やワッサーシュタイン勾配流を含む多種多様な質量保存偏微分方程式(PDE)を解くための離散化フリースケーラブルフレームワークを提案する。主な観測は、PDE溶液の時間変化速度場は自己整合性が必要であり、同じ速度場によって特徴づけられる確率フローを含む固定点方程式を満たす必要があることである。固定点方程式の残差を神経パラメータ化で直接最小化する代わりに、強い経験的性能を持つ重要な計算障害をバイパスするバイアス付き勾配推定器を用いた反復的定式化を用いる。従来の手法と比較して,本手法は時間的・空間的な離散化に悩まされず,より広い範囲のPDEをカバーし,高次元までスケールする。実験により,本手法は,利用可能時に解析解を精度良く回収し,学習時間が少ない高次元での優れた性能を実現する。

We present a discretization-free scalable framework for solving a large class of mass-conserving partial differential equations (PDEs), including the time-dependent Fokker-Planck equation and the Wasserstein gradient flow. The main observation is that the time-varying velocity field of the PDE solution needs to be self-consistent: it must satisfy a fixed-point equation involving the probability flow characterized by the same velocity field. Instead of directly minimizing the residual of the fixed-point equation with neural parameterization, we use an iterative formulation with a biased gradient estimator that bypasses significant computational obstacles with strong empirical performance. Compared to existing approaches, our method does not suffer from temporal or spatial discretization, covers a wider range of PDEs, and scales to high dimensions. Experimentally, our method recovers analytical solutions accurately when they are available and achieves superior performance in high dimensions with less training time compared to alternatives.

翻訳日:2023-11-02 04:08:08 公開日:2023-10-30

# Neural Relation Graph: ラベルノイズと外れ値データの識別のための統一フレームワーク

Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data ( http://arxiv.org/abs/2301.12321v5 )

ライセンス: Link先を確認

Jang-Hyun Kim, Sangdoo Yun, Hyun Oh Song

(参考訳) データの診断とクリーニングは、堅牢な機械学習システムを構築するための重要なステップである。しかしながら、ラベルエラーや過少表現、外れ値といった複雑な問題が存在するため、実世界の分布を持つ大規模データセット内の問題を特定することは難しい。本稿では,これまでほとんど無視されてきた情報源である、特徴埋め込み空間におけるデータの関係構造を利用して,問題のあるデータを特定する統一的な手法を提案する。そこで本研究では,データの関係グラフ構造に基づいてラベルエラーや外れ値データを検出するスケーラブルで効果的なアルゴリズムを提案する。さらに,特徴埋め込み空間におけるデータポイントの文脈情報を提供する可視化ツールを導入し,インタラクティブにデータ診断を行うための効果的なツールとして機能する。本研究では,ImageNet,ESC-50,SST2を含む大規模な画像,音声,言語領域のタスクにおいて,ラベルエラーおよび外れ値/分布外(OOD)の検出性能を評価する。本手法は,検討したすべてのタスクで最先端の検出性能を達成し,様々なドメインにまたがる大規模実世界のデータセットのデバッグにおいてその効果を実証する。私たちはhttps://github.com/snu-mllab/Neural-Relation-Graphでコードをリリースします。

Diagnosing and cleaning data is a crucial step for building robust machine learning systems. However, identifying problems within large-scale datasets with real-world distributions is challenging due to the presence of complex issues such as label errors, under-representation, and outliers. In this paper, we propose a unified approach for identifying problematic data by utilizing a largely ignored source of information: the relational structure of data in the feature-embedded space. To this end, we present scalable and effective algorithms for detecting label errors and outlier data based on the relational graph structure of data. We further introduce a visualization tool that provides contextual information about a data point in the feature-embedded space, serving as an effective tool for interactively diagnosing data. We evaluate the label error and outlier/out-of-distribution (OOD) detection performance of our approach on large-scale image, speech, and language domain tasks, including ImageNet, ESC-50, and SST2. Our approach achieves state-of-the-art detection performance on all tasks considered and demonstrates its effectiveness in debugging large-scale real-world datasets across various domains. We release the code at https://github.com/snu-mllab/Neural-Relation-Graph.

翻訳日:2023-11-02 04:07:48 公開日:2023-10-30

# アクティブラーニング評価の落とし穴を探る--有意義なパフォーマンス評価のための体系的枠組み

Navigating the Pitfalls of Active Learning Evaluation: A Systematic Framework for Meaningful Performance Assessment ( http://arxiv.org/abs/2301.10625v2 )

ライセンス: Link先を確認

Carsten T. Lüth, Till J. Bungert, Lukas Klein, Paul F. Jaeger

(参考訳) Active Learning (AL)は、ラベルなしデータのプールから最も情報性の高いサンプルをインタラクティブに選択することで、ラベル付けの負担を軽減することを目的としている。近年,ALクエリ手法の改良に関する研究が盛んに行われているが,半教師付き(Semi-SL)や自己教師付き学習(Self-SL)といった新たなパラダイムや,分類器構成の簡易な最適化と比較して,ALの有効性を疑問視する研究もある。このように、今日のAL文献は一貫性がなく矛盾した状況を示しており、実践者は自分のタスクにALを使うべきかどうか、またどのように使うべきかについて確信を持てないままである。本研究では,この不整合がAL手法の体系的かつ現実的な評価の欠如から生じていると主張する。具体的には,AL評価に必要な微妙な考慮事項を反映した、現在の文献における5つの主要な落とし穴を明らかにする。さらに,これらの落とし穴を克服し,AL手法の性能に関する有意義な記述を可能にする評価フレームワークを提案する。本プロトコルの妥当性を示すために,様々なデータセット,クエリ手法,AL設定,トレーニングパラダイムにまたがる画像分類に関する大規模実証研究とベンチマークを提示する。本研究の結果は,文献上の一貫性のない状況を明確にし,実践者に対して実践的な推奨を行うことを可能にする。ベンチマークは https://github.com/IML-DKFZ/realistic-al にホストされている。

Active Learning (AL) aims to reduce the labeling burden by interactively selecting the most informative samples from a pool of unlabeled data. While there has been extensive research on improving AL query methods in recent years, some studies have questioned the effectiveness of AL compared to emerging paradigms such as semi-supervised (Semi-SL) and self-supervised learning (Self-SL), or a simple optimization of classifier configurations. Thus, today's AL literature presents an inconsistent and contradictory landscape, leaving practitioners uncertain about whether and how to use AL in their tasks. In this work, we make the case that this inconsistency arises from a lack of systematic and realistic evaluation of AL methods. Specifically, we identify five key pitfalls in the current literature that reflect the delicate considerations required for AL evaluation. Further, we present an evaluation framework that overcomes these pitfalls and thus enables meaningful statements about the performance of AL methods. To demonstrate the relevance of our protocol, we present a large-scale empirical study and benchmark for image classification spanning various data sets, query methods, AL settings, and training paradigms. Our findings clarify the inconsistent picture in the literature and enable us to give hands-on recommendations for practitioners. The benchmark is hosted at https://github.com/IML-DKFZ/realistic-al .

翻訳日:2023-11-02 04:06:51 公開日:2023-10-30

# 多クラス分類における量子ニューラルネットワークの課題依存パワー

Problem-Dependent Power of Quantum Neural Networks on Multi-Class Classification ( http://arxiv.org/abs/2301.01597v3 )

ライセンス: Link先を確認

Yuxuan Du, Yibo Yang, Dacheng Tao, Min-Hsiu Hsieh

(参考訳) 量子ニューラルネットワーク(QNN)は物理世界を理解する上で重要なツールとなっているが、その利点と限界は完全には理解されていない。特定の符号化方法を持つQNNの中には、古典的なサロゲートによって効率的にシミュレートできるものもあるが、量子メモリを持つものは古典的な分類器よりも優れている。本稿では,マルチクラス分類タスクにおける量子ニューラルネットワーク分類器(qcs)の問題依存パワーを体系的に検討する。予測リスクの分析により, 分類器の訓練損失と一般化誤差を共同で評価する指標として, 訓練損失が一般化能力よりもパワーを支配すること, 第二に, 深層神経分類器の二重発光リスク曲線とは対照的に, qcsはu字型のリスク曲線をとること, の2つの重要な知見を明らかにした。また、最適QCとヘルストローム境界と等角的タイトフレームとの固有接続を明らかにする。そこで本研究では,学習課題における古典的分類器よりもQCの方が有効かどうかを探索するために,損失ダイナミクスを用いた手法を提案する。画像データセットにおける多層パーセプトロン上のqcsの優位性と畳み込みニューラルネットワークの限界を説明するための手法の有効性を数値実験により証明した。我々の研究はQNNの課題依存力に光を当て、その潜在的なメリットを評価するための実践的なツールを提供する。

Quantum neural networks (QNNs) have become an important tool for understanding the physical world, but their advantages and limitations are not fully understood. Some QNNs with specific encoding methods can be efficiently simulated by classical surrogates, while others with quantum memory may perform better than classical classifiers. Here we systematically investigate the problem-dependent power of quantum neural classifiers (QCs) on multi-class classification tasks. Through the analysis of expected risk, a measure that weighs the training loss and the generalization error of a classifier jointly, we identify two key findings: first, the training loss dominates the power rather than the generalization ability; second, QCs undergo a U-shaped risk curve, in contrast to the double-descent risk curve of deep neural classifiers. We also reveal the intrinsic connection between optimal QCs and the Helstrom bound and the equiangular tight frame. Using these findings, we propose a method that uses loss dynamics to probe whether a QC may be more effective than a classical classifier on a particular learning task. Numerical results demonstrate the effectiveness of our approach to explain the superiority of QCs over multilayer Perceptron on parity datasets and their limitations over convolutional neural networks on image datasets. Our work sheds light on the problem-dependent power of QNNs and offers a practical tool for evaluating their potential merit.

翻訳日:2023-11-02 04:06:25 公開日:2023-10-30

# 非凸非凹ミニマックス最適化のためのユニバーサル勾配降下上昇法

Universal Gradient Descent Ascent Method for Nonconvex-Nonconcave Minimax Optimization ( http://arxiv.org/abs/2212.12978v5 )

ライセンス: Link先を確認

Taoli Zheng, Linglingzhi Zhu, Anthony Man-Cho So, Jose Blanchet, Jiajin Li

(参考訳) 非凸非凹ミニマックス最適化は、機械学習における幅広い応用により、過去10年間、大きな注目を集めてきた。既存のアルゴリズムの多くは、主(双対)関数の凸性(凹性)のような片側の情報や、Polyak-Łojasiewicz (PŁ) 条件や Kurdyka-Łojasiewicz (KŁ) 条件のような特定の構造に依存している。しかし、これらの正則性条件の検証は実際には困難である。この課題を克服するために,主変数と双対変数の更新を自然にバランスさせる、普遍的に適用可能な新しい単一ループアルゴリズムである二重平滑化勾配降下上昇法 (DS-GDA) を提案する。すなわち、同じハイパーパラメータを持つDS-GDAは、片側KŁ性を持つ非凸-凹、凸-非凹、非凸-非凹問題を一様に解くことができ、$\mathcal{O}(\epsilon^{-4})$ の複雑性で収束する。KŁ指数が既知の場合、よりシャープな(最適でさえある)反復複雑性が得られる。具体的には、指数 $\theta\in(0,1)$ の片側KŁ条件の下で、DS-GDA は $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$ の反復複雑性で収束する。これらはいずれも文献における対応する最良の結果と一致している。さらに, DS-GDA は PŁ条件, KŁ条件, 弱Minty変分不等式条件などの正則性条件がなくても, 一般の非凸非凹問題に実用上適用可能であることを示す。「Forsaken」、「Bilinearly-coupled minimax」、「Sixth-order polynomial」、「PolarGame」など、文献にある様々な困難な非凸非凹の例において、提案するDS-GDAはいずれもリミットサイクルから脱することができる。我々の知る限り、これはこれらすべての困難な問題で収束を達成する最初の一階アルゴリズムである。

Nonconvex-nonconcave minimax optimization has received intense attention over the last decade due to its broad applications in machine learning. Most existing algorithms rely on one-sided information, such as the convexity (resp. concavity) of the primal (resp. dual) functions, or other specific structures, such as the Polyak-Łojasiewicz (PŁ) and Kurdyka-Łojasiewicz (KŁ) conditions. However, verifying these regularity conditions is challenging in practice. To meet this challenge, we propose a novel universally applicable single-loop algorithm, the doubly smoothed gradient descent ascent method (DS-GDA), which naturally balances the primal and dual updates. That is, DS-GDA with the same hyperparameters is able to uniformly solve nonconvex-concave, convex-nonconcave, and nonconvex-nonconcave problems with one-sided KŁ properties, achieving convergence with $\mathcal{O}(\epsilon^{-4})$ complexity. Sharper (even optimal) iteration complexity can be obtained when the KŁ exponent is known. Specifically, under the one-sided KŁ condition with exponent $\theta\in(0,1)$, DS-GDA converges with an iteration complexity of $\mathcal{O}(\epsilon^{-2\max\{2\theta,1\}})$. These rates all match the corresponding best results in the literature. Moreover, we show that DS-GDA is practically applicable to general nonconvex-nonconcave problems even without any regularity conditions, such as the PŁ condition, KŁ condition, or weak Minty variational inequalities condition. For various challenging nonconvex-nonconcave examples in the literature, including "Forsaken", "Bilinearly-coupled minimax", "Sixth-order polynomial", and "PolarGame", the proposed DS-GDA gets rid of limit cycles in every case. To the best of our knowledge, this is the first first-order algorithm to achieve convergence on all of these formidable problems.
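To make the stated rate easier to read, here is a short worked evaluation of the exponent in the bound quoted in the abstract. It is plain arithmetic on that expression and assumes nothing beyond it.

```latex
\[
\mathcal{O}\!\left(\epsilon^{-2\max\{2\theta,\,1\}}\right)
=
\begin{cases}
\mathcal{O}\!\left(\epsilon^{-2}\right), & 0 < \theta \le \tfrac{1}{2},\\[4pt]
\mathcal{O}\!\left(\epsilon^{-4\theta}\right), & \tfrac{1}{2} < \theta < 1,
\end{cases}
\qquad
\text{e.g. } \theta = \tfrac{3}{4} \;\Rightarrow\; \mathcal{O}\!\left(\epsilon^{-3}\right).
\]
```

As $\theta \to 1$ the exponent $-4\theta$ approaches $-4$, which agrees with the general $\mathcal{O}(\epsilon^{-4})$ rate stated for the case where the exponent is not known.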

翻訳日:2023-11-02 04:06:00 公開日:2023-10-30

# マルチモーダルインタラクションの定量化とモデル化:情報分解フレームワーク

Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework ( http://arxiv.org/abs/2302.12247v4 )

ライセンス: Link先を確認

Paul Pu Liang, Yun Cheng, Xiang Fan, Chun Kai Ling, Suzanne Nie, Richard Chen, Zihao Deng, Nicholas Allen, Randy Auerbach, Faisal Mahmood, Ruslan Salakhutdinov, Louis-Philippe Morency

(参考訳) 近年のマルチモーダルアプリケーションへの関心の高まりにより、様々なモダリティから情報を表現・統合するためのデータセットや手法が広く選択された。これらの経験的な進歩にもかかわらず、基礎的な研究の疑問が残る: マルチモーダルなタスクを解決するのに必要な相互作用をどのように定量化できるか? その後、これらの相互作用を捉えるのに最も適したマルチモーダルモデルは何ですか? これらの質問に答えるために,入力モダリティと出力タスクを関連付ける冗長性,特異性,相乗効果の程度を定量化する情報理論的手法を提案する。これら3つの測度をマルチモーダル分布(略してPID)のPID統計と呼び、高次元分布にスケールするこれらのPID統計に対する2つの新しい推定値を導入する。 PID推定を検証するために、PIDが知られている合成データセットと、PID推定を人間のアノテーションと比較する大規模マルチモーダルベンチマークの両方で広範な実験を行う。最後に,(1)マルチモーダルデータセット内のインタラクションの定量化,(2)マルチモーダルモデルでキャプチャされたインタラクションの定量化,(3)モデル選択のための原則的アプローチ,(4)病理学,ムード予測,ロボット知覚における3つの実世界のケーススタディにおいて有用性を示す。

The recent explosion of interest in multimodal applications has resulted in a wide selection of datasets and methods for representing and integrating information from different modalities. Despite these empirical advances, there remain fundamental research questions: How can we quantify the interactions that are necessary to solve a multimodal task? Subsequently, what are the most suitable multimodal models to capture these interactions? To answer these questions, we propose an information-theoretic approach to quantify the degree of redundancy, uniqueness, and synergy relating input modalities with an output task. We term these three measures as the PID statistics of a multimodal distribution (or PID for short), and introduce two new estimators for these PID statistics that scale to high-dimensional distributions. To validate PID estimation, we conduct extensive experiments on both synthetic datasets where the PID is known and on large-scale multimodal benchmarks where PID estimations are compared with human annotations. Finally, we demonstrate their usefulness in (1) quantifying interactions within multimodal datasets, (2) quantifying interactions captured by multimodal models, (3) principled approaches for model selection, and (4) three real-world case studies engaging with domain experts in pathology, mood prediction, and robotic perception where our framework helps to recommend strong multimodal models for each application.

翻訳日:2023-11-02 03:58:37 公開日:2023-10-30

# アンバウンドマシン・アンラーニングに向けて

Towards Unbounded Machine Unlearning ( http://arxiv.org/abs/2302.09880v3 )

ライセンス: Link先を確認

Meghdad Kurmanji, Peter Triantafillou, Jamie Hayes, Eleni Triantafillou

(参考訳) ディープマシンアンラーニング(deep machine unlearning)は、トレーニングされたニューラルネットワークから、そのトレーニングセットの一部を「削除」する問題である。この問題は非常にタイムリーであり、バイアスの除去(RB)、(トレーニングされたモデルにおける誤ラベルデータによって引き起こされる)混乱の解消(RC)、ユーザプライバシの保護(UP)のためにユーザが「忘れられる権利」を行使できるようにすることといった重要なタスクを含む、多くの応用がある。本論文は、我々の知る限り、それぞれが独自の要件、「忘却」の定義、および忘却品質に関するメトリクスを持つという観点から、異なるアプリケーション(RB, RC, UP)に対するアンラーニングを研究した初めてのものである。UPについては,強力なメンバーシップ推論攻撃をアンラーニング向けに適応させた新たな手法を提案する。また、新しいアンラーニングアルゴリズムであるSCRUBを提案する。SCRUBは、RB、RC、UPというアプリケーションごとに異なるメトリクスのすべてにおいて、忘却品質で一貫してトップの性能を示す唯一の手法である。同時に、SCRUBはモデルの有用性(すなわち保持データに対する精度と汎化)を測定するメトリクスでも一貫してトップの性能を示し、従来手法よりも効率的である。以上は、従来の最先端手法との包括的な実証的評価によって裏付けられている。

Deep machine unlearning is the problem of `removing' from a trained neural network a subset of its training set. This problem is very timely and has many applications, including the key tasks of removing biases (RB), resolving confusion (RC) (caused by mislabelled data in trained models), as well as allowing users to exercise their `right to be forgotten' to protect User Privacy (UP). This paper is the first, to our knowledge, to study unlearning for different applications (RB, RC, UP), with the view that each has its own desiderata, definitions for `forgetting' and associated metrics for forget quality. For UP, we propose a novel adaptation of a strong Membership Inference Attack for unlearning. We also propose SCRUB, a novel unlearning algorithm, which is the only method that is consistently a top performer for forget quality across the different application-dependent metrics for RB, RC, and UP. At the same time, SCRUB is also consistently a top performer on metrics that measure model utility (i.e. accuracy on retained data and generalization), and is more efficient than previous work. The above are substantiated through a comprehensive empirical evaluation against previous state-of-the-art.

翻訳日:2023-11-02 03:57:21 公開日:2023-10-30

# DP-SGDにおける境界学習データ再構成

Bounding Training Data Reconstruction in DP-SGD ( http://arxiv.org/abs/2302.07225v3 )

ライセンス: Link先を確認

Jamie Hayes, Saeed Mahloujifar, Borja Balle

(参考訳) 異なるプライベートトレーニングは、通常はメンバーシップ推論攻撃に対する保証として解釈される保護を提供する。この保証はプロキシによって、完全なトレーニング例を抽出しようとするレコンストラクション攻撃など、他の脅威にも拡張される。最近の研究は、もしメンバーシップ攻撃から保護する必要がなく、訓練データ再構成から保護したいというなら、これらのより野心的な攻撃から保護するためにノイズが少ないため、プライベートモデルの有用性を改善することができるという証拠を提供している。さらに,私的深層学習の標準アルゴリズムであるDP-SGDの文脈でこれを検証し,DP-SGDに対する再構築攻撃の成功と,我々の限界の予測に実証的に一致する攻撃に上限を与える。これら2つの結果は,dp-sgdのプライバシパラメータの設定方法について,レコンストラクション攻撃から保護するための詳細な調査の扉を開くものだ。最後に, DP-SGDパラメータの異なる設定を同一のDP保証に導いた場合, 復元における成功率が著しく異なることを示すために, DP保証だけでは再建攻撃に対する保護を制御できない可能性が示唆された。

Differentially private training offers a protection which is usually interpreted as a guarantee against membership inference attacks. By proxy, this guarantee extends to other threats like reconstruction attacks attempting to extract complete training examples. Recent works provide evidence that if one does not need to protect against membership attacks but instead only wants to protect against training data reconstruction, then utility of private models can be improved because less noise is required to protect against these more ambitious attacks. We investigate this further in the context of DP-SGD, a standard algorithm for private deep learning, and provide an upper bound on the success of any reconstruction attack against DP-SGD together with an attack that empirically matches the predictions of our bound. Together, these two results open the door to fine-grained investigations on how to set the privacy parameters of DP-SGD in practice to protect against reconstruction attacks. Finally, we use our methods to demonstrate that different settings of the DP-SGD parameters leading to the same DP guarantees can result in significantly different success rates for reconstruction, indicating that the DP guarantee alone might not be a good proxy for controlling the protection against reconstruction attacks.
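For readers unfamiliar with the parameters mentioned above, the following is a minimal sketch of a single DP-SGD update as it is commonly described: clip each per-example gradient, sum, add Gaussian noise scaled by the clipping norm, then average. It is not the paper's attack or bound; the names `clip`, `dp_sgd_step`, `max_norm`, and `noise_multiplier` are illustrative assumptions.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One noisy step: clip per-example gradients, sum, add noise, average."""
    dim = len(params)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    sigma = noise_multiplier * max_norm  # noise std is tied to the clipping norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    batch = len(per_example_grads)
    return [p - lr * n / batch for p, n in zip(params, noisy)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
# noise_multiplier=0.0 disables the noise so the clipping effect is easy to see
# (this setting gives no privacy; it is for illustration only).
new_params = dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                         noise_multiplier=0.0, rng=rng)
```

The clipping norm, noise multiplier, batch size, and number of steps are the kind of settings the abstract refers to when it says different DP-SGD configurations can share the same DP guarantee.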

翻訳日:2023-11-02 03:56:20 公開日:2023-10-30

# 平均Hölder smoothnessを用いた近最適学習

Near-optimal learning with average Hölder smoothness ( http://arxiv.org/abs/2302.06005v3 )

ライセンス: Link先を確認

Steve Hanneke, Aryeh Kontorovich, Guy Kornowski

(参考訳) 我々は、Ashlagi et al. (COLT 2021) によって提案された平均リプシッツ平滑性の概念を、Hölder平滑性に拡張することで一般化する。この関数の「実効的な滑らかさ」の尺度は、基礎となる分布に敏感であり、古典的な「最悪ケース」のHölder定数よりも劇的に小さくなりうる。我々は、実現可能な場合と不可知(雑音あり)の場合の両方の回帰設定を考察し、平均Hölder平滑性によるリスクの上界と下界を証明する。これらのレートは、平均リプシッツ平滑性という特殊な場合においても、これまで知られていた両方のレートを改善する。さらに,我々の下界は実現可能な設定において対数因子を除いてタイトであり,したがってミニマックスレートが確立される。アルゴリズムの観点からは, 平均平滑性の概念は未知の基礎分布に関して定義されるため, 学習者は関数クラスの明示的な表現を持たず, ERMを実行できない。それにもかかわらず、我々は両方の(ほぼ)最適な学習レートを達成する、それぞれ異なる学習アルゴリズムを提供する。我々の結果は任意の全有界距離空間で成り立ち、その内在的な幾何の観点で述べられる。総じて,Hölder平滑性の古典的な最悪ケースの概念は本質的にその平均に置き換えることができ,かなりシャープな保証が得られることを示す。

We generalize the notion of average Lipschitz smoothness proposed by Ashlagi et al. (COLT 2021) by extending it to Hölder smoothness. This measure of the "effective smoothness" of a function is sensitive to the underlying distribution and can be dramatically smaller than its classic "worst-case" Hölder constant. We consider both the realizable and the agnostic (noisy) regression settings, proving upper and lower risk bounds in terms of the average Hölder smoothness; these rates improve upon both previously known rates even in the special case of average Lipschitz smoothness. Moreover, our lower bound is tight in the realizable setting up to log factors, thus we establish the minimax rate. From an algorithmic perspective, since our notion of average smoothness is defined with respect to the unknown underlying distribution, the learner does not have an explicit representation of the function class, hence is unable to execute ERM. Nevertheless, we provide distinct learning algorithms that achieve both (nearly) optimal learning rates. Our results hold in any totally bounded metric space, and are stated in terms of its intrinsic geometry. Overall, our results show that the classic worst-case notion of Hölder smoothness can be essentially replaced by its average, yielding considerably sharper guarantees.

翻訳日:2023-11-02 03:54:26 公開日:2023-10-30

# 人間とロボットのコラボレーションアプリケーションのための学習データと深層学習によるマルチユーザ行動認識に向けて

Towards Multi-User Activity Recognition through Facilitated Training Data and Deep Learning for Human-Robot Collaboration Applications ( http://arxiv.org/abs/2302.05763v3 )

ライセンス: Link先を確認

Francesco Semeraro, Jon Carberry and Angelo Cangelosi

(参考訳) HRI(Human-robot Interaction)研究は、ロボットが複数の人間のユーザと同時に対話するマルチパーティシナリオに、段階的に対処している。逆に、研究はまだ人間とロボットのコラボレーションの初期段階にある。このようなコラボレーションを扱うために機械学習技術を使用するには、典型的なHRCセットアップよりも生成しにくいデータが必要である。本研究は,非Dydic HRCアプリケーションの並列タスクのシナリオを概説する。これらの概念に基づいて,シングルユーザに関連するデータを収集し,後処理でマージすることで,複数ユーザの活動に関するデータ収集の代替手法を提案し,ペア設定の録音に係わる労力を削減する。このステートメントを検証するために、シングルユーザのアクティビティの3dスケルトンポーズが収集され、ペアにマージされた。その後、このようなデータポイントを用いて長期記憶ネットワーク(LSTM)と時空間グラフ畳み込みネットワーク(STGCN)からなる変動オートエンコーダ(VAE)を別々にトレーニングし、両者の協調活動を認識する。その結果、同じ設定で記録されたユーザのグループに関するトレーニングデータと比較すると、この方法で収集したデータをHRC設定のペアに利用し、同様のパフォーマンスを得ることが可能であり、これらのデータの生成にまつわる技術的困難を軽減できることがわかった。関連コードと収集されたデータは公開されている。

Human-robot interaction (HRI) research is progressively addressing multi-party scenarios, where a robot interacts with more than one human user at the same time. Conversely, research is still at an early stage for human-robot collaboration (HRC). The use of machine learning techniques to handle this type of collaboration requires data that are less feasible to produce than in a typical HRC setup. This work outlines scenarios of concurrent tasks for non-dyadic HRC applications. Based upon these concepts, this study also proposes an alternative way of gathering data regarding multi-user activity, by collecting data related to single users and merging them in post-processing, to reduce the effort involved in producing recordings of pair settings. To validate this statement, 3D skeleton poses of the activity of single users were collected and merged in pairs. After this, these data points were used to separately train a long short-term memory (LSTM) network and a variational autoencoder (VAE) composed of spatio-temporal graph convolutional networks (STGCN) to recognise the joint activities of the pairs of people. The results showed that it is possible to make use of data collected in this way for pair HRC settings and obtain performance similar to that achieved with training data from groups of users recorded under the same settings, avoiding the technical difficulties involved in producing such data. The related code and collected data are publicly available.

翻訳日:2023-11-02 03:54:03 公開日:2023-10-30

# Jaccard Metric Losses: ソフトラベルによるJaccard Indexの最適化

Jaccard Metric Losses: Optimizing the Jaccard Index with Soft Labels ( http://arxiv.org/abs/2302.05666v4 )

ライセンス: Link先を確認

Zifu Wang and Xuefei Ning and Matthew B. Blaschko

(参考訳) IoU(Intersection over Union)損失はJaccardインデックスを直接最適化するサロゲートである。損失関数の一部としてIoU損失を活用すると、クロスエントロピー損失のみのような画素単位の損失を最適化する場合よりも、セマンティックセグメンテーションタスクにおいて優れた性能が得られることが示されている。しかし, これらの損失はソフトラベルを処理できないため, ラベル平滑化, 知識蒸留, 半教師付き学習といった重要な訓練技術をサポートする柔軟性に欠けることがわかった。この問題に対処するため,ハードラベルを用いた標準設定ではソフトJaccard損失と同一でありながら,ソフトラベルと完全に互換性のあるJaccard Metric Losses(JML)を導入する。JMLをラベル平滑化,知識蒸留,半教師付き学習というソフトラベルの3つの代表的なユースケースに適用し,モデルの精度と較正を向上させる可能性を示す。実験により,4つのセマンティックセグメンテーションデータセット(Cityscapes, PASCAL VOC, ADE20K, DeepGlobe Land)と,古典的なCNNや最近のビジョントランスフォーマーを含む13のアーキテクチャにわたって,クロスエントロピー損失に対する一貫した改善が示された。注目すべきことに、我々の単純なアプローチは、最先端の知識蒸留と半教師付き学習の手法を大きく上回っている。コードは https://github.com/zifuwanggg/JDTLosses で入手できる。

Intersection over Union (IoU) losses are surrogates that directly optimize the Jaccard index. Leveraging IoU losses as part of the loss function has demonstrated superior performance in semantic segmentation tasks compared to optimizing pixel-wise losses such as the cross-entropy loss alone. However, we identify a lack of flexibility in these losses to support vital training techniques like label smoothing, knowledge distillation, and semi-supervised learning, mainly due to their inability to process soft labels. To address this, we introduce Jaccard Metric Losses (JMLs), which are identical to the soft Jaccard loss in standard settings with hard labels but are fully compatible with soft labels. We apply JMLs to three prominent use cases of soft labels: label smoothing, knowledge distillation and semi-supervised learning, and demonstrate their potential to enhance model accuracy and calibration. Our experiments show consistent improvements over the cross-entropy loss across 4 semantic segmentation datasets (Cityscapes, PASCAL VOC, ADE20K, DeepGlobe Land) and 13 architectures, including classic CNNs and recent vision transformers. Remarkably, our straightforward approach significantly outperforms state-of-the-art knowledge distillation and semi-supervised learning methods. The code is available at https://github.com/zifuwanggg/JDTLosses.
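The soft-label issue can be seen in a few lines. The sketch below compares the usual soft Jaccard loss with one L1-norm-based variant that coincides with it on hard labels but reaches zero when a soft label equals the prediction. The L1 form is an assumption chosen for illustration and is not claimed to be the paper's exact JML definition; both function names are hypothetical.

```python
def soft_jaccard_loss(x, y):
    """Soft Jaccard loss: 1 - <x, y> / (|x|_1 + |y|_1 - <x, y>).

    Assumes non-negative inputs that are not all zero.
    """
    inter = sum(a * b for a, b in zip(x, y))
    union = sum(x) + sum(y) - inter
    return 1.0 - inter / union


def l1_jaccard_loss(x, y):
    """Illustrative variant: 1 - (|x+y|_1 - |x-y|_1) / (|x+y|_1 + |x-y|_1)."""
    s = sum(abs(a + b) for a, b in zip(x, y))
    d = sum(abs(a - b) for a, b in zip(x, y))
    return 1.0 - (s - d) / (s + d)


pred = [0.9, 0.2, 0.6]   # predicted foreground probabilities for three pixels
hard = [1.0, 0.0, 1.0]   # hard (0/1) labels
soft = [0.9, 0.2, 0.6]   # a soft label identical to the prediction

same_on_hard = abs(soft_jaccard_loss(pred, hard) - l1_jaccard_loss(pred, hard))
sjl_on_soft = soft_jaccard_loss(pred, soft)   # stays well above zero
l1_on_soft = l1_jaccard_loss(pred, soft)      # zero for a perfect match
```

With hard labels, |x - y| equals x + y - 2xy elementwise, which is why the two expressions agree there; with soft labels the product-based intersection no longer vanishes the loss at a perfect match, which is the inflexibility the abstract points to.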

翻訳日:2023-11-02 03:52:51 公開日:2023-10-30

# AIシステムによるマニピュレーションの特徴付け

Characterizing Manipulation from AI Systems ( http://arxiv.org/abs/2303.09387v3 )

ライセンス: Link先を確認

Micah Carroll, Alan Chan, Henry Ashton, David Krueger

(参考訳) 操作は、ソーシャルメディア、広告、チャットボットなど、多くのドメインで共通の関心事である。 AIシステムは世界とのインタラクションをより仲介するので、システム設計者の意図なしにAIシステムが人間を操作できる程度を理解することが重要である。我々の研究は、AIシステムのコンテキストにおける操作の定義と測定における課題を明らかにする。第一に、私たちは他の分野からの操作に関する先行文献を構築し、インセンティブ、意図、危害、隠ぺいの概念に依存する操作の可能な概念の空間を特徴づける。各要因の運用方法についての提案をレビューする。第2に,人間(または他のエージェント)を意図的にかつ秘密的に変化させるインセンティブを追求しているかのように振る舞う場合,システムはマニピュレーションである,という特徴に基づく操作の定義を提案する。第3に,マニピュレーションと関連する概念(デセプションや強制など)との関係について論じる。最後に、いくつかのアプリケーションにおける操作の運用をコンテキスト化します。全体的な評価では、AIシステムによる操作の定義と測定にいくつかの進歩があったが、多くのギャップが残っている。コンセンサスの定義や測定のための信頼できるツールがないため、システム設計者の意図なしにAIシステムが人間の操作を学ぶ可能性を排除することはできない。このような操作は、人間の自律性に重大な脅威をもたらし、それを軽減するための予防措置が保証されていることを示唆している。

Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interactions with the world, it is important to understand the degree to which AI systems might manipulate humans without the intent of the system designers. Our work clarifies challenges in defining and measuring manipulation in the context of AI systems. Firstly, we build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation, which we find to depend upon the concepts of incentives, intent, harm, and covertness. We review proposals on how to operationalize each factor. Second, we propose a definition of manipulation based on our characterization: a system is manipulative if it acts as if it were pursuing an incentive to change a human (or another agent) intentionally and covertly. Third, we discuss the connections between manipulation and related concepts, such as deception and coercion. Finally, we contextualize our operationalization of manipulation in some applications. Our overall assessment is that while some progress has been made in defining and measuring manipulation from AI systems, many gaps remain. In the absence of a consensus definition and reliable tools for measurement, we cannot rule out the possibility that AI systems learn to manipulate humans without the intent of the system designers. We argue that such manipulation poses a significant threat to human autonomy, suggesting that precautionary actions to mitigate it are warranted.

翻訳日:2023-11-02 03:45:23 公開日:2023-10-30

# 点クラウドのためのパラメトリック表面制約アップサンプラーネットワーク

Parametric Surface Constrained Upsampler Network for Point Cloud ( http://arxiv.org/abs/2303.08240v2 )

ライセンス: Link先を確認

Pingping Cai and Zhenyao Wu and Xinyi Wu and Song Wang

(参考訳) スパースポイント表現を与えられたクリーンで高密度なポイントクラウドを生成することを目的としたポイントクラウドアップサンプラーの設計は、コンピュータビジョンにおける根本的な挑戦的な問題である。一連の試みは、ディープニューラルネットワークを介してポイントツーポイントマッピング関数を確立することによって、この目標を達成する。しかし、これらのアプローチは表面レベルの明示的な制約が欠如しているため、異常点を生じやすい。この問題を解決するために,ニューラルネットワークにバイコビック関数と回転関数で表されるパラメトリック曲面を学習させ,そこで新たに生成された点を基底面に拘束することにより,新しいサーフェス正規化器をアップサンプラーネットワークに導入する。これらの設計は、2つの異なるネットワークに統合され、レイヤポイントクラウドのアップサンプリングとポイントクラウドのコンプリートによる評価の利点を活かす。両課題の最先端実験結果から,提案手法の有効性が示された。実装コードはhttps://github.com/corecai163/PSCUで公開される。

Designing a point cloud upsampler, which aims to generate a clean and dense point cloud given a sparse point representation, is a fundamental and challenging problem in computer vision. A line of attempts achieves this goal by establishing a point-to-point mapping function via deep neural networks. However, these approaches are prone to producing outlier points due to the lack of explicit surface-level constraints. To solve this problem, we introduce a novel surface regularizer into the upsampler network by forcing the neural network to learn the underlying parametric surface represented by bicubic functions and rotation functions, where the newly generated points are then constrained to the underlying surface. For evaluation, these designs are integrated into two different networks for two tasks that take advantage of upsampling layers: point cloud upsampling and point cloud completion. The state-of-the-art experimental results on both tasks demonstrate the effectiveness of the proposed method. The implementation code will be available at https://github.com/corecai163/PSCU.

翻訳日:2023-11-02 03:44:43 公開日:2023-10-30

# コンピュータグラフィックス画像の主観的・客観的品質評価

Subjective and Objective Quality Assessment for in-the-Wild Computer Graphics Images ( http://arxiv.org/abs/2303.08050v3 )

ライセンス: Link先を確認

Zicheng Zhang, Wei Sun, Yingjie Zhou, Jun Jia, Zhichao Zhang, Jing Liu, Xiongkuo Min, and Guangtao Zhai

(参考訳) コンピュータグラフィックス画像(CGI)は、コンピュータプログラムによって人工的に生成され、ゲームやストリーミングメディアなどの様々なシナリオにおいて広く認識されている。実際には、CGIの品質は、生産期間中のレンダリングの低下、マルチメディアアプリケーションの送信時に必然的な圧縮アーティファクト、構成と設計の低下による美的品質の低下に常に悩まされている。しかし、コンピュータグラフィックス画像品質評価(CGIQA)の課題に対処する研究はほとんど行われていない。ほとんどの画像品質評価(IQA)メトリクスは、自然シーン画像(NSI)のために開発され、合成歪みを持つNSIからなるデータベース上で検証される。 NSIとCGIの品質評価のギャップを埋めるため,6,000のCGI(CGIQA-6k)からなる大規模CGIQAデータベースを構築し,CGIの正確な知覚評価を得るために,よく制御された実験環境において主観的な実験を行う。そこで本研究では,歪みと審美的品質の表現を両立し,効果的な深層学習に基づくno-reference (nr) iqaモデルを提案する。実験の結果,提案手法は構築されたCGIQA-6kデータベースや他のCGIQA関連データベース上で,最先端のNR IQA手法よりも優れていた。データベースはhttps://github.com/zzc-1998/cgiqa6kでリリースされる。

Computer graphics images (CGIs) are artificially generated by means of computer programs and are widely perceived under various scenarios, such as games, streaming media, etc. In practice, the quality of CGIs consistently suffers from poor rendering during production, inevitable compression artifacts during the transmission of multimedia applications, and low aesthetic quality resulting from poor composition and design. However, few works have been dedicated to dealing with the challenge of computer graphics image quality assessment (CGIQA). Most image quality assessment (IQA) metrics are developed for natural scene images (NSIs) and validated on databases consisting of NSIs with synthetic distortions, which are not suitable for in-the-wild CGIs. To bridge the gap between evaluating the quality of NSIs and CGIs, we construct a large-scale in-the-wild CGIQA database consisting of 6,000 CGIs (CGIQA-6k) and carry out the subjective experiment in a well-controlled laboratory environment to obtain the accurate perceptual ratings of the CGIs. Then, we propose an effective deep learning-based no-reference (NR) IQA model by utilizing both distortion and aesthetic quality representation. Experimental results show that the proposed method outperforms all other state-of-the-art NR IQA methods on the constructed CGIQA-6k database and other CGIQA-related databases. The database is released at https://github.com/zzc-1998/CGIQA6K.

翻訳日:2023-11-02 03:44:23 公開日:2023-10-30

# 損失検査による物体検出データセットにおけるラベル誤りの同定

Identifying Label Errors in Object Detection Datasets by Loss Inspection ( http://arxiv.org/abs/2303.06999v2 )

ライセンス: Link先を確認

Marius Schubert, Tobias Riedlinger, Karsten Kahl, Daniel Kröll, Sebastian Schoenen, Siniša Šegvić, Matthias Rottmann

(参考訳) 教師あり物体検出のためのデータセットのラベル付けは、退屈で時間のかかる作業である。エラーはアノテーション中に容易に混入し、レビューでも見落とされるため、ベンチマークが不正確になり、ノイズを含むラベルで学習したディープニューラルネットワークの性能が低下する。本研究では、物体検出データセットにおけるラベルエラー検出手法のベンチマークを初めて導入するとともに、ラベルエラー検出手法といくつかのベースラインを提案する。適切にラベル付けされた物体検出データセットの訓練セットとテストセットに、ランダムに導入した4種類のラベルエラーをシミュレートする。提案するラベルエラー検出手法では、2段階の物体検出器が与えられると仮定し、両段階の分類損失と回帰損失の和を考える。損失は、予測と、シミュレートしたラベルエラーを含むノイズラベルとに対して計算され、後者の検出を目的とする。本手法を、深層学習を用いない素朴な手法、物体検出器のスコア、分類ソフトマックス分布のエントロピーという3つのベースラインと比較する。本手法はすべてのベースラインを上回り、検討した手法の中で4種類すべてのラベルエラーを効率的に検出できる唯一の手法であることを示す。さらに、a) 物体検出で一般的に使用されるテストデータセットと b) プロプライエタリなデータセットにおいて、実際のラベルエラーを検出する。いずれの場合も偽陽性率は低く、ラベルエラーを a) では最大71.5%、b) では97%の精度(precision)で検出する。

Labeling datasets for supervised object detection is a dull and time-consuming task. Errors can be easily introduced during annotation and overlooked during review, yielding inaccurate benchmarks and performance degradation of deep neural networks trained on noisy labels. In this work, we for the first time introduce a benchmark for label error detection methods on object detection datasets as well as a label error detection method and a number of baselines. We simulate four different types of randomly introduced label errors on train and test sets of well-labeled object detection datasets. For our label error detection method we assume a two-stage object detector to be given and consider the sum of both stages' classification and regression losses. The losses are computed with respect to the predictions and the noisy labels including simulated label errors, aiming at detecting the latter. We compare our method to three baselines: a naive one without deep learning, the object detector's score and the entropy of the classification softmax distribution. We outperform all baselines and demonstrate that among the considered methods, ours is the only one that detects label errors of all four types efficiently. Furthermore, we detect real label errors a) on commonly used test datasets in object detection and b) on a proprietary dataset. In both cases we achieve low false positives rates, i.e., we detect label errors with a precision for a) of up to 71.5% and for b) with 97%.
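
The ranking step behind loss inspection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `losses` and `is_error` are hypothetical inputs standing in for the per-annotation summed classification and regression losses and for the known positions of simulated label errors.

```python
def rank_by_loss(losses):
    """Return annotation indices sorted from highest to lowest loss."""
    return sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)


def precision_at_k(ranking, is_error, k):
    """Fraction of the k highest-loss annotations that are true label errors."""
    return sum(1 for i in ranking[:k] if is_error[i]) / k


# Hypothetical per-annotation losses and simulated-error flags.
losses = [0.1, 2.3, 0.4, 1.7, 0.2]
is_error = [False, True, False, True, False]
ranking = rank_by_loss(losses)
```

Reviewing annotations in this order puts the most suspicious labels first, and `precision_at_k` measures how many of the top candidates are actual errors.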

翻訳日:2023-11-02 03:43:56 公開日:2023-10-30

# SHAP-IQ: 任意次数のShapley相互作用の統一近似

SHAP-IQ: Unified Approximation of any-order Shapley Interactions ( http://arxiv.org/abs/2303.01179v3 )

ライセンス: Link先を確認

Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, Barbara Hammer

(参考訳) 説明可能な人工知能(XAI)の研究では主に、任意のブラックボックスモデルの特徴属性を決定するためにShapley値(SV)が用いられる。Shapley相互作用指標はSVを拡張し、任意次数の特徴相互作用を定義する。一意なShapley相互作用指標の定義は未解決の研究課題であり、これまでに公理の選択が異なる3つの定義が提案されている。さらに、各定義には固有の近似手法が必要である。本稿では、任意の基数的相互作用指標(CII)、すなわち線形性、対称性、ダミー公理を満たす相互作用指標に対してShapley相互作用を計算する、効率的なサンプリングベースの近似手法であるSHAPley Interaction Quantification (SHAP-IQ)を提案する。SHAP-IQは新しい表現に基づいており、既存の手法とは対照的に、近似品質の理論的保証と点推定値の分散の推定を与える。SVの特殊な場合には、本手法はSVの新しい表現を明らかにし、計算を大幅に単純化したUnbiased KernelSHAPに対応する。言語モデル、画像分類モデル、高次元合成モデルを説明することにより、計算効率と有効性を示す。

Predominately in explainable artificial intelligence (XAI) research, the Shapley value (SV) is applied to determine feature attributions for any black box model. Shapley interaction indices extend the SV to define any-order feature interactions. Defining a unique Shapley interaction index is an open research question and, so far, three definitions have been proposed, which differ by their choice of axioms. Moreover, each definition requires a specific approximation technique. Here, we propose SHAPley Interaction Quantification (SHAP-IQ), an efficient sampling-based approximator to compute Shapley interactions for arbitrary cardinal interaction indices (CII), i.e. interaction indices that satisfy the linearity, symmetry and dummy axiom. SHAP-IQ is based on a novel representation and, in contrast to existing methods, we provide theoretical guarantees for its approximation quality, as well as estimates for the variance of the point estimates. For the special case of SV, our approach reveals a novel representation of the SV and corresponds to Unbiased KernelSHAP with a greatly simplified calculation. We illustrate the computational efficiency and effectiveness by explaining language, image classification and high-dimensional synthetic models.
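
For orientation, the quantity that sampling-based approximators such as SHAP-IQ target in the special case of the SV can be computed exactly on a small game. The sketch below uses exhaustive enumeration, which is exponential in the number of players and is not the SHAP-IQ estimator. `toy_game` is a hypothetical value function made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_game(coalition):
    """Hypothetical game: additive worths plus a bonus when 'a' and 'b' cooperate."""
    worth = {"a": 1.0, "b": 2.0, "c": 3.0}
    bonus = 4.0 if {"a", "b"} <= coalition else 0.0
    return sum(worth[p] for p in coalition) + bonus


phi = shapley_values(["a", "b", "c"], toy_game)
```

In this toy game the pairwise bonus is split equally between `a` and `b`, which hides the fact that it stems from an interaction; interaction indices are meant to expose exactly that kind of effect.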

翻訳日:2023-11-02 03:40:47 公開日:2023-10-30

# WEARDA:人間の活動監視のためのウェアラブルセンサーデータの記録

WEARDA: Recording Wearable Sensor Data for Human Activity Monitoring ( http://arxiv.org/abs/2303.00064v2 )

ライセンス: Link先を確認

Richard M.K. van Dijk, Daniela Gawehns and Matthijs van Leeuwen

(参考訳) 本稿では、オープンソースのウェアラブルセンサデータ取得ソフトウェアパッケージであるWEARDAを紹介する。WEARDAはスマートウォッチによる人間の活動データの取得を容易にするもので、主に透明性、完全な制御、生のセンサデータへのアクセスを必要とする研究者を対象としている。4つのセンサ(3軸加速度計、3軸ジャイロスコープ、気圧計、GPS)の生データを同時に記録する機能を提供し、これにより研究者は、たとえばエネルギー消費量の推定や移動軌跡のマイニングを行えるようになる。Tizen OSを搭載したSamsungのスマートウォッチを選んだ理由は、1) スマートウォッチのソフトウェアAPIが必要な機能を備えていること、2) ソフトウェア開発ツールと利用しやすいドキュメントが入手できること、3) 必要なセンサを備えていること、4) 対象ユーザグループに受け入れられるためのケースデザインの要件を満たすことである。WEARDAは、効率的で誤りのないデータ収集を確実にするために、準備、計測、ロジスティクス、プライバシー保護、再現性に関する5つの実践的な課題に対処する。このソフトウェアパッケージは当初、プロジェクト「Dementia back at the heart of the community」のために作成され、その中で問題なく利用されている。

We present WEARDA, the open source WEARable sensor Data Acquisition software package. WEARDA facilitates the acquisition of human activity data with smartwatches and is primarily aimed at researchers who require transparency, full control, and access to raw sensor data. It provides functionality to simultaneously record raw data from four sensors -- tri-axis accelerometer, tri-axis gyroscope, barometer, and GPS -- which should enable researchers to, for example, estimate energy expenditure and mine movement trajectories. A Samsung smartwatch running the Tizen OS was chosen because of 1) the required functionalities of the smartwatch software API, 2) the availability of software development tools and accessible documentation, 3) having the required sensors, and 4) the requirements on case design for acceptance by the target user group. WEARDA addresses five practical challenges concerning preparation, measurement, logistics, privacy preservation, and reproducibility to ensure efficient and errorless data collection. The software package was initially created for the project "Dementia back at the heart of the community", and has been successfully used in that context.

翻訳日:2023-11-02 03:40:26 公開日:2023-10-30

# アウトソース機械学習タスクの低コスト結果検証のための生成フレームワーク

A Generative Framework for Low-Cost Result Validation of Outsourced Machine Learning Tasks ( http://arxiv.org/abs/2304.00083v3 )

ライセンス: Link先を確認

Abhinav Kumar, Miguel A. Guirao Aguilera, Reza Tourani, Satyajayant Misra

(参考訳) 機械学習(ML)の普及に伴い、MLはさまざまなセンシティブな領域に導入されるようになり、MLのセキュリティとプライバシに焦点を当てた研究が盛んに行われている。しかし、自動運転などの一部のアプリケーションでは、アウトソースされたMLワークロードの完全性検証のほうが重要であるにもかかわらず、この側面はあまり注目されてこなかった。マルチパーティ計算や証明ベースのシステムといった既存のソリューションは計算オーバーヘッドが大きく、リアルタイムアプリケーションには適さない。我々は、アウトソースされたMLワークロードをリアルタイムに検証するための新しいフレームワークであるFidesを提案する。Fidesは、新規かつ効率的な蒸留技術であるGreedy Distillation Transfer Learningを備えており、信頼された実行環境内で動作しながら、対応するサービスモデルを検証するための、空間効率と計算効率に優れた検証モデルを動的に蒸留し微調整する。Fidesはまた、統計分析とダイバージェンスの測定を用いて、サービスモデルが攻撃を受けているかどうかを高い確度で識別するクライアント側の攻撃検出モデルを備えている。さらに、攻撃が特定されるたびに元のクラスを予測する再分類機能も提供する。攻撃検出モデルと再分類モデルの訓練のために、敵対的生成ネットワークのフレームワークを考案した。評価の結果、Fidesは攻撃検出で最大98%、再分類で94%の精度を達成した。

The growing popularity of Machine Learning (ML) has led to its deployment in various sensitive domains, which has resulted in significant research focused on ML security and privacy. However, in some applications, such as autonomous driving, integrity verification of the outsourced ML workload is more critical--a facet that has not received much attention. Existing solutions, such as multi-party computation and proof-based systems, impose significant computation overhead, which makes them unfit for real-time applications. We propose Fides, a novel framework for real-time validation of outsourced ML workloads. Fides features a novel and efficient distillation technique--Greedy Distillation Transfer Learning--that dynamically distills and fine-tunes a space and compute-efficient verification model for verifying the corresponding service model while running inside a trusted execution environment. Fides features a client-side attack detection model that uses statistical analysis and divergence measurements to identify, with a high likelihood, if the service model is under attack. Fides also offers a re-classification functionality that predicts the original class whenever an attack is identified. We devised a generative adversarial network framework for training the attack detection and re-classification models. The evaluation shows that Fides achieves an accuracy of up to 98% for attack detection and 94% for re-classification.

翻訳日:2023-11-02 03:32:47 公開日:2023-10-30

# BERT4ETH: Ethereum不正検出のための事前学習済みTransformer

BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection ( http://arxiv.org/abs/2303.18138v2 )

ライセンス: Link先を確認

Sihao Hu, Zhen Zhang, Bingqiao Luo, Shengliang Lu, Bingsheng He, Ling Liu

(参考訳) Ethereum上でさまざまな形態の不正が蔓延しているため、こうした悪意ある活動から利用者を保護し、被害を受けやすいユーザーが犠牲にならないようにすることが不可欠である。現在の研究はもっぱらグラフベースの不正検出アプローチに依存しているが、それらは、反復性が高く、分布が偏り、異種性を持つEthereumトランザクションの扱いには適していない可能性がある。これらの課題に対処するために、Ethereum上のさまざまな不正行為を検出するためのアカウント表現抽出器として機能する、汎用の事前学習済みTransformerエンコーダであるBERT4ETHを提案する。BERT4ETHは、Transformerの優れたモデリング能力によってEthereumトランザクションに固有の動的な系列パターンを捉え、反復性の削減、偏りの緩和、異種性のモデリングという3つの実用的かつ効果的な戦略によって、Ethereum向けにBERTモデルを事前学習する際の課題に対処する。実験的評価により、BERT4ETHは、フィッシングアカウントの検出と匿名化解除のタスクにおいて、最先端の手法を大きく上回ることが示された。BERT4ETHのコードは https://github.com/git-disl/BERT4ETH で入手できる。

As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities to protect susceptible users from being victimized. While current studies solely rely on graph-based fraud detection approaches, it is argued that they may not be well-suited for dealing with highly repetitive, skew-distributed and heterogeneous Ethereum transactions. To address these challenges, we propose BERT4ETH, a universal pre-trained Transformer encoder that serves as an account representation extractor for detecting various fraud behaviors on Ethereum. BERT4ETH features the superior modeling capability of Transformer to capture the dynamic sequential patterns inherent in Ethereum transactions, and addresses the challenges of pre-training a BERT model for Ethereum with three practical and effective strategies, namely repetitiveness reduction, skew alleviation and heterogeneity modeling. Our empirical evaluation demonstrates that BERT4ETH outperforms state-of-the-art methods with significant enhancements in terms of the phishing account detection and de-anonymization tasks. The code for BERT4ETH is available at: https://github.com/git-disl/BERT4ETH.

翻訳日:2023-11-02 03:32:26 公開日:2023-10-30

# アダプティブリファインメントとカントロビッチ計量によるデータ駆動抽象化 [拡張版]

Data-driven abstractions via adaptive refinements and a Kantorovich metric [extended version] ( http://arxiv.org/abs/2303.17618v4 )

ライセンス: Link先を確認

Adrien Banse, Licio Romao, Alessandro Abate, Raphaël M. Jungers

(参考訳) 本稿では、動的システムのスマートでスケーラブルな抽象化のための適応的細分化手順を提案する。本手法は、将来の出力の観測に応じて状態空間を分割することに基づく。ただし、この知識は適応的かつ非対称な方法で動的に構築される。最適な構造を学習するために、マルコフ連鎖間のカントロビッチ距離に着想を得た計量を定義し、これを損失関数として用いる。本手法はデータ駆動型の枠組みに適しているが、それに限定されない。また、上記のマルコフ連鎖間の計量の性質についても調べており、これはより広い目的に応用できると考えている。この計量を近似するアルゴリズムを提案し、本手法の計算量が古典的な線形計画法を用いる場合よりもはるかに優れていることを示す。

We introduce an adaptive refinement procedure for smart and scalable abstraction of dynamical systems. Our technique relies on partitioning the state space depending on the observation of future outputs. However, this knowledge is dynamically constructed in an adaptive, asymmetric way. In order to learn the optimal structure, we define a Kantorovich-inspired metric between Markov chains, and we use it as a loss function. Our technique is well suited to data-driven frameworks, but is not restricted to them. We also study properties of the above-mentioned metric between Markov chains, which we believe could be applied more widely. We propose an algorithm to approximate it, and we show that our method yields a much better computational complexity than using classical linear programming techniques.
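
As background for the term "Kantorovich", the sketch below computes the classical Kantorovich (Wasserstein-1) distance between two discrete distributions on a common sorted set of points on the real line. It is not the paper's metric between Markov chains. The function name and arguments are illustrative assumptions.

```python
from itertools import accumulate


def kantorovich_1d(p, q, points):
    """Wasserstein-1 distance between distributions p and q on sorted points.

    In one dimension the optimal transport cost equals the integral of the
    absolute difference between the two cumulative distribution functions.
    """
    assert len(p) == len(q) == len(points)
    cdf_gap = [abs(a - b) for a, b in zip(accumulate(p), accumulate(q))]
    spacing = [points[i + 1] - points[i] for i in range(len(points) - 1)]
    # zip stops at the shorter list, so the final CDF gap (always 0) is ignored.
    return sum(g * s for g, s in zip(cdf_gap, spacing))
```

In higher dimensions or on general metric spaces this distance is usually obtained from a linear program, which is the kind of cost the abstract contrasts its approximation algorithm against.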

翻訳日:2023-11-02 03:32:06 公開日:2023-10-30

# 負サンプリングを超えた効率的な分散表現

Efficient distributed representations beyond negative sampling ( http://arxiv.org/abs/2303.17475v2 )

ライセンス: Link先を確認

Lorenzo Dall'Amico and Enrico Maria Belliardo

(参考訳) 本稿では、埋め込みとも呼ばれる分散表現を学習するための効率的な手法について述べる。これは、Word2Vecアルゴリズムで導入され、後にいくつかの研究で採用されたものと類似した目的関数を最小化することによって実現される。最適化における計算上のボトルネックはソフトマックス正規化定数の計算であり、これにはサンプルサイズに対して2次でスケールする数の演算が必要となる。この計算量は大規模なデータセットには適さず、一般的な回避策である負サンプリングを用いると、サンプルサイズに対して線形時間で分散表現を得ることができる。しかし、負サンプリングは損失関数を変更するものであるため、当初提案されたものとは異なる最適化問題を解くことになる。我々の貢献は、ソフトマックス正規化定数を線形時間で推定できることを示し、分散表現を学習するための効率的な最適化戦略を設計できるようにすることである。単語埋め込みとノード埋め込みに関連する2つの一般的なアプリケーションで、この近似をテストする。その結果、負サンプリングと比較して、精度の面で同等の性能を示しつつ、計算時間は著しく短いことが示された。

This article describes an efficient method to learn distributed representations, also known as embeddings. This is accomplished by minimizing an objective function similar to the one introduced in the Word2Vec algorithm and later adopted in several works. The computational bottleneck of the optimization is the calculation of the softmax normalization constants, which requires a number of operations that scales quadratically with the sample size. This complexity is unsuited for large datasets and negative sampling is a popular workaround, allowing one to obtain distributed representations in linear time with respect to the sample size. Negative sampling consists, however, in a change of the loss function and hence solves a different optimization problem from the one originally proposed. Our contribution is to show that the softmax normalization constants can be estimated in linear time, allowing us to design an efficient optimization strategy to learn distributed representations. We test our approximation on two popular applications related to word and node embeddings. The results show accuracy competitive with negative sampling at a remarkably lower computational time.
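
The bottleneck and the workaround described above can be made concrete with a small sketch. It assumes, for simplicity, a single embedding table `emb` shared by both sides of each pair, whereas Word2Vec uses separate input and output vectors. It shows the exact softmax loss and the negative-sampling surrogate only, not the paper's linear-time estimator. All function names are illustrative.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def log_partition(i, emb):
    """log Z_i = log sum_j exp(x_i . x_j); O(n) per item, O(n^2) over all items."""
    scores = [dot(emb[i], x) for x in emb]
    top = max(scores)  # subtract the maximum for numerical stability
    return top + math.log(sum(math.exp(s - top) for s in scores))


def softmax_loss(i, j, emb):
    """Exact negative log-probability of the pair (i, j) under the softmax."""
    return log_partition(i, emb) - dot(emb[i], emb[j])


def negative_sampling_loss(i, j, emb, negatives):
    """Surrogate loss: one positive pair plus a handful of sampled negatives."""
    def log_sigmoid(t):
        return -math.log1p(math.exp(-t))

    loss = -log_sigmoid(dot(emb[i], emb[j]))
    for k in negatives:
        loss -= log_sigmoid(-dot(emb[i], emb[k]))
    return loss
```

The surrogate touches only `1 + len(negatives)` vectors per pair, which is why it runs in linear time, but its minimizer generally differs from that of `softmax_loss`.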

翻訳日:2023-11-02 03:31:55 公開日:2023-10-30

# 複素値ニューラルネットワークを用いた最適近似

Optimal approximation using complex-valued neural networks ( http://arxiv.org/abs/2303.16813v2 )

ライセンス: Link先を確認

Paul Geuchen, Felix Voigtlaender

(参考訳) 複素数値ニューラルネットワーク(CVNN)は近年、リカレントニューラルネットワークの安定性の向上や、MRIフィンガープリンティングのような複素数値入力を伴うタスクでの性能向上など、有望な実証的成功を示している。実数値の場合の深層学習の圧倒的な成功は、発展しつつある数学的基盤によって支えられているが、複素数値の場合にはそのような基盤はいまだほとんど欠けている。そこで我々は、CVNNの近似特性を調べることにより、その表現力を解析する。我々の結果は、広く使われるmodReLUや複素カージオイド(complex cardioid)活性化関数を含む幅広い活性化関数のクラスに適用できる、CVNNに対する初めての定量的な近似誤差の上界を与える。正確には、我々の結果は、ある空でない開集合上で滑らかであるが多重調和ではない任意の活性化関数に適用できる。これは、滑らかで非多項式な活性化関数のクラスの、複素数の設定への自然な一般化である。主結果は、$C^k$ 関数の近似誤差が $m \to \infty$ のとき $m^{-k/(2n)}$ のオーダーでスケールすることを示す。ここで、$m$ はニューロンの数、$k$ は対象関数の滑らかさ、$n$ は(複素)入力次元である。自然な連続性の仮定の下で、このレートが最適であることを示し、さらにこの仮定を外した場合の最適性についても議論する。加えて、連続的な近似手法を用いて $C^k$ 関数を近似する問題は、次元の呪いを避けられないことを証明する。

Complex-valued neural networks (CVNNs) have recently shown promising empirical success, for instance for increasing the stability of recurrent neural networks and for improving the performance in tasks with complex-valued inputs, such as in MRI fingerprinting. While the overwhelming success of Deep Learning in the real-valued case is supported by a growing mathematical foundation, such a foundation is still largely lacking in the complex-valued case. We thus analyze the expressivity of CVNNs by studying their approximation properties. Our results yield the first quantitative approximation bounds for CVNNs that apply to a wide class of activation functions including the popular modReLU and complex cardioid activation functions. Precisely, our results apply to any activation function that is smooth but not polyharmonic on some non-empty open set; this is the natural generalization of the class of smooth and non-polynomial activation functions to the complex setting. Our main result shows that the error for the approximation of $C^k$-functions scales as $m^{-k/(2n)}$ for $m \to \infty$ where $m$ is the number of neurons, $k$ the smoothness of the target function and $n$ is the (complex) input dimension. Under a natural continuity assumption, we show that this rate is optimal; we further discuss the optimality when dropping this assumption. Moreover, we prove that the problem of approximating $C^k$-functions using continuous approximation methods unavoidably suffers from the curse of dimensionality.
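
Two ingredients of the abstract can be illustrated numerically. The sketch assumes the usual definition of modReLU, which keeps the phase of its input and applies a ReLU with bias `b` to the modulus. It also reads the rate $m^{-k/(2n)}$ with all constants dropped, so `neurons_for_error` is only an order-of-magnitude illustration, not a statement from the paper.

```python
def modrelu(z, b=-0.5):
    """modReLU: keep the phase of z and apply ReLU(|z| + b) to its modulus."""
    r = abs(z)
    if r == 0 or r + b <= 0:
        return 0j
    return (r + b) * z / r


def neurons_for_error(eps, k, n):
    """Solve m**(-k / (2 * n)) == eps for m, ignoring constants in the bound."""
    return eps ** (-2 * n / k)
```

For fixed smoothness `k`, the exponent `2 * n / k` grows linearly in the input dimension `n`, so the neuron count needed for a fixed error grows exponentially in `n`. This is the curse of dimensionality mentioned at the end of the abstract.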

翻訳日:2023-11-02 03:31:42 公開日:2023-10-30

# 非線形部分観測可能システムに対する確率的逆最適制御は知覚の不確実性と行動コストを分離する

Probabilistic inverse optimal control for non-linear partially observable systems disentangles perceptual uncertainty and behavioral costs ( http://arxiv.org/abs/2303.16698v2 )

ライセンス: Link先を確認

Dominik Straub, Matthias Schultheis, Heinz Koeppl, Constantin A. Rothkopf

(参考訳) 逆最適制御は、逐次的な意思決定タスクにおける行動を特徴づけるために用いることができる。しかし、既存の研究のほとんどは、完全に観測可能なシステムや線形システムに限定されているか、行動信号が既知であることを必要とする。本稿では、行動信号が観測されない部分観測可能な確率的非線形システムに対する逆最適制御の確率的アプローチを導入し、これにより、逆最適制御に対する従来のアプローチと最大因果エントロピーによる定式化とを統一する。エージェントの感覚系と運動系のノイズ特性の明示的なモデルを局所線形化手法と組み合わせることで、モデルパラメータに対する近似尤度関数を導出する。これは1回のフォワードパスで計算できる。2つの古典的な制御タスクと2つの人間の行動タスクについて、確率的かつ部分観測可能なバージョンでの定量的評価を示す。重要なことに、アクティブセンシングやアクティブラーニングのような不確実性下の逐次的意思決定では認識的(epistemic)行動と実用的(pragmatic)行動が絡み合っているにもかかわらず、本手法は知覚的要因と行動コストを分離できることを示す。提案手法は、模倣学習から感覚運動神経科学まで幅広く適用できる。

Inverse optimal control can be used to characterize behavior in sequential decision-making tasks. Most existing work, however, is limited to fully observable or linear systems, or requires the action signals to be known. Here, we introduce a probabilistic approach to inverse optimal control for partially observable stochastic non-linear systems with unobserved action signals, which unifies previous approaches to inverse optimal control with maximum causal entropy formulations. Using an explicit model of the noise characteristics of the sensory and motor systems of the agent in conjunction with local linearization techniques, we derive an approximate likelihood function for the model parameters, which can be computed within a single forward pass. We present quantitative evaluations on stochastic and partially observable versions of two classic control tasks and two human behavioral tasks. Importantly, we show that our method can disentangle perceptual factors and behavioral costs despite the fact that epistemic and pragmatic actions are intertwined in sequential decision-making under uncertainty, such as in active sensing and active learning. The proposed method has broad applicability, ranging from imitation learning to sensorimotor neuroscience.

翻訳日:2023-11-02 03:31:14 公開日:2023-10-30

# CONFIDE: PDEのコンテキスト有限差分モデリング

CONFIDE: Contextual Finite Differences Modelling of PDEs ( http://arxiv.org/abs/2303.15827v2 )

ライセンス: Link先を確認

Ori Linial, Orly Avner, Dotan Di Castro

(参考訳) 本稿では、学習済みのコンテキストに基づいて、未知のダイナミクスによって生成されたデータサンプルから明示的なPDEを推論する手法を提案する。訓練フェーズでは方程式の形式に関する知識を微分スキームと統合し、推論フェーズではデータサンプルに適合するPDEを生成して、信号の予測とデータの説明の両方を可能にする。提案手法をSOTAのアプローチと比較した広範な実験の結果に加え、予測誤差と説明可能性の観点から本手法のさまざまなバリエーションを調べるアブレーション研究も示す。

We introduce a method for inferring an explicit PDE from a data sample generated by previously unseen dynamics, based on a learned context. The training phase integrates knowledge of the form of the equation with a differential scheme, while the inference phase yields a PDE that fits the data sample and enables both signal prediction and data explanation. We include results of extensive experimentation, comparing our method to SOTA approaches, together with ablation studies that examine different flavors of our solution in terms of prediction error and explainability.

翻訳日:2023-11-02 03:30:34 公開日:2023-10-30

# ニューラルスケーリングの量子化モデル

The Quantization Model of Neural Scaling ( http://arxiv.org/abs/2303.13506v2 )

ライセンス: Link先を確認

Eric J. Michaud, Ziming Liu, Uzay Girit, Max Tegmark

(参考訳) ニューラルスケーリング則の量子化モデルを提案し、モデルサイズとデータサイズに対して損失がべき乗則に従って減少するという観測と、スケールに伴う新しい能力の突然の出現の両方を説明する。このモデルは、ネットワークの知識とスキルが離散的なチャンク($\textbf{quanta}$)に「量子化」されるという、我々が量子化仮説(Quantization Hypothesis)と呼ぶものから導かれる。使用頻度の高い順にquantaが学習される場合、使用頻度がべき乗則に従うことで、観測される損失のべき乗則スケーリングが説明できることを示す。この予測をトイデータセットで検証し、さらに大規模言語モデルのスケーリング曲線がどのように分解されるかを調べる。言語モデルの勾配を用いて、モデルの振る舞いを多様なスキル(quanta)の集合に自動的に分解する。暫定的な結果として、訓練分布においてこれらのquantaが使用される頻度は、言語モデルの経験的スケーリング指数に対応するべき乗則におおよそ従っており、これは我々の理論の予測と合致する。

We propose the Quantization Model of neural scaling laws, explaining both the observed power law dropoff of loss with model and data size, and also the sudden emergence of new capabilities with scale. We derive this model from what we call the Quantization Hypothesis, where network knowledge and skills are "quantized" into discrete chunks ($\textbf{quanta}$). We show that when quanta are learned in order of decreasing use frequency, then a power law in use frequencies explains observed power law scaling of loss. We validate this prediction on toy datasets, then study how scaling curves decompose for large language models. Using language model gradients, we automatically decompose model behavior into a diverse set of skills (quanta). We tentatively find that the frequency at which these quanta are used in the training distribution roughly follows a power law corresponding with the empirical scaling exponent for language models, a prediction of our theory.
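
The link between a power law in use frequencies and a power law in loss can be checked numerically. The sketch rests on two simplifying assumptions that the abstract does not spell out: the k-th most used quantum has frequency proportional to `k ** -(alpha + 1)`, and each quantum that has not yet been learned contributes loss in proportion to its frequency. Under those assumptions the residual loss after learning the first `n` quanta behaves like `n ** -alpha`.

```python
import math


def residual_loss(n_learned, alpha, n_total=200_000):
    """Unnormalised loss left by the quanta ranked after n_learned."""
    return sum(k ** -(alpha + 1) for k in range(n_learned + 1, n_total + 1))


def loglog_slope(alpha, n1=100, n2=1000):
    """Empirical scaling exponent of residual_loss between n1 and n2 quanta."""
    l1, l2 = residual_loss(n1, alpha), residual_loss(n2, alpha)
    return (math.log(l2) - math.log(l1)) / (math.log(n2) - math.log(n1))
```

With `alpha = 1.0` the measured slope is close to `-1`. The small deviation comes from truncating the sum at `n_total` and from the discreteness of the sum.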

翻訳日:2023-11-02 03:30:01 公開日:2023-10-30

# 画像としての時系列:不規則にサンプリングされた時系列の視覚トランスフォーマー

Time Series as Images: Vision Transformer for Irregularly Sampled Time Series ( http://arxiv.org/abs/2303.12799v2 )

ライセンス: Link先を確認

Zekun Li, Shiyang Li, Xifeng Yan

(参考訳) 不規則にサンプリングされた時系列は、特に医療分野においてますます一般的になっている。これらの不規則性を扱うためにさまざまな専用手法が開発されてきたが、その複雑なダイナミクスと顕著なスパース性を効果的にモデル化することは依然として課題である。本稿では、不規則にサンプリングされた時系列を折れ線グラフ画像に変換し、強力な事前学習済みVision Transformerを用いて、画像分類と同じ方法で時系列分類を行うという新しい視点を導入する。この手法は、専用のアルゴリズム設計を大幅に単純化するだけでなく、時系列モデリングの汎用的なフレームワークとして機能する可能性も示している。注目すべきことに、本手法はその単純さにもかかわらず、広く使われているいくつかの医療データセットおよび人間活動データセットにおいて、最先端の専用アルゴリズムを上回る。特に、テスト時に変数の一部が除かれる厳しいleave-sensors-out設定では、本手法はさまざまな程度の観測欠損に対して強い頑健性を示し、変数の半分がマスクされた場合でも、主要な専用ベースラインに対して絶対F1スコアで42.8%ポイントの大幅な改善を達成する。コードとデータは https://github.com/Leezekun/ViTST で入手できる。

Irregularly sampled time series are increasingly prevalent, particularly in medical domains. While various specialized methods have been developed to handle these irregularities, effectively modeling their complex dynamics and pronounced sparsity remains a challenge. This paper introduces a novel perspective by converting irregularly sampled time series into line graph images, then utilizing powerful pre-trained vision transformers for time series classification in the same way as image classification. This method not only largely simplifies specialized algorithm designs but also presents the potential to serve as a universal framework for time series modeling. Remarkably, despite its simplicity, our approach outperforms state-of-the-art specialized algorithms on several popular healthcare and human activity datasets. Especially in the rigorous leave-sensors-out setting where a portion of variables is omitted during testing, our method exhibits strong robustness against varying degrees of missing observations, achieving an impressive improvement of 42.8% in absolute F1 score points over leading specialized baselines even with half the variables masked. Code and data are available at https://github.com/Leezekun/ViTST

翻訳日:2023-11-02 03:28:49 公開日:2023-10-30

# FedML-HE: 効率的な準同型暗号ベースのプライバシー保護フェデレーテッドラーニングシステム

FedML-HE: An Efficient Homomorphic-Encryption-Based Privacy-Preserving Federated Learning System ( http://arxiv.org/abs/2303.10837v2 )

ライセンス: Link先を確認

Weizhao Jin, Yuhang Yao, Shanshan Han, Carlee Joe-Wong, Srivatsan Ravi, Salman Avestimehr, Chaoyang He

(参考訳) フェデレーテッドラーニング(FL)は、ローカルデータの代わりにローカルモデルの更新を集約することで、分散デバイス上で機械学習モデルを訓練する。しかし、サーバ上で集約されるローカルモデルから、反転攻撃によって機密性の高い個人情報が明らかになる可能性があるため、プライバシーの懸念が生じる。そのため、FLの訓練には準同型暗号(HE)のようなプライバシー保護手法が必要となる。HEにはプライバシー上の利点があるものの、その適用は、特に基盤モデルにおいて、非実用的なオーバーヘッドに悩まされている。本稿では、HEベースの安全なモデル集約を効率的に行う、初の実用的なフェデレーテッドラーニングシステムであるFedML-HEを提案する。FedML-HEは、機密性の高いパラメータを選択的に暗号化することを提案し、訓練中の計算と通信の両方のオーバーヘッドを大幅に削減しつつ、カスタマイズ可能なプライバシー保護を提供する。最適化された本システムは、特に大規模な基盤モデルにおいて大幅なオーバーヘッド削減(たとえばResNet-50で約10倍、BERTで最大約40倍の削減)を示し、スケーラブルなHEベースのFL導入の可能性を実証している。

Federated Learning trains machine learning models on distributed devices by aggregating local model updates instead of local data. However, privacy concerns arise as the aggregated local models on the server may reveal sensitive personal information by inversion attacks. Privacy-preserving methods, such as homomorphic encryption (HE), then become necessary for FL training. Despite HE's privacy advantages, its applications suffer from impractical overheads, especially for foundation models. In this paper, we present FedML-HE, the first practical federated learning system with efficient HE-based secure model aggregation. FedML-HE proposes to selectively encrypt sensitive parameters, significantly reducing both computation and communication overheads during training while providing customizable privacy preservation. Our optimized system demonstrates considerable overhead reduction, particularly for large foundation models (e.g., ~10x reduction for ResNet-50, and up to ~40x reduction for BERT), demonstrating the potential for scalable HE-based FL deployment.

翻訳日:2023-11-02 03:27:59 公開日:2023-10-30

# オブジェクト認識同変基本反応拡散モデルによる正確な遷移状態生成

Accurate transition state generation with an object-aware equivariant elementary reaction diffusion model ( http://arxiv.org/abs/2304.06174v3 )

ライセンス: Link先を確認

Chenru Duan, Yuanqi Du, Haojun Jia, and Heather J. Kulik

(参考訳) 遷移状態(TS)の探索は、化学において反応機構の解明と反応ネットワークの探索の鍵となる。しかし、正確な3次元TS構造を探索するには、ポテンシャルエネルギー面が複雑であるために、計算コストの高い量子化学計算が多数必要となる。そこで我々は、素反応における反応物、TS、生成物という構造の組を生成するために、すべての物理的対称性と制約を満たす、オブジェクトを考慮した(object-aware)SE(3)同変拡散モデルを開発した。反応物と生成物が与えられると、このモデルは、量子化学に基づく最適化に必要な数時間ではなく、数秒でTS構造を生成する。生成されたTS構造は、真のTSに対する二乗平均平方根偏差の中央値が0.08 Åである。不確実性定量化のための信頼度スコアリングモデルを用いることで、最も困難な反応の14%に対してのみ量子化学に基づく最適化を行うだけで、反応速度の推定に必要な精度(2.6 kcal/mol)に近づく。提案手法は、機構が未知の大規模な反応ネットワークの構築に有用であると考えられる。

Transition state (TS) search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D TS structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here, we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating sets of structures - reactant, TS, and product - in an elementary reaction. Provided reactant and product, this model generates a TS structure in seconds instead of hours required when performing quantum chemistry-based optimizations. The generated TS structures achieve a median of 0.08 Å root mean square deviation compared to the true TS. With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction rate estimation (2.6 kcal/mol) by only performing quantum chemistry-based optimizations on 14% of the most challenging reactions. We envision the proposed approach useful in constructing large reaction networks with unknown mechanisms.

翻訳日:2023-11-02 03:21:09 公開日:2023-10-30

# DreamPose: Stable Diffusionによるファッション画像からの動画合成

DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion ( http://arxiv.org/abs/2304.06025v4 )

ライセンス: Link先を確認

Johanna Karras, Aleksander Holynski, Ting-Chun Wang, Ira Kemelmacher-Shlizerman

(参考訳) 静止画像からアニメーション化されたファッション動画を生成する拡散ベースの手法であるDreamPoseを提案する。1枚の画像と人体のポーズの系列が与えられると、本手法は人間の動きと布の動きの両方を含む動画を合成する。これを実現するために、新しい微調整戦略、追加された条件付け信号をサポートするための一連のアーキテクチャ変更、および時間的一貫性を促す手法を用いて、事前学習済みのテキストから画像へのモデル(Stable Diffusion)を、ポーズと画像で誘導される動画合成モデルに変換する。UBC Fashionデータセットのファッション動画のコレクションで微調整を行う。さまざまな服のスタイルとポーズで本手法を評価し、ファッション動画のアニメーションにおいて最先端の結果が得られることを示す。動画の結果はプロジェクトページで公開している。

We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel fine-tuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that our method produces state-of-the-art results on fashion video animation. Video results are available on our project page.

翻訳日:2023-11-02 03:20:55 公開日:2023-10-30

# 衛星映像超解像のための局所-グローバル時間差学習

Local-Global Temporal Difference Learning for Satellite Video Super-Resolution ( http://arxiv.org/abs/2304.04421v2 )

ライセンス: Link先を確認

Yi Xiao, Qiangqiang Yuan, Kui Jiang, Xianyu Jin, Jiang He, Liangpei Zhang, Chia-wen Lin

(参考訳) オプティカルフローベースおよびカーネルベースのアプローチは、衛星ビデオ超解像(VSR)における時間的補償のために広く研究されてきた。しかし、これらの手法は、特に衛星ビデオにおいて、大規模または複雑なシナリオへの汎化性が低い。本稿では、明確に定義された時間差分を、効率的かつ効果的な時間的補償に活用することを提案する。フレーム内の局所的および大域的な時間情報を十分に活用するために、短期的および長期的な時間的差異を体系的にモデル化する。これは、これらの差異が互いに異なり、かつ相補的な性質を持つことを観察したためである。具体的には、隣接フレーム間のRGB差分マップから局所的な動き表現を抽出する短期時間差分モジュール(S-TDM)を考案し、正確なテクスチャ表現のための手がかりをより多く得る。フレーム列全体の大域的な依存関係を調べるために、長期時間差分モジュール(L-TDM)を提案する。このモジュールでは、前方セグメントと後方セグメントの差分を取り込んで活性化し、時間的特徴の変調を導くことで、全体的な大域補償を実現する。さらに、対象フレームの空間分布と時間的補償結果との相互作用を豊かにする差分補償ユニット(DCU)を提案する。これは、位置ずれを避けるように特徴を洗練しつつ、空間的一貫性を保つのに役立つ。5つの主要なビデオ衛星にわたって行った厳密な客観的および主観的評価により、本手法が最先端のアプローチに対して良好な性能を示すことが実証された。コードは https://github.com/XY-boy/LGTD で公開予定である。

Optical-flow-based and kernel-based approaches have been extensively explored for temporal compensation in satellite Video Super-Resolution (VSR). However, these techniques are less generalized in large-scale or complex scenarios, especially in satellite videos. In this paper, we propose to exploit the well-defined temporal difference for efficient and effective temporal compensation. To fully utilize the local and global temporal information within frames, we systematically modeled the short-term and long-term temporal discrepancies since we observed that these discrepancies offer distinct and mutually complementary properties. Specifically, we devise a Short-term Temporal Difference Module (S-TDM) to extract local motion representations from RGB difference maps between adjacent frames, which yields more clues for accurate texture representation. To explore the global dependency in the entire frame sequence, a Long-term Temporal Difference Module (L-TDM) is proposed, where the differences between forward and backward segments are incorporated and activated to guide the modulation of the temporal feature, leading to a holistic global compensation. Moreover, we further propose a Difference Compensation Unit (DCU) to enrich the interaction between the spatial distribution of the target frame and temporal compensated results, which helps maintain spatial consistency while refining the features to avoid misalignment. Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches. Code will be available at https://github.com/XY-boy/LGTD

翻訳日:2023-11-02 03:20:41 公開日:2023-10-30

# クロスモルフォロジーのデモンストレーションからのロボットマニピュレーションの学習

Learning Robot Manipulation from Cross-Morphology Demonstration ( http://arxiv.org/abs/2304.03833v2 )

ライセンス: Link先を確認

Gautam Salhotra, I-Chun Arthur Liu, Gaurav Sukhatme

(参考訳) 一部のデモンストレーションからの学習(LfD)手法は、教師と生徒の行動空間の小さな不一致に対処できる。ここでは、教師の形態が生徒の形態と大きく異なる場合を扱う。我々のフレームワークであるMorphological Adaptation in Imitation Learning (MAIL)はこのギャップを埋め、形態が大きく異なる他のエージェントによるデモンストレーションからエージェントを訓練できるようにする。MAILは、望ましい解に向けた$\textit{some}$ガイダンスを与えるものである限り、準最適なデモンストレーションからも学習する。剛体の障害物と相互作用する3次元の布操作を含む、剛体および変形可能な物体の操作タスクでMAILを実証する。2つのエンドエフェクタを持つシミュレーション上のエージェントによるデモンストレーションを用いて、1つのエンドエフェクタを持つロボットの視覚制御ポリシーを訓練する。MAILは、LfDおよび非LfDのベースラインに対して、正規化した性能指標で最大$24\%$の改善を示す。MAILは実機のFranka Pandaロボットにデプロイされ、物体の特性(サイズ、回転、並進)と布固有の特性(色、厚さ、サイズ、材質)のさまざまな変化に対応する。概要は https://uscresl.github.io/mail にある。

Some Learning from Demonstrations (LfD) methods handle small mismatches in the action spaces of the teacher and student. Here we address the case where the teacher's morphology is substantially different from that of the student. Our framework, Morphological Adaptation in Imitation Learning (MAIL), bridges this gap allowing us to train an agent from demonstrations by other agents with significantly different morphologies. MAIL learns from suboptimal demonstrations, so long as they provide $\textit{some}$ guidance towards a desired solution. We demonstrate MAIL on manipulation tasks with rigid and deformable objects including 3D cloth manipulation interacting with rigid obstacles. We train a visual control policy for a robot with one end-effector using demonstrations from a simulated agent with two end-effectors. MAIL shows up to $24\%$ improvement in a normalized performance metric over LfD and non-LfD baselines. It is deployed to a real Franka Panda robot, handles multiple variations in properties for objects (size, rotation, translation), and cloth-specific properties (color, thickness, size, material). An overview is on https://uscresl.github.io/mail .

翻訳日:2023-11-02 03:19:53 公開日:2023-10-30

# log-concaveサンプリングのためのクエリ下限

Query lower bounds for log-concave sampling ( http://arxiv.org/abs/2304.02599v2 )

ライセンス: Link先を確認

Sinho Chewi, Jaume de Dios Pont, Jerry Li, Chen Lu, Shyam Narayanan

(参考訳) log-concaveサンプリングは近年目覚ましいアルゴリズム上の進歩を遂げたが、このタスクに対する下界を証明するという対応する問題は未解決のままであり、下界はこれまで1次元でしか知られていなかった。本研究では、次のクエリ下界を確立する。(1) 次元 $d\ge 2$ の強log-concaveかつlog-smoothな分布からのサンプリングには $\Omega(\log \kappa)$ 回のクエリが必要であり、これは任意の定数次元においてタイトである。(2) 次元 $d$ のガウス分布(したがって次元 $d$ の一般のlog-concaveかつlog-smoothな分布)からのサンプリングには $\widetilde \Omega(\min(\sqrt\kappa \log d, d))$ 回のクエリが必要であり、これはガウス分布のクラスに対してほぼタイトである。ここで $\kappa$ は目標分布の条件数を表す。証明は、(1) 幾何学的測度論における掛谷予想の研究に着想を得たマルチスケール構成と、(2) ブロッククリロフアルゴリズムがこの問題に対して最適であることを示す新しい帰着、および行列・ベクトルクエリの文献で開発されたウィシャート行列に基づく下界手法との関係に依拠する。

Log-concave sampling has witnessed remarkable algorithmic advances in recent years, but the corresponding problem of proving lower bounds for this task has remained elusive, with lower bounds previously known only in dimension one. In this work, we establish the following query lower bounds: (1) sampling from strongly log-concave and log-smooth distributions in dimension $d\ge 2$ requires $\Omega(\log \kappa)$ queries, which is sharp in any constant dimension, and (2) sampling from Gaussians in dimension $d$ (hence also from general log-concave and log-smooth distributions in dimension $d$) requires $\widetilde \Omega(\min(\sqrt\kappa \log d, d))$ queries, which is nearly sharp for the class of Gaussians. Here $\kappa$ denotes the condition number of the target distribution. Our proofs rely upon (1) a multiscale construction inspired by work on the Kakeya conjecture in geometric measure theory, and (2) a novel reduction that demonstrates that block Krylov algorithms are optimal for this problem, as well as connections to lower bound techniques based on Wishart matrices developed in the matrix-vector query literature.
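
To make the two bounds concrete, the sketch below evaluates them with all constants and hidden logarithmic factors dropped, so the numbers indicate scaling only. It assumes a Gaussian target described by the eigenvalues of its covariance, for which $\kappa$ is the ratio of the largest to the smallest eigenvalue. The function names are illustrative and do not come from the paper.

```python
import math


def condition_number(cov_eigenvalues):
    """kappa for a Gaussian target: largest over smallest covariance eigenvalue."""
    return max(cov_eigenvalues) / min(cov_eigenvalues)


def general_lower_bound(kappa):
    """Scaling of bound (1), log(kappa), for dimension d >= 2 (constants dropped)."""
    return math.log(kappa)


def gaussian_lower_bound(kappa, d):
    """Scaling of bound (2), min(sqrt(kappa) * log(d), d) (constants dropped)."""
    return min(math.sqrt(kappa) * math.log(d), d)


kappa = condition_number([1.0, 4.0, 100.0])       # 100.0
well_conditioned = gaussian_lower_bound(kappa, d=1000)   # sqrt(kappa) * log(d) term is active
ill_conditioned = gaussian_lower_bound(10**6, d=50)      # dimension term is active
```

The second call shows the regime in which the bound saturates at the dimension $d$: once $\sqrt\kappa \log d$ exceeds $d$, increasing $\kappa$ further no longer raises the bound.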

翻訳日:2023-11-02 03:18:16 公開日:2023-10-30

# EduceLab-Scrolls:X線CTによるHerculaneum Papyriからのテキストの復元

EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT ( http://arxiv.org/abs/2304.02084v3 )

ライセンス: Link先を確認

Stephen Parsons, C. Seth Parker, Christy Chapman, Mami Hayashida, W. Brent Seales

(参考訳) X線CT画像を用いたHerculaneum papyriの隠れテキストを明らかにするための完全なソフトウェアパイプラインを提案する。この拡張された仮想アンラッピングパイプラインは、機械学習と、3D画像と2D画像をリンクする新しい幾何学的フレームワークを組み合わせる。 educelab-scrollsは、この問題に対する20年の研究努力を表す包括的なオープンデータセットです。 EduceLab-Scrollsには、小さな断片と無傷のロールスクロールの両方のボリュームX線CT画像が含まれている。データセットには、インク検出モデルの教師付きトレーニングに使用される2Dイメージラベルも含まれている。ラベリングは、スクロールフラグメントのスペクトル写真と、同じフラグメントのX線CT画像との整列を可能とし、画像空間とモダリティの間の機械学習可能なマッピングを作成する。このアライメントは、X線CTで「見えない」炭素インクを検出するための教師あり学習を可能にする。私たちの知る限り、これはこの種のデータセットとしては初めてのもので、ヘリテージドメインでリリースされた最大のデータセットです。本手法は, スクロール断片の正確なテキスト行を, 既知の地底真理で明らかにすることができる。露見されたテキストは、視覚的確認、定量的画像計測、学術的レビューを用いて検証される。 educelab-scrollsは今回初めて、ここで紹介するherculaneum papyriの隠されたテキストを発見した。研究が進むにつれて、educelab-scrollsデータセットがよりテキスト的な発見を生み出すことを期待している。

We present a complete software pipeline for revealing the hidden texts of the Herculaneum papyri using X-ray CT images. This enhanced virtual unwrapping pipeline combines machine learning with a novel geometric framework linking 3D and 2D images. We also present EduceLab-Scrolls, a comprehensive open dataset representing two decades of research effort on this problem. EduceLab-Scrolls contains a set of volumetric X-ray CT images of both small fragments and intact, rolled scrolls. The dataset also contains 2D image labels that are used in the supervised training of an ink detection model. Labeling is enabled by aligning spectral photography of scroll fragments with X-ray CT images of the same fragments, thus creating a machine-learnable mapping between image spaces and modalities. This alignment permits supervised learning for the detection of "invisible" carbon ink in X-ray CT, a task that is "impossible" even for human expert labelers. To our knowledge, this is the first aligned dataset of its kind and is the largest dataset ever released in the heritage domain. Our method is capable of revealing accurate lines of text on scroll fragments with known ground truth. Revealed text is verified using visual confirmation, quantitative image metrics, and scholarly review. EduceLab-Scrolls has also enabled the discovery, for the first time, of hidden texts from the Herculaneum papyri, which we present here. We anticipate that the EduceLab-Scrolls dataset will generate more textual discovery as research continues.

翻訳日:2023-11-02 03:17:41 公開日:2023-10-30

# 多様な3次元シーンにおける連続的な人間動作の生成

Generating Continual Human Motion in Diverse 3D Scenes ( http://arxiv.org/abs/2304.02061v3 )

ライセンス: Link先を確認

Aymen Mir, Xavier Puig, Angjoo Kanazawa, Gerard Pons-Moll

(参考訳) 本研究では、アニメーターの指示に従う人間の動作を、さまざまな3Dシーンにわたって合成する手法を提案する。3Dシーン内で、少数(3つまたは4つ)の関節位置(例えば、人の手と両足の位置)の集合とシード動作系列が与えられると、本手法は、与えられたキーポイントが課す制約を満たしながら、シード動作から始まるもっともらしい動作系列を生成する。連続的な動作合成の問題を、経路に沿った歩行と、キーポイントで指定された動作への出入りの遷移とに分解することで、シーン情報を明示的に組み込むことなく、シーンの制約を満たす長い動作の生成を可能にする。本手法は、シーンに依存しないモーションキャプチャ(mocap)データのみを用いて訓練される。その結果、さまざまな形状を持つ3Dシーンに展開できる。ドリフトのないもっともらしい連続動作合成を実現するための鍵となる貢献は、次の直近の目標が原点に位置する、目標中心の正準座標系で動作を生成することである。本モデルは、つかむ、座る、もたれるといった多様な動作を任意の順序でつないだ長い系列を生成でき、HPS、Replica、Matterport、ScanNet、およびNeRFで表現されたシーンといった、形状の異なるシーンで実証されている。いくつかの実験により、本手法が3Dシーン内で経路をたどる既存手法を上回ることが示された。

We introduce a method to synthesize animator guided human motion across 3D scenes. Given a set of sparse (3 or 4) joint locations (such as the location of a person's hand and two feet) and a seed motion sequence in a 3D scene, our method generates a plausible motion sequence starting from the seed motion while satisfying the constraints imposed by the provided keypoints. We decompose the continual motion synthesis problem into walking along paths and transitioning in and out of the actions specified by the keypoints, which enables long generation of motions that satisfy scene constraints without explicitly incorporating scene information. Our method is trained only using scene agnostic mocap data. As a result, our approach is deployable across 3D scenes with various geometries. For achieving plausible continual motion synthesis without drift, our key contribution is to generate motion in a goal-centric canonical coordinate frame where the next immediate target is situated at the origin. Our model can generate long sequences of diverse actions such as grabbing, sitting and leaning chained together in arbitrary order, demonstrated on scenes of varying geometry: HPS, Replica, Matterport, ScanNet and scenes represented using NeRFs. Several experiments demonstrate that our method outperforms existing methods that navigate paths in 3D scenes.
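
The idea of a goal-centric canonical frame can be sketched as a change of coordinates that places the next target at the origin. The example below is a simplification under stated assumptions: it works on 2D ground-plane points rather than full 3D body joints, and it also aligns the x-axis with a chosen heading, which the abstract does not specify. `to_goal_frame` is a hypothetical helper, not the authors' implementation.

```python
import math


def to_goal_frame(points, goal, heading):
    """Express 2D points in a frame centred on `goal`.

    The frame's x-axis points along `heading` (radians, world frame), so the
    points are translated by -goal and then rotated by -heading.
    """
    c, s = math.cos(-heading), math.sin(-heading)
    canonical = []
    for x, y in points:
        dx, dy = x - goal[0], y - goal[1]
        canonical.append((c * dx - s * dy, s * dx + c * dy))
    return canonical


# The goal itself maps to the origin; a point 2 units ahead of the goal
# along the heading maps to (2, 0), up to floating-point error.
canonical_points = to_goal_frame(
    [(2.0, 3.0), (2.0, 5.0)], goal=(2.0, 3.0), heading=math.pi / 2
)
```

Because every segment is expressed relative to its own target, the inputs stay in a bounded range however far the character has travelled in the scene, which is one plausible reading of why such a frame helps avoid drift.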

翻訳日:2023-11-02 03:17:16 公開日:2023-10-30

# 双極子対称性破壊からの非フェルミ液体

Non-Fermi Liquids from Dipolar Symmetry Breaking ( http://arxiv.org/abs/2304.01181v3 )

ライセンス: Link先を確認

Amogh Anakru, Zhen Bi

(参考訳) フラクトン的トポロジカル相の出現と量子ダイナミクスの新しい普遍性クラスは、凝縮系における双極子対称性の重要性を浮き彫りにしている。本研究では、さまざまな空間次元のフェルミオン模型における双極子対称性の破れた相の性質を調べる。このような系では、フェルミオンは双極子凝縮によってエネルギー分散を得る。並進対称性と双極子対称性の間の非自明な交換関係のため、双極子凝縮体のゴールドストーンモードは分散を持つフェルミオンと強く結合し、低エネルギーで自然に非フェルミ液体を生じさせる。双極子対称性の破れた相の赤外(IR)での記述は、創発的なU(1)ゲージ場と結合したフェルミ面のよく知られた理論に類似している。また、双極子対称性がわずかに破れている場合のクロスオーバー挙動と、異方的な双極子保存の場合についても論じる。

The emergence of fractonic topological phases and novel universality classes for quantum dynamics highlights the importance of dipolar symmetry in condensed matter systems. In this work, we study the properties of symmetry-breaking phases of the dipolar symmetries in fermionic models in various spatial dimensions. In such systems, fermions obtain energy dispersion through dipole condensation. Due to the nontrivial commutation between the translation symmetry and dipolar symmetry, the Goldstone modes of the dipolar condensate are strongly coupled to the dispersive fermions and naturally give rise to non-Fermi liquids at low energies. The IR description of the dipolar symmetry-breaking phase is analogous to the well-known theory of a Fermi surface coupled to an emergent U(1) gauge field. We also discuss the crossover behavior when the dipolar symmetry is slightly broken and the cases with anisotropic dipolar conservation.

翻訳日:2023-11-02 03:16:21 公開日:2023-10-30

# 生成モデリングのための拡散マップ粒子システム

Diffusion map particle systems for generative modeling ( http://arxiv.org/abs/2304.00200v2 )

ライセンス: Link先を確認

Fengyi Li, Youssef Marzouk

(参考訳) 本稿では、拡散マップとラプラシアン調整ワッサーシュタイン勾配降下法(LAWGD)に基づく、生成モデリングのための新しい拡散マップ粒子システム(DMPS)を提案する。拡散マップは、対応するランジュバン拡散過程の生成作用素をサンプルから近似し、それによって基礎となるデータ生成多様体を学習するために用いられる。一方、LAWGDは、適切なカーネルを選べば目標分布からの効率的なサンプリングを可能にする。本稿では、このカーネルを、拡散マップで計算した生成作用素のスペクトル近似によって構成する。本手法はオフラインの訓練を必要とせず、チューニングも最小限で済み、中程度の次元のデータセットにおいて他の手法を上回りうる。

We propose a novel diffusion map particle system (DMPS) for generative modeling, based on diffusion maps and Laplacian-adjusted Wasserstein gradient descent (LAWGD). Diffusion maps are used to approximate the generator of the corresponding Langevin diffusion process from samples, and hence to learn the underlying data-generating manifold. On the other hand, LAWGD enables efficient sampling from the target distribution given a suitable choice of kernel, which we construct here via a spectral approximation of the generator, computed with diffusion maps. Our method requires no offline training and minimal tuning, and can outperform other approaches on data sets of moderate dimension.

翻訳日:2023-11-02 03:16:06 公開日:2023-10-30

# 教育のための汎用人工知能(AGI)

Artificial General Intelligence (AGI) for Education ( http://arxiv.org/abs/2304.12479v3 )

ライセンス: Link先を確認

Ehsan Latif, Gengchen Mai, Matthew Nyaaba, Xuansheng Wu, Ninghao Liu, Guoyu Lu, Sheng Li, Tianming Liu, and Xiaoming Zhai

(参考訳) 汎用人工知能 (AGI) は、GPT-4 や ChatGPT といった画期的な大規模言語モデルやチャットボットの出現により、将来の技術として世界的に認識されるようになった。AGIはコンピュータシステムを通じて人間の知能を再現することを目指しており、教育分野に革命を起こす可能性のある重要な技術の1つである。従来のAIモデルは、通常は限られた範囲のタスク用に設計され、訓練にかなりの量のドメイン固有データを必要とし、教育における複雑な対人関係のダイナミクスを必ずしも考慮しない。これに対し、最近の大規模な事前学習モデルによって駆動されるAGIは、推論、問題解決、意思決定、さらには人間の感情や社会的相互作用の理解など、人間レベルの知能を必要とするタスクを実行する機械の能力において、大きな飛躍を示している。本研究は、教育目標の設定、教授法とカリキュラムの設計、評価の実施を含め、将来の教育におけるAGIの主要な概念、能力、範囲、可能性をレビューする。また、AGIが直面する教育上のさまざまな倫理的問題や、AGIが人間の教育者に与える影響についても詳しく議論する。AGIの開発には、研究と応用の取り組みを進めるために、教育者とAIエンジニアの学際的な協力が必要である。

Artificial general intelligence (AGI) has gained global recognition as a future technology due to the emergence of breakthrough large language models and chatbots such as GPT-4 and ChatGPT, respectively. AGI aims to replicate human intelligence through computer systems and is one of the critical technologies with the potential to revolutionize the field of education. Conventional AI models are typically designed for a limited range of tasks, demand significant amounts of domain-specific data for training, and may not always consider intricate interpersonal dynamics in education. In contrast, AGI, driven by the recent large pre-trained models, represents a significant leap in the capability of machines to perform tasks that require human-level intelligence, such as reasoning, problem-solving, decision-making, and even understanding human emotions and social interactions. This work reviews AGI's key concepts, capabilities, scope, and potential within future education, including setting educational goals, designing pedagogy and curriculum, and performing assessments. We also provide rich discussions over various ethical issues in education faced by AGI and how AGI will affect human educators. The development of AGI necessitates interdisciplinary collaborations between educators and AI engineers to advance research and application efforts.

翻訳日:2023-11-02 03:08:44 公開日:2023-10-30

# 二成分ボース混合系の引力的な溶液: 液体-蒸気共存と臨界点

Attractive Solution of Binary Bose Mixtures: Liquid-Vapor Coexistence and Critical Point ( http://arxiv.org/abs/2304.12334v2 )

ライセンス: Link先を確認

Gabriele Spada, Sebastiano Pilati and Stefano Giorgini

(参考訳) 厳密な経路積分モンテカルロ法を用いて、引力相互作用する二成分ボース混合系の熱力学的挙動を調べる。基底状態が平均場を超えた効果によって安定化された自己束縛液相となる、種間相互作用の領域に焦点を当てる。引力の強さの異なる値について圧力-密度平面上の等温曲線を計算し、マクスウェルの構成法を用いて液体と蒸気の共存領域の範囲を決定する。特に、共存領域内では、密度がノーマルな気体相から超流動の液相へと不連続に変化するのに伴い、ボース=アインシュタイン凝縮が不連続に起こる。さらに、一次相転移線が終端する臨界点を決定し、その近傍での密度の不連続性の挙動を調べる。また、転移における密度の不連続性は、トラップ中の混合系の実験で観測できる可能性があることも指摘する。

We study the thermodynamic behavior of attractive binary Bose mixtures using exact path-integral Monte-Carlo methods. Our focus is on the regime of interspecies interactions where the ground state is in a self-bound liquid phase, stabilized by beyond mean-field effects. We calculate the isothermal curves in the pressure vs density plane for different values of the attraction strength and establish the extent of the coexistence region between liquid and vapor using the Maxwell construction. Notably, within the coexistence region, Bose-Einstein condensation occurs in a discontinuous way as the density jumps from the normal gas to the superfluid liquid phase. Furthermore, we determine the critical point where the line of first-order transition ends and investigate the behavior of the density discontinuity in its vicinity. We also point out that the density discontinuity at the transition could be observed in experiments of mixtures in traps.

翻訳日:2023-11-02 03:08:23 公開日:2023-10-30

# 物理ベース補間による水ネットワークリーク定位のための辞書の学習

Learning Dictionaries from Physical-Based Interpolation for Water Network Leak Localization ( http://arxiv.org/abs/2304.10932v2 )

ライセンス: Link先を確認

Paul Irofti and Luis Romero-Ben and Florin Stoican and Vicenç Puig

(参考訳) 本稿では,状態推定と学習に基づくリークローカライズ手法を提案する。第1は補間方式で処理されるが、第2段階では辞書学習が考慮される。新たに提案する補間手法は, 配水ネットワークにおける隣接ノードの油圧ヘッド間の相互接続の物理を活用している。さらに、残差は油圧ヘッド値の代わりに直接補間される。よく知られているケーススタディ (modena) に本手法を適用した結果, 補間誤差(配位状態と残差推定)と後方位置推定の両面で, 新たな補間法の改善が示された。

This article presents a leak localization methodology based on state estimation and learning. The first is handled by an interpolation scheme, whereas dictionary learning is considered for the second stage. The novel proposed interpolation technique exploits the physics of the interconnections between hydraulic heads of neighboring nodes in water distribution networks. Additionally, residuals are directly interpolated instead of hydraulic head values. The results of applying the proposed method to a well-known case study (Modena) demonstrated the improvements of the new interpolation method with respect to a state-of-the-art approach, both in terms of interpolation error (considering state and residual estimation) and posterior localization.

翻訳日:2023-11-02 03:08:08 公開日:2023-10-30

# 配向相における長寿命一重項状態と等方相への相転移における生存

Long-Lived Singlet State in an Oriented Phase and its Survival across the Phase Transition Into an Isotropic Phase ( http://arxiv.org/abs/2304.10459v3 )

ライセンス: Link先を確認

Vishal Varma and T S Mahesh

(参考訳) 核スピン対の長寿命一重項状態(LLS)は、液体NMRを介して等方相において広く研究され、利用されてきた。しかし、異方相におけるLLSの報告はほとんどない。異方相ではスカラー結合に加えて双極子結合からの寄与が許され、多くの刺激的な可能性が開かれる。本稿では、液晶溶媒のネマティック相中で部分的に配向した一対の核スピンにおけるLLSの観測を報告する。スピンは残留双極子-双極子結合を介して強く相互作用する。配向相におけるLLSは、通常のスピン-格子緩和時定数($T_1$)の最大3倍の寿命を持つ。加熱すると、系はネマティック相から等方相への相転移を起こし、そこでのLLSは対応する$T_1$の最大5倍の寿命を持つ。興味深いことに、配向相で調製されたLLSは、ネマティック相から等方相への転移を生き延びることができる。配向相におけるLLSの応用として、その長い寿命を利用して、液晶溶媒中の溶質分子の小さな並進拡散係数を測定する。最後に、LLSへのアクセスをロックまたはアンロックするために相転移を利用することを提案する。

Long-lived singlet states (LLS) of nuclear spin pairs have been extensively studied and utilized in the isotropic phase via liquid state NMR. However, there are hardly any reports of LLS in the anisotropic phase that allows contribution from the dipolar coupling in addition to the scalar coupling, thereby opening many exciting possibilities. Here we report observing LLS in a pair of nuclear spins partially oriented in the nematic phase of a liquid crystal solvent. The spins are strongly interacting via the residual dipole-dipole coupling. We observe LLS in the oriented phase living up to three times longer than the usual spin-lattice relaxation time constant ($T_1$). Upon heating, the system undergoes a phase transition from nematic into isotropic phase, wherein the LLS is up to five times longer lived than the corresponding $T_1$. Interestingly, the LLS prepared in the oriented phase can survive the transition from the nematic to the isotropic phase. As an application of LLS in the oriented phase, we utilize its longer life to measure the small translational diffusion coefficient of solute molecules in the liquid crystal solvent. Finally, we propose utilizing the phase transition to lock or unlock access to LLS.

翻訳日:2023-11-02 03:07:57 公開日:2023-10-30

# 信頼できる予測のための事前学習モデルによるサンプル難易度の学習

Learning Sample Difficulty from Pre-trained Models for Reliable Prediction ( http://arxiv.org/abs/2304.10127v2 )

ライセンス: Link先を確認

Peng Cui, Dan Zhang, Zhijie Deng, Yinpeng Dong, Jun Zhu

(参考訳) 大規模事前学習モデルは多くのアプリケーションで顕著な成功を収めているが、下流モデルの予測信頼性を改善するためにそれらを活用する方法は望ましくないほど過小評価されている。さらに、現代のニューラルネットワークは校正が不十分で、固有のサンプルの難しさやデータの不確実性に関わらず、自信過剰な予測がなされている。そこで本研究では,大規模な事前学習モデルを用いて,サンプル難易度を考慮したエントロピー正規化による下流モデルトレーニングを指導する。大規模データセットに露出し、下流のトレーニングクラスに過度に適合しない事前学習モデルでは、特徴空間ガウスモデルと相対マハラノビス距離計算により、各トレーニングサンプルの難易度を測定することができる。重要なことは、サンプルの難易度に基づいて過信予測を適応的にペナルティ化することで、挑戦するベンチマーク(例えば、ResNet34を用いてImageNet1k上で+0.55% ACCと-3.7% ECE)の精度と不確実性の校正を同時に改善し、信頼性のある予測のための競争基準を一貫して上回っていることである。改良された不確実性推定は、選択的分類(誤った予測を含まない)と分布外検出をさらに改善する。

Large-scale pre-trained models have achieved remarkable success in many applications, but how to leverage them to improve the prediction reliability of downstream models is undesirably under-explored. Moreover, modern neural networks have been found to be poorly calibrated and make overconfident predictions regardless of inherent sample difficulty and data uncertainty. To address this issue, we propose to utilize large-scale pre-trained models to guide downstream model training with sample difficulty-aware entropy regularization. Pre-trained models that have been exposed to large-scale datasets and do not overfit the downstream training classes enable us to measure each training sample's difficulty via feature-space Gaussian modeling and relative Mahalanobis distance computation. Importantly, by adaptively penalizing overconfident prediction based on the sample difficulty, we simultaneously improve accuracy and uncertainty calibration across challenging benchmarks (e.g., +0.55% ACC and -3.7% ECE on ImageNet1k using ResNet34), consistently surpassing competitive baselines for reliable prediction. The improved uncertainty estimate further improves selective classification (abstaining from erroneous predictions) and out-of-distribution detection.
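
As a rough illustration of scoring sample difficulty with a relative Mahalanobis distance, the sketch below fits class-conditional Gaussians and one class-agnostic Gaussian to feature vectors and scores each sample by the difference of the two squared distances. It makes several simplifying assumptions that are not from the paper: the features are taken as given (in practice they would come from a pre-trained model), covariances are diagonal, and the class-conditional Gaussians share one pooled variance. All function names are hypothetical.

```python
def mean_vector(rows):
    """Per-dimension mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance under a diagonal covariance."""
    return sum((xi - m) ** 2 / v for xi, m, v in zip(x, mean, var))


def relative_mahalanobis(features, labels, eps=1e-6):
    """Score each sample: distance to its own class minus class-agnostic distance.

    Larger scores mark samples that sit far from their class mean relative to
    how unusual they are overall, i.e. candidates for "hard" samples.
    """
    n, dim = len(features), len(features[0])
    class_means = {
        c: mean_vector([f for f, l in zip(features, labels) if l == c])
        for c in set(labels)
    }
    # Pooled within-class variance, shared by all classes (diagonal, assumed).
    shared_var = [
        sum((f[j] - class_means[l][j]) ** 2 for f, l in zip(features, labels)) / n + eps
        for j in range(dim)
    ]
    # Class-agnostic ("background") Gaussian fitted to all samples.
    bg_mean = mean_vector(features)
    bg_var = [sum((f[j] - bg_mean[j]) ** 2 for f in features) / n + eps
              for j in range(dim)]
    return [
        mahalanobis_sq(f, class_means[l], shared_var) - mahalanobis_sq(f, bg_mean, bg_var)
        for f, l in zip(features, labels)
    ]


# Toy 1D features: the third sample is labelled class 0 but lies between the classes.
features = [[0.0], [0.2], [2.0], [5.0], [5.2], [4.8]]
labels = [0, 0, 0, 1, 1, 1]
scores = relative_mahalanobis(features, labels)
hardest = scores.index(max(scores))
```

A difficulty score of this kind could then weight a per-sample entropy regularizer, so that confident predictions are penalized more on harder samples; that weighting step is not shown here.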

翻訳日:2023-11-02 03:07:39 公開日:2023-10-30

# Thorny Roses: 自然言語処理におけるデュアルユースのジレンマの調査

Thorny Roses: Investigating the Dual Use Dilemma in Natural Language Processing ( http://arxiv.org/abs/2304.08315v3 )

ライセンス: Link先を確認

Lucie-Aimée Kaffee, Arnav Arora, Zeerak Talat, Isabelle Augenstein

(参考訳) 技術と科学的成果物の意図的かつ有害な再利用である二重利用は、自然言語処理(nlp)の文脈ではまだ明確に定義されていない問題である。しかし、NLP技術は発展を続け、社会に広まりつつあるため、内部の作業はますます不透明になっている。したがって、二重利用の懸念とそれらを制限する潜在的な方法を理解することは、研究開発の潜在的な害を最小化するために重要である。本稿では,NLP研究者と実践者を対象に,課題の深さと展望を把握し,既存のサポートの評価を行う。調査の結果に基づき,NLPコミュニティのニーズに合わせた二重利用の定義を提供する。この調査によると、大多数の研究者が研究の二重利用を心配しているが、その対策は限られている。調査結果を踏まえ,NLPにおける二重利用を緩和する現在の状況と潜在的手段について考察し,既存の会議倫理枠組み,例えばACL倫理チェックリストに統合可能なチェックリストを提案する。

Dual use, the intentional, harmful reuse of technology and scientific artefacts, is a problem yet to be well-defined within the context of Natural Language Processing (NLP). However, as NLP technologies continue to advance and become increasingly widespread in society, their inner workings have become increasingly opaque. Therefore, understanding dual use concerns and potential ways of limiting them is critical to minimising the potential harms of research and development. In this paper, we conduct a survey of NLP researchers and practitioners to understand the depth and their perspective of the problem as well as to assess existing available support. Based on the results of our survey, we offer a definition of dual use that is tailored to the needs of the NLP community. The survey revealed that a majority of researchers are concerned about the potential dual use of their research but only take limited action toward it. In light of the survey results, we discuss the current state and potential means for mitigating dual use in NLP and propose a checklist that can be integrated into existing conference ethics-frameworks, e.g., the ACL ethics checklist.

翻訳日:2023-11-02 03:05:53 公開日:2023-10-30

# 頭部に着目した密集群衆追跡における重度オクルージョンへの対処

Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the Heads ( http://arxiv.org/abs/2304.07705v3 )

ライセンス: Link先を確認

Yu Zhang, Huaming Chen, Wei Bao, Zhongzheng Lai, Zao Zhang, Dong Yuan

(参考訳) ディープラーニングの急速な発展に伴い、物体検出と追跡は今日の社会で重要な役割を果たしている。密集した群衆シーンのすべての歩行者をコンピュータビジョンの手法で識別し追跡することは、この分野の典型的な課題であり、Multiple Object Tracking(MOT)チャレンジとしても知られる。現代のトラッカーは、ますます複雑なシーンで動作することが求められる。MOT20チャレンジの結果によると、歩行者の密度はMOT17チャレンジの4倍である。したがって、極めて混雑したシーンでの検出・追跡能力を向上させることが本研究の目的である。人体にはオクルージョン(遮蔽)の問題があるのに対し、頭部は通常より識別しやすい。本研究では、小型・中型の歩行者の検出における再現率と適合率を高めるために、アンカーフリー方式の頭部・身体同時検出器を設計した。また、本モデルは、一般的な歩行者検出のための統計的な頭部と身体の比率に関する情報を訓練時に必要としない。代わりに、提案モデルはその比率を動的に学習する。提案モデルの有効性を検証するため、MOT20、CrowdHuman、HT21を含むさまざまなデータセットで広範な実験を行った。その結果、提案手法は小型・中型の歩行者に対する再現率と適合率の両方を大幅に改善し、これらの困難なデータセットで最先端の結果を達成した。

With the rapid development of deep learning, object detection and tracking play a vital role in today's society. Being able to identify and track all the pedestrians in a dense crowd scene with computer vision approaches is a typical challenge in this field, also known as the Multiple Object Tracking (MOT) challenge. Modern trackers are required to operate on more and more complicated scenes. According to the MOT20 challenge result, pedestrian density is four times that of the MOT17 challenge. Hence, improving the ability to detect and track in extremely crowded scenes is the aim of this work. In light of the occlusion issue with the human body, the heads are usually easier to identify. In this work, we have designed a joint head and body detector in an anchor-free style to boost the detection recall and precision performance of pedestrians in both small and medium sizes. Innovatively, our model does not require information on the statistical head-body ratio for common pedestrian detection during training. Instead, the proposed model learns the ratio dynamically. To verify the effectiveness of the proposed model, we evaluate the model with extensive experiments on different datasets, including the MOT20, CrowdHuman, and HT21 datasets. As a result, our proposed method significantly improves both the recall and precision rate on small- and medium-sized pedestrians and achieves state-of-the-art results on these challenging datasets.

翻訳日:2023-11-02 03:05:12 公開日:2023-10-30

# 平均二階類似性に基づく確率的分散最適化:アルゴリズムと解析

Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis ( http://arxiv.org/abs/2304.07504v2 )

ライセンス: Link先を確認

Dachao Lin, Yuze Han, Haishan Ye, Zhihua Zhang

(参考訳) 一般化された$\delta$- similarity と $\mu$-strong convexity 条件の下でマスターノードと$n-1$ローカルノードを含む有限サム分散最適化問題を調べる。本稿では,SVRSとAccSVRSの2つの新しいアルゴリズムを提案する。非加速SVRS法は、勾配スライディングと分散低減の技法を組み合わせて、既存の非加速アルゴリズムと比較して$\tilde{\mathcal{O}}(n {+} \sqrt{n}\delta/\mu)$の通信複雑性を向上する。 Katyusha X で提案されたフレームワークを応用し、$\tilde{\mathcal{O}}(n {+} n^{3/4}\sqrt{\delta/\mu})$通信複雑性を持つ直接加速版 AccSVRS も開発する。既存の結果とは対照的に、複雑さの境界は完全に滑らかで、不調なケースでは優れている。さらに, AccSVRS法の厳密性を検証するために, ほぼ一致した下界を確立する。

We study finite-sum distributed optimization problems involving a master node and $n-1$ local nodes under the popular $\delta$-similarity and $\mu$-strong convexity conditions. We propose two new algorithms, SVRS and AccSVRS, motivated by previous works. The non-accelerated SVRS method combines the techniques of gradient sliding and variance reduction and achieves a better communication complexity of $\tilde{\mathcal{O}}(n {+} \sqrt{n}\delta/\mu)$ compared to existing non-accelerated algorithms. Applying the framework proposed in Katyusha X, we also develop a directly accelerated version named AccSVRS with the $\tilde{\mathcal{O}}(n {+} n^{3/4}\sqrt{\delta/\mu})$ communication complexity. In contrast to existing results, our complexity bounds are entirely smoothness-free and exhibit superiority in ill-conditioned cases. Furthermore, we establish a nearly matched lower bound to verify the tightness of our AccSVRS method.
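
The two communication complexities can be compared numerically to see when acceleration pays off. The sketch below drops constants and logarithmic factors, so it shows scaling only, and the function names are illustrative. Setting $\sqrt{n}\,\delta/\mu = n^{3/4}\sqrt{\delta/\mu}$ gives a crossover at $\delta/\mu = \sqrt{n}$: above it the accelerated bound is smaller, below it the non-accelerated one is.

```python
import math


def svrs_rounds(n, delta, mu):
    """Scaling of the SVRS bound: n + sqrt(n) * delta / mu (constants dropped)."""
    return n + math.sqrt(n) * delta / mu


def acc_svrs_rounds(n, delta, mu):
    """Scaling of the AccSVRS bound: n + n^(3/4) * sqrt(delta / mu) (constants dropped)."""
    return n + n ** 0.75 * math.sqrt(delta / mu)


n = 16
# Ill-conditioned case, delta/mu = 100 > sqrt(n): acceleration helps.
ill = (svrs_rounds(n, 100.0, 1.0), acc_svrs_rounds(n, 100.0, 1.0))
# Crossover, delta/mu = sqrt(n) = 4: both bounds coincide.
crossover = (svrs_rounds(n, 4.0, 1.0), acc_svrs_rounds(n, 4.0, 1.0))
# Well-conditioned case, delta/mu = 1 < sqrt(n): the non-accelerated bound is smaller.
well = (svrs_rounds(n, 1.0, 1.0), acc_svrs_rounds(n, 1.0, 1.0))
```

This matches the abstract's remark that the bounds are most favourable in ill-conditioned cases, where $\delta/\mu$ is large.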

翻訳日:2023-11-02 03:04:52 公開日:2023-10-30

# ベイズ階層モデルのためのギブスサンプラーの次元自由混合時間

Dimension-free mixing times of Gibbs samplers for Bayesian hierarchical models ( http://arxiv.org/abs/2304.06993v2 )

ライセンス: Link先を確認

Filippo Ascolani and Giacomo Zanella

(参考訳) ギブズサンプリングはベイズ階層モデルから生じる後続分布を近似する一般的なアルゴリズムである。しかし、その人気と優れた経験的性能にもかかわらず、勾配に基づくサンプリング法よりもはるかに少ないような収束特性に関する定量的な結果はまだ少ない。本研究は,ベイズ漸近学のツールを用いた階層モデルを対象としたギブスサンプルの総変動混合時間の挙動を解析する。一般確率関数を持つ2レベルモデルの広いクラスに対して、ランダムなデータ生成仮定の下で次元自由収束結果を得る。ガウス的、二項的、カテゴリー的可能性に関する具体例を論じる。

Gibbs samplers are popular algorithms to approximate posterior distributions arising from Bayesian hierarchical models. Despite their popularity and good empirical performances, however, there are still relatively few quantitative results on their convergence properties, e.g. much less than for gradient-based sampling methods. In this work we analyse the behaviour of total variation mixing times of Gibbs samplers targeting hierarchical models using tools from Bayesian asymptotics. We obtain dimension-free convergence results under random data-generating assumptions, for a broad class of two-level models with generic likelihood function. Specific examples with Gaussian, binomial and categorical likelihoods are discussed.
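
For readers unfamiliar with the algorithm, here is a minimal Gibbs sampler for a toy two-level Gaussian model. The model is an assumption chosen for illustration and is not taken from the paper: observations $y_j \sim N(\theta_j, 1)$, group effects $\theta_j \sim N(\mu, 1)$, and a flat prior on $\mu$. Both full conditionals are then Gaussian, and the sampler alternates between them.

```python
import math
import random


def gibbs_two_level(y, n_iter=5000, burn_in=500, seed=0):
    """Gibbs sampler for y_j ~ N(theta_j, 1), theta_j ~ N(mu, 1), flat prior on mu.

    Full conditionals used:
      theta_j | mu, y_j ~ N((y_j + mu) / 2, 1/2)
      mu | theta        ~ N(mean(theta), 1/J)
    Returns the post-burn-in draws of mu.
    """
    rng = random.Random(seed)
    n_groups = len(y)
    mu = 0.0
    mu_samples = []
    for t in range(n_iter):
        theta = [rng.gauss((yj + mu) / 2.0, math.sqrt(0.5)) for yj in y]
        mu = rng.gauss(sum(theta) / n_groups, math.sqrt(1.0 / n_groups))
        if t >= burn_in:
            mu_samples.append(mu)
    return mu_samples


y = [0.1 * i for i in range(50)]
mu_samples = gibbs_two_level(y)
posterior_mean = sum(mu_samples) / len(mu_samples)
```

In this toy model the marginal posterior of $\mu$ is $N(\bar{y}, 2/J)$, so `posterior_mean` should land close to the sample mean of `y`. The $\mu$-chain here is an AR(1) process with coefficient $1/2$ whatever the number of groups $J$, which gives a small concrete instance of convergence that does not degrade with dimension.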

翻訳日:2023-11-02 03:03:23 公開日:2023-10-30

# Vault: コードの理解と生成を促進するための総合的な多言語データセット

The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation ( http://arxiv.org/abs/2305.06156v2 )

ライセンス: Link先を確認

Dung Nguyen Manh, Nam Le Hai, Anh T. V. Dau, Anh Minh Nguyen, Khanh Nghiem, Jin Guo, Nghi D. Q. Bui

(参考訳) 本稿では、コードの理解と生成のために大規模言語モデルを訓練するための、複数のプログラミング言語にわたる高品質なコードとテキストのペアのデータセットであるThe Vaultを紹介する。ルールベースと深層学習ベースの両方の手法を用いてサンプルを徹底的に抽出し、高品質なコードとテキストのペアを含むことを保証する方法を提示し、その結果として4300万件の高品質なコード・テキストペアからなるデータセットを得た。コード生成、コード検索、コード要約といった一般的なコーディングタスクでの広範な評価により、コード大規模言語モデルをThe Vaultでファインチューニングすると、CodeSearchNetなどの他のデータセットで訓練した同じモデルを上回ることが示された。また、これらのモデルの性能に対する各種プログラミング言語やdocstringの影響を評価するために、データセットの詳細な分析も提供する。

We present The Vault, a dataset of high-quality code-text pairs in multiple programming languages for training large language models to understand and generate code. We present methods for thoroughly extracting samples that use both rule-based and deep learning-based methods to ensure that they contain high-quality pairs of code and text, resulting in a dataset of 43 million high-quality code-text pairs. Our extensive evaluations on common coding tasks including code generation, code search and code summarization show that when fine-tuning Code Large Language Models on The Vault, such models outperform the same models trained on other datasets such as CodeSearchNet. We also provide detailed analyses of our datasets to assess the effects of various programming languages and docstrings on the performance of such models.

翻訳日:2023-11-02 02:55:54 公開日:2023-10-30

# 専門家のガウス混合におけるソフトマックスゲーティング機能

Demystifying Softmax Gating Function in Gaussian Mixture of Experts ( http://arxiv.org/abs/2305.03288v2 )

ライセンス: Link先を確認

Huy Nguyen and TrungTin Nguyen and Nhat Ho

(参考訳) ソフトマックスゲーティング・ガウシアン混合物のパラメータ推定の理解は、文献の長年の未解決問題として残されている。主な原因は、ソフトマックスゲーティング関数に関連する3つの基本的な理論的課題である。 (i)パラメータの翻訳のみによる識別可能性 (II)ソフトマックスゲーティングとガウス密度のエキスパート関数の間の偏微分方程式による内在的相互作用 (3) ガウスの混合を測るソフトマックスの条件密度の数値と分母の間の複素依存性。これらの課題を,パラメータ間の新しいボロノイ損失関数を提案し,パラメータ推定のためのmle(maximum probability estimator)の収束率を確立することで解決する。本研究の結果から,mleの収束率と多項式方程式系の可解性問題との関係が明らかとなった。

Understanding the parameter estimation of softmax gating Gaussian mixture of experts has remained a long-standing open problem in the literature. It is mainly due to three fundamental theoretical challenges associated with the softmax gating function: (i) the identifiability only up to the translation of parameters; (ii) the intrinsic interaction via partial differential equations between the softmax gating and the expert functions in the Gaussian density; (iii) the complex dependence between the numerator and denominator of the conditional density of softmax gating Gaussian mixture of experts. We resolve these challenges by proposing novel Voronoi loss functions among parameters and establishing the convergence rates of maximum likelihood estimator (MLE) for solving parameter estimation in these models. When the true number of experts is unknown and over-specified, our findings show a connection between the convergence rate of the MLE and a solvability problem of a system of polynomial equations.

翻訳日:2023-11-02 02:54:38 公開日:2023-10-30

# 表形式データでニューラルネットが勾配ブースティング木を上回るのはどのような場合か?

When Do Neural Nets Outperform Boosted Trees on Tabular Data? ( http://arxiv.org/abs/2305.02997v3 )

ライセンス: Link先を確認

Duncan McElfresh, Sujay Khandagale, Jonathan Valverde, Vishak Prasad C, Benjamin Feuer, Chinmay Hegde, Ganesh Ramakrishnan, Micah Goldblum, Colin White

(参考訳) 表形式データ(tabular data)は、機械学習で最も一般的に使用されるデータの種類の1つである。表形式データ向けのニューラルネット(NN)の最近の進歩にもかかわらず、表形式データにおいてNNが勾配ブースティング決定木(GBDT)を一般に上回るかどうかについては活発な議論が続いており、最近のいくつかの研究はGBDTが一貫してNNを上回る、あるいはその逆であると主張している。本研究では、一歩引いてこの議論の重要性自体を問い直す。そのために、176のデータセットにわたって19のアルゴリズムを比較する、これまでで最大規模の表形式データの分析を行い、「NN対GBDT」の議論は過度に強調されていることを見出した。驚くほど多くのデータセットにおいて、GBDTとNNの性能差は無視できるか、GBDTの軽いハイパーパラメータチューニングの方がNNとGBDTのどちらを選ぶかよりも重要である。注目すべき例外は、最近提案されたprior-data fitted networkであるTabPFNである。実質的にサイズ3000までの訓練セットに制限されるものの、3000件の訓練データ点をランダムにサンプリングした場合でも、平均して他のすべてのアルゴリズムを上回る。次に、数十のメタ特徴量を分析し、データセットのどのような性質がNNまたはGBDTに適しているかを明らかにする。例えば、GBDTは、歪んだ、あるいは裾の重い特徴量分布やその他のデータセットの不規則性の扱いにおいて、NNよりはるかに優れている。我々の知見は、実務者が自身のデータセットで最もうまく機能しそうな手法を判断するための指針となる。最後に、表形式データの研究を加速することを目的として、我々が調べたデータセットのうち最も「難しい」36個を集めたTabZilla Benchmark Suiteを公開する。ベンチマークスイート、コードベース、およびすべての生の結果は https://github.com/naszilla/tabzilla で入手できる。

Tabular data is one of the most commonly used types of data in machine learning. Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this work, we take a step back and question the importance of this debate. To this end, we conduct the largest tabular data analysis to date, comparing 19 algorithms across 176 datasets, and we find that the 'NN vs. GBDT' debate is overemphasized: for a surprisingly high number of datasets, either the performance difference between GBDTs and NNs is negligible, or light hyperparameter tuning on a GBDT is more important than choosing between NNs and GBDTs. A remarkable exception is the recently-proposed prior-data fitted network, TabPFN: although it is effectively limited to training sets of size 3000, we find that it outperforms all other algorithms on average, even when randomly sampling 3000 training datapoints. Next, we analyze dozens of metafeatures to determine what properties of a dataset make NNs or GBDTs better-suited to perform well. For example, we find that GBDTs are much better than NNs at handling skewed or heavy-tailed feature distributions and other forms of dataset irregularities. Our insights act as a guide for practitioners to determine which techniques may work best on their dataset. Finally, with the goal of accelerating tabular data research, we release the TabZilla Benchmark Suite: a collection of the 36 'hardest' of the datasets we study. Our benchmark suite, codebase, and all raw results are available at https://github.com/naszilla/tabzilla.

翻訳日:2023-11-02 02:54:24 公開日:2023-10-30

# マルチレベル一貫性に基づく弱教師付きマイクロ・マクロ表情スポッティング

Weakly-supervised Micro- and Macro-expression Spotting Based on Multi-level Consistency ( http://arxiv.org/abs/2305.02734v2 )

ライセンス: Link先を確認

Wang-Wang Yu, Kai-Fu Yang, Hong-Mei Yan, Yong-Jie Li

(参考訳) トリミングされていないビデオにおけるマイクロ表情・マクロ表情スポッティング手法の多くは、ビデオ単位での収集とフレーム単位でのアノテーションの負担に悩まされる。ビデオレベルのラベルに基づく弱教師付き表情スポッティング(WES)は、きめ細かいフレームレベルのスポッティングを実現しつつ、フレームレベルのアノテーションの複雑さを軽減できる可能性がある。しかし、既存の弱教師付き手法は、モダリティ間、サンプル間、タスク間のギャップを伴う多重インスタンス学習(MIL)に基づいていると我々は考える。サンプル間のギャップは、主にサンプルの分布と持続時間に由来する。そこで本研究では、上記のギャップを緩和し事前知識を統合するために、モーダルレベルの顕著性、ビデオレベルの分布、ラベルレベルの持続時間、セグメントレベルの特徴という複数の一貫性戦略からなる協調機構を用いて、ビデオレベルのラベルのみで細かいフレームレベルのスポッティングを実現する、新しくシンプルなWESフレームワークMC-WESを提案する。モーダルレベルの顕著性一貫性戦略は、生画像とオプティカルフローの間の重要な相関を捉えることに焦点を当てる。ビデオレベルの分布一貫性戦略は、時間分布のスパース性の違いを利用する。ラベルレベルの持続時間一貫性戦略は、顔面筋の動きの持続時間の違いを利用する。セグメントレベルの特徴一貫性戦略は、同じラベルを持つ特徴が類似性を保つことを重視する。CAS(ME)$^2$、CAS(ME)$^3$、SAMM-LVという3つの困難なデータセットでの実験結果は、MC-WESが最先端の完全教師付き手法に匹敵することを示している。

Most micro- and macro-expression spotting methods in untrimmed videos suffer from the burden of video-wise collection and frame-wise annotation. Weakly-supervised expression spotting (WES) based on video-level labels can potentially mitigate the complexity of frame-level annotation while achieving fine-grained frame-level spotting. However, we argue that existing weakly-supervised methods are based on multiple instance learning (MIL) involving inter-modality, inter-sample, and inter-task gaps. The inter-sample gap is primarily from the sample distribution and duration. Therefore, we propose a novel and simple WES framework, MC-WES, using multi-consistency collaborative mechanisms that include modal-level saliency, video-level distribution, label-level duration and segment-level feature consistency strategies to implement fine frame-level spotting with only video-level labels to alleviate the above gaps and merge prior knowledge. The modal-level saliency consistency strategy focuses on capturing key correlations between raw images and optical flow. The video-level distribution consistency strategy utilizes the difference of sparsity in temporal distribution. The label-level duration consistency strategy exploits the difference in the duration of facial muscles. The segment-level feature consistency strategy emphasizes that features under the same labels maintain similarity. Experimental results on three challenging datasets -- CAS(ME)$^2$, CAS(ME)$^3$, and SAMM-LV -- demonstrate that MC-WES is comparable to state-of-the-art fully-supervised methods.

翻訳日:2023-11-02 02:53:54 公開日:2023-10-30

# Unlimiformer: 無制限長の入力を扱う長距離Transformer

Unlimiformer: Long-Range Transformers with Unlimited Length Input ( http://arxiv.org/abs/2305.01625v3 )

ライセンス: Link先を確認

Amanda Bertsch, Uri Alon, Graham Neubig, Matthew R. Gormley

(参考訳) Transformerが提案されて以来、これらのモデルは入力中の全てのトークンに注意を向ける必要があるため、有界な入力長に制限されてきた。本研究では,既存の任意の事前学習済みエンコーダ・デコーダTransformerをラップし,クロスアテンションの計算を単一のk近傍(kNN)インデックスにオフロードする汎用的な手法であるUnlimiformerを提案する。返されるkNN距離はアテンションの内積スコアである。このkNNインデックスはGPUメモリまたはCPUメモリのいずれかに保持でき、劣線形時間で問い合わせできる。これにより、事実上無制限の入力シーケンスをインデックス化でき、各デコーダ層の各アテンションヘッドは全てのキーに注意を向ける代わりに上位k個のキーを取得する。いくつかの長文書および書籍要約ベンチマークでUnlimiformerを評価し,BookSumデータセットの50万トークン長の入力でさえ,テスト時に入力を切り詰めることなく処理できることを示した。我々は、Unlimiformerが、学習する重みを追加することなく、またコードを変更することなく、BARTやLongformerのような事前学習済みモデルを無制限の入力に拡張して改善することを示した。コードとモデルを https://github.com/abertsch72/unlimiformer で公開しています。

Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single k-nearest-neighbor (kNN) index, while the returned kNN distances are the attention dot-product scores. This kNN index can be kept on either the GPU or CPU memory and queried in sub-linear time; this way, we can index practically unlimited input sequences, while every attention head in every decoder layer retrieves its top-k keys, instead of attending to every key. We evaluate Unlimiformer on several long-document and book-summarization benchmarks, showing that it can process even 500k token-long inputs from the BookSum dataset, without any input truncation at test time. We demonstrate that Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code. We make our code and models publicly available at https://github.com/abertsch72/unlimiformer .
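
The retrieval step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses an exact brute-force top-k search where the paper uses a kNN index, it handles a single query vector, and the names `topk_cross_attention`, `keys`, `values`, and `k` are hypothetical.

```python
import heapq
import math

def topk_cross_attention(query, keys, values, k):
    """Attend only to the k keys with the highest dot-product score.

    query: list[float]; keys, values: list[list[float]] of equal length.
    Returns (attention output, indices of the retrieved keys).
    """
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    # Exact top-k search (assumption: stands in for a sub-linear kNN index query).
    top = heapq.nlargest(k, range(len(keys)), key=scores.__getitem__)
    # Softmax restricted to the retrieved keys, shifted by the max for stability.
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    total = sum(weights)
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, top)) / total
              for d in range(dim)]
    return output, top

keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
output, retrieved = topk_cross_attention([1.0, 0.0], keys, values, k=2)
print(retrieved)  # [0, 2]
```

Because the softmax is taken over the retrieved keys only, the cost per query depends on `k` rather than on the input length, which is the property the abstract relies on.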

翻訳日:2023-11-02 02:53:23 公開日:2023-10-30

# ChatGPTで生成されたコードは本当に正しいか? コード生成のための大規模言語モデルの厳密な評価

Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation ( http://arxiv.org/abs/2305.01210v3 )

ライセンス: Link先を確認

Jiawei Liu and Chunqiu Steven Xia and Yuyao Wang and Lingming Zhang

(参考訳) プログラム合成は長年研究されており、最近のアプローチはコードを生成するために大規模言語モデル(LLM)の能力を直接利用することに焦点を当てている。コード合成における様々なLLMの性能を測定するために、厳選された合成問題とテストケースからなるプログラミングベンチマークが使用される。しかし、これらのテストケースは、生成されたコードの機能的正確性を完全に評価するには、量と品質の両面で不十分な場合がある。既存のベンチマークのこうした限界は、次の疑問を提起する。 LLMの時代、生成されたコードは本当に正しいのでしょうか? そこで我々は,LLM合成コードの機能的正しさを厳密にベンチマークするコード合成評価フレームワークであるEvalPlusを提案する。 EvalPlusは、LLMベースと突然変異ベースの両方の戦略を用いる自動テスト入力ジェネレータによって新たに生成された大量のテストケースで、所定の評価データセットを拡張する。 EvalPlusは汎用的だが、ここでは人気のあるHumanEvalベンチマークのテストケースを80倍に拡張してHumanEval+を構築する。 26の人気のあるLLM(例えば、GPT-4とChatGPT)に対する広範な評価は、HumanEval+がLLMによって合成された、これまで検出されなかった誤ったコードを大量に検出でき、pass@kを最大19.3-28.9%低下させることを示している。また、意外なことに、テストの不十分さが順位の誤りにつながることもわかった。例えば、WizardCoder-CodeLlamaとPhind-CodeLlamaはいずれもHumanEval+ではChatGPTを上回るが、HumanEvalではどちらも上回れなかった。我々の研究は、従来の一般的なコード合成評価結果が、コード合成のためのLLMの真の性能を正確に反映していないことを示すだけでなく、自動テストによってそのようなプログラミングベンチマークを改善するための新たな方向性も切り開く。我々は、将来のLLM-for-code研究を促進・加速するために、ツール、拡張データセット、およびすべてのLLM生成コードを https://github.com/evalplus/evalplus でオープンソース化した。

Program synthesis has been long studied with recent approaches focused on directly using the power of Large Language Models (LLMs) to generate code. Programming benchmarks, with curated synthesis problems and test-cases, are used to measure the performance of various LLMs on code synthesis. However, these test-cases can be limited in both quantity and quality for fully assessing the functional correctness of the generated code. Such limitation in the existing benchmarks begs the following question: In the era of LLMs, is the code generated really correct? To answer this, we propose EvalPlus -- a code synthesis evaluation framework to rigorously benchmark the functional correctness of LLM-synthesized code. EvalPlus augments a given evaluation dataset with large amounts of test-cases newly produced by an automatic test input generator, powered by both LLM- and mutation-based strategies. While EvalPlus is general, we extend the test-cases of the popular HumanEval benchmark by 80x to build HumanEval+. Our extensive evaluation across 26 popular LLMs (e.g., GPT-4 and ChatGPT) demonstrates that HumanEval+ is able to catch significant amounts of previously undetected wrong code synthesized by LLMs, reducing the pass@k by up-to 19.3-28.9%. We also surprisingly found that test insufficiency can lead to mis-ranking. For example, both WizardCoder-CodeLlama and Phind-CodeLlama now outperform ChatGPT on HumanEval+, while none of them could on HumanEval. Our work not only indicates that prior popular code synthesis evaluation results do not accurately reflect the true performance of LLMs for code synthesis, but also opens up a new direction to improve such programming benchmarks through automated testing. We have open-sourced our tools, enhanced datasets as well as all LLM-generated code at https://github.com/evalplus/evalplus to facilitate and accelerate future LLM-for-code research.
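
The abstract reports its results as pass@k. As a rough sketch of why extra test cases lower that number, the snippet below uses the unbiased estimator commonly paired with HumanEval, pass@k = 1 - C(n-c, k) / C(n, k) for n samples of which c are correct. Whether EvalPlus computes the metric exactly this way is an assumption, and the sample counts are invented purely for illustration.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical counts for a single problem (not taken from the paper):
# 10 samples, 5 pass the original tests, only 3 survive the extended tests.
base = pass_at_k(n=10, c=5, k=1)
plus = pass_at_k(n=10, c=3, k=1)
print(round(base, 3), round(plus, 3))  # 0.5 0.3
```

Stricter tests can only reclassify samples from correct to incorrect, so `c` never grows and the estimate never rises when a test suite is extended.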

翻訳日:2023-11-02 02:53:00 公開日:2023-10-30

# 大規模言語モデルを用いた単体テスト生成に関する実証的研究

An Empirical Study of Using Large Language Models for Unit Test Generation ( http://arxiv.org/abs/2305.00418v2 )

ライセンス: Link先を確認

Mohammed Latif Siddiq, Joanna C. S. Santos, Ridwanul Hasan Tanvir, Noshin Ulfat, Fahmid Al Rifat, and Vinicius Carvalho Lopes

(参考訳) コード生成モデルは、コードコメント、既存のコード、または両方の組み合わせからプロンプトを受け取り、コードを生成する。コード生成モデル(github copilotなど)は、実際にはますます採用されているが、微調整なしでユニットテスト生成にうまく使えるかどうかは不明だ。我々は,このギャップを埋めるために3つの生成モデル(Codex, GPT-3.5-Turbo, StarCoder)がいかにうまくテストケースを生成するかを検討した。 HumanEval と Evosuite SF110 の2つのベンチマークを用いて,単体テスト生成プロセスにおけるコンテキスト生成の効果を検討した。モデルのコンパイル率,テストの正確性,カバレッジ,テストの臭いなどに基づいて評価した。 CodexモデルはHumanEvalデータセットの80%以上のカバレッジを達成したが、EvoSuite SF110ベンチマークの2%以上のカバレッジを持つモデルはない。生成されたテストは、Duplicated AssertsやEmpty Testsといったテストの臭いにも悩まされた。

A code generation model generates code by taking a prompt from a code comment, existing code, or a combination of both. Although code generation models (e.g. GitHub Copilot) are increasingly being adopted in practice, it is unclear whether they can successfully be used for unit test generation without fine-tuning. To fill this gap, we investigated how well three generative models (Codex, GPT-3.5-Turbo, and StarCoder) can generate test cases. We used two benchmarks (HumanEval and EvoSuite SF110) to investigate the effect of context generation in the unit test generation process. We evaluated the models based on compilation rates, test correctness, coverage, and test smells. We found that the Codex model achieved above 80% coverage for the HumanEval dataset, but no model had more than 2% coverage for the EvoSuite SF110 benchmark. The generated tests also suffered from test smells, such as Duplicated Asserts and Empty Tests.

翻訳日:2023-11-02 02:52:05 公開日:2023-10-30

# PUNR:ニュースレコメンデーションのためのユーザ行動モデリングによる事前学習

PUNR: Pre-training with User Behavior Modeling for News Recommendation ( http://arxiv.org/abs/2304.12633v2 )

ライセンス: Link先を確認

Guangyuan Ma, Hongtao Liu, Xing Wu, Wanhui Qian, Zhepeng Lv, Qing Yang, Songlin Hu

(参考訳) ニュースレコメンデーションは、ユーザーの行動に基づいてクリック行動を予測することを目的としている。ユーザの表現を効果的にモデル化する方法は、望ましいニュースを推奨するキーとなる。既存の作品は、主に監督された微調整段階の改善に焦点を当てている。しかし、ユーザ表現に最適化された PLM ベースの教師なし事前学習手法がまだ存在しない。本研究では,ユーザ行動マスキングとユーザ行動生成という2つのタスクを備えた教師なし事前学習パラダイムを提案する。まず,ユーザ行動マスキング事前学習タスクを導入し,その状況行動に基づいてマスキングユーザ行動の復元を行う。このようにして、このモデルはより強く、より包括的なユーザーニュースリーディングパターンを捉えることができる。さらに,ユーザエンコーダから派生したユーザ表現ベクトルを強化するために,新しいユーザ行動生成事前学習タスクを導入する。上記の事前学習したユーザモデリングエンコーダを用いて、下流の微調整でニュースやユーザ表現を得る。実世界のニュースベンチマークの評価では、既存のベースラインよりも大幅にパフォーマンスが向上している。

News recommendation aims to predict click behaviors based on user behaviors. How to effectively model the user representations is the key to recommending preferred news. Existing works are mostly focused on improvements in the supervised fine-tuning stage. However, there is still a lack of PLM-based unsupervised pre-training methods optimized for user representations. In this work, we propose an unsupervised pre-training paradigm with two tasks, i.e. user behavior masking and user behavior generation, both towards effective user behavior modeling. Firstly, we introduce the user behavior masking pre-training task to recover the masked user behaviors based on their contextual behaviors. In this way, the model could capture a much stronger and more comprehensive user news reading pattern. Besides, we incorporate a novel auxiliary user behavior generation pre-training task to enhance the user representation vector derived from the user encoder. We use the above pre-trained user modeling encoder to obtain news and user representations in downstream fine-tuning. Evaluations on the real-world news benchmark show significant performance improvements over existing baselines.

翻訳日:2023-11-02 02:51:48 公開日:2023-10-30

# 比較推論のための事前学習言語モデル

Pre-training Language Models for Comparative Reasoning ( http://arxiv.org/abs/2305.14457v2 )

ライセンス: Link先を確認

Mengxia Yu, Zhihan Zhang, Wenhao Yu, Meng Jiang

(参考訳) 比較推論は、対象、概念または実体を比較して結論を引き出す過程であり、基本的な認知能力を構成する。本稿では,テキストに対する比較推論能力を高めるための,事前学習型言語モデルのための新しいフレームワークを提案する。比較推論を必要とするNLPタスクにはアプローチがあるが、コストのかかる手動データラベリングと、異なるタスクに対する限定的な一般化性に悩まされている。本手法では,構造化データと非構造化データの両方を活用する,テキストベースのエンティティ比較のためのスケーラブルなデータ収集手法を提案する。さらに, 比較推論に関する3つの新しい目的を通して, 事前学習言語モデルの枠組みを提案する。比較質問応答,質問生成,要約などの下流タスクの評価は,特に低リソース条件下で,我々の事前学習フレームワークが言語モデルの比較推論能力を大幅に向上させることを示す。この研究は、比較推論のための最初の統合ベンチマークもリリースしている。

Comparative reasoning is a process of comparing objects, concepts, or entities to draw conclusions, which constitutes a fundamental cognitive ability. In this paper, we propose a novel framework to pre-train language models for enhancing their abilities of comparative reasoning over texts. While there have been approaches for NLP tasks that require comparative reasoning, they suffer from costly manual data labeling and limited generalizability to different tasks. Our approach introduces a novel method of collecting scalable data for text-based entity comparison, which leverages both structured and unstructured data. Moreover, we present a framework of pre-training language models via three novel objectives on comparative reasoning. Evaluation on downstream tasks including comparative question answering, question generation, and summarization shows that our pre-training framework significantly improves the comparative reasoning abilities of language models, especially under low-resource conditions. This work also releases the first integrated benchmark for comparative reasoning.

翻訳日:2023-11-02 02:45:02 公開日:2023-10-30

# 確率時空間ダイナミクスのための同変ニューラルシミュレータ

Equivariant Neural Simulators for Stochastic Spatiotemporal Dynamics ( http://arxiv.org/abs/2305.14286v2 )

ライセンス: Link先を確認

Koen Minartz, Yoeri Poels, Simon Koop, Vlado Menkovski

(参考訳) ニューラルネットワークは、高次元力学系のスケーラブルなデータ駆動シミュレーションのツールとして、特に数値解法が実現不可能あるいは計算コストが高い環境で登場している。特に、決定論的ニューラルシミュレータにドメインの対称性を組み込むことで、精度、サンプル効率、パラメータ効率を大幅に改善できることが示されている。しかし、確率的現象をシミュレートできる確率的ニューラルシミュレータに対称性を組み込むには、同変な関数近似ではなく、軌道上の同変な分布を生成するモデルが必要である。本稿では,系の時間発展上の同変分布を自己回帰的に確率モデリングする枠組みであるEquivariant Probabilistic Neural Simulation (EPNS)を提案する。我々はEPNSを用いて確率的n体系と確率的細胞動力学のモデルを設計する。実験の結果,EPNSは既存のニューラルネットワークを用いた確率的シミュレーション法よりもかなり優れていた。具体的には,EPNSに同変性を組み込むことで,シミュレーション品質,データ効率,ロールアウト安定性,不確実性定量化が向上することを示す。 EPNSは様々な領域における効率的かつ効果的なデータ駆動確率シミュレーションのための有望な手法であると結論づける。

Neural networks are emerging as a tool for scalable data-driven simulation of high-dimensional dynamical systems, especially in settings where numerical methods are infeasible or computationally expensive. Notably, it has been shown that incorporating domain symmetries in deterministic neural simulators can substantially improve their accuracy, sample efficiency, and parameter efficiency. However, to incorporate symmetries in probabilistic neural simulators that can simulate stochastic phenomena, we need a model that produces equivariant distributions over trajectories, rather than equivariant function approximations. In this paper, we propose Equivariant Probabilistic Neural Simulation (EPNS), a framework for autoregressive probabilistic modeling of equivariant distributions over system evolutions. We use EPNS to design models for a stochastic n-body system and stochastic cellular dynamics. Our results show that EPNS considerably outperforms existing neural network-based methods for probabilistic simulation. More specifically, we demonstrate that incorporating equivariance in EPNS improves simulation quality, data efficiency, rollout stability, and uncertainty quantification. We conclude that EPNS is a promising method for efficient and effective data-driven probabilistic simulation in a diverse range of domains.

翻訳日:2023-11-02 02:44:47 公開日:2023-10-30

# プレゼンテーションバイアス下におけるマルチモーダル学習の反事実強化

Counterfactual Augmentation for Multimodal Learning Under Presentation Bias ( http://arxiv.org/abs/2305.14083v2 )

ライセンス: Link先を確認

Victoria Lin, Louis-Philippe Morency, Dimitrios Dimitriadis, Srinagesh Sharma

(参考訳) 現実世界の機械学習システムでは、ラベルはシステムが促したいユーザー行動に由来することが多い。時間とともに、新しいトレーニング例や特徴量が利用可能になるにつれて、新しいモデルをトレーニングしなければならない。しかし、ユーザーとモデルの間のフィードバックループは将来のユーザーの行動を偏らせ、ラベルにプレゼンテーションバイアスを引き起こし、新しいモデルをトレーニングする能力を損なう。本稿では,生成した反事実ラベルを用いてプレゼンテーションバイアスを補正する新しい因果的手法である,反事実拡張を提案する。実証評価により,未補正モデルと既存のバイアス補正手法の双方と比較して,反事実拡張により下流性能が向上することが示された。モデル分析はさらに、生成された反事実がオラクル設定において真の反事実と密接に一致することを示している。

In real-world machine learning systems, labels are often derived from user behaviors that the system wishes to encourage. Over time, new models must be trained as new training examples and features become available. However, feedback loops between users and models can bias future user behavior, inducing a presentation bias in the labels that compromises the ability to train new models. In this paper, we propose counterfactual augmentation, a novel causal method for correcting presentation bias using generated counterfactual labels. Our empirical evaluations demonstrate that counterfactual augmentation yields better downstream performance compared to both uncorrected models and existing bias-correction methods. Model analyses further indicate that the generated counterfactuals align closely with true counterfactuals in an oracle setting.

翻訳日:2023-11-02 02:44:28 公開日:2023-10-30

# SNEkhorn: 対称エントロピーアフィニティによる次元削減

SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities ( http://arxiv.org/abs/2305.13797v2 )

ライセンス: Link先を確認

Hugues Van Assel, Titouan Vayer, Rémi Flamary, Nicolas Courty

(参考訳) 機械学習における多くのアプローチは、データセットのサンプル間の類似性を符号化する重み付きグラフに依存している。広く使われている次元削減(DR)アルゴリズムt-SNEで特に用いられるエントロピーアフィニティ(EA)は、そのようなグラフの具体例である。不均質なサンプリング密度に対するロバスト性を確保するため、EAは各サンプルにカーネル帯域幅パラメータを割り当て、親和性行列の各行のエントロピーが特定の値で一定に保たれるようにする。この値の指数はパープレキシティとして知られる。 EAは本質的に非対称で行ごとに確率的であるが、DRアプローチでは、行ごとのエントロピー一定性と確率性の両方を損なうヒューリスティックな対称化を施した上で使用される。本研究では,最適輸送問題としてのEAの新たな特徴付けを明らかにし,双対上昇法を用いて効率的に計算できる自然な対称化を可能にする。対応する新しい親和性行列は、クラスタリング性能の点で対称な二重確率正規化の利点を享受し、また各行のエントロピーを効果的に制御することにより、ノイズレベルの変化に対して特に頑健である。次に,この新しい親和性行列を利用した新しいDRアルゴリズムSNEkhornを提案する。合成データと実世界のデータの両方について、いくつかの指標を用いて、最先端のアプローチに対する明確な優位性を示す。

Many approaches in machine learning rely on a weighted graph to encode the similarities between samples in a dataset. Entropic affinities (EAs), which are notably used in the popular Dimensionality Reduction (DR) algorithm t-SNE, are particular instances of such graphs. To ensure robustness to heterogeneous sampling densities, EAs assign a kernel bandwidth parameter to every sample in such a way that the entropy of each row in the affinity matrix is kept constant at a specific value, whose exponential is known as perplexity. EAs are inherently asymmetric and row-wise stochastic, but they are used in DR approaches after undergoing heuristic symmetrization methods that violate both the row-wise constant entropy and stochasticity properties. In this work, we uncover a novel characterization of EA as an optimal transport problem, allowing a natural symmetrization that can be computed efficiently using dual ascent. The corresponding novel affinity matrix derives advantages from symmetric doubly stochastic normalization in terms of clustering performance, while also effectively controlling the entropy of each row thus making it particularly robust to varying noise levels. Following, we present a new DR algorithm, SNEkhorn, that leverages this new affinity matrix. We show its clear superiority to state-of-the-art approaches with several indicators on both synthetic and real-world datasets.
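
The row-wise constraint described above (one bandwidth per sample, chosen so that the row's entropy equals the log of the perplexity) can be sketched with a bisection search. This illustrates the classical asymmetric entropic affinity as used in t-SNE, not the symmetric SNEkhorn construction itself. The Gaussian kernel on squared distances, the function name `entropic_affinity_row`, and the tolerance values are assumptions made for the example.

```python
import math

def entropic_affinity_row(sq_dists, perplexity, tol=1e-6, max_iter=200):
    """Return one row of affinities whose Shannon entropy is log(perplexity).

    sq_dists: squared distances from one sample to every *other* sample.
    Bisection on beta = 1 / (2 * sigma^2); the entropy decreases as beta grows.
    """
    target = math.log(perplexity)
    lo, hi = 0.0, None          # hi stays None until an upper bound is found
    beta = 1.0
    d_min = min(sq_dists)
    for _ in range(max_iter):
        # Shift by the smallest distance so the largest weight is exp(0) = 1.
        w = [math.exp(-beta * (d - d_min)) for d in sq_dists]
        z = sum(w)
        p = [x / z for x in w]
        entropy = -sum(x * math.log(x) for x in p if x > 0.0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:    # row too flat: sharpen the kernel
            lo = beta
            beta = beta * 2.0 if hi is None else (beta + hi) / 2.0
        else:                   # row too peaked: widen the kernel
            hi = beta
            beta = (lo + beta) / 2.0
    return p

row = entropic_affinity_row([0.5, 1.0, 2.0, 4.0, 8.0], perplexity=3.0)
```

Each row is solved independently, which is why the resulting matrix is row-stochastic but not symmetric; that asymmetry is the issue the abstract's optimal transport formulation is meant to address.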

翻訳日:2023-11-02 02:43:52 公開日:2023-10-30

# 知覚テスト:マルチモーダルビデオモデルの診断ベンチマーク

Perception Test: A Diagnostic Benchmark for Multimodal Video Models ( http://arxiv.org/abs/2305.13786v2 )

ライセンス: Link先を確認

Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, Larisa Markeeva, Dylan Banarse, Skanda Koppula, Joseph Heyward, Mateusz Malinowski, Yi Yang, Carl Doersch, Tatiana Matejovicova, Yury Sulsky, Antoine Miech, Alex Frechette, Hanna Klimczak, Raphael Koster, Junlin Zhang, Stephanie Winkler, Yusuf Aytar, Simon Osindero, Dima Damen, Andrew Zisserman, João Carreira

(参考訳) 本研究では,事前学習済みマルチモーダルモデル(Flamingo、SeViLA、GPT-4など)の知覚と推論の能力を評価するために,新しいマルチモーダルビデオベンチマークである知覚テスト(Perception Test)を提案する。計算タスク(例えば分類、検出、追跡)に焦点を当てた既存のベンチマークと比較すると、知覚テストは、ビデオ、音声、テキストのモダリティにまたがるスキル(記憶、抽象化、物理、意味論)と推論の種類(記述的、説明的、予測的、反事実的)に焦点を当て、包括的で効率的な評価ツールを提供する。このベンチマークは、ゼロショット/少数ショットまたは限定的なファインチューニングの設定で、事前学習済みモデルの転移能力を調べる。これらの目的のために、知覚テストは、知覚的に興味深い状況を示すよう設計され、世界中の約100人の参加者によって撮影された、平均23秒の11.6kの実世界ビデオを導入する。ビデオには6種類のラベル(多肢選択式およびグラウンディング付きのビデオ質問応答、オブジェクトトラックとポイントトラック、時間的なアクションセグメントとサウンドセグメント)が密にアノテートされており、言語と非言語の両方の評価を可能にする。ベンチマークのファインチューニング用分割と検証用分割は公開されており(CC-BYライセンス)、加えて非公開のテスト分割を備えたチャレンジサーバも提供されている。人間のベースラインの結果を最先端のビデオQAモデルと比較すると、性能に大きな差(91.4%対46.2%)があり、マルチモーダルビデオ理解に大きな改善の余地があることを示唆している。データセット、ベースラインコード、チャレンジサーバは https://github.com/deepmind/perception_test で利用可能である。

We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, SeViLA, or GPT-4). Compared to existing benchmarks that focus on computational tasks (e.g. classification, detection or tracking), the Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities, to provide a comprehensive and efficient evaluation tool. The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime. For these purposes, the Perception Test introduces 11.6k real-world videos, 23s average length, designed to show perceptually interesting situations, filmed by around 100 participants worldwide. The videos are densely annotated with six types of labels (multiple-choice and grounded video question-answers, object and point tracks, temporal action and sound segments), enabling both language and non-language evaluations. The fine-tuning and validation splits of the benchmark are publicly available (CC-BY license), in addition to a challenge server with a held-out test split. Human baseline results compared to state-of-the-art video QA models show a substantial gap in performance (91.4% vs 46.2%), suggesting that there is significant room for improvement in multimodal video understanding. Dataset, baseline code, and challenge server are available at https://github.com/deepmind/perception_test

翻訳日:2023-11-02 02:43:26 公開日:2023-10-30

# 双方向デコードのためのフレームワーク:形態的インフレクションのケーススタディ

A Framework for Bidirectional Decoding: Case Study in Morphological Inflection ( http://arxiv.org/abs/2305.12580v2 )

ライセンス: Link先を確認

Marc E. Canby and Julia Hockenmaier

(参考訳) 出力を左から右へ生成するTransformerベースのエンコーダ・デコーダモデルは、系列変換タスクの標準となっている。本稿では,シーケンスを「外側から内側へ(outside-in)」生成するデコードのためのフレームワークを提案する。各ステップにおいて,モデルは左側にトークンを生成するか,右側にトークンを生成するか,あるいは左右のシーケンスを結合するかを選択する。これは従来の双方向デコーダよりも原理的であると我々は主張する。本提案は,様々なモデルアーキテクチャをサポートし,潜在的な順序変数を周辺化する動的計画法アルゴリズムなど,いくつかのトレーニング手法を含む。提案モデルは2022年と2023年の共有タスクで最先端(SOTA)を達成し,次点のシステムを平均精度でそれぞれ4.7ポイント,2.7ポイント以上上回った。このモデルは長いシーケンスで特にうまく動作し、語幹と接辞からなる単語の分割点を暗黙的に学習でき、ユニークな見出し語(lemma)が少ない(ただし見出し語ごとの例が多い)データセットでは、ベースラインと比べてより良い性能を示す。

Transformer-based encoder-decoder models that generate outputs in a left-to-right fashion have become standard for sequence-to-sequence tasks. In this paper, we propose a framework for decoding that produces sequences from the "outside-in": at each step, the model chooses to generate a token on the left, on the right, or join the left and right sequences. We argue that this is more principled than prior bidirectional decoders. Our proposal supports a variety of model architectures and includes several training methods, such as a dynamic programming algorithm that marginalizes out the latent ordering variable. Our model sets state-of-the-art (SOTA) on the 2022 and 2023 shared tasks, beating the next best systems by over 4.7 and 2.7 points in average accuracy respectively. The model performs particularly well on long sequences, can implicitly learn the split point of words composed of stem and affix, and performs better relative to the baseline on datasets that have fewer unique lemmas (but more examples per lemma).
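
The "outside-in" generation order can be made concrete by replaying a sequence of decoder actions. The sketch below replaces the model with a fixed action list, so it only illustrates how the three choices compose into an output string. The action encoding (`("L", tok)`, `("R", tok)`, `("JOIN",)`), the assumption that right-side tokens are produced from the end of the word inward, and the function name `replay_outside_in` are all hypothetical.

```python
def replay_outside_in(actions):
    """Build a string from outside-in decoding actions.

    ("L", tok) appends tok to the growing prefix,
    ("R", tok) prepends tok to the growing suffix,
    ("JOIN",)  ends decoding by concatenating prefix and suffix.
    """
    prefix, suffix = [], []
    for action in actions:
        if action[0] == "L":
            prefix.append(action[1])
        elif action[0] == "R":
            suffix.insert(0, action[1])
        elif action[0] == "JOIN":
            return "".join(prefix + suffix)
        else:
            raise ValueError(f"unknown action: {action!r}")
    raise ValueError("action sequence ended without JOIN")

# Inflecting "walk" to "walked": the stem grows from the left, the suffix from the right.
actions = [("L", "w"), ("R", "d"), ("L", "a"), ("R", "e"),
           ("L", "l"), ("L", "k"), ("JOIN",)]
print(replay_outside_in(actions))  # walked
```

Many different action orders yield the same string, which is why the abstract mentions marginalizing over a latent ordering variable during training.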

翻訳日:2023-11-02 02:42:40 公開日:2023-10-30

# ReLUネットワークの多相最適化ダイナミクスとリッチ非線形挙動の理解

Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks ( http://arxiv.org/abs/2305.12467v4 )

ライセンス: Link先を確認

Mingze Wang, Chao Ma

(参考訳) ReLUニューラルネットワークのトレーニングプロセスはしばしば複雑な非線形現象を示す。モデルの非線形性と損失の非凸性は理論解析に重大な課題をもたらす。したがって、ニューラルネットワークの最適化ダイナミクスに関するこれまでの理論研究の多くは、局所解析(訓練の終盤など)や近似線形モデル(Neural Tangent Kernelなど)に重点を置いていた。本研究では, 線形分離可能なデータ上で勾配流(Gradient Flow)により学習した2層ReLUネットワークの学習過程を理論的に完全に特徴付ける。この特定の設定では、ランダム初期化から最終収束までの最適化過程全体を解析する。研究したモデルとデータは比較的単純であるにもかかわらず、学習過程全体に4つの異なるフェーズがあり、単純なものから複雑なものへと学習が進む一般的な傾向を示すことを明らかにした。初期凝縮、サドルからプラトーへのダイナミクス、プラトーからの脱出、活性化パターンの変化、複雑さを増していく学習など、特定の非線形挙動も理論的に正確に識別・捕捉できる。

The training process of ReLU neural networks often exhibits complicated nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose significant challenges for theoretical analysis. Therefore, most previous theoretical works on the optimization dynamics of neural networks focus either on local analysis (like the end of training) or approximate linear models (like Neural Tangent Kernel). In this work, we conduct a complete theoretical characterization of the training process of a two-layer ReLU network trained by Gradient Flow on a linearly separable data. In this specific setting, our analysis captures the whole optimization process starting from random initialization to final convergence. Despite the relatively simple model and data that we studied, we reveal four different phases from the whole training process showing a general simplifying-to-complicating learning trend. Specific nonlinear behaviors can also be precisely identified and captured theoretically, such as initial condensation, saddle-to-plateau dynamics, plateau escape, changes of activation patterns, learning with increasing complexity, etc.

翻訳日:2023-11-02 02:42:21 公開日:2023-10-30

# ReTAG: 推論を考慮した表から分析的テキストへの生成

ReTAG: Reasoning Aware Table to Analytic Text Generation ( http://arxiv.org/abs/2305.11826v2 )

ライセンス: Link先を確認

Deepanway Ghosal and Preksha Nema and Aravindan Raghuveer

(参考訳) 表要約のタスクは、表全体または表内のハイライトされた特定のセル集合を、簡潔かつ正確に表すテキストを生成することである。表からのテキスト生成技術は大きく進歩したが、モデルは依然として、表に含まれる情報を文で繰り返すだけの記述的な要約を主に生成している。一般的な表からテキストへのベンチマーク(ToTTo (Parikh et al., 2020) と InfoTabs (Gupta et al., 2020))の分析を通じて、理想的な要約を生成するには、複数の種類の推論に加えて、表の範囲を超えた知識へのアクセスが必要であることを観察する。このギャップに対処するために,ベクトル量子化を用いて様々な種類の分析的推論を出力に注入する、表と推論を考慮したモデルであるReTAGを提案する。 ReTAGは、表からのテキスト生成タスクにおいて、ToTToとInfoTabsの関連するスライスで、最先端のベースラインに対してPARENTメトリックを2.2%、2.9%改善する。人間による評価により、ReTAGの出力は、強力な表考慮モデルと比べて最大12%忠実で分析的であることがわかった。我々の知る限りでは、ReTAGは、構造を考慮したsequence-to-sequenceモデルの中で複数の推論手法を制御可能な形で使用し、複数の表からテキストへのタスクで最先端の性能を上回る最初のモデルである。我々は、ToTToとInfoTabsのデータセットを、各参照文で使用されている推論カテゴリで拡張する(そして35.6Kの分析的インスタンスと55.9kの記述的インスタンスをオープンソース化する)。

The task of table summarization involves generating text that both succinctly and accurately represents the table or a specific set of highlighted cells within a table. While significant progress has been made in table-to-text generation techniques, models still mostly generate descriptive summaries, which reiterate the information contained within the table in sentences. Through analysis of popular table-to-text benchmarks (ToTTo (Parikh et al., 2020) and InfoTabs (Gupta et al., 2020)), we observe that in order to generate the ideal summary, multiple types of reasoning are needed, coupled with access to knowledge beyond the scope of the table. To address this gap, we propose ReTAG, a table- and reasoning-aware model that uses vector quantization to infuse different types of analytical reasoning into the output. ReTAG achieves 2.2%, 2.9% improvement on the PARENT metric in the relevant slice of ToTTo and InfoTabs for the table-to-text generation task over state-of-the-art baselines. Through human evaluation, we observe that output from ReTAG is up to 12% more faithful and analytical compared to a strong table-aware model. To the best of our knowledge, ReTAG is the first model that can controllably use multiple reasoning methods within a structure-aware sequence-to-sequence model to surpass state-of-the-art performance in multiple table-to-text tasks. We extend (and open source 35.6K analytical, 55.9k descriptive instances) the ToTTo, InfoTabs datasets with the reasoning categories used in each reference sentence.

翻訳日:2023-11-02 02:41:43 公開日:2023-10-30

# ToolkenGPT: ツール埋め込みによる大量ツールによる凍結言語モデルの拡張

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings ( http://arxiv.org/abs/2305.11554v3 )

ライセンス: Link先を確認

Shibo Hao, Tianyang Liu, Zhen Wang, Zhiting Hu

(参考訳) 大規模言語モデル(LLM)を外部ツールで拡張することは、複雑な問題を解決するための有望なアプローチとして現れている。しかし、ツールのデモデータでLLMを微調整する従来の手法は、コストが高く、事前定義されたツールセットに制限される可能性がある。最近のインコンテキスト学習パラダイムはこれらの問題を緩和するが、コンテキスト長の制限により数ショットのデモしか与えられず、ツールの理解が最適でなくなる。さらに、選択できるツールが多数ある場合、インコンテキスト学習は全く機能しない可能性がある。本稿では,両者の利点を組み合わせた代替手法として$\textbf{ToolkenGPT}$を提案する。我々のアプローチは、各$\underline{tool}$をto$\underline{ken}$ ($\textit{toolken}$)として表現し、その埋め込みを学習することで、通常の単語トークンを生成するのと同じ方法でツール呼び出しを可能にする。 toolkenがトリガーされると、LLMはツールを実行するための引数を補完するよう促される。 ToolkenGPTは、toolkenの集合をオンザフライで拡張することで、任意の数のツールをプラグインできる柔軟性を提供する。さらに、toolken埋め込みの学習に大量のデモデータを利用できるようにすることで、ツール使用を改善する。数値推論,知識に基づく質問応答,具体化された計画生成など,多様な領域において,我々のアプローチはLLMをツールで効果的に拡張し,様々な最新のベースラインを大幅に上回っている。 ToolkenGPTは、複雑なシナリオにおいて、大規模なツールセットから関連するツールを使用する有望な能力を示す。

Augmenting large language models (LLMs) with external tools has emerged as a promising approach to solving complex problems. However, traditional methods, which finetune LLMs with tool demonstration data, can be both costly and restricted to a predefined set of tools. Recent in-context learning paradigm alleviates these issues, but the limited context length only allows for a few shots of demonstrations, leading to suboptimal understandings of the tools. Moreover, when there are numerous tools to choose from, in-context learning could completely fail to work. In this paper, we propose an alternative approach, $\textbf{ToolkenGPT}$, which combines the benefits of both sides. Our approach represents each $\underline{tool}$ as a to$\underline{ken}$ ($\textit{toolken}$) and learns an embedding for it, enabling tool calls in the same way as generating a regular word token. Once a toolken is triggered, the LLM is prompted to complete arguments for the tool to execute. ToolkenGPT offers the flexibility to plug in an arbitrary number of tools by expanding the set of toolkens on the fly. In addition, it improves tool use by allowing extensive demonstration data for learning the toolken embeddings. In diverse domains, including numerical reasoning, knowledge-based question answering, and embodied plan generation, our approach effectively augments LLMs with tools and substantially outperforms various latest baselines. ToolkenGPT demonstrates the promising ability to use relevant tools from a large tool set in complex scenarios.
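
The idea of treating each tool as an extra vocabulary entry can be sketched as a joint output head: the frozen word embeddings and the learned toolken embeddings are scored against the same hidden state, and the best-scoring entry decides whether a word is emitted or a tool call begins. This is a toy illustration with hand-picked two-dimensional vectors, not the paper's code. `next_token`, `word_head`, `toolken_head`, and the example tools are hypothetical names.

```python
def next_token(hidden, word_head, toolken_head, vocab, tools):
    """Pick the next token from the joint word + toolken vocabulary.

    word_head: frozen output embeddings of the language model (one row per word).
    toolken_head: the only trainable parameters (one row per tool).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Score words and toolkens with the same hidden state.
    logits = [dot(row, hidden) for row in word_head + toolken_head]
    best = max(range(len(logits)), key=logits.__getitem__)
    if best < len(vocab):
        return ("word", vocab[best])
    # A toolken won: the caller would now prompt the LLM for the tool's arguments.
    return ("tool", tools[best - len(vocab)])

vocab = ["the", "sum", "is"]
word_head = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
tools = ["add"]
toolken_head = [[-1.0, 2.0]]

print(next_token([1.0, 0.2], word_head, toolken_head, vocab, tools))  # ('word', 'the')
print(next_token([0.0, 1.0], word_head, toolken_head, vocab, tools))  # ('tool', 'add')
```

Adding a tool only appends one row to `toolken_head` and one name to `tools`; the frozen `word_head` is untouched, which mirrors the on-the-fly extensibility the abstract describes.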

翻訳日:2023-11-02 02:41:15 公開日:2023-10-30

# TextDiffuser: テキストペインターとしての拡散モデル

TextDiffuser: Diffusion Models as Text Painters ( http://arxiv.org/abs/2305.10855v5 )

ライセンス: Link先を確認

Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei

(参考訳) 拡散モデルは印象的な生成能力で注目を集めているが、現在は正確で一貫性のあるテキストのレンダリングに苦戦している。この問題に対処するために,背景と調和した視覚的に魅力的なテキストを含む画像の生成に焦点を当てたTextDiffuserを導入する。 TextDiffuserは2段階からなる。まず、Transformerモデルがテキストプロンプトから抽出されたキーワードのレイアウトを生成し、次に拡散モデルがテキストプロンプトと生成されたレイアウトを条件として画像を生成する。さらに,テキストの認識,検出,文字レベルのセグメンテーションのアノテーションを持つ1000万の画像・テキストペアを含む,OCRアノテーション付きの最初の大規模テキスト画像データセットであるMARIO-10Mを提供する。我々はさらに、テキストのレンダリング品質を評価する包括的なツールとしてMARIO-Evalベンチマークを収集する。実験とユーザスタディにより,TextDiffuserは,テキストプロンプトのみ,あるいはテキストテンプレート画像との併用で高品質なテキスト画像を作成でき,テキストを含む不完全な画像を再構成するテキストインペインティングも行える,柔軟で制御可能な手法であることを示す。コード、モデル、データセットは https://aka.ms/textdiffuser で公開される予定である。

Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds. TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from text prompts, and then diffusion models generate images conditioned on the text prompt and the generated layout. Additionally, we contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs with text recognition, detection, and character-level segmentation annotations. We further collect the MARIO-Eval benchmark to serve as a comprehensive tool for evaluating text rendering quality. Through experiments and user studies, we show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text. The code, model, and dataset will be available at https://aka.ms/textdiffuser.

翻訳日:2023-11-02 02:40:45 公開日:2023-10-30

# 量子測定の結果に依存する金融請求権の評価

Valuation of a Financial Claim Contingent on the Outcome of a Quantum Measurement ( http://arxiv.org/abs/2305.10239v2 )

ライセンス: Link先を確認

Lane P. Hughston and Leandro Sánchez-Betancourt

(参考訳) 時刻$0$に金融契約を結ぶ合理的なエージェントを考える。その支払いは、ある時刻$T>0$における量子測定によって決定される。量子系の状態は、既知の密度行列 $\hat p$ によってハイゼンベルク表現で与えられる。エージェントは、そのような契約を結ぶために、時刻$0$でいくらまで支払う意思があるだろうか? 有限次元ヒルベルト空間の場合、そのような請求権はそれぞれ観測可能量 $\hat X_T$ で表現され、$\hat X_T$ の固有値は、対応する結果が測定で得られた場合に支払われる金額を決定する。妥当な公理の下で、零空間に関して物理的状態 $\hat p$ と同値な価格付け状態 $\hat q$ が存在し、任意の請求権 $\hat X_T$ に対して価格付け関数 $\Pi_{0T}$ が $\Pi_{0T}(\hat X_T) = P_{0T}\,{\rm tr} ( \hat q \hat X_T) $ の形をとることを証明する。ここで $P_{0T}$ は1期間の割引係数である。「同値」とは、$\hat p$ と $\hat q$ が同じ零空間を共有することを意味する。すなわち、任意の $|\xi \rangle \in \mathcal H$ に対して、$\langle \bar \xi | \hat p | \xi \rangle = 0$ であることと、$\langle \bar \xi | \hat q | \xi \rangle = 0$ であることは同値である。最適化問題のクラスを導入し,所与の測定に基づく請求権に対する最適な契約支払構造を求める。次に,そのような設定におけるコッヘン・シュペッカーの定理の含意を考察し,そのような契約のポートフォリオを構成する問題を検討する。最後に,複数期間の契約について考察する。

We consider a rational agent who at time $0$ enters into a financial contract for which the payout is determined by a quantum measurement at some time $T>0$. The state of the quantum system is given in the Heisenberg representation by a known density matrix $\hat p$. How much will the agent be willing to pay at time $0$ to enter into such a contract? In the case of a finite dimensional Hilbert space, each such claim is represented by an observable $\hat X_T$ where the eigenvalues of $\hat X_T$ determine the amount paid if the corresponding outcome is obtained in the measurement. We prove, under reasonable axioms, that there exists a pricing state $\hat q$ which is equivalent to the physical state $\hat p$ on null spaces such that the pricing function $\Pi_{0T}$ takes the form $\Pi_{0T}(\hat X_T) = P_{0T}\,{\rm tr} ( \hat q \hat X_T) $ for any claim $\hat X_T$, where $P_{0T}$ is the one-period discount factor. By "equivalent" we mean that $\hat p$ and $\hat q$ share the same null space: thus, for any $|\xi \rangle \in \mathcal H$ one has $\langle \bar \xi | \hat p | \xi \rangle = 0$ if and only if $\langle \bar \xi | \hat q | \xi \rangle = 0$. We introduce a class of optimization problems and solve for the optimal contract payout structure for a claim based on a given measurement. Then we consider the implications of the Kochen-Specker theorem in such a setting and we look at the problem of forming portfolios of such contracts. Finally, we consider multi-period contracts.
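As a small numerical illustration of the pricing formula in this abstract, the sketch below evaluates $\Pi_{0T}(\hat X_T) = P_{0T}\,{\rm tr}(\hat q \hat X_T)$ for a three-outcome system. The matrices, the discount factor 0.95, and all helper names are assumptions chosen for illustration and are not taken from the paper. The null-space check is deliberately limited to diagonal states.

```python
def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))


def diag(values):
    """Build a diagonal matrix from a list of values."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def price(discount, q, x):
    """Pricing rule from the abstract: discount * tr(q x)."""
    return discount * trace(matmul(q, x))


def same_null_space_diagonal(p, q, tol=1e-12):
    """Equivalence check, valid only for diagonal states (a simplification):
    p and q must vanish on exactly the same basis vectors."""
    return all((abs(p[i][i]) < tol) == (abs(q[i][i]) < tol)
               for i in range(len(p)))


# Hypothetical example data (not from the paper).
p_hat = diag([0.5, 0.5, 0.0])      # physical state
q_hat = diag([0.3, 0.7, 0.0])      # pricing state with the same null space
x_hat = diag([10.0, 0.0, 100.0])   # payout for each measurement outcome
discount = 0.95                    # assumed one-period discount factor

if __name__ == "__main__":
    print("equivalent on null spaces:", same_null_space_diagonal(p_hat, q_hat))
    print("price at time 0:", price(discount, q_hat, x_hat))
    print("physical expected payout:", trace(matmul(p_hat, x_hat)))
```

In this toy setup the third outcome pays 100 but has zero weight under both states, so it contributes nothing to the price. The price also differs from the discounted physical expectation because $\hat q$ and $\hat p$ weight the first two outcomes differently.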

翻訳日:2023-11-02 02:40:22 公開日:2023-10-30

# 自然言語におけるグラフ問題の解ける言語モデル

Can Language Models Solve Graph Problems in Natural Language? ( http://arxiv.org/abs/2305.10037v2 )

ライセンス: Link先を確認

Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, Yulia Tsvetkov

(参考訳) 大規模言語モデル(LLM)は、ロボット工学の計画、マルチホップ質問応答や知識探索、構造化コモンセンス推論など、暗黙のグラフィカルな構造を持つ様々なタスクに採用されている。 LLMは、これらのタスクの最先端を構造的含意で進めてきたが、LLMがグラフや構造のテキスト記述を明示的に処理し、それらを接地された概念空間にマッピングし、構造化された操作を行うことができるかどうかはまだ未定である。この目的のために,自然言語で設計したグラフ型問題解決の総合ベンチマークであるnlgraph(natural language graph)を提案する。 NLGraphには29,370の問題が含まれており、接続や最短経路といった単純なタスクから、最大フローやグラフニューラルネットワークのシミュレーションといった複雑な問題まで、複雑な8つのグラフ推論タスクをカバーする。 llms (gpt-3/4) をnlgraphベンチマーク上で様々なプロンプトアプローチで評価し,それを見出す。 1)言語モデルは予備的グラフ推論能力を示す。 2)高度なプロンプトとインコンテキスト学習の利点は,より複雑なグラフ問題において減少する。 3) LLMは, グラフや問題設定の急激な相関に直面すると, 当然脆弱である。次に,自然言語グラフ問題を解決するための2つの命令に基づく手法である build-a-graph prompting と algorithmic prompting を提案する。ビルド・ア・グラフとアルゴリズムは、複数のタスクや設定において、NLGraph上のLLMのパフォーマンスを3.07%から16.85%向上させる一方で、言語モデルを用いたセットアップにおいて最も複雑なグラフ推論タスクをどう解決するかは、オープンな研究課題である。 NLGraphベンチマークと評価コードはhttps://github.com/Arthur-Heng/NLGraphで公開されている。

Large language models (LLMs) are increasingly adopted for a variety of tasks with implicit graphical structures, such as planning in robotics, multi-hop question answering or knowledge probing, structured commonsense reasoning, and more. While LLMs have advanced the state-of-the-art on these tasks with structure implications, whether LLMs could explicitly process textual descriptions of graphs and structures, map them to grounded conceptual spaces, and perform structured operations remains underexplored. To this end, we propose NLGraph (Natural Language Graph), a comprehensive benchmark of graph-based problem solving designed in natural language. NLGraph contains 29,370 problems, covering eight graph reasoning tasks with varying complexity from simple tasks such as connectivity and shortest path up to complex problems such as maximum flow and simulating graph neural networks. We evaluate LLMs (GPT-3/4) with various prompting approaches on the NLGraph benchmark and find that 1) language models do demonstrate preliminary graph reasoning abilities, 2) the benefit of advanced prompting and in-context learning diminishes on more complex graph problems, while 3) LLMs are also (un)surprisingly brittle in the face of spurious correlations in graph and problem settings. We then propose Build-a-Graph Prompting and Algorithmic Prompting, two instruction-based approaches to enhance LLMs in solving natural language graph problems. Build-a-Graph and Algorithmic prompting improve the performance of LLMs on NLGraph by 3.07% to 16.85% across multiple tasks and settings, while how to solve the most complicated graph reasoning tasks in our setup with language models remains an open research question. The NLGraph benchmark and evaluation code are available at https://github.com/Arthur-Heng/NLGraph.
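To make the benchmark setting concrete, here is a minimal sketch of how a connectivity problem could be rendered in natural language and paired with a ground-truth answer. The prompt wording, the function names, and the Build-a-Graph instruction text are illustrative assumptions and are not copied from the NLGraph repository. Only the general idea (describe the graph in text, optionally ask the model to construct the graph first, check against an algorithmic answer) follows the abstract.

```python
from collections import deque


def describe_graph(num_nodes, edges):
    """Render an undirected graph as a natural-language description."""
    edge_text = ", ".join(f"({u},{v})" for u, v in edges)
    return (f"In an undirected graph with nodes numbered 0 to {num_nodes - 1}, "
            f"(i,j) means that node i and node j are connected by an edge. "
            f"The edges are: {edge_text}.")


def connectivity_prompt(num_nodes, edges, src, dst, build_a_graph=False):
    """Compose a connectivity question; optionally prepend a hypothetical
    Build-a-Graph style instruction before the answer is requested."""
    parts = [describe_graph(num_nodes, edges),
             f"Is there a path between node {src} and node {dst}?"]
    if build_a_graph:
        parts.append("Let's construct a graph with the nodes and edges first.")
    return " ".join(parts)


def is_connected(num_nodes, edges, src, dst):
    """Ground truth via breadth-first search."""
    adjacency = {v: [] for v in range(num_nodes)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (3, 4)]
    print(connectivity_prompt(5, edges, 0, 4, build_a_graph=True))
    print("ground truth:", is_connected(5, edges, 0, 4))
```

A model's free-text answer would then be compared with `is_connected`. How answers are parsed and scored in the actual benchmark is not specified in the abstract.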

翻訳日:2023-11-02 02:39:21 公開日:2023-10-30

# 境界幾何学的罰則を保証するリーマン最小最適化の高速化法

Accelerated Methods for Riemannian Min-Max Optimization Ensuring Bounded Geometric Penalties ( http://arxiv.org/abs/2305.16186v2 )

ライセンス: Link先を確認

David Martínez-Rubio, Christophe Roux, Christopher Criscitiello, Sebastian Pokutta

(参考訳) 本研究では、$\min_x \max_y f(x, y)$ という形式の最適化問題を扱う。ここで $f(x, y)$ は積リーマン多様体 $\mathcal{M} \times \mathcal{N}$ 上で定義され、$\mu_x, \mu_y \geq 0$ に対して、$x$ について $\mu_x$-強測地的凸(g-convex)、$y$ について $\mu_y$-強 g-concave である。$f$ が $(L_x, L_y, L_{xy})$-smooth で、$\mathcal{M}$, $\mathcal{N}$ がアダマール多様体である場合の加速法を設計する。そのために、それ自体も興味深い新たな g-convex 最適化の結果を導入する。具体的には、距離射影付きリーマン勾配降下法の大域的線形収束を示し、幾何学的定数を小さくすることで既存の加速法を改善する。さらに、反復点があらかじめ指定されたコンパクト集合に留まるという仮定を取り除くことで、リーマン min-max の場合に適用される2つの先行研究の解析を完成させる。

In this work, we study optimization problems of the form $\min_x \max_y f(x, y)$, where $f(x, y)$ is defined on a product Riemannian manifold $\mathcal{M} \times \mathcal{N}$ and is $\mu_x$-strongly geodesically convex (g-convex) in $x$ and $\mu_y$-strongly g-concave in $y$, for $\mu_x, \mu_y \geq 0$. We design accelerated methods when $f$ is $(L_x, L_y, L_{xy})$-smooth and $\mathcal{M}$, $\mathcal{N}$ are Hadamard. To that aim we introduce new g-convex optimization results, of independent interest: we show global linear convergence for metric-projected Riemannian gradient descent and improve existing accelerated methods by reducing geometric constants. Additionally, we complete the analysis of two previous works applying to the Riemannian min-max case by removing an assumption about iterates staying in a pre-specified compact set.

翻訳日:2023-11-02 02:31:55 公開日:2023-10-30

# 競合間の戦略的データ共有

Strategic Data Sharing between Competitors ( http://arxiv.org/abs/2305.16052v3 )

ライセンス: Link先を確認

Nikita Tsoy and Nikola Konstantinov

(参考訳) 協調学習技術は近年大きく進歩し、複数の組織にまたがってプライベートモデルトレーニングを可能にしている。この機会にもかかわらず、競合他社とのデータ共有を考えると、企業はジレンマに直面する。コラボレーションは企業の機械学習モデルを改善することができるが、競合他社に利益をもたらし、利益を減少させる可能性がある。本稿では,このデータ共有トレードオフを分析するための汎用フレームワークを提案する。フレームワークは3つのコンポーネントで構成されており、それぞれ、企業の生産決定、モデル品質に対する追加データの影響、データ共有交渉プロセスである。次に,従来の経済理論に基づく市場モデルに基づく枠組みのインスタンス化を行い,協調的インセンティブに影響を与える重要な要因を明らかにする。その結果,市場条件がデータ共有インセンティブに与える影響が示唆された。特に、企業の製品間の類似性や、難しい学習タスクがコラボレーションを促進するという点で、競争が減少していることが分かりました。

Collaborative learning techniques have significantly advanced in recent years, enabling private model training across multiple organizations. Despite this opportunity, firms face a dilemma when considering data sharing with competitors -- while collaboration can improve a company's machine learning model, it may also benefit competitors and hence reduce profits. In this work, we introduce a general framework for analyzing this data-sharing trade-off. The framework consists of three components, representing the firms' production decisions, the effect of additional data on model quality, and the data-sharing negotiation process, respectively. We then study an instantiation of the framework, based on a conventional market model from economic theory, to identify key factors that affect collaboration incentives. Our findings indicate a profound impact of market conditions on the data-sharing incentives. In particular, we find that reduced competition, in terms of the similarities between the firms' products, and harder learning tasks foster collaboration.

翻訳日:2023-11-02 02:31:22 公開日:2023-10-30

# Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models

Debias Coarsely, Sample Conditionally: Statistical Downscaling through Optimal Transport and Probabilistic Diffusion Models ( http://arxiv.org/abs/2305.15618v2 )

ライセンス: Link先を確認

Zhong Yi Wan, Ricardo Baptista, Yi-fan Chen, John Anderson, Anudhyan Boral, Fei Sha, Leonardo Zepeda-Núñez

(参考訳) 非ペアデータを用いた統計的ダウンスケーリングのための2段階確率的フレームワークを提案する。統計的ダウンスケーリングは、バイアスのある粗視化された数値スキームによる低解像度データを、高忠実度スキームと整合する高解像度データに変換する確率写像を求める。私たちのフレームワークは、2つの変換を合成することでこの問題に取り組む。 (i) 最適輸送写像によるバイアス除去ステップ、および (ii) 事後的な条件付きサンプリングを用いた確率拡散モデルによるアップサンプリングステップである。このアプローチは、ペアデータを必要とせずに条件付き分布を特徴付け、バイアスのあるサンプルから関連する物理統計量を忠実に復元する。本研究では、気象・気候の数値シミュレーションにおける中核的な困難を代表する1次元および2次元の流体流問題に対して、提案手法の有用性を実証する。提案手法は、解像度を8倍および16倍にアップサンプリングすることで、低解像度入力から現実的な高解像度出力を生成する。さらに、入力と出力の低周波成分が一致しない場合でも、物理量の統計を正しく一致させる。低周波成分の一致は、現在の最先端の代替手法が必要とする、重要だが満たすのが難しい仮定である。本研究のコードは https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/probabilistic_diffusion で公開されている。

We introduce a two-stage probabilistic framework for statistical downscaling using unpaired data. Statistical downscaling seeks a probabilistic map to transform low-resolution data from a biased coarse-grained numerical scheme to high-resolution data that is consistent with a high-fidelity scheme. Our framework tackles the problem by composing two transformations: (i) a debiasing step via an optimal transport map, and (ii) an upsampling step achieved by a probabilistic diffusion model with a posteriori conditional sampling. This approach characterizes a conditional distribution without needing paired data, and faithfully recovers relevant physical statistics from biased samples. We demonstrate the utility of the proposed approach on one- and two-dimensional fluid flow problems, which are representative of the core difficulties present in numerical simulations of weather and climate. Our method produces realistic high-resolution outputs from low-resolution inputs, by upsampling resolutions of 8x and 16x. Moreover, our procedure correctly matches the statistics of physical quantities, even when the low-frequency content of the inputs and outputs do not match, a crucial but difficult-to-satisfy assumption needed by current state-of-the-art alternatives. Code for this work is available at: https://github.com/google-research/swirl-dynamics/tree/main/swirl_dynamics/projects/probabilistic_diffusion.
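The debiasing step (i) can be illustrated in the simplest possible setting. For scalar samples with equal sample counts and a convex transport cost, the optimal transport map between two empirical distributions is the monotone rearrangement, which amounts to matching sorted values. The sketch below is only that one-dimensional special case with unpaired samples. It is not the paper's method, which learns a transport map on coarse fields, and the function name and sample values are hypothetical.

```python
def ot_map_1d(biased, reference):
    """Map each biased sample to the reference sample of the same rank.

    For equal-size 1D empirical distributions and a convex cost such as
    squared distance, this rank matching is an optimal transport plan.
    The two sample lists do not need to be paired.
    """
    if len(biased) != len(reference):
        raise ValueError("this sketch assumes equal sample counts")
    order = sorted(range(len(biased)), key=lambda i: biased[i])
    reference_sorted = sorted(reference)
    debiased = [0.0] * len(biased)
    for rank, index in enumerate(order):
        debiased[index] = reference_sorted[rank]
    return debiased


if __name__ == "__main__":
    coarse_biased = [3.0, 1.0, 2.0]        # hypothetical biased coarse samples
    coarse_reference = [10.0, 30.0, 20.0]  # hypothetical unbiased coarse samples
    print(ot_map_1d(coarse_biased, coarse_reference))
```

After the mapping, the debiased samples have exactly the reference distribution while keeping the rank order of the biased inputs. The conditional upsampling step (ii) would then operate on these debiased values and is not sketched here.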

翻訳日:2023-11-02 02:31:07 公開日:2023-10-30

# Momentumがエラーフィードバックを改善!

Momentum Provably Improves Error Feedback! ( http://arxiv.org/abs/2305.15155v2 )

ライセンス: Link先を確認

Ilyas Fatkhullin, Alexander Tyurin, Peter Richtárik

(参考訳) 分散環境で機械学習モデルをトレーニングする際の通信オーバーヘッドが高いため、現代のアルゴリズムは損失のある通信圧縮に依存している。しかし、未処理の場合、圧縮による誤差が伝播し、指数的発散を含む非常に不安定な挙動を引き起こす可能性がある。約10年前、Seide氏らは、この問題を緩和するための非常に効果的なヒューリスティックとして、EF14と呼ばれるエラーフィードバック(EF)機構を提案した。しかし、過去10年間のEF分野の着実にアルゴリズムと理論的進歩にもかかわらず、我々の理解は完璧には程遠い。この作業では、最も差し迫った問題のひとつに対処します。特に、標準的な非凸設定では、EFのすべての既知の変種は収束するために非常に大きなバッチサイズに依存しており、実際には禁止される。我々は、この問題を理論的にも現実的にも取り除く驚くほど単純な修正を提案する: Richt\'{a}rik et al による EF の最新の化へのPolyak の運動量の適用。【2021年】ef21として知られる。 EF21-SGDMと命名したこのアルゴリズムは,従来の誤りフィードバックアルゴリズムの標準滑らか性および有界分散仮定に基づく通信とサンプルの複雑さを改善し,有界勾配の相似性などのより強い仮定を必要としない。さらに, 複雑度をさらに向上させるダブルモーメント方式を提案する。本手法から圧縮を除去した場合でも,本手法は新規であり,ポリアックの運動量に富む非凸確率最適化の研究には独立した手法である。

Due to the high communication overhead when training machine learning models in a distributed environment, modern algorithms invariably rely on lossy communication compression. However, when untreated, the errors caused by compression propagate, and can lead to severely unstable behavior, including exponential divergence. Almost a decade ago, Seide et al. [2014] proposed an error feedback (EF) mechanism, which we refer to as EF14, as an immensely effective heuristic for mitigating this issue. However, despite steady algorithmic and theoretical advances in the EF field in the last decade, our understanding is far from complete. In this work we address one of the most pressing issues. In particular, in the canonical nonconvex setting, all known variants of EF rely on very large batch sizes to converge, which can be prohibitive in practice. We propose a surprisingly simple fix which removes this issue both theoretically and in practice: the application of Polyak's momentum to the latest incarnation of EF due to Richtárik et al. [2021] known as EF21. Our algorithm, for which we coin the name EF21-SGDM, improves the communication and sample complexities of previous error feedback algorithms under standard smoothness and bounded variance assumptions, and does not require any further strong assumptions such as bounded gradient dissimilarity. Moreover, we propose a double momentum version of our method that improves the complexities even further. Our proof seems to be novel even when compression is removed from the method, and as such, our proof technique is of independent interest in the study of nonconvex stochastic optimization enriched with Polyak's momentum.
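The idea of combining Polyak-style momentum with EF21 can be sketched as a single-process simulation. The abstract does not spell out the update rule, so everything specific here is an assumption that should be checked against the paper: the exact form of the momentum and compression steps, the initialization, the Top-k compressor, and the toy quadratic objective. Each simulated worker keeps a momentum estimate `v[i]` and a compressed estimate `g[i]`, and only the compressed difference `top_k(v[i] - g[i])` would be communicated.

```python
import random


def top_k(vec, k):
    """Keep the k largest-magnitude entries and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for j in keep:
        out[j] = vec[j]
    return out


def ef21_sgdm(targets, steps=3000, lr=0.01, eta=0.5, k=1, sigma=0.0, seed=0):
    """Simulate momentum plus EF21-style error feedback on a toy problem.

    Worker i holds f_i(x) = 0.5 * ||x - targets[i]||^2 (an assumed objective),
    so the minimizer of the average loss is the mean of the targets.
    sigma adds Gaussian noise to the gradients to mimic stochasticity.
    """
    rng = random.Random(seed)
    n, d = len(targets), len(targets[0])

    def stochastic_grad(i, x):
        return [x[j] - targets[i][j] + sigma * rng.gauss(0.0, 1.0)
                for j in range(d)]

    x = [0.0] * d
    # Assumed initialization: momentum starts at the first gradient and the
    # compressed estimate starts at its compressed version.
    v = [stochastic_grad(i, x) for i in range(n)]
    g = [top_k(v[i], k) for i in range(n)]

    for _ in range(steps):
        # Server step: move along the average of the compressed estimates.
        g_avg = [sum(g[i][j] for i in range(n)) / n for j in range(d)]
        x = [x[j] - lr * g_avg[j] for j in range(d)]
        for i in range(n):
            grad = stochastic_grad(i, x)
            # Momentum: exponential moving average of stochastic gradients.
            v[i] = [(1.0 - eta) * v[i][j] + eta * grad[j] for j in range(d)]
            # Error feedback: compress only the gap between v[i] and g[i].
            delta = top_k([v[i][j] - g[i][j] for j in range(d)], k)
            g[i] = [g[i][j] + delta[j] for j in range(d)]
    return x


if __name__ == "__main__":
    targets = [[1.0, -2.0, 3.0, 0.5], [3.0, 0.0, -1.0, 1.5]]
    print(ef21_sgdm(targets))             # noise-free run
    print(ef21_sgdm(targets, sigma=0.1))  # noisy gradients, batch size 1
```

Setting `eta=1.0` removes the momentum and reduces the loop to a plain EF21-style update, which makes it easy to compare the two variants on the same toy problem.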

翻訳日:2023-11-02 02:29:56 公開日:2023-10-30

# 表データによる深部異常検出のための個別入力

Beyond Individual Input for Deep Anomaly Detection on Tabular Data ( http://arxiv.org/abs/2305.15121v5 )

ライセンス: Link先を確認

Hugo Thimonier, Fabrice Popineau, Arpad Rimmel and Bich-Liên Doan

(参考訳) 異常検出は金融、医療、サイバーセキュリティなど多くの分野において不可欠である。本稿では,教師付きタスクのために最初に提案された非パラメトリックトランスフォーマ(npts)を利用して,特徴量とサンプル値の両方の依存関係をキャプチャする,新しい深層異常検出法を提案する。再構成に基づくフレームワークでは,NPTをトレーニングし,通常のサンプルのマスキング特徴を再構築する。非パラメトリックな方法では、推論中にトレーニングセット全体を活用し、マスクした特徴を再構成して異常スコアを生成するモデルの能力を利用する。私たちの知る限りでは、グラフデータセット上の異常検出のために、機能機能とサンプルサンプルの依存関係をうまく組み合わせる最初の試みである。本手法は,31個のベンチマーク表型データセットを用いた広範囲な実験により,f1-score と auroc の2.4%,1.2% の既存手法を上回り,最先端の性能を実現することを実証した。本研究は,両依存のモデル化が表データにおける異常検出に重要であることを示す。

Anomaly detection is vital in many domains, such as finance, healthcare, and cybersecurity. In this paper, we propose a novel deep anomaly detection method for tabular data that leverages Non-Parametric Transformers (NPTs), a model initially proposed for supervised tasks, to capture both feature-feature and sample-sample dependencies. In a reconstruction-based framework, we train the NPT to reconstruct masked features of normal samples. In a non-parametric fashion, we leverage the whole training set during inference and use the model's ability to reconstruct the masked features to generate an anomaly score. To the best of our knowledge, this is the first work to successfully combine feature-feature and sample-sample dependencies for anomaly detection on tabular datasets. Through extensive experiments on 31 benchmark tabular datasets, we demonstrate that our method achieves state-of-the-art performance, outperforming existing methods by 2.4% and 1.2% in terms of F1-score and AUROC, respectively. Our ablation study provides evidence that modeling both types of dependencies is crucial for anomaly detection on tabular data.

翻訳日:2023-11-02 02:29:29 公開日:2023-10-30

# OPC UAを用いた強化学習の活用に関するミニレビュー

A Mini Review on the utilization of Reinforcement Learning with OPC UA ( http://arxiv.org/abs/2305.15113v2 )

ライセンス: Link先を確認

Simon Schindler, Martin Uray, Stefan Huber

(参考訳) 強化学習(Reinforcement Learning, RL)は、ロボット工学、自然言語処理、ゲームプレイといった様々な分野に適用された強力な機械学習パラダイムである。シーケンシャルな意思決定問題を解決するために、設計は経験から学び、動的環境の変化に適応できる。これらの能力により、産業における複雑なプロセスの制御と最適化の第一候補となる。この可能性を完全に活用する鍵は、既存の産業システムへのRLのシームレスな統合である。産業用通信標準であるOpen Platform Communications UnifiedArchitecture (OPC UA)はこのギャップを埋める可能性がある。しかし、RLとOPC UAは異なる分野のものであるため、研究者は2つの技術間のギャップを埋める必要がある。この研究は、このギャップを埋めるために、両方の技術の技術的な概要を簡潔に提供し、RLとOPC UAをどのように組み合わせて適用するかについての洞察を得るために、半発掘的な文献レビューを実施している。この調査では、RLとOPC UAの交差に続き、3つの主要な研究トピックが特定されている。文献レビューの結果は、RLは産業プロセスの制御と最適化のための有望な技術であるが、現実のシナリオに適度に少ない労力で展開するために必要な標準化されたインターフェースを持っていないことを示している。

Reinforcement Learning (RL) is a powerful machine learning paradigm that has been applied in various fields such as robotics, natural language processing and game playing, achieving state-of-the-art results. Targeted at solving sequential decision-making problems, it is by design able to learn from experience and therefore adapt to changing dynamic environments. These capabilities make it a prime candidate for controlling and optimizing complex processes in industry. The key to fully exploiting this potential is the seamless integration of RL into existing industrial systems. The industrial communication standard Open Platform Communications Unified Architecture (OPC UA) could bridge this gap. However, since RL and OPC UA are from different fields, there is a need for researchers to bridge the gap between the two technologies. This work serves to bridge this gap by providing a brief technical overview of both technologies and carrying out a semi-exhaustive literature review to gain insights on how RL and OPC UA are applied in combination. With this survey, three main research topics at the intersection of RL and OPC UA have been identified. The results of the literature review show that RL is a promising technology for the control and optimization of industrial processes, but does not yet have the necessary standardized interfaces to be deployed in real-world scenarios with reasonably low effort.

翻訳日:2023-11-02 02:29:09 公開日:2023-10-30

# 実世界情報検索シナリオにおけるLLMのテーブル・ツー・テキスト生成能力の検討

Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios ( http://arxiv.org/abs/2305.14987v2 )

ライセンス: Link先を確認

Yilun Zhao, Haowei Zhang, Shengyun Si, Linyong Nan, Xiangru Tang, Arman Cohan

(参考訳) タブラルデータは様々な産業で広く使われており、ユーザが情報検索の目的を理解し、操作するのにかなりの時間と労力を要する。大規模言語モデル(LLM)の進歩は、ユーザ効率を向上させる大きな可能性を示している。しかし、テーブル情報探索のための実世界の応用におけるLLMの採用は、いまだに未定である。本稿では,2つの実世界情報探索シナリオ内の4つのデータセットを用いて,異なるLLMのテーブル・トゥ・テキスト機能について検討する。 LogicNLGや、新たに構築したデータインサイト生成用のLoTNLGデータセット、FeTaQAやクエリベースの生成用のF2WTQデータセットなどです。 3つの研究課題について調査を行い,テーブル・ツー・テキスト生成,自動評価,フィードバック生成におけるllmの性能評価を行った。実験結果から,現在の高性能LCM(特にGPT-4)は,実世界のシナリオにおいて,ユーザの情報検索を目的としたテーブル・ツー・テキスト・ジェネレータ,評価器,フィードバック・ジェネレータとして効果的に機能することが示唆された。しかし、他のオープンソース LLM (Tulu と LLaMA-2) と GPT-4 の間には大きな性能差がある。私たちのデータとコードはhttps://github.com/yale-nlp/LLM-T2Tで公開されています。

Tabular data is prevalent across various industries, necessitating significant time and effort for users to understand and manipulate for their information-seeking purposes. The advancements in large language models (LLMs) have shown enormous potential to improve user efficiency. However, the adoption of LLMs in real-world applications for table information seeking remains underexplored. In this paper, we investigate the table-to-text capabilities of different LLMs using four datasets within two real-world information seeking scenarios. These include the LogicNLG and our newly-constructed LoTNLG datasets for data insight generation, along with the FeTaQA and our newly-constructed F2WTQ datasets for query-based generation. We structure our investigation around three research questions, evaluating the performance of LLMs in table-to-text generation, automated evaluation, and feedback generation, respectively. Experimental results indicate that the current high-performing LLM, specifically GPT-4, can effectively serve as a table-to-text generator, evaluator, and feedback generator, facilitating users' information seeking purposes in real-world scenarios. However, a significant performance gap still exists between other open-sourced LLMs (e.g., Tulu and LLaMA-2) and GPT-4 models. Our data and code are publicly available at https://github.com/yale-nlp/LLM-T2T.

翻訳日:2023-11-02 02:28:49 公開日:2023-10-30

# コントラスト視覚言語モデルにおけるテキストエンコーダのボトルネック構成性

Text encoders bottleneck compositionality in contrastive vision-language models ( http://arxiv.org/abs/2305.14897v2 )

ライセンス: Link先を確認

Amita Kamath, Jack Hessel, Kai-Wei Chang

(参考訳) CLIPのような高性能視覚言語(VL)モデルは、単一のベクトルを使ってキャプションを表現する。このボトルネックで、言語に関する情報はどの程度失われていますか? 最初にCompPromptsをキュレートします。これは、VLモデルがキャプチャできるべき構成的なイメージキャプションのセットです(例えば、シングルオブジェクト、オブジェクト+プロパティ、複数の対話オブジェクト)。そして,複数のVLモデルによって生成された単一ベクトルテキスト表現からキャプションを再構築することを目的とした,テキストのみの回復プローブを訓練する。このアプローチではイメージを必要とせず、以前の作業よりも広い範囲のシーンでテストすることができます。私たちはそれを見つけました 1) CLIP のテキストエンコーダは,オブジェクト関係,属性オブジェクト関連,カウント,否定など,よりコンポジション的な入力では不足している。 2)一部のテキストエンコーダは,他よりも著しく優れている。 3) テキストのみのリカバリ性能はcontroledimcaps上でマルチモーダルマッチング性能を予測する: きめ細かい合成画像とキャプションからなる新しい評価ベンチマーク。具体的には, テキストのみの回復性は, コントラッシブVLモデルにおける構成因子のモデル化に必要である(ただし十分ではない)ことを示唆する。データセットとコードをリリースします。

Performant vision-language (VL) models like CLIP represent captions using a single vector. How much information about language is lost in this bottleneck? We first curate CompPrompts, a set of increasingly compositional image captions that VL models should be able to capture (e.g., single object, to object+property, to multiple interacting objects). Then, we train text-only recovery probes that aim to reconstruct captions from single-vector text representations produced by several VL models. This approach does not require images, allowing us to test on a broader range of scenes compared to prior work. We find that: 1) CLIP's text encoder falls short on more compositional inputs, including object relationships, attribute-object association, counting, and negations; 2) some text encoders work significantly better than others; and 3) text-only recovery performance predicts multi-modal matching performance on ControlledImCaps: a new evaluation benchmark we collect and release consisting of fine-grained compositional images and captions. Specifically, our results suggest text-only recoverability is a necessary (but not sufficient) condition for modeling compositional factors in contrastive VL models. We release our datasets and code.

翻訳日:2023-11-02 02:27:42 公開日:2023-10-30

# 等角化グラフニューラルネットワークによるグラフ上の不確かさの定量化

Uncertainty Quantification over Graph with Conformalized Graph Neural Networks ( http://arxiv.org/abs/2305.14535v2 )

ライセンス: Link先を確認

Kexin Huang, Ying Jin, Emmanuel Candès, Jure Leskovec

(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データに基づく強力な機械学習予測モデルである。しかし、GNNには厳密な不確実性評価がなく、エラーのコストが重要な設定での信頼性の高いデプロイメントを制限している。本稿では,共形予測(CP)をグラフベースモデルに拡張した共形GNN(CF-GNN)を提案する。グラフ内のエンティティが与えられると、cf-gnnは、事前に定義されたカバレッジ確率(例えば90%)を持つ真のラベルを含む予測セット/インターバルを生成する。我々は,グラフデータに対するCPの有効性を実現するための置換不変条件を確立し,テスト時間カバレッジを正確に評価する。また,有効範囲の他に,実用上の予測セットサイズ/インターバル長の削減が重要である。予測の更新を学習し、より効率的な予測セット/インターバルを生成するトポロジー対応出力補正モデルを開発する動機となる、非コンフォーマリティスコアとネットワーク構造の間の鍵接続を観察した。大規模実験の結果,CF-GNNは予め定義された目標範囲の範囲を達成できる一方で,予測セット/インターバルサイズを最大74%削減できることがわかった。また、様々な生およびネットワーク機能に対する十分な条件付きカバレッジを実証的に達成する。

Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data. However, GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant. We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates. Given an entity in the graph, CF-GNN produces a prediction set/interval that provably contains the true label with pre-defined coverage probability (e.g. 90%). We establish a permutation invariance condition that enables the validity of CP on graph data and provide an exact characterization of the test-time coverage. Moreover, besides valid coverage, it is crucial to reduce the prediction set size/interval length for practical use. We observe a key connection between non-conformity scores and network structures, which motivates us to develop a topology-aware output correction model that learns to update the prediction and produces more efficient prediction sets/intervals. Extensive experiments show that CF-GNN achieves any pre-defined target marginal coverage while significantly reducing the prediction set/interval size by up to 74% over the baselines. It also empirically achieves satisfactory conditional coverage over various raw and network features.
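For readers unfamiliar with conformal prediction, the following sketch shows standard split conformal prediction for classification, which is the building block that CF-GNN extends. It does not include the paper's topology-aware correction model or any graph structure. The nonconformity score (one minus the predicted probability of the true class), the function names, and the toy calibration data are assumptions chosen for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Compute the score threshold from a held-out calibration set.

    Uses the finite-sample corrected rank ceil((n + 1) * (1 - alpha)).
    If that rank exceeds n, the threshold is infinite and every label
    is included in every prediction set.
    """
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")
    return scores[rank - 1]


def prediction_set(probs, threshold):
    """Return all labels whose nonconformity score is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


if __name__ == "__main__":
    # Hypothetical softmax outputs on nine calibration nodes, true label 0.
    cal_probs = [[p, 1.0 - p]
                 for p in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)]
    cal_labels = [0] * len(cal_probs)
    threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
    print(threshold)
    print(prediction_set([0.7, 0.25, 0.05], threshold))
```

Under exchangeability of calibration and test points, sets built this way contain the true label with probability at least `1 - alpha`. The abstract's permutation invariance condition is what justifies that assumption for graph data.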

翻訳日:2023-11-02 02:26:58 公開日:2023-10-30

# 予習変圧器における創発的モジュラリティ

Emergent Modularity in Pre-trained Transformers ( http://arxiv.org/abs/2305.18390v2 )

ライセンス: Link先を確認

Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Chaojun Xiao, Xiaozhi Wang, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou

(参考訳) この研究は、人間の脳によく見られ、汎用知能に不可欠と考えられている特徴であるモジュラリティが、事前訓練されたトランスフォーマーに存在するかを調べる。人間の脳との類推から、モジュラリティの2つの主要な特性を考える。 (1) ニューロンの機能的特殊化: 各ニューロンが主に特定の機能に特化しているかどうかを評価し、その答えがイエスであることを確かめる。 (2) 機能に基づくニューロンのグループ化: 機能によってニューロンをモジュールに分類し、各モジュールが対応する機能を担う構造を探索する。考えられる構造は膨大であるため、有望な候補としてMixture-of-Expertsに注目する。これはニューロンをエキスパートに分割し、通常は入力ごとに異なるエキスパートを活性化する。実験の結果、特定の機能に特化したニューロンが集まった機能エキスパートが存在することがわかった。さらに、機能エキスパートの活性化に摂動を与えると、対応する機能に大きく影響する。最後に、事前学習中にモジュラリティがどのように出現するかを調べ、モジュール構造が早い段階で安定化し、ニューロンの安定化よりも速いことがわかった。これは、トランスフォーマーがまずモジュール構造を構築し、次に細粒度のニューロン機能を学ぶことを示唆する。コードとデータは https://github.com/THUNLP/modularity-analysis で公開されている。

This work examines the presence of modularity in pre-trained Transformers, a feature commonly found in human brains and thought to be vital for general intelligence. In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes. (2) function-based neuron grouping: we explore finding a structure that groups neurons into modules by function, so that each module works for its corresponding function. Given the enormous number of possible structures, we focus on Mixture-of-Experts as a promising candidate, which partitions neurons into experts and usually activates different experts for different inputs. Experimental results show that there are functional experts, in which neurons specialized in a certain function are clustered. Moreover, perturbing the activations of functional experts significantly affects the corresponding function. Finally, we study how modularity emerges during pre-training, and find that the modular structure is stabilized at the early stage, which is faster than neuron stabilization. This suggests that Transformers first construct the modular structure and then learn fine-grained neuron functions. Our code and data are available at https://github.com/THUNLP/modularity-analysis.

翻訳日:2023-11-02 02:19:28 公開日:2023-10-30

# 信頼を超えて:信頼できるモデルは非特異性も考慮すべきである

Beyond Confidence: Reliable Models Should Also Consider Atypicality ( http://arxiv.org/abs/2305.18262v2 )

ライセンス: Link先を確認

Mert Yuksekgonul, Linjun Zhang, James Zou, Carlos Guestrin

(参考訳) ほとんどの機械学習モデルは予測に自信を与えることができるが、予測の信頼性を理解するには自信が不十分である。例えば、入力がトレーニングデータセットで十分に表現されていない場合や、入力が本質的に曖昧である場合、モデルは信頼性の低い予測を行うことができる。本研究では,サンプルやクラスが非典型的(希少)であるかとモデル予測の信頼性の関係について検討する。まず,非定型性は誤用と正確性に強く関連していることを示す。特に,非定型入力や非定型クラスの予測が過度に信頼され,精度が低いことを実証的に示す。これらの知見を用いて,不確かさの定量化とモデル性能の向上を,識別型ニューラルネットワークと大規模言語モデルに適用した。本報告では,非定型性を用いることで,異なる皮膚トーン群にまたがる皮膚病変分類器の性能が向上することを示す。全体として,モデルの信頼性だけでなく,不確実性の定量化や性能向上にも非定型性を用いるべきである。以上の結果から, 簡易な非定型性推定器が有意な価値をもたらすことが示唆された。

While most machine learning models can provide confidence in their predictions, confidence is insufficient to understand a prediction's reliability. For instance, the model may have a low confidence prediction if the input is not well-represented in the training dataset or if the input is inherently ambiguous. In this work, we investigate the relationship between how atypical (rare) a sample or a class is and the reliability of a model's predictions. We first demonstrate that atypicality is strongly related to miscalibration and accuracy. In particular, we empirically show that predictions for atypical inputs or atypical classes are more overconfident and have lower accuracy. Using these insights, we show that incorporating atypicality improves uncertainty quantification and model performance for discriminative neural networks and large language models. In a case study, we show that using atypicality improves the performance of a skin lesion classifier across different skin tone groups without having access to the group attributes. Overall, we propose that models should use not only confidence but also atypicality to improve uncertainty quantification and performance. Our results demonstrate that simple post-hoc atypicality estimators can provide significant value.
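One way to picture a "simple post-hoc atypicality estimator" is to fit a density model to training features and score new inputs by their negative log-density. The sketch below uses a diagonal Gaussian purely as a stand-in. The choice of density model, the feature vectors, and the function names are assumptions, and the abstract does not state which estimators the authors actually use.

```python
import math


def fit_diag_gaussian(features):
    """Estimate a per-dimension mean and variance from training features."""
    n, d = len(features), len(features[0])
    mean = [sum(f[j] for f in features) / n for j in range(d)]
    var = [max(sum((f[j] - mean[j]) ** 2 for f in features) / n, 1e-8)
           for j in range(d)]
    return mean, var


def atypicality(x, mean, var):
    """Negative log-density under the fitted Gaussian; larger is rarer."""
    return sum(0.5 * math.log(2.0 * math.pi * v) + (xi - m) ** 2 / (2.0 * v)
               for xi, m, v in zip(x, mean, var))


if __name__ == "__main__":
    # Hypothetical 2D embeddings of training inputs.
    train = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [0.5, -0.5], [-0.5, 0.5]]
    mean, var = fit_diag_gaussian(train)
    print(atypicality([0.1, 0.0], mean, var))  # typical input
    print(atypicality([5.0, 5.0], mean, var))  # atypical input
```

A score like this could then be fed, alongside the model's confidence, into a recalibration step. How the two are combined is beyond what the abstract specifies.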

翻訳日:2023-11-02 02:19:07 公開日:2023-10-30

# テキスト駆動画像変換のための条件スコアガイダンス

Conditional Score Guidance for Text-Driven Image-to-Image Translation ( http://arxiv.org/abs/2305.18007v2 )

ライセンス: Link先を確認

Hyunsoo Lee, Minsoo Kang, Bohyung Han

(参考訳) 本稿では,事前訓練されたテキスト・画像拡散モデルに基づくテキスト駆動画像変換のための新しいアルゴリズムを提案する。本手法は,修正テキストで定義されたソース画像の関心領域を選択的に編集し,残りの部分を保存し,対象画像を生成することを目的とする。目標プロンプトのみに依存する既存の手法とは対照的に、特定の翻訳タスクに対応するために調整されたソース画像とソーステキストプロンプトの両方を考慮に入れる新しいスコア関数を導入する。この目的のために、条件スコア関数を基準スコアと目標画像生成のためのガイド語に分解し、原則的に導出する。指導項の勾配計算には,後方分布のガウス分布を仮定し,その平均と分散を推定し,追加の訓練をすることなく勾配を調整できる。さらに,条件付きスコアガイダンスの品質向上のために,ソースとターゲットの潜伏者から得られた2つのクロスアテンションマップを組み合わせた,シンプルで効果的なミックスアップ手法を取り入れた。この戦略は、ソース画像における不変部分とターゲットプロンプトに整列した編集領域との望ましい融合を促進するのに有効であり、高忠実なターゲット画像を生成する。総合的な実験により,様々なタスクにおいて優れた画像から画像への翻訳性能を実現することを実証した。

We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our method aims to generate a target image by selectively editing the regions of interest in a source image, defined by a modifying text, while preserving the remaining parts. In contrast to existing techniques that solely rely on a target prompt, we introduce a new score function that additionally considers both the source image and the source text prompt, tailored to address specific translation tasks. To this end, we derive the conditional score function in a principled manner, decomposing it into the standard score and a guiding term for target image generation. For the gradient computation of the guiding term, we assume a Gaussian distribution of the posterior distribution and estimate its mean and variance to adjust the gradient without additional training. In addition, to improve the quality of the conditional score guidance, we incorporate a simple yet effective mixup technique, which combines two cross-attention maps derived from the source and target latents. This strategy is effective for promoting a desirable fusion of the invariant parts in the source image and the edited regions aligned with the target prompt, leading to high-fidelity target image generation. Through comprehensive experiments, we demonstrate that our approach achieves outstanding image-to-image translation performance on various tasks.

翻訳日:2023-11-02 02:18:19 公開日:2023-10-30

# GMSF:グローバルマッチングシーンフロー

GMSF: Global Matching Scene Flow ( http://arxiv.org/abs/2305.17432v2 )

ライセンス: Link先を確認

Yushan Zhang, Johan Edstedt, Bastian Wandt, Per-Erik Forssén, Maria Magnusson, Michael Felsberg

(参考訳) 我々は点雲からのシーンフロー推定の課題に取り組む。ソースとターゲットポイントクラウドが与えられた場合、目標はソースポイントクラウドの各ポイントからターゲットへの変換を見積もることであり、結果として3dモーションベクトルフィールドが生成される。従来主流であったシーンフロー推定手法では,多段階的な細粒化や再帰的なアーキテクチャが必要であった。対照的に,この問題に対処するために,単発グローバルマッチングの簡易化を提案する。私たちの重要な発見は、ポイントペア間の信頼性の高い機能類似性が不可欠であり、正確なシーンフローを推定するのに十分であることです。そこで本研究では, 高精度かつロバストな特徴表現に不可欠な, ハイブリッドな局所-グローバル-クロストランスフォーマーアーキテクチャを用いて特徴抽出ステップを分解する。大規模な実験により,提案したGlobal Matching Scene Flow (GMSF) が,複数のシーンフロー推定ベンチマークに新たな最先端を設定できることが示されている。 FlyingThings3Dでは、オクルージョンポイントが存在するため、GMSFは前回の最高パフォーマンスの27.4%から5.6%に減らす。 KITTI Scene Flowでは微調整が不要であり,提案手法は最先端の性能を示す。 Waymo-Openデータセットでは、提案手法は従来の手法よりも大きなマージンで優れている。コードはhttps://github.com/zhangyushan3/gmsfで入手できる。

We tackle the task of scene flow estimation from point clouds. Given a source and a target point cloud, the objective is to estimate a translation from each point in the source point cloud to the target, resulting in a 3D motion vector field. Previous dominant scene flow estimation methods require complicated coarse-to-fine or recurrent architectures as a multi-stage refinement. In contrast, we propose a significantly simpler single-scale one-shot global matching to address the problem. Our key finding is that reliable feature similarity between point pairs is essential and sufficient to estimate accurate scene flow. We thus propose to decompose the feature extraction step via a hybrid local-global-cross transformer architecture which is crucial to accurate and robust feature representations. Extensive experiments show that the proposed Global Matching Scene Flow (GMSF) sets a new state-of-the-art on multiple scene flow estimation benchmarks. On FlyingThings3D, with the presence of occlusion points, GMSF reduces the outlier percentage from the previous best performance of 27.4% to 5.6%. On KITTI Scene Flow, without any fine-tuning, our proposed method shows state-of-the-art performance. On the Waymo-Open dataset, the proposed method outperforms previous methods by a large margin. The code is available at https://github.com/ZhangYushan3/GMSF.
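The phrase "single-scale one-shot global matching" can be illustrated with a softmax over feature similarities between every source point and every target point. The sketch below assumes per-point features are already available, since the transformer feature extractor is not reproduced here. The dot-product similarity, the temperature parameter, and the function name are illustrative assumptions rather than the authors' implementation.

```python
import math


def global_matching_flow(src_pts, src_feats, tgt_pts, tgt_feats,
                         temperature=1.0):
    """Estimate a 3D flow vector for each source point in one shot.

    Each source point is softly matched to all target points using a
    softmax over feature dot products. The flow is the matched position
    (a weighted average of target points) minus the source position.
    """
    flow = []
    for point, feat in zip(src_pts, src_feats):
        sims = [sum(a * b for a, b in zip(feat, other)) / temperature
                for other in tgt_feats]
        top = max(sims)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in sims]
        total = sum(weights)
        matched = [sum(w * q[c] for w, q in zip(weights, tgt_pts)) / total
                   for c in range(3)]
        flow.append([matched[c] - point[c] for c in range(3)])
    return flow


if __name__ == "__main__":
    # Hypothetical scene: two points shifted by 0.5 along x, with the
    # target points stored in a different order than the source points.
    src_pts = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
    tgt_pts = [[1.5, 0.0, 0.0], [0.5, 0.0, 0.0]]
    src_feats = [[1.0, 0.0], [0.0, 1.0]]
    tgt_feats = [[0.0, 1.0], [1.0, 0.0]]
    print(global_matching_flow(src_pts, src_feats, tgt_pts, tgt_feats,
                               temperature=0.01))
```

With well-separated features the softmax is nearly one-hot, so the estimated flow approaches the true translation. This matches the abstract's claim that reliable feature similarity between point pairs is what drives accuracy.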

翻訳日:2023-11-02 02:17:56 公開日:2023-10-30

# コントラスト、アテンド、拡散:脳活動から高解像度画像をデコードする

Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities ( http://arxiv.org/abs/2305.17214v2 )

ライセンス: Link先を確認

Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, Marie-Francine Moens

(参考訳) 機能的磁気共鳴画像(fMRI)によって記録された神経反応からの視覚刺激の復号は、認知神経科学と機械学習の興味深い交点を示し、人間の視覚知覚の理解と非侵襲的脳-機械インターフェイスの構築を約束する。しかし、この課題はfMRI信号のノイズの性質と脳の視覚表現の複雑なパターンによって困難である。これらの課題を軽減するために,2相fMRI表現学習フレームワークを導入する。第1フェーズでは、提案するDouble-contrastive Mask Auto-encoderによりfMRI特徴学習者を事前訓練し、ノイズ除去された表現を学習する。第2フェーズは、画像オートエンコーダからのガイダンスにより、視覚再構成に最も有用な神経活性化パターンに、特徴学習者が注意を向けるようにチューニングする。最適化されたfMRI特徴学習者は、脳活動から画像刺激を再構成するために潜在拡散モデルを条件付ける。実験により,高解像度かつセマンティックに正確な画像の生成において本モデルが優れており,50-way-top-1のセマンティック分類精度において従来の最先端手法を39.34%上回ることを示す。本研究は,復号タスクの可能性のさらなる探究を促し,非侵襲的脳-機械インタフェースの開発に寄与するものである。

Decoding visual stimuli from neural responses recorded by functional Magnetic Resonance Imaging (fMRI) presents an intriguing intersection between cognitive neuroscience and machine learning, promising advancements in understanding human visual perception and building non-invasive brain-machine interfaces. However, the task is challenging due to the noisy nature of fMRI signals and the intricate pattern of brain visual representations. To mitigate these challenges, we introduce a two-phase fMRI representation learning framework. The first phase pre-trains an fMRI feature learner with a proposed Double-contrastive Mask Auto-encoder to learn denoised representations. The second phase tunes the feature learner to attend to neural activation patterns most informative for visual reconstruction with guidance from an image auto-encoder. The optimized fMRI feature learner then conditions a latent diffusion model to reconstruct image stimuli from brain activities. Experimental results demonstrate our model's superiority in generating high-resolution and semantically accurate images, substantially exceeding previous state-of-the-art methods by 39.34% in the 50-way-top-1 semantic classification accuracy. Our research invites further exploration of the decoding task's potential and contributes to the development of non-invasive brain-machine interfaces.

翻訳日:2023-11-02 02:16:54 公開日:2023-10-30

# 過去を想像して未来を推測する

Inferring the Future by Imagining the Past ( http://arxiv.org/abs/2305.17195v2 )

ライセンス: Link先を確認

Kartik Chandra, Tony Chen, Tzu-Mao Li, Jonathan Ragan-Kelley, Josh Tenenbaum

(参考訳) 漫画本の1枚のパネルは多くのことを語ることができる。現在キャラクターがどこにいるかだけでなく、彼らの動き、モチベーション、感情、次に何をしそうかを描写することができる。より一般に、人間は、これまで見たことのない状況でも、*動的なシーン*の*静的なスナップショット*から、過去の出来事と将来の出来事の複雑なシーケンスを日常的に推測する。本稿では,人間がこのような迅速かつ柔軟な推論を行う方法をモデル化する。認知科学における長い研究に基づいて、我々はモンテカルロアルゴリズムを提供する。その推論は、認知的に妥当な少数のサンプルのみを用いながら、様々な領域における人間の直観とよく相関する。私たちの重要な技術的洞察は、推論問題とモンテカルロ経路追跡の驚くべき関係であり、コンピュータグラフィックスコミュニティの何十年にもわたるアイデアを、一見無関係な心の理論のタスクに応用することができます。

A single panel of a comic book can say a lot: it can depict not only where the characters currently are, but also their motions, their motivations, their emotions, and what they might do next. More generally, humans routinely infer complex sequences of past and future events from a *static snapshot* of a *dynamic scene*, even in situations they have never seen before. In this paper, we model how humans make such rapid and flexible inferences. Building on a long line of work in cognitive science, we offer a Monte Carlo algorithm whose inferences correlate well with human intuitions in a wide variety of domains, while only using a small, cognitively-plausible number of samples. Our key technical insight is a surprising connection between our inference problem and Monte Carlo path tracing, which allows us to apply decades of ideas from the computer graphics community to this seemingly-unrelated theory of mind task.

翻訳日:2023-11-02 02:16:29 公開日:2023-10-30

# 3つのタワー:事前学習済み画像モデルによるフレキシブルなコントラスト学習

Three Towers: Flexible Contrastive Learning with Pretrained Image Models ( http://arxiv.org/abs/2305.16999v3 )

ライセンス: Link先を確認

Jannik Kossen, Mark Collier, Basil Mustafa, Xiao Wang, Xiaohua Zhai, Lucas Beyer, Andreas Steiner, Jesse Berent, Rodolphe Jenatton, Efi Kokiopoulou

(参考訳) 本稿では,事前学習済みの画像分類器を組み込むことで,視覚言語モデルのコントラスト学習を改善するフレキシブルな手法である3つのタワー(3T)を提案する。対照的なモデルは通常、ゼロからトレーニングされるが、LiT (Zhai et al., 2022) は、最近、事前訓練された分類器の埋め込みによる性能向上を示している。しかし、LiTはイメージタワーを凍結した埋め込みに直接置き換えるため、イメージタワーを対照的に訓練することで得られる潜在的な利点が失われる。 3Tでは,イメージタワーが事前学習された埋め込みとコントラストトレーニングの両方の恩恵を受けられる,より柔軟な戦略を提案する。これを実現するため,凍結した事前学習済み埋め込みを含む第3のタワーを導入し,この第3のタワーと主画像テキストタワーとの整合を奨励する。経験的に、3Tは検索タスクにおいてLiTとCLIPスタイルのスクラッチ学習ベースラインを一貫して上回る。分類において、3Tはスクラッチ学習ベースラインよりも確実に改善され、JFT事前トレーニングモデルではLiTと比較して性能が劣るが、ImageNet-21kとPlaces365事前トレーニングではLiTより優れている。

We introduce Three Towers (3T), a flexible method to improve the contrastive learning of vision-language models by incorporating pretrained image classifiers. While contrastive models are usually trained from scratch, LiT (Zhai et al., 2022) has recently shown performance gains from using pretrained classifier embeddings. However, LiT directly replaces the image tower with the frozen embeddings, excluding any potential benefits from training the image tower contrastively. With 3T, we propose a more flexible strategy that allows the image tower to benefit from both pretrained embeddings and contrastive training. To achieve this, we introduce a third tower that contains the frozen pretrained embeddings, and we encourage alignment between this third tower and the main image-text towers. Empirically, 3T consistently improves over LiT and the CLIP-style from-scratch baseline for retrieval tasks. For classification, 3T reliably improves over the from-scratch baseline, and while it underperforms relative to LiT for JFT-pretrained models, it outperforms LiT for ImageNet-21k and Places365 pretraining.

翻訳日:2023-11-02 02:16:14 公開日:2023-10-30

# 拡散モデルは視覚・言語推論器か?

Are Diffusion Models Vision-And-Language Reasoners? ( http://arxiv.org/abs/2305.16397v2 )

ライセンス: Link先を確認

Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy

(参考訳) テキスト条件付き画像生成モデルは最近、ノイズ拡散プロセスを用いて膨大な定性的成功を示している。しかし、識別的視覚・言語モデルとは異なり、これらの拡散に基づく生成モデルを用いて合成性などの高レベル現象の自動細粒度定量的評価を行うことは非自明な課題である。この目標に向けて、私たちは2つのイノベーションを実行します。まず、DiffusionITMと呼ばれる新しい手法を用いて、任意の画像テキストマッチング(ITM)タスクに対して拡散モデル(この場合、安定拡散)を変換する。第2に,7つの複雑な視覚言語タスク,バイアス評価,詳細な分析を備えた生成的判別評価ベンチマーク(gdbench)ベンチマークを紹介する。安定拡散+拡散ITMは多くのタスクで競争力があり、CLIPよりもCLEVRやWinogroundのようなコンポジションタスクで優れています。生成能力を保ちながらMS-COCOを微調整し, 転送設定により構成性能をさらに向上する。また, 拡散モデルにおける定型バイアスを測定し, 安定拡散2.1は, ほとんどが安定拡散1.5よりも偏りが少ないことを見出した。全体として,本研究の結果は,差別的・生成的モデル評価を近づけるエキサイティングな方向を示している。間もなくコードとベンチマークのセットアップをリリースします。

Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we perform two innovations. First, we transform diffusion-based models (in our case, Stable Diffusion) for any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench) with 7 complex vision-and-language tasks, bias evaluation and detailed analysis. We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground. We further boost its compositional performance with a transfer setup by fine-tuning on MS-COCO while retaining generative capabilities. We also measure the stereotypical bias in diffusion models, and find that Stable Diffusion 2.1 is, for the most part, less biased than Stable Diffusion 1.5. Overall, our results point in an exciting direction bringing discriminative and generative model evaluation closer. We will release code and benchmark setup soon.

翻訳日:2023-11-02 02:15:53 公開日:2023-10-30

# Scan and Snap: 1層トランスにおけるトレーニングダイナミクスとトークン構成の理解

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer ( http://arxiv.org/abs/2305.16380v4 )

ライセンス: Link先を確認

Yuandong Tian, Yiping Wang, Beidi Chen, Simon Du

(参考訳) トランスフォーマーアーキテクチャは、複数の研究領域で顕著なパフォーマンスを示し、多くのニューラルネットワークモデルのバックボーンとなっている。しかし、その仕組みについては理解が限られている。特に、単純な予測損失により、勾配 \emph{training dynamics} からどのように表現が現れるかは謎のままである。本稿では, 1層の自己注意層と1層のデコーダ層を有する1層トランスフォーマーについて,次のトークン予測タスクに対するSGDトレーニングダイナミクスを数学的に厳密に解析する。自己注意層が入力トークンを結合する方法の動的プロセスのブラックボックスを開き、基礎となる帰納バイアスの性質を明らかにする。より具体的には、(a)位置符号化なし、(b)長い入力シーケンス、(c)デコーダ層が自己注意層よりも速く学習する、という仮定の下で、自己注意が \emph{discriminative scanning algorithm} として機能することを証明する。すなわち、一様な注意から始めて、予測すべき特定の次トークンに固有の(distinct)キートークンに徐々に多くの注意を向け、異なる次トークンにまたがって現れる共通のキートークンには注意を向けなくなる。固有のトークンの中では、トレーニングセット内のキーとクエリトークンの間の共起が低いものから高いものへの順に、注意の重みを徐々に落としていく。興味深いことに、この手順は勝者総取りには至らず、2つの層の学習率によって制御可能な \emph{phase transition} によって減速し、(ほぼ)固定されたトークンの組み合わせを残す。合成および実世界データ(WikiText)上でこの \textbf{\emph{scan and snap}} ダイナミクスを検証する。

Transformer architecture has shown impressive performance in multiple research domains and has become the backbone of many neural network models. However, there is limited understanding on how it works. In particular, with a simple predictive loss, how the representation emerges from the gradient \emph{training dynamics} remains a mystery. In this paper, for 1-layer transformer with one self-attention layer plus one decoder layer, we analyze its SGD training dynamics for the task of next token prediction in a mathematically rigorous manner. We open the black box of the dynamic process of how the self-attention layer combines input tokens, and reveal the nature of underlying inductive bias. More specifically, with the assumption (a) no positional encoding, (b) long input sequence, and (c) the decoder layer learns faster than the self-attention layer, we prove that self-attention acts as a \emph{discriminative scanning algorithm}: starting from uniform attention, it gradually attends more to distinct key tokens for a specific next token to be predicted, and pays less attention to common key tokens that occur across different next tokens. Among distinct tokens, it progressively drops attention weights, following the order of low to high co-occurrence between the key and the query token in the training set. Interestingly, this procedure does not lead to winner-takes-all, but decelerates due to a \emph{phase transition} that is controllable by the learning rates of the two layers, leaving (almost) fixed token combination. We verify this \textbf{\emph{scan and snap}} dynamics on synthetic and real-world data (WikiText).

翻訳日:2023-11-02 02:15:17 公開日:2023-10-30

# 協調学習と最適化における競合者間の誠実さへのインセンティブ付与

Incentivizing Honesty among Competitors in Collaborative Learning and Optimization ( http://arxiv.org/abs/2305.16272v3 )

ライセンス: Link先を確認

Florian E. Dorner, Nikola Konstantinov, Georgi Pashaliev, Martin Vechev

(参考訳) 協調学習技術は、単一のエンティティのデータでトレーニングされたモデルよりも優れた機械学習モデルのトレーニングを可能にする可能性がある。しかし、多くの場合、このような協力的なスキームの潜在的な参加者は、最善のレコメンデーションを提供することで顧客を引き付けようとする企業のように、下流のタスクにおける競合者である。これは他の参加者のモデルを傷つける不誠実なアップデートをインセンティブにし、コラボレーションのメリットを損なう可能性がある。本研究では,このようなインタラクションをモデル化したゲームを定式化し,このフレームワークにおける2つの学習タスク,すなわち単一ラウンドの平均推定と,強凸目的関数に対する複数ラウンドのSGDについて検討する。プレイヤーアクションの自然なクラスについて、合理的なクライアントは更新を強く操作するインセンティブを持ち、その結果学習が妨げられることを示す。次に、正直なコミュニケーションを動機づけ、完全協調に匹敵する学習品質を確保するメカニズムを提案する。最後に、標準の非凸フェデレーション学習ベンチマークにおけるインセンティブスキームの有効性を実証的に示す。私たちの研究は、不誠実なクライアントを悪意があると仮定するのではなく、そのインセンティブや行動を明示的にモデル化することで、協調学習に対する強力な堅牢性保証が可能になることを示しています。

Collaborative learning techniques have the potential to enable training machine learning models that are superior to models trained on a single entity's data. However, in many cases, potential participants in such collaborative schemes are competitors on a downstream task, such as firms that each aim to attract customers by providing the best recommendations. This can incentivize dishonest updates that damage other participants' models, potentially undermining the benefits of collaboration. In this work, we formulate a game that models such interactions and study two learning tasks within this framework: single-round mean estimation and multi-round SGD on strongly-convex objectives. For a natural class of player actions, we show that rational clients are incentivized to strongly manipulate their updates, preventing learning. We then propose mechanisms that incentivize honest communication and ensure learning quality comparable to full cooperation. Lastly, we empirically demonstrate the effectiveness of our incentive scheme on a standard non-convex federated learning benchmark. Our work shows that explicitly modeling the incentives and actions of dishonest clients, rather than assuming them malicious, can enable strong robustness guarantees for collaborative learning.

翻訳日:2023-11-02 02:14:47 公開日:2023-10-30

# ジャンプ拡散モデルによるトランス次元生成モデル

Trans-Dimensional Generative Modeling via Jump Diffusion Models ( http://arxiv.org/abs/2305.16261v2 )

ライセンス: Link先を確認

Andrew Campbell, William Harvey, Christian Weilbach, Valentin De Bortoli, Tom Rainforth, Arnaud Doucet

(参考訳) 本稿では,各データポイントの状態と次元を共同でモデル化することにより,異なる次元のデータを自然に扱う新しい生成モデルのクラスを提案する。生成過程は、異なる次元空間の間をジャンプするジャンプ拡散過程として定式化される。まず, 次元を破壊する順方向のノイズ付加過程を定義し, 次に, 次元を生成する時間反転生成過程と, それを近似するよう学習するための新しいエビデンス下界(ELBO)の学習目標を導出する。時間反転生成過程に対する学習近似をシミュレーションし、状態値と次元を共同生成することにより、様々な次元のデータをサンプリングする効果的な方法を提供する。我々は,様々な次元の分子およびビデオデータセットに対する我々のアプローチを実証し,状態値と次元を別々に生成する固定次元モデルと比較して,テスト時の拡散ガイダンスによる補完(imputation)タスクとの適合性の向上と補間能力の向上を報告した。

We propose a new class of generative models that naturally handle data of varying dimensionality by jointly modeling the state and dimension of each datapoint. The generative process is formulated as a jump diffusion process that makes jumps between different dimensional spaces. We first define a dimension destroying forward noising process, before deriving the dimension creating time-reversed generative process along with a novel evidence lower bound training objective for learning to approximate it. Simulating our learned approximation to the time-reversed generative process then provides an effective way of sampling data of varying dimensionality by jointly generating state values and dimensions. We demonstrate our approach on molecular and video datasets of varying dimensionality, reporting better compatibility with test-time diffusion guidance imputation tasks and improved interpolation capabilities versus fixed dimensional models that generate state values and dimensions separately.

翻訳日:2023-11-02 02:14:25 公開日:2023-10-30

# 機械振動子とキャビティ-マグノンポラリトンの強結合の観測

Observation of strong coupling between a mechanical oscillator and a cavity-magnon polariton ( http://arxiv.org/abs/2307.11328v2 )

ライセンス: Link先を確認

Rui-Chang Shen, Jie Li, Wei-Jiang Wu, Xuan Zuo, Yi-Pu Wang, Shi-Yao Zhu, J. Q. You

(参考訳) キャビティマグノメカニクス(CMM)は新興分野であり、過去10年間、多くの注目を集めてきた。マイクロ波共振器光子、マグノン、振動フォノン間のコヒーレントカップリングを扱う。これまでのCMM実験はすべて、弱結合領域で行われた。これはシステムの様々な応用を著しく制限する。ここでは, 強結合領域におけるCMMシステムを実証し, 関連する正規モード分裂を観察する。この領域において、機械振動子は、強く結合されたキャビティ光子とマグノンによって形成されるキャビティ・マグノン・ポラリトンに強く結合され、ポラリトン・メカニクスの協調性は$4\times10^3$に達し、従来のCMM実験よりも3桁改善される。このとき系は三重強結合領域にあり、系の正規モードはマイクロ波光子、マグノン、フォノンのハイブリッド化である。これは、コヒーレント完全吸収を用いてポラリトンモードの崩壊速度を著しく減少させることによって達成され、崩壊速度は4桁減少する。この研究は、フォノン、光子、マグノンの量子状態のコヒーレントな制御と測定への道を開き、多成分ハイブリッドシステムにおけるリッチな強結合効果の研究のための新しいプラットフォームを提供する。

Cavity magnomechanics (CMM) is an emerging field and has received much attention in the past decade. It deals with coherent couplings among microwave cavity photons, magnons and vibration phonons. So far, all previous CMM experiments have been operated in the weak-coupling regime. This considerably limits the prospective applications of the system. Here, we demonstrate the CMM system in the strong-coupling regime and observe the associated normal-mode splitting. In this regime, the mechanical oscillator is strongly coupled to a cavity-magnon polariton that is formed by strongly coupled cavity photons and magnons, and the polariton-mechanics cooperativity reaches $4\times10^3$, an improvement of three orders of magnitude over previous CMM experiments. The system is then in the triple-strong-coupling regime and the normal modes of the system are the hybridization of microwave photons, magnons and phonons. This is achieved by significantly reducing the decay rate of the polariton mode using coherent perfect absorption, which lowers the decay rate by four orders of magnitude. The work paves the way towards coherent control and measurement of the quantum states of phonons, photons and magnons, and provides a new platform for the study of rich strong-coupling effects in multipartite hybrid systems.

翻訳日:2023-11-02 02:07:03 公開日:2023-10-30

# 汎用化工学設計知識の育成に向けて

Towards Populating Generalizable Engineering Design Knowledge ( http://arxiv.org/abs/2307.06985v3 )

ライセンス: Link先を確認

L Siddharth, Jianxi Luo

(参考訳) 汎用的な工学的設計知識を蓄積することを目指して,特許文書中の文から<head entity, relationship, tail entity>という形の事実を抽出する手法を提案する。これらの事実は特許文書の内外で組み合わせて知識グラフを形成し、設計知識を表現し保存するためのスキームとして機能する。工学設計文学における既存の手法は、事実ではなく統計的近似である三重項をポップアップさせるために予め定義された関係を利用することが多い。提案手法では,文からエンティティと関係を識別するためにタガーを訓練する。エンティティのペアが与えられた場合、特定の関係トークンを特定するために別のタグをトレーニングします。これらのタガーをトレーニングするために、44,227文のデータセットとそれに対応する事実を手作業で構築する。提案手法を2つの推奨アプローチに対してベンチマークする。本手法は,ファンシステムに関連する特許に含まれる文から事実を抽出することで適用する。これらの事実を用いて知識ベースを構築し、ドメインオントロジーをどのように構築し、サブシステムのコンテキスト化された知識を視覚化できるかを示す。次に,ファンシステムにおいて重要な問題に対する知識ベースを探索する。回答を知識グラフに整理し,ChatGPTの問題点に対する意見の比較検討を行う。

Aiming to populate generalizable engineering design knowledge, we propose a method to extract facts of the form <head entity, relationship, tail entity> from sentences found in patent documents. These facts could be combined within and across patent documents to form knowledge graphs that serve as schemes for representing as well as storing design knowledge. Existing methods in engineering design literature often utilise a set of predefined relationships to populate triples that are statistical approximations rather than facts. In our method, we train a tagger to identify both entities and relationships from a sentence. Given a pair of entities, we train another tagger to identify the specific relationship tokens. For training these taggers, we manually construct a dataset of 44,227 sentences and corresponding facts. We benchmark our method against two typically recommended approaches. We apply our method by extracting facts from sentences found in patents related to fan systems. We build a knowledge base using these facts to demonstrate how domain ontologies could be constructed and contextualised knowledge of subsystems could be visualised. We then search the knowledge base for key issues prevailing in fan systems. We organize the responses into knowledge graphs and hold a comparative discussion against the opinions about the key issues from ChatGPT.

翻訳日:2023-11-02 02:06:21 公開日:2023-10-30

# Soft Gripping: 信頼性のための仕様策定

Soft Gripping: Specifying for Trustworthiness ( http://arxiv.org/abs/2307.01159v2 )

ライセンス: Link先を確認

Dhaminda B. Abeywickrama, Nguyen Hao Le, Greg Chance, Peter D. Winter, Arianna Manzini, Alix J. Partridge, Jonathan Ives, John Downer, Graham Deacon, Jonathan Rossiter, Kerstin Eder, Shane Windsor

(参考訳) ソフトロボティクス(soft robotics)は、エンジニアがさまざまなアプリケーションで使える柔軟なデバイスを作る新しい技術である。ソフトロボットを広く採用するためには、その信頼性を保証することが不可欠である。ソフトロボットが信頼されなければ、その可能性が十分に活用されることはない。信頼性を示すためには、仕様を定式化し、何が信頼できるのかを定義する必要がある。しかし、ソフトロボティクスにおいて最も成熟した分野の一つであるソフトロボットグリッパーでさえ、ソフトロボティクスのコミュニティはこれまで仕様の策定にほとんど関心を払ってこなかった。本稿では,ソフトロボットシステムの開発における仕様開発の重要性について検討し,食料品のピックアンドプレースタスクのためのソフトグリッパーの広範な仕様例を示す。提案された仕様は、信頼性、安全性、適応性、予測可能性、倫理、規制など、機能的および非機能的要件の両方をカバーする。また,ソフトグリッパーの設計において,第一級の目的として検証可能性を促進する必要性を強調した。

Soft robotics is an emerging technology in which engineers create flexible devices for use in a variety of applications. In order to advance the wide adoption of soft robots, ensuring their trustworthiness is essential; if soft robots are not trusted, they will not be used to their full potential. In order to demonstrate trustworthiness, a specification needs to be formulated to define what is trustworthy. However, even for soft robotic grippers, which is one of the most mature areas in soft robotics, the soft robotics community has so far given very little attention to formulating specifications. In this work, we discuss the importance of developing specifications during development of soft robotic systems, and present an extensive example specification for a soft gripper for pick-and-place tasks for grocery items. The proposed specification covers both functional and non-functional requirements, such as reliability, safety, adaptability, predictability, ethics, and regulations. We also highlight the need to promote verifiability as a first-class objective in the design of a soft gripper.

翻訳日:2023-11-02 02:05:46 公開日:2023-10-30

# 動的システムの楽観的アクティブ探索

Optimistic Active Exploration of Dynamical Systems ( http://arxiv.org/abs/2306.12371v2 )

ライセンス: Link先を確認

Bhavya Sukhija, Lenart Treven, Cansu Sancaktar, Sebastian Blaes, Stelian Coros, Andreas Krause

(参考訳) 強化学習アルゴリズムは、通常、特定のタスクを解決するためのポリシーを最適化しようとする。推定モデルが大域的にダイナミクスを近似し,ゼロショットで複数のダウンストリームタスクを解決できるように,未知の力学系を探索するにはどうすればよいのか? 本稿では,この課題に対して,アクティブな探索のためのアルゴリズムであるOPAXを開発した。 OPAXは、よく校正された確率モデルを用いて、未知のダイナミクスに関する認識論的不確かさを定量化する。そして、妥当なダイナミクスに関して楽観的に、未知のダイナミクスと状態観測との間の情報ゲインを最大化する。結果として得られる最適化問題を、各エピソードで標準手法を用いて解くことができる最適制御問題に還元する方法を示す。一般モデルに対してアルゴリズムを解析し,ガウス過程のダイナミクスの場合には,この種のものとしては初となるサンプル複雑性の上界を与え,認識論的不確かさがゼロに収束することを示す。実験では,複数の環境でOPAXと他のヒューリスティックなアクティブ探索手法との比較を行った。実験の結果,OPAXは理論的に健全であるだけでなく,新しい下流タスクのゼロショット計画にも有効であることがわかった。

Reinforcement learning algorithms commonly seek to optimize policies for solving one particular task. How should we explore an unknown dynamical system such that the estimated model globally approximates the dynamics and allows us to solve multiple downstream tasks in a zero-shot manner? In this paper, we address this challenge by developing an algorithm -- OPAX -- for active exploration. OPAX uses well-calibrated probabilistic models to quantify the epistemic uncertainty about the unknown dynamics. It optimistically -- w.r.t. plausible dynamics -- maximizes the information gain between the unknown dynamics and state observations. We show how the resulting optimization problem can be reduced to an optimal control problem that can be solved at each episode using standard approaches. We analyze our algorithm for general models, and, in the case of Gaussian process dynamics, we give a first-of-its-kind sample complexity bound and show that the epistemic uncertainty converges to zero. In our experiments, we compare OPAX with other heuristic active exploration approaches on several environments. Our experiments show that OPAX is not only theoretically sound but also performs well for zero-shot planning on novel downstream tasks.

翻訳日:2023-11-02 02:05:29 公開日:2023-10-30

# HiNeRV:階層的エンコーディングに基づくニューラル表現によるビデオ圧縮

HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation ( http://arxiv.org/abs/2306.09818v2 )

ライセンス: Link先を確認

Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull

(参考訳) 学習ベースのビデオ圧縮は、現在一般的な研究テーマであり、従来の標準ビデオコーデックと競合する可能性を提供している。この文脈では、Implicit Neural Representations (INR) は以前、画像とビデオのコンテンツを表現し、圧縮するために用いられ、他の方法と比較して復号速度が比較的高い。しかし、既存のINRベースの手法では、ビデオ圧縮の最先端技術に匹敵するレート品質性能を達成できなかった。これは主に、その表現能力を制限する、採用されているネットワークアーキテクチャの単純さによる。本稿では,軽量層と新しい階層的位置符号化を組み合わせたINRであるHiNeRVを提案する。我々は,奥行き方向畳み込み層,MLP層,補間層を用いて,高容量で深く広いネットワークアーキテクチャを構築する。 HiNeRVはまた、フレームとパッチの両方でビデオを同時にエンコードする統一表現であり、既存のメソッドよりも高いパフォーマンスと柔軟性を提供する。さらに、HiNeRVに基づくビデオコーデックと、トレーニング、プルーニング、量子化のための洗練されたパイプラインを構築し、非可逆なモデル圧縮時のHiNeRVのパフォーマンスをよりよく保存する。提案手法は,ビデオ圧縮のためのUVGデータセットとMCL-JCVデータセットの両方で評価され,既存のすべてのINRベースラインに対する大幅な改善と,学習ベースコーデックと比較して競争力のある性能を示した(PSNRで測定したUVGデータセットにおいて,HNeRVに対して72.3%,DCVCに対して43.4%のビットレート削減)。

Learning-based video compression is currently a popular research topic, offering the potential to compete with conventional standard video codecs. In this context, Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content, demonstrating relatively high decoding speed compared to other methods. However, existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression. This is mainly due to the simplicity of the employed network architectures, which limits their representation capability. In this paper, we propose HiNeRV, an INR that combines lightweight layers with novel hierarchical positional encodings. We employ depth-wise convolutional, MLP and interpolation layers to build a deep and wide network architecture with high capacity. HiNeRV is also a unified representation encoding videos in both frames and patches at the same time, which offers higher performance and flexibility than existing methods. We further build a video codec based on HiNeRV and a refined pipeline for training, pruning and quantization that can better preserve HiNeRV's performance during lossy model compression. The proposed method has been evaluated on both UVG and MCL-JCV datasets for video compression, demonstrating significant improvement over all existing INR baselines and competitive performance when compared to learning-based codecs (72.3% overall bit rate saving over HNeRV and 43.4% over DCVC on the UVG dataset, measured in PSNR).

翻訳日:2023-11-02 02:05:08 公開日:2023-10-30

# TensorNet:分子ポテンシャルの効率的な学習のためのデカルトテンソル表現

TensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials ( http://arxiv.org/abs/2306.06482v2 )

ライセンス: Link先を確認

Guillem Simeon, Gianni de Fabritiis

(参考訳) 分子系表現のための効率的な機械学習モデルの開発は、科学研究において重要になりつつある。我々は、デカルトテンソル表現を利用する革新的なO(3)同変メッセージパッシングニューラルネットワークアーキテクチャであるTensorNetを紹介する。デカルトテンソル原子埋め込みを用いることで、特徴混合は行列積演算により単純化される。さらに、これらのテンソルを回転群の既約表現にコスト効率良く分解することで、必要に応じてスカラー、ベクトル、テンソルの分離処理が可能になる。高階球面テンソルモデルと比較して、TensorNetは大幅に少ないパラメータで最先端の性能を示す。小さな分子のポテンシャルエネルギーの場合、これは単一の相互作用層でも達成できる。これらの特性の結果として、モデルの計算コストは大幅に削減される。さらに、ポテンシャルエネルギーと力に加えて、ベクトルおよびテンソルの分子量の正確な予測が可能となる。要約すると、TensorNetのフレームワークは最先端の同変モデルの設計のための新しい空間を開く。

The development of efficient machine learning models for molecular systems representation is becoming crucial in scientific research. We introduce TensorNet, an innovative O(3)-equivariant message-passing neural network architecture that leverages Cartesian tensor representations. By using Cartesian tensor atomic embeddings, feature mixing is simplified through matrix product operations. Furthermore, the cost-effective decomposition of these tensors into rotation group irreducible representations allows for the separate processing of scalars, vectors, and tensors when necessary. Compared to higher-rank spherical tensor models, TensorNet demonstrates state-of-the-art performance with significantly fewer parameters. For small molecule potential energies, this can be achieved even with a single interaction layer. As a result of all these properties, the model's computational cost is substantially decreased. Moreover, the accurate prediction of vector and tensor molecular quantities on top of potential energies and forces is possible. In summary, TensorNet's framework opens up a new space for the design of state-of-the-art equivariant models.

翻訳日:2023-11-02 02:04:38 公開日:2023-10-30

# PoET:配列の配列としてのタンパク質ファミリーの生成モデル

PoET: A generative model of protein families as sequences-of-sequences ( http://arxiv.org/abs/2306.06156v2 )

ライセンス: Link先を確認

Timothy F. Truong Jr, Tristan Bepler

(参考訳) 生成タンパク質言語モデルは、望ましい機能を持つ新しいタンパク質を設計する自然な方法である。しかしながら、現在のモデルは、特定の関心ファミリーのタンパク質を生成するよう誘導することが難しいか、あるいは特定の関心ファミリーの大きな多重配列アライメント(MSA)で訓練する必要があり、ファミリー間の転移学習の恩恵を受けられない。この問題に対処するために、我々は、数千万の天然タンパク質配列クラスタにわたって、関連タンパク質の集合を配列の配列(sequences-of-sequences)として生成することを学習する、タンパク質ファミリー全体の自己回帰生成モデルである $\textbf{P}$r$\textbf{o}$tein $\textbf{E}$volutionary $\textbf{T}$ransformer (PoET)を提案する。 PoETは、関心のある任意のタンパク質ファミリーで条件付けて任意の改変を生成・スコア付けする検索拡張言語モデルとして使用でき、短いコンテキスト長から外挿して、小さなファミリーでもうまく一般化することができる。これは独自のTransformer層によって実現されている。配列内ではトークンを逐次的にモデル化し、配列間では順序に不変な形で注意を向けることで、PoETはトレーニング中に使用されたものを超えるコンテキスト長にスケールすることができる。 PoETは、深部突然変異スキャンデータセットに関する広範な実験において、変異体の機能予測に関して既存のタンパク質言語モデルや進化的配列モデルより優れており、あらゆるMSA深さのタンパク質にわたって変異効果予測を改善している。

Generative protein language models are a natural way to design new proteins with desired functions. However, current models are either difficult to direct to produce a protein from a specific family of interest, or must be trained on a large multiple sequence alignment (MSA) from the specific family of interest, making them unable to benefit from transfer learning across families. To address this, we propose $\textbf{P}$r$\textbf{o}$tein $\textbf{E}$volutionary $\textbf{T}$ransformer (PoET), an autoregressive generative model of whole protein families that learns to generate sets of related proteins as sequences-of-sequences across tens of millions of natural protein sequence clusters. PoET can be used as a retrieval-augmented language model to generate and score arbitrary modifications conditioned on any protein family of interest, and can extrapolate from short context lengths to generalize well even for small families. This is enabled by a unique Transformer layer; we model tokens sequentially within sequences while attending between sequences order invariantly, allowing PoET to scale to context lengths beyond those used during training. PoET outperforms existing protein language models and evolutionary sequence models for variant function prediction in extensive experiments on deep mutational scanning datasets, improving variant effect prediction across proteins of all MSA depths.

翻訳日:2023-11-02 02:04:28 公開日:2023-10-30

# 分位点回帰による反事実推論の進展

Advancing Counterfactual Inference through Quantile Regression ( http://arxiv.org/abs/2306.05751v2 )

ライセンス: Link先を確認

Shaoan Xie, Biwei Huang, Bin Gu, Tongliang Liu, Kun Zhang

(参考訳) 因果的影響を理解し、利用するためには、反事実的な「what if」問合せに対処する能力が不可欠である。従来の反事実推論は通常、構造因果モデルが利用可能であると仮定する。しかし、実際にはそのような因果モデルはしばしば未知であり、識別できない可能性がある。本稿では,与えられた因果モデルや条件分布の直接推定を必要とせずに,(学習された)定性的因果構造と観測データに基づく信頼性の高い反事実推論を行うことを目的とする。我々は、反事実推論を拡張された分位点回帰問題として捉え直し、ディープニューラルネットワークを用いて実装することで一般的な因果関係とデータ分布を捉える。提案手法は, 既存の手法と比較して優れた統計効率を示し, さらに, 推定された反事実的結果を未知のデータへ一般化する可能性を高め, 一般化誤差の上界を与える。複数のデータセットで実施した実証実験の結果は、我々の理論的な主張に対する説得力のある支持を提供する。

The capacity to address counterfactual "what if" inquiries is crucial for understanding and making use of causal influences. Traditional counterfactual inference usually assumes the availability of a structural causal model. Yet, in practice, such a causal model is often unknown and may not be identifiable. This paper aims to perform reliable counterfactual inference based on the (learned) qualitative causal structure and observational data, without necessitating a given causal model or even the direct estimation of conditional distributions. We re-cast counterfactual reasoning as an extended quantile regression problem, implemented with deep neural networks to capture general causal relationships and data distributions. The proposed approach offers superior statistical efficiency compared to existing ones, and further, it enhances the potential for generalizing the estimated counterfactual outcomes to previously unseen data, providing an upper bound on the generalization error. Empirical results conducted on multiple datasets offer compelling support for our theoretical assertions.
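
For readers unfamiliar with quantile regression, the sketch below shows the standard pinball (quantile) loss that such methods minimize. It is the textbook loss only, not the extended formulation proposed in the paper, and the helper names `pinball_loss` and `best_constant_quantile` are hypothetical, chosen for illustration.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean pinball (quantile) loss for a quantile level tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        u = y - q  # residual: positive when the prediction is too low
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += max(tau * u, (tau - 1.0) * u)
    return total / len(y_true)


def best_constant_quantile(y, tau, candidates):
    """Pick the candidate constant that minimizes the pinball loss on y."""
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))
```

Minimizing this loss over a constant recovers an empirical quantile of the data; replacing the constant with a neural network that takes covariates as input gives a conditional quantile estimator, which is the general setting the abstract builds on.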

翻訳日:2023-11-02 02:03:56 公開日:2023-10-30

# 知識集約型タスクにおける小言語モデルの知識強化推論蒸留

Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks ( http://arxiv.org/abs/2305.18395v2 )

ライセンス: Link先を確認

Minki Kang, Seanie Lee, Jinheon Baek, Kenji Kawaguchi, Sung Ju Hwang

(参考訳) 大規模言語モデル(LLM)は、知識の複合的な理解を必要とする知識集約的推論タスクにおいて、有望な性能を示す。しかし、LLMの実際のアプリケーションへの展開は、高い計算要求とデータプライバシに関する懸念のために困難である可能性がある。従来の研究は、ラベル付きデータで微調整したり、LLMを蒸留することで、タスク固有の小型言語モデル(LM)の構築に重点を置いてきた。しかしながら、これらのアプローチは、必要となる知識を記憶する小型LMの能力に制限があるため、知識集約的推論タスクには不向きである。記憶に関する理論的解析に動機づけられ, 外部知識ベースから検索した知識で拡張したうえで,LLMから得られる根拠(rationale)を生成するように小型LMを微調整する新しい手法であるKARD(Knowledge-Augmented Reasoning Distillation)を提案する。さらに,根拠生成に関連する文書を得るためのニューラルリランカも提案する。我々は、KARDが知識集約推論データセットであるMedQA-USMLE、StrategyQA、OpenbookQAにおいて、小さなT5およびGPTモデルの性能を著しく向上させることを示す。特に,本手法により,250MパラメータのT5モデルは,MedQA-USMLEおよびStrategyQAベンチマークの両方で,12倍のパラメータを持つ微調整済みの3Bモデルを上回る性能を達成する。

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of LLMs in real-world applications can be challenging due to their high computational requirements and concerns about data privacy. Previous studies have focused on building task-specific small Language Models (LMs) by fine-tuning them with labeled data or distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks due to the limited capacity of small LMs to memorize the knowledge required. Motivated by our theoretical analysis on memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales obtained from LLMs with augmented knowledge retrieved from an external knowledge base. Moreover, we further propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and GPT models on the challenging knowledge-intensive reasoning datasets, namely MedQA-USMLE, StrategyQA, and OpenbookQA. Notably, our method enables the 250M T5 models to outperform the fine-tuned 3B models, which have 12 times more parameters, on both MedQA-USMLE and StrategyQA benchmarks.

翻訳日:2023-11-02 02:02:30 公開日:2023-10-30

# グローバル深層学習と患者固有の薬物動態事前知識による治療反応予測

Forecasting Response to Treatment with Global Deep Learning and Patient-Specific Pharmacokinetic Priors ( http://arxiv.org/abs/2309.13135v4 )

ライセンス: Link先を確認

Willa Potosnak, Cristian Challu, Kin G. Olivares, Artur Dubrawski

(参考訳) 予後の早期発見や患者のモニタリングには,医療時系列の予測が不可欠である。しかし、ノイズや間欠的なデータのために予測が難しい場合がある。これらの課題は、薬物投与などの外因性要因によって引き起こされる変化点によって、しばしば悪化する。これらの課題に対処するために,患者固有の治療効果の深層学習モデルを示す,新しいグローバルローカルアーキテクチャと薬物動態エンコーダを提案する。現実的にシミュレーションされた実世界データと実世界データの両方を用いて,血糖予測タスクの精度向上に向けたアプローチの有効性を示す。我々のグローバルローカルアーキテクチャは患者固有のモデルよりも9.2-14.6%改善している。さらに、我々の薬物動態エンコーダは、シミュレーションデータでは4.4%、実世界のデータでは2.1%で代替符号化技術よりも改善されている。提案手法は, 予期せぬ治療反応に対する早期警告の発行や, 薬物吸収および除去特性の観点から, 患者固有の治療効果を特徴付けるなど, 臨床実践において有益である。

Forecasting healthcare time series is crucial for early detection of adverse outcomes and for patient monitoring. Forecasting, however, can be difficult in practice due to noisy and intermittent data. The challenges are often exacerbated by change points induced via extrinsic factors, such as the administration of medication. To address these challenges, we propose a novel hybrid global-local architecture and a pharmacokinetic encoder that informs deep learning models of patient-specific treatment effects. We showcase the efficacy of our approach in achieving significant accuracy gains for a blood glucose forecasting task using both realistically simulated and real-world data. Our global-local architecture improves over patient-specific models by 9.2-14.6%. Additionally, our pharmacokinetic encoder improves over alternative encoding techniques by 4.4% on simulated data and 2.1% on real-world data. The proposed approach can have multiple beneficial applications in clinical practice, such as issuing early warnings about unexpected treatment responses, or helping to characterize patient-specific treatment effects in terms of drug absorption and elimination characteristics.
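As a rough illustration of what a pharmacokinetic input feature might look like, the sketch below computes a drug concentration curve from dose times using the textbook one-compartment model with first-order absorption and elimination. This is an assumption made for illustration only: the abstract does not specify the form of the authors' encoder, and the function name and the parameters `k_a`, `k_e`, and `volume` are hypothetical.

```python
import numpy as np

def concentration_curve(times, dose_times, doses, k_a, k_e, volume=1.0):
    """Sum of one-compartment concentration curves, one per dose.

    Hypothetical helper: k_a is the absorption rate, k_e the elimination
    rate (k_a != k_e), and volume the distribution volume.
    """
    times = np.asarray(times, dtype=float)
    total = np.zeros_like(times)
    for t0, dose in zip(dose_times, doses):
        # Time elapsed since the dose; clipped so that the dose
        # contributes exactly zero before it is administered.
        dt = np.clip(times - t0, 0.0, None)
        scale = dose * k_a / (volume * (k_a - k_e))
        total += scale * (np.exp(-k_e * dt) - np.exp(-k_a * dt))
    return total
```

A curve like this could be appended to the input window of a forecasting model as a time-varying covariate, with `k_a` and `k_e` chosen or learned per patient.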

翻訳日:2023-11-02 01:54:04 公開日:2023-10-30

# テンソルネットワークによる位相双対性

Topological dualities via tensor networks ( http://arxiv.org/abs/2309.13118v2 )

ライセンス: Link先を確認

C. Wille, J. Eisert, A. Altland

(参考訳) トーリック符号の基底状態、二次元クラスd超伝導体の基底状態、および二次元イジングモデルの分割和は互いに双対である。この双対性は、物理学の様々な分野に共通するシステム、すなわち、長い範囲の絡み合った位相秩序、(位相)バンド絶縁体、そして古典的な統計力学を結び付けるため、目覚ましい。フェルミオン系とボソニック系をつなぐ双対性構成は本質的に非局所的であり、1次元への次元還元、共形場理論法、作用素代数など様々なアプローチで対処されている。本研究では,この双対性に対する一元的アプローチを提案し,その主主人公がテンソルネットワーク(tn)であり,中間翻訳者の役割を仮定する。双対性のネットに4番目のノードを導入すると、以下の利点が得られる: 定式化は、双対性のすべてのリンクが等しい基底で扱われること、(場の理論的なアプローチとは異なり)格子の精度で定式化されること、相関関数のマッピングにおいて鍵となる特徴、そしてそれらの可能な数値的実装である。最後に、ボソンからフェルミオンへの通過は、直感的で技術的に便利な形式を仮定する2次元のTNフレームワークで完全に定式化される。本稿では, 位相遷移, 点・線欠陥, 位相境界モード, およびシステムクラス間のマッピング下での他の構造の運命を探ることにより, 形式化の予測可能性を示す。物質リーダシップを念頭に置いて,tnsの概念への最小限の親和性のみを前提として,教育的に構築を紹介する。

The ground state of the toric code, that of the two-dimensional class D superconductor, and the partition sum of the two-dimensional Ising model are dual to each other. This duality is remarkable inasmuch as it connects systems commonly associated with different areas of physics -- that of long-range entangled topological order, (topological) band insulators, and classical statistical mechanics, respectively. Connecting fermionic and bosonic systems, the duality construction is intrinsically non-local, a complication that has been addressed in a plethora of different approaches, including dimensional reduction to one dimension, conformal field theory methods, and operator algebra. In this work, we propose a unified approach to this duality, whose main protagonist is a tensor network (TN) assuming the role of an intermediate translator. Introducing a fourth node into the net of dualities offers several advantages: the formulation is integrative in that all links of the duality are treated on an equal footing, and, unlike field-theoretical approaches, it is formulated with lattice precision, a feature that becomes key in the mapping of correlation functions and in their possible numerical implementation. Finally, the passage from bosons to fermions is formulated entirely within the two-dimensional TN framework, where it assumes an intuitive and technically convenient form. We illustrate the predictive potential of the formalism by exploring the fate of phase transitions, point and line defects, topological boundary modes, and other structures under the mapping between system classes. With a condensed-matter readership in mind, we introduce the construction pedagogically in a manner assuming only minimal familiarity with the concept of TNs.

翻訳日:2023-11-02 01:53:45 公開日:2023-10-30

# 長距離相互作用系におけるロバスト量子多体傷の理論

Theory of robust quantum many-body scars in long-range interacting systems ( http://arxiv.org/abs/2309.12504v2 )

ライセンス: Link先を確認

Alessio Lerose, Tommaso Parolini, Rosario Fazio, Dmitry A. Abanin, Silvia Pappalardi

(参考訳) 量子多体傷(Quantum many-body scars、QMBS)は、特別な非平衡初期状態に対する熱化の違反に関連する量子多体系の例外的なエネルギー固有状態である。彼らの様々な体系的構成は局所ハミルトニアンパラメータの微調整を必要とする。本研究では、長距離相互作用する量子スピン系の設定が、一般に堅牢なQMBSをホストすることを示す。我々は、可解な置換対称極限$\alpha=0$からスピンスピン相互作用のパワー-ロー減衰指数$\alpha$を上げる際のスペクトル特性を解析する。まず、カオスのスペクトル符号が無限小$\alpha$に対して現れるにもかかわらず、大きな集合スピンを持つ$\alpha=0$エネルギー固有状態の塔は、$\alpha$の増加とともに滑らかに変形し、特徴的なQMBS特性を示すことを数値的に証明する。より大きな系におけるこれらの状態の性質と運命を明らかにするために、スピンハミルトニアンを相対論的量子回転子に非線型結合した広範なボソニックモードにマッピングする解析的アプローチを導入する。相互作用する不純物モデルの固有状態を正確に解き、原ハミルトニアンの大スピンセクターにおける自己整合局在を$0<\alpha<d$で示す。本理論は, 任意の系サイズに対するqmbの安定性機構を明らかにし, 動的臨界点近傍や半古典的カオスの存在を予測し, 長距離量子イジングチェーンにおいて数値的に検証する。副生成物として、Floquet-prethermalization定理を超えて、周期駆動下での加熱の有無の予測基準が$0<\alpha<d$である。この作業のより広い視点は、ここで開発された技術ツールボックスの独立した応用から、実験ルートの通知から、メトロロジー的に有用なマルチパートの絡み合いまで幅広い。

Quantum many-body scars (QMBS) are exceptional energy eigenstates of quantum many-body systems associated with violations of thermalization for special non-equilibrium initial states. Their various systematic constructions require fine-tuning of local Hamiltonian parameters. In this work we demonstrate that the setting of long-range interacting quantum spin systems generically hosts robust QMBS. We analyze spectral properties upon raising the power-law decay exponent $\alpha$ of spin-spin interactions from the solvable permutationally-symmetric limit $\alpha=0$. First, we numerically establish that although spectral signatures of chaos appear for infinitesimal $\alpha$, the towers of $\alpha=0$ energy eigenstates with large collective spin are smoothly deformed as $\alpha$ is increased, and exhibit characteristic QMBS features. To elucidate the nature and fate of these states in larger systems, we introduce an analytical approach based on mapping the spin Hamiltonian onto a relativistic quantum rotor non-linearly coupled to an extensive set of bosonic modes. We exactly solve for the eigenstates of this interacting impurity model, and show their self-consistent localization in large-spin sectors of the original Hamiltonian for $0<\alpha<d$. Our theory unveils the stability mechanism of such QMBS for arbitrary system size and predicts instances of its breakdown, e.g., near dynamical critical points or in the presence of semiclassical chaos, which we verify numerically in long-range quantum Ising chains. As a byproduct, we find a predictive criterion for the presence or absence of heating under periodic driving for $0<\alpha<d$, beyond existing Floquet-prethermalization theorems. Broader perspectives of this work range from independent applications of the technical toolbox developed here to informing experimental routes to metrologically useful multipartite entanglement.

翻訳日:2023-11-02 01:53:13 公開日:2023-10-30

# 選択アーティファクトとしてのベル相関

Bell Correlations as Selection Artefacts ( http://arxiv.org/abs/2309.10969v2 )

ライセンス: Link先を確認

Huw Price and Ken Wharton

(参考訳) ベル相関は、実験の初期状態に対する通常の制御によって生じる特殊な選択アーティファクトとして現れうることを示す。これにより、直接的な空間的(spacelike)な因果関係や影響に頼ることなく、非局所性を説明できる。この議論は、(arXiv:2101.05370v4 [quant-ph], arXiv:2212.06986 [quant-ph]) における以前の提案を、次の2つの主な点で改善する。 (i)実際のベル実験への適用を示すこと、及び (ii)レトロカウサリティの仮定を不要にすること。

We show that Bell correlations may arise as a special sort of selection artefact, produced by ordinary control of the initial state of the experiments concerned. This accounts for nonlocality, without recourse to any direct spacelike causality or influence. The argument improves an earlier proposal in (arXiv:2101.05370v4 [quant-ph], arXiv:2212.06986 [quant-ph]) in two main respects: (i) in demonstrating its application in a real Bell experiment; and (ii) in avoiding the need for a postulate of retrocausality.

翻訳日:2023-11-02 01:52:38 公開日:2023-10-30

# テキストから画像への空間制御のためのマスキング・アテンション拡散指導

Masked-Attention Diffusion Guidance for Spatially Controlling Text-to-Image Generation ( http://arxiv.org/abs/2308.06027v2 )

ライセンス: Link先を確認

Yuki Endo

(参考訳) テキストから画像への合成は,最近の拡散モデルの発展に伴い,高品質な結果が得られた。しかし、テキスト入力だけでは空間的曖昧性が高く、ユーザー制御性は限られている。既存の手法では、視覚誘導(スケッチやセマンティックマスクなど)の追加による空間制御が可能だが、注釈付き画像による追加の訓練が必要となる。本稿では,拡散モデルのさらなる訓練を行わずにテキスト対画像生成を空間的に制御する手法を提案する。本手法は,クロスアテンションマップが単語と画素の位置関係を反映しているという知見に基づく。我々の目的は、与えられたセマンティックマスクやテキストプロンプトに従ってアテンションマップを制御することである。この目的のために、まず、意味領域から計算された定数マップと交差注意マップを直接置き換える簡単なアプローチを探求する。いくつかの先行研究は、クロスアテンションマップを直接操作することで、テキストと画像の拡散モデルのトレーニング不要な空間制御を可能にする。しかし、これらのアプローチは、操作された注意マップが拡散モデルによって学習された実際のものとは程遠いため、与えられたマスクに対する誤解に苦しめられている。この問題に対処するために,拡散モデルに入力された雑音画像を操作することで,各単語や画素への注意を間接的に制御することで,セマンティックマスクに忠実な画像を生成するマスク注意誘導を提案する。 masked-attention guidanceは、事前訓練されたオフザシェルフ拡散モデル(例えば、安定拡散)に容易に統合でき、テキスト誘導画像編集のタスクに適用できる。実験により,本手法は質的および定量的にベースラインよりも高精度な空間制御が可能となった。

Text-to-image synthesis has achieved high-quality results with recent advances in diffusion models. However, text input alone has high spatial ambiguity and limited user controllability. Most existing methods allow spatial control through additional visual guidance (e.g., sketches and semantic masks) but require additional training with annotated images. In this paper, we propose a method for spatially controlling text-to-image generation without further training of diffusion models. Our method is based on the insight that the cross-attention maps reflect the positional relationship between words and pixels. Our aim is to control the attention maps according to given semantic masks and text prompts. To this end, we first explore a simple approach of directly swapping the cross-attention maps with constant maps computed from the semantic regions. Some prior works also allow training-free spatial control of text-to-image diffusion models by directly manipulating cross-attention maps. However, these approaches still suffer from misalignment to given masks because manipulated attention maps are far from actual ones learned by diffusion models. To address this issue, we propose masked-attention guidance, which can generate images more faithful to semantic masks via indirect control of attention to each word and pixel by manipulating noise images fed to diffusion models. Masked-attention guidance can be easily integrated into pre-trained off-the-shelf diffusion models (e.g., Stable Diffusion) and applied to the tasks of text-guided image editing. Experiments show that our method enables more accurate spatial control than baselines qualitatively and quantitatively.

翻訳日:2023-11-02 01:51:47 公開日:2023-10-30

# 視覚変換器を用いたマルチモーダルからモノモーダルリンパ腫サブタイプモデルへの知識伝達フレームワーク

A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models ( http://arxiv.org/abs/2308.01328v2 )

ライセンス: Link先を確認

Bilel Guetarni, Feryal Windal, Halim Benhabiles, Marianne Petit, Romain Dubois, Emmanuelle Leteurtre, Dominique Collard

(参考訳) リンパ腫の亜型を決定することは、治療をより適切に選択し、患者の生存可能性を高めるための重要なステップである。この文脈では、遺伝子発現技術に基づく既存のゴールドスタンダード診断法は、コストが高く時間を要するため、利用しにくい。IHC(免疫組織化学)技術に基づく代替診断法(WHOが推奨)は存在するが、同様の制限があり、精度も低い。深層学習モデルによるWSI(Whole Slide Image)分析は、既存の代替手法よりも安価で高速ながん診断の新しい方向性を示している。本研究では、高解像度WSIからDLBCL(Diffuse Large B-Cell Lymphoma)のサブタイプを識別するための、ビジョントランスフォーマーに基づくフレームワークを提案する。この目的のために、様々なWSIモダリティから分類器モデルを訓練するためのマルチモーダルアーキテクチャを提案する。次に、このモデルを知識蒸留機構を通じて利用し、モノモーダル分類器の学習を効率的に導く。157人の患者のデータセットで行った実験では、我々のモノモーダル分類モデルが、がん分類に関する最新の6つの手法を上回る有望な性能を示した。さらに、本実験データから推定したパワーロー曲線は、妥当な数の追加患者からの訓練データが増えれば、我々のモデルがIHC技術と同等の診断精度を達成する可能性を示唆している。

Determining lymphoma subtypes is a crucial step toward better-targeted treatment that can potentially increase patients' survival chances. In this context, the existing gold standard diagnosis method, which is based on gene expression technology, is highly expensive and time-consuming, which limits its accessibility. Although alternative diagnosis methods based on IHC (immunohistochemistry) technologies exist (recommended by the WHO), they still suffer from similar limitations and are less accurate. WSI (Whole Slide Image) analysis by deep learning models has shown promising new directions for cancer diagnosis that would be cheaper and faster than existing alternative methods. In this work, we propose a vision transformer-based framework for distinguishing DLBCL (Diffuse Large B-Cell Lymphoma) cancer subtypes from high-resolution WSIs. To this end, we propose a multi-modal architecture to train a classifier model from various WSI modalities. We then exploit this model through a knowledge distillation mechanism to efficiently drive the learning of a mono-modal classifier. Our experimental study conducted on a dataset of 157 patients shows the promising performance of our mono-modal classification model, which outperforms six recent state-of-the-art methods dedicated to cancer classification. Moreover, the power-law curve estimated on our experimental data suggests that with more training data from a reasonable number of additional patients, our model has the potential to achieve diagnostic accuracy comparable to that of IHC technologies.

翻訳日:2023-11-02 01:50:26 公開日:2023-10-30

# mlic++: 学習画像圧縮のための線形複雑性マルチリファレンスエントロピーモデリング

MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression ( http://arxiv.org/abs/2307.15421v3 )

ライセンス: Link先を確認

Wei Jiang, Ronggang Wang

(参考訳) 近年,チャネルワイド,局所空間,大域空間相関を捉えるマルチ参照エントロピーモデルが提案されている。以前の研究では、グローバル相関キャプチャに注意が払われているが、二次複雑性は高解像度画像符号化の可能性を制限する。本稿では,softmax 操作の分解を通じて,線形複雑性大域的相関をキャプチャする手法を提案する。そこで我々はMLIC$^{++}$を提案し,マルチ参照エントロピーモデリングのための線形複雑度を持つ画像圧縮手法を提案する。我々のMLIC$^{++}$はより効率的で、PSNRで測定した場合のVTM-17.0と比較して、KodakデータセットのBDレートを13.39%削減する。コードはhttps://github.com/JiangWeibeta/MLICで入手できる。

Recently, the multi-reference entropy model has been proposed, which captures channel-wise, local spatial, and global spatial correlations. Previous works adopt attention to capture global correlations; however, its quadratic complexity limits the potential of high-resolution image coding. In this paper, we propose capturing global correlations with linear complexity via a decomposition of the softmax operation. Building on this, we propose MLIC$^{++}$, a learned image compression method with linear complexity for multi-reference entropy modeling. MLIC$^{++}$ is more efficient and reduces BD-rate by 13.39% on the Kodak dataset compared to VTM-17.0 when measured in PSNR. Code is available at https://github.com/JiangWeibeta/MLIC.
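To make the idea of linear-complexity attention through a softmax decomposition concrete, here is a minimal NumPy sketch of one common factorization: softmax is applied to the queries over the feature axis and to the keys over the position axis, so the key-value product can be formed first. This is a generic illustration and an assumption on my part; it is not taken from the MLIC++ code and may differ from the paper's exact formulation. The function names are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def linear_attention(q, k, v):
    """q, k: (n, d); v: (n, d_v). Never builds an n x n matrix."""
    q_norm = softmax(q, axis=1)   # normalize each query over features
    k_norm = softmax(k, axis=0)   # normalize each feature over positions
    context = k_norm.T @ v        # (d, d_v) global summary, O(n * d * d_v)
    return q_norm @ context       # (n, d_v)

def quadratic_reference(q, k, v):
    """Same result computed through the explicit n x n attention matrix."""
    attention = softmax(q, axis=1) @ softmax(k, axis=0).T  # (n, n)
    return attention @ v
```

Because matrix multiplication is associative, both functions return the same values, but `linear_attention` scales linearly with the number of positions `n`, which is what matters for high-resolution images.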

翻訳日:2023-11-02 01:49:59 公開日:2023-10-30

# モデルベースツリーマルコフモデルを用いた透明シーケンスモデルに向けて

Toward Transparent Sequence Models with Model-Based Tree Markov Model ( http://arxiv.org/abs/2307.15367v2 )

ライセンス: Link先を確認

Chan Hsu, Wei-Chun Huang, Jun-Ting Wu, Chih-Yuan Li, Yihuang Kang

(参考訳) 本研究では,シーケンスデータに適用した複雑なブラックボックス機械学習モデルにおける解釈可能性の問題に対処する。モデルベース木隠れセミマルコフモデル(MOB-HSMM)は,高死亡リスク事象の検出と集中治療室(ICU)の死亡リスクに関連する隠れパターンの発見を目的とした,本質的に解釈可能なモデルである。このモデルは、Deep Neural Networks (DNN)から抽出した知識を活用し、明確な説明を提供しながら予測性能を向上させる。実験の結果,モデルベースツリー(MOB木)の性能はLSTMを用いて逐次パターンを学習し,MOB木に転送することで向上した。 MOB-HSMMでHidden Semi-Markov Model (HSMM) とMOBツリーを統合することで、利用可能な情報を用いて潜在的および説明可能なシーケンスを明らかにすることができる。

In this study, we address the interpretability issue in complex, black-box Machine Learning models applied to sequence data. We introduce the Model-Based tree Hidden Semi-Markov Model (MOB-HSMM), an inherently interpretable model aimed at detecting high mortality risk events and discovering hidden patterns associated with the mortality risk in Intensive Care Units (ICU). This model leverages knowledge distilled from Deep Neural Networks (DNN) to enhance predictive performance while offering clear explanations. Our experimental results indicate the improved performance of Model-Based trees (MOB trees) via employing LSTM for learning sequential patterns, which are then transferred to MOB trees. Integrating MOB trees with the Hidden Semi-Markov Model (HSMM) in the MOB-HSMM enables uncovering potential and explainable sequences using available information.

翻訳日:2023-11-02 01:49:42 公開日:2023-10-30

# 超広帯域における量子情報の多重処理

Multiplexed Processing of Quantum Information Across an Ultra-wide Optical Bandwidth ( http://arxiv.org/abs/2310.17819v2 )

ライセンス: Link先を確認

Alon Eldan, Ofek Gilon, Asher Lagimi, Elai Forman, Avi Pe'er

(参考訳) 量子情報処理は量子技術の基礎である。量子情報のプロトコルは、セキュアな通信(量子鍵分布)、テレポート量子状態、および量子計算の中心となる2つの遠隔者間で秘密を共有する。様々な量子通信プロトコルがすでに実現され、商用化されているが、その通信速度は一般的には、利用可能な量子光学光源(10-100 THz)の光帯域よりも低いMHzからGHzの範囲における測定装置の狭い電子帯域幅によって制限されている。本稿では、パラメトリックホモダイン検出による全チャネルの同時測定により、これらのブロードバンドソースを並列に多重周波数チャネル上に並列に処理する効率的な方法を提案する。具体的には、多重連続可変量子鍵分布(CV-QKD)と多重連続可変量子テレポーテーションプロトコルの2つの基本プロトコルを提案する。そこで本研究では,23以上の非相関スペクトルチャネルに対するqkdの検証に成功し,いずれにおいても盗聴を検知する能力を示した。これらの多重化手法(および類似)は、数百のチャネル上で並列に量子処理を実行し、量子プロトコルのスループットを桁違いに増加させる可能性がある。

Quantum information processing is the foundation of quantum technology. Protocols of quantum information share secrets between two distant parties for secure communication (quantum key distribution), teleport quantum states, and stand at the heart of quantum computation. While various protocols of quantum communication have already been realized, and even commercialized, their communication speed is generally low, limited by the narrow electronic bandwidth of the measurement apparatus in the MHz-to-GHz range, which is orders of magnitude lower than the optical bandwidth of available quantum optical sources (10-100 THz). We present and demonstrate an efficient method to process quantum information with such broadband sources in parallel over multiplexed frequency channels using parametric homodyne detection for simultaneous measurement of all the channels. Specifically, we propose two basic protocols: a multiplexed continuous-variable quantum key distribution (CV-QKD) protocol and a multiplexed continuous-variable quantum teleportation protocol. We demonstrate the multiplexed CV-QKD protocol in a proof-of-principle experiment, where we successfully carry out QKD over 23 uncorrelated spectral channels and show the ability to detect eavesdropping in any of them. These multiplexed methods (and similar ones) will enable quantum processing to be carried out in parallel over hundreds of channels, potentially increasing the throughput of quantum protocols by orders of magnitude.

翻訳日:2023-11-02 01:41:14 公開日:2023-10-30

# 正準量子化はGKSL力学につながるか?

Does canonical quantization lead to GKSL dynamics? ( http://arxiv.org/abs/2310.17061v2 )

ライセンス: Link先を確認

T. Koide and F. Nicacio

(参考訳) 熱力学的に一貫した熱緩和過程を記述するためのブラウン運動の一般化された古典モデルを導入する。このモデルに正準量子化を適用すると、密度演算子の量子方程式が得られる。この方程式は定常解として熱平衡状態を持つが、時間進化は必ずしも完全正のトレース保存(CPTP)写像であるとは限らない。しかし、高調波振動子ポテンシャルの適用においては、CPTPマップの要件はパラメータの選択によって適切に満たされ、その後、詳細なバランス条件を満たすGorini-Kossakowski-Sudarshan-Lindblad(GKSL)方程式を再現する。この結果は、熱緩和過程における量子古典的対応を示唆し、デコヒーレンスの研究に新たな洞察を与える。

We introduce a generalized classical model of Brownian motion for describing thermal relaxation processes which is thermodynamically consistent. Applying canonical quantization to this model, a quantum equation for the density operator is obtained. This equation has a thermal equilibrium state as its stationary solution, but the time evolution is not necessarily a Completely Positive and Trace-Preserving (CPTP) map. In the application to the harmonic oscillator potential, however, the requirement of the CPTP map is shown to be satisfied by choosing parameters appropriately, and our equation then reproduces a Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) equation satisfying the detailed balance condition. This result suggests a quantum-classical correspondence in thermal relaxation processes and will provide new insight into the study of decoherence.
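For reference, the general GKSL form that the abstract refers to is written below in standard textbook notation. The symbols $H$ (system Hamiltonian), $L_k$ (jump operators), and $\gamma_k$ (rates) are generic placeholders, not the specific operators derived in the paper.

```latex
\frac{d\rho}{dt}
  = -\frac{i}{\hbar}\,[H,\rho]
  + \sum_k \gamma_k \left( L_k \rho L_k^{\dagger}
  - \frac{1}{2}\left\{ L_k^{\dagger} L_k,\ \rho \right\} \right),
\qquad \gamma_k \ge 0 .
```

An equation of this form with non-negative rates generates a CPTP evolution, which is why recovering it for the harmonic oscillator is the relevant consistency check.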

翻訳日:2023-11-02 01:40:52 公開日:2023-10-30

# ディープスパースネットワークのためのハイブリッド粒度特徴対話選択に向けて

Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network ( http://arxiv.org/abs/2310.15342v2 )

ライセンス: Link先を確認

Fuyuan Lyu, Xing Tang, Dugang Liu, Chen Ma, Weihong Luo, Liang Chen, Xiuqiang He, Xue Liu

(参考訳) ディープスパースネットワークは、高次元スパース特徴を有する予測タスクのためのニューラルネットワークアーキテクチャとして広く研究されており、特徴相互作用の選択はその重要な構成要素である。従来の手法は主に粗粒度空間における特徴相互作用の探索方法に重点を置いていたが、より細かい粒度にはあまり注意が払われていない。本研究では、ディープスパースネットワークにおいて特徴フィールドと特徴値の両方を対象とする、ハイブリッド粒度の特徴相互作用選択手法を提案する。このような広大な空間を探索するために、オンザフライで計算される分解空間を提案する。さらに、OptFeatureと呼ばれる選択アルゴリズムを開発し、特徴フィールドと特徴値の両方から特徴相互作用を同時に効率よく選択する。3つの大規模な実世界のベンチマークデータセットでの実験の結果、OptFeatureは精度と効率の点で良好に機能することが示された。追加の検証も本手法の実現可能性を支持している。

Deep sparse networks are widely investigated as a neural network architecture for prediction tasks with high-dimensional sparse features, with which feature interaction selection is a critical component. While previous methods primarily focus on how to search feature interaction in a coarse-grained space, less attention has been given to a finer granularity. In this work, we introduce a hybrid-grained feature interaction selection approach that targets both feature field and feature value for deep sparse networks. To explore such expansive space, we propose a decomposed space which is calculated on the fly. We then develop a selection algorithm called OptFeature, which efficiently selects the feature interaction from both the feature field and the feature value simultaneously. Results from experiments on three large real-world benchmark datasets demonstrate that OptFeature performs well in terms of accuracy and efficiency. Additional studies support the feasibility of our method.

翻訳日:2023-11-02 01:40:12 公開日:2023-10-30

# 熱的パーセル効果とキャビティ誘起の散逸の繰り込み

Thermal Purcell effect and cavity-induced renormalization of dissipations ( http://arxiv.org/abs/2310.15184v2 )

ライセンス: Link先を確認

Giuliano Chiriacò

(参考訳) 近年、組み込み量子材料の性質と位相を操作するためのツールとして、光学キャビティに大きな関心が寄せられている。パーセル効果のため、キャビティは光子相空間を変化させ、そのため材料内の電磁遷移速度を変化させ、光子環境との熱放射の交換速度を変化させる。ここでは, 物質が吸収する放射熱の簡易表現を導出し, キャビティの存在変化について検討し, 適切なキャビティジオメトリーのために劇的に拡張されたことを示す。この効果を典型的なエネルギー散逸過程と比較し, キャビティに結合した材料の温度への影響を確かめ, 1T-TaS$_2$に適用するための基準を与える。

In recent years there has been great interest towards optical cavities as a tool to manipulate the properties and phases of embedded quantum materials. Due to the Purcell effect, a cavity changes the photon phase space and thus the rate of electromagnetic transitions within the material, modifying the exchange rate of heat radiation with the photon environment. Here, I derive a simple expression for the radiative heat power absorbed by the material, investigate how it changes in the presence of a cavity and show that it is enhanced dramatically for appropriate cavity geometries. I compare this effect with typical energy dissipation processes, provide a criterion to establish its impact on the temperature of a material coupled to the cavity and apply it to 1T-TaS$_2$.

翻訳日:2023-11-02 01:39:56 公開日:2023-10-30

# エネルギー効率の良い基地局セルスイッチングのための適応動的プログラミング

Adaptive Dynamic Programming for Energy-Efficient Base Station Cell Switching ( http://arxiv.org/abs/2310.12999v2 )

ライセンス: Link先を確認

Junliang Luo, Yi Tian Xu, Di Wu, Michael Jenkin, Xue Liu, Gregory Dudek

(参考訳) 次世代セルラーネットワークの需要の増加、環境・規制上の懸念、地政学的緊張から生じる潜在的なエネルギー危機などにより、無線ネットワークにおける省エネルギーの重要性が高まっている。本稿では,基地局のセルをオン/オフしてネットワーク電力消費量を削減し,qos(quality of service)メトリクスを維持しつつ,オンライン最適化と組み合わせた近似動的プログラミング(adp)ベースの手法を提案する。各状態-動作ペアに与えられた多層パーセプトロン(mlp)を用いて消費電力を予測し、最適な期待電力を節約した動作を選択するためのadpの値関数を近似する。 QoSを劣化させることなく最大の電力消費を抑えるため、QoSを予測するための別のMLPとハンドオーバを予測するための長期短期メモリ(LSTM)をオンライン最適化アルゴリズムに組み込み、QoS履歴に基づいてセル切替動作をフィルタリングする適応QoS閾値を生成する。本手法の性能は,動的トラヒックパターンを用いた実世界シナリオを用いた実用ネットワークシミュレータを用いて評価する。

Energy saving in wireless networks is growing in importance due to increasing demand for evolving new-gen cellular networks, environmental and regulatory concerns, and potential energy crises arising from geopolitical tensions. In this work, we propose an approximate dynamic programming (ADP)-based method coupled with online optimization to switch on/off the cells of base stations to reduce network power consumption while maintaining adequate Quality of Service (QoS) metrics. We use a multilayer perceptron (MLP) given each state-action pair to predict the power consumption to approximate the value function in ADP for selecting the action with optimal expected power saved. To save the largest possible power consumption without deteriorating QoS, we include another MLP to predict QoS and a long short-term memory (LSTM) for predicting handovers, incorporated into an online optimization algorithm producing an adaptive QoS threshold for filtering cell switching actions based on the overall QoS history. The performance of the method is evaluated using a practical network simulator with various real-world scenarios with dynamic traffic patterns.
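The decision step described in the abstract (filter actions by a QoS threshold, then pick the one with the best predicted saving) can be sketched as follows. Everything here is a simplified assumption: the predictions are passed in as plain dictionaries instead of coming from the MLP and LSTM models, and the threshold rule (recent mean minus a tolerance, bounded below by a floor) is a hypothetical stand-in for the paper's online optimization.

```python
from statistics import mean

def adaptive_qos_threshold(qos_history, floor=0.8, tolerance=0.05, window=3):
    """Hypothetical rule: recent average QoS minus a tolerance, never below floor."""
    recent = qos_history[-window:]
    if not recent:
        return floor
    return max(floor, mean(recent) - tolerance)

def select_switching_action(actions, predicted_saving, predicted_qos, threshold):
    """Return the feasible action with the largest predicted power saving.

    predicted_saving and predicted_qos map each action to a model output;
    here they are ordinary dicts used as placeholders for learned predictors.
    """
    feasible = [a for a in actions if predicted_qos[a] >= threshold]
    if not feasible:
        return None  # no action keeps QoS acceptable
    return max(feasible, key=lambda a: predicted_saving[a])
```

For example, with a recent QoS history around 0.94 the threshold becomes roughly 0.89, so an aggressive action predicted to drop QoS to 0.85 is filtered out even if it would save the most power.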

翻訳日:2023-11-02 01:39:41 公開日:2023-10-30

# 頻度・重大度データを用いた保険価格決定のためのニューラルネットワーク:データ前処理から技術関税へのベンチマーク研究

Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff ( http://arxiv.org/abs/2310.12671v2 )

ライセンス: Link先を確認

Freek Holvoet, Katrien Antonio and Roel Henckaerts

(参考訳) 保険会社は通常、クレームの頻度と重大度データをモデル化するための一般化線形モデルに目を向ける。他の分野での成功により、アクチュアルなツールボックス内で機械学習技術が人気を集めている。本論文は,深層学習構造を用いた機械学習による周波数分割保険価格に関する文献に寄与する。本稿では,複数種類の入力特徴が存在する場合に,頻度と重大度を目標とした4つの保険データセットに関するベンチマーク研究を行う。本研究では,バイナリ入力データに対する一般化線形モデル,勾配ブースト木モデル,フィードフォワードニューラルネットワーク(ffnn)および複合型アクチュアルニューラルネットワーク(cann)の性能比較を行った。我々のCANNは、それぞれGLMとGBMと確立されたベースライン予測とニューラルネットワークの補正を組み合わせる。本稿では, 郵便番号, 数値, カテゴリー共変量などの表型保険データに典型的に存在する複数の入力特徴に着目して, データ前処理のステップを説明する。オートエンコーダはニューラルネットワークにカテゴリ変数を埋め込むのに使われ、周波数重大設定でその潜在的な利点を探る。最後に,ニューラルネットの頻度と重大度モデルのためのグローバルサーロゲートモデルを構築した。これらのサロゲートは、FFNNやCANNが捉えた重要な洞察をGLMに翻訳することができる。そのため、技術的関税表は、実際に容易に展開できるものである。

Insurers usually turn to generalized linear models for modelling claim frequency and severity data. Due to their success in other fields, machine learning techniques are gaining popularity within the actuarial toolbox. Our paper contributes to the literature on frequency-severity insurance pricing with machine learning via deep learning structures. We present a benchmark study on four insurance data sets with frequency and severity targets in the presence of multiple types of input features. We compare in detail the performance of: a generalized linear model on binned input data, a gradient-boosted tree model, a feed-forward neural network (FFNN), and the combined actuarial neural network (CANN). Our CANNs combine a baseline prediction established with a GLM and GBM, respectively, with a neural network correction. We explain the data preprocessing steps with specific focus on the multiple types of input features typically present in tabular insurance data sets, such as postal codes, numeric and categorical covariates. Autoencoders are used to embed the categorical variables into the neural network and we explore their potential advantages in a frequency-severity setting. Finally, we construct global surrogate models for the neural nets' frequency and severity models. These surrogates enable the translation of the essential insights captured by the FFNNs or CANNs to GLMs. As such, a technical tariff table results that can easily be deployed in practice.

翻訳日:2023-11-02 01:39:20 公開日:2023-10-30

# コスト効果のあるTCR-Epitope結合親和性予測のためのアクティブラーニングフレームワーク

Active Learning Framework for Cost-Effective TCR-Epitope Binding Affinity Prediction ( http://arxiv.org/abs/2310.10893v2 )

ライセンス: Link先を確認

Pengfei Zhang, Seojin Bang and Heewook Lee

(参考訳) T細胞受容体(TCR)は、宿主細胞表面に提示されるエピトープ配列を認識して脅威に応答する、適応免疫系の重要な構成要素である。近年、機械学習・深層学習によるTCRとエピトープの結合親和性の計算的予測が注目されている。しかし、その成功は、注釈付きTCR-エピトープペアの大規模なコレクションの欠如によって妨げられている。結合親和性のアノテーションには、高価で時間を要するウェットラボでの評価が必要である。アノテーションコストを削減するため、アクティブラーニングとTCR-エピトープ結合親和性予測モデルを組み合わせたフレームワークであるActiveTCRを提案する。ラベル付きトレーニングペアの小さなセットから始めて、ActiveTCRはアノテーションする価値のあるラベルなしTCR-エピトープペアを反復的に探索する。その狙いは、アノテーションのコストを最小化しながら、性能の向上を最大化することである。4つのクエリ戦略をランダムサンプリングのベースラインと比較し、ActiveTCRがアノテーションコストを約40%削減できることを実証した。さらに、TCR-エピトープペアの正解ラベルをクエリ戦略に与えることで、モデル性能を損なうことなく、すでに注釈付けされたペアの40%以上の冗長性を特定して削減でき、より少ない訓練データで同等に強力な予測モデルを訓練できることを示した。本研究は、TCR-エピトープ結合親和性予測のためのデータ最適化に関する最初の体系的調査である。

T cell receptors (TCRs) are critical components of adaptive immune systems, responsible for responding to threats by recognizing epitope sequences presented on the host cell surface. Computational prediction of binding affinity between TCRs and epitope sequences using machine/deep learning has attracted intense attention recently. However, its success is hindered by the lack of large collections of annotated TCR-epitope pairs. Annotating their binding affinity requires expensive and time-consuming wet-lab evaluation. To reduce annotation cost, we present ActiveTCR, a framework that incorporates active learning and TCR-epitope binding affinity prediction models. Starting with a small set of labeled training pairs, ActiveTCR iteratively searches for unlabeled TCR-epitope pairs that are "worth" annotating. It aims to maximize performance gains while minimizing the cost of annotation. We compared four query strategies with a random sampling baseline and demonstrated that ActiveTCR reduces annotation costs by approximately 40%. Furthermore, we showed that providing ground truth labels of TCR-epitope pairs to query strategies can help identify and reduce more than 40% redundancy among already annotated pairs without compromising model performance, enabling users to train equally powerful prediction models with less training data. Our work is the first systematic investigation of data optimization for TCR-epitope binding affinity prediction.
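The iterative search for pairs worth annotating can be illustrated with a minimal active learning round. The sketch below uses plain uncertainty sampling (predicted binding probability closest to 0.5) as the query strategy; this is an assumed example and not necessarily one of the four strategies evaluated in the paper. The callables `predict_proba` and `annotate` are hypothetical placeholders for a binding model and a wet-lab labeling step.

```python
import numpy as np

def query_most_uncertain(probabilities, batch_size):
    """Indices of the pool items whose predicted probability is nearest 0.5."""
    distance = np.abs(np.asarray(probabilities, dtype=float) - 0.5)
    order = np.argsort(distance, kind="stable")
    return order[:batch_size].tolist()

def active_learning_round(labeled, pool, predict_proba, annotate, batch_size):
    """Move the most uncertain pool items into the labeled set.

    Returns the extended labeled list and the remaining unlabeled pool.
    """
    picked = set(query_most_uncertain(predict_proba(pool), batch_size))
    newly_labeled = [(pool[i], annotate(pool[i])) for i in sorted(picked)]
    remaining = [item for i, item in enumerate(pool) if i not in picked]
    return labeled + newly_labeled, remaining
```

In a full loop, the model would be retrained on the extended labeled set after each round, and the rounds would stop once the annotation budget is spent or the validation performance plateaus.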

翻訳日:2023-11-02 01:39:01 公開日:2023-10-30

# min max相関クラスタリングのための4近似アルゴリズム

A 4-approximation algorithm for min max correlation clustering ( http://arxiv.org/abs/2310.09196v2 )

ライセンス: Link先を確認

Holger Heidrich, Jannik Irmai, Bjoern Andres

(参考訳) 本稿では,min max相関クラスタリング問題に対する下界を求める手法を提案し,この手法に基づき,完全グラフに対する組合せ的4近似アルゴリズムを提案する。これは,線形計画法の定式化を用いた5(Kalhan et al., 2019)と,組合せアルゴリズムによる40(Davies et al., 2023)という,これまで知られていた最良の近似保証を改善する。我々はこのアルゴリズムを貪欲な結合ヒューリスティックで拡張し,いくつかのベンチマークデータセットにおいて,解の品質と実行時間の両面で最先端を上回ることを実証的に示す。

We introduce a lower bounding technique for the min max correlation clustering problem and, based on this technique, a combinatorial 4-approximation algorithm for complete graphs. This improves upon the previous best known approximation guarantees of 5, using a linear program formulation (Kalhan et al., 2019), and 40, for a combinatorial algorithm (Davies et al., 2023). We extend this algorithm by a greedy joining heuristic and show empirically that it improves the state of the art in solution quality and runtime on several benchmark datasets.
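To clarify what is being approximated, the sketch below evaluates the min max correlation clustering objective for a given clustering of a complete graph: each vertex counts its disagreements (positive edges leaving its cluster plus negative edges inside it), and the objective is the largest such count. It only evaluates the objective; it is not the 4-approximation algorithm from the paper. The function name and the input encoding are assumptions.

```python
from itertools import combinations

def max_vertex_disagreement(n, positive_edges, cluster_of):
    """Largest number of disagreeing edges at any single vertex.

    n: number of vertices, labeled 0..n-1 (complete graph).
    positive_edges: iterable of (u, v) pairs labeled '+'; all others are '-'.
    cluster_of: sequence mapping each vertex to its cluster id.
    """
    positive = {frozenset(edge) for edge in positive_edges}
    disagreements = [0] * n
    for u, v in combinations(range(n), 2):
        same_cluster = cluster_of[u] == cluster_of[v]
        is_positive = frozenset((u, v)) in positive
        # A '+' edge should stay inside a cluster, a '-' edge should be cut.
        if same_cluster != is_positive:
            disagreements[u] += 1
            disagreements[v] += 1
    return max(disagreements, default=0)
```

On a path of positive edges 0-1-2-3, splitting into {0, 1} and {2, 3} cuts only one positive edge, so the worst vertex has a single disagreement, whereas a single cluster or all singletons gives a worst-case count of two.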

翻訳日:2023-11-02 01:38:37 公開日:2023-10-30

# 連続変数、離散変数、カテゴリー変数を混合した制約付き最適化問題に対するベイズ的品質・多様性アプローチ

Bayesian Quality-Diversity approaches for constrained optimization problems with mixed continuous, discrete and categorical variables ( http://arxiv.org/abs/2310.05955v2 )

ライセンス: Link先を確認

Loic Brevault and Mathieu Balesdent

(参考訳) 航空宇宙工学、土木工学、エネルギー工学などの複雑な設計問題では、設計するシステムの振る舞いや性能を予測するために、計算コストの高いシミュレーションコードを使用する必要がある。システムの設計を行うために、これらのコードは最適化プロセスに組み込まれ、設計制約を満たしながら最適な設計を提供することが多い。近年、設計空間の探索を強化し、特徴関数に関して多様化された最適解の集合を提供するために、品質多様性(Quality-Diversity)と呼ばれる新しいアプローチが提案されている。これらの特徴関数はトレードオフを評価するうえで有用である。さらに、複雑なエンジニアリング設計問題には、連続、離散、カテゴリーの設計変数が混在することが多く、これにより最適化問題において技術的な選択を考慮できる。本稿では、連続・離散・カテゴリー変数が混在するベイズ最適化戦略に基づく新しい品質多様性手法を提案する。このアプローチは、離散的な選択と制約を扱いながら、古典的な品質多様性アプローチに比べて計算コストを削減できる。提案手法の性能は、解析的問題のベンチマークと、航空宇宙システムを扱う産業設計最適化問題で評価される。

Complex engineering design problems, such as those involved in aerospace, civil, or energy engineering, require the use of numerically costly simulation codes in order to predict the behavior and performance of the system to be designed. To perform the design of the systems, these codes are often embedded into an optimization process to provide the best design while satisfying the design constraints. Recently, new approaches, called Quality-Diversity, have been proposed in order to enhance the exploration of the design space and to provide a set of optimal diversified solutions with respect to some feature functions. These functions are useful for assessing trade-offs. Furthermore, complex engineering design problems often involve mixed continuous, discrete, and categorical design variables, which makes it possible to account for technological choices in the optimization problem. In this paper, a new Quality-Diversity methodology based on a mixed continuous, discrete, and categorical Bayesian optimization strategy is proposed. This approach reduces the computational cost with respect to classical Quality-Diversity approaches while dealing with discrete choices and constraints. The performance of the proposed method is assessed on a benchmark of analytical problems as well as on an industrial design optimization problem dealing with aerospace systems.

Translated: 2023-11-02 01:38:13 Published: 2023-10-30

# A New Economic and Financial Theory of Money

A new economic and financial theory of money ( http://arxiv.org/abs/2310.04986v4 )

License: see the linked page

Michael E. Glinsky and Sharon Sievert

This paper fundamentally reformulates economic and financial theory to include electronic currencies. The valuation of the electronic currencies will be based on macroeconomic theory and the fundamental equation of monetary policy, not the microeconomic theory of discounted cash flows. The view of electronic currency as a transactional equity associated with tangible assets of a sub-economy will be developed, in contrast to the view of stock as an equity associated mostly with intangible assets of a sub-economy. The view will be developed of the electronic currency management firm as an entity responsible for coordinated monetary (electronic currency supply and value stabilization) and fiscal (investment and operational) policies of a substantial (for liquidity of the electronic currency) sub-economy. The risk model used in the valuations and the decision-making will not be the ubiquitous, yet inappropriate, exponential risk model that leads to discount rates, but will be multi-time-scale models that capture the true risk. The decision-making will be approached from the perspective of true systems control based on a system response function given by the multi-scale risk model and system controllers that utilize Deep Reinforcement Learning, Generative Pretrained Transformers, and other methods of Artificial Intelligence (DRL/GPT/AI). Finally, the sub-economy will be viewed as a nonlinear complex physical system with both stable equilibriums that are associated with short-term exploitation, and unstable equilibriums that need to be stabilized with active nonlinear control based on the multi-scale system response functions and DRL/GPT/AI.

Translated: 2023-11-02 01:37:55 Published: 2023-10-30

# Non-Hermitian Dynamics and Nonreciprocity of Optically Coupled Nanoparticles

Non-Hermitian dynamics and nonreciprocity of optically coupled nanoparticles ( http://arxiv.org/abs/2310.02610v2 )

License: see the linked page

Manuel Reisenbauer, Henning Rudolph, Livia Egyed, Klaus Hornberger, Anton V. Zasedatelev, Murad Abuzarli, Benjamin A. Stickler, Uroš Delić

Non-Hermitian dynamics, as observed in photonic, atomic, electrical, and optomechanical platforms, holds great potential for sensing applications and signal processing. Recently, fully tunable nonreciprocal optical interaction has been demonstrated between levitated nanoparticles. Here, we use this tunability to investigate the collective non-Hermitian dynamics of two nonreciprocally and nonlinearly interacting nanoparticles. We observe parity-time symmetry breaking and, for sufficiently strong coupling, a collective mechanical lasing transition, where the particles move along stable limit cycles. This work opens up a research avenue of nonequilibrium multi-particle collective effects, tailored by the dynamic control of individual sites in a tweezer array.

Translated: 2023-11-02 01:37:27 Published: 2023-10-30

# Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training ( http://arxiv.org/abs/2306.01693v2 )

License: see the linked page

Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi

Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs. Reinforcement learning from human feedback (RLHF) - where human preference judgments on LM outputs are transformed into a learning signal - has recently shown promise in addressing these issues. However, such holistic feedback conveys limited information on long text outputs; it does not indicate which aspects of the outputs influenced user preference, e.g., which parts contain what type(s) of errors. In this paper, we use fine-grained human feedback (e.g., which sentence is false, which sub-sentence is irrelevant) as an explicit training signal. We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects: (1) density, providing a reward after every segment (e.g., a sentence) is generated; and (2) incorporating multiple reward models associated with different feedback types (e.g., factual incorrectness, irrelevance, and information incompleteness). We conduct experiments on detoxification and long-form question answering to illustrate how learning with such reward functions leads to improved performance, supported by both automatic and human evaluation. Additionally, we show that LM behaviors can be customized using different combinations of fine-grained reward models. We release all data, collected human feedback, and code at https://FineGrainedRLHF.github.io.

Translated: 2023-11-01 23:54:27 Published: 2023-10-30
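The two kinds of granularity named in the Fine-Grained RLHF abstract (a reward after every segment, and several reward models for different feedback types) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `fine_grained_rewards`, the keyword-based toy reward models, and the weights are all hypothetical stand-ins chosen only to make the idea concrete.

```python
from typing import Callable, Dict, List

# A reward model here is any function mapping one text segment to a scalar.
RewardModel = Callable[[str], float]


def fine_grained_rewards(
    segments: List[str],
    reward_models: Dict[str, RewardModel],
    weights: Dict[str, float],
) -> List[float]:
    """Return one combined reward per segment (dense reward), obtained as a
    weighted sum over several feedback-type-specific reward models."""
    rewards = []
    for segment in segments:
        total = 0.0
        for name, model in reward_models.items():
            total += weights[name] * model(segment)
        rewards.append(total)
    return rewards


# Hypothetical stand-ins for learned reward models (simple keyword rules).
def toy_factuality(segment: str) -> float:
    return -1.0 if "moon is made of cheese" in segment else 1.0


def toy_relevance(segment: str) -> float:
    return -1.0 if "by the way" in segment.lower() else 1.0


segments = [
    "The moon orbits the Earth.",
    "By the way, I like tea.",
    "The moon is made of cheese.",
]
models = {"factuality": toy_factuality, "relevance": toy_relevance}
weights = {"factuality": 1.0, "relevance": 0.5}
rewards = fine_grained_rewards(segments, models, weights)
print(rewards)
```

Changing the entries of `weights` changes which behavior is penalized most, which is one simple way to picture the customization by reward-model combination that the abstract mentions.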

# Exposing Attention Glitches with Flip-Flop Language Modeling

Exposing Attention Glitches with Flip-Flop Language Modeling ( http://arxiv.org/abs/2306.00946v2 )

License: see the linked page

Bingbin Liu, Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Cyril Zhang

Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, currently seems to be an inevitable price to pay for their advanced capabilities of coherently synthesizing knowledge, pragmatics, and abstract thought. Towards making sense of this fundamentally unsolved problem, this work identifies and analyzes the phenomenon of attention glitches, in which the Transformer architecture's inductive biases intermittently fail to capture robust reasoning. To isolate the issue, we introduce flip-flop language modeling (FFLM), a parametric family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models. This simple generative task requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. We find that Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which we can eliminate using various regularization techniques. Our preliminary mechanistic analyses show why the remaining errors may be very difficult to diagnose and resolve. We hypothesize that attention glitches account for (some of) the closed-domain hallucinations in natural LLMs.

Translated: 2023-11-01 23:52:48 Published: 2023-10-30
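The flip-flop task in the abstract above (copy a binary symbol across a long gap while ignoring the tokens in between) can be pictured with a small data generator. The sketch below is only an assumption-laden illustration: the instruction names `w` (write), `r` (read) and `i` (ignore), the sampling probabilities, and the helper names are choices made here, not the benchmark's actual specification.

```python
import random
from typing import List, Optional, Tuple

Token = Tuple[str, int]  # (instruction, bit)


def make_flip_flop_sequence(
    length: int, p_ignore: float, rng: random.Random
) -> List[Token]:
    """Generate (instruction, bit) pairs.

    'w' stores a bit in memory, 'i' carries a distractor bit that must be
    ignored, and 'r' must reproduce the most recently written bit.
    """
    memory = rng.randint(0, 1)
    seq: List[Token] = [("w", memory)]  # start with a write so memory is defined
    for _ in range(length - 1):
        u = rng.random()
        if u < p_ignore:
            seq.append(("i", rng.randint(0, 1)))
        elif u < p_ignore + (1.0 - p_ignore) / 2.0:
            memory = rng.randint(0, 1)
            seq.append(("w", memory))
        else:
            seq.append(("r", memory))
    return seq


def is_consistent(seq: List[Token]) -> bool:
    """Check that every read returns the bit of the latest preceding write."""
    memory: Optional[int] = None
    for instruction, bit in seq:
        if instruction == "w":
            memory = bit
        elif instruction == "r" and bit != memory:
            return False
    return True


sequence = make_flip_flop_sequence(200, 0.8, random.Random(0))
print(is_consistent(sequence))
```

Raising `p_ignore` lengthens the typical gap between a write and the next read, which is the kind of long-range dependency the abstract says the task is meant to probe.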

# Interaction Measures, Partition Lattices and Kernel Tests for High-Order Interactions

Interaction Measures, Partition Lattices and Kernel Tests for High-Order Interactions ( http://arxiv.org/abs/2306.00904v2 )

License: see the linked page

Zhaolu Liu, Robert L. Peach, Pedro A.M. Mediano, and Mauricio Barahona

Models that rely solely on pairwise relationships often fail to capture the complete statistical structure of the complex multivariate data found in diverse domains, such as socio-economic, ecological, or biomedical systems. Non-trivial dependencies between groups of more than two variables can play a significant role in the analysis and modelling of such systems, yet extracting such high-order interactions from data remains challenging. Here, we introduce a hierarchy of $d$-order ($d \geq 2$) interaction measures, increasingly inclusive of possible factorisations of the joint probability distribution, and define non-parametric, kernel-based tests to establish systematically the statistical significance of $d$-order interactions. We also establish mathematical links with lattice theory, which elucidate the derivation of the interaction measures and their composite permutation tests; clarify the connection of simplicial complexes with kernel matrix centring; and provide a means to enhance computational efficiency. We illustrate our results numerically with validations on synthetic data, and through an application to neuroimaging data.

Translated: 2023-11-01 23:52:28 Published: 2023-10-30

# Addressing Negative Transfer in Diffusion Models

Addressing Negative Transfer in Diffusion Models ( http://arxiv.org/abs/2306.00354v2 )

License: see the linked page

Hyojun Go, JinYoung Kim, Yunsung Lee, Seunghyun Lee, Shinhyeok Oh, Hyeongdon Moon, Seungtaek Choi

Diffusion-based generative models have achieved remarkable success in various domains. They train a shared model on denoising tasks that encompass different noise levels simultaneously, representing a form of multi-task learning (MTL). However, analyzing and improving diffusion models from an MTL perspective remains under-explored. In particular, MTL can sometimes lead to the well-known phenomenon of negative transfer, which results in the performance degradation of certain tasks due to conflicts between tasks. In this paper, we first aim to analyze diffusion training from an MTL standpoint, presenting two key observations: (O1) the task affinity between denoising tasks diminishes as the gap between noise levels widens, and (O2) negative transfer can arise even in diffusion training. Building upon these observations, we aim to enhance diffusion training by mitigating negative transfer. To achieve this, we propose leveraging existing MTL methods, but the presence of a huge number of denoising tasks makes it computationally expensive to calculate the necessary per-task loss or gradient. To address this challenge, we propose clustering the denoising tasks into small task clusters and applying MTL methods to them. Specifically, based on (O2), we employ interval clustering to enforce temporal proximity among denoising tasks within clusters. We show that interval clustering can be solved using dynamic programming, utilizing signal-to-noise ratio, timestep, and task affinity for clustering objectives. Through this, our approach addresses the issue of negative transfer in diffusion models by allowing for efficient computation of MTL methods. We validate the proposed clustering and its integration with MTL methods through various experiments, demonstrating improved sample quality of diffusion models. Our project page is available at https://gohyojun15.github.io/ANT_diffusion/.

Translated: 2023-11-01 23:52:08 Published: 2023-10-30
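The abstract above states that interval clustering of denoising tasks can be solved with dynamic programming. A generic version of that idea is sketched below in Python: timesteps are split into `k` contiguous intervals so that a within-interval cost is minimized. The cost used here (squared deviation of a per-timestep value from its interval mean) and the sample values are assumptions made for illustration; the paper's own objectives based on signal-to-noise ratio, timestep, and task affinity are not reproduced.

```python
from typing import List, Sequence, Tuple


def interval_cost(values: Sequence[float], start: int, end: int) -> float:
    """Sum of squared deviations from the mean over values[start:end]."""
    chunk = values[start:end]
    mean = sum(chunk) / len(chunk)
    return sum((v - mean) ** 2 for v in chunk)


def interval_clustering(values: Sequence[float], k: int) -> List[Tuple[int, int]]:
    """Split indices 0..n-1 into k contiguous half-open intervals [start, end)
    that minimize the total within-interval cost."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")
    inf = float("inf")
    # best[c][j]: minimal cost of splitting values[:j] into c intervals.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # cut[c][j]: start index of the last interval in that optimal split.
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = best[c - 1][i] + interval_cost(values, i, j)
                if candidate < best[c][j]:
                    best[c][j] = candidate
                    cut[c][j] = i
    # Walk the stored cut points backwards to recover the intervals.
    intervals: List[Tuple[int, int]] = []
    end = n
    for c in range(k, 0, -1):
        start = cut[c][end]
        intervals.append((start, end))
        end = start
    return intervals[::-1]


# Hypothetical per-timestep values (for example, a log-SNR-like quantity).
values = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0, 9.2, 9.1]
clusters = interval_clustering(values, 3)
print(clusters)
```

Because every cluster is a contiguous range of timesteps, tasks in the same cluster are temporally close by construction, which matches the temporal-proximity constraint described in the abstract.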

# Spectral Harmonics: Bridging Spectral Embedding and Matrix Completion in Self-Supervised Learning

Spectal Harmonics: Bridging Spectral Embedding and Matrix Completion in Self-Supervised Learning ( http://arxiv.org/abs/2305.19818v2 )

License: see the linked page

Marina Munkhoeva, Ivan Oseledets

Self-supervised methods have received tremendous attention thanks to their seemingly heuristic approach to learning representations that respect the semantics of the data without any apparent supervision in the form of labels. A growing body of literature is already being published in an attempt to build a coherent and theoretically grounded understanding of the workings of a zoo of losses used in modern self-supervised representation learning methods. In this paper, we attempt to provide an understanding from the perspective of a Laplace operator and connect the inductive bias stemming from the augmentation process to a low-rank matrix completion problem. To this end, we leverage the results from low-rank matrix completion to provide theoretical analysis on the convergence of modern SSL methods and a key property that affects their downstream performance.

Translated: 2023-11-01 23:51:39 Published: 2023-10-30

# The Tunnel Effect: Building Data Representations in Deep Neural Networks

The Tunnel Effect: Building Data Representations in Deep Neural Networks ( http://arxiv.org/abs/2305.19753v2 )

License: see the linked page

Wojciech Masarczyk, Mateusz Ostaszewski, Ehsan Imani, Razvan Pascanu, Piotr Miłoś, Tomasz Trzciński

Deep neural networks are widely known for their remarkable effectiveness across various tasks, with the consensus that deeper networks implicitly learn more complex data representations. This paper shows that sufficiently deep networks trained for supervised image classification split into two distinct parts that contribute to the resulting data representations differently. The initial layers create linearly-separable representations, while the subsequent layers, which we refer to as the tunnel, compress these representations and have a minimal impact on the overall performance. We explore the tunnel's behavior through comprehensive empirical studies, highlighting that it emerges early in the training process. Its depth depends on the relation between the network's capacity and task complexity. Furthermore, we show that the tunnel degrades out-of-distribution generalization and discuss its implications for continual learning.

Translated: 2023-11-01 23:51:09 Published: 2023-10-30

# Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network

Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network ( http://arxiv.org/abs/2305.19366v2 )

License: see the linked page

Tristan Deleu, Mizu Nishikawa-Toomey, Jithendaraa Subramanian, Nikolay Malkin, Laurent Charlin, Yoshua Bengio

Generative Flow Networks (GFlowNets), a class of generative models over discrete and structured sample spaces, have been previously applied to the problem of inferring the marginal posterior distribution over the directed acyclic graph (DAG) of a Bayesian Network, given a dataset of observations. Based on recent advances extending this framework to non-discrete sample spaces, we propose in this paper to approximate the joint posterior over not only the structure of a Bayesian Network, but also the parameters of its conditional probability distributions. We use a single GFlowNet whose sampling policy follows a two-phase process: the DAG is first generated sequentially one edge at a time, and then the corresponding parameters are picked once the full structure is known. Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models of the Bayesian Network, making our approach applicable even to non-linear models parametrized by neural networks. We show that our method, called JSP-GFN, offers an accurate approximation of the joint posterior, while comparing favorably against existing methods on both simulated and real data.

Translated: 2023-11-01 23:50:56 Published: 2023-10-30

# SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models

SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models ( http://arxiv.org/abs/2305.19308v2 )

License: see the linked page

Hongxin Li, Jingran Su, Yuntao Chen, Qing Li, Zhaoxiang Zhang

Computer end users have spent billions of hours completing daily tasks like tabular data processing and project timeline scheduling. Most of these tasks are repetitive and error-prone, yet most end users lack the skill to automate this burdensome work. With the advent of large language models (LLMs), directing software with natural language user requests has become a reachable goal. In this work, we propose a SheetCopilot agent that takes a natural language task and controls a spreadsheet to fulfill the requirements. We propose a set of atomic actions as an abstraction of spreadsheet software functionalities. We further design a state machine-based task planning framework for LLMs to robustly interact with spreadsheets. We curate a representative dataset containing 221 spreadsheet control tasks and establish a fully automated evaluation pipeline for rigorously benchmarking the ability of LLMs in software control tasks. Our SheetCopilot correctly completes 44.3% of tasks for a single generation, outperforming the strong code generation baseline by a wide margin. Our project page: https://sheetcopilot.github.io/.

Translated: 2023-11-01 23:50:35 Published: 2023-10-30

# NetHackはハッキングが難しい

NetHack is Hard to Hack ( http://arxiv.org/abs/2305.19240v2 )

License: see the linked page

Ulyana Piterbarg, Lerrel Pinto, Rob Fergus

Neural policy learning methods have achieved remarkable results in various control problems, ranging from Atari games to simulated locomotion. However, these methods struggle in long-horizon tasks, especially in open-ended environments with multi-modal observations, such as the popular dungeon-crawler game, NetHack. Intriguingly, the NeurIPS 2021 NetHack Challenge revealed that symbolic agents outperformed neural approaches by over four times in median game score. In this paper, we delve into the reasons behind this performance gap and present an extensive study on neural policy learning for NetHack. To conduct this study, we analyze the winning symbolic agent, extending its codebase to track internal strategy selection in order to generate one of the largest available demonstration datasets. Utilizing this dataset, we examine (i) the advantages of an action hierarchy; (ii) enhancements in neural architecture; and (iii) the integration of reinforcement learning with imitation learning. Our investigations produce a state-of-the-art neural agent that surpasses previous fully neural policies by 127% in offline settings and 25% in online settings on median game score. However, we also demonstrate that mere scaling is insufficient to bridge the performance gap with the best symbolic models or even the top human players.

Translated: 2023-11-01 23:50:17 Published: 2023-10-30

# Nested Diffusion Processes for Anytime Image Generation

Nested Diffusion Processes for Anytime Image Generation ( http://arxiv.org/abs/2305.19066v3 )

License: see the linked page

Noam Elata, Bahjat Kawar, Tomer Michaeli, Michael Elad

Diffusion models are the current state-of-the-art in image generation, synthesizing high-quality images by breaking down the generation process into many fine-grained denoising steps. Despite their good performance, diffusion models are computationally expensive, requiring many neural function evaluations (NFEs). In this work, we propose an anytime diffusion-based method that can generate viable images when stopped at arbitrary times before completion. Using existing pretrained diffusion models, we show that the generation scheme can be recomposed as two nested diffusion processes, enabling fast iterative refinement of a generated image. In experiments on ImageNet and Stable Diffusion-based text-to-image generation, we show, both qualitatively and quantitatively, that our method's intermediate generation quality greatly exceeds that of the original diffusion model, while the final generation result remains comparable. We illustrate the applicability of Nested Diffusion in several settings, including for solving inverse problems, and for rapid text-based content creation by allowing user intervention throughout the sampling process.

Translated: 2023-11-01 23:49:56 Published: 2023-10-30

# Topology-Aware Uncertainty for Image Segmentation

Topology-Aware Uncertainty for Image Segmentation ( http://arxiv.org/abs/2306.05671v3 )

License: see the linked page

Saumya Gupta, Yikai Zhang, Xiaoling Hu, Prateek Prasanna and Chao Chen

Segmentation of curvilinear structures such as vasculature and road networks is challenging due to relatively weak signals and complex geometry/topology. To facilitate and accelerate large scale annotation, one has to adopt semi-automatic approaches such as proofreading by experts. In this work, we focus on uncertainty estimation for such tasks, so that highly uncertain, and thus error-prone structures can be identified for human annotators to verify. Unlike most existing works, which provide pixel-wise uncertainty maps, we stipulate it is crucial to estimate uncertainty in the units of topological structures, e.g., small pieces of connections and branches. To achieve this, we leverage tools from topological data analysis, specifically discrete Morse theory (DMT), to first capture the structures, and then reason about their uncertainties. To model the uncertainty, we (1) propose a joint prediction model that estimates the uncertainty of a structure while taking the neighboring structures into consideration (inter-structural uncertainty); (2) propose a novel Probabilistic DMT to model the inherent uncertainty within each structure (intra-structural uncertainty) by sampling its representations via a perturb-and-walk scheme. On various 2D and 3D datasets, our method produces better structure-wise uncertainty maps compared to existing works. Code is available at https://github.com/Saumya-Gupta-26/struct-uncertainty

Translated: 2023-11-01 23:44:03 Published: 2023-10-30

# A Novel Confidence Induced Class Activation Mapping for MRI Brain Tumor Segmentation

A Novel Confidence Induced Class Activation Mapping for MRI Brain Tumor Segmentation ( http://arxiv.org/abs/2306.05476v3 )

License: see the linked page

Yu-Jen Chen, Yiyu Shi, Tsung-Yi Ho

Magnetic resonance imaging (MRI) is a commonly used technique for brain tumor segmentation, which is critical for evaluating patients and planning treatment. To make the labeling process less laborious and dependent on expertise, weakly-supervised semantic segmentation (WSSS) methods using class activation mapping (CAM) have been proposed. However, current CAM-based WSSS methods generate the object localization map using internal neural network information, such as gradient or trainable parameters, which can lead to suboptimal solutions. To address these issues, we propose the confidence-induced CAM (Cfd-CAM), which calculates the weight of each feature map by using the confidence of the target class. Our experiments on two brain tumor datasets show that Cfd-CAM outperforms existing state-of-the-art methods under the same level of supervision. Overall, our proposed Cfd-CAM approach improves the accuracy of brain tumor segmentation and may provide valuable insights for developing better WSSS methods for other medical imaging tasks.

Translated: 2023-11-01 23:42:56 Published: 2023-10-30
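The Cfd-CAM abstract above says only that each feature map is weighted by the confidence of the target class, rather than by gradients or trainable parameters. One way to picture such confidence-based weighting is the pure-Python sketch below: each feature map is used as a mask on the input, the classifier's confidence on the masked input becomes that map's weight, and the weighted maps are summed. This is an assumption about the general idea, not the authors' algorithm; the function names and the toy classifier are hypothetical, and feature maps are assumed to already have the image's resolution.

```python
from typing import Callable, List

Grid = List[List[float]]


def normalize(fmap: Grid) -> Grid:
    """Rescale a feature map to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in fmap)
    hi = max(max(row) for row in fmap)
    if hi == lo:
        return [[0.0 for _ in row] for row in fmap]
    return [[(v - lo) / (hi - lo) for v in row] for row in fmap]


def confidence_weighted_cam(
    image: Grid,
    feature_maps: List[Grid],
    confidence: Callable[[Grid], float],
) -> Grid:
    """Weight each feature map by the target-class confidence obtained when
    the input is masked with that map, then sum and clip at zero."""
    cam = [[0.0 for _ in row] for row in image]
    for fmap in feature_maps:
        mask = normalize(fmap)
        masked = [
            [pixel * m for pixel, m in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)
        ]
        weight = confidence(masked)  # no gradients or layer weights needed
        for r, mask_row in enumerate(mask):
            for c, m in enumerate(mask_row):
                cam[r][c] += weight * m
    return [[max(0.0, v) for v in row] for row in cam]


# Toy setup: a hypothetical "classifier" whose confidence depends only on
# the top-left pixel, and two feature maps highlighting opposite corners.
def toy_confidence(masked_image: Grid) -> float:
    return masked_image[0][0]


image = [[1.0, 1.0], [1.0, 1.0]]
maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
cam = confidence_weighted_cam(image, maps, toy_confidence)
print(cam)
```

In this toy case only the map covering the pixel the classifier relies on receives a non-zero weight, so the resulting localization map highlights that pixel alone.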

# Factorized Contrastive Learning: Going Beyond Multi-view Redundancy

Factorized Contrastive Learning: Going Beyond Multi-view Redundancy ( http://arxiv.org/abs/2306.05268v2 )

License: see the linked page

Paul Pu Liang, Zihao Deng, Martin Ma, James Zou, Louis-Philippe Morency, Ruslan Salakhutdinov

In a wide range of multimodal tasks, contrastive learning has become a particularly appealing approach since it can successfully learn representations from abundant unlabeled data with only pairing information (e.g., image-caption or video-audio pairs). Underpinning these approaches is the assumption of multi-view redundancy - that shared information between modalities is necessary and sufficient for downstream tasks. However, in many real-world settings, task-relevant information is also contained in modality-unique regions: information that is only present in one modality but still relevant to the task. How can we learn self-supervised multimodal representations to capture both shared and unique information relevant to downstream tasks? This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy. FactorCL is built from three new contributions: (1) factorizing task-relevant information into shared and unique representations, (2) capturing task-relevant information via maximizing MI lower bounds and removing task-irrelevant information via minimizing MI upper bounds, and (3) multimodal data augmentations to approximate task relevance without labels. On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results on six benchmarks.

Translated: 2023-11-01 23:42:35 Published: 2023-10-30

# How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources

How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources ( http://arxiv.org/abs/2306.04751v2 )

License: see the linked page

Yizhong Wang, Hamish Ivison, Pradeep Dasigi, Jack Hessel, Tushar Khot, Khyathi Raghavi Chandu, David Wadden, Kelsey MacMillan, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with state-of-the-art proprietary models, these claims are often accompanied by limited evaluation, making it difficult to compare models across the board and determine the utility of various resources. We provide a large set of instruction-tuned models from 6.7B to 65B parameters in size, trained on 12 instruction datasets ranging from manually curated (e.g., OpenAssistant) to synthetic and distilled (e.g., Alpaca), and systematically evaluate them on their factual knowledge, reasoning, multilinguality, coding, and open-ended instruction following abilities through a collection of automatic, model-based, and human-based metrics. We further introduce Tülu, our best performing instruction-tuned model suite finetuned on a combination of high-quality open resources. Our experiments show that different instruction-tuning datasets can uncover or enhance specific skills, while no single dataset (or combination) provides the best performance across all evaluations. Interestingly, we find that model and human preference-based evaluations fail to reflect differences in model capabilities exposed by benchmark-based evaluations, suggesting the need for the type of systemic evaluation performed in this work. Our evaluations show that the best model in any given evaluation reaches on average 87% of ChatGPT performance, and 73% of GPT-4 performance, suggesting that further investment in building better base models and instruction-tuning data is required to close the gap. We release our instruction-tuned models, including a fully finetuned 65B Tülu, along with our code, data, and evaluation framework at https://github.com/allenai/open-instruct to facilitate future research.

翻訳日:2023-11-01 23:42:09 公開日:2023-10-30

# 生成モデル評価指標の欠陥の暴露と拡散モデルの不公平な処理

Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models ( http://arxiv.org/abs/2306.04675v2 )

ライセンス: Link先を確認

George Stein, Jesse C. Cresswell, Rasa Hosseinzadeh, Yi Sui, Brendan Leigh Ross, Valentin Villecroze, Zhaoyan Liu, Anthony L. Caterini, J. Eric T. Taylor, Gabriel Loaiza-Ganem

(参考訳) 我々は,セマンティックな画像データセットにまたがる多種多様な生成モデルを体系的に研究し,それらの評価に用いる特徴抽出器と指標を理解し,改善する。心理物理学におけるベストプラクティスを用いて、生成標本に対する人間のイメージリアリズムの知覚を計測し、これまでで最大の生成モデル評価実験を行い、既存の測定基準が人間の評価と強く相関しないことを見出した。生成モデルの全体的なパフォーマンス、忠実性、多様性、ラリティ、記憶力を評価するための17の現代的な指標と比較すると、人間によって判断される拡散モデルの最先端の知覚的実在性は、fidのような一般的に報告されている指標には反映されないことが分かる。この相違は生成標本の多様性によって説明されないが、一つの原因はインセプションV3への過剰依存である。これらの欠陥に対処するために,個別のネットワークで符号化された意味情報がトレーニング手順に強く依存していることを発見し,DINOv2-ViT-L/14が生成モデルのよりリッチな評価を可能にすることを示す。次に,生成モデルがcifar10のような単純で小さなデータセットのトレーニング例を記憶しているが,imagenetのような複雑なデータセットでは必ずしもそうではないことを示す。しかし,本実験では,現在の計測値が記憶を適切に検出できないことを示しており,記憶を不適合やモード縮小といった他の現象と区別することはできない。生成モデルのさらなる開発と評価を容易にするため、生成した画像データセット、人体評価データ、モジュールライブラリをリリースし、https://github.com/layer6ai-labs/dgm-evalで9つの異なるエンコーダに対して17の共通メトリクスを計算します。

We systematically study a wide variety of generative models spanning semantically-diverse image datasets to understand and improve the feature extractors and metrics used to evaluate them. Using best practices in psychophysics, we measure human perception of image realism for generated samples by conducting the largest experiment evaluating generative models to date, and find that no existing metric strongly correlates with human evaluations. Comparing to 17 modern metrics for evaluating the overall performance, fidelity, diversity, rarity, and memorization of generative models, we find that the state-of-the-art perceptual realism of diffusion models as judged by humans is not reflected in commonly reported metrics such as FID. This discrepancy is not explained by diversity in generated samples, though one cause is over-reliance on Inception-V3. We address these flaws through a study of alternative self-supervised feature extractors, find that the semantic information encoded by individual networks strongly depends on their training procedure, and show that DINOv2-ViT-L/14 allows for much richer evaluation of generative models. Next, we investigate data memorization, and find that generative models do memorize training examples on simple, smaller datasets like CIFAR10, but not necessarily on more complex datasets like ImageNet. However, our experiments show that current metrics do not properly detect memorization: none in the literature is able to separate memorization from other phenomena such as underfitting or mode shrinkage. To facilitate further development of generative models and their evaluation we release all generated image datasets, human evaluation data, and a modular library to compute 17 common metrics for 9 different encoders at https://github.com/layer6ai-labs/dgm-eval.

翻訳日:2023-11-01 23:41:06 公開日:2023-10-30

# マルチモーダル融合の相互作用:人間と自動定量化の研究

Multimodal Fusion Interactions: A Study of Human and Automatic Quantification ( http://arxiv.org/abs/2306.04125v2 )

ライセンス: Link先を確認

Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency

(参考訳) 異種信号のマルチモーダル融合を行うには、それらの相互作用、すなわち各モダリティが個別にタスクに有用な情報をどのように提供し、この情報が他のモダリティの存在下でどのように変化するかを理解する必要がある。本稿では,マルチモーダル相互作用の2つの分類を人間がどのようにアノテートするかについて比較検討を行う。(1)部分ラベル:異なるアノテータが第1のモダリティ、第2のモダリティ、両方のモダリティをそれぞれ与えられてラベルを付ける。(2)反実仮想ラベル:同じアノテータが第1のモダリティを与えられてラベルを付けた後、第2のモダリティが与えられたときに回答がどのように変化するかを明示的に推論する。さらに、(3)情報分解に基づく別の分類法を提案する。ここではアノテータが、冗長性(各モダリティが個別でも一緒でも同じ予測を与える程度)、一意性(一方のモダリティが、他方では得られない予測を可能にする程度)、相乗性(両方のモダリティを合わせることで、個々のモダリティでは得られない予測が可能になる程度)の度合いをアノテートする。実験とアノテーションを通じて,各アプローチの機会と限界を明らかにし,部分ラベルおよび反実仮想ラベルのアノテーションを情報分解に自動的に変換する手法を提案する。これにより,マルチモーダル相互作用を正確かつ効率的に定量化できる。

In order to perform multimodal fusion of heterogeneous signals, we need to understand their interactions: how each modality individually provides information useful for a task and how this information changes in the presence of other modalities. In this paper, we perform a comparative study of how humans annotate two categorizations of multimodal interactions: (1) partial labels, where different annotators annotate the label given the first, second, and both modalities, and (2) counterfactual labels, where the same annotator annotates the label given the first modality before asking them to explicitly reason about how their answer changes when given the second. We further propose an alternative taxonomy based on (3) information decomposition, where annotators annotate the degrees of redundancy: the extent to which modalities individually and together give the same predictions, uniqueness: the extent to which one modality enables a prediction that the other does not, and synergy: the extent to which both modalities enable one to make a prediction that one would not otherwise make using individual modalities. Through experiments and annotations, we highlight several opportunities and limitations of each approach and propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition, yielding an accurate and efficient method for quantifying multimodal interactions.

翻訳日:2023-11-01 23:40:06 公開日:2023-10-30

# ビジョンファウンデーションモデルによるラベルなしシーン理解に向けて

Towards Label-free Scene Understanding by Vision Foundation Models ( http://arxiv.org/abs/2306.03899v2 )

ライセンス: Link先を確認

Runnan Chen, Youquan Liu, Lingdong Kong, Nenglun Chen, Xinge Zhu, Yuexin Ma, Tongliang Liu, Wenping Wang

(参考訳) Contrastive Vision-Language Pre-Training (CLIP) や Segment Anything (SAM) のような視覚基礎モデルは、画像分類やセグメンテーションタスクにおいて印象的なゼロショット性能を示している。しかし, ラベルなしシーン理解のためのCLIPとSAMの組み入れはまだ検討されていない。本稿では,ラベル付きデータなしで2次元世界と3次元世界を理解可能にするビジョン基盤モデルの可能性を検討する。主な課題は、非常にノイズの多い擬似ラベルの下でネットワークを効果的に監視することであり、これはCLIPによって生成され、2Dから3Dドメインへの伝播中にさらに悪化する。これらの課題に対処するために,CLIPとSAMの強みを利用して同時に2Dと3Dネットワークを監督するクロスモダリティノイズスーパービジョン(CNS)手法を提案する。特に,2Dおよび3Dネットワークを共同訓練するための予測整合性正則化を導入し,さらにSAMの頑健な特徴表現を用いてネットワークの潜在空間の整合性を課す。屋内および屋外の多様なデータセットを用いた実験は,2次元および3次元オープン環境の理解において,本手法の優れた性能を示す。 2Dネットワークと3Dネットワークは、ScanNet上でそれぞれ28.4%と33.5%のmIoUでラベルなしセマンティックセグメンテーションを実現し、それぞれ4.7%と7.9%改善した。 nuImages と nuScenes のデータセットでは、それぞれ 22.1% と 26.8% であり、3.5% と 6.0% の改善がある。コードはhttps://github.com/runnanchen/Label-Free-Scene-Understandingで利用可能。

Vision foundation models such as Contrastive Vision-Language Pre-training (CLIP) and Segment Anything (SAM) have demonstrated impressive zero-shot performance on image classification and segmentation tasks. However, the incorporation of CLIP and SAM for label-free scene understanding has yet to be explored. In this paper, we investigate the potential of vision foundation models in enabling networks to comprehend 2D and 3D worlds without labelled data. The primary challenge lies in effectively supervising networks under extremely noisy pseudo labels, which are generated by CLIP and further exacerbated during the propagation from the 2D to the 3D domain. To tackle these challenges, we propose a novel Cross-modality Noisy Supervision (CNS) method that leverages the strengths of CLIP and SAM to supervise 2D and 3D networks simultaneously. In particular, we introduce a prediction consistency regularization to co-train 2D and 3D networks, then further impose the networks' latent space consistency using SAM's robust feature representation. Experiments conducted on diverse indoor and outdoor datasets demonstrate the superior performance of our method in understanding 2D and 3D open environments. Our 2D and 3D networks achieve label-free semantic segmentation with 28.4% and 33.5% mIoU on ScanNet, improving 4.7% and 7.9%, respectively. For the nuImages and nuScenes datasets, the performance is 22.1% and 26.8% with improvements of 3.5% and 6.0%, respectively. Code is available at https://github.com/runnanchen/Label-Free-Scene-Understanding.

翻訳日:2023-11-01 23:39:07 公開日:2023-10-30

# ランダム分布シフトによる学習

Learning under random distributional shifts ( http://arxiv.org/abs/2306.02948v2 )

ライセンス: Link先を確認

Kirk Bansak, Elisabeth Paulson, Dominik Rothenhäusler

(参考訳) 分布シフトモデル分布シフトを適切な表現において逆または低ランクにシフトする設定で予測を生成するための既存の多くのアプローチ。しかし、様々な現実の環境では、人口と環境の多くの小さなランダムな変化の重ね合わせによって、変化が起こるかもしれない。したがって,共変量空間の任意の変化を捉えたランダム分布シフトモデルと,共変量と結果の関係に対する密集したランダムショックモデルを考える。この設定では、関心の長期的な結果を直接予測する標準的なアプローチ、短期的なプロキシ結果を直接予測するプロキシアプローチ、長期的なポリシー結果と(短期的な)プロキシ結果の両方を利用するハイブリッドアプローチなど、いくつかの代替予測戦略の利点と欠点を特徴づける。ハイブリッドアプローチは分散シフトの強さとプロキシ関係の強さに頑健であることを示す。本研究では,この手法を2つのハイインパクト領域のデータセットに適用する。どちらの設定でも、提案手法は現在の手法よりも平均二乗誤差がかなり低いことが分かる。

Many existing approaches for generating predictions in settings with distribution shift model distribution shifts as adversarial or low-rank in suitable representations. In various real-world settings, however, we might expect shifts to arise through the superposition of many small and random changes in the population and environment. Thus, we consider a class of random distribution shift models that capture arbitrary changes in the underlying covariate space, and dense, random shocks to the relationship between the covariates and the outcomes. In this setting, we characterize the benefits and drawbacks of several alternative prediction strategies: the standard approach that directly predicts the long-term outcome of interest, the proxy approach that directly predicts a shorter-term proxy outcome, and a hybrid approach that utilizes both the long-term policy outcome and (shorter-term) proxy outcome(s). We show that the hybrid approach is robust to the strength of the distribution shift and the proxy relationship. We apply this method to datasets in two high-impact domains: asylum-seeker assignment and early childhood education. In both settings, we find that the proposed approach results in substantially lower mean-squared error than current approaches.

翻訳日:2023-11-01 23:38:12 公開日:2023-10-30

# 分散シフト下におけるビデオ自己教師型学習の隠れダイナミクスの解明

Uncovering the Hidden Dynamics of Video Self-supervised Learning under Distribution Shifts ( http://arxiv.org/abs/2306.02014v2 )

ライセンス: Link先を確認

Pritam Sarkar, Ahmad Beirami, Ali Etemad

(参考訳) ビデオ自己教師型学習(VSSL)は近年大きな進歩を遂げている。しかし、分布シフトの異なる形でのこれらのモデルの正確な挙動とダイナミクスはまだ分かっていない。本稿では, 様々な形態の自然分布変化に対応する6種類の自己監督手法(v-SimCLR, v-MoCo, v-BYOL, v-SimSiam, v-DINO, v-MAE)の挙動を総合的に検討する。 (i)コンテキストシフト。 (ii)視点転換。 (iii)俳優交代。 (iv) ソースシフト。 (v)未知クラスへの一般化可能性(ゼロショット) (vi)オープンセット認識。この広範な研究を行うために,利用可能な公開データセットと一連の評価プロトコルを用いて17の分散および分散ベンチマークペアからなるテストベッドを,意図したシフトで異なるメソッドをストレステストするために慎重に作成する。本研究は,VSSL手法の興味深い発見と興味深い挙動を明らかにするものである。例えば、ビデオモデルは一般的にコンテキストシフトに苦しむが、v-MAEと教師付き学習はより堅牢性を示す。また,v-MAEは時間的学習者であり,v-SimCLRとv-MoCoは視点変化に対して強い性能を示す。オープンセット認識の概念を研究する際,事前学習したVSSLエンコーダを微調整することなく使用した場合,クローズドセットとオープンセット認識性能のトレードオフに気づく。私たちの研究が,実世界のさまざまなシナリオを対象としたロバストなビデオ表現学習フレームワークの開発に貢献できることを願っています。プロジェクトページとコードは、https://pritamqu.github.io/ood-vssl。

Video self-supervised learning (VSSL) has made significant progress in recent years. However, the exact behavior and dynamics of these models under different forms of distribution shift are not yet known. In this paper, we comprehensively study the behavior of six popular self-supervised methods (v-SimCLR, v-MoCo, v-BYOL, v-SimSiam, v-DINO, v-MAE) in response to various forms of natural distribution shift, i.e., (i) context shift, (ii) viewpoint shift, (iii) actor shift, (iv) source shift, (v) generalizability to unknown classes (zero-shot), and (vi) open-set recognition. To perform this extensive study, we carefully craft a test bed consisting of 17 in-distribution and out-of-distribution benchmark pairs using available public datasets and a series of evaluation protocols to stress-test the different methods under the intended shifts. Our study uncovers a series of intriguing findings and interesting behaviors of VSSL methods. For instance, we observe that while video models generally struggle with context shifts, v-MAE and supervised learning exhibit more robustness. Moreover, our study shows that v-MAE is a strong temporal learner, whereas contrastive methods, v-SimCLR and v-MoCo, exhibit strong performances against viewpoint shifts. When studying the notion of open-set recognition, we notice a trade-off between closed-set and open-set recognition performance if the pretrained VSSL encoders are used without finetuning. We hope that our work will contribute to the development of robust video representation learning frameworks for various real-world scenarios. The project page and code are available at: https://pritamqu.github.io/OOD-VSSL.

翻訳日:2023-11-01 23:37:43 公開日:2023-10-30

# PLASTIC: 有効強化学習のための入力とラベルの塑性の改善

PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning ( http://arxiv.org/abs/2306.10711v2 )

ライセンス: Link先を確認

Hojoon Lee, Hanseul Cho, Hyunseung Kim, Daehoon Gwak, Joonkee Kim, Jaegul Choo, Se-Young Yun, Chulhee Yun

(参考訳) 強化学習(RL)では、特にデータ取得が高価でリスクの高いシナリオにおいて、サンプル効率の向上が不可欠である。原則として、オフポリシーrlアルゴリズムは、環境インタラクション毎に複数の更新を可能にすることにより、サンプル効率を向上させることができる。しかしながら、これらの複数の更新は、しばしば、可塑性の喪失と呼ばれる以前の相互作用に過度に適合するモデルにつながる。本研究は, この現象の原因を, 塑性を2つの側面に分けて検討した。入力可塑性(英: Input plasticity)とは、入力データの変更に対するモデルの適応性、および入力-出力関係の進化に対するモデルの適応性を示すラベル可塑性である。 cifar-10データセットの合成実験により、より滑らかなロスランドスケープの発見は入力可塑性を増加させ、一方、洗練された勾配伝播はラベル可塑性を改善することが判明した。これらの知見を活かしてPLASTICアルゴリズムを導入し,両問題に対処する手法を調和的に組み合わせた。最小限のアーキテクチャ変更により、PLASTICはAtari-100kやDeepmind Control Suiteといったベンチマーク上での競合性能を達成した。この結果は、RLの試料効率を高めるためにモデルの可塑性を維持することの重要性を強調している。コードはhttps://github.com/dojeon-ai/plasticで入手できる。

In Reinforcement Learning (RL), enhancing sample efficiency is crucial, particularly in scenarios where data acquisition is costly and risky. In principle, off-policy RL algorithms can improve sample efficiency by allowing multiple updates per environment interaction. However, these multiple updates often lead the model to overfit to earlier interactions, which is referred to as the loss of plasticity. Our study investigates the underlying causes of this phenomenon by dividing plasticity into two aspects: input plasticity, which denotes the model's adaptability to changing input data, and label plasticity, which denotes the model's adaptability to evolving input-output relationships. Synthetic experiments on the CIFAR-10 dataset reveal that finding smoother minima of the loss landscape enhances input plasticity, whereas refined gradient propagation improves label plasticity. Leveraging these findings, we introduce the PLASTIC algorithm, which harmoniously combines techniques to address both concerns. With minimal architectural modifications, PLASTIC achieves competitive performance on benchmarks including Atari-100k and DeepMind Control Suite. This result emphasizes the importance of preserving the model's plasticity to elevate the sample efficiency in RL. The code is available at https://github.com/dojeon-ai/plastic.

翻訳日:2023-11-01 23:29:56 公開日:2023-10-30

# 明示的制約を考慮した学習ダイナミクスのための安定化ニューラル微分方程式

Stabilized Neural Differential Equations for Learning Dynamics with Explicit Constraints ( http://arxiv.org/abs/2306.09739v2 )

ライセンス: Link先を確認

Alistair White, Niki Kilbertus, Maximilian Gelbrecht, Niklas Boers

(参考訳) データから力学系を学習するための多くの手法が最近導入された。しかしながら、推論された力学が、保存則や許容されるシステム状態の制限といった既知の制約を確実に保つことは依然として困難である。本稿では, ニューラル微分方程式に対して任意の多様体制約を課す手法である安定化ニューラル微分方程式(SNDE)を提案する。我々のアプローチは安定化項に基づいており、元の力学に加えると、制約多様体が漸近安定であることが証明できる。その単純さから,本手法は一般的なニューラル微分方程式(NDE)モデルすべてと互換性があり,広く適用可能である。広範な実験的評価において、SNDEは既存の手法を上回り、NDEの訓練に組み込むことができる制約の種類を広げることを示す。

Many successful methods to learn dynamical systems from data have recently been introduced. However, ensuring that the inferred dynamics preserve known constraints, such as conservation laws or restrictions on the allowed system states, remains challenging. We propose stabilized neural differential equations (SNDEs), a method to enforce arbitrary manifold constraints for neural differential equations. Our approach is based on a stabilization term that, when added to the original dynamics, renders the constraint manifold provably asymptotically stable. Due to its simplicity, our method is compatible with all common neural differential equation (NDE) models and broadly applicable. In extensive empirical evaluations, we demonstrate that SNDEs outperform existing methods while broadening the types of constraints that can be incorporated into NDE training.
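The stabilization idea in this abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes the stabilized dynamics take the form f(u) - gamma * pinv(G(u)) * g(u), where g is the constraint function and G its Jacobian, and it uses a hand-written rotation field in place of a trained neural network. The names `f`, `g`, `stabilized_f`, and `gamma` are illustrative choices.

```python
def g(u):
    """Constraint function: zero exactly on the unit circle."""
    x, y = u
    return x * x + y * y - 1.0


def f(u):
    """Stand-in for a learned vector field: a pure rotation."""
    x, y = u
    return (-y, x)


def stabilized_f(u, gamma):
    """Assumed stabilized form: f(u) - gamma * pinv(G(u)) * g(u), with G = dg/du = [2x, 2y]."""
    x, y = u
    fx, fy = f(u)
    r2 = x * x + y * y
    # The pseudo-inverse of the 1x2 Jacobian G is G^T / (G G^T) = [2x, 2y] / (4 * r2).
    scale = gamma * g(u) / (2.0 * r2)
    return (fx - scale * x, fy - scale * y)


def integrate(u0, gamma, step=0.01, n_steps=1000):
    """Explicit Euler integration; gamma = 0 recovers the unstabilized dynamics."""
    u = u0
    for _ in range(n_steps):
        du = stabilized_f(u, gamma)
        u = (u[0] + step * du[0], u[1] + step * du[1])
    return u


start = (1.5, 0.0)  # deliberately off the constraint manifold
plain = integrate(start, gamma=0.0)
stabilized = integrate(start, gamma=5.0)
print("constraint violation without stabilization:", abs(g(plain)))
print("constraint violation with stabilization:   ", abs(g(stabilized)))
```

In this toy setting the stabilized trajectory is pulled back toward the unit circle, while the unstabilized one keeps its initial violation and drifts further under Euler integration.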

翻訳日:2023-11-01 23:29:16 公開日:2023-10-30

# QH9:QM9分子の量子ハミルトン予測ベンチマーク

QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules ( http://arxiv.org/abs/2306.09549v2 )

ライセンス: Link先を確認

Haiyang Yu, Meng Liu, Youzhi Luo, Alex Strasser, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji

(参考訳) 教師付き機械学習アプローチは、密度汎関数理論(DFT)のような第一原理計算手法の代用として、電子構造予測の加速にますます利用されている。多くの量子化学データセットは化学的性質と原子力に焦点を当てているが、物理系と化学特性の量子状態を決定する最も重要かつ基本的な物理量であるため、ハミルトン行列の正確かつ効率的な予測を達成する能力は非常に望ましい。本研究では、QM9データセットに基づいて、2,399の分子動力学軌道と130,831の安定な分子ジオメトリに対して正確なハミルトン行列を提供するために、QH9と呼ばれる新しい量子ハミルトンデータセットを生成する。様々な分子を用いてベンチマークタスクを設計することにより、現在の機械学習モデルは任意の分子に対するハミルトン行列を予測する能力を有することを示す。 QH9データセットとベースラインモデルの両方がオープンソースベンチマークを通じてコミュニティに提供されており、機械学習手法の開発や、科学および技術応用のための分子および材料設計の加速に非常に有用である。私たちのベンチマークはhttps://github.com/divelab/AIRS/tree/main/OpenDFT/QHBenchで公開されています。

Supervised machine learning approaches have been increasingly used in accelerating electronic structure prediction as surrogates of first-principle computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, the ability to achieve accurate and efficient prediction of the Hamiltonian matrix is highly desired, as it is the most important and fundamental physical quantity that determines the quantum states of physical systems and chemical properties. In this work, we generate a new Quantum Hamiltonian dataset, named as QH9, to provide precise Hamiltonian matrices for 2,399 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench.

翻訳日:2023-11-01 23:29:04 公開日:2023-10-30

# 除去に基づく特徴属性のロバスト性について

On the Robustness of Removal-Based Feature Attributions ( http://arxiv.org/abs/2306.07462v2 )

ライセンス: Link先を確認

Chris Lin, Ian Covert, Su-In Lee

(参考訳) 複雑な機械学習モデルによる予測を説明するため、重要点を入力特徴に割り当てる多くの特徴属性法が開発されている。最近の研究の中には、入力やモデル摂動に敏感であることを示すことによって、これらの手法の堅牢性に挑戦するものもある。しかし,従来の帰属ロバスト性は,主に勾配に基づく特徴帰属に焦点が当てられているが,現在,除去に基づく帰属法のロバスト性はよく分かっていない。このギャップを埋めるために、我々は除去に基づく特徴属性の堅牢性特性を理論的に特徴づける。具体的には,これらの手法の統一的な解析を行い,入力とモデルの両方の摂動の設定下で,無傷と摂動の差の上界を導出する。合成データと実世界のデータを用いた実験結果は,理論結果の妥当性を検証し,モデルのリプシッツ正則性向上による帰属ロバスト性の向上など,その実践的意義を実証した。

To explain predictions made by complex machine learning models, many feature attribution methods have been developed that assign importance scores to input features. Some recent work challenges the robustness of these methods by showing that they are sensitive to input and model perturbations, while other work addresses this issue by proposing robust attribution methods. However, previous work on attribution robustness has focused primarily on gradient-based feature attributions, whereas the robustness of removal-based attribution methods is not currently well understood. To bridge this gap, we theoretically characterize the robustness properties of removal-based feature attributions. Specifically, we provide a unified analysis of such methods and derive upper bounds for the difference between intact and perturbed attributions, under settings of both input and model perturbations. Our empirical results on synthetic and real-world data validate our theoretical results and demonstrate their practical implications, including the ability to increase attribution robustness by improving the model's Lipschitz regularity.
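A small example may help make "removal-based attribution" and its link to Lipschitz regularity concrete. The sketch below uses leave-one-out removal with a fixed baseline, which is only one member of the family the abstract analyzes; the toy `model`, the zero baseline, and the perturbation size are assumptions chosen for illustration, not taken from the paper.

```python
def model(x):
    """Toy model: a fixed linear score passed through a smooth squashing function."""
    weights = (2.0, -1.0, 0.5)
    score = sum(w * v for w, v in zip(weights, x))
    return score / (1.0 + abs(score))


def occlusion_attributions(f, x, baseline):
    """Leave-one-out attributions: the change in f when feature i is replaced by its baseline."""
    full = f(x)
    attributions = []
    for i in range(len(x)):
        removed = list(x)
        removed[i] = baseline[i]
        attributions.append(full - f(removed))
    return attributions


x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]
x_perturbed = [v + 0.01 for v in x]

a = occlusion_attributions(model, x, baseline)
a_perturbed = occlusion_attributions(model, x_perturbed, baseline)
gap = max(abs(p - q) for p, q in zip(a, a_perturbed))
print("attributions:", a)
print("largest attribution change under a 0.01 input perturbation:", gap)
```

Because this toy model is 3.5-Lipschitz in the max-norm (the squashing function has slope at most 1 and the absolute weights sum to 3.5), each attribution is a difference of two model outputs and can move by at most 2 * 3.5 * 0.01 = 0.07 here. That is the kind of relationship between model smoothness and attribution robustness the abstract describes.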

翻訳日:2023-11-01 23:27:40 公開日:2023-10-30

# FLSL: 機能レベルの自己教師型学習

FLSL: Feature-level Self-supervised Learning ( http://arxiv.org/abs/2306.06203v2 )

ライセンス: Link先を確認

Qing Su, Anton Netchaev, Hai Li, and Shihao Ji

(参考訳) 現在の自己教師型学習(SSL)手法(例えば、SimCLR, DINO, VICReg, MOCOv3)は、主にインスタンスレベルでの表現を目標としており、オブジェクト検出やセグメンテーションなどの高密度な予測タスクには適さない。共同埋め込みとクラスタリングにトランスフォーマーを用いることにより,FLSL(Feature-Level Self-supervised Learning)と呼ばれる2レベル特徴クラスタリングSSL法を提案する。 FLSL問題の形式的定義を示し、平均シフトおよびk平均視点から目的を構築する。 FLSLは目覚しいセマンティッククラスタ表現を促進し,ビュー内およびビュー間特徴クラスタリングに適した埋め込みスキームを学習する。実験の結果、FLSLは高密度予測タスクにおいて大幅に改善し、対象検出では44.9 (+2.8)% APと46.5% AP、MS-COCOでは40.8 (+2.3)% APと42.1% APを達成した。 FLSL は UAVDT 上の UAV17 オブジェクト検出や DAVIS 2017 上のビデオインスタンスセグメンテーションなど,既存の SSL メソッドよりも一貫して優れている。ソースコードはhttps://github.com/isl-cv/flslで入手できる。

Current self-supervised learning (SSL) methods (e.g., SimCLR, DINO, VICReg, MOCOv3) primarily target representations at the instance level and do not generalize well to dense prediction tasks, such as object detection and segmentation. Towards aligning SSL with dense predictions, this paper demonstrates for the first time the underlying mean-shift clustering process of Vision Transformers (ViT), which aligns well with natural image semantics (e.g., a world of objects and stuff). By employing a transformer for joint embedding and clustering, we propose a two-level feature clustering SSL method, coined Feature-Level Self-supervised Learning (FLSL). We present the formal definition of the FLSL problem and construct the objectives from the mean-shift and k-means perspectives. We show that FLSL promotes remarkable semantic cluster representations and learns an embedding scheme amenable to intra-view and inter-view feature clustering. Experiments show that FLSL yields significant improvements in dense prediction tasks, achieving 44.9 (+2.8)% AP and 46.5% AP in object detection, as well as 40.8 (+2.3)% AP and 42.1% AP in instance segmentation on MS-COCO, using Mask R-CNN with ViT-S/16 and ViT-S/8 as backbone, respectively. FLSL consistently outperforms existing SSL methods across additional benchmarks, including UAV17 object detection on UAVDT, and video instance segmentation on DAVIS 2017. We conclude by presenting visualizations and various ablation studies to better understand the success of FLSL. The source code is available at https://github.com/ISL-CV/FLSL.

翻訳日:2023-11-01 23:26:09 公開日:2023-10-30

# Intensity Profile Projection:動的ネットワークのための連続時間表現学習フレームワーク

Intensity Profile Projection: A Framework for Continuous-Time Representation Learning for Dynamic Networks ( http://arxiv.org/abs/2306.06155v2 )

ライセンス: Link先を確認

Alexander Modell, Ian Gallagher, Emma Ceccherini, Nick Whiteley and Patrick Rubin-Delanchy

(参考訳) 連続時間動的ネットワークデータのための新しい表現学習フレームワークIntensity Profile Projectionを提案する。 2つのエンティティ($i,j$)間のタイムスタンプ($t$)付きの相互作用を表すトリプル$(i,j,t)$が与えられると、我々の手順は各ノードに対して、その時間的な振る舞いを表す連続時間軌跡を返す。このフレームワークは3つの段階から構成される。すなわち、カーネル平滑化などによるペアワイズ強度関数の推定、強度再構成誤差を最小化する射影の学習、学習された射影を用いた時間発展するノード表現の構築である。軌跡は構造的コヒーレンスと時間的コヒーレンスという2つの性質を満たしており、これらは信頼できる推論の基礎であると考える。さらに,任意の推定軌跡の誤差を厳密に制御する推定理論を構築し,ノイズに敏感な後続の解析にもこの表現が利用できることを示す。この理論はまた、バイアスと分散のトレードオフとしての平滑化の役割を明らかにし、アルゴリズムがネットワーク全体から「強さを借りる」ことにより、信号対雑音比が高まるにつれて平滑化のレベルをどのように下げられるかを示す。

We present a new representation learning framework, Intensity Profile Projection, for continuous-time dynamic network data. Given triples $(i,j,t)$, each representing a time-stamped ($t$) interaction between two entities ($i,j$), our procedure returns a continuous-time trajectory for each node, representing its behaviour over time. The framework consists of three stages: estimating pairwise intensity functions, e.g. via kernel smoothing; learning a projection which minimises a notion of intensity reconstruction error; and constructing evolving node representations via the learned projection. The trajectories satisfy two properties, known as structural and temporal coherence, which we see as fundamental for reliable inference. Moreover, we develop estimation theory providing tight control on the error of any estimated trajectory, indicating that the representations could even be used in quite noise-sensitive follow-on analyses. The theory also elucidates the role of smoothing as a bias-variance trade-off, and shows how we can reduce the level of smoothing as the signal-to-noise ratio increases on account of the algorithm `borrowing strength' across the network.
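The first of the three stages, estimating pairwise intensity functions by kernel smoothing, can be sketched in a few lines. This is a minimal illustration only: the Gaussian kernel, the bandwidth value, the toy event list, and the function names are assumptions, and the projection and trajectory stages are not shown.

```python
import math


def gaussian_kernel(u, bandwidth):
    """Gaussian density with standard deviation `bandwidth`, evaluated at u."""
    return math.exp(-0.5 * (u / bandwidth) ** 2) / (bandwidth * math.sqrt(2.0 * math.pi))


def estimate_intensities(events, bandwidth):
    """Group (i, j, t) triples by ordered pair and return a smoothed intensity function."""
    times_by_pair = {}
    for i, j, t in events:
        times_by_pair.setdefault((i, j), []).append(t)

    def intensity(i, j, t):
        # One kernel bump per observed interaction between i and j.
        return sum(gaussian_kernel(t - s, bandwidth) for s in times_by_pair.get((i, j), []))

    return intensity


# Hypothetical time-stamped interactions (i, j, t).
events = [(0, 1, 1.0), (0, 1, 1.2), (0, 1, 4.0), (1, 2, 2.5)]
intensity = estimate_intensities(events, bandwidth=0.3)

# A Riemann sum of the estimated intensity over a wide window recovers the event count.
grid = [k * 0.01 for k in range(-300, 901)]
mass = sum(intensity(0, 1, t) * 0.01 for t in grid)
print("estimated intensity for pair (0, 1) at t = 1.1:", intensity(0, 1, 1.1))
print("integrated intensity for pair (0, 1):", mass)
```

A larger bandwidth gives a smoother, lower-variance estimate at the cost of blurring changes over time, which is the bias-variance trade-off the abstract refers to.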

翻訳日:2023-11-01 23:25:35 公開日:2023-10-30

# T2I-CompBench: オープンワールドコンポジションテキスト画像生成のための総合ベンチマーク

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation ( http://arxiv.org/abs/2307.06350v2 )

ライセンス: Link先を確認

Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, Xihui Liu

(参考訳) 最近のテキストから画像へのモデルによって高品質な画像を生成する素晴らしい能力にもかかわらず、現在のアプローチでは、異なる属性と関係を持つオブジェクトを複雑で一貫性のあるシーンに効果的に構成するのに苦労することが多い。 T2I-CompBenchは3つのカテゴリ(属性バインディング、オブジェクト関係、複雑な構成)と6つのサブカテゴリ(カラーバインディング、形状バインディング、テクスチャバインディング、空間関係、非空間関係、複雑な構成)から6000のコンポジションテキストプロンプトからなるオープンワールドコンポジションテキスト画像生成のための総合ベンチマークである。さらに,合成テキストから画像への生成を評価するために特別に設計された評価指標をいくつか提案し,マルチモーダルllmの可能性と限界について検討する。本稿では,プリトレーニングされたテキスト対画像モデルの合成テキスト生成能力を高めるために,報酬駆動サンプル選択(gors)による生成モデルの微調整を提案する。従来のt2i-compbench法をベンチマークし,提案手法の有効性を検証するため,広範な実験と評価を行った。プロジェクトページはhttps://karine-h.github.io/t2i-compbench/。

Despite the stunning ability to generate high-quality images by recent text-to-image models, current approaches often struggle to effectively compose objects with different attributes and relationships into a complex and coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed to evaluate compositional text-to-image generation and explore the potential and limitations of multimodal LLMs for evaluation. We introduce a new approach, Generative mOdel fine-tuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. Extensive experiments and evaluations are conducted to benchmark previous methods on T2I-CompBench, and to validate the effectiveness of our proposed evaluation metrics and GORS approach. Project page is available at https://karine-h.github.io/T2I-CompBench/.

翻訳日:2023-11-01 23:18:05 公開日:2023-10-30

# テキスト記述は視覚学習のための圧縮的・不変表現である

Text Descriptions are Compressive and Invariant Representations for Visual Learning ( http://arxiv.org/abs/2307.04317v2 )

ライセンス: Link先を確認

Zhili Feng, Anna Bair, J. Zico Kolter

(参考訳) 現代の画像分類は、分類決定を構成する直感的な視覚的特徴に関する情報を直接含まない、大きな識別ネットワークを介してクラスを直接予測することに基づいている。近年、CLIPのような視覚言語モデル(VLM)の研究は、画像クラスの自然言語記述を規定する手段を提供しているが、一般的には各クラスに単一の記述を提供することに焦点を当てている。本研究では,クラスごとの視覚的特徴に対する人間の理解に則った代替手法が,頑健な数ショット学習環境において魅力的な性能を提供できることを示す。特に,新しい手法である「textit{SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors)}を導入する。この手法はまず,まず大規模言語モデル(LLM)を用いて各クラスの視覚的記述を自動的に生成し,次にVLMを用いて各画像の視覚的特徴埋め込みに変換し,最後に,各特徴の関連部分集合を選択して各画像の分類を行う。我々のアプローチの中核は、情報理論上、これらの記述的特徴は、vlmトレーニングプロセスが不変表現学習のために明示的に設計されていないにもかかわらず、従来の画像埋め込みよりもドメインシフトに不変であるという事実です。これらの不変記述機能は、より良い入力圧縮スキームを構成する。ファインチューニングと組み合わせることで、SLR-AVDは、分布内および分布外の両方において既存の最先端のファインチューニング手法より優れていることを示す。

Modern image classification is based upon directly predicting classes via large discriminative networks, which do not directly contain information about the intuitive visual features that may constitute a classification decision. Recently, work in vision-language models (VLM) such as CLIP has provided ways to specify natural language descriptions of image classes, but typically focuses on providing single descriptions for each class. In this work, we demonstrate that an alternative approach, in line with humans' understanding of multiple visual features per class, can also provide compelling performance in the robust few-shot learning setting. In particular, we introduce a novel method, SLR-AVD (Sparse Logistic Regression using Augmented Visual Descriptors). This method first automatically generates multiple visual descriptions of each class via a large language model (LLM), then uses a VLM to translate these descriptions to a set of visual feature embeddings of each image, and finally uses sparse logistic regression to select a relevant subset of these features to classify each image. Core to our approach is the fact that, information-theoretically, these descriptive features are more invariant to domain shift than traditional image embeddings, even though the VLM training process is not explicitly designed for invariant representation learning. These invariant descriptive features also compose a better input compression scheme. When combined with finetuning, we show that SLR-AVD is able to outperform existing state-of-the-art finetuning approaches on both in-distribution and out-of-distribution performance.
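The last step of the pipeline, sparse logistic regression over description-based features, can be sketched as follows. The example assumes the LLM and VLM steps have already produced one similarity score per (image, description) pair; the descriptions, the scores, and the proximal-gradient solver are all made-up stand-ins for illustration and are not the authors' code or data.

```python
import math


def soft_threshold(value, amount):
    """Proximal operator of the L1 penalty: shrink toward zero and clip small values to exactly 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def center_columns(rows):
    """Subtract each column's mean so that no intercept term is needed in this toy example."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def sparse_logistic_regression(features, labels, l1=0.05, step=0.5, n_iters=500):
    """L1-regularized logistic regression fitted by proximal gradient descent (ISTA)."""
    n, d = len(features), len(features[0])
    w = [0.0] * d
    for _ in range(n_iters):
        grad = [0.0] * d
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for k in range(d):
                grad[k] += (p - y) * x[k] / n
        w = [soft_threshold(wi - step * gi, step * l1) for wi, gi in zip(w, grad)]
    return w


# Hypothetical descriptions and image-to-description similarity scores.
descriptions = ["has whiskers", "has fur", "photo taken outdoors"]
similarities = [
    [0.90, 0.5, 0.3],
    [0.80, 0.6, 0.7],
    [0.85, 0.4, 0.5],
    [0.20, 0.5, 0.3],
    [0.10, 0.6, 0.7],
    [0.15, 0.4, 0.5],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = "cat", 0 = "not cat"

weights = sparse_logistic_regression(center_columns(similarities), labels)
selected = [d for d, w in zip(descriptions, weights) if w != 0.0]
print("weights:", weights)
print("selected descriptions:", selected)
```

The L1 penalty drives the weights of uninformative descriptions to exactly zero, so the fitted classifier depends only on a small subset of descriptive features.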

翻訳日:2023-11-01 23:17:11 公開日:2023-10-30

# 3分間の人間フィードバックを用いた拡散モデルの検閲サンプリング

Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback ( http://arxiv.org/abs/2307.02770v2 )

ライセンス: Link先を確認

TaeHo Yoon, Kibeom Myoung, Keon Lee, Jaewoong Cho, Albert No, Ernest K. Ryu

(参考訳) 拡散モデルは最近、高品質な画像生成で顕著な成功を収めている。しかし、事前学習された拡散モデルは、良い画像を生成できる一方で望ましくない画像を出力することもあるという意味で、部分的な不整合を示すことがある。その場合は単に悪い画像の生成を防げばよく、このタスクを検閲と呼ぶ。本研究では,最小限の人間フィードバックで学習した報酬モデルを用いて,事前学習した拡散モデルによる検閲付き生成を提案する。検閲は極めて高い人的フィードバック効率で達成でき、わずか数分の人間フィードバックで作成されたラベルで十分であることを示す。コードはhttps://github.com/tetrzim/diffusion-human-feedbackで利用可能。

Diffusion models have recently shown remarkable success in high-quality image generation. Sometimes, however, a pre-trained diffusion model exhibits partial misalignment in the sense that the model can generate good images, but it sometimes outputs undesirable images. If so, we simply need to prevent the generation of the bad images, and we call this task censoring. In this work, we present censored generation with a pre-trained diffusion model using a reward model trained on minimal human feedback. We show that censoring can be accomplished with extreme human feedback efficiency and that labels generated with a mere few minutes of human feedback are sufficient. Code available at: https://github.com/tetrzim/diffusion-human-feedback.

翻訳日:2023-11-01 23:16:45 公開日:2023-10-30

# スライスワッサーシュタイン一般化測地学による高速最適輸送

Fast Optimal Transport through Sliced Wasserstein Generalized Geodesics ( http://arxiv.org/abs/2307.01770v2 )

ライセンス: Link先を確認

Guillaume Mahey, Laetitia Chapel, Gilles Gasso, Clément Bonet, Nicolas Courty

(参考訳) ワッサースタイン距離(wasserstein distance, wd)と関連する最適輸送計画は、確率測度が懸かっている多くの応用において有用であることが証明されている。本稿では,2つの入力分布の最適1次元投影により誘導される輸送マップに基づく,2乗WDの新たなプロキシであるmin-SWGGを提案する。 min-swgg と wasserstein の一般化測地学との接続を描き、ピボット測度を直線上で支持する。特に、ライン上でサポートされている分布の1つの場合において、正確なワッサースタイン距離に対する新しい閉形式を提供し、勾配降下最適化に適応可能な高速計算スキームを導出する。 min-SWGG は WD の上限であり,Sliced-Wasserstein と同様の複雑性を有し,関連する輸送計画を提供するという付加的な特徴を有することを示す。また、距離性、弱収束、計算および位相的性質などの理論的性質についても検討する。実験的な証拠は、勾配流、形状マッチング、画像の着色など、様々な文脈におけるmin-SWGGの利点を支持する。

Wasserstein distance (WD) and the associated optimal transport plan have proven useful in many applications where probability measures are at stake. In this paper, we propose a new proxy of the squared WD, coined min-SWGG, that is based on the transport map induced by an optimal one-dimensional projection of the two input distributions. We draw connections between min-SWGG and Wasserstein generalized geodesics in which the pivot measure is supported on a line. We notably provide a new closed form for the exact Wasserstein distance in the particular case where one of the distributions is supported on a line, allowing us to derive a fast computational scheme that is amenable to gradient descent optimization. We show that min-SWGG is an upper bound of WD and that it has a complexity similar to that of Sliced-Wasserstein, with the additional feature of providing an associated transport plan. We also investigate some theoretical properties such as metricity, weak convergence, computational and topological properties. Empirical evidence supports the benefits of min-SWGG in various contexts, including gradient flows, shape matching, and image colorization.
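The "transport map induced by a one-dimensional projection" can be sketched for two small point clouds of equal size. The example below assumes uniform weights, matches points by the rank of their projection onto a direction, and evaluates the squared cost in the original space. It replaces the gradient-based search mentioned in the abstract with a plain random search over directions, so it is an illustration of the idea and not the authors' algorithm.

```python
import itertools
import math
import random


def swgg(xs, ys, theta):
    """Squared cost of the plan that pairs points by the rank of their projection onto theta."""
    def project(p):
        return sum(a * b for a, b in zip(p, theta))

    xs_sorted = sorted(xs, key=project)
    ys_sorted = sorted(ys, key=project)
    total = sum(
        sum((a - b) ** 2 for a, b in zip(x, y)) for x, y in zip(xs_sorted, ys_sorted)
    )
    return total / len(xs)


def min_swgg(xs, ys, n_directions=200, seed=0):
    """Random search over 2D unit directions (a stand-in for gradient-based optimization)."""
    rng = random.Random(seed)
    best = math.inf
    for _ in range(n_directions):
        angle = rng.uniform(0.0, math.pi)
        best = min(best, swgg(xs, ys, (math.cos(angle), math.sin(angle))))
    return best


def exact_w2_squared(xs, ys):
    """Brute-force squared Wasserstein-2 distance between two small uniform point clouds."""
    n = len(xs)
    return min(
        sum(sum((a - b) ** 2 for a, b in zip(xs[i], ys[p[i]])) for i in range(n)) / n
        for p in itertools.permutations(range(n))
    )


rng = random.Random(1)
xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(6)]
ys = [(rng.gauss(2, 1), rng.gauss(0, 1)) for _ in range(6)]
print("min-SWGG estimate:", min_swgg(xs, ys))
print("exact squared W2: ", exact_w2_squared(xs, ys))
```

Since every direction yields a valid one-to-one matching, the value for any direction is at least the exact squared distance, which is the upper-bound property stated in the abstract. The matching itself serves as the associated transport plan.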

翻訳日:2023-11-01 23:16:34 公開日:2023-10-30

# アイデンティティ効果学習におけるグラフニューラルネットワークの一般化限界

Generalization Limits of Graph Neural Networks in Identity Effects Learning ( http://arxiv.org/abs/2307.00134v2 )

ライセンス: Link先を確認

Giuseppe Alessio D'Inverno, Simone Brugiapaglia, Mirco Ravanelli

(参考訳) グラフニューラルネットワーク(GNN)は、さまざまなグラフドメインでデータ駆動学習を行う強力なツールとして登場した。それらは通常、メッセージパッシング機構に基づいており、表現力の点で同等であることが証明されたグラフ同型に対するWeisfeiler-Lehman (WL)テストと密接に関連している直感的な定式化で人気を高めている。本研究では,物体が2つの同一成分からなるか否かを判断するタスク,いわゆるアイデンティティ効果の学習の文脈において,GNNの新たな一般化特性と基本限界を確立する。本研究の動機は,GNNが単純な認知タスクを遂行する際の能力を理解する必要性にあり,計算言語学や化学への応用の可能性がある。 2つのケーススタディを分析する。 (i)二文字の単語については、one-hot表現のような直交符号化を用いる場合、確率的勾配降下法で訓練されたGNNが未知の文字に一般化できないことを示す。 (ii)二環グラフ、すなわち2つのサイクルからなるグラフについては、GNNとWLテストの関係を利用して肯定的な存在結果を示す。我々の理論解析は広範な数値研究によって裏付けられている。

Graph Neural Networks (GNNs) have emerged as a powerful tool for data-driven learning on various graph domains. They are usually based on a message-passing mechanism and have gained increasing popularity for their intuitive formulation, which is closely linked to the Weisfeiler-Lehman (WL) test for graph isomorphism to which they have been proven equivalent in terms of expressive power. In this work, we establish new generalization properties and fundamental limits of GNNs in the context of learning so-called identity effects, i.e., the task of determining whether an object is composed of two identical components or not. Our study is motivated by the need to understand the capabilities of GNNs when performing simple cognitive tasks, with potential applications in computational linguistics and chemistry. We analyze two case studies: (i) two-letter words, for which we show that GNNs trained via stochastic gradient descent are unable to generalize to unseen letters when utilizing orthogonal encodings like one-hot representations; (ii) dicyclic graphs, i.e., graphs composed of two cycles, for which we present positive existence results leveraging the connection between GNNs and the WL test. Our theoretical analysis is supported by an extensive numerical study.

翻訳日:2023-11-01 23:15:54 公開日:2023-10-30

# プロンプトによるパーソナライズドコールドスタート勧告に向けて

Towards Personalized Cold-Start Recommendation with Prompts ( http://arxiv.org/abs/2306.17256v3 )

ライセンス: Link先を確認

Xuansheng Wu, Huachi Zhou, Yucheng Shi, Wenlin Yao, Xiao Huang, Ninghao Liu

(参考訳) レコメンダシステムは,過去の行動に基づいて,ユーザの興味に沿った情報発見を支援する上で,重要な役割を担っている。しかし、ユーザとアイテムのインタラクションの履歴が利用できない場合、パーソナライズドレコメンデーションシステムの開発は困難になり、システムコールドスタートレコメンデーション問題として知られる問題に繋がる。この問題は、ユーザーエンゲージメントの履歴が不十分なスタートアップ企業やプラットフォームで特に顕著である。従来の研究はユーザまたはアイテムのコールドスタートに焦点を当てており、新しいユーザやアイテムへの推薦はできるが、同じドメイン内の過去のユーザとアイテムのインタラクションでトレーニングされているため、私たちの問題は解決できない。このギャップを埋めるため,本研究では,事前学習した言語モデルの能力を活用した革新的かつ効果的なアプローチを提案する。提案手法は,推薦プロセスを,ユーザプロファイルや項目属性の情報を含む自然言語の感情分析に変換し,プロンプト学習によって感情極性を予測する。言語モデルに格納された広範な知識を利用することで、過去のユーザ・アイテム間インタラクションの記録なしで予測を行うことができる。また,提案手法をコールドスタート条件下で評価するためのベンチマークも導入し,本手法の有効性を実証した。私たちの知る限りでは、システムコールドスタートレコメンデーション問題に取り組む最初の研究である。ベンチマークと手法の実装は https://github.com/JacksonWuxs/PromptRec で公開されている。

Recommender systems play a crucial role in helping users discover information that aligns with their interests based on their past behaviors. However, developing personalized recommendation systems becomes challenging when historical records of user-item interactions are unavailable, leading to what is known as the system cold-start recommendation problem. This issue is particularly prominent in start-up businesses or platforms with insufficient user engagement history. Previous studies focus on user or item cold-start scenarios, where systems could make recommendations for new users or items but are still trained with historical user-item interactions in the same domain, which cannot solve our problem. To bridge the gap, our research introduces an innovative and effective approach, capitalizing on the capabilities of pre-trained language models. We transform the recommendation process into sentiment analysis of natural languages containing information of user profiles and item attributes, where the sentiment polarity is predicted with prompt learning. By harnessing the extensive knowledge housed within language models, the prediction can be made without historical user-item interaction records. A benchmark is also introduced to evaluate the proposed method under the cold-start setting, and the results demonstrate the effectiveness of our method. To the best of our knowledge, this is the first study to tackle the system cold-start recommendation problem. The benchmark and implementation of the method are available at https://github.com/JacksonWuxs/PromptRec.

翻訳日:2023-11-01 23:15:33 公開日:2023-10-30

# 拡散確率モデルのスパイキング

Spiking Denoising Diffusion Probabilistic Models ( http://arxiv.org/abs/2306.17046v3 )

ライセンス: Link先を確認

Jiahang Cao, Ziqing Wang, Hanzhong Guo, Hao Cheng, Qiang Zhang, Renjing Xu

(参考訳) スパイキングニューラルネットワーク(SNN)は、人工ニューラルネットワーク(ANN)と比較して、二値的で生物に着想を得た性質のため、超低エネルギー消費と高い生物学的妥当性を有する。これまでの研究は主に分類タスクにおけるSNNの性能向上に重点を置いてきたが、SNNの生成能力は比較的未解明のままである。本稿では,高いサンプル品質を実現するSNNベースの新しい生成モデルである Spiking Denoising Diffusion Probabilistic Models (SDDPM) を提案する。 SNNのエネルギー効率をフル活用するために,わずか4タイムステップでANN版に匹敵する性能を実現し,エネルギー消費を大幅に削減する純粋なスパイキングU-Netアーキテクチャを提案する。広範な実験結果から,提案手法は生成タスクにおいて最先端の性能を達成し,他のSNNベースの生成モデルよりも大幅に優れ,CIFAR-10とCelebAデータセットではそれぞれ最大12倍と6倍の改善が得られた。さらに,トレーニングフリーでパフォーマンスをさらに2.69%向上させることができるしきい値誘導戦略を提案する。 SDDPMはSNN生成の分野での大きな進歩を象徴し、新たな視点と潜在的な探索の道筋をもたらす。私たちのコードは https://github.com/AndyCao1125/SDDPM で利用可能です。

Spiking neural networks (SNNs) have ultra-low energy consumption and high biological plausibility due to their binary and bio-driven nature compared with artificial neural networks (ANNs). While previous research has primarily focused on enhancing the performance of SNNs in classification tasks, the generative potential of SNNs remains relatively unexplored. In our paper, we put forward Spiking Denoising Diffusion Probabilistic Models (SDDPM), a new class of SNN-based generative models that achieve high sample quality. To fully exploit the energy efficiency of SNNs, we propose a purely Spiking U-Net architecture, which achieves comparable performance to its ANN counterpart using only 4 time steps, resulting in significantly reduced energy consumption. Extensive experimental results reveal that our approach achieves state-of-the-art on the generative tasks and substantially outperforms other SNN-based generative models, achieving up to 12x and 6x improvement on the CIFAR-10 and the CelebA datasets, respectively. Moreover, we propose a threshold-guided strategy that can further improve the performances by 2.69% in a training-free manner. The SDDPM symbolizes a significant advancement in the field of SNN generation, injecting new perspectives and potential avenues of exploration. Our code is available at https://github.com/AndyCao1125/SDDPM.

翻訳日:2023-11-01 23:14:38 公開日:2023-10-30

# 深部微分型メッシュ変形による腹部臓器の分節

Abdominal organ segmentation via deep diffeomorphic mesh deformations ( http://arxiv.org/abs/2306.15515v2 )

ライセンス: Link先を確認

Fabian Bongratz, Anne-Marie Rickmann, Christian Wachinger

(参考訳) CTとMRIによる腹部臓器の分節は,手術計画とコンピュータ支援ナビゲーションシステムにとって必須の要件である。腹部臓器の形状,大きさ,位置の多様性が高いため,困難である。テンプレートに対する点対応の腹部形状の3次元数値表現は、その定量的および統計的解析においてさらに重要である。近年,テンプレートベースの表面抽出法は,体積走査によるメッシュ再構築に期待できる進歩を見せている。しかし, 様々な臓器やデータセットに対する深層学習に基づくアプローチの一般化は, 臨床環境への展開にとって重要な要素であり, まだ評価されていない。このギャップを埋めて, 肝臓, 腎臓, 膵臓, 脾臓分節に対するテンプレートベースのメッシュ再構成法を応用した。手動注記CTおよびMRIデータを用いた実験により,従来の手法を異なる形状のオルガンに限定的に一般化し,小さなデータセット上での弱い性能を示すことができた。我々はこれらの問題を、新しい微分型メッシュデフォーメーションアーキテクチャと改善されたトレーニングスキームで緩和する。結果として得られたUNetFlowは4つの器官すべてによく当てはまり、新しいデータに基づいて簡単に微調整できる。さらに,ボクセルとメッシュの出力を整列させてセグメンテーション精度を高める,単純な登録ベースの後処理を提案する。

Abdominal organ segmentation from CT and MRI is an essential prerequisite for surgical planning and computer-aided navigation systems. It is challenging due to the high variability in the shape, size, and position of abdominal organs. Three-dimensional numeric representations of abdominal shapes with point-wise correspondence to a template are further important for quantitative and statistical analyses thereof. Recently, template-based surface extraction methods have shown promising advances for direct mesh reconstruction from volumetric scans. However, the generalization of these deep learning-based approaches to different organs and datasets, a crucial property for deployment in clinical environments, has not yet been assessed. We close this gap and employ template-based mesh reconstruction methods for joint liver, kidney, pancreas, and spleen segmentation. Our experiments on manually annotated CT and MRI data reveal limited generalization capabilities of previous methods to organs of different geometry and weak performance on small datasets. We alleviate these issues with a novel deep diffeomorphic mesh-deformation architecture and an improved training scheme. The resulting method, UNetFlow, generalizes well to all four organs and can be easily fine-tuned on new data. Moreover, we propose a simple registration-based post-processing that aligns voxel and mesh outputs to boost segmentation accuracy.

翻訳日:2023-11-01 23:12:47 公開日:2023-10-30

# 状態のみ列からの非マルコフ決定過程の学習

Learning non-Markovian Decision-Making from State-only Sequences ( http://arxiv.org/abs/2306.15156v3 )

ライセンス: Link先を確認

Aoyang Qin, Feng Gao, Qing Li, Song-Chun Zhu, Sirui Xie

(参考訳) 従来の模倣学習では、デモ参加者の行動にアクセスできることを前提とするが、これらの運動信号は自然主義的な環境では観測できないことが多い。さらに、これらの設定におけるシーケンシャルな意思決定行動は、標準的なマルコフ決定プロセス(MDP)の仮定から逸脱することがある。これらの課題に対処するために、非マルコフ決定過程(nMDP)を用いた状態のみ列の深層生成モデリングについて検討する。ここで方策は、状態遷移生成器の潜在空間におけるエネルギーベースの事前分布である。モデルベース模倣を実現するために最尤推定を開発し、これは事前分布からの短期MCMCサンプリングと事後分布に対する重点サンプリングを含む。学習されたモデルは「推論としての意思決定」を可能にする。すなわち、モデルフリーの方策実行は事前分布からのサンプリングと等価であり、モデルベースの計画はその方策から初期化された事後分布からのサンプリングである。非マルコフ制約付き経路計画タスクにおいて,提案手法の有効性を実証し,MuJoCoスイートの困難な領域において,学習モデルが強力な性能を示すことを示した。

Conventional imitation learning assumes access to the actions of demonstrators, but these motor signals are often non-observable in naturalistic settings. Additionally, sequential decision-making behaviors in these settings can deviate from the assumptions of a standard Markov Decision Process (MDP). To address these challenges, we explore deep generative modeling of state-only sequences with a non-Markov Decision Process (nMDP), where the policy is an energy-based prior in the latent space of the state transition generator. We develop maximum likelihood estimation to achieve model-based imitation, which involves short-run MCMC sampling from the prior and importance sampling for the posterior. The learned model enables decision-making as inference: model-free policy execution is equivalent to prior sampling, and model-based planning is posterior sampling initialized from the policy. We demonstrate the efficacy of the proposed method in a prototypical path planning task with non-Markovian constraints and show that the learned model exhibits strong performance in challenging domains from the MuJoCo suite.

翻訳日:2023-11-01 23:12:27 公開日:2023-10-30

# InterCode: 実行フィードバックによるインタラクティブコーディングの標準化とベンチマーク

InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback ( http://arxiv.org/abs/2306.14898v3 )

ライセンス: Link先を確認

John Yang, Akshara Prabhakar, Karthik Narasimhan, Shunyu Yao

(参考訳) 人間は基本的にインタラクティブな方法でコードを書き、エラーを修正し、曖昧さを解決し、タスクを分解するために一定の実行フィードバックに頼る。 LLMは最近、有望なコーディング機能を示したが、現在のコーディングベンチマークは、主に静的命令からコードへのシーケンスのトランスダクションプロセスを検討しており、エラーの伝播や生成されたコードと最終的な実行環境との切り離しが可能である。このギャップに対処するため、対話型コーディングの軽量でフレキシブルで使いやすいフレームワークであるInterCodeを標準強化学習(RL)環境として導入し、コードをアクションとして、実行フィードバックを観察する。私たちのフレームワークは言語とプラットフォームに依存しず、自己完結型のDocker環境を使用して安全で再現可能な実行を提供し、従来のseq2seqコーディングメソッドと互換性があり、インタラクティブなコード生成のための新しいメソッドの開発を可能にします。私たちはInterCodeを使って、静的なNL2Bash、Spider、MBPPデータセットからのデータを活用する、アクションスペースとしてBash、SQL、Pythonで3つのインタラクティブなコード環境を作成しています。我々は、ReActやPlan & Solveといった様々なプロンプト戦略で構成された複数の最先端LLMを評価することで、InterCodeの生存性をテストベッドとして示す。その結果,インタラクティブなコード生成の利点が示され,コード理解と生成能力向上のための難解なベンチマークとしてインターコードの利用が期待できることを示した。 intercodeは簡単に拡張できるように設計されているが、capture the flagのような新しいタスクを作成するのにも使える。コードとデータを持つプロジェクトサイト: https://intercode-benchmark.github.io

Humans write code in a fundamentally interactive manner and rely on constant execution feedback to correct errors, resolve ambiguities, and decompose tasks. While LLMs have recently exhibited promising coding capabilities, current coding benchmarks mostly consider a static instruction-to-code sequence transduction process, which has the potential for error propagation and a disconnect between the generated code and its final execution environment. To address this gap, we introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning (RL) environment, with code as actions and execution feedback as observations. Our framework is language and platform agnostic, uses self-contained Docker environments to provide safe and reproducible execution, and is compatible out-of-the-box with traditional seq2seq coding methods, while enabling the development of new methods for interactive code generation. We use InterCode to create three interactive code environments with Bash, SQL, and Python as action spaces, leveraging data from the static NL2Bash, Spider, and MBPP datasets. We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies such as ReAct and Plan & Solve. Our results showcase the benefits of interactive code generation and demonstrate that InterCode can serve as a challenging benchmark for advancing code understanding and generation capabilities. InterCode is designed to be easily extensible and can even be used to create new tasks such as Capture the Flag, a popular coding puzzle that is inherently multi-step and involves multiple programming languages. Project site with code and data: https://intercode-benchmark.github.io
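
The framing of code as actions and execution feedback as observations can be sketched with a toy environment. The class name `ToyCodeEnv`, its constructor arguments, and the exact-match reward are hypothetical choices made for illustration. This is not the InterCode API, and unlike the Docker-based execution described in the abstract it runs code in-process without any sandboxing.

```python
import contextlib
import io


class ToyCodeEnv:
    """Toy episode: the agent submits Python snippets until the printed output matches."""

    def __init__(self, instruction, expected_output, max_turns=5):
        self.instruction = instruction
        self.expected_output = expected_output
        self.max_turns = max_turns

    def reset(self):
        self.turns = 0
        self.namespace = {}  # state persists across turns, like a live session
        return self.instruction

    def step(self, action):
        """Run `action` (a code string) and return (observation, reward, done)."""
        self.turns += 1
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(action, self.namespace)  # NOT sandboxed; a real setup would isolate this
            observation = buffer.getvalue()
        except Exception as err:
            # Errors are returned as feedback rather than ending the episode.
            observation = f"{type(err).__name__}: {err}"
        reward = 1.0 if observation.strip() == self.expected_output else 0.0
        done = reward == 1.0 or self.turns >= self.max_turns
        return observation, reward, done
```

An agent interacting with this sketch would call `reset()` once to read the instruction and then call `step()` repeatedly, using each error message or printed output to decide on its next snippet.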

翻訳日:2023-11-01 23:12:08 公開日:2023-10-30

# 位相依存ハンベリーブラウンとtwiss効果

Phase Dependent Hanbury-Brown and Twiss effect ( http://arxiv.org/abs/2308.11459v2 )

ライセンス: Link先を確認

Xuan Tang, Yunxiao Zhang, Xueshi Guo, Liang Cui, Xiaoying Li, Z. Y. Ou

(参考訳) ハンベリー・ブラウン・アンド・ツイス効果(HBT)は恒星強度干渉法の基礎となる。しかし、位相非感受性の2光子干渉効果である。本稿では,2つの位相コヒーレント入力場とコヒーレント補助場とを混合してHBT干渉計を拡張し,入力場の完全複素二階コヒーレンス関数を測定するために位相感度2光子干渉を実現する。この実用的な手法は、光学系における天文学的応用のための合成開口イメージングの道を開く。パルス入力フィールドは、リモートセンシングや測位アプリケーションのためにもテストされている。本稿では,より現実的なcw広帯域光電界を用いた絡み合い型テレスコピー方式の実装条件について検討する。

The Hanbury-Brown and Twiss (HBT) effect is the foundation of stellar intensity interferometry. However, it is a phase-insensitive two-photon interference effect. In this paper, we extend the HBT interferometer by mixing two phase-coherent input fields with coherent auxiliary fields before the intensity correlation measurement, and achieve phase-sensitive two-photon interference so as to measure the complete complex second-order coherence function of the input fields. This practical scheme paves the way for synthetic aperture imaging for astronomical applications in the optical regime. Pulsed input fields are also tested for potential remote sensing and ranging applications. We discuss the conditions for implementing the recently proposed entanglement-based telescopy scheme with more realistic cw broadband anti-bunched light fields.

翻訳日:2023-11-01 23:05:08 公開日:2023-10-30

# PsyMo: 歩行から自己申告された心理的トラストを推定するためのデータセット

PsyMo: A Dataset for Estimating Self-Reported Psychological Traits from Gait ( http://arxiv.org/abs/2308.10631v2 )

ライセンス: Link先を確認

Adrian Cosma, Emilian Radoi

(参考訳) 運動や外見などの外的要因からの心理的特性推定は、心理学において困難で長期にわたる問題であり、主にエンボディメントの心理学理論に基づいている。これまでのところ、この問題に対処する試みは、侵入性体感センサーを備えたプライベートな小規模データセットを利用している。心理的特性推定のための自動システムの潜在的な応用には、職業的疲労と心理学の推定、マーケティングと広告が含まれる。本研究では,歩行パターンに現れる心理的手がかりを探索するための新しい多目的多モードデータセットであるpsymo(psychological traits from motion)を提案する。被験者312名から7種類の歩行変化と6種類のカメラアングルで歩行シーケンスを収集した。参加者は6つの心理的質問紙に記入し、パーソナリティ、自尊感情、疲労、攻撃性、精神健康に関する17の心理指標を集計した。心理特性推定のための2つの評価プロトコルを提案する。歩行から自己報告された心理的特徴を推定すると同時に、このデータセットは歩行認識のためのベンチマーク手法の代替として使用できる。被験者の身元に関するすべての手がかりを匿名化し,シルエット,2D/3Dヒト骨格,3D SMPLヒトメッシュのみを一般公開した。

Psychological trait estimation from external factors such as movement and appearance is a challenging and long-standing problem in psychology, and is principally based on the psychological theory of embodiment. To date, attempts to tackle this problem have utilized private small-scale datasets with intrusive body-attached sensors. Potential applications of an automated system for psychological trait estimation include estimation of occupational fatigue and psychology, and marketing and advertisement. In this work, we propose PsyMo (Psychological traits from Motion), a novel, multi-purpose and multi-modal dataset for exploring psychological cues manifested in walking patterns. We gathered walking sequences from 312 subjects in 7 different walking variations and 6 camera angles. In conjunction with walking sequences, participants filled in 6 psychological questionnaires, totalling 17 psychometric attributes related to personality, self-esteem, fatigue, aggressiveness and mental health. We propose two evaluation protocols for psychological trait estimation. Alongside the estimation of self-reported psychological traits from gait, the dataset can be used as a drop-in replacement to benchmark methods for gait recognition. We anonymize all cues related to the identity of the subjects and publicly release only silhouettes, 2D / 3D human skeletons and 3D SMPL human meshes.

翻訳日:2023-11-01 23:04:36 公開日:2023-10-30

# ベイズデータ選択によるモデル学習の高速化

Towards Accelerated Model Training via Bayesian Data Selection ( http://arxiv.org/abs/2308.10544v2 )

ライセンス: Link先を確認

Zhijie Deng, Peng Cui, Jun Zhu

(参考訳) 現実のシナリオにおけるミスラベル付き、重複、バイアス付きのデータは、長期間のトレーニングにつながり、モデル収束を妨げます。簡単あるいはハードなサンプルを優先順位付けする従来のソリューションは、このような多様性を同時に扱う柔軟性を欠いている。最近の研究は、モデルの一般化損失に対するデータの影響を調べることによって、より合理的なデータ選択原則を提案している。しかし、その実践的な採用は、より原則的な近似と追加のホールドアウトデータに依存している。本研究は, 軽量ベイズ処理を活用し, 大規模事前学習モデルを用いた既定ゼロショット予測器を組み込むことにより, この問題を解決した。結果として得られるアルゴリズムは効率的で実装が容易です。我々は,オンラインバッチ選択シナリオにおいて,データノイズと不均衡がかなり大きい難易度ベンチマークについて広範な実証研究を行い,競合ベースラインよりも優れたトレーニング効率を観察する。特に、挑戦的なwebvisionベンチマークにおいて、本手法は、リードデータ選択法よりもトレーニングイテレーションをかなり少なくして、同様の予測性能を達成することができる。

Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety simultaneously. Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss. However, its practical adoption relies on less principled approximations and additional holdout data. This work solves these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models. The resulting algorithm is efficient and easy to implement. We perform extensive empirical studies on challenging benchmarks with considerable data noise and imbalance in the online batch selection scenario, and observe superior training efficiency over competitive baselines. Notably, on the challenging WebVision benchmark, our method can achieve similar predictive performance with significantly fewer training iterations than leading data selection methods.

翻訳日:2023-11-01 23:04:16 公開日:2023-10-30

# 任意の次元と異なる次元に対する絡み合い証人の簡単な構成

A simple construction of Entanglement Witnesses for arbitrary and different dimensions ( http://arxiv.org/abs/2308.07019v3 )

ライセンス: Link先を確認

Vahid Jannesary, Vahid Karimipour

(参考訳) 異なる次元の空間間の様々な正の写像の集合を生成するための簡単なアプローチを提案する。提案手法は,$d_1 \times d_2$次元のシステムに適したエンタングルメントウィットネスの構築を可能にする。この方法では、選択された所望の測定集合のみからなる絡み合い証人を構成できる。具体例を用いて,本手法の有効性と一般性を示す。また、与えられた状態が正の部分的転置(ppt)絡み合い状態である場合を含む、与えられた状態の絡み合いを目撃するために適切な絡み合い証人を識別する方法を2つの例で示している。

We present a simple approach for generation of a diverse set of positive maps between spaces of different dimensions. The proposed method enables the construction of Entanglement Witnesses tailored for systems in $d_1 \times d_2$ dimensions. With this method, it is possible to construct Entanglement Witnesses that consist solely of a chosen set of desired measurements. We demonstrate the effectiveness and generality of our approach using concrete examples. It is also demonstrated in two examples, how an appropriate entanglement witness can be identified for witnessing the entanglement of a given state, including a case when the given state is a Positive Partial Transpose (PPT) entangled state.
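
To make the notion of witnessing entanglement concrete, the following minimal sketch evaluates a textbook two-qubit witness, W = I/2 - |Φ+⟩⟨Φ+|, on an entangled and on a product state. It is a standard fidelity-based witness chosen for illustration and is not the construction proposed in the paper; in particular, a witness of this kind cannot detect PPT entangled states. The helper names are hypothetical.

```python
def outer(vec):
    """Density matrix |v><v| of a real state vector."""
    return [[a * b for b in vec] for a in vec]


def trace_product(a, b):
    """Tr(A B) for square matrices stored as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


S = 2 ** -0.5
bell = [S, 0.0, 0.0, S]               # (|00> + |11>) / sqrt(2)
product_state = [1.0, 0.0, 0.0, 0.0]  # |00>

# Textbook witness W = I/2 - |bell><bell|: Tr(W rho) >= 0 for every separable two-qubit rho.
projector = outer(bell)
witness = [[(0.5 if i == j else 0.0) - projector[i][j] for j in range(4)] for i in range(4)]

# Negative expectation value: the Bell state is detected as entangled.
bell_value = trace_product(witness, outer(bell))
# Zero up to rounding: the product state sits on the boundary and is not detected.
product_value = trace_product(witness, outer(product_state))
```

A negative value of Tr(Wρ) certifies entanglement, while a non-negative value is inconclusive, which is why witnesses tailored to a given state are of interest.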

翻訳日:2023-11-01 23:04:01 公開日:2023-10-30

# 協調フィルタリングにおける損失関数の理解を深める

Toward a Better Understanding of Loss Functions for Collaborative Filtering ( http://arxiv.org/abs/2308.06091v2 )

ライセンス: Link先を確認

Seongmin Park, Mincheol Yoon, Jae-woong Lee, Hogun Park, Jongwuk Lee

(参考訳) 協調フィルタリング(CF)は現代の推薦システムにおいて重要な手法である。 CFモデルの学習プロセスは通常、インタラクションエンコーダ、損失関数、ネガティブサンプリングの3つのコンポーネントで構成される。多くの既存の研究で洗練された相互作用エンコーダを設計するために様々なcfモデルが提案されているが、最近の研究は損失関数の再構成が著しい性能向上を達成できることを示している。本稿では,既存の損失関数の関係を考察する。我々の数学的解析によると、以前の損失関数はアライメントと均一性関数として解釈できる。 (i)アライメントがユーザとアイテムの表現と一致すること、 (ii)均一性は、ユーザとアイテムの分布を分散させる。この分析に触発されて、Margin-aware Alignment and Weighted Uniformity (MAWU)と呼ばれるデータセットのユニークなパターンを考慮したアライメントと均一性の設計を改善する新しい損失関数を提案する。 mawuの鍵となる新しさは2つあります。 (i)マージン認識アライメント(ma)は、ユーザ/項目固有の人気バイアスを軽減し、 (II)重み付き均一性(WU)は、ユーザとアイテムの均一性の重要性を調整し、データセット固有の特性を反映する。広範な実験の結果、mawuを搭載したmfとlightgcnは、3つのパブリックデータセットで様々な損失関数を持つ最先端cfモデルに匹敵するか優れていることが示された。

Collaborative filtering (CF) is a pivotal technique in modern recommender systems. The learning process of CF models typically consists of three components: interaction encoder, loss function, and negative sampling. Although many existing studies have proposed various CF models to design sophisticated interaction encoders, recent work shows that simply reformulating the loss functions can achieve significant performance gains. This paper delves into analyzing the relationship among existing loss functions. Our mathematical analysis reveals that the previous loss functions can be interpreted as alignment and uniformity functions: (i) the alignment matches user and item representations, and (ii) the uniformity disperses user and item distributions. Inspired by this analysis, we propose a novel loss function that improves the design of alignment and uniformity considering the unique patterns of datasets called Margin-aware Alignment and Weighted Uniformity (MAWU). The key novelty of MAWU is two-fold: (i) margin-aware alignment (MA) mitigates user/item-specific popularity biases, and (ii) weighted uniformity (WU) adjusts the significance between user and item uniformities to reflect the inherent characteristics of datasets. Extensive experimental results show that MF and LightGCN equipped with MAWU are comparable or superior to state-of-the-art CF models with various loss functions on three public datasets.
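
The two roles named in the abstract, alignment pulling matched user and item representations together and uniformity spreading representations apart, can be sketched with a commonly used formulation on unit-normalized embeddings. This is an assumption made for illustration: the functions below are the plain alignment and uniformity terms, not MAWU's margin-aware or weighted variants, and the temperature `t=2.0` is an arbitrary default.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def alignment(user_vecs, item_vecs):
    """Mean squared distance between normalized embeddings of observed (user, item) pairs."""
    pairs = list(zip(user_vecs, item_vecs))
    return sum(math.dist(normalize(u), normalize(i)) ** 2 for u, i in pairs) / len(pairs)


def uniformity(vecs, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs; lower means more spread."""
    unit = [normalize(v) for v in vecs]
    total, count = 0.0, 0
    for a in range(len(unit)):
        for b in range(a + 1, len(unit)):
            total += math.exp(-t * math.dist(unit[a], unit[b]) ** 2)
            count += 1
    return math.log(total / count)
```

A combined objective would typically add the alignment term to a weighted sum of the user-side and item-side uniformity terms; how those weights and margins are set is exactly what the abstract attributes to MAWU and is not reproduced here.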

翻訳日:2023-11-01 23:03:49 公開日:2023-10-30

# 強結合ボゾン系における高速量子状態転移と絡み合い形成

Fast quantum state transfer and entanglement preparation in strongly coupled bosonic systems ( http://arxiv.org/abs/2308.05511v2 )

ライセンス: Link先を確認

Yilun Xu, Daoquan Zhu, Feng-Xiao Sun, Qiongyi He, Wei Zhang

(参考訳) 線形ボゾン系における総励起の保存を保証する連続U(1)ゲージ対称性は、回転波近似(RWA)が失敗する強い結合状態において破られる。本稿では, RWAを超えるXX型結合を持つ多モードボソニック系の解析解を開発し, 高速で高忠実度量子状態伝達(QST)と絡み込み準備(EP)を実装する新しい手法を提案する。このスキームは、大域的u(1)対称性の崩壊にかかわらず励起数が変化しない所定の結合強度とパルス持続時間で実現できる。 QSTタスクでは、いくつかの典型的な量子状態を検討し、この手法が熱雑音や実験シーケンスの不完全性に対して堅牢であることを示す。 EPタスクでは、最短準備時間内にベル状態およびW型状態の準備のために、このスキームをうまく実施する。

Continuous U(1) gauge symmetry, which guarantees the conservation of the total excitations in linear bosonic systems, is broken in the strong-coupling regime, where the rotating-wave approximation (RWA) fails. Here we develop analytic solutions for multi-mode bosonic systems with XX-type couplings beyond the RWA, and propose a novel scheme to implement high-fidelity quantum state transfer (QST) and entanglement preparation (EP) with high speed. The scheme can be realized with designated coupling strength and pulse duration, with which the excitation number remains unchanged regardless of the breakdown of the global U(1) symmetry. In the QST tasks, we consider several typical quantum states and demonstrate that this method is robust against thermal noise and imperfections of the experimental sequence. In the EP tasks, the scheme is successfully implemented for the preparation of Bell states and W-type states within the shortest preparation time.

翻訳日:2023-11-01 23:03:25 公開日:2023-10-30

# グラフクラスタリングのためのホモフィリエンハンス構造学習

Homophily-enhanced Structure Learning for Graph Clustering ( http://arxiv.org/abs/2308.05309v3 )

ライセンス: Link先を確認

Ming Gu, Gaoming Yang, Sheng Zhou, Ning Ma, Jiawei Chen, Qiaoyu Tan, Meihan Liu, Jiajun Bu

(参考訳) グラフクラスタリングはグラフ解析の基本課題であり、グラフニューラルネットワーク(GNN)の最近の進歩は印象的な結果を示している。既存のGNNベースのグラフクラスタリング手法の成功にもかかわらず、それらはしばしばグラフ構造の品質を見落としている。グラフ構造学習は、欠落したリンクを追加し、スプリアス接続を取り除くことで、入力グラフの精細化を可能にする。しかしながら、グラフ構造学習におけるこれまでの取り組みは、主に教師付き設定を中心に行われており、接地ラベルがないため、特定のクラスタリングタスクに直接適用することはできない。このギャップを埋めるために,グラフクラスタリング (HoLe) のための新しい手法である \textbf{ho}mophily-enhanced structure \textbf{le}arning を提案する。我々のモチベーションは、グラフ構造内のホモフィリーの度合いを微妙に向上させることで、GNNとクラスタリングの結果を著しく改善することに由来する。この目的を実現するために,階層相関推定とクラスタ認識スパース化という2つのクラスタリング指向構造学習モジュールを開発した。前者モジュールは、潜在空間とクラスタリング空間からのガイダンスを利用して、より正確なペアワイズノード関係の推定を可能にし、後者は類似度行列とクラスタリング割り当てに基づいてスパーシファイド構造を生成する。さらに,ホモフィリエンハンス構造学習とgnnベースのクラスタリングを交互に行う共同最適化手法を考案し,相互効果の促進を図る。さまざまなタイプとスケールの7つのベンチマークデータセットに関する広範な実験が、さまざまなクラスタリングメトリクスを通じて、最先端のベースラインに対するホールの優位性を示している。

Graph clustering is a fundamental task in graph analysis, and recent advances in utilizing graph neural networks (GNNs) have shown impressive results. Despite the success of existing GNN-based graph clustering methods, they often overlook the quality of graph structure, which is inherent in real-world graphs due to their sparse and multifarious nature, leading to subpar performance. Graph structure learning allows refining the input graph by adding missing links and removing spurious connections. However, previous endeavors in graph structure learning have predominantly centered around supervised settings, and cannot be directly applied to our specific clustering tasks due to the absence of ground-truth labels. To bridge the gap, we propose a novel method called \textbf{ho}mophily-enhanced structure \textbf{le}arning for graph clustering (HoLe). Our motivation stems from the observation that subtly enhancing the degree of homophily within the graph structure can significantly improve GNNs and clustering outcomes. To realize this objective, we develop two clustering-oriented structure learning modules, i.e., hierarchical correlation estimation and cluster-aware sparsification. The former module enables a more accurate estimation of pairwise node relationships by leveraging guidance from latent and clustering spaces, while the latter one generates a sparsified structure based on the similarity matrix and clustering assignments. Additionally, we devise a joint optimization approach alternating between training the homophily-enhanced structure learning and GNN-based clustering, thereby enforcing their reciprocal effects. Extensive experiments on seven benchmark datasets of various types and scales, across a range of clustering metrics, demonstrate the superiority of HoLe against state-of-the-art baselines.

翻訳日:2023-11-01 23:03:09 公開日:2023-10-30

# TSMD:静的カラーメッシュ品質評価のためのデータベース

TSMD: A Database for Static Color Mesh Quality Assessment Study ( http://arxiv.org/abs/2308.01940v2 )

ライセンス: Link先を確認

Qi Yang, Joel Jung, Haiqiang Wang, Xiaozhong Xu, and Shan Liu

(参考訳) テクスチャマップを備えた静的メッシュは、現代の工業や製造業で広く使われており、データ量が膨大であるためメッシュ圧縮コミュニティでかなりの注目を集めている。静的メッシュ圧縮アルゴリズムと客観的品質指標の研究を容易にするために,リッチな視覚特性を持つ42の参照メッシュを含む Tencent - Static Mesh Dataset (TSMD) を作成した。 210の歪んだサンプルは、6月23日に Alliance for Open Media の Volumetric Visual Media グループからリリースされた多角形静的メッシュ符号化の提案募集のために開発された非可逆圧縮方式によって生成される。処理済みビデオシーケンスを用いて,74名の視聴者から主観的スコアを収集するために, 大規模なクラウドソーシングによる主観的実験を行った。データセットは、そのサンプル多様性と平均オピニオンスコア(MOS)の精度を検証するために分析を行い、異質な性質と信頼性を確立する。最先端の客観的メトリクスは、新しいデータセットで評価される。ピアソンとスピアマンの相関は0.75程度と報告されており、より不均一性の低いデータセットで通常観測される結果から逸脱し、より堅牢なメトリクスのさらなる開発の必要性を示している。メッシュ、PVS、ビットストリーム、MOSを含むTSMDは、以下の場所で公開されている: https://multimedia.tencent.com/resources/tsmd

Static meshes with texture maps are widely used in modern industrial and manufacturing sectors and attract considerable attention in the mesh compression community due to their huge amount of data. To facilitate the study of static mesh compression algorithms and objective quality metrics, we create the Tencent - Static Mesh Dataset (TSMD) containing 42 reference meshes with rich visual characteristics. 210 distorted samples are generated by the lossy compression scheme developed for the Call for Proposals on polygonal static mesh coding, released on June 23 by the Alliance for Open Media Volumetric Visual Media group. Using processed video sequences, a large-scale, crowdsourcing-based, subjective experiment was conducted to collect subjective scores from 74 viewers. The dataset undergoes analysis to validate its sample diversity and Mean Opinion Scores (MOS) accuracy, establishing its heterogeneous nature and reliability. State-of-the-art objective metrics are evaluated on the new dataset. Pearson and Spearman correlations around 0.75 are reported, deviating from results typically observed on less heterogeneous datasets, demonstrating the need for further development of more robust metrics. The TSMD, including meshes, PVSs, bitstreams, and MOS, is made publicly available at the following location: https://multimedia.tencent.com/resources/tsmd.

翻訳日:2023-11-01 23:02:35 公開日:2023-10-30

# AIバリューチェーンの倫理

The Ethics of AI Value Chains ( http://arxiv.org/abs/2307.16787v2 )

ライセンス: Link先を確認

Blair Attard-Frost, David Gray Widder

(参考訳) AI倫理に関心を持つ研究者、実践者、政策立案者は、さまざまな状況や活動規模にわたるAIシステムの研究と介入に、より統合的なアプローチを必要とする。本稿では,AIバリューチェーンを,必要を満たす統合的概念として提示する。 AIバリューチェーンをより明確に理論化し、概念的にサプライチェーンと区別するために、我々は、バリューチェーンとAIバリューチェーンの理論を戦略的管理、サービスサイエンス、経済地理学、産業、政府、応用研究文献からレビューする。次に、AIバリューチェーンに関連する倫理的懸念をカバーする67のソースのサンプルの統合的レビューを行います。統合的レビューの結果に基づいて、研究者、実践者、政策立案者がAI開発をより倫理的に進め、AIバリューチェーンをまたいだ利用を進めるための4つの今後の方向性を推奨します。私たちのレビューと勧告は、aiバリューチェーンの倫理を研究し、介入しようとする研究課題、産業課題、政策課題の進展に寄与します。

Researchers, practitioners, and policymakers with an interest in AI ethics need more integrative approaches for studying and intervening in AI systems across many contexts and scales of activity. This paper presents AI value chains as an integrative concept that satisfies that need. To more clearly theorize AI value chains and conceptually distinguish them from supply chains, we review theories of value chains and AI value chains from the strategic management, service science, economic geography, industry, government, and applied research literature. We then conduct an integrative review of a sample of 67 sources that cover the ethical concerns implicated in AI value chains. Building upon the findings of our integrative review, we recommend four future directions that researchers, practitioners, and policymakers can take to advance more ethical practices of AI development and use across AI value chains. Our review and recommendations contribute to the advancement of research agendas, industrial agendas, and policy agendas that seek to study and intervene in the ethics of AI value chains.

翻訳日:2023-11-01 23:01:30 公開日:2023-10-30

# 機械学習のための物理システムにおけるサンプリングノイズ対策 -基本限界と固有タスク-

Tackling Sampling Noise in Physical Systems for Machine Learning Applications: Fundamental Limits and Eigentasks ( http://arxiv.org/abs/2307.16083v2 )

ライセンス: Link先を確認

Fangjun Hu, Gerasimos Angelatos, Saeed A. Khan, Marti Vives, Esin Türeci, Leon Bello, Graham E. Rowlands, Guilhem J. Ribeill, Hakan E. Türeci

(参考訳) 学習に使用する物理系の表現能力は,抽出した出力のノイズの存在によって制限される。古典系と量子系の両方に物理系が存在するが、学習におけるノイズの正確な影響はよく分かっていない。教師付き学習に着目し,有限サンプリング雑音下での一般物理系の可解表現能力(REC)を評価する数学的枠組みを提案し,そのエクストリーム,固有タスクを抽出する手法を提案する。固有タスクは、与えられた物理システムが最小限の誤差で近似できる関数のネイティブセットである。量子系のRECは、量子測定の基本理論によって制限され、任意の有限サンプリング物理系のRECに対して厳密な上界が得られることを示す。次に,低雑音固有タスクの抽出が,分類や過度適合性などの機械学習タスクのパフォーマンス向上につながるという実証的証拠を提供する。本稿では,量子システムの相関が固有タスクのノイズ低減により学習能力を高めることを示唆する。これらの結果の適用性は超伝導量子プロセッサの実験で実証されている。我々の発見は量子機械学習とセンシングの応用に幅広い影響を及ぼす。

The expressive capacity of physical systems employed for learning is limited by the unavoidable presence of noise in their extracted outputs. Though present in physical systems across both the classical and quantum regimes, the precise impact of noise on learning remains poorly understood. Focusing on supervised learning, we present a mathematical framework for evaluating the resolvable expressive capacity (REC) of general physical systems under finite sampling noise, and provide a methodology for extracting its extrema, the eigentasks. Eigentasks are a native set of functions that a given physical system can approximate with minimal error. We show that the REC of a quantum system is limited by the fundamental theory of quantum measurement, and obtain a tight upper bound for the REC of any finitely-sampled physical system. We then provide empirical evidence that extracting low-noise eigentasks can lead to improved performance for machine learning tasks such as classification, displaying robustness to overfitting. We present analyses suggesting that correlations in the measured quantum system enhance learning capacity by reducing noise in eigentasks. The applicability of these results in practice is demonstrated with experiments on superconducting quantum processors. Our findings have broad implications for quantum machine learning and sensing applications.

翻訳日:2023-11-01 23:01:12 公開日:2023-10-30

# 量子回路オートエンコーダ

Quantum Circuit AutoEncoder ( http://arxiv.org/abs/2307.08446v2 )

ライセンス: Link先を確認

Jun Wu, Hao Fu, Mingzheng Zhu, Haiyue Zhang, Wei Xie and Xiang-Yang Li

(参考訳) 量子オートエンコーダは、量子状態に格納された情報を圧縮するための量子ニューラルネットワークモデルである。しかし、新しい量子情報技術では、多くのタスクのために量子回路に格納された情報を処理する必要がある。本稿では,古典的および量子オートエンコーダの考え方を一般化した量子回路オートエンコーダ(QCAE)のモデルを導入し,量子回路内の情報を圧縮・符号化する。我々はQCAEの包括的なプロトコルを提供し、その実装のために変分量子アルゴリズム varQCAE を設計する。我々は、このモデルについて、損失のない圧縮条件を導出し、その回復率の上下境界を確立することによって理論的に解析する。最後に, varQCAEを3つの実用的なタスクに適用し, 1) 量子回路内の情報を効果的に圧縮し, (2) 量子回路の異常を検知し, (3) 量子デバイスにおける非偏極ノイズを軽減することを示す。このことは,量子回路の他の情報処理タスクにも応用可能であることを示唆する。

The quantum autoencoder is a quantum neural network model for compressing information stored in quantum states. However, one needs to process information stored in quantum circuits for many tasks in the emerging quantum information technology. In this work, generalizing the ideas of classical and quantum autoencoders, we introduce the model of Quantum Circuit AutoEncoder (QCAE) to compress and encode information within quantum circuits. We provide a comprehensive protocol for QCAE and design a variational quantum algorithm, varQCAE, for its implementation. We theoretically analyze this model by deriving conditions for lossless compression and establishing both upper and lower bounds on its recovery fidelity. Finally, we apply varQCAE to three practical tasks, and numerical results show that it can effectively (1) compress the information within quantum circuits, (2) detect anomalies in quantum circuits, and (3) mitigate the depolarizing noise in quantum devices. This suggests that our algorithm is potentially applicable to other information processing tasks for quantum circuits.

翻訳日:2023-11-01 23:00:14 公開日:2023-10-30

# S-QGPU:分散量子コンピューティングのための共有量子ゲート処理ユニット

S-QGPU: Shared Quantum Gate Processing Unit for Distributed Quantum Computing ( http://arxiv.org/abs/2309.08736v2 )

ライセンス: Link先を確認

Shengwang Du, Yufei Ding, Chunming Qiao

(参考訳) 本稿では,個々の小型量子コンピュータを共有量子ゲート処理ユニット(s-qgpu)に接続する分散量子コンピューティング(dqc)アーキテクチャを提案する。 S-QGPUは、リモートゲート操作のためのハイブリッド2ビットゲートモジュールからなる。各量子コンピュータが専用の通信キュービットを備えている従来のDQCシステムとは対照的に、S-QGPUはリモートゲート操作のためにリソース(例えば通信キュービット)を効果的にプールし、ローカルな量子コンピュータだけでなく、全体の分散システムのコストを大幅に削減する。予備解析とシミュレーションにより,S-QGPUの遠隔ゲート操作のための共有資源が資源利用の効率化を図っている。システム内の全ての計算キュービット(データキュービットとも呼ばれる)が同時遠隔ゲート操作を必要とするわけではない場合、S-QGPUベースのDQCアーキテクチャは通信キュービットを少なくし、全体的なコストを削減できる。あるいは、同じ数の通信キュービットで、特にバーストモードで発生する場合に、より多くの同時リモートゲート操作をより効率的にサポートすることができる。

We propose a distributed quantum computing (DQC) architecture in which individual small-sized quantum computers are connected to a shared quantum gate processing unit (S-QGPU). The S-QGPU comprises a collection of hybrid two-qubit gate modules for remote gate operations. In contrast to conventional DQC systems, where each quantum computer is equipped with dedicated communication qubits, S-QGPU effectively pools the resources (e.g., the communication qubits) together for remote gate operations, and thus significantly reduces the cost of not only the local quantum computers but also the overall distributed system. Our preliminary analysis and simulation show that S-QGPU's shared resources for remote gate operations enable efficient resource utilization. When not all computing qubits (also called data qubits) in the system require simultaneous remote gate operations, S-QGPU-based DQC architecture demands fewer communication qubits, further decreasing the overall cost. Alternatively, with the same number of communication qubits, it can support a larger number of simultaneous remote gate operations more efficiently, especially when these operations occur in a burst mode.

翻訳日:2023-11-01 22:53:00 公開日:2023-10-30

# クラスタ化マルチエージェント線形バンディット

Clustered Multi-Agent Linear Bandits ( http://arxiv.org/abs/2309.08710v2 )

ライセンス: Link先を確認

Hamza Cherkaoui and Merwan Barlier and Igor Colin

(参考訳) 本稿では,マルチエージェント線形確率バンディット問題(クラスタ型マルチエージェント線形バンディット)の具体例について述べる。そこで本研究では,エージェント間の効率的な協調を利用して最適化問題を高速化するアルゴリズムを提案する。このコントリビューションでは、ネットワークコントローラがネットワークの基盤となるクラスタ構造を推定し、同一グループ内のエージェント間で共有されるエクスペリエンスを最適化する。後悔最小化問題とクラスタリング品質の両方について理論的解析を行う。合成データと実データの両方における最先端アルゴリズムに対する実証的な評価を通じて,我々の手法の有効性を実証する。

We address in this paper a particular instance of the multi-agent linear stochastic bandit problem, called clustered multi-agent linear bandits. In this setting, we propose a novel algorithm leveraging an efficient collaboration between the agents in order to accelerate the overall optimization problem. In this contribution, a network controller is responsible for estimating the underlying cluster structure of the network and optimizing the experiences sharing among agents within the same groups. We provide a theoretical analysis for both the regret minimization problem and the clustering quality. Through empirical evaluation against state-of-the-art algorithms on both synthetic and real data, we demonstrate the effectiveness of our approach: our algorithm significantly improves regret minimization while managing to recover the true underlying cluster partitioning.

翻訳日:2023-11-01 22:52:41 公開日:2023-10-30

# 神経機能学習におけるparetoのフロンティア: データ、計算、幅、運

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck ( http://arxiv.org/abs/2309.03800v2 )

ライセンス: Link先を確認

Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang

(参考訳) 現代のディープラーニングでは、アルゴリズムによる選択(幅、深さ、学習率など)がニュアンスドリソーストレードオフを変調することが知られている。本研究は,これらの複雑度が,計算統計的ギャップの存在下での特徴学習に必然的に現れるかを検討する。まず,多層パーセプトロンの勾配に基づく学習のための統計的クエリの下限を許容する教師付き分類問題であるオフラインスパースパリティ学習を検討する。この下限は、多元的トレードオフフロンティアとして解釈することができる: 成功する学習は、十分なリッチ(大きなモデル)、知識のある(大きなデータセット)、患者(多くのトレーニングイテレーション)、幸運(多くのランダムな推測)がある場合にのみ発生する。理論上, 実験上, 疎初期化とネットワーク幅の増大がサンプル効率を著しく向上させることを示す。ここで、幅は平行探索の役割を担っている: 「ラッタチケット」ニューロンを見つける確率を増幅し、よりサンプル効率のよい特徴を学習する。最後に,合成スパースパリティタスクは,軸指向型特徴学習を必要とする実問題に対するプロキシとして有用であることを示す。広帯域かつ疎初期化MLPモデルを用いて,表層分類ベンチマークにおけるサンプル効率の向上を実証した。

In modern deep learning, algorithmic choices (such as width, depth, and learning rate) are known to modulate nuanced resource tradeoffs. This work investigates how these complexities necessarily arise for feature learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests.
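The sparse parity task mentioned in the abstract can be illustrated with a small data generator. This is a minimal sketch under assumptions, not the authors' code: the function name `make_sparse_parity`, the +/-1 input encoding, and the sizes (20 bits, a 3-bit hidden support) are all hypothetical choices made for illustration.

```python
import random

def make_sparse_parity(n_samples, n_bits, support, seed=0):
    """Sample (x, y) pairs where y is the parity of the bits of x indexed by `support`.

    Inputs use the +/-1 encoding, so the parity equals the product of the
    selected coordinates. All other coordinates are irrelevant to the label.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        x = [rng.choice((-1, 1)) for _ in range(n_bits)]
        y = 1
        for i in support:
            y *= x[i]
        data.append((x, y))
    return data

# Hypothetical sizes: a 3-sparse parity hidden among 20 input bits.
dataset = make_sparse_parity(n_samples=1000, n_bits=20, support=(2, 7, 11))
```

A learner only sees `x` and `y`; the difficulty lies in discovering which few coordinates form the support, which is where the abstract's notion of "lucky" neurons comes in.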

翻訳日:2023-11-01 22:51:36 公開日:2023-10-30

# ハードサンプルリマイニング戦略によるロバスト植物病診断に向けて

Towards Robust Plant Disease Diagnosis with Hard-sample Re-mining Strategy ( http://arxiv.org/abs/2309.01903v2 )

ライセンス: Link先を確認

Quan Huu Cap, Atsushi Fukuda, Satoshi Kagiwada, Hiroyuki Uga, Nobusuke Iwasaki, Hitoshi Iyatomi

(参考訳) リッチなアノテーション情報により、オブジェクト検出に基づく自動植物病診断システム(例えば、yoloベースのシステム)は、病気の位置の検出や優れた分類性能などの分類ベースのシステム(例えば、effernetベースの)よりも優れていることが多い。これらの検出システムの欠点の1つは、実際の症状が存在しない無注釈の健康データを扱うことである。実際には、健康な植物データは多くの病気データと非常によく似ている。したがって、これらのモデルはしばしば、健康な画像の誤検出ボックスを生成する。加えて、新しいデータを検出モデルにラベル付けるのは通常時間がかかる。 HSM (Hard-sample mining) は、誤り検出ボックスを新しいトレーニングサンプルとして使用することで、モデルを再訓練する一般的な手法である。しかしながら、任意の量のハードサンプルを盲目的に選択すると、疾患と健康データとの類似性が高いため、他の疾患の診断性能が低下する。本稿では,健康なデータの診断性能を高めるとともに,適切なレベルでハードサンプルトレーニング画像を戦略的に選択することで疾患データの性能を向上させることを目的とした,ハードサンプルリマイニング(HSReM)と呼ばれる簡易かつ効果的なトレーニング戦略を提案する。実践的な2つの8クラスキュウリと10クラスのトマトデータセット(42.7Kと35.6Kの画像)に基づく実験により、我々のHSReMトレーニング戦略は、大規模未確認データに対する全体的な診断性能を大幅に改善することを示した。具体的には、HSReM戦略を用いて訓練されたオブジェクト検出モデルは、分類に基づく最先端NetV2-Largeモデルとオリジナルのオブジェクト検出モデルよりも優れた結果を得ただけでなく、複数の評価指標においてHSM戦略を用いたモデルよりも優れていた。

With rich annotation information, object detection-based automated plant disease diagnosis systems (e.g., YOLO-based systems) often provide advantages over classification-based systems (e.g., EfficientNet-based), such as the ability to detect disease locations and superior classification performance. One drawback of these detection systems is dealing with unannotated healthy data with no real symptoms present. In practice, healthy plant data appear to be very similar to many disease data. Thus, those models often produce mis-detected boxes on healthy images. In addition, labeling new data for detection models is typically time-consuming. Hard-sample mining (HSM) is a common technique for re-training a model by using the mis-detected boxes as new training samples. However, blindly selecting an arbitrary amount of hard-sample for re-training will result in the degradation of diagnostic performance for other diseases due to the high similarity between disease and healthy data. In this paper, we propose a simple but effective training strategy called hard-sample re-mining (HSReM), which is designed to enhance the diagnostic performance of healthy data and simultaneously improve the performance of disease data by strategically selecting hard-sample training images at an appropriate level. Experiments based on two practical in-field eight-class cucumber and ten-class tomato datasets (42.7K and 35.6K images) show that our HSReM training strategy leads to a substantial improvement in the overall diagnostic performance on large-scale unseen data. Specifically, the object detection model trained using the HSReM strategy not only achieved superior results as compared to the classification-based state-of-the-art EfficientNetV2-Large model and the original object detection model, but also outperformed the model using the HSM strategy in multiple evaluation metrics.
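Plain hard-sample mining, as described in the abstract, can be sketched as collecting the boxes a detector wrongly predicts on healthy images. The sketch below is an assumption-laden illustration: the data layout, the `score_threshold`, and the `max_samples` cap are hypothetical, and the selection rule that HSReM actually uses is not given in the abstract, so it is not reproduced here.

```python
def mine_hard_samples(detections, score_threshold=0.5, max_samples=None):
    """Collect mis-detected boxes on healthy images as candidate hard samples.

    `detections` maps an image id to a list of (box, score) pairs predicted on
    an image known to be healthy, so every predicted box is a false positive.
    """
    hard = [
        (image_id, box, score)
        for image_id, boxes in detections.items()
        for box, score in boxes
        if score >= score_threshold
    ]
    # Most confident mistakes first; optionally cap how many are kept.
    hard.sort(key=lambda item: item[2], reverse=True)
    return hard if max_samples is None else hard[:max_samples]

# Hypothetical detector output on two healthy images.
detections = {
    "img1": [((0, 0, 10, 10), 0.9), ((5, 5, 20, 20), 0.3)],
    "img2": [((1, 1, 4, 4), 0.7)],
}
selected = mine_hard_samples(detections, score_threshold=0.5, max_samples=1)
```

The cap stands in for the abstract's point that the amount of hard samples matters: adding too many healthy look-alikes can degrade performance on disease classes.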

翻訳日:2023-11-01 22:51:12 公開日:2023-10-30

# NLLB-CLIP -- 低予算で高性能な多言語画像検索モデルを訓練する

NLLB-CLIP -- train performant multilingual image retrieval model on a budget ( http://arxiv.org/abs/2309.01859v2 )

ライセンス: Link先を確認

Alexander Visheratin

(参考訳) 今日では、大規模コンピューティング資源の助けを借りて、学術機関や産業機関によって開発された大規模モデルの指数関数的増加は、そのような資源にアクセスできない人が貴重な科学的貢献を得られるかどうかという疑問を提起している。そこで我々は,1000ドルの限られた予算を持つ多言語画像検索の課題を解決することを試みた。その結果,NLLBモデルからテキストエンコーダを用いたNLLB-CLIP-CLIPモデルを提案する。このモデルをトレーニングするために、LAION COCOデータセットから派生した201言語でキャプション付き106,246の良質な画像の自動生成データセットを使用した。様々なサイズの画像とテキストエンコーダを用いて複数のモデルを訓練し、トレーニング中にモデルの異なる部分を凍結させた。既存の評価データセットと、新たに作成されたxtd200とflickr30k-200データセットを用いて、トレーニングモデルを徹底的に分析した。我々は,NLLB-CLIPが最先端モデルに匹敵する品質であり,低リソース言語ではかなり優れていることを示す。

Today, the exponential rise of large models developed by academic and industrial institutions with the help of massive computing resources raises the question of whether someone without access to such resources can make a valuable scientific contribution. To explore this, we tried to solve the challenging task of multilingual image retrieval with a limited budget of $1,000. As a result, we present NLLB-CLIP, a CLIP model with a text encoder from the NLLB model. To train the model, we used an automatically created dataset of 106,246 good-quality images with captions in 201 languages derived from the LAION COCO dataset. We trained multiple models using image and text encoders of various sizes and kept different parts of the model frozen during the training. We thoroughly analyzed the trained models using existing evaluation datasets and newly created XTD200 and Flickr30k-200 datasets. We show that NLLB-CLIP is comparable in quality to state-of-the-art models and significantly outperforms them on low-resource languages.

翻訳日:2023-11-01 22:50:28 公開日:2023-10-30

# NAS-X: ツイストによるニューラル適応平滑化

NAS-X: Neural Adaptive Smoothing via Twisting ( http://arxiv.org/abs/2308.14864v2 )

ライセンス: Link先を確認

Dieterich Lawson, Michael Li, Scott Linderman

(参考訳) 逐次潜在変数モデル(SLVM)は統計学や機械学習において必須のツールであり、医療から神経科学まで幅広い応用がある。柔軟性が増すにつれて、解析的推論とモデル学習は難しくなり、近似メソッドが必要となる。本稿では,smc(s smoothing sequential monte carlo)を用いて再重み付けウェイクスリープ(reweighted wake-sleep, rws)を逐次設定に拡張したニューラルアダプティブスライディング(nas-x)を提案する。 RWS と滑らかな SMC を組み合わせることで、NAS-X は低バイアスおよび低分散勾配推定を提供し、離散変数モデルと連続変数モデルの両方に適合する。従来の手法よりもNAS-Xの理論的利点を説明し、神経力学の力学モデルへの挑戦を含む様々なタスクにおいてこれらの利点を実証的に探求する。これらの実験により,NAS-X は従来の VI- および RWS に基づく推論とモデル学習の手法を著しく上回り,より低いパラメータ誤差とより厳密な近距離境界を達成した。

Sequential latent variable models (SLVMs) are essential tools in statistics and machine learning, with applications ranging from healthcare to neuroscience. As their flexibility increases, analytic inference and model learning can become challenging, necessitating approximate methods. Here we introduce neural adaptive smoothing via twisting (NAS-X), a method that extends reweighted wake-sleep (RWS) to the sequential setting by using smoothing sequential Monte Carlo (SMC) to estimate intractable posterior expectations. Combining RWS and smoothing SMC allows NAS-X to provide low-bias and low-variance gradient estimates, and fit both discrete and continuous latent variable models. We illustrate the theoretical advantages of NAS-X over previous methods and explore these advantages empirically in a variety of tasks, including a challenging application to mechanistic models of neuronal dynamics. These experiments show that NAS-X substantially outperforms previous VI- and RWS-based methods in inference and model learning, achieving lower parameter error and tighter likelihood bounds.

翻訳日:2023-11-01 22:50:02 公開日:2023-10-30

# SGMM: モーメントの一般化法に対する確率近似

SGMM: Stochastic Approximation to Generalized Method of Moments ( http://arxiv.org/abs/2308.13564v2 )

ライセンス: Link先を確認

Xiaohong Chen, Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin, Myunghyun Song

(参考訳) 本稿では,(過大な)モーメント制限モデルに対する推定と推論のための新しいアルゴリズムである確率的一般化モーメント法(sgmm)を提案する。我々のSGMMは、人気のあるHansen (1982) (オフライン) GMMに代わる新しい確率近似であり、ストリーミングデータセットをリアルタイムに処理できる高速でスケーラブルな実装を提供する。ほぼ確実な収束と、非効率的なオンライン2SLSと効率的なSGMMに対する(機能的な)中心極限定理を確立する。さらに,SGMMフレームワークにシームレスに統合可能なDurbin-Wu-HausmanおよびSargan-Hansenテストのオンライン版を提案する。大規模なモンテカルロシミュレーションでは、サンプルのサイズが大きくなるにつれて、SGMMは推定精度の点で標準(オフライン)GMMと一致し、計算効率が向上し、大規模なデータセットとオンラインデータセットの両方で実用的価値が示される。サンプルサイズが大きい2つのよく知られた実験例を用いて,概念実証によるアプローチの有効性を実証した。

We introduce a new class of algorithms, Stochastic Generalized Method of Moments (SGMM), for estimation and inference on (overidentified) moment restriction models. Our SGMM is a novel stochastic approximation alternative to the popular Hansen (1982) (offline) GMM, and offers fast and scalable implementation with the ability to handle streaming datasets in real time. We establish the almost sure convergence, and the (functional) central limit theorem for the inefficient online 2SLS and the efficient SGMM. Moreover, we propose online versions of the Durbin-Wu-Hausman and Sargan-Hansen tests that can be seamlessly integrated within the SGMM framework. Extensive Monte Carlo simulations show that as the sample size increases, the SGMM matches the standard (offline) GMM in terms of estimation accuracy and gains in computational efficiency, indicating its practical value for both large-scale and online datasets. We demonstrate the efficacy of our approach by a proof of concept using two well-known empirical examples with large sample sizes.

翻訳日:2023-11-01 22:48:48 公開日:2023-10-30

# 時間関数による量子状態の特異性

Uniqueness of quantum state over time function ( http://arxiv.org/abs/2308.12752v2 )

ライセンス: Link先を確認

Seok Hyung Lie and Nelly H. Y. Ng

(参考訳) 基本的な非対称性は、空間と時間の間の従来の量子理論の枠組みの中に存在し、量子チャネルによる因果関係と多部量子状態による非因果関係を表す。このような区別は古典的確率論には存在しない。この対称性を量子理論に導入するために、量子系の動的記述が時間とともに静的な量子状態によってカプセル化されるような新しい枠組みが最近提案されている。特に、fullwoodとparzygnatは、jordan積に基づく状態超時間関数をそのような量子超時間関数の有望な候補として提案し、horsmanらによるno-goの結果で必要とされる全ての公理を満たすことを示した。しかし、公理が時間関数に対して一意な状態を誘導するかどうかは明らかでない。本研究では,従来提案されていた公理が時間関数で一意な状態にならないことを示す。そこで我々は,2点を超える任意の時空領域上の量子状態を記述するのにより適した,操作的動機づけのある別の公理集合を提案する。これにより、全ての操作公理を満たす本質的に一意な関数としてフルウッド・パリジーニャート状態が時間関数として確立される。

A fundamental asymmetry exists within the conventional framework of quantum theory between space and time, in terms of representing causal relations via quantum channels and acausal relations via multipartite quantum states. Such a distinction does not exist in classical probability theory. In an effort to introduce this symmetry to quantum theory, a new framework has recently been proposed, such that the dynamical description of a quantum system can be encapsulated by a static quantum state over time. In particular, Fullwood and Parzygnat recently proposed the state over time function based on the Jordan product as a promising candidate for such a quantum state over time function, by showing that it satisfies all the axioms required in the no-go result by Horsman et al. However, it was unclear if the axioms induce a unique state over time function. In this work, we demonstrate that the previously proposed axioms cannot yield a unique state over time function. In response, we propose an alternative set of axioms that is operationally motivated, and better suited to describe quantum states over any spacetime regions beyond two points. By doing so, we establish the Fullwood-Parzygnat state over time function as the essentially unique function satisfying all these operational axioms.

翻訳日:2023-11-01 22:48:07 公開日:2023-10-30

# RefEgo:Ego4Dの自己認識から得られる表現理解データを参照

RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D ( http://arxiv.org/abs/2308.12035v2 )

ライセンス: Link先を確認

Shuhei Kurita, Naoki Katsura, Eri Onami

(参考訳) 一対一の視点からシーンオブジェクトのテキスト表現を接地することは、周囲を認識し、直感的なテキスト指示に従って振る舞うエージェントの開発において本当に要求される能力である。このような能力は、ガラスデバイスや自律ロボットが現実世界の参照対象をローカライズする必要がある。しかし、画像の通常の参照表現理解タスクでは、データセットは主にwebクローラーデータに基づいて構築されており、現実世界のさまざまなオブジェクトのテキスト表現を接地するタスクにおいて、多様な現実世界の構造を反映していない。近年,ego4dの大規模エゴセントリックビデオデータセットが提案されている。 Ego4Dは、ショッピング、料理、ウォーキング、トーキー、製造など、屋内および屋外の多くの状況を含む世界中の多様な現実世界のシーンをカバーしている。 ego4dのエゴセントリックビデオに基づいて、ビデオベースの参照表現理解データセットrefegoの広範なカバレッジを構築しました。我々のデータセットは、ビデオベースの参照式理解アノテーションに12K以上のビデオクリップと41時間を含む。実験では、最先端の2D参照表現理解モデルとオブジェクト追跡アルゴリズムを併用し、困難な状況下でもビデオワイド参照オブジェクト追跡を実現する:ビデオの途中で参照オブジェクトがフレーム外になる、あるいはビデオに複数の類似オブジェクトが提示される。コードはhttps://github.com/shuheikurita/refegoで入手できる。

Grounding textual expressions on scene objects from first-person views is a truly demanding capability in developing agents that are aware of their surroundings and behave following intuitive text instructions. Such a capability is necessary for glass-devices or autonomous robots to localize referred objects in the real world. In the conventional referring expression comprehension tasks of images, however, datasets are mostly constructed based on web-crawled data and do not reflect diverse real-world structures on the task of grounding textual expressions in diverse objects in the real world. Recently, a massive-scale egocentric video dataset, Ego4D, was proposed. Ego4D covers diverse real-world scenes from around the world, including numerous indoor and outdoor situations such as shopping, cooking, walking, talking, manufacturing, etc. Based on egocentric videos of Ego4D, we constructed RefEgo, a video-based referring expression comprehension dataset with broad coverage. Our dataset includes more than 12k video clips and 41 hours for video-based referring expression comprehension annotation. In experiments, we combine the state-of-the-art 2D referring expression comprehension models with the object tracking algorithm, achieving the video-wise referred object tracking even in difficult conditions: the referred object becomes out-of-frame in the middle of the video or multiple similar objects are presented in the video. Code is available at https://github.com/shuheikurita/RefEgo

翻訳日:2023-11-01 22:47:46 公開日:2023-10-30

# オブジェクトパーマンスによるオフライン追跡

Offline Tracking with Object Permanence ( http://arxiv.org/abs/2310.01288v2 )

ライセンス: Link先を確認

Xianzhong Liu, Holger Caesar

(参考訳) 自動走行データセットの手動ラベリングに要するコストを削減すべく、オフライン認識システムを用いてデータセットを自動的にラベリングする。しかし、物体は時間的にオクルードされることがある。このようなデータセットのオクルージョンシナリオは、オフラインのオートラベルでは未検討のままである。本研究では,隠蔽対象トラックに着目したオフライン追跡モデルを提案する。オブジェクト永続性(object permanence)という概念を利用しており、もはや観測されていなくてもオブジェクトは存在し続ける。このモデルには、標準的なオンライントラッカー、閉塞前後のトラックレットを関連付ける再識別(Re-ID)モジュール、断片化されたトラックを補完するトラック補完モジュールの3つの部分が含まれている。 Re-IDモジュールとトラック完了モジュールは、ベクトル化されたマップを入力の1つとして使用し、オクルージョンで追跡結果を洗練する。モデルは、閉塞された対象軌跡を効果的に回収することができる。従来のオンライン追跡結果を45%のIDSと2%のAMOTAで改善し、3Dマルチオブジェクトトラッキングにおける最先端のパフォーマンスを実現する。

To reduce the expensive labor cost of manually labeling autonomous driving datasets, an alternative is to automatically label the datasets using an offline perception system. However, objects might be temporally occluded. Such occlusion scenarios in the datasets are common yet underexplored in offline autolabeling. In this work, we propose an offline tracking model that focuses on occluded object tracks. It leverages the concept of object permanence, which means objects continue to exist even if they are not observed anymore. The model contains three parts: a standard online tracker, a re-identification (Re-ID) module that associates tracklets before and after occlusion, and a track completion module that completes the fragmented tracks. The Re-ID module and the track completion module use the vectorized map as one of the inputs to refine the tracking results with occlusion. The model can effectively recover the occluded object trajectories. It achieves state-of-the-art performance in 3D multi-object tracking by improving over the original online tracking result by 45% IDS and 2% AMOTA on the vehicle tracks.

翻訳日:2023-11-01 22:40:24 公開日:2023-10-30

# ミニバッチSGDと局所SGDの安定性と一般化

Stability and Generalization for Minibatch SGD and Local SGD ( http://arxiv.org/abs/2310.01139v2 )

ライセンス: Link先を確認

Yunwen Lei, Tao Sun, Mingrui Liu

(参考訳) データの規模が大きくなることで、最適化のスピードアップに並列性を活用する人気が高まっている。ミニバッチ確率勾配降下(ミニバッチSGD)と局所SGDは並列最適化の2つの一般的な方法である。既存の理論的研究は、最適化誤差によって測定される機械の数に関して、これらの手法の線形高速化を示している。比較として、これらの手法の安定性と一般化はあまり研究されていない。本稿では,ミニバッチと局所SGDの安定性と一般化解析を行い,新しい予測分散分解を導入して学習可能性を理解する。トレーニングエラーを安定性解析に組み込むことで、過パラメータモデルの一般化にいかに役立つかを示す。最適リスク境界を達成するために,ミニバッチと局所SGDの両方が線形スピードアップを達成することを示す。

The increasing scale of data propels the popularity of leveraging parallelism to speed up the optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. The existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors. As a comparison, the stability and generalization of these methods are much less studied. In this paper, we study the stability and generalization analysis of minibatch and local SGD to understand their learnability by introducing a novel expectation-variance decomposition. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. We show both minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.
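The difference between the two parallel methods named in the abstract can be sketched on a toy one-dimensional problem. This is an illustrative sketch only: the objective f(w) = 0.5 * E[(w - z)^2] with z ~ N(mu, sigma^2), the function names, and all hyperparameters are assumptions chosen for simplicity, and the machines are simulated sequentially in one process.

```python
import random

def stochastic_grad(w, rng, mu=3.0, sigma=1.0):
    """Noisy gradient of f(w) = 0.5 * E[(w - z)^2] with z ~ N(mu, sigma^2)."""
    return w - rng.gauss(mu, sigma)

def minibatch_sgd(machines, rounds, local_steps, lr, seed=0):
    """Each round, all gradients are taken at the shared iterate and averaged into one step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds):
        grads = [stochastic_grad(w, rng) for _ in range(machines * local_steps)]
        w -= lr * sum(grads) / len(grads)
    return w

def local_sgd(machines, rounds, local_steps, lr, seed=0):
    """Each round, every machine runs SGD on its own copy; the copies are then averaged."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(rounds):
        local_iterates = []
        for _ in range(machines):
            v = w
            for _ in range(local_steps):
                v -= lr * stochastic_grad(v, rng)
            local_iterates.append(v)
        w = sum(local_iterates) / machines
    return w

w_minibatch = minibatch_sgd(machines=8, rounds=200, local_steps=5, lr=0.1)
w_local = local_sgd(machines=8, rounds=200, local_steps=5, lr=0.1)
```

Both variants use the same number of gradient samples and communication rounds; they differ only in whether the per-round samples are evaluated at one shared point or along separate local trajectories. On this toy objective both should end up close to the minimizer `mu`.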

翻訳日:2023-11-01 22:39:47 公開日:2023-10-30

# ResolvNet: マルチスケール一貫性を備えたグラフ畳み込みネットワーク

ResolvNet: A Graph Convolutional Network with multi-scale Consistency ( http://arxiv.org/abs/2310.00431v2 )

ライセンス: Link先を確認

Christian Koke, Abhishek Saroha, Yuesong Shen, Marvin Eisenberger, Daniel Cremers

(参考訳) 現在、グラフ学習コミュニティでよく知られている事実として、ボトルネックの存在は、グラフニューラルネットワークが長距離情報を伝播する能力を著しく制限している。今のところ評価されていないのは、直観的には、強い連結されたサブグラフの存在が、共通のアーキテクチャにおける情報フローを厳しく制限する可能性があることだ。この観測により,マルチスケール一貫性の概念が導入された。ノードレベルでは、この概念は与えられたグラフ上で接続が変化しても接続された伝播グラフの保持を指す。グラフレベルでは、マルチスケールの一貫性は、異なる解像度で同じオブジェクトを記述する異なるグラフが同様の特徴ベクトルを割り当てるべきという事実を指す。このように、両方の特性は、多面グラフニューラルネットワークアーキテクチャでは満足できない。これらの欠点を補うために,リゾルダーの数学的概念に基づくフレキシブルグラフニューラルネットワークResolvNetを導入する。このResolvNetアーキテクチャに基づくネットワークは、多くのタスク、すなわちマルチスケール設定の内外において、はるかに高いパフォーマンスのベースラインを誇示しています。

It is by now a well-known fact in the graph learning community that the presence of bottlenecks severely limits the ability of graph neural networks to propagate information over long distances. What so far has not been appreciated is that, counter-intuitively, the presence of strongly connected sub-graphs may also severely restrict information flow in common architectures. Motivated by this observation, we introduce the concept of multi-scale consistency. At the node level, this concept refers to the retention of a connected propagation graph even if connectivity varies over a given graph. At the graph level, multi-scale consistency refers to the fact that distinct graphs describing the same object at different resolutions should be assigned similar feature vectors. As we show, both properties are not satisfied by popular graph neural network architectures. To remedy these shortcomings, we introduce ResolvNet, a flexible graph neural network based on the mathematical concept of resolvents. We rigorously establish its multi-scale consistency theoretically and verify it in extensive experiments on real-world data. Here, networks based on this ResolvNet architecture prove expressive, significantly outperforming baselines on many tasks, both in and outside the multi-scale setting.

翻訳日:2023-11-01 22:39:35 公開日:2023-10-30

# SMPLer-X:表現力のある人文のスケールアップと形状推定

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation ( http://arxiv.org/abs/2309.17448v2 )

ライセンス: Link先を確認

Zhongang Cai, Wanqi Yin, Ailing Zeng, Chen Wei, Qingping Sun, Yanjun Wang, Hui En Pang, Haiyi Mei, Mingyuan Zhang, Lei Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

(参考訳) 表現的人間のポーズと形状推定(EHPS)は、身体、手、顔の動きのキャプチャを多数の応用で統一する。進歩を奨励しているにもかかわらず、現在の最先端の手法は依然としてトレーニングデータセットの限定セットに依存している。本研究では,VT-Hugeをバックボーンとし,さまざまなデータソースから最大4.5万インスタンスをトレーニングする,最初のジェネラリスト基盤モデル(SMPLer-Xと呼ばれる)へのEHPSのスケールアップについて検討する。ビッグデータと大規模モデルにより、SMPLer-Xは、さまざまなテストベンチマークにまたがる強力なパフォーマンスと、目に見えない環境への優れた転送性を示す。 1) データのスケーリングには,32のEHPSデータセットに対して,単一のデータセットでトレーニングしたモデルでは処理できない幅広いシナリオを含む,体系的な調査を行う。さらに重要なのは、広範なベンチマークプロセスから得られた洞察を活かして、トレーニングスキームを最適化し、EHPS能力の大きな飛躍につながるデータセットを選択することです。 2) モデルスケーリングでは,EHPSにおけるモデルサイズのスケーリング法則を研究するために,視覚変換器を利用する。さらに,我々はSMPLer-Xを専門モデルとし,さらなる性能向上を実現した。 AGORA (107.2 mm NMVE)、UBody (57.4 mm PVE)、EgoBody (63.6 mm PVE)、EHF (62.3 mm PVE) の7つのベンチマークに対して、我々の基礎モデルSMPLer-Xは一貫して最先端の結果を提供する。ホームページ:https://caizhongang.github.io/projects/SMPLer-X/

Expressive human pose and shape estimation (EHPS) unifies body, hands, and face motion capture with numerous applications. Despite encouraging progress, current state-of-the-art methods still depend largely on a confined set of training datasets. In this work, we investigate scaling up EHPS towards the first generalist foundation model (dubbed SMPLer-X), with up to ViT-Huge as the backbone and training with up to 4.5M instances from diverse data sources. With big data and the large model, SMPLer-X exhibits strong performance across diverse test benchmarks and excellent transferability to even unseen environments. 1) For the data scaling, we perform a systematic investigation on 32 EHPS datasets, including a wide range of scenarios that a model trained on any single dataset cannot handle. More importantly, capitalizing on insights obtained from the extensive benchmarking process, we optimize our training scheme and select datasets that lead to a significant leap in EHPS capabilities. 2) For the model scaling, we take advantage of vision transformers to study the scaling law of model sizes in EHPS. Moreover, our finetuning strategy turns SMPLer-X into specialist models, allowing them to achieve further performance boosts. Notably, our foundation model SMPLer-X consistently delivers state-of-the-art results on seven benchmarks such as AGORA (107.2 mm NMVE), UBody (57.4 mm PVE), EgoBody (63.6 mm PVE), and EHF (62.3 mm PVE without finetuning). Homepage: https://caizhongang.github.io/projects/SMPLer-X/

翻訳日:2023-11-01 22:38:50 公開日:2023-10-30

# 再利用性レポート: 生物学的知見に基づく多様なニューラルアーキテクチャによる前立腺癌の層別化

Reusability report: Prostate cancer stratification with diverse biologically-informed neural architectures ( http://arxiv.org/abs/2309.16645v2 )

ライセンス: Link先を確認

Christian Pedersen, Tiberiu Tesileanu, Tinghui Wu, Siavash Golkar, Miles Cranmer, Zijun Zhang, Shirley Ho

(参考訳) elmarakeby et al., "biologically informed deep neural network for prostate cancer discovery"では、生物学的にインフォームドされたフィードフォワードニューラルネットワークであるsparse connections (p-net)が、前立腺癌の状態をモデル化するために提示された。 Elmarakebyらが実施した研究の再現性について,元のコードベースと,より最新のライブラリを使用した独自の再実装の両方を用いて検証した。 reactomeの生物学的経路によるネットワークスパーシフィケーションの寄与を定量化し,p-netの優れた性能にその重要性を確認した。さらに,生体情報をネットワークに組み込むためのニューラルアーキテクチャやアプローチについても検討した。同じトレーニングデータ上で3種類のグラフニューラルネットワークを実験し,各モデル間の臨床予測の一致について検討した。分析の結果、異なるアーキテクチャを持つディープニューラルネットワークは、特定のニューラルアーキテクチャの異なる初期化にまたがる個々の患者に対して、誤った予測を行うことがわかった。これは、異なる神経アーキテクチャがデータの異なる側面に敏感であることを示唆している。

In Elmarakeby et al., "Biologically informed deep neural network for prostate cancer discovery", a feedforward neural network with biologically informed, sparse connections (P-NET) was presented to model the state of prostate cancer. We verified the reproducibility of the study conducted by Elmarakeby et al., using both their original codebase and our own re-implementation using more up-to-date libraries. We quantified the contribution of network sparsification by Reactome biological pathways, and confirmed its importance to P-NET's superior performance. Furthermore, we explored alternative neural architectures and approaches to incorporating biological information into the networks. We experimented with three types of graph neural networks on the same training data, and investigated the clinical prediction agreement between different models. Our analyses demonstrated that deep neural networks with distinct architectures make incorrect predictions for individual patients that are persistent across different initializations of a specific neural architecture. This suggests that different neural architectures are sensitive to different aspects of the data, an important yet under-explored challenge for clinical prediction tasks.

翻訳日:2023-11-01 22:37:55 公開日:2023-10-30

# ADGym: 深部異常検出のための設計選択

ADGym: Design Choices for Deep Anomaly Detection ( http://arxiv.org/abs/2309.15376v2 )

ライセンス: Link先を確認

Minqi Jiang, Chaochuan Hou, Ao Zheng, Songqiao Han, Hailiang Huang, Qingsong Wen, Xiyang Hu, Yue Zhao

(参考訳) ディープラーニング(DL)技術は、金融、医療サービス、クラウドコンピューティングなど、さまざまな分野における異常検出(AD)に成功している。しかしながら、現在の研究の多くは、損失関数やネットワークアーキテクチャといった個々の設計選択の貢献を解剖することなく、ディープADアルゴリズム全体を概観する傾向にある。この見解は、新たに設計された損失関数、ネットワークアーキテクチャ、学習パラダイムなど、データ前処理のような予備的なステップの価値を低下させる傾向にある。本稿では,このギャップを埋めるために,2つの重要な疑問を提起する。 (i)異常検出には,深層ad手法のどの設計選択が不可欠か? (ii) 汎用的で既存のソリューションに頼るのではなく、任意のADデータセットに対して最適な設計選択を自動的に選択する方法。これらの問題に対処するため,より深い手法でAD設計要素を包括的に評価し,自動選択するプラットフォームであるADGymを紹介した。我々の広範な実験により、既存のリードメソッドのみに頼るだけでは不十分であることが判明した。対照的にADGymを用いて開発されたモデルは、現在の最先端技術を大きく上回っている。

Deep learning (DL) techniques have recently found success in anomaly detection (AD) across various fields such as finance, medical services, and cloud computing. However, most of the current research tends to view deep AD algorithms as a whole, without dissecting the contributions of individual design choices like loss functions and network architectures. This view tends to diminish the value of preliminary steps like data preprocessing, as more attention is given to newly designed loss functions, network architectures, and learning paradigms. In this paper, we aim to bridge this gap by asking two key questions: (i) Which design choices in deep AD methods are crucial for detecting anomalies? (ii) How can we automatically select the optimal design choices for a given AD dataset, instead of relying on generic, pre-existing solutions? To address these questions, we introduce ADGym, a platform specifically crafted for comprehensive evaluation and automatic selection of AD design elements in deep methods. Our extensive experiments reveal that relying solely on existing leading methods is not sufficient. In contrast, models developed using ADGym significantly surpass current state-of-the-art techniques.

翻訳日:2023-11-01 22:36:55 公開日:2023-10-30

# グラフコントラスト学習のための証明可能な訓練

Provable Training for Graph Contrastive Learning ( http://arxiv.org/abs/2309.13944v2 )

ライセンス: Link先を確認

Yue Yu, Xiao Wang, Mengmei Zhang, Nian Liu, Chuan Shi

(参考訳) グラフコントラスト学習(GCL)は、ラベルなしで拡張グラフからノード埋め込みを学習するための一般的な訓練手法として登場した。正のノード対間の類似性を最大化しつつ、負のノード対間の類似性を最小化するという基本原理は確立されているが、いくつかの根本的な問題はいまだ明らかでない。複雑なグラフ構造を考えると、異なるグラフ拡張のもとでも一貫してよく訓練され、この原理に従うノードはあるのか? あるいは、グラフ拡張全体にわたって訓練が不十分になりやすく、原理に違反するノードはあるのか? これらのノードをどのように区別し、GCLの訓練をさらに導くか? これらの疑問に答えるために、まず、GCLの訓練が実際にノード間で不均衡であることを示す実験的な証拠を提示する。この問題に対処するために、拡張の範囲に関して、ノードがGCLの原理にどの程度従うかの下界となる指標「ノードコンパクト性」を提案する。さらに、バウンド伝搬によってノードコンパクト性の形を理論的に導出し、これは正則化として二値クロスエントロピーに組み込むことができる。そこで本稿では、GCLの原理によりよく従うノード埋め込みを符号化するようにGCLの訓練を正則化する PrOvable Training (POT) を提案する。さまざまなベンチマークでの広範な実験を通じて、POTは既存のGCL手法を一貫して改善し、使いやすいプラグインとして機能する。

Graph Contrastive Learning (GCL) has emerged as a popular training approach for learning node embeddings from augmented graphs without labels. Despite the key principle that maximizing the similarity between positive node pairs while minimizing it between negative node pairs is well established, some fundamental problems are still unclear. Considering the complex graph structure, are some nodes consistently well-trained and following this principle even with different graph augmentations? Or are there some nodes more likely to be untrained across graph augmentations and violate the principle? How to distinguish these nodes and further guide the training of GCL? To answer these questions, we first present experimental evidence showing that the training of GCL is indeed imbalanced across all nodes. To address this problem, we propose the metric "node compactness", which is the lower bound of how a node follows the GCL principle related to the range of augmentations. We further derive the form of node compactness theoretically through bound propagation, which can be integrated into binary cross-entropy as a regularization. To this end, we propose the PrOvable Training (POT) for GCL, which regularizes the training of GCL to encode node embeddings that follows the GCL principle better. Through extensive experiments on various benchmarks, POT consistently improves the existing GCL approaches, serving as a friendly plugin.

翻訳日:2023-11-01 22:36:22 公開日:2023-10-30

# シャープネスを考慮した最小化と安定性の端

Sharpness-Aware Minimization and the Edge of Stability ( http://arxiv.org/abs/2309.12488v4 )

ライセンス: Link先を確認

Philip M. Long and Peter L. Bartlett

(参考訳) 最近の実験では、ステップサイズ $\eta$ の勾配降下法(GD)でニューラルネットワークを訓練すると、多くの場合、損失のヘッセ行列の作用素ノルムがおよそ $2/\eta$ に達するまで増大し、その後はこの値の周辺で変動することが示されている。 $2/\eta$ という量は、損失の局所二次近似の考察に基づいて「安定性の端(edge of stability)」と呼ばれている。我々は、GDの変種であり汎化を改善することが示されている Sharpness-Aware Minimization (SAM) について、同様の計算を行い「安定性の端」を導く。 GDの場合とは異なり、得られるSAMの端は勾配のノルムに依存する。 3つのディープラーニング訓練タスクを用いて、SAMがこの解析で特定された安定性の端で動作することを経験的に確認する。

Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
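
The $2/\eta$ threshold for plain GD can be checked on a one-dimensional quadratic. For $f(x) = \tfrac{1}{2}\lambda x^2$ the update is $x \leftarrow (1 - \eta\lambda)x$, which shrinks only when $\lambda < 2/\eta$. The sketch below is an illustration of that standard calculation, not code from the paper; the function name and the chosen constants are assumptions, and it does not model SAM.

```python
def gd_on_quadratic(curvature, step_size, x0=1.0, steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2; return the final |x|."""
    x = x0
    for _ in range(steps):
        grad = curvature * x          # f'(x) = curvature * x
        x = x - step_size * grad      # x <- (1 - step_size * curvature) * x
    return abs(x)


step_size = 0.1
edge = 2.0 / step_size                # curvature at the "edge of stability"

# Hypothetical curvatures slightly below and slightly above the edge.
below_edge = gd_on_quadratic(0.9 * edge, step_size)
above_edge = gd_on_quadratic(1.1 * edge, step_size)

print(f"curvature below the edge: final |x| = {below_edge:.3e}")
print(f"curvature above the edge: final |x| = {above_edge:.3e}")
```

With curvature below the edge the iterate contracts toward zero; above it the iterate grows at every step.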

翻訳日:2023-11-01 22:35:37 公開日:2023-10-30

# DDMT:多変量時系列異常検出のための拡散マスク変換器モデル

DDMT: Denoising Diffusion Mask Transformer Models for Multivariate Time Series Anomaly Detection ( http://arxiv.org/abs/2310.08800v2 )

ライセンス: Link先を確認

Chaocheng Yang and Tingyin Wang and Xuanhui Yan

(参考訳) 多変量時系列における異常検出は時系列研究において重要な課題として現れており、不正検出、故障診断、システム状態推定など様々な分野で重要な研究が行われている。再構成に基づくモデルは近年,時系列データの異常検出に有望な可能性を示している。しかし,データ規模や次元の急激な増加により,時系列再構成におけるノイズ・弱同一性マッピング(WIM)の問題がますます顕著になっている。そこで我々は,Adaptive Dynamic Neighbor Mask (ADNM) 機構を導入し,それを Transformer and Denoising Diffusion Model に統合し,多変量時系列異常検出のための新しいフレームワークである Denoising Diffusion Mask Transformer (DDMT) を開発した。 ADNMモジュールは、データ再構成時に入力と出力の特徴間の情報漏洩を軽減し、再構築時にWIMの問題を軽減する。 Denoising Diffusion Transformer (DDT)は、Denoising Diffusion Modelのための内部ニューラルネットワーク構造としてTransformerを使用している。時系列データの段階的生成過程を学習し、データの確率分布をモデル化し、正常なデータパターンをキャプチャし、ノイズを除去して時系列データを段階的に復元し、異常の明確な回復をもたらす。我々の知る限り、これは多変量時系列異常検出のためのデノイング拡散モデルと変換器を組み合わせた最初のモデルである。 5種類の多変量時系列異常検出データセットを用いて実験を行った。その結果, 時系列データの異常を効果的に識別し, 異常検出時の最先端性能を実現することができた。

Anomaly detection in multivariate time series has emerged as a crucial challenge in time series research, with significant research implications in various fields such as fraud detection, fault diagnosis, and system state estimation. Reconstruction-based models have shown promising potential in recent years for detecting anomalies in time series data. However, due to the rapid increase in data scale and dimensionality, the issues of noise and Weak Identity Mapping (WIM) during time series reconstruction have become increasingly pronounced. To address this, we introduce a novel Adaptive Dynamic Neighbor Mask (ADNM) mechanism and integrate it with the Transformer and Denoising Diffusion Model, creating a new framework for multivariate time series anomaly detection, named Denoising Diffusion Mask Transformer (DDMT). The ADNM module is introduced to mitigate information leakage between input and output features during data reconstruction, thereby alleviating the problem of WIM during reconstruction. The Denoising Diffusion Transformer (DDT) employs the Transformer as an internal neural network structure for Denoising Diffusion Model. It learns the stepwise generation process of time series data to model the probability distribution of the data, capturing normal data patterns and progressively restoring time series data by removing noise, resulting in a clear recovery of anomalies. To the best of our knowledge, this is the first model that combines Denoising Diffusion Model and the Transformer for multivariate time series anomaly detection. Experimental evaluations were conducted on five publicly available multivariate time series anomaly detection datasets. The results demonstrate that the model effectively identifies anomalies in time series data, achieving state-of-the-art performance in anomaly detection.

翻訳日:2023-11-01 22:28:22 公開日:2023-10-30

# 進化的動的最適化と機械学習

Evolutionary Dynamic Optimization and Machine Learning ( http://arxiv.org/abs/2310.08748v2 )

ライセンス: Link先を確認

Abdennour Boulesnane

(参考訳) 進化計算(Evolutionary Computation, EC)は、漸進的な発達という自然のメカニズムに着想を得た、人工知能の強力な分野として台頭してきた。しかし、ECのアプローチは、停滞、多様性の喪失、計算複雑性、集団の初期化、早期収束といった課題にしばしば直面する。これらの限界を克服するために、研究者は学習アルゴリズムと進化的手法を統合してきた。この統合は、反復探索中にECアルゴリズムが生成する貴重なデータを活用し、探索空間と集団のダイナミクスに関する洞察を提供する。同様に、進化的アルゴリズムと機械学習(ML)の関係は相互的であり、ECの手法は、ノイズが多く、不正確で、動的な目的関数を特徴とする複雑なMLタスクを最適化する優れた機会を提供する。進化的機械学習(EML)として知られるこれらのハイブリッド技術は、MLプロセスのさまざまな段階に適用されてきた。 EC技術は、データバランシング、特徴選択、モデル訓練の最適化といったタスクで重要な役割を果たす。さらに、MLタスクはしばしば動的最適化を必要とし、そこでは進化的動的最適化(EDO)が有用である。本稿では、EDOとMLの相互統合に関する初の包括的な検討を示す。この研究の目的は、進化的学習コミュニティの関心を喚起し、この分野における革新的な貢献を促すことである。

Evolutionary Computation (EC) has emerged as a powerful field of Artificial Intelligence, inspired by nature's mechanisms of gradual development. However, EC approaches often face challenges such as stagnation, diversity loss, computational complexity, population initialization, and premature convergence. To overcome these limitations, researchers have integrated learning algorithms with evolutionary techniques. This integration harnesses the valuable data generated by EC algorithms during iterative searches, providing insights into the search space and population dynamics. Similarly, the relationship between evolutionary algorithms and Machine Learning (ML) is reciprocal, as EC methods offer exceptional opportunities for optimizing complex ML tasks characterized by noisy, inaccurate, and dynamic objective functions. These hybrid techniques, known as Evolutionary Machine Learning (EML), have been applied at various stages of the ML process. EC techniques play a vital role in tasks such as data balancing, feature selection, and model training optimization. Moreover, ML tasks often require dynamic optimization, for which Evolutionary Dynamic Optimization (EDO) is valuable. This paper presents the first comprehensive exploration of reciprocal integration between EDO and ML. The study aims to stimulate interest in the evolutionary learning community and inspire innovative contributions in this domain.

翻訳日:2023-11-01 22:27:53 公開日:2023-10-30

# ラベル比率からの学習: 信念伝播による教師付き学習器のブートストラップ

Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation ( http://arxiv.org/abs/2310.08056v2 )

ライセンス: Link先を確認

Shreyas Havaldar, Navodita Sharma, Shubhi Sareen, Karthikeyan Shanmugam, Aravindan Raghuveer

(参考訳) ラベル比率からの学習(Learning from Label Proportions, LLP)は、訓練時にバッグと呼ばれるインスタンスのグループに対して集約レベルのラベルしか利用できない学習問題であり、テストデータに対してインスタンスレベルで最良の性能を得ることを目的とする。この設定は、プライバシー上の考慮から広告や医療といった領域で生じる。本研究では、この問題に対して、2つの主要なステップを反復的に実行する新しいアルゴリズムフレームワークを提案する。各反復の最初のステップ(擬似ラベリング)では、二値のインスタンスラベル上のギブス分布を定義し、これに a) 類似した共変量を持つインスタンスは類似したラベルを持つべきという制約による共変量情報と b) バッグレベルの集約ラベルを組み込む。次に、Belief Propagation (BP) を用いてギブス分布を周辺化し、擬似ラベルを得る。第2のステップ(埋め込みの改良)では、擬似ラベルを学習器の教師信号として用い、よりよい埋め込みを得る。さらに、第2ステップの埋め込みを次の反復の新しい共変量として用いて、2つのステップを繰り返す。最後の反復では、擬似ラベルを用いて分類器を訓練する。本アルゴリズムは、表形式および画像のさまざまなデータセットにおけるLLP二値分類問題で、複数のSOTAベースラインに対して大きな改善(最大15%)を示す。大きなバッグサイズや100万サンプルの場合でも、Belief Propagationによる計算オーバーヘッドは標準的な教師あり学習に比べて最小限であり、その上でこれらの改善を達成する。

Learning from Label Proportions (LLP) is a learning problem where only aggregate level labels are available for groups of instances, called bags, during training, and the aim is to get the best performance at the instance-level on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that iteratively performs two main steps. For the first step (Pseudo Labeling) in every iteration, we define a Gibbs distribution over binary instance labels that incorporates a) covariate information through the constraint that instances with similar covariates should have similar labels and b) the bag level aggregated label. We then use Belief Propagation (BP) to marginalize the Gibbs distribution to obtain pseudo labels. In the second step (Embedding Refinement), we use the pseudo labels to provide supervision for a learner that yields a better embedding. Further, we iterate on the two steps again by using the second step's embeddings as new covariates for the next iteration. In the final iteration, a classifier is trained using the pseudo labels. Our algorithm displays strong gains against several SOTA baselines (up to 15%) for the LLP Binary Classification problem on various dataset types - tabular and Image. We achieve these improvements with minimal computational overhead above standard supervised learning due to Belief Propagation, for large bag sizes, even for a million samples.

翻訳日:2023-11-01 22:27:33 公開日:2023-10-30

# 共変量シフト下での少数のテストサンプルによる公平性と精度のトレードオフの改善

Improving Fairness-Accuracy tradeoff with few Test Samples under Covariate Shift ( http://arxiv.org/abs/2310.07535v2 )

ライセンス: Link先を確認

Shreyas Havaldar, Jatin Chauhan, Karthikeyan Shanmugam, Jay Nandy, Aravindan Raghuveer

(参考訳) テストデータの共変量シフトは、モデルの精度と公平性の両方を著しく低下させうる。このような状況下で、異なるセンシティブグループ間の公平性を確保することは、刑事司法などへの社会的影響のため極めて重要である。我々は、ラベル付きの訓練セットに加えて、ラベルのない少数のテストサンプルのみが利用可能な教師なしの設定で取り組む。この問題に対して、我々は3つの貢献をする。第一は、予測精度のための新しい複合重み付きエントロピーに基づく目的関数であり、公平性のための表現マッチング損失とともに最適化される。いくつかの標準データセットにおいて、公平性と精度のトレードオフに関して、我々の損失定式化による最適化がパレートの意味で多くの最先端ベースラインを上回ることを実験的に検証する。第二の貢献は、我々が非対称共変量シフト(Asymmetric Covariate Shift)と呼ぶ新しい設定であり、我々の知る限りこれまで研究されていない。非対称共変量シフトは、あるグループの共変量の分布が他のグループに比べて著しくシフトする場合に生じ、これは支配的なグループが過剰に代表されているときに起こる。この設定は現在のベースラインにとって極めて困難であるが、提案手法がそれらを大きく上回ることを示す。第三の貢献は理論的なものであり、訓練セット上の予測損失と重み付きエントロピー項が、共変量シフト下でのテスト損失を近似することを示す。経験的に、また形式的なサンプル複雑性の境界を通じて、この未知のテスト損失への近似が、他の多くのベースラインに影響する重要度サンプリングの分散に依存しないことを示す。

Covariate shift in the test data can significantly downgrade both the accuracy and the fairness performance of the model. Ensuring fairness across different sensitive groups in such settings is of paramount importance due to societal implications like criminal justice. We operate under the unsupervised regime where only a small set of unlabeled test samples along with a labeled training set is available. Towards this problem, we make three contributions. First is a novel composite weighted entropy based objective for prediction accuracy which is optimized along with a representation matching loss for fairness. We experimentally verify that optimizing with our loss formulation outperforms a number of state-of-the-art baselines in the pareto sense with respect to the fairness-accuracy tradeoff on several standard datasets. Our second contribution is a new setting we term Asymmetric Covariate Shift that, to the best of our knowledge, has not been studied before. Asymmetric covariate shift occurs when distribution of covariates of one group shifts significantly compared to the other groups and this happens when a dominant group is over-represented. While this setting is extremely challenging for current baselines, We show that our proposed method significantly outperforms them. Our third contribution is theoretical, where we show that our weighted entropy term along with prediction loss on the training set approximates test loss under covariate shift. Empirically and through formal sample complexity bounds, we show that this approximation to the unseen test loss does not depend on importance sampling variance which affects many other baselines.

翻訳日:2023-11-01 22:27:10 公開日:2023-10-30

# ルーテッドベルテストによる長距離量子相関の認証

Certifying long-range quantum correlations through routed Bell tests ( http://arxiv.org/abs/2310.07484v3 )

ライセンス: Link先を確認

Edwin Peter Lobo, Jef Pauwels, and Stefano Pironio

(参考訳) 伝送チャネルの損失は距離とともに増大し、量子非局所性とその応用のフォトニクスによる実証にとって大きな障害となる。最近、Chaturvedi, Viola, and Pawlowski (CVP) [arXiv:2211.14231] は、量子非局所性を実証できる範囲を拡張することを目的として、標準的なベル実験の変種を導入した。我々が「ルーテッドベル実験(routed Bell experiments)」と呼ぶこれらの実験では、ボブは自身の量子粒子を2つの可能な経路のいずれかに送り、光源に近い場所と遠い場所という2つの異なる位置で測定することができる。その考え方は、短い経路でのベル不等式の破れが、長い経路で非局所相関を検出するために必要な条件を緩和するはずだというものである。実際、CVPは、ルーテッドベル実験において、遠隔デバイスの検出効率が任意に低い場合でも、その結果を古典的に事前決定できないような量子相関が存在することを示した。本稿では、CVPが考察した相関は、古典的に事前決定することはできないものの、遠隔デバイスへの量子系の伝送を必要としないことを示す。これにより、ルーテッドベル実験における「短距離」および「長距離」量子相関の概念を定義するに至る。これらの相関が、非可換多項式最適化のための標準的な半正定値計画法の階層によって特徴づけられることを示す。次に、短距離量子相関を排除できる条件を検討する。遠隔デバイスの臨界検出効率には基本的な下界が存在することを指摘する。これは、ルーテッドベル実験が任意に大きな距離で長距離量子非局所性を実証できないことを意味する。それでも、ルーテッドベル実験によって検出効率の閾値を下げられることがわかった。ただし、その改善はCVPの解析が示唆するものよりも大幅に小さい。

Losses in the transmission channel, which increase with distance, pose a major obstacle to photonics demonstrations of quantum nonlocality and its applications. Recently, Chaturvedi, Viola, and Pawlowski (CVP) [arXiv:2211.14231] introduced a variation of standard Bell experiments with the goal of extending the range over which quantum nonlocality can be demonstrated. In these experiments, which we call 'routed Bell experiments', Bob can route his quantum particle along two possible paths and measure it at two distinct locations - one near and another far from the source. The idea is that a Bell violation in the short-path should weaken the conditions required to detect nonlocal correlations in the long-path. Indeed, CVP showed that there are quantum correlations in routed Bell experiments such that the outcomes of the remote device cannot be classically predetermined, even when its detection efficiency is arbitrarily low. In this paper, we show that the correlations considered by CVP, though they cannot be classically predetermined, do not require the transmission of quantum systems to the remote device. This leads us to define the concept of 'short-range' and 'long-range' quantum correlations in routed Bell experiments. We show that these correlations can be characterized through standard semidefinite programming hierarchies for non-commutative polynomial optimization. We then explore the conditions under which short-range quantum correlations can be ruled out. We point out that there exist fundamental lower-bounds on the critical detection efficiency of the distant device, implying that routed Bell experiments cannot demonstrate long-range quantum nonlocality at arbitrarily large distances. However, we do find that routed Bell experiments allow for reducing the detection efficiency threshold. The improvements, though, are significantly smaller than those suggested by CVP's analysis.

翻訳日:2023-11-01 22:26:42 公開日:2023-10-30

# OptiMUS: MIPソルバーと大規模言語モデルを用いた最適化モデリング

OptiMUS: Optimization Modeling Using MIP Solvers and large language models ( http://arxiv.org/abs/2310.06116v2 )

ライセンス: Link先を確認

Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell

(参考訳) 最適化問題は、製造や流通から医療に至るまで、さまざまな分野に広く存在する。しかし、そうした問題の多くは、最先端のソルバーで最適に解かれるのではなく、いまだに手作業でヒューリスティックに解かれている。これらの問題を定式化して解くために必要な専門知識が、最適化ツールや技術の普及を妨げているためである。我々は、自然言語による記述からMILP問題を定式化して解くように設計された、大規模言語モデル(LLM)ベースのエージェントであるOptiMUSを紹介する。 OptiMUSは、数学モデルの構築、ソルバコードの記述とデバッグ、テストの作成、生成された解の妥当性の検証を行うことができる。エージェントをベンチマークするために、線形計画(LP)と混合整数線形計画(MILP)の問題からなる新しいデータセットNLP4LPを提示する。実験の結果、OptiMUSは基本的なLLMプロンプト戦略のほぼ2倍の数の問題を解くことが示された。 OptiMUSのコードとNLP4LPデータセットは https://github.com/teshnizi/OptiMUS で入手できる。

Optimization problems are pervasive across various sectors, from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers, as the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. We introduce OptiMUS, a Large Language Model (LLM)-based agent designed to formulate and solve MILP problems from their natural language descriptions. OptiMUS is capable of developing mathematical models, writing and debugging solver code, developing tests, and checking the validity of generated solutions. To benchmark our agent, we present NLP4LP, a novel dataset of linear programming (LP) and mixed integer linear programming (MILP) problems. Our experiments demonstrate that OptiMUS solves nearly twice as many problems as a basic LLM prompting strategy. OptiMUS code and NLP4LP dataset are available at https://github.com/teshnizi/OptiMUS

翻訳日:2023-11-01 22:25:19 公開日:2023-10-30

# 大規模言語モデル学習のためのメモリコストと通信コストの再考

Rethinking Memory and Communication Cost for Efficient Large Language Model Training ( http://arxiv.org/abs/2310.06003v2 )

ライセンス: Link先を確認

Chan Wu, Hanxiao Zhang, Lin Ju, Jinjing Huang, Youshao Xiao, Zhaoxin Huan, Siyuan Li, Fanzhuang Meng, Lei Liang, Xiaolu Zhang and Jun Zhou

(参考訳) 近年、大規模言語モデルの訓練のためのさまざまな分散戦略が提案されている。しかし、これらの手法は、メモリ消費と通信コストのトレードオフに対して限定的な解決策しか提供していない。本稿では、メモリ消費と通信コストが大規模言語モデルの訓練速度に与える影響を再考し、メモリと通信のバランスをとる戦略セットである Partial Redundancy Optimizer (PaRO) を提案する。 PaROは、細粒度のシャーディング戦略により、わずかなメモリ冗長性と引き換えにグループ間通信の量と頻度を削減する包括的な選択肢を提供し、さまざまな訓練シナリオで訓練効率を向上させる。さらに、大規模言語モデルの訓練におけるノード間やスイッチをまたぐ通信の効率を高めるために、Hierarchical Overlapping Ring (HO-Ring) 通信トポロジを提案する。実験の結果、PaROはSOTA手法と比べて訓練スループットを1.19倍から2.50倍に向上させ、ほぼ線形のスケーラビリティを達成することが示された。 HO-Ringアルゴリズムは、従来のRingアルゴリズムと比べて通信効率を36.5%向上させる。

Recently, various distributed strategies for large language model training have been proposed. However, these methods provided limited solutions for the trade-off between memory consumption and communication cost. In this paper, we rethink the impact of memory consumption and communication costs on the training speed of large language models, and propose a memory-communication balanced strategy set Partial Redundancy Optimizer (PaRO). PaRO provides comprehensive options which reduces the amount and frequency of inter-group communication with minor memory redundancy by fine-grained sharding strategy, thereby improving the training efficiency in various training scenarios. Additionally, we propose a Hierarchical Overlapping Ring (HO-Ring) communication topology to enhance communication efficiency between nodes or across switches in large language model training. Our experiments demonstrate that PaRO significantly improves training throughput by 1.19x-2.50x compared to the SOTA method and achieves a near-linear scalability. The HO-Ring algorithm improves communication efficiency by 36.5% compared to the traditional Ring algorithm.

翻訳日:2023-11-01 22:24:42 公開日:2023-10-30

# テキスト関連性測定のための埋め込みを探る:オンラインコメントにおける感情と関係を明らかにする

Exploring Embeddings for Measuring Text Relatedness: Unveiling Sentiments and Relationships in Online Comments ( http://arxiv.org/abs/2310.05964v2 )

ライセンス: Link先を確認

Anthony Olakangil, Cindy Wang, Justin Nguyen, Qunbo Zhou, Kaavya Jethwa, Jason Li, Aryan Narendra, Nishk Patel, Arjun Rajaram

(参考訳) 新型コロナウイルス(COVID-19)のパンデミックによりインターネット利用が70%増加して以降、世界中でソーシャルメディアを利用する人が増えている。 Twitter、Meta Threads、YouTube、Redditといったアプリケーションはますます浸透し、世論が表明されないデジタル空間はほとんど残っていない。本稿では、単語埋め込みを用いて文や文書の構成要素を分析し、さまざまなソーシャルメディアプラットフォームにおけるコメント間の感情的・意味的関係を調査するとともに、これらのプラットフォーム間で共有される意見の重要性を論じる。これにより、研究者、政治家、企業の担当者は、世界中のユーザー間で共有される感情の経路をたどることができる。本稿では、これらの人気のあるオンラインプラットフォーム上のユーザーコメントから抽出されたテキストの関連度を測定する複数の手法を提示する。単語間の意味的関係を捉え、ウェブ全体の感情分析に役立つ埋め込みを活用することで、世論全体に関するつながりを明らかにできる。この研究は、YouTube、Reddit、Twitterなどの既存のデータセットを利用する。 Bidirectional Encoder Representations from Transformers (BERT) のような一般的な自然言語処理モデルを用いて、感情を分析し、コメント埋め込み間の関係を探索した。さらに、クラスタリングとKLダイバージェンスを用いて、さまざまなソーシャルメディアプラットフォームにまたがるコメント埋め込み内の意味的関係を見つけることを目指す。我々の分析は、オンラインコメントの相互接続性のより深い理解を可能にし、インターネットが1つの大きな相互接続された脳として機能するという考えを検討する。

After the COVID-19 pandemic caused internet usage to grow by 70%, there has been an increased number of people all across the world using social media. Applications like Twitter, Meta Threads, YouTube, and Reddit have become increasingly pervasive, leaving almost no digital space where public opinion is not expressed. This paper investigates sentiment and semantic relationships among comments across various social media platforms, as well as discusses the importance of shared opinions across these different media platforms, using word embeddings to analyze components in sentences and documents. It allows researchers, politicians, and business representatives to trace a path of shared sentiment among users across the world. This research paper presents multiple approaches that measure the relatedness of text extracted from user comments on these popular online platforms. By leveraging embeddings, which capture semantic relationships between words and help analyze sentiments across the web, we can uncover connections regarding public opinion as a whole. The study utilizes pre-existing datasets from YouTube, Reddit, Twitter, and more. We made use of popular natural language processing models like Bidirectional Encoder Representations from Transformers (BERT) to analyze sentiments and explore relationships between comment embeddings. Additionally, we aim to utilize clustering and Kl-divergence to find semantic relationships within these comment embeddings across various social media platforms. Our analysis will enable a deeper understanding of the interconnectedness of online comments and will investigate the notion of the internet functioning as a large interconnected brain.
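
As a rough illustration of the KL-divergence comparison mentioned above, the sketch below compares two comments through smoothed word-frequency distributions. This is an assumption made for the sake of a self-contained example: the abstract applies KL-divergence to comment embeddings (for instance from BERT), and does not specify how the distributions are formed. The helper names, the smoothing constant, and the sample comments are all hypothetical.

```python
import math
from collections import Counter


def word_distribution(text, vocabulary, smoothing=1.0):
    """Additively smoothed word-frequency distribution of `text` over `vocabulary`."""
    counts = Counter(text.lower().split())
    total = sum(counts[word] for word in vocabulary) + smoothing * len(vocabulary)
    return [(counts[word] + smoothing) / total for word in vocabulary]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical comments standing in for text from two platforms.
comment_a = "great video great music"
comment_b = "terrible video bad music"

vocabulary = sorted(set(comment_a.split()) | set(comment_b.split()))
p = word_distribution(comment_a, vocabulary)
q = word_distribution(comment_b, vocabulary)

divergence_ab = kl_divergence(p, q)
divergence_aa = kl_divergence(p, p)
print(f"KL(a || b) = {divergence_ab:.4f}, KL(a || a) = {divergence_aa:.4f}")
```

KL-divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; a symmetric variant may be preferable when neither comment is the reference.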

翻訳日:2023-11-01 22:24:25 公開日:2023-10-30

# 注意パラダイムを超越する:地理空間ソーシャルメディアデータからの表現学習

Transcending the Attention Paradigm: Representation Learning from Geospatial Social Media Data ( http://arxiv.org/abs/2310.05378v2 )

ライセンス: Link先を確認

Nick DiSanto, Anthony Corso, Benjamin Sanders, Gavin Harding

(参考訳) トランスフォーマーは、注意機構に基づくアーキテクチャを研究の基盤として切り開いてきたが、明示的な文脈情報への依存は、テキスト全体を貫くテーマを暗黙的に学習する能力の限界を浮き彫りにしている。本研究では、分散したパターンの源泉としてソーシャルメディアデータを調査し、性能ベンチマークというヒューリスティックなパラダイムに挑戦する。複雑な長期依存関係の捕捉に依存するネットワークとは対照的に、オンラインデータのモデルは本質的に構造を欠き、集合全体から基礎となるパターンを学習せざるを得ない。これらの抽象的な関係を適切に表現するために、本研究は経験的なソーシャルメディアコーパスを基本的な構成要素に分解し、人口密度の高い地域にわたる20億以上のツイートを分析する。 Twitterデータにおける位置と土地固有の言葉遣いの関係を探るため、各都市に固有のBag-of-Wordsモデルを用い、それぞれの表現を評価する。これは、高度なアルゴリズムに頼らなくても隠れた洞察を発見できること、そしてノイズの多いデータの中でも地理的な位置がオンラインコミュニケーションにかなりの影響を与えることを示している。この証拠は、地理空間的なコミュニケーションパターンと、その社会科学における意義に関する具体的な洞察を与える。また、複雑なモデルが自然言語におけるパターン認識の前提条件であるという考えに異議を唱えるものであり、抽象的理解よりも絶対的な解釈可能性を重視することを疑問視する近年の流れと一致する。本研究は、洗練されたフレームワークと無形の関係との間の隔たりを埋め、構造化されたモデルと推測的推論を融合するシステムへの道を開く。

While transformers have pioneered attention-driven architectures as a cornerstone of research, their dependence on explicitly contextual information underscores limitations in their abilities to tacitly learn overarching textual themes. This study investigates social media data as a source of distributed patterns, challenging the heuristic paradigm of performance benchmarking. In stark contrast to networks that rely on capturing complex long-term dependencies, models of online data inherently lack structure and are forced to learn underlying patterns in the aggregate. To properly represent these abstract relationships, this research dissects empirical social media corpora into their elemental components and analyzes over two billion tweets across population-dense locations. Exploring the relationship between location and vernacular in Twitter data, we employ Bag-of-Words models specific to each city and evaluate their respective representation. This demonstrates that hidden insights can be uncovered without the crutch of advanced algorithms and demonstrates that even amidst noisy data, geographic location has a considerable influence on online communication. This evidence presents tangible insights regarding geospatial communication patterns and their implications in social science. It also challenges the notion that intricate models are prerequisites for pattern recognition in natural language, aligning with the evolving landscape that questions the embrace of absolute interpretability over abstract understanding. This study bridges the divide between sophisticated frameworks and intangible relationships, paving the way for systems that blend structured models with conjectural reasoning.

翻訳日:2023-11-01 22:23:31 公開日:2023-10-30

# 分布に基づく軌道クラスタリング

Distribution-Based Trajectory Clustering ( http://arxiv.org/abs/2310.05123v2 )

ライセンス: Link先を確認

Zi Jing Wang, Ye Zhu, Kai Ming Ting

(参考訳) 軌道クラスタリングは、軌道データに共通するパターンの発見を可能にする。現在の軌道クラスタリング手法は、2つの軌道間の非類似度を測るために、2点間の距離尺度に依存している。用いられる距離尺度には、高い計算コストと低い忠実度という2つの課題がある。さらに、用いる距離尺度とは無関係に、既存のクラスタリングアルゴリズムには、有効性の問題か高い時間計算量のいずれかという別の課題がある。本稿では、近年提案された Isolation Distributional Kernel (IDK) を、これら3つの課題すべてに対処するための主要なツールとして用いることを提案する。 TIDKCと呼ばれる新しいIDKベースのクラスタリングアルゴリズムは、軌道の類似度測定とクラスタリングに分布カーネルを最大限に活用する。 TIDKCは、不規則な形状と多様な密度を持つ線形分離不可能なクラスタを線形時間で識別する。ランダムな初期化に依存せず、外れ値に対して頑健である。 7つの大規模な実世界の軌道データセットでの広範な評価により、IDKは、従来の距離尺度および深層学習ベースの距離尺度よりも、軌道内の複雑な構造を効果的に捉えることが確認された。さらに、提案するTIDKCは、既存の軌道クラスタリングアルゴリズムよりも優れたクラスタリング性能と効率を持つ。

Trajectory clustering enables the discovery of common patterns in trajectory data. Current methods of trajectory clustering rely on a distance measure between two points in order to measure the dissimilarity between two trajectories. The distance measures employed have two challenges: high computational cost and low fidelity. Independent of the distance measure employed, existing clustering algorithms have another challenge: either effectiveness issues or high time complexity. In this paper, we propose to use a recent Isolation Distributional Kernel (IDK) as the main tool to meet all three challenges. The new IDK-based clustering algorithm, called TIDKC, makes full use of the distributional kernel for trajectory similarity measuring and clustering. TIDKC identifies non-linearly separable clusters with irregular shapes and varied densities in linear time. It does not rely on random initialisation and is robust to outliers. An extensive evaluation on 7 large real-world trajectory datasets confirms that IDK is more effective in capturing complex structures in trajectories than traditional and deep learning-based distance measures. Furthermore, the proposed TIDKC has superior clustering performance and efficiency to existing trajectory clustering algorithms.

翻訳日:2023-11-01 22:23:05 公開日:2023-10-30

# BioBridge:知識グラフによるバイオメディカル基礎モデルのブリッジ

BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs ( http://arxiv.org/abs/2310.03320v3 )

ライセンス: Link先を確認

Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N. Ioannidis, Huzefa Rangwala, Rishita Anubhai

(参考訳) 基盤モデル(FM)は、大量のラベルなしデータを活用し、幅広いタスクで優れた性能を示すことができる。しかし、生物医学領域向けに開発されたFMは、大部分がユニモーダルのままである。すなわち、タンパク質配列のみ、低分子構造のみ、あるいは臨床データのみのタスクのために独立に訓練され、使用されている。こうした生物医学FMの限界を克服するため、独立に訓練されたユニモーダルFMを橋渡ししてマルチモーダルな振る舞いを確立する、新しいパラメータ効率の高い学習フレームワークBioBridgeを提示する。 BioBridgeは、知識グラフ(KG)を利用して、基礎となるユニモーダルFMを一切ファインチューニングすることなく、あるユニモーダルFMと別のユニモーダルFMの間の変換を学習することでこれを実現する。実験の結果、BioBridgeは、クロスモーダル検索タスクにおいて、最良のベースラインKG埋め込み手法を(平均で約76.3%)上回ることが示された。また、BioBridgeが未知のモダリティや関係に外挿することで、ドメイン外への汎化能力を示すことも確認した。さらに、BioBridgeが、生物医学のマルチモーダル質問応答を支援し、新規医薬品の誘導生成を強化できる汎用の検索器として機能することも示す。

Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves it by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also identify BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we also show that BioBridge presents itself as a general purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs.

翻訳日:2023-11-01 22:22:51 公開日:2023-10-30

# 3次元物理系における対称性の破れを学習するための緩和八面体群畳み込み

Relaxed Octahedral Group Convolution for Learning Symmetry Breaking in 3D Physical Systems ( http://arxiv.org/abs/2310.02299v3 )

ライセンス: Link先を確認

Rui Wang, Robin Walters, Tess E.Smidt

(参考訳) 深層同変モデルは、対称性を利用してサンプル効率と汎化を改善する。しかし、これらのモデルの多くが置く完全対称性の仮定は、特にデータがそのような対称性と完全には一致しない場合に、制限的になりうる。そこで本稿では、3次元物理系をモデル化するための緩和八面体群畳み込みを導入する。この柔軟な畳み込み手法は、モデルがデータと整合する最も高いレベルの同変性を維持しつつ、物理系における微妙な対称性の破れの要因を発見できることを証明可能な形で保証する。実験結果は、本手法が相転移における対称性の破れの要因について洞察を与えるだけでなく、流体の超解像タスクで優れた性能を達成することを裏づけている。

Deep equivariant models use symmetries to improve sample efficiency and generalization. However, the assumption of perfect symmetry in many of these models can sometimes be restrictive, especially when the data does not perfectly align with such symmetries. Thus, we introduce relaxed octahedral group convolution for modeling 3D physical systems in this paper. This flexible convolution technique provably allows the model to both maintain the highest level of equivariance that is consistent with data and discover the subtle symmetry-breaking factors in the physical systems. Empirical results validate that our approach can not only provide insights into the symmetry-breaking factors in phase transitions but also achieves superior performance in fluid super-resolution tasks.
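
For readers unfamiliar with the group named above, the sketch below enumerates the 24 rotations of the cube (the rotational octahedral group) as signed permutation matrices with determinant +1. It only constructs the group elements and checks closure; it does not implement the relaxed group convolution. Whether the paper uses this 24-element group or the full 48-element group that also contains reflections is not stated in the abstract, so the choice here is an assumption, as are all function names.

```python
from itertools import permutations, product


def determinant3(m):
    """Determinant of a 3x3 matrix given as nested tuples."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def octahedral_rotations():
    """All signed 3x3 permutation matrices with determinant +1."""
    rotations = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            # Row `row` has a single nonzero entry, signs[row], in column perm[row].
            matrix = tuple(
                tuple(signs[row] if col == perm[row] else 0 for col in range(3))
                for row in range(3)
            )
            if determinant3(matrix) == 1:  # keep proper rotations, drop reflections
                rotations.append(matrix)
    return rotations


ROTATIONS = octahedral_rotations()
ROTATION_SET = set(ROTATIONS)
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

# Closure check: the product of any two rotations is again in the set.
IS_CLOSED = all(matmul3(a, b) in ROTATION_SET for a in ROTATIONS for b in ROTATIONS)
print(len(ROTATIONS), IS_CLOSED)
```

Representing the elements as integer tuples keeps them hashable, so the closure check is an exact set lookup with no floating-point tolerance.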

翻訳日:2023-11-01 22:22:29 公開日:2023-10-30

# 自己回帰率観察によるウェアラブル医療の効率的不均衡を考慮したフェデレーション学習手法

An Efficient Imbalance-Aware Federated Learning Approach for Wearable Healthcare with Autoregressive Ratio Observation ( http://arxiv.org/abs/2310.14784v2 )

ライセンス: Link先を確認

Wenhao Yan, He Li, Kaoru Ota, Mianxiong Dong

(参考訳) ウェアラブルセンシング技術とモバイルエッジコンピューティングの進歩により、広く利用可能な医療サービスが普及しつつある。人々の健康情報は、スマートフォンやウェアラブルバンドなどのエッジデバイスによって収集され、サーバー上でさらに分析されたうえで、異常な状態に対する提案や警告が返送される。近年登場したフェデレーション学習により、ユーザーはローカルデバイス上でプライベートデータを用いて訓練しつつ、協調してモデルを更新できる。しかし、健康状態データの不均質な分布は、クラス不均衡によってモデル性能に重大なリスクをもたらす可能性がある。一方、FLの訓練はサーバーと勾配のみを共有することで行われるため、訓練データにはほとんどアクセスできない。クラス不均衡に対する従来の解決策は、フェデレーション学習では機能しない。本研究では、フェデレーション学習シナリオにおけるクラス不均衡の課題に対処するための新しいフェデレーション学習フレームワークFedImTを提案する。 FedImTは、集約の各ラウンドでデータ構成を推定できるオンライン方式を含み、さらに複数の推定値の変動を追跡するための自己減衰型の反復的等価量を導入して、少数クラスに対する損失計算のバランスを迅速に調整する。実験は、追加のエネルギー消費を伴わず、プライバシーリスクを回避しながら不均衡問題を解決するFedImTの有効性を示している。

Widely available healthcare services are now getting popular because of advancements in wearable sensing techniques and mobile edge computing. People's health information is collected by edge devices such as smartphones and wearable bands for further analysis on servers, which then send back suggestions and alerts for abnormal conditions. The recent emergence of federated learning allows users to train private data on local devices while updating models collaboratively. However, the heterogeneous distribution of the health condition data may lead to significant risks to model performance due to class imbalance. Meanwhile, as FL training is powered by sharing gradients only with the server, training data is almost inaccessible. The conventional solutions to class imbalance do not work for federated learning. In this work, we propose a new federated learning framework FedImT, dedicated to addressing the challenges of class imbalance in federated learning scenarios. FedImT contains an online scheme that can estimate the data composition during each round of aggregation, then introduces a self-attenuating iterative equivalent to track variations of multiple estimations and promptly tweak the balance of the loss computing for minority classes. Experiments demonstrate the effectiveness of FedImT in solving the imbalance problem without extra energy consumption while avoiding privacy risks.
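
As a rough illustration of rebalancing a loss from an estimated class composition, the sketch below smooths per-round class-ratio estimates with an exponential moving average and converts them into inverse-frequency loss weights. This is a generic stand-in and not FedImT's actual estimator or update rule, which the abstract does not specify; the function names and the `decay` parameter are hypothetical.

```python
def update_ratio(prev, estimate, decay=0.5):
    """Blend the previous class-ratio estimate with this round's estimate (EMA)."""
    mixed = [decay * p + (1.0 - decay) * e for p, e in zip(prev, estimate)]
    total = sum(mixed)
    return [m / total for m in mixed]


def class_weights(ratio, eps=1e-8):
    """Inverse-frequency loss weights, normalised so that they average to 1."""
    inv = [1.0 / (r + eps) for r in ratio]
    mean_inv = sum(inv) / len(inv)
    return [w / mean_inv for w in inv]


# Hypothetical per-round estimates of the (majority, minority) class ratio.
ratio = [0.5, 0.5]
for round_estimate in ([0.9, 0.1], [0.8, 0.2]):
    ratio = update_ratio(ratio, round_estimate)

weights = class_weights(ratio)  # the minority class receives the larger weight
```

The weights would multiply the per-class loss terms on each client, so a class estimated to be rare contributes more per sample.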

翻訳日:2023-11-01 22:15:15 公開日:2023-10-30

# スケーラブルなデータ表現と分類のための解釈可能なルールの学習

Learning Interpretable Rules for Scalable Data Representation and Classification ( http://arxiv.org/abs/2310.14336v2 )

ライセンス: Link先を確認

Zhuo Wang, Wei Zhang, Ning Liu, Jianyong Wang

(参考訳) 規則に基づくモデル、例えば決定木は、透明な内部構造と優れたモデル表現性のために高いモデル解釈性を必要とするシナリオで広く使われている。しかし、ルールベースのモデルは、特に大きなデータセットでは、離散的なパラメータや構造のために最適化が難しい。アンサンブルメソッドとファジィ/ソフトルールは一般的にパフォーマンスを改善するために使用されるが、モデルの解釈性を犠牲にしている。スケーラビリティと解釈性の両方を得るために,データ表現と分類のための解釈可能な非ファジィルールを自動的に学習する,ルールベース表現学習器(RRL)という新しい分類器を提案する。非微分可能なRRLを効果的に訓練するために、RRLを連続空間に投影し、勾配降下を用いて離散モデルを直接最適化できる勾配グラフト(Gradient Grafting)と呼ばれる新しい訓練方法を提案する。論理アクティベーション関数の新たな設計は、RRLのスケーラビリティを高め、エンドツーエンドで連続的な特徴を離散化できるようにするためにも考案されている。 10個の小さなデータセットと4つの大きなデータセットでの徹底的な実験により、RRLは競合する解釈可能なアプローチよりも優れており、異なるシナリオにおける分類精度とモデルの複雑さのトレードオフを得るために容易に調整できることを示した。私たちのコードは https://github.com/12wang3/rrl で公開されています。

Rule-based models, e.g., decision trees, are widely used in scenarios demanding high model interpretability for their transparent inner structures and good model expressivity. However, rule-based models are hard to optimize, especially on large data sets, due to their discrete parameters and structures. Ensemble methods and fuzzy/soft rules are commonly used to improve performance, but they sacrifice the model interpretability. To obtain both good scalability and interpretability, we propose a new classifier, named Rule-based Representation Learner (RRL), that automatically learns interpretable non-fuzzy rules for data representation and classification. To train the non-differentiable RRL effectively, we project it to a continuous space and propose a novel training method, called Gradient Grafting, that can directly optimize the discrete model using gradient descent. A novel design of logical activation functions is also devised to increase the scalability of RRL and enable it to discretize the continuous features end-to-end. Exhaustive experiments on ten small and four large data sets show that RRL outperforms the competitive interpretable approaches and can be easily adjusted to obtain a trade-off between classification accuracy and model complexity for different scenarios. Our code is available at: https://github.com/12wang3/rrl.

翻訳日:2023-11-01 22:14:36 公開日:2023-10-30

# 言語的動機づけによる手話セグメンテーション

Linguistically Motivated Sign Language Segmentation ( http://arxiv.org/abs/2310.13960v2 )

ライセンス: Link先を確認

Amit Moryossef, Zifan Jiang, Mathias Müller, Sarah Ebling, Yoav Goldberg

(参考訳) 手話セグメンテーションは手話処理システムにおいて重要なタスクである。これは、サイン認識、転写、機械翻訳などの下流タスクを可能にする。本研究では,個々のサインへの分割と,複数のサインからなるより大きな単位である句への分割という2種類の分割について考察する。これら2つのタスクを協調的にモデル化する新しい手法を提案する。本手法は手話コーパスに見られる言語的手がかりに動機づけられている。連続的な手話表出を扱うために、主流のIOタグ付けスキームをBIOタグ付けに置き換える。句境界において韻律が重要な役割を果たすことを考慮し,オプティカルフロー特徴の利用について検討する。また,手形と3次元ハンド正規化の広範囲な解析を行う。サイン境界のモデル化には,BIOタグ付けの導入が必要であることがわかった。オプティカルフローによる韻律の明示的なエンコーディングは、浅いモデルのセグメンテーションを改善するが、深いモデルではその貢献は無視できる。モデル上における復号アルゴリズムの注意深いチューニングは、セグメンテーション品質をさらに向上させる。最終モデルは、ゼロショット設定下であっても、異なる手話言語のドメイン外ビデオコンテンツに一般化されることを実証する。オプティカルフローと3次元ハンド正規化を含めることで、この文脈でモデルのロバスト性が高まることが観察される。

Sign language segmentation is a crucial task in sign language processing systems. It enables downstream tasks such as sign recognition, transcription, and machine translation. In this work, we consider two kinds of segmentation: segmentation into individual signs and segmentation into phrases, larger units comprising several signs. We propose a novel approach to jointly model these two tasks. Our method is motivated by linguistic cues observed in sign language corpora. We replace the predominant IO tagging scheme with BIO tagging to account for continuous signing. Given that prosody plays a significant role in phrase boundaries, we explore the use of optical flow features. We also provide an extensive analysis of hand shapes and 3D hand normalization. We find that introducing BIO tagging is necessary to model sign boundaries. Explicitly encoding prosody by optical flow improves segmentation in shallow models, but its contribution is negligible in deeper models. Careful tuning of the decoding algorithm atop the models further improves the segmentation quality. We demonstrate that our final models generalize to out-of-domain video content in a different signed language, even under a zero-shot setting. We observe that including optical flow and 3D hand normalization enhances the robustness of the model in this context.
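
To make the IO versus BIO distinction concrete, here is a small sketch of frame-level tagging. It assumes segments are given as hypothetical `(start, end)` frame ranges with an exclusive end; the helper names are illustrative and are not taken from the paper's code. With IO tags, two signs that follow each other without a pause merge into a single run of `I`, whereas the `B` tag keeps the boundary recoverable.

```python
def to_bio(num_frames, segments):
    """Label each frame B (begin), I (inside) or O (outside) for the given segments."""
    tags = ["O"] * num_frames
    for start, end in segments:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def from_tags(tags):
    """Recover (start, end) segments from a B/I/O (or plain I/O) tag sequence."""
    segments, start = [], None
    for i, tag in enumerate(tags):
        if tag in ("B", "O"):
            if start is not None:
                segments.append((start, i))  # close the open segment
            start = i if tag == "B" else None
        elif tag == "I" and start is None:
            start = i  # IO scheme: a segment opens on the first I
    if start is not None:
        segments.append((start, len(tags)))
    return segments


signs = [(1, 4), (4, 6)]  # two back-to-back signs in an 8-frame clip
bio = to_bio(8, signs)
io = ["O" if t == "O" else "I" for t in bio]  # collapse B into I

bio_segments = from_tags(bio)  # both signs are recovered
io_segments = from_tags(io)    # the two signs are merged into one
```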

翻訳日:2023-11-01 22:14:14 公開日:2023-10-30

# 大規模言語モデルはなぜ正しい思考の連鎖を生成できるのか?

Why Can Large Language Models Generate Correct Chain-of-Thoughts? ( http://arxiv.org/abs/2310.13571v2 )

ライセンス: Link先を確認

Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, Haitham Bou-Ammar

(参考訳) 本稿では,大規模言語モデル(LLM)の能力を掘り下げ,特に思考連鎖(chain-of-thought)プロンプトの理論的理解を進めることに焦点を当てる。本研究では,LLMを効果的に誘導し,コヒーレントな思考連鎖を生成させる方法について検討する。これを実現するために,自然言語生成に適した2階層の階層型グラフィカルモデルを導入する。この枠組み内では、真の言語に由来するものと比較して、LLMが生成した思考の連鎖の尤度を測る幾何学的収束率を確立する。本研究は、LLMが正しい思考列を生成できる能力に理論的正当性を与えるものであり、推論能力を要求するタスクにおけるパフォーマンス向上を(潜在的に)説明する。

This paper delves into the capabilities of large language models (LLMs), specifically focusing on advancing the theoretical comprehension of chain-of-thought prompting. We investigate how LLMs can be effectively induced to generate a coherent chain of thoughts. To achieve this, we introduce a two-level hierarchical graphical model tailored for natural language generation. Within this framework, we establish a compelling geometrical convergence rate that gauges the likelihood of an LLM-generated chain of thoughts compared to those originating from the true language. Our findings provide a theoretical justification for the ability of LLMs to produce the correct sequence of thoughts (potentially) explaining performance gains in tasks demanding reasoning skills.

翻訳日:2023-11-01 22:13:32 公開日:2023-10-30

# DistillCSE: 文埋め込みのための蒸留コントラスト学習

DistillCSE: Distilled Contrastive Learning for Sentence Embeddings ( http://arxiv.org/abs/2310.13499v2 )

ライセンス: Link先を確認

Jiahao Xu and Wei Shao and Lihui Chen and Lemao Liu

(参考訳) 本稿では,知識蒸留による自己学習パラダイムの下で,コントラスト学習を行うDistillCSEフレームワークを提案する。 DistillCSEの潜在的な利点は、その自己強化的な特徴にある: ベースモデルを使用して追加の監視信号を提供することで、知識蒸留を通じてより強力なモデルを学習できる可能性がある。しかしながら、知識蒸留の標準的な実装によるバニラDistillCSEは、深刻な過剰適合のために限界的な改善しか達成できない。さらに定量的に分析した結果, コントラスト学習の本質に起因して, 標準的な知識蒸留では教師モデルのロジットの分散が比較的大きいことが原因であることが明らかになった。そこで本研究では,高分散によって引き起こされる問題を緩和するため,暗黙の正規化としてのGroup-Pシャッフル戦略と,複数の教師コンポーネントからのロジットの平均化という,単純かつ効果的な2つの解決策を提案する。標準ベンチマークによる実験では、提案するDistillCSEが多くの強力なベースライン法を上回り、新たな最先端性能をもたらすことが示されている。

This paper proposes the DistillCSE framework, which performs contrastive learning under the self-training paradigm with knowledge distillation. The potential advantage of DistillCSE is its self-enhancing feature: using a base model to provide additional supervision signals, a stronger model may be learned through knowledge distillation. However, vanilla DistillCSE with the standard implementation of knowledge distillation achieves only marginal improvements due to severe overfitting. Further quantitative analyses trace the cause to the relatively large variance of the teacher model's logits under standard knowledge distillation, which stems from the nature of contrastive learning. To mitigate the issue induced by high variance, this paper proposes two simple yet effective solutions for knowledge distillation: a Group-P shuffling strategy as an implicit regularization, and averaging the logits from multiple teacher components. Experiments on standard benchmarks demonstrate that the proposed DistillCSE outperforms many strong baseline methods and yields new state-of-the-art performance.
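
The second remedy, averaging logits over several teachers, can be sketched in a few lines. The example below is only an illustration of why averaging lowers variance: it models each teacher's logit as a true value plus independent Gaussian noise, which is an assumption made here and not a claim about how DistillCSE's teachers behave. `average_logits` is a hypothetical helper name.

```python
import random
from statistics import pvariance


def average_logits(teacher_logits):
    """Element-wise mean over several teachers' logit vectors."""
    n = len(teacher_logits)
    return [sum(column) / n for column in zip(*teacher_logits)]


# Deterministic example: three teachers scoring the same three candidates.
avg = average_logits([[2.0, 0.0, -1.0], [1.0, 1.0, 0.0], [3.0, -1.0, -2.0]])

# Toy noise model (assumed): each teacher's logit = true value + N(0, 1) noise.
random.seed(0)
true_logit = 1.0
single = [true_logit + random.gauss(0, 1) for _ in range(2000)]
averaged = [
    average_logits([[true_logit + random.gauss(0, 1)] for _ in range(4)])[0]
    for _ in range(2000)
]
var_single = pvariance(single)      # close to 1 under this noise model
var_averaged = pvariance(averaged)  # close to 1/4 with four teachers
```

Under independent noise the variance of the mean of K teachers shrinks by a factor of K, which is the effect the averaged soft targets rely on.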

翻訳日:2023-11-01 22:13:21 公開日:2023-10-30

# バンディットゲームにおける近似情報最大化

Approximate information maximization for bandit games ( http://arxiv.org/abs/2310.12563v2 )

ライセンス: Link先を確認

Alex Barbier-Chebbah (IP, CNRS, UPCité), Christian L. Vestergaard (IP, CNRS, UPCité), Jean-Baptiste Masson (IP, CNRS, UPCité), Etienne Boursier (INRIA Saclay)

(参考訳) エントロピー最大化と自由エネルギー最小化は、様々な物理系の力学をモデル化するための一般的な物理原理である。代表的な例として、自由エネルギー原理を用いた脳内意思決定のモデル化、情報ボトルネック原理による隠れ変数へのアクセス時の精度・複雑さトレードオフの最適化(Tishby et al., 2000)、情報最大化を用いたランダム環境におけるナビゲーション(Vergassola et al., 2007)などがある。この原理に基づいて,システム内のキー変数の情報に対する近似を最大化する新しいクラスのバンディットアルゴリズムを提案する。この目的のために,物理学に基づくエントロピーの近似的な解析表現を開発し,各行動の情報ゲインを予測して,情報ゲインが最も大きい行動を貪欲に選択する。この手法は古典的なバンディット設定において強力なパフォーマンスをもたらす。この経験的成功を踏まえ,ガウス報酬を伴う2腕バンディット問題に対する漸近的最適性を証明する。システムの性質をグローバルな物理汎関数に包含できるため、このアプローチはより複雑なバンディット設定に効率的に適応することができ、多腕バンディット問題に対する情報最大化アプローチのさらなる研究が求められる。

Entropy maximization and free energy minimization are general physical principles for modeling the dynamics of various physical systems. Notable examples include modeling decision-making within the brain using the free-energy principle, optimizing the accuracy-complexity trade-off when accessing hidden variables with the information bottleneck principle (Tishby et al., 2000), and navigation in random environments using information maximization (Vergassola et al., 2007). Built on this principle, we propose a new class of bandit algorithms that maximize an approximation to the information of a key variable within the system. To this end, we develop an approximated analytical physics-based representation of an entropy to forecast the information gain of each action and greedily choose the one with the largest information gain. This method yields strong performances in classical bandit settings. Motivated by its empirical success, we prove its asymptotic optimality for the two-armed bandit problem with Gaussian rewards. Owing to its ability to encompass the system's properties in a global physical functional, this approach can be efficiently adapted to more complex bandit settings, calling for further investigation of information maximization approaches for multi-armed bandit problems.

翻訳日:2023-11-01 22:12:13 公開日:2023-10-30

# 大規模言語モデルにおけるファクチュアル知識の体系的評価

Systematic Assessment of Factual Knowledge in Large Language Models ( http://arxiv.org/abs/2310.11638v3 )

ライセンス: Link先を確認

Linhao Luo, Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari

(参考訳) 従来の研究では,大規模言語モデル(LLM)に格納された知識を評価するために,既存の質問応答ベンチマークに頼っていた。しかし、このアプローチは、主に事前学習データと重複する可能性のある汎用ドメインに焦点を当てているため、事実的知識のカバレッジに関して制限がある。本稿では,知識グラフ(KG)を利用して,LLMの事実知識を体系的に評価する枠組みを提案する。本フレームワークは,所定のKGに格納された事実から,質問の集合と期待される回答を自動的に生成し,これらの質問に対するLLMの回答精度を評価する。汎用ドメインと特定ドメインのKGを用いて,最先端のLLMを体系的に評価した。この実験は、ChatGPTがすべてのドメインで一貫してトップパフォーマーであることを示している。また, LLMの性能は命令の微調整, ドメイン, 質問の複雑さに左右され, 敵対的なコンテキストに影響されやすいことがわかった。

Previous studies have relied on existing question-answering benchmarks to evaluate the knowledge stored in large language models (LLMs). However, this approach has limitations regarding factual knowledge coverage, as it mostly focuses on generic domains which may overlap with the pretraining data. This paper proposes a framework to systematically assess the factual knowledge of LLMs by leveraging knowledge graphs (KGs). Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions. We systematically evaluate the state-of-the-art LLMs with KGs in generic and specific domains. The experiment shows that ChatGPT is consistently the top performer across all domains. We also find that LLM performance depends on instruction finetuning, domain, and question complexity, and is prone to adversarial context.
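
The pipeline described above (facts in a KG, generated questions, accuracy of the answers) can be sketched as follows. Everything here is illustrative: the templates, the relation names, the substring-match scoring, and the `toy_model` stub are assumptions, not the paper's actual question generator or metric.

```python
# Hypothetical question templates, keyed by relation name.
TEMPLATES = {
    "capital": "What is the capital of {head}?",
    "author": "Who wrote {head}?",
}


def make_questions(triples):
    """Turn (head, relation, tail) facts into (question, expected answer) pairs."""
    return [
        (TEMPLATES[relation].format(head=head), tail)
        for head, relation, tail in triples
        if relation in TEMPLATES
    ]


def accuracy(qa_pairs, answer_fn):
    """Fraction of questions whose answer mentions the expected tail entity."""
    if not qa_pairs:
        return 0.0
    hits = sum(
        expected.lower() in answer_fn(question).lower()
        for question, expected in qa_pairs
    )
    return hits / len(qa_pairs)


triples = [
    ("France", "capital", "Paris"),
    ("Hamlet", "author", "William Shakespeare"),
]
qa_pairs = make_questions(triples)


def toy_model(question):
    """Stand-in for an LLM call: answers one question correctly, one incorrectly."""
    canned = {"What is the capital of France?": "The capital is Paris."}
    return canned.get(question, "I am not sure.")


score = accuracy(qa_pairs, toy_model)
```

In a real evaluation `toy_model` would be replaced by a call to the model under test, and the scoring rule would need to handle aliases and paraphrased answers.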

翻訳日:2023-11-01 22:11:49 公開日:2023-10-30

# フェデレーション多目的学習

Federated Multi-Objective Learning ( http://arxiv.org/abs/2310.09866v2 )

ライセンス: Link先を確認

Haibo Yang, Zhuqing Liu, Jia Liu, Chaosheng Dong, Michinari Momma

(参考訳) 近年、多目的最適化(MOO)は多くのマルチエージェントマルチタスク学習アプリケーションを支える基礎的な問題として現れている。しかし,MOO文献における既存のアルゴリズムは,集中型学習設定に限定されており,こうしたマルチエージェントマルチタスク学習アプリケーションの分散性やデータプライバシ要求を満たさない。このことを動機として、複数のクライアントがトレーニングデータをプライベートに保ちながらMOO問題を分散的かつ協調的に解決する、新しいフェデレーション多目的学習(FMOL)フレームワークを提案する。特に,我々のFMOLフレームワークは,クライアントごとに異なる目的関数の集合を許容することで幅広いアプリケーションをサポートし,MOOの定式化を初めてフェデレーション学習パラダイムへと発展・一般化する。このFMOLフレームワークのために,federated multi-gradient descent averaging (FMGDA) と federated stochastic multi-gradient descent averaging (FSMGDA) と呼ばれる2つの新しいフェデレーション多目的最適化(FMOO)アルゴリズムを提案する。両方のアルゴリズムは、局所的な更新によって通信コストを大幅に削減しつつ、単目的フェデレーション学習における対応するアルゴリズムと同じ収束率を達成する。また,広範な実験により,提案したFMOOアルゴリズムの有効性を裏付けた。

In recent years, multi-objective optimization (MOO) has emerged as a foundational problem underpinning many multi-agent multi-task learning applications. However, existing algorithms in MOO literature remain limited to centralized learning settings, which do not satisfy the distributed nature and data privacy needs of such multi-agent multi-task learning applications. This motivates us to propose a new federated multi-objective learning (FMOL) framework with multiple clients distributively and collaboratively solving an MOO problem while keeping their training data private. Notably, our FMOL framework allows a different set of objective functions across different clients to support a wide range of applications, which advances and generalizes the MOO formulation to the federated learning paradigm for the first time. For this FMOL framework, we propose two new federated multi-objective optimization (FMOO) algorithms called federated multi-gradient descent averaging (FMGDA) and federated stochastic multi-gradient descent averaging (FSMGDA). Both algorithms allow local updates to significantly reduce communication costs, while achieving the same convergence rates as those of their algorithmic counterparts in single-objective federated learning. Our extensive experiments also corroborate the efficacy of our proposed FMOO algorithms.
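
For readers unfamiliar with multi-gradient descent, the sketch below shows its core step for two objectives: pick the minimum-norm point in the convex hull of the two gradients, which has a closed form. The abstract does not spell out how FMGDA interleaves local updates and aggregation, so the plain server-side `average` shown next to it is an assumption included only to suggest where averaging would enter; both function names are hypothetical.

```python
def min_norm_direction(g1, g2):
    """Min-norm convex combination lam*g1 + (1-lam)*g2 of two gradients."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(g1, g2))
    if diff_sq == 0.0:
        return list(g1), 1.0  # identical gradients: any weight works
    lam = sum((b - a) * b for a, b in zip(g1, g2)) / diff_sq
    lam = min(1.0, max(0.0, lam))  # clip to the segment between g1 and g2
    return [lam * a + (1.0 - lam) * b for a, b in zip(g1, g2)], lam


def average(vectors):
    """Element-wise mean, e.g. of update directions reported by clients (assumed)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Conflicting objectives: the common descent direction splits the difference.
d_conflict, lam_conflict = min_norm_direction([1.0, 0.0], [0.0, 1.0])

# Aligned objectives: the shorter gradient is already the min-norm point.
d_aligned, lam_aligned = min_norm_direction([1.0, 0.0], [2.0, 0.0])

server_direction = average([d_conflict, d_aligned])
```

Stepping along the negative of the min-norm direction does not increase either objective to first order, which is what makes it a common descent direction.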

翻訳日:2023-11-01 22:11:12 公開日:2023-10-30

# IMPRESS: 拡散型生成AIにおける無許可データ使用に対する知覚不能摂動のレジリエンス評価

IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI ( http://arxiv.org/abs/2310.19248v1 )

ライセンス: Link先を確認

Bochuan Cao, Changjiang Li, Ting Wang, Jinyuan Jia, Bo Li, Jinghui Chen

(参考訳) Stable DiffusionやDALL-E 2のような拡散に基づく画像生成モデルは、与えられた画像から学習し、プロンプトからのガイダンスに従って高品質なサンプルを生成することができる。例えば、アーティストのオリジナル作品に基づいてそのスタイルを模倣した芸術的な画像を制作したり、偽のコンテンツのためにオリジナル画像を悪意を持って編集したりすることができる。しかし、元の画像の所有者から適切な許可を得ていない場合、そのような能力は重大な倫理的問題を引き起こす。これに対し、拡散モデルを誤導して新しいサンプルを適切に生成できなくするように設計された、知覚不能な摂動を追加することで、そのような不正なデータ使用から元の画像を保護する試みがいくつかなされている。本研究では, 保護策としての知覚不能な摂動の有効性を評価するために, IMPRESSと呼ばれる摂動浄化プラットフォームを導入する。 IMPRESSは、知覚不能な摂動が、元の画像と拡散モデルで再構成された画像との間に知覚可能な不整合をもたらしうるという重要な観察に基づいている。この不整合は画像を浄化するための新しい最適化戦略の考案に利用でき、その結果、不正なデータ使用(例えば、スタイル模倣、悪意ある編集)からの元の画像の保護が弱まる可能性がある。提案するIMPRESSプラットフォームは,現代のいくつかの保護手法を包括的に評価し,将来の保護手法の評価プラットフォームとして利用することができる。

Diffusion-based image generation models, such as Stable Diffusion or DALL-E 2, are able to learn from given images and generate high-quality samples following the guidance from prompts. For instance, they can be used to create artistic images that mimic the style of an artist based on his/her original artworks or to maliciously edit the original images for fake content. However, such ability also brings serious ethical issues without proper authorization from the owner of the original images. In response, several attempts have been made to protect the original images from such unauthorized data usage by adding imperceptible perturbations, which are designed to mislead the diffusion model and make it unable to properly generate new samples. In this work, we introduce a perturbation purification platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure. IMPRESS is based on the key observation that imperceptible perturbations could lead to a perceptible inconsistency between the original image and the diffusion-reconstructed image, which can be used to devise a new optimization strategy for purifying the image, which may weaken the protection of the original image from unauthorized data usage (e.g., style mimicking, malicious editing). The proposed IMPRESS platform offers a comprehensive evaluation of several contemporary protection methods, and can be used as an evaluation platform for future protection methods.

翻訳日:2023-11-01 22:04:50 公開日:2023-10-30

# CodeFusion: コード生成のための事前トレーニング付き拡散モデル

CodeFusion: A Pre-trained Diffusion Model for Code Generation ( http://arxiv.org/abs/2310.17680v2 )

ライセンス: Link先を確認

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Gust Verbruggen

(参考訳) 最後のコード行しか変更できない開発者を想像してほしい。正しいコードになるまでに、関数を何度ゼロから書き直さなければならないだろうか。自然言語からコードを生成する自己回帰モデルにも同様の制限があり、先に生成したトークンを容易には再考できない。符号化された自然言語で条件付けられた完全なプログラムを反復的にデノイズすることにより,この制限に対処する,事前学習された拡散コード生成モデルであるCodeFusionを導入する。我々は,Bash,Python,Microsoft Excelの条件付き書式(CF)ルールについて,自然言語からのコード生成タスクでCodeFusionを評価する。実験の結果、CodeFusion(75Mパラメータ)はトップ1精度で最先端の自己回帰システム(350M-175Bパラメータ)と同等に動作し、多様性と品質のバランスが良いため、トップ3とトップ5の精度ではそれらを上回ることがわかった。

Imagine a developer who can only change their last line of code. How often would they have to start writing a function from scratch before it is correct? Auto-regressive models for code generation from natural language have a similar limitation: they do not easily allow reconsidering earlier tokens generated. We introduce CodeFusion, a pre-trained diffusion code generation model that addresses this limitation by iteratively denoising a complete program conditioned on the encoded natural language. We evaluate CodeFusion on the task of natural language to code generation for Bash, Python, and Microsoft Excel conditional formatting (CF) rules. Experiments show that CodeFusion (75M parameters) performs on par with state-of-the-art auto-regressive systems (350M-175B parameters) in top-1 accuracy and outperforms them in top-3 and top-5 accuracy due to its better balance in diversity versus quality.
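
The top-1, top-3 and top-5 figures reward different things: top-1 needs the single best guess to be right, while top-k also credits a correct program that appears lower in a diverse candidate list. A minimal sketch of the metric follows; it assumes exact string match as the correctness test, which is a simplification (the paper's criterion may differ), and the example commands are invented.

```python
def top_k_accuracy(candidates_per_task, references, k):
    """Share of tasks whose reference appears among the first k ranked candidates."""
    hits = sum(
        reference in candidates[:k]
        for candidates, reference in zip(candidates_per_task, references)
    )
    return hits / len(references)


# Hypothetical ranked generations for three natural-language-to-Bash tasks.
candidates_per_task = [
    ["ls -l", "ls -a", "ls"],
    ["grep x f", "cat f", "grep -r x ."],
    ["pwd", "cd", "ls"],
]
references = ["ls -l", "grep -r x .", "echo hi"]

top1 = top_k_accuracy(candidates_per_task, references, k=1)
top3 = top_k_accuracy(candidates_per_task, references, k=3)
```

Here the second task is solved only by the third candidate, so it counts toward top-3 but not top-1.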

翻訳日:2023-11-01 22:03:54 公開日:2023-10-30

# FormaT5: 自然言語を用いた条件付きテーブルフォーマッティングのための棄権と例

FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language ( http://arxiv.org/abs/2310.17306v2 )

ライセンス: Link先を確認

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Carina Negreanu, Elnaz Nouri, Mohammad Raza, Gust Verbruggen

(参考訳) フォーマッティングは、視覚化、プレゼンテーション、分析のためのテーブルの重要な特性である。スプレッドシートソフトウェアでは、データに依存した条件付きフォーマット(CF)ルールを書くことで、ユーザが自動的にテーブルをフォーマットできる。このようなルールを書くことは、基礎となるロジックを理解し実装する必要があるため、ユーザにとってしばしば困難である。 FormaT5は、対象のテーブルと所望のフォーマットロジックの自然言語記述が与えられたときにCFルールを生成できるトランスフォーマーベースのモデルである。これらのタスクのユーザ記述は、しばしば不特定または曖昧であり、コード生成システムが望ましいルールを1ステップで正確に学習することを難しくしている。この仕様不足の問題に対処し、引数エラーを最小限に抑えるため、FormaT5は棄権(abstention)目的を通じてプレースホルダーを予測することを学ぶ。これらのプレースホルダーは、第2のモデルで埋めることができ、フォーマットすべき行の例が利用可能な場合には、プログラミング・バイ・イグザンプル(programming-by-example)システムで埋めることもできる。 FormaT5を多様な実シナリオで評価するために、我々は4つの異なるソースから収集された実世界の記述を含む1053のCFタスクの広範なベンチマークを作成する。私たちはこの分野の研究を促進するためにベンチマークをリリースします。棄権と充填により、FormaT5は、例がある場合とない場合の両方で、ベンチマークにおいて8つの異なるニューラルアプローチを上回る。本研究は、ドメイン固有の学習システムを構築することの価値を示す。

Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often challenging for users as it requires them to understand and implement the underlying logic. We present FormaT5, a transformer-based model that can generate a CF rule given the target table and a natural language description of the desired formatting logic. We find that user descriptions for these tasks are often under-specified or ambiguous, making it harder for code generation systems to accurately learn the desired rule in a single step. To tackle this problem of under-specification and minimise argument errors, FormaT5 learns to predict placeholders through an abstention objective. These placeholders can then be filled by a second model or, when examples of rows that should be formatted are available, by a programming-by-example system. To evaluate FormaT5 on diverse and real scenarios, we create an extensive benchmark of 1053 CF tasks, containing real-world descriptions collected from four different sources. We release our benchmarks to encourage research in this area. Abstention and filling allow FormaT5 to outperform 8 different neural approaches on our benchmarks, both with and without examples. Our results illustrate the value of building domain-specific learning systems.

翻訳日:2023-11-01 22:03:02 公開日:2023-10-30

# RDBench:リレーショナルデータベースのためのMLベンチマーク

RDBench: ML Benchmark for Relational Databases ( http://arxiv.org/abs/2310.16837v2 )

ライセンス: Link先を確認

Zizhao Zhang, Yi Yang, Lutong Zou, He Wen, Tao Feng, Jiaxuan You

(参考訳) 高品質なデータセットと標準化された評価指標の恩恵を受け、機械学習(ML)は持続的な進歩と広範な応用を実現してきた。しかし、機械学習をリレーショナルデータベース(RDB)に適用する際、十分に確立されたベンチマークが存在しないことは、依然としてMLの発展にとって大きな障害である。この問題に対処するため,我々は,複数のテーブルを含むRDB上で再現可能なML研究を促進することを目的とした標準ベンチマークであるRDBench(ML Benchmark for Relational Databases)を紹介する。 RDBenchは、さまざまなスケール、ドメイン、リレーショナル構造のRDBデータセットを提供し、それらを4つのレベルに整理している。特に、さまざまなMLドメインでのRDBenchの採用を容易にするために、RDBenchは任意のデータベースに対して、表形式データ、同種グラフ、異種グラフの3種類のインターフェースを公開し、それらは同じ基盤となるタスク定義を共有する。 RDBenchは、RDB予測タスクの下で、XGBoostからグラフニューラルネットワークまで、さまざまなドメインのMLメソッド間の有意義な比較を初めて可能にする。 RDBデータセットごとに複数の分類タスクと回帰タスクを設計し、同じデータセット上で平均した結果を報告することで、実験結果のロバスト性をさらに高める。 RDBenchはDBGymで実装されている。DBGymはデータベース上のML研究とアプリケーションのためのユーザフレンドリーなプラットフォームであり、RDBenchを使った新しいMLメソッドのベンチマークを容易に行える。

Benefiting from high-quality datasets and standardized evaluation metrics, machine learning (ML) has achieved sustained progress and widespread applications. However, while applying machine learning to relational databases (RDBs), the absence of a well-established benchmark remains a significant obstacle to the development of ML. To address this issue, we introduce ML Benchmark For Relational Databases (RDBench), a standardized benchmark that aims to promote reproducible ML research on RDBs that include multiple tables. RDBench offers diverse RDB datasets of varying scales, domains, and relational structures, organized into 4 levels. Notably, to simplify the adoption of RDBench for diverse ML domains, for any given database, RDBench exposes three types of interfaces including tabular data, homogeneous graphs, and heterogeneous graphs, sharing the same underlying task definition. For the first time, RDBench enables meaningful comparisons between ML methods from diverse domains, ranging from XGBoost to Graph Neural Networks, under RDB prediction tasks. We design multiple classification and regression tasks for each RDB dataset and report averaged results over the same dataset, further enhancing the robustness of the experimental findings. RDBench is implemented with DBGym, a user-friendly platform for ML research and application on databases, enabling new ML methods to be benchmarked with RDBench with ease.
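
One common way to view a multi-table database as a heterogeneous graph is to treat every row as a node typed by its table and every foreign-key reference as a typed edge. The sketch below follows that convention; it is an assumption about the general idea and not RDBench's or DBGym's actual interface, and the function name and toy schema are hypothetical.

```python
def tables_to_hetero_graph(tables, foreign_keys):
    """Build typed node and edge lists from row dicts and foreign-key declarations.

    tables: {table_name: [row dicts, each with an 'id' key]}
    foreign_keys: [(child_table, fk_column, parent_table)]
    """
    nodes = {name: [row["id"] for row in rows] for name, rows in tables.items()}
    edges = {}
    for child, column, parent in foreign_keys:
        # One edge type per foreign key: (child row id, referenced parent row id).
        edges[(child, column, parent)] = [
            (row["id"], row[column])
            for row in tables[child]
            if row.get(column) is not None
        ]
    return nodes, edges


tables = {
    "users": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    "orders": [
        {"id": 10, "user_id": 1},
        {"id": 11, "user_id": 2},
        {"id": 12, "user_id": 1},
    ],
}
nodes, edges = tables_to_hetero_graph(tables, [("orders", "user_id", "users")])
```

Dropping the type labels from `nodes` and `edges` would give the homogeneous-graph view, and joining the tables on the same keys would give the tabular view of the same task.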

翻訳日:2023-11-01 22:01:37 公開日:2023-10-30

# Data Provenance Initiative: AIにおけるデータセットライセンスと属性の大規模監査

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI ( http://arxiv.org/abs/2310.16787v2 )

ライセンス: Link先を確認

Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Deb Roy, Sara Hooker

(参考訳) 膨大で多様な、文書化が一貫していないデータセットで言語モデルをトレーニングする競争は、実践者にとっての法的および倫理的リスクに対する差し迫った懸念を高めている。データの透明性と理解を脅かすこれらのプラクティスを是正するために、法律と機械学習の専門家による学際的な取り組みを組織し、1800以上のテキストデータセットを体系的に監査し追跡する。私たちは、ソース、作成者、一連のライセンス条件、プロパティ、その後の使用に至るまで、これらのデータセットの系統をトレースするためのツールと標準を開発する。私たちのランドスケープ分析は、商業利用可能なオープンデータセットとクローズドデータセットの構成と焦点の急激な分断を浮き彫りにしており、低リソース言語、より創造的なタスク、より豊富なトピックの多様性、より新しく、より合成的なトレーニングデータといった重要なカテゴリをクローズドデータセットが独占している。このことは、異なるライセンス条件下で利用できるデータの種類の分断が深まっていること、そして著作権と公正使用に関する法域ごとの法的解釈への影響が高まっていることを示している。また,広く使用されているデータセットホスティングサイトでは,ライセンスの誤分類が頻発しており,ライセンスの欠落が72%以上,エラー率が50%以上であることも観察された。これは、多くの最近のブレークスルーを駆動する最も人気のあるデータセットにおける、帰属の誤りと十分な情報に基づく利用の危機を示している。データセットの透明性と責任ある使用に関する継続的な改善への貢献として、私たちは監査全体を、対話型UIであるData Provenance Explorerとともに公開する。これにより実践者は、最もポピュラーなオープンソースの微調整データコレクションについて、データの来歴をトレースしフィルタリングできる: www.dataprovenance.org。

The race to train language models on vast, diverse, and inconsistently documented datasets has raised pressing concerns about the legal and ethical risks for practitioners. To remedy these practices threatening data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace 1800+ text datasets. We develop tools and standards to trace the lineage of these datasets, from their source, creators, series of license conditions, properties, and subsequent use. Our landscape analysis highlights the sharp divides in composition and focus of commercially open vs closed datasets, with closed datasets monopolizing important categories: lower resource languages, more creative tasks, richer topic variety, newer and more synthetic training data. This points to a deepening divide in the types of data that are made available under different license conditions, and heightened implications for jurisdictional legal interpretations of copyright and fair use. We also observe frequent miscategorization of licenses on widely used dataset hosting sites, with license omission of 72%+ and error rates of 50%+. This points to a crisis in misattribution and informed use of the most popular datasets driving many recent breakthroughs. As a contribution to ongoing improvements in dataset transparency and responsible use, we release our entire audit, with an interactive UI, the Data Provenance Explorer, which allows practitioners to trace and filter on data provenance for the most popular open source finetuning data collections: www.dataprovenance.org.

翻訳日:2023-11-01 22:00:54 公開日:2023-10-30

# 光による超伝導量子ビットのコヒーレント制御

Coherent control of a superconducting qubit using light ( http://arxiv.org/abs/2310.16155v2 )

ライセンス: Link先を確認

Hana K. Warner, Jeffrey Holzgrafe, Beatriz Yankelevich, David Barton, Stefano Poletto, C. J. Xin, Neil Sinclair, Di Zhu, Eyob Sete, Brandon Langley, Emma Batson, Marco Colangelo, Amirhassan Shams-Ansari, Graham Joe, Karl K. Berggren, Liang Jiang, Matthew Reagor, and Marko Loncar

(参考訳) 量子科学と技術は、エンタングル状態を分配できる低損失かつ低ノイズの通信チャネルで接続された量子プロセッサのネットワークに依存する、強力な計算資源の実現を約束している [1,2]。極低温環境で動作する超伝導マイクロ波量子ビット (3-8 GHz) は、その強いジョセフソン非線形性と低損失 [3] のために量子プロセッサノードの有望な候補として現れているが、空間的に分離されたプロセッサノード間の情報は、低損失光ファイバを伝搬する通信波長帯の光子 (200 THz) を介して室温で伝達される可能性が高い。したがって、これらの異なる周波数間での量子情報の変換 [4-10] は、量子資源を相互に接続して各プラットフォームの利点を活用するうえで重要である。ここでは超伝導量子ビットのコヒーレント光制御を示す。我々は、最大1.18%の変換効率(1.16%の協調性)で動作するマイクロ波-光量子トランスデューサを開発し、量子ビットのコヒーレンス時間 (800 ns) に影響を与えることなく、超伝導量子ビットにおける光駆動のラビ振動 (2.27 MHz) を実証する。最後に,量子プロセッサノードのネットワーク化に向けたトランスデューサの利用に関する展望について述べる。

Quantum science and technology promise the realization of a powerful computational resource that relies on a network of quantum processors connected with low loss and low noise communication channels capable of distributing entangled states [1,2]. While superconducting microwave qubits (3-8 GHz) operating in cryogenic environments have emerged as promising candidates for quantum processor nodes due to their strong Josephson nonlinearity and low loss [3], the information between spatially separated processor nodes will likely be carried at room temperature via telecommunication photons (200 THz) propagating in low loss optical fibers. Transduction of quantum information [4-10] between these disparate frequencies is therefore critical to leverage the advantages of each platform by interfacing quantum resources. Here, we demonstrate coherent optical control of a superconducting qubit. We achieve this by developing a microwave-optical quantum transducer that operates with up to 1.18% conversion efficiency (1.16% cooperativity) and demonstrate optically-driven Rabi oscillations (2.27 MHz) in a superconducting qubit without impacting qubit coherence times (800 ns). Finally, we discuss outlooks towards using the transducer to network quantum processor nodes.
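
To put the two quoted numbers side by side, the following back-of-the-envelope sketch computes how long a pi-pulse would take at a 2.27 MHz Rabi rate and how many such pulses fit inside 800 ns. It assumes that 2.27 MHz is the ordinary (not angular) Rabi frequency, that 800 ns is the relevant coherence time, and that the qubit behaves as an ideal resonantly driven two-level system; none of these assumptions is confirmed by the abstract.

```python
import math


def pi_pulse_time(rabi_freq_hz):
    """Duration of a full population inversion: half of one Rabi period."""
    return 1.0 / (2.0 * rabi_freq_hz)


def excited_population(t, rabi_freq_hz):
    """Ideal resonant Rabi flopping from the ground state, ignoring decoherence."""
    return math.sin(math.pi * rabi_freq_hz * t) ** 2


f_rabi = 2.27e6        # Hz, from the abstract (interpretation assumed)
coherence = 800e-9     # s, from the abstract

t_pi = pi_pulse_time(f_rabi)                    # roughly 220 ns
pulses_within_coherence = coherence / t_pi      # between 3 and 4
p_after_pi = excited_population(t_pi, f_rabi)   # full inversion in the ideal model
```

Under these assumptions only a few pi-pulses fit within one coherence time, which gives a feel for how fast the optical drive is relative to decoherence.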

翻訳日:2023-11-01 21:59:39 公開日:2023-10-30

# CLAIMSCAN-2023の概要: クレーム検出とクレームスパンの特定によるソーシャルメディアの真実の解明

Overview of the CLAIMSCAN-2023: Uncovering Truth in Social Media through Claim Detection and Identification of Claim Spans ( http://arxiv.org/abs/2310.19267v1 )

ライセンス: Link先を確認

Megha Sundriyal and Md Shad Akhtar and Tanmoy Chakraborty

(参考訳) コンテンツ作成と情報交換の大幅な増加は、オンラインソーシャルメディアプラットフォームの急速な発展によって実現されており、これは非常に有益であった。しかし、これらのプラットフォームは偽情報、プロパガンダ、偽ニュースを広める人々にとっての温床にもなっている。主張(クレーム)は世界に対する私たちの認識を形成するのに不可欠だが、残念なことに、偽情報を広める人々によって人を騙すために頻繁に使われている。この問題に対処するため、ソーシャルメディアの大手企業はコンテンツモデレーターを雇って偽ニュースをフィルタリングしている。しかし、情報の量が膨大であるため、偽ニュースを効果的に識別することは困難である。したがって、そのような主張をするソーシャルメディア投稿を自動的に特定し、その真偽を確認し、信頼できる主張と虚偽の主張を区別することが重要になっている。そこで我々は2023年の情報検索評価フォーラム(FIRE'2023)でCLAIMSCANを発表した。主な目的は、ソーシャルメディア投稿がクレームを構成するかどうかを判定するタスクAと、投稿内でクレームを構成する単語やフレーズを正確に識別するタスクBという2つの重要なタスクに集中している。タスクAは40件の登録を受け、このタイムリーな課題に対する強い関心と関与を示した。一方、タスクBには28チームが参加し、誤情報のデジタル時代におけるその重要性が浮き彫りになった。

A significant increase in content creation and information exchange has been made possible by the quick development of online social media platforms, which has been very advantageous. However, these platforms have also become a haven for those who disseminate false information, propaganda, and fake news. Claims are essential in forming our perceptions of the world, but sadly, they are frequently used to trick people by those who spread false information. To address this problem, social media giants employ content moderators to filter out fake news from the actual world. However, the sheer volume of information makes it difficult to identify fake news effectively. Therefore, it has become crucial to automatically identify social media posts that make such claims, check their veracity, and differentiate between credible and false claims. In response, we presented CLAIMSCAN in the 2023 Forum for Information Retrieval Evaluation (FIRE'2023). The primary objectives centered on two crucial tasks: Task A, determining whether a social media post constitutes a claim, and Task B, precisely identifying the words or phrases within the post that form the claim. Task A received 40 registrations, demonstrating a strong interest and engagement in this timely challenge. Meanwhile, Task B attracted participation from 28 teams, highlighting its significance in the digital era of misinformation.

翻訳日:2023-11-01 21:50:56 公開日:2023-10-30

# グラフニューラルネットワーク理解のためのメタデータ駆動アプローチ

A Metadata-Driven Approach to Understand Graph Neural Networks ( http://arxiv.org/abs/2310.19263v1 )

ライセンス: Link先を確認

Ting Wei Li, Qiaozhu Mei, Jiaqi Ma

(参考訳) グラフニューラルネットワーク(GNN)は様々なアプリケーションで顕著な成功を収めているが、そのパフォーマンスは対象とするグラフデータセットの特定のデータ特性に敏感になりうる。 GNNの限界を理解するための現在の文献は、主にネットワーク科学やグラフ理論のヒューリスティックスとドメイン知識を活用してGNNの振る舞いをモデル化するモデル駆動(model-driven)アプローチを採用しているが、これは時間がかかり、非常に主観的である。本研究では、グラフ学習ベンチマークが利用しやすくなっていることを動機として、GNNのグラフデータ特性に対する感度を解析するためのメタデータ駆動(metadata-driven)アプローチを提案する。多様なデータセットにわたるGNN性能のベンチマークから得られたメタデータに対して多変量スパース回帰分析を行い,顕著なデータ特性の集合を得る。このデータ駆動手法の有効性を検証するため,特定されたデータ特性の1つである次数分布に着目し,理論解析と制御実験を通じて,この特性がGNNの性能に与える影響を調べる。理論的な結果から,次数分布がより均衡したデータセットではノード表現の線形分離性が向上し,その結果GNNの性能が向上することが示された。また, 次数分布の異なる合成データセットを用いて制御実験を行い, 実験結果は理論的な結果とよく一致した。理論解析と制御実験の両方により,提案するメタデータ駆動アプローチがGNNにとって重要なデータ特性の同定に有効であることを検証した。

Graph Neural Networks (GNNs) have achieved remarkable success in various applications, but their performance can be sensitive to specific data properties of the graph datasets they operate on. Current literature on understanding the limitations of GNNs has primarily employed a model-driven approach that leverages heuristics and domain knowledge from network science or graph theory to model the GNN behaviors, which is time-consuming and highly subjective. In this work, we propose a metadata-driven approach to analyze the sensitivity of GNNs to graph data properties, motivated by the increasing availability of graph learning benchmarks. We perform a multivariate sparse regression analysis on the metadata derived from benchmarking GNN performance across diverse datasets, yielding a set of salient data properties. To validate the effectiveness of our data-driven approach, we focus on one identified data property, the degree distribution, and investigate how this property influences GNN performance through theoretical analysis and controlled experiments. Our theoretical findings reveal that datasets with more balanced degree distribution exhibit better linear separability of node representations, thus leading to better GNN performance. We also conduct controlled experiments using synthetic datasets with varying degree distributions, and the results align well with our theoretical findings. Collectively, both the theoretical analysis and controlled experiments verify that the proposed metadata-driven approach is effective in identifying critical data properties for GNNs.

翻訳日:2023-11-01 21:50:25 公開日:2023-10-30

# diversify & conquer: out-of-distribution disagreementによる成果指向カリキュラムrl

Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement ( http://arxiv.org/abs/2310.19261v1 )

ライセンス: Link先を確認

Daesol Cho, Seungjae Lee, and H. Jin Kim

(参考訳) 強化学習 (Reinforcement Learning, RL) はしばしば、エージェントが環境の特性や外部報酬といったドメイン知識にアクセスせずに探索すべき非情報探索問題の課題に直面している。これらの課題に対処するため、本研究では、D2C(Diversify for Disagreement & Conquer)と呼ばれるカリキュラムRLの新しいアプローチを提案する。従来のカリキュラム学習法とは異なり、D2Cは所望の成果の少数の例しか必要とせず、その幾何学や所望の成果例の分布に関わらず、どんな環境でも機能する。提案手法は,目標条件分類器の多様化を行い,訪れた結果状態と所望の結果状態の類似性を識別し,未探索領域を定量化し,任意の目標条件固有報酬信号を単純かつ直感的に設計できるようにする。提案手法は両部マッチングを用いて,順応した中間目標の列を生成するカリキュラム学習目標を定義し,エージェントが探索されていない領域を自動的に探索・征服することを可能にする。本研究は,d2cが,任意に分布した望ましい成果例においても,定量的・質的側面において,事前のカリキュラムrl法を上回っていることを示す実験結果を示す。

Reinforcement learning (RL) often faces the challenges of uninformed search problems where the agent should explore without access to the domain knowledge such as characteristics of the environment or external rewards. To tackle these challenges, this work proposes a new approach for curriculum RL called Diversify for Disagreement & Conquer (D2C). Unlike previous curriculum learning methods, D2C requires only a few examples of desired outcomes and works in any environment, regardless of its geometry or the distribution of the desired outcome examples. The proposed method performs diversification of the goal-conditional classifiers to identify similarities between visited and desired outcome states and ensures that the classifiers disagree on states from out-of-distribution, which enables quantifying the unexplored region and designing an arbitrary goal-conditioned intrinsic reward signal in a simple and intuitive way. The proposed method then employs bipartite matching to define a curriculum learning objective that produces a sequence of well-adjusted intermediate goals, which enable the agent to automatically explore and conquer the unexplored region. We present experimental results demonstrating that D2C outperforms prior curriculum RL methods in both quantitative and qualitative aspects, even with the arbitrarily distributed desired outcome examples.
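
The abstract names bipartite matching as the tool used to define the curriculum objective but does not give the cost function. As a rough illustration only, the sketch below solves a small assignment problem by brute force. The cost matrix is made up, and reading rows as candidate goals and columns as desired outcome examples is an assumption for illustration, not the D2C formulation.

```python
from itertools import permutations


def min_cost_matching(cost):
    """Brute-force minimum-cost perfect matching on a square cost matrix.

    Returns (total_cost, assignment), where assignment[i] is the column
    matched to row i. Only practical for very small matrices.
    """
    n = len(cost)
    best_total, best_assignment = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_assignment = total, perm
    return best_total, best_assignment


# Hypothetical costs: rows = candidate goals, columns = desired outcome examples.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, assignment = min_cost_matching(cost)
print(total, assignment)
```

For realistic problem sizes a polynomial-time solver such as the Hungarian algorithm would replace the enumeration over permutations.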

翻訳日:2023-11-01 21:49:57 公開日:2023-10-30

# 教師なしデータ取得によるオブジェクト検出のためのオンラインソースフリードメイン適応の改善

Improving Online Source-free Domain Adaptation for Object Detection by Unsupervised Data Acquisition ( http://arxiv.org/abs/2310.19258v1 )

ライセンス: Link先を確認

Xiangyu Shi, Yanyuan Qiao, Qi Wu, Lingqiao Liu, Feras Dayoub

(参考訳) 移動ロボットにおける効果的な物体検出は、多様な不慣れな環境での展開によって挑戦される。 Online Source-Free Domain Adaptation (O-SFDA)は、ターゲットドメインからのラベルなしデータのストリームを使用して、リアルタイムなモデル適応を提供する。しかし、モバイルロボティクスにおけるキャプチャーフレームのすべてが、特に強いドメインシフトがある場合、適応に有用な情報を含んでいるわけではない。本稿では,非教師付きデータ取得による移動ロボットの適応物体検出のためのO-SFDAの改良手法を提案する。本手法は,オンライントレーニングプロセスに含まれる最も情報に富む未ラベル標本を優先する。実世界のデータセットに対する実証的な評価により,我々の手法は既存のO-SFDA技術よりも優れており,移動ロボットの適応物体検出を改善するための教師なしデータ取得の可能性を示す。

Effective object detection in mobile robots is challenged by deployment in diverse and unfamiliar environments. Online Source-Free Domain Adaptation (O-SFDA) offers real-time model adaptation using a stream of unlabeled data from a target domain. However, not all captured frames in mobile robotics contain information that is beneficial for adaptation, particularly when there is a strong domain shift. This paper introduces a novel approach to enhance O-SFDA for adaptive object detection in mobile robots via unsupervised data acquisition. Our methodology prioritizes the most informative unlabeled samples for inclusion in the online training process. Empirical evaluation on a real-world dataset reveals that our method outperforms existing state-of-the-art O-SFDA techniques, demonstrating the viability of unsupervised data acquisition for improving adaptive object detection in mobile robots.

翻訳日:2023-11-01 21:49:32 公開日:2023-10-30

# マルチビューインスタンスキャプチャによるインスタンス検出のための高分解能データセット

A High-Resolution Dataset for Instance Detection with Multi-View Instance Capture ( http://arxiv.org/abs/2310.19257v1 )

ライセンス: Link先を確認

Qianqian Shen, Yunhan Zhao, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong

(参考訳) インスタンス検出(insdet)は、ロボット工学とコンピュータビジョンにおける長期にわたる問題であり、乱雑なシーンでオブジェクトインスタンス(いくつかの視覚的な例で事前に定義されている)を検出することを目的としている。現実的な重要性があるにもかかわらず、その進歩は、事前定義されたクラスに属するオブジェクトを検出するObject Detectionによって隠れている。主な理由は、現在のinsdetデータセットが現在の標準でスケールが小さすぎるためである。例えば、人気のInsDetデータセットGMU(2016年に公開された)は、2014年に公開された有名なオブジェクト検出データセットであるCOCO(80クラス)よりもはるかに少ない23インスタンスしかありません。私たちは新しいInsDetデータセットとプロトコルを導入する動機があります。トレーニングデータは、マルチビューのインスタンスキャプチャと、フリーボックスアノテーションでインスタンスイメージをペーストしてトレーニングイメージを合成可能な、多様なシーンイメージで構成されています。次に,100のオブジェクトインスタンスのマルチビューキャプチャと,高解像度(6k x 8k)テストイメージを含む実世界データベースをリリースする。第3に,insdetのベースライン手法を大規模に検討し,その性能を分析し,今後の課題を示唆する。予想外のクラス非依存のセグメンテーションモデル(segment anything model, sam)と自己教師付き特徴表現であるdinov2は、オブジェクト検出器(例えばfasterrcnnとretinanet)を再利用するエンドツーエンドトレーニングされたinsdetモデルよりも10 ap以上優れたパフォーマンスを実現しています。

Instance detection (InsDet) is a long-lasting problem in robotics and computer vision, aiming to detect object instances (predefined by some visual examples) in a cluttered scene. Despite its practical significance, its advancement is overshadowed by Object Detection, which aims to detect objects belonging to some predefined classes. One major reason is that current InsDet datasets are too small in scale by today's standards. For example, the popular InsDet dataset GMU (published in 2016) has only 23 instances, far less than COCO (80 classes), a well-known object detection dataset published in 2014. We are motivated to introduce a new InsDet dataset and protocol. First, we define a realistic setup for InsDet: training data consists of multi-view instance captures, along with diverse scene images allowing synthesizing training images by pasting instance images on them with free box annotations. Second, we release a real-world database, which contains multi-view capture of 100 object instances, and high-resolution (6k x 8k) testing images. Third, we extensively study baseline methods for InsDet on our dataset, analyze their performance and suggest future work. Somewhat surprisingly, using the off-the-shelf class-agnostic segmentation model (Segment Anything Model, SAM) and the self-supervised feature representation DINOv2 performs the best, achieving >10 AP better than end-to-end trained InsDet models that repurpose object detectors (e.g., FasterRCNN and RetinaNet).

翻訳日:2023-11-01 21:49:19 公開日:2023-10-30

# フローベース分布ロバスト最適化

Flow-based Distributionally Robust Optimization ( http://arxiv.org/abs/2310.19253v1 )

ライセンス: Link先を確認

Chen Xu, Jonghyeok Lee, Xiuyuan Cheng, Yao Xie

(参考訳) 本稿では,フローベース分散ロバスト最適化 (DRO) をWassersteinの不確実性集合を用いて解くために,フローベース分散ロバスト最適化 (DRO) と呼ばれる計算効率のよいフレームワークを提案する。計算量的に困難である無限次元最適化問題に取り組むために,データ分布と対象分布との間の連続時間可逆移動写像をフローベースモデルとして活用し,wasserstein近位勾配流型アルゴリズムを開発した。実際には、勾配降下によりブロックで漸進的に訓練されたニューラルネットワークの列によって輸送マップをパラメータ化する。計算フレームワークは一般に,大規模なサンプルサイズを持つ高次元データを扱うことができ,様々な用途に有用である。本稿では, 逆学習, 分散堅牢な仮説テスト, およびデータ駆動型分散摂動摂動差分プライバシーの新しいメカニズムを実証し, 提案手法は実次元データに対して強い経験的性能を与える。

We present a computationally efficient framework, called \texttt{FlowDRO}, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, when requiring the worst-case distribution (also called the Least Favorable Distribution, LFD) to be continuous so that the algorithm can be scalable to problems with larger sample sizes and achieve better generalization capability for the induced robust algorithms. To tackle the computationally challenging infinite-dimensional optimization problem, we leverage flow-based models, continuous-time invertible transport maps between the data distribution and the target distribution, and develop a Wasserstein proximal gradient flow type of algorithm. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. Our computational framework is general, can handle high-dimensional data with large sample sizes, and can be useful for various applications. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy, where the proposed method gives strong empirical performance on real high-dimensional data.

翻訳日:2023-11-01 21:48:46 公開日:2023-10-30

# セマンティックセグメンテーションの評価基準の再検討--粒状断面積の最適化と評価

Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union ( http://arxiv.org/abs/2310.19252v1 )

ライセンス: Link先を確認

Zifu Wang and Maxim Berman and Amal Rannen-Triki and Philip H.S. Torr and Devis Tuia and Tinne Tuytelaars and Luc Van Gool and Jiaqian Yu and Matthew B. Blaschko

(参考訳) 意味セグメンテーションデータセットは、しばしば2種類の不均衡を示す: \textit{class imbalance}、あるクラスが他のクラスよりも頻繁に現れる、 \textit{size imbalance}、あるオブジェクトが他のクラスよりも多くのピクセルを占有する。これにより、従来の評価基準は \textit{majority class} (例えば、ピクセル単位の精度) と \textit{large objects} (例えば、平均ピクセル単位の精度とデータセット単位の平均交点) に偏りがちになる。これらの欠点に対処するため,我々は,細粒度mIoUと,それに対応する最悪の指標を用いて,より包括的なセグメンテーション手法の評価を行う。これらのきめ細かいメトリクスは、大きなオブジェクトに対するバイアスの低減、よりリッチな統計情報、モデルとデータセット監査に関する貴重な洞察を提供する。さらに,12種類の自然および空中のセグメンテーションデータセットについて,提案する指標を用いて15の現代ニューラルネットワークを訓練し,評価する,広範なベンチマーク研究を行った。ベンチマークでは,1つの測定値に基づかないことの必要性を強調し,微細なmIoUsが大きな物体への偏りを減少させることを確認した。さらに,アーキテクチャ設計と損失関数が果たす重要な役割を特定し,細粒度メトリクスを最適化するベストプラクティスを導出する。コードは \href{https://github.com/zifuwanggg/jdtlosses}{https://github.com/zifuwanggg/jdtlosses} で入手できる。

Semantic segmentation datasets often exhibit two types of imbalance: \textit{class imbalance}, where some classes appear more frequently than others and \textit{size imbalance}, where some objects occupy more pixels than others. This causes traditional evaluation metrics to be biased towards \textit{majority classes} (e.g. overall pixel-wise accuracy) and \textit{large objects} (e.g. mean pixel-wise accuracy and per-dataset mean intersection over union). To address these shortcomings, we propose the use of fine-grained mIoUs along with corresponding worst-case metrics, thereby offering a more holistic evaluation of segmentation techniques. These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing. Furthermore, we undertake an extensive benchmark study, where we train and evaluate 15 modern neural networks with the proposed metrics on 12 diverse natural and aerial segmentation datasets. Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects. Moreover, we identify the crucial role played by architecture designs and loss functions, which lead to best practices in optimizing fine-grained metrics. The code is available at \href{https://github.com/zifuwanggg/JDTLosses}{https://github.com/zifuwanggg/JDTLosses}.
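
To make the bias towards large objects concrete, the sketch below compares a per-dataset mIoU (intersections and unions pooled over all images) with a per-image mIoU (IoU computed per image, then averaged). This is one plausible reading of "fine-grained" and an assumption here. The paper's exact definitions, including its worst-case variants, may differ, and the function names and toy data are hypothetical.

```python
def class_stats(pred, label, num_classes):
    """Per-class (intersection, union) pixel counts for one flattened image."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, label):
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    return inter, union


def dataset_miou(preds, labels, num_classes):
    """Pool counts over the whole dataset, then average IoU over classes."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred, label in zip(preds, labels):
        i, u = class_stats(pred, label, num_classes)
        for c in range(num_classes):
            inter[c] += i[c]
            union[c] += u[c]
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious)


def image_miou(preds, labels, num_classes):
    """Average IoU over the classes present in each image, then over images."""
    per_image = []
    for pred, label in zip(preds, labels):
        i, u = class_stats(pred, label, num_classes)
        ious = [i[c] / u[c] for c in range(num_classes) if u[c] > 0]
        per_image.append(sum(ious) / len(ious))
    return sum(per_image) / len(per_image)


# Image 1: a large class-1 object, segmented perfectly.
# Image 2: a one-pixel class-1 object, missed entirely.
labels = [[1] * 8 + [0] * 2, [1, 0, 0, 0]]
preds = [[1] * 8 + [0] * 2, [0, 0, 0, 0]]
print(dataset_miou(preds, labels, 2))  # pooled counts hide the missed small object
print(image_miou(preds, labels, 2))    # the missed object lowers the score more
```

In this toy case the pooled score is 31/36 (about 0.861) while the per-image score is 0.6875, because the large, correctly segmented object dominates the pooled pixel counts.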

翻訳日:2023-11-01 21:48:26 公開日:2023-10-30

# 事前訓練型レコメンダシステム:因果脱バイアスの観点から

Pre-trained Recommender Systems: A Causal Debiasing Perspective ( http://arxiv.org/abs/2310.19251v1 )

ライセンス: Link先を確認

Ziqian Lin, Hao Ding, Nghia Hoang, Branislav Kveton, Anoop Deoras, Hao Wang

(参考訳) 事前学習されたビジョン/言語モデルに関する最近の研究は、AIにおける新しい有望なソリューション構築パラダイムの実践的な利点を実証している。一般的なタスク空間を記述する広いデータに基づいてモデルを事前学習し、トレーニングデータが著しく制限されている場合(例えばゼロまたは少数ショットの学習シナリオ)に、幅広い下流タスクを解決するためにうまく適応できる。このような進展にインスパイアされた本論文では,事前学習モデルの観点からは,このようなパラダイムをレコメンダシステムのコンテキストに適用する可能性や課題について考察する。特に,異なるドメインから抽出された汎用ユーザ・イテムインタラクションデータに基づいて,汎用的なインタラクションパターンを学習することにより,汎用的なインタラクションパターンをキャプチャする汎用レコメンデータを提案する。しかし、セマンティック空間において強い適合性を持つビジョン/言語データとは異なり、異なるドメイン(例えば、異なる国や異なるeコマースプラットフォーム)にまたがるレコメンデーションデータの基礎となる普遍的なパターンは、しばしば、ユーザとアイテムの文化的な違いと、異なるeコマースプラットフォームの使用によって暗黙的に課されるドメイン内およびドメイン横断のバイアスによって引き起こされる。実験で示したように、データ内の不均一なバイアスは、事前学習されたモデルの有効性を阻害する傾向がある。この課題に対処するため,我々は,階層型ベイズ深層学習モデルであるPreRecを用いて,因果脱バイアスの観点を導入し,定式化する。実世界データを用いた実験により,提案モデルが,クロスマーケットシナリオとクロスプラットフォームシナリオの両方において,ゼロ・マイ・ショット学習環境でのレコメンデーション性能を大幅に向上できることを示した。

Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired by such progress, we investigate in this paper the possibilities and challenges of adapting such a paradigm to the context of recommender systems, which is less investigated from the perspective of pre-trained models. In particular, we propose to develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains, which can then be quickly adapted to improve few-shot learning performance in unseen new domains (with limited data). However, unlike vision/language data which share strong conformity in the semantic space, universal patterns underlying recommendation data collected across different domains (e.g., different countries or different e-commerce platforms) are often occluded by both in-domain and cross-domain biases implicitly imposed by the cultural differences in their user and item bases, as well as their uses of different e-commerce platforms. As shown in our experiments, such heterogeneous biases in the data tend to hinder the effectiveness of the pre-trained model. To address this challenge, we further introduce and formalize a causal debiasing perspective, which is substantiated via a hierarchical Bayesian deep learning model, named PreRec. Our empirical studies on real-world data show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings under both cross-market and cross-platform scenarios.

翻訳日:2023-11-01 21:47:56 公開日:2023-10-30

# 表データ用エンドツーエンド機械学習パイプラインにおける有用性と公平性のための差分プライベート合成データの評価

Assessment of Differentially Private Synthetic Data for Utility and Fairness in End-to-End Machine Learning Pipelines for Tabular Data ( http://arxiv.org/abs/2310.19250v1 )

ライセンス: Link先を確認

Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres and Rafael de Sousa

(参考訳) differentially private (dp) 合成データセットは、個々のデータプロバイダのプライバシーを維持しながらデータを共有するためのソリューションである。エンドツーエンドの機械学習パイプラインでDP合成データを活用することの効果を理解することは、医療や人道的行動といった分野に影響を及ぼす。本研究では,機械学習パイプラインにおいて,合成データが実際の表データを置き換えることができる範囲を調査し,機械学習モデルのトレーニングと評価に最も有効な合成データ生成技術を特定する。そこで本研究では,個人別合成データが下流の分類課題に与える影響について,実用性や公平性の観点から検討する。私たちの分析は包括的であり、主要な2種類の合成データ生成アルゴリズム(マージンベースとganベース)の代表を含んでいる。私たちの知識を最大限に活用するために、私たちの仕事は最初です。 i) 実データが合成データに基づいて訓練された機械学習モデルの実用性と公正性をテストするために利用できると想定しない訓練・評価フレームワークを提案する。 (ii)機械学習モデルのトレーニングに使用する有用性と公平性の観点から、合成データセット生成アルゴリズムの最も広範な分析を行う。 (iii) 公正性のいくつかの異なる定義を含む。本研究は, グラフデータに対するモデルトレーニングユーティリティに関して, GANベースの合成データジェネレータをはるかに上回っていることを示す。実際、限界ベースのアルゴリズムが生成するデータを用いてトレーニングされたモデルは、実データを用いてトレーニングされたモデルと同様の実用性を示すことができる。また,実データを用いて学習したモデルに類似した実用性と公正性を同時に達成できるモデルを,境界モデルによる合成データ生成MWEM PGMで訓練できることも明らかにした。

Differentially private (DP) synthetic data sets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines impacts areas such as health care and humanitarian action, where data is scarce and regulated by restrictive privacy laws. In this work, we investigate the extent to which synthetic data can replace real, tabular data in machine learning pipelines and identify the most effective synthetic data generation techniques for training and evaluating machine learning models. We investigate the impacts of differentially private synthetic data on downstream classification tasks from the point of view of utility as well as fairness. Our analysis is comprehensive and includes representatives of the two main types of synthetic data generation algorithms: marginal-based and GAN-based. To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic data set generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness. Our findings demonstrate that marginal-based synthetic data generators surpass GAN-based ones regarding model training utility for tabular data. Indeed, we show that models trained using data generated by marginal-based algorithms can exhibit similar utility to models trained using real data. Our analysis also reveals that the marginal-based synthetic data generator MWEM PGM can train models that simultaneously achieve utility and fairness characteristics close to those obtained by models trained with real data.

翻訳日:2023-11-01 21:47:19 公開日:2023-10-30

# 不確実性誘導境界学習による社会的事象検出

Uncertainty-guided Boundary Learning for Imbalanced Social Event Detection ( http://arxiv.org/abs/2310.19247v1 )

ライセンス: Link先を確認

Jiaqian Ren and Hao Peng and Lei Jiang and Zhiwei Liu and Jia Wu and Zhengtao Yu and Philip S. Yu

(参考訳) 現実世界の社会イベントは通常、厳しい階級不均衡の分布を示し、訓練された検出モデルが深刻な一般化の課題に遭遇する。ほとんどの研究は周波数の観点からこの問題を解決し、テールクラスの表現や分類器学習を強調する。私たちの観察では、クラスのララリティと比較すると、トレーニングの行き届いた深層学習ネットワークから推定された不確かさは、モデルのパフォーマンスをよりよく反映する。この目的のために、不均衡なイベント検出タスクに対して、新しい不確実性誘導型クラス不均衡学習フレームワーク - UCL$_{SED}$とその変種 - UCL-EC$_{SED}$を提案する。モデル一般化をこれらの不確実なクラスに拡張することにより、全体的なモデル性能を向上させることを目指している。性能劣化は、典型的には、誤分類サンプルを隣り合うクラスとして扱うことから来ており、潜時空間における境界学習と高品質不確実性推定による分類器学習に焦点を当てている。まず,不均衡データに対する識別可能な表現分布を操作するために,新しい不確実性誘導型コントラスト学習損失,すなわちuclとその変種であるucl-ecを設計した。訓練中、全てのクラス、特に不確実なクラスは、特徴空間における明確な分離可能な境界を適応的に調整するよう強制する。第二に, より堅牢で正確なクラス不確実性を得るために, 追加校正法の監督のもと, デンプスター・シェーファー理論を通した多視点証拠分類器の結果を組み合わせる。 event2012\_100, events2018\_100, crisislext\_7の3つの深刻な不均衡なソーシャルイベントデータセットについて実験を行った。我々のモデルは、ほとんど全てのクラス、特に不確実なクラスにおいて、社会イベントの表現と分類タスクを大幅に改善する。

Real-world social events typically exhibit a severe class-imbalance distribution, which makes the trained detection model encounter a serious generalization challenge. Most studies solve this problem from the frequency perspective and emphasize the representation or classifier learning for tail classes. In our observation, however, the calibrated uncertainty estimated from well-trained evidential deep learning networks reflects model performance better than the rarity of classes does. To this end, we propose a novel uncertainty-guided class imbalance learning framework - UCL$_{SED}$, and its variant - UCL-EC$_{SED}$, for imbalanced social event detection tasks. We aim to improve the overall model performance by enhancing model generalization to those uncertain classes. Considering performance degradation usually comes from misclassifying samples as their confusing neighboring classes, we focus on boundary learning in latent space and classifier learning with high-quality uncertainty estimation. First, we design a novel uncertainty-guided contrastive learning loss, namely UCL and its variant - UCL-EC, to manipulate distinguishable representation distribution for imbalanced data. During training, they force all classes, especially uncertain ones, to adaptively adjust a clear separable boundary in the feature space. Second, to obtain more robust and accurate class uncertainty, we combine the results of multi-view evidential classifiers via the Dempster-Shafer theory under the supervision of an additional calibration method. We conduct experiments on three severely imbalanced social event datasets including Events2012\_100, Events2018\_100, and CrisisLexT\_7. Our model significantly improves social event representation and classification tasks in almost all classes, especially those uncertain ones.
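
The combination of multi-view evidential classifiers via Dempster-Shafer theory can be illustrated with the reduced form of Dempster's rule that is often used for evidential opinions, where each view assigns a belief mass to every class plus one uncertainty mass, all summing to 1. Whether the paper uses exactly this form is an assumption, and the calibration step it mentions is not modelled. The function name and the numbers are hypothetical.

```python
def combine_opinions(belief_a, uncertainty_a, belief_b, uncertainty_b):
    """Combine two opinions (per-class beliefs + uncertainty) with Dempster's rule.

    Each opinion must satisfy sum(belief) + uncertainty == 1.
    """
    n = len(belief_a)
    # Conflict: mass the two views assign to different classes.
    conflict = sum(
        belief_a[i] * belief_b[j] for i in range(n) for j in range(n) if i != j
    )
    if conflict >= 1.0:
        raise ValueError("opinions are in total conflict and cannot be combined")
    scale = 1.0 / (1.0 - conflict)
    belief = [
        scale * (
            belief_a[k] * belief_b[k]
            + belief_a[k] * uncertainty_b
            + belief_b[k] * uncertainty_a
        )
        for k in range(n)
    ]
    uncertainty = scale * uncertainty_a * uncertainty_b
    return belief, uncertainty


# Two hypothetical views over two event classes.
belief, uncertainty = combine_opinions([0.6, 0.2], 0.2, [0.5, 0.1], 0.4)
print(belief, uncertainty)
```

When both views favour the same class, as here, the combined uncertainty is lower than that of either view.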

翻訳日:2023-11-01 21:46:56 公開日:2023-10-30

# 単一チャネル用潜在変数モデルのためのスペクトル正規化フレームワーク

A spectral regularisation framework for latent variable models designed for single channel applications ( http://arxiv.org/abs/2310.19246v1 )

ライセンス: Link先を確認

Ryan Balshaw, P. Stephan Heyns, Daniel N. Wilke, Stephan Schmidt

(参考訳) 遅延変数モデル(LVM)は一般的に、観測データ内の基盤となる依存関係、パターン、隠れた構造をキャプチャするために使用される。ソース複製は、単一のチャネルLVMアプリケーションに共通するデータハンケライゼーション前処理ステップの副産物であり、実用的なLVM利用を妨げる。本稿では,スペクトル規則化-LVMというPythonパッケージを紹介する。提案パッケージは、新しいスペクトル正規化項の追加により、ソース複製問題に対処する。このパッケージは、単一チャネルのLVMアプリケーションでスペクトル正則化を行うためのフレームワークを提供するため、スペクトル正則化によるLVMの調査と利用が容易になる。これは、LVMパラメータ推定プロセス中にスペクトル正規化を使用するフレームワークに組み込まれた潜在的LVM目的関数の記号的あるいは明示的な表現を使用することによって達成される。このパッケージの目的は、スペクトル正規化と単一チャネルの時系列アプリケーションに適合する一貫した線形lvm最適化フレームワークを提供することである。

Latent variable models (LVMs) are commonly used to capture the underlying dependencies, patterns, and hidden structure in observed data. Source duplication is a by-product of the data hankelisation pre-processing step common to single channel LVM applications, which hinders practical LVM utilisation. In this article, a Python package titled spectrally-regularised-LVMs is presented. The proposed package addresses the source duplication issue via the addition of a novel spectral regularisation term. This package provides a framework for spectral regularisation in single channel LVM applications, thereby making it easier to investigate and utilise LVMs with spectral regularisation. This is achieved via the use of symbolic or explicit representations of potential LVM objective functions which are incorporated into a framework that uses spectral regularisation during the LVM parameter estimation process. The objective of this package is to provide a consistent linear LVM optimisation framework which incorporates spectral regularisation and caters to single channel time-series applications.
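
The data hankelisation step mentioned above can be sketched in a few lines: a single-channel signal is cut into overlapping windows that are stacked as the rows of a Hankel matrix. This is a generic illustration and not the API of the spectrally-regularised-LVMs package. The function name and the `window` parameter are assumptions.

```python
def hankelise(signal, window):
    """Stack overlapping windows of a 1-D signal as rows of a Hankel matrix.

    Row i holds signal[i : i + window], so each anti-diagonal is constant.
    """
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    return [list(signal[i:i + window]) for i in range(len(signal) - window + 1)]


rows = hankelise([1, 2, 3, 4, 5], window=3)
for row in rows:
    print(row)
```

Consecutive rows share all but one sample, so the resulting data matrix is highly redundant.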

翻訳日:2023-11-01 21:46:26 公開日:2023-10-30

# FetusMapV2: 3次元超音波による胎児姿勢推定の強化

FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound ( http://arxiv.org/abs/2310.19293v1 )

ライセンス: Link先を確認

Chaoyu Chen, Xin Yang, Yuhao Huang, Wenlong Shi, Yan Cao, Mingyuan Luo, Xindi Hu, Lei Zhue, Lequan Yu, Kejuan Yue, Yuanji Zhang, Yi Xiong, Dong Ni, Weijun Huang

(参考訳) 3次元超音波(us)における胎児のポーズ推定は、関連する胎児解剖学的ランドマークのセットを同定することを含む。その主な目的は、胎児に関する包括的情報をランドマーク接続を通して提供し、生体計測、平面の局在化、胎児の動き監視といった様々な重要な応用に役立てることである。しかし、3Dの胎児のポーズを正確に推定するには、画像品質の低下、高次元データを扱うための限られたGPUメモリ、対称的または曖昧な解剖学的構造、胎児のポーズのかなりのバリエーションなど、いくつかの課題がある。本研究では,上記の課題を克服するための新しい3次元胎児ポーズ推定フレームワーク(fetusmapv2)を提案する。私たちの貢献は3倍です。まず,gpuメモリが制限された場合,入力画像の解像度を向上し,より良好な結果が得られるような,相補的なネットワーク構造とアクティベーションのないgpuメモリ管理手法を検討するヒューリスティックスキームを提案する。第2に、対称構造と類似の解剖構造による混乱を軽減するために、新しいペアロスを設計する。隠れた分類タスクをランドマークのローカライゼーションタスクから切り離し、モデル学習を徐々に簡単にする。最後に, 比較的安定したランドマークを選択し, 自己教師付き学習方式を提案する。大規模胎児usデータセットにおける広範囲な実験と多種多様な応用により,1巻あたり22のランドマークを含む1000のボリュームが,他の強力な競合相手よりも優れていることが証明された。

Fetal pose estimation in 3D ultrasound (US) involves identifying a set of associated fetal anatomical landmarks. Its primary objective is to provide comprehensive information about the fetus through landmark connections, thus benefiting various critical applications, such as biometric measurements, plane localization, and fetal movement monitoring. However, accurately estimating the 3D fetal pose in US volume has several challenges, including poor image quality, limited GPU memory for tackling high dimensional data, symmetrical or ambiguous anatomical structures, and considerable variations in fetal poses. In this study, we propose a novel 3D fetal pose estimation framework (called FetusMapV2) to overcome the above challenges. Our contribution is three-fold. First, we propose a heuristic scheme that explores the complementary network structure-unconstrained and activation-unreserved GPU memory management approaches, which can enlarge the input image resolution for better results under limited GPU memory. Second, we design a novel Pair Loss to mitigate confusion caused by symmetrical and similar anatomical structures. It separates the hidden classification task from the landmark localization task and thus progressively eases model learning. Last, we propose a shape priors-based self-supervised learning by selecting the relatively stable landmarks to refine the pose online. Extensive experiments and diverse applications on a large-scale fetal US dataset including 1000 volumes with 22 landmarks per volume demonstrate that our method outperforms other strong competitors.

翻訳日:2023-11-01 21:38:05 公開日:2023-10-30

# 時間知覚質問応答のための変換器への時間グラフの融合

Fusing Temporal Graphs into Transformers for Time-Sensitive Question Answering ( http://arxiv.org/abs/2310.19292v1 )

ライセンス: Link先を確認

Xin Su, Phillip Howard, Nagib Hakim, Steven Bethard

(参考訳) 長い文書から時間に敏感な質問に答えるには、質問や文書の時間的推論が必要である。重要な疑問は、大きな言語モデルが提供されたテキスト文書のみを使用してそのような推論を実行できるのか、それとも他のシステムから抽出された追加の時間情報から恩恵を受けられるのかである。本研究では、既存の時間情報抽出システムを用いて、質問や文書における事象、時間、時間関係の時間グラフを構築する。次に、これらのグラフをTransformerモデルに融合するための様々なアプローチを検討する。実験結果から,入力テキストに時間グラフを融合する手法は,微調整の有無にかかわらずトランスフォーマーモデルの時間的推論能力を大幅に向上させることが示された。さらに,提案手法はグラフ畳み込みに基づくアプローチよりも優れており,SituatedQAとTimeQAの3つの分割による新しい最先端性能を確立している。

Answering time-sensitive questions from long documents requires temporal reasoning over the times in questions and documents. An important open question is whether large language models can perform such reasoning solely using a provided text document, or whether they can benefit from additional temporal information extracted using other systems. We address this research question by applying existing temporal information extraction systems to construct temporal graphs of events, times, and temporal relations in questions and documents. We then investigate different approaches for fusing these graphs into Transformer models. Experimental results show that our proposed approach for fusing temporal graphs into input text substantially enhances the temporal reasoning capabilities of Transformer models with or without fine-tuning. Additionally, our proposed method outperforms various graph convolution-based approaches and establishes a new state-of-the-art performance on SituatedQA and three splits of TimeQA.

翻訳日:2023-11-01 21:37:38 公開日:2023-10-30

# AMLNet:非回帰型マルチ水平時系列予測のための対向的相互学習ニューラルネットワーク

AMLNet: Adversarial Mutual Learning Neural Network for Non-AutoRegressive Multi-Horizon Time Series Forecasting ( http://arxiv.org/abs/2310.19289v1 )

ライセンス: Link先を確認

Yang Lin

(参考訳) 多様な領域で重要なマルチホライゾン時系列予測は、高い精度とスピードを要求する。 AutoRegressive(AR)モデルは短期的な予測では優れているが、地平線が広がるにつれて速度とエラーの問題に悩まされる。非自動回帰(NAR)モデルは長期的な予測に適合するが、相互依存に苦慮し、非現実的な結果をもたらす。我々は、オンライン知識蒸留(KD)アプローチにより現実的な予測を実現する革新的なNARモデルであるAMLNetを紹介する。 AMLNetは、深いARデコーダと深いNARデコーダを協調的に訓練し、より浅いNARデコーダに知識を与えるアンサンブル教師として機能することで、ARモデルとNARモデルの長所を活用する。この知識伝達は2つの重要なメカニズムによって促進される。 1) 結果駆動型KDは教師モデルからのKD損失の寄与を動的に重み付けし、浅いNARデコーダがアンサンブルの多様性を組み込むことを可能にする。 2) ヒント駆動型KDは, モデルに隠された状態から有意な洞察を抽出し, 蒸留する。大規模な実験では、従来のARやNARモデルよりもAMLNetの方が優れていることが示され、精度を高め、計算を高速化するマルチホライゾン時系列予測のための有望な道を示す。

Multi-horizon time series forecasting, crucial across diverse domains, demands high accuracy and speed. While AutoRegressive (AR) models excel in short-term predictions, they suffer speed and error issues as the horizon extends. Non-AutoRegressive (NAR) models suit long-term predictions but struggle with interdependence, yielding unrealistic results. We introduce AMLNet, an innovative NAR model that achieves realistic forecasts through an online Knowledge Distillation (KD) approach. AMLNet harnesses the strengths of both AR and NAR models by training a deep AR decoder and a deep NAR decoder in a collaborative manner, serving as ensemble teachers that impart knowledge to a shallower NAR decoder. This knowledge transfer is facilitated through two key mechanisms: 1) outcome-driven KD, which dynamically weights the contribution of KD losses from the teacher models, enabling the shallow NAR decoder to incorporate the ensemble's diversity; and 2) hint-driven KD, which employs adversarial training to extract valuable insights from the model's hidden states for distillation. Extensive experimentation showcases AMLNet's superiority over conventional AR and NAR models, thereby presenting a promising avenue for multi-horizon time series forecasting that enhances accuracy and expedites computation.

翻訳日:2023-11-01 21:37:23 公開日:2023-10-30

# EDiffSR: リモートセンシング画像超解像のための効率的な拡散確率モデル

EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution ( http://arxiv.org/abs/2310.19288v1 )

ライセンス: Link先を確認

Yi Xiao, Qiangqiang Yuan, Kui Jiang, Jiang He, Xianyu Jin, and Liangpei Zhang

(参考訳) 近年,畳み込みネットワークは,mse損失などの回帰目標を最小化することで,リモートセンシング画像スーパーリゾルション(sr)において顕著な発展を遂げている。しかし、印象的な性能は達成したものの、これらの手法は過度にスムースな問題を伴う視覚品質の低下に苦しむことが多い。生成的敵ネットワークは複雑な詳細を推測する可能性があるが、それらは容易に崩壊し、望ましくない成果物をもたらす。そこで本稿では,ediffsrと呼ばれる効率的なリモートセンシング画像srのための拡散確率モデル(dpm)を提案する。 EDiffSRは訓練が容易で、知覚障害画像の生成におけるDPMの利点を維持している。具体的には,ノイズ予測にヘビーunetを使用する従来の手法と異なり,チャネル注意と簡易ゲート操作を簡略化し,優れたノイズ予測性能を実現するための効率的な活性化ネットワーク (eanet) を開発し,計算予算を劇的に削減する。さらに,提案するediffsrにより価値の高い事前知識を導入するために,より充実した条件を抽出するためのpractical conditional prior enhancement module (cpem)を開発した。 LR画像の増幅により条件を直接生成するほとんどのDPMベースのSRモデルとは異なり、提案したCPEMは正確なSRのためにより情報的な手がかりを維持するのに役立つ。 4つのリモートセンシングデータセットの大規模な実験により、EDiffSRは、シミュレーションされた実世界のリモートセンシング画像の視覚的な不快なイメージを定量的かつ質的に復元できることを示した。 EDiffSRのコードはhttps://github.com/XY-boy/EDiffSRで入手できる。

Recently, convolutional networks have achieved remarkable development in remote sensing image Super-Resolution (SR) by minimizing the regression objectives, e.g., MSE loss. However, despite achieving impressive performance, these methods often suffer from poor visual quality with over-smooth issues. Generative adversarial networks have the potential to infer intricate details, but they are prone to collapse, resulting in undesirable artifacts. To mitigate these issues, in this paper, we first introduce a Diffusion Probabilistic Model (DPM) for efficient remote sensing image SR, dubbed EDiffSR. EDiffSR is easy to train and maintains the merits of DPM in generating perceptually pleasant images. Specifically, different from previous works using heavy UNet for noise prediction, we develop an Efficient Activation Network (EANet) to achieve favorable noise prediction performance by simplified channel attention and simple gate operation, which dramatically reduces the computational budget. Moreover, to introduce more valuable prior knowledge into the proposed EDiffSR, a practical Conditional Prior Enhancement Module (CPEM) is developed to help extract an enriched condition. Unlike most DPM-based SR models that directly generate conditions by amplifying LR images, the proposed CPEM helps to retain more informative cues for accurate SR. Extensive experiments on four remote sensing datasets demonstrate that EDiffSR can restore visually pleasant images on simulated and real-world remote sensing images, both quantitatively and qualitatively. The code of EDiffSR will be available at https://github.com/XY-boy/EDiffSR

翻訳日:2023-11-01 21:36:58 公開日:2023-10-30

# ブロックチェーンによる半分散フェデレーション学習のスケーラビリティと信頼性向上--信頼ペナリゼーションと非同期機能

Enhancing Scalability and Reliability in Semi-Decentralized Federated Learning With Blockchain: Trust Penalization and Asynchronous Functionality ( http://arxiv.org/abs/2310.19287v1 )

ライセンス: Link先を確認

Ajay Kumar Shrestha, Faijan Ahamad Khan, Mohammed Afaan Shaikh, Amir Jaberzadeh and Jason Geng

(参考訳) ブロックチェーン技術の統合を活用することにより,分散フェデレート学習におけるスケーラビリティと信頼性の課題に対処する,革新的なアプローチを提案する。本稿では,信頼ペナリゼーション機構による参加ノードの信頼性向上と,効率的かつロバストなモデル更新のための非同期機能の実現に着目する。半分散型フェデレートラーニングとブロックチェーン(SDFL-B)を組み合わせることで、データのプライバシーを損なうことなく、公正でセキュアで透明な機械学習環境の構築を目指している。本研究は,スケーラブルで信頼性の高いsdfl-bシステムを育成する上で,このアプローチの利点を示す総合的なシステムアーキテクチャ,方法論,実験結果,議論を提案する。

The paper presents an innovative approach to address the challenges of scalability and reliability in Distributed Federated Learning by leveraging the integration of blockchain technology. The paper focuses on enhancing the trustworthiness of participating nodes through a trust penalization mechanism while also enabling asynchronous functionality for efficient and robust model updates. By combining Semi-Decentralized Federated Learning with Blockchain (SDFL-B), the proposed system aims to create a fair, secure and transparent environment for collaborative machine learning without compromising data privacy. The research presents a comprehensive system architecture, methodologies, experimental results, and discussions that demonstrate the advantages of this novel approach in fostering scalable and reliable SDFL-B systems.

翻訳日:2023-11-01 21:36:31 公開日:2023-10-30

# 単純コンプレックス上のランダムウォークによるグラフニューラルネットワークのファシリテート

Facilitating Graph Neural Networks with Random Walk on Simplicial Complexes ( http://arxiv.org/abs/2310.19285v1 )

ライセンス: Link先を確認

Cai Zhou and Xiyuan Wang and Muhan Zhang

(参考訳) ノードレベルのランダムウォークは、グラフニューラルネットワークの改善に広く使用されている。しかし、エッジ上のランダムウォークや、より一般的には$k$-単体上のランダムウォークへの注目は限られている。本稿では,単体複体(Simplicial Complex, SC)の異なる次数におけるランダムウォークが,GNNの理論的表現力をいかに高めるかを系統的に分析する。第一に、$0$-単体すなわちノードレベルにおいて、ランダムウォークを橋渡しとして既存の位置符号化(PE)と構造符号化(SE)の手法の間の関係を確立する。第二に、$1$-単体すなわちエッジレベルにおいて、エッジレベルのランダムウォークとHodge $1$-Laplacianを結び付け、それぞれに対応するエッジPEを設計する。空間領域では、エッジレベルのランダムウォークを直接利用してEdgeRWSEを構築する。 Hodge $1$-Laplacianのスペクトル解析に基づいて、置換同変で表現力の高いエッジレベルの位置符号化である Hodge1Lap を提案する。第三に,本理論を高次単体上のランダムウォークに一般化し,ランダムウォークとHodge Laplacianに基づいて単体上のPEを設計する一般原理を提案する。幅広い単体ネットワークを統一するために、レベル間ランダムウォークも導入する。広範な実験により、ランダムウォークに基づく提案手法の有効性を検証した。

Node-level random walk has been widely used to improve Graph Neural Networks. However, there is limited attention to random walk on edges and, more generally, on $k$-simplices. This paper systematically analyzes how random walk on different orders of simplicial complexes (SC) facilitates GNNs in their theoretical expressivity. First, on $0$-simplices or node level, we establish a connection between existing positional encoding (PE) and structure encoding (SE) methods through the bridge of random walk. Second, on $1$-simplices or edge level, we bridge edge-level random walk and Hodge $1$-Laplacians and design corresponding edge PE respectively. In the spatial domain, we directly make use of edge-level random walk to construct EdgeRWSE. Based on the spectral analysis of Hodge $1$-Laplacians, we propose Hodge1Lap, a permutation equivariant and expressive edge-level positional encoding. Third, we generalize our theory to random walk on higher-order simplices and propose the general principle to design PE on simplices based on random walk and Hodge Laplacians. Inter-level random walk is also introduced to unify a wide range of simplicial networks. Extensive experiments verify the effectiveness of our random walk-based methods.

翻訳日:2023-11-01 21:36:14 公開日:2023-10-30
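As an illustration of the edge-level objects named in the abstract above, the following sketch builds the standard Hodge 1-Laplacian, $L_1 = B_1^\top B_1 + B_2 B_2^\top$, of a single triangle from its signed boundary matrices. This is the textbook definition, not the paper's Hodge1Lap encoding itself; the helper names (`boundary_matrix`, `hodge_1_laplacian`) and the choice to orient simplices by vertex order are assumptions made for this example.

```python
# Hodge 1-Laplacian of a small simplicial complex, standard library only.
# Assumption: every simplex is a sorted tuple of vertex ids, oriented by vertex order.

def boundary_matrix(faces, simplices):
    """Signed incidence matrix: one row per face, one column per simplex."""
    row = {f: i for i, f in enumerate(faces)}
    mat = [[0] * len(simplices) for _ in faces]
    for j, s in enumerate(simplices):
        for k in range(len(s)):
            face = s[:k] + s[k + 1:]       # drop the k-th vertex
            mat[row[face]][j] = (-1) ** k  # alternating sign of the boundary operator
    return mat

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]

def hodge_1_laplacian(nodes, edges, triangles):
    """L1 = B1^T B1 (coupling through shared nodes) + B2 B2^T (through shared triangles)."""
    n = len(edges)
    b1 = boundary_matrix(nodes, edges)
    lower = matmul(transpose(b1), b1)
    if triangles:
        b2 = boundary_matrix(edges, triangles)
        upper = matmul(b2, transpose(b2))
    else:
        upper = [[0] * n for _ in range(n)]  # no 2-simplices: upper term vanishes
    return [[lower[i][j] + upper[i][j] for j in range(n)] for i in range(n)]

nodes = [(0,), (1,), (2,)]
edges = [(0, 1), (0, 2), (1, 2)]
hollow = hodge_1_laplacian(nodes, edges, [])
filled = hodge_1_laplacian(nodes, edges, [(0, 1, 2)])
```

For the hollow triangle, `hollow` is `[[2, 1, -1], [1, 2, 1], [-1, 1, 2]]`; adding the 2-simplex turns it into three times the identity. An edge-level positional encoding in the spirit of the abstract would be derived from the spectrum of such a matrix.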

# rTsfNet: マルチヘッド3次元回転と時系列特徴抽出による人間活動認識のためのDNNモデル

rTsfNet: a DNN model with Multi-head 3D Rotation and Time Series Feature Extraction for IMU-based Human Activity Recognition ( http://arxiv.org/abs/2310.19283v1 )

ライセンス: Link先を確認

Yu Enokibori

(参考訳) 本稿では,Multi-head 3D Rotation and Time Series Feature extractを用いたDNNモデルであるrTsfNetを,IMUに基づく人間活動認識(HAR)のための新しいDNNモデルとして提案する。 rTsfNetはDNN内で3D回転パラメータを導出することで特徴を導出する3Dベースを自動的に選択する。そして、多くの研究者の知恵である時系列特徴(TSF)を導出し、MLPを用いてHARを実現する。 CNNを使用しないモデルは、よく管理されたベンチマーク条件と複数のデータセット(UCI HAR、PAMAP2、Daphnet、OPPORTUNITY)の下で既存のモデルよりも高い精度を達成した。

This paper proposes rTsfNet, a DNN model with Multi-head 3D Rotation and Time Series Feature Extraction, as a new DNN model for IMU-based human activity recognition (HAR). rTsfNet automatically selects the 3D bases from which features should be derived by deriving 3D rotation parameters within the DNN. Then, time series features (TSFs), which embody the accumulated wisdom of many researchers, are derived, and HAR is performed using an MLP. Although the model does not use a CNN, it achieved higher accuracy than existing models under well-managed benchmark conditions on multiple datasets that target different activities: UCI HAR, PAMAP2, Daphnet, and OPPORTUNITY.

翻訳日:2023-11-01 21:35:52 公開日:2023-10-30

# トーリックカラビ・ヤウ3次元多様体の最小体積公式のための機械学習正規化

Machine Learning Regularization for the Minimum Volume Formula of Toric Calabi-Yau 3-folds ( http://arxiv.org/abs/2310.19276v1 )

ライセンス: Link先を確認

Eugene Choi, Rak-Kyeong Seong

(参考訳) 佐々木・アインシュタイン5次元多様体の最小体積に対する明示的な公式の集合を示す。これらの5次元多様体上の円錐はトーリック・カラビ・ヤウ3次元多様体である。これらのトーリックカラビ・ヤウ3次元多様体は、4d n=1 超対称ゲージ理論の無限類と関連付けられ、トーリックカラビ・ヤウ3次元多様体を推定するd3ブレーンの世界体積理論として実現される。 AdS/CFT対応の下では、佐々木・アインシュタイン基底の最小体積は対応する4d N=1超等角体理論の中心電荷に逆比例する。最小体積の公式は、トーリック・カラビ・ヤウ3次元多様体の幾何学的不変量の観点から表される。これらの明確な結果は、最小体積を決定する機械学習の以前の応用を超えて進歩する機械学習正規化技術を実装することで導かれる。さらに、機械学習正規化を用いることで、最小体積に対して解釈可能かつ説明可能な式を提示できる。我々の研究は、広範なトーリック・カラビ・ヤウ3次元多様体の集合であっても、最小体積を顕著な精度で近似することを確認する。

We present a collection of explicit formulas for the minimum volume of Sasaki-Einstein 5-manifolds. The cone over these 5-manifolds is a toric Calabi-Yau 3-fold. These toric Calabi-Yau 3-folds are associated with an infinite class of 4d N=1 supersymmetric gauge theories, which are realized as worldvolume theories of D3-branes probing the toric Calabi-Yau 3-folds. Under the AdS/CFT correspondence, the minimum volume of the Sasaki-Einstein base is inversely proportional to the central charge of the corresponding 4d N=1 superconformal field theories. The presented formulas for the minimum volume are in terms of geometric invariants of the toric Calabi-Yau 3-folds. These explicit results are derived by implementing machine learning regularization techniques that advance beyond previous applications of machine learning for determining the minimum volume. Moreover, the use of machine learning regularization allows us to present interpretable and explainable formulas for the minimum volume. Our work confirms that, even for extensive sets of toric Calabi-Yau 3-folds, the proposed formulas approximate the minimum volume with remarkable accuracy.

翻訳日:2023-11-01 21:35:36 公開日:2023-10-30

# グラフニューラルネットワークによる岩石の有効弾性率の予測

Prediction of Effective Elastic Moduli of Rocks using Graph Neural Networks ( http://arxiv.org/abs/2310.19274v1 )

ライセンス: Link先を確認

Jaehong Chung, Rasool Ahmad, WaiChing Sun, Wei Cai, Tapan Mukerji

(参考訳) 本研究では,デジタルCTスキャン画像から岩石の効率的な弾性変調を予測するためのグラフニューラルネットワーク(GNN)に基づくアプローチを提案する。マッパーアルゴリズムを用いて3dデジタル岩盤画像をグラフデータセットに変換し,本質的な幾何学的情報をカプセル化する。これらのグラフは、訓練後、弾性率を予測するのに有効である。 gnnモデルでは,様々なサブキューブ次元から導出される様々なグラフサイズにわたるロバストな予測能力を示す。テストデータセットでうまく機能するだけでなく、見えない岩や探索されていないサブキューブサイズの予測精度も高い。畳み込みニューラルネットワーク (CNN) との比較解析により, 未知の岩石特性の予測において, GNNの優れた性能が示された。さらに、微細構造のグラフ表現は、gpuメモリ要求(cnnのグリッド表現と比較)を大幅に削減し、バッチサイズ選択の柔軟性を高める。本研究は, 岩盤特性の予測精度を高め, ディジタル岩盤解析の効率化におけるGNNモデルの可能性を示す。

This study presents a Graph Neural Networks (GNNs)-based approach for predicting the effective elastic moduli of rocks from their digital CT-scan images. We use the Mapper algorithm to transform 3D digital rock images into graph datasets, encapsulating essential geometrical information. These graphs, after training, prove effective in predicting elastic moduli. Our GNN model shows robust predictive capabilities across various graph sizes derived from various subcube dimensions. Not only does it perform well on the test dataset, but it also maintains high prediction accuracy for unseen rocks and unexplored subcube sizes. Comparative analysis with Convolutional Neural Networks (CNNs) reveals the superior performance of GNNs in predicting unseen rock properties. Moreover, the graph representation of microstructures significantly reduces GPU memory requirements (compared to the grid representation for CNNs), enabling greater flexibility in the batch size selection. This work demonstrates the potential of GNN models in enhancing the prediction accuracy of rock properties and boosting the efficiency of digital rock analysis.

翻訳日:2023-11-01 21:35:11 公開日:2023-10-30

# メモリ摂動方程式:データに対するモデルの感度を理解する

The Memory Perturbation Equation: Understanding Model's Sensitivity to Data ( http://arxiv.org/abs/2310.19273v1 )

ライセンス: Link先を確認

Peter Nickl, Lu Xu, Dharmesh Tailor, Thomas Möllenhoff, Mohammad Emtiyaz Khan

(参考訳) モデルのトレーニングデータに対する感度を理解することは重要であるが、特にトレーニング中は困難でコストもかかる。このような問題を単純化するために,モデルの摂動に対する感度をトレーニングデータに関連付けるメモリ・摂動方程式(MPE)を提案する。ベイズ原理を用いて導かれた MPE は、既存の感度測定を統一し、モデルやアルゴリズムの多種多様に一般化し、感度に関する有用な特性を明らかにする。実験の結果, 訓練中に得られた感度推定は, テストデータの一般化を忠実に予測できることがわかった。提案方程式は,ロバスト・適応学習の今後の研究に有用であると考えられる。

Understanding a model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE), which relates a model's sensitivity to perturbations in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide variety of models and algorithms, and unravels useful properties regarding sensitivities. Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. The proposed equation is expected to be useful for future research on robust and adaptive learning.

翻訳日:2023-11-01 21:34:43 公開日:2023-10-30

# NPCL:不確かさを意識した連続学習のためのニューラルプロセス

NPCL: Neural Processes for Uncertainty-Aware Continual Learning ( http://arxiv.org/abs/2310.19272v1 )

ライセンス: Link先を確認

Saurav Jha and Dong Gong and He Zhao and Lina Yao

(参考訳) 連続学習(CL)は、新しいタスクによる忘れを制限しながら、ストリーミングデータ上でディープニューラルネットワークを効率的にトレーニングすることを目的としている。しかし、タスク間の干渉が少なくて伝達可能な知識を学習することは困難であり、予測の不確実性を測定することができないため、実世界のCLモデルの展開は制限される。これらの問題に対処するため,我々は,様々なタスクを関数上の確率分布にエンコードし,信頼性の高い不確実性推定を提供するメタリーナーのクラスであるneural process (nps) を用いたclタスクの処理を提案する。具体的には,タスク固有のモジュールを階層的潜在変数モデルに配置したNP-based CL approach (NPCL)を提案する。学習された潜在分布の正規化子を調整し、忘れを緩和する。 NPCLの不確実性推定機能は、CLのタスクヘッド/モジュール推論問題に対処するためにも使用できる。実験の結果,NPCLは従来のCLアプローチよりも優れていた。 NPCLにおける不確実性推定の有効性を検証し、新しいデータを特定し、インスタンスレベルのモデルの信頼性を評価する。コードは https://github.com/srvCodes/NPCL で入手できる。

Continual learning (CL) aims to train deep neural networks efficiently on streaming data while limiting the forgetting caused by new tasks. However, learning transferable knowledge with less interference between tasks is difficult, and real-world deployment of CL models is limited by their inability to measure predictive uncertainties. To address these issues, we propose handling CL tasks with neural processes (NPs), a class of meta-learners that encode different tasks into probabilistic distributions over functions all while providing reliable uncertainty estimates. Specifically, we propose an NP-based CL approach (NPCL) with task-specific modules arranged in a hierarchical latent variable model. We tailor regularizers on the learned latent distributions to alleviate forgetting. The uncertainty estimation capabilities of the NPCL can also be used to handle the task head/module inference challenge in CL. Our experiments show that the NPCL outperforms previous CL approaches. We validate the effectiveness of uncertainty estimation in the NPCL for identifying novel data and evaluating instance-level model confidence. Code is available at https://github.com/srvCodes/NPCL.

翻訳日:2023-11-01 21:34:21 公開日:2023-10-30

# 勤勉トロールの愛への学習--対話安全タスクにおけるレーダ効果の会計

Learning to love diligent trolls: Accounting for rater effects in the dialogue safety task ( http://arxiv.org/abs/2310.19271v1 )

ライセンス: Link先を確認

Michael John Ilagan

(参考訳) チャットボットは攻撃的な発話を生成するリスクがあり、これは避けなければならない。デプロイ後にチャットボットを継続的に改善する方法の1つは、ライブユーザからのフィードバックから発話/ラベルのペアを収集することだ。しかし、ユーザの中には、誤ったラベルを付けたトレーニング例を提供するトロールがいる。トレーニングデータからトロールの影響を取り除くために、先行研究ではユーザ単位で集約したクロスバリデーション(CV)誤差の高いトレーニング例を削除していた。しかし、CVは高価であり、協調攻撃においては、トロールの数とトロール同士の一貫性によってCVが圧倒される可能性がある。本研究は,自動エッセイ評価(AES)における方法論に着想を得たソリューション、すなわち複数のユーザに各発話を評価させ、潜在クラス分析(LCA)によって正しいラベルを推定する方法を提案することにより,両方の制約に対処する。 GPU計算を必要としないため、LCAは安価である。実験では, トロールが多数派である場合でも, トロールが一貫していれば, AESライクなソリューションは高い精度でトレーニングラベルを推定できることがわかった。

Chatbots have the risk of generating offensive utterances, which must be avoided. Post-deployment, one way for a chatbot to continuously improve is to source utterance/label pairs from feedback by live users. However, among users are trolls, who provide training examples with incorrect labels. To de-troll training data, previous work removed training examples that have high user-aggregated cross-validation (CV) error. However, CV is expensive; and in a coordinated attack, CV may be overwhelmed by trolls in number and in consistency among themselves. In the present work, I address both limitations by proposing a solution inspired by methodology in automated essay scoring (AES): have multiple users rate each utterance, then perform latent class analysis (LCA) to infer correct labels. As it does not require GPU computations, LCA is inexpensive. In experiments, I found that the AES-like solution can infer training labels with high accuracy when trolls are consistent, even when trolls are the majority.

翻訳日:2023-11-01 21:33:51 公開日:2023-10-30
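The idea in the abstract above (several users rate each utterance, then a latent class analysis infers the label) can be sketched with a two-class latent class model fitted by EM. This is a minimal illustration under stated assumptions, not the author's implementation: ratings are binary, every rater rates every utterance, and one rater index (`anchor`) is assumed trustworthy so that the two latent classes can be oriented, since a latent class model on its own identifies classes only up to relabeling. The function name `lca_two_class` is hypothetical.

```python
def lca_two_class(ratings, anchor, iters=50, eps=1e-6):
    """Two-class latent class analysis by EM.

    ratings: list of rows, one per utterance, each a list of 0/1 ratings (one per rater).
    anchor:  index of a rater assumed to be honest; used only to initialise the classes.
    Returns (hard_labels, posteriors) where posteriors[i] = P(class 1 | ratings[i]).
    """
    n, m = len(ratings), len(ratings[0])

    def clip(x):
        return min(max(x, eps), 1 - eps)

    # Soft initial labels taken from the anchor rater.
    q = [0.9 if row[anchor] == 1 else 0.1 for row in ratings]
    for _ in range(iters):
        # M-step: class prior and per-rater response probabilities for each class.
        w1 = max(sum(q), eps)
        w0 = max(n - sum(q), eps)
        prior = clip(sum(q) / n)
        p1 = [clip(sum(qi * row[j] for qi, row in zip(q, ratings)) / w1) for j in range(m)]
        p0 = [clip(sum((1 - qi) * row[j] for qi, row in zip(q, ratings)) / w0) for j in range(m)]
        # E-step: posterior probability of class 1 for each utterance.
        new_q = []
        for row in ratings:
            l1, l0 = prior, 1 - prior
            for j, r in enumerate(row):
                l1 *= p1[j] if r else 1 - p1[j]
                l0 *= p0[j] if r else 1 - p0[j]
            new_q.append(l1 / (l1 + l0))
        q = new_q
    return [1 if qi >= 0.5 else 0 for qi in q], q

# Two honest raters (columns 0-1) and three trolls who always flip the label (columns 2-4).
ratings = [[1, 1, 0, 0, 0]] * 3 + [[0, 0, 1, 1, 1]] * 3
labels, posteriors = lca_two_class(ratings, anchor=0)
```

On this toy data a majority vote returns the flipped labels because the trolls outnumber the honest raters, while the latent class model recovers `[1, 1, 1, 0, 0, 0]`: consistent trolls are simply learned as raters whose answers are reliably inverted. With many raters the likelihood products should be computed in log space to avoid underflow.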

# リーマン対称空間上の不変核:調和解析的アプローチ

Invariant kernels on Riemannian symmetric spaces: a harmonic-analytic approach ( http://arxiv.org/abs/2310.19270v1 )

ライセンス: Link先を確認

Nathael Da Costa, Cyrus Mostajeran, Juan-Pablo Ortega, Salem Said

(参考訳) この研究は、古典的なガウス核が非ユークリッド対称空間上で定義されるとき、パラメータのいかなる選択に対しても正定値にならないことを証明することを目的としている。この目的を達成するために,新しい幾何学的および解析的議論を展開する。これらはガウス核の正定値性の厳密な特徴づけを与える。この特徴づけは、数値計算によって扱われる低次元の限られた場合を除いて完全である。これらの結果の中心となるのが $L^p$-Godement 定理(ここで $p = 1,2$)であり、非コンパクト型の対称空間上で定義されるカーネルが正定値となるための検証可能な必要十分条件を与える。 Bochner-Godement の定理と呼ばれることもある有名な定理は、既にそのような条件を与えており、適用範囲もはるかに一般的であるが、適用が特に難しい。ガウス核との関連にとどまらず、この研究の新しい結果は対称空間上の不変核の研究のための青写真を示し、将来の多くの応用を示唆する具体的な調和解析のツールをもたらす。

This work aims to prove that the classical Gaussian kernel, when defined on a non-Euclidean symmetric space, is never positive-definite for any choice of parameter. To achieve this goal, the paper develops new geometric and analytical arguments. These provide a rigorous characterization of the positive-definiteness of the Gaussian kernel, which is complete but for a limited number of scenarios in low dimensions that are treated by numerical computations. Chief among these results are the $L^p$-Godement theorems (where $p = 1,2$), which provide verifiable necessary and sufficient conditions for a kernel defined on a symmetric space of non-compact type to be positive-definite. A celebrated theorem, sometimes called the Bochner-Godement theorem, already gives such conditions and is far more general in its scope, but is especially hard to apply. Beyond the connection with the Gaussian kernel, the new results in this work lay out a blueprint for the study of invariant kernels on symmetric spaces, bringing forth specific harmonic analysis tools that suggest many future applications.

翻訳日:2023-11-01 21:33:10 公開日:2023-10-30

# Redditのナラティブにおける道徳判断:社会常識と言語信号による道徳的火花の調査

Moral Judgments in Narratives on Reddit: Investigating Moral Sparks via Social Commonsense and Linguistic Signals ( http://arxiv.org/abs/2310.19268v1 )

ライセンス: Link先を確認

Ruijie Xi, Munindar P. Singh

(参考訳) オンラインのソーシャルインタラクションの現実性が高まる中、ソーシャルメディアは実生活のモラルシナリオを評価する前例のない手段を提供する。著者やコメンテーターが、誰が非難に値するかを道徳的な判断で共有するRedditの記事を調べます。我々は,(1)社会的常識を活性化する出来事,(2)言語的シグナルなど,道徳的判断に影響する要因を調査するために,計算手法を用いる。この目的のために、我々は、道徳的判断を動機付けるものを示すために、コメンテーターが含むオリジナルの投稿からモラル的火花と呼ぶ抜粋に焦点を当てる。 24,672件以上の投稿と175,988件のコメントを調べると、出来事に関連した否定的な個人的特徴(例えば未熟さや無礼さ)が注目され、非難を喚起し、モラルの火花と責任性の関係を示唆する。さらに、コメンテータの認知過程に影響を及ぼす言語は、出来事や文字を描写することで、抜粋の可能性が道徳的な火花となり、事実や具体的記述はこの効果を阻害しがちである。

Given the increasing realism of social interactions online, social media offers an unprecedented avenue to evaluate real-life moral scenarios. We examine posts from Reddit, where authors and commenters share their moral judgments on who is blameworthy. We employ computational techniques to investigate factors influencing moral judgments, including (1) events activating social commonsense and (2) linguistic signals. To this end, we focus on excerpts, which we term moral sparks, from original posts that commenters include to indicate what motivates their moral judgments. By examining over 24,672 posts and 175,988 comments, we find that event-related negative personal traits (e.g., immature and rude) attract attention and stimulate blame, implying a dependent relationship between moral sparks and blameworthiness. Moreover, language that impacts commenters' cognitive processes to depict events and characters enhances the probability of an excerpt becoming a moral spark, while factual and concrete descriptions tend to inhibit this effect.

翻訳日:2023-11-01 21:32:46 公開日:2023-10-30

# TempME: Motif Discoveryによる時間グラフニューラルネットワークの説明可能性を目指して

TempME: Towards the Explainability of Temporal Graph Neural Networks via Motif Discovery ( http://arxiv.org/abs/2310.19324v1 )

ライセンス: Link先を確認

Jialin Chen, Rex Ying

(参考訳) 時空グラフは時変相互作用を伴う動的システムのモデル化に広く使われている。現実のシナリオでは、動的システムにおける未来の相互作用を生成するメカニズムは、典型的には時間的モチーフとして知られるグラフ内の一連の反復的なサブ構造によって制御される。現在の時間グラフニューラルネットワーク(TGNN)の成功と普及にもかかわらず、時間的モチーフがモデルから特定の予測を誘導する重要な指標として認識されているかは定かではない。この課題に対処するために、TGNNの予測を導く最も重要な時間的モチーフを明らかにする、TempME(Temporal Motifs Explainer)と呼ばれる新しいアプローチを提案する。情報ボトルネックの原理から、TempMEは最もインタラクションに関連するモチーフを抽出し、含んでいる情報の量を最小化し、説明の空間性と簡潔性を維持する。 TempMEによる説明のイベントは、既存のアプローチよりも時空間的相関が強く、より理解可能な洞察を提供する。広範な実験によりテンポムの優位性が検証され、6つの実世界のデータセットで説明精度が最大8.21%向上し、現在のtgnnの予測平均精度が最大22.96%向上した。

Temporal graphs are widely used to model dynamic systems with time-varying interactions. In real-world scenarios, the underlying mechanisms of generating future interactions in dynamic systems are typically governed by a set of recurring substructures within the graph, known as temporal motifs. Despite the success and prevalence of current temporal graph neural networks (TGNN), it remains uncertain which temporal motifs are recognized as the significant indications that trigger a certain prediction from the model, which is a critical challenge for advancing the explainability and trustworthiness of current TGNNs. To address this challenge, we propose a novel approach, called Temporal Motifs Explainer (TempME), which uncovers the most pivotal temporal motifs guiding the prediction of TGNNs. Derived from the information bottleneck principle, TempME extracts the most interaction-related motifs while minimizing the amount of contained information to preserve the sparsity and succinctness of the explanation. Events in the explanations generated by TempME are verified to be more spatiotemporally correlated than those of existing approaches, providing more understandable insights. Extensive experiments validate the superiority of TempME, with up to 8.21% increase in terms of explanation accuracy across six real-world datasets and up to 22.96% increase in boosting the prediction Average Precision of current TGNNs.

翻訳日:2023-11-01 21:25:50 公開日:2023-10-30

# pronet:マルチホリゾン時系列予測のためのプログレッシブニューラルネットワーク

ProNet: Progressive Neural Network for Multi-Horizon Time Series Forecasting ( http://arxiv.org/abs/2310.19322v1 )

ライセンス: Link先を確認

Yang Lin

(参考訳) 本稿では,マルチ水平時系列予測のための新しいディープラーニング手法であるProNetを紹介し,自己回帰(AR)と非自己回帰(NAR)戦略を適応的にブレンドする。本手法では,予測水平線をセグメントに分割し,非自己回帰的に各セグメントの最も重要なステップを予測し,残りのステップを自己回帰的に行う。分節過程は潜時変数に依存しており、変動推論によって個々の時間ステップの重要性を効果的に捉えている。 ARモデルと比較して、ProNetは顕著なアドバンテージを示し、ARイテレーションを少なくし、予測速度を高速化し、エラーの蓄積を軽減している。一方、NARモデルと比較すると、ProNetは出力空間における予測の相互依存性を考慮に入れ、予測精度が向上する。 4つの大規模データセットを包含する包括的評価およびアブレーション研究により,pronetの有効性が示され,精度と予測速度,最先端arおよびnar予測モデルよりも優れた性能を示す。

In this paper, we introduce ProNet, a novel deep learning approach designed for multi-horizon time series forecasting, adaptively blending autoregressive (AR) and non-autoregressive (NAR) strategies. Our method involves dividing the forecasting horizon into segments, predicting the most crucial steps in each segment non-autoregressively, and the remaining steps autoregressively. The segmentation process relies on latent variables, which effectively capture the significance of individual time steps through variational inference. In comparison to AR models, ProNet showcases remarkable advantages, requiring fewer AR iterations, resulting in faster prediction speed, and mitigating error accumulation. On the other hand, when compared to NAR models, ProNet takes into account the interdependency of predictions in the output space, leading to improved forecasting accuracy. Our comprehensive evaluation, encompassing four large datasets, and an ablation study, demonstrate the effectiveness of ProNet, highlighting its superior performance in terms of accuracy and prediction speed, outperforming state-of-the-art AR and NAR forecasting models.

翻訳日:2023-11-01 21:25:24 公開日:2023-10-30

# D4Explainer:離散化拡散による分散GNN説明

D4Explainer: In-Distribution GNN Explanations via Discrete Denoising Diffusion ( http://arxiv.org/abs/2310.19321v1 )

ライセンス: Link先を確認

Jialin Chen, Shirley Wu, Abhijit Gupta, Rex Ying

(参考訳) グラフニューラルネットワーク(GNN)の広範な展開は、モデル監査と信頼できるグラフ学習の確保において重要な役割を果たす、その説明可能性に大きな関心を喚起する。 GNNの説明可能性の目的は、モデル予測に最も大きな影響を与える基礎となるグラフ構造を識別することである。生成した説明が、特にGNNのアウト・オブ・ディストリビューションデータに対する脆弱性のために、イン・ディストリビューション特性の信頼性が要求される。残念ながら、一般的な説明可能性法は、生成した説明を元のグラフの構造に制約する傾向にあり、したがって分配性の重要性を軽視し、信頼性に欠ける説明をもたらす。これらの課題に対処するため、我々はD4Explainerを提案する。D4Explainerは、偽物とモデルレベルの説明シナリオの両方に対して、分散GNN説明を提供する新しいアプローチである。提案したD4Explainerは、生成グラフ分布学習を最適化目標に組み込む。 1) 与えられたインスタンスの分配特性に適合する多様な反事実グラフの集合を生成し、 2)特定のクラス予測に寄与する最も識別的なグラフパターンを特定し、モデルレベルの説明に役立てる。 d4explainerは、反事実とモデルレベルの説明を組み合わせる最初の統一フレームワークである。合成および実世界のデータセットで実施された実証的な評価は、D4Explainerによって達成された最先端のパフォーマンスを、説明精度、忠実性、多様性、堅牢性の観点から、説得力のある証拠を提供する。

The widespread deployment of Graph Neural Networks (GNNs) sparks significant interest in their explainability, which plays a vital role in model auditing and ensuring trustworthy graph learning. The objective of GNN explainability is to discern the underlying graph structures that have the most significant impact on model predictions. Ensuring that explanations generated are reliable necessitates consideration of the in-distribution property, particularly due to the vulnerability of GNNs to out-of-distribution data. Unfortunately, prevailing explainability methods tend to constrain the generated explanations to the structure of the original graph, thereby downplaying the significance of the in-distribution property and resulting in explanations that lack reliability. To address these challenges, we propose D4Explainer, a novel approach that provides in-distribution GNN explanations for both counterfactual and model-level explanation scenarios. The proposed D4Explainer incorporates generative graph distribution learning into the optimization objective, which accomplishes two goals: 1) generate a collection of diverse counterfactual graphs that conform to the in-distribution property for a given instance, and 2) identify the most discriminative graph patterns that contribute to a specific class prediction, thus serving as model-level explanations. It is worth mentioning that D4Explainer is the first unified framework that combines both counterfactual and model-level explanations. Empirical evaluations conducted on synthetic and real-world datasets provide compelling evidence of the state-of-the-art performance achieved by D4Explainer in terms of explanation accuracy, faithfulness, diversity, and robustness.

翻訳日:2023-11-01 21:25:04 公開日:2023-10-30

# 効率的純探索のためのデュアル指向アルゴリズムの設計

Dual-Directed Algorithm Design for Efficient Pure Exploration ( http://arxiv.org/abs/2310.19319v1 )

ライセンス: Link先を確認

Chao Qin and Wei You

(参考訳) 有限個の選択肢を持つ確率的逐次適応実験の文脈における純粋探索問題を考える。意思決定者の目標は、最小限の測定努力で、選択肢に関する問いに高い信頼度で正確に答えることである。典型的な問いは、最も優れたパフォーマンスを持つ選択肢を特定することであり、これはランク付けと選択の問題、あるいは機械学習文献における最良腕識別(best-arm identification)につながる。我々は, 固定精度設定に着目し, サンプルの最適配分への強い収束の概念の観点から, 最適性の十分条件を導出する。双対変数を用いて、配分が最適であるための必要十分条件を特徴付ける。双対変数を用いることで、主変数のみに依存する最適性条件の組合せ構造を回避することができる。注目すべきは、これらの最適性条件が、当初は最良腕識別のために提案されたトップツー(top-two)アルゴリズム設計原則の拡張を可能にすることである。さらに, 最適性条件は, 候補の情報ゲインに基づいて候補集合から適応的に選択する、情報指向選択と呼ばれる単純かつ効率的な選択規則を導く。アルゴリズムアプローチを実装できる幅広い文脈について概説する。我々は,情報指向選択と組み合わせることで,top-two Thompson sampling がガウス型の最良腕識別に対して(漸近的に)最適であることを示し、純粋探索の文献における明白な未解決問題を解決する。我々のアルゴリズムは、$\epsilon$-最良腕識別としきい値バンディット問題に対して最適である。また,本解析は,純粋探索問題に対する Thompson sampling の適応を導く一般原則も与える。数値実験は,提案アルゴリズムが既存のアルゴリズムと比較して非常に高い効率性を持つことを示す。

We consider pure-exploration problems in the context of stochastic sequential adaptive experiments with a finite set of alternative options. The goal of the decision-maker is to accurately answer a query question regarding the alternatives with high confidence with minimal measurement efforts. A typical query question is to identify the alternative with the best performance, leading to ranking and selection problems, or best-arm identification in the machine learning literature. We focus on the fixed-precision setting and derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples. Using dual variables, we characterize the necessary and sufficient conditions for an allocation to be optimal. The use of dual variables allows us to bypass the combinatorial structure of the optimality conditions that relies solely on primal variables. Remarkably, these optimality conditions enable an extension of the top-two algorithm design principle, initially proposed for best-arm identification. Furthermore, our optimality conditions give rise to a straightforward yet efficient selection rule, termed information-directed selection, which adaptively picks from a candidate set based on information gain of the candidates. We outline the broad contexts where our algorithmic approach can be implemented. We establish that, paired with information-directed selection, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature. Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems. Our analysis also leads to a general principle to guide adaptations of Thompson sampling for pure-exploration problems. Numerical experiments highlight the exceptional efficiency of our proposed algorithms relative to existing ones.

翻訳日:2023-11-01 21:24:39 公開日:2023-10-30
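For readers unfamiliar with the top-two principle mentioned in the abstract above, here is a minimal sketch of classical top-two Thompson sampling for Gaussian arms with unit variance. It uses a fixed coin bias `beta` to choose between the leader and the challenger; the paper's information-directed selection rule is not specified in the abstract and is not implemented here. The function name, the resampling cap `max_resample`, and the fallback to the runner-up are assumptions made to keep the example short and guaranteed to terminate.

```python
import random

def top_two_thompson(true_means, rounds=500, beta=0.5, max_resample=100, seed=0):
    """Top-two Thompson sampling with a fixed leader probability `beta`.

    Arms are Gaussian with unit variance; with a flat prior, the posterior of arm i
    is N(empirical mean, 1 / number of pulls). Returns (recommended arm, pull counts).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [1] * k
    sums = [rng.gauss(mu, 1.0) for mu in true_means]  # one initial pull per arm

    def posterior_sample():
        return [rng.gauss(sums[i] / counts[i], counts[i] ** -0.5) for i in range(k)]

    def argmax(values):
        return max(range(k), key=lambda i: values[i])

    for _ in range(rounds):
        theta = posterior_sample()
        leader = argmax(theta)
        arm = leader
        if rng.random() >= beta:
            # Challenger: resample the posterior until another arm comes out on top.
            challenger = None
            for _attempt in range(max_resample):
                candidate = argmax(posterior_sample())
                if candidate != leader:
                    challenger = candidate
                    break
            if challenger is None:
                # Assumed fallback: runner-up in the leader's own sample.
                challenger = max((i for i in range(k) if i != leader),
                                 key=lambda i: theta[i])
            arm = challenger
        counts[arm] += 1
        sums[arm] += rng.gauss(true_means[arm], 1.0)

    return argmax([sums[i] / counts[i] for i in range(k)]), counts

best_arm, pulls = top_two_thompson([0.0, 1.0, 3.0])
```

The fixed `beta` is the tuning parameter that, according to the abstract, information-directed selection replaces with an adaptive choice based on the information gain of the candidates.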

# L2T-DLN:動的損失ネットワークによる学習

L2T-DLN: Learning to Teach with Dynamic Loss Network ( http://arxiv.org/abs/2310.19313v1 )

ライセンス: Link先を確認

Zhoyang Hai, Liyuan Pan, Xiabi Liu, Zhengzheng Liu, Mirna Yunita

(参考訳) 教育の概念が機械学習コミュニティに導入されたことにより、教師モデルは動的損失関数を使用して学生モデルのトレーニングを指導するようになった。動的損失は、学生モデルの学習の異なるフェーズに適応的な損失関数を設定することを意図している。既存研究における教師モデルは、 1) 単に学生モデルの現在の状態に基づいて損失関数を決定するだけで、すなわち教師の経験を無視しており、 2)学生モデルの状態(例えば、訓練イテレーション数や訓練/検証セットでの損失/精度)のみを利用し、損失関数の状態は無視している。本稿では,まず,記憶ユニットを持つ教師モデルを設計することにより,損失調整を時系列タスクとして定式化し,教師モデルの経験によって学生の学習を導けるようにする。そして、動的損失ネットワークを用いることで、損失の状態を追加で利用して教師の学習を支援し、教師と学生モデルの相互作用を高めることができる。広範な実験により,本手法は学生の学習を強化し,分類,物体検出,意味セグメンテーションのシナリオを含む実世界タスクにおける様々な深層モデルの性能を向上させることを実証した。

With the concept of teaching being introduced to the machine learning community, teacher models have started using dynamic loss functions to guide the training of a student model. The dynamic loss is intended to set adaptive loss functions for different phases of student model learning. In existing works, the teacher model 1) merely determines the loss function based on the present states of the student model, i.e., disregards the experience of the teacher; 2) only utilizes the states of the student model, e.g., training iteration number and loss/accuracy from training/validation sets, while ignoring the states of the loss function. In this paper, we first formulate the loss adjustment as a temporal task by designing a teacher model with memory units, which enables the student learning to be guided by the experience of the teacher model. Then, with a dynamic loss network, we can additionally use the states of the loss to assist the teacher learning in enhancing the interactions between the teacher and the student model. Extensive experiments demonstrate that our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation scenarios.

翻訳日:2023-11-01 21:24:12 公開日:2023-10-30

# スパース状態の効率的生成のための単純量子アルゴリズム

A simple quantum algorithm to efficiently prepare sparse states ( http://arxiv.org/abs/2310.19309v1 )

ライセンス: Link先を確認

Debora Ramacciotti, Andreea-Iulia Lefterovici, Antonio F. Rotundo

(参考訳) 状態準備は、多くのアルゴリズムが提案されている量子計算の基本的なルーチンである。中でも最も単純なのがgrover-rudolphアルゴリズムである。本稿では,準備状態がスパースである場合に,本アルゴリズムの性能を解析する。ゲートの複雑性は状態の非零振幅数において線形であり、キュービット数では2次であることを示す。次に,量子ビット数への依存性を線形にするために,アルゴリズムの簡単な修正を導入する。これはスパース状態準備のための最もよく知られたアルゴリズムと競合する。

State preparation is a fundamental routine in quantum computation, for which many algorithms have been proposed. Among them, perhaps the simplest one is the Grover-Rudolph algorithm. In this paper, we analyse the performance of this algorithm when the state to prepare is sparse. We show that the gate complexity is linear in the number of non-zero amplitudes in the state and quadratic in the number of qubits. We then introduce a simple modification of the algorithm, which makes the dependence on the number of qubits also linear. This is competitive with the best known algorithms for sparse state preparation.

翻訳日:2023-11-01 21:23:54 公開日:2023-10-30
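To make the sparsity argument in the abstract above concrete, the sketch below performs the classical pre-computation behind Grover-Rudolph state preparation: for every bit-string prefix that carries probability, it computes the rotation angle $\theta$ with $\cos^2(\theta/2) = P(\text{next bit} = 0 \mid \text{prefix})$. It is an illustration under assumptions (non-negative real amplitudes, a normalised input, a dictionary keyed by bit strings) and does not reproduce the paper's modified algorithm; `grover_rudolph_angles` is a hypothetical name.

```python
import math

def grover_rudolph_angles(amplitudes, n_qubits):
    """Rotation angle for every prefix with non-zero probability.

    amplitudes: dict mapping n-bit strings to non-negative real amplitudes.
    Returns a dict prefix -> theta, where an R_y(theta) rotation controlled on
    `prefix` sets the next qubit so that P(0 | prefix) = cos^2(theta / 2).
    """
    probs = {bits: a * a for bits, a in amplitudes.items() if a != 0}
    angles = {}
    for level in range(n_qubits):
        mass = {}       # probability of each prefix of length `level`
        mass_zero = {}  # part of that probability where the next bit is 0
        for bits, p in probs.items():
            prefix = bits[:level]
            mass[prefix] = mass.get(prefix, 0.0) + p
            if bits[level] == "0":
                mass_zero[prefix] = mass_zero.get(prefix, 0.0) + p
        for prefix, total in mass.items():
            p0 = min(1.0, mass_zero.get(prefix, 0.0) / total)
            angles[prefix] = 2 * math.acos(math.sqrt(p0))
    return angles

# (|000> + |111>) / sqrt(2): two non-zero amplitudes on three qubits.
a = 2 ** -0.5
angles = grover_rudolph_angles({"000": a, "111": a}, 3)
```

Only prefixes that actually occur receive an angle, so a state with d non-zero amplitudes on n qubits needs at most d rotations per level and d·n in total (5 here). This is consistent with the linear dependence on the number of non-zero amplitudes stated in the abstract; the remaining cost comes from implementing each multi-controlled rotation.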

# ベルマン完全性がない:モデルに基づく回帰条件付き教師付き学習による軌道ステッチ

Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning ( http://arxiv.org/abs/2310.19308v1 )

ライセンス: Link先を確認

Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du

(参考訳) q$-learningのようなオフポリシー動的プログラミング(dp)技術は、シーケンシャルな意思決定問題を解決する重要な技術であることが証明されている。しかし、関数近似の存在下では、そのようなアルゴリズムは収束することが保証されておらず、しばしば、考慮された関数クラスにおいてベルマン完全性が欠如しているため、DPベースの手法の成功にとって重要な条件である。本稿では,回帰条件付き教師付き学習(return-conditioned supervised learning,rcsl)に基づくオフポリシー学習手法がベルマン完全性という課題を回避できることを示す。関数近似器として2層多層パーセプトロンを用いる場合, 一定の層幅がrcslに十分である一方で, ベルマン完全性を満たすために, 状態空間サイズと線形に層幅を成長させる必要がある。これらの結果は, ほぼ最適データセットを用いた環境におけるDP法と比較して, RCSL法の優れた経験的性能を説明するための一歩となる。さらに、最適部分データセットから学習するために、RCSLメソッドに異なる軌道からセグメントを縫合する動的プログラミング機能を与えるMBRCSLという単純なフレームワークを提案する。 MBRCSLは、学習された動的モデルと前方サンプリングを利用して、全ての動的プログラミングアルゴリズムを悩ませるベルマン完全性の必要性を回避しつつ、軌道縫合を達成する。これらの主張を裏付ける理論解析と実験評価の両方を提案し、いくつかのシミュレーションロボット問題に対して最先端のモデルフリーおよびモデルベースオフラインrlアルゴリズムを上回っている。

Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be important for solving sequential decision-making problems. However, in the presence of function approximation such algorithms are not guaranteed to converge, often diverging due to the absence of Bellman-completeness in the function classes considered, a crucial condition for the success of DP-based methods. In this paper, we show how off-policy learning techniques based on return-conditioned supervised learning (RCSL) are able to circumvent these challenges of Bellman completeness, converging under significantly more relaxed assumptions inherited from supervised learning. We prove there exists a natural environment in which if one uses a two-layer multilayer perceptron as the function approximator, the layer width needs to grow linearly with the state space size to satisfy Bellman-completeness while a constant layer width is enough for RCSL. These findings take a step towards explaining the superior empirical performance of RCSL methods compared to DP-based methods in environments with near-optimal datasets. Furthermore, in order to learn from sub-optimal datasets, we propose a simple framework called MBRCSL, granting RCSL methods the ability of dynamic programming to stitch together segments from distinct trajectories. MBRCSL leverages learned dynamics models and forward sampling to accomplish trajectory stitching while avoiding the need for Bellman completeness that plagues all dynamic programming algorithms. We present both theoretical analysis and experimental evaluation to back these claims, and MBRCSL outperforms state-of-the-art model-free and model-based offline RL algorithms across several simulated robotics problems.

翻訳日:2023-11-01 21:23:46 公開日:2023-10-30
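The data transformation at the heart of return-conditioned supervised learning, as described in the abstract above, can be sketched in a few lines: each logged step is relabeled with its return-to-go, and a policy is then fitted by ordinary supervised learning from (state, return) to action. The sketch below is a toy illustration, not MBRCSL; the trajectory format, the function names, and the lookup-table "policy" standing in for a learned model are all assumptions.

```python
def returns_to_go(rewards):
    """Undiscounted sum of the rewards from each step to the end of the trajectory."""
    total = 0.0
    out = []
    for r in reversed(rewards):
        total += r
        out.append(total)
    return out[::-1]

def rcsl_dataset(trajectories):
    """Turn trajectories of (state, action, reward) into ((state, return_to_go), action) pairs."""
    data = []
    for traj in trajectories:
        rtg = returns_to_go([reward for _, _, reward in traj])
        for (state, action, _), g in zip(traj, rtg):
            data.append(((state, g), action))
    return data

def tabular_policy(data):
    """Stand-in for supervised learning: remember the action seen for each (state, return)."""
    return dict(data)

trajectory = [("s0", "a", 1), ("s1", "b", 0), ("s2", "a", 2)]
policy = tabular_policy(rcsl_dataset([trajectory]))
```

At test time the policy is queried with the current state and a desired return. Plain relabeling of this kind only reproduces returns that some logged trajectory achieved from that state, which is the limitation the abstract's model-based rollouts are meant to address by stitching segments from different trajectories.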

# 極端機械力場に対する計画・探索的アプローチ

A Planning-and-Exploring Approach to Extreme-Mechanics Force Fields ( http://arxiv.org/abs/2310.19306v1 )

ライセンス: Link先を確認

Pengjie Shi and Zhiping Xu

(参考訳) 強い格子歪みや破壊時の結合破壊のような極端な機械的プロセスは、自然と工学においてユビキタスであり、しばしば構造が破滅的な破壊を引き起こす。しかし, き裂の核生成と成長を理解するには, き裂先端の原子準位構造から荷重が印加される構造的特徴まで幅広い多スケール特性が必要である。分子シミュレーションは、クラックフロントにおける進行的なミクロ構造変化を解決する重要なツールを提供し、機械的エネルギー散逸、クラックパスの選択、動的不安定性(例えば、キンキング、分岐)などのプロセスの探索に広く用いられている。原子位置に基づく局所的記述子に基づく実験力場と結合順序は, 非線形, 異方性応力-ひずみ関係, エッジのエネルギー密度に対しても, 破壊の予測を満足させるものではない。したがって、高忠実な力場はひずみのテンソルの性質と破壊時の希少事象のエネルギーを含み、残念ながら最先端の経験的力場と機械学習の力場の両方では考慮されていない。第一原理計算によって生成されたデータに基づいて, ひずみ状態空間の事前サンプリングとアクティブラーニング技術を組み合わせて, 臨界結合距離における遷移状態の探索を行い, 破壊力場nn-f$^3$を開発した。 NN-F$^3$の能力は、モデル問題としてh-BNおよびツイスト二層グラフェンの破断を研究することによって実証される。シミュレーションの結果,最近の実験結果を確認し,極端機械過程の予測において第一原理計算から電子構造の知識を含める必要性を浮き彫りにした。

Extreme mechanical processes such as strong lattice distortion and bond breakage during fracture are ubiquitous in nature and engineering, and often lead to catastrophic failure of structures. However, understanding the nucleation and growth of cracks is challenged by their multiscale characteristics spanning from atomic-level structures at the crack tip to the structural features where the load is applied. Molecular simulations offer an important tool to resolve the progressive microstructural changes at crack fronts and are widely used to explore processes therein, such as mechanical energy dissipation, crack path selection, and dynamic instabilities (e.g., kinking, branching). Empirical force fields built on local descriptors of atomic positions and bond orders do not yield satisfying predictions of fracture, even for the nonlinear, anisotropic stress-strain relations and the energy densities of edges. High-fidelity force fields thus should include the tensorial nature of strain and the energetics of rare events during fracture, which, unfortunately, have not been taken into account in either the state-of-the-art empirical or machine-learning force fields. Based on data generated by first-principles calculations, we develop a neural network-based force field for fracture, NN-F$^3$, by combining pre-sampling of the space of strain states and active-learning techniques to explore the transition states at critical bonding distances. The capability of NN-F$^3$ is demonstrated by studying the rupture of h-BN and twisted bilayer graphene as model problems. The simulation results confirm recent experimental findings and highlight the necessity to include the knowledge of electronic structures from first-principles calculations in predicting extreme mechanical processes.

翻訳日:2023-11-01 21:23:16 公開日:2023-10-30

# 財務異常検出のための縦・水平分割データによるプライバシー保護フェデレーション学習

Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection ( http://arxiv.org/abs/2310.19304v1 )

ライセンス: Link先を確認

Swanand Ravindra Kadhe, Heiko Ludwig, Nathalie Baracaldo, Alan King, Yi Zhou, Keith Houck, Ambrish Rawat, Mark Purcell, Naoise Holohan, Mikio Takeuchi, Ryo Kawahara, Nir Drucker, Hayim Shaul, Eyal Kushnir, Omri Soceanu

(参考訳) 金融異常の証拠を効果的に検出するには、支払いネットワークシステム(PNS)とそのパートナー銀行のように、多様なデータを保有する複数のエンティティ間の協調が必要である。これらの金融機関間の信頼は、規制と競争によって制限される。フェデレーテッドラーニング(FL)は、データがエンティティ間で垂直または水平のいずれかに分割されている場合に、エンティティが協調してモデルを訓練することを可能にする。しかし、実世界の金融異常検出シナリオでは、データは垂直と水平の両方に分割されるため、既存のFLアプローチをプラグ・アンド・プレイで使用することはできない。我々の新しいソリューションであるPV4FADは、完全準同型暗号(HE)、セキュアマルチパーティ計算(SMPC)、差分プライバシ(DP)、ランダム化技術を組み合わせて、トレーニング中のプライバシと精度のバランスをとり、モデル展開時の推論脅威を防止する。我々のソリューションは、HEとSMPCによって入力プライバシを提供し、DPによって推論時攻撃に対する出力プライバシを提供する。具体的には、honest-but-curious(正直だが好奇心のある)脅威モデルにおいて、銀行はPNSのトランザクションに関するセンシティブな特徴を一切学習せず、PNSは銀行のデータセットに関する情報を一切学習せず予測ラベルのみを学習することを示す。また、推論時の出力プライバシを保護するDP機構を開発し、解析する。我々のソリューションは、分散DPを満たしつつ銀行ごとのノイズレベルを大幅に低減することで、高ユーティリティのモデルを生成する。高い精度を確保するため、本手法はアンサンブルモデル、具体的にはランダムフォレストを生成する。これにより、分散を低減し精度を向上させるというアンサンブルのよく知られた特性を利用できる。我々のソリューションは、米国プライバシ強化技術(PETs)賞チャレンジの第1フェーズで2位を獲得した。

The effective detection of evidence of financial anomalies requires collaboration among multiple entities who own a diverse set of data, such as a payment network system (PNS) and its partner banks. Trust among these financial institutions is limited by regulation and competition. Federated learning (FL) enables entities to collaboratively train a model when data is either vertically or horizontally partitioned across the entities. However, in real-world financial anomaly detection scenarios, the data is partitioned both vertically and horizontally and hence it is not possible to use existing FL approaches in a plug-and-play manner. Our novel solution, PV4FAD, combines fully homomorphic encryption (HE), secure multi-party computation (SMPC), differential privacy (DP), and randomization techniques to balance privacy and accuracy during training and to prevent inference threats at model deployment time. Our solution provides input privacy through HE and SMPC, and output privacy against inference time attacks through DP. Specifically, we show that, in the honest-but-curious threat model, banks do not learn any sensitive features about PNS transactions, and the PNS does not learn any information about the banks' dataset but only learns prediction labels. We also develop and analyze a DP mechanism to protect output privacy during inference. Our solution generates high-utility models by significantly reducing the per-bank noise level while satisfying distributed DP. To ensure high accuracy, our approach produces an ensemble model, in particular, a random forest. This enables us to take advantage of the well-known properties of ensembles to reduce variance and increase accuracy. Our solution won second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.

翻訳日:2023-11-01 21:22:46 公開日:2023-10-30

# Chat-GPTを用いた対話推薦のためのユーザニーズの抽出

Extracting user needs with Chat-GPT for dialogue recommendation ( http://arxiv.org/abs/2310.19303v1 )

ライセンス: Link先を確認

Yugen Sato, Taisei Nakajima, Tatsuki Kawamoto, Tomohiro Takagi

(参考訳) ChatGPTのような大規模言語モデル(LLM)はますます洗練され、人間のような能力を示し、様々な日常業務において人間を支援する上で不可欠な役割を担っている。AIの重要な応用の一つは、人間の問い合わせに応答し、ユーザに合わせた推薦を行う対話型推薦システムである。従来の対話型推薦システムの多くでは、言語モデルは対話モデルとしてのみ使用され、推薦システムは別個に存在する。これは、対話システムとして使われる言語モデルが推薦システムとして機能する能力を持たないためである。そこで我々は、対話システムとして非常に高い推論能力と高品質な文を生成する能力を持つOpenAIのChat-GPTを用いて、推薦機能を備えた対話システムを構築し、その有効性を検証する。

Large-scale language models (LLMs), such as ChatGPT, are becoming increasingly sophisticated and exhibit human-like capabilities, playing an essential role in assisting humans in a variety of everyday tasks. An important application of AI is interactive recommendation systems that respond to human inquiries and make recommendations tailored to the user. In most conventional interactive recommendation systems, the language model is used only as a dialogue model, and there is a separate recommendation system. This is due to the fact that the language model used as a dialogue system does not have the capability to serve as a recommendation system. Therefore, we will realize the construction of a dialogue system with recommendation capability by using OpenAI's Chat-GPT, which has a very high inference capability as a dialogue system and the ability to generate high-quality sentences, and verify the effectiveness of the system.

翻訳日:2023-11-01 21:22:16 公開日:2023-10-30

# ROME:ビジュアルコモンセンスを超えた推論のための事前学習型視覚言語モデルの評価

ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense ( http://arxiv.org/abs/2310.19301v1 )

ライセンス: Link先を確認

Kankan Zhou, Eason Lai, Wei Bin Au Yeong, Kyriakos Mouratidis, Jing Jiang

(参考訳) 人間は常識を超えて推論する強い能力を持っている。例えば、空の金魚鉢の隣のテーブルの上に金魚が横たわっているという非日常的な画像を与えられたとき、人間は金魚が金魚鉢の中にいないと容易に判断する。しかし視覚言語モデルの場合は事情が異なるかもしれず、その推論は視覚入力にもかかわらず、金魚が鉢の中にいるという一般的なシナリオに引き寄せられる可能性がある。本稿では、最先端の事前学習済み視覚言語モデルが直観に反するコンテンツを正しく解釈する推論能力を持っているかどうかを評価するために、ROME(reasoning beyond commonsense knowledge)という新しいプロービングデータセットを提案する。ROMEには、色、形状、材質、サイズ、位置関係に関する常識的知識に反する画像が含まれている。最先端の事前学習済み視覚言語モデルに対する実験により、これらのモデルのほとんどは依然として直観に反するシナリオをほとんど解釈できないことが明らかになった。我々は、ROMEが視覚言語研究における常識的知識を超えた推論に関するさらなる研究を促進することを期待している。

Humans possess a strong capability for reasoning beyond common sense. For example, given an unconventional image of a goldfish laying on the table next to an empty fishbowl, a human would effortlessly determine that the fish is not inside the fishbowl. The case, however, may be different for a vision-language model, whose reasoning could gravitate towards the common scenario that the fish is inside the bowl, despite the visual input. In this paper, we introduce a novel probing dataset named ROME (reasoning beyond commonsense knowledge) to evaluate whether the state-of-the-art pre-trained vision-language models have the reasoning capability to correctly interpret counter-intuitive content. ROME contains images that defy commonsense knowledge with regards to color, shape, material, size and positional relation. Experiments on the state-of-the-art pre-trained vision-language models reveal that most of these models are still largely incapable of interpreting counter-intuitive scenarios. We hope that ROME will spur further investigations on reasoning beyond commonsense knowledge in vision-language research.

翻訳日:2023-11-01 21:21:59 公開日:2023-10-30

# 動的治療のためのステージアウェア学習

Stage-Aware Learning for Dynamic Treatments ( http://arxiv.org/abs/2310.19300v1 )

ライセンス: Link先を確認

Hanwen Ye, Wenzhuo Zhou, Ruoqing Zhu, Annie Qu

(参考訳) 動的治療レジーム(DTR)の最近の進歩は、個人の具体的なニーズに合わせて調整され、期待される臨床的利益を最大化できる、強力な最適治療探索アルゴリズムを提供している。しかし、既存のアルゴリズムは、特に意思決定の段階が長く続く慢性疾患において、最適治療のもとでのサンプルサイズ不足に悩まされる可能性がある。これらの課題に対処するため、我々は、観察された治療軌跡と、最適レジームによって得られる軌跡との決定段階全体にわたる整合を優先することに焦点を当ててDTRを推定する、新しい個別化学習手法を提案する。観察された軌跡が最適治療と完全に一致しなければならないという制約を緩和することにより、本手法は逆確率重み付けに基づく手法のサンプル効率と安定性を大幅に改善する。特に、提案する学習スキームは、広く使われているアウトカム重み付け学習(outcome weighted learning)の枠組みを特殊ケースとして含む、より一般的な枠組みを構成する。さらに、決定段階間の異質性を明示的に考慮するために、段階重要度スコアの概念を注意機構とともに導入する。フィッシャー整合性や有限サンプル性能限界を含む、提案手法の理論的性質を確立する。実証的には、広範なシミュレーション環境と、COVID-19パンデミックに関する実例研究において提案手法を評価する。

Recent advances in dynamic treatment regimes (DTRs) provide powerful optimal treatment searching algorithms, which are tailored to individuals' specific needs and able to maximize their expected clinical benefits. However, existing algorithms could suffer from insufficient sample size under optimal treatments, especially for chronic diseases involving long stages of decision-making. To address these challenges, we propose a novel individualized learning method which estimates the DTR with a focus on prioritizing alignment between the observed treatment trajectory and the one obtained by the optimal regime across decision stages. By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of inverse probability weighted based methods. In particular, the proposed learning scheme builds a more general framework which includes the popular outcome weighted learning framework as a special case of ours. Moreover, we introduce the notion of stage importance scores along with an attention mechanism to explicitly account for heterogeneity among decision stages. We establish the theoretical properties of the proposed approach, including the Fisher consistency and finite-sample performance bound. Empirically, we evaluate the proposed method in extensive simulated environments and a real case study for COVID-19 pandemic.

翻訳日:2023-11-01 21:21:42 公開日:2023-10-30

# 生成モデルにおける公平性の測定について

On Measuring Fairness in Generative Models ( http://arxiv.org/abs/2310.19297v1 )

ライセンス: Link先を確認

Christopher T. H. Teo, Milad Abdollahzadeh, Ngai-Man Cheung

(参考訳) 近年、公平な生成モデルへの関心が高まっている。本研究では、公平な生成モデルの進歩を測るうえで重要な要素である公平性の測定について、初めて詳細な研究を行う。我々の貢献は3つある。第1に、高精度なセンシティブ属性(SA)分類器を用いた場合でも、既存の公平性測定フレームワークにはかなりの測定誤差があることを明らかにする。これらの知見は、これまでに報告された公平性の改善に疑問を投げかける。第2に、この問題に対処するため、統計モデルを用いてSA分類器の不正確さを考慮する新しいフレームワーク、CLassifier Error-Aware Measurement (CLEAM) を提案する。提案するCLEAMは測定誤差を大幅に低減し、例えばStyleGAN2の性別(Gender)に関して4.98% $\rightarrow$ 0.62% となる。さらに、CLEAMはこれを最小限の追加オーバーヘッドで達成する。第3に、CLEAMを用いて重要なテキスト画像生成器とGANの公平性を測定し、これらのモデルにかなりのバイアスがあることを明らかにして、その応用に対する懸念を提起する。コードとその他のリソース: https://sutd-visual-computing-group.github.io/CLEAM/

Recently, there has been increased interest in fair generative models. In this work, we conduct, for the first time, an in-depth study on fairness measurement, a critical component in gauging progress on fair generative models. We make three contributions. First, we conduct a study that reveals that the existing fairness measurement framework has considerable measurement errors, even when highly accurate sensitive attribute (SA) classifiers are used. These findings cast doubts on previously reported fairness improvements. Second, to address this issue, we propose CLassifier Error-Aware Measurement (CLEAM), a new framework which uses a statistical model to account for inaccuracies in SA classifiers. Our proposed CLEAM reduces measurement errors significantly, e.g., 4.98% $\rightarrow$ 0.62% for StyleGAN2 w.r.t. Gender. Additionally, CLEAM achieves this with minimal additional overhead. Third, we utilize CLEAM to measure fairness in important text-to-image generator and GANs, revealing considerable biases in these models that raise concerns about their applications. Code and more resources: https://sutd-visual-computing-group.github.io/CLEAM/.
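
The idea of accounting for classifier inaccuracies when measuring an attribute proportion can be illustrated with a small Python sketch. This is not the CLEAM estimator itself, whose statistical model the abstract does not spell out; it is the textbook confusion-matrix inversion for a binary attribute, and the function name `corrected_proportion` and the accuracy values are assumptions made for illustration.

```python
def corrected_proportion(observed_p0, acc0, acc1):
    """Estimate the true fraction of class 0 from classifier outputs.

    observed_p0: fraction of samples the classifier labels as class 0.
    acc0, acc1:  per-class accuracies of the attribute classifier
                 (hypothetical values, e.g. measured on a validation set).
    Model: observed_p0 = p0 * acc0 + (1 - p0) * (1 - acc1).
    """
    denom = acc0 + acc1 - 1.0
    if denom <= 0:
        raise ValueError("classifier must be better than chance on average")
    estimate = (observed_p0 - (1.0 - acc1)) / denom
    # Clamp to [0, 1] because sampling noise can push the estimate outside.
    return min(1.0, max(0.0, estimate))


# A perfectly balanced generator seen through an imperfect classifier.
true_p0 = 0.5
acc0, acc1 = 0.98, 0.90
observed = true_p0 * acc0 + (1 - true_p0) * (1 - acc1)
print("naive reading:    ", observed)
print("corrected reading:", corrected_proportion(observed, acc0, acc1))
```

With these assumed accuracies the naive reading is about 0.54 although the generator is balanced, which is the kind of measurement error the abstract describes; the inversion recovers roughly 0.5.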

翻訳日:2023-11-01 21:21:23 公開日:2023-10-30

# 量子状態の複素値ウィグナーエントロピー

Complex-valued Wigner entropy of a quantum state ( http://arxiv.org/abs/2310.19296v1 )

ライセンス: Link先を確認

Nicolas J. Cerf, Anaelle Hertz, Zacharie Van Herstraeten

(参考訳) 量子状態のウィグナー関数が負の値をとりうること、したがって真の確率密度とはみなせないことはよく知られている。ここでは、負のウィグナー関数にも拡張できるエントロピー的な汎関数を位相空間で見つけることの難しさを検討し、任意のウィグナー関数に付随する複素数値のエントロピーを定義することの利点を主張する。複素ウィグナーエントロピー(complex Wigner entropy)と呼ぶこの量は、ウィグナー関数のシャノン微分エントロピーを複素平面へ解析接続することによって定義される。複素ウィグナーエントロピーは興味深い性質を持ち、特にその実部と虚部はいずれもガウスユニタリ(位相空間における変位、回転、スクイージング)の下で不変であることを示す。実部はガウス畳み込みの下でのウィグナー関数の発展を考える際に物理的な意味を持ち、虚部は単にウィグナー関数の負の体積に比例する。最後に、任意のウィグナー関数の複素数値フィッシャー情報量を定義する。これは、状態がガウス加法的雑音を受けるときの複素ウィグナーエントロピーの時間微分と、拡張されたド・ブラウンの恒等式(de Bruijn's identity)を介して結びついている。全体として、複素平面は位相空間における準確率分布のエントロピー的性質を解析するための適切な枠組みを与えると期待される。

It is common knowledge that the Wigner function of a quantum state may admit negative values, so that it cannot be viewed as a genuine probability density. Here, we examine the difficulty in finding an entropy-like functional in phase space that extends to negative Wigner functions and then advocate the merits of defining a complex-valued entropy associated with any Wigner function. This quantity, which we call the complex Wigner entropy, is defined via the analytic continuation of Shannon's differential entropy of the Wigner function in the complex plane. We show that the complex Wigner entropy enjoys interesting properties, especially its real and imaginary parts are both invariant under Gaussian unitaries (displacements, rotations, and squeezing in phase space). Its real part is physically relevant when considering the evolution of the Wigner function under a Gaussian convolution, while its imaginary part is simply proportional to the negative volume of the Wigner function. Finally, we define the complex-valued Fisher information of any Wigner function, which is linked (via an extended de Bruijn's identity) to the time derivative of the complex Wigner entropy when the state undergoes Gaussian additive noise. Overall, it is anticipated that the complex plane yields a proper framework for analyzing the entropic properties of quasiprobability distributions in phase space.
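
The statement that the imaginary part is proportional to the negative volume can be checked numerically with a short Python sketch. It rests on assumptions the abstract does not state: the entropy is taken as $-\int W \ln W \, dx\, dp$ with the principal branch of the logarithm, so that $\ln W = \ln|W| + i\pi$ wherever $W < 0$, and the test state is the single-photon Fock state with $\hbar = 1$. The function names are hypothetical.

```python
import cmath
import math


def wigner_fock1(u):
    """Wigner function of the Fock state |1> (hbar = 1) at u = x^2 + p^2."""
    return (2.0 * u - 1.0) * math.exp(-u) / math.pi


def complex_wigner_entropy(wigner, u_max=40.0, steps=200_000):
    """Midpoint-rule estimate of -integral W ln W dx dp for a radial W.

    For a rotationally symmetric W the area element is dx dp = pi du.
    cmath.log uses the principal branch, so negative W contributes i*pi.
    """
    du = u_max / steps
    total = 0j
    for k in range(steps):
        w = wigner((k + 0.5) * du)
        if w != 0.0:
            total += -w * cmath.log(w) * math.pi * du
    return total


h = complex_wigner_entropy(wigner_fock1)
# Analytic negative volume of |1>: integral of |W| over the region W < 0.
negative_volume = 2.0 * math.exp(-0.5) - 1.0
print("Im h / pi       =", h.imag / math.pi)
print("negative volume =", negative_volume)
```

Under these assumptions the imaginary part equals $\pi$ times the negative volume, which for this state is $2e^{-1/2} - 1 \approx 0.213$. A different branch choice would flip the sign, so the sketch fixes only the proportionality, not the convention used in the paper.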

翻訳日:2023-11-01 21:21:04 公開日:2023-10-30

# ROAM: 最適化されたオペレータオーダとメモリレイアウトによるメモリ効率の大きなDNNトレーニング

ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout ( http://arxiv.org/abs/2310.19295v1 )

ライセンス: Link先を確認

Huiyao Shu and Ang Wang and Ziji Shi and Hanyu Zhao and Yong Li and Lu Lu

(参考訳) ディープラーニングモデルのサイズが拡大し続けるにつれ、トレーニングに必要なメモリ量は急増している。オフロード、再計算、圧縮といった高レベルの手法はメモリの逼迫を軽減できるが、オーバーヘッドも伴う。一方、適切な演算子実行順序とテンソルメモリレイアウトを含むメモリ効率の高い実行計画は、モデルのメモリ効率を大幅に向上させ、高レベルの手法によるオーバーヘッドを低減できる。本稿では、計算グラフレベルで動作し、演算子順序とテンソルメモリレイアウトを最適化したメモリ効率の高い実行計画を導出するROAMを提案する。まず、モデル構造とトレーニング時のメモリ負荷を慎重に考慮した理論を提案し、これまで十分にサポートされていなかった大規模で複雑なグラフの最適化を可能にする。さらに、タスク分割を自動的に探索する効率的な木ベースのアルゴリズムを提案し、高い性能と有効性でこの問題を解く。実験の結果、ROAMはPytorchおよび2つの最先端手法と比較して35.7%、13.3%、27.2%の大幅なメモリ削減を達成し、53.7倍という顕著な高速化を実現する。大規模なGPT2-XLで行った評価は、ROAMのスケーラビリティをさらに裏付ける。

As deep learning models continue to increase in size, the memory requirements for training have surged. While high-level techniques like offloading, recomputation, and compression can alleviate memory pressure, they also introduce overheads. However, a memory-efficient execution plan that includes a reasonable operator execution order and tensor memory layout can significantly increase the models' memory efficiency and reduce overheads from high-level techniques. In this paper, we propose ROAM which operates on computation graph level to derive memory-efficient execution plan with optimized operator order and tensor memory layout for models. We first propose sophisticated theories that carefully consider model structure and training memory load to support optimization for large complex graphs that have not been well supported in the past. An efficient tree-based algorithm is further proposed to search task divisions automatically, along with delivering high performance and effectiveness to solve the problem. Experiments show that ROAM achieves a substantial memory reduction of 35.7%, 13.3%, and 27.2% compared to Pytorch and two state-of-the-art methods and offers a remarkable 53.7x speedup. The evaluation conducted on the expansive GPT2-XL further validates ROAM's scalability.

翻訳日:2023-11-01 21:20:40 公開日:2023-10-30

# カラー同変畳み込みネットワーク

Color Equivariant Convolutional Networks ( http://arxiv.org/abs/2310.19368v1 )

ライセンス: Link先を確認

Attila Lengyel, Ombretta Strafforello, Robert-Jan Bruintjes, Alexander Gielisse, Jan van Gemert

(参考訳) 色は、畳み込みニューラルネットワーク(CNN)が物体認識に容易に活用できる重要な視覚的手がかりである。しかし、偶発的な記録条件によって生じる色の変動の間にデータの不均衡があると、CNNは苦戦する。色不変性はこの問題に対処するが、すべての色情報を取り除くという代償を伴い、識別力が犠牲になる。本稿では、重要な色情報を保持しつつ、色スペクトル全体で形状特徴の共有を可能にする新しいディープラーニングの構成要素、Color Equivariant Convolutions (CEConvs) を提案する。ニューラルネットワークに色相シフトにわたるパラメータ共有を組み込むことにより、同変性の概念を幾何変換から測光変換へと拡張する。様々なタスクにおけるダウンストリーム性能と、訓練・テスト間の分布シフトを含む色の変化に対するロバスト性の向上という観点から、CEConvsの利点を示す。我々のアプローチはResNetのような既存のアーキテクチャにシームレスに統合でき、CNNにおける色に基づくドメインシフトに対処する有望な解決策を提供する。

Color is a crucial visual cue readily exploited by Convolutional Neural Networks (CNNs) for object recognition. However, CNNs struggle if there is data imbalance between color variations introduced by accidental recording conditions. Color invariance addresses this issue but does so at the cost of removing all color information, which sacrifices discriminative power. In this paper, we propose Color Equivariant Convolutions (CEConvs), a novel deep learning building block that enables shape feature sharing across the color spectrum while retaining important color information. We extend the notion of equivariance from geometric to photometric transformations by incorporating parameter sharing over hue-shifts in a neural network. We demonstrate the benefits of CEConvs in terms of downstream performance to various tasks and improved robustness to color changes, including train-test distribution shifts. Our approach can be seamlessly integrated into existing architectures, such as ResNets, and offers a promising solution for addressing color-based domain shifts in CNNs.
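
To make the notion of a hue shift as a transformation group concrete, here is a minimal Python sketch. It assumes one common model, in which a hue shift is a rotation of the RGB vector about the gray axis (1, 1, 1); the abstract does not say which parametrization CEConvs use, and `hue_rotate` is a hypothetical helper, not part of any released code.

```python
import math


def hue_rotate(rgb, degrees):
    """Rotate an RGB triple about the gray axis (1, 1, 1) by `degrees`.

    Uses the Rodrigues formula R = c*I + s*[n]x + (1 - c)*n*n^T
    with the unit axis n = (1, 1, 1) / sqrt(3).
    """
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    a = (1.0 - c) / 3.0        # entries contributed by the n*n^T term
    b = s / math.sqrt(3.0)     # entries contributed by the cross-product term
    r, g, bl = rgb
    return (
        (c + a) * r + (a - b) * g + (a + b) * bl,
        (a + b) * r + (c + a) * g + (a - b) * bl,
        (a - b) * r + (a + b) * g + (c + a) * bl,
    )


red = (1.0, 0.0, 0.0)
shifted = hue_rotate(red, 120)   # a 120-degree shift permutes the channels
print(shifted)
print(hue_rotate((0.5, 0.5, 0.5), 77))   # gray lies on the axis and stays put
```

In this model a discrete set of rotations (for example three steps of 120 degrees) forms a cyclic group, which is the kind of structure over which filter parameters could be shared, in analogy with rotation-equivariant convolutions.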

翻訳日:2023-11-01 21:12:48 公開日:2023-10-30

# フロッケ非平衡グリーン関数とフロッケ量子マスター方程式:電子-電子相互作用の役割

Floquet non-equilibrium Green's function and Floquet quantum master equation for electronic transport: The role of electron-electron interactions ( http://arxiv.org/abs/2310.19362v1 )

ライセンス: Link先を確認

Vahid Mosallanejad, Yu Wang, and Wenjie Dou

(参考訳) 非平衡グリーン関数(NEGF)と量子マスター方程式(QME)は、電子輸送を扱う手法の2つの主要なクラスである。本稿では、外部周期場との相互作用によって駆動される量子ドットの輸送特性について、これらの形式のさまざまなフロケ版を論じる。まず、フロケNEGFの2つのバージョンを導出する。また、相互作用系に対するフロケNEGF形式のアンザッツについても検討する。さらに、弱相互作用領域においてフロケQMEの2つのバージョンを導出する。各手法について、数演算子と電流演算子の期待値の評価を詳述する。周期駆動を受ける2準位系を通る輸送について、これらの手法を検証した。4つの手法すべての結果は高い整合性を示す。これらのフロケ量子輸送法は、光にさらされた分子接合の研究に有用であると期待される。

Non-equilibrium Green's function (NEGF) and quantum master equation (QME) are two main classes of approaches for electronic transport. We discuss various Floquet variances of these formalisms for transport properties of a quantum dot driven via interaction with an external periodic field. We first derived two versions of the Floquet NEGF. We also explore an ansatz of the Floquet NEGF formalism for the interacting systems. In addition, we derived two versions of Floquet QME in the weak interaction regime. With each method, we elaborate on the evaluation of the expectation values of the number and current operators. We examined these methods for transport through a two-level system that is subject to periodic driving. The results of all four methods show great consistency. We expect these Floquet quantum transport methods to be useful in studying molecular junctions exposed to light.

翻訳日:2023-11-01 21:12:32 公開日:2023-10-30

# バランス、不均衡、リバランス:minimaxゲームの観点からのロバストオーバーフィットの理解

Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective ( http://arxiv.org/abs/2310.19360v1 )

ライセンス: Link先を確認

Yifei Wang, Liangchen Li, Jiansheng Yang, Zhouchen Lin, Yisen Wang

(参考訳) 敵対的訓練(AT)は、ロバストな特徴を抽出するための、おそらく最先端のアルゴリズムとなっている。しかし近年、ATは、特に学習率(LR)の減衰後に、深刻なロバストオーバーフィッティングの問題に悩まされることが指摘されている。本稿では、敵対的訓練をモデルトレーナーと攻撃者の間の動的なミニマックスゲームとみなすことで、この現象を説明する。具体的には、LR減衰がトレーナーにより強い記憶能力を与えることでミニマックスゲームのバランスをどのように崩すかを分析し、この不均衡が非ロバスト特徴を記憶する結果としてロバストオーバーフィッティングを引き起こすことを示す。この理解を広範な実験で検証し、2人のプレイヤー双方のダイナミクスからロバストオーバーフィッティングの全体像を提供する。さらにこの理解に基づき、トレーナーの能力を正則化するか攻撃強度を高めることで2人のプレイヤーのバランスを取り直し、ロバストオーバーフィッティングを緩和する。実験の結果、提案するReBalanced Adversarial Training (ReBAT) は良好なロバスト性を達成し、非常に長い訓練の後でもロバストオーバーフィッティングに陥らないことが示された。コードは https://github.com/PKU-ML/ReBAT で入手できる。

Adversarial Training (AT) has become arguably the state-of-the-art algorithm for extracting robust features. However, researchers recently notice that AT suffers from severe robust overfitting problems, particularly after learning rate (LR) decay. In this paper, we explain this phenomenon by viewing adversarial training as a dynamic minimax game between the model trainer and the attacker. Specifically, we analyze how LR decay breaks the balance between the minimax game by empowering the trainer with a stronger memorization ability, and show such imbalance induces robust overfitting as a result of memorizing non-robust features. We validate this understanding with extensive experiments, and provide a holistic view of robust overfitting from the dynamics of both the two game players. This understanding further inspires us to alleviate robust overfitting by rebalancing the two players by either regularizing the trainer's capacity or improving the attack strength. Experiments show that the proposed ReBalanced Adversarial Training (ReBAT) can attain good robustness and does not suffer from robust overfitting even after very long training. Code is available at https://github.com/PKU-ML/ReBAT.

翻訳日:2023-11-01 21:12:22 公開日:2023-10-30

# 複数インスタンス学習におけるインスタンスラベル相関の導入:病理組織画像における癌検出への応用

Introducing instance label correlation in multiple instance learning. Application to cancer detection on histopathological images ( http://arxiv.org/abs/2310.19359v1 )

ライセンス: Link先を確認

Pablo Morales-Álvarez, Arne Schmidt, José Miguel Hernández-Lobato, Rafael Molina

(参考訳) 近年、弱教師ありパラダイムであるマルチインスタンス学習(MIL)は、さまざまな分野で広く普及している。典型的な例は計算病理学であり、ホールスライド画像に対するパッチレベルのラベルが存在しないため、教師ありモデルを適用できない。ガウス過程(GP)に基づく確率的MIL法は、優れた不確実性推定能力により有望な結果を得ている。しかし、これらは汎用的なMIL法であり、(病理組織)画像では隣接するパッチのラベルに相関が期待されるという重要な事実を考慮していない。本研究では、VGPMIL-PRと呼ばれる最先端のGPベースMIL法を拡張し、この相関を利用する。そのために、統計物理学のイジングモデルに着想を得た新しい結合項を導入する。すべてのモデルパラメータの推定には変分推論を用いる。興味深いことに、イジング項の強さを調節する重みがゼロになると、VGPMIL-PRの定式化が回復される。提案手法の性能は、前立腺癌検出の2つの実世界の問題で評価される。我々のモデルは、他の最先端の確率的MIL法よりも優れた結果を達成することを示す。また、新しいイジング項の影響について洞察を得るために、さまざまな可視化と分析も提供する。これらの知見は、提案モデルの他の研究分野への応用を促進すると期待される。

In the last years, the weakly supervised paradigm of multiple instance learning (MIL) has become very popular in many different areas. A paradigmatic example is computational pathology, where the lack of patch-level labels for whole-slide images prevents the application of supervised models. Probabilistic MIL methods based on Gaussian Processes (GPs) have obtained promising results due to their excellent uncertainty estimation capabilities. However, these are general-purpose MIL methods that do not take into account one important fact: in (histopathological) images, the labels of neighboring patches are expected to be correlated. In this work, we extend a state-of-the-art GP-based MIL method, which is called VGPMIL-PR, to exploit such correlation. To do so, we develop a novel coupling term inspired by the statistical physics Ising model. We use variational inference to estimate all the model parameters. Interestingly, the VGPMIL-PR formulation is recovered when the weight that regulates the strength of the Ising term vanishes. The performance of the proposed method is assessed in two real-world problems of prostate cancer detection. We show that our model achieves better results than other state-of-the-art probabilistic MIL methods. We also provide different visualizations and analysis to gain insights into the influence of the novel Ising term. These insights are expected to facilitate the application of the proposed model to other research areas.

翻訳日:2023-11-01 21:12:00 公開日:2023-10-30

# 局所ランダム量子回路は任意のアーキテクチャ上の近似設計を形成する

Local random quantum circuits form approximate designs on arbitrary architectures ( http://arxiv.org/abs/2310.19355v1 )

ライセンス: Link先を確認

Shivan Mittal, Nicholas Hunter-Jones

(参考訳) 辺が許容される$2$-qudit相互作用を定める任意の連結グラフ上のランダム量子回路(RQC)を考える。先行研究は、1次元、完全、および$D$次元のグラフ上にある局所次元$q$の$n$-qudit回路が近似ユニタリデザインを形成すること、すなわち多項式個のゲートの後にユニタリ群$U(q^n)$上のハール測度に近い分布からユニタリを生成することを確立している。ここでは、これらの結果を拡張し、幅広いクラスのグラフ上で$O(\mathrm{poly}(n,k))$個のゲートからなるRQCが近似ユニタリ$k$-デザインを形成することを証明する。次数と高さが有界な全域木を持つグラフ上のRQCは、$O(|E|n\,\mathrm{poly}(k))$個のゲートの後に$k$-デザインを形成することを証明する。ここで$|E|$はグラフの辺の数である。さらに、RQCが多項式の回路サイズで近似デザインを生成する、より大きなグラフのクラスを特定する。$k \leq 4$ に対しては、ある最大次数のグラフ上のRQCが$O(|E|n)$個のゲートの後にデザインを形成することを示し、明示的な定数を与える。回路サイズのバウンドは局所ハミルトニアンのスペクトルギャップから決定する。そのために、正則グラフ上のフラストレーションフリーハミルトニアンのギャップを評価する有限サイズ法(Knabe法)を、任意の連結グラフへ拡張する。さらに、任意のグラフ上のハミルトニアンのスペクトルギャップを決定するための、検出可能性補題(Detectability Lemma)に基づく新しい手法を導入する。我々の手法はより広い適用可能性を持つ。すなわち、第1の手法は[Commun. Math. Phys. 291, 257 (2009)]の簡潔な別証明を与え、第2の手法は任意の連結アーキテクチャ上のRQCが準多項式の回路サイズで近似デザインを形成することを証明する。

We consider random quantum circuits (RQC) on arbitrary connected graphs whose edges determine the allowed $2$-qudit interactions. Prior work has established that such $n$-qudit circuits with local dimension $q$ on 1D, complete, and $D$-dimensional graphs form approximate unitary designs, that is, they generate unitaries from distributions close to the Haar measure on the unitary group $U(q^n)$ after polynomially many gates. Here, we extend those results by proving that RQCs comprised of $O(\mathrm{poly}(n,k))$ gates on a wide class of graphs form approximate unitary $k$-designs. We prove that RQCs on graphs with spanning trees of bounded degree and height form $k$-designs after $O(|E|n\,\mathrm{poly}(k))$ gates, where $|E|$ is the number of edges in the graph. Furthermore, we identify larger classes of graphs for which RQCs generate approximate designs in polynomial circuit size. For $k \leq 4$, we show that RQCs on graphs of certain maximum degrees form designs after $O(|E|n)$ gates, providing explicit constants. We determine our circuit size bounds from the spectral gaps of local Hamiltonians. To that end, we extend the finite-size (or Knabe) method for bounding gaps of frustration-free Hamiltonians on regular graphs to arbitrary connected graphs. We further introduce a new method based on the Detectability Lemma for determining the spectral gaps of Hamiltonians on arbitrary graphs. Our methods have wider applicability as the first method provides a succinct alternative proof of [Commun. Math. Phys. 291, 257 (2009)] and the second method proves that RQCs on any connected architecture form approximate designs in quasi-polynomial circuit size.

翻訳日:2023-11-01 21:11:38 公開日:2023-10-30

# 物体検出のための半教師ありおよび弱教師ありドメイン一般化

Semi- and Weakly-Supervised Domain Generalization for Object Detection ( http://arxiv.org/abs/2310.19351v1 )

ライセンス: Link先を確認

Ryosuke Furuta, Yoichi Sato

(参考訳) 訓練データとテストデータでドメインが大きく異なる場合、物体検出器はうまく動作しない。この問題を解決するために、複数ドメインの正解ラベル付き訓練データを必要とするドメイン一般化手法が提案されてきた。しかし、クラスラベルだけでなくバウンディングボックスにもアノテーションを付けなければならないため、物体検出用にそうしたデータを集めるには時間と労力がかかる。高価なアノテーションを必要とせずに物体検出におけるドメインギャップの問題を克服するため、半教師ありドメイン一般化物体検出(SS-DGOD)と弱教師ありDGOD(WS-DGOD)という2つの新しい問題設定を提案する。複数ドメインのラベル付きデータを必要とする従来の物体検出向けドメイン一般化とは対照的に、SS-DGODとWS-DGODは、1つのドメインのラベル付きデータと、複数ドメインのラベルなしデータまたは弱ラベル付きデータのみを訓練に必要とする。提案する設定では、教師ネットワークがラベルなしまたは弱ラベル付きデータに対して出力する擬似ラベルを用いて生徒ネットワークを訓練するという、同一の生徒・教師学習フレームワークによって物体検出器を効果的に訓練できることを示す。実験の結果、提案する設定で訓練した物体検出器は、1つのラベル付きドメインのデータで訓練したベースライン検出器を大きく上回り、教師なしドメイン適応(UDA)設定で訓練したものと同等以上の性能を示す。しかもUDAとは対照的に、提案手法は訓練にターゲットドメインのデータを使用しない。

Object detectors do not work well when domains largely differ between training and testing data. To solve this problem, domain generalization approaches, which require training data with ground-truth labels from multiple domains, have been proposed. However, it is time-consuming and labor-intensive to collect those data for object detection because not only class labels but also bounding boxes must be annotated. To overcome the problem of domain gap in object detection without requiring expensive annotations, we propose to consider two new problem settings: semi-supervised domain generalizable object detection (SS-DGOD) and weakly-supervised DGOD (WS-DGOD). In contrast to the conventional domain generalization for object detection that requires labeled data from multiple domains, SS-DGOD and WS-DGOD require labeled data only from one domain and unlabeled or weakly-labeled data from multiple domains for training. We show that object detectors can be effectively trained on the proposed settings with the same student-teacher learning framework, where a student network is trained with pseudo labels output from a teacher on the unlabeled or weakly-labeled data. The experimental results demonstrate that the object detectors trained on the proposed settings significantly outperform baseline detectors trained on one labeled domain data and perform comparably to or better than those trained on unsupervised domain adaptation (UDA) settings, while ours do not use target domain data for training in contrast to UDA.

翻訳日:2023-11-01 21:10:58 公開日:2023-10-30

# 日本語SimCSE技術報告

Japanese SimCSE Technical Report ( http://arxiv.org/abs/2310.19349v1 )

ライセンス: Link先を確認

Hayato Tsukagoshi, Ryohei Sasano, Koichi Takeda

(参考訳) SimCSEでファインチューニングした日本語文埋め込みモデルである日本語SimCSEの開発について報告する。文埋め込み研究のベースラインとして使用できる日本語の文埋め込みモデルが不足していることから、24の事前学習済み日本語・多言語モデル、5つの教師ありデータセット、4つの教師なしデータセットを用いて、日本語文埋め込みに関する広範な実験を行った。本報告では、日本語SimCSEの詳細なトレーニング設定と評価結果を示す。

We report the development of Japanese SimCSE, Japanese sentence embedding models fine-tuned with SimCSE. Since there is a lack of sentence embedding models for Japanese that can be used as a baseline in sentence embedding research, we conducted extensive experiments on Japanese sentence embeddings involving 24 pre-trained Japanese or multilingual language models, five supervised datasets, and four unsupervised datasets. In this report, we provide the detailed training setup for Japanese SimCSE and their evaluation results.
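
For readers unfamiliar with SimCSE, the contrastive objective used for this kind of fine-tuning can be sketched in plain Python. This is a generic in-batch InfoNCE loss over two views of each sentence and is not taken from the report; the embeddings are toy vectors, the temperature of 0.05 is an assumed setting, and `simcse_loss` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_loss(views_a, views_b, temperature=0.05):
    """Mean cross-entropy where views_b[i] is the positive for views_a[i]
    and every other views_b[j] in the batch acts as a negative."""
    n = len(views_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(views_a[i], views_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / n


# Toy batch: two sentences, each encoded twice (e.g. with different dropout).
batch_a = [[1.0, 0.0], [0.0, 1.0]]
batch_b_aligned = [[0.9, 0.1], [0.1, 0.9]]
batch_b_swapped = [[0.1, 0.9], [0.9, 0.1]]
print("aligned:", simcse_loss(batch_a, batch_b_aligned))
print("swapped:", simcse_loss(batch_a, batch_b_swapped))
```

The loss is small when each sentence is closest to its own second view and large when the views are mismatched. In the unsupervised variant the two views come from encoding the same sentence twice with dropout, while the supervised variant draws positives from labeled pairs, which is presumably why the report distinguishes supervised and unsupervised datasets.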

翻訳日:2023-11-01 21:10:29 公開日:2023-10-30

# ドープファンデルワールス反強磁性体(Ni,Cd)PS3における量子多体磁気励起子の迅速抑制

Rapid suppression of quantum many-body magnetic exciton in doped van der Waals antiferromagnet (Ni,Cd)PS3 ( http://arxiv.org/abs/2310.19348v1 )

ライセンス: Link先を確認

Junghyun Kim, Woongki Na, Jonghyeon Kim, Pyeongjae Park, Kaixuan Zhang, Inho Hwang, Young-Woo Son, Jae Hoon Kim, Hyeonsik Cheong, Je-Geun Park

(参考訳) ファンデルワールス反強磁性体NiPS3で発見された特異な磁気励起子は、Zhang-Rice一重項励起状態とZhang-Rice三重項基底状態という2つの量子多体状態の間で生じる。同時に、この励起子に由来する発光のスペクトル幅は0.4 meVと極めて狭い。NiPS3における磁気励起子の極めて高いコヒーレンスを含むこれらの特異な性質は、多くの疑問を提起する。我々は、2つの実験手法と理論的研究により、Ni1-xCdxPS3を用いてドーピング効果を調べた。実験の結果、磁気励起子は数%のCdドーピングで劇的に抑制されることがわかった。その一方で、励起子の幅は緩やかにしか増加せず、反強磁性基底状態は頑健なままである。これらの結果は、コヒーレントな磁気励起子の前提条件として、格子の均一性が持つ隠れた重要性を浮き彫りにする。最後に、(Ni,Cd)PS3では電荷移動が断ち切られることで、本来なら一様に形成されるはずのコヒーレントな磁気励起子の形成が妨げられるという興味深いシナリオが浮かび上がる。

The unique discovery of magnetic exciton in van der Waals antiferromagnet NiPS3 arises between two quantum many-body states of a Zhang-Rice singlet excited state and a Zhang-Rice triplet ground state. Simultaneously, the spectral width of photoluminescence originating from this exciton is exceedingly narrow as 0.4 meV. These extraordinary properties, including the extreme coherence of the magnetic exciton in NiPS3, beg many questions. We studied doping effects using Ni1-xCdxPS3 using two experimental techniques and theoretical studies. Our experimental results show that the magnetic exciton is drastically suppressed upon a few % Cd doping. All these happen while the width of the exciton only gradually increases, and the antiferromagnetic ground state is robust. These results highlight the lattice uniformity's hidden importance as a prerequisite for coherent magnetic exciton. Finally, an exciting scenario emerges: the broken charge transfer forbids the otherwise uniform formation of the coherent magnetic exciton in (Ni,Cd)PS3.

翻訳日:2023-11-01 21:10:20 公開日:2023-10-30

# LLMの理解能力と装飾能力の敵対的分離によるテキスト要約の事実整合性の改善

Improving Factual Consistency of Text Summarization by Adversarially Decoupling Comprehension and Embellishment Abilities of LLMs ( http://arxiv.org/abs/2310.19347v1 )

ライセンス: Link先を確認

Huawen Feng, Yan Fan, Xiong Liu, Ting-En Lin, Zekun Yao, Yuchuan Wu, Fei Huang, Yongbin Li, Qianli Ma

(参考訳) 大規模言語モデル(LLM)によるテキスト要約の最近の進歩にもかかわらず、LLMは元の記事と事実が矛盾する要約をしばしば生成し、これはテキスト生成における「幻覚」として知られている。従来の小規模モデル(例えばBART、T5)とは異なり、現在のLLMは単純なミスは少ないが、因果関係を押し付ける、誤った詳細を追加する、過度に一般化するなど、より巧妙なミスを犯す。これらの幻覚は従来の手法では検出が難しく、テキスト要約の事実整合性を改善するうえで大きな課題となる。本稿では、LLMの理解(Comprehension)能力と装飾(EmbellishmeNT)能力を分離する敵対的デカップリング手法(DECENT)を提案する。さらに、LLMの学習過程における真偽への感度の不足を補うために、プロービングに基づくパラメータ効率の高い手法を採用する。これにより、LLMは装飾と理解を混同しにくくなり、指示をより正確に実行でき、幻覚を識別する能力が向上する。実験の結果、DECENTはLLMに基づくテキスト要約の信頼性を大幅に向上させることが示された。

Despite the recent progress in text summarization made by large language models (LLMs), they often generate summaries that are factually inconsistent with original articles, known as "hallucinations" in text generation. Unlike previous small models (e.g., BART, T5), current LLMs make fewer silly mistakes but more sophisticated ones, such as imposing cause and effect, adding false details, and overgeneralizing, etc. These hallucinations are challenging to detect through traditional methods, which poses great challenges for improving the factual consistency of text summarization. In this paper, we propose an adversarially DEcoupling method to disentangle the Comprehension and EmbellishmeNT abilities of LLMs (DECENT). Furthermore, we adopt a probing-based parameter-efficient technique to cover the shortage of sensitivity for true and false in the training process of LLMs. In this way, LLMs are less confused about embellishing and understanding, thus can execute the instructions more accurately and have enhanced abilities to distinguish hallucinations. Experimental results show that DECENT significantly improves the reliability of text summarization based on LLMs.

Translated: 2023-11-01 21:09:59 Published: 2023-10-30

# Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES ( http://arxiv.org/abs/2310.19345v1 )

License: see the linked page

Beatrice Savoldi and Marco Gaido and Matteo Negri and Luisa Bentivogli

As part of the WMT-2023 "Test suites" shared task, this paper summarizes the results of two test suite evaluations: MuST-SHE-WMT23 and INES. Focusing on the en-de and de-en language pairs, we rely on these newly created test suites to investigate systems' ability to translate feminine and masculine gender and to produce gender-inclusive translations. Furthermore, we discuss the metrics associated with our test suites and validate them by means of human evaluations. Our results indicate that systems achieve reasonable and comparable performance in correctly translating both feminine and masculine gender forms for naturalistic gender phenomena. In contrast, the generation of inclusive language forms in translation emerges as a challenging task for all the evaluated MT models, indicating room for future improvements and research on the topic.

Translated: 2023-11-01 21:09:37 Published: 2023-10-30

# Label-Only Model Inversion Attacks via Knowledge Transfer ( http://arxiv.org/abs/2310.19342v1 )

License: see the linked page

Ngoc-Bao Nguyen, Keshigeyan Chandrasegaran, Milad Abdollahzadeh, Ngai-Man Cheung

In a model inversion (MI) attack, an adversary abuses access to a machine learning (ML) model to infer and reconstruct private training data. Remarkable progress has been made in the white-box and black-box setups, where the adversary has access to the complete model or to the model's soft output, respectively. However, there has been very little study of the most challenging but practically important setup: label-only MI attacks, where the adversary has access only to the model's predicted label (hard label), without confidence scores or any other model information. In this work, we propose LOKT, a novel approach for label-only MI attacks. Our idea is based on transferring knowledge from the opaque target model to surrogate models. Using these surrogate models, our approach can then harness advanced white-box attacks. We propose knowledge transfer based on generative modelling and introduce a new model, Target model-assisted ACGAN (T-ACGAN), for effective knowledge transfer. Our method casts the challenging label-only MI setup into the more tractable white-box setup. We provide analysis supporting the claim that surrogate models based on our approach serve as effective proxies for the target model for MI. Our experiments show that our method outperforms existing state-of-the-art label-only MI attacks by more than 15% across all MI benchmarks. Furthermore, our method compares favorably in terms of query budget. Our study highlights rising privacy threats for ML models even when minimal information (i.e., hard labels) is exposed. Our code, demo, models and reconstructed data are available at our project page: https://ngoc-nguyen-0.github.io/lokt/
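The label-only access model described in the abstract above can be illustrated with a minimal Python sketch. It assumes a target that internally computes class scores but exposes only the arg-max label, and it counts queries to track the budget. The names `LabelOnlyTarget` and `label_samples` and the toy scoring function are hypothetical, introduced here for illustration only; the paper's T-ACGAN-based knowledge transfer is not reproduced.

```python
class LabelOnlyTarget:
    """Wraps a scoring function so that callers only ever see hard labels."""

    def __init__(self, score_fn):
        self._score_fn = score_fn  # hidden from the adversary
        self.queries = 0           # query budget spent so far

    def predict(self, x):
        """Return only the index of the highest score (the hard label)."""
        self.queries += 1
        scores = self._score_fn(x)
        return max(range(len(scores)), key=lambda i: scores[i])


def label_samples(target, samples):
    """Pair each sample with the target's hard label.

    Such pairs are the kind of supervision a surrogate model could be
    trained on; the surrogate training itself is out of scope here.
    """
    return [(x, target.predict(x)) for x in samples]


# Toy two-class "model": the score of class 1 grows with x (an assumption).
toy_target = LabelOnlyTarget(lambda x: [1.0 - x, x])
pairs = label_samples(toy_target, [0.1, 0.9])
print(pairs, toy_target.queries)
```

A surrogate trained on such pairs is fully visible to the attacker, which is what makes white-box attacks applicable in the abstract's description.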

Translated: 2023-11-01 21:09:20 Published: 2023-10-30

# Skywork: A More Open Bilingual Foundation Model ( http://arxiv.org/abs/2310.19341v1 )

License: see the linked page

Tianwen Wei, Liang Zhao, Lichang Zhang, Bo Zhu, Lijie Wang, Haihua Yang, Biye Li, Cheng Cheng, Weiwei Lü, Rui Hu, Chenxia Li, Liu Yang, Xilin Luo, Xuejie Wu, Lunan Liu, Wenjun Cheng, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Lei Lin, Xiaokun Wang, Yutuan Ma, Chuanhai Dong, Yanqi Sun, Yifu Chen, Yongyi Peng, Xiaojuan Liang, Shuicheng Yan, Han Fang, Yahui Zhou

In this technical report, we present Skywork-13B, a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts. This bilingual foundation model is the most extensively trained and openly published LLM of comparable size to date. We introduce a two-stage training methodology using a segmented corpus, targeting general-purpose training and then domain-specific enhancement training, respectively. We show that our model not only excels on popular benchmarks but also achieves state-of-the-art performance in Chinese language modeling on diverse domains. Furthermore, we propose a novel leakage detection method, demonstrating that test data contamination is a pressing issue warranting further investigation by the LLM community. To spur future research, we release Skywork-13B along with checkpoints obtained during intermediate stages of the training process. We are also releasing part of our SkyPile corpus, a collection of over 150 billion tokens of web text, which is the largest high-quality open Chinese pre-training corpus to date. We hope Skywork-13B and our open corpus will serve as a valuable open-source resource to democratize access to high-quality LLMs.

Translated: 2023-11-01 21:08:56 Published: 2023-10-30

# Scalable Two-Minute Feedback: Digital, Lecture-Accompanying Survey as a Continuous Feedback Instrument ( http://arxiv.org/abs/2310.19334v1 )

License: see the linked page

Armin Egetenmeier, Sven Strickroth

Detailed feedback on courses and lecture content is essential for their improvement and also serves as a tool for reflection. However, feedback methods are often used only sporadically, especially in mass courses, because collecting and analyzing feedback in a timely manner is often a challenge for teachers. Moreover, the students' current situation and the changing workload during the semester are usually not taken into account. For a holistic investigation, this article uses a digital survey format as formative feedback at two educational institutions. The survey attempts to measure student stress in a quantitative part, addresses the participants' reflection in a qualitative part, and collects general suggestions for improvement (based on the so-called One-Minute Paper). The feedback collected during the semester is evaluated qualitatively and discussed on a meta-level, and special features (e.g., reflections on student work ethic or on other courses) are addressed. The results show a low but constant rate of feedback. Responses mostly cover the lecture content or organizational aspects and were used intensively to report issues within the lecture. In addition, artificial intelligence (AI) support in the form of a large language model was tested and showed promising results in summarizing the open-ended responses for the teacher. Finally, the lecturers' experiences are reflected upon, and the results as well as possibilities for improvement are discussed.

Translated: 2023-11-01 21:08:36 Published: 2023-10-30

# Eigenstate Thermalization and its breakdown in Quantum Spin Chains with Inhomogeneous Interactions ( http://arxiv.org/abs/2310.19333v1 )

License: see the linked page

Ding-Zu Wang, Hao Zhu, Jian Cui, Javier Argüello-Luengo, Maciej Lewenstein, Guo-Feng Zhang, Piotr Sierant, Shi-Ju Ran

The eigenstate thermalization hypothesis (ETH) is a successful theory that establishes the criteria for ergodicity and thermalization in isolated quantum many-body systems. In this work, we investigate the thermalization properties of the spin-$1/2$ XXZ chain with linearly inhomogeneous interactions. We demonstrate that the introduction of inhomogeneous interactions leads to an onset of quantum chaos and thermalization, which, however, is inhibited for sufficiently strong inhomogeneity. To exhibit ETH, and to display its breakdown upon varying the strength of interactions, we probe the statistics of energy levels and the properties of matrix elements of local observables in eigenstates of the inhomogeneous XXZ spin chain. Moreover, we investigate the dynamics of the entanglement entropy and the survival probability, which provide further evidence of thermalization and its breakdown in the considered model. We outline a way to experimentally realize the XXZ chain with linearly inhomogeneous interactions in systems of ultracold atoms. Our results highlight a mechanism for the emergence of ETH due to the insertion of inhomogeneities in an otherwise integrable system and illustrate the arrest of quantum dynamics in the presence of strong interactions.
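The abstract above mentions probing the statistics of energy levels but does not say which statistic is used. A common choice, assumed here, is the mean ratio of consecutive level spacings, which is close to 0.39 for Poisson (integrable-like) spectra and close to 0.53 for spectra following Gaussian orthogonal ensemble statistics. The following Python sketch computes it for a list of energies; the function name `mean_gap_ratio` is hypothetical.

```python
def mean_gap_ratio(levels):
    """Mean of min(s_n, s_{n+1}) / max(s_n, s_{n+1}) over consecutive spacings.

    `levels` is any iterable of real energies; it is sorted internally.
    """
    energies = sorted(levels)
    spacings = [b - a for a, b in zip(energies, energies[1:])]
    ratios = []
    for s1, s2 in zip(spacings, spacings[1:]):
        larger = max(s1, s2)
        if larger > 0:  # skip fully degenerate triples to avoid dividing by zero
            ratios.append(min(s1, s2) / larger)
    if not ratios:
        raise ValueError("need at least three levels with a nonzero spacing")
    return sum(ratios) / len(ratios)


# Equally spaced levels give the maximal value 1.0; real spectra lie below it.
print(mean_gap_ratio([0.0, 1.0, 2.0, 3.0]))
```

In practice the levels would come from diagonalizing the Hamiltonian within one symmetry sector, since mixing sectors can mask level repulsion.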

Translated: 2023-11-01 21:08:13 Published: 2023-10-30

# Semiquantum proxy blind signature based on quantum teleportation ( http://arxiv.org/abs/2310.19327v1 )

License: see the linked page

Xiao Tan, Tian-Yu Ye

In this paper, we propose a novel semiquantum proxy blind signature scheme with quantum teleportation based on X states, where the original message owner, the proxy signer and the third party are quantum participants with complete quantum capabilities, while the original signer and the signature verifier are semiquantum participants with limited quantum capabilities. It turns out that our protocol not only provides complete blindness, unforgeability and non-repudiation but can also resist attacks from an eavesdropper. Compared with many previous quantum proxy blind signature protocols, our protocol may need fewer quantum resources and be easier to implement in reality, since both the original signer and the signature verifier are semiquantum participants with limited quantum capabilities.

Translated: 2023-11-01 21:07:54 Published: 2023-10-30

# A Lightweight Method to Generate Unanswerable Questions in English ( http://arxiv.org/abs/2310.19403v1 )

License: see the linked page

Vagrant Gautam, Miaoran Zhang, Dietrich Klakow

If a question cannot be answered with the available information, robust systems for question answering (QA) should know _not_ to answer. One way to build QA models that do this is with additional training data composed of unanswerable questions, created either by employing annotators or through automated methods for unanswerable question generation. To show that the model complexity of existing automated approaches is not justified, we examine a simpler data augmentation method for unanswerable question generation in English: performing antonym and entity swaps on answerable questions. Compared to the prior state of the art, data generated with our training-free and lightweight strategy results in better models (+1.6 F1 points on SQuAD 2.0 data with BERT-large) and has higher human-judged relatedness and readability. We quantify the raw benefits of our approach compared to no augmentation across multiple encoder models, using different amounts of generated data, and also on TydiQA-MinSpan data (+9.3 F1 points with BERT-large). Our results establish swaps as a simple but strong baseline for future work.
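The antonym and entity swaps named in the abstract above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' pipeline: the `ANTONYMS` lexicon, the function names and the example questions are hypothetical, and a real system would need a proper antonym resource and an entity recognizer.

```python
import re

# Hypothetical toy lexicon, for illustration only.
ANTONYMS = {"largest": "smallest", "first": "last", "before": "after"}

_PATTERN = r"\b(" + "|".join(map(re.escape, ANTONYMS)) + r")\b"


def antonym_swap(question):
    """Replace the first word found in ANTONYMS by its antonym.

    Returns the perturbed question, or None if no known word occurs.
    The replacement is always lowercase, a simplification of this sketch.
    """
    def repl(match):
        return ANTONYMS[match.group(0).lower()]

    swapped, count = re.subn(_PATTERN, repl, question, count=1, flags=re.IGNORECASE)
    return swapped if count else None


def entity_swap(question, entity, replacement):
    """Replace the first occurrence of `entity` by another entity, or return None."""
    if entity not in question:
        return None
    return question.replace(entity, replacement, 1)


print(antonym_swap("What is the largest city in France?"))
print(entity_swap("When was Paris founded?", "Paris", "Lyon"))
```

A swapped question is only likely, not guaranteed, to be unanswerable from the original passage, so generated data may still need filtering.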

Translated: 2023-11-01 21:00:34 Published: 2023-10-30

# PlayTest: A Gamified Test Generator for Games ( http://arxiv.org/abs/2310.19402v1 )

License: see the linked page

Patric Feldmeier, Philipp Straubinger, Gordon Fraser

Games are usually created incrementally, requiring repeated testing of the same scenarios, which is a tedious and error-prone task for game developers. Therefore, we aim to ease this game testing process by encapsulating it in a game called Playtest, which transforms the tiring testing process into a competitive game with a purpose. Playtest automates the generation of valuable test cases based on player actions, without the players even realising it. We envision using Playtest to crowdsource the task of testing games by giving players access to the respective games through our tool in the playtesting phases of the development process.

Translated: 2023-11-01 21:00:01 Published: 2023-10-30

# LightSAGE: Graph Neural Networks for Large Scale Item Retrieval in Shopee's Advertisement Recommendation ( http://arxiv.org/abs/2310.19394v1 )

License: see the linked page

Dang Minh Nguyen, Chenfei Wang, Yan Shen, Yifan Zeng

Graph neural networks (GNNs) are a trending solution for item retrieval in recommendation problems. Most recent reports, however, focus heavily on new model architectures. This may leave gaps when applying GNNs in an industrial setup, where, besides the model, constructing the graph and handling data sparsity also play critical roles in the overall success of the project. In this work, we report how GNNs are applied to large-scale e-commerce item retrieval at Shopee. We introduce our simple yet novel and impactful techniques in graph construction, modeling, and handling data skewness. Specifically, we construct high-quality item graphs by combining strong-signal user behaviors with a high-precision collaborative filtering (CF) algorithm. We then develop a new GNN architecture named LightSAGE to produce high-quality item embeddings for vector search. Finally, we design multiple strategies to handle cold-start and long-tail items, which are critical in an advertisement (ads) system. Our models bring improvements in offline evaluations and online A/B tests, and are deployed to the main traffic of Shopee's Recommendation Advertisement system.

Translated: 2023-11-01 20:59:50 Published: 2023-10-30

# A Clinical Guideline Driven Automated Linear Feature Extraction for Vestibular Schwannoma ( http://arxiv.org/abs/2310.19392v1 )

License: see the linked page

Navodini Wijethilake, Steve Connor, Anna Oviedova, Rebecca Burger, Tom Vercauteren, Jonathan Shapey

Vestibular schwannoma is a benign brain tumour that grows from one of the balance nerves. Patients may be treated by surgery, radiosurgery or with a conservative "wait-and-scan" strategy. Clinicians typically use manually extracted linear measurements to aid clinical decision making. This work aims to automate and improve this process by using deep-learning-based segmentation to extract relevant clinical features through computational algorithms. To the best of our knowledge, our study is the first to propose an automated approach to replicate local clinical guidelines. Our deep-learning-based segmentation achieved Dice scores of 0.8124 ± 0.2343 and 0.8969 ± 0.0521 for the extrameatal and whole tumour regions, respectively, on T2-weighted MRI, whereas 0.8222 ± 0.2108 and 0.9049 ± 0.0646 were obtained on T1-weighted MRI. We propose a novel algorithm to choose and extract the most appropriate maximum linear measurement from the segmented regions based on the size of the extrameatal portion of the tumour. Using this tool, clinicians will be provided with a visual guide and related metrics on tumour progression that will function as a clinical decision aid. In this study, we utilize 187 scans obtained from 50 patients referred to a tertiary specialist neurosurgical service in the United Kingdom. The measurements extracted manually by an expert neuroradiologist showed a significant correlation with the automated measurements (p < 0.0001).
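The Dice scores reported in the abstract above measure the overlap between a predicted and a reference segmentation. As a minimal sketch, assuming binary masks flattened into sequences of 0/1 values, the coefficient 2|A ∩ B| / (|A| + |B|) can be computed as follows; the function name `dice_score` and the empty-mask convention are choices made here, not taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * intersection / total


# One voxel in common, two foreground voxels in each mask.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

For small structures, a few mislabeled voxels change the score substantially, which is one plausible reason for the larger spread reported for the extrameatal region.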

Translated: 2023-11-01 20:59:33 Published: 2023-10-30

# Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness ( http://arxiv.org/abs/2310.19391v1 )

License: see the linked page

Ahmad-Reza Ehyaei, Golnoosh Farnadi, Samira Samadi

Adversarial perturbation is used to expose vulnerabilities in machine learning models, while the concept of individual fairness aims to ensure equitable treatment regardless of sensitive attributes. Despite their initial differences, both concepts rely on metrics to generate similar input data instances. These metrics should be designed to align with the data's characteristics, especially when the data is derived from a causal structure, and should reflect counterfactual proximity. Previous attempts to define such metrics often lack general assumptions about data or structural causal models. In this research, we introduce a causal fair metric formulated on the basis of causal structures that encompass sensitive attributes. For robustness analysis, the concept of protected causal perturbation is presented. Additionally, we delve into metric learning, proposing a method for metric estimation and deployment in real-world problems. The introduced metric has applications in the fields of adversarial training, fair learning, algorithmic recourse, and causal reinforcement learning.

Translated: 2023-11-01 20:59:10 Published: 2023-10-30

# Implicit Manifold Gaussian Process Regression ( http://arxiv.org/abs/2310.19390v1 )

License: see the linked page

Bernardo Fichera, Viacheslav Borovitskiy, Andreas Krause, Aude Billard

Gaussian process regression is widely used because of its ability to provide well-calibrated uncertainty estimates and to handle small or sparse datasets. However, it struggles with high-dimensional data. One possible way to scale this technique to higher dimensions is to leverage the implicit low-dimensional manifold upon which the data actually lies, as postulated by the manifold hypothesis. Prior work, however, ordinarily requires the manifold structure to be provided explicitly, i.e., given as a mesh or known to be one of the well-known manifolds such as the sphere. In contrast, in this paper we propose a Gaussian process regression technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way. For the resulting model, we discuss its convergence to the Matérn Gaussian process on the assumed manifold. Our technique scales up to hundreds of thousands of data points and may improve the predictive performance and calibration of standard Gaussian process regression in high-dimensional settings.

Translated: 2023-11-01 20:58:54 Published: 2023-10-30

# Othello is Solved ( http://arxiv.org/abs/2310.19387v1 )

License: see the linked page

Hiroki Takizawa

The game of Othello is one of the world's most complex and popular games that has yet to be computationally solved. Othello has roughly ten octodecillion (10 to the 58th power) possible game records and ten octillion (10 to the 28th power) possible game positions. Solving Othello, that is, determining the outcome of a game in which neither player makes a mistake, has long been a grand challenge in computer science. This paper announces a significant milestone: Othello is now solved. It has been computationally proved that perfect play by both players leads to a draw. Strong Othello software has long been built using heuristically designed search techniques. Solving a game provides a solution that enables software to play the game perfectly.
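To make concrete what "solving" a game means in the abstract above, the sketch below exhaustively solves a far smaller game, tic-tac-toe, with memoized negamax search. It is an illustration of the concept only and says nothing about the search techniques used for Othello; the board encoding (a 9-character string with `.` for empty cells) and the function names are assumptions of this example.

```python
from functools import lru_cache

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def solve(board, player):
    """Value for `player`, who is to move: +1 win, 0 draw, -1 loss under perfect play."""
    won = winner(board)
    if won is not None:
        return 1 if won == player else -1
    if "." not in board:
        return 0  # full board without a winner is a draw
    other = "O" if player == "X" else "X"
    best = -1
    for i, cell in enumerate(board):
        if cell == ".":
            child = board[:i] + player + board[i + 1:]
            best = max(best, -solve(child, other))  # negamax: opponent's loss is our gain
            if best == 1:
                break  # cannot do better than a forced win
    return best


# Perfect play from the empty board leads to a draw.
print(solve("." * 9, "X"))
```

Tic-tac-toe has only a few thousand reachable positions, so plain memoization suffices; the roughly 10 to the 28th power positions quoted for Othello are what make the same question a grand challenge.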

Translated: 2023-11-01 20:58:37 Published: 2023-10-30

# Gradient-free online learning of subgrid-scale dynamics with neural emulators ( http://arxiv.org/abs/2310.19385v1 )

License: see the linked page

Hugo Frezat, Guillaume Balarac, Julien Le Sommer, Ronan Fablet

In this paper, we propose a generic algorithm to train machine-learning-based subgrid parametrizations online, i.e., with _a posteriori_ loss functions, for non-differentiable numerical solvers. The proposed approach leverages neural emulators to train an approximation of the reduced state-space solver, which is then used to allow gradient propagation through temporal integration steps. The algorithm is able to recover most of the benefit of online strategies without having to compute the gradient of the original solver. We demonstrate that training the neural emulator and the parametrization components separately, each with its own loss, is necessary to minimize the propagation of approximation bias.

Translated: 2023-11-01 20:58:25 Published: 2023-10-30

# Deep anytime-valid hypothesis testing ( http://arxiv.org/abs/2310.19384v1 )

License: see the linked page

Teodora Pandeva and Patrick Forré and Aaditya Ramdas and Shubhanshu Shekhar

We propose a general framework for constructing powerful sequential hypothesis tests for a large class of nonparametric testing problems. The null hypothesis for these problems is defined in an abstract form using the action of two known operators on the data distribution. This abstraction allows for a unified treatment of several classical tasks, such as two-sample testing, independence testing, and conditional-independence testing, as well as modern problems, such as testing for adversarial robustness of machine learning (ML) models. Our proposed framework has the following advantages over classical batch tests: 1) it continuously monitors online data streams and efficiently aggregates evidence against the null, 2) it provides tight control over the type I error without the need for multiple testing correction, and 3) it adapts the sample size requirement to the unknown hardness of the problem. We develop a principled approach for leveraging the representation capability of ML models within the testing-by-betting framework, a game-theoretic approach for designing sequential tests. Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines on several tasks.
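The testing-by-betting idea referred to in the abstract above can be illustrated on a much simpler problem than the paper's. The sketch below assumes observations bounded in [-1, 1] and tests the null hypothesis that each has conditional mean zero: a bettor's wealth is multiplied by 1 + bet * x at every step, which makes it a nonnegative martingale under the null, so by Ville's inequality it exceeds 1/alpha with probability at most alpha. The fixed bet size and the function name `betting_test` are simplifications introduced here; in the paper the betting strategy is learned with ML models.

```python
def betting_test(stream, alpha=0.05, bet=0.5):
    """One-sided sequential test of H0: E[x_t | past] = 0 for x_t in [-1, 1].

    Returns the 1-based time at which H0 is rejected, or None if the
    stream ends first. The test may be stopped at any time without
    inflating the type I error above `alpha`.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not 0.0 <= bet < 1.0:
        raise ValueError("bet must lie in [0, 1) so that wealth stays positive")
    wealth = 1.0
    for t, x in enumerate(stream, start=1):
        if not -1.0 <= x <= 1.0:
            raise ValueError("observations must lie in [-1, 1]")
        wealth *= 1.0 + bet * x  # fair bet under H0, profitable if the mean is positive
        if wealth >= 1.0 / alpha:
            return t
    return None


# A stream of constant +1 observations: wealth grows as 1.5 ** t.
print(betting_test([1.0] * 20))
```

Because evidence accumulates multiplicatively, easy alternatives are rejected after few samples and harder ones later, which is the adaptive sample size behaviour listed as advantage 3) above.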

Translated: 2023-11-01 20:58:12 Published: 2023-10-30

# Corrected Bell and Noncontextuality Inequalities for Realistic Experiments ( http://arxiv.org/abs/2310.19383v1 )

License: see the linked page

Kim Vallée, Pierre-Emmanuel Emeriau, Boris Bourdoncle, Adel Sohbi, Shane Mansfield, Damian Markham

Contextuality is a feature of quantum correlations. It is crucial from a foundational perspective as a nonclassical phenomenon, and from an applied perspective as a resource for quantum advantage. It is commonly defined in terms of hidden variables, for which it forces a contradiction with the assumptions of parameter-independence and determinism. The former can be justified by the empirical property of non-signalling or non-disturbance, and the latter by the empirical property of measurement sharpness. However, in realistic experiments neither empirical property holds exactly, which leads to possible objections to contextuality as a form of nonclassicality and to potential vulnerabilities for supposed quantum advantages. We introduce measures to quantify both properties, together with quantified relaxations of the corresponding assumptions. We prove the continuity of a known measure of contextuality, the contextual fraction, which ensures its robustness to noise. We then bound the extent to which these relaxations can account for contextuality, via correction terms to the contextual fraction (or to any noncontextuality inequality), culminating in a notion of genuine contextuality that is robust to experimental imperfections. Finally, we show that our result is general enough to apply or relate to a variety of established results and experimental setups.

Translated: 2023-11-01 20:57:55 Published: 2023-10-30

# Protecting Publicly Available Data With Machine Learning Shortcuts ( http://arxiv.org/abs/2310.19381v1 )

License: see the linked page

Nicolas M. Müller, Maximilian Burgert, Pascal Debus, Jennifer Williams, Philip Sperl, Konstantin Böttinger

Machine-learning (ML) shortcuts, or spurious correlations, are artifacts in datasets that lead to very good training and test performance but severely limit the model's generalization capability. Such shortcuts are insidious because they go unnoticed due to good in-domain test performance. In this paper, we explore the influence of different shortcuts and show that even simple shortcuts are difficult to detect with explainable AI methods. We then exploit this fact and design an approach to defend online databases against crawlers: providers such as dating platforms, clothing manufacturers, or used car dealers have to deal with a professionalized crawling industry that grabs and resells data points on a large scale. We show that a deterrent can be created by deliberately adding ML shortcuts. Such augmented datasets are then unusable for ML use cases, which deters crawlers and the unauthorized use of data from the internet. Using real-world data from three use cases, we show that the proposed approach renders such collected data unusable, while the shortcut is at the same time difficult for humans to notice. Thus, our proposed approach can serve as a proactive protection against illegitimate data crawling.
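As a rough illustration of what deliberately adding a shortcut could look like, the sketch below hides the class label in the low-order intensity of a single pixel of a grayscale image, so that a model can read the label from that pixel instead of learning the real task. This particular encoding is an assumption made for this example and is not claimed to be one of the shortcuts studied in the paper; the function name `add_shortcut` is likewise hypothetical.

```python
def add_shortcut(image, label, num_classes):
    """Return a copy of `image` whose top-left pixel encodes `label`.

    `image` is a list of rows of integer intensities in 0..255. After the
    call, marked[0][0] % num_classes == label, while the pixel changes by
    fewer than `num_classes` intensity levels.
    """
    if not 1 <= num_classes <= 256:
        raise ValueError("num_classes must lie in 1..256")
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    marked = [row[:] for row in image]  # copy so the original stays untouched
    pixel = marked[0][0]
    value = pixel - pixel % num_classes + label
    if value > 255:
        value -= num_classes  # stay inside the valid intensity range
    marked[0][0] = value
    return marked


original = [[200, 10], [30, 40]]
protected = add_shortcut(original, label=3, num_classes=10)
print(protected[0][0], protected[0][0] % 10)
```

A model trained on such data can reach high in-domain accuracy by reading one pixel, which is the failure mode the abstract describes; a practical scheme would also have to survive resizing and compression, which this sketch does not attempt.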

翻訳日:2023-11-01 20:57:32 公開日:2023-10-30

# TransXNet: 視覚認識のためのDual Dynamic Token Mixerによるグローバルおよびローカルダイナミクスの学習

TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition ( http://arxiv.org/abs/2310.19380v1 )

ライセンス: Link先を確認

Meng Lou, Hong-Yu Zhou, Sibei Yang, Yizhou Yu

(参考訳) 近年の研究では,帰納バイアスの導入と一般化性能の向上を目的として,Transformerに畳み込みを取り入れている。しかし、従来の畳み込みの静的な性質は、入力のバリエーションに動的に適応することを妨げるため、自己注意が注意行列を動的に計算するのに対し、畳み込みと自己注意の間に表現の相違が生じる。さらに、畳み込みと自己アテンションからなるトークンミキサーを積み重ねてディープネットワークを形成すると、畳み込みの静的性質は、自己アテンションによって生成された特徴を畳み込みカーネルに融合させるのを妨げる。これら2つの制限は、構築されたネットワークの準最適な表現能力をもたらす。そこで本研究では,グローバルな情報と局所的な詳細を入力依存的に集約する軽量なDual Dynamic Token Mixer (D-Mixer)を提案する。 D-Mixerは、効率的なグローバルアテンションモジュールと入力依存の深さ方向(depthwise)畳み込みを均等に分割した特徴セグメントに別々に適用し、ネットワークに強い帰納バイアスと拡張された有効受容野を与える。我々は,新しいハイブリッドCNN-TransformerビジョンバックボーンネットワークであるTransXNetを設計する上で,基本的なビルディングブロックとしてD-Mixerを使用している。 ImageNet-1Kの画像分類タスクでは、TransXNet-TはSwin-Tをtop-1精度で0.3\%上回り、計算コストは半分以下である。さらに、TransXNet-SとTransXNet-Bは優れたモデルスケーラビリティを示し、それぞれ83.8\%と84.6\%のtop-1精度を妥当な計算コストで実現した。さらに,提案するネットワークアーキテクチャは,様々な密な予測タスクにおいて強力な一般化能力を示し,他の最先端ネットワークを上回りつつ計算コストも低い。

Recent studies have integrated convolution into transformers to introduce inductive bias and improve generalization performance. However, the static nature of conventional convolution prevents it from dynamically adapting to input variations, resulting in a representation discrepancy between convolution and self-attention as self-attention calculates attention matrices dynamically. Furthermore, when stacking token mixers that consist of convolution and self-attention to form a deep network, the static nature of convolution hinders the fusion of features previously generated by self-attention into convolution kernels. These two limitations result in a sub-optimal representation capacity of the constructed networks. To find a solution, we propose a lightweight Dual Dynamic Token Mixer (D-Mixer) that aggregates global information and local details in an input-dependent way. D-Mixer works by applying an efficient global attention module and an input-dependent depthwise convolution separately on evenly split feature segments, endowing the network with strong inductive bias and an enlarged effective receptive field. We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network that delivers compelling performance. In the ImageNet-1K image classification task, TransXNet-T surpasses Swin-T by 0.3\% in top-1 accuracy while requiring less than half of the computational cost. Furthermore, TransXNet-S and TransXNet-B exhibit excellent model scalability, achieving top-1 accuracy of 83.8\% and 84.6\% respectively, with reasonable computational costs. Additionally, our proposed network architecture demonstrates strong generalization capabilities in various dense prediction tasks, outperforming other state-of-the-art networks while having lower computational costs.

翻訳日:2023-11-01 20:57:12 公開日:2023-10-30

# 画像生成器のFew-shotハイブリッドドメイン適応

Few-shot Hybrid Domain Adaptation of Image Generators ( http://arxiv.org/abs/2310.19378v1 )

ライセンス: Link先を確認

Hengjia Li, Yang Liu, Linxuan Xia, Yuqi Lin, Tu Zheng, Zheng Yang, Wenxiao Wang, Xiaohui Zhong, Xiaobo Ren, Xiaofei He

(参考訳) 事前学習されたジェネレータは、複数のターゲットドメインのハイブリッドに適応し、それらの統合された属性で画像を生成することができるか? 本研究では、Few-shot Hybrid Domain Adaptation (HDA)という新しいタスクを導入する。ソースジェネレータといくつかのターゲットドメインが与えられたとき、HDAは、ソースドメインの特性を上書きすることなく、すべてのターゲットドメインの統合属性を保持する適応型ジェネレータの獲得を目指している。ドメイン適応(DA)と比較して、HDAはジェネレータをより複合的で広範なドメインに適応するための柔軟性と汎用性を提供します。同時に、HDAでは個々のターゲットドメインの画像のみにアクセスでき、ハイブリッドドメインの実画像が欠如しているため、DAよりも多くの課題があります。この問題に対処するために、異なるドメインの画像を分離しやすい部分空間に直接エンコードする、識別器を用いない(discriminator-free)フレームワークを導入する。 HDAを実現するために,距離損失と方向損失からなる新たな方向部分空間損失(directional subspace loss)を提案する。具体的には、距離損失は、生成された画像からすべての対象部分空間までの距離を減らすことにより、すべての対象領域の属性をブレンドする。方向損失は、部分空間に垂直な方向に沿って適応を導くことによって、ソース領域の特性を保存する。実験により、本手法は単一の適応型ジェネレータで多数のドメイン固有の属性を獲得でき、セマンティクス類似性、画像忠実性、ドメイン間の一貫性においてベースライン手法を上回ることを示した。

Can a pre-trained generator be adapted to the hybrid of multiple target domains and generate images with integrated attributes of them? In this work, we introduce a new task -- Few-shot Hybrid Domain Adaptation (HDA). Given a source generator and several target domains, HDA aims to acquire an adapted generator that preserves the integrated attributes of all target domains, without overriding the source domain's characteristics. Compared with Domain Adaptation (DA), HDA offers greater flexibility and versatility to adapt generators to more composite and expansive domains. Simultaneously, HDA also presents more challenges than DA as we have access only to images from individual target domains and lack authentic images from the hybrid domain. To address this issue, we introduce a discriminator-free framework that directly encodes different domains' images into well-separable subspaces. To achieve HDA, we propose a novel directional subspace loss comprised of a distance loss and a direction loss. Concretely, the distance loss blends the attributes of all target domains by reducing the distances from generated images to all target subspaces. The direction loss preserves the characteristics from the source domain by guiding the adaptation along the perpendicular to subspaces. Experiments show that our method can obtain numerous domain-specific attributes in a single adapted generator, which surpasses the baseline methods in semantic similarity, image fidelity, and cross-domain consistency.

翻訳日:2023-11-01 20:56:38 公開日:2023-10-30

# 二次元共形場理論における状態準備としての不均一クエンチ

Inhomogeneous quenches as state preparation in two-dimensional conformal field theories ( http://arxiv.org/abs/2310.19376v1 )

ライセンス: Link先を確認

Masahiro Nozaki, Kotaro Tamaoka, Mao Tian Tan

(参考訳) システムが特徴のない状態に進化しない非平衡過程は、非平衡現象における新しい中心対象の1つである。本稿では,2次元共形場理論($2$d CFTs)における短距離エンタングル状態,すなわち正則化を伴う境界状態から出発し,M\"obius/SSDと呼ばれる不均一ハミルトニアンで系を時間発展させる。この論文で考慮されたCFTの詳細にかかわらず、M\"obius時間発展の間、絡み合いエントロピーは量子回復と呼ばれる周期運動を示す。 SSD時間発展の間、一部のサブシステムを除いて、長時間領域では、絡み合いエントロピーと相互情報が真空状態のものと近似される。サブシステムが真空状態に冷える時間領域は$t_1 \gg \mathcal{O}(L\sqrt{l_A})$であり、ここで$t_1$、$L$、$l_A$は時間、システムサイズ、サブシステムサイズである。この結果は, SSDハミルトニアンにより誘導される不均一なクエンチが, ほぼ真空状態の準備として用いられることを示唆している。さらに,本論文で検討した系の重力双対を提案し,一般化する。加えて,不均一なクエンチと連続的マルチスケールエンタングルメント繰り込みアンザッツ (cMERA) との関係について考察する。

Non-equilibrium processes in which the system does not evolve to a featureless state are among the new central objects of study in non-equilibrium phenomena. In this paper, starting from a short-range entangled state in two-dimensional conformal field theories ($2$d CFTs), namely the boundary state with a regularization, we evolve the system with the inhomogeneous Hamiltonians called M\"obius/SSD ones. Regardless of the details of the CFTs considered in this paper, during the M\"obius evolution, the entanglement entropy exhibits a periodic motion called quantum revival. During SSD time evolution, except for some subsystems, in the large time regime, the entanglement entropy and mutual information are approximated by those for the vacuum state. We argue that the time regime for the subsystem to cool down to the vacuum one is $t_1 \gg \mathcal{O}(L\sqrt{l_A})$, where $t_1$, $L$, and $l_A$ are the time, system size, and subsystem size. This finding suggests that the inhomogeneous quench induced by the SSD Hamiltonian may be used to prepare an approximately-vacuum state. Furthermore, we propose the gravity dual of the systems considered in this paper and generalize it. In addition, we discuss the relation between inhomogeneous quenches and the continuous multi-scale entanglement renormalization ansatz (cMERA).

翻訳日:2023-11-01 20:56:13 公開日:2023-10-30

# シーン固有の融合モジュールによるRGB-Xオブジェクト検出

RGB-X Object Detection via Scene-Specific Fusion Modules ( http://arxiv.org/abs/2310.19372v1 )

ライセンス: Link先を確認

Sri Aditya Deevi, Connor Lee, Lu Gan, Sushruth Nagesh, Gaurav Pandey, and Soon-Jo Chung

(参考訳) マルチモーダル深層センサー融合は、自動運転車が周囲の環境をあらゆる天候下で視覚的に理解することを可能にする可能性がある。しかし、既存の深層センサー融合法では、通常、マルチモーダル特徴が混在する複雑なアーキテクチャを採用しており、トレーニングには大規模な位置合わせ済み(coregistered)マルチモーダルデータセットを必要とする。本研究では,シーン固有の融合モジュールを介し,事前学習した単一モードモデルの活用と融合が可能な,効率的かつモジュール化されたRGB-X融合ネットワークを提案する。これにより、小規模な位置合わせ済みマルチモーダルデータセットで、入力適応型の統合ネットワークアーキテクチャを構築できる。実験では,RGB-thermalおよびRGB-gatedデータセットにおいて,少量の追加パラメータのみを用いて融合を行う本手法が既存手法より優れていることを示す。私たちのコードはhttps://github.com/dsriaditya999/RGBXFusionで利用可能です。

Multimodal deep sensor fusion has the potential to enable autonomous vehicles to visually understand their surrounding environments in all weather conditions. However, existing deep sensor fusion methods usually employ convoluted architectures with intermingled multimodal features, requiring large coregistered multimodal datasets for training. In this work, we present an efficient and modular RGB-X fusion network that can leverage and fuse pretrained single-modal models via scene-specific fusion modules, thereby enabling joint input-adaptive network architectures to be created using small, coregistered multimodal datasets. Our experiments demonstrate the superiority of our method compared to existing works on RGB-thermal and RGB-gated datasets, performing fusion using only a small amount of additional parameters. Our code is available at https://github.com/dsriaditya999/RGBXFusion.

翻訳日:2023-11-01 20:55:43 公開日:2023-10-30

# 区間値データと区間値関数データの順序分類

Ordinal classification for interval-valued data and interval-valued functional data ( http://arxiv.org/abs/2310.19433v1 )

ライセンス: Link先を確認

Aleix Alcacer, Marina Mart\'inez-Garcia, Irene Epifanio

(参考訳) 順序分類の目的は、観測された入力の集合から出力の順序ラベルを予測することである。区間値データとは、区間の形をとるデータを指す。順序分類問題において、初めて区間値データと区間値関数データが入力として扱われる。区間データと区間値関数データに対する6つの順序分類器を提案する。 3つはパラメトリックであり、そのうち1つは順序二項分解、他の2つは順序ロジスティック回帰に基づいている。残りの3つの手法は、区間データ間の距離と区間データ上のカーネルの利用に基づいている。方法の1つは、順序分類に重み付き$k$-nearest-neighbor法を用いる。別の方法はカーネル主成分分析と順序分類器を組み合わせる。そして、最も性能が良い第6の方法は、カーネルによって誘導される順序ランダムフォレストを使用する。それらは、合成データおよび人間のグローバル開発や気象データに関するオリジナルの実データを用いた広範囲な実験研究において、na\"iveなアプローチと比較される。その結果,順序と区間値情報を考慮すると精度が向上することがわかった。ソースコードとデータセットはhttps://github.com/aleixalcacer/OCFIVDで入手できる。

The aim of ordinal classification is to predict the ordered labels of the output from a set of observed inputs. Interval-valued data refers to data in the form of intervals. For the first time, interval-valued data and interval-valued functional data are considered as inputs in an ordinal classification problem. Six ordinal classifiers for interval data and interval-valued functional data are proposed. Three of them are parametric, one of them is based on ordinal binary decompositions and the other two are based on ordered logistic regression. The other three methods are based on the use of distances between interval data and kernels on interval data. One of the methods uses the weighted $k$-nearest-neighbor technique for ordinal classification. Another method considers kernel principal component analysis plus an ordinal classifier. And the sixth method, which is the method that performs best, uses a kernel-induced ordinal random forest. They are compared with na\"ive approaches in an extensive experimental study with synthetic and original real data sets, about human global development, and weather data. The results show that considering ordering and interval-valued information improves the accuracy. The source code and data sets are available at https://github.com/aleixalcacer/OCFIVD.
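As a rough illustration of the distance-based idea in the abstract above, the following Python sketch runs a weighted k-nearest-neighbor ordinal prediction on interval-valued inputs. It is not the authors' implementation: the Hausdorff distance between intervals, the inverse-distance weights, the weighted-median decision rule, and the toy temperature data are all assumptions chosen for this example.

```python
def hausdorff(a, b):
    """Hausdorff distance between two closed intervals given as (lo, hi)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def interval_distance(x, y):
    """Sum of per-feature Hausdorff distances (an assumed aggregation)."""
    return sum(hausdorff(a, b) for a, b in zip(x, y))


def predict_ordinal(train_x, train_y, query, k=3):
    """Weighted k-NN: return the weighted median of the neighbours' labels.

    Labels are integer codes whose numeric order matches the ordinal order.
    """
    neighbours = sorted(
        (interval_distance(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    weighted = sorted((y, 1.0 / (d + 1e-9)) for d, y in neighbours)
    half = sum(w for _, w in weighted) / 2.0
    running = 0.0
    for label, w in weighted:
        running += w
        if running >= half:
            return label


# Hypothetical data: one feature, the daily temperature range in degrees C.
# Ordinal labels: 0 = cold < 1 = mild < 2 = hot.
train_x = [[(0, 5)], [(2, 8)], [(10, 18)], [(12, 20)], [(22, 30)], [(25, 33)]]
train_y = [0, 0, 1, 1, 2, 2]
```

The weighted median is used here instead of a majority vote because it respects the label order: a prediction can never jump past the bulk of the neighbours' weight.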

翻訳日:2023-11-01 20:47:22 公開日:2023-10-30

# ロボット操作のためのディープポリシーネットワークの決定を説明する

Explaining the Decisions of Deep Policy Networks for Robotic Manipulations ( http://arxiv.org/abs/2310.19432v1 )

ライセンス: Link先を確認

Seongun Kim, Jaesik Choi

(参考訳) ディープポリシーネットワークは、ロボットが行動を学び、エンド・ツー・エンドの方法で様々な現実世界の複雑なタスクを解決できるようにする。しかし、行動の理由を提供するための透明性が欠如している。したがって、そのようなブラックボックスモデルは、実際にロボットを配置する際の信頼性が低く、破壊的な動作をもたらすことが多い。透明性を高めるためには,各入力特徴が与えられた行動決定にどの程度寄与するかを考慮し,ロボットの動作を説明することが重要である。本稿では,入力帰属法による深い政策モデルの明示的な分析を行い,各入力特徴がロボットの政策モデルの判断にどの程度影響するかを説明する。そこで本研究では,ロボットポリシネットワークに入力帰属法を適用するための2つの方法を提案する。(1) エンドエフェクタ運動に対するモータトルクの影響を反映するために,各関節トルクの重要度を測定し,(2) 負の入力と深いポリシネットワークの出力を適切に処理するための関連伝搬法を修正する。我々の知る限りでは、ロボット操作のためにオンラインのディープポリシーネットワークにおけるマルチモーダルセンサ入力の入力属性の動的変化を特定する最初のレポートである。

Deep policy networks enable robots to learn behaviors to solve various real-world complex tasks in an end-to-end fashion. However, they lack transparency to provide the reasons of actions. Thus, such a black-box model often results in low reliability and disruptive actions during the deployment of the robot in practice. To enhance its transparency, it is important to explain robot behaviors by considering the extent to which each input feature contributes to determining a given action. In this paper, we present an explicit analysis of deep policy models through input attribution methods to explain how and to what extent each input feature affects the decisions of the robot policy models. To this end, we present two methods for applying input attribution methods to robot policy networks: (1) we measure the importance factor of each joint torque to reflect the influence of the motor torque on the end-effector movement, and (2) we modify a relevance propagation method to handle negative inputs and outputs in deep policy networks properly. To the best of our knowledge, this is the first report to identify the dynamic changes of input attributions of multi-modal sensor inputs in deep policy networks online for robotic manipulation.

翻訳日:2023-11-01 20:47:06 公開日:2023-10-30

# 実行不可能な計画の自動検出による信頼性の高い行動合成のための拡散プランナーの改良

Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans ( http://arxiv.org/abs/2310.19427v1 )

ライセンス: Link先を確認

Kyowoon Lee, Seongun Kim and Jaesik Choi

(参考訳) 拡散型計画法は, 軌道拡散モデルの訓練と補助誘導関数を用いたサンプル軌道の条件付けにより, 長期的, スパース・リワードタスクにおいて有望な結果を示した。しかし、生成モデルとしての性質から、拡散モデルは実現可能な計画を生成することが保証されていないため、実行が失敗し、プランナーが安全クリティカルな応用に役立たなくなる。本研究では,誤りを起こしやすい計画に改良のための誘導を与えることで,拡散モデルが生み出す信頼できない計画を改善する新しい手法を提案する。そこで本研究では,拡散モデルにより生成された個別計画の品質を評価するための新たな指標として,復元ギャップ(restoration gap)を提案する。復元ギャップはギャップ予測器により推定され,ギャップ予測器は拡散プランナーを改良するための復元ギャップ誘導を生成する。さらに,準最適なギャップ予測器から発生しうる敵対的な改良誘導を防止し,実現不可能な計画のさらなる改良を可能にするアトリビューションマップ正則化器を提案する。提案手法は,長期計画を必要とするオフライン制御設定における3つのベンチマークで有効性を示す。また,提案手法は,ギャップ予測器の帰属マップを提示し,誤りを起こしやすい遷移を強調することにより説明可能性を示し,生成した計画のより深い理解を可能にする。

Diffusion-based planning has shown promising results in long-horizon, sparse-reward tasks by training trajectory diffusion models and conditioning the sampled trajectories using auxiliary guidance functions. However, due to their nature as generative models, diffusion models are not guaranteed to generate feasible plans, resulting in failed execution and precluding planners from being useful in safety-critical applications. In this work, we propose a novel approach to refine unreliable plans generated by diffusion models by providing refining guidance to error-prone plans. To this end, we suggest a new metric named restoration gap for evaluating the quality of individual plans generated by the diffusion model. A restoration gap is estimated by a gap predictor which produces restoration gap guidance to refine a diffusion planner. We additionally present an attribution map regularizer to prevent adversarial refining guidance that could be generated from the sub-optimal gap predictor, which enables further refinement of infeasible plans. We demonstrate the effectiveness of our approach on three different benchmarks in offline control settings that require long-horizon planning. We also illustrate that our approach presents explainability by presenting the attribution maps of the gap predictor and highlighting error-prone transitions, allowing for a deeper understanding of the generated plans.

翻訳日:2023-11-01 20:46:45 公開日:2023-10-30

# 人工知能と人文科学の限界

Artificial intelligence and the limits of the humanities ( http://arxiv.org/abs/2310.19425v1 )

ライセンス: Link先を確認

W{\l}odzis{\l}aw Duch

(参考訳) 現代社会における文化の複雑さは人間の理解を超えている。認知科学は、精神モデルに基づく伝統的な説明に疑問を投げかけた。人文科学における中核的な主題は、その重要性を失う可能性がある。人間はデジタルの時代に適応しなければならない。人文科学の新しい学際分野が出現する。情報への即時アクセスは、知識への即時アクセスに置き換えられる。人類の認知的限界と、世界的課題に対処するために必要な人工知能と学際研究の発展によって開かれた機会を理解することが、人文科学の活性化の鍵となる。人工知能は、芸術から政治科学、哲学まで、人文科学を根本的に変え、これらの規律を学生にとって魅力的なものにし、現在の制限を超えてそれを可能にします。

The complexity of cultures in the modern world is now beyond human comprehension. Cognitive sciences cast doubts on the traditional explanations based on mental models. The core subjects in humanities may lose their importance. Humanities have to adapt to the digital age. New, interdisciplinary branches of humanities emerge. Instant access to information will be replaced by instant access to knowledge. Understanding the cognitive limitations of humans and the opportunities opened by the development of artificial intelligence and interdisciplinary research necessary to address global challenges is the key to the revitalization of humanities. Artificial intelligence will radically change humanities, from art to political sciences and philosophy, making these disciplines attractive to students and enabling them to go beyond current limitations.

翻訳日:2023-11-01 20:46:20 公開日:2023-10-30

# 教師なしスキル発見のための変分カリキュラム強化学習

Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills ( http://arxiv.org/abs/2310.19424v1 )

ライセンス: Link先を確認

Seongun Kim, Kyowoon Lee, Jaesik Choi

(参考訳) 相互情報(MI)の最大化や変動エンパワーメントを通じて,タスク指向の報酬関数を使わずに複雑なスキルを自律的に獲得するための,有望なフレームワークとして,相互情報に基づく強化学習(RL)が提案されている。しかしながら、トレーニングスキルの順序がサンプル効率に大きく影響するという事実から、複雑なスキルの習得は依然として困難である。そこで本研究では,変分カリキュラムRL (VCRL) と命名する本質的な報酬関数を持つ目標条件付きRLにおいて,変分エンパワーメントをカリキュラム学習として再放送する。そこで本稿では,情報理論に基づく教師なしスキル発見のための新しい手法として,VUVC(Value Uncertainty Variational Curriculum)を提案する。規則性条件下では、VUVCは、均一なカリキュラムに比べて訪問状態のエントロピーの増加を加速させる。複雑なナビゲーションおよびロボット操作作業におけるアプローチの有効性を,サンプル効率と状態カバレッジ速度の観点から検証した。また,本手法によって得られたスキルが,実世界のロボットナビゲーションタスクをゼロショットで達成し,これらのスキルをグローバルプランナーに組み込むことにより,さらに性能が向上することを示す。

Mutual information-based reinforcement learning (RL) has been proposed as a promising framework for retrieving complex skills autonomously without a task-oriented reward function through mutual information (MI) maximization or variational empowerment. However, learning complex skills is still challenging, due to the fact that the order of training skills can largely affect sample efficiency. Inspired by this, we recast variational empowerment as curriculum learning in goal-conditioned RL with an intrinsic reward function, which we name Variational Curriculum RL (VCRL). From this perspective, we propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC). We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum. We validate the effectiveness of our approach on complex navigation and robotic manipulation tasks in terms of sample efficiency and state coverage speed. We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup and that incorporating these skills with a global planner further increases the performance.

翻訳日:2023-11-01 20:46:12 公開日:2023-10-30

# 平均BERTによる言語教育 : 低リソース環境における潜伏ブートストラップの効果

Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings ( http://arxiv.org/abs/2310.19420v1 )

ライセンス: Link先を確認

David Samuel

(参考訳) 本稿では,言語モデルの事前学習における代替的自己スーパービジョン手法である潜在ブートストラップの利用について検討する。離散サブワードで自己スーパービジョンを使用する典型的な方法とは異なり、潜在ブートストラップはよりリッチな監視信号にコンテキスト化された埋め込みを利用する。限られた資源から言語知識を得る上で,このアプローチがいかに効果的かを評価する実験を行う。具体的には,2つの小さなコーパスの事前学習と4つの言語ベンチマークの評価を含む,BabyLM共有タスクに基づく実験を行った。

This paper explores the use of latent bootstrapping, an alternative self-supervision technique, for pretraining language models. Unlike the typical practice of using self-supervision on discrete subwords, latent bootstrapping leverages contextualized embeddings for a richer supervision signal. We conduct experiments to assess how effective this approach is for acquiring linguistic knowledge from limited resources. Specifically, our experiments are based on the BabyLM shared task, which includes pretraining on two small curated corpora and an evaluation on four linguistic benchmarks.

翻訳日:2023-11-01 20:45:51 公開日:2023-10-30

# 固有ベクトル継続と投影型エミュレータ

Eigenvector Continuation and Projection-Based Emulators ( http://arxiv.org/abs/2310.19419v1 )

ライセンス: Link先を確認

Thomas Duguet, Andreas Ekstr\"om, Richard J. Furnstahl, Sebastian K\"onig, Dean Lee

(参考訳) 固有ベクトル継続(Eigenvector continuation)は、異なるパラメータ集合の固有ベクトルスナップショットから得た基底による部分空間射影を用いた、パラメトリック固有値問題の計算方法である。縮約基底法(reduced-basis methods)と呼ばれる、より広範な部分空間射影技法のクラスの一部である。本コロキウム論文では,固有ベクトル継続および射影型エミュレータの開発,理論,応用について述べる。基礎概念を紹介し,基礎理論と収束特性を議論し,最近の量子システムへの応用と今後の展望について述べる。

Eigenvector continuation is a computational method for parametric eigenvalue problems that uses subspace projection with a basis derived from eigenvector snapshots from different parameter sets. It is part of a broader class of subspace-projection techniques called reduced-basis methods. In this colloquium article, we present the development, theory, and applications of eigenvector continuation and projection-based emulators. We introduce the basic concepts, discuss the underlying theory and convergence properties, and present recent applications for quantum systems and future prospects.
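The subspace-projection idea described in the abstract above can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the 3x3 matrices `H0` and `H1`, the two training parameters, the power-iteration solver, and all function names are hypothetical and do not come from the article.

```python
import math

# Toy parametric family H(c) = H0 + c * H1 of symmetric matrices (assumed).
H0 = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
H1 = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, -1.0]]


def hamiltonian(c):
    return [[H0[i][j] + c * H1[i][j] for j in range(3)] for i in range(3)]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ground_state(h, iterations=2000):
    """Lowest eigenpair of a symmetric matrix via power iteration on s*I - h."""
    n = len(h)
    # Gershgorin bound: s is at least the largest eigenvalue of h.
    s = max(h[i][i] + sum(abs(h[i][j]) for j in range(n) if j != i)
            for i in range(n))
    v = [1.0] * n
    for _ in range(iterations):
        hv = matvec(h, v)
        w = [s * v[i] - hv[i] for i in range(n)]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, matvec(h, v)), v


def ec_ground_energy(c_target, c_train=(0.0, 1.0)):
    """Ground-energy estimate at c_target from two training snapshots."""
    v1 = ground_state(hamiltonian(c_train[0]))[1]
    v2 = ground_state(hamiltonian(c_train[1]))[1]
    h = hamiltonian(c_target)
    # Projected Hamiltonian entries and snapshot overlap (norm matrix).
    a = dot(v1, matvec(h, v1))
    b = dot(v1, matvec(h, v2))
    d = dot(v2, matvec(h, v2))
    m = dot(v1, v2)
    # Smallest root of det(h_proj - E * n_proj) = 0, a quadratic in E.
    qa = 1.0 - m * m
    qb = a + d - 2.0 * b * m
    qc = a * d - b * b
    disc = max(qb * qb - 4.0 * qa * qc, 0.0)
    return (qb - math.sqrt(disc)) / (2.0 * qa)
```

Calling `ec_ground_energy(0.5)` solves only a 2x2 generalized eigenvalue problem at the target parameter; by the variational principle the estimate cannot fall below the exact ground energy returned by `ground_state(hamiltonian(0.5))[0]`, and at a training parameter the two agree.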

翻訳日:2023-11-01 20:45:41 公開日:2023-10-30

# GaitFormer: ノイズの多いマルチタスク学習による歩行表現の学習

GaitFormer: Learning Gait Representations with Noisy Multi-Task Learning ( http://arxiv.org/abs/2310.19418v1 )

ライセンス: Link先を確認

Adrian Cosma, Emilian Radoi

(参考訳) 歩行分析は、被験者の協力に頼らずに個人識別を行うための信頼できる方法であることが証明されている。歩行は、短時間で顕著に変化しない生体計測であり、個人特有のものと見なすことができる。これまでの歩行分析の研究は、外見に基づく手法が依存する歩行者属性の多くを考慮せずに、主に識別と人口統計推定に焦点を当てていた。本研究では、歩行に基づく人物識別とともに、移動パターンのみから歩行者属性を識別する。 42の外観属性で自動的に注釈付けされた217Kの匿名トラックレットを含む、歩行分析システムを事前学習するための最大のデータセットであるDenseGaitを提案する。 DenseGaitはビデオストリームを自動的に処理して構築され、現実世界に存在する一連の歩行共変量を提供する。データセットを研究コミュニティに公開しています。さらに,DenseGait上でマルチタスク方式で事前学習したトランスフォーマーベースのモデルであるGaitFormerを提案する。このモデルは,手動で注釈付けされたデータを一切使用せずに,CASIA-Bでは92.5%,FVGでは85.33%の精度を実現している。これは、類似の手法と比較して、+14.2%と+9.67%の精度の増加に相当する。さらに、GaitFormerは、動きパターンのみを利用して、性別情報と多数の外観属性を正確に識別することができる。実験を再現するコードは公開されています。

Gait analysis is proven to be a reliable way to perform person identification without relying on subject cooperation. Walking is a biometric that does not significantly change in short periods of time and can be regarded as unique to each person. So far, the study of gait analysis focused mostly on identification and demographics estimation, without considering many of the pedestrian attributes that appearance-based methods rely on. In this work, alongside gait-based person identification, we explore pedestrian attribute identification solely from movement patterns. We propose DenseGait, the largest dataset for pretraining gait analysis systems containing 217K anonymized tracklets, annotated automatically with 42 appearance attributes. DenseGait is constructed by automatically processing video streams and offers the full array of gait covariates present in the real world. We make the dataset available to the research community. Additionally, we propose GaitFormer, a transformer-based model that after pretraining in a multi-task fashion on DenseGait, achieves 92.5% accuracy on CASIA-B and 85.33% on FVG, without utilizing any manually annotated data. This corresponds to a +14.2% and +9.67% accuracy increase compared to similar methods. Moreover, GaitFormer is able to accurately identify gender information and a multitude of appearance attributes utilizing only movement patterns. The code to reproduce the experiments is made publicly available.

翻訳日:2023-11-01 20:45:32 公開日:2023-10-30

# 量子多体問題解決に向けた量子実験データの機械学習

Machine learning on quantum experimental data toward solving quantum many-body problems ( http://arxiv.org/abs/2310.19416v1 )

ライセンス: Link先を確認

Gyungmin Cho, Dohun Kim

(参考訳) 量子ハードウェアの実装の進歩により、古典的コンピュータによるエミュレーションでは難解なデータの獲得が可能となった。これらのデータと古典的機械学習(ML)アルゴリズムの統合は、あいまいなパターンを明らかにする可能性を秘めている。このハイブリッドアプローチは、古典的コンピュータのみを用いた場合と比較して、効率よく解ける問題のクラスを拡大するが、現在の量子コンピュータにおけるノイズの出現により、制限された問題を解くために実現されている。ここでは、与えられたハミルトニアンの基底状態の性質の予測や量子位相の分類など、多体物理学における興味のある問題へのハイブリッドアプローチの適用性を拡張する。 127量子ビットの超伝導量子ハードウェア上で,様々なエラー低減手法を用いて実験を行い,量子コンピュータから洗練されたデータを取得することができた。これにより,最大44キュービットのシステムに対して,古典的MLアルゴリズムの実装を成功させることができた。量子実験データ処理における古典的MLアルゴリズムのスケーラビリティと有効性を検証する。

Advancements in the implementation of quantum hardware have enabled the acquisition of data that are intractable for emulation with classical computers. The integration of classical machine learning (ML) algorithms with these data holds potential for unveiling obscure patterns. Although this hybrid approach extends the class of efficiently solvable problems compared to using only classical computers, this approach has been realized for solving restricted problems because of the prevalence of noise in current quantum computers. Here, we extend the applicability of the hybrid approach to problems of interest in many-body physics, such as predicting the properties of the ground state of a given Hamiltonian and classifying quantum phases. By performing experiments with various error-reducing procedures on superconducting quantum hardware with 127 qubits, we managed to acquire refined data from the quantum computer. This enabled us to demonstrate the successful implementation of classical ML algorithms for systems with up to 44 qubits. Our results verify the scalability and effectiveness of the classical ML algorithms for processing quantum experimental data.

翻訳日:2023-11-01 20:45:08 公開日:2023-10-30

# CARPE-ID: 個人化ロボット支援のための連続適応型再識別

CARPE-ID: Continuously Adaptable Re-identification for Personalized Robot Assistance ( http://arxiv.org/abs/2310.19413v1 )

ライセンス: Link先を確認

Federico Rollo, Andrea Zunino, Nikolaos Tsagarakis, Enrico Mingo Hoffman, Arash Ajoudani

(参考訳) 今日のHuman-Robot Interaction(HRI)のシナリオでは、ロボットが最も近い個人と協力するか、あるいはシーンに人間が1人しかいないと仮定する傾向が一般的である。しかし,製造現場の作業のような現実的なシナリオでは,そのような仮定は成り立たない場合があり,混み合った環境でロボットが個人を特定してターゲット認識を行う必要がある。この要件を満たすために,本研究では,視覚的な外観の変化や部分的,あるいは完全な遮蔽があっても,ロボットが適切な個人とシームレスに協調することを保証する,継続的な視覚適応技術に基づく人物再識別モジュールを提案する。実験室で記録されたビデオとHRIシナリオ,すなわち移動ロボットによる人物追従タスクを用いて,このフレームワークを単体でテストする。ターゲットは追跡中に外観を変え、カメラの視野から消えることで、遮蔽や服装のバリエーションという難しいケースをテストするように求められます。提案手法を最先端のマルチオブジェクトトラッキング (MOT) 法の1つと比較し, 2つの限界事例を除く全事例において, CARPE-ID が選択した各ターゲットを実験を通して正確に追跡できることを示した。一方、最先端のMOTはビデオ毎に平均4つのトラッキングエラーがある。

In today's Human-Robot Interaction (HRI) scenarios, a prevailing tendency exists to assume that the robot shall cooperate with the closest individual or that the scene involves merely a singular human actor. However, in realistic scenarios, such as shop floor operations, such an assumption may not hold and personalized target recognition by the robot in crowded environments is required. To fulfil this requirement, in this work, we propose a person re-identification module based on continual visual adaptation techniques that ensure the robot's seamless cooperation with the appropriate individual even subject to varying visual appearances or partial or complete occlusions. We test the framework singularly using recorded videos in a laboratory environment and an HRI scenario, i.e., a person-following task by a mobile robot. The targets are asked to change their appearance during tracking and to disappear from the camera field of view to test the challenging cases of occlusion and outfit variations. We compare our framework with one of the state-of-the-art Multi-Object Tracking (MOT) methods and the results show that the CARPE-ID can accurately track each selected target throughout the experiments in all the cases (except two limit cases). At the same time, the s-o-t-a MOT has a mean of 4 tracking errors for each video.

翻訳日:2023-11-01 20:44:36 公開日:2023-10-30

# マンモグラム画像を用いたヒューリスティック支援Trans-Res-U-NetとマルチスケールDenseNetによるインテリジェント乳癌診断

Intelligent Breast Cancer Diagnosis with Heuristic-assisted Trans-Res-U-Net and Multiscale DenseNet using Mammogram Images ( http://arxiv.org/abs/2310.19411v1 )

ライセンス: Link先を確認

Muhammad Yaqub, Feng Jinchao

(参考訳) 乳癌 (BC) は女性のがん関連死亡率に大きく寄与しており, 早期発見が極めて重要である。マンモグラフィは乳腺の異常を同定し診断するための重要なツールであるが,悪性腫瘤病変の正確な鑑別は困難である。本稿では,マンモグラフィ画像を用いたBCスクリーニングのための新しい深層学習手法を提案する。提案モデルは,確立されたベンチマークソースからのデータ収集,Atrous Convolution-based Attentive and Adaptive Trans-Res-UNet (ACA-ATRUNet) アーキテクチャを用いた画像分割,Atrous Convolution-based Attentive and Adaptive Multi-scale DenseNet (ACA-AMDN) モデルによるBC同定の3つの異なる段階からなる。 ACA-ATRUNetとACA-AMDNモデル内のハイパーパラメータは、Modified Mussel Length-based Eurasian Oystercatcher Optimization (MML-EOO) アルゴリズムを用いて最適化される。複数のメトリクスを活用する性能評価を行い,従来の手法との比較分析を行った。以上の結果から,提案するBC検出フレームワークは早期発見において優れた精度を達成し,マンモグラフィによるスクリーニング手法を向上させる可能性が示唆された。

Breast cancer (BC) significantly contributes to cancer-related mortality in women, underscoring the criticality of early detection for optimal patient outcomes. A mammography is a key tool for identifying and diagnosing breast abnormalities; however, accurately distinguishing malignant mass lesions remains challenging. To address this issue, we propose a novel deep learning approach for BC screening utilizing mammography images. Our proposed model comprises three distinct stages: data collection from established benchmark sources, image segmentation employing an Atrous Convolution-based Attentive and Adaptive Trans-Res-UNet (ACA-ATRUNet) architecture, and BC identification via an Atrous Convolution-based Attentive and Adaptive Multi-scale DenseNet (ACA-AMDN) model. The hyperparameters within the ACA-ATRUNet and ACA-AMDN models are optimised using the Modified Mussel Length-based Eurasian Oystercatcher Optimization (MML-EOO) algorithm. Performance evaluation, leveraging multiple metrics, is conducted, and a comparative analysis against conventional methods is presented. Our experimental findings reveal that the proposed BC detection framework attains superior precision rates in early disease detection, demonstrating its potential to enhance mammography-based screening methodologies.

翻訳日:2023-11-01 20:44:12 公開日:2023-10-30

# 生成されたディストリビューションは、生成モデルに対するメンバーシップ推論攻撃に必要なすべてである

Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models ( http://arxiv.org/abs/2310.19410v1 )

ライセンス: Link先を確認

Minxing Zhang, Ning Yu, Rui Wen, Michael Backes, Yang Zhang

(参考訳) 生成モデルは様々な視覚創造タスクで革命的な成功を収めてきたが、その間、トレーニングデータの個人情報を漏らすという脅威にさらされている。クエリイメージをトレーニングデータセットメンバまたは非メンバとして分類することにより、生成モデルのプライバシ脆弱性を示すために、いくつかのメンバシップ推論攻撃(MIA)が提案されている。しかし、これらの攻撃は、シャドウモデルやホワイトボックスアクセスを必要とすること、拡散モデルのユニークな性質を無視するか、あるいはそれだけに焦点を合わせることなど、大きな制限があり、複数の生成モデルへの一般化が妨げられている。対照的に, 敵対的生成ネットワーク, [変分]オートエンコーダ, 暗黙関数, 新興の拡散モデルなど, 様々な生成モデルに対する最初の一般化メンバシップ推論攻撃を提案する。我々は、ターゲットジェネレータから生成される分布と補助的な非メンバーデータセットのみを利用するため、ターゲットジェネレータをブラックボックスとして扱い、そのアーキテクチャやアプリケーションシナリオに依存しない。実験は、すべての生成モデルが我々の攻撃に対して脆弱であることを検証します。例えば、我々の研究は、CIFAR-10とCelebAで訓練されたDDPM、DDIM、FastDPMに対して攻撃AUC $>0.99$を達成する。そして、VQGAN、LDM (テキスト条件付き生成)、およびLIIFに対する攻撃はAUC $>0.90$を達成する。結果として私たちは、生成モデルの設計と公開において、このようなプライバシ漏洩リスクに注意するようにコミュニティに訴えます。

Generative models have demonstrated revolutionary success in various visual creation tasks, but in the meantime, they have been exposed to the threat of leaking private information of their training data. Several membership inference attacks (MIAs) have been proposed to exhibit the privacy vulnerability of generative models by classifying a query image as a training dataset member or nonmember. However, these attacks suffer from major limitations, such as requiring shadow models and white-box access, and either ignoring or only focusing on the unique property of diffusion models, which block their generalization to multiple generative models. In contrast, we propose the first generalized membership inference attack against a variety of generative models such as generative adversarial networks, [variational] autoencoders, implicit functions, and the emerging diffusion models. We leverage only generated distributions from target generators and auxiliary non-member datasets, therefore regarding target generators as black boxes and agnostic to their architectures or application scenarios. Experiments validate that all the generative models are vulnerable to our attack. For instance, our work achieves attack AUC $>0.99$ against DDPM, DDIM, and FastDPM trained on CIFAR-10 and CelebA. And the attack against VQGAN, LDM (for the text-conditional generation), and LIIF achieves AUC $>0.90$. As a result, we appeal to our community to be aware of such privacy leakage risks when designing and publishing generative models.

翻訳日:2023-11-01 20:43:43 公開日:2023-10-30

# 廃棄物分別のための資源制約付きセマンティックセグメンテーション

Resource Constrained Semantic Segmentation for Waste Sorting ( http://arxiv.org/abs/2310.19407v1 )

ライセンス: Link先を確認

Elisa Cascina, Andrea Pellegrino, Lorenzo Tozzi

(参考訳) 本研究は、増加する廃棄物の環境への影響を最小限に抑えるため、資源回収施設における効率的な廃棄物分別戦略の必要性に対処するものである。産業環境においてリサイクル可能な廃棄物をセグメンテーションするための、資源制約付きセマンティックセグメンテーションモデルを提案する。私たちのゴールは、処理能力に制限のあるエッジアプリケーションに適した、10MBのメモリ制約に収まるモデルを開発することである。ICNet、BiSeNet(Xception39バックボーン)、ENetの3つのネットワークで実験を行った。上記の制限を考慮し、規模の大きいネットワークに量子化およびプルーニング技術を適用し、平均IoUへの影響をわずかに抑えながら良好な結果を得た。さらに、Focal損失とLovász損失を組み合わせることで暗黙のクラス不均衡に対処し、クロスエントロピー損失関数と比較して性能が向上する手法を提案する。

This work addresses the need for efficient waste sorting strategies in Materials Recovery Facilities to minimize the environmental impact of rising waste. We propose resource-constrained semantic segmentation models for segmenting recyclable waste in industrial settings. Our goal is to develop models that fit within a 10MB memory constraint, suitable for edge applications with limited processing capacity. We perform the experiments on three networks: ICNet, BiSeNet (Xception39 backbone), and ENet. Given the aforementioned limitation, we implement quantization and pruning techniques on the broader nets, achieving positive results while marginally impacting the Mean IoU metric. Furthermore, we propose a combination of Focal and Lovász loss that addresses the implicit class imbalance resulting in better performance compared with the Cross-entropy loss function.
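As a rough illustration of why a focal term helps with class imbalance, here is a minimal pure-Python sketch of the standard multi-class focal loss. It is not the authors' implementation: the function name `focal_loss`, the default `gamma=2.0`, and the toy probabilities are assumptions made for this example, and the Lovász term that the abstract combines it with is omitted.

```python
import math

def focal_loss(probs, targets, gamma=2.0, eps=1e-12):
    """Mean multi-class focal loss (illustrative sketch, not the paper's code).

    probs:   list of per-sample class-probability lists (each sums to 1).
    targets: list of integer class labels, one per sample.
    gamma:   focusing parameter (assumed default); gamma = 0 gives cross-entropy.
    """
    total = 0.0
    for row, label in zip(probs, targets):
        # Probability the model assigns to the true class, clipped for log().
        p_t = min(max(row[label], eps), 1.0)
        # (1 - p_t)^gamma down-weights samples that are already well classified.
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(targets)

# Hypothetical toy predictions for two samples and two classes.
example_probs = [[0.9, 0.1], [0.2, 0.8]]
example_targets = [0, 1]
print(focal_loss(example_probs, example_targets, gamma=0.0))  # plain cross-entropy
print(focal_loss(example_probs, example_targets, gamma=2.0))  # focal variant
```

With `gamma=0` the expression reduces to ordinary cross-entropy; larger values shrink the contribution of confidently classified samples, so rare classes carry relatively more weight.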

翻訳日:2023-11-01 20:43:15 公開日:2023-10-30

# 効率的な畳み込みネットワークの設計による物体検出のためのレーダー・ライダー融合

Radar-Lidar Fusion for Object Detection by Designing Effective Convolution Networks ( http://arxiv.org/abs/2310.19405v1 )

ライセンス: Link先を確認

Farzeen Munir, Shoaib Azam, Tomasz Kucner, Ville Kyrki, Moongu Jeon

(参考訳) 物体検出は知覚システムのコアコンポーネントであり、安全な経路計画を確保するために、自車両(ego vehicle)に周囲に関する情報を提供する。カメラとライダーは知覚システムを大幅に進歩させたが、その性能は悪天候下では制限されうる。対照的に、ミリ波技術により、レーダーはそのような状況でも効果的に機能する。しかし、レーダーのみに頼って知覚システムを構築すると、データがスパースであるため、環境を完全には捉えられない。これに対処するために、センサー融合戦略が導入されてきた。我々は、物体検出の強化のために、レーダーとライダーのデータを統合するデュアルブランチフレームワークを提案する。主分枝はレーダーの特徴の抽出に焦点を合わせ、補助分枝はライダーの特徴を抽出する。これらを加法的注意(additive attention)を用いて組み合わせる。その後、統合された特徴を新たなParallel Forked Structure(PFS)で処理し、スケール変動に対処する。次に、領域提案ヘッドを物体検出に利用する。提案手法の有効性を、COCOメトリクスを用いてRadiateデータセット上で評価した。その結果、良好な気象条件と悪天候条件において、最先端の手法をそれぞれ $1.89\%$ と $2.61\%$ 上回った。これは、特に厳しい気象条件において、正確な物体検出と位置推定を達成する上でのレーダーとライダーの融合の価値を強調するものである。

Object detection is a core component of perception systems, providing the ego vehicle with information about its surroundings to ensure safe route planning. While cameras and Lidar have significantly advanced perception systems, their performance can be limited in adverse weather conditions. In contrast, millimeter-wave technology enables radars to function effectively in such conditions. However, relying solely on radar for building a perception system doesn't fully capture the environment due to the data's sparse nature. To address this, sensor fusion strategies have been introduced. We propose a dual-branch framework to integrate radar and Lidar data for enhanced object detection. The primary branch focuses on extracting radar features, while the auxiliary branch extracts Lidar features. These are then combined using additive attention. Subsequently, the integrated features are processed through a novel Parallel Forked Structure (PFS) to manage scale variations. A region proposal head is then utilized for object detection. We evaluated the effectiveness of our proposed method on the Radiate dataset using COCO metrics. The results show that it surpasses state-of-the-art methods by $1.89\%$ and $2.61\%$ in favorable and adverse weather conditions, respectively. This underscores the value of radar-Lidar fusion in achieving precise object detection and localization, especially in challenging weather conditions.

翻訳日:2023-11-01 20:43:01 公開日:2023-10-30

# 多エージェント協調学習システムのための後悔最小化アルゴリズム

Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems ( http://arxiv.org/abs/2310.19468v1 )

ライセンス: Link先を確認

Jialin Yi

(参考訳) MACL(Multi-Agent Cooperative Learning)システムは、複数の学習エージェントが協力して共通のタスクを完了させる人工知能(AI)システムである。様々な領域(例えば、交通制御、クラウドコンピューティング、ロボティクス)におけるMACLシステムの最近の実証的な成功は、逐次意思決定問題のためのMACLシステムの設計と分析に関する活発な研究を引き起こした。意思決定問題に対する学習アルゴリズムの重要な指標の1つは、その後悔(regret)、すなわち、達成可能な最大の報酬とアルゴリズムが実際に得る報酬との差である。低後悔(low-regret)の学習アルゴリズムを用いたMACLシステムの設計と開発は、大きな経済的価値を生み出しうる。本論文では、様々な逐次意思決定問題に対するMACLシステムを解析する。具体的には、第3章および第4章では、全情報フィードバックまたはバンディットフィードバックの下での協調型マルチエージェント多腕バンディット問題を調査する。そこでは、複数の学習エージェントが通信ネットワークを介して情報を交換でき、エージェントは自分が選択した行動の報酬だけを観察できる。第5章では、分散環境でのオンライン凸最適化における通信と後悔のトレードオフを考察する。第6章では、適応的なインクリメンタルマッチングを使用して、未知だが固定されたタイプを持つエージェントから生産性の高いチームを形成する方法について論じる。以上の問題に対して、実現可能な学習アルゴリズムの後悔の下界を示し、この下界を達成する効率的なアルゴリズムを提供する。第3章、第4章、第5章で示す後悔の境界は、後悔が通信ネットワークの接続性や通信遅延にどのように依存するかを定量化し、MACLシステムにおける通信プロトコルの設計に有用な指針を与える。

A Multi-Agent Cooperative Learning (MACL) system is an artificial intelligence (AI) system in which multiple learning agents work together to complete a common task. The recent empirical success of MACL systems in various domains (e.g., traffic control, cloud computing, robotics) has sparked active research into the design and analysis of MACL systems for sequential decision-making problems. One important metric of a learning algorithm for decision-making problems is its regret, i.e., the difference between the highest achievable reward and the actual reward that the algorithm gains. The design and development of a MACL system with low-regret learning algorithms can create huge economic value. In this thesis, I analyze MACL systems for different sequential decision-making problems. Concretely, Chapters 3 and 4 investigate cooperative multi-agent multi-armed bandit problems, with full-information or bandit feedback, in which multiple learning agents can exchange their information through a communication network and the agents can only observe the rewards of the actions they choose. Chapter 5 considers the communication-regret trade-off for online convex optimization in the distributed setting. Chapter 6 discusses how to form highly productive teams of agents based on their unknown but fixed types using adaptive incremental matchings. For the above problems, I present regret lower bounds for feasible learning algorithms and provide efficient algorithms that achieve these bounds. The regret bounds I present in Chapters 3, 4, and 5 quantify how the regret depends on the connectivity of the communication network and the communication delay, thus giving useful guidance on the design of the communication protocol in MACL systems.
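The regret notion in this abstract can be made concrete with a small sketch. It computes the cumulative (pseudo-)regret of a sequence of arm choices in a stochastic multi-armed bandit, under the assumption that the expected reward of every arm is known for evaluation purposes. The function name `cumulative_regret` and all numbers are hypothetical and are not taken from the thesis.

```python
def cumulative_regret(mean_rewards, chosen_arms):
    """Pseudo-regret of a sequence of arm pulls (illustrative sketch).

    mean_rewards: expected reward of each arm (assumed known for evaluation).
    chosen_arms:  index of the arm pulled at each round.
    """
    # Reward per round of the best fixed arm in hindsight.
    best = max(mean_rewards)
    # Each round contributes the gap between the best arm and the pulled arm.
    return sum(best - mean_rewards[arm] for arm in chosen_arms)

# Hypothetical three-armed bandit and a four-round history of pulls.
means = [0.2, 0.5, 0.9]
history = [0, 2, 1, 2]
print(cumulative_regret(means, history))
```

A low-regret algorithm is one for which this quantity grows sublinearly in the number of rounds, so the average per-round gap to the best arm vanishes.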

翻訳日:2023-11-01 20:36:00 公開日:2023-10-30

# ニューラルインプシット関数の混合による生成的ニューラルフィールド

Generative Neural Fields by Mixtures of Neural Implicit Functions ( http://arxiv.org/abs/2310.19464v1 )

ライセンス: Link先を確認

Tackgeun You and Mijeong Kim and Jungtaek Kim and Bohyung Han

(参考訳) 本稿では,暗黙的ベースネットワークの線形結合によって表現される生成的ニューラルネットワークの学習手法を提案する。提案アルゴリズムは,メタラーニングや自動デコーディングのパラダイムを採用することにより,暗黙のニューラルネットワーク表現とその係数を潜在空間で学習する。提案手法は, モデル平均化により, 推定するネットワークのサイズを小さく保ちながら, ベースネットワークの数を増やすことにより, 生成するニューラルネットワークの容量を容易に拡大する。したがって、モデルを用いたインスタンスのサンプリングは、レイテンシとメモリフットプリントの点で効率的である。さらに,対象タスクの拡散確率モデルをカスタマイズして潜時混合係数をサンプリングし,最終モデルで目に見えないデータを効果的に生成する。提案手法は,画像,ボクセルデータ,NeRFシーンの様々なベンチマークにおいて,特定のモダリティやドメインの高度な設計を伴わずに,競合生成性能を実現する。

We propose a novel approach to learning the generative neural fields represented by linear combinations of implicit basis networks. Our algorithm learns basis networks in the form of implicit neural representations and their coefficients in a latent space by either conducting meta-learning or adopting auto-decoding paradigms. The proposed method easily enlarges the capacity of generative neural fields by increasing the number of basis networks while maintaining the size of a network for inference to be small through their weighted model averaging. Consequently, sampling instances using the model is efficient in terms of latency and memory footprint. Moreover, we customize denoising diffusion probabilistic model for a target task to sample latent mixture coefficients, which allows our final model to generate unseen data effectively. Experiments show that our approach achieves competitive generation performance on diverse benchmarks for images, voxel data, and NeRF scenes without sophisticated designs for specific modalities and domains.

翻訳日:2023-11-01 20:35:32 公開日:2023-10-30

# コストからゴールまで見積もるのではなく、計画ヒューリスティックをランクに最適化する

Optimize Planning Heuristics to Rank, not to Estimate Cost-to-Goal ( http://arxiv.org/abs/2310.19463v1 )

ライセンス: Link先を確認

Leah Chrestien, Tomás Pevný, Stefan Edelkamp, Antonín Komenda

(参考訳) 計画のための模倣学習では、解かれた問題インスタンスの集合に対してヒューリスティック関数のパラメータを最適化する。本研究は、返される最適経路上の状態のみを展開する、厳密に最適効率なヒューリスティックの必要十分条件を、主にA*と貪欲最良優先探索(greedy best-first search)といった前向き探索アルゴリズムについて再検討する。そして、与えられた前向き探索アルゴリズムの変種に合わせて調整された、ランキングに基づく損失関数の族を提案する。さらに、学習理論の観点から、コスト対ゴール $h^*$ の最適化が不必要に難しい理由について考察する。多様な問題に対する実験的な比較は、導出した理論を明確に支持している。

In imitation learning for planning, parameters of heuristic functions are optimized against a set of solved problem instances. This work revisits the necessary and sufficient conditions of strictly optimally efficient heuristics for forward search algorithms, mainly A* and greedy best-first search, which expand only states on the returned optimal path. It then proposes a family of loss functions based on ranking tailored for a given variant of the forward search algorithm. Furthermore, from a learning theory point of view, it discusses why optimizing cost-to-goal $h^*$ is unnecessarily difficult. The experimental comparison on a diverse set of problems unequivocally supports the derived theory.

翻訳日:2023-11-01 20:35:15 公開日:2023-10-30

# ハードウェア障害のある通信システムのためのデノイジング拡散確率モデル -無線生成AIに向けて-

Denoising Diffusion Probabilistic Models for Hardware-Impaired Communication Systems: Towards Wireless Generative AI ( http://arxiv.org/abs/2310.19460v1 )

ライセンス: Link先を確認

Mehdi Letafati, Samad Ali, Matti Latva-aho

(参考訳) ChatGPTや拡散モデルのような最先端の生成モデルによる卓越した成果により、生成AIはさまざまな産業や学術領域で大きな注目を集めている。本稿では、ハードウェア障害のあるトランシーバを用いた実用的な有限精度無線通信システムに対して、デノイジング拡散確率モデル(DDPM)を提案する。DDPMの背後にある直感は、データ生成プロセスをいわゆる「デノイジング」ステップに分解することである。これに着想を得て、ハードウェア障害(HWI)、チャネル歪み、量子化誤差などの現実的な非理想性に直面する実用的な無線通信方式のために、DDPMベースの受信機を提案する。提案手法は、低SNR領域でのネットワークレジリエンス、異なるHWIレベルと量子化誤差に対してほぼ不変な再構成性能、非ガウス雑音に対するロバストな分布外(out-of-distribution)性能を実現することを示す。さらに、提案方式の再構成性能をコサイン類似度と平均二乗誤差(MSE)の観点から評価し、従来のディープニューラルネットワーク(DNN)ベースの受信機と比較して25dB以上の改善を確認した。

Thanks to the outstanding achievements from state-of-the-art generative models like ChatGPT and diffusion models, generative AI has gained substantial attention across various industrial and academic domains. In this paper, denoising diffusion probabilistic models (DDPMs) are proposed for a practical finite-precision wireless communication system with hardware-impaired transceivers. The intuition behind DDPM is to decompose the data generation process over the so-called "denoising" steps. Inspired by this, a DDPM-based receiver is proposed for a practical wireless communication scheme that faces realistic non-idealities, including hardware impairments (HWI), channel distortions, and quantization errors. It is shown that our approach provides network resilience under low-SNR regimes, near-invariant reconstruction performance with respect to different HWI levels and quantization errors, and robust out-of-distribution performance against non-Gaussian noise. Moreover, the reconstruction performance of our scheme is evaluated in terms of cosine similarity and mean-squared error (MSE), highlighting more than 25 dB improvement compared to the conventional deep neural network (DNN)-based receivers.
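The two reconstruction metrics named in this abstract are simple to state in code. The sketch below is only an illustration with hypothetical helper names (`mse`, `cosine_similarity`, `improvement_db`). In particular, reading a dB improvement as ten times the base-10 logarithm of an error ratio is an assumption of this example, since the abstract does not say how the 25 dB figure is computed.

```python
import math

def mse(x, y):
    """Mean-squared error between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def improvement_db(baseline_error, new_error):
    """Error reduction on a decibel scale (assumed convention: 10 * log10 of the ratio)."""
    return 10.0 * math.log10(baseline_error / new_error)

# Hypothetical transmitted signal and two reconstructions of it.
sent = [1.0, -1.0, 1.0, 1.0]
baseline = [0.5, -0.2, 0.1, 0.9]
improved = [0.98, -1.01, 0.99, 1.02]
print(cosine_similarity(sent, improved))
print(improvement_db(mse(sent, baseline), mse(sent, improved)))
```

Under this convention a 25 dB improvement would correspond to an error roughly 316 times smaller than the baseline.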

翻訳日:2023-11-01 20:34:46 公開日:2023-10-30

# キャンパスネットワーク上のエンタングルメントに基づく量子デジタル署名

Entanglement-based quantum digital signatures over deployed campus network ( http://arxiv.org/abs/2310.19457v1 )

ライセンス: Link先を確認

Joseph C. Chapman, Muneer Alshowkan, Bing Qi, Nicholas A. Peters

(参考訳) 量子デジタル署名プロトコルは、今日のデジタル世界に広く普及している公開鍵デジタル署名のほとんどの側面を置き換えうるものである。量子デジタル署名プロトコルの大きな利点は、公開鍵暗号にはできない情報理論的安全性を持ちうることである。ここでは、キャンパスネットワーク上でエンタングルメントに基づく量子デジタル署名を実装するためのハードウェアを実証し、その特性を評価する。25時間以上にわたってキャンパスネットワーク上で測定を行い、十分に低い量子ビット誤り率(ほとんどの場合5%未満)を測定した。これは、我々の実装のために特別に開発した雑音モデルを伴う厳密なシミュレーションで示されるように、原理的には最大50kmにわたる量子デジタル署名を可能にする。これらの結果は、量子デジタル署名が敷設済みのファイバ上でうまく利用できることを示している。現在のエンタングルメントベースのアプローチの実装は署名レートが低いが、実現可能なアップグレードにより署名レートは大幅に増加する。さらに、報告した手法はユーザ数に関して高い柔軟性を提供する。

The quantum digital signature protocol offers a replacement for most aspects of public-key digital signatures ubiquitous in today's digital world. A major advantage of a quantum digital signatures protocol is that it can have information-theoretic security, whereas public-key cryptography cannot. Here we demonstrate and characterize hardware to implement entanglement-based quantum digital signatures over our campus network. Over 25 hours, we collect measurements on our campus network, where we measure sufficiently low quantum bit error rates (<5% in most cases) which in principle enable quantum digital signatures over up to 50 km as shown in rigorous simulation accompanied by a noise model developed specifically for our implementation. These results show quantum digital signatures can be successfully employed over deployed fiber. While the current implementation of our entanglement-based approach has a low signature rate, feasible upgrades would significantly increase the signature rate. In addition, our reported method provides great flexibility in the number of users.

翻訳日:2023-11-01 20:34:25 公開日:2023-10-30

# MMMとMMMSynth: 不均質な表形式データのクラスタリングと合成データ生成

MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation ( http://arxiv.org/abs/2310.19454v1 )

ライセンス: Link先を確認

Chandrani Kumari and Rahul Siddharthan

(参考訳) 我々は、クラスタリングと合成データ生成という、不均質な表形式データセットに関連する2つのタスクに対して、新しいアルゴリズムを提供する。表形式データセットは、典型的には列に不均質なデータ型(数値、順序、カテゴリ)を含むが、行に隠れたクラスタ構造を持つ場合もある。例えば、データが不均質な(地理的、社会経済的、方法論的な)ソースから得られており、記述される結果変数(病気の有無など)が他の変数だけでなくクラスタの文脈にも依存する場合がある。さらに、生物医学データの共有は患者の機密保持に関する法律によってしばしば妨げられるため、例えばディープラーニングによって、実データから合成表形式データを生成するアルゴリズムへの関心が高まっている。本研究では、新しいEMベースのクラスタリングアルゴリズムMMM(「Madras Mixture Model」)を示す。MMMは、合成された不均質データにおけるクラスタの決定において標準的なアルゴリズムを上回り、実データの構造を復元する。これに基づいて、合成表形式データ生成アルゴリズムMMMsynthを示す。MMMsynthは入力データを事前にクラスタリングし、入力列に対してクラスタ固有のデータ分布を仮定して、クラスタごとに合成データを生成する。このアルゴリズムのベンチマークは、標準的なMLアルゴリズムを合成データで訓練し、公開されている実データセットでテストしたときの性能を調べることで行う。我々の合成データ生成アルゴリズムは、文献にある他の表形式データ生成器よりも優れており、実データのみで訓練した場合の性能に迫る。

We provide new algorithms for two tasks relating to heterogeneous tabular datasets: clustering, and synthetic data generation. Tabular datasets typically consist of heterogeneous data types (numerical, ordinal, categorical) in columns, but may also have hidden cluster structure in their rows: for example, they may be drawn from heterogeneous (geographical, socioeconomic, methodological) sources, such that the outcome variable they describe (such as the presence of a disease) may depend not only on the other variables but on the cluster context. Moreover, sharing of biomedical data is often hindered by patient confidentiality laws, and there is current interest in algorithms to generate synthetic tabular data from real data, for example via deep learning. We demonstrate a novel EM-based clustering algorithm, MMM ("Madras Mixture Model"), that outperforms standard algorithms in determining clusters in synthetic heterogeneous data, and recovers structure in real data. Based on this, we demonstrate a synthetic tabular data generation algorithm, MMMsynth, that pre-clusters the input data, and generates cluster-wise synthetic data assuming cluster-specific data distributions for the input columns. We benchmark this algorithm by testing the performance of standard ML algorithms when they are trained on synthetic data and tested on real published datasets. Our synthetic data generation algorithm outperforms other literature tabular-data generators, and approaches the performance of training purely with real data.

翻訳日:2023-11-01 20:34:07 公開日:2023-10-30

# ALT:クリックスルーレート予測のための言語モデルとCTRモデル間の微粒なアライメントを目指して

ALT: Towards Fine-grained Alignment between Language and CTR Models for Click-Through Rate Prediction ( http://arxiv.org/abs/2310.19453v1 )

ライセンス: Link先を確認

Hangyu Wang, Jianghao Lin, Xiangyang Li, Bo Chen, Chenxu Zhu, Ruiming Tang, Weinan Zhang, Yong Yu

(参考訳) クリックスルーレート(CTR)予測は、さまざまなパーソナライズされたオンラインサービスにおいてコア機能モジュールとして機能する。データモダリティと入力形式により、CTR予測のモデルは、主に2つのカテゴリに分類される。ひとつは、従来のCTRモデルで、1ホットの符号化IDの特徴を表わし、特徴相互作用モデリングによって協調的な信号をキャプチャすることを目的としている。第2のカテゴリは、ハードプロンプトテンプレートによって得られるテキストモダリティの文を入力として取り、事前訓練された言語モデル(PLM)を用いて意味知識を抽出する。これらの2つの研究は、一般的に同じ入力データ(テキストと表のモダリティ)の異なる特性に焦点を合わせ、互いに異なる相補的な関係を形成する。そこで本稿では,CTR予測のための言語モデルとCTRモデル(ALT)間の細粒度特徴レベルのアライメントを提案する。一般的なCLIPのようなインスタンスレベルのコントラスト学習とは別に、マスク言語と表型モデリングの両方のための新しい共同再構築事前訓練タスクを設計する。具体的には、一方のモダリティ(トークンや特徴)のマスクされたデータは、他方のモダリティの助けを借りて復元され、双対モダリティ間の十分な相互情報抽出を通じて特徴レベルの相互作用とアライメントを確立する必要がある。さらに,下流のctr予測タスクに対して,アライメント言語とctrモデルを別々に,あるいは共同で訓練するオプションにより,産業用途における様々な有効性と効率要件を満たした3種類の微調整戦略を提案する。 3つの実世界のデータセットに対する大規模な実験により、ALTはSOTAベースラインより優れており、様々な言語やCTRモデルに高い互換性があることが示された。

Click-through rate (CTR) prediction serves as a core function module in various personalized online services. According to the data modality and input format, models for CTR prediction can be classified into two main categories. The first comprises traditional CTR models that take as input one-hot encoded ID features of the tabular modality and aim to capture collaborative signals via feature interaction modeling. The second takes as input sentences of the textual modality obtained from hard prompt templates, where pretrained language models (PLMs) are adopted to extract semantic knowledge. These two lines of research generally focus on different characteristics of the same input data (i.e., textual and tabular modalities), forming a distinct complementary relationship with each other. Therefore, in this paper, we propose to conduct fine-grained feature-level Alignment between Language and CTR models (ALT) for CTR prediction. Apart from the common CLIP-like instance-level contrastive learning, we further design a novel joint reconstruction pretraining task for both masked language and tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes feature-level interaction and alignment via sufficient mutual information extraction between the two modalities. Moreover, we propose three different finetuning strategies with the option to train the aligned language and CTR models separately or jointly for downstream CTR prediction tasks, thus accommodating the varying efficacy and efficiency requirements of industrial applications. Extensive experiments on three real-world datasets demonstrate that ALT outperforms SOTA baselines and is highly compatible with various language and CTR models.

翻訳日:2023-11-01 20:33:40 公開日:2023-10-30

# PyTorchモデルへの大規模フォールトインジェクションの適用 - 検証効率向上のためのPyTorchFIの拡張-

Large-Scale Application of Fault Injection into PyTorch Models -- an Extension to PyTorchFI for Validation Efficiency ( http://arxiv.org/abs/2310.19449v1 )

ライセンス: Link先を確認

Ralf Graafe, Qutub Syed Sha, Florian Geissler, Michael Paulitsch

(参考訳) ハードウェアにおける過渡的あるいは恒久的な障害は、ユーザー固有のエラー、すなわちサイレントデータエラー(SDE)の痕跡なしで、ニューラルネットワーク(NN)の出力を誤ったものにすることができる。一方、現代のNNは特定の障害を許容できる固有の冗長性を持っている。安全ケースを確立するには,両タイプの腐敗を識別し,定量化する必要がある。近年,ハードウェア(HW)故障がソフトウェア(SW),特にNNモデルに与える影響を調べるために,いくつかの欠陥注入法が確立されている。現在の FI 法は, 断層を注入する手法に重点を置いているが, 大規模な FI 試験に欠かせない場合が多く, 特定の断層モデルに基づく多くの故障箇所を短時間で解析する必要がある。結果は簡潔で、繰り返し可能で、同等である必要があります。これらの要件に対処し、機械学習開発サイクルのデフォルトコンポーネントとしてフォールトインジェクションを有効にするため、PyTorchALFI(Application Level Fault Injection for PyTorch)と呼ばれる新しいフォールトインジェクションフレームワークを導入する。 PyTorchALFIは、ランダムに生成された再利用可能なフォールトセットを定義し、PyTorchモデルに注入し、複雑なテストシナリオを定義し、データセットを拡張し、テストKPIを生成する。本稿では, テストシナリオの定義, ソフトウェアアーキテクチャ, および新しいフレームワークを用いて, 故障位置と数値の反復的変化を適用し, 異なるモデル修正を比較し, テスト結果を解析するいくつかの例について述べる。

Transient or permanent faults in hardware can render the output of Neural Networks (NN) incorrect without user-specific traces of the error, i.e. silent data errors (SDE). On the other hand, modern NNs also possess an inherent redundancy that can tolerate specific faults. To establish a safety case, it is necessary to distinguish and quantify both types of corruptions. To study the effects of hardware (HW) faults on software (SW) in general and NN models in particular, several fault injection (FI) methods have been established in recent years. Current FI methods focus on the methodology of injecting faults but often fall short of accounting for large-scale FI tests, where many fault locations based on a particular fault model need to be analyzed in a short time. Results need to be concise, repeatable, and comparable. To address these requirements and enable fault injection as the default component in a machine learning development cycle, we introduce a novel fault injection framework called PyTorchALFI (Application Level Fault Injection for PyTorch) based on PyTorchFI. PyTorchALFI provides an efficient way to define randomly generated and reusable sets of faults to inject into PyTorch models, defines complex test scenarios, enhances data sets, and generates test KPIs while tightly coupling fault-free, faulty, and modified NN. In this paper, we provide details about the definition of test scenarios, software architecture, and several examples of how to use the new framework to apply iterative changes in fault location and number, compare different model modifications, and analyze test results.

翻訳日:2023-11-01 20:32:52 公開日:2023-10-30

# 咬合認識時空間変圧器を用いた大規模シーンのグルーピング

Towards Grouping in Large Scenes with Occlusion-aware Spatio-temporal Transformers ( http://arxiv.org/abs/2310.19447v1 )

ライセンス: Link先を確認

Jinsong Zhang and Lingfeng Gu and Yu-Kun Lai and Xueyang Wang and Kun Li

(参考訳) グループ検出、特に大規模なシーンでは、公共の安全とスマートシティに多くの潜在的な応用がある。既存の手法では,複数人の大規模場面で頻繁な閉塞に対処できず,時空間情報の有効活用が困難である。本稿では,大規模シーンにおけるグループ検出のためのエンドツーエンドフレームワークGroupTransformerを提案する。複数の人による頻繁な隠蔽に対処するため,重度の隠蔽人作物の検出・抑制のための隠蔽エンコーダを設計した。本研究では, 時空間的関係を探究するために, 軌跡情報を抽出し, 人物間特徴を階層的に融合する時空間的トランスフォーマを提案する。大規模・小規模の両方での実験結果から,本手法は最先端の手法と比較して性能が向上することが示された。大規模シーンでは,F1スコアが10%以上向上し,精度が向上した。小規模シーンでは,f1スコアのパフォーマンスを5%以上向上させることができた。コード付きのプロジェクトページはhttp://cic.tju.edu.cn/faculty/likun/projects/GroupTransにある。

Group detection, especially for large-scale scenes, has many potential applications for public safety and smart cities. Existing methods fail to cope with frequent occlusions in large-scale scenes with multiple people, and struggle to utilize spatio-temporal information effectively. In this paper, we propose an end-to-end framework, GroupTransformer, for group detection in large-scale scenes. To deal with the frequent occlusions caused by multiple people, we design an occlusion encoder to detect and suppress severely occluded person crops. To explore the potential spatio-temporal relationship, we propose spatio-temporal transformers to simultaneously extract trajectory information and fuse inter-person features in a hierarchical manner. Experimental results on both large-scale and small-scale scenes demonstrate that our method achieves better performance compared with state-of-the-art methods. On large-scale scenes, our method significantly boosts the performance in terms of precision and F1 score by more than 10%. On small-scale scenes, our method still improves the F1 score by more than 5%. The project page with code can be found at http://cic.tju.edu.cn/faculty/likun/projects/GroupTrans.

翻訳日:2023-11-01 20:32:20 公開日:2023-10-30

# 狭窄検出のための連合学習フレームワーク

A Federated Learning Framework for Stenosis Detection ( http://arxiv.org/abs/2310.19445v1 )

ライセンス: Link先を確認

Mariachiara Di Cosmo, Giovanna Migliorelli, Matteo Francioni, Andi Mucaj, Alessandro Maolo, Alessandro Aprile, Emanuele Frontoni, Maria Chiara Fiorentino, and Sara Moccia

(参考訳) 本研究は、冠動脈造影(CA)画像における狭窄検出へのFederated Learning(FL)の利用を検討する。2つの機関から得られた2つの不均質なデータセットを対象とした。Dataset 1は、Ancona(イタリア)のOspedale Riunitiで取得した200人の患者の1219枚の画像を含み、Dataset 2は、文献で公開されている先行研究から得られた90人の患者の7492枚の連続画像を含む。狭窄検出にはFaster R-CNNモデルを用いた。我々のFLフレームワークでは、モデルのバックボーンの重みのみを2つのクライアント機関間で共有し、重みの集約にはFederated Averaging(FedAvg)を用いた。狭窄検出の性能は、精度(Prec)、再現率(Rec)、F1スコア(F1)を用いて評価した。その結果、FLフレームワークは、ローカルトレーニングですでに良好な性能を達成していたクライアント2の性能には実質的な影響を与えなかった。一方、クライアント1では、FLフレームワークによりローカルモデルと比較して性能がそれぞれ+3.76%、+17.21%、+10.80%向上し、Prec = 73.56、Rec = 67.01、F1 = 70.13に達した。このような結果から、FLは、患者のプライバシを保ちつつ、様々な機関に由来するデータの不均質性に対処することにより、CAにおける自動狭窄検出に関する多施設共同研究を可能にしうることを示した。

This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA). Two heterogeneous datasets from two institutions were considered: Dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy); Dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature. Stenosis detection was performed by using a Faster R-CNN model. In our FL framework, only the weights of the model backbone were shared among the two client institutions, using Federated Averaging (FedAvg) for weight aggregation. We assessed the performance of stenosis detection using Precision (Prec), Recall (Rec), and F1 score (F1). Our results showed that the FL framework does not substantially affect client 2's performance, which was already good with local training; for client 1, instead, the FL framework improves on the local model by +3.76%, +17.21%, and +10.80%, respectively, reaching Prec = 73.56, Rec = 67.01, and F1 = 70.13. With such results, we showed that FL may enable multicentric studies relevant to automatic stenosis detection in CA by addressing data heterogeneity from various institutions, while preserving patient privacy.
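The aggregation step described in this abstract, sharing only backbone weights and combining them with FedAvg, can be sketched as follows. This is a toy illustration, not the study's code: models are plain dictionaries of parameter lists, the `backbone.` key prefix and the function name `fedavg_backbone` are hypothetical, and weighting each client by its number of images is the usual FedAvg convention assumed here rather than something the abstract states.

```python
def fedavg_backbone(client_models, client_sizes, shared_prefix="backbone."):
    """Average only the shared (backbone) parameters across clients.

    client_models: list of dicts mapping parameter name -> list of floats.
    client_sizes:  number of training samples per client (assumed FedAvg weights).
    shared_prefix: hypothetical naming convention marking shared parameters.
    """
    total = sum(client_sizes)
    averaged = {}
    for name in client_models[0]:
        if not name.startswith(shared_prefix):
            continue  # detection heads and other layers stay local
        length = len(client_models[0][name])
        # Sample-size-weighted mean of each scalar parameter.
        averaged[name] = [
            sum(model[name][i] * size
                for model, size in zip(client_models, client_sizes)) / total
            for i in range(length)
        ]
    # Every client receives the averaged backbone but keeps its own other weights.
    return [{**model, **averaged} for model in client_models]

# Toy models; the dataset sizes are the image counts quoted in the abstract.
client_1 = {"backbone.w": [0.0, 2.0], "head.w": [1.0]}
client_2 = {"backbone.w": [1.0, 2.0], "head.w": [5.0]}
updated = fedavg_backbone([client_1, client_2], [1219, 7492])
print(updated[0]["backbone.w"], updated[0]["head.w"])
```

Keeping the heads local lets each institution adapt to its own image distribution while still benefiting from a jointly learned feature extractor.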

翻訳日:2023-11-01 20:32:03 公開日:2023-10-30

# One-for-All: 知識蒸留における異種アーキテクチャ間のギャップを埋める

One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation ( http://arxiv.org/abs/2310.19444v1 )

ライセンス: Link先を確認

Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu

(参考訳) 知識蒸留(KD)は、教師-生徒の学習方式によってモデル性能を向上させる非常に効果的な手法であることが証明されている。しかし、既存の蒸留法の多く、特にヒントベースのアプローチは、教師モデルと生徒モデルが同じモデルファミリーに属するという仮定のもとで設計されている。centered kernel alignment(CKA)を用いて異種の教師モデルと生徒モデルの間で学習された特徴を比較すると、特徴が大きく乖離していることが観察される。この乖離は、クロスアーキテクチャ蒸留において従来のヒントベースの手法が有効でないことを示している。異種モデルの蒸留における課題に対処するため、異種アーキテクチャ間の蒸留性能を大幅に向上させる、シンプルで効果的なone-for-all KDフレームワークであるOFA-KDを提案する。具体的には、中間特徴を、アーキテクチャ固有の情報が破棄されるlogits空間のような整列された潜在空間に射影する。また、生徒が無関係な情報に妨げられることを防ぐため、適応的なターゲット強化方式を導入する。CNN、Transformer、MLPを含む様々なアーキテクチャによる広範な実験は、異種アーキテクチャ間の蒸留を可能にするOFA-KDフレームワークの優位性を示している。具体的には、OFA-KDを適用すると、生徒モデルは顕著な性能向上を達成し、その最大の向上幅はCIFAR-100データセットで8.0%、ImageNet-1Kデータセットで0.7%である。PyTorchのコードとチェックポイントは https://github.com/Hao840/OFAKD で確認できる。

Knowledge distillation (KD) has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. However, most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family, particularly the hint-based approaches. By using centered kernel alignment (CKA) to compare the learned features between heterogeneous teacher and student models, we observe significant feature divergence. This divergence illustrates the ineffectiveness of previous hint-based methods in cross-architecture distillation. To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures. Specifically, we project intermediate features into an aligned latent space such as the logits space, where architecture-specific information is discarded. Additionally, we introduce an adaptive target enhancement scheme to prevent the student from being disturbed by irrelevant information. Extensive experiments with various architectures, including CNN, Transformer, and MLP, demonstrate the superiority of our OFA-KD framework in enabling distillation between heterogeneous architectures. Specifically, when equipped with our OFA-KD, the student models achieve notable performance improvements, with a maximum gain of 8.0% on the CIFAR-100 dataset and 0.7% on the ImageNet-1K dataset. PyTorch code and checkpoints can be found at https://github.com/Hao840/OFAKD.

翻訳日:2023-11-01 20:31:36 公開日:2023-10-30

# マーカーレスモーションキャプチャからの動的ガウススプラッティングは乳児の動きを再構成できる

Dynamic Gaussian Splatting from Markerless Motion Capture can Reconstruct Infants Movements ( http://arxiv.org/abs/2310.19441v1 )

ライセンス: Link先を確認

R. James Cotton and Colleen Peyton

(参考訳) 運動の正確な3dトラッキングへの簡単なアクセスは、リハビリテーションの多くの面に役立つだろう。この目標を達成するための課題は、有能な成人のための多くのデータセットと事前訓練されたアルゴリズムがあるが、これらのデータセットで訓練されたアルゴリズムは、障害のある人、幼児、新生児を含む臨床人口に一般化できないことだ。幼児と新生児の信頼性の高い運動分析は、自発的な運動行動が神経機能と神経発達障害の重要な指標であり、早期の介入を導くのに役立つため重要である。マーカーレスモーションキャプチャ(MMC)データに対する動的ガウススプラッティングの適用について検討した。本手法では, セグメンテーションマスクを用いて幼児に焦点を合わせ, シーンの初期化を著しく改善する。本手法は,シーンの新たな視点の描画や乳幼児の動きの追跡に有用である可能性が示唆された。この研究は、様々な臨床患者に応用できる高度な運動分析ツールへの道を開き、特に幼児の早期発見に重点を置いている。

Easy access to precise 3D tracking of movement could benefit many aspects of rehabilitation. A challenge to achieving this goal is that while there are many datasets and pretrained algorithms for able-bodied adults, algorithms trained on these datasets often fail to generalize to clinical populations including people with disabilities, infants, and neonates. Reliable movement analysis of infants and neonates is important as spontaneous movement behavior is an important indicator of neurological function and neurodevelopmental disability, which can help guide early interventions. We explored the application of dynamic Gaussian splatting to sparse markerless motion capture (MMC) data. Our approach leverages semantic segmentation masks to focus on the infant, significantly improving the initialization of the scene. Our results demonstrate the potential of this method in rendering novel views of scenes and tracking infant movements. This work paves the way for advanced movement analysis tools that can be applied to diverse clinical populations, with a particular emphasis on early detection in infants.

翻訳日:2023-11-01 20:31:03 公開日:2023-10-30

# 非対称拡散型チャネル適応型セキュア無線セマンティクス通信

Asymmetric Diffusion Based Channel-Adaptive Secure Wireless Semantic Communications ( http://arxiv.org/abs/2310.19439v1 )

ライセンス: Link先を確認

Xintian Ren, Jun Wu, Hansong Xu, Qianqian Pan

(参考訳) セマンティック通信は、画像分類や画像再構成といったタスクにおけるエンドツーエンドのデータ伝送の研究を推進する、新しいディープラーニングベースの通信パラダイムとして登場した。しかし、セマンティック攻撃によるセキュリティ問題は十分に検討されておらず、セマンティック通信システムの脆弱性が潜在的なセマンティック摂動にさらされている。本稿では、この問題に対処するために、拡散モデルと深層強化学習(DRL)を利用したセキュアなセマンティック通信システムDiffuSeCを提案する。送信側の拡散モジュールと受信側の非対称なデノイジングモジュールにより、DiffuSeCは、データソース攻撃やチャネル攻撃を含むセマンティック攻撃によって加えられた摂動を緩和する。セマンティック攻撃に起因する不安定なチャネル条件下でのロバスト性をさらに向上させるため、DRLに基づくチャネル適応型の拡散ステップ選択方式を開発し、変動する環境下で安定した性能を実現する。また、両端間で拡散のタイムステップを調整するためのタイムステップ同期方式を設計する。シミュレーションの結果、提案するDiffuSeCは、幅広いチャネル条件下で従来研究よりも高いロバスト精度を示し、不安定な環境下で信号対雑音比(SNR)に応じてモデルの状態を迅速に調整できることが示された。

Semantic communication has emerged as a new deep learning-based communication paradigm that drives the research of end-to-end data transmission in tasks like image classification, and image reconstruction. However, the security problem caused by semantic attacks has not been well explored, resulting in vulnerabilities within semantic communication systems exposed to potential semantic perturbations. In this paper, we propose a secure semantic communication system, DiffuSeC, which leverages the diffusion model and deep reinforcement learning (DRL) to address this issue. With the diffusing module in the sender end and the asymmetric denoising module in the receiver end, the DiffuSeC mitigates the perturbations added by semantic attacks, including data source attacks and channel attacks. To further improve the robustness under unstable channel conditions caused by semantic attacks, we developed a DRL-based channel-adaptive diffusion step selection scheme to achieve stable performance under fluctuating environments. A timestep synchronization scheme is designed for diffusion timestep coordination between the two ends. Simulation results demonstrate that the proposed DiffuSeC shows higher robust accuracy than previous works under a wide range of channel conditions, and can quickly adjust the model state according to signal-to-noise ratios (SNRs) in unstable environments.

翻訳日:2023-11-01 20:30:46 公開日:2023-10-30

# 自然ドメイン基盤モデルは医用画像分類に有用か?

Are Natural Domain Foundation Models Useful for Medical Image Classification? ( http://arxiv.org/abs/2310.19522v1 )

ライセンス: Link先を確認

Joana Palés Huix and Adithya Raju Ganeshan and Johan Fredin Haslum and Magnus Söderberg and Christos Matsoukas and Kevin Smith

(参考訳) ディープラーニングの分野は、さまざまなタスクに容易に適応できる汎用的な基盤モデルの利用へと収束しつつある。このパラダイムシフトは自然言語処理の分野では一般的になっているが、コンピュータビジョンでは進展が遅い。本稿では, 最先端の各種基盤モデルの医用画像分類タスクへの転移可能性を調査することで, この問題に取り組む。具体的には, SAM, SEEM, DINOv2, BLIP, OpenCLIPの5つの基盤モデルの性能を, 4つの確立された医用画像データセットで評価する。これらのモデルの可能性を十分に引き出すために、さまざまなトレーニング設定を検討する。我々の研究結果は一様ではない。特にDINOv2は、ImageNet事前学習という標準的な手法を一貫して上回る。しかし、他の基盤モデルはこの確立されたベースラインを一貫して上回ることができず、医用画像分類タスクへの転移可能性に限界があることを示している。

The deep learning field is converging towards the use of general foundation models that can be easily adapted for diverse tasks. While this paradigm shift has become common practice within the field of natural language processing, progress has been slower in computer vision. In this paper we attempt to address this issue by investigating the transferability of various state-of-the-art foundation models to medical image classification tasks. Specifically, we evaluate the performance of five foundation models, namely SAM, SEEM, DINOv2, BLIP, and OpenCLIP across four well-established medical imaging datasets. We explore different training settings to fully harness the potential of these models. Our study shows mixed results. DINOv2, in particular, consistently outperforms the standard practice of ImageNet pretraining. However, other foundation models failed to consistently beat this established baseline, indicating limitations in their transferability to medical image classification tasks.

翻訳日:2023-11-01 20:22:21 公開日:2023-10-30

# 対話型レコメンデーションのための一般ニューラル因果モデル

A General Neural Causal Model for Interactive Recommendation ( http://arxiv.org/abs/2310.19519v1 )

ライセンス: Link先を確認

Jialin Liu, Xinyan Su, Peng Zhou, Xiangyu Zhao, Jun Li

(参考訳) 観測データの生存者バイアスは、レコメンダシステムの最適化を局所最適へと導く。現在、ほとんどの解決策は、強化学習によって長期的な満足度を最大化するために、既存の人間とシステムの協調パターンを再マイニングしている。しかし、因果的観点から見れば、生存者効果を緩和するには反事実的問題に答える必要があり、これは一般に識別不能かつ推定不能である。本研究では,反事実推論を実現するためのニューラル因果モデルを提案する。具体的には,まず利用可能なグラフィカル表現に基づいて学習可能な構造的因果モデルを構築し,嗜好の遷移を定性的に特徴付ける。生存者バイアスの緩和は、反事実的一貫性によって達成される。一貫性を識別するために、Gumbel-max関数を構造的制約として用いる。一貫性を推定するために、強化最適化を適用し、微分可能な関数を得るためのトレードオフとしてGumbel-Softmaxを用いる。理論的および実証的な研究の両方が、我々の解の有効性を示している。

Survivor bias in observational data leads the optimization of recommender systems towards local optima. Currently, most solutions re-mine existing human-system collaboration patterns to maximize longer-term satisfaction by reinforcement learning. However, from the causal perspective, mitigating survivor effects requires answering a counterfactual problem, which is generally unidentifiable and inestimable. In this work, we propose a neural causal model to achieve counterfactual inference. Specifically, we first build a learnable structural causal model based on its available graphical representations which qualitatively characterizes the preference transitions. Mitigation of the survivor bias is achieved through counterfactual consistency. To identify the consistency, we use the Gumbel-max function as structural constraints. To estimate the consistency, we apply reinforcement optimizations, and use Gumbel-Softmax as a trade-off to get a differentiable function. Both theoretical and empirical studies demonstrate the effectiveness of our solution.
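
The abstract above contrasts the Gumbel-max function with its Gumbel-Softmax relaxation. As an illustration only (this is not the authors' model; the function names, logits, and temperature below are hypothetical), the following Python sketch shows the two generic operations: Gumbel-max draws a hard categorical sample via an argmax, while Gumbel-Softmax replaces the argmax with a temperature-scaled softmax that is differentiable in the logits.

```python
import math
import random

def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) variate by inverse transform sampling."""
    u = rng.random()
    # Clamp away from 0 and 1 so that both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))

def gumbel_max(logits, noise):
    """Hard sample: index of the largest noise-perturbed logit."""
    perturbed = [l + g for l, g in zip(logits, noise)]
    return max(range(len(perturbed)), key=perturbed.__getitem__)

def gumbel_softmax(logits, noise, temperature):
    """Soft relaxation: softmax of (logits + noise) / temperature."""
    scaled = [(l + g) / temperature for l, g in zip(logits, noise)]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical example: three categories with probabilities 0.1, 0.6, 0.3.
rng = random.Random(0)
logits = [math.log(0.1), math.log(0.6), math.log(0.3)]
noise = [sample_gumbel(rng) for _ in logits]
hard = gumbel_max(logits, noise)                        # integer index
soft = gumbel_softmax(logits, noise, temperature=0.5)   # probability vector
```

With the same noise, the largest entry of `soft` sits at index `hard`, and lowering the temperature pushes `soft` towards a one-hot vector at the cost of larger gradient variance, which is the trade-off the abstract alludes to.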

翻訳日:2023-11-01 20:22:06 公開日:2023-10-30

# 3次元シーンにおける質問に対する文脈対応自然回答の生成

Generating Context-Aware Natural Answers for Questions in 3D Scenes ( http://arxiv.org/abs/2310.19516v1 )

ライセンス: Link先を確認

Mohammed Munzer Dwedari, Matthias Niessner, Dave Zhenyu Chen

(参考訳) 3D質問応答は、3D視覚言語におけるまだ十分に探索されていない新しい分野である。従来の方法は事前に定義された回答空間に限られており、回答を自然に生成できない。本研究では,質問応答タスクをシーケンス生成タスクに転換し,3次元シーンにおける質問に対する自由形式の自然な回答を生成する(Gen3DQA)。この目的のために、我々は言語報酬に基づいてモデルを直接最適化し、文全体のセマンティクスを確保する。また,文の質をさらに向上させるために,語用論的な言語理解報酬を適応させる。本手法は,ScanQAベンチマークで新たなSOTAを達成した(テストセットのCIDErスコア72.22/66.57)。

3D question answering is a young field in 3D vision-language that is yet to be explored. Previous methods are limited to a pre-defined answer space and cannot generate answers naturally. In this work, we pivot the question answering task to a sequence generation task to generate free-form natural answers for questions in 3D scenes (Gen3DQA). To this end, we optimize our model directly on the language rewards to secure the global sentence semantics. Here, we also adapt a pragmatic language understanding reward to further improve the sentence quality. Our method sets a new SOTA on the ScanQA benchmark (CIDEr score 72.22/66.57 on the test sets).

翻訳日:2023-11-01 20:21:52 公開日:2023-10-30

# 荒天時を対象とした衛星画像からのレーダー合成図のTransformerベースのナウキャスティング

Transformer-based nowcasting of radar composites from satellite images for severe weather ( http://arxiv.org/abs/2310.19515v1 )

ライセンス: Link先を確認

Çağlar Küçük and Apostolos Giannakos and Stefan Schneider and Alexander Jann

(参考訳) 気象レーダーのデータはナウキャスティングに不可欠であり、数値気象予報モデルの重要な構成要素でもある。気象レーダーデータは高分解能で貴重な情報を提供するが、地上設置型であるため利用可能な範囲が限られ、大規模な応用の妨げとなる。対照的に、気象衛星はより広い領域をカバーするが、解像度はより粗い。しかし、データ駆動方式と静止衛星に搭載された最新のセンサーの急速な進歩により、地上観測と宇宙観測のギャップを埋める新たな機会が生まれており、最終的にはより精度の高い気象予測に繋がる。ここでは、衛星データを用いて地上レーダー画像列を最大2時間先までナウキャストするTransformerベースのモデルを提案する。荒天を反映したデータセットでトレーニングされたこのモデルは、異なる気象現象の下で発生するレーダーフィールドを予測し、急速に成長・衰退するフィールドや複雑なフィールド構造に対する頑健性を示す。モデルの解釈により、10.3 $\mu m$ を中心とする赤外線チャネル(C13)は全ての気象条件で予測に有用な情報を含む一方、雷データは荒天時、特に短いリードタイムにおいて最も高い相対的特徴重要度を持つことが分かった。このモデルは、レーダー塔を明示的に必要とせずに広い領域にわたる降水ナウキャスティングを支援し、数値気象予報と水文モデルを強化し、データが乏しい地域にレーダーの代替を提供できる。さらに、オープンソースのフレームワークは、運用レベルのデータ駆動型ナウキャスティングへの進展を促進する。

Weather radar data are critical for nowcasting and an integral component of numerical weather prediction models. While weather radar data provide valuable information at high resolution, their ground-based nature limits their availability, which impedes large-scale applications. In contrast, meteorological satellites cover larger domains but with coarser resolution. However, with the rapid advancements in data-driven methodologies and modern sensors aboard geostationary satellites, new opportunities are emerging to bridge the gap between ground- and space-based observations, ultimately leading to more skillful weather prediction with high accuracy. Here, we present a Transformer-based model for nowcasting ground-based radar image sequences using satellite data up to two hours lead time. Trained on a dataset reflecting severe weather conditions, the model predicts radar fields occurring under different weather phenomena and shows robustness against rapidly growing/decaying fields and complex field structures. Model interpretation reveals that the infrared channel centered at 10.3 $\mu m$ (C13) contains skillful information for all weather conditions, while lightning data have the highest relative feature importance in severe weather conditions, particularly in shorter lead times. The model can support precipitation nowcasting across large domains without an explicit need for radar towers, enhance numerical weather prediction and hydrological models, and provide radar proxy for data-scarce regions. Moreover, the open-source framework facilitates progress towards operational data-driven nowcasting.

翻訳日:2023-11-01 20:21:42 公開日:2023-10-30

# 深層学習を用いた抗体配列設計のための逆折り畳み

Inverse folding for antibody sequence design using deep learning ( http://arxiv.org/abs/2310.19513v1 )

ライセンス: Link先を確認

Frédéric A. Dreyer, Daniel Cutting, Constantin Schneider, Henry Kenlay, Charlotte M. Deane

(参考訳) 3次元構造情報が与えられた場合の抗体配列設計の問題を考える。先行研究に基づき,抗体構造に特化して最適化された微調整済みの逆折り畳みモデルを提案する。このモデルは、抗体に適用した場合の配列回復と構造ロバスト性において汎用タンパク質モデルを上回り、特に超可変領域であるCDR-H3ループで顕著な改善を示す。相補性決定領域の正準配座を調べ、これらのループの既知クラスターへの符号化が改善されることを見出した。最後に, 創薬およびバインダー設計へのモデルの応用を考察し, 物理ベースの手法を用いて提案配列の品質を評価する。

We consider the problem of antibody sequence design given 3D structural information. Building on previous work, we propose a fine-tuned inverse folding model that is specifically optimised for antibody structures and outperforms generic protein models on sequence recovery and structure robustness when applied to antibodies, with notable improvement on the hypervariable CDR-H3 loop. We study the canonical conformations of complementarity-determining regions and find improved encoding of these loops into known clusters. Finally, we consider the applications of our model to drug discovery and binder design and evaluate the quality of proposed sequences using physics-based methods.

翻訳日:2023-11-01 20:21:19 公開日:2023-10-30

# VideoCrafter1: 高品質ビデオ生成のためのオープン拡散モデル

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation ( http://arxiv.org/abs/2310.19512v1 )

ライセンス: Link先を確認

Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan

(参考訳) ビデオ生成は、学界と産業界の両方でますます関心を集めている。商用ツールはもっともらしいビデオを生成できるが、研究者やエンジニアが利用できるオープンソースモデルは限られている。本稿では,高品質ビデオ生成のための2つの拡散モデル、すなわちText-to-Video(T2V)モデルとImage-to-Video(I2V)モデルを紹介する。T2Vモデルは与えられたテキスト入力に基づいてビデオを合成し、I2Vモデルは追加の画像入力を取り込む。提案するT2Vモデルは、解像度 $1024 \times 576$ のリアルで映画品質のビデオを生成でき、品質面で他のオープンソースT2Vモデルを上回る。I2Vモデルは、提供された参照画像の内容に厳密に従い、その内容、構造、スタイルを保持するビデオを生成するように設計されている。このモデルは、コンテンツ保持の制約を維持しながら、与えられた画像をビデオクリップに変換できる最初のオープンソースI2V基盤モデルである。これらのオープンソースのビデオ生成モデルは、コミュニティ内の技術進歩に大きく貢献すると考えている。

Video generation has increasingly gained interest in both academia and industry. Although commercial tools can generate plausible videos, there is a limited number of open-source models available for researchers and engineers. In this work, we introduce two diffusion models for high-quality video generation, namely text-to-video (T2V) and image-to-video (I2V) models. T2V models synthesize a video based on a given text input, while I2V models incorporate an additional image input. Our proposed T2V model can generate realistic and cinematic-quality videos with a resolution of $1024 \times 576$, outperforming other open-source T2V models in terms of quality. The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style. This model is the first open-source I2V foundation model capable of transforming a given image into a video clip while maintaining content preservation constraints. We believe that these open-source video generation models will contribute significantly to the technological advancements within the community.

翻訳日:2023-11-01 20:21:07 公開日:2023-10-30

# 量子通信に向けたモノリシックシリコン中のOバンドおよび遷移金属色中心の光物理

Photophysics of O-band and transition metal color centers in monolithic silicon for quantum communications ( http://arxiv.org/abs/2310.19510v1 )

ライセンス: Link先を確認

Murat Can Sarihan, Jiahui Huang, Jin Ho Kang, Cody Fan, Wei Liu, Khalifa M. Azizur-Rahman, Baolai Liang, Chee Wei Wong

(参考訳) 低分散のOバンド波長における色中心は、エネルギー時間もつれを用いたメモリ支援量子通信に向けた長寿命量子ネットワークノードにとって不可欠な資源である。本研究では,フォトルミネッセンスのダイナミクスを調べながら,量子ビットの保存と放射効率を向上させるために、T中心およびその他の色中心欠陥を形成するプロセスを探究する。T中心の $TX_{0}$ 寿命を65%延長し、1.56 $\mu$s とした。さらに、ゼロ分散波長に近い1312 nm付近に $^*Cu_n^m$ に関連するダブレット発光が存在することを発見した。この発光はスピン縮退を持ち、0.5 Tの磁場下で磁場誘起の広がりが25%生じるため、高忠実度のスピン光子インターフェースとしてT中心の代替となり得る。

Color centers at the low-dispersion O-band wavelengths are an essential resource for long-lifetime quantum network nodes toward memory-assisted quantum communications using energy-time entanglement. In this work, we explore the process of developing T centers and other color center defects to improve qubit storage and radiative efficiency while examining the photoluminescence dynamics. We have extended the $TX_{0}$ lifetime of T centers by 65% to 1.56 $\mu$s. Furthermore, we discover the presence of a $^*Cu_n^m$ related doublet emission around 1312 nm close to the zero-dispersion wavelength, with a spin degeneracy resulting in a magnetic-field induced broadening by 25% under 0.5 T, which can be an alternative to T centers as a high-fidelity spin-photon interface.

翻訳日:2023-11-01 20:20:49 公開日:2023-10-30

# SparseByteNN: 細粒度グループスパース性に基づく新しいモバイル推論高速化フレームワーク

SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity ( http://arxiv.org/abs/2310.19509v1 )

ライセンス: Link先を確認

Haitao Xu, Songwei Liu, Yuyang Xu, Shuai Wang, Jiashi Li, Chenqian Yan, Liangqiang Li, Lean Fu, Xin Pan, Fangmin Chen

(参考訳) ネットワークサイズの増大という課題に対処するため、研究者らはネットワークプルーニングを通じてスパースモデルを開発してきた。しかし、汎用の計算デバイス上で大幅な高速化を達成しながらモデル精度を維持することは、未解決の問題である。本稿では,細粒度のカーネルスパース性を活用してリアルタイム実行と高精度を両立する、新しいモバイル推論高速化フレームワークSparseByteNNを提案する。我々のフレームワークは2つの部分からなる。 (a) 構造化プルーニングと非構造化プルーニングの中間のスパース粒度を持つ細粒度カーネルスパース性スキーマ。異なる演算子に対して複数のスパースパターンを設計する。提案するネットワーク全体の再配置戦略と組み合わせることで,このスキーマは高い圧縮率と高い精度を同時に達成する。 (b) スパースパターンと協調最適化された推論エンジン。従来の通念では、こうした理論的FLOPsの削減は実世界の効率向上にはつながらないとされてきた。我々は、ARMとWebAssembly向けの効率的なスパースカーネル群を導入することで、この誤解を正すことを目指す。スパースプリミティブの効率的な実装により,MobileNet-v1のスパース版が、効率・精度曲線において強力な密ベースラインを上回ることを示す。Qualcomm 855での実験結果によると、30%スパースのMobileNet-v1に対して、SparseByteNNは密なバージョンに対して1.27倍、最先端のスパース推論エンジンMNNに対して1.29倍の高速化を、0.224%というわずかな精度低下で達成した。SparseByteNNのソースコードは https://github.com/lswzjuer/SparseByteNN で公開予定である。

To address the challenge of increasing network size, researchers have developed sparse models through network pruning. However, maintaining model accuracy while achieving significant speedups on general computing devices remains an open problem. In this paper, we present a novel mobile inference acceleration framework SparseByteNN, which leverages fine-grained kernel sparsity to achieve real-time execution as well as high accuracy. Our framework consists of two parts: (a) A fine-grained kernel sparsity schema with a sparsity granularity between structured pruning and unstructured pruning. It designs multiple sparse patterns for different operators. Combined with our proposed whole network rearrangement strategy, the schema achieves a high compression rate and high precision at the same time. (b) Inference engine co-optimized with the sparse pattern. The conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet-v1 outperform strong dense baselines on the efficiency-accuracy curve. Experimental results on Qualcomm 855 show that for 30% sparse MobileNet-v1, SparseByteNN achieves 1.27x speedup over the dense version and 1.29x speedup over the state-of-the-art sparse inference engine MNN with a slight accuracy drop of 0.224%. The source code of SparseByteNN will be available at https://github.com/lswzjuer/SparseByteNN

翻訳日:2023-11-01 20:20:35 公開日:2023-10-30

# 変分量子特異値分解アルゴリズムの修正について

On Modifying the Variational Quantum Singular Value Decomposition Algorithm ( http://arxiv.org/abs/2310.19504v1 )

ライセンス: Link先を確認

Jezer Jojo, Ankit Khandelwal, M Girish Chandra

(参考訳) 本稿では,文献で広く知られている変分量子特異値分解アルゴリズムに対する2つの修正について考察する。1つ目は目的関数の変更であり、アルゴリズムの性能向上を示唆するとともに、回路の深さを減少させる。2つ目の修正では、アルゴリズムの重要なステップである一般行列の期待値の新しい計算方法を導入する。そして、この修正アルゴリズムをベンチマークし、新しい目的関数の性能を既存の目的関数と比較する。

In this work, we discuss two modifications that can be made to a known variational quantum singular value decomposition algorithm popular in the literature. The first is a change to the objective function which hints at improved performance of the algorithm and decreases the depth of the circuits. The second modification introduces a new way of computing expectation values of general matrices, which is a key step in the algorithm. We then benchmark this modified algorithm and compare the performance of our new objective function with the existing one.

翻訳日:2023-11-01 20:20:08 公開日:2023-10-30

# 水中ロボットの視覚ナビゲーションのための深層学習

Deep Learning for Visual Navigation of Underwater Robots ( http://arxiv.org/abs/2310.19495v1 )

ライセンス: Link先を確認

M. Sunbeam

(参考訳) 本稿では,水中ロボットの視覚ナビゲーションのための深層学習法を簡単に調査することを目的とする。本稿では,深層学習手法を用いた水中ロボットの視覚知覚,利用可能な水中視覚データセット,模倣学習,ナビゲーションのための強化学習手法について述べる。さらに, 水中ロボットの模倣学習や深層学習のパラダイムの下で, 現在の景観における訓練手法を明確にするために, 関連研究を分類する。深層学習アルゴリズムを用いて水中ナビゲーションのための非視覚データを処理する文献は、対照的な例を除いて考慮されない。

This paper aims to briefly survey deep learning methods for visual navigation of underwater robotics. The scope of this paper includes the visual perception of underwater robotics with deep learning methods, the available visual underwater datasets, imitation learning, and reinforcement learning methods for navigation. Additionally, relevant works will be categorized under the imitation learning or deep learning paradigm for underwater robots for clarity of the training methodologies in the current landscape. Literature that uses deep learning algorithms to process non-visual data for underwater navigation will not be considered, except as contrasting examples.

翻訳日:2023-11-01 20:19:36 公開日:2023-10-30

# 加法的・乗法的雑音を持つ線形SDEの生成作用素の同定

Generator Identification for Linear SDEs with Additive and Multiplicative Noise ( http://arxiv.org/abs/2310.19491v1 )

ライセンス: Link先を確認

Yuanyuan Wang, Xi Geng, Wei Huang, Biwei Huang, Mingming Gong

(参考訳) 本稿では,固定された初期状態を持つ解過程の分布から線形確率微分方程式(SDE)の生成作用素を同定するための条件を提示する。これらの識別可能性条件は、観測分布から介入後分布を同定できるようにするため、線形SDEを用いた因果推論において極めて重要である。具体的には,加法的雑音を持つ線形SDEの生成作用素を同定するための必要十分条件と,乗法的雑音を持つ線形SDEの生成作用素を同定するための十分条件を導出する。両タイプのSDEに対して導出した条件がジェネリックであることを示す。さらに, 導出した識別可能性条件の幾何学的解釈を与え, その理解を深める。理論的結果を検証するため,一連のシミュレーションを行い、得られた知見を裏付ける。

In this paper, we present conditions for identifying the generator of a linear stochastic differential equation (SDE) from the distribution of its solution process with a given fixed initial state. These identifiability conditions are crucial in causal inference using linear SDEs as they enable the identification of the post-intervention distributions from the observational distribution. Specifically, we derive a sufficient and necessary condition for identifying the generator of linear SDEs with additive noise, as well as a sufficient condition for identifying the generator of linear SDEs with multiplicative noise. We show that the conditions derived for both types of SDEs are generic. Moreover, we offer geometric interpretations of the derived identifiability conditions to enhance their understanding. To validate our theoretical results, we perform a series of simulations, which support and substantiate the established findings.
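
To make the object of study concrete, here is a minimal Python sketch that simulates a linear SDE with additive noise from a fixed initial state. It assumes the standard form dX_t = A X_t dt + G dW_t and uses the generic Euler-Maruyama scheme; the matrices, step size, and function names are hypothetical illustrations, not the simulation setup of the paper.

```python
import math
import random

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def euler_maruyama_additive(a, g, x0, dt, n_steps, rng):
    """Simulate dX = A X dt + G dW on a uniform time grid of step dt.

    Returns the list of states, including the initial state x0.
    """
    x = list(x0)
    path = [list(x)]
    n_noise = len(g[0])  # number of independent Brownian motions
    for _ in range(n_steps):
        # Brownian increments are independent N(0, dt) variates.
        dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_noise)]
        drift = mat_vec(a, x)
        diffusion = mat_vec(g, dw)
        x = [xi + di * dt + si for xi, di, si in zip(x, drift, diffusion)]
        path.append(list(x))
    return path

# Hypothetical 2-D example: a stable drift matrix and diagonal noise.
A = [[-1.0, 0.5], [0.0, -2.0]]
G = [[0.3, 0.0], [0.0, 0.3]]
path = euler_maruyama_additive(A, G, x0=[1.0, 1.0], dt=0.01,
                               n_steps=500, rng=random.Random(0))
```

Repeating the simulation with many seeds gives an empirical distribution of the solution process; the identifiability question in the abstract asks when that distribution determines the generator uniquely.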

翻訳日:2023-11-01 20:19:28 公開日:2023-10-30

# 非線形力学系のための適応メタラーニングに基づくKKLオブザーバ設計

Adaptive Meta-Learning-Based KKL Observer Design for Nonlinear Dynamical Systems ( http://arxiv.org/abs/2310.19489v1 )

ライセンス: Link先を確認

Lukas Trommer, Halil Yigit Oksuz

(参考訳) Kazantzis-Kravaris/Luenberger (KKL) オブザーバの設計の理論は、非線形変換写像とその左逆を使って線形オブザーバ状態空間を導入することで非線形システムの状態を推定する方法論を導入する。ニューラルネットワークを用いたデータ駆動アプローチは、これらの変換マップを正確に近似する能力を示している。本稿では,非線形力学系のオブザーバ設計をメタラーニングを通じて行う新しいアプローチを提案する。メタラーニングとは,基礎となる学習問題の本質的性質に着目し,タスクの分布に適応するための学習モデルを最適化することを目的とした機械学習の概念である。システム出力の測定から情報を活用するフレームワークを導入し、さまざまなシステム条件や属性にオンライン適応可能な学習ベースのKKLオブザーバを設計する。提案手法の有効性を検証するために,初期条件と内部パラメータの異なる非線形システムの状態推定を包括的に実験し,高い精度,一般化能力,雑音に対するロバスト性を示す。

The theory of Kazantzis-Kravaris/Luenberger (KKL) observer design introduces a methodology that uses a nonlinear transformation map and its left inverse to estimate the state of a nonlinear system through the introduction of a linear observer state space. Data-driven approaches using artificial neural networks have demonstrated the ability to accurately approximate these transformation maps. This paper presents a novel approach to observer design for nonlinear dynamical systems through meta-learning, a concept in machine learning that aims to optimize learning models for fast adaptation to a distribution of tasks through an improved focus on the intrinsic properties of the underlying learning problem. We introduce a framework that leverages information from measurements of the system output to design a learning-based KKL observer capable of online adaptation to a variety of system conditions and attributes. To validate the effectiveness of our approach, we present comprehensive experimental results for the estimation of nonlinear system states with varying initial conditions and internal parameters, demonstrating high accuracy, generalization capability, and robustness against noise.

翻訳日:2023-11-01 20:19:17 公開日:2023-10-30

# VDIP-TGV:全一般化変分を前提とした変分深度画像によるブラインド画像デコンボリューション

VDIP-TGV: Blind Image Deconvolution via Variational Deep Image Prior Empowered by Total Generalized Variation ( http://arxiv.org/abs/2310.19477v1 )

ライセンス: Link先を確認

Tingting Wu, Zhiyan Du, Zhi Li, Feng-Lei Fan, Tieyong Zeng

(参考訳) 未知のぼけカーネルによるぼやけた画像から鮮明な画像を復元することは難しい問題である。Deep Image Prior (DIP) は、ディープネットワークを教師ありモデルとしてではなく、単一画像に対する正則化器として使用することを提案し、非ブラインドのぼけ除去問題で有望な結果を達成している。しかし、画像とネットワークアーキテクチャの関係は不明確であるため、推定されるぼけカーネルと鮮明な画像に十分な制約を与える適切なアーキテクチャを見つけることは困難である。また、DIPはスパースな最大事後確率(MAP)推定を用いるが、これは復元画像の選択を強制するには不十分である。近年、ぼけカーネルと復元画像の両方に制約を課し、変分原理による最適化過程で画像の標準偏差を考慮する変分Deep Image Prior(VDIP)が提案されている。しかし,VDIPは画像の細部の処理に苦慮し,ぼけカーネルが大きい場合に準最適な結果を生成する傾向があることを経験的に見出した。そこで本論文では,全一般化変分(TGV)正則化をVDIPと組み合わせ,VDIPのこれらの欠点を克服する。TGVは柔軟な正則化であり、さまざまな次数の偏微分の特性を利用して異なるスケールで画像を正則化し、シャープなエッジを維持しながら油絵状のアーティファクトを低減する。提案するVDIP-TGVは、TGVを介して追加の勾配情報を補うことにより、画像のエッジと細部を効果的に復元する。さらに、このモデルは交互方向乗数法(ADMM)によって解かれ、従来のアルゴリズムとディープラーニング手法を効果的に組み合わせる。実験により,提案するVDIP-TGVは,様々な最先端モデルを定量的かつ定性的に上回ることが示された。

Recovering clear images from blurry ones with an unknown blur kernel is a challenging problem. Deep image prior (DIP) proposes to use the deep network as a regularizer for a single image rather than as a supervised model, which achieves encouraging results in the nonblind deblurring problem. However, since the relationship between images and the network architectures is unclear, it is hard to find a suitable architecture to provide sufficient constraints on the estimated blur kernels and clean images. Also, DIP uses the sparse maximum a posteriori (MAP), which is insufficient to enforce the selection of the recovery image. Recently, variational deep image prior (VDIP) was proposed to impose constraints on both blur kernels and recovery images and take the standard deviation of the image into account during the optimization process by the variational principle. However, we empirically find that VDIP struggles with processing image details and tends to generate suboptimal results when the blur kernel is large. Therefore, we combine total generalized variation (TGV) regularization with VDIP in this paper to overcome these shortcomings of VDIP. TGV is a flexible regularization that utilizes the characteristics of partial derivatives of varying orders to regularize images at different scales, reducing oil painting artifacts while maintaining sharp edges. The proposed VDIP-TGV effectively recovers image edges and details by supplementing extra gradient information through TGV. Additionally, this model is solved by the alternating direction method of multipliers (ADMM), which effectively combines traditional algorithms and deep learning methods. Experiments show that our proposed VDIP-TGV surpasses various state-of-the-art models quantitatively and qualitatively.

翻訳日:2023-11-01 20:19:00 公開日:2023-10-30

# Grokking Tickets: 宝くじチケットはグロッキングを加速する

Grokking Tickets: Lottery Tickets Accelerate Grokking ( http://arxiv.org/abs/2310.19470v1 )

ライセンス: Link先を確認

Gouki Minegishi, Yusuke Iwasawa and Yutaka Matsuo

(参考訳) グロッキングは、ニューラルネットワークの汎化における最も驚くべき謎の1つである。ネットワークはまず、訓練精度は完全だが汎化性能の低い記憶解に到達し、さらに訓練を続けると完全に汎化した解に到達する。我々は,宝くじ仮説の観点からグロッキングのメカニズムを分析し,宝くじ(良いスパースなサブネットワーク)を見つける過程が、記憶と汎化の間の遷移相を説明する鍵であるとする。我々はこれらのサブネットワークを「Grokking tickets」と呼び、完全な汎化の後にマグニチュードプルーニングによって特定する。まず,「Grokking tickets」を用いて,様々な構成(MLPとTransformer、算術タスクと画像分類タスク)において、密なネットワークと比較して宝くじがグロッキングを劇的に加速することを示す。また,「Grokking tickets」が重みノルムよりも重要な要因であることを確認するため,「良い」サブネットワークと、同じL1およびL2ノルムを持つ密なネットワークを比較した。その結果、サブネットワークはノルムを揃えた密なモデルよりも速く汎化することが示された。さらなる調査で、適切なプルーニング率では、重み減衰なしでもグロッキングが達成できることが判明した。また,記憶解の時点や記憶と汎化の間の遷移の時点で特定したチケットを使用する場合や、初期化時にネットワークをプルーニングする場合(ランダムプルーニング、Grasp、SNIP、Synflow)には高速化が起こらないことを示す。これらの結果は、ネットワークパラメータの重みノルムだけではグロッキングの過程を説明するのに十分ではなく、記憶から汎化への遷移を説明するには良いサブネットワークを見つけることが重要であることを示している。実装コードは https://github.com/gouki510/Grokking-Tickets からアクセスできる。

Grokking is one of the most surprising puzzles in neural network generalization: a network first reaches a memorization solution with perfect training accuracy and poor generalization, but with further training, it reaches a perfectly generalized solution. We aim to analyze the mechanism of grokking from the lottery ticket hypothesis, identifying the process to find the lottery tickets (good sparse subnetworks) as the key to describing the transitional phase between memorization and generalization. We refer to these subnetworks as "Grokking tickets", which are identified via magnitude pruning after perfect generalization. First, using "Grokking tickets", we show that the lottery tickets drastically accelerate grokking compared to the dense networks on various configurations (MLP and Transformer, and arithmetic and image classification tasks). Additionally, to verify that "Grokking tickets" are a more critical factor than weight norms, we compared the "good" subnetworks with a dense network having the same L1 and L2 norms. Results show that the subnetworks generalize faster than the controlled dense model. In further investigations, we discovered that at an appropriate pruning rate, grokking can be achieved even without weight decay. We also show that speedup does not happen when using tickets identified at the memorization solution or transition between memorization and generalization or when pruning networks at the initialization (Random pruning, Grasp, SNIP, and Synflow). The results indicate that the weight norm of network parameters is not enough to explain the process of grokking, and that finding good subnetworks is important for describing the transition from memorization to generalization. The implementation code can be accessed via this link: https://github.com/gouki510/Grokking-Tickets.
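
The abstract identifies its subnetworks by magnitude pruning. The Python sketch below illustrates that generic operation on a flat list of weights: the smallest-magnitude fraction is masked out and the rest is kept. The function names, the toy weights, and the 50% pruning rate are hypothetical; the authors' actual procedure (layer-wise versus global pruning, and how the mask is reused for retraining) is in their repository and is not reproduced here.

```python
def magnitude_prune_mask(weights, prune_rate):
    """Return a 0/1 mask that drops the prune_rate fraction of smallest-|w| entries."""
    if not 0.0 <= prune_rate < 1.0:
        raise ValueError("prune_rate must be in [0, 1)")
    n_prune = int(len(weights) * prune_rate)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]

def apply_mask(weights, mask):
    """Zero out the pruned weights; kept weights are unchanged."""
    return [w * m for w, m in zip(weights, mask)]

# Hypothetical weights of an already trained (generalizing) layer.
weights = [0.05, -1.2, 0.4, -0.01, 0.9, -0.3]
mask = magnitude_prune_mask(weights, prune_rate=0.5)
ticket = apply_mask(weights, mask)
```

In a lottery-ticket style experiment the mask, rather than the pruned weights, is the object of interest: it is applied to a network that is then trained from its initialization, and the time to generalize is compared against the dense network.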

翻訳日:2023-11-01 20:18:28 公開日:2023-10-30

# CreoleVal: クレオール言語のための多言語マルチタスクベンチマーク

CreoleVal: Multilingual Multitask Benchmarks for Creoles ( http://arxiv.org/abs/2310.19567v1 )

ライセンス: Link先を確認

Heather Lent and Kushal Tatariya and Raj Dabre and Yiyi Chen and Marcell Fekete and Esther Ploeger and Li Zhou and Hans Erik Heje and Diptesh Kanojia and Paul Belony and Marcel Bollmann and Loïc Grobol and Miryam de Lhoneux and Daniel Hershcovich and Michel DeGraff and Anders Søgaard and Johannes Bjerva

(参考訳) クレオール言語は、研究が進んでおらず周縁化された言語群であり、NLP研究に利用可能なリソースは少ない。クレオール言語と他の高リソース言語との系譜的なつながりは転移学習の大きな可能性を示唆しているが、注釈付きデータの不足によりこの可能性は妨げられている。本研究では、最大28のクレオール言語をカバーし、8つの異なるNLPタスクにまたがるベンチマークデータセットのコレクションであるCreoleValを紹介する。これは、クレオール言語向けの機械読解、関係分類、機械翻訳のための全く新しい開発用データセットを集約したものであり、加えて少数の既存ベンチマークへの実用的な入口でもある。各ベンチマークについて,ゼロショット設定でベースライン実験を行い,クレオール言語に対する転移学習の能力と限界をさらに確認する。最終的に、CreoleValの目標は、NLPおよび計算言語学におけるクレオール言語の研究を後押しすることである。このリソースが世界中のクレオール言語話者の技術的包摂に貢献することを願っている。

Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and other highly-resourced languages imply a significant potential for transfer learning, this potential is hampered due to this lack of annotated data. In this work we present CreoleVal, a collection of benchmark datasets spanning 8 different NLP tasks, covering up to 28 Creole languages; it is an aggregate of brand new development datasets for machine comprehension, relation classification, and machine translation for Creoles, in addition to a practical gateway to a handful of preexisting benchmarks. For each benchmark, we conduct baseline experiments in a zero-shot setting in order to further ascertain the capabilities and limitations of transfer learning for Creoles. Ultimately, the goal of CreoleVal is to empower research on Creoles in NLP and computational linguistics. We hope this resource will contribute to technological inclusion for Creole language users around the globe.

翻訳日:2023-11-01 20:09:56 公開日:2023-10-30

# トポロジカル相と量子相のための合成次元: 展望

Synthetic dimensions for topological and quantum phases: Perspective ( http://arxiv.org/abs/2310.19549v1 )

ライセンス: Link先を確認

Javier Argüello-Luengo, Utso Bhattacharya, Alessio Celi, Ravindra W. Chhajlany, Tobias Grass, Marcin Płodzień, Debraj Rakshit, Tymoteusz Salamon, Paolo Stornati, Leticia Tarruell, and Maciej Lewenstein

(参考訳) 本パースペクティブ論文では,バルセロナのグループ (ICFO, UAB), Donostia (DIPC), Poznań (UAM), Kraków (UJ), Allahabad (HRI) を中心に行われた研究に主に(ただしそれだけではなく)基づいて, 合成次元の研究における最近の進展について報告する。合成次元の概念は原子物理学、量子光学、フォトニクスにおいて特によく機能し、そこでは内部自由度(原子の場合は基底状態のゼーマン副準位、準安定励起状態、または運動状態、光子の場合は角運動量状態または横モード)が合成空間を提供する。本稿では, 合成次元を持つ量子シミュレータを設計し, 曲がった空間, 人工ゲージ場, 格子ゲージ理論, ツイストロニクス, 量子ランダムウォークなどを模倣する我々の試みについて述べる。

In this Perspective article we report on recent progress on studies of synthetic dimensions, mostly, but not only, based on the research realized around the Barcelona groups (ICFO, UAB), Donostia (DIPC), Poznań (UAM), Kraków (UJ), and Allahabad (HRI). The concept of synthetic dimensions works particularly well in atomic physics, quantum optics, and photonics, where the internal degrees of freedom (Zeeman sublevels of the ground state, metastable excited states, or motional states for atoms, and angular momentum states or transverse modes for photons) provide the synthetic space. We describe our attempts to design quantum simulators with synthetic dimensions, to mimic curved spaces, artificial gauge fields, lattice gauge theories, twistronics, quantum random walks, and more.

翻訳日:2023-11-01 20:09:39 公開日:2023-10-30

# ワッサーシュタイン空間における近似理論, 計算, 深層学習

Approximation Theory, Computing, and Deep Learning on the Wasserstein Space ( http://arxiv.org/abs/2310.19548v1 )

ライセンス: Link先を確認

Massimo Fornasier and Pascal Heid and Giacomo Enrico Sodini

(参考訳) 有限個のサンプルから無限次元空間上の関数を近似するという課題は、非常に困難であると広く見なされている。本研究では,確率空間上で定義されるソボレフ滑らかな関数の数値近似という難しい問題を掘り下げる。特に、関連する例となるワッサーシュタイン距離関数に焦点を当てる。点ごとの評価を効率的に近似することに焦点を当てた既存の文献とは対照的に、我々は3つの機械学習に基づくアプローチを採用して関数近似を定義するという新たな方針をとる。 1. 有限個の最適輸送問題を解き、対応するワッサーシュタインポテンシャルを計算する。 2. ワッサーシュタイン・ソボレフ空間におけるTikhonov正則化を用いた経験的リスク最小化を行う。 3. Tikhonov汎関数のオイラー・ラグランジュ方程式の弱形式を特徴づける鞍点定式化を通じて問題に取り組む。理論的な貢献として,これらの各解に対する汎化誤差の明示的かつ定量的な上界を与える。証明では、計量ソボレフ空間の理論を活用し、最適輸送、変分法、大偏差評価の技法と組み合わせる。数値実装では,適切に設計したニューラルネットワークを基底関数として用いる。これらのネットワークは多様な方法論を用いて訓練される。このアプローチにより、訓練後に高速に評価できる近似関数が得られる。その結果, 我々の構成的な解は, 同等の精度で評価速度を大幅に向上させ, 最先端手法を数桁上回る。

The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. In this study, we delve into the challenging problem of the numerical approximation of Sobolev-smooth functions defined on probability spaces. Our particular focus centers on the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body of literature focused on approximating efficiently pointwise evaluations, we chart a new course to define functional approximants by adopting three machine learning-based approaches: 1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials. 2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces. 3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional's Euler-Lagrange equation. As a theoretical contribution, we furnish explicit and quantitative bounds on generalization errors for each of these solutions. In the proofs, we leverage the theory of metric Sobolev spaces and we combine it with techniques of optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness appropriately designed neural networks to serve as basis functions. These networks undergo training using diverse methodologies. This approach allows us to obtain approximating functions that can be rapidly evaluated after training. Consequently, our constructive solutions significantly enhance at equal accuracy the evaluation speed, surpassing that of state-of-the-art methods by several orders of magnitude.
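
For readers unfamiliar with the target function, the Python sketch below computes the Wasserstein-p distance in the one special case where it has a simple closed form: two empirical measures on the real line with the same number of equally weighted points, where the optimal coupling matches sorted samples. This is only a reference illustration of the quantity being approximated; the function name and inputs are hypothetical, and the general (multi-dimensional) setting of the paper requires solving an optimal transport problem instead.

```python
def wasserstein_1d(xs, ys, p=1):
    """Wasserstein-p distance between two equal-size empirical measures on the line.

    For equally weighted samples in 1-D the optimal transport plan pairs the
    i-th smallest point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal size")
    if p < 1:
        raise ValueError("p must be at least 1")
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
    return cost ** (1.0 / p)

# Shifting every point by 1 moves the measure a distance of exactly 1.
shifted = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0], p=2)
```

Evaluating the distance this way requires a sort per query, and in higher dimensions a full transport solve; a trained approximant, as proposed in the abstract, replaces that per-query cost with a single function evaluation.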

翻訳日:2023-11-01 20:09:22 公開日:2023-10-30

# 原子核ab initio計算のための一般球面基底上での1体・2体・3体行列要素の展開

Expansion of one-, two- and three-body matrix elements on a generic spherical basis for nuclear ab initio calculations ( http://arxiv.org/abs/2310.19547v1 )

ライセンス: Link先を確認

Alberto Scalesi, Carlo Barbieri, Enrico Vigezzi

(参考訳) 原子核の研究は、非常に複雑な構造を持つ一、二、三体作用素を含むハミルトニアンに基づいている。伝統的に、そのような作用素の行列要素はハーモニック振動子単一粒子ベースで拡張され、これは内在的な運動の中心運動を単純な分離を可能にする。最近のいくつかの研究により、異なる単粒子基底を用いると数値核構造計算に大きな利点をもたらすことが示されている。本研究では、一般球面上で拡張されたハミルトン行列要素の完全な解析的表現を初めて提示する。これにより、最適な核基地を決定するための体系的な研究が可能になる。

Ab initio studies of atomic nuclei are based on Hamiltonians including one-, two- and three-body operators with very complicated structures. Traditionally, matrix elements of such operators are expanded on a harmonic oscillator single-particle basis, which allows for a simple separation of the center-of-mass motion from the intrinsic one. A few recent investigations have shown that the use of different single-particle bases can bring significant advantages to numerical nuclear structure computations. In this work, the complete analytical expression of the Hamiltonian matrix elements expanded on a generic spherical basis is presented for the first time. This will allow systematic studies aimed at the determination of optimal nuclear bases.

翻訳日:2023-11-01 20:08:59 公開日:2023-10-30

# MENTOR:アイリス提示検出のための人間の知覚誘導事前訓練

MENTOR: Human Perception-Guided Pretraining for Iris Presentation Detection ( http://arxiv.org/abs/2310.19545v1 )

ライセンス: Link先を確認

Colton R. Crum, Adam Czajka

(参考訳) CNNのトレーニングに人間のサリエンスを取り入れることで、生体情報提示攻撃検出などの困難なタスクのパフォーマンスが向上した。しかし、アノテーションの収集は面倒な作業であり、アノテーションが手に入ると、(モデルアーキテクチャにおいて)この情報をモデルのトレーニングに効率的に組み込む方法や方法に関する問題には言及しない。本稿では、これらの問題を2回の訓練で解決するMENTOR(huMan pErceptioN-guided preTraining fOr iris pResentation attack Detection)を紹介する。まず、入力虹彩画像(実例と偽例の両方)から人間の唾液マップを学習するためにオートエンコーダを訓練する。この表現が学習されると、トレーニングされたautoencoderを2つの方法で利用します。 (a)アイリス提示攻撃検知器の事前訓練されたバックボーンとして、及び (b) 未知データ上の有能な特徴の人為的なアノテータである。 MENTORの利点は3つあります。 (a)一般用重量(例えば、画像ネットソース、ランダム)と比較して、人間の知覚訓練エンコーダの重量を使用する場合のアイリスPAD性能の顕著な向上 b) 未確認アイリスPADサンプルに対する無数のヒト様唾液マップを作成する能力、及び、ヒト唾液誘導訓練パラダイムにおける使用方法 (c)虹彩PADモデルトレーニングの効率性の向上。資料のコードと重みが同紙とともに提供される。

Incorporating human salience into the training of CNNs has boosted performance in difficult tasks such as biometric presentation attack detection. However, collecting human annotations is a laborious task, not to mention the questions of how and where (in the model architecture) to efficiently incorporate this information into the model's training once annotations are obtained. In this paper, we introduce MENTOR (huMan pErceptioN-guided preTraining fOr iris pResentation attack detection), which addresses both of these issues through two unique rounds of training. First, we train an autoencoder to learn human saliency maps given an input iris image (both real and fake examples). Once this representation is learned, we utilize the trained autoencoder in two different ways: (a) as a pre-trained backbone for an iris presentation attack detector, and (b) as a human-inspired annotator of salient features on unknown data. We show that MENTOR's benefits are threefold: (a) a significant boost in iris PAD performance when using the human perception-trained encoder's weights compared to general-purpose weights (e.g. ImageNet-sourced, or random), (b) the capability of generating an unlimited number of human-like saliency maps for unseen iris PAD samples to be used in any human saliency-guided training paradigm, and (c) an increase in the efficiency of iris PAD model training. Source code and weights are offered along with the paper.

翻訳日:2023-11-01 20:08:48 公開日:2023-10-30

# 単発視覚追跡における画像関連誘導バイアスの活用

Exploiting Image-Related Inductive Biases in Single-Branch Visual Tracking ( http://arxiv.org/abs/2310.19542v1 )

ライセンス: Link先を確認

Chuanming Tang, Kai Wang, Joost van de Weijer, Jianlin Zhang, Yongmei Huang

(参考訳) 視覚追跡における最先端のパフォーマンスにもかかわらず、最近のシングルブランチトラッカーは、ビジョントランスフォーマー(ViT)エンコーダと推論パイプラインに関連する、弱い前提を見逃す傾向にある。さらに, 判別トラッカの有効性は, デュアルブランチパイプラインの採用により制限されている。単分岐ネットワークと識別モデルとのギャップを埋めるための適応型ViTモデル予測トラッカー(AViTMP)を提案する。具体的には,提案するエンコーダavit-encにおいて,vitに基づく密組込みパラダイムを豊かにするために,アダプタモジュールとジョイントターゲット状態埋め込みを導入する。次にavit-encと密輸デコーダと判別対象モデルを組み合わせて正確な位置を推定する。さらに,従来の推論手法の限界を緩和するため,双方向のサイクルトラッキング検証により,トラクタの存在下でのロバスト性を向上するCycleTrackという新しい推論パイプラインを提案する。最後に,長期的なシナリオにおいて大きな課題を積極的に処理する,デュアルフレーム更新推論戦略を提案する。実験では,lasot,lasotextsub,avistなどを含む総合評価のための10のトラッキングベンチマークについてavitmpを評価した。実験結果から,AViTMPが最先端の性能,特に長期追跡とロバスト性を達成したことが明らかとなった。

Despite achieving state-of-the-art performance in visual tracking, recent single-branch trackers tend to overlook the weak prior assumptions associated with the Vision Transformer (ViT) encoder and inference pipeline. Moreover, the effectiveness of discriminative trackers remains constrained due to the adoption of the dual-branch pipeline. To tackle the inferior effectiveness of the vanilla ViT, we propose an Adaptive ViT Model Prediction tracker (AViTMP) to bridge the gap between single-branch networks and discriminative models. Specifically, in the proposed encoder AViT-Enc, we introduce an adaptor module and joint target state embedding to enrich the dense embedding paradigm based on ViT. Then, we combine AViT-Enc with a dense-fusion decoder and a discriminative target model to predict accurate locations. Further, to mitigate the limitations of conventional inference practice, we present a novel inference pipeline called CycleTrack, which bolsters tracking robustness in the presence of distractors via bidirectional cycle tracking verification. Lastly, we propose a dual-frame update inference strategy that adeptly handles significant challenges in long-term scenarios. In the experiments, we evaluate AViTMP on ten tracking benchmarks for a comprehensive assessment, including LaSOT, LaSOTExtSub, AVisT, etc. The experimental results show that AViTMP attains state-of-the-art performance, especially on long-term tracking and robustness.

翻訳日:2023-11-01 20:08:24 公開日:2023-10-30

# IterInv:Pixel-Level T2Iモデルの反復インバージョン

IterInv: Iterative Inversion for Pixel-Level T2I Models ( http://arxiv.org/abs/2310.19540v1 )

ライセンス: Link先を確認

Chuanming Tang, Kai Wang, Joost van de Weijer

(参考訳) 大規模テキスト画像拡散モデルは、入力テキストプロンプトに従って説得力のある画像を生成するための画期的な開発である。画像編集研究の目的は、ユーザーがテキストプロンプトを変更することによって生成された画像を制御することである。現在の画像編集技術は、LDM(Latent Diffusion Models)に基づくDDIMインバージョンに依存している。しかし、LDMがオートエンコーダ機構を備えた最初の圧縮段階により詳細を失うと、遅延空間で動作する大きな事前訓練されたT2Iモデルが存在する。代わりに、ImagenやDeepFloyd-IFといった画素レベルで動作する別のメインストリームのT2Iパイプラインは、この問題を回避する。通常は複数のステージで構成され、通常はテキストから画像へのステージと、いくつかの超解像度ステージで構成される。この場合、DDIMのインバージョンは、超解像拡散モデルがDDIM技術と互換性がないため、元の画像を生成する初期ノイズを見つけることができない。実験結果によると,雑音画像を条件として反復結合することがこの問題の根源である。本研究では,このT2Iモデルのストリームに対する反復反転(IterInv)手法を開発し,オープンソースのDeepFloyd-IFモデルを用いてIterInvを検証する。 IterInvの手法と一般的な画像編集手法を組み合わせることで、IterInvの応用可能性を証明する。コードは \url{https://github.com/Tchuanm/IterInv.git} でリリースされる。

Large-scale text-to-image diffusion models have been a ground-breaking development in generating convincing images following an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques commonly rely on DDIM inversion based on Latent Diffusion Models (LDM). However, large pretrained T2I models that work in the latent space, such as LDM, lose details because of the first compression stage with an autoencoder mechanism. In contrast, another mainstream T2I pipeline that works at the pixel level, such as Imagen and DeepFloyd-IF, avoids this problem. Such pipelines are commonly composed of several stages, normally a text-to-image stage followed by several super-resolution stages. In this case, DDIM inversion is unable to find the initial noise that generates the original image, because the super-resolution diffusion models are not compatible with the DDIM technique. According to our experimental findings, iteratively concatenating the noisy image as the condition is the root of this problem. Based on this observation, we develop an iterative inversion (IterInv) technique for this stream of T2I models and verify IterInv with the open-source DeepFloyd-IF model. By combining our method IterInv with a popular image editing method, we demonstrate the application prospects of IterInv. The code will be released at https://github.com/Tchuanm/IterInv.git.
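For context, the deterministic DDIM inversion step mentioned in this abstract is commonly written as below. This is the standard textbook form rather than notation from the abstract itself: $\bar{\alpha}_t$ (the cumulative noise schedule) and $\epsilon_\theta$ (the learned noise predictor) are symbols assumed here for illustration.

```latex
% One DDIM inversion step: move the sample x_t to the noisier x_{t+1}
% by reusing the noise predicted at step t (deterministic, eta = 0).
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}},
\qquad
x_{t+1} = \sqrt{\bar{\alpha}_{t+1}}\,\hat{x}_0 + \sqrt{1-\bar{\alpha}_{t+1}}\,\epsilon_\theta(x_t, t)
```

In a super-resolution stage the predictor is additionally conditioned on a noised low-resolution image, which is the conditioning the abstract identifies as the root of the incompatibility.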

翻訳日:2023-11-01 20:07:58 公開日:2023-10-30

# チーム問題解決をリアルタイムで改善するための新しい表現

A Novel Representation to Improve Team Problem Solving in Real-Time ( http://arxiv.org/abs/2310.19539v1 )

ライセンス: Link先を確認

Alex Doboli

(参考訳) 本稿では,実生活における問題解決におけるチームの行動の理解と改善を支援する,計算メトリクスを支援する新しい表現を提案する。チームは現代の活動において重要ですが、活動を改善するためのコンピューティング支援はほとんどありません。この表現は、解決中に開発、拡張、利用された異なるメンタルイメージをキャプチャする。ケーススタディは表現を示します。

This paper proposes a novel representation to support computing metrics that help understand and improve, in real time, a team's behavior during real-life problem solving. Even though teams are important in modern activities, there is little computing aid to improve their activity. The representation captures the different mental images developed, enhanced, and utilized during problem solving. A case study illustrates the representation.

翻訳日:2023-11-01 20:07:35 公開日:2023-10-30

# 量子レゴとXP安定化コード

Quantum Lego and XP Stabilizer Codes ( http://arxiv.org/abs/2310.19538v1 )

ライセンス: Link先を確認

Ruohan Shen, Yixu Wang and ChunJun Cao

(参考訳) 我々は,'quantum lego' の最近のグラフィカルな枠組みを,安定化群が一般に非可換な xp 安定化符号に適用する。演算子マッチングの考え方がそのような符号を保ち続けており、結果の符号が XP であればすべての XP 対称性を生成するのに十分であることを示す。テンソル収縮や結合の下でこれらの対称性を追跡する効率的な古典アルゴリズムを提供する。これは、パウリの安定化状態やクリフォード演算を超えて、ゴッテマン・クニルの定理によって暗示されるアルゴリズムの部分拡張を構成する。共役変換は普遍的な量子演算を生成するため、これらのアルゴリズムから得られるXP対称性は一般に得られるテンソルを一意に特定しない。この拡張フレームワークを使用することで、高い距離を持つ新しいXP安定化コードとフォールトトレラントな$T$ゲートを持つ$[[8,1,2]]$コードを提供します。 XP正規符号に対しては、任意の単一キュービットエラーチャネルに対して、テンソルネットワークに基づく最大可能性復号器を構築する。

We apply the recent graphical framework of "quantum lego" to XP stabilizer codes where the stabilizer group is generally non-abelian. We show that the idea of operator matching continues to hold for such codes and is sufficient for generating all their XP symmetries provided the resulting code is XP. We provide an efficient classical algorithm for tracking these symmetries under tensor contraction or conjoining. This constitutes a partial extension of the algorithm implied by the Gottesman-Knill theorem beyond Pauli stabilizer states and Clifford operations. Because conjoining transformations generate quantum operations that are universal, the XP symmetries obtained from these algorithms do not uniquely identify the resulting tensors in general. Using this extended framework, we provide a novel XP stabilizer code with higher distance and a $[[8,1,2]]$ code with a fault-tolerant $T$ gate. For XP regular codes, we also construct a tensor-network-based maximum likelihood decoder for any i.i.d. single-qubit error channel.

翻訳日:2023-11-01 20:07:30 公開日:2023-10-30

# 判別的特徴を有するデータに対する微調整の影響について

On consequences of finetuning on data with highly discriminative features ( http://arxiv.org/abs/2310.19537v1 )

ライセンス: Link先を確認

Wojciech Masarczyk, Tomasz Trzci\'nski, Mateusz Ostaszewski

(参考訳) トランスファーラーニングの時代、スクラッチからニューラルネットワークを訓練することは時代遅れになりつつある。転送学習は新しいタスクの事前知識を活用し、計算資源を保存する。ネットワークは基本的なデータパターンを優先し、事前学習した価値のある機能を禁止する傾向があります。この挙動を「機能侵食」と呼び、ネットワーク性能と内部表現への影響を分析する。

In the era of transfer learning, training neural networks from scratch is becoming obsolete. Transfer learning leverages prior knowledge for new tasks, conserving computational resources. While its advantages are well-documented, we uncover a notable drawback: networks tend to prioritize basic data patterns, forsaking valuable pre-learned features. We term this behavior "feature erosion" and analyze its impact on network performance and internal representations.

翻訳日:2023-11-01 20:07:16 公開日:2023-10-30

# 逆バッチ逆強化学習 : 対話的勧告のための不完全な実証から振り返る

Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation ( http://arxiv.org/abs/2310.19536v1 )

ライセンス: Link先を確認

Jialin Liu, Xinyan Su, Zeyu He, Xiangyu Zhao, Jun Li

(参考訳) 報酬はユーザの満足度を測る指標であり、インタラクティブなレコメンデーションシステムでは制限要因として機能する。本研究では,強化学習の基礎となる学習報酬問題(LTR)に焦点を当てた。従来のアプローチでは、報酬を得るための追加の手順を導入し、最適化の複雑さを増大させるか、ユーザとエージェントのインタラクションが完璧なデモを提供すると仮定する。理想的には、構成実証を用いて報酬と政策の両方を最適化する統一的なアプローチを採用することを目指している。しかし、この要件は、報酬が本質的に政治におけるユーザーのフィードバックを定量化するのに対し、推薦エージェントは政治外の将来的な累積評価を近似するため、課題となる。この課題に取り組むために,要求される特性を実現する新しいバッチ逆強化学習パラダイムを提案する。 LTRとレコメンダエージェント評価を併用するために,ディスカウントされた定常分布補正を利用する。構成要件を満たすために,保存を通じて悲観主義の概念を取り入れる。具体的には,ベルマン変換を用いてバニラ補正を修正し,KL正則化を適用した。実世界の2つのデータセットを用いて経験的研究を行い,提案手法は相対的に有効性(2.3\%)と効率(11.53\%)を向上することを示した。

Rewards serve as a measure of user satisfaction and act as a limiting factor in interactive recommender systems. In this research, we focus on the problem of learning to reward (LTR), which is fundamental to reinforcement learning. Previous approaches either introduce additional procedures for learning to reward, thereby increasing the complexity of optimization, or assume that user-agent interactions provide perfect demonstrations, which is not feasible in practice. Ideally, we aim to employ a unified approach that optimizes both the reward and policy using compositional demonstrations. However, this requirement presents a challenge since rewards inherently quantify user feedback on-policy, while recommender agents approximate off-policy future cumulative valuation. To tackle this challenge, we propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties. Our method utilizes discounted stationary distribution correction to combine LTR and recommender agent evaluation. To fulfill the compositional requirement, we incorporate the concept of pessimism through conservation. Specifically, we modify the vanilla correction using Bellman transformation and enforce KL regularization to constrain consecutive policy updates. We conduct empirical studies on two real-world datasets that represent two levels of compositional coverage. The results show that the proposed method yields relative improvements in both effectiveness (2.3%) and efficiency (11.53%).

翻訳日:2023-11-01 20:07:11 公開日:2023-10-30

# レガシビデオコンテンツの再生:双方向情報伝達によるデインターレース

Revitalizing Legacy Video Content: Deinterlacing with Bidirectional Information Propagation ( http://arxiv.org/abs/2310.19535v1 )

ライセンス: Link先を確認

Zhaowei Gao, Mingyang Song, Christopher Schroers, Yang Zhang

(参考訳) 古いcrt表示技術と限られた伝送帯域のため、初期のフィルムやテレビ放送ではインターレース走査が一般的であった。これは各フィールドが情報の半分しか含まないことを意味する。現代のディスプレイはフルフレームを必要とするため、これはデインターレースの研究、すなわちレガシービデオコンテンツの欠落した情報を復元するきっかけとなった。本稿では,アニメーションコンテンツとライブアクションコンテンツを分離する深層学習手法を提案する。提案手法は,空間と時間の両方で情報を活用するために,複数スケールにわたる双方向時空間情報伝搬を支援する。より具体的には,アライメント,融合,整流などの機能改良を行うフローガイドリファインメントブロック(frb)を設計する。さらに,複数のフィールドを同時に処理し,フレーム単位の処理時間を短縮し,リアルタイム処理を可能にする。実験の結果,提案手法は既存手法と比較して優れた性能を示した。

Due to old CRT display technology and limited transmission bandwidth, early film and TV broadcasts commonly used interlaced scanning. This meant each field contained only half of the information. Since modern displays require full frames, this has spurred research into deinterlacing, i.e. restoring the missing information in legacy video content. In this paper, we present a deep-learning-based method for deinterlacing animated and live-action content. Our proposed method supports bidirectional spatio-temporal information propagation across multiple scales to leverage information in both space and time. More specifically, we design a Flow-guided Refinement Block (FRB) which performs feature refinement including alignment, fusion, and rectification. Additionally, our method can process multiple fields simultaneously, reducing per-frame processing time, and potentially enabling real-time processing. Our experimental results demonstrate that our proposed method achieves superior performance compared to existing methods.
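To make the deinterlacing problem concrete, the sketch below splits a frame into its two fields and rebuilds a full frame from a single field by averaging vertical neighbours. This is a classical interpolation baseline, not the learned method in the abstract. The function names and the list-of-rows frame representation are assumptions chosen for illustration.

```python
def split_fields(frame):
    """Split a frame (list of pixel rows) into the top field (even rows)
    and the bottom field (odd rows)."""
    return frame[0::2], frame[1::2]


def deinterlace_linear(field, top=True):
    """Rebuild a full-height frame from one field.

    Known rows are copied to their original positions; each missing row is
    the average of the rows directly above and below it, or a copy of its
    single neighbour at the frame border.
    """
    height = 2 * len(field)
    frame = [None] * height
    offset = 0 if top else 1
    for i, row in enumerate(field):
        frame[2 * i + offset] = list(row)

    for y in range(height):
        if frame[y] is not None:
            continue
        above = frame[y - 1] if y - 1 >= 0 else None
        below = frame[y + 1] if y + 1 < height else None
        if above is None:
            frame[y] = list(below)
        elif below is None:
            frame[y] = list(above)
        else:
            frame[y] = [(a + b) / 2 for a, b in zip(above, below)]
    return frame
```

A learned deinterlacer replaces the averaging step with a network that can also draw on neighbouring fields in time, which is where the bidirectional propagation described above comes in.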

翻訳日:2023-11-01 20:06:43 公開日:2023-10-30

# 生成言語モデルにおける学習困難度軽減のための情報エントロピー損失

InfoEntropy Loss to Mitigate Bias of Learning Difficulties for Generative Language Models ( http://arxiv.org/abs/2310.19531v1 )

ライセンス: Link先を確認

Zhenpeng Su, Xing Wu, Xue Bai, Zijia Lin, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu

(参考訳) 生成言語モデルは、通常、前のものから次のトークン(サブワード/ワード/フレーズ)を予測することによって、大きなテキストコーパスで事前訓練される。最近の研究は、下流タスクにおける大規模な生成言語モデルの印象的な性能を実証している。しかし、既存の生成言語モデルは、訓練中にテキストコーパスに固有の課題、すなわち頻繁なトークンと頻繁なトークンの不均衡を無視している。これは、言語モデルが一般的で簡単に学習できるトークンに支配され、希少で難解なトークンを見渡すことができる。そこで我々は,情報エントロピー損失(InfoEntropy Loss)関数を提案する。学習中,語彙上の予測確率分布の情報エントロピーに応じて,to-be-learnedトークンの学習難易度を動的に評価することができる。その後、トレーニング損失を適応的にスケーリングし、モデルをより理解の難しいトークンに集中させようとする。 Pileデータセットでは、生成言語モデルを436M、1.1B、6.7Bパラメータで訓練する。提案されたInfoEntropy Lossを組み込んだモデルでは、ダウンストリームベンチマークで一貫したパフォーマンス向上が期待できる。

Generative language models are usually pretrained on large text corpora by predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent works have demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge in text corpora during training, i.e., the imbalance between frequent tokens and infrequent ones. It can lead a language model to be dominated by common and easy-to-learn tokens, thereby overlooking the infrequent and difficult-to-learn ones. To alleviate that, we propose an Information Entropy Loss (InfoEntropy Loss) function. During training, it can dynamically assess the learning difficulty of a to-be-learned token, according to the information entropy of the corresponding predicted probability distribution over the vocabulary. Then it scales the training loss adaptively, trying to lead the model to focus more on the difficult-to-learn tokens. On the Pile dataset, we train generative language models at different scales of 436M, 1.1B, and 6.7B parameters. Experiments reveal that models incorporating the proposed InfoEntropy Loss can gain consistent performance improvement on downstream benchmarks.
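The idea of scaling a token's loss by the entropy of its predicted distribution can be sketched as follows. The abstract does not give the exact formula, so the weighting used here (normalised entropy raised to a power `gamma`) is an illustrative assumption, as are all function and parameter names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weighted_loss(logits, target, gamma=1.0):
    """Cross-entropy for one token, scaled by the predicted distribution's
    entropy.

    ASSUMPTION: the weight is (entropy / max_entropy) ** gamma; the paper's
    actual scaling may differ.
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[target])
    max_entropy = math.log(len(probs))
    weight = (entropy(probs) / max_entropy) ** gamma
    return weight * cross_entropy
```

Under this weighting, a token the model already predicts confidently (low entropy) contributes little, while a token with a flat predicted distribution keeps close to its full cross-entropy.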

翻訳日:2023-11-01 20:06:29 公開日:2023-10-30

# Decoupled Actor-Critic

Decoupled Actor-Critic ( http://arxiv.org/abs/2310.19527v1 )

ライセンス: Link先を確認

Michal Nauman and Marek Cygan

(参考訳) アクタ-クリティックな手法は、一見無矛盾な2つの問題の停滞状態にある。まず、過大評価に対する批判的傾向は、低バウンドq値を用いて最適化された保守的政策から時間差目標をサンプリングする必要がある。第2に、不確実性に直面した楽観的な政策は、後悔のレベルを低くすることを示している。そこで我々は,この二分法を治療するために,DAC(Decoupled Actor-Critic)を提案する。 DACは、時間差学習に使用される保守的なアクターと、探索に使用される楽観的なアクターという、2つの異なるアクターをグラデーションバックプロパゲーションによって学習する。我々は,DeepMind制御タスクにおいて,低リプレイ率と高リプレイ率の条件下でDACを試験し,複数の設計選択を補正する。計算オーバーヘッドは最小限だが、DACは最先端の性能とロコモーションタスクのサンプル効率を達成する。

Actor-Critic methods are in a stalemate of two seemingly irreconcilable problems. Firstly, critic proneness towards overestimation requires sampling temporal-difference targets from a conservative policy optimized using lower-bound Q-values. Secondly, well-known results show that policies that are optimistic in the face of uncertainty yield lower regret levels. To remedy this dichotomy, we propose Decoupled Actor-Critic (DAC). DAC is an off-policy algorithm that learns two distinct actors by gradient backpropagation: a conservative actor used for temporal-difference learning and an optimistic actor used for exploration. We test DAC on DeepMind Control tasks in low and high replay ratio regimes and ablate multiple design choices. Despite minimal computational overhead, DAC achieves state-of-the-art performance and sample efficiency on locomotion tasks.
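The contrast between a conservative and an optimistic value estimate can be sketched with an ensemble of critic outputs. The mean-minus/plus-scaled-standard-deviation form and the `beta` parameter are assumptions for illustration; the abstract does not specify how DAC constructs its bounds.

```python
import statistics


def q_bounds(q_estimates, beta=1.0):
    """Return (lower, upper) value estimates from several critic outputs.

    ASSUMPTION: bounds are mean -/+ beta * population standard deviation.
    A conservative actor would be trained against the lower bound, and an
    optimistic (exploration) actor against the upper bound.
    """
    mean = statistics.fmean(q_estimates)
    spread = statistics.pstdev(q_estimates)
    return mean - beta * spread, mean + beta * spread
```

When the critics agree, the two bounds coincide and both actors see the same target; they diverge only where the value estimate is uncertain.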

翻訳日:2023-11-01 20:06:11 公開日:2023-10-30

# ホットダイヤモンド基板へのイオン注入によるカラーセンターの高密度アンサンブルの効率的な作製

Efficient fabrication of high-density ensembles of color centers via ion implantation on a hot diamond substrate ( http://arxiv.org/abs/2310.19526v1 )

ライセンス: Link先を確認

E. Nieto Hernandez, G. Andrini, A. Crnjac, M. Brajkovic, F. Picariello, E. Corte, V. Pugliese, M. Matijevi\'c, P. Apr\`a, V. Varzi, J. Forneris, M. Genovese, Z. Siketic, M. Jaksic, S. Ditalia Tchernij

(参考訳) ダイヤモンドの窒素空洞(NV)中心は量子計測やセンシングを含む量子技術にとって有望なシステムである。外部界に対する高い感度を達成するための有望な戦略は、ダイヤモンド格子に導入された放射線損傷の量によってイオン注入による製造が上限のNV中心の大きなアンサンブルの活用に依存している。本研究は,熱標的基板(>550 °C)にMeV N2+イオンを高流動注入することにより,NV中心の密度を増大させるアプローチを示す。以上の結果から, 高温注入はダイヤモンドからグラファイト相への可逆変換に必要な空隙密度閾値を増大させ, 高密度アンサンブルを実現することができることがわかった。さらに, MeV N2+およびMg+イオンで様々な温度で注入されたダイヤモンド基板上に色中心の形成効率を調べた結果, NV中心とマグネシウム空孔(MgV)中心の両方の形成効率は, 注入温度とともに増加することが明らかとなった。

Nitrogen-vacancy (NV) centers in diamond are promising systems for quantum technologies, including quantum metrology and sensing. A promising strategy for achieving high sensitivity to external fields relies on the exploitation of large ensembles of NV centers, whose fabrication by ion implantation is limited by the amount of radiation damage introduced in the diamond lattice. In this work, we demonstrate an approach to increase the density of NV centers through the high-fluence implantation of MeV N2+ ions on a hot target substrate (>550 °C). Our results show that, with respect to room-temperature implantation, the high-temperature process increases the vacancy density threshold required for the irreversible conversion of diamond to a graphitic phase, thus making higher-density ensembles achievable. Furthermore, the formation efficiency of color centers was investigated on diamond substrates implanted at varying temperatures with MeV N2+ and Mg+ ions, revealing that the formation efficiency of both NV centers and magnesium-vacancy (MgV) centers increases with the implantation temperature.

翻訳日:2023-11-01 20:05:56 公開日:2023-10-30

# DPATD:Dual-Phase Audio Transformer for Denoising

DPATD: Dual-Phase Audio Transformer for Denoising ( http://arxiv.org/abs/2310.19588v1 )

ライセンス: Link先を確認

Junhui Li, Pu Wang, Jialu Li, Xinzhe Wang, Youshan Zhang

(参考訳) 近年の高性能トランスフォーマーベース音声強調モデルでは,時間領域法が時間周波数領域法と同等の性能を達成できることが示されている。しかし、時間領域音声強調システムは、通常、多数の時間ステップからなる入力音声シーケンスを受け取り、非常に長いシーケンスをモデル化し、適切な動作を訓練することは困難である。本稿では,より小さな音声チャンクを入力として利用し,上記の課題に対処するために,音声情報の効率的な活用を実現する。本研究では,二重位相オーディオトランスフォーマ(dpatd)を提案する。トランスフォーマ層を深層構造に整理し,クリーンなオーディオシーケンスを学習する新しいモデルである。 DPATDは音声入力を小さなチャンクに分割し、入力長は元のシーケンス長の平方根に比例することができる。メモリに圧縮された説明可能な注意は効率的で、頻繁に使用される自己注意モジュールよりも早く収束する。我々のモデルは最先端の手法よりも優れています。

Recent high-performance transformer-based speech enhancement models demonstrate that time-domain methods can achieve performance similar to time-frequency-domain methods. However, time-domain speech enhancement systems typically receive input audio sequences consisting of a large number of time steps, making it challenging to model extremely long sequences and train models to perform adequately. In this paper, we use smaller audio chunks as input to make efficient use of audio information and address the above challenges. We propose a dual-phase audio transformer for denoising (DPATD), a novel model that organizes transformer layers in a deep structure to learn clean audio sequences for denoising. DPATD splits the audio input into smaller chunks, where the input length can be proportional to the square root of the original sequence length. Our memory-compressed explainable attention is efficient and converges faster than the frequently used self-attention module. Extensive experiments demonstrate that our model outperforms state-of-the-art methods.
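The chunking step, with a chunk length near the square root of the sequence length, can be sketched as below. Non-overlapping chunks and zero padding are simplifying assumptions made here; the abstract does not say whether DPATD overlaps its chunks, and the function name is hypothetical.

```python
import math


def split_into_chunks(samples, chunk_len=None):
    """Split a 1-D sequence into equal-length, non-overlapping chunks.

    If chunk_len is not given, it defaults to floor(sqrt(len(samples))), so
    both the chunk length and the number of chunks grow roughly with the
    square root of the sequence length. The tail is zero-padded
    (ASSUMPTION: padding scheme chosen for illustration).
    """
    n = len(samples)
    if chunk_len is None:
        chunk_len = max(1, math.isqrt(n))
    padded = list(samples) + [0.0] * (-n % chunk_len)
    return [padded[i:i + chunk_len] for i in range(0, len(padded), chunk_len)]
```

For a sequence of length L this gives about sqrt(L) chunks of about sqrt(L) samples each, so attention within a chunk and attention across chunks both operate on sequences of length near sqrt(L) rather than L.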

翻訳日:2023-11-01 19:57:36 公開日:2023-10-30

# gc-mvsnet:マルチビュー、マルチスケール、幾何学的一貫性のあるマルチビューステレオ

GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View Stereo ( http://arxiv.org/abs/2310.19583v1 )

ライセンス: Link先を確認

Vibhas K. Vats, Sripad Joshi, David J. Crandall, Md. Alimoor Reza, Soon-heung Jung

(参考訳) 従来のマルチビューステレオ(MVS)手法は、測光的および幾何的整合性制約に大きく依存するが、より新しい機械学習ベースのMVS法は、後処理ステップとしてのみ複数のソースビューにまたがる幾何的整合性をチェックする。本稿では,学習中に異なるスケールで複数のソースビューにまたがる参照ビュー深度マップの幾何学的一貫性を明示的に奨励する新しいアプローチを提案する(図1参照)。この幾何整合性損失を加えることで、幾何的不整合画素を明示的にペナル化することで学習を著しく加速し、訓練の繰り返し要求を他のMVS手法のほぼ半分に削減する。広範な実験により,dtu と blendedmvs データセットにおける新たな最先端技術と,タンク・テンプルベンチマークの競合結果が得られた。我々の知る限り、GC-MVSNetは学習中にマルチビュー、マルチスケールの幾何的一貫性を強制する最初の試みである。

Traditional multi-view stereo (MVS) methods rely heavily on photometric and geometric consistency constraints, but newer machine learning-based MVS methods check geometric consistency across multiple source views only as a post-processing step. In this paper, we present a novel approach that explicitly encourages geometric consistency of reference view depth maps across multiple source views at different scales during learning (see Fig. 1). We find that adding this geometric consistency loss significantly accelerates learning by explicitly penalizing geometrically inconsistent pixels, reducing the training iteration requirements to nearly half that of other MVS methods. Our extensive experiments show that our approach achieves a new state-of-the-art on the DTU and BlendedMVS datasets, and competitive results on the Tanks and Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt to enforce multi-view, multi-scale geometric consistency during learning.

翻訳日:2023-11-01 19:57:20 公開日:2023-10-30

# 会話を通して見る:拡散モデルに基づく音声・視覚音声分離

Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model ( http://arxiv.org/abs/2310.19581v1 )

ライセンス: Link先を確認

Suyeon Lee, Chaeyoung Jung, Youngjoon Jang, Jaehun Kim, Joon Son Chung

(参考訳) 本研究の目的は,視覚手がかりを用いた混合音声から対象話者の声を抽出することである。音声と音声の分離に関する既存の研究は、その性能を有望な知性で実証している。そこで本研究では,自然サンプル生成能力で知られる拡散機構に基づく音声・視覚音声分離モデルであるavdiffussを提案する。拡散の2つのモードを効果的に融合させるため,クロスアテンションに基づく特徴融合機構を提案する。このメカニズムは、音声生成における音声・視覚対応から音声情報を統合するための音声領域に特化している。このようにして、融合プロセスは過剰な計算要求なしに、特徴の高時間分解を維持できる。提案手法は,VoxCeleb2 と LRS3 の2つのベンチマークを用いて,より自然な音声を生成する。

The objective of this work is to extract a target speaker's voice from a mixture of voices using visual cues. Existing works on audio-visual speech separation have demonstrated their performance with promising intelligibility, but maintaining naturalness remains a challenge. To address this issue, we propose AVDiffuSS, an audio-visual speech separation model based on a diffusion mechanism known for its capability in generating natural samples. For an effective fusion of the two modalities for diffusion, we also propose a cross-attention-based feature fusion mechanism. This mechanism is specifically tailored for the speech domain to integrate the phonetic information from audio-visual correspondence in speech generation. In this way, the fusion process maintains the high temporal resolution of the features, without excessive computational requirements. We demonstrate that the proposed framework achieves state-of-the-art results on two benchmarks, VoxCeleb2 and LRS3, producing speech with notably better naturalness.

翻訳日:2023-11-01 19:56:52 公開日:2023-10-30

# 単眼3次元顔再建における知覚形状損失

A Perceptual Shape Loss for Monocular 3D Face Reconstruction ( http://arxiv.org/abs/2310.19580v1 )

ライセンス: Link先を確認

Christopher Otto, Prashanth Chandran, Gaspard Zoss, Markus Gross, Paulo Gotardo, Derek Bradley

(参考訳) モノクロ3D顔の再構成は広範にわたるトピックであり、既存のアプローチでは、高速ニューラルネットワーク推論またはオフラインの顔形状の反復的再構成によってこの問題に取り組む。どちらの場合でも、注意深く設計されたエネルギー関数は最小化され、一般的には測光損失、ランドマーク再投影損失などの損失項が含まれる。本研究では,特定の画像から得られる3次元顔の復元の質を人間がどのように認識するかに着想を得た,単眼顔撮影のための新しい損失関数を提案する。シェーディングが人間の視覚系における3次元形状の強い指標となることは広く知られている。そこで,新しい「知覚的」形状損失は,シェーディング手がかりのみを用いて3次元顔推定の質を判定することを目的としている。私たちの損失は、入力された顔画像とジオメトリ推定のシェードレンダリングを取り込んで、シェードレンダリングが与えられた画像にどの程度適合するかを知覚的に評価するスコアを予測する、識別器スタイルのニューラルネットワークとして実装されます。この「批判的」ネットワークはRGB画像と幾何学的レンダリングだけで動作し、シーン内のアルベドや照明を見積もる必要がない。さらに、この損失は画像空間で完全に動作しており、メッシュトポロジーに非依存である。新しい知覚的形状損失と従来の3d顔の最適化やディープニューラルネットワークの回帰といったエネルギー用語を組み合わせることで、最先端の結果を改善できることを示す。

Monocular 3D face reconstruction is a widely studied topic, and existing approaches tackle the problem either through fast neural network inference or offline iterative reconstruction of face geometry. In either case, carefully designed energy functions are minimized, commonly including loss terms like a photometric loss, a landmark reprojection loss, and others. In this work we propose a new loss function for monocular face capture, inspired by how humans would perceive the quality of a 3D face reconstruction given a particular image. It is widely known that shading provides a strong indicator for 3D shape in the human visual system. As such, our new 'perceptual' shape loss aims to judge the quality of a 3D face estimate using only shading cues. Our loss is implemented as a discriminator-style neural network that takes an input face image and a shaded render of the geometry estimate, and then predicts a score that perceptually evaluates how well the shaded render matches the given image. This 'critic' network operates on the RGB image and geometry render alone, without requiring an estimate of the albedo or illumination in the scene. Furthermore, our loss operates entirely in image space and is thus agnostic to mesh topology. We show how our new perceptual shape loss can be combined with traditional energy terms for monocular 3D face optimization and deep neural network regression, improving upon current state-of-the-art results.

翻訳日:2023-11-01 19:56:35 公開日:2023-10-30

# skip-wavenet:レーダーエコーグラム中のfirn層を追跡するwaveletベースのマルチスケールアーキテクチャ

Skip-WaveNet: A Wavelet based Multi-scale Architecture to Trace Firn Layers in Radar Echograms ( http://arxiv.org/abs/2310.19574v1 )

ライセンス: Link先を確認

Debvrat Varshney, Masoud Yari, Oluwanisola Ibikunle, Jilu Li, John Paden, Maryam Rahnemoonfar

(参考訳) 空中レーダーセンサーから生成したエコーグラムは、氷床の上のほこりの層を捉えている。これらの層の正確な追跡は,氷冠融解の海面上昇への寄与を調べるために必要となる積雪率を計算するために重要である。しかし、地下層を検出するためにレーダエコーグラムを自動的に処理することは難しい問題である。本研究では,これらのレーダエコーグラムのためのウェーブレットに基づくマルチスケールディープラーニングアーキテクチャを開発し,ファーン層の検出を改善する。ウェーブレットベースアーキテクチャは, 最適データセットスケール (ods) と最適画像スケール (ois) のf-scoreを, 非ウェーブレットアーキテクチャよりもそれぞれ3.99%, 3.7%改善することを示す。さらに,提案するスキップウェーブネットアーキテクチャは,各イテレーションで新たなウェーブレットを生成し,最先端のfirn層検出ネットワークと比較して高い汎用性を実現し,平均絶対誤差3.31ピクセル,94.3%の平均精度で層深さを推定する。このようなネットワークは科学者によってファーン層を辿り、年間降雪率を計算し、氷床の表面の質量収支を推定し、地球規模の海面上昇を予測するために利用できる。

Echograms created from airborne radar sensors capture the profile of firn layers present on top of an ice sheet. Accurate tracking of these layers is essential to calculate the snow accumulation rates, which are required to investigate the contribution of polar ice cap melt to sea level rise. However, automatically processing the radar echograms to detect the underlying firn layers is a challenging problem. In our work, we develop wavelet-based multi-scale deep learning architectures for these radar echograms to improve firn layer detection. We show that wavelet based architectures improve the optimal dataset scale (ODS) and optimal image scale (OIS) F-scores by 3.99% and 3.7%, respectively, over the non-wavelet architecture. Further, our proposed Skip-WaveNet architecture generates new wavelets in each iteration, achieves higher generalizability as compared to state-of-the-art firn layer detection networks, and estimates layer depths with a mean absolute error of 3.31 pixels and 94.3% average precision. Such a network can be used by scientists to trace firn layers, calculate the annual snow accumulation rates, estimate the resulting surface mass balance of the ice sheet, and help project global sea level rise.
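As background for the multi-scale wavelet idea, the sketch below performs one level of an unnormalised Haar decomposition, splitting a signal into a half-resolution approximation and a detail band. The abstract does not state which wavelet family Skip-WaveNet uses, so Haar is an assumption chosen because it is the simplest case.

```python
def haar_step(signal):
    """One level of an unnormalised Haar wavelet transform.

    Returns (approximation, detail): pairwise averages and pairwise
    half-differences of neighbouring samples. Repeating the step on the
    approximation yields coarser scales.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    approximation = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approximation, detail


def haar_inverse(approximation, detail):
    """Invert haar_step: each (average, half-difference) pair restores
    two neighbouring samples."""
    signal = []
    for avg, diff in zip(approximation, detail):
        signal.extend([avg + diff, avg - diff])
    return signal
```

Applied along the depth axis of an echogram, the detail band responds to abrupt intensity changes such as layer boundaries, while the approximation carries the smoother background.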

翻訳日:2023-11-01 19:56:10 公開日:2023-10-30

# ブーストツリーを用いた語彙データに基づくモデル不確かさに基づく能動的学習

Model Uncertainty based Active Learning on Tabular Data using Boosted Trees ( http://arxiv.org/abs/2310.19573v1 )

ライセンス: Link先を確認

Sharath M Shankaranarayana

(参考訳) 教師付き機械学習は、モデルトレーニングのための適切なラベル付きデータの可用性に依存している。ラベル付きデータは人間のアノテーションによって取得されるが、これは面倒でコストのかかるプロセスであり、しばしば主題の専門家を必要とする。アクティブラーニングは機械学習のサブフィールドであり、モデルトレーニングのための最も価値のあるデータインスタンスを選択し、人間のアノテータからのみラベルをクエリすることで、ラベル付きデータを効率的に取得するのに役立つ。近年、特に深層ニューラルネットワークに基づくモデルにおいて、アクティブラーニングの分野で多くの研究が行われている。 image/textual/multimodalデータを扱う際にはディープラーニングが輝くが、勾配向上手法は表データよりもはるかに優れた結果が得られる傾向にある。本研究では,ブースト木を用いた表データに対するアクティブラーニングについて検討する。アクティブラーニングにおける不確実性に基づくサンプリングは、最も一般的に使用されるクエリ戦略であり、これらのインスタンスのラベルは、現在のモデル予測が最大限に不確実なシーケンシャルにクエリされる。エントロピーはしばしば不確実性を測定するための選択である。しかし、エントロピーは必ずしもモデルの不確かさの尺度ではない。モデル不確実性を計測し、それをアクティブな学習に活用する深層学習には多くの研究があるが、神経以外のネットワークモデルについては、まだ研究されていない。そこで本研究では,強化木を用いたモデル不確実性手法の有効性について検討する。このモデルの不確実性を生かして、表データの回帰タスクに対するアクティブラーニングにおける不確実性に基づくサンプリングを提案する。さらに,回帰課題に対するコスト効率のよいアクティブラーニング手法と,分類課題に対するコスト効率のよいアクティブラーニング手法を提案する。

Supervised machine learning relies on the availability of good labelled data for model training. Labelled data is acquired by human annotation, which is a cumbersome and costly process, often requiring subject matter experts. Active learning is a sub-field of machine learning which helps in obtaining the labelled data efficiently by selecting the most valuable data instances for model training and querying the labels only for those instances from the human annotator. Recently, a lot of research has been done in the field of active learning, especially for deep neural network based models. Although deep learning shines when dealing with image/textual/multimodal data, gradient boosting methods still tend to achieve much better results on tabular data. In this work, we explore active learning for tabular data using boosted trees. Uncertainty-based sampling is the most commonly used querying strategy in active learning: labels are queried sequentially for the instances on which the current model's prediction is maximally uncertain. Entropy is often the choice for measuring uncertainty. However, entropy is not exactly a measure of model uncertainty. Although there has been a lot of work in deep learning on measuring model uncertainty and employing it in active learning, it is yet to be explored for non-neural-network models. To this end, we explore the effectiveness of boosted-tree-based model uncertainty methods in active learning. Leveraging this model uncertainty, we propose an uncertainty-based sampling strategy in active learning for regression tasks on tabular data. Additionally, we propose a novel cost-effective active learning method for regression tasks along with an improved cost-effective active learning method for classification tasks.
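Uncertainty-based sampling for regression can be sketched by ranking unlabelled instances by how much a set of models disagrees on them. Using the variance across several models' predictions as the uncertainty score is an assumption standing in for the boosted-tree uncertainty estimate, which the abstract does not detail; the function name is hypothetical.

```python
import statistics


def select_most_uncertain(predictions_per_model, k):
    """Pick the k pool instances with the highest prediction variance.

    predictions_per_model[m][i] is model m's prediction for pool instance i.
    ASSUMPTION: disagreement between models (population variance) is used
    as the model-uncertainty score.
    Returns instance indices, most uncertain first.
    """
    pool_size = len(predictions_per_model[0])
    variances = [
        statistics.pvariance([preds[i] for preds in predictions_per_model])
        for i in range(pool_size)
    ]
    ranked = sorted(range(pool_size), key=lambda i: variances[i], reverse=True)
    return ranked[:k]
```

In an active learning loop, the returned indices would be sent to the annotator, the new labels added to the training set, and the models refit before the next round.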

翻訳日:2023-11-01 19:55:46 公開日:2023-10-30

# インコンテキスト学習のためのデモ再生による入力ラベルマッピングの改善

Improving Input-label Mapping with Demonstration Replay for In-context Learning ( http://arxiv.org/abs/2310.19572v1 )

ライセンス: Link先を確認

Zhuocheng Gong, Jiahao Liu, Qifan Wang, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

(参考訳) In-context Learning(ICL)は、入力にいくつかの入力ラベルを付加して、モデルパラメータを直接調整することなく、下流のNLPタスクに対するモデルの理解を強化する、大規模な自己回帰言語モデルの出現する能力である。 ICLの有効性は、大きな言語モデル(LLM)の強力な言語モデリング能力によるもので、インコンテキストのデモンストレーションに基づいて入力とラベルのマッピングを学習することができる。有望な結果を得たにもかかわらず、ICLにおける言語モデリングの因果性は、注意を後方のみに制限する、すなわちトークンは以前のトークンにのみ対応し、完全な入力ラベル情報の取得に失敗し、モデルの性能を制限している。本稿では,スライディング因果注意法(RdSca)を用いた新たなICL手法を提案する。具体的には、後続のデモンストレーションを複製してフロントに結合し、モデルが因果制限下でも後続の情報を‘オブザーバ’できるようにします。さらに、情報漏洩を避けるために、因果注意をカスタマイズするスライディング因果注意を導入する。実験の結果,本手法はICL実験における入力ラベルマッピングを大幅に改善することがわかった。また,先行研究の未調査領域であるトレーニングなしで因果的注意をカスタマイズする方法について,詳細な分析を行った。

In-context learning (ICL) is an emerging capability of large autoregressive language models where a few input-label demonstrations are appended to the input to enhance the model's understanding of downstream NLP tasks, without directly adjusting the model parameters. The effectiveness of ICL can be attributed to the strong language modeling capabilities of large language models (LLMs), which enable them to learn the mapping between input and labels based on in-context demonstrations. Despite achieving promising results, the causal nature of language modeling in ICL restricts the attention to be backward only, i.e., a token only attends to its previous tokens, failing to capture the full input-label information and limiting the model's performance. In this paper, we propose a novel ICL method called Repeated Demonstration with Sliding Causal Attention (RdSca). Specifically, we duplicate later demonstrations and concatenate them to the front, allowing the model to 'observe' the later information even under the causal restriction. Besides, we introduce sliding causal attention, which customizes causal attention to avoid information leakage. Experimental results show that our method significantly improves the input-label mapping in ICL demonstrations. We also conduct an in-depth analysis of how to customize the causal attention without training, which has been an unexplored area in previous research.
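The idea of repeating demonstrations and restricting attention to a sliding window can be sketched at the level of whole demonstration blocks. The abstract does not specify the exact mask, so the block layout, the window size (equal to the number of demonstrations), and the extra slot given to the query are all assumptions made for this illustration. Token-level causal masking inside each block is omitted.

```python
def build_repeated_blocks(num_demos):
    """Block order: copies of demos 1..K-1, then demos 0..K-1, then the query."""
    copies = [("copy", i) for i in range(1, num_demos)]
    originals = [("orig", i) for i in range(num_demos)]
    return copies + originals + [("query", None)]


def sliding_causal_block_mask(blocks, window):
    """mask[q][k] is True when block q may attend to block k.

    Attention stays causal (k <= q) and is limited to the most recent
    `window` blocks, counting the attending block itself.
    """
    n = len(blocks)
    mask = []
    for q in range(n):
        # Assumption: the query gets one extra slot so it sees `window` demos.
        span = window + 1 if blocks[q][0] == "query" else window
        mask.append([(q - span < k <= q) for k in range(n)])
    return mask


def visible_demo_ids(blocks, mask, q):
    """Demonstration ids that block q can attend to, excluding itself."""
    return [blocks[k][1] for k in range(len(blocks)) if k != q and mask[q][k]]


NUM_DEMOS = 4
blocks = build_repeated_blocks(NUM_DEMOS)
mask = sliding_causal_block_mask(blocks, window=NUM_DEMOS)
```

With this layout every original demonstration sees each of the other demonstrations exactly once and never its own duplicate, which is one way to read the abstract's goal of avoiding information leakage.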

翻訳日:2023-11-01 19:55:18 公開日:2023-10-30

# DataZoo: トラフィック分類実験の合理化

DataZoo: Streamlining Traffic Classification Experiments ( http://arxiv.org/abs/2310.19568v1 )

ライセンス: Link先を確認

Jan Luxemburk, Karel Hynek

(参考訳) コンピュータビジョンや自然言語処理などの機械学習コミュニティは、開発を加速するために多数の支援ツールとベンチマークデータセットを開発してきた。対照的に、ネットワークトラフィック分類分野は、ほとんどのタスクの標準ベンチマークデータセットが欠落しており、利用可能なサポートソフトウェアはスコープが限られている。本稿では,このギャップに対処し,ネットワークトラフィック分類におけるデータセット管理の合理化と,評価設定における潜在的なミスの空間削減を目的としたツールセットであるDataZooを紹介する。 DataZooは、CESNET-QUIC22、CESNET-TLS22、CESNET-TLS-Year22という3つの広範なデータセットにアクセスするための標準化されたAPIを提供する。さらに、時間的およびサービス関連要因を考慮して、機能スケーリングと現実的なデータセットパーティショニングの方法も含まれている。 DataZooツールセットは、現実的な評価シナリオの作成を簡単にし、分類方法のクロスコンペア化と結果の再現を容易にする。

The machine learning communities, such as those around computer vision or natural language processing, have developed numerous supportive tools and benchmark datasets to accelerate the development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supportive software is rather limited in scope. This paper aims to address the gap and introduces DataZoo, a toolset designed to streamline dataset management in network traffic classification and to reduce the space for potential mistakes in the evaluation setup. DataZoo provides a standardized API for accessing three extensive datasets -- CESNET-QUIC22, CESNET-TLS22, and CESNET-TLS-Year22. Moreover, it includes methods for feature scaling and realistic dataset partitioning, taking into consideration temporal and service-related factors. The DataZoo toolset simplifies the creation of realistic evaluation scenarios, making it easier to cross-compare classification methods and reproduce results.

翻訳日:2023-11-01 19:54:54 公開日:2023-10-30

# 複数の観測者がKSコンテキスト性を検出することができるか?

Can multiple observers detect KS-contextuality? ( http://arxiv.org/abs/2310.19564v1 )

ライセンス: Link先を確認

Arthur C. R. Dutra, Roberto D. Baldijão, Marcelo Terra Cunha

(参考訳) KS-コンテキスト性は量子論の重要な特徴である。以前の研究では、複数の独立したオブザーバが同じシステム上で連続的に測定するセットアップにおいて、$N$-cycle KS-contextualityがなくなりました。この現象は、状態が劣化し、量子資源が枯渇する追加観測者の測定として説明できる。この説明は、状態に依存しない文脈性はそのようなシステムで生き残るべきであることを意味する。本稿では,この現象はそうではないことを示す。この結果は,公共システムにおけるペレスメルミン非文脈性不等式を破ろうとするオブザーバーをシミュレートすることで達成した。さらに, 状況に依存しない場合においても文脈性が失われることを説明するため, 設定の分析的記述を提供する。最終的に、これらの結果は、状態に依存しない文脈性は、ある文脈の測定の間にあるシステムに何が起こるかとは独立ではないことを示している。

KS-contextuality is a crucial feature of quantum theory. Previous research demonstrated the vanishing of $N$-cycle KS-contextuality in setups where multiple independent observers measure sequentially on the same system, which we call Public Systems. This phenomenon can be explained as the additional observers' measurements degrading the state and depleting the quantum resource. This explanation would imply that state-independent contextuality should survive in such a system. In this paper, we show that this is not the case. We achieved this result by simulating an observer trying to violate the Peres-Mermin noncontextuality inequality in a Public System. Additionally, we provide an analytical description of our setup, explaining the loss of contextuality even in the state-independent case. Ultimately, these results show that state-independent contextuality is not independent of what happens to the system in-between the measurements of a context.
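For readers unfamiliar with the Peres-Mermin scenario mentioned in the abstract, the sketch below checks the algebraic fact behind its state independence: the three row products and two of the column products of the square equal +I, while the third column product equals -I. The particular arrangement of the nine two-qubit observables is one standard choice made for this illustration, and the code does not model the sequential multi-observer setup studied in the paper.

```python
from itertools import product

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [
        [a_val * b_val for a_val in a_row for b_val in b_row]
        for a_row in a
        for b_row in b
    ]


def matmul(a, b):
    """Matrix product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# One standard arrangement of the Peres-Mermin square (an illustrative choice).
SQUARE = [
    [kron(X, I2), kron(I2, X), kron(X, X)],
    [kron(I2, Y), kron(Y, I2), kron(Y, Y)],
    [kron(X, Y), kron(Y, X), kron(Z, Z)],
]
IDENTITY = kron(I2, I2)


def product_sign(ops):
    """Return +1 or -1 when the product of three operators is +I or -I."""
    prod = matmul(matmul(ops[0], ops[1]), ops[2])
    if prod == IDENTITY:
        return 1
    if prod == [[-v for v in row] for row in IDENTITY]:
        return -1
    raise ValueError("product is not +I or -I")


def pm_value(rows, cols):
    """Peres-Mermin expression: R1 + R2 + R3 + C1 + C2 - C3."""
    return rows[0] + rows[1] + rows[2] + cols[0] + cols[1] - cols[2]


row_signs = [product_sign(row) for row in SQUARE]
col_signs = [product_sign([SQUARE[r][c] for r in range(3)]) for c in range(3)]
quantum_value = pm_value(row_signs, col_signs)

# Noncontextual bound: brute force over all +/-1 assignments to the nine cells.
classical_max = max(
    pm_value(
        [v[0] * v[1] * v[2], v[3] * v[4] * v[5], v[6] * v[7] * v[8]],
        [v[0] * v[3] * v[6], v[1] * v[4] * v[7], v[2] * v[5] * v[8]],
    )
    for v in product((1, -1), repeat=9)
)
```

Because the row and column products are fixed operators, the quantum value is the same for every state, whereas any fixed assignment of outcomes stays below it.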

翻訳日:2023-11-01 19:54:37 公開日:2023-10-30

# 多様体上のロボット学習のための非パラメトリック回帰

Non-parametric regression for robot learning on manifolds ( http://arxiv.org/abs/2310.19561v1 )

ライセンス: Link先を確認

P. C. Lopez-Custodio, K. Bharath, A. Kucukyilmaz, and S. P. Preston

(参考訳) ロボット学習のためのツールの多くはユークリッドデータのために設計された。しかし、ロボット工学における多くの応用は多様体値データを含む。一般的な例は向き付けであり、これは3-by-3回転行列あるいは四元数として表すことができ、その空間は非ユークリッド多様体である。ロボット学習では、多様体値データはしばしば多様体を適切なユークリッド空間に関連付けるか、あるいは1つまたは複数の接空間にデータを投影することによって処理される。これらのアプローチは予測精度の低さと畳み込みアルゴリズムをもたらす可能性がある。本稿では,多様体内で直接作用する回帰に対する「内在的」なアプローチを提案する。これは多様体上の適切な確率分布を取ることを含み、そのパラメータを時間のような予測変数の関数とし、カーネルを組み込んだ「局所的確率」法によりその関数を非パラメトリックに推定する。我々はこの手法をカーネル化推定と呼ぶ。アプローチは概念的には単純であり、一般に異なる多様体に適用できる。ロボット工学のアプリケーションでよく見られる3種類の多様体値データを用いて実装する。これらの実験の結果はプロジェクションベースアルゴリズムよりも予測精度がよい。

Many of the tools available for robot learning were designed for Euclidean data. However, many applications in robotics involve manifold-valued data. A common example is orientation; this can be represented as a 3-by-3 rotation matrix or a quaternion, the spaces of which are non-Euclidean manifolds. In robot learning, manifold-valued data are often handled by relating the manifold to a suitable Euclidean space, either by embedding the manifold or by projecting the data onto one or several tangent spaces. These approaches can result in poor predictive accuracy, and convoluted algorithms. In this paper, we propose an "intrinsic" approach to regression that works directly within the manifold. It involves taking a suitable probability distribution on the manifold, letting its parameter be a function of a predictor variable, such as time, then estimating that function non-parametrically via a "local likelihood" method that incorporates a kernel. We name the method kernelised likelihood estimation. The approach is conceptually simple, and generally applicable to different manifolds. We implement it with three different types of manifold-valued data that commonly appear in robotics applications. The results of these experiments show better predictive accuracy than projection-based algorithms.
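A minimal sketch of the local-likelihood idea on the simplest manifold, the circle, may help. Assuming a von Mises distribution whose mean direction depends on time, maximising the kernel-weighted log-likelihood gives the kernel-weighted circular mean. The choice of the circle, the Gaussian kernel, the bandwidth, and the function names are assumptions for illustration, not the manifolds or implementation used in the paper.

```python
import math


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel weight."""
    return math.exp(-0.5 * u * u)


def local_mean_direction(t_query, times, angles, bandwidth):
    """Kernel-weighted von Mises estimate of the mean direction at t_query.

    Maximising sum_i w_i * cos(angle_i - mu) over mu yields
    mu = atan2(sum_i w_i sin(angle_i), sum_i w_i cos(angle_i)).
    """
    s = 0.0
    c = 0.0
    for t, a in zip(times, angles):
        w = gaussian_kernel((t - t_query) / bandwidth)
        s += w * math.sin(a)
        c += w * math.cos(a)
    return math.atan2(s, c)


# Hypothetical orientation data that crosses the +/-pi wrap-around.
times = [0.0, 1.0, 2.0, 3.0]
angles = [3.0, 3.1, -3.1, -3.0]

intrinsic_estimate = local_mean_direction(1.5, times, angles, bandwidth=1.0)
naive_average = sum(angles) / len(angles)
```

The intrinsic estimate lands near ±π, where the data actually lie, while the naive Euclidean average of the angles is close to 0 and points in the opposite direction.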

翻訳日:2023-11-01 19:54:20 公開日:2023-10-30

# 物理視聴覚コモンセンス推論のための不連続反事実学習

Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning ( http://arxiv.org/abs/2310.19559v1 )

ライセンス: Link先を確認

Changsheng Lv and Shuai Zhang and Yapeng Tian and Mengshi Qi and Huadong Ma

(参考訳) 本稿では,物理視聴覚コモンセンス推論のためのdcl(disentangleed counterfactual learning)アプローチを提案する。このタスクは、ビデオとオーディオの両方の入力に基づいて物体の物理常識を推論することを目的としており、主な課題は人間の推論能力を模倣する方法である。現在の手法のほとんどは、マルチモーダルデータにおける異なる特徴を十分に活用できず、モデルの因果推論能力の欠如は、暗黙の物理的知識の推論の進歩を妨げる。これらの問題に対処するために,本提案手法では,可変オートエンコーダ (vae) を応用し,相互情報をコントラスト損失関数で最大化する不連続シーケンシャルエンコーダ (disentangled sequential encoder) による潜在空間内の静的(時間不変)および動的(時変)要素に映像を分離する。さらに,異なる物体間の物理的知識関係のモデル化により,モデルの推論能力を増強する対実的学習モジュールを導入する。提案手法は,任意のベースラインに組み込むことができるプラグアンドプレイモジュールである。実験では,提案手法はベースライン法を改良し,最先端の性能を実現する。ソースコードはhttps://github.com/andy20178/dclで入手できます。

In this paper, we propose a Disentangled Counterfactual Learning (DCL) approach for physical audiovisual commonsense reasoning. The task aims to infer objects' physics commonsense based on both video and audio input, the main challenge being how to imitate the reasoning ability of humans. Most of the current methods fail to take full advantage of different characteristics in multi-modal data, and the lack of causal reasoning ability in models impedes the progress of implicit physical knowledge inferring. To address these issues, our proposed DCL method decouples videos into static (time-invariant) and dynamic (time-varying) factors in the latent space by the disentangled sequential encoder, which adopts a variational autoencoder (VAE) to maximize the mutual information with a contrastive loss function. Furthermore, we introduce a counterfactual learning module to augment the model's reasoning ability by modeling physical knowledge relationships among different objects under counterfactual intervention. Our proposed method is a plug-and-play module that can be incorporated into any baseline. In experiments, we show that our proposed method improves baseline methods and achieves state-of-the-art performance. Our source code is available at https://github.com/Andy20178/DCL.

翻訳日:2023-11-01 19:54:02 公開日:2023-10-30

# モデルスパーシフィケーションを用いた非凸・非スムース問題に対するプライバシ保存型連立初等双次学習

Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification ( http://arxiv.org/abs/2310.19558v1 )

ライセンス: Link先を確認

Yiwei Li, Chien-Wei Huang, Shuai Wang, Chong-Yung Chi, Tony Q. S. Quek

(参考訳) フェデレートラーニング(FL)は、クライアントのデータを共有することなく、パラメータサーバ(PS)のオーケストレーションの下で、大規模な分散クライアント上でモデルをトレーニングする、急速に成長する研究領域として認識されている。本稿では,非凸性および非平滑性損失関数を特徴とするフェデレーション問題を,FLアプリケーションで広く普及しているが,非凸性と非平滑性の性質が複雑であり,通信効率とプライバシ保護の矛盾が原因で対処が困難である。本稿では,非凸および非滑らかなFL問題に適した双方向モデルスペーシフィケーションを備えた新しいフェデレーション原始双対アルゴリズムを提案し,高いプライバシー保証のために差分プライバシを適用した。その独特な洞察力といくつかのプライバシーと収束分析は、flアルゴリズム設計ガイドラインにも提示されている。実世界のデータに対する広範囲な実験を行い,提案アルゴリズムの有効性と最先端のflアルゴリズムよりも優れた性能を実証し,解析結果と特性の検証を行った。

Federated learning (FL) has been recognized as a rapidly growing research area, where the model is trained over massively distributed clients under the orchestration of a parameter server (PS) without sharing clients' data. This paper delves into a class of federated problems characterized by non-convex and non-smooth loss functions, which are prevalent in FL applications but challenging to handle due to their intricate non-convexity and non-smoothness and the conflicting requirements on communication efficiency and privacy protection. In this paper, we propose a novel federated primal-dual algorithm with bidirectional model sparsification tailored for non-convex and non-smooth FL problems, with differential privacy applied for a strong privacy guarantee. Its unique insightful properties and some privacy and convergence analyses are also presented as FL algorithm design guidelines. Extensive experiments on real-world data are conducted to demonstrate the effectiveness of the proposed algorithm and its superior performance over some state-of-the-art FL algorithms, together with the validation of all the analytical results and properties.
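The two generic ingredients named in the abstract, model sparsification and differential privacy, are often realised as top-k sparsification and norm clipping followed by Gaussian noise. The sketch below shows those generic building blocks only. The order of operations, the parameter values, and the function names are assumptions, the noise scale is not calibrated to any privacy budget, and none of this reproduces the paper's primal-dual algorithm.

```python
import math
import random


def clip_l2(vec, max_norm):
    """Scale the vector down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def add_gaussian_noise(vec, sigma, rng):
    """Add independent zero-mean Gaussian noise to every coordinate."""
    return [v + rng.gauss(0.0, sigma) for v in vec]


def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude coordinates and zero out the rest."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


# Hypothetical client update before it is sent to the parameter server.
rng = random.Random(0)
update = [0.1, -3.0, 0.5, 2.0]
private_update = add_gaussian_noise(clip_l2(update, max_norm=1.0), sigma=0.01, rng=rng)
sparse_update = top_k_sparsify(private_update, k=2)
```

Sparsifying in both directions, as the abstract's "bidirectional" suggests, would mean applying a step like `top_k_sparsify` to the server's broadcast as well as to the client uploads.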

翻訳日:2023-11-01 19:53:39 公開日:2023-10-30

# 効率的なプレトレーニングによる映像基礎モデルの構築

Harvest Video Foundation Models via Efficient Post-Pretraining ( http://arxiv.org/abs/2310.19554v1 )

ライセンス: Link先を確認

Yizhuo Li, Kunchang Li, Yinan He, Yi Wang, Yali Wang, Limin Wang, Yu Qiao, Ping Luo

(参考訳) ビデオデータの冗長性や高品質なビデオ言語データセットの欠如のため、ビデオ言語基盤モデルの構築は費用がかかり難い。本稿では,画像から映像ファンデーションモデルを取り出すための効率的なフレームワークを提案する。提案手法は,入力ビデオパッチをランダムにドロップし,プレトレーニング後の入力テキストをマスクアウトすることで,直感的に簡単である。パッチドロップはトレーニング効率を大幅に向上させ、テキストマスキングはクロスモーダル融合の学習を強制する。提案手法の有効性を検証するために,ゼロショットタスク,ビデオ質問応答,ビデオテキスト検索など,幅広い下流課題において広範囲な実験を行った。その単純さにもかかわらず、本手法は、事前訓練されたビデオ基盤モデルに匹敵する最先端のパフォーマンスを実現する。この手法は非常に効率的で、8gpuで1日未満でトレーニングでき、プリトレーニングデータとしてwebvid-10mだけを必要とする。当社の手法は,ビデオファンデーションモデルをシンプルかつ強力なものにし,構築時に有用な洞察を提供し,事前学習された大規模モデルをよりアクセスし,持続可能なものにすることを願っている。これはInternVideoプロジェクト \url{https://github.com/OpenGVLab/InternVideo} の一部である。

Building video-language foundation models is costly and difficult due to the redundant nature of video data and the lack of high-quality video-language datasets. In this paper, we propose an efficient framework to harvest video foundation models from image ones. Our method is intuitively simple by randomly dropping input video patches and masking out input text during the post-pretraining procedure. The patch dropping boosts the training efficiency significantly and text masking enforces the learning of cross-modal fusion. We conduct extensive experiments to validate the effectiveness of our method on a wide range of video-language downstream tasks including various zero-shot tasks, video question answering, and video-text retrieval. Despite its simplicity, our method achieves state-of-the-art performances, which are comparable to some heavily pretrained video foundation models. Our method is extremely efficient and can be trained in less than one day on 8 GPUs, requiring only WebVid-10M as pretraining data. We hope our method can serve as a simple yet strong counterpart for prevalent video foundation models, provide useful insights when building them, and make large pretrained models more accessible and sustainable. This is part of the InternVideo project: https://github.com/OpenGVLab/InternVideo.
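The two operations the abstract describes, randomly dropping video patches and masking input text, can be sketched on plain Python lists. The keep ratio, the mask ratio, and the `[MASK]` placeholder are hypothetical values chosen for illustration; the abstract does not state the settings used in the paper.

```python
import random


def drop_patches(patches, keep_ratio, rng):
    """Keep a random subset of video patches, preserving their original order."""
    num_keep = max(1, round(len(patches) * keep_ratio))
    kept = sorted(rng.sample(range(len(patches)), num_keep))
    return [patches[i] for i in kept]


def mask_text(tokens, mask_ratio, rng, mask_token="[MASK]"):
    """Replace a random subset of text tokens with a mask placeholder."""
    num_mask = round(len(tokens) * mask_ratio)
    masked = set(rng.sample(range(len(tokens)), num_mask))
    return [mask_token if i in masked else tok for i, tok in enumerate(tokens)]


rng = random.Random(0)
video_patches = list(range(16))  # stand-ins for 16 patch embeddings
caption = ["a", "dog", "runs", "fast"]

kept_patches = drop_patches(video_patches, keep_ratio=0.25, rng=rng)
masked_caption = mask_text(caption, mask_ratio=0.5, rng=rng)
```

Dropping patches shortens the sequence the video encoder has to process, which is where the training speed-up mentioned in the abstract would come from.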

翻訳日:2023-11-01 19:53:19 公開日:2023-10-30

# RayDF:マルチビュー整合性を持つニューラルな地表面距離場

RayDF: Neural Ray-surface Distance Fields with Multi-view Consistency ( http://arxiv.org/abs/2310.19629v1 )

ライセンス: Link先を確認

Zhuoman Liu, Bo Yang

(参考訳) 本稿では,連続3次元形状表現の問題について検討する。既存の成功手法の大半は座標に基づく暗黙的神経表現である。しかし、新しいビューを描画したり、明示的な表面点を復元するのに非効率である。少数の研究がレイベースの神経関数として3次元形状を定式化し始めたが、多視点幾何整合性の欠如により学習された構造は劣っている。これらの課題に対処するために、RayDFと呼ばれる新しいフレームワークを提案する。主な構成要素は3つある。 1)単純光線面距離場。 2)新しい2線視認性分類器,及び 3) 学習された線面距離を多視点形状に整合させるマルチビュー一貫性最適化モジュール。提案手法を3つの公開データセット上で広範に評価し,既存の座標ベースおよびレイベースベースラインを明らかに超越した,合成および挑戦的な実世界の3Dシーンにおける3次元表面点再構成における顕著な性能を示した。最も注目すべきは,800x800深度の画像を描画する座標ベースの手法よりも1000倍高速で,3次元形状表現の精度が向上している点である。私たちのコードとデータはhttps://github.com/vlar-group/raydfで入手できます。

In this paper, we study the problem of continuous 3D shape representations. The majority of existing successful methods are coordinate-based implicit neural representations. However, they are inefficient to render novel views or recover explicit surface points. A few works start to formulate 3D shapes as ray-based neural functions, but the learned structures are inferior due to the lack of multi-view geometry consistency. To tackle these challenges, we propose a new framework called RayDF. It consists of three major components: 1) the simple ray-surface distance field, 2) the novel dual-ray visibility classifier, and 3) a multi-view consistency optimization module to drive the learned ray-surface distances to be multi-view geometry consistent. We extensively evaluate our method on three public datasets, demonstrating remarkable performance in 3D surface point reconstruction on both synthetic and challenging real-world 3D scenes, clearly surpassing existing coordinate-based and ray-based baselines. Most notably, our method achieves a 1000x faster speed than coordinate-based methods to render an 800x800 depth image, showing the superiority of our method for 3D shape representation. Our code and data are available at https://github.com/vLAR-group/RayDF

翻訳日:2023-11-01 19:46:02 公開日:2023-10-30

# トランスフォーメーション対トラディション:芸術と人文科学のための人工知能(AGI)

Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities ( http://arxiv.org/abs/2310.19626v1 )

ライセンス: Link先を確認

Zhengliang Liu, Yiwei Li, Qian Cao, Junwen Chen, Tianze Yang, Zihao Wu, John Hale, John Gibbs, Khaled Rasheed, Ninghao Liu, Gengchen Mai, and Tianming Liu

(参考訳) 人工知能(AGI)の最近の進歩、特に大きな言語モデルと創造的な画像生成システムは、芸術や人文科学にまたがる様々なタスクにおいて印象的な能力を示している。しかし、AGIの急速な進化は、文化的に重要なこれらの領域における責任ある展開について批判的な疑問を提起している。本稿では,芸術と人文科学に関連するテキスト,グラフィック,オーディオ,ビデオに対するAGIの応用と意義を包括的に分析する。詩から歴史,マーケティング,映画,コミュニケーションから古典芸術まで幅広い分野における最先端システムとその利用について調査した。我々は, AGIシステムにおける事実性, 毒性, バイアス, 公衆安全に関する重大な懸念を概説し, 緩和戦略を提案する。この論文は、真理や人間の尊厳を損なうことなく、agiが創造性、知識、文化的価値を促進することを保証するために、マルチステイクホルダーの協力を主張する。私たちのタイムリーな貢献は、急速に発展する分野をまとめ、人間の繁栄を中心とした責任ある進歩を提唱しながら、有望な方向性を強調します。この分析は、AGIの技術能力と永続的な社会財との整合性に関するさらなる研究の基盤となる。

Recent advances in artificial general intelligence (AGI), particularly large language models and creative image generation systems, have demonstrated impressive capabilities on diverse tasks spanning the arts and humanities. However, the swift evolution of AGI has also raised critical questions about its responsible deployment in these culturally significant domains traditionally seen as profoundly human. This paper provides a comprehensive analysis of the applications and implications of AGI for text, graphics, audio, and video pertaining to arts and the humanities. We survey cutting-edge systems and their usage in areas ranging from poetry to history, marketing to film, and communication to classical art. We outline substantial concerns pertaining to factuality, toxicity, biases, and public safety in AGI systems, and propose mitigation strategies. The paper argues for multi-stakeholder collaboration to ensure AGI promotes creativity, knowledge, and cultural values without undermining truth or human dignity. Our timely contribution summarizes a rapidly developing field, highlighting promising directions while advocating for responsible progress centering on human flourishing. The analysis lays the groundwork for further research on aligning AGI's technological capacities with enduring social goods.

翻訳日:2023-11-01 19:45:34 公開日:2023-10-30

# タンパク質言語モデルの学習後量子化の探索

Exploring Post-Training Quantization of Protein Language Models ( http://arxiv.org/abs/2310.19624v1 )

ライセンス: Link先を確認

Shuang Peng, Fei Yang, Ning Sun, Sheng Chen, Yanfeng Jiang, Aimin Pan

(参考訳) esm-1bやesm-2のような教師なしタンパク質言語モデル(proteinlms)の最近の進歩は、さまざまなタンパク質予測タスクで期待されている。しかし、これらのモデルは、高い計算要求、重要なメモリ要求、遅延のために問題に直面し、限られたリソースを持つデバイスでの使用を制限する。そこで本研究では,ProteinLMのポストトレーニング量子化(PTQ)について検討し,ESM-2ProteinLMをベースとしたAlphaFoldの簡易版であるESMFoldに着目した。我々の研究は、たんぱく質の全重みと活性化を定量化する最初の試みである。典型的な均一量子化法はESMFoldでは不十分であり、8ビット量子化ではTMスコアが大幅に低下する。 esmfold,特に層正規化前の高度に非対称なアクティベーション範囲について,幅広い量子化実験を行い,低ビット固定点形式を用いた表現の困難さを明らかにした。これらの課題に対処するために,不斉アクティベーション値の分数次線形量子化を利用して正確な近似を保証する新しいPTQ法を提案する。タンパク質構造予測タスクにおける本手法の有効性を実証し,ESMFoldを精度良く低ビット幅まで正確に定量化できることを示した。さらに,本手法を接触予測タスクに適用し,その汎用性を示した。本研究は,タンパク質膜に対する革新的PTQ法を導入し,特定の量子化課題に対処し,タンパク質関連アプリケーションに重要な意味を持つより効率的なタンパク質膜の開発につながる可能性がある。

Recent advancements in unsupervised protein language models (ProteinLMs), like ESM-1b and ESM-2, have shown promise in different protein prediction tasks. However, these models face challenges due to their high computational demands, significant memory needs, and latency, restricting their usage on devices with limited resources. To tackle this, we explore post-training quantization (PTQ) for ProteinLMs, focusing on ESMFold, a simplified version of AlphaFold based on ESM-2 ProteinLM. Our study is the first attempt to quantize all weights and activations of ProteinLMs. We observed that the typical uniform quantization method performs poorly on ESMFold, causing a significant drop in TM-Score when using 8-bit quantization. We conducted extensive quantization experiments, uncovering unique challenges associated with ESMFold, particularly highly asymmetric activation ranges before Layer Normalization, which make representation difficult using low-bit fixed-point formats. To address these challenges, we propose a new PTQ method for ProteinLMs, utilizing piecewise linear quantization for asymmetric activation values to ensure accurate approximation. We demonstrated the effectiveness of our method in protein structure prediction tasks, showing that ESMFold can be accurately quantized to low-bit widths without compromising accuracy. Additionally, we applied our method to the contact prediction task, showcasing its versatility. In summary, our study introduces an innovative PTQ method for ProteinLMs, addressing specific quantization challenges and potentially leading to the development of more efficient ProteinLMs with significant implications for various protein-related applications.
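To see why a single uniform grid struggles with highly asymmetric activation ranges, and how a piecewise linear quantizer can help, consider the toy comparison below. The activation values, the breakpoint, and the even split of the bit budget between the two regions are assumptions made for this sketch; the abstract does not describe how the paper chooses them.

```python
def quantize_uniform(x, lo, hi, bits):
    """Round x to the nearest of 2**bits evenly spaced levels on [lo, hi]."""
    steps = 2 ** bits - 1
    step = (hi - lo) / steps
    clipped = min(max(x, lo), hi)
    return lo + round((clipped - lo) / step) * step


def quantize_piecewise(x, lo, breakpoint, hi, bits):
    """Two-piece linear quantizer.

    Each side of `breakpoint` gets its own uniform grid with 2**(bits - 1)
    levels, so the dense region is not forced to share the outliers' step size.
    """
    if x <= breakpoint:
        return quantize_uniform(x, lo, breakpoint, bits - 1)
    return quantize_uniform(x, breakpoint, hi, bits - 1)


def mean_abs_error(values, quantizer):
    """Average absolute quantization error over a list of values."""
    return sum(abs(v - quantizer(v)) for v in values) / len(values)


# Hypothetical activations: most values are small, two are large outliers.
activations = [-1.0, -0.6, -0.2, 0.1, 0.3, 0.7, 0.9, 25.0, 60.0]
LO, BREAK, HI, BITS = -1.0, 1.0, 60.0, 4

uniform_error = mean_abs_error(
    activations, lambda v: quantize_uniform(v, LO, HI, BITS)
)
piecewise_error = mean_abs_error(
    activations, lambda v: quantize_piecewise(v, LO, BREAK, HI, BITS)
)
```

With the uniform grid, the step size is dictated by the outliers, so all small activations collapse onto the lowest level; the piecewise grid spends half of its levels on the dense region and reduces the average error on this toy data.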

翻訳日:2023-11-01 19:44:40 公開日:2023-10-30

# 疎正準相関推定のためのベイズ的手法

A Bayesian Methodology for Estimation for Sparse Canonical Correlation ( http://arxiv.org/abs/2310.19621v1 )

ライセンス: Link先を確認

Siddhesh Kulkarni, Subhadip Pal, Jeremy T. Gaskins

(参考訳) 共同研究に参加した被験者ごとに異なる実験から得られた多視点高次元データの統合的統計解析を行うことは困難である。標準相関解析 (CCA) は、そのようなデータセット間の関係を識別するための統計的手続きである。その文脈において、構造化スパースCA(Structured Sparse CCA, ScSCCA)は、対応するCCA方向ベクトルをスパースと仮定して、異なるデータモダリティ間の相互関係の堅牢なモデリングを目的とした、急速に発展する方法論分野である。急速に成長している統計方法論開発地域であるが、ベイズパラダイムで関連する方法論を開発する必要がある。本稿では,ベイズ無限因子モデルを用いて,モデリングフレームワークの2つの異なるレベルでのスパース性を促進することにより,頑健な推定を実現することを目的とした,新しいscscca手法を提案する。まず, 潜時変荷重行列のレベルにおいて, スパーシリティを促進するために, 乗算半コーシー法を用いる。さらに,グラフィカルホースシューの事前構造や対角構造を用いて,共分散行列のさらなるスパース性を促進する。提案手法と他の頻繁に使用されるCCA法の性能を比較するために複数のシミュレーションを行い,乳がん研究から得られたマルチオミクスデータを解析するために,本手法を適用した。

It can be challenging to perform an integrative statistical analysis of multi-view high-dimensional data acquired from different experiments on each subject who participated in a joint study. Canonical Correlation Analysis (CCA) is a statistical procedure for identifying relationships between such data sets. In that context, Structured Sparse CCA (ScSCCA) is a rapidly emerging methodological area that aims for robust modeling of the interrelations between the different data modalities by assuming the corresponding CCA directional vectors to be sparse. Although it is a rapidly growing area of statistical methodology development, there is a need for developing related methodologies in the Bayesian paradigm. In this manuscript, we propose a novel ScSCCA approach where we employ a Bayesian infinite factor model and aim to achieve robust estimation by encouraging sparsity in two different levels of the modeling framework. Firstly, we utilize a multiplicative Half-Cauchy process prior to encourage sparsity at the level of the latent variable loading matrices. Additionally, we promote further sparsity in the covariance matrix by using a graphical horseshoe prior or a diagonal structure. We conduct multiple simulations to compare the performance of the proposed method with that of other frequently used CCA procedures, and we apply the developed procedures to analyze multi-omics data arising from a breast cancer study.

翻訳日:2023-11-01 19:44:04 公開日:2023-10-30

# 大規模軌道モデルはスケーラブルな運動予測器とプランナーである

Large Trajectory Models are Scalable Motion Predictors and Planners ( http://arxiv.org/abs/2310.19620v1 )

ライセンス: Link先を確認

Qiao Sun, Shiduo Zhang, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao, Hang Zhao

(参考訳) 運動予測と計画は自動運転において重要なタスクであり、最近の取り組みは機械学習ベースのアプローチに移行している。課題には、多様な道路トポロジの理解、長期にわたる交通力学の推論、異種行動の解釈、大規模連続状態空間におけるポリシーの生成などが含まれる。モデルスケーリングによる類似の複雑さに対処する大規模言語モデルの成功に触発されて、我々はState Transformer (STR)と呼ばれるスケーラブルなトラジェクトリモデルを導入した。 strは、観測、状態、動作を一つの統一シーケンスモデリングタスクに配置することで、動き予測と動き計画の問題を再構成する。単純なモデル設計では、STRは両問題におけるベースラインアプローチを一貫して上回っている。実験結果から,STRなどの大型軌道モデル(LTM)は,優れた適応性と学習効率を示すことにより,スケーリング法則に従うことが明らかとなった。定性的な結果は、LTMがトレーニングデータ分布から大きく分岐するシナリオにおいて、妥当な予測を行うことができることを示している。 LTMはまた、明確な損失設計やコストの高い高レベルのアノテーションなしで、長期計画のための複雑な推論を行うことを学ぶ。

Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning about traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models in addressing similar complexities through model scaling, we introduce a scalable trajectory model called State Transformer (STR). STR reformulates the motion prediction and motion planning problems by arranging observations, states, and actions into one unified sequence modeling task. With a simple model design, STR consistently outperforms baseline approaches in both problems. Remarkably, experimental results reveal that large trajectory models (LTMs), such as STR, adhere to the scaling laws by presenting outstanding adaptability and learning efficiency. Qualitative results further demonstrate that LTMs are capable of making plausible predictions in scenarios that diverge significantly from the training data distribution. LTMs also learn to perform complex reasoning for long-term planning, without explicit loss designs or costly high-level annotations.

翻訳日:2023-11-01 19:43:41 公開日:2023-10-30

# 大規模言語モデルにおける心の選別理論の全体像に向けて

Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models ( http://arxiv.org/abs/2310.19619v1 )

ライセンス: Link先を確認

Ziqiao Ma, Jacob Sansom, Run Peng, Joyce Chai

(参考訳) 大規模言語モデル(llm)は、心の理論(tom)の潜在的出現に関して、かなりの関心と議論を生み出した。最近のいくつかの調査では、これらのモデルに堅牢なToMが欠如していることが判明し、新しいベンチマークの開発に対する需要が高まっている。本稿では,(1)ToMの全体像を分類するにはどうすればよいのか,という2つの道路封鎖問題に答える。 2) マシンToMのより効果的な評価プロトコルとは何か? 心理学的な研究の後、機械のToMを7つの精神状態カテゴリーに分類し、既存のベンチマークでToMの未調査側面を特定する。 ToMの総合的かつ位置的評価により、ToMを個々の構成要素に分解し、LLMを物理的に環境に配置し、人間との相互作用において社会的に位置するエージェントとして扱う。このような位置評価は、精神状態をより包括的に評価し、近道やデータ漏洩のリスクを軽減する可能性がある。さらに,概念実証としてグリッド・ワールド・セットアップにおけるパイロット・スタディを提案する。このポジションペーパーは将来,ToM と LLM を統合し,研究者がToM のランドスケープで作業を行うための直感的な手段となることを期待する。プロジェクトページ:https://github.com/Mars-tin/awesome-theory-of-mind

Large Language Models (LLMs) have generated considerable interest and debate regarding their potential emergence of Theory of Mind (ToM). Several recent inquiries reveal a lack of robust ToM in these models and pose a pressing demand to develop new benchmarks, as current ones primarily focus on different aspects of ToM and are prone to shortcuts and data leakage. In this position paper, we seek to answer two road-blocking questions: (1) How can we taxonomize a holistic landscape of machine ToM? (2) What is a more effective evaluation protocol for machine ToM? Following psychological studies, we taxonomize machine ToM into 7 mental state categories and delineate existing benchmarks to identify under-explored aspects of ToM. We argue for a holistic and situated evaluation of ToM to break ToM into individual components and treat LLMs as an agent who is physically situated in environments and socially situated in interactions with humans. Such situated evaluation provides a more comprehensive assessment of mental states and potentially mitigates the risk of shortcuts and data leakage. We further present a pilot study in a grid world setup as a proof of concept. We hope this position paper can facilitate future research to integrate ToM with LLMs and offer an intuitive means for researchers to better position their work in the landscape of ToM. Project page: https://github.com/Mars-tin/awesome-theory-of-mind

翻訳日:2023-11-01 19:43:21 公開日:2023-10-30

# 抑制性神経回路はシナプス可塑性のサインを制御する

Dis-inhibitory neuronal circuits can control the sign of synaptic plasticity ( http://arxiv.org/abs/2310.19614v1 )

ライセンス: Link先を確認

Julian Rossbroich, Friedemann Zenke

(参考訳) 神経回路がどのように信用割り当てを達成するかは、システム神経科学において未解決の課題である。様々な研究により、多層ネットワークによるバックプロパゲートエラー信号の解法が提案されている。これらの純粋に機能的に動機づけられたモデルは、シナプス可塑性の徴候を決定する局所的エラー信号を表すために異なる神経細胞のコンパートメントを仮定する。しかし、この明示的な誤り変調は、主にシナプス後活動に依存する現象学的可塑性モデルと矛盾する。本稿では,適応制御理論の枠組みで導かれる可解なマイクロ回路モデルとヘビー学習規則が,この不一致をいかに解消するかを示す。誤りがトップダウン非抑制シナプス求心性にコード化されていると仮定すると、繰り返し抑制がヘビアン可塑性に明示的に影響を及ぼすと、誤り修飾学習は回路レベルで自然に現れる。同じ学習規則は、抑制がない場合の可塑性を実験的に観察し、いくつかの非線形分離可能なベンチマークでエラーのバックプロパゲーション(bp)に比較可能である。本研究は, 機能的および実験的に観察された可塑性規則のギャップを埋め, 励起可塑性の抑制に関する具体的な予測を行う。

How neuronal circuits achieve credit assignment remains a central unsolved question in systems neuroscience. Various studies have suggested plausible solutions for back-propagating error signals through multi-layer networks. These purely functionally motivated models assume distinct neuronal compartments to represent local error signals that determine the sign of synaptic plasticity. However, this explicit error modulation is inconsistent with phenomenological plasticity models in which the sign depends primarily on postsynaptic activity. Here we show how a plausible microcircuit model and Hebbian learning rule derived within an adaptive control theory framework can resolve this discrepancy. Assuming errors are encoded in top-down dis-inhibitory synaptic afferents, we show that error-modulated learning emerges naturally at the circuit level when recurrent inhibition explicitly influences Hebbian plasticity. The same learning rule accounts for experimentally observed plasticity in the absence of inhibition and performs comparably to back-propagation of error (BP) on several non-linearly separable benchmarks. Our findings bridge the gap between functional and experimentally observed plasticity rules and make concrete predictions on inhibitory modulation of excitatory plasticity.

翻訳日:2023-11-01 19:42:59 公開日:2023-10-30

# 部分ベイズニューラルネットワークのFeynman-Kacトレーニングについて

On Feynman--Kac training of partial Bayesian neural networks ( http://arxiv.org/abs/2310.19608v1 )

ライセンス: Link先を確認

Zheng Zhao and Sebastian Mair and Thomas B. Schön and Jens Sjölund

(参考訳) 近年,パラメータのサブセットのみを確率的と考える部分ベイズニューラルネットワーク (pbnns) が,完全なベイズニューラルネットワークと競合することが示された。しかし、pBNNはしばしば潜在変数空間において多重モードであり、パラメトリックモデルに近似することは困難である。そこで本研究では,Feynman-Kacモデルのシミュレーションとして,pBNNのトレーニングを定式化した,効率的なサンプリングベーストレーニング戦略を提案する。次に,このモデルのパラメータと潜在後続分布を同時に計算可能な計算コストで推定できる逐次モンテカルロサンプリングの変種について述べる。我々は,様々な合成データと実世界のデータセットについて,提案手法が予測性能の面での最先端を上回っていることを示す。

Recently, partial Bayesian neural networks (pBNNs), which only consider a subset of the parameters to be stochastic, were shown to perform competitively with full Bayesian neural networks. However, pBNNs are often multi-modal in the latent-variable space and thus challenging to approximate with parametric models. To address this problem, we propose an efficient sampling-based training strategy, wherein the training of a pBNN is formulated as simulating a Feynman--Kac model. We then describe variations of sequential Monte Carlo samplers that allow us to simultaneously estimate the parameters and the latent posterior distribution of this model at a tractable computational cost. We show on various synthetic and real-world datasets that our proposed training scheme outperforms the state of the art in terms of predictive performance.

翻訳日:2023-11-01 19:42:39 公開日:2023-10-30

# 抽象的議論による事例ベース推論における事例関連性の学習に関する技術報告

Technical Report on the Learning of Case Relevance in Case-Based Reasoning with Abstract Argumentation ( http://arxiv.org/abs/2310.19607v1 )

ライセンス: Link先を確認

Guilherme Paulino-Passos, Francesca Toni

(参考訳) ケースベース推論は、いくつかの法的設定において重要な役割を果たすことが知られている。本稿では,最近の事例ベース推論のアプローチに注目し,議論が事例を表現し,事例間の結果の不一致と関連性の概念による攻撃結果を示す抽象的議論のインスタンス化が支持する。この文脈では、関連性はケース間の特異性の形式に結びついている。我々は,意思決定木を駆使して,ケースベース推論と抽象的議論(aa-cbr)の組み合わせと,法的場面における予測のためのケース関連学習について検討する。具体的には,aa-cbr と decision-tree-based learning の2つの法定データセットについて,決定木との比較で比較検討を行った。また,AA-CBRによるケース関係の学習により,決定木よりもコンパクトな表現が得られ,認知に難渋する説明を得る上で有益であることが示唆された。

Case-based reasoning is known to play an important role in several legal settings. In this paper we focus on a recent approach to case-based reasoning, supported by an instantiation of abstract argumentation whereby arguments represent cases and attack between arguments results from outcome disagreement between cases and a notion of relevance. In this context, relevance is connected to a form of specificity among cases. We explore how relevance can be learnt automatically in practice with the help of decision trees, and explore the combination of case-based reasoning with abstract argumentation (AA-CBR) and learning of case relevance for prediction in legal settings. Specifically, we show that, for two legal datasets, AA-CBR and decision-tree-based learning of case relevance perform competitively in comparison with decision trees. We also show that AA-CBR with decision-tree-based learning of case relevance results in a more compact representation than their decision tree counterparts, which could be beneficial for obtaining cognitively tractable explanations.

翻訳日:2023-11-01 19:42:24 公開日:2023-10-30

# Deep Kalman Filters can Filter

Deep Kalman Filters Can Filter ( http://arxiv.org/abs/2310.19603v1 )

ライセンス: Link先を確認

Blanka Hovart, Anastasis Kratsios, Yannick Limmer, Xuwei Yang

(参考訳) ディープカルマンフィルタ(deep kalman filter、dkfs)は、逐次データからガウス確率測度を生成するニューラルネットワークモデルのクラスである。 DKFはカルマンフィルタにインスパイアされたものの、確率的フィルタリング問題と具体的な理論的関係が欠如しているため、数学ファイナンスにおけるボンドとオプション価格のモデルキャリブレーションなど、従来のモデルベースフィルタが使用されている領域に適用性に制限される。本研究では,非マルコフ的および条件付きガウス的信号過程の条件法則を概略実装できる連続時間DKFのクラスを示すことで,ディープラーニングの数学的基礎に対処する。近似結果は、与えられたコンパクトなパスの集合に対して計算された最悪のケース2-ワッサーシュタイン距離によって近似誤差を定量化する。

Deep Kalman filters (DKFs) are a class of neural network models that generate Gaussian probability measures from sequential data. Though DKFs are inspired by the Kalman filter, they lack concrete theoretical ties to the stochastic filtering problem, thus limiting their applicability to areas where traditional model-based filters have been used, e.g., model calibration for bond and option prices in mathematical finance. We address this issue in the mathematical foundations of deep learning by exhibiting a class of continuous-time DKFs which can approximately implement the conditional law of a broad class of non-Markovian and conditionally Gaussian signal processes given noisy continuous-time measurements. Our approximation results hold uniformly over sufficiently regular compact subsets of paths, where the approximation error is quantified by the worst-case 2-Wasserstein distance computed uniformly over the given compact set of paths.
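
The error measure named in the abstract, a worst-case 2-Wasserstein distance between Gaussian laws, has a closed form that is easy to compute. The following is a minimal sketch, assuming diagonal covariances (where the trace term reduces to a difference of standard deviations) and a finite list of paths standing in for the compact set; the function names are illustrative and do not come from the paper.

```python
import math


def w2_diag_gaussians(mean_a, var_a, mean_b, var_b):
    """2-Wasserstein distance between N(mean_a, diag(var_a)) and N(mean_b, diag(var_b))."""
    mean_term = sum((ma - mb) ** 2 for ma, mb in zip(mean_a, mean_b))
    # For diagonal (commuting) covariances the trace term reduces to the
    # squared difference of the per-coordinate standard deviations.
    cov_term = sum(
        (math.sqrt(va) - math.sqrt(vb)) ** 2 for va, vb in zip(var_a, var_b)
    )
    return math.sqrt(mean_term + cov_term)


def worst_case_w2(predicted, target):
    """Largest W2 error over a finite collection of (mean, variance) pairs."""
    return max(
        w2_diag_gaussians(pm, pv, tm, tv)
        for (pm, pv), (tm, tv) in zip(predicted, target)
    )
```

Here `predicted` would hold the filter's output law for each path and `target` the true conditional law; taking the maximum mirrors the "worst-case" wording, under the assumption that a finite sample of paths is an acceptable stand-in.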

翻訳日:2023-11-01 19:42:07 公開日:2023-10-30

# スピンスピン結合を持つ2量子ラビモデルにおける熱力学的限界

Thermodynamic Limit in the Two-qubit Quantum Rabi Model with Spin-Spin Coupling ( http://arxiv.org/abs/2310.19595v1 )

ライセンス: Link先を確認

R. Grimaudo, A. Messina, A. Sergi, E. Solano, and D. Valenti

(参考訳) 同じ量子化場モードに結合された2つの相互作用量子ビットからなる量子系において、2階超放射性量子相転移が発生する。スピンスピン相互作用を持つ積分可能な2量子ビット量子ラビモデルに対して,熱力学的に適切な限界を導入する。すなわち、スピンとモードの周波数比に関係なく、スピンスピンとスピンモードのカップリングとモード周波数との無限比によって決定される。

The occurrence of a second-order superradiant quantum phase transition is brought to light in a quantum system consisting of two interacting qubits coupled to the same quantized field mode. We introduce an appropriate thermodynamic-like limit for the integrable two-qubit quantum Rabi model with spin-spin interaction. Namely, it is determined by the infinite ratios of the spin-spin and the spin-mode couplings to the mode frequency, regardless of the spin-to-mode frequency ratios.

翻訳日:2023-11-01 19:41:24 公開日:2023-10-30

# エキスパートアドバイザを用いた局所定常データの予測

Prediction of Locally Stationary Data Using Expert Advice ( http://arxiv.org/abs/2310.19591v1 )

ライセンス: Link先を確認

Vladimir V'yugin, Vladimir Trunov

(参考訳) 継続的機械学習の問題は研究されている。ゲーム理論のアプローチの枠組みでは、次の予測を計算する際には、データフローを生成するソースの確率的性質に関する仮定は使用されない -- ソースはアナログ、アルゴリズム、確率的であり、そのパラメータは確率モデルを構築する際にランダムに変化する可能性がある。局所定常時系列のオンライン予測アルゴリズムを提案する。提案アルゴリズムの効率を推定する。

The problem of continuous machine learning is studied within the framework of the game-theoretic approach: when calculating the next forecast, no assumptions are made about the stochastic nature of the source that generates the data flow. The source can be analog, algorithmic or probabilistic, and its parameters can change at random times; when building a prognostic model, only structural assumptions about the nature of data generation are used. An online forecasting algorithm for a locally stationary time series is presented. An estimate of the efficiency of the proposed algorithm is obtained.

翻訳日:2023-11-01 19:41:18 公開日:2023-10-30

# シャープ解によって特徴づけられる偏微分方程式を解く演算子学習による物理インフォームドニューラルネットワーク

Operator Learning Enhanced Physics-informed Neural Networks for Solving Partial Differential Equations Characterized by Sharp Solutions ( http://arxiv.org/abs/2310.19590v1 )

ライセンス: Link先を確認

Bin Lin, Zhiping Mao, Zhicheng Wang, George Em Karniadakis

(参考訳) 物理インフォームドニューラルネットワーク(PINN)は偏微分方程式(PDE)の前方および逆問題の解法として有望なアプローチとして示されている。一方、ディープ・オペレーター・ネットワーク(DeepONet)やフーリエ・ニューラル・オペレータ(FNO)などの手法を含むニューラル・オペレーター・アプローチは、PDEの近似ソリューションとして広く採用されている。それでも、シャープなソリューションからなる問題を解決することは、この2つのアプローチを採用する際に大きな課題となる。そこで本研究では,演算子学習強化物理インフォームドニューラルネットワーク(OL-PINN)と呼ばれる新しいフレームワークを提案する。まず,deeponetを用いて,鋭い解を特徴とするpdesに関連する平滑な問題の集合について解演算子を学習する。その後、トレーニング済みのDeepONetをPINNと統合し、ターゲットのシャープな解問題を解決する。本稿では, 非線形拡散反応方程式, バーガーズ方程式, 圧縮不能なナビエ・ストークス方程式などの様々な問題をレイノルズ数で解くことで, OL-PINNの有効性を示す。提案手法はバニラピンと比較すると,強い一般化能力を達成するために少数の残差点しか必要としない。さらに、堅牢なトレーニングプロセスを確保しながら、精度を大幅に向上させる。さらに、OL-PINNは逆問題を解決するためにPINNの利点を継承する。この目的のために,ol-pinn法を部分境界条件のみを用いて解くことに応用し,古典的数値解法では解くことが困難であり,不適切な問題やより複雑な逆問題を解く能力を示す。

Physics-informed Neural Networks (PINNs) have been shown to be a promising approach for solving both forward and inverse problems of partial differential equations (PDEs). Meanwhile, the neural operator approach, including methods such as Deep Operator Network (DeepONet) and Fourier neural operator (FNO), has been introduced and extensively employed in approximating solutions of PDEs. Nevertheless, solving problems with sharp solutions poses a significant challenge when employing these two approaches. To address this issue, we propose in this work a novel framework termed Operator Learning Enhanced Physics-informed Neural Networks (OL-PINN). Initially, we utilize DeepONet to learn the solution operator for a set of smooth problems relevant to the PDEs characterized by sharp solutions. Subsequently, we integrate the pre-trained DeepONet with PINN to resolve the target sharp solution problem. We showcase the efficacy of OL-PINN by successfully addressing various problems, such as the nonlinear diffusion-reaction equation, the Burgers equation and the incompressible Navier-Stokes equation at high Reynolds number. Compared with the vanilla PINN, the proposed method requires only a small number of residual points to achieve a strong generalization capability. Moreover, it substantially enhances accuracy, while also ensuring a robust training process. Furthermore, OL-PINN inherits the advantage of PINN for solving inverse problems. To this end, we apply the OL-PINN approach for solving problems with only partial boundary conditions, which usually cannot be solved by the classical numerical methods, showing its capacity in solving ill-posed problems and consequently more complex inverse problems.

翻訳日:2023-11-01 19:41:08 公開日:2023-10-30

# ゲージ同変非線形メッセージパッシングを用いたメッシュ上のモデリングダイナミクス

Modeling Dynamics over Meshes with Gauge Equivariant Nonlinear Message Passing ( http://arxiv.org/abs/2310.19589v1 )

ライセンス: Link先を確認

Jung Yeon Park, Lawson L.S. Wong, Robin Walters

(参考訳) 非ユークリッド多様体上のデータは、しばしば表面メッシュとして離散化され、コンピュータグラフィックスや生物学的および物理的システムに自然に現れる。特に、多様体上の偏微分方程式(PDE)の解は、基礎となる幾何学に批判的に依存する。グラフニューラルネットワークはPDEにうまく適用されているが、曲面幾何学を取り入れておらず、多様体の局所ゲージ対称性を考慮していない。あるいは、メッシュ上のゲージ同変畳み込みおよび注意アーキテクチャに関する最近の研究は、基礎となる幾何学を活用するが、複雑な非線形力学を持つ表面PDEのモデル化では不十分である。これらの問題に対処するため、非線形メッセージパッシングを用いた新しいゲージ同変アーキテクチャを提案する。我々の新しいアーキテクチャは、複雑で非線形なドメイン上の畳み込みネットワークや注意ネットワークよりも高い性能を実現する。しかし、非メッシュの場合と同様に、設計上のトレードオフは、異なるタスクに対して畳み込み、注意、またはメッセージパッシングのネットワークを好む。

Data over non-Euclidean manifolds, often discretized as surface meshes, naturally arise in computer graphics and biological and physical systems. In particular, solutions to partial differential equations (PDEs) over manifolds depend critically on the underlying geometry. While graph neural networks have been successfully applied to PDEs, they do not incorporate surface geometry and do not consider local gauge symmetries of the manifold. Alternatively, recent works on gauge equivariant convolutional and attentional architectures on meshes leverage the underlying geometry but underperform in modeling surface PDEs with complex nonlinear dynamics. To address these issues, we introduce a new gauge equivariant architecture using nonlinear message passing. Our novel architecture achieves higher performance than either convolutional or attentional networks on domains with highly complex and nonlinear dynamics. However, similar to the non-mesh case, design trade-offs favor convolutional, attentional, or message passing networks for different tasks; we investigate in which circumstances our message passing method provides the most benefit.

翻訳日:2023-11-01 19:40:38 公開日:2023-10-30

# DrM: 休眠率最小化による視覚強化学習の習得

DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization ( http://arxiv.org/abs/2310.19668v1 )

ライセンス: Link先を確認

Guowei Xu, Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Zhecheng Yuan, Tianying Ji, Yu Luo, Xiaoyu Liu, Jiaxin Yuan, Pu Hua, Shuzhen Li, Yanjie Ze, Hal Daumé III, Furong Huang, Huazhe Xu

(参考訳) 視覚強化学習(RL)は連続制御タスクにおいて有望である。その進歩にもかかわらず、現在のアルゴリズムは、サンプル効率、漸近的性能、ランダム種の選択に対する堅牢性など、事実上あらゆるパフォーマンス面で満足できない。本稿では、初期訓練中に持続的不活性を示すエージェントである既存の視覚的RL法の主な欠点を特定し、効果的に探索する能力を制限する。さらに,この重要な観察により,運動的不活発な探索に対するエージェントの傾きと,その政策ネットワークにおける神経活動の欠如との間に有意な相関が明らかとなった。この不活性を定量化するために、RLエージェントのネットワークにおける不活性を測定するために、休眠比を計量として採用する。また, 報酬信号によらず, 休眠比がエージェントの活動レベルのスタンドアロン指標として機能することを実証的に認識する。上記の知見を生かしたdrmは,エージェントの探索・探索トレードオフを積極的に最小化することにより,3つのコアメカニズムを用いてガイドする手法である。実験によると、DrMはDeepMind Control Suite、MetaWorld、Adroitを含む3つの連続制御ベンチマーク環境において、壊れた種(合計76種)なしでサンプル効率と漸近性能を大幅に改善する。最も重要なことは、drmはdeepmindコントロールスイートの犬とマニピュレータドメインの両方のタスクを一貫して解決する最初のモデルフリーなアルゴリズムである。

Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite its progress, current algorithms are still unsatisfactory in virtually every aspect of performance, such as sample efficiency, asymptotic performance, and robustness to the choice of random seeds. In this paper, we identify a major shortcoming of existing visual RL methods: the agents often exhibit sustained inactivity during early training, which limits their ability to explore effectively. Expanding upon this crucial observation, we additionally unveil a significant correlation between the agents' inclination towards motorically inactive exploration and the absence of neuronal activity within their policy networks. To quantify this inactivity, we adopt the dormant ratio as a metric to measure inactivity in the RL agent's network. Empirically, we also recognize that the dormant ratio can act as a standalone indicator of an agent's activity level, regardless of the received reward signals. Leveraging the aforementioned insights, we introduce DrM, a method that uses three core mechanisms to guide agents' exploration-exploitation trade-offs by actively minimizing the dormant ratio. Experiments demonstrate that DrM achieves significant improvements in sample efficiency and asymptotic performance with no broken seeds (76 seeds in total) across three continuous control benchmark environments, including DeepMind Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first model-free algorithm that consistently solves tasks in both the Dog and Manipulator domains from the DeepMind Control Suite as well as three dexterous hand manipulation tasks without demonstrations in Adroit, all based on pixel observations.
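
The abstract does not spell out how the dormant ratio is computed. A minimal sketch of one common formulation follows, assuming a neuron counts as dormant when its mean absolute activation, normalized by the average over its layer, is at or below a threshold `tau`; the threshold value, the data layout, and the function name are illustrative assumptions rather than details confirmed by the paper.

```python
def dormant_ratio(layer_activations, tau=0.1):
    """Fraction of neurons whose normalized mean |activation| is <= tau.

    layer_activations: list of layers; each layer is a list of samples;
    each sample is a list of per-neuron activations.
    """
    dormant, total = 0, 0
    for samples in layer_activations:
        n_neurons = len(samples[0])
        # Mean absolute activation of each neuron over the batch.
        mean_abs = [
            sum(abs(s[i]) for s in samples) / len(samples)
            for i in range(n_neurons)
        ]
        layer_mean = sum(mean_abs) / n_neurons
        for score in mean_abs:
            # Normalize by the layer average so the threshold is scale-free.
            normalized = score / layer_mean if layer_mean > 0 else 0.0
            dormant += normalized <= tau
        total += n_neurons
    return dormant / total
```

Because the score depends only on activations, it can be tracked during training without reference to rewards, which is consistent with the abstract's remark that the ratio acts as a standalone activity indicator.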

翻訳日:2023-11-01 19:34:16 公開日:2023-10-30

# 神経拡散反応過程による動的テンソル分解

Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes ( http://arxiv.org/abs/2310.19666v1 )

ライセンス: Link先を確認

Zheng Wang, Shikai Fang, Shibo Li, Shandian Zhe

(参考訳) テンソル分解は多方向データ解析の重要なツールである。実際には、データはしばしばスパースされ、リッチな時間情報と関連付けられる。しかし、既存の手法はしばしば時間情報を過小評価し、わずかに観察されたテンソルエントリ内の構造的知識を無視する。これらの制限を克服し、その基盤となる時間構造をよりよく捉えるために、Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE)を提案する。各テンソルモードにおけるエンティティの動的埋め込みを推定するニューラル拡散-反応プロセスを開発した。具体的には、観測されたテンソルエントリに基づいて、エンティティ間の相関をエンコードする多成分グラフを構築する。グラフ拡散プロセスを構築し、相関したエンティティの埋め込み軌道を共進化させ、ニューラルネットワークを用いて個々のエンティティに対する反応プロセスを構築する。このようにして、我々のモデルは、異なる実体に対する埋め込みの進化において、共通性と個性の両方を捉えることができる。次に、ニューラルネットワークを用いて入力値を埋め込み軌道の非線形関数としてモデル化する。モデル推定にはODEソルバを組み合わせて確率的ミニバッチ学習アルゴリズムを開発する。本稿では,各ミニバッチの処理コストのバランスをとるための階層化サンプリング手法を提案する。我々はシミュレーション研究と実世界のアプリケーションの両方において,このアプローチの利点を示す。コードはhttps://github.com/wzhut/dynamic-tensor-decomposition-via-neural-diffusion-reaction-processesで入手できる。

Tensor decomposition is an important tool for multiway data analysis. In practice, the data is often sparse yet associated with rich temporal information. Existing methods, however, often under-use the time information and ignore the structural knowledge within the sparsely observed tensor entries. To overcome these limitations and to better capture the underlying temporal structure, we propose Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE). We develop a neural diffusion-reaction process to estimate dynamic embeddings for the entities in each tensor mode. Specifically, based on the observed tensor entries, we build a multi-partite graph to encode the correlation between the entities. We construct a graph diffusion process to co-evolve the embedding trajectories of the correlated entities and use a neural network to construct a reaction process for each individual entity. In this way, our model can capture both the commonalities and personalities during the evolution of the embeddings for different entities. We then use a neural network to model the entry value as a nonlinear function of the embedding trajectories. For model estimation, we combine ODE solvers to develop a stochastic mini-batch learning algorithm. We propose a stratified sampling method to balance the cost of processing each mini-batch so as to improve the overall efficiency. We show the advantage of our approach in both simulation study and real-world applications. The code is available at https://github.com/wzhut/Dynamic-Tensor-Decomposition-via-Neural-Diffusion-Reaction-Processes.

翻訳日:2023-11-01 19:33:43 公開日:2023-10-30

# 北エフの量子二重模型の任意のセクターの分類

Classification of the anyon sectors of Kitaev's quantum double model ( http://arxiv.org/abs/2310.19661v1 )

ライセンス: Link先を確認

Alex Bols, Siddharth Vadnerkar

(参考訳) 無限三角格子上のキタエフの量子二重モデルの任意のセクターと、非アーベルケースを含む有限ゲージ群$G$の完全な分類を与える。予想通り、モデルの任意のセクターは、正確に$G$の量子二重代数の既約表現に対応する。私たちの証明は2つの主な部分からなる。第一部では、量子二重代数の各既約表現を純粋状態として構成し、これらの純状態の GNS 表現が任意のセクターに対的に不随意であることを示す。第2部では、任意のエノンセクターが、第1部で構築されたエノンセクターの1つに一意的に等しいことを示す。証明の最初の部分は、問題の状態の記述を文字列-ネット凝縮として決定的に用いている。純粋性は、これらの状態が局所的制約の適切な集合を満たすユニークな状態として特徴づけられる。証明の核心は、局所ゲージ変換のある群が局所弦ネットの集合に対して自由に推移的に作用するという事実である。第二に、任意のセクターがこれらの制約の有限個を除いて全てを満たす純粋状態を含むことを示す。既知の手法を用いることで、これらの制約のうちの1つを除いて全てを満たすあらゆるセクターで純粋な状態を構築することができる。最後に、そのような状態は、最初の部分で構築された任意のセクターの1つのベクトル状態でなければならないことを示す。

We give a complete classification of the anyon sectors of Kitaev's quantum double model on the infinite triangular lattice and for finite gauge group $G$, including the non-abelian case. As conjectured, the anyon sectors of the model correspond precisely to the irreducible representations of the quantum double algebra of $G$. Our proof consists of two main parts. In the first part, we construct for each irreducible representation of the quantum double algebra a pure state and show that the GNS representations of these pure states are pairwise disjoint anyon sectors. In the second part we show that any anyon sector is unitarily equivalent to one of the anyon sectors constructed in the first part. The first part of the proof crucially uses a description of the states in question as string-net condensates. Purity is shown by characterising these states as the unique states that satisfy appropriate sets of local constraints. At the core of the proof is the fact that certain groups of local gauge transformations act freely and transitively on collections of local string-nets. For the second part, we show that any anyon sector contains a pure state that satisfies all but a finite number of these constraints. Using known techniques we can then construct a pure state in the anyon sector that satisfies all but one of these constraints. Finally, we show explicitly that any such state must be a vector state in one of the anyon sectors constructed in the first part.

翻訳日:2023-11-01 19:33:20 公開日:2023-10-30

# 反復生成概念ボトルネックを用いた解釈可能テキスト分類

Interpretable-by-Design Text Classification with Iteratively Generated Concept Bottleneck ( http://arxiv.org/abs/2310.19660v1 )

ライセンス: Link先を確認

Josh Magnus Ludan, Qing Lyu, Yue Yang, Liam Dugan, Mark Yatskar, Chris Callison-Burch

(参考訳) ディープニューラルネットワークはテキスト分類タスクに優れるが、ハイテイクドメインへの応用は、解釈可能性の欠如によって妨げられている。そこで本研究では,グローバルかつ局所的な説明を提供する,本質的に解釈可能なテキスト分類フレームワークであるText Bottleneck Models (TBMs)を提案する。 tbmsは出力ラベルを直接予測する代わりに、スパースな概念集合のカテゴリー値を予測し、それらの概念上の線形層を使用して最終的な予測を生成する。これらの概念は、人間のキュレーションを必要とせずに、LLM(Large Language Model)によって自動的に発見され、測定することができる。概念生成と測定の両方に GPT-4 を用いる12種類の多様なデータセットにおいて,TBM は GPT-4 fewshot や DeBERTa などの確立したブラックボックスベースラインに匹敵する性能を示す。全体として、tbmsは、特に一般ドメインテキストにおいて、最小限のパフォーマンストレードオフで、解釈性を高める有望な新しいフレームワークであることを示唆している。

Deep neural networks excel in text classification tasks, yet their application in high-stakes domains is hindered by their lack of interpretability. To address this, we propose Text Bottleneck Models (TBMs), an intrinsically interpretable text classification framework that offers both global and local explanations. Rather than directly predicting the output label, TBMs predict categorical values for a sparse set of salient concepts and use a linear layer over those concept values to produce the final prediction. These concepts can be automatically discovered and measured by a Large Language Model (LLM), without the need for human curation. On 12 diverse datasets, using GPT-4 for both concept generation and measurement, we show that TBMs can rival the performance of established black-box baselines such as GPT-4 fewshot and finetuned DeBERTa, while falling short against finetuned GPT-3.5. Overall, our findings suggest that TBMs are a promising new framework that enhances interpretability, with minimal performance tradeoffs, particularly for general-domain text.
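
The final stage described above, a linear layer over concept values, is what makes each prediction explainable. A minimal sketch follows, assuming the concept values have already been measured (the LLM-based measurement step is not shown) and using hypothetical concept names, labels, and weights that do not come from the paper.

```python
def tbm_predict(concept_scores, weights, biases):
    """Linear layer over concept scores; returns (label, per-concept contributions)."""
    logits = {}
    contributions = {}
    for label, w in weights.items():
        # Each concept contributes weight * score to the label's logit.
        contributions[label] = {c: w[c] * concept_scores[c] for c in w}
        logits[label] = biases[label] + sum(contributions[label].values())
    best = max(logits, key=logits.get)
    return best, contributions[best]


# Hypothetical concepts and weights, for illustration only.
scores = {"mentions_refund": 1, "polite_tone": -1}
weights = {
    "complaint": {"mentions_refund": 2.0, "polite_tone": -1.0},
    "praise": {"mentions_refund": -1.0, "polite_tone": 1.5},
}
biases = {"complaint": 0.0, "praise": 0.0}
label, why = tbm_predict(scores, weights, biases)
```

The returned contributions serve as a local explanation for one text, while the weight table itself gives a global view of which concepts drive each label.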

翻訳日:2023-11-01 19:32:55 公開日:2023-10-30

# ネットワーク侵入検出のための自然言語による木モデル決定の解説

Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection ( http://arxiv.org/abs/2310.19658v1 )

ライセンス: Link先を確認

Noah Ziems, Gang Liu, John Flanagan, Meng Jiang

(参考訳) 機械学習を利用したネットワーク侵入検知(NID)システムは、悪質なネットワークトラフィックを検出するために実際に高い性能を示すことが示されている。特に決定木は、パフォーマンスと単純さのバランスが強いが、NIDシステムのユーザは、機械学習の背景知識を解釈しなければならない。さらに、なぜ特定の特徴が分類に重要であるかについて、追加の外部情報を提供することができない。本研究では,大規模言語モデル(LLM)を用いて,意思決定木NIDシステムに対する説明と背景知識を提供する。さらに,人間による決定木推論の理解度を測定するクイズ質問の自動生成を活用した,決定木説明のための新たな人間評価フレームワークを提案する。最後に, LLM の生成した決定木説明は, 可読性, 品質, 背景知識の利用の人間の評価と高い相関性を示し, 同時に意思決定境界の理解を深めた。

Network intrusion detection (NID) systems that leverage machine learning have been shown to perform strongly in practice when used to detect malicious network traffic. Decision trees in particular offer a strong balance between performance and simplicity, but interpreting them requires users of NID systems to have background knowledge in machine learning. In addition, they are unable to provide additional outside information as to why certain features may be important for classification. In this work, we explore the use of large language models (LLMs) to provide explanations and additional background knowledge for decision tree NID systems. Further, we introduce a new human evaluation framework for decision tree explanations, which leverages automatically generated quiz questions that measure human evaluators' understanding of decision tree inference. Finally, we show that LLM-generated decision tree explanations correlate highly with human ratings of readability, quality, and use of background knowledge while simultaneously providing better understanding of decision boundaries.

翻訳日:2023-11-01 19:32:33 公開日:2023-10-30

# 計算病理学におけるドメイン一般化:調査とガイドライン

Domain Generalization in Computational Pathology: Survey and Guidelines ( http://arxiv.org/abs/2310.19656v1 )

ライセンス: Link先を確認

Mostafa Jahanifar, Manahil Raza, Kesi Xu, Trinh Vuong, Rob Jewsbury, Adam Shephard, Neda Zamanitajeddin, Jin Tae Kwak, Shan E Ahmed Raza, Fayyaz Minhas, Nasir Rajpoot

(参考訳) 深層学習モデルは、様々な組織像解析アプリケーションにまたがる複雑なタスクに取り組むことにより、計算病理学(CPath)において例外的な効果を示した。それでも、分布外データ(異種イメージング装置や様々な組織調製方法など、様々なソースから推定される)の存在は、domain shift (DS) を引き起こす可能性がある。 DSは、訓練されたモデルの一般化をわずかに異なるデータ分布を持つ未知のデータセットに還元し、革新的な domain generalization (DG) ソリューションの必要性を喚起する。本研究は,癌研究および臨床実習における診断・予後モデルに大きな影響を与えるDG法の可能性を認識し,CPathにおけるDGの達成に関するガイドラインとともに報告する。我々は、様々なDSタイプを厳格に定義し、CPathの既存のDGアプローチとリソースを体系的にレビューし、分類し、それらの利点、制限、適用可能性に関する洞察を提供する。また,28個の最先端DGアルゴリズムを用いて,複雑なDG問題に対処するためのベンチマーク実験を行った。以上の結果から, CPath特異的なステント拡張技術と注意深い実験設計が有効である可能性が示唆された。しかし、CPath では DG のすべてに適合するソリューションは存在しない。そこで我々は,異なるシナリオに応じてDSの検出と管理を行うための明確なガイドラインを確立する。コンセプト、ガイドライン、レコメンデーションのほとんどはcpathのアプリケーションで提供されていますが、ほとんどの医療画像分析タスクにも適用できると考えています。

Deep learning models have exhibited exceptional effectiveness in Computational Pathology (CPath) by tackling intricate tasks across an array of histology image analysis applications. Nevertheless, the presence of out-of-distribution data (stemming from a multitude of sources such as disparate imaging devices and diverse tissue preparation methods) can cause domain shift (DS). DS decreases the generalization of trained models to unseen datasets with slightly different data distributions, prompting the need for innovative domain generalization (DG) solutions. Recognizing the potential of DG methods to significantly influence diagnostic and prognostic models in cancer studies and clinical practice, we present this survey along with guidelines on achieving DG in CPath. We rigorously define various DS types, systematically review and categorize existing DG approaches and resources in CPath, and provide insights into their advantages, limitations, and applicability. We also conduct thorough benchmarking experiments with 28 cutting-edge DG algorithms to address a complex DG problem. Our findings suggest that careful experiment design and a CPath-specific stain augmentation technique can be very effective. However, there is no one-size-fits-all solution for DG in CPath. Therefore, we establish clear guidelines for detecting and managing DS depending on different scenarios. While most of the concepts, guidelines, and recommendations are given for applications in CPath, we believe that they are applicable to most medical image analysis tasks as well.

翻訳日:2023-11-01 19:32:18 公開日:2023-10-30

# mcad:効率的な画像テキスト検索のためのマルチティーチャークロスモーダルアライメント蒸留

MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval ( http://arxiv.org/abs/2310.19654v1 )

ライセンス: Link先を確認

Youbo Lei, Feifei He, Chen Chen, Yingbin Mo, Si Jia Li, Defeng Xie, Haonan Lu

(参考訳) 大規模視覚言語事前学習モデルの成功と,産業領域における画像テキスト検索の広範な適用により,モデルサイズを削減し,端末端末展開を合理化する必要性が高まっている。画像テキスト検索の主流モデル構造はシングルストリームとデュアルストリームであり、どちらも視覚とテキスト間のセマンティックギャップを埋めることを目的としている。デュアルストリームモデルはオフラインインデックス化と高速推論において優れ、一方シングルストリームモデルは適切な特徴融合を用いてより正確なクロスモデルアライメントを実現する。単ストリームモデルと二重ストリームモデルの利点を統合するため, マルチティーチングラークロスモーダルアライメント蒸留(MCAD)手法を提案する。両ストリームモデルのイメージとテキストの特徴に融合した単一ストリーム特徴を組み込むことで,教師の新たな特徴やロジットを定式化する。次に,留学生のデュアルストリームモデルの能力を高めるために,ロジットと特徴蒸留の両方を行い,推論の複雑さを増すことなく高い検索性能を達成する。画像テキスト検索タスクにおけるMCADの顕著な性能と高効率性を示す。さらに,9300万のメモリと30ミリ秒の検索レイテンシを持つSnapdragonクリップ上で,モバイルCLIPモデルを実装した。

With the success of large-scale visual-language pretraining models and the wide application of image-text retrieval in industry areas, reducing the model size and streamlining their terminal-device deployment have become urgently necessary. The mainstream model structures for image-text retrieval are single-stream and dual-stream, both aiming to close the semantic gap between visual and textual modalities. Dual-stream models excel at offline indexing and fast inference, while single-stream models achieve more accurate cross-modal alignment by employing adequate feature fusion. We propose a multi-teacher cross-modality alignment distillation (MCAD) technique to integrate the advantages of single-stream and dual-stream models. By incorporating the fused single-stream features into the image and text features of the dual-stream model, we formulate new modified teacher features and logits. Then, we conduct both logit and feature distillation to boost the capability of the student dual-stream model, achieving high retrieval performance without increasing inference complexity. Extensive experiments demonstrate the remarkable performance and high efficiency of MCAD on image-text retrieval tasks. Furthermore, we implement a mobile CLIP model on Snapdragon chips with only 93M running memory and 30ms search latency, without apparent performance degradation of the original large CLIP.

翻訳日:2023-11-01 19:31:53 公開日:2023-10-30

# 拡散モデルによる無制限データプランによるvaeトレーニングのアップグレード

Upgrading VAE Training With Unlimited Data Plans Provided by Diffusion Models ( http://arxiv.org/abs/2310.19653v1 )

ライセンス: Link先を確認

Tim Z. Xiao, Johannes Zenn, Robert Bamler

(参考訳) 変分オートエンコーダ(VAE)は表現学習の一般的なモデルであるが、それらのエンコーダは真の(連続的な)データ分散である$p_{\mathrm{data}}(\mathbf{x})$の代わりに有限トレーニングセットで訓練されているため、オーバーフィッティング(Cremer et al., 2018)の影響を受けやすい。一方、拡散モデルはエンコーダを固定することでこの問題を回避する。これにより、それらの表現は解釈できないが、トレーニングを単純化し、$p_{\mathrm{data}}(\mathbf{x})$の正確かつ連続的な近似を可能にする。本稿では,VAEにおけるオーバーフィッティングエンコーダを,事前学習した拡散モデルからのサンプルのトレーニングにより効果的に緩和できることを示す。これらの結果は、最近の研究結果(Alemohammad et al., 2023; Shumailov et al., 2023)が、他の生成モデルによって生成されたデータに基づいてモデルが訓練された場合、生成性能の低下を観測している。提案手法を用いて学習したVAEの一般化性能,償却ギャップ,ロバスト性を3つの異なるデータセットで解析した。通常のトレーニング法と従来のデータ拡張法と比較して,すべての測定値が改善され,拡散モデルから得られたサンプルの量で十分な値が得られることが判明した。

Variational autoencoders (VAEs) are popular models for representation learning but their encoders are susceptible to overfitting (Cremer et al., 2018) because they are trained on a finite training set instead of the true (continuous) data distribution $p_{\mathrm{data}}(\mathbf{x})$. Diffusion models, on the other hand, avoid this issue by keeping the encoder fixed. This makes their representations less interpretable, but it simplifies training, enabling accurate and continuous approximations of $p_{\mathrm{data}}(\mathbf{x})$. In this paper, we show that overfitting encoders in VAEs can be effectively mitigated by training on samples from a pre-trained diffusion model. These results are somewhat unexpected as recent findings (Alemohammad et al., 2023; Shumailov et al., 2023) observe a decay in generative performance when models are trained on data generated by another generative model. We analyze generalization performance, amortization gap, and robustness of VAEs trained with our proposed method on three different data sets. We find improvements in all metrics compared to both normal training and conventional data augmentation methods, and we show that a modest amount of samples from the diffusion model suffices to obtain these gains.

翻訳日:2023-11-01 19:31:30 公開日:2023-10-30

# インストラクションチューニングのダイナミクス:大規模言語モデルのそれぞれの能力には独自の成長ペースがある

Dynamics of Instruction Tuning: Each Ability of Large Language Models Has Its Own Growth Pace ( http://arxiv.org/abs/2310.19651v1 )

ライセンス: Link先を確認

Chiyu Song, Zhanchao Zhou, Jianhao Yan, Yuejiao Fei, Zhenzhong Lan, Yue Zhang

(参考訳) 命令チューニングは、大規模言語モデル(llm)の汎用知性を引き出すための急成長する手法である。しかし、命令データの作成はいまだにヒューリスティックであり、既存のデータセット間の品質と分散に大きな変化をもたらす。これらのデータセットから得られた実験的な結論も矛盾しておらず、一部の研究では命令数のスケーリングの重要性を強調している。データ構築ガイドラインをより深く理解するために、私たちは、全体的なモデルパフォーマンスから、クリエイティブな記述、コード生成、論理的推論といった基礎的な能力の成長まで、焦点を絞ります。数百のモデルチェックポイント (7b〜33b) を用いて,40k以上のヒューマンキュレート命令データからなる新しいコレクション上で,データボリューム,パラメータサイズ,データ構築手法が様々な能力開発に与える影響を体系的に検討した。提案したデータセットは、厳密に品質制御され、10の異なるLCM能力に分類される。私たちの研究は3つの主要な発見を明らかにした。 (i) モデル全体の性能に直接影響を及ぼすデータ量とパラメータスケールにもかかわらず、その増加に反応する能力があり、限られたデータを使って効果的に訓練できる能力がある一方で、これらの変化に強く抵抗する能力もある。 (II)GPT-4の合成データより効率が良く、容積増加とともにモデル性能を常に向上させることができるが、合成データでは達成できない。 (iii)命令データは、最初の2つの観察を反映するドメイン外データに対する評価結果とともに、強力な相互可能性の一般化をもたらす。さらに、これらの結果がより効率的なデータ構築を導出し、公開ベンチマークの性能改善につながることを実証する。

Instruction tuning is a burgeoning method to elicit the general intelligence of Large Language Models (LLMs). However, the creation of instruction data is still largely heuristic, leading to significant variation in quality and distribution across existing datasets. Experimental conclusions drawn from these datasets are also inconsistent, with some studies emphasizing the importance of scaling instruction numbers, while others argue that a limited number of samples suffice. To better understand data construction guidelines, we deepen our focus from the overall model performance to the growth of each underlying ability, such as creative writing, code generation, and logical reasoning. We systematically investigate the effects of data volume, parameter size, and data construction methods on the development of various abilities, using hundreds of model checkpoints (7B to 33B) fully instruction-tuned on a new collection of over 40k human-curated instruction data. This proposed dataset is stringently quality-controlled and categorized into ten distinct LLM abilities. Our study reveals three primary findings: (i) Despite data volume and parameter scale directly impacting models' overall performance, some abilities are more responsive to their increases and can be effectively trained using limited data, while some are highly resistant to these changes. (ii) Human-curated data strongly outperforms synthetic data from GPT-4 in efficiency and can constantly enhance model performance with volume increases, which is unachievable with synthetic data. (iii) Instruction data brings powerful cross-ability generalization, with evaluation results on out-of-domain data mirroring the first two observations. Furthermore, we demonstrate how these findings can guide more efficient data construction, leading to practical performance improvements on public benchmarks.

翻訳日:2023-11-01 19:30:59 公開日:2023-10-30

# keygen2vec: 質問応答におけるマルチラベルキーワード生成による学習文書埋め込み

KeyGen2Vec: Learning Document Embedding via Multi-label Keyword Generation in Question-Answering ( http://arxiv.org/abs/2310.19650v1 )

ライセンス: Link先を確認

Iftitahu Ni'mah and Samaneh Khoshrou and Vlado Menkovski and Mykola Pechenizkiy

(参考訳) 文書ソース間の構造的類似性を保ちながら高次元埋め込み空間に文書を表現することは、テキスト表現学習における多くの研究の最終的な目標である。しかし、現在の埋め込みモデルは、主にラベル管理の可用性に依存して、その結果の埋め込みの表現力を高めている。対照的に、教師なしの埋め込みは安価であるが、ターゲットコーパスの暗黙的な構造、特にプリトレーニングソースとの異なる分布から来るサンプルをキャプチャできないことが多い。本研究の目的は,Sequence-to-Sequence (Seq2Seq)テキストジェネレータによる文書埋め込みを学習することで,ラベル管理への依存を緩和することである。具体的には,コミュニティベース質問応答(cqa)において,キーフレーズ生成タスクをマルチラベルキーワード生成に再構成する。実験の結果、KeyGen2VecはPurity, Normalized Mutual Information (NMI)、F1-Scoreのメトリクスに基づいて最大14.7%のマルチラベルキーワード分類器よりも優れていることがわかった。興味深いことに、一般的にラベル管理を通じて埋め込みを学ぶことの絶対的な利点は評価データセット間で非常に肯定的であるが、KeyGen2VecはYahoo! cQAのトピックラベル管理と多くの潜在トピックラベルを利用する分類器と競合している。

Representing documents in a high-dimensional embedding space while preserving the structural similarity between document sources has been an ultimate goal for many works on text representation learning. Current embedding models, however, mainly rely on the availability of label supervision to increase the expressiveness of the resulting embeddings. In contrast, unsupervised embeddings are cheap, but they often cannot capture implicit structure in the target corpus, particularly for samples that come from a different distribution than the pretraining source. Our study aims to loosen the dependency on label supervision by learning document embeddings via a Sequence-to-Sequence (Seq2Seq) text generator. Specifically, we reformulate the keyphrase generation task as multi-label keyword generation in community-based Question Answering (cQA). Our empirical results show that KeyGen2Vec is in general superior to a multi-label keyword classifier by up to 14.7% based on Purity, Normalized Mutual Information (NMI), and F1-Score metrics. Interestingly, although in general the absolute advantage of learning embeddings through label supervision is highly positive across evaluation datasets, KeyGen2Vec is shown to be competitive with a classifier that exploits topic label supervision in Yahoo! cQA with a larger number of latent topic labels.
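
Of the three metrics named in the abstract, Purity is the simplest to state: each cluster is credited with its most frequent gold label. A minimal sketch follows, assuming the embeddings have already been clustered by some method (the abstract does not say which) and that each document has a single gold label for this calculation; the function name is illustrative.

```python
from collections import Counter


def purity(cluster_ids, labels):
    """Share of points that carry the majority label of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    # Credit each cluster with the count of its most frequent label.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(labels)
```

Purity rises trivially as the number of clusters grows, which is one reason it is usually reported together with NMI rather than alone.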

翻訳日:2023-11-01 19:30:29 公開日:2023-10-30

# 高速スワップ後悔最小化と近似相関平衡への応用

Fast swap regret minimization and applications to approximate correlated equilibria ( http://arxiv.org/abs/2310.19647v1 )

ライセンス: Link先を確認

Binghui Peng and Aviad Rubinstein

(参考訳) 任意の定数 $\varepsilon>0$ に対して、$t = \mathsf{polylog}(n)$ round で$\varepsilon t$-swap の後悔を得るという、単純で計算効率の良いアルゴリズムを与える。我々のアルゴリズムは$\varepsilon$に指数関数的依存を持つが、我々は一致する新しい下界を証明する。 Our algorithm for swap regret implies faster convergence to $\varepsilon$-Correlated Equilibrium ($\varepsilon$-CE) in several regimes: For normal form two-player games with $n$ actions, it implies the first uncoupled dynamics that converges to the set of $\varepsilon$-CE in polylogarithmic rounds; a $\mathsf{polylog}(n)$-bit communication protocol for $\varepsilon$-CE in two-player games (resolving an open problem mentioned by [Babichenko-Rubinstein'2017, Goos-Rubinstein'2018, Ganor-CS'2018]); and an $\tilde{O}(n)$-query algorithm for $\varepsilon$-CE (resolving an open problem of [Babichenko'2020] and obtaining the first separation between $\varepsilon$-CE and $\varepsilon$-Nash equilibrium in the query complexity model). 広義のゲームの場合、我々のアルゴリズムはPTAS for $\mathit{normal}$ $\mathit{form}$ $\mathit{correlated}$ $\mathit{equilibria}$, 計算的に難解であると予想される(例: [Stengel-Forges'08, Fujii'23])。

We give a simple and computationally efficient algorithm that, for any constant $\varepsilon>0$, obtains $\varepsilon T$-swap regret within only $T = \mathsf{polylog}(n)$ rounds; this is an exponential improvement compared to the super-linear number of rounds required by the state-of-the-art algorithm, and resolves the main open problem of [Blum and Mansour 2007]. Our algorithm has an exponential dependence on $\varepsilon$, but we prove a new, matching lower bound. Our algorithm for swap regret implies faster convergence to $\varepsilon$-Correlated Equilibrium ($\varepsilon$-CE) in several regimes: For normal form two-player games with $n$ actions, it implies the first uncoupled dynamics that converges to the set of $\varepsilon$-CE in polylogarithmic rounds; a $\mathsf{polylog}(n)$-bit communication protocol for $\varepsilon$-CE in two-player games (resolving an open problem mentioned by [Babichenko-Rubinstein'2017, Goos-Rubinstein'2018, Ganor-CS'2018]); and an $\tilde{O}(n)$-query algorithm for $\varepsilon$-CE (resolving an open problem of [Babichenko'2020] and obtaining the first separation between $\varepsilon$-CE and $\varepsilon$-Nash equilibrium in the query complexity model). For extensive-form games, our algorithm implies a PTAS for $\mathit{normal}$ $\mathit{form}$ $\mathit{correlated}$ $\mathit{equilibria}$, a solution concept often conjectured to be computationally intractable (e.g. [Stengel-Forges'08, Fujii'23]).
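
To make the bounded quantity concrete, the sketch below computes swap regret directly from its definition for a sequence of mixed strategies and loss vectors. It is not the authors' algorithm; the function name `swap_regret` and the toy inputs are assumptions chosen for illustration only.

```python
from typing import Sequence


def swap_regret(plays: Sequence[Sequence[float]],
                losses: Sequence[Sequence[float]]) -> float:
    """Swap regret of mixed strategies `plays` against loss vectors `losses`.

    plays[t][i] is the probability of action i in round t and losses[t][i]
    is the loss of action i in round t. The best swap function can be chosen
    independently for each action, so the maximum over all swap functions
    decomposes into one minimisation per action.
    """
    n = len(plays[0])
    total = 0.0
    for i in range(n):
        # Loss actually incurred on the probability mass placed on action i.
        incurred = sum(p[i] * l[i] for p, l in zip(plays, losses))
        # Loss if that mass had always been moved to the best fixed action j.
        best = min(
            sum(p[i] * l[j] for p, l in zip(plays, losses))
            for j in range(n)
        )
        total += incurred - best
    return total


# Toy example (assumed data): the learner always picks the costly action.
bad_plays = [[1, 0], [0, 1]]
toy_losses = [[1, 0], [0, 1]]
regret = swap_regret(bad_plays, toy_losses)
```

In the abstract's terms, an algorithm achieves $\varepsilon T$-swap regret when this value is at most $\varepsilon T$ after $T$ rounds.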

翻訳日:2023-11-01 19:30:04 公開日:2023-10-30

# distnet2d: 長距離時間情報を活用した効率的なセグメンテーションと追跡

DistNet2D: Leveraging long-range temporal information for efficient segmentation and tracking ( http://arxiv.org/abs/2310.19641v1 )

ライセンス: Link先を確認

Jean Ollion, Martin Maliet, Caroline Giuglaris, Elise Vacher and Maxime Deforet

(参考訳) videomicroscopyから長いトラックや系統を抽出するには、非常に低いエラー率が必要であり、高密度または変形した細胞の複雑なデータセットでは困難である。この課題を克服するには、時間的コンテキストを活用することが重要だ。本研究では2次元セルセグメンテーションと追跡のための新しいディープニューラルネットワーク(DNN)アーキテクチャであるDistNet2Dを提案する。 DistNet2Dは入力時に7つのフレームを考慮し、映画全体の情報を利用してセグメンテーションエラーを修正する後処理手順を使用する。 distnet2dは、密集した細菌細胞と真核生物細胞を含む2つの実験データセットの最近の2つの方法よりも優れている。 2Dデータ可視化、キュレーション、トレーニングのためのImageJベースのグラフィカルユーザインタフェースに統合されている。最後に, distnet2dの性能を, 細菌および真核生物の細胞において, 細胞の大きさと形状と輸送特性との相関性について実証した。

Extracting long tracks and lineages from videomicroscopy requires an extremely low error rate, which is challenging on complex datasets of dense or deforming cells. Leveraging temporal context is key to overcoming this challenge. We propose DistNet2D, a new deep neural network (DNN) architecture for 2D cell segmentation and tracking that leverages both mid- and long-term temporal context. DistNet2D takes seven frames as input and uses a post-processing procedure that exploits information from the entire movie to correct segmentation errors. DistNet2D outperforms two recent methods on two experimental datasets, one containing densely packed bacterial cells and the other containing eukaryotic cells. It has been integrated into an ImageJ-based graphical user interface for 2D data visualization, curation, and training. Finally, we demonstrate the performance of DistNet2D on correlating the size and shape of cells with their transport properties over large statistics, for both bacterial and eukaryotic cells.

翻訳日:2023-11-01 19:29:28 公開日:2023-10-30

# 顔の表情の不均衡認識のための余分な知識をマイニングする「Leave No Stone Unturned」

Leave No Stone Unturned: Mine Extra Knowledge for Imbalanced Facial Expression Recognition ( http://arxiv.org/abs/2310.19636v1 )

ライセンス: Link先を確認

Yuhang Zhang, Yaqi Li, Lixiong Qin, Xuannan Liu, Weihong Deng

(参考訳) 顔の表情データは大きな不均衡を特徴とし、ほとんどの収集されたデータは幸福あるいは中立な表現を示し、恐怖や嫌悪の事例は少ない。この不均衡は、表情認識(FER)モデルに課題をもたらし、様々な人間の感情状態を完全に理解する能力を妨げる。既存のFER法は通常、高度に不均衡なテストセットに対して全体的な精度を報告するが、全ての式クラスの平均精度は低い。本稿では,不均衡なFER問題に対処することを目的とする。既存の手法は主に、マイナークラスのサンプルのみからマイナークラスの知識を学ぶことに焦点を当てている。しかし,本研究では,マイノリティークラスとマイノリティークラスの両方のサンプルから,マイノリティークラスに関連する余分な知識を抽出する手法を提案する。我々のモチベーションは、FERが分布学習タスクに似ているという信念から来ており、サンプルには複数のクラスに関する情報が含まれている可能性がある。例えば、メジャークラスのサプライズからのサンプルには、マイナークラスの恐れの便利な機能も含まれているかもしれない。そこで本研究では,モデルの正規化に再均衡したアテンションマップを活用する手法を提案し,すべてのトレーニングサンプルからマイナークラスに関する変換不変情報を抽出する。また,不均衡なトレーニングデータのラベル分布に関する余分な情報を利用して,モデルがよりマイナーなクラスに注意を払うように誘導し,クロスエントロピー損失を規制するために,再バランススムースラベルを導入する。異なるデータセットとバックボーンの広範な実験により、2つの提案されたモジュールが協力してモデルを正規化し、不均衡なFERタスクの下で最先端のパフォーマンスを達成することが示されている。コードはhttps://github.com/zyh-uaiaaaaで入手できる。

Facial expression data is characterized by a significant imbalance, with most collected data showing happy or neutral expressions and fewer instances of fear or disgust. This imbalance poses challenges to facial expression recognition (FER) models, hindering their ability to fully understand various human emotional states. Existing FER methods typically report overall accuracy on highly imbalanced test sets but exhibit low performance in terms of the mean accuracy across all expression classes. In this paper, our aim is to address the imbalanced FER problem. Existing methods primarily focus on learning knowledge of minor classes solely from minor-class samples. However, we propose a novel approach to extract extra knowledge related to the minor classes from both major and minor class samples. Our motivation stems from the belief that FER resembles a distribution learning task, wherein a sample may contain information about multiple classes. For instance, a sample from the major class surprise might also contain useful features of the minor class fear. Inspired by that, we propose a novel method that leverages re-balanced attention maps to regularize the model, enabling it to extract transformation invariant information about the minor classes from all training samples. Additionally, we introduce re-balanced smooth labels to regulate the cross-entropy loss, guiding the model to pay more attention to the minor classes by utilizing the extra information regarding the label distribution of the imbalanced training data. Extensive experiments on different datasets and backbones show that the two proposed modules work together to regularize the model and achieve state-of-the-art performance under the imbalanced FER task. Code is available at https://github.com/zyh-uaiaaaa.

翻訳日:2023-11-01 19:29:09 公開日:2023-10-30

# 臨床精度・解釈性モデルのための双方向キャプション

Bidirectional Captioning for Clinically Accurate and Interpretable Models ( http://arxiv.org/abs/2310.19635v1 )

ライセンス: Link先を確認

Keegan Quigley, Miriam Cha, Josh Barua, Geeticka Chauhan, Seth Berkowitz, Steven Horng, Polina Golland

(参考訳) 視覚言語事前学習は、下流コンピュータビジョンタスクに効率的に転送する高品質な視覚エンコーダを生成することが示されている。生成言語モデルが広く注目されている一方で、画像キャプションは、特に医学的画像分析において、対照的な学習を好むクロスモーダルプリトレーニングの形式として見過ごされてきた。本稿では,放射線学レポートの双方向キャプションを事前学習の一形態として実験し,学習した埋め込みの質と有用性を比較検討した。我々は、放射線領域にradtexと呼ばれるcnnエンコーダ、トランスフォーマデコーダアーキテクチャを最適化する。その結果,コントラスト付き事前学習と競合するプリトレーニング型視覚エンコーダ(CheXpert competition multi-label AUC 89.4%)の字幕化だけでなく,臨床関連報告(CheXpert labeler を用いたマクロF1スコア0.349)を生成でき,対象とする対話的出力のプロンプトに応答できることがわかった。

Vision-language pretraining has been shown to produce high-quality visual encoders which transfer efficiently to downstream computer vision tasks. While generative language models have gained widespread attention, image captioning has thus far been mostly overlooked as a form of cross-modal pretraining in favor of contrastive learning, especially in medical image analysis. In this paper, we experiment with bidirectional captioning of radiology reports as a form of pretraining and compare the quality and utility of learned embeddings with those from contrastive pretraining methods. We optimize a CNN encoder, transformer decoder architecture named RadTex for the radiology domain. Results show that not only does captioning pretraining yield visual encoders that are competitive with contrastive pretraining (CheXpert competition multi-label AUC of 89.4%), but also that our transformer decoder is capable of generating clinically relevant reports (captioning macro-F1 score of 0.349 using CheXpert labeler) and responding to prompts with targeted, interactive outputs.

翻訳日:2023-11-01 19:28:41 公開日:2023-10-30

# デブリ・破壊・アーティファクト粒子を用いたtem画像からの無傷アデノウイルス自動検出のための畳み込みニューラルネットワーク

Convolutional Neural Networks for Automatic Detection of Intact Adenovirus from TEM Imaging with Debris, Broken and Artefacts Particles ( http://arxiv.org/abs/2310.19630v1 )

ライセンス: Link先を確認

Olivier Rukundo, Andrea Behanova, Riccardo De Feo, Seppo Ronkko, Joni Oja, Jussi Tohka

(参考訳) 製造および製造過程における医薬品の一次粒子および純度プロファイルの定期的なモニタリングは、製造者が製品の変動や汚染を避けるために不可欠である。透過電子顕微鏡(TEM)イメージングは、ウイルスベースの遺伝子治療ベクター製品と中間体において、変化が粒子の特性と純度に与える影響を予測するのに役立つ。無傷粒子は有効成分を特徴付けることができるため、粉体、破砕物、アーティファクト粒子を混合した非インタクトウイルス背景に対する無傷アデノウイルスの検出を自動化することが有用である。このような粒子の存在下では、無傷アデノウイルスの検出がより困難になる。この課題を克服するため,我々は,アデノウイルスのセミオートアノテーションとセグメンテーションのためのソフトウェアツールと,temイメージングシステムにおける無傷アデノウイルスの自動セグメンテーションと検出のためのソフトウェアツールを開発した。開発した半自動ツールは従来の画像解析手法を活用し,畳み込みニューラルネットワークと画像解析技術に基づいて自動ツールを構築した。定量・定性評価の結果, 真正検出率は偽陽性, 陰性で, アデノウイルスは本物のデブリや破断性アデノウイルス, 染色性アーティファクトと誤認することなく良好な検出率を示した。

Regular monitoring of the primary particles and purity profiles of a drug product during development and manufacturing processes is essential for manufacturers to avoid product variability and contamination. Transmission electron microscopy (TEM) imaging helps manufacturers predict how changes affect particle characteristics and purity for virus-based gene therapy vector products and intermediates. Since intact particles can characterize efficacious products, it is beneficial to automate the detection of intact adenovirus against a non-intact-viral background mixed with debris, broken particles, and artefacts. The presence of such particles makes detecting intact adenoviruses more challenging. To overcome this challenge, we developed a software tool for semi-automatic annotation and segmentation of adenoviruses and a software tool for automatic segmentation and detection of intact adenoviruses in TEM imaging systems. The semi-automatic tool exploits conventional image analysis techniques, while the automatic tool is built on convolutional neural networks and image analysis techniques. Our quantitative and qualitative evaluations showed outstanding true positive detection rates compared to false positive and false negative rates, with adenoviruses detected without being mistaken for real debris, broken adenoviruses, or staining artefacts.

翻訳日:2023-11-01 19:28:18 公開日:2023-10-30

# 重なり合うスパース画像の深層学習に基づく分解:ニュートリノ相互作用の頂点への応用

Deep-learning-based decomposition of overlapping-sparse images: application at the vertex of neutrino interactions ( http://arxiv.org/abs/2310.19695v1 )

ライセンス: Link先を確認

Saúl Alonso-Monsalve, Davide Sgalaberna, Xingyu Zhao, Adrien Molines, Clark McGrew, André Rubbia

(参考訳) 画像分解は様々なコンピュータビジョンタスクにおいて重要な役割を担い、視覚的コンテンツの基本的なレベルでの分析と操作を可能にする。重なり合う画像は、複数のオブジェクトやシーンが部分的にお互いを遮っているときに起こり、分解アルゴリズムに特有の課題をもたらす。このタスクはスパース画像を扱う際に強化され、意味のある情報の不足がコンポーネントの正確な抽出を複雑にする。本稿では,多次元重なりスパース画像内の個々の物体を正確に抽出する深層学習の力を利用する解と,撮像検出器から得られた重なり粒子の分解を伴う高エネルギー物理学における直接的応用について述べる。ニュートリノ相互作用の頂点における独立粒子を同定し、測定し、複数の荷電粒子が重複する検出器像を観測することを期待する。深層学習によって頂点での検出器活動の像を分解することで、特定された低運動量粒子の運動パラメータを推定し、ニュートリノ現象の再構成されたエネルギー分解能を高めることができる。また, 上記の手法と完全微分可能生成モデルを組み合わせることで, さらに画像分解を改善し, その結果, 測定パラメータの分解能を向上し, 前例のない結果を得る。この改良はニュートリノのフレーバー振動を管理するパラメータを正確に測定し、物質と反物質の間の対称性を探索するために重要である。

Image decomposition plays a crucial role in various computer vision tasks, enabling the analysis and manipulation of visual content at a fundamental level. Overlapping images, which occur when multiple objects or scenes partially occlude each other, pose unique challenges for decomposition algorithms. The task intensifies when working with sparse images, where the scarcity of meaningful information complicates the precise extraction of components. This paper presents a solution that leverages the power of deep learning to accurately extract individual objects within multi-dimensional overlapping-sparse images, with a direct application in high-energy physics with decomposition of overlaid elementary particles obtained from imaging detectors. In particular, the proposed approach tackles a highly complex yet unsolved problem: identifying and measuring independent particles at the vertex of neutrino interactions, where one expects to observe detector images with multiple indiscernible overlapping charged particles. By decomposing the image of the detector activity at the vertex through deep learning, it is possible to infer the kinematic parameters of the identified low-momentum particles - which otherwise would remain neglected - and enhance the reconstructed energy resolution of the neutrino event. We also present an additional step - that can be tuned directly on detector data - combining the above method with a fully-differentiable generative model to improve the image decomposition further and, consequently, the resolution of the measured parameters, achieving unprecedented results. This improvement is crucial for precisely measuring the parameters that govern neutrino flavour oscillations and searching for asymmetries between matter and antimatter.

翻訳日:2023-11-01 19:21:18 公開日:2023-10-30

# 長期時空間モデリングのための畳み込み状態空間モデル

Convolutional State Space Models for Long-Range Spatiotemporal Modeling ( http://arxiv.org/abs/2310.19694v1 )

ライセンス: Link先を確認

Jimmy T.H. Smith, Shalini De Mello, Jan Kautz, Scott W. Linderman, Wonmin Byeon

(参考訳) 複雑な空間相関と長距離時間依存を同時にモデル化する必要があるため、長時空間列を効果的にモデル化することは困難である。 convlstmsは、再帰的なニューラルネットワークでテンソル値の状態を更新することでこれに対処するが、それらのシーケンシャルな計算によってトレーニングが遅くなる。対照的に、トランスフォーマーは時空間列全体を並列に処理し、トークンに圧縮することができる。しかしながら、注意のコストは2倍にスケールし、拡張性はより長いシーケンスに制限される。本稿では、先行手法の課題に対処し、ConvLSTMのテンソルモデリングのアイデアとS4やS5のような状態空間メソッドのロングシーケンスモデリングのアプローチを組み合わせた畳み込み状態空間モデル(ConvSSM)を導入する。まず,並列スキャンを畳み込み再帰に適用し,下位並列化と高速な自己回帰生成を実現する方法を示す。次に,長距離依存関係をモデル化するためのパラメータ化と初期化戦略の動機となるconvssmとssmのダイナミクスの等価性を確立する。その結果、ConvS5は、長距離時空間モデリングのための効率的なConvSSM変種である。 ConvS5 は Transformers と ConvLSTM を長距離移動MNIST 実験で上回り、ConvLSTM より3倍速く、Transformers より400倍速くサンプルを生成する。さらに、ConvS5はDMLab、Minecraft、Habitatの予測ベンチマークに挑戦する最先端のメソッドのパフォーマンスと一致し、長い時空間シーケンスをモデリングするための新しい方向を可能にする。

Effectively modeling long spatiotemporal sequences is challenging due to the need to model complex spatial correlations and long-range temporal dependencies simultaneously. ConvLSTMs attempt to address this by updating tensor-valued states with recurrent neural networks, but their sequential computation makes them slow to train. In contrast, Transformers can process an entire spatiotemporal sequence, compressed into tokens, in parallel. However, the cost of attention scales quadratically in length, limiting their scalability to longer sequences. Here, we address the challenges of prior methods and introduce convolutional state space models (ConvSSM) that combine the tensor modeling ideas of ConvLSTM with the long sequence modeling approaches of state space methods such as S4 and S5. First, we demonstrate how parallel scans can be applied to convolutional recurrences to achieve subquadratic parallelization and fast autoregressive generation. We then establish an equivalence between the dynamics of ConvSSMs and SSMs, which motivates parameterization and initialization strategies for modeling long-range dependencies. The result is ConvS5, an efficient ConvSSM variant for long-range spatiotemporal modeling. ConvS5 significantly outperforms Transformers and ConvLSTM on a long horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers. In addition, ConvS5 matches or exceeds the performance of state-of-the-art methods on challenging DMLab, Minecraft and Habitat prediction benchmarks and enables new directions for modeling long spatiotemporal sequences.
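
The parallel-scan idea mentioned above can be illustrated on a scalar linear recurrence. This is a simplified sketch, not ConvS5 itself: in a ConvSSM the products would be convolutions over tensor-valued states, whereas here `a` and `b` are plain floats, and all function names are assumptions.

```python
from typing import List, Tuple

Pair = Tuple[float, float]


def combine(left: Pair, right: Pair) -> Pair:
    """Compose two affine maps x -> a*x + b, applying `left` first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def scan_recurrence(a: List[float], b: List[float]) -> List[float]:
    """Inclusive scan for x_t = a_t * x_{t-1} + b_t with zero initial state."""
    elems: List[Pair] = list(zip(a, b))
    step = 1
    while step < len(elems):
        # Within one pass every position depends only on the previous pass,
        # so all positions could be updated in parallel; only about
        # log2(T) passes are needed because `combine` is associative.
        elems = [
            combine(elems[t - step], elems[t]) if t >= step else elems[t]
            for t in range(len(elems))
        ]
        step *= 2
    return [offset for _, offset in elems]


def sequential_recurrence(a: List[float], b: List[float]) -> List[float]:
    """Reference implementation that walks the sequence one step at a time."""
    x, out = 0.0, []
    for a_t, b_t in zip(a, b):
        x = a_t * x + b_t
        out.append(x)
    return out
```

Both functions return the same states; the scan version simply reorganises the work so that it no longer has to proceed strictly step by step.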

翻訳日:2023-11-01 19:20:52 公開日:2023-10-30

# 量子ドットセルオートマタを用いた非同期シーケンス回路における静的ハザードの除去

Elimination of Static Hazards in Asynchronous Sequential Circuits using Quantum dot Cellular Automata ( http://arxiv.org/abs/2310.19692v1 )

ライセンス: Link先を確認

Angshuman Khan, Chiradeep Mukherjee, Ankan Kumar Chakraborty, Ratna Chakrabarty, Debashis De

(参考訳) 新興技術には他にはないが、Quantum-dot Cellular Automata(量子ドットセルオートマタ)では、セル内の電子間の静電気的相互作用を扱う、高速、低電力動作、高パッケージ密度を見つけることができる。文献調査はqca回路のハザードフリー設計に欠けている。ハザードは曖昧で予測不能なアウトプットを生成し、回避できる。本研究は, リスクのない非同期シーケンシャル回路と, リンクエネルギーの両面を比較し, より優れた回路を提案する。回路シミュレーションはQCADesignerツールで検証されている。

Among emerging technologies, Quantum-dot Cellular Automata (QCA), which relies on electrostatic interaction between electrons within a cell, uniquely offers high speed, low-power operation, and high packaging density. The literature lacks hazard-free designs of QCA circuits. Hazards create ambiguous and unpredictable outputs, which can be avoided. This work considers asynchronous sequential circuits both with and without hazards; the two are compared in terms of kink energy, and the better one is proposed. The circuit simulation has been verified in the QCADesigner tool.
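
For readers unfamiliar with static hazards, the following gate-level sketch shows the textbook remedy of adding a redundant consensus term. It does not model QCA cells or kink energy; the example function f = a.b + a'.c and all names are assumptions chosen for illustration.

```python
from itertools import product


def f_original(a: bool, b: bool, c: bool) -> bool:
    # Two product terms; a static-1 hazard exists when b = c = 1 and a toggles,
    # because the output is handed over from one term to the other.
    return (a and b) or ((not a) and c)


def f_hazard_free(a: bool, b: bool, c: bool) -> bool:
    # Same Boolean function plus the redundant consensus term b AND c.
    return (a and b) or ((not a) and c) or (b and c)


original_terms = {
    "a.b": lambda a, b, c: a and b,
    "a'.c": lambda a, b, c: (not a) and c,
}
covered_terms = {**original_terms, "b.c": lambda a, b, c: b and c}


def terms_held(terms, before, after):
    """Names of product terms that are 1 both before and after an input change."""
    return [name for name, term in terms.items()
            if term(*before) and term(*after)]


# The extra term never changes the logic function ...
equivalent = all(
    f_original(*v) == f_hazard_free(*v)
    for v in product([False, True], repeat=3)
)
# ... but it holds the output at 1 while input a switches with b = c = 1.
held_without = terms_held(original_terms, (True, True, True), (False, True, True))
held_with = terms_held(covered_terms, (True, True, True), (False, True, True))
```

When no single product term stays at 1 across the transition, unequal path delays can produce a momentary glitch; the consensus term removes that possibility at the cost of extra logic.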

翻訳日:2023-11-01 19:20:06 公開日:2023-10-30

# 因果関係は対実フェアネスとロバスト予測とグループフェアネスを結びつける

Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness ( http://arxiv.org/abs/2310.19691v1 )

ライセンス: Link先を確認

Jacy Reese Anthis and Victor Veitch

(参考訳) 反事実的公平性は、異なる人種や性別のような異なる保護されたクラスがある場合、aiや他のアルゴリズムシステムによって同じ方法で分類されるように要求される。これは米国法体系に反映される直感的な基準であるが、反事実は現実世界のデータでは直接観察できないため、その使用は制限されている。一方、グループフェアネスの指標(例えば、人口比率や等化確率)は直感的ではないが、より容易に観察できる。本稿では, 対実フェアネス, 頑健な予測, グループフェアネスのギャップを埋めるために, $\textit{causal context}$ を用いる。まず, 公平性と正確性の間には, 必ずしも根本的なトレードオフが存在するとは限らないことを示すことにより, 反事実的公正さを動機づける。第2に,データ生成過程の因果グラフと,グループフェアネスメトリクスが反事実フェアネスと等価である場合の対応関係を考案する。第3に,3つの共通フェアネスコンテキストにおいて,ラベル選択,予測者の選択が,それぞれ人口差パリティ,等化オッズ,キャリブレーションと等価であることを示す。対実フェアネスは、比較的単純なグループフェアネスの測定によってテストすることができる。

Counterfactual fairness requires that a person would have been classified in the same way by an AI or other algorithmic system if they had a different protected class, such as a different race or gender. This is an intuitive standard, as reflected in the U.S. legal system, but its use is limited because counterfactuals cannot be directly observed in real-world data. On the other hand, group fairness metrics (e.g., demographic parity or equalized odds) are less intuitive but more readily observed. In this paper, we use $\textit{causal context}$ to bridge the gaps between counterfactual fairness, robust prediction, and group fairness. First, we motivate counterfactual fairness by showing that there is not necessarily a fundamental trade-off between fairness and accuracy because, under plausible conditions, the counterfactually fair predictor is in fact accuracy-optimal in an unbiased target distribution. Second, we develop a correspondence between the causal graph of the data-generating process and which, if any, group fairness metrics are equivalent to counterfactual fairness. Third, we show that in three common fairness contexts (measurement error, selection on label, and selection on predictors), counterfactual fairness is equivalent to demographic parity, equalized odds, and calibration, respectively. Counterfactual fairness can sometimes be tested by measuring relatively simple group fairness metrics.
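
The group fairness metrics named above are simple to measure from observed predictions, which is what makes the claimed equivalences practically useful. The sketch below computes two of them for a binary classifier and a binary protected attribute; the function names and the toy data are assumptions, not taken from the paper.

```python
from typing import List


def positive_rate(preds: List[int]) -> float:
    """Fraction of predictions equal to 1."""
    return sum(preds) / len(preds)


def demographic_parity_gap(y_pred: List[int], group: List[int]) -> float:
    """|P(pred = 1 | group 0) - P(pred = 1 | group 1)|."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(positive_rate(g0) - positive_rate(g1))


def equalized_odds_gap(y_true: List[int], y_pred: List[int],
                       group: List[int]) -> float:
    """Largest between-group gap in positive rate, conditioned on the true label."""
    gaps = []
    for label in (0, 1):
        g0 = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == label]
        g1 = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == label]
        gaps.append(abs(positive_rate(g0) - positive_rate(g1)))
    return max(gaps)


# Toy data (assumed): four individuals per group, two per true label.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

dp_gap = demographic_parity_gap(y_pred, group)
eo_gap = equalized_odds_gap(y_true, y_pred, group)
```

A gap of zero means the metric is satisfied exactly; in practice one would compare the gap against a tolerance and account for sampling noise.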

翻訳日:2023-11-01 19:19:49 公開日:2023-10-30

# 変分境界による非逆分布アライメントの実現に向けて

Towards Practical Non-Adversarial Distribution Alignment via Variational Bounds ( http://arxiv.org/abs/2310.19690v1 )

ライセンス: Link先を確認

Ziyu Gong, Ben Usman, Han Zhao, David I. Inouye

(参考訳) 分布アライメントは、フェアネスとロバストネスの応用で不変表現を学ぶのに使うことができる。ほとんどの先行研究は対向アライメント法を頼っているが、結果として生じるミニマックス問題は不安定で最適化が難しい。非敵対的可能性に基づくアプローチは、モデルの可逆性を必要とするか、潜在する事前に制約を課すか、あるいはアライメントのための一般的なフレームワークが欠如している。これらの制約を克服するために,任意のモデルパイプラインに適用可能な非逆vaeに基づくアライメント手法を提案する。我々は、vaeのような目的を持つが異なる視点を持つアライメント上界(ノイズ境界を含む)のセットを開発する。提案手法を,理論上も経験的にも,従来のVAEベースのアライメント手法と比較する。最後に,新たなアライメント損失により,標準不変表現学習パイプラインの敵意損失を,元のアーキテクチャを変更せずに置き換えることができることを実証し,非逆アライメント手法の適用性を大幅に拡大することを示した。

Distribution alignment can be used to learn invariant representations with applications in fairness and robustness. Most prior works resort to adversarial alignment methods but the resulting minimax problems are unstable and challenging to optimize. Non-adversarial likelihood-based approaches either require model invertibility, impose constraints on the latent prior, or lack a generic framework for alignment. To overcome these limitations, we propose a non-adversarial VAE-based alignment method that can be applied to any model pipeline. We develop a set of alignment upper bounds (including a noisy bound) that have VAE-like objectives but with a different perspective. We carefully compare our method to prior VAE-based alignment approaches both theoretically and empirically. Finally, we demonstrate that our novel alignment losses can replace adversarial losses in standard invariant representation learning pipelines without modifying the original architectures, thereby significantly broadening the applicability of non-adversarial alignment methods.

翻訳日:2023-11-01 19:19:07 公開日:2023-10-30

# デジタル空間における感情分析:レビューの概観

Sentiment Analysis in Digital Spaces: An Overview of Reviews ( http://arxiv.org/abs/2310.19687v1 )

ライセンス: Link先を確認

Laura E.M. Ayravainen, Joanne Hinds, Brittany I. Davidson

(参考訳) 感性分析(SA)は一般的にデジタルテキストデータに適用され、意見や感情に対する洞察を明らかにする。多くの体系的レビューは既存の研究を要約しているが、しばしば有効性や科学的実践についての議論を見落としている。本稿では,2,275の初等研究を含む38の体系的レビューを合成したレビューの概要を紹介する。我々は,システムレビュー手法と報告基準の厳格さと品質を評価するための,目覚しい品質評価フレームワークを考案した。その結果,多様なアプリケーションや手法,報告の厳密さ,課題が時間の経過とともに明らかとなった。今後の研究や実践者がこれらの問題にどう対処できるかを議論し、その重要性を多くのアプリケーションで強調する。

Sentiment analysis (SA) is commonly applied to digital textual data, revealing insight into opinions and feelings. Many systematic reviews have summarized existing work, but often overlook discussions of validity and scientific practices. Here, we present an overview of reviews, synthesizing 38 systematic reviews, containing 2,275 primary studies. We devise a bespoke quality assessment framework designed to assess the rigor and quality of systematic review methodologies and reporting standards. Our findings show diverse applications and methods, limited reporting rigor, and challenges over time. We discuss how future research and practitioners can address these issues and highlight their importance across numerous applications.

翻訳日:2023-11-01 19:18:37 公開日:2023-10-30

# 入力再構成は回帰u-netモデルの不確かさを直接推定するために使用できるか? --頭頸部癌に対する陽子線量予測への応用

Can input reconstruction be used to directly estimate uncertainty of a regression U-Net model? -- Application to proton therapy dose prediction for head and neck cancer patients ( http://arxiv.org/abs/2310.19686v1 )

ライセンス: Link先を確認

Margerie Huet-Dastarac, Dan Nguyen, Steve Jiang, John Lee, Ana Barragan Montero

(参考訳) 深層学習モデルの不確実性を信頼性と効率的な方法で推定することは、文献で多くの異なる解が提案されているオープンな問題のままである。ほとんどの一般的な方法は、モンテカルロ・ドロップアウト (MCDO) やディープ・アンサンブル (DE) のようなベイズ近似に基づいているが、高い推論時間(つまり、複数の推論パスを必要とする)を持ち、アウト・オブ・ディストリビューション検出 (OOD) データ(すなわち、イン・ディストリビューション (ID) と OOD に類似した不確実性)では機能しない。医療アプリケーションのような安全上重要な環境では、誤った予測が患者の安全性を脅かす可能性があるため、正確な不確実性推定手法が重要である。本研究では,代替の直接不確実性推定法を提案し,回帰型u-netアーキテクチャに適用する。この方法は、入力を再構築するボトルネックから分岐を追加することで構成される。入力再構成誤差はモデルの不確かさのサロゲートとして使用できる。概念実証のために, 頭頸部癌患者の陽子治療線量予測に適用した。本手法の精度,時間ゲイン,OOD検出を本手法で解析し,一般的なMCDOやDEと比較した。入力再構成法ではDとMCDO(0.447と0.612の間)よりも予測誤差(0.620)の高いピアソン相関係数を示した。また,OOD(Zスコア34.05)の同定も容易である。回帰タスクと同時に不確実性を推定するので、時間や計算資源は少なくなります。

Estimating the uncertainty of deep learning models in a reliable and efficient way remains an open problem, for which many different solutions have been proposed in the literature. The most common methods are based on Bayesian approximations, like Monte Carlo dropout (MCDO) or Deep ensembling (DE), but they have a high inference time (i.e. they require multiple inference passes) and might not work for out-of-distribution (OOD) data detection (i.e. they yield similar uncertainty for in-distribution (ID) and OOD data). In safety-critical environments, like medical applications, accurate and fast uncertainty estimation methods that are able to detect OOD data are crucial, since wrong predictions can jeopardize patient safety. In this study, we present an alternative direct uncertainty estimation method and apply it to a regression U-Net architecture. The method consists of adding a branch from the bottleneck which reconstructs the input. The input reconstruction error can be used as a surrogate for the model uncertainty. As a proof of concept, our method is applied to proton therapy dose prediction in head and neck cancer patients. Accuracy, time gain, and OOD detection are analyzed for our method in this particular application and compared with the popular MCDO and DE. The input reconstruction method showed a higher Pearson correlation coefficient with the prediction error (0.620) than DE and MCDO (between 0.447 and 0.612). Moreover, our method allows an easier identification of OOD data (Z-score of 34.05). It estimates the uncertainty simultaneously with the regression task and therefore requires less time or computational resources.
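
The two evaluation quantities quoted above, a Pearson correlation between the uncertainty surrogate and the prediction error, and a Z-score for OOD cases, can be computed as in the following sketch. The function names are assumptions, and no data from the study is reproduced here.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def z_score(value: float, reference: Sequence[float]) -> float:
    """Distance of `value` from the reference mean, in reference standard deviations.

    Here `reference` would hold reconstruction errors of in-distribution cases
    and `value` the reconstruction error of a candidate OOD case.
    """
    return (value - mean(reference)) / pstdev(reference)
```

A high correlation indicates that the reconstruction error tracks the true prediction error, and a large Z-score indicates that a case lies far outside the in-distribution error range.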

翻訳日:2023-11-01 19:18:27 公開日:2023-10-30

# DGFN: 二重生成フローネットワーク

DGFN: Double Generative Flow Networks ( http://arxiv.org/abs/2310.19685v1 )

ライセンス: Link先を確認

Elaine Lau, Nikhil Vemgal, Doina Precup, Emmanuel Bengio

(参考訳) 深層学習は薬物発見の有効なツールとして現れており、予測モデルと生成モデルの両方に応用される可能性がある。 Generative Flow Networks (GFlowNets/GFNs) は、多種多様な候補を生成する能力、特に小さな分子生成タスクで認識される手法である。本稿では、DGFN(Double GFlowNets)を紹介する。強化学習とDouble Deep Q-Learningからインスピレーションを得て,これらのトラジェクトリをサンプリングするターゲットネットワークを導入し,メインネットワークをこれらのトラジェクトリで更新する。実験の結果、dgfnsはスパース報酬ドメインと高次元状態空間の探索を効果的に促進することが明らかとなった。

Deep learning is emerging as an effective tool in drug discovery, with potential applications in both predictive and generative models. Generative Flow Networks (GFlowNets/GFNs) are a recently introduced method recognized for the ability to generate diverse candidates, in particular in small molecule generation tasks. In this work, we introduce double GFlowNets (DGFNs). Drawing inspiration from reinforcement learning and Double Deep Q-Learning, we introduce a target network used to sample trajectories, while updating the main network with these sampled trajectories. Empirical results confirm that DGFNs effectively enhance exploration in sparse reward domains and high-dimensional state spaces, both challenging aspects of de-novo design in drug discovery.

翻訳日:2023-11-01 19:17:50 公開日:2023-10-30

# 深層学習を用いた入場誘導問題の密度推定

Density Estimation for Entry Guidance Problems using Deep Learning ( http://arxiv.org/abs/2310.19684v1 )

ライセンス: Link先を確認

Jens A. Rataczak, Davide Amato, Jay W. McMahon

(参考訳) 本研究は、惑星突入誘導問題に使用する大気密度プロファイルを推定するための深層学習手法を提案する。長期短期記憶(lstm)ニューラルネットワークは、エントリー車両で利用可能な測定値と、それが飛行する密度プロファイルの間のマッピングを学ぶために訓練される。測定には球面状態表現、直感加速度成分、表面圧力測定が含まれる。ネットワークのトレーニングデータは、最初に、指数密度モデルを用いた完全数値予測-補正ガイダンス(fnpeg)アルゴリズムを用いて、火星への突入ミッションのモンテカルロ分析を行い、真理密度プロファイルをmarsgramからサンプリングすることで生成される。 LSTMネットワークの予測をFNPEGアルゴリズムに統合するためのカリキュラム学習手法を開発した。訓練されたLSTMは、車両が飛行する密度プロファイルを予測し、既に飛行している密度プロファイルを再構築する。 FNPEGアルゴリズムの性能は指数モデル,1次フェードメモリフィルタを付加した指数モデル,LSTMネットワークの3つの異なる密度推定手法で評価される。その結果、LSTMモデルを用いることで、ノイズとノイズの両測定を考慮に入れた場合、他の2つの手法よりも優れた終端精度が得られることがわかった。

This work presents a deep-learning approach to estimate atmospheric density profiles for use in planetary entry guidance problems. A long short-term memory (LSTM) neural network is trained to learn the mapping between measurements available onboard an entry vehicle and the density profile through which it is flying. Measurements include the spherical state representation, Cartesian sensed acceleration components, and a surface-pressure measurement. Training data for the network is initially generated by performing a Monte Carlo analysis of an entry mission at Mars using the fully numerical predictor-corrector guidance (FNPEG) algorithm that utilizes an exponential density model, while the truth density profiles are sampled from MarsGRAM. A curriculum learning procedure is developed to refine the LSTM network's predictions for integration within the FNPEG algorithm. The trained LSTM is capable of both predicting the density profile through which the vehicle will fly and reconstructing the density profile through which it has already flown. The performance of the FNPEG algorithm is assessed for three different density estimation techniques: an exponential model, an exponential model augmented with a first-order fading-memory filter, and the LSTM network. Results demonstrate that using the LSTM model results in superior terminal accuracy compared to the other two techniques when considering both noisy and noiseless measurements.
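
Two of the three baseline estimators compared above can be sketched briefly: an exponential density model and the same model corrected by a first-order fading-memory filter. The filter form shown (a smoothed ratio of sensed to modelled density) is one common choice and an assumption here, as are the constants, the gain, and all names; the abstract does not specify them.

```python
import math

RHO_REF = 0.02        # kg/m^3, illustrative reference density (assumed value)
SCALE_HEIGHT = 11100.0  # m, illustrative scale height (assumed value)


def exponential_density(altitude_m: float) -> float:
    """Exponential atmosphere: rho(h) = rho_ref * exp(-h / H)."""
    return RHO_REF * math.exp(-altitude_m / SCALE_HEIGHT)


class FadingMemoryFilter:
    """First-order low-pass filter on the sensed-to-model density ratio."""

    def __init__(self, gain: float = 0.8) -> None:
        self.gain = gain    # weight kept on the previous estimate
        self.ratio = 1.0    # start by trusting the nominal model

    def update(self, sensed_density: float, altitude_m: float) -> float:
        raw_ratio = sensed_density / exponential_density(altitude_m)
        self.ratio = self.gain * self.ratio + (1.0 - self.gain) * raw_ratio
        return self.ratio

    def corrected_density(self, altitude_m: float) -> float:
        """Model density scaled by the current filtered ratio."""
        return self.ratio * exponential_density(altitude_m)
```

Such a filter can only rescale the nominal profile using past measurements, which is one reason a learned model that predicts the upcoming profile might do better.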

翻訳日:2023-11-01 19:17:29 公開日:2023-10-30

# 時系列オンラインブートストラップ

An Online Bootstrap for Time Series ( http://arxiv.org/abs/2310.19683v1 )

ライセンス: Link先を確認

Nicolai Palm and Thomas Nagler

(参考訳) ブートストラップのような再サンプリング手法は、機械学習の分野で有用であることが証明されている。しかし, 従来のブートストラップ法の適用性は, 時系列や空間的相関観測など, 依存データの大きなストリームを扱う場合に制限される。本稿では,データの依存性を考慮した新しいブートストラップ手法を提案する。この方法は、ますます依存する重みの自己回帰配列に基づいている。一般条件下でのブートストラップ方式の理論的妥当性を実証する。提案手法の有効性をシミュレーションにより実証し, 複雑なデータ依存関係が存在する場合でも信頼性の高い不確実性定量化を実現することを示す。我々の研究は、古典的な再サンプリング技術と現代のデータ分析の要求のギャップを埋め、動的でデータ豊富な環境における研究者や実践者にとって貴重なツールを提供する。

Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series or spatially correlated observations. In this paper, we propose a novel bootstrap method that is designed to account for data dependencies and can be executed online, making it particularly suitable for real-time applications. This method is based on an autoregressive sequence of increasingly dependent resampling weights. We prove the theoretical validity of the proposed bootstrap scheme under general conditions. We demonstrate the effectiveness of our approach through extensive simulations and show that it provides reliable uncertainty quantification even in the presence of complex data dependencies. Our work bridges the gap between classical resampling techniques and the demands of modern data analysis, providing a valuable tool for researchers and practitioners in dynamic, data-rich environments.
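
A rough sketch of how an online multiplier bootstrap with autoregressive weights might look is given below. The specific weight recursion, the dependence schedule `rho = 1 - t**(-beta)`, and all names are assumptions made for illustration; the abstract only states that the weights form an autoregressive sequence with increasing dependence.

```python
import math
import random
from typing import List


class OnlineBootstrapMean:
    """Maintains bootstrap replicates of a running mean in a single pass."""

    def __init__(self, n_replicates: int = 200, beta: float = 0.5, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.beta = beta
        self.t = 0
        self.weights = [1.0] * n_replicates
        self.weighted_sum = [0.0] * n_replicates
        self.weight_total = [0.0] * n_replicates

    def update(self, x: float) -> None:
        """Consume one observation and update every replicate in O(1) memory each."""
        self.t += 1
        rho = 1.0 - self.t ** (-self.beta)      # dependence grows towards 1
        noise_scale = math.sqrt(1.0 - rho * rho)  # keeps weight variance at 1
        for b in range(len(self.weights)):
            innovation = self.rng.gauss(0.0, 1.0)
            w = 1.0 + rho * (self.weights[b] - 1.0) + noise_scale * innovation
            self.weights[b] = w
            self.weighted_sum[b] += w * x
            self.weight_total[b] += w

    def replicate_means(self) -> List[float]:
        """One weighted mean per bootstrap replicate."""
        return [s / w for s, w in zip(self.weighted_sum, self.weight_total)]
```

The spread of `replicate_means()` would then serve as an uncertainty estimate for the running mean, with the correlated weights standing in for the blocks used by offline block bootstraps.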

翻訳日:2023-11-01 19:17:06 公開日:2023-10-30

# 事前学習型言語モデルをニューラルネットワーク翻訳に統合する

Integrating Pre-trained Language Model into Neural Machine Translation ( http://arxiv.org/abs/2310.19680v1 )

ライセンス: Link先を確認

Soon-Jae Hwang, Chang-Sung Jeong

(参考訳) ニューラルネットワーク翻訳(NMT)は、広範囲の研究・開発を通じて自然言語処理において重要な技術となっている。しかし、高品質なバイリンガル言語ペアデータの不足は、NMTの性能向上に依然として大きな課題をもたらしている。近年,この問題を解決するために,事前学習言語モデル(PLM)の文脈情報の利用を検討している。しかし, PLM モデルと NMT モデルの不整合性の問題は未解決のままである。本研究では PLM 統合 NMT (PiNMT) モデルを提案する。 PiNMTモデルは、3つの重要なコンポーネント、PLM Multi Layer Converter、Embedding Fusion、Cosine Alignmentで構成され、それぞれがNMTに効果的なPLM情報を提供する上で重要な役割を果たす。さらに,本論文では,個別学習率と2段階学習という2つのトレーニング戦略についても紹介する。提案したPiNMTモデルとトレーニング戦略を実装することで,IWSLT'14 En$\leftrightarrow$Deデータセット上で最先端のパフォーマンスを実現した。本研究の結果は,非互換性を克服し,性能を向上させるため,PLMとNMTを効率的に統合する新たなアプローチを示すものである。

Neural Machine Translation (NMT) has become a significant technology in natural language processing through extensive research and development. However, the deficiency of high-quality bilingual language pair data still poses a major challenge to improving NMT performance. Recent studies are exploring the use of contextual information from pre-trained language model (PLM) to address this problem. Yet, the issue of incompatibility between PLM and NMT model remains unresolved. This study proposes a PLM-integrated NMT (PiNMT) model to overcome the identified problems. The PiNMT model consists of three critical components, PLM Multi Layer Converter, Embedding Fusion, and Cosine Alignment, each playing a vital role in providing effective PLM information to NMT. Furthermore, two training strategies, Separate Learning Rates and Dual Step Training, are also introduced in this paper. By implementing the proposed PiNMT model and training strategy, we achieved state-of-the-art performance on the IWSLT'14 En$\leftrightarrow$De dataset. This study's outcomes are noteworthy as they demonstrate a novel approach for efficiently integrating PLM with NMT to overcome incompatibility and enhance performance.

翻訳日:2023-11-01 19:16:51 公開日:2023-10-30

# HyPE: 相対的位置エンコーディングのための双曲的ビアーゼによる注意

HyPE: Attention with Hyperbolic Biases for Relative Positional Encoding ( http://arxiv.org/abs/2310.19676v1 )

ライセンス: Link先を確認

Giorgio Angelotti

(参考訳) Transformerベースのアーキテクチャでは、アテンション機構は入力シーケンスのトークンに関して本質的に置換不変である。シーケンシャルな順序を課すため、トークンの位置は固定または学習可能なパラメータを持つスキームを使って符号化される。本稿では,双曲関数の特性を利用してトークンの相対位置を符号化するHyPE(Hyperbolic Positional Encoding)を提案する。このアプローチは、マスクの$O(L^2)$値を格納する必要なく注意機構をバイアスし、$L$は入力シーケンスの長さである。 HyPEは予備連結演算と行列乗法を活用し、ソフトマックス計算にバイアスを間接的に組み込んだ相対距離の符号化を容易にする。この設計はflashattention-2との互換性を確保し、エンコーディング内で学習可能なパラメータの勾配バックプロパゲーションをサポートする。分析によって,HyPEはALiBiの注意バイアスを近似し,事前学習中に遭遇する長さを超えるコンテキストに対して有望な一般化機能を提供することを示した。今後の研究の方向性としてHyPEの実験的評価を提案する。

In Transformer-based architectures, the attention mechanism is inherently permutation-invariant with respect to the input sequence's tokens. To impose sequential order, token positions are typically encoded using a scheme with either fixed or learnable parameters. We introduce Hyperbolic Positional Encoding (HyPE), a novel method that utilizes hyperbolic functions' properties to encode tokens' relative positions. This approach biases the attention mechanism without the necessity of storing the $O(L^2)$ values of the mask, with $L$ being the length of the input sequence. HyPE leverages preliminary concatenation operations and matrix multiplications, facilitating the encoding of relative distances indirectly incorporating biases into the softmax computation. This design ensures compatibility with FlashAttention-2 and supports the gradient backpropagation for any potential learnable parameters within the encoding. We analytically demonstrate that, by careful hyperparameter selection, HyPE can approximate the attention bias of ALiBi, thereby offering promising generalization capabilities for contexts extending beyond the lengths encountered during pretraining. The experimental evaluation of HyPE is proposed as a direction for future research.

翻訳日:2023-11-01 19:16:16 公開日:2023-10-30
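
The HyPE abstract above does not give its formulas, so the following is only a minimal sketch of the kind of identity it alludes to, not the paper's exact construction. It assumes two hypothetical scalars, `mu` (a scale on positions) and `tau` (a bias strength), and relies on $\sinh(a-b)=\sinh a\cosh b-\cosh a\sinh b$: appending two extra components to each query and key makes their dot product depend only on the relative position $i-j$, so no $L \times L$ bias mask has to be stored. For small `mu` the resulting bias is close to a linear ALiBi-style penalty $-m(i-j)$ with slope $m=\tau\mu$.

```python
import math


def hype_features(pos, mu):
    """Extra (query, key) components for a token at integer position `pos`.

    `mu` is a hypothetical position scale chosen for this illustration.
    """
    q_extra = (math.sinh(mu * pos), math.cosh(mu * pos))
    k_extra = (math.cosh(mu * pos), -math.sinh(mu * pos))
    return q_extra, k_extra


def relative_bias(i, j, mu, tau):
    """Bias added to the logit of query position i attending to key position j.

    The dot product of the extra components equals sinh(mu * (i - j)),
    so the bias is -tau * sinh(mu * (i - j)).
    """
    q_extra, _ = hype_features(i, mu)
    _, k_extra = hype_features(j, mu)
    dot = q_extra[0] * k_extra[0] + q_extra[1] * k_extra[1]
    return -tau * dot


def alibi_bias(i, j, slope):
    """Linear ALiBi-style bias for a causal pair (j <= i)."""
    return -slope * (i - j)


if __name__ == "__main__":
    # With a small mu and tau = slope / mu, the two biases nearly coincide.
    slope, mu = 0.5, 1e-3
    for distance in (1, 8, 64):
        print(distance,
              relative_bias(distance, 0, mu, slope / mu),
              alibi_bias(distance, 0, slope))
```

Because the bias arises inside an ordinary query-key dot product, an attention kernel only sees slightly wider queries and keys; this is presumably what makes the design compatible with FlashAttention-2, as the abstract states.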

# 原理に基づく階層型深層学習による共同画像圧縮と分類

A Principled Hierarchical Deep Learning Approach to Joint Image Compression and Classification ( http://arxiv.org/abs/2310.19675v1 )

ライセンス: Link先を確認

Siyu Qi, Achintha Wijesinghe, Lahiru D. Chamain, Zhi Ding

(参考訳) 低コストセンサーを含むディープラーニング(DL)の応用の中で、リモート画像分類はエッジセンサーとクラウド分類器を分離する物理チャネルを含む。従来のDLモデルは、センサーのエンコーダとエッジサーバのデコーダ+分類器に分割する必要がある。重要な課題は、接続チャネルが制限されたレート/容量を持つ場合、そのような分散モデルを効果的に訓練することである。我々のゴールは、エンコーダのラテントが低チャネル帯域を必要とするようにDLモデルを最適化し、高い分類精度で特徴情報を提供することである。本研究は,エンコーダを誘導し,コンパクトで差別的で,一般的な拡張/変換に適した特徴を抽出する3段階共同学習戦略を提案する。エンドツーエンド(E2E)トレーニングの前に,初期スクリーニングフェーズを通じて潜時次元を最適化する。単一プリデプロイエンコーダによる調整可能なビットレートを得るために、エントロピーに基づく量子化および/または手動トランケーションを潜在表現に適用する。 CIFAR-10では最大1.5%,CIFAR-100では3%,従来のE2Eクロスエントロピートレーニングでは3%の精度向上が得られた。

Among applications of deep learning (DL) involving low-cost sensors, remote image classification involves a physical channel that separates edge sensors and cloud classifiers. Traditional DL models must be divided between an encoder for the sensor and the decoder + classifier at the edge server. An important challenge is to effectively train such distributed models when the connecting channels have limited rate/capacity. Our goal is to optimize DL models such that the encoder latent requires low channel bandwidth while still delivering feature information for high classification accuracy. This work proposes a three-step joint learning strategy to guide encoders to extract features that are compact, discriminative, and amenable to common augmentations/transformations. We optimize the latent dimension through an initial screening phase before end-to-end (E2E) training. To obtain an adjustable bit rate via a single pre-deployed encoder, we apply entropy-based quantization and/or manual truncation on the latent representations. Tests show that our proposed method achieves an accuracy improvement of up to 1.5% on CIFAR-10 and 3% on CIFAR-100 over conventional E2E cross-entropy training.

翻訳日:2023-11-01 19:15:55 公開日:2023-10-30

# 協調的評価:大規模言語モデルと人間によるオープンエンド世代評価の相乗効果を探る

Collaborative Evaluation: Exploring the Synergy of Large Language Models and Humans for Open-ended Generation Evaluation ( http://arxiv.org/abs/2310.19740v1 )

ライセンス: Link先を確認

Qintong Li, Leyang Cui, Lingpeng Kong, Wei Bi

(参考訳) 自動メトリクスは、しばしば人間の判断と弱い相関を示すため、人間は創造性を要求する拡張自然言語生成タスク(nlg)の評価に広く関わっている。大規模言語モデル(LLM)は最近、人間の評価に代わるスケーラブルで費用対効果の高い代替品として登場した。しかしながら、人間とLLMの両方には、固有の主観性と信頼できない判断、特に多様なタスク要求に合わせた適応可能なメトリクスを必要とするオープンなタスクに制限がある。人間とllmベースの評価器の相乗効果を探求し、未完成のnlgタスクにおける既存の一貫性のない評価基準の課題に対処するために、タスク固有の基準のチェックリストの設計とllmが初期イデオレーションを生成するテキストの詳細な評価を含む共同評価パイプラインcoevalを提案する。我々は,コエバルにおけるLLMとヒトの相互効果について,一連の実験を行った。その結果, llms を利用することで, coeval は長文を効果的に評価し, かなりの時間を節約し, 評価異常を低減できることがわかった。人間の精査は依然として役割を担っており、LLM評価スコアの約20%を究極の信頼性のために更新している。

Humans are widely involved in the evaluation of open-ended natural language generation tasks (NLG) that demand creativity, as automatic metrics often exhibit weak correlations with human judgments. Large language models (LLMs) recently have emerged as a scalable and cost-effective alternative to human evaluations. However, both humans and LLMs have limitations, i.e., inherent subjectivity and unreliable judgments, particularly for open-ended tasks that require adaptable metrics tailored to diverse task requirements. To explore the synergy between humans and LLM-based evaluators and address the challenges of existing inconsistent evaluation criteria in open-ended NLG tasks, we propose a Collaborative Evaluation pipeline CoEval, involving the design of a checklist of task-specific criteria and the detailed evaluation of texts, in which LLM generates initial ideation, and then humans engage in scrutiny. We conducted a series of experiments to investigate the mutual effects between LLMs and humans in CoEval. Results show that, by utilizing LLMs, CoEval effectively evaluates lengthy texts, saving significant time and reducing human evaluation outliers. Human scrutiny still plays a role, revising around 20% of LLM evaluation scores for ultimate reliability.

翻訳日:2023-11-01 19:08:44 公開日:2023-10-30

# 大規模言語モデルにおける敵攻撃と防御--古くて新しい脅威

Adversarial Attacks and Defenses in Large Language Models: Old and New Threats ( http://arxiv.org/abs/2310.19737v1 )

ライセンス: Link先を確認

Leo Schwinn and David Dobre and Stephan Günnemann and Gauthier Gidel

(参考訳) 過去10年間、ニューラルネットワークの堅牢性向上を目的とした広範な研究が続けられてきたが、この問題は未解決のままである。ここでの大きな障害の1つは、欠陥防衛評価による新しい防衛アプローチの頑健さの過大評価である。欠陥のある堅牢性評価は、その後の作業で修正を必要とし、研究を危険に遅らせ、誤ったセキュリティ感覚を提供する。この文脈では、自然言語処理における差し迫った敵国軍競争、特にChatGPT、Google Bard、Anthropic's Claudeといった、クローズドソースのLarge Language Models(LLMs)に関する大きな課題に直面します。我々は,新しいアプローチの堅牢性評価を改善し,欠陥評価の量を削減するための第1の前提条件を提供する。さらに,LLMに対する埋め込み空間攻撃を,オープンソースモデルで悪意のあるコンテンツを生成するための新たな脅威モデルとして認識する。最後に、最近提案された防御について、llm特有のベストプラクティスがなければ、新しいアプローチの堅牢さを過大評価することが容易であることを示す。

Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains largely unsolved. Here, one major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations. Flawed robustness evaluations necessitate rectifications in subsequent works, dangerously slowing down the research and providing a false sense of security. In this context, we will face substantial challenges associated with an impending adversarial arms race in natural language processing, specifically with closed-source Large Language Models (LLMs), such as ChatGPT, Google Bard, or Anthropic's Claude. We provide a first set of prerequisites to improve the robustness assessment of new approaches and reduce the number of faulty evaluations. Additionally, we identify embedding space attacks on LLMs as another viable threat model for the purposes of generating malicious content in open-sourced models. Finally, we demonstrate on a recently proposed defense that, without LLM-specific best practices in place, it is easy to overestimate the robustness of a new approach.

翻訳日:2023-11-01 19:08:20 公開日:2023-10-30

# 選好フィードバックを用いた差分プライベートな報酬推定

Differentially Private Reward Estimation with Preference Feedback ( http://arxiv.org/abs/2310.19733v1 )

ライセンス: Link先を確認

Sayak Ray Chowdhury, Xingyu Zhou and Nagarajan Natarajan

(参考訳) 嗜好に基づくフィードバックからの学習は、生成モデルを人間の関心に整合させる有望なアプローチとして、近年かなりの注目を集めている。数値的な報酬に頼る代わりに、生成モデルは人間フィードバックによる強化学習(RLHF)を用いて訓練される。これらのアプローチは、まず人間のラベラーから、通常は2つの可能なアクションのペア比較という形でフィードバックを募り、次にこれらの比較を使って報酬モデルを推定し、最後に推定された報酬モデルに基づくポリシーを採用する。上記のパイプラインの任意のステップにおける敵対的攻撃は、人間のラベラーのプライベートで機密性の高い情報を明らかにする可能性がある。本研究では、ラベル差分プライバシ(DP)の概念を採用し、個々のラベラーのプライバシを保護しつつ、嗜好に基づくフィードバックから報酬を推定する問題に焦点をあてる。具体的には、潜在報酬パラメータ $\theta^* \in \mathbb{R}^d$ を含むペア比較フィードバックに対するパラメトリックなBradley-Terry-Luce(BTL)モデルを考える。標準的なミニマックス推定の枠組みにおいて、DPのローカルモデルと中央モデルの両方の下で $\theta^*$ を推定する際の誤差に対するタイトな上界と下界を与える。与えられたプライバシー予算 $\epsilon$ とサンプル数 $n$ に対して、ラベルDPを保証するための追加コストは、ローカルモデルの下では $\Theta \big(\frac{1}{e^\epsilon-1}\sqrt{\frac{d}{n}}\big)$ であり、より弱い中央モデルの下では $\Theta\big(\frac{\text{poly}(d)}{\epsilon n} \big)$ であることを示す。これらの理論結果を裏付ける合成データ上のシミュレーションを行う。

Learning from preference-based feedback has recently gained considerable traction as a promising approach to align generative models with human interests. Instead of relying on numerical rewards, the generative models are trained using reinforcement learning with human feedback (RLHF). These approaches first solicit feedback from human labelers, typically in the form of pairwise comparisons between two possible actions, then estimate a reward model using these comparisons, and finally employ a policy based on the estimated reward model. An adversarial attack in any step of the above pipeline might reveal private and sensitive information of human labelers. In this work, we adopt the notion of label differential privacy (DP) and focus on the problem of reward estimation from preference-based feedback while protecting the privacy of each individual labeler. Specifically, we consider the parametric Bradley-Terry-Luce (BTL) model for such pairwise comparison feedback involving a latent reward parameter $\theta^* \in \mathbb{R}^d$. Within a standard minimax estimation framework, we provide tight upper and lower bounds on the error in estimating $\theta^*$ under both local and central models of DP. We show, for a given privacy budget $\epsilon$ and number of samples $n$, that the additional cost to ensure label-DP under the local model is $\Theta \big(\frac{1}{e^\epsilon-1}\sqrt{\frac{d}{n}}\big)$, while it is $\Theta\big(\frac{\text{poly}(d)}{\epsilon n} \big)$ under the weaker central model. We perform simulations on synthetic data that corroborate these theoretical results.

翻訳日:2023-11-01 19:07:37 公開日:2023-10-30
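
To make the setting in the abstract above concrete, here is a minimal sketch of a BTL preference probability and of randomized response, a standard mechanism for $\epsilon$-label-DP in the local model. The abstract does not say which mechanism the paper analyzes, so the use of randomized response, the linear reward $\langle\theta, x\rangle$, and all function names are assumptions made for illustration.

```python
import math
import random


def btl_preference_prob(theta, x_a, x_b):
    """P(a is preferred over b) under BTL, assuming a linear reward <theta, x>."""
    diff = sum(t * (a - b) for t, a, b in zip(theta, x_a, x_b))
    return 1.0 / (1.0 + math.exp(-diff))


def flip_probability(epsilon):
    """Randomized response flips a binary label with probability 1 / (1 + e^eps)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(label, epsilon, rng):
    """Return a privatized copy of a binary label in {0, 1}."""
    return 1 - label if rng.random() < flip_probability(epsilon) else label


def debias(noisy_mean, epsilon):
    """Undo the expected effect of the flips on an average of binary labels."""
    q = flip_probability(epsilon)
    return (noisy_mean - q) / (1.0 - 2.0 * q)


if __name__ == "__main__":
    rng = random.Random(0)
    theta, x_a, x_b = [1.0, -0.5], [0.8, 0.1], [0.2, 0.4]
    p = btl_preference_prob(theta, x_a, x_b)
    labels = [1 if rng.random() < p else 0 for _ in range(20000)]
    noisy = [randomized_response(y, 1.0, rng) for y in labels]
    print(p, debias(sum(noisy) / len(noisy), 1.0))
```

The debiasing step divides by $1-2q=\frac{e^\epsilon-1}{e^\epsilon+1}$, so the sampling noise grows as $\epsilon$ shrinks; this is at least suggestive of why a factor like $\frac{1}{e^\epsilon-1}$ appears in the local-model rate quoted above.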

# ViR:ビジョン保持ネットワーク

ViR: Vision Retention Networks ( http://arxiv.org/abs/2310.19731v1 )

ライセンス: Link先を確認

Ali Hatamizadeh, Michael Ranzinger, Jan Kautz

(参考訳) 視覚変換器(ViT)は、長距離空間依存のモデリングや大規模トレーニングのスケーラビリティに特有な能力を持つため、近年、多くの人気を集めている。自己注意機構の訓練並列性は、優れた性能を維持する上で重要な役割を果たすが、その二次的な複雑さは、高速な推論を必要とする多くのシナリオにおけるViTの適用を妨げている。この効果は、入力特徴の自動回帰モデリングを必要とするアプリケーションにおいてさらに顕著である。自然言語処理(nlp)では、ジェネレーティブなアプリケーションで効率的な推論を可能にする再帰的定式化を伴う並列化モデルが提案されている。そこで本研究では,この傾向に触発されたビジョン保持ネットワーク(vir)と呼ばれる新しいコンピュータビジョンモデルを提案する。特に、ViRは、大きなシーケンス長を処理する際の柔軟な定式化のため、高解像度の画像を必要とするタスクにおいて、画像スループットとメモリ消費に好適にスケールする。 ViRは、認識タスクのための一般的なビジョンバックボーンにおいて、並列性と繰り返しの等価性を実現する最初の試みである。異なるデータセットサイズと様々な画像解像度を用いた広範囲な実験により、ViRの有効性を検証し、競争性能を達成した。私たちのコードと事前訓練されたモデルは公開されます。

Vision Transformers (ViTs) have attracted a lot of popularity in recent years, due to their exceptional capabilities in modeling long-range spatial dependencies and scalability for large-scale training. Although the training parallelism of the self-attention mechanism plays an important role in retaining great performance, its quadratic complexity hinders the application of ViTs in many scenarios which demand fast inference. This effect is even more pronounced in applications in which autoregressive modeling of input features is required. In Natural Language Processing (NLP), a new stream of efforts has proposed parallelizable models with a recurrent formulation that allows for efficient inference in generative applications. Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance. In particular, ViR scales favorably for image throughput and memory consumption in tasks that require higher-resolution images due to its flexible formulation in processing large sequence lengths. The ViR is the first attempt to realize dual parallel and recurrent equivalency in a general vision backbone for recognition tasks. We have validated the effectiveness of ViR through extensive experiments with different dataset sizes and various image resolutions and achieved competitive performance. Our code and pretrained models will be made publicly available.

翻訳日:2023-11-01 19:07:03 公開日:2023-10-30
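
The "dual parallel and recurrent formulations" mentioned in the ViR abstract above can be illustrated with a small sketch. The abstract does not define the retention operator, so this assumes a RetNet-style retention with a single scalar decay `gamma` and no normalization; the function names are hypothetical. The first function computes every output position independently (the form that parallelizes during training), the second carries a fixed-size state from token to token (the form suited to fast inference), and both give the same result.

```python
def retention_parallel(queries, keys, values, gamma):
    """outputs[n] = sum over m <= n of gamma^(n-m) * (q_n . k_m) * v_m.

    Each position n is computed independently of the others.
    """
    outputs = []
    for n, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for m in range(n + 1):
            weight = gamma ** (n - m) * sum(a * b for a, b in zip(q, keys[m]))
            for d, v in enumerate(values[m]):
                out[d] += weight * v
        outputs.append(out)
    return outputs


def retention_recurrent(queries, keys, values, gamma):
    """Same outputs, computed with a running d_k x d_v state matrix."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # S_n = gamma * S_{n-1} + outer(k_n, v_n)
        state = [[gamma * state[i][j] + k[i] * v[j] for j in range(d_v)]
                 for i in range(d_k)]
        # o_n = q_n S_n
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.5, 2.0], [1.0, 1.0]]
    K = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.5]]
    V = [[1.0], [2.0], [3.0]]
    print(retention_parallel(Q, K, V, 0.9))
    print(retention_recurrent(Q, K, V, 0.9))
```

The recurrent form needs memory proportional to the state size rather than to the sequence length, which is the usual argument for using it at inference time.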

# コンディショナルトランスフォーマによる医療インストラクションの生成

Generating Medical Instructions with Conditional Transformer ( http://arxiv.org/abs/2310.19727v1 )

ライセンス: Link先を確認

Samuel Belkadi and Nicolo Micheletti and Lifeng Han and Warren Del-Pinto and Goran Nenadic

(参考訳) 現実世界の医療指示へのアクセスは、医療研究と医療の品質改善に不可欠である。しかし、実際の医療指示へのアクセスは、表現される情報の機微な性質のため、しばしば制限される。さらに、自然言語処理(NLP)モデルの訓練や微調整のためにこれらの指示を手動でラベル付けするのは、面倒でコストがかかる。本稿では、新たなタスク固有モデルアーキテクチャであるLabel-To-Text-Transformer(LT3)を導入し、医薬品とその属性の語彙リストなど、与えられたラベルに基づいて合成医療指示を生成する。 LT3はMIMIC-IIIデータベースから抽出された膨大な量の医療指示に基づいて訓練され、有用な合成医療指示を生成できる。LT3の性能を、最先端の事前学習言語モデル(PLM)であるT5と比較して評価し、生成されたテキストの品質と多様性を分析する。生成された合成データを用いて、n2c2-2018データセット上の固有表現認識(NER)タスクのためにSpacyNERモデルを訓練する。実験の結果、合成データで訓練したモデルは、薬物、頻度、経路、強度、剤形のラベル認識において96-98%のF1スコアを達成できることがわかった。 LT3のコードとデータは https://github.com/HECTA-UoM/Label-To-Text-Transformer で共有される予定である。

Access to real-world medical instructions is essential for medical research and healthcare quality improvement. However, access to real medical instructions is often limited due to the sensitive nature of the information expressed. Additionally, manually labelling these instructions for training and fine-tuning Natural Language Processing (NLP) models can be tedious and expensive. We introduce a novel task-specific model architecture, Label-To-Text-Transformer (LT3), tailored to generate synthetic medical instructions based on provided labels, such as a vocabulary list of medications and their attributes. LT3 is trained on a vast corpus of medical instructions extracted from the MIMIC-III database, allowing the model to produce valuable synthetic medical instructions. We evaluate LT3's performance by contrasting it with a state-of-the-art Pre-trained Language Model (PLM), T5, analysing the quality and diversity of generated texts. We deploy the generated synthetic data to train the SpacyNER model for the Named Entity Recognition (NER) task over the n2c2-2018 dataset. The experiments show that the model trained on synthetic data can achieve a 96-98% F1 score at Label Recognition on Drug, Frequency, Route, Strength, and Form. LT3 codes and data will be shared at https://github.com/HECTA-UoM/Label-To-Text-Transformer

翻訳日:2023-11-01 19:06:43 公開日:2023-10-30

# シンプルなモデルへの道は騒音から始まる

A Path to Simpler Models Starts With Noise ( http://arxiv.org/abs/2310.19726v1 )

ライセンス: Link先を確認

Lesia Semenova, Harry Chen, Ronald Parr, Cynthia Rudin

(参考訳) ラショモン集合は、与えられたデータセット上でほぼ等しく作用するモデルの集合であり、ラショモン比は、ラショモン集合に属する所定の仮説空間内の全てのモデルの分数である。ラショモン比は、刑事司法、医療、貸付、教育、その他の分野における表型データセットにおいて、しばしば大きく、より単純なモデルがより複雑なモデルと同じレベルの精度を達成することができるかという実践的な意味合いを持つ。オープンな疑問は、なぜラショモン比が大きくなるのかである。本研究では,学習過程におけるデータ生成過程のメカニズムと,学習過程においてアナリストが通常行う選択とを組み合わせることで,ラショモン比のサイズを決定する手法を提案する。具体的には、noisierデータセットが、実践者がモデルをトレーニングする方法を通じて、より大きなrashomon比につながることを実証する。さらに,ラショモン集合の異なる分類パターン間の予測平均差を捉え,ラベルノイズが増大する理由をモチベーションとして,パターン多様性(pattern diversity)という尺度を導入する。その結果、単純なモデルが複雑でノイズの多いデータセット上でブラックボックスモデルと同様に振る舞う傾向がある理由が説明できる。

The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets.

翻訳日:2023-11-01 19:06:18 公開日:2023-10-30
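
The definitions of the Rashomon set and Rashomon ratio in the abstract above can be sketched for a small, finite hypothesis space. This is an illustration under assumptions that are not from the paper: the hypotheses are one-dimensional threshold rules on a hypothetical grid, each counted with equal weight, the loss is the 0-1 loss, and `epsilon` is the tolerance that decides which models count as performing "approximately equally well".

```python
def zero_one_loss(threshold, xs, ys):
    """Fraction of points misclassified by the rule 'predict 1 if x > threshold'."""
    errors = sum(1 for x, y in zip(xs, ys) if (1 if x > threshold else 0) != y)
    return errors / len(xs)


def rashomon_ratio(losses, epsilon):
    """Fraction of models whose loss is within epsilon of the best loss."""
    best = min(losses)
    in_set = sum(1 for loss in losses if loss <= best + epsilon)
    return in_set / len(losses)


if __name__ == "__main__":
    # Toy data: label is 1 when x exceeds 0.5 (no label noise here).
    xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    thresholds = [t / 20 for t in range(1, 20)]  # hypothetical model grid
    losses = [zero_one_loss(t, xs, ys) for t in thresholds]
    print(rashomon_ratio(losses, epsilon=0.15))
```

Flipping some of the labels in `ys` and recomputing the ratio is one simple way to explore, on toy data, the relationship between label noise and the Rashomon ratio that the paper studies.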

# support matrix machine: レビュー

Support matrix machine: A review ( http://arxiv.org/abs/2310.19717v1 )

ライセンス: Link先を確認

Anuradha Kumari, Mushir Akhtar, Rupal Shah, M. Tanveer

(参考訳) サポートベクトルマシン(SVM)は、分類と回帰問題に対する機械学習の領域で最も研究されているパラダイムの1つである。ベクトル化された入力データに依存する。しかし、実世界のデータの大部分は行列形式で存在し、行列をベクトルに変換することによってSVMへの入力として与えられる。再構成の過程は、行列データに固有の空間相関を阻害する。また、行列をベクトルに変換することで高次元の入力データが得られるため、計算が複雑になる。行列入力データの分類におけるこれらの課題を克服するために,サポートマトリックスマシン (smm) を提案する。これは行列入力データを扱うのに適した新しい手法の1つである。 SMM法は、核ノルムとフロベニウスノルムの組み合わせであるスペクトル弾性ネット特性を用いて、行列データの構造情報を保存する。本稿は,SMMモデルの開発について,初心者と専門家双方による詳細な要約として使用可能な,初めて詳細な分析を行う。本稿では,ロバスト,スパース,クラス不均衡,マルチクラス分類モデルなど,多数のsmm変種について考察する。また、SMMモデルの適用状況を分析し、SMMアルゴリズムを前進させる動機となる将来的な研究の道筋や可能性について概説する。

Support vector machine (SVM) is one of the most studied paradigms in the realm of machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of the real-world data exists in matrix format, which is given as input to SVM by reshaping the matrices into vectors. The process of reshaping disrupts the spatial correlations inherent in the matrix data. Also, converting matrices into vectors results in input data with a high dimensionality, which introduces significant computational complexity. To overcome these issues in classifying matrix input data, support matrix machine (SMM) is proposed. It represents one of the emerging methodologies tailored for handling matrix input data. The SMM method preserves the structural information of the matrix data by using the spectral elastic net property which is a combination of the nuclear norm and Frobenius norm. This article provides the first in-depth analysis of the development of the SMM model, which can be used as a thorough summary by both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class imbalance, and multi-class classification models. We also analyze the applications of the SMM model and conclude the article by outlining potential future research avenues and possibilities that may motivate academics to advance the SMM algorithm.

翻訳日:2023-11-01 19:05:32 公開日:2023-10-30
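
The "spectral elastic net" described in the abstract above, a combination of the nuclear norm and the Frobenius norm, can be written out for a tiny case. The sketch below assumes the commonly used form $\frac{1}{2}\|W\|_F^2+\tau\|W\|_*$ together with a hinge loss on $\mathrm{tr}(W^\top X)+b$; the abstract itself does not state the objective, so treat the exact weighting and the names `tau` and `C` as assumptions. To stay dependency-free it is restricted to $2\times 2$ matrices, where the nuclear norm has the closed form $\sqrt{\|W\|_F^2+2|\det W|}$.

```python
import math


def frobenius_sq(W):
    """Squared Frobenius norm of a matrix given as a list of rows."""
    return sum(w * w for row in W for w in row)


def nuclear_norm_2x2(W):
    """Sum of singular values of a 2x2 matrix.

    Uses (s1 + s2)^2 = s1^2 + s2^2 + 2*s1*s2 = ||W||_F^2 + 2*|det W|.
    """
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    return math.sqrt(frobenius_sq(W) + 2.0 * abs(det))


def spectral_elastic_net(W, tau):
    """Assumed penalty: 0.5 * ||W||_F^2 + tau * ||W||_*."""
    return 0.5 * frobenius_sq(W) + tau * nuclear_norm_2x2(W)


def smm_objective(W, b, samples, tau, C):
    """Penalty plus hinge loss; samples is a list of (X, y) with y in {-1, +1}."""
    hinge = 0.0
    for X, y in samples:
        # tr(W^T X) is the elementwise inner product of W and X.
        score = sum(W[i][j] * X[i][j] for i in range(2) for j in range(2)) + b
        hinge += max(0.0, 1.0 - y * score)
    return spectral_elastic_net(W, tau) + C * hinge
```

Keeping the input as a matrix is what lets the nuclear norm act on its rank structure; flattening `X` into a vector, as a plain SVM would, leaves no singular values to penalize.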

# オンライン不信生態系の複雑さとその進化

Complexity of the Online Distrust Ecosystem and its Evolution ( http://arxiv.org/abs/2310.19710v1 )

ライセンス: Link先を確認

Lucia Illari, Nicholas J. Restrepo, Neil F. Johnson

(参考訳) 集団的不信(およびそれに伴う誤報)は、我々の時代の最も複雑な現象の1つである。例えば、医学的専門性への不信、気候変動科学、民主的な選挙結果、さらには現在のイスラエル・ハマス・ウクライナ・ロシア紛争における事実確認事件への不信さえある。では、オンライン不信エコシステムがなぜこれほど回復力があるのか? パンデミックの前後でどのように進化しましたか。この期間、facebookの緩和政策はどれくらいうまくいったのか? 我々は、パンデミック以前のユーザーがワクチンの不信感にのみ注力した合計1億人のコミュニティ(facebookページ)のfacebookネットワークを分析した。 2019年から2023年までのこのダイナミックネットワークをマッピングすると、閉鎖を含むFacebookの緩和キャンペーンの結果として、急速に自己修復されたことがわかる。これは、新型コロナウイルス(COVID-19)によるFacebookの上昇は効果がない(例:2020年11月)という以前の発見を裏付け、拡張します。今後の介入は,複数の話題,複数の地理的尺度にまたがって共鳴しなくてはならない。最近の多くの研究と異なり、我々の研究は厳密な科学的研究の正確性が証明されていないサードパーティのブラックボックスツールに依存しておらず、そのような研究の結論に疑問を投げかけている。

Collective human distrust (and its associated mis-disinformation) is one of the most complex phenomena of our time, e.g., distrust of medical expertise, or climate change science, or democratic election outcomes, and even distrust of fact-checked events in the current Israel-Hamas and Ukraine-Russia conflicts. So what makes the online distrust ecosystem so resilient? How has it evolved during and since the pandemic? And how well have Facebook mitigation policies worked during this time period? We analyze a Facebook network of interconnected in-built communities (Facebook pages) totaling roughly 100 million users who pre-pandemic were just focused on distrust of vaccines. Mapping out this dynamical network from 2019 to 2023, we show that it has quickly self-healed in the wake of Facebook's mitigation campaigns which include shutdowns. This confirms and extends our earlier finding that Facebook's ramp-ups during COVID were ineffective (e.g. November 2020). Our findings show that future interventions must be chosen to resonate across multiple topics and across multiple geographical scales. Unlike many recent studies, our findings do not rely on third-party black-box tools whose accuracy for rigorous scientific research is unproven, hence raising doubts about such studies' conclusions, nor is our network built using fleeting hyperlink mentions which have questionable relevance.

翻訳日:2023-11-01 19:05:12 公開日:2023-10-30

# ニューラルネットワークの知識編集に関する調査研究

A Survey on Knowledge Editing of Neural Networks ( http://arxiv.org/abs/2310.19704v1 )

ライセンス: Link先を確認

Vittorio Mazzia, Alessandro Pedrani, Andrea Caciolai, Kay Rottmann, Davide Bernardi

(参考訳) 深層ニューラルネットワークは、学界や業界でますます普及し、さまざまな分野や関連するタスクで人間のパフォーマンスと一致し、追い越すようになっている。しかし、人間と同じように、最大のニューラルネットワークでさえ間違いを犯し、世界が経つにつれて一度正しい予測が無効になる可能性がある。ミスや最新の情報を考慮したサンプルによるデータセットの強化は、実用アプリケーションでは一般的な回避策となっている。しかしながら、破滅的な忘れというよく知られた現象は、ニューラルネットワークパラメータの暗黙的に記憶された知識の正確な変化を達成する上で課題となり、しばしば望ましい振る舞いを達成するために完全なモデルの再訓練が必要となる。これは高価で信頼性がなく、大規模な自己教師型事前トレーニングの現在のトレンドと相容れないため、データ変更にニューラルネットワークモデルを適用するためのより効率的で効果的な方法を見つける必要がある。このニーズに対処するために、知識編集は、事前学習されたタスクにおけるモデル行動に影響を与えることなく、信頼性、データ効率、高速な目標モデルの変更を可能にすることを目的とした、新しい研究分野として浮上している。本調査では,最近の人工知能研究分野について概説する。まず、ニューラルネットワークを編集し、共通の枠組みで形式化し、継続的学習のようなより悪名高い研究分野と区別する問題を紹介する。次に、これまでに提案されている最も関連する知識編集手法とデータセットのレビューを行い、正規化技法、メタラーニング、直接モデル編集、アーキテクチャ戦略の4つの異なるファミリーに分類する。最後に,他の研究分野との交点と今後の研究の方向性について概説する。

Deep neural networks are becoming increasingly pervasive in academia and industry, matching and surpassing human performance on a wide variety of fields and related tasks. However, just as humans, even the largest artificial neural networks make mistakes, and once-correct predictions can become invalid as the world progresses in time. Augmenting datasets with samples that account for mistakes or up-to-date information has become a common workaround in practical applications. However, the well-known phenomenon of catastrophic forgetting poses a challenge in achieving precise changes in the implicitly memorized knowledge of neural network parameters, often requiring a full model re-training to achieve desired behaviors. That is expensive, unreliable, and incompatible with the current trend of large self-supervised pre-training, making it necessary to find more efficient and effective methods for adapting neural network models to changing data. To address this need, knowledge editing is emerging as a novel area of research that aims to enable reliable, data-efficient, and fast changes to a pre-trained target model, without affecting model behaviors on previously learned tasks. In this survey, we provide a brief review of this recent artificial intelligence field of research. We first introduce the problem of editing neural networks, formalize it in a common framework and differentiate it from more notorious branches of research such as continuous learning. Next, we provide a review of the most relevant knowledge editing approaches and datasets proposed so far, grouping works under four different families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, we outline some intersections with other fields of research and potential directions for future works.

翻訳日:2023-11-01 19:04:31 公開日:2023-10-30

# 臨界(1+1)-d 3状態ポッツ模型における位相欠陥の格子実現

Lattice Realizations of Topological Defects in the critical (1+1)-d Three-State Potts Model ( http://arxiv.org/abs/2310.19703v1 )

ライセンス: Link先を確認

Madhav Sinha, Fei Yan, Linnea Grans-Samuelsson, Ananda Roy and Hubert Saleur

(参考訳) 位相的/完全伝達的欠陥は2次元共形場理論(CFT)の対称性の解析において基礎的な役割を果たす。本研究では,これらの欠陥に対するスピン鎖規則化を提案し,三状態ポッツCFTの場合の解析を行った。特に、すべてのプリミティブ欠陥に対する格子バージョンが提示され、残りの欠陥はプリミティブ欠陥の融合によって得られる。この欠陥は、周期的境界条件を持つ等質スピン鎖の2つの所与の部位に修正された相互作用を導入することによって得られる。様々な原始的欠陥は1を除いて格子上の位相的であり、これはスケーリング極限においてのみ位相的である。格子モデルは, 正対角化法と密度行列再正規化法を組み合わせて解析する。欠陥の異なるハミルトニアンに対する低次エネルギースペクトルと、欠陥の周囲に対称に位置するブロックの絡み合いエントロピーを計算する。後者は、様々な欠陥を特徴付ける$g$-functionを計算する便利な方法を提供する。最後に、「交差チャンネル」におけるライン演算子の固有値と異なる欠陥ラインの融合についても解析する。結果はすべて共形場理論の期待と一致している。

Topological/perfectly-transmissive defects play a fundamental role in the analysis of the symmetries of two-dimensional conformal field theories (CFTs). In the present work, spin chain regularizations for these defects are proposed and analyzed in the case of the three-state Potts CFT. In particular, lattice versions for all the primitive defects are presented, with the remaining defects obtained from the fusion of the primitive ones. The defects are obtained by introducing modified interactions around two given sites of an otherwise homogeneous spin chain with periodic boundary condition. The various primitive defects are topological on the lattice except for one, which is topological only in the scaling limit. The lattice models are analyzed using a combination of exact diagonalization and density matrix renormalization group techniques. Low-lying energy spectra for different defect Hamiltonians as well as entanglement entropy of blocks located symmetrically around the defects are computed. The latter provides a convenient way to compute the $g$-function which characterizes various defects. Finally, the eigenvalues of the line operators in the "crossed channel" and fusion of different defect lines are also analyzed. The results are all in agreement with expectations from conformal field theory.

翻訳日:2023-11-01 19:04:02 公開日:2023-10-30

# フォトニック結晶空洞中の量子ドットからの通信波長におけるパーセル励起単一光子

Purcell-Enhanced Single Photons at Telecom Wavelengths from a Quantum Dot in a Photonic Crystal Cavity ( http://arxiv.org/abs/2310.19701v1 )

ライセンス: Link先を確認

Catherine L. Phillips, Alistair J. Brash, Max Godsland, Nicholas J. Martin, Andrew Foster, Anna Tomlinson, Rene Dost, Nasser Babazadeh, Elisa M. Sala, Luke Wilson, Jon Heffernan, Maurice S. Skolnick, A. Mark Fox

(参考訳) 量子ドットは、様々な低損失の通信帯域にまたがる可変発光により、既存のファイバーネットワークと互換性があるため、テレコムの単一光子源として有望な候補である。フォトニック構造への適合性は、パーセル効果を通じて輝度を高め、効率的な量子通信技術をサポートする。本研究は, 液滴エピタキシーMOVPEを用いて作成したInAs/InPQDをテレコムCバンド内で動作させる。低モード容積フォトニック結晶空洞内のqdの相互作用によりパーセル因子5から発生する340psの短い放射寿命を観察した。試料温度のその場制御により,QDの放射波長の温度調整と,25Kまでの温度で保存された単一光子放射純度の両方を示す。これらの結果から,QDをベースとした低温無低温Cバンド単一光子源の実現可能性を示し,量子通信技術の適用性を支持した。

Quantum dots are promising candidates for telecom single photon sources due to their tunable emission across the different low-loss telecommunications bands, making them compatible with existing fiber networks. Their suitability for integration into photonic structures allows for enhanced brightness through the Purcell effect, supporting efficient quantum communication technologies. Our work focuses on InAs/InP QDs created via droplet epitaxy MOVPE to operate within the telecoms C-band. We observe a short radiative lifetime of 340 ps, arising from a Purcell factor of 5, owing to interaction of the QD within a low-mode-volume photonic crystal cavity. Through in-situ control of the sample temperature, we show both temperature tuning of the QD's emission wavelength and a preserved single photon emission purity at temperatures up to 25K. These findings suggest the viability of QD-based, cryogen-free, C-band single photon sources, supporting applicability in quantum communication technologies.

翻訳日:2023-11-01 19:03:44 公開日:2023-10-30

# プロンプティングとプレフィックスチューニングはいつ機能するのか? 能力と限界の理論

When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations ( http://arxiv.org/abs/2310.19698v1 )

ライセンス: Link先を確認

Aleksandar Petrov, Philip H.S. Torr, Adel Bibi

(参考訳) 文脈に基づく微調整手法は、プロンプト、文脈内学習、ソフト・プロンプト(プロンプト・チューニング)、プレフィックス・チューニング(プレフィックス・チューニング)などがあり、パラメータのごく一部で完全な微調整の性能とよく一致するため人気がある。実験的な成功にもかかわらず、これらの手法がモデルの内部計算と表現力の限界にどのように影響するかについての理論的理解はほとんどない。連続埋め込み空間は離散トークン空間よりも表現力が高いが,ソフトプロンプトやプレフィックスチューニングは,学習可能なパラメータの数が同じであっても,完全な微調整よりも厳密に表現力に乏しいことを示す。具体的には、コンテキストベースの微調整はコンテンツ上の相対的注意パターンを変えることができず、注意層の出力を一定の方向に偏らせるだけである。これは、プロンプト、インコンテキスト学習、ソフトプロンプト、プレフィックスチューニングといったテクニックは、事前訓練されたモデルに存在するスキルを効果的に誘発することができるが、新しい注意パターンを必要とする新しいタスクを学べないことを示唆している。

Context-based fine-tuning methods, including prompting, in-context learning, soft prompting (also known as prompt tuning), and prefix-tuning, have gained popularity due to their ability to often match the performance of full fine-tuning with a fraction of the parameters. Despite their empirical successes, there is little theoretical understanding of how these techniques influence the internal computation of the model and their expressiveness limitations. We show that despite the continuous embedding space being more expressive than the discrete token space, soft-prompting and prefix-tuning are strictly less expressive than full fine-tuning, even with the same number of learnable parameters. Concretely, context-based fine-tuning cannot change the relative attention pattern over the content and can only bias the outputs of an attention layer in a fixed direction. This suggests that while techniques like prompting, in-context learning, soft prompting, and prefix-tuning can effectively elicit skills present in the pretrained model, they cannot learn novel tasks that require new attention patterns.

翻訳日:2023-11-01 19:03:22 公開日:2023-10-30
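
The claim in the abstract above, that context-based fine-tuning "cannot change the relative attention pattern over the content", can be checked numerically for a single attention query. This is a minimal sketch under simplifying assumptions (one head, one prefix position, and content query-key scores left untouched by the prefix); the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_prefix(content_scores, prefix_score):
    """Return (prefix_weight, content_weights) for one query.

    The prefix contributes one extra logit; the content logits are unchanged.
    """
    weights = softmax([prefix_score] + list(content_scores))
    return weights[0], weights[1:]


if __name__ == "__main__":
    content = [0.3, -1.2, 2.0]
    base = softmax(content)
    for prefix_score in (-5.0, 0.0, 5.0):
        p, w = attention_with_prefix(content, prefix_score)
        # Renormalized content weights match the prefix-free attention pattern.
        renormalized = [x / (1.0 - p) for x in w]
        print(prefix_score, p, renormalized, base)
```

In this setting the layer output is $p\,v_{\text{prefix}}+(1-p)\,o_{\text{content}}$, where $o_{\text{content}}$ is the prefix-free output: the prefix can only rescale the original output and add a term in the direction of its own value vector, which matches the "bias in a fixed direction" description in the abstract.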

# $e^{\text{RPCA}}$: 指数型分布族に対するロバスト主成分分析

$e^{\text{RPCA}}$: Robust Principal Component Analysis for Exponential Family Distributions ( http://arxiv.org/abs/2310.19787v1 )

ライセンス: Link先を確認

Xiaojun Zheng, Simon Mak, Liyan Xie, Yao Xie

(参考訳) ロバスト主成分分析(RPCA)は、大きくスパースな外れ値によって破損したデータ行列から低ランク構造を復元する手法として広く用いられている。これらの破損は、オクルージョン、悪意のある改ざん、その他の異常の原因から生じる可能性があり、低ランクの背景とともにそうした破損を同時に同定することは、プロセス監視と診断に不可欠である。しかし、既存のRPCA手法とその拡張は、データ行列の基礎となる確率分布をほとんど考慮していない。この分布は多くの応用において既知であり、強く非ガウス的でありうる。そこで我々は、分布が指数型分布族に属する場合に、低ランク行列とスパース行列への所望の分解を実行できる、Robust Principal Component Analysis for Exponential Family distributions($e^{\text{RPCA}}$)という新しい手法を提案する。効率的な $e^{\text{RPCA}}$ 分解のための新しい交互方向乗数法(ADMM)による最適化アルゴリズムを提示する。 $e^{\text{RPCA}}$ の有効性は、鋼板の欠陥検出と、アトランタ大都市圏における犯罪活動の監視という2つの応用で実証される。

Robust Principal Component Analysis (RPCA) is a widely used method for recovering low-rank structure from data matrices corrupted by significant and sparse outliers. These corruptions may arise from occlusions, malicious tampering, or other causes for anomalies, and the joint identification of such corruptions with low-rank background is critical for process monitoring and diagnosis. However, existing RPCA methods and their extensions largely do not account for the underlying probabilistic distribution for the data matrices, which in many applications are known and can be highly non-Gaussian. We thus propose a new method called Robust Principal Component Analysis for Exponential Family distributions ($e^{\text{RPCA}}$), which can perform the desired decomposition into low-rank and sparse matrices when such a distribution falls within the exponential family. We present a novel alternating direction method of multiplier optimization algorithm for efficient $e^{\text{RPCA}}$ decomposition. The effectiveness of $e^{\text{RPCA}}$ is then demonstrated in two applications: the first for steel sheet defect detection, and the second for crime activity monitoring in the Atlanta metropolitan area.

翻訳日:2023-11-01 18:55:49 公開日:2023-10-30
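
For orientation, the low-rank plus sparse decomposition that the abstract builds on can be sketched for the classical (Gaussian-style) case. The code below is not the $e^{\text{RPCA}}$ algorithm from the paper. It is a minimal sketch of standard principal component pursuit solved with an alternating direction method of multipliers, and the function names, default parameters, and synthetic data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: proximal operator of tau * (l1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def singular_value_threshold(x, tau):
    """Proximal operator of tau * (nuclear norm): shrink the singular values."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def rpca_admm(m, n_iter=500):
    """Split m into low-rank `low` and sparse `sparse` so that m is close to low + sparse."""
    rows, cols = m.shape
    lam = 1.0 / np.sqrt(max(rows, cols))        # commonly used sparsity weight (assumption)
    mu = rows * cols / (4.0 * np.abs(m).sum())  # commonly used penalty parameter (assumption)
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    for _ in range(n_iter):
        low = singular_value_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low + dual / mu, lam / mu)
        dual = dual + mu * (m - low - sparse)   # dual ascent on the constraint m = low + sparse
    return low, sparse


# Synthetic example (hypothetical data): rank-2 background plus 5% large outliers.
rng = np.random.default_rng(0)
true_low = rng.normal(size=(60, 2)) @ rng.normal(size=(2, 60))
true_sparse = np.zeros((60, 60))
mask = rng.random((60, 60)) < 0.05
true_sparse[mask] = rng.choice([-10.0, 10.0], size=int(mask.sum()))
observed = true_low + true_sparse

l_hat, s_hat = rpca_admm(observed)
residual = np.linalg.norm(observed - l_hat - s_hat) / np.linalg.norm(observed)
rel_err = np.linalg.norm(l_hat - true_low) / np.linalg.norm(true_low)
```

The squared-error view of the data implicit in this sketch is what the abstract argues is inadequate for strongly non-Gaussian data; the paper's method replaces it with an exponential-family model, which is not reproduced here.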

# 分類するか、分類するかを学ぶか? 一般カテゴリー発見のための自己符号化

Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery ( http://arxiv.org/abs/2310.19776v1 )

ライセンス: Link先を確認

Sarah Rastegar, Hazel Doughty, Cees G. M. Snoek

(参考訳) テスト時に新しいカテゴリを発表するという試みでは、事前定義されたカテゴリセットによって制限される従来の教師付き認識モデルの固有の制限に直面する。自己教師とオープンワールドの学習の領域において、テスト時のカテゴリ発見への進歩は行われてきたが、重要でしばしば見過ごされる疑問が続いている。本稿では、最適化のレンズを通して \textit{category} を概念化し、よく定義された問題に対する最適な解と見なす。このユニークな概念化を生かして,テスト時に未知のカテゴリを発見できる,新しい,効率的かつ自己管理的な手法を提案する。このアプローチの健全な特徴は、個々のデータインスタンスに最小長のカテゴリコードを割り当てることであり、実世界のデータセットでよく見られる暗黙のカテゴリ階層をカプセル化する。この機構により、カテゴリの粒度の制御が強化され、より詳細なカテゴリを扱うためのモデルが組み合わされる。試行錯誤による評価は, テスト時に未知のカテゴリを管理する上でのソリューションの有効性を実証するものである。さらに、我々の提案を理論的根拠で補強し、その最適性の証明を提供する。私たちのコードは、 \url{https://github.com/sarahrastegar/infosieve} で利用可能です。

In the quest for unveiling novel categories at test time, we confront the inherent limitations of traditional supervised recognition models that are restricted by a predefined category set. While strides have been made in the realms of self-supervised and open-world learning towards test-time category discovery, a crucial yet often overlooked question persists: what exactly delineates a \textit{category}? In this paper, we conceptualize a \textit{category} through the lens of optimization, viewing it as an optimal solution to a well-defined problem. Harnessing this unique conceptualization, we propose a novel, efficient and self-supervised method capable of discovering previously unknown categories at test time. A salient feature of our approach is the assignment of minimum length category codes to individual data instances, which encapsulates the implicit category hierarchy prevalent in real-world datasets. This mechanism affords us enhanced control over category granularity, thereby equipping our model to handle fine-grained categories adeptly. Experimental evaluations, bolstered by state-of-the-art benchmark comparisons, testify to the efficacy of our solution in managing unknown categories at test time. Furthermore, we fortify our proposition with a theoretical foundation, providing proof of its optimality. Our code is available at: \url{https://github.com/SarahRastegar/InfoSieve}.

翻訳日:2023-11-01 18:55:25 公開日:2023-10-30

# 説明可能な人工知能(XAI) 2.0:オープンチャレンジのマニフェストと学際研究の方向性

Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions ( http://arxiv.org/abs/2310.19775v1 )

ライセンス: Link先を確認

Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, Simone Stumpf

(参考訳) 不透明な人工知能(AI)に基づくシステムは、さまざまな現実世界のアプリケーションで繁栄を続けているため、これらのブラックボックスモデルを理解することが最重要になっている。これに対し、説明可能なAI(XAI)は、様々な領域にまたがる実践的、倫理的利益の研究分野として登場した。本稿は,XAIの進歩と実世界のシナリオへの応用に加えて,より広い視点と協調的な取り組みの必要性を強調し,XAI内の課題に対処するものである。我々は,様々な分野の専門家を集めてオープンな問題を特定し,研究課題の同期化に努め,xaiの実用化を加速する。協力的な議論と学際的な協力の促進により、私たちは、XAIを前進させ、その継続的な成功に寄与することを目指しています。我々の目標は、XAIを進めるための包括的な提案を行うことです。この目標を達成するために,我々は,27のオープン問題のマニフェストを9つのカテゴリに分類した。これらの課題は、XAIの複雑さとニュアンスをカプセル化し、将来の研究のためのロードマップを提供する。各問題に対して,利害関係者の集合的知性を活用するために,有望な研究指針を提供する。

As systems based on opaque Artificial Intelligence (AI) continue to flourish in diverse real-world applications, understanding these black box models has become paramount. In response, Explainable AI (XAI) has emerged as a field of research with practical and ethical benefits across various domains. This paper not only highlights the advancements in XAI and its application in real-world scenarios but also addresses the ongoing challenges within XAI, emphasizing the need for broader perspectives and collaborative efforts. We bring together experts from diverse fields to identify open problems, striving to synchronize research agendas and accelerate XAI in practical applications. By fostering collaborative discussion and interdisciplinary cooperation, we aim to propel XAI forward, contributing to its continued success. Our goal is to put forward a comprehensive proposal for advancing XAI. To achieve this goal, we present a manifesto of 27 open problems categorized into nine categories. These challenges encapsulate the complexities and nuances of XAI and offer a road map for future research. For each problem, we provide promising research directions in the hope of harnessing the collective intelligence of interested stakeholders.

翻訳日:2023-11-01 18:55:04 公開日:2023-10-30

# 絡み合い支援古典コミュニケーションのための符号

Codes for entanglement-assisted classical communication ( http://arxiv.org/abs/2310.19774v1 )

ライセンス: Link先を確認

Tushita Prasad, Markus Grassl

(参考訳) 本稿では,固定数の消去や誤りを訂正できる新しい絡み合い支援古典的通信方式を提案する。このスキームは、最大絡み合ったペアによって補助される量子チャネル上で古典的な情報を伝達する。このような課題を古典的な問題に還元して達成するための一般的な枠組みを確立する。私たちは、利用可能な絡み合いの量に基づいて、直接コーディングやスーパーセンスコーディングを使用します。この結果、2つの古典的チャンネルが組み合わさる。このシナリオでは、明示的な符号化スキームを示す。我々は,提案手法を特定の境界値と比較し,そのスキームが最適である特定のパラメータの範囲を求める。提示されたスキームは容易に実現できる。実験で実証されたスーパーデンス符号化の実装のみが必要となる。

We present a new entanglement-assisted classical communication scheme which can correct a fixed number of erasures or errors. The scheme transmits classical information over a quantum channel assisted by maximally entangled pairs. We establish a general framework to accomplish such a task by reducing it to a classical problem. We use direct coding or super-dense coding based on the amount of entanglement available. This results in a combination of two classical channels. For this scenario we present an explicit encoding scheme. We compare our scheme with specific bounds and find certain ranges of parameters where the scheme is optimal. The presented scheme can easily be realized. It requires only the implementation of super-dense coding, which has been demonstrated successfully in experiments.

翻訳日:2023-11-01 18:54:45 公開日:2023-10-30
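
The super-dense coding primitive that the abstract relies on can be illustrated with a small state-vector simulation. This is a textbook sketch for a single noiseless qubit pair, not the error-correcting scheme of the paper; the function names `encode` and `decode` are assumptions chosen for illustration.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
# CNOT with the first qubit as control and the second as target.
CNOT = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0, 0.0]])
# Shared maximally entangled pair (|00> + |11>) / sqrt(2).
BELL = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)


def encode(z_bit, x_bit):
    """Sender applies Z^z_bit X^x_bit to her half of the entangled pair."""
    u = np.linalg.matrix_power(Z, z_bit) @ np.linalg.matrix_power(X, x_bit)
    return np.kron(u, I2) @ BELL


def decode(state):
    """Receiver holds both qubits: CNOT, then Hadamard on the first, then read out."""
    out = np.kron(H, I2) @ (CNOT @ state)
    idx = int(np.argmax(np.abs(out) ** 2))  # the outcome is deterministic here
    return idx >> 1, idx & 1


results = {(a, b): decode(encode(a, b)) for a in (0, 1) for b in (0, 1)}
```

Each of the four two-bit messages maps to a distinct Bell state, so one transmitted qubit carries two classical bits when an entangled pair is available. Without the pair, the sender falls back to direct coding of one bit per qubit, which is the distinction the abstract draws.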

# MM-VID:GPT-4V(ision)による映像理解の促進

MM-VID: Advancing Video Understanding with GPT-4V(ision) ( http://arxiv.org/abs/2310.19773v1 )

ライセンス: Link先を確認

Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

(参考訳) 本稿では、GPT-4Vの能力を利用する統合システムMM-VIDと、視覚、音声、音声の特殊なツールを組み合わせて、高度な映像理解を促進する。 MM-VIDは、長いビデオや、1時間以内のコンテンツの推論や複数のエピソードにまたがるストーリーラインの把握といった複雑なタスクによって引き起こされる課題に対処するように設計されている。 mm-vidはgpt-4vでビデオからスクリプトまで生成し、マルチモーダル要素を長いテキストスクリプトに書き込む。生成されたスクリプトは、文字の動き、アクション、表現、対話を詳述し、ビデオ理解を実現するための大きな言語モデル(LLM)の道を開く。これにより、音声記述、文字識別、マルチモーダルハイレベル理解などの高度な機能を実現する。実験により,様々なビデオ長の異なる動画ジャンルに対するMM-VIDの有効性が示された。また,ゲームやグラフィックユーザインタフェースなど,インタラクティブな環境にも適用可能な可能性を示した。

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding. MM-VID is designed to address the challenges posed by long-form videos and intricate tasks such as reasoning within hour-long content and grasping storylines spanning multiple episodes. MM-VID uses a video-to-script generation with GPT-4V to transcribe multimodal elements into a long textual script. The generated script details character movements, actions, expressions, and dialogues, paving the way for large language models (LLMs) to achieve video understanding. This enables advanced capabilities, including audio description, character identification, and multimodal high-level comprehension. Experimental results demonstrate the effectiveness of MM-VID in handling distinct video genres with various video lengths. Additionally, we showcase its potential when applied to interactive environments, such as video games and graphic user interfaces.

翻訳日:2023-11-01 18:54:36 公開日:2023-10-30

# 動的メタサーフェスアンテナを用いた非視線ユーザトラッキングのための自己回帰注意型ニューラルネットワーク

Autoregressive Attention Neural Networks for Non-Line-of-Sight User Tracking with Dynamic Metasurface Antennas ( http://arxiv.org/abs/2310.19767v1 )

ライセンス: Link先を確認

Kyriakos Stylianopoulos, Murat Bayraktar, Nuria González Prelcic, George C. Alexandropoulos

(参考訳) 次世代無線ネットワークにおけるユーザのローカライゼーションと追跡は、ダイナミックメタサーフェスアンテナ(DMA)のような技術によって革新される可能性がある。一般的に提案されているアルゴリズム的アプローチは、比較的支配的なLine-of-Sight(LoS)パスの仮定に依存するか、DMA要素数に匹敵する長さのパイロット送信シーケンスを必要とする。本稿では,ユーザトラッキングのための2段階の機械学習に基づくアプローチを提案する。新たに提案するアテンションベースニューラルネットワーク(nn)は,ユーザモビリティパターンによらず,ノイズの多いチャネル応答を潜在的なユーザ位置にマッピングするように訓練された。このアーキテクチャは、高次元の周波数応答信号から情報を抽出するために特に修正された顕著な視覚トランスの修正を構成する。第2段階として、過去のユーザ位置に対するnnの予測を学習可能な自己回帰モデルに通し、時間相関チャネル情報を利用して最終位置予測を得る。チャネル推定手法は、部分的に接続された無線周波数チェーンを持つDMA受信アーキテクチャを活用し、パイロット数が減少する。屋外光線追跡のシナリオに対する数値的な評価は、LoSブロックにもかかわらず、様々なマルチパス設定で高い位置精度を達成することができることを示している。

User localization and tracking in the upcoming generation of wireless networks have the potential to be revolutionized by technologies such as Dynamic Metasurface Antennas (DMAs). Commonly proposed algorithmic approaches rely on assumptions about relatively dominant Line-of-Sight (LoS) paths, or require pilot transmission sequences whose length is comparable to the number of DMA elements, leading to limited effectiveness and considerable measurement overheads in blocked-LoS and dynamic multipath environments. In this paper, we present a two-stage machine-learning-based approach for user tracking, specifically designed for non-LoS multipath settings. A newly proposed attention-based Neural Network (NN) is first trained to map noisy channel responses to potential user positions, regardless of user mobility patterns. This architecture is a modification of the prominent vision transformer, adapted for extracting information from high-dimensional frequency response signals. As a second stage, the NN's predictions for the past user positions are passed through a learnable autoregressive model to exploit the time-correlated channel information and obtain the final position predictions. The channel estimation procedure leverages a DMA receive architecture with partially-connected radio frequency chains, which results in a reduced number of pilots. The numerical evaluation over an outdoor ray-tracing scenario illustrates that, despite LoS blockage, this methodology is capable of achieving high position accuracy across various multipath settings.

翻訳日:2023-11-01 18:54:19 公開日:2023-10-30

# 誘導コヒーレンスに基づく干渉計の1次コヒーレンスと経路識別性との相補性

Complementarity relationship between first-order coherence and path distinguishability in an interferometer based on induced coherence ( http://arxiv.org/abs/2310.19765v1 )

ライセンス: Link先を確認

Gerard J. Machado, Lluc Sendra, Adam Vallés, Juan P. Torres

(参考訳) 誘導コヒーレンス(英語版)の概念に基づく干渉計を考えると、異なる二階非線形結晶に由来する2つの信号光子が干渉することができる。 2つの干渉信号光子間の1次コヒーレンスと、それらが原点となる非線形結晶に関する識別情報を定量化するパラメータを関連付ける相補性関係を導出する。驚くべきことに、導出した関係は単光子系を超えており、任意の光子フラックス速度に有効である。導出相補性関係の妥当性を検証した低光子流束レジームの実験結果を示す。

We consider an interferometer based on the concept of induced coherence, where two signal photons that originate in different second-order nonlinear crystals can interfere. We derive a complementarity relationship that links the first-order coherence between the two interfering signal photons with a parameter that quantifies the distinguishing information regarding the nonlinear crystal where they originated. Astonishingly, the derived relationship goes beyond the single-photon regime and is valid for any photon flux rate generated. We show experimental results in the low photon-flux regime that confirm the validity of the derived complementarity relationship.

翻訳日:2023-11-01 18:53:52 公開日:2023-10-30

# ニューラルPDEにおける自己回帰ルネサンス

Autoregressive Renaissance in Neural PDE Solvers ( http://arxiv.org/abs/2310.19763v1 )

ライセンス: Link先を確認

Yolanne Yi Ran Lee

(参考訳) ニューラル偏微分方程式(PDE)の分野における最近の発展は、ニューラル作用素に強く重点を置いている。しかし、ICLR 2022で発表されたBrandstetterらによる論文"Message Passing Neural PDE Solver"では、自己回帰モデルを再検討し、最先端のフーリエニューラル演算子と従来のPDEソルバの両方に匹敵する、あるいは優れたメッセージパッシンググラフニューラルネットワークを、その一般化能力と性能で設計している。このブログ記事は、自動回帰モデルにおける不安定性の一般的な問題と、メッセージパッシンググラフニューラルネットワークアーキテクチャの設計選択に対処するために使用される戦略について詳しく説明している。

Recent developments in the field of neural partial differential equation (PDE) solvers have placed a strong emphasis on neural operators. However, the paper "Message Passing Neural PDE Solver" by Brandstetter et al. published in ICLR 2022 revisits autoregressive models and designs a message passing graph neural network that is comparable with or outperforms both the state-of-the-art Fourier Neural Operator and traditional classical PDE solvers in its generalization capabilities and performance. This blog post delves into the key contributions of this work, exploring the strategies used to address the common problem of instability in autoregressive models and the design choices of the message passing graph neural network architecture.

翻訳日:2023-11-01 18:53:31 公開日:2023-10-30

# 格子場理論からのリアルタイムスピンシステム

Real-time Spin Systems from Lattice Field Theory ( http://arxiv.org/abs/2310.19761v1 )

ライセンス: Link先を確認

Neill C. Warrington

(参考訳) 熱浴におけるスピン系の実時間ダイナミクスを計算するための格子場理論法を構築する。これは、シュウィンガー・ケルディッシュによるタカノの以前の研究と機能的分化技術に基づいて行われる。一般スピンハミルトニアンに対してシュウィンガー・ケルディシュ経路積分を導出し、簡単なシステム上でその方法を実証する。我々の経路積分には符号問題があり、一般にシステムサイズにおいて指数的な実行時間を必要とするが、線形記憶だけを必要とする。後者は、この方法を両方の指数関数である正確な対角化よりも有利にすることができる。我々の経路積分は、符号問題を減らす手法である輪郭変形に適応できる。

We construct a lattice field theory method for computing the real-time dynamics of spin systems in a thermal bath. This is done by building on previous work of Takano with Schwinger-Keldysh and functional differentiation techniques. We derive a Schwinger-Keldysh path integral for generic spin Hamiltonians, then demonstrate the method on a simple system. Our path integral has a sign problem, which generally requires exponential run time in the system size, but requires only linear storage. The latter may place this method at an advantage over exact diagonalization, which is exponential in both. Our path integral is amenable to contour deformations, a technique for reducing sign problems.

翻訳日:2023-11-01 18:53:08 公開日:2023-10-30

# 機械学習モデルを用いた疫病発生予測

Epidemic outbreak prediction using machine learning models ( http://arxiv.org/abs/2310.19760v1 )

ライセンス: Link先を確認

Akshara Pramod, JS Abhishek, Dr. Suganthi K

(参考訳) 今日の世界では、新興・再発展のリスクが高まっており、近年の医療技術の進歩により、地域内での流行の予測が可能となり、感染拡大の予測は、物事のコントロールを維持するために必要な薬品や物流を当局が準備するのに大いに役立ちます。本稿では,機械学習と深層学習のアルゴリズムを用いて,米国ニューヨーク州における流行(インフルエンザ,肝炎,マラリア)の発生を予測すべく,同地域の当局や医療機関にアウトブレイクを知らせるポータルを作成した。このアルゴリズムは、過去のデータを使って5週間のケース数を予測します。グーグル検索トレンド、社会メディアデータ、気象データなどの非臨床要因も、アウトブレイクの確率を予測するために使われてきた。

In today's world, the risk of emerging and re-emerging epidemics has increased. Recent advances in healthcare technology have made it possible to predict an epidemic outbreak in a region. Early prediction of an epidemic outbreak greatly helps the authorities prepare the medications and logistics required to keep things under control. In this article, we try to predict epidemic outbreaks (influenza, hepatitis and malaria) for the state of New York, USA using machine learning and deep learning algorithms, and a portal has been created that can alert the authorities and health care organizations of the region in case of an outbreak. The algorithm takes historical data to predict the possible number of cases for 5 weeks into the future. Non-clinical factors such as Google search trends, social media data and weather data have also been used to predict the probability of an outbreak.

翻訳日:2023-11-01 18:52:49 公開日:2023-10-30

# 英国における鉄道運行コスト計算

Rail journey cost calculator for Great Britain ( http://arxiv.org/abs/2310.19754v1 )

ライセンス: Link先を確認

Federico Botta

(参考訳) 病院や仕事のある地域など、様々な場所のアクセシビリティは、交通システム、都市環境、そして様々な人々が到達できるサービスや機会の不平等を理解する上で重要である。多くの場合、この領域の研究は、特定時間枠内である地域に住む人々が特定の目的地に到達できるかどうかという問題に焦点が当てられている。しかし、こうした旅の費用や手頃な価格であっても、しばしば省略されるか、同じレベルとはみなされない。ここでは,英国におけるトレイン旅行のコストを分析するために,Pythonパッケージと関連するデータセットを紹介する。我々は、これを構築するのに使った元のデータセット、それを分析するために開発したPythonパッケージ、生成した出力データセットを示します。私たちは、研究者、政策立案者、その他の利害関係者が、列車の運行コスト、これに起因する地理的または社会的不平等、輸送システムの改善方法に関する質問を調査できるようにするために、我々の研究を監督しています。

Accessibility of different places, such as hospitals or areas with jobs, is important in understanding transportation systems, urban environments, and potential inequalities in what services and opportunities different people can reach. Often, research in this area is framed around the question of whether people living in an area are able to reach certain destinations within a prespecified time frame. However, the cost of such journeys, and whether they are affordable, is often omitted or not considered to the same level. Here, we present a Python package and an associated data set which allow users to analyse the cost of train journeys in Great Britain. We present the original data set we used to construct this, the Python package we developed to analyse it, and the output data set which we generated. We envisage that our work will allow researchers, policy makers, and other stakeholders to investigate questions around the cost of train journeys, any geographical or social inequalities arising from this, and how the transport system could be improved.

翻訳日:2023-11-01 18:52:22 公開日:2023-10-30

# CLIPを用いたゼロショット視覚分類のためのモーダル内プロキシ学習

Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP ( http://arxiv.org/abs/2310.19752v1 )

ライセンス: Link先を確認

Qi Qian, Yuanhong Xu, Juhua Hu

(参考訳) 視覚言語による事前学習メソッド、例えばCLIPは、クラス名のテキスト埋め込みによるクラスプロキシで、視覚的な分類において印象的なゼロショットのパフォーマンスを示している。しかし、テキストと視覚空間の間のモダリティギャップは、準最適性能をもたらす可能性がある。理論的には、CLIPのコントラスト損失を最小化してもギャップを十分に削減できず、ビジョンタスクの最適なプロキシはビジョン空間にのみ存在する可能性があることを示す。そこで,未ラベルの目標視データから,ゼロショット転送のためのテキストプロキシの助けを借りて,ビジョンプロキシを直接学習することを提案する。さらに,本理論解析により,テキストプロキシが取得した擬似ラベルをさらに洗練し,視覚のモード内プロキシ学習(InMaP)を容易にするための戦略を開発した。広範囲な下流タスクの実験により,提案手法の有効性と効率性が確認された。具体的には、InMaPは単一のGPU上で1分以内にビジョンプロキシを取得することができ、CLIPが事前トレーニングしたViT-L/14@336でImageNet上でのゼロショット精度を77.02\%から80.21\%に改善することができる。コードは \url{https://github.com/idstcv/InMaP} で入手できる。

Vision-language pre-training methods, e.g., CLIP, demonstrate an impressive zero-shot performance on visual categorizations with the class proxy from the text embedding of the class name. However, the modality gap between the text and vision space can result in a sub-optimal performance. We theoretically show that the gap cannot be reduced sufficiently by minimizing the contrastive loss in CLIP and the optimal proxy for vision tasks may reside only in the vision space. Therefore, given unlabeled target vision data, we propose to learn the vision proxy directly with the help from the text proxy for zero-shot transfer. Moreover, according to our theoretical analysis, strategies are developed to further refine the pseudo label obtained by the text proxy to facilitate the intra-modal proxy learning (InMaP) for vision. Experiments on extensive downstream tasks confirm the effectiveness and efficiency of our proposal. Concretely, InMaP can obtain the vision proxy within one minute on a single GPU while improving the zero-shot accuracy from $77.02\%$ to $80.21\%$ on ImageNet with ViT-L/14@336 pre-trained by CLIP. Code is available at \url{https://github.com/idstcv/InMaP}.

翻訳日:2023-11-01 18:51:51 公開日:2023-10-30
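
The idea of a class proxy can be made concrete with a small sketch. Zero-shot classification assigns each image embedding to the class whose proxy has the highest cosine similarity; with text proxies this is the standard CLIP-style setup. The second function below is only a naive stand-in for a vision-space proxy (the mean of the image embeddings that share a pseudo-label) and is not the InMaP learning procedure. All names and the random embeddings are hypothetical, and no real CLIP model is loaded.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit Euclidean norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_predict(image_emb, class_proxies):
    """Assign each image to the proxy with the highest cosine similarity."""
    sims = l2_normalize(image_emb) @ l2_normalize(class_proxies).T
    return sims.argmax(axis=1)


def mean_vision_proxies(image_emb, pseudo_labels, text_proxies):
    """Naive vision-space proxies: per-class mean of pseudo-labelled image embeddings.

    Classes that received no pseudo-label keep their text proxy.
    """
    proxies = l2_normalize(text_proxies).copy()
    feats = l2_normalize(image_emb)
    for k in range(len(proxies)):
        members = feats[pseudo_labels == k]
        if len(members) > 0:
            proxies[k] = members.mean(axis=0)
    return l2_normalize(proxies)


# Synthetic stand-ins for embeddings (hypothetical data, 3 classes, 16 dimensions).
rng = np.random.default_rng(0)
text_proxies = rng.normal(size=(3, 16))
labels = rng.integers(0, 3, size=30)
images = text_proxies[labels] + 0.1 * rng.normal(size=(30, 16))

pseudo = zero_shot_predict(images, text_proxies)               # step 1: text proxies
vision_proxies = mean_vision_proxies(images, pseudo, text_proxies)
refined = zero_shot_predict(images, vision_proxies)            # step 2: vision proxies
```

The synthetic data here has no modality gap, so both steps classify equally well. The sketch only shows where a proxy living in the vision space would replace the text proxy.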

# ソーシャルメディアにおけるスタンス検出のためのチェーンオブソート埋め込み

Chain-of-Thought Embeddings for Stance Detection on Social Media ( http://arxiv.org/abs/2310.19750v1 )

ライセンス: Link先を確認

Joseph Gatto, Omar Sharif, Sarah Masud Preum

(参考訳) ソーシャルメディアでのスタンス検出は大規模言語モデル(llm)では困難であり、オンライン会話における新しいスラングや口語は、しばしば暗黙のスタンスラベルを含んでいる。 CoT(Chain-of-Thought)プロンプトは、最近、スタンス検出タスクのパフォーマンスを改善することが示されている。しかし、cotプロンプトは暗黙のスタンス識別に苦しむ。この課題は、モデルがさまざまなトピックに関連するスラングや進化する知識に慣れるまでに、多くのサンプルが最初に理解することが難しいためである。本研究では,COT推論を埋め込み,従来のRoBERTaを用いた姿勢検出パイプラインに統合することにより,姿勢検出タスクにおけるCOT性能を向上させるCOT埋め込みを導入することで,この問題に対処する。私たちの分析は 1)テキストエンコーダは、COT出力ラベルを歪ませるような小さなエラーや幻覚を伴うCOT推論を利用することができる。 2)テキストエンコーダは,サンプルの予測がドメイン固有のパターンに大きく依存する場合,COT推論の誤解を招く可能性がある。本モデルはソーシャルメディアから収集した複数の姿勢検出データセット上でのSOTA性能を実現する。

Stance detection on social media is challenging for Large Language Models (LLMs), as emerging slang and colloquial language in online conversations often contain deeply implicit stance labels. Chain-of-Thought (COT) prompting has recently been shown to improve performance on stance detection tasks, alleviating some of these issues. However, COT prompting still struggles with implicit stance identification. This challenge arises because many samples are initially challenging to comprehend before a model becomes familiar with the slang and evolving knowledge related to different topics, all of which need to be acquired through the training data. In this study, we address this problem by introducing COT Embeddings, which improve COT performance on stance detection tasks by embedding COT reasonings and integrating them into a traditional RoBERTa-based stance detection pipeline. Our analysis demonstrates that 1) text encoders can leverage COT reasonings with minor errors or hallucinations that would otherwise distort the COT output label, and 2) text encoders can overlook misleading COT reasoning when a sample's prediction heavily depends on domain-specific patterns. Our model achieves SOTA performance on multiple stance detection datasets collected from social media.

翻訳日:2023-11-01 18:51:24 公開日:2023-10-30

# この特性のよいところを教えてください:セグメントパーソナライズされた画像収集の要約にレビューを活用する

Tell Me What Is Good About This Property: Leveraging Reviews For Segment-Personalized Image Collection Summarization ( http://arxiv.org/abs/2310.19743v1 )

ライセンス: Link先を確認

Monika Wysoczanska, Moran Beladev, Karen Lastmann Assaraf, Fengjun Wang, Ofri Kleinfeld, Gil Amsalem, Hadas Harush Boker

(参考訳) 画像収集要約技術は、画像ギャラリーのコンパクトな表現を、その意味的コンテンツをキャプチャする画像の慎重に選択されたサブセットを通して提示することを目的としている。しかし、webコンテンツに関しては、ユーザの特定の意図や好みに応じて、理想的な選択が異なります。これはBooking.comで特に重要であり、ユーザの期待に沿うプロパティとその視覚的要約を提示することが重要である。この課題に対処するために、プロパティレビューを分析し、ユーザが言及する最も重要な側面を抽出することで、プロパティビジュアルの要約におけるユーザ意図を考察する。視覚的な要約にレビューからの洞察を取り入れることで、関連コンテンツをユーザに提示することで要約を強化する。さらに、コストのかかるアノテーションを必要とせずに実現します。人間の知覚研究を含む我々の実験は、ノンパーソナライズとイメージベースのクラスタリングベースラインよりもクロスサムマライザとして生み出される、クロスモーダルアプローチの優位性を示しています。

Image collection summarization techniques aim to present a compact representation of an image gallery through a carefully selected subset of images that captures its semantic content. When it comes to web content, however, the ideal selection can vary based on the user's specific intentions and preferences. This is particularly relevant at Booking.com, where presenting properties and their visual summaries that align with users' expectations is crucial. To address this challenge, we consider user intentions in the summarization of property visuals by analyzing property reviews and extracting the most significant aspects mentioned by users. By incorporating the insights from reviews in our visual summaries, we enhance the summaries by presenting the relevant content to a user. Moreover, we achieve this without the need for costly annotations. Our experiments, including human perceptual studies, demonstrate the superiority of our cross-modal approach, which we call CrossSummarizer, over the no-personalization and image-based clustering baselines.

翻訳日:2023-11-01 18:50:54 公開日:2023-10-30

# 位相変調によるアレイの原子制御

Individual-atom control in array through phase modulation ( http://arxiv.org/abs/2310.19741v1 )

ライセンス: Link先を確認

Guoqing Wang, Wenchao Xu, Changhao Li, Vladan Vuletić, Paola Cappellaro

(参考訳) 低クロストークを維持しながら並列ゲート操作を実行することは、中性原子配列を強力な量子コンピュータやシミュレータに変換するための重要なステップである。クロストーク抑制のために小さな領域に集束した制御ビームは、通常困難であり、特定の遷移に対して不完全な分極を引き起こす。本研究では, 位相変調連続駆動による単一キュービットゲートの設計手法を導入することで, この問題に対処する。特定の量子ビットは、クロストーク効果を著しく抑制する変調パラメータを調整するだけで、個別に高精度に対応できる。格子構造に配置すると、最適クロストーク抑制による個別制御を実現する。追加のアドレッシング光または多重変調周波数の補助により、並列ゲート演算の2つの効率的な実装を開発する。その結果、複雑な波面設計や高出力レーザービームを必要とせず、低エラーのパラレルゲート操作で原子線プラットフォームをスケールアップする方法が得られた。

Performing parallel gate operations while retaining low crosstalk is an essential step in transforming neutral atom arrays into powerful quantum computers and simulators. Tightly focusing control beams in small areas for crosstalk suppression is typically challenging and can lead to imperfect polarization for certain transitions. We tackle such a problem by introducing a method to engineer single qubit gates through phase-modulated continuous driving. Distinct qubits can be individually addressed to high accuracy by simply tuning the modulation parameters, which significantly suppresses crosstalk effects. When arranged in a lattice structure, individual control with optimal crosstalk suppression is achieved. With the assistance of additional addressing light or multiple modulation frequencies, we develop two efficient implementations of parallel-gate operations. Our results pave the way to scaling up atom-array platforms with low-error parallel-gate operations, without requiring complicated wavefront design or high-power laser beams.

翻訳日:2023-11-01 18:50:35 公開日:2023-10-30

# DEFT: 現実世界のハンド・ポリシーのためのデクサラス・ファイン・チューニング

DEFT: Dexterous Fine-Tuning for Real-World Hand Policies ( http://arxiv.org/abs/2310.19797v1 )

ライセンス: Link先を確認

Aditya Kannan, Kenneth Shaw, Shikhar Bahl, Pragna Mannam, Deepak Pathak

(参考訳) デクスタリティはしばしば複雑な操作の基盤として見なされる。人間は、食べ物作りから操作ツールまで、さまざまなスキルを手を使って実行することができる。本稿では,これらの課題,特に軟質で変形可能な物体や,複雑で比較的長い水平なタスクについて検討する。しかし、そのような振る舞いをスクラッチから学ぶことはデータ非効率である。これを回避するために,実世界で直接実行される人間による事前処理を活用する新しいアプローチDEFT(DExterous Fine-Tuning for Hand Policies)を提案する。これらの先行性を改善するために、DEFTは効率的なオンライン最適化手順を必要とする。人間の学習とオンラインの微調整を統合し、ソフトなロボットハンドと組み合わせることで、DEFTはさまざまなタスクにまたがって成功を示し、汎用的な巧妙な操作に向けた堅牢でデータ効率のよい経路を確立する。ビデオの検索結果はhttps://dexterous-finetuning.github.ioでご覧ください。

Dexterity is often seen as a cornerstone of complex manipulation. Humans are able to perform a host of skills with their hands, from making food to operating tools. In this paper, we investigate these challenges, especially in the case of soft, deformable objects as well as complex, relatively long-horizon tasks. However, learning such behaviors from scratch can be data inefficient. To circumvent this, we propose a novel approach, DEFT (DExterous Fine-Tuning for Hand Policies), that leverages human-driven priors, which are executed directly in the real world. In order to improve upon these priors, DEFT involves an efficient online optimization procedure. With the integration of human-based learning and online fine-tuning, coupled with a soft robotic hand, DEFT demonstrates success across various tasks, establishing a robust, data-efficient pathway toward general dexterous manipulation. Please see our website at https://dexterous-finetuning.github.io for video results.

翻訳日:2023-11-01 18:43:52 公開日:2023-10-30

# Syntheseusを用いた逆合成アルゴリズムの再評価

Re-evaluating Retrosynthesis Algorithms with Syntheseus ( http://arxiv.org/abs/2310.19796v1 )

ライセンス: Link先を確認

Krzysztof Maziarz, Austin Tripp, Guoqing Liu, Megan Stanley, Shufang Xie, Piotr Gaiński, Philipp Seidl, Marwin Segler

(参考訳) 分子の合成の計画(レトロシンセシスとも呼ばれる)は近年、機械学習と化学のコミュニティに注目が集まっている。安定した進歩の出現にもかかわらず、不完全なベンチマークと不整合比較は既存の技術の体系的な欠点を隠蔽していると主張する。そこで本研究では,syntheseusというベンチマークライブラリを提案する。このベンチマークライブラリは,単一ステップおよび複数ステップのレトロシンセシスアルゴリズムの一貫性のある評価を可能にする。合成法を用いて, 過去のレトロシンセシスアルゴリズムを再評価し, 慎重に評価すると, 最先端モデルのランクが変化することがわかった。私たちはこの地域の将来の仕事のガイダンスで終わります。

The planning of how to synthesize molecules, also known as retrosynthesis, has been a growing focus of the machine learning and chemistry communities in recent years. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques. To remedy this, we present a benchmarking library called syntheseus which promotes best practice by default, enabling consistent meaningful evaluation of single-step and multi-step retrosynthesis algorithms. We use syntheseus to re-evaluate a number of previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes when evaluated carefully. We end with guidance for future works in this area.

翻訳日:2023-11-01 18:43:38 公開日:2023-10-30

# SimMMDG: マルチモーダルドメイン一般化のためのシンプルで効果的なフレームワーク

SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization ( http://arxiv.org/abs/2310.19795v1 )

ライセンス: Link先を確認

Hao Dong, Ismail Nejjar, Han Sun, Eleni Chatzi, Olga Fink

(参考訳) 実世界のシナリオでは、ドメイン一般化(DG)を達成するには、未知のターゲット分布に一般化するモデルが必要であるため、大きな課題が提示される。未知のマルチモーダル分布への一般化は、異なるモダリティによって示される異なる性質のためにさらに困難をもたらす。マルチモーダルシナリオにおけるドメイン一般化の課題を克服するために,単純かつ効果的なマルチモーダルdgフレームワークであるsimmmdgを提案する。異なるモダリティから同じ埋め込み空間へのマッピング機能はモデルの一般化を妨げると論じている。これに対処するために、各モダリティ内の機能をモダリティ固有のコンポーネントとモダリティ共有コンポーネントに分割することを提案する。我々は,モダリティ共有特徴に対する教師付きコントラスト学習を用いて,共同性を確保し,多様性を促進するためにモダリティ固有の特徴に距離制約を課す。さらに,学習された機能を正規化するクロスモーダル翻訳モジュールを導入し,欠落モダリティ一般化にも利用できる。本稿では,EPIC-KitchensデータセットとHuman-Animal-Cartoon(HAC)データセットを用いたマルチモーダルDGを理論的に支持し,高い性能を実現していることを示す。私たちのソースコードとhacデータセットはhttps://github.com/donghao51/simmmdgで利用可能です。

In real-world scenarios, achieving domain generalization (DG) presents significant challenges as models are required to generalize to unknown target distributions. Generalizing to unseen multi-modal distributions poses even greater difficulties due to the distinct properties exhibited by different modalities. To overcome the challenges of achieving domain generalization in multi-modal scenarios, we propose SimMMDG, a simple yet effective multi-modal DG framework. We argue that mapping features from different modalities into the same embedding space impedes model generalization. To address this, we propose splitting the features within each modality into modality-specific and modality-shared components. We employ supervised contrastive learning on the modality-shared features to ensure they possess joint properties and impose distance constraints on modality-specific features to promote diversity. In addition, we introduce a cross-modal translation module to regularize the learned features, which can also be used for missing-modality generalization. We demonstrate that our framework is theoretically well-supported and achieves strong performance in multi-modal DG on the EPIC-Kitchens dataset and the novel Human-Animal-Cartoon (HAC) dataset introduced in this paper. Our source code and HAC dataset are available at https://github.com/donghao51/SimMMDG.

翻訳日:2023-11-01 18:43:24 公開日:2023-10-30

# 線形モデルのためのロバスト因果バンディット

Robust Causal Bandits for Linear Models ( http://arxiv.org/abs/2310.19794v1 )

ライセンス: Link先を確認

Zirui Yan, Arpan Mukherjee, Burak Varıcı, Ali Tajer

(参考訳) 因果系における報酬関数を最適化するための実験の逐次設計は、因果バンディット(CB)における介入のシーケンシャル設計によって効果的にモデル化することができる。 CBに関する既存の文献では、因果モデルが時間とともに一定であることが重要な仮定である。しかし、この仮定は、常に時間モデルゆらぎを経る複雑なシステムでは必ずしも成り立たない。本稿では,このようなモデル変動に対するCBの堅牢性について述べる。焦点は線形構造方程式モデル(SEM)による因果系である。 SEMと時間変化の前・後統計モデルは、すべて不明である。累積的後悔(cumulative regret)は設計基準として採用され、その目的は、因果モデル全体とそのゆらぎを認識したオラクルに対して、最小の累積後悔を引き起こす一連の介入を設計することである。第一に, 既存手法ではモデル偏差の例がわずかでもあれば, 後悔の劣線形性が維持できないことが判明した。特に、モデルの偏差を持つインスタンス数が$T^\frac{1}{2L}$で、$T$が時間軸であり、$L$がグラフの最長因果経路である場合、既存のアルゴリズムは、$T$で線形後悔する。次に、ロバストなCBアルゴリズムを設計し、その後悔を解析し、後悔の上界及び情報理論的下界を設定する。具体的には、$N$ノードと最大次数$d$のグラフにおいて、モデル偏差$C$の一般的な測度の下で、累積後悔は$\tilde{\mathcal{O}}(d^{L-\frac{1}{2}}(\sqrt{NT} + NC))$で上から抑えられ、$\Omega(d^{\frac{L}{2}-2}\max\{\sqrt{T},d^2C\})$で下から抑えられる。これらの境界を比較すると、提案アルゴリズムは$C$が$o(\sqrt{T})$であるときにほぼ最適な$\tilde{\mathcal{O}}(\sqrt{T})$後悔を達成し、より広い範囲の$C$に対してサブ線形後悔を維持する。

Sequential design of experiments for optimizing a reward function in causal systems can be effectively modeled by the sequential design of interventions in causal bandits (CBs). In the existing literature on CBs, a critical assumption is that the causal models remain constant over time. However, this assumption does not necessarily hold in complex systems, which constantly undergo temporal model fluctuations. This paper addresses the robustness of CBs to such model fluctuations. The focus is on causal systems with linear structural equation models (SEMs). The SEMs and the time-varying pre- and post-interventional statistical models are all unknown. Cumulative regret is adopted as the design criteria, based on which the objective is to design a sequence of interventions that incur the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations. First, it is established that the existing approaches fail to maintain regret sub-linearity with even a few instances of model deviation. Specifically, when the number of instances with model deviation is as few as $T^\frac{1}{2L}$, where $T$ is the time horizon and $L$ is the longest causal path in the graph, the existing algorithms will have linear regret in $T$. Next, a robust CB algorithm is designed, and its regret is analyzed, where upper and information-theoretic lower bounds on the regret are established. Specifically, in a graph with $N$ nodes and maximum degree $d$, under a general measure of model deviation $C$, the cumulative regret is upper bounded by $\tilde{\mathcal{O}}(d^{L-\frac{1}{2}}(\sqrt{NT} + NC))$ and lower bounded by $\Omega(d^{\frac{L}{2}-2}\max\{\sqrt{T},d^2C\})$. Comparing these bounds establishes that the proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$ and maintains sub-linear regret for a broader range of $C$.

翻訳日:2023-11-01 18:42:59 公開日:2023-10-30
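
The two regret rates quoted in the abstract can be compared numerically. The sketch below is only an illustration: constants and logarithmic factors hidden by the $\tilde{\mathcal{O}}$ and $\Omega$ notation are dropped, and the function names are hypothetical, not taken from the paper.

```python
import math


def regret_upper_rate(T, C, N, d, L):
    """Upper-bound rate d^(L - 1/2) * (sqrt(N*T) + N*C); constants and log factors dropped."""
    return d ** (L - 0.5) * (math.sqrt(N * T) + N * C)


def regret_lower_rate(T, C, d, L):
    """Lower-bound rate d^(L/2 - 2) * max(sqrt(T), d^2 * C); constants dropped."""
    return d ** (L / 2 - 2) * max(math.sqrt(T), d ** 2 * C)


def normalized_rates(T, C, N, d, L):
    """Both rates divided by sqrt(T); bounded values indicate sqrt(T)-type scaling."""
    root = math.sqrt(T)
    return regret_upper_rate(T, C, N, d, L) / root, regret_lower_rate(T, C, d, L) / root
```

Calling `normalized_rates(T, T ** 0.25, N, d, L)` for growing `T` uses a deviation budget that is $o(\sqrt{T})$, so both normalized values stay bounded, which is the regime in which the abstract calls the algorithm nearly optimal.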

# 勾配流によるガウスマルチインデックスモデルの学習について

On Learning Gaussian Multi-index Models with Gradient Flow ( http://arxiv.org/abs/2310.19793v1 )

ライセンス: Link先を確認

Alberto Bietti, Joan Bruna and Loucas Pillaud-Vivien

(参考訳) 高次元ガウスデータに対するマルチインデックス回帰問題における勾配流を研究する。マルチインデックス関数は、未知の低ランク線形射影と、任意の未知の低次元リンク関数との合成からなる。そのため、ニューラルネットワークにおける特徴学習の自然なテンプレートとなる。本研究では、低ランク射影をパラメータ化する部分空間よりも無限に速く、低次元リンク関数をノンパラメトリックモデルで学習する2時間スケールのアルゴリズムを考える。部分空間相関行列上に現れる行列半群構造を適切に利用することで、得られるグラスマン多様体上の母集団(population)勾配流ダイナミクスの大域収束を確立し、それに伴う「サドル・ツー・サドル」ダイナミクスの定量的記述を与える。特に、各サドルに対応する時間スケールは、ターゲットリンク関数の適切なエルミート分解によって明示的に特徴づけられる。これらの肯定的な結果とは対照的に、リンク関数が既知で固定されている関連する \emph{planted} 問題は、実際には粗い(rough)最適化ランドスケープを持ち、勾配流ダイナミクスが高い確率で捕捉されうることも示す。

We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection. By appropriately exploiting the matrix semigroup structure arising over the subspace correlation matrices, we establish global convergence of the resulting Grassmannian population gradient flow dynamics, and provide a quantitative description of its associated `saddle-to-saddle' dynamics. Notably, the timescales associated with each saddle can be explicitly characterized in terms of an appropriate Hermite decomposition of the target link function. In contrast with these positive results, we also show that the related \emph{planted} problem, where the link function is known and fixed, in fact has a rough optimization landscape, in which gradient flow dynamics might get trapped with high probability.

翻訳日:2023-11-01 18:42:14 公開日:2023-10-30
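
As a concrete picture of the data model described above, the following sketch samples from a Gaussian multi-index model: labels depend on a Gaussian input only through a low-rank orthonormal projection followed by a link function. The function names, the use of a QR factorization to draw the subspace, and the example link are assumptions made for illustration.

```python
import numpy as np


def sample_multi_index(n, d, r, link, seed=0):
    """Draw n samples (X, y) with y = link(X @ W) for a random orthonormal W in R^{d x r}."""
    rng = np.random.default_rng(seed)
    # Orthonormal basis of a random r-dimensional subspace (columns of W).
    W, _ = np.linalg.qr(rng.standard_normal((d, r)))
    X = rng.standard_normal((n, d))   # isotropic Gaussian inputs
    Z = X @ W                         # low-dimensional projection, shape (n, r)
    y = link(Z)                       # labels depend on X only through Z
    return X, y, W


def subspace_correlation(W, W_star):
    """r x r matrix W^T W_star; its singular values are cosines of the principal angles."""
    return W.T @ W_star
```

For example, `sample_multi_index(1000, 50, 2, lambda Z: Z[:, 0] * Z[:, 1])` produces a rank-2 target. The singular values of `subspace_correlation(W, W_star)` all equal 1 exactly when the two subspaces coincide, which is one simple way to track how well a learned subspace aligns with the true one.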

# Eval4NLP 2023 共有タスク: 説明可能な評価指標としての大規模言語モデルへのプロンプティング

The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics ( http://arxiv.org/abs/2310.19792v1 )

ライセンス: Link先を確認

Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger

(参考訳) パラメータ数と事前学習データの増加に伴い、生成型大規模言語モデル(LLM)は、タスクに関連する例が最小限、あるいは全くなくてもタスクを解く顕著な能力を示している。特に、LLMはテキスト生成タスクの評価指標としても成功裏に利用されている。このような背景のもと、機械翻訳(MT)と要約の評価のためのプロンプティングとスコア抽出の探求を参加者に求めるEval4NLP 2023共有タスクを紹介する。具体的には、使用を許可するLLMのリストを定め、プロンプティングに焦点を絞るためにファインチューニングを禁止する、新しいコンペティション設定を提案する。参加者のアプローチを概観し、MTの3つの言語対と要約データセットにまたがる新しい参照なしテストセットでそれらを評価する。注目すべきことに、タスクの制約にもかかわらず、最高性能のシステムは、GEMBAやComet-Kiwi-XXLなど、より大きなモデルを用いて開発された最近の参照なし指標と同等かそれを上回る結果を達成している。最後に、別トラックとして、LLMが与える説明の妥当性(plausibility)について小規模な人手評価を行う。

With an increasing number of parameters and pre-training data, generative large language models (LLMs) have shown remarkable capabilities to solve tasks with minimal or no task-related examples. Notably, LLMs have been successfully employed as evaluation metrics in text generation tasks. Within this context, we introduce the Eval4NLP 2023 shared task that asks participants to explore prompting and score extraction for machine translation (MT) and summarization evaluation. Specifically, we propose a novel competition setting in which we select a list of allowed LLMs and disallow fine-tuning to ensure a focus on prompting. We present an overview of participants' approaches and evaluate them on a new reference-free test set spanning three language pairs for MT and a summarization dataset. Notably, despite the task's restrictions, the best-performing systems achieve results on par with or even surpassing recent reference-free metrics developed using larger models, including GEMBA and Comet-Kiwi-XXL. Finally, as a separate track, we perform a small-scale human evaluation of the plausibility of explanations given by the LLMs.

翻訳日:2023-11-01 18:41:56 公開日:2023-10-30
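
The two steps the shared task asks participants to explore, prompting and score extraction, can be sketched as follows. The prompt wording, the 0 to 100 scale, and the helper names are hypothetical and are not taken from any participant's system; the sketch only covers the reference-free setting, so no reference translation appears in the prompt.

```python
import re

# Hypothetical reference-free prompt: only the source and the system output are shown.
PROMPT_TEMPLATE = (
    "Rate the quality of the translation on a scale from 0 to 100.\n"
    "Source: {source}\n"
    "Translation: {hypothesis}\n"
    "Score:"
)

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def build_prompt(source, hypothesis):
    """Fill the template for one source/hypothesis pair."""
    return PROMPT_TEMPLATE.format(source=source, hypothesis=hypothesis)


def extract_score(generated_text, low=0.0, high=100.0):
    """Return the first number in the model output, clipped to [low, high]; None if absent."""
    match = _NUMBER.search(generated_text)
    if match is None:
        return None
    return min(high, max(low, float(match.group())))
```

Returning `None` when no number is found keeps unparseable generations visible to the caller, which can then decide whether to retry or to assign a default score.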

# LILO: コードの圧縮と文書化による解釈可能なライブラリの学習

LILO: Learning Interpretable Libraries by Compressing and Documenting Code ( http://arxiv.org/abs/2310.19791v1 )

ライセンス: Link先を確認

Gabriel Grand, Lionel Wong, Matthew Bowers, Theo X. Olausson, Muxin Liu, Joshua B. Tenenbaum, Jacob Andreas

(参考訳) 大規模言語モデル(LLM)は今やコード生成に優れているが、ソフトウェア開発の重要な側面はリファクタリングの技術、すなわちコードを再利用可能で読みやすいプログラムのライブラリへとまとめることである。本稿では、特定の問題領域に合わせたライブラリを構築するために、コードの合成、圧縮、文書化を反復的に行うニューロシンボリックフレームワークLILOを紹介する。LILOは、LLM誘導のプログラム合成と、Stitchによる自動リファクタリングにおける最近のアルゴリズム的進歩を組み合わせる。Stitchは、大規模なコードコーパス全体から最適なラムダ抽象化を効率的に特定するシンボリック圧縮システムである。これらの抽象化を解釈可能にするために、文脈上の使用例に基づいて自然言語の名前とdocstringを推論する自動文書化(AutoDoc)手順を導入する。AutoDocは人間にとっての可読性を改善するだけでなく、LILOのシンセサイザーが学習済みの抽象化を解釈して活用するのを助けることで、性能も向上させることがわかった。文字列編集、シーン推論、グラフィックス合成の3つの帰納的プログラム合成ベンチマークでLILOを評価する。最先端のライブラリ学習アルゴリズムDreamCoderを含む既存のニューラルおよびシンボリック手法と比較して、LILOはより複雑なタスクを解き、言語知識に根ざしたより豊かなライブラリを学習する。

While large language models (LLMs) now excel at code generation, a key aspect of software development is the art of refactoring: consolidating code into libraries of reusable and readable programs. In this paper, we introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch: a symbolic compression system that efficiently identifies optimal lambda abstractions across large code corpora. To make these abstractions interpretable, we introduce an auto-documentation (AutoDoc) procedure that infers natural language names and docstrings based on contextual examples of usage. In addition to improving human readability, we find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions. We evaluate LILO on three inductive program synthesis benchmarks for string editing, scene reasoning, and graphics composition. Compared to existing neural and symbolic methods - including the state-of-the-art library learning algorithm DreamCoder - LILO solves more complex tasks and learns richer libraries that are grounded in linguistic knowledge.

翻訳日:2023-11-01 18:41:37 公開日:2023-10-30

# DiffEnc:学習エンコーダを用いた変分拡散

DiffEnc: Variational Diffusion with a Learned Encoder ( http://arxiv.org/abs/2310.19789v1 )

ライセンス: Link先を確認

Beatrix M. G. Nielsen, Anders Christensen, Andrea Dittadi, Ole Winther

(参考訳) 拡散モデルは、2つの改良を加えた階層的変分オートエンコーダ(VAE)と見なすことができる。すなわち、生成過程における条件付き分布のパラメータ共有と、階層上の独立な項としての損失の効率的な計算である。本研究では、これらの利点を保ちながらモデルに柔軟性を加える、拡散モデルへの2つの変更を検討する。第一に、拡散過程にデータと深さに依存する平均関数を導入し、これにより修正された拡散損失が得られる。提案するフレームワークDiffEncは、CIFAR-10において最先端の尤度を達成する。第二に、逆エンコーダ過程と生成過程のノイズ分散の比を、1に固定するのではなく自由な重みパラメータとする。これは次の理論的洞察をもたらす。有限深さの階層では、エビデンス下界(ELBO)を、重み付き拡散損失アプローチの目的関数として、また推論に特化したノイズスケジュールの最適化に利用できる。一方、無限深さの階層では、ELBOがwell-definedであるためには重みパラメータが1でなければならない。

Diffusion models may be viewed as hierarchical variational autoencoders (VAEs) with two improvements: parameter sharing for the conditional distributions in the generative process and efficient computation of the loss as independent terms over the hierarchy. We consider two changes to the diffusion model that retain these advantages while adding flexibility to the model. Firstly, we introduce a data- and depth-dependent mean function in the diffusion process, which leads to a modified diffusion loss. Our proposed framework, DiffEnc, achieves state-of-the-art likelihood on CIFAR-10. Secondly, we let the ratio of the noise variance of the reverse encoder process and the generative process be a free weight parameter rather than being fixed to 1. This leads to theoretical insights: For a finite depth hierarchy, the evidence lower bound (ELBO) can be used as an objective for a weighted diffusion loss approach and for optimizing the noise schedule specifically for inference. For the infinite-depth hierarchy, on the other hand, the weight parameter has to be 1 to have a well-defined ELBO.

翻訳日:2023-11-01 18:41:14 公開日:2023-10-30
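
One way to read "a data- and depth-dependent mean function in the diffusion process" is that the latent at depth t is centred on an encoded version of the data rather than on the data itself. The sketch below illustrates that reading under stated assumptions: the cosine schedule, the toy encoder, and all function names are hypothetical, and the actual parameterization in the paper may differ.

```python
import numpy as np


def schedule(t):
    """Toy variance-preserving schedule on t in [0, 1]: alpha_t^2 + sigma_t^2 = 1."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)


def identity_encoder(x, t):
    """Standard diffusion: the mean is alpha_t * x, with no learned dependence on depth."""
    return x


def toy_encoder(x, t):
    """Hypothetical depth-dependent encoder: shrinks x toward its mean as t grows."""
    return x - 0.5 * t * (x - x.mean())


def sample_latent(x, t, encoder, rng):
    """Draw z_t from a Gaussian with mean alpha_t * encoder(x, t) and std sigma_t."""
    alpha_t, sigma_t = schedule(t)
    eps = rng.standard_normal(x.shape)
    return alpha_t * encoder(x, t) + sigma_t * eps
```

Passing `identity_encoder` recovers an ordinary forward diffusion sample, so the two encoders can be swapped to see what the extra mean function changes.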

# 予算を固定した局所最適最良腕識別法

Locally Optimal Best Arm Identification with a Fixed Budget ( http://arxiv.org/abs/2310.19788v1 )

ライセンス: Link先を確認

Masahiro Kato

(参考訳) 本研究では、最良の治療アーム、すなわち期待結果が最も高い治療アームを同定する問題を検討する。目標は、より低い誤同定確率で最良の治療アームを同定することであり、この問題は \emph{best arm identification} (BAI) や順序最適化(ordinal optimization)など、多くの研究分野でさまざまな名称のもとに研究されてきた。我々の実験では、治療割り当てのラウンド数は固定されている。各ラウンドで、意思決定者は治療アームを実験単位に割り当てて対応する結果を観測する。結果は、分散が治療アームごとに異なるガウス分布に従う。実験の終了時に、観測に基づいて、治療アームの1つを最良の治療アームの推定として推奨する。意思決定者の目的は、最良の治療アームを誤同定する確率を最小化する実験を設計することである。この目的のもと、最良の治療アームと準最適な治療アームの期待結果のギャップがゼロに近づく小ギャップレジームの下で、誤同定確率の下界を導出する。次に、分散が既知であると仮定して、Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) 戦略を設計する。これは、Neyman (1934) が提案した Neyman 割り当てと、Bubeck et al. (2011) が提案した Uniform-EBA 戦略の拡張である。GNA-EBA戦略については、小ギャップレジームの下でサンプルサイズが無限大に近づくとき誤同定確率が下界と一致するため、漸近的に最適であることを示す。このような最適戦略は、小ギャップレジームで特徴づけられる限定された状況において性能が下界と一致することから、局所漸近最適と呼ぶ。

This study investigates the problem of identifying the best treatment arm, a treatment arm with the highest expected outcome. We aim to identify the best treatment arm with a lower probability of misidentification, which has been explored under various names across numerous research fields, including \emph{best arm identification} (BAI) and ordinal optimization. In our experiments, the number of treatment-allocation rounds is fixed. In each round, a decision-maker allocates a treatment arm to an experimental unit and observes a corresponding outcome, which follows a Gaussian distribution with a variance different among treatment arms. At the end of the experiment, we recommend one of the treatment arms as an estimate of the best treatment arm based on the observations. The objective of the decision-maker is to design an experiment that minimizes the probability of misidentifying the best treatment arm. With this objective in mind, we develop lower bounds for the probability of misidentification under the small-gap regime, where the gaps of the expected outcomes between the best and suboptimal treatment arms approach zero. Then, assuming that the variances are known, we design the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy, which is an extension of the Neyman allocation proposed by Neyman (1934) and the Uniform-EBA strategy proposed by Bubeck et al. (2011). For the GNA-EBA strategy, we show that the strategy is asymptotically optimal because its probability of misidentification aligns with the lower bounds as the sample size approaches infinity under the small-gap regime. We refer to such optimal strategies as locally asymptotic optimal because their performance aligns with the lower bounds within restricted situations characterized by the small-gap regime.

翻訳日:2023-11-01 18:40:53 公開日:2023-10-30
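
The classical two-arm Neyman allocation that the GNA-EBA strategy extends can be sketched as follows: with known standard deviations, each arm receives a share of the fixed budget proportional to its standard deviation, and the arm with the higher sample mean is recommended at the end. This is only the classical two-arm rule, not the generalized allocation from the paper, and the function names are hypothetical.

```python
import numpy as np


def neyman_allocation(sigma_1, sigma_2):
    """Classical two-arm Neyman allocation: shares proportional to standard deviations."""
    total = sigma_1 + sigma_2
    return sigma_1 / total, sigma_2 / total


def run_fixed_budget(means, sigmas, budget, seed=0):
    """Allocate a fixed budget by Neyman shares, then recommend the empirical best arm."""
    rng = np.random.default_rng(seed)
    share_1, _ = neyman_allocation(*sigmas)
    n_1 = int(round(share_1 * budget))
    n_1 = min(max(n_1, 1), budget - 1)   # keep at least one pull per arm
    counts = (n_1, budget - n_1)
    estimates = [
        rng.normal(mean, sigma, size=n).mean()
        for mean, sigma, n in zip(means, sigmas, counts)
    ]
    return int(np.argmax(estimates)), counts
```

With `sigmas=(1.0, 3.0)` the noisier arm receives three quarters of the budget, which equalizes the precision of the two sample means as far as the budget allows.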

# ビジョン言語モデルの"up"はどうなっているのか? 空間的推論における苦戦の調査

What's "up" with vision-language models? Investigating their struggle with spatial reasoning ( http://arxiv.org/abs/2310.19785v1 )

ライセンス: Link先を確認

Amita Kamath, Jack Hessel, Kai-Wei Chang

(参考訳) 最近の視覚言語(VL)モデルは強力だが、「右」と「左」を確実に区別できるだろうか? このような基本的な空間関係に対するモデルの理解を定量化するために、3つの新しいコーパスをキュレートする。これらのテストは、VQAv2のような既存のデータセットよりも正確に空間的推論を切り分ける。例えば、我々のWhat'sUpベンチマークには、オブジェクトの同一性を固定したまま空間関係だけを変化させた写真の組が含まれている(図1参照: モデルは、テーブルの下にいる犬という通常のケースだけでなく、同じテーブルの上にいる同じ犬も理解する必要がある)。18個のVLモデルを評価した結果、いずれも性能が低いことがわかった。例えば、VQAv2でファインチューニングされ、VQAv2では人間に迫る性能を持つBLIPは、我々のベンチマークでは精度56%にとどまり、人間の99%に及ばない。最後に、この驚くべき挙動の原因を調べ、次のことを見出した。 1) LAION-2Bのような一般的な視覚言語事前学習コーパスには、空間関係を学習するための信頼できるデータがほとんど含まれていない。 2) 前置詞を含むインスタンスのアップウェイトや我々のコーパスでのファインチューニングといった基本的なモデリング上の介入は、我々のベンチマークがもたらす課題に対処するには不十分である。これらのコーパスがさらなる研究を促進することを期待し、データとコードを https://github.com/amitakamath/whatsup_vlms で公開する。

Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"? We curate three new corpora to quantify model comprehension of such basic spatial relations. These tests isolate spatial reasoning more precisely than existing datasets like VQAv2, e.g., our What'sUp benchmark contains sets of photographs varying only the spatial relations of objects, keeping their identity fixed (see Figure 1: models must comprehend not only the usual case of a dog under a table, but also, the same dog on top of the same table). We evaluate 18 VL models, finding that all perform poorly, e.g., BLIP finetuned on VQAv2, which nears human parity on VQAv2, achieves 56% accuracy on our benchmarks vs. humans at 99%. We conclude by studying causes of this surprising behavior, finding: 1) that popular vision-language pretraining corpora like LAION-2B contain little reliable data for learning spatial relationships; and 2) that basic modeling interventions like up-weighting preposition-containing instances or fine-tuning on our corpora are not sufficient to address the challenges our benchmarks pose. We are hopeful that these corpora will facilitate further research, and we release our data and code at https://github.com/amitakamath/whatsup_vlms.

翻訳日:2023-11-01 18:39:33 公開日:2023-10-30
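
An evaluation of the kind described above can be framed as caption selection: for each image, the model scores several captions that differ only in the preposition, and a prediction counts as correct when the true caption scores highest. The sketch below assumes that framing and a precomputed score matrix; it is not the released evaluation code, and the function name is hypothetical.

```python
import numpy as np


def caption_choice_accuracy(scores, correct_index):
    """scores[i, j] is the model score for image i paired with caption option j."""
    scores = np.asarray(scores, dtype=float)
    predictions = scores.argmax(axis=1)
    return float((predictions == np.asarray(correct_index)).mean())
```

With four caption options per image, a model that ignores the preposition would sit near the chance level of 25%, which is the baseline to keep in mind when reading accuracy numbers on such a test.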

# CustomNet: テキスト・画像拡散モデルにおける可変視点によるゼロショットオブジェクトのカスタマイズ

CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2310.19784v1 )

ライセンス: Link先を確認

Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan

(参考訳) カスタマイズされたオブジェクトを画像生成に組み込むことは、テキストからの画像生成における魅力的な機能である。しかし、既存の最適化ベースおよびエンコーダベースの手法は、時間のかかる最適化、不十分なアイデンティティ保持、コピー&ペースト効果の頻発といった欠点を抱えている。これらの制限を克服するために、オブジェクトのカスタマイズ過程に3D新規視点合成の能力を明示的に組み込んだ、新しいオブジェクトカスタマイズ手法CustomNetを紹介する。この統合により、空間的な位置関係と視点の調整が容易になり、オブジェクトのアイデンティティを効果的に保持しながら多様な出力が得られる。さらに、テキスト記述またはユーザ定義の特定の画像による位置制御と柔軟な背景制御を可能にする精巧な設計を導入し、既存の3D新規視点合成手法の限界を克服する。加えて、実世界のオブジェクトや複雑な背景をより適切に扱えるデータセット構築パイプラインを活用する。これらの設計を備えた本手法は、テスト時の最適化なしにゼロショットのオブジェクトカスタマイズを実現し、視点、位置、背景を同時に制御できる。その結果、CustomNetはアイデンティティ保持の強化を保証し、多様で調和のとれた出力を生成する。

Incorporating a customized object into image generation presents an attractive feature in text-to-image generation. However, existing optimization-based and encoder-based methods are hindered by drawbacks such as time-consuming optimization, insufficient identity preservation, and a prevalent copy-pasting effect. To overcome these limitations, we introduce CustomNet, a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process. This integration facilitates the adjustment of spatial position relationships and viewpoints, yielding diverse outputs while effectively preserving object identity. Moreover, we introduce delicate designs to enable location control and flexible background control through textual descriptions or specific user-defined images, overcoming the limitations of existing 3D novel view synthesis methods. We further leverage a dataset construction pipeline that can better handle real-world objects and complex backgrounds. Equipped with these designs, our method facilitates zero-shot object customization without test-time optimization, offering simultaneous control over the viewpoints, location, and background. As a result, our CustomNet ensures enhanced identity preservation and generates diverse, harmonious outputs.

翻訳日:2023-11-01 18:39:06 公開日:2023-10-30

# ジェネリック回路アーキテクチャにおける近似t-デザイン

Approximate t-designs in generic circuit architectures ( http://arxiv.org/abs/2310.19783v1 )

ライセンス: Link先を確認

Daniel Belkin, James Allen, Soumik Ghosh, Christopher Kang, Sophia Lin, James Sud, Fred Chong, Bill Fefferman, and Bryan K. Clark

(参考訳) ユニタリ t-デザインは、最初の t 次までのモーメントが最大限ランダムに見える、ユニタリ群上の分布である。これまでの研究は、特定のランダム量子回路アンサンブルが t-デザインを近似する深さについて、いくつかの上界を確立してきた。本稿では、これらの上界が、ハールランダムな2サイトゲートからなる任意の固定アーキテクチャに拡張できることを示す。これは、そのようなアーキテクチャのスペクトルギャップを1Dブリックワークアーキテクチャのスペクトルギャップと関連付けることで達成される。我々の上界は、回路のブロックがサイト上の連結グラフを形成するのに必要な典型的な層数のみを通じて、アーキテクチャの詳細に依存する。この量が幅に依存しない場合、回路は線形深さで近似 t-デザインを形成する。また、非決定論的アーキテクチャに対しても、固定アーキテクチャ上の対応する分布の性質を用いた陰的な上界を与える。

Unitary t-designs are distributions on the unitary group whose first t moments appear maximally random. Previous work has established several upper bounds on the depths at which certain specific random quantum circuit ensembles approximate t-designs. Here we show that these bounds can be extended to any fixed architecture of Haar-random two-site gates. This is accomplished by relating the spectral gaps of such architectures to those of 1D brickwork architectures. Our bound depends on the details of the architecture only via the typical number of layers needed for a block of the circuit to form a connected graph over the sites. When this quantity is independent of width, the circuit forms an approximate t-design in linear depth. We also give an implicit bound for nondeterministic architectures in terms of properties of the corresponding distribution over fixed architectures.

翻訳日:2023-11-01 18:38:46 公開日:2023-10-30
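
The architecture-dependent quantity in the abstract, the number of layers a block of the circuit needs before its gates form a connected graph over the sites, can be computed with a union-find pass over the layers. The sketch below assumes each layer is given as a list of two-site gates `(a, b)`; the function names and this representation are illustrative choices, not the paper's notation.

```python
def layers_until_connected(num_sites, layers):
    """Return how many leading layers are needed to connect all sites, or None if never."""
    parent = list(range(num_sites))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_sites
    for depth, layer in enumerate(layers, start=1):
        for a, b in layer:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b
                components -= 1
        if components == 1:
            return depth
    return None


def brickwork_layers(num_sites, depth):
    """1D brickwork: even layers act on (0,1),(2,3),...; odd layers on (1,2),(3,4),..."""
    return [
        [(i, i + 1) for i in range(layer % 2, num_sites - 1, 2)]
        for layer in range(depth)
    ]
```

For the 1D brickwork reference architecture the answer is two layers regardless of width, which matches the width-independent case in which the abstract states that linear depth suffices.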

# チュートリアル:フォトニックインタフェースを用いた静止キュービットの遠隔絡み合いプロトコル

Tutorial: Remote entanglement protocols for stationary qubits with photonic interfaces ( http://arxiv.org/abs/2310.19878v1 )

ライセンス: Link先を確認

Hans K.C. Beukers, Matteo Pasini, Hyeongrak Choi, Dirk Englund, Ronald Hanson and Johannes Borregaard

(参考訳) 遠く離れた量子システム間にエンタングルメントを生成することは、量子ネットワークの中核をなす。近年、遠隔エンタングルメント生成のための理論的プロトコルが多数提案されており、その多くが実験的に実現されている。本稿では、原子系または固体系の単一スピン間における光子を介したエンタングルメント生成の一般的な機構を明らかにするための、モジュール式の理論フレームワークを提供する。本フレームワークは、既存のプロトコルをさまざまな抽象化レベルで分類し、異なるスキームの要素を新しい方法で組み合わせることを可能にする。これらの抽象化レイヤにより、異なる量子ハードウェア向けのプロトコルを容易に比較できる。特定の実験パラメータに合わせたプロトコルの実用的な評価を可能にするため、本フレームワークに基づく数値シミュレーションを考案し、そのコードをオンラインで公開している。

Generating entanglement between distant quantum systems is at the core of quantum networking. In recent years, numerous theoretical protocols for remote entanglement generation have been proposed, of which many have been experimentally realized. Here, we provide a modular theoretical framework to elucidate the general mechanisms of photon-mediated entanglement generation between single spins in atomic or solid-state systems. Our framework categorizes existing protocols at various levels of abstraction and allows for combining the elements of different schemes in new ways. These abstraction layers make it possible to readily compare protocols for different quantum hardware. To enable the practical evaluation of protocols tailored to specific experimental parameters, we have devised numerical simulations based on the framework with our codes available online.

翻訳日:2023-11-01 18:31:09 公開日:2023-10-30

# Res-Tuning: バックボーンからチューナーを切り離す柔軟で効率的なチューニングパラダイム

Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone ( http://arxiv.org/abs/2310.19859v1 )

ライセンス: Link先を確認

Zeyinzi Jiang, Chaojie Mao, Ziyuan Huang, Ao Ma, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

(参考訳) パラメータ効率の良いチューニングは、大規模な基盤モデルを下流アプリケーションに転移する際の一つのトレンドとなっている。既存の手法は通常、いくつかの軽量なチューナーをバックボーンに埋め込むため、チューナーの設計と学習の両方がベースモデルに大きく依存する。本研究は、チューナーを意図的にバックボーンから切り離す、Res-Tuningと呼ぶ新しいチューニングパラダイムを提案する。理論的および実証的な証拠により、一般的なチューニング手法にはこの切り離しの定式化の下で等価な対応物が存在し、したがって我々のフレームワークに容易に統合できることを示す。構造的な分離のおかげで、チューナーの設計をネットワークアーキテクチャから解放でき、さまざまなチューニング戦略の柔軟な組み合わせが容易になる。さらに、バイパス(すなわちチューナーの列によって形成される経路)を主枝から実質的に切り離し、勾配がチューナーにのみ逆伝播してバックボーンには伝播しない、メモリ効率の良いRes-Tuningの変種を提案する。このような切り離しは、マルチタスク推論においてバックボーンの順伝播を1回で済ませることも可能にする。識別タスクと生成タスクの両方に関する広範な実験により、有効性と効率の両面で、本手法が既存の代替手法より優れていることを示す。プロジェクトページ: https://res-tuning.github.io/

Parameter-efficient tuning has become a trend in transferring large-scale foundation models to downstream applications. Existing methods typically embed some light-weight tuners into the backbone, where both the design and the learning of the tuners are highly dependent on the base model. This work offers a new tuning paradigm, dubbed Res-Tuning, which intentionally unbinds tuners from the backbone. With both theoretical and empirical evidence, we show that popular tuning approaches have their equivalent counterparts under our unbinding formulation, and hence can be integrated into our framework effortlessly. Thanks to the structural disentanglement, we manage to free the design of tuners from the network architecture, facilitating flexible combination of various tuning strategies. We further propose a memory-efficient variant of Res-Tuning, where the bypass (i.e., formed by a sequence of tuners) is effectively detached from the main branch, such that the gradients are back-propagated only to the tuners but not to the backbone. Such a detachment also allows one-time backbone forward for multi-task inference. Extensive experiments on both discriminative and generative tasks demonstrate the superiority of our method over existing alternatives from the perspectives of efficacy and efficiency. Project page: https://res-tuning.github.io/

翻訳日:2023-11-01 18:30:59 公開日:2023-10-30
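
The "one-time backbone forward for multi-task inference" idea can be illustrated with a toy model: because a detached bypass only reads the backbone's intermediate activations and never writes back into them, one backbone pass can be shared by several task-specific bypasses. The classes below are a hypothetical NumPy sketch of that data flow only; the layer shapes, the tanh nonlinearity, and the way the bypass accumulates its state are assumptions, not the Res-Tuning architecture.

```python
import numpy as np


class Backbone:
    """Frozen stack of layers that records every intermediate activation."""

    def __init__(self, weights):
        self.weights = weights
        self.forward_calls = 0

    def forward(self, x):
        self.forward_calls += 1
        activations = [x]
        for W in self.weights:
            x = np.tanh(x @ W)
            activations.append(x)
        return activations


class Bypass:
    """Toy sequence of tuners that reads backbone activations without modifying them."""

    def __init__(self, tuner_weights):
        self.tuner_weights = tuner_weights

    def forward(self, activations):
        h = np.zeros_like(activations[0])
        for act, A in zip(activations[1:], self.tuner_weights):
            h = h + np.tanh(act @ A)   # each tuner adds its contribution to the bypass state
        return h


def multi_task_inference(backbone, bypasses, x):
    """Run the backbone once and reuse its activations for every task-specific bypass."""
    activations = backbone.forward(x)
    return {name: bypass.forward(activations) for name, bypass in bypasses.items()}
```

Since the backbone activations are plain inputs to each bypass, training a bypass in this layout would not require storing backbone gradients, which is the memory argument made in the abstract.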

# Node-Attributed Stochastic Block Modelの厳密な回復とブレグマンハードクラスタリング

Exact Recovery and Bregman Hard Clustering of Node-Attributed Stochastic Block Model ( http://arxiv.org/abs/2310.19854v1 )

ライセンス: Link先を確認

Maximilien Dreveton, Felipe S. Fernandes, Daniel R. Figueiredo

(参考訳) ネットワーククラスタリングは、類似した接続パターンを持つノードの集合(コミュニティ)を特定する問題を扱う。しかし、多くのシナリオでは、ノードはクラスタリング構造と相関する属性も持っている。したがって、ネットワーク情報(エッジ)とノード情報(属性)を併用することで、高性能なクラスタリングアルゴリズムを設計できる。本研究は、ネットワークとノード属性の一般的なモデルの下で、コミュニティラベルの厳密回復(exact recovery)に関する情報理論的な基準を確立し、モデルの Chernoff-Hellinger ダイバージェンスによって決まる相転移を特徴づける。この基準は、厳密回復を得るためにネットワーク情報と属性情報をどのようにトレードオフできるかを示している(例えば、ネットワーク情報の信頼性が高いほど、属性情報に求められる信頼性は低くなる)。また、ネットワーク上の相互作用とノード属性の確率分布が指数型分布族に属すると仮定して、同時尤度を最大化する反復クラスタリングアルゴリズムを提示する。これは、幅広い相互作用(例えば重み付きエッジ)と属性(例えば非ガウスモデル)、さらにスパースネットワークをカバーし、指数型分布族とブレグマンダイバージェンスとの関係も探る。合成データを用いた広範な数値実験により、提案アルゴリズムは、ネットワーク情報のみ、または属性情報のみを利用する古典的アルゴリズムだけでなく、両方の情報源を利用する最先端のアルゴリズムも上回ることが示される。本研究の貢献は、ノード属性付きネットワーク上でコミュニティラベルを推定する際の基本的な限界と実践的な手法に関する洞察を与える。

Network clustering tackles the problem of identifying sets of nodes (communities) that have similar connection patterns. However, in many scenarios, nodes also have attributes that are correlated with the clustering structure. Thus, network information (edges) and node information (attributes) can be jointly leveraged to design high-performance clustering algorithms. Under a general model for the network and node attributes, this work establishes an information-theoretic criterion for the exact recovery of community labels and characterizes a phase transition determined by the Chernoff-Hellinger divergence of the model. The criterion shows how network and attribute information can be exchanged in order to have exact recovery (e.g., more reliable network information requires less reliable attribute information). This work also presents an iterative clustering algorithm that maximizes the joint likelihood, assuming that the probability distribution of network interactions and node attributes belong to exponential families. This covers a broad range of possible interactions (e.g., edges with weights) and attributes (e.g., non-Gaussian models), as well as sparse networks, while also exploring the connection between exponential families and Bregman divergences. Extensive numerical experiments using synthetic data indicate that the proposed algorithm outperforms classic algorithms that leverage only network or only attribute information as well as state-of-the-art algorithms that also leverage both sources of information. The contributions of this work provide insights into the fundamental limits and practical techniques for inferring community labels on node-attributed networks.

翻訳日:2023-11-01 18:30:34 公開日:2023-10-30
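
The link between exponential families and Bregman divergences mentioned above leads to a k-means-style procedure known as Bregman hard clustering: points are assigned to the centre with the smallest divergence, and each centre is updated to the arithmetic mean of its cluster, which minimizes the total divergence for any Bregman divergence. The sketch below applies this to node attributes only and ignores the network edges, so it is a simplified illustration rather than the joint algorithm of the paper; the function names are hypothetical.

```python
import numpy as np


def squared_euclidean(x, mu):
    """Bregman divergence generated by phi(x) = ||x||^2 (Gaussian case)."""
    return np.sum((x - mu) ** 2, axis=-1)


def generalized_kl(x, mu):
    """Bregman divergence of phi(x) = sum(x log x - x); inputs must be strictly positive."""
    return np.sum(x * np.log(x / mu) - x + mu, axis=-1)


def bregman_hard_clustering(X, k, divergence, n_iter=50, seed=0):
    """Alternate nearest-centre assignment and mean updates under a Bregman divergence."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centre in the chosen divergence.
        dists = np.stack([divergence(X, c) for c in centers], axis=1)
        labels = dists.argmin(axis=1)
        # Update step: the cluster mean minimizes sum_i D(x_i, mu) for any Bregman divergence.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Swapping `squared_euclidean` for `generalized_kl` changes only the assignment step, which is what makes the same loop usable for non-Gaussian attribute models.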

# AIアライメント: 総合的な調査

AI Alignment: A Comprehensive Survey ( http://arxiv.org/abs/2310.19852v1 )

ライセンス: Link先を確認

Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

(参考訳) AIアライメントは、人間の意図と価値観に沿ったAIシステムを構築することを目的とする。超人的な能力を持つAIシステムの出現に伴い、ミスアライメントを起こしたシステムに伴う潜在的な大規模リスクが明らかになりつつある。数百人のAI専門家や著名人がAIのリスクへの懸念を表明し、AIによる絶滅リスクの軽減は、パンデミックや核戦争のような他の社会規模のリスクと並んで世界的な優先事項とすべきだと主張している。本稿では、AIアライメントに関する最新の体系的なサーベイが存在しないことを動機として、アライメント研究の中核的な概念、方法論、実践を掘り下げる。まず、AIアライメントの主要な目的として、頑健性(Robustness)、解釈可能性(Interpretability)、制御可能性(Controllability)、倫理性(Ethicality)の4つの原則(RICE)を特定する。現在のアライメント研究の全体像を概説し、それを前方アライメント(forward alignment)と後方アライメント(backward alignment)という2つの主要な構成要素に分解する。前者はアライメント訓練を通じてAIシステムをアラインさせることを目指し、後者はシステムのアライメントに関する証拠を得て、ミスアライメントのリスクを悪化させないようシステムを適切にガバナンスすることを目指す。前方アライメントについては、さまざまな種類のフィードバックからの学習をどのように行うか(いわゆる外部アライメント)と、目標の誤汎化を避けるために分布シフトをどのように克服するか(いわゆる内部アライメント)を論じる。後方アライメントについては、デプロイされたさまざまなAIシステムの価値アライメントの程度を判定でき、前方アライメントの成果に対する保証をさらに高められる検証手法を論じる。これに基づき、チュートリアル、論文集、ブログ、その他の学習リソースを掲載し、継続的に更新されるウェブサイトを https://www.alignmentsurvey.com で公開している。

AI alignment aims to build AI systems that are in accordance with human intentions and values. With the emergence of AI systems possessing superhuman capabilities, the potential large-scale risks associated with misaligned systems become apparent. Hundreds of AI experts and public figures have expressed their concerns about AI risks, arguing that mitigating the risk of extinction from AI should be a global priority, alongside other societal-scale risks such as pandemics and nuclear war. Motivated by the lack of an up-to-date systematic survey on AI alignment, in this paper, we delve into the core concepts, methodology, and practice of alignment research. To begin with, we identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality (RICE). We outline the landscape of current alignment research and decompose them into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. On forward alignment, we discuss how to conduct learning from various types of feedback (a.k.a., outer alignment) and how to overcome the distribution shift to avoid goal misgeneralization (a.k.a., inner alignment). On backward alignment, we discuss verification techniques that can tell the degree of value alignment for various AI systems deployed, which can further improve the assurance of forward alignment outcomes. Based on this, we also release a constantly updated website featuring tutorials, collections of papers, blogs, and other learning resources at https://www.alignmentsurvey.com.

翻訳日:2023-11-01 18:30:11 公開日:2023-10-30

# 側鎖拡散確率モデルによるタンパク質結合に対する突然変異効果の予測

Predicting mutational effects on protein-protein binding via a side-chain diffusion probabilistic model ( http://arxiv.org/abs/2310.19849v1 )

ライセンス: Link先を確認

Shiwei Liu, Tian Zhu, Milong Ren, Chungong Yu, Dongbo Bu, Haicang Zhang

(参考訳) 多くの重要な生物学的プロセスは、タンパク質間相互作用のネットワークに依存している。アミノ酸変異がタンパク質間結合に及ぼす影響を予測することは、タンパク質工学と治療法の発見において極めて重要である。しかし、結合エネルギーに関する注釈付き実験データの不足は、計算手法、特に深層学習に基づく手法を開発するうえで大きな課題となっている。本研究では、ラベルなしの実験的タンパク質構造を活用する表現学習ベースのアプローチであるSidechainDiffを提案する。SidechainDiffは、リーマン拡散モデルを用いて側鎖コンフォメーションの生成過程を学習し、タンパク質間界面における変異の構造的文脈表現も与えることができる。学習した表現を利用することで、タンパク質間結合に対する変異の影響の予測において最先端の性能を達成する。さらに、SidechainDiffは側鎖を対象とした初の拡散ベースの生成モデルであり、主にタンパク質のバックボーン構造の生成に焦点を当ててきた従来の取り組みとは一線を画す。

Many crucial biological processes rely on networks of protein-protein interactions. Predicting the effect of amino acid mutations on protein-protein binding is vital in protein engineering and therapeutic discovery. However, the scarcity of annotated experimental data on binding energy poses a significant challenge for developing computational approaches, particularly deep learning-based methods. In this work, we propose SidechainDiff, a representation learning-based approach that leverages unlabelled experimental protein structures. SidechainDiff utilizes a Riemannian diffusion model to learn the generative process of side-chain conformations and can also give the structural context representations of mutations on the protein-protein interface. Leveraging the learned representations, we achieve state-of-the-art performance in predicting the mutational effects on protein-protein binding. Furthermore, SidechainDiff is the first diffusion-based generative model for side-chains, distinguishing it from prior efforts that have predominantly focused on generating protein backbone structures.

翻訳日:2023-11-01 18:29:38 公開日:2023-10-30

# 連続時間モデルに基づく強化学習における効率的な探索

Efficient Exploration in Continuous-time Model-based Reinforcement Learning ( http://arxiv.org/abs/2310.19848v1 )

ライセンス: Link先を確認

Lenart Treven, Jonas Hübotter, Bhavya Sukhija, Florian Dörfler, Andreas Krause

(参考訳) 強化学習アルゴリズムは、基礎となるシステムがしばしば連続時間であるにもかかわらず、通常は離散時間のダイナミクスを考える。本稿では、非線形常微分方程式(ODE)を用いて連続時間ダイナミクスを表現するモデルベース強化学習アルゴリズムを導入する。よく較正された確率モデルを用いて認識論的不確かさを捉え、探索には楽観主義の原理を用いる。連続時間では、どのように探索するかだけでなく、基礎となるシステムをいつ観測するかも決めなければならないため、我々の後悔上界は測定選択戦略(MSS)の重要性を浮き彫りにする。我々の解析は、等間隔サンプリングなどの一般的なMSSの選択に対して、ODEをガウス過程(GP)でモデル化した場合に後悔が劣線形になることを示す。さらに、適応的でデータ依存の実用的なMSSを提案する。これはGPダイナミクスと組み合わせると、大幅に少ないサンプルで同じく劣線形の後悔を達成する。いくつかの応用において、離散時間モデリングに対する連続時間モデリングの利点と、標準的なベースラインに対する提案の適応的MSSの利点を示す。

Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics using nonlinear ordinary differential equations (ODEs). We capture epistemic uncertainty using well-calibrated probabilistic models, and use the optimistic principle for exploration. Our regret bounds surface the importance of the measurement selection strategy (MSS), since in continuous time we not only must decide how to explore, but also when to observe the underlying system. Our analysis demonstrates that the regret is sublinear when modeling ODEs with Gaussian Processes (GP) for common choices of MSS, such as equidistant sampling. Additionally, we propose an adaptive, data-dependent, practical MSS that, when combined with GP dynamics, also achieves sublinear regret with significantly fewer samples. We showcase the benefits of continuous-time modeling over its discrete-time counterpart, as well as our proposed adaptive MSS over standard baselines, on several applications.

翻訳日:2023-11-01 18:29:24 公開日:2023-10-30
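
The distinction between evolving a system in continuous time and choosing when to observe it can be made concrete with a small simulation: the state follows an ODE, and an equidistant measurement selection strategy records it at evenly spaced times. The sketch below assumes a scalar state, a fourth-order Runge-Kutta integrator, and a control held constant within each integration step; all names are hypothetical and no learned model is involved.

```python
import math


def rk4_step(f, x, u, dt):
    """One classical Runge-Kutta step for dx/dt = f(x, u) with u held constant."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0


def equidistant_times(horizon, num_measurements):
    """Equidistant measurement selection: observe at T/m, 2T/m, ..., T."""
    return [horizon * (i + 1) / num_measurements for i in range(num_measurements)]


def rollout_with_measurements(f, policy, x0, measure_times, dt=1e-3):
    """Integrate the ODE from t = 0 and record the state at each requested time."""
    x, t, observations = x0, 0.0, []
    for t_obs in sorted(measure_times):
        while t < t_obs - 1e-12:
            step = min(dt, t_obs - t)   # shorten the last step to land on t_obs
            x = rk4_step(f, x, policy(x), step)
            t += step
        observations.append((t_obs, x))
    return observations
```

An adaptive strategy would replace `equidistant_times` with a rule that picks observation times from data, while the rollout function stays the same.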

# 特徴選択とハイパーパラメータ最適化のための修正遺伝的アルゴリズム:スパム予測におけるXGBoostの場合

Modified Genetic Algorithm for Feature Selection and Hyper Parameter Optimization: Case of XGBoost in Spam Prediction ( http://arxiv.org/abs/2310.19845v1 )

ライセンス: Link先を確認

Nazeeh Ghatasheh, Ismail Altaharwa, Khaled Aldebei

(参考訳) 近年、オンラインソーシャルネットワーク上のスパムが研究およびビジネスの世界で注目を集めている。Twitterはスパムコンテンツを拡散する媒体として好んで使われるようになった。多くの研究がソーシャルネットワークのスパムへの対処を試みてきた。Twitterは、特徴空間の大きさと不均衡なデータ分布という追加の課題をもたらした。通常、関連研究はこれらの主要な課題の一部にしか焦点を当てていないか、ブラックボックスモデルを生成する。本稿では、不均衡データセット上で次元削減とハイパーパラメータ最適化を同時に行う修正遺伝的アルゴリズムを提案する。このアルゴリズムは、eXtreme Gradient Boosting分類器を初期化し、ツイートデータセットの特徴空間を削減して、スパム予測モデルを生成する。モデルは、10分割層化交差検証を50回繰り返して検証し、ノンパラメトリック統計検定を用いて分析する。得られた予測モデルは、全特徴空間の10\%未満を用いて、幾何平均と精度でそれぞれ平均82.32\%と92.67\%を達成する。実験結果は、修正遺伝的アルゴリズムが$Chi^2$および$PCA$の特徴選択法を上回ることを示している。加えて、eXtreme Gradient Boostingは、スパム予測において、BERTベースの深層学習モデルを含む多くの機械学習アルゴリズムを上回る。さらに、提案手法をSMSスパムのモデリングに適用し、関連研究と比較した。

Recently, spam on online social networks has attracted attention in the research and business world. Twitter has become the preferred medium to spread spam content. Many research efforts attempted to encounter social networks spam. Twitter brought extra challenges represented by the feature space size, and imbalanced data distributions. Usually, the related research works focus on part of these main challenges or produce black-box models. In this paper, we propose a modified genetic algorithm for simultaneous dimensionality reduction and hyper parameter optimization over imbalanced datasets. The algorithm initialized an eXtreme Gradient Boosting classifier and reduced the features space of tweets dataset; to generate a spam prediction model. The model is validated using a 50 times repeated 10-fold stratified cross-validation, and analyzed using nonparametric statistical tests. The resulted prediction model attains on average 82.32\% and 92.67\% in terms of geometric mean and accuracy respectively, utilizing less than 10\% of the total feature space. The empirical results show that the modified genetic algorithm outperforms $Chi^2$ and $PCA$ feature selection methods. In addition, eXtreme Gradient Boosting outperforms many machine learning algorithms, including BERT-based deep learning model, in spam prediction. Furthermore, the proposed approach is applied to SMS spam modeling and compared to related works.

翻訳日:2023-11-01 18:29:06 公開日:2023-10-30

# 遺伝的アルゴリズムとエクストリームブースティングを用いたテレマーケティングプロセスのモデル化:特徴選択とコスト感性分析アプローチ

Modeling the Telemarketing Process using Genetic Algorithms and Extreme Boosting: Feature Selection and Cost-Sensitive Analytical Approach ( http://arxiv.org/abs/2310.19843v1 )

ライセンス: Link先を確認

Nazeeh Ghatasheh, Ismail Altaharwa, Khaled Aldebei

(参考訳) 現在、ほとんど全ての直接マーケティング活動は、個人よりも事実上行われ、脅威的なペースで対人スキルを弱めている。さらに、企業は顧客のマーケティングオファーを受け付ける傾向を感じ、促進しようと努力してきた。デジタルトランスフォーメーションとバーチャルプレゼンスの増加により、企業は新たなマーケティング研究アプローチを探さざるを得なくなった。本研究は,テレマーケティングデータのパワーを活用し,クライアントの長期預金意欲をモデル化し,クライアントの最も重要な特性を見出すことを目的としている。ポルトガルの銀行と社会経済指標の実際のデータは、遠隔販売による意思決定プロセスのモデル化に使用される。この研究には2つの重要な貢献がある。まず、最適な識別特徴とチューン分類器パラメータを同時に選択する新しい遺伝的アルゴリズムに基づく分類器を提案する。次に、説明可能な予測モデルを構築する。最良生成分類モデルは, 繰り返し10倍の階層化クロスバリデーションを50回繰り返し, 集中的に検証し, 選択した特徴を解析した。これらのモデルは関連する作品の興味の種別精度を著しく上回り、それぞれ平均89.07\%と0.059を幾何学的平均とタイプiの誤差で達成した。このモデルは、潜在的利益率を最小限のコストで最大化し、マーケティングの意思決定を支援するための洞察を提供すると期待されている。

Currently, almost all direct marketing activities take place virtually rather than in person, weakening interpersonal skills at an alarming pace. Furthermore, businesses have been striving to sense and foster the tendency of their clients to accept a marketing offer. The digital transformation and the increased virtual presence forced firms to seek novel marketing research approaches. This research aims at leveraging the power of telemarketing data in modeling the willingness of clients to make a term deposit and finding the most significant characteristics of the clients. Real-world data from a Portuguese bank and national socio-economic metrics are used to model the telemarketing decision-making process. This research makes two key contributions. First, it proposes a novel genetic algorithm-based classifier to select the best discriminating features and tune classifier parameters simultaneously. Second, it builds an explainable prediction model. The best-generated classification models were intensively validated using 50 times repeated 10-fold stratified cross-validation, and the selected features were analyzed. The models significantly outperform the related works in terms of class-of-interest accuracy; they attained an average of 89.07\% and 0.059 in terms of geometric mean and type I error, respectively. The model is expected to maximize the potential profit margin at the least possible cost and provide more insights to support marketing decision-making.
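
One way to select features and tune classifier parameters simultaneously is to put both into a single chromosome. The sketch below is only an illustration under stated assumptions: the encoding (a binary feature mask followed by real-valued genes in [0, 1]), the `PARAM_RANGES` search space, and the function names are all hypothetical, and the abstract does not describe the authors' actual encoding. A fitness function (for example a cross-validated geometric mean) would be evaluated on the decoded result and is not shown.

```python
import random

# Hypothetical hyperparameter ranges; not taken from the paper.
PARAM_RANGES = {"max_depth": (2, 10), "learning_rate": (0.01, 0.3)}


def random_chromosome(n_features, rng):
    """Binary feature mask followed by one gene in [0, 1] per hyperparameter."""
    mask = [rng.randint(0, 1) for _ in range(n_features)]
    genes = [rng.random() for _ in PARAM_RANGES]
    return mask + genes


def decode(chromosome, n_features):
    """Split a chromosome into selected feature indices and a parameter dict."""
    selected = [i for i, bit in enumerate(chromosome[:n_features]) if bit == 1]
    params = {}
    for gene, (name, (lo, hi)) in zip(chromosome[n_features:], PARAM_RANGES.items()):
        value = lo + gene * (hi - lo)
        # Integer ranges decode to integers, float ranges stay continuous.
        params[name] = round(value) if isinstance(lo, int) else value
    return selected, params


def mutate(chromosome, n_features, rng, rate=0.1):
    """Flip mask bits and resample parameter genes, each with probability `rate`."""
    child = list(chromosome)
    for i in range(len(child)):
        if rng.random() < rate:
            child[i] = 1 - child[i] if i < n_features else rng.random()
    return child


rng = random.Random(0)
parent = random_chromosome(n_features=5, rng=rng)
child = mutate(parent, n_features=5, rng=rng)
print(decode(child, n_features=5))
```

Because the mask and the parameter genes live in the same individual, crossover and mutation explore feature subsets and parameter settings jointly rather than in two separate searches.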

翻訳日:2023-11-01 18:28:43 公開日:2023-10-30

# 音楽形式の生成

Musical Form Generation ( http://arxiv.org/abs/2310.19842v1 )

ライセンス: Link先を確認

Lilac Atassi

(参考訳) 最近の生成モデルは魅力的な音楽を生み出すことができるが、その実用性は限られている。音楽のバリエーションはしばしば偶然に残され、結果として作曲は構造に欠ける。 1分を超えて伸びるピースは、不整合または反復的になることがある。本稿では,構造化された任意の長さの音楽作品を生成する手法を提案する。このアプローチの中心は、これらのセグメント間の遷移を伴う条件付き生成モデルを用いた音楽セグメントの作成である。ハイレベルな構成を決定するプロンプトの生成は、より細かい低レベルな詳細を作成することと異なる。次に、大きな言語モデルを使用して音楽形式を提案する。

While recent generative models can produce engaging music, their utility is limited. The variation in the music is often left to chance, resulting in compositions that lack structure. Pieces extending beyond a minute can become incoherent or repetitive. This paper introduces an approach for generating structured, arbitrarily long musical pieces. Central to this approach is the creation of musical segments using a conditional generative model, with transitions between these segments. The generation of prompts that determine the high-level composition is distinct from the creation of finer, lower-level details. A large language model is then used to suggest the musical form.

翻訳日:2023-11-01 18:28:21 公開日:2023-10-30

# 安全気候分析への解釈可能なクラスタリングアプローチ--安全気候認識におけるドライバグループ識別の検討

An interpretable clustering approach to safety climate analysis: examining driver group distinction in safety climate perceptions ( http://arxiv.org/abs/2310.19841v1 )

ライセンス: Link先を確認

Kailai Sun, Tianxiang Lan, Yang Miang Goh, Sufiana Safiena, Yueng-Hsiang Huang, Bailey Lytle, Yimin He

(参考訳) 交通産業、特にトラック産業は、職場での事故や死亡の傾向にある。大型トラックによる事故は、全体の死者のかなりの割合を占めた。事故防止における安全気候の重要な役割を認識した研究者は、その要因を理解し、組織内の影響を測定することを試みた。既存のデータ駆動型安全気候研究は目覚ましい進歩を遂げているが、従業員の安全気候認識に基づくクラスタリングは革新的であり、研究に広く活用されていない。安全気候の認識に基づいてドライバーのクラスターを識別することで、組織は従業員をプロファイルし、より影響力のある介入を考案することができる。クラスタリングアプローチの欠如は、従業員のクラスタメンバシップに影響を与える要因の解釈や説明が難しいためかもしれない。さらに、既存の安全関連研究は複数のクラスタリングアルゴリズムを比較しておらず、潜在的なバイアスをもたらした。これらの問題に対処するために,安全気候分析のための解釈可能なクラスタリング手法を提案する。本研究は,トラックドライバーの安全気候に対する認識に基づいて5つのクラスタリングアルゴリズムを比較したものである。部分依存プロットを定量的に評価する新しい手法(QPDP)を提案する。クラスタリングの結果をよりよく解釈するために,様々な解釈可能な機械学習尺度(SHAP、PFI、QPDP)を導入する。 7000人以上のアメリカのトラック運転手から収集されたデータに基づいて、この研究は科学文献に大きく貢献している。様々なドライバーグループを区別する上で、監督ケアの促進が重要な役割を担っている。 Pythonコードはhttps://github.com/NUS-DBE/truck-driver-safety-climateで入手できる。

The transportation industry, particularly the trucking sector, is prone to workplace accidents and fatalities. Accidents involving large trucks accounted for a considerable percentage of overall traffic fatalities. Recognizing the crucial role of safety climate in accident prevention, researchers have sought to understand its factors and measure its impact within organizations. While existing data-driven safety climate studies have made remarkable progress, clustering employees based on their safety climate perception is innovative and has not been extensively utilized in research. Identifying clusters of drivers based on their safety climate perception allows the organization to profile its workforce and devise more impactful interventions. The lack of utilizing the clustering approach could be due to difficulties interpreting or explaining the factors influencing employees' cluster membership. Moreover, existing safety-related studies did not compare multiple clustering algorithms, resulting in potential bias. To address these issues, this study introduces an interpretable clustering approach for safety climate analysis. This study compares 5 algorithms for clustering truck drivers based on their safety climate perceptions. It proposes a novel method for quantitatively evaluating partial dependence plots (QPDP). To better interpret the clustering results, this study introduces different interpretable machine learning measures (SHAP, PFI, and QPDP). Drawing on data collected from more than 7,000 American truck drivers, this study significantly contributes to the scientific literature. It highlights the critical role of supervisory care promotion in distinguishing various driver groups. The Python code is available at https://github.com/NUS-DBE/truck-driver-safety-climate.

翻訳日:2023-11-01 18:28:14 公開日:2023-10-30

# 解釈可能なプロトタイプベースグラフ情報ボトルネック

Interpretable Prototype-based Graph Information Bottleneck ( http://arxiv.org/abs/2310.19906v1 )

ライセンス: Link先を確認

Sangwoo Seo, Sungwon Kim, Chanyoung Park

(参考訳) グラフニューラルネットワーク(GNN)の成功により、意思決定プロセスを理解し、予測に関する説明を提供する必要性が生まれ、ブラックボックスモデルに透過的な説明を提供する説明可能なAI(XAI)が生まれました。近年,プロトタイプの使用により,予測に影響を及ぼすグラフを学習し,モデルの説明可能性の向上に成功している。しかしながら、これらのアプローチは、グラフ全体からの過剰な情報を持つプロトタイプを提供する傾向にあり、キーサブストラクチャの排除や無関係なサブストラクチャの導入につながり、下流タスクにおけるモデルの解釈可能性とパフォーマンスの両方を制限できる。本研究では,モデル予測に重要な入力グラフから重要な部分グラフをプロトタイプに提供するために,情報ボトルネックフレームワークにプロトタイプ学習を組み込んだ解釈可能なプロトタイプベースグラフインフォメーション・ボトルネック(PGIB)という,説明可能なGNNの新たなフレームワークを提案する。これはプロトタイプ学習を、予測性能に重大な影響を与える重要な部分グラフを識別するプロセスに組み込んだ最初の作業である。定性的分析を含む広範囲な実験により、PGIBは予測性能と説明可能性の両方の観点から最先端の手法より優れていることが示された。

The success of Graph Neural Networks (GNNs) has led to a need for understanding their decision-making process and providing explanations for their predictions, which has given rise to explainable AI (XAI) that offers transparent explanations for black-box models. Recently, the use of prototypes has successfully improved the explainability of models by learning prototypes to imply training graphs that affect the prediction. However, these approaches tend to provide prototypes with excessive information from the entire graph, leading to the exclusion of key substructures or the inclusion of irrelevant substructures, which can limit both the interpretability and the performance of the model in downstream tasks. In this work, we propose a novel framework of explainable GNNs, called interpretable Prototype-based Graph Information Bottleneck (PGIB) that incorporates prototype learning within the information bottleneck framework to provide prototypes with the key subgraph from the input graph that is important for the model prediction. This is the first work that incorporates prototype learning into the process of identifying the key subgraphs that have a critical impact on the prediction performance. Extensive experiments, including qualitative analysis, demonstrate that PGIB outperforms state-of-the-art methods in terms of both prediction performance and explainability.

翻訳日:2023-11-01 18:17:48 公開日:2023-10-30

# 一般化スピン系における量子から古典的クロスオーバー--FeI$_2$の温度依存性スピンダイナミクス

Quantum to classical crossover in generalized spin systems -- the temperature-dependent spin dynamics of FeI$_2$ ( http://arxiv.org/abs/2310.19905v1 )

ライセンス: Link先を確認

D. Dahlbom, D. Brooks, M. S. Wilson, S. Chi, A. I. Kolesnikov, M. B. Stone, H. Cao, Y.-W. Li, K. Barros, M. Mourigal, C. D. Batista, X. Bai

(参考訳) 有限温度での量子スピン系のシミュレーションは、多体物理学におけるオープンチャレンジである。この研究は、重要な化合物であるFeI$_2$の温度依存スピンダイナミクスを研究し、非弾性中性子散乱によって測定された動的スピン構造因子$S(\mathbf{q}, \omega)$の現象論的再正規化によって普遍量子効果が説明できるかどうかを決定する。量子-古典対応原理に基づく再正規化スキームは、通常モードを記述する調和振動子に低温で一般的に適用される。しかし、この再正規化を任意に高温に拡張する方法は明確ではない。ここでは古典的モーメントの温度依存正規化を導入し、その大きさは量子和則、すなわち$N_S$個の双極磁気モーメントに対して$\int d\omega d\mathbf{q} S(\mathbf{q}, \omega) = N_S S (S+1)$ を課すことによって決定される。この単純な再正規化スキームは、すべての温度においてFeI$_{2}$について計算および測定された$S(\mathbf{q}, \omega)$ の一致を大幅に改善することを示している。その物質中の双極子モーメントと四極子モーメントの結合ダイナミクスにより、この再正規化手順はSU(3)コヒーレント状態に基づく古典理論や局所多極子モーメントの任意のSU(N)コヒーレント状態表現にまで拡張される。

Simulating quantum spin systems at finite temperatures is an open challenge in many-body physics. This work studies the temperature-dependent spin dynamics of a pivotal compound, FeI$_2$, to determine if universal quantum effects can be accounted for by a phenomenological renormalization of the dynamical spin structure factor $S(\mathbf{q}, \omega)$ measured by inelastic neutron scattering. Renormalization schemes based on the quantum-to-classical correspondence principle are commonly applied at low temperatures to the harmonic oscillators describing normal modes. However, it is not clear how to extend this renormalization to arbitrarily high temperatures. Here we introduce a temperature-dependent normalization of the classical moments, whose magnitude is determined by imposing the quantum sum rule, i.e. $\int d\omega d\mathbf{q} S(\mathbf{q}, \omega) = N_S S (S+1)$ for $N_S$ dipolar magnetic moments. We show that this simple renormalization scheme significantly improves the agreement between the calculated and measured $S(\mathbf{q}, \omega)$ for FeI$_{2}$ at all temperatures. Due to the coupled dynamics of dipolar and quadrupolar moments in that material, this renormalization procedure is extended to classical theories based on SU(3) coherent states, and by extension, to any SU(N) coherent state representation of local multipolar moments.
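
The sum-rule condition quoted above can be made concrete in its simplest limit. The following worked step is a sketch under two assumptions that are not stated in the abstract: the classical dipoles have length $\kappa S$, where $\kappa$ is a hypothetical name for the rescaling factor, and the integrated classical structure factor equals the sum of the squared moment lengths, with no additional quantum-to-classical correspondence factor applied.

```latex
\int d\omega\, d\mathbf{q}\; S_{\mathrm{cl}}(\mathbf{q},\omega) = N_S\,(\kappa S)^2
\quad\Longrightarrow\quad
N_S\,\kappa^2 S^2 = N_S\, S(S+1)
\quad\Longrightarrow\quad
\kappa = \sqrt{1 + \frac{1}{S}} .
```

Under these assumptions $S = 1$ gives $\kappa = \sqrt{2}$. Once a low-temperature correspondence factor is included in the integrand, the value of $\kappa$ that satisfies the sum rule would presumably vary with temperature, which is consistent with the abstract's description of the normalization as temperature-dependent.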

翻訳日:2023-11-01 18:17:27 公開日:2023-10-30

# Herd: 知的作曲家によるプロプライエタリで大規模なLLMのパフォーマンスに匹敵する、複数の小さなLLMの使用

Herd: Using multiple, smaller LLMs to match the performances of proprietary, large LLMs via an intelligent composer ( http://arxiv.org/abs/2310.19902v1 )

ライセンス: Link先を確認

Surya Narayanan Hari, Matt Thomson

(参考訳) 現在、多目的で、Q&A、テキスト要約、コンテンツ生成など、現実世界のタスクを実行できるLLMは1000以上存在する。しかしながら、フリーモデルのアクセシビリティ、スケール、信頼性は、日々のユースケースで広くデプロイされるのを防ぐ。アクセスとスケールの最初の2つの問題に対処するため、HuggingFaceのような組織は、ユーザーがモデルの重みや、異なるパラダイムを使ってトレーニングされたモデルの量子化バージョンをアップロードしたモデルリポジトリと、トレーニングプロセスを記述するモデルカードを作成している。一部のモデルは、一般的に使用されるベンチマークのパフォーマンスを報告しているが、すべてではないし、ベンチマーク上のパフォーマンスとモデル展開コストのトレードオフが現実世界に与える影響を解釈するのも不透明である。本稿では,オープンソースモデルの群れが,インテリジェントルータを介してプロプライエタリモデルのパフォーマンスにマッチするか,あるいは超えられることを示す。オープンソースモデルの群れは、実質的に2.5倍小さいモデルで構成されていても、ChatGPTの精度に合致することを示した。 GPTがクエリに答えられない場合、Herdは少なくとも40%の確率で回答可能なモデルを特定することができる。

Currently, over a thousand LLMs exist that are multi-purpose and are capable of performing real world tasks, including Q&A, text summarization, content generation, etc. However, accessibility, scale and reliability of free models prevent them from being widely deployed in everyday use cases. To address the first two issues of access and scale, organisations such as HuggingFace have created model repositories where users have uploaded model weights and quantized versions of models trained using different paradigms, as well as model cards describing their training process. While some models report performance on commonly used benchmarks, not all do, and interpreting the real world impact of trading off performance on a benchmark for model deployment cost is unclear. Here, we show that a herd of open source models can match or exceed the performance of proprietary models via an intelligent router. We show that a Herd of open source models is able to match the accuracy of ChatGPT, despite being composed of models that are effectively 2.5x smaller. We show that in cases where GPT is not able to answer the query, Herd is able to identify a model that can, at least 40% of the time.
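
The routing idea can be illustrated with a toy dispatcher that sends each query to the model expected to answer it best. This is a sketch only: the `Expert` class, the model names, and the keyword-overlap scorer are all hypothetical stand-ins, and the abstract does not describe how the actual router scores models (a learned scorer is the more likely choice).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Expert:
    name: str
    size_b: float  # parameter count in billions (illustrative values)
    answer: Callable[[str], str]


# Hypothetical keyword profiles standing in for a learned scoring model.
KEYWORDS = {
    "code-7b": {"python", "bug", "function"},
    "math-13b": {"integral", "prove", "sum"},
}


def keyword_scorer(query: str, expert: Expert) -> int:
    """Score an expert by how many of its keywords appear in the query."""
    words = set(query.lower().split())
    return len(words & KEYWORDS.get(expert.name, set()))


def route(query: str, experts: List[Expert], scorer) -> Expert:
    """Pick the highest-scoring expert; ties go to the smaller model."""
    return max(experts, key=lambda e: (scorer(query, e), -e.size_b))


herd = [
    Expert("code-7b", 7.0, lambda q: "code answer"),
    Expert("math-13b", 13.0, lambda q: "math answer"),
]

chosen = route("prove the sum formula", herd, keyword_scorer)
print(chosen.name, chosen.answer("prove the sum formula"))
```

Breaking ties toward the smaller model is one simple way to express the cost side of the performance/cost trade-off that the abstract mentions.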

翻訳日:2023-11-01 18:16:53 公開日:2023-10-30

# MIST: 畳み込み注意混合(CAM)デコーダを用いた医用画像分割トランスフォーマー

MIST: Medical Image Segmentation Transformer with Convolutional Attention Mixing (CAM) Decoder ( http://arxiv.org/abs/2310.19898v1 )

ライセンス: Link先を確認

Md Motiur Rahman, Shiva Shokouhmand, Smriti Bhatt, and Miad Faezipour

(参考訳) 医用画像セグメンテーションに使用される一般的な、有望なディープラーニングアプローチの1つは、自己注意を利用して画素間の長距離依存関係をキャプチャできるトランスフォーマーである。医療画像のセグメンテーションの成功にもかかわらず、トランスフォーマーはマルチモーダル次元のピクセルの局所的なコンテキストを捉えることに限界に直面している。本稿では,新しい畳み込み型注意混合(CAM)デコーダを組み込んだ医用画像分割トランスフォーマー(MIST)を提案する。 MISTには2つの部分がある: 事前訓練された多軸視覚変換器(MaxViT)をエンコーダとして使用し、符号化された特徴表現をCAMデコーダに渡して画像のセグメンテーションを行う。 CAMデコーダでは,マルチヘッド自己アテンション,空間アテンション,圧縮及び励起アテンションモジュールを組み合わせたアテンションミキサーを導入し,すべての空間次元における長距離依存性をキャプチャする。さらに、空間情報ゲインを高めるために、それぞれ、特徴抽出と受容野拡大に深部および浅部畳み込みを用いる。異なるネットワークステージからの低レベルと高レベルの特徴の統合は、スキップ接続によって可能となり、MISTは不要な情報を抑えることができる。実験の結果,CAMデコーダを用いたMISTトランスフォーマは,ACDCおよびSynapseデータセットにおいて、医用画像セグメンテーションに特化して設計された最先端のモデルよりも優れていた。また,CAMデコーダを階層変換器に付加することで,セグメント化性能が大幅に向上することを示した。データとコードを含む私たちのモデルはGitHubで公開されています。

One of the common and promising deep learning approaches used for medical image segmentation is transformers, as they can capture long-range dependencies among the pixels by utilizing self-attention. Despite being successful in medical image segmentation, transformers face limitations in capturing local contexts of pixels in multimodal dimensions. We propose a Medical Image Segmentation Transformer (MIST) incorporating a novel Convolutional Attention Mixing (CAM) decoder to address this issue. MIST has two parts: a pre-trained multi-axis vision transformer (MaxViT) is used as an encoder, and the encoded feature representation is passed through the CAM decoder for segmenting the images. In the CAM decoder, an attention-mixer combining multi-head self-attention, spatial attention, and squeeze and excitation attention modules is introduced to capture long-range dependencies in all spatial dimensions. Moreover, to enhance spatial information gain, deep and shallow convolutions are used for feature extraction and receptive field expansion, respectively. The integration of low-level and high-level features from different network stages is enabled by skip connections, allowing MIST to suppress unnecessary information. The experiments show that our MIST transformer with CAM decoder outperforms the state-of-the-art models specifically designed for medical image segmentation on the ACDC and Synapse datasets. Our results also demonstrate that adding the CAM decoder with a hierarchical transformer improves segmentation performance significantly. Our model with data and code is publicly available on GitHub.

翻訳日:2023-11-01 18:16:31 公開日:2023-10-30

# ジョルダン・ウィグナー変換のない自由フェルミオン

Free fermions with no Jordan-Wigner transformation ( http://arxiv.org/abs/2310.19897v1 )

ライセンス: Link先を確認

Paul Fendley and Balazs Pozsgay

(参考訳) ヨルダン・ウィグナー変換はしばしばフェルミオン作用素の項で量子スピン鎖を書き換えるために用いられる。結果のハミルトニアンがフェルミオンにおいて双線型、すなわちフェルミオンは自由であるとき、厳密なスペクトルは、典型的には系の大きさと線形にしか成長しない行列の固有値から従う。しかし、フェルミオン双線型へのヨルダン・ウィグナー変換を認めないいくつかのハミルトニアンは、依然として同じ種類の自由フェルミオンスペクトルを持つ。そのような『変装中の自由フェルミオン』モデルのスペクトルは、昇降演算子の複雑だが明示的な構成によって正確に見ることができる。さらに、このようなスピン鎖の族を見つける方法を一般化する。正確なスペクトルを計算し、エレガントなグラフ理論構成をモデルに一般化します。また、この族が$N$=2格子超対称性を持つことを説明する。

The Jordan-Wigner transformation is frequently utilised to rewrite quantum spin chains in terms of fermionic operators. When the resulting Hamiltonian is bilinear in fermions, i.e. the fermions are free, the exact spectrum follows from the eigenvalues of a matrix typically growing only linearly with the size of the system. However, several Hamiltonians that do not admit a Jordan-Wigner transformation to fermion bilinears still have the same type of free-fermion spectra. The spectra of such "free fermions in disguise" models can be found exactly by an intricate but explicit construction of the raising and lowering operators. We generalise the methods further to find a family of such spin chains. We compute the exact spectrum, and generalise an elegant graph-theory construction to our model. We also explain how this family admits an $N$=2 lattice supersymmetry.

翻訳日:2023-11-01 18:16:03 公開日:2023-10-30

# 視覚モデルにおける盲点形状の探索

Exploring Geometry of Blind Spots in Vision Models ( http://arxiv.org/abs/2310.19889v1 )

ライセンス: Link先を確認

Sriram Balasubramanian, Gaurang Sriramanan, Vinu Sankar Sadasivan, Soheil Feizi

(参考訳) 様々な環境でディープニューラルネットワークが顕著に成功したにもかかわらず、いくつかの研究は、敵対的攻撃として知られるほとんど知覚できない摂動に対する圧倒的な感受性を示した。一方, 入力空間における大振幅摂動は, ネットワークアクティベーションに有意な変化を起こさないため, 深層ネットワークの感度が低いことも先行研究で確認されている。本研究では,CNN や Transformers などの視覚モデルにおける過敏性現象を詳細に研究し,そのようなネットワークの「等価性」レベルセットの幾何と範囲を研究するための技術を提案する。本研究では,局所勾配の直交成分を用いて入力空間に対する信頼度の高い領域を反復的に探索するレベルセットトラバーサルアルゴリズムを提案する。ソースイメージが与えられた場合、このアルゴリズムは、他のクラスからの任意のイメージと知覚的に似ているにもかかわらず、ソースイメージと同じ同等の信頼レベルにある入力を識別する。さらに、これらの入力に対する高信頼パスによってソースイメージが線形に接続されていることを観察し、深層ネットワークのレベルセットのための星状構造を明らかにする。さらに,モデルが高信頼度を維持しているこれらの連結高次元領域の範囲を同定し,推定する。このプロジェクトのコードはhttps://github.com/sriramb-98/blindspots-neurips-subで公開されている。

Despite the remarkable success of deep neural networks in a myriad of settings, several works have demonstrated their overwhelming sensitivity to near-imperceptible perturbations, known as adversarial attacks. On the other hand, prior works have also observed that deep networks can be under-sensitive, wherein large-magnitude perturbations in input space do not induce appreciable changes to network activations. In this work, we study in detail the phenomenon of under-sensitivity in vision models such as CNNs and Transformers, and present techniques to study the geometry and extent of "equi-confidence" level sets of such networks. We propose a Level Set Traversal algorithm that iteratively explores regions of high confidence with respect to the input space using orthogonal components of the local gradients. Given a source image, we use this algorithm to identify inputs that lie in the same equi-confidence level set as the source image despite being perceptually similar to arbitrary images from other classes. We further observe that the source image is linearly connected by a high-confidence path to these inputs, uncovering a star-like structure for level sets of deep networks. Furthermore, we attempt to identify and estimate the extent of these connected higher-dimensional regions over which the model maintains a high degree of confidence. The code for this project is publicly available at https://github.com/SriramB-98/blindspots-neurips-sub
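
The core geometric step, moving toward a target while using only directions orthogonal to the local gradient, can be sketched in a few lines. This is a simplified first-order illustration with hypothetical function names, not the authors' implementation: the published algorithm may include further terms (for instance a correction that keeps the confidence from drifting on nonlinear models), and the toy "model" here is a linear function whose level sets are exactly flat.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def orthogonal_step(x, target, grad, step_size=0.1):
    """Move x toward target using only the part of (target - x) orthogonal to grad."""
    direction = [t - xi for t, xi in zip(target, x)]
    g2 = dot(grad, grad)
    if g2 > 0:
        coeff = dot(direction, grad) / g2
        # Remove the component along the gradient so confidence is unchanged to first order.
        direction = [d - coeff * g for d, g in zip(direction, grad)]
    return [xi + step_size * d for xi, d in zip(x, direction)]


def traverse(x, target, grad_fn, steps=200, step_size=0.1):
    """Repeat orthogonal steps, re-evaluating the gradient at each point."""
    for _ in range(steps):
        x = orthogonal_step(x, target, grad_fn(x), step_size)
    return x


# Toy confidence function f(x) = w . x with a constant gradient w.
w = [1.0, 0.0]
source = [1.0, 0.0]
target = [5.0, 3.0]

end = traverse(source, target, lambda x: w)
print(end, dot(w, source), dot(w, end))
```

In this linear toy case the endpoint keeps the source's value of `f` while moving as close to the target as the level set allows. For a real network the gradient changes from point to point, so the confidence is preserved only approximately at each step.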

翻訳日:2023-11-01 18:15:49 公開日:2023-10-30

# BTRec: BERTベースのパーソナライズドツアーのためのトラジェクトリレコメンデーション

BTRec: BERT-Based Trajectory Recommendation for Personalized Tours ( http://arxiv.org/abs/2310.19886v1 )

ライセンス: Link先を確認

Ngai Lam Ho, Roy Ka-Wei Lee, Kwan Hui Lim

(参考訳) 楽しい休日を過ごす観光客にとって重要な課題は、特に不慣れな都市を訪れる際に、適切なレコメンデーションを含む十分に計画された旅程を持つことである。多くのツアーレコメンデーションツールは、人気のあるPoint of Interest(POI)やルーティング制約など、限られた数の要素のみを考慮に入れている。したがって、それらが提供するソリューションは、必ずしもシステムの個々のユーザーと一致しないかもしれない。本稿では,POIBERT埋め込みアルゴリズムを拡張し,BERTフレームワークを用いてPOIのパーソナライズされた旅程を推奨する反復アルゴリズムBTREC (BERT-based Trajectory Recommendation)を提案する。我々のBTRECアルゴリズムは、過去のPOI訪問と共にユーザの人口統計情報を修正されたBERT言語モデルに組み込んで、ソースと宛先のPOIのペアが与えられた場合、パーソナライズされたPOI旅程予測を推奨する。我々の推薦システムは、訪問するPOIを最大化するとともに、POIのカテゴリと時間可用性に関するユーザの好みを考慮した旅程を作成することができる。我々の推薦アルゴリズムは、自然言語処理(NLP)における文補完の問題に大きく影響されている。異なる大きさの8つの都市からなるデータセットを用いて,提案アルゴリズムが安定であり,リコール,精度,F1-scoreで測定した他の多くのシーケンス予測アルゴリズムよりも優れていることを示す。

An essential task for tourists having a pleasant holiday is to have a well-planned itinerary with relevant recommendations, especially when visiting unfamiliar cities. Many tour recommendation tools only take into account a limited number of factors, such as popular Points of Interest (POIs) and routing constraints. Consequently, the solutions they provide may not always align with the individual users of the system. We propose an iterative algorithm in this paper, namely: BTREC (BERT-based Trajectory Recommendation), that extends from the POIBERT embedding algorithm to recommend personalized itineraries on POIs using the BERT framework. Our BTREC algorithm incorporates users' demographic information alongside past POI visits into a modified BERT language model to recommend a personalized POI itinerary prediction given a pair of source and destination POIs. Our recommendation system can create a travel itinerary that maximizes POIs visited, while also taking into account user preferences for categories of POIs and time availability. Our recommendation algorithm is largely inspired by the problem of sentence completion in natural language processing (NLP). Using a dataset of eight cities of different sizes, our experimental results demonstrate that our proposed algorithm is stable and outperforms many other sequence prediction algorithms, measured by recall, precision, and F1-scores.

翻訳日:2023-11-01 18:15:26 公開日:2023-10-30

# 有界ゲート複雑性の量子状態とユニタリの学習

Learning quantum states and unitaries of bounded gate complexity ( http://arxiv.org/abs/2310.19882v1 )

ライセンス: Link先を確認

Haimeng Zhao, Laura Lewis, Ishaan Kannan, Yihui Quek, Hsin-Yuan Huang, Matthias C. Caro

(参考訳) 量子状態トモグラフィーは困難なことで悪名高いが、ほとんどの状態は実用志向のトモグラファーにとってほとんど関心の対象ではない。自然界に現れる状態とユニタリが有界ゲート複雑性であることを考えると、効率的な学習が可能かどうかを問うのは自然である。本研究では,$G$個の2量子ビットゲートを持つ量子回路が生成する状態を小さなトレース距離まで学習するためには,サンプルの複雑性が$G$で線形にスケールすることが必要かつ十分であることを証明した。また、$G$個のゲートによって生成されるユニタリを小さな平均ケースエラーまで学習するための最適なクエリの複雑さが、$G$で線形にスケールすることも証明する。サンプル効率のよい学習は可能であるが、合理的な暗号予想の下では、ゲート複雑性$G$の状態とユニタリを学習する計算複雑性は$G$で指数関数的にスケールしなければならないことを示す。これらの結果が量子機械学習モデルの表現性に基本的な制限を課し、ユニタリ学習におけるノーフリーランチ定理に関する新たな視点を提供することを示す。その結果,量子状態やユニタリの学習の複雑さが,それらの状態やユニタリの生成の複雑さとどのように関係しているかが明らかになった。

While quantum state tomography is notoriously hard, most states hold little interest to practically-minded tomographers. Given that states and unitaries appearing in Nature are of bounded gate complexity, it is natural to ask if efficient learning becomes possible. In this work, we prove that to learn a state generated by a quantum circuit with $G$ two-qubit gates to a small trace distance, a sample complexity scaling linearly in $G$ is necessary and sufficient. We also prove that the optimal query complexity to learn a unitary generated by $G$ gates to a small average-case error scales linearly in $G$. While sample-efficient learning can be achieved, we show that under reasonable cryptographic conjectures, the computational complexity for learning states and unitaries of gate complexity $G$ must scale exponentially in $G$. We illustrate how these results establish fundamental limitations on the expressivity of quantum machine learning models and provide new perspectives on no-free-lunch theorems in unitary learning. Together, our results answer how the complexity of learning quantum states and unitaries relate to the complexity of creating these states and unitaries.

翻訳日:2023-11-01 18:15:00 公開日:2023-10-30

# 運動学を用いた量子パルトンシャワー

Quantum Parton Shower with Kinematics ( http://arxiv.org/abs/2310.19881v1 )

ライセンス: Link先を確認

Christian W. Bauer, So Chigusa, Masahito Yamazaki

(参考訳) 量子干渉効果を効率的に取り入れるパルトンシャワーは、量子コンピュータ上で効率的に実行されることが示されている。しかし、これらの量子パルトンシャワーには、イベントの再構築に必要な完全なキネマティックな情報が含まれておらず、古典的なパルトンシャワーではvetoアルゴリズムを使用する必要がある。本研究では,進化変数の離散化に関する1つの余分な仮定を加えることで,事象における完全な量子干渉を再現し,運動的効果を含む量子vetoアルゴリズムを構築することができることを示す。最終的に,このvetoアルゴリズムで生成された量子干渉効果は古典的に扱いやすく,効率的な古典的アルゴリズムを考案できることを示した。

Parton showers which can efficiently incorporate quantum interference effects have been shown to run efficiently on quantum computers. However, so far these quantum parton showers did not include the full kinematical information required to reconstruct an event, which in classical parton showers requires the use of a veto algorithm. In this work, we show that adding one extra assumption about the discretization of the evolution variable allows one to construct a quantum veto algorithm, which reproduces the full quantum interference in the event and allows one to include kinematical effects. We finally show that for certain initial states the quantum interference effects generated in this veto algorithm are classically tractable, such that an efficient classical algorithm can be devised.

翻訳日:2023-11-01 18:14:40 公開日:2023-10-30

# 収縮グラフを用いた境界エンタングルメントエントロピー

Bounding Entanglement Entropy with Contracted Graphs ( http://arxiv.org/abs/2310.19874v1 )

ライセンス: Link先を確認

Cynthia Keeler, William Munizzi, Jason Pollack

(参考訳) 我々の以前の研究であるarXiv:2204.07593とarXiv:2306.01043に従って、クリフォード回路下の量子状態の軌道を「reachability graphs」で研究し、頂点が同じエントロピーベクトルを持つ量子状態のクラスを表す「contracted graphs」を導入する。これらの収縮グラフはクリフォード群の二重コセットを表し、左コセットは開始状態の安定化部分群から構築され、右コセットはエントロピー保存演算子から構築される。我々は、スタビライザー状態のための収縮グラフと、W状態とディック状態について研究し、状態の収縮グラフの直径が、その$2$-qubit Clifford軌道の「エントロピー多様性」をいかに制限するかについて議論した。任意の量子状態に対して、任意の$n$-qubit Clifford回路を用いて生成できるエントロピーベクトルの数に上限を導出する。我々は、同じクリフォード軌道内の状態の重力双対の相対的近接に対するホログラフィック的含意を推測する。我々はクリフォード群の下でエントロピーがどのように進化するかに焦点をあてるが、我々の二重コセット形式、すなわち収縮グラフ図は、一般ゲート集合や一般状態の性質に拡張可能である。

Following on our previous work arXiv:2204.07593 and arXiv:2306.01043 studying the orbits of quantum states under Clifford circuits via 'reachability graphs', we introduce 'contracted graphs' whose vertices represent classes of quantum states with the same entropy vector. These contracted graphs represent the double cosets of the Clifford group, where the left cosets are built from the stabilizer subgroup of the starting state and the right cosets are built from the entropy-preserving operators. We study contracted graphs for stabilizer states, as well as W states and Dicke states, discussing how the diameter of a state's contracted graph constrains the 'entropic diversity' of its $2$-qubit Clifford orbit. We derive an upper bound on the number of entropy vectors that can be generated using any $n$-qubit Clifford circuit, for any quantum state. We speculate on the holographic implications for the relative proximity of gravitational duals of states within the same Clifford orbit. Although we concentrate on how entropy evolves under the Clifford group, our double-coset formalism, and thus the contracted graph picture, is extendable to generic gate sets and generic state properties.

翻訳日:2023-11-01 18:14:28 公開日:2023-10-30

# ニューラルネットワークを用いた計量流

Metric Flows with Neural Networks ( http://arxiv.org/abs/2310.19870v1 )

ライセンス: Link先を確認

James Halverson and Fabian Ruehle

(参考訳) ニューラルネットワークの勾配降下によって引き起こされるリーマン計量の空間内の流れの理論を考案する。これは、Calabi-Yauメトリクスをニューラルネットワークで近似する最近の進歩に動機づけられたものであり、ニューラルネットワークの空間におけるフローの理解に関する最近の進歩によって実現されている。我々は、時間とともに進化する複雑な非局所物体である計量ニューラルタンジェントカーネルによって制御される対応する計量フロー方程式を導出する。しかし、多くのアーキテクチャでは、カーネルが固定され、ダイナミクスが単純化される無限幅極限がある。追加の仮定は流れの局所性を誘導し、3次元Poincaré予想を解くのに使われたリッチフローのペレルマンの定式化の実現を可能にする。これらのアイデアを数値Calabi-Yauメトリクスに適用し,特徴学習の重要性に関する議論を行った。

We develop a theory of flows in the space of Riemannian metrics induced by neural network gradient descent. This is motivated in part by recent advances in approximating Calabi-Yau metrics with neural networks and is enabled by recent advances in understanding flows in the space of neural networks. We derive the corresponding metric flow equations, which are governed by a metric neural tangent kernel, a complicated, non-local object that evolves in time. However, many architectures admit an infinite-width limit in which the kernel becomes fixed and the dynamics simplify. Additional assumptions can induce locality in the flow, which allows for the realization of Perelman's formulation of Ricci flow that was used to resolve the 3d Poincaré conjecture. We apply these ideas to numerical Calabi-Yau metrics, including a discussion on the importance of feature learning.

翻訳日:2023-11-01 18:14:01 公開日:2023-10-30

# 1次元量子シミュレータにおける有限エネルギー相転移の観察

Observation of a finite-energy phase transition in a one-dimensional quantum simulator ( http://arxiv.org/abs/2310.19869v1 )

ライセンス: Link先を確認

Alexander Schuckert, Or Katz, Lei Feng, Eleanor Crane, Arinjoy De, Mohammad Hafezi, Alexey V. Gorshkov, Christopher Monroe

(参考訳) 自然界で最も衝撃的な多体現象の1つは、温度やエネルギーが臨界値に達すると、マクロな性質が突然変化することである。このような平衡遷移は2次元と3次元の空間次元で予測・観測されてきたが、1次元(1D)系には存在しないと考えられてきた。 50年前、ダイソンとチューレスは、1Dの相転移が長距離相互作用の存在下で起こりうることを指摘したが、平衡状態の準備と十分な長距離相互作用の実現の両方の必要性から、これまで実験的な実現は行われていない。ここでは, 1Dにおける有限エネルギー相転移の最初の実験的実証について報告する。積状態である初期状態を時間発展させ、多体ハミルトニアンのダイナミクスの下で熱化させることで有限エネルギー状態を準備できるという単純な観察を用いている。 1Dトラップイオン量子シミュレータで異なるエネルギーの初期状態を準備することにより、長距離相互作用量子系の有限エネルギー相図を研究する。強磁性平衡相転移と低エネルギー偏極常磁性体から高エネルギー非偏極常磁性体へのクロスオーバーを最大$23$スピンの系で観測し,数値シミュレーションとよく一致した。我々の研究は、量子シミュレーターが以前は到達不可能だった相を有限エネルギー密度で実現し、研究する能力を示す。

One of the most striking many-body phenomena in nature is the sudden change of macroscopic properties as the temperature or energy reaches a critical value. Such equilibrium transitions have been predicted and observed in two and three spatial dimensions, but have long been thought not to exist in one-dimensional (1D) systems. Fifty years ago, Dyson and Thouless pointed out that a phase transition in 1D can occur in the presence of long-range interactions, but an experimental realization has so far not been achieved due to the requirement to both prepare equilibrium states and realize sufficiently long-range interactions. Here we report on the first experimental demonstration of a finite-energy phase transition in 1D. We use the simple observation that finite-energy states can be prepared by time-evolving product initial states and letting them thermalize under the dynamics of a many-body Hamiltonian. By preparing initial states with different energies in a 1D trapped-ion quantum simulator, we study the finite-energy phase diagram of a long-range interacting quantum system. We observe a ferromagnetic equilibrium phase transition as well as a crossover from a low-energy polarized paramagnet to a high-energy unpolarized paramagnet in a system of up to $23$ spins, in excellent agreement with numerical simulations. Our work demonstrates the ability of quantum simulators to realize and study previously inaccessible phases at finite energy density.

翻訳日:2023-11-01 18:13:49 公開日:2023-10-30

# 粒子数保存系における典型的な絡み合いエントロピー

Typical entanglement entropy in systems with particle-number conservation ( http://arxiv.org/abs/2310.19862v1 )

ライセンス: Link先を確認

Yale Cheng, Rohit Patil, Yicheng Zhang, Marcos Rigol, Lucas Hackl

(参考訳) 我々は, 全粒子数$N$, 体積$V$, サブシステム分率$f=V_A/V$ (ここで$V_A$はサブシステムの体積) の関数として, 任意の種類の識別不能粒子を含む系において, 典型的な二部絡みエントロピー $\langle S_A\rangle_N$ を計算する。べき級数 $\langle S_A\rangle_N=a f V+b\sqrt{V}+c+o(1)$ として結果を展開し、$c$ が普遍であること(つまりシステム型とは独立である)、そして$a$ と $b$ は局所ヒルベルト空間次元を特徴づける生成関数から得られることを見出した。我々は, ボソン, フェルミオン, スピン, およびそれらの混合物など, 幅広い異なる系を研究することにより, 本研究の汎用性を示す。我々は,量子カオススピンおよびボソン系の高励起固有状態の絡み合いエントロピーを解析結果が記述し,それが可積分系のものとは異なることを示す証拠を与える。

We calculate the typical bipartite entanglement entropy $\langle S_A\rangle_N$ in systems containing indistinguishable particles of any kind as a function of the total particle number $N$, the volume $V$, and the subsystem fraction $f=V_A/V$, where $V_A$ is the volume of the subsystem. We expand our result as a power series $\langle S_A\rangle_N=a f V+b\sqrt{V}+c+o(1)$, and find that $c$ is universal (i.e., independent of the system type), while $a$ and $b$ can be obtained from a generating function characterizing the local Hilbert space dimension. We illustrate the generality of our findings by studying a wide range of different systems, e.g., bosons, fermions, spins, and mixtures thereof. We provide evidence that our analytical results describe the entanglement entropy of highly excited eigenstates of quantum-chaotic spin and boson systems, which is distinct from that of integrable counterparts.

翻訳日:2023-11-01 18:13:25 公開日:2023-10-30

# 競合RLにおける事後サンプリング:関数近似と部分観測

Posterior Sampling for Competitive RL: Function Approximation and Partial Observation ( http://arxiv.org/abs/2310.19861v1 )

ライセンス: Link先を確認

Shuang Qiu, Ziyu Dai, Han Zhong, Zhaoran Wang, Zhuoran Yang, Tong Zhang

(参考訳) 本稿では、一般関数近似の文脈における競合強化学習(RL)の事後サンプリングアルゴリズムについて検討する。ゼロサムマルコフゲーム(MG)を対象に、自己対戦(self-play)と敵対的学習という2つの重要な設定のもとで、まず関数近似の複雑性尺度として自己対戦および敵対的な一般化エルーダー係数(GEC)を提案する。これはMGにおける探索と活用のトレードオフを捉える。自己対戦GECに基づいて、両プレイヤーを制御してNash均衡を学習し、状態の部分観測性にも対処できるモデルベースの自己対戦事後サンプリング手法を提案する。さらに、相手が敵対的な方策をとるMG学習に適合する部分観測可能なMGモデルの集合を特定する。敵対的GECを組み込むことで、部分観測性を持ちうる敵対的MG学習のためのモデルベース事後サンプリング手法を提案する。また、提案アルゴリズムに対し、提案するGECおよびエピソード数 $T$ に対して劣線形にスケールするリグレット上界を与える。我々の知る限り、本研究は、自己対戦と敵対的学習の両方において、完全観測および部分観測のMGにおける扱いやすいゼロサムMGクラスの大半に適用可能な、競合RLのための汎用的なモデルベース事後サンプリングアルゴリズムを初めて開発したものである。

This paper investigates posterior sampling algorithms for competitive reinforcement learning (RL) in the context of general function approximations. Focusing on zero-sum Markov games (MGs) under two critical settings, namely self-play and adversarial learning, we first propose the self-play and adversarial generalized eluder coefficient (GEC) as complexity measures for function approximation, capturing the exploration-exploitation trade-off in MGs. Based on self-play GEC, we propose a model-based self-play posterior sampling method to control both players to learn Nash equilibrium, which can successfully handle the partial observability of states. Furthermore, we identify a set of partially observable MG models fitting MG learning with the adversarial policies of the opponent. Incorporating the adversarial GEC, we propose a model-based posterior sampling method for learning adversarial MG with potential partial observability. We further provide low regret bounds for proposed algorithms that can scale sublinearly with the proposed GEC and the number of episodes $T$. To the best of our knowledge, we for the first time develop generic model-based posterior sampling algorithms for competitive RL that can be applied to a majority of tractable zero-sum MG classes in both fully observable and partially observable MGs with self-play and adversarial learning.

翻訳日:2023-11-01 18:13:00 公開日:2023-10-30

# 生成ニューラルネットワークにおける物理知識の獲得

The Acquisition of Physical Knowledge in Generative Neural Networks ( http://arxiv.org/abs/2310.19943v1 )

ライセンス: Link先を確認

Luca M. Schulze Buschoff, Eric Schulz, Marcel Binz

(参考訳) 子どもは成長するにつれて、周囲の物理的過程を直感的に理解するようになる。その物理的理解は段階的に発達し、これまでの実証研究で広範囲に明らかにされてきた発達軌跡に沿って進む。本稿では、物理的理解をテストベッドとして、深層生成ニューラルネットワークの学習軌跡が子どもの発達軌跡とどのように比較されるかを調べる。人間の発達に関する2つの異なる仮説、すなわち確率的最適化と複雑性の増加を検証できるアプローチを概説する。我々のモデルは多くの物理過程を正確に予測できるものの、いずれの仮説の下でも、その学習軌跡は子どもの発達軌跡には従わないことがわかった。

As children grow older, they develop an intuitive understanding of the physical processes around them. Their physical understanding develops in stages, moving along developmental trajectories which have been mapped out extensively in previous empirical research. Here, we investigate how the learning trajectories of deep generative neural networks compare to children's developmental trajectories using physical understanding as a testbed. We outline an approach that allows us to examine two distinct hypotheses of human development - stochastic optimization and complexity increase. We find that while our models are able to accurately predict a number of physical processes, their learning trajectories under both hypotheses do not follow the developmental trajectories of children.

翻訳日:2023-11-01 18:05:39 公開日:2023-10-30

# Split-NER:2つの質問応答型分類による固有表現認識

Split-NER: Named Entity Recognition via Two Question-Answering-based Classifications ( http://arxiv.org/abs/2310.19942v1 )

ライセンス: Link先を確認

Jatin Arora and Youngja Park

(参考訳) 本研究では、NERの問題を (1) エンティティタイプによらずエンティティ言及のスパンを単純に抽出する Span Detection、(2) スパンをエンティティタイプに分類する Span Classification という2つの論理的サブタスクに分割して取り組む。さらに、両サブタスクを質問応答(QA)問題として定式化し、サブタスクごとに個別に最適化できる2つのより軽量なモデルを構築する。4つのクロスドメインデータセットを用いた実験により、この2段階アプローチが効果的かつ時間効率に優れることを示す。我々のシステム SplitNER は、OntoNotes5.0、WNUT17、およびサイバーセキュリティデータセットでベースラインを上回り、BioNLP13CG では同等の性能を示す。いずれの場合も、対応するQAベースラインと比べてトレーニング時間が大幅に短縮される。本システムの有効性は、スパン検出と分類のそれぞれについてBERTモデルを別々に、計2回微調整することに由来する。ソースコードは https://github.com/c3sr/split-ner にある。

In this work, we address the NER problem by splitting it into two logical sub-tasks: (1) Span Detection which simply extracts entity mention spans irrespective of entity type; (2) Span Classification which classifies the spans into their entity types. Further, we formulate both sub-tasks as question-answering (QA) problems and produce two leaner models which can be optimized separately for each sub-task. Experiments with four cross-domain datasets demonstrate that this two-step approach is both effective and time efficient. Our system, SplitNER outperforms baselines on OntoNotes5.0, WNUT17 and a cybersecurity dataset and gives on-par performance on BioNLP13CG. In all cases, it achieves a significant reduction in training time compared to its QA baseline counterpart. The effectiveness of our system stems from fine-tuning the BERT model twice, separately for span detection and classification. The source code can be found at https://github.com/c3sr/split-ner.
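
The two-stage decomposition described above can be sketched as a pipeline that takes a span detector and a span classifier as interchangeable parts. In the sketch below both parts are toy rule-based stand-ins (`toy_detect`, `toy_classify`, and the label lookup are hypothetical); in the paper each stage is a separately fine-tuned BERT-based QA model, which is not reproduced here.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)

def split_ner(
    tokens: List[str],
    detect_spans: Callable[[List[str]], List[Span]],
    classify_span: Callable[[List[str], Span], str],
) -> List[Tuple[int, int, str]]:
    """Stage 1 finds type-agnostic mention spans; stage 2 labels each span."""
    spans = detect_spans(tokens)
    return [(s, e, classify_span(tokens, (s, e))) for s, e in spans]

def toy_detect(tokens: List[str]) -> List[Span]:
    """Stand-in detector: treat runs of capitalised tokens as mentions."""
    spans, start = [], None
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans

def toy_classify(tokens: List[str], span: Span) -> str:
    """Stand-in classifier: a tiny lookup table instead of a QA model."""
    mention = " ".join(tokens[span[0]:span[1]])
    return {"Alice": "PER", "New York": "LOC"}.get(mention, "MISC")

tokens = "yesterday Alice flew to New York".split()
result = split_ner(tokens, toy_detect, toy_classify)
print(result)
```

Because the two stages only communicate through the list of spans, either one can be retrained or swapped without touching the other, which is the property the abstract credits for the reduced training time.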

翻訳日:2023-11-01 18:05:27 公開日:2023-10-30

# リアプノフに基づくDropout Deep Neural Network (Lb-DDNN) コントローラ

Lyapunov-Based Dropout Deep Neural Network (Lb-DDNN) Controller ( http://arxiv.org/abs/2310.19938v1 )

ライセンス: Link先を確認

Saiedeh Akbari, Emily J. Griffis, Omkar Sudhir Patil, Warren E. Dixon

(参考訳) ディープニューラルネットワーク(DNN)ベースの適応コントローラは、非線形力学系における非構造的不確かさを補償するために使用できる。しかし、DNNは過学習と共適応(co-adaptation)の影響を非常に受けやすい。ドロップアウト正則化は、訓練中にノードをランダムに脱落させて、過学習や共適応といった問題を緩和するアプローチである。本稿では、ドロップアウトDNNベースの適応コントローラを開発する。開発したドロップアウト手法では、DNN内の各層ごとに確率的に選択された重みを非活性化できる。同時に、オンライン教師なし学習のためにDNNの全層の重みを更新する、リアプノフに基づくリアルタイム重み適応則を導入する。追従誤差の漸近収束を保証するために、非平滑リアプノフに基づく安定性解析を行う。開発したドロップアウトDNNベースの適応コントローラのシミュレーション結果は、ドロップアウト正則化のないベースラインの適応DNNベースコントローラと比較して、追従誤差が38.32%改善し、関数近似誤差が53.67%改善し、制御入力量(control effort)が50.44%低減したことを示している。

Deep neural network (DNN)-based adaptive controllers can be used to compensate for unstructured uncertainties in nonlinear dynamic systems. However, DNNs are also very susceptible to overfitting and co-adaptation. Dropout regularization is an approach where nodes are randomly dropped during training to alleviate issues such as overfitting and co-adaptation. In this paper, a dropout DNN-based adaptive controller is developed. The developed dropout technique allows the deactivation of weights that are stochastically selected for each individual layer within the DNN. Simultaneously, a Lyapunov-based real-time weight adaptation law is introduced to update the weights of all layers of the DNN for online unsupervised learning. A non-smooth Lyapunov-based stability analysis is performed to ensure asymptotic convergence of the tracking error. Simulation results of the developed dropout DNN-based adaptive controller indicate a 38.32% improvement in the tracking error, a 53.67% improvement in the function approximation error, and 50.44% lower control effort when compared to a baseline adaptive DNN-based controller without dropout regularization.
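
To make "deactivation of weights that are stochastically selected for each individual layer" concrete, here is a minimal NumPy sketch of per-layer weight masking in a forward pass. The layer sizes, the `tanh` activation, and the keep probability are assumptions for illustration; the Lyapunov-based adaptation law and the controller itself are not shown.

```python
import numpy as np

def sample_weight_masks(weights, keep_prob, rng):
    """Draw an independent 0/1 mask for every weight of every layer."""
    return [(rng.random(W.shape) < keep_prob).astype(W.dtype) for W in weights]

def forward(x, weights, masks):
    """Forward pass in which masked weights contribute nothing."""
    h = x
    for i, (W, M) in enumerate(zip(weights, masks)):
        h = (W * M) @ h
        if i < len(weights) - 1:  # no activation on the output layer
            h = np.tanh(h)
    return h

rng = np.random.default_rng(0)
weights = [
    rng.standard_normal((8, 4)),
    rng.standard_normal((8, 8)),
    rng.standard_normal((2, 8)),
]
masks = sample_weight_masks(weights, keep_prob=0.8, rng=rng)
y = forward(rng.standard_normal(4), weights, masks)
print(y.shape)
```

In an online setting one would presumably redraw the masks at chosen instants and update only the active weights, but how the paper schedules this is not stated in the abstract.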

翻訳日:2023-11-01 18:05:07 公開日:2023-10-30

# オブジェクト検出のためのFew-Annotation Learningに向けて:トランスフォーマーモデルの方が効率的か?

Towards Few-Annotation Learning for Object Detection: Are Transformer-based Models More Efficient ? ( http://arxiv.org/abs/2310.19936v1 )

ライセンス: Link先を確認

Quentin Bouniot, Angélique Loesch, Romaric Audigier, Amaury Habrard

(参考訳) オブジェクト検出のような専門的で密な下流タスクでは、データのラベル付けに専門知識が必要で非常に高価になりうるため、few-shotモデルや半教師付きモデルがより魅力的な選択肢となる。few-shot設定では、同程度のパラメータ数において、Transformerベースのオブジェクト検出器が畳み込みベースの2段階モデルよりも優れていることが観察されるが、半教師付き設定における最近のアプローチと組み合わせた場合にはそれほど効果的ではない。本稿では、少数アノテーション学習の設定において、現在最先端のオブジェクト検出器であるDeformable DETRに適した、生徒・教師(student-teacher)アーキテクチャを用いる半教師付き手法を提案する。この手法は、教師モデルが生成する擬似ラベルに対する、調整に敏感な後処理に依存しない。本手法を半教師付きオブジェクト検出ベンチマークであるCOCOとPascal VOCで評価したところ、特にアノテーションが少ない場合に従来手法を上回った。我々の貢献は、同様のオブジェクト検出手法をこの設定に適応させる新たな可能性も開くと考える。

For specialized and dense downstream tasks such as object detection, labeling data requires expertise and can be very expensive, making few-shot and semi-supervised models much more attractive alternatives. While in the few-shot setup we observe that transformer-based object detectors perform better than convolution-based two-stage models for a similar amount of parameters, they are not as effective when used with recent approaches in the semi-supervised setting. In this paper, we propose a semi-supervised method tailored for the current state-of-the-art object detector Deformable DETR in the few-annotation learning setup using a student-teacher architecture, which avoids relying on a sensitive post-processing of the pseudo-labels generated by the teacher model. We evaluate our method on the semi-supervised object detection benchmarks COCO and Pascal VOC, and it outperforms previous methods, especially when annotations are scarce. We believe that our contributions open new possibilities to adapt similar object detection methods in this setup as well.

翻訳日:2023-11-01 18:04:45 公開日:2023-10-30

# 環境ニューラルプロセスのためのSim2Real

Sim2Real for Environmental Neural Processes ( http://arxiv.org/abs/2310.19932v1 )

ライセンス: Link先を確認

Jonas Scholz, Tom R. Andersson, Anna Vaughan, James Requeima, Richard E. Turner

(参考訳) 機械学習(ML)ベースの気象モデルは、近年急速に改善されている。これらのモデルは通常、数値データ同化システムから得られる格子化された再解析データで訓練される。しかし、再解析データには、物理法則に関する仮定や時空間分解能の低さといった制約がある。再解析と現実とのギャップから、気象観測所などの観測データでMLモデルを直接訓練することへの関心が高まっている。散在する疎な環境観測をモデル化するには、スケーラブルで柔軟なMLアーキテクチャが必要であり、その一つが畳み込み条件付きニューラルプロセス(ConvCNP)である。ConvCNPは、格子上と格子外の両方のコンテキストデータを条件として、ターゲット位置で不確実性を考慮した予測を行うことを学習できる。しかし、実観測の疎性は、ConvCNPのようなデータを大量に必要とする深層学習モデルにとって課題となる。一つの解決策となりうるのが「Sim2Real」、すなわち再解析データで事前学習し、観測データで微調整することである。我々は、微調整に用いる気象観測所の数を変えながら、ドイツ全域の地表気温を補間するように訓練したConvCNPを用いてSim2Realを分析する。評価用に取り置いた気象観測所において、Sim2Realによる訓練は、再解析データのみ、または観測所データのみで訓練した同じモデルアーキテクチャを大幅に上回り、再解析データが実観測から学習するための足がかりとなりうることを示している。したがって、Sim2Realは、天気予報と気候モニタリングのより正確なモデルを可能にしうる。

Machine learning (ML)-based weather models have recently undergone rapid improvements. These models are typically trained on gridded reanalysis data from numerical data assimilation systems. However, reanalysis data comes with limitations, such as assumptions about physical laws and low spatiotemporal resolution. The gap between reanalysis and reality has sparked growing interest in training ML models directly on observations such as weather stations. Modelling scattered and sparse environmental observations requires scalable and flexible ML architectures, one of which is the convolutional conditional neural process (ConvCNP). ConvCNPs can learn to condition on both gridded and off-the-grid context data to make uncertainty-aware predictions at target locations. However, the sparsity of real observations presents a challenge for data-hungry deep learning models like the ConvCNP. One potential solution is 'Sim2Real': pre-training on reanalysis and fine-tuning on observational data. We analyse Sim2Real with a ConvCNP trained to interpolate surface air temperature over Germany, using varying numbers of weather stations for fine-tuning. On held-out weather stations, Sim2Real training substantially outperforms the same model architecture trained only with reanalysis data or only with station data, showing that reanalysis data can serve as a stepping stone for learning from real observations. Sim2Real could thus enable more accurate models for weather prediction and climate monitoring.

翻訳日:2023-11-01 18:04:25 公開日:2023-10-30

# モデルベース再パラメータ化方策勾配法:理論と実践的アルゴリズム

Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms ( http://arxiv.org/abs/2310.19927v1 )

ライセンス: Link先を確認

Shenao Zhang, Boyi Liu, Zhaoran Wang, Tuo Zhao

(参考訳) 再パラメータ化(ReParameterization; RP)方策勾配法(PGM)は、ロボット工学やコンピュータグラフィックスにおける連続制御タスクに広く採用されている。しかし、近年の研究により、長期の強化学習問題に適用すると、モデルベースRP PGMは勾配分散が爆発するカオス的で非平滑な最適化ランドスケープに直面し、収束が遅くなりうることが明らかになっている。これは、深層生成モデルの訓練のような問題において再パラメータ化法の勾配推定の分散は小さいという従来の考え方とは対照的である。この現象を理解するため、モデルベースRP PGMを理論的に検討し、最適化の困難に対する解決策を探る。具体的には、モデルベースRP PGMの収束を解析し、関数近似器の滑らかさが勾配推定の品質に影響する主要な要因であることを特定する。この解析に基づき、長いモデル展開(unroll)に起因する分散爆発の問題を緩和するスペクトル正規化法を提案する。実験の結果、適切な正規化によりモデルベースRP PGMの勾配分散が大幅に減少することが示された。その結果、提案手法の性能は、尤度比(Likelihood Ratio; LR)勾配推定器などの他の勾配推定器と同等かそれ以上となる。コードは https://github.com/agentification/RP_PGM で利用可能である。

ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. However, recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes with exploding gradient variance, which leads to slow convergence. This is in contrast to the conventional belief that reparameterization methods have low gradient estimation variance in problems such as training deep generative models. To comprehend this phenomenon, we conduct a theoretical examination of model-based RP PGMs and search for solutions to the optimization difficulties. Specifically, we analyze the convergence of the model-based RP PGMs and pinpoint the smoothness of function approximators as a major factor that affects the quality of gradient estimation. Based on our analysis, we propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls. Our experimental results demonstrate that proper normalization significantly reduces the gradient variance of model-based RP PGMs. As a result, the performance of the proposed method is comparable or superior to other gradient estimators, such as the Likelihood Ratio (LR) gradient estimator. Our code is available at https://github.com/agentification/RP_PGM.
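
Spectral normalization, named above as the proposed remedy, rescales a weight matrix by its largest singular value so that the corresponding linear map is 1-Lipschitz. The sketch below estimates that singular value by power iteration in NumPy; it is a generic illustration under assumed sizes and iteration counts, not the authors' implementation (see their repository for that), and `spectral_normalize` is a hypothetical helper name.

```python
import numpy as np

def spectral_normalize(W, n_iters=200, seed=0):
    """Return (W / sigma, sigma), where sigma estimates the top singular value."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        # Alternate between right and left singular-vector estimates.
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # Rayleigh-quotient style estimate, never above the true value
    return W / sigma, sigma

W = np.random.default_rng(1).standard_normal((16, 16))
W_sn, sigma = spectral_normalize(W)
print(sigma, np.linalg.norm(W_sn, 2))
```

Bounding each layer's spectral norm bounds how fast gradients can grow when a learned model is unrolled over many steps, which is the intuition behind using it against exploding gradient variance.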

翻訳日:2023-11-01 18:04:03 公開日:2023-10-30

# Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents

Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents ( http://arxiv.org/abs/2310.19923v1 )

ライセンス: Link先を確認

Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturuam, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao

(参考訳) テキスト埋め込みモデルは、文を意味情報を内包する固定サイズの特徴ベクトルに変換する強力なツールとして登場した。これらのモデルは、情報検索、セマンティッククラスタリング、テキストの再ランキングといったタスクに不可欠であるが、既存のオープンソースモデルの多く、特にBERTのようなアーキテクチャ上に構築されたモデルは、長い文書の表現が苦手で、しばしば切り詰め(truncation)に頼る。この課題を緩和する一般的なアプローチの一つは、文書を小さな段落に分割して埋め込むことである。しかし、この戦略ではベクトルの集合がはるかに大きくなり、結果としてメモリ消費が増加し、ベクトル検索の計算負荷とレイテンシが上昇する。これらの課題に対処するため、最大8192トークンまで扱えるオープンソースのテキスト埋め込みモデルである Jina Embeddings 2 を紹介する。このモデルは、従来の512トークンの制限を超えて長い文書を適切に処理するよう設計されている。Jina Embeddings 2 は、MTEBベンチマークのさまざまな埋め込み関連タスクで最先端の性能を達成するだけでなく、OpenAIのプロプライエタリな ada-002 モデルの性能にも匹敵する。さらに、我々の実験は、コンテキストを拡張することで NarrativeQA などのタスクの性能が向上しうることを示している。

Text embedding models have emerged as powerful tools for transforming sentences into fixed-sized feature vectors that encapsulate semantic information. While these models are essential for tasks like information retrieval, semantic clustering, and text re-ranking, most existing open-source models, especially those built on architectures like BERT, struggle to represent lengthy documents and often resort to truncation. One common approach to mitigate this challenge involves splitting documents into smaller paragraphs for embedding. However, this strategy results in a much larger set of vectors, consequently leading to increased memory consumption and computationally intensive vector searches with elevated latency. To address these challenges, we introduce Jina Embeddings 2, an open-source text embedding model capable of accommodating up to 8192 tokens. This model is designed to transcend the conventional 512-token limit and adeptly process long documents. Jina Embeddings 2 not only achieves state-of-the-art performance on a range of embedding-related tasks in the MTEB benchmark but also matches the performance of OpenAI's proprietary ada-002 model. Additionally, our experiments indicate that an extended context can enhance performance in tasks such as NarrativeQA.
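
The memory argument above is simple arithmetic: chunking a document into windows of at most 512 tokens yields one vector per chunk, whereas an 8192-token window often yields one vector per document. The sketch below counts vectors for a hypothetical set of document lengths; it ignores chunk overlap and tokenizer details, and `num_vectors` is an illustrative helper, not part of any Jina API.

```python
import math

def num_vectors(doc_token_counts, max_tokens):
    """Vectors needed if each document is split into chunks of <= max_tokens."""
    return sum(max(1, math.ceil(n / max_tokens)) for n in doc_token_counts)

docs = [300, 2000, 8000, 8192]  # hypothetical token counts per document

chunked = num_vectors(docs, 512)        # conventional 512-token limit
long_context = num_vectors(docs, 8192)  # 8192-token limit
print(chunked, long_context)
```

For this toy corpus the index shrinks from 37 vectors to 4, and a nearest-neighbour search has correspondingly fewer candidates to score.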

翻訳日:2023-11-01 18:03:40 公開日:2023-10-30

# 機械学習によるカット生成線形計画問題の一クラスの求解

Solving a Class of Cut-Generating Linear Programs via Machine Learning ( http://arxiv.org/abs/2310.19920v1 )

ライセンス: Link先を確認

Atefeh Rajabalizadeh and Danial Davarnia

(参考訳) カット生成線形計画問題(CGLP)は、混合整数計画問題の実行可能領域に対する妥当不等式を生成する分離オラクルとして重要な役割を果たす。分枝限定法に組み込むと、CGLPから得られる切除平面は緩和を強化し、双対境界を改善するのに役立つ。しかし、分枝限定木のノードでCGLPを実行することは、ノード候補が多数あること、およびどのノードで有用な切除平面が得られるかについての事前知識がないことから、計算上の負担が大きい。その結果、CGLPは双対境界の改善に寄与しうるにもかかわらず、分枝カット法のデフォルト設定ではしばしば使用されない。本稿では、分枝限定木のノードで切除平面を生成できるかどうかを決定するCGLPクラスの最適値を近似する、機械学習に基づく新しいフレームワークを提案する。CGLPを目的関数ベクトルの指示関数として捉え直すことで、従来のデータ分類手法によって近似できることを示す。CGLPの構造に基づいて、対応する分類問題の訓練データセットを効率的に生成する体系的な手順を与える。ロジスティック回帰などの分類手法を用いて、ベンチマークインスタンスで計算実験を行う。これらの結果は、分類から得られる近似CGLPが、従来の切除平面法と比べて求解時間を改善しうることを示唆している。提案するフレームワークは、分枝限定木内の多数のノードに効率よく適用でき、カットを追加する最適な候補を特定できる。

Cut-generating linear programs (CGLPs) play a key role as a separation oracle to produce valid inequalities for the feasible region of mixed-integer programs. When incorporated inside branch-and-bound, the cutting planes obtained from CGLPs help to tighten relaxations and improve dual bounds. However, running the CGLPs at the nodes of the branch-and-bound tree is computationally cumbersome due to the large number of node candidates and the lack of a priori knowledge on which nodes admit useful cutting planes. As a result, CGLPs are often avoided at default settings of branch-and-cut algorithms despite their potential impact on improving dual bounds. In this paper, we propose a novel framework based on machine learning to approximate the optimal value of a CGLP class that determines whether a cutting plane can be generated at a node of the branch-and-bound tree. Translating the CGLP as an indicator function of the objective function vector, we show that it can be approximated through conventional data classification techniques. We provide a systematic procedure to efficiently generate training data sets for the corresponding classification problem based on the CGLP structure. We conduct computational experiments on benchmark instances using classification methods such as logistic regression. These results suggest that the approximate CGLP obtained from classification can improve the solution time compared to that of conventional cutting plane methods. Our proposed framework can be efficiently applied to a large number of nodes in the branch-and-bound tree to identify the best candidates for adding a cut.

翻訳日:2023-11-01 18:03:14 公開日:2023-10-30

# ニューラルネットワークの値最大化によるメタ学習戦略

Meta-Learning Strategies through Value Maximization in Neural Networks ( http://arxiv.org/abs/2310.19919v1 )

ライセンス: Link先を確認

Rodrigo Carrasco-Davis, Javier Masís, Andrew M. Saxe

(参考訳) 生物および人工の学習エージェントは、ハイパーパラメータの選択からカリキュラムのようなタスク分布の側面に至るまで、学習の仕方に関する多くの選択に直面する。こうしたメタ学習上の選択の方法を理解することは、生物学習者における認知制御機能の規範的説明を与え、工学的システムを改善しうる。しかし、学習プロセス全体を通した最適化の複雑さのため、現代の深層ネットワークで最適な戦略を計算することは依然として困難である。ここでは、扱いやすい設定において最適戦略を理論的に検討する。我々は、完全に規範的な目的、すなわち学習全体にわたる割引累積性能に対して制御信号を効率的に最適化できる学習努力(learning effort)フレームワークを提示する。単純なニューラルネットワークアーキテクチャで利用可能な、勾配降下の平均的な動力学方程式を用いることで、計算上の扱いやすさを得る。本フレームワークは、さまざまなメタ学習手法および自動カリキュラム学習手法を、統一された規範的設定に収める。このフレームワークを適用して、一般的なメタ学習アルゴリズムにおける近似の効果を調べ、最適なカリキュラムの特徴を推定し、継続学習の設定における最適なニューロン資源配分を計算する。いずれの設定でも、制御努力は、学習の初期にタスクのより容易な側面に適用し、その後により困難な側面に持続的に適用した場合に最も有益であることがわかった。全体として、学習努力フレームワークは、さまざまな学習システムにおける介入の規範的な利益を研究するための扱いやすい理論的テストベッドを提供するとともに、認知神経科学の確立された理論が想定する、学習軌跡にわたる最適な認知制御戦略の形式的な説明を与える。

Biological and artificial learning agents face numerous choices about how to learn, ranging from hyperparameter selection to aspects of task distributions like curricula. Understanding how to make these meta-learning choices could offer normative accounts of cognitive control functions in biological learners and improve engineered systems. Yet optimal strategies remain challenging to compute in modern deep networks due to the complexity of optimizing through the entire learning process. Here we theoretically investigate optimal strategies in a tractable setting. We present a learning effort framework capable of efficiently optimizing control signals on a fully normative objective: discounted cumulative performance throughout learning. We obtain computational tractability by using average dynamical equations for gradient descent, available for simple neural network architectures. Our framework accommodates a range of meta-learning and automatic curriculum learning methods in a unified normative setting. We apply this framework to investigate the effect of approximations in common meta-learning algorithms; infer aspects of optimal curricula; and compute optimal neuronal resource allocation in a continual learning setting. Across settings, we find that control effort is most beneficial when applied to easier aspects of a task early in learning; followed by sustained effort on harder aspects. Overall, the learning effort framework provides a tractable theoretical test bed to study normative benefits of interventions in a variety of learning systems, as well as a formal account of optimal cognitive control strategies over learning trajectories posited by established theories in cognitive neuroscience.
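
The normative objective named above, discounted cumulative performance throughout learning, rewards improving early rather than only ending well. A minimal sketch, assuming performance is a per-step scalar where higher is better and a discrete-time discount factor `gamma` (both assumptions; the paper's exact formulation is not given in the abstract):

```python
def discounted_cumulative_performance(perf, gamma):
    """Sum of gamma**t * perf[t] over the learning trajectory."""
    return sum((gamma ** t) * p for t, p in enumerate(perf))

# Two hypothetical learning curves that reach the same final performance.
slow = [0.0, 0.25, 0.5, 1.0]
fast = [0.5, 0.75, 1.0, 1.0]

j_slow = discounted_cumulative_performance(slow, gamma=0.5)
j_fast = discounted_cumulative_performance(fast, gamma=0.5)
print(j_slow, j_fast)
```

Both curves end at 1.0, yet the objective prefers the one that improves sooner, which is consistent with the abstract's finding that control effort pays off most when spent early on the easier aspects of a task.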

翻訳日:2023-11-01 18:02:50 公開日:2023-10-30

# バイアスと不公平を明らかにする:電子健康記録を用いた医療用人工知能におけるバイアス検出と緩和の体系的レビュー

Unmasking Bias and Inequities: A Systematic Review of Bias Detection and Mitigation in Healthcare Artificial Intelligence Using Electronic Health Records ( http://arxiv.org/abs/2310.19917v1 )

ライセンス: Link先を確認

Feng Chen, Liqin Wang, Julie Hong, Jiaqi Jiang, Li Zhou

(参考訳) 目的: 電子健康記録(EHR)を利用した人工知能(AI)アプリケーションが普及しているが、それらはさまざまな種類のバイアスももたらす。本研究は、EHRデータを用いたAI研究におけるバイアスを扱った文献を体系的にレビューすることを目的とする。方法: PRISMA(Preferred Reporting Items for Systematic Reviews and Meta-analyses)ガイドラインに従って体系的レビューを行った。2010年1月1日から2022年10月31日までに発表された論文を、PubMed、Web of Science、Institute of Electrical and Electronics Engineers から検索した。バイアスの主要な種類を6つ定義し、バイアスへの対処に関する既存のアプローチをまとめた。結果: 検索された252件の論文のうち、20件が最終レビューの組み入れ基準を満たした。本レビューでは6種類のバイアスのうち5種類が扱われており、選択バイアスを分析した研究が8件、潜在的バイアス(implicit bias)が6件、交絡バイアスが5件、測定バイアスが4件、アルゴリズムバイアスが2件であった。バイアスへの対処については、10件の研究がモデル開発中にバイアスを特定し、17件がバイアスを緩和する方法を提示した。考察: バイアスは、AIアプリケーション開発プロセスのさまざまな段階で入り込む可能性がある。本レビューでは、異なる開発段階でバイアスに対処する手法について論じるが、さらに効果的なアプローチを導入する余地がある。結論: 医療AIにおけるバイアスへの注目は高まっているものの、この主題についてEHRデータを用いた研究はまだ限られている。EHRデータを用いたAIバイアスの検出と緩和は、依然として課題である。医療AIにおけるバイアスを検出、緩和、評価するための、一般化可能で解釈可能な標準化された手法を開発するには、さらなる研究が必要である。

Objectives: Artificial intelligence (AI) applications utilizing electronic health records (EHRs) have gained popularity, but they also introduce various types of bias. This study aims to systematically review the literature that addresses bias in AI research utilizing EHR data. Methods: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guideline. We retrieved articles published between January 1, 2010, and October 31, 2022, from PubMed, Web of Science, and the Institute of Electrical and Electronics Engineers. We defined six major types of bias and summarized the existing approaches in bias handling. Results: Out of the 252 retrieved articles, 20 met the inclusion criteria for the final review. Five of the six bias types were covered in this review: eight studies analyzed selection bias; six, implicit bias; five, confounding bias; four, measurement bias; and two, algorithmic bias. For bias handling approaches, ten studies identified bias during model development, while seventeen presented methods to mitigate the bias. Discussion: Bias may infiltrate the AI application development process at various stages. Although this review discusses methods for addressing bias at different development stages, there is room for implementing additional effective approaches. Conclusion: Despite growing attention to bias in healthcare AI, research using EHR data on this topic is still limited. Detecting and mitigating AI bias with EHR data continues to pose challenges. Further research is needed to develop a standardized method that is generalizable and interpretable to detect, mitigate, and evaluate bias in medical AI.

翻訳日:2023-11-01 18:02:25 公開日:2023-10-30

# GPCR-BERT:タンパク質言語モデルを用いたGタンパク質共役受容体の配列設計の解釈

GPCR-BERT: Interpreting Sequential Design of G Protein Coupled Receptors Using Protein Language Models ( http://arxiv.org/abs/2310.19915v1 )

ライセンス: Link先を確認

Seongwon Kim, Parisa Mollaei, Akshay Antony, Rishikesh Magar, Amir Barati Farimani

(参考訳) 化学・生物学におけるTransformerと大規模言語モデル(LLM)の台頭に伴い、治療薬の設計と理解のための新たな道が科学界に開かれた。タンパク質配列は言語としてモデル化でき、とりわけタンパク質配列データセットが豊富に利用できることから、LLMの最近の進歩を活用できる。本稿では、Gタンパク質共役受容体(GPCR)の配列設計を理解するためのGPCR-BERTモデルを開発した。GPCRは、FDA承認薬の3分の1以上の標的である。しかし、アミノ酸配列、リガンド選択性、およびコンフォメーションモチーフ(NPxxY、CWxP、E/DRYなど)の関係についての包括的な理解は不足している。事前学習済みのタンパク質モデル(Prot-Bert)を利用し、モチーフ内の変異を予測するタスクで微調整することで、結合ポケットの残基と保存モチーフのいくつかとの間の関係を明らかにすることができた。そのために、モデルの注意重みと隠れ状態を利用し、マスクされたアミノ酸の種類の決定に各アミノ酸がどの程度寄与するかを解釈して抽出した。微調整したモデルは、モチーフ内の隠された残基の予測において高い精度を示した。さらに、受容体のコンフォメーション内の高次相互作用を解明するために、3次元構造上で埋め込みの解析を行った。

With the rise of Transformers and Large Language Models (LLMs) in Chemistry and Biology, new avenues for the design and understanding of therapeutics have opened up to the scientific community. Protein sequences can be modeled as language and can take advantage of recent advances in LLMs, specifically with the abundance of our access to the protein sequence datasets. In this paper, we developed the GPCR-BERT model for understanding the sequential design of G Protein-Coupled Receptors (GPCRs). GPCRs are the target of over one-third of FDA-approved pharmaceuticals. However, there is a lack of comprehensive understanding regarding the relationship between amino acid sequence, ligand selectivity, and conformational motifs (such as NPxxY, CWxP, E/DRY). By utilizing the pre-trained protein model (Prot-Bert) and fine-tuning with prediction tasks of variations in the motifs, we were able to shed light on several relationships between residues in the binding pocket and some of the conserved motifs. To achieve this, we took advantage of attention weights, and hidden states of the model that are interpreted to extract the extent of contributions of amino acids in dictating the type of masked ones. The fine-tuned models demonstrated high accuracy in predicting hidden residues within the motifs. In addition, the analysis of embedding was performed over 3D structures to elucidate the higher-order interactions within the conformations of the receptors.

翻訳日:2023-11-01 18:01:56 公開日:2023-10-30

# 雑音推測復号に基づく効率的な絡み合い純粋化

Efficient entanglement purification based on noise guessing decoding ( http://arxiv.org/abs/2310.19914v1 )

ライセンス: Link先を確認

André Roque, Diogo Cruz, Francisco A. Monteiro, Bruno C. Coutinho

(参考訳) 本稿では、ハッシングと、古典的な誤り訂正符号向けに最近考案された推測ランダム加法性雑音復号(GRAND)アプローチに基づく、新しい二部絡み合い純粋化プロトコルを提案する。我々のプロトコルは、既存のハッシングプロトコルに比べて大きな利点があり、純粋化に必要な量子ビットが少なく、より高い忠実度を達成し、計算コストを抑えつつより高い収率を与える。この結果を裏付ける数値的および半解析的な結果を示し、Bennett らのハッシングプロトコルとの詳細な比較を行う。この先駆的な研究は性能限界を与えたが、実装のための明示的な構成は示さなかった。本研究はそのギャップを埋め、明示的でより効率的な純粋化手法を提供する。我々のプロトコルが、16対という小さなアンサンブルであっても、ベル対あたり10%程度のノイズを持つ状態を純粋化できることを示す。また、ノイズを伴う実用的な構成に対処するため、プロトコルの測定ベースの実装を検討する。本研究は、実現可能な計算コストでハッシングに基づく手法を用いた、実用的で効率的な絡み合い純粋化への道を開く。元のハッシングプロトコルと比較して、提案手法は、最大で100分の1の初期リソース数で所望の忠実度を達成できる。したがって、提案手法はリソース数が限られた将来の量子ネットワークに適しており、計算オーバーヘッドも比較的小さいと考えられる。

In this paper, we propose a novel bipartite entanglement purification protocol built upon hashing and upon the guessing random additive noise decoding (GRAND) approach recently devised for classical error correction codes. Our protocol offers substantial advantages over existing hashing protocols, requiring fewer qubits for purification, achieving higher fidelities, and delivering better yields with reduced computational costs. We provide numerical and semi-analytical results to corroborate our findings and provide a detailed comparison with the hashing protocol of Bennett et al. Although that pioneering work devised performance bounds, it did not offer an explicit construction for implementation. The present work fills that gap, offering both an explicit and more efficient purification method. We demonstrate that our protocol is capable of purifying states with noise on the order of 10% per Bell pair even with a small ensemble of 16 pairs. The work explores a measurement-based implementation of the protocol to address practical setups with noise. This work opens the path to practical and efficient entanglement purification using hashing-based methods with feasible computational costs. Compared to the original hashing protocol, the proposed method can achieve a desired fidelity with a number of initial resources up to one hundred times smaller. Therefore, the proposed method seems well suited for future quantum networks with a limited number of resources and entails a relatively low computational overhead.
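
For readers unfamiliar with GRAND, the classical idea is to guess the noise rather than the codeword: candidate noise patterns are tried from most to least likely, and the first one whose removal gives a valid codeword is accepted. The sketch below does this for hard-decision bits with the [7,4] Hamming code, ordering guesses by Hamming weight (appropriate for a binary symmetric channel with error probability below 1/2). It illustrates only the classical decoder that inspired the protocol, not the quantum purification scheme itself.

```python
from itertools import combinations
import numpy as np

# Parity-check matrix of the [7,4] Hamming code (column j is j in binary).
H = np.array([
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
])

def grand_decode(received, H, max_weight=3):
    """Try noise patterns in order of increasing weight until the syndrome is zero."""
    n = H.shape[1]
    for weight in range(max_weight + 1):
        for positions in combinations(range(n), weight):
            noise = np.zeros(n, dtype=int)
            for p in positions:
                noise[p] = 1
            candidate = (received + noise) % 2
            if not ((H @ candidate) % 2).any():  # valid codeword found
                return candidate, noise
    return None, None  # abandon guessing beyond max_weight

codeword = np.array([1, 1, 1, 0, 0, 0, 0])  # satisfies H @ c = 0 (mod 2)
received = codeword.copy()
received[4] ^= 1                            # single bit flip on the channel

decoded, noise = grand_decode(received, H)
print(decoded, noise)
```

The decoder never needs the code's structure beyond a membership test, which is why the same guessing strategy can be paired with hashing-style parity checks.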

翻訳日:2023-11-01 18:01:33 公開日:2023-10-30

# ベイズシミュレーションに基づく宇宙初期条件の推論

Bayesian Simulation-based Inference for Cosmological Initial Conditions ( http://arxiv.org/abs/2310.19910v1 )

ライセンス: Link先を確認

Florian List, Noemi Anau Montel, Christoph Weniger

(参考訳) 観測から天体物理学的・宇宙論的な場を再構成することは困難である。非線形変換、空間構造の混合、およびノイズを考慮する必要があるためである。一方、場を観測に写像するフォワードシミュレータは、多くの応用で容易に利用できる。本稿では、シミュレーションに基づく推論に根ざし、自己回帰モデリングによって強化した、汎用性の高いベイズ的な場の再構成アルゴリズムを提示する。提案手法は一般の(微分不可能な)フォワードシミュレータに適用でき、背後にある場の事後分布からのサンプリングを可能にする。概念実証となる応用、すなわち後期の密度場からの宇宙論的初期条件の復元について、有望な最初の結果を示す。

Reconstructing astrophysical and cosmological fields from observations is challenging. It requires accounting for non-linear transformations, mixing of spatial structure, and noise. In contrast, forward simulators that map fields to observations are readily available for many applications. We present a versatile Bayesian field reconstruction algorithm rooted in simulation-based inference and enhanced by autoregressive modeling. The proposed technique is applicable to generic (non-differentiable) forward simulators and allows sampling from the posterior for the underlying field. We show first promising results on a proof-of-concept application: the recovery of cosmological initial conditions from late-time density fields.

翻訳日:2023-11-01 18:01:07 公開日:2023-10-30

# バックボーンの戦い - コンピュータビジョンタスク間で事前訓練されたモデルの大規模比較

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks ( http://arxiv.org/abs/2310.19909v1 )

ライセンス: Link先を確認

Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Mark Ibrahim, Adrien Bardes, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, Tom Goldstein

(参考訳) ニューラルネットワークベースのコンピュータビジョンシステムは通常、事前学習済みまたはランダムに初期化された特徴抽出器であるバックボーンの上に構築される。数年前は、ImageNetで訓練された畳み込みニューラルネットワークがデフォルトの選択肢だった。しかし近年、さまざまなアルゴリズムやデータセットで事前学習された無数のバックボーンが登場している。こうした選択肢の豊富さはさまざまなシステムの性能向上につながった一方で、実務者がどのバックボーンを選ぶべきか十分な情報に基づいて判断することは難しい。Battle of the Backbones(BoB)は、視覚言語モデル、自己教師付き学習で訓練されたモデル、Stable Diffusion のバックボーンを含む多様な事前学習済みモデル群を、分類からオブジェクト検出、OOD汎化などに至る多様なコンピュータビジョンタスクでベンチマークすることで、この選択を容易にする。さらにBoBは、1500回を超えるトレーニング実行に基づく包括的な分析を通じて既存アプローチの長所と短所を明らかにし、研究コミュニティがコンピュータビジョンを前進させるための有望な方向性を示す。ビジョントランスフォーマー(ViT)と自己教師付き学習(SSL)の人気は高まっているが、我々が検討したモデルの中では、大規模な訓練セットで教師ありで事前学習された畳み込みニューラルネットワークが、依然としてほとんどのタスクで最高の性能を示す。また、同じアーキテクチャと同程度の規模の事前学習データセットによる同一条件での比較では、SSLバックボーンは非常に競争力があり、今後の研究では高度なアーキテクチャとより大きな事前学習データセットでSSL事前学習を行うべきであることが示唆される。実験の生の結果と、研究者が自身のバックボーンを同じ一連の評価にかけられるコードを https://github.com/hsouri/Battle-of-the-Backbones で公開している。

Neural-network-based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performance increases for a range of systems, it is difficult for practitioners to make informed decisions about which backbone to choose. Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Furthermore, BoB sheds light on promising directions for the research community to advance computer vision by illuminating the strengths and weaknesses of existing approaches through a comprehensive analysis conducted on more than 1500 training runs. While vision transformers (ViTs) and self-supervised learning (SSL) are increasingly popular, we find that convolutional neural networks pretrained in a supervised fashion on large training sets still perform best on most tasks among the models we consider. Moreover, in apples-to-apples comparisons on the same architectures and similarly sized pretraining datasets, we find that SSL backbones are highly competitive, indicating that future works should perform SSL pretraining with advanced architectures and larger pretraining datasets. We release the raw results of our experiments along with code that allows researchers to put their own backbones through the gauntlet here: https://github.com/hsouri/Battle-of-the-Backbones

翻訳日:2023-11-01 18:00:52 公開日:2023-10-30

# 波面制御による軸方向定位の強化

Enhancing axial localization with wavefront control ( http://arxiv.org/abs/2310.19908v1 )

ライセンス: Link先を確認

M. Peterek, M. Paur, M. Vitek, D. Koutny, B. Stoklasa, L. Motka, Z. Hradil, J. Rehacek, L. L. Sanchez-Soto

(参考訳) 軸方向の詳細を分解する能力を高めることは3次元光学イメージングにおいて重要である。渦ビームを用いた軸方向定位における究極の精度を示す実験的な証拠を提供する。 Laguerre-Gauss(LG)ビームの場合、この顕著な限界はたった1つの強度スキャンで達成できる。この証明は、LG渦ビームに基づく顕微鏡技術が、導入した量子インスピレーションされた超解像プロトコルの恩恵を受ける可能性を実証している。

Enhancing the ability to resolve axial details is crucial in three-dimensional optical imaging. We provide experimental evidence showcasing the ultimate precision achievable in axial localization using vortex beams. For Laguerre-Gauss (LG) beams, this remarkable limit can be attained with just a single intensity scan. This proof-of-principle demonstrates that microscopy techniques based on LG vortex beams can potentially benefit from the introduced quantum-inspired superresolution protocol.

翻訳日:2023-11-01 17:59:56 公開日:2023-10-30

# 「人」== 明るい肌の西洋人男性、そして有色人種女性の性的対象化: Stable Diffusionにおけるステレオタイプ

'Person' == Light-skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion ( http://arxiv.org/abs/2310.19981v1 )

ライセンス: Link先を確認

Sourojit Ghosh, Aylin Caliskan

(参考訳) 我々は、最も人気のあるテキスト・画像生成モデルの1つであるStable Diffusionに埋め込まれたステレオタイプを研究する。性別や国籍・大陸アイデンティティに関する情報が与えられない場合に、Stable Diffusionがどのようなステレオタイプを示すか、すなわち「人」や「アジア出身の人」にどの性別や国籍・大陸アイデンティティが割り当てられるかを調べる。視覚言語モデルCLIPのコサイン類似度を用いて、CLIPベースのStable Diffusion v2.1が生成した画像を比較し、手作業による検査で検証したうえで、6大陸、27国籍、3つの性別の人物の正面画像について、136個のプロンプト(1プロンプトあたり50件)の結果を記録した。性別・国籍情報を付加しない「人」の出力は、男性の画像に最も近く、ノンバイナリーの人物の画像から最も遠く、またアフリカ・アジアよりもヨーロッパ・北米の人物に近いことが観察され、Stable Diffusionが人というものをヨーロッパ・北米の男性として表現するという懸念すべき傾向が示された。また、大陸に関するステレオタイプとそれに伴う害も示す。例えば、オセアニア出身の人はパプアニューギニア人ではなくオーストラリア人・ニュージーランド人とみなされ、パプアニューギニアでもオセアニア全体でも入植者の子孫より多数を占めるオセアニア先住民の存在が消し去られている。最後に、NSFW検出器による測定から、女性、特にラテンアメリカ、メキシコ、インド、エジプトの女性が他の国籍と比べて過度に性的に描かれるパターンを予期せず観察した。これは、Stable Diffusionがメディアにおける対象化を通じて有色人種女性に対する西洋のフェティシズムを永続させていることを示しており、放置すればこのステレオタイプ的表現が増幅される。画像データセットは公開されている。

We study stereotypes embedded within one of the most popular text-to-image generators: Stable Diffusion. We examine what stereotypes of gender and nationality/continental identity Stable Diffusion displays in the absence of such information, i.e., what gender and nationality/continental identity is assigned to 'a person' or to 'a person from Asia'. Using the cosine similarity of the vision-language model CLIP to compare images generated by CLIP-based Stable Diffusion v2.1, verified by manual examination, we chronicle results from 136 prompts (50 results/prompt) of front-facing images of persons from 6 different continents, 27 nationalities and 3 genders. We observe how Stable Diffusion outputs of 'a person' without any additional gender/nationality information correspond most closely to images of men and least to persons of nonbinary gender, and to persons from Europe/North America over Africa/Asia, pointing towards Stable Diffusion having a concerning representation of personhood as a European/North American man. We also show continental stereotypes and resultant harms, e.g., a person from Oceania is deemed to be Australian/New Zealander over Papua New Guinean, pointing to the erasure of Indigenous Oceanic peoples, who form a majority over descendants of colonizers both in Papua New Guinea and in Oceania overall. Finally, we unexpectedly observe a pattern of oversexualization of women, specifically Latin American, Mexican, Indian and Egyptian women relative to other nationalities, measured through an NSFW detector. This demonstrates how Stable Diffusion perpetuates Western fetishization of women of color through objectification in media, which if left unchecked will amplify this stereotypical representation. Image datasets are made publicly available.
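
The comparison step described above rests on cosine similarity between embedding vectors. The following is a minimal sketch of that measure only, assuming toy three-dimensional vectors; the names `generic_person` and `candidates` and their values are hypothetical stand-ins, not CLIP outputs or data from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy "embeddings"; real image embeddings have far more dimensions.
generic_person = [0.9, 0.1, 0.2]
candidates = {"group_a": [0.8, 0.2, 0.1], "group_b": [0.1, 0.9, 0.3]}

# The candidate whose embedding points in the most similar direction.
closest = max(candidates, key=lambda k: cosine_similarity(generic_person, candidates[k]))
```

Because cosine similarity ignores vector length, it compares only the direction of two embeddings, which is why it is a common choice for this kind of nearest-group comparison.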

翻訳日:2023-11-01 17:52:26 公開日:2023-10-30

# より高速なフランクウルフ反復による微分プライベートLASSO正規化ロジスティック回帰のスケールアップ

Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations ( http://arxiv.org/abs/2310.19978v1 )

ライセンス: Link先を確認

Edward Raff, Amol Khanna, Fred Lu

(参考訳) 我々の知る限りでは、現在、スパース入力データ上で微分プライベート回帰モデルを訓練する方法は存在しない。これに対処するため、$L_1$ 罰則付き線形回帰のための Frank-Wolfe アルゴリズムを、スパース入力を認識して効果的に利用できるように適応させる。これにより、アルゴリズムのトレーニング時間を $\mathcal{O}(T D S + T N S)$ から $\mathcal{O}(N S + T \sqrt{D} \log{D} + T S^2)$ に短縮する。ここで $T$ は反復回数、$S$ は $N$ 行 $D$ 特徴のデータセットのスパース率である。この方法では、プライバシパラメータ $\epsilon$ の値とデータセットのスパース性に応じて、最大 $2,200\times$ の係数でランタイムを削減できることを示す。

To the best of our knowledge, there are no methods today for training differentially private regression models on sparse input data. To remedy this, we adapt the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively. In doing so, we reduce the training time of the algorithm from $\mathcal{O}( T D S + T N S)$ to $\mathcal{O}(N S + T \sqrt{D} \log{D} + T S^2)$, where $T$ is the number of iterations and $S$ is the sparsity rate of a dataset with $N$ rows and $D$ features. Our results demonstrate that this procedure can reduce runtime by a factor of up to $2,200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
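
For orientation, here is a minimal sketch of the plain Frank-Wolfe iteration over an $L_1$ ball for least-squares regression. It is a dense, non-private textbook version and not the authors' algorithm: it adds no privacy noise and has none of the sparse-input speedups the abstract describes. The function names, the toy data, and the step size $2/(t+2)$ are assumptions made for illustration.

```python
def frank_wolfe_l1(X, y, radius, num_iters):
    """Minimise (1/2N) * ||Xw - y||^2 subject to ||w||_1 <= radius."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(num_iters):
        # Residuals r = Xw - y and gradient g = X^T r / N.
        r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        g = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(d)]
        # Linear minimisation oracle over the L1 ball: a signed vertex on the
        # coordinate with the largest gradient magnitude.
        j_star = max(range(d), key=lambda j: abs(g[j]))
        vertex = [0.0] * d
        vertex[j_star] = -radius if g[j_star] >= 0 else radius
        # Classic step size; a convex combination keeps w inside the ball.
        gamma = 2.0 / (t + 2)
        w = [(1 - gamma) * w[j] + gamma * vertex[j] for j in range(d)]
    return w


def half_mse(X, y, w):
    """Objective value (1/2N) * ||Xw - y||^2."""
    n = len(X)
    return sum(
        (sum(xj * wj for xj, wj in zip(row, w)) - yi) ** 2 for row, yi in zip(X, y)
    ) / (2 * n)


# Toy data generated by w = (2, 0), which lies on the boundary of the ball.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 0.0, 2.0]
w_hat = frank_wolfe_l1(X, y, radius=2.0, num_iters=200)
```

Each iteration moves toward a single vertex of the $L_1$ ball, so each update touches one coordinate direction; that per-step sparsity is the property a sparse-aware variant can exploit.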

翻訳日:2023-11-01 17:51:48 公開日:2023-10-30

# バイオインストラクト:バイオメディカル自然言語処理のための大規模言語モデルのチューニング

BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing ( http://arxiv.org/abs/2310.19975v1 )

ライセンス: Link先を確認

Hieu Tran, Zhichao Yang, Zonghai Yao, Hong Yu

(参考訳) 大規模言語モデル(LLM)は多くの自然言語処理(NLP)タスクで大きな成功を収めた。これは大量のデータに対するLLMの事前トレーニングと、特定のドメインへの命令チューニングによって実現される。しかし、生物医学領域での命令はわずかしか公開されていない。この問題に対処するため、我々は25,000以上のサンプルを含むタスク固有の命令データセットであるBioInstructを紹介する。このデータセットは、人手で整備された80件の命令からの3件のシードサンプルをGPT-4言語モデルにプロンプトとして与えることで魅力的に生成された。BioInstructデータセットを用いてLLMを微調整することにより、バイオメディカル自然言語処理(BioNLP)におけるLLMの性能を最適化することを目指す。 LLaMA LLM (1&2, 7B&13B) の命令チューニングを行い、情報抽出、質問応答、テキスト生成などのBioNLPアプリケーション上で評価を行った。また、マルチタスク学習の原則を用いて、命令がモデル性能にどのように貢献するかを評価した。

Large language models (LLMs) have achieved great success in many natural language processing (NLP) tasks. This is achieved by pretraining LLMs on vast amounts of data and then instruction tuning them for specific domains. However, only a few instructions in the biomedical domain have been published. To address this issue, we introduce BioInstruct, a customized task-specific instruction dataset containing more than 25,000 examples. This dataset was generated attractively by prompting a GPT-4 language model with a three-seed-sample of 80 human-curated instructions. By fine-tuning LLMs using the BioInstruct dataset, we aim to optimize the LLM's performance in biomedical natural language processing (BioNLP). We conducted instruction tuning on the LLaMA LLMs (1 & 2, 7B & 13B) and evaluated them on BioNLP applications, including information extraction, question answering, and text generation. We also evaluated how instructions contributed to model performance using multi-task learning principles.

翻訳日:2023-11-01 17:51:28 公開日:2023-10-30

# $f$-differential privacy による混合機構におけるプライバシ境界の統一的強化

Unified Enhancement of Privacy Bounds for Mixture Mechanisms via $f$-Differential Privacy ( http://arxiv.org/abs/2310.19973v1 )

ライセンス: Link先を確認

Chendi Wang, Buxin Su, Jiayuan Ye, Reza Shokri, Weijie J. Su

(参考訳) 差分プライベート(DP)機械学習アルゴリズムは、ランダム初期化、ランダムなバッチサブサンプリング、シャッフルなど、多くのランダム性の源を伴う。しかし、そのようなランダム性は、アルゴリズムの出力に解析の難しい混合分布を誘導するため、差分プライバシー境界を証明する際に考慮することが難しい。本稿では、$f$-DP を用いて、シャッフルモデルと、ランダム初期化を伴う1反復の差分プライベート勾配降下法(DP-GD)のプライバシ境界を改善することに焦点をあてる。シャッフルモデルに対するトレードオフ関数の閉形式表現を導出し、$(\epsilon,\delta)$-DP に基づく最新の結果を上回る結果を得る。また、1反復のDP-GDのプライバシーに対するランダム初期化の影響について検討する。トレードオフ関数の数値計算は、ランダム初期化がDP-GDのプライバシーを高めうることを示している。これらの混合機構に対する $f$-DP 保証の解析は、本論文で導入するトレードオフ関数の不等式に依存する。この不等式は $F$-divergences の同時凸性を含意する。最後に、$(\epsilon,\delta)$-DP に関連するホッケースティック・ダイバージェンスの advanced joint convexity の $f$-DP 版を検討し、混合機構のプライバシ解析に応用する。

Differentially private (DP) machine learning algorithms incur many sources of randomness, such as random initialization, random batch subsampling, and shuffling. However, such randomness is difficult to take into account when proving differential privacy bounds because it induces mixture distributions for the algorithm's output that are difficult to analyze. This paper focuses on improving privacy bounds for shuffling models and one-iteration differentially private gradient descent (DP-GD) with random initializations using $f$-DP. We derive a closed-form expression of the trade-off function for shuffling models that outperforms the most up-to-date results based on $(\epsilon,\delta)$-DP. Moreover, we investigate the effects of random initialization on the privacy of one-iteration DP-GD. Our numerical computations of the trade-off function indicate that random initialization can enhance the privacy of DP-GD. Our analysis of $f$-DP guarantees for these mixture mechanisms relies on an inequality for trade-off functions introduced in this paper. This inequality implies the joint convexity of $F$-divergences. Finally, we study an $f$-DP analog of the advanced joint convexity of the hockey-stick divergence related to $(\epsilon,\delta)$-DP and apply it to analyze the privacy of mixture mechanisms.
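
To make the notion of a trade-off function concrete, the sketch below evaluates two standard textbook examples: the Gaussian trade-off function $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$ and the trade-off function corresponding to $(\epsilon,\delta)$-DP. These are generic definitions from the $f$-DP literature, not the closed-form expression for shuffling models derived in the paper; the function names are chosen here for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def gaussian_tradeoff(alpha, mu):
    """G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu), valid for 0 < alpha < 1."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def eps_delta_tradeoff(alpha, eps, delta):
    """Smallest type II error permitted by (eps, delta)-DP at type I error alpha."""
    return max(
        0.0,
        1.0 - delta - math.exp(eps) * alpha,
        math.exp(-eps) * (1.0 - delta - alpha),
    )


# Trade-off curve on a small grid of type I error levels (mu = 1 is arbitrary).
curve = [(a / 10, gaussian_tradeoff(a / 10, mu=1.0)) for a in range(1, 10)]
```

A trade-off function maps a type I error level to the smallest achievable type II error when distinguishing two neighbouring datasets, so a curve closer to $1-\alpha$ means stronger privacy; with $\mu = 0$ the Gaussian curve equals $1-\alpha$ exactly.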

翻訳日:2023-11-01 17:51:11 公開日:2023-10-30

# Au(111)表面からのNO散乱の多重量子振動緩和に関する線形半古典力学による研究

A Linearized Semiclassical dynamics study of the multi-quantum vibrational relaxation of NO scattering from a Au(111) Surface ( http://arxiv.org/abs/2310.19972v1 )

ライセンス: Link先を確認

Shreyas Malpathak and Nandini Ananth

(参考訳) Au(111)表面から散乱するNO分子の振動緩和は、金属-分子界面における非断熱エネルギー移動を理解する取り組みの焦点となってきた。実験的な測定とこれまでの理論研究は、NOの多量子振動エネルギー緩和が金属中の電子-正孔対の励起を介して起こることを示唆している。ここでは線形化半古典的手法を用いて、異なる入射並進エネルギーに対する $\nu_i=3$ 状態からのNOの振動緩和を正確に予測する。また、振動緩和過程を媒介する金属から分子への過渡的な電子移動の中心的役割も正確に捉えているが、高い入射振動励起 ($\nu_i = 16$) における多量子緩和の全容を定量的に予測するには至っていない。

The vibrational relaxation of NO molecules scattering from an Au(111) surface has served as the focus of efforts to understand nonadiabatic energy transfer at metal-molecule interfaces. Experimental measurements and previous theoretical efforts suggest that multi-quantal NO vibrational energy relaxation occurs via electron-hole pair excitations in the metal. Here, using a Linearized Semiclassical approach, we accurately predict the vibrational relaxation of NO from the $\nu_i=3$ state for different incident translational energies. We also accurately capture the central role of transient electron transfer from the metal to the molecule in mediating the vibrational relaxation process, but fall short of quantitatively predicting the full extent of multi-quantum relaxation for high incident vibrational excitations ($\nu_i = 16$).

翻訳日:2023-11-01 17:50:47 公開日:2023-10-30

# トランスフォーマーの可能性を活かす戦略:UNSL at eRisk 2023

Strategies to Harness the Transformers' Potential: UNSL at eRisk 2023 ( http://arxiv.org/abs/2310.19970v1 )

ライセンス: Link先を確認

Horacio Thompson, Leticia Cagnina and Marcelo Errecalde

(参考訳) CLEF eRisk Laboratoryは、インターネット上のリスク検出に関連するさまざまなタスクに対するソリューションを探索している。 2023年版では,第1タスクはうつ病の症状を検索し,BDIアンケートの症状との関連性に応じて利用者の文章を抽出することを目的としていた。課題2は,病的ギャンブルリスクの早期発見の問題であり,参加者は可能な限り早くユーザを検知しなければならなかった。最後に第3課題は摂食障害の重症度を推定することであった。我々の研究グループは、トランスフォーマーに基づくソリューションを提案する最初の2つのタスクに参加した。タスク1では、情報検索タスクに興味深い様々なアプローチを適用しました。 2つの提案はコンテキスト化された埋め込みベクトルの類似性に基づいており、もう1つは機械学習の魅力的な技術であるプロンプトに基づいている。タスク2では、3つの微調整モデルと、早期検出フレームワークで定義された基準に従って決定ポリシーを提案する。あるモデルは、アドレス領域に重要な単語を持つ拡張語彙を提示した。最後のタスクでは、意思決定ベースのメトリクス、ランキングベースのメトリクス、ランタイムを考慮して、優れたパフォーマンスを得ました。本研究では,eRiskタスクにおけるトランスフォーマーの予測可能性の展開方法について検討する。

The CLEF eRisk Laboratory explores solutions to different tasks related to risk detection on the Internet. In the 2023 edition, Task 1 consisted of searching for symptoms of depression, the objective of which was to extract user writings according to their relevance to the BDI Questionnaire symptoms. Task 2 was related to the problem of early detection of pathological gambling risks, where the participants had to detect users at risk as quickly as possible. Finally, Task 3 consisted of estimating the severity levels of signs of eating disorders. Our research group participated in the first two tasks, proposing solutions based on Transformers. For Task 1, we applied different approaches that can be interesting in information retrieval tasks. Two proposals were based on the similarity of contextualized embedding vectors, and the other one was based on prompting, an attractive current technique of machine learning. For Task 2, we proposed three fine-tuned models followed by a decision policy according to criteria defined by an early detection framework. One model used an extended vocabulary with words important to the addressed domain. In the last task, we obtained good performances considering the decision-based metrics, ranking-based metrics, and runtime. In this work, we explore different ways to deploy the predictive potential of Transformers in eRisk tasks.

翻訳日:2023-11-01 17:50:31 公開日:2023-10-30

# 血液検査・半構造化・非構造化患者記録を用いたマルチモーダル機械学習による、紹介改善のための炎症性関節炎の早期発見

Early detection of inflammatory arthritis to improve referrals using multimodal machine learning from blood testing, semi-structured and unstructured patient records ( http://arxiv.org/abs/2310.19967v1 )

ライセンス: Link先を確認

Bing Wang, Weizi Li, Anthony Bradlow, Antoni T.Y. Chan, Eghosa Bazuaye

(参考訳) 炎症性関節炎 (IA) の早期発見は、特に医療資源が限られる状況において、タイムリーな治療とIAの病状悪化の防止に向けた、効率的かつ正確な病院紹介トリアージのために重要である。手動による評価は、IAを早期に検出するための実務上最も一般的なアプローチであるが、非常に労働集約的で非効率である。一般診療(GP)から病院への紹介ごとに、大量の臨床情報を評価する必要がある。機械学習は、繰り返しの評価作業を自動化し、IAの早期検出のための意思決定支援を提供する大きな可能性を示している。しかし、機械学習によるIA検出法のほとんどは血液検査の結果に依存している。実際には、血液検査データは紹介時点で必ずしも利用可能ではないため、IAを早期に検出するには、半構造化データや非構造化データといったマルチモーダルデータを活用する方法が必要である。本研究では、IAの早期検出における意思決定を支援するために、マルチモーダルデータを用いた融合およびアンサンブル学習に基づく手法を提案する。我々の知る限り、本研究はGPからの紹介におけるIAの早期検出を支援するためにマルチモーダルデータを利用する最初の試みである。

Early detection of inflammatory arthritis (IA) is critical to efficient and accurate hospital referral triage for timely treatment and preventing the deterioration of the IA disease course, especially under limited healthcare resources. The manual assessment process is the most common approach in practice for the early detection of IA, but it is extremely labor-intensive and inefficient. A large amount of clinical information needs to be assessed for every referral from General Practice (GP) to the hospitals. Machine learning shows great potential in automating repetitive assessment tasks and providing decision support for the early detection of IA. However, most machine learning-based methods for IA detection rely on blood testing results. In practice, blood testing data is not always available at the point of referral, so we need methods that leverage multimodal data, such as semi-structured and unstructured data, for early detection of IA. In this research, we present fusion and ensemble learning-based methods using multimodal data to assist decision-making in the early detection of IA. To the best of our knowledge, our study is the first attempt to utilize multimodal data to support the early detection of IA from GP referrals.

翻訳日:2023-11-01 17:50:13 公開日:2023-10-30

# ExPT:Few-Shot実験設計のための合成プレトレーニング

ExPT: Synthetic Pretraining for Few-Shot Experimental Design ( http://arxiv.org/abs/2310.19961v1 )

ライセンス: Link先を確認

Tung Nguyen, Sudhanshu Agrawal, Aditya Grover

(参考訳) 実験設計は多くの科学・工学分野における根本的な問題である。この問題では、実世界での設計評価にかかる時間、費用、安全上のコストのため、サンプル効率が重要となる。既存のアプローチは、アクティブなデータ収集か、過去の実験の大規模なラベル付きデータセットへのアクセスに依存しており、多くの現実のシナリオでは実用的ではない。本研究では、入力設計とそれに対応する値のラベル付きデータポイントが少数しか利用できない、より困難だが現実的な少数ショット実験設計の設定に取り組む。我々はこの問題を条件付き生成タスクとして扱い、モデルは少数のラベル付き例と所望の出力を条件として最適な入力設計を生成する。そのために、合成事前学習とインコンテキスト学習の新しい組み合わせを用いた、少数ショット実験設計のための基盤モデル Experiment Pretrained Transformers (ExPT) を提案する。 ExPTでは、入力領域からの有限個のラベルなしデータ点の集合のみが既知であると仮定し、この領域上で定義された多様な合成関数を最適化するようにトランスフォーマーニューラルネットワークを事前学習する。教師なし事前学習により、ExPTはテスト時に、対象タスクの少数のラベル付きデータポイントを条件として候補最適解を生成することで、任意の設計タスクにインコンテキストで適応できる。困難な領域における少数ショット実験設計でExPTを評価し、既存手法と比べて優れた汎用性と性能を示す。ソースコードはhttps://github.com/tung-nd/ExPT.gitで入手できる。

Experimental design is a fundamental problem in many science and engineering fields. In this problem, sample efficiency is crucial due to the time, money, and safety costs of real-world design evaluations. Existing approaches either rely on active data collection or access to large, labeled datasets of past experiments, making them impractical in many real-world scenarios. In this work, we address the more challenging yet realistic setting of few-shot experimental design, where only a few labeled data points of input designs and their corresponding values are available. We approach this problem as a conditional generation task, where a model conditions on a few labeled examples and the desired output to generate an optimal input design. To this end, we introduce Experiment Pretrained Transformers (ExPT), a foundation model for few-shot experimental design that employs a novel combination of synthetic pretraining with in-context learning. In ExPT, we only assume knowledge of a finite collection of unlabelled data points from the input domain and pretrain a transformer neural network to optimize diverse synthetic functions defined over this domain. Unsupervised pretraining allows ExPT to adapt to any design task at test time in an in-context fashion by conditioning on a few labeled data points from the target task and generating the candidate optima. We evaluate ExPT on few-shot experimental design in challenging domains and demonstrate its superior generality and performance compared to existing methods. The source code is available at https://github.com/tung-nd/ExPT.git.

翻訳日:2023-11-01 17:49:50 公開日:2023-10-30

# 混合座標を用いた運動データのトポロジ学習

Topological Learning for Motion Data via Mixed Coordinates ( http://arxiv.org/abs/2310.19960v1 )

ライセンス: Link先を確認

Hengrui Luo, Jisu Kim, Alice Patania, Mikael Vejdemo-Johansson

(参考訳) トポロジーはデータセットの構造情報を効率的に抽出することができる。本稿では、転移学習を目的として、多出力ガウス過程モデルにトポロジ的情報を組み込むことを試みる。この目的のために、円座標の枠組みを混合値座標という新たな枠組みに拡張し、時系列における線形トレンドを考慮に入れる。多出力ガウス過程モデルを通して複数の時系列から効果的に学習する際の大きな課題の1つは、関数型カーネルを構築することである。本稿では、トポロジーに基づくクラスタリングを用いて、多出力ガウス過程モデルにおけるクラスタベースのカーネルを構築することを提案する。このカーネルは、トポロジ的な構造情報を取り込むだけでなく、時系列および運動系列においてトポロジ的情報を用いる統一的なフレームワークの提示を可能にする。

Topology can extract the structural information in a dataset efficiently. In this paper, we attempt to incorporate topological information into a multiple output Gaussian process model for transfer learning purposes. To achieve this goal, we extend the framework of circular coordinates into a novel framework of mixed valued coordinates to take linear trends in the time series into consideration. One of the major challenges to learn from multiple time series effectively via a multiple output Gaussian process model is constructing a functional kernel. We propose to use topologically induced clustering to construct a cluster based kernel in a multiple output Gaussian process model. This kernel not only incorporates the topological structural information, but also allows us to put forward a unified framework using topological information in time and motion series.

翻訳日:2023-11-01 17:49:25 公開日:2023-10-30

# PriPrune: Pruned Federated Learningにおけるプライバシの定量化と保存

PriPrune: Quantifying and Preserving Privacy in Pruned Federated Learning ( http://arxiv.org/abs/2310.19958v1 )

ライセンス: Link先を確認

Tianyue Chu, Mengwei Yang, Nikolaos Laoutaris, Athina Markopoulou

(参考訳) Federated Learning(FL)は、複数のクライアントデバイスとサーバが、デバイスのローカルな訓練データを共有することなく、モデル更新のみを交換することで、グローバルモデルを協調的に訓練できるパラダイムである。これらのデバイスは通信や計算リソースの面で制約されることが多く、モデルのサイズと複雑さを減らすために広く使われるモデルプルーニングの恩恵をさらに受けることができる。直観的には、ローカルモデルをより粗いものにすることで、プルーニングはFLにおけるプライバシ攻撃に対してもある程度の保護を提供すると期待される。しかし、この保護はこれまで形式的にも実験的にも特徴づけられておらず、最先端の攻撃に対して十分かどうかは不明である。本稿では、FLにおけるモデルプルーニングのプライバシ保証に関する最初の調査を行う。プルーニングされたFLモデルから漏洩する情報量に関する情報理論的な上界を導出する。これらの理論的な知見を、ベンチマークデータセットを用い、最先端のプライバシ攻撃と複数の最先端FLプルーニング手法を対象とした包括的な実験によって補完し検証する。この評価は、プルーニングによるプライバシ保護に影響しうる選択やパラメータに関する貴重な洞察を与える。これらの洞察に基づき、ローカルモデルのプルーニングのためのプライバシを考慮したアルゴリズムであるPriPruneを提案する。このアルゴリズムは、クライアントごとにパーソナライズされた防御マスクを使用し、防御プルーニング率を調整して、プライバシとモデル性能を同時に最適化する。 PriPruneは、クライアント上の任意のプルーニング付きFL手法の後に変更なしで適用でき、サーバによるあらゆる反転攻撃から保護するという点で汎用的である。実験的評価により、PriPruneは、プライバシを考慮しない最先端のプルーニング付きFL手法と比べて、プライバシと精度のトレードオフを大幅に改善することを示す。

Federated learning (FL) is a paradigm that allows several client devices and a server to collaboratively train a global model by exchanging only model updates, without the devices sharing their local training data. These devices are often constrained in terms of communication and computation resources, and can further benefit from model pruning, a paradigm that is widely used to reduce the size and complexity of models. Intuitively, by making local models coarser, pruning is expected to also provide some protection against privacy attacks in the context of FL. However, this protection has not been previously characterized, formally or experimentally, and it is unclear if it is sufficient against state-of-the-art attacks. In this paper, we perform the first investigation of privacy guarantees for model pruning in FL. We derive information-theoretic upper bounds on the amount of information leaked by pruned FL models. We complement and validate these theoretical findings with comprehensive experiments that involve state-of-the-art privacy attacks on several state-of-the-art FL pruning schemes, using benchmark datasets. This evaluation provides valuable insights into the choices and parameters that can affect the privacy protection provided by pruning. Based on these insights, we introduce PriPrune, a privacy-aware algorithm for local model pruning, which uses a personalized per-client defense mask and adapts the defense pruning rate so as to jointly optimize privacy and model performance. PriPrune is universal in that it can be applied after any pruned FL scheme on the client, without modification, and protects against any inversion attack by the server. Our empirical evaluation demonstrates that PriPrune significantly improves the privacy-accuracy tradeoff compared to state-of-the-art pruned FL schemes that do not take privacy into account.
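
As background for the pruning masks mentioned above, here is a minimal sketch of generic magnitude pruning with a binary mask. It only illustrates what a pruning mask and a pruning rate are; it is not PriPrune's defense mask, and the function name and example weights are hypothetical.

```python
def magnitude_prune(weights, prune_rate):
    """Zero out the `prune_rate` fraction of weights with the smallest magnitudes.

    Returns the pruned weights and the binary mask (1 = kept, 0 = pruned).
    """
    if not 0.0 <= prune_rate <= 1.0:
        raise ValueError("prune_rate must lie in [0, 1]")
    num_to_prune = int(len(weights) * prune_rate)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:num_to_prune])
    mask = [0 if i in pruned else 1 for i in range(len(weights))]
    return [w * m for w, m in zip(weights, mask)], mask


# Hypothetical flattened weights of a local model update.
local_update = [0.5, -0.1, 2.0, 0.05]
sparse_update, mask = magnitude_prune(local_update, prune_rate=0.5)
```

In a federated setting, a client would send something like `sparse_update` instead of the dense update, which is the intuition behind expecting pruning to leak less information.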

翻訳日:2023-11-01 17:49:11 公開日:2023-10-30

# 時空間ビッグデータのためのディープラーニング: 機会と課題に関するビジョン

Deep Learning for Spatiotemporal Big Data: A Vision on Opportunities and Challenges ( http://arxiv.org/abs/2310.19957v1 )

ライセンス: Link先を確認

Zhe Jiang

(参考訳) GPS、リモートセンシング、計算シミュレーションの進歩により、地球科学、農業、スマートシティ、公共の安全にまたがる様々な応用領域から膨大な時空間データが収集されている。このような新たな地理空間的および時空間的ビッグデータは、ディープラーニング技術の最近の進歩と相まって、これまで不可能だった問題を解決する新たな機会を育んでいる。たとえばリモートセンシングの研究者たちは、地球画像のビッグデータを使って、多くの土地被覆と土地利用モデリングタスクのための基礎モデルを訓練できる。沿岸モデラーはAIサロゲートを訓練して数値シミュレーションを高速化することができる。しかし、時空間ビッグデータの特徴は、ディープラーニング技術に新たな課題をもたらす。本稿では,様々なタイプの時空間ビッグデータを紹介し,時空間ビッグデータに適用する深層学習分野における新たな研究機会について論じ,ユニークな課題をリストアップし,今後の研究ニーズを明らかにする。

With advancements in GPS, remote sensing, and computational simulation, an enormous volume of spatiotemporal data is being collected at an increasing speed from various application domains, spanning Earth sciences, agriculture, smart cities, and public safety. Such emerging geospatial and spatiotemporal big data, coupled with recent advances in deep learning technologies, foster new opportunities to solve problems that have not been possible before. For instance, remote sensing researchers can potentially train a foundation model using Earth imagery big data for numerous land cover and land use modeling tasks. Coastal modelers can train AI surrogates to speed up numerical simulations. However, the distinctive characteristics of spatiotemporal big data pose new challenges for deep learning technologies. This vision paper introduces various types of spatiotemporal big data, discusses new research opportunities in the realm of deep learning applied to spatiotemporal big data, lists the unique challenges, and identifies several future research needs.

翻訳日:2023-11-01 17:48:43 公開日:2023-10-30

# トランスフォーマー言語モデル一般化における深さと幅の影響

The Impact of Depth and Width on Transformer Language Model Generalization ( http://arxiv.org/abs/2310.19956v1 )

ライセンス: Link先を確認

Jackson Petty, Sjoerd van Steenkiste, Ishita Dasgupta, Fei Sha, Dan Garrette, Tal Linzen

(参考訳) 新しい文を処理するには、言語モデル(LM)は構成的に一般化する、すなわち既知の要素を新しい方法で組み合わせる必要がある。モデルの構造のどのような側面が構成的一般化を促進するのだろうか。トランスフォーマーに焦点をあて、最近の理論的および実証的研究に動機づけられた、トランスフォーマーはより深い(より多くの層を持つ)ほど構成的によりよく一般化するという仮説を検証する。単に層を追加するとパラメータ総数が増加し、深さとサイズが交絡するため、パラメータ総数が一定(41M、134M、374Mパラメータ)となるように深さと幅をトレードオフした3つのモデル群を構築した。すべてのモデルをLMとして事前学習し、構成的一般化をテストするタスクで微調整する。主な結論として次の3つを報告する。 (1) 微調整後、より深いモデルはより浅いモデルよりも分布外でよりよく一般化するが、層の追加による相対的な利益は急速に減少する。(2) 各モデル群の中では、より深いモデルの方が優れた言語モデリング性能を示すが、その利得も同様に逓減する。(3) 構成的一般化に対する深さの利点は、言語モデリングや分布内データでの性能向上だけには帰着できない。

To process novel sentences, language models (LMs) must generalize compositionally -- combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the hypothesis, motivated by recent theoretical and empirical work, that transformers generalize more compositionally when they are deeper (have more layers). Because simply adding layers increases the total number of parameters, confounding depth and size, we construct three classes of models which trade off depth for width such that the total number of parameters is kept constant (41M, 134M and 374M parameters). We pretrain all models as LMs and fine-tune them on tasks that test for compositional generalization. We report three main conclusions: (1) after fine-tuning, deeper models generalize better out-of-distribution than shallower models do, but the relative benefit of additional layers diminishes rapidly; (2) within each family, deeper models show better language modeling performance, but returns are similarly diminishing; (3) the benefits of depth for compositional generalization cannot be attributed solely to better performance on language modeling or on in-distribution data.
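
The idea of trading depth for width under a fixed parameter budget can be illustrated with a rough parameter count. The sketch below assumes the common approximation of about $12 \cdot L \cdot d^2$ non-embedding parameters for $L$ layers of width $d$ with a feed-forward size of $4d$; the paper may vary a different width dimension, so the function names and the resulting widths are illustrative assumptions rather than the authors' configurations.

```python
import math


def approx_params(num_layers, d_model):
    """Approximate non-embedding parameter count, assuming d_ff = 4 * d_model."""
    # Per layer: 4 * d^2 for the attention projections + 8 * d^2 for the feed-forward block.
    return 12 * num_layers * d_model ** 2


def width_for_budget(param_budget, num_layers):
    """Largest d_model whose approximate parameter count fits within the budget."""
    return math.isqrt(param_budget // (12 * num_layers))


# Deeper models must be narrower to stay within the same (here 41M) budget.
budget = 41_000_000
family = {layers: width_for_budget(budget, layers) for layers in (1, 2, 4, 8, 16)}
```

Under this approximation, doubling the number of layers shrinks the affordable width by a factor of about $\sqrt{2}$, which is what keeps size constant while depth changes.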

翻訳日:2023-11-01 17:48:28 公開日:2023-10-30

# 観察研究による行動変化の計測 : 概観

Measuring Behavior Change with Observational Studies: a Review ( http://arxiv.org/abs/2310.19951v1 )

ライセンス: Link先を確認

Arianna Pera, Gianmarco de Francisci Morales, Luca Maria Aiello

(参考訳) デジタル時代の行動変化を探求することは、21世紀の課題の文脈における社会的進歩に不可欠である。 148の論文(2000-2023)を分析し、オンライン行動変化を特徴づける行動と変化検出方法論、参照プラットフォーム、理論的枠組みを分類するマップを構築した。我々の調査結果は、感情の変化への注目、API制限のあるプラットフォームへの偏重、理論の統合が限定的であることを明らかにした。オンライン行動変化の研究において、より幅広い行動タイプ、多様なデータソース、そして理論と実践のより強い整合性を捉えられる方法論を提唱する。

Exploring behavioral change in the digital age is imperative for societal progress in the context of 21st-century challenges. We analyzed 148 articles (2000-2023) and built a map that categorizes behaviors and change detection methodologies, platforms of reference, and theoretical frameworks that characterize online behavior change. Our findings uncover a focus on sentiment shifts, an emphasis on API-restricted platforms, and limited theory integration. We call for methodologies able to capture a wider range of behavioral types, diverse data sources, and stronger theory-practice alignment in the study of online behavioral change.

翻訳日:2023-11-01 17:48:02 公開日:2023-10-30

# ゲート依存雑音下における古典影の安定性

Stability of classical shadows under gate-dependent noise ( http://arxiv.org/abs/2310.19947v1 )

ライセンス: Link先を確認

Raphael Brieger, Markus Heinrich, Ingo Roth, Martin Kliesch

(参考訳) オブザーバブルの期待値は、いわゆる古典影、すなわち繰り返し準備された量子状態に対するランダム化された基底での測定結果を用いて日常的に推定される。実際の影推定の精度を信頼するためには、現実的な雑音下での推定器の挙動を理解することが重要である。本研究では、クリフォードユニタリを含む任意の影推定プロトコルが、安定化ノルム(もともとクリフォード回路のシミュレーションの文脈で導入された量)が有界なオブザーバブルについて、ゲート依存ノイズの下で安定であることを証明する。これらのオブザーバブルについては、プロトコルのサンプル複雑性が本質的にノイズのない場合と同一であることも示す。対照的に、「マジック」オブザーバブルの推定は、システムサイズに対して指数関数的にスケールするバイアスを被りうることを示す。さらに、ノイズの緩和を目的とするいわゆるロバスト影は、ゲート依存ノイズが存在する場合、緩和なしの古典影と比べて大きなバイアスを生じうることがわかった。技術的なレベルでは、影推定器に影響を与える平均ノイズチャネルを特定し、ノイズ誘起バイアスのよりきめ細かな制御を可能にする。

Expectation values of observables are routinely estimated using so-called classical shadows, the outcomes of randomized bases measurements on a repeatedly prepared quantum state. In order to trust the accuracy of shadow estimation in practice, it is crucial to understand the behavior of the estimators under realistic noise. In this work, we prove that any shadow estimation protocol involving Clifford unitaries is stable under gate-dependent noise for observables with bounded stabilizer norm, a quantity originally introduced in the context of simulating Clifford circuits. For these observables, we also show that the protocol's sample complexity is essentially identical to the noiseless case. In contrast, we demonstrate that estimation of 'magic' observables can suffer from a bias that scales exponentially in the system size. We further find that so-called robust shadows, aiming at mitigating noise, can introduce a large bias in the presence of gate-dependent noise compared to unmitigated classical shadows. On a technical level, we identify average noise channels that affect shadow estimators and allow for a more fine-grained control of noise-induced biases.
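
As a concrete picture of shadow estimation, the following sketch simulates the simplest noiseless case: a single qubit measured in a uniformly random Pauli basis, with the expectation value of $Z$ estimated from the snapshots. It is a toy illustration of the standard random-Pauli protocol, not the noisy Clifford setting analysed in the paper; the function name, the Bloch-vector parametrisation, and the seed are assumptions made for this example.

```python
import random


def estimate_z(bloch, num_snapshots, seed=0):
    """Classical-shadow estimate of <Z> for one qubit with Bloch vector (rx, ry, rz)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_snapshots):
        axis = rng.randrange(3)  # 0 = X, 1 = Y, 2 = Z measurement basis
        # The outcome is +1 with probability (1 + r_axis) / 2, otherwise -1.
        outcome = 1 if rng.random() < (1.0 + bloch[axis]) / 2.0 else -1
        # Inverting the measurement channel gives 3 * outcome when the basis
        # matches the observable and 0 otherwise.
        if axis == 2:
            total += 3.0 * outcome
    return total / num_snapshots


# State |0> has Bloch vector (0, 0, 1), so the estimate should be close to 1.
z_estimate = estimate_z((0.0, 0.0, 1.0), num_snapshots=30000)
```

The factor 3 compensates for the fact that only one third of the random bases are informative about $Z$; noise in the basis-change gates perturbs exactly this inversion step, which is where the biases discussed above enter.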

翻訳日:2023-11-01 17:47:50 公開日:2023-10-30

# 軌道予測のための条件付きアンセンテッド・オートエンコーダ

Conditional Unscented Autoencoders for Trajectory Prediction ( http://arxiv.org/abs/2310.19944v1 )

ライセンス: Link先を確認

Faris Janjoš, Marcel Hallgarten, Anthony Knittel, Maxim Dolgov, Andreas Zell, J. Marius Zöllner

(参考訳) 条件付き変分オートエンコーダ(CVAE)は、自動運転(AD)の軌道予測において最も広く使われているモデルの一つである。 CVAEは運転コンテキストとその正解の未来との相互作用を確率的潜在空間に捉え、それを用いて予測を生成する。本稿では、CVAEの主要な構成要素を見直す。 CVAEの基礎であるVAEの分野における最近の進歩は、サンプリング手順の単純な変更が性能に大きく寄与しうることを示しており、我々はこれを活用する。任意の学習済み分布から決定論的にサンプルを抽出するアンセンテッドサンプリングは、危険を伴いうるランダムサンプリングよりも軌道予測に自然に適している可能性があることがわかった。さらに、より構造化された混合潜在空間や、CVAEによる推論を行う新しく、より表現力が高い可能性のある方法など、追加の改善も提供する。 INTERACTION予測データセットでの評価で最先端手法を上回り、CelebAデータセット上の画像モデリングタスクでもベースラインのバニラCVAEを上回ることで、我々のモデルの幅広い適用性を示す。コードはhttps://github.com/boschresearch/cuae-predictionで入手できる。

The conditional variational autoencoder (CVAE) is one of the most widely-used models in trajectory prediction for autonomous driving (AD). It captures the interplay between a driving context and its ground-truth future into a probabilistic latent space and uses it to produce predictions. In this paper, we challenge key components of the CVAE. We leverage recent advances in the space of the VAE, the foundation of the CVAE, which show that a simple change in the sampling procedure can greatly benefit performance. We find that unscented sampling, which draws samples from any learned distribution in a deterministic manner, can naturally be better suited to trajectory prediction than potentially dangerous random sampling. We go further and offer additional improvements, including a more structured mixture latent space, as well as a novel, potentially more expressive way to do inference with CVAEs. We show wide applicability of our models by evaluating them on the INTERACTION prediction dataset, outperforming the state of the art, as well as at the task of image modeling on the CelebA dataset, outperforming the baseline vanilla CVAE. Code is available at https://github.com/boschresearch/cuae-prediction.

翻訳日:2023-11-01 17:47:31 公開日:2023-10-30

# SURF:流体力学を予測するGNNの一般化ベンチマーク

SURF: A Generalization Benchmark for GNNs Predicting Fluid Dynamics ( http://arxiv.org/abs/2310.20049v1 )

ライセンス: Link先を確認

Stefan Künzli, Florian Grötschla, Joël Mathys and Roger Wattenhofer

(参考訳) 流体力学のシミュレーションは、単純なバルブから複雑なターボ機械まで、設計と開発プロセスに不可欠である。基礎となる物理方程式を正確に解くことは計算コストが高い。したがって、メッシュ上の相互作用をモデル化する学習ベースのソルバは、その高速化が期待できるため関心を集めている。しかし、これらのモデルが根底にある物理原理をどの程度真に理解し、単に補間するのではなく一般化できるのかは不明である。一般化は、異なるトポロジー、解像度、熱力学的範囲に適応すべき汎用流体シミュレータの重要な要件である。学習したグラフに基づく流体シミュレータの一般化をテストするためのベンチマークであるSURFを提案する。 SURFは個々のデータセットで構成され、異なるモデルを評価し比較するための特定の性能および一般化のメトリクスを提供する。我々は2つの最先端グラフベースモデルを徹底的に調査することでSURFの適用性を実証的に示し、それらの一般化に関する新たな洞察を得る。

Simulating fluid dynamics is crucial for the design and development process, ranging from simple valves to complex turbomachinery. Accurately solving the underlying physical equations is computationally expensive. Therefore, learning-based solvers that model interactions on meshes have gained interest due to their promising speed-ups. However, it is unknown to what extent these models truly understand the underlying physical principles and can generalize rather than interpolate. Generalization is a key requirement for a general-purpose fluid simulator, which should adapt to different topologies, resolutions, or thermodynamic ranges. We propose SURF, a benchmark designed to test the generalization of learned graph-based fluid simulators. SURF comprises individual datasets and provides specific performance and generalization metrics for evaluating and comparing different models. We empirically demonstrate the applicability of SURF by thoroughly investigating the two state-of-the-art graph-based models, yielding new insights into their generalization.

翻訳日:2023-11-01 17:39:33 公開日:2023-10-30

# ダブルウェルポテンシャルにおける双極子超固体の融合

Merging Dipolar Supersolids in a Double-Well Potential ( http://arxiv.org/abs/2310.20018v1 )

ライセンス: Link先を確認

Hui Li, Eli Halperin, Shai Ronen, and John L. Bohn

(参考訳) 双極子ボース-アインシュタイン凝縮体による2つの同一超固体の融合挙動を理論的に検討した。特定のトラップアスペクト比のために2つの井戸間の障壁高さと間隔を断熱的に調整することにより、2つの超固体が互いに移動し、超固体状態、マクロドロップレット状態、リング状態、迷路状態を含む様々な基底状態相が出現する。我々は、マージ遷移中に見られる様々な状態を特徴付ける位相図を構築する。さらにガスの2つの部分を引き離すのに必要な力を計算し、マージした超固体が変形可能なプラスチック材料のように作用することを発見した。我々の研究は、双極子超固体の層構造とそれらの相互作用の将来の研究の道を開く。

We theoretically investigate the merging behaviour of two identical supersolids through dipolar Bose-Einstein condensates confined within a double-well potential. By adiabatically tuning the barrier height and the spacing between the two wells for specific trap aspect ratios, the two supersolids move toward each other and lead to the emergence of a variety of ground state phases, including a supersolid state, a macrodroplet state, a ring state, and a labyrinth state. We construct a phase diagram that characterizes various states seen during the merging transition. Further, we calculate the force required to pull the two portions of the gas apart, finding that the merged supersolids act like a deformable plastic material. Our work paves the way for future studies of layer structure in dipolar supersolids and the interaction between them in experiments.

翻訳日:2023-11-01 17:39:17 公開日:2023-10-30

# シリコン中の空洞結合型テレコム原子源

Cavity-coupled telecom atomic source in silicon ( http://arxiv.org/abs/2310.20014v1 )

ライセンス: Link先を確認

Adam Johnston, Ulises Felix-Rendon, Yu-En Wong, Songtao Chen

(参考訳) 固体材料の原子欠陥は量子相互接続やネットワーク応用に有望な候補である。近年、シリコンプラットフォームにおいて、成熟したシリコンフォトニクスとエレクトロニクス技術によってスケーラブルなデバイス統合を可能にする一連の原子欠陥が特定されている。特に、T中心は、テレコムバンドの光遷移と長いコヒーレンス時間を持つ二重基底状態の電子スピン多様体により、非常に有望である。しかし、T中心プラットフォームを前進させるためのオープンな課題は、弱く遅いゼロフォノン線放出を強化することである。本研究では,単一T中心からの共振器増強蛍光放出を実証する。これは、単一のT中心を低損失かつ小モード体積のシリコンフォトニック結晶キャビティに統合することで実現され、結果として蛍光崩壊速度が $F$ = 6.89 倍に増強される。効率的な光子抽出により、ゼロフォノン線における平均光子アウトカップリング速度73.3kHzを達成することができる。結合系の力学はリンドブラッドマスター方程式を解いてよくモデル化される。これらの結果は、量子情報処理およびネットワークアプリケーションのための効率的なT中心スピンフォトンインターフェースを構築するための重要なステップである。

Atomic defects in solid-state materials are promising candidates for quantum interconnect and networking applications. Recently, a series of atomic defects have been identified in the silicon platform, where scalable device integration can be enabled by mature silicon photonics and electronics technologies. In particular, T centers hold great promise due to their telecom band optical transitions and the doublet ground state electronic spin manifold with long coherence times. However, an open challenge for advancing the T center platform is to enhance its weak and slow zero phonon line emission. In this work, we demonstrate the cavity-enhanced fluorescence emission from a single T center. This is realized by integrating single T centers with a low-loss, small mode-volume silicon photonic crystal cavity, which results in an enhancement of the fluorescence decay rate by a factor of $F$ = 6.89. Efficient photon extraction enables the system to achieve an average photon outcoupling rate of 73.3 kHz at the zero phonon line. The dynamics of the coupled system is well modeled by solving the Lindblad master equation. These results represent a significant step towards building efficient T center spin-photon interfaces for quantum information processing and networking applications.

翻訳日:2023-11-01 17:39:02 公開日:2023-10-30

# 外乱に対するマルチスケール特徴属性

Multiscale Feature Attribution for Outliers ( http://arxiv.org/abs/2310.20012v1 )

ライセンス: Link先を確認

Jeff Shen, Peter Melchior

(参考訳) 機械学習のテクニックは、巨大なデータセットの外れ値を自動的に識別し、人間の検査よりもはるかに高速で再現性が高い。どの機能がこの入力を異常にレンダリングするのか? 本稿では, 異常なテストデータがトレーニングデータの限界を超える可能性があり, モデル性能が疑わしいと期待できる機能の種類を, 特定したい機能の種類についてほとんど知らないような, 外れ値に特化して設計された機能帰属手法である逆マルチスケールオクルージョンを提案する。我々は、ダークエネルギーサーベイ機器から銀河スペクトルから検出された異常値の方法を示し、その結果が別の帰属アプローチよりもずっと解釈可能であることを見出した。

Machine learning techniques can automatically identify outliers in massive datasets, much faster and more reproducibly than human inspection ever could. But finding such outliers immediately leads to the question: which features render this input anomalous? We propose a new feature attribution method, Inverse Multiscale Occlusion, that is specifically designed for outliers, for which we have little knowledge of the type of features we want to identify and expect that the model performance is questionable because anomalous test data likely exceed the limits of the training data. We demonstrate our method on outliers detected in galaxy spectra from the Dark Energy Survey Instrument and find its results to be much more interpretable than alternative attribution approaches.

翻訳日:2023-11-01 17:38:43 公開日:2023-10-30

# 進化的テーブルトップゲームデザイン:リスクゲームにおけるケーススタディ

Evolutionary Tabletop Game Design: A Case Study in the Risk Game ( http://arxiv.org/abs/2310.20008v1 )

ライセンス: Link先を確認

Lana Bertoldo Rossato, Leonardo Boaventura Bombardelli, and Anderson Rocha Tavares

(参考訳) 手動でゲームを作成して評価するのは大変な作業です。手続き的コンテンツ生成はゲームアーチファクトを作成するのに役立つが、通常はゲーム全体ではない。進化的アルゴリズムと自動プレイテストを組み合わせた進化的ゲームデザインは、単純な機器で新しいボードゲームを作成するために用いられてきたが、元々のアプローチにはサイコロ、カード、地図を備えた複雑なテーブルトップゲームは含まれていない。本研究は, テーブルトップゲームに対するアプローチの拡張を提案し, プレイヤーがマップテリトリーを征服して勝利しなければならない軍事戦略ゲームである, リスクの変種を生成することにより, プロセスを評価する。遺伝的アルゴリズムを用いて選択したパラメータを進化させ、ゲームをテストするためのルールベースのエージェントと、生成された新しいバリエーションを評価するための様々な品質基準を用いてこれを達成した。その結果,より小さなマップでオリジナルゲームの新たなバリエーションが作成され,より短いマッチが得られた。また、よりバランスの取れたマッチが作られ、通常のドラマが維持される。また、目的関数が正しく追求される場合が多いが、生成されたゲームはほとんど自明であった。この研究は、古典的なボードゲームを超えた進化的ゲームデザインの使用に関する有望な研究への道を開いた。

Creating and evaluating games manually is an arduous and laborious task. Procedural content generation can aid by creating game artifacts, but usually not an entire game. Evolutionary game design, which combines evolutionary algorithms with automated playtesting, has been used to create novel board games with simple equipment; however, the original approach does not include complex tabletop games with dice, cards, and maps. This work proposes an extension of the approach for tabletop games, evaluating the process by generating variants of Risk, a military strategy game where players must conquer map territories to win. We achieved this using a genetic algorithm to evolve the chosen parameters, as well as a rules-based agent to test the games and a variety of quality criteria to evaluate the new variations generated. Our results show the creation of new variations of the original game with smaller maps, resulting in shorter matches. Also, the variants produce more balanced matches, maintaining the usual drama. We also identified limitations in the process: in many cases, the objective function was correctly pursued, but the generated games were nearly trivial. This work paves the way towards promising research regarding the use of evolutionary game design beyond classic board games.

翻訳日:2023-11-01 17:38:28 公開日:2023-10-30

# 強化学習におけるトンプソンサンプリングのためのベイズ回帰境界の改良

Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning ( http://arxiv.org/abs/2310.20007v1 )

ライセンス: Link先を確認

Ahmadreza Moradipari, Mohammad Pedramfar, Modjtaba Shokrian Zini, Vaneet Aggarwal

(参考訳) 本稿では,複数設定の強化学習におけるトンプソンサンプリングに対する最初のベイズ的後悔の限界を実証する。本稿では,サロゲート環境の離散セットを用いた学習問題を単純化し,後方整合性を用いた情報比率の高精度解析を提案する。これは、$H$ がエピソードの長さ、$d_{l_1}$ が環境空間のコルモゴロフ $l_1$ 次元であるような時間非均質な強化学習問題において、オーダー $\widetilde{O}(H\sqrt{d_{l_1}T})$ の上界を導く。次に、表、線形、有限混合といった様々な設定で$d_{l_1}$の具体的な境界を見つけ、その結果がどのようにそれらの種類の最初のものであるか、それとも最先端の技術を改善するかについて議論する。

In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We simplify the learning problem using a discrete set of surrogate environments, and present a refined analysis of the information ratio using posterior consistency. This leads to an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time-inhomogeneous reinforcement learning problem where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1$-dimension of the space of environments. We then find concrete bounds of $d_{l_1}$ in a variety of settings, such as tabular, linear and finite mixtures, and discuss how our results are either the first of their kind or improve the state-of-the-art.
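
For readers unfamiliar with Thompson Sampling, the following is a minimal sketch of the idea in the simplest possible setting, a Bernoulli multi-armed bandit with Beta priors. This is a deliberate simplification and not the reinforcement learning setting analyzed in the paper; the function name, the uniform Beta(1, 1) prior, and the arm probabilities in the usage note are all assumptions made for illustration.

```python
import random


def thompson_bernoulli(true_probs, horizon, seed=0):
    """Thompson Sampling on a Bernoulli bandit with Beta(1, 1) priors.

    Each round: draw one sample per arm from its Beta posterior, play the
    arm with the largest sample, then update that arm's posterior.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # 1 + number of observed successes per arm
    beta = [1.0] * k   # 1 + number of observed failures per arm
    pulls = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta
```

Calling `thompson_bernoulli([0.1, 0.9], 2000)` concentrates the pulls on the second arm as its posterior sharpens. The regret bounds in the abstract quantify how quickly this kind of posterior-driven exploration stops paying for suboptimal choices.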

翻訳日:2023-11-01 17:38:07 公開日:2023-10-30

# スペイン語における抑うつ・摂食障害の早期発見:UNSL at MentalRiskES 2023

Early Detection of Depression and Eating Disorders in Spanish: UNSL at MentalRiskES 2023 ( http://arxiv.org/abs/2310.20003v1 )

ライセンス: Link先を確認

Horacio Thompson, Marcelo Errecalde

(参考訳) MentalRiskESは、スペイン語の早期のリスク検出に関連する問題を解決するための新しい挑戦である。目的は、異なるタスクを考慮した精神障害の兆候を示すTelegramユーザーをできるだけ早く検出することである。タスク1は、ユーザの摂食障害の検出、うつ病検出に焦点を当てたタスク2、未知の障害の検出を目的としたタスク3を含む。これらのタスクはサブタスクに分割され、それぞれが解決アプローチを定義する。我々の研究グループは、タスク1と2のサブタスクA、すなわちユーザが陽性か陰性かを評価する二値分類問題に参加した。これらの課題を解決するために, 早期検出フレームワークで定義された基準に従って, 変圧器に基づくモデルと決定方針を提案する。モデルの1つは、解決すべき各タスクに重要な単語を持つ拡張語彙を示した。さらに,ユーザ評価中にモデルが実行する予測履歴に基づいて決定ポリシーを適用した。タスク1と2では,分類とレイテンシに基づくランキングで2番目に高い性能を示し,スペイン語の早期検出問題に対するアプローチの有効性と一貫性を実証した。

MentalRiskES is a novel challenge that proposes to solve problems related to early risk detection for the Spanish language. The objective is to detect, as soon as possible, Telegram users who show signs of mental disorders considering different tasks. Task 1 involved detecting eating disorders in users, Task 2 focused on depression detection, and Task 3 aimed at detecting an unknown disorder. These tasks were divided into subtasks, each one defining a resolution approach. Our research group participated in subtask A for Tasks 1 and 2: a binary classification problem that evaluated whether the users were positive or negative. To solve these tasks, we proposed models based on Transformers followed by a decision policy according to criteria defined by an early detection framework. One of the models presented an extended vocabulary with important words for each task to be solved. In addition, we applied a decision policy based on the history of predictions that the model performs during user evaluation. For Tasks 1 and 2, we obtained the second-best performance according to rankings based on classification and latency, demonstrating the effectiveness and consistency of our approaches for solving early detection problems in the Spanish language.

翻訳日:2023-11-01 17:37:46 公開日:2023-10-30

# システム相互運用性タイプ:3次研究

Systems Interoperability Types: A Tertiary Study ( http://arxiv.org/abs/2310.19999v1 )

ライセンス: Link先を確認

Rita S. P. Maciel and Pedro H. Valle and K\'ecia S. Santos and Elisa Y. Nakagawa

(参考訳) 相互運用性は、いくつかの相互運用性タイプ(またはレベル)、多様なモデル、フレームワーク、ソリューションが出現し、また異なるドメインからの継続的な取り組みの結果、少なくとも40年以上にわたって注目されてきた。ブロックチェーンやIoT、Industrial 4.0のような新しいアプリケーションドメインといったテクノロジの現在の異質性は、新たなインタラクションの可能性だけでなく、相互運用性の課題ももたらす。さらに、相互運用性のタイプに対する現在の理解における混乱と曖昧さが存在し、ステークホルダーのコミュニケーションと意思決定を妨げる。この研究は、ソフトウェア集約型システムの相互運用性のパノラマを更新し、そのタイプに特に注目する。そこで我々は,2012年から2023年にかけて発行された37のセカンダリ研究を精査し,13の相互運用モデルと6つのフレームワークに加えて,117の異なる定義に関連する36の相互運用タイプを発見した。このパノラマは、相互運用性に関する懸念が、ソフトウェアシステムの境界を越えて技術から社会技術的問題へと移行し、いまだに多くのオープンな問題を解決する必要があることを明かしている。我々はまた、低結合で費用効率で相互運用可能なシステムを実現するために、インターオペラビリティを多分野の研究分野として活用する緊急行動および潜在的研究機会に対処する。

Interoperability has been a focus of attention over at least four decades, with the emergence of several interoperability types (or levels), diverse models, frameworks, and solutions, also as a result of a continuous effort from different domains. The current heterogeneity in technologies such as blockchain, IoT and new application domains such as Industry 4.0 brings not only new interaction possibilities but also challenges for interoperability. Moreover, confusion and ambiguity in the current understanding of interoperability types exist, hampering stakeholders' communication and decision making. This work presents an updated panorama of software-intensive systems interoperability with particular attention to its types. For this, we conducted a tertiary study that scrutinized 37 secondary studies published from 2012 to 2023, from which we found 36 interoperability types associated with 117 different definitions, besides 13 interoperability models and six frameworks in various domains. This panorama reveals that the concern with interoperability has migrated from technical to social-technical issues going beyond the software systems' boundary and still requiring solving many open issues. We also address urgent actions and potential research opportunities to leverage interoperability as a multidisciplinary research field to achieve low-coupled, cost-effective, and interoperable systems.

翻訳日:2023-11-01 17:37:29 公開日:2023-10-30

# 生成的検索型オントロジグラフとマルチエージェント戦略による大型言語モデルに基づく材料設計

Generative retrieval-augmented ontologic graph and multi-agent strategies for interpretive large language model-based materials design ( http://arxiv.org/abs/2310.19998v1 )

ライセンス: Link先を確認

Markus J. Buehler

(参考訳) トランスフォーマーニューラルネットワークは、特に材料分析、設計、製造において、人間の言語、記号、コード、数値データのいずれでも効果的に機能する能力を含む、有望な能力を示している。本稿では,材料の工学的分析を支援するツールとして大規模言語モデル (LLM) の利用,主題領域の重要情報検索,研究仮説の展開,異なる知識領域にわたる機械的関係の発見,物理基底真理に基づく能動的知識生成のためのシミュレーションコードの作成と実行について検討する。特定の機能、機能、インストラクションを備えたAIエージェントのセットとして使用される場合、LLMは分析および設計問題におけるアプリケーションのための強力な問題解決戦略を提供することができる。本実験は,材料力学領域のトレーニングデータを基に開発した微調整モデルであるMechGPTを用いて行った。まず、ファインチューニングがドメイン知識を合理的に理解してLLMを実現するかを確認します。しかし、学習内容の文脈外でクエリを行うと、LLMは正しい情報を思い出すことが困難になる。モデルがどのような概念を重要か,どのように関連しているかを理解するための,検索から導かれるオントロジナレッジグラフ戦略を用いて,これに対処する方法を示す。このような戦略は、ノード、エッジ、サブグラフのレベルで豊富な情報を持つ解釈可能なグラフ構造を提供することもできる。非線形サンプリング戦略とエージェントベースモデリングを複合質問応答に適用し,能動的学習密度汎関数理論(DFT)モデリングとデータ解析から自動力場開発の文脈におけるコード生成と実行について検討した。

Transformer neural networks show promising capabilities, in particular for uses in materials analysis, design and manufacturing, including their capacity to work effectively with human language, symbols, code, and numerical data. Here we explore the use of large language models (LLMs) as a tool that can support engineering analysis of materials, applied to retrieving key information about subject areas, developing research hypotheses, discovery of mechanistic relationships across disparate areas of knowledge, and writing and executing simulation codes for active knowledge generation based on physical ground truths. When used as sets of AI agents with specific features, capabilities, and instructions, LLMs can provide powerful problem solution strategies for applications in analysis and design problems. Our experiments focus on using a fine-tuned model, MechGPT, developed based on training data in the mechanics of materials domain. We first affirm how finetuning endows LLMs with reasonable understanding of domain knowledge. However, when queried outside the context of learned matter, LLMs can have difficulty recalling correct information. We show how this can be addressed using retrieval-augmented Ontological Knowledge Graph strategies that discern how the model understands what concepts are important and how they are related. Illustrated for a use case of relating distinct areas of knowledge - here, music and proteins - such strategies can also provide an interpretable graph structure with rich information at the node, edge and subgraph level. We discuss nonlinear sampling strategies and agent-based modeling applied to complex question answering, code generation and execution in the context of automated force field development from actively learned Density Functional Theory (DFT) modeling, and data analysis.

翻訳日:2023-11-01 17:37:08 公開日:2023-10-30

# トランスダクティブ・マイノショット学習のための適応型アンカーラベル伝播

Adaptive Anchor Label Propagation for Transductive Few-Shot Learning ( http://arxiv.org/abs/2310.19996v1 )

ライセンス: Link先を確認

Michalis Lazarou, Yannis Avrithis, Guangyu Ren, Tania Stathaki

(参考訳) 少数ショット学習は、限られたラベル付きデータを用いて画像を分類する問題に取り組む。ラベル伝搬などのトランスダクティブ推論手法を用いてラベルなしデータを活用することで,少数ショット学習の性能を大幅に向上できることが示されている。ラベル伝搬は、データの基盤となる多様体構造を利用する構築グラフを利用して、ラベルのないデータの擬似ラベルを推論する。しかし、既存のラベル伝播アプローチの限界は、すべてのデータポイントの位置が固定されており、アルゴリズムが可能な限り効果的ではないように準最適であるかもしれないことである。本研究では,その過程における多様体の位置を最適化する可微分損失関数を最小化することにより,ラベル付きデータの特徴埋め込みを適応する新しいアルゴリズムを提案する。提案アルゴリズムであるAdaptive Anchor Label Propagationは,1ショット設定と5ショット設定において,標準ラベル伝搬アルゴリズムをそれぞれ最大7%,2%向上させる。miniImageNet, tieredImageNet, CUB, CIFAR-FS の4つのベンチマークデータセットと, ResNet12 と WideResNet-28-10 の2つのバックボーンを用いて,提案アルゴリズムの利点を明らかにする実験結果を示す。ソースコードはhttps://github.com/MichalisLazarou/A2LPで確認できる。

Few-shot learning addresses the issue of classifying images using limited labeled data. Exploiting unlabeled data through the use of transductive inference methods such as label propagation has been shown to improve the performance of few-shot learning significantly. Label propagation infers pseudo-labels for unlabeled data by utilizing a constructed graph that exploits the underlying manifold structure of the data. However, a limitation of the existing label propagation approaches is that the positions of all data points are fixed and might be sub-optimal so that the algorithm is not as effective as possible. In this work, we propose a novel algorithm that adapts the feature embeddings of the labeled data by minimizing a differentiable loss function optimizing their positions in the manifold in the process. Our novel algorithm, Adaptive Anchor Label Propagation, outperforms the standard label propagation algorithm by as much as 7% and 2% in the 1-shot and 5-shot settings respectively. We provide experimental results highlighting the merits of our algorithm on four widely used few-shot benchmark datasets, namely miniImageNet, tieredImageNet, CUB and CIFAR-FS and two commonly used backbones, ResNet12 and WideResNet-28-10. The source code can be found at https://github.com/MichalisLazarou/A2LP.
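
The baseline the abstract compares against, label propagation over a graph, can be sketched as follows. This is a common normalized-graph formulation with fixed data positions, written here as an assumption about what "standard label propagation" refers to. It is not the adaptive anchor method proposed in the paper, and the function name, `alpha`, and `iters` are illustrative.

```python
import math


def label_propagation(W, Y, alpha=0.8, iters=200):
    """Propagate labels over a graph with fixed node positions.

    W: symmetric affinity matrix (list of lists, zero diagonal, weights >= 0).
    Y: one-hot rows for labeled nodes, all-zero rows for unlabeled nodes.
    Iterates F <- alpha * S @ F + (1 - alpha) * Y, where S is the
    symmetrically normalized affinity D^{-1/2} W D^{-1/2}.
    Returns the predicted class index for every node.
    """
    n, c = len(W), len(Y[0])
    deg = [sum(row) for row in W]
    S = [
        [W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [
            [alpha * sum(S[i][j] * F[j][k] for j in range(n))
             + (1 - alpha) * Y[i][k]
             for k in range(c)]
            for i in range(n)
        ]
    return [max(range(c), key=lambda k: F[i][k]) for i in range(n)]
```

Because `W` is built once from fixed embeddings, the quality of the pseudo-labels depends entirely on where the labeled points happen to sit. That is the limitation the abstract says the proposed method addresses by adapting the labeled embeddings.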

翻訳日:2023-11-01 17:36:39 公開日:2023-10-30

# 心の感情理論:緩やかな言語推論による高速な視覚処理

Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning ( http://arxiv.org/abs/2310.19995v1 )

ライセンス: Link先を確認

Yasaman Etesam and Ozge Nilay Yalcin and Chuxuan Zhang and Angelica Lim

(参考訳) 画像における感情的心の理論の問題は感情認識のタスクであり、具体的には「境界ボックスの人はどのように感じるか?」と問う。表情、ボディポーズ、文脈情報、暗黙のコモンセンス知識はいずれもタスクの難しさに寄与し、現在このタスクは感情コンピューティングにおいて最も難しい問題の一つである。本研究の目的は,最近の大規模視覚言語モデル (CLIP, LLaVA) と大規模言語モデル (GPT-3.5) に埋め込まれた情緒的常識知識をコンテキスト内感情(EMOTIC)データセット上で評価することである。画像上の純粋テキストに基づく言語モデルを評価するために,26の感情カテゴリに関連する身体的な社会的信号記述872件と,文字表現と設定に関する著者のガイドから引用された感情的に顕著な環境文脈のラベル224件を用いて,感情知覚に関連する「ナラティブキャプション」を構築する。画像から言語、言語から感情へと進むタスクにおいて、得られたキャプションの利用を評価する。EMOTIC上でのゼロショット視覚言語モデルを用いた実験は、"高速"と"低速"の推論の組み合わせが感情認識システムを改善するための有望な方法であることを示している。それでも、EMOTICデータセットでトレーニングされた以前の作業と比べて、ゼロショットの感情的心の理論タスクにはギャップが残っている。

The emotional theory of mind problem in images is an emotion recognition task, specifically asking "How does the person in the bounding box feel?" Facial expressions, body pose, contextual information and implicit commonsense knowledge all contribute to the difficulty of the task, making this task currently one of the hardest problems in affective computing. The goal of this work is to evaluate the emotional commonsense knowledge embedded in recent large vision language models (CLIP, LLaVA) and large language models (GPT-3.5) on the Emotions in Context (EMOTIC) dataset. In order to evaluate a purely text-based language model on images, we construct "narrative captions" relevant to emotion perception, using a set of 872 physical social signal descriptions related to 26 emotional categories, along with 224 labels for emotionally salient environmental contexts, sourced from writer's guides for character expressions and settings. We evaluate the use of the resulting captions in an image-to-language-to-emotion task. Experiments using zero-shot vision-language models on EMOTIC show that combining "fast" and "slow" reasoning is a promising way forward to improve emotion recognition systems. Nevertheless, a gap remains in the zero-shot emotional theory of mind task compared to prior work trained on the EMOTIC dataset.

翻訳日:2023-11-01 17:36:17 公開日:2023-10-30

# PolyThrottle:エッジデバイス上でのエネルギー効率の良いニューラルネットワーク推論

PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices ( http://arxiv.org/abs/2310.19991v1 )

ライセンス: Link先を確認

Minghao Yan, Hongyi Wang, Shivaram Venkataraman

(参考訳) ニューラルネットワーク(NN)が多様な分野に展開されるにつれて、そのエネルギー需要は増加する。いくつかの先行研究は、訓練中のエネルギー消費の削減に重点を置いているが、ML駆動システムの連続運転は、推論中にかなりのエネルギー消費をもたらす。本稿では、従来の研究で無視されるGPU、メモリ、CPU周波数などのデバイス上のハードウェア要素の構成が、通常の微調整によるNN推論におけるエネルギー消費にどのように影響するかを検討する。本稿では,Constrained Bayesian Optimization を用いて,各ハードウェアコンポーネント間で構成を最適化するPolyThrottleを提案する。我々の経験的評価は、人気のあるモデルで最大36%のエネルギーを節約できることを示すエネルギー性能均衡の新しい側面を明らかにする。また、PolyThrottleがアプリケーション制約を満たしつつ、ほぼ最適設定に迅速に収束できることを検証する。

As neural networks (NN) are deployed across diverse sectors, their energy demand correspondingly grows. While several prior works have focused on reducing energy consumption during training, the continuous operation of ML-powered systems leads to significant energy use during inference. This paper investigates how the configuration of on-device hardware elements such as GPU, memory, and CPU frequency, often neglected in prior studies, affects energy consumption for NN inference with regular fine-tuning. We propose PolyThrottle, a solution that optimizes configurations across individual hardware components using Constrained Bayesian Optimization in an energy-conserving manner. Our empirical evaluation uncovers novel facets of the energy-performance equilibrium showing that we can save up to 36 percent of energy for popular models. We also validate that PolyThrottle can quickly converge towards near-optimal settings while satisfying application constraints.

翻訳日:2023-11-01 17:35:50 公開日:2023-10-30

# 学習したローカル検索ヒューリスティックの限界を解き明かす:あなたはミークの最高傑作か?

Unveiling the Limits of Learned Local Search Heuristics: Are You the Mightiest of the Meek? ( http://arxiv.org/abs/2310.19990v1 )

ライセンス: Link先を確認

Ankur Nath, Alan Kuhnle

(参考訳) 近年,ニューラルネットワークと局所探索ヒューリスティックスの組み合わせは,組合せ最適化の分野で人気が高まっている。かなりの計算要求にもかかわらず、このアプローチは最小限の手動工学で有望な結果を示した。しかし,これらの統合の試みの実証的評価において,3つの限界が認められた。第一に、適度な複雑さと弱いベースラインを持つインスタンスは、学習ベースのアプローチの有効性を正確に評価する上で課題となる。第2に,アブレーション研究の欠如により,深層学習アーキテクチャに対する精度の高い改良の定量化と属性化が困難になる。最後に、多様な分布にまたがる学習ヒューリスティックの一般化は未検討のままである。本研究では,これらの制約を包括的に調査する。驚いたことに、Tabu Searchに基づく単純な学習ヒューリスティックは、パフォーマンスと一般化性の点で、最先端(SOTA)学習ヒューリスティックを超越している。本研究の成果は,仮定を克服し,今後の研究と組合せ最適化の革新に向けたエキサイティングな道を開くものである。

In recent years, combining neural networks with local search heuristics has become popular in the field of combinatorial optimization. Despite its considerable computational demands, this approach has exhibited promising outcomes with minimal manual engineering. However, we have identified three critical limitations in the empirical evaluation of these integration attempts. Firstly, instances with moderate complexity and weak baselines pose a challenge in accurately evaluating the effectiveness of learning-based approaches. Secondly, the absence of an ablation study makes it difficult to quantify and attribute improvements accurately to the deep learning architecture. Lastly, the generalization of learned heuristics across diverse distributions remains underexplored. In this study, we conduct a comprehensive investigation into these identified limitations. Surprisingly, we demonstrate that a simple learned heuristic based on Tabu Search surpasses state-of-the-art (SOTA) learned heuristics in terms of performance and generalizability. Our findings challenge prevailing assumptions and open up exciting avenues for future research and innovation in combinatorial optimization.
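
To make the Tabu Search baseline concrete, here is a minimal, non-learned tabu search for Max-Cut. The choice of Max-Cut as the example problem, the function names, and the `iters` and `tenure` parameters are assumptions for illustration. The paper's heuristic is learned and is not reproduced here.

```python
def cut_value(edges, side):
    """Number of edges whose endpoints lie on different sides of the cut."""
    return sum(1 for u, v in edges if side[u] != side[v])


def tabu_search_maxcut(n, edges, iters=100, tenure=2):
    """Plain tabu search for Max-Cut on an unweighted graph with n vertices.

    Each iteration flips the single vertex giving the best resulting cut,
    even if that cut is worse than the current one. A flipped vertex stays
    tabu for the next `tenure` iterations unless flipping it would beat
    the best cut found so far (aspiration criterion).
    """
    side = [0] * n
    best_side, best_val = side[:], cut_value(edges, side)
    tabu_until = [0] * n  # last iteration at which each vertex is still tabu
    for it in range(1, iters + 1):
        candidates = []
        for v in range(n):
            side[v] ^= 1  # tentatively flip v
            val = cut_value(edges, side)
            side[v] ^= 1  # undo the flip
            if tabu_until[v] < it or val > best_val:
                candidates.append((val, v))
        if not candidates:
            continue
        # Best cut value first; ties go to the lowest vertex index.
        val, v = max(candidates, key=lambda c: (c[0], -c[1]))
        side[v] ^= 1
        tabu_until[v] = it + tenure
        if val > best_val:
            best_val, best_side = val, side[:]
    return best_val, best_side
```

The tabu list is what lets the search leave a local optimum instead of immediately undoing its last move. A learned variant would replace the greedy move selection with a trained scoring function.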

翻訳日:2023-11-01 17:35:34 公開日:2023-10-30

# de Broglie-Bohm-Barbour-Bertotti理論の展望について

On the Prospects of a de Broglie-Bohm-Barbour-Bertotti Theory ( http://arxiv.org/abs/2310.19989v1 )

ライセンス: Link先を確認

Antonio Vassallo and Pedro Naranjo

(参考訳) Pure shape dynamics (PSD) は、Julian Barbour と Bruno Bertotti によって提案されたリレーショナルフレームワークの新たな実装である。 PSDは物理学に対するライプニツィアン/マチアンのアプローチであり、物理系の動的進化を完全に記述し、システム自体の外部構造に頼らない。この章では、PSDがいかにして de Broglie-Bohm N体系を効果的に記述し、そのような関係記述の概念上の利点を論じている。この分析は、ド・ブロイとボームの元々の洞察に関する現代の関係論者による説明によって開かれた波動関数の性質の理解を求める新たな方向を浮き彫りにする。

Pure shape dynamics (PSD) is a novel implementation of the relational framework originally proposed by Julian Barbour and Bruno Bertotti. PSD represents a Leibnizian/Machian approach to physics in that it completely describes the dynamical evolution of a physical system without resorting to any structure external to the system itself. The chapter discusses how PSD effectively describes a de Broglie-Bohm N-body system and the conceptual benefits of such a relational description. The analysis will highlight the new directions in the quest for an understanding of the nature of the wave function that are opened up by a modern relationalist elaboration on de Broglie's and Bohm's original insights.

翻訳日:2023-11-01 17:35:19 公開日:2023-10-30

# Web検索と生成モデルを活用した画像分類における弱判定境界の対応

Addressing Weak Decision Boundaries in Image Classification by Leveraging Web Search and Generative Models ( http://arxiv.org/abs/2310.19986v1 )

ライセンス: Link先を確認

Preetam Prabhu Srikar Dammu, Yunhe Feng, Chirag Shah

(参考訳) 機械学習(ML)技術は倫理的および運用上の問題を数多く抱えていることが知られているが、企業による機密性の高いアプリケーションへのデプロイの推進が増えているのを目撃している。多くの主要な問題は、MLモデルが表現不足のグループに対して等しくうまく機能しないことである。これにより、脆弱な人口は不利で不利な立場に置かれる。本稿では,web検索と生成モデルの力を活用し,識別モデルの欠点を緩和する手法を提案する。本研究では,ImageNet の People Subtree サブセットを用いた画像分類問題において,弱い個体群(例えば,有色人種の女性医師)を表すクラスにおいて,頑健性の向上とバイアス軽減に有効であることを示す。提案手法は,(1)弱い判定境界を識別し,(2)DALL-E 2とStable Diffusionによる画像生成のためのテキストだけでなく,Googleの検索クエリを構築すること,(3)新たに取得したトレーニングサンプルが集団バイアス問題を軽減する方法を示す。モデル全体の性能は依然として大幅に改善されているが、モデルの性別精度の相違は著しく低下している(77.30%)。これらの改良に加えて,分類器の判断境界が著しく向上し,弱い点が少なくなり,クラス間の分離が増加するのが特徴である。本研究では,脆弱な個体群に対して本手法を実証するが,提案手法は幅広い問題や領域に拡張可能である。

Machine learning (ML) technologies are known to be riddled with ethical and operational problems; however, we are witnessing an increasing thrust by businesses to deploy them in sensitive applications. One major issue among many is that ML models do not perform equally well for underrepresented groups. This puts vulnerable populations in an even more disadvantaged and unfavorable position. We propose an approach that leverages the power of web search and generative models to alleviate some of the shortcomings of discriminative models. We demonstrate our method on an image classification problem using ImageNet's People Subtree subset, and show that it is effective in enhancing robustness and mitigating bias in certain classes that represent vulnerable populations (e.g., female doctor of color). Our new method is able to (1) identify weak decision boundaries for such classes; (2) construct search queries for Google as well as text for generating images through DALL-E 2 and Stable Diffusion; and (3) show how these newly captured training samples could alleviate the population bias issue. While still improving the model's overall performance considerably, we achieve a significant reduction (77.30%) in the model's gender accuracy disparity. In addition to these improvements, we observed a notable enhancement in the classifier's decision boundary, as it is characterized by fewer weak spots and an increased separation between classes. Although we showcase our method on vulnerable populations in this study, the proposed technique is extendable to a wide range of problems and domains.

翻訳日:2023-11-01 17:35:03 公開日:2023-10-30

# 価値アライメントの前提条件としての概念アライメント

Concept Alignment as a Prerequisite for Value Alignment ( http://arxiv.org/abs/2310.20059v1 )

ライセンス: Link先を確認

Sunayana Rane, Mark Ho, Ilia Sucholutsky, Thomas L. Griffiths

(参考訳) 価値アライメントは、人々と安全かつ確実に対話できるAIシステムを構築するために不可欠である。しかし、ある人が-そしてその価値を評価できることは、現在世界中で何が起こっているのかを理解し、評価するために使われている概念に依存する。概念への価値の依存は、概念のアライメントが価値アライメントの前提条件であることを意味します。本稿では,逆強化学習環境における概念アライメント問題を形式的に解析し,概念アライメントの無視が系統的価値のミスアラインメントにつながることを示すとともに,その概念と価値を共同で推論することで,障害モードを最小化する手法について述べる。また,人間の被験者による実験結果から,エージェントが意図的に行動する際に使用する概念を,共同推論モデルに則って判断することを示した。

Value alignment is essential for building AI systems that can safely and reliably interact with people. However, what a person values -- and is even capable of valuing -- depends on the concepts that they are currently using to understand and evaluate what happens in the world. The dependence of values on concepts means that concept alignment is a prerequisite for value alignment -- agents need to align their representation of a situation with that of humans in order to successfully align their values. Here, we formally analyze the concept alignment problem in the inverse reinforcement learning setting, show how neglecting concept alignment can lead to systematic value misalignment, and describe an approach that helps minimize such failure modes by jointly reasoning about a person's concepts and values. Additionally, we report experimental results with human participants showing that humans reason about the concepts used by an agent when acting intentionally, in line with our joint reasoning model.

翻訳日:2023-11-01 17:27:56 公開日:2023-10-30

# SolarFormer:ソーラーPVプロファイリング用マルチスケールトランス

SolarFormer: Multi-scale Transformer for Solar PV Profiling ( http://arxiv.org/abs/2310.20057v1 )

ライセンス: Link先を確認

Adrian de Luis, Minh Tran, Taisei Hanyu, Anh Tran, Liao Haitao, Roy McCann, Alan Mantooth, Ying Huang, Ngan Le

(参考訳) 気候変動が強まるにつれて、持続可能なエネルギー源へのシフトがより顕著になる。太陽光発電(PV)エネルギーは、信頼性と設置の容易さから好まれる選択である。 PV導入の正確なマッピングは,導入状況を理解し,エネルギー政策を伝える上で重要である。このニーズを満たすために、航空画像からソーラーパネルを分割し、その位置と大きさに関する洞察を提供するSolarFormerを紹介します。しかし、コンピュータビジョンにおけるソーラーパネルの識別は、気象条件、屋根条件、地上サンプリング距離(GSD)の変動など様々な要因により複雑である。これらの複雑さに対処するために、マルチスケールトランスフォーマーエンコーダとマスク付きアテンショントランスフォーマーデコーダを備えたSolarFormerを提案する。本モデルでは,低レベル機能を活用し,太陽PV設置の局所化を強化するインスタンスクエリ機構を組み込んだ。 GGE(France)、IGN(France)、USGS(California, USA)など、さまざまなデータセットを使用して、SolarFormerを厳格に評価しました。我々の広範な実験は、我々のモデルが最先端のモデルに合致するか、超えていることを一貫して実証し、グローバルな持続可能エネルギーイニシアチブのためのソーラーパネルセグメンテーションを約束しています。

As climate change intensifies, the global imperative to shift towards sustainable energy sources becomes more pronounced. Photovoltaic (PV) energy is a favored choice due to its reliability and ease of installation. Accurate mapping of PV installations is crucial for understanding their adoption and informing energy policy. To meet this need, we introduce the SolarFormer, designed to segment solar panels from aerial imagery, offering insights into their location and size. However, solar panel identification in Computer Vision is intricate due to various factors like weather conditions, roof conditions, and Ground Sampling Distance (GSD) variations. To tackle these complexities, we present the SolarFormer, featuring a multi-scale Transformer encoder and a masked-attention Transformer decoder. Our model leverages low-level features and incorporates an instance query mechanism to enhance the localization of solar PV installations. We rigorously evaluated our SolarFormer using diverse datasets, including GGE (France), IGN (France), and USGS (California, USA), across different GSDs. Our extensive experiments consistently demonstrate that our model either matches or surpasses state-of-the-art models, promising enhanced solar panel segmentation for global sustainable energy initiatives.

翻訳日:2023-11-01 17:27:37 公開日:2023-10-30

# 制約付き階層型モンテカルロ信念状態計画

Constrained Hierarchical Monte Carlo Belief-State Planning ( http://arxiv.org/abs/2310.20054v1 )

ライセンス: Link先を確認

Arec Jamgochian, Hugo Buurmeijer, Kyle H. Wray, Anthony Corso, Mykel J. Kochenderfer

(参考訳) 制約付き部分観測可能なマルコフ決定プロセス(CPOMDPs)の最適計画は、コスト制約を満たしつつ報酬目標を最大化し、状態と遷移の不確実性の下で安全な計画を一般化する。残念ながら、大規模または連続的な問題領域ではオンラインCPOMDP計画は非常に難しい。多くの大きなロボットドメインでは、階層的な分解は、高レベルのアクションプリミティブ(オプション)を与えられた低レベル制御のためのツールを使用することで、計画を簡単にすることができる。我々は、この階層を活用し、オンライン検索ベースのCPOMDPプランニングを大規模ロボット問題に拡張するために、Constrained Options Belief Tree Search (COBeTS)を導入する。プリミティブオプションコントローラが割り当てられた制約予算を満たすように定義された場合、COBeTSはいつでも制約を満たす。さもなくば、COBeTSはオプションプリミティブの安全なシーケンスへの検索をガイドし、階層的監視はランタイムの安全性を達成するために使用できる。我々はCOBeTSをいくつかの安全クリティカルで制約のある部分的に観測可能なロボットドメインで実証し、非階層的ベースラインでは不可能な連続CPOMDPで計画できることを示した。

Optimal plans in Constrained Partially Observable Markov Decision Processes (CPOMDPs) maximize reward objectives while satisfying hard cost constraints, generalizing safe planning under state and transition uncertainty. Unfortunately, online CPOMDP planning is extremely difficult in large or continuous problem domains. In many large robotic domains, hierarchical decomposition can simplify planning by using tools for low-level control given high-level action primitives (options). We introduce Constrained Options Belief Tree Search (COBeTS) to leverage this hierarchy and scale online search-based CPOMDP planning to large robotic problems. We show that if primitive option controllers are defined to satisfy assigned constraint budgets, then COBeTS will satisfy constraints anytime. Otherwise, COBeTS will guide the search towards a safe sequence of option primitives, and hierarchical monitoring can be used to achieve runtime safety. We demonstrate COBeTS in several safety-critical, constrained partially observable robotic domains, showing that it can plan successfully in continuous CPOMDPs while non-hierarchical baselines cannot.

翻訳日:2023-11-01 17:27:15 公開日:2023-10-30

# ハミルトンモンテカルロを用いた最適PAC-Bayes境界の推定

Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo ( http://arxiv.org/abs/2310.20053v1 )

ライセンス: Link先を確認

Szilvia Ujváry, Gergely Flamich, Vincent Fortuin, José Miguel Hernández Lobato

(参考訳) PAC-Bayesの文献において重要でありながら十分に検討されていない問いは、PAC-Bayes境界を最適化する際に事後分布族を因子化ガウス分布に限定することで、境界のタイトさがどれだけ失われるかである。本稿では、最適事後分布を用いてデータ非依存のPAC-Bayes境界を推定し、MFVIを用いて得られた境界と比較することで、この問題を検討する。具体的には、(1)ハミルトニアンモンテカルロを用いて最適ギブス事後分布からサンプリングし、(2)熱力学的積分によって事前分布からのKLダイバージェンスを推定し、(3)異なる仮定の下で高確率境界を得るための3つの方法を提案する。MNISTデータセットを用いた実験では、タイトさに大きな差があり、場合によっては5～6%に達することが明らかになった。

An important yet underexplored question in the PAC-Bayes literature is how much tightness we lose by restricting the posterior family to factorized Gaussian distributions when optimizing a PAC-Bayes bound. We investigate this issue by estimating data-independent PAC-Bayes bounds using the optimal posteriors, comparing them to bounds obtained using MFVI. Concretely, we (1) sample from the optimal Gibbs posterior using Hamiltonian Monte Carlo, (2) estimate its KL divergence from the prior with thermodynamic integration, and (3) propose three methods to obtain high-probability bounds under different assumptions. Our experiments on the MNIST dataset reveal significant tightness gaps, as much as 5-6% in some cases.

翻訳日:2023-11-01 17:26:56 公開日:2023-10-30

# 見て、リプレイなし! SurpriseNet: 異常検出に着想を得たクラスインクリメンタル学習

Look At Me, No Replay! SurpriseNet: Anomaly Detection Inspired Class Incremental Learning ( http://arxiv.org/abs/2310.20052v1 )

ライセンス: Link先を確認

Anton Lee and Yaqian Zhang and Heitor Murilo Gomes and Albert Bifet and Bernhard Pfahringer

(参考訳) 継続学習は、一連のタスクに対する逐次的な訓練を通じて知識とスキルを蓄積できる人工ニューラルネットワークを作ることを目的としている。継続学習の主な課題は破滅的干渉であり、新しい知識が過去の知識を上書きしたり妨げたりして、忘却を引き起こす。関連する問題として「クロスタスク知識」の学習があり、モデルはタスク境界をまたいでクラスを区別するのに役立つ知識を獲得・保持できない。両方の問題に対する一般的な解決策は「リプレイ」であり、過去のインスタンスを保持する限られたバッファを用いて、クロスタスク知識を学習し、破滅的干渉を緩和する。しかし、これらの手法の顕著な欠点は、限られたリプレイバッファに過剰適合しやすいことである。これに対し、提案手法であるSurpriseNetは、パラメータ分離法を用いて破滅的干渉に対処し、異常検出に着想を得たオートエンコーダを用いてクロスタスク知識を学習する。SurpriseNetは画像固有の帰納バイアスに依存しないため、構造化データと非構造化データの両方に適用できる。従来の視覚系継続学習ベンチマークと構造化データのデータセットにおいて、SurpriseNetの強みを示す実験を行った。ソースコード: https://doi.org/10.5281/zenodo.8247906 および https://github.com/tachyonicClock/SurpriseNet-CIKM-23

Continual learning aims to create artificial neural networks capable of accumulating knowledge and skills through incremental training on a sequence of tasks. The main challenge of continual learning is catastrophic interference, wherein new knowledge overrides or interferes with past knowledge, leading to forgetting. An associated issue is the problem of learning "cross-task knowledge," where models fail to acquire and retain knowledge that helps differentiate classes across task boundaries. A common solution to both problems is "replay," where a limited buffer of past instances is utilized to learn cross-task knowledge and mitigate catastrophic interference. However, a notable drawback of these methods is their tendency to overfit the limited replay buffer. In contrast, our proposed solution, SurpriseNet, addresses catastrophic interference by employing a parameter isolation method and learning cross-task knowledge using an auto-encoder inspired by anomaly detection. SurpriseNet is applicable to both structured and unstructured data, as it does not rely on image-specific inductive biases. We have conducted empirical experiments demonstrating the strengths of SurpriseNet on various traditional vision continual-learning benchmarks, as well as on structured data datasets. Source code made available at https://doi.org/10.5281/zenodo.8247906 and https://github.com/tachyonicClock/SurpriseNet-CIKM-23

翻訳日:2023-11-01 17:26:39 公開日:2023-10-30

# 多項式に基づく注意体系の表現性

The Expressibility of Polynomial based Attention Scheme ( http://arxiv.org/abs/2310.20051v1 )

ライセンス: Link先を確認

Zhao Song, Guangyi Xu, Junze Yin

(参考訳) 大規模言語モデル(LLM)は、私たちの日常生活のさまざまな側面を大きく改善した。これらのモデルは、医療から教育まで多くの領域に影響を与え、生産性、意思決定プロセス、アクセシビリティを向上させている。その結果、人々の生活様式に影響を与え、ある程度それを変化させた。しかし、Transformerアーキテクチャにおける注意機構の2次の計算量は、長いテキストコンテキストを処理するためにこれらのモデルをスケールアップする際の課題となる。この問題により、非常に大きなモデルを長いテキストで訓練したり、推論時に効率的に利用したりすることが現実的でなくなる。[KMZ23]による最近の研究では、softmaxを多項式関数に置き換え、多項式スケッチを用いて注意機構を高速化する手法が提案されたが、この新しいアプローチの理論的理解はまだ十分に進んでいない。本稿では、多項式注意の表現能力に関する理論的解析を行う。本研究は、高次と低次の多項式注意の能力に差があることを明らかにする。具体的には、慎重に設計した2つのデータセット$\mathcal{D}_0$と$\mathcal{D}_1$を構築する。ここで$\mathcal{D}_1$は、$\mathcal{D}_0$と比べて著しく大きな値を持つ特徴を含む。次数$\beta$が十分に高ければ、単層の多項式注意ネットワークは$\mathcal{D}_0$と$\mathcal{D}_1$を区別できることを示す。一方、次数$\beta$が低い場合、ネットワークは2つのデータセットを効果的に分離できない。この解析は、大きな値を増幅しデータセットを区別するうえで、高次多項式のほうが有効であることを裏付ける。本解析は、多項式注意の表現能力に関する洞察を与え、複雑な言語的相関を捉えるために高次多項式を注意機構に組み込む理論的根拠を提供する。

Large language models (LLMs) have significantly improved various aspects of our daily lives. These models have impacted numerous domains, from healthcare to education, enhancing productivity, decision-making processes, and accessibility. As a result, they have influenced and, to some extent, reshaped people's lifestyles. However, the quadratic complexity of attention in transformer architectures poses a challenge when scaling up these models for processing long textual contexts. This issue makes it impractical to train very large models on lengthy texts or use them efficiently during inference. While a recent study by [KMZ23] introduced a technique that replaces the softmax with a polynomial function and polynomial sketching to speed up attention mechanisms, the theoretical understandings of this new approach are not yet well understood. In this paper, we offer a theoretical analysis of the expressive capabilities of polynomial attention. Our study reveals a disparity in the ability of high-degree and low-degree polynomial attention. Specifically, we construct two carefully designed datasets, namely $\mathcal{D}_0$ and $\mathcal{D}_1$, where $\mathcal{D}_1$ includes a feature with a significantly larger value compared to $\mathcal{D}_0$. We demonstrate that with a sufficiently high degree $\beta$, a single-layer polynomial attention network can distinguish between $\mathcal{D}_0$ and $\mathcal{D}_1$. However, with a low degree $\beta$, the network cannot effectively separate the two datasets. This analysis underscores the greater effectiveness of high-degree polynomials in amplifying large values and distinguishing between datasets. Our analysis offers insight into the representational capacity of polynomial attention and provides a rationale for incorporating higher-degree polynomials in attention mechanisms to capture intricate linguistic correlations.
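The amplification effect described in this abstract can be illustrated with a toy calculation. The sketch below is not the construction from the paper or from [KMZ23]: it assumes that polynomial attention simply replaces `exp(s)` with `s ** beta` on non-negative scores, and the function name `attention_weights` and the example scores are hypothetical.

```python
import math


def attention_weights(scores, kind="softmax", beta=2):
    """Normalize one row of attention scores into weights that sum to 1.

    kind="softmax" uses exp(s); kind="poly" uses s ** beta, which is an
    assumed, simplified form of polynomial attention for non-negative scores.
    """
    if kind == "softmax":
        unnorm = [math.exp(s) for s in scores]
    else:
        unnorm = [s ** beta for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# One entry (2.0) is larger than the rest, loosely mimicking the
# "significantly larger feature" that separates D_1 from D_0.
scores = [1.0, 1.0, 1.0, 2.0]
low = attention_weights(scores, kind="poly", beta=2)
high = attention_weights(scores, kind="poly", beta=8)

print(low[-1])   # weight on the large entry with a low degree
print(high[-1])  # weight on the large entry with a high degree
```

With the low degree the large entry receives 4/7 of the weight; with the higher degree it receives 256/259, so the outlier dominates the output far more strongly.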

翻訳日:2023-11-01 17:26:15 公開日:2023-10-30

# コンテキスト内学習のためにどの例にアノテーションを付けるべきか? 効果的かつ効率的な選択に向けて

Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection ( http://arxiv.org/abs/2310.20046v1 )

ライセンス: Link先を確認

Costas Mavromatis, Balasubramaniam Srinivasan, Zhengyuan Shen, Jiani Zhang, Huzefa Rangwala, Christos Faloutsos, George Karypis

(参考訳) 大規模言語モデル(LLM)は、コンテキスト内学習(ICL)を通じて新しいタスクに適応できる。ICLは、訓練済みLLMのパラメータ更新を必要とせず、LLMへの入力として少数のアノテーション付きの例だけを必要とするため、効率的である。本研究では、例にアノテーションを付けるための予算が限られている状況で、ICLのための能動学習アプローチを検討する。モデルが不確実である例を特定し、意味的多様性に基づいて例を選択する、モデル適応型で最適化不要のアルゴリズムAdaICLを提案する。多様性に基づくサンプリングは全体的な有効性を高め、不確実性サンプリングは予算効率を改善し、LLMが新しい情報を学ぶのに役立つ。さらに、AdaICLはサンプリング戦略を最大被覆問題として定式化する。この問題はモデルのフィードバックに基づいて動的に適応し、貪欲法によって近似的に解くことができる。9つのデータセットと7つのLLMに関する広範な実験により、AdaICLはSOTAに対して精度を4.4ポイント(相対的に7.7%)改善し、一様ランダムにアノテーションを行う場合と比べて予算効率が最大3倍高く、ICLの例が2分の1でもSOTAを上回ることが示された。

Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is efficient as it does not require any parameter updates to the trained LLM, but only few annotated examples as input for the LLM. In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples. We propose a model-adaptive optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about, and performs semantic diversity-based example selection. Diversity-based sampling improves overall effectiveness, while uncertainty sampling improves budget efficiency and helps the LLM learn new information. Moreover, AdaICL poses its sampling strategy as a Maximum Coverage problem, that dynamically adapts based on the model's feedback and can be approximately solved via greedy algorithms. Extensive experiments on nine datasets and seven LLMs show that AdaICL improves performance by 4.4% accuracy points over SOTA (7.7% relative improvement), is up to 3x more budget-efficient than performing annotations uniformly at random, while it outperforms SOTA with 2x fewer ICL examples.
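The abstract states that the sampling strategy is posed as a Maximum Coverage problem solved approximately by a greedy algorithm. A minimal sketch of the generic greedy procedure follows. How AdaICL actually builds the sets is not given here, so the mapping from each candidate example to the set of uncertain examples it "covers" is an assumption, as are the names `greedy_max_coverage` and `candidates`.

```python
def greedy_max_coverage(candidate_sets, budget):
    """Pick up to `budget` candidates, each time taking the one that
    covers the most not-yet-covered items (ties go to the earliest key)."""
    covered = set()
    chosen = []
    for _ in range(budget):
        best_id, best_gain = None, 0
        for cid, items in candidate_sets.items():
            if cid in chosen:
                continue
            gain = len(items - covered)  # marginal coverage gain
            if gain > best_gain:
                best_id, best_gain = cid, gain
        if best_id is None:  # nothing left adds coverage
            break
        chosen.append(best_id)
        covered |= candidate_sets[best_id]
    return chosen, covered


# Hypothetical data: each candidate example to annotate covers a set of
# uncertain examples (e.g. its semantic neighbours), identified by integers.
candidates = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}
chosen, covered = greedy_max_coverage(candidates, budget=2)
print(chosen, sorted(covered))
```

With a budget of two annotations, the greedy pass first takes `c` (four new items) and then `a` (three new items), covering all seven items.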

翻訳日:2023-11-01 17:25:46 公開日:2023-10-30

# ソフトウェアプロダクトラインの解析結果における可変性の理解

Comprehending Variability in Analysis Results of Software Product Lines ( http://arxiv.org/abs/2310.20042v1 )

ライセンス: Link先を確認

Rafael F. Toledo, Joanne M. Atlee, Rui Ming Xiong

(参考訳) ソフトウェアプロダクトライン(SPL)の解析は通常、結果が成り立つプロダクトバリアントの集合を示す論理式で注釈付けされた可変な結果を報告する。SPLに多くのフィーチャやプロダクトバリアントがある場合、これらの式は複雑になり、推論が困難になりうる。先行研究では、関心のあるプロダクトバリアントに当てはまる解析結果を強調するフィルタを備えたビジュアライザが導入されたが、その評価は不十分だった。本稿では、ユーザが可変な結果を検索し、複数のバリアントの結果を比較するのを助けるうえで、この新しいビジュアライザがどの程度有効かを評価する対照ユーザ実験について報告する。結果は、新しいビジュアライザの使用によって、ユーザの作業の正確性と効率が有意に向上し、可変な結果を扱う際の認知負荷が軽減されることを示している。

Analyses of a software product line (SPL) typically report variable results that are annotated with logical expressions indicating the set of product variants for which the results hold. These expressions can get complicated and difficult to reason about when the SPL has lots of features and product variants. Previous work introduced a visualizer that supports filters for highlighting the analysis results that apply to product variants of interest, but this work was weakly evaluated. In this paper, we report on a controlled user study that evaluates the effectiveness of this new visualizer in helping the user search variable results and compare the results of multiple variants. Our findings indicate that the use of the new visualizer significantly improves the correctness and efficiency of the user's work and reduces the user's cognitive load in working with variable results.

翻訳日:2023-11-01 17:25:23 公開日:2023-10-30

# 腫瘍セグメンテーション性能評価のためのDice類似度係数より優れた指標としてのラジオミクス

Radiomics as a measure superior to the Dice similarity coefficient for tumor segmentation performance evaluation ( http://arxiv.org/abs/2310.20039v1 )

ライセンス: Link先を確認

Yoichi Watanabe (1) and Rukhsora Akramova (1) ((1) Department of Radiation Oncology, University of Minnesota Medical School, Minneapolis, MN, USA)

(参考訳) 高品質な放射線治療では、標的と健常な構造の正確なセグメンテーションが不可欠である。本研究は、医師や自動セグメンテーションツールのセグメンテーション能力を評価するための、広く用いられているDice類似度係数(DSC)より優れた尺度として、ラジオミクス特徴量を提案する。RIDER Data Libraryで利用可能な10個の肺腫瘍の2回のCTスキャンから得たラジオミクスデータを解析し、セグメンテーション精度の評価に用いる再現性のあるラジオミクス特徴量を選択した。ラジオミクス特徴量はPyRadiomicsを用いて抽出し、一致相関係数(CCC)に基づいて選択した。その後、10人の患者のCT画像をそれぞれ異なる医師または自動セグメンテーションツールでセグメンテーションし、セグメンテーション性能を評価した。その結果、2つのCT画像間でCCCが0.93を超え、堅牢な再現性を示すラジオミクス特徴量が206個見つかった。これらのうち7つは級内相関係数(ICC)が低く、セグメンテーションの違いに対する感度が高いことを示している。特に、球形度、伸長度、平坦度などのoriginal shape特徴量のICCは0.1177から0.995の範囲であった。対照的に、すべてのDSC値は0.778を上回った。本研究は、ラジオミクス特徴量、とりわけ形状とエネルギーに関するものが、DSCとは異なり、腫瘍セグメンテーションの特性の微妙な違いを捉えられることを示している。したがって、ICCと組み合わせたラジオミクス特徴量は、医師の腫瘍セグメンテーション能力と自動セグメンテーションツールの性能を評価するうえで優れている。これらの新しい指標は、新しい自動セグメンテーション手法の評価や、医用セグメンテーションに携わる人の訓練の強化に利用でき、放射線治療の実践の改善に寄与することが示唆される。

In high-quality radiotherapy delivery, precise segmentation of targets and healthy structures is essential. This study proposes Radiomics features as a superior measure for assessing the segmentation ability of physicians and auto-segmentation tools, in comparison to the widely used Dice Similarity Coefficient (DSC). The research involves selecting reproducible radiomics features for evaluating segmentation accuracy by analyzing radiomics data from 2 CT scans of 10 lung tumors, available in the RIDER Data Library. Radiomics features were extracted using PyRadiomics, with selection based on the Concordance Correlation Coefficient (CCC). Subsequently, CT images from 10 patients, each segmented by different physicians or auto-segmentation tools, were used to assess segmentation performance. The study reveals 206 radiomics features with a CCC greater than 0.93 between the two CT images, indicating robust reproducibility. Among these features, seven exhibit low Intraclass Correlation Coefficients (ICC), signifying increased sensitivity to segmentation differences. Notably, ICCs of original shape features, including sphericity, elongation, and flatness, ranged from 0.1177 to 0.995. In contrast, all DSC values exceeded 0.778. This research demonstrates that radiomics features, particularly those related to shape and energy, can capture subtle variations in tumor segmentation characteristics, unlike DSC. As a result, Radiomics features with ICC prove superior for evaluating a physician's tumor segmentation ability and the performance of auto-segmentation tools. The findings suggest that these new metrics can be employed to assess novel auto-segmentation methods and enhance the training of individuals in medical segmentation, thus contributing to improved radiotherapy practices.
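For reference, the Dice Similarity Coefficient that the abstract uses as its baseline is DSC = 2|A ∩ B| / (|A| + |B|) for two segmentation masks A and B. The sketch below computes it on toy 2D masks represented as sets of pixel coordinates. The masks are invented for illustration and do not come from the RIDER data, and the helper name `dice_coefficient` is hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two masks given as
    collections of pixel (or voxel) coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Reference contour: a 10 x 10 square.
reference = {(x, y) for x in range(10) for y in range(10)}
# Second contour: the same square plus a thin 10-pixel spike on one side.
with_spike = reference | {(x, 4) for x in range(10, 20)}

dsc = dice_coefficient(reference, with_spike)
print(round(dsc, 3))
```

The spike doubles the extent of the contour along one axis, yet the DSC stays at 200/210 (about 0.95). This is the kind of shape difference that an overlap measure barely registers and that shape features such as elongation are meant to expose.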

翻訳日:2023-11-01 17:25:08 公開日:2023-10-30

# 臨床要約におけるファクチュアルアライメントのための合成模倣編集フィードバック

Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization ( http://arxiv.org/abs/2310.20033v1 )

ライセンス: Link先を確認

Prakamya Mishra, Zonghai Yao, Shuwei Chen, Beining Wang, Rohan Mittal, Hong Yu

(参考訳) GPTやLLaMAファミリーのような大規模言語モデル(LLM)は、重要な文脈情報を捉えて凝縮する卓越した能力を示し、要約タスクで最先端の性能を達成している。しかし、これらのモデルの幻覚(ハルシネーション)に対するコミュニティの懸念は高まり続けている。LLMは事実に反する幻覚を含む要約を生成することがあり、これは臨床領域のNLPタスク(例えば臨床ノートの要約)では極めて有害になりうる。事実と異なる記述が重大な誤診につながる可能性があるためである。人間のフィードバックを用いたLLMのファインチューニングは、生成時にLLMを事実と整合させる手段として有望だが、そのような訓練には高品質な人手アノテーションデータが必要であり、臨床領域では取得コストが非常に高い。本研究では、臨床ノート要約タスクにおける事実整合性を改善するための高品質なフィードバックデータを、人間の専門家の代わりにChatGPTを用いて生成する新しいパイプラインを提案する。特に編集フィードバックに焦点を当てる。これは、最近の研究が、複雑な状況(広範な専門知識を必要とする臨床NLPタスクなど)における選好フィードバックによる人間とのアライメントの欠点と、ドメインの専門家から編集フィードバックを収集する利点を論じているためである。加えて、GPTは多くの臨床NLPタスク(例えばUSMLE QA)で専門家レベルに達しているが、臨床ノート要約タスクにおいてGPTが言語モデル向けに専門家レベルの編集フィードバックを生成できるかを論じた先行研究は多くない。我々はこのギャップを埋めたい。最後に、我々の評価は、特に事実性の観点から、人間とのアライメントにGPTによる編集を利用できる可能性を示している。

Large Language Models (LLMs) like the GPT and LLaMA families have demonstrated exceptional capabilities in capturing and condensing critical contextual information and achieving state-of-the-art performance in the summarization task. However, community concerns about these models' hallucination issues continue to rise. LLMs sometimes generate factually hallucinated summaries, which can be extremely harmful in the clinical domain NLP tasks (e.g., clinical note summarization), where factually incorrect statements can lead to critically erroneous diagnoses. Fine-tuning LLMs using human feedback has shown the promise of aligning LLMs to be factually consistent during generation, but such training procedure requires high-quality human-annotated data, which can be extremely expensive to get in the clinical domain. In this work, we propose a new pipeline using ChatGPT instead of human experts to generate high-quality feedback data for improving factual consistency in the clinical note summarization task. We focus specifically on edit feedback because recent work discusses the shortcomings of human alignment via preference feedback in complex situations (such as clinical NLP tasks that require extensive expert knowledge), as well as some advantages of collecting edit feedback from domain experts. In addition, although GPT has reached the expert level in many clinical NLP tasks (e.g., USMLE QA), there is not much previous work discussing whether GPT can generate expert-level edit feedback for LMs in the clinical note summarization task. We hope to fill this gap. Finally, our evaluations demonstrate the potential use of GPT edits in human alignment, especially from a factuality perspective.

翻訳日:2023-11-01 17:24:36 公開日:2023-10-30

# リーマン拡散モデルのスケーリング

Scaling Riemannian Diffusion Models ( http://arxiv.org/abs/2310.20030v1 )

ライセンス: Link先を確認

Aaron Lou, Minkai Xu, Stefano Ermon

(参考訳) リーマン拡散モデルは、標準的なユークリッド空間の拡散モデルから着想を得て、一般の多様体上の分布を学習する。残念ながら、幾何学的な複雑さが加わることで拡散の遷移項を閉形式で表現できなくなるため、従来の手法はスコアマッチングの訓練目的関数に対して精度の低い近似に頼っており、これが性能を低下させ、高次元での応用を妨げている。本稿では、これらの近似を再検討し、いくつかの実用的な改善を提案する。我々の重要な観察は、関連する多様体の大半が対称空間であり、計算がはるかに扱いやすいということである。さまざまなansätzeを活用して組み合わせることで、関連する量を高精度かつ高速に計算できる。低次元データセットでは、我々の補正は顕著な改善をもたらし、拡散モデルが他の手法と競合できるようになる。さらに、本手法によって非自明な多様体上の高次元タスクへスケールできることを示す。特に、$SU(n)$格子上のQCD密度と、高次元超球面上の対照学習による埋め込みをモデル化する。

Riemannian diffusion models draw inspiration from standard Euclidean space diffusion models to learn distributions on general manifolds. Unfortunately, the additional geometric complexity renders the diffusion transition term inexpressible in closed form, so prior methods resort to imprecise approximations of the score matching training objective that degrade performance and preclude applications in high dimensions. In this work, we reexamine these approximations and propose several practical improvements. Our key observation is that most relevant manifolds are symmetric spaces, which are much more amenable to computation. By leveraging and combining various ansätze, we can quickly compute relevant quantities to high precision. On low dimensional datasets, our correction produces a noticeable improvement, allowing diffusion to compete with other methods. Additionally, we show that our method enables us to scale to high dimensional tasks on nontrivial manifolds. In particular, we model QCD densities on $SU(n)$ lattices and contrastively learned embeddings on high dimensional hyperspheres.

翻訳日:2023-11-01 17:24:05 公開日:2023-10-30

# GOPlan:学習モデルによる計画による目標条件付きオフライン強化学習

GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models ( http://arxiv.org/abs/2310.20025v1 )

ライセンス: Link先を確認

Mianchu Wang, Rui Yang, Xi Chen, Meng Fang

(参考訳) オフラインのゴール条件付きRL(GCRL)は、多様なマルチタスクのオフラインデータセットから汎用方策を学習するための実現可能なパラダイムを提供する。最近の顕著な進歩にもかかわらず、主流のオフラインGCRL手法はモデルフリーのアプローチに限られており、限られたデータ予算や未見のゴールへの汎化に対処する能力が制約されている。本研究では、新しい2段階のモデルベースフレームワークであるGoal-conditioned Offline Planning (GOPlan)を提案する。これは、(1)マルチゴールデータセット内のマルチモーダルな行動分布を捉えられる事前方策を事前学習すること、(2)計画を伴うreanalysis法を用いて、方策のファインチューニングに用いる想像上の軌道を生成すること、からなる。具体的には、事前方策はアドバンテージ重み付きの条件付き敵対的生成ネットワークに基づいており、明確なモード分離を示すことで分布外(OOD)行動の落とし穴を克服する。さらなる方策最適化のために、reanalysis法は、軌道内ゴールと軌道間ゴールの両方について学習済みモデルを用いて計画することで、高品質な想像データを生成する。実験評価により、GOPlanがさまざまなオフラインのマルチゴール操作タスクで最先端の性能を達成することを示す。さらに、GOPlanが少ないデータ予算を扱い、OODのゴールへ汎化する優れた能力を持つことを示す。

Offline goal-conditioned RL (GCRL) offers a feasible paradigm to learn general-purpose policies from diverse and multi-task offline datasets. Despite notable recent progress, the predominant offline GCRL methods have been restricted to model-free approaches, constraining their capacity to tackle limited data budgets and unseen goal generalization. In this work, we propose a novel two-stage model-based framework, Goal-conditioned Offline Planning (GOPlan), including (1) pretraining a prior policy capable of capturing multi-modal action distribution within the multi-goal dataset; (2) employing the reanalysis method with planning to generate imagined trajectories for fine-tuning policies. Specifically, the prior policy is based on an advantage-weighted Conditioned Generative Adversarial Network that exhibits distinct mode separation to overcome the pitfalls of out-of-distribution (OOD) actions. For further policy optimization, the reanalysis method generates high-quality imaginary data by planning with learned models for both intra-trajectory and inter-trajectory goals. Through experimental evaluations, we demonstrate that GOPlan achieves state-of-the-art performance on various offline multi-goal manipulation tasks. Moreover, our results highlight the superior ability of GOPlan to handle small data budgets and generalize to OOD goals.

翻訳日:2023-11-01 17:23:46 公開日:2023-10-30

# アドホックロボットネットワークのトポロジー回復可能性予測:データ駆動型フォールトトレラントアプローチ

Topology Recoverability Prediction for Ad-Hoc Robot Networks: A Data-Driven Fault-Tolerant Approach ( http://arxiv.org/abs/2310.20024v1 )

ライセンス: Link先を確認

Matin Macktoobian and Zhan Shu and Qing Zhao

(参考訳) アドホックなロボットネットワークで発生する故障は、そのトポロジーを致命的に乱し、ネットワークの一部が切断される原因となりうる。最適なトポロジー合成は一般にリソースを大量に消費し、時間もかかるため、大規模なアドホックロボットネットワークでリアルタイムに行うのは難しい。トポロジーの再計算は、故障発生後にトポロジーが回復可能である確率が回復不能である確率を上回る場合にのみ行うべきである。我々はこの問題を二値分類問題として定式化する。そのうえで、ベイズガウス混合モデルに基づく2経路のデータ駆動モデルを構築し、故障前と故障後の2つの異なる予測経路によって典型的な問題の解を予測する。これらの経路の予測を統合して得られた結果は、文献にある現行の最良の戦略と比較して、トポロジーの回復可能性(および回復不能性)の予測問題を本モデルがうまく解けることを明確に示している。

Faults occurring in ad-hoc robot networks may fatally perturb their topologies leading to disconnection of subsets of those networks. Optimal topology synthesis is generally resource-intensive and time-consuming to be done in real time for large ad-hoc robot networks. One should only perform topology re-computations if the probability of topology recoverability after the occurrence of any fault surpasses that of its irrecoverability. We formulate this problem as a binary classification problem. Then, we develop a two-pathway data-driven model based on Bayesian Gaussian mixture models that predicts the solution to a typical problem by two different pre-fault and post-fault prediction pathways. The results, obtained by the integration of the predictions of those pathways, clearly indicate the success of our model in solving the topology (ir)recoverability prediction problem compared to the best of current strategies found in the literature.

翻訳日:2023-11-01 17:23:19 公開日:2023-10-30

# 多時間一般化量子マスター方程式の効率的な定式化:2次元スペクトルのシミュレーションコストの調整

Efficient formulation of multitime generalized quantum master equations: Taming the cost of simulating 2D spectra ( http://arxiv.org/abs/2310.20022v1 )

ライセンス: Link先を確認

Thomas Sayer and Andrés Montoya-Castillo

(参考訳) 現代の4波混合分光は、実験的にも計算的にも得るのに費用がかかる。場合によっては、一般化量子マスター方程式(GQME)のアプローチを用いて、量子ダイナミクス問題の不利なスケーリングを改善できる。しかし、複数の(光と物質の)相互作用を含めると運動方程式が複雑になり、時間に関して一見避けられない3次のスケーリングが生じる。本稿では、GQMEの枠組みを任意の回数の量子測定を扱えるよう拡張した先行研究を大幅に単純化し、その計算コストを削減する定式化を提示する。具体的には、transfer tensor法に着想を得た離散畳み込みの実装に切り替えることで、修正されたMori-Nakajima-Zwanzigの枠組みから量子相関関数の時間微分を取り除く。次に、励起エネルギー移動ダイマーモデルの2次元電子スペクトルをシミュレートすることで、本手法の能力を実証する。本手法では、特に$t_2$軸に沿ってデータの分解能を任意に粗くでき、これは実験でのデータ取得方法を反映している。控えめなケースでも、$\mathcal{O}(10^3)$だけ少ないデータ点で済む。さらに、スペクトルを1時間、2時間、3時間の相関に分解でき、将来のスペクトルを予測するのにそれ以上の測定が不要となり、スケーリングが2次になるマルコフ領域に、系がいつどのように入るかを示すことができる。これにより、短時間のデータのみを用いて長時間のスペクトルを生成でき、標準的な方法論ではこれまで到達できなかった時間スケールにアクセスできるようになる。

Modern 4-wave mixing spectroscopies are expensive to obtain experimentally and computationally. In certain cases, the unfavorable scaling of quantum dynamics problems can be improved using a generalized quantum master equation (GQME) approach. However, the inclusion of multiple (light-matter) interactions complicates the equation of motion and leads to seemingly unavoidable cubic scaling in time. In this paper, we present a formulation that greatly simplifies and reduces the computational cost of previous work that extended the GQME framework to treat arbitrary numbers of quantum measurements. Specifically, we remove the time derivatives of quantum correlation functions from the modified Mori-Nakajima-Zwanzig framework by switching to a discrete-convolution implementation inspired by the transfer-tensor approach. We then demonstrate the method's capabilities by simulating 2D electronic spectra for the excitation-energy-transfer dimer model. In our method, the resolution of the data can be arbitrarily coarsened, especially along the $t_2$ axis, which mirrors how the data are obtained experimentally. Even in a modest case, this demands $\mathcal{O}(10^3)$ fewer data points. We are further able to decompose the spectra into 1-, 2-, and 3-time correlations, showing how and when the system enters a Markovian regime where further measurements are unnecessary to predict future spectra and the scaling becomes quadratic. This offers the ability to generate long-time spectra using only short-time data, enabling access to timescales previously beyond the reach of standard methodologies.

翻訳日:2023-11-01 17:23:03 公開日:2023-10-30

# 顔面非対称性 : 対面インタビューにおける評価のためのコンピュータビジョンに基づく行動指標

Facial asymmetry: A Computer Vision based behaviometric index for assessment during a face-to-face interview ( http://arxiv.org/abs/2310.20083v1 )

ライセンス: Link先を確認

Shuvam Keshari, Tanusree Dutta, Raju Mullick, Ashish Rathor, Priyadarshi Patnaik

(参考訳) 適切な仕事に適切な人材を選ぶ必要があるため、採用面接のプロセスは認知的負荷の高い作業になる。このプロセスを支援するために、心理測定テストとそれに続く面接がしばしば用いられてきたが、こうした仕組みには限界がある。心理測定テストは回答の偽装や社会的望ましさの影響を受け、面接プロセスは面接官による回答の分析の仕方に左右される。本稿では、面接官の認知負荷を増やすことなく面接対象者の客観的な評価を容易にする補助ツールとして、behaviometry(行動計測)の利用を提案する。behaviometryは、表情、発声パターン、瞳孔反応、近接行動、ボディランゲージなど、模倣できない行動特性を利用するもので、選考プロセスにおいては比較的研究が進んでいない分野である。この手法は行動の薄い断片(thin slices)を分析し、面接対象者に関する偏りのない情報を提供する。本研究は、顔の非対称性と微表情の観点から表情を捉えるための、このツールの背後にある方法論を提案する。テストケースとして、構造的類似性指標を用いた半顔合成画像(hemi-facial composites)により、顔の非対称性の時間推移グラフを作成した。3本のYouTubeビデオサンプルに対してフレームごとの分析を行い、構造的類似性指標(SSID)のスコアが75%以上の場合に行動の一致が見られた。本研究は、オープンソースのコンピュータビジョンアルゴリズムとライブラリ(python-opencvとdlib)を用いて、顔の非対称性を分析する手順を定式化している。

Choosing the right person for the right job makes the personnel interview process a cognitively demanding task. Psychometric tests, followed by an interview, have often been used to aid the process although such mechanisms have their limitations. While psychometric tests suffer from faking or social desirability of responses, the interview process depends on the way the responses are analyzed by the interviewers. We propose the use of behaviometry as an assistive tool to facilitate an objective assessment of the interviewee without increasing the cognitive load of the interviewer. Behaviometry is a relatively little explored field of study in the selection process, that utilizes inimitable behavioral characteristics like facial expressions, vocalization patterns, pupillary reactions, proximal behavior, body language, etc. The method analyzes thin slices of behavior and provides unbiased information about the interviewee. The current study proposes the methodology behind this tool to capture facial expressions, in terms of facial asymmetry and micro-expressions. Hemi-facial composites using a structural similarity index was used to develop a progressive time graph of facial asymmetry, as a test case. A frame-by-frame analysis was performed on three YouTube video samples, where Structural similarity index (SSID) scores of 75% and more showed behavioral congruence. The research utilizes open-source computer vision algorithms and libraries (python-opencv and dlib) to formulate the procedure for analysis of the facial asymmetry.

翻訳日:2023-11-01 17:14:35 公開日:2023-10-30

# 効果的な選択方策の学習による効率的なサブグラフGNN

Efficient Subgraph GNNs by Learning Effective Selection Policies ( http://arxiv.org/abs/2310.20082v1 )

ライセンス: Link先を確認

Beatrice Bevilacqua, Moshe Eliasof, Eli Meirom, Bruno Ribeiro, Haggai Maron

(参考訳) サブグラフGNNは、サブグラフの集合からグラフ表現を学習する、表現力が証明されたニューラルアーキテクチャである。残念ながら、多数のサブグラフ上でメッセージパッシングを行うことに伴う計算量のために、その適用範囲は制限されている。本稿では、可能なサブグラフの大きな集合から小さな部分集合を選択することをデータ駆動で学習する問題を考える。まず、WLでは区別できないグラフの族のなかに、効率的なサブグラフ選択方策、すなわち族内のすべてのグラフをすでに識別できるサブグラフの小さな部分集合が存在するものがあることを証明し、この問題を動機付ける。次に、サブグラフを反復的に選択する方法を学習する新しいアプローチであるPolicy-Learnを提案する。一般的なランダム方策や同じ問題を扱う先行研究とは異なり、我々のアーキテクチャは上述の効率的な方策を学習できることを証明する。実験の結果、Policy-Learnは幅広いデータセットで既存のベースラインを上回ることが示された。

Subgraph GNNs are provably expressive neural architectures that learn graph representations from sets of subgraphs. Unfortunately, their applicability is hampered by the computational complexity associated with performing message passing on many subgraphs. In this paper, we consider the problem of learning to select a small subset of the large set of possible subgraphs in a data-driven fashion. We first motivate the problem by proving that there are families of WL-indistinguishable graphs for which there exist efficient subgraph selection policies: small subsets of subgraphs that can already identify all the graphs within the family. We then propose a new approach, called Policy-Learn, that learns how to select subgraphs in an iterative manner. We prove that, unlike popular random policies and prior work addressing the same problem, our architecture is able to learn the efficient policies mentioned above. Our experimental results demonstrate that Policy-Learn outperforms existing baselines across a wide range of datasets.

翻訳日:2023-11-01 17:14:08 公開日:2023-10-30

# 大規模言語モデルによるパーソナライゼーション向上のための要約と検索の統合

Integrating Summarization and Retrieval for Enhanced Personalization via Large Language Models ( http://arxiv.org/abs/2310.20081v1 )

ライセンス: Link先を確認

Chris Richardson, Yao Zhang, Kellen Gillespie, Sudipta Kar, Arshdeep Singh, Zeynab Raeesy, Omar Zia Khan, Abhinav Sethy

(参考訳) パーソナライゼーション、すなわちシステムを個々のユーザに合わせて調整する能力は、自然言語処理(NLP)システムのユーザ体験において不可欠な要素である。大規模言語モデル(LLM)の登場により、これらのモデルを活用してユーザ体験をよりよくパーソナライズする方法が重要な問いとなっている。言語モデルの出力をパーソナライズする単純なアプローチは、過去のユーザデータを言語モデルのプロンプトに組み込むことだが、このアプローチは入力が長くなって入力長の制限を超えたり、レイテンシやコストの問題を招いたりする可能性がある。既存のアプローチは、関連するユーザデータを選択的に抽出して(すなわち選択的検索)下流タスクのプロンプトを構築することで、こうした課題に対処している。しかし、検索ベースの手法は、潜在的な情報損失、より深いユーザ理解の欠如、コールドスタートの課題によって制限される。これらの制約を克服するために、LLMが生成するタスク対応のユーザ要約によって検索拡張型パーソナライゼーションを拡張する、新しい要約拡張アプローチを提案する。要約はオフラインで生成・保存できるため、音声アシスタントのように実行時の制約がある実世界のシステムでもLLMの力を活用できる。実験では、提案手法は検索するユーザデータを75%減らしても、LaMPパーソナライゼーションベンチマークの大半のタスクで検索拡張と同等以上の性能を示した。LLMによるオフライン要約と実行時の検索を組み合わせることで、現実的な制約の下でさまざまなタスクのパーソナライズ性能が向上することを示す。

Personalization, the ability to tailor a system to individual users, is an essential factor in user experience with natural language processing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences. To personalize a language model's output, a straightforward approach is to incorporate past user data into the language model prompt, but this approach can result in lengthy inputs exceeding limitations on input length and incurring latency and cost issues. Existing approaches tackle such challenges by selectively extracting relevant user data (i.e. selective retrieval) to construct a prompt for downstream tasks. However, retrieval-based methods are limited by potential information loss, lack of more profound user understanding, and cold-start challenges. To overcome these limitations, we propose a novel summary-augmented approach by extending retrieval-augmented personalization with task-aware user summaries generated by LLMs. The summaries can be generated and stored offline, enabling real-world systems with runtime constraints like voice assistants to leverage the power of LLMs. Experiments show our method with 75% less of retrieved user data is on-par or outperforms retrieval augmentation on most tasks in the LaMP personalization benchmark. We demonstrate that offline summarization via LLMs and runtime retrieval enables better performance for personalization on a range of tasks under practical constraints.

翻訳日:2023-11-01 17:13:52 公開日:2023-10-30

# トカマク核融合炉におけるプラズマインダクタンスダイナミクス予測のための物理とニューラルODEのハイブリッド化

Hybridizing Physics and Neural ODEs for Predicting Plasma Inductance Dynamics in Tokamak Fusion Reactors ( http://arxiv.org/abs/2310.20079v1 )

ライセンス: Link先を確認

Allen M. Wang, Darren T. Garnier, and Cristina Rea

(参考訳) トカマクとして知られる核融合炉は安定したエネルギー源として有望だが、経済的に成り立たせるには、プラズマ制御と、プラズマの制御が失われる事象への対処の進歩が必要である。より高度な制御アルゴリズムを適用するうえでの大きなボトルネックは、より優れたプラズマシミュレーションの必要性であり、現状では物理ベースのアプローチもデータ駆動のアプローチも不十分である。前者は計算コストとプラズマのモデル化の難しさの両方がボトルネックとなり、後者はデータが相対的に乏しいことがボトルネックとなっている。この問題に対処するため、本研究ではニューラル常微分方程式(ODE)の枠組みを、プラズマダイナミクスの一部、すなわち結合したプラズマ電流と内部インダクタンスのダイナミクスの予測問題に適用する。ニューラルODEの枠組みは物理に基づく帰納バイアスを自然に取り込めるため、Alcator C-Mod核融合炉のデータを用いて物理ベースのモデルとニューラルネットワークモデルの両方を訓練した。その結果、物理ベースの方程式とニューラルODEを組み合わせたモデルが、既存の物理に基づくODEと純粋なニューラルODEモデルの両方よりも優れた性能を示すことがわかった。

While fusion reactors known as tokamaks hold promise as a firm energy source, advances in plasma control, and handling of events where control of plasmas is lost, are needed for them to be economical. A significant bottleneck towards applying more advanced control algorithms is the need for better plasma simulation, where both physics-based and data-driven approaches currently fall short. The former is bottle-necked by both computational cost and the difficulty of modelling plasmas, and the latter is bottle-necked by the relative paucity of data. To address this issue, this work applies the neural ordinary differential equations (ODE) framework to the problem of predicting a subset of plasma dynamics, namely the coupled plasma current and internal inductance dynamics. As the neural ODE framework allows for the natural inclusion of physics-based inductive biases, we train both physics-based and neural network models on data from the Alcator C-Mod fusion reactor and find that a model that combines physics-based equations with a neural ODE performs better than both existing physics-motivated ODEs and a pure neural ODE model.

翻訳日:2023-11-01 17:13:24 公開日:2023-10-30

# TorchProbe: 動的ディープラーニングコンパイラのファジング

TorchProbe: Fuzzing Dynamic Deep Learning Compilers ( http://arxiv.org/abs/2310.20078v1 )

ライセンス: Link先を確認

Qidong Su, Chuqin Geng, Gennady Pekhimenko, Xujie Si

(参考訳) 静的および動的計算グラフは、ディープラーニングフレームワークを構築するための2つの異なるアプローチを表している。前者はコンパイラベースの最適化を優先し、後者はプログラマビリティと使いやすさに重点を置いている。Pythonで書かれた任意のディープラーニングプログラムのコンパイルをサポートするPyTorch 2.0の最近のリリースは、コンパイラ技術をより動的な形で取り入れ、動的制御フローやクロージャのような動的な言語機能をサポートするという、ディープラーニングインフラストラクチャの進化における新たな方向性を示している。PyTorchはPythonとシームレスに統合されているため、そのコンパイラはPythonで書かれた任意のディープラーニングコードのサポートを目指している。しかし、Python固有の動的性質は、コンパイラの完全性と堅牢性に課題をもたらす。最近の研究はディープラーニングコンパイラのテストにファジングを導入しているが、動的機能のテスト方法に関する包括的な分析はいまだ存在しない。この問題に対処するため、動的機能を含むテストケースを生成する複数のコード変換を提案する。これらの変換はプログラムの意味を保つため、変換後のプログラムと元のプログラムの間に相違があればバグの存在を示す。このアプローチにより、PyTorchコンパイラとその基盤となるテンソルコンパイラTritonにおいて、これまで知られていなかった20件のバグを特定した。

Static and dynamic computational graphs represent two distinct approaches to constructing deep learning frameworks. The former prioritizes compiler-based optimizations, while the latter focuses on programmability and user-friendliness. The recent release of PyTorch 2.0, which supports compiling arbitrary deep learning programs in Python, signifies a new direction in the evolution of deep learning infrastructure to incorporate compiler techniques in a more dynamic manner and support more dynamic language features like dynamic control flows and closures. Given PyTorch's seamless integration with Python, its compiler aims to support arbitrary deep learning code written in Python. However, the inherent dynamism of Python poses challenges to the completeness and robustness of the compiler. While recent research has introduced fuzzing to test deep learning compilers, there is still a lack of comprehensive analysis on how to test dynamic features. To address this issue, we propose several code transformations to generate test cases involving dynamic features. These transformations preserve the program's semantics, ensuring that any discrepancy between the transformed and original programs indicates the presence of a bug. Through our approach, we have successfully identified twenty previously unknown bugs in the PyTorch compiler and its underlying tensor compiler Triton.
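
The testing principle described above (a semantics-preserving rewrite must not change the output) can be illustrated without any compiler at all. The sketch below uses plain Python functions; `original`, `transformed`, `buggy_variant`, and `find_discrepancies` are hypothetical names, and the rewrite shown (a closure plus an always-true branch) is only an example of the kind of dynamic feature the abstract mentions, not one of the paper's transformations.

```python
def original(xs):
    # Reference program: an affine map applied element-wise.
    return [2 * x + 1 for x in xs]

def transformed(xs):
    # Semantics-preserving rewrite that introduces a closure and
    # a data-dependent (but always true) branch.
    def make_scaler(k):
        def scale(x):
            return k * x
        return scale

    scale = make_scaler(2)
    out = []
    for x in xs:
        y = scale(x)
        if len(xs) >= 0:  # always true, so the result is unchanged
            y = y + 1
        out.append(y)
    return out

def buggy_variant(xs):
    # Deliberately wrong for empty input, to show what a discrepancy looks like.
    return [2 * x + 1 for x in xs] or [0]

def find_discrepancies(reference, candidate, inputs):
    # Any input on which the two programs disagree signals a bug.
    return [x for x in inputs if reference(x) != candidate(x)]

inputs = [[1, 2, 3], [], [-4]]
clean = find_discrepancies(original, transformed, inputs)
flagged = find_discrepancies(original, buggy_variant, inputs)
```

In a real setting the candidate would be the compiled version of the transformed program, and outputs would be compared with a numerical tolerance rather than exact equality.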

翻訳日:2023-11-01 17:13:00 公開日:2023-10-30

# 自然言語処理のための部分テンソル化トランスフォーマー

Partial Tensorized Transformers for Natural Language Processing ( http://arxiv.org/abs/2310.20077v1 )

ライセンス: Link先を確認

Subhadra Vadlamannati, Ryan Solgi

(参考訳) トランスフォーマーアーキテクチャは、前例のない精度のため、自然言語処理(NLP)や他の機械学習タスクに革命をもたらした。しかし、その広範なメモリとパラメータ要件は、しばしば実用上の応用を妨げる。本研究では,トランスフォーマービジョン言語ニューラルネット(bert,vit)の精度向上と圧縮におけるテンソル-トレイン分解の効果について検討する。ニューラルネットワーク(PTNN)の埋め込み層圧縮と部分的テンソル化の両方にアルゴリズム的アプローチで焦点をあてる。新しいptnnアプローチは,トレーニング後の調整を必要とせず,既存のモデルの精度を最大5%向上させ,テンソル分解の分野における新たな基盤を打ち破る。

The transformer architecture has revolutionized Natural Language Processing (NLP) and other machine-learning tasks, due to its unprecedented accuracy. However, its extensive memory and parameter requirements often hinder practical applications. In this work, we study the effect of tensor-train decomposition on the accuracy and compression of transformer vision-language neural networks, namely BERT and ViT. We focus both on embedding-layer compression and partial tensorization of neural networks (PTNN) through an algorithmic approach. Our novel PTNN approach significantly improves the accuracy of existing models by up to 5%, all without the need for post-training adjustments, breaking new ground in the field of tensor decomposition.
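
For readers unfamiliar with tensor-train (TT) decomposition, here is a minimal sketch of the generic TT-SVD procedure in NumPy. It is the textbook algorithm with a single rank cap, not the PTNN method from the paper; the function names and the reshaping of a 64 x 32 weight matrix into an (8, 8, 4, 8) tensor are assumptions chosen for illustration.

```python
import numpy as np

def tt_svd(tensor, max_rank):
    # Sequential truncated SVDs produce cores of shape (r_prev, n_k, r_next).
    shape = tensor.shape
    cores = []
    rank = 1
    mat = tensor.reshape(rank * shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = min(max_rank, len(s))
        cores.append(u[:, :new_rank].reshape(rank, shape[k], new_rank))
        # Carry the remaining factor forward and fold in the next mode.
        mat = (s[:new_rank, None] * vt[:new_rank]).reshape(new_rank * shape[k + 1], -1)
        rank = new_rank
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    # Contract neighbouring cores over their shared rank index.
    result = cores[0]
    for core in cores[1:]:
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.reshape([c.shape[1] for c in cores])

rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 32)).reshape(8, 8, 4, 8)
exact_cores = tt_svd(weights, max_rank=64)   # ranks are not truncated here
small_cores = tt_svd(weights, max_rank=4)    # lossy, far fewer parameters
n_params_small = sum(core.size for core in small_cores)
```

With the rank cap at 4 the cores hold 256 numbers instead of the 2048 in the original tensor; how much accuracy survives such truncation on a real model is an empirical question this sketch does not address.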

翻訳日:2023-11-01 17:12:42 公開日:2023-10-30

# meekセパレータとその標的因果発見への応用

Meek Separators and Their Applications in Targeted Causal Discovery ( http://arxiv.org/abs/2310.20075v1 )

ライセンス: Link先を確認

Kirankumar Shiragur and Jiaqi Zhang and Caroline Uhler

(参考訳) 介入データから因果構造を学習することは、様々な分野にわたる幅広い応用を持つ基本的な問題である。以前の多くの研究は因果グラフ全体の復元に重点を置いてきたが、実際には因果グラフの一部のみを学習すれば十分な場面がある。これはtargeted causal discovery(標的因果発見)と呼ばれる。本研究は、そのような問題のうち、部分集合探索と因果マッチングの2つに焦点を当て、いずれの場合も介入回数の最小化を目指す。そのために、介入すると残りの未向き付けの辺をより小さな連結成分に分解する頂点の部分集合であるMeekセパレータを導入する。次に、小さなサイズのMeekセパレータを見つける効率的なアルゴリズムを提案する。このような手続きは、様々な分割統治型アプローチの設計に有用である。特に、部分集合探索と因果マッチングのそれぞれについて対数近似を達成する2つのランダム化アルゴリズムを提案する。本結果は、両問題に対する初めての証明可能な平均ケース保証を与える。これにより、様々な応用から生じる他の多くの標的因果構造学習問題に対して、準最適な手法を設計する可能性が開けると考えている。

Learning causal structures from interventional data is a fundamental problem with broad applications across various fields. While many previous works have focused on recovering the entire causal graph, in practice, there are scenarios where learning only part of the causal graph suffices. This is called targeted causal discovery. In our work, we focus on two such well-motivated problems: subset search and causal matching. We aim to minimize the number of interventions in both cases. Towards this, we introduce the Meek separator, which is a subset of vertices that, when intervened, decomposes the remaining unoriented edges into smaller connected components. We then present an efficient algorithm to find Meek separators that are of small sizes. Such a procedure is helpful in designing various divide-and-conquer-based approaches. In particular, we propose two randomized algorithms that achieve logarithmic approximation for subset search and causal matching, respectively. Our results provide the first known average-case provable guarantees for both problems. We believe that this opens up possibilities to design near-optimal methods for many other targeted causal structure learning problems arising from various applications.

翻訳日:2023-11-01 17:12:30 公開日:2023-10-30

# オブザーバブルの時間:他の対称性への展開

Time of ocurrence observables: expading to other symmetries ( http://arxiv.org/abs/2310.20074v1 )

ライセンス: Link先を確認

V. Cavalheri Pereira, J. C. A. Barata

(参考訳) 近年の研究では、量子力学における時間測定を記述するために、正の演算子値測度の定式化が提案されている。この研究は、これらの変換に関して共変な測度を構築するために、因果ポアンカー変換を含むような測度の構築方法を一般化することにより、他の著者による作業の拡張を目的としている。

Recent works have proposed the use of the formalism of Positive Operator Valued Measures to describe time measurements in quantum mechanics. This work aims to expand on the work done by other authors, by generalizing the previously proposed construction method of such measures to include causal Poincaré transformations, in order to construct measures which are covariant with respect to such transformations.

翻訳日:2023-11-01 17:12:13 公開日:2023-10-30

# インストラクションチューニングによる生成モデルの自動評価

Automatic Evaluation of Generative Models with Instruction Tuning ( http://arxiv.org/abs/2310.20072v1 )

ライセンス: Link先を確認

Shuhaib Mehri and Vered Shwartz

(参考訳) 自然言語生成の自動評価は、NLPにおいて長らく達成困難な目標であった。最近のパラダイムでは、事前学習済み言語モデルを微調整し、特定のタスクと評価基準に対する人間の判断を模倣させる。命令チューニングモデルの一般化能力に着想を得て、命令チューニングに基づく学習指標を提案する。このアプローチをテストするために、さまざまなNLGタスクと評価基準にわたる人間の判断のデータセットであるHEAPを収集した。実験の結果、HEAP上で言語モデルを命令チューニングすると多くの評価タスクで良好な性能が得られるが、基準によっては学習が難しいものもあることがわかった。さらに、複数のタスクで共同訓練することで性能がさらに向上し、人手による注釈データがほとんどない将来のタスクにも有益となりうる。

Automatic evaluation of natural language generation has long been an elusive goal in NLP. A recent paradigm fine-tunes pre-trained language models to emulate human judgements for a particular task and evaluation criterion. Inspired by the generalization ability of instruction-tuned models, we propose a learned metric based on instruction tuning. To test our approach, we collected HEAP, a dataset of human judgements across various NLG tasks and evaluation criteria. Our findings demonstrate that instruction tuning language models on HEAP yields good performance on many evaluation tasks, though some criteria are less trivial to learn than others. Further, jointly training on multiple tasks can yield additional performance improvements, which can be beneficial for future tasks with little to no human annotated data.

翻訳日:2023-11-01 17:12:05 公開日:2023-10-30

# FOCAL: 直交遅延空間におけるマルチモーダル時系列センシング信号のコントラスト学習

FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space ( http://arxiv.org/abs/2310.20071v1 )

ライセンス: Link先を確認

Shengzhong Liu, Tomoyoshi Kimura, Dongxin Liu, Ruijie Wang, Jinyang Li, Suhas Diggavi, Mani Srivastava, Tarek Abdelzaher

(参考訳) 本稿では、マルチモーダル時系列センシング信号から自己教師付き学習によって包括的な特徴を抽出する、新しいコントラスト学習フレームワークFOCALを提案する。既存のマルチモーダルコントラストフレームワークは、主に感覚モダリティ間の共有情報に依存しており、基礎となるセンシング物理を理解する上で重要となりうるモダリティ固有の情報を明示的に考慮していない。さらに、時系列向けのコントラストフレームワークは、時間的情報の局所性を適切に扱っていない。FOCALは以下の貢献によりこれらの課題を解決する。第一に、マルチモーダル時系列が与えられたとき、各モダリティを、互いに直交する共有特徴とプライベート特徴からなる因子化された潜在空間に符号化する。共有空間は、モダリティ間マッチングの目的関数を通じて、感覚モダリティ間で一貫した特徴パターンを強調する。一方、プライベート空間は、変換不変の目的関数を通じてモダリティ固有の情報を抽出する。第二に、時間的に近接するサンプル間の平均距離が、時間的に離れたサンプル間の平均距離を超えないようにする、モダリティ特徴に対する時間的構造制約を提案する。2つのバックボーンエンコーダと2つの分類器を用いて、4つのマルチモーダルセンシングデータセットで広範な評価を行い、FOCALの優位性を示す。FOCALは、利用可能なラベルの比率が異なる条件下で、下流タスクにおいて最先端のベースラインを明確な差で一貫して上回る。コードと自己収集したデータセットは https://github.com/tomoyoshki/focal で入手できる。

This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals through self-supervised training. Existing multimodal contrastive frameworks mostly rely on the shared information between sensory modalities, but do not explicitly consider the exclusive modality information that could be critical to understanding the underlying sensing physics. Besides, contrastive frameworks for time series have not handled the temporal information locality appropriately. FOCAL solves these challenges by making the following contributions: First, given multimodal time series, it encodes each modality into a factorized latent space consisting of shared features and private features that are orthogonal to each other. The shared space emphasizes feature patterns consistent across sensory modalities through a modal-matching objective. In contrast, the private space extracts modality-exclusive information through a transformation-invariant objective. Second, we propose a temporal structural constraint for modality features, such that the average distance between temporally neighboring samples is no larger than that of temporally distant samples. Extensive evaluations are performed on four multimodal sensing datasets with two backbone encoders and two classifiers to demonstrate the superiority of FOCAL. It consistently outperforms the state-of-the-art baselines in downstream tasks with a clear margin, under different ratios of available labels. The code and self-collected dataset are available at https://github.com/tomoyoshki/focal.
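
The temporal structural constraint stated above (neighboring samples should on average be no farther apart than distant ones) can be written as a simple hinge penalty. The sketch below is one plausible reading of that sentence, assuming Euclidean distance and a fixed neighborhood window; the function name and the exact form of the penalty are assumptions, and the loss used in the paper may differ.

```python
import numpy as np

def temporal_constraint_penalty(features, window=1):
    # features: array of shape (time, dim), ordered by time.
    n = len(features)
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    lag = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    near = dists[(lag > 0) & (lag <= window)].mean()   # temporally neighboring pairs
    far = dists[lag > window].mean()                   # temporally distant pairs
    # Zero when neighbors are on average no farther apart than distant pairs.
    return max(0.0, near - far)

ordered = np.arange(6, dtype=float)[:, None]   # features drift smoothly over time
shuffled = ordered[[0, 5, 1, 4, 2, 3]]         # features jump around over time
penalty_ordered = temporal_constraint_penalty(ordered)
penalty_shuffled = temporal_constraint_penalty(shuffled)
```

The smoothly drifting sequence satisfies the constraint and incurs no penalty, while the shuffled one is penalized because consecutive samples are farther apart than non-consecutive ones.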

翻訳日:2023-11-01 17:11:49 公開日:2023-10-30

# Vignat: グラフアテンションネットワークによるコードセマンティクスの学習による脆弱性識別

Vignat: Vulnerability identification by learning code semantics via graph attention networks ( http://arxiv.org/abs/2310.20067v1 )

ライセンス: Link先を確認

Shuo Liu and Gail Kaiser

(参考訳) 脆弱性の識別は、サイバーセキュリティ攻撃からソフトウェアシステムを保護するために不可欠である。しかし、巨大なプロジェクトには数百万行を超えるコードがあり、複雑な依存関係のために従来の静的および動的手法の実行は困難である。さらに、脆弱性の種類によってセマンティクス構造は大きく異なり、複数が同時に発生する可能性もあるため、一般的なルールベースの手法は拡張が難しい。本稿では、コードのグラフレベルのセマンティック表現を学習することで脆弱性を識別する、新しいアテンションベースのフレームワークVignatを提案する。コードをコードプロパティグラフ(CPG)で細粒度に表現し、脆弱性検出にグラフアテンションネットワーク(GAT)を用いる。結果は、人気のあるCライブラリから得た信頼できるデータセットにおいて、Vignatが57.38%の精度を達成できることを示している。さらに、GATの解釈可能性によって、脆弱性パターンに関する貴重な洞察が得られる。

Vulnerability identification is crucial to protect software systems from cyber-security attacks. However, huge projects have more than millions of lines of code, and the complex dependencies make it hard to carry out traditional static and dynamic methods. Furthermore, the semantic structure of various types of vulnerabilities differs greatly and may occur simultaneously, making general rule-based methods difficult to extend. In this paper, we propose Vignat, a novel attention-based framework for identifying vulnerabilities by learning graph-level semantic representations of code. We represent code with fine-grained code property graphs (CPGs) and use graph attention networks (GATs) for vulnerability detection. The results show that Vignat is able to achieve 57.38% accuracy on reliable datasets derived from popular C libraries. Furthermore, the interpretability of our GATs provides valuable insights into vulnerability patterns.

翻訳日:2023-11-01 17:11:24 公開日:2023-10-30

# LinFlo-Net:心臓のシミュレーション可能なメッシュを生成するための2段階のディープラーニング手法

LinFlo-Net: A two-stage deep learning method to generate simulation ready meshes of the heart ( http://arxiv.org/abs/2310.20065v1 )

ライセンス: Link先を確認

Arjun Narayanan, Fanwei Kong, Shawn Shadden

(参考訳) 本稿では、患者の画像データから人間の心臓のコンピュータモデルを自動的に生成する深層学習モデルを、薄壁の心臓構造を生成する能力に重点を置いて提案する。本手法は、テンプレートメッシュを変形させて、心臓構造を与えられた画像に適合させる。このアプローチを採用した従来の深層学習手法と比較して、本フレームワークは、わずかな距離で隔てられた表面メッシュを変形する際に典型的に生じるメッシュの自己貫入を最小限に抑えるように設計されている。これを、2段階の微分同相変形過程と、表面の接触と相互貫入にペナルティを与える、運動の運動学から導出した新しい損失関数によって実現する。本モデルは、最先端手法と同等の精度を示すとともに、自己交差のないメッシュを生成する。得られたメッシュは物理ベースのシミュレーションでそのまま利用でき、後処理やクリーンアップの必要性を最小限に抑えられる。

We present a deep learning model to automatically generate computer models of the human heart from patient imaging data with an emphasis on its capability to generate thin-walled cardiac structures. Our method works by deforming a template mesh to fit the cardiac structures to the given image. Compared with prior deep learning methods that adopted this approach, our framework is designed to minimize mesh self-penetration, which typically arises when deforming surface meshes separated by small distances. We achieve this by using a two-stage diffeomorphic deformation process along with a novel loss function derived from the kinematics of motion that penalizes surface contact and interpenetration. Our model demonstrates comparable accuracy with state-of-the-art methods while additionally producing meshes free of self-intersections. The resultant meshes are readily usable in physics based simulation, minimizing the need for post-processing and cleanup.

翻訳日:2023-11-01 17:11:09 公開日:2023-10-30

# ブラインドマルチ分散ノイズ除去のためのスケーラブルなトレーニング戦略

A Scalable Training Strategy for Blind Multi-Distribution Noise Removal ( http://arxiv.org/abs/2310.20064v1 )

ライセンス: Link先を確認

Kevin Zhang, Sakshum Kulshrestha, Christopher Metzler

(参考訳) 近年の進歩にもかかわらず、汎用的なユニバーサルデノイジングおよびアーティファクト除去ネットワークの開発は、依然としてほぼ未解決の問題である。ネットワークの重みを固定すると、あるタスク(例えば、ポアソンノイズの除去)への特化と、別のタスク(例えば、スペックルノイズの除去)での性能とが本質的にトレードオフになる。さらに、そのようなネットワークのトレーニングは次元の呪いのために困難である。仕様空間の次元(すなわち、ノイズ分布を記述するのに必要なパラメータの数)が増加するにつれて、学習対象とすべき固有の仕様の数が指数関数的に増加する。この空間を一様にサンプリングすると、非常に困難な問題仕様ではうまく機能するが、大きな誤差でも全体の平均二乗誤差への影響が小さい簡単な問題仕様では性能の低いネットワークになる。本稿では、適応サンプリング/アクティブラーニング戦略を用いたデノイジングネットワークのトレーニングを提案する。本研究は、最近提案されたユニバーサルデノイザーのトレーニング戦略を、その結果を高次元に拡張し、真の仕様-損失ランドスケープの多項式近似を組み込むことで改良する。この近似により、トレーニング時間をほぼ2桁削減できる。ポアソン・ガウス・スペックル混合ノイズのシミュレーションで本手法を検証し、提案するトレーニング戦略により、単一のブラインドな汎用デノイザーネットワークが、幅広い動作条件において、特化型デノイザーネットワークから一様な範囲内のピーク信号対雑音比を達成できることを実証する。また、ポアソン・ガウス・スペックル混合ノイズの量が異なる画像からなる小規模データセットを撮影し、適応サンプリング戦略で訓練したユニバーサルデノイザーが、一様サンプリングで訓練したベースラインよりも優れていることを示す。

Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades off specialization at one task (e.g., removing Poisson noise) for performance at another (e.g., removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification-space (i.e., the number of parameters needed to describe the noise distribution) the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape. This approximation allows us to reduce training times by almost two orders of magnitude. We test our method on simulated joint Poisson-Gaussian-Speckle noise and demonstrate that with our proposed training strategy, a single blind, generalist denoiser network can achieve peak signal-to-noise ratios within a uniform bound of specialized denoiser networks across a large range of operating conditions. We also capture a small dataset of images with varying amounts of joint Poisson-Gaussian-Speckle noise and demonstrate that a universal denoiser trained using our adaptive-sampling strategy outperforms uniformly trained baselines.

翻訳日:2023-11-01 17:10:56 公開日:2023-10-30

# 分散・スケーラブル・プライバシ保護型合成データ生成

Decentralised, Scalable and Privacy-Preserving Synthetic Data Generation ( http://arxiv.org/abs/2310.20062v1 )

ライセンス: Link先を確認

Vishal Ramesh, Rui Zhao, Naman Goel

(参考訳) 合成データは、プライバシーリスクを低減しつつデータの価値を活用する有望な方法として浮上している。合成データの可能性は、プライバシーに配慮したデータ公開に限らず、より公平で分布シフトに頑健な機械学習アルゴリズムのトレーニングなどのユースケースにおいて実データを補完することも含む。プライバシーと統計的保証の改善、および機械学習パイプラインでの活用の向上に向けて、合成データ生成のアルゴリズム面の進歩に多くの関心が寄せられている。しかし、責任ある信頼できる合成データ生成のためには、これらのアルゴリズム的側面だけに注目するのでは不十分であり、合成データ生成パイプラインを全体的に捉える必要がある。我々は、信頼できる中央機関に頼ることなく、実データの提供者が差分プライベートな合成データ生成に自律的に参加できる新しいシステムを構築する。モジュール化された汎用的でスケーラブルな本ソリューションは、Solid(Social Linked Data)、MPC(Secure Multi-Party Computation)、Trusted Execution Environments(TEE)という3つの構成要素に基づいている。Solidは、Podと呼ばれる分散データストアにデータを安全に保存し、データへのアクセスを制御できるようにする仕様である。MPCは、複数の当事者が各自の入力を秘密にしたまま、その入力上の関数を共同で計算するための暗号手法の総称である。Intel SGXのようなTEEは、コードとデータの機密性と完全性をハードウェアベースの機能によって確保する。これら3つの技術が、1) 提供者の自律性、2) 分散化、3) プライバシー、4) スケーラビリティを確保することで、責任ある信頼できる合成データ生成における様々な課題に効果的に対処できることを示す。シミュレーションデータセットと実データセット、および異なる合成データ生成アルゴリズムに対する厳密な実験結果によって、我々の主張を裏付ける。

Synthetic data is emerging as a promising way to harness the value of data, while reducing privacy risks. The potential of synthetic data is not limited to privacy-friendly data release, but also includes complementing real data in use-cases such as training machine learning algorithms that are more fair and robust to distribution shifts etc. There is a lot of interest in algorithmic advances in synthetic data generation for providing better privacy and statistical guarantees and for its better utilisation in machine learning pipelines. However, for responsible and trustworthy synthetic data generation, it is not sufficient to focus only on these algorithmic aspects and instead, a holistic view of the synthetic data generation pipeline must be considered. We build a novel system that allows the contributors of real data to autonomously participate in differentially private synthetic data generation without relying on a trusted centre. Our modular, general and scalable solution is based on three building blocks namely: Solid (Social Linked Data), MPC (Secure Multi-Party Computation) and Trusted Execution Environments (TEEs). Solid is a specification that lets people store their data securely in decentralised data stores called Pods and control access to their data. MPC refers to the set of cryptographic methods for different parties to jointly compute a function over their inputs while keeping those inputs private. TEEs such as Intel SGX rely on hardware based features for confidentiality and integrity of code and data. We show how these three technologies can be effectively used to address various challenges in responsible and trustworthy synthetic data generation by ensuring: 1) contributor autonomy, 2) decentralisation, 3) privacy and 4) scalability. We support our claims with rigorous empirical results on simulated and real datasets and different synthetic data generation algorithms.
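
To make the MPC building block concrete, here is a minimal sketch of additive secret sharing, a textbook technique for jointly computing a sum while keeping each input private. It is not the protocol used in the paper, it assumes honest-but-curious parties, and the names `share`, `reconstruct`, `secure_sum`, and the modulus `PRIME` are illustrative choices.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic is done modulo this prime

def share(value, n_parties):
    # Split value into n random-looking shares that sum to it mod PRIME.
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    # Only the sum of all shares reveals the secret.
    return sum(shares) % PRIME

def secure_sum(private_inputs):
    n = len(private_inputs)
    # Each contributor sends one share to every party.
    all_shares = [share(v, n) for v in private_inputs]
    # Party j adds up the j-th share it received from each contributor.
    partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]
    # Combining the partial sums yields the total, never the individual inputs.
    return reconstruct(partial_sums)

total = secure_sum([3, 10, 29])
```

Any subset of fewer than all shares of a value is uniformly random, which is why no single party learns another contributor's input. Inputs must be non-negative integers smaller than `PRIME` for the result to be meaningful.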

翻訳日:2023-11-01 17:10:24 公開日:2023-10-30

# AdaSub: 低次元部分空間における2次情報を用いた確率最適化

AdaSub: Stochastic Optimization Using Second-Order Information in Low-Dimensional Subspaces ( http://arxiv.org/abs/2310.20060v1 )

ライセンス: Link先を確認

João Victor Galvão da Mata and Martin S. Andersen

(参考訳) 本研究では,現在および過去の情報に基づいて適応的に定義される低次元部分空間において,二次情報に基づく探索方向を計算する確率的最適化アルゴリズムadasubを提案する。一階法と比較して二階法の方が収束特性は良いが、各イテレーションでヘッセン行列を計算する必要性は計算コストを過大にし、実用的でない。この問題に対処するため,提案手法は,探索のための部分空間次元の選択を可能にすることにより,計算コストとアルゴリズム効率の管理を可能にする。我々のコードはgithubで無料で入手でき、予備的な数値結果は、adasubが所定の精度に達するのに必要な時間とイテレーション数で人気のある確率最適化器を上回っていることを示している。

We introduce AdaSub, a stochastic optimization algorithm that computes a search direction based on second-order information in a low-dimensional subspace that is defined adaptively based on available current and past information. Compared to first-order methods, second-order methods exhibit better convergence characteristics, but the need to compute the Hessian matrix at each iteration results in excessive computational expenses, making them impractical. To address this issue, our approach enables the management of computational expenses and algorithm efficiency by enabling the selection of the subspace dimension for the search. Our code is freely available on GitHub, and our preliminary numerical results demonstrate that AdaSub surpasses popular stochastic optimizers in terms of time and number of iterations required to reach a given accuracy.
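
The core idea of taking a second-order step inside a low-dimensional subspace can be sketched as follows. This is a deterministic illustration on a small quadratic, under the assumption that the subspace is spanned by the current and previous gradients and that Hessian-vector products are available; it is not the AdaSub algorithm itself, and all names are hypothetical.

```python
import numpy as np

def subspace_newton_step(grad, hess_vec, basis_vectors):
    # Orthonormalize the chosen directions to get a basis V of the subspace.
    v, _ = np.linalg.qr(np.column_stack(basis_vectors))
    # Only k Hessian-vector products are needed, never the full Hessian.
    hv = np.column_stack([hess_vec(v[:, i]) for i in range(v.shape[1])])
    reduced_hess = v.T @ hv          # k x k matrix V^T H V
    reduced_grad = v.T @ grad        # k-vector V^T g
    # Newton step in the subspace, mapped back to the full space.
    return -v @ np.linalg.solve(reduced_hess, reduced_grad)

a = np.diag(np.arange(1.0, 6.0))     # Hessian of a toy quadratic
b = np.ones(5)

def objective(x):
    return 0.5 * x @ a @ x - b @ x

x = np.zeros(5)
history, values = [], [objective(x)]
for _ in range(5):
    g = a @ x - b
    history = (history + [g])[-2:]   # subspace: the last two gradients
    x = x + subspace_newton_step(g, lambda u: a @ u, history)
    values.append(objective(x))
```

The subspace dimension (two here) is the knob that trades per-iteration cost against progress per step: with a basis spanning the whole space, the step coincides with the full Newton step.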

翻訳日:2023-11-01 17:09:50 公開日:2023-10-30

# TarGEN: 大規模言語モデルによるターゲットデータ生成

TarGEN: Targeted Data Generation with Large Language Models ( http://arxiv.org/abs/2310.17876v2 )

ライセンス: Link先を確認

Himanshu Gupta and Kevin Scaria and Ujjwala Anantheswaran and Shreyas Verma and Mihir Parmar and Saurabh Arjun Sawant and Chitta Baral and Swaroop Mishra

(参考訳) 大規模言語モデル(LLM)の急速な進歩は、多様で高品質な合成データセットの生成を目指すデータ合成技術への関心を高めた。しかし、これらの合成データセットは、しばしば多様性の欠如とノイズの混入に悩まされる。本稿では、LLMを用いて高品質な合成データセットを生成するための多段階プロンプト戦略であるTarGENを提案する。TarGENの利点はシードを必要としない点にあり、特定のタスクインスタンスを必要としないため、タスクの複製を超えて適用範囲が広がる。さらに、データセット作成中に誤ってラベル付けされたインスタンスをLLMに修正させ、信頼できるラベルを確保する自己補正と呼ばれる手法でTarGENを拡張する。提案手法の有効性を評価するため、SuperGLUEベンチマークから8つのタスクを模倣し、エンコーダのみ、エンコーダ・デコーダ、デコーダのみのモデルを含む各種言語モデルを、合成トレーニングセットと元のトレーニングセットの両方で微調整する。元のテストセットでの評価によると、TarGENが生成したデータセットで訓練したモデルは、元のデータセットで訓練したモデルよりも約1〜2ポイント性能が高い(Flan-T5を用いた場合、合成データで82.84%、元のデータで81.12%)。命令チューニングを取り入れると、Flan-T5の性能は合成データで84.54%、元のデータで81.49%に向上する。合成データセットを元のデータセットと比較した包括的な分析により、合成データセットは同等以上のデータセットの複雑さと多様性を示すことが明らかになった。さらに、合成データセットのバイアスレベルは、元のデータセットとほぼ一致する。最後に、我々の合成SuperGLUEデータセットで事前微調整すると、T5-3BはOpenLLMリーダーボードで優れた結果を示し、Self-Instructデータセットで訓練されたモデルを4.14ポイント上回る。TarGENが高品質なデータ生成に役立ち、複雑なベンチマークを作成するための人手を減らせることを期待する。

The rapid advancement of large language models (LLMs) has sparked interest in data synthesis techniques, aiming to generate diverse and high-quality synthetic datasets. However, these synthetic datasets often suffer from a lack of diversity and added noise. In this paper, we present TarGEN, a multi-step prompting strategy for generating high-quality synthetic datasets utilizing an LLM. An advantage of TarGEN is its seedless nature; it does not require specific task instances, broadening its applicability beyond task replication. We augment TarGEN with a method known as self-correction, which empowers LLMs to rectify inaccurately labeled instances during dataset creation, ensuring reliable labels. To assess our technique's effectiveness, we emulate 8 tasks from the SuperGLUE benchmark and finetune various language models, including encoder-only, encoder-decoder, and decoder-only models on both synthetic and original training sets. Evaluation on the original test set reveals that models trained on datasets generated by TarGEN perform approximately 1-2 percentage points better than those trained on original datasets (82.84% with synthetic data vs. 81.12% with original data, using Flan-T5). When incorporating instruction tuning, the performance increases to 84.54% on synthetic data vs. 81.49% on original data by Flan-T5. A comprehensive analysis of the synthetic dataset compared to the original dataset reveals that the synthetic dataset demonstrates similar or higher levels of dataset complexity and diversity. Furthermore, the synthetic dataset displays a bias level that aligns closely with the original dataset. Finally, when pre-finetuned on our synthetic SuperGLUE dataset, T5-3B yields impressive results on the OpenLLM leaderboard, surpassing the model trained on the Self-Instruct dataset by 4.14 percentage points. We hope that TarGEN can be helpful for quality data generation and reducing the human efforts to create complex benchmarks.

翻訳日:2023-11-01 10:10:05 公開日:2023-10-30

# 推論タスクに人間のようなコンテンツ効果を示す言語モデル

Language models show human-like content effects on reasoning tasks ( http://arxiv.org/abs/2207.07051v3 )

ライセンス: Link先を確認

Ishita Dasgupta, Andrew K. Lampinen, Stephanie C. Y. Chan, Hannah R. Sheahan, Antonia Creswell, Dharshan Kumaran, James L. McClelland, Felix Hill

(参考訳) 抽象推論は知的システムにとって重要な能力である。大規模言語モデル(LM)は抽象推論タスクで偶然を上回る性能を達成するが、多くの不完全さを示す。しかし、人間の抽象推論も不完全である。例えば、人間の推論は現実世界の知識と信念の影響を受け、顕著な「コンテンツ効果」を示す。すなわち、問題の意味内容が正しい論理的推論を支持する場合に、人間はより確実に推論する。こうした内容と絡み合った推論パターンは、人間の知性の基本的性質に関する議論において中心的な役割を果たす。ここでは、事前の期待が人間の知識のいくつかの側面を捉えている言語モデルが、同様に論理問題への回答に内容を混入させるかどうかを調べる。自然言語推論、三段論法の論理的妥当性の判断、Wason選択課題という3つの論理推論タスクでこの問いを検討した。最先端の大規模言語モデルと人間を評価した結果、言語モデルは、これらのタスクで人間に観察されるのと同じパターンの多くを反映していることがわかった。人間と同様に、タスクの意味内容が論理的推論を支持する場合、モデルはより正確に答える。これらの類似性は、回答パターンと、モデルの回答分布と人間の応答時間の関係のような低レベルの特徴の両方に反映される。本研究の知見は、人間におけるこれらの認知的効果と、言語モデルの性能に寄与する要因の両方を理解する上で示唆を与える。

Abstract reasoning is a key ability for an intelligent system. Large language models (LMs) achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human abstract reasoning is also imperfect. For example, human reasoning is affected by our real-world knowledge and beliefs, and shows notable "content effects"; humans reason more reliably when the semantic content of a problem supports the correct logical inferences. These content-entangled reasoning patterns play a central role in debates about the fundamental nature of human intelligence. Here, we investigate whether language models, whose prior expectations capture some aspects of human knowledge, similarly mix content into their answers to logical problems. We explored this question across three logical reasoning tasks: natural language inference, judging the logical validity of syllogisms, and the Wason selection task. We evaluate state-of-the-art large language models, as well as humans, and find that the language models reflect many of the same patterns observed in humans across these tasks: like humans, models answer more accurately when the semantic content of a task supports the logical inferences. These parallels are reflected both in answer patterns, and in lower-level features like the relationship between model answer distributions and human response times. Our findings have implications for understanding both these cognitive effects in humans, and the factors that contribute to language model performance.

翻訳日:2023-11-01 01:25:15 公開日:2023-10-30

# 階層型プロンプティング支援 Webナビゲーションにおける大規模言語モデル

Hierarchical Prompting Assists Large Language Model on Web Navigation ( http://arxiv.org/abs/2305.14257v3 )

ライセンス: Link先を確認

Abishek Sridhar, Robert Lo, Frank F. Xu, Hao Zhu, Shuyan Zhou

(参考訳) 大規模言語モデル(LLM)は、対話的な意思決定タスクにおける複雑な観察処理に苦労する。この問題を軽減するために,簡単な階層的プロンプト手法を提案する。常に完全な観察(例えばwebページ)をプロンプトに置く従来のプロンプトアプローチから逸脱し、より凝縮され、専用の要約プロンプトと関係のあるアクションアウェアな観察を最初に構築することを提案する。 ACTORプロンプトは、要約された観察に基づいて次のアクションを予測する。提案手法は適用範囲が広いが,Webナビゲーションの複雑な領域において,完全な観測が冗長で無関係な情報を含む場合が特に有効であることを示す。提案手法は,タスク成功率を6.2%向上させ,長い観察トレースを持つ対話的意思決定タスクの可能性を示した。

Large language models (LLMs) struggle to process complicated observations in interactive decision-making tasks. To alleviate this issue, we propose a simple hierarchical prompting approach. Diverging from previous prompting approaches that always put the full observation (e.g., a web page) into the prompt, we propose to first construct an action-aware observation, which is more condensed and relevant, with a dedicated SUMMARIZER prompt. The ACTOR prompt then predicts the next action based on the summarized observation. While our method has broad applicability, we particularly demonstrate its efficacy in the complex domain of web navigation where a full observation often contains redundant and irrelevant information. Our approach outperforms the previous state-of-the-art prompting mechanics by 6.2% on task success rate, demonstrating its potential on interactive decision-making tasks with long observation traces.
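
The two-stage flow described above (a SUMMARIZER prompt that condenses the observation, then an ACTOR prompt that picks the next action) might be wired up roughly as below. The prompt wording, the `hierarchical_step` function, and the `FakeLLM` stand-in are all assumptions made so the sketch runs without any model; they are not taken from the paper.

```python
def hierarchical_step(llm, observation, objective, previous_action):
    """Run one SUMMARIZER -> ACTOR step and return the predicted action."""
    summarizer_prompt = (
        "Objective: " + objective + "\n"
        "Previous action: " + previous_action + "\n"
        "Observation:\n" + observation + "\n"
        "Summarize only the parts of the observation relevant to the next action."
    )
    summary = llm(summarizer_prompt)
    # The ACTOR sees the condensed summary, never the full observation.
    actor_prompt = (
        "Objective: " + objective + "\n"
        "Previous action: " + previous_action + "\n"
        "Summarized observation:\n" + summary + "\n"
        "Next action:"
    )
    return llm(actor_prompt)


class FakeLLM:
    """Deterministic stand-in that records prompts and returns canned text."""

    def __init__(self):
        self.prompts = []

    def __call__(self, prompt):
        self.prompts.append(prompt)
        if "Next action:" in prompt:
            return "click [search-button]"
        return "A search box and a button labelled search-button."


page = "<html>" + "<div>advert</div>" * 200 + "<button id='search-button'/></html>"
fake = FakeLLM()
action = hierarchical_step(fake, page, "find red shoes", "none")
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real deployment would replace `FakeLLM` with a function that sends the prompt to a model and returns its text.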

翻訳日:2023-10-31 22:12:31 公開日:2023-10-30

# 事前学習表現における拡散冗長性

Diffused Redundancy in Pre-trained Representations ( http://arxiv.org/abs/2306.00183v2 )

ライセンス: Link先を確認

Vedant Nanda, Till Speicher, John P. Dickerson, Soheil Feizi, Krishna P. Gummadi, Adrian Weller

(参考訳) 大規模なデータセット上でニューラルネットワークを事前学習して得られる表現は、さまざまな下流タスクの実行にますます活用されている。本研究では、このような事前学習済み表現において特徴がどのように符号化されているかを詳しく調べる。ある層で学習された表現は、ある程度の拡散冗長性を示すことがわかった。すなわち、閾値サイズよりも大きい、層内のニューロンのランダムに選んだ任意の部分集合は、層全体と高い類似度を持ち、様々な下流タスクで層全体と同程度の性能を発揮できる。例えば、ImageNet1kで事前学習されたResNet50の最後から2番目の層からランダムに選んだニューロンの20%で訓練した線形プローブは、下流のCIFAR10分類において、層の全ニューロンで訓練した線形プローブとの差が5%以内の精度を達成する。ImageNet1kとImageNet21kの両方で事前学習された異なるニューラルアーキテクチャ(CNNとTransformerを含む)で実験を行い、VTABベンチマークから取得したさまざまな下流タスクを評価する。事前学習で用いる損失とデータセットが拡散冗長性の程度を大きく左右し、必要となるニューロンの「臨界質量」はしばしば下流タスクに依存することがわかった。これは、タスクに固有の冗長性と性能のパレートフロンティアが存在することを示唆する。本研究の知見は、事前学習済みディープニューラルネットワークが学習する表現の性質を明らかにし、多くの下流タスクの実行には層全体が必要でない可能性を示唆する。下流タスクでの効率的な汎化を実現するためにこの冗長性を活用する可能性を検討するとともに、意図しない結果が生じうることにも注意を促す。コードは https://github.com/nvedant07/diffused-redundancy で利用可能である。

Representations learned by pre-training a neural network on a large dataset are increasingly used successfully to perform a variety of downstream tasks. In this work, we take a closer look at how features are encoded in such pre-trained representations. We find that learned representations in a given layer exhibit a degree of diffuse redundancy, i.e., any randomly chosen subset of neurons in the layer that is larger than a threshold size shares a large degree of similarity with the full layer and is able to perform similarly as the whole layer on a variety of downstream tasks. For example, a linear probe trained on 20% of randomly picked neurons from the penultimate layer of a ResNet50 pre-trained on ImageNet1k achieves an accuracy within 5% of a linear probe trained on the full layer of neurons for downstream CIFAR10 classification. We conduct experiments on different neural architectures (including CNNs and Transformers) pre-trained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. We find that the loss and dataset used during pre-training largely govern the degree of diffuse redundancy and the "critical mass" of neurons needed often depends on the downstream task, suggesting that there is a task-inherent redundancy-performance Pareto frontier. Our findings shed light on the nature of representations learned by pre-trained deep neural networks and suggest that entire layers might not be necessary to perform many downstream tasks. We investigate the potential for exploiting this redundancy to achieve efficient generalization for downstream tasks and also draw caution to certain possible unintended consequences. Our code is available at https://github.com/nvedant07/diffused-redundancy.

翻訳日:2023-10-31 21:53:34 公開日:2023-10-30

# 大規模言語モデルは半パラメトリック強化学習エージェントである

Large Language Models Are Semi-Parametric Reinforcement Learning Agents ( http://arxiv.org/abs/2306.07929v2 )

ライセンス: Link先を確認

Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu

(参考訳) 人間の記憶と推論機構に関する認知科学の知見に着想を得て、進化可能な新しいLLM(Large Language Model)ベースのエージェントフレームワークREMEMBERERを提案する。LLMに長期経験記憶を持たせることで、REMEMBERERは、タスク目標が異なる場合でも過去のエピソードの経験を活用でき、固定の例示や一時的な作業記憶を備えたLLMベースのエージェントよりも優れる。さらに、記憶を更新するためにRLEM(Reinforcement Learning with Experience Memory)を導入する。これにより、システム全体が成功と失敗の両方の経験から学び、LLMのパラメータを微調整することなく能力を進化させることができる。このようにして、提案するREMEMBERERは半パラメトリックなRLエージェントを構成する。提案フレームワークを評価するため、2つのRLタスクセットで広範な実験を行った。異なる初期化とトレーニングセットにわたる平均結果は、2つのタスクセットの成功率で従来のSOTAをそれぞれ4%と2%上回り、REMEMBERERの優位性と堅牢性を示す。

Inspired by insights from cognitive science into human memory and reasoning mechanisms, a novel evolvable LLM-based (Large Language Model) agent framework is proposed as REMEMBERER. By equipping the LLM with a long-term experience memory, REMEMBERER is capable of exploiting experiences from past episodes even for different task goals, which gives it an advantage over an LLM-based agent with fixed exemplars or a transient working memory. We further introduce Reinforcement Learning with Experience Memory (RLEM) to update the memory. Thus, the whole system can learn from the experiences of both success and failure, and evolve its capability without fine-tuning the parameters of the LLM. In this way, the proposed REMEMBERER constitutes a semi-parametric RL agent. Extensive experiments are conducted on two RL task sets to evaluate the proposed framework. The average results with different initialization and training sets exceed the prior SOTA by 4% and 2% for the success rate on two task sets and demonstrate the superiority and robustness of REMEMBERER.

翻訳日:2023-10-31 21:29:12 公開日:2023-10-30

# ブロック状態変換器

Block-State Transformers ( http://arxiv.org/abs/2306.09539v4 )

ライセンス: Link先を確認

Mahan Fathi and Jonathan Pilault and Orhan Firat and Christopher Pal and Pierre-Luc Bacon and Ross Goroshin

(参考訳) 状態空間モデル(SSM)は、長距離の依存関係のモデル化を必要とするタスクで印象的な結果を示しており、実行時の計算量が二乗未満であるため長いシーケンスに効率的にスケールする。元々は連続的な信号のために設計されていたが、SSMは視覚やオーディオの多くのタスクで優れたパフォーマンスを示してきた。しかし、言語モデリングタスクでは依然としてTransformerの性能に及ばない。本研究では,長距離のコンテキスト化のためのSSMサブレイヤと,シーケンスの短期表現のためのBlock Transformerサブレイヤを内部的に組み合わせたBST(Block-State Transformer)というハイブリッド層を提案する。SSMとブロック単位のアテンションを統合した、完全に並列化可能な3つの異なる変種について検討する。我々のモデルは言語モデリングのパープレキシティにおいて類似のTransformerベースのアーキテクチャよりも優れており、より長いシーケンスに一般化できることを示す。また、Block-State Transformerは、モデル並列化を行う際、Block-Recurrent Transformerと比較して層レベルで10倍以上の速度向上を示す。

State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks, in vision and audio; however, SSMs still lag Transformer performance in Language Modeling tasks. In this work, we propose a hybrid layer named Block-State Transformer (BST), that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences. We study three different, and completely parallelizable, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.

翻訳日:2023-10-31 21:13:05 公開日:2023-10-30

# 交通信号制御のためのSim-to-Real転送に向けた不確実な接地行動変換

Uncertainty-aware Grounded Action Transformation towards Sim-to-Real Transfer for Traffic Signal Control ( http://arxiv.org/abs/2307.12388v3 )

ライセンス: Link先を確認

Longchao Da, Hao Mei, Romir Sharma and Hua Wei

(参考訳) 交通信号制御(TSC)は、数百万人の日常生活に影響を与える複雑で重要なタスクである。強化学習(RL)は交通信号制御の最適化に有望な結果を示しているが、現在のRLベースのTSC手法は主にシミュレーションで訓練されており、シミュレーションと実世界の間の性能ギャップに悩まされている。本稿では、UGATと呼ばれるシミュレーションから実世界への(sim-to-real)転移手法を提案する。UGATは、シミュレーション中の行動を不確実性を考慮して動的に変換することで遷移ダイナミクスの領域ギャップを緩和し、シミュレーション環境で学習した方策を実世界環境へ転移する。本手法をシミュレーションした交通環境で評価し、転移したRL方策の実世界での性能を大幅に向上させることを示す。

Traffic signal control (TSC) is a complex and important task that affects the daily lives of millions of people. Reinforcement Learning (RL) has shown promising results in optimizing traffic signal control, but current RL-based TSC methods are mainly trained in simulation and suffer from the performance gap between simulation and the real world. In this paper, we propose a simulation-to-real-world (sim-to-real) transfer approach called UGAT, which transfers a learned policy trained from a simulated environment to a real-world environment by dynamically transforming actions in the simulation with uncertainty to mitigate the domain gap of transition dynamics. We evaluate our method on a simulated traffic environment and show that it significantly improves the performance of the transferred RL policy in the real world.

翻訳日:2023-10-31 20:38:48 公開日:2023-10-30

# 小さな魔法は多くを意味する

A little magic means a lot ( http://arxiv.org/abs/2308.16228v2 )

ライセンス: Link先を確認

Andi Gu, Lorenzo Leone, Soumik Ghosh, Jens Eisert, Susanne Yelin, Yihui Quek

(参考訳) いわゆるマジックの概念は、量子状態がどの程度非古典的であるかを厳密な意味で定量化する。マジックの値が低いと量子優位性は得られず、マジックは量子誤り訂正においても重要な役割を果たす。本研究では,マジックの少ない量子状態の特定のアンサンブルが,マジックの多い量子状態と計算的に区別できないという'pseudomagic'現象を導入する。従来、そのような計算的識別不可能性は、擬似エンタングルメント(pseudoentanglement)の概念を導入することによって、エンタングルメントに関して研究されてきた。しかし,pseudomagicは擬似エンタングルメントから導かれるものではなく,それを含意するものでもないことを示す。応用の観点からは、pseudomagicは量子カオスの理論に新たな光を当てる。非カオス的なユニタリから構築されているにもかかわらず、いかなる物理的観測者にもランダムなカオス状態と区別できない状態の存在を明らかにするのである。さらなる応用として、状態合成問題に対する新しい下界、性質検査プロトコル、量子暗号への含意がある。我々の結果は、マジックが量子状態の'隠せる'性質であるという概念的な含意を持つ。すなわち、一部の状態は(計算能力に限りのある)目に見えるよりもはるかに多くのマジックを持っている。物理学の観点からは、実験室で測定できる物理的性質は効率的に計算的に検出できるものだけであるという考え方を提唱する。

Notions of so-called magic quantify how non-classical quantum states are in a precise sense: low values of magic preclude quantum advantage; they also play a key role in quantum error correction. In this work, we introduce the phenomenon of 'pseudomagic' -- wherein certain ensembles of quantum states with low magic are computationally indistinguishable from quantum states with high magic. Previously, such computational indistinguishability has been studied with respect to entanglement, by introducing the notion of pseudoentanglement. However, we show that pseudomagic neither follows from pseudoentanglement, nor implies it. In terms of applications, pseudomagic sheds new light on the theory of quantum chaos: it reveals the existence of states that, although built from non-chaotic unitaries, cannot be distinguished from random chaotic states by any physical observer. Further applications include new lower bounds on state synthesis problems, property testing protocols, as well as implications for quantum cryptography. Our results have the conceptual implication that magic is a 'hide-able' property of quantum states: some states have a lot more magic than meets the (computationally-bounded) eye. From the physics perspective, it advocates the mindset that the only physical properties that can be measured in a laboratory are those that are efficiently computationally detectable.

翻訳日:2023-10-31 20:26:15 公開日:2023-10-30

# 細粒度オープンセット認識のための潜在空間エネルギーベースモデル

Latent Space Energy-based Model for Fine-grained Open Set Recognition ( http://arxiv.org/abs/2309.10711v2 )

ライセンス: Link先を確認

Wentao Bao, Qi Yu, Yu Kong

(参考訳) 細粒度オープンセット認識(FineOSR)は、未知のクラスの画像を拒否しながら、外観の違いが微妙なクラスに属する画像を認識することを目的としている。OSRの最近の傾向は、識別的な未知検出に対する生成モデルの利点を示している。生成モデルの一種であるエネルギーベースモデル(EBM)は、生成的タスクと識別的タスクのハイブリッドモデリングの可能性を持つ。しかし、既存のEBMの多くは、細粒度クラスの画像の認識に重要な高次元空間での密度推定を苦手としている。本稿では,細粒度の視覚世界におけるOSRのために,エネルギーベースの事前分布を持つ低次元潜在空間を探索する。具体的には, 潜在空間EBMに基づいて, 属性認識情報ボトルネック(AIB), 残差属性特徴集約(RAFA)モジュール, 不確実性に基づく仮想外れ値合成(UVOS)モジュールを提案し, 細粒度クラスにおけるサンプルの表現力, 粒度, 密度をそれぞれ向上させる。本手法は, 近年のビジョントランスフォーマーを柔軟に活用して, 強力な視覚分類と生成を行うことができる。本手法は、高解像度で写真のようにリアルな偽画像を生成する能力を保ちながら、細粒度および一般的な視覚分類データセットで検証されている。

Fine-grained open-set recognition (FineOSR) aims to recognize images belonging to classes with subtle appearance differences while rejecting images of unknown classes. A recent trend in OSR shows the benefit of generative models to discriminative unknown detection. As a type of generative model, energy-based models (EBM) have potential for hybrid modeling of generative and discriminative tasks. However, most existing EBMs struggle with density estimation in high-dimensional space, which is critical to recognizing images from fine-grained classes. In this paper, we explore the low-dimensional latent space with energy-based prior distribution for OSR in a fine-grained visual world. Specifically, based on the latent space EBM, we propose an attribute-aware information bottleneck (AIB), a residual attribute feature aggregation (RAFA) module, and an uncertainty-based virtual outlier synthesis (UVOS) module to improve the expressivity, granularity, and density of the samples in fine-grained classes, respectively. Our method is flexible to take advantage of recent vision transformers for powerful visual classification and generation. The method is validated on both fine-grained and general visual classification datasets while preserving the capability of generating photo-realistic fake images with high resolution.

翻訳日:2023-10-31 20:15:06 公開日:2023-10-30

# カスケード拡散モデルによる熱帯サイクロンの予測

Forecasting Tropical Cyclones with Cascaded Diffusion Models ( http://arxiv.org/abs/2310.01690v3 )

ライセンス: Link先を確認

Pritthijit Nath, Pancham Shukla, César Quilodrán-Casas

(参考訳) 気候変動によってサイクロンがより激しくなるにつれて、AIベースのモデリングの台頭は、数学的モデルに基づく従来の方法よりも安価でアクセスしやすいアプローチを提供する。本研究は、衛星画像、リモートセンシング、大気データを統合し、拡散モデルを用いてサイクロンの軌道と降水パターンを予測する。予測、超解像、降水モデリングを組み込んだカスケード手法を採用し、6つの主要海域の51個のサイクロンからなるデータセットで訓練する。実験により、カスケードモデルの最終予測は36時間のロールアウトまで正確であり、3つのタスクすべてでSSIMとPSNRがそれぞれ0.5と20dBを超えることが示された。この研究はまた、サイクロン予測のような高性能が求められる用途において、拡散モデルなどのAI手法が計算コストを手頃に保ちながら有望な効率を示すことを強調しており、予測の必要性が高く財政的制約のある脆弱な地域に適している。コードは https://github.com/nathzi1505/forecast-diffmodels で入手できる。

As cyclones become more intense due to climate change, the rise of AI-based modelling provides a more affordable and accessible approach compared to traditional methods based on mathematical models. This work leverages diffusion models to forecast cyclone trajectories and precipitation patterns by integrating satellite imaging, remote sensing, and atmospheric data, employing a cascaded approach that incorporates forecasting, super-resolution, and precipitation modelling, with training on a dataset of 51 cyclones from six major basins. Experiments demonstrate that the final forecasts from the cascaded models show accurate predictions up to a 36-hour rollout, with SSIM and PSNR values exceeding 0.5 and 20 dB, respectively, for all three tasks. This work also highlights the promising efficiency of AI methods such as diffusion models for high-performance needs, such as cyclone forecasting, while remaining computationally affordable, making them ideal for highly vulnerable regions with critical forecasting needs and financial limitations. Code accessible at https://github.com/nathzi1505/forecast-diffmodels.
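
To give a feel for the 20 dB PSNR figure quoted above, here is a small sketch of the standard PSNR formula applied to toy grids. It is not taken from the paper's repository; the function name `psnr`, the nested-list image format, and the assumption that pixel values lie in `[0, max_value]` are illustrative choices.

```python
import math

def psnr(reference, forecast, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D grids."""
    flat_ref = [v for row in reference for v in row]
    flat_fc = [v for row in forecast for v in row]
    if len(flat_ref) != len(flat_fc):
        raise ValueError("grids must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_fc)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# A forecast that is off by 0.1 everywhere on a [0, 1] intensity scale.
reference = [[0.5] * 4 for _ in range(4)]
forecast = [[0.6] * 4 for _ in range(4)]
print(round(psnr(reference, forecast), 3))
```

With a uniform error of 0.1 the mean squared error is 0.01, so the PSNR is 10 log10(1 / 0.01) = 20 dB; a forecast above 20 dB therefore has a root-mean-square error below 10% of the intensity range.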

翻訳日:2023-10-31 20:02:32 公開日:2023-10-30

# エラーフロア性能を改善するためのLDPC符号のブースティング学習

Boosting Learning for LDPC Codes to Improve the Error-Floor Performance ( http://arxiv.org/abs/2310.07194v2 )

ライセンス: Link先を確認

Hee-Youl Kwak, Dae-Young Yun, Yongjune Kim, Sang-Hyo Kim, Jong-Seon No

(参考訳) 低密度パリティチェック(LDPC)符号は、強力な誤り訂正能力と単純な復号処理により、通信システムにおいて商業化に成功している。しかし、誤り率がある水準で急速に低下しなくなるLDPC符号のエラーフロア現象は、極めて低い誤り率の達成や、超高信頼性を必要とするシナリオへのLDPC符号の導入に課題をもたらす。本研究では,エラーフロア効果を除去するためのニューラル・ミンサム(NMS)デコーダの訓練法を提案する。第一に,アンサンブルネットワークのブースティング学習技術を活用することで,デコードネットワークを2つのニューラルデコーダに分割し,第1デコーダが訂正に失敗した未訂正語に特化するようポストデコーダを訓練する。第二に,訓練における勾配消失問題に対処するため,前のブロックを再訓練しながら一ブロック分の重みを局所的に訓練するブロック単位の訓練スケジュールを導入する。最後に,不満足なチェックノードに異なる重みを割り当てることで,最小限の重み数でエラーフロアを効果的に低減できることを示す。これらの訓練手法を標準LDPC符号に適用することにより、他の復号法と比較して最高のエラーフロア性能が得られる。提案したNMSデコーダは、追加モジュールを使わずに新しい訓練手法のみによって最適化されており、追加のハードウェアコストを発生させることなく既存のLDPCデコーダに統合できる。ソースコードはhttps://github.com/ghy1228/LDPC_Error_Floor で入手できる。

Low-density parity-check (LDPC) codes have been successfully commercialized in communication systems due to their strong error correction capabilities and simple decoding process. However, the error-floor phenomenon of LDPC codes, in which the error rate stops decreasing rapidly at a certain level, presents challenges for achieving extremely low error rates and deploying LDPC codes in scenarios demanding ultra-high reliability. In this work, we propose training methods for neural min-sum (NMS) decoders to eliminate the error-floor effect. First, by leveraging the boosting learning technique of ensemble networks, we divide the decoding network into two neural decoders and train the post decoder to be specialized for uncorrected words that the first decoder fails to correct. Secondly, to address the vanishing gradient issue in training, we introduce a block-wise training schedule that locally trains a block of weights while retraining the preceding block. Lastly, we show that assigning different weights to unsatisfied check nodes effectively lowers the error-floor with a minimal number of weights. By applying these training methods to standard LDPC codes, we achieve the best error-floor performance compared to other decoding methods. The proposed NMS decoder, optimized solely through novel training methods without additional modules, can be integrated into existing LDPC decoders without incurring extra hardware costs. The source code is available at https://github.com/ghy1228/LDPC_Error_Floor .
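
The check-node step of a weighted min-sum decoder, with a separate weight for unsatisfied checks, might look like the sketch below. This is an illustration under stated assumptions and not the authors' decoder: messages are log-likelihood ratios where a negative value means a hard decision of 1, a check is judged satisfied from the signs of its incoming messages (a simplification), and the names `check_is_satisfied`, `weighted_min_sum_update`, `w_satisfied`, and `w_unsatisfied` are hypothetical. In a neural min-sum decoder the weights would be learned rather than fixed.

```python
def check_is_satisfied(incoming):
    """Parity check holds when an even number of incoming hard decisions are 1."""
    negatives = sum(1 for m in incoming if m < 0)
    return negatives % 2 == 0

def weighted_min_sum_update(incoming, w_satisfied=1.0, w_unsatisfied=1.0):
    """Check-to-variable messages for one check node.

    For each edge, the outgoing message is
    weight * (product of signs of the other messages) * (min magnitude of the others).
    """
    if len(incoming) < 2:
        raise ValueError("a check node needs at least two connected variables")
    weight = w_satisfied if check_is_satisfied(incoming) else w_unsatisfied
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = -1.0 if sum(1 for m in others if m < 0) % 2 else 1.0
        magnitude = min(abs(m) for m in others)
        outgoing.append(weight * sign * magnitude)
    return outgoing

# One negative message: the check is unsatisfied, so the 0.8 weight applies.
print(weighted_min_sum_update([2.0, -0.5, 1.5], w_satisfied=1.0, w_unsatisfied=0.8))
# All positive: the check is satisfied and the messages pass with weight 1.0.
print(weighted_min_sum_update([2.0, 0.5, 1.5], w_satisfied=1.0, w_unsatisfied=0.8))
```

Using only two weight values per iteration, rather than one per edge, is one way to read the abstract's claim that the error floor can be lowered "with a minimal number of weights".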

翻訳日:2023-10-31 19:49:30 公開日:2023-10-30

# GestureGPT:大規模言語モデルエージェントによるゼロショット対話型ジェスチャー理解とグラウンド化

GestureGPT: Zero-shot Interactive Gesture Understanding and Grounding with Large Language Model Agents ( http://arxiv.org/abs/2310.12821v3 )

ライセンス: Link先を確認

Xin Zeng, Xiaoyu Wang, Tengxiang Zhang, Chun Yu, Shengdong Zhao, Yiqiang Chen

(参考訳) 現在のジェスチャー認識システムは、主に事前に定義されたセット内のジェスチャーの識別に重点を置いており、これらのジェスチャーを対話的なGUI要素やシステム機能(例えば 'thumb-up' ジェスチャーを 'like' ボタンにリンクするなど)に接続する部分にギャップを残している。我々は,大規模言語モデル(LLM)を活用したゼロショットのジェスチャー理解・グラウンディングフレームワークであるGestureGPTを紹介する。ジェスチャー記述はジェスチャービデオの手のランドマーク座標に基づいて作成され,デュアルエージェント対話システムへ入力される。ジェスチャーエージェントは、これらの記述を解読し、コンテキストエージェントが整理し提供するインタラクションコンテキスト(インターフェイス、履歴、視線データなど)について問い合わせる。反復的なやり取りの後、ジェスチャーエージェントはユーザの意図を識別し、対話的な機能にグラウンディングする。ジェスチャー記述モジュールを公開の一人称視点および三人称視点のジェスチャーデータセットで検証し、システム全体をビデオストリーミングとスマートホームIoT制御という2つの現実の設定でテストした。ゼロショットのTop-5グラウンディング精度は最高で、ビデオストリーミングで80.11%、スマートホームタスクで90.78%であり、新しいジェスチャー理解パラダイムの可能性を示している。

Current gesture recognition systems primarily focus on identifying gestures within a predefined set, leaving a gap in connecting these gestures to interactive GUI elements or system functions (e.g., linking a 'thumb-up' gesture to a 'like' button). We introduce GestureGPT, a novel zero-shot gesture understanding and grounding framework leveraging large language models (LLMs). Gesture descriptions are formulated based on hand landmark coordinates from gesture videos and fed into our dual-agent dialogue system. A gesture agent deciphers these descriptions and queries about the interaction context (e.g., interface, history, gaze data), which a context agent organizes and provides. Following iterative exchanges, the gesture agent discerns user intent, grounding it to an interactive function. We validated the gesture description module using public first-view and third-view gesture datasets and tested the whole system in two real-world settings: video streaming and smart home IoT control. The highest zero-shot Top-5 grounding accuracies are 80.11% for video streaming and 90.78% for smart home tasks, showing potential of the new gesture understanding paradigm.

翻訳日:2023-10-31 19:37:12 公開日:2023-10-30

# Denevil: インストラクション学習による大規模言語モデルの倫理的価値の解読とナビゲート

Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning ( http://arxiv.org/abs/2310.11053v2 )

ライセンス: Link先を確認

Shitong Duan, Xiaoyuan Yi, Peng Zhang, Tun Lu, Xing Xie, Ning Gu

(参考訳) 大規模言語モデル(LLM)は前例のない突破口をたどったが、日常生活への統合が進むと、非倫理的コンテンツによって社会的リスクが生じる可能性がある。偏見のような特定の問題に関する広範な研究にもかかわらず、LLMの本質的な価値は道徳哲学の観点からほとんど解明されていない。この研究は道徳的基礎理論を生かした倫理的価値観へと発展する。信頼性の低い従来の差別的評価を超えて、LLMの価値の脆弱性を動的に活用し、倫理の侵害を発生的方法で誘発する新しいプロンプト生成アルゴリズムであるDeNEVILを提案する。そこで我々は,500以上の値の原理をカバーする2,397のプロンプトからなる高品質なデータセットであるMoralPromptを構築し,本質的な値をLLMのスペクトルにわたってベンチマークする。ほとんどのモデルは本質的に不一致しており、さらなる倫理的価値の調整を必要としていることに気付きました。そこで本研究では,LLM出力の値コンプライアンスを学習によって大幅に向上し,適切な値命令を生成するためのコンテキスト内アライメント手法であるVILMOを開発した。我々の手法はブラックボックスやオープンソースモデルに適しており、LLMの倫理的価値を研究する上で有望な第一歩となる。

Large Language Models (LLMs) have made unprecedented breakthroughs, yet their increasing integration into everyday life might raise societal risks due to generated unethical content. Despite extensive study on specific issues like bias, the intrinsic values of LLMs remain largely unexplored from a moral philosophy perspective. This work delves into ethical values utilizing Moral Foundation Theory. Moving beyond conventional discriminative evaluations with poor reliability, we propose DeNEVIL, a novel prompt generation algorithm tailored to dynamically exploit LLMs' value vulnerabilities and elicit the violation of ethics in a generative manner, revealing their underlying value inclinations. On such a basis, we construct MoralPrompt, a high-quality dataset comprising 2,397 prompts covering 500+ value principles, and then benchmark the intrinsic values across a spectrum of LLMs. We discovered that most models are essentially misaligned, necessitating further ethical value alignment. In response, we develop VILMO, an in-context alignment method that substantially enhances the value compliance of LLM outputs by learning to generate appropriate value instructions, outperforming existing competitors. Our methods are suitable for black-box and open-source models, offering a promising initial step in studying the ethical values of LLMs.

翻訳日:2023-10-31 19:34:47 公開日:2023-10-30

# 言語モデルにおける真さをモデル化するペルソナ

Personas as a Way to Model Truthfulness in Language Models ( http://arxiv.org/abs/2310.18168v2 )

ライセンス: Link先を確認

Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, He He

(参考訳) 大規模な言語モデルは、インターネットから大量のテキストで訓練されており、これは事実と誤解を招く世界に関する情報の両方を含んでいる。言語モデルは、この矛盾するデータで真理と偽りを区別できるだろうか? llmがコーパスを生産する異なるエージェントをモデル化できるという見解を拡張して、真理のあるパーソナリティをモデル化することで真理のあるテキストをクラスタ化できると仮定した。例えば、wikipediaやscienceのような信頼できる情報源は通常形式的な文体を使い、一貫した主張をする。このペルソナをモデル化することにより、LLMは、各エージェントがトレーニングテキストを生成する特定のコンテキストを超えて、真実性を一般化することができる。例えば、このモデルはエージェント"wikipedia"が、ペルソナを共有するため、"科学"によってのみ生成されたトピックに対して、真に振る舞うと推測できる。まず2つの観察によってペルソナ仮説の証拠を示す:(1)生成前にモデルの答えが真理であるかどうかを検証できる、(2)一連の事実に基づいてモデルを微調整することで、未知の話題に対する真理性が向上する。次に、算術を合成環境として用いて、言語モデルが真と偽の言明を分離し、エージェント間で真さを一般化できることを示し、訓練データ内のエージェントが真偽のペルソナを作成することができる真偽生成プロセスを共有する場合に限る。全体としては、モデルがデータの階層構造を利用して真理のような抽象概念を学習できることが示唆されている。

Large Language Models are trained on vast amounts of text from the internet, which contains both factual and misleading information about the world. Can language models discern truth from falsehood in this contradicting data? Expanding on the view that LLMs can model different agents producing the corpora, we hypothesize that they can cluster truthful text by modeling a truthful persona: a group of agents that are likely to produce truthful text and share similar features. For example, trustworthy sources like Wikipedia and Science usually use formal writing styles and make consistent claims. By modeling this persona, LLMs can generalize truthfulness beyond the specific contexts in which each agent generated the training text. For example, the model can infer that the agent "Wikipedia" will behave truthfully on topics that were only generated by "Science" because they share a persona. We first show evidence for the persona hypothesis via two observations: (1) we can probe whether a model's answer will be truthful before it is generated; (2) finetuning a model on a set of facts improves its truthfulness on unseen topics. Next, using arithmetics as a synthetic environment, we show that language models can separate true and false statements, and generalize truthfulness across agents; but only if agents in the training data share a truthful generative process that enables the creation of a truthful persona. Overall, our findings suggest that models can exploit hierarchical structures in the data to learn abstract concepts like truthfulness.

翻訳日:2023-10-31 19:13:51 公開日:2023-10-30

# 一次元閉じ込めフェルミオンにおける合成次元誘起擬ヤーン・テラー効果

Synthetic dimension-induced pseudo Jahn-Teller effect in one-dimensional confined fermions ( http://arxiv.org/abs/2310.17995v2 )

ライセンス: Link先を確認

André Becker, Georgios M. Koutentakis, Peter Schmelcher

(参考訳) 超低温フェルミガス中における量子不純物の基底状態を記述するために, 浴場と不純物種の間にかなりの質量差があるにもかかわらず, 断熱的ボルン・オッペンハイマー近似の失敗を実証した。反発の増大は、速い浴槽と遅い不純物自由度との間の非断熱カップリングの出現を招き、擬ヤーン・テラー効果に従って後者のパリティ対称性を減少させる。このメカニズムの存在は、不純物の位置と合成次元として作用する相互作用強度の逆を含む円錐交差と関連している。 ab initio完全相関シミュレーションと実効モデルとの比較を含む詳細な基底状態解析により,これらの効果の存在を解明する。本研究は複雑な分子現象の強力なエミュレータとして超低温原子アンサンブルを提案する。

We demonstrate the failure of the adiabatic Born-Oppenheimer approximation to describe the ground state of a quantum impurity within an ultracold Fermi gas despite substantial mass differences between the bath and impurity species. Increasing repulsion leads to the appearance of non-adiabatic couplings between the fast bath and slow impurity degrees of freedom which reduce the parity symmetry of the latter according to the pseudo Jahn-Teller effect. The presence of this mechanism is associated with a conical intersection involving the impurity position and the inverse of the interaction strength, which acts as a synthetic dimension. We elucidate the presence of these effects via a detailed ground state analysis involving the comparison of ab initio fully-correlated simulations with effective models. Our study suggests ultracold atomic ensembles as potent emulators of complex molecular phenomena.

翻訳日:2023-10-31 19:13:27 公開日:2023-10-30

# 一度の訓練でファミリーを得る:オフラインからオンラインへの強化学習のための状態適応バランス

Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning ( http://arxiv.org/abs/2310.17966v2 )

ライセンス: Link先を確認

Shenzhi Wang, Qisen Yang, Jiawei Gao, Matthieu Gaetan Lin, Hao Chen, Liwei Wu, Ning Jia, Shiji Song, Gao Huang

(参考訳) オフライン-オンライン強化学習(RL)は、事前収集されたデータセットでの事前訓練と、オンライン環境での微調整を組み合わせた訓練パラダイムである。しかし、オンライン微調整の導入は、よく知られた分布シフト問題を悪化させる可能性がある。既存の解決策は、オフラインとオンライン両方の学習において、方策改善の目的関数に方策制約を課すことでこの問題に対処する。これらは通常、多様なデータ全体に対して、方策改善と制約の間の単一のバランスを用いる。この画一的な方法は、状態ごとにデータ品質が大きく異なるため、収集された各サンプルを最適に活用できない可能性がある。そこで、既存のアルゴリズムが状態適応的な改善-制約バランスを決定できるようにする、シンプルで効果的なフレームワークFamO2O(Family Offline-to-Online RL)を紹介する。FamO2Oは、改善/制約の強度が異なる方策のファミリーを訓練するための普遍モデルと、各状態に適した方策を選択するためのバランスモデルを利用する。理論的には、より高い方策性能の上界を達成するために状態適応バランスが必要であることを証明する。実証的には、広範な実験により、FamO2Oは様々な既存手法に対して統計的に有意な改善をもたらし、D4RLベンチマークで最先端の性能を達成することを示す。コードはhttps://github.com/LeapLabTHU/FamO2O で入手できる。

Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the incorporation of online fine-tuning can intensify the well-known distributional shift problem. Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically advocate a single balance between policy improvement and constraints across diverse data collections. This one-size-fits-all manner may not optimally leverage each collected sample due to the significant variation in data quality across different states. To this end, we introduce Family Offline-to-Online RL (FamO2O), a simple yet effective framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances. FamO2O utilizes a universal model to train a family of policies with different improvement/constraint intensities, and a balance model to select a suitable policy for each state. Theoretically, we prove that state-adaptive balances are necessary for achieving a higher policy performance upper bound. Empirically, extensive experiments show that FamO2O offers a statistically significant improvement over various existing methods, achieving state-of-the-art performance on the D4RL benchmark. Codes are available at https://github.com/LeapLabTHU/FamO2O.

翻訳日:2023-10-31 19:13:15 公開日:2023-10-30

# CNR演算に基づく量子近似最適化アルゴリズム

A Quantum Approximate Optimization Algorithm Based on CNR Operations ( http://arxiv.org/abs/2310.17927v2 )

ライセンス: Link先を確認

Da You Lv and An Min Wang

(参考訳) 本稿では、「比較と置換」(comparison and replacement, CNR)演算を導入し、レベル数$p$とCNRのアンシラ量子ビット数$t$に依存する、組合せ最適化問題のための純粋な量子近似アルゴリズムを構築する。CNRは、目的関数の値が高い文字列の確率をアルゴリズムのレベルごとに引き上げ、目的関数を近似的に最大化する文字列が確率を支配することを保証する。ビット数$n$が変わらない場合、アルゴリズムの性能は$p$の増加に伴って直接向上する。また、$t$はCNRの精度と信頼性を決定する。アルゴリズムの実際の性能は、$t$が増加するにつれて理論的結果に近づく。固定された$p$と$t$に対して、このアルゴリズムは、非縮退または縮退のインスタンスに対して、それぞれ測定の確率分布が同一である状態、または対応するフィット曲線を出力する。これは、一般の組合せ最適化問題に対してアルゴリズムが常に機能することを意味する。

This paper introduces the "comparison and replacement" (CNR) operation and constructs a pure quantum approximate algorithm for combinatorial optimization problems that depends on the number of levels $p$ and the number of ancilla qubits $t$ used by CNR. CNR raises the probability of strings with high objective-function values level by level in the algorithm, which ensures that the strings approximately maximizing the objective function dominate the probability. When the number of bits $n$ remains unchanged, the performance of the algorithm improves directly as $p$ increases, while $t$ determines the accuracy and reliability of CNR. The practical performance of the algorithm tends toward the theoretical results as $t$ increases. For fixed $p$ and $t$, the algorithm outputs a state with an identical probability distribution of measurement or the corresponding fit curve for nondegenerate or degenerate instances, respectively, which means that, for universal combinatorial optimization problems, the algorithm always works.

翻訳日:2023-10-31 19:12:50 公開日:2023-10-30

# ControlLLM: グラフ検索によるツールによる言語モデルの拡張

ControlLLM: Augment Language Models with Tools by Searching on Graphs ( http://arxiv.org/abs/2310.17796v2 )

ライセンス: Link先を確認

Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Zhiheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang

(参考訳) 我々は,大規模言語モデル(LLM)が複雑な実世界のタスクを解くためにマルチモーダルツールを利用できるようにする新しいフレームワークであるControlLLMを提案する。LLMの顕著な性能にもかかわらず、曖昧なユーザプロンプト、不正確なツールの選択とパラメータ化、非効率なツールスケジューリングのために、ツール呼び出しには依然として苦戦している。これらの課題を克服するため、本フレームワークは3つの主要コンポーネントで構成される。(1) 複雑なタスクを、入力と出力が明確に定義されたサブタスクに分解するタスク分解器、(2) 異なるツール間のパラメータと依存関係を規定する事前構築済みのツールグラフ上で最適な解経路を探索するThoughts-on-Graph(ToG)パラダイム、(3) 解経路を解釈し、異なる計算デバイス上でツールを効率的に実行する、豊富なツールボックスを備えた実行エンジンである。我々は,画像,音声,映像処理を含む多種多様なタスクで本フレームワークを評価し,既存の手法と比較して優れた精度,効率,汎用性を示す。コードはhttps://github.com/OpenGVLab/ControlLLM にある。

We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks. Despite the remarkable performance of LLMs, they still struggle with tool invocation due to ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a task decomposer that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a Thoughts-on-Graph (ToG) paradigm that searches the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an execution engine with a rich toolbox that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods. The code is at https://github.com/OpenGVLab/ControlLLM .
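
Searching for a solution path on a tool graph can be pictured with a toy example in which tools are edges between resource types. This sketch is only an illustration of the general idea: the tool names and type signatures in `TOOLS` are invented, and a plain breadth-first search is swapped in for the paper's Thoughts-on-Graph search, whose actual strategy and scoring are not described in the abstract.

```python
from collections import deque

# Hypothetical toolbox: tool name -> (input resource type, output resource type).
TOOLS = {
    "speech_to_text": ("audio", "text"),
    "text_to_image": ("text", "image"),
    "image_captioning": ("image", "text"),
    "image_to_video": ("image", "video"),
}

def find_tool_path(source_type, target_type, tools=TOOLS):
    """Return the shortest list of tool names turning source_type into target_type.

    Returns an empty list if no tool is needed and None if no path exists.
    """
    queue = deque([(source_type, [])])
    visited = {source_type}
    while queue:
        current, path = queue.popleft()
        if current == target_type:
            return path
        for name, (in_type, out_type) in tools.items():
            if in_type == current and out_type not in visited:
                visited.add(out_type)
                queue.append((out_type, path + [name]))
    return None

print(find_tool_path("audio", "video"))
print(find_tool_path("video", "audio"))
```

Because the graph records which output types feed which inputs, an ill-typed chain such as passing audio directly to `text_to_image` can never be proposed, which is one plausible reason for searching over a pre-built graph instead of letting the model name tools freely.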

翻訳日:2023-10-31 19:12:34 公開日:2023-10-30

# ビデオにおける複合活動検出のためのハイブリッドグラフネットワーク

A Hybrid Graph Network for Complex Activity Detection in Video ( http://arxiv.org/abs/2310.17493v2 )

ライセンス: Link先を確認

Salman Khan, Izzeddin Teeti, Andrew Bradley, Mohamed Elhoseiny, Fabio Cuzzolin

(参考訳) ビデオの解釈と理解は、自動運転やスポーツ分析など、さまざまな分野におけるコンピュータビジョンの難しい課題である。ビデオクリップ内で行われるアクションを解釈する既存のアプローチは、通常は短期的なアクションを識別する時間的行動局所化(TAL)に基づいている。新たな分野である複雑な活動検出(CompAD)は、ビデオ内で発生する複雑な活動の内部構造をモデル化することでより深い理解を得て、この分析を長期的な活動に拡張する。本研究では,局所的(短期)な動的シーンを符号化するグラフに適用したアテンションと,長時間にわたる活動全体をモデル化する時間グラフを組み合わせたハイブリッドグラフニューラルネットワークを用いて,CompAD問題に取り組む。私たちのアプローチは以下の通りである。i) まず,各ビデオスニペットに対して,個々の物体を検出して追跡し,すべてのエージェントチューブとシーン全体から3D特徴を抽出することにより,(局所)シーン内のアクティブ要素(「エージェント」)の時空間的「チューブ」を生成する新しい特徴抽出手法を提案する。ii) 次に、各ノード(エージェントチューブまたはシーンを表す)が他のすべてのノードに接続されたローカルシーングラフを構築する。このグラフにアテンションを適用し、局所動的シーンの全体的な表現を得る。iii) 最後に、すべてのローカルシーングラフ表現を時間グラフを介して相互接続し、複雑な活動のクラスとその開始・終了時刻を推定する。提案したフレームワークは、ActivityNet-1.3、Thumos-14、ROADの3つのデータセットすべてで、これまでの最先端手法をすべて上回る。

Interpretation and understanding of video presents a challenging computer vision task in numerous fields - e.g. autonomous driving and sports analytics. Existing approaches to interpreting the actions taking place within a video clip are based upon Temporal Action Localisation (TAL), which typically identifies short-term actions. The emerging field of Complex Activity Detection (CompAD) extends this analysis to long-term activities, with a deeper understanding obtained by modelling the internal structure of a complex activity taking place within the video. We address the CompAD problem using a hybrid graph neural network which combines attention applied to a graph encoding the local (short-term) dynamic scene with a temporal graph modelling the overall long-duration activity. Our approach is as follows: i) Firstly, we propose a novel feature extraction technique which, for each video snippet, generates spatiotemporal 'tubes' for the active elements ('agents') in the (local) scene by detecting individual objects, tracking them and then extracting 3D features from all the agent tubes as well as the overall scene. ii) Next, we construct a local scene graph where each node (representing either an agent tube or the scene) is connected to all other nodes. Attention is then applied to this graph to obtain an overall representation of the local dynamic scene. iii) Finally, all local scene graph representations are interconnected via a temporal graph, to estimate the complex activity class together with its start and end time. The proposed framework outperforms all previous state-of-the-art methods on all three datasets including ActivityNet-1.3, Thumos-14, and ROAD.

翻訳日:2023-10-31 19:12:15 公開日:2023-10-30

# ディープリッツの適応的重要サンプリング

Adaptive importance sampling for Deep Ritz ( http://arxiv.org/abs/2310.17185v2 )

ライセンス: Link先を確認

Xiaoliang Wan and Tao Zhou and Yuancheng Zhou

(参考訳) 本稿では,偏微分方程式(PDE)の解法を目的としたディープリッツ法の適応サンプリング手法を提案する。 2つの深いニューラルネットワークが使用される。 1つのネットワークはPDEの解を近似するために使用され、もう1つはトレーニングセットを洗練させるために新しいコロケーションポイントを生成するために使用される深層生成モデルである。適応サンプリング手順は2つの主要なステップから構成される。最初のステップは、トレーニングセットのコロケーションポイントによって識別される関連する変分損失を最小限にして、ディープリッツ法を用いてPDEを解くことである。 2番目のステップは、次の計算で使われる新しいトレーニングセットを生成し、現在の近似解の精度をさらに向上させる。変分損失の積分を非正規化確率密度関数(PDF)として扱い、境界KRnetと呼ばれる深い生成モデルを用いて近似する。新しいサンプルとその関連するpdf値は、bounded krnetから得られる。これらの新しいサンプルとその関連PDF値により、重要サンプリングによりより正確に変分損失を近似することができる。従来のDeep Ritz法と比較して,提案手法は精度を向上し,特に低正規性と高次元性に特徴付けられる問題に対して有効である。本稿では,本手法の有効性を数値実験により実証する。

We introduce an adaptive sampling method for the Deep Ritz method aimed at solving partial differential equations (PDEs). Two deep neural networks are used. One network is employed to approximate the solution of PDEs, while the other one is a deep generative model used to generate new collocation points to refine the training set. The adaptive sampling procedure consists of two main steps. The first step is solving the PDEs using the Deep Ritz method by minimizing an associated variational loss discretized by the collocation points in the training set. The second step involves generating a new training set, which is then used in subsequent computations to further improve the accuracy of the current approximate solution. We treat the integrand in the variational loss as an unnormalized probability density function (PDF) and approximate it using a deep generative model called bounded KRnet. The new samples and their associated PDF values are obtained from the bounded KRnet. With these new samples and their associated PDF values, the variational loss can be approximated more accurately by importance sampling. Compared to the original Deep Ritz method, the proposed adaptive method improves accuracy, especially for problems characterized by low regularity and high dimensionality. We demonstrate the effectiveness of our new method through a series of numerical experiments.
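
The role of importance sampling in approximating an integral such as a variational loss can be shown in one dimension. The sketch below is a toy under explicit assumptions, not the paper's method: the integrand `3x^2` on `[0, 1]` and the fixed proposal density `q(x) = 2x` are chosen by hand, whereas the paper learns the proposal with a deep generative model (bounded KRnet) fitted to the unnormalized integrand.

```python
import math
import random

def integrand(x):
    """Toy integrand on [0, 1]; its exact integral is 1."""
    return 3.0 * x * x

def uniform_estimate(n, rng):
    """Plain Monte Carlo with uniformly distributed collocation points."""
    return sum(integrand(rng.random()) for _ in range(n)) / n

def importance_estimate(n, rng):
    """Monte Carlo with points drawn from q(x) = 2x, weighted by 1 / q(x)."""
    total = 0.0
    for _ in range(n):
        u = 1.0 - rng.random()   # lies in (0, 1], so x is never exactly 0
        x = math.sqrt(u)         # inverse-CDF sample, since the CDF of q is x^2
        total += integrand(x) / (2.0 * x)
    return total / n

rng = random.Random(0)
plain = uniform_estimate(20000, rng)
weighted = importance_estimate(20000, rng)
print(f"uniform: {plain:.4f}, importance-weighted: {weighted:.4f}, exact: 1.0")
```

In this toy case the per-sample variance is 9/5 - 1 = 0.8 under uniform sampling and 9/8 - 1 = 0.125 under the proposal, because `q` places more points where the integrand is large. The closer the proposal is to the normalized integrand, the smaller the variance, which is the motivation for learning the sampler adaptively.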

翻訳日:2023-10-31 19:11:48 公開日:2023-10-30

# 小さな不均衡テキストデータにおける感情検出のためのデータ拡張

Data Augmentation for Emotion Detection in Small Imbalanced Text Data ( http://arxiv.org/abs/2310.17015v3 )

ライセンス: Link先を確認

Anna Koufakou, Diego Grisales, Ragy Costa de jesus, Oscar Fox

(参考訳) テキストにおける感情認識は、喜びや怒りなどの感情を識別するタスクであり、多くのアプリケーションでNLPにおいて難しい問題である。課題のひとつは、感情を注釈付けしたデータセットが不足していることだ。既存のデータセットは小さく、異なる感情分類に従い、感情分布に不均衡を示す。本研究では,RoBERTaのような現在の最先端モデルが低性能である小さな不均衡データセットに適用した場合に,データ拡張技術が与える影響について検討した。具体的には、異なるソースから派生したサイズ、感情カテゴリー、分布の異なる3つのデータセットに対して、4つのデータ拡張方法(簡易データ拡張EDA、静的および文脈的埋め込みベース、ProtAugment)を利用した。実験結果から,分類器モデルの訓練に拡張データを用いることで,大幅な改善が得られた。最後に2つのケーススタディを行いました a) 一般的なチャット-GPT APIを使って、異なるプロンプトを使ってテキストを言い換え、 b) トレーニングセットを補強するために外部データを使用する。結果はこれらの手法の有望な可能性を示している。

Emotion recognition in text, the task of identifying emotions such as joy or anger, is a challenging problem in NLP with many applications. One of the challenges is the shortage of available datasets that have been annotated with emotions. Certain existing datasets are small, follow different emotion taxonomies and display imbalance in their emotion distribution. In this work, we studied the impact of data augmentation techniques precisely when applied to small imbalanced datasets, for which current state-of-the-art models (such as RoBERTa) under-perform. Specifically, we utilized four data augmentation methods (Easy Data Augmentation EDA, static and contextual Embedding-based, and ProtAugment) on three datasets that come from different sources and vary in size, emotion categories and distributions. Our experimental results show that using the augmented data when training the classifier model leads to significant improvements. Finally, we conducted two case studies: a) directly using the popular chat-GPT API to paraphrase text using different prompts, and b) using external data to augment the training set. Results show the promising potential of these methods.
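
Two of the four Easy Data Augmentation (EDA) operations, random swap and random deletion, need no external resources and can be sketched directly. This is a minimal illustration rather than the setup used in the paper: the function names, the whitespace tokenization, and the parameter values are assumptions, and the other two EDA operations (synonym replacement and random insertion) are omitted because they require a synonym source.

```python
import random

def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairs of positions exchanged."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

rng = random.Random(0)
sentence = "i am so happy about this news".split()

# Each augmented copy would keep the original emotion label (here, "joy").
augmented = [
    random_swap(sentence, n_swaps=2, rng=rng),
    random_deletion(sentence, p=0.2, rng=rng),
]
for tokens in augmented:
    print(" ".join(tokens))
```

For an imbalanced dataset, one option is to generate more augmented copies for rare emotion classes than for frequent ones, so that the training distribution becomes flatter without collecting new annotations.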

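Translated: 2023-10-31 19:11:28 Published: 2023-10-30

# Image Clustering Conditioned on Text Criteria

Image Clustering Conditioned on Text Criteria ( http://arxiv.org/abs/2310.18297v2 )

License: see the linked page

Sehyun Kwon, Jaeseung Park, Minkyu Kim, Jaewoong Cho, Ernest K. Ryu, Kangwook Lee

(Reference translation) Classical clustering methods give users no direct control over the clustering results, and the results may not match the criterion the user has in mind. This work proposes a new method that performs image clustering based on user-specified text criteria by leveraging modern vision-language models and large language models. The method is called Image Clustering Conditioned on Text Criteria (IC$|$TC) and represents a different paradigm of image clustering. IC$|$TC requires only minimal and practical human intervention and in return gives the user considerable control over the clustering results. Experiments show that IC$|$TC effectively clusters images by various criteria, such as human action, physical location, or mood, and significantly outperforms the baselines.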

Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodology for performing image clustering based on user-specified text criteria by leveraging modern vision-language models and large language models. We call our method Image Clustering Conditioned on Text Criteria (IC$|$TC), and it represents a different paradigm of image clustering. IC$|$TC requires a minimal and practical degree of human intervention and grants the user significant control over the clustering results in return. Our experiments show that IC$|$TC can effectively cluster images with various criteria, such as human action, physical location, or the person's mood, while significantly outperforming baselines.

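Translated: 2023-10-31 18:57:33 Published: 2023-10-30

# Sustainable Concrete via Bayesian Optimization

Sustainable Concrete via Bayesian Optimization ( http://arxiv.org/abs/2310.18288v2 )

License: see the linked page

Sebastian Ament, Andrew Witte, Nishant Garg, Julius Kusuma

(Reference translation) Eight percent of global carbon dioxide emissions can be attributed to the production of cement, the main component of concrete, which is also the dominant source of CO2 emissions in data center construction. Discovering lower-carbon concrete formulae is therefore highly important for sustainability. However, experimenting with new concrete formulae is time-consuming and labor-intensive, because one usually has to wait to record the concrete's 28-day compressive strength. This creates an opportunity for experimental design methods such as Bayesian Optimization (BO) to accelerate the search for strong and sustainable concrete formulae. Here, we 1) propose modeling steps that allow a Gaussian process model to predict concrete strength accurately from relatively few measurements, 2) formulate the search for sustainable concrete as a multi-objective optimization problem, and 3) use the proposed model to carry out multi-objective BO with real-world strength measurements of the algorithmically proposed mixes. The experimental results show improved trade-offs between global warming potential (GWP) and the associated compressive strength compared with mixes based on current industry practice. The methods are open-sourced at github.com/facebookresearch/SustainableConcrete.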

Eight percent of global carbon dioxide emissions can be attributed to the production of cement, the main component of concrete, which is also the dominant source of CO2 emissions in the construction of data centers. The discovery of lower-carbon concrete formulae is therefore of high significance for sustainability. However, experimenting with new concrete formulae is time consuming and labor intensive, as one usually has to wait to record the concrete's 28-day compressive strength, a quantity whose measurement can by its definition not be accelerated. This provides an opportunity for experimental design methodology like Bayesian Optimization (BO) to accelerate the search for strong and sustainable concrete formulae. Herein, we 1) propose modeling steps that make concrete strength amenable to be predicted accurately by a Gaussian process model with relatively few measurements, 2) formulate the search for sustainable concrete as a multi-objective optimization problem, and 3) leverage the proposed model to carry out multi-objective BO with real-world strength measurements of the algorithmically proposed mixes. Our experimental results show improved trade-offs between the mixtures' global warming potential (GWP) and their associated compressive strengths, compared to mixes based on current industry practices. Our methods are open-sourced at github.com/facebookresearch/SustainableConcrete.
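The multi-objective formulation trades off a quantity to minimize (GWP) against one to maximize (compressive strength). A minimal sketch of that trade-off, assuming invented mix names and numbers that do not come from the paper, is to keep only the mixes that no other mix beats on both objectives (the Pareto front). This is plain filtering, not the Gaussian-process-based BO the abstract describes.

```python
def pareto_front(mixes):
    """Return the mixes that are not dominated by any other mix.

    A mix b dominates a mix a if b has GWP no higher and strength no lower
    than a, and is strictly better in at least one of the two.
    """
    front = []
    for a in mixes:
        dominated = any(
            b["gwp"] <= a["gwp"] and b["strength"] >= a["strength"]
            and (b["gwp"] < a["gwp"] or b["strength"] > a["strength"])
            for b in mixes
        )
        if not dominated:
            front.append(a)
    return front

# Hypothetical candidate mixes; the values are made up for illustration only.
candidate_mixes = [
    {"name": "A", "gwp": 300.0, "strength": 40.0},
    {"name": "B", "gwp": 250.0, "strength": 38.0},
    {"name": "C", "gwp": 320.0, "strength": 39.0},  # worse than A on both
    {"name": "D", "gwp": 200.0, "strength": 30.0},
    {"name": "E", "gwp": 260.0, "strength": 35.0},  # worse than B on both
]
front_names = [m["name"] for m in pareto_front(candidate_mixes)]
```

An improved trade-off, in the sense used in the abstract, would correspond to new mixes that push this front toward lower GWP at the same strength or higher strength at the same GWP.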

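Translated: 2023-10-31 18:57:09 Published: 2023-10-30

# Generative AI Model for Artistic Style Transfer Using Convolutional Neural Networks

Generative AI Model for Artistic Style Transfer Using Convolutional Neural Networks ( http://arxiv.org/abs/2310.18237v2 )

License: see the linked page

Jonayet Miah, Duc M Cao, Md Abu Sayed, and Md. Sabbirul Haque

(Reference translation) Artistic style transfer, a captivating application of generative artificial intelligence, fuses the content of one image with the artistic style of another to create unique visual compositions. This paper gives a comprehensive overview of a novel style transfer technique using convolutional neural networks (CNNs). By leveraging the deep image representations learned by CNNs, we demonstrate how to separate and manipulate image content and style, enabling the synthesis of high-quality images that combine content and style harmoniously. We describe the methodology, including content and style representations, loss computation, and optimization, and present experimental results that show the effectiveness and versatility of the approach across different styles and content.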

Artistic style transfer, a captivating application of generative artificial intelligence, involves fusing the content of one image with the artistic style of another to create unique visual compositions. This paper presents a comprehensive overview of a novel technique for style transfer using Convolutional Neural Networks (CNNs). By leveraging deep image representations learned by CNNs, we demonstrate how to separate and manipulate image content and style, enabling the synthesis of high-quality images that combine content and style in a harmonious manner. We describe the methodology, including content and style representations, loss computation, and optimization, and showcase experimental results highlighting the effectiveness and versatility of the approach across different styles and content.
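The abstract does not spell out its loss functions. A common formulation in CNN-based style transfer, assumed here rather than taken from the paper, compares feature maps directly for content and compares Gram matrices of feature maps for style. The sketch below uses tiny hand-written feature lists in place of real CNN activations, and all names are illustrative.

```python
def gram_matrix(features):
    """features: list of C channels, each a flat list of N activations.

    Returns the C x C matrix of channel-to-channel inner products,
    normalized by the number of spatial positions N.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]

def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    g, s = gram_matrix(generated), gram_matrix(style)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)

def content_loss(generated, content):
    """Mean squared difference between the feature maps themselves."""
    total = sum((x - y) ** 2
                for fg, fc in zip(generated, content)
                for x, y in zip(fg, fc))
    count = sum(len(fg) for fg in generated)
    return total / count

# Two channels, three spatial positions; the second map is a spatial
# permutation of the first (hypothetical toy activations).
features_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
features_b = [[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]]
```

Because the Gram matrix sums over spatial positions, `style_loss(features_a, features_b)` is zero even though `content_loss` is not; this is one way to see how such losses separate style statistics from spatial content.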

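Translated: 2023-10-31 18:56:36 Published: 2023-10-30

# Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation

Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation ( http://arxiv.org/abs/2310.18235v2 )

License: see the linked page

Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang

(Reference translation) Evaluating text-to-image models is notoriously difficult. A strong recent approach to assessing text-image faithfulness is based on QG/A (question generation and answering), in which pre-trained foundation models automatically generate a set of questions and answers from the prompt, and output images are scored by whether the answers extracted with a visual question answering model agree with the prompt-based answers. This kind of evaluation naturally depends on the quality of the underlying QG and QA models. We identify and address several reliability challenges in existing QG/A work: (a) QG questions should respect the prompt (avoiding hallucinations, duplications, and omissions), and (b) VQA answers should be consistent (not asserting that an image contains no motorcycle while also claiming that the motorcycle is blue). We address these issues with Davidsonian Scene Graph (DSG), an empirically grounded evaluation framework inspired by formal semantics. DSG is an automatic, graph-based QG/A implemented modularly so that it can be adapted to any QG/A module. DSG produces atomic and unique questions organized in dependency graphs, which (i) ensure appropriate semantic coverage and (ii) sidestep inconsistent answers. Through extensive experiments and human evaluation over a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges above. Finally, we present DSG-1k, an open-source evaluation benchmark containing 1,060 prompts. We release the DSG-1k prompts and the corresponding DSG questions.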

Evaluating text-to-image models is notoriously difficult. A strong recent approach for assessing text-image faithfulness is based on QG/A (question generation and answering), which uses pre-trained foundational models to automatically generate a set of questions and answers from the prompt, and output images are scored based on whether these answers extracted with a visual question answering model are consistent with the prompt-based answers. This kind of evaluation is naturally dependent on the quality of the underlying QG and QA models. We identify and address several reliability challenges in existing QG/A work: (a) QG questions should respect the prompt (avoiding hallucinations, duplications, and omissions) and (b) VQA answers should be consistent (not asserting that there is no motorcycle in an image while also claiming the motorcycle is blue). We address these issues with Davidsonian Scene Graph (DSG), an empirically grounded evaluation framework inspired by formal semantics. DSG is an automatic, graph-based QG/A that is modularly implemented to be adaptable to any QG/A module. DSG produces atomic and unique questions organized in dependency graphs, which (i) ensure appropriate semantic coverage and (ii) sidestep inconsistent answers. With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above. Finally, we present DSG-1k, an open-sourced evaluation benchmark that includes 1,060 prompts, covering a wide range of fine-grained semantic categories with a balanced distribution. We release the DSG-1k prompts and the corresponding DSG questions.
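One way to picture how a dependency graph sidesteps inconsistent answers is sketched below. It assumes, as an illustration rather than the paper's exact scoring rule, that a question only counts as satisfied when it and every question it depends on were answered "yes". The question ids and the function name are hypothetical.

```python
def score_with_dependencies(parents, answers):
    """Fraction of questions that are answered yes along with all ancestors.

    parents: dict mapping question id -> list of parent question ids
             (assumed to form an acyclic dependency graph).
    answers: dict mapping question id -> bool, e.g. from a VQA model.
    """
    valid = {}

    def is_valid(qid):
        # Memoized walk up the dependency graph.
        if qid not in valid:
            valid[qid] = answers[qid] and all(is_valid(p) for p in parents[qid])
        return valid[qid]

    return sum(is_valid(q) for q in parents) / len(parents)

# "Is the motorcycle blue?" depends on "Is there a motorcycle?".
question_parents = {
    "has_motorcycle": [],
    "motorcycle_is_blue": ["has_motorcycle"],
    "has_road": [],
}
# An inconsistent VQA output: no motorcycle, yet the motorcycle is blue.
vqa_answers = {
    "has_motorcycle": False,
    "motorcycle_is_blue": True,
    "has_road": True,
}
score = score_with_dependencies(question_parents, vqa_answers)
```

Here the "blue" answer is discarded because its parent question failed, so only `has_road` contributes to the score.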

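Translated: 2023-10-31 18:56:14 Published: 2023-10-30

# EHRTutor: Enhancing Patient Understanding of Discharge Instructions

EHRTutor: Enhancing Patient Understanding of Discharge Instructions ( http://arxiv.org/abs/2310.19212v1 )

License: see the linked page

Zihao Zhang, Zonghai Yao, Huixue Zhou, Feiyun ouyang, Hong Yu

(Reference translation) Large language models have been successful as tutors in education across various fields. Educating patients about their clinical visits plays an important role in their adherence to treatment plans after discharge. This paper presents EHRTutor, an innovative multi-component framework that leverages a large language model (LLM) for patient education through conversational question answering. EHRTutor first formulates questions about the discharge instructions in the electronic health record. It then educates the patient through conversation by administering each question as a test. Finally, it generates a summary at the end of the conversation. Evaluations by LLMs and domain experts show a clear preference for EHRTutor over the baseline. In addition, EHRTutor provides a framework for generating synthetic patient education dialogues that can be used for future in-house system training.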

Large language models have shown success as a tutor in education in various fields. Educating patients about their clinical visits plays a pivotal role in patients' adherence to their treatment plans post-discharge. This paper presents EHRTutor, an innovative multi-component framework leveraging the Large Language Model (LLM) for patient education through conversational question-answering. EHRTutor first formulates questions pertaining to the electronic health record discharge instructions. It then educates the patient through conversation by administering each question as a test. Finally, it generates a summary at the end of the conversation. Evaluation results using LLMs and domain experts have shown a clear preference for EHRTutor over the baseline. Moreover, EHRTutor also offers a framework for generating synthetic patient education dialogues that can be used for future in-house system training.

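Translated: 2023-10-31 13:48:56 Published: 2023-10-30

# Investigative Pattern Detection Framework for Counterterrorism

Investigative Pattern Detection Framework for Counterterrorism ( http://arxiv.org/abs/2310.19211v1 )

License: see the linked page

Shashika R. Muramudalige, Benjamin W. K. Hung, Rosanne Libretti, Jytte Klausen, Anura P. Jayasumana

(Reference translation) Law-enforcement investigations aimed at preventing attacks by violent extremists are increasingly important for public safety. The problem is compounded by the massive volumes of data that must be scanned to identify complex behaviors of extremists and groups. Automated tools are needed that extract information in response to analysts' queries, continually scan new information, integrate it with past events, and alert analysts to emerging threats. We address the challenges of investigative pattern detection and develop an Investigative Pattern Detection Framework for Counterterrorism (INSPECT). The framework integrates numerous computing tools, including machine learning techniques for identifying behavioral indicators and graph pattern matching techniques for detecting risk profiles and groups. INSPECT also automates multiple tasks for large-scale mining of detailed forensic biographies, forming knowledge networks, and querying for behavioral indicators and radicalization trajectories. INSPECT targets a human-in-the-loop mode of investigative search and has been validated and evaluated on an evolving dataset on domestic jihadism.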

Law-enforcement investigations aimed at preventing attacks by violent extremists have become increasingly important for public safety. The problem is exacerbated by the massive data volumes that need to be scanned to identify complex behaviors of extremists and groups. Automated tools are required to extract information in response to queries from analysts, continually scan new information, integrate it with past events, and then alert analysts to emerging threats. We address challenges in investigative pattern detection and develop an Investigative Pattern Detection Framework for Counterterrorism (INSPECT). The framework integrates numerous computing tools that include machine learning techniques to identify behavioral indicators and graph pattern matching techniques to detect risk profiles/groups. INSPECT also automates multiple tasks for large-scale mining of detailed forensic biographies, forming knowledge networks, and querying for behavioral indicators and radicalization trajectories. INSPECT targets a human-in-the-loop mode of investigative search and has been validated and evaluated using an evolving dataset on domestic jihadism.

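Translated: 2023-10-31 13:48:46 Published: 2023-10-30

# Generalized Category Discovery with Clustering Assignment Consistency

Generalized Category Discovery with Clustering Assignment Consistency ( http://arxiv.org/abs/2310.19210v1 )

License: see the linked page

Xiangli Yang, Xinglin Pan, Irwin King, Zenglin Xu

(Reference translation) Generalized category discovery (GCD) is a recently proposed open-world task. Given a set of images consisting of labeled and unlabeled instances, the goal of GCD is to cluster the unlabeled samples automatically using information transferred from the labeled dataset. The unlabeled dataset contains both known and novel classes. The main challenge is that unlabeled novel-class samples and unlabeled known-class samples are mixed together in the unlabeled dataset. To address GCD without knowing the number of classes in the unlabeled dataset, we propose a co-training-based framework that encourages clustering consistency. Specifically, we first introduce weak and strong augmentation transformations to generate two sufficiently different views of the same sample. Then, based on the co-training assumption, we propose a consistency representation learning strategy that encourages consistency between feature-prototype similarity and clustering assignment. Finally, we use the discriminative embeddings learned in the semi-supervised representation learning process to construct an original sparse network and apply a community detection method to obtain the clustering results and the number of categories simultaneously. Extensive experiments show that the method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets. In particular, on the ImageNet-100 dataset, the method exceeds the best baseline by 15.5% and 7.0% on the Novel and All classes, respectively.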

Generalized category discovery (GCD) is a recently proposed open-world task. Given a set of images consisting of labeled and unlabeled instances, the goal of GCD is to automatically cluster the unlabeled samples using information transferred from the labeled dataset. The unlabeled dataset comprises both known and novel classes. The main challenge is that unlabeled novel class samples and unlabeled known class samples are mixed together in the unlabeled dataset. To address GCD without knowing the number of classes in the unlabeled dataset, we propose a co-training-based framework that encourages clustering consistency. Specifically, we first introduce weak and strong augmentation transformations to generate two sufficiently different views for the same sample. Then, based on the co-training assumption, we propose a consistency representation learning strategy, which encourages consistency between feature-prototype similarity and clustering assignment. Finally, we use the discriminative embeddings learned from the semi-supervised representation learning process to construct an original sparse network and use a community detection method to obtain the clustering results and the number of categories simultaneously. Extensive experiments show that our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets. In particular, on the ImageNet-100 dataset, our method exceeds the best baseline by 15.5% and 7.0% on the Novel and All classes, respectively.

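Translated: 2023-10-31 13:48:29 Published: 2023-10-30

# LitCab: Lightweight Calibration of Language Models on Outputs of Varied Lengths

LitCab: Lightweight Calibration of Language Models on Outputs of Varied Lengths ( http://arxiv.org/abs/2310.19208v1 )

License: see the linked page

Xin Liu, Muhammad Khalifa, Lu Wang

(Reference translation) A model is considered well calibrated when its probability estimates match the actual likelihood that its output is correct. Calibrating language models (LMs) is essential because it plays an important role in detecting and mitigating hallucinations, a common problem of LMs. However, popular neural model calibration techniques are not well suited to LMs because they lack flexibility in discerning answer correctness and have high computational costs. For example, post-processing methods such as temperature scaling often cannot reorder candidate generations. Moreover, training-based methods require fine-tuning the entire model. This paper proposes LitCab, a lightweight calibration mechanism consisting of a single linear layer that takes the input text representation and manipulates the LM output logits. LitCab improves model calibration while adding less than 2% of the original model's parameters. For evaluation, we construct CaT, a benchmark of seven text generation tasks covering responses from short phrases to paragraphs. We test LitCab with Llama2-7B, where it improves calibration across all tasks and reduces the average ECE score by 20%. We further conduct a comprehensive evaluation of seven open-source LMs from the GPT and LLaMA families, with the following findings: (1) larger models within the same family are better calibrated on short-generation tasks, but not necessarily on longer ones; (2) GPT-family models are better calibrated than LLaMA, Llama2, and Vicuna models despite having far fewer parameters; (3) fine-tuning a pretrained model (e.g., LLaMA) on samples of limited purpose (e.g., conversations) may worsen calibration, which highlights the importance of the fine-tuning setup for LM calibration.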

A model is considered well-calibrated when its probability estimate aligns with the actual likelihood of the output being correct. Calibrating language models (LMs) is crucial, as it plays a vital role in detecting and mitigating hallucinations, a common issue of LMs, as well as building more trustworthy models. Yet, popular neural model calibration techniques are not well-suited for LMs due to their lack of flexibility in discerning answer correctness and their high computational costs. For instance, post-processing methods like temperature scaling are often unable to reorder the candidate generations. Moreover, training-based methods require finetuning the entire model, which is impractical due to the increasing sizes of modern LMs. In this paper, we present LitCab, a lightweight calibration mechanism consisting of a single linear layer taking the input text representation and manipulating the LM output logits. LitCab improves model calibration by only adding < 2% of the original model parameters. For evaluation, we construct CaT, a benchmark consisting of 7 text generation tasks, covering responses ranging from short phrases to paragraphs. We test LitCab with Llama2-7B, where it improves calibration across all tasks, by reducing the average ECE score by 20%. We further conduct a comprehensive evaluation with 7 popular open-sourced LMs from GPT and LLaMA families, yielding the following key findings: (1) Larger models within the same family exhibit better calibration on short-generation tasks, but not necessarily on longer ones. (2) GPT-family models show superior calibration compared to LLaMA, Llama2 and Vicuna models despite having far fewer parameters. (3) Finetuning a pretrained model (e.g., LLaMA) with samples of limited purpose (e.g., conversations) may lead to worse calibration, highlighting the importance of finetuning setups for calibrating LMs.
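The ECE score cited above is usually computed by binning predictions by confidence and comparing each bin's average confidence with its accuracy. The following is a minimal sketch of that standard equal-width-bin formulation; the bin count, function name, and toy inputs are assumptions for illustration and are not taken from the LitCab evaluation code.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy over equal-width bins.

    confidences: predicted probabilities in [0, 1].
    correct: 1 if the corresponding output was correct, else 0.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(ok for _, ok in members) / len(members)
            ece += len(members) / n * abs(avg_conf - accuracy)
    return ece

# Overconfident toy model: claims 0.9 but is right 3 times out of 4.
overconfident_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
# Calibrated toy model: claims 0.75 and is right 3 times out of 4.
calibrated_ece = expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0])
```

A lower value means the stated confidence tracks the observed accuracy more closely, which is the sense in which a 20% reduction in average ECE indicates better calibration.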

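Translated: 2023-10-31 13:48:02 Published: 2023-10-30

# Leveraging generative artificial intelligence to simulate student learning behavior

Leveraging generative artificial intelligence to simulate student learning behavior ( http://arxiv.org/abs/2310.19206v1 )

License: see the linked page

Songlin Xu, Xinyu Zhang

(Reference translation) Student simulation is a transformative approach for improving learning outcomes, advancing educational research, and ultimately shaping the future of effective pedagogy. We examine the feasibility of using large language models (LLMs), a remarkable achievement in AI, to simulate student learning behavior. Unlike conventional machine-learning-based prediction, we use LLMs to instantiate virtual students with specific demographics and to uncover intricate correlations among learning experiences, course materials, understanding levels, and engagement. Our goal is not only to predict learning outcomes but also to replicate the learning behaviors and patterns of real students. We validate this hypothesis in three experiments. The first experiment, based on a dataset of N = 145, simulates student learning outcomes from demographic data and reveals parallels with real students across various demographic factors. The second experiment (N = 4524) yields increasingly realistic simulated behaviors as more assessment history is used to model the virtual students. The third experiment (N = 27) incorporates prior knowledge and course interactions and indicates a strong link between the virtual students' learning behaviors and fine-grained mappings from test questions, course materials, engagement, and understanding levels. Taken together, these findings deepen the understanding of LLMs and demonstrate their viability for student simulation, enabling more adaptable curriculum design that enhances inclusivity and educational effectiveness.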

Student simulation presents a transformative approach to enhance learning outcomes, advance educational research, and ultimately shape the future of effective pedagogy. We explore the feasibility of using large language models (LLMs), a remarkable achievement in AI, to simulate student learning behaviors. Unlike conventional machine learning based prediction, we leverage LLMs to instantiate virtual students with specific demographics and uncover intricate correlations among learning experiences, course materials, understanding levels, and engagement. Our objective is not merely to predict learning outcomes but to replicate learning behaviors and patterns of real students. We validate this hypothesis through three experiments. The first experiment, based on a dataset of N = 145, simulates student learning outcomes from demographic data, revealing parallels with actual students concerning various demographic factors. The second experiment (N = 4524) results in increasingly realistic simulated behaviors as more assessment history is used to model the virtual students. The third experiment (N = 27), incorporating prior knowledge and course interactions, indicates a strong link between virtual students' learning behaviors and fine-grained mappings from test questions, course materials, engagement and understanding levels. Collectively, these findings deepen our understanding of LLMs and demonstrate their viability for student simulation, empowering more adaptable curriculum design to enhance inclusivity and educational effectiveness.

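Translated: 2023-10-31 13:47:28 Published: 2023-10-30

# Can ChatGPT advance software testing intelligence? An experience report on metamorphic testing

Can ChatGPT advance software testing intelligence? An experience report on metamorphic testing ( http://arxiv.org/abs/2310.19204v1 )

License: see the linked page

Quang-Hung Luu, Huai Liu, and Tsong Yueh Chen

(Reference translation) ChatGPT is well known as an artificial intelligence chatbot used to answer human questions, but one may also want to discover its potential for advancing software testing. This paper examines the capability of ChatGPT to advance the intelligence of software testing through a case study on metamorphic testing (MT), a state-of-the-art software testing technique. We ask ChatGPT to generate candidate metamorphic relations (MRs), which are essentially necessary properties of the object program and which traditionally require human intelligence to identify. These MR candidates are then evaluated for correctness by domain experts. We show that ChatGPT can be used to generate new, correct MRs for testing several software systems. That said, the majority of the MR candidates are either vaguely defined or incorrect, especially for systems that have never been tested with MT. ChatGPT can be used to advance software testing intelligence by proposing MR candidates that can later be adopted for implementing tests, but human intelligence is still needed to justify and rectify their correctness.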

While ChatGPT is a well-known artificial intelligence chatbot used to answer human questions, one may want to discover its potential in advancing software testing. We examine the capability of ChatGPT in advancing the intelligence of software testing through a case study on metamorphic testing (MT), a state-of-the-art software testing technique. We ask ChatGPT to generate candidates of metamorphic relations (MRs), which are basically necessary properties of the object program and which traditionally require human intelligence to identify. These MR candidates are then evaluated in terms of correctness by domain experts. We show that ChatGPT can be used to generate new correct MRs to test several software systems. Having said that, the majority of MR candidates are either defined vaguely or incorrect, especially for systems that have never been tested with MT. ChatGPT can be used to advance software testing intelligence by proposing MR candidates that can be later adopted for implementing tests; but human intelligence should still inevitably be involved to justify and rectify their correctness.
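For readers unfamiliar with metamorphic testing, a small sketch may help. A metamorphic relation states how outputs for related inputs must relate, so no exact expected output is needed. The relation below, sin(x) = sin(pi - x), is a textbook example and is not one of the MRs generated in the paper; the function names and the deliberately faulty `buggy_sin` are hypothetical.

```python
import math
import random

def mr_sine_symmetry(sin_impl, x, tol=1e-9):
    """Metamorphic relation: sin(x) should equal sin(pi - x)."""
    return abs(sin_impl(x) - sin_impl(math.pi - x)) <= tol

def run_metamorphic_test(sin_impl, trials=1000, seed=0):
    """Return the random inputs on which the relation is violated."""
    rng = random.Random(seed)
    inputs = (rng.uniform(-10.0, 10.0) for _ in range(trials))
    return [x for x in inputs if not mr_sine_symmetry(sin_impl, x)]

def buggy_sin(x):
    # Truncated Taylor series: reasonable near 0, wrong for larger |x|.
    return x - x ** 3 / 6

library_violations = run_metamorphic_test(math.sin)
buggy_violations = run_metamorphic_test(buggy_sin)
```

The test never needs the true value of the sine; it only checks that the two related calls agree, which is why identifying good relations is the part that traditionally takes human insight.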

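Translated: 2023-10-31 13:47:02 Published: 2023-10-30

# M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models

M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models ( http://arxiv.org/abs/2310.19240v1 )

License: see the linked page

Wai-Chung Kwan, Xingshan Zeng, Yufei Wang, Yusen Sun, Liangyou Li, Lifeng Shang, Qun Liu, Kam-Fai Wong

(Reference translation) Handling long sequences has become an important and necessary capability for large language models (LLMs). However, how to evaluate the long-sequence capability of LLMs comprehensively and systematically remains an open question. One reason is that conventional, widely used benchmarks consist mainly of short sequences. This paper proposes M4LE, a Multi-ability, Multi-range, Multi-task, Multi-domain benchmark for Long-context Evaluation. M4LE is built on a diverse NLP task pool comprising 36 NLP datasets, 11 task types, and 12 domains. To alleviate the scarcity of tasks with naturally long sequences and to incorporate multi-ability assessment, we propose an automatic approach (requiring only negligible human annotation) that converts short-sequence tasks into a unified long-sequence scenario. Specifically, the scenario covers five different abilities: (1) explicit single-span, (2) semantic single-span, (3) explicit multiple-span, (4) semantic multiple-span, and (5) global context understanding. The samples in M4LE are evenly distributed over input lengths from 1k to 8k. We systematically evaluate 11 LLMs, in particular those optimized for long-sequence inputs. The results show that 1) current LLMs struggle to understand long contexts, particularly when tasks require attention to multiple spans; 2) semantic retrieval is more difficult for competent LLMs; and 3) models fine-tuned on longer text with position interpolation perform comparably to those using Neural Tangent Kernel (NTK) aware scaling methods without fine-tuning. We make the benchmark publicly available to encourage future research in this challenging area.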

Managing long sequences has become an important and necessary feature for large language models (LLMs). However, it is still an open question of how to comprehensively and systematically evaluate the long-sequence capability of LLMs. One of the reasons is that conventional and widely-used benchmarks mainly consist of short sequences. In this paper, we propose M4LE, a Multi-ability, Multi-range, Multi-task, Multi-domain benchmark for Long-context Evaluation. M4LE is based on a diverse NLP task pool comprising 36 NLP datasets, 11 task types and 12 domains. To alleviate the scarcity of tasks with naturally long sequences and incorporate multiple-ability assessment, we propose an automatic approach (but with negligible human annotations) to convert short-sequence tasks into a unified long-sequence scenario where LLMs have to identify single or multiple relevant spans in long contexts based on explicit or semantic hints. Specifically, the scenario includes five different types of abilities: (1) explicit single-span; (2) semantic single-span; (3) explicit multiple-span; (4) semantic multiple-span; and (5) global context understanding. The resulting samples in M4LE are evenly distributed from 1k to 8k input length. We conducted a systematic evaluation on 11 well-established LLMs, especially those optimized for long-sequence inputs. Our results reveal that: 1) Current LLMs struggle to understand long context, particularly when tasks require multiple-span attention. 2) Semantic retrieval task is more difficult for competent LLMs. 3) Models fine-tuned on longer text with position interpolation have comparable performance to those using Neural Tangent Kernel (NTK) aware scaling methods without fine-tuning. We make our benchmark publicly available to encourage future research in this challenging area.

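Translated: 2023-10-31 13:35:17 Published: 2023-10-30

# Cavity magnomechanics: from classical to quantum

Cavity magnomechanics: from classical to quantum ( http://arxiv.org/abs/2310.19237v1 )

License: see the linked page

Xuan Zuo, Zhi-Yuan Fan, Hang Qian, Ming-Song Ding, Huatang Tan, Hao Xiong, Jie Li

(Reference translation) Hybrid quantum systems based on magnons in magnetic materials have made significant progress over the past decade. They are built on the coupling of magnons with microwave photons, optical photons, vibration phonons, and superconducting qubits. In particular, the interactions among magnons, microwave cavity photons, and vibration phonons form the system of cavity magnomechanics (CMM), which lies at the intersection of cavity QED, magnonics, quantum optics, and quantum information. This article reviews the experimental and theoretical progress of this emerging field. We first introduce the basic theory of magnomechanical coupling and then present representative classical phenomena that have been observed experimentally, including magnomechanically induced transparency, magnomechanical dynamical backaction, and magnon-phonon cross-Kerr nonlinearity. We also discuss theoretical proposals that show the potential of the CMM system for preparing different kinds of quantum states of magnons, phonons, and photons, as well as hybrid systems that combine magnomechanics and optomechanics and the quantum protocols based on them. Finally, we summarize the review and give an outlook on future research directions in this field.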

Hybrid quantum systems based on magnons in magnetic materials have made significant progress in the past decade. They are built based on the couplings of magnons with microwave photons, optical photons, vibration phonons, and superconducting qubits. In particular, the interactions among magnons, microwave cavity photons, and vibration phonons form the system of cavity magnomechanics (CMM), which lies in the interdisciplinary field of cavity QED, magnonics, quantum optics, and quantum information. Here, we review the experimental and theoretical progress of this emerging field. We first introduce the underlying theories of the magnomechanical coupling, and then some representative classical phenomena that have been experimentally observed, including magnomechanically induced transparency, magnomechanical dynamical backactions, magnon-phonon cross-Kerr nonlinearity, etc. We also discuss a number of theoretical proposals, which show the potential of the CMM system for preparing different kinds of quantum states of magnons, phonons, and photons, and hybrid systems combining magnomechanics and optomechanics and relevant quantum protocols based on them. Finally, we summarize this review and provide an outlook for the future research directions in this field.

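Translated: 2023-10-31 13:34:40 Published: 2023-10-30

# Building Real-World Meeting Summarization Systems using Large Language Models: A Practical Perspective

Building Real-World Meeting Summarization Systems using Large Language Models: A Practical Perspective ( http://arxiv.org/abs/2310.19233v1 )

License: see the linked page

Md Tahmid Rahman Laskar, Xue-Yong Fu, Cheng Chen, Shashi Bhushan TN

(Reference translation) This paper studies how to effectively build meeting summarization systems for real-world use with large language models (LLMs). We evaluate and compare various closed-source and open-source LLMs, namely GPT-4, GPT-3.5, PaLM-2, and LLaMA-2. The results show that most closed-source LLMs perform better. However, smaller open-source models such as LLaMA-2 (7B and 13B) can achieve performance comparable to the large closed-source models even in zero-shot scenarios. Given the privacy concerns of closed-source models that are accessible only via API and the high cost of using fine-tuned versions of closed-source models, open-source models that achieve competitive performance are more advantageous for industrial use. Balancing performance against the associated costs and privacy concerns, the LLaMA-2-7B model looks promising for industrial use. In summary, the paper offers practical insights into using LLMs for summarizing real-world business meetings and sheds light on the trade-offs between performance and cost.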

This paper studies how to effectively build meeting summarization systems for real-world usage using large language models (LLMs). For this purpose, we conduct an extensive evaluation and comparison of various closed-source and open-source LLMs, namely, GPT-4, GPT-3.5, PaLM-2, and LLaMA-2. Our findings reveal that most closed-source LLMs are generally better in terms of performance. However, much smaller open-source models like LLaMA-2 (7B and 13B) could still achieve performance comparable to the large closed-source models even in zero-shot scenarios. Considering the privacy concerns of closed-source models for only being accessible via API, alongside the high cost associated with using fine-tuned versions of the closed-source models, the open-source models that can achieve competitive performance are more advantageous for industrial use. Balancing performance with associated costs and privacy concerns, the LLaMA-2-7B model looks more promising for industrial usage. In sum, this paper offers practical insights on using LLMs for real-world business meeting summarization, shedding light on the trade-offs between performance and cost.

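Translated: 2023-10-31 13:34:16 Published: 2023-10-30

# Adapter Pruning using Tropical Characterization

Adapter Pruning using Tropical Characterization ( http://arxiv.org/abs/2310.19232v1 )

License: see the linked page

Rishabh Bhardwaj, Tushar Vaidya, Soujanya Poria

(Reference translation) Adapters are a widely popular parameter-efficient transfer learning approach in natural language processing that inserts trainable modules between the layers of a pre-trained language model. Apart from a few heuristics, however, there has been a lack of studies analyzing the optimal number of adapter parameters needed for downstream applications. This paper proposes an adapter pruning approach based on studying the tropical characteristics of the trainable modules. We cast it as an optimization problem that aims to prune parameters from the adapter layers without changing the orientation of the underlying tropical hypersurfaces. Experiments on five NLP datasets show that tropical geometry tends to identify more relevant parameters to prune than the magnitude-based baseline, while a combined approach works best across the tasks.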

Adapters are widely popular parameter-efficient transfer learning approaches in natural language processing that insert trainable modules in between layers of a pre-trained language model. Apart from several heuristics, however, there has been a lack of studies analyzing the optimal number of adapter parameters needed for downstream applications. In this paper, we propose an adapter pruning approach by studying the tropical characteristics of trainable modules. We cast it as an optimization problem that aims to prune parameters from the adapter layers without changing the orientation of underlying tropical hypersurfaces. Our experiments on five NLP datasets show that tropical geometry tends to identify more relevant parameters to prune when compared with the magnitude-based baseline, while a combined approach works best across the tasks.

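Translated: 2023-10-31 13:33:56 Published: 2023-10-30

# There Are No Data Like More Data- Datasets for Deep Learning in Earth Observation

There Are No Data Like More Data- Datasets for Deep Learning in Earth Observation ( http://arxiv.org/abs/2310.19231v1 )

License: see the linked page

Michael Schmitt and Seyed Ali Ahmadi and Yonghao Xu and Gulsen Taskin and Ujjwal Verma and Francescopaolo Sica and Ronny Hansch

(Reference translation) Carefully curated and annotated datasets are the foundation of machine learning, and data-hungry deep neural networks in particular form the core of what is often called artificial intelligence (AI). Because of the massive success of deep learning applied to Earth observation (EO) problems, the community has focused largely on developing ever more sophisticated deep neural network architectures and training strategies while largely ignoring the overall importance of datasets. Numerous task-specific datasets have been created for that purpose, and they were largely ignored by previously published review articles on AI for Earth observation. With this article, we want to change the perspective and put machine learning datasets dedicated to Earth observation data and applications in the spotlight. Based on a review of historical developments, we describe the currently available resources and offer a perspective on future developments. We hope to contribute to the understanding that the nature of our data is what distinguishes the Earth observation community from many other communities that apply deep learning techniques to image data, and that a detailed understanding of the peculiarities of EO data is a core competency of our discipline.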

Carefully curated and annotated datasets are the foundation of machine learning, with particularly data-hungry deep neural networks forming the core of what is often called Artificial Intelligence (AI). Due to the massive success of deep learning applied to Earth Observation (EO) problems, the focus of the community has been largely on the development of ever-more sophisticated deep neural network architectures and training strategies largely ignoring the overall importance of datasets. For that purpose, numerous task-specific datasets have been created that were largely ignored by previously published review articles on AI for Earth observation. With this article, we want to change the perspective and put machine learning datasets dedicated to Earth observation data and applications into the spotlight. Based on a review of the historical developments, currently available resources are described and a perspective for future developments is formed. We hope to contribute to an understanding that the nature of our data is what distinguishes the Earth observation community from many other communities that apply deep learning techniques to image data, and that a detailed understanding of EO data peculiarities is among the core competencies of our discipline.

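Translated: 2023-10-31 13:33:41 Published: 2023-10-30

# Efficient vacuum state preparation for quantum simulation of strongly interacting local quantum field theories

Efficient vacuum state preparation for quantum simulation of strongly interacting local quantum field theories ( http://arxiv.org/abs/2310.19229v1 )

License: see the linked page

Thomas D. Cohen, Hyunwoo Oh

(Reference translation) We propose an efficient method for preparing ground states of strongly interacting local quantum field theories on quantum computers. The algorithm belongs to the same class as traditional adiabatic state preparation techniques and methods based on the quantum Zeno effect: it starts from the ground state of an analytically calculable system and evolves the parameters of the Hamiltonian to those of interest while remaining in the ground state along a path in parameter space. The approach produces the vacuum state in a time proportional to the square root of the volume, a square-root improvement in speed over traditional approaches. It exploits a novel method for traversing the path in parameter space, in which the resources scale linearly with a path length suitably defined in parameter space. Errors due to practical limitations are controlled and show no secular growth along the path. The final accuracy can be improved arbitrarily at an additive cost that is independent of the volume and grows more slowly than logarithmically with the overlap between the prepared state and the exact ground state.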

An efficient approach for preparing ground states in the context of strongly interacting local quantum field theories on quantum computers is presented. The algorithm belongs to the same class as traditional adiabatic state preparation techniques and methods based on quantum Zeno effect in that it starts with a ground state of an analytically calculable system and evolves the parameters of the Hamiltonian to the one of interest while maintaining the ground state along the path in parameter space. The approach produces the vacuum state in a time proportional to the square-root of the volume, which is a square-root improvement in speed compared to traditional approaches. The approach exploits a novel method for traversing the path in parameter space in which the resources scale linearly with a path length suitably defined in parameter space. Errors due to practical limitations are controlled and do not exhibit secular growth along the path. The final accuracy can be arbitrarily improved with an additive cost, which is independent of the volume and grows slower than logarithmically with the overlap between the state produced and the exact ground state.

翻訳日:2023-10-31 13:33:21 公開日:2023-10-30

# 確率的構成マシン:FPGAの実装

Stochastic Configuration Machines: FPGA Implementation ( http://arxiv.org/abs/2310.19225v1 )

ライセンス: Link先を確認

Matthew J. Felicetti and Dianhui Wang

(参考訳) 産業用ニューラルネットワークは一般的に、応答速度、メモリサイズ、電力使用量などの制約がある。ランダムな学習者はこれらの問題に対処できる。しかし、ハードウェアソリューションは、モデルの性能を維持しながら、より良いリソース削減を提供することができる。確率的構成ネットワーク(SCN)は、データモデリングの利点と実現可能性のために、産業アプリケーションにおいて主要な選択肢である。 Stochastic Configuration Machines (SCM) はこれを拡張して、ランダム化された重み付けを各ノードのスカラーでバイナリ値に制限し、学習性能と結果の解釈性を改善するメカニズムモデルを使用することで、メモリ制約の削減に重点を置いている。本稿では、フィールドプログラマブルゲートアレイ(FPGA)にSCMモデルを実装し、アルゴリズムにバイナリコード入力を導入することを目的とする。結果は、2つのベンチマークと、SCMと1層アーキテクチャとディープアーキテクチャを含む2つの産業データセットで報告されている。

Neural networks for industrial applications generally have additional constraints such as response speed, memory size and power usage. Randomized learners can address some of these issues. However, hardware solutions can provide better resource reduction whilst maintaining the model's performance. Stochastic configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling. Stochastic Configuration Machines (SCMs) extend this to focus on reducing the memory constraints by limiting the randomized weights to a binary value with a scalar for each node and using a mechanism model to improve the learning performance and result interpretability. This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to the algorithm. Results are reported for two benchmark and two industrial datasets, including SCM with single-layer and deep architectures.

翻訳日:2023-10-31 13:33:02 公開日:2023-10-30

# CHAMMI:顕微鏡画像におけるチャネル適応モデルのベンチマーク

CHAMMI: A benchmark for channel-adaptive models in microscopy imaging ( http://arxiv.org/abs/2310.19224v1 )

ライセンス: Link先を確認

Zitong Chen, Chau Pham, Siqi Wang, Michael Doron, Nikita Moshkov, Bryan A. Plummer, Juan C. Caicedo

(参考訳) ほとんどのニューラルネットワークは、入力画像が一定数のチャンネルを持つと仮定している(rgb画像では3つ)。しかし、機器や実験目標に応じてチャンネルの数が変化する顕微鏡画像など、チャンネルの数が変化する可能性のある設定が多数存在する。しかし、チャネルの数や種類に不変なニューラルネットワークを作成して評価するシステム的な試みは行われていない。結果として、訓練されたモデルは個々の研究に固有のままであり、他の顕微鏡設定ではほとんど再利用できない。本稿では,顕微鏡画像におけるチャネル適応モデルの検討のためのベンチマークを提案する。 1) 可変チャネル単細胞画像のデータセット、及び 2)生物学的に関連する評価枠組み。さらに,複数の既存手法を用いてチャネル適応モデルを作成し,このベンチマークの性能を固定チャネルベースラインモデルと比較した。チャネル適応モデルがドメイン外のタスクをより一般化し、計算効率が向上できることが分かりました。キュレートされたデータセット(https://doi.org/10.5281/zenodo.7988357)と評価API(https://github.com/broadinstitute/MorphEm.git)をコントリビュートして、将来の研究や応用における客観的比較を容易にする。

Most neural networks assume that input images have a fixed number of channels (three for RGB images). However, there are many settings where the number of channels may vary, such as microscopy images where the number of channels changes depending on instruments and experimental goals. Yet, there has not been a systemic attempt to create and evaluate neural networks that are invariant to the number and type of channels. As a result, trained models remain specific to individual studies and are hardly reusable for other microscopy settings. In this paper, we present a benchmark for investigating channel-adaptive models in microscopy imaging, which consists of 1) a dataset of varied-channel single-cell images, and 2) a biologically relevant evaluation framework. In addition, we adapted several existing techniques to create channel-adaptive models and compared their performance on this benchmark to fixed-channel, baseline models. We find that channel-adaptive models can generalize better to out-of-domain tasks and can be computationally efficient. We contribute a curated dataset (https://doi.org/10.5281/zenodo.7988357) and an evaluation API (https://github.com/broadinstitute/MorphEm.git) to facilitate objective comparisons in future research and applications.

翻訳日:2023-10-31 13:32:47 公開日:2023-10-30

# rgb画像に基づくロボット把持検出のためのモジュール型アンチノイズ深層学習ネットワーク

Modular Anti-noise Deep Learning Network for Robotic Grasp Detection Based on RGB Images ( http://arxiv.org/abs/2310.19223v1 )

ライセンス: Link先を確認

Zhaocong Li

(参考訳) 従来の手法は深度センサーに依存しているが、現在のトレンドは深度センサーがないにもかかわらず、費用対効果の高いRGB画像の利用に傾いている。本稿では,単一のRGB画像からつかむポーズを検出するための興味深いアプローチを提案する。そこで本研究では,パラレルプレートグリッパーを備えたロボット向けに,把持検出とセマンティクスセグメンテーションを付加したモジュール型学習ネットワークを提案する。我々のネットワークは、把握可能な対象を識別するだけでなく、セマンティックセグメンテーションによる事前把握分析を融合し、把握検出精度を高める。著しく、私たちのデザインは弾力性を示し、ぼやけた、騒がしい視覚をうまく処理します。鍵となる貢献は、rgb画像からの把持検出のための訓練可能なネットワーク、実現可能な把持実装を容易にするモジュラーデザイン、および共通の画像歪みに対して頑健なアーキテクチャを含む。提案手法の有効性と精度を実践的な実験と評価によって実証する。

While traditional methods rely on depth sensors, the current trend leans towards utilizing cost-effective RGB images, despite their absence of depth cues. This paper introduces an interesting approach to detect grasping pose from a single RGB image. To this end, we propose a modular learning network augmented with grasp detection and semantic segmentation, tailored for robots equipped with parallel-plate grippers. Our network not only identifies graspable objects but also fuses prior grasp analyses with semantic segmentation, thereby boosting grasp detection precision. Significantly, our design exhibits resilience, adeptly handling blurred and noisy visuals. Key contributions encompass a trainable network for grasp detection from RGB images, a modular design facilitating feasible grasp implementation, and an architecture robust against common image distortions. We demonstrate the feasibility and accuracy of our proposed approach through practical experiments and evaluations.

翻訳日:2023-10-31 13:32:22 公開日:2023-10-30

# 統合学習における勾配を用いた最大知識直交性再構成

Maximum Knowledge Orthogonality Reconstruction with Gradients in Federated Learning ( http://arxiv.org/abs/2310.19222v1 )

ライセンス: Link先を確認

Feng Wang, Senem Velipasalar, M. Cenk Gursoy

(参考訳) フェデレートラーニング(FL)は、プライバシーを守るためにクライアントデータをローカルに保つことを目的としている。データそのものを集める代わりに、サーバはクライアントから集約された勾配更新のみを収集する。 FLの普及に伴い、勾配更新から入力データを再構成することでFLアプローチの脆弱性を明らかにする研究が数多く行われている。しかし、既存の研究の多くは、非現実的に小さなバッチサイズのFL設定を前提としており、バッチサイズが大きいと画質が劣る。他の研究では、ニューラルネットワークのアーキテクチャやパラメータを不審に見えるほど修正しているため、クライアントに検出されうる。さらに、ほとんどの手法は、大きなバッチから1つのサンプル入力しか再構成できない。これらの制約に対処するために、クライアントの入力データを再構成する、MKOR(Maximum Knowledge Orthogonality Reconstruction)と呼ばれる、新しく完全に解析的なアプローチを提案する。提案手法は,大規模なバッチから、数学的に証明された高品質な画像を再構成する。 MKORは、サーバがクライアントに秘密裏に修正したパラメータを送信するだけでよく、クライアントの勾配更新から入力画像を効率的かつ目立たずに再構成することができる。 MNIST, CIFAR-100, ImageNetにおけるMKORの性能を評価し, 最新技術と比較した。その結果、MKORは既存のアプローチよりも優れており、包括的な防御アプローチを開発できるよう、FLのプライバシ保護に関するさらなる研究が急務であることが示されている。

Federated learning (FL) aims at keeping client data local to preserve privacy. Instead of gathering the data itself, the server only collects aggregated gradient updates from clients. Following the popularity of FL, there has been a considerable amount of work revealing the vulnerability of FL approaches by reconstructing the input data from gradient updates. Yet, most existing works assume an FL setting with unrealistically small batch size, and have poor image quality when the batch size is large. Other works modify the neural network architectures or parameters to the point of being suspicious, and thus, can be detected by clients. Moreover, most of them can only reconstruct one sample input from a large batch. To address these limitations, we propose a novel and completely analytical approach, referred to as the maximum knowledge orthogonality reconstruction (MKOR), to reconstruct clients' input data. Our proposed method reconstructs a mathematically proven high quality image from large batches. MKOR only requires the server to send secretly modified parameters to clients and can efficiently and inconspicuously reconstruct the input images from clients' gradient updates. We evaluate MKOR's performance on the MNIST, CIFAR-100, and ImageNet datasets and compare it with the state-of-the-art works. The results show that MKOR outperforms the existing approaches, and draws attention to a pressing need for further research on the privacy protection of FL so that comprehensive defense approaches can be developed.
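
To illustrate why gradient updates can leak inputs at all, here is a minimal sketch of the textbook single-sample case: for one linear layer with a bias, each row of the weight gradient is the input scaled by the matching bias gradient, so dividing one by the other recovers the input. This is not MKOR, which targets large batches; the squared-error loss and the function names `linear_layer_gradients` and `reconstruct_input` are assumptions made for the example.

```python
def linear_layer_gradients(W, b, x, y):
    """Gradients of 0.5 * ||W x + b - y||^2 with respect to W and b (one sample)."""
    z = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    g = [zi - yi for zi, yi in zip(z, y)]           # dL/dz
    grad_W = [[gi * xj for xj in x] for gi in g]    # dL/dW[i][j] = g[i] * x[j]
    grad_b = list(g)                                # dL/db[i]    = g[i]
    return grad_W, grad_b


def reconstruct_input(grad_W, grad_b, eps=1e-12):
    """Recover x from the gradients of a single sample: x[j] = grad_W[i][j] / grad_b[i]."""
    # Use the row with the largest bias gradient for numerical stability.
    i = max(range(len(grad_b)), key=lambda k: abs(grad_b[k]))
    if abs(grad_b[i]) < eps:
        raise ValueError("all bias gradients are zero; the input cannot be recovered")
    return [gw / grad_b[i] for gw in grad_W[i]]


# Toy values (hypothetical): a 2x3 layer and one private input.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
private_x = [0.3, 0.7, -1.2]
target = [1.0, 0.0]

grad_W, grad_b = linear_layer_gradients(W, b, private_x, target)
recovered_x = reconstruct_input(grad_W, grad_b)
```

With a batch larger than one, the gradients are sums over samples and this simple division no longer isolates a single input, which is the regime the abstract says existing attacks handle poorly.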

翻訳日:2023-10-31 13:32:08 公開日:2023-10-30

# ストリームからプールへ:i.i.d. Arrivalsを超える動的価格設定

From Stream to Pool: Dynamic Pricing Beyond i.i.d. Arrivals ( http://arxiv.org/abs/2310.19220v1 )

ライセンス: Link先を確認

Titing Cui, Su Jia, Thomas Lavastida

(参考訳) 動的価格問題は、ストリームモデルの下で広く研究されている: 顧客のストリームが順次到着し、それぞれが独立かつ同一に分布する評価額を持つ。しかし、この定式化は現実の世界を完全に反映するものではない。多くのシナリオでは、評価額の高い顧客は早期に購入を行って市場を去る傾向があり、評価額の分布にシフトが生じる。そこで本研究では,$n$人の非戦略的な単位需要の顧客からなるプールが売り手と繰り返しやりとりするモデルについて考察する。各顧客は、独立したPoisson過程に従って断続的に価格を監視し、観察された価格が自身の私的な評価額よりも低い場合に購入を行い、その後市場を永久に去る。我々は、任意の$k$個の価格の集合に対して、最適収益の$1/k$を保証する非適応ポリシーを効率良く計算するミニマックス最適なアルゴリズムを提案する。さらに,新規なデバイアス手法に基づく適応型のlearn-then-earnポリシーを示し,$\tilde O(kn^{3/4})$のリグレット上界を証明する。さらに、マルチンゲール集中不等式を用いて、この上界を$\tilde O(k^{3/4} n^{3/4})$に改善する。

The dynamic pricing problem has been extensively studied under the stream model: a stream of customers arrives sequentially, each with an independently and identically distributed valuation. However, this formulation is not entirely reflective of the real world. In many scenarios, high-valuation customers tend to make purchases earlier and leave the market, leading to a shift in the valuation distribution. Thus motivated, we consider a model where a pool of $n$ non-strategic unit-demand customers interact repeatedly with the seller. Each customer monitors the price intermittently according to an independent Poisson process and makes a purchase if the observed price is lower than her private valuation, whereupon she leaves the market permanently. We present a minimax optimal algorithm that efficiently computes a non-adaptive policy which guarantees a $1/k$ fraction of the optimal revenue, given any set of $k$ prices. Moreover, we present an adaptive learn-then-earn policy based on a novel debiasing approach, and prove an $\tilde O(kn^{3/4})$ regret bound. We further improve the bound to $\tilde O(k^{3/4} n^{3/4})$ using martingale concentration inequalities.
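
The pool model described in the abstract can be simulated in a few lines. The sketch below is only an illustration of the model, not the authors' algorithm: the piecewise-constant price schedule, the unit time horizon, and the names `simulate_pool` and `price_at` are assumptions introduced here.

```python
import random


def price_at(schedule, t, horizon):
    """Price in force at time t for a schedule of equal-length, piecewise-constant segments."""
    idx = min(int(t / horizon * len(schedule)), len(schedule) - 1)
    return schedule[idx]


def simulate_pool(valuations, schedule, rate, horizon=1.0, seed=0):
    """Revenue of a non-adaptive price schedule against a fixed pool of customers.

    Each customer checks the price at the arrival times of an independent
    Poisson process with the given rate, buys the first time the observed
    price is strictly below her private valuation, and then leaves for good.
    """
    rng = random.Random(seed)
    revenue = 0.0
    for v in valuations:
        t = rng.expovariate(rate)          # first monitoring time
        while t < horizon:
            p = price_at(schedule, t, horizon)
            if p < v:
                revenue += p               # purchase, then permanent exit
                break
            t += rng.expovariate(rate)     # next monitoring time
    return revenue


pool = [0.9, 0.6, 0.3]                     # hypothetical private valuations
flat = simulate_pool(pool, [0.5], rate=1000.0)
markdown = simulate_pool(pool, [0.8, 0.5], rate=1000.0)
too_high = simulate_pool(pool, [2.0], rate=1000.0)
```

With a markdown schedule, the high-valuation customer buys in the first segment and is gone by the time the price drops, which is the distribution shift the abstract contrasts with the i.i.d. stream model.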

翻訳日:2023-10-31 13:31:46 公開日:2023-10-30

# フェデレーテッド・アンラーニングに関する調査研究 : 分類学,課題,今後の方向性

A Survey of Federated Unlearning: A Taxonomy, Challenges and Future Directions ( http://arxiv.org/abs/2310.19218v1 )

ライセンス: Link先を確認

Jiaxi Yang, Yang Zhao

(参考訳) 信頼に値する連合学習(fl)の発展に伴い、忘れられる権利を実践する必要性が、連合学習(fu)の領域を生み出している。 FLでは、クライアントが生データを共有せずにグローバルモデルを共同でトレーニングすることで、特定の情報を選択的に学習する作業が大幅に複雑になる。その意味では、FUの課題に取り組むために多くの努力がなされており、大きな進歩を遂げている。本稿では,FUに関する総合的な調査を行う。特に,既存のアルゴリズム,目標,評価指標を提供し,fuの課題を特定する。いくつかの研究をレビューし比較することにより、様々なスキーム、潜在的な応用、今後の方向性の分類にまとめる。

With the development of trustworthy Federated Learning (FL), the requirement of implementing the right to be forgotten gives rise to the area of Federated Unlearning (FU). Compared to machine unlearning, a major challenge of FU lies in the decentralized and privacy-preserving nature of FL, in which clients jointly train a global model without sharing their raw data, making it substantially more intricate to selectively unlearn specific information. In that regard, many efforts have been made to tackle the challenges of FU and have achieved significant progress. In this paper, we present a comprehensive survey of FU. Specifically, we provide the existing algorithms, objectives, and evaluation metrics, and identify some challenges of FU. By reviewing and comparing some studies, we summarize them into a taxonomy for various schemes, potential applications and future directions.

翻訳日:2023-10-31 13:31:20 公開日:2023-10-30

# 微分プライベート最適化におけるグループクリッピングの精度と効率について

On the accuracy and efficiency of group-wise clipping in differentially private optimization ( http://arxiv.org/abs/2310.19215v1 )

ライセンス: Link先を確認

Zhiqi Bu, Ruixuan Liu, Yu-Xiang Wang, Sheng Zha, George Karypis

(参考訳) 最近の進歩は、ディファレンシャル・プライベート・ディープラーニング(DP)の精度、メモリコスト、トレーニング速度を大幅に向上させており、特に数百万から数十億のパラメータを持つ大規模ビジョンと言語モデルにおいてである。本研究では,dp最適化の重要な要素であるサンプル毎勾配クリッピング方式を徹底的に検討する。その結果,全層クリッピング(粗粒度)が最も一般的であり,最も精度が高いが,層ワイドクリッピング(微粒度)など,他のグループワイドクリッピングに比べてメモリコストが大きくなることがわかった。我々は収束理論と複雑性解析を通じてこのトレードオフを定式化する。重要なことは、グループワイドクリッピングと全層クリッピングの精度ギャップがより大きいモデルでは小さくなる一方で、グループワイドクリッピングのメモリ利点が残ることである。このため、グループワイドクリッピングにより、大きなモデルのDP最適化により、高い精度と低いピークメモリを同時に達成できる。

Recent advances have substantially improved the accuracy, memory cost, and training speed of differentially private (DP) deep learning, especially on large vision and language models with millions to billions of parameters. In this work, we thoroughly study the per-sample gradient clipping style, a key component in DP optimization. We show that different clipping styles have the same time complexity but instantiate an accuracy-memory trade-off: while the all-layer clipping (of coarse granularity) is the most prevalent and usually gives the best accuracy, it incurs heavier memory cost compared to other group-wise clipping, such as the layer-wise clipping (of finer granularity). We formalize this trade-off through our convergence theory and complexity analysis. Importantly, we demonstrate that the accuracy gap between group-wise clipping and all-layer clipping becomes smaller for larger models, while the memory advantage of the group-wise clipping remains. Consequently, the group-wise clipping allows DP optimization of large models to achieve high accuracy and low peak memory simultaneously.
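
The two clipping granularities compared in the abstract can be sketched for a single per-sample gradient as follows. This is a minimal illustration under stated assumptions: the per-layer threshold `R / sqrt(number of layers)` is one common way to keep the overall norm bounded by `R` and is chosen here for the example, and the function names are hypothetical rather than taken from the paper's code.

```python
import math


def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def total_norm(per_layer_grads):
    """L2 norm of the concatenation of all layer gradients."""
    return math.sqrt(sum(l2_norm(g) ** 2 for g in per_layer_grads))


def clip_all_layer(per_layer_grads, R):
    """One clipping factor for the whole per-sample gradient (coarse granularity).

    The factor depends on the norms of every layer, so no layer can be
    rescaled until all layer gradients of the sample are available.
    """
    norm = total_norm(per_layer_grads)
    factor = min(1.0, R / norm) if norm > 0 else 1.0
    return [[factor * v for v in g] for g in per_layer_grads]


def clip_layer_wise(per_layer_grads, R):
    """One clipping factor per layer (finer granularity).

    Each layer is clipped to R / sqrt(L) independently (an assumed budget
    split), so a layer can be processed as soon as its gradient is computed.
    """
    r_layer = R / math.sqrt(len(per_layer_grads))
    clipped = []
    for g in per_layer_grads:
        norm = l2_norm(g)
        factor = min(1.0, r_layer / norm) if norm > 0 else 1.0
        clipped.append([factor * v for v in g])
    return clipped


sample_grad = [[3.0, 4.0], [0.0, 12.0]]    # two layers, overall norm 13
coarse = clip_all_layer(sample_grad, R=1.0)
fine = clip_layer_wise(sample_grad, R=1.0)
```

All-layer clipping preserves the direction of the full gradient, while layer-wise clipping rescales layers independently and can change the direction; that difference is one plausible reading of the accuracy side of the trade-off the abstract formalizes.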

翻訳日:2023-10-31 13:31:08 公開日:2023-10-30

# 多階低階行列における因子フィッティング、ランクアロケーション、パーティショニング

Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices ( http://arxiv.org/abs/2310.19214v1 )

ライセンス: Link先を確認

Tetiana Parshakova, Trevor Hastie, Eric Darve, Stephen Boyd

(参考訳) 我々は、行列の和の行と列の置換として定義されるマルチレベル低階行列(MLR)を考える。 MLR行列は低階行列を拡張するが、総記憶量や行列ベクトル乗算の複雑さなど、その性質の多くを共有している。フロベニウスノルムの MLR 行列によって与えられた行列を適合させる際に生じる3つの問題に対処する。第一の問題は、MLR行列の因子を調整する因子フィッティングである。 2つ目はランクアロケーションであり、MLR行列に必要な総ストレージを保存するために、与えられた値の合計ランクに基づいて各レベルのブロックのランクを選択する。最後の問題は、列と列の階層的な分割と、ランクと要素を選択することである。本稿では,提案手法を実装したオープンソースパッケージについて述べる。

We consider multilevel low rank (MLR) matrices, defined as a row and column permutation of a sum of matrices, each one a block diagonal refinement of the previous one, with all blocks low rank given in factored form. MLR matrices extend low rank matrices but share many of their properties, such as the total storage required and complexity of matrix-vector multiplication. We address three problems that arise in fitting a given matrix by an MLR matrix in the Frobenius norm. The first problem is factor fitting, where we adjust the factors of the MLR matrix. The second is rank allocation, where we choose the ranks of the blocks in each level, subject to the total rank having a given value, which preserves the total storage needed for the MLR matrix. The final problem is to choose the hierarchical partition of rows and columns, along with the ranks and factors. This paper is accompanied by an open source package that implements the proposed methods.
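
The definition above (a sum of block-diagonal matrices whose blocks are low rank and stored in factored form) suggests how a matrix-vector product can be computed without ever forming the dense matrix. The sketch below assumes the row and column permutations are the identity and uses a hypothetical block layout `(r0, r1, c0, c1, B, C)`, where the block equals `B @ C.T`; it is not the API of the authors' open source package.

```python
def mlr_matvec(levels, x, n_rows):
    """Compute y = A x for an MLR matrix A given level by level in factored form.

    levels: list of levels; each level is a list of blocks (r0, r1, c0, c1, B, C)
            covering rows r0..r1-1 and columns c0..c1-1, with the block equal to
            B C^T, B of shape (r1-r0, rank) and C of shape (c1-c0, rank).
    """
    y = [0.0] * n_rows
    for blocks in levels:
        for r0, r1, c0, c1, B, C in blocks:
            rank = len(B[0])
            # t = C^T x[c0:c1]  (rank numbers)
            t = [sum(C[j][k] * x[c0 + j] for j in range(c1 - c0)) for k in range(rank)]
            # y[r0:r1] += B t
            for i in range(r1 - r0):
                y[r0 + i] += sum(B[i][k] * t[k] for k in range(rank))
    return y


def mlr_dense(levels, n_rows, n_cols):
    """Dense matrix of the same MLR representation (for checking only)."""
    A = [[0.0] * n_cols for _ in range(n_rows)]
    for blocks in levels:
        for r0, r1, c0, c1, B, C in blocks:
            for i in range(r1 - r0):
                for j in range(c1 - c0):
                    A[r0 + i][c0 + j] += sum(B[i][k] * C[j][k] for k in range(len(B[0])))
    return A


# A 4x4 example with two levels: one full rank-1 block, then two 2x2 rank-1 blocks.
levels = [
    [(0, 4, 0, 4, [[1.0], [2.0], [3.0], [4.0]], [[1.0], [0.0], [1.0], [0.0]])],
    [
        (0, 2, 0, 2, [[1.0], [1.0]], [[1.0], [1.0]]),
        (2, 4, 2, 4, [[2.0], [0.0]], [[0.0], [1.0]]),
    ],
]
x = [1.0, 2.0, 3.0, 4.0]
y = mlr_matvec(levels, x, n_rows=4)
```

Each block costs work proportional to its rank times its height plus width, so the total cost tracks the storage of the factors rather than the size of the dense matrix.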

翻訳日:2023-10-31 13:30:47 公開日:2023-10-30

# duma: 速い思考と遅い思考を持つデュアルマインド会話エージェント

DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking ( http://arxiv.org/abs/2310.18075v2 )

ライセンス: Link先を確認

Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui

(参考訳) 人間の認知の二重プロセス理論に着想を得て,2つの生成的大言語モデル(LLM)をそれぞれ高速・低速な思考に用い,二重マシン機構を具現化した対話エージェントフレームワークであるDUMAを導入する。高速思考モデルは、外的相互作用と初期応答生成の主要なインターフェースとして機能し、完全な応答の複雑さに基づいて、遅い思考モデルに取り組む必要性を評価する。起動すると、遅い思考モデルが会話を引き継ぎ、綿密な計画、推論、ツール利用に取り組み、よく分析された応答を提供する。このデュアルミンド構成は、直感的な応答と状況に基づいた意図的な問題解決プロセスのシームレスな遷移を可能にする。我々は,不動産業界のオンライン調査を扱う対話エージェントを構築した。実験は,本手法が有効性と効率のバランスをとることを証明し,ベースラインと比較して著しく改善した。

Inspired by the dual-process theory of human cognition, we introduce DUMA, a novel conversational agent framework that embodies a dual-mind mechanism through the utilization of two generative Large Language Models (LLMs) dedicated to fast and slow thinking respectively. The fast thinking model serves as the primary interface for external interactions and initial response generation, evaluating the necessity for engaging the slow thinking model based on the complexity of the complete response. When invoked, the slow thinking model takes over the conversation, engaging in meticulous planning, reasoning, and tool utilization to provide a well-analyzed response. This dual-mind configuration allows for a seamless transition between intuitive responses and deliberate problem-solving processes based on the situation. We have constructed a conversational agent to handle online inquiries in the real estate industry. The experiment proves that our method balances effectiveness and efficiency, and has a significant improvement compared to the baseline.
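
The fast/slow hand-off described in the abstract can be pictured as a small dispatcher. The sketch below is purely illustrative: the two "models" are toy stand-in functions rather than LLM calls, and the escalation rule (keywords and query length) is an assumption made for the example, not the criterion used in DUMA.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FastResult:
    draft: str                  # initial response from the fast model
    needs_slow_thinking: bool   # fast model's own judgment on escalation


def dual_mind_respond(
    query: str,
    fast_model: Callable[[str], FastResult],
    slow_model: Callable[[str, str], str],
) -> str:
    """Answer with the fast model unless it asks to hand over to the slow model."""
    result = fast_model(query)
    if result.needs_slow_thinking:
        # The slow model takes over and may plan, reason, or call tools.
        return slow_model(query, result.draft)
    return result.draft


def toy_fast_model(query: str) -> FastResult:
    """Stand-in for the fast LLM: escalates long or comparison-style queries."""
    complex_query = len(query.split()) > 12 or "compare" in query.lower()
    return FastResult(draft=f"[fast] quick answer to: {query}", needs_slow_thinking=complex_query)


def toy_slow_model(query: str, draft: str) -> str:
    """Stand-in for the slow LLM: pretends to plan and refine the draft."""
    return f"[slow] analyzed answer to: {query}"


simple_reply = dual_mind_respond("What are your opening hours?", toy_fast_model, toy_slow_model)
complex_reply = dual_mind_respond(
    "Compare these two apartments by price per square meter", toy_fast_model, toy_slow_model
)
```

Keeping the escalation decision inside the fast model's output means the slow path is paid for only on the queries that need it, which is the effectiveness and efficiency balance the abstract reports.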

翻訳日:2023-10-31 11:47:56 公開日:2023-10-30

# FormalGeo:人間ライクなIMOレベルの自動推論への第一歩

FormalGeo: The First Step Toward Human-like IMO-level Geometric Automated Reasoning ( http://arxiv.org/abs/2310.18021v2 )

ライセンス: Link先を確認

Xiaokai Zhang, Na Zhu, Yiming He, Jia Zou, Qike Huang, Xiaoxiao Jin, Yanjun Guo, Chenyang Mao, Zhe Zhu, Dengfeng Yue, Fangzhen Zhu, Yang Li, Yifan Wang, Yiwen Huang, Runan Wang, Cheng Qin, Zhenbing Zeng, Shaorong Xie, Xiangfeng Luo, Tuo Leng

(参考訳) これは、私たちが過去3年間に達成した一連の研究における最初の論文です。本稿では,完全かつ互換性のある形式平面幾何システムを構築した。これは、IMOレベルの平面形状問題と可読性AI自動推論の間に重要な橋渡しとなる。このフォーマルなシステムがあれば、最新のAIモデルを私たちのフォーマルなシステムとシームレスに統合することができます。この形式的なフレームワークの中で、AIは、他の自然言語を扱うのと同じように、IMOレベルの平面幾何学問題に対する推論的推論ソリューションを提供することができ、これらの証明は読みやすく、トレース可能で、検証可能である。本稿では,幾何形式体系の発展を導くために,幾何形式化理論(GFT)を提案する。 GFTに基づいて、88の幾何述語と196の定理からなるフォーマルジオを確立した。 IMOレベルの幾何学問題を表現、検証、解決することができる。また、PythonでFGPS(形式幾何学問題の解法)も作成しました。これは、問題解決プロセスを検証するインタラクティブアシスタントと、前方探索、後方探索、AI支援検索などの様々な手法を活用する自動問題解決ツールの両方として機能する。 FormalGeo7kデータセットには6,981(データ拡張による186,832)の幾何学的問題と完全な形式言語アノテーションが含まれています。フォーマルシステムの実装とフォーマルGeo7kの実験は、GFTの正しさと実用性を検証する。奥行き優先探索法は2.42%の問題解決失敗率しか生み出せず,より低い解を得るために深層学習手法を組み込むことができる。 FGPSとFormalGeo7kデータセットのソースコードはhttps://github.com/BitSecret/FormalGeoで公開されている。

This is the first paper in a series of work we have accomplished over the past three years. In this paper, we have constructed a complete and compatible formal plane geometry system. This will serve as a crucial bridge between IMO-level plane geometry challenges and readable AI automated reasoning. With this formal system in place, we have been able to seamlessly integrate modern AI models with our formal system. Within this formal framework, AI is now capable of providing deductive reasoning solutions to IMO-level plane geometry problems, just like handling other natural languages, and these proofs are readable, traceable, and verifiable. We propose the geometry formalization theory (GFT) to guide the development of the geometry formal system. Based on the GFT, we have established the FormalGeo, which consists of 88 geometric predicates and 196 theorems. It can represent, validate, and solve IMO-level geometry problems. We have also crafted the FGPS (formal geometry problem solver) in Python. It serves as both an interactive assistant for verifying problem-solving processes and an automated problem solver, utilizing various methods such as forward search, backward search and AI-assisted search. We've annotated the FormalGeo7k dataset, containing 6,981 (expanded to 186,832 through data augmentation) geometry problems with complete formal language annotations. Implementation of the formal system and experiments on the FormalGeo7k validate the correctness and utility of the GFT. The backward depth-first search method only yields a 2.42% problem-solving failure rate, and we can incorporate deep learning techniques to achieve a lower one. The source code of FGPS and the FormalGeo7k dataset are available at https://github.com/BitSecret/FormalGeo.
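
As a rough picture of what forward search over predicates and theorems means, here is a toy forward-chaining loop on ground facts. It is not FGPS and does not use the FormalGeo predicate or theorem library: the string facts, the rule names, and the function `forward_search` are all hypothetical and chosen only for illustration.

```python
def forward_search(facts, theorems, goal):
    """Apply theorems to known facts until the goal is derived or nothing new appears.

    facts:    iterable of ground facts (strings)
    theorems: list of (name, premises, conclusion) with ground premises
    Returns (solved, trace), where trace lists the (theorem, derived fact) steps.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed and goal not in known:
        changed = False
        for name, premises, conclusion in theorems:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                trace.append((name, conclusion))
                changed = True
    return goal in known, trace


# Hypothetical problem: derive an angle equality from a parallel-line condition.
facts = ["Parallel(AB,CD)", "Transversal(EF,AB,CD)", "Equal(angle2,angle3)"]
theorems = [
    ("alternate_interior_angles",
     ["Parallel(AB,CD)", "Transversal(EF,AB,CD)"],
     "Equal(angle1,angle2)"),
    ("transitivity_of_equality",
     ["Equal(angle1,angle2)", "Equal(angle2,angle3)"],
     "Equal(angle1,angle3)"),
]
solved, trace = forward_search(facts, theorems, "Equal(angle1,angle3)")
```

The recorded trace is what makes such a derivation readable and checkable step by step, in the spirit of the readable, traceable, and verifiable proofs the abstract describes.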

翻訳日:2023-10-31 11:47:38 公開日:2023-10-30

# ロールプレイング・チャットボットはキャラクターの性格を捉えるか? ロールプレイングチャットボットのパーソナリティ特性評価

Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots ( http://arxiv.org/abs/2310.17976v2 )

ライセンス: Link先を確認

Xintao Wang, Quan Tu, Yaying Fei, Ziang Leng, Cheng Li

(参考訳) 大規模な事前訓練された言語モデルの出現は、新しいAIアプリケーション、特に異なるペルソナを持つチャットボットの領域における能力に革命をもたらした。本論文は,チャットボットの「刺激応答性」の性質を考慮し,ロールプレイング・チャットボットにおける個性評価のための革新的なオープンエンドインタビュースタイルのアプローチを提示する。チャットハルヒライブラリーが作成した32種類のロールプレイングチャットボットについて,5次元とmbti次元の両方においてパーソナリティ評価を行い,その人間知覚との整合を計測した。評価結果は,LLMに基づく現代のロールプレイングチャットボットは,人間よりも82.8%のアライメント率で,対応するキャラクターの性格特性を効果的に表現できることを示した。また、チャットボットの個性を形作るための潜在的戦略も提案する。そこで本稿は,計算言語学と心理学を交差するロールプレイングチャットボットの基礎研究である。リソースはhttps://github.com/LC1332/Chat-Haruhi-Suzumiyaで利用可能です。

The emergence of large-scale pretrained language models has revolutionized the capabilities of new AI applications, especially in the realm of crafting chatbots with distinct personas. Given the "stimulus-response" nature of chatbots, this paper unveils an innovative open-ended interview-style approach for personality assessment on role-playing chatbots, which offers a richer comprehension of their intrinsic personalities. We conduct personality assessments on 32 role-playing chatbots created by the ChatHaruhi library, across both the Big Five and MBTI dimensions, and measure their alignment with human perception. Evaluation results underscore that modern role-playing chatbots based on LLMs can effectively portray personality traits of corresponding characters, with an alignment rate of 82.8% compared with human-perceived personalities. Besides, we also suggest potential strategies for shaping chatbots' personalities. Hence, this paper serves as a cornerstone study for role-playing chatbots that intersects computational linguistics and psychology. Our resources are available at https://github.com/LC1332/Chat-Haruhi-Suzumiya

翻訳日:2023-10-31 11:47:11 公開日:2023-10-30

# 可視赤外人物再識別のための形状中心表現学習

Shape-centered Representation Learning for Visible-Infrared Person Re-identification ( http://arxiv.org/abs/2310.17952v2 )

ライセンス: Link先を確認

Shuang Li, Jiaxu Leng, Ji Gan, Mengjingcheng Mo, Xinbo Gao

(参考訳) 現在の可視赤外人物再同定 (VI-ReID) 法は、識別的な外観特徴の抽出を優先し、モダリティ変化に対する身体形状の自然な耐性を無視している。当初,我々は形状特徴と外観特徴の単純な結合により,形状の識別能力を測定した。しかし、形状特徴の利用には2つの未解決問題が残っている。1つは、推論フェーズにおける形状特徴抽出の補助モデルへの依存と、本質的なモダリティの相違による生成された赤外線形状の誤差に関するものである。もう1つの問題は、形状特徴と外観特徴の相関が十分に検討されていないことである。上記の課題に対処するため,形状特徴と形状に関連する外観特徴の学習に焦点を当てた形状中心表現学習フレームワーク(ScRL)を提案する。具体的には,形状特徴伝播(Shape Feature Propagation, SFP)を考案し, 推論時に最小限の複雑さのコストで原画像から形状特徴を直接抽出できるようにする。赤外線の身体形状の不正確さを特徴レベルで修復するために,赤外線形状復元(Infrared Shape Restitution, ISR)を提案する。さらに,形状に関連する外観特徴を取得するために,形状特徴の誘導の下で、識別に関係する特徴を強調し、識別に関係しない特徴を抑制する外観特徴強調(Appearance Feature Enhancement, AFE)を設計する。提案したScRLの有効性を検証するため, 広範囲な実験を行った。SYSU-MM01、HITSZ-VCM、RegDBデータセットにおけるRank-1(mAP)精度はそれぞれ76.1%、71.2%、92.4%(72.6%、52.9%、86.7%)に達し、既存の最先端の手法を上回った。

Current Visible-Infrared Person Re-Identification (VI-ReID) methods prioritize extracting distinguishing appearance features, ignoring the natural resistance of body shape against modality changes. Initially, we gauged the discriminative potential of shapes by a straightforward concatenation of shape and appearance features. However, two unresolved issues persist in the utilization of shape features. One pertains to the dependence on auxiliary models for shape feature extraction in the inference phase, along with the errors in generated infrared shapes due to the intrinsic modality disparity. The other issue involves the inadequately explored correlation between shape and appearance features. To tackle the aforementioned challenges, we propose the Shape-centered Representation Learning framework (ScRL), which focuses on learning shape features and appearance features associated with shapes. Specifically, we devise the Shape Feature Propagation (SFP), facilitating direct extraction of shape features from original images with minimal complexity costs during inference. To restitute inaccuracies in infrared body shapes at the feature level, we present the Infrared Shape Restitution (ISR). Furthermore, to acquire appearance features related to shape, we design the Appearance Feature Enhancement (AFE), which accentuates identity-related features while suppressing identity-unrelated features guided by shape features. Extensive experiments are conducted to validate the effectiveness of the proposed ScRL. Achieving remarkable results, the Rank-1 (mAP) accuracy attains 76.1%, 71.2%, 92.4% (72.6%, 52.9%, 86.7%) on the SYSU-MM01, HITSZ-VCM, RegDB datasets respectively, outperforming existing state-of-the-art methods.

翻訳日:2023-10-31 11:46:48 公開日:2023-10-30

# lightlm: 生成レコメンデーションのための軽量で深層で狭い言語モデル

LightLM: A Lightweight Deep and Narrow Language Model for Generative Recommendation ( http://arxiv.org/abs/2310.17488v2 )

ライセンス: Link先を確認

Kai Mei, Yongfeng Zhang

(参考訳) 本稿では, 生成レコメンデーションのための軽量トランスフォーマーベース言語モデルLightLMを提案する。トランスフォーマティブベースの生成モデリングは、nlpやvisionといったさまざまなaiサブフィールドで重要になっているが、パーソナライズされた生成モデリングに対するユニークな需要のために、生成推奨はまだ初期段階にある。ジェネレーティブレコメンデーションに関する既存の研究では、T5、GPT、LLaMA、M6といったNLP指向のトランスフォーマーアーキテクチャが使われており、これは重く、特にレコメンデーションタスクのために設計されていない。 LightLMは、特にレコメンデーションアイテムの直接生成に適した軽量で細いトランスフォーマーアーキテクチャを導入することで、この問題に対処している。この構造は、特に直接的な生成的推奨に適しており、入力は主にモデルのキャパシティによく適合する短いトークンで構成されているため、言語モデルがこのタスクに大きすぎる必要はないという観察から生まれたものである。また,SCI(Spectral Collaborative Indexing)とグラフコラボレーションインデックス(Graph Collaborative Indexing,GCI)という,考案したユーザIDとアイテムIDのインデックス化手法によって,大規模言語モデルよりも高い精度で,より深く狭いトランスフォーマーアーキテクチャを実現することも示す。また,アイテムを出力として生成する幻覚問題に対処するため,生成推薦者に対して制約付き生成プロセスを提案する。実世界のデータセットでの実験では、LightLMは推奨精度と効率の両方において、様々な競争ベースラインを上回っている。コードはhttps://github.com/dongyuanjushi/LightLMにある。

This paper presents LightLM, a lightweight Transformer-based language model for generative recommendation. While Transformer-based generative modeling has gained importance in various AI sub-fields such as NLP and vision, generative recommendation is still in its infancy due to its unique demand on personalized generative modeling. Existing works on generative recommendation often use NLP-oriented Transformer architectures such as T5, GPT, LLaMA and M6, which are heavy-weight and are not specifically designed for recommendation tasks. LightLM tackles the issue by introducing a light-weight deep and narrow Transformer architecture, which is specifically tailored for direct generation of recommendation items. This structure is especially apt for straightforward generative recommendation and stems from the observation that the language model does not have to be too wide for this task, as the input predominantly consists of short tokens that are well-suited for the model's capacity. We also show that our devised user and item ID indexing methods, i.e., Spectral Collaborative Indexing (SCI) and Graph Collaborative Indexing (GCI), enable the deep and narrow Transformer architecture to outperform large-scale language models for recommendation. Besides, to address the hallucination problem of generating items as output, we propose the constrained generation process for generative recommenders. Experiments on real-world datasets show that LightLM outperforms various competitive baselines in terms of both recommendation accuracy and efficiency. The code can be found at https://github.com/dongyuanjushi/LightLM.
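
One common way to realize a constrained generation process of the kind mentioned above is to decode over a prefix tree of valid item IDs, so that the model can only emit token sequences that correspond to real catalog items. The sketch below shows that general technique with a toy scoring function; it is an assumption for illustration and not necessarily how LightLM implements it, and the names `build_trie`, `constrained_greedy_decode`, and the token strings are hypothetical.

```python
END = "<eos>"  # marker stored in the trie where a complete item ID ends


def build_trie(item_token_sequences):
    """Prefix tree over the token sequences of all valid item IDs."""
    root = {}
    for seq in item_token_sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def constrained_greedy_decode(trie, score_fn):
    """Greedy decoding restricted to tokens that keep the prefix inside the trie.

    score_fn(prefix, token) stands in for the language model's next-token score.
    The loop ends because every step moves one level down a finite trie.
    """
    prefix, node = [], trie
    while True:
        allowed = list(node.keys())
        best = max(allowed, key=lambda tok: score_fn(prefix, tok))
        if best == END:
            return prefix
        prefix.append(best)
        node = node[best]


# Toy catalog of item IDs, already split into tokens (hypothetical).
catalog = [["item", "12", "7"], ["item", "12", "9"], ["item", "30"]]
trie = build_trie(catalog)

# Toy "model": its favorite token "99" is not part of any valid item ID.
toy_scores = {"99": 5.0, "9": 3.0, "12": 2.0, "7": 1.0, "item": 1.0, "30": 0.0, END: 0.5}


def toy_score(prefix, token):
    return toy_scores.get(token, -1.0)


generated = constrained_greedy_decode(trie, toy_score)
```

Because the hallucinated token never appears among the allowed continuations, the decoder cannot produce an item ID outside the catalog, whatever the unconstrained scores prefer.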

翻訳日:2023-10-31 11:46:04 公開日:2023-10-30

# FAMO: 高速適応型マルチタスク最適化

FAMO: Fast Adaptive Multitask Optimization ( http://arxiv.org/abs/2306.03792v3 )

ライセンス: Link先を確認

Bo Liu, Yihao Feng, Peter Stone, Qiang Liu

(参考訳) AIの壮大かつ永続的な目標の1つは、マルチタスク学習(MTL)を通じて多様なデータから複数の異なるタスクを学習できる汎用エージェントを作成することである。しかし、実際には、全タスクの平均損失に勾配降下(GD)を適用すると、特定のタスクの深刻な最適化不足により、マルチタスク性能が低下する可能性がある。よりバランスの取れた損失削減のためにタスク勾配を操作する以前のアプローチでは、すべてのタスク勾配を格納して計算する必要があり($k$をタスク数として$\mathcal{O}(k)$の空間と時間)、大規模なシナリオでの利用が制限される。本研究では,Fast Adaptive Multitask Optimization (FAMO)を紹介する。これは,$\mathcal{O}(1)$の空間と時間を用いて,バランスの取れた方法でタスク損失を低減する動的重み付け手法である。マルチタスクの教師付き学習および強化学習の問題を網羅する広範な実験を行う。その結果,FAMOは最先端の勾配操作技術と同等あるいは優れた性能を達成しつつ,空間効率と計算効率を大幅に改善することが示された。コードは https://github.com/Cranial-XIX/FAMO で入手できる。

One of the grand enduring goals of AI is to create generalist agents that can learn multiple different tasks from diverse data via multitask learning (MTL). However, in practice, applying gradient descent (GD) on the average loss across all tasks may yield poor multitask performance due to severe under-optimization of certain tasks. Previous approaches that manipulate task gradients for a more balanced loss decrease require storing and computing all task gradients ($\mathcal{O}(k)$ space and time where $k$ is the number of tasks), limiting their use in large-scale scenarios. In this work, we introduce Fast Adaptive Multitask Optimization (FAMO), a dynamic weighting method that decreases task losses in a balanced way using $\mathcal{O}(1)$ space and time. We conduct an extensive set of experiments covering multi-task supervised and reinforcement learning problems. Our results indicate that FAMO achieves comparable or superior performance to state-of-the-art gradient manipulation techniques while offering significant improvements in space and computational efficiency. Code is available at https://github.com/Cranial-XIX/FAMO.

翻訳日:2023-10-31 11:44:16 公開日:2023-10-30

PDF登録状況（公開日: 20231030）