Fugu-MT: arXiv Paper Translations

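This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not CC-licensed, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.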

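The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).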

Papers with a publication date of 2023-11-19.

Title	Authors	Abstract	Publication date / Translation date
# Circuit-centric Genetic Algorithm (CGA) for Analog and Radio-Frequency Circuit Optimization ( http://arxiv.org/abs/2403.17938v1 ) License: see linked page	Mingi Kwon, Yeonjun Lee, Ickhyun Song	This paper presents an automated method for optimizing parameters in analog/high-frequency circuits, aiming to maximize performance parameters of a radio-frequency (RF) receiver. The design target includes a reduction of power consumption and noise figure and an increase in conversion gain. This study investigates the use of an artificial algorithm for the optimization of a receiver, illustrating how to fulfill the performance parameters with diverse circuit parameters. To overcome issues observed in the traditional Genetic Algorithm (GA), the concept of the Circuit-centric Genetic Algorithm (CGA) is proposed as a viable approach. The new method adopts an inference process that is simpler and computationally more efficient than the existing deep learning models. In addition, CGA offers significant advantages over both manual searching for optimal points and the conventional GA, mitigating the designer's workload while searching for superior optimum points.	Translated: 2024-04-01 02:44:33 Published: 2023-11-19
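For orientation, the conventional GA that the abstract uses as its baseline can be sketched as follows. This is a generic, minimal genetic search, not the authors' CGA; the function name `genetic_search`, its parameters, and the toy objective in the usage note are all illustrative assumptions standing in for a circuit simulator.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=40,
                   mutation=0.1, seed=0):
    """Plain genetic algorithm (not CGA): maximize `fitness` within `bounds`."""
    rng = random.Random(seed)
    # Random initial population; each individual is a list of parameter values.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # Uniform crossover: take each gene from one of the two parents.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            # Gaussian mutation, clamped back into the allowed range.
            child = [min(hi, max(lo, g + rng.gauss(0, mutation * (hi - lo))))
                     for g, (lo, hi) in zip(child, bounds)]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


def toy_fitness(params):
    """Hypothetical objective with its peak at (3, -1); not a circuit model."""
    x, y = params
    return -((x - 3) ** 2) - (y + 1) ** 2
```

Calling `genetic_search(toy_fitness, [(-10, 10), (-10, 10)])` returns the best parameter pair found. Because the fitter half is always kept, the best fitness never decreases from one generation to the next.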
# Systematic Analysis of Security and Vulnerabilities in Miniapps ( http://arxiv.org/abs/2311.11382v1 ) License: see linked page	Yuyang Han, Xu Ji, Zhiqiang Wang, Jianyi Zhang	The past few years have witnessed a boom of miniapps; as lightweight applications, miniapps are of great importance in the mobile internet sector. Consequently, the security of miniapps can directly affect the integrity of sensitive data, posing a potential threat to user privacy. However, after a thorough review of the various research efforts in miniapp security, we found that research on the safety of miniapp web interfaces is limited. This paper proposes a triad threat model focusing on users, servers and attackers to mitigate the security risk of miniapps. By following the principle of least privilege and the direction of permission consistency, we design a novel analysis framework for the security risk assessment of miniapps based on this model. Then, we analyzed the correlation between the security risk assessment and the threat model associated with the miniapp. This analysis led to identifying potential scopes and categorisations with security risks. In the case study, we identify nine major categories of vulnerability issues, such as SQL injection, logical vulnerabilities and cross-site scripting. We also assessed a total of 50,628 security risk hazards and provided specific examples.	Translated: 2024-03-18 22:53:06 Published: 2023-11-19
# An Efficient Elliptic Curve Cryptography Arithmetic Using Nikhilam Multiplication ( http://arxiv.org/abs/2311.11392v1 ) License: see linked page	Prokash Barman, Banani Saha	Multiplication is one of the most important operations in Elliptic Curve Cryptography (ECC) arithmetic. For point addition and point doubling in ECC, scalar (integer) multiplication is required. In higher-order classical (standard) multiplication, many intermediate operations are required. Reducing the operations in multiplication will increase the functional speed of ECC arithmetic. These goals can be achieved using an ancient multiplication algorithm, namely the Nikhilam Sutra. The Nikhilam Sutra is one of the 16 Vedic mathematics Sutras (algorithms). It is efficient for multiplying two large decimal numbers. The Sutra reduces the multiplication of two large numbers to the multiplication of two smaller numbers. The functional speed of Elliptic Curve Cryptography can be increased using the Nikhilam method for scalar multiplication.	Translated: 2024-03-18 22:53:06 Published: 2023-11-19
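The reduction described in the abstract can be illustrated with the textbook form of the Nikhilam rule. With a base B (a power of ten), a = B - da and b = B - db, the product is a * b = (a - db) * B + da * db, so the only multiplication left is between the two deficiencies. This is a minimal sketch under that assumption; the function name is illustrative and the code is not taken from the paper.

```python
def nikhilam_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers using the Nikhilam base-deficiency rule."""
    # Pick the smallest power of ten that is not below either operand.
    base = 10
    while base < max(a, b):
        base *= 10
    # Deficiency of each operand from the base.
    da, db = base - a, base - b
    # Left part: cross-subtraction (a - db equals b - da).
    left = a - db
    # Right part: the only true multiplication, between the deficiencies.
    right = da * db
    # Multiplying by the base is a decimal shift.
    return left * base + right
```

For 97 x 96 the base is 100, the deficiencies are 3 and 4, the left part is 93 and the right part is 12, giving 9312. The rule pays off when both operands are close to the base, because the deficiencies are then much smaller than the operands.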
# DNA Encoded Elliptic Curve Cryptography System for IoT Security ( http://arxiv.org/abs/2311.11393v1 ) License: see linked page	Prokash Barmana, Banani Saha	In the field of Computer Science and Information Technology, the Internet of Things (IoT) is one of the emerging technologies. In an IoT environment, several devices are interconnected and transmit data among themselves. Security vulnerabilities may arise within the IoT environment. To date, IoT has not been widely accepted due to its security flaws. Hence, to keep the IoT environment robust, we propose a stable security framework for IoT with Elliptic Curve Cryptography (ECC) using DNA encoding. ECC is the most lightweight cryptography technique among the well-known public key cryptography techniques. To increase encryption complexity, DNA encoding mechanism of DNA computing with ECC is preceded.	Translated: 2024-03-18 22:53:06 Published: 2023-11-19
# Establishing Dynamic Secure Sessions for ECQV Implicit Certificates in Embedded Systems ( http://arxiv.org/abs/2311.11444v1 ) License: see linked page	Fikret Basic, Christian Steger, Robert Kofler	Be it in the IoT or automotive domain, implicit certificates are gaining ever more prominence in constrained embedded devices. They present a resource-efficient security solution against common threat concerns. The computational requirements are not the main issue anymore. The focus is now placed on determining a good balance between the provided security level and the derived threat model. A security aspect that often gets overlooked is the establishment of secure communication sessions, as most design solutions are based only on the use of static key derivation, and therefore lack perfect forward secrecy. This leaves the transmitted data open to potential future exposure by having keys tied to the certificates rather than the communication sessions. We aim to patch this gap by presenting a design that utilizes the Station to Station (STS) protocol with implicit certificates. In addition, we propose potential protocol optimization implementation steps and run a comprehensive study on the performance and security level between the proposed design and the state-of-the-art key derivation protocols. In our comparative study, we show that with a slight computational increase of 20% compared to a static ECDSA key derivation, we are able to mitigate many session-related security vulnerabilities that would otherwise remain open.	Translated: 2024-03-18 22:53:06 Published: 2023-11-19
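The difference between static key derivation and per-session keys can be sketched with a toy ephemeral Diffie-Hellman exchange. This covers only the ephemeral-exchange idea behind STS; the signing and certificate steps that STS adds are omitted, and the finite-field group below (a 127-bit Mersenne prime with generator 3) is an insecure stand-in chosen for brevity, not the elliptic-curve setting of the paper. All names are illustrative.

```python
import hashlib
import secrets

# Toy group parameters, for illustration only (assumption, not from the paper).
P = 2**127 - 1
G = 3


def ephemeral_keypair():
    """Draw a fresh private exponent and the matching public value."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def session_key(own_private, peer_public):
    """Derive a 32-byte session key from the shared Diffie-Hellman value."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def run_session():
    """One session: both parties use throwaway key pairs and derive the same key."""
    a_priv, a_pub = ephemeral_keypair()
    b_priv, b_pub = ephemeral_keypair()
    return session_key(a_priv, b_pub), session_key(b_priv, a_pub)
```

Each call to `run_session()` draws new private values, so the session key is tied to the session rather than to any long-term credential. That is the property the abstract describes as missing from static key derivation.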
# EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models ( http://arxiv.org/abs/2311.12066v1 ) License: see linked page	Ruoxi Chen, Haibo Jin, Jinyin Chen, Lichao Sun	Text-to-image diffusion models have emerged as an evolutionary for producing creative content in image synthesis. Based on the impressive generation abilities of these models, instruction-guided diffusion models can edit images with simple instructions and input images. While they empower users to obtain their desired edited images with ease, they have raised concerns about unauthorized image manipulation. Prior research has delved into the unauthorized use of personalized diffusion models; however, this problem of instruction-guided diffusion models remains largely unexplored. In this paper, we first propose a protection method EditShield against unauthorized modifications from such models. Specifically, EditShield works by adding imperceptible perturbations that can shift the latent representation used in the diffusion process, forcing models to generate unrealistic images with mismatched subjects. Our extensive experiments demonstrate EditShield's effectiveness among synthetic and real-world datasets. Besides, EditShield also maintains robustness against various editing types and synonymous instruction phrases.	Translated: 2024-03-18 15:51:52 Published: 2023-11-19
# Crypto analysis of the key distribution scheme using noise-free resistances ( http://arxiv.org/abs/2312.00031v1 ) License: see linked page	Laszlo B. Kish	Known key exchange schemes offering information-theoretic (unconditional) security are complex and costly to implement. Nonetheless, they remain the only known methods for achieving unconditional security in key exchange. Therefore, the explorations for simpler solutions for information-theoretic security are highly justified. Lin et al. [1] proposed an interesting hardware key distribution scheme that utilizes thermal-noise-free resistances and DC voltages. A crypto analysis of this system is presented. It is shown that, if Eve gains access to the initial shared secret at any time in the past or future, she can successfully crack all the generated keys in the past and future, even retroactively, using passively obtained and recorded voltages and currents. Therefore, the scheme is not a secure key exchanger, but it is rather a key expander with no more information entropy than the originally shared secret at the beginning. We also point out that the proposed defense methods against active attacks do not function when the original shared secret is compromised because then the communication cannot be efficiently authenticated. However, they do work when an unconditionally secure key exchanger is applied to enable the authenticated communication protocol.	Translated: 2024-03-18 13:35:06 Published: 2023-11-19
# Revolutionizing Forensic Toolmark Analysis: An Objective and Transparent Comparison Algorithm ( http://arxiv.org/abs/2312.00032v1 ) License: see linked page	Maria Cuellar and Sheng Gao and Heike Hofmann	Forensic toolmark comparisons are currently performed subjectively by humans, which leads to a lack of consistency and accuracy. There is little evidence that examiners can determine whether pairs of marks were made by the same tool or different tools. There is also little evidence that they can make this classification when marks are made under different conditions, such as different angles of attack or direction of mark generation. We generate original toolmark data in 3D, extract the signal from each toolmark, and train an algorithm to compare toolmark signals objectively. We find that toolmark signals cluster by tool, and not by angle or direction. That is, the variability within tool, regardless of angle/direction, is smaller than the variability between tools. The known-match and known-non-match densities of the similarities of pairs of marks have a small overlap, even when accounting for dependencies in the data, making them a useful instrument for determining whether a new pair of marks was made by the same tool. We provide a likelihood ratio approach as a formal method for comparing toolmark signals with a measure of uncertainty. This empirically trained, open-source method can be used by forensic examiners to compare toolmarks objectively and thus improve the reliability of toolmark comparisons. This can, in turn, reduce miscarriages of justice in the criminal justice system.	Translated: 2023-12-11 03:53:47 Published: 2023-11-19
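A likelihood ratio built from known-match and known-non-match similarity densities can be sketched as below. The abstract does not say how the densities are estimated, so fitting a normal distribution to each set of scores is an assumption made here for brevity, and the similarity scores are invented placeholders rather than data from the study.

```python
from statistics import NormalDist


def likelihood_ratio(score, match_scores, nonmatch_scores):
    """Ratio of the known-match density to the known-non-match density at `score`."""
    # Assumption: each score population is summarized by a fitted normal density.
    match_density = NormalDist.from_samples(match_scores)
    nonmatch_density = NormalDist.from_samples(nonmatch_scores)
    return match_density.pdf(score) / nonmatch_density.pdf(score)


# Hypothetical similarity scores for illustration only.
known_match = [0.91, 0.88, 0.93, 0.86, 0.90]
known_nonmatch = [0.22, 0.35, 0.28, 0.31, 0.19]
```

A ratio above 1 supports the same-tool hypothesis and a ratio below 1 supports the different-tool hypothesis. The less the two densities overlap, the more decisive the ratio is for a new pair of marks.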
# A Turing Test: Are AI Chatbots Behaviorally Similar to Humans? ( http://arxiv.org/abs/2312.00798v1 ) License: see linked page	Qiaozhu Mei, Yutong Xie, Walter Yuan, Matthew O. Jackson	We administer a Turing Test to AI Chatbots. We examine how Chatbots behave in a suite of classic behavioral games that are designed to elicit characteristics such as trust, fairness, risk-aversion, cooperation, etc., as well as a traditional Big-5 psychological survey that measures personality traits. ChatGPT-4 passes the Turing Test in that it consistently exhibits human-like behavioral and personality traits based on a comparison to the behavior of hundreds of thousands of humans from more than 50 countries. Chatbots also modify their behavior based on previous experience and contexts "as if" they were learning from the interactions, and change their behavior in response to different framings of the same strategic situation. Their behaviors are often distinct from average and modal human behaviors, in which case they tend to behave on the more altruistic and cooperative end of the distribution. We estimate that they act as if they are maximizing an average of their own and partner's payoff.	Translated: 2023-12-11 03:45:01 Published: 2023-11-19
# Classification of Radio Galaxies with trainable COSFIRE filters ( http://arxiv.org/abs/2311.11286v1 ) License: see linked page	Steven Ndungu, Trienko Grobler, Stefan J. Wijnholds, Dimka Karastoyanova, George Azzopardi	Radio galaxies exhibit a rich diversity of characteristics and emit radio emissions through a variety of radiation mechanisms, making their classification into distinct types based on morphology a complex challenge. To address this challenge effectively, we introduce an innovative approach for radio galaxy classification using COSFIRE filters. These filters possess the ability to adapt to both the shape and orientation of prototype patterns within images. The COSFIRE approach is explainable, learning-free, rotation-tolerant, efficient, and does not require a huge training set. To assess the efficacy of our method, we conducted experiments on a benchmark radio galaxy data set comprising 1180 training samples and 404 test samples. Notably, our approach achieved an average accuracy rate of 93.36%. This achievement outperforms contemporary deep learning models, and it is the best result ever achieved on this data set. Additionally, COSFIRE filters offer better computational performance, with roughly 20× fewer operations than the DenseNet-based competing method (when comparing at the same accuracy). Our findings underscore the effectiveness of the COSFIRE filter-based approach in addressing the complexities associated with radio galaxy classification. This research contributes to advancing the field by offering a robust solution that transcends the orientation challenges intrinsic to radio galaxy observations. Our method is versatile in that it is applicable to various image classification approaches.	Translated: 2023-12-03 14:11:12 Published: 2023-11-19
# Zero-Shot Question Answering over Financial Documents using Large Language Models ( http://arxiv.org/abs/2311.14722v1 ) License: see linked page	Karmvir Singh Phogat, Chetan Harsha, Sridhar Dasaratha, Shashishekar Ramakrishna, Sai Akhil Puranam	We introduce a large language model (LLM) based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports. While LLMs have exhibited remarkable performance on various natural language and reasoning tasks, complex reasoning problems often rely on few-shot prompts that require carefully crafted examples. In contrast, our approach uses novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain specific language. The generated program is then executed by a program interpreter, thus mitigating the limitations of LLMs in performing accurate arithmetic calculations. We evaluate the proposed approach on three financial datasets using some of the recently developed generative pretrained transformer (GPT) models and perform comparisons with various zero-shot baselines. The experimental results demonstrate that our approach significantly improves the accuracy for all the LLMs over their respective baselines. We provide a detailed analysis of the results, generating insights to support our findings. The success of our approach demonstrates the enormous potential to extract complex domain specific numerical reasoning by designing zero-shot prompts to effectively exploit the knowledge embedded in LLMs.	Translated: 2023-12-03 13:40:35 Published: 2023-11-19
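The program-then-execute step can be sketched as follows. The "generated" program is a hand-written placeholder for what a model might return, with invented figures; the helper name and the convention that the result is stored in a variable called `ans` are assumptions, not details from the paper.

```python
def run_generated_program(source: str) -> float:
    """Execute model-written Python and return the value it assigns to `ans`."""
    # Built-ins are emptied so the snippet is limited to plain arithmetic.
    # This is a convenience, not a security sandbox.
    namespace = {"__builtins__": {}}
    exec(source, namespace)
    return namespace["ans"]


# Hypothetical model output for "What was the revenue growth in percent?"
generated = """
revenue_2022 = 1250.0
revenue_2021 = 1000.0
ans = (revenue_2022 - revenue_2021) / revenue_2021 * 100
"""
```

Running `run_generated_program(generated)` gives 25.0. The arithmetic is carried out by the interpreter rather than by the language model, which is the division of labor the abstract describes.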
# AI Use in Manuscript Preparation for Academic Journals ( http://arxiv.org/abs/2311.14720v1 ) License: see linked page	Nir Chemaya and Daniel Martin	The emergent abilities of Large Language Models (LLMs), which power tools like ChatGPT and Bard, have produced both excitement and worry about how AI will impact academic writing. In response to rising concerns about AI use, authors of academic publications may decide to voluntarily disclose any AI tools they use to revise their manuscripts, and journals and conferences could begin mandating disclosure and/or turn to using detection services, as many teachers have done with student writing in class settings. Given these looming possibilities, we investigate whether academics view it as necessary to report AI use in manuscript preparation and how detectors react to the use of AI in academic writing.	Translated: 2023-12-03 13:40:14 Published: 2023-11-19
# Explainable AI for engineering design: A unified approach of systems engineering and component-based deep learning ( http://arxiv.org/abs/2108.13836v4 ) License: see linked page	Philipp Geyer, Manav Mahan Singh and Xia Chen	Data-driven models created by machine learning are gaining importance in all fields of design and engineering. They have high potential to assist decision-makers in creating novel artefacts with better performance and sustainability. However, limited generalization and the black-box nature of these models lead to limited explainability and reusability. To overcome this situation, we propose a component-based approach to create partial component models by machine learning (ML). This component-based approach aligns deep learning with systems engineering (SE). For the domain of energy efficient building design, we first demonstrate better generalization of the component-based method by analyzing prediction accuracy outside the training data. We observe a much higher accuracy (R2 = 0.94) compared to conventional monolithic methods (R2 = 0.71). Second, we illustrate explainability by demonstrating, by way of example, how sensitivity information from SE and rules from low-depth decision trees serve engineering. Third, we evaluate explainability by qualitative and quantitative methods demonstrating the matching of preliminary knowledge and data-driven derived strategies and show correctness of activations at component interfaces compared to white-box simulation results (envelope components: R2 = 0.92 to 0.99; zones: R2 = 0.78 to 0.93). The key for component-based explainability is that activations at interfaces between the components are interpretable engineering quantities. The large range of possible configurations in composing components allows the examination of novel unseen design cases with understandable data-driven models. The matching of parameter ranges of components by similar probability distribution produces reusable, well-generalizing, and trustworthy models. The approach adapts the model structure to engineering methods of systems engineering and to domain knowledge.	Translated: 2023-11-23 06:17:26 Published: 2023-11-19
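The R2 values quoted in the abstract are coefficients of determination. A minimal sketch of the standard definition, R2 = 1 - SS_res / SS_tot, is given below for readers who want to reproduce such a score on their own predictions; the function name is illustrative.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    # Squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared deviation of the targets from their mean (must be non-zero).
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

A perfect prediction gives 1.0, and always predicting the mean of the targets gives 0.0. The value can become negative when evaluated on data outside the training range, which is the setting the abstract uses to compare generalization.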
# Cascaded and Generalizable Neural Radiance Fields for Fast View Synthesis ( http://arxiv.org/abs/2208.04717v2 ) License: see linked page	Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila	We present CG-NeRF, a cascade and generalizable neural radiance fields method for view synthesis. Recent generalizing view synthesis methods can render high-quality novel views using a set of nearby input views. However, the rendering speed is still slow due to the nature of uniformly-point sampling of neural radiance fields. Existing scene-specific methods can train and render novel views efficiently but cannot generalize to unseen data. Our approach addresses the problems of fast and generalizing view synthesis by proposing two novel modules: a coarse radiance fields predictor and a convolutional-based neural renderer. This architecture infers consistent scene geometry based on the implicit neural fields and renders new views efficiently using a single GPU. We first train CG-NeRF on multiple 3D scenes of the DTU dataset, and the network can produce high-quality and accurate novel views on unseen real and synthetic data using only photometric losses. Moreover, our method can leverage a denser set of reference images of a single scene to produce accurate novel views without relying on additional explicit representations and still maintains the high-speed rendering of the pre-trained model. Experimental results show that CG-NeRF outperforms state-of-the-art generalizable neural rendering methods on various synthetic and real datasets.	Translated: 2023-11-23 06:07:55 Published: 2023-11-19
# Iterative Scale-Up ExpansionIoU and Deep Features Association for Multi-Object Tracking in Sports ( http://arxiv.org/abs/2306.13074v5 ) License: see linked page	Hsiang-Wei Huang, Cheng-Yen Yang, Jiacheng Sun, Pyong-Kun Kim, Kwang-Ju Kim, Kyoungoh Lee, Chung-I Huang, Jenq-Neng Hwang	Deep learning-based object detectors have driven notable progress in multi-object tracking algorithms. Yet, current tracking methods mainly focus on simple, regular motion patterns in pedestrians or vehicles. This leaves a gap in tracking algorithms for targets with nonlinear, irregular motion, like athletes. Additionally, relying on the Kalman filter in recent tracking algorithms falls short when object motion defies its linear assumption. To overcome these issues, we propose a novel online and robust multi-object tracking approach named deep ExpansionIoU (Deep-EIoU), which focuses on multi-object tracking for sports scenarios. Unlike conventional methods, we abandon the use of the Kalman filter and leverage the iterative scale-up ExpansionIoU and deep features for robust tracking in sports scenarios. This approach achieves superior tracking performance without adopting a more robust detector, all while keeping the tracking process in an online fashion. Our proposed method demonstrates remarkable effectiveness in tracking irregular motion objects, achieving a score of 77.2% HOTA on the SportsMOT dataset and 85.4% HOTA on the SoccerNet-Tracking dataset. It outperforms all previous state-of-the-art trackers on various large-scale multi-object tracking benchmarks, covering various kinds of sports scenarios. The code and models are available at https://github.com/hsiangwei0903/Deep-EIoU.	Translated: 2023-11-23 05:01:35 Published: 2023-11-19
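The idea of an expanded IoU can be sketched as follows: both boxes are enlarged before the overlap is measured, so a detection that has jumped away from its previous position can still be associated. The exact expansion rule and scale schedule are assumptions here (each side is padded by `scale` times the box width or height, and the scales and threshold are placeholders). The pairwise helper below is a simplification of the iterative matching rounds in the paper, whose reference implementation is in the linked repository.

```python
def expand_box(box, scale):
    """Pad an (x1, y1, x2, y2) box by `scale` times its width and height per side."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 - scale * w, y1 - scale * h, x2 + scale * w, y2 + scale * h)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def expansion_iou(a, b, scale):
    """IoU measured after expanding both boxes by the same scale."""
    return iou(expand_box(a, scale), expand_box(b, scale))


def first_matching_scale(a, b, scales=(0.2, 0.4, 0.6), threshold=0.15):
    """Try increasing expansion scales and return the first that clears the threshold."""
    for scale in scales:
        if expansion_iou(a, b, scale) >= threshold:
            return scale
    return None
```

Two 10 x 10 boxes separated by a 2-unit gap have a plain IoU of 0, but an expansion scale of 0.5 gives them an overlap of 0.25. Starting with a small scale and enlarging it only for pairs that remain unmatched keeps nearby objects from being confused.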
# Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction ( http://arxiv.org/abs/2309.13101v2 ) License: see linked page	Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, Xiaogang Jin	Implicit neural representation has paved the way for new approaches to dynamic scene reconstruction and rendering. Nonetheless, cutting-edge dynamic neural rendering methods rely heavily on these implicit representations, which frequently struggle to capture the intricate details of objects in the scene. Furthermore, implicit methods have difficulty achieving real-time rendering in general dynamic scenes, limiting their use in a variety of tasks. To address the issues, we propose a deformable 3D Gaussians Splatting method that reconstructs scenes using 3D Gaussians and learns them in canonical space with a deformation field to model monocular dynamic scenes. We also introduce an annealing smoothing training mechanism with no extra overhead, which can mitigate the impact of inaccurate poses on the smoothness of time interpolation tasks in real-world datasets. Through a differential Gaussian rasterizer, the deformable 3D Gaussians not only achieve higher rendering quality but also real-time rendering speed. Experiments show that our method outperforms existing methods significantly in terms of both rendering quality and speed, making it well-suited for tasks such as novel-view synthesis, time interpolation, and real-time rendering.	Translated: 2023-11-23 04:37:06 Published: 2023-11-19
# SIRe-IR: Inverse Rendering for BRDF Reconstruction with Shadow and Illumination Removal in High-Illuminance Scenes ( http://arxiv.org/abs/2310.13030v2 ) License: see linked page	Ziyi Yang, Yanzhen Chen, Xinyu Gao, Yazhen Yuan, Yu Wu, Xiaowei Zhou, Xiaogang Jin	Implicit neural representation has opened up new possibilities for inverse rendering. However, existing implicit neural inverse rendering methods struggle to handle strongly illuminated scenes with significant shadows and indirect illumination. The existence of shadows and reflections can lead to an inaccurate understanding of scene geometry, making precise factorization difficult. To this end, we present SIRe-IR, an implicit neural inverse rendering approach that uses non-linear mapping and regularized visibility estimation to decompose the scene into environment map, albedo, and roughness. By accurately modeling the indirect radiance field, normal, visibility, and direct light simultaneously, we are able to remove both shadows and indirect illumination in materials without imposing strict constraints on the scene. Even in the presence of intense illumination, our method recovers high-quality albedo and roughness with no shadow interference. SIRe-IR outperforms existing methods in both quantitative and qualitative evaluations.	Translated: 2023-11-23 04:24:40 Published: 2023-11-19
# SecureBERT と LLAMA 2 を利用した制御領域ネットワーク侵入検知と分類 SecureBERT and LLAMA 2 Empowered Control Area Network Intrusion Detection and Classification ( http://arxiv.org/abs/2311.12074v1 ) ライセンス: Link先を確認	Xuemei Li, Huirong Fu	(参考訳) 多くの研究がコントロールエリアネットワーク(CAN)攻撃の検出に有効であることを示した。人間の意味空間を理解する領域において、トランスフォーマーベースのモデルは顕著な効果を示した。事前学習されたトランスフォーマーを活用することは、様々な言語関連タスクにおいて一般的な戦略となり、これらのモデルが人間のセマンティクスをより包括的に把握できるようになる。 can侵入検出のための事前学習モデルの適応性評価について検討するため、can-securebertとcan-llama2の2つの異なるモデルを開発した。特に、我々のCAN-LLAMA2モデルは、バランスの取れた精度、精度検出率、F1スコア、そして驚くほど低い3.10e-6の誤警報率で、例外的な性能 0.999993 を達成することで、最先端モデルを上回る。驚くべきことに、誤警報率は、先行モデルのmth-ids(multitiered hybrid intrusion detection system)の52倍小さい。本研究は,大規模言語モデルを基盤モデルとして採用し,他のサイバーセキュリティ関連タスクへのアダプタを導入し,モデル固有の言語関連能力を維持することの約束を明らかにする。 Numerous studies have proved their effective strength in detecting Control Area Network (CAN) attacks. In the realm of understanding the human semantic space, transformer-based models have demonstrated remarkable effectiveness. Leveraging pre-trained transformers has become a common strategy in various language-related tasks, enabling these models to grasp human semantics more comprehensively. To delve into the adaptability evaluation on pre-trained models for CAN intrusion detection, we have developed two distinct models: CAN-SecureBERT and CAN-LLAMA2. Notably, our CAN-LLAMA2 model surpasses the state-of-the-art models by achieving an exceptional performance 0.999993 in terms of balanced accuracy, precision detection rate, F1 score, and a remarkably low false alarm rate of 3.10e-6. Impressively, the false alarm rate is 52 times smaller than that of the leading model, MTH-IDS (Multitiered Hybrid Intrusion Detection System). Our study underscores the promise of employing a Large Language Model as the foundational model, while incorporating adapters for other cybersecurity-related tasks and maintaining the model's inherent language-related capabilities.	翻訳日:2023-11-23 03:38:48 公開日:2023-11-19
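The metrics quoted in the entry above (balanced accuracy, detection rate, F1 score, false alarm rate) have standard confusion-matrix definitions. The following is a minimal sketch, assuming the usual binary definitions apply; the function name and the counts are hypothetical and are not results from the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    detection_rate = tp / (tp + fn)      # recall / true-positive rate
    false_alarm_rate = fp / (fp + tn)    # false-positive rate
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)         # equals 1 - false_alarm_rate
    return {
        "detection_rate": detection_rate,
        "false_alarm_rate": false_alarm_rate,
        "precision": precision,
        "balanced_accuracy": (detection_rate + specificity) / 2,
        "f1": 2 * precision * detection_rate / (precision + detection_rate),
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = detection_metrics(tp=90, fp=5, tn=895, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

A false alarm rate on the order of 1e-6, as reported in the abstract, corresponds to roughly one false positive per several hundred thousand benign messages.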
# 教師なし学習の統合による低線量CT画像再構成の実現 Enhancing Low-dose CT Image Reconstruction by Integrating Supervised and Unsupervised Learning ( http://arxiv.org/abs/2311.12071v1 ) ライセンス: Link先を確認	Ling Chen, Zhishen Huang, Yong Long, Saiprasad Ravishankar	(参考訳) 従来のモデルベース画像再構成法(mbir)は、前方モデルと雑音モデルと単純な物体前兆を組み合わせたものである。画像再構成へのディープラーニング手法の最近の応用は、アンサンプされた測定や様々なノイズによる画像再構成の課題に対処するためのデータ駆動アプローチに成功している。本研究では,X線CT画像再構成のためのハイブリッド教師なし学習フレームワークを提案する。提案した学習定式化は、疎性または教師なし学習に基づく先行とニューラルネットワーク再構成の両方を活用して、固定点反復過程をシミュレートする。各訓練ブロックは、決定論的MBIRソルバとニューラルネットワークで構成される。情報は2つの再構成器を通して並列に流れ、最適に結合される。複数のブロックをカスケードして再構築パイプラインを形成する。訓練データに制限のある低用量ct画像再構成における学習ハイブリッドモデルの有効性を実証し,nih aapm mayoクリニック低用量ctグランドチャレンジデータセットを用いてトレーニングおよびテストを行った。本研究では,教師付きディープ・ネットワーク・コンストラクタとMBIRソルバの組み合わせを,学習された疎表現に基づく先行や分析的先行と組み合わせて検討した。近年の低用量CT再建法と比較して,提案手法の有望な性能を示す。 Traditional model-based image reconstruction (MBIR) methods combine forward and noise models with simple object priors. Recent application of deep learning methods for image reconstruction provides a successful data-driven approach to addressing the challenges when reconstructing images with undersampled measurements or various types of noise. In this work, we propose a hybrid supervised-unsupervised learning framework for X-ray computed tomography (CT) image reconstruction. The proposed learning formulation leverages both sparsity or unsupervised learning-based priors and neural network reconstructors to simulate a fixed-point iteration process. Each proposed trained block consists of a deterministic MBIR solver and a neural network. The information flows in parallel through these two reconstructors and is then optimally combined. Multiple such blocks are cascaded to form a reconstruction pipeline. We demonstrate the efficacy of this learned hybrid model for low-dose CT image reconstruction with limited training data, where we use the NIH AAPM Mayo Clinic Low Dose CT Grand Challenge dataset for training and testing. 
In our experiments, we study combinations of supervised deep network reconstructors and MBIR solvers with learned sparse-representation-based priors or analytical priors. Our results demonstrate the promising performance of the proposed framework compared to recent low-dose CT reconstruction methods.	翻訳日:2023-11-23 03:38:24 公開日:2023-11-19
# FDDM:周波数分離拡散モデルを用いた医用画像の教師なし翻訳 FDDM: Unsupervised Medical Image Translation with a Frequency-Decoupled Diffusion Model ( http://arxiv.org/abs/2311.12070v1 ) ライセンス: Link先を確認	Yunxiang Li, Hua-Chieh Shao, Xiaoxue Qian, You Zhang	(参考訳) 拡散モデルは、疾患の診断、局所化、治療を支援するために、医用画像翻訳のための高品質な画像を作成する大きな可能性を示している。しかしながら、現在の拡散モデルは、医学画像の解剖学的構造を正確に保存できる忠実な画像翻訳、特に障害のないデータセットの達成に限られている。構造的ミスマッチは疾患の誤認や治療ミスにつながるため、構造的および解剖学的詳細の保存は信頼できる診断と治療計画に不可欠である。本研究では,フーリエ領域の医療画像の周波数成分を翻訳過程で分離し,構造保存された高品質画像変換を可能にする新しいフレームワークである周波数分解拡散モデル(fddm)を導入した。 FDDMは、教師なしの周波数変換モジュールを適用して、ソースの医用画像を周波数固有出力に変換し、その後、周波数固有情報を使用して、最終ソースからターゲットへの画像変換のための次の拡散モデルを導出する。公開脳mriからctへの翻訳データセットを用いてfddmの広範な評価を行い,他のgan,vae,および拡散に基づくモデルよりも優れた性能を示した。 Frechet開始距離(FID)、ピーク信号-雑音比(PSNR)、構造類似度指標(SSIM)などの指標を評価した。 FDDMのFIDは29.88で、第2位の半分以下である。これらの結果から,FDDMは,翻訳された解剖学的構造の忠実さを維持しつつ,高リアルなターゲットドメイン画像の生成に優れていた。 Diffusion models have demonstrated significant potential in producing high-quality images for medical image translation to aid disease diagnosis, localization, and treatment. Nevertheless, current diffusion models have limited success in achieving faithful image translations that can accurately preserve the anatomical structures of medical images, especially for unpaired datasets. The preservation of structural and anatomical details is essential to reliable medical diagnosis and treatment planning, as structural mismatches can lead to disease misidentification and treatment errors. In this study, we introduced a frequency-decoupled diffusion model (FDDM), a novel framework that decouples the frequency components of medical images in the Fourier domain during the translation process, to allow structure-preserved high-quality image conversion. FDDM applies an unsupervised frequency conversion module to translate the source medical images into frequency-specific outputs and then uses the frequency-specific information to guide a following diffusion model for final source-to-target image translation. 
We conducted extensive evaluations of FDDM using a public brain MR-to-CT translation dataset, showing its superior performance against other GAN-, VAE-, and diffusion-based models. Metrics including the Fréchet inception distance (FID), the peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM) were assessed. FDDM achieves an FID of 29.88, less than half that of the second-best model. These results demonstrate FDDM's ability to generate highly realistic target-domain images while maintaining the faithfulness of translated anatomical structures.	翻訳日:2023-11-23 03:38:05 公開日:2023-11-19
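The FDDM abstract rests on separating an image into frequency components in the Fourier domain. The sketch below illustrates only that general idea on a one-dimensional toy signal with a hard low-pass/high-pass mask. The function names, the cutoff and the signal are assumptions for illustration; this is not the paper's frequency conversion module, which operates on 2-D medical images.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse DFT, returning the real part of each sample."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def split_frequencies(x, cutoff):
    """Split x into low- and high-frequency parts (low keeps |frequency| <= cutoff)."""
    n = len(x)
    coeffs = dft(x)
    # Indices k and n - k carry the same absolute frequency, so mask them together.
    low_mask = [min(k, n - k) <= cutoff for k in range(n)]
    low = idft([c if keep else 0 for c, keep in zip(coeffs, low_mask)])
    high = idft([0 if keep else c for c, keep in zip(coeffs, low_mask)])
    return low, high


signal = [0.0, 1.0, 4.0, 9.0, 16.0, 9.0, 4.0, 1.0]
low, high = split_frequencies(signal, cutoff=1)
print([round(v, 3) for v in low])
print([round(v, 3) for v in high])
```

Because the two masks are complementary, the low and high parts sum back to the original signal, so the split itself loses no information. With `cutoff=0` the low part reduces to the signal mean.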
# 協調基礎モデルによる新規物体検出の促進 Enhancing Novel Object Detection via Cooperative Foundational Models ( http://arxiv.org/abs/2311.12068v1 ) ライセンス: Link先を確認	Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan	(参考訳) 本稿では,新規物体検出(nod)の難解かつ創発的な問題に対処し,推論中の未知物体と新規物体のカテゴリの正確な検出に焦点をあてる。従来の物体検出アルゴリズムは本質的にクローズドセットであり、NODを扱う能力を制限する。本稿では,既存の閉集合検出器を開集合検出器に変換する新しい手法を提案する。この変換は、事前訓練された基礎モデル、特にCLIPとSAMの相補的な強みを協調的なメカニズムを通じて活用することで達成される。さらに,この機構をGDINOなどの最先端のオープンセット検出器と統合することにより,物体検出性能の新たなベンチマークを確立する。 LVISデータセット上の既知のオブジェクトに対して,新しいオブジェクト検出において17.42mAP,42.08mAPを達成する。 COCO OVDの分割にアプローチを適用すると、新しいクラスに対する7.2ドル \text{AP}_{50} のマージンで現在の最先端技術を上回っます。私たちのコードはhttps://github.com/rohit901/cooperative-foundational-modelsで利用可能です。 In this work, we address the challenging and emergent problem of novel object detection (NOD), focusing on the accurate detection of both known and novel object categories during inference. Traditional object detection algorithms are inherently closed-set, limiting their capability to handle NOD. We present a novel approach to transform existing closed-set detectors into open-set detectors. This transformation is achieved by leveraging the complementary strengths of pre-trained foundational models, specifically CLIP and SAM, through our cooperative mechanism. Furthermore, by integrating this mechanism with state-of-the-art open-set detectors such as GDINO, we establish new benchmarks in object detection performance. Our method achieves 17.42 mAP in novel object detection and 42.08 mAP for known objects on the challenging LVIS dataset. Adapting our approach to the COCO OVD split, we surpass the current state-of-the-art by a margin of 7.2 $ \text{AP}_{50} $ for novel classes. Our code is available at https://github.com/rohit901/cooperative-foundational-models .	翻訳日:2023-11-23 03:37:39 公開日:2023-11-19
# 質と量:ファッションデザインにおけるテキストから画像への合成のための100万枚の高品質画像 Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design ( http://arxiv.org/abs/2311.12067v1 ) ライセンス: Link先を確認	Jia Yu, Lichao Zhang, Zijie Chen, Fayu Pan, MiaoMiao Wen, Yuming Yan, Fangsheng Weng, Shuai Zhang, Lili Pan, Zhenzhong Lan	(参考訳) aiとファッションデザインの融合は有望な研究分野として現れてきた。しかし、衣料品や試着段階に関する広範な相互関連データが欠如していることは、この領域におけるAIの潜在能力を妨げている。そこで本研究では,複数年にわたる厳格な努力の成果であるファッション・ディフフュージョンデータセットを提案する。このデータセットは、100万以上の高品質なファッション画像で構成され、詳細なテキスト記述と組み合わせられている。さまざまな地理的位置と文化的背景から得られたデータセットは、世界的なファッショントレンドをカプセル化している。この画像には、衣服や人間に関連する細かい属性が刻まれており、ファッションデザインプロセスを単純化してテキスト・ツー・イメージ(T2I)タスクにしている。 Fashion-Diffusionデータセットは、高品質なテキストイメージペアと多様なヒューマンガーメントペアを提供するだけでなく、人間に関する大規模なリソースとしても機能し、T2I世代の研究を促進する。さらに、t2iベースのファッションデザイン分野における標準化を促進するために、ファッションデザインモデルの性能評価のための複数のデータセットからなる新しいベンチマークを提案する。この研究は、AI駆動のファッションデザインの領域における大きな飛躍であり、この分野における将来の研究のための新しい標準を確立している。 The fusion of AI and fashion design has emerged as a promising research area. However, the lack of extensive, interrelated data on clothing and try-on stages has hindered the full potential of AI in this domain. Addressing this, we present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort. This dataset, the first of its kind, comprises over a million high-quality fashion images, paired with detailed text descriptions. Sourced from a diverse range of geographical locations and cultural backgrounds, the dataset encapsulates global fashion trends. The images have been meticulously annotated with fine-grained attributes related to clothing and humans, simplifying the fashion design process into a Text-to-Image (T2I) task. The Fashion-Diffusion dataset not only provides high-quality text-image pairs and diverse human-garment pairs but also serves as a large-scale resource about humans, thereby facilitating research in T2I generation. 
Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models. This work represents a significant leap forward in the realm of AI-driven fashion design, setting a new standard for future research in this field.	翻訳日:2023-11-23 03:37:19 公開日:2023-11-19
# 大規模言語モデルエージェントを用いた少数ショット分類とセグメンテーション Few-Shot Classification & Segmentation Using Large Language Models Agent ( http://arxiv.org/abs/2311.12065v1 ) ライセンス: Link先を確認	Tian Meng, Yang Tao, Wuliang Yin	(参考訳) 少数ショット画像分類とセグメンテーション(FS-CS)のタスクは、ターゲットクラスのいくつかの例を考慮すれば、クエリ画像中のターゲットオブジェクトの分類とセグメンテーションを必要とする。本研究では,大規模言語モデル(LLM)をエージェントとして利用し,FS-CS問題にトレーニング不要で対処する手法を提案する。 LLMをタスクプランナーとし、既製のビジョンモデルをツールとすることで、画像レベルラベルのみを用いて対象オブジェクトの分類とセグメンテーションを行うことができる。具体的には、思考連鎖(chain-of-thought)プロンプトとインコンテキスト学習により、LLMは人間のようにサポート画像を観察する。Segment Anything Model (SAM) や GPT-4Vision といったビジョンモデルは、LLMが空間的および意味的情報を同時に理解するのを支援する。最終的に、LLMはその要約と推論機能を使用して、クエリイメージの分類とセグメント化を行う。提案手法のモジュラーフレームワークにより拡張が容易になる。提案手法はPascal-5iデータセットの最先端性能を実現する。 The task of few-shot image classification and segmentation (FS-CS) requires the classification and segmentation of target objects in a query image, given only a few examples of the target classes. We introduce a method that utilises large language models (LLM) as an agent to address the FS-CS problem in a training-free manner. By making the LLM the task planner and off-the-shelf vision models the tools, the proposed method is capable of classifying and segmenting target objects using only image-level labels. Specifically, chain-of-thought prompting and in-context learning guide the LLM to observe support images like a human; vision models such as Segment Anything Model (SAM) and GPT-4Vision help the LLM understand spatial and semantic information at the same time. Ultimately, the LLM uses its summarizing and reasoning capabilities to classify and segment the query image. The proposed method's modular framework makes it easily extendable. Our approach achieves state-of-the-art performance on the Pascal-5i dataset.	翻訳日:2023-11-23 03:36:57 公開日:2023-11-19
# 本当にサイコロが必要なの? セグメンテーション損失の隠れた地域規模バイアス Do We Really Need Dice? The Hidden Region-Size Biases of Segmentation Losses ( http://arxiv.org/abs/2104.08717v4 ) ライセンス: Link先を確認	Bingyuan Liu, Jose Dolz, Adrian Galdran, Riadh Kobbi, Ismail Ben Ayed	(参考訳) ほとんどのセグメンテーション損失はクロスエントロピー(CE)またはディース損失の変種である。表面的には、これら2つの損失のカテゴリは無関係に見え、どのカテゴリがより良い選択であるかについての明確なコンセンサスはなく、それぞれのベンチマークやアプリケーションのパフォーマンスが異なる。さらに、Dice と CE は相補的であり、複合 CE-Dice の損失を動機付けていると広く主張されている。本研究では,CE と Dice が従来考えられていたよりもはるかに深い関係を持つことを示す理論解析を行う。まず, 制約最適化の観点からは, 2つの要素,すなわち, 予測された前景領域を接地構造へ押し上げる類似の接地構造マッチング項と, 予測された領域の大きさに異なるバイアスを与える地域規模ペナルティ項に分解することを示した。 Diceは特定の極端に不均衡な解に対して本質的な偏りを持ち、CEは暗黙的に地平線領域の比例を奨励する。以上の結果から,dice損失が不均衡分節化に改善をもたらす医学的画像化文献における広範な実験的な証拠を説明する。理論解析に基づいて,領域サイズのバイアスを明示的に制御できる,原理的かつ簡単な解法を提案する。提案手法は,CE を L1 あるいは KL の発散に基づく明示的な用語と統合し,対象のクラス比にマッチする分節領域比を奨励し,クラス不均衡を緩和するが,一般性を損なうことはない。異なる損失と応用に関する包括的実験とアブレーションの研究は、我々の理論解析と、明示的かつ単純な領域サイズ項の有効性を検証する。 Most segmentation losses are arguably variants of the Cross-Entropy (CE) or Dice losses. On the surface, these two categories of losses seem unrelated, and there is no clear consensus as to which category is a better choice, with varying performances for each across different benchmarks and applications. Furthermore, it is widely argued within the medical-imaging community that Dice and CE are complementary, which has motivated the use of compound CE-Dice losses. In this work, we provide a theoretical analysis, which shows that CE and Dice share a much deeper connection than previously thought. First, we show that, from a constrained-optimization perspective, they both decompose into two components, i.e., a similar ground-truth matching term, which pushes the predicted foreground regions towards the ground-truth, and a region-size penalty term imposing different biases on the size of the predicted regions. 
Then, we provide bound relationships and an information-theoretic analysis, which uncover hidden region-size biases: Dice has an intrinsic bias towards specific extremely imbalanced solutions, whereas CE implicitly encourages the ground-truth region proportions. Our theoretical results explain the wide experimental evidence in the medical-imaging literature, whereby Dice losses bring improvements for imbalanced segmentation. Based on our theoretical analysis, we propose a principled and simple solution that enables explicit control of the region-size bias. The proposed method integrates CE with explicit terms based on the L1 distance or the KL divergence, which encourage the segmented region proportions to match target class proportions, thereby mitigating class imbalance without losing generality. Comprehensive experiments and ablation studies over different losses and applications validate our theoretical analysis, as well as the effectiveness of explicit and simple region-size terms.	翻訳日:2023-11-22 21:38:28 公開日:2023-11-19
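The solution described in the entry above, cross-entropy plus an explicit region-size term, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name, the penalty weight, the direction of the KL divergence (target proportions against predicted proportions) and the toy inputs are all hypothetical, and the authors' exact formulation may differ.

```python
import math


def ce_with_region_prior(probs, labels, target_props, weight=0.1, eps=1e-12):
    """Cross-entropy plus a KL penalty on predicted class proportions.

    probs        -- per-pixel class probabilities, e.g. [[0.9, 0.1], ...]
    labels       -- per-pixel ground-truth class indices
    target_props -- desired fraction of pixels per class (sums to 1)
    """
    n = len(probs)
    k = len(target_props)
    # Ground-truth matching term: mean negative log-likelihood of the true class.
    ce = -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / n
    # Predicted region proportion per class: mean soft prediction over pixels.
    pred_props = [sum(p[c] for p in probs) / n for c in range(k)]
    # Region-size term: KL(target || predicted), zero when proportions match.
    kl = sum(t * math.log((t + eps) / (q + eps))
             for t, q in zip(target_props, pred_props) if t > 0)
    return ce + weight * kl, pred_props


probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [0, 0, 1, 1]
loss, pred_props = ce_with_region_prior(probs, labels, target_props=[0.5, 0.5])
print(loss, pred_props)
```

In this toy case the predicted foreground proportion is 0.35 against a target of 0.5, so the KL term is positive and pushes the prediction towards the target region size independently of the per-pixel term.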
# テキスト上の数値推論のための質問方向グラフ注意ネットワーク Question Directed Graph Attention Network for Numerical Reasoning over Text ( http://arxiv.org/abs/2009.07448v2 ) ライセンス: Link先を確認	Kunlong Chen, Weidi Xu, Xingyi Cheng, Zou Xiaochuan, Yuyu Zhang, Le Song, Taifeng Wang, Yuan Qi, Wei Chu	(参考訳) 追加、減算、ソート、カウントなどのテキストに対する数値推論は、自然言語の理解と算術計算の両方を必要とするため、機械読解の難しい課題である。この課題に対処するために,このような推論に必要な経過と質問の文脈に対する不均質なグラフ表現を提案し,このコンテキストグラフ上で多段階の数値推論を駆動する質問指向グラフアテンションネットワークを設計する。コードリンクは:https://github.com/emnlp2020qdgat/QDGAT Numerical reasoning over texts, such as addition, subtraction, sorting and counting, is a challenging machine reading comprehension task, since it requires both natural language understanding and arithmetic computation. To address this challenge, we propose a heterogeneous graph representation for the context of the passage and question needed for such reasoning, and design a question directed graph attention network to drive multi-step numerical reasoning over this context graph. The code link is at: https://github.com/emnlp2020qdgat/QDGAT	翻訳日:2023-11-22 21:36:17 公開日:2023-11-19
# データ駆動戦略における認識的不確実性と認識の表現 Representations of epistemic uncertainty and awareness in data-driven strategies ( http://arxiv.org/abs/2110.11482v7 ) ライセンス: Link先を確認	Mario Angelelli, Massimiliano Gervasi	(参考訳) aiとビッグデータの拡散は、意思決定を支援する情報量を増やしながら、データや実証的な証拠との直接的なインタラクションを削減し、意思決定プロセスを再構築している。このパラダイムシフトは、データオブザーバビリティの制限があいまいさと解釈性の欠如をもたらすため、新しい不確実性源を導入する。データ駆動戦略の適切な分析の必要性は、知識へのこの種の境界付きアクセスを記述できる新しいモデルの探索を動機付ける。この貢献は、知識表現の不確実性とそのエージェントによる伝達に関する新しい理論モデルを示す。モデルの比較と結合のための構造を内挿することで、知識状態の動的記述を提供する。具体的には、更新は組み合わせによって表現され、その説明可能性は異なる次元表現における一貫性に基づいている。我々は、推論、選好関係、情報尺度の多重性の観点から、非等価な知識表現を考察する。さらに,非古典的不確実性(エルスバーグのモデル)と,他のエージェントがデータ(ウィグナーの友人)を観察することによる知識の推論という2つのシナリオとの形式的類似性を定義する。最後に,提案モデルがデータ駆動戦略に与える影響について考察し,ビジネス価値次元の不確実性に基づく推論と,その評価のための計測ツールの設計に注目する。 The diffusion of AI and big data is reshaping decision-making processes by increasing the amount of information that supports decisions while reducing direct interaction with data and empirical evidence. This paradigm shift introduces new sources of uncertainty, as limited data observability results in ambiguity and a lack of interpretability. The need for the proper analysis of data-driven strategies motivates the search for new models that can describe this type of bounded access to knowledge. This contribution presents a novel theoretical model for uncertainty in knowledge representation and its transfer mediated by agents. We provide a dynamical description of knowledge states by endowing our model with a structure to compare and combine them. Specifically, an update is represented through combinations, and its explainability is based on its consistency in different dimensional representations. We look at inequivalent knowledge representations in terms of multiplicity of inferences, preference relations, and information measures. 
Furthermore, we define a formal analogy with two scenarios that illustrate non-classical uncertainty in terms of ambiguity (Ellsberg's model) and reasoning about knowledge mediated by other agents observing data (Wigner's friend). Finally, we discuss some implications of the proposed model for data-driven strategies, with special attention to reasoning under uncertainty about business value dimensions and the design of measurement tools for their assessment.	翻訳日:2023-11-22 21:27:10 公開日:2023-11-19
# 教師なし画像アニメーションにおける微細粒運動変形の微分運動進化 Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation ( http://arxiv.org/abs/2110.04658v2 ) ライセンス: Link先を確認	Peirong Liu, Rui Wang, Xuefei Cao, Yipin Zhou, Ashish Shah, Ser-Nam Lim	(参考訳) 画像アニメーション(英: Image animation)とは、ソース画像中の特定のオブジェクトに駆動ビデオの動きを転送するタスクである。近年では、ラベル付きデータやドメイン先行を必要とせず、教師なしのモーション転送において大きな進歩が見られるが、現在の教師なしアプローチの多くは、ソースと駆動ドメインの間に大きな動き/ビューの相違が生じても、動きの変形を捉えるのに苦慮している。このような条件下では、動き場を適切に捉えるのに十分な情報がないだけである。動き推定のための微分精細化を統合したエンドツーエンドの教師なし動き伝達フレームワークであるdime (differential motion evolution) を紹介する。主な発見は,(1)常微分方程式(ODE)で運動伝達を捉えることにより,運動場を規則化し,(2)原画像自体を利用することで,大きな運動変化による閉塞/欠落領域を塗布することができる。さらに、ビュー毎にODEをモデル化することで、DMEはソースオブジェクトの複数の異なるビューを簡単に活用できるというODEの考え方の自然な拡張も提案する。 9つのベンチマークにわたる広範囲な実験により、dimeは最先端のオブジェクトよりもかなりのマージンを上回り、目に見えないオブジェクトをより一般化しています。 Image animation is the task of transferring the motion of a driving video to a given object in a source image. While great progress has recently been made in unsupervised motion transfer, requiring no labeled data or domain priors, many current unsupervised approaches still struggle to capture the motion deformations when large motion/view discrepancies occur between the source and driving domains. Under such conditions, there is simply not enough information to capture the motion field properly. We introduce DiME (Differential Motion Evolution), an end-to-end unsupervised motion transfer framework integrating differential refinement for motion estimation. Key findings are twofold: (1) by capturing the motion transfer with an ordinary differential equation (ODE), it helps to regularize the motion field, and (2) by utilizing the source image itself, we are able to inpaint occluded/missing regions arising from large motion changes. Additionally, we also propose a natural extension to the ODE idea, which is that DiME can easily leverage multiple different views of the source object whenever they are available by modeling an ODE per view. 
Extensive experiments across 9 benchmarks show that DiME outperforms state-of-the-art methods by a significant margin and generalizes much better to unseen objects.	翻訳日:2023-11-22 21:26:47 公開日:2023-11-19
# 確率のない頻繁な推論:信頼性のあるシミュレータに基づく推論のための古典統計と機械学習の橋渡し Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning for Reliable Simulator-Based Inference ( http://arxiv.org/abs/2107.03920v8 ) ライセンス: Link先を確認	Niccol\`o Dalmasso, Luca Masserano, David Zhao, Rafael Izbicki, Ann B. Lee	(参考訳) 科学の多くの分野は、複雑なシステムの難解な可能性関数を暗黙的にエンコードするコンピュータシミュレータを多用している。古典的な統計手法は、いわゆる「可能性のない推論(LFI)」設定、特に漸近的および低次元のレジームの外では不適当である。同時に、近似ベイズ計算やより最近の機械学習技術のような従来のlfi法は、一般的な設定(高次元データ、有限サンプルサイズ、任意のパラメータ値)において名目カバレッジを持つ信頼セットを保証しない。さらに、パラメータ空間全体にわたってそのような手法によって提供される信頼セットの実証的カバレッジを確認するための診断ツールも存在しない。本研究では,古典統計と現代の機械学習提供を橋渡しする統一的モジュール型推論フレームワークを提案する。 (i)未知のパラメータの任意の値に対して、頻繁な有限サンプル被覆を持つ信頼集合のニーマン構成への実践的アプローチ (ii)パラメータ空間全体の経験的カバレッジを推定する解釈可能な診断。一般のフレームワークを、LF2I ( chance-free frequentist inference) と呼ぶ。テスト統計を定義する任意のメソッドはLF2Iを利用して、固定パラメータ設定のモンテカルロサンプルを犠牲にすることなく、有効な信頼セットと診断を作成することができる。 2つの確率ベーステスト統計(acoreとbff)のパワーを調査し,その経験的性能を高次元複雑なデータで実証する。コードはhttps://github.com/lee-group-cmu/lf2iで入手できる。 Many areas of science make extensive use of computer simulators that implicitly encode intractable likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, especially outside asymptotic and low-dimensional regimes. At the same time, traditional LFI methods - such as Approximate Bayesian Computation or more recent machine learning techniques - do not guarantee confidence sets with nominal coverage in general settings (i.e., with high-dimensional data, finite sample sizes, and for any parameter value). In addition, there are no diagnostic tools to check the empirical coverage of confidence sets provided by such methods across the entire parameter space. 
In this work, we propose a unified and modular inference framework that bridges classical statistics and modern machine learning providing (i) a practical approach to the Neyman construction of confidence sets with frequentist finite-sample coverage for any value of the unknown parameters; and (ii) interpretable diagnostics that estimate the empirical coverage across the entire parameter space. We refer to the general framework as likelihood-free frequentist inference (LF2I). Any method that defines a test statistic can leverage LF2I to create valid confidence sets and diagnostics without costly Monte Carlo samples at fixed parameter settings. We study the power of two likelihood-based test statistics (ACORE and BFF) and demonstrate their empirical performance on high-dimensional, complex data. Code is available at https://github.com/lee-group-cmu/lf2i.	翻訳日:2023-11-22 21:25:09 公開日:2023-11-19
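The Neyman construction mentioned in the entry above builds a confidence set by inverting a family of hypothesis tests, one per candidate parameter value. The toy sketch below does this for the mean of a Gaussian with known unit variance. It estimates each critical value by brute-force Monte Carlo at fixed parameter values, which is exactly the cost the abstract says LF2I avoids, so it illustrates the construction only and not the LF2I method. All names, the grid and the sample sizes are assumptions.

```python
import random
import statistics


def critical_value(theta, n, alpha, n_sim, rng):
    """Monte Carlo estimate of the (1 - alpha) quantile of the test statistic under theta."""
    stats = []
    for _ in range(n_sim):
        sample_mean = statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))
        stats.append(abs(sample_mean - theta) * n ** 0.5)
    stats.sort()
    return stats[int((1 - alpha) * n_sim) - 1]


def confidence_set(observed_mean, n, grid, alpha=0.05, n_sim=500, seed=0):
    """Keep every theta on the grid whose level-alpha test does not reject the data."""
    rng = random.Random(seed)
    accepted = []
    for theta in grid:
        stat = abs(observed_mean - theta) * n ** 0.5
        if stat <= critical_value(theta, n, alpha, n_sim, rng):
            accepted.append(theta)
    return accepted


grid = [i / 10 for i in range(-10, 21)]  # candidate means from -1.0 to 2.0
cs = confidence_set(observed_mean=0.3, n=25, grid=grid)
print(min(cs), max(cs))
```

For this simple model the accepted values should sit close to the textbook interval of the observed mean plus or minus 1.96 divided by the square root of n, up to Monte Carlo noise and the grid resolution.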
# 3次元小分子と高分子錯体のための効率的かつ正確な物理量認識多重グラフニューラルネットワーク Efficient and Accurate Physics-aware Multiplex Graph Neural Networks for 3D Small Molecules and Macromolecule Complexes ( http://arxiv.org/abs/2206.02789v3 ) ライセンス: Link先を確認	Shuo Zhang, Yang Liu, Lei Xie	(参考訳) グラフニューラルネットワーク(GNN)を分子科学に適用する最近の進歩は、3次元3次元構造表現をGNNで学習する能力を示している。しかし、既存のGNNのほとんどは、多様な相互作用のモデリング不足、計算コストの高い演算、ベクトル値の無知の限界に悩まされている。そこで我々は,新しいGNNモデルである物理対応多重グラフニューラルネットワーク(PaxNet)を提案し,小さな有機化合物とマクロ分子複合体の3次元分子の表現を効率的かつ正確に学習する。 PaxNetは、分子力学にインスパイアされた局所的および非局所的な相互作用のモデリングを分離し、高価な角度関連計算を減らす。スカラー特性の他に、paxnetは各原子の関連するベクトルを学習することでベクトル特性を予測できる。 PaxNetの性能を評価するために,2つのタスクにおける最先端のベースラインと比較する。量子化学特性を予測するための小さな分子データセットでは、PaxNetは予測誤差を15%削減し、最高のベースラインよりも73%少ないメモリを使用する。タンパク質-リガンド結合親和性を予測するマクロ分子データセットでは、PaxNetはメモリ消費を33%減らし、推論時間を85%減らしながら、最高のベースラインを上回っている。したがって、PaxNetは分子の大規模機械学習のための普遍的で堅牢で正確な方法を提供する。私たちのコードはhttps://github.com/zetayue/Physics-aware-Multiplex-GNNで利用可能です。 Recent advances in applying Graph Neural Networks (GNNs) to molecular science have showcased the power of learning three-dimensional (3D) structure representations with GNNs. However, most existing GNNs suffer from the limitations of insufficient modeling of diverse interactions, computational expensive operations, and ignorance of vectorial values. Here, we tackle these limitations by proposing a novel GNN model, Physics-aware Multiplex Graph Neural Network (PaxNet), to efficiently and accurately learn the representations of 3D molecules for both small organic compounds and macromolecule complexes. PaxNet separates the modeling of local and non-local interactions inspired by molecular mechanics, and reduces the expensive angle-related computations. Besides scalar properties, PaxNet can also predict vectorial properties by learning an associated vector for each atom. To evaluate the performance of PaxNet, we compare it with state-of-the-art baselines in two tasks. 
On a small-molecule dataset for predicting quantum chemical properties, PaxNet reduces the prediction error by 15% and uses 73% less memory than the best baseline. On a macromolecule dataset for predicting protein-ligand binding affinities, PaxNet outperforms the best baseline while reducing the memory consumption by 33% and the inference time by 85%. Thus, PaxNet provides a universal, robust and accurate method for large-scale machine learning of molecules. Our code is available at https://github.com/zetayue/Physics-aware-Multiplex-GNN.	翻訳日:2023-11-22 21:15:28 公開日:2023-11-19
# 追跡と画像・映像物体検出の統一 Unifying Tracking and Image-Video Object Detection ( http://arxiv.org/abs/2211.11077v2 ) ライセンス: Link先を確認	Peirong Liu, Rui Wang, Pengchuan Zhang, Omid Poursaeed, Yipin Zhou, Xuefei Cao, Sreya Dutta Roy, Ashish Shah, Ser-Nam Lim	(参考訳) 物体検出(OD)はコンピュータビジョンにおける最も基本的なタスクの1つである。近年のディープラーニングの進歩により、画像ODのパフォーマンスは学習ベースのデータ駆動アプローチによって新たな高みへと押し上げられている。一方、ビデオODは、より高価なデータアノテーションのニーズのために、あまり探求されていない。同時に、トラックの同一性や時空間軌跡の推論を必要とするマルチオブジェクト追跡(MOT)も、ビデオODと類似の精神を共有している。しかし、ほとんどのMOTデータセットはクラス固有(例えば、人物のみにアノテーションが付与されている)であり、モデルが他のオブジェクトを追跡する柔軟性を制約している。本稿では、画像OD、ビデオOD、MOTを1つのエンドツーエンドモデルで統合する最初のフレームワークであるTrIVD(Tracking and Image-Video Detection)を提案する。データセット間のカテゴリラベルの相違やセマンティックな重複に対処するため、TrIVDは検出/追跡をグラウンディングとして定式化し、視覚とテキストのアライメントを介して物体カテゴリを推論する。統合された定式化により、クロスデータセット、マルチタスクのトレーニングが可能になり、TrIVDにフレームレベルの特徴、ビデオレベルの時空間関係、およびアイデンティティの関連性を追跡することができる。このような共同トレーニングにより、よりリッチなオブジェクトカテゴリアノテーションを備えたODデータからの知識をMOTに拡張し、ゼロショット追跡機能を実現することができます。実験により、マルチタスクで訓練されたTrIVDは、すべての画像/ビデオODおよびMOTタスクでシングルタスクベースラインを上回っていることが示された。さらに、ゼロショットトラッキングという新しいタスクに、最初のベースラインを設定します。 Object detection (OD) has been one of the most fundamental tasks in computer vision. Recent developments in deep learning have pushed the performance of image OD to new heights by learning-based, data-driven approaches. On the other hand, video OD remains less explored, mostly due to much more expensive data annotation needs. At the same time, multi-object tracking (MOT), which requires reasoning about track identities and spatio-temporal trajectories, shares a similar spirit with video OD. However, most MOT datasets are class-specific (e.g., person-annotated only), which constrains a model's flexibility to perform tracking on other objects. We propose TrIVD (Tracking and Image-Video Detection), the first framework that unifies image OD, video OD, and MOT within one end-to-end model. 
To handle the discrepancies and semantic overlaps of category labels across datasets, TrIVD formulates detection/tracking as grounding and reasons about object categories via visual-text alignments. The unified formulation enables cross-dataset, multi-task training, and thus equips TrIVD with the ability to leverage frame-level features, video-level spatio-temporal relations, as well as track identity associations. With such joint training, we can now extend the knowledge from OD data, which comes with much richer object category annotations, to MOT and achieve zero-shot tracking capability. Experiments demonstrate that multi-task co-trained TrIVD outperforms single-task baselines across all image/video OD and MOT tasks. We further set the first baseline on the new task of zero-shot tracking.	翻訳日:2023-11-22 20:51:44 公開日:2023-11-19
# MLIC:学習画像圧縮のためのマルチ参照エントロピーモデル MLIC: Multi-Reference Entropy Model for Learned Image Compression ( http://arxiv.org/abs/2211.07273v8 ) ライセンス: Link先を確認	Wei Jiang, Jiayu Yang, Yongqi Zhai, Peirong Ning, Feng Gao, Ronggang Wang	(参考訳) 近年,学習型画像圧縮の性能は著しく向上している。潜在表現の分布を推定するエントロピーモデルは、レート歪み性能の向上に重要な役割を果たしている。しかし、ほとんどのエントロピーモデルは1次元の相関のみを捉えるが、潜在表現はチャネル方向、局所空間、大域的な空間相関を含む。この問題に対処するため、Multi-Reference Entropy Model (MEM) と高度なバージョンMEM$^+$を提案する。これらのモデルは潜在表現に存在する異なる種類の相関を捉える。具体的には、まず潜在表現をスライスに分割する。現在のスライスを復号する際には、予め復号されたスライスをコンテキストとして使用し、それまでのスライスのアテンションマップを用いて、現在のスライスにおける大域的相関を予測する。ローカルコンテキストをキャプチャするために,性能劣化を回避する2つの拡張チェッカーボードコンテキストキャプチャ技術を導入する。 MEM と MEM$^+$ に基づいて,画像圧縮モデル MLIC と MLIC$^+$ を提案する。我々のMLICおよびMLIC$^+$モデルは最先端の性能を達成し、PSNRで測定した場合、VTM-17.0と比較してKodakデータセット上でのBDレートを$8.05\%$および$11.39\%$削減する。私たちのコードはhttps://github.com/JiangWeibeta/MLICで利用可能です。 Recently, learned image compression has achieved remarkable performance. The entropy model, which estimates the distribution of the latent representation, plays a crucial role in boosting rate-distortion performance. However, most entropy models only capture correlations in one dimension, while the latent representation contains channel-wise, local spatial, and global spatial correlations. To tackle this issue, we propose the Multi-Reference Entropy Model (MEM) and the advanced version, MEM$^+$. These models capture the different types of correlations present in latent representation. Specifically, we first divide the latent representation into slices. When decoding the current slice, we use previously decoded slices as context and employ the attention map of the previously decoded slice to predict global correlations in the current slice. To capture local contexts, we introduce two enhanced checkerboard context capturing techniques that avoid performance degradation. Based on MEM and MEM$^+$, we propose image compression models MLIC and MLIC$^+$. 
Extensive experimental evaluations demonstrate that our MLIC and MLIC$^+$ models achieve state-of-the-art performance, reducing BD-rate by $8.05\%$ and $11.39\%$ on the Kodak dataset compared to VTM-17.0 when measured in PSNR. Our code is available at https://github.com/JiangWeibeta/MLIC.	翻訳日:2023-11-22 20:50:12 公開日:2023-11-19
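The checkerboard context mentioned in the MLIC entry above refers to a two-pass decoding order over the latent grid. In the basic scheme, half of the positions (the anchors) are decoded first, and the other half are then decoded using their already-decoded anchor neighbours as local context. The sketch below shows only this basic partition. The function names and grid size are assumptions, and the paper's two enhanced checkerboard variants are not reproduced here.

```python
def checkerboard_masks(height, width):
    """Return (anchor, non_anchor) boolean masks for a height x width latent grid."""
    anchor = [[(r + c) % 2 == 0 for c in range(width)] for r in range(height)]
    non_anchor = [[not a for a in row] for row in anchor]
    return anchor, non_anchor


def context_positions(r, c, anchor):
    """In-bounds 4-neighbours of (r, c) that are anchors, i.e. decoded in the first pass."""
    height, width = len(anchor), len(anchor[0])
    neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in neighbours
            if 0 <= i < height and 0 <= j < width and anchor[i][j]]


anchor, non_anchor = checkerboard_masks(4, 4)
for row in anchor:
    print("".join("A" if a else "." for a in row))
print(context_positions(1, 2, anchor))
```

Every non-anchor position has all of its in-bounds 4-neighbours available as context, which is why the second pass can run in parallel rather than serially, position by position.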
# Obfuscation of Pseudo-Deterministic Quantum Circuits ( http://arxiv.org/abs/2302.11083v3 ) License: see link	James Bartusek, Fuyuki Kitagawa, Ryo Nishimaki, and Takashi Yamakawa	We show how to obfuscate pseudo-deterministic quantum circuits in the classical oracle model, assuming the quantum hardness of learning with errors. Given the classical description of a quantum circuit $Q$, our obfuscator outputs a quantum state $\ket{\widetilde{Q}}$ that can be used to evaluate $Q$ repeatedly on arbitrary inputs. Instantiating the classical oracle using any candidate post-quantum indistinguishability obfuscator gives us the first candidate construction of indistinguishability obfuscation for all polynomial-size pseudo-deterministic quantum circuits. In particular, our scheme is the first candidate obfuscator for a class of circuits that is powerful enough to implement Shor's algorithm (SICOMP 1997). Our approach follows Bartusek and Malavolta (ITCS 2022), who obfuscate \emph{null} quantum circuits by obfuscating the verifier of an appropriate classical verification of quantum computation (CVQC) scheme. We go beyond null circuits by constructing a publicly-verifiable CVQC scheme for quantum \emph{partitioning} circuits, which can be used to verify the evaluation procedure of Mahadev's quantum fully-homomorphic encryption scheme (FOCS 2018). We achieve this by upgrading the one-time secure scheme of Bartusek (TCC 2021) to a fully reusable scheme, via a publicly-decodable \emph{Pauli functional commitment}, which we formally define and construct in this work. This commitment scheme, which satisfies a notion of binding against committers that can access the receiver's standard and Hadamard basis decoding functionalities, is constructed by building on techniques of Amos, Georgiou, Kiayias, and Zhandry (STOC 2020) introduced in the context of equivocal but collision-resistant hash functions.	Translated: 2023-11-22 20:25:51 Published: 2023-11-19
# Learning to Initialize: Can Meta Learning Improve Cross-task Generalization in Prompt Tuning? ( http://arxiv.org/abs/2302.08143v3 ) License: see link	Chengwei Qin, Qian Li, Ruochen Zhao, Shafiq Joty	Prompt tuning (PT), which tunes only the embeddings of an additional sequence of tokens per task while keeping the pre-trained language model (PLM) frozen, has shown remarkable performance in few-shot learning. Despite this, PT has been shown to rely heavily on good initialization of the prompt embeddings. In this work, we study meta prompt tuning (MPT) to systematically explore how meta-learning can help improve (if it can) cross-task generalization in PT through learning to initialize the prompt embeddings from other relevant tasks. We empirically analyze a representative set of meta learning algorithms in a wide range of adaptation settings with different source/target task configurations on a large set of few-shot tasks. With extensive experiments and analysis, we demonstrate the effectiveness of MPT. We find the improvement to be significant particularly on classification tasks. For other kinds of tasks such as question answering, we observe that while MPT can outperform PT in most cases, it does not always outperform multi-task learning. We further provide an in-depth analysis from the perspective of task similarity.	Translated: 2023-11-22 20:24:50 Published: 2023-11-19
# Is ChatGPT a General-Purpose Natural Language Processing Task Solver? ( http://arxiv.org/abs/2302.06476v3 ) License: see link	Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, Diyi Yang	Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot -- i.e., without adaptation on downstream data. Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community because it can generate high-quality responses to human input and self-correct previous mistakes based on subsequent conversations. However, it is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot. In this work, we empirically analyze the zero-shot learning ability of ChatGPT by evaluating it on 20 popular NLP datasets covering 7 representative task categories. With extensive empirical studies, we demonstrate both the effectiveness and limitations of the current version of ChatGPT. We find that ChatGPT performs well on many tasks favoring reasoning capabilities (e.g., arithmetic reasoning) while it still faces challenges when solving specific tasks such as sequence tagging. We additionally provide in-depth analysis through qualitative case studies.	Translated: 2023-11-22 20:23:22 Published: 2023-11-19
# Engineering Arbitrary Hamiltonians in Phase Space ( http://arxiv.org/abs/2302.04257v3 ) License: see link	Lingzhen Guo and Vittorio Peano	We introduce a general method to engineer arbitrary Hamiltonians in the Floquet phase space of a periodically driven oscillator, based on the non-commutative Fourier transformation (NcFT) technique. We establish the relationship between an arbitrary target Floquet Hamiltonian in phase space and the periodic driving potential in real space. We obtain analytical expressions for the driving potentials in real space that can generate novel Hamiltonians in phase space, e.g., rotational lattices and sharp-boundary wells. Our protocol can be realised in a range of experimental platforms for nonclassical state generation and bosonic quantum computation.	Translated: 2023-11-22 20:22:11 Published: 2023-11-19
# Stronger EPR-steering criterion based on inferred Schrodinger-Robertson uncertainty relation ( http://arxiv.org/abs/2303.11914v3 ) License: see link	Laxmi Prasad Naik, Rakesh Mohan Das, Prasanta K. Panigrahi	Steering is one of the three inequivalent forms of nonlocal correlations, intermediate between Bell nonlocality and entanglement. The Schrodinger-Robertson uncertainty relation (SRUR) has been widely used to detect entanglement and steering. However, the steering criteria in earlier works based on SRUR did not involve the complete inferred-variance uncertainty relation. In this paper, by considering the local hidden state model and the Reid formalism, we derive a complete inferred-variance EPR-steering criterion based on SRUR in the bipartite scenario. Furthermore, we check the effectiveness of our steering criterion with discrete-variable bipartite two-qubit and two-qutrit isotropic states.	Translated: 2023-11-22 20:15:06 Published: 2023-11-19
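For readers unfamiliar with the relation named in the abstract above, the standard textbook form of the Schrodinger-Robertson uncertainty relation for two observables $A$ and $B$ is given below. This is the ordinary, non-inferred relation. The inferred-variance version and the steering criterion built on it are specific to the paper and are not reproduced here.

```latex
\[
(\Delta A)^2 \, (\Delta B)^2 \;\ge\;
\left| \tfrac{1}{2} \langle \{A, B\} \rangle - \langle A \rangle \langle B \rangle \right|^2
+ \left| \tfrac{1}{2i} \langle [A, B] \rangle \right|^2
\]
```

Dropping the first (anticommutator) term on the right-hand side gives the weaker Heisenberg-Robertson relation.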
# Many learning agents interacting with an agent-based market model ( http://arxiv.org/abs/2303.07393v3 ) License: see link	Matthew Dicks, Andrew Paskaramoorthy, Tim Gebbie	We consider the dynamics and the interactions of multiple reinforcement learning optimal execution trading agents interacting with a reactive Agent-Based Model (ABM) of a financial market in event time. The model represents a market ecology with three trophic levels represented by: optimal execution learning agents, minimally intelligent liquidity takers, and fast electronic liquidity providers. The optimal execution agent classes include buying and selling agents that can either use a combination of limit orders and market orders, or only trade using market orders. The reward function explicitly balances trade execution slippage against the penalty of not executing the order timeously. This work demonstrates how multiple competing learning agents impact a minimally intelligent market simulation as functions of the number of agents, the size of agents' initial orders, and the state spaces used for learning. We use phase space plots to examine the dynamics of the ABM when various specifications of learning agents are included. Further, we examine whether the inclusion of optimal execution agents that can learn is able to produce dynamics with the same complexity as empirical data. We find that the inclusion of optimal execution agents changes the stylised facts produced by the ABM to conform more closely with empirical data, and that such agents are a necessary inclusion for ABMs investigating market micro-structure. However, adding execution agents to chartist-fundamentalist-noise ABMs is insufficient to recover the complexity observed in empirical data.	Translated: 2023-11-22 20:13:51 Published: 2023-11-19
# SimuQ: A Framework for Programming Quantum Hamiltonian Simulation with Analog Compilation ( http://arxiv.org/abs/2303.02775v3 ) License: see link	Yuxiang Peng, Jacob Young, Pengyu Liu, Xiaodi Wu	Quantum Hamiltonian simulation, which simulates the evolution of quantum systems and probes quantum phenomena, is one of the most promising applications of quantum computing. Recent experimental results suggest that Hamiltonian-oriented analog quantum simulation would be advantageous over circuit-oriented digital quantum simulation in the Noisy Intermediate-Scale Quantum (NISQ) machine era. However, programming analog quantum simulators is much more challenging due to the lack of a unified interface between hardware and software. In this paper, we design and implement SimuQ, the first framework for quantum Hamiltonian simulation that supports Hamiltonian programming and pulse-level compilation to heterogeneous analog quantum simulators. Specifically, in SimuQ, front-end users specify the target quantum system with Hamiltonian Modeling Language, and the Hamiltonian-level programmability of analog quantum simulators is specified through a new abstraction called the abstract analog instruction set (AAIS) and programmed in AAIS Specification Language by hardware providers. Through a solver-based compilation, SimuQ generates executable pulse schedules for real devices to simulate the evolution of desired quantum systems, which is demonstrated on superconducting (IBM), neutral-atom (QuEra), and trapped-ion (IonQ) quantum devices. Moreover, we demonstrate the advantages of exposing the Hamiltonian-level programmability of devices with native operations or interaction-based gates and establish a small benchmark of quantum simulation to evaluate SimuQ's compiler with the above analog quantum simulators.	Translated: 2023-11-22 20:11:05 Published: 2023-11-19
# IRFL: Image Recognition of Figurative Language ( http://arxiv.org/abs/2303.15445v2 ) License: see link	Ron Yosef, Yonatan Bitton, Dafna Shahaf	Figures of speech such as metaphors, similes, and idioms are integral parts of human communication. They are ubiquitous in many forms of discourse, allowing people to convey complex, abstract ideas and evoke emotion. As figurative forms are often conveyed through multiple modalities (e.g., both text and images), understanding multimodal figurative language is an important AI challenge, weaving together profound vision, language, commonsense and cultural knowledge. In this work, we develop the Image Recognition of Figurative Language (IRFL) dataset. We leverage human annotation and an automatic pipeline we created to generate a multimodal dataset, and introduce two novel tasks as a benchmark for multimodal figurative language understanding. We experimented with state-of-the-art vision and language models and found that the best (22%) performed substantially worse than humans (97%). We release our dataset, benchmark, and code, in hopes of driving the development of models that can better understand figurative language.	Translated: 2023-11-22 19:57:50 Published: 2023-11-19
# Generation of a coherent squeezed like state defined with the Lie-Trotter product formula using a nonlinear photonic crystal ( http://arxiv.org/abs/2304.11373v3 ) License: see link	Hiroo Azuma	In this paper, we investigate how to generate coherent squeezed like light using a nonlinear photonic crystal. Because the photonic crystal reduces the group velocity of the incident light, if it is composed of a material with a second-order nonlinear optical susceptibility $\chi^{(2)}$, the interaction between the nonlinear material and the light passing through it strengthens and the quantum state of the emitted light is largely squeezed. Thus, we can generate coherent squeezed like light with a resonating cavity in which the nonlinear photonic crystal is placed. This coherent squeezed like state is defined with the Lie-Trotter product formula, and its mathematical expression is different from those of conventional coherent squeezed states. We show that this coherent squeezed like state can be obtained in practice with a squeezing level of $15.9$ dB by adjusting the physical parameters of our proposed method. Feeding squeezed light whose average number of photons is one or two into a beam splitter, and thereby splitting it into a pair of entangled light beams, we estimate their entanglement quantitatively. This paper is a sequel to H. Azuma, J. Phys. D: Appl. Phys. 55, 315106 (2022).	Translated: 2023-11-22 19:46:13 Published: 2023-11-19
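As background for the abstract above, the Lie-Trotter product formula in its standard form (for matrices, or suitable operators, $A$ and $B$) is shown below. How the paper uses it to define the coherent squeezed like state is not stated in the abstract, so only the general identity is given.

```latex
\[
e^{A + B} \;=\; \lim_{n \to \infty} \left( e^{A/n} \, e^{B/n} \right)^{n}
\]
```

When $A$ and $B$ commute the identity holds for every finite $n$. When they do not, truncating at finite $n$ gives an approximation whose error shrinks as $n$ grows.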
# How To Train Your (Compressed) Large Language Model ( http://arxiv.org/abs/2305.14864v2 ) License: see link	Ananya Harsh Jha, Tom Sherborne, Evan Pete Walsh, Dirk Groeneveld, Emma Strubell, Iz Beltagy	With the increase in the size of large language models (LLMs), we need compression methods that can reduce the model size while preserving the generality and zero-shot promptability of the model. This goal is more ambitious than the typical compression setup, which reduces the model's size at the expense of specializing it to a specific end-task. To study this, we develop a task-agnostic compression pipeline with a large-scale evaluation comprising language modeling perplexity and 12 zero-shot end-tasks. Our results show that a simple layer-wise pruning followed by continued language model pretraining matches or outperforms three existing state-of-the-art baselines while being 1.5x more computationally efficient. However, unlike typical task-specialized compression, our best-compressed model significantly underperforms a similar-sized model trained from scratch. We posit the half-sized pretrained model as an upper bound for task-agnostic compression and call for future work to bridge this gap under a reasonable token budget. Our findings highlight the inadequacy of existing compression methods for LLMs and establish a requirement for new methods that preserve a model's generality and zero-shot promptability under compression. We release our code and evaluation setup to facilitate reproducibility and help iterate on method design.	Translated: 2023-11-22 19:38:35 Published: 2023-11-19
# Effective Bilevel Optimization via Minimax Reformulation ( http://arxiv.org/abs/2305.13153v2 ) License: see link	Xiaoyu Wang, Rui Pan, Renjie Pi and Tong Zhang	Bilevel optimization has found successful applications in various machine learning problems, including hyper-parameter optimization, data cleaning, and meta-learning. However, its huge computational cost presents a significant challenge for its utilization in large-scale problems. This challenge arises due to the nested structure of the bilevel formulation, where each hyper-gradient computation necessitates a costly inner optimization procedure. To address this issue, we propose a reformulation of bilevel optimization as a minimax problem, effectively decoupling the outer-inner dependency. Under mild conditions, we show these two problems are equivalent. Furthermore, we introduce a multi-stage gradient descent and ascent (GDA) algorithm to solve the resulting minimax problem with convergence guarantees. Extensive experimental results demonstrate that our method outperforms state-of-the-art bilevel methods while significantly reducing the computational cost.	Translated: 2023-11-22 19:37:36 Published: 2023-11-19
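To make the phrase "gradient descent and ascent" in the abstract above concrete, here is a minimal sketch of plain simultaneous GDA on a toy saddle-point problem. It is not the paper's multi-stage algorithm and not its bilevel reformulation. The objective `f(x, y) = 0.5*x**2 + x*y - 0.5*y**2`, the step size, and the function names are all assumptions chosen for illustration.

```python
# Plain simultaneous gradient descent-ascent (GDA) on a toy problem:
#   min over x, max over y of f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
# The objective is an illustrative assumption; its unique saddle point is (0, 0).


def grad_x(x, y):
    """Partial derivative of f with respect to the minimizing variable x."""
    return x + y


def grad_y(x, y):
    """Partial derivative of f with respect to the maximizing variable y."""
    return x - y


def gda(x0, y0, lr=0.1, steps=500):
    """Run simultaneous GDA: descend in x, ascend in y, using the same iterate."""
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - lr * gx  # descent step for the min player
        y = y + lr * gy  # ascent step for the max player
    return x, y


x_star, y_star = gda(1.0, -1.0)
print(x_star, y_star)  # both values end up very close to 0
```

On this strongly-convex-strongly-concave toy objective a single loop converges for any step size in (0, 1). Harder minimax problems generally need more careful schemes, which is presumably what the paper's multi-stage variant addresses.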
# Unsupervised Multi-view Pedestrian Detection ( http://arxiv.org/abs/2305.12457v2 ) License: see link	Mengyin Liu, Chao Zhu, Shiqi Ren, Xu-Cheng Yin	With the growth of video surveillance, multiple cameras have been applied to accurately locate pedestrians in a specific area. However, previous methods rely on human-labeled annotations in every video frame and camera view, a heavier burden than the camera calibration and synchronization that are already necessary. Therefore, we propose in this paper an Unsupervised Multi-view Pedestrian Detection approach (UMPD) to eliminate the need for annotations when learning a multi-view pedestrian detector via 2D-3D mapping. 1) Firstly, Semantic-aware Iterative Segmentation (SIS) is proposed to extract unsupervised representations of multi-view images, which are converted into 2D pedestrian masks as pseudo labels, via our proposed iterative PCA and zero-shot semantic classes from vision-language models. 2) Secondly, we propose Geometry-aware Volume-based Detector (GVD) to end-to-end encode multi-view 2D images into a 3D volume to predict voxel-wise density and color via 2D-to-3D geometric projection, trained by 3D-to-2D rendering losses with SIS pseudo labels. 3) Thirdly, for better detection results, i.e., the 3D density projected on Bird's-Eye-View from GVD, we propose Vertical-aware BEV Regularization (VBR) to constrain them to be vertical like natural pedestrian poses. Extensive experiments on the popular multi-view pedestrian detection benchmarks Wildtrack, Terrace, and MultiviewX show that our proposed UMPD approach, to the best of our knowledge the first fully-unsupervised method, performs competitively with the previous state-of-the-art supervised techniques. Code will be available.	Translated: 2023-11-22 19:37:22 Published: 2023-11-19
# Unsupervised Hyperspectral Pansharpening via Low-rank Diffusion Model ( http://arxiv.org/abs/2305.10925v2 ) License: see link	Xiangyu Rui, Xiangyong Cao, Li Pang, Zeyu Zhu, Zongsheng Yue, and Deyu Meng	Hyperspectral pansharpening is the process of merging a high-resolution panchromatic (PAN) image and a low-resolution hyperspectral (LRHS) image to create a single high-resolution hyperspectral (HRHS) image. Existing Bayesian-based HS pansharpening methods require designing a handcrafted image prior to characterize the image features, and deep learning-based HS pansharpening methods usually require a large number of paired training data and suffer from poor generalization ability. To address these issues, in this work, we propose a low-rank diffusion model for hyperspectral pansharpening by simultaneously leveraging the power of the pre-trained deep diffusion model and the better generalization ability of Bayesian methods. Specifically, we assume that the HRHS image can be recovered from the product of two low-rank tensors, i.e., the base tensor and the coefficient matrix. The base tensor lies on the image field and has a low spectral dimension. Thus, we can conveniently utilize a pre-trained remote sensing diffusion model to capture its image structures. Additionally, we derive a simple yet quite effective way to pre-estimate the coefficient matrix from the observed LRHS image, which preserves the spectral information of the HRHS. Experimental results demonstrate that the proposed method performs better than some popular traditional approaches and gains better generalization ability than some DL-based methods. The code is released at https://github.com/xyrui/PLRDiff.	Translated: 2023-11-22 19:36:20 Published: 2023-11-19
# Exploring Platform Migration Patterns between Twitter and Mastodon: A User Behavior Study ( http://arxiv.org/abs/2305.09196v3 ) License: see link	Ujun Jeong, Paras Sheth, Anique Tahir, Faisal Alatawi, H. Russell Bernard, Huan Liu	A recent surge of users migrating from Twitter to alternative platforms, such as Mastodon, raised questions regarding what migration patterns are, how different platforms impact user behaviors, and how migrated users settle in the migration process. In this study, we elaborate on how we investigate these questions by collecting data on over 10,000 users who migrated from Twitter to Mastodon within the first ten weeks following the ownership change of Twitter. Our research is structured in three primary steps. First, we develop algorithms to extract and analyze migration patterns. Second, by leveraging behavioral analysis, we examine the distinct architectures of Twitter and Mastodon to learn how user behaviors correspond with the characteristics of each platform. Last, we determine how particular behavioral factors influence users to stay on Mastodon. We share our findings of user migration, insights, and lessons learned from the user behavior study.	Translated: 2023-11-22 19:33:55 Published: 2023-11-19
# Source-Free Domain Adaptation for SSVEP-based Brain-Computer Interfaces ( http://arxiv.org/abs/2305.17403v2 ) License: see link	Osman Berke Guney, Deniz Kucukahmetler and Huseyin Ozkan	This paper presents a source-free domain adaptation method for steady-state visually evoked potentials (SSVEP) based brain-computer interface (BCI) spellers. SSVEP-based BCI spellers assist individuals experiencing speech difficulties by enabling them to communicate at a fast rate. However, achieving a high information transfer rate (ITR) in most prominent methods requires an extensive calibration period before using the system, leading to discomfort for new users. We address this issue by proposing a novel method that adapts a powerful deep neural network (DNN) pre-trained on data from source domains (data from former users or participants of previous experiments) to the new user (target domain), based only on the unlabeled target data. This adaptation is achieved by minimizing our proposed custom loss function composed of self-adaptation and local-regularity terms. The self-adaptation term uses the pseudo-label strategy, while the novel local-regularity term exploits the data structure and forces the DNN to assign similar labels to adjacent instances. The proposed method prioritizes user comfort by removing the burden of calibration while maintaining an excellent character identification accuracy and ITR. In particular, our method achieves striking 201.15 bits/min and 145.02 bits/min ITRs on the benchmark and BETA datasets, respectively, and outperforms the state-of-the-art alternatives. Our code is available at https://github.com/osmanberke/SFDA-SSVEP-BCI	Translated: 2023-11-22 19:22:49 Published: 2023-11-19
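The abstract above reports information transfer rates in bits/min without saying how ITR is computed. A common definition in the BCI literature is the Wolpaw formula, sketched below. Whether the paper uses exactly this definition is an assumption, and the function name `itr_bits_per_min` and the example values (40 targets, 1 second per selection) are hypothetical, not figures from the paper.

```python
import math


def itr_bits_per_min(num_targets, accuracy, seconds_per_selection):
    """Wolpaw information transfer rate in bits per minute.

    bits per selection = log2(N) + P*log2(P) + (1 - P)*log2((1 - P) / (N - 1))
    where N is the number of targets and P is the selection accuracy.
    """
    n, p, t = num_targets, accuracy, seconds_per_selection
    if n < 2 or not (0.0 <= p <= 1.0) or t <= 0:
        raise ValueError("need num_targets >= 2, 0 <= accuracy <= 1, time > 0")
    bits = math.log2(n)
    if p > 0.0:
        bits += p * math.log2(p)  # this term tends to 0 as P -> 0
    if p < 1.0:
        bits += (1.0 - p) * math.log2((1.0 - p) / (n - 1))  # tends to 0 as P -> 1
    return bits * 60.0 / t


# Hypothetical example values, not taken from the paper.
perfect = itr_bits_per_min(num_targets=40, accuracy=1.0, seconds_per_selection=1.0)
chance = itr_bits_per_min(num_targets=40, accuracy=1.0 / 40, seconds_per_selection=1.0)
print(perfect, chance)
```

At perfect accuracy the rate reduces to `60 * log2(N) / T`, and at chance-level accuracy (`P = 1/N`) it is zero. That is why a shorter selection time and a higher accuracy both raise ITR.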
# Revisiting and Advancing Adversarial Training Through A Simple Baseline ( http://arxiv.org/abs/2306.07613v2 ) License: see link	Hong Liu	In this paper, we delve into the essential components of adversarial training, which is a pioneering defense technique against adversarial attacks. We show that some factors independent of the model architecture, such as the loss function, learning rate scheduler, and data augmentation, influence adversarial robustness and generalization. When these factors are controlled for, we introduce a simple baseline approach, termed SimpleAT, that performs competitively with recent methods and mitigates robust overfitting. We conduct extensive experiments on CIFAR-10/100 and Tiny-ImageNet, which validate the robustness of SimpleAT against state-of-the-art adversarial attackers such as AutoAttack. Our results also demonstrate that SimpleAT exhibits good performance in the presence of various image corruptions, such as those found in CIFAR-10-C. In addition, we empirically show that SimpleAT is capable of reducing the variance in model predictions, which is considered the primary contributor to robust overfitting. Our results also reveal the connections between SimpleAT and many advanced state-of-the-art adversarial defense methods.	Translated: 2023-11-22 19:13:33 Published: 2023-11-19
# Online Learning with Set-Valued Feedback ( http://arxiv.org/abs/2306.06247v3 ) License: see link	Vinod Raman, Unique Subedi, Ambuj Tewari	We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} in the realizable setting under set-valued feedback. In addition, we show that deterministic and randomized realizable learnability are equivalent if the Helly number of the collection of sets that can be revealed as feedback is finite. In light of this separation, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimension, whose finiteness characterizes deterministic and randomized realizable learnability respectively. Additionally, these dimensions lower- and upper-bound the deterministic and randomized minimax regret in the realizable setting. Going beyond the realizable setting, we prove that the Measure Shattering dimension continues to characterize learnability and quantify minimax regret in the agnostic setting. Finally, we use our results to establish bounds on the minimax regret for three practical learning settings: online multilabel ranking, online multilabel classification, and real-valued prediction with interval-valued response.	Translated: 2023-11-22 19:12:34 Published: 2023-11-19
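The feedback model described in the abstract above can be illustrated with a small sketch: the learner outputs one label, a set of acceptable labels is revealed, and the loss is 1 exactly when the prediction falls outside that set. The learner below is a naive version-space eliminator over a finite hypothesis class, written only to show the protocol. It is not an algorithm from the paper, and all names and the toy data are hypothetical.

```python
def set_valued_loss(prediction, revealed_set):
    """0-1 loss under set-valued feedback: penalized iff prediction is not in the set."""
    return 0 if prediction in revealed_set else 1


def run_consistent_learner(hypotheses, stream):
    """Naive deterministic learner for the realizable setting.

    hypotheses: list of dicts mapping instance -> label (a finite class).
    stream: list of (instance, revealed_set) pairs.
    Assumes realizability: some hypothesis always predicts inside the revealed set,
    so the version space never becomes empty.
    """
    version_space = list(hypotheses)
    mistakes = 0
    for x, revealed in stream:
        prediction = version_space[0][x]  # predict with any surviving hypothesis
        mistakes += set_valued_loss(prediction, revealed)
        # Discard every hypothesis whose label for x is outside the revealed set.
        version_space = [h for h in version_space if h[x] in revealed]
    return mistakes, version_space


# Toy data: three hypotheses over instances "a" and "b"; the last one is consistent.
hypotheses = [{"a": 0, "b": 0}, {"a": 1, "b": 2}, {"a": 2, "b": 1}]
stream = [("a", {1, 2}), ("b", {1}), ("a", {2})]
mistakes, survivors = run_consistent_learner(hypotheses, stream)
print(mistakes, survivors)
```

Each mistake removes at least the hypothesis that made it, so this learner makes at most `len(hypotheses) - 1` mistakes on a realizable stream. The dimensions introduced in the paper characterize learnability far more generally than this finite-class bound.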
# MMSum:ビデオのマルチモーダル要約とサムネイル生成のためのデータセット MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos ( http://arxiv.org/abs/2306.04216v2 ) ライセンス: Link先を確認	Jielin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Ding Zhao, Bo Li, Lijuan Wang	(参考訳) マルチモーダル出力(MSMO)を用いたマルチモーダル要約が,有望な研究方向として浮上している。それでも、メンテナンスの不十分、データアクセシビリティの欠如、サイズ制限、適切な分類の欠如など、既存のMSMOデータセットには多くの制限がある。これらの課題に対処し、この新たな方向性のための包括的なデータセットを提供するため、我々は、慎重にMMSumデータセットをキュレートした。新しいデータセットは,(1)ビデオコンテンツとテキストコンテンツの両方に人手で検証された要約を提供し,マルチモーダル学習に優れた指導とラベルを提供する。 (2) 包括的かつ丁寧に分類し, 多様な実世界のシナリオを包括する17のカテゴリと170のサブカテゴリにまたがる。 (3) 提案するデータセット上で行ったベンチマークテストは, video summarization, text summarization, multimodal summarization など,さまざまなタスクとメソッドを評価した。アクセシビリティとコラボレーションを推進すべく、私たちは MMSumデータセットとデータ収集ツールを完全なオープンソースリソースとしてリリースします。プロジェクトのwebサイトは https://mmsum-dataset.github.io/ にある。 Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction. Nonetheless, numerous limitations exist within existing public MSMO datasets, including insufficient maintenance, data inaccessibility, limited size, and the absence of proper categorization, which pose significant challenges. To address these challenges and provide a comprehensive dataset for this new direction, we have meticulously curated the MMSum dataset. Our new dataset features (1) Human-validated summaries for both video and textual content, providing superior human instruction and labels for multimodal learning. (2) Comprehensively and meticulously arranged categorization, spanning 17 principal categories and 170 subcategories to encapsulate a diverse array of real-world scenarios. (3) Benchmark tests performed on the proposed dataset to assess various tasks and methods, including video summarization, text summarization, and multimodal summarization. 
To champion accessibility and collaboration, we will release the MMSum dataset and the data collection tool as fully open-source resources, fostering transparency and accelerating future developments. Our project website can be found at https://mmsum-dataset.github.io/	翻訳日:2023-11-22 19:10:00 公開日:2023-11-19
# 有限要素インスピレーションネットワーク:部分観測から解釈可能な変形可能な物体ダイナミクスを学習する Finite element inspired networks: Learning interpretable deformable object dynamics from partial observations ( http://arxiv.org/abs/2307.07975v2 ) ライセンス: Link先を確認	Shamil Mamedov, A. René Geist, Jan Swevers, Sebastian Trimpe	(参考訳) 変形可能な線形オブジェクト(DLO)ダイナミクスの正確なシミュレーションは、目の前のタスクが、高速な予測も可能な人間に解釈可能なモデルを必要とする場合、難しい。このようなモデルに到達するために、剛有限要素法(R-FEM)からインスピレーションを得て、動的ネットワークによって内部状態が経時的にアンロールされる剛体の直列鎖としてDLOをモデル化する。この状態が直接観察されないため、ダイナミックスネットワークは、観測された運動変数をDLOの隠れ状態にマッピングする物理インフォームドエンコーダと共同で訓練される。状態が物理的に意味のある表現を取得することを奨励するために、基礎となるR-FEMモデルの前方運動学をデコーダとして活用する。ロボット実験を通じて、提案アーキテクチャは、部分的な観測から物理的に解釈可能な予測をもたらす、容易に扱いやすいDLO力学モデルを提供することを示した。プロジェクトコードは https://tinyurl.com/fei-networks で利用可能である。 Accurate simulation of deformable linear object (DLO) dynamics is challenging if the task at hand requires a human-interpretable model that also yields fast predictions. To arrive at such a model, we draw inspiration from the rigid finite element method (R-FEM) and model a DLO as a serial chain of rigid bodies whose internal state is unrolled through time by a dynamics network. As this state is not observed directly, the dynamics network is trained jointly with a physics-informed encoder which maps observed motion variables to the DLO's hidden state. To encourage the state to acquire a physically meaningful representation, we leverage the forward kinematics of the underlying R-FEM model as a decoder. Through robot experiments we demonstrate that the proposed architecture provides an easy-to-handle, yet capable DLO dynamics model yielding physically interpretable predictions from partial observations. The project code is available at: https://tinyurl.com/fei-networks	翻訳日:2023-11-22 18:49:37 公開日:2023-11-19
# 非パラメトリックな文脈付きバンディットにおける最も重要なシフトの追跡 Tracking Most Significant Shifts in Nonparametric Contextual Bandits ( http://arxiv.org/abs/2307.05341v2 ) ライセンス: Link先を確認	Joe Suk and Samory Kpotufe	(参考訳) 平均報酬関数がリプシッツ連続で、時間とともに変化しうる非パラメトリックな文脈付きバンディットについて研究する。まず、あまり理解されていないこの設定におけるミニマックスの動的後悔率を、変化の回数 $L$ と総変動 $V$ によって確立する。どちらも文脈空間上の分布のすべての変化を捉える量であり、この設定では最先端の手続きが最適でないことを論じる。次に、この設定に対する適応性の問題、すなわち $L$ や $V$ を知らずにミニマックスレートを達成できるかという問題に取り組む。極めて重要な点として、与えられた文脈 $X_t$ で局所的に見たバンディット問題は、文脈空間 $\cal X$ の他の部分における報酬の変化の影響を受けるべきではないと考える。そこで我々は、経験された有意なシフト(experienced significant shifts)と呼ぶ変化の概念を提案する。これは局所性をよりよく考慮し、したがって $L$ や $V$ よりもかなり少ない変化しか数えない。さらに、非定常MAB(Suk & Kpotufe, 2022)に関する最近の研究と同様に、経験された有意なシフトは平均報酬の最も重要な変化(例えば、観測された文脈に関連する深刻なベストアームの変化)のみを数える。我々の主な成果は、このより寛容な変化の概念に対して実際に適応可能であることを示すことである。 We study nonparametric contextual bandits where Lipschitz mean reward functions may change over time. We first establish the minimax dynamic regret rate in this less understood setting in terms of the number of changes $L$ and total-variation $V$, both capturing all changes in distribution over context space, and argue that state-of-the-art procedures are suboptimal in this setting. Next, we turn to the question of adaptivity for this setting, i.e. achieving the minimax rate without knowledge of $L$ or $V$. Quite importantly, we posit that the bandit problem, viewed locally at a given context $X_t$, should not be affected by reward changes in other parts of context space $\cal X$. We therefore propose a notion of change, which we term experienced significant shifts, that better accounts for locality, and thus counts considerably fewer changes than $L$ and $V$. Furthermore, similar to recent work on non-stationary MAB (Suk & Kpotufe, 2022), experienced significant shifts only count the most significant changes in mean rewards, e.g., severe best-arm changes relevant to observed contexts. Our main result is to show that this more tolerant notion of change can in fact be adapted to.	翻訳日:2023-11-22 18:48:05 公開日:2023-11-19
# 人間好奇心のネットワーク理論を用いた本質的動機付けグラフ探索 Intrinsically motivated graph exploration using network theories of human curiosity ( http://arxiv.org/abs/2307.04962v3 ) ライセンス: Link先を確認	Shubhankar P. Patankar, Mathieu Ouellet, Juan Cervino, Alejandro Ribeiro, Kieran A. Murphy and Dani S. Bassett	(参考訳) 本質的に動機づけられた探索は、追加の外部報酬なしでも強化学習に役立つことが証明されている。環境が自然にグラフとして表現される場合、探索を導く最善の方法は未解決の問題だ。本研究では,情報ギャップ理論と圧縮進行理論という,人間の好奇心の2つの理論によるグラフ構造データ探索手法を提案する。この理論は好奇心を、環境に訪れるノードによって引き起こされるサブグラフの位相的特徴を最適化する本質的な動機であると考えている。これらの特徴をグラフニューラルネットワークに基づく強化学習の報奨として利用する。合成生成グラフの複数のクラスにおいて、訓練されたエージェントは、訓練中に見られるよりも長い探索的歩行とより大きな環境に一般化する。本手法は, トポロジ特性のグリーディ評価よりも効率的に計算する。提案される本質的動機は、レコメンダシステムに対して特に関連がある。我々は、好奇心を考慮した次のノードレコメンデーションが、MovieLens、Amazon Books、Wikipediaなど、現実世界のグラフ環境におけるPageRank中心性よりも人間の選択をより予測できることを示した。 Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how to guide exploration best remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by nodes visited in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to longer exploratory walks and larger environments than are seen during training. Our method computes more efficiently than the greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. 
We demonstrate that next-node recommendations considering curiosity are more predictive of human choices than PageRank centrality in several real-world graph environments, including MovieLens, Amazon Books, and Wikipedia.	翻訳日:2023-11-22 18:47:41 公開日:2023-11-19
# 敵対的訓練による解釈可能なコンピュータビジョンモデル:ロバスト性-解釈可能性結合を解き明かす Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection ( http://arxiv.org/abs/2307.02500v2 ) ライセンス: Link先を確認	Delyan Boychev	(参考訳) 最先端のディープニューラルネットワークの複雑性が永久に増大するにつれて、その解釈性を維持することがますます難しくなっている。本研究は,敵対的攻撃に対して脆弱性の少ないロバストなモデルの作成に使用される敵対的訓練の効果を評価することを目的としている。敵対的訓練はコンピュータビジョンモデルをより解釈可能にすることが示されている。モデルを現実世界にデプロイする場合、解釈性は堅牢性と同じくらい不可欠です。これら2つの課題の相関性を証明するため,局所的特徴重要度法 (SHAP, 統合的勾配法) と特徴可視化技術 (Representation Inversion, Class Specific Image Generation) を用いてモデルを広範囲に検討した。標準モデルは、ロバストなモデルに比べて敵の攻撃の影響を受けやすく、その学習された表現は人間にとって意味をなさない。逆に、これらのモデルは予測をサポートする画像の特徴的な領域に焦点を当てている。さらに、ロバストモデルによって学習される機能は、実際のものに近い。 As the complexity of state-of-the-art deep neural networks keeps increasing, maintaining their interpretability becomes more and more challenging. Our work aims to evaluate the effects of adversarial training, which is used to produce robust models that are less vulnerable to adversarial attacks and which has been shown to make computer vision models more interpretable. Interpretability is as essential as robustness when we deploy the models to the real world. To prove the correlation between these two problems, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature visualization techniques (Representation Inversion, Class Specific Image Generation). Standard models, compared to robust ones, are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans. Conversely, these models focus on distinctive regions of the images which support their predictions. Moreover, the features learned by the robust model are closer to the real ones.	翻訳日:2023-11-22 18:46:14 公開日:2023-11-19
# スピン-1鎖の量子フィッシャー情報と多成分絡み合い Quantum Fisher Information and multipartite entanglement in spin-1 chains ( http://arxiv.org/abs/2307.02407v2 ) ライセンス: Link先を確認	Federico Dell'Anna, Sunny Pradhan, Cristian Degli Esposti Boschi, Elisa Ercolessi	(参考訳) 本稿では,1次元スピン-1モデルにおける基底状態の量子フィッシャー情報(QFI)をマルチパーティイトエンタングルメントの証として検討する。最も一般的なSU(2)不変のスピン-1鎖であるビリナー・バイカドラティックモデルと、最も近い隣り合う相互作用と開境界条件を持つXXZスピン-1鎖である。厳密な非局所可観測性のqfiのスケーリングは、位相図の特徴付けや、特に位相相の研究において、最大にスケールできることを示した。臨界相におけるその挙動を分析することで、局所および弦観測可能な順序パラメータのスケーリング次元を復元することができる。数値計算は密度行列再正規化群アルゴリズムとテンソルネットワーク技術を利用して得られた。 In this paper, we study the ground state Quantum Fisher Information (QFI) in one-dimensional spin-1 models, as witness to Multipartite Entanglement. The models addressed are the Bilinear-Biquadratic model, the most general isotropic SU(2)-invariant spin-1 chain, and the XXZ spin-1 chain, both with nearest-neighbor interactions and open boundary conditions. We show that the scaling of the QFI of strictly non-local observables can be used for characterizing the phase diagrams and, in particular, for studying topological phases, where it scales maximally. Analysing its behavior at the critical phases we are also able to recover the scaling dimensions of the order parameters both for local and string observables. The numerical results have been obtained by exploiting the Density Matrix Renormalization Group algorithm and Tensor Network techniques.	翻訳日:2023-11-22 18:45:57 公開日:2023-11-19
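As a rough illustration of the quantity studied above: for a pure state the QFI of an observable O reduces to F_Q = 4(<O^2> - <O>^2). The sketch below evaluates this for observables that are diagonal in the S^z basis. The states used are hypothetical test states, not ground states of the models in the paper, and the helper names are assumptions made for this example.

```python
from typing import Callable, Dict, Tuple

Config = Tuple[int, ...]  # one S^z eigenvalue (+1, 0 or -1) per site


def pure_state_qfi(state: Dict[Config, complex],
                   observable: Callable[[Config], float]) -> float:
    """QFI of a pure state for an observable diagonal in the S^z basis.

    Uses F_Q = 4 * (<O^2> - <O>^2), which holds for pure states.
    """
    norm = sum(abs(a) ** 2 for a in state.values())
    mean = sum(abs(a) ** 2 * observable(c) for c, a in state.items()) / norm
    mean_sq = sum(abs(a) ** 2 * observable(c) ** 2 for c, a in state.items()) / norm
    return 4.0 * (mean_sq - mean ** 2)


def total_sz(config: Config) -> float:
    """Collective magnetization: sum of the local S^z eigenvalues."""
    return float(sum(config))


n_sites = 4
amp = 2 ** -0.5
# Hypothetical states: a cat-like superposition and a fully polarized product state.
cat_state = {(1,) * n_sites: amp, (-1,) * n_sites: amp}
product_state = {(1,) * n_sites: 1.0}

qfi_cat = pure_state_qfi(cat_state, total_sz)          # grows as 4 * n_sites**2
qfi_product = pure_state_qfi(product_state, total_sz)  # zero variance, zero QFI
```

The quadratic growth for the cat-like state versus the vanishing value for the product state is the kind of scaling contrast that lets the QFI act as an entanglement witness.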
# 頸動脈超音波画像分割と分類のための領域とカテゴリ信頼に基づくマルチタスクネットワーク A region and category confidence-based multi-task network for carotid ultrasound image segmentation and classification ( http://arxiv.org/abs/2307.00583v2 ) ライセンス: Link先を確認	Haitao Gan and Ran Zhou and Yanghan Ou and Furong Wang and Xinyao Cheng and Aaron Fenster	(参考訳) 超音波画像における頸動脈プラークの分割と分類は動脈硬化の治療と脳卒中リスクの評価において重要な役割を果たす。深層学習法は頸動脈プラークのセグメンテーションと分類に用いられてきたが,2段階法は解析の複雑さを増大させ,既存のマルチタスク法はセグメンテーションと分類の関係を無視している。これらのことは、すべてのタスクで価値ある情報が完全に活用されないため、最適以下のパフォーマンスをもたらす。そこで我々は,この2つの課題間の相関を利用して,領域信頼モジュール (RCM) とサンプルカテゴリ信頼モジュール (CCM) を用いて,超音波頸動脈プラークのセグメンテーションと分類のためのマルチタスク学習フレームワーク (RCCM-Net) を提案する。 RCMは、プラーク領域の確率から分類タスクへの知識を提供し、CCMはセグメンテーションタスクのカテゴリ標本重量を学習するために設計されている。総計1270枚の頸動脈プラークの2次元超音波画像が,中国・武漢の中南病院(Zhongnan Hospital)から採取された。提案手法は,従来のシングルタスクネットワーク (SegNet, DeepLabv3+, UNet++, EfficientNet, Res2Net, RepVGG, DPN) とマルチタスクアルゴリズム (HRNet, MTANet) と比較して,セグメンテーションと分類の両方の性能を向上でき,分類精度は85.82%,セグメンテーションのDice類似係数は84.92%であった。アブレーション実験では,設計したRCMとCCMの両方がネットワークの性能向上に有効であることを示した。そこで本手法は,臨床試験および臨床実践における頸動脈プラーク解析に有用であると考えられた。 The segmentation and classification of carotid plaques in ultrasound images play important roles in the treatment of atherosclerosis and the assessment of stroke risk. Although deep learning methods have been used for carotid plaque segmentation and classification, two-stage methods increase the complexity of the overall analysis and the existing multi-task methods ignore the relationship between segmentation and classification. This leads to suboptimal performance, as valuable information might not be fully leveraged across all tasks. Therefore, we propose a multi-task learning framework (RCCM-Net) for ultrasound carotid plaque segmentation and classification, which utilizes a region confidence module (RCM) and a sample category confidence module (CCM) to exploit the correlation between these two tasks. 
The RCM provides knowledge from the probability of plaque regions to the classification task, while the CCM is designed to learn the categorical sample weight for the segmentation task. A total of 1270 2D ultrasound images of carotid plaques were collected from Zhongnan Hospital (Wuhan, China) for our experiments. The results showed that the proposed method can improve both segmentation and classification performance compared to existing single-task networks (i.e., SegNet, DeepLabv3+, UNet++, EfficientNet, Res2Net, RepVGG, DPN) and multi-task algorithms (i.e., HRNet, MTANet), with an accuracy of 85.82% for classification and a Dice similarity coefficient of 84.92% for segmentation. In the ablation study, the results demonstrated that both the designed RCM and CCM were beneficial in improving the network's performance. Therefore, we believe that the proposed method could be useful for carotid plaque analysis in clinical trials and practice.	翻訳日:2023-11-22 18:45:41 公開日:2023-11-19
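The segmentation score reported above is the Dice similarity coefficient, 2|A∩B| / (|A| + |B|) for a predicted mask A and a reference mask B. A minimal sketch on flattened binary masks follows; the toy masks and the empty-mask convention are assumptions for illustration, not details from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Convention (an assumption here): two empty masks count as a perfect match.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


pred_mask = [1, 1, 0, 0]   # hypothetical predicted plaque pixels
true_mask = [1, 0, 0, 0]   # hypothetical reference annotation
score = dice_coefficient(pred_mask, true_mask)  # 2 * 1 / (2 + 1)
```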
# JD広告検索におけるマルチエキスパート知識凝縮を用いたクエリ分類の改善に向けて Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search ( http://arxiv.org/abs/2308.01098v3 ) ライセンス: Link先を確認	Kun-Peng Ning, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao, Chang-Ping Peng, Zhan-Gang Lin, Jing-He Hu, Jing-Ping Shao	(参考訳) 検索クエリ分類は、ユーザの意図を理解する効果的な方法であり、実際のオンライン広告システムにおいて非常に重要である。低レイテンシを確保するために、浅いモデル(例えばFastText)が効率的なオンライン推論に広く使われている。しかし、fasttextモデルの表現能力は不十分であり、特に低頻度クエリや尾付きカテゴリでは分類性能が低下する。より深く複雑なモデル(bertなど)を使用することは効果的なソリューションだが、オンライン推論の遅延が増加し、計算コストが高くなる。したがって、推論効率と分類性能の両方をジャグリングする方法は明らかに極めて重要である。本稿では,この課題を克服するために,オンライン高速テキストモデルの厳密な低レイテンシ制約下での分類性能を向上させるための,単純かつ効果的な知識蒸留フレームワークである知識凝縮(kc)を提案する。具体的には、より関連性の高いデータを取得するために、オフラインのBERTモデルをトレーニングすることを提案する。強力なセマンティック表現から恩恵を受けることで、過去のデータに公開されていない関連性の高いラベルがトレーニングセットに追加され、FastTextモデルのトレーニングが改善される。さらに, 関係データのマイニング能力の向上を図るため, 分散分散多元学習戦略を提案する。異なるデータ分布から複数のbertモデルをトレーニングすることで、それぞれ、ハイ、ミドル、低周波の検索クエリでパフォーマンスが向上する。マルチディストリビューションからのモデルアンサンブルにより、その検索能力はより強力になる。我々はこのフレームワークの2つのバージョンをJD検索にデプロイし、オフライン実験と複数のデータセットからのオンラインA/Bテストの両方で提案手法の有効性を検証した。 Search query classification, as an effective way to understand user intents, is of great importance in real-world online ads systems. To ensure a lower latency, a shallow model (e.g. FastText) is widely used for efficient online inference. However, the representation ability of the FastText model is insufficient, resulting in poor classification performance, especially on some low-frequency queries and tailed categories. Using a deeper and more complex model (e.g. BERT) is an effective solution, but it will cause a higher online inference latency and more expensive computing costs. Thus, how to juggle both inference efficiency and classification performance is obviously of great practical importance. 
To overcome this challenge, we propose knowledge condensation (KC), a simple yet effective knowledge distillation framework that boosts the classification performance of the online FastText model under strict low-latency constraints. Specifically, we propose to train an offline BERT model to retrieve more potentially relevant data. Benefiting from its powerful semantic representation, more relevant labels not exposed in the historical data are added to the training set for better FastText model training. Moreover, a novel distribution-diverse multi-expert learning strategy is proposed to further improve the ability to mine relevant data. By training multiple BERT models on different data distributions, the experts perform better on high-, middle-, and low-frequency search queries, respectively. Ensembling the models trained on these distributions makes retrieval more powerful. We have deployed two versions of this framework in JD search, and both offline experiments and online A/B testing on multiple datasets have validated the effectiveness of the proposed approach.	翻訳日:2023-11-22 18:36:26 公開日:2023-11-19
# ワッサーシュタイン統計の形状とアフィン変形に関する情報幾何学 Information Geometry of Wasserstein Statistics on Shapes and Affine Deformations ( http://arxiv.org/abs/2307.12508v3 ) ライセンス: Link先を確認	Shun-ichi Amari, Takeru Matsuda	(参考訳) 情報幾何学とワッサーシュタイン幾何学は確率分布の多様体で導入された2つの主要な構造であり、それらはその異なる特徴を捉えている。位置スケールモデルの多次元一般化であるアフィン変形統計モデルのためのliおよびzhao(2023)の枠組みにおけるワッサースタイン幾何学の特徴について検討した。我々は情報幾何学とwasserstein幾何に基づく推定子の長所と短所を比較した。確率分布の形状とアフィン変形はワッサーシュタイン幾何学において分離され、フィッシャー効率の損失と引き換えに波形摂動に対する頑健さを示す。楕円対称アフィン変形モデルの場合,ワッサースタイン推定器がモーメント推定器であることを示す。波形がガウス的である場合と場合に限り、情報幾何学的推定器(maximum-likelihood estimator)と一致する。ワッサーシュタイン効率の役割は、波形変化に対する堅牢性の観点から解明される。 Information geometry and Wasserstein geometry are two main structures introduced in a manifold of probability distributions, and they capture its different characteristics. We study characteristics of Wasserstein geometry in the framework of Li and Zhao (2023) for the affine deformation statistical model, which is a multi-dimensional generalization of the location-scale model. We compare merits and demerits of estimators based on information geometry and Wasserstein geometry. The shape of a probability distribution and its affine deformation are separated in the Wasserstein geometry, showing its robustness against the waveform perturbation in exchange for the loss in Fisher efficiency. We show that the Wasserstein estimator is the moment estimator in the case of the elliptically symmetric affine deformation model. It coincides with the information-geometrical estimator (maximum-likelihood estimator) when and only when the waveform is Gaussian. The role of the Wasserstein efficiency is elucidated in terms of robustness against waveform change.	翻訳日:2023-11-22 18:33:50 公開日:2023-11-19
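To give a feel for how Wasserstein geometry treats location and scale changes, the sketch below computes the 2-Wasserstein distance between two equal-size one-dimensional samples, using the standard fact that the optimal coupling in one dimension pairs sorted values. It is only an illustration under that one-dimensional assumption; the paper's affine deformation model is multi-dimensional, and the sample values are made up.

```python
import math
from typing import Sequence


def wasserstein2_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """2-Wasserstein distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling pairs sorted values (quantiles).
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


base = [0.0, 1.0, 2.0, 5.0]
shifted = [x + 3.0 for x in base]   # pure location change
scaled = [2.0 * x for x in base]    # pure scale change

d_shift = wasserstein2_1d(base, shifted)  # equals the size of the shift
d_scale = wasserstein2_1d(base, scaled)   # depends only on second moments here
```

A pure shift moves every quantile by the same amount, so the distance is just the shift, whatever the shape of the sample. That is one simple way to see shape and affine deformation entering the distance separately.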
# 量子スピンガラスの絡み合いとレプリカ対称性の破れ Entanglement and replica symmetry breaking in a driven-dissipative quantum spin glass ( http://arxiv.org/abs/2307.10176v2 ) ライセンス: Link先を確認	Brendan P. Marsh, Ronen M. Kroeze, Surya Ganguli, Sarang Gopalakrishnan, Jonathan Keeling, and Benjamin L. Lev	(参考訳) 本稿では,共焦点共焦点共振器QEDシステムの量子力学シミュレーションについて述べる。開量子力学とレプリカ対称性の破れの間の密接な関係が確立され、個々の量子軌道がレプリカとなる。我々は、最大15個のスピン1/2粒子からなる完全連結でフラストレーションのあるスピンネットワークにおけるレプリカ対称性の破れの出現において、絡み合いが重要な役割を担っていることを観察する。絡み合ったスピンの量子軌道は、半古典的軌道よりも低いエネルギーの定常状態スピン配置に達する。キャビティ放出はスピン配置の連続確率的進化のモニタリングを可能にし、この計画からのバックアクションは状態が分裂したイジング状態と複製対称性に絡み合った。スピンガラス秩序の出現は、磁化の欠如とレプリカ間の非自明なスピン重なり密度分布の存在によってそれ自体が現れる。さらに、これらの重なりは、パリシ rsb 解 ansatz のシェリントン-カークパトリック模型と一致して、初期の超計量次数を示す。しかし、非熱パリスオーダーのパラメータ分布は、この量子光学スピングラスの駆動散逸性を強調している。この実用可能なシステムは、量子効果がスピングラスの物理をいかに強化するかを調べるためのテストベッドとして機能するかもしれない。 We describe simulations of the quantum dynamics of a confocal cavity QED system that realizes an intrinsically driven-dissipative spin glass. A close connection between open quantum dynamics and replica symmetry breaking is established, in which individual quantum trajectories are the replicas. We observe that entanglement plays an important role in the emergence of replica symmetry breaking in a fully connected, frustrated spin network of up to fifteen spin-1/2 particles. Quantum trajectories of entangled spins reach steady-state spin configurations of lower energy than that of semiclassical trajectories. Cavity emission allows monitoring of the continuous stochastic evolution of spin configurations, while backaction from this projects entangled states into states of broken Ising and replica symmetry. The emergence of spin glass order manifests itself through the simultaneous absence of magnetization and the presence of nontrivial spin overlap density distributions among replicas. Moreover, these overlaps reveal incipient ultrametric order, in line with the Parisi RSB solution ansatz for the Sherrington-Kirkpatrick model. 
A nonthermal Parisi order parameter distribution, however, highlights the driven-dissipative nature of this quantum optical spin glass. This practicable system could serve as a testbed for exploring how quantum effects enrich the physics of spin glasses.	翻訳日:2023-11-22 18:32:19 公開日:2023-11-19
# ヒッグス真空がゼロの可視宇宙の双対として実現される隠れたセクタダークマター Hidden Sector Dark Matter Realized as a Twin of the Visible Universe With Zero Higgs Vacuum Expectation ( http://arxiv.org/abs/2308.08107v4 ) ライセンス: Link先を確認	Stephen L. Adler	(参考訳) 宇宙は2つの同一の粒子集合とゲージ相互作用を含み、ヒッグスポテンシャルによって異なる重力によってのみ結合する。基礎となる対称性のため、非結合時の2つのセクタは非零相と零ヒッグス真空期待相の境界にあるヒッグスポテンシャルを持つと仮定する。 2つのセクター間の結合を断ち切ることで、あるセクターにおけるヒッグスポテンシャルを非ゼロヒッグス期待領域に(可視セクターを)押し込み、もう一方セクターにおけるヒッグスポテンシャルをゼロヒッグス期待領域に(暗セクターを)押し込むことができる。ダークセクターで最小の質量のバリオンは、自ら相互作用するダークマター粒子の候補となる。 We propose that the universe contains two identical sets of particles and gauge interactions, coupling only through gravitation, which differ by their Higgs potentials. We postulate that because of underlying symmetries, the two sectors when uncoupled have Higgs potentials that lie at the boundary between phases with nonzero and zero Higgs vacuum expectation. Turning on the coupling between the two sectors can break the degeneracy, pushing the Higgs potential in one sector into the domain of nonzero Higgs expectation (giving the visible sector), and pushing the Higgs potential in the other sector into the domain of zero Higgs expectation (giving the dark sector). The least massive baryon in the dark sector will then be a candidate self-interacting dark matter particle.	翻訳日:2023-11-22 18:20:18 公開日:2023-11-19
# 量子カーネル生成におけるいくつかの適合関数と絡み合いゲート Several fitness functions and entanglement gates in quantum kernel generation ( http://arxiv.org/abs/2309.03307v3 ) ライセンス: Link先を確認	Haiyan Wang	(参考訳) 量子機械学習(QML)は、量子技術における有望なフロンティアである。量子アドバンテージの追求において、サポートベクトルマシンのための量子カーネル法が強力なアプローチとして登場した。量子力学の基本的な概念である絡み合いは、量子コンピューティングにおいて中心的な役割を果たす。本稿では,多目的遺伝的アルゴリズムを用いて,量子カーネル特徴写像におけるエンタングルメントゲートの最適個数について検討する。我々は,絡み合いのための非局所ゲートと局所ゲートについて遺伝的アルゴリズムの適合度関数を区別し,エンタングルメントゲートを用いる利点について考察した。実験により,量子カーネル法における量子回路の最適構成は,絡み合いのための非局所ゲートを相応の数だけ含むことがわかった。この結果は、非局所ゲートが主に抑制された量子カーネル生成に関する以前の文献を補完する。さらに,量子サポートベクトルマシンの機能マップに必要な非局所ゲート数を推定するために,データの分離性指標を活用できることを実証する。この洞察は、データ分析に基づいたhttps://qiskit.org/のような様々な量子プログラミングパッケージで、絡み合いパラメータなどの適切なパラメータを選択するのに役立つ。本研究は,量子機械学習アルゴリズムの効率と精度を向上させる上で有用なガイダンスを提供する。 Quantum machine learning (QML) represents a promising frontier in quantum technologies. In the pursuit of quantum advantage, the quantum kernel method for support vector machines has emerged as a powerful approach. Entanglement, a fundamental concept in quantum mechanics, assumes a central role in quantum computing. In this paper, we investigate the optimal number of entanglement gates in the quantum kernel feature maps by a multi-objective genetic algorithm. We distinguish the fitness functions of the genetic algorithm for non-local gates for entanglement and for local gates to gain insights into the benefits of employing entanglement gates. Our experiments reveal that the optimal configuration of quantum circuits for the quantum kernel method incorporates a proportional number of non-local gates for entanglement. The result complements the prior literature on quantum kernel generation where non-local gates were largely suppressed. Furthermore, we demonstrate that the separability indexes of data can be leveraged to estimate the number of non-local gates required for the quantum support vector machine's feature maps. 
This insight can be helpful in selecting appropriate parameters, such as the entanglement parameter, in various quantum programming packages like https://qiskit.org/ based on data analysis. Our findings offer valuable guidance for enhancing the efficiency and accuracy of quantum machine learning algorithms.	翻訳日:2023-11-22 18:10:20 公開日:2023-11-19
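For orientation, a fidelity-style quantum kernel has the form k(x, y) = |<phi(x)|phi(y)>|^2. The sketch below simulates this classically for a deliberately simple feature map that uses only local gates (one RY rotation per qubit, no entangling gates). The feature map and function names are assumptions chosen for illustration; they are not the circuits evolved in the paper and not an API of any quantum package.

```python
import math
from typing import List, Sequence


def product_feature_state(x: Sequence[float]) -> List[float]:
    """Amplitudes of the product state obtained by applying RY(x_i) to |0> on qubit i."""
    state = [1.0]
    for angle in x:
        qubit = (math.cos(angle / 2.0), math.sin(angle / 2.0))
        state = [amp * q for amp in state for q in qubit]
    return state


def fidelity_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """k(x, y) = |<phi(x)|phi(y)>|^2 for the real-amplitude feature map above."""
    overlap = sum(a * b for a, b in zip(product_feature_state(x),
                                        product_feature_state(y)))
    return overlap ** 2


k_same = fidelity_kernel([0.3, 1.2], [0.3, 1.2])      # identical inputs
k_orth = fidelity_kernel([0.0, 0.5], [math.pi, 0.5])  # first qubit flipped
```

With local gates only, the kernel factorizes into a product of per-feature terms cos^2((x_i - y_i)/2). Non-local (entangling) gates are what break this factorization, which is why their number is a meaningful design parameter.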
# 情報熱力学第二法則の普遍的妥当性 Universal validity of the second law of information thermodynamics ( http://arxiv.org/abs/2308.15558v2 ) ライセンス: Link先を確認	Shintaro Minagawa, M. Hamed Mohammady, Kenta Sakai, Kohtaro Kato, Francesco Buscemi	(参考訳) フィードバック制御と消去プロトコルは、マックスウェルのデモンパラドックスを具現化し、熱力学と情報処理の相互作用を研究するモデルとしてしばしば考えられている。このような研究は、マクスウェルのデーモンと第二の熱力学の法則が平和的に共存できるという結論をコミュニティで広く受け入れられており、デーモンが与える利益は、測定を行い、デーモンの記憶を初期状態に戻すコストによって相殺されなければならないからである。この種のステートメントは、まとめて情報熱力学の第2法則と呼ばれ、最近量子理論シナリオを含むように拡張されている。しかし、この方向の以前の研究では、特にデーモンの記憶におけるフィードバックプロセスと測定についていくつかの仮定がなされており、普遍的に適用不可能であり、有効範囲が明確でないステートメントに到達している。本研究では、熱力学の第2法則と完全に一致した量子フィードバック制御および消去プロトコルの全範囲を正確に特徴付けることにより、このギャップを埋める。量子フィードバック制御と消去プロトコルは、そのプロトコルが熱力学と全体的な互換性がある限り、関連する測定プロセスに関係なく保持されなければならない。我々の包括的な分析は、新しいシナリオを包含するだけでなく、より少ない仮定で以前のシナリオも取り出す。この単純化は理論のより明確な理解に寄与する。さらに,本研究は,フィードバック制御により抽出可能な作業の特徴を識別する正しい情報尺度として,Groenewold-Ozawa情報ゲインを同定する。 Feedback control and erasure protocols have often been considered as a model to embody Maxwell's Demon paradox and to study the interplay between thermodynamics and information processing. Such studies have led to the conclusion, now widely accepted in the community, that Maxwell's Demon and the second law of thermodynamics can peacefully coexist because any gain provided by the demon must be offset by the cost of performing measurement and resetting the demon's memory to its initial state. Statements of this kind are collectively referred to as second laws of information thermodynamics and have recently been extended to include quantum theoretical scenarios. However, previous studies in this direction have made several assumptions, in particular about the feedback process and the measurement performed on the demon's memory, and thus arrived at statements that are not universally applicable and whose range of validity is not clear. 
In this work, we fill this gap by precisely characterizing the full range of quantum feedback control and erasure protocols that are overall consistent with the second law of thermodynamics. This leads us to conclude that the second law of information thermodynamics is indeed universal: it must hold for any quantum feedback control and erasure protocol, regardless of the measurement process involved, as long as the protocol is overall compatible with thermodynamics. Our comprehensive analysis not only encompasses new scenarios but also retrieves previous ones, doing so with fewer assumptions. This simplification contributes to a clearer understanding of the theory. Additionally, our work identifies the Groenewold--Ozawa information gain as the correct information measure characterizing the work extractable by feedback control.	翻訳日:2023-11-22 18:07:46 公開日:2023-11-19
# SA2-Net:顕微鏡画像分割のためのスケールアウェアアテンションネットワーク SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation ( http://arxiv.org/abs/2309.16661v3 ) ライセンス: Link先を確認	Mustansar Fiaz, Moein Heidari, Rao Muhammad Anwer, Hisham Cholakkal	(参考訳) 顕微鏡画像分割は、与えられた顕微鏡画像内の各ピクセルに意味的ラベルを割り当てることを目的としている。畳み込みニューラルネットワーク(CNN)は多くの既存のフレームワークの基礎となっているが、多くの場合、長距離依存を明示的に捉えるのに苦労する。当初、トランスフォーマーは自己注意でこの問題に対処するために考案されたが、形状、サイズ、外観、ターゲット領域密度など、顕微鏡画像における様々な課題に対処するために、局所的特徴とグローバルな特徴の両方が重要であることが証明されている。本稿では,マルチスケール特徴学習を利用して,顕微鏡画像内の多様な構造を効果的に処理する,注意誘導型SA2-Netを提案する。具体的には,細胞などの微細領域のスケールや形状の変動を正確に把握し,正確なセグメンテーションを行うためのSA2モジュールを提案する。このモジュールは、マルチステージ機能の各レベルにおけるローカルな注意と、複数の解像度にわたるグローバルな注意を取り入れている。さらに、アダプティブアップアテンション(AuA)モジュールと呼ばれる新しいアップサンプリング戦略を導入することで、ぼやけた領域境界(セル境界など)の問題に対処する。このモジュールは、明示的な注意機構を用いて顕微鏡領域の局在性を改善するための識別能力を高める。 5つの挑戦的なデータセットに関する広範な実験は、SA2-Netモデルの利点を示しています。ソースコードは https://github.com/mustansarfiaz/SA2-Net で公開されている。 Microscopic image segmentation is a challenging task, wherein the objective is to assign semantic labels to each pixel in a given microscopic image. While convolutional neural networks (CNNs) form the foundation of many existing frameworks, they often struggle to explicitly capture long-range dependencies. Although transformers were initially devised to address this issue using self-attention, it has been proven that both local and global features are crucial for addressing diverse challenges in microscopic images, including variations in shape, size, appearance, and target region density. In this paper, we introduce SA2-Net, an attention-guided method that leverages multi-scale feature learning to effectively handle diverse structures within microscopic images. Specifically, we propose a scale-aware attention (SA2) module designed to capture inherent variations in scales and shapes of microscopic regions, such as cells, for accurate segmentation. 
This module incorporates local attention at each level of multi-stage features, as well as global attention across multiple resolutions. Furthermore, we address the issue of blurred region boundaries (e.g., cell boundaries) by introducing a novel upsampling strategy called the Adaptive Up-Attention (AuA) module. This module enhances the discriminative ability for improved localization of microscopic regions using an explicit attention mechanism. Extensive experiments on five challenging datasets demonstrate the benefits of our SA2-Net model. Our source code is publicly available at https://github.com/mustansarfiaz/SA2-Net	翻訳日:2023-11-22 17:58:49 公開日:2023-11-19
# 時系列予測: 差分データによる長期依存の解放 Time-Series Forecasting: Unleashing Long-Term Dependencies with Fractionally Differenced Data ( http://arxiv.org/abs/2309.13409v3 ) ライセンス: Link先を確認	Sarit Maitra, Vivek Mishra, Srashti Dwivedi, Sukanya Kundu, Goutam Kumar Kundu	(参考訳) 本研究では,分数差分(FD)のパワーを利用して時系列データにおける短期的および長期的依存関係を捉える新しい予測手法を提案する。従来の整数差分法とは異なり、FDは系列のメモリを保持しつつ、モデリングのために安定化する。SPY指数からの金融データにFDを適用し,ニュースレポートからの感情分析を組み込むことで,FDの有効性を目標変数のバイナリ分類と組み合わせて検討する。教師付き分類アルゴリズムを用いてFDシリーズの性能を検証した。その結果, 整数差に対するFDの優位性を示し, 受信器動作特性/Area Under the Curve (ROC AUC) とMatthews Correlation Coefficient (MCC) の評価で確認された。 This study introduces a novel forecasting strategy that leverages the power of fractional differencing (FD) to capture both short- and long-term dependencies in time series data. Unlike traditional integer differencing methods, FD preserves memory in the series while stabilizing it for modeling purposes. By applying FD to financial data from the SPY index and incorporating sentiment analysis from news reports, this empirical analysis explores the effectiveness of FD in conjunction with binary classification of target variables. Supervised classification algorithms were employed to validate the performance of FD series. The results demonstrate the superiority of FD over integer differencing, as confirmed by Receiver Operating Characteristic/Area Under the Curve (ROC AUC) and Matthews Correlation Coefficient (MCC) evaluations.	翻訳日:2023-11-22 17:57:16 公開日:2023-11-19
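Fractional differencing applies the operator (1 - B)^d with a non-integer order d, where B is the backshift operator. Its binomial expansion gives weights w_0 = 1 and w_k = -w_{k-1} (d - k + 1) / k. The following is a minimal fixed-window sketch of that recursion; the window length and the toy series are assumptions for illustration, and the abstract does not say which variant the authors used.

```python
from typing import List, Sequence


def frac_diff_weights(d: float, n_weights: int) -> List[float]:
    """First n_weights coefficients of (1 - B)^d, where B is the backshift operator."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series: Sequence[float], d: float, n_weights: int) -> List[float]:
    """Fixed-window fractional difference; the first n_weights - 1 points are dropped."""
    w = frac_diff_weights(d, n_weights)
    return [
        sum(w[k] * series[t - k] for k in range(n_weights))
        for t in range(n_weights - 1, len(series))
    ]


prices = [1.0, 2.0, 4.0, 7.0, 11.0]     # hypothetical price series
w_half = frac_diff_weights(0.5, 3)      # slowly decaying weights keep memory
first_diff = frac_diff(prices, 1.0, 2)  # d = 1 recovers ordinary differencing
```

For d = 1 the weights collapse to (1, -1), so only the previous value matters. For 0 < d < 1 the weights decay slowly, which is how older observations keep influencing the differenced series.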
# スパースエントロピーワッサースタイン回帰を用いたロバストネットワークプラニング Robust Network Pruning With Sparse Entropic Wasserstein Regression ( http://arxiv.org/abs/2310.04918v2 ) ライセンス: Link先を確認	Lei You and Hei Victor Cheng	(参考訳) 本研究では、経験的フィッシャー情報行列(FIM)の計算において、不正確な勾配が存在するというニューラルネットワークプルーニングの問題に取り組む。我々は, 最適輸送 (OT) 問題の幾何学的属性を活かしたエントロピーワッサースタイン回帰 (EWR) の定式化を提案する。これは、データポイント間の近傍補間を採用することでノイズ緩和に優れることが解析的に示される。ワッサーシュタイン距離の独特な強さは、ノイズ低減と共分散情報保存のバランスをとる本質的な能力である。各種ネットワーク上での大規模実験により,提案手法と最先端(SoTA)ネットワークプルーニングアルゴリズムとの同等の性能を示した。提案手法は,ネットワークサイズやターゲットのスパース性が大きい場合にSoTAを上回り,ノイズデータやアナログメモリ,敵対的攻撃などにより,ノイズ勾配が存在する場合に,さらに大きな利得が得られる。特に,提案手法では,ネットワークパラメータの4分の1以下しか残っていないMobileNetV1の精度が6%向上し,テスト損失が8%向上した。 This study tackles the problem of inaccurate gradients in neural network pruning, which arises when computing the empirical Fisher Information Matrix (FIM). We introduce an entropic Wasserstein regression (EWR) formulation, capitalizing on the geometric attributes of the optimal transport (OT) problem. This is analytically shown to excel in noise mitigation by adopting neighborhood interpolation across data points. The unique strength of the Wasserstein distance is its intrinsic ability to strike a balance between noise reduction and covariance information preservation. Extensive experiments performed on various networks show comparable performance of the proposed method with state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large; the gain is even larger in the presence of noisy gradients, which may arise from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a 6% improvement in accuracy and an 8% improvement in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining.	翻訳日:2023-11-22 17:47:07 公開日:2023-11-19
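The entropic optimal transport problem that the formulation above builds on is commonly solved with Sinkhorn scaling. The sketch below shows only that generic building block on a tiny hypothetical problem. It is not the paper's EWR pruning method, and the histograms, cost matrix, regularization strength and iteration count are all illustrative assumptions.

```python
import math
from typing import List, Sequence


def sinkhorn_plan(a: Sequence[float], b: Sequence[float],
                  cost: Sequence[Sequence[float]],
                  eps: float = 1.0, n_iter: int = 500) -> List[List[float]]:
    """Entropy-regularized OT plan between histograms a and b via Sinkhorn scaling."""
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps); larger eps gives a more diffuse plan.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


points = [0.0, 1.0, 2.0]
cost = [[(x - y) ** 2 for y in points] for x in points]  # squared-distance cost
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
plan = sinkhorn_plan(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(3)]
```

The resulting plan spreads each source point's mass over nearby targets instead of matching points one to one. This smoothing is the intuition behind the neighborhood interpolation mentioned in the abstract.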
# ボルン則はどこから来るのか? 重ね合わせ Where does the Born Rule come from? Superposition ( http://arxiv.org/abs/2310.04188v3 ) ライセンス: Link先を確認	David Ellerman	(参考訳) ボルン則は、数学的形式と実験結果とを確率によって結びつけるため、量子力学 (QM) において重要な役割を担っている。ボルン則は通常の確率論には現れない。では、それはどこから来るのか。これは文献において大きな論争の的となってきた。我々は、ボルン則が現れる通常の確率論の最も単純な拡張は何かを問うアプローチを取る。その答えとして、(通常の離散事象に加えて)重ね合わせ事象の概念を有限確率論に加えることでボルン則が現れることを示す。したがって、この規則は物理学に基づく導出を必要としない。これは、通常の確率論に重ね合わせ事象だけを加えたときの、重ね合わせの数学の特徴にすぎない。 The Born Rule plays a critical role in quantum mechanics (QM) since it supplies the link between the mathematical formalism and experimental results in terms of probabilities. The Born Rule does not occur in ordinary probability theory. Where then does it come from? This has been a topic of considerable controversy in the literature. We take the approach of asking what is the simplest extension of ordinary probability theory where the Born Rule appears. This is answered by showing that the Born Rule appears by adding the notion of superposition events (in addition to the ordinary discrete events) to finite probability theory. Hence the rule does not need any physics-based derivation. It is simply a feature of the mathematics of superposition when only superposition events are added to ordinary probability theory.	翻訳日:2023-11-22 17:45:54 公開日:2023-11-19
# 初歩的行動の学習と再利用による隠れ体験の再現 Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency ( http://arxiv.org/abs/2310.01827v2 ) ライセンス: Link先を確認	Francisco Roldan Sanchez, Qiang Wang, David Cordova Bulens, Kevin McGuinness, Stephen Redmond, Noel O'Connor	(参考訳) hindsight experience replay (her) は強化学習 (rl) で用いられるテクニックであり、スパース報酬を用いて目標ベースのロボット操作タスクを解決するために、オフポリシーrlベースのエージェントをトレーニングするのに非常に効率的であることが証明されている。 HERは、過去の経験の誤りから学習することで、RLベースのエージェントのサンプル効率を改善するが、環境を探索する際のガイダンスは提供しない。これは、このリプレイ戦略を使ってエージェントを訓練するのに必要な経験量のために、非常に大きなトレーニング時間をもたらす。本稿では,より複雑なタスクを学習しながら,エージェントを探索中により報奨的行動に導くために,単純なタスクの解法として学習された原始的な振る舞いを用いた手法を提案する。しかし、この指導は手動で設計したカリキュラムによっては実行されず、批判者ネットワークを使用して、前述したプリミティブポリシーによって提案されたアクションを使用するかどうかを各時間ステップで決定する。本手法は,複数のブロック操作タスクにおいて,その性能とアルゴリズムのより効率的なバリエーションを比較して評価する。提案手法では, サンプル効率と計算時間の両方から, エージェントがより早く方針を学習できることを実証する。コードはhttps://github.com/franroldans/qmp-herで入手できる。 Hindsight Experience Replay (HER) is a technique used in reinforcement learning (RL) that has proven to be very efficient for training off-policy RL-based agents to solve goal-based robotic manipulation tasks using sparse rewards. Even though HER improves the sample efficiency of RL-based agents by learning from mistakes made in past experiences, it does not provide any guidance while exploring the environment. This leads to very large training times due to the volume of experience required to train an agent using this replay strategy. In this paper, we propose a method that uses primitive behaviours that have been previously learned to solve simple tasks in order to guide the agent toward more rewarding actions during exploration while learning other more complex tasks. This guidance, however, is not executed by a manually designed curriculum, but rather using a critic network to decide at each timestep whether or not to use the actions proposed by the previously-learned primitive policies. 
We evaluate our method by comparing its performance against HER and other more efficient variations of this algorithm in several block manipulation tasks. We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time. Code is available at https://github.com/franroldans/qmp-her.	翻訳日:2023-11-22 17:43:57 公開日:2023-11-19
# RegBN: 正則化を伴うマルチモーダルデータのバッチ正規化 RegBN: Batch Normalization of Multimodal Data with Regularization ( http://arxiv.org/abs/2310.00641v2 ) ライセンス: Link先を確認	Morteza Ghahremani and Christian Wachinger	(参考訳) 近年、マルチモーダルデータの統合におけるニューラルネットワークの成功によって、マルチソースセンサーが捉えた高次元データを統合することへの関心が高まっている。しかし、不均一なマルチモーダルデータの統合は大きな課題となる。不均一なデータソース間の交絡効果と依存関係が望ましくない変動とバイアスをもたらし、マルチモーダルモデルの性能を準最適にするためである。そのため、融合前にデータモダリティから抽出した低レベル・高レベルの特徴を正規化することが重要となる。本稿では,正則化を組み込んだマルチモーダルデータの正規化のための新しい手法,RegBNを提案する。 RegBNはFrobeniusノルムを正則化項として使用し、交絡因子の副作用と、異なるデータソース間の基盤となる依存関係に対処する。提案手法は複数のモダリティにまたがってうまく一般化し,学習可能なパラメータの必要性を排除して,トレーニングと推論を簡素化する。言語, 音声, 画像, ビデオ, 深度, 表形式, 3次元MRIなどの多様なモダリティを含む5つの研究領域の8つのデータベース上でRegBNの有効性を検証する。提案手法は多層パーセプトロン,畳み込みニューラルネットワーク,ビジョントランスフォーマーなどの異なるアーキテクチャに適用可能であり,マルチモーダルニューラルネットワークにおいて低レベルと高レベルの両方の特徴を効果的に正規化できることを示す。 RegBN は https://github.com/mogvision/regbn で利用可能である。 Recent years have witnessed a surge of interest in integrating high-dimensional data captured by multisource sensors, driven by the impressive success of neural networks in the integration of multimodal data. However, the integration of heterogeneous multimodal data poses a significant challenge, as confounding effects and dependencies among such heterogeneous data sources introduce unwanted variability and bias, leading to suboptimal performance of multimodal models. Therefore, it becomes crucial to normalize the low- or high-level features extracted from data modalities before their fusion takes place. This paper introduces a novel approach for the normalization of multimodal data, called RegBN, that incorporates regularization. RegBN uses the Frobenius norm as a regularizer term to address the side effects of confounders and underlying dependencies among different data sources. The proposed method generalizes well across multiple modalities and eliminates the need for learnable parameters, simplifying training and inference. 
We validate the effectiveness of RegBN on eight databases from five research areas, encompassing diverse modalities such as language, audio, image, video, depth, tabular, and 3D MRI. The proposed method demonstrates broad applicability across different architectures such as multilayer perceptrons, convolutional neural networks, and vision transformers, enabling effective normalization of both low- and high-level features in multimodal neural networks. RegBN is available at https://github.com/mogvision/regbn.	翻訳日:2023-11-22 17:43:34 公開日:2023-11-19
# SoybeanNet:無人航空機(UAV)画像からダイズポッドを数えるトランスフォーマーベースの畳み込みニューラルネットワーク SoybeanNet: Transformer-Based Convolutional Neural Network for Soybean Pod Counting from Unmanned Aerial Vehicle (UAV) Images ( http://arxiv.org/abs/2310.10861v2 ) ライセンス: Link先を確認	Jiajia Li, Raju Thada Magar, Dong Chen, Feng Lin, Dechun Wang, Xiang Yin, Weichao Zhuang and Zhaojian Li	(参考訳) 大豆は食物、タンパク質、油の重要な供給源であり、その収量の向上、栽培法の改善、大豆の育種技術の進歩をめざす広範な研究が行われている。この文脈において、ダイズポッドカウントは生産の理解と最適化において重要な役割を果たす。近年の進歩にもかかわらず,実地環境で効果的に動作可能なロバストなポッドカウントアルゴリズムの開発は依然として大きな課題である。本稿では,米国ミシガン州の実際の大豆畑で撮影した無人航空機(UAV)画像を用いた高精度な大豆ポッドカウントの先駆的研究を示す。具体的には,大豆ポッドの同時カウントとローカライゼーションを高精度に行うために,強力なトランスフォーマーバックボーンを利用する新しいポイントベースカウントネットワークであるSoybeanNetを提案する。さらに、ダイズポッドカウントのためのUAV取得画像の新しいデータセットを作成してオープンソース化した。このデータセットは、自然光の下で撮影された113枚のドローン画像と、260k以上の手動で注釈付けされたダイズポッドからなる。総合的な評価を通じて、SoybeanNetは、収集した画像でのテストにおいて、5つの最先端アプローチよりも優れた性能を示した。注目すべきは、SoybeanNetがテストデータセットで84.51%のカウント精度を達成し、実世界のシナリオにおける有効性を示したことである。また、ソースコード(https://github.com/JiajiaLi04/Soybean-Pod-Counting-from-UAV-Images)とラベル付き大豆データセット(https://www.kaggle.com/datasets/jiajiali/uav-based-soybean-pod-images)も提供している。 Soybeans are a critical source of food, protein and oil, and thus have received extensive research aimed at enhancing their yield, refining cultivation practices, and advancing soybean breeding techniques. Within this context, soybean pod counting plays an essential role in understanding and optimizing production. Despite recent advancements, the development of a robust pod-counting algorithm capable of performing effectively in real-field conditions remains a significant challenge. This paper presents a pioneering work of accurate soybean pod counting utilizing unmanned aerial vehicle (UAV) images captured from actual soybean fields in Michigan, USA. Specifically, this paper presents SoybeanNet, a novel point-based counting network that harnesses powerful transformer backbones for simultaneous soybean pod counting and localization with high accuracy. 
In addition, a new dataset of UAV-acquired images for soybean pod counting was created and open-sourced, consisting of 113 drone images with more than 260k manually annotated soybean pods captured under natural lighting conditions. Through comprehensive evaluations, SoybeanNet demonstrated superior performance over five state-of-the-art approaches when tested on the collected images. Remarkably, SoybeanNet achieved a counting accuracy of 84.51% when tested on the testing dataset, attesting to its efficacy in real-world scenarios. The publication also provides both the source code (https://github.com/JiajiaLi04/Soybean-Pod-Counting-from-UAV-Images) and the labeled soybean dataset (https://www.kaggle.com/datasets/jiajiali/uav-based-soybean-pod-images), offering a valuable resource for future research endeavors in soybean pod counting and related fields.	翻訳日:2023-11-22 17:34:42 公開日:2023-11-19
# AutoDIR: 遅延拡散によるオールインワン画像の自動復元 AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion ( http://arxiv.org/abs/2310.10123v3 ) ライセンス: Link先を確認	Yitong Jiang, Zhaoyang Zhang, Tianfan Xue and Jinwei Gu	(参考訳) 本稿では,ある画像が未知の劣化を生じさせる複雑な実世界の画像復元状況を解決することを目的とする。そこで本研究では,複数の未知の劣化を自動的に検出し対処できる,潜在拡散(autodir)を備えたオールインワン画像復元フレームワークを提案する。まず,ブラインド画像品質評価モジュール(biqa)を用いて,画像の未知の支配的画像劣化型の自動検出と同定を行う。次に、オールインワンイメージリファインメント(AIR)モジュールは、BIQAのガイダンスにより、複数の種類の劣化画像復元を処理する。最後に,AIRで歪んだ画像の復元のために,SCM(Structure Correction Module)を提案する。総合的な評価から,autodirはより広い範囲のタスクをサポートしながら,優れた修復結果を達成し,最先端のアプローチに勝ることが示された。特にAutoDIRは、複数の未知の劣化を伴う実シナリオイメージを自動的に処理する最初の方法でもある。 In this paper, we aim to solve complex real-world image restoration situations, in which, one image may have a variety of unknown degradations. To this end, we propose an all-in-one image restoration framework with latent diffusion (AutoDIR), which can automatically detect and address multiple unknown degradations. Our framework first utilizes a Blind Image Quality Assessment Module (BIQA) to automatically detect and identify the unknown dominant image degradation type of the image. Then, an All-in-One Image Refinement (AIR) Module handles multiple kinds of degradation image restoration with the guidance of BIQA. Finally, a Structure Correction Module (SCM) is proposed to recover the image details distorted by AIR. Our comprehensive evaluation demonstrates that AutoDIR outperforms state-of-the-art approaches by achieving superior restoration results while supporting a wider range of tasks. Notably, AutoDIR is also the first method to automatically handle real-scenario images with multiple unknown degradations.	翻訳日:2023-11-22 17:33:27 公開日:2023-11-19
# 動的モジュール展開と適応による生涯シーケンス生成 Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation ( http://arxiv.org/abs/2310.09886v3 ) ライセンス: Link先を確認	Chengwei Qin, Chen Chen, Shafiq Joty	(参考訳) 連続学習の課題である生涯シーケンス生成(LSG)は、連続的なタスクのシーケンス上でモデルを継続的に訓練し、過去の知識の忘れを回避しつつ、常に新しい世代パターンを学習することを目的としている。既存のLSG手法は主に、タスク間の知識伝達にほとんど注意を払わずに、古い知識を維持することに焦点を当てている。対照的に、人間は以前に獲得した類似のタスクからの知識を活用することで、新しいタスクをよりよく学べる。ヒトの学習パラダイムにインスパイアされた動的モジュール拡張・適応(DMEA)を提案し,タスク相関に基づく新しい知識獲得のためのアーキテクチャを動的に決定し,最も類似したタスクを選択し,新しいタスクへの適応を容易にする。さらに,学習プロセスが現在のタスクに偏りやすく,学習前の知識をより厳しく忘れてしまう可能性があることから,現在のタスクと再生タスクの学習のバランスをとるために,動的勾配スケーリングを提案する。大規模な実験により、DMEAはLSG設定の異なる既存手法より一貫して優れていることを示す。 Lifelong sequence generation (LSG), a problem in continual learning, aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns while avoiding the forgetting of previous knowledge. Existing LSG methods mainly focus on maintaining old knowledge while paying little attention to knowledge transfer across tasks. In contrast, humans can better learn new tasks by leveraging previously acquired knowledge from similar tasks. Inspired by the learning paradigm of humans, we propose Dynamic Module Expansion and Adaptation (DMEA), which enables the model to dynamically determine the architecture for acquiring new knowledge based on task correlation and select the most similar previous tasks to facilitate adaptation to new tasks. In addition, as the learning process can easily be biased towards the current task which might cause more severe forgetting of previously learned knowledge, we propose dynamic gradient scaling to balance the learning of the current task and replayed tasks. With extensive experiments, we demonstrate that DMEA can consistently outperform existing methods in different LSG settings.	翻訳日:2023-11-22 17:33:11 公開日:2023-11-19
# がんにおけるバイオマーカー発見のための大規模言語モデルから知識グラフへ From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer ( http://arxiv.org/abs/2310.08365v2 ) ライセンス: Link先を確認	Md. Rezaul Karim and Lina Molinas Comet and Md Shajalal and Oya Deniz Beyan and Dietrich Rebholz-Schuhmann and Stefan Decker	(参考訳) ドメインの専門家は、様々な疾患のシナリオにおける予防と治療的意思決定の戦略を設計するのに役立つ、特定の生物学的プロセスの理解と普及のために、最新の知識に頼ることが多い。人工知能(AI)にとって難しいシナリオは、生体医学データ(テキスト、画像、オミクス、臨床データなど)を使用して、がんの診断と治療の推奨を提供することである。がん、薬物、遺伝子、タンパク質などの生体医学的実体とそのメカニズムに関するデータと知識は、構造化された情報源(知識ベース(KB))と非構造化の情報源(科学論文など)にまたがって散在している。大規模知識グラフ(KG)は、意味的に相互に関連するエンティティや関係に関する事実を統合・抽出することによって構築できる。このようなKGは、探索と質問応答(QA)を可能にするだけでなく、ドメインの専門家が新しい知識を推論することも可能にする。しかし,データ資産やセマンティック技術に対する理解の欠如から,大規模KGの探索とクエリは非ドメインユーザにとって面倒である。本稿では,がん特異的バイオマーカー発見と対話型QAを支援するドメインKGを開発する。そのために、遺伝子と疾患(異なる種類のがん)の関係を検証するための意味的推論を可能にする、OncoNet Ontology (ONO) というドメインオントロジーを構築した。 KGは、BioBERTおよびSciBERTベースの情報抽出器を用いて、ONO、メタデータ、統制語彙、および科学論文から得た生物医学的概念を調和させることによりさらに拡充される。さらに、生物医学領域は進化しており、新しい発見が古い発見に取って代わることが多いため、最新の科学的発見にアクセスできなければ、AIシステムが診断と治療を提供する際に概念ドリフトを示す可能性が高い。そこで,より最近の論文やKBに基づいて,大規模言語モデル(LLM)を用いてKGを微調整する。 Domain experts often rely on the most recent knowledge for apprehending and disseminating specific biological processes that help them design strategies for developing prevention and therapeutic decision-making in various disease scenarios. A challenging scenario for artificial intelligence (AI) is using biomedical data (e.g., texts, imaging, omics, and clinical) to provide diagnosis and treatment recommendations for cancerous conditions. Data and knowledge about biomedical entities like cancer, drugs, genes, proteins, and their mechanisms are spread across structured (knowledge bases (KBs)) and unstructured (e.g., scientific articles) sources. A large-scale knowledge graph (KG) can be constructed by integrating and extracting facts about semantically interrelated entities and relations. Such a KG not only allows exploration and question answering (QA) but also enables domain experts to deduce new knowledge. 
However, exploring and querying large-scale KGs is tedious for non-domain users due to their lack of understanding of the data assets and semantic technologies. In this paper, we develop a domain KG to leverage cancer-specific biomarker discovery and interactive QA. For this, we constructed a domain ontology called OncoNet Ontology (ONO), which enables semantic reasoning for validating gene-disease (different types of cancer) relations. The KG is further enriched by harmonizing the ONO, metadata, controlled vocabularies, and biomedical concepts from scientific articles by employing BioBERT- and SciBERT-based information extractors. Further, since the biomedical domain is evolving, where new findings often replace old ones, without having access to up-to-date scientific findings, there is a high chance an AI system exhibits concept drift while providing diagnosis and treatment. Therefore, we fine-tune the KG using large language models (LLMs) based on more recent articles and KBs.	翻訳日:2023-11-22 17:31:25 公開日:2023-11-19
# FedMFS:選択的モダリティ通信を用いた連合マルチモーダル融合学習 FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication ( http://arxiv.org/abs/2310.07048v2 ) ライセンス: Link先を確認	Liangqi Yuan and Dong-Jun Han and Vishnu Pandi Chellapandi and Stanislaw H. Żak and Christopher G. Brinton	(参考訳) マルチモーダル連合学習 (FL) は、デバイスが複数のモダリティ(圧力、動き、その他の種類のデータを測定するセンサーなど)で計測値を収集しているFL設定でのモデルトレーニングを強化することを目的としている。しかし、特に異種ネットワーク設定において、マルチモーダルFLに対する重要な課題は未解決のままである。 (i)各デバイスが収集するモダリティの集合は多様であり、 (ii) 通信制限により、デバイスはローカルに訓練したモダリティモデルのすべてをサーバにアップロードすることができない。本稿では,上記の課題に対処可能な新しいマルチモーダル融合FL手法であるFedMFS (Federated Multimodal Fusion learning with Selective modality communication) を提案する。鍵となるアイデアは、各デバイスに対するモダリティ選択基準の導入であり、この基準は (i)Shapley値分析によって測定されるモダリティの影響と、 (ii)通信オーバーヘッドの指標としてのモダリティモデルサイズとを比較衡量する。これにより、FedMFSはリソースの制約やアプリケーション要件に応じて、性能と通信コストのバランスを柔軟にとることができる。実世界のActionSenseデータセットの実験では、FedMFSが複数のベースラインに匹敵する精度を達成しつつ、通信オーバーヘッドを4倍以上削減できることを示した。 Multimodal federated learning (FL) aims to enrich model training in FL settings where devices are collecting measurements across multiple modalities (e.g., sensors measuring pressure, motion, and other types of data). However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings: (i) the set of modalities collected by each device will be diverse, and (ii) communication limitations prevent devices from uploading all their locally trained modality models to the server. In this paper, we propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS), a new multimodal fusion FL methodology that can tackle the above-mentioned challenges. The key idea is the introduction of a modality selection criterion for each device, which weighs (i) the impact of the modality, gauged by Shapley value analysis, against (ii) the modality model size as a gauge for communication overhead. 
This enables FedMFS to flexibly balance performance against communication costs, depending on resource constraints and application requirements. Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x.	翻訳日:2023-11-22 17:29:47 公開日:2023-11-19
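The FedMFS entry above describes a per-device criterion that weighs a modality's impact against its model size. The abstract does not give the actual formula, so the following is only a hypothetical sketch of such a trade-off: the `Modality` class, the linear score, the `alpha` weight, and the example numbers are all assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Modality:
    name: str
    impact: float   # e.g. a Shapley-value estimate of the modality's contribution
    size_mb: float  # size of the locally trained modality model to upload


def select_modalities(modalities, alpha, top_k):
    """Rank modalities by a weighted trade-off and keep the best `top_k`.

    Hypothetical score: alpha * normalised impact - (1 - alpha) * normalised size.
    Assumes impacts and sizes are positive so the normalisation is meaningful.
    """
    max_impact = max(m.impact for m in modalities)
    max_size = max(m.size_mb for m in modalities)

    def score(m):
        return alpha * (m.impact / max_impact) - (1 - alpha) * (m.size_mb / max_size)

    ranked = sorted(modalities, key=score, reverse=True)
    return [m.name for m in ranked[:top_k]]


if __name__ == "__main__":
    device_modalities = [  # hypothetical values for one device
        Modality("camera", impact=0.9, size_mb=40.0),
        Modality("imu", impact=0.6, size_mb=2.0),
        Modality("pressure", impact=0.2, size_mb=1.0),
    ]
    print(select_modalities(device_modalities, alpha=0.5, top_k=1))
    print(select_modalities(device_modalities, alpha=1.0, top_k=1))
```

Setting `alpha` close to 1 favours the most informative modality regardless of upload cost, while lower values favour small models, which mirrors the performance versus communication balance described in the abstract.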
# Frank-Wolfeベースのメタラウンディングアルゴリズムによるオンライン組合せ線形最適化 Online Combinatorial Linear Optimization via a Frank-Wolfe-based Metarounding Algorithm ( http://arxiv.org/abs/2310.12629v2 ) ライセンス: Link先を確認	Ryotaro Mitsuboshi, Kohei Hatano, and Eiji Takimoto	(参考訳) メタラウンディングは、ある組合せクラス上の線形最適化のための近似アルゴリズムを、同じクラスに対するオンライン線形最適化アルゴリズムに変換するアプローチである。本稿では, 組合せクラスに対して緩和に基づく近似アルゴリズムが存在するという自然な仮定のもとで, 新たなメタラウンディングアルゴリズムを提案する。このアルゴリズムは、理論面でも実用面でもはるかに効率的である。 Metarounding is an approach to convert an approximation algorithm for linear optimization over a combinatorial class to an online linear optimization algorithm for the same class. We propose a new metarounding algorithm under a natural assumption that a relax-based approximation algorithm exists for the combinatorial class. Our algorithm is much more efficient in both theoretical and practical aspects.	翻訳日:2023-11-22 17:17:52 公開日:2023-11-19
# 複数経路設定によるアライメント言語モデルの不確かさ校正の検討 Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting ( http://arxiv.org/abs/2310.11732v2 ) ライセンス: Link先を確認	Guande He, Peng Cui, Jianfei Chen, Wenbo Hu, Jun Zhu	(参考訳) 協調言語モデル (LM) の実践的応用において顕著な進歩はあったが, 対応する事前学習型 LM と比較すると, 出力応答が過度に信頼される傾向にある。本研究では,多段設定下でのlmsのロジットに基づく不確実性校正に対するアライメントプロセスの影響を体系的に評価する。我々はまず,事前学習したキャリブレーションとlmsのキャリブレーションの違いについて,注意深い実験を行った。実験結果から,複数選択条件下でのLMには2つの不確実性が存在することが明らかとなった。次に,単純な合成アライメントスキームにおける微調整によるlmの調整におけるこれら2つの不確かさの役割について検討し,これら2つの不確かさの和合がlmsの過密化の一因であると結論づける。さらに,アライメントLMの一般的なポストホックキャリブレーション法の有用性について検討し,アライメントLMのキャリブレーションを容易かつ効率的に行う方法を提案する。 lmsのより信頼性の高いアライメントプロセスの設計に関する洞察を私たちの発見に提供できることを願っています。 Despite the significant progress made in practical applications of aligned language models (LMs), they tend to be overconfident in output answers compared to the corresponding pre-trained LMs. In this work, we systematically evaluate the impact of the alignment process on logit-based uncertainty calibration of LMs under the multiple-choice setting. We first conduct a thoughtful empirical study on how aligned LMs differ in calibration from their pre-trained counterparts. Experimental results reveal that there are two distinct uncertainties in LMs under the multiple-choice setting, which are responsible for the answer decision and the format preference of the LMs, respectively. Then, we investigate the role of these two uncertainties on aligned LM's calibration through fine-tuning in simple synthetic alignment schemes and conclude that one reason for aligned LMs' overconfidence is the conflation of these two types of uncertainty. Furthermore, we examine the utility of common post-hoc calibration methods for aligned LMs and propose an easy-to-implement and sample-efficient method to calibrate aligned LMs. We hope our findings could provide insights into the design of more reliable alignment processes for LMs.	翻訳日:2023-11-22 17:17:32 公開日:2023-11-19
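The calibration entry above refers to common post-hoc calibration methods for logit-based confidence in the multiple-choice setting. As a minimal sketch, the code below shows temperature scaling, one widely used post-hoc method; it is not the method proposed in the paper, and the logits and temperature value are made up for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over answer-option logits, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits, temperature=1.0):
    """Probability assigned to the predicted (arg-max) option."""
    return max(softmax(logits, temperature))


if __name__ == "__main__":
    option_logits = [4.0, 1.0, 0.5, 0.2]  # hypothetical logits for options A-D
    print(confidence(option_logits, temperature=1.0))
    print(confidence(option_logits, temperature=2.0))  # T > 1 lowers confidence
```

A temperature above 1 flattens the distribution without changing which option is predicted, so it can reduce overconfidence while leaving accuracy untouched; in practice the temperature would be fitted on held-out data.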
# 天然存在比のイソトポローグ回転スペクトルからの3次元構造決定のための反射同変拡散 Reflection-Equivariant Diffusion for 3D Structure Determination from Isotopologue Rotational Spectra in Natural Abundance ( http://arxiv.org/abs/2310.11609v2 ) ライセンス: Link先を確認	Austin Cheng, Alston Lo, Santiago Miret, Brooks Pate, Alán Aspuru-Guzik	(参考訳) 構造決定は、天然物、法医学的試料、星間物質、実験室での合成物などに含まれる未知の有機分子を同定するために必要である。回転分光は、慣性モーメントを介して小さな有機分子の正確な3次元情報を提供することで構造決定を可能にする。これらのモーメントを用いて、Kraitchman解析は同位体置換座標、すなわち炭素、窒素、酸素を含む天然同位体存在比を持つ全ての原子の符号なし座標 |x|, |y|, |z| を決定する。符号なし置換座標は構造の推測を検証できるが、+/- 符号が欠けているため、置換座標のみから実際の構造を決定することは困難である。この逆問題に対処するために、分子式、慣性モーメント、重原子の符号なし置換座標から分子の完全な3D構造を推論する生成拡散モデルKREED (Kraitchman REflection-Equivariant Diffusion) を開発した。 天然同位体存在比を持つすべての重原子の置換座標が与えられた場合、KREEDのtop-1予測はQM9およびGEOMデータセットで98%を超える精度で正しい3D構造を同定する。置換座標が炭素の一部に制限されると、精度はQM9で91%、GEOMで32%となる。文献から収集した実測の置換座標のテストセットでは、KREEDは33例中25例で正しい全原子3D構造を予測し、回転分光による文脈自由な3D構造決定の実験的適用可能性を示した。 Structure determination is necessary to identify unknown organic molecules, such as those in natural products, forensic samples, the interstellar medium, and laboratory syntheses. Rotational spectroscopy enables structure determination by providing accurate 3D information about small organic molecules via their moments of inertia. Using these moments, Kraitchman analysis determines isotopic substitution coordinates, which are the unsigned |x|, |y|, |z| coordinates of all atoms with natural isotopic abundance, including carbon, nitrogen, and oxygen. While unsigned substitution coordinates can verify guesses of structures, the missing +/- signs make it challenging to determine the actual structure from the substitution coordinates alone. To tackle this inverse problem, we develop KREED (Kraitchman REflection-Equivariant Diffusion), a generative diffusion model that infers a molecule's complete 3D structure from its molecular formula, moments of inertia, and unsigned substitution coordinates of heavy atoms. 
KREED's top-1 predictions identify the correct 3D structure with >98% accuracy on the QM9 and GEOM datasets when provided with substitution coordinates of all heavy atoms with natural isotopic abundance. When substitution coordinates are restricted to only a subset of carbons, accuracy is retained at 91% on QM9 and 32% on GEOM. On a test set of experimentally measured substitution coordinates gathered from the literature, KREED predicts the correct all-atom 3D structure in 25 of 33 cases, demonstrating experimental applicability for context-free 3D structure determination with rotational spectroscopy.	翻訳日:2023-11-22 17:17:09 公開日:2023-11-19
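To illustrate why the missing signs in the entry above make this an inverse problem, the sketch below enumerates every signed structure consistent with a set of unsigned substitution coordinates. It is a brute-force illustration only, not KREED; the function names and the coordinates are hypothetical.

```python
from itertools import product


def sign_assignments(unsigned_coords):
    """Yield every signed structure consistent with unsigned (|x|, |y|, |z|) triples.

    Zero coordinates carry no sign ambiguity, so they are not duplicated.
    """
    per_atom = []
    for coords in unsigned_coords:
        axis_options = [(c,) if c == 0 else (c, -c) for c in coords]
        per_atom.append(list(product(*axis_options)))
    for structure in product(*per_atom):
        yield list(structure)


def count_candidates(unsigned_coords):
    """Number of signed candidates, computed without enumerating them."""
    total = 1
    for coords in unsigned_coords:
        for c in coords:
            total *= 1 if c == 0 else 2
    return total


if __name__ == "__main__":
    coords = [(1.2, 0.0, 0.5), (0.7, 0.3, 0.1)]  # hypothetical values, two atoms
    print(count_candidates(coords))
    print(count_candidates([(1.0, 1.0, 1.0)] * 10))  # ten atoms, all nonzero
```

The count grows as 8 to the power of the number of atoms when all coordinates are nonzero, so exhaustive search quickly becomes impractical; many of these candidates are also mirror images of one another, which is why reflection symmetry matters for a model of this problem.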
# 量子ネットワークのパーコレーション理論 Percolation Theories for Quantum Networks ( http://arxiv.org/abs/2310.18420v2 ) ライセンス: Link先を確認	Xiangyi Meng, Xinqi Hu, Yu Tian, Gaogao Dong, Renaud Lambiotte, Jianxi Gao, Shlomo Havlin	(参考訳) 量子ネットワークは過去10年間、理論と実験の両面で急速に進歩しており、統計物理学の観点からその大規模な特徴を理解することがますます重要になっている。本レビュー論文は次の基本的な問いを論じる。接続が部分的にしかエンタングルしておらず、量子ノイズにさらされている不完全な量子ネットワークにおいて、遠方のノード間で(例えば中間ノードを通して)エンタングルメントを効果的かつ間接的に分配するにはどうすればよいか。ネットワーク接続性に着目した統計物理学の一分野であるパーコレーション理論への厳密または近似的な写像を用いてこの問題に取り組む最近の研究を調査する。特に、古典的なパーコレーションの枠組みは、ネットワークの間接的な接続性を一意には定義しないことを示す。この認識から「コンカレンス・パーコレーション (concurrence percolation)」と呼ばれる別の理論が生まれる。この理論は、大規模で現れるこれまで認識されていなかった量子優位性を明らかにし、量子ネットワークが古典的なパーコレーションの文脈で当初想定されていたよりも頑健であることを示唆しており、将来の量子ネットワーク設計に新たな洞察をもたらす。 Quantum networks have experienced rapid advancements in both theoretical and experimental domains over the last decade, making it increasingly important to understand their large-scale features from the viewpoint of statistical physics. This review paper discusses a fundamental question: how can entanglement be effectively and indirectly (e.g., through intermediate nodes) distributed between distant nodes in an imperfect quantum network, where the connections are only partially entangled and subject to quantum noise? We survey recent studies addressing this issue by drawing exact or approximate mappings to percolation theory, a branch of statistical physics centered on network connectivity. Notably, we show that the classical percolation frameworks do not uniquely define the network's indirect connectivity. This realization leads to the emergence of an alternative theory called "concurrence percolation," which uncovers a previously unrecognized quantum advantage that emerges at large scales, suggesting that quantum networks are more resilient than initially assumed within classical percolation contexts, offering refreshing insights into future quantum network design.	翻訳日:2023-11-22 17:08:56 公開日:2023-11-19
# IIDウェイトを超えて:スパースと低ランクのディープニューラルネットワークもガウス的プロセスである Beyond IID weights: sparse and low-rank deep Neural Networks are also Gaussian Processes ( http://arxiv.org/abs/2310.16597v2 ) ライセンス: Link先を確認	Thiziri Nait-Saada, Alireza Naderi, Jared Tanner	(参考訳) 無限に広いニューラルネットワークは、ディープラーニングに現れる多くの現象の理解を可能にする、有用で管理可能な数学的モデルであることが証明されている。例えば、ランダムディープネットワークをガウス過程に収束させることで、活性化関数とネットワークウェイトの選択がトレーニング力学に与える影響を厳密に分析することができる。本稿では, Matthews et al. (2018) の初歩的な証明を, IID や直交重みの確立した事例を含むより大規模な初期重量分布(PSEUDO-IID と呼ぶ)に拡張するとともに, 計算速度の向上をめざす低ランクで構造化されたスパース設定を新たに導入する。また,PSEUDO-IID分布に初期化される完全連結・畳み込みネットワークは,その分散にほぼ等価であることを示す。この結果を用いて,より広い階層のニューラルネットワークのエッジ・オブ・カオスを識別し,そのトレーニングを強化するために臨界度でチューニングすることができる。 The infinitely wide neural network has been proven a useful and manageable mathematical model that enables the understanding of many phenomena appearing in deep learning. One example is the convergence of random deep networks to Gaussian processes that allows a rigorous analysis of the way the choice of activation function and network weights impacts the training dynamics. In this paper, we extend the seminal proof of Matthews et al. (2018) to a larger class of initial weight distributions (which we call PSEUDO-IID), including the established cases of IID and orthogonal weights, as well as the emerging low-rank and structured sparse settings celebrated for their computational speed-up benefits. We show that fully-connected and convolutional networks initialized with PSEUDO-IID distributions are all effectively equivalent up to their variance. Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training.	翻訳日:2023-11-22 17:06:40 公開日:2023-11-19
# 複数の解像度でのルーティング問題を解決する対称性保存グラフアテンションネットワーク Symmetry-preserving graph attention network to solve routing problems at multiple resolutions ( http://arxiv.org/abs/2310.15543v2 ) ライセンス: Link先を確認	Cong Dao Tran, Thong Bach, Truong Son Hy	(参考訳) トラベリングセールスパーソン問題 (TSP) と車両ルーティング問題 (VRP) は,機械学習 (ML) 手法の適応により,精度と計算時間を合理的に向上した。しかし、以前の作品では、回転、翻訳、置換、スケーリングを含む、tspsとvrpから生じる対称性を完全に尊重していない。本研究では,組合わせ問題を解くために,最初の完全同値モデルとトレーニングを導入する。さらに、特に大きなグラフや長距離グラフの場合において、入力グラフのマルチスケール構造(ローカルからグローバル情報)を捉えることが不可欠であり、従来の手法は局所的あるいは準最適解に繋がるローカル情報のみを抽出することに限定されていた。上記の制限に対処するため,マルチレゾリューション方式と等価グラフアテンションネットワーク(mEGAT)アーキテクチャを併用して,低レベルおよび高レベルグラフレゾリューションに基づく最適経路を効率的に学習する手法を提案する。特に, 入力グラフから粗粒グラフの階層構造を構築し, まずは単純な低レベルグラフのルーティング問題を解き, その知識をより複雑な高レベルグラフに活用する。実験により,本モデルが既存のベースラインより優れており,対称性の保存とマルチレゾリューションがデータ駆動方式で組合せ問題を解くための重要なレシピであることを実証した。私たちのソースコードはhttps://github.com/HySonLab/Multires-NP-hardで公開されています。 Travelling Salesperson Problems (TSPs) and Vehicle Routing Problems (VRPs) have achieved reasonable improvement in accuracy and computation time with the adaptation of Machine Learning (ML) methods. However, none of the previous works completely respects the symmetries arising from TSPs and VRPs including rotation, translation, permutation, and scaling. In this work, we introduce the first-ever completely equivariant model and training to solve combinatorial problems. Furthermore, it is essential to capture the multiscale structure (i.e. from local to global information) of the input graph, especially for the cases of large and long-range graphs, while previous methods are limited to extracting only local information that can lead to a local or sub-optimal solution. To tackle the above limitation, we propose a Multiresolution scheme in combination with Equivariant Graph Attention network (mEGAT) architecture, which can learn the optimal route based on low-level and high-level graph resolutions in an efficient way. 
In particular, our approach constructs a hierarchy of coarse-graining graphs from the input graph, in which we try to solve the routing problems on simple low-level graphs first, then utilize that knowledge for the more complex high-level graphs. Experimentally, we have shown that our model outperforms existing baselines and proved that symmetry preservation and multiresolution are important recipes for solving combinatorial problems in a data-driven manner. Our source code is publicly available at https://github.com/HySonLab/Multires-NP-hard	翻訳日:2023-11-22 17:05:25 公開日:2023-11-19
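The routing entry above relies on the symmetries of TSP and VRP instances. As a small, hedged illustration (not the mEGAT model itself), the sketch below checks numerically that the length of a fixed tour is unchanged by rotation and translation and scales linearly under uniform scaling; the helper names and the unit-square instance are assumptions.

```python
import math


def tour_length(points, order):
    """Length of the closed tour visiting `points` in the given `order`."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def transform(points, angle=0.0, shift=(0.0, 0.0), scale=1.0):
    """Rotate by `angle` (radians), scale uniformly, then translate 2D points."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])
        for x, y in points
    ]


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    order = [0, 1, 2, 3]
    moved = transform(square, angle=0.7, shift=(5.0, -2.0))
    scaled = transform(square, scale=3.0)
    print(tour_length(square, order))  # perimeter of the unit square
    print(tour_length(moved, order))   # same up to floating-point error
    print(tour_length(scaled, order))  # three times the original length
```

Because the optimal tour is the same for all such transformed copies of an instance, a model that respects these symmetries does not have to relearn the solution for each copy.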
# FS-Net:マイクロ網膜血管構造の抽出改善のためのフルスケールネットワークと適応閾値 FS-Net: Full Scale Network and Adaptive Threshold for Improving Extraction of Micro-Retinal Vessel Structures ( http://arxiv.org/abs/2311.08059v2 ) ライセンス: Link先を確認	Melaku N. Getahun, Oleg Y. Rogov, Dmitry V. Dylov, Andrey Somov, Ahmed Bouridane, Rifat Hamoudi	(参考訳) 網膜血管セグメンテーションは、生体医用画像処理において広く研究されている課題であり、網膜疾患の検出および治療における眼科医の負担を軽減することを目的としている。しかし、網膜血管の分割には独自の課題があり、従来の技術では分枝や微小血管構造を分割する場合に十分な結果が得られなかった。近年のニューラルネットワークのアプローチは、局所的特性と大域的特性を同時に保持できず、微小な末端血管を捉えられないため、望ましい結果を得ることが難しい。この網膜血管セグメンテーション問題に対処するために,エンコーダ・デコーダ型ニューラルネットワークアーキテクチャ,シグモイド平滑化,適応しきい値法に基づくフルスケールの微小血管抽出機構を提案する。ネットワークは、残差、エンコーダブースター、ボトルネック強化、squeeze-and-excitationの各ビルディングブロックで構成されている。これらのブロックはすべて、特徴抽出とセグメンテーションマップの予測の改善に寄与する。提案手法は, DRIVE, CHASE-DB1, STAREデータセットを用いて評価し, 従来の研究と比較して競争力のある結果を得た。 DRIVEデータセットにおけるAUCと精度はそれぞれ0.9884と0.9702である。 CHASE-DB1データセットでは、それぞれ0.9903と0.9755である。 STAREデータセットでは、それぞれ0.9916と0.9750である。得られた性能は従来の研究を一歩上回っており、眼科医の注意を求める実際の診断センターでこのソリューションが導入される可能性が高まる。 Retinal vascular segmentation is a widely researched subject in biomedical image processing that aims to relieve ophthalmologists' workload when treating and detecting retinal disorders. However, segmenting retinal vessels has its own set of challenges, with prior techniques failing to generate adequate results when segmenting branches and microvascular structures. Recent neural network approaches struggle to preserve local and global properties together and fail to capture tiny end vessels, which makes it challenging to attain the desired result. To address this retinal vessel segmentation problem, we propose a full-scale micro-vessel extraction mechanism based on an encoder-decoder neural network architecture, sigmoid smoothing, and an adaptive threshold method. The network consists of residual, encoder booster, bottleneck enhancement, and squeeze-and-excitation building blocks. 
All of these blocks together help to improve the feature extraction and prediction of the segmentation map. The proposed solution has been evaluated using the DRIVE, CHASE-DB1, and STARE datasets, and competitive results are obtained when compared with previous studies. The AUC and accuracy on the DRIVE dataset are 0.9884 and 0.9702, respectively. On the CHASE-DB1 dataset, the scores are 0.9903 and 0.9755, respectively. On the STARE dataset, the scores are 0.9916 and 0.9750, respectively. The performance achieved is one step ahead of previous studies, which raises the chance of deploying this solution in real-life diagnostic centers that seek ophthalmologists' attention.	翻訳日:2023-11-22 16:19:14 公開日:2023-11-19
# コードのための言語モデルに関する調査 A Survey on Language Models for Code ( http://arxiv.org/abs/2311.07989v2 ) ライセンス: Link先を確認	Ziyin Zhang and Chaoyu Chen and Bingchang Liu and Cong Liao and Zi Gong and Hang Yu and Jianguo Li and Rui Wang	(参考訳) 本稿では,50以上のモデル,30以上の評価タスク,150以上のデータセット,550件の関連研究を含む,言語モデルによるコード処理の最近の進歩を体系的にレビューする。コード処理モデルを、GPTファミリに代表される汎用言語モデルと、多くの場合専用の目的関数を用いてコードで事前学習される特化モデルとに分類する。これらのモデルの関係と相違について考察し,統計モデルやRNNから事前学習されたTransformerやLLMへというコードモデリングの歴史的変遷を強調する。これはNLPがたどったのと全く同じ道筋である。また、AST、CFG、ユニットテストといったコード固有の特徴と、コード言語モデルの学習におけるその応用についても議論し、この領域における重要な課題と将来の方向性を特定する。この調査はGitHubリポジトリ https://github.com/codefuse-ai/Awesome-Code-LLM で公開し、更新を続けている。 In this work we systematically review the recent advancements in code processing with language models, covering 50+ models, 30+ evaluation tasks, 150+ datasets, and 550 related works. We break down code processing models into general language models represented by the GPT family and specialized models that are specifically pretrained on code, often with tailored objectives. We discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which is exactly the same course that had been taken by NLP. We also discuss code-specific features such as AST, CFG, and unit tests, along with their application in training code language models, and identify key challenges and potential future directions in this domain. We keep the survey open and updated in a GitHub repository at https://github.com/codefuse-ai/Awesome-Code-LLM.	翻訳日:2023-11-22 16:18:47 公開日:2023-11-19
# PEMS: Pre-trained Epidemic Time-series Models ( http://arxiv.org/abs/2311.07841v2 ) License: see link	Harshavardhan Kamarthi, B. Aditya Prakash	Providing accurate and reliable predictions about the future of an epidemic is an important problem for enabling informed public health decisions. Recent works have shown that data-driven solutions, which use advances in deep learning to learn from past data of an epidemic, often outperform traditional mechanistic models. However, in many cases, the past data is sparse and may not sufficiently capture the underlying dynamics. While there exists a large amount of data from past epidemics, leveraging prior knowledge from time-series data of other diseases is a non-trivial challenge. Motivated by the success of pre-trained models in language and vision tasks, we tackle the problem of pre-training epidemic time-series models to learn from multiple datasets from different diseases and epidemics. We introduce Pre-trained Epidemic Time-Series Models (PEMS) that learn from diverse time-series datasets of a variety of diseases by formulating pre-training as a set of self-supervised learning (SSL) tasks. We tackle various important challenges specific to pre-training for epidemic time-series, such as dealing with heterogeneous dynamics and efficiently capturing useful patterns from multiple epidemic datasets, by carefully designing the SSL tasks to learn important priors about the epidemic dynamics that can be leveraged for fine-tuning to multiple downstream tasks. The resultant PEM outperforms previous state-of-the-art methods in various downstream time-series tasks across datasets of varying seasonal patterns, geography, and mechanism of contagion, including the novel Covid-19 pandemic unseen in the pre-training data, with better efficiency using a smaller fraction of the datasets.	Translated: 2023-11-22 16:18:31 Published: 2023-11-19
# Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains ( http://arxiv.org/abs/2311.07723v2 ) License: see link	Joshua Clymer, Garrett Baker, Rohan Subramani, Sam Wang	As AI systems become more intelligent and their behavior becomes more challenging to assess, they may learn to game the flaws of human feedback instead of genuinely striving to follow instructions; however, this risk can be mitigated by controlling how LLMs generalize human feedback to situations where it is unreliable. To better understand how reward models generalize, we craft 69 distribution shifts spanning 8 categories. We find that reward models do not learn to evaluate 'instruction-following' by default and instead favor personas that resemble internet text. Techniques for interpreting reward models' internal representations achieve better generalization than standard fine-tuning, but still frequently fail to distinguish instruction-following from conflated behaviors. We consolidate the 15 most challenging distribution shifts into the GENeralization analogIES (GENIES) benchmark, which we hope will enable progress toward controlling reward model generalization.	Translated: 2023-11-22 16:16:45 Published: 2023-11-19
# Rethinking and Benchmarking Predict-then-Optimize Paradigm for Combinatorial Optimization Problems ( http://arxiv.org/abs/2311.07633v2 ) License: see link	Haoyu Geng, Hang Ruan, Runzhong Wang, Yang Li, Yang Wang, Lei Chen, Junchi Yan	Numerous web applications rely on solving combinatorial optimization problems, such as energy cost-aware scheduling, budget allocation on web advertising, and graph matching on social networks. However, many optimization problems involve unknown coefficients, and improper predictions of these factors may lead to inferior decisions which may cause energy wastage, inefficient resource allocation, inappropriate matching in social networks, etc. This research topic is referred to as "Predict-Then-Optimize (PTO)", which considers the performance of prediction and decision-making in a unified system. A noteworthy recent development is end-to-end methods that directly optimize the ultimate decision quality and claim to yield better results than the traditional two-stage approach. However, the evaluation benchmarks in this field are fragmented and the effectiveness of various models in different scenarios remains unclear, hindering the comprehensive assessment and fast deployment of these methods. To address these issues, we provide a comprehensive categorization of current approaches and integrate existing experimental scenarios to establish a unified benchmark, elucidating the circumstances under which end-to-end training yields improvements, as well as the contexts in which it performs ineffectively. We also introduce and open-source a new dataset for the industrial combinatorial advertising problem in inclusive finance. We hope the rethinking and benchmarking of PTO could facilitate more convenient evaluation and deployment, and inspire further improvements in both academia and industry within this field.	Translated: 2023-11-22 16:15:52 Published: 2023-11-19
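The contrast between two-stage training (minimize prediction error) and end-to-end training (optimize decision quality) in the entry above can be made concrete with a toy example. The following is a minimal sketch, not taken from the paper: the "optimizer" simply picks the cheapest of two options, and all names (`mse`, `decide`, `regret`, the cost values) are hypothetical. It shows that a prediction with lower squared error can still lead to a worse decision.

```python
def mse(pred, true):
    """Mean squared error between predicted and true cost coefficients."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def decide(costs):
    """Toy stand-in for the optimization step: index of the cheapest option."""
    return min(range(len(costs)), key=lambda i: costs[i])


def regret(pred, true):
    """Extra true cost incurred because the decision was made on predictions."""
    return true[decide(pred)] - true[decide(true)]


true_costs = [1.0, 2.0]
pred_a = [1.6, 1.5]  # small prediction error, but the ranking is flipped
pred_b = [0.0, 3.5]  # large prediction error, but the ranking is preserved

for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, "mse =", round(mse(pred, true_costs), 3),
          "regret =", regret(pred, true_costs))
```

Prediction A has the smaller error yet a positive regret, while prediction B has the larger error and zero regret. This gap between prediction quality and decision quality is what decision-focused training targets.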
# Untargeted Black-box Attacks for Social Recommendations ( http://arxiv.org/abs/2311.07127v2 ) License: see link	Wenqi Fan, Shijie Wang, Xiao-yong Wei, Xiaowei Mei, Qing Li	The rise of online social networks has facilitated the evolution of social recommender systems, which incorporate social relations to enhance users' decision-making process. With the great success of Graph Neural Networks in learning node representations, GNN-based social recommendations have been widely studied to model user-item interactions and user-user social relations simultaneously. Despite their great successes, recent studies have shown that these advanced recommender systems are highly vulnerable to adversarial attacks, in which attackers can inject well-designed fake user profiles to disrupt recommendation performance. While most existing studies mainly focus on targeted attacks to promote target items on vanilla recommender systems, untargeted attacks to degrade the overall prediction performance are less explored on social recommendations under a black-box scenario. To perform untargeted attacks on social recommender systems, attackers can construct malicious social relationships for fake users to enhance the attack performance. However, the coordination of social relations and item profiles is challenging for attacking black-box social recommendations. To address this limitation, we first conduct several preliminary studies to demonstrate the effectiveness of cross-community connections and cold-start items in degrading recommendation performance. Specifically, we propose a novel framework Multiattack based on multi-agent reinforcement learning to coordinate the generation of cold-start item profiles and cross-community social relations for conducting untargeted attacks on black-box social recommendations. Comprehensive experiments on various real-world datasets demonstrate the effectiveness of our proposed attacking framework under the black-box setting.	Translated: 2023-11-22 16:14:47 Published: 2023-11-19
# Language Generation from Human Brain Activities ( http://arxiv.org/abs/2311.09889v2 ) License: see link	Ziyi Ye, Qingyao Ai, Yiqun Liu, Min Zhang, Christina Lioma, Tuukka Ruotsalo	Generating human language through non-invasive brain-computer interfaces (BCIs) has the potential to unlock many applications, such as serving disabled patients and improving communication. So far, however, generating language via BCIs has been successful only within a classification setup for selecting pre-generated sentence continuation candidates with the most likely cortical semantic representation. Inspired by recent research that revealed associations between the brain and the large computational language models, we propose a generative language BCI that utilizes the capacity of a large language model (LLM) jointly with a semantic brain decoder to directly generate language from functional magnetic resonance imaging (fMRI) input. The proposed model can generate coherent language sequences aligned with the semantic content of visual or auditory language stimuli perceived, without prior knowledge of any pre-generated candidates. We compare the language generated from the presented model with a random control, a pre-generated language selection approach, and a standard LLM, which generates common coherent text solely based on the next word likelihood according to statistical language training data. The proposed model is found to generate language that is more aligned with the semantic stimulus in response to which the brain input is sampled. Our findings demonstrate the potential and feasibility of employing BCIs in direct language generation.	Translated: 2023-11-22 16:07:10 Published: 2023-11-19
# Leveraging Generative AI for Clinical Evidence Summarization Needs to Achieve Trustworthiness ( http://arxiv.org/abs/2311.11211v1 ) License: see link	Gongbo Zhang, Qiao Jin, Denis Jered McInerney, Yong Chen, Fei Wang, Curtis L. Cole, Qian Yang, Yanshan Wang, Bradley A. Malin, Mor Peleg, Byron C. Wallace, Zhiyong Lu, Chunhua Weng, Yifan Peng	Evidence-based medicine aims to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.	Translated: 2023-11-22 06:59:20 Published: 2023-11-19
# HiH: A Multi-modal Hierarchy in Hierarchy Network for Unconstrained Gait Recognition ( http://arxiv.org/abs/2311.11210v1 ) License: see link	Lei Wang, Yinchi Ma, Peng Luan, Wei Yao, Congcong Li, Bo Liu	Gait recognition has achieved promising advances in controlled settings, yet it significantly struggles in unconstrained environments due to challenges such as view changes, occlusions, and varying walking speeds. Additionally, efforts to fuse multiple modalities often face limited improvements because of cross-modality incompatibility, particularly in outdoor scenarios. To address these issues, we present a multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences for robust gait recognition. HiH features a main branch that utilizes Hierarchical Gait Decomposer (HGD) modules for depth-wise and intra-module hierarchical examination of general gait patterns from silhouette data. This approach captures motion hierarchies from overall body dynamics to detailed limb movements, facilitating the representation of gait attributes across multiple spatial resolutions. Complementing this, an auxiliary branch, based on 2D joint sequences, enriches the spatial and temporal aspects of gait analysis. It employs a Deformable Spatial Enhancement (DSE) module for pose-guided spatial attention and a Deformable Temporal Alignment (DTA) module for aligning motion dynamics through learned temporal offsets. Extensive evaluations across diverse indoor and outdoor datasets demonstrate HiH's state-of-the-art performance, affirming a well-balanced trade-off between accuracy and efficiency.	Translated: 2023-11-22 06:59:11 Published: 2023-11-19
# 3D Guidewire Shape Reconstruction from Monoplane Fluoroscopic Images ( http://arxiv.org/abs/2311.11209v1 ) License: see link	Tudor Jianu, Baoru Huang, Pierre Berthet-Rayne, Sebastiano Fichera, Anh Nguyen	Endovascular navigation, essential for diagnosing and treating endovascular diseases, predominantly hinges on fluoroscopic images due to the constraints in sensory feedback. Current shape reconstruction techniques for endovascular intervention often rely on either a priori information or specialized equipment, potentially subjecting patients to heightened radiation exposure. While deep learning holds potential, it typically demands extensive data. In this paper, we propose a new method to reconstruct the 3D guidewire by utilizing CathSim, a state-of-the-art endovascular simulator, and a 3D Fluoroscopy Guidewire Reconstruction Network (3D-FGRN). Our 3D-FGRN delivers results on par with conventional triangulation from simulated monoplane fluoroscopic images. Our experiments accentuate the efficiency of the proposed network, demonstrating it as a promising alternative to traditional methods.	Translated: 2023-11-22 06:58:44 Published: 2023-11-19
# LogicNet: A Logical Consistency Embedded Face Attribute Learning Network ( http://arxiv.org/abs/2311.11208v1 ) License: see link	Haiyu Wu, Sicong Tian, Huayu Li, Kevin W. Bowyer	Ensuring logical consistency in predictions is a crucial yet overlooked aspect in multi-attribute classification. We explore the potential reasons for this oversight and introduce two pressing challenges to the field: 1) How can we ensure that a model, when trained with data checked for logical consistency, yields predictions that are logically consistent? 2) How can we achieve the same with data that hasn't undergone logical consistency checks? Minimizing manual effort is also essential for enhancing automation. To address these challenges, we introduce two datasets, FH41K and CelebA-logic, and propose LogicNet, an adversarial training framework that learns the logical relationships between attributes. The accuracy of LogicNet surpasses that of the next-best approach by 23.05%, 9.96%, and 1.71% on FH37K, FH41K, and CelebA-logic, respectively. In real-world case analysis, our approach can achieve a reduction of more than 50% in the average number of failed cases compared to other methods.	Translated: 2023-11-22 06:58:31 Published: 2023-11-19
# On the Noise Scheduling for Generating Plausible Designs with Diffusion Models ( http://arxiv.org/abs/2311.11207v1 ) License: see link	Jiajie Fan, Laure Vuaille, Thomas Bäck, Hao Wang	Deep Generative Models (DGMs) are widely used to create innovative designs across multiple industries, ranging from fashion to the automotive sector. In addition to generating images of high visual quality, the task of structural design generation imposes more stringent constraints on the semantic expression, e.g., no floating material or missing part, which we refer to as plausibility in this work. We delve into the impact of noise schedules of diffusion models on the plausibility of the outcome: there exists a range of noise levels at which the model's performance decides the result plausibility. Also, we propose two techniques to determine such a range for a given image set and devise a novel parametric noise schedule for better plausibility. We apply this noise schedule to the training and sampling of the well-known diffusion model EDM and compare it to its default noise schedule. Compared to EDM, our schedule significantly improves the rate of plausible designs from 83.4% to 93.5% and Fréchet Inception Distance (FID) from 7.84 to 4.87. Further applications of advanced image editing tools demonstrate the model's solid understanding of structure.	Translated: 2023-11-22 06:58:14 Published: 2023-11-19
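The entry above compares a new parametric schedule against EDM's default noise schedule. For orientation, the sketch below computes the default EDM sampling noise levels as they are commonly stated for Karras et al. (2022): sigma_i = (sigma_max^(1/rho) + i/(N-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho. The default values used here (rho = 7, sigma_min = 0.002, sigma_max = 80, 18 steps) are an assumption based on that paper's reported settings. The function name is hypothetical, and the schedule proposed in the abstract is not reproduced, since the abstract does not specify it.

```python
def edm_sigma_steps(num_steps=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels for EDM-style sampling, from sigma_max down to sigma_min.

    Interpolates linearly in sigma**(1/rho) space, then maps back with **rho,
    which concentrates steps at low noise levels when rho > 1.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    inv_rho = 1.0 / rho
    high = sigma_max ** inv_rho
    low = sigma_min ** inv_rho
    return [
        (high + i / (num_steps - 1) * (low - high)) ** rho
        for i in range(num_steps)
    ]


sigmas = edm_sigma_steps()
print([round(s, 4) for s in sigmas])
```

Printing the list makes it easy to see which noise levels receive the most sampling steps, which is the kind of question the abstract raises about plausibility.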
# Robust Network Slicing: Multi-Agent Policies, Adversarial Attacks, and Defensive Strategies ( http://arxiv.org/abs/2311.11206v1 ) License: see link	Feng Wang, M. Cenk Gursoy, and Senem Velipasalar	In this paper, we present a multi-agent deep reinforcement learning (deep RL) framework for network slicing in a dynamic environment with multiple base stations and multiple users. In particular, we propose a novel deep RL framework with multiple actors and a centralized critic (MACC) in which actors are implemented as pointer networks to fit the varying dimension of input. We evaluate the performance of the proposed deep RL algorithm via simulations to demonstrate its effectiveness. Subsequently, we develop a deep RL based jammer with limited prior information and limited power budget. The goal of the jammer is to minimize the transmission rates achieved with network slicing and thus degrade the network slicing agents' performance. We design a jammer with both listening and jamming phases and address jamming location optimization as well as jamming channel optimization via deep RL. We evaluate the jammer at the optimized location, generating interference attacks in the optimized set of channels by switching between the jamming phase and listening phase. We show that the proposed jammer can significantly reduce the victims' performance without direct feedback or prior knowledge on the network slicing policies. Finally, we devise a Nash-equilibrium-supervised policy ensemble mixed strategy profile for network slicing (as a defensive measure) and jamming. We evaluate the performance of the proposed policy ensemble algorithm by applying it to the network slicing agents and the jammer agent in simulations to show its effectiveness.	Translated: 2023-11-22 06:57:53 Published: 2023-11-19
# Shape-Sensitive Loss for Catheter and Guidewire Segmentation ( http://arxiv.org/abs/2311.11205v1 ) License: see link	Chayun Kongtongvattana, Baoru Huang, Jingxuan Kang, Hoan Nguyen, Olajide Olufemi, Anh Nguyen	We introduce a shape-sensitive loss function for catheter and guidewire segmentation and utilize it in a vision transformer network to establish a new state-of-the-art result on a large-scale X-ray image dataset. We transform network-derived predictions and their corresponding ground truths into signed distance maps (SDMs), thereby enabling any network to concentrate on the essential boundaries rather than merely the overall contours. These SDMs are subjected to the vision transformer, efficiently producing high-dimensional feature vectors encapsulating critical image attributes. By computing the cosine similarity between these feature vectors, we gain a nuanced understanding of image similarity that goes beyond the limitations of traditional overlap-based measures. The advantages of our approach are manifold, ranging from scale and translation invariance to superior detection of subtle differences, thus ensuring precise localization and delineation of the medical instruments within the images. Comprehensive quantitative and qualitative analyses substantiate the significant enhancement in performance over existing baselines, demonstrating the promise held by our new shape-sensitive loss function for improving catheter and guidewire segmentation.	Translated: 2023-11-22 06:57:31 Published: 2023-11-19
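The two ingredients named in the entry above, signed distance maps and cosine similarity, can be sketched on tiny binary masks. This is a simplified illustration under stated assumptions, not the authors' loss: the paper passes the SDMs through a vision transformer and compares the resulting feature vectors, whereas this sketch compares the flattened SDMs directly. The sign convention (negative inside, positive outside) and all function names are hypothetical choices.

```python
import math


def signed_distance_map(mask):
    """Brute-force signed distance map of a binary mask (list of 0/1 rows).

    Each pixel gets its Euclidean distance to the nearest pixel of the
    opposite class; the value is negative inside the object, positive outside.
    """
    cells = [(r, c) for r in range(len(mask)) for c in range(len(mask[0]))]
    inside = [p for p in cells if mask[p[0]][p[1]]]
    outside = [p for p in cells if not mask[p[0]][p[1]]]
    if not inside or not outside:
        raise ValueError("mask must contain both foreground and background")
    sdm = []
    for r, c in cells:
        is_inside = bool(mask[r][c])
        others = outside if is_inside else inside
        dist = min(math.hypot(r - rr, c - cc) for rr, cc in others)
        sdm.append(-dist if is_inside else dist)
    return sdm


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def shape_loss(pred_mask, true_mask):
    """1 - cosine similarity between the two flattened SDMs."""
    return 1.0 - cosine_similarity(
        signed_distance_map(pred_mask), signed_distance_map(true_mask)
    )


# A thin vertical "guidewire" and a copy shifted one pixel to the right.
wire = [[0, 0, 1, 0, 0] for _ in range(5)]
shifted = [[0, 0, 0, 1, 0] for _ in range(5)]
print(shape_loss(wire, wire), shape_loss(shifted, wire))
```

For thin structures such as guidewires, a one-pixel shift removes all overlap, so overlap-based measures drop to zero, while a distance-based comparison still registers that the prediction is close to the target.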
# Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models ( http://arxiv.org/abs/2311.11202v1 ) License: see link	Zhaowei Zhu, Jialu Wang, Hao Cheng, Yang Liu	Language models have shown promise in various tasks but can be affected by undesired data during training, fine-tuning, or alignment. For example, if some unsafe conversations are wrongly annotated as safe ones, the model fine-tuned on these samples may be harmful. Therefore, the correctness of annotations, i.e., the credibility of the dataset, is important. This study focuses on the credibility of real-world datasets, including the popular benchmarks Jigsaw Civil Comments, Anthropic Harmless & Red Team, PKU BeaverTails & SafeRLHF, that can be used for training a harmless language model. Given the cost and difficulty of cleaning these datasets by humans, we introduce a systematic framework for evaluating the credibility of datasets, identifying label errors, and evaluating the influence of noisy labels in the curated language data, specifically focusing on unsafe comments and conversation classification. With the framework, we find and fix an average of 6.16% label errors in 11 datasets constructed from the above benchmarks. The data credibility and downstream learning performance can be remarkably improved by directly fixing label errors, indicating the significance of cleaning existing real-world datasets. Open-source: https://github.com/Docta-ai/docta.	Translated: 2023-11-22 06:57:11 Published: 2023-11-19
# Scale-free networks: improved inference ( http://arxiv.org/abs/2311.11200v1 ) License: see link	Nixon Jerez-Lillo, Francisco A. Rodrigues and Pedro L. Ramos	The power-law distribution plays a crucial role in complex networks as well as various applied sciences. Investigating whether the degree distribution of a network follows a power-law distribution is an important concern. The commonly used inferential methods for estimating the model parameters often yield biased estimates, which can lead to the rejection of the hypothesis that a model conforms to a power-law. In this paper, we discuss improved methods that utilize Bayesian inference to obtain accurate estimates and precise credibility intervals. The inferential methods are derived for both continuous and discrete distributions. These methods reveal that objective Bayesian approaches return nearly unbiased estimates for the parameters of both models. Notably, in the continuous case, we identify an explicit posterior distribution. This work enhances the power of goodness-of-fit tests, enabling us to accurately discern whether a network or any other dataset adheres to a power-law distribution. We apply the proposed approach to fit degree distributions for more than 5,000 synthetic networks and over 3,000 real networks. The results indicate that our method is more suitable in practice, as it yields a frequency of acceptance close to the specified nominal level.	Translated: 2023-11-22 06:56:50 Published: 2023-11-19
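The remark in the entry above about biased estimates and an explicit posterior in the continuous case can be illustrated with a small sketch. The assumptions here are not from the paper: the lower cutoff x_min is treated as known, and the prior on the exponent is taken as proportional to 1/(alpha - 1), which may differ from the authors' objective prior. Under these assumptions the likelihood is proportional to (alpha - 1)^n * exp(-(alpha - 1) * S) with S = sum(log(x_i / x_min)), so alpha - 1 has a Gamma posterior with shape n and rate S. Its mean reproduces the maximum-likelihood estimate 1 + n/S, which is biased upward for finite n, and its mode 1 + (n - 1)/S is unbiased. Function names are hypothetical.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n samples from a continuous power law via inverse-CDF sampling."""
    return [
        x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        for _ in range(n)
    ]


def fit_power_law(xs, x_min):
    """Point estimates for the exponent, assuming x_min is known.

    Under the assumed prior 1/(alpha - 1), alpha - 1 ~ Gamma(shape=n, rate=S).
    """
    n = len(xs)
    s = sum(math.log(x / x_min) for x in xs)
    return {
        "mle": 1.0 + n / s,                   # also the posterior mean
        "posterior_mode": 1.0 + (n - 1) / s,  # unbiased for alpha
        "posterior_sd": math.sqrt(n) / s,     # sd of the Gamma posterior
    }


rng = random.Random(0)
data = sample_power_law(5000, alpha=2.5, x_min=1.0, rng=rng)
print(fit_power_law(data, x_min=1.0))
```

With thousands of samples the two point estimates nearly coincide, and the difference matters mainly for small samples such as the upper tail of a degree distribution.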
# Self-Supervised Versus Supervised Training for Segmentation of Organoid Images ( http://arxiv.org/abs/2311.11198v1 ) License: see link	Asmaa Haja, Eric Brouwer and Lambert Schomaker	The process of annotating relevant data in the field of digital microscopy can be both time-consuming and especially expensive due to the required technical skills and human-expert knowledge. Consequently, large amounts of microscopic image data sets remain unlabeled, preventing their effective exploitation using deep-learning algorithms. In recent years it has been shown that a lot of relevant information can be drawn from unlabeled data. Self-supervised learning (SSL) is a promising solution based on learning intrinsic features under a pretext task that is similar to the main task without requiring labels. The trained result is transferred to the main task, which is image segmentation in our case. A ResNet50 U-Net was first trained to restore images of liver progenitor organoids from augmented images, using the Structural Similarity Index Metric (SSIM) alone and using SSIM combined with L1 loss. Both the encoder and decoder were trained in tandem. The weights were transferred to another U-Net model designed for segmentation with frozen encoder weights, using Binary Cross Entropy, Dice, and Intersection over Union (IoU) losses. For comparison, we used the same U-Net architecture to train two supervised models, one utilizing the ResNet50 encoder and one using a simple CNN. Results showed that self-supervised learning models using a 25% pixel drop or image blurring augmentation performed better than the other augmentation techniques using the IoU loss. When trained on only 114 images for the main task, the self-supervised learning approach outperforms the supervised method, achieving an F1-score of 0.85 with higher stability, in contrast to an F1-score of 0.78 for the supervised method. Furthermore, when trained with larger data sets (1,000 images), self-supervised learning still performs better, achieving an F1-score of 0.92 compared with 0.85 for the supervised method.	Translated: 2023-11-22 06:56:32 Published: 2023-11-19
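The Dice and IoU measures mentioned in the entry above are closely related, and for binary masks the Dice coefficient equals the pixel-level F1-score. The sketch below computes both on flattened binary masks. It is a hard-threshold illustration with a hypothetical function name, not the differentiable loss the authors would use during training, where soft probabilities and `1 - score` are typical.

```python
def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two flattened binary masks (0/1 values)."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treat as a perfect match by convention.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)
# The two are linked by dice = 2 * iou / (1 + iou).
print(abs(dice - 2 * iou / (1 + iou)) < 1e-12)
```

Because Dice is always at least as large as IoU for the same pair of masks, scores reported under the two measures should not be compared directly.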
# 非同一分布サンプルによるテスト Testing with Non-identically Distributed Samples ( http://arxiv.org/abs/2311.11194v1 ) ライセンス: Link先を確認	Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant	(参考訳) サンプルが独立ではあるが同一分布には従わない設定において,サブ線形サンプルでの特性テストと推定がどの程度適用できるかを検討する。具体的には、以下の分布特性テストの枠組みを考える。サイズ $k$ の離散的なサポート上の分布の集合 $\textbf{p}_1, \textbf{p}_2,\ldots,\textbf{p}_T$ が存在し、各分布から $c$ 個の独立なサンプルを得ると仮定する。目標は、平均分布 $\textbf{p}_{\mathrm{avg}}$ の特性を学習またはテストすることである。 This setup models a number of important practical settings where the individual distributions correspond to heterogeneous entities -- either individuals, chronologically distinct time periods, spatially separated data sources, etc. From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance. To test uniformity or identity -- distinguishing the case that $\textbf{p}_{\mathrm{avg}}$ is equal to some reference distribution, versus has $\ell_1$ distance at least $\varepsilon$ from the reference distribution, we show that a linear number of samples in $k$ is necessary given $c=1$ samples from each distribution. 対照的に、$c \ge 2$ の場合、通常の i.i.d. 設定におけるサブ線形サンプルでのテストが回復される。すなわち、$O(\sqrt{k}/\varepsilon^2 + 1/\varepsilon^4)$ 個のサンプルで十分であり、これは $\varepsilon \ge k^{-1/4}$ の領域において i.i.d. の場合の最適なサンプル複雑性に一致する。さらに、$c=2$ の場合、ある定数 $\rho > 0$ が存在して、$\rho k$ 個のサンプルを用いる線形領域であっても、サンプルの多重集合のみを考慮する(どのサンプルが同じ $\textbf{p}_i$ から抽出されたかを無視する)テスターは一様性テストを行えないことを示す。 We examine the extent to which sublinear-sample property testing and estimation apply to settings where samples are independently but not identically distributed. Specifically, we consider the following distributional property testing framework: Suppose there is a set of distributions over a discrete support of size $k$, $\textbf{p}_1, \textbf{p}_2,\ldots,\textbf{p}_T$, and we obtain $c$ independent draws from each distribution. 
Suppose the goal is to learn or test a property of the average distribution, $\textbf{p}_{\mathrm{avg}}$. This setup models a number of important practical settings where the individual distributions correspond to heterogeneous entities -- either individuals, chronologically distinct time periods, spatially separated data sources, etc. From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance. To test uniformity or identity -- distinguishing the case that $\textbf{p}_{\mathrm{avg}}$ is equal to some reference distribution, versus has $\ell_1$ distance at least $\varepsilon$ from the reference distribution, we show that a linear number of samples in $k$ is necessary given $c=1$ samples from each distribution. In contrast, for $c \ge 2$, we recover the usual sublinear sample testing of the i.i.d. setting: we show that $O(\sqrt{k}/\varepsilon^2 + 1/\varepsilon^4)$ samples are sufficient, matching the optimal sample complexity in the i.i.d. case in the regime where $\varepsilon \ge k^{-1/4}$. Additionally, we show that in the $c=2$ case, there is a constant $\rho > 0$ such that even in the linear regime with $\rho k$ samples, no tester that considers the multiset of samples (ignoring which samples were drawn from the same $\textbf{p}_i$) can perform uniformity testing.	翻訳日:2023-11-22 06:55:55 公開日:2023-11-19
# AIのインパクトアセスメントを評価する: 教室での研究 Assessing AI Impact Assessments: A Classroom Study ( http://arxiv.org/abs/2311.11193v1 ) ライセンス: Link先を確認	Nari Johnson, Hoda Heidari	(参考訳) 提案されたAIシステムが及ぼしうる影響を想像するための構造化プロセスを提供するツール群であるAIIA(Artificial Intelligence Impact Assessments)が、AIシステムを管理するための提案として人気が高まっている。近年、政府や民間団体の取り組みにより、自由記述式のアンケートから段階評価のスコアカードまで、さまざまな形式のAIIAが多数提案されている。しかし、既存のAIIAツールの評価はこれまで限られていた。我々は,AIの社会的・倫理的意味に着目した選択科目において,大規模な研究集約型大学(R1)で教室研究(N = 38)を実施する。学生を異なる組織上の役割(例えばML科学者やプロダクトマネージャ)に割り当て、参加者チームに、2つの架空の生成AIシステムのうちの1つに対して、既存の3つのAI影響評価の1つを完成させるよう依頼する。活動前後のアンケートに対する参加者の回答のテーマ分析では、影響評価が、生成AIシステムの潜在的なリスクに対する参加者の認識や、潜在的な害への対処においてAI専門家が負う責任の程度に対する認識に影響を及ぼしうるという予備的証拠が得られた。また、複数の既存のAIIAツールに共通する一貫した制約も発見し、それらを形式と内容に関する懸念、および潜在的な害を予見・軽減するうえでの活動の実現可能性と有効性に関する懸念に分類する。本研究の成果をもとに,AIIAの開発・検証に向けた今後の取り組みを提言する。 Artificial Intelligence Impact Assessments ("AIIAs"), a family of tools that provide structured processes to imagine the possible impacts of a proposed AI system, have become an increasingly popular proposal to govern AI systems. Recent efforts from government or private-sector organizations have proposed many diverse instantiations of AIIAs, which take a variety of forms ranging from open-ended questionnaires to graded score-cards. However, to date there has been limited evaluation of existing AIIA instruments. We conduct a classroom study (N = 38) at a large research-intensive university (R1) in an elective course focused on the societal and ethical implications of AI. We assign students to different organizational roles (for example, an ML scientist or product manager) and ask participant teams to complete one of three existing AI impact assessments for one of two imagined generative AI systems. In our thematic analysis of participants' responses to pre- and post-activity questionnaires, we find preliminary evidence that impact assessments can influence participants' perceptions of the potential risks of generative AI systems, and the level of responsibility held by AI experts in addressing potential harm. 
We also discover a consistent set of limitations shared by several existing AIIA instruments, which we group into concerns about their format and content, as well as the feasibility and effectiveness of the activity in foreseeing and mitigating potential harms. Drawing on the findings of this study, we provide recommendations for future work on developing and validating AIIAs.	翻訳日:2023-11-22 06:55:13 公開日:2023-11-19
# ビジョンアプリケーションにおける物理的敵対的攻撃に対する注意ベースのリアルタイム防御 Attention-Based Real-Time Defenses for Physical Adversarial Attacks in Vision Applications ( http://arxiv.org/abs/2311.11191v1 ) ライセンス: Link先を確認	Giulio Rossolini, Alessandro Biondi and Giorgio Buttazzo	(参考訳) ディープニューラルネットワークはコンピュータビジョンタスクにおいて優れた性能を示すが、予測を誤らせる物理的オブジェクトを通じて実現される現実世界の敵対的攻撃に対する脆弱性は、安全クリティカルな領域への適用において深刻なセキュリティ上の懸念を引き起こす。既存の防御手法は単一フレーム解析に焦点を当てており、計算コストが高いため、リアルタイムの判断が重要となるマルチフレームシナリオへの適用が制限される。この問題に対処するため,本研究では,敵対的チャネルアテンションを利用して浅いネットワーク層で悪意のある物体を迅速に識別・追跡し,マルチフレーム設定でその敵対的効果をマスクする,効率的な注意ベースの防御機構を提案する。本研究は,実世界の敵対的攻撃に対する既存の過剰活性化(over-activation)技術を強化し,リアルタイムアプリケーションで利用可能にすることで,最先端技術を前進させる。また、効率的なマルチフレーム防御フレームワークを導入し、防御性能と計算コストの両方を評価する広範な実験を通じてその有効性を検証する。 Deep neural networks exhibit excellent performance in computer vision tasks, but their vulnerability to real-world adversarial attacks, achieved through physical objects that can corrupt their predictions, raises serious security concerns for their application in safety-critical domains. Existing defense methods focus on single-frame analysis and are characterized by high computational costs that limit their applicability in multi-frame scenarios, where real-time decisions are crucial. To address this problem, this paper proposes an efficient attention-based defense mechanism that exploits adversarial channel-attention to quickly identify and track malicious objects in shallow network layers and mask their adversarial effects in a multi-frame setting. This work advances the state of the art by enhancing existing over-activation techniques for real-world adversarial attacks to make them usable in real-time applications. It also introduces an efficient multi-frame defense framework, validating its efficacy through extensive experiments aimed at evaluating both defense performance and computational cost.	翻訳日:2023-11-22 06:54:46 公開日:2023-11-19
# 検出可能性のためのエンタングルメント測度 Entanglement measures for detectability ( http://arxiv.org/abs/2311.11189v1 ) ライセンス: Link先を確認	Masahito Hayashi and Yuki Ito	(参考訳) 仮説検定の設定に基づく検出性能として,新たなエンタングルメント測度を提案する。量子サノフ定理を拡張することで,提案する測度がエンタングル状態の検出にどのように働くかを明らかにする。最大相関状態に対する計算式を導出し、一般のエンタングル状態に適用できるアルゴリズムを提案する。さらに,本アルゴリズムが分離可能性のメンバシップ問題の解決にどのように役立つかを検討する。 We propose new entanglement measures as the detection performance based on the hypothesis testing setting. We clarify how our measures work for detecting an entangled state by extending the quantum Sanov theorem. We derive their calculation formulas for maximally correlated states, and propose algorithms for them that work for general entangled states. In addition, we investigate how our algorithm works for solving the membership problem for separability.	翻訳日:2023-11-22 06:54:29 公開日:2023-11-19
# 一般化量子有本-Blahutアルゴリズムとその量子情報ボトルネックへの応用 Generalized quantum Arimoto-Blahut algorithm and its application to quantum information bottleneck ( http://arxiv.org/abs/2311.11188v1 ) ライセンス: Link先を確認	Masahito Hayashi and Geng Liu	(参考訳) 我々は、Ramakrishnan et al. (IEEE Trans. IT, 67, 946 (2021)) による量子有本-Blahutアルゴリズムを、線形制約のある密度行列の集合上で定義される関数へと一般化する。このアルゴリズムはより広い適用範囲を持つ。そこで,本アルゴリズムを,量子学習に利用できる3つの量子系を用いた量子情報ボトルネックに適用する。得られたアルゴリズムを,Grimsmo と Still (Phys. Rev. A, 94, 012338 (2016)) の既存アルゴリズムと数値的に比較する。数値解析の結果,我々のアルゴリズムは彼らのアルゴリズムよりも優れていることがわかった。 We generalize the quantum Arimoto-Blahut algorithm by Ramakrishnan et al. (IEEE Trans. IT, 67, 946 (2021)) to a function defined over a set of density matrices with linear constraints. This algorithm has wider applicability. Hence, we apply our algorithm to the quantum information bottleneck with three quantum systems, which can be used for quantum learning. We numerically compare the resulting algorithm with the existing algorithm by Grimsmo and Still (Phys. Rev. A, 94, 012338 (2016)). Our numerical analysis shows that our algorithm is better than their algorithm.	翻訳日:2023-11-22 06:54:24 公開日:2023-11-19
# オープンボキャブラリーカモフラージュ物体のセグメンテーション Open-Vocabulary Camouflaged Object Segmentation ( http://arxiv.org/abs/2311.11241v1 ) ライセンス: Link先を確認	Youwei Pang, Xiaoqi Zhao, Jiaming Zuo, Lihe Zhang, Huchuan Lu	(参考訳) 近年、CLIPのような大規模視覚言語モデル(VLM)が出現し、オープンワールドオブジェクト認識への道を開いた。多くの研究が、推論時に新しいクラスを持つ多様なオブジェクトを知覚する必要がある、困難なオープン語彙の高密度予測タスクに対する事前学習VLMの利用について検討している。既存の手法は関連タスクの公開データセットに基づいて実験を構築しているが、これらのデータセットはオープン語彙向けに作られておらず、データ収集バイアスとアノテーションコストのために、複雑なシーンにカモフラージュされた知覚しにくいオブジェクトをほとんど含まない。このギャップを埋めるために,新しいタスクであるオープン語彙カモフラージュオブジェクトセグメンテーション(OVCOS)を導入し,精細なアノテーションと対応するオブジェクトクラスを伴う11,483枚の手作業で選定した画像を含む大規模な複雑シーンデータセット(\textbf{OVCamo})を構築する。さらに、パラメータを固定したCLIPに反復的な意味ガイダンスと構造強化を付加した、強力な単一段のオープン語彙 \underline{c}amouflaged \underline{o}bject \underline{s}egmentation transform\underline{er} ベースライン \textbf{OVCoser} を構築する。クラスの意味的知識によるガイダンスと、エッジ情報および深度情報からの視覚的構造手がかりの補完を統合することにより、提案手法はカモフラージュされたオブジェクトを効率的に捉えることができる。さらに、この効果的なフレームワークは、OVCamoデータセットにおいて、従来のオープン語彙セマンティック画像セグメンテーションの最先端手法を大きく上回る。提案するデータセットとベースラインによって、より実用的な価値を持つこの新しいタスクが、オープン語彙の高密度予測タスクに関する研究をさらに拡大することを期待している。 Recently, the emergence of large-scale vision-language models (VLMs), such as CLIP, has opened the way towards open-world object perception. Many works have explored the utilization of pre-trained VLMs for the challenging open-vocabulary dense prediction task, which requires perceiving diverse objects with novel classes at inference time. Existing methods construct experiments based on the public datasets of related tasks, which are not tailored for open vocabulary and rarely involve imperceptible objects camouflaged in complex scenes due to data collection bias and annotation costs. To fill in the gaps, we introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS), and construct a large-scale complex scene dataset (\textbf{OVCamo}) containing 11,483 hand-selected images with fine annotations and corresponding object classes. 
Further, we build a strong single-stage open-vocabulary \underline{c}amouflaged \underline{o}bject \underline{s}egmentation transform\underline{er} baseline \textbf{OVCoser} attached to the parameter-fixed CLIP with iterative semantic guidance and structure enhancement. By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects. Moreover, this effective framework also surpasses previous state-of-the-art open-vocabulary semantic image segmentation methods by a large margin on our OVCamo dataset. With the proposed dataset and baseline, we hope that this new task with more practical value can further expand the research on open-vocabulary dense prediction tasks.	翻訳日:2023-11-21 21:35:48 公開日:2023-11-19
# AtomXR: 自然言語と没入型物理的相互作用によるXRプロトタイピング AtomXR: Streamlined XR Prototyping with Natural Language and Immersive Physical Interaction ( http://arxiv.org/abs/2311.11238v1 ) ライセンス: Link先を確認	Alice Cai, Caine Ardayfio, AnhPhu Nguyen, Tica Lin, Elena Glassman	(参考訳) 拡張現実(XR)の技術的進歩により、より多くのXRコンテンツへの需要が増大するにつれ、従来の開発プロセスはいくつかの課題に直面している。 1)未熟な開発者のための急な学習曲線 2)ヘッドセット内における2次元開発環境と3次元ユーザ体験の切り離し 3) 開発環境とテスト環境のコンテキスト切り替えによるイテレーションサイクルの遅さ。これらの課題に対処するために、私たちは、経験豊富な開発者と経験の浅い開発者の両方に、自然言語、目視、タッチインタラクションを使用したアプリケーション開発を促進すべく設計された、合理化され、没入的、ノーコードXRプロトタイピングツールであるAtomXRを紹介します。 AtomXRは以下のもので構成されます。 1. AtomScript - 高速プロトタイピングのための高レベルの人間解釈可能なスクリプト言語。 2)atomscript生成のためのllmsとマルチモーダル入力を統合する自然言語インタフェース 3)没入型インヘッドセットオーサリング環境。 2つのユーザスタディによる経験的評価は、自然言語ベースおよび没入型プロトタイピングに関する洞察を与え、AtomXRは従来のシステムと比較して、スピードとユーザエクスペリエンスを大幅に改善することを示している。 As technological advancements in extended reality (XR) amplify the demand for more XR content, traditional development processes face several challenges: 1) a steep learning curve for inexperienced developers, 2) a disconnect between 2D development environments and 3D user experiences inside headsets, and 3) slow iteration cycles due to context switching between development and testing environments. To address these challenges, we introduce AtomXR, a streamlined, immersive, no-code XR prototyping tool designed to empower both experienced and inexperienced developers in creating applications using natural language, eye-gaze, and touch interactions. AtomXR consists of: 1) AtomScript, a high-level human-interpretable scripting language for rapid prototyping, 2) a natural language interface that integrates LLMs and multimodal inputs for AtomScript generation, and 3) an immersive in-headset authoring environment. Empirical evaluation through two user studies offers insights into natural language-based and immersive prototyping, and shows AtomXR provides significant improvements in speed and user experience compared to traditional systems.	翻訳日:2023-11-21 21:35:15 公開日:2023-11-19
# マルチモーダル感情分析のためのAI深層学習アルゴリズムの実装 Implementation of AI Deep Learning Algorithm For Multi-Modal Sentiment Analysis ( http://arxiv.org/abs/2311.11237v1 ) ライセンス: Link先を確認	Jiazhen Wang	(参考訳) 2チャンネル畳み込みニューラルネットワークとリングネットワークを組み合わせたマルチモーダル感情認識手法を確立した。この手法は感情情報を効果的に抽出し、学習効率を向上させる。単語はGloVeでベクトル化され、単語ベクトルは畳み込みニューラルネットワークに入力される。注意機構と最大プーリング変換器BiSRUチャネルを組み合わせることで、局所的な深層感情と前後の逐次的な感情意味論を得る。最後に、複数の特徴を融合して感情極性の判定への入力とすることで、対象の感情分析を実現する。実験により、特徴融合に基づく感情分析手法は,感情データセットの認識精度を効果的に向上させ,学習時間を短縮できることが示された。モデルは一定の汎化能力を持つ。 A multi-modal emotion recognition method was established by combining a two-channel convolutional neural network with a ring network. This method can extract emotional information effectively and improve learning efficiency. The words were vectorized with GloVe, and the word vectors were input into the convolutional neural network. Combining the attention mechanism and the maximum pool converter BiSRU channel, the local deep emotion and pre-post sequential emotion semantics are obtained. Finally, multiple features are fused and used as input to determine the polarity of emotion, thereby achieving emotion analysis of the target. Experiments show that the emotion analysis method based on feature fusion can effectively improve the recognition accuracy on the emotion data set and reduce the learning time. The model shows a certain degree of generalization ability.	翻訳日:2023-11-21 21:34:54 公開日:2023-11-19
# 時系列異常検出における「異常」の解法:自己教師付きトリドメイン解 Unraveling the `Anomaly' in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution ( http://arxiv.org/abs/2311.11235v1 ) ライセンス: Link先を確認	Yuting Sun, Guansong Pang, Guanhua Ye, Tong Chen, Xia Hu, Hongzhi Yin	(参考訳) 時系列異常検出(tsad: time series anomaly detection)における現在進行中の課題、特に異常ラベルの不足と異常長と形状の変化は、より効率的なソリューションの必要性をもたらした。 TSADにおける従来の教師付きモデルには限定的な異常ラベルが存在するため、自己教師付き学習のような様々なSOTA深層学習技術がこの問題に対処するために導入されている。しかし、これらは異常長や形状の変化に対処し難いため、様々な異常への適応性が制限される。さらに、多くのベンチマークデータセットは、ランダム関数でさえ検出できる明示的な異常を持つという問題に悩まされている。この問題は、不適切な評価指標である点調整(PA)によって悪化し、モデル性能が膨張する可能性がある。本稿では,3つのデータ領域の時間的・頻度的・残差的特徴を,異常ラベルに依存することなくモデル化することで,これらの課題に対処する,自己教師型学習ベースのTriADを提案する。従来のコントラスト学習法とは異なり、triadはドメイン間コントラスト損失とドメイン内コントラスト損失の両方を使用して、通常のデータ間の共通属性を学習し、異常と区別する。さらに,ディスコード検出アルゴリズムと統合することで,長さの異なる異常を検出できる。この研究は、高度に設計されたデータセット(UCRアーカイブ)と評価指標(PA%Kとアフィリエイト)の両方を利用して、TSADにおけるディープラーニングの可能性を再評価する最初の試みである。 UCRデータセットの実験結果により、TriADは、SOTA深層学習モデルよりもPA%KベースのF1スコアが3倍、精度が50%向上した。 The ongoing challenges in time series anomaly detection (TSAD), notably the scarcity of anomaly labels and the variability in anomaly lengths and shapes, have led to the need for a more efficient solution. As limited anomaly labels hinder traditional supervised models in TSAD, various SOTA deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue. However, they encounter difficulties handling variations in anomaly lengths and shapes, limiting their adaptability to diverse anomalies. Additionally, many benchmark datasets suffer from the problem of having explicit anomalies that even random functions can detect. This problem is exacerbated by ill-posed evaluation metrics, known as point adjustment (PA), which can result in inflated model performance. 
In this context, we propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD), which addresses these challenges by modeling features across three data domains - temporal, frequency, and residual domains - without relying on anomaly labels. Unlike traditional contrastive learning methods, TriAD employs both inter-domain and intra-domain contrastive loss to learn common attributes among normal data and differentiate them from anomalies. Additionally, our approach can detect anomalies of varying lengths by integrating with a discord discovery algorithm. It is worth noting that this study is the first to reevaluate the deep learning potential in TSAD, utilizing both rigorously designed datasets (i.e., UCR Archive) and evaluation metrics (i.e., PA%K and affiliation). Through experimental results on the UCR dataset, TriAD achieves an impressive three-fold increase in PA%K based F1 scores over SOTA deep learning models, and 50% increase of accuracy as compared to SOTA discord discovery algorithms.	翻訳日:2023-11-21 21:34:41 公開日:2023-11-19
# 医療におけるコンピュータビジョンのための畳み込みニューラルネットワークによる放射線診断の強化 Enhancing Radiology Diagnosis through Convolutional Neural Networks for Computer Vision in Healthcare ( http://arxiv.org/abs/2311.11234v1 ) ライセンス: Link先を確認	Keshav Kumar K., Dr N V S L Narasimham	(参考訳) 放射線診断における畳み込みニューラルネットワーク(CNN)の変換力について, 解釈可能性, 有効性, 倫理的問題に着目して検討した。変更されたDenseNetアーキテクチャでは、CNNは特殊性、感度、精度の点で優れている。従来の手法よりも優れていることは、効率向上を強調する比較分析によって検証される。それでも、解釈可能性に関する問題は、継続的モデルの改善に加えて、洗練されたメソッドの必要性を強調している。相互運用性や放射線技師のトレーニングといった統合問題は、チームワークの提案につながります。倫理的含意を体系的に考慮し、広範な枠組みを必要とする。アーキテクチャのリファインメント、解釈可能性、倫理的考察は、放射線診断におけるCNNの展開に責任を持つものとして、今後の研究において優先される必要がある。 The transformative power of Convolutional Neural Networks (CNNs) in radiology diagnostics is examined in this study, with a focus on interpretability, effectiveness, and ethical issues. With an altered DenseNet architecture, the CNN performs admirably in terms of particularity, sensitivity, as well as accuracy. Its superiority over conventional methods is validated by comparative analyses, which highlight efficiency gains. Nonetheless, interpretability issues highlight the necessity of sophisticated methods in addition to continuous model improvement. Integration issues like interoperability and radiologists' training lead to suggestions for teamwork. Systematic consideration of the ethical implications is carried out, necessitating extensive frameworks. Refinement of architectures, interpretability, alongside ethical considerations need to be prioritized in future work for responsible CNN deployment in radiology diagnostics.	翻訳日:2023-11-21 21:34:07 公開日:2023-11-19
# 因果的ATEは制御されたテキスト生成における意図しないバイアスを軽減する Causal ATE Mitigates Unintended Bias in Controlled Text Generation ( http://arxiv.org/abs/2311.11229v1 ) ライセンス: Link先を確認	Rahul Madhavan and Kahini Wadhawan	(参考訳) 因果平均処置効果(Causal ATE)を用いた言語モデルの属性制御について検討する。言語モデル(LM)における属性制御タスクの既存手法は、文中の単語と対象属性との共起を調べ、それらを制御する。しかし、トレーニングデータセットにおける単語と属性の擬似相関により、推論時にその擬似相関を持つ単語が提示されると、モデルが属性の存在を幻覚する可能性がある。単純な摂動に基づくCausal ATE法が、この意図しない効果を除去することを示す。さらに,分類タスクにおいてCausal ATEを調べるための理論的基礎を与え,それが偽陽性の数を減らし,意図しないバイアスの問題を緩和することを証明する。具体的には、有害性軽減の問題を題材とする。そこでは、無害化の後に保護されたグループに対してしばしば生じる不用意なバイアスが大きな課題となる。この意図しないバイアスは、Causal ATE指標を用いて解決できることを示す。 We study attribute control in language models through the method of Causal Average Treatment Effect (Causal ATE). Existing methods for the attribute control task in Language Models (LMs) check for the co-occurrence of words in a sentence with the attribute of interest, and control for them. However, spurious correlation of the words with the attribute in the training dataset can cause models to hallucinate the presence of the attribute when presented with the spurious correlate during inference. We show that the simple perturbation-based method of Causal ATE removes this unintended effect. Additionally, we offer a theoretical foundation for investigating Causal ATE in the classification task, and prove that it reduces the number of false positives, thereby mitigating the issue of unintended bias. Specifically, we ground it in the problem of toxicity mitigation, where a significant challenge lies in the inadvertent bias that often emerges towards protected groups after detoxification. We show that this unintended bias can be solved by the use of the Causal ATE metric.	翻訳日:2023-11-21 21:33:55 公開日:2023-11-19
# 分子システムの高精度で効率的な幾何学的深層学習のための普遍的枠組み A Universal Framework for Accurate and Efficient Geometric Deep Learning of Molecular Systems ( http://arxiv.org/abs/2311.11228v1 ) ライセンス: Link先を確認	Shuo Zhang, Yang Liu, Lei Xie	(参考訳) 分子科学は、異なるタイプや大きさの分子とその複合体を含む幅広い問題に対処する。近年、幾何学的ディープラーニング、特にグラフニューラルネットワークは、分子科学の応用において有望な性能を示している。しかし、既存のほとんどの研究は特定の分子系に目的の誘導バイアスを課すことが多く、マクロ分子や大規模タスクに適用しても非効率である。これらの課題に対処するため,PAMNetは,任意の分子系のサイズや型が異なる3次元(3D)分子の表現を正確かつ効率的に学習するための普遍的なフレームワークである。分子力学にインスパイアされたPAMNetは、局所的および非局所的相互作用とその組み合わせ効果を明示的にモデル化するために、物理情報バイアスを誘導する。その結果、PAMNetは高価な操作を削減でき、時間とメモリ効率が向上する。広範なベンチマーク研究において、PAMNetは、小さな分子の性質、RNA3D構造、タンパク質-リガンド結合親和性という3つの異なる学習課題において、精度と効率の両面で最先端のベースラインより優れている。この結果は,分子科学の幅広い応用におけるPAMNetの可能性を強調した。 Molecular sciences address a wide range of problems involving molecules of different types and sizes and their complexes. Recently, geometric deep learning, especially Graph Neural Networks, has shown promising performance in molecular science applications. However, most existing works often impose targeted inductive biases to a specific molecular system, and are inefficient when applied to macromolecules or large-scale tasks, thereby limiting their applications to many real-world problems. To address these challenges, we present PAMNet, a universal framework for accurately and efficiently learning the representations of three-dimensional (3D) molecules of varying sizes and types in any molecular system. Inspired by molecular mechanics, PAMNet induces a physics-informed bias to explicitly model local and non-local interactions and their combined effects. As a result, PAMNet can reduce expensive operations, making it time and memory efficient. In extensive benchmark studies, PAMNet outperforms state-of-the-art baselines regarding both accuracy and efficiency in three diverse learning tasks: small molecule properties, RNA 3D structures, and protein-ligand binding affinities. 
Our results highlight the potential for PAMNet in a broad range of molecular science applications.	翻訳日:2023-11-21 21:33:41 公開日:2023-11-19
# FedRA:不均一クライアントの力を解き放つためのフェデレーションチューニングのためのランダムアロケーション戦略 FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients ( http://arxiv.org/abs/2311.11227v1 ) ライセンス: Link先を確認	Shangchao Su, Bin Li, Xiangyang Xue	(参考訳) 基礎モデルの可用性が高まり、フェデレーションチューニングはフェデレーション学習の分野で注目を集め、複数のクライアントからのデータと計算リソースを活用して、協調的に微調整された基礎モデルを開発した。しかしながら、現実世界のフェデレーションシナリオでは、計算や通信リソースの異なる多数の異種クライアントが存在することが多く、モデルの微調整プロセス全体をサポートすることができない。そこで本研究では,新しいフェデレートチューニングアルゴリズムであるFedRAを提案する。 FedRAの実装は単純で、オリジナルのモデルにさらなる変更を加えることなく、トランスフォーマーベースのモデルにシームレスに統合することができる。具体的には、各通信ラウンドにおいて、FedRAはランダムにアロケーション行列を生成する。リソース制約のあるクライアントの場合、loraを使用して割り当て行列と微調整に基づいて、元のモデルから少数のレイヤを再編成する。その後、サーバは現在の割り当て行列に従ってクライアントから更新されたLoRAパラメータを元のモデルの対応するレイヤに集約する。 fedraは、すべてのクライアントがグローバルモデルを完全にサポートできないようなシナリオもサポートしていますが、これは素晴らしいアドバンテージです。大規模な画像データセットであるDomainNetとNICO++を、さまざまな非ID設定で実験する。その結果,FedRAは比較手法よりも優れていた。ソースコードは \url{https://github.com/leondada/fedra} で入手できる。 With the increasing availability of Foundation Models, federated tuning has garnered attention in the field of federated learning, utilizing data and computation resources from multiple clients to collaboratively fine-tune foundation models. However, in real-world federated scenarios, there often exist a multitude of heterogeneous clients with varying computation and communication resources, rendering them incapable of supporting the entire model fine-tuning process. In response to this challenge, we propose a novel federated tuning algorithm, FedRA. The implementation of FedRA is straightforward and can be seamlessly integrated into any transformer-based model without the need for further modification to the original model. Specifically, in each communication round, FedRA randomly generates an allocation matrix. For resource-constrained clients, it reorganizes a small number of layers from the original model based on the allocation matrix and fine-tunes using LoRA. 
Subsequently, the server aggregates the updated LoRA parameters from the clients according to the current allocation matrix into the corresponding layers of the original model. It is worth noting that FedRA also supports scenarios where none of the clients can support the entire global model, which is an impressive advantage. We conduct experiments on two large-scale image datasets, DomainNet and NICO++, under various non-iid settings. The results demonstrate that FedRA outperforms the compared methods significantly. The source code is available at \url{https://github.com/leondada/FedRA}.	翻訳日:2023-11-21 21:33:19 公開日:2023-11-19
# LLMに基づくプロンプト修正とユーザフィードバックを用いた対話型クエリ生成アシスタント An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback ( http://arxiv.org/abs/2311.11226v1 ) ライセンス: Link先を確認	Kaustubh D. Dhole, Ramraj Chandradevan, Eugene Agichtein	(参考訳) 検索は情報にアクセスする主要な方法であるが、特にユーザがドメインに精通していない場合や、他の言語の文書を検索する場合、あるいはクエリとして容易に表現できないイベントなどの複雑な情報を探す場合、効果的なクエリの定式化は依然として難しい課題である。関心のある文書やパッセージの例を提供するほうがユーザにとって容易かもしれないが、このようなクエリ・バイ・イグザンプルのシナリオは概念ドリフトを起こしやすく、クエリ生成手法に非常に敏感である。このデモでは、LLMをインタラクティブに使用し、ユーザがクエリ定式化プロセスのすべての段階で編集やフィードバックを提供できるよう支援する補完的なアプローチを示す。提案するクエリ生成アシスタントは,単言語または多言語の文書コレクション上での自動および対話型のクエリ生成をサポートする新しい検索インタフェースである。具体的には、提案する支援インタフェースにより、ユーザは異なるLLMによって生成されたクエリを洗練し、検索された文書やパッセージに対するフィードバックを提供でき、さらにユーザのフィードバックをプロンプトとして組み込んで、より効果的なクエリを生成することができる。提案するインタフェースは,クエリ生成のためのLLMの微調整とプロンプティングを探求して検索モデルおよびランキングモデルの有効性を定性的に評価するため,また,こうした支援なしにはユーザがクエリの定式化に苦労する複雑な検索タスクに対してHITL(Human-in-the-Loop)実験を行うための,有用な実験ツールである。 While search is the predominant method of accessing information, formulating effective queries remains a challenging task, especially when users are not familiar with a domain, are searching for documents in other languages, or are looking for complex information such as events, which is not easily expressible as a query. Providing example documents or passages of interest might be easier for a user; however, such query-by-example scenarios are prone to concept drift and are highly sensitive to the query generation method. This demo illustrates complementary approaches of using LLMs interactively, assisting and enabling the user to provide edits and feedback at all stages of the query formulation process. The proposed Query Generation Assistant is a novel search interface which supports automatic and interactive query generation over a monolingual or multilingual document collection. 
Specifically, the proposed assistive interface enables the users to refine the queries generated by different LLMs, to provide feedback on the retrieved documents or passages, and is able to incorporate the users' feedback as prompts to generate more effective queries. The proposed interface is a valuable experimental tool for exploring fine-tuning and prompting of LLMs for query generation to qualitatively evaluate the effectiveness of retrieval and ranking models, and for conducting Human-in-the-Loop (HITL) experiments for complex search tasks where users struggle to formulate queries without such assistance.	翻訳日:2023-11-21 21:32:58 公開日:2023-11-19
# TextGuard: テキスト分類によるバックドア攻撃に対する防御 TextGuard: Provable Defense against Backdoor Attacks on Text Classification ( http://arxiv.org/abs/2311.11225v1 ) ライセンス: Link先を確認	Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song	(参考訳) バックドア攻撃は、セキュリティクリティカルなアプリケーションに機械学習モデルをデプロイする上で、大きなセキュリティ脅威となっている。既存の研究はバックドア攻撃に対する多くの防御を提案している。特定の実証的な防御効果を示すにもかかわらず、これらの技術は任意の攻撃に対して形式的で証明可能なセキュリティ保証を提供することはできない。その結果,本評価で示すように,強力な適応攻撃によって容易に破られる。本稿では,テキスト分類におけるバックドア攻撃に対する最初の防御手法であるtextguardを提案する。特にTextGuardは、まず(バックドア付き)トレーニングデータをサブトレーニングセットに分割し、各トレーニング文をサブ文に分割する。このパーティショニングにより、サブトレーニングセットの大部分がバックドアトリガを含まないことが保証される。その後、各サブトレーニングセットからベース分類器を訓練し、そのアンサンブルが最終予測を提供する。理論的には、バックドアトリガの長さが一定のしきい値に収まると、TextGuardは、トレーニングやテストにおけるトリガーの存在によって、その予測が影響を受けないことを保証します。本評価では,3つのベンチマークテキスト分類タスクにおけるTextGuardの有効性を実証し,バックドア攻撃に対する既存の認証防御の認証精度を上回った。さらに,TextGuardの実証性能を高めるための新たな戦略を提案する。最先端の実証的防御との比較は、複数のバックドア攻撃に対するTextGuardの優位性を検証する。私たちのコードとデータはhttps://github.com/ai-secure/textguardで入手できます。 Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain empirical defense efficacy, none of these techniques could provide a formal and provable security guarantee against arbitrary attacks. As a result, they can be easily broken by strong adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. In particular, TextGuard first divides the (backdoored) training data into sub-training sets, achieved by splitting each training sentence into sub-sentences. This partitioning ensures that a majority of the sub-training sets do not contain the backdoor trigger. Subsequently, a base classifier is trained from each sub-training set, and their ensemble provides the final prediction. 
We theoretically prove that when the length of the backdoor trigger falls within a certain threshold, TextGuard guarantees that its prediction will remain unaffected by the presence of the triggers in training and testing inputs. In our evaluation, we demonstrate the effectiveness of TextGuard on three benchmark text classification tasks, surpassing the certification accuracy of existing certified defenses against backdoor attacks. Furthermore, we propose additional strategies to enhance the empirical performance of TextGuard. Comparisons with state-of-the-art empirical defenses validate the superiority of TextGuard in countering multiple backdoor attacks. Our code and data are available at https://github.com/AI-secure/TextGuard.	翻訳日:2023-11-21 21:32:30 公開日:2023-11-19
# ガウス拡散:構造雑音を伴う拡散確率モデルの3次元ガウス散乱 GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise ( http://arxiv.org/abs/2311.11221v1 ) ライセンス: Link先を確認	Xinhai Li and Huaibin Wang and Kuo-Kun Tseng	(参考訳) text-to-3dは効率的な生成方法と拡張的な創造性で知られており、aigcドメインでかなりの注目を集めている。しかし、Nerfと2次元拡散モデルの融合は、しばしば過飽和画像を生成し、画素ワイドレンダリング法の制約により下流産業用途に厳しい制約を課す。ガウススプラッティングは、最近、NeRF法で一般的な従来の点検法に取って代わられ、3次元再構成の様々な側面に革命をもたらした。本稿では,gaussian splattingに基づく新たな3dコンテンツ生成フレームワークを提案する。 3次元生成における多視点一貫性の実現という課題は、モデリングの複雑さと精度を著しく損なう。 SJCからインスピレーションを得て,多視点形状の不整合の是正を目的とした3次元ガウススプラッティングによる摂動画像へのマルチビューノイズ分布の適用を検討した。我々は,様々な視点からガウスノイズを発生させる効率的なノイズ生成法を考案した。さらに、バニラ3dガウス系世代は、局所的なミニマでモデルを罠にかける傾向があり、フローター、バリ、増殖要素などの人工物を引き起こす。これらの問題を緩和するために,3次元外観の品質と安定性を高めるため,変分ガウススプラッティング法を提案する。我々の知る限り,本手法は3次元コンテンツ生成プロセスの全領域にわたるガウススプラッティングの包括的利用が初めてである。 Text-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, the amalgamation of Nerf and 2D diffusion models frequently yields oversaturated images, posing severe limitations on downstream industrial applications due to the constraints of pixelwise rendering method. Gaussian splatting has recently superseded the traditional pointwise sampling technique prevalent in NeRF-based methodologies, revolutionizing various aspects of 3D reconstruction. This paper introduces a novel text to 3D content generation framework based on Gaussian splatting, enabling fine control over image saturation through individual Gaussian sphere transparencies, thereby producing more realistic images. The challenge of achieving multi-view consistency in 3D generation significantly impedes modeling complexity and accuracy. Taking inspiration from SJC, we explore employing multi-view noise distributions to perturb images generated by 3D Gaussian splatting, aiming to rectify inconsistencies in multi-view geometry. 
We devise an efficient noise-generation method that produces Gaussian noise for diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts such as floaters, burrs, or proliferative elements. To mitigate these issues, we propose a variational Gaussian splatting technique that enhances the quality and stability of the 3D appearance. To our knowledge, our approach is the first to use Gaussian splatting comprehensively across the entire 3D content generation process.	翻訳日:2023-11-21 21:32:07 公開日:2023-11-19
# 状態独立な全対数論 State-independent all-versus-nothing arguments ( http://arxiv.org/abs/2311.11218v1 ) ライセンス: Link先を確認	Boseong Kim, Samson Abramsky	(参考訳) 文脈性は、古典的直観に挑戦する量子情報の重要な特徴であり、量子優位の明示的な証明を構築する基盤を提供する。量子的優位性の多くの証拠は文脈性議論に基づいているが、文脈性の定義はそれぞれの研究で異なり、その結果間の即時接続の確立に矛盾を引き起こす。本報告では,層理論的文脈性の数学的構造を概観し,この枠組みを拡張してコチェン・スペック的文脈性を説明する。まず、文脈性の定義を詳細な例で取り上げる。次に、全対無(AvN)引数を述べ、状態非依存のAvNクラスを定義します。可観測物の部分的閉包を可換化することで, コッチェン=スペーカー型文脈性, あるいは部分閉包における文脈性が, この枠組みに変換できることが示されている。最後に、状態側ビューにおける文脈性クラスの厳密な階層構造が部分的クロージャ形式とともに状態非依存のAvNクラスにマージされるような演算子側ビューにおける文脈性の各ケースを比較する。全体として、この記事はコチェン=スペクター型の概念を状態独立なAvN引数に組み込むことにより、文脈性の統一的な解釈を提供する。この結果は文脈性に対する新たな洞察を示し、量子優位の証明を構築するためのコヒーレントなアプローチへの道を開く。 Contextuality is a key feature of quantum information that challenges classical intuitions, providing the basis for constructing explicit proofs of quantum advantage. While a number of evidences of quantum advantage are based on the contextuality argument, the definition of contextuality is different in each research, causing incoherence in the establishment of instant connection between their results. In this report, we review the mathematical structure of sheaf-theoretic contextuality and extend this framework to explain Kochen-Specker type contextuality. We first cover the definitions in contextuality with detailed examples. Then, we state the all-versus-nothing (AvN) argument and define a state-independent AvN class. It is shown that Kochen-Specker type contextuality, or contextuality in a partial closure, can be translated into this framework by the partial closure of observables under the multiplication of commuting measurements. Finally, we compare each case of contextuality in an operator-side view, where the strict hierarchy of contextuality class in a state-side view seems to merge into the state-independent AvN class together with the partial closure formalism. 
Overall, this report provides a unified interpretation of contextuality by integrating Kochen-Specker type notions into the state-independent AvN argument. The results present novel insights into contextuality, which pave the way for a coherent approach to constructing proofs of quantum advantage.	翻訳日:2023-11-21 21:31:38 公開日:2023-11-19
# SPLAIN: 理由とデータによるサイバーセキュリティ問題の拡大 SPLAIN: Augmenting CybersecurityWarnings with Reasons and Data ( http://arxiv.org/abs/2311.11215v1 ) ライセンス: Link先を確認	Vera A. Kazakova and Jena D. Hwang and Bonnie J. Dorr and Yorick Wilks and J. Blake Gage and Alex Memory and Mark A. Clark	(参考訳) 効果的なサイバー脅威認識と予防要求は、事前アプローチが一般的に限定的で、究極的には理解できない情報を提供するため、理解しやすい予測システムである。 SPLAIN(Simplified Plaintext Language)は,警告データをユーザフレンドリーなサイバー脅威記述に変換する自然言語生成装置である。 SPLAINは、入力データとシステム機能に関する階層的な説明的詳細を組み込んだ、明確で実用的な出力を生成するように設計されている。個々のセンサによる予測信号の入力と融合モジュールからの全体的な警告を考慮し、SPLAINは各信号に対して、センサやデータ信号に関する情報を問い合わせる。この収集されたデータは、ユーザレビューのための予測、センシング、データ要素を含む、一貫性のある英語の説明に処理される。 SPLAINのテンプレートベースのアプローチは、一貫した警告構造と語彙を保証する。 splainの階層的な出力構造により、各脅威とそのコンポーネントを拡張でき、要求の基盤となる説明を明らかにすることができる。我々の結論は、サイバー警告の背後にある「方法」と「理由」を特定することの必要性を強調し、一貫性のある説明を生成するための単純な構造化テンプレートを提唱し、機械学習アプローチにおける直接因果関係が常に識別可能であるとは限らないことを認識し、モデルやトレーニングデータといった一般的な方法論に焦点を当てるためにいくつかの説明を必要とする。 Effective cyber threat recognition and prevention demand comprehensible forecasting systems, as prior approaches commonly offer limited and, ultimately, unconvincing information. We introduce Simplified Plaintext Language (SPLAIN), a natural language generator that converts warning data into user-friendly cyber threat explanations. SPLAIN is designed to generate clear, actionable outputs, incorporating hierarchically organized explanatory details about input data and system functionality. Given the inputs of individual sensor-induced forecasting signals and an overall warning from a fusion module, SPLAIN queries each signal for information on contributing sensors and data signals. This collected data is processed into a coherent English explanation, encompassing forecasting, sensing, and data elements for user review. SPLAIN's template-based approach ensures consistent warning structure and vocabulary. SPLAIN's hierarchical output structure allows each threat and its components to be expanded to reveal underlying explanations on demand. 
Our conclusions emphasize the need for designers to specify the "how" and "why" behind cyber warnings, advocate for simple structured templates in generating consistent explanations, and recognize that direct causal links in Machine Learning approaches may not always be identifiable, requiring some explanations to focus on general methodologies, such as model and training data.	翻訳日:2023-11-21 21:31:13 公開日:2023-11-19
# 弱監督下における変電所設備故障の赤外画像識別法 Infrared image identification method of substation equipment fault under weak supervision ( http://arxiv.org/abs/2311.11214v1 ) ライセンス: Link先を確認	Anjali Sharma, Priya Banerjee, Nikhil Singh	(参考訳) 本研究では, サブステーション装置の赤外線画像中の欠陥を弱教師付きで識別する手法を提案する。機器識別にFaster RCNNモデルを使用し、モデルのネットワーク構造とパラメータの変更による検出精度を向上させる。サブステーションで検査ロボットが捉えた赤外線画像の解析により,本手法を実証する。手動でマークされた結果に対して性能が検証され、提案手法が様々な機器タイプにわたる故障同定の精度を大幅に向上させることを示した。 This study presents a weakly supervised method for identifying faults in infrared images of substation equipment. It utilizes the Faster RCNN model for equipment identification, enhancing detection accuracy through modifications to the model's network structure and parameters. The method is exemplified through the analysis of infrared images captured by inspection robots at substations. Performance is validated against manually marked results, demonstrating that the proposed algorithm significantly enhances the accuracy of fault identification across various equipment types.	翻訳日:2023-11-21 21:30:49 公開日:2023-11-19
# 因果探索アルゴリズムにおける事前学習言語モデルの利用は可能か? Can We Utilize Pre-trained Language Models within Causal Discovery Algorithms? ( http://arxiv.org/abs/2311.11212v1 ) ライセンス: Link先を確認	Chanhui Lee (1), Juhyeon Kim (2), Yongjun Jeong (3), Juhyun Lyu (4), Junghee Kim (4), Sangmin Lee (4), Sangjun Han (4), Hyeokjun Choe (4), Soyeon Park (4), Woohyung Lim (4), Sungbin Lim (5,6), Sanghack Lee (2,7) ((1) Department of Artificial Intelligence, Korea University, (2) Graduate School of Data Science, Seoul National University, (3) Department of Computer Science and Engineering, UNIST, (4) Data Intelligence Laboratory, LG AI Research, (5) Department of Statistics, Korea University, (6) LG AI Research, (7) SNU-LG AI Research Center)	(参考訳) スケーリング法は、事前訓練された言語モデル(PLM)を因果推論の分野に導入することを許している。 PLMの因果推論は、データを利用した変数間の因果関係を決定することを目的とした因果発見とは対照的に、テキストベースの記述にのみ依存する。近年,特別に設計されたプロンプトにより,反復的因果推論の結果を集約して因果発見を模倣する手法が研究されている。原因と効果の発見におけるPLMの有用性を強調しており、特に複数の変数を扱う場合、データ不足によって制限されることが多い。逆に、PLMはデータを解析せず、迅速な設計に大きく依存しているというPLMの特徴は、因果発見にPLMを直接使用する上で重要な制限となる。したがって、plmに基づく因果推論は、素早い設計に深く依存し、因果関係を決定する際に過剰信頼と誤った予測のリスクを負う。本稿では,物理に着想を得た合成データの実験を通して,前述のPLMに基づく因果推論の限界を実証的に示す。そこで本研究では,plmから得られた知識を因果発見アルゴリズムと統合する新しいフレームワークを提案する。これは因果発見のための隣接行列を初期化し、事前知識を用いた正規化を組み込むことによって達成される。提案手法は, PLMと因果発見の統合による性能向上を実証するだけでなく, PLMから抽出した事前知識を既存の因果発見アルゴリズムで活用する方法も提案する。 Scaling laws have allowed Pre-trained Language Models (PLMs) into the field of causal reasoning. Causal reasoning of PLM relies solely on text-based descriptions, in contrast to causal discovery which aims to determine the causal relationships between variables utilizing data. Recently, there has been current research regarding a method that mimics causal discovery by aggregating the outcomes of repetitive causal reasoning, achieved through specifically designed prompts. It highlights the usefulness of PLMs in discovering cause and effect, which is often limited by a lack of data, especially when dealing with multiple variables. 
Conversely, PLMs do not analyze data and are highly dependent on prompt design, which is a crucial limitation for using them directly in causal discovery. Accordingly, PLM-based causal reasoning depends heavily on the prompt design and carries the risk of overconfidence and false predictions in determining causal relationships. In this paper, we empirically demonstrate the aforementioned limitations of PLM-based causal reasoning through experiments on physics-inspired synthetic data. Then, we propose a new framework that integrates prior knowledge obtained from a PLM with a causal discovery algorithm. This is accomplished by initializing an adjacency matrix for causal discovery and incorporating regularization using prior knowledge. Our proposed framework not only demonstrates improved performance through the integration of PLM and causal discovery but also suggests how to leverage PLM-extracted prior knowledge with existing causal discovery algorithms.	翻訳日:2023-11-21 21:30:42 公開日:2023-11-19
# イベントトリガー型コンテキスト認識ストーリー生成のためのクロスアテンション強化モデル A Cross-Attention Augmented Model for Event-Triggered Context-Aware Story Generation ( http://arxiv.org/abs/2311.11271v1 ) ライセンス: Link先を確認	Chen Tang, Tyler Loakman and Chenghua Lin	(参考訳) 近年の進歩にもかかわらず、既存のストーリー生成システムは、コンテクストやイベントの特徴を効果的に組み込むのに困難に直面している。これらの課題に対処するために、我々は、コンテキスト特徴をイベントシーケンスに残差マッピングを通じてマッピングするクロスアテンション機構を用いて、生成されたストーリーの関連性とコヒーレンスを高める新しいニューラル生成モデル、EtriCAを導入する。この機能キャプチャメカニズムにより,ストーリー生成プロセスにおいて,イベント間の論理関係をより効果的に活用できる。提案モデルをさらに強化するために,大規模書籍コーパスに知識向上のためのポストトレーニングフレームワーク(KeEtriCA)を用いる。これにより、EtriCAはより広い範囲のデータサンプルに適応できる。その結果,自動測定では約5%,人的評価では10%以上の改善が得られた。我々は、ストーリー生成におけるフレームワークの性能を評価するために、最新技術ベースラインモデル(SOTA)との比較を含む広範な実験を行う。自動測定と人的評価の両方を含む実験結果は、既存の最先端ベースラインよりもモデルの方が優れていることを示す。これらの結果は,生成した物語の質を向上させるために,文脈やイベントの特徴を活用するモデルの有効性を裏付けるものである。 Despite recent advancements, existing story generation systems continue to encounter difficulties in effectively incorporating contextual and event features, which greatly influence the quality of generated narratives. To tackle these challenges, we introduce a novel neural generation model, EtriCA, that enhances the relevance and coherence of generated stories by employing a cross-attention mechanism to map context features onto event sequences through residual mapping. This feature capturing mechanism enables our model to exploit logical relationships between events more effectively during the story generation process. To further enhance our proposed model, we employ a post-training framework for knowledge enhancement (KeEtriCA) on a large-scale book corpus. This allows EtriCA to adapt to a wider range of data samples. This results in approximately 5% improvement in automatic metrics and over 10% improvement in human evaluation. We conduct extensive experiments, including comparisons with state-of-the-art (SOTA) baseline models, to evaluate the performance of our framework on story generation.
The experimental results, encompassing both automated metrics and human assessments, demonstrate the superiority of our model over existing state-of-the-art baselines. These results underscore the effectiveness of our model in leveraging context and event features to improve the quality of generated narratives.	翻訳日:2023-11-21 21:24:47 公開日:2023-11-19
# 実世界の筆記支援に向けて:偽字と誤字による漢字チェックベンチマーク Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters ( http://arxiv.org/abs/2311.11268v1 ) ライセンス: Link先を確認	Yinghui Li, Zishan Xu, Shaoshen Chen, Haojing Huang, Yangning Li, Yong Jiang, Zhongli Li, Qingyu Zhou, Hai-Tao Zheng, Ying Shen	(参考訳) 筆記支援は人間の生活に密接に関連する応用であり、また、基礎的な自然言語処理(NLP)研究分野でもある。その目的は入力テキストの正しさと品質を改善することであり、誤字の検出と修正には文字チェックが不可欠である。手書き文字が大多数を占める現実の世界から見ると、人間が間違える文字には、偽文字(すなわち、文字の誤りによって作られた不正確な文字)と誤字文字(すなわち、スペルミスによって誤用された真の文字)が含まれる。しかし、既存のデータセットや関連研究は、主に音韻的・視覚的混乱に起因する誤字のみに焦点を当てており、より一般的で難しい偽字を無視している。このジレンマを突破するために、偽字と誤字が混ざった人間の注釈付き視覚中国語文字チェックデータセットVisual-C$^3$を提示する。私たちの知る限りでは、visual-c$^3$は、漢字チェックシナリオにおける、最初の現実世界のビジュアルであり、最大の人造データセットです。また,Visual-C$^3$の新たなベースライン手法を提案し,評価する。広範な実験結果と分析の結果、visual-c$^3$は高品質だが困難であることがわかった。 Visual-C$^3$データセットとベースラインメソッドは、コミュニティにおけるさらなる研究を促進するために公開されます。 Writing assistance is an application closely related to human life and is also a fundamental Natural Language Processing (NLP) research field. Its aim is to improve the correctness and quality of input texts, with character checking being crucial in detecting and correcting wrong characters. From the perspective of the real world where handwriting occupies the vast majority, characters that humans get wrong include faked characters (i.e., untrue characters created due to writing errors) and misspelled characters (i.e., true characters used incorrectly due to spelling errors). However, existing datasets and related studies only focus on misspelled characters mainly caused by phonological or visual confusion, thereby ignoring faked characters which are more common and difficult. To break through this dilemma, we present Visual-C$^3$, a human-annotated Visual Chinese Character Checking dataset with faked and misspelled Chinese characters. 
To the best of our knowledge, Visual-C$^3$ is the first real-world visual and the largest human-crafted dataset for the Chinese character checking scenario. Additionally, we also propose and evaluate novel baseline methods on Visual-C$^3$. Extensive empirical results and analyses show that Visual-C$^3$ is high-quality yet challenging. The Visual-C$^3$ dataset and the baseline methods will be publicly available to facilitate further research in the community.	翻訳日:2023-11-21 21:24:26 公開日:2023-11-19
# メンタルヘルスアプリケーションにおける大規模言語モデルの再考 Rethinking Large Language Models in Mental Health Applications ( http://arxiv.org/abs/2311.11267v1 ) ライセンス: Link先を確認	Shaoxiong Ji and Tianlin Zhang and Kailai Yang and Sophia Ananiadou and Erik Cambria	(参考訳) 大規模言語モデル(LLM)はメンタルヘルスにおいて貴重な資産となり、分類タスクとカウンセリングアプリケーションの両方において有望である。本稿では,精神保健分野におけるLSMの利用について考察する。予測のための生成モデルの不安定性と幻覚的なアウトプットを生成する可能性について論じ、その信頼性と信頼性を維持するために継続する監査と評価の必要性を強調する。この論文は、しばしば交換可能な『説明可能性』と『解釈可能性』を区別し、LLMが生み出す潜在的幻覚的自己説明に頼るのではなく、本質的に解釈可能な方法を開発することを提唱している。 LLMの進歩にもかかわらず、人間のカウンセラーの共感的理解、ニュアンスド解釈、文脈認識は、精神保健カウンセリングのセンシティブで複雑な領域では相容れないままである。 LLMの使用は、それを置き換えようとするのではなく、人間の専門知識を補完するツールと見なして、司法的かつ思慮深い考え方でアプローチされるべきである。 Large Language Models (LLMs) have become valuable assets in mental health, showing promise in both classification tasks and counseling applications. This paper offers a perspective on using LLMs in mental health applications. It discusses the instability of generative models for prediction and the potential for generating hallucinatory outputs, underscoring the need for ongoing audits and evaluations to maintain their reliability and dependability. The paper also distinguishes between the often interchangeable terms ``explainability'' and ``interpretability'', advocating for developing inherently interpretable methods instead of relying on potentially hallucinated self-explanations generated by LLMs. Despite the advancements in LLMs, human counselors' empathetic understanding, nuanced interpretation, and contextual awareness remain irreplaceable in the sensitive and complex realm of mental health counseling. The use of LLMs should be approached with a judicious and considerate mindset, viewing them as tools that complement human expertise rather than seeking to replace it.	翻訳日:2023-11-21 21:24:03 公開日:2023-11-19
# 物理インフォームドニューラルネットワークとニューラル演算子における雑音入出力の不確かさの定量化 Uncertainty quantification for noisy inputs-outputs in physics-informed neural networks and neural operators ( http://arxiv.org/abs/2311.11262v1 ) ライセンス: Link先を確認	Zongren Zou, Xuhui Meng, George Em Karniadakis	(参考訳) 科学機械学習(SciML)における不確実性定量化(UQ)は、ニューラルネットワーク(NN)が様々な科学分野にわたる複雑な問題に広く採用されているため、ますます重要になっている。代表的なSciMLモデルは物理インフォームドニューラルネットワーク(PINN)とニューラル演算子(NO)である。近年、SciMLのUQはますます研究されているが、PINNにおける時空間座標やNOsにおける入力関数などのノイズ入力による不確実性に対処する研究はほとんどない。モデルの入力におけるノイズの存在は、ほとんどのSciMLアルゴリズムの固有の非線形性のために、モデルの出力におけるノイズと比較して、かなり多くの課題を引き起こす。結果として、ノイズの多い入力に対するUQは、物理的な知識を含むアプリケーションにこれらのモデルの信頼性と信頼性の高いデプロイを行う上で重要な要素となる。そこで本研究では,ピンとnosのノイズ入力から生じる不確かさを定量化するベイズ法を提案する。本手法は,物理情報を符号化する際に,PINNやNOにシームレスに統合可能であることを示す。 PINNは、損失関数または可能性のいずれにおいても、自動的に微分される物理情報を含むことで物理学を取り入れ、時空間座標を入力とすることが多い。そこで,本手法は,観測された座標が雑音を受ける問題に対処する能力をPINNに装備する。一方、事前訓練されたNOは微分方程式の解法やベイズ逆問題(英語版)において方程式を含まない代理として一般的に用いられる。提案手法では,入力関数と出力関数の両方のノイズ測定をuqで処理できる。 Uncertainty quantification (UQ) in scientific machine learning (SciML) becomes increasingly critical as neural networks (NNs) are being widely adopted in addressing complex problems across various scientific disciplines. Representative SciML models are physics-informed neural networks (PINNs) and neural operators (NOs). While UQ in SciML has been increasingly investigated in recent years, very few works have focused on addressing the uncertainty caused by the noisy inputs, such as spatial-temporal coordinates in PINNs and input functions in NOs. The presence of noise in the inputs of the models can pose significantly more challenges compared to noise in the outputs of the models, primarily due to the inherent nonlinearity of most SciML algorithms. As a result, UQ for noisy inputs becomes a crucial factor for reliable and trustworthy deployment of these models in applications involving physical knowledge. To this end, we introduce a Bayesian approach to quantify uncertainty arising from noisy inputs-outputs in PINNs and NOs. 
We show that this approach can be seamlessly integrated into PINNs and NOs, when they are employed to encode the physical information. PINNs incorporate physics by including physics-informed terms via automatic differentiation, either in the loss function or the likelihood, and often take as input the spatial-temporal coordinate. Therefore, the present method equips PINNs with the capability to address problems where the observed coordinate is subject to noise. On the other hand, pretrained NOs are also commonly employed as equation-free surrogates in solving differential equations and Bayesian inverse problems, in which they take functions as inputs. The proposed approach enables them to handle noisy measurements for both input and output functions with UQ.	翻訳日:2023-11-21 21:23:45 公開日:2023-11-19
# 視覚言語モデルに対する対向的プロンプトチューニング Adversarial Prompt Tuning for Vision-Language Models ( http://arxiv.org/abs/2311.11261v1 ) ライセンス: Link先を確認	Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang	(参考訳) マルチモーダル学習の急速な進歩に伴い、CLIPのような事前学習された視覚言語モデル(VLM)は、視覚と言語の間のギャップを埋める際、顕著な能力を示した。しかし、これらのモデルは敵の攻撃、特に画像のモダリティに弱いままであり、かなりのセキュリティリスクが生じる。本稿では,VLMにおける画像エンコーダの対向性を高める手法であるAdvPT(Adversarial Prompt Tuning)を提案する。 AdvPTは、学習可能なテキストプロンプトを革新的に活用し、それを敵対的な画像埋め込みと整合させ、広範囲なパラメータトレーニングやモデルアーキテクチャの変更を必要とせずに、VLMに固有の脆弱性に対処する。我々は,AdvPTがホワイトボックス攻撃やブラックボックス攻撃に対する抵抗性を向上し,既存の画像処理による防御技術と組み合わせることで,防御能力をさらに向上することを示す。総合的な実験分析は、テキスト入力の修正を通じて、対向画像に対する抵抗を改善することに特化した新しいパラダイムである、対向プロンプトチューニングに関する洞察を与え、将来の堅牢なマルチモーダル学習研究への道を開く。これらの知見は、VLMの安全性を高める新たな可能性を開く。私たちのコードは論文の発行時に入手できます。 With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities. However, these models remain vulnerable to adversarial attacks, particularly in the image modality, presenting considerable security risks. This paper introduces Adversarial Prompt Tuning (AdvPT), a novel technique to enhance the adversarial robustness of image encoders in VLMs. AdvPT innovatively leverages learnable text prompts and aligns them with adversarial image embeddings, to address the vulnerabilities inherent in VLMs without the need for extensive parameter training or modification of the model architecture. We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques, further boosting defensive capabilities. 
Comprehensive experimental analyses provide insights into adversarial prompt tuning, a novel paradigm devoted to improving resistance to adversarial images through textual input modifications, paving the way for future robust multimodal learning research. These findings open up new possibilities for enhancing the security of VLMs. Our code will be available upon publication of the paper.	翻訳日:2023-11-21 21:23:17 公開日:2023-11-19
# Radarize:屋内環境のための大規模レーダーSLAM Radarize: Large-Scale Radar SLAM for Indoor Environments ( http://arxiv.org/abs/2311.11260v1 ) ライセンス: Link先を確認	Emerson Sie, Xinyu Wu, Heyu Guo, Deepak Vasisht	(参考訳) 我々は、低コストのコモディティ単一チップmmWaveレーダのみを使用する屋内環境のための自己完結SLAMパイプラインであるRadarizeを提案する。レーダネイティブアプローチでは,ドップラーシフトに基づくオドメトリなどの電波周波数特有の現象を利用して,性能を向上させる。本手法は,4つのキャンパス建物にまたがる146件の大規模トラジェクトリデータセットを用いて,約4680mの走行距離で評価した。以上の結果から,本手法はIMUやホイールオドメトリなどの追加センサを必要とせず,絶対軌道誤差 (ATE) で評価して,最先端のレーダベース手法をオドメトリで約5倍,エンドツーエンドSLAMで約8倍上回ることがわかった。 We present Radarize, a self-contained SLAM pipeline for indoor environments that uses only a low-cost commodity single-chip mmWave radar. Our radar-native approach leverages phenomena unique to radio frequencies, such as Doppler shift-based odometry, to improve performance. We evaluate our method on a large-scale dataset of 146 trajectories spanning 4 campus buildings, totaling approximately 4680m of travel distance. Our results show that our method outperforms state-of-the-art radar-based approaches by approximately 5x in terms of odometry and 8x in terms of end-to-end SLAM, as measured by absolute trajectory error (ATE), without the need for additional sensors such as IMUs or wheel odometry.	翻訳日:2023-11-21 21:22:57 公開日:2023-11-19
# 解釈可能かつ効率的な量子インスパイア機械学習のためのテンソルネットワーク Tensor networks for interpretable and efficient quantum-inspired machine learning ( http://arxiv.org/abs/2311.11258v1 ) ライセンス: Link先を確認	Shi-Ju Ran and Gang Su	(参考訳) 深層機械学習(ML)の現在のスキームと高い解釈可能性と効率を同時に獲得することは、重要な課題である。量子力学から派生したよく確立された数学的ツールであるテンソルネットワーク(TN)は、効率的な「ホワイトボックス」MLスキームを開発する上で、その独特な利点を示している。本稿では,TNベースのMLにおけるインスピレーションの進展について概説する。一方、TN MLの解釈性は、量子情報と多体物理学に基づく固い理論基盤に適合する。一方で、強力なTN表現や量子多体物理学で開発された高度な計算技術から高い効率を得られる。量子コンピュータの急速な発展に伴い、TNは量子ハードウェア上で実行可能な新しいスキームを思いつき、近い将来「量子人工知能」へと進むことが期待されている。 It is a critical challenge to simultaneously gain high interpretability and efficiency with the current schemes of deep machine learning (ML). Tensor network (TN), which is a well-established mathematical tool originating from quantum mechanics, has shown its unique advantages in developing efficient "white-box" ML schemes. Here, we give a brief review of the inspiring progress made in TN-based ML. On one hand, the interpretability of TN ML is underpinned by the solid theoretical foundation based on quantum information and many-body physics. On the other hand, high efficiency can be obtained from the powerful TN representations and the advanced computational techniques developed in quantum many-body physics. With the rapid development of quantum computers, TN is expected to conceive novel schemes runnable on quantum hardware, heading towards "quantum artificial intelligence" in the near future.	翻訳日:2023-11-21 21:22:39 公開日:2023-11-19
# 圧縮高次高調波の発生 Generation of squeezed high-order harmonics ( http://arxiv.org/abs/2311.11257v1 ) ライセンス: Link先を確認	Matan Even Tzur, Michael Birk, Alexey Gorlach, Ido Kaminer, Michael Krueger, and Oren Cohen	(参考訳) 何十年もの間、高調波発生(hhg)に関するほとんどの研究は、物質は量子ではなく、古典的物であると考えており、高調波の量子光学的性質は疑問視されている。ここでは高調波の量子的性質を探求する。任意の量子光状態によって駆動されるとき、高調波の量子状態の公式を導出し、実験的関連性の特定の場合を探索する。特に、適度に圧縮されたポンプの場合、HHGはコヒーレント光によって駆動され、圧縮された高調波が生じる。高調波スクイージングは、イオン化時間をポンプのスクイージング位相と同期させることで最適化される。この体制を超えて、ポンプのスクイーズが増加するにつれて、ハーモニクスは最初、圧縮された熱光子統計を取得し、相互作用系の半古典的非線形応答関数に強く依存する複雑な量子状態を占める。その結果、超短波長超短パルスを絞り込み、より一般的には、従来アクセスできなかったスペクトル範囲に量子周波数変換することで、超感度アト秒メトロロジーが可能となる。 For decades, most research on high harmonic generation (HHG) considered matter as quantum but light as classical, leaving the quantum-optical nature of the harmonics an open question. Here we explore the quantum properties of high harmonics. We derive a formula for the quantum state of the high harmonics, when driven by arbitrary quantum light states, and then explore specific cases of experimental relevance. Specifically, for a moderately squeezed pump, HHG driven by squeezed coherent light results in squeezed high harmonics. Harmonic squeezing is optimized by syncing ionization times with the pump's squeezing phase. Beyond this regime, as pump squeezing is increased, the harmonics initially acquire squeezed thermal photon statistics, and then occupy an intricate quantum state which strongly depends on the semi-classical nonlinear response function of the interacting system. Our results pave the way for the generation of squeezed extreme-ultraviolet ultrashort pulses, and, more generally, quantum frequency conversion into previously inaccessible spectral ranges, which may enable ultrasensitive attosecond metrology.	翻訳日:2023-11-21 21:22:25 公開日:2023-11-19
# BOIS: 相互接続システムのベイズ最適化 BOIS: Bayesian Optimization of Interconnected Systems ( http://arxiv.org/abs/2311.11254v1 ) ライセンス: Link先を確認	Leonardo D. González and Victor M. Zavala	(参考訳) ベイズ最適化(BO)は、高価なサンプルシステムのグローバル最適化に有効なパラダイムであることが証明されている。 boの主な利点の1つは、学習と探索のプロセスを導くのに利用できるモデルの不確かさを特徴付けるために、ガウス過程(gps)を使用することである。しかし、BOは通常システムをブラックボックスとして扱うため、構造的知識(物理学や疎結合など)を利用する能力は制限される。複合関数は$f(x, y(x))$であり、gp モデリングはパフォーマンス関数 $f$ から中間関数 $y$ にシフトされ、構造知識を利用するための道筋を提供する。しかし、BOフレームワークにおける合成関数の使用は、GPによって計算されるガウス密度$y$から$f$の確率密度を生成する必要性により複雑である(例えば、$f$が非線形であれば、閉形式式を得ることはできない)。従来の作業ではサンプリング技術を使ってこの問題に対処しており、実装が容易で柔軟性があるが、計算集約性が高い。本稿では,boにおける複合関数の効率的な利用を可能にする新しいパラダイムを提案する。このパラダイムでは,複合関数の統計モーメントに対する閉形式式を得るのに$f$の適応線形化を用いる。この単純なアプローチ(boisと呼ぶ)により、相互接続されたシステムや複数のgpモデルを埋め込んだシステム、物理モデルとgpモデルの組み合わせなど、構造的知識の活用が可能になる。化学プロセス最適化ケーススタディを用いて,BOISの標準BOとサンプリングアプローチの有効性をベンチマークした。その結果,boisは性能向上を達成し,複合関数の統計を正確に捉えることができた。 Bayesian optimization (BO) has proven to be an effective paradigm for the global optimization of expensive-to-sample systems. One of the main advantages of BO is its use of Gaussian processes (GPs) to characterize model uncertainty which can be leveraged to guide the learning and search process. However, BO typically treats systems as black-boxes and this limits the ability to exploit structural knowledge (e.g., physics and sparse interconnections). Composite functions of the form $f(x, y(x))$, wherein GP modeling is shifted from the performance function $f$ to an intermediate function $y$, offer an avenue for exploiting structural knowledge. However, the use of composite functions in a BO framework is complicated by the need to generate a probability density for $f$ from the Gaussian density of $y$ calculated by the GP (e.g., when $f$ is nonlinear it is not possible to obtain a closed-form expression). Previous work has handled this issue using sampling techniques; these are easy to implement and flexible but are computationally intensive.
In this work, we introduce a new paradigm which allows for the efficient use of composite functions in BO; this uses adaptive linearizations of $f$ to obtain closed-form expressions for the statistical moments of the composite function. We show that this simple approach (which we call BOIS) enables the exploitation of structural knowledge, such as that arising in interconnected systems as well as systems that embed multiple GP models and combinations of physics and GP models. Using a chemical process optimization case study, we benchmark the effectiveness of BOIS against standard BO and sampling approaches. Our results indicate that BOIS achieves performance gains and accurately captures the statistics of composite functions.	翻訳日:2023-11-21 21:22:05 公開日:2023-11-19
# 日本のサブメーターレベルの土地被覆マッピング Submeter-level Land Cover Mapping of Japan ( http://arxiv.org/abs/2311.11252v1 ) ライセンス: Link先を確認	Naoto Yokoya, Junshi Xia, Clifford Broni-Bediako	(参考訳) ディープラーニングは、サブメーターレベルのマッピングタスクにおいて有望なパフォーマンスを示しているが、特に大規模に適用する場合、サブメーターレベルの画像のアノテーションコストは依然として課題である。本稿では,8クラスからなる日本初のサブメーターレベルの土地被覆マッピングを,比較的低いアノテーションコストで提示する。最近導入されたグローバルサブメーターレベルの土地被覆マッピングのベンチマークデータセットであるOpenEarthMapを活用し,少量の追加ラベル付きデータで全国規模のマッピングを実現するU-Netモデルを用いたヒューマン・イン・ザ・ループ型のディープラーニングフレームワークを導入する。 OpenEarthMapでトレーニングされたU-Netモデルが明らかに失敗した地域のラベル付きデータを少量追加してモデルを再トレーニングすることで、80%の全体精度が達成され、これは再トレーニングによる約16ポイントの改善である。国土地理院(Geospatial Information Authority of Japan)が提供する航空画像を用いて,全国8クラスの土地被覆分類地図を作成する。提案手法は, アノテーションコストの低減と高精度マッピングの結果から, サブメータレベルの光リモートセンシングデータを用いた全国規模の土地被覆マッピングの自動更新に寄与する可能性を実証する。地図の結果は公開される予定だ。 Deep learning has shown promising performance in submeter-level mapping tasks; however, the annotation cost of submeter-level imagery remains a challenge, especially when applied on a large scale. In this paper, we present the first submeter-level land cover mapping of Japan with eight classes, at a relatively low annotation cost. We introduce a human-in-the-loop deep learning framework leveraging OpenEarthMap, a recently introduced benchmark dataset for global submeter-level land cover mapping, with a U-Net model that achieves national-scale mapping with a small amount of additional labeled data. By adding a small amount of labeled data of areas or regions where a U-Net model trained on OpenEarthMap clearly failed and retraining the model, an overall accuracy of 80% was achieved, which is a nearly 16 percentage point improvement after retraining. Using aerial imagery provided by the Geospatial Information Authority of Japan, we create land cover classification maps of eight classes for the entire country of Japan. Our framework, with its low annotation cost and high-accuracy mapping results, demonstrates the potential to contribute to the automatic updating of national-scale land cover mapping using submeter-level optical remote sensing data.
The mapping results will be made publicly available.	翻訳日:2023-11-21 21:21:34 公開日:2023-11-19
# 非エルミートイジング鎖における非従来型の多体相転移 Unconventional many-body phase transitions in a non-Hermitian Ising chain ( http://arxiv.org/abs/2311.11251v1 ) ライセンス: Link先を確認	Chao-Ze Lu, Xiaolong Deng, Su-Peng Kou and Gaoyong Sun	(参考訳) 虚数磁場を伴う1次元強磁性横磁場イジングモデルにおける多体相転移について検討し、2次相転移と2つの$\mathcal{PT}$相転移の3つの相転移を示すことを示す。基底状態における2次相転移は, 双直交および自己正規エンタングルメントエントロピーを用いて検討し, 有限サイズスケーリング理論を用いて小系の中心電荷を抽出する手法を開発した。2次相転移と比較して、第1の$\mathcal{PT}$遷移は全エネルギースペクトルにおける例外点の出現によって特徴づけられるが、第2の$\mathcal{PT}$遷移は特定の励起状態においてのみ発生する。さらに, エネルギーの虚部のスケーリングの観点から, 二つの例外点が2次であることが興味深い。この研究は、非エルミート系における非従来型の多体相転移の厳密解を与える。 We study many-body phase transitions in a one-dimensional ferromagnetic transverse-field Ising model with an imaginary field and show that the system exhibits three phase transitions: one second-order phase transition and two $\mathcal{PT}$ phase transitions. The second-order phase transition occurring in the ground state is investigated via biorthogonal and self-normal entanglement entropy, for which we develop an approach to perform finite-size scaling theory to extract the central charge for small systems. Compared with the second-order phase transition, the first $\mathcal{PT}$ transition is characterized by the appearance of an exceptional point in the full energy spectrum, while the second $\mathcal{PT}$ transition only occurs in specific excited states. Furthermore, we interestingly show that both exceptional points are second-order in terms of the scaling of the imaginary parts of the energy. This work provides an exact solution for unconventional many-body phase transitions in non-Hermitian systems.	翻訳日:2023-11-21 21:20:57 公開日:2023-11-19
# 感情分析に関する総合的レビュー:タスク・アプローチ・応用 A Comprehensive Review on Sentiment Analysis: Tasks, Approaches and Applications ( http://arxiv.org/abs/2311.11250v1 ) ライセンス: Link先を確認	Sudhanshu Kumar (1), Partha Pratim Roy (1), Debi Prosad Dogra (2), Byung-Gyu Kim (3) ((1) Department of Computer Science and Engineering, IIT Roorkee, India, (2) School of Electrical Sciences, IIT Bhubaneswar, Odisha, India, (3) Department of IT Engineering, Sookmyung Women's University, Seoul, South Korea)	(参考訳) 感情分析(SA)はテキストマイニングにおける新たな分野である。異なるソーシャルメディアプラットフォーム上でテキストで表現された意見を計算的に識別し分類するプロセスである。ソーシャルメディアは、製品、サービス、そして最新の市場トレンドに対する顧客のマインドセットを知る上で重要な役割を果たす。ほとんどの組織は、提供された製品やサービスをアップグレードするための顧客の反応とフィードバックに依存しています。 SA（意見マイニング）は様々な分野にとって有望な研究分野であると思われる。インターネット上の構造化および非構造化フォーマットで毎日発生するビッグデータを分析する上で重要な役割を果たす。本研究は,音声,画像,映像,テキストなど様々な分野における感情と最近の研究・開発について述べる。感情分析の課題と機会についても論文で論じている。キーワード: Sentiment Analysis, Machine Learning, Lexicon-based approach, Deep Learning, Natural Language Processing Sentiment analysis (SA) is an emerging field in text mining. It is the process of computationally identifying and categorizing opinions expressed in a piece of text over different social media platforms. Social media plays an essential role in knowing the customer mindset towards a product, services, and the latest market trends. Most organizations depend on the customer's response and feedback to upgrade their offered products and services. SA or opinion mining seems to be a promising research area for various domains. It plays a vital role in analyzing big data generated daily in structured and unstructured formats over the internet. This survey paper defines sentiment and its recent research and development in different domains, including voice, images, videos, and text. The challenges and opportunities of sentiment analysis are also discussed in the paper. Keywords: Sentiment Analysis, Machine Learning, Lexicon-based approach, Deep Learning, Natural Language Processing	翻訳日:2023-11-21 21:20:29 公開日:2023-11-19
# IoT侵入検知のためのOpen Set Dandelion Network Open Set Dandelion Network for IoT Intrusion Detection ( http://arxiv.org/abs/2311.11249v1 ) ライセンス: Link先を確認	Jiashu Wu, Hao Dai, Kenneth B. Kent, Jerome Yen, Chengzhong Xu, Yang Wang	(参考訳) IoTデバイスが広く普及するにつれて、悪意のある侵入から保護することが不可欠である。しかし、IoTのデータ不足は、データへの依存度が高い従来の侵入検知手法の適用性を制限している。そこで本稿では,教師なし異種ドメイン適応に基づくオープンセット型のOpen-Set Dandelion Network(OSDN)を提案する。 OSDNモデルは、知識の豊富なソースネットワーク侵入ドメインから侵入知識を転移し、データの乏しいターゲットIoT侵入ドメインにおけるより正確な侵入検知を可能にする。オープンセット設定の下では、ソースドメインで観測されていない新たなターゲットドメイン侵入を検出することもできる。これを実現するために、OSDNモデルは、ソースドメインを、各侵入カテゴリがコンパクトにまとまり、異なる侵入カテゴリが分離された、タンポポのような特徴空間に形成する。すなわち、カテゴリ間分離性とカテゴリ内コンパクト性を同時に重視する。次に、タンポポに基づくターゲットメンバシップ機構がターゲットタンポポを形成する。そして、タンポポ角分離機構によりカテゴリ間分離性が向上し、タンポポ埋め込みアライメント機構が両タンポポをさらに細かく整列させる。カテゴリ内コンパクト性を促進するために、判別的サンプリングタンポポ機構を用いる。既知の侵入知識と生成した未知の侵入知識の両方を用いて訓練された侵入分類器の支援により、セマンティックタンポポ補正機構は、混同しやすいカテゴリを強調し、カテゴリ間分離性を改善する。総合的に、これらの機構は、IoT侵入検知のために侵入知識の転移を効果的に行うOSDNモデルを形成する。複数の侵入データセットに関する包括的な実験によりOSDNモデルの有効性が検証され、3つの最先端ベースライン手法を16.9%上回った。 As IoT devices become widespread, it is crucial to protect them from malicious intrusions. However, the data scarcity of IoT limits the applicability of traditional intrusion detection methods, which are highly data-dependent. To address this, in this paper we propose the Open-Set Dandelion Network (OSDN) based on unsupervised heterogeneous domain adaptation in an open-set manner. The OSDN model performs intrusion knowledge transfer from the knowledge-rich source network intrusion domain to facilitate more accurate intrusion detection for the data-scarce target IoT intrusion domain. Under the open-set setting, it can also detect newly-emerged target domain intrusions that are not observed in the source domain. 
To achieve this, the OSDN model forms the source domain into a dandelion-like feature space in which each intrusion category is compactly grouped and different intrusion categories are separated, i.e., simultaneously emphasising inter-category separability and intra-category compactness. The dandelion-based target membership mechanism then forms the target dandelion. Then, the dandelion angular separation mechanism achieves better inter-category separability, and the dandelion embedding alignment mechanism further aligns both dandelions in a finer manner. To promote intra-category compactness, the discriminating sampled dandelion mechanism is used. Assisted by the intrusion classifier trained using both known and generated unknown intrusion knowledge, a semantic dandelion correction mechanism emphasises easily-confused categories and guides better inter-category separability. Holistically, these mechanisms form the OSDN model that effectively performs intrusion knowledge transfer to benefit IoT intrusion detection. Comprehensive experiments on several intrusion datasets verify the effectiveness of the OSDN model, outperforming three state-of-the-art baseline methods by 16.9%.	翻訳日:2023-11-21 21:20:08 公開日:2023-11-19
# AutoStory:最小限の人手による多様なストーリーテリング画像の生成 AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort ( http://arxiv.org/abs/2311.11243v1 ) ライセンス: Link先を確認	Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen	(参考訳) ストーリービジュアライゼーションは、テキストで記述されたストーリーに合致する一連の画像を生成することを目的としており、生成した画像には高品質であること、テキスト記述と整合すること、キャラクターのアイデンティティが一貫していることが求められる。ストーリービジュアライゼーションの複雑さゆえに、既存の手法は、少数の特定のキャラクターやシナリオだけを考慮するか、スケッチのような画像ごとの制御条件をユーザに要求することで、問題を大幅に単純化している。しかし、これらの単純化のために、これらの手法は実際のアプリケーションには適さない。そこで本研究では,最小限の人間の関与で,多様で高品質かつ一貫したストーリー画像群を効果的に生成できる自動ストーリービジュアライゼーションシステムを提案する。具体的には,大規模言語モデルの理解能力と計画能力をレイアウト計画に活用し,次に大規模なテキストから画像へのモデルを用いて,レイアウトに基づく精巧なストーリー画像を生成する。バウンディングボックスなどの疎な制御条件はレイアウト計画に適しており,スケッチやキーポイントなどの密な制御条件は高品質な画像コンテンツの生成に適していることを経験的に見出した。両者の利点を得るために,単純なバウンディングボックスレイアウトを最終的な画像生成のためのスケッチまたはキーポイント制御条件に変換する密条件生成モジュールを考案した。これは画質を向上させるだけでなく,簡単で直感的なユーザインタラクションも可能にする。また,多視点で一貫したキャラクター画像を生成する簡易かつ効果的な手法を提案し,キャラクター画像の収集や描画における人手への依存をなくす。 Story visualization aims to generate a series of images that match the story described in texts, and it requires the generated images to satisfy high quality, alignment with the text description, and consistency in character identities. Given the complexity of story visualization, existing methods drastically simplify the problem by considering only a few specific characters and scenarios, or requiring the users to provide per-image control conditions such as sketches. However, these simplifications render these methods inadequate for real applications. To this end, we propose an automated story visualization system that can effectively generate diverse, high-quality, and consistent sets of story images, with minimal human interaction. Specifically, we utilize the comprehension and planning capabilities of large language models for layout planning, and then leverage large-scale text-to-image models to generate sophisticated story images based on the layout. 
We empirically find that sparse control conditions, such as bounding boxes, are suitable for layout planning, while dense control conditions, e.g., sketches and keypoints, are suitable for generating high-quality image content. To obtain the best of both worlds, we devise a dense condition generation module to transform simple bounding box layouts into sketch or keypoint control conditions for final image generation, which not only improves the image quality but also allows easy and intuitive user interactions. In addition, we propose a simple yet effective method to generate multi-view consistent character images, eliminating the reliance on human labor to collect or draw character images.	翻訳日:2023-11-21 21:19:17 公開日:2023-11-19
# 断熱強電場近似に基づく分子のコヒーレントなイオン化後ダイナミクス Coherent postionization dynamics of molecules based on adiabatic strong-field approximation ( http://arxiv.org/abs/2311.11242v1 ) ライセンス: Link先を確認	Shan Xue, Wenli Yang, Ping Li, Yuxuan Zhang, Pengji Ding, Song-Feng Zhao, Hongchuan Du and Anh-Thu Le	(参考訳) 開放系密度行列法では、強いレーザー場におけるイオン化後ダイナミクスを調べるために、通常は非コヒーレントなポピュレーション注入が用いられる。コヒーレンス注入の存在は長い間議論の対象となっている。この文脈で、断熱強電場近似(ASFA)に基づくコヒーレンス注入モデルを導入する。このモデルは、方向性のあるトンネルイオン化に起因するイオンのコヒーレンスを効果的に予測する。電場強度の増大に伴い、ASFAモデルにより予測されるコヒーレンス度はSFAモデルのものから徐々に逸脱するが、単純モデルおよび部分波展開モデルの結果に比べるとはるかに穏やかである。イオン化後の分子ダイナミクスに及ぼすコヒーレンス注入の影響をO$_2$とN$_2$で検討した。イオン化誘起の振動コヒーレンスは, N$_2^+$ における $X^2 \Sigma _g^+ -B^2 \Sigma _u^+ $ の反転分布と O$_2^+$ の解離確率を強く増大させることがわかった。逆に、イオン化誘起の振電コヒーレンスは関連する遷移を抑制する効果を持つ。これらの結果から,強電場イオン化後の分子ダイナミクスのシミュレーションにおいて、振電状態分解コヒーレンス注入を含めることの重要性が明らかになった。 Open-system density matrix methods typically employ incoherent population injection to investigate the postionization dynamics in strong laser fields. The presence of coherence injection has long been a subject of debate. In this context, we introduce a coherence injection model based on the adiabatic strong-field approximation (ASFA). This model effectively predicts ionic coherence resulting from directional tunnel ionization. With increasing field strength, the degree of coherence predicted by the ASFA model gradually deviates from that of the SFA model but remains much milder compared to the results of the simple and partial-wave expansion models. The impact of coherence injection on the postionization molecular dynamics is explored in O$_2$ and N$_2$. We find that the ionization-induced vibrational coherence strongly enhances the population inversion of $X^2 \Sigma _g^+ -B^2 \Sigma _u^+ $ in N$_2^+$ and the dissociation probability of O$_2^+$. Conversely, the ionization-induced vibronic coherences have inhibitory effects on the related transitions. 
These findings reveal the significance of including the vibronic-state-resolved coherence injection in simulating molecular dynamics following strong-field ionization.	翻訳日:2023-11-21 21:18:46 公開日:2023-11-19
# UMAAF:画像の多面的属性による美学の解明 UMAAF: Unveiling Aesthetics via Multifarious Attributes of Images ( http://arxiv.org/abs/2311.11306v1 ) ライセンス: Link先を確認	Weijie Li, Yitian Wan, Xingjiao Wu, Junjie Xu, Liang He	(参考訳) スマートフォンやウェブサイトの普及に伴い、画像美的評価(IAA)はますます重要になっている。 IAAにおける属性の重要性は広く認識されているが、属性に基づく手法の多くは美的属性の選択と利用について十分に考慮していない。我々はまず、イントラ視点とインター視点の両方から美的属性を取得する。イントラ視点では、画像の直接的な視覚属性を抽出し、絶対属性を構成する。インター視点では、同じシーケンス内の画像間の相対スコア関係をモデル化し、相対属性を形成することに重点を置く。次に、美的評価において画像属性をよりよく活用するために、画像の絶対属性と相対属性の両方をモデル化する統一多属性美的評価フレームワーク(UMAAF)を提案する。絶対属性に対しては、複数の絶対属性認識モジュールと絶対属性相互作用ネットワークを利用する。絶対属性認識モジュールは、まずいくつかの絶対属性学習タスクで事前訓練され、その後、対応する絶対属性の特徴を抽出するために使用される。絶対属性相互作用ネットワークは、多様な絶対属性特徴の重みを適応的に学習し、それらを様々な絶対属性の視点から汎用的な美的特徴と効果的に統合して、美的予測を生成する。画像の相対属性をモデル化するために、画像間の相対的ランク付けと相対的距離の関係を相対関係損失関数(Relative-Relation Loss)で考慮し、UMAAFのロバスト性を高める。さらに、UMAAFはTAD66KとAVAデータセットで最先端の性能を達成し、複数の実験で各モジュールの有効性とモデルの人間の嗜好との整合性を実証した。 With the increasing prevalence of smartphones and websites, Image Aesthetic Assessment (IAA) has become increasingly crucial. While the significance of attributes in IAA is widely recognized, many attribute-based methods lack consideration for the selection and utilization of aesthetic attributes. Our initial step involves the acquisition of aesthetic attributes from both intra- and inter-perspectives. Within the intra-perspective, we extract the direct visual attributes of images, constituting the absolute attribute. In the inter-perspective, our focus lies in modeling the relative score relationships between images within the same sequence, forming the relative attribute. Then, to better utilize image attributes in aesthetic assessment, we propose the Unified Multi-attribute Aesthetic Assessment Framework (UMAAF) to model both absolute and relative attributes of images. For absolute attributes, we leverage multiple absolute-attribute perception modules and an absolute-attribute interacting network. 
The absolute-attribute perception modules are first pre-trained on several absolute-attribute learning tasks and then used to extract corresponding absolute attribute features. The absolute-attribute interacting network adaptively learns the weight of diverse absolute-attribute features, effectively integrating them with generic aesthetic features from various absolute-attribute perspectives and generating the aesthetic prediction. To model the relative attribute of images, we consider the relative ranking and relative distance relationships between images in a Relative-Relation Loss function, which boosts the robustness of the UMAAF. Furthermore, UMAAF achieves state-of-the-art performance on TAD66K and AVA datasets, and multiple experiments demonstrate the effectiveness of each module and the model's alignment with human preference.	翻訳日:2023-11-21 21:11:30 公開日:2023-11-19
# 大きな学習率によって一般化が改善される:しかし、どのくらい大きければよいのか? Large Learning Rates Improve Generalization: But How Large Are We Talking About? ( http://arxiv.org/abs/2311.11303v1 ) ライセンス: Link先を確認	Ekaterina Lobacheva, Eduard Pockonechnyy, Maxim Kodryan, Dmitry Vetrov	(参考訳) ニューラルネットワークの訓練を大きな学習率(LR)で始めることが最良の一般化につながるとする最近の研究に着想を得て、この仮説を詳細に検討する。本研究は、その後に小さなLRまたは重み平均化を用いて訓練する場合に最適な結果を与える初期LRの範囲を明らかにする。これらの範囲は、一般に想定されているよりも実際にはかなり狭い。主実験は学習率ハイパーパラメータを正確に制御できる簡略化された設定で行い、重要な知見をより実用的な設定で検証する。 Inspired by recent research that recommends starting neural network training with large learning rates (LRs) to achieve the best generalization, we explore this hypothesis in detail. Our study clarifies the initial LR ranges that provide optimal results for subsequent training with a small LR or weight averaging. We find that these ranges are in fact significantly narrower than generally assumed. We conduct our main experiments in a simplified setup that allows precise control of the learning rate hyperparameter and validate our key findings in a more practical setting.	翻訳日:2023-11-21 21:11:04 公開日:2023-11-19
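The recipe examined in the abstract above (a large initial learning rate, followed by either a small learning rate or weight averaging) can be sketched as follows. This is only a minimal illustration and not the authors' experimental code: the function names, the switch step, and all numeric values are assumptions chosen for the example.

```python
def two_phase_lr(step, switch_step, lr_large, lr_small):
    """Return the large LR before switch_step and the small LR from switch_step on."""
    return lr_large if step < switch_step else lr_small


def average_weights(snapshots):
    """Element-wise mean of several weight vectors (a simple form of weight averaging)."""
    if not snapshots:
        raise ValueError("need at least one weight snapshot")
    n = len(snapshots)
    return [sum(ws) / n for ws in zip(*snapshots)]


# Hypothetical schedule: three steps at LR 0.5, then LR 0.01.
schedule = [two_phase_lr(s, switch_step=3, lr_large=0.5, lr_small=0.01) for s in range(5)]

# Hypothetical snapshots of a two-parameter model taken during the large-LR phase.
averaged = average_weights([[1.0, 2.0], [3.0, 4.0]])
```

In a study like the one described, the quantity being swept would be `lr_large`; everything else would stay fixed so that the effect of the initial LR range can be isolated.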
# Exchanging Dual Encoder-Decoder:意味的ガイダンスと空間的ローカライゼーションによる変化検出のための新しい戦略 Exchanging Dual Encoder-Decoder: A New Strategy for Change Detection with Semantic Guidance and Spatial Localization ( http://arxiv.org/abs/2311.11302v1 ) ライセンス: Link先を確認	Sijie Zhao, Xueliang Zhang, Pengfeng Xiao, and Guangjun He	(参考訳) 変化検出は地球観測アプリケーションにおける重要な課題である。近年,ディープラーニングに基づく手法が有望な性能を示し,変化検出に急速に採用されている。しかし、広く使われているマルチエンコーダ・シングルデコーダ(MESD)およびデュアルエンコーダ・デコーダ(DED)アーキテクチャは、依然として変化検出を効果的に扱えていない。前者は特徴レベル融合において二時期特徴の干渉という問題があり、後者はクラス内変化検出やマルチビュー建物変化検出には適用できない。これらの問題を解決するために,意味的ガイダンスと空間的ローカライゼーションを用いた二値変化検出のための、交換型デュアルエンコーダ・デコーダ構造による新しい戦略を提案する。提案戦略は,二時期特徴を決定レベルで融合することでMESDにおける二時期特徴の干渉の問題を解決し,二時期の意味的特徴を用いて変化領域を決定することでDEDの適用不能の問題を解決する。この戦略に基づいて二値変化検出モデルを構築し、クラス内変化検出データセット(CDD, SYSU)、シングルビュー建物変化検出データセット(WHU, LEVIR-CD, LEVIR-CD+)、マルチビュー建物変化検出データセット(NJDS)の3つのシナリオ・6つのデータセットにおいて、18の最先端変化検出手法と比較・検証する。実験の結果,提案モデルは高い効率で優れた性能を達成し,CDD,SYSU,WHU,LEVIR-CD,LEVIR-CD+,NJDSの各データセットでそれぞれ97.77%,83.07%,94.86%,92.33%,91.39%,74.35%のF1スコアを得て,すべてのベンチマーク手法を上回った。この研究のコードは https://github.com/NJU-LHRS/official-SGSLN で公開される予定である。 Change detection is a critical task in earth observation applications. Recently, deep learning-based methods have shown promising performance and have been quickly adopted in change detection. However, the widely used multiple encoder and single decoder (MESD) as well as dual encoder-decoder (DED) architectures still struggle to handle change detection effectively. The former has problems of bitemporal feature interference in the feature-level fusion, while the latter is inapplicable to intraclass change detection and multiview building change detection. To solve these problems, we propose a new strategy with an exchanging dual encoder-decoder structure for binary change detection with semantic guidance and spatial localization. 
The proposed strategy solves the problems of bitemporal feature interference in MESD by fusing bitemporal features in the decision level and the inapplicability in DED by determining changed areas using bitemporal semantic features. We build a binary change detection model based on this strategy, and then validate and compare it with 18 state-of-the-art change detection methods on six datasets in three scenarios, including intraclass change detection datasets (CDD, SYSU), single-view building change detection datasets (WHU, LEVIR-CD, LEVIR-CD+) and a multiview building change detection dataset (NJDS). The experimental results demonstrate that our model achieves superior performance with high efficiency and outperforms all benchmark methods with F1-scores of 97.77%, 83.07%, 94.86%, 92.33%, 91.39%, 74.35% on CDD, SYSU, WHU, LEVIR-CD, LEVIR-CD+, and NJDS datasets, respectively. The code of this work will be available at https://github.com/NJU-LHRS/official-SGSLN.	翻訳日:2023-11-21 21:10:54 公開日:2023-11-19
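The results above are reported as F1-scores. As a reminder of what that metric measures for binary change detection, here is a minimal sketch; the function name and the pixel counts are illustrative assumptions and are not taken from the paper or its code.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.

    tp: changed pixels correctly predicted as changed
    fp: unchanged pixels wrongly predicted as changed
    fn: changed pixels that were missed
    """
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
example_f1 = f1_score(tp=8, fp=2, fn=2)
```

Unchanged pixels that are correctly left unmarked (true negatives) do not enter the formula, which is why F1 is commonly preferred over plain accuracy when changed regions are rare.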
# CHAMP: クラスタ階層の効率的なアノテーションと統合 CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies ( http://arxiv.org/abs/2311.11301v1 ) ライセンス: Link先を確認	Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan	(参考訳) 様々なNLPタスクは、各ノードがアイテムのクラスタであるような、ノード上の複雑な階層構造を必要とする。例としては、含意グラフの生成、階層的な文書間共参照解決、イベントとサブイベントの関係のアノテーションなどがある。このような階層構造の効率的なアノテーションを可能にするため、任意のタイプのテキストに対してクラスタと階層を同時に、かつ漸進的に構築できるオープンソースツールCHAMPを公開する。この漸進的なアプローチは、一般的なペアワイズアノテーションに比べてアノテーション時間を大幅に削減するとともに、クラスタレベルおよび階層レベルでの推移性の維持を保証する。さらに、CHAMPには統合モードがあり、裁定者が複数のクラスタ階層アノテーションを容易に比較し、不一致を解消できる。 Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, annotating event and subevent relations, etc. To enable efficient annotation of such hierarchical structures, we release CHAMP, an open source tool that allows annotators to incrementally construct both clusters and a hierarchy simultaneously over any type of text. This incremental approach significantly reduces annotation time compared to the common pairwise annotation approach and also guarantees maintaining transitivity at the cluster and hierarchy levels. Furthermore, CHAMP includes a consolidation mode, where an adjudicator can easily compare multiple cluster hierarchy annotations and resolve disagreements.	翻訳日:2023-11-21 21:10:17 公開日:2023-11-19
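To make the idea of a hierarchy whose nodes are clusters more concrete, the sketch below builds such a structure incrementally and derives ancestor relations by following parent links, so transitivity holds by construction. It is a hypothetical illustration only: the class, its methods, and the example data are assumptions and do not describe CHAMP's actual implementation or interface.

```python
class ClusterHierarchy:
    """A tree whose nodes are clusters (sets of items), built one step at a time."""

    def __init__(self):
        self.members = {}  # cluster id -> set of items in that cluster
        self.parent = {}   # cluster id -> parent cluster id, or None for a root

    def add_cluster(self, cid, items, parent=None):
        """Create a new cluster; the parent must already exist, so no cycle can form."""
        if cid in self.members:
            raise ValueError(f"cluster {cid!r} already exists")
        if parent is not None and parent not in self.members:
            raise KeyError(f"unknown parent cluster {parent!r}")
        self.members[cid] = set(items)
        self.parent[cid] = parent

    def add_item(self, cid, item):
        """Assign one more item to an existing cluster."""
        self.members[cid].add(item)

    def ancestors(self, cid):
        """All clusters above cid, nearest first (the transitive closure of parent links)."""
        found = []
        node = self.parent[cid]
        while node is not None:
            found.append(node)
            node = self.parent[node]
        return found


# Hypothetical event / sub-event annotation.
h = ClusterHierarchy()
h.add_cluster("trip", {"the journey"})
h.add_cluster("flight", {"flew to Rome"}, parent="trip")
h.add_cluster("boarding", {"boarded"}, parent="flight")
h.add_item("boarding", "got on the plane")
```

Because every item is placed into a cluster once, rather than judged against every other item, the number of decisions grows with the number of items instead of the number of item pairs, which is the intuition behind the time savings over pairwise annotation.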
# カテゴリから分類器へ: Web 探索による名前のみの継続学習 From Categories to Classifier: Name-Only Continual Learning by Exploring the Web ( http://arxiv.org/abs/2311.11293v1 ) ライセンス: Link先を確認	Ameya Prabhu, Hasan Abed Al Kader Hammoud, Ser-Nam Lim, Bernard Ghanem, Philip H.S. Torr, Adel Bibi	(参考訳) 継続学習(CL)は、大規模な注釈付きデータセットが利用可能であることに依存することが多いが、この仮定は実際には非現実的なほど時間とコストがかかる。我々は、時間とコストの制約により手動アノテーションが不可能な、名前のみ継続学習(name-only continual learning)と呼ばれる新しいパラダイムを探求する。このシナリオでは、学習者は注釈付き訓練データなしに、カテゴリ名のみを使って新しいカテゴリシフトに適応する。提案手法は、広大で進化し続けるインターネットを利用して、画像分類のための未整理のWeb教師ありデータを検索・ダウンロードする。我々はWebデータの信頼性を調査し、それらが手動で注釈付けされたデータセットと同等であり、場合によっては優れていることを見出した。さらに、Webを活用することで、生成モデルやLAION-5Bからの画像検索を用いてサポートセットを作成する最先端の名前のみ分類手法を上回るサポートセットを作成でき、最大25%の精度向上を達成できることを示す。様々な継続学習の文脈に適用すると、本手法は手動で注釈付けされたデータセットで訓練したモデルと比較して、一貫して小さな性能差を示す。 EvoTrendsは、現実世界のトレンドを捉えるためにWebから作られたクラスインクリメンタルなデータセットであり、わずか数分で作成された。全体として、本論文は、継続学習における手動データラベリングに伴う課題を軽減するために、未整理のWeb教師ありデータを使用することの可能性を強調している。 Continual Learning (CL) often relies on the availability of extensive annotated datasets, an assumption that is unrealistically time-consuming and costly in practice. We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation. In this scenario, learners adapt to new category shifts using only category names without the luxury of annotated training data. Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification. We investigate the reliability of our web data and find them comparable, and in some cases superior, to manually annotated datasets. Additionally, we show that by harnessing the web, we can create support sets that surpass state-of-the-art name-only classification approaches that create support sets using generative models or image retrieval from LAION-5B, achieving up to a 25% boost in accuracy. 
When applied across varied continual learning contexts, our method consistently exhibits a small performance gap in comparison to models trained on manually annotated datasets. We present EvoTrends, a class-incremental dataset made from the web to capture real-world trends, created in just minutes. Overall, this paper underscores the potential of using uncurated webly-supervised data to mitigate the challenges associated with manual data labeling in continual learning.	翻訳日:2023-11-21 21:10:07 公開日:2023-11-19
# 映像予測のための空間マスキングによるペアワイズ層注意 Pair-wise Layer Attention with Spatial Masking for Video Prediction ( http://arxiv.org/abs/2311.11289v1 ) ライセンス: Link先を確認	Ping Li, Chenhan Zhang, Zheng Yang, Xianghua Xu, Mingli Song	(参考訳) 映像予測は、過去のフレームを用いて将来のフレームを生成するもので、気象予測や自動運転など多くの応用で大きな可能性を示している。従来の研究は、最終的な高レベルの意味的特徴を、テクスチャの詳細を欠いたまま将来のフレームにデコードすることが多く、予測品質を低下させていた。そこで我々は、低レベルの視覚的手がかりと高レベルの特徴を結合することにより、TranslatorのU字型構造から得られる特徴マップの層ごとの意味的依存性を高めるペアワイズ層注意(PLA)モジュールを開発した。これにより、予測フレームのテクスチャの詳細が豊かになる。さらに、既存手法の多くはTranslatorによって時空間ダイナミクスを捉えるが、Encoderの空間的特徴を十分に活用できていない。このことから、事前学習中にエンコード特徴の一部をマスクする空間マスキング(SM)モジュールを設計し、Decoderから見た残りの特徴ピクセルの可視性を高める。以上を踏まえ、動きの傾向を反映する時空間ダイナミクスを捉える、映像予測のための空間マスキングを伴うペアワイズ層注意(PLA-SM)フレームワークを提案する。 5つのベンチマークでの広範な実験と厳密なアブレーション研究により、提案手法の利点が示された。コードはGitHubで入手できる。 Video prediction yields future frames by employing the historical frames and has exhibited its great potential in many applications, e.g., meteorological prediction, and autonomous driving. Previous works often decode the ultimate high-level semantic features to future frames without texture details, which deteriorates the prediction quality. Motivated by this, we develop a Pair-wise Layer Attention (PLA) module to enhance the layer-wise semantic dependency of the feature maps derived from the U-shape structure in Translator, by coupling low-level visual cues and high-level features. Hence, the texture details of predicted frames are enriched. Moreover, most existing methods capture the spatiotemporal dynamics by Translator, but fail to sufficiently utilize the spatial features of Encoder. This inspires us to design a Spatial Masking (SM) module to mask partial encoding features during pretraining, which increases the visibility of the remaining feature pixels to the Decoder. To this end, we present a Pair-wise Layer Attention with Spatial Masking (PLA-SM) framework for video prediction to capture the spatiotemporal dynamics, which reflect the motion trend. 
Extensive experiments and rigorous ablation studies on five benchmarks demonstrate the advantages of the proposed approach. The code is available at GitHub.	翻訳日:2023-11-21 21:09:42 公開日:2023-11-19
# パレート前線の向こうに何がある? 多目的最適化のための意思決定支援手法の検討 What Lies beyond the Pareto Front? A Survey on Decision-Support Methods for Multi-Objective Optimization ( http://arxiv.org/abs/2311.11288v1 ) ライセンス: Link先を確認	Zuzanna Osika, Jazmin Zatarain Salazar, Diederik M. Roijers, Frans A. Oliehoek and Pradeep K. Murukannaiah	(参考訳) 本稿では,多目的最適化(MOO)アルゴリズムが生み出す解を探索するための意思決定支援手法を統一するレビューを行う。多様な問題を解決するためにMOOを適用するため、MOOアルゴリズムが提供するトレードオフを解析するためのアプローチがフィールドに分散している。本稿では,可視化,解集合のマイニング,不確実性探索,および対話性,説明可能性,倫理といった新たな研究方向を含む,このトピックの進歩の概要について述べる。これらの手法を様々な研究分野から合成し,アプリケーションとは無関係に統一的なアプローチを構築する。本研究の目的は,MOOアルゴリズムの利用に対する研究者や実践者の参入障壁を小さくし,新たな研究指針を提供することである。 We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms. As MOO is applied to solve diverse problems, approaches for analyzing the trade-offs offered by MOO algorithms are scattered across fields. We provide an overview of the advances on this topic, including methods for visualization, mining the solution set, and uncertainty exploration as well as emerging research directions, including interactivity, explainability, and ethics. We synthesize these methods drawing from different fields of research to build a unified approach, independent of the application. Our goals are to reduce the entry barrier for researchers and practitioners on using MOO algorithms and to provide novel research directions.	翻訳日:2023-11-21 21:09:17 公開日:2023-11-19
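The decision-support methods surveyed above all start from the set of non-dominated solutions that a MOO algorithm returns. As a small reminder of what that set is, the sketch below filters a list of candidates down to its Pareto front. It assumes every objective is to be minimized, and the function names and example points are illustrative, not taken from the survey.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly better in one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (simple O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective candidates, e.g. (cost, error).
candidates = [(1, 5), (2, 2), (3, 4), (4, 1), (5, 5)]
front = pareto_front(candidates)
```

The three surviving points are mutually incomparable: each is better than the others in one objective and worse in another. Choosing among them is exactly the trade-off question that visualization, solution-set mining, and interactive methods are meant to support.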
# 効率的なロボットマニピュレーションスキル獲得のための触覚能動推論強化学習 Tactile Active Inference Reinforcement Learning for Efficient Robotic Manipulation Skill Acquisition ( http://arxiv.org/abs/2311.11287v1 ) ライセンス: Link先を確認	Zihao Liu, Xing Liu, Yizhai Zhang, Zhengxiong Liu and Panfeng Huang	(参考訳) ロボット操作は、退屈または危険なタスクの実行において人間を代替する可能性を秘めている。しかし、現実のオープンワールドにおける操作を形式的に記述することは困難であり、既存の学習手法も非効率であるため、制御に基づくアプローチは適していない。したがって、幅広いシナリオに操作を適用することには大きな課題がある。本研究では、効率的な訓練の実現を目的とした、ロボット操作におけるスキル学習の新しい手法である触覚能動推論強化学習(Tactile-AIRL)を提案する。強化学習(RL)の性能を高めるために、モデルベース手法と内発的好奇心をRLプロセスに統合する能動推論を導入する。この統合により、アルゴリズムの訓練効率とスパース報酬への適応性が向上する。さらに、視覚ベースの触覚センサを用いて、操作タスクのための詳細な知覚を提供する。最後に、モデルベースアプローチを採用し、自由エネルギー最小化によって適切な行動を想像し計画する。シミュレーションの結果、本手法は非把持による物体押しタスクにおいて高い訓練効率を達成することが示された。エージェントは、わずか数回のインタラクションエピソードで、密な報酬と疎な報酬の両方のタスクで優れた性能を発揮し、SACベースラインを上回る。さらに、本手法を用いてグリッパーによるねじ締めタスクの物理実験を行い、アルゴリズムの高速な学習能力と実用的応用の可能性を示す。 Robotic manipulation holds the potential to replace humans in the execution of tedious or dangerous tasks. However, control-based approaches are not suitable due to the difficulty of formally describing open-world manipulation in reality, and the inefficiency of existing learning methods. Thus, applying manipulation in a wide range of scenarios presents significant challenges. In this study, we propose a novel method for skill learning in robotic manipulation called Tactile Active Inference Reinforcement Learning (Tactile-AIRL), aimed at achieving efficient training. To enhance the performance of reinforcement learning (RL), we introduce active inference, which integrates model-based techniques and intrinsic curiosity into the RL process. This integration improves the algorithm's training efficiency and adaptability to sparse rewards. Additionally, we utilize a vision-based tactile sensor to provide detailed perception for manipulation tasks. Finally, we employ a model-based approach to imagine and plan appropriate actions through free energy minimization. 
Simulation results demonstrate that our method achieves high training efficiency in non-prehensile object-pushing tasks. It enables agents to excel in both dense and sparse reward tasks with just a few interaction episodes, surpassing the SAC baseline. Furthermore, we conduct physical experiments on a gripper screwing task using our method, which showcases the algorithm's rapid learning capability and its potential for practical applications.	翻訳日:2023-11-21 21:09:06 公開日:2023-11-19
# TimeSQL: マルチスケールパッチングと滑らかな二次損失による多変量時系列予測の改善 TimeSQL: Improving Multivariate Time Series Forecasting with Multi-Scale Patching and Smooth Quadratic Loss ( http://arxiv.org/abs/2311.11285v1 ) ライセンス: Link先を確認	Site Mo, Haoxin Wang, Bixiong Li, Songhai Fan, Yuankai Wu, Xianggen Liu	(参考訳) 時系列は特殊なタイプの系列データであり、等間隔の時刻で収集された実数値確率変数の列である。実世界の多変量時系列はノイズを伴い、複雑な局所的および大域的な時間的ダイナミクスを含むため、過去の観測から将来の時系列を予測することは困難である。本研究は、マルチスケールパッチングと滑らかな二次損失(SQL)を活用して上記の課題に対処する、シンプルで効果的なフレームワークTimeSQLを提案する。マルチスケールパッチングは、時系列を異なる長さスケールの2次元パッチに変換し、時系列における局所性と長期相関の両方の把握を容易にする。 SQLは有理二次カーネルから導出され、ノイズや外れ値への過学習を避けるために勾配を動的に調整できる。理論的解析により、穏やかな条件下では、SQLを用いたモデルに対するノイズの影響は、MSEを用いた場合よりも常に小さいことが示される。 2つのモジュールに基づいて、TimeSQLは8つの実世界ベンチマークデータセットで新たな最先端性能を達成する。さらなるアブレーション研究により、TimeSQLの主要モジュールは、多変量時系列予測のための他のモデルの結果も向上させることができ、プラグアンドプレイ技術として利用できることが示された。 Time series is a special type of sequence data, a sequence of real-valued random variables collected at even intervals of time. Real-world multivariate time series come with noise and contain complicated local and global temporal dynamics, making it difficult to forecast the future time series given the historical observations. This work proposes a simple and effective framework, coined as TimeSQL, which leverages multi-scale patching and smooth quadratic loss (SQL) to tackle the above challenges. The multi-scale patching transforms the time series into two-dimensional patches with different length scales, facilitating the perception of both locality and long-term correlations in time series. SQL is derived from the rational quadratic kernel and can dynamically adjust the gradients to avoid overfitting to the noises and outliers. Theoretical analysis demonstrates that, under mild conditions, the effect of the noises on the model with SQL is always smaller than that with MSE. Based on the two modules, TimeSQL achieves new state-of-the-art performance on the eight real-world benchmark datasets. 
Further ablation studies indicate that the key modules in TimeSQL could also enhance the results of other models for multivariate time series forecasting, standing as plug-and-play techniques.	翻訳日:2023-11-21 21:08:42 公開日:2023-11-19
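The multi-scale patching step described above turns one series into several two-dimensional views, one per patch length. A minimal sketch of that idea follows. It is not the TimeSQL implementation: the function names, the choice of non-overlapping patches, and dropping any incomplete final patch are all assumptions made for illustration. The smooth quadratic loss is not sketched because the abstract does not give its formula.

```python
def patch_series(series, patch_len):
    """Split a 1-D series into consecutive, non-overlapping patches of length patch_len.

    Returns a list of patches (a 2-D layout); an incomplete final patch is dropped.
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")
    count = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(count)]


def multi_scale_patches(series, patch_lens):
    """Build one patched view of the same series per length scale."""
    return {p: patch_series(series, p) for p in patch_lens}


# Hypothetical series of 8 observations viewed at two length scales.
series = list(range(8))
patches = multi_scale_patches(series, [2, 4])
```

Short patches expose local structure, while long patches let a model relate points that are far apart within a single row, which matches the abstract's stated motivation of capturing both locality and long-term correlations.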
# LucidDreamer: インターバルスコアマッチングによる高忠実度テキストから3Dへの生成に向けて LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching ( http://arxiv.org/abs/2311.11284v1 ) ライセンス: Link先を確認	Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, Yingcong Chen	(参考訳) テキストから3Dへの生成における最近の進歩は、生成モデルにおける重要なマイルストーンであり、様々な現実世界のシナリオで想像力豊かな3Dアセットを作成する新たな可能性を開いた。こうした進歩は有望であるものの、詳細で高品質な3Dモデルのレンダリングには至らないことが多い。多くの手法がスコア蒸留サンプリング(SDS)に基づいているため、この問題は特に顕著である。本稿では、SDSが3Dモデルに一貫性がなく低品質な更新方向をもたらし、過度な平滑化効果を引き起こすという顕著な欠陥を特定する。そこで我々は、ISM (Interval Score Matching) と呼ばれる新しい手法を提案する。 ISMは決定論的な拡散軌道を用い、区間ベースのスコアマッチングを利用して過度な平滑化に対抗する。さらに、テキストから3Dへの生成パイプラインに3D Gaussian Splattingを組み込む。広範な実験により、我々のモデルは品質と訓練効率において最先端手法を大きく上回ることが示された。 The recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios. While these advancements have shown promise, they often fall short in rendering detailed and high-quality 3D models. This problem is especially prevalent as many methods base themselves on Score Distillation Sampling (SDS). This paper identifies a notable deficiency in SDS, namely that it yields inconsistent and low-quality update directions for the 3D model, causing the over-smoothing effect. To address this, we propose a novel approach called Interval Score Matching (ISM). ISM employs deterministic diffusing trajectories and utilizes interval-based score matching to counteract over-smoothing. Furthermore, we incorporate 3D Gaussian Splatting into our text-to-3D generation pipeline. Extensive experiments show that our model largely outperforms the state-of-the-art in quality and training efficiency.	翻訳日:2023-11-21 21:08:22 公開日:2023-11-19
# 個人による誤情報タグ付けはエコーチャンバーを強化するが、集団的タグ付けは強化しない Individual misinformation tagging reinforces echo chambers; Collective tagging does not ( http://arxiv.org/abs/2311.11282v1 ) ライセンス: Link先を確認	Junsol Kim, Zhao Wang, Haohan Shi, Hsin-Keng Ling, James Evans	(参考訳) オンライン上の誤情報がもたらす不安定化への懸念から、個人やプラットフォームは対応に乗り出している。個人は、より健全な情報エコシステムを追求し、自己強化的な意見のエコーチャンバーを打破するために、ファクトチェックによって他人のオンライン上の主張に異議を唱えられるようになった。本稿ではTwitterデータを用いて、個人による誤情報タグ付けの帰結を示す。タグ付けされた投稿者は、直前には新しい政治情報を探索し話題の関心を広げていたが、タグ付けされたことで情報バブルへと後退した。こうした意図しない帰結は、誤情報モデレーションのための集団的検証システムによって和らげられた。 Twitterの新しいプラットフォームであるCommunity Notesでは、誤情報のタグ付けは公開前に他のファクトチェッカーによるピアレビューを受ける。集団的な誤情報タグ付けでは、投稿者が多様な情報消費から後退する可能性は低かった。詳細な比較からは、個人と集団の誤情報タグ付けメッセージの間に、毒性、感情、可読性、遅延の違いがあることが示唆される。これらの知見は、個人と集団のモデレーション戦略が、情報消費の多様性と情報エコシステム内での移動性に異なる影響を与えることの証拠を提供する。 Fears about the destabilizing impact of misinformation online have motivated individuals and platforms to respond. Individuals have become empowered to challenge others' online claims with fact-checks in pursuit of a healthier information ecosystem and to break down echo chambers of self-reinforcing opinion. Using Twitter data, here we show the consequences of individual misinformation tagging: tagged posters had explored novel political information and expanded topical interests immediately prior, but being tagged caused posters to retreat into information bubbles. These unintended consequences were softened by a collective verification system for misinformation moderation. In Twitter's new platform, Community Notes, misinformation tagging was peer-reviewed by other fact-checkers before the exposure. With collective misinformation tagging, posters were less likely to retreat from diverse information consumption. Detailed comparison suggests differences in toxicity, sentiment, readability, and delay in individual versus collective misinformation tagging messages. 
These findings provide evidence for differential impacts from individual versus collective moderation strategies on the diversity of information consumption and mobility across the information ecosystem.	翻訳日:2023-11-21 21:08:05 公開日:2023-11-19
# 深層強化学習によるマルチタイムスケール制御と通信 -その1:通信を考慮した車両制御- Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control ( http://arxiv.org/abs/2311.11281v1 ) ライセンス: Link先を確認	Tong Liu, Lei Lei, Kan Zheng, Xuemin (Sherman) Shen	(参考訳) V2X(Vehicle-to-Everything)通信によって実現されるインテリジェントな意思決定システムは、安全で効率的な自動運転(AD)の実現に不可欠であり、そこでは車両制御と無線リソース割り当て(RRA)という2種類の決定を異なる時間スケールで行う必要がある。 RRAと車両制御の相互作用は、両者の協調設計を必要とする。この2部構成の論文(パートIおよびパートII)では、隊列走行制御(PC)を例に、深層強化学習(DRL)に基づくマルチタイムスケール制御・通信(MTCC)の協調最適化フレームワークを提案する。本稿(パートI)では、まず問題を通信を考慮したDRLベースのPCサブ問題と、制御を考慮したDRLベースのRRAサブ問題に分解する。次に、RRAポリシーが与えられていると仮定してPCサブ問題に着目し、効率的なPCポリシーを学習するMTCC-PCアルゴリズムを提案する。ランダムな観測遅延の下でのPC性能を向上させるため、PCの状態空間を観測遅延とPC行動履歴で拡張する。さらに、拡張状態に関する報酬関数を定義し、拡張状態マルコフ決定過程(MDP)を構成する。拡張状態MDPの最適ポリシーは、観測遅延を伴う元のPC問題に対しても最適であることが証明される。通信を考慮した制御に関する既存研究の多くとは異なり、MTCC-PCアルゴリズムは、単純な確率的遅延モデルではなく、C-V2X通信のきめ細かな組み込みシミュレーションによって生成された遅延環境で訓練される。最後に、MTCC-PCの性能をベースラインDRLアルゴリズムの性能と比較する実験を行う。 An intelligent decision-making system enabled by Vehicle-to-Everything (V2X) communications is essential to achieve safe and efficient autonomous driving (AD), where two types of decisions have to be made at different timescales, i.e., vehicle control and radio resource allocation (RRA) decisions. The interplay between RRA and vehicle control necessitates their collaborative design. In this two-part paper (Part I and Part II), taking platoon control (PC) as an example use case, we propose a joint optimization framework of multi-timescale control and communications (MTCC) based on Deep Reinforcement Learning (DRL). In this paper (Part I), we first decompose the problem into a communication-aware DRL-based PC sub-problem and a control-aware DRL-based RRA sub-problem. Then, we focus on the PC sub-problem assuming an RRA policy is given, and propose the MTCC-PC algorithm to learn an efficient PC policy. To improve the PC performance under random observation delay, the PC state space is augmented with the observation delay and PC action history. 
Moreover, the reward function with respect to the augmented state is defined to construct an augmented state Markov Decision Process (MDP). It is proved that the optimal policy for the augmented state MDP is optimal for the original PC problem with observation delay. Different from most existing works on communication-aware control, the MTCC-PC algorithm is trained in a delayed environment generated by the fine-grained embedded simulation of C-V2X communications rather than by a simple stochastic delay model. Finally, experiments are performed to compare the performance of MTCC-PC with those of the baseline DRL algorithms.	翻訳日:2023-11-21 21:07:49 公開日:2023-11-19
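The state augmentation described in this abstract (a delayed observation extended with the observation delay and the recent action history) can be sketched as follows. This is only an illustrative sketch, not the authors' MTCC-PC implementation: the class name `DelayAugmentedState`, the zero-padding of the history, and the flat tuple layout are all assumptions made for the example.

```python
from collections import deque


class DelayAugmentedState:
    """Hypothetical helper: keeps the last `max_delay` actions and builds an augmented state."""

    def __init__(self, max_delay, action_dim):
        self.max_delay = max_delay
        # Oldest action first; padded with zero actions before any action is taken (assumption).
        self.history = deque([(0.0,) * action_dim] * max_delay, maxlen=max_delay)

    def record_action(self, action):
        """Append the newest action; the oldest one is dropped automatically."""
        self.history.append(tuple(action))

    def build(self, delayed_obs, delay):
        """Return (delayed observation, delay, flattened action history) as one tuple."""
        if not 0 <= delay <= self.max_delay:
            raise ValueError("delay out of range")
        flat_actions = [a for action in self.history for a in action]
        return tuple(delayed_obs) + (float(delay),) + tuple(flat_actions)


aug = DelayAugmentedState(max_delay=2, action_dim=1)
for act in ([0.5], [-0.2], [0.1]):
    aug.record_action(act)
state = aug.build([1.0, 2.0], delay=1)
print(state)
```

The idea behind the layout is that a policy reading this tuple sees both how stale the observation is and which actions were issued since it was taken, which is the information the augmented-state MDP in the abstract relies on.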
# 深層強化学習によるマルチタイム制御とコミュニケーション -その2: 無線リソース配置の制御- Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part II: Control-Aware Radio Resource Allocation ( http://arxiv.org/abs/2311.11280v1 ) ライセンス: Link先を確認	Lei Lei, Tong Liu, Kan Zheng, Xuemin (Sherman) Shen	(参考訳) 本論文のパートI(Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control)では,C-V2X(Cellular Vehicle-to-Everything)システムにおけるマルチスケール制御と通信(MTCC)の問題を,DRL(Deep Reinforcement Learning)に基づく小隊制御(PC)サブプロブレムとDRL(RRA)サブプロブレムに分解した。我々は,PCサブプロブレムに着目し,RRAポリシーを考慮し,最適PCポリシーを学習するためのMTCC-PCアルゴリズムを提案した。本稿では,PC ポリシーが与えられたことを前提とした MTCC における RRA サブプロブレムに着目し,RRA ポリシーを学習するための MTCC-RRA アルゴリズムを提案する。具体的には、観察遅延に起因するPC性能劣化量を定量化するRRA報酬関数にPCアドバンテージ関数を組み込む。さらに,PC アクション履歴を用いて RRA の状態空間を拡張し,より優れた RRA ポリシーを提案する。さらに,報奨シェーピングと報奨バックプロパゲーションを優先した経験リプレイ (rbper) 技術を用いて,マルチエージェント問題とスパース報酬問題を効率的に解決する。最後に,pcとrraのポリシを反復的に学習するために,サンプルと計算効率のよいトレーニング手法を提案する。 MTCCアルゴリズムの有効性を検証するために, MTCCの性能をベースラインDRLアルゴリズムと比較した, 先行車両の実走行データを用いた実験を行った。 In Part I of this two-part paper (Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control), we decomposed the multi-timescale control and communications (MTCC) problem in Cellular Vehicle-to-Everything (C-V2X) system into a communication-aware Deep Reinforcement Learning (DRL)-based platoon control (PC) sub-problem and a control-aware DRL-based radio resource allocation (RRA) sub-problem. We focused on the PC sub-problem and proposed the MTCC-PC algorithm to learn an optimal PC policy given an RRA policy. In this paper (Part II), we first focus on the RRA sub-problem in MTCC assuming a PC policy is given, and propose the MTCC-RRA algorithm to learn the RRA policy. Specifically, we incorporate the PC advantage function in the RRA reward function, which quantifies the amount of PC performance degradation caused by observation delay. 
Moreover, we augment the state space of RRA with PC action history for a more well-informed RRA policy. In addition, we utilize reward shaping and reward backpropagation prioritized experience replay (RBPER) techniques to efficiently tackle the multi-agent and sparse reward problems, respectively. Finally, a sample- and computational-efficient training approach is proposed to jointly learn the PC and RRA policies in an iterative process. In order to verify the effectiveness of the proposed MTCC algorithm, we performed experiments using real driving data for the leading vehicle, where the performance of MTCC is compared with those of the baseline DRL algorithms.	翻訳日:2023-11-21 21:07:20 公開日:2023-11-19
# 汎用的ディープフェイク検出のための潜在空間拡張による偽造特異性の超越 Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection ( http://arxiv.org/abs/2311.11278v1 ) ライセンス: Link先を確認	Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, Baoyuan Wu	(参考訳) Deepfake検出は、トレーニングデータとテストデータの分布にミスマッチがある場合に性能が低下するという、重要な一般化の課題に直面している。広く受け入れられている説明は、これらの検出器が、様々な偽造に広く適用できる特徴を学ぶのではなく、偽造固有のアーティファクトに過剰適合する傾向にあるというものである。この問題に対処するために、我々はLSDA(Latent Space Data Augmentation)と呼ばれる単純で効果的な検出器を提案する。これは、より多様な偽造を含む表現は、より一般化可能な決定境界を学習でき、手法固有の特徴への過剰適合を緩和できるはずだ、というヒューリスティックなアイデアに基づく。この考え方に従い, 潜在空間において偽造特徴の内部および特徴間の変動を構築・シミュレートすることにより, 偽造空間の拡大を提案する。このアプローチは、強化されたドメイン固有の特徴の獲得と、異なる偽造タイプ間のスムーズな移行の促進を含み、ドメインギャップを効果的に埋める。提案手法は, 強化された特徴から蒸留された知識を活用する二値分類器を洗練し, 一般化可能なディープフェイク検出器を目指す。包括的実験により,提案手法は驚くほど効果的であり,広く使用されている複数のベンチマークにおいて最先端の検出器を上回ることを示した。 Deepfake detection faces a critical generalization hurdle, with performance deteriorating when there is a mismatch between the distributions of training and testing data. A widely accepted explanation is the tendency of these detectors to overfit to forgery-specific artifacts, rather than learning features that are widely applicable across various forgeries. To address this issue, we propose a simple yet effective detector called LSDA (Latent Space Data Augmentation), which is based on a heuristic idea: representations with a wider variety of forgeries should be able to learn a more generalizable decision boundary, thereby mitigating the overfitting of method-specific features (see Figure 1). Following this idea, we propose to enlarge the forgery space by constructing and simulating variations within and across forgery features in the latent space. This approach encompasses the acquisition of enriched, domain-specific features and the facilitation of smoother transitions between different forgery types, effectively bridging domain gaps. 
Our approach culminates in refining a binary classifier that leverages the distilled knowledge from the enhanced features, striving for a generalizable deepfake detector. Comprehensive experiments show that our proposed method is surprisingly effective and transcends state-of-the-art detectors across several widely used benchmarks.	翻訳日:2023-11-21 21:06:44 公開日:2023-11-19
# 迷彩レンズによる大型視覚言語モデルの一般化と幻覚 Generalization and Hallucination of Large Vision-Language Models through a Camouflaged Lens ( http://arxiv.org/abs/2311.11273v1 ) ライセンス: Link先を確認	Lv Tang, Peng-Tao Jiang, Zhihao Shen, Hao Zhang, Jinwei Chen, Bo Li	(参考訳) 大型視覚言語モデル(lvlm)は近年急速に発展し、注目を集めている。本稿では,LVLM が難易度の高いcamouflaged object detection (COD) シナリオに学習自由な方法で一般化できるかどうかを検討するために,新しいフレームワークであるcamo-perceptive vision- language framework (CPVLF) を提案する。一般化の過程では、lvlm内の幻覚の問題により、迷彩されたシーンの物体を誤って知覚し、反事実的な概念を生み出すことが分かる。さらに、LVLMはカモフラージュされた物体の正確な位置決めを特別に訓練されていないため、これらの物体を正確に特定する上で不確実性を示す。そこで本研究では,言語と視覚の両方の観点からのlvlmのカモフラージュシーンの知覚を増強し,幻覚問題を低減し,カモフラージュ対象を正確に同定する能力を向上させる視覚知覚連鎖を提案する。我々は,CPVLFが広く使用されている3つのCODデータセットに対して有効であることを検証するとともに,CODタスクにおけるLVLMの可能性を示す。 Large Vision-Language Model (LVLM) has seen burgeoning development and increasing attention recently. In this paper, we propose a novel framework, camo-perceptive vision-language framework (CPVLF), to explore whether LVLM can generalize to the challenging camouflaged object detection (COD) scenario in a training-free manner. During the process of generalization, we find that due to hallucination issues within LVLM, it can erroneously perceive objects in camouflaged scenes, producing counterfactual concepts. Moreover, as LVLM is not specifically trained for the precise localization of camouflaged objects, it exhibits a degree of uncertainty in accurately pinpointing these objects. Therefore, we propose chain of visual perception, which enhances LVLM's perception of camouflaged scenes from both linguistic and visual perspectives, reducing the hallucination issue and improving its capability in accurately locating camouflaged objects. We validate the effectiveness of CPVLF on three widely used COD datasets, and the experiments show the potential of LVLM in the COD task.	翻訳日:2023-11-21 21:06:15 公開日:2023-11-19
# 分散二レベル最適化の通信複雑性について On the Communication Complexity of Decentralized Bilevel Optimization ( http://arxiv.org/abs/2311.11342v1 ) ライセンス: Link先を確認	Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao	(参考訳) 分散二レベル最適化は、機械学習に幅広く応用できることから、ここ数年で盛んに研究されてきた。しかし、既存のアルゴリズムは確率的ハイパー勾配の推定に起因する大きな通信複雑性に悩まされ、実世界のタスクへの適用が制限されている。この問題に対処するため,不均一な設定の下で,各ラウンドの通信コストと通信ラウンド数の両方が小さい,新しい分散確率的二レベル勾配降下アルゴリズムを開発した。そのため、既存のアルゴリズムよりもはるかに優れた通信複雑性を実現することができる。さらに、より困難な分散マルチレベル最適化にアルゴリズムを拡張する。我々の知る限り、不均一な設定の下でこれらの理論的結果を達成したのは初めてである。最後に、実験結果から本アルゴリズムの有効性が確認された。 Decentralized bilevel optimization has been actively studied in the past few years since it has widespread applications in machine learning. However, existing algorithms suffer from large communication complexity caused by the estimation of the stochastic hypergradient, limiting their application to real-world tasks. To address this issue, we develop a novel decentralized stochastic bilevel gradient descent algorithm under the heterogeneous setting, which incurs a small communication cost in each round and requires a small number of communication rounds. As such, it can achieve a much better communication complexity than existing algorithms. Moreover, we extend our algorithm to the more challenging decentralized multi-level optimization. To the best of our knowledge, this is the first time these theoretical results have been achieved under the heterogeneous setting. Finally, the experimental results confirm the efficacy of our algorithm.	翻訳日:2023-11-21 20:58:28 公開日:2023-11-19
# 時系列の自己蒸留表現学習 Self-Distilled Representation Learning for Time Series ( http://arxiv.org/abs/2311.11335v1 ) ライセンス: Link先を確認	Felix Pieper and Konstantin Ditschuneit and Martin Genzel and Alexandra Lindt and Johannes Otterbach	(参考訳) 時系列データに対する自己教師あり学習は、最近自然言語処理やコンピュータビジョンで解かれたものと同様の可能性を秘めている。この分野の既存の研究は対照的な学習に重点を置いているが、我々はData2vecの自己蒸留フレームワークに基づく概念的にシンプルだが強力な非競合的アプローチを提案する。本手法の中核は,同じ時系列のマスキングビューから入力時系列の潜在表現を予測する学生-教師方式である。この戦略は、対照的なサンプルペアの設計によって一般的に導入される強いモダリティ特有の仮定やバイアスを避ける。 UCRやUEAのアーカイブやETTやElectricityのデータセットといった最先端の自己教師型学習手法と比較して,下流タスクとして分類と予測を行うアプローチの競争力を実証する。 Self-supervised learning for time-series data holds potential similar to that recently unleashed in Natural Language Processing and Computer Vision. While most existing works in this area focus on contrastive learning, we propose a conceptually simple yet powerful non-contrastive approach, based on the data2vec self-distillation framework. The core of our method is a student-teacher scheme that predicts the latent representation of an input time series from masked views of the same time series. This strategy avoids strong modality-specific assumptions and biases typically introduced by the design of contrastive sample pairs. We demonstrate the competitiveness of our approach for classification and forecasting as downstream tasks, comparing with state-of-the-art self-supervised learning methods on the UCR and UEA archives as well as the ETT and Electricity datasets.	翻訳日:2023-11-21 20:58:19 公開日:2023-11-19
# 動的システムにおける因果スレッドによる変化の説明 Using Causal Threads to Explain Changes in a Dynamic System ( http://arxiv.org/abs/2311.11334v1 ) ライセンス: Link先を確認	Robert B. Allen	(参考訳) 我々はシステムのリッチな意味モデルの開発を探求する。具体的には,これらのシステムにおける状態変化に関する構造的因果説明について考察する。基本的に、プロセスベースの動的知識グラフを開発しています。例えば,雪球地球理論によって提案された地質変化の因果スレッドのモデルを構築した。さらに,説明を行うためのグラフィカルインタフェースの初期プロトタイプについて述べる。大規模言語モデル(llm)のような要約や説明に対する統計的アプローチとは異なり、直接表現のアプローチは直接検査し検証することができる。 We explore developing rich semantic models of systems. Specifically, we consider structured causal explanations about state changes in those systems. Essentially, we are developing process-based dynamic knowledge graphs. As an example, we construct a model of the causal threads for geological changes proposed by the Snowball Earth theory. Further, we describe an early prototype of a graphical interface to present the explanations. Unlike statistical approaches to summarization and explanation such as Large Language Models (LLMs), our approach of direct representation can be inspected and verified directly.	翻訳日:2023-11-21 20:58:05 公開日:2023-11-19
# 金融サービスのためのポルトガル語FAQ Portuguese FAQ for Financial Services ( http://arxiv.org/abs/2311.11331v1 ) ライセンス: Link先を確認	Paulo Finardi, Wanderley M. Melo, Edgard D. Medeiros Neto, Alex F. Mansano, Pablo B. Costa, Vinicius F. Caridá	(参考訳) ポルトガル語の金融分野におけるドメイン固有データの不足は、自然言語処理(NLP)アプリケーションの開発を妨げてきた。この制限に対処するため,本研究はデータ拡張技術によって生成された合成データの利用を提唱する。この調査は、ブラジル中央銀行のFAQから得たデータセットの拡張に焦点を当てており、意味的類似性の異なる複数の技術を使用している。教師ありおよび教師なしのタスクにより、低・高セマンティック類似性シナリオにおける拡張データの影響を評価する。さらに、結果のデータセットはHugging Face Datasetsプラットフォーム上で公開され、アクセシビリティが向上し、NLP研究コミュニティ内での広範なエンゲージメントが促進される。 Scarcity of domain-specific data in the Portuguese financial domain has hindered the development of Natural Language Processing (NLP) applications. To address this limitation, the present study advocates for the utilization of synthetic data generated through data augmentation techniques. The investigation focuses on the augmentation of a dataset sourced from the Central Bank of Brazil FAQ, employing techniques that vary in semantic similarity. Supervised and unsupervised tasks are conducted to evaluate the impact of augmented data on both low and high semantic similarity scenarios. Additionally, the resultant dataset will be publicly disseminated on the Hugging Face Datasets platform, thereby enhancing accessibility and fostering broader engagement within the NLP research community.	翻訳日:2023-11-21 20:57:58 公開日:2023-11-19
# ユニタリ変換とアンシラ状態測定によるマトリックス操作 Matrix manipulations via unitary transformations and ancilla-state measurements ( http://arxiv.org/abs/2311.11329v1 ) ライセンス: Link先を確認	Alexander I. Zenchuk, Wentao Qi, Asutosh Kumar, Junde Wu	(参考訳) 本稿では,マルチキュービットトフォリ型と最も単純な1キュービット演算に基づく内部積,行列加算,行列乗算の計算プロトコルを提案し,アンシラ測定を用いて計算のすべてのゴミを除去する。加算プロトコルの深さ(ランタイム)は$O(1)$であり、他のプロトコルの深さは考慮された行列の次元によって対数的に増加する。 We propose protocols for calculating inner product, matrix addition and matrix multiplication based on multiqubit Toffoli-type and the simplest one-qubit operations and employ ancilla measurements to remove all garbage of calculations. The depth (runtime) of the addition protocol is $O(1)$ and that of other protocols logarithmically increases with the dimensionality of the considered matrices.	翻訳日:2023-11-21 20:57:46 公開日:2023-11-19
# LABCAT:主成分整合信頼領域を用いた局所適応ベイズ最適化 LABCAT: Locally adaptive Bayesian optimization using principal component-aligned trust regions ( http://arxiv.org/abs/2311.11328v1 ) ライセンス: Link先を確認	E. Visser, C.E. van Daalen, J.C. Schoeman	(参考訳) ベイズ最適化(BO)は高価なブラックボックス関数を最適化する一般的な方法である。 BOには、より長い最適化実行を伴う計算のスローダウン、非定常あるいは不条件の目的関数に対する適合性の低下、収束特性の低下など、よく文書化された欠点がいくつかある。信頼領域などのローカル戦略をBOに組み込んでこれらの制限を緩和するアルゴリズムがいくつか提案されているが、いずれのアルゴリズムも十分対応していない。そこで本研究では,局所ガウス過程サーロゲートモデルの長さスケールに基づく主成分整合回転と適応再スケーリング戦略を付加することにより,信頼領域に基づくboを拡張したlabcatアルゴリズムを提案する。一連の合成テスト関数とよく知られたCOCOベンチマークソフトウェアを用いて、広範囲にわたる数値実験を行い、LABCATアルゴリズムは最先端BOや他のブラックボックス最適化アルゴリズムよりも優れていることを示した。 Bayesian optimization (BO) is a popular method for optimizing expensive black-box functions. BO has several well-documented shortcomings, including computational slowdown with longer optimization runs, poor suitability for non-stationary or ill-conditioned objective functions, and poor convergence characteristics. Several algorithms have been proposed that incorporate local strategies, such as trust regions, into BO to mitigate these limitations; however, none address all of them satisfactorily. To address these shortcomings, we propose the LABCAT algorithm, which extends trust-region-based BO by adding principal-component-aligned rotation and an adaptive rescaling strategy based on the length-scales of a local Gaussian process surrogate model with automatic relevance determination. Through extensive numerical experiments using a set of synthetic test functions and the well-known COCO benchmarking software, we show that the LABCAT algorithm outperforms several state-of-the-art BO and other black-box optimization algorithms.	翻訳日:2023-11-21 20:57:37 公開日:2023-11-19
# MoVideo:拡散モデルを用いたモーション対応ビデオ生成 MoVideo: Motion-Aware Video Generation with Diffusion Models ( http://arxiv.org/abs/2311.11325v1 ) ライセンス: Link先を確認	Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc Van Gool, Rakesh Ranjan	(参考訳) 近年,映像生成における拡散モデルの利用は大きな進歩を遂げているが,そのほとんどは画像生成フレームワークの単純な拡張であり,映像と画像の大きな違いであるモーションを明示的に考慮していない。本稿では,映像深度と光フローの2つの側面から運動を考慮した新しいモーションアウェアビデオ生成(MoVideo)フレームワークを提案する。前者はフレーム単位の物体距離と空間配置によって動きを規定し、後者はフレーム間の対応によって動きを記述し、細部の保存と時間的整合性の改善に役立つ。より具体的には、既存のキーフレーム、あるいはテキストプロンプトから生成されたキーフレームが与えられると、まず時空間モジュールを備えた拡散モデルを設計して、ビデオ深度と対応する光フローを生成する。そして、深度、光フローに基づいてワープした潜在ビデオ、計算されたオクルージョンマスクの誘導の下で、別の時空間拡散モデルにより潜在空間で映像を生成する。最後に、再び光フローを使用して異なるフレームを整列・洗練し、潜在空間から画素空間へのより良いビデオデコーディングを行う。実験では、MoVideoはテキストからビデオ、画像からビデオの生成の両方で最先端の結果を達成する。 While recent years have witnessed great progress on using diffusion models for video generation, most of them are simple extensions of image generation frameworks, which fail to explicitly consider one of the key differences between videos and images, i.e., motion. In this paper, we propose a novel motion-aware video generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow. The former regulates motion by per-frame object distances and spatial layouts, while the latter describes motion by cross-frame correspondences that help in preserving fine details and improving temporal consistency. More specifically, given a key frame that exists or is generated from text prompts, we first design a diffusion model with spatio-temporal modules to generate the video depth and the corresponding optical flows. Then, the video is generated in the latent space by another spatio-temporal diffusion model under the guidance of depth, optical flow-based warped latent video and the calculated occlusion mask. Lastly, we use optical flows again to align and refine different frames for better video decoding from the latent space to the pixel space. 
In experiments, MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.	翻訳日:2023-11-21 20:57:18 公開日:2023-11-19
# 処理効果推定のための表現誘発交絡バイアスの境界 Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation ( http://arxiv.org/abs/2311.11321v1 ) ライセンス: Link先を確認	Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel	(参考訳) 条件平均処理効果(CATE)推定のための最先端手法は、表現学習を広く活用する。ここでは、(潜在的に制約された)低次元表現によって、少サンプルでのCATE推定の分散を低減することが狙いである。しかし、低次元の表現は、観測された交絡因子に関する情報を失う可能性があり、その結果バイアスが生じるため、CATE推定のための表現学習の妥当性は典型的に損なわれる。本稿では,CATE推定における次元削減(あるいは表現に関する他の制約)から生じる表現誘発交絡バイアスの境界を推定する,表現に依存しない新しいフレームワークを提案する。第一に、低次元(制約付き)表現の下でCATEが識別不能となる条件を理論的に確立する。第二に、その対処法として、CATEを部分的に同定すること、すなわち表現誘発交絡バイアスの下限と上限を推定することを提案する。我々は一連の実験において境界の有効性を示す。まとめると、我々のフレームワークは、CATE推定の妥当性が重要となる実践において、直接的な関連性を持っている。 State-of-the-art methods for conditional average treatment effect (CATE) estimation make widespread use of representation learning. Here, the idea is to reduce the variance of the low-sample CATE estimation by a (potentially constrained) low-dimensional representation. However, low-dimensional representations can lose information about the observed confounders and thus lead to bias, so that the validity of representation learning for CATE estimation is typically violated. In this paper, we propose a new, representation-agnostic framework for estimating bounds on the representation-induced confounding bias that comes from dimensionality reduction (or other constraints on the representations) in CATE estimation. First, we establish theoretically under which conditions CATEs are non-identifiable given low-dimensional (constrained) representations. Second, as our remedy, we propose to perform partial identification of CATEs or, equivalently, to estimate lower and upper bounds of the representation-induced confounding bias. We demonstrate the effectiveness of our bounds in a series of experiments. In sum, our framework is of direct relevance in practice where the validity of CATE estimation is of importance.	翻訳日:2023-11-21 20:56:51 公開日:2023-11-19
# GeoSAM: モビリティインフラストラクチャの自動セグメンテーションのための疎および密な視覚的プロンプトによるSAMの微調整 GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure ( http://arxiv.org/abs/2311.11319v1 ) ライセンス: Link先を確認	Rafi Ibn Sultan, Chengyin Li, Hui Zhu, Prashant Khanduri, Marco Brocanelli, Dongxiao Zhu	(参考訳) Segment Anything Model (SAM)は、自然画像のセグメンテーションに適用された際、印象的な性能を示している。しかし、特に道路、歩道、横断歩道などの移動インフラを分割する場合、航空画像や衛星画像のような地理的画像では苦戦する。この劣った性能は、これらのオブジェクトの幅が狭いこと、テクスチャが周囲に溶け込むこと、木、建物、車両、歩行者のようなオブジェクトからの干渉に由来し、いずれもモデルを混乱させて不正確なセグメンテーションマップを生じさせる。これらの課題に対処するために,ゼロショット学習から得た密な視覚的プロンプトと,事前学習したCNNセグメンテーションモデルから得た疎な視覚的プロンプトを用いる微調整戦略を実装した,新しいSAMベースのフレームワークであるGeographical SAM(GeoSAM)を提案する。提案したGeoSAMは、地理的画像セグメンテーションの既存手法を、道路インフラで20%、歩行者インフラで14.29%、平均で17.65%上回り、地理的画像における道路と歩行者の両方のインフラを含む移動インフラのセグメンテーションに基盤モデルを活用する上で大きな前進を示している。 The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians - all of which can disorient the model to produce inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using the dense visual prompt from zero-shot learning, and the sparse visual prompt from a pre-trained CNN segmentation model. 
The proposed GeoSAM outperforms existing approaches for geographical image segmentation, specifically by 20%, 14.29%, and 17.65% for road infrastructure, pedestrian infrastructure, and on average, respectively, representing a momentous leap in leveraging foundation models to segment mobility infrastructure including both road and pedestrian infrastructure in geographical images.	翻訳日:2023-11-21 20:56:37 公開日:2023-11-19
# ガウス平滑化とガウス微分の離散近似 Discrete approximations of Gaussian smoothing and Gaussian derivatives ( http://arxiv.org/abs/2311.11317v1 ) ライセンス: Link先を確認	Tony Lindeberg	(参考訳) 本稿では, 離散データに適用するためのスケール空間理論におけるガウス平滑化およびガウス微分計算の近似問題に関する深い処理法を考案する。連続的および離散的スケール空間論の以前の公理的処理との密接な関係から、これらのスケール空間演算を明示的離散畳み込みという観点から区別する3つの主要な方法を考える。 (i)ガウス核とガウス微分核をサンプリングする。 (ii)各画素支持領域上にガウス核とガウス微分核を局所的に統合し、 3) ガウス核の離散アナログのスケール空間解析を基礎とし, 空間的スムーズな画像データに小サポート中央差分演算子を適用することにより微分近似を演算する。本研究では,これら3つの主要な離散化手法の特性を理論的・実験的に検討し,その性能を定量的に評価する。その結果、サンプル化されたガウス核と導関数、および統合されたガウス核と導関数は、非常に微細なスケールで非常に低性能であることがわかった。非常に微細なスケールでは、ガウス核の離散的な類似とそれに対応する離散微分近似が大幅に向上する。一方、サンプル化されたガウス核とサンプル化されたガウス微分は、スケールパラメータが十分に大きい場合、グリッド間隔の単位においてスケールパラメータが約1より大きい場合、対応する連続結果の数値的に非常に良い近似をもたらす。 This paper develops an in-depth treatment concerning the problem of approximating the Gaussian smoothing and Gaussian derivative computations in scale-space theory for application on discrete data. With close connections to previous axiomatic treatments of continuous and discrete scale-space theory, we consider three main ways discretizing these scale-space operations in terms of explicit discrete convolutions, based on either (i) sampling the Gaussian kernels and the Gaussian derivative kernels, (ii) locally integrating the Gaussian kernels and the Gaussian derivative kernels over each pixel support region and (iii) basing the scale-space analysis on the discrete analogue of the Gaussian kernel, and then computing derivative approximations by applying small-support central difference operators to the spatially smoothed image data. We study the properties of these three main discretization methods both theoretically and experimentally, and characterize their performance by quantitative measures, including the results they give rise to with respect to the task of scale selection, investigated for four different use cases, and with emphasis on the behaviour at fine scales. 
The results show that the sampled Gaussian kernels and derivatives as well as the integrated Gaussian kernels and derivatives perform very poorly at very fine scales. At very fine scales, the discrete analogue of the Gaussian kernel with its corresponding discrete derivative approximations performs substantially better. The sampled Gaussian kernel and the sampled Gaussian derivatives do, on the other hand, lead to numerically very good approximations of the corresponding continuous results, when the scale parameter is sufficiently large, in the experiments presented in the paper, when the scale parameter is greater than a value of about 1, in units of the grid spacing.	翻訳日:2023-11-21 20:56:12 公開日:2023-11-19
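The three discretizations compared in this abstract can be illustrated in one dimension with a short sketch. The function names, the use of `s` for the scale parameter (taken here as the variance of the continuous Gaussian), and the truncated power series for the modified Bessel function are assumptions made for this example; the paper's own code is not shown in this listing.

```python
import math


def sampled_gaussian(n, s):
    """Continuous Gaussian with variance s, sampled at the integer n."""
    return math.exp(-n * n / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)


def integrated_gaussian(n, s):
    """Continuous Gaussian with variance s, integrated over the pixel [n - 1/2, n + 1/2]."""
    c = math.sqrt(2.0 * s)
    return 0.5 * (math.erf((n + 0.5) / c) - math.erf((n - 0.5) / c))


def discrete_gaussian(n, s, terms=60):
    """Discrete analogue T(n, s) = exp(-s) * I_n(s), with I_n from its power series."""
    n = abs(n)
    total = 0.0
    for k in range(terms):
        total += (s / 2.0) ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
    return math.exp(-s) * total


def kernel_sum(kernel, s, radius=30):
    """Sum of the kernel coefficients over n = -radius..radius."""
    return sum(kernel(n, s) for n in range(-radius, radius + 1))


for s in (0.1, 4.0):
    print(s, [round(kernel_sum(k, s), 6)
              for k in (sampled_gaussian, integrated_gaussian, discrete_gaussian)])
```

At the fine scale `s = 0.1` the sampled kernel sums to roughly 1.28 rather than 1, while the integrated kernel and the discrete analogue stay normalized; at `s = 4.0` all three sums agree closely with 1. Normalization is only one of several possible quality measures and is used here because it is easy to check by hand.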
# TPTU-v2: 実世界システムにおける大規模言語モデルベースエージェントのタスク計画とツール利用の促進 TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems ( http://arxiv.org/abs/2311.11315v1 ) ライセンス: Link先を確認	Yilun Kong, Jingqing Ruan, Yihong Chen, Bin Zhang, Tianpeng Bao, Shiwei Shi, Guoqing Du, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao	(参考訳) 大規模言語モデル(LLM)は、タスク計画とAPIなどの外部ツールの利用を組み合わせる必要があるタスクに対処する能力を示している。しかし、実世界の複雑なシステムでは、タスク計画とツール利用に関して3つの一般的な課題がある。(1) 実システムは通常膨大な数のAPIを持つため、トークン長の制限により、すべてのAPIの説明をLLMのプロンプトに与えることは不可能である。(2) 実システムは複雑なタスクを扱うように設計されており、ベースLLMではそのようなタスクに対して正しいサブタスク順序とAPI呼び出し順序を計画することが難しい。(3) 実システムではAPI間の意味や機能が類似しており、LLMだけでなく人間にとっても区別が難しい。そこで本稿では,実世界のシステムで動作するLLMベースのエージェントのタスク計画とツール利用(TPTU)能力の向上を目的とした包括的フレームワークを提案する。このフレームワークは,(1) API Retrieverが利用可能な膨大なAPIの中からユーザタスクに最も関連するAPIを選択する,(2) LLM FinetunerがベースLLMを微調整してタスク計画やAPI呼び出しの能力を高める,(3) Demo Selectorが区別の難しいAPIに関連するデモを適応的に検索し,文脈内学習に用いて最終的な性能を高める,という3つの重要なコンポーネントで構成されている。実世界の商用システムとオープンソースの学術データセットを用いて手法を検証し,各コンポーネントおよび統合フレームワークの有効性を明らかにした。 Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that necessitate a combination of task planning and the usage of external tools, such as APIs. 
However, real-world complex systems present three prevalent challenges concerning task planning and tool usage: (1) The real system usually has a vast array of APIs, so it is impossible to feed the descriptions of all APIs to the prompt of LLMs as the token length is limited; (2) the real system is designed for handling complex tasks, and the base LLMs can hardly plan a correct sub-task order and API-calling order for such tasks; (3) Similar semantics and functionalities among APIs in real systems create challenges for both LLMs and even humans in distinguishing between them. In response, this paper introduces a comprehensive framework aimed at enhancing the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents operating within real-world systems. Our framework comprises three key components designed to address these challenges: (1) the API Retriever selects the most pertinent APIs for the user task among the extensive array available; (2) LLM Finetuner tunes a base LLM so that the finetuned LLM can be more capable for task planning and API calling; (3) the Demo Selector adaptively retrieves different demonstrations related to hard-to-distinguish APIs, which is further used for in-context learning to boost the final performance. We validate our methods using a real-world commercial system as well as an open-sourced academic dataset, and the outcomes clearly showcase the efficacy of each individual component as well as the integrated framework.	翻訳日:2023-11-21 20:55:47 公開日:2023-11-19
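The role of the first component, selecting a few relevant APIs so that only their descriptions need to fit in the prompt, can be illustrated with a toy retriever. This is a deliberately simplified stand-in, not the paper's API Retriever: it ranks by bag-of-words cosine similarity, and the function names and the example API catalogue are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_apis(task, api_descriptions, top_k=2):
    """Return up to top_k API names whose descriptions best match the task text."""
    query = bag_of_words(task)
    scored = [(cosine(query, bag_of_words(desc)), name)
              for name, desc in api_descriptions.items()]
    # Highest score first; ties broken alphabetically so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical API catalogue, for illustration only.
apis = {
    "get_weather": "return the weather forecast for a city",
    "send_email": "send an email message to a recipient",
    "list_cities": "list every city in a country",
}
print(retrieve_apis("what is the weather forecast in this city", apis))
```

Only the retrieved names (and their descriptions) would then be placed in the prompt, which is how a retrieval step sidesteps the token-length limit mentioned in challenge (1). A practical system would more likely use learned text embeddings than raw word counts.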
# カー非線形性のt-DMRGシミュレーション: 非ガウス力学の初期状態依存性の解析 t-DMRG Simulation of Kerr Nonlinearity; Analyzing Initial State Dependency of non-Gaussian Dynamics ( http://arxiv.org/abs/2311.11314v1 ) ライセンス: Link先を確認	Souvik Agasti	(参考訳) 時間発展ブロックデシメーション(TEBD)アルゴリズムを用いて、コヒーレント駆動された自由散逸カー非線形系を異なる初期状態から数値的にシミュレートし、そのダイナミクスが古典的双安定性とどのように類似しているかを調べた。2つのコヒーレント分岐の重ね合わせは非古典的な時間ダイナミクスをもたらす。ウィグナー表現により、系が外部駆動と初期条件に応じて異なる軌道をたどって異なる分岐に安定化し、時間発展を通じて非ガウス化が生じることを確認した。さらに、時間発展に初期状態の影響が残ることも確認した。 We numerically simulate a coherently driven, free dissipative Kerr nonlinear system, starting from different initial states, using the time-evolving block decimation (TEBD) algorithm to see how the dynamics are analogous to classical bistability. The superposition of two coherent branches results in non-classical time dynamics. The Wigner state representation confirms that the system evolves through different trajectories to stabilize different branches for different external drives and initial conditions, resulting in de-Gaussification throughout the evolution. Furthermore, we also see that the evolution retains a residual effect of the initial state.	翻訳日:2023-11-21 20:55:18 公開日:2023-11-19
# 量子誤り訂正プログラムの記号的実行 Symbolic Execution for Quantum Error Correction Programs ( http://arxiv.org/abs/2311.11313v1 ) ライセンス: Link先を確認	Wang Fang, Mingsheng Ying	(参考訳) 我々は,記号変数を量子状態と量子測定結果に統合することで,量子プログラムのためのシンボリック実行フレームワークQSEを定義する。QSEの健全性定理を証明する。さらに,量子誤り訂正プログラムの効率的な解析を容易にするシンボリック安定化状態を導入する。QSEフレームワーク内では、シンボリック表現を用いて量子誤り訂正で起こりうる敵対的誤りを特徴付けることができ、シミュレータによるサンプリングに依存する既存の手法よりも大幅に改善される。我々はQuantumSE.jlというプロトタイプツールで、シンボリック安定化状態をサポートするQSEを実装した。量子反復符号、Kitaevのトーリック符号、量子タナー符号を含む代表的な量子誤り訂正符号の実験により、1000量子ビットを超える量子誤り訂正プログラムをデバッグする際のQuantumSE.jlの効率を実証する。さらに、QSEの副産物として、QuantumSE.jlの安定化回路のサンプリング機能は、実験において最先端の安定化シミュレータであるGoogleのStimよりも優れている。 We define a symbolic execution framework QSE for quantum programs by integrating symbolic variables into quantum states and the outcomes of quantum measurements. The soundness theorem of QSE is proved. We further introduce symbolic stabilizer states, which facilitate the efficient analysis of quantum error correction programs. Within the QSE framework, we can use symbolic expressions to characterize the possible adversarial errors in quantum error correction, providing a significant improvement over existing methods that rely on sampling with simulators. We implement QSE with the support of symbolic stabilizer states in a prototype tool named QuantumSE.jl. With experiments on representative quantum error correction codes, including quantum repetition codes, Kitaev's toric codes, and quantum Tanner codes, we demonstrate the efficiency of QuantumSE.jl for debugging quantum error correction programs with over 1000 qubits. In addition, as a by-product of QSE, QuantumSE.jl's sampling functionality for stabilizer circuits also outperforms the state-of-the-art stabilizer simulator, Google's Stim, in the experiments.	翻訳日:2023-11-21 20:55:04 公開日:2023-11-19
# マルチモーダル相互作用とプール注意によるrgb-d意味セグメンテーションの最適化 Optimizing rgb-d semantic segmentation through multi-modal interaction and pooling attention ( http://arxiv.org/abs/2311.11312v1 ) ライセンス: Link先を確認	Shuai Zhang, Minghong Xie	(参考訳) RGB-D画像のセマンティックセグメンテーションは、シーン内の物体の外観や空間的関係を理解することを含み、様々な要因を慎重に検討する必要がある。しかし、屋内環境では、RGB画像と深度画像を単純に入力するだけでは、得られる意味情報と空間情報が比較的限られることが多く、最適とは言えないセグメンテーション結果につながる。そこで本研究では,RGBと深度のモダリティ間の対話的な相乗効果を活かし,補完的情報の利用を最適化する新しい手法であるMulti-modal Interaction and Pooling Attention Network(MIPANet)を提案する。具体的には,Multi-modal Interaction Fusion Module (MIM) をネットワークの最も深い層に組み込む。このモジュールはRGBと深度情報の融合を容易にするために設計されており、相互強化と修正が可能である。さらに,エンコーダの様々な段階において,Pooling Attention Module (PAM)を導入する。このモジュールは、ネットワークによって抽出された特徴を増幅し、その出力を的を絞った形でデコーダに統合することで、セマンティックセグメンテーションの性能を大幅に改善する。実験の結果、MIPANetは2つの屋内シーンデータセットであるNYUDv2とSUN-RGBDにおいて既存手法よりも優れており、RGB-Dセマンティックセグメンテーションの強化における有効性が示されている。 Semantic segmentation of RGB-D images involves understanding the appearance and spatial relationships of objects within a scene, which requires careful consideration of various factors. However, in indoor environments, the simple input of RGB and depth images often results in a relatively limited acquisition of semantic and spatial information, leading to suboptimal segmentation outcomes. To address this, we propose the Multi-modal Interaction and Pooling Attention Network (MIPANet), a novel approach designed to harness the interactive synergy between RGB and depth modalities, optimizing the utilization of complementary information. Specifically, we incorporate a Multi-modal Interaction Fusion Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Additionally, we introduce a Pooling Attention Module (PAM) at various stages of the encoder. This module serves to amplify the features extracted by the network and integrates the module's output into the decoder in a targeted manner, significantly improving semantic segmentation performance. 
Our experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYUDv2 and SUN-RGBD, underscoring its effectiveness in enhancing RGB-D semantic segmentation.	翻訳日:2023-11-21 20:54:44 公開日:2023-11-19
# 異方性中心スピンモデルによるスピンスクイーズ Spin squeezing generated by anisotropic central spin model ( http://arxiv.org/abs/2311.11308v1 ) ライセンス: Link先を確認	Lei Shao and Libin Fu	(参考訳) スピンスクイージングは、重要な量子資源として、量子計測において重要な役割を担い、高精度なパラメータ推定スキームを実現できる。ここでは,異方性中心スピン系におけるスピンスクイージングと量子相転移について検討する。このような中心スピン系は、中心スピンとスピン浴の間の遷移周波数の比が無限大に向かう限界において、異方性リプキン-メシュコフ-グリック模型にマッピングできる。この性質は1軸のねじれ相互作用を誘発し、スピンスクイーズを生成する新しい可能性を与える。我々は、基底状態と中心スピンモデルの動的進化を通してスピンスクイーズ状態を生成することを検討する。その結果, スピンスクイーズパラメータは異方性パラメータが減少するにつれて向上し, その値はシステムサイズに対して$N^{-2/3}$でスケールすることがわかった。さらに, 臨界点周辺の量子フィッシャー情報の臨界指数を数値シミュレーションにより求め, 周波数比とシステムサイズが無限大に近づくにつれて, この値が$4/3$に近づくことがわかった。この研究はスピンスクイーズ状態を生成するための有望なスキームを提供し、量子センシングの潜在的な進歩の道を開く。 Spin squeezing, as a crucial quantum resource, plays a pivotal role in quantum metrology, enabling us to achieve high-precision parameter estimation schemes. Here we investigate the spin squeezing and the quantum phase transition in anisotropic central spin systems. We find that this kind of central spin systems can be mapped to the anisotropic Lipkin-Meshkov-Glick model in the limit where the ratio of transition frequencies between the central spin and the spin bath tends towards infinity. This property can induce a one-axis twisting interaction and provides a new possibility for generating spin squeezing. We consider generating spin-squeezed states via the ground state and the dynamic evolution of the central spin model. The results show that the spin squeezing parameter improves as the anisotropy parameter decreases, and its value scales with system size as $N^{-2/3}$. Furthermore, we obtain the critical exponent of the quantum Fisher information around the critical point by numerical simulation, and find this value tends to $4/3$ as the frequency ratio and the system size approach infinity. This work offers a promising scheme for generating spin-squeezed states and paves the way for potential advancements in quantum sensing.	翻訳日:2023-11-21 20:54:20 公開日:2023-11-19
# 直交専門家の混合によるマルチタスク強化学習 Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts ( http://arxiv.org/abs/2311.11385v1 ) ライセンス: Link先を確認	Ahmed Hendawy, Jan Peters, Carlo D'Eramo	(参考訳) マルチタスク強化学習(mtrl)は、様々な問題を一般化するスキルを持つ内在エージェントの長年の問題に取り組む。この目的のために、表現の共有は、タスクのユニークな特徴と共通の特徴の両方をキャプチャする上で、基本的な役割を果たす。タスクは、スキル、オブジェクト、または物理的特性の点で類似性を示し、それらの表現を活用すれば、普遍的なポリシーの達成が容易になる。それでも、多様な表現の共有セットを学ぶことの追求は、いまだに未解決の課題である。本稿では,直交表現を用いてタスク間の共通構造をカプセル化して多様性を促進するMTRLにおける表現学習手法を提案する。我々の手法はMixture Of Orthogonal Experts (MOORE) と呼ばれ、Gram-Schmidtプロセスを利用して、専門家の混合によって生成された表現の共有部分空間を形成する。タスク固有の情報が提供されると、MOOREはこの共有部分空間から関連する表現を生成する。提案手法の有効性をMiniGridとMetaWorldという2つのMTRLベンチマークで評価し,MOOREが関連するベースラインを超越し,MetaWorldにおける新たな最先端結果を確立することを示す。 Multi-Task Reinforcement Learning (MTRL) tackles the long-standing problem of endowing agents with skills that generalize across a variety of problems. To this end, sharing representations plays a fundamental role in capturing both unique and common characteristics of the tasks. Tasks may exhibit similarities in terms of skills, objects, or physical properties while leveraging their representations eases the achievement of a universal policy. Nevertheless, the pursuit of learning a shared set of diverse representations is still an open challenge. In this paper, we introduce a novel approach for representation learning in MTRL that encapsulates common structures among the tasks using orthogonal representations to promote diversity. Our method, named Mixture Of Orthogonal Experts (MOORE), leverages a Gram-Schmidt process to shape a shared subspace of representations generated by a mixture of experts. When task-specific information is provided, MOORE generates relevant representations from this shared subspace. We assess the effectiveness of our approach on two MTRL benchmarks, namely MiniGrid and MetaWorld, showing that MOORE surpasses related baselines and establishes a new state-of-the-art result on MetaWorld.	翻訳日:2023-11-21 20:46:49 公開日:2023-11-19
# MRIにおける拡散確率モデルの新しい応用に関する調査 A Survey of Emerging Applications of Diffusion Probabilistic Models in MRI ( http://arxiv.org/abs/2311.11383v1 ) ライセンス: Link先を確認	Yuheng Fan, Hanxi Liao, Shiqi Huang, Yimin Luo, Huazhu Fu, Haikun Qi	(参考訳) 拡散確率モデル (DPM) は, 明示的な尤度の特徴付けとデータ合成のための段階的なサンプリングプロセスを用いるもので, 研究の関心が高まっている。サンプリング中の多くのステップによる計算負荷にもかかわらず、DPMは様々な医療画像タスクにおいて、その高品質で多様な生成結果により広く評価されている。磁気共鳴イメージング(MRI)は、軟組織コントラストと空間分解能に優れ、拡散モデルに特有の機会を持つ重要な医用イメージングモダリティである。 MRIでDPMを探索する研究が近年増えているが、MRIアプリケーション用に特別に設計されたDPMの調査論文はいまだに不足している。この記事では、MRIコミュニティの研究者が異なるアプリケーションにおけるDPMの進歩を把握できるようにすることを目的としている。まず,拡散時間ステップが離散的か連続的かに応じて分類された2つの支配的なDPMの理論を紹介し,再構成,画像生成,画像翻訳,セグメンテーション,異常検出,その他の研究トピックを含むMRIにおける新たなDPMの総合的なレビューを行う。最後に、DPMのMRIタスクに特有の制限だけでなく、一般的な制限についても論じ、さらに探究する価値のある潜在的な領域を指摘する。 Diffusion probabilistic models (DPMs), which employ explicit likelihood characterization and a gradual sampling process to synthesize data, have gained increasing research interest. Despite their huge computational burdens due to the large number of steps involved during sampling, DPMs are widely appreciated in various medical imaging tasks for the high quality and diversity of their generation. Magnetic resonance imaging (MRI) is an important medical imaging modality with excellent soft tissue contrast and superb spatial resolution, which possesses unique opportunities for diffusion models. Although there is a recent surge of studies exploring DPMs in MRI, a survey paper of DPMs specifically designed for MRI applications is still lacking. This review article aims to help researchers in the MRI community to grasp the advances of DPMs in different applications. We first introduce the theory of two dominant kinds of DPMs, categorized according to whether the diffusion time step is discrete or continuous, and then provide a comprehensive review of emerging DPMs in MRI, including reconstruction, image generation, image translation, segmentation, anomaly detection, and further research topics. 
Finally, we discuss the general limitations of DPMs as well as limitations specific to MRI tasks, and point out potential areas that are worth further exploration.	翻訳日:2023-11-21 20:46:27 公開日:2023-11-19
# 統計情報を付加したTransformerモデルの説明可能性の検討 Inspecting Explainability of Transformer Models with Additional Statistical Information ( http://arxiv.org/abs/2311.11378v1 ) ライセンス: Link先を確認	Hoang C. Nguyen, Haeil Lee, Junmo Kim	(参考訳) 近年、視覚領域ではTransformerがより普及しているため、それを視覚化することでTransformerモデルを効果的に解釈する方法を見つける必要がある。最近の研究でCheferらは、各イメージパッチの重要性を示すために注意層を組み合わせることで、視覚とマルチモーダルタスクのTransformerを効果的に可視化できることを示した。しかし、Swin Transformerのような他の変種のTransformerに適用する場合、この方法は予測対象に集中できない。本手法は,層正規化層におけるトークンの統計を考慮することで,Swin TransformerとViTを解釈する高い能力を示す。 Transformers have become more popular in the vision domain in recent years, so there is a need for an effective way to interpret Transformer models by visualizing them. In recent work, Chefer et al. visualize Transformers on vision and multi-modal tasks effectively by combining attention layers to show the importance of each image patch. However, when applied to other Transformer variants such as the Swin Transformer, this method cannot focus on the predicted object. Our method, by considering the statistics of tokens in layer normalization layers, shows a strong ability to interpret Swin Transformer and ViT.	翻訳日:2023-11-21 20:46:04 公開日:2023-11-19
# ML-LMCL:音声言語理解におけるASRロバスト性向上のための相互学習と大規模コントラスト学習 ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding ( http://arxiv.org/abs/2311.11375v1 ) ライセンス: Link先を確認	Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou	(参考訳) 音声言語理解(SLU)はタスク指向対話システムの基本課題である。しかしながら、自動音声認識(ASR)による避けられない誤りは、通常、理解性能を損ね、エラーの伝播につながる。コントラスト学習によってこの問題に対処しようとする試みはいくつかあるが,(1)手書き文字とASR文字の書き起こしは微調整で等しく扱うこと,(2)コントラスト学習を適用する際に意味論的に類似したペアがまだ追い出されているという事実を無視すること,(3)KL(Kulback-Leibler)という問題に悩まされる。本稿では,sluにおけるasrロバスト性向上のための新しい枠組みである,相互学習と大規模比較学習(ml-lmcl)を提案する。具体的には、相互学習に適用し、2つのSLUモデルを手書き文字とASR文字で訓練し、これら2つのモデルの知識を反復的に共有することを目的としている。また,クラスタ内ペアを可能な限り排除しないように,距離偏光正規化器を導入する。さらに,klの消失を緩和するために周期的アニーリングスケジュールを用いる。 3つのデータセットの実験では、ML-LMCLは既存のモデルより優れ、新しい最先端のパフォーマンスを実現する。 Spoken language understanding (SLU) is a fundamental task in the task-oriented dialogue systems. However, the inevitable errors from automatic speech recognition (ASR) usually impair the understanding performance and lead to error propagation. Although there are some attempts to address this problem through contrastive learning, they (1) treat clean manual transcripts and ASR transcripts equally without discrimination in fine-tuning; (2) neglect the fact that the semantically similar pairs are still pushed away when applying contrastive learning; (3) suffer from the problem of Kullback-Leibler (KL) vanishing. In this paper, we propose Mutual Learning and Large-Margin Contrastive Learning (ML-LMCL), a novel framework for improving ASR robustness in SLU. Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models. We also introduce a distance polarization regularizer to avoid pushing away the intra-cluster pairs as much as possible. Moreover, we use a cyclical annealing schedule to mitigate KL vanishing issue. 
Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.	翻訳日:2023-11-21 20:45:52 公開日:2023-11-19
# SOccDPT: メモリ制約下で訓練された高密度予測変換器からの半教師付き3次元セマンティック動作 SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints ( http://arxiv.org/abs/2311.11371v1 ) ライセンス: Link先を確認	Aditya Nalgunda Ganesh	(参考訳) 我々は高密度な予測変換器を用いた単眼画像からの3次元意味占有予測のためのメモリ効率のよいSOccDPTを提案する。構造化トラヒックデータセットでトレーニングされた既存のメソッドの制限に対処するために、インド駆動データセットやベンガルー駆動データセットを含む非構造化データセットでモデルをトレーニングします。半教師付きトレーニングパイプラインにより,socdptは限定ラベル付きデータセットから学習でき,擬似基底真理ラベルを代入することで,手作業によるラベリングの必要性を低減し,bengaluruセマンティック占有データセットを作成できる。この広範なトレーニングにより、非構造化トラフィックシナリオを効果的に処理できるモデルの能力が向上します。トレーニング中のメモリ制限を克服するために,各エポックをトレーニングするパラメータのサブセットを選択するパッチワイズトレーニングを導入し,自動グレードグラフ構築時のメモリ使用量を削減する。構造化されていないトラフィックとメモリ制約のあるトレーニングと推論の文脈において、SOccDPTはRMSEの9.1473のスコアで示されるような既存の格差推定手法より優れており、セマンティックセグメンテーションIoUのスコアは46.02%に達し、競争周波数は69.47Hzである。コードとセマンティック占有率データセットを公開します。 We present SOccDPT, a memory-efficient approach for 3D semantic occupancy prediction from monocular image input using dense prediction transformers. To address the limitations of existing methods trained on structured traffic datasets, we train our model on unstructured datasets including the Indian Driving Dataset and Bengaluru Driving Dataset. Our semi-supervised training pipeline allows SOccDPT to learn from datasets with limited labels by reducing the requirement for manual labelling by substituting it with pseudo-ground truth labels to produce our Bengaluru Semantic Occupancy Dataset. This broader training enhances our model's ability to handle unstructured traffic scenarios effectively. To overcome memory limitations during training, we introduce patch-wise training where we select a subset of parameters to train each epoch, reducing memory usage during auto-grad graph construction. 
In the context of unstructured traffic and memory-constrained training and inference, SOccDPT outperforms existing disparity estimation approaches as shown by the RMSE score of 9.1473, achieves a semantic segmentation IoU score of 46.02% and operates at a competitive frequency of 69.47 Hz. We make our code and semantic occupancy dataset public.	翻訳日:2023-11-21 20:45:24 公開日:2023-11-19
# 公共データを用いた最適局所的非パラメトリック分類 Optimal Locally Private Nonparametric Classification with Public Data ( http://arxiv.org/abs/2311.11369v1 ) ライセンス: Link先を確認	Yuheng Ma and Hanfang Yang	(参考訳) 本研究では,非パラメトリック分類に着目し,非対話型ldp(local differential privacy)学習の課題について検討する。後方ドリフト仮定の下では, LDP制約による最小収束率を初めて導出した。そこで,本研究では,極小最大収束率を実現する新しい手法である局所プライベート分類木を提案する。さらに,パラメータチューニングを回避し,高速収束推定器を生成するデータ駆動プルーニング手順を設計する。合成および実データを用いた総合的な実験は,提案手法の優れた性能を示す。理論的および実験的な結果は、プライベートデータと比較して公開データの有効性を示すものであり、非プライベートデータ収集の優先順位付けの実践的提案につながっている。 In this work, we investigate the problem of public data-assisted non-interactive LDP (Local Differential Privacy) learning with a focus on non-parametric classification. Under the posterior drift assumption, we for the first time derive the mini-max optimal convergence rate with LDP constraint. Then, we present a novel approach, the locally private classification tree, which attains the mini-max optimal convergence rate. Furthermore, we design a data-driven pruning procedure that avoids parameter tuning and produces a fast converging estimator. Comprehensive experiments conducted on synthetic and real datasets show the superior performance of our proposed method. Both our theoretical and experimental findings demonstrate the effectiveness of public data compared to private data, which leads to practical suggestions for prioritizing non-private data collection.	翻訳日:2023-11-21 20:44:56 公開日:2023-11-19
# 不均一ハイパーグラフニューラルネットワークの自己教師付き事前学習 Self-Supervised Pretraining for Heterogeneous Hypergraph Neural Networks ( http://arxiv.org/abs/2311.11368v1 ) ライセンス: Link先を確認	Abdalgader Abubaker, Takanori Maehara, Madhav Nimishakavi, Vassilis Plachouras	(参考訳) 近年,グラフニューラルネットワーク(GNNs)の事前学習手法がラベルなしグラフデータから効果的な表現を学習するのに成功している。しかし、これらの手法のほとんどはグラフの対関係に依存しており、エンティティ間の根底にある高次関係を捉えていない。ハイパーグラフは、データ内のエンティティ間の高次関係を効果的にモデル化できる汎用的で表現力のある構造である。ハイパーグラフにGNNを適用する努力(HyperGNN)にもかかわらず、現在、異種ハイパーグラフ上でのHyperGNNの完全に自己教師付きの事前学習手法は存在しない。本稿では,異種HyperGNNのための自己教師型事前学習フレームワークであるSPHHを提案する。本手法は,データ内のエンティティ間の高次関係を自己教師付きで効果的に捉えることができる。 SPHHは、ハイパーグラフ構造から派生した情報表現を用いて、ハイパーグラフ内のエンティティの局所的およびグローバル的表現を同時に学習することを目的とした2つの自己教師型事前訓練タスクからなる。全体としては,HyperGNNの自己教師付き事前学習の分野において重要な進歩を示し,ハイパーグラフ構成にマッピングされたノード分類やリンク予測タスクなど,グラフベースのダウンストリームタスクのパフォーマンス向上の可能性を示す。 4つの異なるHyperGNNモデルを用いた2つの実世界のベンチマーク実験により、提案したSPHHフレームワークは、様々な下流タスクにおける最先端のベースラインを一貫して上回ることを示す。その結果、SPHHは、アーキテクチャや複雑さに関わらず、様々な下流タスクにおける様々なHyperGNNモデルの性能を向上させることができることが示され、フレームワークの堅牢性を強調している。 Recently, pretraining methods for Graph Neural Networks (GNNs) have been successful at learning effective representations from unlabeled graph data. However, most of these methods rely on pairwise relations in the graph and do not capture the underlying higher-order relations between entities. Hypergraphs are versatile and expressive structures that can effectively model higher-order relationships among entities in the data. Despite the efforts to adapt GNNs to hypergraphs (HyperGNN), there are currently no fully self-supervised pretraining methods for HyperGNN on heterogeneous hypergraphs. In this paper, we present SPHH, a novel self-supervised pretraining framework for heterogeneous HyperGNNs. Our method is able to effectively capture higher-order relations among entities in the data in a self-supervised manner. 
SPHH consists of two self-supervised pretraining tasks that aim to simultaneously learn both local and global representations of the entities in the hypergraph by using informative representations derived from the hypergraph structure. Overall, our work presents a significant advancement in the field of self-supervised pretraining of HyperGNNs, and has the potential to improve the performance of various graph-based downstream tasks, such as node classification and link prediction tasks that are mapped to a hypergraph configuration. Our experiments on two real-world benchmarks using four different HyperGNN models show that our proposed SPHH framework consistently outperforms state-of-the-art baselines in various downstream tasks. The results demonstrate that SPHH is able to improve the performance of various HyperGNN models in various downstream tasks, regardless of their architecture or complexity, which highlights the robustness of our framework.	翻訳日:2023-11-21 20:44:46 公開日:2023-11-19
# 証拠不確かさの定量化:分散に基づく視点 Evidential Uncertainty Quantification: A Variance-Based Perspective ( http://arxiv.org/abs/2311.11367v1 ) ライセンス: Link先を確認	Ruxiao Duan, Brian Caffo, Harrison X. Bai, Haris I. Sair, Craig Jones	(参考訳) 深層ニューラルネットワークの不確かさの定量化は研究の活発な分野となり、アクティブラーニングのような下流の様々なタスクにおいて重要な役割を担っている。近年の証拠的深層学習(evidential deep learning)の進歩は, モデルの1回の順伝播による偶然的(aleatoric)および認識的(epistemic)不確かさの直接定量化に光を当てている。ほとんどの伝統的なアプローチでは、エントロピーに基づく方法で分類における証拠的不確かさを導き、サンプルレベルでの不確かさを定量化している。しかし,回帰問題に広く適用されている分散ベースの手法は,分類設定においてほとんど用いられない。本研究では,回帰から分類へ分散ベースのアプローチを適用し,クラスレベルでの分類の不確かさを定量化する。回帰における分散分解手法は、全共分散の法則に基づく分類におけるクラス共分散分解に拡張され、クラス相関も共分散から導出される。クロスドメインデータセットの実験は、分散ベースのアプローチが、アクティブドメイン適応におけるエントロピーベースのアプローチと同等の精度をもたらすだけでなく、クラスワイズの不確実性やクラス間の相関に関する情報をもたらすことを示す。コードはhttps://github.com/KerryDRX/EvidentialADAで入手できる。この証拠的不確実性定量化の代替手段は、クラスの不確実性や相関が応用において重要である場合、研究者により多くの選択肢を与える。 Uncertainty quantification of deep neural networks has become an active field of research and plays a crucial role in various downstream tasks such as active learning. Recent advances in evidential deep learning shed light on the direct quantification of aleatoric and epistemic uncertainties with a single forward pass of the model. Most traditional approaches adopt an entropy-based method to derive evidential uncertainty in classification, quantifying uncertainty at the sample level. However, the variance-based method that has been widely applied in regression problems is seldom used in the classification setting. In this work, we adapt the variance-based approach from regression to classification, quantifying classification uncertainty at the class level. The variance decomposition technique in regression is extended to class covariance decomposition in classification based on the law of total covariance, and the class correlation is also derived from the covariance. 
Experiments on cross-domain datasets are conducted to illustrate that the variance-based approach not only results in similar accuracy as the entropy-based one in active domain adaptation but also brings information about class-wise uncertainties as well as between-class correlations. The code is available at https://github.com/KerryDRX/EvidentialADA. This alternative means of evidential uncertainty quantification will give researchers more options when class uncertainties and correlations are important in their applications.	翻訳日:2023-11-21 20:44:19 公開日:2023-11-19
# 古典データ符号化のための量子アクセスモデルの回路複雑性について On circuit complexity of quantum access models for encoding classical data ( http://arxiv.org/abs/2311.11365v1 ) ライセンス: Link先を確認	Xiao-Ming Zhang, Xiao Yuan	(参考訳) 古典的なデータエンコーディングは通常、オラクルベースの量子アルゴリズムではブラックボックスとして扱われる。一方,それらの構成は実用的なアルゴリズムの実装に不可欠である。ここでは、データエンコーディングのブラックボックスを開き、典型的な量子アクセスモデルを構築する際のClifford$+T$の複雑さを調べます。一般の行列に対して、スパースアクセス入力モデルとブロックエンコーディングの両方が、行列がスパースであっても、行列次元に対してほぼ線形回路複雑度を必要とすることを示す。また、ほぼ最適のゲート複雑性を達成する構築プロトコルも提供します。一方、行列が効率的なユニタリの多項式個の項の線形結合である場合、構成はデータ量子ビットに対して効率的になる。典型的な例として、これらのユニタリがPauli文字列である場合のブロック符号化の改善を提案する。私たちのプロトコルは、改良された量子状態準備とPauli文字列の選択オラクルに基づいて構築されており、これらはそれ自体にも独立した価値がある。我々のアクセスモデル構築は、調整可能な補助量子ビット数を提供し、対応する時空トレードオフを提供する。 Classical data encoding is usually treated as a black box in oracle-based quantum algorithms. On the other hand, their constructions are crucial for practical algorithm implementations. Here, we open the black boxes of data encoding and study the Clifford$+T$ complexity of constructing some typical quantum access models. For general matrices, we show that both sparse-access input models and block-encoding require nearly linear circuit complexities relative to the matrix dimension, even if matrices are sparse. We also give construction protocols achieving near-optimal gate complexities. On the other hand, the construction becomes efficient with respect to the data qubit when the matrix is a linear combination of polynomially many terms of efficient unitaries. As a typical example, we propose improved block encoding when these unitaries are Pauli strings. Our protocols are built upon improved quantum state preparation and a selective oracle for Pauli strings, which hold independent value. Our access model constructions offer considerable flexibility, allowing for a tunable number of ancillary qubits and offering corresponding space-time trade-offs.	翻訳日:2023-11-21 20:43:57 公開日:2023-11-19
# 対称性不変量子機械学習力場 Symmetry-invariant quantum machine learning force fields ( http://arxiv.org/abs/2311.11362v1 ) ライセンス: Link先を確認	Isabel Nha Minh Le, Oriel Kiss, Julian Schuhmacher, Ivano Tavernelli and Francesco Tacchino	(参考訳) 機械学習技術は、原子論シミュレーションのための効率的で正確な力場を計算するのに欠かせないツールである。このアプローチは最近、量子コンピューティングの手法を取り入れるために拡張され、潜在的なエネルギー表面や原子力を予測するために変分量子学習モデルが用いられるようになった。しかしながら、そのようなモデルのトレーニング容易性とスケーラビリティは、理論的および実用的障壁の両方のため、依然として制限されている。近年の幾何学的古典的および量子的機械学習の発展に触発されて、我々は、データにインスパイアされた先行として、物理的に関連する幅広い対称性を明示的に組み込む量子ニューラルネットワークを設計した。我々の不変量子学習モデルは、複雑性が増大する個々の分子において、より一般的なものよりも優れています。さらに,複数の成分を持つシステムの最小例として水二量体について検討し,提案手法の汎用性を示し,より大きなシミュレーションへの道を開く。以上の結果から,分子力場の生成は幾何学的量子機械学習の枠組みを活用し,化学系は高度な量子機械学習ツールの開発と応用のための興味深く豊かな場であることが示唆された。 Machine learning techniques are essential tools to compute efficient, yet accurate, force fields for atomistic simulations. This approach has recently been extended to incorporate quantum computational methods, making use of variational quantum learning models to predict potential energy surfaces and atomic forces from ab initio training data. However, the trainability and scalability of such models are still limited, due to both theoretical and practical barriers. Inspired by recent developments in geometric classical and quantum machine learning, here we design quantum neural networks that explicitly incorporate, as a data-inspired prior, an extensive set of physically relevant symmetries. We find that our invariant quantum learning models outperform their more generic counterparts on individual molecules of growing complexity. Furthermore, we study a water dimer as a minimal example of a system with multiple components, showcasing the versatility of our proposed approach and opening the way towards larger simulations. 
Our results suggest that molecular force field generation can significantly profit from leveraging the framework of geometric quantum machine learning, and that chemical systems represent, in fact, an interesting and rich playground for the development and application of advanced quantum machine learning tools.	翻訳日:2023-11-21 20:43:41 公開日:2023-11-19
# 手のひら印字認識のためのスケールアウェアコンペティションネットワーク Scale-aware competition network for palmprint recognition ( http://arxiv.org/abs/2311.11354v1 ) ライセンス: Link先を確認	Chengrui Gao, Ziyuan Yang, Min Zhu, Andrew Beng Jin Teo	(参考訳) Palmprintのバイオメトリックスは、パームスキャンによる支払いと社会保障に注意を向けた。しかし,テクスチャの寸法を無視して,テクスチャの配向を優先する手法が主流であった。我々は,この制約を解消するために,イントラスケールとイントラスケールの機能を同時抽出する革新的なネットワークを設計した。本稿では,ISCM(Inner-Scale Competition Module)とASCM(Across-Scale Competition Module)を含むSAC-Net(Scale-Aware competitive Network)を提案する。 ISCMは学習可能なGaborフィルタと自己認識機構を効率的に統合し、リッチな向きデータを抽出し、長距離識別特性を持つテクスチャを識別する。その後、ASCMは様々なスケールの競争戦略を活用して、競合するテクスチャスケールの要素を効果的にカプセル化する。 iscm と ascm を併用することにより, パームプリントの特徴を特徴付ける。 3つのベンチマークデータセットにまたがる厳密な実験は、最先端の代替案と比較して、提案手法の例外的な認識性能と回復力を示している。 Palmprint biometrics garner heightened attention in palm-scanning payment and social security due to their distinctive attributes. However, prevailing methodologies singularly prioritize texture orientation, neglecting the significant texture scale dimension. We design an innovative network for concurrently extracting intra-scale and inter-scale features to redress this limitation. This paper proposes a scale-aware competitive network (SAC-Net), which includes the Inner-Scale Competition Module (ISCM) and the Across-Scale Competition Module (ASCM) to capture texture characteristics related to orientation and scale. ISCM efficiently integrates learnable Gabor filters and a self-attention mechanism to extract rich orientation data and discern textures with long-range discriminative properties. Subsequently, ASCM leverages a competitive strategy across various scales to effectively encapsulate the competitive texture scale elements. By synergizing ISCM and ASCM, our method adeptly characterizes palmprint features. Rigorous experimentation across three benchmark datasets unequivocally demonstrates our proposed approach's exceptional recognition performance and resilience relative to state-of-the-art alternatives.	翻訳日:2023-11-21 20:43:20 公開日:2023-11-19
# 規制の代替:パブリックAIの事例 An Alternative to Regulation: The Case for Public AI ( http://arxiv.org/abs/2311.11350v1 ) ライセンス: Link先を確認	Nicholas Vincent, David Bau, Sarah Schwettmann, Joshua Tan	(参考訳) 政府はAIを構築できるのか? 本稿では、政府や他の公共機関が資金提供し、提供し、管理する「公的なAI」 - 公開アクセス可能なAIモデルを開発するための継続的な取り組みについて述べる。パブリックAIは、AIに対する標準的な規制アプローチの代替と補完の両方を提供するが、同時に新しい技術とポリシーの課題も示唆している。我々は、MLリサーチコミュニティがこのイニシアチブを形作り、その実装をサポートするためのロードマップと、パブリックAIが他の責任あるAIイニシアチブを補完する方法について提示する。 Can governments build AI? In this paper, we describe an ongoing effort to develop "public AI": publicly accessible AI models funded, provisioned, and governed by governments or other public bodies. Public AI presents both an alternative and a complement to standard regulatory approaches to AI, but it also suggests new technical and policy challenges. We present a roadmap for how the ML research community can help shape this initiative and support its implementation, and how public AI can complement other responsible AI initiatives.	翻訳日:2023-11-21 20:43:01 公開日:2023-11-19
# カバレッジと妥当性を考慮したアルゴリズム的リコース Coverage-Validity-Aware Algorithmic Recourse ( http://arxiv.org/abs/2311.11349v1 ) ライセンス: Link先を確認	Ngoc Bui, Duy Nguyen, Man-Chung Yue, Viet Anh Nguyen	(参考訳) アルゴリズム的リコースは、機械学習モデルの説明可能性、透明性、ひいては倫理を促進するための顕著な技術として浮上している。既存のアルゴリズム的リコースのアプローチは不変な予測モデルを仮定することが多いが、予測モデルは通常、新しいデータの到着時に更新される。したがって、現在のモデルに対して有効なリコースは、将来のモデルでは無効になる可能性がある。そこで本研究では,モデルシフトに対するロバスト性を示すモデルに依存しないリコースを生成する新しい枠組みを提案する。まず,非線形(ブラックボックス)モデルのカバレッジと妥当性を考慮した線形サロゲートを構築し,そのリコースを線形サロゲートに対して生成する。我々は, カバレッジと妥当性を考慮した線形サロゲートと minimax probability machines (MPM) との理論的関係を確立する。そして、異なる共分散の頑健性を規定することで、提案フレームワークは$\ell_2$-正則化やクラス重み付けを含むMPMの一般的な正則化を回復することを証明する。さらに,我々のサロゲートが近似超平面を直観的に押し出し,ロバストなだけでなく解釈可能なリコースも促進することを示した。数値的な結果は,我々のフレームワークの有用性と堅牢性を示している。 Algorithmic recourse emerges as a prominent technique to promote the explainability, transparency and hence ethics of machine learning models. Existing algorithmic recourse approaches often assume an invariant predictive model; however, the predictive model is usually updated upon the arrival of new data. Thus, a recourse that is valid with respect to the present model may become invalid for the future model. To resolve this issue, we propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts. Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model; then, the recourse is generated with respect to the linear surrogate. We establish a theoretical connection between our coverage-validity-aware linear surrogate and the minimax probability machines (MPM). We then prove that by prescribing different covariance robustness, the proposed framework recovers popular regularizations for MPM, including the $\ell_2$-regularization and class-reweighting. Furthermore, we show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses. The numerical results demonstrate the usefulness and robustness of our framework.	
翻訳日:2023-11-21 20:42:49 公開日:2023-11-19
# 異方性開量子ラビ模型における多臨界散逸相転移 Multicritical dissipative phase transitions in the anisotropic open quantum Rabi model ( http://arxiv.org/abs/2311.11346v1 ) ライセンス: Link先を確認	Guitao Lyu, Korbinian Kottmann, Martin B. Plenio, Myung-Joong Hwang	(参考訳) 回転項と反回転項の結合強度の異方性の程度を変えて一階及び二階の散逸相転移を示す異方性開量子ラビモデルの非平衡定常状態について検討する。半古典的および量子的アプローチの両方を用いて、異方性と散逸の間の相互作用から生じる豊富な位相図を見つける。まず、通常相と超放射相の両方が安定な双安定相が存在する。第2に、第1および第2次相転移の位相境界が一致する多臨界点が存在する。新しい臨界指数の集合が多臨界点のスケーリングを支配していることを示す。最後に,ラマン遷移の強度制御により異方性が調整可能な一対の捕捉イオンを用いて,多臨界遷移と双安定性の観測の実現可能性について検討する。本研究は, 有限成分量子系における臨界現象の範囲を拡大し, 臨界量子センシングへの応用に有用であることを示す。 We investigate the nonequilibrium steady state of the anisotropic open quantum Rabi model, which exhibits first-order and second-order dissipative phase transitions upon varying the degree of anisotropy between the coupling strengths of rotating and counterrotating terms. Using both semiclassical and quantum approaches, we find a rich phase diagram resulting from the interplay between the anisotropy and the dissipation. First, there exists a bistable phase where both the normal and superradiant phases are stable. Second, there are multicritical points where the phase boundaries for the first- and second-order phase transitions meet. We show that a new set of critical exponents governs the scaling of the multicritical points. Finally, we discuss the feasibility of observing the multicritical transitions and bistability using a pair of trapped ions where the anisotropy can be tuned by controlling the intensity of the Raman transitions. Our study enlarges the scope of critical phenomena that may occur in finite-component quantum systems, which could be useful for applications in critical quantum sensing.	翻訳日:2023-11-21 20:42:27 公開日:2023-11-19
# 連続変数への新しい埋め込みを用いた高速化逆モデリングのための生成モデル A Generative Model for Accelerated Inverse Modelling Using a Novel Embedding for Continuous Variables ( http://arxiv.org/abs/2311.11343v1 ) ライセンス: Link先を確認	Sébastien Bompas and Stefan Sandfeld	(参考訳) 材料科学において、望ましい性質を持つ材料を迅速に試作するという課題は、しばしば適切な微細構造を見つけるために広範囲な実験を必要とする。さらに、与えられた性質に対する微細構造の発見は、一般に複数の解が存在する可能性のある不良設定問題である。生成機械学習モデルを使用することは、計算コストの低減にも有効な解決策となりうる。これは、例えばモデルへの条件付け入力として連続プロパティ変数を必要とするため、新しい課題が伴う。本稿では,既存手法の欠点を考察し,浮動小数点数のバイナリ表現に基づく生成モデルの新たな埋め込み戦略と比較する。これにより正規化の必要性を排除し、情報を保存し、生成モデルを条件付けするための汎用的な埋め込み空間を作成する。この手法は任意の数にネットワークを条件付けし、生成した微細構造画像のきめ細かい制御を提供し、加速材料設計に寄与することができる。 In materials science, the challenge of rapidly prototyping materials with desired properties often involves extensive experimentation to find suitable microstructures. Additionally, finding microstructures for given properties is typically an ill-posed problem where multiple solutions may exist. Using generative machine learning models can be a viable solution which also reduces the computational cost. This comes with new challenges because, e.g., a continuous property variable as conditioning input to the model is required. We investigate the shortcomings of an existing method and compare this to a novel embedding strategy for generative models that is based on the binary representation of floating point numbers. This eliminates the need for normalization, preserves information, and creates a versatile embedding space for conditioning the generative model. This technique can be applied to condition a network on any number, to provide fine control over generated microstructure images, thereby contributing to accelerated materials design.	翻訳日:2023-11-21 20:42:07 公開日:2023-11-19
# 複数モーダルの同時埋め込み学習による出現コード Appearance Codes using Joint Embedding Learning of Multiple Modalities ( http://arxiv.org/abs/2311.11427v1 ) ライセンス: Link先を確認	Alex Zhang and Evan Dogariu	(参考訳) 近年のジェネレーティブ・モデリングにおける外観コードの使用により、シーンの昼夜のレンダリングなど、様々な外観と照明を備えた新しいビューレンダリングが可能となった。この手法の大きな限界は,推論時に各シーンで新たな外観コードを再学習する必要があることであり,本研究では,異なるモード間のコントラスト的損失制約を強制することにより,シーンの外観と構造に対する共同埋め込み空間を学習するフレームワークを提案する。我々はRADIATEデータセット上の単純な変分オートエンコーダモデルに適用し、付加的な最適化イテレーションなしで、昼間の外観コードを用いて夜間画像の新しいレンダリングを生成することができることを定性的に示す。さらに,標準的な画像毎外観コード技術を用いたベースラインVAEと比較し,推論時の未知の画像の外観コードを学習することなく,同様の品質の生成を実現できることを示す。 The use of appearance codes in recent work on generative modeling has enabled novel view renders with variable appearance and illumination, such as day-time and night-time renders of a scene. A major limitation of this technique is the need to re-train new appearance codes for every scene at inference, so in this work we address this problem by proposing a framework that learns a joint embedding space for the appearance and structure of the scene by enforcing a contrastive loss constraint between different modalities. We apply our framework to a simple Variational Auto-Encoder model on the RADIATE dataset and qualitatively demonstrate that we can generate new renders of night-time photos using day-time appearance codes without additional optimization iterations. Additionally, we compare our model to a baseline VAE that uses the standard per-image appearance code technique and show that our approach achieves generations of similar quality without learning appearance codes for any unseen images at inference.	翻訳日:2023-11-21 20:34:16 公開日:2023-11-19
# テンソルアウェアエネルギー会計 Tensor-Aware Energy Accounting ( http://arxiv.org/abs/2311.11424v1 ) ライセンス: Link先を確認	Timur Babakol and Yu David Liu	(参考訳) ディープラーニング(DL)がサポートする人工知能(AI)アプリケーションの急速な成長に伴い、これらのアプリケーションのエネルギー効率は持続可能性に大きな影響を与えている。 SmaragdineはTensorFlowで実装されたテンソルベースのDLプログラムのための新しいエネルギー会計システムである。 SmaragdineはDLプログラムの内部構造を認識しており、テンソル対応エネルギー会計と呼んでいる。スマラグジンでは、DLプログラムのエネルギー消費は、その論理的階層的な分解構造に沿った単位に分解することができる。我々は、最も広く使われている言語モデルの一つであるBERTのエネルギー挙動を理解するためにSmaragdineを適用した。 Smaragdineは、BERTの最も高いエネルギー/電力消費成分を識別することができる。さらに,Smaragdineが下流のツールチェーン構築をどのようにサポートしているかを事例として,BERTのハイパーパラメータチューニングによるエネルギー影響と,BERTが次世代のALBERTに進化する際のエネルギー挙動の進化を比較検討した。 With the rapid growth of Artificial Intelligence (AI) applications supported by deep learning (DL), the energy efficiency of these applications has an increasingly large impact on sustainability. We introduce Smaragdine, a new energy accounting system for tensor-based DL programs implemented with TensorFlow. At the heart of Smaragdine is a novel white-box methodology of energy accounting: Smaragdine is aware of the internal structure of the DL program, which we call tensor-aware energy accounting. With Smaragdine, the energy consumption of a DL program can be broken down into units aligned with its logical hierarchical decomposition structure. We apply Smaragdine for understanding the energy behavior of BERT, one of the most widely used language models. Layer-by-layer and tensor-by-tensor, Smaragdine is capable of identifying the highest energy/power-consuming components of BERT. Furthermore, we conduct two case studies on how Smaragdine supports downstream toolchain building, one on the comparative energy impact of hyperparameter tuning of BERT, the other on the energy behavior evolution when BERT evolves to its next generation, ALBERT.	翻訳日:2023-11-21 20:33:57 公開日:2023-11-19
# 混合データセットを用いた無線ネットワーク最適化のためのオフライン強化学習 Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets ( http://arxiv.org/abs/2311.11423v1 ) ライセンス: Link先を確認	Kun Yang, Cong Shen, Jing Yang, Shu-ping Yeh, Jerry Sydir	(参考訳) 近年の強化学習(RL)の発展は、無線リソース管理(RRM)におけるオンラインRLの採用を促進している。しかし、オンラインRLアルゴリズムは環境との直接の相互作用を必要とするが、RLにおける避けられない探索による潜在的な性能損失を考えると、望ましくないかもしれない。本研究ではまず,RRM問題の解法におけるオフラインRLアルゴリズムの利用について検討する。我々は,ユーザスケジューリングによって総和レートと5パーセンタイルレートの線形結合を最大化することを目的とした特定のRRM問題に対して,動作制約付きQラーニング(BCQ),保守的Qラーニング(CQL),暗黙的Qラーニング(IQL)を含む,最先端のオフラインRLアルゴリズムを評価した。 RRM問題に対するオフラインRLの性能は、データ収集に使用される行動ポリシーに極めて依存しており、さらに、異なる行動ポリシーによって収集される異種データセットを活用する新しいオフラインRLソリューションを提案する。データセットの適切な混合により、オフラインRLは、すべての関連する行動ポリシーが極めて準最適である場合でも、ほぼ最適なRLポリシーを生成することができることを示す。 The recent development of reinforcement learning (RL) has boosted the adoption of online RL for wireless radio resource management (RRM). However, online RL algorithms require direct interactions with the environment, which may be undesirable given the potential performance loss due to the unavoidable exploration in RL. In this work, we first investigate the use of offline RL algorithms in solving the RRM problem. We evaluate several state-of-the-art offline RL algorithms, including behavior constrained Q-learning (BCQ), conservative Q-learning (CQL), and implicit Q-learning (IQL), for a specific RRM problem that aims at maximizing a linear combination of sum and 5-percentile rates via user scheduling. We observe that the performance of offline RL for the RRM problem depends critically on the behavior policy used for data collection, and further propose a novel offline RL solution that leverages heterogeneous datasets collected by different behavior policies. We show that with a proper mixture of the datasets, offline RL can produce a near-optimal RL policy even when all involved behavior policies are highly suboptimal.	翻訳日:2023-11-21 20:33:38 公開日:2023-11-19
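The two ingredients named in the abstract above, an objective that combines sum and 5-percentile rates and a mixture of datasets from different behavior policies, can be sketched as follows. Everything specific here is an assumption made for illustration: the function names, the weight of 0.5, the nearest-rank percentile convention, and sampling with replacement. The abstract does not state how the paper defines or mixes these.

```python
import random

def rrm_objective(user_rates, weight=0.5):
    """Weighted combination of the sum rate and the 5th-percentile user rate."""
    ordered = sorted(user_rates)
    # Nearest-rank 5th percentile, computed with integer arithmetic: ceil(5n/100).
    rank = max(1, (5 * len(ordered) + 99) // 100)
    return weight * sum(ordered) + (1.0 - weight) * ordered[rank - 1]

def mix_datasets(datasets, proportions, size, seed=0):
    """Draw a mixed offline dataset; proportions are fractions of the final size."""
    rng = random.Random(seed)
    mixed = []
    for data, share in zip(datasets, proportions):
        # Sampling with replacement keeps the sketch valid for small datasets.
        mixed.extend(rng.choices(data, k=round(share * size)))
    rng.shuffle(mixed)
    return mixed

print(rrm_objective(list(range(1, 21))))
print(mix_datasets([["policy_a"] * 10, ["policy_b"] * 10], [0.25, 0.75], size=8))
```

The mixed list would then be handed to an offline RL learner such as BCQ, CQL, or IQL in place of a single-policy dataset.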
# 識別不能閾値における精度:分類アルゴリズムの評価法 Precision at the indistinguishability threshold: a method for evaluating classification algorithms ( http://arxiv.org/abs/2311.11422v1 ) ライセンス: Link先を確認	David J. T. Sumpter	(参考訳) aucやf1-scoreなど、分類アルゴリズムのパフォーマンスを評価するための単一の数値メトリクスは幅広く存在する(wikipediaは17の指標をリストアップし、27の異なる名前を持っている)。本稿では,猫と実猫を区別できないようにアルゴリズムが調整された場合,猫が実際に猫を包んでいるとラベル付けされた画像が,どれくらいの頻度で存在するのか,という疑問に答えるための新しい指標を提案する。この計量を構成するステップは次のとおりである。まず、アルゴリズムが2つの無作為なチョセン画像(例えば、猫を含むとラベルづけされた画像)と、実際に猫を含む画像から1つの画像(つまり、猫を含むとラベルされた画像)を示すとき、最も高いスコアを持つ画像が実際の猫画像の集合から選択された画像の確率が50\%であるように閾値スコアを設定する。この判定閾値では、正のラベル付き画像の集合は正の画像の集合と区別できない。 2番目のステップとして、猫を含むものとしてラベル付けされた画像からランダムに選択された画像が実際に猫を含む頻度を問うことで、パフォーマンスを測定する。この計量は「区別不能閾値での精度」と考えることができる。この新しいメトリクスは、これらのメトリクスすべてに固有の精度とリコールのトレードオフに対処するものではないが、このメソッドがAUCなどの使用時に発生する落とし穴を回避し、例えばF1スコアよりもモチベーションがよいことを示す。 There exist a wide range of single number metrics for assessing performance of classification algorithms, including AUC and the F1-score (Wikipedia lists 17 such metrics, with 27 different names). In this article, I propose a new metric to answer the following question: when an algorithm is tuned so that it can no longer distinguish labelled cats from real cats, how often does a randomly chosen image that has been labelled as containing a cat actually contain a cat? The steps to construct this metric are as follows. First, we set a threshold score such that when the algorithm is shown two randomly-chosen images -- one that has a score greater than the threshold (i.e. a picture labelled as containing a cat) and another from those pictures that really does contain a cat -- the probability that the image with the highest score is the one chosen from the set of real cat images is 50\%. At this decision threshold, the set of positively labelled images are indistinguishable from the set of images which are positive. 
Then, as a second step, we measure performance by asking how often a randomly chosen picture from those labelled as containing a cat actually contains a cat. This metric can be thought of as "precision at the indistinguishability threshold". While this new metric doesn't address the tradeoff between precision and recall inherent to all such metrics, I do show why this method avoids pitfalls that can occur when using, for example, AUC, and it is better motivated than, for example, the F1-score.	翻訳日:2023-11-21 20:33:18 公開日:2023-11-19
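The two steps described in this abstract can be sketched directly. The code below is an illustration under stated assumptions, not the author's reference implementation: ties between scores count as half a win, candidate thresholds are the observed scores plus negative infinity, and the threshold whose win probability is closest to 50% is chosen. The function names are hypothetical.

```python
def win_probability(real_scores, labelled_scores):
    """P(a random real positive outscores a random labelled positive); ties count 1/2."""
    wins = 0.0
    for r in real_scores:
        for s in labelled_scores:
            if r > s:
                wins += 1.0
            elif r == s:
                wins += 0.5
    return wins / (len(real_scores) * len(labelled_scores))

def precision_at_indistinguishability(scores, labels):
    """Return (threshold, precision) where the win probability is closest to 0.5."""
    real = [s for s, y in zip(scores, labels) if y == 1]
    best = None
    for t in [float("-inf")] + sorted(set(scores)):
        # Step 1: images scoring above t are the ones labelled positive.
        labelled = [(s, y) for s, y in zip(scores, labels) if s > t]
        if not labelled:
            continue
        gap = abs(win_probability(real, [s for s, _ in labelled]) - 0.5)
        if best is None or gap < best[0]:
            # Step 2: precision among the labelled positives at this threshold.
            precision = sum(y for _, y in labelled) / len(labelled)
            best = (gap, t, precision)
    return best[1], best[2]

scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(precision_at_indistinguishability(scores, labels))
```

The quadratic pair count is fine for a small example; a rank-based computation would be the natural replacement for large datasets.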
# LifeLearner:組み込みコンピューティングプラットフォームのためのハードウェア対応メタ継続学習システム LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms ( http://arxiv.org/abs/2311.11420v1 ) ライセンス: Link先を確認	Young D. Kwon, Jagmohan Chauhan, Hong Jia, Stylianos I. Venieris, and Cecilia Mascolo	(参考訳) 連続学習(continual learning, cl)は、ユーザのパーソナライゼーションや家庭用ロボットといったアプリケーションに対して、オンザフライで学習とコンテキスト適応を可能にする。これはコンテキスト、アクション、ユーザが変更する場合に重要な機能です。しかし、リソース制約のある組み込みシステムでCLを有効にすることは、ラベル付きデータ、メモリ、計算能力に制限があるため困難である。本稿では,システムリソース(低メモリ,レイテンシ,エネルギー消費)を劇的に最適化し,高い精度を保ちながら,ハードウェアを意識したメタ連続学習システムlifelearnerを提案する。具体的には,(1)データ不足問題に明示的に対処し,高い精度を確保するためのメタラーニングとリハーサル戦略,(2)損失のない圧縮を効果的に組み合わせてCLとリハーサルサンプルのリソース要求を大幅に削減する,(3)ハードウェア特性を考慮した組込みおよびIoTプラットフォーム上でのハードウェア認識システムを開発する。その結果、lifelearnerは、oracleのベースラインと比較して精度が2.8%低下し、ほぼ最適のcl性能を達成している。最先端(SOTA)メタCL法では、LifeLearnerはメモリフットプリントを(178.7x)大幅に削減し、エンドツーエンドのレイテンシを80.8-94.2%、エネルギー消費を80.9-94.2%削減した。さらに、2つのエッジデバイスとマイクロコントローラユニットにLifeLearnerを配置し、リソース制約のあるプラットフォーム上で、SOTAメソッドの実行が不可能な効率的なCLと、適応可能なCLをユビキタスに極端に展開することを可能にする。コードはhttps://github.com/theyoungkwon/lifelearnerで入手できる。 Continual Learning (CL) allows applications such as user personalization and household robots to learn on the fly and adapt to context. This is an important feature when context, actions, and users change. However, enabling CL on resource-constrained embedded systems is challenging due to the limited labeled data, memory, and computing capacity. In this paper, we propose LifeLearner, a hardware-aware meta continual learning system that drastically optimizes system resources (lower memory, latency, energy consumption) while ensuring high accuracy. 
Specifically, we (1) exploit meta-learning and rehearsal strategies to explicitly cope with data scarcity issues and ensure high accuracy, (2) effectively combine lossless and lossy compression to significantly reduce the resource requirements of CL and rehearsal samples, and (3) develop a hardware-aware system for embedded and IoT platforms that takes their hardware characteristics into account. As a result, LifeLearner achieves near-optimal CL performance, falling short by only 2.8% on accuracy compared to an Oracle baseline. With respect to the state-of-the-art (SOTA) Meta CL method, LifeLearner drastically reduces the memory footprint (by 178.7x), end-to-end latency by 80.8-94.2%, and energy consumption by 80.9-94.2%. In addition, we successfully deployed LifeLearner on two edge devices and a microcontroller unit, thereby enabling efficient CL on resource-constrained platforms where it would be impractical to run SOTA methods and the far-reaching deployment of adaptable CL in a ubiquitous manner. Code is available at https://github.com/theyoungkwon/LifeLearner.	翻訳日:2023-11-21 20:32:51 公開日:2023-11-19
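Point (2) of this abstract combines lossy and lossless compression for rehearsal samples. The sketch below shows one generic way such a pipeline can be stacked, assuming 8-bit uniform quantisation as the lossy stage and zlib (DEFLATE) as the lossless stage. Both choices and the function names are assumptions for illustration; the abstract does not say which codecs LifeLearner uses.

```python
import zlib

def compress_sample(values, lo=0.0, hi=1.0):
    """Lossy stage: 8-bit uniform quantisation. Lossless stage: zlib."""
    scale = 255.0 / (hi - lo)
    # Clamp to the byte range so out-of-range inputs cannot raise an error.
    quantised = bytes(min(255, max(0, round((v - lo) * scale))) for v in values)
    return zlib.compress(quantised, 9)

def decompress_sample(blob, lo=0.0, hi=1.0):
    """Invert the lossless stage exactly and the lossy stage approximately."""
    step = (hi - lo) / 255.0
    return [lo + q * step for q in zlib.decompress(blob)]

sample = [0.0, 0.25, 0.5, 0.75, 1.0] * 200
blob = compress_sample(sample)
restored = decompress_sample(blob)
print(len(sample), "values ->", len(blob), "bytes")
print("max abs error:", max(abs(a - b) for a, b in zip(sample, restored)))
```

The quantisation step bounds the reconstruction error by half a quantisation level, while the lossless stage only removes redundancy, so the two stages can be tuned independently.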
# DiffSCI:反復スペクトル拡散モデルによるゼロショットスナップショット圧縮イメージング DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model ( http://arxiv.org/abs/2311.11417v1 ) ライセンス: Link先を確認	Zhenghao Pan, Haijin Zeng, Jiezhang Cao, Kai Zhang, Yongyong Chen	(参考訳) 本稿では,マルチスペクトラル画像(msi)のためのスナップショット圧縮画像再構成(sci)の精度向上に尽力する。そこで我々は,既存のSCI技術と画像生成モデルとを融合し,DiffSCIと呼ばれる新規なゼロショット拡散モデルを提案する。 DiffSCIは、より深い事前および最適化に基づく方法論からの構造的洞察を活用し、現代の認知拡散モデルによって提供される生成能力を補完する。具体的には,まず,プラグ・アンド・プレイ・フレームワークにおける生成的デノイザーとして,rgb画像の実質的なコーパスでトレーニングされた事前学習された拡散モデルを用いる。この統合により、特に現在の手法が効果的な解決に苦慮している場合には、SCI再構築が成功する。次に,スペクトル帯域相関を体系的に考慮し,波長ミスマッチを緩和するロバストな手法を導入することで,rgb拡散モデルをmsisにシームレスに適応させることができる。第3に、データサブプロブレムの解像度を早めるために、高速化アルゴリズムを実装した。この増強は収束速度を加速するだけでなく、再構築過程の品質を高める。我々は、DiffSCIが、自己教師付きおよびゼロショットアプローチよりも明確なパフォーマンス向上を示し、シミュレートされたデータセットと実際のデータセットの両方にまたがる教師付きトランスフォーマーよりも優れていることを示すための広範なテストを示す。私たちのコードは利用可能です。 This paper endeavors to advance the precision of snapshot compressive imaging (SCI) reconstruction for multispectral image (MSI). To achieve this, we integrate the advantageous attributes of established SCI techniques and an image generative model, propose a novel structured zero-shot diffusion model, dubbed DiffSCI. DiffSCI leverages the structural insights from the deep prior and optimization-based methodologies, complemented by the generative capabilities offered by the contemporary denoising diffusion model. Specifically, firstly, we employ a pre-trained diffusion model, which has been trained on a substantial corpus of RGB images, as the generative denoiser within the Plug-and-Play framework for the first time. This integration allows for the successful completion of SCI reconstruction, especially in the case that current methods struggle to address effectively. Secondly, we systematically account for spectral band correlations and introduce a robust methodology to mitigate wavelength mismatch, thus enabling seamless adaptation of the RGB diffusion model to MSIs. 
Thirdly, an accelerated algorithm is implemented to expedite the resolution of the data subproblem. This augmentation not only accelerates the convergence rate but also elevates the quality of the reconstruction process. We present extensive testing to show that DiffSCI exhibits discernible performance enhancements over prevailing self-supervised and zero-shot approaches, surpassing even supervised transformer counterparts across both simulated and real datasets. Our code will be available.	翻訳日:2023-11-21 20:32:18 公開日:2023-11-19
# 大規模言語モデルに対するセキュリティリスク分類法 A Security Risk Taxonomy for Large Language Models ( http://arxiv.org/abs/2311.11415v1 ) ライセンス: Link先を確認	Erik Derner, Kristina Batistič, Jan Zahálka, Robert Babuška	(参考訳) 大規模言語モデル(LLM)がより多くのアプリケーションに浸透するにつれて、関連するセキュリティリスクの評価がますます必要になる。偽情報からデータ漏洩や評判の損傷まで、悪意のある攻撃者による悪用の可能性はかなり大きい。本稿では,LLMが生み出すセキュリティリスクに着目し,広くカバーされている倫理的,社会的な影響を超えて,現在の研究におけるギャップについて述べる。本研究は,LLMに対するプロンプトベースの攻撃に着目し,ユーザモデル通信パイプラインに沿ったセキュリティリスクの分類法を提案する。ターゲットと攻撃タイプによる攻撃を、プロンプトベースのインタラクションスキームに分類する。分類学は、これらのリスクの実際の影響を示す特定の攻撃例で強化されている。この分類を通じて、堅牢でセキュアなLLMアプリケーションの開発に資する情報を提供し、安全性と信頼性を高めることを目的とする。 As large language models (LLMs) permeate more and more applications, an assessment of their associated security risks becomes increasingly necessary. The potential for exploitation by malicious actors, ranging from disinformation to data breaches and reputation damage, is substantial. This paper addresses a gap in current research by focusing on the security risks posed by LLMs, which extends beyond the widely covered ethical and societal implications. Our work proposes a taxonomy of security risks along the user-model communication pipeline, explicitly focusing on prompt-based attacks on LLMs. We categorize the attacks by target and attack type within a prompt-based interaction scheme. The taxonomy is reinforced with specific attack examples to showcase the real-world impact of these risks. Through this taxonomy, we aim to inform the development of robust and secure LLM applications, enhancing their safety and trustworthiness.	翻訳日:2023-11-21 20:31:54 公開日:2023-11-19
# クロスドメイン時系列解析タスクのための大規模事前学習時系列モデル Large Pre-trained time series models for cross-domain Time series analysis tasks ( http://arxiv.org/abs/2311.11413v1 ) ライセンス: Link先を確認	Harshavardhan Kamarthi, B. Aditya Prakash	(参考訳) 大規模な事前学習モデルは、個々の下流タスクのためのモデルトレーニングをより効率的にし、優れたパフォーマンスを提供するために、言語やビジョンのような領域において重要な進歩に役立っている。しかしながら、時系列分析タスクに取り組むには、通常、タスク特有のトレーニングデータとドメイン専門知識を活用して、スクラッチから分離したモデルを設計およびトレーニングすることが必要となる。我々は、複数の不均一な時系列データセットから一般的な時系列モデルを事前学習するための重要な課題に取り組む:異なるドメインから異なるダイナミクスの時系列をモデル化するためのモデルに意味的に有用な入力を提供する。逐次モデルへの入力として時系列をセグメントに分割することで意味論的により良い入力が得られることを観察し、事前学習中に自己教師付き学習損失を利用した最適なデータセット特異的セグメンテーション戦略を自動的に識別する新しいモデルLPTMを提案する。 LPTMは、ドメイン固有の最先端モデルと同等かそれ以上のパフォーマンスを提供し、複数の異なるドメインからの幅広い時系列分析タスクにおいて、最大40%のデータを取り込み、50%のトレーニング時間で最先端のパフォーマンスを達成することができる。 Large pre-trained models have been instrumental in significant advancements in domains like language and vision making model training for individual downstream tasks more efficient as well as provide superior performance. However, tackling time-series analysis tasks usually involves designing and training a separate model from scratch leveraging training data and domain expertise specific to the task. We tackle a significant challenge for pre-training a general time-series model from multiple heterogeneous time-series dataset: providing semantically useful inputs to models for modeling time series of different dynamics from different domains. We observe that partitioning time-series into segments as inputs to sequential models produces semantically better inputs and propose a novel model LPTM that automatically identifies optimal dataset-specific segmentation strategy leveraging self-supervised learning loss during pre-training. 
LPTM provides performance similar to or better than domain-specific state-of-the-art models and is significantly more data- and compute-efficient, needing up to 40% less data and 50% less training time to reach state-of-the-art performance on a wide range of time-series analysis tasks from multiple disparate domains.	翻訳日:2023-11-21 20:31:39 公開日:2023-11-19
# ニューラル量子埋め込み: 量子教師付き学習の限界を押し上げる Neural Quantum Embedding: Pushing the Limits of Quantum Supervised Learning ( http://arxiv.org/abs/2311.11412v1 ) ライセンス: Link先を確認	Tak Hur, Israel F. Araujo, Daniel K. Park	(参考訳) 量子埋め込みは古典的なデータに量子機械学習技術を適用するのに不可欠であり、性能にかなりの影響を及ぼす。本研究では,古典的深層学習手法を活用し,量子埋め込みを効率的に最適化するニューラル量子埋め込み(nqe)を提案する。 NQEは経験的リスクの低いバウンダリを強化し、分類性能を大幅に改善する。さらに、NQEはノイズに対する堅牢性を改善する。 nqeの有効性を検証するため,画像データ分類のためのibm量子デバイス実験を行い,0.52から0.96までの精度向上を実現した。局所有効次元の数値解析は、nqeが量子ニューラルネットワークのトレーサビリティと一般化性能を向上させることを強調する。さらに、NQEは期待されるリスクの上界の減少によって証明されるように、量子カーネル法における一般化の改善を実現する。 Quantum embedding is indispensable for applying quantum machine learning techniques to classical data, and has substantial impacts on performance outcomes. In this study, we present Neural Quantum Embedding (NQE), a method that efficiently optimizes quantum embedding by leveraging classical deep learning techniques. NQE enhances the lower bound of the empirical risk, leading to substantial improvements in classification performance. Moreover, NQE improves robustness against noise. To validate the effectiveness of NQE, we conduct experiments on IBM quantum devices for image data classification, resulting in a remarkable accuracy enhancement from 0.52 to 0.96. Numerical analysis of the local effective dimension highlights that NQE improves the trainability and generalization performance of quantum neural networks. Furthermore, NQE achieves improved generalization in the quantum kernel method, as evidenced by a reduction in the upper bound of the expected risk.	翻訳日:2023-11-21 20:31:18 公開日:2023-11-19
# 機械ミーニング応用のための交渉表現 Negotiated Representations for Machine Mearning Application ( http://arxiv.org/abs/2311.11410v1 ) ライセンス: Link先を確認	Nuri Korhan, Samet Bayram	(参考訳) オーバーフィッティング(Overfitting)は、マシンラーニングモデルが長時間トレーニングされ、提供されるトレーニングラベルに対するトレーニングサンプルの正確な適合性に過度に集中し、テストデータに有用な予測ルールを追跡することができない場合に発生する現象である。この現象は、通常、特定のサンプルの記憶、ノイズの記憶、および多数のニューロンを用いて限られたサンプルのデータセットにフィットネスを強制することに起因する。トレーニングプロセスが継続するにつれて、モデルが様々な特徴を符号化することは事実であるが、過適合のほとんどは、明確に定義されたメンバーシップ比の調整の過程で起こると論じている。本研究では,事前決定されたクラスラベルを用いたサンプルの出力表現の交渉を可能にすることにより,機械学習モデルの分類精度を向上させる手法を提案する。入力のモデル解釈と提供されたラベルとのネゴシエーションを設定することで,平均的な分類精度を向上させるだけでなく,他の正規化手法を使わずにオーバーフィッティング率を下げることができた。 cifar 10やcifar 100、mnistといった公開データセットからオーバーフィットシナリオを生成することによって、いくつかのローレジーム機械学習問題に対する交渉パラダイムのアプローチを実装することにより、提案手法が、その目的よりも多くの能力を持つことを実証した。実験結果を共有し、機械学習コミュニティに提案されたパラダイムの限界を探らせています。また、継続学習などの他の研究分野における学習課題を克服するために、交渉パラダイムを活用するようコミュニティに促すことも目指している。実験的なセットアップのPythonコードはGitHubにアップロードされる。 Overfitting is a phenomenon that occurs when a machine learning model is trained for too long and focused too much on the exact fitness of the training samples to the provided training labels and cannot keep track of the predictive rules that would be useful on the test data. This phenomenon is commonly attributed to memorization of particular samples, memorization of the noise, and forced fitness into a data set of limited samples by using a high number of neurons. While it is true that the model encodes various peculiarities as the training process continues, we argue that most of the overfitting occurs in the process of reconciling sharply defined membership ratios. In this study, we present an approach that increases the classification accuracy of machine learning models by allowing the model to negotiate output representations of the samples with previously determined class labels. 
By setting up a negotiation between the model's interpretation of the inputs and the provided labels, we not only increased average classification accuracy but also decreased the rate of overfitting without applying any other regularization tricks. By applying our negotiation paradigm to several low-regime machine learning problems, with overfitting scenarios generated from publicly available data sets such as CIFAR-10, CIFAR-100, and MNIST, we have demonstrated that the proposed paradigm has more capacity than its intended purpose. We are sharing the experimental results and inviting the machine learning community to explore the limits of the proposed paradigm. We also aim to encourage the community to exploit the negotiation paradigm to overcome learning-related challenges in other research fields such as continual learning. The Python code of the experimental setup is uploaded to GitHub.	翻訳日:2023-11-21 20:31:03 公開日:2023-11-19
# 未来を設計する: メタバースにおけるエンタープライズ統合のモデル Architecting the Future: A Model for Enterprise Integration in the Metaverse ( http://arxiv.org/abs/2311.11406v1 ) ライセンス: Link先を確認	Amirmohammad Nateghi and Maedeh Mosharraf	(参考訳) 約30年前にさかのぼる歴史があるが、メタバースは今日最も話題になっているテーマの1つに成長してきた。メタバースは当初エンタテインメントに関する議論に限定された後、徐々にビジネス談話の分野での影響を増大させた。メタバースを深く掘り下げる前に、ITに対する不適切な使用や考え方のために情報技術(IT)に大きく依存している企業にとって、失敗とビジネスパスからの逸脱は、非常にありそうである。エンタープライズアーキテクチャ(EA)という考え方は、この問題に対処するためのマネジメント戦略として現れました。 EAの第一の考え方として、企業における不要な負担から指導力、支援力へとITを転換しようとした。そこで,メタバースを基盤としたプラットフォーム上で,EAのアイデアを用いて仮想企業を運営しようとする試みの結果,拡張されたEAモデルを提案する。最後に、概念モデルを評価し、メタバースがビジネスを支援することを実証するために、3つのケーススタディ、分散、バトルインフィニティ、ルームを利用した。 Although it has a history that goes back about three decades, Metaverse has grown to be one of the most talked-about subjects today. Metaverse gradually increased its influence in the realm of business discourse after initially being restricted to discussions about entertainment. Before getting deep into the Metaverse, it should be noted that failure and deviating from the business path are highly likely for an enterprise that relies heavily on information technology (IT) because of improper use and thinking about IT. The idea of enterprise architecture (EA) emerged as a management strategy to address this issue. As the first school of thought of EA, it sought to transform IT from an unnecessary burden in an enterprise to a guiding and supporting force. Then an extended EA model is suggested as a result of the attempt made in this paper to use the idea of EA to steer virtual enterprises on Metaverse-based platforms. Finally, to evaluate the conceptual model and demonstrate that the Metaverse can support businesses, three case studies Decentraland, Battle Infinity, and Rooom were utilized.	翻訳日:2023-11-21 20:30:36 公開日:2023-11-19
# 私をオファーにしよう:観光業における前方・逆オークション問題 Make me an Offer: Forward and Reverse Auctioning Problems in the Tourism Industry ( http://arxiv.org/abs/2311.11400v1 ) ライセンス: Link先を確認	Ioannis T. Christou, Dimitris Doukas, Konstantina Skouri, Gerasimos Meletiou	(参考訳) ほとんどの観光地は、経済と社会に大きな影響を与え、定期的で一貫した季節性に直面している。この現象は、旅行需要が増加したが、地理的に異なる地域において不均一な時代においてより顕著である。 To counter these problems that both customers and hoteliers are facing, we have developed two auctioning systems that allow hoteliers of lower popularity tier areas or during low season periods to auction their rooms in what we call a forward auction model, and also allows customers to initiate a bidding process whereby hoteliers in an area may make offers to the customer for their rooms, in what constitutes a reverse auction model initiated by the customer, similar to the bidding concept of priceline.com. 我々は,両方のオークションを明示的に定義する数学的プログラミングモデルを開発し,各タイプにおいて,宿泊者側と顧客側の両方において,大きな利益が得られることを示す。本稿では,これらの最適化問題の近似解のアルゴリズム的手法について論じるとともに,最適化解法を用いて最適性を保証する。これらの技術は、中低期の季節を減らし、魅力的なオファーを顧客に提供する顧客と宿泊者の両方にとって有益である。 Most tourist destinations are facing regular and consistent seasonality with significant economic and social impacts. This phenomenon is more pronounced in the post-covid era, where demand for travel has increased but unevenly among different geographic areas. To counter these problems that both customers and hoteliers are facing, we have developed two auctioning systems that allow hoteliers of lower popularity tier areas or during low season periods to auction their rooms in what we call a forward auction model, and also allows customers to initiate a bidding process whereby hoteliers in an area may make offers to the customer for their rooms, in what constitutes a reverse auction model initiated by the customer, similar to the bidding concept of priceline.com. 
We develop mathematical programming models that define explicitly both types of auctions, and show that in each type, there are significant benefits to be gained both on the side of the hotelier as well as on the side of the customer. We discuss algorithmic techniques for the approximate solution of these optimization problems, and present results using exact optimization solvers to solve them to guaranteed optimality. These techniques could be beneficial to both customer and hotelier reducing seasonality during middle and low season and providing the customer with attractive offers.	翻訳日:2023-11-21 20:30:18 公開日:2023-11-19
# 深層学習アルゴリズムの解釈可能化に向けて Towards interpretable-by-design deep learning algorithms ( http://arxiv.org/abs/2311.11396v1 ) ライセンス: Link先を確認	Plamen Angelov, Dmitry Kangin, Ziyang Zhang	(参考訳) IDEAL(Interpretable-by-deep learning algorithms)というフレームワークは、標準教師付き分類問題をトレーニングデータから派生したプロトタイプのセットに類似した関数に再キャストすると同時に、いわゆるファンデーションモデル(FM)を形成する大規模ニューラルネットワークの既存の潜時空間を活用する。これは、IG-3.6B + ImageNet-1K や LVD-142M (stage A) のような巨大なデータセット上で事前訓練されたDLモデル(例えば、ビジュアルトランスフォーマー、ViT)の素晴らしい成果から恩恵を受けながら、説明可能性(ステージB)の問題に対処する。 dlモデルを概念的にシンプルで説明可能なプロトタイプにすることができることを示す。 The key findings can be summarized as follows: (1) the proposed models are interpretable through prototypes, mitigating the issue of confounded interpretations, (2) the proposed IDEAL framework circumvents the issue of catastrophic forgetting allowing efficient class-incremental learning, and (3) the proposed IDEAL approach demonstrates that ViT architectures narrow the gap between finetuned and non-finetuned models allowing for transfer learning in a fraction of time \textbf{without} finetuning of the feature space on a target dataset with iterative supervised methods. The proposed framework named IDEAL (Interpretable-by-design DEep learning ALgorithms) recasts the standard supervised classification problem into a function of similarity to a set of prototypes derived from the training data, while taking advantage of existing latent spaces of large neural networks forming so-called Foundation Models (FM). This addresses the issue of explainability (stage B) while retaining the benefits from the tremendous achievements offered by DL models (e.g., visual transformers, ViT) pre-trained on huge data sets such as IG-3.6B + ImageNet-1K or LVD-142M (stage A). We show that one can turn such DL models into conceptually simpler, explainable-through-prototypes ones. 
The key findings can be summarized as follows: (1) the proposed models are interpretable through prototypes, mitigating the issue of confounded interpretations, (2) the proposed IDEAL framework circumvents the issue of catastrophic forgetting allowing efficient class-incremental learning, and (3) the proposed IDEAL approach demonstrates that ViT architectures narrow the gap between finetuned and non-finetuned models allowing for transfer learning in a fraction of time without finetuning of the feature space on a target dataset with iterative supervised methods.	翻訳日:2023-11-21 20:30:00 公開日:2023-11-19
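The abstract above recasts classification as similarity to prototypes in the latent space of a pre-trained model. A minimal sketch of that general pattern follows. It assumes the feature vectors already come from a frozen encoder, uses one mean vector per class as the prototype, and measures similarity with squared Euclidean distance. These choices and the function names are illustrative assumptions; the IDEAL framework may select and compare prototypes differently.

```python
def build_prototypes(features, labels):
    """One prototype per class: the mean of that class's feature vectors."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def classify(vec, prototypes):
    """Return the predicted label and the prototype that explains the decision."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    label = min(prototypes, key=lambda k: sq_dist(vec, prototypes[k]))
    return label, prototypes[label]

features = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
prototypes = build_prototypes(features, ["cat", "cat", "dog", "dog"])
print(classify([1.0, 1.0], prototypes))

# Class-incremental step: a new class only adds a prototype, old ones are untouched.
prototypes["bird"] = [-5.0, -5.0]
print(classify([-4.0, -6.0], prototypes))
```

Since a new class never modifies the existing prototypes, earlier decisions are unaffected, which is one simple way to see why prototype-based schemes can sidestep catastrophic forgetting.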
# 適応スパイキングニューロンの速度精度シミュレーショントレードオフに対処する Addressing the speed-accuracy simulation trade-off for adaptive spiking neurons ( http://arxiv.org/abs/2311.11390v1 ) ライセンス: Link先を確認	Luke Taylor, Andrew J King, Nicol S Harper	(参考訳) adaptive leaky integrate-and-fire (ALIF) モデルは、計算神経科学において基本的な概念であり、脳の研究に役立っている。これらのニューラルモデルのシミュレーションは逐次的な性質を持つため、一般的に直面する問題は、速度精度のトレードオフである。小さな離散時間ステップ(DT)を用いてニューロンを正確にシミュレートするが低速であるか、より大きなDTを使用してニューロンを高速にシミュレートし、シミュレーション精度を損なうかのいずれかである。ここでは、アルゴリズム的にALIFモデルを再解釈し、逐次シミュレーションの複雑さを低減し、GPU上でより効率的な並列化を可能にすることで、このジレンマの解を提供する。合成ベンチマークの小さなDTを用いて,50倍以上の学習高速化が得られることを計算的に検証した。また、異なる教師付き分類タスクにおいて、標準的なALIF実装と同等のパフォーマンスを、わずかな学習時間で得ることができた。最後に、我々のモデルが皮質ニューロンの実際の電気生理学的記録を迅速かつ正確に適合させる方法を示し、これは非常に微細なサブミリ秒のDTが正確なスパイクタイミングを捉えるのに不可欠である。 The adaptive leaky integrate-and-fire (ALIF) model is fundamental within computational neuroscience and has been instrumental in studying our brains in silico. Due to the sequential nature of simulating these neural models, a commonly faced issue is the speed-accuracy trade-off: either accurately simulate a neuron using a small discretisation time-step (DT), which is slow, or more quickly simulate a neuron using a larger DT and incur a loss in simulation accuracy. Here we provide a solution to this dilemma, by algorithmically reinterpreting the ALIF model, reducing the sequential simulation complexity and permitting a more efficient parallelisation on GPUs. We computationally validate our implementation to obtain over a 50× training speedup using small DTs on synthetic benchmarks. We also obtained a comparable performance to the standard ALIF implementation on different supervised classification tasks - yet in a fraction of the training time. Lastly, we showcase how our model makes it possible to quickly and accurately fit real electrophysiological recordings of cortical neurons, where very fine sub-millisecond DTs are crucial for capturing exact spike timing.	翻訳日:2023-11-21 20:29:37 公開日:2023-11-19
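The speed-accuracy trade-off described above can be seen in a plain sequential simulation. The sketch below is a generic baseline ALIF update of the kind the abstract calls slow, not the authors' reformulated algorithm. The exponential decay update, the reset to zero, and all parameter values (time constants in ms, threshold, adaptation strength) are assumptions chosen for illustration.

```python
import math

def simulate_alif(current, dt, tau_m=20.0, tau_a=200.0, theta0=1.0, beta=0.5):
    """Sequentially simulate one ALIF neuron; return the step indices of spikes."""
    decay_v = math.exp(-dt / tau_m)   # membrane leak per step
    decay_a = math.exp(-dt / tau_a)   # adaptation decay per step
    v, a, spikes = 0.0, 0.0, []
    for step, i_in in enumerate(current):
        # Each step depends on the previous one, which is what makes this slow.
        v = decay_v * v + (1.0 - decay_v) * i_in
        a = decay_a * a
        if v >= theta0 + beta * a:    # threshold rises with the adaptation variable
            spikes.append(step)
            v = 0.0                   # reset the membrane potential
            a += 1.0                  # every spike increases adaptation
    return spikes

duration_ms = 500.0
for dt in (0.1, 5.0):
    steps = int(round(duration_ms / dt))
    spikes = simulate_alif([2.0] * steps, dt)
    # A spike detected at index s is registered at time (s + 1) * dt.
    print("dt =", dt, "ms, spike times:", [round((s + 1) * dt, 1) for s in spikes])
```

With the coarse step the loop runs 50 times fewer iterations, but spike times can only fall on multiples of 5 ms, so the timing drifts from the fine-step result.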
# 機械文化 Machine Culture ( http://arxiv.org/abs/2311.11388v1 ) ライセンス: Link先を確認	Levin Brinkmann, Fabian Baumann, Jean-François Bonnefon, Maxime Derex, Thomas F. Müller, Anne-Marie Nussberger, Agnieszka Czaplicka, Alberto Acerbi, Thomas L. Griffiths, Joseph Henrich, Joel Z. Leibo, Richard McElreath, Pierre-Yves Oudeyer, Jonathan Stray and Iyad Rahwan	(参考訳) 人類が文化を創造し、広める能力は、種としての成功の最も重要な要素としてしばしば認められている。本稿では,機械が介在する,あるいは生成する,機械文化の概念について考察する。知的機械は、変化、伝達、選択の文化的進化過程を同時に変革すると主張する。 Recommenderアルゴリズムは、社会学習のダイナミクスを変えつつある。チャットボットは新しい文化伝達様式を形成しており、文化モデルとして機能している。さらに、インテリジェントマシンは、ゲーム戦略や視覚芸術から科学的結果に至るまで、文化的な特徴を生み出す貢献者として進化している。本稿では,機械の現在および今後の文化的発展への影響を研究するための概念的枠組みと,機械文化研究のための研究課題について述べる。 The ability of humans to create and disseminate culture is often credited as the single most important factor of our success as a species. In this Perspective, we explore the notion of machine culture, culture mediated or generated by machines. We argue that intelligent machines simultaneously transform the cultural evolutionary processes of variation, transmission, and selection. Recommender algorithms are altering social learning dynamics. Chatbots are forming a new mode of cultural transmission, serving as cultural models. Furthermore, intelligent machines are evolving as contributors in generating cultural traits--from game strategies and visual art to scientific results. We provide a conceptual framework for studying the present and anticipated future impact of machines on cultural evolution, and present a research agenda for the study of machine culture.	翻訳日:2023-11-21 20:29:14 公開日:2023-11-19
# 抽出ダイアログ要約のためのLLM支援セミスーパービジョン LLM aided semi-supervision for Extractive Dialog Summarization ( http://arxiv.org/abs/2311.11462v1 ) ライセンス: Link先を確認	Nishant Mishra (1 and 2), Gaurav Sahu (3 and 4), Iacer Calixto (1 and 2), Ameen Abu-Hanna (1 and 2), Issam H. Laradji (4 and 5) ((1) Amsterdam UMC, Department of Medical Informatics, University of Amsterdam, (2) Amsterdam Public Health, Methodology, Amsterdam, The Netherlands, (3) University of Waterloo, (4) ServiceNow Research, (5) University of British Columbia)	(参考訳) チャットダイアログの高品質な要約を生成するには、しばしば大きなラベル付きデータセットが必要になる。本研究では,ラベルなしデータを用いて顧客とエージェントの対話の抽出型要約を効率的に行う手法を提案する。本手法では,質問応答問題として要約を定式化し,最先端の大規模言語モデル(LLM)を用いてダイアログの擬似ラベルを生成する。次に、これらの擬似ラベルを用いてチャット要約モデルを微調整し、大きなLLMからの知識をより小さな特殊モデルに効果的に転送する。TweetSummデータセットにおいて、元のラベル付きデータセットの10%で 65.9/57.0/61.0 ROUGE-1/-2/-L を達成できるのに対し、トレーニングデータセット全体で訓練された現在の最先端技術では 65.16/55.81/64.37 ROUGE-1/-2/-L が得られることを示す。言い換えれば、最悪の場合(ROUGE-L)でも、データの10%しか使用せずにパフォーマンスの94.7%を維持している。 Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to fine-tune a chat summarization model, effectively transferring knowledge from the large LLM into a smaller specialized model. We demonstrate our method on the TweetSumm dataset, and show that using 10% of the original labelled data set we can achieve 65.9/57.0/61.0 ROUGE-1/-2/-L, whereas the current state-of-the-art trained on the entire training data set obtains 65.16/55.81/64.37 ROUGE-1/-2/-L. In other words, in the worst case (i.e., ROUGE-L) we still effectively retain 94.7% of the performance while using only 10% of the data.	翻訳日:2023-11-21 20:19:50 公開日:2023-11-19
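The retention figure in the last sentence of this abstract can be checked from the reported scores. The short calculation below divides each ROUGE score obtained with 10% of the labels by the corresponding full-data score. Recomputing from the rounded numbers gives a ROUGE-L ratio just under 0.948, in line with the roughly 94.7% quoted; the small difference is presumably due to rounding of the published scores, which is an assumption.

```python
# Scores copied from the abstract above (ROUGE-1, ROUGE-2, ROUGE-L).
ten_percent = {"ROUGE-1": 65.9, "ROUGE-2": 57.0, "ROUGE-L": 61.0}
full_data = {"ROUGE-1": 65.16, "ROUGE-2": 55.81, "ROUGE-L": 64.37}

# Fraction of full-data performance retained for each metric.
retention = {m: ten_percent[m] / full_data[m] for m in ten_percent}
worst = min(retention, key=retention.get)

for metric, ratio in retention.items():
    print(metric, round(100 * ratio, 1), "%")
print("worst case:", worst)
```

ROUGE-1 and ROUGE-2 come out above 100%, so ROUGE-L is the only metric on which the 10% model trails, which is why the abstract calls it the worst case.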
# 地平線からのデコヒーレンス:一般定式化と回転ブラックホール Decoherence from Horizons: General Formulation and Rotating Black Holes ( http://arxiv.org/abs/2311.11461v1 ) ライセンス: Link先を確認	Samuel E. Gralla and Hongji Wei	(参考訳) Danielson, Satishchandran, Wald (DSW) による最近の研究は、ブラックホール、さらにはより一般にキリング地平線が、近傍のすべての量子重ね合わせに基本的なデコヒーレンス率を与えることを示した。この効果は測定と因果性から理解できる。ブラックホール内の観測者(Bob)は、重ね合わされた重力場を測定することで外部の量子重ね合わせを乱すことができるはずだが、因果性によりBobの行為がそのような影響を及ぼすことはできないため、重ね合わせは自ら乱されなければならない。 DSWは、シュワルツシルト時空における遠方の観測者、平坦時空におけるリンドラー観測者、デ・シッター時空における静的観測者に対して、未知の数値因子を除いてデコヒーレンス率を計算した。我々は電磁場およびクライン=ゴルドン場の類似系を用いてその計算を具体化・一般化し、分岐キリング地平線近傍のキリング観測者に対する正確なデコヒーレンス率の一般公式を導出する。カーブラックホールの対称軸上の任意の位置にいる観測者について、この率を閉形式で評価する。これにより、遠方観測者に対するシュワルツシルトの結果の数値因子が確定し、地平線近傍や極限近傍の振る舞いを新たに調べることが可能になる。電磁場の場合、クーロン場がブラックホールに入るのを遮蔽する「ブラックホール・マイスナー効果」のため、デコヒーレンスは極限で完全に消滅する。これは因果性に基づく描像を支持する。Bobは外部の重ね合わせの場を測定できないので、デコヒーレンスは必要なく、実際に生じない。 Recent work by Danielson, Satishchandran, and Wald (DSW) has shown that black holes -- and, in fact, Killing horizons more generally -- impart a fundamental rate of decoherence on all nearby quantum superpositions. The effect can be understood from measurement and causality: An observer (Bob) in the black hole should be able to disturb outside quantum superpositions by measuring their superposed gravitational fields, but since his actions cannot (by causality) have this effect, the superpositions must automatically disturb themselves. DSW calculated the rate of decoherence up to an unknown numerical factor for distant observers in Schwarzschild spacetime, Rindler observers in flat spacetime, and static observers in de Sitter spacetime. Working in electromagnetic and Klein-Gordon analogs, we flesh out and generalize their calculation to derive a general formula for the precise decoherence rate for Killing observers near bifurcate Killing horizons. We evaluate the rate in closed form for an observer at an arbitrary location on the symmetry axis of a Kerr black hole. This fixes the numerical factor in the distant-observer Schwarzschild result, while allowing new exploration of near-horizon and/or near-extremal behavior. In the electromagnetic case we find that the decoherence vanishes entirely in the extremal limit, due to the "Black hole Meissner effect" screening the Coulomb field from entering the black hole. This supports the causality picture: Since Bob is unable to measure the field of the outside superposition, no decoherence is necessary -- and indeed none occurs.	翻訳日:2023-11-21 20:19:29 公開日:2023-11-19
# 研究ソフトウェアエンジニアの基礎的能力と責任 Foundational Competencies and Responsibilities of a Research Software Engineer ( http://arxiv.org/abs/2311.11457v1 ) ライセンス: Link先を確認	Florian Goth, Renato Alves, Matthias Braun, Leyla Jael Castro, Gerasimos Chourdakis, Simon Christ, Jeremy Cohen, Fredo Erxleben, Jean-Noël Grad, Magnus Hagdorn, Toby Hodges, Guido Juckeland, Dominic Kempf, Anna-Lena Lamprecht, Jan Linxweiler, Moritz Schwarzmeier, Heidi Seibold, Jan Philipp Thiele, Harald von Waldow, Samantha Wittke	(参考訳) リサーチソフトウェアエンジニア(RSE: Research Software Engineer)という用語は、研究コミュニティで働きつつソフトウェア開発に注力する個人を表す手段として、10年あまり前に登場した。この用語は広く採用されており、RSEとは何かという高レベルな定義がいくつかある。しかし、RSEの役割は、所属する組織の状況によって異なる。スペクトルの一端では、RSEの役割は伝統的な研究職に似ているかもしれない。もう一方の端では、産業界のソフトウェアエンジニアに似ている。 RSEの役割の多くは、この2つの極端の間にある。したがって、RSEが何を行うのか、RSEになるためにどのような経験、スキル、能力が必要なのかを、単純かつ包括的に定義することは難しい。このコミュニティペーパーでは、RSEとは何かという広い概念を定義し、RSEが担うさまざまなタイプの作業を検討し、RSEの一般的なプロファイルを定める基本的能力と価値観のリストを定義する。これに基づき,これらのスキルのさまざまな次元に沿った発展について詳述し,特定のタイプのRSEの役割を考察し,組織への推奨事項を提案し,将来の専門分化の例を示す。付録では、既存のカリキュラムがこのフレームワークにどのように適合するかを詳述する。 The term Research Software Engineer, or RSE, emerged a little over 10 years ago as a way to represent individuals working in the research community but focusing on software development. The term has been widely adopted and there are a number of high-level definitions of what an RSE is. However, the roles of RSEs vary depending on the institutional context they work in. At one end of the spectrum, RSE roles may look similar to a traditional research role. At the other extreme, they resemble that of a software engineer in industry. Most RSE roles inhabit the space between these two extremes. Therefore, providing a straightforward, comprehensive definition of what an RSE does and what experience, skills and competencies are required to become one is challenging. In this community paper we define the broad notion of what an RSE is, explore the different types of work they undertake, and define a list of fundamental competencies as well as values that define the general profile of an RSE. On this basis, we elaborate on the progression of these skills along different dimensions, looking at specific types of RSE roles, proposing recommendations for organisations, and giving examples of future specialisations. An appendix details how existing curricula fit into this framework.	翻訳日:2023-11-21 20:18:56 公開日:2023-11-19
# 時間の矢は健在だが、物理学の通説の下では禁じられている The Arrow of Time is Alive and Well but Forbidden Under the Received View of Physics ( http://arxiv.org/abs/2311.11456v1 ) ライセンス: Link先を確認	R. E. Kastner	(参考訳) このエッセイは、「時間の矢」あるいはいわゆる「2つの時間」問題の文脈で、物理学の社会学と歴史に関するメタレベルの分析を行う。事実上、この2つの主題は絡み合っており、特定の形而上学的・認識論的・方法論的な信念や実践への固執を含む社会学的側面に向き合うことによってのみ、物理学における真の進歩が可能になると論じる。 This essay offers a meta-level analysis in the sociology and history of physics in the context of the "Arrow of Time" or so-called "Two Times" problem. In effect, it argues that the two topics are intertwined, and it is only by coming to grips with the sociological aspects, involving adherence to certain metaphysical, epistemological and methodological beliefs and practices, that real progress can be made in the physics.	翻訳日:2023-11-21 20:18:38 公開日:2023-11-19
# 地磁気異常のリアルタイム検出のための物理強化TinyML Physics-Enhanced TinyML for Real-Time Detection of Ground Magnetic Anomalies ( http://arxiv.org/abs/2311.11452v1 ) ライセンス: Link先を確認	Talha Siddique and MD Shaad Mahmud	(参考訳) 地磁気擾乱(GMD)や地磁気誘導電流(GIC)のような宇宙天気現象は、重要な技術インフラに重大なリスクをもたらす。シミュレーションに基づく従来の予測モデルは理論的には堅牢だが、不正確なデータの同化や膨大な計算複雑性といった課題を抱えている。近年、Tiny Machine Learning (TinyML) が採用され、GICの代理指標として地磁気摂動をリアルタイムに予測する機械学習(ML)対応磁力計システムが開発されている。 TinyMLは効率的でリアルタイムなデータ処理を提供するが、その本質的な制約により、計算要求の高い堅牢な手法は利用できない。本稿では,これらの課題に対処する物理誘導型TinyMLフレームワークを開発した。このフレームワークは、モデルの訓練と圧縮の段階で物理ベースの正則化を統合し、予測の信頼性を高める。フレームワーク内で開発されたプルーニング手法は、ドメイン固有の物理的特性を利用し、モデルサイズとロバスト性のバランスを取る。本研究は,開発したフレームワークと従来のフレームワークの精度と信頼性を包括的に比較した実験結果を示す。この比較分析は、リアルタイム宇宙天気予報のための堅牢なML対応磁力計システムを構想するうえで、開発したフレームワークが将来適用できる可能性を示している。 Space weather phenomena like geomagnetic disturbances (GMDs) and geomagnetically induced currents (GICs) pose significant risks to critical technological infrastructure. While traditional predictive models, grounded in simulation, hold theoretical robustness, they grapple with challenges, notably the assimilation of imprecise data and extensive computational complexities. In recent years, Tiny Machine Learning (TinyML) has been adopted to develop Machine Learning (ML)-enabled magnetometer systems for predicting real-time terrestrial magnetic perturbations as a proxy measure for GIC. While TinyML offers efficient, real-time data processing, its intrinsic limitations prevent the utilization of robust methods with high computational needs. This paper developed a physics-guided TinyML framework to address the above challenges. This framework integrates physics-based regularization at the stages of model training and compression, thereby augmenting the reliability of predictions. The developed pruning scheme within the framework harnesses the inherent physical characteristics of the domain, striking a balance between model size and robustness. The study presents empirical results, drawing a comprehensive comparison between the accuracy and reliability of the developed framework and its traditional counterpart. Such a comparative analysis underscores the prospective applicability of the developed framework in conceptualizing robust, ML-enabled magnetometer systems for real-time space weather forecasting.	翻訳日:2023-11-21 20:18:28 公開日:2023-11-19
# 重みノルム制御 Weight Norm Control ( http://arxiv.org/abs/2311.11446v1 ) ライセンス: Link先を確認	Ilya Loshchilov	(参考訳) 分離型重み減衰正則化は、重みの目標ノルムを0に設定した重みノルム制御の特別な場合である。任意の最適化手法(例: Adam)で分離型重み減衰正則化を用いるもの(同じく AdamW)は、重みノルム制御を持つより一般的なアルゴリズム(同じく AdamWN)の特別な場合と見なせる。重みの目標ノルムを0に設定することは最適でない場合があり、他の目標ノルム値も検討できると論じる。例えば、AdamWが特定の重みノルムに到達する任意の訓練に対して、同等の重みノルムに到達するようスケジュールしたAdamWNを対抗させることができる。重み減衰の代わりに重みノルム制御を導入することの様々な含意について論じる。 We note that decoupled weight decay regularization is a particular case of weight norm control where the target norm of weights is set to 0. Any optimization method (e.g., Adam) which uses decoupled weight decay regularization (respectively, AdamW) can be viewed as a particular case of a more general algorithm with weight norm control (respectively, AdamWN). We argue that setting the target norm of weights to 0 can be suboptimal and other target norm values can be considered. For instance, any training run where AdamW achieves a particular norm of weights can be challenged by AdamWN scheduled to achieve a comparable norm of weights. We discuss various implications of introducing weight norm control instead of weight decay.	翻訳日:2023-11-21 20:18:07 公開日:2023-11-19
# Spot the Bot: クラスタリングと情報理論的手法を用いた人間が書いたテキストとボット生成テキストの識別 Spot the Bot: Distinguishing Human-Written and Bot-Generated Texts Using Clustering and Information Theory Techniques ( http://arxiv.org/abs/2311.11441v1 ) ライセンス: Link先を確認	Vasilii Gromov and Quynh Nhu Dang	(参考訳) GPT-3のような生成モデルの発展により、生成されたテキストと人間が書いたテキストを区別することがますます困難になっている。ボット識別で良好な結果を示した研究は数多く存在する。しかし、これらの研究の大部分は、ラベル付きデータやボットモデルのアーキテクチャに関する事前知識を必要とする教師あり学習手法に依存している。本研究では,教師なし学習手法に基づき,大量のラベル付きデータに依存しないボット識別アルゴリズムを提案する。クラスタリング(クリスプおよびファジー)による意味解析の知見と情報技術を組み合わせることで,さまざまな種類のボットが生成したテキストを検出する頑健なモデルを構築する。生成されたテキストはよりカオス的である一方、文学作品はより複雑であることが分かった。また、人間のテキストをクラスタリングすると、ボット生成テキストのよりコンパクトで明確に分離されたクラスタに比べて、よりファジーなクラスタになることを示す。 With the development of generative models like GPT-3, it is increasingly more challenging to differentiate generated texts from human-written ones. There is a large number of studies that have demonstrated good results in bot identification. However, the majority of such works depend on supervised learning methods that require labelled data and/or prior knowledge about the bot-model architecture. In this work, we propose a bot identification algorithm that is based on unsupervised learning techniques and does not depend on a large amount of labelled data. By combining findings in semantic analysis by clustering (crisp and fuzzy) and information techniques, we construct a robust model that detects a generated text for different types of bot. We find that the generated texts tend to be more chaotic while literary works are more complex. We also demonstrate that the clustering of human texts results in fuzzier clusters in comparison to the more compact and well-separated clusters of bot-generated texts.	翻訳日:2023-11-21 20:17:51 公開日:2023-11-19
# Slicing Aided Hyper Inference with Refinement Strategy による高度なICノードの欠陥検出と分類法の改善 Improved Defect Detection and Classification Method for Advanced IC Nodes by Using Slicing Aided Hyper Inference with Refinement Strategy ( http://arxiv.org/abs/2311.11439v1 ) ライセンス: Link先を確認	Vic De Ridder, Bappaditya Dey, Victor Blanco, Sandip Halder, Bartel Van Waeyenberge	(参考訳) 半導体製造において、リソグラフィーは多くの場合、実現可能な最小パターン寸法を決定する製造工程である。近年,パターン微細化(2 nmノード以降)の進展が期待される高NA(開口数)EUVL(極端紫外線リソグラフィ)パラダイムへの移行が進んでいる。しかし,高NAでは確率的欠陥が大幅に増加し、欠陥検出の複雑さもより顕著になる。現在の欠陥検査技術(非機械学習ベース、機械学習ベースの両方)は、高NAの寸法では満足な性能を達成できない。本研究では,Slicing Aided Hyper Inference (SAHI) フレームワークを用いて現在の手法を改善する方法を検討する。 SAHIでは、SEM画像を拡大したスライスに対して推論を行う。これにより、物体検出器の受容野が小さな欠陥インスタンスをより効果的に捉えられるようになる。まず,これまでに検討された半導体データセットでの性能を様々な構成でベンチマークし,SAHIにより小さな欠陥の検出が約2倍と大幅に向上することを示す。次に,訓練中に遭遇しなかったシナリオを含む新しいテストデータセットにおいて、従来の訓練済みモデルが失敗したのに対し、SAHIの適用により完璧な検出率が得られることを示す。最後に、真陽性の予測を大きく減らすことなく偽陽性の予測を排除できるSAHIの拡張を定式化する。 In semiconductor manufacturing, lithography has often been the manufacturing step defining the smallest possible pattern dimensions. In recent years, progress has been made towards the high-NA (Numerical Aperture) EUVL (Extreme-Ultraviolet-Lithography) paradigm, which promises to advance pattern shrinking (2 nm node and beyond). However, a significant increase in stochastic defects and the complexity of defect detection becomes more pronounced with high-NA. Present defect inspection techniques (both non-machine learning and machine learning based) fail to achieve satisfactory performance at high-NA dimensions. In this work, we investigate the use of the Slicing Aided Hyper Inference (SAHI) framework for improving upon current techniques. Using SAHI, inference is performed on size-increased slices of the SEM images. This leads to the object detector's receptive field being more effective in capturing small defect instances. First, the performance on previously investigated semiconductor datasets is benchmarked across various configurations, and the SAHI approach is demonstrated to substantially enhance the detection of small defects, by approx. 2x. Afterwards, we also demonstrate that applying SAHI leads to flawless detection rates on a new test dataset, with scenarios not encountered during training, whereas previously trained models failed. Finally, we formulate an extension of SAHI that does not significantly reduce true-positive predictions while eliminating false-positive predictions.	翻訳日:2023-11-21 20:17:35 公開日:2023-11-19
# Bures距離と形状距離の双対性とニューラル表現の比較への含意 Duality of Bures and Shape Distances with Implications for Comparing Neural Representations ( http://arxiv.org/abs/2311.11436v1 ) ライセンス: Link先を確認	Sarah E. Harvey, Brett W. Larsen, Alex H. Williams	(参考訳) ニューラルネットワーク表現間の(非)類似度尺度は数多く提案されており、その結果、研究状況は断片化している。これらの尺度のほとんどは2つのカテゴリのいずれかに分類される。第一に、線形回帰、正準相関分析(CCA)、形状距離といった尺度はいずれも、期待される不変性を考慮しつつ類似度を定量化するために、ニューラルユニット間の明示的な写像を学習する。第二に、表現類似度分析(RSA)、中心化カーネルアライメント(CKA)、正規化Bures類似度(NBS)といった尺度はいずれも、刺激対刺激のカーネル行列のような、期待される対称性に対して既に不変な要約統計量で類似度を定量化する。ここでは、リーマン形状距離(カテゴリ1)の余弦がNBS(カテゴリ2)に等しいことを示すことで、これら2つの広いカテゴリの手法を統一するための一歩を踏み出す。この関係が形状距離とNBSの新たな解釈にどうつながるかを探り、深層学習の文献で広く使われる類似度尺度であるCKAとこれらの尺度を対比する。 A multitude of (dis)similarity measures between neural network representations have been proposed, resulting in a fragmented research landscape. Most of these measures fall into one of two categories. First, measures such as linear regression, canonical correlations analysis (CCA), and shape distances, all learn explicit mappings between neural units to quantify similarity while accounting for expected invariances. Second, measures such as representational similarity analysis (RSA), centered kernel alignment (CKA), and normalized Bures similarity (NBS) all quantify similarity in summary statistics, such as stimulus-by-stimulus kernel matrices, which are already invariant to expected symmetries. Here, we take steps towards unifying these two broad categories of methods by observing that the cosine of the Riemannian shape distance (from category 1) is equal to NBS (from category 2). We explore how this connection leads to new interpretations of shape distances and NBS, and draw contrasts of these measures with CKA, a popular similarity measure in the deep learning literature.	翻訳日:2023-11-21 20:17:13 公開日:2023-11-19
# 世論を明らかにする:インドにおける新型コロナウイルスワクチンの機械学習に基づく感情分析 Unveiling Public Perceptions: Machine Learning-Based Sentiment Analysis of COVID-19 Vaccines in India ( http://arxiv.org/abs/2311.11435v1 ) ライセンス: Link先を確認	Milind Gupta and Abhishek Kaushik	(参考訳) 2020年3月、世界保健機関(WHO)は、新型コロナウイルス感染症がほぼすべての国に広がったことを受け、世界的なパンデミックを宣言した。 2021年半ばまでに、インドはコビシールド、コバクシン、スプートニクの3つのワクチンを導入した。インドのような人口密度の高い国でワクチン接種を成功させるには、世論の感情を理解することが不可欠だった。ソーシャルメディア、特に4億3000万人以上のユーザーを抱えるRedditは、情報を広めるうえで重要な役割を果たした。この研究では、データマイニング技術を用いてRedditのデータを分析し、新型コロナウイルスワクチンに対するインド人の感情を測定する。 PythonのTextBlobライブラリを使ってコメントにアノテーションを付け、全体的な感情を評価する。その結果、インドのRedditユーザーの大半はワクチン接種について中立的であり、人口のかなりの部分にワクチンを接種しようとするインド政府の取り組みにとって課題となることが分かった。 In March 2020, the World Health Organisation declared COVID-19 a global pandemic as it spread to nearly every country. By mid-2021, India had introduced three vaccines: Covishield, Covaxin, and Sputnik. To ensure successful vaccination in a densely populated country like India, understanding public sentiment was crucial. Social media, particularly Reddit with over 430 million users, played a vital role in disseminating information. This study employs data mining techniques to analyze Reddit data and gauge Indian sentiments towards COVID-19 vaccines. Using Python's TextBlob library, comments are annotated to assess general sentiments. Results show that most Reddit users in India expressed neutrality about vaccination, posing a challenge for the Indian government's efforts to vaccinate a significant portion of the population.	翻訳日:2023-11-21 20:16:54 公開日:2023-11-19
# トポロジカルエッジ状態による光子ブロッケードの増強 Enhancement of photon blockade via topological edge states ( http://arxiv.org/abs/2311.11431v1 ) ライセンス: Link先を確認	Jun Li, Can-ming Hu, Yaping Yang	(参考訳) 量子技術は、特定のタスクで古典技術より指数関数的に優れた性能を約束するが、量子光源の不安定性、量子デコヒーレンス、損失に対する脆弱性といった課題に一貫して直面してきた。トポロジカルフォトニクスは、まさにこれらの課題に巧みに対処できる。ここでは, トポロジカルな保護により単一光子ブロッケード(単一PB)効果を大幅に増強するよう設計された量子Su-Schrieffer-Heeger型鎖を理論的に提案する。結合強度を意図的に設計することで、量子準位の格子は、単一励起空間ではトポロジカルなエッジ状態を持つ1次元配列の形を取り、2励起空間ではトポロジカルなコーナー状態を持つ2次元正方形ブリージング格子の形を取る。その結果、単一光子励起が増強され、2光子遷移が抑制される。そのため, 2次相関関数は共振器の共鳴周波数において最大2桁減少し、同時に輝度も高くなる。さらに, PB効果はトポロジカルな保護の恩恵を受け、共振器-量子ビット結合および量子ビット周波数の局所的な摂動に対して頑健である。 Quantum technologies, holding the promise of exponentially superior performance than their classical counterparts for certain tasks, have consistently encountered challenges, including instability in quantum light sources, quantum decoherence and vulnerability to losses that topological photonics happens to adeptly address. Here, we theoretically put forth a quantum Su-Schrieffer-Heeger-type chain designed to greatly enhance single-photon blockade (single-PB) effect with topological protection. By designing the deliberate coupling strengths, the quantum-level lattices take the form of a one-dimensional array with a topological edge state in single-excitation space and a two-dimensional square breathing lattice with topological corner states in two-excitation space, resulting in enhanced single-photon excitation and the suppression of two-photon transitions. Therefore the second-order correlation function is diminished by up to two orders of magnitude at the cavity resonance frequency, accompanied by stronger brightness. Furthermore, the PB effect is robust to local perturbations in cavity-qubit coupling and qubit frequency, benefitting from topological protection.	翻訳日:2023-11-21 20:16:42 公開日:2023-11-19
# ニューラルネットワークトレーニングにおける重みと入力間の重い内積の高速同定 Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training ( http://arxiv.org/abs/2311.11429v1 ) ライセンス: Link先を確認	Lianke Qin, Saayan Mitra, Zhao Song, Yuanyuan Yang, Tianyi Zhou	(参考訳) 本稿では,Light Bulb問題 (\cite{prr89}) を一般化する重い内積の同定問題を考える。$\|A\|=\|B\| = n$ である2つの集合 $A \subset \{-1,+1\}^d$ と $B \subset \{-1,+1\}^d$ が与えられ,内積があるしきい値を超える対がちょうど $k$ 個存在するとき,すなわちしきい値 $\rho \in (0,1)$ に対して $\forall i \in [k], \langle a_i,b_i \rangle \geq \rho \cdot d$ を満たす $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ が存在するとき,それら $k$ 個の重い内積を特定することが目標である。我々は、$O(n^{2 \omega / 3+ o(1)})$ 時間で動作し、しきい値 $\rho \cdot d$ を超える $k$ 個の内積の対を高い確率で見つけるアルゴリズムを提供する。ここで $\omega$ は現在の行列乗算指数である。この問題を解くことで、本手法はReLU活性化関数を持つニューラルネットワークの訓練を高速化する。 In this paper, we consider a heavy inner product identification problem, which generalizes the Light Bulb problem (\cite{prr89}): Given two sets $A \subset \{-1,+1\}^d$ and $B \subset \{-1,+1\}^d$ with $\|A\|=\|B\| = n$, if there are exactly $k$ pairs whose inner product passes a certain threshold, i.e., $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ such that $\forall i \in [k], \langle a_i,b_i \rangle \geq \rho \cdot d$, for a threshold $\rho \in (0,1)$, the goal is to identify those $k$ heavy inner products. We provide an algorithm that runs in $O(n^{2 \omega / 3+ o(1)})$ time to find the $k$ inner product pairs that surpass the $\rho \cdot d$ threshold with high probability, where $\omega$ is the current matrix multiplication exponent. By solving this problem, our method speeds up the training of neural networks with the ReLU activation function.	翻訳日:2023-11-21 20:16:19 公開日:2023-11-19
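To make the problem statement in the heavy inner product entry concrete, here is a minimal Python sketch of a naive baseline. It is not the $O(n^{2\omega/3+o(1)})$ algorithm from the abstract: it simply scans all pairs in $O(n^2 d)$ time and reports those whose inner product reaches $\rho \cdot d$. The function name `heavy_pairs_bruteforce` and the toy vectors are illustrative assumptions, not taken from the paper.

```python
def heavy_pairs_bruteforce(A, B, rho):
    """Return index pairs (i, j) with <A[i], B[j]> >= rho * d.

    Naive O(n^2 * d) scan, shown only as a reference point;
    the paper's sub-quadratic method is not reproduced here.
    """
    d = len(A[0])
    threshold = rho * d
    hits = []
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            # Inner product of two {-1, +1}^d vectors.
            if sum(x * y for x, y in zip(a, b)) >= threshold:
                hits.append((i, j))
    return hits


# Toy instance with d = 4 (hypothetical data).
A = [(1, 1, 1, 1), (1, -1, 1, -1)]
B = [(1, 1, 1, 1), (-1, 1, -1, 1)]
pairs = heavy_pairs_bruteforce(A, B, rho=0.5)
```

With these toy inputs only the pair at indices (0, 0) has an inner product (4) at or above the threshold (2); the other three pairs give 0, 0 and -4.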

# アナログ・高周波回路最適化のための回路中心型遺伝的アルゴリズム(CGA)

Circuit-centric Genetic Algorithm (CGA) for Analog and Radio-Frequency Circuit Optimization ( http://arxiv.org/abs/2403.17938v1 )

ライセンス: Link先を確認

Mingi Kwon, Yeonjun Lee, Ickhyun Song

(参考訳) 本稿では,RF受信機の性能パラメータを最大化することを目的とした,アナログ/高周波回路におけるパラメータの自動最適化手法を提案する。設計対象は、消費電力とノイズフィギュアの低減と変換ゲインの増加を含む。本研究では,レシーバの最適化に人工アルゴリズムを用い,各種回路パラメータを用いた性能パラメータの達成方法について検討した。従来の遺伝的アルゴリズム(GA)の課題を克服するために,回路中心型遺伝的アルゴリズム(CGA)の概念が提案されている。提案手法では,既存のディープラーニングモデルよりもシンプルで計算効率のよい推論プロセスを採用する。さらに、CGAは、最適点を見つけるための手動設計よりも、設計者の作業量を軽減し、優れた最適点を探しながら、従来のGAよりも大きな利点を提供する。

This paper presents an automated method for optimizing parameters in analog/high-frequency circuits, aiming to maximize performance parameters of a radio-frequency (RF) receiver. The design target includes a reduction of power consumption and noise figure and an increase in conversion gain. This study investigates the use of an artificial algorithm for the optimization of a receiver, illustrating how to fulfill the performance parameters with diverse circuit parameters. To overcome issues observed in the traditional Genetic Algorithm (GA), the concept of the Circuit-centric Genetic Algorithm (CGA) is proposed as a viable approach. The new method adopts an inference process that is simpler and computationally more efficient than the existing deep learning models. In addition, CGA offers significant advantages over manual design of finding optimal points and the conventional GA, mitigating the designer's workload while searching for superior optimum points.

翻訳日:2024-04-01 02:44:33 公開日:2023-11-19

# ミニアプリにおけるセキュリティと脆弱性のシステマティック分析

Systematic Analysis of Security and Vulnerabilities in Miniapps ( http://arxiv.org/abs/2311.11382v1 )

ライセンス: Link先を確認

Yuyang Han, Xu Ji, Zhiqiang Wang, Jianyi Zhang

(参考訳) ここ数年、ミニアプリは急増しており、軽量アプリケーションとしてモバイルインターネット分野で非常に重要な存在となっている。そのため、ミニアプリのセキュリティは機密データの完全性に直接影響し、ユーザーのプライバシーを脅かす可能性がある。しかし,ミニアプリのセキュリティに関する様々な研究を精査した結果,ミニアプリのウェブインタフェースの安全性に関する研究は限られていることが分かった。本稿では,ミニアプリのセキュリティリスクを軽減するために,ユーザ,サーバ,攻撃者に着目した三者脅威モデルを提案する。最小特権の原則と権限の一貫性の方針に従い,このモデルに基づくミニアプリのセキュリティリスク評価のための新しい分析フレームワークを設計する。次に,セキュリティリスク評価とミニアプリに関連する脅威モデルとの相関を分析した。この分析により、セキュリティリスクを伴う潜在的な範囲と分類を特定できた。ケーススタディでは、SQLインジェクション、論理的脆弱性、クロスサイトスクリプティングなど、9つの主要な脆弱性カテゴリを特定した。また,合計50,628件のセキュリティリスクを評価し,具体例を示した。

The past few years have witnessed a boom of miniapps, as lightweight applications, miniapps are of great importance in the mobile internet sector. Consequently, the security of miniapps can directly impact compromising the integrity of sensitive data, posing a potential threat to user privacy. However, after a thorough review of the various research efforts in miniapp security, we found that their actions in researching the safety of miniapp web interfaces are limited. This paper proposes a triad threat model focusing on users, servers and attackers to mitigate the security risk of miniapps. By following the principle of least privilege and the direction of permission consistency, we design a novel analysis framework for the security risk assessment of miniapps by this model. Then, we analyzed the correlation between the security risk assessment and the threat model associated with the miniapp. This analysis led to identifying potential scopes and categorisations with security risks. In the case study, we identify nine major categories of vulnerability issues, such as SQL injection, logical vulnerabilities and cross-site scripting. We also assessed a total of 50,628 security risk hazards and provided specific examples.

翻訳日:2024-03-18 22:53:06 公開日:2023-11-19

# ニキラム乗算を用いた効率的な楕円曲線暗号算術

An Efficient Elliptic Curve Cryptography Arithmetic Using Nikhilam Multiplication ( http://arxiv.org/abs/2311.11392v1 )

ライセンス: Link先を確認

Prokash Barman, Banani Saha

(参考訳) 乗算は楕円曲線暗号(ECC)の算術において最も重要な演算の1つである。 ECCの点加算と点2倍算にはスカラー(整数)乗算が必要である。高次の古典的(標準的)な乗算では、多くの中間演算が必要になる。乗算の演算数を減らせば、ECC算術の処理速度が向上する。これらの目標は、古代の乗算アルゴリズムであるニキラム・スートラ(Nikhilam Sutra)を用いて達成できる。ニキラム・スートラは、ヴェーダ数学の16のスートラ(アルゴリズム)の1つである。ニキラム・スートラは2つの大きな10進数の乗算に効率的であり、2つの大きな数の乗算を2つの小さな数の乗算に帰着させる。楕円曲線暗号の処理速度は、スカラー乗算にニキラム法を用いることで向上できる。

Multiplication is one of the most important operations in Elliptic Curve Cryptography (ECC) arithmetic. Scalar (integer) multiplication is required for point addition and point doubling in ECC. Higher-order classical (standard) multiplication requires many intermediate operations. Reducing the number of operations in multiplication increases the functional speed of ECC arithmetic. These goals can be achieved using an ancient multiplication algorithm, the Nikhilam Sutra. The Nikhilam Sutra is one of the 16 Sutras (algorithms) of Vedic mathematics. It is efficient for multiplying two large decimal numbers: it reduces the multiplication of two large numbers to the multiplication of two smaller numbers. The functional speed of Elliptic Curve Cryptography can be increased by using the Nikhilam method for scalar multiplication.

翻訳日:2024-03-18 22:53:06 公開日:2023-11-19
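The Nikhilam idea mentioned in the abstract above can be illustrated with a short Python sketch. It relies on the identity $(B + d_a)(B + d_b) = B\,(a + d_b) + d_a d_b$, where $B$ is a power of ten near the operands and $d_a = a - B$, $d_b = b - B$ are their deviations. The function name `nikhilam_multiply` and the choice of base are assumptions made for illustration; the abstract does not say how the authors integrate the method into ECC field arithmetic, so this shows only the plain integer identity.

```python
def nikhilam_multiply(a: int, b: int) -> int:
    """Multiply two positive integers via the Nikhilam base-deviation identity."""
    # Assumed base choice: the smallest power of ten above the larger operand.
    base = 10 ** len(str(max(a, b)))
    # Deviations from the base (negative when an operand is below the base).
    dev_a, dev_b = a - base, b - base
    # "Left part": cross-add one operand and the other operand's deviation.
    left = a + dev_b
    # "Right part": product of the two deviations, which are small when
    # both operands are close to the base.
    right = dev_a * dev_b
    return left * base + right


example = nikhilam_multiply(97, 96)  # base 100, deviations -3 and -4
```

For 97 and 96 the left part is 93 and the right part is 12, giving 9312. The saving comes from replacing one large multiplication by the product of the deviations; it is largest when both operands lie close to the base.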

# IoTセキュリティのためのDNAエンコード楕円曲線暗号システム

DNA Encoded Elliptic Curve Cryptography System for IoT Security ( http://arxiv.org/abs/2311.11393v1 )

ライセンス: Link先を確認

Prokash Barman, Banani Saha

(参考訳) コンピュータ科学と情報技術の分野において、モノのインターネット(IoT)は新興技術の1つである。 IoT環境では、複数のデバイスが相互接続され、相互にデータを送信する。 IoT環境ではセキュリティ上の脆弱性が生じる可能性がある。これまでのところ、IoTはセキュリティ上の欠陥のために広く受け入れられていない。そこで、IoT環境をより堅牢に保つため、DNAエンコーディングを用いた楕円曲線暗号(ECC)によるIoTの安定したセキュリティフレームワークを提案する。 ECCは、よく知られた公開鍵暗号技術の中で最も軽量な暗号技術である。暗号化の複雑さを高めるため、ECCに先立ってDNAコンピューティングのDNAエンコーディング機構を適用する。

In the field of Computer Science and Information Technology, the Internet of Things (IoT) is one of the emerging technologies. In an IoT environment, several devices are interconnected and transmit data among themselves. Security vulnerabilities may arise within the IoT environment. To date, IoT has not been widely accepted due to its security flaws. Hence, to keep the IoT environment robust, we propose a stable security framework for IoT with Elliptic Curve Cryptography (ECC) using DNA encoding. ECC is the most lightweight cryptographic technique among the well-known public-key cryptography techniques. To increase encryption complexity, ECC is preceded by the DNA encoding mechanism of DNA computing.

翻訳日:2024-03-18 22:53:06 公開日:2023-11-19
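As a rough illustration of what a DNA encoding step can look like, the Python sketch below maps each pair of bits to one nucleotide letter and back. The mapping table (00→A, 01→C, 10→G, 11→T) is a common textbook convention and is assumed here; the abstract does not state which mapping the authors use, and the ECC stage is not shown at all.

```python
# Hypothetical 2-bit-to-nucleotide table; the paper's actual mapping is not given.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def dna_encode(data: bytes) -> str:
    """Encode bytes as a nucleotide string, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_decode(strand: str) -> bytes:
    """Invert dna_encode: four letters back into one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


encoded = dna_encode(b"Hi")
decoded = dna_decode(encoded)
```

Under the assumed table, the two bytes of `b"Hi"` become the eight-letter strand `CAGACGGC`, and decoding returns the original bytes. On its own this is only a re-encoding and adds no secrecy; any security would have to come from the ECC step the abstract pairs it with.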

# 組込みシステムにおけるECQV暗証書の動的セキュアセッションの確立

Establishing Dynamic Secure Sessions for ECQV Implicit Certificates in Embedded Systems ( http://arxiv.org/abs/2311.11444v1 )

ライセンス: Link先を確認

Fikret Basic, Christian Steger, Robert Kofler

(参考訳) IoTであれ自動車分野であれ、暗黙の証明書は制約のある組込みデバイスでますます重要になっている。暗黙の証明書は、一般的な脅威に対するリソース効率の高いセキュリティソリューションとなる。計算要件はもはや主要な問題ではなく、現在は、提供されるセキュリティレベルと導出された脅威モデルとの適切なバランスを見極めることに重点が置かれている。見落とされがちなセキュリティ面として、セキュアな通信セッションの確立がある。ほとんどの設計ソリューションは静的な鍵導出のみに基づいており、完全な前方秘匿性(perfect forward secrecy)を欠いている。その結果、鍵が通信セッションではなく証明書に結びつくため、送信されたデータが将来漏洩する可能性が残る。そこで我々は,STS(Station to Station)プロトコルを暗黙の証明書と組み合わせて利用する設計を提示し,このギャップを埋めることを目指す。さらに,プロトコル最適化のための実装手順を提案し,提案する設計と最先端の鍵導出プロトコルとの間で性能とセキュリティレベルを包括的に比較する。比較研究では,静的ECDSA鍵導出に比べて計算量が20%わずかに増加するだけで,そうでなければ残る多くのセッション関連のセキュリティ脆弱性を軽減できることを示す。

Be it in the IoT or automotive domain, implicit certificates are gaining ever more prominence in constrained embedded devices. They present a resource-efficient security solution against common threat concerns. The computational requirements are not the main issue anymore. The focus is now placed on determining a good balance between the provided security level and the derived threat model. A security aspect that often gets overlooked is the establishment of secure communication sessions, as most design solutions are based only on the use of static key derivation, and therefore, lack the perfect forward secrecy. This leaves the transmitted data open for potential future exposures by having keys tied to the certificates rather than the communication sessions. We aim to patch this gap, by presenting a design that utilizes the Station to Station (STS) protocol with implicit certificates. In addition, we propose potential protocol optimization implementation steps and run a comprehensive study on the performance and security level between the proposed design and the state-of-the-art key derivation protocols. In our comparative study, we show that with a slight computational increase of 20% compared to a static ECDSA key derivation, we are able to mitigate many session-related security vulnerabilities that would otherwise remain open.

翻訳日:2024-03-18 22:53:06 公開日:2023-11-19
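The contrast the abstract draws between static key derivation and per-session keys can be sketched with a toy ephemeral Diffie-Hellman exchange in Python. This is only the ingredient that gives forward secrecy: fresh one-time secrets for every session. It deliberately omits everything specific to the paper (the STS signature exchange, ECQV implicit certificates, and elliptic curves), and the 127-bit prime, the base 3 and the function name `establish_session_key` are assumptions chosen for readability, not secure parameters.

```python
import hashlib
import secrets

# Toy parameters for illustration only: a 127-bit Mersenne prime and a small base.
P = 2**127 - 1
G = 3


def establish_session_key() -> bytes:
    """Run one ephemeral Diffie-Hellman exchange and derive a session key."""
    a = secrets.randbelow(P - 2) + 1  # device A's one-time secret
    b = secrets.randbelow(P - 2) + 1  # device B's one-time secret
    pub_a = pow(G, a, P)              # value A sends to B
    pub_b = pow(G, b, P)              # value B sends to A
    shared_a = pow(pub_b, a, P)       # A's view of the shared secret
    shared_b = pow(pub_a, b, P)       # B's view of the shared secret
    assert shared_a == shared_b
    # Hash the shared secret into a fixed-length session key.
    return hashlib.sha256(shared_a.to_bytes(16, "big")).digest()


key_1 = establish_session_key()
key_2 = establish_session_key()
```

Because the one-time secrets are discarded after each run, the two sessions end up with unrelated keys, so leaking a long-term credential later would not reveal earlier session keys. In the authenticated STS setting, each side would additionally sign the exchanged public values; that step is left out of this sketch.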

# EditShield: 命令誘導拡散モデルによる未許可画像編集の保護

EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models ( http://arxiv.org/abs/2311.12066v1 )

ライセンス: Link先を確認

Ruoxi Chen, Haibo Jin, Jinyin Chen, Lichao Sun

(参考訳) テキスト・ツー・イメージの拡散モデルは、画像合成において創造的なコンテンツを生み出す進化の過程として現れてきた。これらのモデルの印象的な生成能力に基づいて、命令誘導拡散モデルは、簡単な命令と入力画像で画像を編集することができる。ユーザーは自由に編集された画像を入手することができるが、許可されていない画像操作に関する懸念が持ち上がっている。従来の研究では、パーソナライズされた拡散モデルの未承認利用が検討されてきたが、命令誘導拡散モデルのこの問題はいまだほとんど解明されていない。本稿では,このようなモデルからの不正な修正に対する保護手法であるEditShieldを提案する。具体的には、EditShieldは拡散過程で使用される潜伏表現をシフトできる知覚不能な摂動を追加することで、モデルに不一致の被写体で非現実的な画像を生成するように強制する。人工および実世界のデータセット間でEditShieldの有効性を実証した。さらにEditShieldは、様々な編集タイプや同義語命令フレーズに対する堅牢性も維持している。

Text-to-image diffusion models have emerged as an evolutionary for producing creative content in image synthesis. Based on the impressive generation abilities of these models, instruction-guided diffusion models can edit images with simple instructions and input images. While they empower users to obtain their desired edited images with ease, they have raised concerns about unauthorized image manipulation. Prior research has delved into the unauthorized use of personalized diffusion models; however, this problem of instruction-guided diffusion models remains largely unexplored. In this paper, we first propose a protection method EditShield against unauthorized modifications from such models. Specifically, EditShield works by adding imperceptible perturbations that can shift the latent representation used in the diffusion process, forcing models to generate unrealistic images with mismatched subjects. Our extensive experiments demonstrate EditShield's effectiveness among synthetic and real-world datasets. Besides, EditShield also maintains robustness against various editing types and synonymous instruction phrases.

翻訳日:2024-03-18 15:51:52 公開日:2023-11-19

# ノイズフリー抵抗を用いた鍵分配方式の暗号解析

Crypto analysis of the key distribution scheme using noise-free resistances ( http://arxiv.org/abs/2312.00031v1 )

ライセンス: Link先を確認

Laszlo B. Kish

(参考訳) 情報理論(無条件)セキュリティを提供する鍵交換方式は複雑で実装に費用がかかる。それでも、キー交換における無条件のセキュリティを達成するための唯一の方法である。したがって、情報理論セキュリティのためのより単純なソリューションの探索は、極めて正当化されている。 Linらは、熱ノイズのない抵抗と直流電圧を利用する興味深いハードウェアキー分配方式を提案した。このシステムの暗号解析について述べる。イヴが過去にも未来にもいつでも共有秘密にアクセスできれば、受動的に取得され記録された電圧と電流を用いて、過去と未来に生成されたすべての鍵を、遡っても破ることに成功したことが示される。したがって、このスキームはセキュアな鍵交換器ではないが、もともと共有されていた秘密以上の情報エントロピーを持たないキー拡張器である。また,本提案手法は,通信を効率よく認証できないため,元来の共有秘密が漏洩した場合に有効ではないことを指摘する。しかし、認証された通信プロトコルを有効にするために、無条件でセキュアな鍵交換器が適用された場合、動作します。

Known key exchange schemes offering information-theoretic (unconditional) security are complex and costly to implement. Nonetheless, they remain the only known methods for achieving unconditional security in key exchange. Therefore, the explorations for simpler solutions for information-theoretic security are highly justified. Lin et al. [1] proposed an interesting hardware key distribution scheme that utilizes thermal-noise-free resistances and DC voltages. A crypto analysis of this system is presented. It is shown that, if Eve gains access to the initial shared secret at any time in the past or future, she can successfully crack all the generated keys in the past and future, even retroactively, using passively obtained and recorded voltages and currents. Therefore, the scheme is not a secure key exchanger, but it is rather a key expander with no more information entropy than the originally shared secret at the beginning. We also point out that the proposed defense methods against active attacks do not function when the original shared secret is compromised because then the communication cannot be efficiently authenticated. However, they do work when an unconditionally secure key exchanger is applied to enable the authenticated communication protocol.

翻訳日:2024-03-18 13:35:06 公開日:2023-11-19

# 変貌する法医学的ツールマーク分析:客観的かつ透明な比較アルゴリズム

Revolutionizing Forensic Toolmark Analysis: An Objective and Transparent Comparison Algorithm ( http://arxiv.org/abs/2312.00032v1 )

ライセンス: Link先を確認

Maria Cuellar and Sheng Gao and Heike Hofmann

(参考訳) 現在、法科学におけるツールマークの比較は人間によって主観的に行われており、一貫性と精度の欠如につながっている。検査官が、一対のマークが同じツールで作られたか異なるツールで作られたかを判断できるという証拠はほとんどない。また、攻撃角度やマーク生成方向など、異なる条件下で作られたマークに対してこの分類ができるという証拠もほとんどない。我々は独自のツールマークデータを3Dで生成し、各ツールマークから信号を抽出し、ツールマーク信号を客観的に比較するアルゴリズムを訓練する。ツールマークの信号は、角度や方向ではなく、ツールごとにクラスタ化されることが分かった。すなわち、角度や方向に関わらず、ツール内の変動はツール間の変動よりも小さい。マークのペアの類似度について、既知の一致と既知の不一致の密度は、データの依存関係を考慮しても重なりが小さく、新しいマークのペアが同じツールで作られたかどうかを判断する有用な手段となる。不確実性の尺度を伴ってツールマーク信号を比較する形式的手法として,尤度比アプローチを提供する。この実験的に訓練されたオープンソースの手法は、法科学の検査官がツールマークを客観的に比較するために利用でき、ツールマーク比較の信頼性を向上させる。これにより、刑事司法制度における誤審を減らすことができる。

Forensic toolmark comparisons are currently performed subjectively by humans, which leads to a lack of consistency and accuracy. There is little evidence that examiners can determine whether pairs of marks were made by the same tool or different tools. There is also little evidence that they can make this classification when marks are made under different conditions, such as different angles of attack or direction of mark generation. We generate original toolmark data in 3D, extract the signal from each toolmark, and train an algorithm to compare toolmark signals objectively. We find that toolmark signals cluster by tool, and not by angle or direction. That is, the variability within tool, regardless of angle/direction, is smaller than the variability between tools. The known-match and known-non-match densities of the similarities of pairs of marks have a small overlap, even when accounting for dependencies in the data, making them a useful instrument for determining whether a new pair of marks was made by the same tool. We provide a likelihood ratio approach as a formal method for comparing toolmark signals with a measure of uncertainty. This empirically trained, open-source method can be used by forensic examiners to compare toolmarks objectively and thus improve the reliability of toolmark comparisons. This can, in turn, reduce miscarriages of justice in the criminal justice system.

Translated: 2023-12-11 03:53:47 Published: 2023-11-19
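
As a rough illustration of the likelihood ratio idea in the toolmark abstract above, the sketch below divides a kernel density estimate of known-match similarity scores by one of known-non-match scores. The scores are synthetic, and the bandwidth, the function names, and the use of a plain Gaussian kernel are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def gaussian_kde_pdf(samples, x, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate at the point x."""
    samples = np.asarray(samples, dtype=float)
    z = (x - samples) / bandwidth
    return np.mean(np.exp(-0.5 * z ** 2)) / (bandwidth * np.sqrt(2.0 * np.pi))

def likelihood_ratio(score, km_scores, knm_scores, bandwidth=0.05):
    """Density of the score under known matches divided by its density under known non-matches."""
    numerator = gaussian_kde_pdf(km_scores, score, bandwidth)
    denominator = gaussian_kde_pdf(knm_scores, score, bandwidth)
    return numerator / denominator

# Synthetic similarity scores in [0, 1]; real scores would come from comparing toolmark signals.
rng = np.random.default_rng(0)
km_scores = np.clip(rng.normal(0.85, 0.05, 500), 0.0, 1.0)   # known matches (same tool)
knm_scores = np.clip(rng.normal(0.30, 0.10, 500), 0.0, 1.0)  # known non-matches (different tools)

lr_high = likelihood_ratio(0.80, km_scores, knm_scores)  # score typical of a match
lr_low = likelihood_ratio(0.35, km_scores, knm_scores)   # score typical of a non-match
print(lr_high, lr_low)
```

A ratio above 1 favours the same-tool hypothesis and a ratio below 1 favours the different-tool hypothesis. The small overlap of the two densities is what pushes the ratio far from 1 for most scores.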

# A Turing Test: Are AI Chatbots Behaviorally Similar to Humans?

http://arxiv.org/abs/2312.00798v1

License: see the linked page

Qiaozhu Mei, Yutong Xie, Walter Yuan, Matthew O. Jackson

We administer a Turing Test to AI chatbots. We examine how chatbots behave in a suite of classic behavioral games that are designed to elicit characteristics such as trust, fairness, risk-aversion, and cooperation, as well as in a traditional Big-5 psychological survey that measures personality traits. ChatGPT-4 passes the Turing Test in that it consistently exhibits human-like behavioral and personality traits, based on a comparison to the behavior of hundreds of thousands of humans from more than 50 countries. Chatbots also modify their behavior based on previous experience and contexts "as if" they were learning from the interactions, and change their behavior in response to different framings of the same strategic situation. Their behaviors are often distinct from average and modal human behaviors, in which case they tend to behave on the more altruistic and cooperative end of the distribution. We estimate that they act as if they are maximizing an average of their own and their partner's payoff.

Translated: 2023-12-11 03:45:01 Published: 2023-11-19

# Classification of Radio Galaxies with trainable COSFIRE filters

http://arxiv.org/abs/2311.11286v1

License: see the linked page

Steven Ndungu, Trienko Grobler, Stefan J. Wijnholds, Dimka Karastoyanova, George Azzopardi

Radio galaxies exhibit a rich diversity of characteristics and produce radio emission through a variety of radiation mechanisms, making their classification into distinct types based on morphology a complex challenge. To address this challenge effectively, we introduce an innovative approach for radio galaxy classification using COSFIRE filters. These filters can adapt to both the shape and orientation of prototype patterns within images. The COSFIRE approach is explainable, learning-free, rotation-tolerant, efficient, and does not require a huge training set. To assess the efficacy of our method, we conducted experiments on a benchmark radio galaxy data set comprising 1180 training samples and 404 test samples. Notably, our approach achieved an average accuracy of 93.36%. This result outperforms contemporary deep learning models and is the best achieved on this data set to date. Additionally, COSFIRE filters offer better computational performance, with about 20× fewer operations than the DenseNet-based competing method (when compared at the same accuracy). Our findings underscore the effectiveness of the COSFIRE filter-based approach in addressing the complexities associated with radio galaxy classification. This research contributes to advancing the field by offering a robust solution that transcends the orientation challenges intrinsic to radio galaxy observations. Our method is versatile in that it is applicable to various image classification approaches.

Translated: 2023-12-03 14:11:12 Published: 2023-11-19

# Zero-Shot Question Answering over Financial Documents using Large Language Models

http://arxiv.org/abs/2311.14722v1

License: see the linked page

Karmvir Singh Phogat, Chetan Harsha, Sridhar Dasaratha, Shashishekar Ramakrishna, Sai Akhil Puranam

We introduce a large language model (LLM)-based approach to answer complex questions requiring multi-hop numerical reasoning over financial reports. While LLMs have exhibited remarkable performance on various natural language and reasoning tasks, complex reasoning problems often rely on few-shot prompts that require carefully crafted examples. In contrast, our approach uses novel zero-shot prompts that guide the LLM to encode the required reasoning into a Python program or a domain-specific language. The generated program is then executed by a program interpreter, thus mitigating the limitations of LLMs in performing accurate arithmetic calculations. We evaluate the proposed approach on three financial datasets using some of the recently developed generative pretrained transformer (GPT) models and perform comparisons with various zero-shot baselines. The experimental results demonstrate that our approach significantly improves the accuracy for all the LLMs over their respective baselines. We provide a detailed analysis of the results, generating insights to support our findings. The success of our approach demonstrates the enormous potential to extract complex domain-specific numerical reasoning by designing zero-shot prompts to effectively exploit the knowledge embedded in LLMs.

Translated: 2023-12-03 13:40:35 Published: 2023-11-19
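
The step in the abstract above where an interpreter, rather than the LLM, carries out the arithmetic can be sketched as follows. The program text is a hand-written stand-in for model output, the figures in it are invented, and `run_generated_program` and the `ans` variable convention are hypothetical names chosen for this example. Restricting the builtins as shown is not a real sandbox.

```python
# Hypothetical program text, standing in for what an LLM might return for a
# question such as "What was the percentage revenue growth from 2021 to 2022?".
generated_program = """
revenue_2021 = 1250.0
revenue_2022 = 1400.0
growth = (revenue_2022 - revenue_2021) / revenue_2021
ans = round(growth * 100, 2)
"""

def run_generated_program(source, result_name="ans"):
    """Execute program text with a small set of builtins and return one variable."""
    allowed_builtins = {"round": round, "abs": abs, "min": min, "max": max, "sum": sum}
    namespace = {"__builtins__": allowed_builtins}
    exec(source, namespace)  # the interpreter, not the LLM, performs the arithmetic
    return namespace[result_name]

answer = run_generated_program(generated_program)
print(answer)
```

The design point is the division of labour: the model only has to express the reasoning as code, and the exact calculation is left to the interpreter.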

# AI Use in Manuscript Preparation for Academic Journals

http://arxiv.org/abs/2311.14720v1

License: see the linked page

Nir Chemaya and Daniel Martin

The emergent abilities of Large Language Models (LLMs), which power tools like ChatGPT and Bard, have produced both excitement and worry about how AI will impact academic writing. In response to rising concerns about AI use, authors of academic publications may decide to voluntarily disclose any AI tools they use to revise their manuscripts, and journals and conferences could begin mandating disclosure and/or turn to using detection services, as many teachers have done with student writing in class settings. Given these looming possibilities, we investigate whether academics view it as necessary to report AI use in manuscript preparation and how detectors react to the use of AI in academic writing.

Translated: 2023-12-03 13:40:14 Published: 2023-11-19

# Explainable AI for engineering design: A unified approach of systems engineering and component-based deep learning

http://arxiv.org/abs/2108.13836v4

License: see the linked page

Philipp Geyer, Manav Mahan Singh and Xia Chen

Data-driven models created by machine learning are gaining importance in all fields of design and engineering. They have high potential to assist decision-makers in creating novel artefacts with better performance and sustainability. However, limited generalization and the black-box nature of these models lead to limited explainability and reusability. To overcome this situation, we propose a component-based approach to create partial component models by machine learning (ML). This component-based approach aligns deep learning with systems engineering (SE). For the domain of energy-efficient building design, we first demonstrate better generalization of the component-based method by analyzing prediction accuracy outside the training data. We observe a much higher accuracy (R2 = 0.94) compared to conventional monolithic methods (R2 = 0.71). Second, we illustrate explainability by demonstrating, through examples, how sensitivity information from SE and rules from low-depth decision trees serve engineering. Third, we evaluate explainability by qualitative and quantitative methods, demonstrating the matching of preliminary knowledge and data-driven derived strategies, and show the correctness of activations at component interfaces compared to white-box simulation results (envelope components: R2 = 0.92 to 0.99; zones: R2 = 0.78 to 0.93). The key to component-based explainability is that activations at interfaces between the components are interpretable engineering quantities. The large range of possible configurations in composing components allows the examination of novel unseen design cases with understandable data-driven models. The matching of parameter ranges of components by similar probability distributions produces reusable, well-generalizing, and trustworthy models. The approach adapts the model structure to engineering methods of systems engineering and to domain knowledge.

Translated: 2023-11-23 06:17:26 Published: 2023-11-19

# Cascaded and Generalizable Neural Radiance Fields for Fast View Synthesis

http://arxiv.org/abs/2208.04717v2

License: see the linked page

Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

We present CG-NeRF, a cascaded and generalizable neural radiance field method for view synthesis. Recent generalizing view synthesis methods can render high-quality novel views using a set of nearby input views. However, the rendering speed is still slow due to the uniform point sampling of neural radiance fields. Existing scene-specific methods can train and render novel views efficiently but cannot generalize to unseen data. Our approach addresses the problems of fast and generalizing view synthesis by proposing two novel modules: a coarse radiance field predictor and a convolution-based neural renderer. This architecture infers consistent scene geometry based on the implicit neural fields and renders new views efficiently using a single GPU. We first train CG-NeRF on multiple 3D scenes of the DTU dataset, and the network can produce high-quality and accurate novel views on unseen real and synthetic data using only photometric losses. Moreover, our method can leverage a denser set of reference images of a single scene to produce accurate novel views without relying on additional explicit representations, while still maintaining the high-speed rendering of the pre-trained model. Experimental results show that CG-NeRF outperforms state-of-the-art generalizable neural rendering methods on various synthetic and real datasets.

Translated: 2023-11-23 06:07:55 Published: 2023-11-19

# Iterative Scale-Up ExpansionIoU and Deep Features Association for Multi-Object Tracking in Sports

http://arxiv.org/abs/2306.13074v5

License: see the linked page

Hsiang-Wei Huang, Cheng-Yen Yang, Jiacheng Sun, Pyong-Kun Kim, Kwang-Ju Kim, Kyoungoh Lee, Chung-I Huang, Jenq-Neng Hwang

Deep learning-based object detectors have driven notable progress in multi-object tracking algorithms. Yet, current tracking methods mainly focus on simple, regular motion patterns in pedestrians or vehicles. This leaves a gap in tracking algorithms for targets with nonlinear, irregular motion, like athletes. Additionally, the Kalman filter that recent tracking algorithms rely on falls short when object motion defies its linear assumption. To overcome these issues, we propose a novel online and robust multi-object tracking approach named deep ExpansionIoU (Deep-EIoU), which focuses on multi-object tracking for sports scenarios. Unlike conventional methods, we abandon the use of the Kalman filter and leverage the iterative scale-up ExpansionIoU and deep features for robust tracking in sports scenarios. This approach achieves superior tracking performance without adopting a more robust detector, all while keeping the tracking process online. Our proposed method demonstrates remarkable effectiveness in tracking objects with irregular motion, achieving a score of 77.2% HOTA on the SportsMOT dataset and 85.4% HOTA on the SoccerNet-Tracking dataset. It outperforms all previous state-of-the-art trackers on various large-scale multi-object tracking benchmarks, covering various kinds of sports scenarios. The code and models are available at https://github.com/hsiangwei0903/Deep-EIoU.

Translated: 2023-11-23 05:01:35 Published: 2023-11-19
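
The abstract above does not define ExpansionIoU, so the following is only a sketch under an assumed reading: both boxes are enlarged around their centres by a scale factor before the ordinary IoU is computed, and the scale is raised step by step. The expansion rule, the scale values, and all function names are assumptions for illustration; the authors' repository is the reference for the actual method.

```python
def expand_box(box, scale):
    """Grow an (x1, y1, x2, y2) box around its centre; each side moves out by scale * size."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 - scale * w, y1 - scale * h, x2 + scale * w, y2 + scale * h)

def iou(a, b):
    """Plain intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def expansion_iou(a, b, scale):
    """IoU computed after expanding both boxes by the same scale."""
    return iou(expand_box(a, scale), expand_box(b, scale))

# A track box and a detection that no longer overlap after a fast sideways move.
track_box = (0.0, 0.0, 10.0, 20.0)
det_box = (12.0, 0.0, 22.0, 20.0)

# Raising the scale step by step lets the pair become matchable without a motion model.
for scale in (0.0, 0.3, 0.5, 0.7):
    print(scale, expansion_iou(track_box, det_box, scale))
```

With no expansion the two boxes have zero overlap, which is the situation a linear motion model would be expected to cover. Expanding both boxes recovers a usable similarity directly from the geometry.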

# Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

http://arxiv.org/abs/2309.13101v2

License: see the linked page

Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, Xiaogang Jin

Implicit neural representation has paved the way for new approaches to dynamic scene reconstruction and rendering. Nonetheless, cutting-edge dynamic neural rendering methods rely heavily on these implicit representations, which frequently struggle to capture the intricate details of objects in the scene. Furthermore, implicit methods have difficulty achieving real-time rendering in general dynamic scenes, limiting their use in a variety of tasks. To address these issues, we propose a deformable 3D Gaussian splatting method that reconstructs scenes using 3D Gaussians and learns them in canonical space with a deformation field to model monocular dynamic scenes. We also introduce an annealing smoothing training mechanism with no extra overhead, which can mitigate the impact of inaccurate poses on the smoothness of time interpolation tasks in real-world datasets. Through a differential Gaussian rasterizer, the deformable 3D Gaussians achieve not only higher rendering quality but also real-time rendering speed. Experiments show that our method outperforms existing methods significantly in terms of both rendering quality and speed, making it well-suited for tasks such as novel-view synthesis, time interpolation, and real-time rendering.

Translated: 2023-11-23 04:37:06 Published: 2023-11-19

# SIRe-IR: Inverse Rendering for BRDF Reconstruction with Shadow and Illumination Removal in High-Illuminance Scenes

http://arxiv.org/abs/2310.13030v2

License: see the linked page

Ziyi Yang, Yanzhen Chen, Xinyu Gao, Yazhen Yuan, Yu Wu, Xiaowei Zhou, Xiaogang Jin

Implicit neural representation has opened up new possibilities for inverse rendering. However, existing implicit neural inverse rendering methods struggle to handle strongly illuminated scenes with significant shadows and indirect illumination. The existence of shadows and reflections can lead to an inaccurate understanding of scene geometry, making precise factorization difficult. To this end, we present SIRe-IR, an implicit neural inverse rendering approach that uses non-linear mapping and regularized visibility estimation to decompose the scene into environment map, albedo, and roughness. By accurately modeling the indirect radiance field, normal, visibility, and direct light simultaneously, we are able to remove both shadows and indirect illumination in materials without imposing strict constraints on the scene. Even in the presence of intense illumination, our method recovers high-quality albedo and roughness with no shadow interference. SIRe-IR outperforms existing methods in both quantitative and qualitative evaluations.

Translated: 2023-11-23 04:24:40 Published: 2023-11-19

# SecureBERT and LLAMA 2 Empowered Control Area Network Intrusion Detection and Classification

http://arxiv.org/abs/2311.12074v1

License: see the linked page

Xuemei Li, Huirong Fu

Numerous studies have demonstrated effective detection of Control Area Network (CAN) attacks. In the realm of understanding the human semantic space, transformer-based models have demonstrated remarkable effectiveness. Leveraging pre-trained transformers has become a common strategy in various language-related tasks, enabling these models to grasp human semantics more comprehensively. To evaluate the adaptability of pre-trained models for CAN intrusion detection, we have developed two distinct models: CAN-SecureBERT and CAN-LLAMA2. Notably, our CAN-LLAMA2 model surpasses the state-of-the-art models by achieving an exceptional performance of 0.999993 in terms of balanced accuracy, precision, detection rate, and F1 score, and a remarkably low false alarm rate of 3.10e-6. Impressively, the false alarm rate is 52 times smaller than that of the leading model, MTH-IDS (Multitiered Hybrid Intrusion Detection System). Our study underscores the promise of employing a Large Language Model as the foundational model, while incorporating adapters for other cybersecurity-related tasks and maintaining the model's inherent language-related capabilities.

Translated: 2023-11-23 03:38:48 Published: 2023-11-19
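
For readers unfamiliar with the metrics named in the abstract above, the sketch below computes them from a binary confusion matrix using their standard definitions. The counts are invented for illustration and are unrelated to the paper's results. The paper's setting may be multi-class, in which case some averaging scheme would be needed on top of this.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts (attack = positive class)."""
    detection_rate = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)
    return {
        "balanced_accuracy": (detection_rate + specificity) / 2,
        "precision": precision,
        "detection_rate": detection_rate,
        "f1": 2 * precision * detection_rate / (precision + detection_rate),
        "false_alarm_rate": fp / (fp + tn),  # benign frames flagged as attacks
    }

# Invented counts: 1,000 attack frames and 10,000 benign frames.
metrics = detection_metrics(tp=950, fp=5, tn=9995, fn=50)
for name, value in metrics.items():
    print(name, value)
```

The false alarm rate depends only on how benign traffic is classified, so it can be very low even when the detection rate is not perfect.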

# Enhancing Low-dose CT Image Reconstruction by Integrating Supervised and Unsupervised Learning

http://arxiv.org/abs/2311.12071v1

License: see the linked page

Ling Chen, Zhishen Huang, Yong Long, Saiprasad Ravishankar

Traditional model-based image reconstruction (MBIR) methods combine forward and noise models with simple object priors. Recent applications of deep learning methods to image reconstruction provide a successful data-driven approach to the challenges of reconstructing images from undersampled measurements or in the presence of various types of noise. In this work, we propose a hybrid supervised-unsupervised learning framework for X-ray computed tomography (CT) image reconstruction. The proposed learning formulation leverages both sparsity or unsupervised learning-based priors and neural network reconstructors to simulate a fixed-point iteration process. Each proposed trained block consists of a deterministic MBIR solver and a neural network. The information flows in parallel through these two reconstructors and is then optimally combined. Multiple such blocks are cascaded to form a reconstruction pipeline. We demonstrate the efficacy of this learned hybrid model for low-dose CT image reconstruction with limited training data, where we use the NIH AAPM Mayo Clinic Low Dose CT Grand Challenge dataset for training and testing. In our experiments, we study combinations of supervised deep network reconstructors and an MBIR solver with learned sparse representation-based priors or analytical priors. Our results demonstrate the promising performance of the proposed framework compared to recent low-dose CT reconstruction methods.

Translated: 2023-11-23 03:38:24 Published: 2023-11-19

# FDDM: Unsupervised Medical Image Translation with a Frequency-Decoupled Diffusion Model

http://arxiv.org/abs/2311.12070v1

License: see the linked page

Yunxiang Li, Hua-Chieh Shao, Xiaoxue Qian, You Zhang

Diffusion models have demonstrated significant potential in producing high-quality images for medical image translation to aid disease diagnosis, localization, and treatment. Nevertheless, current diffusion models have limited success in achieving faithful image translations that can accurately preserve the anatomical structures of medical images, especially for unpaired datasets. The preservation of structural and anatomical details is essential to reliable medical diagnosis and treatment planning, as structural mismatches can lead to disease misidentification and treatment errors. In this study, we introduce a frequency-decoupled diffusion model (FDDM), a novel framework that decouples the frequency components of medical images in the Fourier domain during the translation process, to allow structure-preserved high-quality image conversion. FDDM applies an unsupervised frequency conversion module to translate the source medical images into frequency-specific outputs and then uses the frequency-specific information to guide a subsequent diffusion model for the final source-to-target image translation. We conducted extensive evaluations of FDDM using a public brain MR-to-CT translation dataset, showing its superior performance against other GAN-, VAE-, and diffusion-based models. Metrics including the Frechet inception distance (FID), the peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM) were assessed. FDDM achieves an FID of 29.88, less than half that of the second-best model. These results demonstrate FDDM's ability to generate highly realistic target-domain images while maintaining the faithfulness of translated anatomical structures.

Translated: 2023-11-23 03:38:05 Published: 2023-11-19
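
To make "decoupling the frequency components in the Fourier domain" from the abstract above concrete, the sketch below splits an image into low- and high-frequency parts with a fixed circular mask. This is only the basic signal-processing idea. FDDM's frequency conversion module is learned, and the mask shape, the radius, and the function name here are assumptions for illustration.

```python
import numpy as np

def split_frequencies(image, radius):
    """Split a 2-D image into low- and high-frequency parts using a circular Fourier mask."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))  # move the zero frequency to the centre
    rows, cols = image.shape
    yy, xx = np.ogrid[:rows, :cols]
    dist = np.sqrt((yy - rows // 2) ** 2 + (xx - cols // 2) ** 2)
    low_mask = dist <= radius                        # True near the centre (low frequencies)
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

# A random array stands in for a medical image slice.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
low, high = split_frequencies(image, radius=8)

# The two parts add back up to the original image.
print(np.allclose(low + high, image))
```

The low-frequency part carries overall intensity and coarse shape, and the high-frequency part carries edges and fine structure. That separation is why treating the bands differently could help preserve anatomy.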

# Enhancing Novel Object Detection via Cooperative Foundational Models

http://arxiv.org/abs/2311.12068v1

License: see the linked page

Rohit Bharadwaj, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

In this work, we address the challenging and emergent problem of novel object detection (NOD), focusing on the accurate detection of both known and novel object categories during inference. Traditional object detection algorithms are inherently closed-set, limiting their capability to handle NOD. We present a novel approach to transform existing closed-set detectors into open-set detectors. This transformation is achieved by leveraging the complementary strengths of pre-trained foundational models, specifically CLIP and SAM, through our cooperative mechanism. Furthermore, by integrating this mechanism with state-of-the-art open-set detectors such as GDINO, we establish new benchmarks in object detection performance. Our method achieves 17.42 mAP in novel object detection and 42.08 mAP for known objects on the challenging LVIS dataset. Adapting our approach to the COCO OVD split, we surpass the current state of the art by a margin of 7.2 AP50 for novel classes. Our code is available at https://github.com/rohit901/cooperative-foundational-models .

Translated: 2023-11-23 03:37:39 Published: 2023-11-19

# Quality and Quantity: Unveiling a Million High-Quality Images for Text-to-Image Synthesis in Fashion Design

http://arxiv.org/abs/2311.12067v1

License: see the linked page

Jia Yu, Lichao Zhang, Zijie Chen, Fayu Pan, MiaoMiao Wen, Yuming Yan, Fangsheng Weng, Shuai Zhang, Lili Pan, Zhenzhong Lan

The fusion of AI and fashion design has emerged as a promising research area. However, the lack of extensive, interrelated data on clothing and try-on stages has hindered the full potential of AI in this domain. Addressing this, we present the Fashion-Diffusion dataset, a product of multiple years' rigorous effort. This dataset, the first of its kind, comprises over a million high-quality fashion images, paired with detailed text descriptions. Sourced from a diverse range of geographical locations and cultural backgrounds, the dataset encapsulates global fashion trends. The images have been meticulously annotated with fine-grained attributes related to clothing and humans, simplifying the fashion design process into a Text-to-Image (T2I) task. The Fashion-Diffusion dataset not only provides high-quality text-image pairs and diverse human-garment pairs but also serves as a large-scale resource about humans, thereby facilitating research in T2I generation. Moreover, to foster standardization in the T2I-based fashion design field, we propose a new benchmark comprising multiple datasets for evaluating the performance of fashion design models. This work represents a significant leap forward in the realm of AI-driven fashion design, setting a new standard for future research in this field.

Translated: 2023-11-23 03:37:19 Published: 2023-11-19

# Few-Shot Classification & Segmentation Using Large Language Models Agent

http://arxiv.org/abs/2311.12065v1

License: see the linked page

Tian Meng, Yang Tao, Wuliang Yin

The task of few-shot image classification and segmentation (FS-CS) requires the classification and segmentation of target objects in a query image, given only a few examples of the target classes. We introduce a method that utilises large language models (LLMs) as an agent to address the FS-CS problem in a training-free manner. By making the LLM the task planner and off-the-shelf vision models the tools, the proposed method is capable of classifying and segmenting target objects using only image-level labels. Specifically, chain-of-thought prompting and in-context learning guide the LLM to observe support images as a human would; vision models such as Segment Anything Model (SAM) and GPT-4Vision help the LLM understand spatial and semantic information at the same time. Ultimately, the LLM uses its summarizing and reasoning capabilities to classify and segment the query image. The proposed method's modular framework makes it easily extendable. Our approach achieves state-of-the-art performance on the Pascal-5i dataset.

Translated: 2023-11-23 03:36:57 Published: 2023-11-19

# Do We Really Need Dice? The Hidden Region-Size Biases of Segmentation Losses

http://arxiv.org/abs/2104.08717v4

License: see the linked page

Bingyuan Liu, Jose Dolz, Adrian Galdran, Riadh Kobbi, Ismail Ben Ayed

(参考訳) ほとんどのセグメンテーション損失はクロスエントロピー(CE)またはディース損失の変種である。表面的には、これら2つの損失のカテゴリは無関係に見え、どのカテゴリがより良い選択であるかについての明確なコンセンサスはなく、それぞれのベンチマークやアプリケーションのパフォーマンスが異なる。さらに、Dice と CE は相補的であり、複合 CE-Dice の損失を動機付けていると広く主張されている。本研究では,CE と Dice が従来考えられていたよりもはるかに深い関係を持つことを示す理論解析を行う。まず, 制約最適化の観点からは, 2つの要素,すなわち, 予測された前景領域を接地構造へ押し上げる類似の接地構造マッチング項と, 予測された領域の大きさに異なるバイアスを与える地域規模ペナルティ項に分解することを示した。 Diceは特定の極端に不均衡な解に対して本質的な偏りを持ち、CEは暗黙的に地平線領域の比例を奨励する。以上の結果から,dice損失が不均衡分節化に改善をもたらす医学的画像化文献における広範な実験的な証拠を説明する。理論解析に基づいて,領域サイズのバイアスを明示的に制御できる,原理的かつ簡単な解法を提案する。提案手法は,CE を L1 あるいは KL の発散に基づく明示的な用語と統合し,対象のクラス比にマッチする分節領域比を奨励し,クラス不均衡を緩和するが,一般性を損なうことはない。異なる損失と応用に関する包括的実験とアブレーションの研究は、我々の理論解析と、明示的かつ単純な領域サイズ項の有効性を検証する。

Most segmentation losses are arguably variants of the Cross-Entropy (CE) or Dice losses. On the surface, these two categories of losses seem unrelated, and there is no clear consensus as to which category is a better choice, with varying performances for each across different benchmarks and applications. Furthermore, it is widely argued within the medical-imaging community that Dice and CE are complementary, which has motivated the use of compound CE-Dice losses. In this work, we provide a theoretical analysis, which shows that CE and Dice share a much deeper connection than previously thought. First, we show that, from a constrained-optimization perspective, they both decompose into two components, i.e., a similar ground-truth matching term, which pushes the predicted foreground regions towards the ground-truth, and a region-size penalty term imposing different biases on the size of the predicted regions. Then, we provide bound relationships and an information-theoretic analysis, which uncover hidden region-size biases: Dice has an intrinsic bias towards specific extremely imbalanced solutions, whereas CE implicitly encourages the ground-truth region proportions. Our theoretical results explain the wide experimental evidence in the medical-imaging literature, whereby Dice losses bring improvements for imbalanced segmentation. Based on our theoretical analysis, we propose a principled and simple solution, which enables explicit control of the region-size bias. The proposed method integrates CE with explicit terms based on L1 or the KL divergence, which encourage the segmented-region proportions to match target class proportions, thereby mitigating class imbalance without losing generality. Comprehensive experiments and ablation studies over different losses and applications validate our theoretical analysis, as well as the effectiveness of explicit and simple region-size terms.
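
To make the idea of an explicit region-size term concrete, here is a minimal pure-Python sketch for the binary case. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `weight` parameter, and the direction of the KL term (target proportions against predicted proportions) are all choices made here, since the abstract only says the terms are "based on L1 or the KL divergence".

```python
import math

EPS = 1e-7  # numerical guard for logarithms and divisions


def cross_entropy(probs, labels):
    """Mean binary cross-entropy; probs are predicted foreground probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def soft_dice_loss(probs, labels):
    """One minus the soft Dice coefficient of the foreground class."""
    intersection = sum(p * y for p, y in zip(probs, labels))
    return 1.0 - (2.0 * intersection + EPS) / (sum(probs) + sum(labels) + EPS)


def region_size_penalty(probs, target_fg, kind="kl"):
    """Compare the predicted foreground proportion with a target proportion."""
    pred_fg = sum(probs) / len(probs)
    pred = [1.0 - pred_fg, pred_fg]
    target = [1.0 - target_fg, target_fg]
    if kind == "l1":
        return sum(abs(t - q) for t, q in zip(target, pred))
    # Assumed direction: KL(target || predicted) over (background, foreground).
    return sum(t * math.log(t / max(q, EPS)) for t, q in zip(target, pred) if t > 0)


def ce_with_region_prior(probs, labels, target_fg, weight=1.0, kind="kl"):
    """CE plus a weighted region-size term (hypothetical combination)."""
    return cross_entropy(probs, labels) + weight * region_size_penalty(probs, target_fg, kind)
```

When the predicted foreground proportion already equals `target_fg`, the penalty vanishes and the combined loss reduces to plain CE, so the extra term only acts on the size of the predicted region.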

翻訳日:2023-11-22 21:38:28 公開日:2023-11-19

# テキスト上の数値推論のための質問方向グラフ注意ネットワーク

Question Directed Graph Attention Network for Numerical Reasoning over Text ( http://arxiv.org/abs/2009.07448v2 )

ライセンス: Link先を確認

Kunlong Chen, Weidi Xu, Xingyi Cheng, Zou Xiaochuan, Yuyu Zhang, Le Song, Taifeng Wang, Yuan Qi, Wei Chu

(参考訳) 加算、減算、ソート、カウントなどのテキストに対する数値推論は、自然言語の理解と算術計算の両方を必要とするため、機械読解の難しい課題である。この課題に対処するために,このような推論に必要な文章(パッセージ)と質問の文脈に対する異種グラフ表現を提案し,このコンテキストグラフ上で多段階の数値推論を駆動する質問指向グラフアテンションネットワークを設計する。コードは https://github.com/emnlp2020qdgat/QDGAT で公開されている。

Numerical reasoning over texts, such as addition, subtraction, sorting and counting, is a challenging machine reading comprehension task, since it requires both natural language understanding and arithmetic computation. To address this challenge, we propose a heterogeneous graph representation for the context of the passage and question needed for such reasoning, and design a question directed graph attention network to drive multi-step numerical reasoning over this context graph. The code link is at: https://github.com/emnlp2020qdgat/QDGAT

翻訳日:2023-11-22 21:36:17 公開日:2023-11-19

# データ駆動戦略における認識的不確実性と認識の表現

Representations of epistemic uncertainty and awareness in data-driven strategies ( http://arxiv.org/abs/2110.11482v7 )

ライセンス: Link先を確認

Mario Angelelli, Massimiliano Gervasi

(参考訳) AIとビッグデータの普及は、意思決定を支援する情報量を増やす一方で、データや経験的証拠との直接的なインタラクションを減らし、意思決定プロセスを再構築している。このパラダイムシフトは、データの可観測性の制限があいまいさと解釈可能性の欠如をもたらすため、新しい不確実性の源を導入する。データ駆動戦略を適切に分析する必要性は、知識へのこの種の限定的なアクセスを記述できる新しいモデルの探索を動機付ける。本稿は、知識表現における不確実性と、エージェントを介したその伝達に関する新しい理論モデルを示す。知識状態を比較・結合するための構造をモデルに与えることで、知識状態の動的記述を提供する。具体的には、更新は結合によって表現され、その説明可能性は異なる次元の表現における一貫性に基づいている。我々は、推論の多重性、選好関係、情報尺度の観点から、非等価な知識表現を考察する。さらに,あいまいさの観点からの非古典的不確実性(エルスバーグのモデル)と,データを観察する他のエージェントを介した知識についての推論(ウィグナーの友人)という2つのシナリオとの形式的類似性を定義する。最後に,提案モデルがデータ駆動戦略に与える含意について考察し,ビジネス価値の次元に関する不確実性下での推論と,その評価のための計測ツールの設計に特に注目する。

The diffusion of AI and big data is reshaping decision-making processes by increasing the amount of information that supports decisions while reducing direct interaction with data and empirical evidence. This paradigm shift introduces new sources of uncertainty, as limited data observability results in ambiguity and a lack of interpretability. The need for the proper analysis of data-driven strategies motivates the search for new models that can describe this type of bounded access to knowledge. This contribution presents a novel theoretical model for uncertainty in knowledge representation and its transfer mediated by agents. We provide a dynamical description of knowledge states by endowing our model with a structure to compare and combine them. Specifically, an update is represented through combinations, and its explainability is based on its consistency in different dimensional representations. We look at inequivalent knowledge representations in terms of multiplicity of inferences, preference relations, and information measures. Furthermore, we define a formal analogy with two scenarios that illustrate non-classical uncertainty in terms of ambiguity (Ellsberg's model) and reasoning about knowledge mediated by other agents observing data (Wigner's friend). Finally, we discuss some implications of the proposed model for data-driven strategies, with special attention to reasoning under uncertainty about business value dimensions and the design of measurement tools for their assessment.
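
As background for the reference to Ellsberg's model, the following sketch works through the standard three-colour Ellsberg urn (30 red balls and 60 balls that are black or yellow in unknown proportion). It is the textbook example rather than the formal model of the paper; the bet labels and function names are illustrative. It checks that no urn composition makes the commonly observed preferences (bet A over B, and bet D over C) consistent with a single probability assignment.

```python
from fractions import Fraction

RED, NON_RED, TOTAL = 30, 60, 90


def win_probabilities(black):
    """Winning probability of each bet if the urn held `black` black balls."""
    p_red = Fraction(RED, TOTAL)
    p_black = Fraction(black, TOTAL)
    p_yellow = Fraction(NON_RED - black, TOTAL)
    return {
        "A": p_red,               # win on red
        "B": p_black,             # win on black
        "C": p_red + p_yellow,    # win on red or yellow
        "D": p_black + p_yellow,  # win on black or yellow
    }


def rationalizing_compositions():
    """Compositions under which both A > B and D > C hold strictly."""
    consistent = []
    for black in range(NON_RED + 1):
        p = win_probabilities(black)
        if p["A"] > p["B"] and p["D"] > p["C"]:
            consistent.append(black)
    return consistent


print(rationalizing_compositions())  # []
```

Preferring A to B requires fewer than 30 black balls, while preferring D to C requires more than 30, so the list is empty. Because both conditions are linear in the probability of black, averaging over compositions with any prior does not help either, which is why ambiguity of this kind is described as non-classical uncertainty.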

翻訳日:2023-11-22 21:27:10 公開日:2023-11-19

# 教師なし画像アニメーションにおける微細粒運動変形の微分運動進化

Differential Motion Evolution for Fine-Grained Motion Deformation in Unsupervised Image Animation ( http://arxiv.org/abs/2110.04658v2 )

ライセンス: Link先を確認

Peirong Liu, Rui Wang, Xuefei Cao, Yipin Zhou, Ashish Shah, Ser-Nam Lim

(参考訳) 画像アニメーション(英: Image animation)とは、ソース画像中の特定のオブジェクトに駆動ビデオの動きを転送するタスクである。近年、ラベル付きデータやドメインの事前知識を必要としない教師なしモーション転送において大きな進歩が見られるが、現在の教師なしアプローチの多くは、ソースと駆動ドメインの間に大きな動き/視点の相違が生じると、動きの変形を捉えるのに依然として苦慮している。このような条件下では、動き場を適切に捉えるのに十分な情報が単純に存在しない。動き推定のための微分的精緻化を統合したエンドツーエンドの教師なしモーション転送フレームワークであるDiME (Differential Motion Evolution) を紹介する。主な発見は2つある。(1)常微分方程式(ODE)でモーション転送を捉えることにより、動き場を正則化できること、(2)ソース画像自体を利用することで、大きな動きの変化によって生じる遮蔽/欠落領域をインペイントできることである。さらに、ODEの考え方の自然な拡張として、ビューごとにODEをモデル化することで、ソースオブジェクトの複数の異なるビューが利用可能な場合にDiMEがそれらを容易に活用できることも提案する。9つのベンチマークにわたる広範な実験により、DiMEは最先端手法を大幅に上回り、未見のオブジェクトに対してもはるかによく汎化することが示された。

Image animation is the task of transferring the motion of a driving video to a given object in a source image. While great progress has recently been made in unsupervised motion transfer, requiring no labeled data or domain priors, many current unsupervised approaches still struggle to capture the motion deformations when large motion/view discrepancies occur between the source and driving domains. Under such conditions, there is simply not enough information to capture the motion field properly. We introduce DiME (Differential Motion Evolution), an end-to-end unsupervised motion transfer framework integrating differential refinement for motion estimation. Key findings are twofold: (1) capturing the motion transfer with an ordinary differential equation (ODE) helps to regularize the motion field, and (2) by utilizing the source image itself, we are able to inpaint occluded/missing regions arising from large motion changes. We also propose a natural extension to the ODE idea: by modeling an ODE per view, DiME can easily leverage multiple different views of the source object whenever they are available. Extensive experiments across 9 benchmarks show DiME outperforms the state of the art by a significant margin and generalizes much better to unseen objects.

翻訳日:2023-11-22 21:26:47 公開日:2023-11-19

# 尤度フリー頻度論的推論:信頼性のあるシミュレータベース推論のための古典統計と機械学習の橋渡し

Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning for Reliable Simulator-Based Inference ( http://arxiv.org/abs/2107.03920v8 )

ライセンス: Link先を確認

Niccolò Dalmasso, Luca Masserano, David Zhao, Rafael Izbicki, Ann B. Lee

(参考訳) 科学の多くの分野は、複雑なシステムの計算困難な尤度関数を暗黙的にエンコードするコンピュータシミュレータを多用している。古典的な統計手法は、いわゆる尤度フリー推論(LFI)の設定、特に漸近的および低次元のレジームの外には適していない。同時に、近似ベイズ計算やより最近の機械学習技術のような従来のLFI法は、一般的な設定(高次元データ、有限サンプルサイズ、任意のパラメータ値)において名目カバレッジを持つ信頼集合を保証しない。さらに、そのような手法が与える信頼集合の経験的カバレッジをパラメータ空間全体にわたって確認するための診断ツールも存在しない。本研究では,古典統計と現代の機械学習を橋渡しする統一的かつモジュール型の推論フレームワークを提案し、次の2つを提供する。(i)未知のパラメータの任意の値に対して頻度論的な有限サンプルカバレッジを持つ信頼集合のネイマン構成への実践的アプローチ、(ii)パラメータ空間全体の経験的カバレッジを推定する解釈可能な診断。この一般的なフレームワークを、尤度フリー頻度論的推論(LF2I; likelihood-free frequentist inference)と呼ぶ。検定統計量を定義する任意の手法はLF2Iを利用して、固定したパラメータ設定でのコストのかかるモンテカルロサンプルなしに、有効な信頼集合と診断を作成することができる。2つの尤度ベースの検定統計量(ACOREとBFF)の検出力を調査し,その経験的性能を高次元の複雑なデータで実証する。コードはhttps://github.com/lee-group-cmu/lf2iで入手できる。

Many areas of science make extensive use of computer simulators that implicitly encode intractable likelihood functions of complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, especially outside asymptotic and low-dimensional regimes. At the same time, traditional LFI methods - such as Approximate Bayesian Computation or more recent machine learning techniques - do not guarantee confidence sets with nominal coverage in general settings (i.e., with high-dimensional data, finite sample sizes, and for any parameter value). In addition, there are no diagnostic tools to check the empirical coverage of confidence sets provided by such methods across the entire parameter space. In this work, we propose a unified and modular inference framework that bridges classical statistics and modern machine learning providing (i) a practical approach to the Neyman construction of confidence sets with frequentist finite-sample coverage for any value of the unknown parameters; and (ii) interpretable diagnostics that estimate the empirical coverage across the entire parameter space. We refer to the general framework as likelihood-free frequentist inference (LF2I). Any method that defines a test statistic can leverage LF2I to create valid confidence sets and diagnostics without costly Monte Carlo samples at fixed parameter settings. We study the power of two likelihood-based test statistics (ACORE and BFF) and demonstrate their empirical performance on high-dimensional, complex data. Code is available at https://github.com/lee-group-cmu/lf2i.
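
For readers unfamiliar with the Neyman construction, the sketch below inverts a family of level-alpha tests on a parameter grid using a toy simulator. Note that it is the brute-force baseline: it draws fresh Monte Carlo samples at every fixed parameter value, which is exactly the cost the abstract says LF2I avoids. It does not use the `lf2i` package; the Gaussian simulator, the test statistic, and all function names are assumptions chosen for illustration.

```python
import math
import random


def simulate(theta, n, rng):
    """Toy simulator (an assumption): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def statistic(sample, theta0):
    """Scaled distance between the sample mean and the hypothesised theta0."""
    mean = sum(sample) / len(sample)
    return abs(mean - theta0) * math.sqrt(len(sample))


def critical_value(theta0, n, alpha, n_sims, rng):
    """Monte Carlo estimate of the (1 - alpha) quantile of the statistic under theta0."""
    draws = sorted(statistic(simulate(theta0, n, rng), theta0) for _ in range(n_sims))
    index = min(n_sims - 1, math.ceil((1.0 - alpha) * n_sims) - 1)
    return draws[index]


def neyman_confidence_set(observed, grid, alpha=0.1, n_sims=500, seed=0):
    """Keep every theta0 on the grid whose level-alpha test does not reject."""
    rng = random.Random(seed)
    n = len(observed)
    return [
        theta0
        for theta0 in grid
        if statistic(observed, theta0) <= critical_value(theta0, n, alpha, n_sims, rng)
    ]


grid = [i / 10 for i in range(-20, 21)]
observed = simulate(0.3, 25, random.Random(1))
conf_set = neyman_confidence_set(observed, grid)
print(conf_set)
```

The cost grows with the grid size times `n_sims`, which is what makes this approach impractical for expensive simulators or high-dimensional parameters.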

翻訳日:2023-11-22 21:25:09 公開日:2023-11-19

# 3次元小分子と高分子錯体のための効率的かつ正確な物理量認識多重グラフニューラルネットワーク

Efficient and Accurate Physics-aware Multiplex Graph Neural Networks for 3D Small Molecules and Macromolecule Complexes ( http://arxiv.org/abs/2206.02789v3 )

ライセンス: Link先を確認

Shuo Zhang, Yang Liu, Lei Xie

(参考訳) グラフニューラルネットワーク(GNN)を分子科学に適用する最近の進歩は、3次元(3D)構造表現をGNNで学習することの有効性を示している。しかし、既存のGNNのほとんどは、多様な相互作用のモデリング不足、計算コストの高い演算、ベクトル値の無視といった限界を抱えている。そこで我々は,新しいGNNモデルである物理認識多重グラフニューラルネットワーク(PaxNet)を提案し,小さな有機化合物と高分子複合体の両方について3次元分子の表現を効率的かつ正確に学習する。PaxNetは、分子力学に着想を得て局所的および非局所的な相互作用のモデリングを分離し、高価な角度関連計算を減らす。スカラー特性の他に、PaxNetは各原子に関連付けられたベクトルを学習することでベクトル特性も予測できる。PaxNetの性能を評価するために,2つのタスクにおいて最先端のベースラインと比較する。量子化学特性を予測する小分子データセットでは、PaxNetは最高のベースラインと比べて予測誤差を15%削減し、メモリ使用量を73%削減する。タンパク質-リガンド結合親和性を予測する高分子データセットでは、PaxNetはメモリ消費を33%、推論時間を85%削減しながら、最高のベースラインを上回っている。したがって、PaxNetは分子の大規模機械学習のための普遍的で堅牢かつ正確な方法を提供する。私たちのコードはhttps://github.com/zetayue/Physics-aware-Multiplex-GNNで利用可能です。

Recent advances in applying Graph Neural Networks (GNNs) to molecular science have showcased the power of learning three-dimensional (3D) structure representations with GNNs. However, most existing GNNs suffer from the limitations of insufficient modeling of diverse interactions, computationally expensive operations, and neglect of vectorial values. Here, we tackle these limitations by proposing a novel GNN model, Physics-aware Multiplex Graph Neural Network (PaxNet), to efficiently and accurately learn the representations of 3D molecules for both small organic compounds and macromolecule complexes. PaxNet separates the modeling of local and non-local interactions inspired by molecular mechanics, and reduces the expensive angle-related computations. Besides scalar properties, PaxNet can also predict vectorial properties by learning an associated vector for each atom. To evaluate the performance of PaxNet, we compare it with state-of-the-art baselines in two tasks. On a small-molecule dataset for predicting quantum chemical properties, PaxNet reduces the prediction error by 15% and uses 73% less memory than the best baseline. On a macromolecule dataset for predicting protein-ligand binding affinities, PaxNet outperforms the best baseline while reducing the memory consumption by 33% and the inference time by 85%. Thus, PaxNet provides a universal, robust and accurate method for large-scale machine learning of molecules. Our code is available at https://github.com/zetayue/Physics-aware-Multiplex-GNN.

翻訳日:2023-11-22 21:15:28 公開日:2023-11-19

# トラッキングと画像・動画物体検出の統一

Unifying Tracking and Image-Video Object Detection ( http://arxiv.org/abs/2211.11077v2 )

ライセンス: Link先を確認

Peirong Liu, Rui Wang, Pengchuan Zhang, Omid Poursaeed, Yipin Zhou, Xuefei Cao, Sreya Dutta Roy, Ashish Shah, Ser-Nam Lim

(参考訳) 物体検出(OD)はコンピュータビジョンにおける最も基本的なタスクの1つである。近年のディープラーニングの進歩により、画像ODの性能は学習ベースのデータ駆動アプローチによって新たな高みへと押し上げられている。一方、ビデオODは、主にデータアノテーションのコストがはるかに高いために、あまり探求されていない。同時に、トラックの同一性や時空間軌跡についての推論を必要とするマルチオブジェクトトラッキング(MOT)は、ビデオODと似た性質を持つ。しかし、ほとんどのMOTデータセットはクラス固有(例えば、人物のみにアノテーションが付与されている)であり、モデルが他のオブジェクトを追跡する柔軟性を制約している。本稿では、画像OD、ビデオOD、MOTを1つのエンドツーエンドモデルで統合する最初のフレームワークであるTrIVD(Tracking and Image-Video Detection)を提案する。データセット間のカテゴリラベルの相違やセマンティックな重複に対処するため、TrIVDは検出/追跡をグラウンディングとして定式化し、視覚とテキストのアライメントを通じてオブジェクトカテゴリを推論する。統合された定式化により、クロスデータセットかつマルチタスクのトレーニングが可能になり、TrIVDはフレームレベルの特徴、ビデオレベルの時空間関係、およびトラックの同一性の対応付けを活用できるようになる。このような共同トレーニングにより、はるかに豊富なオブジェクトカテゴリアノテーションを持つODデータからの知識をMOTに拡張し、ゼロショット追跡機能を実現できる。実験により、マルチタスクで共同訓練されたTrIVDは、すべての画像/ビデオODおよびMOTタスクでシングルタスクのベースラインを上回ることが示された。さらに、ゼロショットトラッキングという新しいタスクに対する最初のベースラインを設定する。

Object detection (OD) has been one of the most fundamental tasks in computer vision. Recent developments in deep learning have pushed the performance of image OD to new heights by learning-based, data-driven approaches. On the other hand, video OD remains less explored, mostly due to much more expensive data annotation needs. At the same time, multi-object tracking (MOT), which requires reasoning about track identities and spatio-temporal trajectories, shares a similar spirit with video OD. However, most MOT datasets are class-specific (e.g., person-annotated only), which constrains a model's flexibility to perform tracking on other objects. We propose TrIVD (Tracking and Image-Video Detection), the first framework that unifies image OD, video OD, and MOT within one end-to-end model. To handle the discrepancies and semantic overlaps of category labels across datasets, TrIVD formulates detection/tracking as grounding and reasons about object categories via visual-text alignments. The unified formulation enables cross-dataset, multi-task training, and thus equips TrIVD with the ability to leverage frame-level features, video-level spatio-temporal relations, as well as track identity associations. With such joint training, we can now extend the knowledge from OD data, which comes with much richer object category annotations, to MOT and achieve zero-shot tracking capability. Experiments demonstrate that multi-task co-trained TrIVD outperforms single-task baselines across all image/video OD and MOT tasks. We further set the first baseline on the new task of zero-shot tracking.

翻訳日:2023-11-22 20:51:44 公開日:2023-11-19

# MLIC:学習画像圧縮のためのマルチ参照エントロピーモデル

MLIC: Multi-Reference Entropy Model for Learned Image Compression ( http://arxiv.org/abs/2211.07273v8 )

ライセンス: Link先を確認

Wei Jiang, Jiayu Yang, Yongqi Zhai, Peirong Ning, Feng Gao, Ronggang Wang

(参考訳) 近年,学習型画像圧縮は著しい性能を達成している。潜在表現の分布を推定するエントロピーモデルは、レート歪み性能の向上に重要な役割を果たしている。しかし、ほとんどのエントロピーモデルは1つの次元における相関のみを捉える一方で、潜在表現はチャネル方向、局所空間、大域空間の相関を含んでいる。この問題に対処するため、Multi-Reference Entropy Model (MEM) とその発展版であるMEM$^+$を提案する。これらのモデルは潜在表現に存在する異なる種類の相関を捉える。具体的には、まず潜在表現をスライスに分割する。現在のスライスを復号する際には、すでに復号されたスライスをコンテキストとして使用し、直前に復号されたスライスのアテンションマップを用いて、現在のスライスにおける大域的相関を予測する。局所コンテキストを捉えるために,性能劣化を回避する2つの拡張チェッカーボードコンテキスト捕捉手法を導入する。MEM と MEM$^+$ に基づいて,画像圧縮モデル MLIC と MLIC$^+$ を提案する。広範な実験評価により、我々のMLICおよびMLIC$^+$モデルは最先端の性能を達成し、PSNRで測定した場合、VTM-17.0と比較してKodakデータセット上のBDレートをそれぞれ$8.05\%$および$11.39\%$削減することが示された。私たちのコードはhttps://github.com/JiangWeibeta/MLICで利用可能です。

Recently, learned image compression has achieved remarkable performance. The entropy model, which estimates the distribution of the latent representation, plays a crucial role in boosting rate-distortion performance. However, most entropy models only capture correlations in one dimension, while the latent representation contains channel-wise, local spatial, and global spatial correlations. To tackle this issue, we propose the Multi-Reference Entropy Model (MEM) and the advanced version, MEM$^+$. These models capture the different types of correlations present in the latent representation. Specifically, we first divide the latent representation into slices. When decoding the current slice, we use previously decoded slices as context and employ the attention map of the previously decoded slice to predict global correlations in the current slice. To capture local contexts, we introduce two enhanced checkerboard context capturing techniques that avoid performance degradation. Based on MEM and MEM$^+$, we propose image compression models MLIC and MLIC$^+$. Extensive experimental evaluations demonstrate that our MLIC and MLIC$^+$ models achieve state-of-the-art performance, reducing BD-rate by $8.05\%$ and $11.39\%$ on the Kodak dataset compared to VTM-17.0 when measured in PSNR. Our code is available at https://github.com/JiangWeibeta/MLIC.
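
The two structural ingredients mentioned above, channel slices and a checkerboard split of spatial positions, can be sketched in a few lines. This shows only the plain two-pass checkerboard pattern and an even channel partition; it is not the paper's enhanced checkerboard variants or its attention-based global context. The function names and the choice of which parity acts as the anchor are assumptions made for illustration.

```python
def split_into_slices(num_channels, num_slices):
    """Partition channel indices into contiguous, near-equal slices."""
    base, extra = divmod(num_channels, num_slices)
    slices, start = [], 0
    for i in range(num_slices):
        size = base + (1 if i < extra else 0)
        slices.append(list(range(start, start + size)))
        start += size
    return slices


def checkerboard_mask(height, width):
    """1 marks anchor positions (decoded first), 0 marks non-anchors."""
    return [[(r + c) % 2 for c in range(width)] for r in range(height)]


def local_context(mask, r, c):
    """Anchor neighbours (4-connected) available when decoding position (r, c)."""
    h, w = len(mask), len(mask[0])
    neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in neighbours if 0 <= i < h and 0 <= j < w and mask[i][j] == 1]


print(split_into_slices(10, 3))   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(checkerboard_mask(2, 3))    # [[0, 1, 0], [1, 0, 1]]
```

Every in-bounds 4-neighbour of a non-anchor position is an anchor, so once the first pass is decoded the second pass can be decoded in parallel with local context from all sides.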

翻訳日:2023-11-22 20:50:12 公開日:2023-11-19

# 擬似決定論的量子回路の難読化

Obfuscation of Pseudo-Deterministic Quantum Circuits ( http://arxiv.org/abs/2302.11083v3 )

ライセンス: Link先を確認

James Bartusek, Fuyuki Kitagawa, Ryo Nishimaki, and Takashi Yamakawa

(参考訳) 誤りを伴う学習の量子困難性を仮定して、古典オラクルモデルにおいて擬似決定論的量子回路を難読化する方法を示す。量子回路$Q$の古典的記述が与えられると、我々の難読化器は、任意の入力に対して$Q$を繰り返し評価するために使用できる量子状態$\ket{\widetilde{Q}}$を出力する。古典オラクルを任意の耐量子識別不能性難読化器の候補でインスタンス化することにより、すべての多項式サイズの擬似決定論的量子回路に対する識別不能性難読化の最初の候補構成が得られる。特に,本方式はShorのアルゴリズム(SICOMP 1997)を実装するのに十分強力な回路クラスに対する最初の候補難読化器である。我々のアプローチは、適切な量子計算の古典的検証(CVQC)方式の検証者を難読化することで \emph{null} 量子回路を難読化する Bartusek と Malavolta (ITCS 2022) に従う。我々は、Mahadevの量子完全準同型暗号方式(FOCS 2018)の評価手続きの検証に使用できる、量子 \emph{partitioning} 回路に対する公開検証可能なCVQC方式を構成することで、null回路を超える。これは、Bartusek (TCC 2021) の1回限り安全な方式を、公開復号可能な \emph{Pauli functional commitment} を通じて完全に再利用可能な方式へとアップグレードすることで実現し、このコミットメントを本研究で形式的に定義し構成する。このコミットメント方式は、受信者の標準基底およびアダマール基底の復号機能にアクセスできるコミッタに対する束縛性(binding)の概念を満たし、equivocalだが衝突耐性を持つハッシュ関数の文脈で導入された Amos, Georgiou, Kiayias, Zhandry (STOC 2020) の技術に基づいて構成される。

We show how to obfuscate pseudo-deterministic quantum circuits in the classical oracle model, assuming the quantum hardness of learning with errors. Given the classical description of a quantum circuit $Q$, our obfuscator outputs a quantum state $\ket{\widetilde{Q}}$ that can be used to evaluate $Q$ repeatedly on arbitrary inputs. Instantiating the classical oracle using any candidate post-quantum indistinguishability obfuscator gives us the first candidate construction of indistinguishability obfuscation for all polynomial-size pseudo-deterministic quantum circuits. In particular, our scheme is the first candidate obfuscator for a class of circuits that is powerful enough to implement Shor's algorithm (SICOMP 1997). Our approach follows Bartusek and Malavolta (ITCS 2022), who obfuscate \emph{null} quantum circuits by obfuscating the verifier of an appropriate classical verification of quantum computation (CVQC) scheme. We go beyond null circuits by constructing a publicly-verifiable CVQC scheme for quantum \emph{partitioning} circuits, which can be used to verify the evaluation procedure of Mahadev's quantum fully-homomorphic encryption scheme (FOCS 2018). We achieve this by upgrading the one-time secure scheme of Bartusek (TCC 2021) to a fully reusable scheme, via a publicly-decodable \emph{Pauli functional commitment}, which we formally define and construct in this work. This commitment scheme, which satisfies a notion of binding against committers that can access the receiver's standard and Hadamard basis decoding functionalities, is constructed by building on techniques of Amos, Georgiou, Kiayias, and Zhandry (STOC 2020) introduced in the context of equivocal but collision-resistant hash functions.

翻訳日:2023-11-22 20:25:51 公開日:2023-11-19

# 初期化学習: メタ学習はプロンプトチューニングにおけるクロスタスクの一般化を改善するか?

Learning to Initialize: Can Meta Learning Improve Cross-task Generalization in Prompt Tuning? ( http://arxiv.org/abs/2302.08143v3 )

ライセンス: Link先を確認

Chengwei Qin, Qian Li, Ruochen Zhao, Shafiq Joty

(参考訳) 事前学習された言語モデル(PLM)を凍結したまま、タスクごとに追加されるトークン列の埋め込みのみをチューニングするプロンプトチューニング(PT)は、few-shot学習で優れた性能を示している。それにもかかわらず、PTはプロンプト埋め込みの良好な初期化に大きく依存することが示されている。本研究では,メタプロンプトチューニング(MPT)を研究し,他の関連タスクからプロンプト埋め込みの初期化を学習することを通じて、メタ学習がPTにおけるクロスタスクの汎化を(可能であれば)どのように改善できるかを体系的に探る。我々は,多数のfew-shotタスクにおいて、ソース/ターゲットタスクの構成が異なる幅広い適応設定のもとで,代表的なメタ学習アルゴリズム群を経験的に分析する。広範な実験と分析により,MPTの有効性を実証する。この改善は特に分類タスクにおいて顕著である。質問応答など他の種類のタスクでは、MPTはほとんどの場合にPTを上回るものの、マルチタスク学習を常に上回るわけではない。さらに,タスクの類似性の観点から詳細な分析を行う。

Prompt tuning (PT), which only tunes the embeddings of an additional sequence of tokens per task while keeping the pre-trained language model (PLM) frozen, has shown remarkable performance in few-shot learning. Despite this, PT has been shown to rely heavily on good initialization of the prompt embeddings. In this work, we study meta prompt tuning (MPT) to systematically explore how meta-learning can help improve (if it can) cross-task generalization in PT through learning to initialize the prompt embeddings from other relevant tasks. We empirically analyze a representative set of meta learning algorithms in a wide range of adaptation settings with different source/target task configurations on a large set of few-shot tasks. With extensive experiments and analysis, we demonstrate the effectiveness of MPT. We find the improvement to be significant particularly on classification tasks. For other kinds of tasks such as question answering, we observe that while MPT can outperform PT in most cases, it does not always outperform multi-task learning. We further provide an in-depth analysis from the perspective of task similarity.

翻訳日:2023-11-22 20:24:50 公開日:2023-11-19

# ChatGPTは汎用の自然言語処理タスクソルバーか?

Is ChatGPT a General-Purpose Natural Language Processing Task Solver? ( http://arxiv.org/abs/2302.06476v3 )

ライセンス: Link先を確認

Chengwei Qin, Aston Zhang, Zhuosheng Zhang, Jiaao Chen, Michihiro Yasunaga, Diyi Yang

(参考訳) 規模の進展に後押しされ、大規模言語モデル(LLM)は、下流データに適応することなく、さまざまな自然言語処理(NLP)タスクをゼロショットで実行できることを実証してきた。最近登場したChatGPTは、人間の入力に対して高品質な応答を生成でき、その後の会話に基づいて以前の誤りを自己修正できることから、NLPコミュニティから大きな注目を集めている。しかし、ChatGPTが多くのNLPタスクをゼロショットで実行できるジェネラリストモデルとして機能するかどうかはまだ分かっていない。本研究では,7つの代表的なタスクカテゴリをカバーする20の一般的なNLPデータセットで評価することにより,ChatGPTのゼロショット学習能力を実証的に解析する。広範な実証研究により,現行バージョンのChatGPTの有効性と限界の両方を示す。ChatGPTは推論能力が有利に働く多くのタスク(例えば算術的推論)でよく機能するが、系列タグ付けのような特定のタスクを解く際にはまだ課題に直面している。また,定性的なケーススタディを通じて詳細な分析を行う。

Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot -- i.e., without adaptation on downstream data. Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community because it can generate high-quality responses to human input and self-correct previous mistakes based on subsequent conversations. However, it is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot. In this work, we empirically analyze the zero-shot learning ability of ChatGPT by evaluating it on 20 popular NLP datasets covering 7 representative task categories. With extensive empirical studies, we demonstrate both the effectiveness and limitations of the current version of ChatGPT. We find that ChatGPT performs well on many tasks favoring reasoning capabilities (e.g., arithmetic reasoning) while it still faces challenges when solving specific tasks such as sequence tagging. We additionally provide in-depth analysis through qualitative case studies.

翻訳日:2023-11-22 20:23:22 公開日:2023-11-19

# 位相空間における任意のハミルトニアンの設計

Engineering Arbitrary Hamiltonians in Phase Space ( http://arxiv.org/abs/2302.04257v3 )

ライセンス: Link先を確認

Lingzhen Guo and Vittorio Peano

(参考訳) 非可換フーリエ変換(NcFT)の手法に基づき、周期駆動振動子のフロケ位相空間において任意のハミルトニアンを設計するための一般的な手法を提案する。位相空間における任意の目標フロケ・ハミルトニアンと実空間における周期的駆動ポテンシャルの関係を確立する。位相空間において、例えば回転格子や鋭い境界を持つ井戸といった新しいハミルトニアンを生成できる、実空間の駆動ポテンシャルの解析式を得る。我々のプロトコルは、非古典的状態生成とボソニック量子計算のための様々な実験プラットフォームで実現できる。

We introduce a general method to engineer arbitrary Hamiltonians in the Floquet phase space of a periodically driven oscillator, based on the non-commutative Fourier transformation (NcFT) technique. We establish the relationship between an arbitrary target Floquet Hamiltonian in phase space and the periodic driving potential in real space. We obtain analytical expressions for the driving potentials in real space that can generate novel Hamiltonians in phase space, e.g., rotational lattices and sharp-boundary wells. Our protocol can be realised in a range of experimental platforms for nonclassical state generation and bosonic quantum computation.

翻訳日:2023-11-22 20:22:11 公開日:2023-11-19

# 推定シュレーディンガー-ロバートソン不確定性関係に基づくより強いEPRステアリング基準

Stronger EPR-steering criterion based on inferred Schrodinger-Robertson uncertainty relation ( http://arxiv.org/abs/2303.11914v3 )

ライセンス: Link先を確認

Laxmi Prasad Naik, Rakesh Mohan Das, Prasanta K. Panigrahi

(参考訳) ステアリングは、3つの非等価な非局所相関の形態の1つであり、ベル非局所性とエンタングルメントの中間に位置する。シュレーディンガー-ロバートソン不確定性関係(SRUR)は、エンタングルメントやステアリングの検出に広く用いられている。しかし、SRURに基づく先行研究のステアリング基準は、完全な推定分散不確定性関係を含んでいなかった。本稿では,局所隠れ状態モデルとReidの定式化を考慮し,二者間シナリオにおいてSRURに基づく完全な推定分散EPRステアリング基準を導出する。さらに,離散変数の二者系である2量子ビットおよび2量子トリットの等方的状態を用いて、このステアリング基準の有効性を確認する。

Steering is one of the three inequivalent forms of nonlocal correlations, intermediate between Bell nonlocality and entanglement. The Schrödinger-Robertson uncertainty relation (SRUR) has been widely used to detect entanglement and steering. However, the steering criterion in earlier works, based on SRUR, did not involve the complete inferred-variance uncertainty relation. In this paper, by considering the local hidden state model and Reid formalism, we derive a complete inferred-variance EPR-steering criterion based on SRUR in the bipartite scenario. Furthermore, we check the effectiveness of our steering criterion with discrete variable bipartite two-qubit and two-qutrit isotropic states.
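
For reference, the standard textbook form of the Schrödinger-Robertson uncertainty relation for two observables $\hat{A}$ and $\hat{B}$ is shown below. This is the general relation, not the inferred-variance steering criterion derived in the paper; the notation ($\sigma$ for standard deviations, braces for the anticommutator) is chosen here.

```latex
\sigma_A^2 \, \sigma_B^2 \;\ge\;
\left| \tfrac{1}{2} \langle \{ \hat{A}, \hat{B} \} \rangle
      - \langle \hat{A} \rangle \langle \hat{B} \rangle \right|^2
+ \left| \tfrac{1}{2i} \langle [ \hat{A}, \hat{B} ] \rangle \right|^2
```

Dropping the first (covariance) term on the right-hand side gives the weaker Robertson relation, so a criterion built on the full relation can be stronger than one that keeps only the commutator term.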

翻訳日:2023-11-22 20:15:06 公開日:2023-11-19

# エージェントベース市場モデルと相互作用する多くの学習エージェント

Many learning agents interacting with an agent-based market model ( http://arxiv.org/abs/2303.07393v3 )

ライセンス: Link先を確認

Matthew Dicks, Andrew Paskaramoorthy, Tim Gebbie

(参考訳) 我々は,金融市場のリアクティブなエージェントベースモデル(ABM)とイベント時間で相互作用する、複数の強化学習型最適執行取引エージェントのダイナミクスと相互作用を考察する。このモデルは、最適執行学習エージェント、最小限の知能を持つ流動性テイカー、高速な電子的流動性プロバイダという3つの栄養段階を持つ市場生態系を表している。最適執行エージェントのクラスには、指値注文と成行注文の組み合わせを使用できる買いおよび売りエージェントと、成行注文のみで取引するエージェントが含まれる。報酬関数は、取引執行のスリッページと、注文を適時に執行しないことに対するペナルティとを明示的にバランスさせる。この研究は、エージェントの数、エージェントの初期注文のサイズ、学習に使用される状態空間の関数として、複数の競合する学習エージェントが、最小限の知能を持つ市場シミュレーションにどのように影響するかを示す。我々は、様々な仕様の学習エージェントを含めた場合のABMのダイナミクスを調べるために位相空間プロットを用いる。さらに、学習可能な最適執行エージェントを含めることで、経験的データと同じ複雑さを持つダイナミクスを生み出せるかどうかを検討する。最適執行エージェントを含めると、ABMが生み出す定型化された事実(stylised facts)が経験的データにより適合するように変化し、市場のマイクロストラクチャを調査するABMには必要な要素であることがわかった。しかし, チャーティスト-ファンダメンタリスト-ノイズ型のABMに執行エージェントを含めるだけでは、経験的データで観測される複雑さを再現するには不十分である。

We consider the dynamics and the interactions of multiple reinforcement learning optimal execution trading agents interacting with a reactive Agent-Based Model (ABM) of a financial market in event time. The model represents a market ecology with three trophic levels represented by: optimal execution learning agents, minimally intelligent liquidity takers, and fast electronic liquidity providers. The optimal execution agent classes include buying and selling agents that can either use a combination of limit orders and market orders, or only trade using market orders. The reward function explicitly balances trade execution slippage against the penalty of not executing the order timeously. This work demonstrates how multiple competing learning agents impact a minimally intelligent market simulation as functions of the number of agents, the size of agents' initial orders, and the state spaces used for learning. We use phase space plots to examine the dynamics of the ABM when various specifications of learning agents are included. Further, we examine whether the inclusion of optimal execution agents that can learn is able to produce dynamics with the same complexity as empirical data. We find that the inclusion of optimal execution agents changes the stylised facts produced by the ABM to conform more with empirical data, and is a necessary inclusion for ABMs investigating market micro-structure. However, including execution agents in chartist-fundamentalist-noise ABMs is insufficient to recover the complexity observed in empirical data.
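
The abstract does not give the reward function, so the following is only a hypothetical sketch of how a per-step reward could trade off slippage against a penalty for unexecuted volume. The function name, its arguments, the `penalty` coefficient, and the urgency weighting are all assumptions for illustration and should not be read as the authors' specification.

```python
def execution_reward(side, fill_price, arrival_price, filled, remaining, steps_left, penalty=0.01):
    """Hypothetical per-step reward for an optimal execution agent.

    side: "buy" or "sell"; filled and remaining are volumes; steps_left counts
    decision steps until the execution deadline.
    """
    sign = 1.0 if side == "buy" else -1.0
    # Slippage: paying above (buy) or receiving below (sell) the arrival price is a cost.
    slippage = sign * (fill_price - arrival_price) * filled
    # Unexecuted inventory is penalised more heavily as the deadline approaches.
    urgency = 1.0 / max(steps_left, 1)
    return -slippage - penalty * remaining * urgency


print(execution_reward("buy", 101.0, 100.0, 10, 0, 5))   # -10.0
print(execution_reward("sell", 101.0, 100.0, 10, 0, 5))  # 10.0
```

A larger `penalty` pushes an agent towards aggressive market orders early on, while a smaller one lets it wait for better prices with limit orders, which is the balance the abstract describes.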

翻訳日:2023-11-22 20:13:51 公開日:2023-11-19

# SimuQ: アナログコンパイルによる量子ハミルトニアンシミュレーションのプログラミングフレームワーク

SimuQ: A Framework for Programming Quantum Hamiltonian Simulation with Analog Compilation ( http://arxiv.org/abs/2303.02775v3 )

ライセンス: Link先を確認

Yuxiang Peng, Jacob Young, Pengyu Liu, Xiaodi Wu

(参考訳) 量子系の進化をシミュレートし、量子現象を探究する量子ハミルトニアンシミュレーションは、量子コンピューティングの最も有望な応用の1つである。最近の実験結果から、ノイズのある中規模量子(NISQ)マシンの時代においては、ハミルトニアン指向のアナログ量子シミュレーションが回路指向のデジタル量子シミュレーションよりも有利であることが示唆されている。しかし、アナログ量子シミュレータのプログラミングは、ハードウェアとソフトウェアの間の統一インターフェースが欠如しているため、はるかに困難である。本稿では、ハミルトニアンプログラミングと、異種のアナログ量子シミュレータへのパルスレベルコンパイルをサポートする、量子ハミルトニアンシミュレーションのための最初のフレームワークであるSimuQを設計、実装する。具体的には、SimuQでは、フロントエンドユーザーがターゲットの量子系をHamiltonian Modeling Languageで指定し、アナログ量子シミュレータのハミルトニアンレベルのプログラマビリティは、抽象アナログ命令セット(AAIS)と呼ばれる新しい抽象化によって指定され、ハードウェアプロバイダによってAAIS Specification Languageでプログラムされる。ソルバベースのコンパイルにより、SimuQは所望の量子系の進化をシミュレートするための、実デバイスで実行可能なパルススケジュールを生成する。これは超伝導(IBM)、中性原子(QuEra)、イオントラップ(IonQ)の量子デバイスで実証されている。さらに,ネイティブ操作や相互作用ベースのゲートを持つデバイスのハミルトニアンレベルのプログラマビリティを公開することの利点を実証し,上記のアナログ量子シミュレータでSimuQのコンパイラを評価するための量子シミュレーションの小さなベンチマークを確立する。

Quantum Hamiltonian simulation, which simulates the evolution of quantum systems and probes quantum phenomena, is one of the most promising applications of quantum computing. Recent experimental results suggest that Hamiltonian-oriented analog quantum simulation would be advantageous over circuit-oriented digital quantum simulation in the Noisy Intermediate-Scale Quantum (NISQ) machine era. However, programming analog quantum simulators is much more challenging due to the lack of a unified interface between hardware and software. In this paper, we design and implement SimuQ, the first framework for quantum Hamiltonian simulation that supports Hamiltonian programming and pulse-level compilation to heterogeneous analog quantum simulators. Specifically, in SimuQ, front-end users specify the target quantum system with Hamiltonian Modeling Language, and the Hamiltonian-level programmability of analog quantum simulators is specified through a new abstraction called the abstract analog instruction set (AAIS) and programmed in AAIS Specification Language by hardware providers. Through a solver-based compilation, SimuQ generates executable pulse schedules for real devices to simulate the evolution of desired quantum systems, which is demonstrated on superconducting (IBM), neutral-atom (QuEra), and trapped-ion (IonQ) quantum devices. Moreover, we demonstrate the advantages of exposing the Hamiltonian-level programmability of devices with native operations or interaction-based gates and establish a small benchmark of quantum simulation to evaluate SimuQ's compiler with the above analog quantum simulators.

翻訳日:2023-11-22 20:11:05 公開日:2023-11-19

# IRFL:比喩的言語の画像認識

IRFL: Image Recognition of Figurative Language ( http://arxiv.org/abs/2303.15445v2 )

ライセンス: Link先を確認

Ron Yosef, Yonatan Bitton, Dafna Shahaf

(参考訳) 隠喩、直喩、慣用句などの比喩表現は人間のコミュニケーションの不可欠な部分である。それらは様々な形態の言説に遍在し、人々が複雑で抽象的な考えを伝え、感情を喚起することを可能にする。比喩的な形式はしばしば複数のモダリティ(例えばテキストと画像の両方)を通して伝達されるため、マルチモーダルな比喩的言語を理解することは、視覚、言語、常識、文化的知識への深い理解を織り合わせる重要なAI課題である。本研究では,IRFL(Image Recognition of Figurative Language)データセットを開発する。人間によるアノテーションと我々が作成した自動パイプラインを利用してマルチモーダルデータセットを生成し、マルチモーダルな比喩的言語理解のためのベンチマークとして2つの新しいタスクを導入する。最先端の視覚言語モデルで実験したところ、最良のモデル(22%)でも人間(97%)より大幅に劣ることがわかった。比喩的言語をよりよく理解できるモデルの開発を促進することを期待して、データセット、ベンチマーク、コードを公開する。

Figures of speech such as metaphors, similes, and idioms are integral parts of human communication. They are ubiquitous in many forms of discourse, allowing people to convey complex, abstract ideas and evoke emotion. As figurative forms are often conveyed through multiple modalities (e.g., both text and images), understanding multimodal figurative language is an important AI challenge, weaving together profound vision, language, commonsense and cultural knowledge. In this work, we develop the Image Recognition of Figurative Language (IRFL) dataset. We leverage human annotation and an automatic pipeline we created to generate a multimodal dataset, and introduce two novel tasks as a benchmark for multimodal figurative language understanding. We experimented with state-of-the-art vision and language models and found that the best (22%) performed substantially worse than humans (97%). We release our dataset, benchmark, and code, in hopes of driving the development of models that can better understand figurative language.

翻訳日:2023-11-22 19:57:50 公開日:2023-11-19

# 非線形フォトニック結晶を用いたリートロッター積公式で定義されるコヒーレント圧縮様状態の生成

Generation of a coherent squeezed like state defined with the Lie-Trotter product formula using a nonlinear photonic crystal ( http://arxiv.org/abs/2304.11373v3 )

ライセンス: Link先を確認

Hiroo Azuma

(参考訳) 本稿では,非線形フォトニック結晶を用いてコヒーレント圧縮様光を生成する方法を検討する。フォトニック結晶は入射光の群速度を低下させるため、二次の非線形光学感受率$\chi^{(2)}$を持つ材料で構成されていれば、非線形材料とそれを通過する光との相互作用が強まり、出射光の量子状態は大きく圧縮(スクイーズ)される。したがって、非線形フォトニック結晶を内部に配置した共振器を用いて、コヒーレント圧縮様光を生成できる。このコヒーレント圧縮様状態はリー・トロッター積公式で定義され、その数学的表現は従来のコヒーレント圧縮状態とは異なる。提案手法の物理パラメータを調整することで、このコヒーレント圧縮様状態を圧縮レベル$15.9$ dBで実際に得られることを示す。平均光子数が1または2の圧縮光をビームスプリッタに入射し、その流れを一対のエンタングルした光ビームに分割して、エンタングルメントを定量的に評価する。本論文は、H. Azuma, J. Phys. D: Appl. Phys. 55, 315106 (2022) の続編である。

In this paper, we investigate how to generate coherent squeezed like light using a nonlinear photonic crystal. Because the photonic crystal reduces the group velocity of the incident light, if it is composed of a material with a second-order nonlinear optical susceptibility $\chi^{(2)}$, the interaction between the nonlinear material and the light passing through it strengthens and the quantum state of the emitted light is largely squeezed. Thus, we can generate coherent squeezed like light with a resonating cavity in which the nonlinear photonic crystal is placed. This coherent squeezed like state is defined with the Lie-Trotter product formula and its mathematical expression is different from those of conventional coherent squeezed states. We show that this coherent squeezed like state can be obtained in practice with a squeezing level of $15.9$ dB by adjusting the physical parameters of our proposed method. Feeding squeezed light with an average photon number of one or two into a beam splitter and splitting it into a pair of entangled light beams, we estimate their entanglement quantitatively. This paper is a sequel to H. Azuma, J. Phys. D: Appl. Phys. 55, 315106 (2022).

翻訳日:2023-11-22 19:46:13 公開日:2023-11-19
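
The abstract above defines its state through the Lie-Trotter product formula, exp(A + B) = lim (exp(A/n) exp(B/n))^n. The following is a minimal numerical sketch of that formula only, assuming two arbitrary non-commuting 2x2 matrices `A` and `B` chosen for illustration; they are not the operators used in the paper, and all function names are hypothetical.

```python
# Numerical check of the Lie-Trotter product formula on 2x2 matrices:
#   exp(A + B) = lim_{n -> inf} (exp(A/n) exp(B/n))^n
# A and B are illustrative non-commuting matrices, not the paper's operators.

def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(m, terms=40):
    """Matrix exponential via a truncated Taylor series (adequate for small 2x2 inputs)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = matmul(term, m)
        term = [[v / k for v in row] for row in term]  # term now equals m^k / k!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def scale(m, c):
    return [[v * c for v in row] for row in m]

def trotter(a, b, n):
    """Return (exp(a/n) exp(b/n))^n."""
    step = matmul(expm(scale(a, 1.0 / n)), expm(scale(b, 1.0 / n)))
    out = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        out = matmul(out, step)
    return out

def max_abs_diff(x, y):
    return max(abs(x[i][j] - y[i][j]) for i in range(2) for j in range(2))

A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
exact = expm([[A[i][j] + B[i][j] for j in range(2)] for i in range(2)])

# The approximation error shrinks roughly like 1/n as the number of steps grows.
errors = [max_abs_diff(trotter(A, B, n), exact) for n in (1, 10, 100)]
```

Because `A` and `B` do not commute, a single step (n = 1) differs visibly from exp(A + B), and the gap closes as n increases.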

# (圧縮された)大規模言語モデルをトレーニングする方法

How To Train Your (Compressed) Large Language Model ( http://arxiv.org/abs/2305.14864v2 )

ライセンス: Link先を確認

Ananya Harsh Jha, Tom Sherborne, Evan Pete Walsh, Dirk Groeneveld, Emma Strubell, Iz Beltagy

(参考訳) 大規模言語モデル (LLM) のサイズが大きくなると、モデルの汎用性とゼロショットのプロンプト性を保ちながら、モデルのサイズを縮小できる圧縮方法が必要である。このゴールは一般的な圧縮設定よりも野心的であり、特定のエンドタスクに特化するためにモデルのサイズを減らす。そこで本研究では,言語モデリングの複雑度と12のゼロショットエンドタスクを含む大規模評価を行うタスク非依存圧縮パイプラインを開発した。以上の結果から,単純な層毎の刈り取りと継続する言語モデルが,既存の3つの最先端ベースラインを上回って,計算効率が1.5倍向上していることが示された。しかし、典型的なタスク特化圧縮とは異なり、最良の圧縮モデルは、スクラッチから訓練された同様のサイズのモデルに著しく劣る。半大の事前訓練モデルをタスクに依存しない圧縮の上限とし、合理的なトークン予算の下でこのギャップを埋めるための今後の作業を求める。本研究は,既存のllm圧縮手法の欠如を浮き彫りにし,モデルの汎用性と圧縮時のゼロショットプロンサビリティを維持できる新しい方法の必要性を明らかにした。再現性の向上とメソッド設計の反復を支援するため、コードと評価のセットアップをリリースします。

With the increase in the size of large language models (LLMs), we need compression methods that can reduce the model size while preserving the generality and zero-shot promptability of the model. This goal is more ambitious than the typical compression setup, which reduces the model's size at the expense of specializing it to a specific end-task. To study this, we develop a task-agnostic compression pipeline with a large-scale evaluation comprising language modeling perplexity and 12 zero-shot end-tasks. Our results show that a simple layer-wise pruning followed by continued language model pretraining matches or outperforms three existing state-of-the-art baselines while being 1.5x more computationally efficient. However, unlike typical task-specialized compression, our best-compressed model significantly underperforms a similar-sized model trained from scratch. We posit the half-sized pretrained model as an upper bound for task-agnostic compression and call for future work to bridge this gap under a reasonable token budget. Our findings highlight the inadequacy of existing compression methods for LLMs and establish a requirement for new methods that preserve a model's generality and zero-shot promptability under compression. We release our code and evaluation setup to facilitate reproducibility and help iterate on method design.

翻訳日:2023-11-22 19:38:35 公開日:2023-11-19
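
The abstract above describes layer-wise pruning followed by continued pretraining. As a rough sketch of the pruning step only, the snippet below keeps an evenly spaced subset of layers. The selection rule, the `keep_ratio` parameter, and the layer names are assumptions made for illustration; the abstract does not say which layers are removed, and the continued-pretraining step is not shown.

```python
def prune_layers(layers, keep_ratio=0.5):
    """Keep an evenly spaced subset of layers (one possible selection rule, assumed here)."""
    n_keep = max(1, round(len(layers) * keep_ratio))
    if n_keep >= len(layers):
        return list(layers)
    stride = len(layers) / n_keep
    return [layers[int(i * stride)] for i in range(n_keep)]

# Hypothetical 12-layer model represented only by its layer names.
full_model = [f"layer_{i}" for i in range(12)]
half_model = prune_layers(full_model, keep_ratio=0.5)
```

In this sketch the half-sized model keeps layers 0, 2, 4, 6, 8 and 10; a real pipeline would then continue language-model pretraining on the pruned network.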

# ミニマックス再定式化による効果的なバイレベル最適化

Effective Bilevel Optimization via Minimax Reformulation ( http://arxiv.org/abs/2305.13153v2 )

ライセンス: Link先を確認

Xiaoyu Wang, Rui Pan, Renjie Pi and Tong Zhang

(参考訳) バイレベル最適化は、ハイパーパラメータ最適化、データクリーニング、メタラーニングなど、さまざまな機械学習問題への応用に成功している。しかし、その膨大な計算コストは、大規模問題での利用において大きな課題となる。この課題はバイレベル定式化のネスト構造に起因し、ハイパー勾配を計算するたびにコストのかかる内側の最適化手順が必要になる。そこで本研究では,バイレベル最適化をミニマックス問題として再定式化し,外側と内側の依存関係を効果的に分離する手法を提案する。穏やかな条件下で、これら2つの問題が等価であることを示す。さらに,得られたミニマックス問題を収束保証付きで解くために,多段階の勾配降下上昇法(GDA)アルゴリズムを導入する。広範な実験の結果,提案手法は計算コストを大幅に削減しつつ,最先端のバイレベル手法を上回ることが示された。

Bilevel optimization has found successful applications in various machine learning problems, including hyper-parameter optimization, data cleaning, and meta-learning. However, its huge computational cost presents a significant challenge for its utilization in large-scale problems. This challenge arises due to the nested structure of the bilevel formulation, where each hyper-gradient computation necessitates a costly inner optimization procedure. To address this issue, we propose a reformulation of bilevel optimization as a minimax problem, effectively decoupling the outer-inner dependency. Under mild conditions, we show these two problems are equivalent. Furthermore, we introduce a multi-stage gradient descent and ascent (GDA) algorithm to solve the resulting minimax problem with convergence guarantees. Extensive experimental results demonstrate that our method outperforms state-of-the-art bilevel methods while significantly reducing the computational cost.

翻訳日:2023-11-22 19:37:36 公開日:2023-11-19
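
The abstract above solves its minimax problem with gradient descent and ascent (GDA). The sketch below shows plain single-stage GDA on a toy saddle problem, f(x, y) = 0.5 x^2 + x y - 0.5 y^2. The objective, step size, and function names are assumptions for illustration; the paper's reformulated bilevel objective and its multi-stage schedule are not reproduced here.

```python
def gda(grad_x, grad_y, x, y, lr=0.1, steps=200):
    """Simultaneous gradient descent on x and gradient ascent on y."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - lr * gx, y + lr * gy  # descend in x, ascend in y
    return x, y

# Toy objective f(x, y) = 0.5*x**2 + x*y - 0.5*y**2 with saddle point (0, 0).
x_star, y_star = gda(
    grad_x=lambda x, y: x + y,   # df/dx
    grad_y=lambda x, y: x - y,   # df/dy
    x=1.0,
    y=1.0,
)
```

The toy objective is strongly convex in x and strongly concave in y, so the iterates spiral into the saddle point for any step size below 1.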

# 教師なしマルチビュー歩行者検出

Unsupervised Multi-view Pedestrian Detection ( http://arxiv.org/abs/2305.12457v2 )

ライセンス: Link先を確認

Mengyin Liu, Chao Zhu, Shiqi Ren, Xu-Cheng Yin

(参考訳) ビデオ監視の普及に伴い、特定のエリアの歩行者を正確に検出するために複数のカメラが用いられている。しかし、従来の手法は、ビデオフレームやカメラビューごとに人手によるアノテーションに依存しており、必要なカメラキャリブレーションや同期よりも重い負担となる。そこで本稿では,2D-3Dマッピングによる多視点歩行者検出器の学習におけるアノテーションの必要性を排除するために,Unsupervised Multi-view Pedestrian Detection approach (UMPD)を提案する。 1)Semantic-aware Iterative Segmentation (SIS) は,提案する反復型PCAと視覚言語モデルによるゼロショット意味クラスを用いて,マルチビュー画像の教師なし表現を抽出し,擬似ラベルとなる2次元歩行者マスクに変換する。 2)Geometry-aware Volume-based Detector (GVD) は,2D-to-3D幾何投影により多視点2D画像を3次元ボリュームにエンドツーエンドでエンコードしてボクセルごとの密度と色を予測し,SIS擬似ラベルを用いた3D-to-2Dレンダリング損失で学習する。 3)より優れた検出結果,すなわちGVDからバーズアイビューに投影される3次元密度を得るために,Vertical-aware BEV Regularization (VBR) を提案し,自然な歩行者の姿勢のように垂直となるよう制約する。一般的な多視点歩行者検出ベンチマークであるWildtrack,Terrace,MultiviewXでの広範な実験により,我々の知る限り初の完全教師なし手法である提案手法UMPDが,従来の最先端の教師あり手法と同等の性能を示すことを確認した。コードは公開予定である。

With the growth of video surveillance, multiple cameras have been deployed to accurately locate pedestrians in a specific area. However, previous methods rely on human-labeled annotations in every video frame and camera view, a heavier burden than the necessary camera calibration and synchronization. Therefore, we propose in this paper an Unsupervised Multi-view Pedestrian Detection approach (UMPD) to eliminate the need for annotations when learning a multi-view pedestrian detector via 2D-3D mapping. 1) Firstly, Semantic-aware Iterative Segmentation (SIS) is proposed to extract unsupervised representations of multi-view images, which are converted into 2D pedestrian masks as pseudo labels, via our proposed iterative PCA and zero-shot semantic classes from vision-language models. 2) Secondly, we propose Geometry-aware Volume-based Detector (GVD) to encode multi-view 2D images end-to-end into a 3D volume to predict voxel-wise density and color via 2D-to-3D geometric projection, trained by 3D-to-2D rendering losses with SIS pseudo labels. 3) Thirdly, for better detection results, i.e., the 3D density projected on Bird's-Eye-View from GVD, we propose Vertical-aware BEV Regularization (VBR) to constrain them to be vertical like natural pedestrian poses. Extensive experiments on popular multi-view pedestrian detection benchmarks Wildtrack, Terrace, and MultiviewX show that our proposed UMPD approach, as the first fully-unsupervised method to our best knowledge, performs competitively with the previous state-of-the-art supervised techniques. Code will be available.

翻訳日:2023-11-22 19:37:22 公開日:2023-11-19

# 低ランク拡散モデルによる非教師なし超スペクトルパンシャープニング

Unsupervised Hyperspectral Pansharpening via Low-rank Diffusion Model ( http://arxiv.org/abs/2305.10925v2 )

ライセンス: Link先を確認

Xiangyu Rui, Xiangyong Cao, Li Pang, Zeyu Zhu, Zongsheng Yue, and Deyu Meng

(参考訳) ハイパースペクトルパンシャープニングは、高分解能パンクロマティック (PAN) 画像と低分解能ハイパースペクトル (LRHS) 画像を融合して、単一の高分解能ハイパースペクトル (HRHS) 画像を生成する処理である。既存のベイズベースのHSパンシャープニング法では、画像特徴を特徴付けるために手作りの画像事前分布(prior)を設計する必要があり、ディープラーニングベースのHSパンシャープニング法は通常、大量のペア学習データを必要とし、一般化能力に乏しい。そこで本研究では,事前学習した深層拡散モデルの能力とベイズ法の優れた一般化能力を同時に活用する,ハイパースペクトルパンシャープニングのための低ランク拡散モデルを提案する。具体的には、HRHS画像は2つの低ランクテンソル、すなわちベーステンソルと係数行列の積から復元できると仮定する。ベーステンソルは画像領域上にあり、スペクトル次元が小さい。そのため、事前学習したリモートセンシング拡散モデルを用いてその画像構造を容易に捉えることができる。さらに, 観測されたLRHS画像から, HRHSのスペクトル情報を保持する係数行列を事前推定する, 単純かつ非常に有効な方法を導出する。実験の結果,提案手法は一般的な従来手法のいくつかよりも優れた性能を示し,一部のDLベース手法よりも高い一般化能力を得た。コードは https://github.com/xyrui/PLRDiff で公開されている。

Hyperspectral pansharpening is a process of merging a high-resolution panchromatic (PAN) image and a low-resolution hyperspectral (LRHS) image to create a single high-resolution hyperspectral (HRHS) image. Existing Bayesian-based HS pansharpening methods require designing a handcrafted image prior to characterize the image features, and deep learning-based HS pansharpening methods usually require a large amount of paired training data and suffer from poor generalization ability. To address these issues, in this work, we propose a low-rank diffusion model for hyperspectral pansharpening by simultaneously leveraging the power of the pre-trained deep diffusion model and the better generalization ability of Bayesian methods. Specifically, we assume that the HRHS image can be recovered from the product of two low-rank tensors, i.e., the base tensor and the coefficient matrix. The base tensor lies on the image field and has a low spectral dimension. Thus, we can conveniently utilize a pre-trained remote sensing diffusion model to capture its image structures. Additionally, we derive a simple yet quite effective way to pre-estimate the coefficient matrix from the observed LRHS image, which preserves the spectral information of the HRHS. Experimental results demonstrate that the proposed method performs better than some popular traditional approaches and gains better generalization ability than some DL-based methods. The code is released at https://github.com/xyrui/PLRDiff.

翻訳日:2023-11-22 19:36:20 公開日:2023-11-19
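
The abstract above assumes the HRHS image is the product of a base tensor with a low spectral dimension and a coefficient matrix. The sketch below illustrates that factorization on a tiny example, assuming the product is taken along the spectral axis (a mode-3 product). The shapes, values, and function name are hypothetical and only show how a few base channels expand into many spectral bands.

```python
def mode3_product(base, coeff):
    """Expand a base tensor (H x W x s) into H x W x S bands using coeff (s x S)."""
    s, n_bands = len(coeff), len(coeff[0])
    return [
        [
            [sum(pixel[k] * coeff[k][b] for k in range(s)) for b in range(n_bands)]
            for pixel in row
        ]
        for row in base
    ]

# Hypothetical 2 x 2 image with s = 2 base channels and S = 3 spectral bands.
base_tensor = [[[1, 0], [0, 1]],
               [[1, 1], [2, 0]]]
coeff_matrix = [[1, 2, 3],
                [4, 5, 6]]
hrhs = mode3_product(base_tensor, coeff_matrix)
```

Each output pixel is a linear combination of the rows of `coeff_matrix`, which is why the spectral content can be estimated separately from the spatial structure carried by the base tensor.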

# TwitterとMastodon間のプラットフォーム移行パターンの探索 - ユーザ行動調査

Exploring Platform Migration Patterns between Twitter and Mastodon: A User Behavior Study ( http://arxiv.org/abs/2305.09196v3 )

ライセンス: Link先を確認

Ujun Jeong, Paras Sheth, Anique Tahir, Faisal Alatawi, H. Russell Bernard, Huan Liu

(参考訳) 最近、twitterからmastodonなどの代替プラットフォームに移行するユーザの急増は、移行パターンとは何か、さまざまなプラットフォームがユーザの行動にどう影響するか、ユーザ移行が移行プロセスにどのように収まるのか、といった疑問を提起した。本研究では,twitterの所有権変更後の最初の10週間で,twitterからmastodonに移行した1万人以上のユーザを対象に,これらの質問を詳細に調査する。私たちの研究は3つの主要な段階に分かれている。まず,マイグレーションパターンの抽出と解析を行うアルゴリズムを開発した。第二に、行動分析を活用して、TwitterとMastodonの異なるアーキテクチャを調べ、ユーザー行動が各プラットフォームの特徴とどのように対応するかを学ぶ。最後に,特定の行動要因がユーザに与える影響を判断する。我々は,ユーザの行動調査から得られたユーザマイグレーション,洞察,教訓について共有する。

A recent surge of users migrating from Twitter to alternative platforms, such as Mastodon, raised questions regarding what migration patterns are, how different platforms impact user behaviors, and how migrated users settle in the migration process. In this study, we elaborate on how we investigate these questions by collecting data on over 10,000 users who migrated from Twitter to Mastodon within the first ten weeks following the ownership change of Twitter. Our research is structured in three primary steps. First, we develop algorithms to extract and analyze migration patterns. Second, by leveraging behavioral analysis, we examine the distinct architectures of Twitter and Mastodon to learn how user behaviors correspond with the characteristics of each platform. Last, we determine how particular behavioral factors influence users to stay on Mastodon. We share our findings of user migration, insights, and lessons learned from the user behavior study.

翻訳日:2023-11-22 19:33:55 公開日:2023-11-19

# ソースフリードメイン適応によるSSVEPベースの脳-コンピュータインタフェース

Source-Free Domain Adaptation for SSVEP-based Brain-Computer Interfaces ( http://arxiv.org/abs/2305.17403v2 )

ライセンス: Link先を確認

Osman Berke Guney, Deniz Kucukahmetler and Huseyin Ozkan

(参考訳) 本稿では、定常視覚誘発電位(SSVEP)に基づく脳-コンピュータインタフェース(BCI)スペルに対するソースフリードメイン適応法を提案する。 SSVEPベースのBCIスペルは、迅速なコミュニケーションを可能にすることで、発話困難を経験する個人を支援する。しかし,高情報伝達率 (ITR) を実現するには,システムを使用する前に広い校正期間を必要とするため,新規ユーザの不快感が生じる。本稿では,未ラベルのターゲットデータのみに基づいて,ソースドメイン(元ユーザや過去の実験参加者のデータ)から新たなユーザ(ターゲットドメイン)に事前学習したデータに基づいて,強力なディープニューラルネットワーク(dnn)を適応させる新しい手法を提案する。この適応は、自己適応項と局所正規項からなるカスタム損失関数を最小化する。自己適応項は擬似ラベル戦略を使い、新しい局所規則項はデータ構造を利用してDNNに類似のラベルを隣接インスタンスに割り当てさせる。提案手法は,キャリブレーションの負担を取り除き,優れたキャラクタ識別精度とitrを維持しながらユーザの快適性を優先する。特に、ベンチマークとBETAデータセットにおける201.15ビット/minと145.02ビット/minのITRをそれぞれ達成し、最先端の代替よりも優れています。私たちのコードはhttps://github.com/osmanberke/SFDA-SSVEP-BCIで利用可能です。

This paper presents a source-free domain adaptation method for steady-state visually evoked potentials (SSVEP) based brain-computer interface (BCI) spellers. SSVEP-based BCI spellers assist individuals experiencing speech difficulties by enabling them to communicate at a fast rate. However, achieving a high information transfer rate (ITR) in most prominent methods requires an extensive calibration period before using the system, leading to discomfort for new users. We address this issue by proposing a novel method that adapts a powerful deep neural network (DNN) pre-trained on data from source domains (data from former users or participants of previous experiments) to the new user (target domain), based only on the unlabeled target data. This adaptation is achieved by minimizing our proposed custom loss function composed of self-adaptation and local-regularity terms. The self-adaptation term uses the pseudo-label strategy, while the novel local-regularity term exploits the data structure and forces the DNN to assign similar labels to adjacent instances. The proposed method prioritizes user comfort by removing the burden of calibration while maintaining an excellent character identification accuracy and ITR. In particular, our method achieves striking 201.15 bits/min and 145.02 bits/min ITRs on the benchmark and BETA datasets, respectively, and outperforms the state-of-the-art alternatives. Our code is available at https://github.com/osmanberke/SFDA-SSVEP-BCI

翻訳日:2023-11-22 19:22:49 公開日:2023-11-19
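
The abstract above reports information transfer rates (ITR) in bits/min. The sketch below computes ITR with the Wolpaw definition that is common in the BCI literature, on the assumption that the paper uses the same definition. The function name and the example values (40 targets, 1 second per selection) are illustrative and not taken from the paper.

```python
import math

def itr_bits_per_min(n_targets, accuracy, seconds_per_selection):
    """Wolpaw-style ITR: bits per selection scaled to bits per minute."""
    p, n = accuracy, n_targets
    bits = math.log2(n)
    if 0.0 < p < 1.0:
        bits += p * math.log2(p) + (1.0 - p) * math.log2((1.0 - p) / (n - 1))
    elif p == 0.0:
        bits += math.log2(1.0 / (n - 1))  # the p*log2(p) term vanishes as p -> 0
    # For p == 1.0 both correction terms vanish, leaving log2(n) bits per selection.
    return bits * 60.0 / seconds_per_selection

perfect = itr_bits_per_min(n_targets=40, accuracy=1.0, seconds_per_selection=1.0)
chance = itr_bits_per_min(n_targets=40, accuracy=1.0 / 40, seconds_per_selection=1.0)
```

Under this definition, perfect accuracy gives log2(N) bits per selection, while chance-level accuracy (1/N) gives zero, so ITR rewards both accuracy and short selection times.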

# 簡単なベースラインによる対人訓練の見直しと促進

Revisiting and Advancing Adversarial Training Through A Simple Baseline ( http://arxiv.org/abs/2306.07613v2 )

ライセンス: Link先を確認

Hong Liu

(参考訳) 本稿では,敵の攻撃に対する先駆的防御手法である敵訓練の本質的要素について考察する。本稿では,損失関数や学習速度スケジューラ,データ拡張など,モデルアーキテクチャに依存しない要因が,敵の堅牢性と一般化に影響を及ぼすことを示す。これらの要因が制御されると、SimpleATと呼ばれるシンプルなベースラインアプローチを導入し、最近の手法と競合し、堅牢なオーバーフィッティングを軽減します。我々はCIFAR-10/100とTiny-ImageNetの広範な実験を行い、AutoAttackのような最先端の攻撃者に対するSimpleATの堅牢性を検証する。以上の結果から,CIFAR-10-Cに見られるような画像劣化の存在下で,SimpleATは優れた性能を示した。さらに、我々はSimpleATがモデル予測のばらつきを低減できることを実証的に示す。以上の結果から,SimpleATと先進的対人防御手法の相互関係が明らかとなった。

In this paper, we delve into the essential components of adversarial training which is a pioneering defense technique against adversarial attacks. We indicate that some factors such as the loss function, learning rate scheduler, and data augmentation, which are independent of the model architecture, will influence adversarial robustness and generalization. When these factors are controlled for, we introduce a simple baseline approach, termed SimpleAT, that performs competitively with recent methods and mitigates robust overfitting. We conduct extensive experiments on CIFAR-10/100 and Tiny-ImageNet, which validate the robustness of SimpleAT against state-of-the-art adversarial attackers such as AutoAttack. Our results also demonstrate that SimpleAT exhibits good performance in the presence of various image corruptions, such as those found in the CIFAR-10-C. In addition, we empirically show that SimpleAT is capable of reducing the variance in model predictions, which is considered the primary contributor to robust overfitting. Our results also reveal the connections between SimpleAT and many advanced state-of-the-art adversarial defense methods.

翻訳日:2023-11-22 19:13:33 公開日:2023-11-19

# 集合価値フィードバックによるオンライン学習

Online Learning with Set-Valued Feedback ( http://arxiv.org/abs/2306.06247v3 )

ライセンス: Link先を確認

Vinod Raman, Unique Subedi, Ambuj Tewari

(参考訳) 学習者が1つのラベルを予測するが、フィードバックとしてラベルの集合を受け取るオンラインマルチクラス分類の変種を研究する。このモデルでは、明らかにされた集合に含まれるラベルを出力しなかった場合に学習者がペナルティを課される。単一ラベルフィードバックによるオンラインマルチクラス学習とは異なり、決定論的なオンライン学習可能性とランダム化されたオンライン学習可能性は、集合値フィードバックの下での実現可能な設定において等価ではないことを示す。さらに、フィードバックとして明らかにされうる集合の族のヘリー数が有限であれば、決定論的な実現可能学習可能性とランダム化された実現可能学習可能性は等価であることを示す。この分離を踏まえ、我々は2つの新しい組合せ次元、すなわちSet Littlestone次元とMeasure Shattering次元を与え、その有限性がそれぞれ決定論的およびランダム化された実現可能学習可能性を特徴づけることを示す。さらに、これらの次元は、実現可能な設定における決定論的およびランダム化されたミニマックスリグレットの下界と上界を与える。実現可能な設定を超えて、Measure Shattering次元が不可知的な設定においても学習可能性を特徴づけ、ミニマックスリグレットを定量化することを証明する。最後に,これらの結果を用いて,オンラインマルチラベルランキング,オンラインマルチラベル分類,区間値応答による実数値予測という3つの実践的学習設定におけるミニマックスリグレットの限界を確立する。

We study a variant of online multiclass classification where the learner predicts a single label but receives a set of labels as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are not equivalent in the realizable setting under set-valued feedback. In addition, we show that deterministic and randomized realizable learnability are equivalent if the Helly number of the collection of sets that can be revealed as feedback is finite. In light of this separation, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimension, whose finiteness characterizes deterministic and randomized realizable learnability respectively. Additionally, these dimensions lower- and upper bound the deterministic and randomized minimax regret in the realizable setting. Going beyond the realizable setting, we prove that the Measure Shattering dimension continues to characterize learnability and quantify minimax regret in the agnostic setting. Finally, we use our results to establish bounds on the minimax regret for three practical learning settings: online multilabel ranking, online multilabel classification, and real-valued prediction with interval-valued response.

翻訳日:2023-11-22 19:12:34 公開日:2023-11-19
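
The abstract above penalizes the learner whenever its single predicted label is not in the revealed set. A minimal sketch of that 0-1 loss and its cumulative value over a few rounds follows. The labels, predictions, and function names are made up for illustration.

```python
def set_valued_loss(prediction, revealed_set):
    """0 if the predicted label lies in the revealed set, 1 otherwise."""
    return 0 if prediction in revealed_set else 1

def cumulative_loss(predictions, revealed_sets):
    """Total number of rounds in which the prediction missed the revealed set."""
    return sum(set_valued_loss(p, s) for p, s in zip(predictions, revealed_sets))

# Three hypothetical rounds: the learner misses only in the second one.
revealed = [{"a", "b"}, {"c"}, {"a", "c"}]
predicted = ["a", "a", "c"]
total_mistakes = cumulative_loss(predicted, revealed)
```

Single-label feedback is the special case where every revealed set contains exactly one label.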

# MMSum:ビデオのマルチモーダル要約とサムネイル生成のためのデータセット

MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos ( http://arxiv.org/abs/2306.04216v2 )

ライセンス: Link先を確認

Jielin Qiu, Jiacheng Zhu, William Han, Aditesh Kumar, Karthik Mittal, Claire Jin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Ding Zhao, Bo Li, Lijuan Wang

(参考訳) マルチモーダル出力を伴うマルチモーダル要約(MSMO)が,有望な研究方向として浮上している。それでも、メンテナンスの不十分、データアクセシビリティの欠如、サイズ制限、適切な分類の欠如など、既存の公開MSMOデータセットには多くの制限があり、大きな課題となっている。これらの課題に対処し、この新たな方向性のための包括的なデータセットを提供するため、我々はMMSumデータセットを慎重にキュレートした。新しいデータセットの特徴は次のとおりである。 (1) ビデオコンテンツとテキストコンテンツの両方に対して人手で検証された要約を提供し,マルチモーダル学習に優れた人間による指導とラベルを提供する。 (2) 包括的かつ丁寧に整理された分類を持ち, 多様な実世界のシナリオを包含する17の主要カテゴリと170のサブカテゴリにまたがる。 (3) 提案するデータセット上でベンチマークテストを行い, video summarization, text summarization, multimodal summarization など,さまざまなタスクと手法を評価した。アクセシビリティとコラボレーションを推進するため、MMSumデータセットとデータ収集ツールを完全なオープンソースリソースとして公開する予定である。プロジェクトのWebサイトは https://mmsum-dataset.github.io/ にある。

Multimodal summarization with multimodal output (MSMO) has emerged as a promising research direction. Nonetheless, numerous limitations exist within existing public MSMO datasets, including insufficient maintenance, data inaccessibility, limited size, and the absence of proper categorization, which pose significant challenges. To address these challenges and provide a comprehensive dataset for this new direction, we have meticulously curated the MMSum dataset. Our new dataset features (1) Human-validated summaries for both video and textual content, providing superior human instruction and labels for multimodal learning. (2) Comprehensively and meticulously arranged categorization, spanning 17 principal categories and 170 subcategories to encapsulate a diverse array of real-world scenarios. (3) Benchmark tests performed on the proposed dataset to assess various tasks and methods, including video summarization, text summarization, and multimodal summarization. To champion accessibility and collaboration, we will release the MMSum dataset and the data collection tool as fully open-source resources, fostering transparency and accelerating future developments. Our project website can be found at https://mmsum-dataset.github.io/

翻訳日:2023-11-22 19:10:00 公開日:2023-11-19

# 有限要素インスピレーションネットワーク:部分観測から解釈可能な変形可能な物体ダイナミクスを学習する

Finite element inspired networks: Learning interpretable deformable object dynamics from partial observations ( http://arxiv.org/abs/2307.07975v2 )

ライセンス: Link先を確認

Shamil Mamedov, A. René Geist, Jan Swevers, Sebastian Trimpe

(参考訳) 変形可能な線形オブジェクト(DLO)ダイナミクスの正確なシミュレーションは、対象のタスクが人間に解釈可能で、かつ高速に予測できるモデルを必要とする場合には難しい。このようなモデルを得るために、剛体有限要素法(R-FEM)から着想を得て、ダイナミクスネットワークによって内部状態が時間方向に展開される剛体の直列鎖としてDLOをモデル化する。この状態は直接観測されないため、ダイナミクスネットワークは、観測された運動変数をDLOの隠れ状態にマッピングする物理インフォームドエンコーダと共同で訓練される。状態が物理的に意味のある表現を獲得するよう促すために、基礎となるR-FEMモデルの順運動学をデコーダとして活用する。ロボット実験を通じて、提案アーキテクチャが、部分的な観測から物理的に解釈可能な予測をもたらす、扱いやすく有能なDLOダイナミクスモデルを提供することを示す。プロジェクトコードは https://tinyurl.com/fei-networks で利用可能である。

Accurate simulation of deformable linear object (DLO) dynamics is challenging if the task at hand requires a human-interpretable model that also yields fast predictions. To arrive at such a model, we draw inspiration from the rigid finite element method (R-FEM) and model a DLO as a serial chain of rigid bodies whose internal state is unrolled through time by a dynamics network. As this state is not observed directly, the dynamics network is trained jointly with a physics-informed encoder which maps observed motion variables to the DLO's hidden state. To encourage the state to acquire a physically meaningful representation, we leverage the forward kinematics of the underlying R-FEM model as a decoder. Through robot experiments we demonstrate that the proposed architecture provides an easy-to-handle, yet capable DLO dynamics model yielding physically interpretable predictions from partial observations. The project code is available at: https://tinyurl.com/fei-networks

翻訳日:2023-11-22 18:49:37 公開日:2023-11-19

# 非パラメトリック文脈付きバンディットにおける最も重要なシフトの追跡

Tracking Most Significant Shifts in Nonparametric Contextual Bandits ( http://arxiv.org/abs/2307.05341v2 )

ライセンス: Link先を確認

Joe Suk and Samory Kpotufe

(参考訳) リプシッツな平均報酬関数が時間とともに変化しうる非パラメトリックな文脈付きバンディットを研究する。まず、あまり理解されていないこの設定におけるミニマックス動的リグレット率を、変化の回数$L$と全変動$V$によって確立する。これらはいずれも文脈空間上の分布のすべての変化を捉える量であり、この設定では最先端の手法が最適でないことを論じる。次に、この設定における適応性の問題、すなわち$L$や$V$を知らずにミニマックス率を達成できるかという問題に取り組む。極めて重要な点として、与えられた文脈$X_t$において局所的に見たバンディット問題は、文脈空間$\cal X$の他の部分における報酬の変化の影響を受けるべきではないと考える。そこで我々は、experienced significant shifts と呼ぶ変化の概念を提案する。これは局所性をより適切に考慮するため、$L$や$V$よりもかなり少ない変化しか数えない。さらに、非定常MABに関する最近の研究(Suk & Kpotufe, 2022)と同様に、experienced significant shifts は平均報酬の最も重要な変化(例えば、観測された文脈に関連する深刻なベストアームの変化)のみを数える。我々の主な成果は、このより寛容な変化の概念に対して実際に適応可能であることを示すことである。

We study nonparametric contextual bandits where Lipschitz mean reward functions may change over time. We first establish the minimax dynamic regret rate in this less understood setting in terms of the number of changes $L$ and total-variation $V$, both capturing all changes in distribution over context space, and argue that state-of-the-art procedures are suboptimal in this setting. Next, we turn to the question of adaptivity for this setting, i.e. achieving the minimax rate without knowledge of $L$ or $V$. Quite importantly, we posit that the bandit problem, viewed locally at a given context $X_t$, should not be affected by reward changes in other parts of context space $\cal X$. We therefore propose a notion of change, which we term experienced significant shifts, that better accounts for locality, and thus counts considerably fewer changes than $L$ and $V$. Furthermore, similar to recent work on non-stationary MAB (Suk & Kpotufe, 2022), experienced significant shifts only count the most significant changes in mean rewards, e.g., severe best-arm changes relevant to observed contexts. Our main result is to show that this more tolerant notion of change can in fact be adapted to.

翻訳日:2023-11-22 18:48:05 公開日:2023-11-19

# 人間好奇心のネットワーク理論を用いた本質的動機付けグラフ探索

Intrinsically motivated graph exploration using network theories of human curiosity ( http://arxiv.org/abs/2307.04962v3 )

ライセンス: Link先を確認

Shubhankar P. Patankar, Mathieu Ouellet, Juan Cervino, Alejandro Ribeiro, Kieran A. Murphy and Dani S. Bassett

(参考訳) 本質的に動機づけられた探索は、追加の外部報酬なしでも強化学習に役立つことが証明されている。環境が自然にグラフとして表現される場合、探索を導く最善の方法は未解決の問題だ。本研究では,情報ギャップ理論と圧縮進行理論という,人間の好奇心の2つの理論によるグラフ構造データ探索手法を提案する。この理論は好奇心を、環境に訪れるノードによって引き起こされるサブグラフの位相的特徴を最適化する本質的な動機であると考えている。これらの特徴をグラフニューラルネットワークに基づく強化学習の報奨として利用する。合成生成グラフの複数のクラスにおいて、訓練されたエージェントは、訓練中に見られるよりも長い探索的歩行とより大きな環境に一般化する。本手法は, トポロジ特性のグリーディ評価よりも効率的に計算する。提案される本質的動機は、レコメンダシステムに対して特に関連がある。我々は、好奇心を考慮した次のノードレコメンデーションが、MovieLens、Amazon Books、Wikipediaなど、現実世界のグラフ環境におけるPageRank中心性よりも人間の選択をより予測できることを示した。

Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how to guide exploration best remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by nodes visited in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to longer exploratory walks and larger environments than are seen during training. Our method computes more efficiently than the greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. We demonstrate that next-node recommendations considering curiosity are more predictive of human choices than PageRank centrality in several real-world graph environments, including MovieLens, Amazon Books, and Wikipedia.

翻訳日:2023-11-22 18:47:41 公開日:2023-11-19

# 対人訓練による解釈可能なコンピュータビジョンモデル:ロバスト性-解釈可能性結合を解き明かす

Interpretable Computer Vision Models through Adversarial Training: Unveiling the Robustness-Interpretability Connection ( http://arxiv.org/abs/2307.02500v2 )

ライセンス: Link先を確認

Delyan Boychev

(参考訳) 最先端のディープニューラルネットワークの複雑性が永久に増大するにつれて、その解釈性を維持することがますます難しくなっている。本研究は,ロバストなモデル作成に使用される敵の訓練の効果を評価することを目的としている。コンピュータビジョンモデルをより解釈可能にすることが示されている。モデルを現実世界にデプロイする場合、解釈性は堅牢性と同じくらい不可欠です。これら2つの課題の相関性を証明するため,局所的特徴重要度法 (SHAP, 統合的勾配法) と特徴可視化技術 (Representation Inversion, Class Specific Image Generation) を用いてモデルを広範囲に検討した。標準モデルは、ロバストに比べて敵の攻撃の影響を受けやすく、その学習された表現は人間にとって意味をなさない。逆に、これらのモデルは予測をサポートする画像の特徴的な領域に焦点を当てている。さらに、ロバストモデルによって学習される機能は、実際のものに近い。

With the perpetual increase of complexity of the state-of-the-art deep neural networks, it becomes a more and more challenging task to maintain their interpretability. Our work aims to evaluate the effects of adversarial training utilized to produce robust models - less vulnerable to adversarial attacks. It has been shown to make computer vision models more interpretable. Interpretability is as essential as robustness when we deploy the models to the real world. To prove the correlation between these two problems, we extensively examine the models using local feature-importance methods (SHAP, Integrated Gradients) and feature visualization techniques (Representation Inversion, Class Specific Image Generation). Standard models, compared to robust ones, are more susceptible to adversarial attacks, and their learned representations are less meaningful to humans. Conversely, these models focus on distinctive regions of the images which support their predictions. Moreover, the features learned by the robust model are closer to the real ones.

翻訳日:2023-11-22 18:46:14 公開日:2023-11-19

# スピン-1鎖の量子フィッシャー情報と多成分絡み合い

Quantum Fisher Information and multipartite entanglement in spin-1 chains ( http://arxiv.org/abs/2307.02407v2 )

ライセンス: Link先を確認

Federico Dell'Anna, Sunny Pradhan, Cristian Degli Esposti Boschi, Elisa Ercolessi

(参考訳) 本稿では,1次元スピン-1モデルにおける基底状態の量子フィッシャー情報(QFI)をマルチパーティイトエンタングルメントの証として検討する。最も一般的なSU(2)不変のスピン-1鎖であるビリナー・バイカドラティックモデルと、最も近い隣り合う相互作用と開境界条件を持つXXZスピン-1鎖である。厳密な非局所可観測性のqfiのスケーリングは、位相図の特徴付けや、特に位相相の研究において、最大にスケールできることを示した。臨界相におけるその挙動を分析することで、局所および弦観測可能な順序パラメータのスケーリング次元を復元することができる。数値計算は密度行列再正規化群アルゴリズムとテンソルネットワーク技術を利用して得られた。

In this paper, we study the ground state Quantum Fisher Information (QFI) in one-dimensional spin-1 models, as a witness of multipartite entanglement. The models addressed are the Bilinear-Biquadratic model, the most general isotropic SU(2)-invariant spin-1 chain, and the XXZ spin-1 chain, both with nearest-neighbor interactions and open boundary conditions. We show that the scaling of the QFI of strictly non-local observables can be used for characterizing the phase diagrams and, in particular, for studying topological phases, where it scales maximally. Analysing its behavior at the critical phases, we are also able to recover the scaling dimensions of the order parameters both for local and string observables. The numerical results have been obtained by exploiting the Density Matrix Renormalization Group algorithm and Tensor Network techniques.

翻訳日:2023-11-22 18:45:57 公開日:2023-11-19
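
The abstract above uses the ground-state QFI as an entanglement witness. For a pure state, the QFI with respect to an observable O reduces to the standard identity F_Q = 4 (⟨O²⟩ − ⟨O⟩²). The sketch below evaluates it for two spin-1 sites and the collective observable O = S_z ⊗ I + I ⊗ S_z, which is diagonal in the product basis. The choice of observable, the two example states, and the function name are assumptions for illustration; the paper's non-local and string observables and its DMRG ground states are not reproduced.

```python
import math

def qfi_pure_diagonal(amplitudes, eigenvalues):
    """F_Q = 4 * Var(O) for a pure state and an observable diagonal in the given basis."""
    probs = [abs(a) ** 2 for a in amplitudes]
    norm = sum(probs)
    probs = [p / norm for p in probs]  # normalise defensively
    mean = sum(p * lam for p, lam in zip(probs, eigenvalues))
    mean_sq = sum(p * lam * lam for p, lam in zip(probs, eigenvalues))
    return 4.0 * (mean_sq - mean ** 2)

# Product basis |m1, m2> with m in (+1, 0, -1); eigenvalues of Sz(1) + Sz(2).
levels = (1, 0, -1)
total_sz = [m1 + m2 for m1 in levels for m2 in levels]

# Product state |+1, +1>: an eigenstate of the observable, so the variance vanishes.
product_state = [1.0] + [0.0] * 8

# Superposition (|+1, +1> + |-1, -1>) / sqrt(2): maximal spread of total Sz.
cat_state = [0.0] * 9
cat_state[0] = cat_state[8] = 1.0 / math.sqrt(2.0)

qfi_product = qfi_pure_diagonal(product_state, total_sz)
qfi_cat = qfi_pure_diagonal(cat_state, total_sz)
```

The superposition spreads the total magnetization between +2 and −2, which gives the largest variance this observable allows on two sites, while the product state gives zero.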

# 頸動脈超音波画像分割と分類のための領域とカテゴリ信頼に基づくマルチタスクネットワーク

A region and category confidence-based multi-task network for carotid ultrasound image segmentation and classification ( http://arxiv.org/abs/2307.00583v2 )

ライセンス: Link先を確認

Haitao Gan and Ran Zhou and Yanghan Ou and Furong Wang and Xinyao Cheng and Aaron Fenster

(参考訳) 超音波画像における頸動脈プラークのセグメンテーションと分類は、動脈硬化の治療と脳卒中リスクの評価において重要な役割を果たす。深層学習法は頸動脈プラークのセグメンテーションと分類に用いられてきたが,2段階法は解析全体の複雑さを増大させ,既存のマルチタスク法はセグメンテーションと分類の関係を無視している。その結果、価値ある情報がタスク間で十分に活用されず、最適以下の性能にとどまる。そこで我々は,領域信頼モジュール (RCM) とサンプルカテゴリ信頼モジュール (CCM) を用いてこの2つのタスク間の相関を利用する,超音波頸動脈プラークのセグメンテーションと分類のためのマルチタスク学習フレームワーク (RCCM-Net) を提案する。 RCMはプラーク領域の確率から得られる知識を分類タスクに提供し、CCMはセグメンテーションタスクのためのカテゴリ別サンプル重みを学習するよう設計されている。実験には、Zhongnan Hospital(中国・武漢)で収集された頸動脈プラークの2次元超音波画像計1270枚を用いた。提案手法は,既存のシングルタスクネットワーク (SegNet, Deeplabv3+, UNet++, EfficientNet, Res2Net, RepVGG, DPN) およびマルチタスクアルゴリズム (HRNet, MTANet) と比較してセグメンテーションと分類の両方の性能を向上させ,分類精度85.82%,セグメンテーションのDice類似係数84.92%を達成した。アブレーション実験では,設計したRCMとCCMの両方がネットワークの性能向上に有効であることを示した。したがって、提案手法は臨床試験および臨床現場における頸動脈プラーク解析に有用であると考えられる。

The segmentation and classification of carotid plaques in ultrasound images play important roles in the treatment of atherosclerosis and the assessment of stroke risk. Although deep learning methods have been used for carotid plaque segmentation and classification, two-stage methods increase the complexity of the overall analysis and existing multi-task methods ignore the relationship between segmentation and classification. This leads to suboptimal performance, as valuable information might not be fully leveraged across all tasks. Therefore, we propose a multi-task learning framework (RCCM-Net) for ultrasound carotid plaque segmentation and classification, which utilizes a region confidence module (RCM) and a sample category confidence module (CCM) to exploit the correlation between these two tasks. The RCM provides knowledge from the probability of plaque regions to the classification task, while the CCM is designed to learn the categorical sample weight for the segmentation task. A total of 1270 2D ultrasound images of carotid plaques were collected from Zhongnan Hospital (Wuhan, China) for our experiments. The results showed that the proposed method can improve both segmentation and classification performance compared to existing single-task networks (i.e., SegNet, Deeplabv3+, UNet++, EfficientNet, Res2Net, RepVGG, DPN) and multi-task algorithms (i.e., HRNet, MTANet), with an accuracy of 85.82% for classification and a Dice-similarity-coefficient of 84.92% for segmentation. In the ablation study, the results demonstrated that both the designed RCM and CCM were beneficial in improving the network's performance. Therefore, we believe that the proposed method could be useful for carotid plaque analysis in clinical trials and practice.

翻訳日:2023-11-22 18:45:41 公開日:2023-11-19
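
The Dice similarity coefficient reported above for segmentation is a standard overlap measure, 2|A∩B| / (|A| + |B|). The following is a minimal Python sketch of that measure only, assuming binary masks flattened into lists of 0/1 values. The function name `dice_coefficient` and the convention for two empty masks are illustrative assumptions and are not taken from the RCCM-Net paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled 1 in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 6-pixel masks: 2 overlapping pixels, 3 foreground pixels each.
pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 0, 1]
print(dice_coefficient(pred, target))
```

Real pipelines usually compute this per image on 2D arrays and average over the test set; how RCCM-Net aggregates its score is not stated in the abstract.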

# JD広告検索におけるマルチエキスパート知識凝縮を用いたクエリ分類の改善に向けて

Towards Better Query Classification with Multi-Expert Knowledge Condensation in JD Ads Search ( http://arxiv.org/abs/2308.01098v3 )

ライセンス: Link先を確認

Kun-Peng Ning, Ming Pang, Zheng Fang, Xue Jiang, Xi-Wei Zhao, Chang-Ping Peng, Zhan-Gang Lin, Jing-He Hu, Jing-Ping Shao

(参考訳) 検索クエリ分類は、ユーザの意図を理解する効果的な方法であり、実際のオンライン広告システムにおいて非常に重要である。低レイテンシを確保するために、浅いモデル(例えばFastText)が効率的なオンライン推論に広く使われている。しかし、fasttextモデルの表現能力は不十分であり、特に低頻度クエリや尾付きカテゴリでは分類性能が低下する。より深く複雑なモデル(bertなど)を使用することは効果的なソリューションだが、オンライン推論の遅延が増加し、計算コストが高くなる。したがって、推論効率と分類性能の両方をジャグリングする方法は明らかに極めて重要である。本稿では,この課題を克服するために,オンライン高速テキストモデルの厳密な低レイテンシ制約下での分類性能を向上させるための,単純かつ効果的な知識蒸留フレームワークである知識凝縮(kc)を提案する。具体的には、より関連性の高いデータを取得するために、オフラインのBERTモデルをトレーニングすることを提案する。強力なセマンティック表現から恩恵を受けることで、過去のデータに公開されていない関連性の高いラベルがトレーニングセットに追加され、FastTextモデルのトレーニングが改善される。さらに, 関係データのマイニング能力の向上を図るため, 分散分散多元学習戦略を提案する。異なるデータ分布から複数のbertモデルをトレーニングすることで、それぞれ、ハイ、ミドル、低周波の検索クエリでパフォーマンスが向上する。マルチディストリビューションからのモデルアンサンブルにより、その検索能力はより強力になる。我々はこのフレームワークの2つのバージョンをJD検索にデプロイし、オフライン実験と複数のデータセットからのオンラインA/Bテストの両方で提案手法の有効性を検証した。

Search query classification, as an effective way to understand user intents, is of great importance in real-world online ads systems. To ensure a lower latency, a shallow model (e.g. FastText) is widely used for efficient online inference. However, the representation ability of the FastText model is insufficient, resulting in poor classification performance, especially on some low-frequency queries and tailed categories. Using a deeper and more complex model (e.g. BERT) is an effective solution, but it will cause a higher online inference latency and more expensive computing costs. Thus, how to juggle both inference efficiency and classification performance is obviously of great practical importance. To overcome this challenge, in this paper, we propose knowledge condensation (KC), a simple yet effective knowledge distillation framework to boost the classification performance of the online FastText model under strict low latency constraints. Specifically, we propose to train an offline BERT model to retrieve more potentially relevant data. Benefiting from its powerful semantic representation, more relevant labels not exposed in the historical data will be added into the training set for better FastText model training. Moreover, a novel distribution-diverse multi-expert learning strategy is proposed to further improve the mining ability of relevant data. By training multiple BERT models from different data distributions, it can respectively perform better at high, middle, and low-frequency search queries. The model ensemble from multi-distribution makes its retrieval ability more powerful. We have deployed two versions of this framework in JD search, and both offline experiments and online A/B testing from multiple datasets have validated the effectiveness of the proposed approach.

翻訳日:2023-11-22 18:36:26 公開日:2023-11-19

# ワッサーシュタイン統計の形状とアフィン変形に関する情報幾何学

Information Geometry of Wasserstein Statistics on Shapes and Affine Deformations ( http://arxiv.org/abs/2307.12508v3 )

ライセンス: Link先を確認

Shun-ichi Amari, Takeru Matsuda

(参考訳) 情報幾何学とワッサーシュタイン幾何学は確率分布の多様体で導入された2つの主要な構造であり、それらはその異なる特徴を捉えている。位置スケールモデルの多次元一般化であるアフィン変形統計モデルのためのliおよびzhao(2023)の枠組みにおけるワッサースタイン幾何学の特徴について検討した。我々は情報幾何学とwasserstein幾何に基づく推定子の長所と短所を比較した。確率分布の形状とアフィン変形はワッサーシュタイン幾何学において分離され、フィッシャー効率の損失と引き換えに波形摂動に対する頑健さを示す。楕円対称アフィン変形モデルの場合,ワッサースタイン推定器がモーメント推定器であることを示す。波形がガウス的である場合と場合に限り、情報幾何学的推定器(maximum-likelihood estimator)と一致する。ワッサーシュタイン効率の役割は、波形変化に対する堅牢性の観点から解明される。

Information geometry and Wasserstein geometry are two main structures introduced in a manifold of probability distributions, and they capture its different characteristics. We study characteristics of Wasserstein geometry in the framework of Li and Zhao (2023) for the affine deformation statistical model, which is a multi-dimensional generalization of the location-scale model. We compare merits and demerits of estimators based on information geometry and Wasserstein geometry. The shape of a probability distribution and its affine deformation are separated in the Wasserstein geometry, showing its robustness against the waveform perturbation in exchange for the loss in Fisher efficiency. We show that the Wasserstein estimator is the moment estimator in the case of the elliptically symmetric affine deformation model. It coincides with the information-geometrical estimator (maximum-likelihood estimator) when and only when the waveform is Gaussian. The role of the Wasserstein efficiency is elucidated in terms of robustness against waveform change.

翻訳日:2023-11-22 18:33:50 公開日:2023-11-19

# 駆動散逸量子スピンガラスにおけるエンタングルメントとレプリカ対称性の破れ

Entanglement and replica symmetry breaking in a driven-dissipative quantum spin glass ( http://arxiv.org/abs/2307.10176v2 )

ライセンス: Link先を確認

Brendan P. Marsh, Ronen M. Kroeze, Surya Ganguli, Sarang Gopalakrishnan, Jonathan Keeling, and Benjamin L. Lev

(参考訳) 本稿では、本質的に駆動散逸的なスピングラスを実現する共焦点共振器QEDシステムの量子ダイナミクスのシミュレーションについて述べる。開放量子ダイナミクスとレプリカ対称性の破れの間に密接な関係が確立され、そこでは個々の量子軌道がレプリカとなる。最大15個のスピン1/2粒子からなる全結合でフラストレーションのあるスピンネットワークにおいて、レプリカ対称性の破れの出現にエンタングルメントが重要な役割を果たすことを観察する。エンタングルしたスピンの量子軌道は、半古典的軌道よりも低いエネルギーの定常状態スピン配置に到達する。共振器からの放出によりスピン配置の連続的な確率的時間発展をモニタリングでき、その反作用 (バックアクション) はエンタングルした状態を、イジング対称性とレプリカ対称性が破れた状態へと射影する。スピングラス秩序の出現は、磁化が存在しないことと、レプリカ間に非自明なスピン重なり密度分布が存在することの両方によって示される。さらに、これらの重なりは、Sherrington-Kirkpatrick模型に対するParisiのRSB解の仮設 (ansatz) と整合する、初期段階の超計量的秩序を示す。一方、非熱的なParisi秩序パラメータ分布は、この量子光学スピングラスの駆動散逸的な性質を際立たせている。この実現可能なシステムは、量子効果がスピングラスの物理をいかに豊かにするかを探るためのテストベッドとして機能しうる。

We describe simulations of the quantum dynamics of a confocal cavity QED system that realizes an intrinsically driven-dissipative spin glass. A close connection between open quantum dynamics and replica symmetry breaking is established, in which individual quantum trajectories are the replicas. We observe that entanglement plays an important role in the emergence of replica symmetry breaking in a fully connected, frustrated spin network of up to fifteen spin-1/2 particles. Quantum trajectories of entangled spins reach steady-state spin configurations of lower energy than that of semiclassical trajectories. Cavity emission allows monitoring of the continuous stochastic evolution of spin configurations, while backaction from this projects entangled states into states of broken Ising and replica symmetry. The emergence of spin glass order manifests itself through the simultaneous absence of magnetization and the presence of nontrivial spin overlap density distributions among replicas. Moreover, these overlaps reveal incipient ultrametric order, in line with the Parisi RSB solution ansatz for the Sherrington-Kirkpatrick model. A nonthermal Parisi order parameter distribution, however, highlights the driven-dissipative nature of this quantum optical spin glass. This practicable system could serve as a testbed for exploring how quantum effects enrich the physics of spin glasses.

翻訳日:2023-11-22 18:32:19 公開日:2023-11-19

# ヒッグス真空がゼロの可視宇宙の双対として実現される隠れたセクタダークマター

Hidden Sector Dark Matter Realized as a Twin of the Visible Universe With Zero Higgs Vacuum Expectation ( http://arxiv.org/abs/2308.08107v4 )

ライセンス: Link先を確認

Stephen L. Adler

(参考訳) 宇宙は2つの同一の粒子集合とゲージ相互作用を含み、両者は重力を通じてのみ結合し、ヒッグスポテンシャルによって異なると提案する。基礎となる対称性のため、非結合時の2つのセクターは、ヒッグス真空期待値が非ゼロの相とゼロの相の境界に位置するヒッグスポテンシャルを持つと仮定する。2つのセクター間の結合を導入すると縮退が破れ、一方のセクターのヒッグスポテンシャルは非ゼロのヒッグス期待値の領域へ (可視セクターを与える)、もう一方のセクターのヒッグスポテンシャルはゼロのヒッグス期待値の領域へ (ダークセクターを与える) 押しやられうる。このとき、ダークセクターで最も質量の小さいバリオンが、自己相互作用するダークマター粒子の候補となる。

We propose that the universe contains two identical sets of particles and gauge interactions, coupling only through gravitation, which differ by their Higgs potentials. We postulate that because of underlying symmetries, the two sectors when uncoupled have Higgs potentials that lie at the boundary between phases with nonzero and zero Higgs vacuum expectation. Turning on the coupling between the two sectors can break the degeneracy, pushing the Higgs potential in one sector into the domain of nonzero Higgs expectation (giving the visible sector), and pushing the Higgs potential in the other sector into the domain of zero Higgs expectation (giving the dark sector). The least massive baryon in the dark sector will then be a candidate self-interacting dark matter particle.

翻訳日:2023-11-22 18:20:18 公開日:2023-11-19

# 量子カーネル生成におけるいくつかの適合関数と絡み合いゲート

Several fitness functions and entanglement gates in quantum kernel generation ( http://arxiv.org/abs/2309.03307v3 )

ライセンス: Link先を確認

Haiyan Wang

(参考訳) 量子機械学習(QML)は、量子技術における有望なフロンティアである。量子アドバンテージの追求において、サポートベクトルマシンのための量子カーネル法が強力なアプローチとして登場した。量子力学の基本的な概念である絡み合いは、量子コンピューティングにおいて中心的な役割を果たす。本稿では,多目的遺伝的アルゴリズムを用いて,量子カーネル特徴写像におけるエンタングルメントゲートの最適個数について検討する。我々は,非局所ゲートと局所ゲートの遺伝的アルゴリズムの適合機能を明確にし,エンタングルメントゲートを用いる利点について考察した。実験により,量子カーネル法における量子回路の最適構成は,絡み合うための非局所ゲートの数に比例することがわかった。この結果は、非局所ゲートが主に抑制された量子カーネル生成に関する以前の文献を補完する。さらに,量子サポートベクトルマシンの機能マップに必要な非局所ゲート数を推定するために,データの分離性指標を活用できることを実証する。この洞察は、データ分析に基づいたhttps://qiskit.org/のような様々な量子プログラミングパッケージで、絡み合いパラメータなどの適切なパラメータを選択するのに役立つ。本研究は,量子機械学習アルゴリズムの効率と精度を向上させる上で有用なガイダンスを提供する。

Quantum machine learning (QML) represents a promising frontier in quantum technologies. In the pursuit of quantum advantage, the quantum kernel method for support vector machines has emerged as a powerful approach. Entanglement, a fundamental concept in quantum mechanics, assumes a central role in quantum computing. In this paper, we investigate the optimal number of entanglement gates in the quantum kernel feature maps by a multi-objective genetic algorithm. We distinguish the fitness functions of the genetic algorithm for non-local gates for entanglement and local gates to gain insights into the benefits of employing entanglement gates. Our experiments reveal that the optimal configuration of quantum circuits for the quantum kernel method incorporates a proportional number of non-local gates for entanglement. The result complements the prior literature on quantum kernel generation where non-local gates were largely suppressed. Furthermore, we demonstrate that the separability indexes of data can be leveraged to estimate the number of non-local gates required for the quantum support vector machine's feature maps. This insight can be helpful in selecting appropriate parameters, such as the entanglement parameter, in various quantum programming packages like https://qiskit.org/ based on data analysis. Our findings offer valuable guidance for enhancing the efficiency and accuracy of quantum machine learning algorithms.

翻訳日:2023-11-22 18:10:20 公開日:2023-11-19

# 情報熱力学第二法則の普遍的妥当性

Universal validity of the second law of information thermodynamics ( http://arxiv.org/abs/2308.15558v2 )

ライセンス: Link先を確認

Shintaro Minagawa, M. Hamed Mohammady, Kenta Sakai, Kohtaro Kato, Francesco Buscemi

(参考訳) フィードバック制御と消去プロトコルは、マックスウェルのデモンパラドックスを具現化し、熱力学と情報処理の相互作用を研究するモデルとしてしばしば考えられている。このような研究は、マクスウェルのデーモンと第二の熱力学の法則が平和的に共存できるという結論をコミュニティで広く受け入れられており、デーモンが与える利益は、測定を行い、デーモンの記憶を初期状態に戻すコストによって相殺されなければならないからである。この種のステートメントは、まとめて情報熱力学の第2法則と呼ばれ、最近量子理論シナリオを含むように拡張されている。しかし、この方向の以前の研究では、特にデーモンの記憶におけるフィードバックプロセスと測定についていくつかの仮定がなされており、普遍的に適用不可能であり、有効範囲が明確でないステートメントに到達している。本研究では、熱力学の第2法則と完全に一致した量子フィードバック制御および消去プロトコルの全範囲を正確に特徴付けることにより、このギャップを埋める。量子フィードバック制御と消去プロトコルは、そのプロトコルが熱力学と全体的な互換性がある限り、関連する測定プロセスに関係なく保持されなければならない。我々の包括的な分析は、新しいシナリオを包含するだけでなく、より少ない仮定で以前のシナリオも取り出す。この単純化は理論のより明確な理解に寄与する。さらに,本研究は,フィードバック制御により抽出可能な作業の特徴を識別する正しい情報尺度として,Groenewold-Ozawa情報ゲインを同定する。

Feedback control and erasure protocols have often been considered as a model to embody Maxwell's Demon paradox and to study the interplay between thermodynamics and information processing. Such studies have led to the conclusion, now widely accepted in the community, that Maxwell's Demon and the second law of thermodynamics can peacefully coexist because any gain provided by the demon must be offset by the cost of performing measurement and resetting the demon's memory to its initial state. Statements of this kind are collectively referred to as second laws of information thermodynamics and have recently been extended to include quantum theoretical scenarios. However, previous studies in this direction have made several assumptions, in particular about the feedback process and the measurement performed on the demon's memory, and thus arrived at statements that are not universally applicable and whose range of validity is not clear. In this work, we fill this gap by precisely characterizing the full range of quantum feedback control and erasure protocols that are overall consistent with the second law of thermodynamics. This leads us to conclude that the second law of information thermodynamics is indeed universal: it must hold for any quantum feedback control and erasure protocol, regardless of the measurement process involved, as long as the protocol is overall compatible with thermodynamics. Our comprehensive analysis not only encompasses new scenarios but also retrieves previous ones, doing so with fewer assumptions. This simplification contributes to a clearer understanding of the theory. Additionally, our work identifies the Groenewold--Ozawa information gain as the correct information measure characterizing the work extractable by feedback control.

翻訳日:2023-11-22 18:07:46 公開日:2023-11-19

# SA2-Net:顕微鏡画像分割のためのスケールアウェアアテンションネットワーク

SA2-Net: Scale-aware Attention Network for Microscopic Image Segmentation ( http://arxiv.org/abs/2309.16661v3 )

ライセンス: Link先を確認

Mustansar Fiaz, Moein Heidari, Rao Muhammad Anwer, Hisham Cholakkal

(参考訳) 顕微鏡画像分割は、与えられた顕微鏡画像内の各ピクセルに意味的ラベルを割り当てることを目的としている。畳み込みニューラルネットワーク(CNN)は多くの既存のフレームワークの基礎となっているが、多くの場合、長距離依存を明示的に捉えるのに苦労する。当初、トランスフォーマーは自己注意でこの問題に対処するために考案されたが、形状、サイズ、外観、ターゲット領域密度など、顕微鏡画像における様々な課題に対処するために、局所的特徴とグローバルな特徴の両方が重要であることが証明されている。本稿では,マルチスケール特徴学習を利用して,顕微鏡画像内の多様な構造を効果的に処理する,注意誘導型SA2-Netを提案する。具体的には,細胞などの微細領域のスケールや形状の変動を正確に把握し,正確なセグメンテーションを行うためのSA2モジュールを提案する。このモジュールは、マルチステージ機能の各レベルにおけるローカルな注意と、複数の解像度にわたるグローバルな関心を取り入れている。さらに、アダプティブアップアテンション(AuA)モジュールと呼ばれる新しいアップサンプリング戦略を導入することで、ぼやけた領域境界(セル境界など)の問題に対処する。このモジュールは、明示的な注意機構を用いて顕微鏡領域の局在性を改善するための識別能力を高める。 5つの挑戦的なデータセットに関する広範な実験は、sa2-netモデルの利点を示しています。ソースコードは \url{https://github.com/mustansarfiaz/sa2-net} で公開されている。

Microscopic image segmentation is a challenging task, wherein the objective is to assign semantic labels to each pixel in a given microscopic image. While convolutional neural networks (CNNs) form the foundation of many existing frameworks, they often struggle to explicitly capture long-range dependencies. Although transformers were initially devised to address this issue using self-attention, it has been proven that both local and global features are crucial for addressing diverse challenges in microscopic images, including variations in shape, size, appearance, and target region density. In this paper, we introduce SA2-Net, an attention-guided method that leverages multi-scale feature learning to effectively handle diverse structures within microscopic images. Specifically, we propose a scale-aware attention (SA2) module designed to capture inherent variations in scales and shapes of microscopic regions, such as cells, for accurate segmentation. This module incorporates local attention at each level of multi-stage features, as well as global attention across multiple resolutions. Furthermore, we address the issue of blurred region boundaries (e.g., cell boundaries) by introducing a novel upsampling strategy called the Adaptive Up-Attention (AuA) module. This module enhances the discriminative ability for improved localization of microscopic regions using an explicit attention mechanism. Extensive experiments on five challenging datasets demonstrate the benefits of our SA2-Net model. Our source code is publicly available at \url{https://github.com/mustansarfiaz/SA2-Net}.

翻訳日:2023-11-22 17:58:49 公開日:2023-11-19

# 時系列予測: 分数差分データによる長期依存性の活用

Time-Series Forecasting: Unleashing Long-Term Dependencies with Fractionally Differenced Data ( http://arxiv.org/abs/2309.13409v3 )

ライセンス: Link先を確認

Sarit Maitra, Vivek Mishra, Srashti Dwivedi, Sukanya Kundu, Goutam Kumar Kundu

(参考訳) 本研究では、分数差分 (FD) を利用して時系列データにおける短期的および長期的依存関係の両方を捉える新しい予測手法を提案する。従来の整数差分法とは異なり、FDは系列のメモリを保持しつつ、モデリングのために系列を安定化する。SPY指数の金融データにFDを適用し、ニュース記事の感情分析を組み込むことで、目的変数の二値分類と組み合わせた場合のFDの有効性を実証的に検討する。FD系列の性能の検証には、教師あり分類アルゴリズムを用いた。その結果、受信者動作特性曲線下面積 (ROCAUC) とMatthews相関係数 (MCC) による評価により、整数差分に対するFDの優位性が確認された。

This study introduces a novel forecasting strategy that leverages the power of fractional differencing (FD) to capture both short- and long-term dependencies in time series data. Unlike traditional integer differencing methods, FD preserves memory in the series while stabilizing it for modeling purposes. By applying FD to financial data from the SPY index and incorporating sentiment analysis from news reports, this empirical analysis explores the effectiveness of FD in conjunction with binary classification of target variables. Supervised classification algorithms were employed to validate the performance of FD series. The results demonstrate the superiority of FD over integer differencing, as confirmed by Receiver Operating Characteristic/Area Under the Curve (ROCAUC) and Matthews Correlation Coefficient (MCC) evaluations.

翻訳日:2023-11-22 17:57:16 公開日:2023-11-19
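
Fractional differencing applies the operator (1 - B)^d with a non-integer order d, where B is the lag operator. Its binomial expansion gives the weights w_0 = 1 and w_k = -w_{k-1} (d - k + 1) / k. The sketch below illustrates that general technique with a fixed-length window. The function names, the window length and the toy series are assumptions for illustration, and nothing here reproduces the paper's actual pipeline or data.

```python
def frac_diff_weights(d, n_weights):
    """First n_weights coefficients of the binomial expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


def frac_diff(series, d, window):
    """Fixed-window fractional difference of a sequence.

    The first window - 1 points are dropped because they lack a full history.
    """
    weights = frac_diff_weights(d, window)
    return [
        sum(weights[k] * series[t - k] for k in range(window))
        for t in range(window - 1, len(series))
    ]


series = [1.0, 3.0, 6.0, 10.0]  # hypothetical toy series
print(frac_diff_weights(0.5, 3))          # weights decay slowly for 0 < d < 1
print(frac_diff(series, d=1.0, window=2)) # d = 1 reduces to the ordinary first difference
```

With d = 1 the weights are 1 and -1, so the output is the usual first difference. For 0 < d < 1 the weights fall off gradually, which is why distant observations still contribute and memory is retained.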

# スパースエントロピーワッサースタイン回帰を用いたロバストネットワークプラニング

Robust Network Pruning With Sparse Entropic Wasserstein Regression ( http://arxiv.org/abs/2310.04918v2 )

ライセンス: Link先を確認

Lei You and Hei Victor Cheng

(参考訳) 本研究では、経験的フィッシャー情報行列(FIM)の計算において、不正確な勾配が存在するというニューラルネットワークプルーニングの問題に取り組む。我々は, 最適輸送 (ot) 問題の幾何学的属性を活かしたエントロピーワッサースタイン回帰 (ewr) の定式化を提案する。これは、データポイント間の近傍補間を採用することでノイズ緩和に優れる分析的に示される。ワッサーシュタイン距離の独特な強さは、ノイズ低減と共分散情報保存のバランスをとる本質的な能力である。各種ネットワーク上での大規模実験により,提案手法と最先端(SoTA)ネットワークプルーニングアルゴリズムとの同等の性能を示した。提案手法は,ネットワークサイズやターゲットのスパース性が大きい場合,ノイズデータやアナログメモリ,逆襲攻撃などにより,ノイズ勾配が存在する場合に,さらに大きな利得が得られる。特に,提案手法では,ネットワークパラメータの4分の1以下しか残っていないmobilenetv1の精度が6%向上し,テスト損失が8%向上した。

This study tackles the issue of inaccurate gradients in neural network pruning when computing the empirical Fisher Information Matrix (FIM). We introduce an entropic Wasserstein regression (EWR) formulation, capitalizing on the geometric attributes of the optimal transport (OT) problem. This formulation is shown analytically to excel in noise mitigation by adopting neighborhood interpolation across data points. The unique strength of the Wasserstein distance is its intrinsic ability to strike a balance between noise reduction and covariance information preservation. Extensive experiments performed on various networks show comparable performance of the proposed method with state-of-the-art (SoTA) network pruning algorithms. Our proposed method outperforms the SoTA when the network size or the target sparsity is large, and the gain is even larger in the presence of noisy gradients, possibly from noisy data, analog memory, or adversarial attacks. Notably, our proposed method achieves a 6% improvement in accuracy and an 8% improvement in testing loss for MobileNetV1 with less than one-fourth of the network parameters remaining.

翻訳日:2023-11-22 17:47:07 公開日:2023-11-19
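
The entropic Wasserstein formulation mentioned above builds on entropy-regularized optimal transport, which is commonly solved with Sinkhorn iterations. The sketch below shows only that generic building block on a toy problem. It is not the paper's EWR pruning objective, and the function name, the regularization strength `eps` and the sample points are all illustrative assumptions.

```python
import math


def sinkhorn(a, b, cost, eps=0.5, n_iters=500):
    """Entropy-regularized OT plan between histograms a and b (Sinkhorn iterations)."""
    n, m = len(a), len(b)
    # Gibbs kernel: a larger eps gives a smoother (more spread-out) transport plan.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical 1-D point clouds with uniform weights and squared-distance cost.
xs, ys = [0.0, 1.0, 2.0], [0.5, 1.5]
a = [1.0 / 3.0] * 3
b = [0.5, 0.5]
cost = [[(x - y) ** 2 for y in ys] for x in xs]
plan = sinkhorn(a, b, cost)
for row in plan:
    print([round(p, 4) for p in row])
```

Each source point spreads its mass over nearby targets instead of committing to a single match. This is one way to picture the neighborhood interpolation across data points that the abstract credits for noise mitigation.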

# ボルン則はどこから来るのか? 重ね合わせ

Where does the Born Rule come from? Superposition ( http://arxiv.org/abs/2310.04188v3 )

ライセンス: Link先を確認

David Ellerman

(参考訳) ボルン則は、数学的形式と実験結果とを確率によって結びつけるため、量子力学 (QM) において重要な役割を担っている。ボルン則は通常の確率論には現れない。では、それはどこから来るのか。これは文献において大きな論争の的となってきた。我々は、ボルン則が現れるような通常の確率論の最も単純な拡張は何かを問うアプローチを取る。その答えとして、有限確率論に (通常の離散事象に加えて) 重ね合わせ事象の概念を加えることでボルン則が現れることを示す。したがって、この規則は物理学に基づく導出を必要としない。これは、通常の確率論に重ね合わせ事象だけを加えたときの、重ね合わせの数学の特徴にすぎない。

The Born Rule plays a critical role in quantum mechanics (QM) since it supplies the link between the mathematical formalism and experimental results in terms of probabilities. The Born Rule does not occur in ordinary probability theory. Where then does it come from? This has been a topic of considerable controversy in the literature. We take the approach of asking what is the simplest extension of ordinary probability theory where the Born rule appears. This is answered by showing that the Born Rule appears by adding the notion of superposition events (in addition to the ordinary discrete events) to finite probability theory. Hence the rule does not need any physics-based derivation. It is simply a feature of the mathematics of superposition when only superposition events are added to ordinary probability theory.

翻訳日:2023-11-22 17:45:54 公開日:2023-11-19
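
For reference, the textbook statement of the Born rule in a finite-dimensional setting is that an outcome with amplitude c_i occurs with probability |c_i|^2 / Σ_j |c_j|^2. The sketch below computes exactly that and nothing more. It does not implement the paper's construction via superposition events, and the function name and example states are illustrative assumptions.

```python
def born_probabilities(amplitudes):
    """Outcome probabilities |c_i|^2 / sum_j |c_j|^2 for a list of complex amplitudes."""
    norm_sq = sum(abs(c) ** 2 for c in amplitudes)
    if norm_sq == 0:
        raise ValueError("the zero vector is not a valid state")
    return [abs(c) ** 2 / norm_sq for c in amplitudes]


# An unnormalized equal-weight superposition with a relative phase.
print(born_probabilities([1, 1j]))
# An unnormalized state with unequal weights.
print(born_probabilities([3, 4j]))
```

The relative phase between amplitudes does not change these probabilities, but it does matter once the state is measured in a different basis.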

# プリミティブな行動の学習と再利用によるHindsight Experience Replayのサンプル効率の改善

Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency ( http://arxiv.org/abs/2310.01827v2 )

ライセンス: Link先を確認

Francisco Roldan Sanchez, Qiang Wang, David Cordova Bulens, Kevin McGuinness, Stephen Redmond, Noel O'Connor

(参考訳) hindsight experience replay (her) は強化学習 (rl) で用いられるテクニックであり、スパース報酬を用いて目標ベースのロボット操作タスクを解決するために、オフポリシーrlベースのエージェントをトレーニングするのに非常に効率的であることが証明されている。 HERは、過去の経験の誤りから学習することで、RLベースのエージェントのサンプル効率を改善するが、環境を探索する際のガイダンスは提供しない。これは、このリプレイ戦略を使ってエージェントを訓練するのに必要な経験量のために、非常に大きなトレーニング時間をもたらす。本稿では,より複雑なタスクを学習しながら,エージェントを探索中により報奨的行動に導くために,単純なタスクの解法として学習された原始的な振る舞いを用いた手法を提案する。しかし、この指導は手動で設計したカリキュラムによっては実行されず、批判者ネットワークを使用して、前述したプリミティブポリシーによって提案されたアクションを使用するかどうかを各時間ステップで決定する。本手法は,複数のブロック操作タスクにおいて,その性能とアルゴリズムのより効率的なバリエーションを比較して評価する。提案手法では, サンプル効率と計算時間の両方から, エージェントがより早く方針を学習できることを実証する。コードはhttps://github.com/franroldans/qmp-herで入手できる。

Hindsight Experience Replay (HER) is a technique used in reinforcement learning (RL) that has proven to be very efficient for training off-policy RL-based agents to solve goal-based robotic manipulation tasks using sparse rewards. Even though HER improves the sample efficiency of RL-based agents by learning from mistakes made in past experiences, it does not provide any guidance while exploring the environment. This leads to very large training times due to the volume of experience required to train an agent using this replay strategy. In this paper, we propose a method that uses primitive behaviours that have been previously learned to solve simple tasks in order to guide the agent toward more rewarding actions during exploration while learning other more complex tasks. This guidance, however, is not executed by a manually designed curriculum, but rather using a critic network to decide at each timestep whether or not to use the actions proposed by the previously-learned primitive policies. We evaluate our method by comparing its performance against HER and other more efficient variations of this algorithm in several block manipulation tasks. We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time. Code is available at https://github.com/franroldans/qmp-her.

翻訳日:2023-11-22 17:43:57 公開日:2023-11-19
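
The abstract says a critic decides at each timestep whether to use an action proposed by a previously learned primitive policy. One plausible reading, sketched below, is to score every candidate action with the critic and take the best one. That argmax rule, along with every function name and the toy 1-D task, is an assumption for illustration and may differ from what the linked repository actually does.

```python
def select_action(state, goal, agent_policy, primitive_policies, critic):
    """Return the candidate action that the critic scores highest.

    Candidates are the learning agent's own proposal followed by one proposal
    per primitive policy; on ties the earliest candidate wins.
    """
    candidates = [agent_policy(state, goal)]
    candidates += [policy(state, goal) for policy in primitive_policies]
    return max(candidates, key=lambda action: critic(state, goal, action))


# Toy 1-D reaching task (entirely hypothetical): states and goals are positions,
# an action is a displacement, and the stand-in critic prefers ending near the goal.
def untrained_agent(state, goal):
    return 0.0  # an agent that has not yet learned to move


def reach_primitive(state, goal):
    return max(-1.0, min(1.0, goal - state))  # step toward the goal, clipped to [-1, 1]


def toy_critic(state, goal, action):
    return -abs(state + action - goal)


chosen = select_action(0.0, 3.0, untrained_agent, [reach_primitive], toy_critic)
print(chosen)
```

In a real off-policy setup the critic would be the learned Q-network, and the chosen transitions would be stored in the replay buffer and relabelled by HER as usual.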

# RegBN: 正則化を用いたマルチモーダルデータのバッチ正規化

RegBN: Batch Normalization of Multimodal Data with Regularization ( http://arxiv.org/abs/2310.00641v2 )

ライセンス: Link先を確認

Morteza Ghahremani and Christian Wachinger

(参考訳) 近年、マルチモーダルデータの統合におけるニューラルネットワークの目覚ましい成功を背景に、マルチソースセンサーが捉えた高次元データの統合への関心が高まっている。しかし、不均一なマルチモーダルデータの統合は大きな課題である。不均一なデータソース間の交絡効果と依存関係が望ましくない変動とバイアスをもたらし、マルチモーダルモデルの性能を最適以下にするためである。そのため、融合の前に、各データモダリティから抽出した低レベルまたは高レベルの特徴を正規化することが重要となる。本稿では、正則化を組み込んだマルチモーダルデータの正規化のための新しい手法、RegBNを提案する。RegBNはフロベニウスノルムを正則化項として用い、交絡因子の副作用と異なるデータソース間の潜在的な依存関係に対処する。提案手法は複数のモダリティにわたってよく一般化し、学習可能なパラメータを不要にすることで、学習と推論を簡素化する。言語、音声、画像、ビデオ、深度、表形式データ、3次元MRIなどの多様なモダリティを含む、5つの研究領域の8つのデータベース上でRegBNの有効性を検証する。提案手法は多層パーセプトロン、畳み込みニューラルネットワーク、Vision Transformerなどの異なるアーキテクチャに広く適用可能であり、マルチモーダルニューラルネットワークにおいて低レベルと高レベルの両方の特徴を効果的に正規化できる。RegBN は \url{https://github.com/mogvision/regbn} で利用可能である。

Recent years have witnessed a surge of interest in integrating high-dimensional data captured by multisource sensors, driven by the impressive success of neural networks in the integration of multimodal data. However, the integration of heterogeneous multimodal data poses a significant challenge, as confounding effects and dependencies among such heterogeneous data sources introduce unwanted variability and bias, leading to suboptimal performance of multimodal models. Therefore, it becomes crucial to normalize the low- or high-level features extracted from data modalities before their fusion takes place. This paper introduces a novel approach for the normalization of multimodal data, called RegBN, that incorporates regularization. RegBN uses the Frobenius norm as a regularizer term to address the side effects of confounders and underlying dependencies among different data sources. The proposed method generalizes well across multiple modalities and eliminates the need for learnable parameters, simplifying training and inference. We validate the effectiveness of RegBN on eight databases from five research areas, encompassing diverse modalities such as language, audio, image, video, depth, tabular, and 3D MRI. The proposed method demonstrates broad applicability across different architectures such as multilayer perceptrons, convolutional neural networks, and vision transformers, enabling effective normalization of both low- and high-level features in multimodal neural networks. RegBN is available at \url{https://github.com/mogvision/regbn}.

翻訳日:2023-11-22 17:43:34 公開日:2023-11-19

# SoybeanNet:無人航空機(UAV)画像からダイズポッドを数えるトランスフォーマーベースの畳み込みニューラルネットワーク

SoybeanNet: Transformer-Based Convolutional Neural Network for Soybean Pod Counting from Unmanned Aerial Vehicle (UAV) Images ( http://arxiv.org/abs/2310.10861v2 )

ライセンス: Link先を確認

Jiajia Li, Raju Thada Magar, Dong Chen, Feng Lin, Dechun Wang, Xiang Yin, Weichao Zhuang and Zhaojian Li

(参考訳) 大豆は食料、タンパク質、油の重要な供給源であり、収量の向上、栽培方法の改良、育種技術の進歩を目的とした広範な研究が行われてきた。この文脈において、大豆のポッド (莢) のカウントは、生産の理解と最適化に不可欠な役割を果たす。近年の進歩にもかかわらず、実際の圃場環境で効果的に動作する頑健なポッドカウントアルゴリズムの開発は依然として大きな課題である。本稿は、米国ミシガン州の実際の大豆畑で撮影した無人航空機 (UAV) 画像を用いた、高精度な大豆ポッドカウントの先駆的な研究である。具体的には、強力なトランスフォーマーバックボーンを利用して大豆ポッドのカウントと位置特定を同時に高精度で行う、新しいポイントベースのカウントネットワークであるSoybeanNetを提案する。さらに、大豆ポッドカウントのためのUAV取得画像の新しいデータセットを作成してオープンソース化した。このデータセットは、自然光条件下で撮影された113枚のドローン画像と、26万個以上の手動でアノテーションされた大豆ポッドからなる。包括的な評価を通じて、SoybeanNetは収集した画像でのテストにおいて、5つの最先端手法を上回る性能を示した。特筆すべきことに、SoybeanNetはテストデータセットで84.51%のカウント精度を達成し、実世界のシナリオでの有効性が裏付けられた。また、ソースコード (\url{https://github.com/JiajiaLi04/Soybean-Pod-Counting-from-UAV-Images}) とラベル付き大豆データセット (\url{https://www.kaggle.com/datasets/jiajiali/uav-based-soybean-pod-images}) も公開しており、大豆ポッドカウントおよび関連分野の今後の研究に有用なリソースとなる。

Soybeans are a critical source of food, protein and oil, and thus have received extensive research aimed at enhancing their yield, refining cultivation practices, and advancing soybean breeding techniques. Within this context, soybean pod counting plays an essential role in understanding and optimizing production. Despite recent advancements, the development of a robust pod-counting algorithm capable of performing effectively in real-field conditions remains a significant challenge. This paper presents a pioneering work of accurate soybean pod counting utilizing unmanned aerial vehicle (UAV) images captured from actual soybean fields in Michigan, USA. Specifically, this paper presents SoybeanNet, a novel point-based counting network that harnesses powerful transformer backbones for simultaneous soybean pod counting and localization with high accuracy. In addition, a new dataset of UAV-acquired images for soybean pod counting was created and open-sourced, consisting of 113 drone images with more than 260k manually annotated soybean pods captured under natural lighting conditions. Through comprehensive evaluations, SoybeanNet demonstrated superior performance over five state-of-the-art approaches when tested on the collected images. Remarkably, SoybeanNet achieved a counting accuracy of 84.51% when tested on the testing dataset, attesting to its efficacy in real-world scenarios. The publication also provides both the source code (\url{https://github.com/JiajiaLi04/Soybean-Pod-Counting-from-UAV-Images}) and the labeled soybean dataset (\url{https://www.kaggle.com/datasets/jiajiali/uav-based-soybean-pod-images}), offering a valuable resource for future research endeavors in soybean pod counting and related fields.

翻訳日:2023-11-22 17:34:42 公開日:2023-11-19

# AutoDIR: 遅延拡散によるオールインワン画像の自動復元

AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion ( http://arxiv.org/abs/2310.10123v3 )

ライセンス: Link先を確認

Yitong Jiang, Zhaoyang Zhang, Tianfan Xue and Jinwei Gu

(参考訳) 本稿では,ある画像が未知の劣化を生じさせる複雑な実世界の画像復元状況を解決することを目的とする。そこで本研究では,複数の未知の劣化を自動的に検出し対処できる,潜在拡散(autodir)を備えたオールインワン画像復元フレームワークを提案する。まず,ブラインド画像品質評価モジュール(biqa)を用いて,画像の未知の支配的画像劣化型の自動検出と同定を行う。次に、オールインワンイメージリファインメント(AIR)モジュールは、BIQAのガイダンスにより、複数の種類の劣化画像復元を処理する。最後に,AIRで歪んだ画像の復元のために,SCM(Structure Correction Module)を提案する。総合的な評価から,autodirはより広い範囲のタスクをサポートしながら,優れた修復結果を達成し,最先端のアプローチに勝ることが示された。特にAutoDIRは、複数の未知の劣化を伴う実シナリオイメージを自動的に処理する最初の方法でもある。

In this paper, we aim to solve complex real-world image restoration situations in which one image may have a variety of unknown degradations. To this end, we propose an all-in-one image restoration framework with latent diffusion (AutoDIR), which can automatically detect and address multiple unknown degradations. Our framework first utilizes a Blind Image Quality Assessment Module (BIQA) to automatically detect and identify the unknown dominant image degradation type of the image. Then, an All-in-One Image Refinement (AIR) Module handles multiple kinds of degradation image restoration with the guidance of BIQA. Finally, a Structure Correction Module (SCM) is proposed to recover the image details distorted by AIR. Our comprehensive evaluation demonstrates that AutoDIR outperforms state-of-the-art approaches by achieving superior restoration results while supporting a wider range of tasks. Notably, AutoDIR is also the first method to automatically handle real-scenario images with multiple unknown degradations.

翻訳日:2023-11-22 17:33:27 公開日:2023-11-19

# 動的モジュール展開と適応による生涯シーケンス生成

Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation ( http://arxiv.org/abs/2310.09886v3 )

ライセンス: Link先を確認

Chengwei Qin, Chen Chen, Shafiq Joty

(参考訳) 連続学習の課題である生涯シーケンス生成(LSG)は、連続的なタスクのシーケンス上でモデルを継続的に訓練し、過去の知識の忘れを回避しつつ、常に新しい世代パターンを学習することを目的としている。既存のLSG手法は主に、タスク間の知識伝達にほとんど注意を払わずに、古い知識を維持することに焦点を当てている。対照的に、人間は以前に獲得した類似のタスクからの知識を活用することで、新しいタスクをよりよく学べる。ヒトの学習パラダイムにインスパイアされた動的モジュール拡張・適応(DMEA)を提案し,タスク相関に基づく新しい知識獲得のためのアーキテクチャを動的に決定し,最も類似したタスクを選択し,新しいタスクへの適応を容易にする。さらに,学習プロセスが現在のタスクに偏りやすく,学習前の知識をより厳しく忘れてしまう可能性があることから,現在のタスクと再生タスクの学習のバランスをとるために,動的勾配スケーリングを提案する。大規模な実験により、DMEAはLSG設定の異なる既存手法より一貫して優れていることを示す。

Lifelong sequence generation (LSG), a problem in continual learning, aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns while avoiding the forgetting of previous knowledge. Existing LSG methods mainly focus on maintaining old knowledge while paying little attention to knowledge transfer across tasks. In contrast, humans can better learn new tasks by leveraging previously acquired knowledge from similar tasks. Inspired by the learning paradigm of humans, we propose Dynamic Module Expansion and Adaptation (DMEA), which enables the model to dynamically determine the architecture for acquiring new knowledge based on task correlation and select the most similar previous tasks to facilitate adaptation to new tasks. In addition, as the learning process can easily be biased towards the current task which might cause more severe forgetting of previously learned knowledge, we propose dynamic gradient scaling to balance the learning of the current task and replayed tasks. With extensive experiments, we demonstrate that DMEA can consistently outperform existing methods in different LSG settings.

翻訳日:2023-11-22 17:33:11 公開日:2023-11-19

# がんにおけるバイオマーカー発見のための大規模言語モデルから知識グラフへ

From Large Language Models to Knowledge Graphs for Biomarker Discovery in Cancer ( http://arxiv.org/abs/2310.08365v2 )

ライセンス: Link先を確認

Md. Rezaul Karim and Lina Molinas Comet and Md Shajalal and Oya Deniz Beyan and Dietrich Rebholz-Schuhmann and Stefan Decker

(参考訳) ドメインの専門家は、特定の生物学的プロセスを理解し広めるために最新の知識に頼ることが多く、それは様々な疾患シナリオにおける予防と治療上の意思決定の戦略設計に役立つ。人工知能 (AI) にとって難しいシナリオの一つは、生物医学データ (テキスト、画像、オミクス、臨床データなど) を用いて、がんの診断と治療の推奨を提供することである。がん、薬物、遺伝子、タンパク質などの生物医学的実体とそのメカニズムに関するデータと知識は、構造化された情報源 (知識ベース (KB)) と非構造化の情報源 (科学論文など) に分散している。大規模知識グラフ (KG) は、意味的に相互に関連するエンティティと関係についての事実を統合・抽出することで構築できる。このようなKGは、探索と質問応答 (QA) を可能にするだけでなく、ドメインの専門家が新しい知識を推論することも可能にする。しかし、データ資産やセマンティック技術への理解が不足しているため、大規模KGの探索とクエリは非ドメインユーザにとって煩雑である。本稿では、がん特異的バイオマーカーの発見と対話型QAに活用するためのドメインKGを開発する。そのために、遺伝子と疾患 (様々な種類のがん) の関係を検証するための意味的推論を可能にする、OncoNet Ontology (ONO) と呼ぶドメインオントロジーを構築した。KGは、BioBERTおよびSciBERTベースの情報抽出器を用いて、ONO、メタデータ、統制語彙、および科学論文から得た生物医学的概念を調和させることで、さらに拡充される。さらに、生物医学領域は進化を続けており、新しい発見が古い発見に取って代わることが多いため、最新の科学的発見にアクセスできなければ、AIシステムが診断と治療を提供する際にコンセプトドリフトを示す可能性が高い。そこで、より最近の論文とKBに基づき、大規模言語モデル (LLM) を用いてKGをファインチューニングする。

Domain experts often rely on the most recent knowledge for apprehending and disseminating specific biological processes that help them design strategies for developing prevention and therapeutic decision-making in various disease scenarios. A challenging scenario for artificial intelligence (AI) is using biomedical data (e.g., texts, imaging, omics, and clinical) to provide diagnosis and treatment recommendations for cancerous conditions. Data and knowledge about biomedical entities like cancer, drugs, genes, proteins, and their mechanisms are spread across structured (e.g., knowledge bases (KBs)) and unstructured (e.g., scientific articles) sources. A large-scale knowledge graph (KG) can be constructed by integrating and extracting facts about semantically interrelated entities and relations. Such a KG not only allows exploration and question answering (QA) but also enables domain experts to deduce new knowledge. However, exploring and querying large-scale KGs is tedious for non-domain users due to their lack of understanding of the data assets and semantic technologies. In this paper, we develop a domain KG to leverage cancer-specific biomarker discovery and interactive QA. For this, we constructed a domain ontology called OncoNet Ontology (ONO), which enables semantic reasoning for validating gene-disease (different types of cancer) relations. The KG is further enriched by harmonizing the ONO, metadata, controlled vocabularies, and biomedical concepts from scientific articles by employing BioBERT- and SciBERT-based information extractors. Further, because the biomedical domain is evolving and new findings often replace old ones, an AI system without access to up-to-date scientific findings is likely to exhibit concept drift while providing diagnosis and treatment. Therefore, we fine-tune the KG using large language models (LLMs) based on more recent articles and KBs.

翻訳日:2023-11-22 17:31:25 公開日:2023-11-19

# FedMFS:選択的モーダル通信を用いた多モード融合学習

FedMFS: Federated Multimodal Fusion Learning with Selective Modality Communication ( http://arxiv.org/abs/2310.07048v2 )

ライセンス: Link先を確認

Liangqi Yuan and Dong-Jun Han and Vishnu Pandi Chellapandi and Stanislaw H. Żak and Christopher G. Brinton

(参考訳) multimodal federated learning (fl) は、デバイスが複数のモダリティ(圧力、動き、その他の種類のデータを測定するセンサーなど)で計測値を集めているfl設定でのモデルトレーニングを強化することを目的としている。しかし、特に異種ネットワーク設定において、マルチモーダルFLに対する重要な課題は未解決のままである。 (i)各装置が収集するモダリティの集合は多様であり、 (ii) 通信制限は、デバイスがローカルに訓練されたモダリティモデルをサーバにアップロードすることを妨げている。本稿では,上記の課題に対処可能な新しいマルチモーダル融合fl手法であるfedmfs(federated multimodal fusion learning with selective modality communication)を提案する。鍵となるアイデアは、各デバイスに対するモダリティ選択基準の導入である。 (i)Shapley値分析によって測定されたモダリティの影響 (ii)通信オーバーヘッドの指標としてのモダリティモデルサイズ。これにより、fedmfはリソースの制約やアプリケーション要件に応じて、通信コストに対して柔軟にパフォーマンスのバランスをとることができる。実世界のActionSenseデータセットの実験では、FedMFSが複数のベースラインに匹敵する精度を達成し、通信オーバーヘッドを4倍に削減できることを示した。

Multimodal federated learning (FL) aims to enrich model training in FL settings where devices are collecting measurements across multiple modalities (e.g., sensors measuring pressure, motion, and other types of data). However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings: (i) the set of modalities collected by each device will be diverse, and (ii) communication limitations prevent devices from uploading all their locally trained modality models to the server. In this paper, we propose Federated Multimodal Fusion learning with Selective modality communication (FedMFS), a new multimodal fusion FL methodology that can tackle the above-mentioned challenges. The key idea is the introduction of a modality selection criterion for each device, which weighs (i) the impact of the modality, gauged by Shapley value analysis, against (ii) the modality model size as a gauge for communication overhead. This enables FedMFS to flexibly balance performance against communication costs, depending on resource constraints and application requirements. Experiments on the real-world ActionSense dataset demonstrate the ability of FedMFS to achieve comparable accuracy to several baselines while reducing the communication overhead by over 4x.
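
The selection criterion described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the linear trade-off, the `alpha` weight, the normalization, and the modality names below are all assumptions made for illustration, not the FedMFS implementation.

```python
def select_modalities(impact, size_bytes, alpha=0.5, top_k=1):
    """Rank modalities by a weighted impact-vs-size trade-off (hypothetical rule).

    impact:     dict modality -> non-negative importance, e.g. mean |Shapley value|
    size_bytes: dict modality -> size of that modality's model in bytes
    alpha:      weight on impact; (1 - alpha) is the weight on communication cost
    """
    # Normalize both quantities to [0, 1] so they are comparable.
    max_impact = max(impact.values()) or 1.0
    max_size = max(size_bytes.values()) or 1.0

    def score(name):
        return (alpha * impact[name] / max_impact
                - (1 - alpha) * size_bytes[name] / max_size)

    ranked = sorted(impact, key=score, reverse=True)
    return ranked[:top_k]


# Hypothetical device with three modality models.
impact = {"eye": 0.1, "emg": 0.6, "tactile": 0.5}
sizes = {"eye": 1e6, "emg": 8e6, "tactile": 1e6}
chosen = select_modalities(impact, sizes, alpha=0.5, top_k=1)
```

With `alpha=0.5` the small but informative model is preferred over the most informative but largest one; setting `alpha=1.0` ignores communication cost entirely.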

翻訳日:2023-11-22 17:29:47 公開日:2023-11-19

# frank-wolfe-based metarounding アルゴリズムによるオンライン組合せ線形最適化

Online Combinatorial Linear Optimization via a Frank-Wolfe-based Metarounding Algorithm ( http://arxiv.org/abs/2310.12629v2 )

ライセンス: Link先を確認

Ryotaro Mitsuboshi, Kohei Hatano, and Eiji Takimoto

(参考訳) Metaroundingは、いくつかの組合せクラスに対する線形最適化のための近似アルゴリズムを、同じクラスのオンライン線形最適化アルゴリズムに変換するアプローチである。本稿では, 組合せクラスに対して, 緩和に基づく近似アルゴリズムが存在するという自然な仮定のもとに, 新たな畳み込みアルゴリズムを提案する。私たちのアルゴリズムは理論的にも実用的にもはるかに効率的です。

Metarounding is an approach to convert an approximation algorithm for linear optimization over some combinatorial classes to an online linear optimization algorithm for the same class. We propose a new metarounding algorithm under a natural assumption that a relax-based approximation algorithm exists for the combinatorial class. Our algorithm is much more efficient in both theoretical and practical aspects.

翻訳日:2023-11-22 17:17:52 公開日:2023-11-19

# 複数経路設定によるアライメント言語モデルの不確かさ校正の検討

Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting ( http://arxiv.org/abs/2310.11732v2 )

ライセンス: Link先を確認

Guande He, Peng Cui, Jianfei Chen, Wenbo Hu, Jun Zhu

(参考訳) 協調言語モデル (LM) の実践的応用において顕著な進歩はあったが, 対応する事前学習型 LM と比較すると, 出力応答が過度に信頼される傾向にある。本研究では,多段設定下でのlmsのロジットに基づく不確実性校正に対するアライメントプロセスの影響を体系的に評価する。我々はまず,事前学習したキャリブレーションとlmsのキャリブレーションの違いについて,注意深い実験を行った。実験結果から,複数選択条件下でのLMには2つの不確実性が存在することが明らかとなった。次に,単純な合成アライメントスキームにおける微調整によるlmの調整におけるこれら2つの不確かさの役割について検討し,これら2つの不確かさの和合がlmsの過密化の一因であると結論づける。さらに,アライメントLMの一般的なポストホックキャリブレーション法の有用性について検討し,アライメントLMのキャリブレーションを容易かつ効率的に行う方法を提案する。 lmsのより信頼性の高いアライメントプロセスの設計に関する洞察を私たちの発見に提供できることを願っています。

Despite the significant progress made in practical applications of aligned language models (LMs), they tend to be overconfident in output answers compared to the corresponding pre-trained LMs. In this work, we systematically evaluate the impact of the alignment process on logit-based uncertainty calibration of LMs under the multiple-choice setting. We first conduct a careful empirical study on how aligned LMs differ in calibration from their pre-trained counterparts. Experimental results reveal that there are two distinct uncertainties in LMs under the multiple-choice setting, which are responsible for the answer decision and the format preference of the LMs, respectively. Then, we investigate the role of these two uncertainties in aligned LMs' calibration through fine-tuning in simple synthetic alignment schemes and conclude that one reason for aligned LMs' overconfidence is the conflation of these two types of uncertainty. Furthermore, we examine the utility of common post-hoc calibration methods for aligned LMs and propose an easy-to-implement and sample-efficient method to calibrate aligned LMs. We hope our findings could provide insights into the design of more reliable alignment processes for LMs.
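
For readers unfamiliar with logit-based calibration, the following sketch shows two standard ingredients: a temperature-scaled softmax over the answer-choice logits and the expected calibration error (ECE). It is a generic illustration and not the calibration method proposed in the paper; the function names and the equal-width binning are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert choice logits to probabilities; temperature > 1 flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece
```

An overconfident model shows a large ECE because its average confidence exceeds its accuracy; raising the temperature lowers the top probability without changing which choice is ranked first.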

翻訳日:2023-11-22 17:17:32 公開日:2023-11-19

# イソトポローグ回転スペクトルによる自然存在量の3次元構造決定のための反射同変拡散

Reflection-Equivariant Diffusion for 3D Structure Determination from Isotopologue Rotational Spectra in Natural Abundance ( http://arxiv.org/abs/2310.11609v2 )

ライセンス: Link先を確認

Austin Cheng, Alston Lo, Santiago Miret, Brooks Pate, Alán Aspuru-Guzik

(参考訳) 構造決定は、天然物、法医学的なサンプル、星間物質、実験室合成などの未知の有機分子を特定するために必要である。回転分光は、慣性モーメントを介して小さな有機分子の正確な3次元情報を提供することによって構造決定を可能にする。これらのモーメントを用いて、クラッチマン分析は、炭素、窒素、酸素を含む天然同位体の存在量を持つ全ての原子の非符号の$|x|,|y|,|z|$座標である同位体置換座標を決定する。非符号置換座標は構造の推測を検証することができるが、不足している$+/-$符号は置換座標のみから実際の構造を決定するのに困難である。この逆問題に対処するために、分子の完全な3d構造を分子式、慣性モーメント、重原子の無符号置換座標から推測する生成拡散モデルであるkreed(クラッチマン反射同変拡散)を開発した。 kreed の top-1 予測では、qm9 と geom データセットで 98% 以上の精度で正確な 3d 構造を同定している。置換座標が炭素のサブセットに制限されると、精度はQM9では91%、GEOMでは32%に維持される。文献から収集した置換座標の試験セットにおいて、クリードは33例中25例で正しい全原子3d構造を予測し、回転分光による文脈自由3d構造決定の実験的適用性を示した。

Structure determination is necessary to identify unknown organic molecules, such as those in natural products, forensic samples, the interstellar medium, and laboratory syntheses. Rotational spectroscopy enables structure determination by providing accurate 3D information about small organic molecules via their moments of inertia. Using these moments, Kraitchman analysis determines isotopic substitution coordinates, which are the unsigned $|x|,|y|,|z|$ coordinates of all atoms with natural isotopic abundance, including carbon, nitrogen, and oxygen. While unsigned substitution coordinates can verify guesses of structures, the missing $+/-$ signs make it challenging to determine the actual structure from the substitution coordinates alone. To tackle this inverse problem, we develop KREED (Kraitchman REflection-Equivariant Diffusion), a generative diffusion model that infers a molecule's complete 3D structure from its molecular formula, moments of inertia, and unsigned substitution coordinates of heavy atoms. KREED's top-1 predictions identify the correct 3D structure with >98% accuracy on the QM9 and GEOM datasets when provided with substitution coordinates of all heavy atoms with natural isotopic abundance. When substitution coordinates are restricted to only a subset of carbons, accuracy is retained at 91% on QM9 and 32% on GEOM. On a test set of experimentally measured substitution coordinates gathered from the literature, KREED predicts the correct all-atom 3D structure in 25 of 33 cases, demonstrating experimental applicability for context-free 3D structure determination with rotational spectroscopy.
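
To see why the missing signs make the inverse problem hard, consider a naive enumeration of every sign assignment consistent with a set of unsigned coordinates. This is only an illustration of the search space, not how KREED works; the function name is hypothetical.

```python
from itertools import product


def signed_candidates(unsigned_coords):
    """Return an iterator over every sign assignment for (|x|, |y|, |z|) tuples.

    Each atom contributes up to 8 signed positions (fewer when a coordinate
    is exactly zero), so N atoms give up to 8**N candidate geometries.
    """
    per_atom = []
    for ax, ay, az in unsigned_coords:
        options = {
            (sx * ax, sy * ay, sz * az)
            for sx in (1, -1)
            for sy in (1, -1)
            for sz in (1, -1)
        }
        per_atom.append(sorted(options))
    return product(*per_atom)


# Two atoms: one lying in the z = 0 plane (4 options), one generic (8 options).
candidates = list(signed_candidates([(1.0, 2.0, 0.0), (0.5, 0.5, 0.5)]))
```

Many of these candidates are mirror images of one another, which is one reason a reflection-equivariant model is a natural fit for this problem.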

翻訳日:2023-11-22 17:17:09 公開日:2023-11-19

# 量子ネットワークのパーコレーション理論

Percolation Theories for Quantum Networks ( http://arxiv.org/abs/2310.18420v2 )

ライセンス: Link先を確認

Xiangyi Meng, Xinqi Hu, Yu Tian, Gaogao Dong, Renaud Lambiotte, Jianxi Gao, Shlomo Havlin

(参考訳) 量子ネットワークは過去10年間、理論領域と実験領域の両方で急速に進歩し、統計物理学の観点からその大規模特徴を理解することがますます重要になっている。接続が部分的に絡み合っており、量子ノイズにさらされている不完全な量子ネットワークにおいて、遠方のノード間で(例えば、中間ノードを通して)効果的に、そして間接的に絡み合うことができるのか? ネットワーク接続に着目した統計物理学の分野であるパーコレーション理論に、正確なあるいは近似的なマッピングを描画することにより、この問題に対処する最近の研究を調査する。特に、古典的なパーコレーションフレームワークは、ネットワークの間接接続を一意的に定義していない。この実現により、「'Concurrence percolation'」と呼ばれる別の理論が出現し、この理論は、量子ネットワークがかつて古典的なパーコレーションの文脈で考えられていたよりも弾力性があり、将来の量子ネットワーク設計に新たな洞察をもたらすことを示唆している。

Quantum networks have experienced rapid advancements in both theoretical and experimental domains over the last decade, making it increasingly important to understand their large-scale features from the viewpoint of statistical physics. This review paper discusses a fundamental question: how can entanglement be effectively and indirectly (e.g., through intermediate nodes) distributed between distant nodes in an imperfect quantum network, where the connections are only partially entangled and subject to quantum noise? We survey recent studies addressing this issue by drawing exact or approximate mappings to percolation theory, a branch of statistical physics centered on network connectivity. Notably, we show that the classical percolation frameworks do not uniquely define the network's indirect connectivity. This realization leads to the emergence of an alternative theory called "concurrence percolation," which uncovers a previously unrecognized quantum advantage that emerges at large scales, suggesting that quantum networks are more resilient than initially assumed within classical percolation contexts, offering refreshing insights into future quantum network design.
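
As background for the classical side of this comparison, the sketch below shows the textbook series and parallel rules for the probability that two nodes are connected when each link is independently present with some probability. It covers classical bond percolation only; the corresponding rules of concurrence percolation are not stated in the abstract and are not reproduced here.

```python
from functools import reduce


def series(probs):
    """Links in series connect only if every link is present."""
    return reduce(lambda acc, p: acc * p, probs, 1.0)


def parallel(probs):
    """Links in parallel connect unless every link is absent."""
    all_absent = reduce(lambda acc, p: acc * (1.0 - p), probs, 1.0)
    return 1.0 - all_absent


# Two independent two-hop paths between the same pair of nodes, p = 0.5 per link.
two_hop = series([0.5, 0.5])
connectivity = parallel([two_hop, two_hop])
```

Applying these two rules recursively gives the end-to-end connectivity of any series-parallel network, which is the classical baseline the review contrasts with the quantum case.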

翻訳日:2023-11-22 17:08:56 公開日:2023-11-19

# IIDウェイトを超えて:スパースと低ランクのディープニューラルネットワークもガウス的プロセスである

Beyond IID weights: sparse and low-rank deep Neural Networks are also Gaussian Processes ( http://arxiv.org/abs/2310.16597v2 )

ライセンス: Link先を確認

Thiziri Nait-Saada, Alireza Naderi, Jared Tanner

(参考訳) 無限に広いニューラルネットワークは、ディープラーニングに現れる多くの現象の理解を可能にする、有用で管理可能な数学的モデルであることが証明されている。例えば、ランダムディープネットワークをガウス過程に収束させることで、活性化関数とネットワークウェイトの選択がトレーニング力学に与える影響を厳密に分析することができる。本稿では, Matthews et al. (2018) の初歩的な証明を, IID や直交重みの確立した事例を含むより大規模な初期重量分布(PSEUDO-IID と呼ぶ)に拡張するとともに, 計算速度の向上をめざす低ランクで構造化されたスパース設定を新たに導入する。また,PSEUDO-IID分布に初期化される完全連結・畳み込みネットワークは,その分散にほぼ等価であることを示す。この結果を用いて,より広い階層のニューラルネットワークのエッジ・オブ・カオスを識別し,そのトレーニングを強化するために臨界度でチューニングすることができる。

The infinitely wide neural network has been proven a useful and manageable mathematical model that enables the understanding of many phenomena appearing in deep learning. One example is the convergence of random deep networks to Gaussian processes that allows a rigorous analysis of the way the choice of activation function and network weights impacts the training dynamics. In this paper, we extend the seminal proof of Matthews et al. (2018) to a larger class of initial weight distributions (which we call PSEUDO-IID), including the established cases of IID and orthogonal weights, as well as the emerging low-rank and structured sparse settings celebrated for their computational speed-up benefits. We show that fully-connected and convolutional networks initialized with PSEUDO-IID distributions are all effectively equivalent up to their variance. Using our results, one can identify the Edge-of-Chaos for a broader class of neural networks and tune them at criticality in order to enhance their training.

翻訳日:2023-11-22 17:06:40 公開日:2023-11-19

# 複数の解像度でのルーティング問題を解決する対称性保存グラフアテンションネットワーク

Symmetry-preserving graph attention network to solve routing problems at multiple resolutions ( http://arxiv.org/abs/2310.15543v2 )

ライセンス: Link先を確認

Cong Dao Tran, Thong Bach, Truong Son Hy

(参考訳) トラベリングセールスパーソン問題 (TSP) と車両ルーティング問題 (VRP) は,機械学習 (ML) 手法の適応により,精度と計算時間を合理的に向上した。しかし、以前の作品では、回転、翻訳、置換、スケーリングを含む、tspsとvrpから生じる対称性を完全に尊重していない。本研究では,組合わせ問題を解くために,最初の完全同値モデルとトレーニングを導入する。さらに、特に大きなグラフや長距離グラフの場合において、入力グラフのマルチスケール構造(ローカルからグローバル情報)を捉えることが不可欠であり、従来の手法は局所的あるいは準最適解に繋がるローカル情報のみを抽出することに限定されていた。上記の制限に対処するため,マルチレゾリューション方式と等価グラフアテンションネットワーク(mEGAT)アーキテクチャを併用して,低レベルおよび高レベルグラフレゾリューションに基づく最適経路を効率的に学習する手法を提案する。特に, 入力グラフから粗粒グラフの階層構造を構築し, まずは単純な低レベルグラフのルーティング問題を解き, その知識をより複雑な高レベルグラフに活用する。実験により,本モデルが既存のベースラインより優れており,対称性の保存とマルチレゾリューションがデータ駆動方式で組合せ問題を解くための重要なレシピであることを実証した。私たちのソースコードはhttps://github.com/HySonLab/Multires-NP-hardで公開されています。

Travelling Salesperson Problems (TSPs) and Vehicle Routing Problems (VRPs) have achieved reasonable improvement in accuracy and computation time with the adaptation of Machine Learning (ML) methods. However, none of the previous works completely respects the symmetries arising from TSPs and VRPs including rotation, translation, permutation, and scaling. In this work, we introduce the first-ever completely equivariant model and training to solve combinatorial problems. Furthermore, it is essential to capture the multiscale structure (i.e. from local to global information) of the input graph, especially for the cases of large and long-range graphs, while previous methods are limited to extracting only local information that can lead to a local or sub-optimal solution. To tackle the above limitation, we propose a Multiresolution scheme in combination with Equivariant Graph Attention network (mEGAT) architecture, which can learn the optimal route based on low-level and high-level graph resolutions in an efficient way. In particular, our approach constructs a hierarchy of coarse-graining graphs from the input graph, in which we try to solve the routing problems on simple low-level graphs first, then utilize that knowledge for the more complex high-level graphs. Experimentally, we have shown that our model outperforms existing baselines and proved that symmetry preservation and multiresolution are important recipes for solving combinatorial problems in a data-driven manner. Our source code is publicly available at https://github.com/HySonLab/Multires-NP-hard
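
The symmetries named above can be checked numerically on a toy instance: the length of a tour is unchanged by rotation and translation of the cities and scales linearly under uniform scaling. The helper names below are illustrative and are not taken from the linked repository.

```python
import math


def tour_length(points, order):
    """Total Euclidean length of the closed tour visiting points in the given order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )


def transform(points, angle=0.0, shift=(0.0, 0.0), scale=1.0):
    """Rotate by angle (radians), scale uniformly, then translate by shift."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])
        for x, y in points
    ]


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
order = [0, 1, 2, 3]
base = tour_length(square, order)
moved = tour_length(transform(square, angle=0.7, shift=(3.0, -2.0)), order)
scaled = tour_length(transform(square, scale=2.0), order)
```

A model that respects these symmetries should therefore return the same visiting order for `square` and for any rotated, shifted, or rescaled copy of it.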

翻訳日:2023-11-22 17:05:25 公開日:2023-11-19

# FS-Net:マイクロ網膜血管構造の抽出改善のためのフルスケールネットワークと適応閾値

FS-Net: Full Scale Network and Adaptive Threshold for Improving Extraction of Micro-Retinal Vessel Structures ( http://arxiv.org/abs/2311.08059v2 )

ライセンス: Link先を確認

Melaku N. Getahun, Oleg Y. Rogov, Dmitry V. Dylov, Andrey Somov, Ahmed Bouridane, Rifat Hamoudi

(参考訳) 網膜血管セグメンテーションは、生体画像処理において広く研究されている課題であり、網膜障害の治療および検出における眼科医の負担を軽減することを目的としている。しかし、網膜血管の分割には独自の課題があり、従来の技術では分枝や微小血管構造を分割する場合に十分な結果が得られなかった。近年のニューラルネットワークのアプローチは、局所的および全体的特性を共に保持できないことと、小さなエンド容器を捕獲できないことが、望ましい結果を達成するのに困難である点が特徴である。この網膜血管セグメンテーション問題を解決するために,エンコーダ・デコーダニューラルネットワークアーキテクチャ,シグモイド平滑化,適応しきい値法に基づくフルスケールの微小血管抽出機構を提案する。ネットワークは、残余、エンコーダブースター、ボトルネック強化、圧縮、励起ビルディングブロックで構成されている。これらすべてのブロックは、セグメンテーションマップの機能抽出と予測を改善するのに役立ちます。提案手法は, DRIVE, CHASE-DB1, STAREデータセットを用いて評価し, 従来の研究と比較した場合の競合結果を得た。 AUCとDRIVEデータセットの精度はそれぞれ0.9884と0.9702である。 CHASE-DB1データセットでは、スコアはそれぞれ0.9903と0.9755である。 STAREデータセットでは、スコアはそれぞれ0.9916と0.9750である。その結果、眼科医の注意を引こうとする実生活診断センターにおいて、このソリューションが実現される確率が高くなる。

Retinal vascular segmentation, a widely researched subject in biomedical image processing, aims to relieve ophthalmologists' workload when treating and detecting retinal disorders. However, segmenting retinal vessels has its own set of challenges, with prior techniques failing to generate adequate results when segmenting branches and microvascular structures. Recent neural network approaches struggle to preserve local and global properties together and fail to capture tiny end vessels, which makes it challenging to attain the desired result. To address this retinal vessel segmentation problem, we propose a full-scale micro-vessel extraction mechanism based on an encoder-decoder neural network architecture, sigmoid smoothing, and an adaptive threshold method. The network consists of residual, encoder booster, bottleneck enhancement, and squeeze-and-excitation building blocks. Together, these blocks help to improve the feature extraction and prediction of the segmentation map. The proposed solution has been evaluated using the DRIVE, CHASE-DB1, and STARE datasets, and competitive results are obtained when compared with previous studies. The AUC and accuracy on the DRIVE dataset are 0.9884 and 0.9702, respectively. On the CHASE-DB1 dataset, the scores are 0.9903 and 0.9755, respectively. On the STARE dataset, the scores are 0.9916 and 0.9750, respectively. The performance achieved is one step ahead of what has been done in previous studies, which raises the chance of deploying this solution in real-life diagnostic centers that seek ophthalmologists' attention.

翻訳日:2023-11-22 16:19:14 公開日:2023-11-19

# コードのための言語モデルに関する調査

A Survey on Language Models for Code ( http://arxiv.org/abs/2311.07989v2 )

ライセンス: Link先を確認

Ziyin Zhang and Chaoyu Chen and Bingchang Liu and Cong Liao and Zi Gong and Hang Yu and Jianguo Li and Rui Wang

(参考訳) 本稿では,50以上のモデル,30以上の評価タスク,150以上のデータセット,550以上の関連作業を含む,言語モデルによるコード処理の最近の進歩を体系的にレビューする。私たちは、コード処理モデルをGPTファミリに代表される一般的な言語モデルと、特にコードで事前学習される特殊なモデルに分解します。これらのモデルとの関係と相違について考察し,NLPが実施したのと全く同じ方法で,統計モデルやRNNから事前学習されたトランスフォーマーやLLMへのコードモデリングの歴史的変遷を強調する。また、AST、CFG、ユニットテストといったコード固有の機能や、コード言語モデルをトレーニングするアプリケーションについても議論し、このドメインにおける重要な課題と将来的な方向性を特定します。調査はGitHubリポジトリの https://github.com/codefuse-ai/Awesome-Code-LLM で公開しています。

In this work, we systematically review the recent advancements in code processing with language models, covering 50+ models, 30+ evaluation tasks, 150+ datasets, and 550 related works. We break down code processing models into general language models represented by the GPT family and specialized models that are specifically pretrained on code, often with tailored objectives. We discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which is exactly the same course that had been taken by NLP. We also discuss code-specific features such as AST, CFG, and unit tests, along with their application in training code language models, and identify key challenges and potential future directions in this domain. We keep the survey open and updated in a GitHub repository at https://github.com/codefuse-ai/Awesome-Code-LLM.

翻訳日:2023-11-22 16:18:47 公開日:2023-11-19

# PEMS:事前訓練されたエピデミック時系列モデル

PEMS: Pre-trained Epidemic Time-series Models ( http://arxiv.org/abs/2311.07841v2 )

ライセンス: Link先を確認

Harshavardhan Kamarthi, B. Aditya Prakash

(参考訳) 伝染病の将来に関する正確かつ確実な予測を提供することは、公衆衛生上の決定を情報化するための重要な問題である。近年の研究では、ディープラーニング手法の進歩を活用して過去の流行データから学習するデータ駆動ソリューションが、従来の力学モデルより優れていることが示されている。しかし、多くの場合、過去のデータは希少であり、基礎となるダイナミクスを十分に捉えていない。過去の流行による大量のデータが存在しているが、他の病気の時系列データからの事前知識を活用することは、ささいな課題である。言語および視覚タスクにおける事前学習モデルの成功に動機づけられた我々は、異なる疾患や流行から複数のデータセットから学習するために、事前訓練された流行時間モデルの問題に取り組む。自己教師型学習(SSL)タスクの集合として事前学習を定式化することにより,各種疾患の時系列データセットから学習する,事前学習型エピデミック時系列モデル(PEMS)を導入する。我々は,複数のダウンストリームタスクの微調整に活用可能な流行ダイナミクスに関する重要な事前知識を得るために,sslタスクを慎重に設計することにより,不均一なダイナミクスの処理や,複数の流行データセットから有用なパターンを効率的に取得することなど,流行時系列の事前学習に特有のさまざまな重要な課題に取り組む。その結果、PEMは、さまざまな季節パターン、地理、感染メカニズムのデータセット間で、さまざまなダウンストリームの時系列タスクにおいて、以前の最先端の手法よりも優れています。

Providing accurate and reliable predictions about the future of an epidemic is an important problem for enabling informed public health decisions. Recent works have shown that data-driven solutions that use advances in deep learning to learn from past data of an epidemic often outperform traditional mechanistic models. However, in many cases, the past data is sparse and may not sufficiently capture the underlying dynamics. While there exists a large amount of data from past epidemics, leveraging prior knowledge from time-series data of other diseases is a non-trivial challenge. Motivated by the success of pre-trained models in language and vision tasks, we tackle the problem of pre-training epidemic time-series models to learn from multiple datasets from different diseases and epidemics. We introduce Pre-trained Epidemic Time-Series Models (PEMS) that learn from diverse time-series datasets of a variety of diseases by formulating pre-training as a set of self-supervised learning (SSL) tasks. We tackle various important challenges specific to pre-training for epidemic time-series, such as dealing with heterogeneous dynamics and efficiently capturing useful patterns from multiple epidemic datasets, by carefully designing the SSL tasks to learn important priors about the epidemic dynamics that can be leveraged for fine-tuning to multiple downstream tasks. The resultant PEM outperforms previous state-of-the-art methods in various downstream time-series tasks across datasets of varying seasonal patterns, geography, and mechanism of contagion, including the novel Covid-19 pandemic unseen in the pre-training data, with better efficiency using a smaller fraction of the datasets.

翻訳日:2023-11-22 16:18:31 公開日:2023-11-19

# 一般化アナロジー:aiの監視を測定困難領域に一般化するためのテストベッド

Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains ( http://arxiv.org/abs/2311.07723v2 )

ライセンス: Link先を確認

Joshua Clymer, Garrett Baker, Rohan Subramani, Sam Wang

(参考訳) aiシステムがよりインテリジェントになり、その行動がより評価が難しくなるにつれ、彼らは指示に従うのではなく、人間のフィードバックの欠陥を競うことを学ぶことができるが、このリスクは、llmが人間のフィードバックを信頼できない状況に一般化する方法を制御することによって軽減できる。報酬モデルをいかに一般化するかをよりよく理解するために、私たちは8つのカテゴリにまたがる69の分布シフトを作成します。報酬モデルでは,「インストラクション・フォロー」の評価をデフォルトでは学ばず,代わりにインターネットテキストに似たペルソナを好んでいる。報酬モデルの内部表現を解釈する技術は、標準的な微調整よりも優れた一般化を実現するが、それでもしばしば、複雑な振る舞いと命令追従を区別することができない。我々は、最も難しい15の分散シフトをジェネラライゼーションアナログIES(GENIES)ベンチマークに統合し、報酬モデル一般化の制御に向けた進歩を期待する。

As AI systems become more intelligent and their behavior becomes more challenging to assess, they may learn to game the flaws of human feedback instead of genuinely striving to follow instructions; however, this risk can be mitigated by controlling how LLMs generalize human feedback to situations where it is unreliable. To better understand how reward models generalize, we craft 69 distribution shifts spanning 8 categories. We find that reward models do not learn to evaluate "instruction-following" by default and instead favor personas that resemble internet text. Techniques for interpreting reward models' internal representations achieve better generalization than standard fine-tuning, but still frequently fail to distinguish instruction-following from conflated behaviors. We consolidate the 15 most challenging distribution shifts into the GENeralization analogIES (GENIES) benchmark, which we hope will enable progress toward controlling reward model generalization.

翻訳日:2023-11-22 16:16:45 公開日:2023-11-19

# 組合せ最適化問題に対する予測候補最適化パラダイムの再考とベンチマーク

Rethinking and Benchmarking Predict-then-Optimize Paradigm for Combinatorial Optimization Problems ( http://arxiv.org/abs/2311.07633v2 )

ライセンス: Link先を確認

Haoyu Geng, Hang Ruan, Runzhong Wang, Yang Li, Yang Wang, Lei Chen, Junchi Yan

(参考訳) 多くのwebアプリケーションは、エネルギーコスト認識スケジューリング、web広告の予算配分、ソーシャルネットワークでのグラフマッチングなど、組合せ最適化の問題を解決することに依存している。しかし、多くの最適化問題には未知の係数が含まれており、これらの要因の不適切な予測は、エネルギー浪費、非効率な資源配分、ソーシャルネットワークにおける不適切なマッチングなどを引き起こす可能性がある。このような研究テーマを「予測テーマ最適化(PTO)」と呼び、統一システムにおける予測と意思決定のパフォーマンスを考察する。注目すべき最近の開発は、従来の2段階のアプローチとは対照的に、よりよい結果をもたらすと主張する最終的な意思決定品質を直接最適化する、エンドツーエンドの手法である。しかしながら、この分野の評価ベンチマークは断片化されており、様々なシナリオにおける様々なモデルの有効性はいまだ不明であり、包括的な評価と迅速な展開を妨げる。これらの問題に対処するため,我々は,現在のアプローチを包括的に分類し,既存の実験シナリオを統合し,統合ベンチマークを確立する。また,インクルーシブファイナンスのためのインダストリアルコンビネート広告問題の新たなデータセットをオープンソースとして紹介する。 ptoの再設計とベンチマークによって、より便利な評価とデプロイメントが促進され、この分野のアカデミーと業界の両方でさらなる改善がもたらされることを願っています。

Numerous web applications rely on solving combinatorial optimization problems, such as energy cost-aware scheduling, budget allocation on web advertising, and graph matching on social networks. However, many optimization problems involve unknown coefficients, and improper predictions of these factors may lead to inferior decisions that cause energy wastage, inefficient resource allocation, inappropriate matching in social networks, etc. Such a research topic is referred to as "Predict-Then-Optimize (PTO)", which considers the performance of prediction and decision-making in a unified system. A noteworthy recent development is end-to-end methods that directly optimize the ultimate decision quality and claim to yield better results than the traditional two-stage approach. However, the evaluation benchmarks in this field are fragmented and the effectiveness of various models in different scenarios remains unclear, hindering the comprehensive assessment and fast deployment of these methods. To address these issues, we provide a comprehensive categorization of current approaches and integrate existing experimental scenarios to establish a unified benchmark, elucidating the circumstances under which end-to-end training yields improvements, as well as the contexts in which it performs ineffectively. We also introduce and open-source a new dataset for the industrial combinatorial advertising problem for inclusive finance. We hope the rethinking and benchmarking of PTO will facilitate more convenient evaluation and deployment, and inspire further improvements in both academia and industry within this field.
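
The gap between prediction quality and decision quality can be made concrete with a toy two-stage pipeline: predict item values, pick the top-k items, and measure the regret of that decision under the true values. The top-k problem and the function names are assumptions chosen for brevity; the benchmark itself covers richer combinatorial problems.

```python
def top_k_decision(values, k):
    """Optimization stage: indices of the k items with the highest value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return order[:k]


def regret(true_values, predicted_values, k):
    """True objective lost by deciding on predictions instead of true values."""
    best = top_k_decision(true_values, k)
    chosen = top_k_decision(predicted_values, k)
    return sum(true_values[i] for i in best) - sum(true_values[i] for i in chosen)


true_values = [5, 3, 1]
close_but_misordered = [4.0, 4.1, 0.0]   # small numeric error, wrong ranking
far_but_ordered = [50.0, 10.0, 9.0]      # large numeric error, right ranking
```

Here the prediction with the smaller numeric error yields the worse decision, which is the motivation for training end to end on decision quality rather than on prediction error alone.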

翻訳日:2023-11-22 16:15:52 公開日:2023-11-19

# ソーシャルレコメンデーションのための未ターゲティングブラックボックス攻撃

Untargeted Black-box Attacks for Social Recommendations ( http://arxiv.org/abs/2311.07127v2 )

ライセンス: Link先を確認

Wenqi Fan, Shijie Wang, Xiao-yong Wei, Xiaowei Mei, Qing Li

(参考訳) オンラインソーシャルネットワークの興隆は、ユーザの意思決定プロセスを強化するために社会的関係を組み込んだソーシャルレコメンデーションシステムの進化を促進する。ノード表現の学習においてグラフニューラルネットワークが大きな成功を収めたことにより、GNNベースのソーシャルレコメンデーションは、ユーザ-イテムインタラクションとユーザ-ユーザ関係を同時にモデル化するために広く研究されている。その大きな成功にもかかわらず、最近の研究では、これらの高度なレコメンデーションシステムは、攻撃者がレコメンデーションのパフォーマンスを乱すために適切に設計されたフェイクユーザープロファイルを注入できる敵の攻撃に対して非常に脆弱であることが示されている。既存のほとんどの研究は、主にバニラレコメンデーターシステムにおけるターゲットアイテムのプロモートを目的とした攻撃に焦点を当てているが、全体的な予測性能を低下させる未目標攻撃はブラックボックスシナリオ下での社会的レコメンデーションでは調査されていない。ソーシャルレコメンデーションシステムに対する未ターゲティング攻撃を実行するために、攻撃者は偽ユーザーのための悪意あるソーシャル関係を構築して攻撃性能を高めることができる。しかし、ブラックボックスのソーシャルレコメンデーションを攻撃するには、社会的関係とアイテムプロファイルの調整が難しい。この制限に対処するため,我々はまず,コミュニティ間接続とコールドスタート項目が推奨性能の劣化に有効であることを示すための予備的研究を行った。具体的には,マルチエージェント強化学習に基づく新しいフレームワークによるマルチアタックを提案し,コールドスタートアイテムプロファイルの生成と,ブラックボックスソーシャルレコメンデーションに対する非ターゲティング攻撃を行うためのコミュニティ間ソーシャルリレーションを協調させる。様々な実世界のデータセットに関する包括的実験は、ブラックボックス設定下で提案する攻撃フレームワークの有効性を実証する。

The rise of online social networks has facilitated the evolution of social recommender systems, which incorporate social relations to enhance users' decision-making process. With the great success of Graph Neural Networks in learning node representations, GNN-based social recommendations have been widely studied to model user-item interactions and user-user social relations simultaneously. Despite their great successes, recent studies have shown that these advanced recommender systems are highly vulnerable to adversarial attacks, in which attackers can inject well-designed fake user profiles to disrupt recommendation performance. While most existing studies mainly focus on targeted attacks to promote target items on vanilla recommender systems, untargeted attacks to degrade the overall prediction performance are less explored on social recommendations under a black-box scenario. To perform untargeted attacks on social recommender systems, attackers can construct malicious social relationships for fake users to enhance the attack performance. However, the coordination of social relations and item profiles is challenging for attacking black-box social recommendations. To address this limitation, we first conduct several preliminary studies to demonstrate the effectiveness of cross-community connections and cold-start items in degrading recommendation performance. Specifically, we propose a novel framework, Multiattack, based on multi-agent reinforcement learning to coordinate the generation of cold-start item profiles and cross-community social relations for conducting untargeted attacks on black-box social recommendations. Comprehensive experiments on various real-world datasets demonstrate the effectiveness of our proposed attacking framework under the black-box setting.

翻訳日:2023-11-22 16:14:47 公開日:2023-11-19

# ヒト脳活動からの言語生成

Language Generation from Human Brain Activities ( http://arxiv.org/abs/2311.09889v2 )

ライセンス: Link先を確認

Ziyi Ye, Qingyao Ai, Yiqun Liu, Min Zhang, Christina Lioma, Tuukka Ruotsalo

(参考訳) 非侵襲的脳-コンピュータインタフェース(BCI)による人間の言語の生成は、障害者に提供したりコミュニケーションを改善するなど、多くの応用を解き放つ可能性がある。しかし、現在、bcisによる言語生成は、最も可能性の高い皮質意味表現を持つ前生成文継続候補を選択するための分類設定でのみ成功している。脳と大規模計算言語モデルとの関係を明らかにする最近の研究に触発されて,意味的脳デコーダと組み合わせて,機能的磁気共鳴画像(fMRI)入力から言語を直接生成する,大規模言語モデル(LLM)のキャパシティを利用する生成言語BCIを提案する。提案モデルは,事前生成した候補の事前知識を必要とせず,視覚刺激や聴覚刺激の意味的内容に整合したコヒーレントな言語系列を生成することができる。提案したモデルから生成された言語を,ランダム制御,事前生成言語選択アプローチ,および標準LCMと比較し,統計的言語学習データに基づいて,次の単語の確率のみに基づいて共通コヒーレントテキストを生成する。提案モデルでは,脳の入力がサンプリングされたときのセマンティック刺激とより整合した言語を生成する。本研究は,直接言語生成におけるbcis活用の可能性と実現可能性を示す。

Generating human language through non-invasive brain-computer interfaces (BCIs) has the potential to unlock many applications, such as serving disabled patients and improving communication. Currently, however, language generation via BCIs has been successful only within a classification setup that selects, from pre-generated sentence continuation candidates, the one with the most likely cortical semantic representation. Inspired by recent research that revealed associations between the brain and large computational language models, we propose a generative language BCI that utilizes the capacity of a large language model (LLM) jointly with a semantic brain decoder to directly generate language from functional magnetic resonance imaging (fMRI) input. The proposed model can generate coherent language sequences aligned with the semantic content of the visual or auditory language stimuli perceived, without prior knowledge of any pre-generated candidates. We compare the language generated from the presented model with a random control, a pre-generated language selection approach, and a standard LLM, which generates common coherent text solely based on the next-word likelihood according to statistical language training data. The proposed model is found to generate language that is more aligned with the semantic stimulus in response to which the brain input is sampled. Our findings demonstrate the potential and feasibility of employing BCIs in direct language generation.

Translated: 2023-11-22 16:07:10 Published: 2023-11-19

# Leveraging Generative AI for Clinical Evidence Summarization Needs to Achieve Trustworthiness

Leveraging Generative AI for Clinical Evidence Summarization Needs to Achieve Trustworthiness ( http://arxiv.org/abs/2311.11211v1 )

License: see the linked page

Gongbo Zhang, Qiao Jin, Denis Jered McInerney, Yong Chen, Fei Wang, Curtis L. Cole, Qian Yang, Yanshan Wang, Bradley A. Malin, Mor Peleg, Byron C. Wallace, Zhiyong Lu, Chunhua Weng, Yifan Peng

Evidence-based medicine aims to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating this arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.

Translated: 2023-11-22 06:59:20 Published: 2023-11-19

# HiH: A Multi-modal Hierarchy in Hierarchy Network for Unconstrained Gait Recognition

HiH: A Multi-modal Hierarchy in Hierarchy Network for Unconstrained Gait Recognition ( http://arxiv.org/abs/2311.11210v1 )

License: see the linked page

Lei Wang, Yinchi Ma, Peng Luan, Wei Yao, Congcong Li, Bo Liu

Gait recognition has achieved promising advances in controlled settings, yet it struggles significantly in unconstrained environments due to challenges such as view changes, occlusions, and varying walking speeds. Additionally, efforts to fuse multiple modalities often yield limited improvements because of cross-modality incompatibility, particularly in outdoor scenarios. To address these issues, we present a multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences for robust gait recognition. HiH features a main branch that utilizes Hierarchical Gait Decomposer (HGD) modules for depth-wise and intra-module hierarchical examination of general gait patterns from silhouette data. This approach captures motion hierarchies from overall body dynamics to detailed limb movements, facilitating the representation of gait attributes across multiple spatial resolutions. Complementing this, an auxiliary branch, based on 2D joint sequences, enriches the spatial and temporal aspects of gait analysis. It employs a Deformable Spatial Enhancement (DSE) module for pose-guided spatial attention and a Deformable Temporal Alignment (DTA) module for aligning motion dynamics through learned temporal offsets. Extensive evaluations across diverse indoor and outdoor datasets demonstrate HiH's state-of-the-art performance, affirming a well-balanced trade-off between accuracy and efficiency.

Translated: 2023-11-22 06:59:11 Published: 2023-11-19

# 3D Guidewire Shape Reconstruction from Monoplane Fluoroscopic Images

3D Guidewire Shape Reconstruction from Monoplane Fluoroscopic Images ( http://arxiv.org/abs/2311.11209v1 )

License: see the linked page

Tudor Jianu, Baoru Huang, Pierre Berthet-Rayne, Sebastiano Fichera, Anh Nguyen

Endovascular navigation, essential for diagnosing and treating endovascular diseases, predominantly hinges on fluoroscopic images due to the constraints in sensory feedback. Current shape reconstruction techniques for endovascular intervention often rely on either a priori information or specialized equipment, potentially subjecting patients to heightened radiation exposure. While deep learning holds potential, it typically demands extensive data. In this paper, we propose a new method to reconstruct the 3D guidewire by utilizing CathSim, a state-of-the-art endovascular simulator, and a 3D Fluoroscopy Guidewire Reconstruction Network (3D-FGRN). Our 3D-FGRN delivers results on par with conventional triangulation from simulated monoplane fluoroscopic images. Our experiments highlight the efficiency of the proposed network, demonstrating that it is a promising alternative to traditional methods.

Translated: 2023-11-22 06:58:44 Published: 2023-11-19

# LogicNet: A Logical Consistency Embedded Face Attribute Learning Network

LogicNet: A Logical Consistency Embedded Face Attribute Learning Network ( http://arxiv.org/abs/2311.11208v1 )

License: see the linked page

Haiyu Wu, Sicong Tian, Huayu Li, Kevin W. Bowyer

Ensuring logical consistency in predictions is a crucial yet overlooked aspect of multi-attribute classification. We explore the potential reasons for this oversight and introduce two pressing challenges to the field: 1) How can we ensure that a model, when trained with data checked for logical consistency, yields predictions that are logically consistent? 2) How can we achieve the same with data that has not undergone logical consistency checks? Minimizing manual effort is also essential for enhancing automation. To address these challenges, we introduce two datasets, FH41K and CelebA-logic, and propose LogicNet, an adversarial training framework that learns the logical relationships between attributes. The accuracy of LogicNet surpasses that of the next-best approach by 23.05%, 9.96%, and 1.71% on FH37K, FH41K, and CelebA-logic, respectively. In real-world case analysis, our approach reduces the average number of failed cases by more than 50% compared to other methods.

Translated: 2023-11-22 06:58:31 Published: 2023-11-19

# On the Noise Scheduling for Generating Plausible Designs with Diffusion Models

On the Noise Scheduling for Generating Plausible Designs with Diffusion Models ( http://arxiv.org/abs/2311.11207v1 )

License: see the linked page

Jiajie Fan, Laure Vuaille, Thomas Bäck, Hao Wang

Deep Generative Models (DGMs) are widely used to create innovative designs across multiple industries, ranging from fashion to the automotive sector. In addition to generating images of high visual quality, the task of structural design generation imposes more stringent constraints on the semantic expression, e.g., no floating material or missing parts, which we refer to as plausibility in this work. We examine the impact of the noise schedule of diffusion models on the plausibility of the outcome: there exists a range of noise levels at which the model's performance determines the plausibility of the result. We also propose two techniques to determine such a range for a given image set and devise a novel parametric noise schedule for better plausibility. We apply this noise schedule to the training and sampling of the well-known diffusion model EDM and compare it to EDM's default noise schedule. Compared to EDM, our schedule significantly improves the rate of plausible designs from 83.4% to 93.5% and the Fréchet Inception Distance (FID) from 7.84 to 4.87. Further applications of advanced image editing tools demonstrate the model's solid understanding of structure.
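
For orientation, the baseline that the abstract compares against can be sketched in a few lines. The function below reproduces the default sampling noise levels of EDM (Karras et al., 2022) with its published defaults (`sigma_min=0.002`, `sigma_max=80`, `rho=7`). It is not the parametric schedule proposed in this paper, which the abstract does not specify. The helper `fraction_in_range` and the bounds `0.1` and `10.0` are hypothetical, included only to show how one might count how many steps fall inside a noise range of interest.

```python
def edm_sigma_schedule(num_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Return EDM's default sampling noise levels, from sigma_max down to sigma_min."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    max_inv = sigma_max ** (1.0 / rho)
    min_inv = sigma_min ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then map back with the power rho.
    return [
        (max_inv + i / (num_steps - 1) * (min_inv - max_inv)) ** rho
        for i in range(num_steps)
    ]


def fraction_in_range(sigmas, low, high):
    """Hypothetical helper: share of steps whose noise level lies in [low, high]."""
    inside = sum(1 for s in sigmas if low <= s <= high)
    return inside / len(sigmas)


if __name__ == "__main__":
    sigmas = edm_sigma_schedule(18)
    # The bounds below are placeholders, not values from the paper.
    print(fraction_in_range(sigmas, 0.1, 10.0))
```

A schedule aimed at plausibility would presumably place more steps inside the critical range than this default does; how the paper achieves that is not described in the abstract.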

Translated: 2023-11-22 06:58:14 Published: 2023-11-19

# Robust Network Slicing: Multi-Agent Policies, Adversarial Attacks, and Defensive Strategies

Robust Network Slicing: Multi-Agent Policies, Adversarial Attacks, and Defensive Strategies ( http://arxiv.org/abs/2311.11206v1 )

License: see the linked page

Feng Wang, M. Cenk Gursoy, and Senem Velipasalar

In this paper, we present a multi-agent deep reinforcement learning (deep RL) framework for network slicing in a dynamic environment with multiple base stations and multiple users. In particular, we propose a novel deep RL framework with multiple actors and a centralized critic (MACC) in which the actors are implemented as pointer networks to fit the varying dimension of the input. We evaluate the performance of the proposed deep RL algorithm via simulations to demonstrate its effectiveness. Subsequently, we develop a deep-RL-based jammer with limited prior information and a limited power budget. The goal of the jammer is to minimize the transmission rates achieved with network slicing and thus degrade the network slicing agents' performance. We design a jammer with both listening and jamming phases and address jamming location optimization as well as jamming channel optimization via deep RL. We evaluate the jammer at the optimized location, generating interference attacks in the optimized set of channels by switching between the jamming phase and the listening phase. We show that the proposed jammer can significantly reduce the victims' performance without direct feedback or prior knowledge of the network slicing policies. Finally, we devise a Nash-equilibrium-supervised policy ensemble mixed strategy profile for network slicing (as a defensive measure) and jamming. We evaluate the performance of the proposed policy ensemble algorithm by applying it to the network slicing agents and the jammer agent in simulations to show its effectiveness.

Translated: 2023-11-22 06:57:53 Published: 2023-11-19

# Shape-Sensitive Loss for Catheter and Guidewire Segmentation

Shape-Sensitive Loss for Catheter and Guidewire Segmentation ( http://arxiv.org/abs/2311.11205v1 )

License: see the linked page

Chayun Kongtongvattana, Baoru Huang, Jingxuan Kang, Hoan Nguyen, Olajide Olufemi, Anh Nguyen

We introduce a shape-sensitive loss function for catheter and guidewire segmentation and utilize it in a vision transformer network to establish a new state-of-the-art result on a large-scale X-ray image dataset. We transform network-derived predictions and their corresponding ground truths into signed distance maps (SDMs), thereby enabling any network to concentrate on the essential boundaries rather than merely the overall contours. These SDMs are passed through the vision transformer, efficiently producing high-dimensional feature vectors that encapsulate critical image attributes. By computing the cosine similarity between these feature vectors, we gain a nuanced understanding of image similarity that goes beyond the limitations of traditional overlap-based measures. The advantages of our approach are manifold, ranging from scale and translation invariance to superior detection of subtle differences, thus ensuring precise localization and delineation of the medical instruments within the images. Comprehensive quantitative and qualitative analyses substantiate the significant enhancement in performance over existing baselines, demonstrating the promise of our new shape-sensitive loss function for improving catheter and guidewire segmentation.
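
The two ingredients named in the abstract, signed distance maps and cosine similarity, can be illustrated with a deliberately simplified sketch. It compares the SDMs of two binary masks directly, whereas the paper first passes the SDMs through a vision transformer to obtain feature vectors. The sign convention (negative inside the object, positive outside), the brute-force distance computation, and the function names are all assumptions made for this illustration.

```python
import math


def signed_distance_map(mask):
    """Brute-force SDM of a binary mask (assumed convention: negative inside, positive outside)."""
    height, width = len(mask), len(mask[0])
    inside = [(r, c) for r in range(height) for c in range(width) if mask[r][c]]
    outside = [(r, c) for r in range(height) for c in range(width) if not mask[r][c]]
    if not inside or not outside:
        raise ValueError("mask must contain both foreground and background pixels")
    sdm = []
    for r in range(height):
        row = []
        for c in range(width):
            # Distance to the nearest pixel of the opposite class.
            others = outside if mask[r][c] else inside
            dist = min(math.hypot(r - rr, c - cc) for rr, cc in others)
            row.append(-dist if mask[r][c] else dist)
        sdm.append(row)
    return sdm


def cosine_similarity(a, b):
    """Cosine similarity between two flattened 2D maps."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(y * y for y in flat_b))
    return dot / (norm_a * norm_b)


def shape_loss(pred_mask, true_mask):
    """Simplified stand-in for the loss: 1 - cosine similarity of the two SDMs."""
    return 1.0 - cosine_similarity(
        signed_distance_map(pred_mask), signed_distance_map(true_mask)
    )


if __name__ == "__main__":
    # A thin horizontal "wire" and a copy shifted down by one pixel.
    truth = [[1 if r == 2 else 0 for _ in range(5)] for r in range(5)]
    shifted = [[1 if r == 3 else 0 for _ in range(5)] for r in range(5)]
    print(shape_loss(truth, truth), shape_loss(shifted, truth))
```

For thin structures such as guidewires, a one-pixel shift can remove all overlap, while the SDMs of the two masks still change gradually. This is one plausible reading of why a distance-based representation is attractive here.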

Translated: 2023-11-22 06:57:31 Published: 2023-11-19

# Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models ( http://arxiv.org/abs/2311.11202v1 )

License: see the linked page

Zhaowei Zhu, Jialu Wang, Hao Cheng, Yang Liu

Language models have shown promise in various tasks but can be affected by undesired data during training, fine-tuning, or alignment. For example, if some unsafe conversations are wrongly annotated as safe, a model fine-tuned on these samples may be harmful. Therefore, the correctness of annotations, i.e., the credibility of the dataset, is important. This study focuses on the credibility of real-world datasets that can be used for training a harmless language model, including the popular benchmarks Jigsaw Civil Comments, Anthropic Harmless & Red Team, and PKU BeaverTails & SafeRLHF. Given the cost and difficulty of having humans clean these datasets, we introduce a systematic framework for evaluating the credibility of datasets, identifying label errors, and evaluating the influence of noisy labels in the curated language data, specifically focusing on unsafe comment and conversation classification. With the framework, we find and fix an average of 6.16% label errors in 11 datasets constructed from the above benchmarks. Data credibility and downstream learning performance can be remarkably improved by directly fixing label errors, indicating the significance of cleaning existing real-world datasets. Open-source: https://github.com/Docta-ai/docta.

Translated: 2023-11-22 06:57:11 Published: 2023-11-19

# Scale-free networks: improved inference

Scale-free networks: improved inference ( http://arxiv.org/abs/2311.11200v1 )

License: see the linked page

Nixon Jerez-Lillo, Francisco A. Rodrigues and Pedro L. Ramos

The power-law distribution plays a crucial role in complex networks as well as in various applied sciences. Investigating whether the degree distribution of a network follows a power-law distribution is an important concern. The commonly used inferential methods for estimating the model parameters often yield biased estimates, which can lead to the rejection of the hypothesis that a model conforms to a power law. In this paper, we discuss improved methods that utilize Bayesian inference to obtain accurate estimates and precise credible intervals. The inferential methods are derived for both continuous and discrete distributions. These methods reveal that objective Bayesian approaches return nearly unbiased estimates for the parameters of both models. Notably, in the continuous case, we identify an explicit posterior distribution. This work enhances the power of goodness-of-fit tests, enabling us to accurately discern whether a network or any other dataset adheres to a power-law distribution. We apply the proposed approach to fit degree distributions for more than 5,000 synthetic networks and over 3,000 real networks. The results indicate that our method is more suitable in practice, as it yields a frequency of acceptance close to the specified nominal level.
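
To make the "explicit posterior in the continuous case" concrete, here is a minimal sketch under stated assumptions. For the continuous power law with density $p(x) = \frac{\alpha-1}{x_{\min}} (x/x_{\min})^{-\alpha}$ and known $x_{\min}$, a Jeffreys-type prior $\pi(\alpha) \propto 1/(\alpha-1)$ gives a Gamma posterior for $\alpha - 1$ with shape $n$ and rate $\sum_i \ln(x_i/x_{\min})$. The prior is an assumption made for this illustration; the abstract does not say which objective prior the authors use. The function names are hypothetical as well.

```python
import math
import random


def power_law_posterior(samples, x_min):
    """Return (shape, rate) of the Gamma posterior of alpha - 1 under the assumed prior 1/(alpha - 1)."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum <= 0.0:
        raise ValueError("need at least one sample strictly above x_min")
    return len(tail), log_sum


def posterior_mean_alpha(shape, rate):
    """Posterior mean of alpha; under this prior it coincides with the usual MLE."""
    return 1.0 + shape / rate


def credible_interval(shape, rate, level=0.95, draws=20000, seed=0):
    """Monte Carlo equal-tailed credible interval for alpha."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so the scale is 1 / rate.
    alphas = sorted(1.0 + rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    lower = alphas[int((1.0 - level) / 2.0 * draws)]
    upper = alphas[int((1.0 + level) / 2.0 * draws) - 1]
    return lower, upper


if __name__ == "__main__":
    rng = random.Random(1)
    # paretovariate(a) has density a / x^(a + 1) on x >= 1, i.e. exponent alpha = a + 1.
    data = [rng.paretovariate(1.5) for _ in range(500)]
    shape, rate = power_law_posterior(data, x_min=1.0)
    print(posterior_mean_alpha(shape, rate), credible_interval(shape, rate))
```

The discrete case, which matters for degree distributions, has no such closed form in this sketch and is not covered.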

Translated: 2023-11-22 06:56:50 Published: 2023-11-19

# Self-Supervised Versus Supervised Training for Segmentation of Organoid Images

Self-Supervised Versus Supervised Training for Segmentation of Organoid Images ( http://arxiv.org/abs/2311.11198v1 )

License: see the linked page

Asmaa Haja, Eric Brouwer and Lambert Schomaker

The process of annotating relevant data in the field of digital microscopy can be both time-consuming and especially expensive due to the required technical skills and human-expert knowledge. Consequently, large amounts of microscopic image data remain unlabeled, preventing their effective exploitation by deep-learning algorithms. In recent years it has been shown that a lot of relevant information can be drawn from unlabeled data. Self-supervised learning (SSL) is a promising solution based on learning intrinsic features under a pretext task that is similar to the main task, without requiring labels. The trained result is transferred to the main task, image segmentation in our case. A ResNet50 U-Net was first trained to restore images of liver progenitor organoids from augmented images, using either the Structural Similarity Index Metric (SSIM) alone or SSIM combined with an L1 loss. Both the encoder and decoder were trained in tandem. The weights were transferred to another U-Net model designed for segmentation with frozen encoder weights, using Binary Cross Entropy, Dice, and Intersection over Union (IoU) losses. For comparison, we used the same U-Net architecture to train two supervised models, one utilizing the ResNet50 encoder and the other a simple CNN. Results showed that self-supervised learning models using a 25% pixel drop or image blurring augmentation performed better than the other augmentation techniques when using the IoU loss. When trained on only 114 images for the main task, the self-supervised learning approach outperforms the supervised method, achieving an F1-score of 0.85 with higher stability, in contrast to the F1-score of 0.78 obtained by the supervised method. Furthermore, when trained on larger data sets (1,000 images), self-supervised learning still performs better, achieving an F1-score of 0.92 compared to 0.85 for the supervised method.
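
Since the abstract reports results as F1-scores and trains with Dice and IoU losses, a small worked example of these overlap measures may help. The sketch below computes them for flattened binary masks. For binary segmentation the Dice coefficient and the F1-score are the same quantity, and both relate to IoU by F1 = 2·IoU / (1 + IoU). The function names are illustrative and not taken from the paper.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives of flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)
    return tp, fp, fn


def iou(pred, target):
    """Intersection over Union; defined as 1.0 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


def f1_score(pred, target):
    """F1-score, identical to the Dice coefficient for binary masks."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


if __name__ == "__main__":
    pred = [1, 1, 0, 0]
    target = [1, 0, 0, 0]
    print(iou(pred, target), f1_score(pred, target))
```

Used as training losses, these are typically computed on soft predictions as `1 - score`; the hard-mask version above is only meant for evaluation.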

Translated: 2023-11-22 06:56:32 Published: 2023-11-19

# Testing with Non-identically Distributed Samples

Testing with Non-identically Distributed Samples ( http://arxiv.org/abs/2311.11194v1 )

License: see the linked page

Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant

We examine the extent to which sublinear-sample property testing and estimation apply to settings where samples are independently but not identically distributed. Specifically, we consider the following distributional property testing framework: Suppose there is a set of distributions over a discrete support of size $k$, $\textbf{p}_1, \textbf{p}_2,\ldots,\textbf{p}_T$, and we obtain $c$ independent draws from each distribution. Suppose the goal is to learn or test a property of the average distribution, $\textbf{p}_{\mathrm{avg}}$. This setup models a number of important practical settings where the individual distributions correspond to heterogeneous entities -- either individuals, chronologically distinct time periods, spatially separated data sources, etc. From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance. To test uniformity or identity -- distinguishing the case that $\textbf{p}_{\mathrm{avg}}$ is equal to some reference distribution from the case that it has $\ell_1$ distance at least $\varepsilon$ from the reference distribution -- we show that a number of samples linear in $k$ is necessary given $c=1$ samples from each distribution. In contrast, for $c \ge 2$, we recover the usual sublinear sample testing of the i.i.d. setting: we show that $O(\sqrt{k}/\varepsilon^2 + 1/\varepsilon^4)$ samples are sufficient, matching the optimal sample complexity in the i.i.d. case in the regime where $\varepsilon \ge k^{-1/4}$. Additionally, we show that in the $c=2$ case, there is a constant $\rho > 0$ such that even in the linear regime with $\rho k$ samples, no tester that considers the multiset of samples (ignoring which samples were drawn from the same $\textbf{p}_i$) can perform uniformity testing.
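
The setup can be made concrete with a toy instance. The sketch below builds $\textbf{p}_{\mathrm{avg}}$ from a list of distributions, measures its $\ell_1$ and TV distance to a reference, and counts collisions in a pooled sample. The example with point masses is only intuition, not the paper's argument: each $\textbf{p}_i$ is a point mass on a different element, so $\textbf{p}_{\mathrm{avg}}$ is exactly uniform, yet with $c=1$ the pooled sample never contains a collision, unlike an i.i.d. sample from the uniform distribution. All function names are hypothetical.

```python
import random
from collections import Counter


def average_distribution(dists):
    """Coordinate-wise average of T distributions over the same support."""
    t = len(dists)
    k = len(dists[0])
    return [sum(d[j] for d in dists) / t for j in range(k)]


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def tv_distance(p, q):
    """Total variation distance, which is half the l1 distance."""
    return 0.5 * l1_distance(p, q)


def pooled_samples(dists, c, rng):
    """Draw c independent samples from each distribution and pool them."""
    support = range(len(dists[0]))
    pooled = []
    for d in dists:
        pooled.extend(rng.choices(support, weights=d, k=c))
    return pooled


def count_collisions(samples):
    """Number of unordered pairs of equal samples."""
    return sum(n * (n - 1) // 2 for n in Counter(samples).values())


if __name__ == "__main__":
    k = 4
    point_masses = [[1.0 if j == i else 0.0 for j in range(k)] for i in range(k)]
    uniform = [1.0 / k] * k
    rng = random.Random(0)
    print(l1_distance(average_distribution(point_masses), uniform))
    print(count_collisions(pooled_samples(point_masses, 1, rng)))
```

With $c \ge 2$, collisions within the draws from a single $\textbf{p}_i$ carry information about how concentrated the individual distributions are, which is consistent with the abstract's claim that sublinear testing becomes possible in that regime.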

Translated: 2023-11-22 06:55:55 Published: 2023-11-19

# Assessing AI Impact Assessments: A Classroom Study

Assessing AI Impact Assessments: A Classroom Study ( http://arxiv.org/abs/2311.11193v1 )

License: see the linked page

Nari Johnson, Hoda Heidari

Artificial Intelligence Impact Assessments ("AIIAs"), a family of tools that provide structured processes to imagine the possible impacts of a proposed AI system, have become an increasingly popular proposal to govern AI systems. Recent efforts from government or private-sector organizations have proposed many diverse instantiations of AIIAs, which take a variety of forms ranging from open-ended questionnaires to graded score-cards. However, to date there has been limited evaluation of existing AIIA instruments. We conduct a classroom study (N = 38) at a large research-intensive university (R1) in an elective course focused on the societal and ethical implications of AI. We assign students to different organizational roles (for example, an ML scientist or product manager) and ask participant teams to complete one of three existing AI impact assessments for one of two imagined generative AI systems. In our thematic analysis of participants' responses to pre- and post-activity questionnaires, we find preliminary evidence that impact assessments can influence participants' perceptions of the potential risks of generative AI systems and of the level of responsibility held by AI experts in addressing potential harm. We also discover a consistent set of limitations shared by several existing AIIA instruments, which we group into concerns about their format and content, as well as the feasibility and effectiveness of the activity in foreseeing and mitigating potential harms. Drawing on the findings of this study, we provide recommendations for future work on developing and validating AIIAs.

Translated: 2023-11-22 06:55:13 Published: 2023-11-19

# Attention-Based Real-Time Defenses for Physical Adversarial Attacks in Vision Applications

Attention-Based Real-Time Defenses for Physical Adversarial Attacks in Vision Applications ( http://arxiv.org/abs/2311.11191v1 )

License: see the linked page

Giulio Rossolini, Alessandro Biondi and Giorgio Buttazzo

Deep neural networks exhibit excellent performance in computer vision tasks, but their vulnerability to real-world adversarial attacks, carried out through physical objects that can corrupt their predictions, raises serious security concerns for their application in safety-critical domains. Existing defense methods focus on single-frame analysis and are characterized by high computational costs that limit their applicability in multi-frame scenarios, where real-time decisions are crucial. To address this problem, this paper proposes an efficient attention-based defense mechanism that exploits adversarial channel-attention to quickly identify and track malicious objects in shallow network layers and mask their adversarial effects in a multi-frame setting. This work advances the state of the art by enhancing existing over-activation techniques for real-world adversarial attacks to make them usable in real-time applications. It also introduces an efficient multi-frame defense framework, validating its efficacy through extensive experiments aimed at evaluating both defense performance and computational cost.

Translated: 2023-11-22 06:54:46 Published: 2023-11-19

# Entanglement measures for detectability

Entanglement measures for detectability ( http://arxiv.org/abs/2311.11189v1 )

License: see the linked page

Masahito Hayashi and Yuki Ito

We propose new entanglement measures defined as the detection performance in a hypothesis testing setting. We clarify how our measures work for detecting an entangled state by extending the quantum Sanov theorem. We derive calculation formulas for these measures for maximally correlated states and propose algorithms that compute them for general entangled states. In addition, we investigate how our algorithm works for solving the membership problem for separability.

Translated: 2023-11-22 06:54:29 Published: 2023-11-19

# Generalized quantum Arimoto-Blahut algorithm and its application to quantum information bottleneck

Generalized quantum Arimoto-Blahut algorithm and its application to quantum information bottleneck ( http://arxiv.org/abs/2311.11188v1 )

License: see the linked page

Masahito Hayashi and Geng Liu

We generalize the quantum Arimoto-Blahut algorithm by Ramakrishnan et al. (IEEE Trans. IT, 67, 946 (2021)) to a function defined over a set of density matrices with linear constraints. This algorithm has wider applicability. Hence, we apply our algorithm to the quantum information bottleneck with three quantum systems, which can be used for quantum learning. We numerically compare our algorithm with the existing algorithm by Grimsmo and Still (Phys. Rev. A, 94, 012338 (2016)). Our numerical analysis shows that our algorithm outperforms theirs.
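
As background for readers unfamiliar with the algorithm being generalized, the classical Arimoto-Blahut (Blahut-Arimoto) iteration for the capacity of a discrete memoryless channel is sketched below. This is the textbook classical version, not the quantum algorithm of Ramakrishnan et al. nor the generalization proposed here. The function name, the fixed iteration count, and the uniform starting point are choices made for this illustration.

```python
import math


def blahut_arimoto(channel, iterations=200):
    """Classical Arimoto-Blahut iteration.

    channel[x][y] is the transition probability W(y|x).
    Returns (capacity estimate in bits, input distribution).
    """
    n_in, n_out = len(channel), len(channel[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution

    def output_distribution(dist):
        return [sum(dist[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]

    def divergences(q_out):
        # D(W(.|x) || q_out) in nats for every input symbol x.
        return [
            sum(
                channel[x][y] * math.log(channel[x][y] / q_out[y])
                for y in range(n_out)
                if channel[x][y] > 0.0
            )
            for x in range(n_in)
        ]

    for _ in range(iterations):
        div = divergences(output_distribution(p))
        # Multiplicative update: p(x) is reweighted by exp(D(W(.|x) || q_out)).
        weights = [p[x] * math.exp(div[x]) for x in range(n_in)]
        total = sum(weights)
        p = [w / total for w in weights]

    div = divergences(output_distribution(p))
    capacity_nats = sum(p[x] * div[x] for x in range(n_in))
    return capacity_nats / math.log(2.0), p


if __name__ == "__main__":
    flip = 0.1
    bsc = [[1.0 - flip, flip], [flip, 1.0 - flip]]
    capacity, dist = blahut_arimoto(bsc)
    print(capacity, dist)
```

For the binary symmetric channel the result can be checked against the closed form $1 - H_2(\text{flip})$. The quantum versions replace probability vectors with density matrices and the divergence with quantum relative entropy.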

Translated: 2023-11-22 06:54:24 Published: 2023-11-19

# Open-Vocabulary Camouflaged Object Segmentation

Open-Vocabulary Camouflaged Object Segmentation ( http://arxiv.org/abs/2311.11241v1 )

License: see the linked page

Youwei Pang, Xiaoqi Zhao, Jiaming Zuo, Lihe Zhang, Huchuan Lu

Recently, the emergence of large-scale vision-language models (VLMs), such as CLIP, has opened the way towards open-world object perception. Many works have explored the utilization of pre-trained VLMs for the challenging open-vocabulary dense prediction task, which requires perceiving diverse objects with novel classes at inference time. Existing methods construct experiments based on the public datasets of related tasks, which are not tailored for open vocabulary and rarely involve imperceptible objects camouflaged in complex scenes, due to data collection bias and annotation costs. To fill this gap, we introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS), and construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes. Further, we build a strong single-stage open-vocabulary camouflaged object segmentation transformer baseline, OVCoser, attached to the parameter-fixed CLIP with iterative semantic guidance and structure enhancement. By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from edge and depth information, the proposed method can efficiently capture camouflaged objects. Moreover, this effective framework also surpasses previous state-of-the-art methods for open-vocabulary semantic image segmentation by a large margin on our OVCamo dataset. With the proposed dataset and baseline, we hope that this new task, which has more practical value, can further expand the research on open-vocabulary dense prediction tasks.

翻訳日:2023-11-21 21:35:48 公開日:2023-11-19

# AtomXR: 自然言語と没入型物理的相互作用によるXRプロトタイピング

AtomXR: Streamlined XR Prototyping with Natural Language and Immersive Physical Interaction ( http://arxiv.org/abs/2311.11238v1 )

ライセンス: Link先を確認

Alice Cai, Caine Ardayfio, AnhPhu Nguyen, Tica Lin, Elena Glassman

(参考訳) エクステンデッドリアリティ(XR)の技術的進歩により、より多くのXRコンテンツへの需要が増大するにつれ、従来の開発プロセスはいくつかの課題に直面している。 1) 経験の浅い開発者にとっての急な学習曲線、2) 2次元の開発環境とヘッドセット内の3次元ユーザ体験との乖離、3) 開発環境とテスト環境のコンテキスト切り替えによるイテレーションサイクルの遅さ。これらの課題に対処するために、経験豊富な開発者と経験の浅い開発者の両方が、自然言語、視線、タッチインタラクションを使ってアプリケーションを作成できるように設計された、合理化された没入型のノーコードXRプロトタイピングツールであるAtomXRを紹介する。 AtomXRは次の要素で構成される。 1) AtomScript: 高速プロトタイピングのための、人間が解釈可能な高レベルのスクリプト言語、2) AtomScript生成のためにLLMとマルチモーダル入力を統合する自然言語インタフェース、3) 没入型のヘッドセット内オーサリング環境。 2つのユーザスタディによる経験的評価は、自然言語ベースおよび没入型プロトタイピングに関する洞察を与え、AtomXRが従来のシステムと比較して、スピードとユーザエクスペリエンスを大幅に改善することを示している。

As technological advancements in extended reality (XR) amplify the demand for more XR content, traditional development processes face several challenges: 1) a steep learning curve for inexperienced developers, 2) a disconnect between 2D development environments and 3D user experiences inside headsets, and 3) slow iteration cycles due to context switching between development and testing environments. To address these challenges, we introduce AtomXR, a streamlined, immersive, no-code XR prototyping tool designed to empower both experienced and inexperienced developers in creating applications using natural language, eye-gaze, and touch interactions. AtomXR consists of: 1) AtomScript, a high-level human-interpretable scripting language for rapid prototyping, 2) a natural language interface that integrates LLMs and multimodal inputs for AtomScript generation, and 3) an immersive in-headset authoring environment. Empirical evaluation through two user studies offers insights into natural language-based and immersive prototyping, and shows AtomXR provides significant improvements in speed and user experience compared to traditional systems.

翻訳日:2023-11-21 21:35:15 公開日:2023-11-19

# マルチモーダル感性分析のためのAI深層学習アルゴリズムの実装

Implementation of AI Deep Learning Algorithm For Multi-Modal Sentiment Analysis ( http://arxiv.org/abs/2311.11237v1 )

ライセンス: Link先を確認

Jiazhen Wang

(参考訳) 2チャンネル畳み込みニューラルネットワークとリングネットワークを組み合わせたマルチモーダル感情認識手法を確立した。この手法は感情情報を効果的に抽出し、学習効率を向上させる。単語はGloVeでベクトル化され、単語ベクトルは畳み込みニューラルネットワークに入力される。注意機構と最大プールコンバータBiSRUチャネルを組み合わせることで、局所的な深層感情と前後の逐次的な感情意味論を得る。最後に、複数の特徴を融合して感情の極性として入力することにより、対象の感情分析を実現する。実験により、特徴融合に基づく感情分析手法は、感情データセットの認識精度を効果的に向上させ、学習時間を短縮できることが示された。モデルには一定の汎化能力がある。

A multi-modal emotion recognition method is established by combining a two-channel convolutional neural network with a ring network. This method can extract emotional information effectively and improve learning efficiency. Words are vectorized with GloVe, and the word vectors are fed into the convolutional neural network. By combining an attention mechanism with a maximum pool converter BiSRU channel, local deep emotion features and pre-post sequential emotion semantics are obtained. Finally, multiple features are fused and used as input to predict emotion polarity, achieving emotion analysis of the target. Experiments show that the feature-fusion-based emotion analysis method effectively improves recognition accuracy on the emotion dataset and reduces learning time. The model also shows a degree of generalization ability.

翻訳日:2023-11-21 21:34:54 公開日:2023-11-19

# 時系列異常検出における「異常」の解法:自己教師付きトリドメイン解

Unraveling the `Anomaly' in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution ( http://arxiv.org/abs/2311.11235v1 )

ライセンス: Link先を確認

Yuting Sun, Guansong Pang, Guanhua Ye, Tong Chen, Xia Hu, Hongzhi Yin

(参考訳) 時系列異常検出(TSAD: time series anomaly detection)における継続的な課題、特に異常ラベルの不足と異常の長さや形状のばらつきは、より効率的なソリューションの必要性をもたらした。 限られた異常ラベルはTSADにおける従来の教師付きモデルの妨げとなるため、自己教師付き学習のような様々なSOTA深層学習技術がこの問題に対処するために導入されている。しかし、これらは異常の長さや形状のばらつきに対処し難く、多様な異常への適応性が制限される。さらに、多くのベンチマークデータセットは、ランダム関数でさえ検出できる明白な異常を含むという問題を抱えている。この問題は、点調整(PA)として知られる不適切な評価指標によって悪化し、モデル性能が過大評価される可能性がある。本稿では、異常ラベルに依存することなく、時間領域・周波数領域・残差領域という3つのデータ領域にわたって特徴をモデル化することでこれらの課題に対処する、自己教師付き学習ベースの新しいトリドメイン異常検出器(TriAD)を提案する。従来のコントラスト学習法とは異なり、TriADはドメイン間コントラスト損失とドメイン内コントラスト損失の両方を使用して、正常データ間の共通属性を学習し、異常と区別する。さらに、ディスコード検出アルゴリズムと統合することで、長さの異なる異常を検出できる。本研究は、厳密に設計されたデータセット(UCRアーカイブ)と評価指標(PA%Kおよびaffiliation)の両方を利用して、TSADにおける深層学習の可能性を再評価する最初の試みである。 UCRデータセットでの実験では、TriADはSOTA深層学習モデルに対してPA%KベースのF1スコアを3倍に高め、SOTAのディスコード検出アルゴリズムに対して精度を50%向上させた。

The ongoing challenges in time series anomaly detection (TSAD), notably the scarcity of anomaly labels and the variability in anomaly lengths and shapes, have led to the need for a more efficient solution. As limited anomaly labels hinder traditional supervised models in TSAD, various SOTA deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue. However, they encounter difficulties handling variations in anomaly lengths and shapes, limiting their adaptability to diverse anomalies. Additionally, many benchmark datasets suffer from the problem of having explicit anomalies that even random functions can detect. This problem is exacerbated by ill-posed evaluation metrics, known as point adjustment (PA), which can result in inflated model performance. In this context, we propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD), which addresses these challenges by modeling features across three data domains - temporal, frequency, and residual domains - without relying on anomaly labels. Unlike traditional contrastive learning methods, TriAD employs both inter-domain and intra-domain contrastive loss to learn common attributes among normal data and differentiate them from anomalies. Additionally, our approach can detect anomalies of varying lengths by integrating with a discord discovery algorithm. It is worth noting that this study is the first to reevaluate the deep learning potential in TSAD, utilizing both rigorously designed datasets (i.e., UCR Archive) and evaluation metrics (i.e., PA%K and affiliation). Through experimental results on the UCR dataset, TriAD achieves an impressive three-fold increase in PA%K based F1 scores over SOTA deep learning models, and 50% increase of accuracy as compared to SOTA discord discovery algorithms.
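
To make the concern about point adjustment concrete, the following is a minimal Python sketch of plain PA and a PA%K-style variant. It follows the commonly described definitions (a whole ground-truth segment is credited once enough of its points are flagged) and is not taken from the TriAD code; the function names and the threshold convention are assumptions made for illustration.

```python
from typing import List, Tuple


def segments(labels: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index pairs for each run of 1s in the ground truth."""
    runs, start = [], None
    for i, y in enumerate(labels):
        if y == 1 and start is None:
            start = i
        elif y == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(labels)))
    return runs


def point_adjust(pred: List[int], labels: List[int], k: float = 0.0) -> List[int]:
    """Mark a whole true segment as detected when more than k percent of its
    points were flagged. With k = 0 this is plain PA: a single hit is enough."""
    adjusted = list(pred)
    for start, end in segments(labels):
        hits = sum(pred[start:end])
        if hits > 0 and 100.0 * hits / (end - start) > k:
            adjusted[start:end] = [1] * (end - start)
    return adjusted


def f1(pred: List[int], labels: List[int]) -> float:
    """Point-wise F1 score of binary predictions against binary labels."""
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, labels))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
```

With an eight-point anomaly of which only one point is flagged, the raw F1 is 2/9, plain PA lifts it to 1.0, and a threshold of K = 20 leaves the prediction unadjusted. That gap is the inflation the abstract refers to.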

翻訳日:2023-11-21 21:34:41 公開日:2023-11-19

# 医療におけるコンピュータビジョンのための畳み込みニューラルネットワークによる放射線診断の強化

Enhancing Radiology Diagnosis through Convolutional Neural Networks for Computer Vision in Healthcare ( http://arxiv.org/abs/2311.11234v1 )

ライセンス: Link先を確認

Keshav Kumar K., Dr N V S L Narasimham

(参考訳) 本研究では、放射線診断における畳み込みニューラルネットワーク(CNN)の変革力について、解釈可能性、有効性、倫理的問題に着目して検討する。改変したDenseNetアーキテクチャを用いたCNNは、特異度、感度、精度の点で優れた性能を示す。従来の手法に対する優位性は、効率向上を示す比較分析によって検証される。それでも、解釈可能性に関する問題は、継続的なモデル改善に加えて、より洗練された手法の必要性を浮き彫りにしている。相互運用性や放射線科医のトレーニングといった統合上の問題は、協働に関する提案につながる。倫理的含意を体系的に検討した結果、包括的な枠組みが必要であることが示される。放射線診断においてCNNを責任ある形で展開するために、アーキテクチャの改良、解釈可能性、倫理的考察を今後の研究で優先する必要がある。

The transformative power of Convolutional Neural Networks (CNNs) in radiology diagnostics is examined in this study, with a focus on interpretability, effectiveness, and ethical issues. With a modified DenseNet architecture, the CNN performs well in terms of specificity, sensitivity, and accuracy. Its superiority over conventional methods is validated by comparative analyses, which highlight efficiency gains. Nonetheless, interpretability issues highlight the need for more sophisticated methods alongside continuous model improvement. Integration issues such as interoperability and radiologist training lead to suggestions for collaboration. The ethical implications are considered systematically and call for comprehensive frameworks. Refinement of architectures, interpretability, and ethical considerations need to be prioritized in future work for responsible CNN deployment in radiology diagnostics.

翻訳日:2023-11-21 21:34:07 公開日:2023-11-19

# 因果ATEは制御されたテキスト生成における意図しないバイアスを軽減する

Causal ATE Mitigates Unintended Bias in Controlled Text Generation ( http://arxiv.org/abs/2311.11229v1 )

ライセンス: Link先を確認

Rahul Madhavan and Kahini Wadhawan

(参考訳) 因果平均処理効果(Causal ATE)を用いた言語モデルの属性制御について検討する。言語モデル(LM)における属性制御タスクの既存の方法は、文中の単語と対象属性との共起を調べ、それらを制御する。しかしながら、トレーニングデータセット内の単語と属性との擬似相関により、推論時にその擬似相関語が提示されると、モデルが属性の存在を幻覚する可能性がある。摂動に基づく単純な手法であるCausal ATEが、この意図しない効果を除去することを示す。さらに、分類タスクにおいてCausal ATEを調べるための理論的基礎を提供し、それが偽陽性の数を減らし、意図しないバイアスの問題を緩和することを証明する。具体的には、これを有害性軽減の問題に適用する。この問題では、無害化後に保護対象グループに対してしばしば生じる不注意なバイアスが大きな課題となっている。この意図しないバイアスは、Causal ATE指標を用いて解決できることを示す。

We study attribute control in language models through the method of Causal Average Treatment Effect (Causal ATE). Existing methods for the attribute control task in Language Models (LMs) check for the co-occurrence of words in a sentence with the attribute of interest, and control for them. However, spurious correlation of the words with the attribute in the training dataset can cause models to hallucinate the presence of the attribute when presented with the spurious correlate during inference. We show that the simple perturbation-based method of Causal ATE removes this unintended effect. Additionally, we offer a theoretical foundation for investigating Causal ATE in the classification task, and prove that it reduces the number of false positives, thereby mitigating the issue of unintended bias. Specifically, we ground it in the problem of toxicity mitigation, where a significant challenge lies in the inadvertent bias that often emerges towards protected groups post-detoxification. We show that this unintended bias can be solved by the use of the Causal ATE metric.
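
One plausible reading of a perturbation-based ATE for a single word is sketched below in Python: swap the word for alternatives and average the change in the attribute score. The abstract does not give the estimator, so the function `causal_ate`, the replacement list, and the stand-in classifier `toy_score` are all hypothetical.

```python
from statistics import mean
from typing import Callable, List


def causal_ate(word: str,
               sentences: List[str],
               replacements: List[str],
               score: Callable[[str], float]) -> float:
    """Average change in the attribute score when `word` is swapped out.

    For every sentence containing `word`, compare the score of the original
    sentence with the mean score over counterfactual sentences in which the
    word is replaced by each alternative.
    """
    effects = []
    for sentence in sentences:
        tokens = sentence.split()
        if word not in tokens:
            continue
        original = score(sentence)
        counterfactuals = [
            score(" ".join(r if t == word else t for t in tokens))
            for r in replacements
        ]
        effects.append(original - mean(counterfactuals))
    return mean(effects) if effects else 0.0


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in classifier: 1.0 if an insult is present, else 0.0."""
    return 1.0 if "idiot" in sentence.split() else 0.0
```

In this toy setup a group mention that merely co-occurs with an insult gets an effect of 0.0, while the insult itself gets 1.0, which illustrates why a causal estimate is less prone to penalising protected-group terms than a co-occurrence count.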

翻訳日:2023-11-21 21:33:55 公開日:2023-11-19

# 分子システムの高精度で効率的な幾何学的深層学習のための普遍的枠組み

A Universal Framework for Accurate and Efficient Geometric Deep Learning of Molecular Systems ( http://arxiv.org/abs/2311.11228v1 )

ライセンス: Link先を確認

Shuo Zhang, Yang Liu, Lei Xie

(参考訳) 分子科学は、異なるタイプや大きさの分子とその複合体を含む幅広い問題に対処する。近年、幾何学的ディープラーニング、特にグラフニューラルネットワークは、分子科学の応用において有望な性能を示している。しかし、既存のほとんどの研究は特定の分子系に目的の誘導バイアスを課すことが多く、マクロ分子や大規模タスクに適用しても非効率である。これらの課題に対処するため,PAMNetは,任意の分子系のサイズや型が異なる3次元(3D)分子の表現を正確かつ効率的に学習するための普遍的なフレームワークである。分子力学にインスパイアされたPAMNetは、局所的および非局所的相互作用とその組み合わせ効果を明示的にモデル化するために、物理情報バイアスを誘導する。その結果、PAMNetは高価な操作を削減でき、時間とメモリ効率が向上する。広範なベンチマーク研究において、PAMNetは、小さな分子の性質、RNA3D構造、タンパク質-リガンド結合親和性という3つの異なる学習課題において、精度と効率の両面で最先端のベースラインより優れている。この結果は,分子科学の幅広い応用におけるPAMNetの可能性を強調した。

Molecular sciences address a wide range of problems involving molecules of different types and sizes and their complexes. Recently, geometric deep learning, especially Graph Neural Networks, has shown promising performance in molecular science applications. However, most existing works often impose targeted inductive biases to a specific molecular system, and are inefficient when applied to macromolecules or large-scale tasks, thereby limiting their applications to many real-world problems. To address these challenges, we present PAMNet, a universal framework for accurately and efficiently learning the representations of three-dimensional (3D) molecules of varying sizes and types in any molecular system. Inspired by molecular mechanics, PAMNet induces a physics-informed bias to explicitly model local and non-local interactions and their combined effects. As a result, PAMNet can reduce expensive operations, making it time and memory efficient. In extensive benchmark studies, PAMNet outperforms state-of-the-art baselines regarding both accuracy and efficiency in three diverse learning tasks: small molecule properties, RNA 3D structures, and protein-ligand binding affinities. Our results highlight the potential for PAMNet in a broad range of molecular science applications.

翻訳日:2023-11-21 21:33:41 公開日:2023-11-19

# FedRA:不均一クライアントの力を解き放つためのフェデレーションチューニングのためのランダムアロケーション戦略

FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients ( http://arxiv.org/abs/2311.11227v1 )

ライセンス: Link先を確認

Shangchao Su, Bin Li, Xiangyang Xue

(参考訳) 基盤モデルの利用可能性が高まるにつれ、複数のクライアントのデータと計算リソースを活用して基盤モデルを協調的に微調整するフェデレーションチューニングが、フェデレーション学習の分野で注目を集めている。しかしながら、現実世界のフェデレーションシナリオでは、計算や通信リソースの異なる多数の異種クライアントが存在することが多く、それらはモデル全体の微調整プロセスをサポートすることができない。この課題に対応するため、本研究では新しいフェデレーションチューニングアルゴリズムであるFedRAを提案する。 FedRAの実装は単純で、元のモデルにさらなる変更を加えることなく、任意のトランスフォーマーベースのモデルにシームレスに統合することができる。具体的には、各通信ラウンドにおいて、FedRAはランダムに割り当て行列を生成する。リソース制約のあるクライアントは、割り当て行列に基づいて元のモデルから少数のレイヤを再編成し、LoRAを使用して微調整する。その後、サーバは現在の割り当て行列に従って、クライアントから更新されたLoRAパラメータを元のモデルの対応するレイヤに集約する。 FedRAは、どのクライアントもグローバルモデル全体をサポートできないシナリオにも対応しており、これは大きな利点である。2つの大規模な画像データセットであるDomainNetとNICO++を用いて、さまざまな非IID設定で実験を行う。その結果、FedRAは比較手法を大幅に上回った。ソースコードは https://github.com/leondada/FedRA で入手できる。

With the increasing availability of Foundation Models, federated tuning has garnered attention in the field of federated learning, utilizing data and computation resources from multiple clients to collaboratively fine-tune foundation models. However, in real-world federated scenarios, there often exist a multitude of heterogeneous clients with varying computation and communication resources, rendering them incapable of supporting the entire model fine-tuning process. In response to this challenge, we propose a novel federated tuning algorithm, FedRA. The implementation of FedRA is straightforward and can be seamlessly integrated into any transformer-based model without the need for further modification to the original model. Specifically, in each communication round, FedRA randomly generates an allocation matrix. For resource-constrained clients, it reorganizes a small number of layers from the original model based on the allocation matrix and fine-tunes using LoRA. Subsequently, the server aggregates the updated LoRA parameters from the clients according to the current allocation matrix into the corresponding layers of the original model. It is worth noting that FedRA also supports scenarios where none of the clients can support the entire global model, which is an impressive advantage. We conduct experiments on two large-scale image datasets, DomainNet and NICO++, under various non-iid settings. The results demonstrate that FedRA outperforms the compared methods significantly. The source code is available at https://github.com/leondada/FedRA.
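
The round structure described above can be sketched roughly as follows. This is an illustrative Python outline, not the released FedRA code: the per-client capacity input, the use of plain float lists as stand-ins for LoRA parameters, and the simple per-layer averaging rule are all assumptions.

```python
import random
from typing import Dict, List


def allocation_matrix(capacities: List[int], num_layers: int,
                      rng: random.Random) -> List[List[int]]:
    """Row i has a 1 for each layer that client i trains this round.

    capacities[i] is the number of layers client i can afford
    (a hypothetical input for this sketch).
    """
    matrix = []
    for cap in capacities:
        chosen = set(rng.sample(range(num_layers), min(cap, num_layers)))
        matrix.append([1 if layer in chosen else 0 for layer in range(num_layers)])
    return matrix


def aggregate(matrix: List[List[int]],
              client_updates: List[Dict[int, List[float]]],
              num_layers: int) -> Dict[int, List[float]]:
    """Average each layer's update over only the clients allocated that layer.

    client_updates[i][layer] is a list of floats standing in for the
    LoRA parameters client i sends back for that layer.
    """
    merged = {}
    for layer in range(num_layers):
        owners = [i for i, row in enumerate(matrix) if row[layer] == 1]
        if not owners:
            continue  # no client trained this layer this round; keep old weights
        vectors = [client_updates[i][layer] for i in owners]
        merged[layer] = [sum(vals) / len(vals) for vals in zip(*vectors)]
    return merged
```

Because the matrix is redrawn every round, each layer is eventually covered by some client even when no single client can hold the whole model, which is the scenario the abstract highlights.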

翻訳日:2023-11-21 21:33:19 公開日:2023-11-19

# LLMに基づくプロンプト修正とユーザフィードバックを用いた対話型クエリ生成アシスタント

An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback ( http://arxiv.org/abs/2311.11226v1 )

ライセンス: Link先を確認

Kaustubh D. Dhole, Ramraj Chandradevan, Eugene Agichtein

(参考訳) 検索は情報にアクセスする主要な方法であるが、特にユーザがドメインに精通していない場合、他の言語の文書を検索する場合、あるいはイベントのようにクエリとして容易に表現できない複雑な情報を探す場合には、効果的なクエリの定式化は依然として難しい課題である。関心のある文書やパッセージの例を提示する方がユーザにとって容易な場合もあるが、このようなクエリ・バイ・イグザンプルのシナリオは概念ドリフトを起こしやすく、クエリ生成手法に非常に敏感である。このデモでは、LLMをインタラクティブに使用し、ユーザがクエリ定式化プロセスのすべての段階で編集やフィードバックを提供できるよう支援する補完的なアプローチを示す。提案するクエリ生成アシスタントは、単言語または多言語の文書コレクションに対する自動および対話型のクエリ生成をサポートする新しい検索インタフェースである。具体的には、提案する支援インタフェースにより、ユーザは異なるLLMによって生成されたクエリを洗練し、検索された文書やパッセージに対するフィードバックを提供でき、さらにユーザのフィードバックをプロンプトとして組み込んで、より効果的なクエリを生成することができる。提案するインタフェースは、検索およびランキングモデルの有効性を定性的に評価するためにクエリ生成向けLLMの微調整とプロンプティングを探求する上で、また、このような支援なしではユーザがクエリの定式化に苦労する複雑な検索タスクに対してHuman-in-the-Loop(HITL)実験を行う上で、有用な実験ツールである。

While search is the predominant method of accessing information, formulating effective queries remains a challenging task, especially in situations where users are unfamiliar with a domain, are searching for documents in other languages, or are looking for complex information, such as events, that is not easily expressible as a query. Providing example documents or passages of interest might be easier for a user; however, such query-by-example scenarios are prone to concept drift and are highly sensitive to the query generation method. This demo illustrates complementary approaches of using LLMs interactively, assisting and enabling the user to provide edits and feedback at all stages of the query formulation process. The proposed Query Generation Assistant is a novel search interface that supports automatic and interactive query generation over a monolingual or multilingual document collection. Specifically, the proposed assistive interface enables users to refine the queries generated by different LLMs and to provide feedback on the retrieved documents or passages, and it can incorporate the users' feedback as prompts to generate more effective queries. The proposed interface is a valuable experimental tool for exploring fine-tuning and prompting of LLMs for query generation to qualitatively evaluate the effectiveness of retrieval and ranking models, and for conducting Human-in-the-Loop (HITL) experiments for complex search tasks where users struggle to formulate queries without such assistance.

翻訳日:2023-11-21 21:32:58 公開日:2023-11-19

# TextGuard: テキスト分類へのバックドア攻撃に対する証明可能な防御

TextGuard: Provable Defense against Backdoor Attacks on Text Classification ( http://arxiv.org/abs/2311.11225v1 )

ライセンス: Link先を確認

Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song

(参考訳) バックドア攻撃は、セキュリティクリティカルなアプリケーションに機械学習モデルをデプロイする上で、大きなセキュリティ脅威となっている。既存の研究はバックドア攻撃に対する多くの防御を提案している。一定の実証的な防御効果を示すにもかかわらず、これらの技術はいずれも、任意の攻撃に対して形式的で証明可能なセキュリティ保証を提供することはできない。その結果、本評価で示すように、強力な適応攻撃によって容易に破られる。本稿では、テキスト分類へのバックドア攻撃に対する最初の証明可能な防御手法であるTextGuardを提案する。具体的には、TextGuardはまず、各トレーニング文をサブ文に分割することで、(バックドアを含む)トレーニングデータをサブトレーニングセットに分割する。このパーティショニングにより、サブトレーニングセットの大部分がバックドアトリガを含まないことが保証される。その後、各サブトレーニングセットからベース分類器を訓練し、そのアンサンブルが最終予測を提供する。バックドアトリガの長さが一定のしきい値以内であれば、トレーニングおよびテスト入力におけるトリガの存在によってTextGuardの予測が影響を受けないことを理論的に証明する。本評価では、3つのベンチマークテキスト分類タスクにおいてTextGuardの有効性を実証し、バックドア攻撃に対する既存の認証付き防御の認証精度を上回ることを示す。さらに、TextGuardの実証的な性能を高めるための追加の戦略を提案する。最先端の実証的防御との比較により、複数のバックドア攻撃に対するTextGuardの優位性が検証される。コードとデータは https://github.com/AI-secure/TextGuard で入手できる。

Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain empirical defense efficacy, none of these techniques could provide a formal and provable security guarantee against arbitrary attacks. As a result, they can be easily broken by strong adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. In particular, TextGuard first divides the (backdoored) training data into sub-training sets, achieved by splitting each training sentence into sub-sentences. This partitioning ensures that a majority of the sub-training sets do not contain the backdoor trigger. Subsequently, a base classifier is trained from each sub-training set, and their ensemble provides the final prediction. We theoretically prove that when the length of the backdoor trigger falls within a certain threshold, TextGuard guarantees that its prediction will remain unaffected by the presence of the triggers in training and testing inputs. In our evaluation, we demonstrate the effectiveness of TextGuard on three benchmark text classification tasks, surpassing the certification accuracy of existing certified defenses against backdoor attacks. Furthermore, we propose additional strategies to enhance the empirical performance of TextGuard. Comparisons with state-of-the-art empirical defenses validate the superiority of TextGuard in countering multiple backdoor attacks. Our code and data are available at https://github.com/AI-secure/TextGuard.
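
The partition-and-vote idea can be sketched as below. This Python outline assumes words are assigned to groups by a deterministic hash, which is one natural way to split sentences so that a short trigger lands in only a few groups; the abstract does not specify the splitting rule, and the names `word_group`, `split_sentence`, and `ensemble_predict` are illustrative rather than taken from the TextGuard repository.

```python
import hashlib
from collections import Counter
from typing import Callable, List


def word_group(word: str, num_groups: int) -> int:
    """Deterministically map a word to one of num_groups buckets."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_groups


def split_sentence(sentence: str, num_groups: int) -> List[str]:
    """Sub-sentence g keeps, in order, only the words hashed to group g."""
    parts: List[List[str]] = [[] for _ in range(num_groups)]
    for word in sentence.split():
        parts[word_group(word, num_groups)].append(word)
    return [" ".join(p) for p in parts]


def ensemble_predict(sentence: str,
                     classifiers: List[Callable[[str], int]]) -> int:
    """Majority vote of base classifiers, each seeing only its own sub-sentence.

    Ties are broken in favour of the smallest label.
    """
    subs = split_sentence(sentence, len(classifiers))
    votes = Counter(clf(sub) for clf, sub in zip(classifiers, subs))
    return max(sorted(votes), key=lambda label: votes[label])
```

Since a given trigger word always hashes to the same group, it can influence at most one base classifier per trigger word, so the majority vote is unchanged as long as the trigger is short relative to the number of groups.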

翻訳日:2023-11-21 21:32:30 公開日:2023-11-19

# ガウス拡散:構造雑音を伴う拡散確率モデルの3次元ガウス散乱

GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise ( http://arxiv.org/abs/2311.11221v1 )

ライセンス: Link先を確認

Xinhai Li and Huaibin Wang and Kuo-Kun Tseng

(参考訳) Text-to-3Dは効率的な生成方法と幅広い創造的可能性で知られており、AIGC分野でかなりの注目を集めている。しかし、NeRFと2次元拡散モデルの組み合わせは、しばしば過飽和な画像を生成し、ピクセル単位のレンダリング法の制約により下流の産業用途に厳しい制限を課す。ガウススプラッティングは最近、NeRFベースの手法で一般的だった従来の点単位のサンプリング手法に取って代わり、3次元再構成の様々な側面に革命をもたらした。本稿では、ガウススプラッティングに基づく新たなText-to-3Dコンテンツ生成フレームワークを提案する。これにより、個々のガウス球の透明度を通じて画像の彩度を細かく制御でき、より現実的な画像を生成できる。 3次元生成における多視点一貫性の実現という課題は、モデリングの複雑さと精度を著しく妨げる。 SJCから着想を得て、多視点形状の不整合の是正を目的として、3次元ガウススプラッティングで生成した画像を多視点ノイズ分布で摂動させることを検討する。我々は、共有のノイズ源から様々な視点のガウスノイズを生成する効率的なノイズ生成法を考案した。さらに、素朴な3次元ガウスベースの生成は、モデルを局所最小値に陥らせる傾向があり、フローター、バリ、増殖要素などのアーティファクトを引き起こす。これらの問題を緩和するために、3次元外観の品質と安定性を高める変分ガウススプラッティング法を提案する。我々の知る限り、本手法は3次元コンテンツ生成プロセスの全体にわたってガウススプラッティングを包括的に利用した初めての手法である。

Text-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, the amalgamation of NeRF and 2D diffusion models frequently yields oversaturated images, posing severe limitations on downstream industrial applications due to the constraints of the pixel-wise rendering method. Gaussian splatting has recently superseded the traditional pointwise sampling technique prevalent in NeRF-based methodologies, revolutionizing various aspects of 3D reconstruction. This paper introduces a novel text-to-3D content generation framework based on Gaussian splatting, enabling fine control over image saturation through individual Gaussian sphere transparencies, thereby producing more realistic images. The challenge of achieving multi-view consistency in 3D generation significantly impedes modeling complexity and accuracy. Taking inspiration from SJC, we explore employing multi-view noise distributions to perturb images generated by 3D Gaussian splatting, aiming to rectify inconsistencies in multi-view geometry. We devise an efficient noise generation method that produces Gaussian noise for diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts like floaters, burrs, or proliferative elements. To mitigate these issues, we propose the variational Gaussian splatting technique to enhance the quality and stability of 3D appearance. To our knowledge, our approach represents the first comprehensive utilization of Gaussian splatting across the entire spectrum of 3D content generation processes.

翻訳日:2023-11-21 21:32:07 公開日:2023-11-19

# 状態非依存のオール・バーサス・ナッシング論法

State-independent all-versus-nothing arguments ( http://arxiv.org/abs/2311.11218v1 )

ライセンス: Link先を確認

Boseong Kim, Samson Abramsky

(参考訳) 文脈性は、古典的直観に挑戦する量子情報の重要な特徴であり、量子優位性の明示的な証明を構築する基盤を提供する。量子優位性を示す多くの結果は文脈性の議論に基づいているが、文脈性の定義は研究ごとに異なり、それらの結果を直接結びつけることを難しくしている。本報告では、層理論的文脈性の数学的構造を概観し、この枠組みを拡張してKochen-Specker型の文脈性を説明する。まず、文脈性の定義を詳細な例とともに取り上げる。次に、オール・バーサス・ナッシング(AvN)論法を述べ、状態非依存のAvNクラスを定義する。Kochen-Specker型の文脈性、すなわち部分閉包における文脈性は、可換な測定の積に関する可観測量の部分閉包によって、この枠組みに翻訳できることを示す。最後に、演算子側の視点で文脈性の各ケースを比較する。そこでは、状態側の視点における文脈性クラスの厳密な階層構造が、部分閉包の形式とともに状態非依存のAvNクラスに統合されるように見える。全体として、本報告はKochen-Specker型の概念を状態非依存のAvN論法に統合することにより、文脈性の統一的な解釈を提供する。この結果は文脈性に対する新たな洞察を示し、量子優位性の証明を構築するための一貫したアプローチへの道を開く。

Contextuality is a key feature of quantum information that challenges classical intuitions, providing the basis for constructing explicit proofs of quantum advantage. While many demonstrations of quantum advantage are based on contextuality arguments, the definition of contextuality differs from study to study, making it difficult to connect their results directly. In this report, we review the mathematical structure of sheaf-theoretic contextuality and extend this framework to explain Kochen-Specker type contextuality. We first cover the definitions in contextuality with detailed examples. Then, we state the all-versus-nothing (AvN) argument and define a state-independent AvN class. It is shown that Kochen-Specker type contextuality, or contextuality in a partial closure, can be translated into this framework by the partial closure of observables under the multiplication of commuting measurements. Finally, we compare each case of contextuality in an operator-side view, where the strict hierarchy of contextuality classes in a state-side view seems to merge into the state-independent AvN class together with the partial closure formalism. Overall, this report provides a unified interpretation of contextuality by integrating Kochen-Specker type notions into the state-independent AvN argument. The results present novel insights into contextuality, which pave the way for a coherent approach to constructing proofs of quantum advantage.

翻訳日:2023-11-21 21:31:38 公開日:2023-11-19

# SPLAIN: 理由とデータによるサイバーセキュリティ警告の拡張

SPLAIN: Augmenting Cybersecurity Warnings with Reasons and Data ( http://arxiv.org/abs/2311.11215v1 )

ライセンス: Link先を確認

Vera A. Kazakova and Jena D. Hwang and Bonnie J. Dorr and Yorick Wilks and J. Blake Gage and Alex Memory and Mark A. Clark

(参考訳) 従来のアプローチは一般に限定的で、結局のところ説得力に欠ける情報しか提供しないため、効果的なサイバー脅威の認識と予防には、理解しやすい予測システムが必要である。 本稿では、警告データをユーザフレンドリーなサイバー脅威の説明に変換する自然言語生成器であるSPLAIN(Simplified Plaintext Language)を紹介する。 SPLAINは、入力データとシステム機能に関する階層的に整理された説明的詳細を組み込んだ、明確で実用的な出力を生成するように設計されている。個々のセンサに由来する予測信号と融合モジュールからの全体的な警告を入力として、SPLAINは各信号に対して、寄与したセンサやデータ信号に関する情報を問い合わせる。収集されたデータは、ユーザが確認できるように、予測、センシング、データの各要素を含む一貫性のある英語の説明に加工される。 SPLAINのテンプレートベースのアプローチは、一貫した警告構造と語彙を保証する。 SPLAINの階層的な出力構造により、各脅威とその構成要素を展開し、必要に応じて基盤となる説明を表示できる。我々の結論は、設計者がサイバー警告の背後にある「方法」と「理由」を明示する必要性を強調し、一貫性のある説明を生成するための単純な構造化テンプレートを提唱するとともに、機械学習アプローチでは直接的な因果関係が常に特定できるとは限らず、一部の説明はモデルやトレーニングデータといった一般的な方法論に焦点を当てる必要があることを認めている。

Effective cyber threat recognition and prevention demand comprehensible forecasting systems, as prior approaches commonly offer limited and, ultimately, unconvincing information. We introduce Simplified Plaintext Language (SPLAIN), a natural language generator that converts warning data into user-friendly cyber threat explanations. SPLAIN is designed to generate clear, actionable outputs, incorporating hierarchically organized explanatory details about input data and system functionality. Given the inputs of individual sensor-induced forecasting signals and an overall warning from a fusion module, SPLAIN queries each signal for information on contributing sensors and data signals. This collected data is processed into a coherent English explanation, encompassing forecasting, sensing, and data elements for user review. SPLAIN's template-based approach ensures consistent warning structure and vocabulary. SPLAIN's hierarchical output structure allows each threat and its components to be expanded to reveal underlying explanations on demand. Our conclusions emphasize the need for designers to specify the "how" and "why" behind cyber warnings, advocate for simple structured templates in generating consistent explanations, and recognize that direct causal links in Machine Learning approaches may not always be identifiable, requiring some explanations to focus on general methodologies, such as model and training data.
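
As a rough illustration of template-based, hierarchically expandable explanations, the Python sketch below builds a top-level warning whose supporting signals are revealed only on demand. The template wording, field names, and the `Explanation` class are hypothetical and are not SPLAIN's actual templates or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Explanation:
    """One line of explanation plus optional nested detail."""
    summary: str
    details: List["Explanation"] = field(default_factory=list)

    def render(self, depth: int = 0, indent: int = 0) -> str:
        """Show the summary plus `depth` further levels of nested detail."""
        lines = ["  " * indent + self.summary]
        if depth > 0:
            for child in self.details:
                lines.append(child.render(depth - 1, indent + 1))
        return "\n".join(lines)


# Hypothetical fixed templates: they keep structure and vocabulary consistent.
WARNING_TEMPLATE = ("Warning: {threat} is forecast for {target} "
                    "({confidence:.0%} confidence).")
SIGNAL_TEMPLATE = "Signal '{name}' is based on sensor '{sensor}'."


def build_warning(threat: str, target: str, confidence: float,
                  signals: List[Dict[str, str]]) -> Explanation:
    """Fill the templates and attach one child explanation per signal."""
    children = [Explanation(SIGNAL_TEMPLATE.format(**s)) for s in signals]
    top = WARNING_TEMPLATE.format(threat=threat, target=target,
                                  confidence=confidence)
    return Explanation(top, children)
```

Calling `render()` returns only the headline warning, while `render(depth=1)` also lists the contributing signals, mirroring the expand-on-demand behaviour described in the abstract.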

翻訳日:2023-11-21 21:31:13 公開日:2023-11-19

# 弱監督下における変電所設備故障の赤外画像識別法

Infrared image identification method of substation equipment fault under weak supervision ( http://arxiv.org/abs/2311.11214v1 )

ライセンス: Link先を確認

Anjali Sharma, Priya Banerjee, Nikhil Singh

(参考訳) 本研究では, サブステーション装置の赤外線画像中の欠陥を弱教師付きで識別する手法を提案する。機器識別にFaster RCNNモデルを使用し、モデルのネットワーク構造とパラメータの変更による検出精度を向上させる。サブステーションで検査ロボットが捉えた赤外線画像の解析により,本手法を実証する。手動でマークされた結果に対して性能が検証され、提案手法が様々な機器タイプにわたる故障同定の精度を大幅に向上させることを示した。

This study presents a weakly supervised method for identifying faults in infrared images of substation equipment. It utilizes the Faster RCNN model for equipment identification, enhancing detection accuracy through modifications to the model's network structure and parameters. The method is exemplified through the analysis of infrared images captured by inspection robots at substations. Performance is validated against manually marked results, demonstrating that the proposed algorithm significantly enhances the accuracy of fault identification across various equipment types.

翻訳日:2023-11-21 21:30:49 公開日:2023-11-19

# 因果探索アルゴリズムにおける事前学習言語モデルの利用は可能か?

Can We Utilize Pre-trained Language Models within Causal Discovery Algorithms? ( http://arxiv.org/abs/2311.11212v1 )

ライセンス: Link先を確認

Chanhui Lee (1), Juhyeon Kim (2), Yongjun Jeong (3), Juhyun Lyu (4), Junghee Kim (4), Sangmin Lee (4), Sangjun Han (4), Hyeokjun Choe (4), Soyeon Park (4), Woohyung Lim (4), Sungbin Lim (5,6), Sanghack Lee (2,7) ((1) Department of Artificial Intelligence, Korea University, (2) Graduate School of Data Science, Seoul National University, (3) Department of Computer Science and Engineering, UNIST, (4) Data Intelligence Laboratory, LG AI Research, (5) Department of Statistics, Korea University, (6) LG AI Research, (7) SNU-LG AI Research Center)

(参考訳) スケーリング則により、事前学習済み言語モデル(PLM)が因果推論の分野に導入されるようになった。 PLMによる因果推論は、データを用いて変数間の因果関係を決定することを目的とする因果発見とは対照的に、テキストベースの記述のみに依存する。近年、特別に設計されたプロンプトによる因果推論を繰り返し、その結果を集約することで因果発見を模倣する手法が研究されている。これは、特に複数の変数を扱う場合にデータ不足によって制限されがちな因果関係の発見における、PLMの有用性を示している。一方で、PLMはデータを解析せず、プロンプト設計に大きく依存するという特徴は、因果発見にPLMを直接使用する上で重大な制限となる。したがって、PLMに基づく因果推論はプロンプト設計に強く依存し、因果関係を決定する際に過信や誤った予測のリスクを伴う。本稿では、物理に着想を得た合成データでの実験を通して、前述のPLMに基づく因果推論の限界を実証的に示す。次に、PLMから得られた事前知識を因果発見アルゴリズムと統合する新しいフレームワークを提案する。これは、因果発見のための隣接行列を初期化し、事前知識を用いた正則化を組み込むことによって実現される。提案するフレームワークは、PLMと因果発見の統合による性能向上を示すだけでなく、PLMから抽出した事前知識を既存の因果発見アルゴリズムで活用する方法も示唆する。

Scaling laws have brought Pre-trained Language Models (PLMs) into the field of causal reasoning. Causal reasoning with PLMs relies solely on text-based descriptions, in contrast to causal discovery, which aims to determine the causal relationships between variables from data. Recent research has explored a method that mimics causal discovery by aggregating the outcomes of repeated causal reasoning, achieved through specifically designed prompts. It highlights the usefulness of PLMs in discovering cause and effect, which is often limited by a lack of data, especially when dealing with multiple variables. Conversely, the fact that PLMs do not analyze data and are highly dependent on prompt design is a crucial limitation for using PLMs directly in causal discovery. Accordingly, PLM-based causal reasoning depends heavily on prompt design and carries the risk of overconfidence and false predictions in determining causal relationships. In this paper, we empirically demonstrate the aforementioned limitations of PLM-based causal reasoning through experiments on physics-inspired synthetic data. Then, we propose a new framework that integrates prior knowledge obtained from PLM with a causal discovery algorithm. This is accomplished by initializing an adjacency matrix for causal discovery and incorporating regularization using prior knowledge. Our proposed framework not only demonstrates improved performance through the integration of PLM and causal discovery but also suggests how to leverage PLM-extracted prior knowledge with existing causal discovery algorithms.
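
The two integration points named above, initialization and regularization, might look roughly like this in Python. The abstract gives no formulas, so the binary prior matrix, the scaling factor, and the L1 penalty on edges the prior considers absent are assumptions chosen only to make the idea concrete.

```python
from typing import List

Matrix = List[List[float]]


def init_from_prior(prior: Matrix, scale: float = 0.5) -> Matrix:
    """Start the weighted adjacency matrix at scale * prior instead of zeros.

    prior[i][j] is 1 if the PLM suggested an edge i -> j, else 0
    (a hypothetical encoding for this sketch).
    """
    return [[scale * p for p in row] for row in prior]


def prior_penalty(weights: Matrix, prior: Matrix, lam: float = 1.0) -> float:
    """L1 penalty on edges that the prior considers absent (prior entry 0).

    Added to the data-fit loss of a causal discovery algorithm, this term
    discourages, but does not forbid, edges the PLM did not suggest.
    """
    return lam * sum(
        abs(w)
        for w_row, p_row in zip(weights, prior)
        for w, p in zip(w_row, p_row)
        if p == 0
    )
```

Keeping the prior as a soft penalty rather than a hard constraint lets the data override the PLM where they disagree, which seems in line with the abstract's caution about overconfident PLM predictions.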

翻訳日:2023-11-21 21:30:42 公開日:2023-11-19

# イベントトリガー型コンテキスト認識ストーリー生成のためのクロスアテンション強化モデル

A Cross-Attention Augmented Model for Event-Triggered Context-Aware Story Generation ( http://arxiv.org/abs/2311.11271v1 )

ライセンス: Link先を確認

Chen Tang, Tyler Loakman and Chenghua Lin

(参考訳) 近年の進歩にもかかわらず、既存のストーリー生成システムは、生成される物語の質に大きく影響する文脈やイベントの特徴を効果的に組み込むことに依然として困難を抱えている。これらの課題に対処するために、我々は、残差マッピングを通じて文脈特徴をイベントシーケンスにマッピングするクロスアテンション機構を用いて、生成されたストーリーの関連性と一貫性を高める新しいニューラル生成モデルEtriCAを導入する。この特徴捕捉メカニズムにより、ストーリー生成プロセスにおいて、イベント間の論理関係をより効果的に活用できる。提案モデルをさらに強化するために、大規模書籍コーパス上で知識強化のためのポストトレーニングフレームワーク(KeEtriCA)を用いる。これにより、EtriCAはより広い範囲のデータサンプルに適応できる。その結果、自動評価指標では約5%、人手評価では10%以上の改善が得られた。我々は、ストーリー生成におけるフレームワークの性能を評価するために、最先端(SOTA)のベースラインモデルとの比較を含む広範な実験を行う。自動評価指標と人手評価の両方を含む実験結果は、既存の最先端ベースラインに対する提案モデルの優位性を示す。これらの結果は、生成した物語の質を向上させるために、文脈やイベントの特徴を活用するモデルの有効性を裏付けるものである。

Despite recent advancements, existing story generation systems continue to encounter difficulties in effectively incorporating contextual and event features, which greatly influence the quality of generated narratives. To tackle these challenges, we introduce a novel neural generation model, EtriCA, that enhances the relevance and coherence of generated stories by employing a cross-attention mechanism to map context features onto event sequences through residual mapping. This feature capturing mechanism enables our model to exploit logical relationships between events more effectively during the story generation process. To further enhance our proposed model, we employ a post-training framework for knowledge enhancement (KeEtriCA) on a large-scale book corpus. This allows EtriCA to adapt to a wider range of data samples. This results in approximately 5% improvement in automatic metrics and over 10% improvement in human evaluation. We conduct extensive experiments, including comparisons with state-of-the-art (SOTA) baseline models, to evaluate the performance of our framework on story generation. The experimental results, encompassing both automated metrics and human assessments, demonstrate the superiority of our model over existing state-of-the-art baselines. These results underscore the effectiveness of our model in leveraging context and event features to improve the quality of generated narratives.

翻訳日:2023-11-21 21:24:47 公開日:2023-11-19

# 実世界の筆記支援に向けて:偽字と誤字による漢字チェックベンチマーク

Towards Real-World Writing Assistance: A Chinese Character Checking Benchmark with Faked and Misspelled Characters ( http://arxiv.org/abs/2311.11268v1 )

ライセンス: Link先を確認

Yinghui Li, Zishan Xu, Shaoshen Chen, Haojing Huang, Yangning Li, Yong Jiang, Zhongli Li, Qingyu Zhou, Hai-Tao Zheng, Ying Shen

(参考訳) 筆記支援は人間の生活に密接に関連する応用であり、また、基礎的な自然言語処理(NLP)研究分野でもある。その目的は入力テキストの正しさと品質を改善することであり、誤字の検出と修正には文字チェックが不可欠である。手書きが大多数を占める現実世界の観点から見ると、人間が間違える文字には、偽字(すなわち、書き誤りによって作られた実在しない文字)と誤字(すなわち、スペルミスによって誤用された実在の文字)が含まれる。しかし、既存のデータセットや関連研究は、主に音韻的・視覚的混同に起因する誤字のみに焦点を当てており、より一般的で難しい偽字を無視している。このジレンマを突破するために、偽字と誤字を含む、人手でアノテーションされた視覚的中国語文字チェックデータセットVisual-C$^3$を提示する。私たちの知る限りでは、Visual-C$^3$は、漢字チェックシナリオにおける最初の現実世界の視覚的データセットであり、最大の人手作成データセットである。また,Visual-C$^3$上で新たなベースライン手法を提案し,評価する。広範な実験結果と分析の結果、Visual-C$^3$は高品質だが困難であることがわかった。Visual-C$^3$データセットとベースライン手法は、コミュニティにおけるさらなる研究を促進するために公開される予定である。

Writing assistance is an application closely related to human life and is also a fundamental Natural Language Processing (NLP) research field. Its aim is to improve the correctness and quality of input texts, with character checking being crucial in detecting and correcting wrong characters. From the perspective of the real world where handwriting occupies the vast majority, characters that humans get wrong include faked characters (i.e., untrue characters created due to writing errors) and misspelled characters (i.e., true characters used incorrectly due to spelling errors). However, existing datasets and related studies only focus on misspelled characters mainly caused by phonological or visual confusion, thereby ignoring faked characters which are more common and difficult. To break through this dilemma, we present Visual-C$^3$, a human-annotated Visual Chinese Character Checking dataset with faked and misspelled Chinese characters. To the best of our knowledge, Visual-C$^3$ is the first real-world visual and the largest human-crafted dataset for the Chinese character checking scenario. We also propose and evaluate novel baseline methods on Visual-C$^3$. Extensive empirical results and analyses show that Visual-C$^3$ is high-quality yet challenging. The Visual-C$^3$ dataset and the baseline methods will be publicly available to facilitate further research in the community.

翻訳日:2023-11-21 21:24:26 公開日:2023-11-19

# メンタルヘルスアプリケーションにおける大規模言語モデルの再考

Rethinking Large Language Models in Mental Health Applications ( http://arxiv.org/abs/2311.11267v1 )

ライセンス: Link先を確認

Shaoxiong Ji and Tianlin Zhang and Kailai Yang and Sophia Ananiadou and Erik Cambria

(参考訳) 大規模言語モデル(LLM)はメンタルヘルスにおいて貴重な資産となり、分類タスクとカウンセリングアプリケーションの両方において有望である。本稿では,メンタルヘルス応用におけるLLMの利用について考察する。予測のための生成モデルの不安定性と幻覚的なアウトプットを生成する可能性について論じ、その信頼性と確実性を維持するために継続的な監査と評価が必要であることを強調する。また、しばしば同義に使われる「説明可能性」と「解釈可能性」を区別し、LLMが生成する幻覚の可能性のある自己説明に頼るのではなく、本質的に解釈可能な方法を開発することを提唱している。LLMの進歩にもかかわらず、人間のカウンセラーの共感的理解、ニュアンスに富んだ解釈、文脈認識は、メンタルヘルスカウンセリングという繊細で複雑な領域では依然としてかけがえのないものである。LLMの使用には、人間の専門知識を置き換えようとするのではなく、それを補完するツールと見なして、慎重かつ思慮深い姿勢で臨むべきである。

Large Language Models (LLMs) have become valuable assets in mental health, showing promise in both classification tasks and counseling applications. This paper offers a perspective on using LLMs in mental health applications. It discusses the instability of generative models for prediction and the potential for generating hallucinatory outputs, underscoring the need for ongoing audits and evaluations to maintain their reliability and dependability. The paper also distinguishes between the often interchangeable terms "explainability" and "interpretability", advocating for developing inherently interpretable methods instead of relying on potentially hallucinated self-explanations generated by LLMs. Despite the advancements in LLMs, human counselors' empathetic understanding, nuanced interpretation, and contextual awareness remain irreplaceable in the sensitive and complex realm of mental health counseling. The use of LLMs should be approached with a judicious and considerate mindset, viewing them as tools that complement human expertise rather than seeking to replace it.

翻訳日:2023-11-21 21:24:03 公開日:2023-11-19

# 物理インフォームドニューラルネットワークとニューラル演算子における雑音入出力の不確かさの定量化

Uncertainty quantification for noisy inputs-outputs in physics-informed neural networks and neural operators ( http://arxiv.org/abs/2311.11262v1 )

ライセンス: Link先を確認

Zongren Zou, Xuhui Meng, George Em Karniadakis

(参考訳) 科学機械学習(SciML)における不確実性定量化(UQ)は、ニューラルネットワーク(NN)が様々な科学分野にわたる複雑な問題に広く採用されているため、ますます重要になっている。代表的なSciMLモデルは物理インフォームドニューラルネットワーク(PINN)とニューラル演算子(NO)である。近年、SciMLのUQはますます研究されているが、PINNにおける時空間座標やNOにおける入力関数などのノイズの多い入力による不確実性に対処する研究はほとんどない。モデルの入力におけるノイズの存在は、ほとんどのSciMLアルゴリズムに固有の非線形性のために、モデルの出力におけるノイズと比較して、かなり多くの課題を引き起こす。結果として、ノイズの多い入力に対するUQは、物理的な知識を含むアプリケーションにこれらのモデルを確実かつ信頼できる形でデプロイする上で重要な要素となる。そこで本研究では,PINNとNOのノイズの多い入出力から生じる不確かさを定量化するベイズ法を提案する。本手法は,物理情報を符号化するために用いられるPINNやNOにシームレスに統合可能であることを示す。PINNは、損失関数または尤度のいずれかに、自動微分による物理情報項を含めることで物理を取り入れ、時空間座標を入力とすることが多い。そこで,本手法は,観測された座標がノイズを受ける問題に対処する能力をPINNに与える。一方、事前訓練されたNOは、微分方程式やベイズ逆問題を解く際に方程式を必要としない代理モデルとして一般的に用いられ、関数を入力とする。提案手法により,入力関数と出力関数の両方のノイズの多い測定をUQとともに処理できる。

Uncertainty quantification (UQ) in scientific machine learning (SciML) becomes increasingly critical as neural networks (NNs) are being widely adopted in addressing complex problems across various scientific disciplines. Representative SciML models are physics-informed neural networks (PINNs) and neural operators (NOs). While UQ in SciML has been increasingly investigated in recent years, very few works have focused on addressing the uncertainty caused by the noisy inputs, such as spatial-temporal coordinates in PINNs and input functions in NOs. The presence of noise in the inputs of the models can pose significantly more challenges compared to noise in the outputs of the models, primarily due to the inherent nonlinearity of most SciML algorithms. As a result, UQ for noisy inputs becomes a crucial factor for reliable and trustworthy deployment of these models in applications involving physical knowledge. To this end, we introduce a Bayesian approach to quantify uncertainty arising from noisy inputs-outputs in PINNs and NOs. We show that this approach can be seamlessly integrated into PINNs and NOs, when they are employed to encode the physical information. PINNs incorporate physics by including physics-informed terms via automatic differentiation, either in the loss function or the likelihood, and often take as input the spatial-temporal coordinate. Therefore, the present method equips PINNs with the capability to address problems where the observed coordinate is subject to noise. On the other hand, pretrained NOs are also commonly employed as equation-free surrogates in solving differential equations and Bayesian inverse problems, in which they take functions as inputs. The proposed approach enables them to handle noisy measurements for both input and output functions with UQ.

翻訳日:2023-11-21 21:23:45 公開日:2023-11-19

# 視覚言語モデルに対する敵対的プロンプトチューニング

Adversarial Prompt Tuning for Vision-Language Models ( http://arxiv.org/abs/2311.11261v1 )

ライセンス: Link先を確認

Jiaming Zhang, Xingjun Ma, Xin Wang, Lingyu Qiu, Jiaqi Wang, Yu-Gang Jiang, Jitao Sang

(参考訳) マルチモーダル学習の急速な進歩に伴い、CLIPのような事前学習された視覚言語モデル(VLM)は、視覚と言語の間のギャップを埋める上で顕著な能力を示した。しかし、これらのモデルは敵対的攻撃、特に画像モダリティにおける攻撃に弱いままであり、かなりのセキュリティリスクが生じる。本稿では,VLMにおける画像エンコーダの敵対的ロバスト性を高める新しい手法であるAdvPT(Adversarial Prompt Tuning)を提案する。AdvPTは、学習可能なテキストプロンプトを活用し、それを敵対的な画像埋め込みと整合させることで、広範囲なパラメータトレーニングやモデルアーキテクチャの変更を必要とせずに、VLMに固有の脆弱性に対処する。我々は,AdvPTがホワイトボックスおよびブラックボックスの敵対的攻撃に対する耐性を向上し,既存の画像処理ベースの防御技術と組み合わせると相乗効果を示し,防御能力をさらに向上することを示す。総合的な実験分析は、テキスト入力の修正を通じて敵対的画像に対する耐性を改善することに特化した新しいパラダイムである、敵対的プロンプトチューニングに関する洞察を与え、将来の堅牢なマルチモーダル学習研究への道を開く。これらの知見は、VLMのセキュリティを高める新たな可能性を開く。私たちのコードは論文の発行時に公開される予定である。

With the rapid advancement of multimodal learning, pre-trained Vision-Language Models (VLMs) such as CLIP have demonstrated remarkable capacities in bridging the gap between visual and language modalities. However, these models remain vulnerable to adversarial attacks, particularly in the image modality, presenting considerable security risks. This paper introduces Adversarial Prompt Tuning (AdvPT), a novel technique to enhance the adversarial robustness of image encoders in VLMs. AdvPT innovatively leverages learnable text prompts and aligns them with adversarial image embeddings, to address the vulnerabilities inherent in VLMs without the need for extensive parameter training or modification of the model architecture. We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques, further boosting defensive capabilities. Comprehensive experimental analyses provide insights into adversarial prompt tuning, a novel paradigm devoted to improving resistance to adversarial images through textual input modifications, paving the way for future robust multimodal learning research. These findings open up new possibilities for enhancing the security of VLMs. Our code will be available upon publication of the paper.

翻訳日:2023-11-21 21:23:17 公開日:2023-11-19

# Radarize:屋内環境のための大規模レーダーSLAM

Radarize: Large-Scale Radar SLAM for Indoor Environments ( http://arxiv.org/abs/2311.11260v1 )

ライセンス: Link先を確認

Emerson Sie, Xinyu Wu, Heyu Guo, Deepak Vasisht

(参考訳) 我々は、低コストのコモディティ単一チップmmWaveレーダのみを使用する屋内環境のための自己完結型SLAMパイプラインであるRadarizeを提案する。我々のレーダネイティブなアプローチは,ドップラーシフトに基づくオドメトリなど,無線周波数に特有の現象を利用して性能を向上させる。本手法は,4つのキャンパス建物にまたがる146個の軌跡からなる,総走行距離約4680mの大規模データセットを用いて評価した。以上の結果から,本手法は,IMUやホイールオドメトリなどの追加センサを必要とせず,絶対軌道誤差(ATE)で測定して,最先端のレーダベース手法をオドメトリで約5倍,エンドツーエンドのSLAMで約8倍上回ることがわかった。

We present Radarize, a self-contained SLAM pipeline for indoor environments that uses only a low-cost commodity single-chip mmWave radar. Our radar-native approach leverages phenomena unique to radio frequencies, such as Doppler shift-based odometry, to improve performance. We evaluate our method on a large-scale dataset of 146 trajectories spanning 4 campus buildings, totaling approximately 4680 m of travel distance. Our results show that our method outperforms state-of-the-art radar-based approaches by approximately 5x in terms of odometry and 8x in terms of end-to-end SLAM, as measured by absolute trajectory error (ATE), without the need for additional sensors such as IMUs or wheel odometry.
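The abstract names Doppler shift-based odometry but does not describe how Radarize computes it. As a rough illustration only, the sketch below uses the textbook least-squares formulation, which may differ from the authors' method: if every detected point is assumed static, a point seen along unit direction d has range rate r = -d · v, so the sensor velocity v can be recovered from several detections. The function name `estimate_ego_velocity` and the synthetic data are hypothetical.

```python
import numpy as np

def estimate_ego_velocity(directions, radial_velocities):
    """Least-squares sensor velocity from per-point Doppler measurements.

    Assumption (not from the paper): all detected points are static, so a
    point along unit direction d has range rate r = -d . v, where v is the
    sensor velocity expressed in the sensor frame.
    """
    d = np.asarray(directions, dtype=float)
    # Normalise each row so it is a unit direction vector.
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    r = np.asarray(radial_velocities, dtype=float)
    # Solve d @ v = -r in the least-squares sense.
    v, *_ = np.linalg.lstsq(d, -r, rcond=None)
    return v

if __name__ == "__main__":
    # Synthetic 2-D scene: seven static points spread over a half circle.
    angles = np.linspace(0.0, np.pi, 7)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    true_v = np.array([1.0, -0.5])   # hypothetical sensor velocity in m/s
    doppler = -dirs @ true_v         # noise-free range rates
    print(estimate_ego_velocity(dirs, doppler))
```

Integrating such velocity estimates over time would give an odometry track. With real radar returns, moving objects and measurement noise would call for an outlier-rejection step (for example RANSAC) before the least-squares fit.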

翻訳日:2023-11-21 21:22:57 公開日:2023-11-19

# 解釈可能かつ効率的な量子インスパイア機械学習のためのテンソルネットワーク

Tensor networks for interpretable and efficient quantum-inspired machine learning ( http://arxiv.org/abs/2311.11258v1 )

ライセンス: Link先を確認

Shi-Ju Ran and Gang Su

(参考訳) 現在のディープ機械学習(ML)のスキームで高い解釈可能性と効率を同時に獲得することは、重要な課題である。量子力学に由来するよく確立された数学的ツールであるテンソルネットワーク(TN)は、効率的な「ホワイトボックス」MLスキームを開発する上で、その独特な利点を示している。本稿では,TNベースのMLにおける注目すべき進展について概説する。一方で、TN MLの解釈可能性は、量子情報と多体物理学に基づく堅固な理論基盤によって支えられている。他方で、強力なTN表現や量子多体物理学で開発された高度な計算技術から高い効率が得られる。量子コンピュータの急速な発展に伴い、TNは量子ハードウェア上で実行可能な新しいスキームを生み出し、近い将来「量子人工知能」へと向かうことが期待されている。

It is a critical challenge to simultaneously gain high interpretability and efficiency with the current schemes of deep machine learning (ML). Tensor network (TN), which is a well-established mathematical tool originating from quantum mechanics, has shown its unique advantages in developing efficient "white-box" ML schemes. Here, we give a brief review of the inspiring progress made in TN-based ML. On one hand, the interpretability of TN ML is supported by the solid theoretical foundation based on quantum information and many-body physics. On the other hand, high efficiency can be obtained from the powerful TN representations and the advanced computational techniques developed in quantum many-body physics. With the rapid development of quantum computers, TN is expected to yield novel schemes runnable on quantum hardware, heading towards "quantum artificial intelligence" in the near future.

翻訳日:2023-11-21 21:22:39 公開日:2023-11-19

# スクイーズド高次高調波の発生

Generation of squeezed high-order harmonics ( http://arxiv.org/abs/2311.11257v1 )

ライセンス: Link先を確認

Matan Even Tzur, Michael Birk, Alexey Gorlach, Ido Kaminer, Michael Krueger, and Oren Cohen

(参考訳) 何十年もの間、高次高調波発生(HHG)に関するほとんどの研究は、物質を量子的に、光を古典的に扱っており、高調波の量子光学的性質は未解決の問題として残されていた。ここでは高次高調波の量子的性質を探求する。任意の量子光状態によって駆動されるときの高次高調波の量子状態の公式を導出し、実験的に重要な特定の場合を調べる。特に、適度にスクイーズされたポンプの場合、スクイーズドコヒーレント光によって駆動されるHHGは、スクイーズされた高次高調波を生じる。高調波のスクイージングは、イオン化時間をポンプのスクイージング位相と同期させることで最適化される。この領域を超えてポンプのスクイージングを増加させると、高調波はまずスクイーズド熱光子統計を獲得し、その後、相互作用系の半古典的非線形応答関数に強く依存する複雑な量子状態を占める。我々の結果は、スクイーズされた極端紫外超短パルスの生成、より一般的には、従来アクセスできなかったスペクトル範囲への量子周波数変換への道を開き、超高感度アト秒計測を可能にする可能性がある。

For decades, most research on high harmonic generation (HHG) considered matter as quantum but light as classical, leaving the quantum-optical nature of the harmonics an open question. Here we explore the quantum properties of high harmonics. We derive a formula for the quantum state of the high harmonics, when driven by arbitrary quantum light states, and then explore specific cases of experimental relevance. Specifically, for a moderately squeezed pump, HHG driven by squeezed coherent light results in squeezed high harmonics. Harmonic squeezing is optimized by syncing ionization times with the pump's squeezing phase. Beyond this regime, as pump squeezing is increased, the harmonics initially acquire squeezed thermal photon statistics, and then occupy an intricate quantum state which strongly depends on the semi-classical nonlinear response function of the interacting system. Our results pave the way for the generation of squeezed extreme-ultraviolet ultrashort pulses, and, more generally, quantum frequency conversion into previously inaccessible spectral ranges, which may enable ultrasensitive attosecond metrology.

翻訳日:2023-11-21 21:22:25 公開日:2023-11-19

# BOIS: 相互接続システムのベイズ最適化

BOIS: Bayesian Optimization of Interconnected Systems ( http://arxiv.org/abs/2311.11254v1 )

ライセンス: Link先を確認

Leonardo D. González and Victor M. Zavala

(参考訳) ベイズ最適化(BO)は、サンプリングコストの高いシステムのグローバル最適化に有効なパラダイムであることが証明されている。BOの主な利点の1つは、学習と探索のプロセスを導くのに利用できるモデルの不確かさを特徴付けるために、ガウス過程(GP)を使用することである。しかし、BOは通常システムをブラックボックスとして扱うため、構造的知識(物理や疎な相互接続など)を利用する能力は制限される。GPによるモデリングの対象を性能関数$f$から中間関数$y$に移した$f(x, y(x))$という形式の合成関数は、構造的知識を利用するための道筋を提供する。しかし、BOフレームワークにおける合成関数の使用は、GPによって計算される$y$のガウス密度から$f$の確率密度を生成する必要があるために複雑になる(例えば、$f$が非線形である場合、閉形式の式を得ることはできない)。従来の研究ではサンプリング技術を使ってこの問題に対処しており、これは実装が容易で柔軟であるが、計算負荷が高い。本稿では,BOにおける合成関数の効率的な利用を可能にする新しいパラダイムを提案する。このパラダイムでは,$f$の適応的線形化を用いて,合成関数の統計モーメントに対する閉形式の式を得る。この単純なアプローチ(BOISと呼ぶ)により、相互接続されたシステムや、複数のGPモデルを埋め込んだシステム、物理モデルとGPモデルの組み合わせなどに現れる構造的知識の活用が可能になることを示す。化学プロセス最適化のケーススタディを用いて,標準的なBOおよびサンプリング手法に対するBOISの有効性をベンチマークした。その結果,BOISは性能向上を達成し,合成関数の統計を正確に捉えることがわかった。

Bayesian optimization (BO) has proven to be an effective paradigm for the global optimization of expensive-to-sample systems. One of the main advantages of BO is its use of Gaussian processes (GPs) to characterize model uncertainty which can be leveraged to guide the learning and search process. However, BO typically treats systems as black-boxes and this limits the ability to exploit structural knowledge (e.g., physics and sparse interconnections). Composite functions of the form $f(x, y(x))$, wherein GP modeling is shifted from the performance function $f$ to an intermediate function $y$, offer an avenue for exploiting structural knowledge. However, the use of composite functions in a BO framework is complicated by the need to generate a probability density for $f$ from the Gaussian density of $y$ calculated by the GP (e.g., when $f$ is nonlinear it is not possible to obtain a closed-form expression). Previous work has handled this issue using sampling techniques; these are easy to implement and flexible but are computationally intensive. In this work, we introduce a new paradigm which allows for the efficient use of composite functions in BO; this uses adaptive linearizations of $f$ to obtain closed-form expressions for the statistical moments of the composite function. We show that this simple approach (which we call BOIS) enables the exploitation of structural knowledge, such as that arising in interconnected systems as well as systems that embed multiple GP models and combinations of physics and GP models. Using a chemical process optimization case study, we benchmark the effectiveness of BOIS against standard BO and sampling approaches. Our results indicate that BOIS achieves performance gains and accurately captures the statistics of composite functions.
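The idea of using a linearization of $f$ to obtain closed-form moments can be illustrated with a first-order sketch. This is not the authors' adaptive scheme, which the abstract does not detail. If the GP gives $y \sim \mathcal{N}(\mu, \Sigma)$ at a query point and $f$ is linearized around $\mu$, then $\mathbb{E}[f] \approx f(x, \mu)$ and $\mathrm{Cov}[f] \approx J \Sigma J^\top$, where $J = \partial f / \partial y$ evaluated at $\mu$. The helper names, the finite-difference Jacobian, and the toy performance function are all assumptions made for illustration.

```python
import numpy as np

def jacobian_fd(f, x, y, eps=1e-6):
    """Central finite-difference Jacobian of f(x, y) with respect to y."""
    y = np.asarray(y, dtype=float)
    f0 = np.atleast_1d(f(x, y))
    jac = np.zeros((f0.size, y.size))
    for j in range(y.size):
        step = np.zeros_like(y)
        step[j] = eps
        upper = np.atleast_1d(f(x, y + step))
        lower = np.atleast_1d(f(x, y - step))
        jac[:, j] = (upper - lower) / (2.0 * eps)
    return jac

def linearized_moments(f, x, mu_y, cov_y):
    """First-order mean and covariance of f(x, y) for y ~ N(mu_y, cov_y).

    Linearizing f around mu_y gives mean ~ f(x, mu_y) and
    covariance ~ J cov_y J^T, with J the Jacobian of f in y at mu_y.
    """
    mu_y = np.asarray(mu_y, dtype=float)
    jac = jacobian_fd(f, x, mu_y)
    mean = np.atleast_1d(f(x, mu_y))
    cov = jac @ np.asarray(cov_y, dtype=float) @ jac.T
    return mean, cov

if __name__ == "__main__":
    # Hypothetical nonlinear performance function of two intermediate outputs.
    def performance(x, y):
        return x * y[0] ** 2 + np.sin(y[1])

    mu = np.array([1.0, 0.5])       # stand-in for a GP posterior mean of y(x)
    sigma = np.array([[0.04, 0.01],
                      [0.01, 0.09]])  # stand-in for a GP posterior covariance
    mean, cov = linearized_moments(performance, 2.0, mu, sigma)
    print(mean, cov)
```

For a linear $f$ these expressions are exact. For a nonlinear $f$ they are only a local approximation, which is presumably why the paper adapts the linearization point rather than fixing it.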

翻訳日:2023-11-21 21:22:05 公開日:2023-11-19

# 日本のサブメーターレベルの土地被覆マッピング

Submeter-level Land Cover Mapping of Japan ( http://arxiv.org/abs/2311.11252v1 )

ライセンス: Link先を確認

Naoto Yokoya, Junshi Xia, Clifford Broni-Bediako

(参考訳) ディープラーニングは、サブメーターレベルのマッピングタスクにおいて有望なパフォーマンスを示しているが、特に大規模に適用する場合、サブメーターレベルの画像のアノテーションコストは依然として課題である。本稿では,日本初の8クラスのサブメーターレベル土地被覆マッピングを,比較的低いアノテーションコストで提示する。最近導入されたグローバルなサブメーターレベル土地被覆マッピングのベンチマークデータセットであるOpenEarthMapを活用し,少量の追加ラベル付きデータで全国規模のマッピングを実現するU-Netモデルを用いた,human-in-the-loop型のディープラーニングフレームワークを導入する。OpenEarthMapでトレーニングされたU-Netモデルが明らかに失敗した地域のラベル付きデータを少量追加してモデルを再トレーニングすることで、80%の全体精度が達成され、これは再トレーニングによる約16パーセントポイントの改善である。国土地理院(Geospatial Information Authority of Japan)が提供する航空画像を用いて,日本全国の8クラスの土地被覆分類地図を作成する。我々のフレームワークは, 低いアノテーションコストと高精度なマッピング結果により, サブメーターレベルの光学リモートセンシングデータを用いた全国規模の土地被覆マッピングの自動更新に貢献する可能性を示す。マッピング結果は公開される予定である。

Deep learning has shown promising performance in submeter-level mapping tasks; however, the annotation cost of submeter-level imagery remains a challenge, especially when applied on a large scale. In this paper, we present the first submeter-level land cover mapping of Japan with eight classes, at a relatively low annotation cost. We introduce a human-in-the-loop deep learning framework leveraging OpenEarthMap, a recently introduced benchmark dataset for global submeter-level land cover mapping, with a U-Net model that achieves national-scale mapping with a small amount of additional labeled data. By adding a small amount of labeled data of areas or regions where a U-Net model trained on OpenEarthMap clearly failed and retraining the model, an overall accuracy of 80% was achieved, which is a nearly 16 percentage point improvement after retraining. Using aerial imagery provided by the Geospatial Information Authority of Japan, we create land cover classification maps of eight classes for the entire country of Japan. Our framework, with its low annotation cost and high-accuracy mapping results, demonstrates the potential to contribute to the automatic updating of national-scale land cover mapping using submeter-level optical remote sensing data. The mapping results will be made publicly available.

翻訳日:2023-11-21 21:21:34 公開日:2023-11-19

# 非エルミートイジング鎖における非従来型多体相転移

Unconventional many-body phase transitions in a non-Hermitian Ising chain ( http://arxiv.org/abs/2311.11251v1 )

ライセンス: Link先を確認

Chao-Ze Lu, Xiaolong Deng, Su-Peng Kou and Gaoyong Sun

(参考訳) 虚数磁場を伴う1次元強磁性横磁場イジングモデルにおける多体相転移について研究し、この系が1つの2次相転移と2つの$\mathcal{PT}$相転移という3つの相転移を示すことを示す。基底状態で生じる2次相転移は, 双直交および自己正規のエンタングルメントエントロピーを用いて調べ, そのために小さな系で中心電荷を抽出する有限サイズスケーリング理論の手法を開発した。2次相転移と比較して、第1の$\mathcal{PT}$転移は全エネルギースペクトルにおける例外点の出現によって特徴づけられるが、第2の$\mathcal{PT}$転移は特定の励起状態においてのみ発生する。さらに, エネルギーの虚部のスケーリングの観点から, 両方の例外点が2次であることを示す。この研究は、非エルミート系における非従来型の多体相転移の厳密解を与える。

We study many-body phase transitions in a one-dimensional ferromagnetic transverse-field Ising model with an imaginary field and show that the system exhibits three phase transitions: one second-order phase transition and two $\mathcal{PT}$ phase transitions. The second-order phase transition occurring in the ground state is investigated via biorthogonal and self-normal entanglement entropy, for which we develop an approach to perform finite-size scaling theory to extract the central charge for small systems. Compared with the second-order phase transition, the first $\mathcal{PT}$ transition is characterized by the appearance of an exceptional point in the full energy spectrum, while the second $\mathcal{PT}$ transition only occurs in specific excited states. Furthermore, we show that, interestingly, both exceptional points are second-order in terms of the scaling of the imaginary parts of the energy. This work provides an exact solution for unconventional many-body phase transitions in non-Hermitian systems.

翻訳日:2023-11-21 21:20:57 公開日:2023-11-19

# 感情分析に関する総合的レビュー:課題・アプローチ・応用

A Comprehensive Review on Sentiment Analysis: Tasks, Approaches and Applications ( http://arxiv.org/abs/2311.11250v1 )

ライセンス: Link先を確認

Sudhanshu Kumar (1), Partha Pratim Roy (1), Debi Prosad Dogra (2), Byung-Gyu Kim (3) ((1) Department of Computer Science and Engineering, IIT Roorkee, India, (2) School of Electrical Sciences, IIT Bhubaneswar, Odisha, India, (3) Department of IT Engineering, Sookmyung Women's University, Seoul, South Korea)

(参考訳) 感情分析(SA)はテキストマイニングにおける新興分野である。これは、さまざまなソーシャルメディアプラットフォーム上のテキストで表現された意見を計算的に識別し分類するプロセスである。ソーシャルメディアは、製品、サービス、そして最新の市場トレンドに対する顧客の考え方を知る上で重要な役割を果たす。ほとんどの組織は、提供する製品やサービスを改善するために、顧客の反応とフィードバックに依存している。SA(意見マイニング)は様々な分野にとって有望な研究分野であると思われる。インターネット上で構造化および非構造化フォーマットで毎日生成されるビッグデータを分析する上で重要な役割を果たす。本サーベイ論文は,感情を定義し,音声,画像,映像,テキストなど様々な分野における最近の研究・開発について述べる。感情分析の課題と機会についても論じている。キーワード: Sentiment Analysis, Machine Learning, Lexicon-based approach, Deep Learning, Natural Language Processing

Sentiment analysis (SA) is an emerging field in text mining. It is the process of computationally identifying and categorizing opinions expressed in a piece of text over different social media platforms. Social media plays an essential role in knowing the customer mindset towards a product, services, and the latest market trends. Most organizations depend on the customer's response and feedback to upgrade their offered products and services. SA or opinion mining seems to be a promising research area for various domains. It plays a vital role in analyzing big data generated daily in structured and unstructured formats over the internet. This survey paper defines sentiment and its recent research and development in different domains, including voice, images, videos, and text. The challenges and opportunities of sentiment analysis are also discussed in the paper. Keywords: Sentiment Analysis, Machine Learning, Lexicon-based approach, Deep Learning, Natural Language Processing

翻訳日:2023-11-21 21:20:29 公開日:2023-11-19

# IoT侵入検出のためのOpen Set Dandelion Network

Open Set Dandelion Network for IoT Intrusion Detection ( http://arxiv.org/abs/2311.11249v1 )

ライセンス: Link先を確認

Jiashu Wu, Hao Dai, Kenneth B. Kent, Jerome Yen, Chengzhong Xu, Yang Wang

(参考訳) IoTデバイスが広く普及するにつれて、悪意のある侵入から保護することが不可欠である。しかし、IoTのデータ不足は、データ依存度の高い従来の侵入検出手法の適用性を制限している。そこで本稿では,教師なしヘテロジニアスドメイン適応に基づくオープンセット型のOpen-Set Dandelion Network(OSDN)を提案する。OSDNモデルは、知識の豊富なソースネットワーク侵入ドメインからの侵入知識転送を実行し、データの乏しいターゲットIoT侵入ドメインのより正確な侵入検出を容易にする。オープンセット設定の下では、ソースドメインで観測されない新規のターゲットドメイン侵入を検出することもできる。これを実現するために、OSDNモデルは、ソースドメインを、各侵入カテゴリがコンパクトにグループ化され、異なる侵入カテゴリが分離される、すなわち、カテゴリ間分離性とカテゴリ内コンパクト性を同時に重視する、タンポポのような特徴空間に形成する。次に、タンポポベースのターゲットメンバシップ機構がターゲットタンポポを形成する。そして、タンポポ角度分離機構によりカテゴリ間分離性が向上し、タンポポ埋め込みアライメント機構が両タンポポをさらに細かく整列させる。カテゴリ内コンパクト性を促進するために、識別的サンプリングタンポポ機構を用いる。既知の侵入知識と生成された未知の侵入知識の両方を用いて訓練された侵入分類器の支援により、セマンティックタンポポ補正機構は、混同しやすいカテゴリを重視し、カテゴリ間分離性を改善する。全体として、これらの機構はIoT侵入検出のために侵入知識転送を効果的に実行するOSDNモデルを形成する。いくつかの侵入データセットに関する包括的な実験は、OSDNモデルの有効性を検証し、3つの最先端のベースライン手法を16.9%上回った。

As IoT devices become widespread, it is crucial to protect them from malicious intrusions. However, the data scarcity of IoT limits the applicability of traditional intrusion detection methods, which are highly data-dependent. To address this, in this paper we propose the Open-Set Dandelion Network (OSDN) based on unsupervised heterogeneous domain adaptation in an open-set manner. The OSDN model performs intrusion knowledge transfer from the knowledge-rich source network intrusion domain to facilitate more accurate intrusion detection for the data-scarce target IoT intrusion domain. Under the open-set setting, it can also detect newly-emerged target domain intrusions that are not observed in the source domain. To achieve this, the OSDN model forms the source domain into a dandelion-like feature space in which each intrusion category is compactly grouped and different intrusion categories are separated, i.e., simultaneously emphasising inter-category separability and intra-category compactness. The dandelion-based target membership mechanism then forms the target dandelion. Then, the dandelion angular separation mechanism achieves better inter-category separability, and the dandelion embedding alignment mechanism further aligns both dandelions in a finer manner. To promote intra-category compactness, the discriminating sampled dandelion mechanism is used. Assisted by the intrusion classifier trained using both known and generated unknown intrusion knowledge, a semantic dandelion correction mechanism emphasises easily-confused categories and guides better inter-category separability. Holistically, these mechanisms form the OSDN model that effectively performs intrusion knowledge transfer to benefit IoT intrusion detection. Comprehensive experiments on several intrusion datasets verify the effectiveness of the OSDN model, outperforming three state-of-the-art baseline methods by 16.9%.

翻訳日:2023-11-21 21:20:08 公開日:2023-11-19

# AutoStory:最小限の人的労力による多様なストーリーテリング画像の生成

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort ( http://arxiv.org/abs/2311.11243v1 )

ライセンス: Link先を確認

Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen

(参考訳) ストーリービジュアライゼーションは、テキストで記述されたストーリーにマッチする一連の画像を生成することを目的としており、生成した画像は高品質、テキスト記述との整合性、文字のアイデンティティの整合性を満たす必要がある。ストーリービジュアライゼーションの複雑さを考えると、既存のメソッドは、いくつかの特定の文字やシナリオだけを考慮するか、スケッチのようなイメージごとの制御条件をユーザに要求することで、問題を劇的に単純化する。しかし、これらの単純化により、実際のアプリケーションではこれらの手法は無能である。そこで本研究では,人間のインタラクションを最小限に抑えて,多種多様で高品質で一貫したストーリーイメージを効果的に生成できる自動ストーリー可視化システムを提案する。具体的には,大規模言語モデルの理解と計画機能をレイアウト計画に活用し,大規模テキストから画像へのモデルを用いて,レイアウトに基づく高度なストーリーイメージを生成する。境界ボックスなどのスパース制御条件はレイアウト計画に適しているが,スケッチやキーポイントなどの密集制御条件は高品質な画像コンテンツを生成するのに適している。画像の画質を向上させるだけでなく,ユーザインタラクションも簡単かつ直感的に行えるよう,画像生成のための簡単なバウンディングボックスレイアウトをスケッチやキーポイント制御条件に変換する。また,文字画像の収集や描画に要する労力をなくし,多視点に一貫性のある文字画像を生成するための簡易かつ効果的な手法を提案する。

Story visualization aims to generate a series of images that match the story described in texts, and it requires the generated images to satisfy high quality, alignment with the text description, and consistency in character identities. Given the complexity of story visualization, existing methods drastically simplify the problem by considering only a few specific characters and scenarios, or requiring the users to provide per-image control conditions such as sketches. However, these simplifications render these methods incompetent for real applications. To this end, we propose an automated story visualization system that can effectively generate diverse, high-quality, and consistent sets of story images, with minimal human interactions. Specifically, we utilize the comprehension and planning capabilities of large language models for layout planning, and then leverage large-scale text-to-image models to generate sophisticated story images based on the layout. We empirically find that sparse control conditions, such as bounding boxes, are suitable for layout planning, while dense control conditions, e.g., sketches and keypoints, are suitable for generating high-quality image content. To obtain the best of both worlds, we devise a dense condition generation module to transform simple bounding box layouts into sketch or keypoint control conditions for final image generation, which not only improves the image quality but also allows easy and intuitive user interactions. In addition, we propose a simple yet effective method to generate multi-view consistent character images, eliminating the reliance on human labor to collect or draw character images.

翻訳日:2023-11-21 21:19:17 公開日:2023-11-19

# 断熱強電場近似に基づく分子のコヒーレントなポストイオン化ダイナミクス

Coherent postionization dynamics of molecules based on adiabatic strong-field approximation ( http://arxiv.org/abs/2311.11242v1 )

ライセンス: Link先を確認

Shan Xue, Wenli Yang, Ping Li, Yuxuan Zhang, Pengji Ding, Song-Feng Zhao, Hongchuan Du and Anh-Thu Le

(参考訳) 開放系の密度行列法は通常、強レーザー場におけるポストイオン化ダイナミクスを調べるために非コヒーレントなポピュレーション注入を用いる。コヒーレンス注入の存在は長い間議論の対象となっている。この文脈で、断熱強電場近似(ASFA)に基づくコヒーレンス注入モデルを導入する。このモデルは方向性トンネルイオン化によるイオンのコヒーレンスを効果的に予測する。電場強度の増大に伴い、ASFAモデルにより予測されるコヒーレンス度は徐々にSFAモデルから逸脱するが、単純モデルおよび部分波展開モデルの結果と比べるとはるかに穏やかなままである。ポストイオン化分子ダイナミクスに及ぼすコヒーレンス注入の影響をO$_2$とN$_2$で検討した。イオン化誘起の振動コヒーレンスは, N$_2^+$における$X^2 \Sigma _g^+ -B^2 \Sigma _u^+ $の反転分布とO$_2^+$の解離確率を強く増大させることがわかった。逆に、イオン化誘起のビブロニックコヒーレンスは関連する遷移に対して抑制的な効果を持つ。これらの結果から,強電場イオン化後の分子ダイナミクスのシミュレーションにおいて、ビブロニック状態を分解したコヒーレンス注入を含めることの重要性が示された。

Open-system density matrix methods typically employ incoherent population injection to investigate the postionization dynamics in strong laser fields. The presence of coherence injection has long been a subject of debate. In this context, we introduce a coherence injection model based on the adiabatic strong-field approximation (ASFA). This model effectively predicts ionic coherence resulting from directional tunnel ionization. With increasing field strength, the degree of coherence predicted by the ASFA model gradually deviates from that of the SFA model but remains much milder compared to the results of the simple and partial-wave expansion models. The impact of coherence injection on the postionization molecular dynamics is explored in O$_2$ and N$_2$. We find that the ionization-induced vibrational coherence strongly enhances the population inversion of $X^2 \Sigma _g^+ -B^2 \Sigma _u^+ $ in N$_2^+$ and the dissociation probability of O$_2^+$. Conversely, the ionization-induced vibronic coherences have inhibitory effects on the related transitions. These findings reveal the significance of including the vibronic-state-resolved coherence injection in simulating molecular dynamics following strong-field ionization.

翻訳日:2023-11-21 21:18:46 公開日:2023-11-19

# UMAAF:画像の多面的属性による美的特性の解明

UMAAF: Unveiling Aesthetics via Multifarious Attributes of Images ( http://arxiv.org/abs/2311.11306v1 )

ライセンス: Link先を確認

Weijie Li, Yitian Wan, Xingjiao Wu, Junjie Xu, Liang He

(参考訳) スマートフォンやウェブサイトの普及に伴い、画像美的評価(IAA)はますます重要になっている。IAAにおける属性の重要性は広く認識されているが、多くの属性ベースの手法では美的属性の選択と利用について十分に考慮されていない。我々の最初のステップは、画像内(intra)と画像間(inter)の両方の観点から美的属性を取得することである。画像内の観点では,画像の直接的な視覚属性を抽出し,絶対属性を構成する。画像間の観点では、同じシーケンス内の画像間の相対スコア関係をモデル化し、相対属性を形成することに重点を置く。次に,美的評価において画像属性をよりよく活用するために,画像の絶対属性と相対属性の両方をモデル化する統一多属性美的評価フレームワーク(UMAAF)を提案する。絶対属性に対しては,複数の絶対属性知覚モジュールと絶対属性相互作用ネットワークを利用する。絶対属性知覚モジュールは、まずいくつかの絶対属性学習タスクで事前訓練され、その後、対応する絶対属性の特徴を抽出するために使用される。絶対属性相互作用ネットワークは、多様な絶対属性特徴の重みを適応的に学習し、様々な絶対属性の観点からそれらを汎用的な美的特徴と効果的に統合して、美的予測を生成する。画像の相対属性をモデル化するために,画像間の相対的ランク付けと相対的距離の関係をRelative-Relation損失関数で考慮し,UMAAFのロバスト性を高める。さらに、UMAAFはTAD66KとAVAデータセットで最先端のパフォーマンスを実現し、複数の実験で各モジュールの有効性とモデルの人間の好みとの整合を実証した。

With the increasing prevalence of smartphones and websites, Image Aesthetic Assessment (IAA) has become increasingly crucial. While the significance of attributes in IAA is widely recognized, many attribute-based methods lack consideration for the selection and utilization of aesthetic attributes. Our initial step involves the acquisition of aesthetic attributes from both intra- and inter-perspectives. Within the intra-perspective, we extract the direct visual attributes of images, constituting the absolute attribute. In the inter-perspective, our focus lies in modeling the relative score relationships between images within the same sequence, forming the relative attribute. Then, to better utilize image attributes in aesthetic assessment, we propose the Unified Multi-attribute Aesthetic Assessment Framework (UMAAF) to model both absolute and relative attributes of images. For absolute attributes, we leverage multiple absolute-attribute perception modules and an absolute-attribute interacting network. The absolute-attribute perception modules are first pre-trained on several absolute-attribute learning tasks and then used to extract corresponding absolute attribute features. The absolute-attribute interacting network adaptively learns the weight of diverse absolute-attribute features, effectively integrating them with generic aesthetic features from various absolute-attribute perspectives and generating the aesthetic prediction. To model the relative attribute of images, we consider the relative ranking and relative distance relationships between images in a Relative-Relation Loss function, which boosts the robustness of the UMAAF. Furthermore, UMAAF achieves state-of-the-art performance on TAD66K and AVA datasets, and multiple experiments demonstrate the effectiveness of each module and the model's alignment with human preference.
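The abstract says the Relative-Relation Loss considers both relative ranking and relative distance between images, but gives no formula. The sketch below is therefore a hypothetical stand-in rather than the UMAAF loss: it combines a pairwise hinge term that penalizes wrongly ordered pairs with a squared term that penalizes mismatched score gaps. The function name, the `margin` and `distance_weight` parameters, and the equal weighting of pairs are all assumptions.

```python
import numpy as np

def relative_relation_loss(pred, target, margin=0.0, distance_weight=1.0):
    """Toy pairwise loss over a sequence of aesthetic scores.

    Hypothetical illustration, not the paper's definition:
    - ranking term: hinge on pairs whose predicted order contradicts the
      ground-truth order (ties in the target are ignored);
    - distance term: squared error between predicted and true score gaps.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    diff_pred = pred[:, None] - pred[None, :]
    diff_true = target[:, None] - target[None, :]
    order = np.sign(diff_true)
    ranking = np.maximum(0.0, margin - order * diff_pred) * (order != 0)
    distance = (diff_pred - diff_true) ** 2
    # Count each unordered pair (i < j) exactly once.
    pairs = np.triu_indices(len(pred), k=1)
    return float(ranking[pairs].mean() + distance_weight * distance[pairs].mean())

if __name__ == "__main__":
    true_scores = [1.0, 2.0, 3.0]   # made-up scores for three images
    print(relative_relation_loss([1.0, 2.0, 3.0], true_scores))  # perfect order and gaps
    print(relative_relation_loss([3.0, 2.0, 1.0], true_scores))  # fully reversed order
```

A loss of this kind depends only on differences between scores within a sequence, so it is unaffected by a constant offset in the predictions. That is one plausible reading of why a relative term might add robustness on top of absolute score regression.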

翻訳日:2023-11-21 21:11:30 公開日:2023-11-19

# 大きな学習率によって一般化が改善される:しかし、どのくらい大きなことを言っているのか?

Large Learning Rates Improve Generalization: But How Large Are We Talking About? ( http://arxiv.org/abs/2311.11303v1 )

ライセンス: Link先を確認

Ekaterina Lobacheva, Eduard Pockonechnyy, Maxim Kodryan, Dmitry Vetrov

(参考訳) 最良の一般化を得るためにニューラルネットワークの訓練を大きな学習率(LR)で始めることを推奨する最近の研究に着想を得て、この仮説を詳細に検討する。本研究は、その後に小さなLRでの訓練または重み平均化を行う場合に最適な結果をもたらす初期LRの範囲を明らかにする。これらの範囲は、一般に想定されているよりもかなり狭いことがわかった。主な実験は学習率ハイパーパラメータを正確に制御できる簡略化した設定で行い、主要な知見はより実用的な設定で検証した。

Inspired by recent research that recommends starting neural network training with large learning rates (LRs) to achieve the best generalization, we explore this hypothesis in detail. Our study clarifies the initial LR ranges that provide optimal results for subsequent training with a small LR or weight averaging. We find that these ranges are in fact significantly narrower than generally assumed. We conduct our main experiments in a simplified setup that allows precise control of the learning rate hyperparameter and validate our key findings in a more practical setting.

翻訳日:2023-11-21 21:11:04 公開日:2023-11-19
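
The training recipe the abstract above studies (a large initial LR, followed by either a small LR or weight averaging) can be illustrated with a minimal sketch. The function names, the hard switch between phases, and the `switch_fraction` parameter are assumptions made for illustration; the abstract does not specify the authors' exact schedule.

```python
def two_phase_lr(step, total_steps, initial_lr, final_lr, switch_fraction=0.5):
    """Return the large initial LR in the first phase and the small LR afterwards.

    `switch_fraction` (hypothetical) is the share of training spent at the
    initial LR before dropping to `final_lr`.
    """
    if not 0.0 < switch_fraction < 1.0:
        raise ValueError("switch_fraction must lie strictly between 0 and 1")
    switch_step = int(total_steps * switch_fraction)
    return initial_lr if step < switch_step else final_lr


def average_weights(checkpoints):
    """Element-wise mean of several weight vectors (plain weight averaging)."""
    count = len(checkpoints)
    return [sum(values) / count for values in zip(*checkpoints)]
```

Sweeping `initial_lr` over a grid while holding the second phase fixed is one way to probe how narrow the useful range of initial LRs is.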

# 交換型デュアルエンコーダ・デコーダ: セマンティックガイダンスと空間的ローカライゼーションによる変化検出の新戦略

Exchanging Dual Encoder-Decoder: A New Strategy for Change Detection with Semantic Guidance and Spatial Localization ( http://arxiv.org/abs/2311.11302v1 )

ライセンス: Link先を確認

Sijie Zhao, Xueliang Zhang, Pengfeng Xiao, and Guangjun He

(参考訳) 変化検出は地球観測における重要な課題である。近年、ディープラーニングに基づく手法が有望な性能を示し、変化検出に急速に採用されている。しかし、広く使われているマルチエンコーダ・シングルデコーダ(MESD)とデュアルエンコーダ・デコーダ(DED)のアーキテクチャは、依然として変化検出を効果的に扱えていない。前者は特徴レベルの融合において両時間的特徴の干渉という問題があり、後者はクラス内変化検出やマルチビュー建物変化検出には適用できない。これらの問題を解決するために、セマンティックガイダンスと空間的ローカライゼーションを用いた二値変化検出のための、交換型デュアルエンコーダ・デコーダ構造による新しい戦略を提案する。提案戦略は、両時間的特徴を決定レベルで融合することでMESDにおける両時間的特徴の干渉の問題を解決し、両時間的な意味的特徴を用いて変化領域を決定することでDEDが適用できない問題を解決する。この戦略に基づいて二値変化検出モデルを構築し、クラス内変化検出データセット(CDD, SYSU)、シングルビュー建物変化検出データセット(WHU, LEVIR-CD, LEVIR-CD+)、マルチビュー建物変化検出データセット(NJDS)の3つのシナリオ・6つのデータセットで、18の最先端変化検出手法と比較・検証する。実験の結果、本モデルは高い効率で優れた性能を達成し、CDD、SYSU、WHU、LEVIR-CD、LEVIR-CD+、NJDSの各データセットでそれぞれ97.77%、83.07%、94.86%、92.33%、91.39%、74.35%のF1スコアを得て、すべてのベンチマーク手法を上回った。この研究のコードは https://github.com/NJU-LHRS/official-SGSLN で公開される予定である。

Change detection is a critical task in earth observation applications. Recently, deep learning-based methods have shown promising performance and have been quickly adopted in change detection. However, the widely used multiple encoder and single decoder (MESD) as well as dual encoder-decoder (DED) architectures still struggle to handle change detection effectively. The former has problems of bitemporal feature interference in the feature-level fusion, while the latter is inapplicable to intraclass change detection and multiview building change detection. To solve these problems, we propose a new strategy with an exchanging dual encoder-decoder structure for binary change detection with semantic guidance and spatial localization. The proposed strategy solves the problem of bitemporal feature interference in MESD by fusing bitemporal features at the decision level, and the inapplicability of DED by determining changed areas using bitemporal semantic features. We build a binary change detection model based on this strategy, and then validate and compare it with 18 state-of-the-art change detection methods on six datasets in three scenarios, including intraclass change detection datasets (CDD, SYSU), single-view building change detection datasets (WHU, LEVIR-CD, LEVIR-CD+) and a multiview building change detection dataset (NJDS). The experimental results demonstrate that our model achieves superior performance with high efficiency and outperforms all benchmark methods with F1-scores of 97.77%, 83.07%, 94.86%, 92.33%, 91.39%, 74.35% on CDD, SYSU, WHU, LEVIR-CD, LEVIR-CD+, and NJDS datasets, respectively. The code of this work will be available at https://github.com/NJU-LHRS/official-SGSLN.

翻訳日:2023-11-21 21:10:54 公開日:2023-11-19
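
The results above are reported as F1-scores on binary change maps. As a reminder of what that metric measures, here is a minimal sketch that computes F1 from flattened predicted and reference masks. The flat-list input format and the function name are assumptions for illustration; this is not taken from the authors' evaluation code.

```python
def f1_score(predicted, reference):
    """F1 of the 'changed' class (label 1) for two equally long 0/1 sequences."""
    pairs = list(zip(predicted, reference))
    true_pos = sum(1 for p, r in pairs if p == 1 and r == 1)
    false_pos = sum(1 for p, r in pairs if p == 1 and r == 0)
    false_neg = sum(1 for p, r in pairs if p == 0 and r == 1)
    if true_pos == 0:
        # No correctly detected change pixels: F1 is defined as 0 here.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Because F1 ignores true negatives, it is not inflated by the large unchanged background that dominates most change-detection scenes.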

# CHAMP: クラスタ階層の効率的なアノテーションと統合

CHAMP: Efficient Annotation and Consolidation of Cluster Hierarchies ( http://arxiv.org/abs/2311.11301v1 )

ライセンス: Link先を確認

Arie Cattan, Tom Hope, Doug Downey, Roy Bar-Haim, Lilach Eden, Yoav Kantor, Ido Dagan

(参考訳) 様々なNLPタスクは、各ノードがアイテムのクラスタであるノード上の複雑な階層構造を必要とする。例えば、entailmentグラフの生成、階層的なクロスドキュメントのコア参照解決、アノテートイベントとサブイベントの関係などです。このような階層構造の効率的なアノテーションを可能にするため,任意のタイプのテキストに対してクラスタと階層を同時に構築可能なオープンソースツールであるCHAMPをリリースする。このインクリメンタルなアプローチは、一般的なペアワイズアノテーションアプローチに比べてアノテーション時間を大幅に削減するとともに、クラスタや階層レベルでの推移性を維持することを保証する。さらに、CHAMPには統合モードがあり、複数のクラスタ階層アノテーションを簡単に比較でき、不一致を解消できる。

Various NLP tasks require a complex hierarchical structure over nodes, where each node is a cluster of items. Examples include generating entailment graphs, hierarchical cross-document coreference resolution, and annotating event and subevent relations. To enable efficient annotation of such hierarchical structures, we release CHAMP, an open source tool that allows annotators to incrementally construct both clusters and hierarchy simultaneously over any type of text. This incremental approach significantly reduces annotation time compared to the common pairwise annotation approach and also guarantees maintaining transitivity at the cluster and hierarchy levels. Furthermore, CHAMP includes a consolidation mode, where an adjudicator can easily compare multiple cluster hierarchy annotations and resolve disagreements.

翻訳日:2023-11-21 21:10:17 公開日:2023-11-19
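
To make the idea of a hierarchy over clusters concrete, the following is a minimal sketch of such a data structure. It is not CHAMP's implementation; the class and method names are hypothetical. It shows one way transitivity can hold by construction: each cluster stores a single parent link, so ancestor relations are derived by walking those links instead of being annotated pair by pair.

```python
class ClusterHierarchy:
    """Clusters of items arranged in a forest via parent links (illustrative)."""

    def __init__(self):
        self.items = {}   # cluster id -> set of items in that cluster
        self.parent = {}  # cluster id -> parent cluster id, or None for a root

    def _ensure(self, cluster):
        self.items.setdefault(cluster, set())
        self.parent.setdefault(cluster, None)

    def add_item(self, cluster, item):
        """Put an item into a cluster, creating the cluster if needed."""
        self._ensure(cluster)
        self.items[cluster].add(item)

    def ancestors(self, cluster):
        """All ancestors of a cluster, nearest first (transitive closure)."""
        chain = []
        node = self.parent.get(cluster)
        while node is not None:
            chain.append(node)
            node = self.parent.get(node)
        return chain

    def set_parent(self, child, parent):
        """Attach `child` under `parent`, rejecting links that create a cycle."""
        self._ensure(child)
        self._ensure(parent)
        if child == parent or child in self.ancestors(parent):
            raise ValueError("link would create a cycle in the hierarchy")
        self.parent[child] = parent
```

With this representation, attaching cluster `a` under `b` and `b` under `c` makes `c` an ancestor of `a` without any extra annotation step.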

# カテゴリから分類器へ: Web 探索による名前のみの継続的な学習

From Categories to Classifier: Name-Only Continual Learning by Exploring the Web ( http://arxiv.org/abs/2311.11293v1 )

ライセンス: Link先を確認

Ameya Prabhu, Hasan Abed Al Kader Hammoud, Ser-Nam Lim, Bernard Ghanem, Philip H.S. Torr, Adel Bibi

(参考訳) 継続学習(CL)は、大規模な注釈付きデータセットが利用できることを前提とすることが多いが、この前提は実際には非現実的なほど時間とコストがかかる。我々は、時間とコストの制約により手動アノテーションができない、名前のみの継続学習(name-only continual learning)と呼ぶ新しいパラダイムを探求する。このシナリオでは、学習者は注釈付き訓練データに頼らず、カテゴリ名のみを用いて新たなカテゴリの変化に適応する。提案手法は、広大で進化し続けるインターネットを利用して、画像分類のための未精選のWeb教師付き(webly-supervised)データを検索・ダウンロードする。Webデータの信頼性を調べたところ、手動で注釈付けされたデータセットと同等であり、場合によってはそれを上回ることがわかった。さらに、Webを活用することで、生成モデルやLAION-5Bからの画像検索でサポートセットを作成する最先端の名前のみの分類手法を上回るサポートセットを作成でき、最大25%の精度向上を達成できることを示す。さまざまな継続学習の設定に適用した場合、本手法は、手動で注釈付けされたデータセットで訓練したモデルとの性能差が一貫して小さい。また、現実世界のトレンドを捉えるためにWebから作成した、わずか数分で構築できるクラス増分データセットEvoTrendsを提示する。全体として、本論文は、継続学習における手動データラベリングの課題を軽減するために、未精選のWeb教師付きデータを用いることの可能性を強調する。

Continual Learning (CL) often relies on the availability of extensive annotated datasets, an assumption that is unrealistically time-consuming and costly in practice. We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation. In this scenario, learners adapt to new category shifts using only category names without the luxury of annotated training data. Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification. We investigate the reliability of our web data and find them comparable, and in some cases superior, to manually annotated datasets. Additionally, we show that by harnessing the web, we can create support sets that surpass state-of-the-art name-only classification methods that create support sets using generative models or image retrieval from LAION-5B, achieving up to 25% boost in accuracy. When applied across varied continual learning contexts, our method consistently exhibits a small performance gap in comparison to models trained on manually annotated datasets. We present EvoTrends, a class-incremental dataset made from the web to capture real-world trends, created in just minutes. Overall, this paper underscores the potential of using uncurated webly-supervised data to mitigate the challenges associated with manual data labeling in continual learning.

翻訳日:2023-11-21 21:10:07 公開日:2023-11-19

# 映像予測のための空間マスキングによるペアワイズ層注意

Pair-wise Layer Attention with Spatial Masking for Video Prediction ( http://arxiv.org/abs/2311.11289v1 )

ライセンス: Link先を確認

Ping Li, Chenhan Zhang, Zheng Yang, Xianghua Xu, Mingli Song

(参考訳) 映像予測は、過去のフレームを用いて将来のフレームを生成するもので、気象予測や自動運転など多くの応用で大きな可能性を示している。従来の研究は、最終的な高レベルの意味的特徴を、テクスチャの詳細を欠いたまま将来のフレームに復号することが多く、予測品質を低下させていた。そこで我々は、低レベルの視覚的手がかりと高レベルの特徴を結合することで、TranslatorのU字型構造から得られる特徴マップの層ごとの意味的依存性を高めるPair-wise Layer Attention(PLA)モジュールを開発した。これにより、予測フレームのテクスチャの詳細が豊かになる。さらに、既存手法の多くはTranslatorによって時空間ダイナミクスを捉えるが、Encoderの空間的特徴を十分に活用できていない。このことから、事前学習中に符号化特徴の一部をマスクするSpatial Masking(SM)モジュールを設計し、Decoderによる残りの特徴ピクセルの可視性を高める。以上を踏まえ、動きの傾向を反映する時空間ダイナミクスを捉える、映像予測のためのPair-wise Layer Attention with Spatial Masking(PLA-SM)フレームワークを提示する。5つのベンチマークでの広範な実験と厳密なアブレーション研究により、提案手法の利点が示された。コードはGitHubで入手できる。

Video prediction yields future frames by employing the historical frames and has exhibited its great potential in many applications, e.g., meteorological prediction, and autonomous driving. Previous works often decode the ultimate high-level semantic features to future frames without texture details, which deteriorates the prediction quality. Motivated by this, we develop a Pair-wise Layer Attention (PLA) module to enhance the layer-wise semantic dependency of the feature maps derived from the U-shape structure in Translator, by coupling low-level visual cues and high-level features. Hence, the texture details of predicted frames are enriched. Moreover, most existing methods capture the spatiotemporal dynamics by Translator, but fail to sufficiently utilize the spatial features of Encoder. This inspires us to design a Spatial Masking (SM) module to mask partial encoding features during pretraining, which adds the visibility of remaining feature pixels by Decoder. To this end, we present a Pair-wise Layer Attention with Spatial Masking (PLA-SM) framework for video prediction to capture the spatiotemporal dynamics, which reflect the motion trend. Extensive experiments and rigorous ablation studies on five benchmarks demonstrate the advantages of the proposed approach. The code is available at GitHub.

翻訳日:2023-11-21 21:09:42 公開日:2023-11-19

# パレート前線の向こうに何がある? 多目的最適化のための意思決定支援手法の検討

What Lies beyond the Pareto Front? A Survey on Decision-Support Methods for Multi-Objective Optimization ( http://arxiv.org/abs/2311.11288v1 )

ライセンス: Link先を確認

Zuzanna Osika, Jazmin Zatarain Salazar, Diederik M. Roijers, Frans A. Oliehoek and Pradeep K. Murukannaiah

(参考訳) 本稿では,多目的最適化(MOO)アルゴリズムが生み出す解を探索するための意思決定支援手法を統一するレビューを行う。多様な問題を解決するためにMOOを適用するため、MOOアルゴリズムが提供するトレードオフを解析するためのアプローチがフィールドに分散している。本稿では,可視化,解集合のマイニング,不確実性探索,および対話性,説明可能性,倫理といった新たな研究方向を含む,このトピックの進歩の概要について述べる。これらの手法を様々な研究分野から合成し,アプリケーションとは無関係に統一的なアプローチを構築する。本研究の目的は,MOOアルゴリズムの利用に対する研究者や実践者の参入障壁を小さくし,新たな研究指針を提供することである。

We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms. As MOO is applied to solve diverse problems, approaches for analyzing the trade-offs offered by MOO algorithms are scattered across fields. We provide an overview of the advances on this topic, including methods for visualization, mining the solution set, and uncertainty exploration as well as emerging research directions, including interactivity, explainability, and ethics. We synthesize these methods drawing from different fields of research to build a unified approach, independent of the application. Our goals are to reduce the entry barrier for researchers and practitioners on using MOO algorithms and to provide novel research directions.

翻訳日:2023-11-21 21:09:17 公開日:2023-11-19
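
The decision-support methods surveyed above all start from the set of non-dominated solutions that a MOO algorithm returns. As a small illustration of that starting point, the sketch below filters a list of objective vectors down to its Pareto front. It assumes every objective is minimized, and the function names are illustrative.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates (quadratic-time scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among the cost pairs `(1, 5)`, `(2, 2)`, `(3, 3)` and `(5, 1)`, only `(3, 3)` is dropped, because `(2, 2)` is better in both objectives; choosing among the remaining three is the trade-off problem that decision-support methods address.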

# 効率的なロボットマニピュレーションスキル獲得のための触覚アクティブ推論強化学習

Tactile Active Inference Reinforcement Learning for Efficient Robotic Manipulation Skill Acquisition ( http://arxiv.org/abs/2311.11287v1 )

ライセンス: Link先を確認

Zihao Liu, Xing Liu, Yizhai Zhang, Zhengxiong Liu and Panfeng Huang

(参考訳) ロボット操作は、退屈な作業や危険な作業の実行において人間を代替する可能性を秘めている。しかし、現実のオープンワールドでの操作を形式的に記述することが難しく、また既存の学習手法が非効率であるため、制御に基づくアプローチは適していない。そのため、幅広いシナリオに操作を適用することは大きな課題となっている。本研究では、効率的な訓練を目的とした、ロボット操作におけるスキル学習の新しい手法である触覚能動推論強化学習(Tactile-AIRL)を提案する。強化学習(RL)の性能を高めるために、モデルベースの手法と内発的好奇心をRLプロセスに統合する能動推論を導入する。この統合により、アルゴリズムの訓練効率とスパース報酬への適応性が向上する。さらに、視覚ベースの触覚センサを用いて、操作タスクの詳細な知覚を実現する。最後に、モデルベースのアプローチを用いて、自由エネルギー最小化により適切な行動を想像し計画する。シミュレーションの結果、本手法は非把持の物体押しタスクにおいて非常に高い訓練効率を達成することが示された。エージェントは、わずか数回のインタラクションエピソードで密な報酬と疎な報酬の両方のタスクで優れた性能を発揮し、SACベースラインを上回る。さらに、本手法を用いてグリッパーによるねじ締めタスクの物理実験を行い、アルゴリズムの高速な学習能力と実用的応用の可能性を示す。

Robotic manipulation holds the potential to replace humans in the execution of tedious or dangerous tasks. However, control-based approaches are not suitable due to the difficulty of formally describing open-world manipulation in reality, and the inefficiency of existing learning methods. Thus, applying manipulation in a wide range of scenarios presents significant challenges. In this study, we propose a novel method for skill learning in robotic manipulation called Tactile Active Inference Reinforcement Learning (Tactile-AIRL), aimed at achieving efficient training. To enhance the performance of reinforcement learning (RL), we introduce active inference, which integrates model-based techniques and intrinsic curiosity into the RL process. This integration improves the algorithm's training efficiency and adaptability to sparse rewards. Additionally, we utilize a vision-based tactile sensor to provide detailed perception for manipulation tasks. Finally, we employ a model-based approach to imagine and plan appropriate actions through free energy minimization. Simulation results demonstrate that our method achieves significantly higher training efficiency in non-prehensile object-pushing tasks. It enables agents to excel in both dense and sparse reward tasks with just a few interaction episodes, surpassing the SAC baseline. Furthermore, we conduct physical experiments on a gripper screwing task using our method, which showcases the algorithm's rapid learning capability and its potential for practical applications.

翻訳日:2023-11-21 21:09:06 公開日:2023-11-19

# TimeSQL: マルチスケールパッチとスムーズな2次損失による多変量時系列予測の改善

TimeSQL: Improving Multivariate Time Series Forecasting with Multi-Scale Patching and Smooth Quadratic Loss ( http://arxiv.org/abs/2311.11285v1 )

ライセンス: Link先を確認

Site Mo, Haoxin Wang, Bixiong Li, Songhai Fan, Yuankai Wu, Xianggen Liu

(参考訳) 時系列は特殊な系列データであり、等しい時間間隔で収集された実数値確率変数の列である。実世界の多変量時系列はノイズを伴い、複雑な局所的・大域的な時間ダイナミクスを含むため、過去の観測から将来の時系列を予測することは難しい。本研究は、マルチスケールパッチングと滑らかな二次損失(SQL)を活用してこれらの課題に対処する、シンプルで効果的なフレームワークTimeSQLを提案する。マルチスケールパッチングは、時系列を異なる長さスケールの2次元パッチに変換し、時系列における局所性と長期相関の両方の把握を容易にする。SQLは有理二次カーネル(rational quadratic kernel)から導出され、ノイズや外れ値への過学習を避けるために勾配を動的に調整できる。理論的解析により、穏やかな条件下では、SQLを用いたモデルに対するノイズの影響はMSEを用いた場合よりも常に小さいことが示される。この2つのモジュールに基づき、TimeSQLは8つの実世界ベンチマークデータセットで新たな最先端性能を達成する。さらなるアブレーション研究により、TimeSQLの主要モジュールはプラグアンドプレイの手法として、多変量時系列予測の他のモデルの結果も向上させ得ることが示された。

Time series is a special type of sequence data, a sequence of real-valued random variables collected at even intervals of time. The real-world multivariate time series comes with noises and contains complicated local and global temporal dynamics, making it difficult to forecast the future time series given the historical observations. This work proposes a simple and effective framework, coined as TimeSQL, which leverages multi-scale patching and smooth quadratic loss (SQL) to tackle the above challenges. The multi-scale patching transforms the time series into two-dimensional patches with different length scales, facilitating the perception of both locality and long-term correlations in time series. SQL is derived from the rational quadratic kernel and can dynamically adjust the gradients to avoid overfitting to the noises and outliers. Theoretical analysis demonstrates that, under mild conditions, the effect of the noises on the model with SQL is always smaller than that with MSE. Based on the two modules, TimeSQL achieves new state-of-the-art performance on the eight real-world benchmark datasets. Further ablation studies indicate that the key modules in TimeSQL could also enhance the results of other models for multivariate time series forecasting, standing as plug-and-play techniques.

翻訳日:2023-11-21 21:08:42 公開日:2023-11-19
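
The two ingredients named in the abstract above can be illustrated with a minimal sketch. The patching function splits a series into non-overlapping patches at several lengths. The loss shown is a hypothetical bounded loss built as one minus a rational quadratic kernel; the abstract does not give the paper's actual SQL formula, so `rq_kernel_loss`, `alpha` and `scale` are assumptions that only demonstrate the qualitative behaviour (roughly quadratic near zero, saturating for large residuals).

```python
def make_patches(series, patch_len):
    """Split a series into non-overlapping patches, dropping any remainder."""
    count = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(count)]


def multi_scale_patches(series, patch_lens):
    """Map each patch length to the patches obtained at that scale."""
    return {length: make_patches(series, length) for length in patch_lens}


def rq_kernel_loss(residual, alpha=1.0, scale=1.0):
    """Hypothetical robust loss: 1 - k(r), with k a rational quadratic kernel.

    This is NOT the paper's SQL definition. It is bounded by 1, so a single
    large outlier contributes far less than it would under squared error.
    """
    kernel = (1.0 + residual ** 2 / (2.0 * alpha * scale ** 2)) ** (-alpha)
    return 1.0 - kernel
```

Short patches expose local patterns while long patches expose slower trends, which is the intuition behind combining several scales.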

# LucidDreamer: インターバルスコアマッチングによる高忠実度テキスト・ツー・3D生成に向けて

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching ( http://arxiv.org/abs/2311.11284v1 )

ライセンス: Link先を確認

Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, Yingcong Chen

(参考訳) テキストから3Dへの生成(text-to-3D)の最近の進歩は、生成モデルにおける重要なマイルストーンであり、さまざまな現実世界のシナリオで想像力に富んだ3Dアセットを作成する新たな可能性を開いた。こうした進歩は有望であるものの、詳細で高品質な3Dモデルのレンダリングには至らないことが多い。多くの手法がスコア蒸留サンプリング(SDS)に基づいているため、この問題は特に顕著である。本稿では、SDSが3Dモデルに対して一貫性がなく低品質な更新方向をもたらし、過度な平滑化(over-smoothing)を引き起こすという顕著な欠陥を明らかにする。これに対処するため、インターバルスコアマッチング(ISM)と呼ぶ新しい手法を提案する。ISMは決定論的な拡散軌道を用い、区間ベースのスコアマッチングによって過度な平滑化を抑える。さらに、テキストから3Dへの生成パイプラインに3D Gaussian Splattingを組み込む。広範な実験により、本モデルは品質と訓練効率の両面で最先端手法を大きく上回ることが示された。

The recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios. While recent advancements in text-to-3D generation have shown promise, they often fall short in rendering detailed and high-quality 3D models. This problem is especially prevalent as many methods base themselves on Score Distillation Sampling (SDS). This paper identifies a notable deficiency in SDS, that it brings inconsistent and low-quality updating direction for the 3D model, causing the over-smoothing effect. To address this, we propose a novel approach called Interval Score Matching (ISM). ISM employs deterministic diffusing trajectories and utilizes interval-based score matching to counteract over-smoothing. Furthermore, we incorporate 3D Gaussian Splatting into our text-to-3D generation pipeline. Extensive experiments show that our model largely outperforms the state-of-the-art in quality and training efficiency.

翻訳日:2023-11-21 21:08:22 公開日:2023-11-19

# 個々の誤報タグ付けはエコーチャンバーを補強する;集団タグ付けはしない

Individual misinformation tagging reinforces echo chambers; Collective tagging does not ( http://arxiv.org/abs/2311.11282v1 )

ライセンス: Link先を確認

Junsol Kim, Zhao Wang, Haohan Shi, Hsin-Keng Ling, James Evans

(参考訳) オンライン上の誤情報がもたらす不安定化への懸念から、個人やプラットフォームは対応を迫られている。個人は、より健全な情報エコシステムを追求し、自己強化的な意見のエコーチェンバーを崩すために、ファクトチェックによって他者のオンライン上の主張に異議を唱えられるようになった。本研究ではTwitterのデータを用いて、個人による誤情報タグ付けの帰結を示す。タグ付けされた投稿者は、その直前には新しい政治情報を探索し話題の関心を広げていたが、タグ付けされたことで情報バブルへと後退した。こうした意図しない結果は、誤情報モデレーションのための集団的な検証システムによって緩和された。Twitterの新しいプラットフォームであるCommunity Notesでは、誤情報のタグ付けは公開前に他のファクトチェッカーによるピアレビューを受ける。集団的な誤情報タグ付けでは、投稿者が多様な情報消費から後退する可能性は低かった。詳細な比較からは、個人によるタグ付けメッセージと集団によるタグ付けメッセージの間で、毒性、感情、可読性、遅延に違いがあることが示唆される。これらの知見は、個人と集団のモデレーション戦略が、情報消費の多様性と情報エコシステム内の移動性に異なる影響を与えることの証拠となる。

Fears about the destabilizing impact of misinformation online have motivated individuals and platforms to respond. Individuals have become empowered to challenge others' online claims with fact-checks in pursuit of a healthier information ecosystem and to break down echo chambers of self-reinforcing opinion. Using Twitter data, here we show the consequences of individual misinformation tagging: tagged posters had explored novel political information and expanded topical interests immediately prior, but being tagged caused posters to retreat into information bubbles. These unintended consequences were softened by a collective verification system for misinformation moderation. In Twitter's new platform, Community Notes, misinformation tagging was peer-reviewed by other fact-checkers before the exposure. With collective misinformation tagging, posters were less likely to retreat from diverse information consumption. Detailed comparison suggests differences in toxicity, sentiment, readability, and delay in individual versus collective misinformation tagging messages. These findings provide evidence for differential impacts from individual versus collective moderation strategies on the diversity of information consumption and mobility across the information ecosystem.

翻訳日:2023-11-21 21:08:05 公開日:2023-11-19

# 深層強化学習によるマルチタイムスケール制御と通信 -その1: 通信対応車両制御-

Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control ( http://arxiv.org/abs/2311.11281v1 )

ライセンス: Link先を確認

Tong Liu, Lei Lei, Kan Zheng, Xuemin (Sherman) Shen

(参考訳) V2X(Vehicle-to-Everything)通信によって実現されるインテリジェントな意思決定システムは、安全で効率的な自動運転(AD)の実現に不可欠であり、そこでは車両制御と無線リソース割り当て(RRA)という2種類の決定を異なる時間スケールで行う必要がある。RRAと車両制御は相互に影響し合うため、両者の協調設計が必要となる。この2部構成の論文(パートI、パートII)では、隊列制御(PC)をユースケースの例として、深層強化学習(DRL)に基づくマルチタイムスケール制御と通信(MTCC)の協調最適化フレームワークを提案する。本稿(パートI)では、まず問題を、通信対応のDRLベースPCサブ問題と制御対応のDRLベースRRAサブ問題に分解する。次に、RRAポリシーが与えられていると仮定してPCサブ問題に着目し、効率的なPCポリシーを学習するMTCC-PCアルゴリズムを提案する。ランダムな観測遅延の下でのPC性能を向上させるため、PCの状態空間を観測遅延とPCの行動履歴で拡張する。さらに、拡張状態に関する報酬関数を定義して、拡張状態マルコフ決定過程(MDP)を構成する。拡張状態MDPの最適ポリシーは、観測遅延を伴う元のPC問題に対しても最適であることが証明される。通信対応制御に関する既存研究の多くとは異なり、MTCC-PCアルゴリズムは単純な確率的遅延モデルではなく、C-V2X通信のきめ細かな組み込みシミュレーションによって生成された遅延環境で訓練される。最後に、MTCC-PCの性能をベースラインDRLアルゴリズムと比較する実験を行った。

An intelligent decision-making system enabled by Vehicle-to-Everything (V2X) communications is essential to achieve safe and efficient autonomous driving (AD), where two types of decisions have to be made at different timescales, i.e., vehicle control and radio resource allocation (RRA) decisions. The interplay between RRA and vehicle control necessitates their collaborative design. In this two-part paper (Part I and Part II), taking platoon control (PC) as an example use case, we propose a joint optimization framework of multi-timescale control and communications (MTCC) based on Deep Reinforcement Learning (DRL). In this paper (Part I), we first decompose the problem into a communication-aware DRL-based PC sub-problem and a control-aware DRL-based RRA sub-problem. Then, we focus on the PC sub-problem assuming an RRA policy is given, and propose the MTCC-PC algorithm to learn an efficient PC policy. To improve the PC performance under random observation delay, the PC state space is augmented with the observation delay and PC action history. Moreover, the reward function with respect to the augmented state is defined to construct an augmented state Markov Decision Process (MDP). It is proved that the optimal policy for the augmented state MDP is optimal for the original PC problem with observation delay. Different from most existing works on communication-aware control, the MTCC-PC algorithm is trained in a delayed environment generated by the fine-grained embedded simulation of C-V2X communications rather than by a simple stochastic delay model. Finally, experiments are performed to compare the performance of MTCC-PC with those of the baseline DRL algorithms.

翻訳日:2023-11-21 21:07:49 公開日:2023-11-19
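
The state augmentation described in the abstract above (appending the observation delay and recent control actions to a delayed observation) can be sketched as follows. The class name, the fixed-length action buffer, and the tuple layout are assumptions made for illustration; the abstract does not describe the authors' exact state encoding.

```python
from collections import deque


class AugmentedState:
    """Builds (delayed observation, delay, recent actions) tuples (illustrative)."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        # Keep the last `max_delay` actions; start with zeros as placeholders.
        self.actions = deque([0.0] * max_delay, maxlen=max_delay)

    def record_action(self, action):
        """Store the most recent control action, discarding the oldest one."""
        self.actions.append(action)

    def build(self, delayed_observation, delay):
        """Concatenate the stale observation, its delay, and the action history."""
        if not 0 <= delay <= self.max_delay:
            raise ValueError("delay outside the supported range")
        return tuple(delayed_observation) + (delay,) + tuple(self.actions)
```

The intuition is that a stale observation, together with its age and the actions applied since it was measured, carries the information a policy needs to reason about the current state.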

# 深層強化学習によるマルチタイムスケール制御と通信 -その2: 制御対応無線リソース割り当て-

Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part II: Control-Aware Radio Resource Allocation ( http://arxiv.org/abs/2311.11280v1 )

ライセンス: Link先を確認

Lei Lei, Tong Liu, Kan Zheng, Xuemin (Sherman) Shen

(参考訳) この2部構成の論文のパートI(Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control)では、C-V2X(Cellular Vehicle-to-Everything)システムにおけるマルチタイムスケール制御と通信(MTCC)問題を、通信対応の深層強化学習(DRL)ベース隊列制御(PC)サブ問題と、制御対応のDRLベース無線リソース割り当て(RRA)サブ問題に分解した。そこではPCサブ問題に着目し、RRAポリシーが与えられた下で最適なPCポリシーを学習するMTCC-PCアルゴリズムを提案した。本稿(パートII)では、まずPCポリシーが与えられていると仮定してMTCCのRRAサブ問題に着目し、RRAポリシーを学習するMTCC-RRAアルゴリズムを提案する。具体的には、観測遅延によるPC性能の劣化量を定量化するPCアドバンテージ関数を、RRAの報酬関数に組み込む。また、より情報に富んだRRAポリシーを得るため、RRAの状態空間をPCの行動履歴で拡張する。さらに、マルチエージェント問題とスパース報酬問題にそれぞれ効率的に対処するため、報酬シェーピングと、報酬逆伝播付き優先度付き経験再生(RBPER)の手法を用いる。最後に、PCとRRAのポリシーを反復的に同時学習する、サンプル効率と計算効率に優れた訓練手法を提案する。提案するMTCCアルゴリズムの有効性を検証するため、先頭車両の実走行データを用いた実験を行い、MTCCの性能をベースラインDRLアルゴリズムと比較した。

In Part I of this two-part paper (Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control), we decomposed the multi-timescale control and communications (MTCC) problem in Cellular Vehicle-to-Everything (C-V2X) system into a communication-aware Deep Reinforcement Learning (DRL)-based platoon control (PC) sub-problem and a control-aware DRL-based radio resource allocation (RRA) sub-problem. We focused on the PC sub-problem and proposed the MTCC-PC algorithm to learn an optimal PC policy given an RRA policy. In this paper (Part II), we first focus on the RRA sub-problem in MTCC assuming a PC policy is given, and propose the MTCC-RRA algorithm to learn the RRA policy. Specifically, we incorporate the PC advantage function in the RRA reward function, which quantifies the amount of PC performance degradation caused by observation delay. Moreover, we augment the state space of RRA with PC action history for a more well-informed RRA policy. In addition, we utilize reward shaping and reward backpropagation prioritized experience replay (RBPER) techniques to efficiently tackle the multi-agent and sparse reward problems, respectively. Finally, a sample- and computational-efficient training approach is proposed to jointly learn the PC and RRA policies in an iterative process. In order to verify the effectiveness of the proposed MTCC algorithm, we performed experiments using real driving data for the leading vehicle, where the performance of MTCC is compared with those of the baseline DRL algorithms.

翻訳日:2023-11-21 21:07:20 公開日:2023-11-19

# 汎用的ディープフェイク検出のための潜在空間拡張による超越的偽造特異性

Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection ( http://arxiv.org/abs/2311.11278v1 )

ライセンス: Link先を確認

Zhiyuan Yan, Yuhao Luo, Siwei Lyu, Qingshan Liu, Baoyuan Wu

(参考訳) ディープフェイク検出は、訓練データとテストデータの分布が一致しない場合に性能が低下するという、重大な一般化の壁に直面している。広く受け入れられている説明は、これらの検出器が、さまざまな偽造に広く適用できる特徴を学習するのではなく、偽造手法固有のアーティファクトに過剰適合しがちだというものである。この問題に対処するため、我々はLSDA(Latent Space Data Augmentation)と呼ぶ単純で効果的な検出器を提案する。これは、より多様な偽造を含む表現はより一般化可能な決定境界を学習でき、それによって手法固有の特徴への過剰適合を緩和できるはずだ、という発見的な考えに基づいている(図1参照)。この考えに従い、潜在空間において偽造特徴の内部および特徴間の変動を構築・シミュレートすることで、偽造空間を拡大することを提案する。このアプローチは、豊かでドメイン固有の特徴の獲得と、異なる偽造タイプ間のより滑らかな遷移の促進を含み、ドメインギャップを効果的に埋める。最終的に、強化された特徴から蒸留した知識を活用する二値分類器を洗練させ、一般化可能なディープフェイク検出器を目指す。包括的な実験により、提案手法は驚くほど効果的であり、広く使われている複数のベンチマークで最先端の検出器を上回ることが示された。

Deepfake detection faces a critical generalization hurdle, with performance deteriorating when there is a mismatch between the distributions of training and testing data. A widely accepted explanation is the tendency of these detectors to overfit to forgery-specific artifacts, rather than learning features that are widely applicable across various forgeries. To address this issue, we propose a simple yet effective detector called LSDA (Latent Space Data Augmentation), which is based on a heuristic idea: representations with a wider variety of forgeries should be able to learn a more generalizable decision boundary, thereby mitigating the overfitting of method-specific features (see Figure 1). Following this idea, we propose to enlarge the forgery space by constructing and simulating variations within and across forgery features in the latent space. This approach encompasses the acquisition of enriched, domain-specific features and the facilitation of smoother transitions between different forgery types, effectively bridging domain gaps. Our approach culminates in refining a binary classifier that leverages the distilled knowledge from the enhanced features, striving for a generalizable deepfake detector. Comprehensive experiments show that our proposed method is surprisingly effective and transcends state-of-the-art detectors across several widely used benchmarks.

翻訳日:2023-11-21 21:06:44 公開日:2023-11-19

# 迷彩レンズによる大型視覚言語モデルの一般化と幻覚

Generalization and Hallucination of Large Vision-Language Models through a Camouflaged Lens ( http://arxiv.org/abs/2311.11273v1 )

ライセンス: Link先を確認

Lv Tang, Peng-Tao Jiang, Zhihao Shen, Hao Zhang, Jinwei Chen, Bo Li

(参考訳) 大規模視覚言語モデル(LVLM)は近年急速に発展し、注目を集めている。本稿では、LVLMが難易度の高いカモフラージュ物体検出(COD)のシナリオに訓練不要(training-free)で一般化できるかを調べるため、新しいフレームワークであるcamo-perceptive vision-language framework(CPVLF)を提案する。一般化の過程で、LVLM内の幻覚の問題により、カモフラージュされたシーン内の物体を誤って知覚し、事実に反する概念を生成しうることがわかった。さらに、LVLMはカモフラージュされた物体の精密な位置特定のために特別に訓練されていないため、これらの物体を正確に特定する際にある程度の不確実性を示す。そこで、言語と視覚の両面からLVLMのカモフラージュシーンの知覚を強化し、幻覚の問題を軽減するとともに、カモフラージュされた物体を正確に位置特定する能力を向上させる視覚知覚連鎖(chain of visual perception)を提案する。広く使われている3つのCODデータセットでCPVLFの有効性を検証し、実験によりCODタスクにおけるLVLMの可能性を示す。

Large Vision-Language Model (LVLM) has seen burgeoning development and increasing attention recently. In this paper, we propose a novel framework, camo-perceptive vision-language framework (CPVLF), to explore whether LVLM can generalize to the challenging camouflaged object detection (COD) scenario in a training-free manner. During the process of generalization, we find that due to hallucination issues within LVLM, it can erroneously perceive objects in camouflaged scenes, producing counterfactual concepts. Moreover, as LVLM is not specifically trained for the precise localization of camouflaged objects, it exhibits a degree of uncertainty in accurately pinpointing these objects. Therefore, we propose chain of visual perception, which enhances LVLM's perception of camouflaged scenes from both linguistic and visual perspectives, reducing the hallucination issue and improving its capability in accurately locating camouflaged objects. We validate the effectiveness of CPVLF on three widely used COD datasets, and the experiments show the potential of LVLM in the COD task.

翻訳日:2023-11-21 21:06:15 公開日:2023-11-19

# 分散二レベル最適化の通信複雑性について

On the Communication Complexity of Decentralized Bilevel Optimization ( http://arxiv.org/abs/2311.11342v1 )

ライセンス: Link先を確認

Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao

(参考訳) 分散二レベル最適化は、機械学習に幅広い応用があることから、ここ数年盛んに研究されてきた。しかし、既存のアルゴリズムは確率的ハイパー勾配の推定に起因する大きな通信複雑性に悩まされており、実世界のタスクへの適用が制限されている。この問題に対処するため、不均一な設定の下で、各ラウンドの通信コストが小さく、通信ラウンド数も少ない、新しい分散確率的二レベル勾配降下アルゴリズムを開発した。これにより、既存のアルゴリズムよりもはるかに優れた通信複雑性を達成できる。さらに、より困難な分散マルチレベル最適化へとアルゴリズムを拡張する。我々の知る限り、不均一な設定の下でこれらの理論的結果を達成したのは初めてである。最後に、実験結果により本アルゴリズムの有効性が確認された。

Decentralized bilevel optimization has been actively studied in the past few years since it has widespread applications in machine learning. However, existing algorithms suffer from large communication complexity caused by the estimation of the stochastic hypergradient, limiting their application to real-world tasks. To address this issue, we develop a novel decentralized stochastic bilevel gradient descent algorithm under the heterogeneous setting, which enjoys a small communication cost in each round and a small number of communication rounds. As such, it can achieve a much better communication complexity than existing algorithms. Moreover, we extend our algorithm to the more challenging decentralized multi-level optimization. To the best of our knowledge, this is the first time these theoretical results have been achieved under the heterogeneous setting. Finally, the experimental results confirm the efficacy of our algorithm.

翻訳日:2023-11-21 20:58:28 公開日:2023-11-19
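
Communication cost in decentralized optimization is usually counted in rounds of neighbor-to-neighbor averaging. The sketch below shows that generic building block, one gossip (consensus) step with a doubly stochastic mixing matrix. It is not the paper's algorithm and involves no hypergradient estimation; the function name and the example matrix are assumptions chosen for illustration.

```python
def gossip_step(values, mixing):
    """One communication round: each node takes a weighted average of all nodes.

    `mixing[i][j]` is the weight node i gives to node j. Rows and columns are
    assumed to sum to 1 (doubly stochastic), which preserves the global mean.
    """
    size = len(values)
    return [
        sum(mixing[i][j] * values[j] for j in range(size))
        for i in range(size)
    ]


# Example mixing matrix for three fully connected nodes (hypothetical).
EXAMPLE_MIXING = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
```

Each call corresponds to one round in which every node sends its current value to its neighbors, so an algorithm that needs fewer such rounds, or sends less per round, has lower communication complexity.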

# 時系列の自己蒸留表現学習

Self-Distilled Representation Learning for Time Series ( http://arxiv.org/abs/2311.11335v1 )

ライセンス: Link先を確認

Felix Pieper and Konstantin Ditschuneit and Martin Genzel and Alexandra Lindt and Johannes Otterbach

(参考訳) 時系列データに対する自己教師あり学習は、近年、自然言語処理やコンピュータビジョンで発揮されたものと同様の可能性を秘めている。この分野の既存研究の多くは対照学習に焦点を当てているが、我々はdata2vecの自己蒸留フレームワークに基づく、概念的に単純でありながら強力な非対照的アプローチを提案する。本手法の中核は、同じ時系列のマスクされたビューから入力時系列の潜在表現を予測する生徒-教師方式である。この戦略は、対照サンプルペアの設計によって一般に持ち込まれる、モダリティ固有の強い仮定やバイアスを避ける。下流タスクとしての分類と予測において、UCRおよびUEAアーカイブ、ならびにETTおよびElectricityデータセット上で最先端の自己教師あり学習手法と比較し、本アプローチの競争力を実証する。

Self-supervised learning for time-series data holds potential similar to that recently unleashed in Natural Language Processing and Computer Vision. While most existing works in this area focus on contrastive learning, we propose a conceptually simple yet powerful non-contrastive approach, based on the data2vec self-distillation framework. The core of our method is a student-teacher scheme that predicts the latent representation of an input time series from masked views of the same time series. This strategy avoids strong modality-specific assumptions and biases typically introduced by the design of contrastive sample pairs. We demonstrate the competitiveness of our approach for classification and forecasting as downstream tasks, comparing with state-of-the-art self-supervised learning methods on the UCR and UEA archives as well as the ETT and Electricity datasets.

翻訳日:2023-11-21 20:58:19 公開日:2023-11-19

# 動的システムにおける因果スレッドによる変化の説明

Using Causal Threads to Explain Changes in a Dynamic System ( http://arxiv.org/abs/2311.11334v1 )

ライセンス: Link先を確認

Robert B. Allen

(参考訳) 我々はシステムのリッチな意味モデルの開発を探求する。具体的には,これらのシステムにおける状態変化に関する構造的因果説明について考察する。基本的に、プロセスベースの動的知識グラフを開発しています。例えば,雪球地球理論によって提案された地質変化の因果スレッドのモデルを構築した。さらに,説明を行うためのグラフィカルインタフェースの初期プロトタイプについて述べる。大規模言語モデル(llm)のような要約や説明に対する統計的アプローチとは異なり、直接表現のアプローチは直接検査し検証することができる。

We explore developing rich semantic models of systems. Specifically, we consider structured causal explanations about state changes in those systems. Essentially, we are developing process-based dynamic knowledge graphs. As an example, we construct a model of the causal threads for geological changes proposed by the Snowball Earth theory. Further, we describe an early prototype of a graphical interface to present the explanations. Unlike statistical approaches to summarization and explanation such as Large Language Models (LLMs), our approach of direct representation can be inspected and verified directly.

翻訳日:2023-11-21 20:58:05 公開日:2023-11-19

# 金融サービスにおけるポルトガルのFAQ

Portuguese FAQ for Financial Services ( http://arxiv.org/abs/2311.11331v1 )

ライセンス: Link先を確認

Paulo Finardi, Wanderley M. Melo, Edgard D. Medeiros Neto, Alex F. Mansano, Pablo B. Costa, Vinicius F. Caridá

(参考訳) ポルトガル語の金融分野におけるドメイン固有データの不足は、自然言語処理(NLP)アプリケーションの開発を妨げてきた。この制限に対処するため,本研究はデータ拡張技術によって生成された合成データの利用を提唱する。この調査は、ブラジル中央銀行のFAQから得られたデータセットの拡張に焦点を当てており、意味的類似性の異なる複数の技術を使用している。教師ありおよび教師なしタスクは、低・高セマンティック類似性シナリオの両方における拡張データの影響を評価するために行われる。さらに、結果のデータセットはHugging Face Datasetsプラットフォーム上に公開され、アクセシビリティが向上し、NLP研究コミュニティ内での広範なエンゲージメントが促進される。

The scarcity of domain-specific data in the Portuguese financial domain has hindered the development of Natural Language Processing (NLP) applications. To address this limitation, the present study advocates for the utilization of synthetic data generated through data augmentation techniques. The investigation focuses on the augmentation of a dataset sourced from the Central Bank of Brazil FAQ, employing techniques that vary in semantic similarity. Supervised and unsupervised tasks are conducted to evaluate the impact of augmented data on both low and high semantic similarity scenarios. Additionally, the resultant dataset will be publicly disseminated on the Hugging Face Datasets platform, thereby enhancing accessibility and fostering broader engagement within the NLP research community.

翻訳日:2023-11-21 20:57:58 公開日:2023-11-19

# ユニタリ変換とアンシラ状態測定によるマトリックス操作

Matrix manipulations via unitary transformations and ancilla-state measurements ( http://arxiv.org/abs/2311.11329v1 )

ライセンス: Link先を確認

Alexander I. Zenchuk, Wentao Qi, Asutosh Kumar, Junde Wu

(参考訳) 本稿では,マルチキュービットトフォリ型と最も単純な1キュービット演算に基づく内部積,行列加算,行列乗算の計算プロトコルを提案し,アンシラ測定を用いて計算のすべてのゴミを除去する。加算プロトコルの深さ(ランタイム)は$O(1)$であり、他のプロトコルの深さは考慮された行列の次元によって対数的に増加する。

We propose protocols for calculating the inner product, matrix addition and matrix multiplication based on multiqubit Toffoli-type operations and the simplest one-qubit operations, and employ ancilla measurements to remove all garbage produced by the calculations. The depth (runtime) of the addition protocol is $O(1)$, and that of the other protocols increases logarithmically with the dimensionality of the considered matrices.

翻訳日:2023-11-21 20:57:46 公開日:2023-11-19

# LABCAT:主成分整合信頼領域を用いた局所適応ベイズ最適化

LABCAT: Locally adaptive Bayesian optimization using principal component-aligned trust regions ( http://arxiv.org/abs/2311.11328v1 )

ライセンス: Link先を確認

E. Visser, C.E. van Daalen, J.C. Schoeman

(参考訳) ベイズ最適化(BO)は高価なブラックボックス関数を最適化する一般的な方法である。 BOには、より長い最適化実行を伴う計算のスローダウン、非定常あるいは不条件の目的関数に対する適合性の低下、収束特性の低下など、よく文書化された欠点がいくつかある。信頼領域などのローカル戦略をBOに組み込んでこれらの制限を緩和するアルゴリズムがいくつか提案されているが、いずれのアルゴリズムも十分対応していない。そこで本研究では,局所ガウス過程サーロゲートモデルの長さスケールに基づく主成分整合回転と適応再スケーリング戦略を付加することにより,信頼領域に基づくboを拡張したlabcatアルゴリズムを提案する。一連の合成テスト関数とよく知られたCOCOベンチマークソフトウェアを用いて、広範囲にわたる数値実験を行い、LABCATアルゴリズムは最先端BOや他のブラックボックス最適化アルゴリズムよりも優れていることを示した。

Bayesian optimization (BO) is a popular method for optimizing expensive black-box functions. BO has several well-documented shortcomings, including computational slowdown with longer optimization runs, poor suitability for non-stationary or ill-conditioned objective functions, and poor convergence characteristics. Several algorithms have been proposed that incorporate local strategies, such as trust regions, into BO to mitigate these limitations; however, none address all of them satisfactorily. To address these shortcomings, we propose the LABCAT algorithm, which extends trust-region-based BO by adding principal-component-aligned rotation and an adaptive rescaling strategy based on the length-scales of a local Gaussian process surrogate model with automatic relevance determination. Through extensive numerical experiments using a set of synthetic test functions and the well-known COCO benchmarking software, we show that the LABCAT algorithm outperforms several state-of-the-art BO and other black-box optimization algorithms.

翻訳日:2023-11-21 20:57:37 公開日:2023-11-19

# MoVideo:拡散モデルを用いたモーション対応ビデオ生成

MoVideo: Motion-Aware Video Generation with Diffusion Models ( http://arxiv.org/abs/2311.11325v1 )

ライセンス: Link先を確認

Jingyun Liang, Yuchen Fan, Kai Zhang, Radu Timofte, Luc Van Gool, Rakesh Ranjan

(参考訳) 近年,映像生成における拡散モデルの利用は大きな進歩を遂げているが,そのほとんどは画像生成フレームワークの単純な拡張であり,映像と画像の大きな違いであるモーションを明示的に考慮していない。本稿では,映像奥行きと光流の2つの側面から運動を考慮した新しいモーションアウェアビデオ生成(movideo)フレームワークを提案する。前者はフレーム単位の物体距離と空間配置による動きを規制し、後者はフレーム間の対応による動きを記述し、細部を保存し時間的整合性を改善する。より具体的には、テキストプロンプトから生成されるキーフレームを前提として、ビデオ深度と対応する光フローを生成する時空間モジュールを用いた拡散モデルを最初に設計する。そして、奥行き、光フローベースゆがみビデオ、計算された咬合マスクの指導の下で、別の時空間拡散モデルにより潜時空間で映像を生成する。最後に、我々は再び光学フローを使用して異なるフレームを整列し、改良し、潜在空間から画素空間へのより良いビデオデコーディングを行う。実験では、MoVideoはテキスト・トゥ・ビデオと画像・トゥ・ビデオ生成の両方で最先端の結果を達成する。

While recent years have witnessed great progress on using diffusion models for video generation, most of them are simple extensions of image generation frameworks, which fail to explicitly consider one of the key differences between videos and images, i.e., motion. In this paper, we propose a novel motion-aware video generation (MoVideo) framework that takes motion into consideration from two aspects: video depth and optical flow. The former regulates motion by per-frame object distances and spatial layouts, while the latter describes motion by cross-frame correspondences that help in preserving fine details and improving temporal consistency. More specifically, given a key frame that either exists or is generated from text prompts, we first design a diffusion model with spatio-temporal modules to generate the video depth and the corresponding optical flows. Then, the video is generated in the latent space by another spatio-temporal diffusion model under the guidance of depth, optical flow-based warped latent video and the calculated occlusion mask. Lastly, we use optical flows again to align and refine different frames for better video decoding from the latent space to the pixel space. In experiments, MoVideo achieves state-of-the-art results in both text-to-video and image-to-video generation, showing promising prompt consistency, frame consistency and visual quality.

翻訳日:2023-11-21 20:57:18 公開日:2023-11-19

# 処理効果推定のための表現誘発共起バイアスの境界

Bounds on Representation-Induced Confounding Bias for Treatment Effect Estimation ( http://arxiv.org/abs/2311.11321v1 )

ライセンス: Link先を確認

Valentyn Melnychuk, Dennis Frauen, Stefan Feuerriegel

(参考訳) 条件平均処理効果(CATE)推定のための最先端手法は、表現学習を広く活用する。ここでは、(潜在的に制約された)低次元表現による低サンプルCATE推定のばらつきを低減する。しかし、低次元の表現は、観測された共同設立者に関する情報を失う可能性があり、その結果、CATE推定のための表現学習の妥当性が典型的に侵害されるため、バイアスにつながる。本稿では,CATE推定における次元減少(あるいは表現に関する他の制約)から生じる表現誘発共起バイアスの境界を推定する,表現に依存しない新しいフレームワークを提案する。まず、CATEが低次元(制約付き)表現を非識別する条件を理論的に確立する。第二に、我々はCATEを部分的に同定すること、あるいは同等に、表現誘発共役バイアスの下限と上限を推定することを提案する。我々は一連の実験において境界の有効性を示す。まとめると、我々のフレームワークは、CATE推定の有効性が重要である実践において、直接的な関連性を持っている。

State-of-the-art methods for conditional average treatment effect (CATE) estimation make widespread use of representation learning. Here, the idea is to reduce the variance of the low-sample CATE estimation by a (potentially constrained) low-dimensional representation. However, low-dimensional representations can lose information about the observed confounders and thus lead to bias, because of which the validity of representation learning for CATE estimation is typically violated. In this paper, we propose a new, representation-agnostic framework for estimating bounds on the representation-induced confounding bias that comes from dimensionality reduction (or other constraints on the representations) in CATE estimation. First, we establish theoretically under which conditions CATEs are non-identifiable given low-dimensional (constrained) representations. Second, as our remedy, we propose to perform partial identification of CATEs or, equivalently, to estimate lower and upper bounds of the representation-induced confounding bias. We demonstrate the effectiveness of our bounds in a series of experiments. In sum, our framework is of direct relevance in practice where the validity of CATE estimation is of importance.

翻訳日:2023-11-21 20:56:51 公開日:2023-11-19

# GeoSAM: モビリティインフラストラクチャの自動セグメンテーションのためのスパースと濃厚なビジュアルプロンプトを備えた微調整SAM

GeoSAM: Fine-tuning SAM with Sparse and Dense Visual Prompting for Automated Segmentation of Mobility Infrastructure ( http://arxiv.org/abs/2311.11319v1 )

ライセンス: Link先を確認

Rafi Ibn Sultan, Chengyin Li, Hui Zhu, Prashant Khanduri, Marco Brocanelli, Dongxiao Zhu

(参考訳) Segment Anything Model (SAM)は、自然画像のセグメンテーションに適用された際、印象的な性能を示している。しかし、特に道路、歩道、横断歩道などの移動インフラを分割する場合、航空画像や衛星画像のような地理的画像に苦しむ。この劣ったパフォーマンスは、これらのオブジェクトの狭い特徴、それらのテクスチャが周囲に混ざり合うこと、木、建物、車両、歩行者のようなオブジェクトから干渉することに由来する。これらの課題に対処するために,ゼロショット学習からの濃密な視覚的プロンプトと,事前学習したCNNセグメンテーションモデルからの疎密な視覚的プロンプトを用いて微調整戦略を実装する新しいSAMベースのフレームワークであるGeoSAMを提案する。提案したGeoSAMは、道路インフラ、歩行者インフラ、および平均17.65%において、道路と歩行者の両方のインフラを含む移動インフラのセグメント化に基礎モデルを活用するという重要な飛躍の成果として、地理的イメージセグメンテーションの既存のアプローチ、特に20%、14.29%、および17.65%を上回っている。

The Segment Anything Model (SAM) has shown impressive performance when applied to natural image segmentation. However, it struggles with geographical images like aerial and satellite imagery, especially when segmenting mobility infrastructure including roads, sidewalks, and crosswalks. This inferior performance stems from the narrow features of these objects, their textures blending into the surroundings, and interference from objects like trees, buildings, vehicles, and pedestrians - all of which can disorient the model to produce inaccurate segmentation maps. To address these challenges, we propose Geographical SAM (GeoSAM), a novel SAM-based framework that implements a fine-tuning strategy using the dense visual prompt from zero-shot learning, and the sparse visual prompt from a pre-trained CNN segmentation model. The proposed GeoSAM outperforms existing approaches for geographical image segmentation, specifically by 20%, 14.29%, and 17.65% for road infrastructure, pedestrian infrastructure, and on average, respectively, representing a momentous leap in leveraging foundation models to segment mobility infrastructure including both road and pedestrian infrastructure in geographical images.

翻訳日:2023-11-21 20:56:37 公開日:2023-11-19

# ガウス平滑化とガウス微分の離散近似

Discrete approximations of Gaussian smoothing and Gaussian derivatives ( http://arxiv.org/abs/2311.11317v1 )

ライセンス: Link先を確認

Tony Lindeberg

(参考訳) 本稿では, 離散データに適用するためのスケール空間理論におけるガウス平滑化およびガウス微分計算の近似問題に関する深い処理法を考案する。連続的および離散的スケール空間論の以前の公理的処理との密接な関係から、これらのスケール空間演算を明示的離散畳み込みという観点から区別する3つの主要な方法を考える。 (i)ガウス核とガウス微分核をサンプリングする。 (ii)各画素支持領域上にガウス核とガウス微分核を局所的に統合し、 3) ガウス核の離散アナログのスケール空間解析を基礎とし, 空間的スムーズな画像データに小サポート中央差分演算子を適用することにより微分近似を演算する。本研究では,これら3つの主要な離散化手法の特性を理論的・実験的に検討し,その性能を定量的に評価する。その結果、サンプル化されたガウス核と導関数、および統合されたガウス核と導関数は、非常に微細なスケールで非常に低性能であることがわかった。非常に微細なスケールでは、ガウス核の離散的な類似とそれに対応する離散微分近似が大幅に向上する。一方、サンプル化されたガウス核とサンプル化されたガウス微分は、スケールパラメータが十分に大きい場合、グリッド間隔の単位においてスケールパラメータが約1より大きい場合、対応する連続結果の数値的に非常に良い近似をもたらす。

This paper develops an in-depth treatment of the problem of approximating the Gaussian smoothing and Gaussian derivative computations in scale-space theory for application on discrete data. With close connections to previous axiomatic treatments of continuous and discrete scale-space theory, we consider three main ways of discretizing these scale-space operations in terms of explicit discrete convolutions, based on either (i) sampling the Gaussian kernels and the Gaussian derivative kernels, (ii) locally integrating the Gaussian kernels and the Gaussian derivative kernels over each pixel support region, or (iii) basing the scale-space analysis on the discrete analogue of the Gaussian kernel, and then computing derivative approximations by applying small-support central difference operators to the spatially smoothed image data. We study the properties of these three main discretization methods both theoretically and experimentally, and characterize their performance by quantitative measures, including the results they give rise to with respect to the task of scale selection, investigated for four different use cases, and with emphasis on the behaviour at fine scales. The results show that the sampled Gaussian kernels and derivatives as well as the integrated Gaussian kernels and derivatives perform very poorly at very fine scales. At very fine scales, the discrete analogue of the Gaussian kernel with its corresponding discrete derivative approximations performs substantially better. The sampled Gaussian kernel and the sampled Gaussian derivatives do, on the other hand, lead to numerically very good approximations of the corresponding continuous results when the scale parameter is sufficiently large; in the experiments presented in the paper, this holds when the scale parameter is greater than a value of about 1, in units of the grid spacing.
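
To make the three discretizations above concrete, the following sketch computes one-dimensional smoothing kernels for each of them and compares how well they sum to one. It is a minimal illustration, not the paper's implementation. The parameterization by a standard deviation `sigma` (with the discrete analogue evaluated at `sigma**2`), the truncation radius, and all function names are assumptions made for this example. The discrete analogue is taken as `exp(-t) * I_n(t)`, with the modified Bessel function `I_n` evaluated by its power series.

```python
import math


def sampled_gaussian(n, sigma):
    """(i) Continuous Gaussian sampled at the integer grid point n."""
    return (math.exp(-n * n / (2.0 * sigma * sigma))
            / (math.sqrt(2.0 * math.pi) * sigma))


def integrated_gaussian(n, sigma):
    """(ii) Continuous Gaussian integrated over the pixel [n - 1/2, n + 1/2]."""
    scale = sigma * math.sqrt(2.0)
    return 0.5 * (math.erf((n + 0.5) / scale) - math.erf((n - 0.5) / scale))


def bessel_i(n, x, terms=60):
    """Modified Bessel function I_n(x) of integer order via its power series."""
    n = abs(n)
    term = (x / 2.0) ** n / math.factorial(n)
    total = 0.0
    for k in range(terms):
        total += term
        # Ratio between consecutive series terms.
        term *= (x / 2.0) ** 2 / ((k + 1) * (k + 1 + n))
    return total


def discrete_gaussian(n, sigma):
    """(iii) Discrete analogue of the Gaussian kernel, exp(-t) * I_n(t)."""
    t = sigma * sigma
    return math.exp(-t) * bessel_i(n, t)


def kernel(fn, sigma, radius):
    """Evaluate a kernel function on the grid points -radius..radius."""
    return [fn(n, sigma) for n in range(-radius, radius + 1)]


if __name__ == "__main__":
    for sigma in (0.3, 0.5, 2.0):
        sums = {fn.__name__: sum(kernel(fn, sigma, 20))
                for fn in (sampled_gaussian, integrated_gaussian,
                           discrete_gaussian)}
        print(sigma, sums)
```

At a fine scale such as `sigma = 0.3` the sampled kernel sums to clearly more than one, whereas the integrated kernel and the discrete analogue stay normalized. This only checks normalization, which is one of several properties the paper examines.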

翻訳日:2023-11-21 20:56:12 公開日:2023-11-19

# TPTU-v2: リアルタイムシステムにおける大規模言語モデルベースエージェントのタスク計画とツール利用の促進

TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems ( http://arxiv.org/abs/2311.11315v1 )

ライセンス: Link先を確認

Yilun Kong, Jingqing Ruan, Yihong Chen, Bin Zhang, Tianpeng Bao, Shiwei Shi, Guoqing Du, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao

(参考訳) 大規模言語モデル(LLM)は、タスク計画とAPIなどの外部ツールの利用の組み合わせを必要とするタスクに対処する能力を示している。しかし、実世界の複雑なシステムには、タスク計画とツール利用に関して3つの一般的な課題がある。(1)実システムは通常膨大な数のAPIを持つため、トークン長の制限により、すべてのAPIの記述をLLMのプロンプトに与えることは不可能である。(2)実システムは複雑なタスクを処理するために設計されており、ベースLLMではそのようなタスクに対して正しいサブタスク順序とAPI呼び出し順序を計画することが困難である。(3)実システムのAPI間で意味や機能が類似しているため、LLMだけでなく人間にとってもそれらを区別することが難しい。そこで本稿では,実世界のシステムで動作するLLMベースのエージェントのタスク計画とツール利用(TPTU)能力の向上を目的とした包括的フレームワークを提案する。このフレームワークは,(1) API Retrieverが利用可能な広範囲な配列の中で,ユーザタスクに関連するAPIを選択する,(2) LLM FinetunerがベースLLMをチューニングしてタスク計画やAPI呼び出しに役立てる,(3) Demo Selectorは,区別の難しいAPIに関連するさまざまなデモを適応的に検索する,という3つの重要なコンポーネントで構成されている。実世界の商用システムとオープンソースの学術データセットを用いて,本手法の有効性を検証し,各コンポーネントの有効性と統合フレームワークの有効性を明らかにした。

Large Language Models (LLMs) have demonstrated proficiency in addressing tasks that require a combination of task planning and the use of external tools, such as APIs. However, real-world complex systems present three prevalent challenges concerning task planning and tool usage: (1) the real system usually has a vast array of APIs, so it is impossible to feed the descriptions of all APIs into the prompt of LLMs because the token length is limited; (2) the real system is designed for handling complex tasks, and the base LLMs can hardly plan a correct sub-task order and API-calling order for such tasks; (3) similar semantics and functionalities among APIs in real systems make it difficult for both LLMs and humans to distinguish between them. In response, this paper introduces a comprehensive framework aimed at enhancing the Task Planning and Tool Usage (TPTU) abilities of LLM-based agents operating within real-world systems. Our framework comprises three key components designed to address these challenges: (1) the API Retriever selects the most pertinent APIs for the user task among the extensive array available; (2) the LLM Finetuner tunes a base LLM so that the finetuned LLM is more capable of task planning and API calling; (3) the Demo Selector adaptively retrieves different demonstrations related to hard-to-distinguish APIs, which are further used for in-context learning to boost the final performance. We validate our methods using a real-world commercial system as well as an open-sourced academic dataset, and the outcomes clearly showcase the efficacy of each individual component as well as the integrated framework.
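
The role of the API Retriever, selecting a few relevant APIs so that only their descriptions need to fit in the prompt, can be sketched with a toy ranking function. This is not the paper's retriever: it substitutes a simple Jaccard token overlap for whatever learned retrieval model the authors use, and the API names and descriptions below are invented for illustration.

```python
def tokenize(text):
    """Lower-case whitespace tokenization into a set of tokens."""
    return set(text.lower().split())


def retrieve_apis(task, api_descriptions, top_k=2):
    """Rank APIs by Jaccard overlap between task and description tokens.

    api_descriptions maps a (hypothetical) API name to its description.
    Ties are broken alphabetically by API name for determinism.
    """
    task_tokens = tokenize(task)
    scored = []
    for name, desc in api_descriptions.items():
        desc_tokens = tokenize(desc)
        union = task_tokens | desc_tokens
        score = len(task_tokens & desc_tokens) / len(union) if union else 0.0
        scored.append((score, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical API catalogue, used only to exercise the function.
    apis = {
        "get_weather": "return the weather forecast for a city",
        "send_email": "send an email message to a recipient",
        "list_files": "list files in a directory",
    }
    print(retrieve_apis("what is the weather forecast for paris", apis, top_k=1))
```

Only the descriptions of the returned APIs would then be placed in the LLM prompt, which is the point of the retrieval step when the full catalogue exceeds the token limit.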

翻訳日:2023-11-21 20:55:47 公開日:2023-11-19

# ケラー非線形性のt-dmrgシミュレーション : 非ガウス力学の初期状態依存性の解析

t-DMRG Simulation of Kerr Nonlinearity; Analyzing Initial State Dependency of non-Gaussian Dynamics ( http://arxiv.org/abs/2311.11314v1 )

ライセンス: Link先を確認

Souvik Agasti

(参考訳) 時間発展ブロックデシメーション (tebd) アルゴリズムを用いて, コヒーレント駆動自由散逸カー非線形系を数値的にシミュレートし, 古典的ビスタブルとダイナミクスがどのように類似しているかを検証した。 2つのコヒーレント分岐の重ね合わせは非古典的時間ダイナミクスをもたらす。ウィグナー状態の表現は、システムが異なる軌道を通じて進化し、異なる外部ドライブと初期条件のために異なる分岐を安定化し、進化を通じて非ゲージ化をもたらすことを確認している。さらに,進化が初期状態の残留的な影響を被っていることも確認した。

We numerically simulate a coherently driven free dissipative Kerr nonlinear system, starting from different initial states, using the time-evolving block decimation (TEBD) algorithm to see how the dynamics are analogous to classical bistability. The superposition of two coherent branches results in non-classical time dynamics. The Wigner state representation confirms that the system evolves through different trajectories to stabilize different branches for different external drives and initial conditions, resulting in de-Gaussification throughout the evolution. Furthermore, we also see that the evolution suffers a residual effect of the initial state.

翻訳日:2023-11-21 20:55:18 公開日:2023-11-19

# 量子誤り訂正プログラムの記号的実行

Symbolic Execution for Quantum Error Correction Programs ( http://arxiv.org/abs/2311.11313v1 )

ライセンス: Link先を確認

Wang Fang, Mingsheng Ying

(参考訳) 我々は,量子プログラムのためのシンボリック実行フレームワークqseを定義し,記号変数を量子状態と量子測定結果に統合する。 QSEの音響定理が証明される。さらに,量子誤差補正プログラムの効率的な解析を容易にするシンボリック安定化状態を導入する。 QSEフレームワーク内では、シンボリック表現を用いて量子誤り訂正の可能な逆誤差を特徴付けることができ、シミュレータによるサンプリングに依存する既存の手法よりも大幅に改善される。我々はQuantumSE.jlというプロトタイプツールでシンボル安定化状態をサポートするQSEを実装した。量子反復符号、北エフのトーリック符号、量子タナー符号を含む代表量子誤り訂正符号の実験により、1000量子ビットを超える量子誤り訂正プログラムをデバッグするためのQuantumSE.jlの効率を実証する。さらに、QSEの副産物として、QuantumSE.jlの安定化回路のサンプリング機能は、実験において最先端の安定化シミュレータであるGoogleのStimよりも優れている。

We define a symbolic execution framework QSE for quantum programs by integrating symbolic variables into quantum states and the outcomes of quantum measurements. The soundness theorem of QSE is proved. We further introduce symbolic stabilizer states, which facilitate the efficient analysis of quantum error correction programs. Within the QSE framework, we can use symbolic expressions to characterize the possible adversarial errors in quantum error correction, providing a significant improvement over existing methods that rely on sampling with simulators. We implement QSE with the support of symbolic stabilizer states in a prototype tool named QuantumSE.jl. With experiments on representative quantum error correction codes, including quantum repetition codes, Kitaev's toric codes, and quantum Tanner codes, we demonstrate the efficiency of QuantumSE.jl for debugging quantum error correction programs with over 1000 qubits. In addition, as a by-product of QSE, QuantumSE.jl's sampling functionality for stabilizer circuits also outperforms the state-of-the-art stabilizer simulator, Google's Stim, in the experiments.

翻訳日:2023-11-21 20:55:04 公開日:2023-11-19

# マルチモーダル相互作用とプール注意によるrgb-d意味セグメンテーションの最適化

Optimizing rgb-d semantic segmentation through multi-modal interaction and pooling attention ( http://arxiv.org/abs/2311.11312v1 )

ライセンス: Link先を確認

Shuai Zhang, Minghong Xie

(参考訳) RGB-D画像のセマンティックセグメンテーションは、シーン内の物体の外観や空間的関係を理解し、様々な要因を慎重に検討する必要がある。しかし、屋内環境では、RGBと深度画像の単純な入力は、しばしば意味情報と空間情報の比較的限られた取得をもたらし、最適下分割の結果をもたらす。そこで本研究では,rgbと奥行きモダリティの対話的な相乗効果を活かし,補完的情報の利用を最適化する新しい手法であるmipanetを提案する。具体的には,Multi-modal Interaction Fusion Module (MIM) をネットワークの最も深い層に組み込む。このモジュールはRGBと深度情報の融合を容易にするために設計されており、相互強化と修正が可能である。さらに,エンコーダの様々な段階において,Pooling Attention Module (PAM)を導入する。このモジュールは、ネットワークによって抽出された機能を増幅し、モジュールの出力をターゲットとしてデコーダに統合し、セマンティックセグメンテーションのパフォーマンスを大幅に改善する。実験の結果、MIPANetは2つの屋内シーンデータセットであるNYUDv2とSUN-RGBDの既存手法よりも優れており、RGB-Dセマンティックセマンティックセマンティックセマンティクスの強化の有効性が示されている。

Semantic segmentation of RGB-D images involves understanding the appearance and spatial relationships of objects within a scene, which requires careful consideration of various factors. However, in indoor environments, the simple input of RGB and depth images often results in a relatively limited acquisition of semantic and spatial information, leading to suboptimal segmentation outcomes. To address this, we propose the Multi-modal Interaction and Pooling Attention Network (MIPANet), a novel approach designed to harness the interactive synergy between RGB and depth modalities, optimizing the utilization of complementary information. Specifically, we incorporate a Multi-modal Interaction Fusion Module (MIM) into the deepest layers of the network. This module is engineered to facilitate the fusion of RGB and depth information, allowing for mutual enhancement and correction. Additionally, we introduce a Pooling Attention Module (PAM) at various stages of the encoder. This module serves to amplify the features extracted by the network and integrates the module's output into the decoder in a targeted manner, significantly improving semantic segmentation performance. Our experimental results demonstrate that MIPANet outperforms existing methods on two indoor scene datasets, NYUDv2 and SUN-RGBD, underscoring its effectiveness in enhancing RGB-D semantic segmentation.

翻訳日:2023-11-21 20:54:44 公開日:2023-11-19

# 異方性中心スピンモデルによるスピンスクイーズ

Spin squeezing generated by anisotropic central spin model ( http://arxiv.org/abs/2311.11308v1 )

ライセンス: Link先を確認

Lei Shao and Libin Fu

(参考訳) スピンスクイージングは、重要な量子資源として、量子力学において重要な役割を担い、高精度なパラメータ推定スキームを実現できる。ここでは,異方性中心スピン系におけるスピンスクイージングと量子相転移について検討する。このような中心スピン系は、中心スピンとスピン浴の間の遷移周波数の比が無限大に向かう限界において、異方性リプキン-メシュコフ-グリック模型にマッピングできる。この性質は1軸のねじれ相互作用を誘発し、スピンスクイーズを生成する新しい可能性を与える。我々は、基底状態と中心スピンモデルの動的進化を通してスピンスクイーズ状態を生成することを検討する。その結果, スピンスクイーズパラメータは異方性パラメータが減少するにつれて向上し, その値はシステムサイズで$N^{-2/3}$となることがわかった。さらに, 臨界点周辺の量子フィッシャー情報の臨界指数を数値シミュレーションにより求め, この値は周波数比として4/3ドルの値となり, システムサイズが無限大になる傾向がみられた。この研究はスピンスクイーズ状態を生成するための有望なスキームを提供し、量子センシングの潜在的な進歩の道を開く。

Spin squeezing, as a crucial quantum resource, plays a pivotal role in quantum metrology, enabling us to achieve high-precision parameter estimation schemes. Here we investigate spin squeezing and the quantum phase transition in anisotropic central spin systems. We find that this kind of central spin system can be mapped to the anisotropic Lipkin-Meshkov-Glick model in the limit where the ratio of transition frequencies between the central spin and the spin bath tends towards infinity. This property can induce a one-axis twisting interaction and provides a new possibility for generating spin squeezing. We consider generating spin-squeezed states via the ground state and the dynamic evolution of the central spin model. The results show that the spin squeezing parameter improves as the anisotropy parameter decreases, and its value scales with system size as $N^{-2/3}$. Furthermore, we obtain the critical exponent of the quantum Fisher information around the critical point by numerical simulation, and find that this value tends to $4/3$ as the frequency ratio and the system size approach infinity. This work offers a promising scheme for generating spin-squeezed states and paves the way for potential advancements in quantum sensing.

翻訳日:2023-11-21 20:54:20 公開日:2023-11-19

# 直交専門家の混合によるマルチタスク強化学習

Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts ( http://arxiv.org/abs/2311.11385v1 )

ライセンス: Link先を確認

Ahmed Hendawy, Jan Peters, Carlo D'Eramo

(参考訳) マルチタスク強化学習(mtrl)は、様々な問題を一般化するスキルを持つ内在エージェントの長年の問題に取り組む。この目的のために、表現の共有は、タスクのユニークな特徴と共通の特徴の両方をキャプチャする上で、基本的な役割を果たす。タスクは、スキル、オブジェクト、または物理的特性の点で類似性を示し、それらの表現を活用すれば、普遍的なポリシーの達成が容易になる。それでも、多様な表現の共有セットを学ぶことの追求は、いまだに未解決の課題である。本稿では,直交表現を用いてタスク間の共通構造をカプセル化して多様性を促進するMTRLにおける表現学習手法を提案する。我々の手法はMixture Of Orthogonal Experts (MOORE) と呼ばれ、Gram-Schmidtプロセスを利用して、専門家の混合によって生成された表現の共有部分空間を形成する。タスク固有の情報が提供されると、MOOREはこの共有部分空間から関連する表現を生成する。提案手法の有効性をMiniGridとMetaWorldという2つのMTRLベンチマークで評価し,MOOREが関連するベースラインを超越し,MetaWorldにおける新たな最先端結果を確立することを示す。

Multi-Task Reinforcement Learning (MTRL) tackles the long-standing problem of endowing agents with skills that generalize across a variety of problems. To this end, sharing representations plays a fundamental role in capturing both unique and common characteristics of the tasks. Tasks may exhibit similarities in terms of skills, objects, or physical properties, and leveraging their representations eases the achievement of a universal policy. Nevertheless, the pursuit of learning a shared set of diverse representations is still an open challenge. In this paper, we introduce a novel approach for representation learning in MTRL that encapsulates common structures among the tasks using orthogonal representations to promote diversity. Our method, named Mixture Of Orthogonal Experts (MOORE), leverages a Gram-Schmidt process to shape a shared subspace of representations generated by a mixture of experts. When task-specific information is provided, MOORE generates relevant representations from this shared subspace. We assess the effectiveness of our approach on two MTRL benchmarks, namely MiniGrid and MetaWorld, showing that MOORE surpasses related baselines and establishes a new state-of-the-art result on MetaWorld.
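
The Gram-Schmidt step mentioned above can be sketched as follows: the outputs of several experts are orthonormalized, and a task-specific weight vector then combines the resulting basis into one representation. This is a simplified illustration under assumed names (`gram_schmidt`, `combine`, the toy expert vectors and weights), not MOORE's actual architecture, in which the experts and the task weights are learned networks.

```python
import math


def gram_schmidt(vectors, eps=1e-12):
    """Orthonormalize a list of vectors (modified Gram-Schmidt).

    Vectors that are (numerically) linearly dependent on earlier ones
    are dropped, so the result may be shorter than the input.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis


def combine(basis, weights):
    """Weighted sum of the orthonormal expert representations."""
    dim = len(basis[0])
    return [sum(w * b[i] for w, b in zip(weights, basis)) for i in range(dim)]


if __name__ == "__main__":
    # Toy expert outputs for a single state (illustrative values only).
    experts = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    basis = gram_schmidt(experts)
    # Hypothetical task-specific mixing weights.
    print(combine(basis, [0.7, 0.2, 0.1]))
```

Because the basis vectors are mutually orthogonal, each expert contributes a direction the others do not, which is the sense in which orthogonality promotes diversity in the shared subspace.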

翻訳日:2023-11-21 20:46:49 公開日:2023-11-19

# mriにおける拡散確率モデルの新しい応用に関する調査

A Survey of Emerging Applications of Diffusion Probabilistic Models in MRI ( http://arxiv.org/abs/2311.11383v1 )

ライセンス: Link先を確認

Yuheng Fan, Hanxi Liao, Shiqi Huang, Yimin Luo, Huazhu Fu, Haikun Qi

(参考訳) 拡散確率モデル (DPM) は, 明らかな可能性評価とデータ合成のための段階的なサンプリングプロセスを用いて, 研究の関心が高まっている。サンプリング中の多くのステップによる計算負荷にもかかわらず、DPMは様々な医療画像タスクにおいて、その高品質で多様な世代に対して広く評価されている。磁気共鳴イメージング(mri)は、軟組織コントラストと超b空間分解能に優れ、拡散モデルに特有の機会を持つ重要な医用イメージングモードである。 MRIでDPMを探索する研究が近年増えているが、MRIアプリケーション用に特別に設計されたDPMの調査論文はいまだに不足している。この記事では、MRIコミュニティの研究者が異なるアプリケーションにおけるDPMの進歩を把握できるようにすることを目的としている。まず,拡散時間ステップが離散的か連続的かに応じて分類された2つの支配的なDPMの理論を紹介し,画像生成,画像翻訳,セグメンテーション,異常検出,その他の研究トピックを含むMRIにおける新たなDPMの総合的なレビューを行う。最後に、DPMのMRIタスクに特有の制限だけでなく、一般的な制限についても論じ、さらに探究する価値のある潜在的な領域を指摘する。

Diffusion probabilistic models (DPMs), which employ explicit likelihood characterization and a gradual sampling process to synthesize data, have gained increasing research interest. Despite their huge computational burden due to the large number of steps involved during sampling, DPMs are widely appreciated in various medical imaging tasks for the high quality and diversity of their generation. Magnetic resonance imaging (MRI) is an important medical imaging modality with excellent soft tissue contrast and superb spatial resolution, which offers unique opportunities for diffusion models. Although there is a recent surge of studies exploring DPMs in MRI, a survey paper of DPMs specifically designed for MRI applications is still lacking. This review article aims to help researchers in the MRI community to grasp the advances of DPMs in different applications. We first introduce the theory of two dominant kinds of DPMs, categorized according to whether the diffusion time step is discrete or continuous, and then provide a comprehensive review of emerging DPMs in MRI, including reconstruction, image generation, image translation, segmentation, anomaly detection, and further research topics. Finally, we discuss the general limitations of DPMs as well as limitations specific to MRI tasks, and point out potential areas that are worth further exploration.

翻訳日:2023-11-21 20:46:27 公開日:2023-11-19

# 統計情報を付加した変圧器モデルの説明可能性の検討

Inspecting Explainability of Transformer Models with Additional Statistical Information ( http://arxiv.org/abs/2311.11378v1 )

ライセンス: Link先を確認

Hoang C. Nguyen, Haeil Lee, Junmo Kim

(参考訳) 近年、視覚領域ではトランスフォーマーがより普及しているため、それを視覚化することでトランスフォーマーモデルを効果的に解釈する方法を見つける必要がある。最近の研究でcheferらは、各イメージパッチの重要性を示すために注意層を組み合わせることで、視覚とマルチモーダルタスクのトランスフォーマーを効果的に可視化できる。しかし、Swin Transformerのような他の変種のTransformerに適用する場合、この方法は予測対象に集中できない。本手法は,層正規化層におけるトークンの統計を考慮し,スウィントランスとvitの解釈可能性を示す。

Transformers have become more popular in the vision domain in recent years, so there is a need for an effective way to interpret Transformer models by visualizing them. In recent work, Chefer et al. visualize the Transformer on vision and multi-modal tasks effectively by combining attention layers to show the importance of each image patch. However, when applied to other variants of the Transformer such as the Swin Transformer, this method cannot focus on the predicted object. Our method, by considering the statistics of tokens in layer normalization layers, shows a great ability to interpret the Swin Transformer and ViT.

翻訳日:2023-11-21 20:46:04 公開日:2023-11-19

# ML-LMCL:音声言語理解におけるASRロバスト性向上のための相互学習と大規模コントラスト学習

ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding ( http://arxiv.org/abs/2311.11375v1 )

ライセンス: Link先を確認

Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou

(参考訳) 音声言語理解(SLU)はタスク指向対話システムの基本課題である。しかしながら、自動音声認識(ASR)による避けられない誤りは、通常、理解性能を損ね、エラーの伝播につながる。コントラスト学習によってこの問題に対処しようとする試みはいくつかあるが,(1)手書き文字とASR文字の書き起こしは微調整で等しく扱うこと,(2)コントラスト学習を適用する際に意味論的に類似したペアがまだ追い出されているという事実を無視すること,(3)KL(Kulback-Leibler)という問題に悩まされる。本稿では,sluにおけるasrロバスト性向上のための新しい枠組みである,相互学習と大規模比較学習(ml-lmcl)を提案する。具体的には、相互学習に適用し、2つのSLUモデルを手書き文字とASR文字で訓練し、これら2つのモデルの知識を反復的に共有することを目的としている。また,クラスタ内ペアを可能な限り排除しないように,距離偏光正規化器を導入する。さらに,klの消失を緩和するために周期的アニーリングスケジュールを用いる。 3つのデータセットの実験では、ML-LMCLは既存のモデルより優れ、新しい最先端のパフォーマンスを実現する。

Spoken language understanding (SLU) is a fundamental task in task-oriented dialogue systems. However, the inevitable errors from automatic speech recognition (ASR) usually impair the understanding performance and lead to error propagation. Although there are some attempts to address this problem through contrastive learning, they (1) treat clean manual transcripts and ASR transcripts equally without discrimination in fine-tuning; (2) neglect the fact that semantically similar pairs are still pushed away when applying contrastive learning; (3) suffer from the problem of Kullback-Leibler (KL) vanishing. In this paper, we propose Mutual Learning and Large-Margin Contrastive Learning (ML-LMCL), a novel framework for improving ASR robustness in SLU. Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models. We also introduce a distance polarization regularizer to avoid pushing away the intra-cluster pairs as much as possible. Moreover, we use a cyclical annealing schedule to mitigate the KL vanishing issue. Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.

翻訳日:2023-11-21 20:45:52 公開日:2023-11-19

# SOccDPT: メモリ制約下で訓練された高密度予測変換器からの半教師付き3次元セマンティック動作

SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints ( http://arxiv.org/abs/2311.11371v1 )

ライセンス: Link先を確認

Aditya Nalgunda Ganesh

(参考訳) 我々は高密度な予測変換器を用いた単眼画像からの3次元意味占有予測のためのメモリ効率のよいSOccDPTを提案する。構造化トラヒックデータセットでトレーニングされた既存のメソッドの制限に対処するために、インド駆動データセットやベンガルー駆動データセットを含む非構造化データセットでモデルをトレーニングします。半教師付きトレーニングパイプラインにより,socdptは限定ラベル付きデータセットから学習でき,擬似基底真理ラベルを代入することで,手作業によるラベリングの必要性を低減し,bengaluruセマンティック占有データセットを作成できる。この広範なトレーニングにより、非構造化トラフィックシナリオを効果的に処理できるモデルの能力が向上します。トレーニング中のメモリ制限を克服するために,各エポックをトレーニングするパラメータのサブセットを選択するパッチワイズトレーニングを導入し,自動グレードグラフ構築時のメモリ使用量を削減する。構造化されていないトラフィックとメモリ制約のあるトレーニングと推論の文脈において、SOccDPTはRMSEの9.1473のスコアで示されるような既存の格差推定手法より優れており、セマンティックセグメンテーションIoUのスコアは46.02%に達し、競争周波数は69.47Hzである。コードとセマンティック占有率データセットを公開します。

We present SOccDPT, a memory-efficient approach for 3D semantic occupancy prediction from monocular image input using dense prediction transformers. To address the limitations of existing methods trained on structured traffic datasets, we train our model on unstructured datasets including the Indian Driving Dataset and Bengaluru Driving Dataset. Our semi-supervised training pipeline allows SOccDPT to learn from datasets with limited labels: it reduces the need for manual labelling by substituting pseudo-ground-truth labels, which we use to produce our Bengaluru Semantic Occupancy Dataset. This broader training enhances our model's ability to handle unstructured traffic scenarios effectively. To overcome memory limitations during training, we introduce patch-wise training, where we select a subset of parameters to train in each epoch, reducing memory usage during auto-grad graph construction. In the context of unstructured traffic and memory-constrained training and inference, SOccDPT outperforms existing disparity estimation approaches, as shown by its RMSE score of 9.1473, achieves a semantic segmentation IoU score of 46.02%, and operates at a competitive frequency of 69.47 Hz. We make our code and semantic occupancy dataset public.

翻訳日:2023-11-21 20:45:24 公開日:2023-11-19

# 公共データを用いた最適局所的非パラメトリック分類

Optimal Locally Private Nonparametric Classification with Public Data ( http://arxiv.org/abs/2311.11369v1 )

ライセンス: Link先を確認

Yuheng Ma and Hanfang Yang

(参考訳) 本研究では,非パラメトリック分類に着目し,非対話型ldp(local differential privacy)学習の課題について検討する。後方ドリフト仮定の下では, LDP制約による最小収束率を初めて導出した。そこで,本研究では,極小最大収束率を実現する新しい手法である局所プライベート分類木を提案する。さらに,パラメータチューニングを回避し,高速収束推定器を生成するデータ駆動プルーニング手順を設計する。合成および実データを用いた総合的な実験は,提案手法の優れた性能を示す。理論的および実験的な結果は、プライベートデータと比較して公開データの有効性を示すものであり、非プライベートデータ収集の優先順位付けの実践的提案につながっている。

In this work, we investigate the problem of public data-assisted non-interactive LDP (Local Differential Privacy) learning with a focus on non-parametric classification. Under the posterior drift assumption, we derive, for the first time, the mini-max optimal convergence rate under the LDP constraint. Then, we present a novel approach, the locally private classification tree, which attains the mini-max optimal convergence rate. Furthermore, we design a data-driven pruning procedure that avoids parameter tuning and produces a fast converging estimator. Comprehensive experiments conducted on synthetic and real datasets show the superior performance of our proposed method. Both our theoretical and experimental findings demonstrate the effectiveness of public data compared to private data, which leads to practical suggestions for prioritizing non-private data collection.

翻訳日:2023-11-21 20:44:56 公開日:2023-11-19

# 不均一ハイパーグラフニューラルネットワークの自己教師付き事前学習

Self-Supervised Pretraining for Heterogeneous Hypergraph Neural Networks ( http://arxiv.org/abs/2311.11368v1 )

ライセンス: Link先を確認

Abdalgader Abubaker, Takanori Maehara, Madhav Nimishakavi, Vassilis Plachouras

(参考訳) 近年,グラフニューラルネットワーク(gnns)の事前学習手法がラベルなしグラフデータから効果的な表現を学習するのに成功している。しかし、これらの手法のほとんどはグラフの対関係に依存しており、エンティティ間の高次関係を捉えていない。ハイパーグラフは、データ内のエンティティ間の高次関係を効果的にモデル化できる汎用的で表現力のある構造である。ハイパーグラフ(HyperGNN)にGNNを適用する努力にもかかわらず、現在、異種ハイパーグラフ上でのHyperGNNの完全な事前訓練方法は存在しない。本稿では,異種HyperGNNのための自己教師型事前学習フレームワークであるSPHHを提案する。本手法は,データ内のエンティティ間の高次関係を自己監督的に効果的に捉えることができる。 SPHHは、ハイパーグラフ構造から派生した情報表現を用いて、ハイパーグラフ内のエンティティの局所的およびグローバル的表現を同時に学習することを目的とした2つの自己教師型事前訓練タスクからなる。全体としては,ハイパーgnnの自己教師付き事前学習の分野において重要な進歩を示し,ハイパーグラフ構成にマッピングされたノード分類やリンク予測タスクなど,グラフベースのダウンストリームタスクのパフォーマンス向上の可能性を示す。 4つの異なるHyperGNNモデルを用いた2つの実世界のベンチマーク実験により、提案したSPHHフレームワークは、様々な下流タスクにおける最先端のベースラインを一貫して上回ることを示す。その結果、SPHHは、アーキテクチャや複雑さに関わらず、様々な下流タスクにおける様々なHyperGNNモデルの性能を向上させることができることが示され、フレームワークの堅牢性を強調している。

Recently, pretraining methods for Graph Neural Networks (GNNs) have been successful at learning effective representations from unlabeled graph data. However, most of these methods rely on pairwise relations in the graph and do not capture the underlying higher-order relations between entities. Hypergraphs are versatile and expressive structures that can effectively model higher-order relationships among entities in the data. Despite the efforts to adapt GNNs to hypergraphs (HyperGNN), there are currently no fully self-supervised pretraining methods for HyperGNN on heterogeneous hypergraphs. In this paper, we present SPHH, a novel self-supervised pretraining framework for heterogeneous HyperGNNs. Our method is able to effectively capture higher-order relations among entities in the data in a self-supervised manner. SPHH consists of two self-supervised pretraining tasks that aim to simultaneously learn both local and global representations of the entities in the hypergraph by using informative representations derived from the hypergraph structure. Overall, our work presents a significant advancement in the field of self-supervised pretraining of HyperGNNs, and has the potential to improve the performance of various graph-based downstream tasks, such as node classification and link prediction tasks that are mapped to a hypergraph configuration. Our experiments on two real-world benchmarks using four different HyperGNN models show that our proposed SPHH framework consistently outperforms state-of-the-art baselines in various downstream tasks. The results demonstrate that SPHH is able to improve the performance of various HyperGNN models in various downstream tasks, regardless of their architecture or complexity, which highlights the robustness of our framework.

翻訳日:2023-11-21 20:44:46 公開日:2023-11-19

# 証拠不確かさの定量化:変数に基づく視点

Evidential Uncertainty Quantification: A Variance-Based Perspective ( http://arxiv.org/abs/2311.11367v1 )

ライセンス: Link先を確認

Ruxiao Duan, Brian Caffo, Harrison X. Bai, Haris I. Sair, Craig Jones

(参考訳) 深層ニューラルネットワークの不確かさの定量化は研究の活発な分野となり、アクティブラーニングのような下流の様々なタスクにおいて重要な役割を担っている。近年の顕在的深層学習の進歩は, モデルの1つの前方通過による動脈およびてんかんの不確かさの直接定量化に寄与している。ほとんどの伝統的なアプローチでは、エントロピーに基づく方法で分類における明白な不確かさを導き、サンプルレベルでの不確かさを定量化している。しかし,回帰問題に広く適用されている分散ベースの手法は,分類設定においてほとんど用いられない。本研究では,回帰から分類へ分散ベースのアプローチを適用し,クラスレベルでの分類の不確かさを定量化する。回帰における分散分解手法は、全共分散の法則に基づく分類におけるクラス共分散分解に拡張され、クラス相関も共分散から導出される。クロスドメインデータセットの実験は、分散ベースのアプローチが、アクティブドメイン適応におけるエントロピーベースのアプローチと同等の精度をもたらすだけでなく、クラスワイドの不確実性やクラス間の相関に関する情報をもたらすことを示す。コードはhttps://github.com/kerrydrx/evidentialadaで入手できる。この明らかな不確実性定量化の代替手段は、クラスの不確実性や相関が応用において重要である場合、研究者により多くの選択肢を与える。

Uncertainty quantification of deep neural networks has become an active field of research and plays a crucial role in various downstream tasks such as active learning. Recent advances in evidential deep learning shed light on the direct quantification of aleatoric and epistemic uncertainties with a single forward pass of the model. Most traditional approaches adopt an entropy-based method to derive evidential uncertainty in classification, quantifying uncertainty at the sample level. However, the variance-based method that has been widely applied in regression problems is seldom used in the classification setting. In this work, we adapt the variance-based approach from regression to classification, quantifying classification uncertainty at the class level. The variance decomposition technique in regression is extended to class covariance decomposition in classification based on the law of total covariance, and the class correlation is also derived from the covariance. Experiments on cross-domain datasets are conducted to illustrate that the variance-based approach not only results in similar accuracy as the entropy-based one in active domain adaptation but also brings information about class-wise uncertainties as well as between-class correlations. The code is available at https://github.com/KerryDRX/EvidentialADA. This alternative means of evidential uncertainty quantification will give researchers more options when class uncertainties and correlations are important in their applications.

翻訳日:2023-11-21 20:44:19 公開日:2023-11-19
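
The evidential-uncertainty abstract above refers to a class covariance decomposition based on the law of total covariance. As a worked sketch, assume the usual evidential setup in which the class probabilities $\mathbf{p}$ follow a Dirichlet distribution with parameters $\boldsymbol{\alpha}$, $\alpha_0 = \sum_k \alpha_k$, and the one-hot label $\mathbf{y}$ is categorical given $\mathbf{p}$. The notation and the aleatoric/epistemic labels on the two terms are assumptions made here for illustration; the paper's exact formulation may differ.

```latex
% Law of total covariance applied to the one-hot label y:
\operatorname{Cov}[\mathbf{y}]
  = \underbrace{\mathbb{E}_{\mathbf{p}}\!\left[\operatorname{Cov}[\mathbf{y}\mid\mathbf{p}]\right]}_{\text{aleatoric part}}
  + \underbrace{\operatorname{Cov}_{\mathbf{p}}\!\left[\mathbb{E}[\mathbf{y}\mid\mathbf{p}]\right]}_{\text{epistemic part}}
  = \mathbb{E}\!\left[\operatorname{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^{\top}\right]
  + \operatorname{Cov}[\mathbf{p}].

% Dirichlet moments, with \mu_i = \alpha_i / \alpha_0:
\operatorname{Cov}[p_i, p_j]
  = \frac{\delta_{ij}\,\mu_i - \mu_i \mu_j}{\alpha_0 + 1},
\qquad
\mathbb{E}\!\left[\delta_{ij}\, p_i - p_i p_j\right]
  = \frac{\alpha_0}{\alpha_0 + 1}\left(\delta_{ij}\,\mu_i - \mu_i \mu_j\right).

% The two parts sum to the categorical covariance at the mean,
% and a class correlation follows from any of the covariance matrices:
\operatorname{Cov}[y_i, y_j] = \delta_{ij}\,\mu_i - \mu_i \mu_j,
\qquad
\rho_{ij} = \frac{\operatorname{Cov}_{ij}}{\sqrt{\operatorname{Cov}_{ii}\,\operatorname{Cov}_{jj}}}.
```

Under these assumptions the diagonal entries give a per-class variance and the off-diagonal entries give the between-class correlations mentioned in the abstract. The epistemic part shrinks as the total evidence $\alpha_0$ grows.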

# 古典データ符号化のための量子アクセスモデルの回路複雑性について

On circuit complexity of quantum access models for encoding classical data ( http://arxiv.org/abs/2311.11365v1 )

ライセンス: Link先を確認

Xiao-Ming Zhang, Xiao Yuan

(参考訳) 古典的なデータエンコーディングは通常、オラクルベースの量子アルゴリズムではブラックボックスとして扱われる。一方,それらの構成は実用的なアルゴリズムの実装に不可欠である。ここでは、データエンコーディングのブラックボックスを開き、典型的な量子アクセスモデルを構築する際のclifford$+t$の複雑さを調べます。一般の行列に対して、スパースアクセス入力モデルとブロックエンコーディングの両方が、行列がスパースであっても、行列次元に対してほぼ線形回路複雑度を必要とすることを示す。また、ほぼ最適のゲート複雑性を達成する構築プロトコルも提供します。一方、行列が効率的なユニタリの線形結合多項式項である場合、構成はデータキュービットに対して効率的になる。典型的な例として、これらのユニタリがPauli文字列である場合のブロック符号化の改善を提案する。私たちのプロトコルは、量子状態の改善と独立した値を持つpauli文字列の選択的神託に基づいて構築されています。我々のアクセスモデル構築は、調整可能なアクビット数を提供し、対応する時空トレードオフを提供する。

Classical data encoding is usually treated as a black box in oracle-based quantum algorithms. On the other hand, its construction is crucial for practical algorithm implementations. Here, we open the black boxes of data encoding and study the Clifford$+T$ complexity of constructing some typical quantum access models. For general matrices, we show that both sparse-access input models and block-encoding require nearly linear circuit complexities relative to the matrix dimension, even if matrices are sparse. We also give construction protocols achieving near-optimal gate complexities. On the other hand, the construction becomes efficient with respect to the data qubit when the matrix is a linear combination of polynomially many efficient unitaries. As a typical example, we propose improved block encoding when these unitaries are Pauli strings. Our protocols are built upon improved quantum state preparation and a selective oracle for Pauli strings, which hold independent value. Our access model constructions offer considerable flexibility, allowing for a tunable number of ancillary qubits and offering corresponding space-time trade-offs.

翻訳日:2023-11-21 20:43:57 公開日:2023-11-19

# 対称性不変量子機械学習力場

Symmetry-invariant quantum machine learning force fields ( http://arxiv.org/abs/2311.11362v1 )

ライセンス: Link先を確認

Isabel Nha Minh Le, Oriel Kiss, Julian Schuhmacher, Ivano Tavernelli and Francesco Tacchino

(参考訳) 機械学習技術は、原子論シミュレーションのための効率的で正確な力場を計算するのに欠かせないツールである。このアプローチは最近、量子コンピューティングの手法を取り入れるために拡張され、潜在的なエネルギー表面や原子力を予測するために変分量子学習モデルが用いられるようになった。しかしながら、そのようなモデルのトレーニング容易性とスケーラビリティは、理論的および実用的障壁の両方のため、依然として制限されている。近年の幾何学的古典的および量子的機械学習の発展に触発されて、我々は、データにインスパイアされた先行として、物理的に関連する幅広い対称性を明示的に組み込む量子ニューラルネットワークを設計した。我々の不変量子学習モデルは、複雑性が増大する個々の分子において、より一般的なものよりも優れています。さらに,複数の成分を持つシステムの最小例として水二量体について検討し,提案手法の汎用性を示し,より大きなシミュレーションへの道を開く。以上の結果から,分子力場の生成は幾何学的量子機械学習の枠組みを活用し,化学系は高度な量子機械学習ツールの開発と応用のための興味深く豊かな場であることが示唆された。

Machine learning techniques are essential tools to compute efficient, yet accurate, force fields for atomistic simulations. This approach has recently been extended to incorporate quantum computational methods, making use of variational quantum learning models to predict potential energy surfaces and atomic forces from ab initio training data. However, the trainability and scalability of such models are still limited, due to both theoretical and practical barriers. Inspired by recent developments in geometric classical and quantum machine learning, here we design quantum neural networks that explicitly incorporate, as a data-inspired prior, an extensive set of physically relevant symmetries. We find that our invariant quantum learning models outperform their more generic counterparts on individual molecules of growing complexity. Furthermore, we study a water dimer as a minimal example of a system with multiple components, showcasing the versatility of our proposed approach and opening the way towards larger simulations. Our results suggest that the generation of molecular force fields can significantly profit from leveraging the framework of geometric quantum machine learning, and that chemical systems represent, in fact, an interesting and rich playground for the development and application of advanced quantum machine learning tools.

翻訳日:2023-11-21 20:43:41 公開日:2023-11-19

# 手のひら印字認識のためのスケールアウェアコンペティションネットワーク

Scale-aware competition network for palmprint recognition ( http://arxiv.org/abs/2311.11354v1 )

ライセンス: Link先を確認

Chengrui Gao, Ziyuan Yang, Min Zhu, Andrew Beng Jin Teo

(参考訳) Palmprintのバイオメトリックスは、パームスキャンによる支払いと社会保障に注意を向けた。しかし,テクスチャの寸法を無視して,テクスチャの配向を優先する手法が主流であった。我々は,この制約を解消するために,イントラスケールとイントラスケールの機能を同時抽出する革新的なネットワークを設計した。本稿では,ISCM(Inner-Scale Competition Module)とASCM(Across-Scale Competition Module)を含むSAC-Net(Scale-Aware competitive Network)を提案する。 ISCMは学習可能なGaborフィルタと自己認識機構を効率的に統合し、リッチな向きデータを抽出し、長距離識別特性を持つテクスチャを識別する。その後、ASCMは様々なスケールの競争戦略を活用して、競合するテクスチャスケールの要素を効果的にカプセル化する。 iscm と ascm を併用することにより, パームプリントの特徴を特徴付ける。 3つのベンチマークデータセットにまたがる厳密な実験は、最先端の代替案と比較して、提案手法の例外的な認識性能と回復力を示している。

Palmprint biometrics garner heightened attention in palm-scanning payment and social security due to their distinctive attributes. However, prevailing methodologies singularly prioritize texture orientation, neglecting the significant texture scale dimension. We design an innovative network for concurrently extracting intra-scale and inter-scale features to redress this limitation. This paper proposes a scale-aware competitive network (SAC-Net), which includes the Inner-Scale Competition Module (ISCM) and the Across-Scale Competition Module (ASCM) to capture texture characteristics related to orientation and scale. ISCM efficiently integrates learnable Gabor filters and a self-attention mechanism to extract rich orientation data and discern textures with long-range discriminative properties. Subsequently, ASCM leverages a competitive strategy across various scales to effectively encapsulate the competitive texture scale elements. By synergizing ISCM and ASCM, our method adeptly characterizes palmprint features. Rigorous experimentation across three benchmark datasets unequivocally demonstrates our proposed approach's exceptional recognition performance and resilience relative to state-of-the-art alternatives.

翻訳日:2023-11-21 20:43:20 公開日:2023-11-19
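
The SAC-Net abstract above builds on learnable Gabor filters for orientation features. As a minimal sketch of what such a filter computes, the snippet below evaluates the standard real-valued Gabor function on a small grid. In a learnable version the wavelength, orientation and width would be trainable parameters; here they are fixed, and all names and values are illustrative assumptions rather than the paper's configuration.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, phase=0.0):
    """Return a size x size real Gabor kernel as a list of rows.

    size is assumed to be odd so that the kernel has a central pixel.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the cosine carrier varies along theta.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope, squeezed by gamma across the carrier direction.
            envelope = math.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_rot / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# A small bank over six orientations (the count is an arbitrary choice here).
bank = [gabor_kernel(7, wavelength=4.0, theta=k * math.pi / 6, sigma=2.0) for k in range(6)]
```

Convolving an image with each kernel in the bank and comparing responses across orientations is one simple way to obtain orientation-sensitive texture features. Varying `sigma` and `wavelength` would add the scale dimension the abstract emphasizes.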

# 規制の代替:パブリックAIの事例

An Alternative to Regulation: The Case for Public AI ( http://arxiv.org/abs/2311.11350v1 )

ライセンス: Link先を確認

Nicholas Vincent, David Bau, Sarah Schwettmann, Joshua Tan

(参考訳) 政府はAIを構築できるのか? 本稿では、政府や他の公共機関が資金提供し、提供し、管理する「公的なAI」 - 公開アクセス可能なAIモデルを開発するための継続的な取り組みについて述べる。パブリックAIは、AIに対する標準的な規制アプローチの代替と補完の両方を提供するが、同時に新しい技術とポリシーの課題も示唆している。我々は、MLリサーチコミュニティがこのイニシアチブを形作り、その実装をサポートするためのロードマップと、パブリックAIが他の責任あるAIイニシアチブを補完する方法について提示する。

Can governments build AI? In this paper, we describe an ongoing effort to develop "public AI" -- publicly accessible AI models funded, provisioned, and governed by governments or other public bodies. Public AI presents both an alternative and a complement to standard regulatory approaches to AI, but it also suggests new technical and policy challenges. We present a roadmap for how the ML research community can help shape this initiative and support its implementation, and how public AI can complement other responsible AI initiatives.

翻訳日:2023-11-21 20:43:01 公開日:2023-11-19

# 被覆粘度を考慮したアルゴリズムの講義

Coverage-Validity-Aware Algorithmic Recourse ( http://arxiv.org/abs/2311.11349v1 )

ライセンス: Link先を確認

Ngoc Bui, Duy Nguyen, Man-Chung Yue, Viet Anh Nguyen

(参考訳) アルゴリズムリコースは、機械学習モデルの説明可能性、透明性、それゆえ倫理を促進するための顕著な技術として浮上する。既存のアルゴリズムリコースアプローチは不変予測モデルをとることが多いが、予測モデルは通常、新しいデータの到着時に更新される。したがって、現在のモデルにそれぞれ有効である言い換えは、将来のモデルでは無効になる可能性がある。そこで本研究では,モデルシフトに対するロバスト性を示すモデルに依存しない談話を生成する新しい枠組みを提案する。まず,非線形(ブラックボックス)モデルのカバレッジを意識した線形サロゲートを構築し,そのリコースを線形サロゲートに対して生成する。我々は, 被覆特性を考慮した線形サロゲートと minimax probability machines (mpm) との理論的関係を確立する。そして、異なる共分散の頑健性を規定することで、提案フレームワークは$\ell_2$-regularization やクラス重み付けを含むmpmの一般的な正規化を回復する。さらに,我々のサーロゲートが近似超平面を直観的に押し付け,ロバストだけでなく解釈可能な帰路も促進することを示した。数値的な結果は,我々のフレームワークの有用性と堅牢性を示している。

Algorithmic recourse emerges as a prominent technique to promote the explainability, transparency and hence ethics of machine learning models. Existing algorithmic recourse approaches often assume an invariant predictive model; however, the predictive model is usually updated upon the arrival of new data. Thus, a recourse that is valid with respect to the present model may become invalid for the future model. To resolve this issue, we propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts. Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model; then, the recourse is generated with respect to the linear surrogate. We establish a theoretical connection between our coverage-validity-aware linear surrogate and the minimax probability machines (MPM). We then prove that by prescribing different covariance robustness, the proposed framework recovers popular regularizations for MPM, including the $\ell_2$-regularization and class-reweighting. Furthermore, we show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses. The numerical results demonstrate the usefulness and robustness of our framework.

翻訳日:2023-11-21 20:42:49 公開日:2023-11-19
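
The recourse abstract above connects its linear surrogate to minimax probability machines (MPM) without stating the MPM problem. For reference, the standard MPM formulation is sketched below. The symbols ($\mu_\pm$, $\Sigma_\pm$ for the class means and covariances, $w$, $b$ for the hyperplane) are notation chosen here, and how the paper's coverage and validity terms map onto this problem is not spelled out in the abstract.

```latex
% MPM: maximize the worst-case probability alpha of correct classification
% over all distributions with the given class means and covariances.
\max_{\alpha,\; w \neq 0,\; b} \ \alpha
\quad \text{s.t.} \quad
\inf_{x \sim (\mu_{+}, \Sigma_{+})} \Pr\!\left(w^{\top} x \ge b\right) \ge \alpha,
\qquad
\inf_{x \sim (\mu_{-}, \Sigma_{-})} \Pr\!\left(w^{\top} x \le b\right) \ge \alpha.

% Equivalent convex (second-order cone) form:
\min_{w} \ \sqrt{w^{\top} \Sigma_{+} w} + \sqrt{w^{\top} \Sigma_{-} w}
\quad \text{s.t.} \quad
w^{\top}(\mu_{+} - \mu_{-}) = 1.

% With \kappa^{*} the reciprocal of the optimal value above,
% the worst-case accuracy is
\alpha^{*} = \frac{(\kappa^{*})^{2}}{1 + (\kappa^{*})^{2}}.
```

Because only means and covariances enter the problem, perturbing or regularizing the covariance matrices is a natural place for robustness assumptions to appear. This is consistent with the abstract's remark about recovering $\ell_2$-regularization and class reweighting.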

# 異方性開量子ラビ模型における多臨界散逸相転移

Multicritical dissipative phase transitions in the anisotropic open quantum Rabi model ( http://arxiv.org/abs/2311.11346v1 )

ライセンス: Link先を確認

Guitao Lyu, Korbinian Kottmann, Martin B. Plenio, Myung-Joong Hwang

(参考訳) 回転項と反回転項の結合強度の異方性の程度を変えて一階及び二階の散逸相転移を示す異方性開量子ラビモデルの非平衡定常状態について検討する。半古典的および量子的アプローチの両方を用いて、異方性と散逸の間の相互作用から生じる豊富な位相図を見つける。まず、通常相と超放射相の両方が安定な双安定相が存在する。第2に、第1および第2次相転移の位相境界が一致する多臨界点が存在する。新しい臨界指数の集合が多臨界点のスケーリングを支配していることを示す。最後に,ラマン遷移の強度制御により異方性が調整可能な一対の捕捉イオンを用いて,多臨界遷移の観測とビスタビリティの実現可能性について検討する。本研究は, 有限成分量子系における臨界現象の範囲を拡大し, 臨界量子センシングへの応用に有用であることを示す。

We investigate the nonequilibrium steady state of the anisotropic open quantum Rabi model, which exhibits first-order and second-order dissipative phase transitions upon varying the degree of anisotropy between the coupling strengths of rotating and counterrotating terms. Using both semiclassical and quantum approaches, we find a rich phase diagram resulting from the interplay between the anisotropy and the dissipation. First, there exists a bistable phase where both the normal and superradiant phases are stable. Second, there are multicritical points where the phase boundaries for the first- and second-order phase transitions meet. We show that a new set of critical exponents governs the scaling of the multicritical points. Finally, we discuss the feasibility of observing the multicritical transitions and bistability using a pair of trapped ions where the anisotropy can be tuned by controlling the intensity of the Raman transitions. Our study enlarges the scope of critical phenomena that may occur in finite-component quantum systems, which could be useful for applications in critical quantum sensing.

翻訳日:2023-11-21 20:42:27 公開日:2023-11-19

# 連続変数への新しい埋め込みを用いた高速化逆モデリングのための生成モデル

A Generative Model for Accelerated Inverse Modelling Using a Novel Embedding for Continuous Variables ( http://arxiv.org/abs/2311.11343v1 )

ライセンス: Link先を確認

Sébastien Bompas and Stefan Sandfeld

(参考訳) 材料科学において、望ましい性質を持つ高速プロトタイピング材料の挑戦は、しばしば適切な微細構造を見つけるために広範囲な実験を必要とする。さらに、与えられた性質に対する微細構造の発見は、一般に複数の解が存在する可能性のある不適切な問題である。生成機械学習モデルを使用することは、計算コストの低減にも有効である。これは、例えばモデルへの条件付け入力として連続プロパティ変数を必要とするため、新しい課題が伴う。本稿では,既存手法の欠点を考察し,浮動小数点数のバイナリ表現に基づく生成モデルの新たな埋め込み戦略と比較する。これにより正規化の必要性を排除し、情報を保存し、生成モデルを条件付けするための汎用的な埋め込み空間を作成する。この手法は任意の数にネットワークを条件付けし、生成した微細構造画像のきめ細かい制御を提供し、加速材料設計に寄与することができる。

In materials science, the challenge of rapidly prototyping materials with desired properties often involves extensive experimentation to find suitable microstructures. Additionally, finding microstructures for given properties is typically an ill-posed problem where multiple solutions may exist. Using generative machine learning models can be a viable solution which also reduces the computational cost. This comes with new challenges because, e.g., a continuous property variable is required as conditioning input to the model. We investigate the shortcomings of an existing method and compare it to a novel embedding strategy for generative models that is based on the binary representation of floating point numbers. This eliminates the need for normalization, preserves information, and creates a versatile embedding space for conditioning the generative model. This technique can be applied to condition a network on any number, to provide fine control over generated microstructure images, thereby contributing to accelerated materials design.

翻訳日:2023-11-21 20:42:07 公開日:2023-11-19
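
The abstract above describes an embedding based on the binary representation of floating point numbers. One way to picture this is to map a continuous property value to the 32 bits of its IEEE 754 single-precision encoding and feed those bits to the model as a conditioning vector. The sketch below does exactly that. The choice of 32-bit precision, the bit order, and the function names are assumptions for illustration, not details from the paper.

```python
import struct


def float_to_bits(value):
    """Encode a number as the 32 bits of its IEEE 754 single-precision form.

    Returns a list of 32 floats (0.0 or 1.0), most significant bit first:
    1 sign bit, then 8 exponent bits, then 23 mantissa bits.
    """
    (packed,) = struct.unpack(">I", struct.pack(">f", value))
    return [float((packed >> shift) & 1) for shift in range(31, -1, -1)]


def bits_to_float(bits):
    """Invert float_to_bits, recovering the single-precision value."""
    packed = 0
    for bit in bits:
        packed = (packed << 1) | int(bit)
    (value,) = struct.unpack(">f", struct.pack(">I", packed))
    return value


embedding = float_to_bits(0.15625)
print(embedding)
print(bits_to_float(embedding))
```

Every entry already lies in {0, 1}, so no dataset-dependent normalization is needed. The mapping is also invertible up to single precision, which matches the abstract's claims about avoiding normalization and preserving information.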

# 複数モーダルの同時埋め込み学習による出現コード

Appearance Codes using Joint Embedding Learning of Multiple Modalities ( http://arxiv.org/abs/2311.11427v1 )

ライセンス: Link先を確認

Alex Zhang and Evan Dogariu

(参考訳) 近年のジェネレーティブ・モデリングにおける外観コードの使用により、シーンの昼夜のレンダリングなど、様々な外観と照明を備えた新しいビューレンダリングが可能となった。この手法の大きな限界は,各シーンにおける新たな外観符号の再学習の必要性であり,異なるモード間のコントラスト的損失制約を強制することにより,シーンの外観と構造に対する共同埋め込み空間を学習するフレームワークを提案する。我々はRADIATEデータセット上の単純な変分オートエンコーダモデルに適用し、付加的な最適化イテレーションなしで夜間画像の新しいレンダリングを生成することができることを定性的に示す。さらに,標準的な画像毎出現コード技術を用いたベースラインvaeと比較し,推定で見当たらない画像の出現コードを学習することなく,同様の品質の世代を実現できることを示す。

The use of appearance codes in recent work on generative modeling has enabled novel view renders with variable appearance and illumination, such as day-time and night-time renders of a scene. A major limitation of this technique is the need to re-train new appearance codes for every scene at inference time. In this work, we address this problem by proposing a framework that learns a joint embedding space for the appearance and structure of the scene by enforcing a contrastive loss constraint between different modalities. We apply our framework to a simple Variational Auto-Encoder model on the RADIATE dataset (Sheeny et al., 2021) and qualitatively demonstrate that we can generate new renders of night-time photos using day-time appearance codes without additional optimization iterations. Additionally, we compare our model to a baseline VAE that uses the standard per-image appearance code technique and show that our approach achieves generations of similar quality without learning appearance codes for any unseen images at inference time.

翻訳日:2023-11-21 20:34:16 公開日:2023-11-19

# テンソルアウェアエネルギー会計

Tensor-Aware Energy Accounting ( http://arxiv.org/abs/2311.11424v1 )

ライセンス: Link先を確認

Timur Babakol and Yu David Liu

(参考訳) ディープラーニング(DL)がサポートする人工知能(AI)アプリケーションの急速な成長に伴い、これらのアプリケーションのエネルギー効率は持続可能性に大きな影響を与えている。 SmaragdineはTensorFlowで実装されたテンソルベースのDLプログラムのための新しいエネルギー会計システムである。 SmaragdineはDLプログラムの内部構造を認識しており、テンソル対応エネルギー会計と呼んでいる。スマラグジンでは、DLプログラムのエネルギー消費は、その論理的階層的な分解構造に沿った単位に分解することができる。我々は、最も広く使われている言語モデルの一つであるBERTのエネルギー挙動を理解するためにSmaragdineを適用した。 Smaragdineは、BERTの最も高いエネルギー/電力消費成分を識別することができる。さらに,Smaragdineが下流のツールチェーン構築をどのようにサポートしているかを事例として,BERTのハイパーパラメータチューニングによるエネルギー影響と,BERTが次世代のALBERTに進化する際のエネルギー挙動の進化を比較検討した。

With the rapid growth of Artificial Intelligence (AI) applications supported by deep learning (DL), the energy efficiency of these applications has an increasingly large impact on sustainability. We introduce Smaragdine, a new energy accounting system for tensor-based DL programs implemented with TensorFlow. At the heart of Smaragdine is a novel white-box methodology of energy accounting: Smaragdine is aware of the internal structure of the DL program, which we call tensor-aware energy accounting. With Smaragdine, the energy consumption of a DL program can be broken down into units aligned with its logical hierarchical decomposition structure. We apply Smaragdine for understanding the energy behavior of BERT, one of the most widely used language models. Layer-by-layer and tensor-by-tensor, Smaragdine is capable of identifying the highest energy/power-consuming components of BERT. Furthermore, we conduct two case studies on how Smaragdine supports downstream toolchain building, one on the comparative energy impact of hyperparameter tuning of BERT, the other on the energy behavior evolution when BERT evolves to its next generation, ALBERT.

翻訳日:2023-11-21 20:33:57 公開日:2023-11-19

# 混合データセットを用いた無線ネットワーク最適化のためのオフライン強化学習

Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets ( http://arxiv.org/abs/2311.11423v1 )

ライセンス: Link先を確認

Kun Yang, Cong Shen, Jing Yang, Shu-ping Yeh, Jerry Sydir

(参考訳) 近年の強化学習(RL)は、無線無線リソース管理(RRM)におけるオンラインRLの採用を促進している。しかし、オンラインRLアルゴリズムは環境との直接の相互作用を必要とするが、RLにおける避けられない探索による潜在的な性能損失を考えると、望ましくないかもしれない。本研究ではまず, RRM 問題の解法における \emph{offline} RL アルゴリズムの利用について検討する。我々は,ユーザスケジューリングによる線形結合を最大化することを目的とした特定のRRM問題に対して,動作制約付きQラーニング(BCQ),保守的Qラーニング(CQL),暗黙的Qラーニング(IQL)を含む,最先端のオフラインRLアルゴリズムを評価した。 rrm問題に対するオフラインrlの性能は、データ収集に使用される行動ポリシーに極めて依存しており、さらに、異なる行動ポリシーによって収集される異種データセットを活用する新しいオフラインrlソリューションを提案する。データセットの適切な混合により、オフラインRLは、すべての関連する行動ポリシーが極めて最適である場合でも、ほぼ最適RLポリシーを生成することができることを示す。

The recent development of reinforcement learning (RL) has boosted the adoption of online RL for wireless radio resource management (RRM). However, online RL algorithms require direct interactions with the environment, which may be undesirable given the potential performance loss due to the unavoidable exploration in RL. In this work, we first investigate the use of offline RL algorithms in solving the RRM problem. We evaluate several state-of-the-art offline RL algorithms, including behavior constrained Q-learning (BCQ), conservative Q-learning (CQL), and implicit Q-learning (IQL), for a specific RRM problem that aims at maximizing a linear combination of sum and 5-percentile rates via user scheduling. We observe that the performance of offline RL for the RRM problem depends critically on the behavior policy used for data collection, and further propose a novel offline RL solution that leverages heterogeneous datasets collected by different behavior policies. We show that with a proper mixture of the datasets, offline RL can produce a near-optimal RL policy even when all involved behavior policies are highly suboptimal.

翻訳日:2023-11-21 20:33:38 公開日:2023-11-19
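
The offline-RL abstract above optimizes a linear combination of the sum rate and the 5-percentile rate across users. A minimal sketch of such an objective is shown below. The weights, the function names, and the nearest-rank percentile definition are assumptions made for illustration, since the abstract does not specify them.

```python
import math


def percentile_nearest_rank(values, percent):
    """Return the percent-th percentile using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def rrm_objective(user_rates, sum_weight=1.0, tail_weight=1.0):
    """Weighted sum of total throughput and the 5-percentile user rate.

    sum_weight and tail_weight are hypothetical trade-off coefficients.
    """
    sum_rate = sum(user_rates)
    tail_rate = percentile_nearest_rank(user_rates, 5)
    return sum_weight * sum_rate + tail_weight * tail_rate


rates = list(range(1, 21))  # 20 users with rates 1..20 (arbitrary units)
print(rrm_objective(rates))                    # 210 + 1 = 211
print(rrm_objective(rates, tail_weight=10.0))  # 210 + 10 = 220.0
```

The sum term rewards overall throughput, while the percentile term rewards the worst-served users. A scheduler that starves a few users scores lower as `tail_weight` increases.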

# 識別不能閾値における精度:分類アルゴリズムの評価法

Precision at the indistinguishability threshold: a method for evaluating classification algorithms ( http://arxiv.org/abs/2311.11422v1 )

ライセンス: Link先を確認

David J. T. Sumpter

(参考訳) aucやf1-scoreなど、分類アルゴリズムのパフォーマンスを評価するための単一の数値メトリクスは幅広く存在する(wikipediaは17の指標をリストアップし、27の異なる名前を持っている)。本稿では,猫と実猫を区別できないようにアルゴリズムが調整された場合,猫が実際に猫を包んでいるとラベル付けされた画像が,どれくらいの頻度で存在するのか,という疑問に答えるための新しい指標を提案する。この計量を構成するステップは次のとおりである。まず、アルゴリズムが2つの無作為なチョセン画像(例えば、猫を含むとラベルづけされた画像)と、実際に猫を含む画像から1つの画像(つまり、猫を含むとラベルされた画像)を示すとき、最も高いスコアを持つ画像が実際の猫画像の集合から選択された画像の確率が50\%であるように閾値スコアを設定する。この判定閾値では、正のラベル付き画像の集合は正の画像の集合と区別できない。 2番目のステップとして、猫を含むものとしてラベル付けされた画像からランダムに選択された画像が実際に猫を含む頻度を問うことで、パフォーマンスを測定する。この計量は「区別不能閾値での精度」と考えることができる。この新しいメトリクスは、これらのメトリクスすべてに固有の精度とリコールのトレードオフに対処するものではないが、このメソッドがAUCなどの使用時に発生する落とし穴を回避し、例えばF1スコアよりもモチベーションがよいことを示す。

There exists a wide range of single number metrics for assessing performance of classification algorithms, including AUC and the F1-score (Wikipedia lists 17 such metrics, with 27 different names). In this article, I propose a new metric to answer the following question: when an algorithm is tuned so that it can no longer distinguish labelled cats from real cats, how often does a randomly chosen image that has been labelled as containing a cat actually contain a cat? The steps to construct this metric are as follows. First, we set a threshold score such that when the algorithm is shown two randomly-chosen images -- one that has a score greater than the threshold (i.e. a picture labelled as containing a cat) and another from those pictures that really does contain a cat -- the probability that the image with the highest score is the one chosen from the set of real cat images is 50%. At this decision threshold, the set of positively labelled images is indistinguishable from the set of images which are positive. Then, as a second step, we measure performance by asking how often a randomly chosen picture from those labelled as containing a cat actually contains a cat. This metric can be thought of as precision at the indistinguishability threshold. While this new metric doesn't address the tradeoff between precision and recall inherent to all such metrics, I do show why this method avoids pitfalls that can occur when using, for example, AUC, and why it is better motivated than, for example, the F1-score.

翻訳日:2023-11-21 20:33:18 公開日:2023-11-19
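
The two steps described in the abstract above can be sketched on a finite sample as follows. This is an illustrative reading, not the author's reference implementation. It assumes that candidate thresholds are the observed scores, that an image is labelled positive when its score is at or above the threshold, that ties count as half a win, and that the threshold whose win probability is closest to 50% is selected.

```python
def win_probability(real_scores, labelled_scores):
    """P(a real positive outscores a labelled positive); ties count as 0.5."""
    wins = 0.0
    for real in real_scores:
        for labelled in labelled_scores:
            if real > labelled:
                wins += 1.0
            elif real == labelled:
                wins += 0.5
    return wins / (len(real_scores) * len(labelled_scores))


def precision_at_indistinguishability(scores, is_positive):
    """Return (threshold, precision) at the threshold closest to a 50% win rate.

    Requires at least one truly positive example.
    """
    real_scores = [s for s, y in zip(scores, is_positive) if y]
    best = None
    for threshold in sorted(set(scores)):
        # Step 1: images scoring at or above the threshold are labelled positive.
        labelled = [(s, y) for s, y in zip(scores, is_positive) if s >= threshold]
        gap = abs(win_probability(real_scores, [s for s, _ in labelled]) - 0.5)
        if best is None or gap < best[0]:
            # Step 2: precision among the labelled-positive images.
            precision = sum(1 for _, y in labelled if y) / len(labelled)
            best = (gap, threshold, precision)
    return best[1], best[2]


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(precision_at_indistinguishability(scores, labels))  # (0.6, 0.75)
```

On this toy sample the win probabilities at thresholds 0.6 and 0.7 are 6.5/12 and 4/9 respectively, so 0.6 is selected and three of the four labelled images are true positives. With real data an exact 50% crossing rarely exists on the sample, which is why the sketch picks the closest threshold rather than solving for equality.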

# LifeLearner:組み込みコンピューティングプラットフォームのためのハードウェア対応メタ継続学習システム

LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms ( http://arxiv.org/abs/2311.11420v1 )

ライセンス: Link先を確認

Young D. Kwon, Jagmohan Chauhan, Hong Jia, Stylianos I. Venieris, and Cecilia Mascolo

(参考訳) 連続学習(continual learning, cl)は、ユーザのパーソナライゼーションや家庭用ロボットといったアプリケーションに対して、オンザフライで学習とコンテキスト適応を可能にする。これはコンテキスト、アクション、ユーザが変更する場合に重要な機能です。しかし、リソース制約のある組み込みシステムでCLを有効にすることは、ラベル付きデータ、メモリ、計算能力に制限があるため困難である。本稿では,システムリソース(低メモリ,レイテンシ,エネルギー消費)を劇的に最適化し,高い精度を保ちながら,ハードウェアを意識したメタ連続学習システムlifelearnerを提案する。具体的には,(1)データ不足問題に明示的に対処し,高い精度を確保するためのメタラーニングとリハーサル戦略,(2)損失のない圧縮を効果的に組み合わせてCLとリハーサルサンプルのリソース要求を大幅に削減する,(3)ハードウェア特性を考慮した組込みおよびIoTプラットフォーム上でのハードウェア認識システムを開発する。その結果、lifelearnerは、oracleのベースラインと比較して精度が2.8%低下し、ほぼ最適のcl性能を達成している。最先端(SOTA)メタCL法では、LifeLearnerはメモリフットプリントを(178.7x)大幅に削減し、エンドツーエンドのレイテンシを80.8-94.2%、エネルギー消費を80.9-94.2%削減した。さらに、2つのエッジデバイスとマイクロコントローラユニットにLifeLearnerを配置し、リソース制約のあるプラットフォーム上で、SOTAメソッドの実行が不可能な効率的なCLと、適応可能なCLをユビキタスに極端に展開することを可能にする。コードはhttps://github.com/theyoungkwon/lifelearnerで入手できる。

Continual Learning (CL) allows applications such as user personalization and household robots to learn on the fly and adapt to context. This is an important feature when context, actions, and users change. However, enabling CL on resource-constrained embedded systems is challenging due to the limited labeled data, memory, and computing capacity. In this paper, we propose LifeLearner, a hardware-aware meta continual learning system that drastically optimizes system resources (lower memory, latency, energy consumption) while ensuring high accuracy. Specifically, we (1) exploit meta-learning and rehearsal strategies to explicitly cope with data scarcity issues and ensure high accuracy, (2) effectively combine lossless and lossy compression to significantly reduce the resource requirements of CL and rehearsal samples, and (3) develop a hardware-aware system for embedded and IoT platforms that takes the hardware characteristics into account. As a result, LifeLearner achieves near-optimal CL performance, falling short by only 2.8% on accuracy compared to an Oracle baseline. With respect to the state-of-the-art (SOTA) Meta CL method, LifeLearner drastically reduces the memory footprint (by 178.7x), end-to-end latency by 80.8-94.2%, and energy consumption by 80.9-94.2%. In addition, we successfully deployed LifeLearner on two edge devices and a microcontroller unit, thereby enabling efficient CL on resource-constrained platforms where it would be impractical to run SOTA methods and the far-reaching deployment of adaptable CL in a ubiquitous manner. Code is available at https://github.com/theyoungkwon/LifeLearner.

翻訳日:2023-11-21 20:32:51 公開日:2023-11-19

# DiffSCI:反復スペクトル拡散モデルによるゼロショットスナップショット圧縮イメージング

DiffSCI: Zero-Shot Snapshot Compressive Imaging via Iterative Spectral Diffusion Model ( http://arxiv.org/abs/2311.11417v1 )

ライセンス: Link先を確認

Zhenghao Pan, Haijin Zeng, Jiezhang Cao, Kai Zhang, Yongyong Chen

(参考訳) 本稿では,マルチスペクトラル画像(msi)のためのスナップショット圧縮画像再構成(sci)の精度向上に尽力する。そこで我々は,既存のSCI技術と画像生成モデルとを融合し,DiffSCIと呼ばれる新規なゼロショット拡散モデルを提案する。 DiffSCIは、より深い事前および最適化に基づく方法論からの構造的洞察を活用し、現代の認知拡散モデルによって提供される生成能力を補完する。具体的には,まず,プラグ・アンド・プレイ・フレームワークにおける生成的デノイザーとして,rgb画像の実質的なコーパスでトレーニングされた事前学習された拡散モデルを用いる。この統合により、特に現在の手法が効果的な解決に苦慮している場合には、SCI再構築が成功する。次に,スペクトル帯域相関を体系的に考慮し,波長ミスマッチを緩和するロバストな手法を導入することで,rgb拡散モデルをmsisにシームレスに適応させることができる。第3に、データサブプロブレムの解像度を早めるために、高速化アルゴリズムを実装した。この増強は収束速度を加速するだけでなく、再構築過程の品質を高める。我々は、DiffSCIが、自己教師付きおよびゼロショットアプローチよりも明確なパフォーマンス向上を示し、シミュレートされたデータセットと実際のデータセットの両方にまたがる教師付きトランスフォーマーよりも優れていることを示すための広範なテストを示す。私たちのコードは利用可能です。

This paper endeavors to advance the precision of snapshot compressive imaging (SCI) reconstruction for multispectral images (MSIs). To achieve this, we integrate the advantageous attributes of established SCI techniques and an image generative model, and propose a novel structured zero-shot diffusion model, dubbed DiffSCI. DiffSCI leverages the structural insights from deep prior and optimization-based methodologies, complemented by the generative capabilities offered by contemporary denoising diffusion models. Specifically, first, we employ a pre-trained diffusion model, trained on a substantial corpus of RGB images, as the generative denoiser within the Plug-and-Play framework for the first time. This integration allows for successful SCI reconstruction, especially in cases that current methods struggle to address effectively. Second, we systematically account for spectral band correlations and introduce a robust methodology to mitigate wavelength mismatch, thus enabling seamless adaptation of the RGB diffusion model to MSIs. Third, an accelerated algorithm is implemented to expedite the resolution of the data subproblem. This augmentation not only accelerates the convergence rate but also elevates the quality of the reconstruction process. We present extensive testing to show that DiffSCI exhibits discernible performance enhancements over prevailing self-supervised and zero-shot approaches, surpassing even supervised transformer counterparts across both simulated and real datasets. Our code will be available.

翻訳日:2023-11-21 20:32:18 公開日:2023-11-19

# 大規模言語モデルに対するセキュリティリスク分類法

A Security Risk Taxonomy for Large Language Models ( http://arxiv.org/abs/2311.11415v1 )

ライセンス: Link先を確認

Erik Derner and Kristina Batistič and Jan Zahálka and Robert Babuška

(参考訳) 大規模言語モデル(LLM)がより多くのアプリケーションに浸透するにつれて、関連するセキュリティリスクの評価がますます必要になる。偽情報からデータ漏洩や評判の毀損まで、悪意のある攻撃者による悪用の可能性はかなり大きい。本稿では、広く取り上げられている倫理的・社会的影響にとどまらない、LLMがもたらすセキュリティリスクに着目することで、現在の研究におけるギャップに対処する。本研究は、LLMに対するプロンプトベースの攻撃に明示的に焦点を当て、ユーザとモデル間の通信パイプラインに沿ったセキュリティリスクの分類法を提案する。プロンプトベースのインタラクションスキームの中で、攻撃をターゲットと攻撃タイプによって分類する。この分類法は、これらのリスクの実世界への影響を示す具体的な攻撃例によって補強されている。この分類法を通じて、堅牢でセキュアなLLMアプリケーションの開発に資する情報を提供し、その安全性と信頼性を高めることを目指す。

As large language models (LLMs) permeate more and more applications, an assessment of their associated security risks becomes increasingly necessary. The potential for exploitation by malicious actors, ranging from disinformation to data breaches and reputation damage, is substantial. This paper addresses a gap in current research by focusing on the security risks posed by LLMs, which extends beyond the widely covered ethical and societal implications. Our work proposes a taxonomy of security risks along the user-model communication pipeline, explicitly focusing on prompt-based attacks on LLMs. We categorize the attacks by target and attack type within a prompt-based interaction scheme. The taxonomy is reinforced with specific attack examples to showcase the real-world impact of these risks. Through this taxonomy, we aim to inform the development of robust and secure LLM applications, enhancing their safety and trustworthiness.

翻訳日:2023-11-21 20:31:54 公開日:2023-11-19

# クロスドメイン時系列解析タスクのための大規模事前学習時系列モデル

Large Pre-trained time series models for cross-domain Time series analysis tasks ( http://arxiv.org/abs/2311.11413v1 )

ライセンス: Link先を確認

Harshavardhan Kamarthi, B. Aditya Prakash

(参考訳) 大規模な事前学習モデルは、言語やビジョンのような領域における重要な進歩に貢献しており、個々の下流タスクのためのモデルトレーニングをより効率的にし、優れたパフォーマンスを提供している。しかしながら、時系列分析タスクに取り組むには、通常、タスク固有のトレーニングデータとドメイン専門知識を活用して、個別のモデルをスクラッチから設計およびトレーニングする必要がある。我々は、複数の異種時系列データセットから汎用的な時系列モデルを事前学習する際の重要な課題、すなわち、異なるドメインに由来しダイナミクスの異なる時系列をモデル化するために、意味的に有用な入力をモデルに提供するという課題に取り組む。時系列をセグメントに分割して逐次モデルへの入力とすることで意味的により良い入力が得られることを観察し、事前学習中に自己教師付き学習損失を利用してデータセットごとに最適なセグメンテーション戦略を自動的に特定する新しいモデルLPTMを提案する。 LPTMは、ドメイン固有の最先端モデルと同等かそれ以上のパフォーマンスを提供し、データ効率と計算効率も大幅に高い。複数の異なるドメインの幅広い時系列分析タスクにおいて、最大40%少ないデータと50%少ないトレーニング時間で最先端のパフォーマンスを達成する。

Large pre-trained models have been instrumental in significant advancements in domains like language and vision, making model training for individual downstream tasks more efficient and providing superior performance. However, tackling time-series analysis tasks usually involves designing and training a separate model from scratch, leveraging training data and domain expertise specific to the task. We tackle a significant challenge in pre-training a general time-series model from multiple heterogeneous time-series datasets: providing semantically useful inputs to models for modeling time series of different dynamics from different domains. We observe that partitioning time series into segments as inputs to sequential models produces semantically better inputs, and we propose a novel model, LPTM, that automatically identifies the optimal dataset-specific segmentation strategy by leveraging a self-supervised learning loss during pre-training. LPTM provides performance similar to or better than domain-specific state-of-the-art models and is significantly more data- and compute-efficient, taking up to 40% less data and 50% less training time to achieve state-of-the-art performance in a wide range of time-series analysis tasks from multiple disparate domains.

翻訳日:2023-11-21 20:31:39 公開日:2023-11-19

# ニューラル量子埋め込み: 量子教師付き学習の限界を押し上げる

Neural Quantum Embedding: Pushing the Limits of Quantum Supervised Learning ( http://arxiv.org/abs/2311.11412v1 )

ライセンス: Link先を確認

Tak Hur, Israel F. Araujo, Daniel K. Park

(参考訳) 量子埋め込みは古典的なデータに量子機械学習技術を適用するのに不可欠であり、性能に大きな影響を及ぼす。本研究では,古典的な深層学習手法を活用して量子埋め込みを効率的に最適化するニューラル量子埋め込み(NQE)を提案する。 NQEは経験的リスクの下界を改善し、分類性能を大幅に向上させる。さらに、NQEはノイズに対する堅牢性を改善する。 NQEの有効性を検証するため,IBMの量子デバイス上で画像データ分類の実験を行い,精度が0.52から0.96へと大幅に向上した。局所有効次元の数値解析は、NQEが量子ニューラルネットワークの訓練可能性と一般化性能を向上させることを示している。さらに、NQEは、期待リスクの上界の減少によって示されるように、量子カーネル法における一般化の改善を実現する。

Quantum embedding is indispensable for applying quantum machine learning techniques to classical data, and has substantial impacts on performance outcomes. In this study, we present Neural Quantum Embedding (NQE), a method that efficiently optimizes quantum embedding by leveraging classical deep learning techniques. NQE enhances the lower bound of the empirical risk, leading to substantial improvements in classification performance. Moreover, NQE improves robustness against noise. To validate the effectiveness of NQE, we conduct experiments on IBM quantum devices for image data classification, resulting in a remarkable accuracy enhancement from 0.52 to 0.96. Numerical analysis of the local effective dimension highlights that NQE improves the trainability and generalization performance of quantum neural networks. Furthermore, NQE achieves improved generalization in the quantum kernel method, as evidenced by a reduction in the upper bound of the expected risk.

翻訳日:2023-11-21 20:31:18 公開日:2023-11-19

# 機械学習応用のための交渉表現

Negotiated Representations for Machine Mearning Application ( http://arxiv.org/abs/2311.11410v1 )

ライセンス: Link先を確認

Nuri Korhan, Samet Bayram

(参考訳) 過学習(overfitting)は、機械学習モデルが長時間トレーニングされ、与えられたトレーニングラベルに対するトレーニングサンプルの厳密な適合に過度に集中した結果、テストデータで有用となる予測規則を捉えられなくなる現象である。この現象は一般に、特定のサンプルの記憶、ノイズの記憶、および多数のニューロンを用いて限られたサンプルのデータセットに無理に適合させることに起因するとされる。トレーニングが進むにつれてモデルがさまざまな特異性を符号化することは事実であるが、過学習の大部分は、明確に定義されたメンバーシップ比を調整する過程で起こると我々は主張する。本研究では,あらかじめ決められたクラスラベルを持つサンプルの出力表現をモデルが交渉できるようにすることで,機械学習モデルの分類精度を向上させる手法を提案する。入力に対するモデルの解釈と与えられたラベルとの間に交渉を設定することで,平均分類精度を向上させるだけでなく,他の正則化手法を用いることなく過学習の度合いを下げることができた。 CIFAR 10、CIFAR 100、MNISTといった公開データセットから過学習シナリオを生成し、複数のローレジーム(low regime)の機械学習問題に交渉パラダイムを適用することで、提案パラダイムが当初の目的以上の能力を持つことを示した。実験結果を共有し、機械学習コミュニティに提案パラダイムの限界を探るよう呼びかける。また、継続学習など他の研究分野における学習関連の課題を克服するために、交渉パラダイムを活用するようコミュニティに促すことも目指している。実験設定のPythonコードはGitHubにアップロードされている。

Overfitting is a phenomenon that occurs when a machine learning model is trained for too long and focuses too much on the exact fit of the training samples to the provided training labels, so that it cannot keep track of the predictive rules that would be useful on the test data. This phenomenon is commonly attributed to memorization of particular samples, memorization of noise, and forced fitting to a data set of limited samples by using a high number of neurons. While it is true that the model encodes various peculiarities as the training process continues, we argue that most of the overfitting occurs in the process of reconciling sharply defined membership ratios. In this study, we present an approach that increases the classification accuracy of machine learning models by allowing the model to negotiate output representations of the samples with previously determined class labels. By setting up a negotiation between the model's interpretation of the inputs and the provided labels, we not only increased average classification accuracy but also decreased the rate of overfitting without applying any other regularization tricks. By applying our negotiation paradigm to several low regime machine learning problems, generating overfitting scenarios from publicly available data sets such as CIFAR 10, CIFAR 100, and MNIST, we have demonstrated that the proposed paradigm has more capacity than its intended purpose. We are sharing the experimental results and inviting the machine learning community to explore the limits of the proposed paradigm. We also aim to incentivize the community to exploit the negotiation paradigm to overcome learning-related challenges in other research fields such as continual learning. The Python code of the experimental setup is uploaded to GitHub.

翻訳日:2023-11-21 20:31:03 公開日:2023-11-19

# 未来を設計する: メタバースにおけるエンタープライズ統合のモデル

Architecting the Future: A Model for Enterprise Integration in the Metaverse ( http://arxiv.org/abs/2311.11406v1 )

ライセンス: Link先を確認

Amirmohammad Nateghi and Maedeh Mosharraf

(参考訳) 約30年前にさかのぼる歴史があるが、メタバースは今日最も話題になっているテーマの1つに成長してきた。メタバースは当初エンターテインメントに関する議論に限られていたが、徐々にビジネスの議論の領域でも影響力を増してきた。メタバースを深く掘り下げる前に、情報技術(IT)に大きく依存している企業は、ITの不適切な利用や捉え方のために、失敗したりビジネスの進路から逸脱したりする可能性が非常に高いことに注意すべきである。エンタープライズアーキテクチャ(EA)という考え方は、この問題に対処するためのマネジメント戦略として現れた。 EAの最初の学派は、ITを企業における不要な負担から、指針を与え支援する力へと転換しようとした。本稿では、EAの考え方を用いてメタバースベースのプラットフォーム上の仮想企業を導く試みの結果として、拡張EAモデルを提案する。最後に、概念モデルを評価し、メタバースがビジネスを支援できることを示すために、Decentraland、Battle Infinity、Rooomの3つのケーススタディを利用した。

Although it has a history that goes back about three decades, Metaverse has grown to be one of the most talked-about subjects today. Metaverse gradually increased its influence in the realm of business discourse after initially being restricted to discussions about entertainment. Before getting deep into the Metaverse, it should be noted that failure and deviating from the business path are highly likely for an enterprise that relies heavily on information technology (IT) because of improper use and thinking about IT. The idea of enterprise architecture (EA) emerged as a management strategy to address this issue. As the first school of thought of EA, it sought to transform IT from an unnecessary burden in an enterprise to a guiding and supporting force. Then an extended EA model is suggested as a result of the attempt made in this paper to use the idea of EA to steer virtual enterprises on Metaverse-based platforms. Finally, to evaluate the conceptual model and demonstrate that the Metaverse can support businesses, three case studies Decentraland, Battle Infinity, and Rooom were utilized.

翻訳日:2023-11-21 20:30:36 公開日:2023-11-19

# オファーをください:観光業におけるフォワード・リバースオークション問題

Make me an Offer: Forward and Reverse Auctioning Problems in the Tourism Industry ( http://arxiv.org/abs/2311.11400v1 )

ライセンス: Link先を確認

Ioannis T. Christou, Dimitris Doukas, Konstantina Skouri, Gerasimos Meletiou

(参考訳) ほとんどの観光地は、定期的かつ一貫した季節性に直面しており、経済的・社会的に大きな影響を受けている。この現象は、旅行需要が増加したものの地域ごとに不均一であるポストコロナ時代において、より顕著である。顧客と宿泊事業者の双方が直面するこれらの問題に対処するため、我々は2つのオークションシステムを開発した。1つは、人気の低い地域や閑散期の宿泊事業者が部屋をオークションにかけられるフォワードオークションモデルであり、もう1つは、顧客が入札プロセスを開始し、その地域の宿泊事業者が顧客に対して部屋のオファーを提示するリバースオークションモデルで、priceline.comの入札コンセプトに類似している。我々は、両タイプのオークションを明示的に定義する数理計画モデルを構築し、どちらのタイプにおいても宿泊事業者側と顧客側の双方に大きな利益があることを示す。これらの最適化問題を近似的に解くアルゴリズム的手法について論じるとともに、厳密な最適化ソルバを用いて最適性が保証された解を求めた結果を示す。これらの手法は、中・低シーズンにおける季節性を緩和し、顧客に魅力的なオファーを提供することで、顧客と宿泊事業者の双方に有益となり得る。

Most tourist destinations are facing regular and consistent seasonality with significant economic and social impacts. This phenomenon is more pronounced in the post-covid era, where demand for travel has increased but unevenly among different geographic areas. To counter these problems that both customers and hoteliers are facing, we have developed two auctioning systems that allow hoteliers of lower popularity tier areas or during low season periods to auction their rooms in what we call a forward auction model, and also allows customers to initiate a bidding process whereby hoteliers in an area may make offers to the customer for their rooms, in what constitutes a reverse auction model initiated by the customer, similar to the bidding concept of priceline.com. We develop mathematical programming models that define explicitly both types of auctions, and show that in each type, there are significant benefits to be gained both on the side of the hotelier as well as on the side of the customer. We discuss algorithmic techniques for the approximate solution of these optimization problems, and present results using exact optimization solvers to solve them to guaranteed optimality. These techniques could be beneficial to both customer and hotelier reducing seasonality during middle and low season and providing the customer with attractive offers.

翻訳日:2023-11-21 20:30:18 公開日:2023-11-19

# 深層学習アルゴリズムの解釈可能化に向けて

Towards interpretable-by-design deep learning algorithms ( http://arxiv.org/abs/2311.11396v1 )

ライセンス: Link先を確認

Plamen Angelov, Dmitry Kangin, Ziyang Zhang

(参考訳) 提案するフレームワークIDEAL(Interpretable-by-design DEep learning ALgorithms)は、標準的な教師付き分類問題を、トレーニングデータから導出されたプロトタイプ集合との類似度の関数として再定式化すると同時に、いわゆるファンデーションモデル(FM)を構成する大規模ニューラルネットワークの既存の潜在空間を活用する。これにより、IG-3.6B + ImageNet-1KやLVD-142Mのような巨大なデータセットで事前学習されたDLモデル(例えばビジュアルトランスフォーマー、ViT)の優れた成果(ステージA)の恩恵を保ちながら、説明可能性の問題(ステージB)に対処する。我々は、このようなDLモデルを、概念的により単純で、プロトタイプを通じて説明可能なモデルに変換できることを示す。主な知見は次のとおりである。(1) 提案モデルはプロトタイプを通じて解釈可能であり、交絡した解釈の問題を緩和する。(2) 提案するIDEALフレームワークは破滅的忘却の問題を回避し、効率的なクラス増分学習を可能にする。(3) 提案するIDEALアプローチは、ViTアーキテクチャが微調整モデルと非微調整モデルの差を縮めることを示し、反復的な教師付き手法によるターゲットデータセット上での特徴空間の微調整を行わずに、わずかな時間での転移学習を可能にする。

The proposed framework named IDEAL (Interpretable-by-design DEep learning ALgorithms) recasts the standard supervised classification problem into a function of similarity to a set of prototypes derived from the training data, while taking advantage of existing latent spaces of large neural networks forming so-called Foundation Models (FM). This addresses the issue of explainability (stage B) while retaining the benefits from the tremendous achievements offered by DL models (e.g., visual transformers, ViT) pre-trained on huge data sets such as IG-3.6B + ImageNet-1K or LVD-142M (stage A). We show that one can turn such DL models into conceptually simpler, explainable-through-prototypes ones. The key findings can be summarized as follows: (1) the proposed models are interpretable through prototypes, mitigating the issue of confounded interpretations, (2) the proposed IDEAL framework circumvents the issue of catastrophic forgetting allowing efficient class-incremental learning, and (3) the proposed IDEAL approach demonstrates that ViT architectures narrow the gap between finetuned and non-finetuned models allowing for transfer learning in a fraction of time without finetuning of the feature space on a target dataset with iterative supervised methods.
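
As a rough illustration of the prototype-based classification described in this abstract, the following Python sketch assigns feature vectors to the class of their nearest prototype. It is a minimal sketch under stated assumptions, not the authors' implementation: the latent features are random stand-ins for the output of a frozen pre-trained encoder, each class gets a single prototype computed as the class mean (the paper may select prototypes differently), and the names `fit_prototypes` and `predict` are hypothetical.

```python
import numpy as np


def fit_prototypes(features, labels):
    """Return the class ids and one prototype (the class mean) per class."""
    classes = np.unique(labels)
    prototypes = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes


def predict(features, classes, prototypes):
    """Assign each feature vector to the class of its nearest prototype."""
    # Pairwise Euclidean distances, shape (n_samples, n_prototypes).
    dists = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return classes[nearest], nearest


rng = np.random.default_rng(0)
# Stand-in latent features (assumption): two well-separated clusters in 8 dimensions.
feats = np.concatenate([rng.normal(0.0, 0.1, (20, 8)), rng.normal(3.0, 0.1, (20, 8))])
labels = np.array([0] * 20 + [1] * 20)

classes, prototypes = fit_prototypes(feats, labels)
pred, nearest = predict(feats, classes, prototypes)
```

In this sketch the index in `nearest` names the prototype behind each decision, which is the sense in which such a classifier can be inspected. Adding a class only requires computing new prototypes, with no change to the encoder.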

翻訳日:2023-11-21 20:30:00 公開日:2023-11-19

# 適応スパイキングニューロンの速度精度シミュレーショントレードオフに対処する

Addressing the speed-accuracy simulation trade-off for adaptive spiking neurons ( http://arxiv.org/abs/2311.11390v1 )

ライセンス: Link先を確認

Luke Taylor, Andrew J King, Nicol S Harper

(参考訳) 適応型リーキー積分発火(adaptive leaky integrate-and-fire, ALIF)モデルは、計算神経科学における基本的なモデルであり、in silicoでの脳の研究に役立ってきた。これらの神経モデルのシミュレーションは逐次的な性質を持つため、速度と精度のトレードオフが一般的な問題となる。すなわち、小さな離散化時間ステップ(DT)を用いてニューロンを正確にシミュレートするが遅くなるか、より大きなDTを用いて高速にシミュレートするがシミュレーション精度を損なうかのいずれかである。ここでは、ALIFモデルをアルゴリズム的に再解釈し、逐次シミュレーションの複雑さを低減してGPU上でのより効率的な並列化を可能にすることで、このジレンマの解決策を提供する。合成ベンチマークにおいて、小さなDTを用いた場合に50倍以上のトレーニング高速化が得られることを計算により検証した。また、さまざまな教師付き分類タスクにおいて、標準的なALIF実装と同等の性能を、わずかなトレーニング時間で得ることができた。最後に、我々のモデルによって、皮質ニューロンの実際の電気生理学的記録を迅速かつ正確にフィッティングできることを示す。そこでは、正確なスパイクタイミングを捉えるために、非常に細かいサブミリ秒のDTが不可欠である。

The adaptive leaky integrate-and-fire (ALIF) model is fundamental within computational neuroscience and has been instrumental in studying our brains in silico. Due to the sequential nature of simulating these neural models, a commonly faced issue is the speed-accuracy trade-off: either accurately simulate a neuron using a small discretisation time-step (DT), which is slow, or more quickly simulate a neuron using a larger DT and incur a loss in simulation accuracy. Here we provide a solution to this dilemma, by algorithmically reinterpreting the ALIF model, reducing the sequential simulation complexity and permitting a more efficient parallelisation on GPUs. We computationally validate our implementation to obtain over a 50× training speedup using small DTs on synthetic benchmarks. We also obtained a comparable performance to the standard ALIF implementation on different supervised classification tasks - yet in a fraction of the training time. Lastly, we showcase how our model makes it possible to quickly and accurately fit real electrophysiological recordings of cortical neurons, where very fine sub-millisecond DTs are crucial for capturing exact spike timing.
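
To make the sequential bottleneck concrete, here is a minimal Python sketch of a textbook-style discrete-time ALIF neuron, in which each step depends on the state left by the previous one. It illustrates the standard sequential simulation only, not the parallel reinterpretation proposed in the paper. The parameter values, the exponential-decay discretisation, and the name `simulate_alif` are assumptions made for illustration.

```python
import numpy as np


def simulate_alif(current, dt=1e-3, tau_m=20e-3, tau_a=200e-3, v_th=1.0, beta=0.5):
    """Simulate one ALIF neuron step by step and return a boolean spike train."""
    decay_m = np.exp(-dt / tau_m)  # membrane decay per time step
    decay_a = np.exp(-dt / tau_a)  # adaptation decay per time step
    v, a = 0.0, 0.0
    spikes = np.zeros(len(current), dtype=bool)
    for t, i_t in enumerate(current):
        # Leaky integration of the input current.
        v = decay_m * v + (1.0 - decay_m) * i_t
        a = decay_a * a
        # The firing threshold rises with the adaptation variable.
        if v >= v_th + beta * a:
            spikes[t] = True
            v = 0.0   # reset the membrane potential
            a += 1.0  # each spike increases adaptation
    return spikes


current = np.full(1000, 2.0)  # 1 s of constant input at dt = 1 ms
spikes_adaptive = simulate_alif(current, beta=0.5)
spikes_plain = simulate_alif(current, beta=0.0)
```

Halving `dt` doubles the number of loop iterations needed for the same simulated duration, which is the speed-accuracy trade-off the abstract refers to.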

翻訳日:2023-11-21 20:29:37 公開日:2023-11-19

# 機械文化

Machine Culture ( http://arxiv.org/abs/2311.11388v1 )

ライセンス: Link先を確認

Levin Brinkmann, Fabian Baumann, Jean-François Bonnefon, Maxime Derex, Thomas F. Müller, Anne-Marie Nussberger, Agnieszka Czaplicka, Alberto Acerbi, Thomas L. Griffiths, Joseph Henrich, Joel Z. Leibo, Richard McElreath, Pierre-Yves Oudeyer, Jonathan Stray and Iyad Rahwan

(参考訳) 人類が文化を創造し、広める能力は、種としての成功の最も重要な要素としてしばしば認められている。本稿では,機械が介在する,あるいは生成する,機械文化の概念について考察する。知的機械は、変化、伝達、選択の文化的進化過程を同時に変革すると主張する。 Recommenderアルゴリズムは、社会学習のダイナミクスを変えつつある。チャットボットは新しい文化伝達様式を形成しており、文化モデルとして機能している。さらに、インテリジェントマシンは、ゲーム戦略や視覚芸術から科学的結果に至るまで、文化的な特徴を生み出す貢献者として進化している。本稿では,機械の現在および今後の文化的発展への影響を研究するための概念的枠組みと,機械文化研究のための研究課題について述べる。

The ability of humans to create and disseminate culture is often credited as the single most important factor of our success as a species. In this Perspective, we explore the notion of machine culture, culture mediated or generated by machines. We argue that intelligent machines simultaneously transform the cultural evolutionary processes of variation, transmission, and selection. Recommender algorithms are altering social learning dynamics. Chatbots are forming a new mode of cultural transmission, serving as cultural models. Furthermore, intelligent machines are evolving as contributors in generating cultural traits--from game strategies and visual art to scientific results. We provide a conceptual framework for studying the present and anticipated future impact of machines on cultural evolution, and present a research agenda for the study of machine culture.

翻訳日:2023-11-21 20:29:14 公開日:2023-11-19

# 抽出ダイアログ要約のためのLLM支援セミスーパービジョン

LLM aided semi-supervision for Extractive Dialog Summarization ( http://arxiv.org/abs/2311.11462v1 )

ライセンス: Link先を確認

Nishant Mishra (1 and 2), Gaurav Sahu (3 and 4), Iacer Calixto (1 and 2), Ameen Abu-Hanna (1 and 2), Issam H. Laradji (4 and 5) ((1) Amsterdam UMC, Department of Medical Informatics, University of Amsterdam, (2) Amsterdam Public Health, Methodology, Amsterdam, The Netherlands, (3) University of Waterloo, (4) Servicenow Research, (5) University of British Columbia)

(参考訳) チャットダイアログの高品質な要約を生成するには、しばしば大きなラベル付きデータセットが必要になる。本研究では,ラベルなしデータを効率的に用いて顧客とエージェントの対話を抽出的に要約する手法を提案する。本手法では,要約を質問応答問題として定式化し,最先端の大規模言語モデル(LLM)を用いてダイアログの擬似ラベルを生成する。次に、これらの擬似ラベルを用いてチャット要約モデルを微調整し、大規模なLLMの知識をより小さな特化モデルに効果的に転移する。TweetSummデータセットで本手法を検証し、元のラベル付きデータセットの10%を用いて65.9/57.0/61.0のROUGE-1/-2/-Lを達成できることを示す。一方、トレーニングデータセット全体で訓練された現在の最先端手法は65.16/55.81/64.37のROUGE-1/-2/-Lである。言い換えれば、最悪の場合(ROUGE-L)でも、データの10%しか使用せずに性能の94.7%を維持している。

Generating high-quality summaries for chat dialogs often requires large labeled datasets. We propose a method to efficiently use unlabeled data for extractive summarization of customer-agent dialogs. In our method, we frame summarization as a question-answering problem and use state-of-the-art large language models (LLMs) to generate pseudo-labels for a dialog. We then use these pseudo-labels to fine-tune a chat summarization model, effectively transferring knowledge from the large LLM into a smaller specialized model. We demonstrate our method on the TweetSumm dataset, and show that using 10% of the original labelled data set we can achieve 65.9/57.0/61.0 ROUGE-1/-2/-L, whereas the current state-of-the-art trained on the entire training data set obtains 65.16/55.81/64.37 ROUGE-1/-2/-L. In other words, in the worst case (i.e., ROUGE-L) we still effectively retain 94.7% of the performance while using only 10% of the data.
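
The retention figure quoted at the end of this abstract can be checked from the reported scores. The short Python sketch below divides each score obtained with 10% of the labels by the corresponding full-data score. The dictionary names are hypothetical and the numbers are copied from the abstract. The ROUGE-L ratio comes out at about 0.9476, consistent with the reported 94.7% if the value is truncated to one decimal place.

```python
# Scores copied from the abstract (10% labelled data vs. full training set).
semi_supervised = {"ROUGE-1": 65.9, "ROUGE-2": 57.0, "ROUGE-L": 61.0}
fully_supervised = {"ROUGE-1": 65.16, "ROUGE-2": 55.81, "ROUGE-L": 64.37}

# Fraction of the full-data performance retained for each metric.
retention = {m: semi_supervised[m] / fully_supervised[m] for m in semi_supervised}

# The "worst case" is the metric with the lowest retention.
worst_metric = min(retention, key=retention.get)

for metric, ratio in retention.items():
    print(f"{metric}: {ratio:.4f}")
```

On ROUGE-1 and ROUGE-2 the ratio exceeds 1, so ROUGE-L is the only metric on which the 10% setting trails the full-data baseline.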

翻訳日:2023-11-21 20:19:50 公開日:2023-11-19

# 地平線からのデコヒーレンス:一般定式化と回転ブラックホール

Decoherence from Horizons: General Formulation and Rotating Black Holes ( http://arxiv.org/abs/2311.11461v1 )

ライセンス: Link先を確認

Samuel E. Gralla and Hongji Wei

(参考訳) Danielson, Satishchandran, Wald(DSW)による最近の研究は、ブラックホール、さらにはより一般にキリング地平線が、近くにあるすべての量子重ね合わせに基本的なデコヒーレンス率を与えることを示した。この効果は測定と因果性から理解できる。ブラックホール内の観測者(ボブ)は、重ね合わされた重力場を測定することで外部の量子重ね合わせを乱すことができるはずだが、(因果性により)彼の行為がそのような影響を及ぼすことはできないため、重ね合わせは自らを自動的に乱さなければならない。 DSWは、シュワルツシルト時空における遠方の観測者、平坦時空におけるリンドラー観測者、ド・ジッター時空における静的観測者に対して、未知の数値因子を除いてデコヒーレンス率を計算した。我々は電磁場およびクライン=ゴルドン場の類似系を用いて、彼らの計算を具体化・一般化し、分岐キリング地平線近傍のキリング観測者に対する正確なデコヒーレンス率の一般公式を導出する。カーブラックホールの対称軸上の任意の位置にいる観測者に対して、この率を閉形式で評価する。これにより、遠方観測者のシュワルツシルトの結果における数値因子が確定し、地平線近傍や極限(extremal)近傍の振る舞いを新たに調べることが可能になる。電磁場の場合、クーロン場がブラックホールに入るのを遮蔽する「ブラックホール・マイスナー効果」のため、極限においてデコヒーレンスは完全に消失する。これは因果性に基づく描像を支持する。ボブは外部の重ね合わせの場を測定できないので、デコヒーレンスは必要なく、実際に生じない。

Recent work by Danielson, Satishchandran, and Wald (DSW) has shown that black holes -- and, in fact, Killing horizons more generally -- impart a fundamental rate of decoherence on all nearby quantum superpositions. The effect can be understood from measurement and causality: An observer (Bob) in the black hole should be able to disturb outside quantum superpositions by measuring their superposed gravitational fields, but since his actions cannot (by causality) have this effect, the superpositions must automatically disturb themselves. DSW calculated the rate of decoherence up to an unknown numerical factor for distant observers in Schwarzschild spacetime, Rindler observers in flat spacetime, and static observers in de Sitter spacetime. Working in electromagnetic and Klein-Gordon analogs, we flesh out and generalize their calculation to derive a general formula for the precise decoherence rate for Killing observers near bifurcate Killing horizons. We evaluate the rate in closed form for an observer at an arbitrary location on the symmetry axis of a Kerr black hole. This fixes the numerical factor in the distant-observer Schwarzschild result, while allowing new exploration of near-horizon and/or near-extremal behavior. In the electromagnetic case we find that the decoherence vanishes entirely in the extremal limit, due to the "Black hole Meissner effect" screening the Coulomb field from entering the black hole. This supports the causality picture: Since Bob is unable to measure the field of the outside superposition, no decoherence is necessary -- and indeed none occurs.

翻訳日:2023-11-21 20:19:29 公開日:2023-11-19

# 研究ソフトウェアエンジニアの基礎的能力と責任

Foundational Competencies and Responsibilities of a Research Software Engineer ( http://arxiv.org/abs/2311.11457v1 )

ライセンス: Link先を確認

Florian Goth, Renato Alves, Matthias Braun, Leyla Jael Castro, Gerasimos Chourdakis, Simon Christ, Jeremy Cohen, Fredo Erxleben, Jean-Noël Grad, Magnus Hagdorn, Toby Hodges, Guido Juckeland, Dominic Kempf, Anna-Lena Lamprecht, Jan Linxweiler, Moritz Schwarzmeier, Heidi Seibold, Jan Philipp Thiele, Harald von Waldow, Samantha Wittke

(参考訳) リサーチソフトウェアエンジニア(RSE: Research Software Engineer)という用語は、研究コミュニティで働きながらソフトウェア開発に注力する人々を表す手段として、10年あまり前に登場した。この用語は広く採用されており、RSEとは何かという高レベルな定義がいくつかある。しかし、RSEの役割は、所属する組織の状況によって異なる。スペクトルの一端では、RSEの役割は伝統的な研究職と似ているかもしれない。もう一方の端では、産業界のソフトウェアエンジニアの役割に似ている。 RSEの役割の多くは、この2つの両極の間にある。したがって、RSEが何を行うのか、RSEになるためにどのような経験、スキル、能力が必要なのかについて、単純で包括的な定義を与えることは難しい。このコミュニティペーパーでは、RSEとは何かという広い概念を定義し、RSEが担うさまざまなタイプの仕事を検討し、RSEの一般的なプロファイルを規定する基礎的能力と価値観のリストを定義する。これに基づき、これらのスキルのさまざまな次元に沿った発展について詳しく述べ、特定のタイプのRSEの役割を考察し、組織に対する推奨事項を提案し、将来的な専門分化の例を示す。付録では、既存のカリキュラムがこのフレームワークにどのように適合するかを詳述する。

The term Research Software Engineer, or RSE, emerged a little over 10 years ago as a way to represent individuals working in the research community but focusing on software development. The term has been widely adopted and there are a number of high-level definitions of what an RSE is. However, the roles of RSEs vary depending on the institutional context they work in. At one end of the spectrum, RSE roles may look similar to a traditional research role. At the other extreme, they resemble that of a software engineer in industry. Most RSE roles inhabit the space between these two extremes. Therefore, providing a straightforward, comprehensive definition of what an RSE does and what experience, skills and competencies are required to become one is challenging. In this community paper we define the broad notion of what an RSE is, explore the different types of work they undertake, and define a list of fundamental competencies as well as values that define the general profile of an RSE. On this basis, we elaborate on the progression of these skills along different dimensions, looking at specific types of RSE roles, proposing recommendations for organisations, and giving examples of future specialisations. An appendix details how existing curricula fit into this framework.

翻訳日:2023-11-21 20:18:56 公開日:2023-11-19

# 時間の矢は健在だが、物理学の通説の下では禁じられている

The Arrow of Time is Alive and Well but Forbidden Under the Received View of Physics ( http://arxiv.org/abs/2311.11456v1 )

ライセンス: Link先を確認

R. E. Kastner

(参考訳) 本論文は、「時間の矢」問題(いわゆる「2つの時間」問題)の文脈で、物理学の社会学と歴史に関するメタレベルの分析を提供する。実質的には、この2つのトピックは互いに絡み合っており、特定の形而上学的、認識論的、方法論的な信念と実践への固執を含む社会学的側面に取り組むことによってのみ、物理学において真の進歩が可能になると主張する。

This essay offers a meta-level analysis in the sociology and history of physics in the context of the "Arrow of Time" or so-called "Two Times" problem. In effect, it argues that the two topics are intertwined, and it is only by coming to grips with the sociological aspects, involving adherence to certain metaphysical, epistemological and methodological beliefs and practices, that real progress can be made in the physics.

翻訳日:2023-11-21 20:18:38 公開日:2023-11-19

# 地磁気異常のリアルタイム検出のための物理強化TinyML

Physics-Enhanced TinyML for Real-Time Detection of Ground Magnetic Anomalies ( http://arxiv.org/abs/2311.11452v1 )

ライセンス: Link先を確認

Talha Siddique and MD Shaad Mahmud

(参考訳) 地磁気擾乱(GMD)や地磁気誘導電流(GIC)のような宇宙天気現象は、重要な技術インフラに重大なリスクをもたらす。シミュレーションに基づく従来の予測モデルは、理論的には堅牢であるものの、特に不正確なデータの同化と膨大な計算複雑性という課題を抱えている。近年、Tiny Machine Learning (TinyML) が採用され、GICの代理指標として地上の磁気擾乱をリアルタイムに予測する、機械学習(ML)対応の磁力計システムの開発が進められている。 TinyMLは効率的でリアルタイムなデータ処理を提供するが、その本質的な制約のため、計算要求の高い堅牢な手法を利用できない。本稿では,これらの課題に対処する物理誘導型TinyMLフレームワークを開発した。このフレームワークは、モデルのトレーニングと圧縮の段階で物理ベースの正則化を組み込み、予測の信頼性を高める。フレームワーク内で開発されたプルーニング手法は、対象ドメインに固有の物理的特性を利用し、モデルサイズと堅牢性のバランスをとる。本研究は,開発したフレームワークと従来のフレームワークの精度と信頼性を包括的に比較した実験結果を示す。この比較分析は、リアルタイム宇宙天気予報のための堅牢なML対応磁力計システムを構想するうえで、開発したフレームワークが将来的に適用可能であることを裏付けている。

Space weather phenomena like geomagnetic disturbances (GMDs) and geomagnetically induced currents (GICs) pose significant risks to critical technological infrastructure. While traditional predictive models, grounded in simulation, hold theoretical robustness, they grapple with challenges, notably the assimilation of imprecise data and extensive computational complexities. In recent years, Tiny Machine Learning (TinyML) has been adopted to develop Machine Learning (ML)-enabled magnetometer systems for predicting real-time terrestrial magnetic perturbations as a proxy measure for GIC. While TinyML offers efficient, real-time data processing, its intrinsic limitations prevent the utilization of robust methods with high computational needs. This paper developed a physics-guided TinyML framework to address the above challenges. This framework integrates physics-based regularization at the stages of model training and compression, thereby augmenting the reliability of predictions. The developed pruning scheme within the framework harnesses the inherent physical characteristics of the domain, striking a balance between model size and robustness. The study presents empirical results, drawing a comprehensive comparison between the accuracy and reliability of the developed framework and its traditional counterpart. Such a comparative analysis underscores the prospective applicability of the developed framework in conceptualizing robust, ML-enabled magnetometer systems for real-time space weather forecasting.

翻訳日:2023-11-21 20:18:28 公開日:2023-11-19

# 重量規範制御

Weight Norm Control ( http://arxiv.org/abs/2311.11446v1 )

ライセンス: Link先を確認

Ilya Loshchilov

(参考訳) 分離型重み減衰正則化は、重みの目標ノルムを0に設定した重みノルム制御の特別な場合であることを指摘する。分離型重み減衰正則化を用いる任意の最適化手法(例えばAdamに対するAdamW)は、重みノルム制御を備えたより一般的なアルゴリズム(同じくAdamWN)の特別な場合と見なすことができる。重みの目標ノルムを0に設定することは最適でない可能性があり、他の目標ノルム値も検討できると主張する。例えば、AdamWが特定の重みノルムに到達する任意のトレーニング実行に対して、同等の重みノルムに到達するようスケジュールされたAdamWNを対抗させることができる。重み減衰の代わりに重みノルム制御を導入することのさまざまな含意について論じる。

We note that decoupled weight decay regularization is a particular case of weight norm control where the target norm of weights is set to 0. Any optimization method (e.g., Adam) which uses decoupled weight decay regularization (respectively, AdamW) can be viewed as a particular case of a more general algorithm with weight norm control (respectively, AdamWN). We argue that setting the target norm of weights to 0 can be suboptimal and other target norm values can be considered. For instance, any training run where AdamW achieves a particular norm of weights can be challenged by AdamWN scheduled to achieve a comparable norm of weights. We discuss various implications of introducing weight norm control instead of weight decay.
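
The relationship between weight decay and weight norm control can be sketched numerically. The Python snippet below shows only the decoupled regularization step and omits the gradient-based Adam update. The update rule is an assumed form, chosen so that a target norm of 0 recovers ordinary decoupled weight decay; it is not taken from the paper, and the name `norm_control_step` and the hyperparameter values are hypothetical.

```python
import numpy as np


def norm_control_step(w, lr, strength, target_norm):
    """Apply one decoupled regularization step pulling ||w|| toward target_norm."""
    norm = np.linalg.norm(w)
    if norm == 0.0:
        return w.copy()
    # With target_norm = 0 this reduces to w - lr * strength * w (weight decay).
    return w - lr * strength * (1.0 - target_norm / norm) * w


w0 = np.array([3.0, 4.0])  # initial weights with norm 5

# Target norm 0: identical to a decoupled weight decay step.
decayed = norm_control_step(w0, lr=0.1, strength=1.0, target_norm=0.0)

# Non-zero target: the norm is driven toward 2 instead of toward 0.
w = w0.copy()
for _ in range(200):
    w = norm_control_step(w, lr=0.1, strength=1.0, target_norm=2.0)
final_norm = np.linalg.norm(w)
```

Under this assumed rule the gap between the current norm and the target shrinks by a constant factor at every step, so the regularizer stops shrinking the weights once the target is reached, whereas weight decay keeps pulling them toward zero.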

翻訳日:2023-11-21 20:18:07 公開日:2023-11-19

# spot the bot:クラスタリングと情報理論技術を用いた人文とボットによるテキストの識別

Spot the Bot: Distinguishing Human-Written and Bot-Generated Texts Using Clustering and Information Theory Techniques ( http://arxiv.org/abs/2311.11441v1 )

ライセンス: Link先を確認

Vasilii Gromov and Quynh Nhu Dang

(参考訳) GPT-3のような生成モデルの発展により、生成されたテキストと人間が書いたテキストを区別することがますます困難になっている。ボット識別で良好な結果を示した研究は数多く存在する。しかし、これらの研究の大部分は、ラベル付きデータやボットモデルのアーキテクチャに関する事前知識を必要とする教師あり学習手法に依存している。本研究では,教師なし学習手法に基づき,大量のラベル付きデータに依存しないボット識別アルゴリズムを提案する。クラスタリング(クリスプおよびファジィ)による意味解析の知見と情報理論的手法を組み合わせることで,さまざまな種類のボットが生成したテキストを検出する頑健なモデルを構築する。生成されたテキストはよりカオス的になる傾向がある一方、文学作品はより複雑であることがわかった。また、人間のテキストをクラスタリングすると、ボット生成テキストのよりコンパクトでよく分離されたクラスタと比べて、よりファジィなクラスタが得られることを示す。

With the development of generative models like GPT-3, it is increasingly more challenging to differentiate generated texts from human-written ones. There is a large number of studies that have demonstrated good results in bot identification. However, the majority of such works depend on supervised learning methods that require labelled data and/or prior knowledge about the bot-model architecture. In this work, we propose a bot identification algorithm that is based on unsupervised learning techniques and does not depend on a large amount of labelled data. By combining findings in semantic analysis by clustering (crisp and fuzzy) and information techniques, we construct a robust model that detects a generated text for different types of bot. We find that the generated texts tend to be more chaotic while literary works are more complex. We also demonstrate that the clustering of human texts results in fuzzier clusters in comparison to the more compact and well-separated clusters of bot-generated texts.

翻訳日:2023-11-21 20:17:51 公開日:2023-11-19

# Slicing Aided Hyper Inference with Refinement Strategy による高度なICノードの欠陥検出と分類法の改善

Improved Defect Detection and Classification Method for Advanced IC Nodes by Using Slicing Aided Hyper Inference with Refinement Strategy ( http://arxiv.org/abs/2311.11439v1 )

ライセンス: Link先を確認

Vic De Ridder, Bappaditya Dey, Victor Blanco, Sandip Halder, Bartel Van Waeyenberge

(参考訳) 半導体製造において、リソグラフィは多くの場合、実現可能な最小のパターン寸法を決定する製造工程である。近年、高NA(開口数)EUVL(極端紫外線リソグラフィ)パラダイムに向けた進展が見られ、パターンの微細化(2nmノード以降)の進展が期待されている。しかし、高NAでは確率的欠陥が大幅に増加し、欠陥検出の複雑さがより顕著になる。現在の欠陥検査技術は(非機械学習ベースと機械学習ベースのいずれも)、高NAの寸法では満足な性能を達成できない。本研究では,Slicing Aided Hyper Inference (SAHI) フレームワークを用いて現在の手法を改善する方法を検討する。 SAHIでは、SEM画像を切り出して拡大したスライスに対して推論を行う。これにより、物体検出器の受容野が小さな欠陥インスタンスをより効果的に捉えられるようになる。まず,これまでに検討された半導体データセットでの性能をさまざまな構成でベンチマークし,SAHIにより小さな欠陥の検出が約2倍と大幅に向上することを示す。次に、トレーニング中に遭遇しなかったシナリオを含む新しいテストデータセットにおいて、以前に訓練されたモデルが失敗したのに対し、SAHIの適用により完璧な検出率が得られることを示す。最後に、真陽性の予測を大きく減らすことなく偽陽性の予測を排除するSAHIの拡張を定式化する。

In semiconductor manufacturing, lithography has often been the manufacturing step defining the smallest possible pattern dimensions. In recent years, progress has been made towards the high-NA (Numerical Aperture) EUVL (Extreme-Ultraviolet-Lithography) paradigm, which promises to advance pattern shrinking (2 nm node and beyond). However, stochastic defects increase significantly and defect detection becomes more complex with high-NA. Present defect inspection techniques (both non-machine learning and machine learning based) fail to achieve satisfactory performance at high-NA dimensions. In this work, we investigate the use of the Slicing Aided Hyper Inference (SAHI) framework for improving upon current techniques. Using SAHI, inference is performed on size-increased slices of the SEM images. This leads to the object detector's receptive field being more effective in capturing small defect instances. First, the performance on previously investigated semiconductor datasets is benchmarked across various configurations, and the SAHI approach is demonstrated to substantially enhance the detection of small defects, by approximately 2x. Afterwards, we also demonstrate that applying SAHI leads to flawless detection rates on a new test dataset, with scenarios not encountered during training, whereas previously trained models failed. Finally, we formulate an extension of SAHI that does not significantly reduce true-positive predictions while eliminating false-positive predictions.
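
The slicing step behind sliced inference can be sketched in plain Python: cut the image into overlapping windows, run the detector on each window, then shift each detection back into full-image coordinates. This is a minimal illustration under assumptions, not the SAHI library's API or the authors' pipeline; the function names, the 20% overlap default, and the edge handling are all hypothetical, and the upscaling of slices and the detector itself are omitted.

```python
def slice_boxes(image_w, image_h, slice_w, slice_h, overlap_ratio=0.2):
    """Return (x_min, y_min, x_max, y_max) windows that cover the image.

    Neighbouring windows overlap by roughly `overlap_ratio` of the slice
    size. Windows that would cross the right or bottom edge are shifted
    back so they stay inside the image.
    """
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    if min(image_w, image_h, slice_w, slice_h) <= 0:
        raise ValueError("sizes must be positive")
    step_x = max(1, int(slice_w * (1 - overlap_ratio)))
    step_y = max(1, int(slice_h * (1 - overlap_ratio)))
    boxes = []
    y = 0
    while True:
        y_max = min(y + slice_h, image_h)
        y_min = max(0, y_max - slice_h)
        x = 0
        while True:
            x_max = min(x + slice_w, image_w)
            x_min = max(0, x_max - slice_w)
            boxes.append((x_min, y_min, x_max, y_max))
            if x_max >= image_w:
                break
            x += step_x
        if y_max >= image_h:
            break
        y += step_y
    return boxes


def to_full_image(box_in_slice, slice_box):
    """Shift a detection box from slice coordinates to image coordinates."""
    offset_x, offset_y = slice_box[0], slice_box[1]
    x0, y0, x1, y1 = box_in_slice
    return (x0 + offset_x, y0 + offset_y, x1 + offset_x, y1 + offset_y)
```

Because each window is a small crop, a defect that spans only a few pixels of the full SEM image occupies a larger fraction of the detector's input, which is the effect the abstract attributes to slicing.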

翻訳日:2023-11-21 20:17:35 公開日:2023-11-19

# Buresと形状距離の双対性とニューラル表現比較への示唆

Duality of Bures and Shape Distances with Implications for Comparing Neural Representations ( http://arxiv.org/abs/2311.11436v1 )

ライセンス: Link先を確認

Sarah E. Harvey, Brett W. Larsen, Alex H. Williams

(参考訳) ニューラルネットワーク表現間の複数の(非)類似度尺度が提案され、その結果、断片化された研究ランドスケープが生み出された。これらの尺度のほとんどは2つのカテゴリーの1つに分類される。第一に、線形回帰、正準相関解析(CCA)、形状距離といった尺度は、期待される不変性を考慮して類似性を定量化するために神経ユニット間の明示的なマッピングを学習する。第二に、表現類似度解析(RSA)、中心化カーネルアライメント(CKA)、正規化Bures類似度(NBS)といった尺度は、期待される対称性に対して既に不変である刺激×刺激のカーネル行列のような要約統計量において類似度を定量化する。ここでは、リーマン形状距離(カテゴリー1)の余弦がNBS(カテゴリー2)に等しいことを観察することによって、これら2つの広いカテゴリーの手法を統一するための一歩を踏み出す。この関係が形状距離やNBSの新たな解釈にどのようにつながるかを考察し、深層学習の文献で一般的な類似度尺度であるCKAと対比する。

A multitude of (dis)similarity measures between neural network representations have been proposed, resulting in a fragmented research landscape. Most of these measures fall into one of two categories. First, measures such as linear regression, canonical correlation analysis (CCA), and shape distances all learn explicit mappings between neural units to quantify similarity while accounting for expected invariances. Second, measures such as representational similarity analysis (RSA), centered kernel alignment (CKA), and normalized Bures similarity (NBS) all quantify similarity in summary statistics, such as stimulus-by-stimulus kernel matrices, which are already invariant to expected symmetries. Here, we take steps towards unifying these two broad categories of methods by observing that the cosine of the Riemannian shape distance (from category 1) is equal to NBS (from category 2). We explore how this connection leads to new interpretations of shape distances and NBS, and contrast these measures with CKA, a popular similarity measure in the deep learning literature.
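
The stated identity can be checked in a few lines. The derivation below is a sketch under assumed notation that does not appear in the abstract: $X, Y \in \mathbb{R}^{n \times d}$ are mean-centered response matrices ($n$ stimuli, $d$ units), $K_X = X X^\top$ and $K_Y = Y Y^\top$ are the linear kernel matrices, $\|\cdot\|_*$ is the nuclear norm, and $\theta(X, Y) = \min_{Q \in O(d)} \arccos\big(\operatorname{tr}(X^\top Y Q) / (\|X\|_F \|Y\|_F)\big)$ is the Riemannian shape distance.

```latex
% Assumed notation (not from the abstract): X, Y are n x d centered matrices,
% K_X = X X^\top, K_Y = Y Y^\top, \|\cdot\|_* is the nuclear norm.
\begin{align*}
\cos\theta(X, Y)
  &= \max_{Q \in O(d)} \frac{\operatorname{tr}(X^\top Y Q)}{\|X\|_F \, \|Y\|_F}
   = \frac{\|X^\top Y\|_*}{\|X\|_F \, \|Y\|_F}, \\
\mathrm{NBS}(K_X, K_Y)
  &= \frac{\operatorname{tr}\!\big[(K_X^{1/2} K_Y K_X^{1/2})^{1/2}\big]}
          {\sqrt{\operatorname{tr}(K_X)\,\operatorname{tr}(K_Y)}}
   = \frac{\|K_X^{1/2} Y\|_*}{\|X\|_F \, \|Y\|_F}
   = \frac{\|X^\top Y\|_*}{\|X\|_F \, \|Y\|_F}.
\end{align*}
```

The first line uses the fact that $\max_{Q \in O(d)} \operatorname{tr}(M Q) = \|M\|_*$, attained at $Q = V U^\top$ for the SVD $M = U S V^\top$. The second line uses $K_X^{1/2} K_Y K_X^{1/2} = (K_X^{1/2} Y)(K_X^{1/2} Y)^\top$ and $\operatorname{tr}(K_X) = \|X\|_F^2$. The final step holds because $(K_X^{1/2} Y)^\top (K_X^{1/2} Y) = Y^\top X X^\top Y = (X^\top Y)^\top (X^\top Y)$, so $K_X^{1/2} Y$ and $X^\top Y$ have the same nonzero singular values.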

翻訳日:2023-11-21 20:17:13 公開日:2023-11-19

# 世論を明らかにする: インドにおける新型コロナウイルスワクチンの機械学習に基づく感情分析

Unveiling Public Perceptions: Machine Learning-Based Sentiment Analysis of COVID-19 Vaccines in India ( http://arxiv.org/abs/2311.11435v1 )

ライセンス: Link先を確認

Milind Gupta and Abhishek Kaushik

(参考訳) 2020年3月、世界保健機関(WHO)は、新型コロナウイルス感染症がほぼすべての国に広がったことを受けて、世界的なパンデミックを宣言した。2021年半ばまでに、インドはコビシールド、コバクシン、スプートニクの3つのワクチンを導入した。インドのような人口密度の高い国でワクチン接種を成功させるためには、大衆の感情を理解することが不可欠だった。ソーシャルメディア、特に4億3000万人以上のユーザーを抱えるRedditは、情報を広める上で重要な役割を果たした。この研究では、データマイニング技術を用いてRedditのデータを分析し、新型コロナウイルスワクチンに対するインド人の感情を測定する。PythonのTextBlobライブラリを使って、一般的な感情を評価するためにコメントに注釈を付ける。その結果、インドのRedditユーザーのほとんどが予防接種について中立的であり、これは人口のかなりの部分に予防接種を行おうとするインド政府の取り組みにとって課題となることが示された。

In March 2020, the World Health Organization declared COVID-19 a global pandemic as it spread to nearly every country. By mid-2021, India had introduced three vaccines: Covishield, Covaxin, and Sputnik. To ensure successful vaccination in a densely populated country like India, understanding public sentiment was crucial. Social media, particularly Reddit with over 430 million users, played a vital role in disseminating information. This study employs data mining techniques to analyze Reddit data and gauge Indian sentiments towards COVID-19 vaccines. Using Python's TextBlob library, comments are annotated to assess general sentiments. Results show that most Reddit users in India expressed neutrality about vaccination, posing a challenge for the Indian government's efforts to vaccinate a significant portion of the population.
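
The annotation step described above can be sketched as a mapping from a per-comment polarity score to a sentiment label, followed by a tally. The sketch assumes scores in [-1, 1], the range of TextBlob's polarity value; the `neutral_band` threshold and the function names are hypothetical choices for illustration, since the abstract does not state how the authors bin scores.

```python
from collections import Counter


def label_polarity(polarity, neutral_band=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label.

    Scores within `neutral_band` of zero count as neutral; the band
    width is an illustrative assumption, not a value from the paper.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1, 1]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


def sentiment_shares(polarities, neutral_band=0.05):
    """Return the fraction of comments carrying each sentiment label."""
    counts = Counter(label_polarity(p, neutral_band) for p in polarities)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one polarity score")
    return {label: counts[label] / total
            for label in ("negative", "neutral", "positive")}
```

With real data, the list passed to `sentiment_shares` would hold one polarity score per Reddit comment, and a dominant "neutral" share would match the result the abstract reports.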

翻訳日:2023-11-21 20:16:54 公開日:2023-11-19

# トポロジカルエッジ状態による光子遮断の促進

Enhancement of photon blockade via topological edge states ( http://arxiv.org/abs/2311.11431v1 )

ライセンス: Link先を確認

Jun Li, Can-ming Hu, Yaping Yang

(参考訳) 量子技術は、特定のタスクにおいて古典的な技術よりも指数関数的に優れた性能を約束するが、量子光源の不安定性、量子デコヒーレンス、損失に対する脆弱性といった課題に一貫して直面しており、トポロジカルフォトニクスはこれらにうまく対処できる。ここでは, トポロジカル保護により単一光子遮断(単一PB)効果を大幅に向上させる量子Su-Schrieffer-Heeger型鎖を理論的に提案する。結合強度を意図的に設計することにより、量子準位格子は、単一励起空間ではトポロジカルエッジ状態を持つ1次元配列、2励起空間ではトポロジカルコーナー状態を持つ2次元正方ブリージング格子の形をとり、その結果、単一光子励起が増強され、2光子遷移が抑制される。そのため, 2次相関関数は共振器の共鳴周波数において最大2桁減少し、輝度も高まる。さらに, PB効果は、トポロジカル保護の恩恵により、共振器-量子ビット結合および量子ビット周波数の局所摂動に対して頑健である。

Quantum technologies, which promise exponentially superior performance over their classical counterparts for certain tasks, have consistently encountered challenges, including instability in quantum light sources, quantum decoherence, and vulnerability to losses, that topological photonics is well suited to address. Here, we theoretically put forth a quantum Su-Schrieffer-Heeger-type chain designed to greatly enhance the single-photon blockade (single-PB) effect with topological protection. By deliberately designing the coupling strengths, the quantum-level lattices take the form of a one-dimensional array with a topological edge state in single-excitation space and a two-dimensional square breathing lattice with topological corner states in two-excitation space, resulting in enhanced single-photon excitation and the suppression of two-photon transitions. Therefore, the second-order correlation function is diminished by up to two orders of magnitude at the cavity resonance frequency, accompanied by stronger brightness. Furthermore, the PB effect is robust to local perturbations in cavity-qubit coupling and qubit frequency, benefitting from topological protection.
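
For readers unfamiliar with topological edge states, the textbook single-particle Su-Schrieffer-Heeger (SSH) chain gives the simplest example. The sketch below builds an open chain with an odd number of sites and alternating couplings `t1` and `t2`, then checks the analytic zero-energy state that lives on the even sites and decays as `(-t1/t2)**j` away from the left edge. It is not the authors' cavity-qubit model; the function names and parameter values are assumptions for illustration.

```python
def ssh_hamiltonian(num_cells, t1, t2):
    """Tight-binding matrix of an open SSH chain with 2*num_cells + 1 sites.

    The bond between sites i and i+1 has strength t1 when i is even
    and t2 when i is odd.
    """
    n = 2 * num_cells + 1
    h = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        t = t1 if i % 2 == 0 else t2
        h[i][i + 1] = t
        h[i + 1][i] = t
    return h


def zero_mode(num_cells, t1, t2):
    """Normalized zero-energy state: amplitude (-t1/t2)**j on site 2j."""
    ratio = -t1 / t2
    psi = [0.0] * (2 * num_cells + 1)
    for j in range(num_cells + 1):
        psi[2 * j] = ratio ** j
    norm = sum(a * a for a in psi) ** 0.5
    return [a / norm for a in psi]


def matvec(h, psi):
    """Multiply the matrix h by the vector psi."""
    return [sum(hij * pj for hij, pj in zip(row, psi)) for row in h]


# Illustrative parameters: weak first bond, so the state hugs the left edge.
h = ssh_hamiltonian(6, 0.5, 1.0)
psi = zero_mode(6, 0.5, 1.0)
residual = matvec(h, psi)  # every entry should vanish up to rounding
```

The state has zero energy for any nonzero `t1` and `t2`, which is the kind of robustness to coupling perturbations that topological protection refers to; only the edge at which it localizes depends on which coupling is weaker.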

翻訳日:2023-11-21 20:16:42 公開日:2023-11-19

# ニューラルネットワークトレーニングにおける重みと入力間の高速重み付き内積同定

Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training ( http://arxiv.org/abs/2311.11429v1 )

ライセンス: Link先を確認

Lianke Qin, Saayan Mitra, Zhao Song, Yuanyuan Yang, Tianyi Zhou

(参考訳) 本稿では、Light Bulb問題(\cite{prr89})を一般化する重い内積の同定問題を考える。$|A|=|B| = n$ である2つの集合 $A \subset \{-1,+1\}^d$ と $B \subset \{-1,+1\}^d$ が与えられ、内積があるしきい値を超えるペアがちょうど $k$ 個存在するとき、すなわち、しきい値 $\rho \in (0,1)$ に対して $\forall i \in [k], \langle a_i,b_i \rangle \geq \rho \cdot d$ を満たす $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ が存在するとき、それら $k$ 個の重い内積を特定することが目標である。我々は、$O(n^{2 \omega / 3+ o(1)})$ 時間で実行され、しきい値 $\rho \cdot d$ を超える $k$ 個の内積ペアを高い確率で見つけるアルゴリズムを提供する。ここで $\omega$ は現在の行列乗算指数である。この問題を解くことで、提案手法はReLU活性化関数を持つニューラルネットワークの訓練を高速化する。

In this paper, we consider a heavy inner product identification problem, which generalizes the Light Bulb problem (\cite{prr89}): Given two sets $A \subset \{-1,+1\}^d$ and $B \subset \{-1,+1\}^d$ with $|A|=|B| = n$, if there are exactly $k$ pairs whose inner product passes a certain threshold, i.e., $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ such that $\forall i \in [k], \langle a_i,b_i \rangle \geq \rho \cdot d$, for a threshold $\rho \in (0,1)$, the goal is to identify those $k$ heavy inner products. We provide an algorithm that runs in $O(n^{2 \omega / 3+ o(1)})$ time to find the $k$ inner product pairs that surpass the $\rho \cdot d$ threshold with high probability, where $\omega$ is the current matrix multiplication exponent. By solving this problem, our method speeds up the training of neural networks with ReLU activation function.
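
To make the problem statement concrete, here is the naive baseline that checks every pair, which costs $O(n^2 d)$ time. It is not the authors' algorithm, whose point is to beat this quadratic scan; the function name and the small example vectors are hypothetical.

```python
def heavy_inner_products(A, B, rho):
    """Brute-force search for heavy inner products.

    A and B are lists of equal-length vectors with entries in {-1, +1}.
    Returns every index pair (i, j) with <A[i], B[j]> >= rho * d,
    where d is the vector dimension.
    """
    if not A or not B:
        return []
    d = len(A[0])
    threshold = rho * d
    pairs = []
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            if sum(x * y for x, y in zip(a, b)) >= threshold:
                pairs.append((i, j))
    return pairs


# Tiny illustrative instance with d = 4 and threshold rho * d = 2.
A = [[1, 1, 1, 1], [1, -1, 1, -1]]
B = [[1, 1, 1, -1], [-1, 1, -1, 1]]
pairs = heavy_inner_products(A, B, rho=0.5)
```

On this instance the inner products are 2, 0, 2 and -4, so the pairs `(0, 0)` and `(1, 0)` meet the threshold.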

翻訳日:2023-11-21 20:16:19 公開日:2023-11-19

PDF登録状況（公開日: 20231119）