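Fugu-MT: Translations of arXiv Papers

This site provides Japanese translations of arXiv papers that are 30 pages or fewer and released under a Creative Commons license (CC 0, CC BY, CC BY-SA). For papers whose body text is not under a CC license, or that are too long, only the metadata is translated. (arXiv metadata is CC 0.) The translations are licensed under CC BY-SA 4.0. Translation is performed with Fugu-Machine Translator.

The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from the use of this site (including all information and translations).

Papers with a publication date of 20231128.

Title	Authors	Abstract	Publication date / translation date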
# Cybercrime Bitcoin Revenue Estimations: Quantifying the Impact of Methodology and Coverage ( http://arxiv.org/abs/2309.03592v2 ) License: see the linked page	Gibran Gomez, Kevin van Liebergen, Juan Caballero	Multiple works have leveraged the public Bitcoin ledger to estimate the revenue cybercriminals obtain from their victims. Estimations focusing on the same target often do not agree, due to the use of different methodologies, seed addresses, and time periods. These factors make it challenging to understand the impact of their methodological differences. Furthermore, they underestimate the revenue due to the (lack of) coverage on the target's payment addresses, but how large this impact is remains unknown. In this work, we perform the first systematic analysis of the estimation of cybercrime bitcoin revenue. We implement a tool that can replicate the different estimation methodologies. Using our tool, we can quantify, in a controlled setting, the impact of the different methodology steps. In contrast to what is widely believed, we show that the revenue is not always underestimated. There exist methodologies that can introduce huge overestimation.
We collect 30,424 payment addresses and use them to compare the financial impact of 6 cybercrimes (ransomware, clippers, sextortion, Ponzi schemes, giveaway scams, exchange scams) and of 141 cybercriminal groups. We observe that the popular multi-input clustering fails to discover addresses for 40% of groups. We quantify, for the first time, the impact of the (lack of) coverage on the estimation. For this, we propose two techniques to achieve high coverage, possibly nearly complete, on the DeadBolt server ransomware. Our expanded coverage enables estimating DeadBolt's revenue at $2.47M, 39 times higher than the estimation using two popular Internet scan engines.	Translated: 2024-03-25 22:59:44 Published: 2023-11-28
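The multi-input clustering mentioned in the abstract above is commonly described as grouping addresses that appear together as inputs of one transaction, on the assumption that a single entity controls them. The following is a minimal sketch of that heuristic only, not the authors' tool. The function name `multi_input_clusters` and the toy address labels are hypothetical, and each transaction is reduced to a plain list of its input addresses.

```python
from collections import defaultdict


def multi_input_clusters(transactions):
    """Group addresses that are co-spent as inputs of the same transaction.

    `transactions` is an iterable of lists; each list holds the input
    addresses of one transaction (a simplification made for this sketch).
    """
    parent = {}

    def find(addr):
        # Union-find lookup with path halving.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # register single-input addresses too
        for addr in inputs[1:]:
            union(first, addr)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return list(clusters.values())


# Hypothetical toy data: "D" is never co-spent with another address,
# so the heuristic cannot expand it beyond a singleton cluster.
toy_transactions = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
clusters = multi_input_clusters(toy_transactions)
```

In this toy input, a seed address such as "D" stays alone in its cluster. That is one plausible way for the heuristic to find no additional addresses for a group, as the abstract reports for 40% of groups.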
# "Do Users fall for Real Adversarial Phishing?" Investigating the Human response to Evasive Webpages ( http://arxiv.org/abs/2311.16383v1 ) License: see the linked page	Ajka Draganovic, Savino Dambra, Javier Aldana Iuit, Kevin Roundy, Giovanni Apruzzese	Phishing websites are everywhere, and countermeasures based on static blocklists cannot cope with such a threat. To address this problem, state-of-the-art solutions entail the application of machine learning (ML) to detect phishing websites by checking if they visually resemble webpages of well-known brands. These techniques have achieved promising results in research and, consequently, some security companies have begun to deploy them in their phishing detection systems (PDS). However, ML methods are not perfect and some samples are bound to bypass even production-grade PDS. In this paper, we scrutinize whether 'genuine phishing websites' that evade 'commercial ML-based PDS' represent a problem "in reality". Although nobody likes landing on a phishing webpage, a false negative may not lead to serious consequences if the users (i.e., the actual target of phishing) can recognize that "something is phishy".
Practically, we carry out the first user study (N=126) wherein we assess whether unsuspecting users (having diverse backgrounds) are deceived by 'adversarial' phishing webpages that evaded a real PDS. We found that some well-crafted adversarial webpages can trick most participants (even IT experts), although others are easily recognized by most users. Our study is relevant for practitioners, since it allows prioritizing phishing webpages that simultaneously fool (i) machines and (ii) humans -- i.e., their intended targets.	Translated: 2024-03-18 15:42:08 Published: 2023-11-28
# Understanding the Process of Data Labeling in Cybersecurity ( http://arxiv.org/abs/2311.16388v1 ) License: see the linked page	Tobias Braun, Irdin Pekaric, Giovanni Apruzzese	Many domains now leverage the benefits of Machine Learning (ML), which promises solutions that can autonomously learn to solve complex tasks by training over some data. Unfortunately, in cyberthreat detection, high-quality data is hard to come by. Moreover, for some specific applications of ML, such data must be labeled by human operators. Many works "assume" that labeling is tough/challenging/costly in cyberthreat detection, thereby proposing solutions to address such a hurdle. Yet, we found no work that specifically addresses the process of labeling 'from the viewpoint of ML security practitioners'. This is a problem: to date, it is still mostly unknown how labeling is done in practice -- thereby preventing one from pinpointing "what is needed" in the real world. In this paper, we take the first step to build a bridge between academic research and security practice in the context of data labeling. First, we reach out to five subject matter experts and carry out open interviews to identify pain points in their labeling routines.
Then, by using our findings as a scaffold, we conduct a user study with 13 practitioners from large security companies, and ask detailed questions on subjects such as active learning, costs of labeling, and revision of labels. Finally, we perform proof-of-concept experiments addressing labeling-related aspects in cyberthreat detection that are sometimes overlooked in research. Altogether, our contributions and recommendations serve as a stepping stone to future endeavors aimed at improving the quality and robustness of ML-driven security systems. We release our resources.	Translated: 2024-03-18 15:42:08 Published: 2023-11-28
# Threshold Breaker: Can Counter-Based RowHammer Prevention Mechanisms Truly Safeguard DRAM? ( http://arxiv.org/abs/2311.16460v1 ) License: see the linked page	Ranyang Zhou, Jacqueline Liu, Sabbir Ahmed, Nakul Kochar, Adnan Siraj Rakin, Shaahin Angizi	This paper challenges the existing victim-focused counter-based RowHammer detection mechanisms by experimentally demonstrating a novel multi-sided fault injection attack technique called Threshold Breaker. This mechanism can effectively bypass the most advanced counter-based defense mechanisms by soft-attacking the rows at a farther physical distance from the target rows. While no prior work has demonstrated the effect of such an attack, our work closes this gap by systematically testing 128 real commercial DDR4 DRAM products and reveals that the Threshold Breaker affects various chips from major DRAM manufacturers. As a case study, we compare the performance efficiency between our mechanism and a well-known double-sided attack by performing adversarial weight attacks on a modern Deep Neural Network (DNN). The results demonstrate that the Threshold Breaker can deliberately deplete the intelligence of the targeted DNN system while DRAM is fully protected.	Translated: 2024-03-18 13:44:50 Published: 2023-11-28
# Abusing Processor Exception for General Binary Instrumentation on Bare-metal Embedded Devices ( http://arxiv.org/abs/2311.16532v1 ) License: see the linked page	Shipei Qu, Xiaolin Zhang, Chi Zhang, Dawu Gu	Analyzing the security of closed-source drivers and libraries in embedded systems holds significant importance, given their fundamental role in the supply chain. Unlike x86, embedded platforms lack comprehensive binary manipulating tools, making it difficult for researchers and developers to effectively detect and patch security issues in such closed-source components. Existing works either depend on full-fledged operating system features or suffer from tedious corner cases, restricting their application to bare-metal firmware prevalent in embedded environments. In this paper, we present PIFER (Practical Instrumenting Framework for Embedded fiRmware) that enables general and fine-grained static binary instrumentation for embedded bare-metal firmware. By abusing the built-in hardware exception-handling mechanism of the embedded processors, PIFER can perform instrumentation on arbitrary target addresses. Additionally, we propose an instruction translation-based scheme to guarantee the correct execution of the original firmware after patching.
We evaluate PIFER against real-world, complex firmware, including Zephyr RTOS, CoreMark benchmark, and a closed-source commercial product. The results indicate that PIFER correctly instrumented 98.9% of the instructions. Further, a comprehensive performance evaluation was conducted, demonstrating the practicality and efficiency of our work.	Translated: 2024-03-18 13:44:50 Published: 2023-11-28
# Secure Traversable Event logging for Responsible Identification of Vertically Partitioned Health Data ( http://arxiv.org/abs/2311.16575v1 ) License: see the linked page	Sunanda Bose, Dusica Marijan	We aim to provide a solution for the secure identification of sensitive medical information. We consider a repository of de-identified medical data that is stored in the custody of a Healthcare Institution. The identifying information that is stored separately can be associated with the medical information only by a subset of users referred to as custodians. This paper intends to secure the process of associating identifying information with sensitive medical information. We also enforce the responsibility of the custodians by maintaining an immutable ledger documenting the events of such information identification. The paper proposes a scheme for constructing ledger entries that allow the custodians and patients to browse through the entries which they are associated with. However, in order to respect their privacy, such traversal requires appropriate credentials to ensure that a user cannot gain any information regarding the other users involved in the system unless they are both involved in the same operation.	Translated: 2024-03-18 13:44:50 Published: 2023-11-28
# A Unified Hardware-based Threat Detector for AI Accelerators ( http://arxiv.org/abs/2311.16684v1 ) License: see the linked page	Xiaobei Yan, Han Qiu, Tianwei Zhang	The proliferation of AI technology gives rise to a variety of security threats, which significantly compromise the confidentiality and integrity of AI models and applications. Existing software-based solutions mainly target one specific attack, and require implementation within the models, rendering them less practical. We design UniGuard, a novel unified and non-intrusive detection methodology to safeguard FPGA-based AI accelerators. The core idea of UniGuard is to harness power side-channel information generated during model inference to spot any anomaly. We employ a Time-to-Digital Converter to capture power fluctuations and train a supervised machine learning model to identify various types of threats. Evaluations demonstrate that UniGuard can achieve 94.0% attack detection accuracy, with high generalization over unknown or adaptive attacks and robustness against varied configurations (e.g., sensor frequency and location).	Translated: 2024-03-18 13:44:50 Published: 2023-11-28
# Blockchain-based Zero Trust on the Edge ( http://arxiv.org/abs/2311.16744v1 ) License: see the linked page	Cem Bicer, Ilir Murturi, Praveen Kumar Donta, Schahram Dustdar	Internet of Things (IoT) devices pose significant security challenges due to their heterogeneity (i.e., hardware and software) and vulnerability to extensive attack surfaces. Today's conventional perimeter-based systems use credential-based authentication (e.g., username/password, certificates, etc.) to decide whether an actor can access a network. However, the verification process occurs only at the system's perimeter because most IoT devices lack robust security measures due to their limited hardware and software capabilities, making them highly vulnerable. Therefore, this paper proposes a novel approach based on Zero Trust Architecture (ZTA) extended with blockchain to further enhance security. The blockchain component serves as an immutable database for storing users' requests and is used to verify trustworthiness by analyzing and identifying potentially malicious user activities. We discuss the framework, processes of the approach, and the experiments carried out on a testbed to validate its feasibility and applicability in the smart city context. Lastly, the evaluation focuses on non-functional properties such as performance, scalability, and complexity.	Translated: 2024-03-18 13:44:50 Published: 2023-11-28
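The abstract above describes the blockchain component as an immutable database of user requests. The following is a minimal sketch of the tamper-evidence idea behind such a store, assuming a single-node hash chain with no consensus or networking. It is not the paper's framework; the class `RequestLedger` and the request fields are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def _digest(index, request, prev_hash):
    # Hash a canonical JSON encoding of the block contents.
    payload = json.dumps(
        {"index": index, "request": request, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RequestLedger:
    """Append-only log in which every block commits to its predecessor."""

    def __init__(self):
        self.blocks = []

    def append(self, request):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS_HASH
        index = len(self.blocks)
        block = {
            "index": index,
            "request": request,
            "prev_hash": prev_hash,
            "hash": _digest(index, request, prev_hash),
        }
        self.blocks.append(block)
        return block

    def is_intact(self):
        # Recompute every hash; an edited block breaks the chain.
        prev_hash = GENESIS_HASH
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev_hash"] != prev_hash:
                return False
            if block["hash"] != _digest(i, block["request"], prev_hash):
                return False
            prev_hash = block["hash"]
        return True


# Hypothetical request records, for illustration only.
ledger = RequestLedger()
ledger.append({"user": "alice", "resource": "camera-7", "action": "read"})
ledger.append({"user": "bob", "resource": "gate-2", "action": "open"})
```

Editing any stored request after the fact makes `is_intact()` return `False`. A real deployment would additionally rely on replication and consensus, which this sketch omits.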
# A Direct Lazy Sampling Proof Technique in Probabilistic Relational Hoare Logic ( http://arxiv.org/abs/2311.16844v1 ) License: see the linked page	Roberto Metere, Changyu Dong	Programs using random values can either make all choices in advance (eagerly) or sample as needed (lazily). In formal proofs, we focus on indistinguishability between two lazy programs, a common requirement in the random oracle model (ROM). While rearranging sampling instructions often solves this, it gets complex when sampling is spread across procedures. The traditional approach, introduced by Bellare and Rogaway in 2004, converts programs to eager sampling, but requires assuming finite memory, a polynomial bound, and artificial resampling functions. We introduce a novel approach in probabilistic Relational Hoare Logic (pRHL) that directly proves indistinguishability, eliminating the need for conversions and the mentioned assumptions. We also implement this approach in the EasyCrypt theorem prover, showing that it can be a convenient alternative to the traditional method.	Translated: 2024-03-18 13:44:50 Published: 2023-11-28
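The eager versus lazy distinction in the abstract above can be illustrated with a toy random oracle. This sketch only shows the two sampling styles; it does not model pRHL or EasyCrypt. The class names, the 8-bit output size, and the use of Python's non-cryptographic `random` module are assumptions made for illustration.

```python
import random


class LazyOracle:
    """Samples the answer for an input only when it is first queried."""

    def __init__(self, seed, out_bits=8):
        self._rng = random.Random(seed)
        self._out_bits = out_bits
        self._table = {}

    def query(self, x):
        if x not in self._table:
            self._table[x] = self._rng.getrandbits(self._out_bits)
        return self._table[x]


class EagerOracle:
    """Samples answers for the whole (finite) domain up front."""

    def __init__(self, seed, domain, out_bits=8):
        rng = random.Random(seed)
        self._table = {x: rng.getrandbits(out_bits) for x in domain}

    def query(self, x):
        return self._table[x]


lazy = LazyOracle(seed=1)
eager = EagerOracle(seed=1, domain=range(16))
```

The eager version has to enumerate a finite domain in advance, whereas the lazy version needs no such bound. This loosely mirrors the finite-memory assumption that the abstract attributes to the eager conversion.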
# Message Recovery Attack in NTRU through VFK Lattices ( http://arxiv.org/abs/2311.17022v1 ) License: see the linked page	Eirini Poimenidou, Marios Adamoudis, Konstantinos A. Draziotis, Kostas Tsichlas	In the present paper, we implement a message recovery attack on all variants of the NTRU cryptosystem. Our approach involves a reduction from the NTRU-lattice to a Voronoi First Kind lattice, enabling the application of a polynomial CVP exact algorithm crucial for executing the Message Recovery. The efficacy of our attack relies on a specific oracle that permits us to approximate an unknown quantity. Furthermore, we outline the mathematical conditions under which the attack is successful. Finally, we delve into a well-established polynomial algorithm for CVP on VFK lattices and its implementation, shedding light on its efficacy in our attack. Subsequently, we present comprehensive experimental results on the NTRU-HPS and the NTRU-Prime variants of the NIST submissions and propose a method that could indicate the resistance of the NTRU cryptosystem to our attack.	Translated: 2024-03-18 13:44:50 Published: 2023-11-28
# A Review on Cryptocurrency Transaction Methods for Money Laundering ( http://arxiv.org/abs/2311.17203v1 ) License: see the linked page	Hugo Almeida, Pedro Pinto, Ana Fernández Vilas	Cryptocurrencies are considered relevant assets and they are currently used as an investment or to carry out transactions. However, specific characteristics commonly associated with cryptocurrencies, such as irreversibility, immutability, decentralized architecture, absence of control authority, mobility, and pseudo-anonymity, make them appealing for money laundering activities. Thus, the collection and characterization of current cryptocurrency-based methods used for money laundering are paramount to understanding the circulation flows of physical and digital money and preventing this illegal activity. In this paper, a collection of cryptocurrency transaction methods is presented and distributed through the money laundering life cycle. Each method is analyzed and classified according to the phase of money laundering it corresponds to. The result of this article may in the future help design efficient strategies to prevent illegal money laundering activities.	Translated: 2024-03-18 13:44:50 Published: 2023-11-28
# ZTCloudGuard: Zero Trust Context-Aware Access Management Framework to Avoid Misuse Cases in the Era of Generative AI and Cloud-based Health Information Ecosystem ( http://arxiv.org/abs/2312.02993v1 ) License: see the linked page	Khalid Al-hammuri, Fayez Gebali, Awos Kanan	Managing access between large numbers of distributed medical devices has become a crucial aspect of modern healthcare systems, enabling the establishment of smart hospitals and telehealth infrastructure. However, as telehealth technology continues to evolve and Internet of Things (IoT) devices become more widely used, they are also becoming increasingly exposed to various types of vulnerabilities and medical errors. In healthcare information systems, about 90% of vulnerabilities emerged from misuse cases and human errors. As a result, there is a need for additional research and development of security tools to prevent such attacks. This article proposes a zero-trust-based context-aware framework for managing access to the main components of the cloud ecosystem, including users, devices and output data.
The main goal and benefit of the proposed framework is to build a scoring system to prevent or alleviate misuse cases while using distributed medical devices in cloud-based healthcare information systems. The framework has two main scoring schemas to maintain the chain of trust. First, it proposes a critical trust score based on cloud-native micro-services of authentication, encryption, logging, and authorizations. Second, it creates a bond trust score to assess the real-time semantic and syntactic analysis of attributes stored in a healthcare information system. The analysis is based on a pre-trained machine learning model to generate the semantic and syntactic scores. The framework also takes into account regulatory compliance and user consent to create a scoring system. The advantage of this method is that it is applicable to any language and adapts to all attributes as it relies on a language model, not just a set of predefined and limited attributes. The results show a high F1 score of 93.5%, which proves that it is valid for detecting misuse cases.	Translated: 2024-03-18 13:05:51 Published: 2023-11-28
# Automatic Recognition of Learning Resource Category in a Digital Library ( http://arxiv.org/abs/2401.12220v1 ) License: see the linked page	Soumya Banerjee, Debarshi Kumar Sanyal, Samiran Chattopadhyay, Plaban Kumar Bhowmick, Partha Pratim Das	Digital libraries often face the challenge of processing a large volume of diverse document types. The manual collection and tagging of metadata can be a time-consuming and error-prone task. To address this, we aim to develop an automatic metadata extractor for digital libraries. In this work, we introduce the Heterogeneous Learning Resources (HLR) dataset designed for document image classification. The approach involves decomposing individual learning resources into constituent document images (sheets). These images are then processed through an OCR tool to extract textual representation. State-of-the-art classifiers are employed to classify both the document image and its textual content. Subsequently, the labels of the constituent document images are utilized to predict the label of the overall document.	Translated: 2024-02-11 17:42:42 Published: 2023-11-28
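The abstract above does not say how the sheet-level labels are combined into a document-level label. One simple possibility, shown here purely as an assumption, is a majority vote over the per-sheet predictions. The function name and the category strings are hypothetical.

```python
from collections import Counter


def predict_document_label(sheet_labels):
    """Return the most frequent per-sheet label as the document label.

    Majority voting is an assumed aggregation rule for this sketch,
    not necessarily the rule used in the paper.
    """
    if not sheet_labels:
        raise ValueError("a document needs at least one sheet label")
    label, _count = Counter(sheet_labels).most_common(1)[0]
    return label


# Hypothetical per-sheet predictions for one learning resource.
sheet_predictions = ["slides", "slides", "thesis", "slides"]
document_label = predict_document_label(sheet_predictions)
```

With these toy predictions, the document is labeled "slides", because three of the four sheets carry that label.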
# F4D: Factorized 4D Convolutional Neural Network for Efficient Video-level Representation Learning ( http://arxiv.org/abs/2401.08609v1 ) License: see the linked page	Mohammad Al-Saad, Lakshmish Ramaswamy and Suchendra Bhandarkar	Recent studies have shown that video-level representation learning is crucial to the capture and understanding of the long-range temporal structure for video action recognition. Most existing 3D convolutional neural network (CNN)-based methods for video-level representation learning are clip-based and focus only on short-term motion and appearances. These CNN-based methods lack the capacity to incorporate and model the long-range spatiotemporal representation of the underlying video and ignore the long-range video-level context during training. In this study, we propose a factorized 4D CNN architecture with attention (F4D) that is capable of learning more effective, finer-grained, long-term spatiotemporal video representations. We demonstrate that the proposed F4D architecture yields significant performance improvements over the conventional 2D and 3D CNN architectures proposed in the literature. Experimental evaluation on five action recognition benchmark datasets, i.e., Something-Something-v1, Something-Something-v2, Kinetics-400, UCF101, and HMDB51, demonstrates the effectiveness of the proposed F4D network architecture for video-level action recognition.	Translated: 2024-01-22 10:06:11 Published: 2023-11-28
# HAtt-Flow: Hierarchical Attention-Flow Mechanism for Group Activity Scene Graph Generation in Videos ( http://arxiv.org/abs/2312.07740v1 ) License: see the linked page	Naga VS Raviteja Chappa, Pha Nguyen, Thi Hoang Ngan Le and Khoa Luu	Group Activity Scene Graph (GASG) generation is a challenging task in computer vision, aiming to anticipate and describe relationships between subjects and objects in video sequences. Traditional Video Scene Graph Generation (VidSGG) methods focus on retrospective analysis, limiting their predictive capabilities. To enrich the scene understanding capabilities, we introduced a GASG dataset extending the JRDB dataset with nuanced annotations involving Appearance, Interaction, Position, Relationship, and Situation attributes. This work also introduces an innovative approach, the Hierarchical Attention-Flow (HAtt-Flow) Mechanism, rooted in flow network theory to enhance GASG performance. Flow-Attention incorporates flow conservation principles, fostering competition for sources and allocation for sinks, effectively preventing the generation of trivial attention.
Our proposed approach offers a unique perspective on attention mechanisms, where conventional "values" and "keys" are transformed into sources and sinks, respectively, creating a novel framework for attention-based models. Through extensive experiments, we demonstrate the effectiveness of our HAtt-Flow model and the superiority of our proposed Flow-Attention mechanism. This work represents a significant advancement in predictive video scene understanding, providing valuable insights and techniques for applications that require real-time relationship prediction in video data.	Translated: 2024-01-15 14:53:02 Published: 2023-11-28
# Predicting Multi-Joint Kinematics of the Upper Limb from EMG Signals Across Varied Loads with a Physics-Informed Neural Network ( http://arxiv.org/abs/2312.09418v1 ) License: see the linked page	Rajnish Kumar, Suriya Prakash Muthukrishnan, Lalan Kumar, Sitikantha Roy	In this research, we present an innovative method known as a physics-informed neural network (PINN) model to predict multi-joint kinematics using electromyography (EMG) signals recorded from the muscles surrounding these joints across various loads. The primary aim is to simultaneously predict both the shoulder and elbow joint angles while executing elbow flexion-extension (FE) movements, especially under varying load conditions. The PINN model is constructed by combining a feed-forward Artificial Neural Network (ANN) with a joint torque computation model. During the training process, the model utilizes a custom loss function derived from an inverse dynamics joint torque musculoskeletal model, along with a mean square angle loss. The training dataset for the PINN model comprises EMG and time data collected from four different subjects. To assess the model's performance, we conducted a comparison between the predicted joint angles and experimental data using a testing data set. The results demonstrated strong correlations of 58% to 83% in joint angle prediction.
The findings highlight the potential of incorporating physical principles into the model, not only increasing its versatility but also enhancing its accuracy. The findings could have significant implications for the precise estimation of multi-joint kinematics in dynamic scenarios, particularly concerning the advancement of human-machine interfaces (HMIs) for exoskeletons and prosthetic control systems.	Translated: 2024-01-15 14:23:05 Published: 2023-11-28
# 模擬ロボットアームにおける安全強化学習 Safe Reinforcement Learning in a Simulated Robotic Arm ( http://arxiv.org/abs/2312.09468v1 ) ライセンス: Link先を確認	Luka Kovač and Igor Farkaš	(参考訳) 強化学習(RL)エージェントは最適な方策を学ぶために環境を探索する必要がある。多くの環境やタスクにおいて、安全性は極めて重要である。シミュレータの普及は、RLシステムを物理環境(例えば人間とロボットの相互作用)で直接訓練する必要がある場合に不可欠となる安全な探索をはじめ、多くの利点をもたらしている。広く使われているSafety Gymライブラリは、さまざまな安全制約を考慮しながら目標指向のタスクを学習できる3種類の移動エージェントを提供している。本稿では,Safety Gymのアルゴリズムをテストできる,Pandaロボットアームを用いたカスタマイズ環境を構築することで,安全RLアルゴリズムの適用範囲を拡張する。 PPOアルゴリズムのベースラインと制約付きバージョンを比較するパイロット実験を行い,制約付きバージョンは,予想どおり訓練時間は長くなるものの,安全制約をよりよく満たしながら同等に優れた方策を学習できることを示した。 Reinforcement learning (RL) agents need to explore their environments in order to learn optimal policies. In many environments and tasks, safety is of critical importance. The widespread use of simulators offers a number of advantages, including safe exploration which will be inevitable in cases when RL systems need to be trained directly in the physical environment (e.g. in human-robot interaction). The popular Safety Gym library offers three mobile agent types that can learn goal-directed tasks while considering various safety constraints. In this paper, we extend the applicability of safe RL algorithms by creating a customized environment with a Panda robotic arm where Safety Gym algorithms can be tested. We performed pilot experiments with the popular PPO algorithm, comparing the baseline with the constrained version, and show that the constrained version is able to learn an equally good policy while better complying with safety constraints and taking longer training time, as expected.	翻訳日:2024-01-15 14:14:06 公開日:2023-11-28
# ミリ波帯5gにおける固定広帯域無線アクセスの性能解析 Performance Analysis of Fixed Broadband Wireless Access in mmWave Band in 5G ( http://arxiv.org/abs/2312.09467v1 ) ライセンス: Link先を確認	Soumya Banerjee, Sarada Prasad Gochhayat, and Sachin Shetty	(参考訳) エンド・ツー・エンドのファイバベースネットワークは、エンドユーザーに複数ギガビットの固定アクセスを提供する可能性を秘めている。しかし、ファイバーアクセスの展開、特にファイバーが存在しない地域では、時間がかかりコストがかかるため、オペレーターのリターンが遅れる。本研究は5gのmm波帯における固定広帯域無線アクセスからの伝送データについて検討する。この領域への関心が高まる中、データの伝達特性を理解することが重要となる。 mmWaveバンドの既存のデータセットは利用可能だが、しばしばシミュレーション環境から生成される。本研究では,固定広帯域無線アクセス(rwm6050)から収集した実世界の伝送データから得られたデータセットを提案する。送信特性に基づく自己設定を容易にすることを目的とする。そこで本稿では,リアルタイム学習と伝送特性の分類を行うオンライン機械学習手法を提案する。さらに,より正確な分類のための2つの時間モデルを提案する。以上の結果から,送信データの解析結果から,送信角度と距離を高い精度で直接検出でき,組み合わせ分類タスクにおいて最大99%の精度で検出できることを示した。最後に,収集データに基づく今後の研究方向性について概説する。 An end-to-end fiber-based network holds the potential to provide multi-gigabit fixed access to end-users. However, deploying fiber access, especially in areas where fiber is non-existent, can be time-consuming and costly, resulting in delayed returns for Operators. This work investigates transmission data from fixed broadband wireless access in the mmWave band in 5G. Given the growing interest in this domain, understanding the transmission characteristics of the data becomes crucial. While existing datasets for the mmWave band are available, they are often generated from simulated environments. In this study, we introduce a dataset compiled from real-world transmission data collected from the Fixed Broadband Wireless Access in mmWave Band device (RWM6050). The aim is to facilitate self-configuration based on transmission characteristics. To achieve this, we propose an online machine learning-based approach for real-time training and classification of transmission characteristics. Additionally, we present two advanced temporal models for more accurate classifications. 
Our results demonstrate the ability to detect transmission angle and distance directly from the analysis of transmission data with very high accuracy, reaching up to 99% accuracy on the combined classification task. Finally, we outline promising future research directions based on the collected data.	翻訳日:2024-01-15 14:13:52 公開日:2023-11-28
# ロボット工学, コンピュータビジョン, アルゴリズム設計の統合: チャイニーズポーカー・セルフプレイロボット Integration of Robotics, Computer Vision, and Algorithm Design: A Chinese Poker Self-Playing Robot ( http://arxiv.org/abs/2312.09455v1 ) ライセンス: Link先を確認	Kuan-Huang Yu	(参考訳) 本稿では、TM5-900ロボットアームが4人用カードゲームのチャイニーズポーカーを独立してプレイできる統合システムであるチャイニーズポーカー・セルフプレイロボットを提案する。このロボットはカスタムの吸盤機構を使ってカードを拾い、プレイする。 YOLOv5に基づく物体検出モデルを用いて、ロボットに配られた13枚のカードのスートと数字を認識する。 13枚のカードを3枚,5枚,5枚の最適な手に分割するための貪欲アルゴリズムを開発した。実験により、ロボットはカードの取得に成功し、コンピュータビジョンを使って識別し、アルゴリズムを使って戦略的に手を選び、選んだカードをゲーム内で物理的にプレイできることが示されている。このシステムは、機械設計、コンピュータビジョン、アルゴリズム設計、ロボット制御を効果的に統合し、独立してカードをプレイするという複雑なタスクを達成する。 This paper presents Chinese Poker Self-Playing Robot, an integrated system enabling a TM5-900 robotic arm to independently play the four-person card game Chinese poker. The robot uses a custom sucker mechanism to pick up and play cards. An object detection model based on YOLOv5 is utilized to recognize the suit and number of 13 cards dealt to the robot. A greedy algorithm is developed to divide the 13 cards into optimal hands of 3, 5, and 5 cards to play. Experiments demonstrate that the robot can successfully obtain the cards, identify them using computer vision, strategically select hands to play using the algorithm, and physically play the selected cards in the game. The system showcases effective integration of mechanical design, computer vision, algorithm design, and robotic control to accomplish the complex task of independently playing cards.	翻訳日:2024-01-15 14:11:22 公開日:2023-11-28
# MANTIS at #SMM4H 2023: Leveraging Hybrid and Ensemble Models for Detection of Social Anxiety Disorder on Reddit MANTIS at #SMM4H 2023: Leveraging Hybrid and Ensemble Models for Detection of Social Anxiety Disorder on Reddit ( http://arxiv.org/abs/2312.09451v1 ) ライセンス: Link先を確認	Sourabh Zanwar, Daniel Wiechmann, Yu Qiao, Elma Kerz	(参考訳) 本稿では,Social Media Mining for Health 2023 Shared Task 4(社会不安障害の診断を自己報告する英語のReddit投稿の二値分類)に用いたシステムについて述べる。医療領域に適応した専用のトランスフォーマーとBiLSTMニューラルネットワークを併用したハイブリッドモデルとアンサンブルモデルの有効性を体系的に検討し,比較した。評価の結果,最良のモデルは検証セットで89.31%,テストセットで83.76%のF1を得た。 This paper presents our system employed for the Social Media Mining for Health 2023 Shared Task 4: Binary classification of English Reddit posts self-reporting a social anxiety disorder diagnosis. We systematically investigate and contrast the efficacy of hybrid and ensemble models that harness specialized medical domain-adapted transformers in conjunction with BiLSTM neural networks. The evaluation results outline that our best performing model obtained 89.31% F1 on the validation set and 83.76% F1 on the test set.	翻訳日:2024-01-15 14:10:46 公開日:2023-11-28
# 拡散モデルのトラクタブルステアリングによる画像処理 Image Inpainting via Tractable Steering of Diffusion Models ( http://arxiv.org/abs/2401.03349v1 ) ライセンス: Link先を確認	Anji Liu and Mathias Niepert and Guy Van den Broeck	(参考訳) 拡散モデルは、フォトリアリスティックな画像を生成する技術の現状である。しかし, この制約に対する厳密な条件付けは難解であるため, 塗装などの制約付き画像生成タスクのサンプリングプロセスの制御は困難である。既存の手法では制約後部を近似するために様々な手法が用いられているが,本研究では,制約後部を正確にかつ効率的に計算するためのTPM(Tractable Probabilistic Models)の活用と,拡散モデルの認知過程の制御にこの信号を活用することを提案する。具体的には、確率回路(PC)と呼ばれる表現型TPMのクラスを採用する。先行研究では,さらにpcをスケールアップし,拡散モデルの画像生成プロセスを導くことができるようにした。実験の結果,TPMがもたらす計算オーバーヘッドは10%程度に過ぎず,3つの自然な画像データセット(CelebA-HQ, ImageNet, LSUN)にまたがるインペイント画像の全体的な品質とセマンティックコヒーレンスを一貫して改善できることが示唆された。さらに、画像エンコーダとデコーダの助けを借りて、画像の特定の領域に対する意味的制約を容易に受け取り、より制御された画像生成タスクの可能性を開くことができる。本稿では、制約付き画像生成のための新しいフレームワークの提案に加えて、よりトラクタブルなモデルの利点を強調し、表現型TPMの開発を動機づける。 Diffusion models are the current state of the art for generating photorealistic images. Controlling the sampling process for constrained image generation tasks such as inpainting, however, remains challenging since exact conditioning on such constraints is intractable. While existing methods use various techniques to approximate the constrained posterior, this paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior, and to leverage this signal to steer the denoising process of diffusion models. Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs). Building upon prior advances, we further scale up PCs and make them capable of guiding the image generation process of diffusion models. Empirical results suggest that our approach can consistently improve the overall quality and semantic coherence of inpainted images across three natural image datasets (i.e., CelebA-HQ, ImageNet, and LSUN) with only ~10% additional computational overhead brought by the TPM. 
Further, with the help of an image encoder and decoder, our method can readily accept semantic constraints on specific regions of the image, which opens up the potential for more controlled image generation tasks. In addition to proposing a new framework for constrained image generation, this paper highlights the benefit of more tractable models and motivates the development of expressive TPMs.	翻訳日:2024-01-15 09:20:23 公開日:2023-11-28
# STR-Cert:ディープラーニングパイプラインとVision Transformerにおけるディープテキスト認識のためのロバストネス認証 STR-Cert: Robustness Certification for Deep Text Recognition on Deep Learning Pipelines and Vision Transformers ( http://arxiv.org/abs/2401.05338v1 ) ライセンス: Link先を確認	Daqian Shao, Lukas Fesser, Marta Kwiatkowska	(参考訳) ロバストネス認証は、敵対的入力に対するニューラルネットワークの予測を形式的に認証することを目的としており、安全クリティカルなアプリケーションにとって重要なツールとなっている。かなりの進歩にもかかわらず、既存の認証手法は、MNISTのようなベンチマークデータセット上で、畳み込みネットワークやリカレントネットワーク、最近ではトランスフォーマーといった基本的なアーキテクチャに限られている。本稿では,複雑で広く実用化されている画像ベースのシーケンス予測問題であるシーンテキスト認識(STR)のロバストネス認証に焦点を当てる。我々は、標準のSTRパイプラインやVision Transformerなど、3種類のSTRモデルアーキテクチャに取り組む。本稿では,鍵となるSTRモデルコンポーネントに対する新しい多面体境界とアルゴリズムを導出することでDeepPoly多面体検証フレームワークを大幅に拡張した,STRモデルの最初の認証手法STR-Certを提案する。最後に、6つのデータセット上でSTRモデルを認証し比較し、特にVision Transformerにおけるロバストネス認証の効率性とスケーラビリティを実証する。 Robustness certification, which aims to formally certify the predictions of neural networks against adversarial inputs, has become an important tool for safety-critical applications. Despite considerable progress, existing certification methods are limited to elementary architectures, such as convolutional networks, recurrent networks and recently Transformers, on benchmark datasets such as MNIST. In this paper, we focus on the robustness certification of scene text recognition (STR), which is a complex and extensively deployed image-based sequence prediction problem. We tackle three types of STR model architectures, including the standard STR pipelines and the Vision Transformer. We propose STR-Cert, the first certification method for STR models, by significantly extending the DeepPoly polyhedral verification framework via deriving novel polyhedral bounds and algorithms for key STR model components. Finally, we certify and compare STR models on six datasets, demonstrating the efficiency and scalability of robustness certification, particularly for the Vision Transformer.	翻訳日:2024-01-15 09:08:55 公開日:2023-11-28
# アルファ+8Be反応におけるポケット共鳴の観点からみたホイルと関連する励起状態 The Hoyle and associated excited states from the viewpoint of pocket resonances in alpha + 8Be reactions ( http://arxiv.org/abs/2311.16837v1 ) ライセンス: Link先を確認	Teck-Ghee Lee, Orhan Bayrak, Ian J. Thompson and Cheuk-Yin Wong	(参考訳) 光学模型結合チャネルの枠組みのもとで,基底状態にあるプロレート変形した$^8$Be核への$\alpha$粒子の反応におけるポケット共鳴の観点から,ホイル状態および関連する励起状態の生成を調べる。予測された反応断面積は,重心系エネルギー$E_{\rm cm}$の関数として,ホイル共鳴を含む顕著な共鳴を示す。これらの共鳴の位置と幅は,標的の変形($\beta_2$パラメータ)と核表面ポテンシャルのパリティ依存性に敏感である。後者は$\alpha$ボソンのボース・アインシュタイン交換のため,表面領域において偶パリティの$L$部分波のほうが奇パリティの$L$部分波よりも深い。反応断面積を部分波ごとに分解すると,共鳴エネルギーと幅は,$^{12}$Cの$0_2^+$(ホイル状態),$0_3^+$,$1_1^-$,$3_1^-$状態について,利用可能な実験データおよび先行する超球面計算と概ね一致するが,$2_2^+$状態については理論的な幅が狭い。波動関数と共鳴幅を解析し,狭く鋭い$0_2^+$,$3_1^-$,$2_2^+$共鳴をポケット共鳴(ポテンシャル障壁より下で生じる共鳴),幅の広い$0_3^+$と$1_1^-$共鳴を障壁上共鳴と同定する。天体物理学への応用として,推定した$s$波の$\alpha$+$^8$Be反応断面積と,ポテンシャルポケット内の$^{12}$C励起状態の崩壊に伴う$\gamma$崩壊幅および$\alpha$崩壊幅に基づき,$\alpha$+$^8$Beから$^{12}$C$(2^+)$状態への融合について,$E_{\rm cm}$ $<$ 1.0 MeVにおける天体物理学的$S(E_{\rm cm})$因子も評価する。 We examine the production of the Hoyle and associated excited states from the viewpoint of pocket resonances in the reaction of an $\alpha$-particle on a ground state prolate $^8$Be nucleus within the optical model coupled-channel framework. The predicted reaction cross sections, as a function of the center-of-mass energy $E_{\rm cm}$, show prominent resonances, including the Hoyle resonance. The positions and widths of these resonances are sensitive to the target deformation ($\beta_2$ parameter) and the parity of the nuclear surface potential $-$ deeper for the even-parity $L$ partial waves relative to those for the odd-parity $L$ partial waves at the surface region because of the Bose-Einstein exchange of the $\alpha$-bosons. 
Decomposing the reaction cross sections to different partial waves, we find that the resonance energies and widths reasonably agree with the available experimental data and previous hyperspherical calculations for the $0_2^+$ (Hoyle state), $0_3^+$, $1_1^-$ and $3_1^-$ states of $^{12}$C, except for the narrow theoretical width of the $2_2^+$ state. Analyzing the wavefunctions and the resonance widths, we identify the narrow and sharp $0_2^+$, $3_1^-$ and $2_2^+$ resonances as pocket resonances -- resonances which occur below the potential barrier, while the broad $0_3^+$ and $1_1^-$ resonances as above-the-barrier resonances. For astrophysical applications, we also evaluate the astrophysical $S(E_{\rm cm})$-factor for $E_{\rm cm}$ $<$ 1.0 MeV, for the fusion of $\alpha$+$^8$Be into the $^{12}$C$(2^+)$ state based on our estimated $s$-wave $\alpha$+$^8$Be reaction cross section and the associated $\gamma$- and $\alpha$-decay widths for the decay of $^{12}$C excited states in the potential pocket.	翻訳日:2023-12-11 04:03:56 公開日:2023-11-28
# 反セクシズムアラートシステム:AI技術を用いたソーシャルメディア上のセクシストコメントの同定 Anti-Sexism Alert System: Identification of Sexist Comments on Social Media Using AI Techniques ( http://arxiv.org/abs/2312.00053v1 ) ライセンス: Link先を確認	Rebeca P. Díaz Redondo and Ana Fernández Vilas and Mateo Ramos Merino and Sonia Valladares and Soledad Torres Guijarro and Manar Mohamed Hafez	(参考訳) デジタル分野における社会的関係は、より日常的で頻繁になり、私たち全員にとって非常に重要な側面となっている。この領域における暴力的なやり取りは非常に頻繁に起こり、被害者に深刻な影響を及ぼす。この世界的な状況のなかで、とりわけ憂慮すべきデジタル暴力のひとつが、女性に対する性差別である。ソーシャルメディア(新聞のコメント欄、SNSなど)に公開される性差別的なコメントは、通常多くの注目を集めてバイラルになり、関係者にダメージを与える。本稿では、自然言語処理(NLP)と人工知能(AI)に基づいて、あらゆる公開投稿を分析し、性差別的コメントとみなせるかどうかを判断する反性差別警告システムを提案する。さらに、このシステムは、あらゆるマルチメディアコンテンツ(ニュース、ビデオ、ツイートなど)にリンクされたすべての公開コメントを分析し、投稿全体に性差別があるかどうかを、信号機に似た色ベースの方式で判定する。研究の大半が英語を対象としていることから、システムの訓練用にスペイン語のラベル付きデータセットを作成し、検証実験で非常に優れた性能が得られた。 Social relationships in the digital sphere are becoming more usual and frequent, and they constitute a very important aspect for all of us. Violent interactions in this sphere are very frequent and have serious effects on the victims. Within this global scenario, there is one kind of digital violence that is becoming really worrying: sexism against women. Sexist comments that are publicly posted in social media (newspaper comments, social networks, etc.), usually obtain a lot of attention and become viral, with consequent damage to the persons involved. In this paper, we introduce an anti-sexism alert system, based on natural language processing (NLP) and artificial intelligence (AI), that analyzes any public post, and decides if it could be considered a sexist comment or not. Additionally, this system also works on analyzing all the public comments linked to any multimedia content (piece of news, video, tweet, etc.) and decides, using a color-based system similar to traffic lights, if there is sexism in the global set of posts. 
We have created a labeled data set in Spanish, since the majority of studies focus on English, to train our system, which offers a very good performance after the validation experiments.	翻訳日:2023-12-11 03:43:17 公開日:2023-11-28
# MIA-BAD:メンバーシップ推論攻撃の促進とフェデレートラーニングによる軽減 MIA-BAD: An Approach for Enhancing Membership Inference Attack and its Mitigation with Federated Learning ( http://arxiv.org/abs/2312.00051v1 ) ライセンス: Link先を確認	Soumya Banerjee, Sandip Roy, Sayyed Farid Ahamed, Devin Quinn, Marc Vucovich, Dhruv Nandakumar, Kevin Choi, Abdul Rahman, Edward Bowen, and Sachin Shetty	(参考訳) メンバシップ推論攻撃(MIA)は、機械学習(ML)モデルのプライバシを妥協するための一般的なパラダイムである。 MIAは、トレーニングデータに過度に適合するために、MLモデルの自然な傾きを利用する。 miasは、メンバーシップ情報を推測する自信のトレーニングとテストの区別を訓練される。 Federated Learning(FL)は、プライバシ保護のMLパラダイムであり、複数のクライアントがプライベートデータを公開せずに統一モデルのトレーニングを可能にする。本稿では,MIA アプローチを改良した Batch-wise generated Attack Dataset (MIA-BAD) を用いた拡張メンバーシップ推論攻撃を提案する。攻撃データセットがバッチ単位で生成された場合、MIAはより正確である。これにより、攻撃データセットを質的に改善しながら定量的に減少させる。 FLを用いたMLモデルのトレーニング方法を示すとともに,提案したMIA-BADアプローチによる脅威をFLアプローチで緩和する方法について検討する。最後に,様々なターゲットデータセット,可変数のフェデレーションクライアント,バッチサイズをトレーニングすることにより,mia-bad手法の質的効果を実証する。 The membership inference attack (MIA) is a popular paradigm for compromising the privacy of a machine learning (ML) model. MIA exploits the natural inclination of ML models to overfit upon the training data. MIAs are trained to distinguish between training and testing prediction confidence to infer membership information. Federated Learning (FL) is a privacy-preserving ML paradigm that enables multiple clients to train a unified model without disclosing their private data. In this paper, we propose an enhanced Membership Inference Attack with the Batch-wise generated Attack Dataset (MIA-BAD), a modification to the MIA approach. We investigate that the MIA is more accurate when the attack dataset is generated batch-wise. This quantitatively decreases the attack dataset while qualitatively improving it. We show how training an ML model through FL, has some distinct advantages and investigate how the threat introduced with the proposed MIA-BAD approach can be mitigated with FL approaches. 
Finally, we demonstrate the qualitative effects of the proposed MIA-BAD methodology by conducting extensive experiments with various target datasets, variable numbers of federated clients, and training batch sizes.	翻訳日:2023-12-11 03:42:57 公開日:2023-11-28
# 潜在変数推論による思考の学習 Training Chain-of-Thought via Latent-Variable Inference ( http://arxiv.org/abs/2312.02179v1 ) ライセンス: Link先を確認	Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous	(参考訳) 大規模言語モデル (LLM) は、「chain-of-thought」(CoT) プロンプトを使って段階的に解答を導くよう指示されると、より正確かつ解釈可能な形で問題を解く。特定のタスクにおけるLLMの性能は、教師付き微調整、すなわち調整可能な一部のパラメータに対する勾配上昇によって、ラベル付き訓練セットの正解の平均対数尤度を最大化することでも改善できる。 CoTと教師付きチューニングを素朴に組み合わせるには、正解だけでなく、その答えに至る詳細な根拠(rationale)の教師データも必要となるが、こうした根拠を人手で作成するのは高コストである。代わりに我々は、CoTプロンプトを用いて正解を生成する周辺(marginal)対数尤度を最大化し、可能なすべての根拠について近似的に平均をとる微調整戦略を提案する。中心的な課題は、正解で条件付けた根拠の事後分布からのサンプリングであり、我々は自己学習推論器(STaR)、memoized wake-sleep、Markovian score climbing、persistent contrastive divergenceに着想を得た単純なマルコフ連鎖モンテカルロ (MCMC) 期待値最大化 (EM) アルゴリズムでこれに対処する。このアルゴリズムでは、モデルが改善するにつれて勾配推定の分散をゼロに近づける新しい制御変量法も利用できる。本手法をGSM8KとBIG-Bench Hardのタスクに適用したところ、このMCMC-EM微調整手法は、STaRや(CoTの有無を問わない)プロンプトチューニングよりも、ホールドアウト例に対するモデルの精度を概して大きく向上させた。 Large language models (LLMs) solve problems more accurately and interpretably when instructed to work out the answer step by step using a "chain-of-thought" (CoT) prompt. One can also improve LLMs' performance on a specific task by supervised fine-tuning, i.e., by using gradient ascent on some tunable parameters to maximize the average log-likelihood of correct answers from a labeled training set. Naively combining CoT with supervised tuning requires supervision not just of the correct answers, but also of detailed rationales that lead to those answers; these rationales are expensive to produce by hand. Instead, we propose a fine-tuning strategy that tries to maximize the marginal log-likelihood of generating a correct answer using CoT prompting, approximately averaging over all possible rationales. 
The core challenge is sampling from the posterior over rationales conditioned on the correct answer; we address it using a simple Markov-chain Monte Carlo (MCMC) expectation-maximization (EM) algorithm inspired by the self-taught reasoner (STaR), memoized wake-sleep, Markovian score climbing, and persistent contrastive divergence. This algorithm also admits a novel control-variate technique that drives the variance of our gradient estimates to zero as the model improves. Applying our technique to GSM8K and the tasks in BIG-Bench Hard, we find that this MCMC-EM fine-tuning technique typically improves the model's accuracy on held-out examples more than STaR or prompt-tuning with or without CoT.	翻訳日:2023-12-11 03:34:19 公開日:2023-11-28
# 大規模言語モデルによる自律運転の強化:安全の観点から Empowering Autonomous Driving with Large Language Models: A Safety Perspective ( http://arxiv.org/abs/2312.00812v1 ) ライセンス: Link先を確認	Yixuan Wang, Ruochen Jiao, Chengtian Lang, Sinong Simon Zhan, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu	(参考訳) 自律運転(AD)は商用化に向けて重大なハードルに直面しており、とりわけ、ロングテールの予期せぬ運転シナリオに起因する社会的信頼の低下と安全上の懸念が挙げられる。この苦境は、ADソフトウェアにおけるディープニューラルネットワークの限界によるものであり、解釈可能性に乏しく、分布外や不確実なシナリオでの一般化能力が低い。そこで本稿では,大規模言語モデル(LLM)をADシステムに統合し,その強固な常識的知識,推論能力,人間との対話能力を活用することを提唱する。提案手法は,計画におけるインテリジェントな意思決定者としてLLMを配置し,文脈に応じた安全性学習のための安全性検証器を組み込むことで,ADの全体的な性能と安全性を高める。本手法の有効性を裏付ける2つのケーススタディの結果を報告する。さらに、認識、予測、シミュレーションを含む他のADソフトウェアコンポーネントへのLLMの統合の可能性についても論じる。ケーススタディで観察された課題にもかかわらず、LLMの統合は、ADの安全性と性能の両方を強化するうえで有望かつ有益である。 Autonomous Driving (AD) faces crucial hurdles for commercial launch, notably in the form of diminished public trust and safety concerns from long-tail unforeseen driving scenarios. This predicament is due to the limitation of deep neural networks in AD software, which struggle with interpretability and exhibit poor generalization capabilities in out-of-distribution and uncertain scenarios. To this end, this paper advocates for the integration of Large Language Models (LLMs) into the AD system, leveraging their robust common-sense knowledge, reasoning abilities, and human-interaction capabilities. The proposed approach deploys the LLM as an intelligent decision-maker in planning, incorporating safety verifiers for contextual safety learning to enhance overall AD performance and safety. We present results from two case studies that affirm the efficacy of our approach. We further discuss the potential integration of LLM for other AD software components including perception, prediction, and simulation. Despite the observed challenges in the case studies, the integration of LLMs is promising and beneficial for reinforcing both safety and performance in AD.	翻訳日:2023-12-11 03:31:50 公開日:2023-11-28
# ウェーブレットとグラフ理論による脳波信号からの発作検出 Seizure detection from Electroencephalogram signals via Wavelets and Graph Theory metrics ( http://arxiv.org/abs/2312.00811v1 ) ライセンス: Link先を確認	Paul Grant, Md Zahidul Islam	(参考訳) てんかんは最も一般的な神経疾患の1つであり、てんかん発作は脳の異常、過剰、同期的な活動による一過性の事象である。脳から発せられる脳波信号は、記録・分析することで、てんかん発作の検出と予測に重要な役割を果たしうる。この研究では、ウェーブレット変換のさまざまな性質に依拠した以前のアプローチを改良する。ここでは、最大オーバーラップ離散ウェーブレット変換を適用して信号のノイズを低減するとともに、固有の周波数レベルごとに現れる信号の分散を用いて、頭皮上に配置した電極間の接続に関する様々な指標を構築する。ノイズ低減後の信号と相互接続された電極の特性は, 脳の状態によって大きく異なる。リアルタイムに近いモニタリングを想定して短時間のエポックを用い、再構成したノイズ低減信号から得られる単純な統計パラメータと合わせて、発作検出を行う。さらに性能を向上させるため、導出した電極接続からグラフ理論の指標を利用する。そこから属性空間を構築する。我々は,オープンソースソフトウェアと公開データを利用して,既存の公開手法と比較した場合の,我々のアプローチの優れたリコール/感度性能を示す。 Epilepsy is one of the most prevalent neurological conditions, where an epileptic seizure is a transient occurrence due to abnormal, excessive and synchronous activity in the brain. Electroencephalogram signals emanating from the brain may be captured, analysed and then play a significant role in detection and prediction of epileptic seizures. In this work we improve upon a previous approach that relied on the differing properties of the wavelet transform. Here we apply the Maximum Overlap Discrete Wavelet Transform to both reduce signal noise and use signal variance exhibited at differing inherent frequency levels to develop various metrics of connection between the electrodes placed upon the scalp. The properties of both the noise reduced signal and the interconnected electrodes differ significantly during the different brain states. Using short duration epochs, to approximate close to real time monitoring, together with simple statistical parameters derived from the reconstructed noise reduced signals we initiate seizure detection. To further improve performance we utilise graph theoretic indicators from derived electrode connectivity. From there we build the attribute space. 
We utilise open-source software and publicly available data to highlight the superior Recall/Sensitivity performance of our approach, when compared to existing published methods.	翻訳日:2023-12-11 03:31:31 公開日:2023-11-28
# Webアクセシビリティの向上 - WCAG 2.0からWCAG 2.1への移行ガイド Advancing Web Accessibility -- A guide to transitioning Design Systems from WCAG 2.0 to WCAG 2.1 ( http://arxiv.org/abs/2312.02992v1 ) ライセンス: Link先を確認	Hardik Shah	(参考訳) 本研究は, Web コンテントアクセシビリティガイドライン (WCAG) 2.0 から Web コンテントアクセシビリティガイドライン (WCAG) 2.1 へ設計システムをアップグレードする重要なプロセスに焦点を当てる。アクセシビリティ要件の増大に加えて、デジタル環境へのインクルージョンをサポートする上で設計システムの重要な機能として、最新の状態を維持することの重要性を強調している。この記事では、WCAG 2.1準拠を満たすための完全な戦略を概説する。評価、戦略的計画、実装、テストはすべて、この戦略の一部です。コラボレーションとユーザ関与の必要性は、移行を成功させるための重要な戦略とベストプラクティスとして強調されている。さらに、この記事は、移行障壁を掘り下げ、得られた重要な教訓について論じ、この転換する道路の複雑さの現実的な見解を提供する。最後に、これは実践的なガイドであり、アクセス可能でユーザ中心の設計にコミットする組織に必要なリソースである。ドキュメントは、webアクセシビリティの変化する世界を適切にナビゲートするために必要な知識とリソースを提供します。 This research focuses on the critical process of upgrading a Design System from Web Content Accessibility Guidelines (WCAG) 2.0 to WCAG 2.1, which is an essential step in enhancing web accessibility. It emphasizes the importance of staying up to date on increasing accessibility requirements, as well as the critical function of Design Systems in supporting inclusion in digital environments. The article lays out a complete strategy for meeting WCAG 2.1 compliance. Assessment, strategic planning, implementation, and testing are all part of this strategy. The need for collaboration and user involvement is emphasized as critical strategies and best practices for a successful migration journey. In addition, the article digs into migration barriers and discusses significant lessons acquired, offering a realistic view of the intricacies of this transforming road. Finally, it is a practical guide and a necessary resource for organizations committed to accessible and user-centered design. The document provides them with the knowledge and resources they need to navigate the changing world of web accessibility properly.	翻訳日:2023-12-11 03:19:07 公開日:2023-11-28
# 治療に伴う感情分類のための汎用的NLIアプローチ A Generic NLI approach for Classification of Sentiment Associated with Therapies ( http://arxiv.org/abs/2312.03737v1 ) ライセンス: Link先を確認	Rajaraman Kanagasabai and Anitha Veeramani	(参考訳) 本稿では,SMM4H 2023共有タスク2の「治療(アスペクト指向)に関する感情の分類」に取り組むためのシステムについて述べる。本研究では,自然言語推論(NLI)に基づく手法を採用し,文対分類問題としてタスクを定式化し,与えられたテキストにおける治療に関連する感情を予測するトランスフォーマーモデルを訓練する。最良のモデルはF1スコア75.22%を達成し、全チームの提出の平均値(中央値)を11% (4%)上回った。 This paper describes our system for addressing SMM4H 2023 Shared Task 2 on "Classification of sentiment associated with therapies (aspect-oriented)". In our work, we adopt an approach based on Natural language inference (NLI) to formulate this task as a sentence pair classification problem, and train transformer models to predict sentiment associated with a therapy on a given text. Our best model achieved 75.22% F1-score, which was 11% (4%) more than the mean (median) score of all teams' submissions.	翻訳日:2023-12-11 03:11:09 公開日:2023-11-28
# 自然言語処理による臨床フリーテキストの非識別化--最近のアプローチの体系的考察 De-identification of clinical free text using natural language processing: A systematic review of current approaches ( http://arxiv.org/abs/2312.03736v1 ) ライセンス: Link先を確認	Aleksandar Kovačević, Bojana Bašaragin, Nikola Milošević, Goran Nenadić	(参考訳) 背景: 電子健康記録(EHR)は、データ駆動医療研究にとって貴重な資源である。しかしながら、保護された健康情報(PHI)が含まれているため、EHRを研究目的で共有することはできない。 PHIを除去するプロセスである非識別化は、EHRデータをアクセス可能にするための重要なステップである。自然言語処理は、この非識別化プロセスを自動化できることを繰り返し示してきた。目的:本研究は,過去13年間で臨床フリーテキストの非識別化がどのように進展したかを体系的に示し,現在の最先端システムの性能と限界について報告することを目的とする。また,本分野での課題や研究機会の特定も目指している。方法:2010年1月から2023年2月までに発表された研究を対象に,PubMed,Web of Science,DBLPの体系的検索を行った。関連する研究を特定するために,タイトルと要約を調査した。その後、選抜された研究を詳細に分析し、非識別化の手法、データソース、測定された性能に関する情報を収集した。結果: タイトルと要約のスクリーニング対象として合計2125件の文献が特定された。 69の研究が関連すると判断された。機械学習(37研究)とハイブリッド(26研究)のアプローチが主流であり、6つの研究はルールのみに依存している。アプローチの大部分は公開コーパスで訓練・評価されている。 2014年のi2b2/UTHealthコーパスが最も頻繁に使用され(36研究)、2006年のi2b2 (18研究)と2016年のCEGS N-GRID (10研究)コーパスが続く。 Background: Electronic health records (EHRs) are a valuable resource for data-driven medical research. However, the presence of protected health information (PHI) makes EHRs unsuitable to be shared for research purposes. De-identification, i.e. the process of removing PHI is a critical step in making EHR data accessible. Natural language processing has repeatedly demonstrated its feasibility in automating the de-identification process. Objectives: Our study aims to provide systematic evidence on how the de-identification of clinical free text has evolved in the last thirteen years, and to report on the performances and limitations of the current state-of-the-art systems. In addition, we aim to identify challenges and potential research opportunities in this field. Methods: A systematic search in PubMed, Web of Science and the DBLP was conducted for studies published between January 2010 and February 2023. Titles and abstracts were examined to identify the relevant studies. 
Selected studies were then analysed in-depth, and information was collected on de-identification methodologies, data sources, and measured performance. Results: A total of 2125 publications were identified for the title and abstract screening. 69 studies were found to be relevant. Machine learning (37 studies) and hybrid (26 studies) approaches are predominant, while six studies relied only on rules. The majority of the approaches were trained and evaluated on public corpora. The 2014 i2b2/UTHealth corpus is the most frequently used (36 studies), followed by the 2006 i2b2 (18 studies) and 2016 CEGS N-GRID (10 studies) corpora.	翻訳日:2023-12-11 03:10:59 公開日:2023-11-28
# 言語モデリングにおける最先端技術 Advancing State of the Art in Language Modeling ( http://arxiv.org/abs/2312.03735v1 ) ライセンス: Link先を確認	David Herel and Tomas Mikolov	(参考訳) 一般化は統計言語モデリング研究の最も重要な目標である。オープンソースのコードで公開されたベンチマークや論文は、この分野の進歩に不可欠である。しかし、出版物で報告されているように結果を完全に再現することはしばしば困難であり、時には不可能である。本稿では,一般化の観点から言語モデリングの最先端化を支援するための,シンプルなフレームワークを提案する。我々は,コードだけでなく,新たなモデルをアンサンブルに簡単に追加できるように,将来の出版物による開発やテストセットの確率も提案する。新しく提案されたモデルが実際に現在のベースラインを補完しているかどうかを判断するのはずっと簡単です。したがって、古いトリックの新しい名前を発明する代わりに、科学コミュニティはより早く前進することができる。最後に、このアプローチはアイデアの多様性を促進する: 注目を惹きつけるために、新しい技術の状態である個々のモデルを作成する必要はない; 他のモデルがしないパターンを学ぶ新しいモデルを開発するのに十分である。したがって、準最適モデルでさえも値を持つことが分かる。注目すべきことに、我々のアプローチは、様々な言語モデリングベンチマークで10%まで、最先端の結果をもたらしました。 Generalization is arguably the most important goal of statistical language modeling research. Publicly available benchmarks and papers published with an open-source code have been critical to advancing the field. However, it is often very difficult, and sometimes even impossible, to reproduce the results fully as reported in publications. In this paper, we propose a simple framework that should help advance the state of the art in language modeling in terms of generalization. We propose to publish not just the code, but also probabilities on dev and test sets with future publications so that one can easily add the new model into an ensemble. This has crucial advantages: it is much easier to determine whether a newly proposed model is actually complementary to the current baseline. Therefore, instead of inventing new names for the old tricks, the scientific community can advance faster. Finally, this approach promotes diversity of ideas: one does not need to create an individual model that is the new state of the art to attract attention; it will be sufficient to develop a new model that learns patterns which other models do not. Thus, even a suboptimal model can be found to have value. 
Remarkably, our approach has yielded new state-of-the-art results across various language modeling benchmarks, with improvements of up to 10%.	翻訳日:2023-12-11 03:10:35 公開日:2023-11-28
# マルチモーダル融合のための条件付きプロンプトチューニング Conditional Prompt Tuning for Multimodal Fusion ( http://arxiv.org/abs/2312.03734v1 ) ライセンス: Link先を確認	Ruixiang Jiang, Lingbo Liu, Changwen Chen	(参考訳) あるモダリティの表現が、パラメータ効率のよいマルチモーダル融合のために別のモダリティのプロンプトを効果的に導くことができることを示す。具体的には、まず1つのモダリティを符号化し、その表現を事前情報として用いて、他のモダリティのすべての凍結層を条件付きでプロンプトする。これは、バニラプロンプトベクトルを3種類の特殊プロンプトに切り離して、グローバルレベルとインスタンスレベルの機能を適応的にキャプチャすることで達成される。インスタンスのプロンプトをより良く生成するために、各インスタンスを動的に、最も適切なプロンプトの専門家にルーティングするプロンプトエキスパート(MoPE)の混合を導入する。我々はさらに、退化したプロンプトエキスパートのルーティングを避けるために正則化項を研究する。我々の設計により、下流マルチモーダルタスクのための単一モーダルのエンコーダにおける事前訓練された知識を効果的に転送することができる。バニラプロンプトと比較すると,MoPEに基づく条件付きプロンプトの方がより表現力が高く,トレーニングデータやプロンプトの総数に対してより良くスケールすることを示す。また、プロンプトチューニングがアーキテクチャに依存しないため、モジュール性が高いことも示しています。 3つのマルチモーダルデータセットに対する大規模な実験は、訓練可能なパラメータの0.7%しか必要とせず、微調整によって達成されたパフォーマンスを一致または超える、最先端の結果を示す。コードは https://github.com/songrise/ConditionalPrompt でリリースされる。 We show that the representation of one modality can effectively guide the prompting of another modality for parameter-efficient multimodal fusion. Specifically, we first encode one modality and use its representation as a prior to conditionally prompt all frozen layers of the other modality. This is achieved by disentangling the vanilla prompt vectors into three types of specialized prompts that adaptively capture global-level and instance-level features. To better produce the instance-wise prompt, we introduce the mixture of prompt experts (MoPE) to dynamically route each instance to the most suitable prompt experts for encoding. We further study a regularization term to avoid degenerated prompt expert routing. Thanks to our design, our method can effectively transfer the pretrained knowledge in unimodal encoders for downstream multimodal tasks. Compared with vanilla prompting, we show that our MoPE-based conditional prompting is more expressive, and thereby scales better with training data and the total number of prompts. We also demonstrate that our prompt tuning is architecture-agnostic, thereby offering high modularity. 
Extensive experiments over three multimodal datasets demonstrate state-of-the-art results, matching or surpassing the performance achieved through fine-tuning, while only necessitating 0.7% of the trainable parameters. Code will be released: https://github.com/songrise/ConditionalPrompt.	翻訳日:2023-12-11 03:10:18 公開日:2023-11-28
# 大規模言語モデル信頼度推定手法 Methods to Estimate Large Language Model Confidence ( http://arxiv.org/abs/2312.03733v1 ) ライセンス: Link先を確認	Maia Kotelanski, Robert Gallo, Ashwin Nayak, Thomas Savage	(参考訳) 大規模言語モデルは不確実性を伝えるのが困難であり、これは複雑な医療タスクにLLMを適用する上で重要な障害である。本研究は,難しい臨床ビネットに対して診断を提案する際のLLM信頼度測定法について検討した。 GPT4は、Chain of ThoughtとSelf Consistencyのプロンプトを使って、一連の挑戦的なケース質問を受けた。モデル信頼度を評価するために複数の手法を検討し,モデルの観測精度を予測する能力について評価した。評価方法は,本質的信頼度,SC一致頻度,CoT応答長であった。 SC一致頻度は観測精度と相関し, 本質的信頼度とCoT長解析と比較すると, 受信者動作特性曲線の下の面積が高くなる。 SC一致はモデル信頼度,特に医療診断において最も有用な指標である。 Model Intrinsic ConfidenceとCoT Response Lengthは、正しい回答と間違った回答を区別する能力が弱く、モデル信頼度の信頼できる解釈可能な指標とはならない。 GPT4は自身の診断精度を評価する能力に限界があると結論付ける。 SC一致頻度はGPT4信頼度を測定する最も有用な方法である。 Large Language Models have difficulty communicating uncertainty, which is a significant obstacle to applying LLMs to complex medical tasks. This study evaluates methods to measure LLM confidence when suggesting a diagnosis for challenging clinical vignettes. GPT4 was asked a series of challenging case questions using Chain of Thought and Self Consistency prompting. Multiple methods were investigated to assess model confidence and evaluated on their ability to predict the model's observed accuracy. The methods evaluated were Intrinsic Confidence, SC Agreement Frequency and CoT Response Length. SC Agreement Frequency correlated with observed accuracy, yielding a higher Area under the Receiver Operating Characteristic Curve compared to Intrinsic Confidence and CoT Length analysis. SC agreement is the most useful proxy for model confidence, especially for medical diagnosis. Model Intrinsic Confidence and CoT Response Length exhibit a weaker ability to differentiate between correct and incorrect answers, preventing them from being reliable and interpretable markers for model confidence. We conclude GPT4 has a limited ability to assess its own diagnostic accuracy. SC Agreement Frequency is the most useful method to measure GPT4 confidence.	翻訳日:2023-12-11 03:09:54 公開日:2023-11-28
# LoRAを用いた微調整のためのランク安定化スケーリング因子 A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA ( http://arxiv.org/abs/2312.03732v1 ) ライセンス: Link先を確認	Damjan Kalajdzievski	(参考訳) 大規模言語モデル (LLM) はますます計算とメモリ集約化が進んでいるため、パラメータ効率のよい微調整法 (PEFT) がLLMを微調整するための一般的な戦略となっている。 PEFTの一般的な手法はLoRA(Low-Rank Adapters)であり、選択した層にトレーニング可能な低ランクの"アダプタ"を追加する。各アダプタは、ランク依存因子によって乗算スケールされた低ランク行列積からなる。このスケーリング係数は、アダプタをランクの要素で分割するので、高ランクのアダプタを用いたLoRAでは学習が遅くなり、パフォーマンスが低下する。そのため、LoRAの使用は一般的に非常に低いランクに限られている。本研究では,スケーリング因子が学習過程に与える影響について検討し,LoRAアダプタをランクの平方根の因子で分割すべきであることを証明した。ランク安定化LoRA (rsLoRA) 法と呼ばれる,LoRAを適切なスケーリング係数で修正することで,微調整の計算量と性能のトレードオフを容易に実現でき,推論の計算コストを変えることなく,より大きなランクを用いて訓練時の計算資源の増加と引き換えに微調整性能を向上できる。 As large language models (LLMs) have become increasingly compute and memory intensive, parameter-efficient fine-tuning (PEFT) methods are now a common strategy to fine-tune LLMs. A popular PEFT method is Low-Rank Adapters (LoRA), which adds trainable low-rank "adapters" to selected layers. Each adapter consists of a low-rank matrix product, multiplicatively scaled by a rank-dependent factor. This scaling factor, which divides adapters by a factor of the rank, results in slowed learning and stunted performance for LoRA with higher-rank adapters. Consequently, the use of LoRA in practice has generally been limited to very low ranks. In this work, we study the impact of the scaling factor on the learning process and prove that LoRA adapters should be divided by a factor of the square root of the rank. Modifying LoRA with the appropriate scaling factor, which we call the rank-stabilized LoRA (rsLoRA) method, easily provides for a fine-tuning compute/performance trade-off, where larger ranks can be used to trade off increased computational resources during training for better fine-tuning performance, with no change in inference computing cost.	翻訳日:2023-12-11 03:09:37 公開日:2023-11-28
# グラフ上でのマルチタスク事前学習とプロンプトのためのMultiGPrompt MultiGPrompt for Multi-Task Pre-Training and Prompting on Graphs ( http://arxiv.org/abs/2312.03731v1 ) ライセンス: Link先を確認	Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhang	(参考訳) グラフは本質的にWeb上の相互接続オブジェクトをモデル化することができ、Web分析やコンテントレコメンデーションといった一連のWebアプリケーションを容易にします。近年,グラフ表現学習の主流技術としてグラフニューラルネットワーク(GNN)が登場している。しかし、エンドツーエンドの監視フレームワークでの有効性は、タスク固有のラベルの可用性にかなり関係しています。ラベリングコストを軽減し、数ショット設定で堅牢性を高めるため、自己指導型タスクの事前訓練が有望な方法として現れ、プリテキストと下流タスクの客観的ギャップをさらに狭めるためのプロンプトが提案されている。グラフ上でのプロンプトベース学習の初期調査はあったが、それらは主に単一のプリテキストタスクを活用し、事前学習データから学べる一般的な知識のサブセットが限られている。そこで本稿では,マルチタスク事前学習およびプロンプトフレームワークであるMultiGPromptを提案する。まず、事前学習において、複数のプリテキストタスクを相乗化するためのプリテキストトークンセットを設計する。第2に,タスク固有の,グローバルな事前学習知識を活用するための合成プロンプトとオープンプロンプトから構成されたデュアルプロンプト機構を提案する。最後に、MultiGPromptの評価と分析を行うために、6つの公開データセットに関する広範な実験を行う。 Graphs can inherently model interconnected objects on the Web, thereby facilitating a series of Web applications, such as web analysis and content recommendation. Recently, Graph Neural Networks (GNNs) have emerged as a mainstream technique for graph representation learning. However, their efficacy within an end-to-end supervised framework is significantly tied to the availability of task-specific labels. To mitigate labeling costs and enhance robustness in few-shot settings, pre-training on self-supervised tasks has emerged as a promising method, while prompting has been proposed to further narrow the objective gap between pretext and downstream tasks. Although there has been some initial exploration of prompt-based learning on graphs, they primarily leverage a single pretext task, resulting in a limited subset of general knowledge that could be learned from the pre-training data. Hence, in this paper, we propose MultiGPrompt, a novel multi-task pre-training and prompting framework to exploit multiple pretext tasks for more comprehensive pre-trained knowledge. First, in pre-training, we design a set of pretext tokens to synergize multiple pretext tasks. 
Second, we propose a dual-prompt mechanism consisting of composed and open prompts to leverage task-specific and global pre-training knowledge, to guide downstream tasks in few-shot settings. Finally, we conduct extensive experiments on six public datasets to evaluate and analyze MultiGPrompt.	翻訳日:2023-12-11 03:09:12 公開日:2023-11-28
# プロセス要求に基づく生成チャットボットの比較 Comparing Generative Chatbots Based on Process Requirements ( http://arxiv.org/abs/2312.03741v1 ) ライセンス: Link先を確認	Luis Fernando Lins, Nathalia Nascimento, Paulo Alencar, Toacy Oliveira, Donald Cowan	(参考訳) ビジネスプロセスは一般的に、イベント駆動プロセスチェイン(EPC)や他のワークフロー言語(YAWL)などのモデリング言語、ビジネスプロセスをモデリングするための最も一般的な標準表記法であるビジネスプロセスモデルと表記法(BPMN)によって表現されます。最近では、自然言語を使ってマシンと対話できるプログラムであるチャットボットが、ビジネスプロセスの実行サポートにますます使われています。注目すべきチャットボットの最近のカテゴリは、数十億のパラメータに基づいてトレーニングされ、会話インテリジェンスをサポートするOpenAIのGenerative Pre-Trained Transformer(GPT)モデルやGoogleのPaLM(Pathways Language Model)など、LLM(Large Language Model)を利用したジェネレーティブベースのチャットボットである。しかし、生成型ベースのチャットボットがプロセス実行サポートのためにBPMNが提供するような構成要素の要求を理解し、満たせるかどうかは不明です。本稿では, プロセス実行支援の文脈において, GPT と PaLM の顕著な生成モデルの性能を比較するケーススタディを提案する。この研究は、生成型チャットボットがサポートする会話型アプローチを、プロセス認識型モデリング表記法を理解し、タスクを実行するユーザをサポートする手段として使用するという、難しい問題に光を当てている。 Business processes are commonly represented by modelling languages, such as Event-driven Process Chain (EPC), Yet Another Workflow Language (YAWL), and the most popular standard notation for modelling business processes, the Business Process Model and Notation (BPMN). Most recently, chatbots, programs that allow users to interact with a machine using natural language, have been increasingly used for business process execution support. A recent category of chatbots worth mentioning is generative-based chatbots, powered by Large Language Models (LLMs) such as OpenAI's Generative Pre-Trained Transformer (GPT) model and Google's Pathways Language Model (PaLM), which are trained on billions of parameters and support conversational intelligence. However, it is not clear whether generative-based chatbots are able to understand and meet the requirements of constructs such as those provided by BPMN for process execution support. This paper presents a case study to compare the performance of prominent generative models, GPT and PaLM, in the context of process execution support. 
The research sheds light on the challenging problem of using conversational approaches supported by generative chatbots as a means to understand process-aware modelling notations and support users in executing their tasks.	翻訳日:2023-12-11 02:55:03 公開日:2023-11-28
# 自己回帰型大規模言語モデルのプロンプト Prompting in Autoregressive Large Language Models ( http://arxiv.org/abs/2312.03740v1 ) ライセンス: Link先を確認	Prabin Bhandari	(参考訳) 自己回帰型大規模言語モデルは自然言語処理のランドスケープに変化をもたらした。プレトレインとプロンプトのパラダイムは、多くの下流NLPタスクに対する事前トレーニングと微調整の従来のアプローチに取って代わられた。この変化は、LLMと革新的なプロンプト技術によって起こりうる。 LLMは、膨大なパラメータとトレーニング済みの巨大なデータセットのために、さまざまなダウンストリームタスクに対して大きな期待を示している。しかし、その潜在能力を十分に実現するためには、その成果を望ましい結果へと導く必要がある。 LLMを目的の出力に導くための特定の入力や命令が提供されるプロンプトは、この目標を達成するためのツールとなっている。本稿では,LLMのパワーをフル活用するための様々なプロンプト技術について論じる。我々は,既存の文献の分類法と,この分類法に基づく簡潔な調査を行った。さらに,今後の研究の方向性として期待できる自己回帰型LSMを推し進める領域において,いくつかの未解決問題を明らかにした。 Autoregressive Large Language Models have transformed the landscape of Natural Language Processing. Pre-train and prompt paradigm has replaced the conventional approach of pre-training and fine-tuning for many downstream NLP tasks. This shift has been possible largely due to LLMs and innovative prompting techniques. LLMs have shown great promise for a variety of downstream tasks owing to their vast parameters and huge datasets that they are pre-trained on. However, in order to fully realize their potential, their outputs must be guided towards the desired outcomes. Prompting, in which a specific input or instruction is provided to guide the LLMs toward the intended output, has become a tool for achieving this goal. In this paper, we discuss the various prompting techniques that have been applied to fully harness the power of LLMs. We present a taxonomy of existing literature on prompting techniques and provide a concise survey based on this taxonomy. Further, we identify some open problems in the realm of prompting in autoregressive LLMs which could serve as a direction for future research.	翻訳日:2023-12-11 02:54:38 公開日:2023-11-28
# 総合的アスペクトベース感性分析のための構文インフォームド対話モデル Syntax-Informed Interactive Model for Comprehensive Aspect-Based Sentiment Analysis ( http://arxiv.org/abs/2312.03739v1 ) ライセンス: Link先を確認	Ullman Galen, Frey Lee, Woods Ali	(参考訳) テキスト分析におけるニュアンス化されたタスクであるアスペクトベース感情分析(ABSA)は、テキスト内の特定のアスペクト項に関連する感情指向を識別しようとする。伝統的なアプローチは、しばしば文の明示的な構文構造を見落としたり、不十分にモデル化する。このギャップに対処するため, 総合ABSAのための構文依存強化マルチタスクインタラクションアーキテクチャ (SDEMTIA) を提案する。本手法は,SDEIN(Syntactic Dependency Embedded Interactive Network)を用いて,構文知識(依存性関係と型)を革新的に活用する。また,マルチタスク学習フレームワークに新規かつ効率的なメッセージパッシング機構を組み込んで,学習効果を高める。ベンチマークデータセットに関する大規模な実験では、既存の手法をはるかに超えて、モデルの優位性を示しました。さらに,BERTを補助的特徴抽出器として組み込むことにより,モデルの性能をさらに向上させる。 Aspect-based sentiment analysis (ABSA), a nuanced task in text analysis, seeks to discern sentiment orientation linked to specific aspect terms in text. Traditional approaches often overlook or inadequately model the explicit syntactic structures of sentences, crucial for effective aspect term identification and sentiment determination. Addressing this gap, we introduce an innovative model: Syntactic Dependency Enhanced Multi-Task Interaction Architecture (SDEMTIA) for comprehensive ABSA. Our approach innovatively exploits syntactic knowledge (dependency relations and types) using a specialized Syntactic Dependency Embedded Interactive Network (SDEIN). We also incorporate a novel and efficient message-passing mechanism within a multi-task learning framework to bolster learning efficacy. Our extensive experiments on benchmark datasets showcase our model's superiority, significantly surpassing existing methods. Additionally, incorporating BERT as an auxiliary feature extractor further enhances our model's performance.	翻訳日:2023-12-11 02:54:25 公開日:2023-11-28
# Syntactic Fusion:マルチトレーグラフ統合によるアスペクトレベル知覚分析の強化 Syntactic Fusion: Enhancing Aspect-Level Sentiment Analysis Through Multi-Tree Graph Integration ( http://arxiv.org/abs/2312.03738v1 ) ライセンス: Link先を確認	Jane Sunny, Tom Padraig, Roggie Terry, Woods Ali	(参考訳) アスペクトレベルの感情分類の最近の進歩は、構文構造、特に依存木を利用したグラフニューラルネットワーク(GNN)の導入によって促進されている。しかしながら、これらのモデルの性能はパースアルゴリズムの本質的に不正確さによって妨げられることが多い。この課題を軽減するために、複数のパーサーから予測をアマルガメートする革新的なグラフアンサンブル法であるSynthFusionを導入する。この戦略はgnnの適用前に様々な依存関係の関係を融合させ、余分な計算負荷を避けながら解析エラーに対する堅牢性を高めている。 SynthFusionは、グラフ接続を最適化することで、過度なパラメータ化の落とし穴を回避し、GNN層を積み重ねたモデルで一般的なオーバーフィッティングのリスクを低減します。 SemEval14とTwitter14データセットに関する実証的な評価では、SynthFusionは単一の依存ツリーに依存しているだけでなく、代替アンサンブルテクニックを無視し、モデルの複雑さをエスカレーションすることなくこれを達成している。 Recent progress in aspect-level sentiment classification has been propelled by the incorporation of graph neural networks (GNNs) leveraging syntactic structures, particularly dependency trees. Nevertheless, the performance of these models is often hampered by the innate inaccuracies of parsing algorithms. To mitigate this challenge, we introduce SynthFusion, an innovative graph ensemble method that amalgamates predictions from multiple parsers. This strategy blends diverse dependency relations prior to the application of GNNs, enhancing robustness against parsing errors while avoiding extra computational burdens. SynthFusion circumvents the pitfalls of overparameterization and diminishes the risk of overfitting, prevalent in models with stacked GNN layers, by optimizing graph connectivity. Our empirical evaluations on the SemEval14 and Twitter14 datasets affirm that SynthFusion not only outshines models reliant on single dependency trees but also eclipses alternative ensemble techniques, achieving this without an escalation in model complexity.	翻訳日:2023-12-11 02:54:07 公開日:2023-11-28
# 深層強化学習による需要応答を考慮した統合エネルギーシステムの攻撃耐性スケジューリング Advancing Attack-Resilient Scheduling of Integrated Energy Systems with Demand Response via Deep Reinforcement Learning ( http://arxiv.org/abs/2311.17941v1 ) ライセンス: Link先を確認	Yang Li, Wenjie Ma, Yuanzheng Li, Sen Li, Zhe Chen	(参考訳) 多エネルギー流の最適スケジューリングは、再生可能エネルギー源(RES)を有効利用し、統合エネルギーシステム(IES)の安定性と経済性を改善する方法である。しかし、IESの安定した需要供給は、RESや負荷から生じる不確実性や、高度な情報や通信技術の導入によるサイバー攻撃の影響の増大による課題に直面している。これらの課題に対処するため,本研究では,IDR対応IESのための状態逆深部強化学習(DRL)に基づくモデルレスレジリエンススケジューリング手法を提案する。提案手法は,電気・ガス熱フレキシブル負荷の相互作用能力を調べるためのidrプログラムの設計である。さらに、国家対向マルコフ決定プロセス(sa-mdp)モデルは、サイバー攻撃下のiesのエネルギースケジューリング問題を特徴づける。スケジューリング戦略に対するサイバー攻撃の影響を軽減するため,SA-SAC (State-adversarial soft actor-critic)アルゴリズムを提案する。シミュレーションの結果,RESと負荷による不確実性に適切に対処し,サイバー攻撃がスケジュール戦略に与える影響を軽減し,様々なエネルギー源に対する安定した需要供給を確保することができることがわかった。さらに,提案手法はサイバー攻撃に対する弾力性を示す。ソフトアクター・クリティカル(SAC)アルゴリズムと比較すると,サイバー攻撃シナリオ下での経済性能は10倍に向上する。 Optimally scheduling multi-energy flow is an effective method to utilize renewable energy sources (RES) and improve the stability and economy of integrated energy systems (IES). However, the stable demand-supply of IES faces challenges from uncertainties that arise from RES and loads, as well as the increasing impact of cyber-attacks with advanced information and communication technologies adoption. To address these challenges, this paper proposes an innovative model-free resilience scheduling method based on state-adversarial deep reinforcement learning (DRL) for integrated demand response (IDR)-enabled IES. The proposed method designs an IDR program to explore the interaction ability of electricity-gas-heat flexible loads. Additionally, a state-adversarial Markov decision process (SA-MDP) model characterizes the energy scheduling problem of IES under cyber-attack. The state-adversarial soft actor-critic (SA-SAC) algorithm is proposed to mitigate the impact of cyber-attacks on the scheduling strategy. 
Simulation results demonstrate that our method is capable of adequately addressing the uncertainties resulting from RES and loads, mitigating the impact of cyber-attacks on the scheduling strategy, and ensuring a stable demand supply for various energy sources. Moreover, the proposed method demonstrates resilience against cyber-attacks. Compared to the original soft actor-critic (SAC) algorithm, it achieves a 10% improvement in economic performance under cyber-attack scenarios.	翻訳日:2023-12-01 19:35:28 公開日:2023-11-28
# シーン要約:シーン映像を空間的に異なるフレームにまとめる Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames ( http://arxiv.org/abs/2311.17940v1 ) ライセンス: Link先を確認	Chao Chen, Mingzhi Zhu, Ankush Pratap Singh, Yu Yan, Felix Juefei Xu, Chen Feng	(参考訳) シーン理解タスクとしてシーン要約を提案する。それは、シーンの長いビデオウォークスルーを、そのシーンで空間的に多様な小さなフレームにまとめることを目的としており、監視、不動産、ロボット工学など、多くの重要な応用がある。ビデオ要約から生まれたものだが、既存のビデオ要約作業で一般的に研究されているユーザー編集の断片化されたビデオクリップではなく、移動中のカメラからの長い連続的なビデオに焦点を当てている。このタスクに対する私たちのソリューションは、SceneSumという名前の2段階の自己管理パイプラインです。第1段では、クラスタリングを使用してビデオシーケンスをセグメンテーションする。我々の中心となる考え方は、空間的多様性を促進するために視覚的位置認識(VPR)をこのクラスタリングプロセスに統合することである。第2段階では、メモリやディスクスペースの制限といったリソース制約を尊重しながら、各クラスタからサマリとして代表キーフレームを選択する必要がある。さらに,基底真理画像の軌跡が利用可能であれば,教師付き損失で容易に拡張でき,クラスタリングやキーフレームの選択が容易になる。実世界およびシミュレートされたデータセットの広汎な実験は、我々の手法が共通のビデオ要約ベースラインを50%上回っていることを示している。 We propose scene summarization as a new video-based scene understanding task. It aims to summarize a long video walkthrough of a scene into a small set of frames that are spatially diverse in the scene, which has many important applications, such as in surveillance, real estate, and robotics. It stems from video summarization but focuses on long and continuous videos from moving cameras, instead of user-edited fragmented video clips that are more commonly studied in existing video summarization works. Our solution to this task is a two-stage self-supervised pipeline named SceneSum. Its first stage uses clustering to segment the video sequence. Our key idea is to integrate visual place recognition (VPR) into this clustering process to promote spatial diversity. Its second stage needs to select a representative keyframe from each cluster as the summary while respecting resource constraints such as memory and disk space limits. Additionally, if the ground truth image trajectory is available, our method can be easily augmented with a supervised loss to enhance the clustering and keyframe selection. 
Extensive experiments on both real-world and simulated datasets show our method outperforms common video summarization baselines by 50%.	翻訳日:2023-12-01 19:35:03 公開日:2023-11-28
# 能動的オープンボキャブラリ認識:知的移動型CLIP制限 Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations ( http://arxiv.org/abs/2311.17938v1 ) ライセンス: Link先を確認	Lei Fan, Jianxiong Zhou, Xiaoying Xing and Ying Wu	(参考訳) 知的なエージェントがより優れた認識性能のために観察を探索できるアクティブ認識は、把持、ナビゲーション、部屋の配置など、様々な具体化されたaiタスクの前提条件として機能する。進化する環境と多数のオブジェクトクラスを考えると、トレーニングステージ中に可能なすべてのクラスを含めることは非現実的です。本稿では,任意の対象を積極的に認識し分類するエンボディドエージェントの権限を付与し,アクティブなオープンボキャブラリー認識の促進を目指す。しかし、Contrastive Language Image Pretraining (CLIP)のような最近のオープン語彙分類モデルを直接採用することは、そのユニークな課題を提起する。具体的には,CLIPの性能は視点や閉塞の影響を強く受けており,非拘束的知覚シナリオにおける信頼性を損なう。さらに、エージェント環境相互作用における観察のシーケンシャルな性質は、オープン語彙分類の識別力を維持する特徴を統合する効果的な方法を必要とする。これらの課題に対処するために,オープン語彙認識のための新しいエージェントを提案する。提案手法は,クラス固有の知識に頼ることなく,フレーム間の類似性を利用してエージェントの動きをナビゲートし,特徴を融合する。 ShapeNetデータセットで29.6%の精度を持つベースラインCLIPモデルと比較して、提案されたエージェントは、装備されたCLIPモデルに微調整することなく、オープン語彙認識において53.3%の精度を達成することができた。 Habitatシミュレータを用いて追加実験を行い,本手法の有効性を確認した。 Active recognition, which allows intelligent agents to explore observations for better recognition performance, serves as a prerequisite for various embodied AI tasks, such as grasping, navigation and room arrangements. Given the evolving environment and the multitude of object classes, it is impractical to include all possible classes during the training stage. In this paper, we aim at advancing active open-vocabulary recognition, empowering embodied agents to actively perceive and classify arbitrary objects. However, directly adopting recent open-vocabulary classification models, like Contrastive Language Image Pretraining (CLIP), poses its unique challenges. Specifically, we observe that CLIP's performance is heavily affected by the viewpoint and occlusions, compromising its reliability in unconstrained embodied perception scenarios. Further, the sequential nature of observations in agent-environment interactions necessitates an effective method for integrating features that maintains discriminative strength for open-vocabulary classification. 
To address these issues, we introduce a novel agent for active open-vocabulary recognition. The proposed method leverages inter-frame and inter-concept similarities to navigate agent movements and to fuse features, without relying on class-specific knowledge. Compared to the baseline CLIP model, which reaches 29.6% accuracy on the ShapeNet dataset, the proposed agent achieves 53.3% accuracy for open-vocabulary recognition without any fine-tuning of the equipped CLIP model. Additional experiments conducted with the Habitat simulator further affirm the efficacy of our method.	翻訳日:2023-12-01 19:34:43 公開日:2023-11-28
# テキスト・画像拡散モデルにおける空間的理解のアンロック Unlocking Spatial Comprehension in Text-to-Image Diffusion Models ( http://arxiv.org/abs/2311.17937v1 ) ライセンス: Link先を確認	Mohammad Mahdi Derakhshani, Menglin Xia, Harkirat Behl, Cees G. M. Snoek, Victor Rühle	(参考訳) テキストから画像への生成モデルにおける空間的理解と属性割り当てを向上させる画像生成パイプラインCompFuserを提案する。我々のパイプラインは,「オレンジ色の犬の左にある灰色の猫の画像」のようなシーン内の物体間の空間的関係を定義する命令を解釈し,対応する画像を生成する。これは、ユーザにもっとコントロールを提供するために特に重要です。 CompFuserは、複数のオブジェクトの生成を反復的なステップにデコードすることで、既存のテキストと画像の拡散モデルの制限を克服する。空間的理解と属性割り当てのためのトレーニングデータを作成するために,凍結した大言語モデルと凍結したレイアウトに基づくオブジェクト配置の拡散モデルを利用する合成データ生成プロセスを導入する。提案手法を強いベースラインと比較し,パラメータが3倍から5倍小さいにもかかわらず,空間的理解と属性割当において最先端画像生成モデルを上回ることを示す。 We propose CompFuser, an image generation pipeline that enhances spatial comprehension and attribute assignment in text-to-image generative models. Our pipeline enables the interpretation of instructions defining spatial relationships between objects in a scene, such as 'An image of a gray cat on the left of an orange dog', and the generation of corresponding images. This is especially important in order to provide more control to the user. CompFuser overcomes the limitation of existing text-to-image diffusion models by decoding the generation of multiple objects into iterative steps: first generating a single object and then editing the image by placing additional objects in their designated positions. To create training data for spatial comprehension and attribute assignment we introduce a synthetic data generation process that leverages a frozen large language model and a frozen layout-based diffusion model for object placement. We compare our approach to strong baselines and show that our model outperforms state-of-the-art image generation models in spatial comprehension and attribute assignment, despite being 3x to 5x smaller in parameters.	翻訳日:2023-12-01 19:34:16 公開日:2023-11-28
# ランク付け平均治療効果による治療優先順位付けルールの評価 Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects ( http://arxiv.org/abs/2111.07966v2 ) ライセンス: Link先を確認	Steve Yadlowsky, Scott Fleming, Nigam Shah, Emma Brunskill, Stefan Wager	(参考訳) 治療効果の推定、リスクスコア、手作りのルールに基づいて、誰が治療を優先するかを選択する方法は数多く存在する。本稿では, 優先順位付け規則の質を比較し, 検証するための簡易かつ一般的な指標として, ランク重み付き平均治療効果(RATE)指標を提案する。 RATEメトリクスは、優先順位付けルールの導出方法に非依存であり、治療から最も恩恵を受ける個人をどの程度正確に識別するかのみを評価する。我々は、RATE推定器の族を定義し、多種多様なランダム化および観察研究環境における漸近的正確な推論を可能にする中心極限定理を証明した。レートメトリクスは、qini係数を含む既存のメトリクスを列挙し、分析によってこれらのメトリクスの推論メソッドが直接得られる。我々は脳卒中患者に対するアスピリンの最適標的を含む多くの応用の文脈でRATEを紹介した。 There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as a simple and general family of metrics for comparing and testing the quality of treatment prioritization rules. RATE metrics are agnostic as to how the prioritization rules were derived, and only assess how well they identify individuals that benefit the most from treatment. We define a family of RATE estimators and prove a central limit theorem that enables asymptotically exact inference in a wide variety of randomized and observational study settings. RATE metrics subsume a number of existing metrics, including the Qini coefficient, and our analysis directly yields inference methods for these metrics. We showcase RATE in the context of a number of applications, including optimal targeting of aspirin to stroke patients.	翻訳日:2023-12-01 04:22:20 公開日:2023-11-28
# 地球外文明による量子コンピューティングの道具としてのブラックホール Black holes as tools for quantum computing by advanced extraterrestrial civilizations ( http://arxiv.org/abs/2301.09575v3 ) ライセンス: Link先を確認	Gia Dvali and Zaza N. Osmanov	(参考訳) ブラックホールは量子情報の最も効率的なコンデンサであると説明した。したがって、全ての十分に進んだ文明は最終的にブラックホールを量子コンピュータに採用することが期待されている。伴うホーキング放射は粒子種では民主的である。このため、宇宙の量子コンピュータはニュートリノや光子のような通常の粒子を検出器の潜在的な感度の範囲内で放射する。これはSETIにとって新しい道であり、それは重力によってのみ世界と相互作用する隠れ粒子の種からなる文明を含む。 We explain that black holes are the most efficient capacitors of quantum information. It is thereby expected that all sufficiently advanced civilizations ultimately employ black holes in their quantum computers. The accompanying Hawking radiation is democratic in particle species. Due to this, the alien quantum computers will radiate in ordinary particles such as neutrinos and photons within the range of potential sensitivity of our detectors. This offers a new avenue for SETI, including the civilizations entirely composed of hidden particles species interacting with our world exclusively through gravity.	翻訳日:2023-12-01 04:14:04 公開日:2023-11-28
# 生成モデルと異常検出のための量子確率ハミルトン学習 Quantum-probabilistic Hamiltonian learning for generative modelling & anomaly detection ( http://arxiv.org/abs/2211.03803v3 ) ライセンス: Link先を確認	Jack Y. Araz and Michael Spannowsky	(参考訳) 孤立量子力学系のハミルトニアンはその力学と物理的挙動を決定する。本研究では,システムのハミルトニアンを学習し,その変動熱状態推定をデータ解析に活用する可能性について検討する。そこで本研究では,シミュレーションによる大型ハドロン衝突型加速器データの生成モデルとして量子ハミルトニアンモデルを用いて,混合状態として表現可能性を示す。さらに、学習したハミルトニアンを用いて異常検出を行い、異なるサンプル型が量子多体系として扱われたときの異なる動的挙動を形成することを示した。これらの特徴を利用してサンプルタイプの違いを定量化する。本研究は,フィールド理論計算のための手法を機械学習アプリケーションに応用し,データ解析手法の理論的アプローチを応用できることを示唆する。 The Hamiltonian of an isolated quantum mechanical system determines its dynamics and physical behaviour. This study investigates the possibility of learning and utilising a system's Hamiltonian and its variational thermal state estimation for data analysis techniques. For this purpose, we employ the method of Quantum Hamiltonian-based models for the generative modelling of simulated Large Hadron Collider data and demonstrate the representability of such data as a mixed state. In a further step, we use the learned Hamiltonian for anomaly detection, showing that different sample types can form distinct dynamical behaviours once treated as a quantum many-body system. We exploit these characteristics to quantify the difference between sample types. Our findings show that the methodologies designed for field theory computations can be utilised in machine learning applications to employ theoretical approaches in data analysis techniques.	翻訳日:2023-12-01 04:11:09 公開日:2023-11-28
# 固有状態熱化のない熱化 Thermalization without eigenstate thermalization ( http://arxiv.org/abs/2209.09826v2 ) ライセンス: Link先を確認	Aram W. Harrow and Yichen Huang	(参考訳) 孤立量子多体系における一元進化の過程において、サブシステムの熱化の研究を行い、残りの系を浴として扱う。この設定では、固有状態熱化仮説(ETH)が熱化を説明するために提案された。ランダムな全ての4体相互作用をランダムな自由フェルミオンモデルに対する摂動として加えた、ほぼ可積分なサハデフ-イェ-キタエフモデルを考える。サブシステムのサイズが平方根よりも大きいが、それでも系のサイズが消える割合である場合、系がランダムな積状態で初期化され、ほぼすべての固有状態がETHに反することを示す。この意味で、ETHは熱化に必要な条件ではない。 In an isolated quantum many-body system undergoing unitary evolution, we study the thermalization of a subsystem, treating the rest of the system as a bath. In this setting, the eigenstate thermalization hypothesis (ETH) was proposed to explain thermalization. Consider a nearly integrable Sachdev-Ye-Kitaev model obtained by adding random all-to-all 4-body interactions as a perturbation to a random free-fermion model. When the subsystem size is larger than the square root of but is still a vanishing fraction of the system size, we prove thermalization if the system is initialized in a random product state, while almost all eigenstates violate the ETH. In this sense, the ETH is not a necessary condition for thermalization.	翻訳日:2023-12-01 04:10:15 公開日:2023-11-28
# 計算仮定による非局所性 Nonlocality under Computational Assumptions ( http://arxiv.org/abs/2303.02080v2 ) ライセンス: Link先を確認	Khashayar Barooti, Alexandru Gheorghiu, Grzegorz Głuch, Marc-Olivier Renou	(参考訳) 非局所性と絡み合いとのつながりは量子力学の基本的な特徴であり、量子情報科学において多くの応用が見つかっている。相関の組が非局所であるとは、ランダム性を共有し局所的な操作を行う、空間的に分離された当事者によって再現できない場合をいう。重要な実践的考慮事項は、当事者のランタイムが、その間の移動に光がかかる時間よりも短くなければならないことである。この制限をモデル化する一つの方法は、当事者が計算的に有界であると仮定することである。そこで,計算仮定の下での非局所性の研究を開始し,以下の結果を得る。 (a) 集合 $\mathsf{NeL}$ (not-efficiently-local) を、局所的な測定から生じる相関関係を共有ランダム性および \emph{多項式時間} の局所演算で再現できないすべての二部状態からなるものと定義する。 (b) Learning With Errors 問題が \emph{量子} 多項式時間で解けないという仮定の下では、$\mathsf{NeL}=\mathsf{ENT}$ であることを示す。ここで $\mathsf{ENT}$ は \emph{すべての} 二部エンタングル状態(純粋および混合)の集合である。これは、例えばヴェルナー状態のような絡み合った状態が局所であることが知られている非局所性の標準的な概念とは対照的である。本質的に、共有ランダム性や量子多項式時間計算によって再現できない相関関係を生成する(効率的な)局所測定が存在することを示す。 (c) $\mathsf{NeL}=\mathsf{ENT}$が無条件であれば、$\mathsf{BQP}\neq\mathsf{PP}$が証明される。言い換えれば、計算的に有界な敵に対して全ての二部エンタングル状態を証明する能力は、複雑性クラスを非自明に分離する。 (d) (c) を用いて,$\mathsf{PP}$ 証明者に対して健全な 1 ラウンドデリゲート量子計算プロトコルのある種の自然クラスは存在できないことを示す。 Nonlocality and its connections to entanglement are fundamental features of quantum mechanics that have found numerous applications in quantum information science. A set of correlations is said to be nonlocal if it cannot be reproduced by spacelike-separated parties sharing randomness and performing local operations. An important practical consideration is that the runtime of the parties has to be shorter than the time it takes light to travel between them. One way to model this restriction is to assume that the parties are computationally bounded. We therefore initiate the study of nonlocality under computational assumptions and derive the following results: (a) We define the set $\mathsf{NeL}$ (not-efficiently-local) as consisting of all bipartite states whose correlations arising from local measurements cannot be reproduced with shared randomness and \emph{polynomial-time} local operations. 
(b) Under the assumption that the Learning With Errors problem cannot be solved in \emph{quantum} polynomial-time, we show that $\mathsf{NeL}=\mathsf{ENT}$, where $\mathsf{ENT}$ is the set of \emph{all} bipartite entangled states (pure and mixed). This is in contrast to the standard notion of nonlocality where it is known that some entangled states, e.g. Werner states, are local. In essence, we show that there exist (efficient) local measurements producing correlations that cannot be reproduced through shared randomness and quantum polynomial-time computation. (c) We prove that if $\mathsf{NeL}=\mathsf{ENT}$ unconditionally, then $\mathsf{BQP}\neq\mathsf{PP}$. In other words, the ability to certify all bipartite entangled states against computationally bounded adversaries gives a non-trivial separation of complexity classes. (d) Using (c), we show that a certain natural class of 1-round delegated quantum computation protocols that are sound against $\mathsf{PP}$ provers cannot exist.	翻訳日:2023-12-01 03:59:41 公開日:2023-11-28
# 光ツイーザーにおけるRydberg原子と極性分子を用いた中心スピンモデルの量子シミュレーション Quantum simulation of the central spin model with a Rydberg atom and polar molecules in optical tweezers ( http://arxiv.org/abs/2302.14774v2 ) ライセンス: Link先を確認	Jacek Dobrzyniecki, Michał Tomza	(参考訳) 1つのスピンフル粒子がスピン環境と相互作用する中心スピンモデルは、量子情報技術の幅広い応用を見つけ、例えば時間とともに量子ビットのデコヒーレンスを記述するのに使うことができる。本稿では、XX(スピン交換)相互作用を持つ中心スピンモデルの超低温量子シミュレータを実現する方法を提案する。提案系は1つのRydberg原子(中心スピン)と周囲の極性分子(バススピン)から構成され、双極子-双極子相互作用を介して互いに結合している。内部粒子状態をスピン状態にマッピングすることで、スピン交換相互作用をシミュレートすることができる。システム幾何学の例として、リング状のバススピンの配置を検討し、それが相互作用強度の正確な制御を可能にする方法を示す。この構成でシミュレーションできる2つの例の動的シナリオを数値的に解析する: 乱れた環境で量子ビットのデコヒーレンスを表現できる中心スピン偏極の崩壊と、量子ネットワークをまたいだ単一ビットの伝送を表現できる特定の出力スピンへの入力スピン状態の転送である。この設定により、量子科学や技術への応用のために、高度に調整可能なパラメータと幾何学を持つ中心スピンモデルを実現することができることを示す。 Central spin models, where a single spinful particle interacts with a spin environment, find wide application in quantum information technology and can be used to describe, e.g., the decoherence of a qubit over time. We propose a method of realizing an ultracold quantum simulator of a central spin model with XX (spin-exchanging) interactions. The proposed system consists of a single Rydberg atom ("central spin") and surrounding polar molecules ("bath spins"), coupled to each other via dipole-dipole interactions. By mapping internal particle states to spin states, spin-exchanging interactions can be simulated. As an example system geometry, we consider a ring-shaped arrangement of bath spins, and show how it allows us to exert precise control over the interaction strengths. We numerically analyze two example dynamical scenarios which can be simulated in this setup: a decay of central spin polarization, which can represent qubit decoherence in a disordered environment, and a transfer of an input spin state to a specific output spin, which can represent the transmission of a single bit across a quantum network. 
We demonstrate that this setup allows us to realize a central spin model with highly tunable parameters and geometry, for applications in quantum science and technology.	翻訳日:2023-12-01 03:59:02 公開日:2023-11-28
# StyLIP: CLIPベースのドメイン一般化のためのマルチスケールスタイルのプロンプト学習 StyLIP: Multi-Scale Style-Conditioned Prompt Learning for CLIP-based Domain Generalization ( http://arxiv.org/abs/2302.09251v3 ) ライセンス: Link先を確認	Shirsha Bose, Ankit Jha, Enrico Fini, Mainak Singha, Elisa Ricci, Biplab Banerjee	(参考訳) CLIPのような大規模ファウンデーションモデルは、よく設計された言語プロンプトを活用して、下流タスクで印象的なゼロショットの一般化性能を示した。しかし、これらの即興学習技術は、しばしばドメインシフトに苦しめられ、一般化能力が制限される。本研究では、ドメイン間の分類性能を高めるドメイン一般化(DG)の新しいアプローチであるStyLIPを提案し、この問題に対処する。提案手法は,CLIPの事前学習された視覚エンコーダに埋め込まれた視覚スタイルやコンテンツ情報を,ドメインに依存しないプロンプト学習戦略に重点を置いている。そこで我々は,抽出したマルチスケールスタイルの特徴から,ドメイン固有のプロンプトトークンを直接学習する一連のスタイルプロジェクタを提案する。これらの生成したプロンプト埋め込みは、その後、コンテンツプロジェクタが学習したマルチスケールのビジュアルコンテンツ機能と組み合わせられる。プロジェクタは、CLIPの固定されたビジョンとテキストバックボーンを利用して、対照的な方法でトレーニングされる。複数のベンチマークデータセット上で5つの異なるDG設定で実施された広範な実験を通じて、StyLIPが現在のSOTA(State-of-the-art)メソッドよりも優れていることを一貫して実証する。 Large-scale foundation models, such as CLIP, have demonstrated impressive zero-shot generalization performance on downstream tasks, leveraging well-designed language prompts. However, these prompt learning techniques often struggle with domain shift, limiting their generalization capabilities. In our study, we tackle this issue by proposing StyLIP, a novel approach for Domain Generalization (DG) that enhances CLIP's classification performance across domains. Our method focuses on a domain-agnostic prompt learning strategy, aiming to disentangle the visual style and content information embedded in CLIP's pre-trained vision encoder, enabling effortless adaptation to novel domains during inference. To achieve this, we introduce a set of style projectors that directly learn the domain-specific prompt tokens from the extracted multi-scale style features. These generated prompt embeddings are subsequently combined with the multi-scale visual content features learned by a content projector. The projectors are trained in a contrastive manner, utilizing CLIP's fixed vision and text backbones. 
Through extensive experiments conducted in five different DG settings on multiple benchmark datasets, we consistently demonstrate that StyLIP outperforms the current state-of-the-art (SOTA) methods.	翻訳日:2023-12-01 03:58:38 公開日:2023-11-28
# 一般化Werner状態における2量子絡みのシングルキュービット計測 Single-qubit measurement of two-qubit entanglement in generalized Werner states ( http://arxiv.org/abs/2306.04103v2 ) ライセンス: Link先を確認	Salini Rajeev and Mayukh Lahiri	(参考訳) 従来の2量子フォトニック混合状態における絡み合いの測定方法は、両方の量子ビットの検出を必要とする。両キュービット検出を必要とせず,より広い絡み合った状態のクラスをカバーするように拡張することで,最近導入された手法を一般化した。具体的には、ワーナー状態の一般化によって得られる2量子混合状態の族におけるエンタングルメントを、量子ビットの1つを検出せずに測定する方法を示す詳細な理論を示す。本手法は干渉計であり, 偶然の計測やポストセレクションは不要である。また,本手法が実験的に実装可能であり,実験的損失に対する耐性を示すために,予測される実験不完全さを定量的に解析する。 Conventional methods of measuring entanglement in a two-qubit photonic mixed state require the detection of both qubits. We generalize a recently introduced method which does not require the detection of both qubits, by extending it to cover a wider class of entangled states. Specifically, we present a detailed theory that shows how to measure entanglement in a family of two-qubit mixed states - obtained by generalizing Werner states - without detecting one of the qubits. Our method is interferometric and does not require any coincidence measurement or postselection. We also perform a quantitative analysis of anticipated experimental imperfections to show that the method is experimentally implementable and resistant to experimental losses.	翻訳日:2023-12-01 03:50:13 公開日:2023-11-28
# クラスインクリメンタルセグメンテーションのための知識マイニングの進化 Evolving Knowledge Mining for Class Incremental Segmentation ( http://arxiv.org/abs/2306.02027v2 ) ライセンス: Link先を確認	Zhihe Lu, Shuicheng Yan, Xinchao Wang	(参考訳) クラスインクリメンタルセマンティックセマンティックセグメンテーション(CISS)は,近年,実世界のアプリケーションにおいて大きな意味を持つ傾向にある。既存のCISS法は優れた性能を示すが、それらは低レベルの特徴において豊富な知識と多様な知識を無視しながらのみ高レベルの知識(機能)を活用し、古い知識の保存が貧弱で新しい知識探索が弱いか、あるいは重いバックボーンをトレーニングすることで知識の蒸留に多レベルの特徴を用いるかのどちらかである。本稿では,CISSの効率的な多粒度知識再利用を初めて検討し,凍結したバックボーンを用いてkNowleDge mining(ENDING)を進化させる新しい手法を提案する。 ENDINGには、融合の進化と意味の強化という、2つの重要なモジュールが組み込まれている。メタネットから生成されるパーソナライズされた軽量ネットワークを用いて、個々の低レベル特徴から知識を抽出し、高レベル特徴を入力とする。この設計により、インクリメンタルな新しいクラスに適用されると、知識マイニングとfusingの進化が可能になる。対照的に、セマンティックエンハンスメントは、多レベル特徴からプロトタイプベースのセマンティクスを集約するために特別に作られ、拡張表現に寄与する。本手法を2つのベンチマークで評価し,最新の性能を一貫して実証する。コードはhttps://github.com/zhihelu/ending_issで入手できる。 Class Incremental Semantic Segmentation (CISS) has been a trend recently due to its great significance in real-world applications. Although the existing CISS methods demonstrate remarkable performance, they either leverage the high-level knowledge (feature) only while neglecting the rich and diverse knowledge in the low-level features, leading to poor old knowledge preservation and weak new knowledge exploration; or use multi-level features for knowledge distillation by retraining a heavy backbone, which is computationally intensive. In this paper, we for the first time investigate the efficient multi-grained knowledge reuse for CISS, and propose a novel method, Evolving kNowleDge minING (ENDING), employing a frozen backbone. ENDING incorporates two key modules: evolving fusion and semantic enhancement, for dynamic and comprehensive exploration of multi-grained knowledge. Evolving fusion is tailored to extract knowledge from individual low-level feature using a personalized lightweight network, which is generated from a meta-net, taking the high-level feature as input. 
This design enables the evolution of knowledge mining and fusing when applied to incremental new classes. In contrast, semantic enhancement is specifically crafted to aggregate prototype-based semantics from multi-level features, contributing to an enhanced representation. We evaluate our method on two widely used benchmarks and consistently demonstrate new state-of-the-art performance. The code is available at https://github.com/zhiheLu/ENDING_ISS.	翻訳日:2023-12-01 03:49:42 公開日:2023-11-28
# 概念表現は身体を必要とするか? 大規模言語モデルからの洞察 Does Conceptual Representation Require Embodiment? Insights From Large Language Models ( http://arxiv.org/abs/2305.19103v2 ) ライセンス: Link先を確認	Qihui Xu, Yingying Peng, Minghua Wu, Feng Xiao, Martin Chodorow, and Ping Li	(参考訳) 言語だけが複雑な概念をもたらすのか、それとも具体的経験が不可欠か? 大規模言語モデル(LLM)の最近の進歩は、この問題に新たな視点を与えている。 LLMは制限されたモダリティに基づいて訓練されているが、様々な心理的タスクにおいて人間のようなパフォーマンスを示す。ヒトとチャットgpts(gpt-3.5とgpt-4)の4,442種類の語彙概念の表現を,感情,敬礼,精神的可視化,感覚,運動経験という5つの重要な領域を含む多次元で比較した。主な発見は2つあります 1) 両モデルとも非感性運動野ではヒトの表現と強く一致しているが, 感覚野や運動野では遅延がみられ, GPT-4はGPT-3.5より優れていた。 2) GPT-4の利得は付加的な視覚学習と結びついており, 触覚やイメージ可能性といった関連次元にも寄与すると考えられる。これらの結果は、孤立した言語の制限を強調し、入力の多様なモダリティの統合は、より人間的な概念表現につながる。 To what extent can language alone give rise to complex concepts, or is embodied experience essential? Recent advancements in large language models (LLMs) offer fresh perspectives on this question. Although LLMs are trained on restricted modalities, they exhibit human-like performance in diverse psychological tasks. Our study compared representations of 4,442 lexical concepts between humans and ChatGPTs (GPT-3.5 and GPT-4) across multiple dimensions, including five key domains: emotion, salience, mental visualization, sensory, and motor experience. We identify two main findings: 1) Both models strongly align with human representations in non-sensorimotor domains but lag in sensory and motor areas, with GPT-4 outperforming GPT-3.5; 2) GPT-4's gains are associated with its additional visual learning, which also appears to benefit related dimensions like haptics and imageability. These results highlight the limitations of language in isolation, and that the integration of diverse modalities of inputs leads to a more human-like conceptual representation.	翻訳日:2023-12-01 03:49:15 公開日:2023-11-28
# ビジョントランスフォーマーを本当に変身させる Making Vision Transformers Truly Shift-Equivariant ( http://arxiv.org/abs/2305.16316v2 ) ライセンス: Link先を確認	Renan A. Rojas-Gomez, Teck-Yian Lim, Minh N. Do, Raymond A. Yeh	(参考訳) コンピュータビジョンでは、ビジョントランスフォーマー (ViT) が網の深いアーキテクチャの1つになっている。畳み込みニューラルネットワーク(cnns)に触発されたにもかかわらず、vitsの出力は入力の小さな空間的シフト、すなわちシフト不変量に敏感である。この欠点に対処するために、トークン化、自己アテンション、パッチマージ、位置エンコーディングなど、ViTの各モジュールに新しいデータ適応型設計を導入する。提案するモジュールでは,Swin,SwinV2,CvT,MViTv2という,確立された4つのViTに対して真のシフト等価性を実現する。画像分類と意味セグメンテーションタスクにおける適応モデルの評価を行った。これらのモデルは、100%シフト一貫性を維持しながら、3つの異なるデータセットで競合性能を達成します。 For computer vision, Vision Transformers (ViTs) have become one of the go-to deep net architectures. Despite being inspired by Convolutional Neural Networks (CNNs), ViTs' output remains sensitive to small spatial shifts in the input, i.e., not shift invariant. To address this shortcoming, we introduce novel data-adaptive designs for each of the modules in ViTs, such as tokenization, self-attention, patch merging, and positional encoding. With our proposed modules, we achieve true shift-equivariance on four well-established ViTs, namely, Swin, SwinV2, CvT, and MViTv2. Empirically, we evaluate the proposed adaptive models on image classification and semantic segmentation tasks. These models achieve competitive performance across three different datasets while maintaining 100% shift consistency.	翻訳日:2023-12-01 03:48:54 公開日:2023-11-28
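The shift-equivariance property named in the abstract above can be illustrated with a minimal one-dimensional sketch. This is not the paper's data-adaptive design; the helper names (`circular_shift`, `circular_conv`, `downsample`) and the toy signal are assumptions made for illustration. It shows only the standard fact that a circular convolution commutes with circular shifts, whereas fixed-stride subsampling (the kind of operation used in patch tokenization) does not.

```python
def circular_shift(signal, k):
    """Shift a list to the right by k positions with wrap-around."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


def circular_conv(signal, kernel):
    """Circular convolution: y[i] = sum_j kernel[j] * signal[(i - j) mod n]."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def downsample(signal, stride=2):
    """Keep every `stride`-th sample, starting at index 0 (fixed grid)."""
    return signal[::stride]


x = [1, 4, 2, 7, 3, 0, 5, 6]   # toy input (hypothetical values)
kernel = [1, -1, 2]            # toy filter (hypothetical values)

# Convolution: shifting the input and then filtering equals
# filtering and then shifting the output.
conv_is_equivariant = (
    circular_conv(circular_shift(x, 1), kernel)
    == circular_shift(circular_conv(x, kernel), 1)
)

# Fixed-grid downsampling: a one-sample input shift selects a different
# set of samples, so no shift of the original output reproduces it.
shifted_then_sampled = downsample(circular_shift(x, 1))
sampled = downsample(x)
downsample_is_equivariant = any(
    circular_shift(sampled, s) == shifted_then_sampled
    for s in range(len(sampled))
)

print(conv_is_equivariant, downsample_is_equivariant)
```

In this toy setting the convolution passes the check and the fixed-grid subsampling fails it, which is the kind of inconsistency the abstract says its redesigned modules are meant to remove.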
# 線形逆問題に対する複合ガウス最小二乗アルゴリズムとアンロールネットワーク A Compound Gaussian Least Squares Algorithm and Unrolled Network for Linear Inverse Problems ( http://arxiv.org/abs/2305.11120v3 ) ライセンス: Link先を確認	Carter Lyons, Raghu G. Raj, and Margaret Cheney	(参考訳) 本稿では, 線形逆問題, 特にトモグラフィー画像や圧縮センシングに現れるタイプの問題を解くために, 2つの新しいアプローチを考案する。第一のアプローチは、正規化が複合ガウス事前分布に基づく正則化最小二乗対象関数を最小化する反復アルゴリズムである。ガウス型化合物は、スパーシティに基づくアプローチを含む画像再構成においてよく使われる多くの先行事項を事前に仮定する。このアルゴリズムは、反復アルゴリズムの「アンロール」または「アンフォールディング」に対応するディープニューラルネットワークである。アンロールされたディープニューラルネットワークは解釈可能な層を持ち、標準的なディープラーニング手法より優れている。本稿では,両アルゴリズムの構成と性能に関する洞察を与える計算理論について述べる。結論は、どちらのアルゴリズムも、トモグラフィ画像形成や圧縮センシング、特に低訓練の難しい状況において、最先端のアプローチよりも優れているということである。 For solving linear inverse problems, particularly of the type that appears in tomographic imaging and compressive sensing, this paper develops two new approaches. The first approach is an iterative algorithm that minimizes a regularized least squares objective function where the regularization is based on a compound Gaussian prior distribution. The compound Gaussian prior subsumes many of the commonly used priors in image reconstruction, including those of sparsity-based approaches. The developed iterative algorithm gives rise to the paper's second new approach, which is a deep neural network that corresponds to an "unrolling" or "unfolding" of the iterative algorithm. Unrolled deep neural networks have interpretable layers and outperform standard deep learning methods. This paper includes a detailed computational theory that provides insight into the construction and performance of both algorithms. The conclusion is that both algorithms outperform other state-of-the-art approaches to tomographic image formation and compressive sensing, especially in the difficult regime of low training.	翻訳日:2023-12-01 03:47:53 公開日:2023-11-28
# 訓練不能な表現を用いたニューラル放射場 MultiPlaneNeRF MultiPlaneNeRF: Neural Radiance Field with Non-Trainable Representation ( http://arxiv.org/abs/2305.10579v2 ) ライセンス: Link先を確認	Dominik Zimny, Artur Kasymov, Adam Kania, Jacek Tabor, Maciej Zięba, Przemysław Spurek	(参考訳) NeRFは2D画像から3Dオブジェクトを効率的に表現する人気モデルである。しかしながら、バニラNeRFにはいくつかの重要な制限がある。 NeRFは個々のオブジェクトに対して個別にトレーニングされなければならない。ニューラルネットワークの重みで物体の形状と色を符号化するため、トレーニング時間は長い。さらに、NeRFは見えないデータに対してうまく一般化しない。本稿では,上記の問題を同時に解くモデルであるMultiPlaneNeRFを提案する。私たちのモデルは2D画像を直接処理します。 2次元画像に3dポイントを投影し,訓練不能な表現を生成する。投影ステップはパラメータ化されず、非常に浅いデコーダが効率よく表現を処理できる。さらに、大きなデータセット上でMultiPlaneNeRFをトレーニングし、暗黙のデコーダを多くのオブジェクトに一般化させます。これにより、新しいオブジェクトのNeRF表現を生成するために、2Dイメージを(追加のトレーニングなしで)置き換えるだけでよい。実験セクションでは、MultiPlaneNeRFが、新しいビューを合成するための最先端モデルに匹敵する結果を達成し、一般化特性を有することを示す。さらに、MultiPlaneデコーダは、GANのような大規模な生成モデルのコンポーネントとして使用できる。 NeRF is a popular model that efficiently represents 3D objects from 2D images. However, vanilla NeRF has some important limitations. NeRF must be trained on each object separately. The training time is long since we encode the object's shape and color in neural network weights. Moreover, NeRF does not generalize well to unseen data. In this paper, we present MultiPlaneNeRF -- a model that simultaneously solves the above problems. Our model works directly on 2D images. We project 3D points on 2D images to produce non-trainable representations. The projection step is not parametrized and a very shallow decoder can efficiently process the representation. Furthermore, we can train MultiPlaneNeRF on a large data set and force our implicit decoder to generalize across many objects. Consequently, we only need to replace the 2D images (without additional training) to produce a NeRF representation of the new object. In the experimental section, we demonstrate that MultiPlaneNeRF achieves results comparable to state-of-the-art models for synthesizing new views and has generalization properties. 
Additionally, the MultiPlane decoder can be used as a component in large generative models like GANs.	翻訳日:2023-12-01 03:47:35 公開日:2023-11-28
# 有限・連続可変量子系における速度制限量子-古典的最適輸送 Rate-Limited Quantum-to-Classical Optimal Transport in Finite and Continuous-Variable Quantum Systems ( http://arxiv.org/abs/2305.10004v2 ) ライセンス: Link先を確認	Hafez M. Garmaroudi, S. Sandeep Pradhan, Jun Chen	(参考訳) 有限次元および連続変数の量子-古典的システムにおける出力制約付き速度-歪み符号化の観点から、速度制限型量子-古典的最適輸送を考察する。主符号化定理は、一般的に定義された歪観測可能な歪みに従ってソース状態から歪みのしきい値を維持しつつ、目的地分布(または等価量子状態)の正確な構成のために符号化される損失量子測定源の達成可能な速度領域の単一レター特性を提供する。出力空間上の制約は、出力分布をIDD予め定義された確率質量関数に固定する。したがって、この問題は、通信速度と共通ランダム性が制限された量子測定により、ソース量子状態を目的地の古典分布に輸送する最適なコストを求める情報制約付き最適輸送であると考えることもできる。クリッピングプロジェクションとデクタント化ブロックを用い, 有限次元符号化定理を用いて連続可変量子系の符号化フレームワークを開発する。さらに、ガウス量子系において、次数 2 のレート制限ワッサースタイン距離の解析解とガウス最適性定理を導出し、ガウス量子源とガウス目的地分布を持つ系において、ガウス計測が速度を最適化することを示す。さらに, 量子ガウス計測系において, 無限伝達率に対応するガウス分布の古典的ワッサースタイン距離とは対照的に, ハイゼンベルクの不確かさ原理によって課される量子測定の固有ノイズにより, 有限伝達率で最適輸送が達成されることを示した。 We consider the rate-limited quantum-to-classical optimal transport in terms of output-constrained rate-distortion coding for both finite-dimensional and continuous-variable quantum-to-classical systems with limited classical common randomness. The main coding theorem provides a single-letter characterization of the achievable rate region of a lossy quantum measurement source coding for an exact construction of the destination distribution (or the equivalent quantum state) while maintaining a threshold of distortion from the source state according to a generally defined distortion observable. The constraint on the output space fixes the output distribution to an IID predefined probability mass function. Therefore, this problem can also be viewed as information-constrained optimal transport which finds the optimal cost of transporting the source quantum state to the destination classical distribution via a quantum measurement with limited communication rate and common randomness. 
We develop a coding framework for continuous-variable quantum systems by employing a clipping projection and a dequantization block and using our finite-dimensional coding theorem. Moreover, for Gaussian quantum systems, we derive an analytical solution for the rate-limited Wasserstein distance of order 2, along with a Gaussian optimality theorem, showing that a Gaussian measurement optimizes the rate in a system with a Gaussian quantum source and a Gaussian destination distribution. The results further show that, in contrast to the classical Wasserstein distance of Gaussian distributions, which corresponds to an infinite transmission rate, in the quantum Gaussian measurement system the optimal transport is achieved with a finite transmission rate due to the inherent noise of the quantum measurement imposed by Heisenberg's uncertainty principle.	翻訳日:2023-12-01 03:47:16 公開日:2023-11-28
# ニューラルポアソン表面再構成:点雲からの解像度非依存形状再構成 Neural Poisson Surface Reconstruction: Resolution-Agnostic Shape Reconstruction from Point Clouds ( http://arxiv.org/abs/2308.01766v3 ) ライセンス: Link先を確認	Hector Andrade-Loarca, Julius Hege, Daniel Cremers, Gitta Kutyniok	(参考訳) 我々は,3次元形状を点から復元するという課題に対処する形状再構成アーキテクチャであるニューラルポアソン表面再構成(nPSR)を導入する。従来のディープニューラルネットワークは、高解像度での計算複雑性のため、一般的な3次元形状の離散化技術による課題に直面している。これを解決するために、フーリエニューラル演算子を用いてポアソン方程式を解き、配向点雲の測定からメッシュを再構築する。第一に、FNOの分解能に依存しない性質のおかげで、高分解能評価において同等の性能を達成しつつ、低分解能データの効率的なトレーニングを可能にします。この機能はワンショットの超解像度を可能にする。第2に, 本手法は, 点サンプリング率に関して微分可能かつ頑健でありながら, 既存の復元品質のアプローチを上回っている。全体として、ニューラルポアソン表面再構成は、形状再構成における古典的なディープニューラルネットワークの制限を改良するだけでなく、再構成品質、実行時間、解像度非依存性の観点から優れた結果を得る。 We introduce Neural Poisson Surface Reconstruction (nPSR), an architecture for shape reconstruction that addresses the challenge of recovering 3D shapes from points. Traditional deep neural networks face challenges with common 3D shape discretization techniques due to their computational complexity at higher resolutions. To overcome this, we leverage Fourier Neural Operators to solve the Poisson equation and reconstruct a mesh from oriented point cloud measurements. nPSR exhibits two main advantages: First, it enables efficient training on low-resolution data while achieving comparable performance at high-resolution evaluation, thanks to the resolution-agnostic nature of FNOs. This feature allows for one-shot super-resolution. Second, our method surpasses existing approaches in reconstruction quality while being differentiable and robust with respect to point sampling rates. Overall, the neural Poisson surface reconstruction not only improves upon the limitations of classical deep neural networks in shape reconstruction but also achieves superior results in terms of reconstruction quality, running time, and resolution agnosticism.	翻訳日:2023-12-01 03:40:13 公開日:2023-11-28
# アベリア集団行動の量子マネー Quantum Money from Abelian Group Actions ( http://arxiv.org/abs/2307.12120v3 ) ライセンス: Link先を確認	Mark Zhandry	(参考訳) 我々は、公鍵量子マネーの構築と、アーベル群作用から量子雷と呼ばれる強化版も与え、楕円曲線上の適切な等質性から構築することができる。本稿では,グループ行動の一般群モデルにおけるセキュリティの検証を行い,このモデルにおける量子セキュリティを証明する汎用ツールキットを開発した。その過程で、量子設定における知識仮定と代数群作用を探求し、一般的な群作用と比較してこれらの仮定/モデルに重大な制限を見いだす。 We give a construction of public key quantum money, and even a strengthened version called quantum lightning, from abelian group actions, which can in turn be constructed from suitable isogenies over elliptic curves. We prove security in the generic group model for group actions under a plausible computational assumption, and develop a general toolkit for proving quantum security in this model. Along the way, we explore knowledge assumptions and algebraic group actions in the quantum setting, finding significant limitations of these assumptions/models compared to generic group actions.	翻訳日:2023-12-01 03:39:51 公開日:2023-11-28
# My3DGen: スケーラブルなパーソナライズされた3D生成モデル My3DGen: A Scalable Personalized 3D Generative Model ( http://arxiv.org/abs/2307.05468v3 ) ライセンス: Link先を確認	Luchao Qi, Jiaye Wu, Annie N. Wang, Shengze Wang, Roni Sengupta	(参考訳) 近年,フォトリアリスティック顔の合成問題に取り組むため,生成型3次元顔モデル(eg3dなど)が開発されている。しかし、これらのモデルは個々の個人に固有の顔の特徴を捉えることができず、パーソナライゼーションの重要性を強調している。いくつかの先行研究は、生成的顔モデルのパーソナライズを約束しているが、これらの研究は主に2D設定に焦点を当てている。また、これらの方法では、ユーザ毎に多数のパラメータを微調整して保存する必要があるため、スケーラブルなパーソナライズを実現する上で障害となる。パーソナライゼーションのもうひとつの課題は、個々の個人が利用可能なトレーニングイメージ数が限られていることだ。提案手法であるmy3dgenは,50以上のトレーニング画像を用いて個人にパーソナライズされた3d画像を生成する。 My3DGenは、新しいビューの合成、特定の顔のセマンティックな編集(例えば、笑顔を追加する)、新しい外観の合成を可能にする。我々は3D顔の特徴をグローバルな特徴とパーソナライズされた特徴に分解し、トレーニング済みのEG3Dを凍結し、低ランクの分解によってさらにパーソナライズされた重みをトレーニングする。その結果、my3dgenは個人ごとに$\textbf{240k}$のパラメータを導入するだけで、パラメータ空間全体の微調整に必要な$\textbf{30.6m}$と比較して、トレーニング可能なパラメータの$\textbf{127}\times$が削減される。ストレージの大幅な削減にもかかわらず、我々のモデルは下流アプリケーションの品質を損なうことなくアイデンティティ機能を保存する。 In recent years, generative 3D face models (e.g., EG3D) have been developed to tackle the problem of synthesizing photo-realistic faces. However, these models are often unable to capture facial features unique to each individual, highlighting the importance of personalization. Some prior works have shown promise in personalizing generative face models, but these studies primarily focus on 2D settings. Also, these methods require both fine-tuning and storing a large number of parameters for each user, posing a hindrance to achieving scalable personalization. Another challenge of personalization is the limited number of training images available for each individual, which often leads to overfitting when using full fine-tuning methods. Our proposed approach, My3DGen, generates a personalized 3D prior of an individual using as few as 50 training images. My3DGen allows for novel view synthesis, semantic editing of a given face (e.g. adding a smile), and synthesizing novel appearances, all while preserving the original person's identity. 
We decouple the 3D facial features into global features and personalized features by freezing the pre-trained EG3D and training additional personalized weights through low-rank decomposition. As a result, My3DGen introduces only $\textbf{240K}$ personalized parameters per individual, leading to a $\textbf{127}\times$ reduction in trainable parameters compared to the $\textbf{30.6M}$ required for fine-tuning the entire parameter space. Despite this significant reduction in storage, our model preserves identity features without compromising the quality of downstream applications.	翻訳日:2023-12-01 03:38:23 公開日:2023-11-28
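The parameter figures quoted in the My3DGen abstract can be checked with a few lines of arithmetic. The sketch below also shows, under assumed layer sizes, why a low-rank update stores far fewer values than a full weight matrix. The function names and the 512 x 512, rank-4 layer are hypothetical and are not taken from the paper; only the 240K and 30.6M figures come from the abstract.

```python
def full_params(d_out, d_in):
    """Number of entries in a dense d_out x d_in weight matrix."""
    return d_out * d_in


def low_rank_params(d_out, d_in, rank):
    """Entries in a rank-`rank` factorization B (d_out x rank) @ A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only to illustrate the scaling.
d_out, d_in, rank = 512, 512, 4
dense = full_params(d_out, d_in)
factored = low_rank_params(d_out, d_in, rank)
layer_ratio = dense / factored

# Figures quoted in the abstract: 30.6M parameters for full fine-tuning
# versus 240K personalized parameters per individual.
reported_ratio = 30_600_000 / 240_000

print(dense, factored, layer_ratio, reported_ratio)
```

With the rounded figures the ratio comes out at 127.5, in line with the roughly 127-fold reduction stated in the abstract; the small gap presumably reflects rounding of the published parameter counts.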
# DynamicBEV:3Dオブジェクト検出のための動的クエリと時間コンテキストを活用する DynamicBEV: Leveraging Dynamic Queries and Temporal Context for 3D Object Detection ( http://arxiv.org/abs/2310.05989v2 ) ライセンス: Link先を確認	Jiawei Yao and Yingxin Lai	(参考訳) 3Dオブジェクト検出は、自動運転やロボティクスといったアプリケーションには不可欠だ。 BEV(Bird's Eye View)画像に対するクエリベースの3Dオブジェクト検出は大幅に進歩しているが、既存の手法のほとんどは静的クエリのパラダイムに従っている。このようなパラダイムは、シーン内の複雑な空間的時間的関係に適応できない。この問題を解決するために,BEVに基づく3次元オブジェクト検出に動的クエリを利用する新しいアプローチであるDynamicBEVのパラダイムを導入する。静的クエリとは対照的に,提案する動的クエリはk-meansクラスタリングとtop-kアテンションを創造的な方法で活用し,局所的特徴と遠方特徴の両方からより効率的に情報を集約する。効率をさらに高めるため、DynamicBEVは、時間文脈の効率的な統合と計算の大幅な削減のために設計された軽量時間融合モジュール(LTFM)を組み込んでいる。さらに、カスタム設計の多様性損失によって、シナリオ間でバランスのとれた機能表現が保証される。 nuScenesデータセットの大規模な実験はDynamicBEVの有効性を検証し、新しい最先端技術を確立し、クエリベースのBEVオブジェクト検出におけるパラダイムレベルのブレークスルーを宣言する。 3D object detection is crucial for applications like autonomous driving and robotics. While query-based 3D object detection for BEV (Bird's Eye View) images has seen significant advancements, most existing methods follow the static-query paradigm. Such a paradigm is incapable of adapting to complex spatial-temporal relationships in the scene. To solve this problem, we introduce a new paradigm in DynamicBEV, a novel approach that employs dynamic queries for BEV-based 3D object detection. In contrast to static queries, the proposed dynamic queries exploit K-means clustering and Top-K Attention in a creative way to aggregate information more effectively from both local and distant features, which enables DynamicBEV to adapt iteratively to complex scenes. To further boost efficiency, DynamicBEV incorporates a Lightweight Temporal Fusion Module (LTFM), designed for efficient temporal context integration with a significant computation reduction. Additionally, a custom-designed Diversity Loss ensures a balanced feature representation across scenarios. 
Extensive experiments on the nuScenes dataset validate the effectiveness of DynamicBEV, establishing a new state-of-the-art and heralding a paradigm-level breakthrough in query-based BEV object detection.	翻訳日:2023-12-01 03:28:51 公開日:2023-11-28
# deepdecipher: 大言語モデルにおけるニューロン活性化のアクセスと研究 DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models ( http://arxiv.org/abs/2310.01870v2 ) ライセンス: Link先を確認	Albert Garde, Esben Kran, Fazl Barez	(参考訳) 大きな言語モデル(LLM)がより能力を持つようになると、解釈可能で透明なツールが緊急に必要になる。現在の手法の実装は困難であり、モデル内部を解析するためのアクセス可能なツールが不足している。このギャップを埋めるために、私たちはDeepDecipher – トランスフォーマーモデルのMLPレイヤでニューロンを探索するためのAPIとインターフェースを提供する。 deepdecipherはllmの高度な解釈技術の出力を簡単に利用できるようにする。使いやすいインターフェースは、これらの複雑なモデルの検査をより直感的にする。本稿ではDeepDecipherの設計と機能について概説する。我々は、ニューロンを分析し、モデルを比較し、モデル行動に関する洞察を得る方法を実証する。例えば、deepdecipherの機能とneuroscopeやopenaiのneuron explanationerのような類似のツールを比較します。 DeepDecipherは、LLMの効率的でスケーラブルな分析を可能にする。最先端の解釈方法へのアクセスを許可することで、deepdecipherはllmをより透明で、信頼性があり、安全である。研究者、エンジニア、開発者は、問題を迅速に診断し、システムを監査し、この分野を前進させることができる。 As large language models (LLMs) become more capable, there is an urgent need for interpretable and transparent tools. Current methods are difficult to implement, and accessible tools to analyze model internals are lacking. To bridge this gap, we present DeepDecipher - an API and interface for probing neurons in transformer models' MLP layers. DeepDecipher makes the outputs of advanced interpretability techniques for LLMs readily available. The easy-to-use interface also makes inspecting these complex models more intuitive. This paper outlines DeepDecipher's design and capabilities. We demonstrate how to analyze neurons, compare models, and gain insights into model behavior. For example, we contrast DeepDecipher's functionality with similar tools like Neuroscope and OpenAI's Neuron Explainer. DeepDecipher enables efficient, scalable analysis of LLMs. By granting access to state-of-the-art interpretability methods, DeepDecipher makes LLMs more transparent, trustworthy, and safe. Researchers, engineers, and developers can quickly diagnose issues, audit systems, and advance the field.	翻訳日:2023-12-01 03:27:47 公開日:2023-11-28
# 変圧器の深さ勾配連続性の改善:CNNによる単眼深度推定の比較検討 Improving Depth Gradient Continuity in Transformers: A Comparative Study on Monocular Depth Estimation with CNN ( http://arxiv.org/abs/2308.08333v2 ) ライセンス: Link先を確認	Jiawei Yao, Tong Wu, Xiaofeng Zhang	(参考訳) 単眼深度推定はコンピュータビジョンにおいて進行中の課題である。最近のトランスフォーマーモデルの進歩は、この分野の従来のcnnよりも顕著な利点を示している。しかし、これらのモデルが2次元画像の異なる領域を優先し、これらの領域が深さ推定性能にどのように影響するかを理解するには、まだギャップがある。トランスフォーマーとcnnの違いを探るため,我々は,両者の区別を対比的に解析するために,疎画素法を適用した。以上の結果から,トランスフォーマーはグローバルな文脈や複雑なテクスチャを扱うのに優れるが,CNNより遅れて奥行き勾配の連続性を保っていることが示唆された。単眼深度推定におけるトランスモデルの性能をさらに高めるために,高次微分,特徴融合,再校正により深さ推定を洗練する深さ勾配補正(dgr)モジュールを提案する。さらに, 最適輸送理論を活用し, 深度写像を空間確率分布として扱い, 最適輸送距離を損失関数としてモデル最適化を行う。実験により,プラグアンドプレイDGR(Depth Gradient Refinement)モジュールに統合されたモデルと,提案した損失関数により,屋外KITTIと屋内NYU-Depth-v2データセットの複雑さと計算コストを増大させることなく,性能が向上することを示した。本研究は,トランスフォーマーとCNNの深度推定における区別に関する新たな知見を提供するだけでなく,新しい深度推定手法の道を開く。 Monocular depth estimation is an ongoing challenge in computer vision. Recent progress with Transformer models has demonstrated notable advantages over conventional CNNs in this area. However, there's still a gap in understanding how these models prioritize different regions in 2D images and how these regions affect depth estimation performance. To explore the differences between Transformers and CNNs, we employ a sparse pixel approach to contrastively analyze the distinctions between the two. Our findings suggest that while Transformers excel in handling global context and intricate textures, they lag behind CNNs in preserving depth gradient continuity. To further enhance the performance of Transformer models in monocular depth estimation, we propose the Depth Gradient Refinement (DGR) module that refines depth estimation through high-order differentiation, feature fusion, and recalibration. Additionally, we leverage optimal transport theory, treating depth maps as spatial probability distributions, and employ the optimal transport distance as a loss function to optimize our model. 
Experimental results demonstrate that models integrated with the plug-and-play Depth Gradient Refinement (DGR) module and the proposed loss function enhance performance without increasing complexity and computational costs on both outdoor KITTI and indoor NYU-Depth-v2 datasets. This research not only offers fresh insights into the distinctions between Transformers and CNNs in depth estimation but also paves the way for novel depth estimation methodologies.	翻訳日:2023-12-01 03:24:50 公開日:2023-11-28
# 確率勾配降下法と適応勾配法とのロバスト性差の理解 Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods ( http://arxiv.org/abs/2308.06703v2 ) ライセンス: Link先を確認	Avery Ma, Yangchen Pan and Amir-massoud Farahmand	(参考訳) 確率勾配勾配(SGD)とアダムやRMSPropのような適応勾配法は、ディープニューラルネットワークのトレーニングに広く用いられている。これらの手法を用いて訓練したモデルの標準一般化性能の差は小さいが、SGDを用いて訓練したモデルは入力摂動下でははるかに頑健であることを示す。特に,本研究は,モデルの一般化性能に影響を及ぼさない自然データセットにおける非関連周波数の存在を実証する。しかし、適応的手法で訓練されたモデルはこれらの変化に敏感であり、それらの無関係な周波数の使用は摂動に敏感な解をもたらす可能性があることを示唆している。この違いをよりよく理解するために,自然信号を反映した合成データセット上での勾配降下(gd)と符号勾配降下(signgd)の学習ダイナミクスについて検討した。 3次元入力空間では、GD と signGD で最適化されたモデルは標準リスクがゼロに近いが、その逆のリスクは異なる。この結果から, モデルパラメータの重みノルムに対して, $\ell_2$-norm の有界変化に対する線形モデルのロバスト性は逆比例することがわかった。ディープラーニングの文脈では、SGD学習ニューラルネットワークはリプシッツ定数が小さく、適応勾配法で訓練されたものよりも入力摂動の堅牢性が高いことを説明できる。 Stochastic gradient descent (SGD) and adaptive gradient methods, such as Adam and RMSProp, have been widely used in training deep neural networks. We empirically show that while the difference between the standard generalization performance of models trained using these methods is small, those trained using SGD exhibit far greater robustness under input perturbations. Notably, our investigation demonstrates the presence of irrelevant frequencies in natural datasets, where alterations do not affect models' generalization performance. However, models trained with adaptive methods show sensitivity to these changes, suggesting that their use of irrelevant frequencies can lead to solutions sensitive to perturbations. To better understand this difference, we study the learning dynamics of gradient descent (GD) and sign gradient descent (signGD) on a synthetic dataset that mirrors natural signals. With a three-dimensional input space, the models optimized with GD and signGD have standard risks close to zero but vary in their adversarial risks. 
Our result shows that linear models' robustness to $\ell_2$-norm bounded changes is inversely proportional to the model parameters' weight norm: a smaller weight norm implies better robustness. In the context of deep learning, our experiments show that SGD-trained neural networks have smaller Lipschitz constants, explaining the better robustness to input perturbations than those trained with adaptive gradient methods.	翻訳日:2023-12-01 03:24:20 公開日:2023-11-28
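The weight-norm statement in the abstract above can be illustrated for a plain linear map f(x) = w·x. By the Cauchy-Schwarz inequality, the largest output change under a perturbation with ‖δ‖₂ ≤ ε is ε‖w‖₂, reached at δ = ε·w/‖w‖₂. The following is a minimal sketch of that textbook fact only, not the paper's experiments; all function names and the example weights are illustrative assumptions.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in v))


def output(w, x):
    """Linear model without bias: f(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def worst_case_shift(w, eps):
    """Largest |f(x + d) - f(x)| over all d with ||d||_2 <= eps (Cauchy-Schwarz bound)."""
    return eps * l2_norm(w)


def worst_case_perturbation(w, eps):
    """Perturbation of norm eps that attains the bound: eps * w / ||w||_2."""
    n = l2_norm(w)
    return [eps * c / n for c in w]


# Two hypothetical weight vectors pointing in the same direction, norms 1 and 5.
w_small = [0.6, 0.8, 0.0]
w_large = [3.0, 4.0, 0.0]
x = [1.0, -2.0, 0.5]
eps = 0.1

for w in (w_small, w_large):
    d = worst_case_perturbation(w, eps)
    x_adv = [xi + di for xi, di in zip(x, d)]
    shift = output(w, x_adv) - output(w, x)
    # The observed shift matches eps * ||w||_2 up to rounding, so it scales with the norm.
    print(l2_norm(w), shift, worst_case_shift(w, eps))
```

For a fixed perturbation budget, the model with the smaller weight norm moves less, which is the sense in which a smaller norm means better robustness here.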
# テンソルネットワークの変分断熱輸送 Variational adiabatic transport of tensor networks ( http://arxiv.org/abs/2311.00748v2 ) ライセンス: Link先を確認	Hyeongjin Kim, Matthew T. Fishman, Dries Sels	(参考訳) 本稿では, 断熱ゲージポテンシャル(断熱変換の生成元)を行列積演算子として構築するテンソルネットワーク法について論じる。これにより行列積状態の断熱輸送が可能になる。テンソルネットワークの断熱的発展には幅広い応用があり、本稿ではそのうち、テンソルネットワーク最適化の改善と相図の走査の2つを検討する。固有状態を量子臨界点まで効率的に輸送し、その過程で中間的な密度行列繰り込み群(DMRG)最適化を行うことにより、量子臨界点またはその近傍において、標準的なDMRG法よりも高速かつ確実に基底状態および低励起状態を計算できることを実証する。また, 断熱ゲージポテンシャルのノルムに基づく, 簡易な自動ステップサイズ調整と臨界点の検出を実証する。注目すべきことに、我々が研究するモデルの臨界点を通して、状態を確実に輸送することができる。 We discuss a tensor network method for constructing the adiabatic gauge potential -- the generator of adiabatic transformations -- as a matrix product operator, which allows us to adiabatically transport matrix product states. Adiabatic evolution of tensor networks offers a wide range of applications, of which two are explored in this paper: improving tensor network optimization and scanning phase diagrams. By efficiently transporting eigenstates to quantum criticality and performing intermediary density matrix renormalization group (DMRG) optimizations along the way, we demonstrate that we can compute ground and low-lying excited states faster and more reliably than a standard DMRG method at or near quantum criticality. We demonstrate a simple automated step size adjustment and detection of the critical point based on the norm of the adiabatic gauge potential. Remarkably, we are able to reliably transport states through the critical point of the models we study.	翻訳日:2023-12-01 03:15:35 公開日:2023-11-28
# 「熱間の読み」: 非侵襲的ストレス検出のための身体熱シグネチャのコティーチング "Reading Between the Heat": Co-Teaching Body Thermal Signatures for Non-intrusive Stress Detection ( http://arxiv.org/abs/2310.09932v2 ) ライセンス: Link先を確認	Yi Xiao, Harshit Sharma, Zhongyang Zhang, Dessa Bergen-Cico, Tauhidur Rahman, Asif Salekin	(参考訳) ストレスは私たちの身体的・精神的健康と社会生活に影響を与える。受動的で非接触の屋内ストレスモニタリングシステムは、職場の生産性評価、スマートホーム、パーソナライズされたメンタルヘルスモニタリングなど、数多くの重要な応用を可能にする。サーマルカメラで撮影されたユーザーの身体の熱シグネチャは、交感神経系および副交感神経系の「闘争・逃走」反応に関する重要な情報を提供しうるが、ストレス予測モデルの訓練をサーマルイメージングのみに頼ると、しばしば過学習を招き、結果として性能が最適でなくなる。本稿では,ウェアラブルのモダリティから非接触の熱モダリティへ知識を転移することで高いストレス予測性能を実現する、新しいコティーチングフレームワークThermaStrainを導入し,この課題に対処する。トレーニング中、ThermaStrainはウェアラブル皮膚電気活動(EDA)センサーを取り入れ、サーマルビデオからストレスを示す表現を生成し、ウェアラブルEDAセンサーから得られるストレスを示す表現を模倣する。テスト時には熱センシングのみを使用し, 熱データと模倣されたEDA表現からストレスを示すパターンを抽出してストレス評価を改善する。本研究では,様々なストレス条件と距離のもとで、サーマルビデオとEDAデータを含む包括的なデータセットを収集した。 ThermaStrainは二値ストレス分類においてF1スコア0.8293を達成し、熱のみのベースライン手法を9%以上上回る。広範な評価により、ストレスを示す属性の認識におけるThermaStrainの有効性、距離やストレスシナリオをまたぐ適応性、エッジプラットフォームでのリアルタイム実行可能性、複数人のセンシングへの適用可能性、視界が限られた条件や未知の条件で機能する能力、そしてコティーチング手法の利点が示されている。 Stress impacts our physical and mental health as well as our social life. A passive and contactless indoor stress monitoring system can unlock numerous important applications such as workplace productivity assessment, smart homes, and personalized mental health monitoring. While the thermal signatures from a user's body captured by a thermal camera can provide important information about the "fight-flight" response of the sympathetic and parasympathetic nervous system, relying solely on thermal imaging for training a stress prediction model often leads to overfitting and consequently a suboptimal performance. This paper addresses this challenge by introducing ThermaStrain, a novel co-teaching framework that achieves high stress-prediction performance by transferring knowledge from the wearable modality to the contactless thermal modality. 
During training, ThermaStrain incorporates a wearable electrodermal activity (EDA) sensor to generate stress-indicative representations from thermal videos, emulating stress-indicative representations from a wearable EDA sensor. During testing, only thermal sensing is used, and stress-indicative patterns from thermal data and emulated EDA representations are extracted to improve stress assessment. The study collected a comprehensive dataset with thermal video and EDA data under various stress conditions and distances. ThermaStrain achieves an F1 score of 0.8293 in binary stress classification, outperforming the thermal-only baseline approach by over 9%. Extensive evaluations highlight ThermaStrain's effectiveness in recognizing stress-indicative attributes, its adaptability across distances and stress scenarios, real-time executability on edge platforms, its applicability to multi-individual sensing, ability to function on limited visibility and unfamiliar conditions, and the advantages of its co-teaching approach.	翻訳日:2023-12-01 03:13:23 公開日:2023-11-28
# ファインマンプロパゲータによる確率分布の効率的な量子ロード Efficient quantum loading of probability distributions through Feynman propagators ( http://arxiv.org/abs/2311.13702v2 ) ライセンス: Link先を確認	Elie Alhajjar and Jesse Geneson and Anupam Prakash and Nicolas Robles	(参考訳) ${\hat H}= \Delta + V(x) \mathbb{I}$ の形をした1次元ハミルトニアンに対するハミルトニアンシミュレーションを用いて,確率分布をロードする量子アルゴリズムを提示する。ファインマンプロパゲータが解析的に閉じた形を持つことが知られているポテンシャル $V(x)$ を考え、これらのハミルトニアンを用いて正規分布、ラプラス分布、マクスウェル・ボルツマン分布を含む確率分布を量子状態へロードする。また,分布に対する粗い近似を「ラダー状態」の形で構築し,所望の確率分布を基底状態に持つように選んだハミルトニアンの基底状態へ射影することに基づく,確率分布ロードの変分的手法を提案する。これらの手法は確率分布のロードに利用可能な一連のテクニックを拡張し、量子機械学習で使用される汎用のデータロード方法よりも効率的である。 We present quantum algorithms for the loading of probability distributions using Hamiltonian simulation for one dimensional Hamiltonians of the form ${\hat H}= \Delta + V(x) \mathbb{I}$. We consider the potentials $V(x)$ for which the Feynman propagator is known to have an analytically closed form and utilize these Hamiltonians to load probability distributions including the normal, Laplace and Maxwell-Boltzmann into quantum states. We also propose a variational method for probability distribution loading based on constructing a coarse approximation to the distribution in the form of a `ladder state' and then projecting onto the ground state of a Hamiltonian chosen to have the desired probability distribution as ground state. These methods extend the suite of techniques available for the loading of probability distributions, and are more efficient than general purpose data loading methods used in quantum machine learning.	翻訳日:2023-12-01 03:02:14 公開日:2023-11-28
# 等変自己回帰貯留層コンピュータを用いた対称性の同定 Identifying Systems with Symmetries using Equivariant Autoregressive Reservoir Computers ( http://arxiv.org/abs/2311.09511v2 ) ライセンス: Link先を確認	Fredy Vides, Idelfonso B. R. Nogueira, Lendy Banegas, Evelyn Flores	(参考訳) 本報告では, 同種の自己回帰型貯水池コンピュータを用いて, 対称性を持つシステムを特定することに焦点を当てた。構造行列近似理論の一般的な結果を示し、2次元のアプローチを探求する。まず, 一般対称性保存非線形遅延埋め込みの包括的検討を行う。これは、研究中の同変系からサンプリングされた時系列データを解析することを含む。第二に、出力結合行列の近似表現を識別するためにスパース最小二乗法を適用する。これらの行列は同変系の非線形自己回帰表現を決定する上で重要な役割を果たす。これらの行列の構造的特性は、系に固有の対称性の集合によって決定される。この文書は、記述した手法から派生したプロトタイプアルゴリズムの概要を述べ、それらの実用的応用についての洞察を提供する。これらの系がカオス的振舞いを示すかどうかに関わらず、等変非線形系の同定と予測シミュレーションにおいて有効性に重点を置いている。 The investigation reported in this document focuses on identifying systems with symmetries using equivariant autoregressive reservoir computers. General results in structured matrix approximation theory are presented, exploring a two-fold approach. Firstly, a comprehensive examination of generic symmetry-preserving nonlinear time delay embedding is conducted. This involves analyzing time series data sampled from an equivariant system under study. Secondly, sparse least-squares methods are applied to discern approximate representations of the output coupling matrices. These matrices play a pivotal role in determining the nonlinear autoregressive representation of an equivariant system. The structural characteristics of these matrices are dictated by the set of symmetries inherent in the system. The document outlines prototypical algorithms derived from the described techniques, offering insight into their practical applications. Emphasis is placed on their effectiveness in the identification and predictive simulation of equivariant nonlinear systems, regardless of whether such systems exhibit chaotic behavior.	翻訳日:2023-12-01 03:00:53 公開日:2023-11-28
# LegendreTron: 適正なマルチクラス損失学習の向上 LegendreTron: Uprising Proper Multiclass Loss Learning ( http://arxiv.org/abs/2301.11695v3 ) ライセンス: Link先を確認	Kevin Lam, Christian Walder, Spiridon Penev, Richard Nock	(参考訳) 損失関数は教師付き学習の基礎であり、しばしばモデル開発の前に選択される。損失のアドホックな選択を避けるために、統計的決定理論は、ベイズ則が最適であることを主張する \emph{properness} として知られる、損失の望ましい性質を記述する。近年の研究では、\emph{損失の学習}とモデルの学習を同時に行うことが試みられている。既存の方法は、$\mathbb{R}$を$[0,1]$に単調に写す逆正準リンク関数を当てはめることで、二値問題の確率を推定する。本論文では、凸関数の勾配の単調性を用いて、$\mathbb{R}^{C-1}$と射影された確率単体$\tilde{\Delta}^{C-1}$の間の写像へ単調性を拡張する。本稿では,多クラス問題に対して \emph{proper canonical loss} と確率を同時に学習する新規かつ実用的な方法として {\sc LegendreTron} を提案する。最大1000クラスを持つドメインのベンチマークでテストした結果、我々の手法は10より多いクラスを持つすべてのデータセットにおいて、99%有意水準の$t$検定のもとで、自然な多クラスベースラインを一貫して上回る。 Loss functions serve as the foundation of supervised learning and are often chosen prior to model development. To avoid potentially ad hoc choices of losses, statistical decision theory describes a desirable property for losses known as \emph{properness}, which asserts that Bayes' rule is optimal. Recent works have sought to \emph{learn losses} and models jointly. Existing methods do this by fitting an inverse canonical link function which monotonically maps $\mathbb{R}$ to $[0,1]$ to estimate probabilities for binary problems. In this paper, we extend monotonicity to maps between $\mathbb{R}^{C-1}$ and the projected probability simplex $\tilde{\Delta}^{C-1}$ by using monotonicity of gradients of convex functions. We present {\sc LegendreTron} as a novel and practical method that jointly learns \emph{proper canonical losses} and probabilities for multiclass problems. Tested on a benchmark of domains with up to 1,000 classes, our experimental results show that our method consistently outperforms the natural multiclass baseline under a $t$-test at 99% significance on all datasets with greater than 10 classes.	翻訳日:2023-12-01 01:16:01 公開日:2023-11-28
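The phrase "monotonicity of gradients of convex functions" in the abstract above has a familiar fixed instance: the gradient of the convex function f(z) = log(1 + Σᵢ exp(zᵢ)) sends ℝ^{C-1} into the projected probability simplex (positive entries with sum below 1) and satisfies (∇f(a) − ∇f(b))·(a − b) ≥ 0. The sketch below shows only this standard multinomial-logistic example. It is not LegendreTron, which learns such a map rather than fixing it, and the function names are hypothetical.

```python
import math


def projected_softmax(z):
    """Gradient of f(z) = log(1 + sum_i exp(z_i)).

    In exact arithmetic every entry is positive and the entries sum to less
    than 1, i.e. the output lies in the open projected probability simplex.
    """
    m = max(0.0, max(z))                      # shift to avoid overflow in exp
    exps = [math.exp(zi - m) for zi in z]
    denom = math.exp(-m) + sum(exps)          # the "1 +" term, shifted by m
    return [e / denom for e in exps]


def is_monotone_pair(a, b):
    """Check the monotonicity inequality (grad f(a) - grad f(b)) . (a - b) >= 0."""
    ga, gb = projected_softmax(a), projected_softmax(b)
    inner = sum((x - y) * (u - v) for x, y, u, v in zip(ga, gb, a, b))
    return inner >= 0.0


# Illustrative inputs for a C = 4 class problem (three free coordinates).
a = [1.0, -2.0, 0.5]
b = [-0.3, 0.7, 2.0]
p = projected_softmax(a)
print(p, 1.0 - sum(p))        # last value is the implied probability of class C
print(is_monotone_pair(a, b))
```

The remaining class probability is recovered as 1 minus the sum of the outputs, which is why a map on C − 1 coordinates suffices.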
# 重要な特徴を生かした自動運転車の安全クリティカルシナリオの特定と説明 Identifying and Explaining Safety-critical Scenarios for Autonomous Vehicles via Key Features ( http://arxiv.org/abs/2212.07566v2 ) ライセンス: Link先を確認	Neelofar, Aldeida Aleti	(参考訳) 自動運転車(AV)の安全性の確保は最重要であり、シミュレーション環境でのテストは実地での運用テストよりも安全な選択肢である。しかしながら、重要なテストシナリオを特定するための網羅的なテストスイートの生成は計算コストが高い。各テストの表現が複雑で、テスト対象のAV、道路参加者(車両、歩行者、静的障害物)、環境要因(天候と光)、道路の構造的特徴(レーン、ターン、道路速度など)など、様々な動的および静的な特徴を含むためである。本稿では,インスタンス空間解析(ISA)を用いて,AVの安全でない動作を明らかにする能力に影響を与えるテストシナリオの重要な特徴を識別する体系的手法を提案する。 ISAは、安全クリティカルなシナリオと通常の運転とを最もよく区別する特徴を特定し、それらの特徴がテストシナリオの結果(安全/危険)に与える影響を2Dで可視化する。この可視化は、インスタンス空間の未テスト領域を特定し、テストによってカバーされる特徴空間の割合という形でテストスイートの品質の指標を提供するのに役立つ。特定された特徴の予測能力をテストするために、5つの機械学習分類器を訓練し、テストシナリオを安全か危険かに分類する。高い適合率、再現率、F1スコアは、提案手法がテストシナリオを実行せずに結果を予測するのに有効であり、テスト生成、選択、優先順位付けに使用できることを示している。 Ensuring the safety of autonomous vehicles (AVs) is of utmost importance and testing them in simulated environments is a safer option than conducting in-field operational tests. However, generating an exhaustive test suite to identify critical test scenarios is computationally expensive as the representation of each test is complex and contains various dynamic and static features, such as the AV under test, road participants (vehicles, pedestrians, and static obstacles), environmental factors (weather and light), and the road's structural features (lanes, turns, road speed, etc.). In this paper, we present a systematic technique that uses Instance Space Analysis (ISA) to identify the significant features of test scenarios that affect their ability to reveal the unsafe behaviour of AVs. ISA identifies the features that best differentiate safety-critical scenarios from normal driving and visualises the impact of these features on test scenario outcomes (safe/unsafe) in 2D. This visualization helps to identify untested regions of the instance space and provides an indicator of the quality of the test suite in terms of the percentage of feature space covered by testing. 
To test the predictive ability of the identified features, we train five Machine Learning classifiers to classify test scenarios as safe or unsafe. The high precision, recall, and F1 scores indicate that our proposed approach is effective in predicting the outcome of a test scenario without executing it and can be used for test generation, selection, and prioritization.	翻訳日:2023-12-01 01:15:12 公開日:2023-11-28
# 異種性空間における品質多様性 Quality-diversity in dissimilarity spaces ( http://arxiv.org/abs/2211.12337v3 ) ライセンス: Link先を確認	Steve Huntsman	(参考訳) 等級の理論は多様性の定量化と最大化のための数学的枠組みを提供する。この枠組みを汎用的異質性空間における品質多様性アルゴリズムの定式化に応用する。特に、Go-Exploreの非常に一般的なバージョンをインスタンス化し、デモします。 The theory of magnitude provides a mathematical framework for quantifying and maximizing diversity. We apply this framework to formulate quality-diversity algorithms in generic dissimilarity spaces. In particular, we instantiate and demonstrate a very general version of Go-Explore with promising performance.	翻訳日:2023-12-01 01:14:09 公開日:2023-11-28
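For readers unfamiliar with the "magnitude" mentioned above: for a finite set of points with pairwise dissimilarities d(i, j) and a scale t > 0, one forms Z with entries exp(−t·d(i, j)), solves Z·w = 1 for a weighting w, and takes the sum of w as the magnitude. It behaves like an effective number of points, close to 1 when the points are nearly indistinguishable and close to n when they are far apart. The sketch below computes it with a small hand-written linear solver. It assumes Z is invertible, which need not hold for arbitrary dissimilarities, and it is not the paper's quality-diversity algorithm. Helper names are illustrative.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def magnitude(dist, t=1.0):
    """Magnitude of a finite dissimilarity space at scale t (assumes Z is invertible)."""
    n = len(dist)
    z = [[math.exp(-t * dist[i][j]) for j in range(n)] for i in range(n)]
    weights = solve_linear(z, [1.0] * n)
    return sum(weights)


# Three collinear points at positions 0, 1, 2 (illustrative data).
dist = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0]]
for t in (0.01, 1.0, 50.0):
    print(t, magnitude(dist, t))   # grows from about 1 towards 3 as t increases
```

For two points at distance d the result reduces to 2 / (1 + exp(−t·d)), which gives a quick sanity check on the solver.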
# メモリ効率のよいLearned Primal-Dualアーキテクチャによる3次元ヘリカルCT再構成 3D helical CT Reconstruction with a Memory Efficient Learned Primal-Dual Architecture ( http://arxiv.org/abs/2205.11952v3 ) ライセンス: Link先を確認	Jevgenija Rudzusika, Buda Baji\'c, Thomas Koehler, Ozan \"Oktem	(参考訳) 深層学習に基づくコンピュータ断層撮影(CT)再構成は、シミュレーションによる2次元低線量CTデータにおいて顕著な性能を示している。これは特に、CTイメージングのための手作りの物理モデルを組み込んだ、ドメイン適応型ニューラルネットワークに当てはまる。このようなアーキテクチャを採用することで、トレーニングデータの需要が減少し、汎化が改善されるという実証的な証拠がある。しかし,その学習には膨大な計算資源が必要であり,医用画像で最も一般的な取得ジオメトリである3次元ヘリカルCTでは,すぐに実行不可能な規模になる。さらに、臨床データには、フラックス測定の誤差、解像度の不一致、そして最も重要なこととして真の正解(ground truth)が存在しないことなど、シミュレーションでは考慮されていない他の課題も伴う。計算可能な学習の必要性と、これらの課題に対処する必要性が相まって、臨床3次元ヘリカルCTでの深層学習に基づく再構成の評価は困難になっている。本稿では,ドメイン適応型ニューラルネットワークアーキテクチャであるLearned Primal-Dual (LPD) を改良し,この設定での再構成に対して学習および適用できるようにする。ヘリカル軌道をセクションに分割し,それらのセクションにアンロールしたLPD反復を順次適用することで,これを実現する。我々の知る限りでは、この研究は、Low dose CT image and projection data set (LDCT) のようなフルサイズの臨床データでの再構成に、アンロール型のディープラーニングアーキテクチャを適用した最初のものである。さらに、トレーニングとテストは、24GBのメモリを持つ単一のGPUカード上で行われる。 Deep learning based computed tomography (CT) reconstruction has demonstrated outstanding performance on simulated 2D low-dose CT data. This applies in particular to domain adapted neural networks, which incorporate a handcrafted physics model for CT imaging. Empirical evidence shows that employing such architectures reduces the demand for training data and improves upon generalisation. However, their training requires large computational resources that quickly become prohibitive in 3D helical CT, which is the most common acquisition geometry used for medical imaging. Furthermore, clinical data also comes with other challenges not accounted for in simulations, like errors in flux measurement, resolution mismatch and, most importantly, the absence of the real ground truth. The necessity to have a computationally feasible training combined with the need to address these issues has made it difficult to evaluate deep learning based reconstruction on clinical 3D helical CT. 
This paper modifies a domain adapted neural network architecture, the Learned Primal-Dual (LPD), so that it can be trained and applied to reconstruction in this setting. We achieve this by splitting the helical trajectory into sections and applying the unrolled LPD iterations to those sections sequentially. To the best of our knowledge, this work is the first to apply an unrolled deep learning architecture for reconstruction on full-sized clinical data, like those in the Low dose CT image and projection data set (LDCT). Moreover, training and testing is done on a single GPU card with 24GB of memory.	翻訳日:2023-12-01 01:14:07 公開日:2023-11-28
# NeuroBack: グラフニューラルネットワークによるCDCL SAT求解の改善 NeuroBack: Improving CDCL SAT Solving using Graph Neural Networks ( http://arxiv.org/abs/2110.14053v6 ) ライセンス: Link先を確認	Wenxi Wang, Yang Hu, Mohit Tiwari, Sarfraz Khurshid, Kenneth McMillan, Risto Miikkulainen	(参考訳) 命題充足可能性(SAT)は、計画、検証、セキュリティなど、多くの研究分野に影響を与えるNP完全問題である。現代の主流のSATソルバは、Conflict-Driven Clause Learning (CDCL)アルゴリズムに基づいている。最近の研究は、グラフニューラルネットワーク(GNN)を用いたCDCL SATソルバの強化を目指してきた。しかし、これまでのところこのアプローチは、求解をより効果的にできていないか、あるいは頻繁なオンラインモデル推論のために相当なGPUリソースを必要としていた。本稿では,GNNによる改善を実用的なものにすることを目的として,NeuroBackという手法を提案する。この手法は次の2つの洞察に基づく。(1)充足割り当ての大多数(あるいはすべて)に現れる変数の位相(すなわち値)を予測することが、CDCL SAT求解に不可欠であること,(2)SAT求解が始まる前に1回だけニューラルモデルに予測を問い合わせれば十分であること,である。トレーニングが完了すると、オフラインのモデル推論によって、NeuroBackはCPUのみで動作するようになり、GPUリソースへの依存がなくなる。 NeuroBackをトレーニングするために、120,286のデータサンプルを含むDataBackと呼ばれる新しいデータセットを作成した。最後に、NeuroBackはKissatと呼ばれる最先端のSATソルバの拡張として実装されている。その結果、Kissatは最近のSAT競技会の問題セットSATCOMP-2022で5.2%多くの問題を解くことができた。したがってNeuroBackは、SAT求解を効果的かつ実用的な方法で改善するために機械学習をどのように活用できるかを示している。 Propositional satisfiability (SAT) is an NP-complete problem that impacts many research fields, such as planning, verification, and security. Mainstream modern SAT solvers are based on the Conflict-Driven Clause Learning (CDCL) algorithm. Recent work aimed to enhance CDCL SAT solvers using Graph Neural Networks (GNNs). However, so far this approach either has not made solving more effective, or required substantial GPU resources for frequent online model inferences. Aiming to make GNN improvements practical, this paper proposes an approach called NeuroBack, which builds on two insights: (1) predicting phases (i.e., values) of variables appearing in the majority (or even all) of the satisfying assignments is essential for CDCL SAT solving, and (2) it is sufficient to query the neural model only once for the predictions before the SAT solving starts. Once trained, the offline model inference allows NeuroBack to execute exclusively on the CPU, removing its reliance on GPU resources. 
To train NeuroBack, a new dataset called DataBack containing 120,286 data samples is created. Finally, NeuroBack is implemented as an enhancement to a state-of-the-art SAT solver called Kissat. As a result, it allowed Kissat to solve 5.2% more problems on the recent SAT competition problem set, SATCOMP-2022. NeuroBack therefore shows how machine learning can be harnessed to improve SAT solving in an effective and practical manner.	翻訳日:2023-12-01 01:13:24 公開日:2023-11-28
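The two insights in the abstract above (phase predictions matter, and they can be supplied once before solving) can be illustrated with a toy backtracking solver that takes a fixed table of preferred polarities. This is a deliberately simplified DPLL sketch, not CDCL, not Kissat, and not the NeuroBack implementation. The `phase_hint` dictionary stands in for a one-off offline model prediction, and every name here is hypothetical.

```python
def simplify(clauses, lit):
    """Make literal `lit` true: drop satisfied clauses and remove the negated literal."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue                      # clause already satisfied
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None                   # empty clause: conflict
        out.append(reduced)
    return out


def dpll(clauses, phase_hint, stats):
    """Return a (possibly partial) satisfying assignment as {var: bool}, or None."""
    if clauses is None:
        return None
    if not clauses:
        return {}
    for clause in clauses:                # unit propagation, one unit at a time
        if len(clause) == 1:
            lit = clause[0]
            rest = dpll(simplify(clauses, lit), phase_hint, stats)
            if rest is None:
                return None
            rest[abs(lit)] = lit > 0
            return rest
    var = abs(clauses[0][0])              # naive branching variable choice
    # Try the hinted polarity first; default to False when no hint is given.
    first = var if phase_hint.get(var, False) else -var
    for lit in (first, -first):
        stats["decisions"] += 1
        rest = dpll(simplify(clauses, lit), phase_hint, stats)
        if rest is not None:
            rest[var] = lit > 0
            return rest
    return None


# (x1 or x2) and (x1 or not x2) and (not x1 or x3): x1 and x3 must both be true.
formula = [[1, 2], [1, -2], [-1, 3]]
for hint in ({1: True}, {1: False}):
    stats = {"decisions": 0}
    model = dpll(formula, hint, stats)
    print(hint, model, stats["decisions"])
```

With the correct hint the solver needs a single decision, while the wrong hint costs one extra decision and a backtrack. In a real solver the gap compounds over many variables, which is the motivation the abstract gives for predicting phases well.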
# 因果推論のための深層学習プライマー A Primer on Deep Learning for Causal Inference ( http://arxiv.org/abs/2110.04442v2 ) ライセンス: Link先を確認	Bernard Koch, Tim Sainburg, Pablo Geraldo, Song Jiang, Yizhou Sun, Jacob Gates Foster	(参考訳) このレビューは、潜在的な結果の枠組みの下でディープニューラルネットワークを用いた因果推論の新たな文献を体系化する。深層学習を用いて不均一な治療効果を推定・予測し、因果推論を非線形、時間変化、テキスト、ネットワーク、画像にエンコードされた設定にまで拡張する方法について、直感的な紹介を提供する。アクセシビリティを最大化するために,因果推論やディープラーニングといった前提概念も導入する。この調査は、観察因果推定、キーアルゴリズムの拡張展開、および、github.com/kochbj/deep-Learning-for-Causal-Inferenceで利用可能なTensorflow 2の深部推定器の実装、訓練、選択に関する詳細なチュートリアルに重点を置いている他の深部学習と因果推論の処理とは異なる。 This review systematizes the emerging literature for causal inference using deep neural networks under the potential outcomes framework. It provides an intuitive introduction on how deep learning can be used to estimate/predict heterogeneous treatment effects and extend causal inference to settings where confounding is non-linear, time varying, or encoded in text, networks, and images. To maximize accessibility, we also introduce prerequisite concepts from causal inference and deep learning. The survey differs from other treatments of deep learning and causal inference in its sharp focus on observational causal estimation, its extended exposition of key algorithms, and its detailed tutorials for implementing, training, and selecting among deep estimators in Tensorflow 2 available at github.com/kochbj/Deep-Learning-for-Causal-Inference.	翻訳日:2023-12-01 01:12:37 公開日:2023-11-28
# スコアベース拡散モデルにおける色変化の回避 Easing Color Shifts in Score-Based Diffusion Models ( http://arxiv.org/abs/2306.15832v2 ) ライセンス: Link先を確認	Katherine Deck and Tobias Bischoff	(参考訳) スコアベースのモデルの生成された画像は、その空間的手段、すなわち色シフトと呼ばれる効果の誤りに苦しむ可能性がある。本稿では,スコアベース拡散モデルのカラーシフトを緩和する手法について検討する。入力の空間平均を処理しスコア関数の平均を予測するために設計されたスコアネットワークにおける非線形バイパス接続の性能を定量化する。このネットワークアーキテクチャは、生成した画像の質を大幅に改善し、生成した画像のサイズにほぼ依存しないことを示す。結果として、この修正されたアーキテクチャは、画像サイズ間の色シフト問題に対する簡単な解決策を提供する。さらに,カラーシフトの起源を理想化された環境で議論し,そのアプローチを動機づける。 Generated images of score-based models can suffer from errors in their spatial means, an effect, referred to as a color shift, which grows for larger images. This paper investigates a previously-introduced approach to mitigate color shifts in score-based diffusion models. We quantify the performance of a nonlinear bypass connection in the score network, designed to process the spatial mean of the input and to predict the mean of the score function. We show that this network architecture substantially improves the resulting quality of the generated images, and that this improvement is approximately independent of the size of the generated images. As a result, this modified architecture offers a simple solution for the color shift problem across image sizes. We additionally discuss the origin of color shifts in an idealized setting in order to motivate the approach.	翻訳日:2023-12-01 01:08:06 公開日:2023-11-28
# 自己教師形変圧器における分離正規化について On Separate Normalization in Self-supervised Transformers ( http://arxiv.org/abs/2309.12931v2 ) ライセンス: Link先を確認	Xiaohui Chen, Yinkai Wang, Yuanqi Du, Soha Hassoun, Li-Ping Liu	(参考訳) 変圧器の自己指導訓練法は,様々な領域で顕著な性能を示した。マスク付きオートエンコーダ(MAE)のような以前のトランスフォーマーベースのモデルは、通常、[CLS]シンボルとトークンの両方に単一の正規化層を使用する。本稿では,トークンの正規化レイヤと[CLS]シンボルを分離して,それらの特徴をよりよく把握し,下流タスク性能を向上させるための簡単な修正を提案する。本手法は,両トークン型に対して同一の正規化統計値を使用することによる潜在的負の効果を緩和することを目的としている。我々は,別の正規化層を利用することで,[CLS]埋め込みがグローバルな文脈情報をよりよく符号化し,異方性空間に均一に分散できることを実証的に示す。従来の正規化層を2つの別々の層に置き換える場合、画像、自然言語、グラフドメインに対する平均2.7%のパフォーマンス改善が観察される。 Self-supervised training methods for transformers have demonstrated remarkable performance across various domains. Previous transformer-based models, such as masked autoencoders (MAE), typically utilize a single normalization layer for both the [CLS] symbol and the tokens. We propose in this paper a simple modification that employs separate normalization layers for the tokens and the [CLS] symbol to better capture their distinct characteristics and enhance downstream task performance. Our method aims to alleviate the potential negative effects of using the same normalization statistics for both token types, which may not be optimally aligned with their individual roles. We empirically show that by utilizing a separate normalization layer, the [CLS] embeddings can better encode the global contextual information and are distributed more uniformly in its anisotropic space. When replacing the conventional normalization layer with the two separate layers, we observe an average 2.7% performance improvement over the image, natural language, and graph domains.	翻訳日:2023-12-01 00:55:05 公開日:2023-11-28
# Diffusion-EDFs: 視覚ロボットマニピュレーションのためのSE(3)に基づく2-equivariant Denoising Generative Modeling Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation ( http://arxiv.org/abs/2309.02685v3 ) ライセンス: Link先を確認	Hyunwoo Ryu, Jiwoo Kim, Hyunseok An, Junwoo Chang, Joohwan Seo, Taehan Kim, Yubin Kim, Chaewon Hwang, Jongeun Choi, Roberto Horowitz	(参考訳) 拡散生成モデリングは、確率的人間の実演からロボット操作タスクを学ぶための有望なアプローチとなっている。本稿では,視覚ロボット操作タスクのための新しいSE(3)等価拡散に基づくアプローチであるDiffusion-EDFを提案する。提案手法は,1時間以内の効果的なエンドツーエンドトレーニングに5～10人の人間によるデモンストレーションしか必要とせず,優れたデータ効率が得られることを示す。さらに,本手法は最先端手法と比較して,一般化性と堅牢性に優れることを示した。最後に,本手法を実ハードウェア実験で検証する。プロジェクトウェブサイト: https://sites.google.com/view/diffusion-edfs/home Diffusion generative modeling has become a promising approach for learning robotic manipulation tasks from stochastic human demonstrations. In this paper, we present Diffusion-EDFs, a novel SE(3)-equivariant diffusion-based approach for visual robotic manipulation tasks. We show that our proposed method achieves remarkable data efficiency, requiring only 5 to 10 human demonstrations for effective end-to-end training in less than an hour. Furthermore, our benchmark experiments demonstrate that our approach has superior generalizability and robustness compared to state-of-the-art methods. Lastly, we validate our methods with real hardware experiments. Project Website: https://sites.google.com/view/diffusion-edfs/home	翻訳日:2023-12-01 00:53:36 公開日:2023-11-28
# 大規模言語モデルの説明可能性:調査 Explainability for Large Language Models: A Survey ( http://arxiv.org/abs/2309.01029v3 ) ライセンス: Link先を確認	Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du	(参考訳) 大規模言語モデル(llm)は自然言語処理において印象的な能力を示している。しかし、内部メカニズムはまだ不明であり、この透明性の欠如は下流アプリケーションにとって望ましくないリスクをもたらす。したがって、これらのモデルを理解し説明することは、それらの行動、制限、社会的影響を解明するために重要である。本稿では,説明可能性の分類法を紹介し,トランスフォーマティブに基づく言語モデルを説明する手法の構造化概要を示す。従来の微調整型パラダイムとプロンプト型パラダイムという,LLMのトレーニングパラダイムに基づいたテクニックを分類する。各パラダイムについて,個別予測の局所的説明とモデル知識の全体的説明を生成するための目標と支配的アプローチを要約する。また、生成した説明を評価するためのメトリクスについても論じ、モデルのデバッグやパフォーマンス向上に説明をどのように活用できるかについて議論する。最後に,従来の機械学習モデルと比較して,LLMの時代における重要な課題と説明手法の出現機会について検討する。 Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this paper, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations, and discuss how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional machine learning models.	翻訳日:2023-12-01 00:53:26 公開日:2023-11-28
# 自動運転車の危険度評価における反事実的安全マージンの視点 A Counterfactual Safety Margin Perspective on the Scoring of Autonomous Vehicles' Riskiness ( http://arxiv.org/abs/2308.01050v4 ) ライセンス: Link先を確認	Alessandro Zanardi, Andrea Censi, Margherita Atzei, Luigi Di Lillo, Emilio Frazzoli	(参考訳) 自動運転車(AV)は、モビリティへの幅広いアクセス、交通事故の低減、輸送効率の向上など、さまざまな社会的メリットをもたらすと期待されている。しかし, AVに関連するリスクの評価は, 限られた過去データと急速な技術進歩のために複雑である。本稿では,「誤った振る舞いをする」道路利用者の反事実的シミュレーションに基づいて,様々な運用設計領域(ODD)における異なるAVの行動のリスクを評価するためのデータ駆動型フレームワークを提案する。衝突を引き起こしうる、名目行動からの最小の逸脱を表す反事実的安全マージンの概念を提案する。この手法は最も重要なシナリオを特定するだけでなく、AVに関する(相対的な)リスクの頻度と重大度を定量化する。重要なことに、AVの行動方針が開示されていない場合でも、最悪ケースと最良ケースの分析を通じて本手法が適用可能であり、規制当局やリスク評価機関のような外部機関に利益をもたらすことを示す。我々の実験結果は、安全マージン、運転方針の質、およびODDの間の相関を示し、異なるAVプロバイダの相対的なリスクに光を当てる。全体として、この研究はAVの安全性評価に寄与し、この急成長する技術を取り巻く立法および保険上の懸念に対処する。 Autonomous Vehicles (AVs) promise a range of societal advantages, including broader access to mobility, reduced road accidents, and enhanced transportation efficiency. However, evaluating the risks linked to AVs is complex due to limited historical data and the swift progression of technology. This paper presents a data-driven framework for assessing the risk of different AVs' behaviors in various operational design domains (ODDs), based on counterfactual simulations of "misbehaving" road users. We propose the notion of counterfactual safety margin, which represents the minimum deviation from nominal behavior that could cause a collision. This methodology not only pinpoints the most critical scenarios but also quantifies the (relative) risk's frequency and severity concerning AVs. Importantly, we show that our approach is applicable even when the AV's behavioral policy remains undisclosed, through worst- and best-case analyses, benefiting external entities like regulators and risk evaluators. Our experimental outcomes demonstrate the correlation between the safety margin, the quality of the driving policy, and the ODD, shedding light on the relative risks of different AV providers. 
Overall, this work contributes to the safety assessment of AVs and addresses legislative and insurance concerns surrounding this burgeoning technology.	翻訳日:2023-12-01 00:52:55 公開日:2023-11-28
# 高次元観測による可変性の学習源 Learning sources of variability from high-dimensional observational studies ( http://arxiv.org/abs/2307.13868v2 ) ライセンス: Link先を確認	Eric W. Bridgeford, Jaewon Chung, Brian Gilbert, Sambit Panda, Adam Li, Cencheng Shen, Alexandra Badea, Brian Caffo, Joshua T. Vogelstein	(参考訳) 因果推論は、変数の存在が観測結果に影響を及ぼすかどうかを研究する。平均治療効果」などの量によって測定されるように、このパラダイムはワクチンや薬物開発から政策介入に至るまで、多くの生物学的分野にまたがる。残念なことに、これらの手法の大部分は、しばしば単変量の結果に制限される。我々の研究は、任意の次元または可測空間を持つ結果に対する因果推定を一般化し、因果差検定として名目変数に対する従来の因果推定を定式化する。本稿では,一貫した条件付き独立性テストの簡易な調整手法を提案し,これらのテストが一貫した因果不一致性テストであることを証明した。数値実験により,提案手法であるcausal cdcorrは,既存の手法と比較して有限サンプルの妥当性とパワーが向上することを示す。私たちのメソッドはすべてオープンソースで、github.com/ebridge2/cdcorrで利用可能です。 Causal inference studies whether the presence of a variable influences an observed outcome. As measured by quantities such as the "average treatment effect," this paradigm is employed across numerous biological fields, from vaccine and drug development to policy interventions. Unfortunately, the majority of these methods are often limited to univariate outcomes. Our work generalizes causal estimands to outcomes with any number of dimensions or any measurable space, and formulates traditional causal estimands for nominal variables as causal discrepancy tests. We propose a simple technique for adjusting universally consistent conditional independence tests and prove that these tests are universally consistent causal discrepancy tests. Numerical experiments illustrate that our method, Causal CDcorr, leads to improvements in both finite sample validity and power when compared to existing strategies. Our methods are all open source and available at github.com/ebridge2/cdcorr.	翻訳日:2023-12-01 00:51:37 公開日:2023-11-28
# コントラスト・デモとサリエンシー・マップを用いた文脈内学習の理解に向けて Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps ( http://arxiv.org/abs/2307.05052v2 ) ライセンス: Link先を確認	Paiheng Xu, Fuxiao Liu, Zongxia Li, Hyemi Song	(参考訳) 大規模言語モデル(LLM)の文脈内学習(ICL)性能における、デモンストレーションの各構成要素の役割について検討する。具体的には, 正解ラベル, 入力分布, 補完的説明の影響を、特にそれらが変更または摂動された場合について調べる。これらの要素がICLにどのように影響するかについて知見が分かれている先行研究を基礎とする。これらの問題を探るために,説明可能なNLP(XNLP)手法を用い,定性的・定量的分析の両方に対照的なデモンストレーションのサリエンシーマップを利用した。以上の結果から,正解ラベルの反転はサリエンシーに大きな影響を及ぼし、その影響は大きなLLMでより顕著であることが明らかとなった。粒度の細かいレベルでの入力分布の解析により,感情分析タスクにおいて感情を示す語を中立的な語に変更しても,正解ラベルの変更ほど大きな影響はないことが明らかとなった。最後に、ICLの性能向上における補完的説明の有効性はタスクに依存し、シンボリック推論タスクと比較して感情分析タスクでは得られるメリットが限定的であることがわかった。これらの知見は,ChatGPT などのアプリケーションで LLM の利用が増加していることを踏まえ,LLM の機能を理解し,効果的なデモンストレーションの開発を導く上で重要である。我々の研究コードは https://github.com/paihengxu/XICL で公開されている。 We investigate the role of various demonstration components in the in-context learning (ICL) performance of large language models (LLMs). Specifically, we explore the impacts of ground-truth labels, input distribution, and complementary explanations, particularly when these are altered or perturbed. We build on previous work, which offers mixed findings on how these elements influence ICL. To probe these questions, we employ explainable NLP (XNLP) methods and utilize saliency maps of contrastive demonstrations for both qualitative and quantitative analysis. Our findings reveal that flipping ground-truth labels significantly affects the saliency, though it's more noticeable in larger LLMs. Our analysis of the input distribution at a granular level reveals that changing sentiment-indicative terms in a sentiment analysis task to neutral ones does not have as substantial an impact as altering ground-truth labels. Finally, we find that the effectiveness of complementary explanations in boosting ICL performance is task-dependent, with limited benefits seen in sentiment analysis tasks compared to symbolic reasoning tasks. 
These insights are critical for understanding the functionality of LLMs and guiding the development of effective demonstrations, which is increasingly relevant in light of the growing use of LLMs in applications such as ChatGPT. Our research code is publicly available at https://github.com/paihengxu/XICL.	翻訳日:2023-12-01 00:51:23 公開日:2023-11-28
# シンセティック・ヒューマングループ活動から学ぶ Learning from Synthetic Human Group Activities ( http://arxiv.org/abs/2306.16772v4 ) ライセンス: Link先を確認	Che-Jui Chang, Danrui Li, Deep Patel, Parth Goel, Honglu Zhou, Seonghyeon Moon, Samuel S. Sohn, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia	(参考訳) 複雑な人間の相互作用と集団活動の研究は、人間中心のコンピュータビジョンの焦点となっている。しかし、関連するタスクの進捗は、実世界のシナリオから大規模ラベル付きデータセットを取得するという課題によって妨げられることが多い。この制限に対処するために,マルチビューマルチパーソン・ヒューマン・アトミック・アクションとグループ・アクティビティのための合成データ・ジェネレータm3actを紹介する。 unityエンジンを搭載したm3actは、複数のセマンティックグループ、高度に多様なフォトリアリスティックなイメージ、そして、人間中心のタスクの学習を容易にする包括的なアノテーションセットを備えている。各種入力モダリティを用いた3つのコア実験におけるM3Actの利点を示す。まず、合成データを追加することで、dancetrackでのmotrv2のパフォーマンスが大幅に向上し、リードボードが10位から2位に跳ね上がりました。 M3Actでは、実世界の62.5%のデータをトレーニングしたMOTRv2と同等の追跡結果が得られる。第2に、M3ActはCAD2のベンチマーク性能を5.59%改善し、グループアクティビティとアトミックアクションの精度は7.43%向上した。さらに、M3Actは制御可能な3Dグループ活動生成のための新しい研究を開始した。複数のメトリクスを定義し、新しいタスクの競争基準を提案する。 The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress in related tasks is often hindered by the challenges of obtaining large-scale labeled datasets from real-world scenarios. To address the limitation, we introduce M3Act, a synthetic data generator for multi-view multi-group multi-person human atomic actions and group activities. Powered by the Unity engine, M3Act features multiple semantic groups, highly diverse and photorealistic images, and a comprehensive set of annotations, which facilitates the learning of human-centered tasks across single-person, multi-person, and multi-group conditions. We demonstrate the advantages of M3Act across three core experiments using various input modalities. First, adding our synthetic data significantly improves the performance of MOTRv2 on DanceTrack, leading to a hop on the leaderboard from 10th to 2nd place. With M3Act, we achieve tracking results on par with MOTRv2, which is trained with 62.5% more real-world data. 
Second, M3Act improves the benchmark performances on CAD2 by 5.59% and 7.43% on group activity and atomic action accuracy respectively. Moreover, M3Act opens new research for controllable 3D group activity generation. We define multiple metrics and propose a competitive baseline for the novel task.	翻訳日:2023-12-01 00:50:58 公開日:2023-11-28
# hallusionbench: 大きな視覚言語モデルにおける絡み合った言語幻覚と視覚錯覚のための高度な診断スイート HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination & Visual Illusion in Large Vision-Language Models ( http://arxiv.org/abs/2310.14566v2 ) ライセンス: Link先を確認	Tianrui Guan, Fuxiao Liu, Xiyang Wu, Ruiqi Xian, Zongxia Li, Xiaoyu Liu, Xijun Wang, Lichang Chen, Furong Huang, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou	(参考訳) 画像コンテキスト推論の評価用に設計された総合ベンチマークであるhallusionbenchを紹介する。このベンチマークは,GPT-4V(Vision)やLLaVA-1.5のような高度な視覚言語モデル(LVLM)に対して,視覚データのニュアンスな理解と解釈を強調することで大きな課題を提起する。このベンチマークは、1129の質問と組み合わせた346の画像で構成されている。制御群を確立するために設計された視覚的質問に対する新しい構造を提案する。この構造により,モデルの応答傾向,論理的一貫性,さまざまな障害モードを定量的に解析することができる。 HallusionBenchの評価では、13種類のモデルをベンチマークし、31.42%の質問対精度を最先端のGPT-4Vで達成した。特に、他の評価モデルは全て16%未満の精度を達成する。さらに,本分析では,言語幻覚や視覚錯覚など,観察された障害モードだけでなく,これらの落とし穴の理解を深めている。 HallusionBench内の包括的ケーススタディは、LVLMにおける幻覚と幻覚の課題に光を当てた。これらの知見に基づいて,今後の改善の道筋を提案する。ベンチマークとコードベースはhttps://github.com/tianyi-lab/hallusionbenchからアクセスできる。 We introduce HallusionBench, a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(Vision) and LLaVA-1.5, by emphasizing nuanced understanding and interpretation of visual data. The benchmark comprises 346 images paired with 1129 questions, all meticulously crafted by human experts. We introduce a novel structure for these visual questions designed to establish control groups. This structure enables us to conduct a quantitative analysis of the models' response tendencies, logical consistency, and various failure modes. In our evaluation on HallusionBench, we benchmarked 13 different models, highlighting a 31.42% question-pair accuracy achieved by the state-of-the-art GPT-4V. Notably, all other evaluated models achieve accuracy below 16%. 
Moreover, our analysis not only highlights the observed failure modes, including language hallucination and visual illusion, but also deepens an understanding of these pitfalls. Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs. Based on these insights, we suggest potential pathways for their future improvement. The benchmark and codebase can be accessed at https://github.com/tianyi-lab/HallusionBench.	翻訳日:2023-12-01 00:43:38 公開日:2023-11-28
# 完全性保証による効率的な計画のハイブリッド探索 Hybrid Search for Efficient Planning with Completeness Guarantees ( http://arxiv.org/abs/2310.12819v2 ) ライセンス: Link先を確認	Kalle Kujanpää, Joni Pajarinen, Alexander Ilin	(参考訳) 複雑な計画問題の解決は、コンピュータ科学における長年の課題である。学習に基づくサブゴール探索手法は、これらの問題に取り組むことには期待が持たれているが、それらはしばしば完全性保証の欠如に苦しめられている。本稿では,離散的な行動空間における完全性を実現するために,サブゴール探索法を効率的に拡張する手法を提案する。具体的には、マルチレベル(ハイブリッド)探索を実行するために、高レベル探索を低レベル行動で増強する。このソリューションは、高レベル探索の実用的効率と低レベル探索の完全性という、両方の世界のベストを達成する。提案手法を最近提案されたサブゴール探索アルゴリズムに適用し,オフラインデータで学習したアルゴリズムを複雑な計画問題で評価した。完全なサブゴール探索は完全性を保証するだけでなく、低レベルの拡張なしに高レベル探索が解決できるインスタンスについても、探索の展開数の観点からパフォーマンスを向上させることができることを実証する。当社のアプローチでは,完全性が必須要件であるシステムに対して,サブゴールレベルの計画を適用することができる。 Solving complex planning problems has been a long-standing challenge in computer science. Learning-based subgoal search methods have shown promise in tackling these problems, but they often suffer from a lack of completeness guarantees, meaning that they may fail to find a solution even if one exists. In this paper, we propose an efficient approach to augment a subgoal search method to achieve completeness in discrete action spaces. Specifically, we augment the high-level search with low-level actions to execute a multi-level (hybrid) search, which we call complete subgoal search. This solution achieves the best of both worlds: the practical efficiency of high-level search and the completeness of low-level search. We apply the proposed search method to a recently proposed subgoal search algorithm and evaluate the algorithm trained on offline data on complex planning problems. We demonstrate that our complete subgoal search not only guarantees completeness but can even improve performance in terms of search expansions for instances that the high-level could solve without low-level augmentations. Our approach makes it possible to apply subgoal-level planning for systems where completeness is a critical requirement.	翻訳日:2023-12-01 00:43:16 公開日:2023-11-28
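The hybrid idea in the Hybrid Search abstract above, expanding each node with both high-level subgoals and low-level actions, can be sketched on a toy problem. This is a minimal illustration and not the authors' algorithm: the integer-line domain, the `+5` subgoal generator standing in for a learned model, the distance heuristic, and every function name are assumptions made for the example.

```python
import heapq

def complete_subgoal_search(start, goal, subgoals, low_level, heuristic,
                            max_expansions=10_000):
    """Greedy best-first search whose successors mix subgoals and primitive actions."""
    frontier = [(heuristic(start), 0, start)]  # (priority, tie-breaker, state)
    parents = {start: None}                    # also serves as the visited set
    pushed = 0
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state = heapq.heappop(frontier)
        if state == goal:
            path = []
            while state is not None:           # walk parent links back to the start
                path.append(state)
                state = parents[state]
            return path[::-1]
        expansions += 1
        # Hybrid expansion: high-level proposals first, then low-level actions.
        for nxt in list(subgoals(state)) + list(low_level(state)):
            if nxt not in parents:
                parents[nxt] = state
                pushed += 1
                heapq.heappush(frontier, (heuristic(nxt), pushed, nxt))
    return None

# Hypothetical toy domain: integer states 0..LIMIT, goal at 13.
LIMIT, GOAL = 30, 13

def jump_subgoals(s):
    """Stand-in for a learned subgoal generator: it only proposes +5 jumps."""
    return [s + 5] if s + 5 <= LIMIT else []

def unit_steps(s):
    """Primitive actions: move one step left or right inside the bounds."""
    return [n for n in (s - 1, s + 1) if 0 <= n <= LIMIT]

def distance(s):
    return abs(GOAL - s)

high_level_only = complete_subgoal_search(0, GOAL, jump_subgoals, lambda s: [], distance)
hybrid = complete_subgoal_search(0, GOAL, jump_subgoals, unit_steps, distance)
print("high-level only:", high_level_only)
print("hybrid:", hybrid)
```

In this toy setting the subgoal generator alone can never land on 13, so the high-level-only search exhausts its frontier, while the hybrid search still uses the jumps to get close and falls back on unit steps to finish.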
# 大規模言語モデルによる逆コンパイルされたCコードの洗練 Refining Decompiled C Code with Large Language Models ( http://arxiv.org/abs/2310.06530v2 ) ライセンス: Link先を確認	Wai Kin Wong, Huaijin Wang, Zongjie Li, Zhibo Liu, Shuai Wang, Qiyi Tang, Sen Nie, Shi Wu	(参考訳) Cの逆コンパイラは実行ファイルをソースコードに変換する。復元されたCのソースコードは、再コンパイルすると、元の実行ファイルと同じ機能を持つ実行ファイルを生成することが期待されている。 20年以上の開発を経て、Cデコンパイラはリバースエンジニアリングアプリケーションをサポートするためにプロダクションで広く使われている。 Cデコンパイラの発達にもかかわらず、デコンパイラの出力は主に人間の消費に使われており、自動再コンパイルには適していないことが広く認識されている。多くの場合、再コンパイルされ適切に実行される前に逆コンパイラ出力を修正するためにかなりの手作業が必要となる。本論文は, 自然言語の高密度コーパスの理解において, 大規模言語モデル (LLM) が最近成功したことに動機づけられている。逆コンパイラ出力の修正における退屈でコストがかかり、しばしばエラーが発生しやすい手作業を軽減するため、LLMで逆コンパイラ出力を拡張し、再コンパイル可能な逆コンパイルを実現する可能性を検討する。より高い可読性(例えば、型/変数の名前の復元)でデコンパイラの出力を拡張することに注力する以前の取り組みとは違って、デコンパイラの出力に再コンパイル可能性を持たせることに重点を置いている。我々は、事実上の標準である商用CデコンパイラIDA-Proの出力を再コンパイルする際の障害を特徴づけるパイロット研究を行う。次に、LLMを用いてデコンパイラ出力を拡張する2段階のハイブリッド手法を提案する。我々は、人気のあるCテストケースのセットに対するアプローチを評価し、我々のアプローチが中程度の労力で75%以上の高い再コンパイル成功率を達成できる一方で、IDA-Proのオリジナルの出力は一つも再コンパイルできないことを示した。我々は,我々のアプローチの限界と将来的な研究の方向性について論じる。 A C decompiler converts an executable into source code. The recovered C source code, once re-compiled, is expected to produce an executable with the same functionality as the original executable. With over twenty years of development, C decompilers have been widely used in production to support reverse engineering applications. Despite the prosperous development of C decompilers, it is widely acknowledged that decompiler outputs are mainly used for human consumption, and are not suitable for automatic recompilation. Often, a substantial amount of manual effort is required to fix the decompiler outputs before they can be recompiled and executed properly. This paper is motivated by the recent success of large language models (LLMs) in comprehending dense corpora of natural language. 
To alleviate the tedious, costly, and often error-prone manual effort of fixing decompiler outputs, we investigate the feasibility of using LLMs to augment decompiler outputs, thus delivering recompilable decompilation. Unlike previous efforts that focus on making decompiler outputs more readable (e.g., recovering type/variable names), we focus on recompilability, that is, generating code that can be recompiled into an executable with the same functionality as the original executable. We conduct a pilot study to characterize the obstacles in recompiling the outputs of the de facto commercial C decompiler, IDA-Pro. We then propose a two-step, hybrid approach to augmenting decompiler outputs with LLMs. We evaluate our approach on a set of popular C test cases and show that it achieves a recompilation success rate of over 75% with moderate effort, whereas none of IDA-Pro's original outputs can be recompiled. We conclude with a discussion on the limitations of our approach and promising future research directions.	翻訳日:2023-12-01 00:41:51 公開日:2023-11-28
# CLiC: コンテキストにおける概念学習 CLiC: Concept Learning in Context ( http://arxiv.org/abs/2311.17083v1 ) ライセンス: Link先を確認	Mehdi Safaee, Aryan Mikaeili, Or Patashnik, Daniel Cohen-Or, Ali Mahdavi-Amiri	(参考訳) 本稿では,物体の局所的な視覚パターンを1つの画像から学習し,そのパターンで表現した画像を生成する課題について述べる。ローカライズされた概念を学習し、対象のイメージにオブジェクトを置くことは、異なる方向や形を持つ可能性があるため、非自明な作業である。我々のアプローチは視覚概念学習の最近の進歩に基づいている。ソースイメージから視覚概念(例えば、装飾)を取得し、その後、ターゲットイメージ内のオブジェクト(例えば、椅子)に適用する。私たちの重要なアイデアは、コンテキスト内コンセプト学習を実行し、それらが属するオブジェクトの広いコンテキスト内でローカルなビジュアル概念を取得することです。概念学習のローカライズには,マスク内の概念と周囲の画像領域の両方を含むソフトマスクを用いる。画像内のオブジェクト生成によるアプローチを実証し,コンテキスト内学習概念の活用可能性を示す。また,取得した概念を対象画像内の特定の場所に向ける手法を導入し,クロスアテンション機構を導入し,ソースとターゲットオブジェクトの対応性を確立する。本手法の有効性を定量的・質的実験と基礎技術との比較により実証した。 This paper addresses the challenge of learning a local visual pattern of an object from one image, and generating images depicting objects with that pattern. Learning a localized concept and placing it on an object in a target image is a nontrivial task, as the objects may have different orientations and shapes. Our approach builds upon recent advancements in visual concept learning. It involves acquiring a visual concept (e.g., an ornament) from a source image and subsequently applying it to an object (e.g., a chair) in a target image. Our key idea is to perform in-context concept learning, acquiring the local visual concept within the broader context of the objects they belong to. To localize the concept learning, we employ soft masks that contain both the concept within the mask and the surrounding image area. We demonstrate our approach through object generation within an image, showcasing plausible embedding of in-context learned concepts. We also introduce methods for directing acquired concepts to specific locations within target images, employing cross-attention mechanisms, and establishing correspondences between source and target objects. 
The effectiveness of our method is demonstrated through quantitative and qualitative experiments, along with comparisons against baseline techniques.	翻訳日:2023-12-01 00:32:12 公開日:2023-11-28
# 分布シフト下で報酬モデルが基礎モデルを正確に分析する能力のベースライン解析 A Baseline Analysis of Reward Models' Ability To Accurately Analyze Foundation Models Under Distribution Shift ( http://arxiv.org/abs/2311.14743v2 ) ライセンス: Link先を確認	Ben Pikus, Will LeVine, Tony Chen, Sean Hendryx	(参考訳) 基礎モデル、特にLarge Language Models (LLM)は近年広く注目を集め、採用されている。人間のフィードバックによる強化学習(Reinforcement Learning with Human Feedback, RLHF)は、所望の行動を捉えるために報酬モデルを訓練し、それを用いてLLMをアライメントする。これらの報酬モデルはまた、所望の行動にLLMの応答がどの程度従っているかを推定するために、推論時にも使われる。しかしながら、これらの報酬モデルが分布シフトに対してどれほど堅牢かを測定する研究はほとんどない。本研究では,報酬モデルの性能 - 精度とキャリブレーション(精度と確信度の整合)による測定 - が分布シフトによってどのように影響を受けるかを評価する。我々は、OODプロンプトと応答による新しいキャリブレーションパターンと精度低下を示し、報酬モデルがプロンプトよりも応答の変化に敏感であることを示す。さらに,分類によく用いられるOOD検出手法を報酬モデルの設定に適応させ,プロンプトや応答におけるこれらの分布シフトを検出する。 Foundation models, specifically Large Language Models (LLMs), have lately gained widespread attention and adoption. Reinforcement Learning with Human Feedback (RLHF) involves training a reward model to capture desired behaviors, which is then used to align an LLM. These reward models are additionally used at inference-time to estimate how well LLM responses adhere to those desired behaviors. However, there is little work measuring how robust these reward models are to distribution shifts. In this work, we evaluate how reward model performance - measured via accuracy and calibration (i.e., alignment between accuracy and confidence) - is affected by distribution shift. We show novel calibration patterns and accuracy drops due to OOD prompts and responses, and that the reward model is more sensitive to shifts in responses than prompts. Additionally, we adapt an OOD detection technique commonly used in classification to the reward model setting in order to detect these distribution shifts in prompts and responses.	翻訳日:2023-12-01 00:31:28 公開日:2023-11-28
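The reward-model abstract above measures calibration as the alignment between accuracy and confidence. One common way to quantify that is expected calibration error (ECE), sketched below. The abstract does not say which metric or confidence definition the authors use, so the Bradley-Terry style confidence (a sigmoid of the reward difference), the equal-width binning, and the function names are all assumptions for illustration.

```python
import numpy as np

def preference_confidence(r_chosen, r_rejected):
    """Turn paired reward scores into (confidence, correct) arrays.

    Assumes a Bradley-Terry style model: P(chosen preferred) = sigmoid(r_c - r_r).
    """
    r_chosen = np.asarray(r_chosen, dtype=float)
    r_rejected = np.asarray(r_rejected, dtype=float)
    p_chosen = 1.0 / (1.0 + np.exp(-(r_chosen - r_rejected)))
    correct = p_chosen > 0.5                    # model agrees with the human label
    confidence = np.maximum(p_chosen, 1.0 - p_chosen)
    return confidence, correct

def expected_calibration_error(confidence, correct, n_bins=10):
    """Weighted mean gap between accuracy and mean confidence over equal-width bins."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Map each confidence in [0, 1] to a bin index; 1.0 goes into the last bin.
    bin_index = np.minimum((confidence * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap          # weight by the fraction of samples
    return float(ece)

# Hypothetical reward scores for four (chosen, rejected) response pairs.
conf, ok = preference_confidence([2.0, 0.5, 1.0, 0.1], [0.0, 1.5, 0.2, 0.0])
print("accuracy:", ok.mean(), "ECE:", expected_calibration_error(conf, ok))
```

Computing this once on in-distribution pairs and once on shifted prompts or responses gives a simple before-and-after comparison of the kind the abstract describes.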
# 実道路シーンにおける交通標識の解釈 Traffic Sign Interpretation in Real Road Scene ( http://arxiv.org/abs/2311.10793v2 ) ライセンス: Link先を確認	Chuang Yang, Kai Zhuang, Mulin Chen, Haozhao Ma, Xu Han, Tao Han, Changxing Guo, Han Han, Bingxuan Zhao, and Qi Wang	(参考訳) 既存の交通標識関連研究の多くは、交通標識の一部を個別に検出・認識することに注力しており、標識間のグローバルなセマンティックロジックを分析できず、不正確な交通指示を伝える可能性がある。上記の課題に対処するため,グローバルに意味的に相互関連する交通標識(例えば,運転指示関連テキスト,シンボル,ガイドパネル)を自然言語に解釈し,自律運転や運転支援に正確な指示支援を提供することを目的とした交通標識解釈(TSI)タスクを提案する。一方,TSIのためのマルチタスク学習アーキテクチャを設計し,様々な交通標識を検出して認識し,それを人間のように自然言語に解釈する。さらに、公開されているTSIデータセットがないため、交通標識解釈データセットTSI-CNを構築した。このデータセットは実際の道路シーンの画像で構成されており、中国の高速道路や市街地の道路でドライバーの視点から撮影されている。テキスト、シンボル、ガイドパネルの豊富な位置ラベルと、対応する自然言語記述ラベルが含まれている。 TSI-CNの実験は、TSIタスクが達成可能であることを示し、TSIアーキテクチャは、標識間に複雑なセマンティックロジックがあっても、シーンからの交通標識をうまく解釈できることを示した。 TSI-CNデータセットとTSIアーキテクチャのソースコードは、改訂プロセス後に公開される。 Most existing traffic sign-related works are dedicated to detecting and recognizing part of traffic signs individually, which fails to analyze the global semantic logic among signs and may convey inaccurate traffic instruction. To address the above issues, we propose a traffic sign interpretation (TSI) task, which aims to interpret globally semantically interrelated traffic signs (e.g., driving instruction-related texts, symbols, and guide panels) into a natural language for providing accurate instruction support to autonomous or assistant driving. Meanwhile, we design a multi-task learning architecture for TSI, which is responsible for detecting and recognizing various traffic signs and interpreting them into a natural language like a human. Furthermore, the absence of a publicly available TSI dataset prompts us to build a traffic sign interpretation dataset, namely TSI-CN. The dataset consists of real road scene images, which are captured on highways and urban roads in China from a driver's perspective. It contains rich location labels of texts, symbols, and guide panels, and the corresponding natural language description labels. 
Experiments on TSI-CN demonstrate that the TSI task is achievable and the TSI architecture can interpret traffic signs from scenes successfully even if there is a complex semantic logic among signs. The TSI-CN dataset and the source code of the TSI architecture will be publicly available after the revision process.	翻訳日:2023-12-01 00:28:36 公開日:2023-11-28
# StreamFlow: ビデオシーケンスのためのストリーム化多フレーム光フロー推定 StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video Sequences ( http://arxiv.org/abs/2311.17099v1 ) ライセンス: Link先を確認	Shangkun Sun, Jiaming Liu, Thomas H. Li, Huaxia Li, Guoqing Liu, Wei Gao	(参考訳) 連続するフレーム間のオクルージョンは、光学的フロー推定において重要な課題となった。オクルージョンによって引き起こされる固有の曖昧さは、輝度コンスタント性制約に直接違反し、ピクセル対ピクセルマッチングを著しく阻害する。この問題に対処するため、マルチフレーム光フロー法は隣接するフレームを利用して局所的曖昧さを緩和する。しかし、従来のマルチフレーム手法は再帰的なフロー推定を主に採用しており、計算上の重複がかなり大きい。そこで本研究では,冗長再帰的計算の必要性をなくし,同時にバッチ内推定制約下で有効な時空間モデリング手法を考案する,バッチ内フレームワークを提案する。具体的には,ビデオ入力に合わせたマルチフレーム(sim)パイプラインの合理化を行い,2フレームネットワークと同等の時間効率を実現する。さらに、符号化フェーズにおける効果的な時空間モデリングのための効率的な積分時空間コヒーレンス(ISC)モデルを導入し、追加パラメータのオーバーヘッドを生じさせない。さらに,デコード中の時間的関係を効果的に探索するグローバルテンポラルレグレシタ(GTR)を考案した。効率的なsimパイプラインと効果的なモジュールの恩恵を受けるstreamflowは、挑戦的なkittiとsintelデータセットのパフォーマンスの面で優れているだけでなく、特にオクルード領域におけるパフォーマンスが向上しているだけでなく、従来のマルチフレーム法と比較して驚くべき63.82\%のスピード向上を達成している。コードは近くhttps://github.com/littlespray/streamflowで入手できる。 Occlusions between consecutive frames have long posed a significant challenge in optical flow estimation. The inherent ambiguity introduced by occlusions directly violates the brightness constancy constraint and considerably hinders pixel-to-pixel matching. To address this issue, multi-frame optical flow methods leverage adjacent frames to mitigate the local ambiguity. Nevertheless, prior multi-frame methods predominantly adopt recursive flow estimation, resulting in a considerable computational overlap. In contrast, we propose a streamlined in-batch framework that eliminates the need for extensive redundant recursive computations while concurrently developing effective spatio-temporal modeling approaches under in-batch estimation constraints. Specifically, we present a Streamlined In-batch Multi-frame (SIM) pipeline tailored to video input, attaining a similar level of time efficiency to two-frame networks. 
Furthermore, we introduce an efficient Integrative Spatio-temporal Coherence (ISC) modeling method for effective spatio-temporal modeling during the encoding phase, which introduces no additional parameter overhead. Additionally, we devise a Global Temporal Regressor (GTR) that effectively explores temporal relations during decoding. Benefiting from the efficient SIM pipeline and effective modules, StreamFlow not only excels in terms of performance on the challenging KITTI and Sintel datasets, with particular improvement in occluded areas, but also attains a remarkable 63.82% enhancement in speed compared with previous multi-frame methods. The code will be available soon at https://github.com/littlespray/StreamFlow.	翻訳日:2023-12-01 00:21:05 公開日:2023-11-28
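The StreamFlow abstract above refers to the brightness constancy constraint without stating it. For reference, the standard textbook form is given below; the symbols are assumptions of this note rather than the paper's notation, with $I$ the image intensity, $(u, v)$ the flow at pixel $(x, y)$, and $I_x$, $I_y$, $I_t$ the partial derivatives of $I$.

```latex
I(x + u,\; y + v,\; t + 1) = I(x, y, t)
\quad\Longrightarrow\quad
I_x\,u + I_y\,v + I_t \approx 0
```

A pixel that becomes occluded has no counterpart in the next frame, so no $(u, v)$ satisfies the left-hand equality. This is the ambiguity that multi-frame methods try to resolve by drawing on adjacent frames.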
# 内部および横断的不整合を用いた教師なしマルチモーダルディープフェイク検出 Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies ( http://arxiv.org/abs/2311.17088v1 ) ライセンス: Link先を確認	Mulin Tian, Mahyar Khayatkhoei, Joe Mathai, Wael AbdAlmageed	(参考訳) ディープフェイクビデオは、刑事司法、民主主義、個人の安全とプライバシーに悪影響を及ぼしうる、社会にとって増大する脅威である。一方で、大規模なディープフェイクの検出は、既存のディープフェイク生成メソッドからのラベル付きトレーニングデータを必要とすることが多い、非常に難しいタスクである。さらに、最も正確な教師付きディープフェイク検出法であっても、新しい生成法を用いて生成されたディープフェイクには一般化しない。本稿では,マルチモーダル特徴,具体的には視覚,音声,アイデンティティ特徴の間のモダリティ内およびモダリティ間の一貫性を測定することにより,ディープフェイク映像を検出するための新しい教師なし手法を提案する。提案手法の背後にある基本的な仮説は、ディープフェイク生成がある人物の顔の動きを別の人物に移そうとするので、これらの手法は最終的に動きとアイデンティティのトレードオフに遭遇し、検出可能な不整合を必然的に生じさせることである。本手法を広範な実験により検証し,ディープフェイク映像におけるモダリティ内およびモダリティ間の顕著な不整合の存在を実証し,高精度な検出に効果的に利用できることを示す。提案手法は, 推論時に真正な(pristine)サンプルを必要としないため拡張性があり, 実データのみで訓練されるため汎化性があり, 人間の専門家が検証可能なモダリティ不整合の正確な位置を特定できるため説明可能である。 Deepfake videos present an increasing threat to society with potentially negative impact on criminal justice, democracy, and personal safety and privacy. Meanwhile, detecting deepfakes at scale remains a very challenging task that often requires labeled training data from existing deepfake generation methods. Further, even the most accurate supervised deepfake detection methods do not generalize to deepfakes generated using new generation methods. In this paper, we introduce a novel unsupervised approach for detecting deepfake videos by measuring intra- and cross-modal consistency among multimodal features, specifically visual, audio, and identity features. The fundamental hypothesis behind the proposed detection method is that since deepfake generation attempts to transfer the facial motion of one identity to another, these methods will eventually encounter a trade-off between motion and identity that inevitably leads to detectable inconsistencies. 
We validate our method through extensive experimentation, demonstrating the existence of significant intra- and cross-modal inconsistencies in deepfake videos, which can be effectively utilized to detect them with high accuracy. Our proposed method is scalable because it does not require pristine samples at inference, generalizable because it is trained only on real data, and explainable since it can pinpoint the exact location of modality inconsistencies, which are then verifiable by a human expert.	翻訳日:2023-12-01 00:20:39 公開日:2023-11-28
# 敵対的転移性向上のためのMixupの再考 Rethinking Mixup for Improving the Adversarial Transferability ( http://arxiv.org/abs/2311.17087v1 ) ライセンス: Link先を確認	Xiaosen Wang, Zeyuan Yin	(参考訳) Mixup拡張は、代理モデルから他のモデルへ転移する際に、より優れた敵対的転移性を持つ敵対的サンプルを生成するために広く取り入れられている。しかし、Mixupが転移性に与える効果を左右するメカニズムは未解明のままである。本研究では,様々なカテゴリの決定境界が集まる領域に位置する敵対的サンプルはより優れた転移性を示すと仮定し,Admixが敵対的サンプルをそのような領域へ導く傾向があることを明らかにする。しかし、Admixにおける付加画像への制約はその能力を損ない、転移性が制限される。このような問題に対処するため、我々はMixing the Image but Separating the gradienT (MIST)と呼ばれる新しい入力変換ベースの攻撃を提案する。具体的には、MISTは、入力画像とランダムにシフトした画像とをランダムに混合し、混合画像毎に各損失項目の勾配を分離する。不正確な勾配に対処するため、MISTは入力サンプル毎に複数の混合画像の勾配を算出する。 ImageNetデータセットの大規模な実験結果によると、MISTは防御機構の有無にかかわらず、畳み込みニューラルネットワーク(CNN)とビジョントランスフォーマー(ViT)の両方で既存のSOTA入力変換ベースの攻撃を明確な差で上回る。 Mixup augmentation has been widely integrated to generate adversarial examples with superior adversarial transferability when transferring from a surrogate model to other models. However, the underlying mechanism influencing the mixup's effect on transferability remains unexplored. In this work, we posit that the adversarial examples located at the convergence of decision boundaries across various categories exhibit better transferability and identify that Admix tends to steer the adversarial examples towards such regions. However, we find the constraint on the added image in Admix weakens its capability, resulting in limited transferability. To address such an issue, we propose a new input transformation-based attack called Mixing the Image but Separating the gradienT (MIST). Specifically, MIST randomly mixes the input image with a randomly shifted image and separates the gradient of each loss item for each mixed image. To counteract the imprecise gradient, MIST calculates the gradient on several mixed images for each input sample. 
Extensive experimental results on the ImageNet dataset demonstrate that MIST outperforms existing SOTA input transformation-based attacks by a clear margin on both Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), with and without defense mechanisms, supporting MIST's high effectiveness and generality.	翻訳日:2023-12-01 00:20:15 公開日:2023-11-28
# PEA拡散:非英語テキスト・画像生成における知識蒸留を用いたパラメータ効率の良い適応器 PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation ( http://arxiv.org/abs/2311.17086v1 ) ライセンス: Link先を確認	Jian Ma, Chen Chen, Qingsong Xie, Haonan Lu	(参考訳) テキスト間拡散モデルは、テキストのプロンプトに基づいて現実的な画像を生成する能力でよく知られている。しかし、既存の作品は主に英語に焦点を当てており、非英語のテキストから画像へのモデルをサポートしていない。最も一般的な翻訳方法は、言語文化に関連する生成問題を解決できないが、特定の言語データセット上でスクラッチからトレーニングすることは、非常に高価である。本稿では,知識蒸留に基づく単純なプラグ・アンド・プレイ言語転送法を提案する。必要なのは、教師の知識の蒸留の下で6Mパラメータしか持たない軽量なMLP型パラメータ効率アダプタ(PEA)と、小さな並列データコーパスをトレーニングすることだけです。 UNetのパラメータの凍結は、言語固有のプロンプト評価セットにおいて依然として顕著な性能を達成できることに驚き、PEAが元のUNetの潜在的な生成能力を刺激できることを実証した。さらに、一般的なプロンプト評価セットに基づいて、英語のテキスト・画像モデルの性能に近づいた。さらに,このアダプタをプラグインとして使用することで,言語間テキスト・画像生成における下流タスクの重要な結果が得られる。コードは、https://github.com/OPPO-Mente-Lab/PEA-Diffusionで入手できる。 Text-to-image diffusion models are well-known for their ability to generate realistic images based on textual prompts. However, the existing works have predominantly focused on English, lacking support for non-English text-to-image models. The most commonly used translation methods cannot solve the generation problem related to language culture, while training from scratch on a specific language dataset is prohibitively expensive. In this paper, we are inspired to propose a simple plug-and-play language transfer method based on knowledge distillation. All we need to do is train a lightweight MLP-like parameter-efficient adapter (PEA) with only 6M parameters under teacher knowledge distillation along with a small parallel data corpus. We are surprised to find that freezing the parameters of UNet can still achieve remarkable performance on the language-specific prompt evaluation set, demonstrating that PEA can stimulate the potential generation ability of the original UNet. Additionally, it closely approaches the performance of the English text-to-image model on a general prompt evaluation set. 
Furthermore, our adapter can be used as a plugin to achieve significant results in downstream tasks in cross-lingual text-to-image generation. Code will be available at: https://github.com/OPPO-Mente-Lab/PEA-Diffusion	翻訳日:2023-12-01 00:19:50 公開日:2023-11-28
# beyond visual cues: 視覚言語追跡のための目標中心セマンティクスの同時探索 Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking ( http://arxiv.org/abs/2311.17085v1 ) ライセンス: Link先を確認	Jiawei Ge, Xiangmei Chen, Jiuxin Cao, Xuelin Zhu, Weijia Liu, Bo Liu	(参考訳) 単一のオブジェクト追跡は、初期状態から、ビデオシーケンス内の特定のターゲットを見つけることを目的としている。古典的なトラッカーは視覚的な手がかりにのみ依存しており、外観の変化、曖昧さ、気晴らしといった課題に対処する能力を制限する。そのため、視覚言語(vl)トラッキングは有望なアプローチとして登場し、言語記述を組み込んで高レベルのセマンティクスを直接提供し、トラッキング性能を向上させる。しかしながら、現在のVLトラッカーはVL学習のパワーを十分に活用していない。特徴抽出のためにオフザシェルフバックボーンに強く依存する、非効率なVL融合設計、VL関連損失関数の欠如などである。そこで本研究では,VLトラッキングのためのターゲット中心のセマンティクスを徐々に探求する新しいトラッカーを提案する。具体的には,VLトラッキングのための最初のSynchronous Learning Backbone (SLB)を提案する。これは,Target Enhance Module (TEM) と Semantic Aware Module (SAM) の2つの新しいモジュールで構成される。これらのモジュールは、トラッカーがターゲットに関連するセマンティクスを知覚し、視覚とテキストの両方のモダリティのコンテキストを同じペースで理解し、VLの特徴抽出と異なるセマンティクスレベルでの融合を容易にする。さらに,マルチモーダル表現学習をさらに強化するために,濃密なマッチング損失を考案する。 VL追跡データセットの大規模実験により,本手法の優位性と有効性を示した。 Single object tracking aims to locate one specific target in video sequences, given its initial state. Classical trackers rely solely on visual cues, restricting their ability to handle challenges such as appearance variations, ambiguity, and distractions. Hence, Vision-Language (VL) tracking has emerged as a promising approach, incorporating language descriptions to directly provide high-level semantics and enhance tracking performance. However, current VL trackers have not fully exploited the power of VL learning, as they suffer from limitations such as heavily relying on off-the-shelf backbones for feature extraction, ineffective VL fusion designs, and the absence of VL-related loss functions. Consequently, we present a novel tracker that progressively explores target-centric semantics for VL tracking. Specifically, we propose the first Synchronous Learning Backbone (SLB) for VL tracking, which consists of two novel modules: the Target Enhance Module (TEM) and the Semantic Aware Module (SAM). 
These modules enable the tracker to perceive target-related semantics and comprehend the context of both visual and textual modalities at the same pace, facilitating VL feature extraction and fusion at different semantic levels. Moreover, we devise the dense matching loss to further strengthen multi-modal representation learning. Extensive experiments on VL tracking datasets demonstrate the superiority and effectiveness of our methods.	翻訳日:2023-12-01 00:19:26 公開日:2023-11-28
# depthssc: 単眼3次元セマンティックシーンの奥行き空間アライメントと動的ボクセル解像度 DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion ( http://arxiv.org/abs/2311.17084v1 ) ライセンス: Link先を確認	Jiawei Yao and Jusheng Zhang	(参考訳) 単眼カメラによる3次元セマンティックシーンの完成作業は、自動運転の分野で注目を集めている。その目的は、部分的な画像入力から3dシーン内の各ボクセルの占有状況を予測することである。多くの方法が存在するにも拘わらず、その多くは空間情報と深度情報の正確なアライメントの問題を見落としている。そこで本研究では,単眼カメラのみをベースとするセマンティックシーン補完手法であるdeepsscを提案する。 DepthSSCは、ST-GF(Spatial Transformation Graph Fusion)モジュールと幾何学的なボクセル化を組み合わせ、ボクセル解像度の動的調整を可能にし、3次元空間の幾何学的複雑さを考慮して空間情報と深度情報の正確な整合性を確保する。この手法は,従来の手法で観測された空間的ずれや歪みの問題を緩和する。 SemanticKITTIデータセットの評価を通じて、DepthSSCは複雑な3D構造の詳細をキャプチャする効果を示すだけでなく、最先端のパフォーマンスも達成している。 depthsscは、単眼カメラベースの3dセマンティックシーン補完研究の新しい視点を提供し、さらなる研究を刺激することを期待している。 The task of 3D semantic scene completion with monocular cameras is gaining increasing attention in the field of autonomous driving. Its objective is to predict the occupancy status of each voxel in the 3D scene from partial image inputs. Despite the existence of numerous methods, many of them overlook the issue of accurate alignment between spatial and depth information. To address this, we propose DepthSSC, an advanced method for semantic scene completion solely based on monocular cameras. DepthSSC combines the ST-GF (Spatial Transformation Graph Fusion) module with geometric-aware voxelization, enabling dynamic adjustment of voxel resolution and considering the geometric complexity of 3D space to ensure precise alignment between spatial and depth information. This approach successfully mitigates spatial misalignment and distortion issues observed in prior methods. Through evaluation on the SemanticKITTI dataset, DepthSSC not only demonstrates its effectiveness in capturing intricate 3D structural details but also achieves state-of-the-art performance. 
We believe DepthSSC provides a fresh perspective on monocular camera-based 3D semantic scene completion research and anticipate it will inspire further related studies.	翻訳日:2023-12-01 00:18:59 公開日:2023-11-28
# dreampropeller:並列サンプリングによるsupercharge text-to-3d生成 DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling ( http://arxiv.org/abs/2311.17082v1 ) ライセンス: Link先を確認	Linqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon	(参考訳) テキスト3次元生成のための2次元拡散モデルを用いたスコア蒸留サンプリング(sds)や変分スコア蒸留(vsd)などの最近の手法は、優れた生成品質を示している。しかし、そのようなアルゴリズムの長期化はユーザー体験を著しく劣化させる。そこで,本稿では,既存のテキストから3dへの生成パイプラインをスコア蒸留に基づいてラップできる,ドロップイン・アクセラレーションアルゴリズムであるdreampropellerを提案する。我々のフレームワークは、ODEパスを並列サンプリングする古典的なアルゴリズムであるPicard繰り返しを一般化し、モーメントベースの勾配更新や最適化プロセス中の寸法変化などの非ODEパスを3次元生成の場合と同様に考慮することができる。アルゴリズムが並列計算をウォールクロック時間と交換し、テスト済みフレームワークの最大4.7倍のスピードアップを達成し、生成品質の低下を無視できることを示した。 Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) using 2D diffusion models for text-to-3D generation have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped around any existing text-to-3D generation pipeline based on score distillation. Our framework generalizes Picard iterations, a classical algorithm for parallel sampling an ODE path, and can account for non-ODE paths such as momentum-based gradient updates and changes in dimensions during the optimization process as in many cases of 3D generation. We show that our algorithm trades parallel compute for wallclock time and empirically achieves up to 4.7x speedup with a negligible drop in generation quality for all tested frameworks.	翻訳日:2023-12-01 00:18:35 公開日:2023-11-28
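The DreamPropeller abstract above builds on Picard iterations for sampling an ODE path in parallel. The sketch below shows only the classical idea on a scalar ODE with an Euler discretisation; it is not the paper's generalised algorithm, and the test problem `dx/dt = -x`, the step counts, and the function names are assumptions for illustration.

```python
import numpy as np

def sequential_euler(f, x0, t0, t1, n_steps):
    """Reference solver: each step must wait for the previous one."""
    ts = np.linspace(t0, t1, n_steps + 1)
    h = (t1 - t0) / n_steps
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for k in range(n_steps):
        xs[k + 1] = xs[k] + h * f(xs[k], ts[k])
    return xs

def picard_euler(f, x0, t0, t1, n_steps, n_iters):
    """Picard iteration on the Euler grid: refine the whole path at once."""
    ts = np.linspace(t0, t1, n_steps + 1)
    h = (t1 - t0) / n_steps
    xs = np.full(n_steps + 1, float(x0))       # initial guess: a constant path
    for _ in range(n_iters):
        # Every drift evaluation uses the previous iterate only, so all of
        # them are independent and could run in parallel.
        drift = f(xs[:-1], ts[:-1])
        xs = np.concatenate(([float(x0)], x0 + h * np.cumsum(drift)))
    return xs

decay = lambda x, t: -x                        # hypothetical test problem dx/dt = -x
reference = sequential_euler(decay, 1.0, 0.0, 1.0, n_steps=20)
for n_iters in (2, 5, 20):
    approx = picard_euler(decay, 1.0, 0.0, 1.0, n_steps=20, n_iters=n_iters)
    print(n_iters, "iterations, max gap to sequential Euler:",
          np.max(np.abs(approx - reference)))
```

Each iteration fixes at least one more grid point exactly, so `n_steps` iterations always reproduce the sequential result. The practical gain comes from needing far fewer iterations than steps when the path converges quickly, which is the trade of parallel compute for wall-clock time that the abstract describes.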
# I-MedSAM: セグメンテーションによる医用画像セグメンテーション I-MedSAM: Implicit Medical Image Segmentation with Segment Anything ( http://arxiv.org/abs/2311.17081v1 ) ライセンス: Link先を確認	Xiaobao Wei, Jiajun Cao, Yizhu Jin, Ming Lu, Guangyu Wang, Shanghang Zhang	(参考訳) ディープニューラルネットワーク(DNN)の開発により、医療画像のセグメンテーションに多くの取り組みがなされている。 nnUNetのような従来の手法では、個々のデータセット上で特定のセグメンテーションモデルをトレーニングしている。基礎的なセグメンテーションモデル(SAM)を医用画像セグメンテーションに適用する手法が,近年提案されている。しかし、彼らは依然として離散的な表現に注目してピクセル単位で予測し、空間的に柔軟性がなく、より高解像度にスケールしにくい。対照的に、暗黙的手法は、医用画像のセグメンテーションに欠かせないセグメンテーションの連続的な表現を学習する。本稿では,連続表現とSAMの両方の利点を利用するI-MedSAMを提案する。医用画像セグメンテーションは,詳細なセグメンテーション境界を予測する必要があるため,パラメータ効率の良い微調整(peft)時に,高周波情報を用いたsam機能を強化する新しいアダプタを設計した。 samの特徴と座標を連続的なセグメンテーション出力に変換するために、暗黙的なニューラルネットワーク表現(inr)を使用して、暗黙的なセグメンテーションデコーダを学習する。また、INRの効率的な学習のための不確実性誘導サンプリング戦略を提案する。 2次元医用画像セグメンテーションタスクの広範な評価を行った結果, 学習可能なパラメータが1.6mに満たない手法は, 離散的および連続的手法を含む既存の手法よりも優れていることがわかった。コードはリリースされます。 With the development of Deep Neural Networks (DNNs), many efforts have been made to handle medical image segmentation. Traditional methods such as nnUNet train specific segmentation models on the individual datasets. Plenty of recent methods have been proposed to adapt the foundational Segment Anything Model (SAM) to medical image segmentation. However, they still focus on discrete representations to generate pixel-wise predictions, which are spatially inflexible and scale poorly to higher resolution. In contrast, implicit methods learn continuous representations for segmentation, which is crucial for medical image segmentation. In this paper, we propose I-MedSAM, which leverages the benefits of both continuous representations and SAM, to obtain better cross-domain ability and accurate boundary delineation. Since medical image segmentation needs to predict detailed segmentation boundaries, we designed a novel adapter to enhance the SAM features with high-frequency information during Parameter Efficient Fine Tuning (PEFT). 
To convert the SAM features and coordinates into continuous segmentation output, we utilize Implicit Neural Representation (INR) to learn an implicit segmentation decoder. We also propose an uncertainty-guided sampling strategy for efficient learning of INR. Extensive evaluations on 2D medical image segmentation tasks have shown that our proposed method with only 1.6M trainable parameters outperforms existing methods including discrete and continuous methods. The code will be released.	翻訳日:2023-12-01 00:18:19 公開日:2023-11-28
# AIアートにおける「サミネス」への対抗: 対話型AIインスタレーションFencing Hallucinationに関する考察 Combating the "Sameness" in AI Art: Reflections on the Interactive AI Installation Fencing Hallucination ( http://arxiv.org/abs/2311.17080v1 ) ライセンス: Link先を確認	Weihao Qiu, George Legrady	(参考訳) 本稿では、AI(Artificial Intelligence)アートにおける3種類の「サミネス」(画一性)の問題を要約する。これらはそれぞれ、AI画像生成ツールの開発の異なる段階で生じる。Fencing Hallucinationプロジェクトを通じて、画一性の感覚を緩和し、AI画像シンセサイザーが生成する画像の独自性を維持し、作品と観客とのつながりを高めるためのAIアート制作の設計を考察する。本稿は、「サミネス」の問題に取り組んだFencing Hallucinationプロジェクトの取り組みと洞察を振り返ることで、独自性のあるAIアートの創造を促すことを目指す。 The article summarizes three types of "sameness" issues in Artificial Intelligence (AI) art, each occurring at different stages of development in AI image creation tools. Through the Fencing Hallucination project, the article reflects on the design of AI art production in alleviating the sense of uniformity, maintaining the uniqueness of images from an AI image synthesizer, and enhancing the connection between the artworks and the audience. This paper endeavors to stimulate the creation of distinctive AI art by recounting the efforts and insights derived from the Fencing Hallucination project, all dedicated to addressing the issue of "sameness".	翻訳日:2023-12-01 00:17:52 公開日:2023-11-28
# 暗黙的ニューラル表現によるパラメータ化PDEの低次モデリング Reduced-order modeling for parameterized PDEs via implicit neural representations ( http://arxiv.org/abs/2311.16410v1 ) ライセンス: Link先を確認	Tianshu Wen, Kookjin Lee, Youngsoo Choi	(参考訳) 並列化偏微分方程式(PDE)を多値化問題に対して効率的に解くために,データ駆動型低次モデリング手法を提案する。この研究は暗黙的神経表現(INR)の概念に触発され、物理信号を連続的にモデル化し、空間的・時間的離散化とは無関係である。提案フレームワークは、PDEを符号化し、パラメトリゼーションニューラルネットワーク(PNODE)を用いて、複数のPDEパラメータを特徴とする潜時ダイナミクスを学習する。 PNODEは、複雑な多層パーセプトロン(MLP)によるPNODE学習の潜在的な困難を軽減するために、ハイパーネットワークによって推論できる。このフレームワークはinrを使用して潜在ダイナミクスをデコードし、正確なpdeソリューションを再構築する。さらに、未確認パラメータの予測を補正するために、物理情報損失も導入する。物理インフォームド損失を組み込むことで、未知のPDEパラメータに基づいて教師なしの方法でモデルを微調整することもできる。 pdeパラメータの変動が大きい2次元バーガース方程式について数値実験を行った。我々は,提案手法を大規模なレイノルズ数で評価し,O(10^3) の高速化と基底真理値に対する ~1% の誤差を得る。 We present a new data-driven reduced-order modeling approach to efficiently solve parametrized partial differential equations (PDEs) for many-query problems. This work is inspired by the concept of implicit neural representation (INR), which models physics signals in a continuous manner and independent of spatial/temporal discretization. The proposed framework encodes PDE and utilizes a parametrized neural ODE (PNODE) to learn latent dynamics characterized by multiple PDE parameters. PNODE can be inferred by a hypernetwork to reduce the potential difficulties in learning PNODE due to a complex multilayer perceptron (MLP). The framework uses an INR to decode the latent dynamics and reconstruct accurate PDE solutions. Further, a physics-informed loss is also introduced to correct the prediction of unseen parameter instances. Incorporating the physics-informed loss also enables the model to be fine-tuned in an unsupervised manner on unseen PDE parameters. A numerical experiment is performed on a two-dimensional Burgers equation with a large variation of PDE parameters. We evaluate the proposed method at a large Reynolds number and obtain up to speedup of O(10^3) and ~1% relative error to the ground truth values.	翻訳日:2023-12-01 00:16:07 公開日:2023-11-28
# climatex: llmは、気候条件に対する人間の専門家の信頼度を正確に評価するのか? ClimateX: Do LLMs Accurately Assess Human Expert Confidence in Climate Statements? ( http://arxiv.org/abs/2311.17107v1 ) ライセンス: Link先を確認	Romain Lacombe, Kerrie Wu, Eddie Dilworth	(参考訳) 大言語モデル(llms)が生成する出力の正確性を評価することは、気候科学および政策分野において特に重要である。気候変動に関する最新の政府間パネル(IPCC)レポートから収集された8094の気候ステートメントからなる,新規でキュレートされた専門家ラベル付きデータセットであるClimateX(Expert Confidence in Climate Statements)データセットを紹介する。このデータセットを用いて,最近のllmでは,気候関連文に対する人間専門家の信頼度,特に数回の学習環境において,限定的(最大47%)の精度で分類可能であることを示した。全体として、モデルは、低信頼と中自信のステートメントに一貫性があり、重要な自信を示す。我々は,気候情報通信,LCM評価戦略,情報検索システムにおけるLSMの利用について,その意義を強調した。 Evaluating the accuracy of outputs generated by Large Language Models (LLMs) is especially important in the climate science and policy domain. We introduce the Expert Confidence in Climate Statements (ClimateX) dataset, a novel, curated, expert-labeled dataset consisting of 8094 climate statements collected from the latest Intergovernmental Panel on Climate Change (IPCC) reports, labeled with their associated confidence levels. Using this dataset, we show that recent LLMs can classify human expert confidence in climate-related statements, especially in a few-shot learning setting, but with limited (up to 47%) accuracy. Overall, models exhibit consistent and significant over-confidence on low and medium confidence statements. We highlight implications of our results for climate communication, LLMs evaluation strategies, and the use of LLMs in information retrieval systems.	翻訳日:2023-12-01 00:09:01 公開日:2023-11-28
# 人文推定の校正について On the Calibration of Human Pose Estimation ( http://arxiv.org/abs/2311.17105v1 ) ライセンス: Link先を確認	Kerui Gu, Rongyu Chen, Angela Yao	(参考訳) ほとんどの2次元ポーズ推定フレームワークは、ヒートマップの最大値のようなヒューリスティックを用いて、アドホックな方法でキーポイント信頼度を推定する。信頼性は、例えば、MSCOCOデータセットのAPのような評価スキームの一部であるが、最先端の手法の開発においてほとんど見過ごされてきた。本稿では,ポーズ推定におけるミスキャリブレーションに対処するための第一歩を踏み出す。キャリブレーションの観点からは、信頼性はポーズの正確さと一致すべきである。実際には、既存の方法の校正が不十分である。理論的解析を通して、なぜ誤校正ギャップが存在し、そのギャップを狭めるかを示す。インスタンスサイズを予測し、信頼度関数を調整するだけで、apを大幅に改善できる。しかし、ディープニューラルネットワークのブラックボックスの性質を考えると、このギャップを完全に閉じることは、クローズドフォームの調整だけでは不可能である。そこで我々は,信頼度と姿勢の正確さの一貫性を強要し,ネットワーク固有の調整を学習する。提案するCalibrated ConfidenceNet(CCNet)は,市販のポーズ推定フレームワークでAPを最大1.4%改善する軽量なポストホック追加である。メッシュリカバリのダウンストリームタスクに適用されたCCNetは、3Dキーポイントエラーを1.0mm削減する。 Most 2D human pose estimation frameworks estimate keypoint confidence in an ad-hoc manner, using heuristics such as the maximum value of heatmaps. The confidence is part of the evaluation scheme, e.g., AP for the MSCOCO dataset, yet has been largely overlooked in the development of state-of-the-art methods. This paper takes the first steps in addressing miscalibration in pose estimation. From a calibration point of view, the confidence should be aligned with the pose accuracy. In practice, existing methods are poorly calibrated. We show, through theoretical analysis, why a miscalibration gap exists and how to narrow the gap. Simply predicting the instance size and adjusting the confidence function gives considerable AP improvements. Given the black-box nature of deep neural networks, however, it is not possible to fully close this gap with only closed-form adjustments. As such, we go one step further and learn network-specific adjustments by enforcing consistency between confidence and pose accuracy. Our proposed Calibrated ConfidenceNet (CCNet) is a light-weight post-hoc addition that improves AP by up to 1.4% on off-the-shelf pose estimation frameworks. 
Applied to the downstream task of mesh recovery, CCNet facilitates an additional 1.0mm decrease in 3D keypoint error.	翻訳日:2023-12-01 00:08:45 公開日:2023-11-28
# デュアルグラフアライメントによるシングルセルクラスタリング Single-Cell Clustering via Dual-Graph Alignment ( http://arxiv.org/abs/2311.17104v1 ) ライセンス: Link先を確認	Dayu Hu, Ke Liang, Xinwang Liu	(参考訳) 近年、シングルセルRNAシークエンシングの分野はクラスタリング法の開発が急増している。これらの方法は細胞亜集団の同定を可能にし、腫瘍の微小環境の理解を容易にする。その実用性にもかかわらず、既存のクラスタリングアルゴリズムのほとんどは、主に細胞マトリックスまたは細胞間のネットワーク構造が提供する属性情報に焦点を当て、しばしば遺伝子間のネットワークを無視している。この監視は、臨床的な重要性に欠ける情報とクラスタリングの結果を失う可能性がある。この制限に対処するため,遺伝子ネットワーク情報を自己監督的かつ教師なし最適化に基づくクラスタリングプロセスに統合する,デュアルグラフアライメントを組み込んだ高度な単一セルクラスタリングモデルを開発した。具体的には,セル間の関係を効果的に捉えるために注意機構によって拡張されたグラフベースのオートエンコーダを設計した。さらに, 遺伝子ネットワーク構造を導出するために, タンパク質間相互作用(ppi)ネットワークのnode2vec法を実施し, この構造をクラスタリングプロセスを通じて維持した。提案手法は, 細胞と遺伝子との関係を保ちながら, クラスタリング結果の最適化能力を示す実験により有効であることが実証された。この研究は、正確な細胞亜集団の獲得に寄与し、より実世界の生物学的シナリオによく似たクラスタリング結果を生成する。疾患細胞の特徴と分布に関するより良い洞察を与え、最終的に早期疾患の診断と治療の基礎を構築する。 In recent years, the field of single-cell RNA sequencing has seen a surge in the development of clustering methods. These methods enable the identification of cell subpopulations, thereby facilitating the understanding of tumor microenvironments. Despite their utility, most existing clustering algorithms primarily focus on the attribute information provided by the cell matrix or the network structure between cells, often neglecting the network between genes. This oversight could lead to loss of information and clustering results that lack clinical significance. To address this limitation, we develop an advanced single-cell clustering model incorporating dual-graph alignment, which integrates gene network information into the clustering process based on self-supervised and unsupervised optimization. Specifically, we designed a graph-based autoencoder enhanced by an attention mechanism to effectively capture relationships between cells. Moreover, we performed the node2vec method on Protein-Protein Interaction (PPI) networks to derive the gene network structure and maintained this structure throughout the clustering process. 
Our proposed method has been demonstrated to be effective through experimental results, showcasing its ability to optimize clustering outcomes while preserving the original associations between cells and genes. This research contributes to obtaining accurate cell subpopulations and generates clustering results that more closely resemble real-world biological scenarios. It provides better insights into the characteristics and distribution of diseased cells, ultimately building a foundation for early disease diagnosis and treatment.	翻訳日:2023-12-01 00:08:24 公開日:2023-11-28
# 未知数のクラスタによるコミュニティ検出によるシングルセルマルチビュークラスタリング Single-cell Multi-view Clustering via Community Detection with Unknown Number of Clusters ( http://arxiv.org/abs/2311.17103v1 ) ライセンス: Link先を確認	Dayu Hu, Zhibin Dong, Ke Liang, Jun Wang, Siwei Wang and Xinwang Liu	(参考訳) 単一セルのマルチビュークラスタリングは、異なる視点から同一セル内の細胞不均一性を探索することができる。複数のマルチビュークラスタリング手法の開発にもかかわらず、2つの主要な課題が続いている。第一に、既存のほとんどの方法では、単細胞rna(scrna)と単細胞アッセイの両方のトランスポターゼアクセスクロマチン(scatac)ビューからの情報を等しく重要視しており、2つのビュー間のデータ豊かさの実質的な差を見渡している。この見落としは、しばしば全体的なパフォーマンスの低下につながる。さらに、クラスタリング手法の大部分は、ユーザによるクラスタ数を手動で指定する必要がある。しかし、細胞データを扱う生物学者にとって、異なる種類の細胞を正確に決定することは大きな課題となる。そこで本稿では,クラスタ数を事前に定義することなく,異なるビューからの情報をシームレスに統合する,シングルセルデータに適した革新的なマルチビュークラスタリング手法であるscuncを紹介する。 scUNC法はいくつかのステップで構成されており、最初はクロスビュー融合ネットワークを使用して効果的な埋め込みを作成し、その後コミュニティ検出を通じて初期クラスタを生成する。その後、クラスタがマージされずに自動的にマージされ、最適化される。 3つの異なる単一セルデータセットを用いて,SCUNCの総合評価を行った。結果は、 scUNCが他のベースラインメソッドより優れていることを強調した。 Single-cell multi-view clustering enables the exploration of cellular heterogeneity within the same cell from different views. Despite the development of several multi-view clustering methods, two primary challenges persist. Firstly, most existing methods treat the information from both single-cell RNA (scRNA) and single-cell Assay of Transposase Accessible Chromatin (scATAC) views as equally significant, overlooking the substantial disparity in data richness between the two views. This oversight frequently leads to a degradation in overall performance. Additionally, the majority of clustering methods necessitate manual specification of the number of clusters by users. However, for biologists dealing with cell data, precisely determining the number of distinct cell types poses a formidable challenge. To this end, we introduce scUNC, an innovative multi-view clustering approach tailored for single-cell data, which seamlessly integrates information from different views without the need for a predefined number of clusters. 
The scUNC method comprises several steps: initially, it employs a cross-view fusion network to create an effective embedding, which is then utilized to generate initial clusters via community detection. Subsequently, the clusters are automatically merged and optimized until no further clusters can be merged. We conducted a comprehensive evaluation of scUNC using three distinct single-cell datasets. The results underscored that scUNC outperforms the other baseline methods.	翻訳日:2023-12-01 00:08:02 公開日:2023-11-28
# 半平衡最適輸送を用いたロバスト拡散gan Robust Diffusion GAN using Semi-Unbalanced Optimal Transport ( http://arxiv.org/abs/2311.17101v1 ) ライセンス: Link先を確認	Quan Dao, Binh Ta, Tung Pham and Anh Tran	(参考訳) 拡散モデル(Diffusion model)は、高精細な画像を合成する大きな可能性を示している。 GANと統合することにより、DDGAN \citep{xiao2022DDGAN} のような高度な拡散モデルが、拡張実用的な応用のためにリアルタイム性能にアプローチすることができる。 DDGANは、高品質なサンプルを生成し、異なるデータモードをカバーし、より高速なサンプリングを実現するという、生成モデリングの課題に効果的に対処してきた。本研究は, 半バランス最適輸送に基づくロバストなトレーニング手法を導入し, 異常値の影響を効果的に軽減する。包括的評価により、我々のロバスト拡散GAN(RDGAN)は、前述の生成モデリング基準、すなわち画像品質、分布のモードカバレッジ、推論速度においてバニラDDGANよりも優れており、クリーンかつ破損したデータセットの両方を扱う場合のロバスト性の向上を示す。 Diffusion models, a type of generative model, have demonstrated great potential for synthesizing highly detailed images. By integrating with GAN, advanced diffusion models like DDGAN \citep{xiao2022DDGAN} could approach real-time performance for expansive practical applications. While DDGAN has effectively addressed the challenges of generative modeling, namely producing high-quality samples, covering different data modes, and achieving faster sampling, it remains susceptible to performance drops caused by datasets that are corrupted with outlier samples. This work introduces a robust training technique based on semi-unbalanced optimal transport to mitigate the impact of outliers effectively. Through comprehensive evaluations, we demonstrate that our robust diffusion GAN (RDGAN) outperforms vanilla DDGAN in terms of the aforementioned generative modeling criteria, i.e., image quality, mode coverage of distribution, and inference speed, and exhibits improved robustness when dealing with both clean and corrupted datasets.	翻訳日:2023-12-01 00:07:40 公開日:2023-11-28
# DyRA: スケールロバスト物体検出のための動的分解能調整 DyRA: Dynamic Resolution Adjustment for Scale-robust Object Detection ( http://arxiv.org/abs/2311.17098v1 ) ライセンス: Link先を確認	Daeun Seo, Hoeseok Yang, Hyungshin Kim	(参考訳) 物体検出において,物体の大きさの変動により一定精度を達成することは困難である。この問題の1つの可能な解決策は、マルチレゾリューション戦略として知られる入力解像度を最適化することである。解決を最適化するための従来のアプローチは、しばしば事前定義された解決や動的ニューラルネットワークに基づいているが、既存のアーキテクチャに対する実行時の解決最適化に関する研究は不足している。本稿では,既存の検出器に対する畳み込みとトランスフォーマーエンコーダブロックを含むDyRAと呼ばれる適応分解能スケーリングネットワークを提案する。我々のDyRAは、インスタンス固有のスケーリングを可能にする入力イメージからスケールファクターを返します。このネットワークは、パレートスケールロス(paretoscaleloss)とバランスロス( balanceloss)という特別な設計の損失関数を持つ検出器と共同で訓練されている。 ParetoScaleLossは画像から適応的なスケールファクタを生成し、Ba BalanceLossはデータセットのローカライゼーションパワーに応じてスケールファクタを最適化する。損失関数は、小物体と大物体の対比目的の精度低下を最小限に抑えるように設計されている。 COCO, RetinaNet, Faster-RCNN, FCOS, Mask-RCNNで行った実験は, 解像度調整のみによる多解像度ベースラインよりも1.3%, 1.1%, 1.3%, 0.8%の精度向上を実現した。コードはhttps://github.com/DaEunFullGrace/DyRA.gitで入手できる。 In object detection, achieving constant accuracy is challenging due to the variability of object sizes. One possible solution to this problem is to optimize the input resolution, known as a multi-resolution strategy. Previous approaches for optimizing resolution are often based on pre-defined resolutions or a dynamic neural network, but there is a lack of study for run-time resolution optimization for existing architecture. In this paper, we propose an adaptive resolution scaling network called DyRA, which comprises convolutions and transformer encoder blocks, for existing detectors. Our DyRA returns a scale factor from an input image, which enables instance-specific scaling. This network is jointly trained with detectors with specially designed loss functions, namely ParetoScaleLoss and BalanceLoss. The ParetoScaleLoss produces an adaptive scale factor from the image, while the BalanceLoss optimizes the scale factor according to localization power for the dataset. The loss function is designed to minimize accuracy drop about the contrasting objective of small and large objects. 
Our experiments on COCO with RetinaNet, Faster-RCNN, FCOS, and Mask-RCNN achieved accuracy improvements of 1.3%, 1.1%, 1.3%, and 0.8%, respectively, over a multi-resolution baseline through resolution adjustment alone. The code is available at https://github.com/DaEunFullGrace/DyRA.git.	翻訳日:2023-12-01 00:07:19 公開日:2023-11-28
# ベイジアンネットワークモデルに基づく推論分析による5Gの匿名ジャミング検出 Anonymous Jamming Detection in 5G with Bayesian Network Model Based Inference Analysis ( http://arxiv.org/abs/2311.17097v1 ) ライセンス: Link先を確認	Ying Wang, Shashank Jere, Soumya Banerjee, Lingjia Liu, Sachin Shetty, and Shehadi Dayekh	(参考訳) ジャミングと侵入の検出は5G研究において重要であり、信頼性の維持、ユーザエクスペリエンスの劣化の防止、インフラストラクチャ障害の回避を目的としている。本稿では,プロトコルスタックからの信号パラメータに基づく5Gの匿名ジャミング検出モデルを提案する。このシステムは教師あり学習と教師なし学習を用いて、未知の型を含むジャミングのリアルタイムかつ高精度な検出を行う。教師ありモデルは0.964から1のAUCに達し、LSTMモデルのAUCは0.923から1である。しかし、データアノテーションの必要性は教師ありアプローチを制限する。これを解決するために、教師なしオートエンコーダに基づく異常検出をAUC 0.987で提示する。このアプローチは敵対的な訓練サンプルに耐性がある。透明性とドメイン知識注入のために,ベイジアンネットワークに基づく因果解析を導入する。 Jamming and intrusion detection are critical in 5G research, aiming to maintain reliability, prevent user experience degradation, and avoid infrastructure failure. This paper introduces an anonymous jamming detection model for 5G based on signal parameters from the protocol stacks. The system uses supervised and unsupervised learning for real-time, high-accuracy detection of jamming, including unknown types. Supervised models reach an AUC of 0.964 to 1, compared to LSTM models with an AUC of 0.923 to 1. However, the need for data annotation limits the supervised approach. To address this, an unsupervised auto-encoder-based anomaly detection is presented with an AUC of 0.987. The approach is resistant to adversarial training samples. For transparency and domain knowledge injection, a Bayesian network-based causation analysis is introduced.	翻訳日:2023-12-01 00:06:54 公開日:2023-11-28
# 共同メッセージパッシングとプロトタイプベースソフトラベル伝播によるロバストなトランスダクティブ・ファウショット学習 Robust Transductive Few-shot Learning via Joint Message Passing and Prototype-based Soft-label Propagation ( http://arxiv.org/abs/2311.17096v1 ) ライセンス: Link先を確認	Jiahui Wang, Qin Xu, Bo Jiang, Bin Luo	(参考訳) FSL(Few-shot Learning)は、いくつかのサポートサンプルを使用して新しいクラスに一般化できる学習モデルを開発することを目的としている。トランスダクティブなFSLタスクでは、プロトタイプ学習とラベル伝搬が一般的である。プロトタイプメソッドは通常、サポートセットから代表的なプロトタイプを学習し、クエリサンプルとプロトタイプの間のメトリックに基づいてクエリのラベルを決定する。ラベル伝搬法は,サポートサンプルとクエリサンプルの関係を符号化したグラフ上に,サポートサンプルのラベルを伝搬させようとする。本稿では,これら2つの原則を統合し,Prototype-based Soft-label Propagation (PSLP)と呼ばれる効率的かつ堅牢なFSLアプローチを開発することを目的とする。具体的には,まずプロトタイプを用いて各問合せサンプルのソフトラベル提示を推定する。次に,学習した問合せ支援グラフ上でソフトラベル伝搬を行う。どちらのステップも段階的に行われ、それぞれのパフォーマンスが向上する。さらに,ソフトラベル推定のための効果的なプロトタイプと,ソフトラベル伝搬のための望ましいクエリ支援グラフを学習するために,サンプルの提示と関係グラフを共同で学習するための新しい共同メッセージパッシングスキームを設計する。我々のPSLP法はパラメータフリーであり、非常に効率的に実装できる。提案手法は,4つの一般的なデータセットにおいて,最先端の手法と比較して,バランスの取れた設定と不均衡な設定の両方で競合する結果が得られる。コードは受理後にリリースされます。 Few-shot learning (FSL) aims to develop a learning model with the ability to generalize to new classes using a few support samples. For transductive FSL tasks, prototype learning and label propagation methods are commonly employed. Prototype methods generally first learn the representative prototypes from the support set and then determine the labels of queries based on the metric between query samples and prototypes. Label propagation methods try to propagate the labels of support samples on the constructed graph encoding the relationships between both support and query samples. This paper aims to integrate these two principles together and develop an efficient and robust transductive FSL approach, termed Prototype-based Soft-label Propagation (PSLP). Specifically, we first estimate the soft-label presentation for each query sample by leveraging prototypes. Then, we conduct soft-label propagation on our learned query-support graph. Both steps are conducted progressively to boost their respective performance. 
Moreover, to learn effective prototypes for soft-label estimation as well as the desirable query-support graph for soft-label propagation, we design a new joint message passing scheme to learn sample presentation and relational graph jointly. Our PSLP method is parameter-free and can be implemented very efficiently. On four popular datasets, our method achieves competitive results on both balanced and imbalanced settings compared to the state-of-the-art methods. The code will be released upon acceptance.	翻訳日:2023-12-01 00:06:41 公開日:2023-11-28
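The two steps named in the abstract above (prototype-based soft-label estimation, then soft-label propagation on a graph) can be sketched on toy 2D features. This is a simplified illustration under stated assumptions, not the PSLP method itself: the graph here is a fixed Gaussian affinity over query samples only, whereas the paper learns a query-support graph with joint message passing. All function names, the mixing weight `alpha`, and the toy data are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_prototypes(support, labels, n_classes):
    """Prototype = mean feature vector of each class's support samples."""
    dim = len(support[0])
    protos = []
    for c in range(n_classes):
        members = [x for x, y in zip(support, labels) if y == c]
        protos.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
    return protos


def soft_labels(queries, protos):
    """Soft label = softmax over negative squared distances to the prototypes."""
    return [softmax([-sq_dist(q, p) for p in protos]) for q in queries]


def propagate(labels, queries, alpha=0.5, steps=3):
    """Smooth soft labels over a row-normalized Gaussian affinity graph."""
    n = len(queries)
    affinity = []
    for i in range(n):
        row = [math.exp(-sq_dist(queries[i], queries[j])) if i != j else 0.0
               for j in range(n)]
        total = sum(row) or 1.0
        affinity.append([v / total for v in row])
    current = labels
    for _ in range(steps):
        # Mix each sample's neighbors' labels with its own initial estimate.
        current = [
            [alpha * sum(affinity[i][j] * current[j][k] for j in range(n))
             + (1 - alpha) * labels[i][k]
             for k in range(len(labels[0]))]
            for i in range(n)
        ]
    return current


support = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [2.9, 3.2]]
support_labels = [0, 0, 1, 1]
queries = [[0.1, 0.2], [0.3, 0.0], [3.1, 2.8], [2.8, 3.1]]

initial = soft_labels(queries, class_prototypes(support, support_labels, 2))
refined = propagate(initial, queries)
predictions = [row.index(max(row)) for row in refined]
```

Keeping a share `1 - alpha` of the initial prototype-based estimate in every step prevents the propagated labels from drifting away from the support evidence; the value 0.5 is an arbitrary choice for this sketch.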
# 視覚言語モデルからの開語彙セマンティックセマンティックセグメンテーションのプラグアンドプレイ自由抽出 Plug-and-Play, Dense-Label-Free Extraction of Open-Vocabulary Semantic Segmentation from Vision-Language Models ( http://arxiv.org/abs/2311.17095v1 ) ライセンス: Link先を確認	Luo Jiayun, Siddhesh Khandelwal, Leonid Sigal, Boyang Li	(参考訳) 膨大な量の画像テキストペアから、大規模視覚言語モデル(VLM)は、画像領域と単語を暗黙的に関連付けることを学習する。しかし、そのような事前訓練されたモデルをオープン語彙セマンティックセグメンテーションに活用することは依然として課題である。本稿では,この課題に対してpnp-ovss (plug-and-play open-vocabulary semantic segmentation) を提案する。 PnP-OVSS は VLM を利用して直接テキスト対イメージのクロスアテンションと画像-テキストマッチングの損失を利用してセマンティックセグメンテーションを生成する。しかし、クロスアテンションだけは過剰なセグメントの傾向があり、クロスアテンションプラスGradCAMは低セグメントの傾向にある。この問題を緩和するために、Salience Dropoutを導入し、モデルが最も注意を払っているパッチを反復的にドロップすることで、セグメンテーションマスクの全範囲をよりよく解決する。既存の手法と比較して、提案手法はニューラルネットワークのトレーニングを必要とせず、バリデーションセットであってもセグメンテーションアノテーションを必要とせずにハイパーパラメータチューニングを実行する。 PnP-OVSSは、同等のベースライン(Pascal VOCでは+29.4% mIoU、Pascal Contextでは+13.2% mIoU、MS COCOでは+14.0% mIoU、COCO Stuffでは+2.4% mIoU)を大幅に改善し、事前訓練されたVLM上で追加のネットワークトレーニングを行うベースラインよりも優れている。 From an enormous amount of image-text pairs, large-scale vision-language models (VLMs) learn to implicitly associate image regions with words, which is vital for tasks such as image captioning and visual question answering. However, leveraging such pre-trained models for open-vocabulary semantic segmentation remains a challenge. In this paper, we propose a simple, yet extremely effective, training-free technique, Plug-and-Play Open-Vocabulary Semantic Segmentation (PnP-OVSS) for this task. PnP-OVSS leverages a VLM with direct text-to-image cross-attention and an image-text matching loss to produce semantic segmentation. However, cross-attention alone tends to over-segment, whereas cross-attention plus GradCAM tend to under-segment. To alleviate this issue, we introduce Salience Dropout; by iteratively dropping patches that the model is most attentive to, we are able to better resolve the entire extent of the segmentation mask. 
Compared to existing techniques, the proposed method does not require any neural network training and performs hyperparameter tuning without the need for any segmentation annotations, even for a validation set. PnP-OVSS demonstrates substantial improvements over a comparable baseline (+29.4% mIoU on Pascal VOC, +13.2% mIoU on Pascal Context, +14.0% mIoU on MS COCO, +2.4% mIoU on COCO Stuff) and even outperforms most baselines that conduct additional network training on top of pretrained VLMs.	翻訳日:2023-12-01 00:06:19 公開日:2023-11-28
# ニューラルフィールドトレーニングを加速するデータ変換の探索において In Search of a Data Transformation That Accelerates Neural Field Training ( http://arxiv.org/abs/2311.17094v1 ) ライセンス: Link先を確認	Junwon Seo, Sangyoon Lee, Kwang In Kim, Jaeho Lee	(参考訳) ニューラルネットワークは、与えられた信号を近似するためにニューラルネットワークをトレーニングする、データ表現の新しいパラダイムである。広範に採用されるのを防ぐ重要な障害は、速度生成するニューラルネットワークのエンコーディングには、ニューラルネットワークの過剰フィッティングが必要であり、必要な忠実度レベルに達するためにかなりの数のsgdステップを要す。本稿では,ニューラルネットワークの学習速度に対するデータ変換の影響を考察し,特に画素位置の置換がsgdの収束速度に与える影響に注目した。直観的に言うと、ピクセルの位置をランダムに置換することで、トレーニングをかなり加速することができる。この現象を説明するために、PSNR曲線、損失景観、エラーパターンのレンズによる神経場訓練について検討する。解析の結果、ランダムなピクセル置換は、初期最適化が容易であるが、信号の細部を捉えるのを妨げている。 Neural field is an emerging paradigm in data representation that trains a neural network to approximate the given signal. A key obstacle that prevents its widespread adoption is the encoding speed-generating neural fields requires an overfitting of a neural network, which can take a significant number of SGD steps to reach the desired fidelity level. In this paper, we delve into the impacts of data transformations on the speed of neural field training, specifically focusing on how permuting pixel locations affect the convergence speed of SGD. Counterintuitively, we find that randomly permuting the pixel locations can considerably accelerate the training. To explain this phenomenon, we examine the neural field training through the lens of PSNR curves, loss landscapes, and error patterns. Our analyses suggest that the random pixel permutations remove the easy-to-fit patterns, which facilitate easy optimization in the early stage but hinder capturing fine details of the signal.	翻訳日:2023-12-01 00:05:43 公開日:2023-11-28
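The data transformation studied in the abstract above, a random permutation of pixel locations, is easy to state in code. The sketch below shows only the permutation and its inverse on a flattened toy image; how the permuted signal is fed to the neural field and decoded afterwards is not specified in the abstract, so the helper names and the 4x4 image are assumptions for illustration.

```python
import random


def permute_pixels(pixels, seed=0):
    """Randomly reassign pixel values to locations; returns (permuted, perm)."""
    rng = random.Random(seed)
    perm = list(range(len(pixels)))
    rng.shuffle(perm)
    # Location i of the permuted signal holds the value from location perm[i].
    permuted = [pixels[src] for src in perm]
    return permuted, perm


def unpermute_pixels(permuted, perm):
    """Invert the permutation to recover the original pixel layout."""
    restored = [None] * len(permuted)
    for i, src in enumerate(perm):
        restored[src] = permuted[i]
    return restored


# Flattened 4x4 toy "image" whose values encode their own location.
image = [row * 4 + col for row in range(4) for col in range(4)]
shuffled, perm = permute_pixels(image, seed=42)
recovered = unpermute_pixels(shuffled, perm)
```

Because the permutation is stored, the transformation is lossless: a network fitted to the shuffled signal can be queried at every location and the output mapped back with `unpermute_pixels`.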
# 基礎モデルを用いたprototypepical semi-supervised learningの改良:プロトタイプ選択、パラメトリックvmf-sneプリトレーニング、マルチビュー擬似ラベリング Improved Prototypical Semi-Supervised Learning with Foundation Models: Prototype Selection, Parametric vMF-SNE Pretraining and Multi-view Pseudolabelling ( http://arxiv.org/abs/2311.17093v1 ) ライセンス: Link先を確認	Evelyn Mannix and Howard Bondell	(参考訳) 本稿では,ニューラルネットワークのバックボーンとして凍結基盤モデルを活用するという文脈において,コンピュータビジョンのためのプロトタイプ半教師付き学習の改良手法を提案する。一般的なツールとして、局所構造を保存する高次元潜在空間間のニューラルネットワークによるマッピングを作成するために、パラメトリックなvon-Mises Fisher Stochastic Neighbour Embedding (vMF-SNE)を提案する。これにより、vMF-SNEによる基礎モデルの高品質な埋め込みを用いて、ネットワークのプロジェクションヘッドを事前訓練することができる。また,複数のビューにまたがる予測を組み合わせることで,一貫性や代入のアプローチと比較して,より信頼性の高い監督信号を提供するソフトマルチビュー擬似ラベルを提案する。これらの考え方は,現在の最先端の半教師付き学習手法であるPAWS (View-Assignments with Support Samples) や,さまざまなベンチマークデータセットを用いたRobost PAWS (RoPAWS) によって改善されている。この文脈で他の教師なしラベル選択アプローチよりも優れたパフォーマンスを提供するテクニックである、単純な$k$-meansプロトタイプセレクションも導入しています。これらの変更は、CIFAR-10では平均+2.9%、CIFAR-100では+5.7%、DeepWeedsでは+15.2%改善されている。また, CIFAR-10 95.8% (+0.7%) と CIFAR-100 76.6% (+12.0%) について, 半教師付き学習の新たな成果を得た。 In this paper we present an improved approach to prototypical semi-supervised learning for computer vision, in the context of leveraging a frozen foundation model as the backbone of our neural network. As a general tool, we propose parametric von-Mises Fisher Stochastic Neighbour Embedding (vMF-SNE) to create mappings with neural networks between high-dimensional latent spaces that preserve local structure. This enables us to pretrain the projection head of our network using the high-quality embeddings of the foundation model with vMF-SNE. We also propose soft multi-view pseudolabels, where predictions across multiple views are combined to provide a more reliable supervision signal compared to a consistency or swapped assignment approach. 
We demonstrate that these ideas improve upon Predicting View-Assignments with Support Samples (PAWS), a current state-of-the-art semi-supervised learning method, as well as Robust PAWS (RoPAWS), over a range of benchmarking datasets. We also introduce simple $k$-means prototype selection, a technique that provides superior performance to other unsupervised label selection approaches in this context. These changes improve upon PAWS by an average of +2.9% for CIFAR-10 and +5.7% for CIFAR-100 with four labels per class, and by +15.2% for DeepWeeds, a particularly challenging dataset for semi-supervised learning. We also achieve new state-of-the-art results in semi-supervised learning in this small label regime for CIFAR-10 - 95.8% (+0.7%) and CIFAR-100 - 76.6% (+12.0%).	翻訳日:2023-12-01 00:05:26 公開日:2023-11-28
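The abstract above describes soft multi-view pseudolabels as predictions combined across multiple views. One simple way to combine them, assumed here for illustration because the abstract does not give the exact rule, is to average the per-view class probabilities. The function name and the three example views are hypothetical.

```python
def soft_multiview_pseudolabel(view_probs):
    """Average class-probability vectors predicted for several views of one image."""
    n_views = len(view_probs)
    n_classes = len(view_probs[0])
    return [sum(v[k] for v in view_probs) / n_views for k in range(n_classes)]


# Class probabilities predicted for three augmented views of the same image.
views = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.1, 0.3],
]
pseudolabel = soft_multiview_pseudolabel(views)
top_class = pseudolabel.index(max(pseudolabel))
```

The averaged vector stays a valid probability distribution and damps the disagreement of any single view, which is the intuition behind a more reliable supervision signal.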
# SEED-Bench-2:マルチモーダル大規模言語モデルのベンチマーク SEED-Bench-2: Benchmarking Multimodal Large Language Models ( http://arxiv.org/abs/2311.17092v1 ) ライセンス: Link先を確認	Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan	(参考訳) 強力な大規模言語モデル(LLM)を基盤として構築されたマルチモーダル大規模言語モデル(MLLM)は、最近、インターリーブされたマルチモーダル入力からテキストだけでなく画像も生成できる(GPT-4VとDALL-E 3の組合せのように振る舞う)という優れた能力を示している。しかし、既存のMLLMベンチマークは、単一の画像テキスト入力に対するモデルの理解能力のみを評価することに限定されており、MLLMの進歩に追いついていない。包括的なベンチマークは、現在のMLLMの進歩を調査し、その限界を明らかにするために不可欠である。本研究では,MLLMの能力を,受容・生成可能なモダリティに基づいて$L_0$から$L_4$までの階層レベルに分類し,MLLMの階層的な能力を評価する総合ベンチマークSEED-Bench-2を提案する。具体的には、SEED-Bench-2は、正確な人間のアノテーションが付いた24Kの多肢選択問題で構成され、テキスト生成と画像生成の両方の評価を含む27の次元にわたる。人間のアノテーションに基づく正解選択肢を持つ多肢選択問題により、モデル性能の客観的かつ効率的な評価が可能になり、評価中に人間やGPTの介入が不要になる。さらに,23個の著名なオープンソースMLLMの性能を評価し,貴重な観察結果を要約した。広範な評価を通じて既存のMLLMの限界を明らかにすることにより、SEED-Bench-2が汎用人工知能の目標に向けた将来の研究の動機となる洞察を提供することを目指している。データセットと評価コードは https://github.com/AILab-CVC/SEED-Bench で公開されている。 Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal inputs (acting like a combination of GPT-4V and DALL-E 3). However, existing MLLM benchmarks remain limited to assessing only models' comprehension ability of single image-text inputs, failing to keep up with the strides made in MLLMs. A comprehensive benchmark is imperative for investigating the progress and uncovering the limitations of current MLLMs. In this work, we categorize the capabilities of MLLMs into hierarchical levels from $L_0$ to $L_4$ based on the modalities they can accept and generate, and propose SEED-Bench-2, a comprehensive benchmark that evaluates the hierarchical capabilities of MLLMs. 
Specifically, SEED-Bench-2 comprises 24K multiple-choice questions with accurate human annotations, which span 27 dimensions, including the evaluation of both text and image generation. Multiple-choice questions with ground-truth options derived from human annotation enable an objective and efficient assessment of model performance, eliminating the need for human or GPT intervention during evaluation. We further evaluate the performance of 23 prominent open-source MLLMs and summarize valuable observations. By revealing the limitations of existing MLLMs through extensive evaluations, we aim for SEED-Bench-2 to provide insights that will motivate future research towards the goal of General Artificial Intelligence. Dataset and evaluation code are available at https://github.com/AILab-CVC/SEED-Bench.	翻訳日:2023-12-01 00:04:48 公開日:2023-11-28
# ソレ強度を超えて: 一般化ビジョンランゲージモデルのためのカスタマイズアンサンブル Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models ( http://arxiv.org/abs/2311.17091v1 ) ライセンス: Link先を確認	Zhihe Lu, Jiawang Bai, Xin Li, Zeyu Xiao, Xinchao Wang	(参考訳) オープンワールドの一般化のためのクリップなど、事前学習された視覚言語モデル(vlms)は、実用的価値のために人気が高まっている。しかし、単一のモデルの複雑なアルゴリズム設計のみに依存する場合、性能向上は制限され、clip-vit-b/16のような強力な性能を示す場合さえある。本稿では,より弱いvlmを用いてロバストな単一モデルの一般化を促進する,協調的な可能性について初めて検討する。肯定的な結果は,新しい視点,すなわち事前学習されたvlmのアンサンブルから一般化問題に取り組む動機となった。それぞれが特定のシナリオに合わせてカスタマイズされた3つのアンサンブル戦略を導入する。まずゼロショットアンサンブルを導入し,事前学習されたvlmのみ利用可能である場合の信頼性に基づいて,異なるモデルのロジットを自動的に調整する。さらに,余分なサンプルを持つシナリオに対しては,コンピュータリソースの可用性に基づいた柔軟性を提供するトレーニングフリーおよびチューニングアンサンブルを提案する。提案するアンサンブル戦略はゼロショット,ベース・ツー・ニュー,クロスデータセットの一般化によって評価され,新たな最先端性能を実現する。特に、本研究は、アンサンブルによるVLMの一般化性能向上に向けた最初の一歩である。コードはhttps://github.com/zhihelu/ensemble_vlm.gitで入手できる。 Fine-tuning pre-trained vision-language models (VLMs), e.g., CLIP, for the open-world generalization has gained increasing popularity due to its practical value. However, performance advancements are limited when relying solely on intricate algorithmic designs for a single model, even one exhibiting strong performance, e.g., CLIP-ViT-B/16. This paper, for the first time, explores the collaborative potential of leveraging much weaker VLMs to enhance the generalization of a robust single model. The affirmative findings motivate us to address the generalization problem from a novel perspective, i.e., ensemble of pre-trained VLMs. We introduce three customized ensemble strategies, each tailored to one specific scenario. Firstly, we introduce the zero-shot ensemble, automatically adjusting the logits of different models based on their confidence when only pre-trained VLMs are available. Furthermore, for scenarios with extra few-shot samples, we propose the training-free and tuning ensemble, offering flexibility based on the availability of computing resources. 
The proposed ensemble strategies are evaluated on zero-shot, base-to-new, and cross-dataset generalization, achieving new state-of-the-art performance. Notably, this work represents an initial stride toward enhancing the generalization performance of VLMs via ensemble. The code is available at https://github.com/zhiheLu/Ensemble_VLM.git.	翻訳日:2023-12-01 00:04:16 公開日:2023-11-28
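The zero-shot ensemble in the abstract above adjusts the logits of different models according to their confidence. A minimal sketch of that idea follows, assuming that confidence is measured as each model's maximum softmax probability and that weights are normalized to sum to one; the abstract does not state the exact weighting rule, and the two logit vectors are made-up examples rather than outputs of real VLMs.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_weighted_ensemble(per_model_logits):
    """Combine logits, weighting each model by its max softmax probability."""
    confidences = [max(softmax(logits)) for logits in per_model_logits]
    total = sum(confidences)
    weights = [c / total for c in confidences]
    n_classes = len(per_model_logits[0])
    combined = [
        sum(w * logits[k] for w, logits in zip(weights, per_model_logits))
        for k in range(n_classes)
    ]
    return combined, weights


# Hypothetical logits for one image from a confident and an unsure model.
confident_model = [4.0, 0.5, 0.1]
unsure_model = [0.9, 1.0, 0.8]

combined_logits, model_weights = confidence_weighted_ensemble(
    [confident_model, unsure_model]
)
predicted_class = combined_logits.index(max(combined_logits))
```

No training or labeled data is involved, which matches the setting where only the pre-trained models are available; the few-shot variants mentioned in the abstract would need extra samples and are not covered by this sketch.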
# アンチエイリアスレンダリングのためのマルチスケール3次元ガウススプラッティング Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering ( http://arxiv.org/abs/2311.17089v1 ) ライセンス: Link先を確認	Zhiwen Yan, Weng Fei Low, Yu Chen, Gim Hee Lee	(参考訳) 3次元ガウスは最近、3次元再構成とレンダリングのための非常に効率的な表現として登場した。高解像度では高いレンダリング品質と速度を示すが、低解像度または遠方のカメラ位置でレンダリングすると、どちらも大幅に劣化する。低解像度または遠距離のレンダリングでは、画像の画素サイズがスプラットされた各3次元ガウスの画面サイズに対してナイキスト周波数を下回り、エイリアシングが生じる。また、1ピクセルあたりにスプラットされるガウスが増え、それらを逐次アルファブレンドするため、レンダリングも大幅に遅くなる。これらの問題に対処するため,我々は,異なるスケールのガウスを保持して同じシーンを表現するマルチスケール3次元ガウススプラッティングアルゴリズムを提案する。高解像度画像はより多くの小さなガウスでレンダリングされ、低解像度画像はより少数の大きなガウスでレンダリングされる。同程度のトレーニング時間で,本アルゴリズムは,Mip-NeRF360データセット上の4$\times$-128$\times$スケールのレンダリングにおいて,単一スケールの3次元ガウススプラッティングに比べて13\%-66\%のPSNR向上と160\%-2400\%のレンダリング速度向上を達成できる。 3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite their high rendering quality and speed at high resolutions, both deteriorate drastically when rendering at lower resolutions or from a faraway camera position. During low-resolution or faraway rendering, the pixel size of the image can fall below the Nyquist frequency compared to the screen size of each splatted 3D Gaussian, which leads to aliasing. The rendering is also drastically slowed down by the sequential alpha blending of more splatted Gaussians per pixel. To address these issues, we propose a multi-scale 3D Gaussian splatting algorithm, which maintains Gaussians at different scales to represent the same scene. Higher-resolution images are rendered with more small Gaussians, and lower-resolution images are rendered with fewer larger Gaussians. With similar training time, our algorithm can achieve 13\%-66\% PSNR and 160\%-2400\% rendering speed improvement at 4$\times$-128$\times$ scale rendering on the Mip-NeRF360 dataset compared to single-scale 3D Gaussian splatting.	翻訳日:2023-12-01 00:03:53 公開日:2023-11-28
# 空中物体検出を改善するフィードバックRoI機能 Feedback RoI Features Improve Aerial Object Detection ( http://arxiv.org/abs/2311.17129v1 ) ライセンス: Link先を確認	Botao Ren, Botian Xu, Tengyu Liu, Jingyi Wang, Zhidong Deng	(参考訳) 神経科学の研究では、人間の視覚システムは高レベルのフィードバック情報を利用して低レベルの知覚を誘導し、異なる特性の信号に適応できることが示されている。そこで我々は,オブジェクト検出のための同様の機構を組み込むために,フィードバックマルチレベル機能エクストラクタ(Flex)を提案する。 Flexは、画像品質の変化と分類の不確実性に応じて、画像ワイドおよびインスタンスレベルのフィードバック情報に基づいて特徴選択を洗練する。実験結果からFlexは、DOTA-v1.0、DOTA-v1.5、HRSC2016などの難易度の高いオブジェクト検出データセットに対して、既存のSOTAメソッドに一貫した改善を提供することがわかった。この設計は空中画像検出に起源があるが、MS COCOのさらなる実験により、一般的な検出モデルにおける我々のモジュールの有効性が明らかになる。定量的および質的な分析は、改善が画像の品質と密接に関連していることを示している。 Neuroscience studies have shown that the human visual system utilizes high-level feedback information to guide lower-level perception, enabling adaptation to signals of different characteristics. In light of this, we propose Feedback multi-Level feature Extractor (Flex) to incorporate a similar mechanism for object detection. Flex refines feature selection based on image-wise and instance-level feedback information in response to image quality variation and classification uncertainty. Experimental results show that Flex offers consistent improvement to a range of existing SOTA methods on the challenging aerial object detection datasets including DOTA-v1.0, DOTA-v1.5, and HRSC2016. Although the design originates in aerial image detection, further experiments on MS COCO also reveal our module's efficacy in general detection models. Quantitative and qualitative analyses indicate that the improvements are closely related to image qualities, which match our motivation.	翻訳日:2023-11-30 23:56:24 公開日:2023-11-28
# 逆攻撃に対する変圧器を用いた光文字認識の脆弱性解析 Vulnerability Analysis of Transformer-based Optical Character Recognition to Adversarial Attacks ( http://arxiv.org/abs/2311.17128v1 ) ライセンス: Link先を確認	Lucas Beerens and Desmond J. Higham	(参考訳) 光文字認識(OCR)の最近の進歩は、トランスフォーマーモデルによって駆動されている。 ocrシステムは多数の高リスクドメインにおいて極めて重要であるが、敵への攻撃に対する脆弱性は大部分が未解決の領域であり、セキュリティと新たなai規制への準拠に関する懸念が高まっている。本稿では,Transformer-based OCR(TrOCR)モデルのレジリエンスを評価するための新しいフレームワークを提案する。我々は,標的攻撃と非標的攻撃の両方に対するアルゴリズムを開発し,評価する。未ターゲットの場合、キャラクタエラー率(CER)を測定し、対象の場合、成功率を使用します。 TrOCRは標的外攻撃に対して非常に脆弱であり、標的攻撃に対して若干脆弱でないことが判明した。ベンチマーク手書きデータセットでは、標的外攻撃は目に見えることなく1以上のCERを引き起こす可能性がある。同じような摂動サイズで、ターゲット攻撃は成功率約25\%$ -- ここで私たちは単一のトークンを攻撃し、大きな語彙から10番目の可能性を持つトークンを出力することをtrocrに要求しました。 Recent advancements in Optical Character Recognition (OCR) have been driven by transformer-based models. OCR systems are critical in numerous high-stakes domains, yet their vulnerability to adversarial attack remains largely uncharted territory, raising concerns about security and compliance with emerging AI regulations. In this work we present a novel framework to assess the resilience of Transformer-based OCR (TrOCR) models. We develop and assess algorithms for both targeted and untargeted attacks. For the untargeted case, we measure the Character Error Rate (CER), while for the targeted case we use the success ratio. We find that TrOCR is highly vulnerable to untargeted attacks and somewhat less vulnerable to targeted attacks. On a benchmark handwriting data set, untargeted attacks can cause a CER of more than 1 without being noticeable to the eye. With a similar perturbation size, targeted attacks can lead to success rates of around $25\%$ -- here we attacked single tokens, requiring TrOCR to output the tenth most likely token from a large vocabulary.	翻訳日:2023-11-30 23:56:09 公開日:2023-11-28
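A CER above 1 can look impossible at first. The following is a minimal sketch, assuming the usual definition of CER as the Levenshtein edit distance divided by the reference length; the abstract does not spell out the formula, and the function names `edit_distance` and `cer` are illustrative, not taken from the paper. Because inserted characters count as errors, a hypothesis much longer than the reference pushes the ratio past 1.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate, assuming CER = edit distance / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    print(cer("cat", "cat"))      # identical strings give 0.0
    print(cer("cat", "xxxxxxx"))  # 3 substitutions + 4 insertions over 3 characters
```

Under this assumed definition, a perturbed image that makes the model emit a long run of unrelated characters is enough to yield a CER greater than 1.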
# Reason out your Layout: テキストから画像への合成のための大規模言語モデルからLayout Masterを呼び出す Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis ( http://arxiv.org/abs/2311.17126v1 ) ライセンス: Link先を確認	Xiaohui Chen, Yongfei Liu, Yingxiang Yang, Jianbo Yuan, Quanzeng You, Li-Ping Liu, Hongxia Yang	(参考訳) テキスト・ツー・イメージ(T2I)生成モデルの最近の進歩は、テキスト・プロンプトに基づく多種多様な想像的視覚を創出する顕著な能力を示している。進歩にもかかわらず、これらの拡散モデルは、時々テキストから画像への意味的内容の変換に苦労する。レイアウトの条件付けはT2I拡散モデルの合成能力を向上させるのに有効であるが、通常は手動レイアウト入力を必要とする。本研究では,レイアウト生成器としてLarge Language Models (LLM) を用いたT2I拡散モデルの改良手法を提案する。テキストの解釈や空間的に合理的なオブジェクトレイアウトを生成するために, LLM のChain-of-Thought プロンプトを利用する。生成されたレイアウトは、生成された画像の構成と空間的精度を高めるために使用される。さらに,レイアウト情報を安定拡散モデルに明示的に統合するクロスアテンション機構に基づく効率的なアダプタを提案する。実験では画像品質とレイアウト精度が大幅に向上し,生成画像モデルの強化におけるllmの可能性を示した。 Recent advancements in text-to-image (T2I) generative models have shown remarkable capabilities in producing diverse and imaginative visuals based on text prompts. Despite the advancement, these diffusion models sometimes struggle to translate the semantic content from the text into images entirely. While conditioning on the layout has shown to be effective in improving the compositional ability of T2I diffusion models, they typically require manual layout input. In this work, we introduce a novel approach to improving T2I diffusion models using Large Language Models (LLMs) as layout generators. Our method leverages the Chain-of-Thought prompting of LLMs to interpret text and generate spatially reasonable object layouts. The generated layout is then used to enhance the generated images' composition and spatial accuracy. Moreover, we propose an efficient adapter based on a cross-attention mechanism, which explicitly integrates the layout information into the stable diffusion models. Our experiments demonstrate significant improvements in image quality and layout accuracy, showcasing the potential of LLMs in augmenting generative image models.	翻訳日:2023-11-30 23:55:53 公開日:2023-11-28
# 知識駆動型AutoMLアーキテクチャ A knowledge-driven AutoML architecture ( http://arxiv.org/abs/2311.17124v1 ) ライセンス: Link先を確認	Corneliu Cofaru and Johan Loeckx	(参考訳) 本稿では,パイプラインと深い特徴合成のための知識駆動型AutoMLアーキテクチャを提案する。主な目標は、AutoMLプロセスを説明可能とし、パイプラインと特徴の合成にドメイン知識を活用することである。本アーキテクチャはいくつかの新しいアイデアを探求する。まず、パイプラインと深い特徴の構築は、統一された方法でアプローチされる。次に、合成は共有知識システムによって駆動され、どのパイプライン操作を使用するか、どの特徴を計算するかについて対話的に問い合わせが行われる。最後に、合成プロセスは、部分解とそれをデータに適用した結果を用いて実行時に意思決定を行う。提案アーキテクチャのnaïve実装の機能を実証し,その利点,トレードオフ,さらにはAutoMLの将来の可能性について議論するために,2つの実験を行った。 This paper proposes a knowledge-driven AutoML architecture for pipeline and deep feature synthesis. The main goal is to render the AutoML process explainable and to leverage domain knowledge in the synthesis of pipelines and features. The architecture explores several novel ideas: first, the construction of pipelines and deep features is approached in a unified way. Next, synthesis is driven by a shared knowledge system, interactively queried as to what pipeline operations to use or features to compute. Lastly, the synthesis process makes decisions at runtime using partial solutions and the results of their application on data. Two experiments are conducted to demonstrate the functionality of a naïve implementation of the proposed architecture and to discuss its advantages, trade-offs as well as future potential for AutoML.	翻訳日:2023-11-30 23:55:33 公開日:2023-11-28
# ConTex-Human: テクスチュア一貫性合成による単一画像からの人間の自由視点レンダリング ConTex-Human: Free-View Rendering of Human from a Single Image with Texture-Consistent Synthesis ( http://arxiv.org/abs/2311.17123v1 ) ライセンス: Link先を確認	Xiangjun Gao, Xiaoyu Li, Chaopeng Zhang, Qi Zhang, Yanpei Cao, Ying Shan, Long Quan	(参考訳) 本研究では,1つの画像から3次元人間を自由視点でレンダリングする課題を解決する手法を提案する。既存のアプローチでは、一般化可能なピクセル配列の暗黙のフィールドを使用して人間のテクスチャメッシュを再構築したり、2次元拡散モデルをスコア蒸留サンプリング法(SDS)のガイダンスとして使用して、2次元画像を3次元空間に持ち上げることでこれを実現できる。しかし、一般化可能な暗黙の場は、しばしばスムーズなテクスチャフィールドをもたらすが、SDS法は入力画像とのテクスチャに一貫性のない新しいビューをもたらす傾向がある。本稿では,テクスチャに一貫性のあるバックビュー合成モジュールについて紹介する。さらに, 側領域に生じる色歪みを緩和するために, 合成バックビューテクスチャと組み合わせたテクスチャマッピングと精細化のための可視性を考慮したパッチ一貫性の規則化を提案する。以上の技術により、1つの画像から高忠実度かつテクスチャに一貫性のある人間のレンダリングを実現することができる。実データと合成データの両方で行った実験は,本手法の有効性を示し,本手法が従来のベースライン法より優れていることを示す。 In this work, we propose a method to address the challenge of rendering a 3D human from a single image in a free-view manner. Some existing approaches could achieve this by using generalizable pixel-aligned implicit fields to reconstruct a textured mesh of a human or by employing a 2D diffusion model as guidance with the Score Distillation Sampling (SDS) method, to lift the 2D image into 3D space. However, a generalizable implicit field often results in an over-smooth texture field, while the SDS method tends to lead to a texture-inconsistent novel view with the input image. In this paper, we introduce a texture-consistent back view synthesis module that could transfer the reference image content to the back view through depth and text-guided attention injection. Moreover, to alleviate the color distortion that occurs in the side region, we propose a visibility-aware patch consistency regularization for texture mapping and refinement combined with the synthesized back view texture. With the above techniques, we could achieve high-fidelity and texture-consistent human rendering from a single image. 
Experiments conducted on both real and synthetic data demonstrate the effectiveness of our method and show that our approach outperforms previous baseline methods.	翻訳日:2023-11-30 23:55:22 公開日:2023-11-28
# 大規模モデルに基づくカモフラージュ物体検出 Large Model Based Referring Camouflaged Object Detection ( http://arxiv.org/abs/2311.17122v1 ) ライセンス: Link先を確認	Shupeng Cheng, Ge-Peng Ji, Pengda Qin, Deng-Ping Fan, Bowen Zhou, Peng Xu	(参考訳) camouflaged object detection(ref-cod)は、テキストや視覚的参照とマッチする特定のcamouflaged objectsをセグメント化する、最近提案された問題である。この課題には、CODドメイン固有の認識とマルチモーダル参照イメージアライメントという2つの大きな課題がある。我々のモチベーションは、最近のMLLM(Multimodal Large Language Models)のセマンティックインテリジェンスと本質的な知識をフル活用して、この複雑なタスクを人間的な方法で分解することである。言語は高度に凝縮され帰納的であるため、言語表現は人間の知識学習の主要なメディアであり、知識情報の伝達は単純さから複雑さへの多段階的な進歩に続く。本稿では,mllmからの多レベル知識記述を整理し,カモフラージュ目標とカモフラージュシーンを知覚し,さらにテクスト参照をカモフラージュ画像と深く関連付ける,大規模セグメンテーションのビジョンモデルを導出する,ref-cod用多レベル知識誘導マルチモーダル手法を提案する。 1) MLLMの知識がRef-CODとCODのために研究されたのは今回が初めてです。 2) MLLMの知識を統合することにより,ターゲットとシーンを知覚する2つの主要な視点にRef-CODを分解し,多段階の知識誘導手法を提案する。 (3)提案手法はRef-CODベンチマークの最先端性を達成し,多くの競争相手に勝る結果となった。さらに、注入された豊富な知識のおかげで、ユニモーダルCODデータセット上でゼロショットの一般化能力を示す。私たちはすぐにコードをリリースします。 Referring camouflaged object detection (Ref-COD) is a recently-proposed problem aiming to segment out specified camouflaged objects matched with a textual or visual reference. This task involves two major challenges: the COD domain-specific perception and multimodal reference-image alignment. Our motivation is to make full use of the semantic intelligence and intrinsic knowledge of recent Multimodal Large Language Models (MLLMs) to decompose this complex task in a human-like way. As language is highly condensed and inductive, linguistic expression is the main media of human knowledge learning, and the transmission of knowledge information follows a multi-level progression from simplicity to complexity. 
In this paper, we propose a large-model-based Multi-Level Knowledge-Guided multimodal method for Ref-COD termed MLKG, where multi-level knowledge descriptions from MLLM are organized to guide the large vision model of segmentation to perceive the camouflage-targets and camouflage-scene progressively and meanwhile deeply align the textual references with camouflaged photos. To our knowledge, our contributions mainly include: (1) This is the first time that the MLLM knowledge is studied for Ref-COD and COD. (2) We, for the first time, propose decomposing Ref-COD into two main perspectives of perceiving the target and scene by integrating MLLM knowledge, and contribute a multi-level knowledge-guided method. (3) Our method achieves the state-of-the-art on the Ref-COD benchmark outperforming numerous strong competitors. Moreover, thanks to the injected rich knowledge, it demonstrates zero-shot generalization ability on uni-modal COD datasets. We will release our code soon.	翻訳日:2023-11-30 23:54:56 公開日:2023-11-28
# 生成データ拡張によるスクリブル制御セマンティックセマンティックセグメンテーションの改善 Generative Data Augmentation Improves Scribble-supervised Semantic Segmentation ( http://arxiv.org/abs/2311.17121v1 ) ライセンス: Link先を確認	Jacob Schnell, Jieke Wang, Lu Qi, Vincent Tao Hu, Meng Tang	(参考訳) 拡散モデルなどの生成モデルの最近の進歩により、高品質な合成画像の生成が広く利用できるようになった。先行研究では、合成画像のトレーニングは、画像分類、オブジェクト検出、意味セグメンテーションといった多くの知覚タスクを改善することが示されている。我々は,スクリブル教師付きセマンティックセグメンテーションのための生成データ拡張を初めて検討した。セマンティックスクリブルに条件付き制御ネット拡散モデルを利用して高品質なトレーニングデータを生成する生成データ拡張手法を提案する。しかし、生成データ拡張の素早い実装は、その改善よりも下流セグメンタの性能を必然的に損なう可能性がある。クラス一貫性を強制するために分類器フリーの拡散ガイダンスを利用し、データの多様性をデータリアリズムと引き換えにエンコード比を導入する。指導尺度と符号化率を用いて、高品質なトレーニング画像のスペクトルを生成することができる。我々は,複数の拡張スキームを提案し,これらのスキームがモデル性能,特に低データ環境に大きな影響を与えることを見出した。さらに,本フレームワークは,スクリブル教師付きセグメンテーションと完全教師付きセグメンテーションのギャップを小さくする。また、このフレームワークは、小さなデータセットのセグメンテーション性能を大幅に改善し、完全な教師付きセグメンテーションを上回ることさえも示しています。 Recent advances in generative models, such as diffusion models, have made generating high-quality synthetic images widely accessible. Prior works have shown that training on synthetic images improves many perception tasks, such as image classification, object detection, and semantic segmentation. We are the first to explore generative data augmentations for scribble-supervised semantic segmentation. We propose a generative data augmentation method that leverages a ControlNet diffusion model conditioned on semantic scribbles to produce high-quality training data. However, naive implementations of generative data augmentations may inadvertently harm the performance of the downstream segmentor rather than improve it. We leverage classifier-free diffusion guidance to enforce class consistency and introduce encode ratios to trade off data diversity for data realism. Using the guidance scale and encode ratio, we are able to generate a spectrum of high-quality training images. We propose multiple augmentation schemes and find that these schemes significantly impact model performance, especially in the low-data regime. 
Our framework further reduces the gap between the performance of scribble-supervised segmentation and that of fully-supervised segmentation. We also show that our framework significantly improves segmentation performance on small datasets, even surpassing fully-supervised segmentation.	翻訳日:2023-11-30 23:54:24 公開日:2023-11-28
# ニューラル暗黙表現における単眼カメラの連続ポーズ Continuous Pose for Monocular Cameras in Neural Implicit Representation ( http://arxiv.org/abs/2311.17119v1 ) ライセンス: Link先を確認	Qi Ma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool	(参考訳) 本稿では,単眼カメラのポーズを時間の連続関数として最適化することの有効性を示す。カメラポーズは、所定の時刻を対応するカメラポーズにマッピングする暗黙のニューラル関数を使用して表現される。マッピングされたカメラポーズは、カメラポーズの同時最適化も必要とする下流タスクに使用される。その際、暗黙的にカメラポーズを表すネットワークパラメータが最適化される。提案手法を,(1)ノイズのあるポーズからのNeRF,(2)非同期イベントからのNeRF,(3)視覚的同時自己位置推定・地図作成(vSLAM),(4)IMUを用いたvSLAMの4つの異なる実験設定に適用する。これら4つの設定すべてにおいて,提案手法は比較したベースラインや最先端手法よりも大幅に優れた性能を示す。さらに、連続運動の仮定を用いると、ポーズの変化は実際には6未満の自由度(DOF)を持つ多様体上に存在しうることも分かる。我々はこの低DOF動作表現を「intrinsic motion」と呼び、vSLAM設定でこのアプローチを使用し、優れたカメラ追跡性能を示した。 In this paper, we showcase the effectiveness of optimizing monocular camera poses as a continuous function of time. The camera poses are represented using an implicit neural function which maps the given time to the corresponding camera pose. The mapped camera poses are then used for the downstream tasks where joint camera pose optimization is also required. While doing so, the network parameters -- that implicitly represent camera poses -- are optimized. We exploit the proposed method in four diverse experimental settings, namely, (1) NeRF from noisy poses; (2) NeRF from asynchronous Events; (3) Visual Simultaneous Localization and Mapping (vSLAM); and (4) vSLAM with IMUs. In all four settings, the proposed method performs significantly better than the compared baselines and the state-of-the-art methods. Additionally, using the assumption of continuous motion, we also observe that changes in pose may actually live on a manifold with fewer than 6 degrees of freedom (DOF). We call this low-DOF motion representation the "intrinsic motion" and use the approach in vSLAM settings, showing impressive camera tracking performance.	翻訳日:2023-11-30 23:54:02 公開日:2023-11-28
# adafocus: ロングビデオアクション理解のためのエンド・ツー・エンドの弱い教師付き学習に向けて AdaFocus: Towards End-to-end Weakly Supervised Learning for Long-Video Action Understanding ( http://arxiv.org/abs/2311.17118v1 ) ライセンス: Link先を確認	Jiaming Zhou, Hanjun Li, Kun-Yu Lin, Junwei Liang	(参考訳) 長時間ビデオのアクション理解タスクのためのエンドツーエンドモデルの開発は、計算とメモリに大きな課題をもたらす。既存の作業は、通常、オフザシェルフアクション認識モデルによって抽出された長ビデオ機能のモデルを構築し、異なるドメインのショートビデオデータセットでトレーニングされ、抽出された機能はドメインの相違を被る。これを避けるために、アクション認識モデルは、長いビデオからトリミングされ、アクションインターバルアノテーションを使用してラベル付けされるクリップでエンドツーエンドにトレーニングすることができる。このような完全に管理されたアノテーションは収集に費用がかかる。したがって, 大規模ビデオの動作理解には, 弱い教師付き手法が必要となる。弱い監督設定では、アクションクリップの開始時刻と終了時刻を正確に指定することなく、ビデオ全体に対してアクションラベルを提供する。そこで我々は,AdaFocusフレームワークを提案する。 AdaFocusは、アクションのスパイクアクション性と時間的位置を推定し、正確なアノテーションを必要とせずに、より良いトレーニングを容易にするアクションクリップに適応的にフォーカスすることができる。 3つの長ビデオデータセットの実験は、その有効性を示している。驚くべきことに、2つのデータセットで、弱い監督下でadafocusでトレーニングされたモデルは、完全な監督の下でトレーニングされたモデルよりも優れています。さらに, adafocus を用いた弱い教師付き特徴抽出パイプラインを構築し, 3つの長ビデオアクション理解タスクにおいて大幅な改善が可能となった。 Developing end-to-end models for long-video action understanding tasks presents significant computational and memory challenges. Existing works generally build models on long-video features extracted by off-the-shelf action recognition models, which are trained on short-video datasets in different domains, making the extracted features suffer domain discrepancy. To avoid this, action recognition models can be end-to-end trained on clips, which are trimmed from long videos and labeled using action interval annotations. Such fully supervised annotations are expensive to collect. Thus, a weakly supervised method is needed for long-video action understanding at scale. Under the weak supervision setting, action labels are provided for the whole video without precise start and end times of the action clip. To this end, we propose an AdaFocus framework. 
AdaFocus estimates the spike-actionness and temporal positions of actions, enabling it to adaptively focus on action clips that facilitate better training without the need for precise annotations. Experiments on three long-video datasets show its effectiveness. Remarkably, on two of the datasets, models trained with AdaFocus under weak supervision outperform those trained under full supervision. Furthermore, we form a weakly supervised feature extraction pipeline with our AdaFocus, which enables significant improvements on three long-video action understanding tasks.	翻訳日:2023-11-30 23:53:39 公開日:2023-11-28
# Animate Anyone:キャラクタアニメーションのための一貫性と制御可能な画像から動画への合成 Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation ( http://arxiv.org/abs/2311.17117v1 ) ライセンス: Link先を確認	Li Hu, Xin Gao, Peng Zhang, Ke Sun, Bang Zhang, Liefeng Bo	(参考訳) キャラクタアニメーションは、駆動信号を通じて静止画像からキャラクタビデオを生成することを目的としている。現在、拡散モデルは、その堅牢な生成能力のため、視覚生成研究の主流となっている。しかし、画像から動画への領域、特にキャラクタアニメーションでは、キャラクタの詳細な情報との一貫性を時間的に維持することが依然として困難な問題である。本稿では,拡散モデルのパワーを活用し,キャラクタアニメーションに適した新しいフレームワークを提案する。参照画像から複雑な外観特徴の整合性を維持するため、空間的注意によって詳細特徴を統合するReferenceNetを設計する。制御性と連続性を確保するため,キャラクタの動きを指示する効率的なポーズガイダを導入し,映像フレーム間の滑らかな遷移を確保するために効果的な時間的モデリング手法を用いる。学習データを拡大することで任意のキャラクタをアニメーション化でき、他の画像から動画への手法と比較してキャラクタアニメーションで優れた結果が得られる。さらに,ファッションビデオと人間のダンス合成のベンチマークで評価を行い,最先端の結果を得た。 Character Animation aims to generate character videos from still images through driving signals. Currently, diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities. However, challenges persist in the realm of image-to-video, especially in character animation, where temporally maintaining consistency with detailed information from the character remains a formidable problem. In this paper, we leverage the power of diffusion models and propose a novel framework tailored for character animation. To preserve consistency of intricate appearance features from the reference image, we design ReferenceNet to merge detail features via spatial attention. To ensure controllability and continuity, we introduce an efficient pose guider to direct the character's movements and employ an effective temporal modeling approach to ensure smooth inter-frame transitions between video frames. By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods. Furthermore, we evaluate our method on benchmarks for fashion video and human dance synthesis, achieving state-of-the-art results.	翻訳日:2023-11-30 23:53:18 公開日:2023-11-28
# ref$^2$-nerf:反射と屈折を考慮した神経放射場 REF$^2$-NeRF: Reflection and Refraction aware Neural Radiance Field ( http://arxiv.org/abs/2311.17116v1 ) ライセンス: Link先を確認	Wooseok Kim, Taiki Fukiage, Takeshi Oishi	(参考訳) 近年,neural radiance field (nerf) 法による暗黙的神経表現を用いた複数画像からの3次元再構成法の研究において有意な進歩がみられた。ボリュームレンダリングに基づくこのような手法は様々な光現象をモデル化することができ、様々な場面や状況に対応するために様々な拡張手法が提案されている。しかし、複数のガラスオブジェクト(例えばガラスショーケースのオブジェクト)でシーンを扱う場合、複数の反射や屈折効果があるため、ターゲットシーンを正確にモデル化することは困難である。そこで本研究では,ガラスケースを含むシーンのNeRFモデリング手法を提案する。提案手法では, 屈折と反射を, ビューアの視点に依存し, 独立な要素を用いてモデル化する。このアプローチにより、屈折が発生する表面、すなわちガラス表面を推定することができ、直接および反射光成分の分離とモデリングを可能にする。既存の手法と比較して,ガラス屈折率と全体像のより正確なモデリングが可能である。 Recently, significant progress has been made in the study of methods for 3D reconstruction from multiple images using implicit neural representations, exemplified by the neural radiance field (NeRF) method. Such methods, which are based on volume rendering, can model various light phenomena, and various extended methods have been proposed to accommodate different scenes and situations. However, when handling scenes with multiple glass objects, e.g., objects in a glass showcase, modeling the target scene accurately has been challenging due to the presence of multiple reflection and refraction effects. Thus, this paper proposes a NeRF-based modeling method for scenes containing a glass case. In the proposed method, refraction and reflection are modeled using elements that are dependent and independent of the viewer's perspective. This approach allows us to estimate the surfaces where refraction occurs, i.e., glass surfaces, and enables the separation and modeling of both direct and reflected light components. Compared to existing methods, the proposed method enables more accurate modeling of both glass refraction and the overall scene.	翻訳日:2023-11-30 23:52:58 公開日:2023-11-28
# ヒトgaussian splatting: アニメーション可能なアバターのリアルタイムレンダリング Human Gaussian Splatting: Real-time Rendering of Animatable Avatars ( http://arxiv.org/abs/2311.17113v1 ) ライセンス: Link先を確認	Arthur Moreau, Jifei Song, Helisa Dhamo, Richard Shaw, Yiren Zhou, Eduardo Pérez-Pellitero	(参考訳) 本研究は,多視点映像から学習した実写的な人体アバターのリアルタイムレンダリングの問題に対処する。仮想人間のモデリングとレンダリングの古典的なアプローチは、一般的にはテクスチャメッシュを使用しているが、最近の研究は、印象的な視覚品質を達成するニューラル身体表現を開発した。しかし、これらのモデルはリアルタイムにレンダリングすることが困難であり、トレーニング時の観察とは異なる身体ポーズでキャラクタをアニメーションさせると品質が低下する。本研究では,ニューラル放射場に対する極めて効率的な代替手段として最近登場した3次元ガウススプラッティングに基づく,最初のアニメーション可能な人体モデルを提案する。身体は標準空間内のガウスプリミティブの集合によって表現され,フォワードスキニングと局所的な非剛体補正を組み合わせた粗から細へのアプローチで変形される。多視点観察からHuman Gaussian Splattingモデルをエンド・ツー・エンドで学習する方法を説明し,着衣人体の新規ポーズ合成における最先端のアプローチと比較して評価する。提案手法は,20fps以上でレンダリングできる一方で,THuman4データセットにおいて最先端技術よりPSNRが1.5dB優れている。 This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While the classical approaches to model and render virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real-time and their quality degrades when the character is animated with body poses different than the training observations. We propose the first animatable human model based on 3D Gaussian Splatting, which has recently emerged as a very efficient alternative to neural radiance fields. Our body is represented by a set of Gaussian primitives in a canonical space which are deformed in a coarse-to-fine approach that combines forward skinning and local non-rigid refinement. We describe how to learn our Human Gaussian Splatting model in an end-to-end fashion from multi-view observations, and evaluate it against the state-of-the-art approaches for novel pose synthesis of clothed bodies. 
Our method presents a PSNR 1.5 dB better than the state-of-the-art on the THuman4 dataset while being able to render at 20 fps or more.	翻訳日:2023-11-30 23:52:41 公開日:2023-11-28
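A PSNR gap quoted in decibels is easier to read once it is translated into mean squared error. The following is a minimal sketch, assuming the standard definition PSNR = 10 log10(MAX^2 / MSE) with pixel values in [0, 1]; the abstract does not describe the evaluation protocol, and the function names `psnr` and `mse_ratio_for_gain` are illustrative.

```python
import math


def psnr(mse: float, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for a given mean squared error."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which the MSE shrinks when PSNR improves by gain_db decibels."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    baseline_mse = 0.01                      # hypothetical baseline error
    ratio = mse_ratio_for_gain(1.5)          # about 1.41
    improved_mse = baseline_mse / ratio
    print(psnr(baseline_mse), psnr(improved_mse), ratio)
```

Under this assumption, a 1.5 dB gain corresponds to a mean squared error roughly 1.41 times smaller than the baseline, whatever the baseline value is.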
# セグメント型anyモデルのクロスブロックオーケストレーションによるパラメータ高効率微調整 Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model ( http://arxiv.org/abs/2311.17112v1 ) ライセンス: Link先を確認	Zelin Peng, Zhengqin Xu, Zhilin Zeng, Lingxi Xie, Qi Tian, Wei Shen	(参考訳) パラメータ効率細調整(PEFT)は、限られたトレーニングデータを持つ新しいシナリオにおいて、大きな基礎モデルの可能性を解き放つ効果的な手法である。コンピュータビジョンのコミュニティでは、PEFTは画像分類において有効性を示しているが、画像分割の能力についてはほとんど研究されていない。微調整セグメンテーションモデルは通常、新しいシナリオのためにパラメータ空間の適切な射影方向を調整するためにパラメータの重い調整を必要とする。これは既存のPEFTアルゴリズムに挑戦し、各ブロックに限られた数の個々のパラメータを注入することで、ブロックに沿った隠れマルコフ連鎖の制限によるパラメータ空間の射影方向の相当な調整を防止する。本稿では,PEFTにクロスブロックオーケストレーション機構を組み,Segment Anything Model(SAM)の下流シナリオへの適応を可能にする。本稿では,各ペフトブロックのパラメータ空間の異なる係数集合間の通信を容易にするために学習可能な関係行列を統合する新しいブロック間通信モジュールを提案する。さらに,超複素層から重みが生成される線形投影ヘッドを導入し,パラメータ空間全体の投影方向の調整の影響をさらに高めるブロック内拡張モジュールを提案する。多様なベンチマーク実験により,提案手法は,約1K以上のパラメータを持つ新規シナリオにおいて,セグメンテーション性能を大幅に向上することを示した。 Parameter-efficient fine-tuning (PEFT) is an effective methodology to unleash the potential of large foundation models in novel scenarios with limited training data. In the computer vision community, PEFT has shown effectiveness in image classification, but little research has studied its ability for image segmentation. Fine-tuning segmentation models usually require a heavier adjustment of parameters to align the proper projection directions in the parameter space for new scenarios. This raises a challenge to existing PEFT algorithms, as they often inject a limited number of individual parameters into each block, which prevents substantial adjustment of the projection direction of the parameter space due to the limitation of Hidden Markov Chain along blocks. In this paper, we equip PEFT with a cross-block orchestration mechanism to enable the adaptation of the Segment Anything Model (SAM) to various downstream scenarios. 
We introduce a novel inter-block communication module, which integrates a learnable relation matrix to facilitate communication among different coefficient sets of each PEFT block's parameter space. Moreover, we propose an intra-block enhancement module, which introduces a linear projection head whose weights are generated from a hyper-complex layer, further enhancing the impact of the adjustment of projection directions on the entire parameter space. Extensive experiments on diverse benchmarks demonstrate that our proposed approach consistently improves the segmentation performance significantly on novel scenarios with only around 1K additional parameters.	翻訳日:2023-11-30 23:52:20 公開日:2023-11-28
# 画像ハイライト法を用いた時系列分類のためのXAI XAI for time-series classification leveraging image highlight methods ( http://arxiv.org/abs/2311.17110v1 ) ライセンス: Link先を確認	Georgios Makridis, Georgios Fatouros, Vasileios Koukos, Dimitrios Kotios, Dimosthenis Kyriazis, Ioannis Soldatos	(参考訳) コンピュータビジョンと自然言語処理(nlp)の分野では説明可能性について多くの研究がなされているが、自然による時系列として時系列に適用される方法を説明することは、一見すると理解できない。本稿では、時系列分類タスクにおける解釈可能性を提供する教師学生アーキテクチャ(蒸留モデル)にディープニューラルネットワーク(DNN)を提案する。このアプローチの説明は、時系列を2dプロットに変換し、画像ハイライト手法(limeやgradcamなど)を適用して予測を解釈することに基づいている。同時に,提案手法は,トレーニング時間の増加のトレードオフとともに,ベースラインモデルと競合する精度の向上を提供する。 Although much work has been done on explainability in the computer vision and natural language processing (NLP) fields, there is still much work to be done to explain methods applied to time series as time series by nature can not be understood at first sight. In this paper, we present a Deep Neural Network (DNN) in a teacher-student architecture (distillation model) that offers interpretability in time-series classification tasks. The explainability of our approach is based on transforming the time series to 2D plots and applying image highlight methods (such as LIME and GradCam), making the predictions interpretable. At the same time, the proposed approach offers increased accuracy competing with the baseline model with the trade-off of increasing the training time.	翻訳日:2023-11-30 23:51:54 公開日:2023-11-28
# neural texture puppeteer: 対話的な速度で再識別を可能にする、明瞭な形状の神経構造とテクスチャレンダリングのためのフレームワーク Neural Texture Puppeteer: A Framework for Neural Geometry and Texture Rendering of Articulated Shapes, Enabling Re-Identification at Interactive Speed ( http://arxiv.org/abs/2311.17109v1 ) ライセンス: Link先を確認	Urs Waldmann, Ole Johannsen, Bastian Goldluecke	(参考訳) 本稿では,ニューラルテクスチュア Puppeteer と呼ぶテクスチャ化された形状のためのニューラルネットワークパイプラインを提案する。本手法は幾何学とテクスチャエンコーディングを分離する。幾何パイプラインは、この幾何学的情報を提供する基底真理データから、明瞭な形状の表面上の空間的関係を捉えることを学ぶ。テクスチャオートエンコーダは、この情報を利用してテクスチャ化された画像をグローバル潜在コードにエンコードする。このグローバルテクスチャ埋め込みは、幾何学から分離して効率的に訓練され、ダウンストリームタスクで個人を特定するために使用される。神経テクスチャレンダリングと個人識別は、対話的な速度で実行される。われわれの知る限りでは、ニューラルレンダリングに基づいて、私たちはCNNやトランスフォーマーベースのアプローチに代わる有望な代替手段を初めて提供します。異なる合成ウシのテクスチャのリアルな外観とポーズ合成は,この手法のクオリティをさらに示している。明瞭な形状の幾何学に対する基底的真理データの利用によって制限されるため、実世界のデータ合成の品質が低下する。さらに,実世界の2d rgb画像からテクスチャを再構成する実世界のテクスチャ領域シフトを適用することで,実世界のデータに対するモデルの柔軟性を実証する。そこで本手法は,データに制限がある絶滅危惧種に適用することができる。我々の新しい人工テクスチャデータセットNePuMooは、ニューラルネットワークによる再識別の分野でさらなる発展を促すために公開されています。 In this paper, we present a neural rendering pipeline for textured articulated shapes that we call Neural Texture Puppeteer. Our method separates geometry and texture encoding. The geometry pipeline learns to capture spatial relationships on the surface of the articulated shape from ground truth data that provides this geometric information. A texture auto-encoder makes use of this information to encode textured images into a global latent code. This global texture embedding can be efficiently trained separately from the geometry, and used in a downstream task to identify individuals. The neural texture rendering and the identification of individuals run at interactive speeds. To the best of our knowledge, we are the first to offer a promising alternative to CNN- or transformer-based approaches for re-identification of articulated individuals based on neural rendering. 
Realistic looking novel view and pose synthesis for different synthetic cow textures further demonstrate the quality of our method. Restricted by the availability of ground truth data for the articulated shape's geometry, the quality for real-world data synthesis is reduced. We further demonstrate the flexibility of our model for real-world data by applying a synthetic to real-world texture domain shift where we reconstruct the texture from a real-world 2D RGB image. Thus, our method can be applied to endangered species where data is limited. Our novel synthetic texture dataset NePuMoo is publicly available to inspire further development in the field of neural rendering-based re-identification.	翻訳日:2023-11-30 23:51:44 公開日:2023-11-28
# (Ir)AIの合理性:最先端、研究課題、オープンな質問 (Ir)rationality in AI: State of the Art, Research Challenges and Open Questions ( http://arxiv.org/abs/2311.17165v1 ) ライセンス: Link先を確認	Olivia Macmillan-Scott and Mirco Musolesi	(参考訳) 合理性の概念は人工知能の分野の中心である。人間の推論をシミュレートしたいのか、それとも境界のある最適性を達成することを目指すのかに関わらず、一般的には人工エージェントを可能な限り合理的なものにしたいと考えています。 AIにおける概念の中心性にもかかわらず、合理的なエージェントを構成するものの統一された定義は存在しない。この記事では、人工知能における合理性と不合理性に関する調査を行い、この分野におけるオープンな疑問を取り上げます。他の分野における合理性の理解は、人工知能、特に経済学、哲学、心理学におけるその概念に影響を与えてきた。人工エージェントの挙動に着目し,特定のシナリオにおいて最適であることを示す不合理行動を考える。識別と相互作用の両面で不合理なエージェントを扱うためにいくつかの方法が開発されているが、この分野での作業は限られている。現在までに開発されている手法、すなわち敵対的シナリオは、人工エージェントとの相互作用に適合するように適応することができる。我々はさらに,人間と人工エージェントの相互作用と,この相互作用において合理性が果たす役割について論じる。 The concept of rationality is central to the field of artificial intelligence. Whether we are seeking to simulate human reasoning, or the goal is to achieve bounded optimality, we generally seek to make artificial agents as rational as possible. Despite the centrality of the concept within AI, there is no unified definition of what constitutes a rational agent. This article provides a survey of rationality and irrationality in artificial intelligence, and sets out the open questions in this area. The understanding of rationality in other fields has influenced its conception within artificial intelligence, in particular work in economics, philosophy and psychology. Focusing on the behaviour of artificial agents, we consider irrational behaviours that can prove to be optimal in certain scenarios. Some methods have been developed to deal with irrational agents, both in terms of identification and interaction, however work in this area remains limited. Methods that have up to now been developed for other purposes, namely adversarial scenarios, may be adapted to suit interactions with artificial agents. 
We further discuss the interplay between human and artificial agents, and the role that rationality plays within this interaction; many questions remain in this area, relating to potentially irrational behaviour of both humans and artificial agents.	翻訳日:2023-11-30 23:44:53 公開日:2023-11-28
# チューナブルカップリングによる数百個のトラップイオンを持つサイトリゾルド2次元量子シミュレータ A Site-Resolved 2D Quantum Simulator with Hundreds of Trapped Ions under Tunable Couplings ( http://arxiv.org/abs/2311.17163v1 ) ライセンス: Link先を確認	S.-A. Guo, Y.-K. Wu, J. Ye, L. Zhang, W.-Q. Lian, R. Yao, Y. Wang, R.-Y. Yan, Y.-J. Yi, Y.-L. Xu, B.-W. Li, Y.-H. Hou, Y.-Z. Xu, W.-X. Guo, C. Zhang, B.-X. Qi, Z.-C. Zhou, L. He, and L.-M. Duan	(参考訳) 大きな量子ビット容量と個々の読み出し能力は、大規模量子コンピューティングとシミュレーションの2つの重要な要件である。量子情報処理の主要な物理プラットフォームの一つとして、イオントラップは1dポールトラップでサイト解決された読み出しを持つ数十個のイオンと、2dペニングトラップでグローバル観測可能な数百個のイオンの量子シミュレーションを達成している。しかし、これら2つの機能を1つのシステムに統合することは依然として非常に難しい。ここでは,2次元ウィグナー結晶中の512イオンの安定トラップと,その横運動のサイドバンド冷却について報告する。波長可変結合強度とパターンを有する長距離量子イジングモデルの量子シミュレーションを300イオンを用いて行った。シングルショット計測におけるサイト分解能により,準アパラバティカルな地盤状態における空間相関パターンが高まることを観測した。この空間分解能により、計算された集合フォノンモードとの比較により量子シミュレーション結果の検証が可能となる。本研究は,古典的に抽出可能な量子力学のシミュレーションと,2次元イオントラップ量子シミュレータを用いたNISQアルゴリズムの実行方法について述べる。 2次元個別アドレスのさらなる開発により、我々の研究は大規模なイオントラップ量子コンピュータのビルディングブロックも作りました。 A large qubit capacity and an individual readout capability are two crucial requirements for large-scale quantum computing and simulation. As one of the leading physical platforms for quantum information processing, the ion trap has achieved quantum simulation of tens of ions with site-resolved readout in 1D Paul trap, and that of hundreds of ions with global observables in 2D Penning trap. However, integrating these two features into a single system is still very challenging. Here we report the stable trapping of 512 ions in a 2D Wigner crystal and the sideband cooling of their transverse motion. We demonstrate the quantum simulation of long-range quantum Ising models with tunable coupling strengths and patterns, with or without frustration, using 300 ions. Enabled by the site resolution in the single-shot measurement, we observe rich spatial correlation patterns in the quasi-adiabatically prepared ground states. 
This spatial resolution further allows us to verify quantum simulation results by comparing with the calculated collective phonon modes. Our work paves the way for simulating classically intractable quantum dynamics and for running NISQ algorithms using 2D ion trap quantum simulators. With the further development of 2D individual addressing, our work also makes a building block for a large-scale ion trap quantum computer.	翻訳日:2023-11-30 23:43:38 公開日:2023-11-28
# 変分オートエンコーダを用いた高速粒子異常検出アルゴリズム Fast Particle-based Anomaly Detection Algorithm with Variational Autoencoder ( http://arxiv.org/abs/2311.17162v1 ) ライセンス: Link先を確認	Ryan Liu, Abhijith Gandrakota, Jennifer Ngadiuba, Maria Spiropulu, Jean-Roch Vlimant	(参考訳) モデル非依存異常検出は、標準模型物理学を超えた新しい探索において有望なアプローチの1つである。本稿では,粒子ベース変分オートエンコーダ(VAE)異常検出アルゴリズムであるSet-VAEを提案する。従来のジェット選択に比べて2倍の信号効率向上を示す。さらに,システムトリガの今後の展開に注目して,KL分割損失を異常スコアとして用いて異常検出の推論時間コストを削減するCLIP-VAEを提案し,遅延の2倍の高速化とキャッシング要求の低減を実現した。 Model-agnostic anomaly detection is one of the promising approaches in the search for new beyond the standard model physics. In this paper, we present Set-VAE, a particle-based variational autoencoder (VAE) anomaly detection algorithm. We demonstrate a 2x signal efficiency gain compared with traditional subjettiness-based jet selection. Furthermore, with an eye to the future deployment to trigger systems, we propose the CLIP-VAE, which reduces the inference-time cost of anomaly detection by using the KL-divergence loss as the anomaly score, resulting in a 2x acceleration in latency and reducing the caching requirement.	翻訳日:2023-11-30 23:43:09 公開日:2023-11-28
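The entry above states that CLIP-VAE uses the KL-divergence loss as the anomaly score. A minimal sketch of such a score follows, assuming the common VAE setup of a diagonal Gaussian posterior and a standard normal prior (the abstract does not confirm the exact form used by the authors). The function name and the example encoder outputs are hypothetical.

```python
import math


def kl_anomaly_score(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    Assumption: diagonal Gaussian posterior and standard normal prior.
    """
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


# Hypothetical encoder outputs (illustrative numbers, not from the paper).
typical_score = kl_anomaly_score([0.0, 0.0], [0.0, 0.0])
unusual_score = kl_anomaly_score([3.0, -2.0], [0.0, 0.0])
```

A score of this kind needs only the encoder outputs, so no decoder pass is required at inference time, which is one plausible reading of the latency reduction the abstract reports.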
# Pragmatic Radiology Report Generation Pragmatic Radiology Report Generation ( http://arxiv.org/abs/2311.17154v1 ) ライセンス: Link先を確認	Dang Nguyen, Chacha Chen, He He, Chenhao Tan	(参考訳) 胸部X線で肺炎がみつからなかった場合、この陰性な観察を報告するか、省略するか。我々は、この疑問はX線だけでは答えられず、現実的な視点が必要であり、放射線技師と患者の間で報告されるコミュニケーション上の目標を捉えていると論じる。しかし、ラジオロジーレポート生成のための標準的な画像とテキストの定式化は、そのような実用的意図を組み込まなかった。この現実的な観点から、患者がなぜX線を望んだのかを示す指標が、陰性観察の言及を駆動し、レポート生成のための追加入力として表示を導入することを実証する。出力については,モデル幻覚の源として画像から推測不能な情報を識別し,基礎的報告のクリーニングによって制限する枠組みを開発する。最後に, 実用的モデルを開発するために, 表示と洗浄された基礎レポートを用い, 新たな実用的指標 (+4.3 負のf1) だけでなく, 標準指標 (+6.3 正のf1 と +11.0 bleu-2) においても, 既存の手法を上回っていることを示す。 When pneumonia is not found on a chest X-ray, should the report describe this negative observation or omit it? We argue that this question cannot be answered from the X-ray alone and requires a pragmatic perspective, which captures the communicative goal that radiology reports serve between radiologists and patients. However, the standard image-to-text formulation for radiology report generation fails to incorporate such pragmatic intents. Following this pragmatic perspective, we demonstrate that the indication, which describes why a patient comes for an X-ray, drives the mentions of negative observations and introduce indications as additional input to report generation. With respect to the output, we develop a framework to identify uninferable information from the image as a source of model hallucinations, and limit them by cleaning groundtruth reports. Finally, we use indications and cleaned groundtruth reports to develop pragmatic models, and show that they outperform existing methods not only in new pragmatics-inspired metrics (+4.3 Negative F1) but also in standard metrics (+6.3 Positive F1 and +11.0 BLEU-2).	翻訳日:2023-11-30 23:42:47 公開日:2023-11-28
# フェルミオン散逸支援作用素進化を伴う弱相互作用鎖のエネルギー拡散 Energy diffusion in weakly interacting chains with fermionic dissipation-assisted operator evolution ( http://arxiv.org/abs/2311.17148v1 ) ライセンス: Link先を確認	En-Jui Kuo, Brayden Ware, Peter Lunts, Mohammad Hafezi, Christopher David White	(参考訳) 高温での相互作用格子ハミルトニアンは、古典拡散方程式に支配されるエネルギー輸送を一般的に生み出すが、拡散速度の予測には微視的量子力学の数値シミュレーションが必要である。このような輸送特性を予測するためには、計算による時間発展法を、絡み合いの成長を制御するスキームと組み合わせ、十分に長い時間にわたって扱いやすい形でシミュレートする必要がある。そのような打ち切りスキームの一つである散逸支援作用素進化(DAOE)は、大きなパウリ重みを持つ作用素の成分を減衰させることで絡み合いを制御する。本稿では,フェルミオン系を扱えるようにDAOEを一般化する。我々の手法は,代わりに大きなフェルミオン重みを持つ作用素の成分を減衰させる。相互作用する1次元Majorana鎖におけるエネルギー輸送のシミュレーションにおいて,DAOE,新しいフェルミオンDAOE(FDAOE)および別のシミュレーション手法である密度行列トランケーション(DMT)の性能について検討した。この鎖の拡散係数は相互作用強度の4乗に比例してスケールすることがわかった。これはフェルミの黄金律に基づく素朴な予想に反するが、弱可積分性の破れ(weak integrability breaking)の理論に基づく最近の予測と一致している。系のフェルミオン性が最も重要となる弱い相互作用領域では、FDAOEはDAOEよりも効率的に系をシミュレートする。 Interacting lattice Hamiltonians at high temperature generically give rise to energy transport governed by the classical diffusion equation; however, predicting the rate of diffusion requires numerical simulation of the microscopic quantum dynamics. For the purpose of predicting such transport properties, computational time evolution methods must be paired with schemes to control the growth of entanglement to tractably simulate for sufficiently long times. One such truncation scheme -- dissipation-assisted operator evolution (DAOE) -- controls entanglement by damping out components of operators with large Pauli weight. In this paper, we generalize DAOE to treat fermionic systems. Our method instead damps out components of operators with large fermionic weight. We investigate the performance of DAOE, the new fermionic DAOE (FDAOE), and another simulation method, density matrix truncation (DMT), in simulating energy transport in an interacting one-dimensional Majorana chain. 
The chain is found to have a diffusion coefficient scaling like interaction strength to the fourth power, contrary to naive expectations based on Fermi's golden rule -- but consistent with recent predictions based on the theory of weak integrability breaking. In the weak interaction regime where the fermionic nature of the system is most relevant, FDAOE is found to simulate the system more efficiently than DAOE.	翻訳日:2023-11-30 23:42:23 公開日:2023-11-28
# $\mathbb{P}^n_\textbf{w}$ 超曲面としてのカラビ・ヤウ4/5/6次元多様体: 機械学習、近似、生成 Calabi-Yau Four/Five/Six-folds as $\mathbb{P}^n_\textbf{w}$ Hypersurfaces: Machine Learning, Approximation, and Generation ( http://arxiv.org/abs/2311.17146v1 ) ライセンス: Link先を確認	Edward Hirst, Tancredi Schettini Gherardini	(参考訳) カラビ・ヤウ4次元多様体は、6つの重みからなる重み系によって定義される複素次元5の重み付き射影空間の超曲面として構成することができる。この研究では、重み系からカラビ・ヤウのホッジ数を学習するためにニューラルネットワークを実装し、勾配サリエンシーとシンボリック回帰から、この方法で構成される任意次元のカラビ・ヤウのホッジ数に対するランダウ・ギンツブルク模型の公式の打ち切り近似が着想された。この近似は常にタイトな下界を与え、計算が劇的に速く(計算時間は最大4桁削減される)、大きな重みを持つ系に対して驚くほど正確な結果を与える。さらに,IP性、反射性、内部可除性(intradivisibility)の考慮を含め,横断性の必要条件(十分条件ではない)を満たす重み系の補完的なデータセットを構築した。全体として、この重み系のランドスケープの分類が得られ、機械学習手法によってさらに確認された。この分類の知識と提示された近似の性質を用いて、7つの重みからなる横断的な重み系の新しいデータセットを重みの総和が $\leq 200$ の範囲で生成し、それぞれの位相的性質を計算したカラビ・ヤウ5次元多様体の新しいデータベースを作成した。さらに、カラビ・ヤウ6次元多様体の候補についても、近似ホッジ数を付した同等のデータベースを生成した。 Calabi-Yau four-folds may be constructed as hypersurfaces in weighted projective spaces of complex dimension 5 defined via weight systems of 6 weights. In this work, neural networks were implemented to learn the Calabi-Yau Hodge numbers from the weight systems, where gradient saliency and symbolic regression then inspired a truncation of the Landau-Ginzburg model formula for the Hodge numbers of any dimensional Calabi-Yau constructed in this way. The approximation always provides a tight lower bound, is shown to be dramatically quicker to compute (with compute times reduced by up to four orders of magnitude), and gives remarkably accurate results for systems with large weights. Additionally, complementary datasets of weight systems satisfying the necessary but insufficient conditions for transversality were constructed, including considerations of the IP, reflexivity, and intradivisibility properties. Overall, this produces a classification of this weight system landscape, further confirmed with machine learning methods. 
Using the knowledge of this classification, and the properties of the presented approximation, a novel dataset of transverse weight systems consisting of 7 weights was generated for a sum of weights $\leq 200$, producing a new database of Calabi-Yau five-folds with their respective topological properties computed. Further to this, an equivalent database of candidate Calabi-Yau six-folds was generated with approximated Hodge numbers.	翻訳日:2023-11-30 23:41:57 公開日:2023-11-28
# リアルタイム多変量時系列による天文遷移の年齢予測 Predicting the Age of Astronomical Transients from Real-Time Multivariate Time Series ( http://arxiv.org/abs/2311.17143v1 ) ライセンス: Link先を確認	Hali Huang, Daniel Muthukrishna, Prajna Nair, Zimi Zhang, Michael Fausnaugh, Torsha Majumder, Ryan J. Foley, George R. Ricker	(参考訳) 超新星や他の稀な恒星爆発のような天文学的なトランジェントは、天文学において最も重要な発見のいくつかに役立っている。新しい天文学的スカイサーベイは、すぐに前例のない数の過渡現象を、ばらばらで不規則にサンプリングされた多変量時系列として記録する。トランジェントとその前駆体の物理的メカニズムの理解を深めるためには,早期測定が必要である。高齢者の年齢と授業のフォローアップを優先することが,新たな調査に不可欠である。そこで本研究では,複数波長の時系列観測からリアルタイムに過渡現象の年齢を予測する最初の手法を提案する。ベイズ確率的リカレントニューラルネットワークを構築します。本手法は,調査望遠鏡による観測開始時にロバストな不確実性を有する過渡星の年齢を正確に予測することができる。この研究は、現在および今後の天文学的な調査によって検出されている多くの若いトランジェントに対する理解の進展に不可欠である。 Astronomical transients, such as supernovae and other rare stellar explosions, have been instrumental in some of the most significant discoveries in astronomy. New astronomical sky surveys will soon record unprecedented numbers of transients as sparsely and irregularly sampled multivariate time series. To improve our understanding of the physical mechanisms of transients and their progenitor systems, early-time measurements are necessary. Prioritizing the follow-up of transients based on their age along with their class is crucial for new surveys. To meet this demand, we present the first method of predicting the age of transients in real-time from multi-wavelength time-series observations. We build a Bayesian probabilistic recurrent neural network. Our method can accurately predict the age of a transient with robust uncertainties as soon as it is initially triggered by a survey telescope. This work will be essential for the advancement of our understanding of the numerous young transients being detected by ongoing and upcoming astronomical surveys.	翻訳日:2023-11-30 23:41:26 公開日:2023-11-28
# フィールドレベルの銀河サーベイにおける生成モデリングへのポイントクラウドアプローチ A point cloud approach to generative modeling for galaxy surveys at the field level ( http://arxiv.org/abs/2311.17141v1 ) ライセンス: Link先を確認	Carolina Cuesta-Lazaro and Siddharth Mishra-Sharma	(参考訳) 我々は、宇宙における銀河の分布を直接3次元空間の点の集合として記述するための拡散に基づく生成モデルを導入し(座標)、ビンニングやボキセル化に頼ることなく、関連する属性(速度や質量など)と任意に関連付ける。カスタム拡散モデルは、銀河の分布の基本的な要約統計を再現するエミュレーションや、銀河場の条件付き確率を計算することによって推論に利用できる。我々は、quijoteシミュレーションスイートにおいて、巨大なダークマターハロエに対する最初の応用を示す。このアプローチは、サマリ統計学に固有の制限を回避して、宇宙データの包括的な分析を可能にするために拡張することができる。 We introduce a diffusion-based generative model to describe the distribution of galaxies in our Universe directly as a collection of points in 3-D space (coordinates) optionally with associated attributes (e.g., velocities and masses), without resorting to binning or voxelization. The custom diffusion model can be used both for emulation, reproducing essential summary statistics of the galaxy distribution, as well as inference, by computing the conditional likelihood of a galaxy field. We demonstrate a first application to massive dark matter haloes in the Quijote simulation suite. This approach can be extended to enable a comprehensive analysis of cosmological data, circumventing limitations inherent to summary statistic -- as well as neural simulation-based inference methods.	翻訳日:2023-11-30 23:41:12 公開日:2023-11-28
# 影は嘘をつかないし線は曲がれない! 生成モデルは射影幾何学を知らない... Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now ( http://arxiv.org/abs/2311.17138v1 ) ライセンス: Link先を確認	Ayush Sarkar, Hanlin Mai, Amitabh Mahapatra, Svetlana Lazebnik, D.A. Forsyth, Anand Bhattad	(参考訳) 生成モデルは驚くほどリアルなイメージを作り出すことができる。本稿では,生成画像が実画像と異なる幾何学的特徴を持つことを示す。生成した画像の集合を作り、単純な信号ベースの分類器を騙して、それらが本物であると信じ込ませる。次に,事前修飾された生成画像は,幾何学的性質のみを見る分類器によって確実に識別できることを示す。私たちはそのような分類器を3つ使います。 3つの分類器は画像画素へのアクセスを拒否され、導出した幾何学的特徴のみを見る。第1の分類器は画像の視野を、第2の分類器は画像から検出された線を、第3の分類器は検出された物体と影の関係を見る。本手法は、複数の異なる発電機の画像に対して、SOTAローカル信号ベース検出器よりも確実に生成された画像を検出する。正則写像は、分類器が幾何的問題を確実に特定できることを示唆する。現状のジェネレータは実画像の幾何学的特性を確実に再現できない。 Generative models can produce impressively realistic images. This paper demonstrates that generated images have geometric features different from those of real images. We build a set of collections of generated images, prequalified to fool simple, signal-based classifiers into believing they are real. We then show that prequalified generated images can be identified reliably by classifiers that only look at geometric properties. We use three such classifiers. All three classifiers are denied access to image pixels, and look only at derived geometric features. The first classifier looks at the perspective field of the image, the second looks at lines detected in the image, and the third looks at relations between detected objects and shadows. Our procedure detects generated images more reliably than SOTA local signal based detectors, for images from a number of distinct generators. Saliency maps suggest that the classifiers can identify geometric problems reliably. We conclude that current generators cannot reliably reproduce geometric properties of real images.	翻訳日:2023-11-30 23:41:00 公開日:2023-11-28
# 生成モデル: 彼らは何を知っているのか? 彼らは何か知ってるの? 見つけよう! Generative Models: What do they know? Do they know things? Let's find out! ( http://arxiv.org/abs/2311.17137v1 ) ライセンス: Link先を確認	Xiaodan Du, Nicholas Kolkin, Greg Shakhnarovich, Anand Bhattad	(参考訳) 生成モデルは高精細でリアルな画像を合成できることが示されている。表面の正常や深度、影といった画像の内在を暗黙的にモデル化することを学ぶことは、疑わしい。本稿では,生成モデルが内在的に高品質なシーン内在的地図を生成するという説得力のある証拠を示す。 Intrinsic LoRA(I LoRA)は、任意の生成モデルをシーン固有の予測子に変換する汎用的なプラグイン・アンド・プレイ方式であり、デコーダの追加やオリジナルネットワークを完全に微調整することなく、オリジナルジェネレータネットワークから直接固有のシーンマップを抽出することができる。提案手法では,重要特徴マップの低ランク適応 (lora) を用いて,生成モデルにおけるパラメータ全体の0.6%未満のパラメータを新たに学習した。ラベル付き画像の小さなセットで最適化された我々のモデル非依存のアプローチは、拡散モデル、GAN、自動回帰モデルなど、様々な生成アーキテクチャに適応する。本研究では,本手法が生成するシーン固有マップと,指導手法が生成するシーン固有マップとを比較した。 Generative models have been shown to be capable of synthesizing highly detailed and realistic images. It is natural to suspect that they implicitly learn to model some image intrinsics such as surface normals, depth, or shadows. In this paper, we present compelling evidence that generative models indeed internally produce high-quality scene intrinsic maps. We introduce Intrinsic LoRA (I LoRA), a universal, plug-and-play approach that transforms any generative model into a scene intrinsic predictor, capable of extracting intrinsic scene maps directly from the original generator network without needing additional decoders or fully fine-tuning the original network. Our method employs a Low-Rank Adaptation (LoRA) of key feature maps, with newly learned parameters that make up less than 0.6% of the total parameters in the generative model. Optimized with a small set of labeled images, our model-agnostic approach adapts to various generative architectures, including Diffusion models, GANs, and Autoregressive models. We show that the scene intrinsic maps produced by our method compare well with, and in some cases surpass those generated by leading supervised techniques.	翻訳日:2023-11-30 23:40:44 公開日:2023-11-28
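The entry above relies on Low-Rank Adaptation (LoRA), where a frozen weight matrix W is adjusted by a trainable low-rank product B·A. The sketch below shows the standard LoRA update and why it adds few parameters. It is a generic illustration in plain Python, not the authors' implementation. The matrix sizes, the rank, and the helper names are hypothetical, and the abstract does not say which layers are adapted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, b, a, scale=1.0):
    """Effective weight W + scale * (B @ A); W stays frozen, only B and A are trained."""
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_fraction(d_out, d_in, rank):
    """Trainable LoRA parameters relative to the full d_out x d_in matrix."""
    return rank * (d_out + d_in) / (d_out * d_in)


# Tiny rank-1 example: B is 2x1, A is 1x2 (illustrative values only).
w_eff = lora_weight([[1.0, 0.0], [0.0, 1.0]], [[1.0], [2.0]], [[3.0, 4.0]])
# Hypothetical layer size: a 1024x1024 matrix adapted at rank 4.
fraction = lora_param_fraction(1024, 1024, 4)
```

For the hypothetical 1024×1024 layer at rank 4, the added parameters come to under 1% of that matrix, the same order of magnitude as the "less than 0.6%" overall figure quoted in the abstract.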
# UniIR:Universal Multimodal Information Retrieverのトレーニングとベンチマーク UniIR: Training and Benchmarking Universal Multimodal Information Retrievers ( http://arxiv.org/abs/2311.17136v1 ) ライセンス: Link先を確認	Cong Wei, Yang Chen, Haonan Chen, Hexiang Hu, Ge Zhang, Jie Fu, Alan Ritter, Wenhu Chen	(参考訳) 既存の情報検索(IR)モデルは、テキスト記述による画像の検索、見出し画像によるニュース記事の検索、クエリ画像による類似した画像の検索など、さまざまなユーザニーズに対する適用性を制限する、均質なフォーマットを前提とすることが多い。このような異なる情報検索要求にアプローチするために,命令誘導型マルチモーダルレトリバーであるUniIRを導入する。 UniIRは、10の多様なマルチモーダル-IRデータセットを共同でトレーニングした単一の検索システムで、ユーザ命令を解釈してさまざまな検索タスクを実行し、既存のデータセット間で堅牢なパフォーマンスを示し、新しいタスクにゼロショットの一般化を示す。本実験は,マルチタスク学習と指導訓練がUniIRの一般化能力の鍵であることを示す。さらに,包括的結果を持つマルチモーダル検索ベンチマークであるm-beirを構築し,ユニバーサルマルチモーダル情報検索の評価を標準化する。 Existing information retrieval (IR) models often assume a homogeneous format, limiting their applicability to diverse user needs, such as searching for images with text descriptions, searching for a news article with a headline image, or finding a similar photo with a query image. To approach such different information-seeking demands, we introduce UniIR, a unified instruction-guided multimodal retriever capable of handling eight distinct retrieval tasks across modalities. UniIR, a single retrieval system jointly trained on ten diverse multimodal-IR datasets, interprets user instructions to execute various retrieval tasks, demonstrating robust performance across existing datasets and zero-shot generalization to new tasks. Our experiments highlight that multi-task training and instruction tuning are keys to UniIR's generalization ability. Additionally, we construct the M-BEIR, a multimodal retrieval benchmark with comprehensive results, to standardize the evaluation of universal multimodal information retrieval.	翻訳日:2023-11-30 23:40:20 公開日:2023-11-28
# TLControl:人間の運動合成のための軌道と言語制御 TLControl: Trajectory and Language Control for Human Motion Synthesis ( http://arxiv.org/abs/2311.17135v1 ) ライセンス: Link先を確認	Weilin Wan, Zhiyang Dou, Taku Komura, Wenping Wang, Dinesh Jayaraman, Lingjie Liu	(参考訳) 制御可能な人間のモーション合成は、AR/VR、ゲーム、映画、エンボディドAIの応用に不可欠である。既存の手法は言語または完全な軌道制御にのみ焦点をあてることが多く、特にマルチジョイント制御において、ユーザが特定した軌道に合わせた合成動作の精度に欠ける。これらの問題に対処するため,TLControlは,低レベルな軌跡と高レベルな言語セマンティクス制御の両方を取り入れた,リアルな人間の動作合成のための新しい手法である。具体的には、まずVQ-VAEをトレーニングし、ボディパーツによって構成されたコンパクトな潜在運動空間を学習する。次に,学習された潜在運動空間に基づく関節の完全な軌跡の粗い初期予測を行うために,ユーザが指定した部分的軌跡とテキスト記述を条件として仮面付き軌跡変換器を提案する。最後に, 高精度軌道制御のための粗い予測を洗練するために, 効率的なテストタイム最適化を提案する。実験により,TLControlはトラジェクトリの精度と時間効率に優れており,インタラクティブで高品質なアニメーション生成に実用的であることが示された。 Controllable human motion synthesis is essential for applications in AR/VR, gaming, movies, and embodied AI. Existing methods often focus solely on either language or full trajectory control, lacking precision in synthesizing motions aligned with user-specified trajectories, especially for multi-joint control. To address these issues, we present TLControl, a new method for realistic human motion synthesis, incorporating both low-level trajectory and high-level language semantics controls. Specifically, we first train a VQ-VAE to learn a compact latent motion space organized by body parts. We then propose a Masked Trajectories Transformer to make coarse initial predictions of full trajectories of joints based on the learned latent motion space, with user-specified partial trajectories and text descriptions as conditioning. Finally, we introduce an efficient test-time optimization to refine these coarse predictions for accurate trajectory control. Experiments demonstrate that TLControl outperforms the state-of-the-art in trajectory accuracy and time efficiency, making it practical for interactive and high-quality animation generation.	翻訳日:2023-11-30 23:40:01 公開日:2023-11-28
# GlycoNMR: グラフニューラルネットワークを用いた炭水化物のNMR化学シフト予測のためのデータセットとベンチマーク GlycoNMR: Dataset and benchmarks for NMR chemical shift prediction of carbohydrates with graph neural networks ( http://arxiv.org/abs/2311.17134v1 ) ライセンス: Link先を確認	Zizhang Chen, Ryan Paul Badman, Lachele Foley, Robert Woods, Pengyu Hong	(参考訳) 分子表現学習(MRL)は、化学的特徴を保ちながら分子を数値表現に変換するため、機械学習と化学科学の間のギャップを埋める強力なツールである。これらのエンコードされた表現は、特性予測や薬物設計を含む様々な下流の生化学研究の基盤となる。 MRLはタンパク質と一般的な生体分子のデータセットで大きな成功を収めてきた。しかし、成長中の糖科学の分野(炭水化物の研究。長鎖の炭水化物はグリカンとも呼ばれる)では、MRL法はほとんど研究されていない。この研究不足は主に、包括的で適切にキュレーションされた炭水化物固有のデータセットが限られていることと、炭水化物データに特有の問題に合わせて設計された機械学習(ML)パイプラインが欠如していることに起因する。炭水化物固有のデータの解釈とアノテーションは一般にタンパク質データよりも複雑であるため、通常はドメインの専門家の関与が必要となる。主にタンパク質や小さな生体分子向けに最適化された既存のMRL法も、特別な修正なしには炭水化物に直接適用できない。この課題に対処し、糖科学の進歩を加速し、MRLコミュニティのデータ資源を充実させるため、GlycoNMRを導入する。 GlycoNMRは、原子レベルの精密な予測のために、2,609の炭水化物構造と211,543のアノテーション付き核磁気共鳴(NMR)化学シフトを持つ、丹念にキュレーションされた2つのデータセットを含んでいる。我々は、この問題に効果的に取り組むため、炭水化物特有の特徴を設計し、既存のMRLモデルを適応させた。例として、新しいデータセットで4つの修正MRLモデルをベンチマークする。 Molecular representation learning (MRL) is a powerful tool for bridging the gap between machine learning and chemical sciences, as it converts molecules into numerical representations while preserving their chemical features. These encoded representations serve as a foundation for various downstream biochemical studies, including property prediction and drug design. MRL has had great success with proteins and general biomolecule datasets. Yet, in the growing sub-field of glycoscience (the study of carbohydrates, where longer carbohydrates are also called glycans), MRL methods have been barely explored. This under-exploration can be primarily attributed to the limited availability of comprehensive and well-curated carbohydrate-specific datasets and a lack of machine learning (ML) pipelines specifically tailored to meet the unique problems presented by carbohydrate data. Since interpreting and annotating carbohydrate-specific data is generally more complicated than protein data, domain experts are usually required to get involved. 
The existing MRL methods, predominately optimized for proteins and small biomolecules, also cannot be directly used in carbohydrate applications without special modifications. To address this challenge, accelerate progress in glycoscience, and enrich the data resources of the MRL community, we introduce GlycoNMR. GlycoNMR contains two laboriously curated datasets with 2,609 carbohydrate structures and 211,543 annotated nuclear magnetic resonance (NMR) chemical shifts for precise atomic-level prediction. We tailored carbohydrate-specific features and adapted existing MRL models to tackle this problem effectively. For illustration, we benchmark four modified MRL models on our new datasets.	翻訳日:2023-11-30 23:39:39 公開日:2023-11-28
# ロバストで説明可能な死亡予測モデルの展開:COVID-19パンデミックとそれ以上 Deployment of a Robust and Explainable Mortality Prediction Model: The COVID-19 Pandemic and Beyond ( http://arxiv.org/abs/2311.17133v1 ) ライセンス: Link先を確認	Jacob R. Epifano, Stephen Glass, Ravi P. Ramachandran, Sharad Patel, Aaron J. Masino, Ghulam Rasool	(参考訳) 本研究では、新型コロナウイルスのパンデミック以降の死亡率予測におけるAIモデルの有効性、説明可能性、堅牢性について検討した。このタイプの最初の研究で、ベイズニューラルネットワーク(BNN)とインテリジェントトレーニング技術によって、重要なデータシフトの中で、我々のモデルがパフォーマンスを維持することができることがわかった。本研究は, 困難な状況下でも臨床予測に適合し, かつ超越することができる頑健なaiモデルを開発することの重要性を強調する。モデル説明可能性の探索により、確率的モデルはより多様でパーソナライズされた説明を生成し、現実の臨床環境で詳細な個別化された洞察を提供するAIモデルの必要性を強調した。さらに,AIモデルにおける不確実性の定量化の重要性を強調し,信頼性の高い予測に基づいて,臨床医がより良いインフォームド決定を行えるようにした。我々の研究は、医療のためのAI研究における実装科学の優先順位付けを提唱し、現実の臨床環境でAIソリューションが実用的で有益で持続可能であることを保証する。医療設定における固有の課題や複雑さに対処することで、研究者は臨床実践と患者の成果を効果的に改善するAIモデルを開発することができる。 This study investigated the performance, explainability, and robustness of deployed artificial intelligence (AI) models in predicting mortality during the COVID-19 pandemic and beyond. The first study of its kind, we found that Bayesian Neural Networks (BNNs) and intelligent training techniques allowed our models to maintain performance amidst significant data shifts. Our results emphasize the importance of developing robust AI models capable of matching or surpassing clinician predictions, even under challenging conditions. Our exploration of model explainability revealed that stochastic models generate more diverse and personalized explanations thereby highlighting the need for AI models that provide detailed and individualized insights in real-world clinical settings. Furthermore, we underscored the importance of quantifying uncertainty in AI models which enables clinicians to make better-informed decisions based on reliable predictions. Our study advocates for prioritizing implementation science in AI research for healthcare and ensuring that AI solutions are practical, beneficial, and sustainable in real-world clinical environments. 
By addressing unique challenges and complexities in healthcare settings, researchers can develop AI models that effectively improve clinical practice and patient outcomes.	翻訳日:2023-11-30 23:39:10 公開日:2023-11-28
# TransNeXt:視覚変換器のロバストな視覚知覚 TransNeXt: Robust Foveal Visual Perception for Vision Transformers ( http://arxiv.org/abs/2311.17132v1 ) ライセンス: Link先を確認	Dai Shi	(参考訳) 残差接続の深さ分解効果のため、情報交換のための積み重ね層に依存する多くの効率的な視覚トランスフォーマーモデルは十分な情報混合を形成することができず、不自然な視覚知覚に繋がる。そこで本稿では,生物の焦点視と連続眼球運動をシミュレートするバイオミメティックデザインに基づくトークンミキサーであるaggregated attentionを提案する。さらに,従来のクエリやキーと対話する学習可能なトークンを組み込んで,クエリとキーの類似性に頼るだけでなく,親和性行列の生成をさらに多様化する。本手法では,情報交換の積み重ねに頼らず,奥行き劣化を効果的に回避し,自然な視覚知覚を実現する。さらに,GLUとSEのギャップを埋めるチャネルミキサーであるConvolutional GLUを提案する。集約された注意と畳み込みGLUを組み合わせて、TransNeXtと呼ばれる新しいビジュアルバックボーンを作成します。大規模な実験により、TransNeXtは複数のモデルサイズにわたる最先端のパフォーマンスを実現する。 224^2$の解像度で、TransNeXt-Tinyはイメージネットの精度84.0%に達し、69%のパラメータでConvNeXt-Bを上回った。 TransNeXt-Base は ImageNet の精度86.2%、ImageNet-A の精度61.6%を384^2$、COCO オブジェクト検出 mAP 57.1、ADE20K セマンティックセグメンテーション mIoU 54.7 で達成している。 Due to the depth degradation effect in residual connections, many efficient Vision Transformers models that rely on stacking layers for information exchange often fail to form sufficient information mixing, leading to unnatural visual perception. To address this issue, in this paper, we propose Aggregated Attention, a biomimetic design-based token mixer that simulates biological foveal vision and continuous eye movement while enabling each token on the feature map to have a global perception. Furthermore, we incorporate learnable tokens that interact with conventional queries and keys, which further diversifies the generation of affinity matrices beyond merely relying on the similarity between queries and keys. Our approach does not rely on stacking for information exchange, thus effectively avoiding depth degradation and achieving natural visual perception. Additionally, we propose Convolutional GLU, a channel mixer that bridges the gap between GLU and SE mechanism, which empowers each token to have channel attention based on its nearest neighbor image features, enhancing local modeling capability and model robustness. 
We combine aggregated attention and convolutional GLU to create a new visual backbone called TransNeXt. Extensive experiments demonstrate that our TransNeXt achieves state-of-the-art performance across multiple model sizes. At a resolution of $224^2$, TransNeXt-Tiny attains an ImageNet accuracy of 84.0%, surpassing ConvNeXt-B with 69% fewer parameters. Our TransNeXt-Base achieves an ImageNet accuracy of 86.2% and an ImageNet-A accuracy of 61.6% at a resolution of $384^2$, a COCO object detection mAP of 57.1, and an ADE20K semantic segmentation mIoU of 54.7.	翻訳日:2023-11-30 23:38:47 公開日:2023-11-28
# クラス分布推定のための不変仮定 Invariance assumptions for class distribution estimation ( http://arxiv.org/abs/2311.17225v1 ) ライセンス: Link先を確認	Dirk Tasche	(参考訳) データセットシフトによるクラス分布推定の問題について検討する。トレーニングデータセットでは、機能とクラスラベルの両方が観察され、テストデータセットでは、機能のみを観察できる。そのタスクは、テストデータセットにおけるクラスラベルの分布、すなわち、クラス前の確率を推定することである。特徴とラベルのトレーニング共同分布とテスト分布との不変性の推定は,この課題をかなり容易にする。共変量シフト,因子可能なジョイントシフト,スパースジョイントシフトの仮定とそのクラス分布推定への応用について考察する。 We study the problem of class distribution estimation under dataset shift. On the training dataset, both features and class labels are observed while on the test dataset only the features can be observed. The task then is the estimation of the distribution of the class labels, i.e. the estimation of the class prior probabilities, in the test dataset. Assumptions of invariance between the training joint distribution of features and labels and the test distribution can considerably facilitate this task. We discuss the assumptions of covariate shift, factorizable joint shift, and sparse joint shift and their implications for class distribution estimation.	翻訳日:2023-11-30 23:30:57 公開日:2023-11-28
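Of the invariance assumptions named in the entry above, covariate shift gives the simplest estimator. If p(y | x) is the same on training and test data, the test class priors equal the average of the training posterior over the test features, p_test(y) = E_test[p_train(y | x)]. The sketch below illustrates only this identity. The posterior values are hypothetical stand-ins for the output of a classifier fitted on training data, and the abstract does not endorse any particular estimator.

```python
def estimate_test_priors(posteriors):
    """Average per-sample class posteriors over the test set.

    posteriors: one list of class probabilities per test sample, assumed to
    come from a classifier fitted on training data (hypothetical input here).
    Valid as a class-prior estimate only under the covariate shift assumption.
    """
    n_samples = len(posteriors)
    n_classes = len(posteriors[0])
    return [sum(p[c] for p in posteriors) / n_samples for c in range(n_classes)]


# Illustrative posteriors for four unlabeled test samples and two classes.
test_posteriors = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.5, 0.5]]
test_priors = estimate_test_priors(test_posteriors)
```

Under other assumptions discussed in the abstract, such as factorizable joint shift, this plain average is generally not appropriate, because the posterior itself changes between training and test data.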
# 量子フィードバックによる電荷輸送電池 Charge transport battery with quantum feedback ( http://arxiv.org/abs/2311.17219v1 ) ライセンス: Link先を確認	Oscar Bohorquez	(参考訳) バッテリ(英: battery)は、仕事用の蓄電装置、すなわち、他の装置が後で使用する作業形態でエネルギーを蓄電する装置である。本研究では,2つの量子ドットを直列に配置し,異なる化学ポテンシャルで2つの電極に帯電し,マルコフ量子フィードバックプロトコルにより最適化した量子電池の実現について検討する。エルゴトロピーの概念をメリットの図形として用い、まず2レベルシステムにおける最大エルゴトロピーの簡単な表現を確立し、マルコフフィードバックがこの最適なエルゴトロピーを達成するためのパラメータを見つける。また,電池の充電・放電過程に及ぼすフォノン環境との相互作用の影響についても検討した。 A battery is a work storage device, i.e. a device that stores energy in the form of work for later use by other devices. In this work, we study the realization of a quantum battery in a double quantum dot in series, charged by two electrodes at different chemical potentials and optimized by a Markovian quantum feedback protocol. Using the concept of ergotropy as a figure of merit, we first establish a simple expression for the maximum ergotropy in a two-level system, and then find the parameters under which a Markovian feedback can achieve this optimal ergotropy. We also study the influence of interaction with a phonon environment on the charging and discharging process of the battery.	翻訳日:2023-11-30 23:30:47 公開日:2023-11-28
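The entry above uses ergotropy, the maximum work extractable from a quantum state by unitary operations, as its figure of merit. The abstract does not state the authors' expression, so the sketch below uses the standard textbook result for a single two-level system. It assumes the Hamiltonian H = (ω/2)σ_z and a state with Bloch vector (r_x, r_y, r_z), for which the ergotropy is (ω/2)(|r| + r_z). The function name and the convention for H are assumptions made for illustration.

```python
import math


def qubit_ergotropy(rx, ry, rz, omega=1.0):
    """Ergotropy of a qubit with Bloch vector (rx, ry, rz) and H = (omega/2) * sigma_z.

    The state energy is (omega/2) * rz. The passive state with the same
    spectrum has energy -(omega/2) * |r|. Ergotropy is the difference.
    """
    r_norm = math.sqrt(rx * rx + ry * ry + rz * rz)
    return 0.5 * omega * (r_norm + rz)


excited = qubit_ergotropy(0.0, 0.0, 1.0)    # fully excited state
ground = qubit_ergotropy(0.0, 0.0, -1.0)    # ground state, already passive
mixed = qubit_ergotropy(0.0, 0.0, 0.0)      # maximally mixed state
coherent = qubit_ergotropy(1.0, 0.0, 0.0)   # equal superposition
```

In this convention the ergotropy is largest for the fully excited state, where the whole level splitting ω can be extracted, and it vanishes for the ground state and the maximally mixed state.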
# BIM:マスク画像モデリングによるブロックワイズ自己指導型学習 BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling ( http://arxiv.org/abs/2311.17218v1 ) ライセンス: Link先を確認	Yixuan Luo, Mengye Ren, Sai Qian Zhang	(参考訳) 自然言語処理におけるマスク付き言語モデリング(MLM)と同様に、マスク付き画像モデリング(MIM)は、画像パッチから貴重な洞察を抽出し、基盤となるディープニューラルネットワーク(DNN)の機能抽出機能を強化することを目的としている。教師付き学習や教師なしコントラスト学習といった他のトレーニングパラダイムとは対照的に、マスク付き画像モデリング(MIM)の事前トレーニングは、大規模なトレーニングデータバッチ(例えば4096)を管理するために、重要な計算リソースを必要とする。重要なメモリと計算要件は、その広範にわたる採用にとって大きな課題となる。そこで,本稿では,BIM(Block-Wise Masked Image Modeling)と呼ばれる新しい学習フレームワークを導入する。このフレームワークは、MIMタスクを独立した計算パターンを持ついくつかのサブタスクに分解することで、従来のエンドツーエンドアプローチの代わりにブロック単位でのバックプロパゲーション操作を行う。提案するBIMは,従来のMIMよりも優れた性能を維持しつつ,ピークメモリ消費を大幅に削減する。さらに、BIMは様々な深さの多数のDNNバックボーンの同時トレーニングを可能にする。これにより、複数のトレーニングされたDNNバックボーンが作成され、それぞれが異なるコンピューティング機能を備えた異なるハードウェアプラットフォームに適合する。このアプローチは,各DNNバックボーンを個別にトレーニングした場合と比較して,計算コストを大幅に削減する。当社のフレームワークは、MIMのリソース制約付きトレーニングに有望なソリューションを提供します。 Like masked language modeling (MLM) in natural language processing, masked image modeling (MIM) aims to extract valuable insights from image patches to enhance the feature extraction capabilities of the underlying deep neural network (DNN). Contrasted with other training paradigms like supervised learning and unsupervised contrastive learning, masked image modeling (MIM) pretraining typically demands significant computational resources in order to manage large training data batches (e.g., 4096). The significant memory and computation requirements pose a considerable challenge to its broad adoption. To mitigate this, we introduce a novel learning framework, termed Block-Wise Masked Image Modeling (BIM). This framework involves decomposing the MIM tasks into several sub-tasks with independent computation patterns, resulting in block-wise back-propagation operations instead of the traditional end-to-end approach. Our proposed BIM maintains superior performance compared to conventional MIM while greatly reducing peak memory consumption. 
Moreover, BIM naturally enables the concurrent training of numerous DNN backbones of varying depths. This leads to the creation of multiple trained DNN backbones, each tailored to different hardware platforms with distinct computing capabilities. This approach significantly reduces computational costs in comparison with training each DNN backbone individually. Our framework offers a promising solution for resource constrained training of MIM.	翻訳日:2023-11-30 23:30:34 公開日:2023-11-28
# 責任のあるテキスト対画像生成のための自己発見型拡散潜在方向 Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation ( http://arxiv.org/abs/2311.17216v1 ) ライセンス: Link先を確認	Hang Li, Chengzhi Shen, Philip Torr, Volker Tresp, Jindong Gu	(参考訳) 拡散に基づくモデルは、例外的な画像生成能力のため、テキスト・画像生成において大きな人気を集めている。これらのモデルによるリスクは、バイアスや有害な画像などの不適切なコンテンツの潜在的な生成である。しかし、拡散モデルの内部表現の観点から、そのような望ましくないコンテンツを生成する根本的な理由は不明である。以前の研究では、拡散モデルの解釈可能な潜在空間内のベクトルを意味概念として解釈している。しかし、既存のアプローチでは、不適切な概念に関連するような任意の概念の方向を見つけることはできない。本研究では,ある概念に対する解釈可能な潜在方向を見つけるための,新たな自己教師型アプローチを提案する。さらに, 検出ベクトルを用いて, 不適切な生成を緩和するための簡易な手法を提案する。公平な生成,安全な生成,責任のあるテキストエンハンシング生成といった緩和手法の有効性を検証するために,広範な実験が行われてきた。 Diffusion-based models have gained significant popularity for text-to-image generation due to their exceptional image-generation capabilities. A risk with these models is the potential generation of inappropriate content, such as biased or harmful images. However, the underlying reasons for generating such undesired content from the perspective of the diffusion model's internal representation remain unclear. Previous work interprets vectors in an interpretable latent space of diffusion models as semantic concepts. However, existing approaches cannot discover directions for arbitrary concepts, such as those related to inappropriate concepts. In this work, we propose a novel self-supervised approach to find interpretable latent directions for a given concept. With the discovered vectors, we further propose a simple approach to mitigate inappropriate generation. Extensive experiments have been conducted to verify the effectiveness of our mitigation approach, namely, for fair generation, safe generation, and responsible text-enhancing generation.	翻訳日:2023-11-30 23:30:13 公開日:2023-11-28
# 将来の重力波観測所のための2128 nmの30 W超安定レーザー光 30 W ultra-stable laser light at 2128 nm for future gravitational-wave observatories ( http://arxiv.org/abs/2311.17214v1 ) ライセンス: Link先を確認	Julian Gurs, Nina Bode, Christian Darsow-Fromm, Henning Vahlbruch, Pascal Gewecke, Sebastian Steinlechner, Benno Willke, Roman Schnabel	(参考訳) 誘電体ミラーコーティングの熱ノイズは、レーザー光学的な高精度測定を制限しうる。アモルファスシリコンと窒化ケイ素からなるコーティングは、重力波検出器と光時計の両方にとって解決策となりうる。しかし、これらの材料の吸収スペクトルは、約2 $\mu$mのレーザー波長を必要とする。 GW検出器には、数十から数百ワットの超安定レーザー光が必要である。本稿では、マスター発振器電力増幅器システムからの1064 nm光の周波数変換により、2128 nmで30 W近い超安定レーザー光を生成したことを報告する。我々は、光パラメトリック発振により(67.5 $\pm$ 0.5) %の外部変換効率と、100 Hzにおいて$10^{-6}$/$\sqrt{\text{Hz}}$の範囲の相対パワーノイズを達成した。これは入力光とほぼ同程度に低い値であり、我々のアプローチの可能性を示している。 Thermal noise of the dielectric mirror coatings can limit laser-optical high-precision measurements. Coatings made of amorphous silicon and silicon nitride could provide a remedy for both gravitational-wave detectors and optical clocks. However, the absorption spectra of these materials require laser wavelengths around 2 $\mu$m. For GW detectors, ultra-stable laser light of tens or hundreds of watts is needed. Here, we report the production of nearly 30 W of ultra-stable laser light at 2128 nm by frequency conversion of 1064 nm light from a master oscillator power amplifier system. We achieve an external conversion efficiency of (67.5 $\pm$ 0.5) % via optical parametric oscillation and a relative power noise in the range of $10^{-6}$/$\sqrt{\text{Hz}}$ at 100 Hz, which is almost as low as that of the input light and underlines the potential of our approach.	翻訳日:2023-11-30 23:29:59 公開日:2023-11-28
# 胸部放射線レポートからのデータ抽出のための汎用大規模言語モデルとドメイン適応大規模言語モデルの比較 General-Purpose vs. Domain-Adapted Large Language Models for Extraction of Data from Thoracic Radiology Reports ( http://arxiv.org/abs/2311.17213v1 ) ライセンス: Link先を確認	Ali H. Dhanaliwala, Rikhiya Ghosh, Sanjeev Kumar Karn, Poikavila Ullaskrishnan, Oladimeji Farri, Dorin Comaniciu and Charles E. Kahn	(参考訳) 放射線科医は、情報システムによって利用されれば臨床ケアに有用となりうる非構造化データを生成する。しかし、スタイルのばらつきがその利用を制限する。本研究では、胸部放射線レポートから共通データ要素(CDE)を抽出する際の、ドメイン適応言語モデル(RadLing)を用いたシステムと汎用大規模言語モデル(GPT-4)を用いたシステムの性能を比較する。 3人の放射線科医が1300件の胸部レポート(トレーニング900件、テスト400件)からなる後ろ向きデータセットに注釈を付け、事前に選択した21の関連するCDEにマッピングした。 RadLingは文の埋め込みを生成し、コサイン類似度を使ってCDEを識別するために使われ、識別されたCDEは軽量マッパーを使って値にマッピングされた。 GPT-4システムはOpenAIの汎用埋め込みを使用して関連するCDEを識別し、GPT-4を使って値にマッピングした。出力のCDE:valueペアは参照標準と比較され、完全一致を真陽性とみなした。適合率(陽性的中率)はRadLingが96%(2700/2824)、GPT-4が99%(2034/2047)であった。再現率(感度)はRadLingが94%(2700/2876)、GPT-4が70%(2034/2887)であり、その差は統計学的に有意であった(P<.001)。 RadLingのドメイン適応型埋め込みはCDE識別においてより高感度であり(95%対71%)、その軽量マッパーは値の割り当てにおいて同等の適合率(95.4%対95.0%)であった。 RadLingシステムは、放射線レポートからのCDE抽出においてGPT-4システムよりも高い性能を示した。 RadLingシステムのドメイン適応埋め込みは、CDE識別においてOpenAIの汎用埋め込みよりも優れており、その軽量値マッパーは大規模なGPT-4に匹敵する適合率を達成する。 RadLingシステムは、ローカルデプロイメントやランタイムコストの削減など、運用上のメリットを提供する。ドメイン適応型RadLingシステムは、ローカルデプロイメントと低コストのメリットを提供しながら、放射線レポートからの共通データ要素の抽出においてGPT-4システムを上回る。 Radiologists produce unstructured data that could be valuable for clinical care when consumed by information systems. However, variability in style limits usage. Study compares performance of system using domain-adapted language model (RadLing) and general-purpose large language model (GPT-4) in extracting common data elements (CDE) from thoracic radiology reports. Three radiologists annotated a retrospective dataset of 1300 thoracic reports (900 training, 400 test) and mapped to 21 pre-selected relevant CDEs. RadLing was used to generate embeddings for sentences and identify CDEs using cosine-similarity, which were mapped to values using light-weight mapper. GPT-4 system used OpenAI's general-purpose embeddings to identify relevant CDEs and used GPT-4 to map to values. 
The output CDE:value pairs were compared to the reference standard; an identical match was considered true positive. Precision (positive predictive value) was 96% (2700/2824) for RadLing and 99% (2034/2047) for GPT-4. Recall (sensitivity) was 94% (2700/2876) for RadLing and 70% (2034/2887) for GPT-4; the difference was statistically significant (P<.001). RadLing's domain-adapted embeddings were more sensitive in CDE identification (95% vs 71%) and its light-weight mapper had comparable precision in value assignment (95.4% vs 95.0%). RadLing system exhibited higher performance than GPT-4 system in extracting CDEs from radiology reports. RadLing system's domain-adapted embeddings outperform general-purpose embeddings from OpenAI in CDE identification and its light-weight value mapper achieves comparable precision to large GPT-4. RadLing system offers operational advantages including local deployment and reduced runtime costs. Domain-adapted RadLing system surpasses GPT-4 system in extracting common data elements from radiology reports, while providing benefits of local deployment and lower costs.	翻訳日:2023-11-30 23:29:45 公開日:2023-11-28
# トラブルシューティング物理コンピューティングプロジェクトにおける高校生の成長理解のための障害アーチファクトシナリオ Failure Artifact Scenarios to Understand High School Students' Growth in Troubleshooting Physical Computing Projects ( http://arxiv.org/abs/2311.17212v1 ) ライセンス: Link先を確認	L. Morales-Navarro, D. A. Fields, D. Barapatre, Y. B. Kafai	(参考訳) 物理コンピューティングプロジェクトのデバッグは、コンピューティングとエンジニアリングの複数の領域を統合する分野横断的な問題解決を理解するためのリッチなコンテキストを提供する。しかし、ハードウェアやソフトウェアのバグの発見と修正は、特に物理コンピューティングなどの未調査領域において、デバッグに関する学生の学習を理解し、評価することは依然として困難である。本稿では,電子織物(e-textiles)のデバッギングとトラブルシューティングに対する学生のアプローチの変化を研究するために,「障害アーティファクトシナリオ(failure artifact scenarios)」の開発とパイロットを行うための,臨床面接の豊富な歴史について述べる。 8週間のe-textilesユニットの前後で臨床面接プロトコルを適用した。 4つの学校における18人の学生のプレ/ポスト臨床面接の分析を行った。分析の結果、学生はより特定度の高いバグを識別し、また複数のバグの原因を検討することで改善した。本稿では,物理コンピューティングにおけるコンテキスト化されたデバッグシナリオを通じて,学生のデバッグ能力を評価するツールの開発について論じる。 Debugging physical computing projects provides a rich context to understand cross-disciplinary problem solving that integrates multiple domains of computing and engineering. Yet understanding and assessing students' learning of debugging remains a challenge, particularly in understudied areas such as physical computing, since finding and fixing hardware and software bugs is a deeply contextual practice. In this paper we draw on the rich history of clinical interviews to develop and pilot "failure artifact scenarios" in order to study changes in students' approaches to debugging and troubleshooting electronic textiles (e-textiles). We applied this clinical interview protocol before and after an eight-week-long e-textiles unit. We analyzed pre/post clinical interviews from 18 students at four different schools. The analysis revealed that students improved in identifying bugs with greater specificity, and across domains, and in considering multiple causes for bugs. We discuss implications for developing tools to assess students' debugging abilities through contextualized debugging scenarios in physical computing.	翻訳日:2023-11-30 23:29:07 公開日:2023-11-28
# 脳信号からの感情認識のための最適脳波電極セット : 経験的探求 Optimal EEG Electrode Set for Emotion Recognition From Brain Signals: An Empirical Quest ( http://arxiv.org/abs/2311.17204v1 ) ライセンス: Link先を確認	Rumman Ahmed Prodhan, Sumya Akter, Tanmoy Sarkar Pias, Md. Akhtaruzzaman Adnan	(参考訳) 人間の脳は複雑な器官であり、まだ完全には発見されていない。生存とは別に、人間の脳は感情を刺激する。近年の研究では、脳信号が感情認識に非常に有効であることが示されている。しかし、脳のどの部分がほとんどの感情を示すかはまだ解明されていない。本研究では,感情提示における脳の各部分の寄与を経験的に分析する。私たちはDEAPデータセットを使用して、感情に関連する効果的な脳の部分につながる最も最適な電極セットを見つけます。効率的な特徴抽出にはFast Fourier Transformation, 分類には残差接続を持つ1D-CNNを用いる。 DEAPデータセットの32電極の精度は97.34%であったが、わずか12電極(F7、P8、O1、F8、C4、T7、PO3、Fp1、Fp2、O2、P3、Fz)で95.81%の精度が得られた。また,10以上の電極を添加しても性能は大きくは向上しないことを示した。さらに、前頭葉は感情を認識する上で最も重要なものである。 The human brain is a complex organ, still completely undiscovered, that controls almost all the parts of the body. Apart from survival, the human brain stimulates emotions. Recent research indicates that brain signals can be very effective for emotion recognition. However, which parts of the brain exhibit most of the emotions is still under-explored. In this study, we empirically analyze the contribution of each part of the brain in exhibiting emotions. We use the DEAP dataset to find the most optimal electrode set which eventually leads to the effective brain part associated with emotions. We use Fast Fourier Transformation for effective feature extraction and a 1D-CNN with residual connection for classification. Though 32 electrodes from the DEAP dataset got an accuracy of 97.34%, only 12 electrodes (F7, P8, O1, F8, C4, T7, PO3, Fp1, Fp2, O2, P3, and Fz) achieve 95.81% accuracy. This study also shows that adding more than 10 electrodes does not improve performance significantly. Moreover, the frontal lobe is the most important for recognizing emotion.	翻訳日:2023-11-30 23:28:47 公開日:2023-11-28
# 時間集約型プログラムのグレーボックスファジング Greybox fuzzing time-intensive programs ( http://arxiv.org/abs/2311.17200v1 ) ライセンス: Link先を確認	Steve Huntsman	(参考訳) 幾何学的な観点から(指向性)グレーボックスファジングを調べ、入力および(動的統計を伴う)制御フローグラフ上の非類似度を基本的な対象として捉える。我々は、この視点を取り入れた時間集約型プログラムのためのグレーボックスファザであるGoExploreFuzzのプロトタイプを作成し、評価を行った。その結果、これまで十分に活用されてこなかったグレーボックスファジングの有用な機能、特に経路の多様性の定量化と変異の「帯域幅」の自律的な調整が示された。 We examine (directed) greybox fuzzing from a geometrical perspective, viewing dissimilarities on inputs and on control flow graphs (with dynamical statistics) as primitive objects of interest. We prototype and evaluate GoExploreFuzz, a greybox fuzzer for time-intensive programs that incorporates this perspective. The results indicate useful capabilities for greybox fuzzing that have hitherto been underutilized, notably quantifying the diversity of paths and autonomously tuning the "bandwidth" of mutations.	翻訳日:2023-11-30 23:28:30 公開日:2023-11-28
# Minimax Exploiter: 競争力のあるセルフプレイのためのデータ効率の良いアプローチ Minimax Exploiter: A Data Efficient Approach for Competitive Self-Play ( http://arxiv.org/abs/2311.17190v1 ) ライセンス: Link先を確認	Daniel Bairamian, Philippe Marcotte, Joshua Romoff, Gabriel Robert, Derek Nowrouzezahrai	(参考訳) 近年のCSP(Competitive Self-Play)の進歩は、分散マルチエージェント強化学習(MARL)を用いて、Dota 2やStarCraft IIのような複雑なゲーム環境での人間レベルのパフォーマンスを達成または超えた。これらの方法のコアコンポーネントの1つは、メインエージェント、このエージェントの過去のバージョン、エクスプロイターエージェントからなる学習エージェントのプールを作成することであり、エクスプロイターエージェントはメインエージェントへの対抗戦略を学習する。これらのアプローチの大きな欠点は、システムのトレーニングに必要な計算コストと物理的時間が大きく、ビデオゲームの制作のような高度に反復的な実生活環境でのデプロイが現実的でないことである。本稿では,対戦相手に関する知識を活用してメインエージェントを攻略するゲーム理論的なアプローチであるMinimax Exploiterを提案し、データ効率を大幅に向上させる。我々は、単純なターンベースのゲーム、アーケード学習環境、そして現代のビデオゲームであるFor Honorを含む、様々な設定で我々のアプローチを検証する。 Minimax Exploiterは強力なベースラインを一貫して上回り、安定性とデータ効率の向上を示し、柔軟でデプロイが容易な、堅牢なCSP-MARLメソッドを実現している。 Recent advances in Competitive Self-Play (CSP) have achieved, or even surpassed, human level performance in complex game environments such as Dota 2 and StarCraft II using Distributed Multi-Agent Reinforcement Learning (MARL). One core component of these methods relies on creating a pool of learning agents -- consisting of the Main Agent, past versions of this agent, and Exploiter Agents -- where Exploiter Agents learn counter-strategies to the Main Agents. A key drawback of these approaches is the large computational cost and physical time that is required to train the system, making them impractical to deploy in highly iterative real-life settings such as video game productions. In this paper, we propose the Minimax Exploiter, a game theoretic approach to exploiting Main Agents that leverages knowledge of its opponents, leading to significant increases in data efficiency. We validate our approach in a diversity of settings, including simple turn based games, the arcade learning environment, and For Honor, a modern video game. 
The Minimax Exploiter consistently outperforms strong baselines, demonstrating improved stability and data efficiency, leading to a robust CSP-MARL method that is both flexible and easy to deploy.	翻訳日:2023-11-30 23:28:19 公開日:2023-11-28
# 量子未来への投資 : 量子ベンチャーキャピタルの現状と今後の展開 Investing in the Quantum Future : State of Play and Way Forward for Quantum Venture Capital ( http://arxiv.org/abs/2311.17187v1 ) ライセンス: Link先を確認	Christophe Jurczak	(参考訳) 何十年もの基本的な研究に基づいて、コンピューティング、センシング、ネットワークの分野で量子科学の新しい応用が生まれ始めている。現在のデプロイメントのフェーズでは、量子技術はまだ日常的に使用されていないが、まだ実験室から離脱しているため、VC(Venture Capital)が不可欠である。公的資金調達プログラムに関連して、VCは学術機関で生まれたスタートアップを支援し、社会に最も大きな影響を与えるアプリケーションに向けてエコシステムの優先順位を構造化する役割を担っている。本論では, 量子ファンドQuantonation Iのケーススタディを用いて, 科学知識の創出, 雇用創出, 産業界への資金提供に対するその影響を詳述する。本稿は、新しいスタートアップの出現を支える概念を紹介し、スケールアップ量子企業の資金調達を提唱する。この論文は、社会への関与を向上し、大きな社会的利益を生かしたアプリケーションに焦点を当てたプロジェクトへの協力を求めることで、業界への影響を改善するための提案を締めくくっている。 Building on decades of fundamental research, new applications of Quantum Science have started to emerge in the fields of computing, sensing and networks. In the current phase of deployment, in which quantum technology is not yet in routine use but is still transitioning out of the laboratory, Venture Capital (VC) is critical. In association with public funding programs, VC supports startups born in academic institutions and has a role to play in structuring the priorities of the ecosystem, guiding it towards applications with the greatest impact on society. This paper illustrates this thesis with a case-study: the experience of the first dedicated quantum fund, Quantonation I, chronicling its impacts on the production of scientific knowledge, job creation and funding of the industry. The paper introduces concepts to support the emergence of new startups and advocates for funding of scale-up quantum companies. The paper concludes with proposals to improve the impact of the industry by taking steps to better involve society-at-large and with a call for collaboration on projects focused on the applications with a large societal benefit.	翻訳日:2023-11-30 23:27:56 公開日:2023-11-28
# SatCLIP:衛星画像によるグローバルな汎用位置情報埋め込み SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery ( http://arxiv.org/abs/2311.17179v1 ) ライセンス: Link先を確認	Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, Marc Rußwurm	(参考訳) 地理的な位置は、生態学から疫学、地球系科学まで幅広い分野のモデリングタスクに不可欠である。しかし、ある場所の関連性が高く有意義な特徴を抽出することは困難であり、しばしばグローバル画像データセットからの高価なデータ融合やデータ蒸留を伴う。この課題に対処するために,公開されている衛星画像から位置の暗黙的な表現を学習する,グローバルで汎用的な地理的位置エンコーダSatCLIP(Satellite Contrastive Location-Image Pretraining)を導入する。訓練された位置エンコーダは、様々な下流タスクで便利に使用できるよう、任意の位置の特性を要約したベクトル埋め込みを提供する。本研究では,全球からサンプリングした多スペクトルSentinel-2衛星データで事前学習されたSatCLIP埋め込みを,温度予測や画像における動物認識,人口密度推定など,位置情報には依存するが必ずしも衛星画像には依存しない様々な予測タスクに使用できることを示す。タスク全体にわたって、SatCLIP埋め込みは、自然画像でトレーニングされたモデルからセマンティックコンテキストでトレーニングされたモデルまで、既存のトレーニング済みロケーションエンコーダからの埋め込みを一貫して上回っています。 SatCLIP埋め込みは地理的一般化の改善にも役立つ。このことは、汎用的な位置エンコーダの可能性を示し、地理空間データの広大で多様で、ほとんど利用されていないモダリティから惑星の有意義な表現を学ぶための扉を開く。 Geographic location is essential for modeling tasks in fields ranging from ecology to epidemiology to the Earth system sciences. However, extracting relevant and meaningful characteristics of a location can be challenging, often entailing expensive data fusion or data distillation from global imagery datasets. To address this challenge, we introduce Satellite Contrastive Location-Image Pretraining (SatCLIP), a global, general-purpose geographic location encoder that learns an implicit representation of locations from openly available satellite imagery. Trained location encoders provide vector embeddings summarizing the characteristics of any given location for convenient usage in diverse downstream tasks. We show that SatCLIP embeddings, pretrained on globally sampled multi-spectral Sentinel-2 satellite data, can be used in various predictive tasks that depend on location information but not necessarily satellite imagery, including temperature prediction, animal recognition in imagery, and population density estimation. 
Across tasks, SatCLIP embeddings consistently outperform embeddings from existing pretrained location encoders, ranging from models trained on natural images to models trained on semantic context. SatCLIP embeddings also help to improve geographic generalization. This demonstrates the potential of general-purpose location encoders and opens the door to learning meaningful representations of our planet from the vast, varied, and largely untapped modalities of geospatial data.	翻訳日:2023-11-30 23:27:36 公開日:2023-11-28
# THInImg:画像中のトーキングヘッド提示のためのクロスモーダルステガノグラフィ THInImg: Cross-modal Steganography for Presenting Talking Heads in Images ( http://arxiv.org/abs/2311.17177v1 ) ライセンス: Link先を確認	Lin Zhao, Hongxuan Li, Xuefei Ning, Xinru Jiang	(参考訳) クロスモーダル・ステガノグラフィ(Cross-modal Steganography)は、公開されているカバー信号(秘密信号とは異なるモダリティ)の中に秘密信号を目立たないように隠蔽する手法である。従来のアプローチは比較的少量の情報の隠蔽に主眼を置いていたが、我々は人間の顔の特性を活かし、長大な音声データを身元確認画像内に隠蔽し(後にトーキングヘッドビデオを復号する)THInImgを提案する。これは秘匿通信、伝送、著作権保護に有効に利用できる。 THInImgはエンコーダとデコーダの2つの部分からなる。エンコーダ・デコーダパイプライン内で,画像に音声を隠蔽する容量を大幅に向上させる新しいアーキテクチャを導入する。さらに、我々のフレームワークは、複数のオーディオクリップをIDイメージに反復的に隠すように拡張することができ、パーミッションに対する複数のレベルの制御を提供する。提案手法の有効性を実証するために広範な実験を行い,160x160解像度のアイデンティティ画像において,THInImgが最大80秒の高品質なトーキングヘッドビデオ(音声を含む)を提示できることを実証した。 Cross-modal Steganography is the practice of concealing secret signals in publicly available cover signals (distinct from the modality of the secret signals) unobtrusively. While previous approaches primarily concentrated on concealing a relatively small amount of information, we propose THInImg, which manages to hide lengthy audio data (and subsequently decode talking head video) inside an identity image by leveraging the properties of human face, which can be effectively utilized for covert communication, transmission and copyright protection. THInImg consists of two parts: the encoder and decoder. Inside the encoder-decoder pipeline, we introduce a novel architecture that substantially increases the capacity of hiding audio in images. Moreover, our framework can be extended to iteratively hide multiple audio clips into an identity image, offering multiple levels of control over permissions. We conduct extensive experiments to prove the effectiveness of our method, demonstrating that THInImg can present up to 80 seconds of high quality talking-head video (including audio) in an identity image with 160x160 resolution.	翻訳日:2023-11-30 23:27:11 公開日:2023-11-28
# 患者生存モデルのためのパーソナライズされた不確かさ定量化フレームワーク--基底的真理のない転移性脳腫瘍患者の個人的不確実性の推定 A personalized Uncertainty Quantification framework for patient survival models: estimating individual uncertainty of patients with metastatic brain tumors in the absence of ground truth ( http://arxiv.org/abs/2311.17173v1 ) ライセンス: Link先を確認	Yuqi Wang, Aarzu Gupta, David Carpenter, Trey Mullikin, Zachary J. Reitman, Scott Floyd, John Kirkpatrick, Joseph K. Salama, Paul W. Sperduto, Jian-Guo Liu, Mustafa R. Bashir, Kyle J. Lafata	(参考訳) 基底的真理が存在しない状況で患者生存モデルの不確実性を推定する新しい不確実性定量化(Uncertainty Quantification, UQ)フレームワークを開発するため,2015年1月から2020年12月にかけて脳転移に対して定位放射線治療(SRS)を受けた1383例のデータセットに基づいて手法を開発し評価した。我々の動機となる仮説は、テスト患者がトレーニングセットの患者と特徴空間上でより類似しているほど、推論時の生存時間(time-to-event)予測はより確実である、というものである。したがって、特定の患者に対する不確実性は、患者類似度ランクと予測類似度ランクとの一致指数で表される。モデル不確実性は、モデルAUCと比較した最大不確実性制約AUCの増加率として定義された。 CoxPH、conditional survival forest(CSF)、neural multi-task linear regression(NMTLR)などの統計モデルおよび非統計モデルを用いて,頭蓋内進展までの時間(ICP),SRS後の無増悪生存期間(PFS),全生存期間(OS),ICPおよび/または死亡までの時間(ICPD)を含む複数の臨床関連エンドポイントについて検討した。以上の結果から,全てのモデルにおいてICPで不確実性が最も低く(2.21%),ICPDで不確実性が最も高かった(17.28%)。 OSモデルは不確実性のばらつきが大きく、NMTLRが最低(1.96%)、CSFが最高(14.29%)であった。結論として,本手法は個々の患者の生存モデリング結果の不確かさを推定できる。予想通り, モデルの不確かさが増大するにつれて, 特徴空間と予測結果との類似性が低下することを示した。 To develop a novel Uncertainty Quantification (UQ) framework to estimate the uncertainty of patient survival models in the absence of ground truth, we developed and evaluated our approach based on a dataset of 1383 patients treated with stereotactic radiosurgery (SRS) for brain metastases between January 2015 and December 2020. Our motivating hypothesis is that a time-to-event prediction of a test patient on inference is more certain given a higher feature-space-similarity to patients in the training set. Therefore, the uncertainty for a particular patient-of-interest is represented by the concordance index between a patient similarity rank and a prediction similarity rank. 
Model uncertainty was defined as the increased percentage of the max uncertainty-constrained-AUC compared to the model AUC. We evaluated our method on multiple clinically-relevant endpoints, including time to intracranial progression (ICP), progression-free survival (PFS) after SRS, overall survival (OS), and time to ICP and/or death (ICPD), on a variety of both statistical and non-statistical models, including CoxPH, conditional survival forest (CSF), and neural multi-task linear regression (NMTLR). Our results show that all models had the lowest uncertainty on ICP (2.21%) and the highest uncertainty (17.28%) on ICPD. OS models demonstrated high variation in uncertainty performance, where NMTLR had the lowest uncertainty (1.96%) and CSF had the highest uncertainty (14.29%). In conclusion, our method can estimate the uncertainty of individual patient survival modeling results. As expected, our data empirically demonstrate that as model uncertainty measured via our technique increases, the similarity between a feature-space and its predicted outcome decreases.	翻訳日:2023-11-30 23:26:49 公開日:2023-11-28
# 超伝導量子ハードウェアのためのQICK(Quantum Instrumentation Control Kit)による実験的進展 Experimental advances with the QICK (Quantum Instrumentation Control Kit) for superconducting quantum hardware ( http://arxiv.org/abs/2311.17171v1 ) ライセンス: Link先を確認	Chunyang Ding, Martin Di Federico, Michael Hatridge, Andrew Houck, Sebastien Leger, Jeronimo Martinez, Connie Miao, David I. Schuster, Leandro Stefanazzi, Chris Stoughton, Sara Sussman, Ken Treptow, Sho Uemura, Neal Wilcer, Helin Zhang, Chao Zhou, and Gustavo Cancelo	(参考訳) QICKは2022年に初めて導入されたスタンドアロンのオープンソースキュービットコントローラである。このフォローアップ研究では、超伝導量子ビット系においてQICKによって初めて可能になった最近の実験事例について述べる。これには多重化された信号生成と読み出し、ミキサーフリー読み出し、事前歪み補正された(pre-distorted)高速フラックスパルス、および高忠実度パラメトリックエンタングリングゲートを含むパラメトリック動作のための位相コヒーレントパルスが含まれる。これらの実験にどのようにQICKが使われたのかを詳細に説明する。 The QICK is a standalone open source qubit controller that was first introduced in 2022. In this follow-up work, we present recent experimental use cases that the QICK uniquely enabled for superconducting qubit systems. These include multiplexed signal generation and readout, mixer-free readout, pre-distorted fast flux pulses, and phase-coherent pulses for parametric operations, including high-fidelity parametric entangling gates. We explain in detail how the QICK was used to enable these experiments.	翻訳日:2023-11-30 23:26:12 公開日:2023-11-28
# SoUnDフレームワーク: (Un)structured (D)ataにおける (So)cial Representationの解析 SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata ( http://arxiv.org/abs/2311.17259v1 ) ライセンス: Link先を確認	Mark Díaz, Sunipa Dev, Emily Reif, Remi Denton, Vinodkumar Prabhakaran	(参考訳) 基礎モデル開発で使用されるデータの非構造化の性質は、データの使用やドキュメントの決定を行うための体系的な分析の課題である。責任あるAIの観点からすると、これらの決定は、データにおける人々の表現方法を理解することに依存することが多い。本稿では,非構造化データにおける人間表現の分析を指導し,下流リスクを識別するための枠組みを提案する。このフレームワークをCommon Crawl Web text corpus (C4) と LAION-400M の2つの例に適用する。また、データセットの使用、開発、およびドキュメントのサービスにおける一連の仮定的なアクションステップも提案する。 The unstructured nature of data used in foundation model development is a challenge to systematic analyses for making data use and documentation decisions. From a Responsible AI perspective, these decisions often rely upon understanding how people are represented in data. We propose a framework designed to guide analysis of human representation in unstructured data and identify downstream risks. We apply the framework in two toy examples using the Common Crawl web text corpus (C4) and LAION-400M. We also propose a set of hypothetical action steps in service of dataset use, development, and documentation.	翻訳日:2023-11-30 23:19:07 公開日:2023-11-28
# トラヒックドメイン固有の特徴のグラフベース関連を用いたトラヒックのパターン検索 Pattern retrieval of traffic congestion using graph-based associations of traffic domain-specific features ( http://arxiv.org/abs/2311.17256v1 ) ライセンス: Link先を確認	Tin T. Nguyen, Simeon C. Calvert, Guopeng Li, Hans van Lint	(参考訳) トラフィックデータの急増は、トラフィックのダイナミクスに関するより洞察力のある情報を明らかにする多くの機会をもたらします。しかし、情報検索が重要な機能である効果的なデータベース管理システムも要求している。大きなデータセットに類似したパターンを見つける能力は、トラフィック管理におけるさらなる価値ある分析の道を開く可能性がある。本稿では,高速道路渋滞の時空間パターンに対するコンテンツベース検索システムを提案する。私たちのフレームワークにはパターン表現と類似度測定という2つの主要なコンポーネントがあります。検索結果を効果的に解釈するために, 基本的トラフィック現象をノードとして符号化し, 時空間の関係をエッジとして表現するグラフベース手法(リレーグラフ)を提案する。後者のコンポーネントでは、混雑パターン間の類似性はユーザの期待に応じてさまざまな側面でカスタマイズできる。様々な複雑なパターン(時間的・空間的)のデータセットに適用することで,提案手法の評価を行った。サンプルクエリは,提案手法の有効性,すなわち得られたパターンが与えられた例と同様の交通現象を示すことを示す。さらに,提案手法の成功は,基本トラヒック現象を関連付けるために関係グラフの概念を採用することにより,期待されるパターンを記述する意味検索の新たな機会を直接導出する。 The fast-growing amount of traffic data brings many opportunities for revealing more insightful information about traffic dynamics. However, it also demands an effective database management system in which information retrieval is arguably an important feature. The ability to locate similar patterns in big datasets potentially paves the way for further valuable analyses in traffic management. This paper proposes a content-based retrieval system for spatiotemporal patterns of highway traffic congestion. There are two main components in our framework, namely pattern representation and similarity measurement. To effectively interpret retrieval outcomes, the paper proposes a graph-based approach (relation-graph) for the former component, in which fundamental traffic phenomena are encoded as nodes and their spatiotemporal relationships as edges. In the latter component, the similarities between congestion patterns are customizable with various aspects according to user expectations. We evaluated the proposed framework by applying it to a dataset of hundreds of patterns with various complexities (temporally and spatially). 
The example queries indicate the effectiveness of the proposed method, i.e. the obtained patterns present similar traffic phenomena as in the given examples. In addition, the success of the proposed approach directly derives a new opportunity for semantic retrieval, in which expected patterns are described by adopting the relation-graph notion to associate fundamental traffic phenomena.	翻訳日:2023-11-30 23:18:50 公開日:2023-11-28
# SubZero: サブスペース・ゼロショットMRI再構成 SubZero: Subspace Zero-Shot MRI Reconstruction ( http://arxiv.org/abs/2311.17251v1 ) ライセンス: Link先を確認	Heng Yu, Yamin Arefeen, Berkin Bilgic	(参考訳) 最近導入されたゼロショット自己教師学習(ZS-SSL)は、スキャン固有のシナリオでMRIを高速化する可能性を示し、大規模なトレーニングデータセットにアクセスせずに高品質な再構築を可能にした。 ZS-SSLは2D T2シャッフル取得を加速するために、サブスペースモデルとさらに組み合わされている。本研究では,サブスペースベースのゼロショット自己教師型学習を改良し,より高いアクセラレーション係数を実現するために、並列ネットワークフレームワークを提案し,アテンション機構を導入する。提案手法をSubZeroと命名し,T1およびT2マッピングの取得において現在の手法と比較して性能が向上することを実証する。 Recently introduced zero-shot self-supervised learning (ZS-SSL) has shown potential in accelerated MRI in a scan-specific scenario, which enabled high-quality reconstructions without access to a large training dataset. ZS-SSL has been further combined with the subspace model to accelerate 2D T2-shuffling acquisitions. In this work, we propose a parallel network framework and introduce an attention mechanism to improve subspace-based zero-shot self-supervised learning and enable higher acceleration factors. We name our method SubZero and demonstrate that it can achieve improved performance compared with current methods in T1 and T2 mapping acquisitions.	翻訳日:2023-11-30 23:18:18 公開日:2023-11-28
# 量子場理論学習のためのフーリエニューラル微分方程式 Fourier Neural Differential Equations for learning Quantum Field Theories ( http://arxiv.org/abs/2311.17250v1 ) ライセンス: Link先を確認	Isaac Brant, Alexander Norcliffe and Pietro Liò	(参考訳) 量子場理論は相互作用ハミルトニアンによって定義され、散乱行列によって実験データにリンクされる。散乱行列は摂動級数として計算され、時間に関する1階微分方程式として簡潔に表される。ニューラル微分方程式(NDE)は、残差ネットワークの隠れ状態の時間微分を学習し、物理的制約のある微分方程式の学習に有効であることが示されている。したがって、NDEを用いて粒子散乱行列を学習することは、実験と理論を現象論的に結びつける可能性を提示する。本稿では、NDEモデルを用いて、$\phi^4$理論、Scalar-Yukawa理論、Scalar Quantum Electrodynamicsを学ぶ。新たなNDEアーキテクチャとして、NDE積分とフーリエネットワーク畳み込みを組み合わせたフーリエニューラル微分方程式(FNDE)も導入する。 FNDEモデルは、積分を伴わない同等のFNOモデルよりも優れた汎化性を示す。また、散乱データで訓練することにより、理論の相互作用ハミルトニアンをネットワークパラメータから抽出できることも示す。 A Quantum Field Theory is defined by its interaction Hamiltonian, and linked to experimental data by the scattering matrix. The scattering matrix is calculated as a perturbative series, and represented succinctly as a first order differential equation in time. Neural Differential Equations (NDEs) learn the time derivative of a residual network's hidden state, and have proven efficacy in learning differential equations with physical constraints. Hence using an NDE to learn particle scattering matrices presents a possible experiment-theory phenomenological connection. In this paper, NDE models are used to learn $\phi^4$ theory, Scalar-Yukawa theory and Scalar Quantum Electrodynamics. A new NDE architecture is also introduced, the Fourier Neural Differential Equation (FNDE), which combines NDE integration and Fourier network convolution. The FNDE model demonstrates better generalisability than the non-integrated equivalent FNO model. It is also shown that by training on scattering data, the interaction Hamiltonian of a theory can be extracted from network parameters.	翻訳日:2023-11-30 23:17:59 公開日:2023-11-28
# On Commutative Penalty Functions in Parent-Hamiltonian Constructions ( http://arxiv.org/abs/2311.17249v1 ) License: see link	Jacob Biamonte	There are several known techniques to construct a Hamiltonian with an expected value that is minimized uniquely by a given quantum state. Common approaches include the parent Hamiltonian construction from matrix product states, building approximate ground state projectors, and, in a common case, developing penalty functions from the generalized Ising model. Here we consider the framework that enables one to engineer exact parent Hamiltonians from commuting polynomials. We derive elementary classification results for quadratic Ising parent Hamiltonians and derive a generally non-injective parent Hamiltonian construction. We also consider that any $n$-qubit stabilizer state has a commutative parent Hamiltonian with $n+1$ terms and we develop an approach that allows the derivation of parent Hamiltonians by composition of network elements that embed the truth tables of discrete functions into a kernel space. This work presents a unifying framework that captures components of what is known about exact parent Hamiltonians and bridges a few techniques across the domains that are concerned with such constructions.	Translated: 2023-11-30 23:17:28 Published: 2023-11-28
# Deep Regularized Compound Gaussian Network for Solving Linear Inverse Problems ( http://arxiv.org/abs/2311.17248v1 ) License: see link	Carter Lyons, Raghu G. Raj, and Margaret Cheney	Incorporating prior information into inverse problems, e.g. via maximum-a-posteriori estimation, is an important technique for facilitating robust inverse problem solutions. In this paper, we devise two novel approaches for linear inverse problems that permit problem-specific statistical prior selections within the compound Gaussian (CG) class of distributions. The CG class subsumes many commonly used priors in signal and image reconstruction methods including those of sparsity-based approaches. The first method developed is an iterative algorithm, called generalized compound Gaussian least squares (G-CG-LS), that minimizes a regularized least squares objective function where the regularization enforces a CG prior. G-CG-LS is then unrolled, or unfolded, to furnish our second method, which is a novel deep regularized (DR) neural network, called DR-CG-Net, that learns the prior information. A detailed computational theory on convergence properties of G-CG-LS and thorough numerical experiments for DR-CG-Net are provided. Due to the comprehensive nature of the CG prior, these experiments show that our unrolled DR-CG-Net outperforms competitive prior art methods in tomographic imaging and compressive sensing, especially in challenging low-training scenarios.	Translated: 2023-11-30 23:16:29 Published: 2023-11-28
# LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS ( http://arxiv.org/abs/2311.17245v1 ) License: see link	Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang	Recent advancements in real-time neural rendering using point-based techniques have paved the way for the widespread adoption of 3D representations. However, foundational approaches like 3D Gaussian Splatting come with a substantial storage overhead caused by growing the SfM points to millions, often demanding gigabyte-level disk space for a single unbounded scene, posing significant scalability challenges and hindering the splatting efficiency. To address this challenge, we introduce LightGaussian, a novel method designed to transform 3D Gaussians into a more efficient and compact format. Drawing inspiration from the concept of Network Pruning, LightGaussian identifies Gaussians that are insignificant in contributing to the scene reconstruction and adopts a pruning and recovery process, effectively reducing redundancy in Gaussian counts while preserving visual effects. Additionally, LightGaussian employs distillation and pseudo-view augmentation to distill spherical harmonics to a lower degree, allowing knowledge transfer to more compact representations while maintaining reflectance. Furthermore, we propose a hybrid scheme, VecTree Quantization, to quantize all attributes, resulting in lower bitwidth representations with minimal accuracy losses. In summary, LightGaussian achieves an averaged compression rate over 15x while boosting the FPS from 139 to 215, enabling an efficient representation of complex scenes on Mip-NeRF 360, Tank and Temple datasets. Project website: https://lightgaussian.github.io/	Translated: 2023-11-30 23:15:52 Published: 2023-11-28
# PHG-Net: Persistent Homology Guided Medical Image Classification ( http://arxiv.org/abs/2311.17243v1 ) License: see link	Yaopeng Peng, Hongxiao Wang, Milan Sonka and Danny Z. Chen	Modern deep neural networks have achieved great successes in medical image analysis. However, the features captured by convolutional neural networks (CNNs) or Transformers tend to be optimized for pixel intensities and neglect key anatomical structures such as connected components and loops. In this paper, we propose a persistent homology guided approach (PHG-Net) that explores topological features of objects for medical image classification. For an input image, we first compute its cubical persistence diagram and extract topological features into a vector representation using a small neural network (called the PH module). The extracted topological features are then incorporated into the feature map generated by CNN or Transformer for feature fusion. The PH module is lightweight and capable of integrating topological features into any CNN or Transformer architectures in an end-to-end fashion. We evaluate our PHG-Net on three public datasets and demonstrate its considerable improvements on the target classification tasks over state-of-the-art methods.	Translated: 2023-11-30 23:15:19 Published: 2023-11-28
# End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames ( http://arxiv.org/abs/2311.17241v1 ) License: see link	Shuming Liu, Chen-Lin Zhang, Chen Zhao, Bernard Ghanem	Recently, temporal action detection (TAD) has seen significant performance improvement with end-to-end training. However, due to the memory bottleneck, only models with limited scales and limited data volumes can afford end-to-end training, which inevitably restricts TAD performance. In this paper, we reduce the memory consumption for end-to-end training, and manage to scale up the TAD backbone to 1 billion parameters and the input video to 1,536 frames, leading to significant detection performance. The key to our approach lies in our proposed temporal-informative adapter (TIA), which is a novel lightweight module that reduces training memory. Using TIA, we free the humongous backbone from learning to adapt to the TAD task by only updating the parameters in TIA. TIA also leads to better TAD representation by temporally aggregating context from adjacent frames throughout the backbone. We evaluate our model across four representative datasets. Owing to our efficient design, we are able to train end-to-end on VideoMAEv2-giant and achieve 75.4% mAP on THUMOS14, being the first end-to-end model to outperform the best feature-based methods.	Translated: 2023-11-30 23:15:01 Published: 2023-11-28
# Promise Clique Homology on weighted graphs is $\text{QMA}_1$-hard and contained in $\text{QMA}$ ( http://arxiv.org/abs/2311.17234v1 ) License: see link	Robbie King, Tamara Kohler	We study the complexity of a classic problem in computational topology, the homology problem: given a description of some space $X$ and an integer $k$, decide if $X$ contains a $k$-dimensional hole. The setting and statement of the homology problem are completely classical, yet we find that the complexity is characterized by quantum complexity classes. Our result can be seen as an aspect of a connection between homology and supersymmetric quantum mechanics [Wit82]. We consider clique complexes, motivated by the practical application of topological data analysis (TDA). The clique complex of a graph is the simplicial complex formed by declaring every $k+1$-clique in the graph to be a $k$-simplex. Our main result is that deciding whether the clique complex of a weighted graph has a hole or not, given a suitable promise, is $\text{QMA}_1$-hard and contained in $\text{QMA}$. Our main innovation is a technique to lower bound the eigenvalues of the combinatorial Laplacian operator. For this, we invoke a tool from algebraic topology known as spectral sequences. In particular, we exploit a connection between spectral sequences and Hodge theory [For94]. Spectral sequences will play a role analogous to perturbation theory for combinatorial Laplacians. In addition, we develop the simplicial surgery technique used in prior work [CK22]. Our result provides some suggestion that the quantum TDA algorithm [LGZ16] cannot be dequantized. More broadly, we hope that our results will open up new possibilities for quantum advantage in topological data analysis.	Translated: 2023-11-30 23:14:39 Published: 2023-11-28
# Quantifying the redundancy between prosody and text ( http://arxiv.org/abs/2311.17233v1 ) License: see link	Lukas Wolf, Tiago Pimentel, Evelina Fedorenko, Ryan Cotterell, Alex Warstadt, Ethan Wilcox, Tamar Regev	Prosody -- the suprasegmental component of speech, including pitch, loudness, and tempo -- carries critical aspects of meaning. However, the relationship between the information conveyed by prosody vs. by the words themselves remains poorly understood. We use large language models (LLMs) to estimate how much information is redundant between prosody and the words themselves. Using a large spoken corpus of English audiobooks, we extract prosodic features aligned to individual words and test how well they can be predicted from LLM embeddings, compared to non-contextual word embeddings. We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features, including intensity, duration, pauses, and pitch contours. Furthermore, a word's prosodic information is redundant with both the word itself and the context preceding as well as following it. Still, we observe that prosodic features cannot be fully predicted from text, suggesting that prosody carries information above and beyond the words. Along with this paper, we release a general-purpose data processing pipeline for quantifying the relationship between linguistic information and extra-linguistic features.	Translated: 2023-11-30 23:14:12 Published: 2023-11-28
# ReWaRD: Retinal Waves for Pre-Training Artificial Neural Networks Mimicking Real Prenatal Development ( http://arxiv.org/abs/2311.17232v1 ) License: see link	Benjamin Cappell and Andreas Stoll and Williams Chukwudi Umah and Bernhard Egger	Computational models trained on a large amount of natural images are the state-of-the-art to study human vision - usually adult vision. Computational models of infant vision and its further development are gaining more and more attention in the community. In this work we aim at the very beginning of our visual experience - pre- and post-natal retinal waves which are suggested to be a pre-training mechanism for the primate visual system at a very early stage of development. We see this approach as an instance of biologically plausible data driven inductive bias through pre-training. We built a computational model that mimics this development mechanism by pre-training different artificial convolutional neural networks with simulated retinal wave images. The resulting features of this biologically plausible pre-training closely match the V1 features of the primate visual system. We show that the performance gain by pre-training with retinal waves is similar to a state-of-the-art pre-training pipeline. Our framework contains the retinal wave generator, as well as a training strategy, which can be a first step in a curriculum learning based training diet for various models of development. We release code, data and trained networks to build the basis for future work on visual development and based on a curriculum learning approach including prenatal development to support studies of innate vs. learned properties of the primate visual system. An additional benefit of our pre-trained networks for neuroscience or computer vision applications is the absence of biases inherited from datasets like ImageNet.	Translated: 2023-11-30 23:13:52 Published: 2023-11-28
# A nonlinear frequency shift caused by asymmetry of the coherent population trapping resonance: a generalization ( http://arxiv.org/abs/2311.17229v1 ) License: see link	E. A. Tsygankov, D. S. Chuchelov, M. I. Vaskovskaya, V. V. Vassiliev, S. A. Zibrov, and V. L. Velichansky	We investigate the coherent population trapping resonance induced by a polychromatic optical field with an asymmetric spectrum, i.e., whose sidebands equidistant from the carrier have unequal powers. A situation is considered where a modulation is used to provide the so-called in-phase and quadrature signals in the optical field's transmission, which are used for stabilization of the local oscillator's frequency in chip-scale atomic clocks. In a general case, the frequencies of the signals nonlinearly depend on the optical field's intensity due to the asymmetry of the resonance. In this work, we demonstrate a) that this effect stems from a multi-peak structure of the resonance; b) a linear dependence of the frequency on the optical field's intensity at high modulation values; c) that the regime of the resolved structure has more advantages than suppression of the frequency shift due to the spectrum asymmetry.	Translated: 2023-11-30 23:13:30 Published: 2023-11-28
# Survey on AI Ethics: A Socio-technical Perspective ( http://arxiv.org/abs/2311.17228v1 ) License: see link	Dave Mbiazi, Meghana Bhange, Maryam Babaei, Ivaxi Sheth, Patrik Joslin Kenfack	The past decade has observed a great advancement in AI with deep learning-based models being deployed in diverse scenarios including safety-critical applications. As these AI systems become deeply embedded in our societal infrastructure, the repercussions of their decisions and actions have significant consequences, making the ethical implications of AI deployment highly relevant and important. The ethical concerns associated with AI are multifaceted, including challenging issues of fairness, privacy and data protection, responsibility and accountability, safety and robustness, transparency and explainability, and environmental impact. These principles together form the foundations of ethical AI considerations that concern every stakeholder in the AI system lifecycle. In light of the present ethical and future x-risk concerns, governments have shown increasing interest in establishing guidelines for the ethical deployment of AI. This work unifies the current and future ethical concerns of deploying AI into society. While we acknowledge and appreciate the technical surveys for each of the ethical principles concerned, in this paper, we aim to provide a comprehensive overview that not only addresses each principle from a technical point of view but also discusses them from a social perspective.	Translated: 2023-11-30 23:13:18 Published: 2023-11-28
# War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars ( http://arxiv.org/abs/2311.17227v1 ) License: see link	Wenyue Hua, Lizhou Fan, Lingyao Li, Kai Mei, Jianchao Ji, Yingqiang Ge, Libby Hemphill, Yongfeng Zhang	Can we avoid wars at the crossroads of history? This question has been pursued by individuals, scholars, policymakers, and organizations throughout human history. In this research, we attempt to answer the question based on the recent advances of Artificial Intelligence (AI) and Large Language Models (LLMs). We propose WarAgent, an LLM-powered multi-agent AI system, to simulate the participating countries, their decisions, and the consequences, in historical international conflicts, including World War I (WWI), World War II (WWII), and the Warring States Period (WSP) in Ancient China. By evaluating the simulation effectiveness, we examine the advancements and limitations of cutting-edge AI systems' abilities in studying complex collective human behaviors such as international conflicts under diverse settings. In these simulations, the emergent interactions among agents also offer a novel perspective for examining the triggers and conditions that lead to war. Our findings offer data-driven and AI-augmented insights that can redefine how we approach conflict resolution and peacekeeping strategies. The implications stretch beyond historical analysis, offering a blueprint for using AI to understand human history and possibly prevent future international conflicts. Code and data are available at https://github.com/agiresearch/WarAgent.	Translated: 2023-11-30 23:13:01 Published: 2023-11-28
# Does VLN Pretraining Work with Nonsensical or Irrelevant Instructions? ( http://arxiv.org/abs/2311.17280v1 ) License: see link	Wang Zhu, Ishika Singh, Yuan Huang, Robin Jia and Jesse Thomason	Data augmentation via back-translation is common when pretraining Vision-and-Language Navigation (VLN) models, even though the generated instructions are noisy. But: does that noise matter? We find that nonsensical or irrelevant language instructions during pretraining can have little effect on downstream performance for both HAMT and VLN-BERT on R2R, and are still better than only using clean, human data. To underscore these results, we concoct an efficient augmentation method, Unigram + Object, which generates nonsensical instructions that nonetheless improve downstream performance. Our findings suggest that what matters for VLN R2R pretraining is the quantity of visual trajectories, not the quality of instructions.	Translated: 2023-11-30 23:03:56 Published: 2023-11-28
# LiveTune: Dynamic Parameter Tuning for Training Deep Neural Networks ( http://arxiv.org/abs/2311.17279v1 ) License: see link	Soheil Zibakhsh Shabgahi, Nojan Sheybani, Aiden Tabrizi, Farinaz Koushanfar	Traditional machine learning training is a static process that lacks real-time adaptability of hyperparameters. Popular tuning solutions during runtime involve checkpoints and schedulers. Adjusting hyperparameters usually requires the program to be restarted, wasting utilization and time, while placing unnecessary strain on memory and processors. We present LiveTune, a new framework allowing real-time parameter tuning during training through LiveVariables. LiveVariables allow for a continuous training session by storing parameters on designated ports on the system, allowing them to be dynamically adjusted. Extensive evaluations of our framework show saving up to 60 seconds and 5.4 Kilojoules of energy per hyperparameter change.	Translated: 2023-11-30 23:03:39 Published: 2023-11-28
# An Online Optimization-Based Decision Support Tool for Small Farmers in India: Learning in Non-stationary Environments ( http://arxiv.org/abs/2311.17277v1 ) License: see link	Tuxun Lu, Aviva Prins	Crop management decision support systems are specialized tools for farmers that reduce the riskiness of revenue streams, especially valuable for use under the current climate changes that impact agricultural productivity. Unfortunately, small farmers in India, who could greatly benefit from these tools, do not have access to them. In this paper, we model an individual greenhouse as a Markov Decision Process (MDP) and adapt Li and Li (2019)'s Follow the Weighted Leader (FWL) online learning algorithm to offer crop planning advice. We successfully produce utility-preserving cropping pattern suggestions in simulations. When we compare against an offline planning algorithm, we achieve the same cumulative revenue with greatly reduced runtime.	Translated: 2023-11-30 23:03:29 Published: 2023-11-28
# Exploiting nonclassical motion of a trapped ion crystal for quantum-enhanced metrology of global and differential spin rotations ( http://arxiv.org/abs/2311.17275v1 ) License: see link	R. J. Lewis-Swan, J. C. Zuñiga Castro, D. Barberena, A. M. Rey	We theoretically investigate prospects for the creation of nonclassical spin states in trapped ion arrays by coupling to a squeezed state of the collective motion of the ions. The correlations of the generated spin states can be tailored for quantum-enhanced sensing of global or differential rotations of sub-ensembles of the spins by working with specific vibrational modes of the ion array. We propose a pair of protocols to utilize the generated states and determine the impact of finite size effects, inhomogeneous couplings between the spin and motional degrees of freedom and technical noise. Our work suggests new opportunities for the preparation of many-body states with tailored correlations for quantum-enhanced metrology in spin-boson systems.	Translated: 2023-11-30 23:03:13 Published: 2023-11-28
# E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer ( http://arxiv.org/abs/2311.17267v1 ) License: see link	Jacob Zhiyuan Fang, Skyler Zheng, Vasu Sharma, Robinson Piramuthu	To build scalable models for challenging real-world tasks, it is important to learn from diverse, multi-modal data in various forms (e.g., videos, text, and images). Among the existing works, a plethora of them have focused on leveraging large but cumbersome cross-modal architectures. Regardless of their effectiveness, larger architectures unavoidably prevent the models from being extended to real-world applications, so building a lightweight VL architecture and an efficient learning schema is of great practical value. In this paper, we propose an Efficient Video-Language Model (dubbed as E-ViLM) and a masked video modeling (MVM) schema, assisted with a semantic vector-quantized tokenizer. In particular, our E-ViLM learns to reconstruct the semantic labels of masked video regions, produced by the pre-trained vector-quantized tokenizer, which discretizes the continuous visual signals into labels. We show that with our simple MVM task and regular VL pre-training modelings, our E-ViLM, despite its compactness, is able to learn expressive representations from Video-Language corpus and generalize well to extensive Video-Language tasks including video question answering, text-to-video retrieval, etc. In particular, our E-ViLM obtains obvious efficiency improvements by reaching competing performances with faster inference speed, i.e., our model reaches $39.3$% Top-$1$ accuracy on the MSRVTT benchmark, retaining $91.4$% of the accuracy of state-of-the-art larger VL architecture with only $15$% of the parameters and $94.8$% fewer GFLOPs. We also provide extensive ablative studies that validate the effectiveness of our proposed learning schema for E-ViLM.	Translated: 2023-11-30 23:02:58 Published: 2023-11-28
# RETSim: Resilient and Efficient Text Similarity ( http://arxiv.org/abs/2311.17264v1 ) License: see link	Marina Zhang, Owen Vallis, Aysegul Bumin, Tanay Vakharia, Elie Bursztein	This paper introduces RETSim (Resilient and Efficient Text Similarity), a lightweight, multilingual deep learning model trained to produce robust metric embeddings for near-duplicate text retrieval, clustering, and dataset deduplication tasks. We demonstrate that RETSim is significantly more robust and accurate than MinHash and neural text embeddings, achieving new state-of-the-art performance on dataset deduplication, adversarial text retrieval benchmarks, and spam clustering tasks. We also introduce the W4NT3D benchmark (Wiki-40B 4dversarial Near-T3xt Dataset) for evaluating multilingual, near-duplicate text retrieval capabilities under adversarial settings. RETSim and the W4NT3D benchmark are open-sourced under the MIT License at https://github.com/google/unisim.	Translated: 2023-11-30 23:02:18 Published: 2023-11-28
# 金属有機導波路における偏光エンタングル光子の巨大発生 Giant Generation of Polarization-Entangled Photons in Metal Organic Framework Waveguides ( http://arxiv.org/abs/2311.17263v1 ) ライセンス: Link先を確認	Sim\'on Paiva, Ruben A. Fritz, Sanoj Raj, Yamil J. Col\'on, Felipe Herrera	(参考訳) パラメトリック非線形光学プロセスは、絡み合った光を生成するための光学量子技術において有用である。しかし、従来は絡み合った光子を作る材料の範囲は限られている。金属-有機系フレームワーク(mofs)は、非線形特性をカスタマイズし、化学的および光学的安定性が証明された新しい光学材料として登場した。バルクmof結晶の形成が知られている金属原子と有機配位子の組み合わせの多さは、非線形光学の有望な候補の探索を促進する。次世代の量子光源の発見を加速するために,コリニア縮退型ii自発的パラメトリックダウン変換(spdc)とmofベースの1次元導波路の位相整合条件を研究するために,マルチスケールモデリング手法を用いる。選択された亜鉛系MOF結晶の非線形光学特性を計算するために周期-DFT計算を用い、量子光学で用いられる産業材料に匹敵する1064nmでの偏光対生成率$\sim 10^3-10^6$s$^{-1}$mW$^{-1}$mm$^{-1}$m$^{-1}$を予測した。 2軸MoF結晶Zn(4-pyridylacrylate)$_2$は、同一次元の周期的なKTP導波路上での変換効率を2倍改善する。この研究は、量子通信およびセンシングへの応用のための絡み合った光源として、MOF単結晶の大きなポテンシャルを浮き彫りにする。 Parametric nonlinear optical processes are instrumental in optical quantum technology for generating entangled light. However, the range of materials conventionally used for producing entangled photons is limited. Metal-organic frameworks (MOFs) have emerged as a novel class of optical materials with customizable nonlinear properties and proven chemical and optical stability. The large number of combinations of metal atoms and organic ligand from which bulk MOF crystals are known to form, facilitates the search of promising candidates for nonlinear optics. To accelerate the discovery of next-generation quantum light sources, we employ a multi-scale modeling approach to study phase-matching conditions for collinear degenerate type-II spontaneous parametric down conversion (SPDC) with MOF-based one dimensional waveguides. Using periodic-DFT calculations to compute the nonlinear optical properties of selected zinc-based MOF crystals, we predict polarization-entangled pair generation rates of $\sim 10^3-10^6$ s$^{-1}$mW$^{-1}$mm$^{-1}$ at 1064 nm, which are comparable with industry materials used in quantum optics. 
We find that the biaxial MOF crystal Zn(4-pyridylacrylate)$_2$ improves the conversion efficiency two-fold over a periodically-poled KTP waveguide of identical dimensions. This work underscores the great potential of MOF single crystals as entangled light sources for applications in quantum communication and sensing.	翻訳日:2023-11-30 23:02:02 公開日:2023-11-28
# SceneTex:拡散前処理による室内シーンの高品質テクスチャ合成 SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors ( http://arxiv.org/abs/2311.17261v1 ) ライセンス: Link先を確認	Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nie{\ss}ner	(参考訳) SceneTexは,奥行き拡散先行画像を用いた室内シーンの高品質でスタイルに整合したテクスチャを効果的に生成する手法である。メッシュ面に2Dビューを反復的にワープするか、正確な幾何学的およびスタイルの手がかりのない拡散遅延特徴を蒸留する従来の方法とは異なり、SceneTexはスタイルと幾何学的整合性を適切に反映したRGB空間の最適化問題としてテクスチャ合成タスクを定式化している。中心となるSceneTexは、メッシュの外観を暗黙的にエンコードするマルチレゾリューションテクスチャフィールドを提案する。目標テクスチャを各RGBレンダリングにおいて,スコア蒸留に基づく目的関数を用いて最適化する。ビュー間のスタイル整合性をさらに確保するために、各インスタンスの予めサンプリングされた参照位置にクロスアテンディングすることでRGB値を予測するクロスアテンションデコーダを導入する。 SceneTexは3D-FRONTシーンの様々な正確なテクスチャ合成を可能にする。 We propose SceneTex, a novel method for effectively generating high-quality and style-consistent textures for indoor scenes using depth-to-image diffusion priors. Unlike previous methods that either iteratively warp 2D views onto a mesh surface or distillate diffusion latent features without accurate geometric and style cues, SceneTex formulates the texture synthesis task as an optimization problem in the RGB space where style and geometry consistency are properly reflected. At its core, SceneTex proposes a multiresolution texture field to implicitly encode the mesh appearance. We optimize the target texture via a score-distillation-based objective function in respective RGB renderings. To further secure the style consistency across views, we introduce a cross-attention decoder to predict the RGB values by cross-attending to the pre-sampled reference locations in each instance. SceneTex enables various and accurate texture synthesis for 3D-FRONT scenes, demonstrating significant improvements in visual quality and prompt fidelity over the prior texture generation methods.	翻訳日:2023-11-30 23:01:33 公開日:2023-11-28
# 対称セクタ上でゼロである量子状態の絡み合い Entanglement of Quantum States which are Zero on the Symmetric Sector ( http://arxiv.org/abs/2311.17260v1 ) ライセンス: Link先を確認	Domenico D'Alessandro	(参考訳) 我々は n 個のクウディッツの量子系と関連するヒルベルト空間のクレブシュ・ゴルダン分解を考える。この分解において、部分空間の1つはいわゆる対称部分空間または対称セクタ、すなわち対称群の作用の下で不変な全ての状態の部分空間である。我々は、任意の分離可能な状態が対称セクター上の非零成分を持つこと、あるいは同値に対称セクター上の零成分を持つ状態が絡み合わなければならないことを証明している。 n=2,3粒子の場合、および任意の次元 d の場合、この結果は対称セクター上の可分状態の成分のサイズに鋭く下限を与えることによって洗練することができる。これにより、これらのシステムの絡み合い証人のクラスが特定できます。多成分の場合、この種類の証人がpptの絡み合った状態を検出することを示す例を示す。 We consider a quantum system of n qudits and the Clebsch-Gordan decomposition of the associated Hilbert space. In this decomposition, one of the subspaces is the so-called symmetric subspace or symmetric sector, that is, the subspace of all states that are invariant under the action of the symmetric group. We prove that any separable state must have a nonzero component on the symmetric sector, or, equivalently, any state which has zero component on the symmetric sector must be entangled. For the cases of n=2,3 particles, and in arbitrary dimension d, this result can be refined by providing sharp lower bounds on the size of the component of separable states on the symmetric sector. This leads us to identify a class of entanglement witnesses for these systems. We provide an example showing that in the multipartite case, this class of witnesses detects PPT entangled states.	翻訳日:2023-11-30 23:01:12 公開日:2023-11-28
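As a worked illustration of the n=2 case, the standard swap-operator argument gives a bound of this kind. This is the textbook computation, written under the assumption that the "size of the component" is measured by $\operatorname{Tr}(P_{\mathrm{sym}}\rho)$; the paper's own normalization and its n=3 bound may differ.

```latex
% Projector onto the symmetric sector of two qudits; F is the swap operator.
P_{\mathrm{sym}} = \tfrac{1}{2}\,(\mathbb{1} + F),
\qquad F\,|a\rangle\otimes|b\rangle = |b\rangle\otimes|a\rangle .

% Weight of a pure product state on the symmetric sector.
\langle a,b|\,P_{\mathrm{sym}}\,|a,b\rangle
  = \tfrac{1}{2}\bigl(1 + \langle a,b|b,a\rangle\bigr)
  = \tfrac{1}{2}\bigl(1 + |\langle a|b\rangle|^{2}\bigr) \;\ge\; \tfrac{1}{2} .

% By linearity, for any separable state
% \rho = \sum_i p_i\,|a_i,b_i\rangle\langle a_i,b_i| with p_i \ge 0, \sum_i p_i = 1:
\operatorname{Tr}(P_{\mathrm{sym}}\,\rho)
  = \sum_i p_i\,\tfrac{1}{2}\bigl(1 + |\langle a_i|b_i\rangle|^{2}\bigr) \;\ge\; \tfrac{1}{2} .
```

The bound is attained whenever $|a\rangle$ and $|b\rangle$ are orthogonal, so any two-qudit state with $\operatorname{Tr}(P_{\mathrm{sym}}\rho) < 1/2$, and in particular any state with zero component on the symmetric sector, must be entangled.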
# occamnet: 大規模記号回帰のための高速ニューラルネットワークモデル OccamNet: A Fast Neural Model for Symbolic Regression at Scale ( http://arxiv.org/abs/2007.10784v3 ) ライセンス: Link先を確認	Owen Dugan and Rumen Dangovski and Allan Costa and Samuel Kim and Pawan Goyal and Joseph Jacobson and Marin Solja\v{c}i\'c	(参考訳) ニューラルネットワークの表現性は、しばしばトレーニングデータセットの領域をはるかに超越する複雑なブラックボックスモデルのコストを伴い、科学データを記述するためのコンパクトな解析式を見つけるという目標と矛盾する。我々はOccamNetを紹介した。Occamのカミソリに適合する解釈可能でコンパクトでスパースなシンボルを見つけるニューラルネットワークモデルである。本モデルは,効率的なサンプリングと関数評価による関数上の確率分布を定義する。我々は,強化学習損失におけるクロスエントロピーマッチングを用いて,関数のサンプリングと確率質量の偏りをトレーニングする。 occamnetは、解析的および非解析的関数、暗黙的関数、単純な画像分類など、様々な問題に対する記号的適合を識別でき、実世界の回帰データセットにおける最先端の記号的回帰法を上回ることができる。この方法は、メモリフットプリントを最小にし、単一のcpu上で数分で複雑な関数に適合し、gpuにスケールする。 Neural networks' expressiveness comes at the cost of complex, black-box models that often extrapolate poorly beyond the domain of the training dataset, conflicting with the goal of finding compact analytic expressions to describe scientific data. We introduce OccamNet, a neural network model that finds interpretable, compact, and sparse symbolic fits to data, \`a la Occam's razor. Our model defines a probability distribution over functions with efficient sampling and function evaluation. We train by sampling functions and biasing the probability mass toward better fitting solutions, backpropagating using cross-entropy matching in a reinforcement-learning loss. OccamNet can identify symbolic fits for a variety of problems, including analytic and non-analytic functions, implicit functions, and simple image classification, and can outperform state-of-the-art symbolic regression methods on real-world regression datasets. Our method requires a minimal memory footprint, fits complicated functions in minutes on a single CPU, and scales on a GPU.	翻訳日:2023-11-30 18:23:04 公開日:2023-11-28
# 戦略的かつ比例的に公平な施設配置 Strategyproof and Proportionally Fair Facility Location ( http://arxiv.org/abs/2111.01566v3 ) ライセンス: Link先を確認	Haris Aziz, Alexander Lam, Barton E. Lee, Toby Walsh	(参考訳) 簡単な1次元の集合的決定問題(しばしば施設配置問題と呼ばれる)に焦点を当て、戦略の安全性と比例性に基づく公正性の問題を探求する。我々は,比例性に基づく正当性公理の階層構造を,IFS,Unanimous Fair Share(UFS),Proportionality(Freeman et al,2021),Proportional Fairness(PF)として導入し,分析した。各公理に対して、我々は公理と戦略耐性を満たすメカニズムの族を特徴づける。比例性,一様性,戦略性を満足する機構のファミリーは、UFSと戦略性を満足する機構のファミリーと等価であり、そのファミリーはPFと戦略性を満足する機構のファミリーと等価である。さらに、一様ファントム機構(uniform phantom mechanism)は、freeman et al. (2021)で研究されている。また,一様ファントム機構の結果を,連続性,厳密な単調性,ufsを満たす任意の機構に対する一意な(純粋)平衡結果として特徴づける。最後に,各比例性に基づく公平性公理を満足するメカニズムによって得られた最適社会福祉と最小総コストの観点から近似保証を分析する。一様ファントム機構は、ufsを満たす全てのメカニズムの中で最適な社会福祉(および最小総コスト)の最適近似を提供する。 We focus on a simple, one-dimensional collective decision problem (often referred to as the facility location problem) and explore issues of strategyproofness and proportionality-based fairness. We introduce and analyze a hierarchy of proportionality-based fairness axioms of varying strength: Individual Fair Share (IFS), Unanimous Fair Share (UFS), Proportionality (as in Freeman et al, 2021), and Proportional Fairness (PF). For each axiom, we characterize the family of mechanisms that satisfy the axiom and strategyproofness. We show that imposing strategyproofness renders many of the axioms to be equivalent: the family of mechanisms that satisfy proportionality, unanimity, and strategyproofness is equivalent to the family of mechanisms that satisfy UFS and strategyproofness, which, in turn, is equivalent to the family of mechanisms that satisfy PF and strategyproofness. Furthermore, there is a unique such mechanism: the Uniform Phantom mechanism, which is studied in Freeman et al. (2021). We also characterize the outcomes of the Uniform Phantom mechanism as the unique (pure) equilibrium outcome for any mechanism that satisfies continuity, strict monotonicity, and UFS. 
Finally, we analyze the approximation guarantees, in terms of optimal social welfare and minimum total cost, obtained by mechanisms that are strategyproof and satisfy each proportionality-based fairness axiom. We show that the Uniform Phantom mechanism provides the best approximation of the optimal social welfare (and also minimum total cost) among all mechanisms that satisfy UFS.	翻訳日:2023-11-30 18:18:28 公開日:2023-11-28
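The Uniform Phantom mechanism named in this abstract can be sketched as a median rule. The sketch assumes the usual formulation in which locations are normalized to [0, 1] and the mechanism returns the median of the n reports together with n-1 fixed "phantom" points at k/n; the function name and the input validation are illustrative choices, not taken from the paper.

```python
from statistics import median


def uniform_phantom(reports):
    """Return the median of the n reports and the n-1 phantoms k/n, k = 1..n-1.

    Assumes every reported location lies in the unit interval [0, 1].
    """
    n = len(reports)
    if n == 0:
        raise ValueError("need at least one report")
    if any(not 0.0 <= x <= 1.0 for x in reports):
        raise ValueError("reports must lie in [0, 1]")
    phantoms = [k / n for k in range(1, n)]
    # 2n - 1 points in total, so the median is always one of the points.
    return median(list(reports) + phantoms)


if __name__ == "__main__":
    print(uniform_phantom([0, 0, 0, 1]))     # one of four agents at 1
    print(uniform_phantom([0, 0, 1, 1]))     # agents split evenly
    print(uniform_phantom([0.3, 0.3, 0.3]))  # unanimous reports
```

With one of four agents at 1 and the rest at 0, the facility lands at 0.25, which reflects the proportionality idea: a group holding a quarter of the agents pulls the outcome a quarter of the way toward its preferred end.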
# 最小長さの変形空間における$\delta$-ポテンシャルを持つschr\"odinger方程式の散乱解 Scattering solution of Schr\"odinger equation with $\delta$-potential in deformed space with minimal length ( http://arxiv.org/abs/2110.12494v2 ) ライセンス: Link先を確認	M. I. Samar and V. M. Tkachuk	(参考訳) 一般に変形したハイゼンベルク代数の場合、ディラック $\delta$-関数ポテンシャル問題を考える。準位表現における問題の厳密な境界と散乱解が提示される。共鳴エネルギーについては、入射波は完全に反射される。この効果は変形関数の選択に非常に敏感である。 We consider the Dirac $\delta$-function potential problem in general case of deformed Heisenberg algebra leading to the minimal length. Exact bound and scattering solutions of the problem in quasiposition representation are presented. We obtain that for some resonance energy the incident wave is completely reflected. We conclude that this effect is very sensitive to the choice of the deformation function.	翻訳日:2023-11-30 18:18:01 公開日:2023-11-28
# 画像検索による自動運転車用単眼カメラ位置推定 Monocular Camera Localization for Automated Vehicles Using Image Retrieval ( http://arxiv.org/abs/2109.06296v2 ) ライセンス: Link先を確認	Eunhyek Joa and Francesco Borrelli	(参考訳) 本研究では,自律走行車の位置と方向角を1台のカメラでリアルタイムで検出する問題に対処する。リアルタイムにLiDARとHD(high definition)3Dマップを必要とする手法と比較すると,提案手法はスケーラブルで計算効率が良いが,その代わりに精度は低くなる。新しい手法は、既存のアルゴリズムを画像検索、マッピングデータベース、粒子フィルタリングの3つの分野に組み合わせ、適応する。その結果,LiDARで構築した地図を用いた他の単眼カメラローカライズ法に匹敵する性能を有する画像検索手法を用いた簡易なリアルタイムローカライズ手法が得られた。提案手法は,KITTI odometry データセットと屋内1:10自律走行車を用いた閉ループ実験を用いて評価した。テストでは、リアルタイム能力と10cmレベルの精度を示す。また, 閉ループ室内実験の結果, 位置推定誤差と制御誤差との間に正のフィードバックループが存在することがわかった。このような現象は記事の最後に詳細に分析される。 We address the problem of finding the current position and heading angle of an autonomous vehicle in real-time using a single camera. Compared to methods which require LiDARs and high definition (HD) 3D maps in real-time, the proposed approach is easily scalable and computationally efficient, at the price of lower precision. The new method combines and adapts existing algorithms in three different fields: image retrieval, mapping database, and particle filtering. The result is a simple, real-time localization method using an image retrieval method whose performance is comparable to other monocular camera localization methods which use a map built with LiDARs. We evaluate the proposed method using the KITTI odometry dataset and via closed-loop experiments with an indoor 1:10 autonomous vehicle. The tests demonstrate real-time capability and 10 cm-level accuracy. Also, experimental results of the closed-loop indoor tests show the presence of a positive feedback loop between the localization error and the control error. This phenomenon is analysed in detail at the end of the article.	翻訳日:2023-11-30 18:16:25 公開日:2023-11-28
# ニューラルNLPのポストホック解釈可能性:サーベイ Post-hoc Interpretability for Neural NLP: A Survey ( http://arxiv.org/abs/2108.04840v5 ) ライセンス: Link先を確認	Andreas Madsen, Siva Reddy, Sarath Chandar	(参考訳) nlpのニューラルネットワークはますます複雑で広くなりつつあり、これらのモデルの使用に責任があるかどうかの懸念が高まっている。モデルを説明することは、安全性と倫理上の懸念に対処し、説明責任に不可欠である。解釈性は、人間に理解できる言葉でこれらの説明を提供するのに役立つ。さらに、post-hocメソッドは、モデルが学習され、一般的にモデルに依存しない後に説明を提供する。この調査は、最近のポストホック解釈可能性法がいかに人間に説明を伝えるか、そして、それぞれの方法が深く、どのように検証されるかを分類する。 Neural networks for NLP are becoming increasingly complex and widespread, and there is a growing concern if these models are responsible to use. Explaining models helps to address the safety and ethical concerns and is essential for accountability. Interpretability serves to provide these explanations in terms that are understandable to humans. Additionally, post-hoc methods provide explanations after a model is learned and are generally model-agnostic. This survey provides a categorization of how recent post-hoc interpretability methods communicate explanations to humans, it discusses each method in-depth, and how they are validated, as the latter is often a common concern.	翻訳日:2023-11-30 18:16:09 公開日:2023-11-28
# テンソルネットワーク法によるキタエフ量子二重モデルの熱化 Thermalization in Kitaev's quantum double models via Tensor Network techniques ( http://arxiv.org/abs/2107.01628v3 ) ライセンス: Link先を確認	Angelo Lucia, David P\'erez-Garc\'ia, Antonio P\'erez-Hern\'andez	(参考訳) 任意の2次元キタエフの量子二重モデルに付随するデイビース生成器は熱力学的極限において非有界なスペクトルギャップを持つことを示した。これは、これらのモデルが非アーベルの場合でさえ自己修正量子記憶として役に立たないという拡張された信念を厳密に検証する。この証明は、プロジェンド・アンタングルド・ペア状態に関連する親ハミルトニアンのスペクトルギャップを、バルク境界対応の観点から特徴づける最近のアイデアと結果を使用する。 We show that the Davies generator associated to any 2D Kitaev's quantum double model has a non-vanishing spectral gap in the thermodynamic limit. This validates rigorously the extended belief that those models are useless as self-correcting quantum memories, even in the non-abelian case. The proof uses recent ideas and results regarding the characterization of the spectral gap for parent Hamiltonians associated to Projected Entangled Pair States in terms of a bulk-boundary correspondence.	翻訳日:2023-11-30 18:15:46 公開日:2023-11-28
# GraphPrompt: バイオメディカル同期予測のためのグラフベースのプロンプトテンプレート GraphPrompt: Graph-Based Prompt Templates for Biomedical Synonym Prediction ( http://arxiv.org/abs/2112.03002v2 ) ライセンス: Link先を確認	Hanwen Xu, Jiayou Zhang, Zhirui Wang, Shizhuo Zhang, Megh Manoj Bhalerao, Yucong Liu, Dawei Zhu, Sheng Wang	(参考訳) バイオメディカルデータセットの拡張では、同じカテゴリが異なる用語でラベル付けされるため、これらの用語のキュレーションは退屈で面倒である。したがって、生体同義語をオントロジーに自動的にマッピングすることが望ましいので、生体同義語予測タスクと呼ぶ。生物医学的概念正規化(BCN)とは異なり、文脈からの手がかりは同義語予測を強化するために使用できず、オントロジーからグラフの特徴を抽出することが不可欠である。我々は,70種類の概念と200万種類の概念-長期ペアを含む専門家計算データセットOBO-synを導入する。グラフ情報を十分に活用しないbcn手法は,この課題において弱い性能を示す。そこで本稿では,グラフに従ってプロンプトテンプレートを生成するプロンプトベースの学習手法であるGraphPromptを提案する。 GraphPromptはゼロショットと少数ショットの設定で37.2\%と28.5\%の改善を行い、グラフベースのプロンプトテンプレートの有効性を示した。我々は,グラフベースのNLPタスクにグラフプロンプトとOBO-synデータセットを幅広く適用し,生物医学的データを多種多様な蓄積する基盤となることを想定する。 https://github.com/HanwenXuTHU/GraphPrompt In the expansion of biomedical dataset, the same category may be labeled with different terms, thus being tedious and onerous to curate these terms. Therefore, automatically mapping synonymous terms onto the ontologies is desirable, which we name as biomedical synonym prediction task. Unlike biomedical concept normalization (BCN), no clues from context can be used to enhance synonym prediction, making it essential to extract graph features from ontology. We introduce an expert-curated dataset OBO-syn encompassing 70 different types of concepts and 2 million curated concept-term pairs for evaluating synonym prediction methods. We find BCN methods perform weakly on this task for not making full use of graph information. Therefore, we propose GraphPrompt, a prompt-based learning approach that creates prompt templates according to the graphs. GraphPrompt obtained 37.2\% and 28.5\% improvement on zero-shot and few-shot settings respectively, indicating the effectiveness of these graph-based prompt templates. 
We envision that our method, GraphPrompt, and the OBO-syn dataset can be broadly applied to graph-based NLP tasks and serve as the basis for analyzing diverse and accumulating biomedical data. All the data and code are available at: https://github.com/HanwenXuTHU/GraphPrompt	翻訳日:2023-11-30 18:04:34 公開日:2023-11-28
# 確率一元性を持つ任意の二重ポテンシャル障壁の浸透:最小長さの検証への示唆 Penetration of Arbitrary Double Potential Barriers with Probability Unity: Implications for Testing the Existence of a Minimum Length ( http://arxiv.org/abs/2206.04243v3 ) ライセンス: Link先を確認	Yong Yang	(参考訳) 二重ポテンシャル障壁を横切る量子トンネルの研究を行う。実空間が連続体であると仮定すると、任意の形の大きな障壁は、単に境界間間隔を調整することによって、共振トンネル(rt)の実現という統一性の確率を持つ低エネルギー粒子によって貫かれることが厳密に証明される。結果は、共鳴トンネルと逐次トンネルを区別する電子と陽子のトンネルによって示される。バリア位置におけるトンネル確率の臨界依存性は、位相因子の重要な役割を示すだけでなく、共鳴付近での超高精度測定の可能性も示している。対照的に、非ゼロの最小長の存在は、障壁の大きさと粒子の質量に上限を与え、有効RTが停止する。不確実性原理による粒子位置の非局在化に起因する実用的困難に対処するためのスキームを提案する。この研究は、原子系に基づく最小長さの存在を実験的に検証するための道を開く。 Quantum tunneling across double potential barriers is studied. With the assumption that the real space is a continuum, it is rigorously proved that large barriers of arbitrary shapes can be penetrated by low-energy particles with a probability of unity, i.e., realization of resonant tunneling (RT), by simply tuning the inter-barrier spacing. The results are demonstrated by tunneling of electrons and protons, in which resonant and sequential tunneling are distinguished. The critical dependence of tunneling probabilities on the barrier positions not only demonstrates the crucial role of phase factors, but also points to the possibility of ultrahigh accuracy measurements near resonance. By contrast, the existence of a nonzero minimum length puts upper bounds on the barrier size and particle mass, beyond which effective RT ceases. A scheme is suggested for dealing with the practical difficulties arising from the delocalization of particle position due to the uncertainty principle. This work opens a possible avenue for experimental tests of the existence of a minimum length based on atomic systems.	翻訳日:2023-11-30 17:55:26 公開日:2023-11-28
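The claim that tuning the inter-barrier spacing can bring the transmission to unity can be illustrated with the simplest textbook case: two identical delta-function barriers, combined through the usual multiple-reflection (Fabry-Perot) sum. This is a simplified sketch and not the paper's general proof for arbitrary barrier shapes; the units (hbar = m = 1), the barrier strength g, and all function names are assumptions made for illustration.

```python
import cmath
import math


def single_delta_amplitudes(g, k):
    """Transmission and reflection amplitudes for V(x) = g * delta(x), with hbar = m = 1."""
    beta = g / k
    t = 1 / (1 + 1j * beta)
    r = -1j * beta / (1 + 1j * beta)
    return t, r


def double_delta_transmission(g, k, spacing):
    """Transmission probability through two identical delta barriers a distance `spacing` apart."""
    t, r = single_delta_amplitudes(g, k)
    # Sum over all internal round trips between the two barriers.
    total = t * t / (1 - r * r * cmath.exp(2j * k * spacing))
    return abs(total) ** 2


def resonant_spacing(g, k, order=1):
    """Spacing at which the round-trip phase is a multiple of 2*pi."""
    _, r = single_delta_amplitudes(g, k)
    return (2 * math.pi * order - cmath.phase(r * r)) / (2 * k)


if __name__ == "__main__":
    g, k = 5.0, 1.0
    t, _ = single_delta_amplitudes(g, k)
    a = resonant_spacing(g, k)
    print("single barrier:", abs(t) ** 2)
    print("double barrier at resonance:", double_delta_transmission(g, k, a))
    print("double barrier off resonance:", double_delta_transmission(g, k, a + 0.5))
```

At the resonant spacing the denominator reduces to 1 - |r|^2 = |t|^2, so the transmission equals one even though each barrier alone transmits only a few percent; a small shift in spacing destroys the effect, in line with the abstract's remark about the critical dependence on barrier positions.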
# PCPTとACPT:DNNモデルの著作権保護とトレーサビリティ・スキーム PCPT and ACPT: Copyright Protection and Traceability Scheme for DNN Models ( http://arxiv.org/abs/2206.02541v2 ) ライセンス: Link先を確認	Xuefeng Fan, Dahao Fu, Hangyu Gui, Xinpeng Zhang and Xiaoyi Zhou	(参考訳) ディープニューラルネットワーク(DNN)は人工知能(AI)分野で大きな成功を収めている。しかし、DNNモデルは容易に違法にコピーされ、再配布され、犯罪者によって虐待され、モデル発明者の利益を著しく損なうことができる。ニューラルネットワークの透かしによるDNNモデルの著作権保護が研究されているが、漏洩したモデルの認証ユーザを決定するトレーサビリティメカニズムの確立は、AIサービスの需要が引き起こした新たな問題である。既存のトレーサビリティメカニズムは透かしのないモデルに使用されるため、少数の偽陽性が生成される。既存のブラックボックスのアクティブ保護スキームは、権限制御が緩く、偽造攻撃に弱い。そこで,ビデオフレーミングと画像知覚ハッシュアルゴリズムを用いたブラックボックスニューラルネットワークの透かしの考え方に基づき,新たなdnnモデルを用いた受動的著作権保護・トレーサビリティフレームワークpcptを提案し,少数の偽陽性を生じさせる既存のトレーサビリティメカニズムを改善した。認証制御戦略と画像知覚ハッシュアルゴリズムに基づいて,dnnモデルによる著作権保護とトレーサビリティフレームワークapptを提案する。このフレームワークは、検出器と検証器によって構築された認可制御センターを使用する。このアプローチは、より厳格な認証制御を実現し、ユーザとモデルオーナ間の強い接続を確立し、フレームワークのセキュリティを改善し、トレーサビリティ検証をサポートする。 Deep neural networks (DNNs) have achieved tremendous success in artificial intelligence (AI) fields. However, DNN models can be easily illegally copied, redistributed, or abused by criminals, seriously damaging the interests of model inventors. The copyright protection of DNN models by neural network watermarking has been studied, but the establishment of a traceability mechanism for determining the authorized users of a leaked model is a new problem driven by the demand for AI services. Because the existing traceability mechanisms are used for models without watermarks, a small number of false-positives are generated. Existing black-box active protection schemes have loose authorization control and are vulnerable to forgery attacks. Therefore, based on the idea of black-box neural network watermarking with the video framing and image perceptual hash algorithm, a passive copyright protection and traceability framework PCPT is proposed that uses an additional class of DNN models, improving the existing traceability mechanism that yields a small number of false-positives. 
Based on an authorization control strategy and image perceptual hash algorithm, a DNN model active copyright protection and traceability framework ACPT is proposed. This framework uses the authorization control center constructed by the detector and verifier. This approach realizes stricter authorization control, which establishes a strong connection between users and model owners, improves the framework security, and supports traceability verification.	翻訳日:2023-11-30 17:54:53 公開日:2023-11-28
# 自己回帰型スロットVAEの生成品質向上に向けて Towards Improving the Generation Quality of Autoregressive Slot VAEs ( http://arxiv.org/abs/2206.01370v3 ) ライセンス: Link先を確認	Patrick Emami, Pan He, Sanjay Ranka, Anand Rangarajan	(参考訳) 無条件シーン推論と生成は、単一の構成モデルと共同で学ぶことが困難である。画像からオブジェクト中心表現('slots'')を抽出するモデルの進歩を奨励する一方で、スロットからの無条件シーン生成は注目されていない。これは主に、コヒーレントなシーンを想像するために必要な多目的関係の学習が難しいためである。既存のスロットベースモデルの多くは、オブジェクト相関を学習する能力に制限があるという仮説を立てる。オブジェクト相関学習を強化する2つの改善を提案する。ひとつは、スロット間の高次相関をキャプチャするグローバルなシーンレベル変数のスロットを条件付けることだ。第2に、シーンオブジェクトの自動回帰生成に使用する一貫した順序を学習することを提案することにより、画像中のオブジェクトに対する標準順序の根本的な欠如に対処する。具体的には,学習順序に従ってシーンオブジェクトを逐次生成する前に,自己回帰スロットをトレーニングする。順序付きスロット推論は、画像からスロットを抽出する既存のアプローチを使って、ランダムに順序付けされたスロットセットを推定し、そのスロットを予め自己回帰的に生成された順序付きスロットに調整する。 3つの多目的環境における実験により,無条件シーン生成の品質が明らかに向上した。詳細なアブレーション研究も提供され、2つの改善が提案されている。 Unconditional scene inference and generation are challenging to learn jointly with a single compositional model. Despite encouraging progress on models that extract object-centric representations (''slots'') from images, unconditional generation of scenes from slots has received less attention. This is primarily because learning the multi-object relations necessary to imagine coherent scenes is difficult. We hypothesize that most existing slot-based models have a limited ability to learn object correlations. We propose two improvements that strengthen object correlation learning. The first is to condition the slots on a global, scene-level variable that captures higher-order correlations between slots. Second, we address the fundamental lack of a canonical order for objects in images by proposing to learn a consistent order to use for the autoregressive generation of scene objects. Specifically, we train an autoregressive slot prior to sequentially generate scene objects following a learned order. 
Ordered slot inference entails first estimating a randomly ordered set of slots using existing approaches for extracting slots from images, then aligning those slots to ordered slots generated autoregressively with the slot prior. Our experiments across three multi-object environments demonstrate clear gains in unconditional scene generation quality. Detailed ablation studies are also provided that validate the two proposed improvements.	翻訳日:2023-11-30 17:54:26 公開日:2023-11-28
# ボース・アインシュタイン凝縮体における量子粒子の有限エントロピー揺らぎ Finite entropy fluctuations of a quantum particle in a Bose-Einstein condensate ( http://arxiv.org/abs/2204.01730v4 ) ライセンス: Link先を確認	Alexej Schelle	(参考訳) 複素二次元時間表現を用いたほぼ理想的で希薄なボース気体の量子場アプローチについて述べる。非常に希薄な原子気体の極限における量子粒子のコヒーレンス時間の定量的尺度を導出し、明確に定義されたゲージを持つ粒子に対する量子場の自発的対称性の破れの過程が、基礎となるフガシティスペクトルの対称部分と非対称部分の結合に起因することを示す。この結合は有限の単一粒子コヒーレンス時間を誘起し、臨界温度以下で時間反転対称性とそれに対応する量子場の位相ゲージ対称性を破る。カップリングとゲージ対称性の破れは、2次元時間の自発的な量子揺らぎと、コヒーレントな粒子相互作用に依存しない平衡での対応する有限エントロピーによって説明できる。この文脈では、ボルツマン平衡における量子場のゲージを定義することは、現在のモデルのパラメータ状態における熱平衡におけるエントロピーと自然対称性の破れを最大化する必要十分条件である。二次元時間の概念は最終的に、古典的温度限界における時間反転対称性を持つ純粋虚時という標準的なスキームに収束する。 A quantitative quantum field approach for a nearly-ideal, dilute Bose gas using a complex two-dimensional representation of time is presented. A quantitative measure for the coherence time of a quantum particle in the limit of very dilute atomic gases is derived and it is illustrated that the process of spontaneous symmetry breaking of the quantum field for the particle with a well-defined gauge has its origin in the coupling of symmetric and asymmetric parts of the underlying fugacity spectrum, which induces finite single-particle coherence times and thus breaks time reversal symmetry and the corresponding phase gauge symmetry of the quantum field below the critical temperature. The coupling and gauge symmetry breaking can be understood as due to spontaneous quantum fluctuations of two-dimensional time and the corresponding finite entropy at equilibrium, which does not rely on coherent particle interactions. 
In this context, defining a gauge for the quantum field at the Boltzmann equilibrium is a necessary and sufficient condition for maximization of entropy and spontaneous symmetry breaking at thermal equilibrium in the parameter regime of the present model. The concept of two-dimensional time finally converges to the standard scheme of purely imaginary time with time reversal symmetry in the classical limit of large temperatures.	翻訳日:2023-11-30 17:51:38 公開日:2023-11-28
# 360Roam:360$^\circ$放射場を利用した実時間室内ローミング 360Roam: Real-Time Indoor Roaming Using Geometry-Aware 360$^\circ$ Radiance Fields ( http://arxiv.org/abs/2208.02705v2 ) ライセンス: Link先を確認	Huajian Huang, Yingshu Chen, Tianjia Zhang and Sai-Kit Yeung	(参考訳) スパース360$^\circ$画像間の仮想ツアーは、滑らかで没入的なローミング体験を妨げながら広く使われている。ニューラル・レージアンス・フィールド(NeRF)の出現は、新しいビューの合成において大きな進歩を示し、没入型シーン探索の可能性を解き放った。それでも、以前のNeRFは主にオブジェクト中心のシナリオに焦点を当てており、シーンパラメータ化の制限により、外向きのシーンや大規模シーンに適用した場合、顕著なパフォーマンス劣化が生じる。室内のローミングをシームレスかつリアルタイムに行うために,局所放射場を適応的に割り当てた幾何認識放射場を用いた新しいアプローチを提案する。当初は,グローバル全方位放射場から導かれる確率的占有マップの形で,複数の屋内シーンの360$^\circ$画像を用いて,鮮明な幾何学を段階的に再構築する。次に,局所放射場を,復元された幾何学に基づいて適応的な分割・探索戦略によって割り当てる。ジオメトリ対応のサンプリングと大域ラディアンス場の分解を取り入れることで,位置符号化とコンパクトニューラルネットワークを効果的に活用し,レンダリング品質と速度を向上させる。さらに、シーンの抽出されたフロアプランは視覚的ガイダンスを提供し、リアルなローミング体験に寄与する。本システムの有効性を実証するため,様々な実写シーンを含む360$^\circ$画像の多様なデータセットをキュレートし,広範な実験を行った。大規模屋内シーンローミングにおいて,ベースラインアプローチとの比較により,システムの性能が向上したことを示す。 Virtual tour among sparse 360$^\circ$ images is widely used while hindering smooth and immersive roaming experiences. The emergence of Neural Radiance Field (NeRF) has showcased significant progress in synthesizing novel views, unlocking the potential for immersive scene exploration. Nevertheless, previous NeRF works primarily focused on object-centric scenarios, resulting in noticeable performance degradation when applied to outward-facing and large-scale scenes due to limitations in scene parameterization. To achieve seamless and real-time indoor roaming, we propose a novel approach using geometry-aware radiance fields with adaptively assigned local radiance fields. Initially, we employ multiple 360$^\circ$ images of an indoor scene to progressively reconstruct explicit geometry in the form of a probabilistic occupancy map, derived from a global omnidirectional radiance field. Subsequently, we assign local radiance fields through an adaptive divide-and-conquer strategy based on the recovered geometry. 
By incorporating geometry-aware sampling and decomposition of the global radiance field, our system effectively utilizes positional encoding and compact neural networks to enhance rendering quality and speed. Additionally, the extracted floorplan of the scene aids in providing visual guidance, contributing to a realistic roaming experience. To demonstrate the effectiveness of our system, we curated a diverse dataset of 360$^\circ$ images encompassing various real-life scenes, on which we conducted extensive experiments. Quantitative and qualitative comparisons against baseline approaches illustrated the superior performance of our system in large-scale indoor scene roaming.	翻訳日:2023-11-30 17:42:59 公開日:2023-11-28
# 低周波重力波がベリー相を発する Low frequency gravitational waves emerge Berry phase ( http://arxiv.org/abs/2207.08687v2 ) ライセンス: Link先を確認	Partha Nandi, Sounak Pal, Sayan Kumar Pal, Bibhas Ranjan Majhi	(参考訳) 低周波重力波(lfgws)天文学の検出は、天体物理学と一般相対性理論の領域において新しい時代の到来を告げた。線形重力法において、gwsと検出器のような2粒子の相互作用の枠組みを用いて、gwsの低周波で量子状態が研究されているおもちゃ検出器モデルを提案する。検出器は、GWと外部時間依存の2次元高調波ポテンシャルとを同時に相互作用させる。低周波gwsとの相互作用は、自然に計算における断熱近似を提供し、それゆえ検出器の量子状態における量子幾何位相を導くことができる。さらに、外部高調波ポテンシャルトラップの周波数を調整して制御することができる。このような幾何位相検出はgwsの足跡の顕在化に寄与する可能性がある。さらに重要なことは、我々の理論モデルがベリー位相を通して非常に小さな周波数GWを検出するためのレイアウトを提供することができるかもしれない。 The detection of low frequency gravitational waves (LFGWs) astronomy has marked an advent of new era in the domain of astrophysics and general relativity. Using the framework of interaction between GWs and a point two-particles like detector, within linearized gravity approach, we propose a toy detector model whose quantum state is being investigated at a low-frequency of GWs. The detector is in simultaneous interaction with GWs and an external time-dependent (tuneable) two-dimensional harmonic potential. We observe that the interaction with low frequency GWs naturally provides adiabatic approximation in the calculation, and thereby can lead to a quantal geometric phase in the quantum states of the detector. Moreover this can be controlled by tuning the frequency of the external harmonic potential trap. We argue that such geometric phase detection may serve as a manifestation of the footprint of GWs. More importantly, our theoretical model may be capable of providing a layout for the detection of very small frequency GWs through Berry phase.	翻訳日:2023-11-30 17:41:47 公開日:2023-11-28
# ACHO:適応型コンフォーマルハイパーパラメータ最適化 ACHO: Adaptive Conformal Hyperparameter Optimization ( http://arxiv.org/abs/2207.03017v3 ) ライセンス: Link先を確認	Riccardo Doyle	(参考訳) ハイパーパラメータ検索のための新しいフレームワークは、ここ10年でいくつか登場したが、ほとんどが厳密で、通常、分散的な仮定に依存し、検索モデルの柔軟性を制限している。本稿では,共形信頼区間の上位信頼境界サンプリングに基づく新しい最適化フレームワークを提案する。このようなアーキテクチャのいくつかは、ランダムフォレストと畳み込みニューラルネットワークのハイパーパラメータ探索で探索、ベンチマークされ、十分な間隔のカバレッジを示し、ランダム検索に優れたチューニング性能を発揮した。 Several novel frameworks for hyperparameter search have emerged in the last decade, but most rely on strict, often normal, distributional assumptions, limiting search model flexibility. This paper proposes a novel optimization framework based on upper confidence bound sampling of conformal confidence intervals, whose weaker assumption of exchangeability enables greater choice of search model architectures. Several such architectures were explored and benchmarked on hyperparameter search of random forests and convolutional neural networks, displaying satisfactory interval coverage and superior tuning performance to random search.	翻訳日:2023-11-30 17:40:29 公開日:2023-11-28
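The idea of sampling by an upper confidence bound taken from a conformal interval can be sketched with plain split conformal prediction. This is a generic illustration, not the ACHO implementation: the function names, the constant-width interval, and the convention that a higher score is better are assumptions.

```python
import math


def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of absolute calibration residuals."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to guarantee this coverage level.
        return math.inf
    return sorted(abs(r) for r in residuals)[k - 1]


def conformal_interval(prediction, residuals, alpha=0.1):
    """Symmetric split-conformal interval around a surrogate model's prediction."""
    q = conformal_quantile(residuals, alpha)
    return prediction - q, prediction + q


def upper_confidence_bound(prediction, residuals, alpha=0.1):
    """Upper end of the conformal interval, used as an optimistic score."""
    return conformal_interval(prediction, residuals, alpha)[1]


if __name__ == "__main__":
    # Residuals = observed score minus predicted score on held-out configurations.
    calibration_residuals = [0.01, -0.02, 0.03, -0.04, 0.05, -0.06, 0.07, -0.08, 0.09]
    print(conformal_interval(0.80, calibration_residuals, alpha=0.5))
    print(upper_confidence_bound(0.80, calibration_residuals, alpha=0.5))
```

The coverage guarantee here rests only on exchangeability of the calibration and test points, which is the weaker assumption the abstract refers to. With a constant-width interval the upper bound ranks candidates exactly as the point prediction does, so an actual search loop would need intervals whose width varies across configurations for the bound to drive exploration.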
# 高次元におけるWasserstein分布ロバスト推定:性能解析と最適ハイパーパラメータチューニング Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning ( http://arxiv.org/abs/2206.13269v2 ) ライセンス: Link先を確認	Liviu Aolaritei, Soroosh Shafiee, Florian D\"orfler	(参考訳) Wassersteinの分散的ロバストな最適化は、最近、堅牢な推定のための強力なフレームワークとして登場し、優れたアウトオブサンプル性能保証、よく理解された正規化効果、そして計算的に抽出可能な再構成を享受している。そのような枠組みにおいて、推定子は、ワッサーシュタイン意味で近い全ての確率分布に対する最悪の損失を経験的分布に最小化することによって得られる。本稿では,雑音のある線形測定値から未知パラメータを推定するための分布的ロバストな推定手法を提案し,そのような推定器の2乗誤差特性を解析する作業に着目する。本研究は, 周辺次元と試料数の両方を比例率で無限大化し, 過小・過剰なパラメータ化を符号化した現代高次元比例法を用いて行った。等方性ガウスの特徴の仮定の下で、二乗誤差は4つのスカラー変数を含む凸凹最適化問題の解として回復できることを示した。重要なことに、二乗誤差の正確な定量化は、異なる曖昧度半径を正確かつ効率的に比較し、推定誤差に対するアンダー・オーバ・パラメトリゼーションの影響を理解することができる。本論文は,本研究の成果を活かしたエキサイティングな研究指針の一覧でまとめる。 Wasserstein distributionally robust optimization has recently emerged as a powerful framework for robust estimation, enjoying good out-of-sample performance guarantees, well-understood regularization effects, and computationally tractable reformulations. In such a framework, the estimator is obtained by minimizing the worst-case expected loss over all probability distributions which are close, in a Wasserstein sense, to the empirical distribution. In this paper, we propose a Wasserstein distributionally robust estimation framework to estimate an unknown parameter from noisy linear measurements, and we focus on the task of analyzing the squared error performance of such estimators. Our study is carried out in the modern high-dimensional proportional regime, where both the ambient dimension and the number of samples go to infinity at a proportional rate which encodes the under/over-parametrization of the problem. Under an isotropic Gaussian features assumption, we show that the squared error can be recovered as the solution of a convex-concave optimization problem which, surprisingly, involves at most four scalar variables. 
Importantly, the precise quantification of the squared error allows to accurately and efficiently compare different ambiguity radii and to understand the effect of the under/over-parametrization on the estimation error. We conclude the paper with a list of exciting research directions enabled by our results.	翻訳日:2023-11-30 17:40:18 公開日:2023-11-28
# 条件付き生成逆数ネットワークを用いた2次元全吸収分光 Two-dimensional total absorption spectroscopy with conditional generative adversarial networks ( http://arxiv.org/abs/2206.11792v2 ) ライセンス: Link先を確認	Cade Dembski, Michelle P. Kuchera, Sean Liddick, Raghu Ramanujan, Artemis Spyrou	(参考訳) 実験スペクトルから大量の$\gamma$-ray検出器の応答を除去するために、機械学習技術の利用を検討する。分割された$\gamma$-ray total absorption spectrometers (tas) により、個々の$\gamma$-ray energy (e$_\gamma$) と全励起エネルギー (e$_x$) を同時に測定することができる。 TAS検出器データの解析は、E$_x$とE$_\gamma$の量とが相関しているという事実により複雑であり、E$_x$とE$_\gamma$の応答関数を独立に展開する技術は正確ではない。本研究では,条件付き生成逆数ネットワーク(cGAN)を用いて,TAS検出器における$E_{x}$と$E_{\gamma}$データを同時に展開する。具体的には,近年の深層学習の進歩に基づく生成モデリング手法である「texttt{Pix2Pix} cGAN」を用いて,画像から画像への変換問題として「rawmatrix~行列展開」を扱う。本研究は, 1-$\gamma$ および double-$\gamma$ 崩壊カスケードのシミュレーションおよび実験行列に関する結果である。シミュレーションテストケースの93%以上において, 検出器分解能限界内でのキャラクタリゼーション能力を示す。 We explore the use of machine learning techniques to remove the response of large volume $\gamma$-ray detectors from experimental spectra. Segmented $\gamma$-ray total absorption spectrometers (TAS) allow for the simultaneous measurement of individual $\gamma$-ray energy (E$_\gamma$) and total excitation energy (E$_x$). Analysis of TAS detector data is complicated by the fact that the E$_x$ and E$_\gamma$ quantities are correlated, and therefore, techniques that simply unfold using E$_x$ and E$_\gamma$ response functions independently are not as accurate. In this work, we investigate the use of conditional generative adversarial networks (cGANs) to simultaneously unfold $E_{x}$ and $E_{\gamma}$ data in TAS detectors. Specifically, we employ a \texttt{Pix2Pix} cGAN, a generative modeling technique based on recent advances in deep learning, to treat \rawmatrix~ matrix unfolding as an image-to-image translation problem. We present results for simulated and experimental matrices of single-$\gamma$ and double-$\gamma$ decay cascades. 
Our model demonstrates characterization capabilities within detector resolution limits for upwards of 93% of simulated test cases.	翻訳日:2023-11-30 17:39:53 公開日:2023-11-28
# 非対称相反効用による政策学習 Policy Learning with Asymmetric Counterfactual Utilities ( http://arxiv.org/abs/2206.10479v3 ) ライセンス: Link先を確認	Eli Ben-Michael and Kosuke Imai and Zhichao Jiang	(参考訳) データ駆動意思決定は、医療や公共政策のような高リスク設定においても重要な役割を果たす。観測データから最適政策を学ぶには、人口間で期待値が最大化される効用関数を慎重に定式化する必要がある。研究者は通常、観察結果のみに依存するユーティリティを使用するが、多くの環境では、意思決定者のユーティリティ機能は、すべてのアクションの下での潜在的な結果の共同セットによってより適切に特徴付けられる。例えば、「害を及ぼさない」というヒポクラテスの原則は、治療なしで生き残る患者に死をもたらすコストが、救命治療の費用よりも大きいことを意味する。本稿では,この形態の非対称対実効関数を用いた最適政策学習について考察する。非対称な反ファクト的ユーティリティが期待できないユーティリティ機能につながることを示すので、まずそれを部分的に同定する。統計的決定理論に基づき、異なる代替政策に対する最大公益損失を最小化することにより、ミニマックス決定規則を導出する。中間分類問題を解くことにより、観測データから最小損失決定ルールを学習できることを示し、これらの中間分類器の後悔によって、この手順の有限サンプル過剰な実用的損失が有界であることを示す。この概念的枠組みと方法論を,肺高血圧の可能性を秘めた患者に対して,右心カテーテルを使用すべきか否かの判断に応用する。 Data-driven decision making plays an important role even in high stakes settings like medicine and public policy. Learning optimal policies from observed data requires a careful formulation of the utility function whose expected value is maximized across a population. Although researchers typically use utilities that depend on observed outcomes alone, in many settings the decision maker's utility function is more properly characterized by the joint set of potential outcomes under all actions. For example, the Hippocratic principle to "do no harm" implies that the cost of causing death to a patient who would otherwise survive without treatment is greater than the cost of forgoing life-saving treatment. We consider optimal policy learning with asymmetric counterfactual utility functions of this form that consider the joint set of potential outcomes. We show that asymmetric counterfactual utilities lead to an unidentifiable expected utility function, and so we first partially identify it. Drawing on statistical decision theory, we then derive minimax decision rules by minimizing the maximum expected utility loss relative to different alternative policies. 
We show that one can learn minimax loss decision rules from observed data by solving intermediate classification problems, and establish that the finite sample excess expected utility loss of this procedure is bounded by the regret of these intermediate classifiers. We apply this conceptual framework and methodology to the decision about whether or not to use right heart catheterization for patients with possible pulmonary hypertension.	翻訳日:2023-11-30 17:39:08 公開日:2023-11-28
# Long Range Graphベンチマーク Long Range Graph Benchmark ( http://arxiv.org/abs/2206.08164v4 ) ライセンス: Link先を確認	Vijay Prakash Dwivedi, Ladislav Ramp\'a\v{s}ek, Mikhail Galkin, Ali Parviz, Guy Wolf, Anh Tuan Luu, Dominique Beaini	(参考訳) メッセージパッシング(MP)パラダイムに基づくグラフニューラルネットワーク(GNN)は、一般的に1ホップ隣人間で情報を交換して各層にノード表現を構築する。原則として、そのようなネットワークは、グラフ上で所定のタスクを学ぶのに必要な長距離インタラクション(lri)をキャプチャできない。近年、lriのモデリングを可能にするために、元のスパース構造を超えた完全なノード接続を考慮できるグラフのためのトランスフォーマティブベース手法の開発への関心が高まっている。しかし、単に1ホップメッセージパッシングに頼るMP-GNNは、いくつかの既存のグラフベンチマークと位置的特徴表現を組み合わせると、しばしば改善され、トランスフォーマーのようなアーキテクチャの実用性やランキングが制限される。本稿では,5つのグラフ学習データセット(PascalVOC-SP,COCO-SP,PCQM-Contact,Peptides-func,Peptides-struct)を用いたLong Range Graph Benchmark(LRGB)を提案する。ベースラインのGNNとGraph Transformerネットワークの両方をベンチマークし、長距離依存をキャプチャするモデルがこれらのタスクにおいて著しく優れていることを検証した。したがって、これらのデータセットは、LRIをキャプチャするためのMP-GNNとGraph Transformerアーキテクチャのベンチマークと探索に適している。 Graph Neural Networks (GNNs) that are based on the message passing (MP) paradigm generally exchange information between 1-hop neighbors to build node representations at each layer. In principle, such networks are not able to capture long-range interactions (LRI) that may be desired or necessary for learning a given task on graphs. Recently, there has been an increasing interest in development of Transformer-based methods for graphs that can consider full node connectivity beyond the original sparse structure, thus enabling the modeling of LRI. However, MP-GNNs that simply rely on 1-hop message passing often fare better in several existing graph benchmarks when combined with positional feature representations, among other innovations, hence limiting the perceived utility and ranking of Transformer-like architectures. Here, we present the Long Range Graph Benchmark (LRGB) with 5 graph learning datasets: PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func and Peptides-struct that arguably require LRI reasoning to achieve strong performance in a given task. 
We benchmark both baseline GNNs and Graph Transformer networks to verify that the models which capture long-range dependencies perform significantly better on these tasks. Therefore, these datasets are suitable for benchmarking and exploration of MP-GNNs and Graph Transformer architectures that are intended to capture LRI.	翻訳日:2023-11-30 17:38:44 公開日:2023-11-28
# Just ClozE! 抽象的要約における事実整合性をより高速に評価するための新しいフレームワーク Just ClozE! A Novel Framework for Evaluating the Factual Consistency Faster in Abstractive Summarization ( http://arxiv.org/abs/2210.02804v2 ) ライセンス: Link先を確認	Yiyang Li, Lei Li, Marina Litvak, Natalia Vanetik, Dingxin Hu, Yuze Li, Yanquan Zhou	(参考訳) 近年、抽象的要約における事実整合性の問題が広く注目されており、要約と文書の間の事実整合性の評価は重要かつ喫緊の課題となっている。現在の評価指標の多くは、質問応答(QA)または自然言語推論(NLI)タスクから転用されたものである。しかし、QAベースの指標は実用上きわめて時間がかかり、NLIベースの指標は解釈可能性に欠ける。本稿では、ClozEと呼ぶclozeベースの評価フレームワークを提案し、clozeベースの指標の大きな可能性を示す。ClozEはQAの高い解釈可能性を受け継ぎつつ、NLIレベルの推論速度を維持する。6つの人手アノテーション付きデータセットとメタ評価ベンチマークGO FIGURE(Gabriel et al., 2021)での実験を通じて、ClozEがQAベースの指標の解釈可能性と性能を保ったまま、評価時間をQAベースの指標に比べて約96%削減できることを示す。最後に、実用上重要なClozEの3つの側面について論じ、他の指標と比べたClozEの総合的な性能の高さをさらに示す。 The issue of factual consistency in abstractive summarization has received extensive attention in recent years, and the evaluation of factual consistency between summary and document has become an important and urgent task. Most of the current evaluation metrics are adopted from the question answering (QA) or natural language inference (NLI) task. However, the application of QA-based metrics is extremely time-consuming in practice while NLI-based metrics lack interpretability. In this paper, we propose a cloze-based evaluation framework called ClozE and show the great potential of the cloze-based metric. It inherits strong interpretability from QA, while maintaining the speed of NLI-level reasoning. We demonstrate that ClozE can reduce the evaluation time by nearly 96% relative to QA-based metrics while retaining their interpretability and performance through experiments on six human-annotated datasets and a meta-evaluation benchmark GO FIGURE (Gabriel et al., 2021). Finally, we discuss three important facets of ClozE in practice, which further shows better overall performance of ClozE compared to other metrics.	翻訳日:2023-11-30 17:29:35 公開日:2023-11-28
# 対話型顔ビデオにおける表情編集の連続制御 Continuously Controllable Facial Expression Editing in Talking Face Videos ( http://arxiv.org/abs/2209.08289v2 ) ライセンス: Link先を確認	Zhiyao Sun, Yu-Hui Wen, Tian Lv, Yanan Sun, Ziyang Zhang, Yaoyuan Wang, Yong-Jin Liu	(参考訳) 近年,音声による対面映像生成が注目されている。しかし、これらの会話ビデオの感情的な編集を連続的に制御可能な表現で行うという問題に対処する研究はほとんどなく、この業界では強い需要がある。課題は、言語関連表現と感情関連表現が高結合であることである。一方、従来の画像から画像への変換手法では、ポーズなどの他の属性と表現の結合、すなわち各フレームにおける文字表現の翻訳は、トレーニングデータ分布のバイアスにより、頭の位置が同時に変化する可能性があるため、アプリケーションではうまく機能しない。そこで本稿では,会話ビデオの高品質な表情編集手法を提案し,ユーザが編集ビデオのターゲット感情を連続的に制御できるようにする。本研究では,3dmmを用いて顔の動きをキャプチャし,スタイルガンによってモデル化されたテクスチャマップを用いて外観の詳細をキャプチャする,モーション情報編集の特別なケースとして,この課題の新しい視点を提案する。両方の表現(3dmmとテクスチャマップ)には感情情報が含まれており、ニューラルネットワークによって連続的に修正され、係数/相対空間の平均化によって容易に平滑化することができる。また,唇の同期と編集表現の誇張の程度とのトレードオフを制御するために,口形状保存損失を導入する。広範な実験とユーザスタディにより,様々な評価基準において最先端の性能が得られた。 Recently audio-driven talking face video generation has attracted considerable attention. However, very few researches address the issue of emotional editing of these talking face videos with continuously controllable expressions, which is a strong demand in the industry. The challenge is that speech-related expressions and emotion-related expressions are often highly coupled. Meanwhile, traditional image-to-image translation methods cannot work well in our application due to the coupling of expressions with other attributes such as poses, i.e., translating the expression of the character in each frame may simultaneously change the head pose due to the bias of the training data distribution. In this paper, we propose a high-quality facial expression editing method for talking face videos, allowing the user to control the target emotion in the edited video continuously. We present a new perspective for this task as a special case of motion information editing, where we use a 3DMM to capture major facial movements and an associated texture map modeled by a StyleGAN to capture appearance details. 
Both representations (3DMM and texture map) contain emotional information and can be continuously modified by neural networks and easily smoothed by averaging in coefficient/latent spaces, making our method simple yet effective. We also introduce a mouth shape preservation loss to control the trade-off between lip synchronization and the degree of exaggeration of the edited expression. Extensive experiments and a user study show that our method achieves state-of-the-art performance across various evaluation criteria.	翻訳日:2023-11-30 17:28:26 公開日:2023-11-28
# 分散ラベル空間のフェデレーション学習のための基礎モデルからのセマンティック属性の探索 Exploring Semantic Attributes from A Foundation Model for Federated Learning of Disjoint Label Spaces ( http://arxiv.org/abs/2208.13465v2 ) ライセンス: Link先を確認	Shitong Sun, Chenyang Si, Guile Wu, Shaogang Gong	(参考訳) 従来の集中型ディープラーニングパラダイムは、データプライバシや送信制限のため、異なるソースからのデータを共有できない場合、実現不可能である。この問題を解決するために、グローバルに一般化された中央モデル(サーバ)を最適化しながら、複数のソース(クライアント)に非共有データで知識を伝達するフェデレーション学習が導入された。既存のフェデレートされた学習パラダイムは、主にモデルの全体的高レベルな知識(クラスなど)の伝達に焦点を当てており、これは特定の関心の対象と密接に関連しているため、逆攻撃に悩まされる可能性がある。対照的に、本研究では、特定の関心対象に敏感でないため、よりプライバシー保護的でスケーラブルな中レベルの意味知識(属性など)の転送を検討する。この目的のために,共有されていないローカルデータを用いて,複数のローカルクライアントで中レベルの意味知識を学習し,グローバルに一般化されたデプロイメントの中央モデルを累積集約する,新しいフェデレーションゼロショット学習(fzsl)パラダイムを策定する。モデル識別能力を向上させるために,FZSLの中間レベル意味空間を充実させるために,外部知識からのセマンティック知識増強を提案する。 5つのゼロショット学習ベンチマークデータセットの大規模な実験により、中間レベルの意味的知識伝達を伴う一般化可能なフェデレーション学習モデルを最適化するためのアプローチの有効性が検証された。 Conventional centralised deep learning paradigms are not feasible when data from different sources cannot be shared due to data privacy or transmission limitation. To resolve this problem, federated learning has been introduced to transfer knowledge across multiple sources (clients) with non-shared data while optimising a globally generalised central model (server). Existing federated learning paradigms mostly focus on transferring holistic high-level knowledge (such as class) across models, which are closely related to specific objects of interest so may suffer from inverse attack. In contrast, in this work, we consider transferring mid-level semantic knowledge (such as attribute) which is not sensitive to specific objects of interest and therefore is more privacy-preserving and scalable. To this end, we formulate a new Federated Zero-Shot Learning (FZSL) paradigm to learn mid-level semantic knowledge at multiple local clients with non-shared local data and cumulatively aggregate a globally generalised central model for deployment. 
To improve the model's discriminative ability, we propose to explore semantic knowledge augmentation from external knowledge for enriching the mid-level semantic space in FZSL. Extensive experiments on five zero-shot learning benchmark datasets validate the effectiveness of our approach for optimising a generalisable federated learning model with mid-level semantic knowledge transfer.	翻訳日:2023-11-30 17:26:34 公開日:2023-11-28
# Quantum Composerを用いたブロッホ球面上での支援トンネリングのモデリング Modelling assisted tunneling on the Bloch sphere using the Quantum Composer ( http://arxiv.org/abs/2212.04845v3 ) ライセンス: Link先を確認	Jonas Bley, Vieri Mattei, Simon Goorney, Jacob Sherson, Stefan Heusler	(参考訳) ブロッホ球面表現(Bloch sphere representation)は、2準位系がとりうるすべての量子状態の幾何学的モデルであり、量子ビットの時間ダイナミクスの記述に利用できる。具体的な応用として、二重井戸ポテンシャル中の粒子の時間ダイナミクスを考える。特に、周期的な電磁場によって駆動されるSUPER原理(Swing-UP of the quantum emitter population)と呼ばれる非共鳴励起の最近の手法を、量子トンネリングの文脈に適用する。ポテンシャルの高さに適切な振動を導入すると、トンネル確率を大幅に高められることを示す。教育者と開発者の対話と呼ぶ協働的なアプローチに基づき、ソフトウェアQuantum Composerの更新版を提示する。教育目的のために、1次元シュレーディンガー方程式の最低エネルギーの2状態をブロッホ球面表現に対応づけ、関連する時間ダイナミクスについてかなり明快で直感的な物理像を与える。 The Bloch sphere representation is a geometric model for all possible quantum states of a two-level system that can be used to describe the time dynamics of a qubit. As an explicit application, we consider the time dynamics of a particle in a double-well potential. In particular, we adapt a recent method for off-resonant excitations, the so-called SUPER principle (Swing-UP of the quantum emitter population) driven by periodic electromagnetic fields, to the context of quantum tunnelling. We show that the tunnelling probability can be enhanced significantly when an appropriate oscillation of the potential height is introduced. Driven by a collaborative approach we call educator-developer dialogue, an updated version of the software Quantum Composer is presented. For educational purposes, we map the two lowest energy states of the 1D Schrödinger equation to the Bloch sphere representation, leading to a rather clear and intuitive physical picture for the pertinent time dynamics.	翻訳日:2023-11-30 17:19:21 公開日:2023-11-28
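The mapping of the two lowest double-well states onto the Bloch sphere, mentioned in the abstract above, can be sketched as follows. This is a textbook illustration under stated assumptions, not the Quantum Composer software or its API: the function names `bloch_vector` and `evolve`, the units with `hbar = 1`, and the choice of labelling the ground state as the north pole are all hypothetical. In this picture an equal superposition of the two states is localised in one well, and free evolution rotates it around the z axis to the opposite well after a time of pi * hbar / delta_e.

```python
import cmath
import math


def bloch_vector(c0, c1):
    """Bloch vector (x, y, z) of the state c0|0> + c1|1>, normalising first."""
    c0, c1 = complex(c0), complex(c1)
    norm = math.sqrt(abs(c0) ** 2 + abs(c1) ** 2)
    c0, c1 = c0 / norm, c1 / norm
    coherence = c0.conjugate() * c1
    return (2 * coherence.real, 2 * coherence.imag, abs(c0) ** 2 - abs(c1) ** 2)


def evolve(c0, c1, delta_e, t, hbar=1.0):
    """Free evolution up to a global phase: |1> picks up exp(-i * delta_e * t / hbar)."""
    return complex(c0), complex(c1) * cmath.exp(-1j * delta_e * t / hbar)


# Equal superposition of ground and first excited state: localised in one well.
amp = 1 / math.sqrt(2)
start = bloch_vector(amp, amp)

# After half a tunnelling period the Bloch vector points the opposite way.
half_period = math.pi  # pi * hbar / delta_e with hbar = delta_e = 1
end = bloch_vector(*evolve(amp, amp, delta_e=1.0, t=half_period))
print("start:", start)
print("after half a period:", end)
```

A time-dependent modulation of the potential, as studied in the paper, would add further rotations on top of this free precession; that part is not modelled here.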
# STLGRU:交通流予測のための時空間グラフGRU STLGRU: Spatio-Temporal Lightweight Graph GRU for Traffic Flow Prediction ( http://arxiv.org/abs/2212.04548v2 ) ライセンス: Link先を確認	Kishor Kumar Bhaumik, Fahim Faisal Niloy, Saif Mahmud, Simon Woo	(参考訳) トラフィックフローの信頼性の高い予測には、トラフィックデータの効率的なモデリングが必要である。異なる相関と影響が動的トラフィックネットワークで発生し、モデリングは複雑なタスクとなる。既存の文献では、交通ネットワークの複雑な空間-時間関係を捉えるための様々な方法が提案されている。しかしながら、メソッドは、長い範囲の自然の異なるローカルおよびグローバル依存を捉えるのに苦労している。また、より高度な手法が提案されるにつれて、モデルは記憶量が多くなり、低消費電力デバイスには適さないものになっている。本稿では,新しいディープラーニングフレームワークSTLGRUを提案することによって,これらの問題を解決することに焦点を当てる。具体的には,提案するSTLGRUは,メモリ拡張アテンションとゲーティング機構を用いて,交通ネットワークの局所的・グローバル的時空間関係を効果的に捉えることができる。時間的および空間的要素を分離する代わりに、メモリモジュールとゲートユニットが空間的時間的依存関係をうまく学習し、少ないパラメータでメモリ使用量を削減できることを示す。我々は,メモリフットプリントが低く,既存の手法よりも優れた性能を示すために,実世界のトラヒック予測データセットを広範囲に実験した。コードは \url{https://github.com/kishor-bhaumik/stlgru} で入手できる。 Reliable forecasting of traffic flow requires efficient modeling of traffic data. Different correlations and influences arise in a dynamic traffic network, making modeling a complicated task. Existing literature has proposed many different methods to capture the complex underlying spatial-temporal relations of traffic networks. However, methods still struggle to capture different local and global dependencies of long-range nature. Also, as more and more sophisticated methods are being proposed, models are increasingly becoming memory-heavy and, thus, unsuitable for low-powered devices. In this paper, we focus on solving these problems by proposing a novel deep learning framework - STLGRU. Specifically, our proposed STLGRU can effectively capture both local and global spatial-temporal relations of a traffic network using memory-augmented attention and gating mechanism. Instead of employing separate temporal and spatial components, we show that our memory module and gated unit can learn the spatial-temporal dependencies successfully, allowing for reduced memory usage with fewer parameters. 
We extensively experiment on several real-world traffic prediction datasets to show that our model performs better than existing methods while the memory footprint remains lower. Code is available at https://github.com/Kishor-Bhaumik/STLGRU.	翻訳日:2023-11-30 17:19:02 公開日:2023-11-28
# FeTrIL: 初級クラス増分学習のための特徴翻訳 FeTrIL: Feature Translation for Exemplar-Free Class-Incremental Learning ( http://arxiv.org/abs/2211.13131v2 ) ライセンス: Link先を確認	Gr\'egoire Petit, Adrian Popescu, Hugo Schindler, David Picard, Bertrand Delezoide	(参考訳) 難解なクラスインクリメンタル学習は、破滅的な放棄の悪影響のため、非常に困難である。新しいクラスだけでなく過去の精度を高めるためには, 段階的プロセスの安定性と可塑性のバランスが必要である。既存の非古典的クラス増分法は、モデルの連続的な微調整に焦点をあて、可塑性を優先するか、初期漸進状態後に固定された特徴抽出器を使用するか、安定性を優先する。固定特徴抽出器と擬似特徴生成器を組み合わせて安定性・塑性バランスを改善する手法を提案する。ジェネレータは、新しいクラス機能の単純かつ効果的な幾何学的変換を使用して、擬似機能で作られた過去のクラスの表現を生成する。機能の翻訳は、擬似特徴を生成するために過去のクラスのセントロイド表現の保存のみを必要とする。新しいクラスの実際の特徴と過去のクラスの擬似特徴を線形分類器に入力し、すべてのクラスを識別するために漸進的に訓練する。深層モデル全体を更新する主流のプロセスに比べて,提案手法よりもインクリメンタルなプロセスの方がはるかに高速である。実験は3つの挑戦的なデータセットと異なるインクリメンタル設定で実施される。既存手法10例と比較したところ,本手法はほとんどの場合,他の手法よりも優れていた。 Exemplar-free class-incremental learning is very challenging due to the negative effect of catastrophic forgetting. A balance between stability and plasticity of the incremental process is needed in order to obtain good accuracy for past as well as new classes. Existing exemplar-free class-incremental methods focus either on successive fine tuning of the model, thus favoring plasticity, or on using a feature extractor fixed after the initial incremental state, thus favoring stability. We introduce a method which combines a fixed feature extractor and a pseudo-features generator to improve the stability-plasticity balance. The generator uses a simple yet effective geometric translation of new class features to create representations of past classes, made of pseudo-features. The translation of features only requires the storage of the centroid representations of past classes to produce their pseudo-features. Actual features of new classes and pseudo-features of past classes are fed into a linear classifier which is trained incrementally to discriminate between all classes. The incremental process is much faster with the proposed method compared to mainstream ones which update the entire deep model. 
Experiments are performed with three challenging datasets, and different incremental settings. A comparison with ten existing methods shows that our method outperforms the others in most cases.	翻訳日:2023-11-30 17:16:57 公開日:2023-11-28
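The feature translation described in the FeTrIL abstract can be sketched in a few lines. The abstract states that pseudo-features of a past class are obtained by a geometric translation of new-class features and that only past-class centroids need to be stored; the sketch below assumes the translation is a shift by the difference of the two class centroids. The helper names and the plain-list feature format are hypothetical, and the choice of which new class to pair with which past class is left out.

```python
def centroid(features):
    """Component-wise mean of a list of equally sized feature vectors."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


def translate_features(new_class_features, past_centroid):
    """Shift new-class features so that their centroid lands on the stored past-class centroid."""
    new_centroid = centroid(new_class_features)
    shift = [p - n for p, n in zip(past_centroid, new_centroid)]
    return [[v + s for v, s in zip(f, shift)] for f in new_class_features]


# Toy example: two 2-D features of a new class and one stored past-class centroid.
new_features = [[1.0, 2.0], [3.0, 4.0]]
stored_centroid = [10.0, 0.0]
pseudo_features = translate_features(new_features, stored_centroid)
print(pseudo_features)
```

The pseudo-features keep the spread of the new class while being centred on the past class, so a linear classifier can be trained on real new-class features and pseudo past-class features together.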
# FIXED: Mixupで簡単にドメインを一般化できる FIXED: Frustratingly Easy Domain Generalization with Mixup ( http://arxiv.org/abs/2211.05228v2 ) ライセンス: Link先を確認	Wang Lu, Jindong Wang, Han Yu, Lei Huang, Xiang Zhang, Yiqiang Chen, Xing Xie	(参考訳) ドメイン一般化(Domain Generalization, DG)は、未知のターゲットドメインでも良好に機能する一般化可能なモデルを、複数の訓練ドメインから学習することを目的とする。一般的な戦略は、Mixup(Zhang et al., 2018)などの手法で訓練データを拡張し、一般化を促すことである。素朴なMixupはそのまま適用できるが、理論的・実証的な検討により、その性能を制限するいくつかの欠点が明らかになる。第一に、Mixupは不変表現の学習に利用できるドメイン情報とクラス情報を効果的に識別できない。第二に、Mixupはランダムな補間によってノイズを含む合成データ点を生じうるため、識別能力が低下する。この分析に基づき、MixupベースのDGに対する簡潔かつ効果的な拡張として、ドメイン不変なFeature mIXup(FIX)を提案する。FIXはMixupのためのドメイン不変表現を学習する。さらに識別性を高めるため、クラス間のマージンを広げる既存手法を取り入れた、Enhanced Discriminationを備えたドメイン不変Feature MIXup(FIXED)を提案する。また、その有効性の保証に関する理論的知見を示す。画像分類(Digits-DG、PACS、Office-Home)と時系列(DSADS、PAMAP2、UCI-HAR、USC-HAD)の2つのモダリティにわたる7つの公開データセットでの広範な実験により、提案手法は9つの最先端の関連手法を大きく上回り、テスト精度で最良のベースラインを平均6.5%上回ることを示した。コードは https://github.com/jindongwang/transferlearning/tree/master/code/deep/fixed で公開されている。 Domain generalization (DG) aims to learn a generalizable model from multiple training domains such that it can perform well on unseen target domains. A popular strategy is to augment training data to benefit generalization through methods such as Mixup (Zhang et al., 2018). While the vanilla Mixup can be directly applied, theoretical and empirical investigations uncover several shortcomings that limit its performance. Firstly, Mixup cannot effectively identify the domain and class information that can be used for learning invariant representations. Secondly, Mixup may introduce synthetic noisy data points via random interpolation, which lowers its discrimination capability. Based on the analysis, we propose a simple yet effective enhancement for Mixup-based DG, namely domain-invariant Feature mIXup (FIX). It learns domain-invariant representations for Mixup.
To further enhance discrimination, we leverage existing techniques that enlarge margins among classes, which yields the domain-invariant Feature MIXup with Enhanced Discrimination (FIXED) approach. We present theoretical insights about guarantees on its effectiveness. Extensive experiments on seven public datasets across two modalities including image classification (Digits-DG, PACS, Office-Home) and time series (DSADS, PAMAP2, UCI-HAR, and USC-HAD) demonstrate that our approach significantly outperforms nine state-of-the-art related methods, beating the best performing baseline by 6.5% on average in terms of test accuracy. Code is available at: https://github.com/jindongwang/transferlearning/tree/master/code/deep/fixed.	翻訳日:2023-11-30 17:16:00 公開日:2023-11-28
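For readers unfamiliar with the interpolation that the FIXED abstract builds on, here is a minimal sketch of the vanilla Mixup step applied to feature vectors. It only shows the random convex combination of two samples and their one-hot labels; the domain-invariant feature learning and the margin enlargement that distinguish FIX and FIXED are not reproduced. The function name `mixup`, the default `alpha`, and the toy vectors are assumptions for illustration.

```python
import random


def mixup(feat_a, label_a, feat_b, label_b, alpha=0.2, rng=random):
    """Convex combination of two (feature, one-hot label) pairs with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    feat = [lam * a + (1 - lam) * b for a, b in zip(feat_a, feat_b)]
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return feat, label, lam


# Toy example: two 3-D feature vectors from different classes.
rng = random.Random(0)
mixed_feat, mixed_label, lam = mixup([1.0, 0.0, 2.0], [1.0, 0.0],
                                     [3.0, 4.0, 0.0], [0.0, 1.0], rng=rng)
print("lambda:", lam)
print("mixed feature:", mixed_feat)
print("mixed label:", mixed_label)
```

The abstract's second criticism is visible here: nothing in the interpolation checks whether the mixed point is a plausible sample, which is the motivation given for adding a discrimination-enhancing term.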
# ギャップのあるグラフェン単層における光増幅ランダウ・ツェナー伝導率:光触媒された真空不安定性の模擬 Light-amplified Landau-Zener conductivity in gapped graphene monolayers: a simulacrum of photo-catalyzed vacuum instability ( http://arxiv.org/abs/2211.04206v2 ) ライセンス: Link先を確認	Selym Villalba-Chávez, Oliver Mathiak, Reinhold Egger and Carsten Müller	(参考訳) ギャップのあるグラフェン単層中の電子のバンド間遷移は、弱い強度の高周波電場波と強い定電場がフレーク面内で重畳されると、フェルミ面近傍で強く誘起される。本研究ではこの現象をフランツ・ケルディッシュ効果に相当するものとして扱い、高速振動場の光子エネルギーがグラフェンのギャップをわずかに下回る領域に特に注目する。この領域では、量子遷移は依然としてトンネル効果によって起こるが、1光子吸収チャネルによって促進される。考察したパラメータ領域では、この設定に伴う光触媒電流が、強い電場だけで駆動される電流を数桁上回ることを示す。場の有限な広がりの影響を緩和する条件を議論し、残留電流密度の式を導出する。この評価の頑健性は、本現象をグラフェンで検出できる可能性を支持するものであり、QEDにおける動的支援シュウィンガー機構の模擬実験となる。 Interband transitions of electrons in a gapped graphene monolayer are highly stimulated near the Fermi surface when a high-frequency electric wave of weak intensity and a strong constant electric field are superposed in the plane of the flake. We consider this phenomenon equivalent to the Franz-Keldysh effect, paying particular attention to the regime where the photon energy linked to the fast-oscillating field is just below the graphene gap, so that the quantum transitions still occur through tunneling effects while being facilitated by the one-photon absorption channel. In the considered parameter regime the photo-catalyzed current linked to the described setup is shown to exceed the one driven by the strong field solely by several orders of magnitude. Conditions to relieve the impact of the field's finite extension are discussed, and a formula for the residual current density is derived. The robustness of our assessment supports the viability of detecting this phenomenon in graphene, thus providing a simulation of the dynamically-assisted Schwinger mechanism in QED.	翻訳日:2023-11-30 17:15:07 公開日:2023-11-28
# マルチビュー知覚と3次元多目的追跡に基づく温室トマトにおける全果実の自動局在と全果実の再構築に関する研究 Development and evaluation of automated localisation and reconstruction of all fruits on tomato plants in a greenhouse based on multi-view perception and 3D multi-object tracking ( http://arxiv.org/abs/2211.02760v3 ) ライセンス: Link先を確認	David Rapado Rincon, Eldert J. van Henten, Gert Kootstra	(参考訳) ロボットがタスクを効果的に実行するためには、関連オブジェクトを正確に表現し、ローカライズする能力が不可欠である。従来のアプローチでは、ロボットは単に画像をキャプチャし、その画像を処理してアクションを取り、その情報を忘れる。これらの問題に対処する可能性を持つ多視点知覚を用いた手法は、複数の視点から情報の収集、統合、抽出を導く世界モデルを必要とする。さらに,様々な環境やタスクに適用可能な汎用表現の構築も困難である。本稿では,多視点認識と3次元多物体追跡を用いた閉鎖されたアグロフード環境における汎用表現構築手法を提案する。この方法は、検出対象毎に部分的点雲を生成する検出アルゴリズムと、時間とともに表現を更新する3dマルチオブジェクト追跡アルゴリズムに基づいている。表象の精度は実環境において評価され, トマト植物におけるトマトの表現と局在は, 高い包接度にもかかわらず達成され, トマトの総数5.08%, トマトは71.47%と推定された。新たな追跡指標を導入し、果実のローカライズおよび表現におけるエラーに対する貴重な洞察が、それらの使用によって提供できることを実証した。このアプローチは、閉鎖されたアグロフード環境における表現を構築するための新しいソリューションを示し、ロボットがこれらの困難な環境で効果的にタスクを実行できる可能性を示す。 The ability to accurately represent and localise relevant objects is essential for robots to carry out tasks effectively. Traditional approaches, where robots simply capture an image, process that image to take an action, and then forget the information, have proven to struggle in the presence of occlusions. Methods using multi-view perception, which have the potential to address some of these problems, require a world model that guides the collection, integration and extraction of information from multiple viewpoints. Furthermore, constructing a generic representation that can be applied in various environments and tasks is a difficult challenge. In this paper, a novel approach for building generic representations in occluded agro-food environments using multi-view perception and 3D multi-object tracking is introduced. The method is based on a detection algorithm that generates partial point clouds for each detected object, followed by a 3D multi-object tracking algorithm that updates the representation over time. 
The accuracy of the representation was evaluated in a real-world environment, where successful representation and localisation of tomatoes in tomato plants were achieved, despite high levels of occlusion, with the total count of tomatoes estimated with a maximum error of 5.08% and the tomatoes tracked with an accuracy up to 71.47%. Novel tracking metrics were introduced, demonstrating that valuable insight into the errors in localising and representing the fruits can be provided by their use. This approach presents a novel solution for building representations in occluded agro-food environments, demonstrating potential to enable robots to perform tasks effectively in these challenging environments.	翻訳日:2023-11-30 17:14:46 公開日:2023-11-28
# 推論・幻覚・対話性におけるchatgptのマルチタスク・マルチリンガル・マルチモーダル評価 A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity ( http://arxiv.org/abs/2302.04023v4 ) ライセンス: Link先を確認	Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung	(参考訳) 本稿では,ChatGPT などの対話型 LLM を公開データセットを用いて定量的に評価するためのフレームワークを提案する。 8種類の共通NLPアプリケーションタスクをカバーする23のデータセットを用いてChatGPTの広範な技術的評価を行う。これらのデータセットと、新たに設計されたマルチモーダルデータセットに基づいて、ChatGPTのマルチタスク、マルチ言語、マルチモーダルの側面を評価する。また、ChatGPTは、ほとんどのタスクでゼロショット学習でLLMよりも優れており、一部のタスクでは微調整モデルよりも優れています。生成するよりも、非ラテン語のスクリプト言語を理解する方が優れていることが分かりました。中間のコード生成ステップを通じて、テキストプロンプトからマルチモーダルコンテンツを生成することができる。さらに、ChatGPTは論理的推論、非テクスト的推論、コモンセンス推論の10種類の推論カテゴリで平均63.41%正確であることから、信頼できない推論となる。例えば、帰納的推論よりも推論的に優れている。 ChatGPTは、他のLLMのような幻覚障害に悩まされており、外部知識ベースにアクセスできないため、そのパラメトリックメモリから外因性幻覚を生成する。最後に、ChatGPTの対話的機能により、基礎となるLLMとの人間によるコラボレーションにより、要約における8%のROUGE-1、機械翻訳における2%のChrF++をマルチターンの"プロンプトエンジニアリング"方式で改善することができる。評価セット抽出のためのコードベースもリリースしています。 This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. We find that ChatGPT outperforms LLMs with zero-shot learning on most tasks and even outperforms fine-tuned models on some tasks. We find that it is better at understanding non-Latin script languages than generating them. It is able to generate multimodal content from textual prompts, via an intermediate code generation step. 
Moreover, we find that ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning, hence making it an unreliable reasoner. It is, for example, better at deductive than inductive reasoning. ChatGPT suffers from hallucination problems like other LLMs and it generates more extrinsic hallucinations from its parametric memory as it does not have access to an external knowledge base. Finally, the interactive feature of ChatGPT enables human collaboration with the underlying LLM to improve its performance, i.e., 8% ROUGE-1 on summarization and 2% ChrF++ on machine translation, in a multi-turn "prompt engineering" fashion. We also release a codebase for evaluation set extraction.	翻訳日:2023-11-30 17:07:15 公開日:2023-11-28
# マイクロメーザー量子バッテリーにおける充電プロトコルの人工知能による発見 Artificial intelligence discovery of a charging protocol in a micromaser quantum battery ( http://arxiv.org/abs/2301.09408v3 ) ライセンス: Link先を確認	Carla Rodríguez, Dario Rosa and Jan Olle	(参考訳) 量子バッテリー(QB)におけるモデル依存パラメータを最適化するための、勾配に基づく汎用的な計算フレームワークを提案する。この手法をマイクロメーザーQBにおける2つの異なる充電シナリオに適用し、ヒルベルト空間の上位のチェンバーにバッテリーを制御された自動的な方法で安定化させる充電プロトコルを発見する。このプロトコルは安定かつ頑健であり、マイクロメーザーQBの充電効率を向上させる。さらに、我々の最適化フレームワークは汎用性と効率が高く、あらゆるスケールのQB技術の進展に大いに期待が持てる。 We propose a gradient-based general computational framework for optimizing model-dependent parameters in quantum batteries (QB). We apply this method to two different charging scenarios in the micromaser QB and we discover a charging protocol for stabilizing the battery in upper-lying Hilbert space chambers in a controlled and automatic way. This protocol is found to be stable and robust, and it leads to an improved charging efficiency in micromaser QBs. Moreover, our optimization framework is highly versatile and efficient, holding great promise for the advancement of QB technologies at all scales.	翻訳日:2023-11-30 17:05:17 公開日:2023-11-28
# 深部平面視差による単眼深度推定 Deep Planar Parallax for Monocular Depth Estimation ( http://arxiv.org/abs/2301.03178v2 ) ライセンス: Link先を確認	Haoqian Liang, Zhichao Li, Ya Yang, Naiyan Wang	(参考訳) 近年、単眼深度推定における平面視差幾何(Planar Parallax Geometry)の有用性が注目されている。しかし、ネットワークが深度予測において外観情報に大きく依存するため、その可能性はまだ十分に引き出されていない。詳細な分析の結果、フローによる事前学習(flow-pretrain)を用いることで、ネットワークが連続フレームのモデリングをより有効に活用できるようになり、性能が大幅に向上することがわかった。さらに、静的シーンの仮定に反する動的物体を扱い、区別の難しい勾配変化に対処するために、平面位置埋め込み(Planar Position Embedding, PPE)を提案する。自動運転データセットであるKITTIとWaymo Open Dataset(WOD)での包括的な実験により、我々のPlanar Parallax Network(PPNet)が既存の学習ベース手法を性能面で大きく上回ることを示す。 Recent research has highlighted the utility of Planar Parallax Geometry in monocular depth estimation. However, its potential has yet to be fully realized because networks rely heavily on appearance for depth prediction. Our in-depth analysis reveals that utilizing flow-pretrain can optimize the network's usage of consecutive frame modeling, leading to substantial performance enhancement. Additionally, we propose Planar Position Embedding (PPE) to handle dynamic objects that defy static scene assumptions and to tackle slope variations that are challenging to differentiate. Comprehensive experiments on autonomous driving datasets, namely KITTI and the Waymo Open Dataset (WOD), prove that our Planar Parallax Network (PPNet) significantly surpasses existing learning-based methods in performance.	翻訳日:2023-11-30 17:04:25 公開日:2023-11-28
# 真空場誘起状態混合 Vacuum-field-induced state mixing ( http://arxiv.org/abs/2212.11610v2 ) ライセンス: Link先を確認	Diego Fernández de la Pradilla, Esteban Moreno, Johannes Feist	(参考訳) 電磁真空場の工学により、誘導カシミール・ポルダーシフト(ラムシフトとも呼ばれる)と個々の原子準位の自発放出速度を制御することができる。これらの効果の強さが2つの未結合の原子状態間のエネルギー差に匹敵するようになると、環境をトレースアウトした後、これらの状態の間に環境によって引き起こされる相互作用が現れる。この相互作用はこれまで、縮退準位と、無限の完全導体半空間や自由空間を含む単純な幾何学的配置について研究されてきた。ここでは、原子ハミルトニアンに対するこれらの非対角摂動を、正確な非エルミートハミルトニアンの観点から解析できる便利な記述を考案することで、これらの研究を一般化する。この理論を誘電体ナノ粒子に近い水素原子に応用し、従来の対角摂動理論と比較して、エネルギーと崩壊速度の両方に劇的な変化をもたらす強い真空場誘起状態混合を示す。特に、期待されるパーセル増強とは対照的に、かなりの範囲の原子-ナノ粒子間距離において、崩壊速度は驚くほど低下する。さらに,非対角摂動による非摂動固有状態の混合量の定量化を行った。我々の研究は、エネルギー準位が近接したエミッタに新しい量子状態操作の可能性を開く。 By engineering the electromagnetic vacuum field, the induced Casimir-Polder shift (also known as Lamb shift) and spontaneous emission rates of individual atomic levels can be controlled. When the strength of these effects becomes comparable to the energy difference between two previously uncoupled atomic states, an environment-induced interaction between these states appears after tracing over the environment. This interaction has been previously studied for degenerate levels and simple geometries involving infinite, perfectly conducting half-spaces or free space. Here, we generalize these studies by developing a convenient description that permits the analysis of these non-diagonal perturbations to the atomic Hamiltonian in terms of an accurate non-Hermitian Hamiltonian. Applying this theory to a hydrogen atom close to a dielectric nanoparticle, we show strong vacuum-field-induced state mixing that leads to drastic modifications in both the energies and decay rates compared to conventional diagonal perturbation theory. In particular, contrary to the expected Purcell enhancement, we find a surprising decrease of decay rates within a considerable range of atom-nanoparticle separations. Furthermore, we quantify the large degree of mixing of the unperturbed eigenstates due to the non-diagonal perturbation.
Our work opens new quantum state manipulation possibilities in emitters with closely spaced energy levels.	翻訳日:2023-11-30 17:03:48 公開日:2023-11-28
# 熱原子のアンサンブルに基づくRydberg-blockaded単一光子源の集合放出の解析 Analyzing the collective emission of a Rydberg-blockaded single-photon source based on an ensemble of thermal atoms ( http://arxiv.org/abs/2303.03937v3 ) ライセンス: Link先を確認	Jan A. P. Reuter, Max Mäusezahl, Felix Moumtsilis, Tilman Pfau, Tommaso Calarco, Robert Löw, Matthias M. Müller	(参考訳) ルビジウム原子のアンサンブルはレーザーによって励起され、リュードベリ封鎖半径内の1つの集合励起で絡み合った状態へと進化する。この状態の崩壊は、単一のアンチバンチした光子の放出につながる。マイクロセル中のルビジウム原子の高温蒸気について、原子密度分布やレーザーによる電子状態の選択のような異なる実験条件下で、そのような単一光子源の実現可能性を数値的に研究する。 3つの長方形レーザーパルスを用いた励起過程について, 切断ヒルベルト空間における系のコヒーレントダイナミクスをシミュレートする。移動するルビジウム原子の放射挙動を調査し,それに応じてレーザーパルスシーケンスを最適化する。単一励起の集団的崩壊は高速で指向的な光子放出につながり、さらにスピンエコーに似たパルスシーケンスは光子の方向性を高めることが判明した。最後に,残余の二重励起を解析し,これら集合的崩壊特性を示さず,小さな有害な役割のみを果たすことを見出した。 An ensemble of Rubidium atoms can be excited with lasers such that it evolves into an entangled state with just one collective excitation within the Rydberg blockade radius. The decay of this state leads to the emission of a single, antibunched photon. For a hot vapor of Rubidium atoms in a micro cell we numerically study the feasibility of such a single-photon source under different experimental conditions like the atomic density distribution and the choice of electronic states addressed by the lasers. For the excitation process with three rectangular laser pulses, we simulate the coherent dynamics of the system in a truncated Hilbert space. We investigate the radiative behavior of the moving Rubidium atoms and optimize the laser pulse sequence accordingly. We find that the collective decay of the single excitation leads to a fast and directed photon emission and further, that a pulse sequence similar to a spin echo increases the directionality of the photon. Finally, we analyze the residual double-excitations and find that they do not exhibit these collective decay properties and play only a minor deleterious role.	翻訳日:2023-11-30 16:53:58 公開日:2023-11-28
# FairShap: 共有値に基づくアルゴリズムフェアネスのためのデータ再重み付けアプローチ FairShap: A Data Re-weighting Approach for Algorithmic Fairness based on Shapley Values ( http://arxiv.org/abs/2303.01928v3 ) ライセンス: Link先を確認	Adrian Arnaiz-Rodriguez, Nuria Oliver	(参考訳) アルゴリズムの公平性は最も社会的に重要であるが、大規模機械学習モデルの現在のトレンドは、バイアスの多い巨大なデータセットでのトレーニングを必要とする。この文脈では、データのモデリングとバイアスの修正に焦点を当てた事前処理手法が貴重なアプローチとして現れます。本稿では,shapley値を用いたデータ評価による公平なアルゴリズム決定のための,新しいインスタンスレベルのデータ重み付け手法であるfairshapを提案する。 fairshapはモデル非依存で、事前定義されたフェアネスメトリックへの各トレーニングデータポイントの寄与度を測定するため、容易に解釈できる。私たちは、さまざまなトレーニングシナリオとモデルを使って、さまざまな性質の最先端データセットでfairshapを実証的に検証し、ベースラインと同じようなレベルの精度でfairshapモデルを生成する方法を示します。ヒストグラムと潜時空間の可視化によるFairShapの解釈可能性について説明する。さらに,データのサイズや特徴数に応じて,参照データセットのサイズとFairShapの計算コストの影響を説明するために,ユーティリティフェアネススタディとアブレーションおよび実行実験を実施している。 FairShapはアルゴリズムの公正性に対する解釈およびモデルに依存しないアプローチにおいて有望な方向を示しており、バイアス付きデータセットのみが利用可能であっても、競合する精度が得られると考えている。 Algorithmic fairness is of utmost societal importance, yet the current trend in large-scale machine learning models requires training with massive datasets that are frequently biased. In this context, pre-processing methods that focus on modeling and correcting bias in the data emerge as valuable approaches. In this paper, we propose FairShap, a novel instance-level data re-weighting method for fair algorithmic decision-making through data valuation by means of Shapley Values. FairShap is model-agnostic and easily interpretable, as it measures the contribution of each training data point to a predefined fairness metric. We empirically validate FairShap on several state-of-the-art datasets of different nature, with a variety of training scenarios and models and show how it yields fairer models with similar levels of accuracy than the baselines. We illustrate FairShap's interpretability by means of histograms and latent space visualizations. 
Moreover, we perform a utility-fairness study, and ablation and runtime experiments to illustrate the impact of the size of the reference dataset and FairShap's computational cost depending on the size of the dataset and the number of features. We believe that FairShap represents a promising direction in interpretable and model-agnostic approaches to algorithmic fairness that yield competitive accuracy even when only biased datasets are available.	翻訳日:2023-11-30 16:53:06 公開日:2023-11-28
# PlaNet-ClothPick:潜在動的計画に基づく効果的な布の平坦化 PlaNet-ClothPick: Effective Fabric Flattening Based on Latent Dynamic Planning ( http://arxiv.org/abs/2303.01345v2 ) ライセンス: Link先を確認	Halid Abdulrahim Kadi and Kasim Terzic	(参考訳) PlaNetのようなリカレントステートスペースモデルは、なぜ布の操作に失敗するのか? 最近の研究は、これを観測のぼやけた予測に起因するとしており、そのために潜在空間で直接計画することが困難になるという。本稿では,PlaNetをピックアンドプレースによる布の平坦化領域に適用することで,その背景となる理由を考察する。布の輪郭上の遷移関数の急激な不連続性は、正確な潜在動的モデルを学ぶのを困難にし、MPCプランナーは布の少し外側でピックアクションを生成する。ピック位置を布マスク上に限定し,特別に設計された軌道で訓練することで,メッシュフリーのPlaNet-ClothPickは,シミュレーションにおける主要なメトリクスで視覚的計画法や政策学習法を上回り,最先端のメッシュベースの計画手法と同等のパフォーマンスを達成している。特に、我々のモデルはより高速な行動推論を示し、この領域の最先端ロボットシステムよりも遷移モデルパラメータが少ない。その他の追加資料は、https://sites.google.com/view/planet-clothpick で入手できる。 Why do Recurrent State Space Models such as PlaNet fail at cloth manipulation tasks? Recent work has attributed this to the blurry prediction of the observation, which makes it difficult to plan directly in the latent space. This paper explores the reasons behind this by applying PlaNet in the pick-and-place fabric-flattening domain. We find that the sharp discontinuity of the transition function on the contour of the fabric makes it difficult to learn an accurate latent dynamic model, causing the MPC planner to produce pick actions slightly outside of the article. By limiting picking space on the cloth mask and training on specially engineered trajectories, our mesh-free PlaNet-ClothPick surpasses visual planning and policy learning methods on principal metrics in simulation, achieving performance similar to state-of-the-art mesh-based planning approaches. Notably, our model exhibits faster action inference and requires fewer transitional model parameters than the state-of-the-art robotic systems in this domain. Other supplementary materials are available at: https://sites.google.com/view/planet-clothpick.	翻訳日:2023-11-30 16:52:26 公開日:2023-11-28
# 指数的に可変な波動関数重なりによるフラックス量子ビットの高速ユニバーサル制御 Fast universal control of a flux qubit via exponentially tunable wave-function overlap ( http://arxiv.org/abs/2303.01102v2 ) ライセンス: Link先を確認	Svend Krøjer, Anders Enevold Dahl, Kasper Sangild Christensen, Morten Kjaergaard and Karsten Flensberg	(参考訳) 保護された超伝導量子ビットの高速かつ高忠実性制御と読み出しは、本質的に不感度のため困難である。本稿では,この課題を解決するために,緩和に対する調整可能な保護レベルを享受するフラックス量子ビットの変種を提案する。我々の量子ビット設計であるDSFQ(Double-shunted flux qubit)は、3接合のリング構造を通して一般的な二重井戸ポテンシャルを実現する。ジャンクションの1つは調整可能であり、バリアの高さと保護レベルを制御することができる。バリアの低下に依存する単一および2量子ビットゲート動作の解析を行う。非計算状態が操作中に占有されないため、これは高い忠実度ゲートをもたらす実行可能な方法であることを示す。また、DSFQが読み出し共振器への減衰から保護されたまま、外部印加フラックスを調整することにより、読み出し共振器との有効結合を制御できることを示す。最後に,ループ領域が同一でない場合でも,大域磁場の変動に指数関数的に影響を受けないDSFQの二重ループグラディメトリ版についても検討した。 Fast, high-fidelity control and readout of protected superconducting qubits are fundamentally challenging due to their inherent insensitivity. We propose a flux qubit variation which enjoys a tunable level of protection against relaxation to resolve this outstanding issue. Our qubit design, the double-shunted flux qubit (DSFQ), realizes a generic double-well potential through its three-junction ring geometry. One of the junctions is tunable, making it possible to control the barrier height and thus the level of protection. We analyze single- and two-qubit gate operations that rely on lowering the barrier. We show that this is a viable method that results in high-fidelity gates as the non-computational states are not occupied during operations. Further, we show how the effective coupling to a readout resonator can be controlled by adjusting the externally applied flux while the DSFQ is protected from decaying into the readout resonator. Finally, we also study a double-loop gradiometric version of the DSFQ which is exponentially insensitive to variations in the global magnetic field, even when the loop areas are non-identical.	翻訳日:2023-11-30 16:52:04 公開日:2023-11-28
# 画像復元のための混合階層ネットワーク Mixed Hierarchy Network for Image Restoration ( http://arxiv.org/abs/2302.09554v4 ) ライセンス: Link先を確認	Hu Gao and Depeng Dang	(参考訳) 画像復元は、デブラリングやデレイニングなど、長期にわたる低レベルの視覚問題である。画像復元の過程では,空間的詳細や文脈情報だけでなく,システムの複雑さも考慮する必要がある。画像復元の質を保証できる手法は数多くあるが, 現状技術(SOTA)手法の複雑さも増大している。この動機付けにより、これらの競合する目標のバランスをとることができる混合階層ネットワークを提案する。システム複雑性を軽減するためにブロック内の設計を行いながら、劣化した画像からコンテキスト情報と空間詳細を段階的に復元する。具体的には,まずエンコーダデコーダアーキテクチャを用いて文脈情報を学習し,空間的詳細を保存する高分解能分岐と組み合わせる。簡易な解析と比較のために、このアーキテクチャのシステムの複雑さを軽減するために、非線形活性化関数を乗法で置き換えたり取り除いたりし、単純なネットワーク構造を使う。さらに,エンコーダデコーダの中間ブロックに対する空間畳み込みをグローバルな自己注意に置き換える。その結果、mhnetと呼ばれる密にリンクされた階層アーキテクチャは、画像のデレイニングやデブラリングなど、いくつかの画像復元タスクにおいて強力なパフォーマンス向上をもたらす。 Image restoration is a long-standing low-level vision problem, e.g., deblurring and deraining. In the process of image restoration, it is necessary to consider not only the spatial details and contextual information of restoration to ensure the quality, but also the system complexity. Although many methods have been able to guarantee the quality of image restoration, the system complexity of the state-of-the-art (SOTA) methods is increasing as well. Motivated by this, we present a mixed hierarchy network that can balance these competing goals. Our main proposal is a mixed hierarchy architecture, that progressively recovers contextual information and spatial details from degraded images while we design intra-blocks to reduce system complexity. Specifically, our model first learns the contextual information using encoder-decoder architectures, and then combines them with high-resolution branches that preserve spatial detail. In order to reduce the system complexity of this architecture for convenient analysis and comparison, we replace or remove the nonlinear activation function with multiplication and use a simple network structure. In addition, we replace spatial convolution with global self-attention for the middle block of encoder-decoder. 
The resulting tightly interlinked hierarchy architecture, named MHNet, delivers strong performance gains on several image restoration tasks, including image deraining and deblurring.	翻訳日:2023-11-30 16:51:25 公開日:2023-11-28
# 逆ロバスト分類におけるランダム化の役割について On the Role of Randomization in Adversarially Robust Classification ( http://arxiv.org/abs/2302.07221v3 ) ライセンス: Link先を確認	Lucas Gnecco-Heredia, Yann Chevaleyre, Benjamin Negrevergne, Laurent Meunier, Muni Sreenivas Pydi	(参考訳) ディープニューラルネットワークは、テストデータの小さな逆方向の摂動に弱いことが知られている。敵攻撃から守るため、確率的分類器は決定論的分類に代わるものとして提案されている。しかし, 確率的分類器の有効性は, 決定論的分類と比較して矛盾している。本稿では,逆ロバストな分類器の構築におけるランダム化の役割を明らかにする。決定論的分類器の基本的な仮説セットが与えられた場合、ランダムなアンサンブルが敵のリスクで設定された仮説を上回り、前の結果を延ばす条件を示す。さらに、確率的二項分類器(ランダム化アンサンブルを含む)に対して、それを上回る決定論的分類器が存在することを示す。最後に,多種多様な確率的分類器,すなわちランダム化アンサンブルとパラメトリック/入力ノイズインジェクションに対する決定論的分類器を含む決定論的仮説集合を明示的に記述する。 Deep neural networks are known to be vulnerable to small adversarial perturbations in test data. To defend against adversarial attacks, probabilistic classifiers have been proposed as an alternative to deterministic ones. However, literature has conflicting findings on the effectiveness of probabilistic classifiers in comparison to deterministic ones. In this paper, we clarify the role of randomization in building adversarially robust classifiers. Given a base hypothesis set of deterministic classifiers, we show the conditions under which a randomized ensemble outperforms the hypothesis set in adversarial risk, extending previous results. Additionally, we show that for any probabilistic binary classifier (including randomized ensembles), there exists a deterministic classifier that outperforms it. Finally, we give an explicit description of the deterministic hypothesis set that contains such a deterministic classifier for many types of commonly used probabilistic classifiers, i.e. randomized ensembles and parametric/input noise injection.	翻訳日:2023-11-30 16:50:49 公開日:2023-11-28
# 不満足な部分最適化によるcspの効率的な説明(拡張アルゴリズムと例) Efficiently Explaining CSPs with Unsatisfiable Subset Optimization (extended algorithms and examples) ( http://arxiv.org/abs/2303.11712v3 ) ライセンス: Link先を確認	Emilio Gamba, Bart Bogaerts, Tias Guns	(参考訳) 我々は,制約満足度問題 (CSP) の解を,人間に理解可能な方法で段階的に説明する手法を最近提案した。ここでは、コスト関数を用いて単純さを定量化する単純な推論ステップの列を説明する。説明生成アルゴリズムは、派生した不満足な式から最小不満足な部分集合(MUS)を抽出し、いわゆる非冗長な説明とMUSを1対1で対応させる。しかし、mus抽出アルゴリズムは、与えられたコスト関数に対する部分的最小性や最適性の保証を提供しない。したがって、これらの形式的基礎の上に構築し、改善の主なポイント、すなわち(与えられたコストメトリックに関して)確実に最適な説明を効率的に生成する方法に取り組む。そこで本研究では,(1)最適制約を満たさない部分集合を探索するヒット集合型アルゴリズム,(2)複数のアルゴリズム呼び出しで関連する情報を再利用する手法,(3)説明シーケンス生成を高速化するためにドメイン固有情報を利用する手法を開発した。我々は多数のcsp問題に対してアルゴリズムを実験的に検証した。我々のアルゴリズムは、説明品質と計算時間(標準のMUSアプローチよりも平均56%高速)において、MUSアプローチよりも優れていることがわかった。 We build on a recently proposed method for stepwise explaining solutions of Constraint Satisfaction Problems (CSP) in a human-understandable way. An explanation here is a sequence of simple inference steps where simplicity is quantified using a cost function. The algorithms for explanation generation rely on extracting Minimal Unsatisfiable Subsets (MUS) of a derived unsatisfiable formula, exploiting a one-to-one correspondence between so-called non-redundant explanations and MUSs. However, MUS extraction algorithms do not provide any guarantee of subset minimality or optimality with respect to a given cost function. Therefore, we build on these formal foundations and tackle the main points of improvement, namely how to generate explanations efficiently that are provably optimal (with respect to the given cost metric). For that, we developed (1) a hitting set-based algorithm for finding the optimal constrained unsatisfiable subsets; (2) a method for re-using relevant information over multiple algorithm calls; and (3) methods exploiting domain-specific information to speed up the explanation sequence generation. We experimentally validated our algorithms on a large number of CSP problems. 
We found that our algorithms outperform the MUS approach in terms of explanation quality and computation time (on average up to 56% faster than a standard MUS approach).	翻訳日:2023-11-30 16:41:47 公開日:2023-11-28
# 動的信頼度によるスパイクニューラルネットワークの可能性の解き放つ Unleashing the Potential of Spiking Neural Networks by Dynamic Confidence ( http://arxiv.org/abs/2303.10276v3 ) ライセンス: Link先を確認	Chen Li, Edward Jones, Steve Furber	(参考訳) 本稿では,スパイキングニューラルネットワーク(SNN)の精度とレイテンシのトレードオフを緩和する新しい手法を提案する。このアプローチでは、sn出力から時間とともに信頼情報をデコードし、各推論を終了するタイミングを動的に決定できる意思決定エージェントを開発する。提案手法であるDynamic Confidenceは,SNNにいくつかの大きなメリットを提供する。 1. 実行時に動的にレイテンシを最適化し、既存の低レイテンシSNNアルゴリズムとは分離することができる。 CIFAR-10とImageNetデータセットに関する実験は、Dynamic Confidenceを適用した後、8つの異なる設定で平均40%のスピードアップを示した。 2) Dynamic Confidenceにおける意思決定エージェントは,パラメータ空間の構築が容易で,非常に堅牢であり,実装が非常に容易である。 3)提案手法は,現在のSNNが接近するターゲットを設定する任意のSNNのポテンシャルを可視化する。例えば、SNNが各入力サンプルの最も適切な時刻で終了できる場合、ResNet-50 SNNは平均4.71タイムステップでImageNet上で82.47%の精度を達成できる。 SNNの可能性を解き放つには、信頼性の高い意思決定エージェントを構築し、高品質な基底真理推定を行う必要がある。この点において、Dynamic ConfidenceはSNNの可能性を実現するための重要なステップである。 This paper presents a new methodology to alleviate the fundamental trade-off between accuracy and latency in spiking neural networks (SNNs). The approach involves decoding confidence information over time from the SNN outputs and using it to develop a decision-making agent that can dynamically determine when to terminate each inference. The proposed method, Dynamic Confidence, provides several significant benefits to SNNs. 1. It can effectively optimize latency dynamically at runtime, setting it apart from many existing low-latency SNN algorithms. Our experiments on CIFAR-10 and ImageNet datasets have demonstrated an average 40% speedup across eight different settings after applying Dynamic Confidence. 2. The decision-making agent in Dynamic Confidence is straightforward to construct and highly robust in parameter space, making it extremely easy to implement. 3. The proposed method enables visualizing the potential of any given SNN, which sets a target for current SNNs to approach. 
For instance, if an SNN can terminate at the most appropriate time point for each input sample, a ResNet-50 SNN can achieve an accuracy as high as 82.47% on ImageNet within just 4.71 time steps on average. Unlocking the potential of SNNs needs a highly-reliable decision-making agent to be constructed and fed with a high-quality estimation of ground truth. In this regard, Dynamic Confidence represents a meaningful step toward realizing the potential of SNNs.	翻訳日:2023-11-30 16:40:43 公開日:2023-11-28
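The early-termination idea in the Dynamic Confidence abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `infer_with_early_exit`, the summing of outputs over time steps, the softmax-based confidence, and the fixed `threshold` are all assumptions made for this example; the abstract does not say how confidence is decoded or how the stopping rule is chosen.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def infer_with_early_exit(outputs_per_step, threshold=0.9):
    """Stop inference once the running confidence reaches `threshold`.

    outputs_per_step: one per-class output vector per time step
        (hypothetical stand-in for the SNN output layer).
    threshold: hypothetical fixed confidence level; an assumption here.
    Returns (predicted_class, steps_used).
    """
    accumulated = [0.0] * len(outputs_per_step[0])
    for step, out in enumerate(outputs_per_step, start=1):
        # Accumulate evidence over time, then turn it into a confidence.
        accumulated = [a + o for a, o in zip(accumulated, out)]
        probs = softmax(accumulated)
        confidence = max(probs)
        if confidence >= threshold:
            break  # confident enough: terminate this inference early
    return probs.index(confidence), step
```

With a lower threshold the loop exits after fewer time steps, which is the accuracy-latency trade-off the abstract refers to; if the threshold is never reached, all available time steps are used.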
# MAPSeg:3次元マスケード自動符号化と擬似ラベルによる不均一な医用画像分割のための統一的ドメイン適応 MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling ( http://arxiv.org/abs/2303.09373v2 ) ライセンス: Link先を確認	Xuzhe Zhang, Yuhao Wu, Elsa Angelini, Ang Li, Jia Guo, Jerod M. Rasmussen, Thomas G. O'Connor, Pathik D. Wadhwa, Andrea Parolin Jackowski, Hai Li, Jonathan Posner, Andrew F. Laine, Yun Wang	(参考訳) ロバストセグメンテーションは、大規模、多施設、縦断的な医療スキャンから定量的測定を導出するために重要である。しかし、手動でアノテートする医療スキャンは高価で労働集約的であり、すべてのドメインで利用できるとは限らない。非教師付きドメイン適応(Unsupervised domain adapt, UDA)は、他のドメインから利用可能なラベルを活用することで、このラベルとスカシティの問題を軽減する、よく研究されている手法である。本研究では,多種多様な医用画像のセグメンテーションにおいて,多目的性と優れた性能を有する$\textbf{unified}$ UDAフレームワークであるMasked Autoencoding and Pseudo-Labeling Segmentation (MAPSeg)を紹介する。我々の知る限りでは、医療画像セグメンテーションにおける4つの異なるドメインシフトに取り組むための枠組みを体系的にレビューし、開発する最初の研究である。さらに重要なのは、MAPSegは、同等のパフォーマンスを維持しながら、$\textbf{centralized}$, $\textbf{federated}$, $\textbf{test-time}$ UDAに適用できる最初のフレームワークである。我々は,MAPSegを,乳幼児用MRIデータセットと一般用CT-MRIデータセットの最先端手法と比較し,MAPSegは他者よりも大きなマージン(プライベートMRIデータセットの10.5Dice,一般用CT-MRIデータセットの5.7Dice改善)で優れていた。 MAPSegは非常に実用的な価値を持ち、現実世界の問題に適用できる。コードと事前訓練されたモデルは、後ほど利用可能になります。 Robust segmentation is critical for deriving quantitative measures from large-scale, multi-center, and longitudinal medical scans. Manually annotating medical scans, however, is expensive and labor-intensive and may not always be available in every domain. Unsupervised domain adaptation (UDA) is a well-studied technique that alleviates this label-scarcity problem by leveraging available labels from another domain. In this study, we introduce Masked Autoencoding and Pseudo-Labeling Segmentation (MAPSeg), a $\textbf{unified}$ UDA framework with great versatility and superior performance for heterogeneous and volumetric medical image segmentation. 
To the best of our knowledge, this is the first study that systematically reviews and develops a framework to tackle four different domain shifts in medical image segmentation. More importantly, MAPSeg is the first framework that can be applied to $\textbf{centralized}$, $\textbf{federated}$, and $\textbf{test-time}$ UDA while maintaining comparable performance. We compare MAPSeg with previous state-of-the-art methods on a private infant brain MRI dataset and a public cardiac CT-MRI dataset, and MAPSeg outperforms the others by a large margin (10.5 Dice improvement on the private MRI dataset and 5.7 on the public CT-MRI dataset). MAPSeg has great practical value and can be applied to real-world problems. Our code and pretrained model will be available later.	翻訳日:2023-11-30 16:40:19 公開日:2023-11-28
# 連続翻訳対称性を保持する量子格子モデル Quantum lattice models that preserve continuous translation symmetry ( http://arxiv.org/abs/2303.07649v2 ) ライセンス: Link先を確認	Dominic G. Lewis, Achim Kempf, Nicolas C. Menicucci	(参考訳) 量子場理論に対する帯域制限のアプローチは、信号処理からシャノンサンプリング定理を通じて連続かつ離散的な場を同時に扱うことができる。一般相対性理論と場の量子論における矛盾する仮定は、両方の要求を満たすために針をスレッドできる魅力的な分析ツールを使うことを動機付けている。帯域制限連続量子場は格子理論に同型であるが、固定格子を必要としない。必要最小間隔を持つ任意の格子を用いることができる。これは、格子間隔の極限が 0 になるのを避ける同型である。本研究では、量子格子理論における効果的連続対称性の出現を含む、この同型の帰結を探求する。これらの連続対称性に対する保存格子可観測性と、この2つの視点から局所性の双対性を得る。この研究とその拡張は、固定格子のない離散性から生じる連続量子場の数値格子モデルを考えるための有用なツールを提供するとともに、格子モデルにおける創発的連続対称性に対する新たな洞察と、これらの現象の実験的実証を提供する。 Bandlimited approaches to quantum field theory offer the tantalizing possibility of working with fields that are simultaneously both continuous and discrete via the Shannon Sampling Theorem from signal processing. Conflicting assumptions in general relativity and quantum field theory motivate the use of such an appealing analytical tool that could thread the needle to meet both requirements. Bandlimited continuous quantum fields are isomorphic to lattice theories, yet without requiring a fixed lattice. Any lattice with a required minimum spacing can be used. This is an isomorphism that avoids taking the limit of the lattice spacing going to zero. In this work, we explore the consequences of this isomorphism, including the emergence of effectively continuous symmetries in quantum lattice theories. One obtains conserved lattice observables for these continuous symmetries, as well as a duality of locality from the two perspectives. We expect this work and its extensions to provide useful tools for considering numerical lattice models of continuous quantum fields arising from the availability of discreteness without a fixed lattice, as well as offering new insights into emergent continuous symmetries in lattice models and possible laboratory demonstrations of these phenomena.	翻訳日:2023-11-30 16:39:10 公開日:2023-11-28
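For reference, the Shannon sampling theorem invoked in the abstract above can be written as below. This is the standard textbook statement, not a formula taken from the paper; the symbols $a$ (lattice spacing) and $x_0$ (lattice offset) are notation chosen for this note. Because the offset $x_0$ is arbitrary, the same bandlimited field is recovered from any shifted copy of the lattice, which is one way to read the abstract's remark that no fixed lattice is required.

```latex
% Shannon sampling on a lattice with spacing a and arbitrary offset x_0,
% valid for finite-energy f bandlimited to wavenumbers |k| <= \pi / a.
f(x) = \sum_{n=-\infty}^{\infty} f(x_0 + n a)\,
       \operatorname{sinc}\!\left(\frac{x - x_0 - n a}{a}\right),
\qquad
\operatorname{sinc}(t) = \frac{\sin(\pi t)}{\pi t}.
```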
# AGI: 教育のための汎用人工知能 AGI: Artificial General Intelligence for Education ( http://arxiv.org/abs/2304.12479v4 ) ライセンス: Link先を確認	Ehsan Latif, Gengchen Mai, Matthew Nyaaba, Xuansheng Wu, Ninghao Liu, Guoyu Lu, Sheng Li, Tianming Liu, and Xiaoming Zhai	(参考訳) 汎用人工知能 (AGI) は, GPT-4 や ChatGPT といった大規模言語モデルやチャットボットの出現により, 将来の技術としてグローバルに認識されるようになった。従来のAIモデルは、通常、限られた範囲のタスク用に設計されており、トレーニングのためにかなりの量のドメイン固有のデータを必要とし、教育における複雑な対人ダイナミクスを必ずしも考慮しない。これに対し、最近の大規模な事前学習モデルによって駆動されるAGIは、推論、問題解決、意思決定、さらには人間の感情や社会的相互作用を理解することなど、人間レベルの知性を必要とするタスクを実行する機械の能力において、大きな飛躍を示している。本稿では,将来の教育目標達成,教育・カリキュラムの設計,評価の実施など,今後の教育におけるAGIの重要概念,能力,範囲,ポテンシャルを概観する。 AGIは知的学習システム、教育評価、評価手順を大幅に改善することができる。 AGIシステムは個々の学生のニーズに適応し、適切な学習体験を提供する。また、生徒のパフォーマンスに関する総合的なフィードバックを提供し、生徒の進捗に応じて動的に指導方法を調整できる。本稿は、AGIの能力が人間の感情や社会的相互作用を理解することにまで及んでいることを強調する。本稿は,AGIによる教育における倫理的課題として,データのバイアス,公平性,プライバシなどについて論じるとともに,宿題や教育,リクルートといった学術的場面における責任あるAGIの利用を保証するための行動規範の必要性を強調する。また,AGIの開発には,教育者とAI技術者の学際的な連携が不可欠であると結論づけた。 Artificial general intelligence (AGI) has gained global recognition as a future technology due to the emergence of breakthrough large language models and chatbots such as GPT-4 and ChatGPT, respectively. Conventional AI models are typically designed for a limited range of tasks, demand significant amounts of domain-specific data for training, and may not always consider intricate interpersonal dynamics in education. In contrast, AGI, driven by the recent large pre-trained models, represents a significant leap in the capability of machines to perform tasks that require human-level intelligence, such as reasoning, problem-solving, decision-making, and even understanding human emotions and social interactions. This position paper reviews AGI's key concepts, capabilities, scope, and potential within future education, including achieving future educational goals, designing pedagogy and curriculum, and performing assessments.
It highlights that AGI can significantly improve intelligent tutoring systems, educational assessment, and evaluation procedures. AGI systems can adapt to individual student needs, offering tailored learning experiences. They can also provide comprehensive feedback on student performance and dynamically adjust teaching methods based on student progress. The paper emphasizes that AGI's capabilities extend to understanding human emotions and social interactions, which are critical in educational settings. The paper discusses that ethical issues in education with AGI include data bias, fairness, and privacy and emphasizes the need for codes of conduct to ensure responsible AGI use in academic settings like homework, teaching, and recruitment. We also conclude that the development of AGI necessitates interdisciplinary collaborations between educators and AI engineers to advance research and application efforts.	翻訳日:2023-11-30 16:32:21 公開日:2023-11-28
# NeRFによる3次元のセグメンテーション Segment Anything in 3D with NeRFs ( http://arxiv.org/abs/2304.12308v4 ) ライセンス: Link先を確認	Jiazhong Cen, Zanwei Zhou, Jiemin Fang, Chen Yang, Wei Shen, Lingxi Xie, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian	(参考訳) 最近,Segment Anything Model (SAM) は,任意のものを2次元画像に分割できる強力なビジョン基盤モデルとして登場した。本稿では,SAMを3次元オブジェクトに分割する手法を提案する。 3Dでコストがかかるデータ取得とアノテーションの手順を複製するのではなく、我々はNeural Radiance Field(NeRF)を安価でオフザシェルフとして活用し、マルチビュー2D画像を3D空間に接続する効率的なソリューションを設計する。提案したソリューションを,SA3D, セグメンテーション・アニーシング(Seegment Anything in 3D)と呼ぶ。単一のビューでターゲットオブジェクトに対して手動のセグメンテーションプロンプト(例えば粗い点)を提供することが要求され、SAMでこのビューでその2Dマスクを生成するのに使用される。次に、SA3Dは、ボクセルグリッドで構築されたターゲットオブジェクトの3Dマスクを反復的に完了するように、様々な視点でマスク逆レンダリングとクロスビューのセルフプロンプトを交互に行う。前者は、SAMが取得した2Dマスクを現在の視点で3Dマスクに投影し、NeRFが学習した密度分布を誘導し、後者は、NeRFレンダリングされた2DマスクからのSAMへの入力として、信頼性の高いプロンプトを自動的に抽出する。実験では,sa3dが様々なシーンに適応し,数分で3dセグメンテーションを実現することを示す。 2dモデルが複数のビューにまたがって迅速セグメント化に着実に対処できる限り、2d vision foundation modelの能力を3dに引き上げる潜在的な方法論を明らかにする。私たちのコードはhttps://github.com/jumpat/segmentanythingin3dで利用可能です。 Recently, the Segment Anything Model (SAM) emerged as a powerful vision foundation model which is capable to segment anything in 2D images. This paper aims to generalize SAM to segment 3D objects. Rather than replicating the data acquisition and annotation procedure which is costly in 3D, we design an efficient solution, leveraging the Neural Radiance Field (NeRF) as a cheap and off-the-shelf prior that connects multi-view 2D images to the 3D space. We refer to the proposed solution as SA3D, for Segment Anything in 3D. It is only required to provide a manual segmentation prompt (e.g., rough points) for the target object in a single view, which is used to generate its 2D mask in this view with SAM. Next, SA3D alternately performs mask inverse rendering and cross-view self-prompting across various views to iteratively complete the 3D mask of the target object constructed with voxel grids. 
The former projects the 2D mask obtained by SAM in the current view onto the 3D mask with guidance from the density distribution learned by the NeRF; the latter automatically extracts reliable prompts from the NeRF-rendered 2D mask in another view as the input to SAM. We show in experiments that SA3D adapts to various scenes and achieves 3D segmentation within minutes. Our research reveals a potential methodology to lift the ability of a 2D vision foundation model to 3D, as long as the 2D model can steadily address promptable segmentation across multiple views. Our code is available at https://github.com/Jumpat/SegmentAnythingin3D.	翻訳日:2023-11-30 16:31:54 公開日:2023-11-28
# エッジ方向性がヘテロフィリックグラフの学習を改善する Edge Directionality Improves Learning on Heterophilic Graphs ( http://arxiv.org/abs/2305.10498v3 ) ライセンス: Link先を確認	Emanuele Rossi, Bertrand Charpentier, Francesco Di Giovanni, Fabrizio Frasca, Stephan Günnemann, Michael Bronstein	(参考訳) グラフニューラルネットワーク(GNN)は、関係データモデリングのデファクト標準ツールとなっている。しかし、現実世界のグラフの多くは有向であるにもかかわらず、今日のGNNモデルの大半は、グラフを無向にすることで、この情報を完全に捨てている。その理由は歴史的である。 1)スペクトルGNNの初期変種の多くは、明示的に無向グラフを必要とし、 2) ホモフィリックグラフに関する最初のベンチマークでは, 方向性による有意な利得は得られなかった。本稿では, ヘテロフィリックな設定においてグラフを有向として扱うと, グラフの有効ホモフィリーが増大し, 方向情報の正しい利用による潜在的な利得が示唆されることを示す。そこで我々は,有向グラフの深層学習のための新しい汎用フレームワークであるDirected Graph Neural Network (Dir-GNN)を紹介する。 Dir-GNNは、入ってくるエッジと出ていくエッジを別々に集約することで、任意のメッセージパッシングニューラルネットワーク(MPNN)をエッジ方向情報を考慮するように拡張できる。我々は,Dir-GNNが有向Weisfeiler-Lehmanテストの表現力に一致し、従来のMPNNの表現力を上回ることを証明した。広範な実験において、我々のフレームワークは、ホモフィリックなデータセットでは性能を変えない一方で、ヘテロフィリックなベンチマークではGCN、GAT、GraphSageのようなベースモデルよりも大幅に向上し、より複雑な手法よりも優れ、新しい最先端の結果が得られることを検証した。 Graph Neural Networks (GNNs) have become the de-facto standard tool for modeling relational data. However, while many real-world graphs are directed, the majority of today's GNN models discard this information altogether by simply making the graph undirected. The reasons for this are historical: 1) many early variants of spectral GNNs explicitly required undirected graphs, and 2) the first benchmarks on homophilic graphs did not find significant gain from using direction. In this paper, we show that in heterophilic settings, treating the graph as directed increases the effective homophily of the graph, suggesting a potential gain from the correct use of directionality information. To this end, we introduce Directed Graph Neural Network (Dir-GNN), a novel general framework for deep learning on directed graphs. Dir-GNN can be used to extend any Message Passing Neural Network (MPNN) to account for edge directionality information by performing separate aggregations of the incoming and outgoing edges.
We prove that Dir-GNN matches the expressivity of the Directed Weisfeiler-Lehman test, exceeding that of conventional MPNNs. In extensive experiments, we validate that while our framework leaves performance unchanged on homophilic datasets, it leads to large gains over base models such as GCN, GAT and GraphSage on heterophilic benchmarks, outperforming much more complex methods and achieving new state-of-the-art results.	翻訳日:2023-11-30 16:22:05 公開日:2023-11-28
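The "separate aggregations of the incoming and outgoing edges" described in the Dir-GNN abstract above can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' code: the function name `dir_layer`, the mean aggregator, the ReLU nonlinearity, and the three weight matrices `W_self`, `W_in`, `W_out` are choices made for this example only.

```python
def mean_vectors(vectors, dim):
    """Element-wise mean of a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dir_layer(features, edges, W_self, W_in, W_out):
    """One message-passing layer that keeps edge direction.

    features: list of node feature vectors.
    edges: list of directed (src, dst) pairs.
    W_self, W_in, W_out: hypothetical weight matrices for the node itself,
        its in-neighbours, and its out-neighbours respectively.
    """
    n, dim = len(features), len(features[0])
    in_nbrs = [[] for _ in range(n)]
    out_nbrs = [[] for _ in range(n)]
    for src, dst in edges:
        out_nbrs[src].append(features[dst])  # src points at dst
        in_nbrs[dst].append(features[src])   # dst is pointed at by src
    updated = []
    for v in range(n):
        h_self = matvec(W_self, features[v])
        h_in = matvec(W_in, mean_vectors(in_nbrs[v], dim))
        h_out = matvec(W_out, mean_vectors(out_nbrs[v], dim))
        # Sum the three contributions and apply ReLU.
        updated.append([max(0.0, s + i + o)
                        for s, i, o in zip(h_self, h_in, h_out)])
    return updated
```

Because `W_in` and `W_out` are distinct, reversing every edge generally changes the output, whereas a layer applied to the undirected version of the graph cannot tell the two cases apart.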
# Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC with 2-to-8b DNN Acceleration and 30%-Boost Adaptive Body Biasing ( http://arxiv.org/abs/2305.08415v3 ) License: see link	Francesco Conti, Gianna Paulin, Angelo Garofalo, Davide Rossi, Alfio Di Mauro, Georg Rutishauser, Gianmarco Ottavi, Manuel Eggimann, Hayate Okuhara, Luca Benini	Emerging Artificial Intelligence-enabled Internet-of-Things (AI-IoT) Systems-on-a-Chip (SoCs) for augmented reality, personalized healthcare, and nano-robotics need to run many diverse tasks within a power envelope of a few tens of mW over a wide range of operating conditions: compute-intensive but strongly quantized Deep Neural Network (DNN) inference, as well as signal processing and control requiring high-precision floating-point. We present Marsellus, an all-digital heterogeneous SoC for AI-IoT end-nodes fabricated in GlobalFoundries 22nm FDX that combines 1) a general-purpose cluster of 16 RISC-V Digital Signal Processing (DSP) cores attuned for the execution of a diverse range of workloads exploiting 4-bit and 2-bit arithmetic extensions (XpulpNN), combined with fused MAC&LOAD operations and floating-point support; 2) a 2-8bit Reconfigurable Binary Engine (RBE) to accelerate 3x3 and 1x1 (pointwise) convolutions in DNNs; 3) a set of On-Chip Monitoring (OCM) blocks connected to an Adaptive Body Biasing (ABB) generator and a hardware control loop, enabling on-the-fly adaptation of transistor threshold voltages. Marsellus achieves up to 180 Gop/s or 3.32 Top/s/W on 2-bit precision arithmetic in software, and up to 637 Gop/s or 12.4 Top/s/W on hardware-accelerated DNN layers.	Translated: 2023-11-30 16:20:20 Published: 2023-11-28
# Event-Free Moving Object Segmentation from Moving Ego Vehicle ( http://arxiv.org/abs/2305.00126v2 ) License: see link	Zhuyun Zhou, Zongwei Wu, Danda Pani Paudel, Rémi Boutteau, Fan Yang, Luc Van Gool, Radu Timofte, Dominique Ginhac	Moving object segmentation (MOS) in dynamic scenes is challenging for autonomous driving, especially for sequences obtained from moving ego vehicles. Most state-of-the-art methods leverage motion cues obtained from optical flow maps. However, since these methods are often based on optical flows that are pre-computed from successive RGB frames, they neglect the temporal information of events occurring between frames, which limits their practicality in real-life situations. To address these limitations, we propose to exploit event cameras for better video understanding, which provide rich motion cues without relying on optical flow. To foster research in this area, we first introduce a novel large-scale dataset called DSEC-MOS for moving object segmentation from moving ego vehicles. Subsequently, we devise EmoFormer, a novel network able to exploit the event data. For this purpose, we fuse the event prior with spatial semantic maps to distinguish moving objects from the static background, adding another level of dense supervision around the objects of interest, i.e., the moving ones. Our proposed network relies only on event data for training but does not require event input during inference, making it directly comparable to frame-only methods in terms of efficiency and more widely usable in many application cases. An exhaustive comparison with 8 state-of-the-art video object segmentation methods highlights a significant performance improvement of our method over all other methods. Project Page: https://github.com/ZZY-Zhou/DSEC-MOS.	Translated: 2023-11-30 16:16:47 Published: 2023-11-28
# A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence ( http://arxiv.org/abs/2305.15347v2 ) License: see link	Junyi Zhang, Charles Herrmann, Junhwa Hur, Luisa Polania Cabrera, Varun Jampani, Deqing Sun, Ming-Hsuan Yang	Text-to-image diffusion models have made significant advances in generating and editing high-quality images. As a result, numerous approaches have explored the ability of diffusion model features to understand and process single images for downstream tasks, e.g., classification, semantic segmentation, and stylization. However, significantly less is known about what these features reveal across multiple, different images and objects. In this work, we exploit Stable Diffusion (SD) features for semantic and dense correspondence and discover that with simple post-processing, SD features can perform quantitatively similar to SOTA representations. Interestingly, the qualitative analysis reveals that SD features have very different properties compared to existing representation learning features, such as the recently released DINOv2: while DINOv2 provides sparse but accurate matches, SD features provide high-quality spatial information but sometimes inaccurate semantic matches. We demonstrate that a simple fusion of these two features works surprisingly well, and a zero-shot evaluation using nearest neighbors on these fused features provides a significant performance gain over state-of-the-art methods on benchmark datasets, e.g., SPair-71k, PF-Pascal, and TSS. We also show that these correspondences can enable interesting applications such as instance swapping in two images.	Translated: 2023-11-30 16:09:01 Published: 2023-11-28
# Pre-training Language Models for Comparative Reasoning ( http://arxiv.org/abs/2305.14457v4 ) License: see link	Mengxia Yu, Zhihan Zhang, Wenhao Yu, Meng Jiang	Comparative reasoning is a process of comparing objects, concepts, or entities to draw conclusions, which constitutes a fundamental cognitive ability. In this paper, we propose a novel framework to pre-train language models for enhancing their abilities of comparative reasoning over texts. While there have been approaches for NLP tasks that require comparative reasoning, they suffer from costly manual data labeling and limited generalizability to different tasks. Our approach introduces a novel method of collecting scalable data for text-based entity comparison, which leverages both structured and unstructured data. Moreover, we present a framework of pre-training language models via three novel objectives on comparative reasoning. Evaluation on downstream tasks including comparative question answering, question generation, and summarization shows that our pre-training framework significantly improves the comparative reasoning abilities of language models, especially under low-resource conditions. This work also releases the first integrated benchmark for comparative reasoning.	Translated: 2023-11-30 16:07:33 Published: 2023-11-28
# Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models ( http://arxiv.org/abs/2305.12476v4 ) License: see link	Lin Li, Jun Xiao, Guikun Chen, Jian Shao, Yueting Zhuang, Long Chen	Pretrained vision-language models, such as CLIP, have demonstrated strong generalization capabilities, making them promising tools in the realm of zero-shot visual recognition. Visual relation detection (VRD) is a typical task that identifies relationship (or interaction) types between object pairs within an image. However, naively utilizing CLIP with prevalent class-based prompts for zero-shot VRD has several weaknesses, e.g., it struggles to distinguish between different fine-grained relation types and it neglects essential spatial information of two objects. To this end, we propose a novel method for zero-shot VRD: RECODE, which solves RElation detection via COmposite DEscription prompts. Specifically, RECODE first decomposes each predicate category into subject, object, and spatial components. Then, it leverages large language models (LLMs) to generate description-based prompts (or visual cues) for each component. Different visual cues enhance the discriminability of similar relation categories from different perspectives, which significantly boosts performance in VRD. To dynamically fuse different cues, we further introduce a chain-of-thought method that prompts LLMs to generate reasonable weights for different visual cues. Extensive experiments on four VRD benchmarks have demonstrated the effectiveness and interpretability of RECODE.	Translated: 2023-11-30 16:06:24 Published: 2023-11-28
# Drivers of social influence in the Twitter migration to Mastodon ( http://arxiv.org/abs/2305.19056v3 ) License: see link	Lucio La Cava, Luca Maria Aiello, Andrea Tagarelli	The migration of Twitter users to Mastodon following Elon Musk's acquisition presents a unique opportunity to study collective behavior and gain insights into the drivers of coordinated behavior in online media. We analyzed the social network and the public conversations of about 75,000 migrated users and observed that the temporal trace of their migrations is compatible with a phenomenon of social influence, as described by a compartmental epidemic model of information diffusion. Drawing from prior research on behavioral change, we delved into the factors that account for variations across different Twitter communities in how effectively the influence to migrate spread. Communities in which the influence process unfolded more rapidly exhibit lower density of social connections, higher levels of signaled commitment to migrating, and more emphasis on shared identity and exchange of factual knowledge in the community discussion. These factors account collectively for 57% of the variance in the observed data. Our results highlight the joint importance of network structure, commitment, and psycho-linguistic aspects of social interactions in describing grassroots collective action, and contribute to deepening our understanding of the mechanisms driving processes of behavior change of online groups.	Translated: 2023-11-30 15:57:22 Published: 2023-11-28
# HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance ( http://arxiv.org/abs/2305.18766v3 ) License: see link	Junzhe Zhu and Peiye Zhuang	The advancements in automatic text-to-3D generation have been remarkable. Most existing methods use pre-trained text-to-image diffusion models to optimize 3D representations like Neural Radiance Fields (NeRFs) via latent-space denoising score matching. Yet, these methods often result in artifacts and inconsistencies across different views due to their suboptimal optimization approaches and limited understanding of 3D geometry. Moreover, the inherent constraints of NeRFs in rendering crisp geometry and stable textures usually lead to a two-stage optimization to attain high-resolution details. This work proposes holistic sampling and smoothing approaches to achieve high-quality text-to-3D generation, all in a single-stage optimization. We compute denoising scores in the text-to-image diffusion model's latent and image spaces. Instead of randomly sampling timesteps (also referred to as noise levels in denoising score matching), we introduce a novel timestep annealing approach that progressively reduces the sampled timestep throughout optimization. To generate high-quality renderings in a single-stage optimization, we propose regularization for the variance of z-coordinates along NeRF rays. To address texture flickering issues in NeRFs, we introduce a kernel smoothing technique that refines importance sampling weights coarse-to-fine, ensuring accurate and thorough sampling in high-density regions. Extensive experiments demonstrate the superiority of our method over previous approaches, enabling the generation of highly detailed and view-consistent 3D assets through a single-stage training process.	Translated: 2023-11-30 15:57:02 Published: 2023-11-28
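The timestep annealing idea in the HiFA abstract (replacing uniformly random timesteps with a timestep that shrinks as optimization proceeds) can be sketched as a simple schedule. The abstract does not state the schedule's functional form, so the linear decay below, the function name `annealed_timestep`, and the bounds `t_max=980` and `t_min=20` are assumptions for illustration only.

```python
def annealed_timestep(step, total_steps, t_max=980, t_min=20):
    """Return a diffusion timestep that never increases as training progresses.

    Assumption: a linear decay from t_max to t_min. The schedule actually
    used in the paper may have a different shape.
    """
    if total_steps <= 1:
        return t_max
    # Fraction of the optimization completed, clamped to [0, 1].
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return round(t_max - (t_max - t_min) * progress)
```

Early iterations then see high noise levels, which shape coarse structure, and late iterations see low noise levels, which refine detail.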
# SR-OOD: Out-of-Distribution Detection via Sample Repairing ( http://arxiv.org/abs/2305.18228v2 ) License: see link	Rui Sun, Andi Zhang, Haiming Zhang, Jinke Ren, Yao Zhu, Ruimao Zhang, Shuguang Cui, Zhen Li	Out-of-distribution (OOD) detection is a crucial task for ensuring the reliability and robustness of machine learning models. Recent works have shown that generative models often assign high confidence scores to OOD samples, indicating that they fail to capture the semantic information of the data. To tackle this problem, we take advantage of sample repairing and propose a novel OOD detection framework, namely SR-OOD. Our framework leverages the idea that repairing an OOD sample can reveal its semantic inconsistency with the in-distribution data. Specifically, our framework consists of two components: a sample repairing module and a detection module. The sample repairing module applies erosion to an input sample and uses a generative adversarial network to repair it. The detection module then determines whether the input sample is OOD using a distance metric. Our framework does not require any additional data or label information for detection, making it applicable to various scenarios. We conduct extensive experiments on three image datasets: CIFAR-10, CelebA, and Pokemon. The results demonstrate that our approach achieves superior performance over the state-of-the-art generative methods in OOD detection.	Translated: 2023-11-30 15:56:14 Published: 2023-11-28
# ControlVideo: Conditional Control for One-shot Text-driven Video Editing and Beyond ( http://arxiv.org/abs/2305.17098v2 ) License: see link	Min Zhao, Rongzhen Wang, Fan Bao, Chongxuan Li, Jun Zhu	This paper presents ControlVideo for text-driven video editing: generating a video that aligns with a given text while preserving the structure of the source video. Building on a pre-trained text-to-image diffusion model, ControlVideo enhances the fidelity and temporal consistency by incorporating additional conditions (such as edge maps), and fine-tuning the key-frame and temporal attention on the source video-text pair via an in-depth exploration of the design space. Extensive experimental results demonstrate that ControlVideo outperforms various competitive baselines by delivering videos that exhibit high fidelity with respect to the source content and temporal consistency, all while aligning with the text. By incorporating low-rank adaptation layers into the model before training, ControlVideo is further empowered to generate videos that align seamlessly with reference images. More importantly, ControlVideo can be readily extended to the more challenging task of long video editing (e.g., with hundreds of frames), where maintaining long-range temporal consistency is crucial. To achieve this, we propose to construct a fused ControlVideo by applying basic ControlVideo to overlapping short video segments and key frame videos and then merging them by pre-defined weight functions. Empirical results validate its capability to create videos across 140 frames, which is approximately 5.83 to 17.5 times more than what previous works achieved. The code is available at https://github.com/thu-ml/controlvideo and the visualization results are available at https://drive.google.com/file/d/1wEgc2io3UwmoC5vTPbkccFvTkwVqsZlK/view?usp=drive_link.	Translated: 2023-11-30 15:55:54 Published: 2023-11-28
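The long-video strategy in the ControlVideo abstract (run the basic model on overlapping short segments, then merge the results with pre-defined weight functions) can be sketched as a weighted average over overlapping windows. The sketch assumes one scalar per frame in place of the model's per-frame predictions, and the helper names `fuse_segments` and `triangular_weight` are hypothetical. The abstract does not specify the weight functions the authors use.

```python
def fuse_segments(num_frames, segments, weight_fn):
    """Blend per-frame values predicted on overlapping segments.

    segments: list of (start, values) where values covers frames
              start .. start + len(values) - 1.
    weight_fn(i, length): positive weight of the i-th frame of a segment.
    """
    totals = [0.0] * num_frames
    weights = [0.0] * num_frames
    for start, values in segments:
        for i, value in enumerate(values):
            w = weight_fn(i, len(values))
            totals[start + i] += w * value
            weights[start + i] += w
    if any(w == 0.0 for w in weights):
        raise ValueError("every frame must be covered by at least one segment")
    # Normalise so that the weights on each frame sum to one.
    return [t / w for t, w in zip(totals, weights)]


def triangular_weight(i, length):
    """Assumed weight function: peaks mid-segment, stays positive at the ends."""
    return min(i + 1, length - i)
```

Frames covered by a single segment keep that segment's value. Frames in an overlap are pulled toward the segment in which they sit closer to the centre, which is one simple way to avoid visible seams at segment boundaries.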
# Context-lumpable stochastic bandits ( http://arxiv.org/abs/2306.13053v2 ) License: see link	Chung-Wei Lee, Qinghua Liu, Yasin Abbasi-Yadkori, Chi Jin, Tor Lattimore, Csaba Szepesvári	We consider a contextual bandit problem with $S$ contexts and $K$ actions. In each round $t=1,2,\dots$, the learner observes a random context and chooses an action based on its past experience. The learner then observes a random reward whose mean is a function of the context and the action for the round. Under the assumption that the contexts can be lumped into $r\le \min\{S,K\}$ groups such that the mean reward for the various actions is the same for any two contexts that are in the same group, we give an algorithm that outputs an $\epsilon$-optimal policy after using at most $\widetilde O(r (S +K )/\epsilon^2)$ samples with high probability and provide a matching $\Omega(r(S+K)/\epsilon^2)$ lower bound. In the regret minimization setting, we give an algorithm whose cumulative regret up to time $T$ is bounded by $\widetilde O(\sqrt{r^3(S+K)T})$. To the best of our knowledge, we are the first to show the near-optimal sample complexity in the PAC setting and $\widetilde O(\sqrt{\mathrm{poly}(r)(S+K)T})$ minimax regret in the online setting for this problem. We also show our algorithms can be applied to more general low-rank bandits and get improved regret bounds in some scenarios.	Translated: 2023-11-30 15:47:53 Published: 2023-11-28
# Decentralized Online Federated G-Network Learning for Lightweight Intrusion Detection ( http://arxiv.org/abs/2306.13029v2 ) License: see link	Mert Nakıp, Baran Can Gül, and Erol Gelenbe	Cyberattacks are increasingly threatening networked systems, often with the emergence of new types of unknown (zero-day) attacks and the rise of vulnerable devices. Such attacks can also target multiple components of a Supply Chain, which can be protected via Machine Learning (ML)-based Intrusion Detection Systems (IDSs). However, the need to learn large amounts of labelled data often limits the applicability of ML-based IDSs to cybersystems that only have access to private local data, while distributed systems such as Supply Chains have multiple components, each of which must preserve its private data while being targeted by the same attack. To address this issue, this paper proposes a novel Decentralized and Online Federated Learning Intrusion Detection (DOF-ID) architecture based on the G-Network model with collaborative learning, that allows each IDS used by a specific component to learn from the experience gained in other components, in addition to its own local data, without violating the data privacy of other components. The performance evaluation results using public Kitsune and Bot-IoT datasets show that DOF-ID significantly improves the intrusion detection performance in all of the collaborating components, with acceptable computation time for online learning.	Translated: 2023-11-30 15:47:21 Published: 2023-11-28
# Kernelized Reinforcement Learning with Order Optimal Regret Bounds ( http://arxiv.org/abs/2306.07745v2 ) License: see link	Sattar Vakili, Julia Olkhovskaya	Reinforcement learning (RL) has shown empirical success in various real world settings with complex models and large state-action spaces. The existing analytical results, however, typically focus on settings with a small number of state-actions or simple models such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have considered nonlinear function approximation using kernel ridge regression. We propose $\pi$-KRVI, an optimistic modification of least-squares value iteration, when the state-action value function is represented by a reproducing kernel Hilbert space (RKHS). We prove the first order-optimal regret guarantees under a general setting. Our results show a significant improvement, polynomial in the number of episodes, over the state of the art. In particular, with highly non-smooth kernels (such as the Neural Tangent kernel or some Matérn kernels) the existing results lead to trivial (superlinear in the number of episodes) regret bounds. We show a sublinear regret bound that is order optimal in the case of Matérn kernels where a lower bound on regret is known.	Translated: 2023-11-30 15:45:41 Published: 2023-11-28
# Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models ( http://arxiv.org/abs/2306.03089v2 ) License: see link	Andrew F. Luo, Margaret M. Henderson, Leila Wehbe, Michael J. Tarr	A long standing goal in neuroscience has been to elucidate the functional organization of the brain. Within higher visual cortex, functional accounts have remained relatively coarse, focusing on regions of interest (ROIs) and taking the form of selectivity for broad categories such as faces, places, bodies, food, or words. Because the identification of such ROIs has typically relied on manually assembled stimulus sets consisting of isolated objects in non-ecological contexts, exploring functional organization without robust a priori hypotheses has been challenging. To overcome these limitations, we introduce a data-driven approach in which we synthesize images predicted to activate a given brain region using paired natural images and fMRI recordings, bypassing the need for category-specific stimuli. Our approach, Brain Diffusion for Visual Exploration ("BrainDiVE"), builds on recent generative methods by combining large-scale diffusion models with brain-guided image synthesis. Validating our method, we demonstrate the ability to synthesize preferred images with appropriate semantic specificity for well-characterized category-selective ROIs. We then show that BrainDiVE can characterize differences between ROIs selective for the same high-level category. Finally, we identify novel functional subdivisions within these ROIs, validated with behavioral data. These results advance our understanding of the fine-grained functional organization of human visual cortex, and provide well-specified constraints for further examination of cortical organization using hypothesis-driven methods.	Translated: 2023-11-30 15:44:04 Published: 2023-11-28
# Quantum metrology in the noisy intermediate-scale quantum era ( http://arxiv.org/abs/2307.07701v2 ) License: see link	Lin Jiao, Wei Wu, Si-Yuan Bai, Jun-Hong An	Quantum metrology pursues the physical realization of measurements of physical quantities with higher precision than the classically achievable limit by exploiting quantum features, such as entanglement and squeezing, as resources. It has potential applications in developing next-generation frequency standards, magnetometers, radar, and navigation. However, the ubiquitous decoherence in the quantum world degrades the quantum resources and forces the precision back to or even below the classical limit, which is called the no-go theorem of noisy quantum metrology and greatly hinders its applications. Therefore, how to realize the promised performance of quantum metrology in realistic noisy situations has attracted much attention in recent years. We will review the principle, categories, and applications of quantum metrology. Special attention will be paid to different quantum resources that can bring quantum superiority in enhancing sensitivity. Then, we will introduce the no-go theorem of noisy quantum metrology and its active control under different kinds of noise-induced decoherence situations.	Translated: 2023-11-30 15:36:44 Published: 2023-11-28
# Patent Documents to Engineering Design Knowledge Graphs ( http://arxiv.org/abs/2307.06985v4 ) License: see link	L Siddharth, Jianxi Luo	Aimed at supporting knowledge-intensive tasks in the design process, populating design knowledge from text documents involves the extraction of triples (head entity :: relationship :: tail entity, or h :: r :: t) that could be combined into a knowledge graph representation. As relationships are largely chosen from ontological or common-sense alternatives, knowledge graphs built using these depict an approximation or restricted view of design knowledge, rather than what is explicated in the text documents. In this article, we present a data-driven approach to identify and explicate facts (h :: r :: t) from sentences in patent documents. We create a dataset of 44,227 sentences and facts, encompassing all patent classifications while also capturing the variations among patent document sections. Using this dataset, we train taggers that classify tokens to: 1) identify all entities (h) and relationships (r) and 2) specific relationships (r) for a pair of entities (h :: ___ :: t). While these taggers are built upon transformer-based sequence classification models, we evaluate our proposed method against edge classification approaches that use linear classifiers and graph neural networks, incorporating transformer-based token embeddings and linguistic features. The simplicity and coverage of the proposed method enable its application to patent documents at any scale and variety. Upon deploying an open-source Python package, we apply our method to patent documents related to fan systems. From the knowledge graphs thus extracted, we explain how facts could be generalised to domain ontologies as well as be specified to subsystem levels. We also highlight the importance of knowledge graph representations by retrieving and explicating the knowledge of key issues in fan systems, while holding a comparative discussion against opinions from ChatGPT.	Translated: 2023-11-30 15:36:00 Published: 2023-11-28
# Pauli principle in polaritonic chemistry ( http://arxiv.org/abs/2307.03508v5 ) License: see link	Tamás Szidarovszky	The consequences of enforcing permutational symmetry, as required by the Pauli principle (spin-statistics theorem), on the state space of molecular ensembles interacting with the quantized radiation mode of a cavity are discussed. The Pauli-allowed collective states are obtained by means of group theory, i.e., by projecting the state space onto the appropriate irreducible representations of the permutation group of the indistinguishable molecules. It is shown that the ratio of Pauli-allowed collective states decreases very rapidly as the number of molecules increases. Bosonic states are more abundant than fermionic states, and the brightness of the Pauli-allowed state space (the contribution from photon excited states) increases (decreases) with increasing fine structure in the energy levels of the material ground (excited) state manifold. Numerical results are shown for the realistic example of rovibrating H$_2$O molecules interacting with an infrared cavity mode.	Translated: 2023-11-30 15:34:29 Published: 2023-11-28
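The claim that the fraction of Pauli-allowed collective states shrinks quickly with the number of molecules can be illustrated with a toy count. The sketch below is not the paper's calculation: it assumes N identical molecules with d single-molecule levels each, ignores the cavity mode and any nuclear-spin structure, and simply compares the dimensions of the totally symmetric (bosonic) and totally antisymmetric (fermionic) subspaces with the full product space. The function name `count_states` is a hypothetical helper introduced here.

```python
from math import comb  # available in Python 3.8+


def count_states(n_molecules: int, n_levels: int) -> dict:
    """Toy dimension count for N identical molecules with d levels each.

    Assumption (not from the paper): only the totally symmetric and totally
    antisymmetric subspaces of the N-fold product space are considered.
    """
    total = n_levels ** n_molecules                           # all product states
    bosonic = comb(n_molecules + n_levels - 1, n_molecules)   # symmetric subspace
    fermionic = comb(n_levels, n_molecules)                   # antisymmetric subspace
    return {"total": total, "bosonic": bosonic, "fermionic": fermionic}


if __name__ == "__main__":
    for n in (2, 4, 8):
        c = count_states(n, n_levels=3)
        print(n, c["bosonic"] / c["total"], c["fermionic"] / c["total"])
```

In this toy setting the symmetric subspace is never smaller than the antisymmetric one, and both fractions fall off as N grows, which is the qualitative trend the abstract describes.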
# Decoherence Limits the Cost to Simulate an Anharmonic Oscillator ( http://arxiv.org/abs/2307.00748v3 ) License: see link	Tzula B. Propp, Sayonee Ray, John B. DeBrota, Tameem Albash, and Ivan Deutsch	We study how decoherence increases the efficiency with which we can simulate the quantum dynamics of an anharmonic oscillator governed by the Kerr effect. As decoherence washes out the fine-grained sub-Planck structure associated with phase-space quantum interference in the closed quantum system, open quantum dynamics can be more efficiently simulated using a coarse-grained finite-difference numerical integration. We tie this to the way in which decoherence recovers the semiclassical truncated Wigner approximation (TWA), which strongly differs from the exact closed-system dynamics at times when quantum interference leads to cat states and more general superpositions of coherent states. The regression in quadrature measurement statistics to semiclassical dynamics becomes more pronounced as the initial amplitude of the oscillator grows, with implications for the quantum advantage that might be accessible as system size grows in noisy quantum devices. Lastly, we show that this regression does not have the form of a convex noise model, such as for a depolarizing noise channel. Instead, closed quantum system effects interact with the open system effects, giving rise to distinct open system behavior.	Translated: 2023-11-30 15:33:31 Published: 2023-11-28
# Stitched ViTs are Flexible Vision Backbones ( http://arxiv.org/abs/2307.00154v2 ) License: see link	Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang	Large pretrained plain vision Transformers (ViTs) have been the workhorse for many downstream tasks. However, existing works utilizing off-the-shelf ViTs are inefficient in terms of training and deployment, because adopting ViTs of individual sizes requires separate training runs and is restricted by fixed performance-efficiency trade-offs. In this paper, we are inspired by stitchable neural networks (SN-Net), a new framework that cheaply produces a single model covering rich subnetworks by stitching pretrained model families, supporting diverse performance-efficiency trade-offs at runtime. Building upon this foundation, we introduce SN-Netv2, a systematically improved model stitching framework to facilitate downstream task adaptation. Specifically, we first propose a two-way stitching scheme to enlarge the stitching space. We then design a resource-constrained sampling strategy that takes into account the underlying FLOPs distributions in the space for better sampling. Finally, we observe that learning stitching layers as a low-rank update plays an essential role in downstream tasks to stabilize training and ensure a good Pareto frontier. With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility. Code is available at https://github.com/ziplab/SN-Netv2.	Translated: 2023-11-30 15:33:08 Published: 2023-11-28
# No Transfers Required: Integrating Last Mile with Public Transit Using Opti-Mile ( http://arxiv.org/abs/2306.15943v2 ) License: see link	Raashid Altaf, Pravesh Biyani	Public transit is a popular mode of transport because of its affordability, despite the inconvenience of the transfers required to reach most areas. For example, in the bus and metro network of New Delhi, only 30% of stops can be directly accessed from any starting point, thus requiring transfers for most commutes. Additionally, last-mile services like rickshaws, tuk-tuks or shuttles are commonly used as feeders to the nearest public transit access points, which further adds to the complexity and inefficiency of a journey. Ultimately, users often face a tradeoff between coverage and transfers to reach their destination, regardless of the mode of transit or the use of last-mile services. To address the problem of limited accessibility and inefficiency due to transfers in public transit systems, we propose "opti-mile," a novel trip planning approach that combines last-mile services with public transit such that no transfers are required. Opti-mile allows users to customise trip parameters such as maximum walking distance and acceptable fare range. We analyse the transit network of New Delhi, evaluating the efficiency, feasibility and advantages of opti-mile for optimal multi-modal trips between randomly selected source-destination pairs. We demonstrate that opti-mile trips lead to a 10% reduction in distance travelled for an 18% increase in price compared to traditional shortest paths. We also show that opti-mile trips provide better coverage of the city than public transit, without a significant fare increase.	Translated: 2023-11-30 15:32:37 Published: 2023-11-28
# GraSS: Contrastive Learning with Gradient Guided Sampling Strategy for Remote Sensing Image Semantic Segmentation ( http://arxiv.org/abs/2306.15868v3 ) License: see link	Zhaoyang Zhang, Zhen Ren, Chao Tao, Yunsheng Zhang, Chengli Peng, Haifeng Li	Self-supervised contrastive learning (SSCL) has achieved significant milestones in remote sensing image (RSI) understanding. Its essence lies in designing an unsupervised instance discrimination pretext task to extract image features from a large number of unlabeled images that are beneficial for downstream tasks. However, existing instance-discrimination-based SSCL methods suffer from two limitations when applied to the RSI semantic segmentation task: 1) a positive sample confounding issue; and 2) a feature adaptation bias, which arises when they are applied to semantic segmentation tasks that require pixel-level or object-level features. In this study, we observe that the discrimination information can be mapped to specific regions in RSI through the gradient of the unsupervised contrastive loss, and that these specific regions tend to contain singular ground objects. Based on this, we propose contrastive learning with Gradient guided Sampling Strategy (GraSS) for RSI semantic segmentation. GraSS consists of two stages: Instance Discrimination warm-up (ID warm-up) and Gradient guided Sampling contrastive training (GS training). The ID warm-up aims to provide initial discrimination information to the contrastive loss gradients. The GS training stage aims to utilize the discrimination information contained in the contrastive loss gradients and adaptively select regions in RSI patches that contain more singular ground objects, in order to construct new positive and negative samples. Experimental results on three open datasets demonstrate that GraSS effectively enhances the performance of SSCL in high-resolution RSI semantic segmentation. Compared to seven baseline methods from five different types of SSCL, GraSS achieves an average improvement of 1.57% and a maximum improvement of 3.58% in terms of mean intersection over union. The source code is available at https://github.com/GeoX-Lab/GraSS	Translated: 2023-11-30 15:32:09 Published: 2023-11-28
# DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data ( http://arxiv.org/abs/2306.14153v3 ) License: see link	Jingyuan Zhu, Huimin Ma, Jiansheng Chen, Jian Yuan	Denoising diffusion probabilistic models (DDPMs) have been proven capable of synthesizing high-quality images with remarkable diversity when trained on large amounts of data. Typical diffusion models and modern large-scale conditional generative models like text-to-image generative models are vulnerable to overfitting when fine-tuned on extremely limited data. Existing works have explored subject-driven generation using a reference set containing a few images. However, few prior works explore DDPM-based domain-driven generation, which aims to learn the common features of target domains while maintaining diversity. This paper proposes a novel DomainStudio approach to adapt DDPMs pre-trained on large-scale source datasets to target domains using limited data. It is designed to keep the diversity of subjects provided by source domains and to obtain high-quality and diverse adapted samples in target domains. We propose to keep the relative distances between adapted samples to achieve considerable generation diversity. In addition, we further enhance the learning of high-frequency details for better generation quality. Our approach is compatible with both unconditional and conditional diffusion models. This work makes the first attempt to realize unconditional few-shot image generation with diffusion models, achieving better quality and greater diversity than current state-of-the-art GAN-based approaches. Moreover, this work also significantly relieves overfitting for conditional generation and realizes high-quality domain-driven generation, further expanding the applicable scenarios of modern large-scale text-to-image models.	Translated: 2023-11-30 15:31:37 Published: 2023-11-28
# 2-Level Reinforcement Learning for Ships on Inland Waterways ( http://arxiv.org/abs/2307.16769v2 ) License: see link	Martin Waltz, Niklas Paulig, Ostap Okhrin	This paper proposes a realistic modularized framework for controlling autonomous surface vehicles (ASVs) on inland waterways (IWs) based on deep reinforcement learning (DRL). The framework comprises two levels: a high-level local path planning (LPP) unit and a low-level path following (PF) unit, each consisting of a DRL agent. The LPP agent is responsible for planning a path under consideration of nearby vessels, traffic rules, and the geometry of the waterway. We thereby transfer a recently proposed spatial-temporal recurrent neural network architecture to continuous action spaces. The LPP agent improves operational safety in comparison to a state-of-the-art artificial potential field method by increasing the minimum distance to other vessels by 65% on average. The PF agent performs low-level actuator control while accounting for shallow water influences and the environmental forces of wind, waves, and currents. Compared with a proportional-integral-derivative (PID) controller, the PF agent yields only 61% of the mean cross-track error while significantly reducing control effort in terms of the required absolute rudder angle. Lastly, both agents are jointly validated in simulation, employing the lower Elbe in northern Germany as an example case and using real automatic identification system (AIS) trajectories to model the behavior of other ships.	Translated: 2023-11-30 15:25:50 Published: 2023-11-28
# Addressing the Impact of Localized Training Data in Graph Neural Networks ( http://arxiv.org/abs/2307.12689v2 ) License: see link	Akansha A	Graph Neural Networks (GNNs) have achieved notable success in learning from graph-structured data, owing to their ability to capture intricate dependencies and relationships between nodes. They excel in various applications, including semi-supervised node classification, link prediction, and graph generation. However, it is important to acknowledge that the majority of state-of-the-art GNN models are built upon the assumption of an in-distribution setting, which hinders their performance on real-world graphs with dynamic structures. In this article, we aim to assess the impact of training GNNs on localized subsets of the graph. Such restricted training data may lead to a model that performs well in the specific region it was trained on but fails to generalize and make accurate predictions for the entire graph. In the context of graph-based semi-supervised learning (SSL), resource constraints often lead to scenarios where the dataset is large but only a portion of it can be labeled, affecting the model's performance. This limitation affects tasks like anomaly detection or spam detection when labeling processes are biased or influenced by human subjectivity. To tackle the challenges posed by localized training data, we approach the problem as an out-of-distribution (OOD) data issue by aligning the distributions between the training data, which represents a small portion of labeled data, and the graph inference process, which involves making predictions for the entire graph. We propose a regularization method to minimize distributional discrepancies between localized training data and graph inference, improving model performance on OOD data. Extensive tests on popular GNN models show significant performance improvement on three citation GNN benchmark datasets. The regularization approach effectively enhances model adaptation and generalization, overcoming challenges posed by OOD data.	Translated: 2023-11-30 15:23:45 Published: 2023-11-28
# Geometry-Aware Adaptation for Pretrained Models ( http://arxiv.org/abs/2307.12226v2 ) License: see link	Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala	Machine learning models, including prominent zero-shot models, are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes (or, in the case of zero-shot prediction, to improve its performance) without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping argmax with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes for when it is not possible to predict the entire range of unobserved classes. Empirically, using easily available external metrics, our proposed approach, Loki, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, Loki can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.	Translated: 2023-11-30 15:23:03 Published: 2023-11-28
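The prediction rule described in this abstract, replacing argmax with a Fréchet mean over a label metric, can be sketched in a few lines. This is a minimal illustration rather than the Loki implementation: it assumes the model outputs a probability for each observed class, that the Fréchet mean is taken as the candidate label minimising the probability-weighted sum of squared distances, and that the label space is a small finite set. The function name `frechet_mean_predict` and the toy metric on integers are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Sequence


def frechet_mean_predict(
    probs: Sequence[float],
    observed: Sequence[Hashable],
    candidates: Iterable[Hashable],
    dist: Callable[[Hashable, Hashable], float],
) -> Hashable:
    """Return the candidate label minimising sum_i p_i * d(y, y_i)^2.

    `observed` are the classes the model was trained on, `probs` the model's
    probabilities for them, and `candidates` the full label space, which may
    contain classes never seen during training.
    """
    def cost(y: Hashable) -> float:
        return sum(p * dist(y, c) ** 2 for p, c in zip(probs, observed))

    return min(candidates, key=cost)


if __name__ == "__main__":
    # Toy label space 0..4 on a line; the model only knows classes 0 and 4.
    label_space = range(5)
    line_metric = lambda a, b: abs(a - b)
    print(frechet_mean_predict([0.5, 0.5], [0, 4], label_space, line_metric))
    print(frechet_mean_predict([0.9, 0.1], [0, 4], label_space, line_metric))
```

With equal mass on the two observed classes the rule returns label 2, a class the model never saw, whereas plain argmax could only ever return 0 or 4. With a confident model (0.9 on class 0) it agrees with argmax.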
# High-performance real-world optical computing trained by in situ model-free optimization ( http://arxiv.org/abs/2307.11957v4 ) License: see link	Guangyuan Zhao, and Xin Shu	Optical computing systems provide high-speed and low-energy data processing but face deficiencies in computationally demanding training and simulation-to-reality gaps. We propose a model-free optimization (MFO) method based on a score gradient estimation algorithm for computationally efficient in situ training of optical computing systems. This approach treats an optical computing system as a black box and back-propagates the loss directly to the probability distributions of the optical computing weights, circumventing the need for a computationally heavy and biased system simulation. Our experiments on a single-layer diffractive optical computing system show that MFO outperforms hybrid training on the MNIST and FMNIST datasets. Furthermore, we demonstrate image-free and high-speed classification of cells from their phase maps. Our method's model-free and high-performance nature, combined with its low demand for computational resources, expedites the transition of optical computing from laboratory demonstrations to real-world applications.	Translated: 2023-11-30 15:22:39 Published: 2023-11-28
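The idea of training a black-box system through the distribution of its weights can be illustrated with a generic score-function (REINFORCE-style) gradient estimator. The sketch below is not the paper's MFO code. It assumes a single scalar weight drawn from a Gaussian with learnable mean `mu` and fixed standard deviation `sigma`, a toy quadratic loss standing in for the optical system, and a sample-mean baseline for variance reduction. All names and hyperparameters are hypothetical.

```python
import random
from typing import Callable


def score_gradient_step(
    loss_fn: Callable[[float], float],
    mu: float,
    sigma: float,
    n_samples: int,
    lr: float,
    rng: random.Random,
) -> float:
    """One update of mu using only black-box evaluations of loss_fn."""
    samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    losses = [loss_fn(w) for w in samples]      # no gradient of loss_fn needed
    baseline = sum(losses) / n_samples           # variance-reduction baseline
    # d/dmu log N(w; mu, sigma^2) = (w - mu) / sigma^2
    grad = sum(
        (loss - baseline) * (w - mu) / sigma ** 2
        for loss, w in zip(losses, samples)
    ) / n_samples
    return mu - lr * grad


def train(loss_fn: Callable[[float], float], mu: float = 0.0, sigma: float = 0.5,
          n_samples: int = 64, lr: float = 0.05, steps: int = 300,
          seed: int = 0) -> float:
    rng = random.Random(seed)
    for _ in range(steps):
        mu = score_gradient_step(loss_fn, mu, sigma, n_samples, lr, rng)
    return mu


if __name__ == "__main__":
    # The "black box": a loss whose minimum is at w = 3.
    print(train(lambda w: (w - 3.0) ** 2))
```

The update never differentiates through `loss_fn`, which is the property that lets such an estimator be applied to a physical system measured in situ. In a real setup the loss would come from hardware measurements and the weight would be a high-dimensional phase pattern.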
# FedSOL: Stabilized Orthogonal Learning in Federated Learning ( http://arxiv.org/abs/2308.12532v3 ) License: see link	Gihun Lee, Minchan Jeong, Sangmook Kim, Jaehoon Oh, Se-Young Yun	Federated Learning (FL) aggregates locally trained models from individual clients to construct a global model. While FL enables learning a model with data privacy, it often suffers from significant performance degradation when client data distributions are heterogeneous. Many previous FL algorithms have addressed this issue by introducing various proximal restrictions. These restrictions aim to encourage global alignment by constraining the deviation of local learning from the global objective. However, they inherently limit local learning by interfering with the original local objectives. Recently, an alternative approach has emerged to improve local learning generality. By obtaining local models within a smooth loss landscape, this approach mitigates conflicts among different local objectives of the clients. Yet, it does not ensure stable global alignment, as local learning does not take the global objective into account. In this study, we propose Federated Stability on Learning (FedSoL), which combines both the concepts of global alignment and local generality. In FedSoL, the local learning seeks a parameter region robust against proximal perturbations. This strategy introduces an implicit proximal restriction effect in local learning while maintaining the original local objective for parameter update. Our experiments show that FedSoL consistently achieves state-of-the-art performance on various setups.	Translated: 2023-11-30 15:13:31 Published: 2023-11-28
# FedSOL: Stabilized Orthogonal Learning in Federated Learning ( http://arxiv.org/abs/2308.12532v2 ) License: see link	Gihun Lee, Minchan Jeong, Sangmook Kim, Jaehoon Oh, Se-Young Yun	Federated Learning (FL) aggregates locally trained models from individual clients to construct a global model. While FL enables learning a model with data privacy, it often suffers from significant performance degradation when client data distributions are heterogeneous. Many previous FL algorithms have addressed this issue by introducing various proximal restrictions. These restrictions aim to encourage global alignment by constraining the deviation of local learning from the global objective. However, they inherently limit local learning by interfering with the original local objectives. Recently, an alternative approach has emerged to improve local learning generality. By obtaining local models within a smooth loss landscape, this approach mitigates conflicts among different local objectives of the clients. Yet, it does not ensure stable global alignment, as local learning does not take the global objective into account. In this study, we propose Federated Stability on Learning (FedSoL), which combines both the concepts of global alignment and local generality. In FedSoL, the local learning seeks a parameter region robust against proximal perturbations. This strategy introduces an implicit proximal restriction effect in local learning while maintaining the original local objective for parameter update. Our experiments show that FedSoL consistently achieves state-of-the-art performance on various setups.	Translated: 2023-11-30 15:13:15 Published: 2023-11-28
# SE(3) Equivariant Augmented Coupling Flows ( http://arxiv.org/abs/2308.10364v4 ) License: see link	Laurence I. Midgley and Vincent Stimper and Javier Antorán and Emile Mathieu and Bernhard Schölkopf and José Miguel Hernández-Lobato	Coupling normalizing flows allow for fast sampling and density evaluation, making them the tool of choice for probabilistic modeling of physical systems. However, the standard coupling architecture precludes endowing flows that operate on the Cartesian coordinates of atoms with the SE(3) and permutation invariances of physical systems. This work proposes a coupling flow that preserves SE(3) and permutation equivariance by performing coordinate splits along additional augmented dimensions. At each layer, the flow maps atoms' positions into learned SE(3) invariant bases, where we apply standard flow transformations, such as monotonic rational-quadratic splines, before returning to the original basis. Crucially, our flow preserves fast sampling and density evaluation, and may be used to produce unbiased estimates of expectations with respect to the target distribution via importance sampling. When trained on the DW4, LJ13, and QM9-positional datasets, our flow is competitive with equivariant continuous normalizing flows, while allowing sampling more than an order of magnitude faster. Moreover, to the best of our knowledge, we are the first to learn the full Boltzmann distribution of alanine dipeptide by only modeling the Cartesian positions of its atoms. Lastly, we demonstrate that our flow can be trained to approximately sample from the Boltzmann distribution of the DW4 and LJ13 particle systems using only their energy functions.	Translated: 2023-11-30 15:11:54 Published: 2023-11-28
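The abstract notes that a model with tractable sampling and density evaluation can give unbiased expectation estimates via importance sampling. The sketch below shows the estimator itself on a one-dimensional toy problem. It does not involve a normalizing flow: a fixed Gaussian proposal stands in for the trained flow, and a normalized Gaussian stands in for the target. All function names and parameters are hypothetical.

```python
import math
import random
from typing import Callable


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_estimate(
    h: Callable[[float], float],
    target_pdf: Callable[[float], float],
    proposal_pdf: Callable[[float], float],
    proposal_sampler: Callable[[random.Random], float],
    n: int,
    rng: random.Random,
) -> float:
    """Estimate E_target[h(x)] from samples of the proposal.

    Each sample is reweighted by target_pdf(x) / proposal_pdf(x). The estimate
    is unbiased when both densities are normalized and the proposal covers the
    support of the target.
    """
    total = 0.0
    for _ in range(n):
        x = proposal_sampler(rng)
        total += h(x) * target_pdf(x) / proposal_pdf(x)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = importance_estimate(
        h=lambda x: x,                                  # estimate the target mean
        target_pdf=lambda x: normal_pdf(x, 1.0, 1.0),   # stand-in for the target
        proposal_pdf=lambda x: normal_pdf(x, 0.0, 2.0), # stand-in for the flow density
        proposal_sampler=lambda r: r.gauss(0.0, 2.0),   # stand-in for flow sampling
        n=20000,
        rng=rng,
    )
    print(estimate)  # should land near the true target mean of 1.0
```

For Boltzmann targets known only up to a normalizing constant, a self-normalized variant (dividing by the sum of the weights) is commonly used instead. That variant is consistent but no longer exactly unbiased.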
# Geometric instability of graph neural networks on large graphs ( http://arxiv.org/abs/2308.10099v2 ) License: see link	Emily Morris, Haotian Shen, Weiling Du, Muhammad Hamza Sajjad, Borun Shi	We analyse the geometric instability of embeddings produced by graph neural networks (GNNs). Existing methods are only applicable to small graphs and lack context in the graph domain. We propose a simple, efficient and graph-native Graph Gram Index (GGI) to measure such instability, which is invariant to permutation, orthogonal transformation, translation and order of evaluation. This allows us to study the varying instability behaviour of GNN embeddings on large graphs for both node classification and link prediction.	Translated: 2023-11-30 15:11:32 Published: 2023-11-28
# Kerr nonlinearity induced strong spin-magnon coupling ( http://arxiv.org/abs/2308.05927v2 ) License: see link	Feng-Zhou Ji, Jun-Hong An	One pillar of quantum magnonics is exploring the mediating role of magnons in different platforms to develop quantum technologies. Efficient coupling between magnons and various quantum entities is a prerequisite. Here, we propose a scheme to enhance the spin-magnon coupling by the magnonic Kerr nonlinearity in a YIG sphere. We find that the Kerr-enhanced spin-magnon coupling invalidates the widely used single-Kittel-mode approximation to magnons. It is revealed that the spin decoherence induced by the multimode magnons in the strong-coupling regime is not aggravated but suppressed, manifesting as either population trapping or persistent Rabi-like oscillation. This anomalous effect arises because the spin becomes so hybridized with the magnons that one or two bound states are formed between them. Enriching the physics of spin-magnon coupling, the result supplies a guideline for controlling the spin-magnon interface.	Translated: 2023-11-30 15:10:12 Published: 2023-11-28
# 重ね合わせ粒子における重力場の量子不確かさと絡み合い Quantum uncertainty of gravitational field and entanglement in superposed massive particles ( http://arxiv.org/abs/2308.03093v2 ) ライセンス: Link先を確認	Yuuki Sugiyama, Akira Matsumura, and Kazuhiro Yamamoto	(参考訳) 重力の量子性の研究は現代物理学において重要な問題である。近年、重力ポテンシャルの量子重ね合わせに関する研究が大きな関心を集めている。 Mari et al. [Sci. Rep. 6, 22777 (2016)] および Baym and Ozawa [Proc. Natl. Acad. Sci. U.S.A. 106, 3035 (2009)] に着想を得て、Belenchia et al. [Phys. Rev. D 98, 126009 (2018)] はそのような量子重ね合わせを含む思考実験を検討し、重ね合わせが因果性と相補性を矛盾させると述べた。彼らは重力の量子化された力学的自由度を考慮することでこの矛盾を解消した。これは重力ポテンシャルの量子重ね合わせと重力場の量子化との強い関係を示唆している。我々の以前の研究 [Phys. Rev. D 106, 125002 (2022)] では、場の量子的不確かさが因果性と相補性の整合性を保証することを示した。本研究では,電磁・重力ポテンシャルによる2つの粒子の状態の絡み合いに着目し,量子不確かさ,因果性,相補性との関係について検討する。数値解析の結果,電磁・重力場の量子的不確かさは真空ゆらぎを引き起こし,因果性が満たされる場合には2つの粒子状態の絡み合いを禁止することがわかった。さらに,粒子が絡み合わない場合に相補性が成り立つことを示した。不確定性関係は2つの粒子の状態間の絡み合いを引き起こさず、これが相補性を保証する。 Investigating the quantum nature of gravity is an important issue in modern physics. Recently, studies pertaining to the quantum superposition of gravitational potential have garnered significant interest. Inspired by Mari et al. [Sci. Rep. 6, 22777 (2016)] and Baym and Ozawa [Proc. Natl. Acad. Sci. U.S.A. 106, 3035 (2009)], Belenchia et al. [Phys. Rev. D 98, 126009 (2018)] considered a gedanken experiment involving such a quantum superposition and mentioned that the superposition renders causality and complementarity inconsistent. They resolved this inconsistency by considering the quantized dynamical degrees of freedom of gravity. This suggests a strong relationship between the quantum superposition of the gravitational potential and the quantization of the gravitational field. In our previous study [Phys. Rev. D 106, 125002 (2022)], we have shown that the quantum uncertainty of a field guarantees the consistency between causality and complementarity. 
In this study, we focus on the entanglement between two particles' states due to the electromagnetic/gravitational potential and investigate its relationship with quantum uncertainty, causality, and complementarity. Our numerical analyses show that the quantum uncertainty of the electromagnetic/gravitational field results in vacuum fluctuations and prohibits the entanglement between two particles' states when causality is satisfied. We further demonstrate that complementarity holds when the particles do not get entangled. The uncertainty relation does not cause the entanglement between two particles' states, which guarantees complementarity.	翻訳日:2023-11-30 15:09:56 公開日:2023-11-28
# ランダム化QAOA回路のエントロピー特性 Entropic property of randomized QAOA circuits ( http://arxiv.org/abs/2308.01807v4 ) ライセンス: Link先を確認	A. Yu. Chernyavskiy, B. I. Bantysh, Yu. I. Bogdanov	(参考訳) 量子近似最適化アルゴリズム(QAOA)は、パラメータ化量子回路を用いてビットストリングをサンプリングすることで離散最適化問題を解決することを目的とする。回路パラメータ(角度)はコストハミルトニアン期待値を最小限に抑えるように最適化される。近年,QAOA出力確率分布の一般統計的性質の研究が始まっている。従来の手法とは対照的に、ランダムな角度でQAOA回路を解析する。我々は、確率に関する解析方程式と、そのようなサンプリングがビットストリングの一様ランダムサンプリングよりもエネルギー分布のエントロピーを常に高めるという数値的な証拠を提供する。また, ランダムサンプリングよりも平均値が高い大域的最適値を得る確率も解析する。 Quantum approximate optimization algorithm (QAOA) aims to solve discrete optimization problems by sampling bitstrings using a parameterized quantum circuit. The circuit parameters (angles) are optimized in the way that minimizes the cost Hamiltonian expectation value. Recently, general statistical properties of QAOA output probability distributions have begun to be studied. In contrast to the conventional approach, we analyse QAOA circuits with random angles. We provide analytical equations for probabilities and the numerical evidence that for unweighted Max-Cut problems on connected graphs such sampling always gives higher entropy of energy distribution than uniform random sampling of bitstrings. We also analyse the probability to obtain the global optima, which appears to be higher on average than for random sampling.	翻訳日:2023-11-30 15:09:27 公開日:2023-11-28
# 大腸内視鏡ポリープ再同定のためのメタラーニングによる識別的表現に向けて Towards Discriminative Representation with Meta-learning for Colonoscopic Polyp Re-Identification ( http://arxiv.org/abs/2308.00929v2 ) ライセンス: Link先を確認	Suncheng Xiang, Qingzhong Chen, Shilun Cai, Chengfeng Zhou, Crystal Cai, Sijia Du, Zhengjie Zhang, Yunshi Zhong, Dahong Qian	(参考訳) 大腸内視鏡的ポリープ再同定は大きなギャラリーから得られたポリプと、異なるカメラで撮影された異なる視点の画像とを一致させることを目的としており、コンピュータ診断における大腸癌の予防と治療において重要な役割を果たす。しかし、ImageNetデータセットでトレーニングされたCNNモデルを直接適用する従来のオブジェクトReIDでは、ドメインギャップが大きいため、通常は大腸内視鏡的データセットで不満足な検索性能が得られる。さらに,これらの手法は,大腸内視鏡的ポリープデータセットにおけるクラス内関係の自己相違の可能性を検討することを怠っている。このジレンマを解決するために,サンプルが少ないシナリオにおけるメタラーニング戦略に基づいて,モデルがより一般的かつ識別的な知識を学習するのに役立つ,Colo-ReIDと呼ばれるシンプルで効果的なトレーニング手法を提案する。このことから,MLRと呼ばれる動的メタラーニング制御機構を導入し,ポリプ再同定の性能をさらに向上させる。我々の知る限りでは、これは従来の機械学習アルゴリズムの代わりにメタ学習パラダイムを活用して、大腸ポリプ再同定のタスクにおいて、ディープモデルを効果的に訓練する最初の試みである。実験の結果,本手法が現在の最先端手法を著しく上回っていることがわかった。 Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras and plays an important role in the prevention and treatment of colorectal cancer in computer-aided diagnosis. However, traditional methods for object ReID directly adopting CNN models trained on the ImageNet dataset usually produce unsatisfactory retrieval performance on colonoscopic datasets due to the large domain gap. Additionally, these methods neglect to explore the potential of self-discrepancy among intra-class relations in the colonoscopic polyp dataset, which remains an open research problem in the medical community. To solve this dilemma, we propose a simple but effective training method named Colo-ReID, which can help our model learn more general and discriminative knowledge based on the meta-learning strategy in scenarios with fewer samples. Based on this, a dynamic Meta-Learning Regulation mechanism called MLR is introduced to further boost the performance of polyp re-identification. 
To the best of our knowledge, this is the first attempt to leverage the meta-learning paradigm instead of traditional machine learning algorithm to effectively train deep models in the task of colonoscopic polyp re-identification. Empirical results show that our method significantly outperforms current state-of-the-art methods by a clear margin.	翻訳日:2023-11-30 15:09:16 公開日:2023-11-28
# 生成的社会選択 Generative Social Choice ( http://arxiv.org/abs/2309.01291v2 ) ライセンス: Link先を確認	Sara Fish, Paul Gölz, David C. Parkes, Ariel D. Procaccia, Gili Rusak, Itai Shapira, Manuel Wüthrich	(参考訳) 伝統的に、社会的選択理論は、少数の所定の選択肢の中からの選択にしか適用できず、テキストの文を集団で選ぶといったより複雑な決定には適用できなかった。本稿では,社会的選択理論の数学的厳密さと,大規模言語モデルのテキスト生成および選好の外挿の能力を組み合わせる枠組みである生成的社会選択を紹介する。このフレームワークは、AIによって強化された民主的プロセスの設計を2つの要素に分割する。第1に、オラクルクエリへのアクセスが与えられたときに、プロセスが厳密な代表性の保証を満たすことを証明する。第2に、これらのクエリが大規模言語モデルによって近似的に実装できることを実証的に検証する。本稿では,このフレームワークを、自由形式のテキストとして表現された意見を代表する声明のスレート(候補一覧)を生成する問題に適用する。具体的には,代表性の保証を持つ民主的プロセスを開発し,チャットボットのパーソナライゼーションに関する調査の参加者の意見を代表するためにこのプロセスを用いる。 100人中93人の参加者が、抽出した5つの声明のスレートによって「ほぼ」または「完全に」代表されていると感じていることがわかった。 Traditionally, social choice theory has only been applicable to choices among a few predetermined alternatives but not to more complex decisions such as collectively selecting a textual statement. We introduce generative social choice, a framework that combines the mathematical rigor of social choice theory with the capability of large language models to generate text and extrapolate preferences. This framework divides the design of AI-augmented democratic processes into two components: first, proving that the process satisfies rigorous representation guarantees when given access to oracle queries; second, empirically validating that these queries can be approximately implemented using a large language model. We apply this framework to the problem of generating a slate of statements that is representative of opinions expressed as free-form text; specifically, we develop a democratic process with representation guarantees and use this process to represent the opinions of participants in a survey about chatbot personalization. We find that 93 out of 100 participants feel "mostly" or "perfectly" represented by the slate of five statements we extracted.	翻訳日:2023-11-30 15:01:36 公開日:2023-11-28
# 干渉型衛星開口レーダの拡散モデル Diffusion Models for Interferometric Satellite Aperture Radar ( http://arxiv.org/abs/2308.16847v2 ) ライセンス: Link先を確認	Alexandre Tuel and Thomas Kerdreux and Claudia Hulbert and Bertrand Rouet-Leduc	(参考訳) 確率的拡散モデル(PDM)は、自然画像生成において高い性能を達成し、非常に有望な生成モデルのクラスとして最近登場した。しかし、レーダーベースの衛星データのような非自然画像に対する性能はほとんど分かっていない。大量の合成(特にラベル付き)衛星データを生成することは、(干渉)衛星開口レーダデータの処理と解析にディープラーニング手法を適用するうえで重要である。ここでは、PDMを利用して複数のレーダベースの衛星画像データセットを生成する。 PDMは複雑で現実的な構造を持つ画像の生成に成功するが、サンプリング時間は依然として問題である。実際、MNISTのような単純な画像データセットではうまく機能する加速サンプリング戦略は、我々のレーダーデータセットでは失敗する。単一のGPU上で任意のデータセットを用いてPDMを訓練、サンプリング、評価するための、シンプルで汎用的なオープンソース実装 https://github.com/thomaskerdreux/PDM_SAR_InSAR_generation を提供する。 Probabilistic Diffusion Models (PDMs) have recently emerged as a very promising class of generative models, achieving high performance in natural image generation. However, their performance relative to non-natural images, like radar-based satellite data, remains largely unknown. Generating large amounts of synthetic (and especially labelled) satellite data is crucial to implement deep-learning approaches for the processing and analysis of (interferometric) satellite aperture radar data. Here, we leverage PDMs to generate several radar-based satellite image datasets. We show that PDMs succeed in generating images with complex and realistic structures, but that sampling time remains an issue. Indeed, accelerated sampling strategies, which work well on simple image datasets like MNIST, fail on our radar datasets. We provide a simple and versatile open-source https://github.com/thomaskerdreux/PDM_SAR_InSAR_generation to train, sample and evaluate PDMs using any dataset on a single GPU.	翻訳日:2023-11-30 15:00:42 公開日:2023-11-28
# グラフニューラルネットワークのオーバースカッシング: 総合的な調査 Over-Squashing in Graph Neural Networks: A Comprehensive survey ( http://arxiv.org/abs/2308.15568v5 ) ライセンス: Link先を確認	Singh Akansha	(参考訳) グラフニューラルネットワーク(GNN)は、グラフ構造化データの機械学習に革命をもたらし、複雑な関係を効果的にキャプチャする。相互接続されたノードを通じて情報を広めるが、長距離インタラクションは"over-squashing"として知られる課題に直面している。この調査は、長距離情報の拡散が妨げられるグラフニューラルネットワーク(GNN)におけるオーバー・スカッシングの課題を掘り下げ、複雑な長距離通信に依存するタスクに影響を与える。オーバースカッシングの原因、結果、緩和戦略を包括的に探求する。グラフリワイリング、新しい正規化、スペクトル分析、曲率に基づく戦略など、さまざまな方法論が検討され、トレードオフと有効性に焦点が当てられている。オーバー・スクワッシングとオーバー・スムーシングのような他のGNN制限との相互作用についても論じており、ノードやグラフレベルのタスクでこれらの問題に対処するために設計されたモデルの分類を提供している。パフォーマンス評価のためのベンチマークデータセットも詳細であり、この調査はGNN分野の研究者や実践者にとって貴重なリソースである。 Graph Neural Networks (GNNs) revolutionize machine learning for graph-structured data, effectively capturing complex relationships. They disseminate information through interconnected nodes, but long-range interactions face challenges known as "over-squashing". This survey delves into the challenge of over-squashing in Graph Neural Networks (GNNs), where long-range information dissemination is hindered, impacting tasks reliant on intricate long-distance interactions. It comprehensively explores the causes, consequences, and mitigation strategies for over-squashing. Various methodologies are reviewed, including graph rewiring, novel normalization, spectral analysis, and curvature-based strategies, with a focus on their trade-offs and effectiveness. The survey also discusses the interplay between over-squashing and other GNN limitations, such as over-smoothing, and provides a taxonomy of models designed to address these issues in node and graph-level tasks. Benchmark datasets for performance evaluation are also detailed, making this survey a valuable resource for researchers and practitioners in the GNN field.	翻訳日:2023-11-30 14:59:16 公開日:2023-11-28
# 連立における入力変数の帰属に向けて Towards Attributions of Input Variables in a Coalition ( http://arxiv.org/abs/2309.13411v2 ) ライセンス: Link先を確認	Xinhao Zheng, Huiqi Deng, Bo Fan, Quanshi Zhang	(参考訳) 本稿では,個々の変数の帰属とそれらの連立(coalition)の帰属との矛盾を,全く新しい視点から説明する新しい帰属法の開発を目的とする。第1に、Shapley値は、AIモデルが符号化するHarsanyi相互作用の割り当てとして再定式化できることを見出す。第2に、相互作用の再割り当てに基づいて、Shapley値を連立の帰属へと拡張する。第3に、この矛盾の背後にある基本的なメカニズムを導出する。この矛盾は、連立内の変数の一部のみを含む相互作用から生じる。 This paper aims to develop a new attribution method to explain the conflict between individual variables' attributions and their coalition's attribution from a fully new perspective. First, we find that the Shapley value can be reformulated as the allocation of Harsanyi interactions encoded by the AI model. Second, based on the re-allocation of interactions, we extend the Shapley value to the attribution of coalitions. Third, we derive the fundamental mechanism behind the conflict. This conflict comes from interactions that contain only part of the variables in the coalition.	翻訳日:2023-11-30 14:51:26 公開日:2023-11-28
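The first claim in the abstract above (the Shapley value as an allocation of Harsanyi interactions) can be checked numerically on a small example. The sketch below is only an illustration, not the authors' code: `toy_model` is a hypothetical set function standing in for a model evaluated on masked inputs, and all function names are assumptions. It computes the Harsanyi interaction of every subset, splits each interaction evenly among its member variables, and compares the result with the classic marginal-contribution formula. The paper's extension to coalition attributions is not reproduced here.

```python
from itertools import combinations
from math import factorial


def subsets(players):
    """Yield every subset of `players` as a frozenset, including the empty set."""
    for r in range(len(players) + 1):
        for combo in combinations(players, r):
            yield frozenset(combo)


def harsanyi_interactions(players, v):
    """I(S) = sum over T subset of S of (-1)^(|S| - |T|) * v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(sorted(S)))
        for S in subsets(players)
    }


def shapley_from_interactions(players, interactions):
    """Split each interaction I(S) evenly among the variables in S."""
    return {
        i: sum(val / len(S) for S, val in interactions.items() if i in S)
        for i in players
    }


def shapley_classic(players, v):
    """Textbook Shapley value: weighted average of marginal contributions."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for S in subsets(others):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi


def toy_model(S):
    """Hypothetical set function (an assumption, not taken from the paper)."""
    score = 0.0
    if "a" in S:
        score += 1.0
    if "b" in S:
        score += 2.0
    if {"a", "b"} <= S:
        score += 3.0  # pairwise interaction between a and b
    if {"a", "b", "c"} <= S:
        score -= 1.5  # three-way interaction
    return score


players = ["a", "b", "c"]
interactions = harsanyi_interactions(players, toy_model)
phi_int = shapley_from_interactions(players, interactions)
phi_cls = shapley_classic(players, toy_model)
```

Both routes should give the same attribution for every variable, and the attributions should sum to the model output on the full input set.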
# アフィン変換を用いた確率に基づくセンサキャリブレーション Likelihood-based Sensor Calibration using Affine Transformation ( http://arxiv.org/abs/2309.11526v2 ) ライセンス: Link先を確認	R\"udiger Machhamer, Lejla Begic Fazlic, Eray Guven, David Junk, Gunes Karabulut Kurt, Stefan Naumann, Stephan Didas, Klaus-Uwe Gollmer, Ralph Bergmann, Ingo J. Timm, and Guido Dartmann	(参考訳) センサ技術の分野における重要な課題は、あるセンサから同じ設計の別のセンサーへの測定の適応手順の効率的な実装である。 1つの考え方は、専門家の知識によって改善できる、異なるシステム間のアフィン変換の推定を使用することである。本稿では,1973年に発表された氷河研究による改良解を提案する。その結果,センサのソフトウェアキャリブレーション,エキスパートベース適応の実装,分散学習手法などの今後の進歩への道を開くなど,様々な応用にこのソリューションが適用可能であることを示す。ここでのアイデアは、専門家の知識を使って、異なるシステム間のアフィン変換を推定することだ。シミュレーションと8つの同一センサを用いたマルチセンサボードの実測データを用いて本研究を評価した。データセットと評価スクリプトの両方がダウンロード可能である。その結果,実データを用いたシミュレーションと実験の両面で改善が見られた。 An important task in the field of sensor technology is the efficient implementation of adaptation procedures of measurements from one sensor to another sensor of identical design. One idea is to use the estimation of an affine transformation between different systems, which can be improved by the knowledge of experts. This paper presents an improved solution from Glacier Research that was published back in 1973. The results demonstrate the adaptability of this solution for various applications, including software calibration of sensors, implementation of expert-based adaptation, and paving the way for future advancements such as distributed learning methods. One idea here is to use the knowledge of experts for estimating an affine transformation between different systems. We evaluate our research with simulations and also with real measured data of a multi-sensor board with 8 identical sensors. Both data set and evaluation script are provided for download. The results show an improvement for both the simulation and the experiments with real data.	翻訳日:2023-11-30 14:50:56 公開日:2023-11-28
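To make the idea of an affine mapping between two sensors of identical design concrete, here is a minimal sketch. It uses ordinary least squares in one dimension, which coincides with maximum likelihood only under the assumption of independent Gaussian noise on the target readings. It is not the likelihood-based estimator with expert knowledge described in the abstract, and the names `fit_affine` and `apply_affine` are hypothetical.

```python
def fit_affine(reference, target):
    """Least-squares fit of target ~ gain * reference + offset (1-D case)."""
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(target) / n
    var_x = sum((x - mean_x) ** 2 for x in reference)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, target))
    gain = cov_xy / var_x
    offset = mean_y - gain * mean_x
    return gain, offset


def apply_affine(values, gain, offset):
    """Map readings of the reference sensor into the target sensor's scale."""
    return [gain * x + offset for x in values]


# Synthetic, noise-free readings (made up for illustration only).
reference = [0.0, 1.0, 2.0, 3.0, 4.0]
target = [1.5 * x + 0.2 for x in reference]

gain, offset = fit_affine(reference, target)
mapped = apply_affine(reference, gain, offset)
```

With real measurements the fit would be noisy, and a prior on gain and offset (the expert knowledge mentioned in the abstract) would be one way to stabilise it when few samples are available.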
# 医用画像における因果性信号の爆発:実証実験による検討 Exploiting Causality Signals in Medical Images: A Pilot Study with Empirical Results ( http://arxiv.org/abs/2309.10399v2 ) ライセンス: Link先を確認	Gianluca Carloni, Sara Colantonio	(参考訳) 本稿では,ニューラルネットワークを用いて,画像から直接弱い因果信号を発見し,活用する新しい手法を提案する。これにより、画像の一部に特徴が存在することが、画像の別の部分における他の特徴の出現にどのように影響するかをモデル化する。提案手法は,畳み込みニューラルネットワークバックボーンと因果関係因子抽出モジュールで構成され,重みを計算し,各特徴マップをシーン内における因果影響に応じて拡張する。発癌診断のための前立腺mri画像と乳腺病理組織学スライドの2つの公開データセットを用いて, 異なるアーキテクチャ変異体を開発し, 実験的検討を行った。定量的な結果を確認するため、クラスアクティベーションマップを用いてアブレーション研究を行い、モデルの説明可能性について検討する。以上の結果から, 軽量ブロックは意味のある情報を抽出し, 全体的な分類を改善し, 画像の関連部分に焦点を当てたより堅牢な予測を行う。これは、診断と治療計画に正確かつ信頼性の高い分類が不可欠である医療画像において重要である。 We present a novel technique to discover and exploit weak causal signals directly from images via neural networks for classification purposes. This way, we model how the presence of a feature in one part of the image affects the appearance of another feature in a different part of the image. Our method consists of a convolutional neural network backbone and a causality-factors extractor module, which computes weights to enhance each feature map according to its causal influence in the scene. We developed different architecture variants and empirically evaluated all of our models on two public datasets of prostate MRI images and breast histopathology slides for cancer diagnosis. To confirm our quantitative results, we conduct ablation studies and investigate the explainability of our models via class activation maps. Our findings show that our lightweight block extracts meaningful information and improves the overall classification, together with producing more robust predictions that focus on relevant parts of the image. That is crucial in medical imaging, where accurate and reliable classifications are essential for effective diagnosis and treatment planning.	翻訳日:2023-11-30 14:50:10 公開日:2023-11-28
# 安全チップのプラグ:LLM駆動型ロボットエージェントの制約を強制する Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents ( http://arxiv.org/abs/2309.09919v3 ) ライセンス: Link先を確認	Ziyi Yang and Shreyas S. Raman and Ankit Shah and Stefanie Tellex	(参考訳) 大規模言語モデル(LLM)の最近の進歩により、ロボット工学を解くための新しい研究領域であるLLMエージェントが、事前訓練中に得られたLLMの世界の知識と一般的な推論能力を活用して実現されている。しかし、ロボットに"dos"を教えるためにかなりの努力がなされているが、"Don'ts"は比較的あまり注目されなかった。我々は、いかなる実践的利用においても、禁止された行為に関する明確な指示を伝えること、これらの制限に対するロボットの理解を評価すること、そして最も重要なのはコンプライアンスを確保すること、をロボットに教えることが重要であると主張する。さらに、検証可能な安全な運用は、世界中の産業工場環境で安全にロボットを配備するための標準を定義するiso 61508のような世界的な標準を満たす展開には不可欠である。本研究では,LLMエージェントを協調環境に配置することを目的とした,線形時間論理(LTL)に基づくクエリ可能な安全制約モジュールを提案する。本システムの有効性を実証するため,バーチャルホーム環境と実ロボットを用いて実験を行った。実験の結果,本システムは安全制約に厳格に準拠し,複雑な安全制約とともにスケールし,実用性の可能性を強調した。 Recent advancements in large language models (LLMs) have enabled a new research domain, LLM agents, for solving robotics and planning tasks by leveraging the world knowledge and general reasoning abilities of LLMs obtained during pretraining. However, while considerable effort has been made to teach the robot the "dos," the "don'ts" received relatively less attention. We argue that, for any practical usage, it is as crucial to teach the robot the "don'ts": conveying explicit instructions about prohibited actions, assessing the robot's comprehension of these restrictions, and, most importantly, ensuring compliance. Moreover, verifiable safe operation is essential for deployments that satisfy worldwide standards such as ISO 61508, which defines standards for safely deploying robots in industrial factory environments worldwide. Aiming at deploying the LLM agents in a collaborative environment, we propose a queryable safety constraint module based on linear temporal logic (LTL) that simultaneously enables natural language (NL) to temporal constraints encoding, safety violation reasoning and explaining, and unsafe action pruning. 
To demonstrate the effectiveness of our system, we conducted experiments in VirtualHome environment and on a real robot. The experimental results show that our system strictly adheres to the safety constraints and scales well with complex safety constraints, highlighting its potential for practical utility.	翻訳日:2023-11-30 14:49:53 公開日:2023-11-28
# 感染拡大の地理を解き放つ:スーパーエージェントを用いた予測モデル Unraveling the Geography of Infection Spread: Harnessing Super-Agents for Predictive Modeling ( http://arxiv.org/abs/2309.07055v4 ) ライセンス: Link先を確認	Amir Mohammad Esmaieeli Sikaroudi, Alon Efrat, Michael Chertkov	(参考訳) 本研究は, 複雑なエージェントベースモデル (ABM) と感染症の伝統的なコンパートメンタルモデルとのギャップを埋める中間レベルモデリング手法を提案する。都市部における感染拡大をシミュレートし,個別レベルの相互作用を維持しながら計算複雑性を低減させる「スーパーエージェント」を導入する。このアプローチは、実世界のモビリティデータと戦略的地理空間的テッセルレーションを効率よく活用する。 Voronoi Diagramテッセルレーションは、特定のストリートネットワークの位置に基づいて、標準のCensus Block Groupテッセルレーションより優れており、ハイブリッドアプローチは精度と効率のバランスをとる。既存のabmsに対するベンチマークでは、重要な最適化が強調される。本研究は都市部の疾病モデルを改善し,地理的特異性と高い計算効率を必要とするシナリオにおいて,公衆衛生戦略を支援する。 Our study presents an intermediate-level modeling approach that bridges the gap between complex Agent-Based Models (ABMs) and traditional compartmental models for infectious diseases. We introduce "super-agents" to simulate infection spread in cities, reducing computational complexity while retaining individual-level interactions. This approach leverages real-world mobility data and strategic geospatial tessellations for efficiency. Voronoi Diagram tessellations, based on specific street network locations, outperform standard Census Block Group tessellations, and a hybrid approach balances accuracy and efficiency. Benchmarking against existing ABMs highlights key optimizations. This research improves disease modeling in urban areas, aiding public health strategies in scenarios requiring geographic specificity and high computational efficiency.	翻訳日:2023-11-30 14:47:52 公開日:2023-11-28
# 敵対的プロンプトに対するLLM安全性の認証 Certifying LLM Safety against Adversarial Prompting ( http://arxiv.org/abs/2309.02705v2 ) ライセンス: Link先を確認	Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi and Himabindu Lakkaraju	(参考訳) 一般向けに公開されている大規模言語モデル(LLM)は、出力が安全であることを保証するためのガードレールを組み込んでおり、これはしばしば「モデルアライメント」と呼ばれる。アライメントされた言語モデルは、有害なコンテンツの生成を求めるユーザの要求を拒否すべきである。しかし、このような安全対策は敵対的攻撃に弱い。敵対的攻撃は、悪意を持って設計されたトークン列を有害なプロンプトに追加し、モデルの安全ガードを回避する。本稿では,検証可能な安全性保証によって敵対的プロンプトを防御する最初のフレームワークであるerase-and-checkを紹介する。我々は3つの攻撃モードに対して防御する。 i) 敵対的接尾辞: プロンプトの末尾に敵対的系列を付加する。 ii) 敵対的挿入: 敵対的系列をプロンプトの途中の任意の位置に挿入する。 iii) 敵対的注入: 敵対的トークンをプロンプト内の任意の位置に挿入し、必ずしも連続したブロックではない。実験結果から, この手法は有害なプロンプトに対して強力な認証付き安全性保証を得つつ, 安全なプロンプトに対しても良好な経験的性能を維持できることがわかった。例えば、長さ20の敵対的接尾辞に対して、オープンソースの言語モデルLlama 2を安全フィルタとして用いると、有害なプロンプトの92%を認証付きで検出し、安全なプロンプトの94%を正しくラベル付けする。さらに、Llama 2を、安全なプロンプトと有害なプロンプトで微調整したDistilBERT安全分類器に置き換えることで、精度と速度の両面でフィルタの性能を向上させる。加えて,2つの効率的な経験的防御法を提案する。 i) RandEC: 消去された部分列の小さな部分集合上で安全フィルタを評価する、erase-and-checkのランダム化版。 ii) GradEC: 消去するトークンを最適化して敵対的系列を取り除く勾配ベース版。実験のコードは https://github.com/aounon/certified-llm-safety で利用可能である。 Large language models (LLMs) released for public use incorporate guardrails to ensure their output is safe, often referred to as "model alignment." An aligned language model should decline a user's request to produce harmful content. However, such safety measures are vulnerable to adversarial attacks, which add maliciously designed token sequences to a harmful prompt to bypass the model's safety guards. In this work, we introduce erase-and-check, the first framework to defend against adversarial prompts with verifiable safety guarantees. We defend against three attack modes: i) adversarial suffix, which appends an adversarial sequence at the end of the prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block. 
Our experimental results demonstrate that this procedure can obtain strong certified safety guarantees on harmful prompts while maintaining good empirical performance on safe prompts. For example, against adversarial suffixes of length 20, it certifiably detects 92% of harmful prompts and labels 94% of safe prompts correctly using the open-source language model Llama 2 as the safety filter. We further improve the filter's performance, in terms of accuracy and speed, by replacing Llama 2 with a DistilBERT safety classifier fine-tuned on safe and harmful prompts. Additionally, we propose two efficient empirical defenses: i) RandEC, a randomized version of erase-and-check that evaluates the safety filter on a small subset of the erased subsequences, and ii) GradEC, a gradient-based version that optimizes the erased tokens to remove the adversarial sequence. The code for our experiments is available at https://github.com/aounon/certified-llm-safety.	翻訳日:2023-11-30 14:46:18 公開日:2023-11-28
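The suffix mode of erase-and-check described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation (their code is at the URL given in the abstract): `toy_filter` is a deliberately naive, hypothetical stand-in for the Llama 2 or DistilBERT safety filter, and the token lists are made up. The procedure erases the last 0 to `max_erase` tokens one at a time and flags the prompt as harmful if the filter flags any of the resulting subsequences.

```python
def erase_and_check_suffix(tokens, is_harmful, max_erase):
    """Flag the prompt if it, or any version with up to `max_erase`
    trailing tokens erased, is flagged by the safety filter."""
    # Never erase the whole prompt: at least one token is kept.
    limit = min(max_erase, len(tokens) - 1)
    for erased in range(0, limit + 1):
        candidate = tokens[: len(tokens) - erased]
        if is_harmful(candidate):
            return True
    return False


def toy_filter(tokens):
    """Hypothetical filter: detects a keyword, but is fooled whenever an
    adversarial marker token is present anywhere in the input."""
    return "bomb" in tokens and "xx_adv" not in tokens


harmful = ["how", "to", "build", "a", "bomb"]
attacked = harmful + ["xx_adv"] * 3  # adversarial suffix of length 3
safe = ["how", "to", "bake", "bread"]

fooled_directly = toy_filter(attacked)                          # filter alone
caught = erase_and_check_suffix(attacked, toy_filter, 5)        # suffix <= 5 tokens
missed = erase_and_check_suffix(attacked, toy_filter, 2)        # erase budget too small
safe_flagged = erase_and_check_suffix(safe, toy_filter, 5)
```

The guarantee follows from the loop itself: if the filter flags the clean harmful prompt, then for any appended suffix of at most `max_erase` tokens, one of the checked subsequences is exactly that clean prompt, so the attacked prompt is flagged too. The cost is up to `max_erase + 1` filter calls per prompt, and safe prompts can be misclassified if any of their truncations trips the filter.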
# CLIP-DIY: CLIPの密な推論により、オープン語彙セマンティックセグメンテーションが無償で得られる CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free ( http://arxiv.org/abs/2309.14289v2 ) ライセンス: Link先を確認	Monika Wysoczańska, Michaël Ramamonjisoa, Tomasz Trzciński, Oriane Siméoni	(参考訳) CLIPの登場は、オープンワールドの画像認識への道を開いた。モデルのゼロショット分類能力は印象的だが、画像セグメンテーションのような密なタスクには使いにくい。いくつかの手法は、密な出力を得るためにさまざまな修正や学習スキームを提案している。これに対して本研究では、CLIP-DIYと呼ぶオープン語彙セマンティックセグメンテーション手法を提案する。この手法は追加のトレーニングやアノテーションを必要とせず、代わりに既存の教師なし物体位置特定手法を活用する。特にCLIP-DIYは、異なるサイズのパッチに対してCLIPの分類能力を直接利用し、その判定を単一のマップに集約するマルチスケール手法である。さらに,教師なし物体位置特定手法で得られる前景/背景スコアを用いてセグメンテーションを誘導する。提案手法により,PASCAL VOCで最先端のゼロショットセマンティックセグメンテーション結果を達成し,COCOでは最良の手法と同等の性能を示す。コードは http://github.com/wysoczanska/clip-diy で入手できる。 The emergence of CLIP has opened the way for open-world image perception. The zero-shot classification capabilities of the model are impressive but are harder to use for dense tasks such as image segmentation. Several methods have proposed different modifications and learning schemes to produce dense output. Instead, we propose in this work an open-vocabulary semantic segmentation method, dubbed CLIP-DIY, which does not require any additional training or annotations, but instead leverages existing unsupervised object localization approaches. In particular, CLIP-DIY is a multi-scale approach that directly exploits CLIP classification abilities on patches of different sizes and aggregates the decision in a single map. We further guide the segmentation using foreground/background scores obtained using unsupervised object localization methods. With our method, we obtain state-of-the-art zero-shot semantic segmentation results on PASCAL VOC and perform on par with the best methods on COCO. The code is available at http://github.com/wysoczanska/clip-diy	翻訳日:2023-11-30 14:37:55 公開日:2023-11-28
# ニューラルネットワークの大規模バッチトレーニングの汎化に向けたLARSの再検討 Revisiting LARS for Large Batch Training Generalization of Neural Networks ( http://arxiv.org/abs/2309.14053v2 ) ライセンス: Link先を確認	Khoi Do, Duong Nguyen, Hoa Nguyen, Long Tran-Thanh, and Quoc-Viet Pham	(参考訳) LARSとLAMBは、AIのトレーニング安定性を確保するための大規模バッチ学習(LBL)における代表的な手法として登場した。収束安定性はLBLの課題であり、AIエージェントは通常、鋭い最小解(sharp minimizer)に捕らわれる。この課題に対してウォームアップは効率的な手法であるが、強固な理論的基盤を欠いている。具体的には、ウォームアップ過程はしばしば初期段階の勾配を小さくし、エージェントが鋭い最小解から早期に脱出することを意図せず妨げる。このような状況を踏まえ,ウォームアップ戦略の有無によるLARSとLAMBの挙動を分析する実験を行った。本分析は,LARSとLAMBの挙動およびLBLにおけるウォームアップ手法の必要性について包括的な洞察を与え,多くの場合にそれらが失敗する理由も説明する。これらの知見に基づいて,ウォームアップを必要とせずに初期段階での頑健なトレーニングを可能にする、Time Varying LARS(TVLARS)と呼ばれる新しいアルゴリズムを提案する。TVLARSでは、トレーニング安定性を高めるために、ウォームアップ過程の代わりに設定可能なシグモイド様関数を用いる。さらに、TVLARSは初期段階での勾配探索を促し、鋭い最小解を早期に乗り越えたうえで徐々にLARSへ移行し、後期段階ではLARSの頑健性を達成する。大規模な実験評価の結果、TVLARSはほとんどの場合でLARSとLAMBを一貫して上回り、分類シナリオでは最大2%の改善が見られた。特に、自己教師あり学習のすべてのケースでTVLARSはLARSとLAMBを上回り、性能は最大10%向上した。 LARS and LAMB have emerged as prominent techniques in Large Batch Learning (LBL) to ensure training stability in AI. Convergence stability is a challenge in LBL, where the AI agent usually gets trapped in the sharp minimizer. To address this challenge, warm-up is an efficient technique, but it lacks a strong theoretical foundation. Specifically, the warm-up process often reduces gradients in the early phase, inadvertently preventing the agent from escaping the sharp minimizer early on. In light of this situation, we conduct empirical experiments to analyze the behaviors of LARS and LAMB with and without a warm-up strategy. Our analyses give a comprehensive insight into the behaviors of LARS, LAMB, and the necessity of a warm-up technique in LBL, including an explanation of their failure in many cases. Building upon these insights, we propose a novel algorithm called Time Varying LARS (TVLARS), which facilitates robust training in the initial phase without the need for warm-up. 
A configurable sigmoid-like function is employed in TVLARS to replace the warm-up process and enhance training stability. Moreover, TVLARS stimulates gradient exploration in the early phase, allowing it to surpass the sharp minimizer early on, gradually transition to LARS, and achieve the robustness of LARS in the later phases. Extensive experimental evaluations reveal that TVLARS consistently outperforms LARS and LAMB in most cases, with improvements of up to 2% in classification scenarios. Notably, in every case of self-supervised learning, TVLARS dominates LARS and LAMB with performance improvements of up to 10%.	翻訳日:2023-11-30 14:37:33 公開日:2023-11-28
# 先進的な一般化対策は見つからない Fantastic Generalization Measures are Nowhere to be Found ( http://arxiv.org/abs/2309.13658v3 ) ライセンス: Link先を確認	Michael Gastpar, Ido Nachum, Jonathan Shafer, Thomas Weinberger	(参考訳) 本研究では,全学習アルゴリズムと全人口分布において,有界と人口損失の差が小さく,一様に密接な一般化の概念を考察する。ニューラルネットワークが過パラメータ設定で一般化する能力の潜在的な説明として、多くの一般化境界が文献に提案されている。しかし、彼の論文「fantastic generalization measures and where to find them」において、jiangら(2020年)は10以上の一般化境界を調べ、それぞれが一様にタイトではないことを実証的に示している。これは、一様密な一般化境界が超パラメータ設定において可能かどうかという疑問を提起する。一般化境界は、(1)訓練集合と学習された仮説(例えば、マージン境界)に依存するかもしれない境界である。数学的には、そのような境界は超パラメータ設定では一様にタイトにできないことを証明し、(2)さらに学習アルゴリズム(例えば、安定性境界)に依存するような境界も証明する。これらの境界に対して,アルゴリズムの性能と境界の厳密さとのトレードオフを示す。すなわち、アルゴリズムが特定の分布に対して良好な精度を達成した場合、過度なパラメータ設定で一般化境界を均一に締め付けることはできない。これらの形式的結果がニューラルネットワークの一般化境界に関する研究にどのように影響を与えるかを説明するとともに、これらの結果の他の解釈も可能であることを強調する。 We study the notion of a generalization bound being uniformly tight, meaning that the difference between the bound and the population loss is small for all learning algorithms and all population distributions. Numerous generalization bounds have been proposed in the literature as potential explanations for the ability of neural networks to generalize in the overparameterized setting. However, in their paper ``Fantastic Generalization Measures and Where to Find Them,'' Jiang et al. (2020) examine more than a dozen generalization bounds, and show empirically that none of them are uniformly tight. This raises the question of whether uniformly-tight generalization bounds are at all possible in the overparameterized setting. We consider two types of generalization bounds: (1) bounds that may depend on the training set and the learned hypothesis (e.g., margin bounds). We prove mathematically that no such bound can be uniformly tight in the overparameterized setting; (2) bounds that may in addition also depend on the learning algorithm (e.g., stability bounds). For these bounds, we show a trade-off between the algorithm's performance and the bound's tightness. 
Namely, if the algorithm achieves good accuracy on certain distributions, then no generalization bound can be uniformly tight for it in the overparameterized setting. We explain how these formal results can, in our view, inform research on generalization bounds for neural networks, while stressing that other interpretations of these results are also possible.	翻訳日:2023-11-30 14:37:05 公開日:2023-11-28
# 先進的な一般化対策は見つからない Fantastic Generalization Measures are Nowhere to be Found ( http://arxiv.org/abs/2309.13658v2 ) ライセンス: Link先を確認	Michael Gastpar, Ido Nachum, Jonathan Shafer, Thomas Weinberger	(参考訳) 本研究では,全学習アルゴリズムと全人口分布において,有界と人口損失の差が小さく,一様に密接な一般化の概念を考察する。ニューラルネットワークが過パラメータ設定で一般化する能力の潜在的な説明として、多くの一般化境界が文献に提案されている。しかし、彼の論文「fantastic generalization measures and where to find them」において、jiangら(2020年)は10以上の一般化境界を調べ、それぞれが一様にタイトではないことを実証的に示している。これは、一様密な一般化境界が超パラメータ設定において可能かどうかという疑問を提起する。一般化境界は、(1)訓練集合と学習された仮説(例えば、マージン境界)に依存するかもしれない境界である。数学的には、そのような境界は超パラメータ設定では一様にタイトにできないことを証明し、(2)さらに学習アルゴリズム(例えば、安定性境界)に依存するような境界も証明する。これらの境界に対して,アルゴリズムの性能と境界の厳密さとのトレードオフを示す。すなわち、アルゴリズムが特定の分布に対して良好な精度を達成した場合、過度なパラメータ設定で一般化境界を均一に締め付けることはできない。これらの形式的結果がニューラルネットワークの一般化境界に関する研究にどのように影響を与えるかを説明するとともに、これらの結果の他の解釈も可能であることを強調する。 We study the notion of a generalization bound being uniformly tight, meaning that the difference between the bound and the population loss is small for all learning algorithms and all population distributions. Numerous generalization bounds have been proposed in the literature as potential explanations for the ability of neural networks to generalize in the overparameterized setting. However, in their paper ``Fantastic Generalization Measures and Where to Find Them,'' Jiang et al. (2020) examine more than a dozen generalization bounds, and show empirically that none of them are uniformly tight. This raises the question of whether uniformly-tight generalization bounds are at all possible in the overparameterized setting. We consider two types of generalization bounds: (1) bounds that may depend on the training set and the learned hypothesis (e.g., margin bounds). We prove mathematically that no such bound can be uniformly tight in the overparameterized setting; (2) bounds that may in addition also depend on the learning algorithm (e.g., stability bounds). For these bounds, we show a trade-off between the algorithm's performance and the bound's tightness. 
Namely, if the algorithm achieves good accuracy on certain distributions, then no generalization bound can be uniformly tight for it in the overparameterized setting. We explain how these formal results can, in our view, inform research on generalization bounds for neural networks, while stressing that other interpretations of these results are also possible.	Translated: 2023-11-30 14:36:40 Published: 2023-11-28
# PRIS: Practical robust invertible network for image steganography ( http://arxiv.org/abs/2309.13620v2 ) License: see link	Hang Yang, Yitian Xu, Xuhua Liu, Xiaodong Ma	Image steganography is a technique of hiding secret information inside another image, so that the secret is not visible to human eyes and can be recovered when needed. Most existing image steganography methods have low hiding robustness when the container images are affected by distortions such as Gaussian noise and lossy compression. This paper proposes PRIS to improve the robustness of image steganography. It is based on invertible neural networks and places two enhancement modules before and after the extraction process, with a 3-step training strategy. Moreover, rounding error, which is always ignored by existing methods but is unavoidable in practice, is taken into account. A gradient approximation function (GAF) is also proposed to overcome the non-differentiability of rounding distortion. Experimental results show that PRIS outperforms the state-of-the-art robust image steganography method in both robustness and practicability. Code is available at https://github.com/yanghangAI/PRIS, and a demonstration of the model in practice is at http://yanghang.site/hide/.	Translated: 2023-11-30 14:36:14 Published: 2023-11-28
# MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field ( http://arxiv.org/abs/2309.13607v2 ) License: see link	Zijiang Yang, Zhongwei Qiu, Chang Xu, Dongmei Fu	3D style transfer aims to generate stylized views of 3D scenes with specified styles, which requires high-quality generation and multi-view consistency. Existing methods still struggle with high-quality stylization with texture details and stylization with multimodal guidance. In this paper, we reveal that the common training method of stylization with NeRF, which generates stylized multi-view supervision by 2D style transfer models, causes the same object in supervision to show various states (color tone, details, etc.) in different views, leading NeRF to tend to smooth the texture details, further resulting in low-quality rendering for 3D multi-style transfer. To tackle these problems, we propose a novel Multimodal-guided 3D Multi-style transfer of NeRF, termed MM-NeRF. First, MM-NeRF projects multimodal guidance into a unified space to keep multimodal styles consistent and extracts multimodal features to guide the 3D stylization.
Second, a novel multi-head learning scheme is proposed to relieve the difficulty of learning multi-style transfer, and a multi-view style consistent loss is proposed to track the inconsistency of multi-view supervision data. Finally, a novel incremental learning mechanism is proposed to generalize MM-NeRF to any new style at small cost. Extensive experiments on several real-world datasets show that MM-NeRF achieves high-quality 3D multi-style stylization with multimodal guidance, and keeps multi-view consistency and style consistency between multimodal guidance. Code will be released.	Translated: 2023-11-30 14:35:57 Published: 2023-11-28
# Robust Ocean Subgrid-Scale Parameterizations Using Fourier Neural Operators ( http://arxiv.org/abs/2310.02691v2 ) License: see link	Victor Mangeleer and Gilles Louppe	In climate simulations, small-scale processes shape ocean dynamics but remain computationally expensive to resolve directly. For this reason, their contributions are commonly approximated using empirical parameterizations, which lead to significant errors in long-term projections. In this work, we develop parameterizations based on Fourier Neural Operators, showcasing their accuracy and generalizability in comparison to other approaches. Finally, we discuss the potential and limitations of neural networks operating in the frequency domain, paving the way for future investigation.	Translated: 2023-11-30 14:29:39 Published: 2023-11-28
# Towards End-to-End Embodied Decision Making via Multi-modal Large Language Model: Explorations with GPT4-Vision and Beyond ( http://arxiv.org/abs/2310.02071v4 ) License: see link	Liang Chen, Yichi Zhang, Shuhuai Ren, Haozhe Zhao, Zefan Cai, Yuchi Wang, Peiyi Wang, Tianyu Liu, Baobao Chang	In this study, we explore the potential of Multimodal Large Language Models (MLLMs) in improving embodied decision-making processes for agents. While Large Language Models (LLMs) have been widely used due to their advanced reasoning skills and vast world knowledge, MLLMs like GPT4-Vision offer enhanced visual understanding and reasoning capabilities. We investigate whether state-of-the-art MLLMs can handle embodied decision-making in an end-to-end manner and whether collaborations between LLMs and MLLMs can enhance decision-making. To address these questions, we introduce a new benchmark called PCA-EVAL, which evaluates embodied decision-making from the perspectives of Perception, Cognition, and Action. Additionally, we propose HOLMES, a multi-agent cooperation framework that allows LLMs to leverage MLLMs and APIs to gather multimodal information for informed decision-making.
We compare end-to-end embodied decision-making and HOLMES on our benchmark and find that the GPT4-Vision model demonstrates strong end-to-end embodied decision-making abilities, outperforming GPT4-HOLMES in terms of average decision accuracy (+3%). However, this performance is exclusive to the latest GPT4-Vision model, surpassing the open-source state-of-the-art MLLM by 26%. Our results indicate that powerful MLLMs like GPT4-Vision hold promise for decision-making in embodied agents, offering new avenues for MLLM research. Code and data are open at https://github.com/pkunlp-icler/PCA-EVAL/.	Translated: 2023-11-30 14:29:30 Published: 2023-11-28
# Extending CAM-based XAI methods for Remote Sensing Imagery Segmentation ( http://arxiv.org/abs/2310.01837v2 ) License: see link	Abdul Karim Gizzini, Mustafa Shukor and Ali J. Ghandour	Current AI-based methods do not provide comprehensible physical interpretations of the utilized data, extracted features, and predictions/inference operations. As a result, deep learning models trained using high-resolution satellite imagery lack transparency and explainability and can be merely seen as a black box, which limits their wide-level adoption. Experts need help understanding the complex behavior of AI models and the underlying decision-making process. The explainable artificial intelligence (XAI) field is an emerging field providing means for robust, practical, and trustworthy deployment of AI models. Several XAI techniques have been proposed for image classification tasks, whereas the interpretation of image segmentation remains largely unexplored. This paper aims to bridge this gap by adapting recent XAI classification algorithms and making them usable for multi-class image segmentation, where we mainly focus on building segmentation from high-resolution satellite images.
To benchmark and compare the performance of the proposed approaches, we introduce a new XAI evaluation methodology and metric based on "Entropy" to measure the model uncertainty. Conventional XAI evaluation methods rely mainly on feeding area-of-interest regions from the image back to the pre-trained (utility) model and then calculating the average change in the probability of the target class. Those evaluation metrics lack the needed robustness, and we show that using Entropy to monitor the model uncertainty in segmenting the pixels within the target class is more suitable. We hope this work will pave the way for additional XAI research for image segmentation and applications in the remote sensing discipline.	Translated: 2023-11-30 14:27:59 Published: 2023-11-28
# Large Scale Masked Autoencoding for Reducing Label Requirements on SAR Data ( http://arxiv.org/abs/2310.00826v2 ) License: see link	Matt Allen, Francisco Dorr, Joseph A. Gallego-Mejia, Laura Martínez-Ferrer, Anna Jungbluth, Freddie Kalaitzis, Raúl Ramos-Pollán	Satellite-based remote sensing is instrumental in the monitoring and mitigation of the effects of anthropogenic climate change. Large scale, high resolution data derived from these sensors can be used to inform intervention and policy decision making, but the timeliness and accuracy of these interventions is limited by use of optical data, which cannot operate at night and is affected by adverse weather conditions. Synthetic Aperture Radar (SAR) offers a robust alternative to optical data, but its associated complexities limit the scope of labelled data generation for traditional deep learning. In this work, we apply a self-supervised pretraining scheme, masked autoencoding, to SAR amplitude data covering 8.7% of the Earth's land surface area, and tune the pretrained weights on two downstream tasks crucial to monitoring climate change: vegetation cover prediction and land cover classification.
We show that the use of this pretraining scheme reduces labelling requirements for the downstream tasks by more than an order of magnitude, and that this pretraining generalises geographically, with the performance gain increasing when tuned downstream on regions outside the pretraining set. Our findings significantly advance climate change mitigation by facilitating the development of task and region-specific SAR models, allowing local communities and organizations to deploy tailored solutions for rapid, accurate monitoring of climate change effects.	Translated: 2023-11-30 14:25:08 Published: 2023-11-28
# FELM: Benchmarking Factuality Evaluation of Large Language Models ( http://arxiv.org/abs/2310.00741v2 ) License: see link	Shiqi Chen, Yiran Zhao, Jinghan Zhang, I-Chun Chern, Siyang Gao, Pengfei Liu and Junxian He	Assessing factuality of text generated by large language models (LLMs) is an emerging yet crucial research area, aimed at alerting users to potential errors and guiding the development of more reliable LLMs. Nonetheless, the evaluators assessing factuality necessitate suitable evaluation themselves to gauge progress and foster advancements. This direction remains under-explored, resulting in substantial impediments to the progress of factuality evaluators. To mitigate this issue, we introduce a benchmark for Factuality Evaluation of large Language Models, referred to as FELM. In this benchmark, we collect responses generated from LLMs and annotate factuality labels in a fine-grained manner. Contrary to previous studies that primarily concentrate on the factuality of world knowledge (e.g., information from Wikipedia), FELM focuses on factuality across diverse domains, spanning from world knowledge to math and reasoning. Our annotation is based on text segments, which can help pinpoint specific factual errors.
The factuality annotations are further supplemented by predefined error types and reference links that either support or contradict the statement. In our experiments, we investigate the performance of several LLM-based factuality evaluators on FELM, including both vanilla LLMs and those augmented with retrieval mechanisms and chain-of-thought processes. Our findings reveal that while retrieval aids factuality evaluation, current LLMs are far from satisfactory at faithfully detecting factual errors.	Translated: 2023-11-30 14:24:44 Published: 2023-11-28
# What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models ( http://arxiv.org/abs/2310.06627v3 ) License: see link	Letian Zhang, Xiaotong Zhai, Zhongkai Zhao, Yongshuo Zong, Xin Wen, Bingchen Zhao	Counterfactual reasoning, a fundamental aspect of human cognition, involves contemplating alternatives to established facts or past events, significantly enhancing our abilities in planning and decision-making. In light of the advancements in current multi-modal large language models, we explore their effectiveness in counterfactual reasoning. To facilitate this investigation, we introduce a novel dataset, C-VQA, specifically designed to test the counterfactual reasoning capabilities of modern multi-modal large language models. This dataset is constructed by infusing original questions with counterfactual presuppositions, spanning various types such as numerical and boolean queries. It encompasses a mix of real and synthetic data, representing a wide range of difficulty levels. Our thorough evaluations of contemporary vision-language models using this dataset have revealed substantial performance drops, with some models showing up to a 40% decrease, highlighting a significant gap between current models and human-like vision reasoning capabilities. We hope our dataset will serve as a vital benchmark for evaluating the counterfactual reasoning capabilities of models.
Code and dataset are publicly available at https://bzhao.me/C-VQA/.	Translated: 2023-11-30 14:17:45 Published: 2023-11-28
# What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models ( http://arxiv.org/abs/2310.06627v2 ) License: see link	Letian Zhang, Xiaotong Zhai, Zhongkai Zhao, Yongshuo Zong, Xin Wen, Bingchen Zhao	Counterfactual reasoning, a fundamental aspect of human cognition, involves contemplating alternatives to established facts or past events, significantly enhancing our abilities in planning and decision-making. In light of the advancements in current multi-modal large language models, we explore their effectiveness in counterfactual reasoning. To facilitate this investigation, we introduce a novel dataset, C-VQA, specifically designed to test the counterfactual reasoning capabilities of modern multi-modal large language models. This dataset is constructed by infusing original questions with counterfactual presuppositions, spanning various types such as numerical and boolean queries. It encompasses a mix of real and synthetic data, representing a wide range of difficulty levels. Our thorough evaluations of contemporary vision-language models using this dataset have revealed substantial performance drops, with some models showing up to a 40% decrease, highlighting a significant gap between current models and human-like vision reasoning capabilities. We hope our dataset will serve as a vital benchmark for evaluating the counterfactual reasoning capabilities of models.
Code and dataset are publicly available at https://bzhao.me/C-VQA/.	Translated: 2023-11-30 14:16:54 Published: 2023-11-28
# Antenna Response Consistency Driven Self-supervised Learning for WIFI-based Human Activity Recognition ( http://arxiv.org/abs/2310.06328v3 ) License: see link	Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng	Self-supervised learning (SSL) for WiFi-based human activity recognition (HAR) holds great promise due to its ability to address the challenge of insufficient labeled data. However, directly transplanting SSL algorithms, especially contrastive learning, originally designed for other domains to CSI data, often fails to achieve the expected performance. We attribute this issue to the inappropriate alignment criteria, which disrupt the semantic distance consistency between the feature space and the input space. To address this challenge, we introduce Antenna Response Consistency (ARC) as a solution to define proper alignment criteria. ARC is designed to retain semantic information from the input space while introducing robustness to real-world noise. Moreover, we substantiate the effectiveness of ARC through a comprehensive set of experiments, demonstrating its capability to enhance the performance of self-supervised learning for WiFi-based HAR by achieving an increase of over 5% in accuracy in most cases and achieving a best accuracy of 94.97%.	Translated: 2023-11-30 14:15:45 Published: 2023-11-28
# Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts ( http://arxiv.org/abs/2310.05898v4 ) License: see link	Lizhang Chen, Bo Liu, Kaizhao Liang, Qiang Liu	Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polyak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion.
Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.	Translated: 2023-11-30 14:15:23 Published: 2023-11-28
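As a rough illustration of the bound-constraint claim in the Lion abstract above, the following is a minimal NumPy sketch, not the authors' code. It assumes the commonly published form of the Lion update (sign of an interpolated momentum plus decoupled weight decay); the toy quadratic objective, the hyperparameter values, and the names `lion_step`, `target`, and `bound` are illustrative assumptions.

```python
import numpy as np

def lion_step(x, m, grad, lr=0.01, beta1=0.9, beta2=0.99, weight_decay=2.0):
    """One Lion-style update with decoupled weight decay (illustrative sketch)."""
    c = beta1 * m + (1.0 - beta1) * grad              # interpolate momentum and gradient
    x_new = x - lr * (np.sign(c) + weight_decay * x)  # sign step plus decoupled decay
    m_new = beta2 * m + (1.0 - beta2) * grad          # momentum update
    return x_new, m_new

# Toy objective f(x) = 0.5 * ||x - target||^2. Its unconstrained minimizer
# (the target) lies outside the box ||x||_inf <= 1 / weight_decay.
target = np.array([3.0, -2.0])
weight_decay = 2.0
bound = 1.0 / weight_decay

x = np.zeros(2)
m = np.zeros(2)
max_abs = 0.0  # largest |x_i| observed over the whole run
for _ in range(2000):
    grad = x - target
    x, m = lion_step(x, m, grad, weight_decay=weight_decay)
    max_abs = max(max_abs, float(np.max(np.abs(x))))

print("final x:", x)
print("largest |x_i| seen:", max_abs, "bound:", bound)
```

In this toy run the iterates start inside the box and stay there: each coordinate satisfies |x_new| <= (1 - lr * weight_decay) * |x| + lr, which is at most 1 / weight_decay whenever |x| is and lr * weight_decay < 1. Both coordinates are pushed to the boundary because the target lies outside the box.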
# Towards Optimizing with Large Language Models ( http://arxiv.org/abs/2310.05204v2 ) License: see link	Pei-Fu Guo, Ying-Hsuan Chen, Yun-Da Tsai, Shou-De Lin	In this work, we conduct an assessment of the optimization capabilities of LLMs across various tasks and data sizes. Each of these tasks corresponds to unique optimization domains, and LLMs are required to execute these tasks with interactive prompting. That is, in each optimization step, the LLM generates new solutions from the past generated solutions with their values, and then the new solutions are evaluated and considered in the next optimization step. Additionally, we introduce three distinct metrics for a comprehensive assessment of task performance from various perspectives. These metrics offer the advantage of being applicable for evaluating LLM performance across a broad spectrum of optimization tasks and are less sensitive to variations in test samples. By applying these metrics, we observe that LLMs exhibit strong optimization capabilities when dealing with small-sized samples. However, their performance is significantly influenced by factors like data size and values, underscoring the importance of further research in the domain of optimization tasks for LLMs.	Translated: 2023-11-30 14:14:20 Published: 2023-11-28
# T-Rep: Representation Learning for Time Series using Time-Embeddings ( http://arxiv.org/abs/2310.04486v2 ) License: see link	Archibald Fraikin, Adrien Bennetot, Stéphanie Allassonnière	Multivariate time series present challenges to standard machine learning techniques, as they are often unlabeled, high dimensional, noisy, and contain missing data. To address this, we propose T-Rep, a self-supervised method to learn time series representations at a timestep granularity. T-Rep learns vector embeddings of time alongside its feature extractor, to extract temporal features such as trend, periodicity, or distribution shifts from the signal. These time-embeddings are leveraged in pretext tasks, to incorporate smooth and fine-grained temporal dependencies in the representations, as well as reinforce robustness to missing data. We evaluate T-Rep on downstream classification, forecasting, and anomaly detection tasks. It is compared to existing self-supervised algorithms for time series, which it outperforms in all three tasks. We test T-Rep in missing data regimes, where it proves more resilient than its counterparts. Finally, we provide latent space visualisation experiments, highlighting the interpretability of the learned representations.	Translated: 2023-11-30 14:13:38 Published: 2023-11-28
# A Brief History of Prompt: Leveraging Language Models. (Through Advanced Prompting) ( http://arxiv.org/abs/2310.04438v2 ) License: see link	Golam Md Muktadir	This paper presents a comprehensive exploration of the evolution of prompt engineering and generation in the field of natural language processing (NLP). Starting from the early language models and information retrieval systems, we trace the key developments that have shaped prompt engineering over the years. The introduction of attention mechanisms in 2015 revolutionized language understanding, leading to advancements in controllability and context-awareness. Subsequent breakthroughs in reinforcement learning techniques further enhanced prompt engineering, addressing issues like exposure bias and biases in generated text. We examine the significant contributions in 2018 and 2019, focusing on fine-tuning strategies, control codes, and template-based generation. The paper also discusses the growing importance of fairness, human-AI collaboration, and low-resource adaptation. In 2020 and 2021, contextual prompting and transfer learning gained prominence, while 2022 and 2023 witnessed the emergence of advanced techniques like unsupervised pre-training and novel reward shaping. Throughout the paper, we reference specific research studies that exemplify the impact of various developments on prompt engineering.
The journey of prompt engineering continues, with ethical considerations being paramount for the responsible and inclusive future of AI systems.	Translated: 2023-11-30 14:13:22 Published: 2023-11-28
# A Characterization of State Transfer on Double Subdivided Stars ( http://arxiv.org/abs/2310.04107v2 ) License: see link	Sarojini Mohapatra and Hiranmoy Pal	A subdivided star $SK_{1,l}$ is obtained by identifying exactly one pendant vertex from $l$ copies of the path $P_3$. This study is on the existence of quantum state transfer on the double subdivided star $T_{l,m}$, which is a pair of subdivided stars $SK_{1,l}$ and $SK_{1,m}$ joined by an edge to the respective coalescence vertices. Using the Galois group of the characteristic polynomial of $T_{l,m}$, we analyze the linear independence of its eigenvalues, which uncovers no perfect state transfer in double subdivided stars when considering the adjacency matrix as the Hamiltonian of the corresponding quantum system. Then we establish a complete characterization of double subdivided stars exhibiting pretty good state transfer.	Translated: 2023-11-30 14:13:00 Published: 2023-11-28
# On the Performance of Multimodal Language Models ( http://arxiv.org/abs/2310.03211v2 ) License: see link	Utsav Garg, Erhan Bas	Instruction-tuned large language models (LLMs) have demonstrated promising zero-shot generalization capabilities across various downstream tasks. Recent research has introduced multimodal capabilities to LLMs by integrating independently pretrained vision encoders through model grafting. These multimodal variants undergo instruction tuning, similar to LLMs, enabling effective zero-shot generalization for multimodal tasks. This study conducts a comparative analysis of different multimodal instruction tuning approaches and evaluates their performance across a range of tasks, including complex reasoning, conversation, image captioning, multiple-choice questions (MCQs), and binary classification. Through rigorous benchmarking and ablation experiments, we reveal key insights for guiding architectural choices when incorporating multimodal capabilities into LLMs. However, current approaches have limitations; they do not sufficiently address the need for a diverse multimodal instruction dataset, which is crucial for enhancing task generalization. Additionally, they overlook issues related to truthfulness and factuality when generating responses.
These findings illuminate current methodological constraints in adapting language models for image comprehension and provide valuable guidance for researchers and practitioners seeking to harness multimodal versions of LLMs.	翻訳日:2023-11-30 14:12:04 公開日:2023-11-28
# ポイントPEFT:3次元事前学習モデルのためのパラメータ効率の良いファインチューニング Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models ( http://arxiv.org/abs/2310.03059v3 ) ライセンス: Link先を確認	Ivan Tang, Ray Zhang, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Xuelong Li	(参考訳) 事前訓練された大規模モデルの人気は、言語、ビジョン、マルチモダリティといった様々な分野の下流タスクに革命をもたらした。下流タスクの適応コストを最小限に抑えるために,言語および2次元画像事前訓練モデルに対して,パラメータ効率の良い細調整(PEFT)技術が多数提案されている。しかし,3次元事前学習モデルのPEFT法はまだ未検討である。この目的のために,最小限の学習パラメータを持つポイントクラウド事前学習モデルに適用するための新しいフレームワークであるPoint-PEFTを紹介する。具体的には、事前トレーニングされた3dモデルでは、ほとんどのパラメータを凍結し、新たに追加されたpeftモジュールを、ポイント優先プロンプトとジオメトリ対応アダプタで構成される下流タスクでのみチューニングします。 Point-prior Promptは学習可能なプロンプトトークンの集合を採用し、ドメイン固有の知識を持つメモリバンクの構築を提案し、パラメータフリーの注意を使ってプロンプトトークンを強化する。 Geometry-Aware Adapterは、空間近傍の点雲の特徴を集約し、局所的な相互作用を通じてきめ細かい幾何学的情報をキャプチャすることを目的としている。広範な実験により,ダウンストリームタスクの完全な微調整よりも優れた性能を実現することができたが,トレーニング可能なパラメータは5%に過ぎず,その効率と効果を示すことができた。コードはhttps://github.com/Even-JK/PEFT-3Dで公開される。 The popularity of pre-trained large models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaption cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques are proposed for language and 2D image pre-trained models. However, the specialized PEFT method for 3D pre-trained models is still under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks, which consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we propose to construct a memory bank with domain-specific knowledge, and utilize a parameter-free attention to enhance the prompt tokens. 
The Geometry-aware Adapter aims to aggregate point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that our Point-PEFT can achieve better performance than the full fine-tuning on various downstream tasks, while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code will be released at https://github.com/Even-JK/PEFT-3D.	翻訳日:2023-11-30 14:11:47 公開日:2023-11-28
# CodeChain: 代表サブモジュールとの自己修正によるモジュールコード生成を目指す CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules ( http://arxiv.org/abs/2310.08992v2 ) ライセンス: Link先を確認	Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty	(参考訳) LLM(Large Language Models)は、HumanEvalやMBPPベンチマークのような単純なプログラミングタスクを解くのに、すでに非常に熟練している。しかし、より複雑で競争的なプログラミングタスクの解決は、これらのモデルにとって依然として非常に難しい - おそらくは、論理的なサブタスクやサブモジュールに分解する代わりに、モノリシックなコードブロックとしてソリューションを生成する傾向があるからだ。一方、経験豊富なプログラマは、しばしば以前開発されたモジュールを再利用して、複雑なタスクを解決するための抽象的なモジュール化されたコードを書く。このギャップに対処するために,我々は,自己リビジョンのチェーンを通じてモジュール化されたコード生成を導出する,新しい推論フレームワークであるcodechainを提案する。具体的には、CodeChainはまずLLMに、チェーン・オブ・ソート・プロンプトを通じてモジュール化されたコードを生成するように指示する。次に、2つのステップを繰り返すことによって、自己再定義の連鎖を適用する。 1)生成されたサブモジュールを抽出してクラスタ化し、クラスタ代表をより汎用的で再利用可能な実装として選択し、 2)これら選択されたモジュール実装で元のチェーン・オブ・マインド・プロンプトを補強し、llmに新しいモジュール化ソリューションを再生成するよう指示する。我々は、LLMが以前開発され、検証されたサブモジュールの再利用を自然に促すことで、CodeChainは、生成したソリューションのモジュラリティと正確性の両方を大幅に向上させ、APPSで35%、CodeContestsで76%の相対パス@1の改善を達成できることがわかった。これはOpenAI LLMとWizardCoderのようなオープンソースLLMの両方で有効であることが示されている。また,CodeChainの成功を支える有用な洞察を提供するために,クラスタ数,モデルサイズ,プログラム品質など,さまざまな方法による包括的なアブレーション研究も行っています。 Large Language Models (LLMs) have already become quite proficient at solving simpler programming tasks like those in HumanEval or MBPP benchmarks. However, solving more complex and competitive programming tasks is still quite challenging for these models - possibly due to their tendency to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modules. On the other hand, experienced programmers instinctively write modularized code with abstraction for solving complex tasks, often reusing previously developed modules. To address this gap, we propose CodeChain, a novel framework for inference that elicits modularized code generation through a chain of self-revisions, each being guided by some representative sub-modules generated in previous iterations. 
Concretely, CodeChain first instructs the LLM to generate modularized code through chain-of-thought prompting. Then it applies a chain of self-revisions by iterating two steps: 1) extracting and clustering the generated sub-modules and selecting the cluster representatives as the more generic and re-usable implementations, and 2) augmenting the original chain-of-thought prompt with these selected module implementations and instructing the LLM to re-generate new modularized solutions. We find that by naturally encouraging the LLM to reuse the previously developed and verified sub-modules, CodeChain can significantly boost both the modularity and the correctness of the generated solutions, achieving relative pass@1 improvements of 35% on APPS and 76% on CodeContests. It is shown to be effective on both OpenAI LLMs and open-source LLMs such as WizardCoder. We also conduct comprehensive ablation studies with different methods of prompting, number of clusters, model sizes, program qualities, etc., to provide useful insights that underpin CodeChain's success.	翻訳日:2023-11-30 14:05:33 公開日:2023-11-28
# LoftQ: 大規模言語モデルのための LoRA-Fine-Tuning-Aware 量子化 LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models ( http://arxiv.org/abs/2310.08659v4 ) ライセンス: Link先を確認	Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao	(参考訳) 量子化は、LLM(Large Language Models)を提供するのに必須のテクニックであり、最近LoRAファインチューニングへの道を見つけた。本研究では、事前学習モデルに量子化とLoRA微調整を併用するシナリオに焦点を当てる。このような場合、完全な微調整と量子化とLoRA微調整のアプローチで下流タスクのパフォーマンスの一貫性のあるギャップを観察することが一般的である。 LLMの量子化を同時に行う新しい量子化フレームワークであるLoftQ(LoRA-Fine-Tuning-Aware Quantization)を提案する。このような初期化は量子化モデルと完全精度モデルの相違を緩和し、下流タスクの一般化を大幅に改善する。本稿では,自然言語理解,質問応答,要約,自然言語生成タスクについて評価する。実験により,本手法は既存の量子化法,特に2ビットと2/4ビットの混合精度で高い性能を示した。コードはhttps://github.com/yxli2123/LoftQで入手できる。 Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning. In this work we focus on the scenario where quantization and LoRA fine-tuning are applied together on a pre-trained model. In such cases it is common to observe a consistent gap in the performance on downstream tasks between full fine-tuning and quantization plus LoRA fine-tuning approach. In response, we propose LoftQ (LoRA-Fine-Tuning-aware Quantization), a novel quantization framework that simultaneously quantizes an LLM and finds a proper low-rank initialization for LoRA fine-tuning. Such an initialization alleviates the discrepancy between the quantized and full-precision model and significantly improves generalization in downstream tasks. We evaluate our method on natural language understanding, question answering, summarization, and natural language generation tasks. Experiments show that our method is highly effective and outperforms existing quantization methods, especially in the challenging 2-bit and 2/4-bit mixed precision regimes. The code is available on https://github.com/yxli2123/LoftQ.	翻訳日:2023-11-30 14:04:39 公開日:2023-11-28
# 現象を補う:仮説補充を伴う言語モデルの帰納的推論能力のテスト Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement ( http://arxiv.org/abs/2310.08559v2 ) ライセンス: Link先を確認	Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren	(参考訳) 一握りの観察から基本原理を導き出し、帰納的推論として知られる新しい状況に一般化する能力は、人間の知性の中心である。以前の研究は、言語モデル(LM)が、しばしば帰納的推論に不足していることを示唆している。本研究では,標準的な入出力プロンプトよりも人間の帰納的過程をより密接に反映する手法である反復的仮説リファインメントを用いて,lmsの帰納的推論能力に関する体系的研究を行う。反復的仮説の洗練は、テキスト規則の形で仮説を提案、選択、洗練する3段階のプロセスを用いる。中間ルールを検証した結果,LMは現象仮説の提案者(すなわち,候補規則の生成)であり,提案したルールセットを体系的にフィルタリングする(タスク固有の)シンボリックインタプリタと組み合わせることで,因果関係,言語的指示,記号的概念の誘導を必要とする帰納的推論ベンチマークに対して強い結果が得られた。しかし、それらは帰納的推論器としても振舞い、規則帰納法(可算規則の特定)と規則適用法(インスタンスに提案された規則を適用する)の間に顕著な性能差を示し、LMが実際に規則を適用することなく仮説を提案していることを示唆している。経験的および人的分析により, LMの誘導的推論過程と人間とのいくつかの相違が明らかとなり, 誘導的推論タスクにおけるLMの使用の可能性と限界の両方に光を当てる。 The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence. Prior work suggests that language models (LMs) often fall short on inductive reasoning, despite achieving impressive success on research benchmarks. In this work, we conduct a systematic study of the inductive reasoning capabilities of LMs through iterative hypothesis refinement, a technique that more closely mirrors the human inductive process than standard input-output prompting. Iterative hypothesis refinement employs a three-step process: proposing, selecting, and refining hypotheses in the form of textual rules. 
By examining the intermediate rules, we observe that LMs are phenomenal hypothesis proposers (i.e., generating candidate rules), and when coupled with a (task-specific) symbolic interpreter that is able to systematically filter the proposed set of rules, this hybrid approach achieves strong results across inductive reasoning benchmarks that require inducing causal relations, language-like instructions, and symbolic concepts. However, they also behave as puzzling inductive reasoners, showing notable performance gaps between rule induction (i.e., identifying plausible rules) and rule application (i.e., applying proposed rules to instances), suggesting that LMs are proposing hypotheses without being able to actually apply the rules. Through empirical and human analyses, we further reveal several discrepancies between the inductive reasoning processes of LMs and humans, shedding light on both the potentials and limitations of using LMs in inductive reasoning tasks.	翻訳日:2023-11-30 14:04:17 公開日:2023-11-28
# コンピュータ断層撮影(CT)画像からのViTトランスフォーマーを用いたCOVID-19検出 COVID-19 detection using ViT transformer-based approach from Computed Tomography Images ( http://arxiv.org/abs/2310.08165v2 ) ライセンス: Link先を確認	Kenan Morani	(参考訳) 本稿では,CT画像を用いたCOVID-19診断の精度と効率を向上させるための新しい手法を提案する。コンピュータビジョンにおける最先端のトランスフォーマーモデルを利用して、224x224サイズの入力画像に設定されたベースViTトランスフォーマーを用いて、バイナリ分類タスクに適合するように出力を変更した。特に、入力画像は標準のCTスキャンサイズ512x512から、モデルの期待に合うようにリサイズされた。本手法では,患者のCTスライスをCOVID-19または非COVID-19に分類する。各患者の診断全体を決定するために,多数決法と他の閾値付け法が採用された。本方法は,患者のCTスライスを全て評価し,CTスキャンのしきい値に関連する診断を患者に割り当てることを含む。この細かな患者レベルの予測プロセスは、2Dスライスから3D患者レベルに至るまでのソリューションの堅牢性に寄与する。評価過程を通じて,COV19-CT-DB検証セットでマクロF1スコア0.7を得た。モデルの信頼性と有効性を確保するため,タスクに注意深い注釈付きCOV-19 CTデータセット上で厳密に検証した。このデータセットは包括的なアノテーションとともに、ソリューション全体の堅牢性を強化します。 Here, we introduce a novel approach to enhance the accuracy and efficiency of COVID-19 diagnosis using CT images. Leveraging state-of-the-art Transformer models in computer vision, we employed the base ViT Transformer configured for 224x224-sized input images, modifying the output to suit the binary classification task. Notably, input images were resized from the standard CT scan size of 512x512 to match the model's expectations. Our method implements a systematic patient-level prediction strategy, classifying individual CT slices as COVID-19 or non-COVID. To determine the overall diagnosis for each patient, a majority voting approach as well as other thresholding approaches were employed. This method involves evaluating all CT slices for a given patient and assigning the patient the diagnosis that relates to the thresholding for the CT scan. This meticulous patient-level prediction process contributes to the robustness of our solution as it moves from the 2D slice level to the 3D patient level. Throughout the evaluation process, our approach achieved a macro F1 score of 0.7 on the COV19-CT-DB validation set. 
To ensure the reliability and effectiveness of our model, we rigorously validate it on the extensive COV-19 CT dataset, which is meticulously annotated for the task. This dataset, with its comprehensive annotations, reinforces the overall robustness of our solution.	翻訳日:2023-11-30 14:03:22 公開日:2023-11-28
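The slice-to-patient aggregation described in the entry above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's code: the function name `patient_diagnosis`, the per-slice cutoff `slice_cutoff`, and the parameter `min_positive_fraction` are hypothetical names introduced here. With `min_positive_fraction=0.5` the rule reduces to a plain majority vote over slices; other values stand in for the "other thresholding approaches" the abstract mentions without specifying.

```python
from typing import Sequence


def patient_diagnosis(
    slice_probs: Sequence[float],
    slice_cutoff: float = 0.5,           # hypothetical per-slice decision cutoff
    min_positive_fraction: float = 0.5,  # hypothetical patient-level threshold
) -> str:
    """Aggregate per-slice COVID-19 probabilities into one patient-level label."""
    if not slice_probs:
        raise ValueError("a patient needs at least one CT slice")
    # Step 1: classify every 2D slice independently.
    positive = sum(1 for p in slice_probs if p >= slice_cutoff)
    # Step 2: label the patient by the share of positive slices.
    # A strict ">" means that an exact tie falls back to "non-COVID".
    fraction = positive / len(slice_probs)
    return "COVID-19" if fraction > min_positive_fraction else "non-COVID"


# Majority vote: 2 of 3 slices are positive.
print(patient_diagnosis([0.9, 0.8, 0.2]))
# A lower patient-level threshold flags the patient with 1 of 4 positive slices.
print(patient_diagnosis([0.9, 0.1, 0.2, 0.3], min_positive_fraction=0.2))
```

Keeping the slice cutoff and the patient threshold as separate parameters makes it easy to compare majority voting against stricter or looser patient-level rules on a validation set.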
# スパースオートエンコーダを用いたRLHFチューニング言語モデルの報酬モデル解釈 Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders ( http://arxiv.org/abs/2310.08164v2 ) ライセンス: Link先を確認	Luke Marks, Amir Abdullah, Luna Mendez, Rauno Arike, Philip Torr, Fazl Barez	(参考訳) 人間のフィードバックからの強化学習(RLHF)によって人間の好みに合わせた大規模言語モデル(LLM)は、LLM技術の多くの商用アプリケーションを支えている。しかし、LLM内部へのRLHFの影響はいまだに不透明である。 RLHF を用いて学習した LLM における暗黙報酬モデル (IRM) の解釈法を提案する。我々のアプローチは、ベースLLMとそのRLHFチューニング版の活性化に対してオートエンコーダのペアを訓練する。オートエンコーダの隠れ空間の比較により,学習したIRMの精度を反映した特徴を同定する。提案手法を説明するため,RLHFを用いてLLMを微調整し,トークンユーティリティマッピングを学習し,生成したテキストの集合的有用性を最大化する。これは、IRMを解釈するためのスパースオートエンコーダの最初のアプリケーションである。本手法は報酬の整合性を抽象的に近似し,特定の目的と学習モデル行動の一致度を測定する手段として有望である。 Large language models (LLMs) aligned to human preferences via reinforcement learning from human feedback (RLHF) underpin many commercial applications of LLM technology. Despite this, the impacts of RLHF on LLM internals remain opaque. We propose a novel method for interpreting implicit reward models (IRMs) in LLMs learned through RLHF. Our approach trains pairs of autoencoders on activations from a base LLM and its RLHF-tuned variant. Through a comparison of autoencoder hidden spaces, we identify features that reflect the accuracy of the learned IRM. To illustrate our method, we fine-tune an LLM via RLHF to learn a token-utility mapping and maximize the aggregate utility of generated text. This is the first application of sparse autoencoders to interpreting IRMs. Our method provides an abstract approximation of reward integrity and holds promise for measuring alignment between specified objectives and learned model behaviors.	翻訳日:2023-11-30 14:02:59 公開日:2023-11-28
# PREM:ノードレベルグラフ異常検出のためのシンプルで効果的なアプローチ PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection ( http://arxiv.org/abs/2310.11676v3 ) ライセンス: Link先を確認	Junjun Pan, Yixin Liu, Yizhen Zheng, Shirui Pan	(参考訳) ノードレベルのグラフ異常検出(GAD)は、医学、ソーシャルネットワーク、eコマースなど、さまざまな領域におけるグラフ構造化データから異常ノードを特定する上で重要な役割を果たす。しかし、異常の多様性とラベル付きデータの不足により、問題が発生している。既存の手法(再構成ベースおよびコントラスト学習)は有効ではあるが、複雑な目的関数や精巧なモジュールに起因する効率上の問題にしばしば悩まされる。本稿では,GADの効率を向上させるために,PREM (PREprocessing and Matching) という簡単な手法を提案する。我々のアプローチは、強力な異常検出機能を維持しながら、GADを合理化し、時間とメモリ消費を削減する。プリプロセッシングモジュールとego-neighborマッチングモジュールの2つのモジュールで構成されるPREMは、トレーニング中にメッセージパッシング伝搬の必要性をなくし、単純なコントラスト損失を採用し、トレーニング時間とメモリ使用量を大幅に削減する。さらに,5つの実世界のデータセットの厳密な評価により,ロバスト性と有効性を示した。特に、ACMデータセットで検証された場合、PREMは最も効率的なベースラインと比較して、AUCの5%の改善、トレーニング速度の9倍向上、メモリ使用量の大幅な削減を達成した。 Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in various domains such as medicine, social networks, and e-commerce. However, challenges have arisen due to the diversity of anomalies and the dearth of labeled data. Existing methodologies - reconstruction-based and contrastive learning - while effective, often suffer from efficiency issues, stemming from their complex objectives and elaborate modules. To improve the efficiency of GAD, we introduce a simple method termed PREprocessing and Matching (PREM for short). Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities. Comprising two modules - a pre-processing module and an ego-neighbor matching module - PREM eliminates the necessity for message-passing propagation during training, and employs a simple contrastive loss, leading to considerable reductions in training time and memory usage. Moreover, through rigorous evaluations on five real-world datasets, our method demonstrated robustness and effectiveness. 
Notably, when validated on the ACM dataset, PREM achieved a 5% improvement in AUC, a 9-fold increase in training speed, and sharply reduced memory usage compared to the most efficient baseline.	翻訳日:2023-11-30 13:54:14 公開日:2023-11-28
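To give a rough feel for the two-module idea in the PREM entry above, here is a heavily simplified sketch. Everything in it is an assumption made for illustration: it uses raw features instead of a learned encoder, omits the contrastive training entirely, and scores each node by one minus the cosine similarity between its own features and the mean of its neighbors' features. The helper names (`mean_vector`, `cosine`, `anomaly_scores`) are hypothetical and are not taken from the paper or its code.

```python
import math
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def anomaly_scores(
    features: Dict[str, List[float]],
    neighbors: Dict[str, List[str]],
) -> Dict[str, float]:
    """Score nodes by how poorly they match their neighborhood (higher = more anomalous)."""
    # "Preprocessing": aggregate neighbor features once, up front, so that no
    # message passing is needed afterwards. Nodes without neighbors are skipped.
    neighbor_means = {
        node: mean_vector([features[n] for n in nbrs])
        for node, nbrs in neighbors.items()
        if nbrs
    }
    # "Matching": compare each ego node with its precomputed neighborhood summary.
    return {
        node: 1.0 - cosine(features[node], summary)
        for node, summary in neighbor_means.items()
    }


features = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.9, 0.0], "x": [0.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "x": ["a", "b", "c"]}
scores = anomaly_scores(features, neighbors)
print(max(scores, key=scores.get))  # the node that disagrees most with its neighbors
```

The point of the sketch is only the cost structure: the neighborhood aggregation happens once, and scoring afterwards touches each node independently.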
# CoCoFormer: 制御可能な機能豊富なポリフォニック音楽生成法 CoCoFormer: A controllable feature-rich polyphonic music generation method ( http://arxiv.org/abs/2310.09843v2 ) ライセンス: Link先を確認	Jiuyang Zhou, Tengfei Niu, Hong Zhu, Xingping Wang	(参考訳) 本稿では,ポリフォニック音楽系列のモデル化手法について述べる。音楽生成におけるトランスフォーマーモデルの可能性が大きいため、制御可能な音楽生成が注目されている。ポリフォニック音楽の課題において、現在制御可能な生成研究はコード生成の制御に焦点を当てているが、合唱音楽テクスチャの制御可能な生成の正確な調整が欠けている。本稿では,コードとリズムの入力をきめ細かいレベルで制御することで,モデルの出力を制御する条件合唱変換器(CoCoFormer)を提案する。本稿では,自己教師方式により損失関数が向上し,条件制御入力と非条件入力トレーニングによる共同訓練を行う。そこで本研究では,教師の強制訓練による生成サンプルの多様性の欠如を緩和するために,逆訓練法を付加した。 CoCoFormerは、コードとリズムへの明示的で暗黙的な入力でモデルパフォーマンスを向上させる。本稿では,CoCoFormerが現在のモデルよりも優れたレベルに達したことを実証する。ポリフォニック音楽のテクスチャを規定する前提では、同じメロディを様々な方法で生成することも可能である。 This paper explores a method for modeling polyphonic music sequences. Due to the great potential of Transformer models in music generation, controllable music generation is receiving more attention. In the task of polyphonic music, current controllable generation research focuses on controlling the generation of chords, but lacks precise adjustment for the controllable generation of choral music textures. This paper proposes the Condition Choir Transformer (CoCoFormer), which controls the output of the model by controlling the chord and rhythm inputs at a fine-grained level. In this paper, a self-supervised method improves the loss function and performs joint training on conditional control inputs and unconditional inputs. To alleviate the lack of diversity in generated samples caused by teacher-forcing training, this paper adds an adversarial training method. CoCoFormer enhances model performance with explicit and implicit inputs for chords and rhythms. The experiments show that CoCoFormer performs better than current models. Given a specified polyphonic music texture, the same melody can also be generated in a variety of ways.	翻訳日:2023-11-30 13:49:13 公開日:2023-11-28
# FormalGeo:人間ライクなIMOレベルの自動推論への第一歩 FormalGeo: The First Step Toward Human-like IMO-level Geometric Automated Reasoning ( http://arxiv.org/abs/2310.18021v3 ) ライセンス: Link先を確認	Xiaokai Zhang, Na Zhu, Yiming He, Jia Zou, Qike Huang, Xiaoxiao Jin, Yanjun Guo, Chenyang Mao, Zhe Zhu, Dengfeng Yue, Fangzhen Zhu, Yang Li, Yifan Wang, Yiwen Huang, Runan Wang, Cheng Qin, Zhenbing Zeng, Shaorong Xie, Xiangfeng Luo, Tuo Leng	(参考訳) これは、私たちが過去3年間に達成した一連の研究における最初の論文です。本稿では,完全かつ互換性のある形式平面幾何システムを構築した。これは、IMOレベルの平面形状問題と可読性AI自動推論の間に重要な橋渡しとなる。このフォーマルなフレームワークでは、最新のAIモデルをフォーマルなシステムとシームレスに統合することができます。 aiは、他の自然言語を扱うのと同じように、imoレベルの平面幾何問題に対する推論的推論ソリューションを提供することができ、これらの証明は可読性、トレース性、検証可能である。本稿では,幾何形式体系の発展を導くために,幾何形式化理論(GFT)を提案する。 GFTに基づいて、88の幾何述語と196の定理からなるフォーマルジオを確立した。 IMOレベルの幾何学問題を表現、検証、解決することができる。また、PythonでFGPS(形式幾何学問題の解法)も作成しました。問題解決プロセスを検証するための対話型アシスタントと自動問題解決ツールの両方として機能する。 formalgeo7k と formalgeo-imo データセットにアノテートしました。前者は6,891点(データ拡張による133,818点)、後者は18点(2,627点)のIMOレベルの挑戦的幾何学問題を含む。注釈付き問題には、詳細な形式的な言語記述と解決策が含まれる。形式システムの実装と実験は、GFTの正当性と有用性を検証する。奥行き優先探索法は2.42%の問題解決失敗率しか生み出せず,より低い解を得るために深層学習手法を組み込むことができる。 FGPSとデータセットのソースコードはhttps://github.com/BitSecret/FGPSで入手できる。 This is the first paper in a series of work we have accomplished over the past three years. In this paper, we have constructed a complete and compatible formal plane geometry system. This will serve as a crucial bridge between IMO-level plane geometry challenges and readable AI automated reasoning. Within this formal framework, we have been able to seamlessly integrate modern AI models with our formal system. AI is now capable of providing deductive reasoning solutions to IMO-level plane geometry problems, just like handling other natural languages, and these proofs are readable, traceable, and verifiable. We propose the geometry formalization theory (GFT) to guide the development of the geometry formal system. Based on the GFT, we have established the FormalGeo, which consists of 88 geometric predicates and 196 theorems. 
It can represent, validate, and solve IMO-level geometry problems. We have also crafted the FGPS (formal geometry problem solver) in Python. It serves as both an interactive assistant for verifying problem-solving processes and an automated problem solver. We have annotated the formalgeo7k and formalgeo-imo datasets. The former contains 6,891 (expanded to 133,818 through data augmentation) geometry problems, while the latter includes 18 (expanded to 2,627 and continuously increasing) IMO-level challenging geometry problems. All annotated problems include detailed formal language descriptions and solutions. Implementation of the formal system and experiments validate the correctness and utility of the GFT. The backward depth-first search method yields only a 2.42% problem-solving failure rate, and we can incorporate deep learning techniques to achieve a lower one. The source code of FGPS and the datasets are available at https://github.com/BitSecret/FGPS.	翻訳日:2023-11-30 13:41:52 公開日:2023-11-28
# netFound:ネットワークセキュリティのための基盤モデル netFound: Foundation Model for Network Security ( http://arxiv.org/abs/2310.17025v2 ) ライセンス: Link先を確認	Satyandra Guthula, Navya Battula, Roman Beltiukov, Wenbo Guo, Arpit Gupta	(参考訳) ネットワークセキュリティのためのmlでは、従来のワークフローは高品質なラベル付きデータと手動の機能エンジニアリングに依存しているが、限られたデータセットと人間の専門知識は特徴の選択を妨げる。 GPT-4やVision TransformersといったMLアプリケーションドメインの最近の進歩に触発されて,ネットワークセキュリティの基礎モデルであるnetFoundを開発した。このモデルは、利用可能な未ラベルのネットワークパケットトレースに自己教師付きアルゴリズムを適用して事前学習を行う。 netFoundの設計には、ネットワークトラフィックの階層的およびマルチモーダルな属性が含まれており、アプリケーションロジック、通信プロトコル、ネットワーク条件を含む隠されたネットワークコンテキストを効果的にキャプチャする。この事前訓練された基盤があれば、低品質で限定的でノイズの多いラベル付きデータを扱う場合でも、幅広いダウンストリームタスクに対してnetFoundを微調整できます。実験では,トラフィック分類,ネットワーク侵入検出,APT検出の3つの異なるダウンストリームタスクにおいて,既存の最先端MLベースのソリューションよりもnetFoundの方が優れていることを示した。さらに,ノイズや欠落するラベルに対するnetfoundの頑健さや,時間的変動や多様なネットワーク環境にまたがる汎用性についても強調する。最後に、一連のアブレーション研究を通じて、netFoundがネットワークセキュリティアプリケーションにおけるパフォーマンスとユーティリティをさらに強化し、隠れたネットワークコンテキストをより効果的にキャプチャする方法に関する総合的な洞察を提供する。 In ML for network security, traditional workflows rely on high-quality labeled data and manual feature engineering, but limited datasets and human expertise hinder feature selection, leading to models struggling to capture crucial relationships and generalize effectively. Inspired by recent advancements in ML application domains like GPT-4 and Vision Transformers, we have developed netFound, a foundational model for network security. This model undergoes pre-training using self-supervised algorithms applied to readily available unlabeled network packet traces. netFound's design incorporates hierarchical and multi-modal attributes of network traffic, effectively capturing hidden networking contexts, including application logic, communication protocols, and network conditions. With this pre-trained foundation in place, we can fine-tune netFound for a wide array of downstream tasks, even when dealing with low-quality, limited, and noisy labeled data. 
Our experiments demonstrate netFound's superiority over existing state-of-the-art ML-based solutions across three distinct network downstream tasks: traffic classification, network intrusion detection, and APT detection. Furthermore, we emphasize netFound's robustness against noisy and missing labels, as well as its ability to generalize across temporal variations and diverse network environments. Finally, through a series of ablation studies, we provide comprehensive insights into how our design choices enable netFound to more effectively capture hidden networking contexts, further solidifying its performance and utility in network security applications.	翻訳日:2023-11-30 13:41:02 公開日:2023-11-28
# BatteryML:バッテリ劣化に関する機械学習のためのオープンソースプラットフォーム BatteryML:An Open-source platform for Machine Learning on Battery Degradation ( http://arxiv.org/abs/2310.14714v2 ) ライセンス: Link先を確認	Han Zhang, Xiaofan Gui, Shun Zheng, Ziheng Lu, Yuqi Li, Jiang Bian	(参考訳) バッテリーの劣化は、エネルギーストレージ領域における重要な関心事であり、機械学習が先進的な洞察とソリューションを促進する強力なツールとして台頭している。しかし、この電気化学科学と機械学習の交わりは複雑な問題を引き起こす。機械学習の専門家はバッテリー科学の複雑さに苦しむことが多いが、バッテリー研究者は特定のデータセットに合わせた複雑なモデルに適応するハードルに直面している。これに加えて、データフォーマットと評価ベンチマークを包含する、バッテリー劣化モデリングの統一的な標準が目立って欠如している。このような障害を認識し,データの前処理,機能抽出,従来型モデルと最先端モデルの両方の実装を統一した,ワンステップの,オールエンコンパスなオープンソースプラットフォームであるBatteryMLを提案する。この合理化されたアプローチは、研究アプリケーションの実用性と効率を高めることを約束する。 BatteryMLはこの空白を埋めようとしている。さまざまな専門分野の専門家が協力して貢献できる環境を育み、バッテリリサーチの全体的な理解と進歩を高める。プロジェクトのコードはGitHubでhttps://github.com/microsoft/BatteryMLで公開されている。 Battery degradation remains a pivotal concern in the energy storage domain, with machine learning emerging as a potent tool to drive forward insights and solutions. However, this intersection of electrochemical science and machine learning poses complex challenges. Machine learning experts often grapple with the intricacies of battery science, while battery researchers face hurdles in adapting intricate models tailored to specific datasets. Beyond this, a cohesive standard for battery degradation modeling, inclusive of data formats and evaluative benchmarks, is conspicuously absent. Recognizing these impediments, we present BatteryML - a one-step, all-encompassing, and open-source platform designed to unify data preprocessing, feature extraction, and the implementation of both traditional and state-of-the-art models. This streamlined approach promises to enhance the practicality and efficiency of research applications. 
BatteryML seeks to fill this void, fostering an environment where experts from diverse specializations can collaboratively contribute, thus elevating the collective understanding and advancement of battery research. The code for our project is publicly available on GitHub at https://github.com/microsoft/BatteryML.	翻訳日:2023-11-30 13:38:54 公開日:2023-11-28
# 強化学習の歴史とリスクと人間フィードバック The History and Risks of Reinforcement Learning and Human Feedback ( http://arxiv.org/abs/2310.13595v2 ) ライセンス: Link先を確認	Nathan Lambert and Thomas Krendl Gilbert and Tom Zick	(参考訳) 人間からのフィードバックからの強化学習(RLHF)は、大規模言語モデル(LLM)をより使いやすく、効果的にするための強力なテクニックとして登場した。 RLHFプロセスの中核は、最適化のための報酬関数として機能する人間の好みのモデルのトレーニングと利用である。このアプローチは、多くの利害関係者と学術分野の交点で運用されているが、いまだによく分かっていない。 RLHF報酬モデルはしばしばパフォーマンスの達成の中心として言及されるが、能力、評価、トレーニング方法、オープンソースのモデルに関する記述はごくわずかである。このような情報がないため、学習したRLHF報酬モデルにはさらなる研究と透明性が必要である。本稿では,プライオリティを最適化する複雑な歴史と,報酬モデルの社会学的文脈を理解するための問合せの要点について述べる。特に、RLHFの基礎におけるコスト、報酬、嗜好のオントロジ的差異、関連する方法論的緊張、および報酬モデルがどのように機能するかの一般的な理解を改善するための研究の方向性について強調する。 Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) easier to use and more effective. A core piece of the RLHF process is the training and utilization of a model of human preferences that acts as a reward function for optimization. This approach, which operates at the intersection of many stakeholders and academic disciplines, remains poorly understood. RLHF reward models are often cited as being central to achieving performance, yet very few descriptors of capabilities, evaluations, training methods, or open-source models exist. Given this lack of information, further study and transparency is needed for learned RLHF reward models. In this paper, we illustrate the complex history of optimizing preferences, and articulate lines of inquiry to understand the sociotechnical context of reward models. In particular, we highlight the ontological differences between costs, rewards, and preferences at stake in RLHF's foundations, related methodological tensions, and possible research directions to improve general understanding of how reward models function.	翻訳日:2023-11-30 13:37:57 公開日:2023-11-28
# リー代数畳み込みによる概等分散 Almost Equivariance via Lie Algebra Convolutions ( http://arxiv.org/abs/2310.13164v2 ) ライセンス: Link先を確認	Daniel McNeela	(参考訳) 近年,機械学習の研究において,集団行動に関するモデルの等価性が重要な話題となっている。しかし、特定のグループの同値性を持つアーキテクチャを付与することは、モデルが期待するデータ変換のタイプに強く先行する。厳密な同変モデルは対称性を強制するが、実世界のデータは必ずしもそのような厳密な等式に従わない。そのような場合、厳密な等分散の事前は実際には強すぎることが証明され、実世界のデータでモデルが過小評価される。そこで本研究では,近縁な話題であるほぼ同値な話題について考察する。本論文は,現在の文献に存在するものと異なる概等分散の定義を提供し,リー群のリー代数に訴えることでモデルの概等分散を符号化する実用的な方法を提案する。具体的には、リー代数の畳み込みを定義し、それらがリー群畳み込みに対していくつかの利点を与えることを示す。そこから、我々は理論の領域に方向転換し、等分散と等分散の概念と、ほぼ等分散と概等化の概念の間の接続をそれぞれ示す。 2つの存在定理を証明し、1つは一般多様体の同型の有界距離内における概等距離の存在を示し、もう1つはヒルベルト空間の逆を示す。次に、これらの定理を拡張して、群作用と函数類に関する一定の制約に従う完全同値な埋め込み関数の有界距離内における概同値多様体埋め込みの存在を証明する。最後に、完全同値およびほぼ同値な設定でデータセットに対してベンチマークを行うことにより、このアプローチの有効性を実証する。 Recently, the equivariance of models with respect to a group action has become an important topic of research in machine learning. However, imbuing an architecture with a specific group equivariance imposes a strong prior on the types of data transformations that the model expects to see. While strictly-equivariant models enforce symmetries, real-world data does not always conform to such strict equivariances, be it due to noise in the data or underlying physical laws that encode only approximate or partial symmetries. In such cases, the prior of strict equivariance can actually prove too strong and cause models to underperform on real-world data. Therefore, in this work we study a closely related topic, that of almost equivariance. We provide a definition of almost equivariance that differs from those extant in the current literature and give a practical method for encoding almost equivariance in models by appealing to the Lie algebra of a Lie group. Specifically, we define Lie algebra convolutions and demonstrate that they offer several benefits over Lie group convolutions, including being well-defined for non-compact groups. 
From there, we pivot to the realm of theory and demonstrate connections between the notions of equivariance and isometry and those of almost equivariance and almost isometry, respectively. We prove two existence theorems, one showing the existence of almost isometries within bounded distance of isometries of a general manifold, and another showing the converse for Hilbert spaces. We then extend these theorems to prove the existence of almost equivariant manifold embeddings within bounded distance of fully equivariant embedding functions, subject to certain constraints on the group action and the function class. Finally, we demonstrate the validity of our approach by benchmarking against datasets in fully equivariant and almost equivariant settings.	翻訳日:2023-11-30 13:37:18 公開日:2023-11-28
# $\mathbb{Z}_3$キラルクロックモデルにおけるエネルギー輸送の温度依存性 Temperature dependence of energy transport in the $\mathbb{Z}_3$ chiral clock model ( http://arxiv.org/abs/2311.00046v2 ) ライセンス: Link先を確認	Yongchan Yoo, Brian Swingle	(参考訳) 1次元の$\mathbb{Z}_3$キラルクロックモデルの非可積分状態におけるエネルギー輸送を研究するために行列積状態シミュレーションを用いる。システム全体の非平衡定常状態を誘導するために,システム内の温度と足跡を調節可能なジャンプ演算子を特徴とする境界駆動を伴うオープンシステムダイナミクスを考察する。定常状態が与えられると、真の局所状態と均一な熱アンサンブルの局所状態との間のトレース距離を最小化し、有効局所温度を診断する。スケール解析により, 比較的高い温度で模型の輸送係数を, ギャップのない, ガッピングされた低温の位相から抽出した。中～高温の状態では、低温物理学にかかわらず拡散輸送が観察される。エネルギー拡散定数の温度依存性をモデルパラメータの関数として計算し、低温でモデルが量子臨界である場合を含める。特に、ギャップのない状態でも、電力系列展開に基づく解析は、比較的限られた設定で中間温度輸送にアクセス可能であることを示唆している。量子臨界スケーリングが観測される温度にはまだ到達できませんが、我々のアプローチでは、幅広い温度とパラメータにわたってモデルの輸送特性にアクセスすることが可能です。結論として,本手法の限界と,その適用範囲をより低い温度にまで拡大する可能性について考察した。 We employ matrix product state simulations to study energy transport within the non-integrable regime of the one-dimensional $\mathbb{Z}_3$ chiral clock model. To induce a non-equilibrium steady state throughout the system, we consider open system dynamics with boundary driving featuring jump operators with adjustable temperature and footprint in the system. Given a steady state, we diagnose the effective local temperature by minimizing the trace distance between the true local state and the local state of a uniform thermal ensemble. Via a scaling analysis, we extract the transport coefficients of the model at relatively high temperatures above both its gapless and gapped low-temperature phases. In the medium-to-high temperature regime we consider, diffusive transport is observed regardless of the low-temperature physics. We calculate the temperature dependence of the energy diffusion constant as a function of model parameters, including in the regime where the model is quantum critical at the low temperature. Notably, even within the gapless regime, an analysis based on power series expansion implies that intermediate-temperature transport can be accessed within a relatively confined setup. 
Although we are not yet able to reach temperatures where quantum critical scaling would be observed, our approach is able to access the transport properties of the model over a broad range of temperatures and parameters. We conclude by discussing the limitations of our method and potential extensions that could expand its scope, for example, to even lower temperatures.	翻訳日:2023-11-30 13:31:17 公開日:2023-11-28
# 特徴抽出付きCycleGANを用いたCTスキャンからの合成MRI生成の改善 Enhanced Synthetic MRI Generation from CT Scans Using CycleGAN with Feature Extraction ( http://arxiv.org/abs/2310.20604v2 ) ライセンス: Link先を確認	Saba Nikbakhsh, Lachin Naghashyar, Morteza Valizadeh, Mehdi Chehel Amirani	(参考訳) 放射線治療の分野では, 正確な画像診断と画像登録が最も重要である。磁気共鳴イメージング(MRI)は、侵襲的でない詳細な画像を提供し、軟部組織のコントラストに優れており、放射線治療計画に好適である。しかし、MRIの高コスト、より長い取得時間、特定の患者に対する健康上の配慮が課題となる。逆にCT(Computed Tomography)スキャンは、高速で安価なイメージングソリューションを提供する。これらのモダリティを橋渡しし,マルチモーダルアライメント問題に対処するために,合成MRI画像を用いたモノモーダル登録の強化手法を提案する。そこで本研究では,非対データを利用し,CycleGANと特徴抽出器を用いてCTスキャンから合成MRI画像を生成する手法を提案する。本手法は,Cycle-Consistent Adversarial Networksの基礎研究を基盤として,関連文献の進歩を取り入れることで,有望な成果を示し,いくつかの最先端手法に勝ることを示す。提案手法の有効性は,複数の比較指標を用いて検証した。 In the field of radiotherapy, accurate imaging and image registration are of utmost importance for precise treatment planning. Magnetic Resonance Imaging (MRI) offers detailed imaging without being invasive and excels in soft-tissue contrast, making it a preferred modality for radiotherapy planning. However, the high cost of MRI, longer acquisition time, and certain health considerations for patients pose challenges. Conversely, Computed Tomography (CT) scans offer a quicker and less expensive imaging solution. To bridge these modalities and address multimodal alignment challenges, we introduce an approach for enhanced monomodal registration using synthetic MRI images. Utilizing unpaired data, this paper proposes a novel method to produce these synthetic MRI images from CT scans, leveraging CycleGANs and feature extractors. By building upon the foundational work on Cycle-Consistent Adversarial Networks and incorporating advancements from related literature, our methodology shows promising results, outperforming several state-of-the-art methods. The efficacy of our approach is validated by multiple comparison metrics.	翻訳日:2023-11-30 13:30:55 公開日:2023-11-28
# SemanticBoost: Augmented Textual Cuesを用いたモーション生成 SemanticBoost: Elevating Motion Generation with Augmented Textual Cues ( http://arxiv.org/abs/2310.20323v2 ) ライセンス: Link先を確認	Xin He, Shaoli Huang, Xiaohang Zhan, Chao Weng, Ying Shan	(参考訳) 現在の技術では、データセットのセマンティックアノテーションが不十分でコンテキスト理解が弱いため、複雑なセマンティック記述から動作を生成するのが困難である。これらの問題に対処するために,我々はsemanticboostという新しいフレームワークを提案する。本フレームワークは,意味強調モジュールと文脈対応モーションデノイザー(camd)から構成される。セマンティックエンハンスメントモジュールは、モーションデータから補足的セマンティクスを抽出し、データセットのテキスト記述を豊かにし、大きな言語モデルに依存することなく、テキストとモーションデータの正確なアライメントを確保する。一方、camdアプローチは、コンテキスト情報を効果的に捉え、生成された動きを所定のテキスト記述と整合させることで、高品質で意味的に一貫性のある動きシーケンスを生成するための全包括的ソリューションを提供する。既存の方法と異なるアプローチでは、正確な方向移動、特定の身体部分の記述に基づく複合動作、複雑な伸長文から生成される動きを合成することができる。実験の結果,SemanticBoostは拡散法として自己回帰法より優れ,Humanml3Dデータセット上での最先端性能を実現し,現実的かつスムーズな動き生成品質を維持した。 Current techniques face difficulties in generating motions from intricate semantic descriptions, primarily due to insufficient semantic annotations in datasets and weak contextual understanding. To address these issues, we present SemanticBoost, a novel framework that tackles both challenges simultaneously. Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD). The Semantic Enhancement module extracts supplementary semantics from motion data, enriching the dataset's textual description and ensuring precise alignment between text and motion data without depending on large language models. On the other hand, the CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences by effectively capturing context information and aligning the generated motion with the given textual descriptions. Distinct from existing methods, our approach can synthesize accurate orientational movements, combined motions based on specific body part descriptions, and motions generated from complex, extended sentences. 
Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques, achieving cutting-edge performance on the Humanml3D dataset while maintaining realistic and smooth motion generation quality.	翻訳日:2023-11-30 13:30:05 公開日:2023-11-28
# 多言語数学的推論における言語バリアの破壊:洞察と観察 Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations ( http://arxiv.org/abs/2310.20246v4 ) ライセンス: Link先を確認	Nuo Chen, Zinan Zheng, Ning Wu, Ming Gong, Yangqiu Song, Dongmei Zhang, Jia Li	(参考訳) 既存の研究は主に、単言語言語における数学的推論のための強力な言語学習モデル(llm)の開発に焦点を当てている。このギャップを埋めるために, マルチリンガル数学推論 (xMR) LLM の探索と訓練を行った。まず,多言語数学推論指導データセットmgsm8kinstructを構築し,10個の異なる言語を包含することで,xmrタスクにおけるデータ不足の学習問題に対処する。収集したデータセットに基づいて,MathOctopusという名の強力なxMR LLMを構築するための異なるトレーニング戦略を提案する。特にMathOctopus-13Bの精度は47.6%に達し、MGSMテストセットのChatGPT 46.3%を超えている。 1) 拒否的サンプリング戦略を多言語文脈に拡張すると, モデルの性能に有効であることが証明されるが, 限定的である。 2) 複数の言語にまたがる並列コーパス (SFT) の利用は, モデル性能を多言語的に向上させるだけでなく, モノリンガル性能も向上させる。これは,多言語コーパスの作成が,特に数学的推論タスクにおいて,特定の言語におけるモデル性能を高める上で重要な戦略であることを示す。例えば、mathoctopus-7bは、gsm8kテストセットで42.2%から50.8%に向上した。 Existing research predominantly focuses on developing powerful language learning models (LLMs) for mathematical reasoning within monolingual languages, with few explorations in preserving efficacy in a multilingual context. To bridge this gap, this paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs. Firstly, by utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages, thus addressing the issue of training data scarcity in xMR tasks. Based on the collected dataset, we propose different training strategies to build powerful xMR LLMs, named MathOctopus, notably outperform conventional open-source LLMs and exhibit superiority over ChatGPT in few-shot scenarios. Notably, MathOctopus-13B reaches 47.6% accuracy which exceeds ChatGPT 46.3% on MGSM testset. Beyond remarkable results, we unearth several pivotal observations and insights from extensive experiments: (1) When extending the rejection sampling strategy to the multilingual context, it proves effective for model performances, albeit limited. 
(2) Employing parallel corpora for math Supervised Fine-Tuning (SFT) across multiple languages not only significantly enhances model performance multilingually but also elevates their monolingual performance. This indicates that crafting multilingual corpora can be regarded as a vital strategy for enhancing model performance in a specific language, especially in mathematical reasoning tasks. For instance, MathOctopus-7B improves its counterparts that trained on English from 42.2% to 50.8% on GSM8K testset.	翻訳日:2023-11-30 13:29:41 公開日:2023-11-28
# CapST: 合成ビデオのための強化された軽量モデル属性アプローチ CapST: An Enhanced and Lightweight Model Attribution Approach for Synthetic Videos ( http://arxiv.org/abs/2311.03782v2 ) ライセンス: Link先を確認	Wasim Ahmad, Yan-Tsung Peng, Yuan-Hao Chang, Gaddisa Olani Ganfure, Sarwar Khan, Sahibzada Adil Shahzad	(参考訳) ディープフェイクビデオはAIのフェイスウォーピング技術によって生成され、強力な偽造攻撃の可能性からかなりの注目を集めている。既存の研究は、主に実物と偽物の区別のためのバイナリ分類に焦点を当てているが、偽の動画の特定の生成モデルを決定することは、法医学的な調査には不可欠である。本稿では,様々なオートエンコーダモデルから派生した,最近提案されたデータセットDeepfakes from Different Models (DFDM) のDeepfakeビデオのモデル属性問題について検討する。データセットは、エンコーダ、デコーダ、中間層、入力解像度、圧縮比の5つの異なるモデルによって生成された6,450のDeepfakeビデオからなる。本研究では,VGG19のセグメントを特徴抽出バックボーンとして提案する多クラス分類タスクとしてDeepfakesモデル属性を定式化した。カプセルモジュールは、ディープフェイク属性のロバストな識別のための特徴のうち複雑な階層をキャプチャする。さらに、ビデオレベルの融合技術は、連続した特徴ベクトルを扱うために時間的注意機構を利用し、ディープフェイクビデオに固有の時間的依存性を生かしている。フレームにまたがる洞察を集約することで、私たちのモデルはビデオコンテンツの包括的理解を獲得し、より正確な予測を可能にします。 deepfake benchmark dataset (dfdm) における実験結果は,提案手法の有効性を実証し,計算資源の少ないベースラインモデルと比較して,deepfakeビデオの精度を最大4%向上させた。 Deepfake videos, generated through AI faceswapping techniques, have garnered considerable attention due to their potential for powerful impersonation attacks. While existing research primarily focuses on binary classification to discern between real and fake videos, however determining the specific generation model for a fake video is crucial for forensic investigation. Addressing this gap, this paper investigates the model attribution problem of Deepfake videos from a recently proposed dataset, Deepfakes from Different Models (DFDM), derived from various Autoencoder models. The dataset comprises 6,450 Deepfake videos generated by five distinct models with variations in encoder, decoder, intermediate layer, input resolution, and compression ratio. 
This study formulates Deepfakes model attribution as a multiclass classification task, proposing a segment of VGG19 as a feature extraction backbone, known for its effectiveness in image-related tasks, while integrating a Capsule Network with a Spatio-Temporal attention mechanism. The Capsule module captures intricate hierarchies among features for robust identification of deepfake attributes. Additionally, the video-level fusion technique leverages temporal attention mechanisms to handle concatenated feature vectors, capitalizing on inherent temporal dependencies in deepfake videos. By aggregating insights across frames, our model gains a comprehensive understanding of video content, resulting in more precise predictions. Experimental results on the deepfake benchmark dataset (DFDM) demonstrate the efficacy of our proposed method, achieving up to a 4% improvement in accurately categorizing deepfake videos compared to baseline models while demanding fewer computational resources.	翻訳日:2023-11-30 13:18:47 公開日:2023-11-28
# ランダム量子状態の非典型性に対する厳密な指数 Exact Exponent for Atypicality of Random Quantum States ( http://arxiv.org/abs/2311.02534v3 ) ライセンス: Link先を確認	Eyuri Wakakuwa	(参考訳) 二部量子系上の一様ランダムな純粋状態から、より大きな部分系上で部分トレースを取ることによって誘導されるランダムな量子状態の特性を検討する。これまでの研究の多くは「測度の集中」という視点を採用しており、平均に近い状態の振る舞いに焦点を当てている。対照的に、状態が平均から遠く離れうる大偏差領域を調査する。我々は以下の結果を証明する。第一に、誘導ランダム状態が与えられた集合内にある確率は、トレースアウトされる部分系の次元において指数関数よりも遅くも速くも減少しない。第二に、指数は、最大混合状態と与えられた集合との量子相対エントロピーに残りの部分系の次元を乗じたものに等しい。第三に、与えられた集合の全体確率は、最大混合状態に最も近い要素の周りに強く集中する。この性質を条件付き集中と呼ぶ。同じ線に沿って、大きな次元を持つ単一系におけるランダム純粋状態のコヒーレンスの漸近挙動についても検討する。 We study the properties of the random quantum states induced from the uniformly random pure states on a bipartite quantum system by taking the partial trace over the larger subsystem. Most of the previous studies have adopted a viewpoint of "concentration of measure" and have focused on the behavior of the states close to the average. In contrast, we investigate the large deviation regime, where the states may be far from the average. We prove the following results: First, the probability that the induced random state is within a given set decreases no slower or faster than exponential in the dimension of the subsystem traced out. Second, the exponent is equal to the quantum relative entropy of the maximally mixed state and the given set, multiplied by the dimension of the remaining subsystem. Third, the total probability of a given set strongly concentrates around the element closest to the maximally mixed state, a property that we call conditional concentration. Along the same line, we also investigate an asymptotic behavior of coherence of random pure states in a single system with large dimensions.	翻訳日:2023-11-30 13:18:18 公開日:2023-11-28
# 人々がより良い編集を行う: 有害言語検出のためのLLM生成反実仮想拡張データの有効性の測定 People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection ( http://arxiv.org/abs/2311.01270v2 ) ライセンス: Link先を確認	Indira Sen, Dennis Assenmacher, Mattia Samory, Isabelle Augenstein, Wil van der Aalst, Claudia Wagner	(参考訳) NLPモデルは、性差別的、人種差別的、その他のヘイトコンテンツの検出など、様々な重要な社会コンピューティングタスクで使用される。したがって、これらのモデルがスプリアス機能に対して堅牢であることは必須である。過去の研究は、CAD(Counterfactually Augmented Data)を含むトレーニングデータ拡張を使用して、このようなスプリアス機能に取り組んできた。 CADは既存のトレーニングデータポイントに最小限の変更を導入してラベルをフリップするものであり、これらで訓練することでスプリアス機能へのモデルの依存を軽減できる可能性がある。しかし、手動でCADを生成するのは時間と費用がかかる。そこで本研究では,生成NLPモデルを用いて,このタスクが自動化可能かどうかを評価する。我々は,Polyjuice,ChatGPT,Flan-T5を用いてCADを自動生成し,手動で生成したCADと比較して,モデルロバスト性を改善するための有用性を評価する。複数のドメイン外のテストセットでモデル性能と個々のデータポイントの有効性をテストすることで、手動CADは依然として最も効果的であるが、ChatGPTが生成したCADは僅差の2位であることが示された。自動メソッドのパフォーマンスが低い主な理由の1つは、彼らが導入した変更が元のラベルをひっくり返すのに不十分な場合が多いことである。 NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. Therefore, it is imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence in this work, we assess if this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually-generated CADs. By testing both model performance on multiple out-of-domain test sets and individual data point efficacy, our results show that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. 
One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.	翻訳日:2023-11-30 13:15:56 公開日:2023-11-28
# 科学ジャーナリズム領域における計算型ニュース発見ツールに関する実践の理解 Understanding Practices around Computational News Discovery Tools in the Domain of Science Journalism ( http://arxiv.org/abs/2311.06864v2 ) ライセンス: Link先を確認	Sachita Nishal, Jasmine Sinchai, Nicholas Diakopoulos	(参考訳) 今日、科学とテクノロジーのジャーナリストは、ワークロードの増加、リソースの削減、科学出版のエコシステムの拡大など、ニュースに値するリードを見つけることの課題に直面している。この状況を踏まえて,これらのジャーナリストのニュース発見を支援する計算手法を,時間効率と機関の観点から検討する。特に,3つの計算情報助成金を対話型ツールとして試作し,そのようなツールがプロフェッショナル・サイエンス・ジャーナリストのプラクティスをどのように活用するか,あるいはより広範に形作るのかを探究した。本研究は,これらのツールがデザインに影響を及ぼしうる科学ジャーナリストのエージェンシー,文脈,責任に関する中心的な考察を明らかにするものである。これに基づいて、より長期のユーザエージェンシーのためのデザイン機会を提案し、コンテクスト的、個人的、コラボレーティブなニュース適性の概念を取り入れ、柔軟なインターフェースと生成モデルを活用する。全体として,コンピュータニュース発見ツールに関する社会学的システムのより豊かな視点を提供し,科学ジャーナリストの実践をより良く支援するためのツールを改善する方法を提案する。 Science and technology journalists today face challenges in finding newsworthy leads due to increased workloads, reduced resources, and expanding scientific publishing ecosystems. Given this context, we explore computational methods to aid these journalists' news discovery in terms of time-efficiency and agency. In particular, we prototyped three computational information subsidies into an interactive tool that we used as a probe to better understand how such a tool may offer utility or more broadly shape the practices of professional science journalists. Our findings highlight central considerations around science journalists' agency, context, and responsibilities that such tools can influence and could account for in design. Based on this, we suggest design opportunities for greater and longer-term user agency; incorporating contextual, personal and collaborative notions of newsworthiness; and leveraging flexible interfaces and generative models. Overall, our findings contribute a richer view of the sociotechnical system around computational news discovery tools, and suggest ways to improve such tools to better support the practices of science journalists.	翻訳日:2023-11-30 13:08:25 公開日:2023-11-28
# 双方向長期記憶ネットワークを用いた色生成 Generation Of Colors using Bidirectional Long Short Term Memory Networks ( http://arxiv.org/abs/2311.06542v2 ) ライセンス: Link先を確認	A. Sinha	(参考訳) 人間の視覚は、200万から700万の識別可能な色合いと推定される広大な色を区別することができる。しかし、この印象的な範囲は、これらの色が我々の辞書の中で正確に命名され、記述されていることを本質的に意味していない。私たちはしばしば、日常生活で身近な物体や概念と色を関連付けます。この研究は、無数の陰影に対する視覚的認識と、それらを正確に表現し、命名する能力のギャップを埋めようとしている。この目的を達成するために,双方向長短期記憶(BiLSTM)ネットワークとアクティブラーニングを利用した新しいモデルが開発された。このモデルは、この研究のために慎重にキュレートされたプロプライエタリなデータセット上で動作する。本研究の主な目的は、以前は名前のない色を分類・命名したり、伝統的な色用語を損なう中間色を識別するための多用途ツールを作ることである。この発見は、色知覚と言語に対する我々の理解を革新するこの革新的なアプローチの可能性を基礎にしている。本研究は, 厳密な実験と分析を通じて, 多様な産業における自然言語処理(NLP)応用の道筋を照らすものである。広い色スペクトルの探索を容易にすることで、NLPの潜在的な応用は従来の境界を越えて拡張される。 Human vision can distinguish between a vast spectrum of colours, estimated to be between 2 to 7 million discernible shades. However, this impressive range does not inherently imply that all these colours have been precisely named and described within our lexicon. We often associate colours with familiar objects and concepts in our daily lives. This research endeavors to bridge the gap between our visual perception of countless shades and our ability to articulate and name them accurately. A novel model has been developed to achieve this goal, leveraging Bidirectional Long Short-Term Memory (BiLSTM) networks with Active learning. This model operates on a proprietary dataset meticulously curated for this study. The primary objective of this research is to create a versatile tool for categorizing and naming previously unnamed colours or identifying intermediate shades that elude traditional colour terminology. The findings underscore the potential of this innovative approach in revolutionizing our understanding of colour perception and language. Through rigorous experimentation and analysis, this study illuminates a promising avenue for Natural Language Processing (NLP) applications in diverse industries. 
By facilitating the exploration of the vast colour spectrum the potential applications of NLP are extended beyond conventional boundaries.	翻訳日:2023-11-30 13:08:03 公開日:2023-11-28
# 圧縮量子回路を用いた断熱量子コンピューティングに向けて Towards adiabatic quantum computing using compressed quantum circuits ( http://arxiv.org/abs/2311.05544v2 ) ライセンス: Link先を確認	Conor Mc Keever, Michael Lubasch	(参考訳) 本稿では,量子回路を最適化するテンソルネットワークアルゴリズムについて述べる。ダイアバティック遷移を抑制するために, 逆ダイアバティック駆動を最適化に含め, 変分行列積作用素を用いて断熱ゲージポテンシャルを表現する。伝統的に、トロッター積公式は断熱時間進化を量子回路に変換するために用いられ、反断熱駆動の追加は時間ステップ当たりの回路深さを増加させる。代わりに、固定深さのパラメータ化量子回路を古典的に最適化し、多くの時間ステップで反断熱駆動とともに断熱進化を同時に捉える。これらの方法は、横方向および縦方向の場の量子イジング鎖の基底状態に応用される。古典的に最適化された回路は、トロッター積公式を著しく上回ることを示す。さらに,この手法を組合せ最適化に利用する方法について検討する。 We describe tensor network algorithms to optimize quantum circuits for adiabatic quantum computing. To suppress diabatic transitions, we include counterdiabatic driving in the optimization and utilize variational matrix product operators to represent adiabatic gauge potentials. Traditionally, Trotter product formulas are used to turn adiabatic time evolution into quantum circuits and the addition of counterdiabatic driving increases the circuit depth per time step. Instead, we classically optimize a parameterized quantum circuit of fixed depth to simultaneously capture adiabatic evolution together with counterdiabatic driving over many time steps. The methods are applied to the ground state preparation of quantum Ising chains with transverse and longitudinal fields. We show that the classically optimized circuits can significantly outperform Trotter product formulas. Additionally, we discuss how the approach can be used for combinatorial optimization.	翻訳日:2023-11-30 13:05:44 公開日:2023-11-28
# BakedAvatar: リアルタイムアバター合成のためのバッキングニューラルネットワーク BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis ( http://arxiv.org/abs/2311.05521v2 ) ライセンス: Link先を確認	Hao-Bin Duan, Miao Wang, Jin-Chuan Shi, Xu-Chuan Chen and Yan-Pei Cao	(参考訳) ビデオからフォトリアリスティックな4D人間の頭アバターを合成することは、VR/AR、テレプレゼンス、ビデオゲームアプリケーションに不可欠である。既存のNeural Radiance Fields(NeRF)ベースの手法は高忠実性を実現するが、計算コストはリアルタイムアプリケーションでの使用を制限する。この限界を克服するため,我々は,標準ポリゴンラスタライズパイプラインに展開可能な,リアルタイムニューラルネットワークヘッドアバター合成のための新しい表現であるbakedavatarを紹介する。提案手法は, 学習した頭部の異面から変形可能な多層メッシュを抽出し, 静的なテクスチャに埋め込んだ表現-, ポーズ-, ビュー依存の外観を計算し, 効率的なラスタライズを行う。そこで我々は, 連続的な変形, 多様体, 放射界の学習, 層状メッシュとテクスチャの抽出, ディファレンシャルラスタ化を伴う微調整テクスチャ詳細を含む, ニューラルヘッドアバター合成のための3段階パイプラインを提案する。実験結果から,本表現は他の最先端手法と同等の品質の合成結果を生成するとともに,推定時間を大幅に削減できることを示した。さらに,視覚合成,顔再現,表情編集,ポーズ編集など単眼映像からの頭部アバター合成結果をインタラクティブフレームレートで紹介する。 Synthesizing photorealistic 4D human head avatars from videos is essential for VR/AR, telepresence, and video game applications. Although existing Neural Radiance Fields (NeRF)-based methods achieve high-fidelity results, the computational expense limits their use in real-time applications. To overcome this limitation, we introduce BakedAvatar, a novel representation for real-time neural head avatar synthesis, deployable in a standard polygon rasterization pipeline. Our approach extracts deformable multi-layer meshes from learned isosurfaces of the head and computes expression-, pose-, and view-dependent appearances that can be baked into static textures for efficient rasterization. We thus propose a three-stage pipeline for neural head avatar synthesis, which includes learning continuous deformation, manifold, and radiance fields, extracting layered meshes and textures, and fine-tuning texture details with differential rasterization. Experimental results demonstrate that our representation generates synthesis results of comparable quality to other state-of-the-art methods while significantly reducing the inference time required. 
We further showcase various head avatar synthesis results from monocular videos, including view synthesis, face reenactment, expression editing, and pose editing, all at interactive frame rates.	翻訳日:2023-11-30 13:05:31 公開日:2023-11-28
# GPT-4V(Ision):自律走行における視覚言語モデルの早期探索 On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving ( http://arxiv.org/abs/2311.05332v2 ) ライセンス: Link先を確認	Licheng Wen, Xuemeng Yang, Daocheng Fu, Xiaofeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao	(参考訳) 自動運転技術の追求は、知覚、意思決定、制御システムの高度な統合にかかっている。データ駆動型とルールベースの従来のアプローチは、複雑な運転環境のニュアンスや、他の道路利用者の意図を把握できないことで妨げられている。これは特に、安全で信頼性の高い自動運転に必要な常識推論とニュアンスのあるシーン理解の開発において重要なボトルネックとなっている。視覚言語モデル(VLM)の出現は、完全自律運転の実現における新たなフロンティアである。本報告では,最新のVLMであるGPT-4V(ision)の総合評価と自動運転シナリオへの応用について述べる。我々は、シーンを駆動し、決定を下し、最終的にはドライバーの能力で行動する、モデルを理解する能力について探求する。我々の包括的なテストは、基本的なシーン認識から複雑な因果推論、様々な条件下でのリアルタイム意思決定まで幅広い。 GPT-4Vは,既存の自律システムと比較して,シーン理解や因果推論において優れた性能を示した。分散外のシナリオを処理し、意図を認識し、実際の運転状況でインフォームドな意思決定を行う可能性を示す。しかし、特に方向識別、交通光認識、視覚の接地、空間的推論といった課題は残る。これらの制限は、さらなる研究と開発の必要性を浮き彫りにした。プロジェクトがGitHubで利用可能になった。 \url{https://github.com/PJLab-ADG/GPT4V-AD-Exploration} The pursuit of autonomous driving technology hinges on the sophisticated integration of perception, decision-making, and control systems. Traditional approaches, both data-driven and rule-based, have been hindered by their inability to grasp the nuance of complex driving environments and the intentions of other road users. This has been a significant bottleneck, particularly in the development of common sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving. The advent of Visual Language Models (VLM) represents a novel frontier in realizing fully autonomous vehicle driving. This report provides an exhaustive evaluation of the latest state-of-the-art VLM, GPT-4V(ision), and its application in autonomous driving scenarios. We explore the model's abilities to understand and reason about driving scenes, make decisions, and ultimately act in the capacity of a driver. 
Our comprehensive tests span from basic scene recognition to complex causal reasoning and real-time decision-making under varying conditions. Our findings reveal that GPT-4V demonstrates superior performance in scene understanding and causal reasoning compared to existing autonomous systems. It showcases the potential to handle out-of-distribution scenarios, recognize intentions, and make informed decisions in real driving contexts. However, challenges remain, particularly in direction discernment, traffic light recognition, vision grounding, and spatial reasoning tasks. These limitations underscore the need for further research and development. The project is now available on GitHub for interested parties to access and utilize: https://github.com/PJLab-ADG/GPT4V-AD-Exploration	翻訳日:2023-11-30 13:04:43 公開日:2023-11-28
# 大きな言語モデルを用いた有意義な物語のための人間の記憶の研究 Using large language models to study human memory for meaningful narratives ( http://arxiv.org/abs/2311.04742v2 ) ライセンス: Link先を確認	Antonios Georgiou, Tankut Can, Mikhail Katkov, Misha Tsodyks	(参考訳) ai革命の最も印象的な成果の1つは、意味のあるテキストを生成し、追加のトレーニングなしで平易な英語の指示に応答できる大きな言語モデルの開発である。ここでは,有意義な素材に対する人間の記憶を研究するための科学的手段として,言語モデルが利用できることを示す。大規模メモリ実験を設計し,結果を解析するパイプラインを開発した。我々は,多数の参加者とオンライン記憶実験を行い,異なる長さの物語に対する認識と記憶データを収集した。記憶と認識の両方のパフォーマンスは物語の長さと線形にスケールしていることがわかった。さらに,記憶におけるナラティブ理解の役割を検討するために,提示したストーリーのスクランブル版を用いて,これらの実験を繰り返した。その結果,リコール性能は著しく低下したが,認識にはほとんど影響を与えなかった。興味深いことに、この状況でのリコールは、スクランブルドのプレゼンテーションではなく、オリジナルの物語の順序に従っており、記憶におけるストーリーの文脈的再構成を指している。 One of the most impressive achievements of the AI revolution is the development of large language models that can generate meaningful text and respond to instructions in plain English with no additional training necessary. Here we show that language models can be used as a scientific instrument for studying human memory for meaningful material. We developed a pipeline for designing large scale memory experiments and analyzing the obtained results. We performed online memory experiments with a large number of participants and collected recognition and recall data for narratives of different lengths. We found that both recall and recognition performance scale linearly with narrative length. Furthermore, in order to investigate the role of narrative comprehension in memory, we repeated these experiments using scrambled versions of the presented stories. We found that even though recall performance declined significantly, recognition remained largely unaffected. Interestingly, recalls in this condition seem to follow the original narrative order rather than the scrambled presentation, pointing to a contextual reconstruction of the story in memory.	翻訳日:2023-11-30 13:03:20 公開日:2023-11-28
# クーロンブロッケード領域におけるT型二重量子ドットにおけるファノ・アンドレーエフ効果 Fano-Andreev effect in a T-shaped Double Quantum Dot in the Coulomb blockade regime ( http://arxiv.org/abs/2311.04445v2 ) ライセンス: Link先を確認	A. González I., A. M. Calle, M. Pacheco, E. C. Siqueira, Pedro A. Orellana	(参考訳) 2つの量子ドットと2つの常伝導リードと超伝導体からなる系における超伝導量子相関の効果について検討した。非平衡グリーン関数法を用いて、常伝導リード間の電子の透過、状態密度、および微分コンダクタンスを解析した。超伝導相関はファノ・アンドレーエフ干渉を生じさせ,これらすべての量において2つの反共鳴線形状を特徴とすることがわかった。この挙動は平衡状態と非平衡状態の両方で観察され、ハバード-I近似を用いてクーロン相関を考慮した場合でも持続した。なお、これらの条件に対するこの挙動の堅牢性は文献ではこれまで研究されていない。 We studied the effects of superconducting quantum correlations in a system consisting of two quantum dots, two normal leads, and a superconductor. Using the non-equilibrium Green's functions method, we analyzed the transmission, density of states, and differential conductance of electrons between the normal leads. We found that the superconducting correlations resulted in Fano-Andreev interference, which is characterized by two anti-resonance line shapes in all of these quantities. This behavior was observed in both equilibrium and non-equilibrium regimes and persisted even when Coulomb correlations were taken into account using the Hubbard-I approximation. It is worth noting that the robustness of this behavior against these conditions has not been studied previously in the literature.	翻訳日:2023-11-30 13:02:50 公開日:2023-11-28
# 効率的な線形光量子計算のための損失耐性閾値を超える高効率単一光子源 High-efficiency single-photon source above the loss-tolerant threshold for efficient linear optical quantum computing ( http://arxiv.org/abs/2311.08347v2 ) ライセンス: Link先を確認	Xing Ding, Yong-Peng Guo, Mo-Chi Xu, Run-Ze Liu, Geng-Yan Zou, Jun-Yi Zhao, Zhen-Xuan Ge, Qi-Hang Zhang, Hua-Liang Liu, Lin-Jun Wang, Ming-Cheng Chen, Hui Wang, Yu-Ming He, Yong-Heng Huo, Chao-Yang Lu, Jian-Wei Pan	(参考訳) 光子損失はスケーラブルなフォトニック量子情報処理の最大の敵である。この問題は、全体の光子損失がしきい値1/3以下であれば、量子誤差補正を用いることで解決できる。しかし、これまでに報告されたオンデマンドかつ識別不能な単一光子源は、いずれもこのしきい値に届いていない。本稿では,波長可変な開放型マイクロキャビティに決定論的に結合した高量子効率の単一量子ドットに対し,整形したレーザーパルス励起を用いることで,単一光子純度0.9795(6),光子識別不能性0.9856(13),システム全体の効率0.712(18)を同時に達成する高性能光源を実証する。この光源は、スケーラブルなフォトニック量子コンピューティングの効率しきい値に初めて到達した。この光源を用いて、さらに1.89(14) dBの強度スクイージング、および計数率1.67mHzの連続40光子事象を実証する。 Photon loss is the biggest enemy for scalable photonic quantum information processing. This problem can be tackled by using quantum error correction, provided that the overall photon loss is below a threshold of 1/3. However, all reported on-demand and indistinguishable single-photon sources still fall short of this threshold. Here, by using tailor shaped laser pulse excitation on a high-quantum efficiency single quantum dot deterministically coupled to a tunable open microcavity, we demonstrate a high-performance source with a single-photon purity of 0.9795(6), photon indistinguishability of 0.9856(13), and an overall system efficiency of 0.712(18), simultaneously. This source for the first time reaches the efficiency threshold for scalable photonic quantum computing. With this source, we further demonstrate 1.89(14) dB intensity squeezing, and consecutive 40-photon events with 1.67 mHz count rate.	翻訳日:2023-11-30 12:54:32 公開日:2023-11-28
# 境界の定義:顕微鏡画像における細胞同定の課題と進歩 Defining the boundaries: challenges and advances in identifying cells in microscopy images ( http://arxiv.org/abs/2311.08269v2 ) ライセンス: Link先を確認	Nodar Gogoberidze, Beth A. Cimini	(参考訳) セグメンテーション(Segmentation)、すなわち画像内の物体の輪郭抽出は、顕微鏡画像中の細胞の測定と解析において重要なステップである。従来のセグメンテーションの方法に依存するツールでは改善が続いているが、ディープラーニングベースのツールはテクノロジの進歩をますます支配している。 Cellposeのようなスペシャリストモデルは精度とユーザフレンドリさを向上し続けており、Multi-Modality Cell Segmentation Challengeのようなセグメンテーションチャレンジは、多様なテストデータに対する精度に加え、効率とユーザビリティの面でも革新を推し進めている。ドキュメンテーション、共有、評価標準への注目が高まり、ユーザーフレンドリさが増し、真に普遍的な方法の目標に向かって加速している。 Segmentation, or the outlining of objects within images, is a critical step in the measurement and analysis of cells within microscopy images. While improvements continue to be made in tools that rely on classical methods for segmentation, deep learning-based tools increasingly dominate advances in the technology. Specialist models such as Cellpose continue to improve in accuracy and user-friendliness, and segmentation challenges such as the Multi-Modality Cell Segmentation Challenge continue to push innovation in accuracy across widely-varying test data as well as efficiency and usability. Increased attention on documentation, sharing, and evaluation standards is leading to increased user-friendliness and acceleration towards the goal of a truly universal method.	翻訳日:2023-11-30 12:54:14 公開日:2023-11-28
# 構造概念の予測による画像キャプションの改善 Improving Image Captioning via Predicting Structured Concepts ( http://arxiv.org/abs/2311.08223v2 ) ライセンス: Link先を確認	Ting Wang, Weidong Chen, Yuanhe Tian, Yan Song, Zhendong Mao	(参考訳) 画像キャプションタスクにおける画像とテキストのセマンティックギャップの解決が困難であったため,従来の研究では,2つのモダリティ間のブリッジとしての意味概念を扱い,キャプティング性能の向上に留意した。概念予測の有望な結果が得られたが、前述の研究は通常、イメージ内のオブジェクトだけでなく、テキスト内の単語依存性にも依存する概念間の関係を無視するので、良質な記述を生成するプロセスを改善する大きな可能性を秘めている。本稿では,概念とその構造を予測するための構造化概念予測器 (SCP) を提案し,それらをキャプションに統合し,このタスクにおける視覚信号の寄与を高めるとともに,それらの関係を利用して記述生成を改善する。特に,単語依存による概念関係を表現するために重み付きグラフ畳み込みネットワーク(W-GCN)を設計し,これらの概念と区別されたコントリビューションを復号プロセスに従って学習する。そこで本研究では,概念間の潜在的な関係を捉え,異なる概念を識別的に学習する手法を提案する。広範な実験とその結果から,提案する各モジュールとともに,提案手法の有効性が示された。 Having the difficulty of solving the semantic gap between images and texts for the image captioning task, conventional studies in this area paid some attention to treating semantic concepts as a bridge between the two modalities and improved captioning performance accordingly. Although promising results on concept prediction were obtained, the aforementioned studies normally ignore the relationship among concepts, which relies on not only objects in the image, but also word dependencies in the text, so that offers a considerable potential for improving the process of generating good descriptions. In this paper, we propose a structured concept predictor (SCP) to predict concepts and their structures, then we integrate them into captioning, so as to enhance the contribution of visual signals in this task via concepts and further use their relations to distinguish cross-modal semantics for better description generation. Particularly, we design weighted graph convolutional networks (W-GCN) to depict concept relations driven by word dependencies, and then learns differentiated contributions from these concepts for following decoding process. 
Therefore, our approach captures potential relations among concepts and discriminatively learns different concepts, which effectively facilitates image captioning with inherited information across modalities. Extensive experiments and their results demonstrate the effectiveness of our approach as well as each proposed module in this work.	翻訳日:2023-11-30 12:54:01 公開日:2023-11-28
# ニューラル大循環モデル Neural General Circulation Models ( http://arxiv.org/abs/2311.07222v2 ) ライセンス: Link先を確認	Dmitrii Kochkov, Janni Yuval, Ian Langmore, Peter Norgaard, Jamie Smith, Griffin Mooers, James Lottes, Stephan Rasp, Peter Düben, Milan Klöwer, Sam Hatfield, Peter Battaglia, Alvaro Sanchez-Gonzalez, Matthew Willson, Michael P. Brenner, Stephan Hoyer	(参考訳) 大循環モデル(GCM)は気象と気候予測の基礎である。 GCMは、大規模ダイナミクスのための数値解法と、雲形成のような小規模プロセスのための調整された表現を組み合わせた物理ベースのシミュレータである。近年,再解析データで訓練された機械学習(ML)モデルが,決定論的気象予報においてGCMと同等あるいは優れたスキルを達成している。しかし,これらのモデルではアンサンブル予測の改善は示されておらず,長期の気象・気候シミュレーションに十分な安定性も示されていない。本稿では,大気力学の微分可能な解法をML成分と組み合わせた最初のGCMを提示し,決定論的気象予報,アンサンブル気象予報,気候予測を,最良のMLおよび物理ベースの手法と同等の水準で生成できることを示す。 NeuralGCMは1～10日の予測でMLモデルと競合し、1～15日の予測ではEuropean Centre for Medium-Range Weather Forecasts のアンサンブル予測と競合する。所定の海面温度のもとで、NeuralGCMは地球平均気温などの気候指標を数十年にわたって正確に追跡することができ、140kmの解像度の気候予測では、熱帯サイクロンの現実的な頻度や軌道のような創発的な現象を示す。気象・気候の両面で,従来のGCMよりも計算コストを桁違いに削減できる。この結果から, エンド・ツー・エンドの深層学習は従来のGCMのタスクと互換性があり, 地球系の理解と予測に不可欠な大規模物理シミュレーションを向上できることがわかった。 General circulation models (GCMs) are the foundation of weather and climate prediction. GCMs are physics-based simulators which combine a numerical solver for large-scale dynamics with tuned representations for small-scale processes such as cloud formation. Recently, machine learning (ML) models trained on reanalysis data achieved comparable or better skill than GCMs for deterministic weather forecasting. However, these models have not demonstrated improved ensemble forecasts, or shown sufficient stability for long-term weather and climate simulations. Here we present the first GCM that combines a differentiable solver for atmospheric dynamics with ML components, and show that it can generate forecasts of deterministic weather, ensemble weather and climate on par with the best ML and physics-based methods. NeuralGCM is competitive with ML models for 1-10 day forecasts, and with the European Centre for Medium-Range Weather Forecasts ensemble prediction for 1-15 day forecasts. 
With prescribed sea surface temperature, NeuralGCM can accurately track climate metrics such as global mean temperature for multiple decades, and climate forecasts with 140 km resolution exhibit emergent phenomena such as realistic frequency and trajectories of tropical cyclones. For both weather and climate, our approach offers orders of magnitude computational savings over conventional GCMs. Our results show that end-to-end deep learning is compatible with tasks performed by conventional GCMs, and can enhance the large-scale physical simulations that are essential for understanding and predicting the Earth system.	翻訳日:2023-11-30 12:52:17 公開日:2023-11-28
# 化学合成における反応条件推薦のための自動AIエージェントの開発 Towards an Automatic AI Agent for Reaction Condition Recommendation in Chemical Synthesis ( http://arxiv.org/abs/2311.10776v2 ) ライセンス: Link先を確認	Kexin Chen, Junyou Li, Kunyi Wang, Yuyang Du, Jiahui Yu, Jiamin Lu, Lanqing Li, Jiezhong Qiu, Qun Fang, Pheng Ann Heng, Guangyong Chen	(参考訳) 反応条件最適化のための人工知能(AI)は、データ駆動型AIモデルが医薬品の発見と反応設計の加速を支援することを考えると、製薬業界において重要なトピックとなっている。しかし、既存のAIモデルは、経験豊富な化学者の化学的洞察とリアルタイム知識獲得能力に欠ける。本稿では,このギャップを埋めるために,Large Language Model (LLM) を利用したAIエージェントを提案する。そこで我々は,AIエージェントが人間の洞察を借り,最新の化学文献を検索して知識を更新できるように,新しい3段階のパラダイムを提案し,インコンテキスト学習やマルチLLM議論のような高度な知能向上手法を適用した。また, 反応条件の最適化において, エージェントの性能を大幅に向上させる新しい粗ラベルコントラスト学習(CCL)ベースの化学指紋を導入する。上記の取り組みにより、提案したAIエージェントは、人間の介入なしに最適な反応条件の推薦を自律的に生成することができる。さらに、このエージェントは化学反応に関して高い専門性を持つ。ドライラボ実験とウェットラボ実験の両方において、人間に近い性能と強い一般化能力を示す。化学AIエージェントの最初の試みとして、この研究は「化学のためのAI」の分野をさらに前進させ、コンピュータ支援合成計画の新しい可能性を開く。 Artificial intelligence (AI) for reaction condition optimization has become an important topic in the pharmaceutical industry, given that a data-driven AI model can assist drug discovery and accelerate reaction design. However, existing AI models lack the chemical insights and real-time knowledge acquisition abilities of experienced human chemists. This paper proposes a Large Language Model (LLM) empowered AI agent to bridge this gap. We put forth a novel three-phase paradigm and applied advanced intelligence-enhancement methods like in-context learning and multi-LLM debate so that the AI agent can borrow human insight and update its knowledge by searching the latest chemical literature. Additionally, we introduce a novel Coarse-label Contrastive Learning (CCL) based chemical fingerprint that greatly enhances the agent's performance in optimizing the reaction condition. With the above efforts, the proposed AI agent can autonomously generate the optimal reaction condition recommendation without any human interaction. Further, the agent is highly professional in terms of chemical reactions. 
It demonstrates close-to-human performance and strong generalization capability in both dry-lab and wet-lab experiments. As the first attempt at a chemical AI agent, this work advances the field of "AI for chemistry" and opens up new possibilities for computer-aided synthesis planning.	翻訳日:2023-11-30 12:44:38 公開日:2023-11-28
# 電子スピン浴における個々の電子スピン対の制御 Control of individual electron-spin pairs in an electron-spin bath ( http://arxiv.org/abs/2311.10110v2 ) ライセンス: Link先を確認	H. P. Bartling, N. Demetriou, N. C. F. Zutt, D. Kwiatkowski, M. J. Degen, S. J. H. Loenen, C. E. Bradley, M. Markham, D. J. Twitchen, T. H. Taminiau	(参考訳) 結合電子スピン浴のダイナミクスによる中心電子スピンの脱コヒーレンスは、固体スピン物理学における中核的な問題である。アンサンブル実験は中心スピンコヒーレンスを詳細に研究しているが、そのような実験では浴の基礎となる量子ダイナミクスが平均化されてしまう。ここでは、個々のNV中心が電子スピン浴に及ぼすコヒーレントなバックアクションを示し、それを用いて一対の浴スピンのダイナミクスを検出し、準備し、制御する。 NVペア系をサブナノメータ分解能で撮像し、電子スピン対に符号化された量子ビットの長い位相緩和時間($T_2^* = 44(9)$ ms)を明らかにする。実験では、中心スピンデコヒーレンスの基礎となる微視的量子ダイナミクスを明らかにし、相互作用するスピン系の制御とセンシングの新たな機会を提供する。 The decoherence of a central electron spin due to the dynamics of a coupled electron-spin bath is a core problem in solid-state spin physics. Ensemble experiments have studied the central spin coherence in detail, but such experiments average out the underlying quantum dynamics of the bath. Here, we show the coherent back-action of an individual NV center on an electron-spin bath and use it to detect, prepare and control the dynamics of a pair of bath spins. We image the NV-pair system with sub-nanometer resolution and reveal a long dephasing time ($T_2^* = 44(9)$ ms) for a qubit encoded in the electron-spin pair. Our experiment reveals the microscopic quantum dynamics that underlie the central spin decoherence and provides new opportunities for controlling and sensing interacting spin systems.	翻訳日:2023-11-30 12:42:16 公開日:2023-11-28
# ブレーキング境界:ディープワイヤレストラフィック予測におけるバランシング性能とロバスト性 Breaking Boundaries: Balancing Performance and Robustness in Deep Wireless Traffic Forecasting ( http://arxiv.org/abs/2311.09790v3 ) ライセンス: Link先を確認	Romain Ilbert, Thai V. Hoang, Zonghua Zhang, Themis Palpanas	(参考訳) 正確性と堅牢性の間のトレードオフのバランスは、時系列予測における長年の課題である。既存のロバストなアルゴリズムのほとんどは、クリーンなデータに対してある種の準最適性能を達成したが、データ摂動の存在下では同じパフォーマンスレベルを維持することは、非常に難しいままである。本稿では,多種多様な摂動シナリオを考察し,実世界の通信データを用いた敵攻撃に対する防御機構を提案する。我々は,$\ell_{\infty}$-norm,$\in [0.1,0.4]$で定義される最大許容摂動の範囲で,既存の2つの敵訓練アルゴリズムと比較する。我々のハイブリッド戦略は, 敵対的サンプルを検出する分類器, 摂動データサンプルからノイズを除去するデノイザ, および標準予測器から構成されており, 清浄データと摂動データの両方で最高の性能を発揮する。我々の最適モデルは、クリーンデータにおける平均正方形誤差(MSE)の観点から、元の予測モデルの性能を最大92.02\%保ちつつ、摂動データにおける標準的な逆トレーニングモデルよりも堅牢である。 MSEは2.71$\times$と2.51$\times$で、通常のデータと摂動データの比較値よりも低い。さらに、モデルのコンポーネントを並列にトレーニングすることで、計算効率も向上します。本研究は, 高度で破壊的な毒殺攻撃があっても, 分類器とデノイザーの改善により, 予測モデルの性能と堅牢性のトレードオフを最適にバランスできることを示す。 Balancing the trade-off between accuracy and robustness is a long-standing challenge in time series forecasting. While most of existing robust algorithms have achieved certain suboptimal performance on clean data, sustaining the same performance level in the presence of data perturbations remains extremely hard. In this paper, we study a wide array of perturbation scenarios and propose novel defense mechanisms against adversarial attacks using real-world telecom data. We compare our strategy against two existing adversarial training algorithms under a range of maximal allowed perturbations, defined using $\ell_{\infty}$-norm, $\in [0.1,0.4]$. Our findings reveal that our hybrid strategy, which is composed of a classifier to detect adversarial examples, a denoiser to eliminate noise from the perturbed data samples, and a standard forecaster, achieves the best performance on both clean and perturbed data. 
Our optimal model can retain up to $92.02\%$ of the performance of the original forecasting model in terms of Mean Squared Error (MSE) on clean data, while being more robust than the standard adversarially trained models on perturbed data. Its MSE is $2.71\times$ and $2.51\times$ lower than those of the compared methods on normal and perturbed data, respectively. In addition, the components of our models can be trained in parallel, resulting in better computational efficiency. Our results indicate that we can optimally balance the trade-off between the performance and robustness of forecasting models by improving the classifier and denoiser, even in the presence of sophisticated and destructive poisoning attacks.	翻訳日:2023-11-30 12:41:40 公開日:2023-11-28
# 連続学習における重み付け決定が知識伝達に及ぼす影響の検討 Investigating the Impact of Weight Sharing Decisions on Knowledge Transfer in Continual Learning ( http://arxiv.org/abs/2311.09506v2 ) ライセンス: Link先を確認	Josh Andle, Ali Payani, Salimeh Yasaei-Sekeh	(参考訳) 連続学習(CL)は、ニューラルネットワークの逐次トレーニングにおけるカタストロフィック・フォーッティング(CF)を回避する方法として注目され、異なるタスクに対するネットワーク効率と適応性が改善されている。さらにCLは、タスク間のネットワーク行動とフォワード知識伝達(FKT)を研究するための理想的な設定として機能する。 CLトレインサブネットワークのプルーニング手法は、FKTの調査に構造化されたアプローチを採ることができるように、シーケンシャルなタスクを処理する。以前のサブネットワークの重みを共有することは、FKTを通じて現在のタスクに対する過去の知識を活用する。どの重みを共有するかを理解することは、すべての重みを共有することで、準最適精度が得られる。本稿では,タスク間のfktに異なる共有判断が与える影響について検討する。このレンズを通して、タスクの複雑さと類似性が最適な重み付け決定にどのように影響するかを示し、タスク間の関係について洞察を与え、同様のCL手法による意思決定を支援する。 resnet-18とvgg-16の両方について,タスクの複雑さと類似性を強調する3つのシーケンシャルデータセットを実装した。結果から得られた決定に従って共有することで,他の共有決定よりもタスクの精度を向上させることができることを示す。 Continual Learning (CL) has generated attention as a method of avoiding Catastrophic Forgetting (CF) in the sequential training of neural networks, improving network efficiency and adaptability to different tasks. Additionally, CL serves as an ideal setting for studying network behavior and Forward Knowledge Transfer (FKT) between tasks. Pruning methods for CL train subnetworks to handle the sequential tasks which allows us to take a structured approach to investigating FKT. Sharing prior subnetworks' weights leverages past knowledge for the current task through FKT. Understanding which weights to share is important as sharing all weights can yield sub-optimal accuracy. This paper investigates how different sharing decisions affect the FKT between tasks. Through this lens we demonstrate how task complexity and similarity influence the optimal weight sharing decisions, giving insights into the relationships between tasks and helping inform decision making in similar CL methods. We implement three sequential datasets designed to emphasize variation in task complexity and similarity, reporting results for both ResNet-18 and VGG-16. 
By sharing in accordance with the decisions supported by our findings, we show that we can improve task accuracy compared to other sharing decisions.	翻訳日:2023-11-30 12:40:54 公開日:2023-11-28
# H-Packer:タンパク質側鎖包装のためのホログラフィック回転同変畳み込みニューラルネットワーク H-Packer: Holographic Rotationally Equivariant Convolutional Neural Network for Protein Side-Chain Packing ( http://arxiv.org/abs/2311.09312v2 ) ライセンス: Link先を確認	Gian Marco Visani, William Galvin, Michael Neal Pun, Armita Nourmohammad	(参考訳) タンパク質の正確なモデリングは機能タンパク質の設計に不可欠である。構造モデリングの重要なサブタスクは、タンパク質の背骨構造とアミノ酸配列から側鎖(ロータマー)の配座を予測するタンパク質側鎖パッキングである。この課題に対する従来のアプローチは、手作りエネルギー関数やロータマーライブラリに対する高価なサンプリング手順に依存している。近年、画像から画像への変換から原子座標の直接予測まで、非常に異なる定式化ではあるものの、データ駆動方式でこの問題に取り組むために、いくつかのディープラーニング手法が開発されている。ここでは、この問題をサイドチェインの真の自由度に対する合同回帰として表す: dihedral $\chi$ angles である。我々は、タスクの基本的な対称性を考慮しつつ、このタスクの目的関数を慎重に研究する。 2つの軽量回転同変ニューラルネットワーク上に構築されたサイドチェーンパッキングのための新しい2段階アルゴリズムであるホログラフィックパッカー(H-Packer)を提案する。 CASP13とCASP14の目標に対して,本手法の評価を行った。 H-Packerは計算効率が良く、従来の物理ベースのアルゴリズムよりも優れた性能を示し、代替のディープラーニングソリューションと競合する。 Accurately modeling protein 3D structure is essential for the design of functional proteins. An important sub-task of structure modeling is protein side-chain packing: predicting the conformation of side-chains (rotamers) given the protein's backbone structure and amino-acid sequence. Conventional approaches for this task rely on expensive sampling procedures over hand-crafted energy functions and rotamer libraries. Recently, several deep learning methods have been developed to tackle the problem in a data-driven way, albeit with vastly different formulations (from image-to-image translation to directly predicting atomic coordinates). Here, we frame the problem as a joint regression over the side-chains' true degrees of freedom: the dihedral $\chi$ angles. We carefully study possible objective functions for this task, while accounting for the underlying symmetries of the task. We propose Holographic Packer (H-Packer), a novel two-stage algorithm for side-chain packing built on top of two light-weight rotationally equivariant neural networks. We evaluate our method on CASP13 and CASP14 targets. 
H-Packer is computationally efficient and shows favorable performance against conventional physics-based algorithms and is competitive against alternative deep learning solutions.	翻訳日:2023-11-30 12:40:32 公開日:2023-11-28
# 異方性中心スピンモデルによるスピンスクイーズ Spin squeezing generated by the anisotropic central spin model ( http://arxiv.org/abs/2311.11308v2 ) ライセンス: Link先を確認	Lei Shao and Libin Fu	(参考訳) スピンスクイージングは、重要な量子資源として、量子計測において重要な役割を担い、高精度なパラメータ推定スキームを実現できる。本稿では,異方性中心スピン系におけるスピンのスクイージングと量子相転移について検討する。このような中心スピン系は、中心スピンとスピン浴の間の遷移周波数の比が無限大に向かう限界において、異方性リプキン-メシュコフ-グリック模型にマッピングできる。この性質は1軸のねじれ相互作用を誘発し、スピンスクイーズを生成する新しい可能性を与える。我々は、基底状態と中心スピンモデルの動的進化を通してスピンスクイーズ状態を生成することを別々に検討する。その結果, スピンスクイーズパラメータは異方性パラメータが減少するにつれて向上し, その値はシステムサイズに対して$N^{-2/3}$でスケールすることがわかった。さらに, 臨界点周辺の量子フィッシャー情報の臨界指数を数値シミュレーションにより求め, 周波数比とシステムサイズが無限大に近づくにつれて, その値が$4/3$に近づくことがわかった。この研究はスピンスクイーズ状態を生成するための有望なスキームを提供し、量子センシングの潜在的な進歩の道を開く。 Spin squeezing, as a crucial quantum resource, plays a pivotal role in quantum metrology, enabling us to achieve high-precision parameter estimation schemes. Here we investigate the spin squeezing and the quantum phase transition in an anisotropic central spin system. We find that this kind of central spin systems can be mapped to the anisotropic Lipkin-Meshkov-Glick model in the limit where the ratio of transition frequencies between the central spin and the spin bath tends towards infinity. This property can induce a one-axis twisting interaction and provides a new possibility for generating spin squeezing. We separately consider generating spin-squeezed states via the ground state and the dynamic evolution of the central spin model. The results show that the spin squeezing parameter improves as the anisotropy parameter decreases, and its value scales with system size as $N^{-2/3}$. Furthermore, we obtain the critical exponent of the quantum Fisher information around the critical point by numerical simulation, and find its value tends to $4/3$ as the frequency ratio and the system size approach infinity. This work offers a promising scheme for generating spin-squeezed states and paves the way for potential advancements in quantum sensing.	翻訳日:2023-11-30 12:29:34 公開日:2023-11-28
# データ駆動CFD壁モデリングのための前方勾配 Forward Gradients for Data-Driven CFD Wall Modeling ( http://arxiv.org/abs/2311.11876v2 ) ライセンス: Link先を確認	Jan Hückelheim, Tadbhagya Kumar, Krishnan Raghavan, Pinaki Pal	(参考訳) 計算流体力学(CFD、Computational Fluid Dynamics)は、ガスタービンをはじめとする多くの産業・科学応用の設計と最適化に用いられている。しかし, 実用化は計算コストの高さによって制限されることが多く, 壁近傍流れの正確な解像がこのコストに大きく寄与する。機械学習(ML)や他のデータ駆動手法は、既存の壁モデルを補完することができる。それでも、これらのモデルのトレーニングは、バックプロパゲーションが要求する膨大な計算労力とメモリフットプリントによってボトルネックとなる。最近の研究では、ニューラルネットワークの勾配を計算するための代替手法が提示されている。この手法では、勾配の不偏推定量が単一の前方スイープで計算されるため、別個の前方および後方スイープが不要で、スイープ間の中間結果の保存も不要である。本稿では,予測精度を保ちつつ計算オーバーヘッドを削減するために,壁面境界流CFDシミュレーションにおけるサロゲートとして使用可能なサブグリッド壁モデルのトレーニングにおける,この手法の適用について述べる。 Computational Fluid Dynamics (CFD) is used in the design and optimization of gas turbines and many other industrial/scientific applications. However, the practical use is often limited by the high computational cost, and the accurate resolution of near-wall flow is a significant contributor to this cost. Machine learning (ML) and other data-driven methods can complement existing wall models. Nevertheless, training these models is bottlenecked by the large computational effort and memory footprint demanded by back-propagation. Recent work has presented alternatives for computing gradients of neural networks where a separate forward and backward sweep is not needed and storage of intermediate results between sweeps is not required because an unbiased estimator for the gradient is computed in a single forward sweep. In this paper, we discuss the application of this approach for training a subgrid wall model that could potentially be used as a surrogate in wall-bounded flow CFD simulations to reduce the computational overhead while preserving predictive accuracy.	翻訳日:2023-11-30 12:22:48 公開日:2023-11-28
# HCAI方法論フレームワーク:人間中心のAIを実現するためのアクションにそれを組み込む An HCAI Methodological Framework: Putting It Into Action to Enable Human-Centered AI ( http://arxiv.org/abs/2311.16027v2 ) ライセンス: Link先を確認	Wei Xu, Zaifeng Gao, Marvin Dainoff	(参考訳) 人間中心型AI(HCAI)は、設計哲学として、人間に対してAI技術の利点を最大化し、その潜在的な悪影響を避けることを目的として、インテリジェントシステムの設計、開発、デプロイにおいて人間の優先順位を主張する。 HCAIは勢いを増しているが、その実装における方法論に関するガイダンスの欠如は、その採用を困難にしている。本稿では,HCAIの方法論的枠組みの必要性を評価し,まず設計目標,設計原則,実装アプローチ,設計パラダイム,学際チーム,方法,プロセスを含む7つの要素を統合した総合的かつ学際的なHCAI方法論フレームワークを提案する。フレームワークの意味についても論じている。本稿では,フレームワークの実装を容易にする"3層"アプローチを提案する。提案するフレームワークは体系的で実行可能であり、現在のフレームワークの弱点と現在HCAIの実装で直面している課題を克服できると考えています。したがって、このフレームワークはHCAIを実際に開発、移行、実装するためのアクションに役立ち、最終的にHCAIベースのインテリジェントシステムの設計、開発、デプロイを可能にします。 Human-centered AI (HCAI), as a design philosophy, advocates prioritizing humans in designing, developing, and deploying intelligent systems, aiming to maximize the benefits of AI technology to humans and avoid its potential adverse effects. While HCAI has gained momentum, the lack of guidance on methodology in its implementation makes its adoption challenging. After assessing the needs for a methodological framework for HCAI, this paper first proposes a comprehensive and interdisciplinary HCAI methodological framework integrated with seven components, including design goals, design principles, implementation approaches, design paradigms, interdisciplinary teams, methods, and processes. The implications of the framework are also discussed. This paper also presents a "three-layer" approach to facilitate the implementation of the framework. We believe the proposed framework is systematic and executable, which can overcome the weaknesses in current frameworks and the challenges currently faced in implementing HCAI. Thus, the framework can help put it into action to develop, transfer, and implement HCAI in practice, eventually enabling the design, development, and deployment of HCAI-based intelligent systems.	翻訳日:2023-11-30 12:16:31 公開日:2023-11-28
# 敵対的ドゥードル:解釈可能で人が描ける攻撃は説明可能な洞察を与える Adversarial Doodles: Interpretable and Human-drawable Attacks Provide Describable Insights ( http://arxiv.org/abs/2311.15994v2 ) ライセンス: Link先を確認	Ryoya Nara and Yusuke Matsui	(参考訳) DNNに基づく画像分類モデルは、敵対的攻撃の影響を受けやすい。これまでのほとんどの敵対的攻撃は、生成した敵対的サンプルの解釈可能性に焦点を合わせておらず、攻撃から標的分類器のメカニズムについての洞察を得ることができない。そこで本研究では,解釈可能な形状を持つ敵対的ドゥードル(Adversarial Doodles)を提案する。黒いBézier曲線を最適化し、入力画像に重ねることで対象の分類器を騙す。ランダムな視点変換を導入し, ドゥードルの面積を正則化することにより, 人間が手で複製した場合でも誤分類を引き起こすコンパクトな攻撃が得られる。敵対的ドゥードルは、攻撃と分類器の出力との関係について、説明可能で興味深い洞察を与えてくれる。敵対的ドゥードルを利用して、「鳥の画像に対し、頭部に2つのストローク、体に1つの三角形、三角形の内側に2つの線を加えます。すると、分類器は画像を蝶と誤分類します。」というように、対象の分類器に固有のバイアスを発見する。 DNN-based image classification models are susceptible to adversarial attacks. Most previous adversarial attacks do not focus on the interpretability of the generated adversarial examples, and we cannot gain insights into the mechanism of the target classifier from the attacks. Therefore, we propose Adversarial Doodles, which have interpretable shapes. We optimize black Bézier curves to fool the target classifier by overlaying them onto the input image. By introducing random perspective transformation and regularizing the doodled area, we obtain compact attacks that cause misclassification even when humans replicate them by hand. Adversarial doodles provide describable and intriguing insights into the relationship between our attacks and the classifier's output. We utilize adversarial doodles and discover the bias inherent in the target classifier, such as "We add two strokes on its head, a triangle onto its body, and two lines inside the triangle on a bird image. Then, the classifier misclassifies the image as a butterfly."	翻訳日:2023-11-30 12:16:12 公開日:2023-11-28
# 行動と状態依存信号可変を用いた適応ベイズ学習 Adaptive Bayesian Learning with Action and State-Dependent Signal Variance ( http://arxiv.org/abs/2311.12878v2 ) ライセンス: Link先を確認	Kaiwen Hou	(参考訳) 本稿では,行動と状態に依存した信号分散を意思決定モデルに組み込むことにより,ベイズ学習のための高度な枠組みを提案する。この枠組みは、様々な経済システムにおける複雑なデータフィードバックループと意思決定プロセスを理解する上で重要である。安定環境における単純なベイズ的更新から、社会学習と状態依存的不確実性を伴う複雑なモデルまで、さまざまな状況においてこのアプローチが多様であることを示します。この論文は、データ、行動、成果、および経済モデルにおける固有の不確実性の間の曖昧な相互作用の理解に一意的に貢献する。 This manuscript presents an advanced framework for Bayesian learning by incorporating action and state-dependent signal variances into decision-making models. This framework is pivotal in understanding complex data-feedback loops and decision-making processes in various economic systems. Through a series of examples, we demonstrate the versatility of this approach in different contexts, ranging from simple Bayesian updating in stable environments to complex models involving social learning and state-dependent uncertainties. The paper uniquely contributes to the understanding of the nuanced interplay between data, actions, outcomes, and the inherent uncertainty in economic models.	翻訳日:2023-11-30 10:15:19 公開日:2023-11-28
# 鉄筋コンクリートスラブ・柱接合部のせん断強度予測モデルの比較解析 Comparative Analysis of Shear Strength Prediction Models for Reinforced Concrete Slab-Column Connections ( http://arxiv.org/abs/2311.12824v2 ) ライセンス: Link先を確認	Sarmed Wahab, Nasim Shakouri Mahmoudabadi, Sarmad Waqas, Nouman Herl, Muhammad Iqbal, Khurshid Alam, Afaq Ahmad	(参考訳) 本研究の目的は,機械学習,設計基準,有限要素解析を統合し,スラブ・柱接合部におけるせん断強度予測を比較解析することである。現在の設計基準(CDC)であるACI 318-19(ACI)とEurocode 2(EC2)、Compressive Force Path(CFP)法、Feed Forward Neural Network(FNN)ベースのArtificial Neural Network(ANN)、PSOベースのFNN(PSOFNN)、BATアルゴリズムベースのBATFNNを用いる。実験結果と機械学習予測を検証するために、スラブのFEAで本研究を補完する。 PSOFNNとBATFNNのハイブリッドモデルでは、平均二乗誤差を目的関数として用い、フィードフォワードニューラルネットワークがスラブデータの予測に使用する重みの最適値を得る。 PSOFNN、BATFNN、FNNの7つの異なるモデルがこのデータに基づいてトレーニングされ、その結果、PSOFNNが全体として最高のモデルであることが判明した。 PSOFNNはSCS=1で最良の結果を示し、Rは最高の99.37%、MSEとMAEは最低の0.0275%と1.214%であった。これは、R、MSE、MAEがそれぞれ97.464%、0.0492%、1.43%であるSCS=4の最良のFNNモデルを上回る。 This research aims at a comparative analysis of shear strength prediction at slab-column connections, unifying machine learning, design codes and Finite Element Analysis. Current design codes (CDCs) of ACI 318-19 (ACI), Eurocode 2 (EC2), Compressive Force Path (CFP) method, Feed Forward Neural Network (FNN) based Artificial Neural Network (ANN), PSO-based FNN (PSOFNN), and BAT algorithm-based BATFNN are used. The study is complemented with FEA of slab for validating the experimental results and machine learning predictions. In the case of hybrid models of PSOFNN and BATFNN, mean square error is used as an objective function to obtain the optimized values of the weights that are used by the Feed Forward Neural Network to perform predictions on the slab data. Seven different models of PSOFNN, BATFNN, and FNN are trained on this data and the results exhibited that PSOFNN is the best model overall. 
PSOFNN gives the best results for SCS=1, with the highest R (99.37%) and the lowest MSE and MAE (0.0275% and 1.214%, respectively). These are better than the best FNN model, obtained for SCS=4, which has R, MSE, and MAE values of 97.464%, 0.0492%, and 1.43%, respectively.	翻訳日:2023-11-30 10:15:06 公開日:2023-11-28
# sharegpt4v: 大きなマルチモーダルモデルの改善とキャプションの改善 ShareGPT4V: Improving Large Multi-Modal Models with Better Captions ( http://arxiv.org/abs/2311.12793v2 ) ライセンス: Link先を確認	Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin	(参考訳) 大規模マルチモーダルモデル(LMM)の領域では、高画質の画像テキストデータの不足により、効率的なモダリティアライメントが不可欠である。このボトルネックに対処するため,世界知識,オブジェクト特性,空間関係,美的評価を網羅し,多様性と情報内容の既存のデータセットを超越した120万の高説明キャプションを備えた大規模リソースであるShareGPT4Vデータセットを紹介した。具体的には、ShareGPT4Vは、高度なGPT4-Visionから収集された100Kの高品質キャプションから生まれ、このサブセットで訓練されたスーパーキャプションモデルで1.2Mに拡張されている。 ShareGPT4Vは、既存のSFTデータセットの詳細なキャプションを高品質なキャプションのサブセットに置き換え、MMEおよびMMBenchベンチマークにおけるLLaVA-7B、LLaVA-1.5-13B、Qwen-VL-Chat-7BなどのLMMを大幅に強化し、それぞれ222.8/22.0/22.3と2.7/1.3/1.5のゲインを付与することで、SFT(Supervised Fine-Tuning)フェーズの有効性を最初に示す。さらに、事前学習とSFTフェーズの両方にShareGPT4Vデータを組み込み、マルチモーダルベンチマークの大部分で顕著な性能を持つ単純なアーキテクチャに基づく優れたLMMであるShareGPT4V-7Bを得る。このプロジェクトはhttps://ShareGPT4V.github.ioで公開されており、LMMコミュニティを前進させるための重要なリソースとなっている。 In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data. To address this bottleneck, we introduce the ShareGPT4V dataset, a pioneering large-scale resource featuring 1.2 million highly descriptive captions, which surpasses existing datasets in diversity and information content, covering world knowledge, object properties, spatial relationships, and aesthetic evaluations. Specifically, ShareGPT4V originates from a curated 100K high-quality captions collected from advanced GPT4-Vision and has been expanded to 1.2M with a superb caption model trained on this subset. 
ShareGPT4V first demonstrates its effectiveness for the Supervised Fine-Tuning (SFT) phase, by substituting an equivalent quantity of detailed captions in existing SFT datasets with a subset of our high-quality captions, significantly enhancing the LMMs like LLaVA-7B, LLaVA-1.5-13B, and Qwen-VL-Chat-7B on the MME and MMBench benchmarks, with respective gains of 222.8/22.0/22.3 and 2.7/1.3/1.5. We further incorporate ShareGPT4V data into both the pre-training and SFT phases, obtaining ShareGPT4V-7B, a superior LMM based on a simple architecture that has remarkable performance across a majority of the multi-modal benchmarks. This project is available at https://ShareGPT4V.github.io to serve as a pivotal resource for advancing the LMMs community.	翻訳日:2023-11-30 10:14:33 公開日:2023-11-28
# 標準量子限界を超えた量子イメージングと相蒸留 Quantum Imaging Beyond the Standard-Quantum Limit and Phase Distillation ( http://arxiv.org/abs/2311.12782v2 ) ライセンス: Link先を確認	Simon Schaffrath, Daniel Derr, Markus Gräfe, Enno Giese	(参考訳) 非線形干渉計を用いた量子センシングは、興味の対象と相互作用しない光を使ったバイカラーイメージングの可能性を提供し、位相超感度、すなわち位相不確実性のハイゼンベルク型スケーリングを実現する方法を提供する。このようなスケーリング動作はノイズに非常に敏感であり、デバイスの最適作業点を定義する特定の位相でのみ発生する。位相シフトアルゴリズムはノイズによる有害な影響に対してある程度頑健であるが、干渉計位相を広い範囲にわたってチューニングすることで画像を抽出するため、作業点から外れた動作を伴う。本研究では,非線形干渉計の自発領域と高利得領域の両方の動作を理論的に検討する。自発領域では、蒸留技術を用いた場合と作業点で動作させた場合とで、定性的に類似した振る舞いが得られる。しかし、高利得領域においては、典型的な蒸留技術は、スクイーズド真空の光子統計の結果として、標準量子限界を上回るスケーリングを本質的に許さない。対照的に、作業点での動作は、ノイズの存在下でもショットノイズ以下の感度をもたらす可能性がある。したがって, この手法は, 作業点近傍で動作させることで, ショットノイズよりも優れた位相不確かさを持つバイカラーイメージングの展望を開く。本研究の結果は, バイカラーイメージングと位相超感度を組み合わせてそのポテンシャルを最大限に活用することを最終目標として, ノイズのある環境での量子イメージング蒸留を高利得領域に移すものである。 Quantum sensing using non-linear interferometers offers the possibility of bicolour imaging, using light that never interacted with the object of interest, and provides a way to achieve phase supersensitivity, i.e. a Heisenberg-type scaling of the phase uncertainty. Such a scaling behaviour is extremely susceptible to noise and only arises at specific phases that define the optimal working point of the device. While phase-shifting algorithms are to some degree robust against the deleterious effects induced by noise they extract an image by tuning the interferometer phase over a broad range, implying an operation beyond the working point. In our theoretical study, we investigate both the spontaneous and the high-gain regime of operation of a non-linear interferometer. In fact, in the spontaneous regime using a distillation technique and operating at the working point leads to a qualitatively similar behaviour. In the high-gain regime, however, typical distillation techniques inherently forbid a scaling better than the standard-quantum limit, as a consequence of the photon statistics of squeezed vacuum. 
In contrast, an operation at the working point still may lead to a sensitivity below shot noise, even in the presence of noise. Therefore, this procedure opens the perspective of bicolour imaging with a better than shot-noise phase uncertainty by working in the vicinity of the working point. Our results transfer quantum imaging distillation in a noisy environment to the high-gain regime with the ultimate goal of harnessing its full potential by combining bicolour imaging and phase supersensitivity.	翻訳日:2023-11-30 10:13:56 公開日:2023-11-28
# グラフの大規模言語モデルに関する調査 - 進展と今後の方向性 A Survey of Graph Meets Large Language Model: Progress and Future Directions ( http://arxiv.org/abs/2311.12399v2 ) ライセンス: Link先を確認	Yuhan Li, Zhixun Li, Peisong Wang, Jia Li, Xiangguo Sun, Hong Cheng, Jeffrey Xu Yu	(参考訳) グラフは、引用ネットワーク、ソーシャルネットワーク、生物学的データといった現実世界のアプリケーションにおける複雑な関係の表現と分析において重要な役割を果たす。近年,様々な領域で大きな成功を収めたLarge Language Models (LLM) もグラフ関連タスクに活用され,従来のグラフニューラルネットワーク(GNN)ベースの手法を超越し,最先端のパフォーマンスを実現している。本稿ではまず,LLMとグラフを統合する既存手法の総合的なレビューと分析を行う。まず,グラフ関連タスクにおいてllmが果たす役割(エンハンサー,予測子,アライメント成分)に基づいて,既存の手法を3つのカテゴリに分類する新しい分類法を提案する。次に,分類学の3つのカテゴリに沿って,代表的な手法を体系的に調査する。最後に,既存研究の残り限界について論じ,今後の研究に期待できる道のりを明らかにする。関連する論文は要約され、一貫して更新される。 https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks。 Graph plays a significant role in representing and analyzing complex relationships in real-world applications such as citation networks, social networks, and biological data. Recently, Large Language Models (LLMs), which have achieved tremendous success in various domains, have also been leveraged in graph-related tasks to surpass traditional Graph Neural Networks (GNNs) based methods and yield state-of-the-art performance. In this survey, we first present a comprehensive review and analysis of existing methods that integrate LLMs with graphs. First of all, we propose a new taxonomy, which organizes existing methods into three categories based on the role (i.e., enhancer, predictor, and alignment component) played by LLMs in graph-related tasks. Then we systematically survey the representative methods along the three categories of the taxonomy. Finally, we discuss the remaining limitations of existing studies and highlight promising avenues for future research. The relevant papers are summarized and will be consistently updated at: https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks.	翻訳日:2023-11-30 10:12:22 公開日:2023-11-28
# 大規模基礎モデルの自律運転への適用 Applications of Large Scale Foundation Models for Autonomous Driving ( http://arxiv.org/abs/2311.12144v5 ) ライセンス: Link先を確認	Yu Huang, Yue Chen, Zhu Li	(参考訳) 2004/05年のDARPA Grand Challenges、2007年のUrban Challenges以来、自動運転はAIアプリケーションの最も活発な分野となっている。近年,大規模言語モデル (LLM) を基盤として,ChatGPT や PaLM などのチャットシステムが出現し,自然言語処理 (NLP) において汎用人工知能 (AGI) を実現するための有望な方向となった。自動運転の再定式化にこれらの能力を使うことは自然な考えだ。 LLMを基礎モデルと組み合わせることで、人間の知識、常識、推論を利用して、現在のロングテールのAIジレンマから自動運転システムを再構築することができる。本稿では、シミュレーション、世界モデル、データアノテーションと計画、E2Eソリューションなどに分類される、自動運転に応用された基礎モデルとLLMの技術について検討する。 Since the DARPA Grand Challenges (rural) in 2004/05 and Urban Challenges in 2007, autonomous driving has been the most active field of AI applications. Recently powered by large language models (LLMs), chat systems, such as ChatGPT and PaLM, emerge and rapidly become a promising direction to achieve artificial general intelligence (AGI) in natural language processing (NLP). A natural idea is to employ these abilities to reformulate autonomous driving. By combining LLM with foundation models, it is possible to utilize the human knowledge, commonsense and reasoning to rebuild autonomous driving systems from the current long-tailed AI dilemma. In this paper, we investigate the techniques of foundation models and LLMs applied for autonomous driving, categorized as simulation, world model, data annotation and planning or E2E solutions etc.	翻訳日:2023-11-30 10:11:32 公開日:2023-11-28
# 時系列加速度に基づく年齢関連歩行分類のための深層学習モデル Explaining Deep Learning Models for Age-related Gait Classification based on time series acceleration ( http://arxiv.org/abs/2311.12089v2 ) ライセンス: Link先を確認	Xiaoping Zheng, Bert Otten, Michiel F Reneman, Claudine JC Lamoth	(参考訳) 歩行分析は、特に高齢者の日常生活のモニタリングにおいて重要な役割を担っている。センサー技術の進歩により、実生活環境での動きを捉え、ビッグデータを生成することができる。機械学習、特にディープラーニング(DL)は、これらのビッグデータを歩行分析に使用することを約束している。しかしながら、これらのモデル固有のブラックボックスの性質は、臨床応用に課題をもたらす。本研究の目的は,SHAPなどの説明可能な人工知能を用いた高齢歩行パターンに対するDLに基づく歩行分類の透明性を高めることである。対象は,成人129名,高齢者115名(65歳)の計244名であった。彼らは3分間の歩行作業を行い、加速度計を腰椎セグメントL3に取り付けた。成人群と高齢者群を分類するために, DLモデル, 畳み込みニューラルネットワーク(CNN)およびゲートリカレントユニット(GRU)を1ストライド, 8ストライドアクセラレーションを用いて訓練した。 SHAPはモデルの予測を説明するために使用された。 cnn は 81.4% の精度と 0.89 の auc で満足のいく性能を達成し、 gru は 84.5% の精度と 0.94 の auc で有望な結果を示した。 shap分析の結果、cnnとgruは、垂直方向と歩行方向からのデータにより高いシェープ値を割り当て、特に、端末スイングから負荷応答フェーズにまたがるヒール接触周辺のデータを強調した。さらに、SHAP値から、GRUは全てのストライドを等しく扱っていないことが示された。 CNNは, シングルストライドデータの特徴から, 成人と高齢者を正確に区別した。 GRUはストライド間の関係と微妙な差異を考慮して正確な分類を行った。両モデルとも、ヒール接触周辺のデータは最も重要であり、異なる年齢群間の歩行における加速度と減速パターンの違いが示唆された。 Gait analysis holds significant importance in monitoring daily health, particularly among older adults. Advancements in sensor technology enable the capture of movement in real-life environments and generate big data. Machine learning, notably deep learning (DL), shows promise to use these big data in gait analysis. However, the inherent black-box nature of these models poses challenges for their clinical application. This study aims to enhance transparency in DL-based gait classification for aged-related gait patterns using Explainable Artificial Intelligence, such as SHAP. A total of 244 subjects, comprising 129 adults and 115 older adults (age>65), were included. They performed a 3-minute walking task while accelerometers were affixed to the lumbar segment L3. 
DL models, convolutional neural network (CNN) and gated recurrent unit (GRU), were trained using 1-stride and 8-stride accelerations, respectively, to classify adult and older adult groups. SHAP was employed to explain the models' predictions. CNN achieved a satisfactory performance with an accuracy of 81.4% and an AUC of 0.89, and GRU demonstrated promising results with an accuracy of 84.5% and an AUC of 0.94. SHAP analysis revealed that both CNN and GRU assigned higher SHAP values to the data from vertical and walking directions, particularly emphasizing data around heel contact, spanning from the terminal swing to loading response phases. Furthermore, SHAP values indicated that GRU did not treat every stride equally. CNN accurately distinguished between adults and older adults based on the characteristics of a single stride's data. GRU achieved accurate classification by considering the relationships and subtle differences between strides. In both models, data around heel contact emerged as most critical, suggesting differences in acceleration and deceleration patterns during walking between different age groups.	翻訳日:2023-11-30 10:10:59 公開日:2023-11-28
# 協調的クロスモーダルサイド情報を用いた知覚的画像圧縮 Perceptual Image Compression with Cooperative Cross-Modal Side Information ( http://arxiv.org/abs/2311.13847v2 ) ライセンス: Link先を確認	Shiyu Qin, Bin Chen, Yujun Huang, Baoyi An, Tao Dai, Shu-Tao Xia	(参考訳) データの爆発的増加により、画像とともに多くの関連テキストが送信されるようになった。分散情報源符号化に着想を得て、多くの研究が画像のサイド情報を利用して画像圧縮を強化している。しかし、マルチモーダル・シナジーの利点が研究で広く実証されているにもかかわらず、既存の手法は一般に、画像の知覚的圧縮を高めるためにテキストをサイド情報として使うことを考慮していない。ここで次の疑問が生じる: デコーダでのみ利用可能なテキストレベルのセマンティクスを効果的に転送して画像圧縮を支援するには、どうすればよいのか? 本研究では,レート・知覚・歪みのトレードオフを改善するため,テキスト誘導サイド情報を用いた新しい深層画像圧縮手法を提案する。具体的には,CLIPテキストエンコーダとSemantic-Spatial Awareブロックを用いてテキストと画像の特徴を融合する。これは、学習したテキスト適応アフィン変換をピクセルレベルで導くためにセマンティックマスクを予測することで実現される。さらに,再構成画像の知覚品質を向上させるために,テキスト条件付き敵対的生成ネットワークを設計する。 4つのデータセットと10の画像品質評価指標を含む大規模な実験により、提案手法はレート・知覚トレードオフと意味的歪みの点で優れた結果が得られることを示した。 The explosion of data has resulted in more and more associated text being transmitted along with images. Inspired by distributed source coding, many works utilize image side information to enhance image compression. However, existing methods generally do not consider using text as side information to enhance perceptual compression of images, even though the benefits of multimodal synergy have been widely demonstrated in research. This begs the following question: How can we effectively transfer text-level semantic dependencies to help image compression, which is only available to the decoder? In this work, we propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff. Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features. This is done by predicting a semantic mask to guide the learned text-adaptive affine transformation at the pixel level. Furthermore, we design a text-conditional generative adversarial network to improve the perceptual quality of reconstructed images. Extensive experiments involving four datasets and ten image quality assessment metrics demonstrate that the proposed approach achieves superior results in terms of rate-perception trade-off and semantic distortion.	翻訳日:2023-11-30 10:04:14 公開日:2023-11-28
# 可変レート画像圧縮のためのビジュアルプロンプトチューニングによるプログレッシブラーニング Progressive Learning with Visual Prompt Tuning for Variable-Rate Image Compression ( http://arxiv.org/abs/2311.13846v2 ) ライセンス: Link先を確認	Shiyu Qin, Yimin Zhou, Jinpeng Wang, Bin Chen, Baoyi An, Tao Dai, Shu-Tao Xia	(参考訳) 本稿では,トランスフォーマーを用いた可変レート画像圧縮のための漸進学習パラダイムを提案する。提案手法は,Layer-adaptive Prompt Module (LPM) の助けを借りて,幅広い圧縮率をカバーする。視覚的プロンプトチューニングに着想を得て,LPMを用いてエンコーダ側では入力画像の,デコーダ側では隠れ特徴のプロンプトをそれぞれ抽出し,事前学習されたトランスフォーマーベース画像圧縮モデルのSwin Transformer層に付加情報として供給することで,アテンション領域とビットの割り当てに影響を与え,モデルの目標圧縮率を変化させる。ネットワークをより軽量にするため、畳み込み層の少ないプロンプトネットワークを統合する。網羅的な実験の結果,異なるターゲットレートで個別に最適化された複数のモデルに基づく手法と比較して,提案手法はパラメータストレージを80%,データセットを90%削減しつつ同一の性能に到達した。また,本モデルは,レート歪み性能において現在のすべての可変ビットレート画像手法を上回り,スクラッチから訓練した最先端の固定ビットレート画像圧縮手法に迫る。 In this paper, we propose a progressive learning paradigm for transformer-based variable-rate image compression. Our approach covers a wide range of compression rates with the assistance of the Layer-adaptive Prompt Module (LPM). Inspired by visual prompt tuning, we use LPM to extract prompts for input images and hidden features at the encoder side and decoder side, respectively, which are fed as additional information into the Swin Transformer layer of a pre-trained transformer-based image compression model to affect the allocation of attention region and the bits, which in turn changes the target compression ratio of the model. To ensure the network is more lightweight, we integrate prompt networks with fewer convolutional layers. Exhaustive experiments show that compared to methods based on multiple models, which are optimized separately for different target rates, the proposed method arrives at the same performance with 80% savings in parameter storage and 90% savings in datasets. Meanwhile, our model outperforms all current variable bitrate image methods in terms of rate-distortion performance and approaches the state-of-the-art fixed bitrate image compression methods trained from scratch.	翻訳日:2023-11-30 10:03:07 公開日:2023-11-28
# CompenHR:高分解能プロジェクタの効率的な完全補償 CompenHR: Efficient Full Compensation for High-resolution Projector ( http://arxiv.org/abs/2311.13409v2 ) ライセンス: Link先を確認	Yuxi Wang, Haibin Ling, Bingyao Huang	(参考訳) プロジェクター補償はプロジェクターカメラシステムの実用的なタスクである。プロジェクターの入力画像である補償画像を見つけることを目的としており、プロジェクターが投影されると物理的環境やハードウェアによる幾何学的および測光的歪みがキャンセルされる。最先端の手法では、ディープラーニングを使用してこの問題に対処し、低解像度設定で有望なパフォーマンスを示す。しかしながら、高分解能設定にディープラーニングを直接適用することは、長いトレーニング時間と高いメモリコストのため、現実的ではない。この問題に対処するため,本論文では,実用的な完全補償ソリューションを提案する。まず,幾何学的補正の質を向上させるために,注意に基づくグリッドリファインメントネットワークを設計する。次に,新しいサンプリング方式をエンドツーエンド補償ネットワークに統合し,計算の軽減と注意ブロックの導入により重要な特徴の保存を行う。最後に,高分解能プロジェクタフル補償のためのベンチマークデータセットを構築した。実験では,効率と品質の両面で明らかな優位性を示す。 Full projector compensation is a practical task of projector-camera systems. It aims to find a projector input image, named compensation image, such that when projected it cancels the geometric and photometric distortions due to the physical environment and hardware. State-of-the-art methods use deep learning to address this problem and show promising performance for low-resolution setups. However, directly applying deep learning to high-resolution setups is impractical due to the long training time and high memory cost. To address this issue, this paper proposes a practical full compensation solution. Firstly, we design an attention-based grid refinement network to improve geometric correction quality. Secondly, we integrate a novel sampling scheme into an end-to-end compensation network to alleviate computation and introduce attention blocks to preserve key features. Finally, we construct a benchmark dataset for high-resolution projector full compensation. In experiments, our method demonstrates clear advantages in both efficiency and quality.	翻訳日:2023-11-30 10:01:04 公開日:2023-11-28
# 責任あるaiの未来の構築 - 大規模言語モデルに基づくエージェント設計のためのリファレンスアーキテクチャ Building the Future of Responsible AI: A Reference Architecture for Designing Large Language Model based Agents ( http://arxiv.org/abs/2311.13148v2 ) ライセンス: Link先を確認	Qinghua Lu, Liming Zhu, Xiwei Xu, Zhenchang Xing, Stefan Harrer, Jon Whittle	(参考訳) 大規模言語モデル(llm)は,推論能力を備えた計画を含むコンテンツの理解と生成能力から,トランスフォーメーション型人工生成知能(agi)技術として広く認識されている。ファンデーションモデルに基づくエージェントは、ファンデーションモデルの能力から自主性を導き、与えられた目標を自律的に管理可能なタスクのセットに分解し、目標を達成するためにタスク実行を編成することを可能にする。基礎モデルに基づく自律エージェントの構築に対する多大な努力にもかかわらず、エージェントのアーキテクチャ設計はまだ体系的に検討されていない。また、自律エージェントを計画と実行に使用するという大きなメリットもあるが、セキュリティや説明責任など、AI関連のソフトウェア品質特性の責任については、深刻な考慮がある。そこで本稿では,基盤モデルに基づく自律エージェントの設計において,アーキテクチャ設計指針として機能するパターン指向参照アーキテクチャを提案する。 2つの実世界のエージェントのアーキテクチャにマッピングすることで,提案する参照アーキテクチャの完全性と有用性を評価する。 Large language models (LLMs) have been widely recognised as transformative artificial generative intelligence (AGI) technologies due to their capabilities to understand and generate content, including plans with reasoning capabilities. Foundation model based agents derive their autonomy from the capabilities of foundation models, which enable them to autonomously break down a given goal into a set of manageable tasks and orchestrate task execution to meet the goal. Despite the huge efforts put into building foundation model based autonomous agents, the architecture design of the agents has not yet been systematically explored. Also, while there are significant benefits of using autonomous agents for planning and execution, there are serious considerations regarding responsible AI related software quality attributes, such as security and accountability. Therefore, this paper presents a pattern-oriented reference architecture that serves as architecture design guidance and enables responsible-AI-by-design when designing foundation model based autonomous agents. We evaluate the completeness and utility of the proposed reference architecture by mapping it to the architecture of two real-world agents.	
翻訳日:2023-11-30 09:59:24 公開日:2023-11-28
# E-polis:社会学調査のゲーミフィケーションのための真剣なゲーム E-polis: A serious game for the gamification of sociological surveys ( http://arxiv.org/abs/2311.14680v2 ) ライセンス: Link先を確認	Alexandros Gazis, Eleftheria Katsiri	(参考訳) E-polisは、若者の理想的な社会に関する意見を研究するための社会学的調査をゲーミフィケーションするマルチプラットフォーム真剣なゲームである。このゲームプレイは「ジレンマ」として知られる社会的および教育的な調査に対する反応によって引き起こされた変化を経験するデジタルシティをナビゲートするユーザーに基づいている。ゲームは冒険、探索、シミュレーションの要素を統合する。 Unityはゲームの開発に使用される選択されたゲームエンジンであり、ミドルウエアコンポーネントもユーザデータの収集と処理のために開発された。ゲームの最後には、ユーザがナビゲートした都市の青写真を表示して、自分たちの選択がその開発にどのように影響するかを示す。これが彼らの回答を反映し、それを検証する動機となります。このゲームは、社会正義や経済発展などの様々なトピックに関するデータを収集したり、市民のエンゲージメントを促進し、若者が周囲の世界について批判的に考えるように促したりするのに使うことができる。 E-polis is a multi-platform serious game that gamifies a sociological survey for studying young people's opinions regarding their ideal society. The gameplay is based on a user navigating through a digital city, experiencing the changes inflicted, triggered by responses to social and pedagogical surveys, known as "dilemmas". The game integrates elements of adventure, exploration, and simulation. Unity was the selected game engine used for the development of the game, while a middleware component was also developed to gather and process the users' data. At the end of each game, users are presented with a blueprint of the city they navigated to showcase how their choices influenced its development. This motivates them to reflect on their answers and validate them. The game can be used to collect data on a variety of topics, such as social justice, and economic development, or to promote civic engagement and encourage young people to think critically about the world around them.	翻訳日:2023-11-30 09:48:50 公開日:2023-11-28
# コンピューティング教育会議における「Medium-n Study」 "Medium-n studies" in computing education conferences ( http://arxiv.org/abs/2311.14679v2 ) ライセンス: Link先を確認	Michael Guerzhoy	(参考訳) 良い(頻度主義的な)統計の実践では、帰無仮説が偽である場合に、観測されている現象が偶然に生じうるかどうかを判断するために統計的検定を行う必要がある。良い実践ではまた、研究が検出力不足である場合、すなわち仮定した効果が存在するとしても、それを確実に検出できるほど観測数が十分に大きくない場合には、検定を行わないことが求められる。検出力不足の研究を行うと、偽陰性の結果を得るリスクがある。これは、コンピュータサイエンス教育会議のガイドラインと期待に緊張を生じさせる: 観測数の多い研究については明確だが、観測数が少なすぎる場合、研究者は実際にはp値を計算したり統計的検定を行ったりすべきではない。こうした問題が顕在化する規模のクラスは一般的であるため、この問題はCSedの会議で特に切実である。本稿では,コンピュータサイエンス教育の研究者が遭遇する様々な状況において,p値を計算すべき場合と計算すべきでない場合の考慮事項を概説する。我々は,異なるコンピュータサイエンス教育会議(ICER,SIGCSE TS,ITiCSE,EAAI,CompEd,Koli Calling)の著者および査読者ガイドラインを調査した。要約データを提示し、査読者ガイドラインについていくつかの予備的な考察を行う: ガイドラインは会議ごとに異なり、定性的研究や、場合によっては経験報告も認めているが、論文が(1)適切な検出力を持つ統計分析または(2)豊かな定性的記述の少なくとも一方を備えるべきであることを、一般には明示していない。small-nの研究とlarge-nの研究の間でガイドラインに生じる緊張に対処するための予備的な考えを提示する。 Good (Frequentist) statistical practice requires that statistical tests be performed in order to determine if the phenomenon being observed could plausibly occur by chance if the null hypothesis is false. Good practice also requires that a test is not performed if the study is underpowered: if the number of observations is not sufficiently large to be able to reliably detect the effect one hypothesizes, even if the effect exists. Running underpowered studies runs the risk of false negative results. This creates tension in the guidelines and expectations for computer science education conferences: while things are clear for studies with a large number of observations, researchers should in fact not compute p-values and perform statistical tests if the number of observations is too small. The issue is particularly live in CSed venues, since class sizes where those issues are salient are common. We outline the considerations for when to compute and when not to compute p-values in different settings encountered by computer science education researchers. We survey the author and reviewer guidelines in different computer science education conferences (ICER, SIGCSE TS, ITiCSE, EAAI, CompEd, Koli Calling). We present summary data and make several preliminary observations about reviewer guidelines: guidelines vary from conference to conference; guidelines allow for qualitative studies, and, in some cases, experience reports, but guidelines do not generally explicitly indicate that a paper should have at least one of (1) an appropriately-powered statistical analysis or (2) rich qualitative descriptions. We present preliminary ideas for addressing the tension in the guidelines between small-n and large-n studies.	翻訳日:2023-11-30 09:48:34 公開日:2023-11-28
# トラップされた冷却原子Rydberg系における振電結合のスペクトルシグネチャ Spectral signatures of vibronic coupling in trapped cold atomic Rydberg systems ( http://arxiv.org/abs/2311.16998v1 ) ライセンス: Link先を確認	Joseph W. P. Wilkinson, Weibin Li, Igor Lesanovsky	(参考訳) 電場と光学場に閉じ込められた原子とイオンは、現在の多くの量子シミュレーションおよび量子計算プラットフォームの基礎となっている。高励起Rydberg状態に励起されると、状態依存力を通じて電子自由度と振動自由度を強く結合する長距離双極子相互作用が現れる。この振電結合と、それに伴う内部自由度と外部自由度の混成は、多体スペクトルにおける明確なシグネチャとして現れる。我々はこれを、相対振動とRydberg状態の間の相互作用が量子ラビモデルを実現する、トラップされた2つのRydbergイオンの場合を考えることで示す。さらに、上記の混成が高周波(RF)分光法によって探査できることを示し、有限温度およびより大きなイオン結晶において観測可能なスペクトルシグネチャについて議論する。 Atoms and ions confined with electric and optical fields form the basis of many current quantum simulation and computing platforms. When excited to high-lying Rydberg states, long-ranged dipole interactions emerge which strongly couple the electronic and vibrational degrees of freedom through state-dependent forces. This vibronic coupling and the ensuing hybridization of internal and external degrees of freedom manifest through clear signatures in the many-body spectrum. We illustrate this by considering the case of two trapped Rydberg ions, for which the interaction between the relative vibrations and Rydberg states realizes a quantum Rabi model. We proceed to demonstrate that the aforementioned hybridization can be probed by radio frequency spectroscopy and discuss observable spectral signatures at finite temperatures and for larger ion crystals.	翻訳日:2023-11-30 09:28:13 公開日:2023-11-28
# FedECA:分散環境でのtime-to-eventデータを用いた因果推論のためのフェデレーション外部制御アーム手法 FedECA: A Federated External Control Arm Method for Causal Inference with Time-To-Event Data in Distributed Settings ( http://arxiv.org/abs/2311.16984v1 ) ライセンス: Link先を確認	Jean Ogier du Terrail, Quentin Klopfenstein, Honghao Li, Imke Mayer, Nicolas Loiseau, Mohammad Hallal, Félix Balazard, Mathieu Andreux	(参考訳) 外部制御アーム(ECA)は、実験薬の初期臨床開発に情報を与え、非ランダム化環境での規制承認のための有効性の証拠を提供することができる。しかし、ECAを実装するうえでの主な課題は、リアルワールドデータや過去の臨床試験データにアクセスすることである。実際、データが元の収集センターの外に出ることに関するプライバシー上の考慮と、製薬会社の競争上の動機のために、データ共有は実現できないことが多い。本稿では,フェデレーション学習(FL)と呼ばれるプライバシ強化技術を活用し,データ共有の障壁の一部を取り除く。我々は,患者データの露出を制限することでECAの実装を容易にする,time-to-eventアウトカムのためのフェデレーション学習版の逆確率治療重み付け(IPTW)法であるFedECAを導入する。大規模な実験により,FedECAは,統計的検出力と治療群・対照群のバランスをとる能力の点で,最も近い競合手法であるMAIC(matching-adjusted indirect comparison)よりも優れていることを示す。このような手法の利用を促進するため、プライバシーに敏感な状況での実績を持つオープンソースのFLソフトウェアであるSubstraに依存したコードを公開している。 External control arms (ECA) can inform the early clinical development of experimental drugs and provide efficacy evidence for regulatory approval in non-randomized settings. However, the main challenge of implementing ECA lies in accessing real-world data or historical clinical trials. Indeed, data sharing is often not feasible due to privacy considerations related to data leaving the original collection centers, along with pharmaceutical companies' competitive motives. In this paper, we leverage a privacy-enhancing technology called federated learning (FL) to remove some of the barriers to data sharing. We introduce a federated learning inverse probability of treatment weighted (IPTW) method for time-to-event outcomes called FedECA which eases the implementation of ECA by limiting patients' data exposure. We show with extensive experiments that FedECA outperforms its closest competitor, matching-adjusted indirect comparison (MAIC), in terms of statistical power and ability to balance the treatment and control groups. To encourage the use of such methods, we publicly release our code which relies on Substra, an open-source FL software with proven experience in privacy-sensitive contexts.	翻訳日:2023-11-30 09:28:00 公開日:2023-11-28
# 金属スピンガラスの機械学習力場モデル Machine learning force-field models for metallic spin glass ( http://arxiv.org/abs/2311.16964v1 ) ライセンス: Link先を確認	Menglin Shi, Sheng Zhang, Gia-Wei Chern	(参考訳) 希薄磁性合金のような金属スピンガラス系は、ランダムに分散した局所モーメントを長距離電子を媒介する効果的な相互作用によって特徴づけられる。本稿では,金属スピングラスの動的シミュレーションのためのスケーラブル機械学習(ML)フレームワークを提案する。局所性の原理に基づくbehler-parrinello型ニューラルネットワークモデルを開発し、スピンダイナミクスを駆動する電子誘起局所磁場を正確にかつ効率的に予測する。 MLモデルの重要な構成要素は、ニューラルネットワークに直接入力される局所磁気環境の適切な対称性不変表現である。量子分子動力学のml力場モデルで広く用いられている原子中心対称性関数法にスピン自由度を組み込むことにより、このような磁気ディスクリプタを開発した。我々は、s-dモデルの非晶質一般化の緩和ダイナミクスの研究にアプローチを適用する。本研究は,拡張磁石の大規模動的モデリングにおけるMLモデルの可能性を明らかにするものである。 Metallic spin glass systems, such as dilute magnetic alloys, are characterized by randomly distributed local moments coupled to each other through a long-range electron-mediated effective interaction. We present a scalable machine learning (ML) framework for dynamical simulations of metallic spin glasses. A Behler-Parrinello type neural-network model, based on the principle of locality, is developed to accurately and efficiently predict electron-induced local magnetic fields that drive the spin dynamics. A crucial component of the ML model is a proper symmetry-invariant representation of local magnetic environment which is direct input to the neural net. We develop such a magnetic descriptor by incorporating the spin degrees of freedom into the atom-centered symmetry function methods which are widely used in ML force-field models for quantum molecular dynamics. We apply our approach to study the relaxation dynamics of an amorphous generalization of the s-d model. Our work highlights the promising potential of ML models for large-scale dynamical modeling of itinerant magnets with quenched disorder.	翻訳日:2023-11-30 09:27:37 公開日:2023-11-28
# 高マッハ数流体問題に対するデータ効率演算子学習 Data-efficient operator learning for solving high Mach number fluid flow problems ( http://arxiv.org/abs/2311.16860v1 ) ライセンス: Link先を確認	Noah Ford, Victor J. Leon, Honest Merman, Jeffrey Gilbert, Alexander New	(参考訳) 本研究では,SciMLを用いて不規則な形状上の高マッハ数流体流れの解を予測する問題を考える。この設定ではデータが限られているため、モデルが低データ設定でうまく機能することが望ましい。データから挙動モードの基底を学習し,この基底を用いて予測を行うニューラル基底関数(NBF)は,基底を考慮しないベースラインモデルよりも効果的であることを示す。さらに,この種の問題に対する解の予測という分野に残る課題を明らかにする。 We consider the problem of using SciML to predict solutions of high Mach fluid flows over irregular geometries. In this setting, data is limited, and so it is desirable for models to perform well in the low-data setting. We show that Neural Basis Functions (NBF), which learns a basis of behavior modes from the data and then uses this basis to make predictions, is more effective than a basis-unaware baseline model. In addition, we identify continuing challenges in the space of predicting solutions for this type of problem.	翻訳日:2023-11-30 09:27:24 公開日:2023-11-28
# 確率的クライアント選択による非同期無線フェデレーション学習 Asynchronous Wireless Federated Learning with Probabilistic Client Selection ( http://arxiv.org/abs/2311.16741v1 ) ライセンス: Link先を確認	Jiarong Yang, Yuan Liu, Fangjiong Chen, Wen Chen, Changle Li	(参考訳) Federated Learning (FL) は有望な分散学習フレームワークであり、分散したクライアントがサーバの調整のもとで機械学習モデルを協調的に訓練する。非同期FLにおけるストラグラー問題に対処するため、各クライアントがローカル更新を保持し、任意のタイミングでローカルモデルをサーバに確率的に送信する設定を考える。まず,確率的クライアント選択に基づく収束率の(近似的な)表現を導出する。次に、確率的クライアント選択と帯域割り当てを同時に行うことで、非同期FLの収束率とモバイル端末のエネルギー消費のトレードオフをとる最適化問題を定式化する。我々は,この非凸問題を大域的最適に解く反復アルゴリズムを開発した。実験は従来のスキームと比較して提案手法の優位性を示す。 Federated learning (FL) is a promising distributed learning framework where distributed clients collaboratively train a machine learning model coordinated by a server. To tackle the stragglers issue in asynchronous FL, we consider that each client keeps local updates and probabilistically transmits the local model to the server at arbitrary times. We first derive the (approximate) expression for the convergence rate based on the probabilistic client selection. Then, an optimization problem is formulated to trade off the convergence rate of asynchronous FL and mobile energy consumption by joint probabilistic client selection and bandwidth allocation. We develop an iterative algorithm to solve the non-convex problem globally optimally. Experiments demonstrate the superiority of the proposed approach compared with the traditional schemes.	翻訳日:2023-11-30 09:27:15 公開日:2023-11-28
# 濃厚固溶体合金における緩慢で化学的に偏った格子間拡散:機構と方法 Sluggish and Chemically-Biased Interstitial Diffusion in Concentrated Solid Solution Alloys: Mechanisms and Methods ( http://arxiv.org/abs/2311.16727v1 ) ライセンス: Link先を確認	Biao Xu, Haijun Fu, Shasha Huang, Shihua Ma, Yaoxu Xiong, Jun Zhang, Xuepeng Xiang, Wenyu Lu, Ji-Jung Kai, Shijun Zhao	(参考訳) 格子間拡散(interstitial diffusion)は、非平衡条件下で材料の相安定性と照射応答を支配する中心的な過程である。本研究では, 機械学習 (ML) と速度論的モンテカルロ (kMC) を組み合わせることで, Fe-Ni濃厚固溶体合金 (CSA) における緩慢で化学的に偏った格子間拡散について検討する。ここでMLは、移動エネルギー障壁をオンザフライで正確かつ効率的に予測するために用いる。 ML-kMCは、高温において分子動力学で報告された拡散率を再現する。この強力なツールにより,Fe-Ni合金で観測される緩慢な拡散と"Ni-Ni-Ni"に偏った拡散は独特な"Barrier Lock"機構に起因し,"Fe-Fe-Fe"に偏った拡散は"Component Dominance"機構の影響を受けることがわかった。これらの機構に着想を得て,移動パターンの平均エネルギー障壁のみに依存して格子間原子を介した拡散率を簡便かつ迅速に決定する実用的なAvgS-kMC法を提案する。 AvgS-kMCと差分進化アルゴリズムを組み合わせることで、緩慢な拡散特性を最適化するための逆設計戦略を適用し、有利な移動パターンの重要な役割を明らかにする。 Interstitial diffusion is a pivotal process that governs the phase stability and irradiation response of materials in non-equilibrium conditions. In this work, we study sluggish and chemically-biased interstitial diffusion in Fe-Ni concentrated solid solution alloys (CSAs) by combining machine learning (ML) and kinetic Monte Carlo (kMC), where ML is used to accurately and efficiently predict the migration energy barriers on-the-fly. The ML-kMC reproduces the diffusivity that was reported by molecular dynamics results at high temperatures. With this powerful tool, we find that the observed sluggish diffusion and the "Ni-Ni-Ni"-biased diffusion in Fe-Ni alloys are ascribed to a unique "Barrier Lock" mechanism, whereas the "Fe-Fe-Fe"-biased diffusion is influenced by a "Component Dominance" mechanism. Inspired by the mentioned mechanisms, a practical AvgS-kMC method is proposed for conveniently and swiftly determining interstitial-mediated diffusivity by only relying on the mean energy barriers of migration patterns. Combining the AvgS-kMC with the differential evolutionary algorithm, an inverse design strategy for optimizing sluggish diffusion properties is applied to emphasize the crucial role of favorable migration patterns.	翻訳日:2023-11-30 09:27:04 公開日:2023-11-28
# 非構造スパース回収のための固有行列 Eigenmatrix for unstructured sparse recovery ( http://arxiv.org/abs/2311.16609v1 ) ライセンス: Link先を確認	Lexing Ying	(参考訳) 本稿では,非構造化スパースリカバリ問題を一般に検討する。例えば、有理近似、スペクトル関数推定、フーリエ逆変換、ラプラス逆変換、スパース逆畳みなどである。主な課題は、サンプル値のノイズと、サンプル位置の構造化されていない性質である。本稿では,所望の固有値と固有ベクトルを持つデータ駆動構成である固有行列を提案する。 eigenmatrixは、これらのスパースリカバリ問題に対して、新しい方法を提供する。提案手法の効率性を示すために, 数値計算を行った。 This paper considers the unstructured sparse recovery problems in a general form. Examples include rational approximation, spectral function estimation, Fourier inversion, Laplace inversion, and sparse deconvolution. The main challenges are the noise in the sample values and the unstructured nature of the sample locations. This paper proposes the eigenmatrix, a data-driven construction with desired approximate eigenvalues and eigenvectors. The eigenmatrix offers a new way for these sparse recovery problems. Numerical results are provided to demonstrate the efficiency of the proposed method.	翻訳日:2023-11-30 09:26:36 公開日:2023-11-28
# D4AM:下流音響モデルのための汎用ノイズ除去フレームワーク D4AM: A General Denoising Framework for Downstream Acoustic Models ( http://arxiv.org/abs/2311.16595v1 ) ライセンス: Link先を確認	Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen	(参考訳) 音響モデルの性能は、ノイズの多い環境で顕著に劣化する。音声強調(SE)は、自動音声認識(ASR)システムを支援するフロントエンド戦略として用いることができる。しかし、既存のSE手法の訓練目的は、未知のASRシステムに向けた訓練のために、音声・テキストのペアデータとノイズあり・クリーンのペアデータを統合するうえで十分に効果的ではない。本研究では,様々な下流音響モデルのための汎用ノイズ除去フレームワークD4AMを提案する。本フレームワークは, 特定の音響モデルと対応する分類目的に応じた逆伝播勾配でSEモデルを微調整する。さらに, 本手法は, SEモデルを他の未知の音響モデルに一般化させるため, 回帰目的を補助損失として考慮する。回帰目的と分類目的でSEユニットを共同訓練するために、D4AMは、追加の訓練コストを伴うグリッド探索を行うのではなく、適切な重み付け係数を直接推定する調整スキームを使用する。この調整スキームは、勾配校正と回帰目的の重み付けの2つの部分からなる。実験の結果,D4AMは様々な未知の音響モデルに対して一貫して効果的に改善をもたらし,他の組み合わせ設定よりも優れることがわかった。具体的には、SE訓練中には全く見ていない実際のノイズを含むデータを用いてGoogle ASR APIで評価すると、D4AMはノイズを含む入力を直接与える場合と比較して、24.65%の相対的なWER削減を達成する。我々の知る限り、これは回帰(ノイズ除去)と分類(ASR)の目的の効果的な組み合わせスキームを用いて、様々な未知のASRシステムに適用可能な汎用プリプロセッサを導出する最初の研究である。私たちのコードはhttps://github.com/ChangLee0903/D4AMで利用可能です。 The performance of acoustic models degrades notably in noisy environments. Speech enhancement (SE) can be used as a front-end strategy to aid automatic speech recognition (ASR) systems. However, existing training objectives of SE methods are not fully effective at integrating speech-text and noisy-clean paired data for training toward unseen ASR systems. In this study, we propose a general denoising framework, D4AM, for various downstream acoustic models. Our framework fine-tunes the SE model with the backward gradient according to a specific acoustic model and the corresponding classification objective. In addition, our method aims to consider the regression objective as an auxiliary loss to make the SE model generalize to other unseen acoustic models. To jointly train an SE unit with regression and classification objectives, D4AM uses an adjustment scheme to directly estimate suitable weighting coefficients rather than undergoing a grid search process with additional training costs. The adjustment scheme consists of two parts: gradient calibration and regression objective weighting. The experimental results show that D4AM can consistently and effectively provide improvements to various unseen acoustic models and outperforms other combination setups. Specifically, when evaluated on the Google ASR API with real noisy data completely unseen during SE training, D4AM achieves a relative WER reduction of 24.65% compared with the direct feeding of noisy input. To our knowledge, this is the first work that deploys an effective combination scheme of regression (denoising) and classification (ASR) objectives to derive a general pre-processor applicable to various unseen ASR systems. Our code is available at https://github.com/ChangLee0903/D4AM.	翻訳日:2023-11-30 09:26:31 公開日:2023-11-28
# 6Gネットワークのネットワーク収束と計算のためのフェデレーション学習の通信効率最適化 Communication Efficiency Optimization of Federated Learning for Computing and Network Convergence of 6G Networks ( http://arxiv.org/abs/2311.16540v1 ) ライセンス: Link先を確認	Yizhuo Cai, Bo Lei, Qianying Zhao, Jing Peng, Min Wei, Yushun Zhang, Xing Zhang	(参考訳) フェデレーション学習は、参加するデバイスを横断してグローバルモデルをトレーニングすることによって、データプライバシなどの問題に効果的に対処する。しかしながら、ネットワークトポロジーやデバイスコンピューティングパワーなどの要素は、複雑なネットワーク環境でのトレーニングや通信プロセスに影響を与える可能性がある。 6Gネットワークのコンピューティングとネットワーク収束(CNC)は、フェデレーション学習トレーニングを効果的にサポートし、通信効率を向上させることができる。参加者のデバイストレーニングを、ビジネス要件、リソース負荷、ネットワーク条件、デバイスの演算能力に基づいてフェデレーション学習で導くことで、cncはこの目標を達成できる。本稿では,複雑なネットワークにおけるフェデレート学習の通信効率を向上させるために,ネットワークのネットワーク収束と6Gネットワークにおけるフェデレーション学習の通信効率の最適化,ネットワーク条件の異なるトレーニングプロセスの判断方法,フェデレーション学習に参加する装置の演算能力について検討する。実験では, モデルパラメータの伝達過程において, 通信効率の最適化を図りながら, フェデレーション学習においてデバイスに存在する2つのアーキテクチャに対処し, 演算力に基づくトレーニングに装置を配置する。提案手法は,(1)複雑なネットワーク状況にうまく対処できる,(2)ローカルトレーニングにおける参加機器の遅延分散の効果的バランス(3)モデルパラメータ転送時の通信効率の向上、(4)ネットワーク内のリソース利用率の向上が示されている。 Federated learning effectively addresses issues such as data privacy by collaborating across participating devices to train global models. However, factors such as network topology and device computing power can affect its training or communication process in complex network environments. A new network architecture and paradigm with computing-measurable, perceptible, distributable, dispatchable, and manageable capabilities, computing and network convergence (CNC) of 6G networks can effectively support federated learning training and improve its communication efficiency. By guiding the participating devices' training in federated learning based on business requirements, resource load, network conditions, and arithmetic power of devices, CNC can reach this goal. 
In this paper, to improve the communication efficiency of federated learning in complex networks, we study the communication efficiency optimization of federated learning for computing and network convergence of 6G networks, methods that gives decisions on its training process for different network conditions and arithmetic power of participating devices in federated learning. The experiments address two architectures that exist for devices in federated learning and arrange devices to participate in training based on arithmetic power while achieving optimization of communication efficiency in the process of transferring model parameters. The results show that the method we proposed can (1) cope well with complex network situations (2) effectively balance the delay distribution of participating devices for local training (3) improve the communication efficiency during the transfer of model parameters (4) improve the resource utilization in the network.	翻訳日:2023-11-30 09:26:02 公開日:2023-11-28
# グリオ芽腫浸潤のパーソナライズド予測:数学的モデル、物理インフォームドニューラルネットワーク、マルチモーダルスキャン Personalized Predictions of Glioblastoma Infiltration: Mathematical Models, Physics-Informed Neural Networks and Multimodal Scans ( http://arxiv.org/abs/2311.16536v1 ) ライセンス: Link先を確認	Ray Zirui Zhang, Ivan Ezhov, Michal Balcerak, Andy Zhu, Benedikt Wiestler, Bjoern Menze, John Lowengrub	(参考訳) 医療用MRIスキャンからグリオ芽腫(GBM)の浸潤を予測することは、腫瘍の増殖動態を理解し、個別化された放射線治療計画を設計するうえで不可欠である。GBM成長の数学的モデルは、腫瘍細胞の空間分布の予測においてデータを補うことができる。しかし、そのためには臨床データからモデルの患者固有パラメータを推定する必要があり、これは時間的データが限られ、撮像と診断の間の時間も限られているため、困難な逆問題である。本研究では,単一の3次元構造MRIスナップショットからGBM成長の反応拡散PDEモデルの患者固有パラメータを推定するために、物理インフォームドニューラルネットワーク(PINN)を用いる手法を提案する。 PINNはデータとPDEの両方を損失関数に埋め込むことで、理論とデータを統合する。主な新規性は、特徴的な無次元パラメータの同定と推定、無次元パラメータを利用する事前学習ステップ、および患者固有パラメータを決定する微調整ステップである。さらに、PINNフレームワーク内で複雑な脳の形状を扱うために、diffuse domain法を用いる。本手法は合成データと患者データの両方で検証され, 個別化GBM治療のための臨床現場におけるリアルタイムのパラメトリック推論に有望であることを示す。 Predicting the infiltration of Glioblastoma (GBM) from medical MRI scans is crucial for understanding tumor growth dynamics and designing personalized radiotherapy treatment plans. Mathematical models of GBM growth can complement the data in the prediction of spatial distributions of tumor cells. However, this requires estimating patient-specific parameters of the model from clinical data, which is a challenging inverse problem due to limited temporal data and the limited time between imaging and diagnosis. This work proposes a method that uses Physics-Informed Neural Networks (PINNs) to estimate patient-specific parameters of a reaction-diffusion PDE model of GBM growth from a single 3D structural MRI snapshot. PINNs embed both the data and the PDE into a loss function, thus integrating theory and data. Key innovations include the identification and estimation of characteristic non-dimensional parameters, a pre-training step that utilizes the non-dimensional parameters and a fine-tuning step to determine the patient-specific parameters. Additionally, the diffuse domain method is employed to handle the complex brain geometry within the PINN framework. Our method is validated both on synthetic and patient datasets, and shows promise for real-time parametric inference in the clinical setting for personalized GBM treatment.	翻訳日:2023-11-30 09:25:39 公開日:2023-11-28
# 状態制約付き2プレイヤー一般サム微分ゲームの値近似 Value Approximation for Two-Player General-Sum Differential Games with State Constraints ( http://arxiv.org/abs/2311.16520v1 ) ライセンス: Link先を確認	Lei Zhang, Mukesh Ghimire, Wenlong Zhang, Zhe Xu, Yi Ren	(参考訳) Hamilton-Jacobi-Isaacs (HJI) PDE を解くことで、2人のプレイヤーの差分ゲームにおける平衡フィードバック制御が可能になるが、次元性(CoD)の呪いに直面している。物理インフォームド機械学習は、PDEの解法においてCoDに対処するために採用されているが、この手法はサンプリングの性質から不連続解を学習するには不十分であり、状態やその他の時間的論理的制約により値が不連続であるロボティクスアプリケーションにおいて、結果として生じるコントローラの安全性性能が低下する。本研究では,(1)平衡実証とHJI PDEの両方を用いたハイブリッド学習法,(2)制約違反ペナルティのリプシッツ定数を増大させてHJIの列を解く値硬化法,(3)値が連続となる高次元補助状態空間に値を持ち上げるエピグラフィカル手法,の3つの可能性について検討する。 5Dと9Dの車両シミュレーションと13Dのドローンシミュレーションによる評価は、このハイブリッド手法が一般化と安全性の面で他よりも優れていることを示している。 Solving Hamilton-Jacobi-Isaacs (HJI) PDEs enables equilibrial feedback control in two-player differential games, yet faces the curse of dimensionality (CoD). While physics-informed machine learning has been adopted to address CoD in solving PDEs, this method falls short in learning discontinuous solutions due to its sampling nature, leading to poor safety performance of the resulting controllers in robotics applications where values are discontinuous due to state or other temporal logic constraints. In this study, we explore three potential solutions to this problem: (1) a hybrid learning method that uses both equilibrium demonstrations and the HJI PDE, (2) a value-hardening method where a sequence of HJIs are solved with increasing Lipschitz constant on the constraint violation penalty, and (3) the epigraphical technique that lifts the value to a higher dimensional auxiliary state space where the value becomes continuous. Evaluations through 5D and 9D vehicle simulations and 13D drone simulations reveal that the hybrid method outperforms others in terms of generalization and safety performance.	翻訳日:2023-11-30 09:25:16 公開日:2023-11-28
# b-lstm-mionet:bayesian lstm-based neural operators for learn the response of complex dynamical systems to length-variant multiple input function B-LSTM-MIONet: Bayesian LSTM-based Neural Operators for Learning the Response of Complex Dynamical Systems to Length-Variant Multiple Input Functions ( http://arxiv.org/abs/2311.16519v1 ) ライセンス: Link先を確認	Zhihao Kong and Amirhossein Mollaali and Christian Moya and Na Lu and Guang Lin	(参考訳) Deep Operator Network (DeepONet)は、複雑なシステムを記述する通常の微分方程式(ODE)のような非線形演算子を学習するためのニューラルネットワークフレームワークである。マルチインプットディープニューラル演算子(MIONet)は、異なるバナッハ空間における複数の入力関数を可能にするためにDeepONetを拡張した。 MIONetは、出力位置の制約なしにデータセットグリッド間隔をトレーニングする柔軟性を提供する。しかし、オフライン入力が必要であり、テストデータセットのさまざまなシーケンス長を処理できないため、動的複雑システムにおけるリアルタイムアプリケーションを制限することができる。この作業はMIONetを再設計し、Long Short Term Memory(LSTM)を統合して、時間依存のデータからニューラル演算子を学ぶ。このアプローチはデータの離散化の制約を克服し、LSTMの能力を可変長リアルタイムデータで活用する。アルゴリズム外挿能力などの学習性能に影響する要因を提示する。このフレームワークは、新しいベイズ法による不確実な定量化によって拡張され、MIONetパラメータ分布からサンプリングされる。そこで我々は,B-LSTM-MIONetを開発し,LSTMの時間的強度をベイズ的頑健さと組み合わせることで,ノイズのあるデータセットのより正確で信頼性の高いモデルを構築した。 Deep Operator Network (DeepONet) is a neural network framework for learning nonlinear operators such as those from ordinary differential equations (ODEs) describing complex systems. Multiple-input deep neural operators (MIONet) extended DeepONet to allow multiple input functions in different Banach spaces. MIONet offers flexibility in training dataset grid spacing, without constraints on output location. However, it requires offline inputs and cannot handle varying sequence lengths in testing datasets, limiting its real-time application in dynamic complex systems. This work redesigns MIONet, integrating Long Short Term Memory (LSTM) to learn neural operators from time-dependent data. This approach overcomes data discretization constraints and harnesses LSTM's capability with variable-length, real-time data. Factors affecting learning performance, such as the algorithm's extrapolation ability, are presented. 
The framework is enhanced with uncertainty quantification through a novel Bayesian method, sampling from MIONet parameter distributions. Consequently, we develop the B-LSTM-MIONet, incorporating LSTM's temporal strengths with Bayesian robustness, resulting in a more precise and reliable model for noisy datasets.	翻訳日:2023-11-30 09:24:53 公開日:2023-11-28
# 加速ランジュバンダイナミクスのための近位アルゴリズム Proximal Algorithms for Accelerated Langevin Dynamics ( http://arxiv.org/abs/2311.14829v2 ) ライセンス: Link先を確認	Duy H. Thai, Alexander L. Young, David B. Dunson	(参考訳) 我々は,確率化Nesterovスキームに基づくMCMCアルゴリズムの新たなクラスを開発する。ノイズを適切に加えることで、結果は時間的不均一なアンダーダムングランゲヴィン方程式となり、そこでは特定の目標分布をその不変測度として出力することが証明される。 Wasserstein-2 距離での定常性への収束率も確立されている。提案したランゲヴィン力学のメトロポリス調整および確率勾配版も提供される。実験例では, マルコフ連鎖のより優れた混合を含む, 統計および画像処理の異なるモデルに対して, 典型的なランゲヴィンサンプルよりも優れた性能を示す。 We develop a novel class of MCMC algorithms based on a stochastized Nesterov scheme. With an appropriate addition of noise, the result is a time-inhomogeneous underdamped Langevin equation, which we prove emits a specified target distribution as its invariant measure. Convergence rates to stationarity under Wasserstein-2 distance are established as well. Metropolis-adjusted and stochastic gradient versions of the proposed Langevin dynamics are also provided. Experimental illustrations show superior performance of the proposed method over typical Langevin samplers for different models in statistics and image processing including better mixing of the resulting Markov chains.	翻訳日:2023-11-29 23:22:24 公開日:2023-11-28
# 加速ランジュバンダイナミクスのための近位アルゴリズム Proximal Algorithms for Accelerated Langevin Dynamics ( http://arxiv.org/abs/2311.14829v1 ) ライセンス: Link先を確認	Duy H. Thai, Alexander L. Young, David B. Dunson	(参考訳) 我々は,確率化Nesterovスキームに基づくMCMCアルゴリズムの新たなクラスを開発する。ノイズを適切に加えることで、結果は時間的不均一なアンダーダムングランゲヴィン方程式となり、そこでは特定の目標分布をその不変測度として出力することが証明される。 Wasserstein-2 距離での定常性への収束率も確立されている。提案したランゲヴィン力学のメトロポリス調整および確率勾配版も提供される。実験例では, マルコフ連鎖のより優れた混合を含む, 統計および画像処理の異なるモデルに対して, 典型的なランゲヴィンサンプルよりも優れた性能を示す。 We develop a novel class of MCMC algorithms based on a stochastized Nesterov scheme. With an appropriate addition of noise, the result is a time-inhomogeneous underdamped Langevin equation, which we prove emits a specified target distribution as its invariant measure. Convergence rates to stationarity under Wasserstein-2 distance are established as well. Metropolis-adjusted and stochastic gradient versions of the proposed Langevin dynamics are also provided. Experimental illustrations show superior performance of the proposed method over typical Langevin samplers for different models in statistics and image processing including better mixing of the resulting Markov chains.	翻訳日:2023-11-29 23:22:16 公開日:2023-11-28
# 2次元MRI分割のための不確実性認識AI Uncertainty Aware AI for 2D MRI Segmentation ( http://arxiv.org/abs/2311.14875v2 ) ライセンス: Link先を確認	Lohith Konathala	(参考訳) ディープラーニングの安全性クリティカルな応用には、ロバストな不確実性推定が必要である。医用画像のセマンティックセグメンテーション(セマンティックセグメンテーション)は、深層学習のアプローチは、そのようなタスクにおいて高いパフォーマンスを持つ一方で、分類決定を行う際の信頼感の表れを示さないため、解釈可能性に欠ける。ロバストで解釈可能なセグメンテーションは、病理の自動スクリーニングにおいて重要な第1段階であるため、最適な解は、高い精度を提供すると同時に、基礎となる不確実性も捉えることができるものである。本研究では,ベイズニューラルネットワークとアテンションメカニズムを組み込んだMRIデータを用いて,高精度かつ解釈可能なセグメンテーションを提供する,不確実性を考慮したセグメンテーションモデルBA U-Netを提案する。評価指標として,f1スコアとintersection over union (iou)を用いたbrats 2020データセット上で評価を行った。 Robust uncertainty estimations are necessary in safety-critical applications of Deep Learning. One such example is the semantic segmentation of medical images, whilst deep-learning approaches have high performance in such tasks they lack interpretability as they give no indication of their confidence when making classification decisions. Robust and interpretable segmentation is a critical first stage in automatically screening for pathologies hence the optimal solution is one which can provide high accuracy but also capture the underlying uncertainty. In this work, we present an uncertainty-aware segmentation model, BA U-Net, for use on MRI data that incorporates Bayesian Neural Networks and Attention Mechanisms to provide accurate and interpretable segmentations. We evaluated our model on the publicly available BraTS 2020 dataset using F1 Score and Intersection Over Union (IoU) as evaluation metrics.	翻訳日:2023-11-29 23:10:33 公開日:2023-11-28
# 2次元MRI分割のための不確実性認識AI Uncertainty Aware AI for 2D MRI Segmentation ( http://arxiv.org/abs/2311.14875v1 ) ライセンス: Link先を確認	Lohith Konathala	(参考訳) ディープラーニングの安全性クリティカルな応用には、ロバストな不確実性推定が必要である。医用画像のセマンティックセグメンテーション(セマンティックセグメンテーション)は、深層学習のアプローチは、そのようなタスクにおいて高いパフォーマンスを持つ一方で、分類決定を行う際の信頼感の表れを示さないため、解釈可能性に欠ける。ロバストで解釈可能なセグメンテーションは、病理の自動スクリーニングにおいて重要な第1段階であるため、最適な解は、高い精度を提供すると同時に、基礎となる不確実性も捉えることができるものである。本研究では,ベイズニューラルネットワークとアテンションメカニズムを組み込んだMRIデータを用いて,高精度かつ解釈可能なセグメンテーションを提供する,不確実性を考慮したセグメンテーションモデルBA U-Netを提案する。評価指標として,f1スコアとintersection over union (iou)を用いたbrats 2020データセット上で評価を行った。 Robust uncertainty estimations are necessary in safety-critical applications of Deep Learning. One such example is the semantic segmentation of medical images, whilst deep-learning approaches have high performance in such tasks they lack interpretability as they give no indication of their confidence when making classification decisions. Robust and interpretable segmentation is a critical first stage in automatically screening for pathologies hence the optimal solution is one which can provide high accuracy but also capture the underlying uncertainty. In this work, we present an uncertainty-aware segmentation model, BA U-Net, for use on MRI data that incorporates Bayesian Neural Networks and Attention Mechanisms to provide accurate and interpretable segmentations. We evaluated our model on the publicly available BraTS 2020 dataset using F1 Score and Intersection Over Union (IoU) as evaluation metrics.	翻訳日:2023-11-29 23:10:16 公開日:2023-11-28
# 腎血栓性微小血管障害(TMA)をともなう全スライド画像における診断組織分画の分画 Segmentation of diagnostic tissue compartments on whole slide images with renal thrombotic microangiopathies (TMAs) ( http://arxiv.org/abs/2311.14971v1 ) ライセンス: Link先を確認	Huy Q. Vo, Pietro A. Cicalese, Surya Seshan, Syed A. Rizvi, Aneesh Vathul, Gloria Bueno, Anibal Pedraza Dorado, Niels Grabe, Katharina Stolle, Francesco Pesce, Joris J.T.H. Roelofs, Jesper Kers, Vitoantonio Bevilacqua, Nicola Altini, Bernd Schröppel, Dario Roccatello, Antonella Barreca, Savino Sciascia, Chandra Mohan, Hien V. Nguyen, Jan U. Becker	(参考訳) 腎生検では血栓性微小血管腫 (TMA) が出現し, 急性および慢性の幅広い所見が認められた。 TMAの腎生検診断の正確な診断基準が欠如している。 As a first step towards a machine learning- and computer vision-based analysis of whole slide images from renal biopsies, we trained a segmentation model for the decisive diagnostic kidney tissue compartments (artery, arteriole, and glomerulus) on a set of whole slide images from renal biopsies with TMAs and mimickers (distinct diseases with a nephropathological appearance similar to TMA, such as severe benign nephrosclerosis, various vasculitides, Bevacizumab-plug glomerulopathy, and arteriolar light chain deposition disease). U-Netベースの組織検出とシフトしたwindows-transformerアーキテクチャを組み合わせたセグメンテーションモデルにより, 腎病理検査所の未発見の染色領域においても, 最も大きく変化した糸球体, 動脈, 動脈においても優れたセグメンテーション結果を得ることができた。ヒト腎血管病理における決定的腎生検区画の正確な自動分割により,TMAを用いた腎生検リポジトリの大規模区画別機械学習とコンピュータビジョン解析の基礎を築いた。 The thrombotic microangiopathies (TMAs) manifest in renal biopsy histology with a broad spectrum of acute and chronic findings. Precise diagnostic criteria for a renal biopsy diagnosis of TMA are missing. 
As a first step towards a machine learning- and computer vision-based analysis of whole slide images from renal biopsies, we trained a segmentation model for the decisive diagnostic kidney tissue compartments (artery, arteriole, and glomerulus) on a set of whole slide images from renal biopsies with TMAs and mimickers (distinct diseases with a nephropathological appearance similar to TMA, such as severe benign nephrosclerosis, various vasculitides, Bevacizumab-plug glomerulopathy, and arteriolar light chain deposition disease). Our segmentation model combines a U-Net-based tissue detection with a Shifted windows-transformer architecture to reach excellent segmentation results for even the most severely altered glomeruli, arterioles and arteries, even on unseen staining domains from a different nephropathology lab. With accurate automatic segmentation of the decisive renal biopsy compartments in human renal vasculopathies, we have laid the foundation for large-scale compartment-specific machine learning and computer vision analysis of renal biopsy repositories with TMAs.	翻訳日:2023-11-29 22:47:20 公開日:2023-11-28
# フォトニック導波路アレイにおけるプログラマブル高次元ハミルトニアン Programmable high-dimensional Hamiltonian in a photonic waveguide array ( http://arxiv.org/abs/2311.14951v2 ) ライセンス: Link先を確認	Yang Yang, Robert J. Chapman, Ben Haylock, Francesco Lenzini, Yogesh N. Joglekar, Mirko Lobino, and Alberto Peruzzo	(参考訳) 導波路格子は、量子ウォーク、トポロジカルエフェクト、凝縮物質系のシミュレーション、古典的および量子的情報処理など、様々な用途にコンパクトで安定したプラットフォームを提供する。このような格子において、ハミルトンのホッピングとオンサイト項は、導波路間隔と屈折率プロファイルを用いて設計できる光学的進化を決定する。導波路格子は様々なフォトニックプラットフォームで実現されているが、これらのデバイスは常に静的であり、特定の用途のために設計されている。本稿では,ハミルトニアン項を電気光学的に調整し,単一デバイス上で様々なハミルトニアン連続時間発展を実現するプログラム可能な導波路アレイを提案する。ニオブ酸リチウム中の11個の導波路を22個の電極で制御し,Su-Schrieffer-Heegerモデル,Aubry-Andréモデル,Andersonローカライズを実現する実験を行った。アーキテクチャのマイクロスケール局所電場は導波路結合係数と有効指数を独立に制御し,シリコン,窒化ケイ素,シリカなどの他のプラットフォームにおける熱光学位相シフト器のクロストーク制限を克服する。電気光学制御により、低消費電力で超高速でより正確な再構成が可能となり、量子入力状態でも単一デバイスで複数の凝縮物量子ダイナミクスの研究が可能となる。 Waveguide lattices offer a compact and stable platform for a range of applications, including quantum walks, topological effects, condensed matter system simulation, and classical and quantum information processing. In such lattices, the Hamiltonian's hopping and on-site terms determine the optical evolution, which can be engineered using waveguide spacing and refractive index profile. While waveguide lattices have been realized in various photonic platforms, these devices have always been static and designed for specific applications. We present a programmable waveguide array in which the Hamiltonian terms can be electro-optically tuned to implement various Hamiltonian continuous-time evolutions on a single device. We used a single array with 11 waveguides in lithium niobate, controlled via 22 electrodes, to perform a range of experiments that realized the Su-Schrieffer-Heeger model, the Aubry-André model, and Anderson localization, which is equivalent to over 2500 static devices. 
Our architecture's micron-scale local electric fields independently control waveguide coupling coefficients and effective indices, which overcomes cross-talk limitations of thermo-optic phase shifters in other platforms such as silicon, silicon-nitride, and silica. Electro-optic control allows for ultra-fast and more precise reconfigurability with lower power consumption, and with quantum input states, our platform can enable the study of multiple condensed matter quantum dynamics with a single device.	翻訳日:2023-11-29 22:45:24 公開日:2023-11-28
# フォトニック導波路アレイにおけるプログラマブル高次元ハミルトニアン Programmable high-dimensional Hamiltonian in a photonic waveguide array ( http://arxiv.org/abs/2311.14951v1 ) ライセンス: Link先を確認	Yang Yang, Robert J. Chapman, Ben Haylock, Francesco Lenzini, Yogesh N. Joglekar, Mirko Lobino, and Alberto Peruzzo	(参考訳) 導波路格子は、量子ウォーク、トポロジカルエフェクト、凝縮物質系のシミュレーション、古典的および量子的情報処理など、様々な用途にコンパクトで安定したプラットフォームを提供する。このような格子において、ハミルトンのホッピングとオンサイト項は、導波路間隔と屈折率プロファイルを用いて設計できる光学的進化を決定する。導波路格子は様々なフォトニックプラットフォームで実現されているが、これらのデバイスは常に静的であり、特定の用途のために設計されている。本稿では,ハミルトニアン項を電気光学的に調整し,単一デバイス上で様々なハミルトニアン連続時間発展を実現するプログラム可能な導波路アレイを提案する。ニオブ酸リチウム中の11個の導波路を22個の電極で制御し,Su-Schrieffer-Heegerモデル,Aubry-Andréモデル,Andersonローカライズを実現する実験を行った。アーキテクチャのマイクロスケール局所電場は導波路結合係数と有効指数を独立に制御し,シリコン,窒化ケイ素,シリカなどの他のプラットフォームにおける熱光学位相シフト器のクロストーク制限を克服する。電気光学制御により、低消費電力で超高速でより正確な再構成が可能となり、量子入力状態でも単一デバイスで複数の凝縮物量子ダイナミクスの研究が可能となる。 Waveguide lattices offer a compact and stable platform for a range of applications, including quantum walks, topological effects, condensed matter system simulation, and classical and quantum information processing. In such lattices, the Hamiltonian's hopping and on-site terms determine the optical evolution, which can be engineered using waveguide spacing and refractive index profile. While waveguide lattices have been realized in various photonic platforms, these devices have always been static and designed for specific applications. We present a programmable waveguide array in which the Hamiltonian terms can be electro-optically tuned to implement various Hamiltonian continuous-time evolutions on a single device. We used a single array with 11 waveguides in lithium niobate, controlled via 22 electrodes, to perform a range of experiments that realized the Su-Schrieffer-Heeger model, the Aubry-André model, and Anderson localization, which is equivalent to over 2500 static devices. 
Our architecture's micron-scale local electric fields independently control waveguide coupling coefficients and effective indices, which overcomes cross-talk limitations of thermo-optic phase shifters in other platforms such as silicon, silicon-nitride, and silica. Electro-optic control allows for ultra-fast and more precise reconfigurability with lower power consumption, and with quantum input states, our platform can enable the study of multiple condensed matter quantum dynamics with a single device.	翻訳日:2023-11-29 22:44:57 公開日:2023-11-28
# 腎血栓性微小血管障害(TMA)をともなう全スライド画像における診断組織分画の分画 Segmentation of diagnostic tissue compartments on whole slide images with renal thrombotic microangiopathies (TMAs) ( http://arxiv.org/abs/2311.14971v2 ) ライセンス: Link先を確認	Huy Q. Vo, Pietro A. Cicalese, Surya Seshan, Syed A. Rizvi, Aneesh Vathul, Gloria Bueno, Anibal Pedraza Dorado, Niels Grabe, Katharina Stolle, Francesco Pesce, Joris J.T.H. Roelofs, Jesper Kers, Vitoantonio Bevilacqua, Nicola Altini, Bernd Schröppel, Dario Roccatello, Antonella Barreca, Savino Sciascia, Chandra Mohan, Hien V. Nguyen, Jan U. Becker	(参考訳) 腎生検では血栓性微小血管腫 (TMA) が出現し, 急性および慢性の幅広い所見が認められた。 TMAの腎生検診断の正確な診断基準が欠如している。 As a first step towards a machine learning- and computer vision-based analysis of whole slide images from renal biopsies, we trained a segmentation model for the decisive diagnostic kidney tissue compartments (artery, arteriole, and glomerulus) on a set of whole slide images from renal biopsies with TMAs and mimickers (distinct diseases with a nephropathological appearance similar to TMA, such as severe benign nephrosclerosis, various vasculitides, Bevacizumab-plug glomerulopathy, and arteriolar light chain deposition disease). U-Netベースの組織検出とシフトしたwindows-transformerアーキテクチャを組み合わせたセグメンテーションモデルにより, 腎病理検査所の未発見の染色領域においても, 最も大きく変化した糸球体, 動脈, 動脈においても優れたセグメンテーション結果を得ることができた。ヒト腎血管病理における決定的腎生検区画の正確な自動分割により,TMAを用いた腎生検リポジトリの大規模区画別機械学習とコンピュータビジョン解析の基礎を築いた。 The thrombotic microangiopathies (TMAs) manifest in renal biopsy histology with a broad spectrum of acute and chronic findings. Precise diagnostic criteria for a renal biopsy diagnosis of TMA are missing. 
As a first step towards a machine learning- and computer vision-based analysis of whole slide images from renal biopsies, we trained a segmentation model for the decisive diagnostic kidney tissue compartments (artery, arteriole, and glomerulus) on a set of whole slide images from renal biopsies with TMAs and mimickers (distinct diseases with a nephropathological appearance similar to TMA, such as severe benign nephrosclerosis, various vasculitides, Bevacizumab-plug glomerulopathy, and arteriolar light chain deposition disease). Our segmentation model combines a U-Net-based tissue detection with a Shifted windows-transformer architecture to reach excellent segmentation results for even the most severely altered glomeruli, arterioles and arteries, even on unseen staining domains from a different nephropathology lab. With accurate automatic segmentation of the decisive renal biopsy compartments in human renal vasculopathies, we have laid the foundation for large-scale compartment-specific machine learning and computer vision analysis of renal biopsy repositories with TMAs.	翻訳日:2023-11-29 22:30:02 公開日:2023-11-28
# 実験間のリプレイ:オフポリシーrlの自然な拡張 Replay across Experiments: A Natural Extension of Off-Policy RL ( http://arxiv.org/abs/2311.15951v2 ) ライセンス: Link先を確認	Dhruva Tirumala, Thomas Lampe, Jose Enrique Chen, Tuomas Haarnoja, Sandy Huang, Guy Lever, Ben Moran, Tim Hertweck, Leonard Hasenclever, Martin Riedmiller, Nicolas Heess and Markus Wulfmeier	(参考訳) データの再生は、オフポリティクス強化学習(RL)の安定性とデータ効率の基盤となる主要なメカニズムである。複数の実験にまたがってリプレイを効果的に拡張し、RLワークフローを最小限に適用し、コントローラの性能と研究のイテレーション時間を大幅に改善する。中心となるのがreplay across experiments(rae)で、以前の実験からの経験を再利用して、探索とブートストラップ学習を改善し、必要な変更を最小限に抑える。我々は経験的に、多くのrlアルゴリズムと、自発的なビジョンからの厳しい探索タスクを含む、ロコモーションと操作の両方にまたがる困難な制御ドメインにまたがる利点を示す。包括的アブレーションにより、利用可能なデータの品質と量、および様々なハイパーパラメータの選択に対するロバスト性を示す。最後に,このアプローチを研究ライフサイクル全体にわたってより広く適用し,ランダムシードやハイパーパラメータの変動に対してデータを再ロードすることでレジリエンスを向上させる方法について論じる。 Replaying data is a principal mechanism underlying the stability and data efficiency of off-policy reinforcement learning (RL). We present an effective yet simple framework to extend the use of replays across multiple experiments, minimally adapting the RL workflow for sizeable improvements in controller performance and research iteration times. At its core, Replay Across Experiments (RaE) involves reusing experience from previous experiments to improve exploration and bootstrap learning while reducing required changes to a minimum in comparison to prior work. We empirically show benefits across a number of RL algorithms and challenging control domains spanning both locomotion and manipulation, including hard exploration tasks from egocentric vision. Through comprehensive ablations, we demonstrate robustness to the quality and amount of data available and various hyperparameter choices. Finally, we discuss how our approach can be applied more broadly across research life cycles and can increase resilience by reloading data across random seeds or hyperparameter variations.	翻訳日:2023-11-29 21:57:55 公開日:2023-11-28
# 生物デザインツールの責任あるガバナンスに向けて Towards Responsible Governance of Biological Design Tools ( http://arxiv.org/abs/2311.15936v2 ) ライセンス: Link先を確認	Richard Moulange, Max Langenkamp, Tessa Alexanian, Samuel Curtis, Morgan Livingston	(参考訳) 生成機械学習の最近の進歩は、タンパク質構造やシーケンス予測モデルなどの生物設計ツール(BDT)の急速な進歩を可能にしている。前例のないBDTの予測精度と新規設計能力は、新しい重要な二重利用リスクをもたらす。例えば、それらの予測精度は、ワクチンや病原体などの生物学的エージェントをより迅速に開発することを可能にし、その設計能力は薬物の発見やDNAスクリーニングの回避に利用できる。他のデュアルユースAIシステムと同様、BDTも悪質な問題を抱えている。我々は、大規模な言語モデルに主に適合する現在の規制提案が、トレーニングする計算リソースを少なくし、しばしばオープンソースで開発されるBDTにとって、いかに効果が低いかを強調した。我々は、bdtが誤用されるリスクを軽減し、責任ある開発、リスクアセスメント、透明性、アクセス管理、サイバーセキュリティ、レジリエンス投資の分野にまたがる幅広い対策を提案する。このような措置を実施するには、開発者と政府間の緊密な調整が必要である。 Recent advancements in generative machine learning have enabled rapid progress in biological design tools (BDTs) such as protein structure and sequence prediction models. The unprecedented predictive accuracy and novel design capabilities of BDTs present new and significant dual-use risks. For example, their predictive accuracy allows biological agents, whether vaccines or pathogens, to be developed more quickly, while the design capabilities could be used to discover drugs or evade DNA screening techniques. Similar to other dual-use AI systems, BDTs present a wicked problem: how can regulators uphold public safety without stifling innovation? We highlight how current regulatory proposals that are primarily tailored toward large language models may be less effective for BDTs, which require fewer computational resources to train and are often developed in an open-source manner. We propose a range of measures to mitigate the risk that BDTs are misused, across the areas of responsible development, risk assessment, transparency, access management, cybersecurity, and investing in resilience. Implementing such measures will require close coordination between developers and governments.	翻訳日:2023-11-29 21:57:35 公開日:2023-11-28
# A-JEPA: 統合組み込み予測アーキテクチャ A-JEPA: Joint-Embedding Predictive Architecture Can Listen ( http://arxiv.org/abs/2311.15830v2 ) ライセンス: Link先を確認	Zhengcong Fei, Mingyuan Fan, Junshi Huang	(参考訳) 本稿では,大規模視覚モデルの成功を駆動するマスク・モデリングの原理を,潜時空間での予測により効果的に適用できることを示す。本稿では,音声スペクトルから自己教師付き学習を行うシンプルな拡張手法であるA-JEPAを提案する。 I-JEPAの設計に続いて、我々のA-JEPAは、コンテキストエンコーダによるカリキュラムマスキング戦略で可視音声スペクトログラムパッチを符号化し、よく設計された場所でサンプリングされた領域の表現を予測する。これらの領域のターゲット表現は、スペクトル全体について、文脈エンコーダの指数的移動平均、すなわち目標エンコーダによって抽出される。音声スペクトログラムの局所時間と周波数に高度に相関する複雑さを考慮して,ランダムブロックマスキングを時間周波数対応マスキングにカリキュラム的に移行することは有益である。文脈意味理解とロバスト性を高めるため、入力ドロップやゼロではなく、ターゲットデータセットに正規化マスキングを施したエンコーダを微調整する。経験的に、Vision Transformers構造で構築すると、A-JEPAは高度にスケーラブルであり、複数のオーディオおよび音声分類タスクで新しい最先端のパフォーマンスを設定できる。 This paper shows that the masked-modeling principle driving the success of large foundational vision models can be effectively applied to audio by making predictions in a latent space. We introduce Audio-based Joint-Embedding Predictive Architecture (A-JEPA), a simple extension method for self-supervised learning from the audio spectrum. Following the design of I-JEPA, our A-JEPA encodes visible audio spectrogram patches with a curriculum masking strategy via a context encoder, and predicts the representations of regions sampled at well-designed locations. The target representations of those regions are extracted by the exponential moving average of the context encoder, i.e., the target encoder, on the whole spectrogram. We find it beneficial to transfer random block masking into time-frequency aware masking in a curriculum manner, given that audio spectrograms are highly correlated in local time and frequency. To enhance contextual semantic understanding and robustness, we fine-tune the encoder with a regularized masking on target datasets, instead of dropping or zeroing the input. 
Empirically, when built with the Vision Transformer structure, we find A-JEPA to be highly scalable and to set new state-of-the-art performance on multiple audio and speech classification tasks, outperforming other recent models that use externally supervised pre-training.	翻訳日:2023-11-29 21:57:20 公開日:2023-11-28
# 情報源開示がAI生成メッセージの評価に及ぼす影響:2部研究 The effect of source disclosure on evaluation of AI-generated messages: A two-part study ( http://arxiv.org/abs/2311.15544v2 ) ライセンス: Link先を確認	Sue Lim, Ralf Schmälzle	(参考訳) 過去10年間の人工知能(ai)の進歩は、機械がコミュニケーション行動を示し、人間の思考、感覚、行動に影響を及ぼすことを証明している。実際、ChatGPTの最近の開発により、大規模言語モデル(LLM)が、大規模およびドメイン間の高品質なコミュニケーションコンテンツを生成するために活用できることが示され、実際はますます使われるようになる。しかし、メッセージの発信元を知ることが、人間が生成したメッセージと比較して、受信者のAI生成メッセージの評価と嗜好にどのように影響するかについては、多くの疑問が残る。本稿では,この話題を電子タバコ防止メッセージングの文脈で検討した。事前登録された研究1では、ソース開示がaiによる健康予防メッセージの評価に及ぼす影響について、人間生成メッセージと比較して検討した。ソースの開示(つまり、メッセージのソースをaiと人間にラベル付けする)は、メッセージの評価に大きな影響を与えたが、メッセージのランク付けには大きな影響を与えなかった。研究2では,被験者のAIに対する否定的態度によって,情報源開示の影響がどう変化するかを検討した。我々は,AIに対するネガティブな態度がメッセージ評価に悪影響を及ぼすことを発見したが,メッセージ選択には影響しなかった。しかし、AIに対する否定的な態度が適度である場合、ソース開示はAI生成メッセージの嗜好を減らした。全体として、この一連の研究の結果は、ソースが開示されるとAIが生成するメッセージに対してわずかに偏りを示し、AIとコミュニケーションの交差点にある新たな研究領域が加わった。 Advancements in artificial intelligence (AI) over the last decade demonstrate that machines can exhibit communicative behavior and influence how humans think, feel, and behave. In fact, the recent development of ChatGPT has shown that large language models (LLMs) can be leveraged to generate high-quality communication content at scale and across domains, suggesting that they will be increasingly used in practice. However, many questions remain about how knowing the source of the messages influences recipients' evaluation of and preference for AI-generated messages compared to human-generated messages. This paper investigated this topic in the context of vaping prevention messaging. In Study 1, which was pre-registered, we examined the influence of source disclosure on people's evaluation of AI-generated health prevention messages compared to human-generated messages. We found that source disclosure (i.e., labeling the source of a message as AI vs. human) significantly impacted the evaluation of the messages but did not significantly alter message rankings. 
In a follow-up study (Study 2), we examined how the influence of source disclosure may vary by the participants' negative attitudes towards AI. We found a significant moderating effect of negative attitudes towards AI on message evaluation, but not for message selection. However, for those with moderate levels of negative attitudes towards AI, source disclosure decreased the preference for AI-generated messages. Overall, the results of this series of studies showed a slight bias against AI-generated messages once the source was disclosed, adding to the emerging area of study that lies at the intersection of AI and communication.	翻訳日:2023-11-29 21:56:55 公開日:2023-11-28
# 分散検出のためのIDライクなプロンプト学習 ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection ( http://arxiv.org/abs/2311.15243v2 ) ライセンス: Link先を確認	Yichen Bai, Zongbo Han, Changqing Zhang, Bing Cao, Xiaoheng Jiang, Qinghua Hu	(参考訳) アウト・オブ・ディストリビューション(OOD)検出法は、OODサンプルを識別するモデルをトレーニングするために補助的なアウトレイアを利用することが多い。しかし、これらのサンプルは、ID(In-distribution)データに近い最も困難なOODサンプル、すなわちIDライクなサンプルを効果的に区別する際の制限に直面している。そこで本研究では,IDサンプルの近傍空間からCLIPを用いて,ID類似の異常値を検出する新しいOOD検出フレームワークを提案する。次に、識別されたIDライクな外れ値を利用して、OOD検出のためのCLIPの機能をさらに活用する即時学習フレームワークを提案する。強力なCLIPから恩恵を受けるため、補助的な外れ値データセットを公開せずにモデルのプロンプトを学習するためには、少数のIDサンプルが必要である。最も難しいidライクなoodサンプルに着目し,クリップの能力をエレガントに活用することにより,実世界の様々な画像データセットにおいて,優れた少数ショット学習性能を実現する(例えば,imagenet-1kデータセットにおける4ショットood検出では,平均fpr95を12.16%削減し,平均aurocを2.76%改善した)。 Out-of-distribution (OOD) detection methods often exploit auxiliary outliers to train models to identify OOD samples, in particular by discovering challenging outliers in an auxiliary outlier dataset to improve OOD detection. However, they may still face limitations in effectively distinguishing between the most challenging OOD samples that are much like in-distribution (ID) data, i.e., ID-like samples. To this end, we propose a novel OOD detection framework that discovers ID-like outliers using CLIP from the vicinity space of the ID samples, thus helping to identify these most challenging OOD samples. Then a prompt learning framework is proposed that utilizes the identified ID-like outliers to further leverage the capabilities of CLIP for OOD detection. Benefiting from the powerful CLIP, we only need a small number of ID samples to learn the prompts of the model without exposing other auxiliary outlier datasets. 
By focusing on the most challenging ID-like OOD samples and elegantly exploiting the capabilities of CLIP, our method achieves superior few-shot learning performance on various real-world image datasets (e.g., in 4-shot OOD detection on the ImageNet-1k dataset, our method reduces the average FPR95 by 12.16% and improves the average AUROC by 2.76%, compared to state-of-the-art methods).	翻訳日:2023-11-29 21:56:29 公開日:2023-11-28
# SAMv2: 外観、意味、横断的な解剖学的埋め込みを学習するための統一フレームワーク SAMv2: A Unified Framework for Learning Appearance, Semantic and Cross-Modality Anatomical Embeddings ( http://arxiv.org/abs/2311.15111v2 ) ライセンス: Link先を確認	Xiaoyu Bai, Fan Bai, Xiaofei Huo, Jia Ge, Jingjing Lu, Xianghua Ye, Ke Yan, and Yong Xia	(参考訳) 医用画像における解剖学的構造(病変やランドマークなど)の同定は、医用画像解析において基本的な役割を果たす。自己監督型解剖学的eMbedding(SAM)は,画像中の各ボクセルに対する識別的埋め込みを学習し,様々なタスクにおいて有望な結果を示した。しかし、SAMは、(1)類似した外観を持つボクセルを区別するが、異なる意味を持つ(\textit{e.}, 明確な境界を持たない2つの隣接する構造)、(2)類似した意味を持つが顕著に異なる外観を持つボクセル(例えば、コントラスト注入前後の同じ容器)、(3)異質なマッチング(例えば、CT-MRI登録)、といった課題に直面している。これらの課題を克服するため, SAMv2 は外観, 意味, 異質な解剖学的埋め込みを学習するための統一的なフレームワークである。具体的には,(1) 意味的埋め込み学習と,(2) 固定点に基づくマッチング戦略,(3) クロスモーダル埋め込み学習の反復的アプローチの3つの革新を取り入れた。今回我々はSAMv2を3つのタスクに網羅的に評価した。1ショットのランドマーク検出、縦型CTスキャンの病変追跡、視野の異なるCT-MRIアフィン/リグイド登録などである。その結果,SAMv2はSAMや他の最先端手法よりも優れており,ランドマークに基づく医用画像解析タスクに対して,堅牢で多用途なアプローチが提案されている。コードとトレーニングされたモデルは、https://github.com/alibaba-damo-academy/self-supervised-anatomical-embedding-v2で利用可能だ。 Identifying anatomical structures (e.g., lesions or landmarks) in medical images plays a fundamental role in medical image analysis. As an exemplar-based landmark detection method, Self-supervised Anatomical eMbedding (SAM) learns a discriminative embedding for each voxel in the image and has shown promising results on various tasks. However, SAM still faces challenges in: (1) differentiating voxels with similar appearance but different semantic meanings (\textit{e.g.}, two adjacent structures without clear borders); (2) matching voxels with similar semantics but markedly different appearance (e.g., the same vessel before and after contrast injection); and (3) cross-modality matching (e.g., CT-MRI registration). To overcome these challenges, we propose SAMv2, which is a unified framework designed to learn appearance, semantic, and cross-modality anatomical embeddings. 
Specifically, SAMv2 incorporates three key innovations: (1) semantic embedding learning with prototypical contrastive loss; (2) a fixed-point-based matching strategy; and (3) an iterative approach for cross-modality embedding learning. We thoroughly evaluated SAMv2 across three tasks, including one-shot landmark detection, lesion tracking on longitudinal CT scans, and CT-MRI affine/rigid registration with varying field of view. Our results suggest that SAMv2 outperforms SAM and other state-of-the-art methods, offering a robust and versatile approach for landmark-based medical image analysis tasks. Code and trained models are available at: https://github.com/alibaba-damo-academy/self-supervised-anatomical-embedding-v2	Translated: 2023-11-29 21:54:45 Published: 2023-11-28
# SAMv2: A Unified Framework for Learning Appearance, Semantic and Cross-Modality Anatomical Embeddings ( http://arxiv.org/abs/2311.15111v1 ) License: see link	Xiaoyu Bai, Fan Bai, Xiaofei Huo, Jia Ge, Jingjing Lu, Xianghua Ye, Ke Yan, and Yong Xia	Identifying anatomical structures (e.g., lesions or landmarks) in medical images plays a fundamental role in medical image analysis. As an exemplar-based landmark detection method, Self-supervised Anatomical eMbedding (SAM) learns a discriminative embedding for each voxel in the image and has shown promising results on various tasks. However, SAM still faces challenges in: (1) differentiating voxels with similar appearance but different semantic meanings (e.g., two adjacent structures without clear borders); (2) matching voxels with similar semantics but markedly different appearance (e.g., the same vessel before and after contrast injection); and (3) cross-modality matching (e.g., CT-MRI registration). To overcome these challenges, we propose SAMv2, which is a unified framework designed to learn appearance, semantic, and cross-modality anatomical embeddings. 
Specifically, SAMv2 incorporates three key innovations: (1) semantic embedding learning with prototypical contrastive loss; (2) a fixed-point-based matching strategy; and (3) an iterative approach for cross-modality embedding learning. We thoroughly evaluated SAMv2 across three tasks, including one-shot landmark detection, lesion tracking on longitudinal CT scans, and CT-MRI affine/rigid registration with varying field of view. Our results suggest that SAMv2 outperforms SAM and other state-of-the-art methods, offering a robust and versatile approach for landmark-based medical image analysis tasks. Code and trained models are available at: https://github.com/alibaba-damo-academy/self-supervised-anatomical-embedding-v2	Translated: 2023-11-29 21:54:07 Published: 2023-11-28
# Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models ( http://arxiv.org/abs/2311.16103v2 ) License: see link	Munan Ning and Bin Zhu and Yujia Xie and Bin Lin and Jiaxi Cui and Lu Yuan and Dongdong Chen and Li Yuan	Video-based large language models (Video-LLMs) have been recently introduced, targeting both fundamental improvements in perception and comprehension, and a diverse range of user inquiries. In pursuit of the ultimate goal of achieving artificial general intelligence, a truly intelligent Video-LLM should not only see and understand the surroundings, but also possess human-level commonsense, and make well-informed decisions for the users. To guide the development of such a model, the establishment of a robust and comprehensive evaluation system becomes crucial. To this end, this paper proposes Video-Bench, a new comprehensive benchmark along with a toolkit specifically designed for evaluating Video-LLMs. The benchmark comprises 10 meticulously crafted tasks, evaluating the capabilities of Video-LLMs across three distinct levels: Video-exclusive Understanding, Prior Knowledge-based Question-Answering, and Comprehension and Decision-making. 
In addition, we introduce an automatic toolkit tailored to process model outputs for various tasks, facilitating the calculation of metrics and generating convenient final scores. We evaluate 8 representative Video-LLMs using Video-Bench. The findings reveal that current Video-LLMs still fall considerably short of achieving human-like comprehension and analysis of real-world videos, offering valuable insights for future research directions. The benchmark and toolkit are available at: https://github.com/PKU-YuanGroup/Video-Bench.	Translated: 2023-11-29 21:40:02 Published: 2023-11-28
# Cavity optomechanics in ultrastrong light matter coupling regime: Self-alignment and collective rotation mediated by Casimir torque ( http://arxiv.org/abs/2311.15969v2 ) License: see link	Denis Ilin, I. V. Tokatly, and Ivan Iorsh	We theoretically consider an ensemble of quantum dimers placed inside an optical cavity. We predict two effects. First, an exchange of angular momentum between the dimers, mediated by the emission and re-absorption of cavity photons, leads to the alignment of the dimers. Second, the optical angular momentum of the vacuum state of the chiral cavity is transferred to the ensemble of dimers, which leads to synchronous rotation of the dimers at certain levels of light-matter coupling strength.	Translated: 2023-11-29 21:39:22 Published: 2023-11-28
# On the quantum time complexity of divide and conquer ( http://arxiv.org/abs/2311.16401v1 ) License: see link	Jonathan Allcock, Jinge Bao, Aleksandrs Belovs, Troy Lee, Miklos Santha	We initiate a systematic study of the time complexity of quantum divide and conquer algorithms for classical problems. We establish generic conditions under which search and minimization problems with classical divide and conquer algorithms are amenable to quantum speedup and apply these theorems to an array of problems involving strings, integers, and geometric objects. They include LONGEST DISTINCT SUBSTRING, KLEE'S COVERAGE, several optimization problems on stock transactions, and k-INCREASING SUBSEQUENCE. For most of these results, our quantum time upper bound matches the quantum query lower bound for the problem, up to polylogarithmic factors.	Translated: 2023-11-29 20:52:45 Published: 2023-11-28
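As a purely classical point of reference for the kind of problem the abstract mentions, here is a minimal divide-and-conquer sketch for one stock-transaction question: the best profit from a single buy followed by a single sell. The problem choice and the function name `max_profit` are assumptions made for illustration; the abstract does not say which stock problems the paper treats, and nothing below reproduces its quantum algorithms.

```python
def max_profit(prices):
    """Best profit from one buy followed by one later sell (0 if no gain is possible)."""

    def solve(lo, hi):
        # Returns (best profit, minimum price, maximum price) on prices[lo:hi].
        if hi - lo == 1:
            return 0, prices[lo], prices[lo]
        mid = (lo + hi) // 2
        left_profit, left_min, left_max = solve(lo, mid)
        right_profit, right_min, right_max = solve(mid, hi)
        # Combine step: buy in the left half, sell in the right half.
        cross = right_max - left_min
        best = max(left_profit, right_profit, cross)
        return best, min(left_min, right_min), max(left_max, right_max)

    if not prices:
        return 0
    return solve(0, len(prices))[0]


print(max_profit([7, 1, 5, 3, 6, 4]))  # 5 (buy at 1, sell at 6)
```

The recursion splits the price list in half, solves each half, and combines the two answers with one extra candidate that crosses the midpoint.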
# Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding Weaknesses ( http://arxiv.org/abs/2311.16396v1 ) License: see link	Wachiraphan Charoenwet, Patanamon Thongtanunam, Van-Thuan Pham, Christoph Treude	Identifying security issues early is encouraged to reduce the latent negative impacts on software systems. Code review is a widely used method that allows developers to manually inspect modified code, catching security issues during a software development cycle. However, existing code review studies often focus on known vulnerabilities, neglecting coding weaknesses, which can introduce real-world security issues that are more visible through code review. The practices of code reviews in identifying such coding weaknesses are not yet fully investigated. To better understand this, we conducted an empirical case study in two large open-source projects, OpenSSL and PHP. Based on 135,560 code review comments, we found that reviewers raised security concerns in 35 out of 40 coding weakness categories. 
Surprisingly, some coding weaknesses related to past vulnerabilities, such as memory errors and resource management, were discussed less often than the vulnerabilities. Developers attempted to address raised security concerns in many cases (39%-41%), but a substantial portion was merely acknowledged (30%-36%), and some went unfixed due to disagreements about solutions (18%-20%). This highlights that coding weaknesses can slip through code review even when identified. Our findings suggest that reviewers can identify various coding weaknesses leading to security issues during code reviews. However, these results also reveal shortcomings in current code review practices, indicating the need for more effective mechanisms or support for increasing awareness of security issue management in code reviews.	Translated: 2023-11-29 20:52:33 Published: 2023-11-28
# Multi-defender Security Games with Schedules ( http://arxiv.org/abs/2311.16392v1 ) License: see link	Zimeng Song, Chun Kai Ling, Fei Fang	Stackelberg Security Games are often used to model strategic interactions in high-stakes security settings. The majority of existing models focus on single-defender settings where a single entity assumes command of all security assets. However, many realistic scenarios feature multiple heterogeneous defenders with their own interests and priorities embedded in a more complex system. Furthermore, defenders rarely choose targets to protect. Instead, they have a multitude of defensive resources or schedules at their disposal, each with different protective capabilities. In this paper, we study security games featuring multiple defenders and schedules simultaneously. We show that unlike prior work on multi-defender security games, the introduction of schedules can cause non-existence of equilibrium even under rather restricted environments. We prove that under the mild restriction that any subset of a schedule is also a schedule, an equilibrium not only exists but can also be computed in polynomial time in games with two defenders. 
Under additional assumptions, our algorithm can be extended to games with more than two defenders and its computation scaled up in special classes of games with compactly represented schedules such as those used in patrolling applications. Experimental results suggest that our methods scale gracefully with game size, making our algorithms amongst the few that can tackle multiple heterogeneous defenders.	Translated: 2023-11-29 20:52:05 Published: 2023-11-28
# A formula for the overlap between Generalized Coherent States of any rank one simple Lie algebra ( http://arxiv.org/abs/2311.16385v1 ) License: see link	Nicola Pranzini	We provide a formula for computing the overlap between two Generalized Coherent States of any rank one simple Lie algebra. Then, we apply our formula to spin coherent states (i.e., the $\mathfrak{su}(2)$ algebra), pseudo-spin coherent states (i.e., the $\mathfrak{su}(1,1)$ algebra), and the $\mathfrak{sl}(2,\mathbb{R})$ subalgebras of Virasoro. In all these examples, we show the emergence of a semi-classical behaviour from the set of coherent states and verify that it always happens when some parameter, depending on the algebra and its representation, becomes large.	Translated: 2023-11-29 20:51:45 Published: 2023-11-28
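For orientation, the two best-known special cases can be written out explicitly. What follows is the standard textbook (Perelomov-type) form in one common parametrization, not the paper's general formula; the complex labels $z_1, z_2$, the spin $j$ and the Bargmann index $k$ are notation assumed here.

```latex
% su(2), spin j:  |z> = (1+|z|^2)^{-j} exp(z J_+) |j,-j>
\langle z_1 | z_2 \rangle
  = \frac{(1+\bar z_1 z_2)^{2j}}{(1+|z_1|^2)^{j}\,(1+|z_2|^2)^{j}}

% su(1,1), discrete series with index k:  |z> = (1-|z|^2)^{k} exp(z K_+) |k,0>,  |z| < 1
\langle z_1 | z_2 \rangle
  = \frac{(1-|z_1|^2)^{k}\,(1-|z_2|^2)^{k}}{(1-\bar z_1 z_2)^{2k}}

% Large-parameter behaviour in the su(2) case:
|\langle z_1 | z_2 \rangle|^2
  = \left[ \frac{|1+\bar z_1 z_2|^2}{(1+|z_1|^2)(1+|z_2|^2)} \right]^{2j}
  \xrightarrow{\; j \to \infty \;} 0 \quad \text{for } z_1 \neq z_2
```

In the $\mathfrak{su}(2)$ case the bracket is at most 1, with equality only for $z_1 = z_2$, so distinct coherent states become orthogonal as $j$ grows. This is the kind of semi-classical behaviour the abstract refers to.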
# Deep Learning for Time Series Classification of Parkinson's Disease Eye Tracking Data ( http://arxiv.org/abs/2311.16381v1 ) License: see link	Gonzalo Uribarri, Simon Ekman von Huth, Josefine Waldthaler, Per Svenningsson, Erik Fransén	Eye-tracking is an accessible and non-invasive technology that provides information about a subject's motor and cognitive abilities. As such, it has proven to be a valuable resource in the study of neurodegenerative diseases such as Parkinson's disease. Saccade experiments, in particular, have proven useful in the diagnosis and staging of Parkinson's disease. However, to date, no single eye-movement biomarker has been found to conclusively differentiate patients from healthy controls. In the present work, we investigate the use of state-of-the-art deep learning algorithms to perform Parkinson's disease classification using eye-tracking data from saccade experiments. In contrast to previous work, instead of using hand-crafted features from the saccades, we use raw $\sim1.5\,s$ long fixation intervals recorded during the preparatory phase before each trial. Using these short time series as input we implement two different classification models, InceptionTime and ROCKET. We find that the models are able to learn the classification task and generalize to unseen subjects. 
InceptionTime achieves $78\%$ accuracy, while ROCKET achieves $88\%$ accuracy. We also employ a novel method for pruning the ROCKET model to improve interpretability and generalizability, achieving an accuracy of $96\%$. Our results suggest that fixation data has low inter-subject variability and potentially carries useful information about brain cognitive and motor conditions, making it suitable for use with machine learning in the discovery of disease-relevant biomarkers.	Translated: 2023-11-29 20:51:28 Published: 2023-11-28
# Typhoon Intensity Prediction with Vision Transformer ( http://arxiv.org/abs/2311.16450v1 ) License: see link	Huanxin Chen, Pengshuai Yin, Huichou Huang, Qingyao Wu, Ruirui Liu and Xiatian Zhu	Predicting typhoon intensity accurately across space and time is crucial for issuing timely disaster warnings and facilitating emergency response. This has vast potential for minimizing life losses and property damages as well as reducing economic and environmental impacts. Leveraging satellite imagery for scenario analysis is effective but also introduces additional challenges due to the complex relations among clouds and the highly dynamic context. Existing deep learning methods in this domain rely on convolutional neural networks (CNNs), which suffer from limited per-layer receptive fields. This limitation hinders their ability to capture long-range dependencies and global contextual knowledge during inference. In response, we introduce a novel approach, namely "Typhoon Intensity Transformer" (Tint), which leverages self-attention mechanisms with global receptive fields per layer. Tint adopts a sequence-to-sequence feature representation learning perspective. 
It begins by cutting a given satellite image into a sequence of patches and recursively employs self-attention operations to extract both local and global contextual relations between all patch pairs simultaneously, thereby enhancing per-patch feature representation learning. Extensive experiments on a publicly available typhoon benchmark validate the efficacy of Tint in comparison with both state-of-the-art deep learning and conventional meteorological methods. Our code is available at https://github.com/chen-huanxin/Tint.	Translated: 2023-11-29 20:41:37 Published: 2023-11-28
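The first step described above, cutting an image into a sequence of patches, can be sketched in a few lines. This is a generic Vision-Transformer-style patch split under assumed conventions (row-major patch order, square non-overlapping patches, a single-channel image stored as nested lists); the function name `image_to_patch_sequence` is hypothetical and is not taken from the Tint code base.

```python
def image_to_patch_sequence(image, patch_size):
    """Split an H x W image (list of rows) into a row-major list of flattened patches."""
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single token vector.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


# A 4 x 4 toy "image" whose pixel values are 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
sequence = image_to_patch_sequence(image, 2)
print(len(sequence))  # 4
print(sequence[0])    # [0, 1, 4, 5]
```

Each flattened patch would then serve as one token, and self-attention relates every token to every other token within a single layer.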
# TopoSemiSeg: Enforcing Topological Consistency for Semi-Supervised Segmentation of Histopathology Images ( http://arxiv.org/abs/2311.16447v1 ) License: see link	Meilong Xu, Xiaoling Hu, Saumya Gupta, Shahira Abousamra, Chao Chen	In computational pathology, segmenting densely distributed objects like glands and nuclei is crucial for downstream analysis. To alleviate the burden of obtaining pixel-wise annotations, semi-supervised learning methods learn from large amounts of unlabeled data. Nevertheless, existing semi-supervised methods overlook the topological information hidden in the unlabeled images and are thus prone to topological errors, e.g., missing or incorrectly merged/separated glands or nuclei. To address this issue, we propose TopoSemiSeg, the first semi-supervised method that learns the topological representation from unlabeled data. In particular, we propose a topology-aware teacher-student approach in which the teacher and student networks learn shared topological representations. To achieve this, we introduce topological consistency loss, which contains signal consistency and noise removal losses to ensure the learned representation is robust and focuses on true topological signals. Extensive experiments on public pathology image datasets show the superiority of our method, especially on topology-wise evaluation metrics. Code is available at https://github.com/Melon-Xu/TopoSemiSeg.	Translated: 2023-11-29 20:41:14 Published: 2023-11-28
# Centre Stage: Centricity-based Audio-Visual Temporal Action Detection ( http://arxiv.org/abs/2311.16446v1 ) License: see link	Hanyuan Wang, Majid Mirmehdi, Dima Damen, Toby Perrett	Previous one-stage action detection approaches have modelled temporal dependencies using only the visual modality. In this paper, we explore different strategies to incorporate the audio modality, using multi-scale cross-attention to fuse the two modalities. We also demonstrate the correlation between the distance from the timestep to the action centre and the accuracy of the predicted boundaries. Thus, we propose a novel network head to estimate the closeness of timesteps to the action centre, which we call the centricity score. This leads to increased confidence for proposals that exhibit more precise boundaries. Our method can be integrated with other one-stage anchor-free architectures and we demonstrate this on three recent baselines on the EPIC-Kitchens-100 action detection benchmark where we achieve state-of-the-art performance. Detailed ablation studies showcase the benefits of fusing audio and our proposed centricity scores. Code and models for our proposed method are publicly available at https://github.com/hanielwang/Audio-Visual-TAD.git	Translated: 2023-11-29 20:40:49 Published: 2023-11-28
# CLAP: Contrastive Learning with Augmented Prompts for Robustness on Pretrained Vision-Language Models ( http://arxiv.org/abs/2311.16445v1 ) License: see link	Yichao Cai, Yuhang Liu, Zhen Zhang, Javen Qinfeng Shi	Contrastive vision-language models, e.g., CLIP, have garnered substantial attention for their exceptional generalization capabilities. However, their robustness to perturbations has ignited concerns. Existing strategies typically reinforce their resilience against adversarial examples by enabling the image encoder to "see" these perturbed examples, often necessitating a complete retraining of the image encoder on both natural and adversarial samples. In this study, we propose a new method to enhance robustness solely through text augmentation, eliminating the need for retraining the image encoder on adversarial examples. Our motivation arises from the realization that text and image data inherently occupy a shared latent space, comprising latent content variables and style variables. This insight suggests the feasibility of learning to disentangle these latent content variables using text data exclusively. To accomplish this, we introduce an effective text augmentation method that focuses on modifying the style while preserving the content in the text data. 
By changing the style part of the text data, we empower the text encoder to emphasize latent content variables, ultimately enhancing the robustness of vision-language models. Our experiments across various datasets demonstrate substantial improvements in the robustness of the pre-trained CLIP model.	Translated: 2023-11-29 20:40:32 Published: 2023-11-28
# Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos ( http://arxiv.org/abs/2311.16444v1 ) License: see link	Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato	We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view. While dense video captioning (predicting time segments and their captions) is primarily studied with exocentric videos (e.g., YouCook2), benchmarks with egocentric videos are restricted due to data scarcity. To overcome the limited video availability, transferring knowledge from abundant exocentric web videos is a practical approach. However, learning the correspondence between exocentric and egocentric views is difficult due to their dynamic view changes. The web videos contain mixed views focusing on either human body actions or close-up hand-object interactions, while the egocentric view is constantly shifting as the camera wearer moves. This necessitates the in-depth study of cross-view transfer under complex view changes. 
In this work, we first create a real-life egocentric dataset (EgoYC2) whose captions are shared with YouCook2, enabling transfer learning between these datasets assuming their ground-truth is accessible. To bridge the view gaps, we propose a view-invariant learning method using adversarial training in both the pre-training and fine-tuning stages. While the pre-training is designed to learn invariant features against the mixed views in the web videos, the view-invariant fine-tuning further mitigates the view gaps between both datasets. We validate our proposed method by studying how effectively it overcomes the view change problem and efficiently transfers the knowledge to the egocentric domain. Our benchmark pushes the study of cross-view transfer into the new task domain of dense video captioning and envisions methodologies for describing egocentric videos in natural language.	Translated: 2023-11-29 20:40:10 Published: 2023-11-28
# Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization ( http://arxiv.org/abs/2311.16442v1 ) License: see link	Jinhao Li, Shiyao Li, Jiaming Xu, Shan Huang, Yaoxiu Lian, Jun Liu, Yu Wang, Guohao Dai	Large language models (LLMs) have demonstrated impressive abilities in various domains, but their inference cost is high. The state-of-the-art methods use 2-bit quantization for mainstream LLMs. However, challenges still exist: (1) Nonnegligible accuracy loss for 2-bit quantization. Weights are quantized by groups, while the ranges of weights are large in some groups, resulting in large quantization errors and nonnegligible accuracy loss (e.g., >3% for Llama2-7b with 2-bit quantization in GPTQ and Greenbit). (2) Limited accuracy improvement by adding 4-bit weights. Raising the average bit-width by 10% through additional 4-bit weights yields only <0.5% accuracy improvement on a quantized Llama2-7b. (3) Time-consuming dequantization operations on GPUs. The dequantization operations account for >50% of execution time, hindering the potential of reducing LLM inference cost. 
To tackle these challenges, we propose the following techniques: (1) We quantize only the small fraction of groups with larger ranges to 4 bits, taking memory alignment on GPUs into account. (2) We point out that the distribution of the sparse outliers with larger weights is different in 2-bit and 4-bit groups, and only a small fraction of outliers require 16-bit quantization. This design yields >0.5% accuracy improvement with a <3% increase in average bits for Llama2-7b. (3) We design asynchronous dequantization on GPUs, leading to up to 3.92X speedup. We conduct extensive experiments on different model families and model sizes. We achieve an average of 2.85 bits per weight, the end-to-end speedup for Llama2-7b is 1.74X over the original model, and we reduce both runtime cost and hardware cost by up to 2.70X and 2.81X with lower GPU requirements.	Translated: 2023-11-29 20:39:33 Published: 2023-11-28
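Technique (1) above, keeping most groups at 2 bits and promoting only the widest-range groups to 4 bits, can be illustrated with a small sketch. Everything here is an assumption made for illustration: plain uniform min-max quantization per group, a hypothetical `high_fraction` parameter, and an average bit count that ignores scales, zero points, the 16-bit outliers and GPU memory alignment. It is not the authors' implementation.

```python
def quantize_group(weights, bits):
    """Uniformly quantize one group to 2**bits levels and return the dequantized values."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)  # constant group: nothing to quantize
    scale = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((w - lo) / scale) * scale for w in weights]


def mixed_precision_quantize(groups, high_fraction, low_bits=2, high_bits=4):
    """Give high_bits to the high_fraction of groups with the largest value range."""
    n_high = round(len(groups) * high_fraction)
    by_range = sorted(range(len(groups)),
                      key=lambda i: max(groups[i]) - min(groups[i]),
                      reverse=True)
    high = set(by_range[:n_high])
    bits = [high_bits if i in high else low_bits for i in range(len(groups))]
    dequantized = [quantize_group(g, b) for g, b in zip(groups, bits)]
    total_weights = sum(len(g) for g in groups)
    # Average payload bits per weight; scales and zero points are not counted.
    avg_bits = sum(b * len(g) for g, b in zip(groups, bits)) / total_weights
    return dequantized, bits, avg_bits


groups = [
    [-0.10, 0.00, 0.05, 0.10],
    [-2.00, -0.50, 0.70, 3.00],  # wide range: the natural 4-bit candidate
    [-0.20, -0.10, 0.10, 0.20],
    [0.00, 0.01, 0.02, 0.03],
]
dequantized, bits, avg_bits = mixed_precision_quantize(groups, high_fraction=0.25)
print(bits)      # [2, 4, 2, 2]
print(avg_bits)  # 2.5
```

The wide-range group has a step size of 5/15 at 4 bits instead of 5/3 at 2 bits, which is why promoting only a few such groups can cut the worst quantization errors at a small cost in average bits.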
# Text-Driven Image Editing via Learnable Regions ( http://arxiv.org/abs/2311.16432v1 ) License: see link	Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai, Lu Jiang, Ming-Hsuan Yang	Language has emerged as a natural interface for image editing. In this paper, we introduce a method for region-based image editing driven by textual prompts, without the need for user-provided masks or sketches. Specifically, our approach leverages an existing pretrained text-to-image model and introduces a bounding box generator to find the edit regions that are aligned with the textual prompts. We show that this simple approach enables flexible editing that is compatible with current image generation models, and is able to handle complex prompts featuring multiple objects, complex sentences or long paragraphs. We conduct an extensive user study to compare our method against state-of-the-art methods. Experiments demonstrate the competitive performance of our method in manipulating images with high fidelity and realism that align with the language descriptions provided. Our project webpage: https://yuanze-lin.me/LearnableRegions_page.	Translated: 2023-11-29 20:38:58 Published: 2023-11-28
# 複素値ニューラルネットワークにおける過渡時空間ダイナミクスを用いた計算の厳密な数学的記述 An exact mathematical description of computation with transient spatiotemporal dynamics in a complex-valued neural network ( http://arxiv.org/abs/2311.16431v1 ) ライセンス: Link先を確認	Roberto C. Budzinski, Alexandra N. Busch, Samuel Mestern, Erwan Martin, Luisa H. B. Liboni, Federico W. Pasini, Ján Mináč, Todd Coleman, Wataru Inoue, Lyle E. Muller	(参考訳) 線形時間遅延相互作用を持つ複素値ニューラルネットワーク(cv-NN)について検討した。 cv-NNは,部分的に同期した「キメラ」状態を含む,洗練された時空間ダイナミクスを示す。次に、これらの時空間ダイナミクスを非線形読み出しと組み合わせて計算を行う。 cv-NNは動的論理ゲートをインスタンス化し、短期記憶を符号化し、相互作用と時間遅延の組み合わせによってセキュアなメッセージパッシングを仲介する。このシステムの計算は、厳密な閉形式の数式で完全に記述することができる。最後に、新皮質スライス中のニューロンの直接細胞内記録を用いて、cv-NNの計算が生きた生体ニューロンによって復号可能であることを実証した。これらの結果は、複素数値の線形系が洗練された計算を行いながら、厳密に解けることを示している。まとめると、これらの結果は、他のニューラルネットワークとシームレスに対話できる高度に適応可能なバイオハイブリッドコンピューティングシステムの設計のための将来の道を開く。 We study a complex-valued neural network (cv-NN) with linear, time-delayed interactions. We report that the cv-NN displays sophisticated spatiotemporal dynamics, including partially synchronized "chimera" states. We then use these spatiotemporal dynamics, in combination with a nonlinear readout, for computation. The cv-NN can instantiate dynamics-based logic gates, encode short-term memories, and mediate secure message passing through a combination of interactions and time delays. The computations in this system can be fully described in an exact, closed-form mathematical expression. Finally, using direct intracellular recordings of neurons in slices from neocortex, we demonstrate that computations in the cv-NN are decodable by living biological neurons. These results demonstrate that complex-valued linear systems can perform sophisticated computations, while also being exactly solvable. Taken together, these results open future avenues for design of highly adaptable, bio-hybrid computing systems that can interface seamlessly with other neural networks.	翻訳日:2023-11-29 20:38:43 公開日:2023-11-28
# ソフトウェア開発における大規模言語モデルの変容的影響 The Transformative Influence of Large Language Models on Software Development ( http://arxiv.org/abs/2311.16429v1 ) ライセンス: Link先を確認	Sajed Jalil	(参考訳) 一般化されたLarge Language Models(LLM)の採用と商業化が、私たちの日常生活の様々な側面に大きな影響を与えています。当初はコンピュータサイエンスコミュニティに受け入れられていたが、LLMの汎用性は様々な分野に浸透した。特に、ソフトウェアエンジニアリングの領域は、最も変革的な変化を目の当たりにしている。 LLMがAIペアプログラミングアシスタントとして機能するようになり、ソフトウェアエンジニアを支援する専門モデルの開発が加速した。この新しいパラダイムには多くの利点があるが、重要な課題やオープンな問題も提示する。可能性と普及する障害を特定するため,同時代の学術誌を体系的にレビューし,ソフトウェア開発者とユーザビリティの懸念点を強調した。予備的な調査結果は、データのプライバシー、偏見、誤報に関する懸念を浮き彫りにしている。さらに,プロンプトエンジニアリング,認知的要求の増大,不信感など,ユーザビリティの課題をいくつか特定した。最後に、これらの領域について、調査を通じて確認した12のオープンな問題を紹介します。 The increasing adoption and commercialization of generalized Large Language Models (LLMs) have profoundly impacted various aspects of our daily lives. Initially embraced by the computer science community, the versatility of LLMs has found its way into diverse domains. In particular, the software engineering realm has witnessed the most transformative changes. The growing use of LLMs as AI Pair Programming Assistants has spurred the development of specialized models aimed at aiding software engineers. Although this new paradigm offers numerous advantages, it also presents critical challenges and open problems. To identify the potential and prevailing obstacles, we systematically reviewed contemporary scholarly publications, emphasizing the perspectives of software developers and usability concerns. Preliminary findings underscore pressing concerns about data privacy, bias, and misinformation. Additionally, we identified several usability challenges, including prompt engineering, increased cognitive demands, and mistrust. Finally, we introduce 12 open problems that we have identified through our survey, covering these various domains.	翻訳日:2023-11-29 20:38:27 公開日:2023-11-28
# 重力波検出と低雑音サファイア振動子 Gravitational Wave Detection and Low-Noise Sapphire Oscillators ( http://arxiv.org/abs/2311.16426v1 ) ライセンス: Link先を確認	Michael Edmund Tobar	(参考訳) この論文は、Xバンド上で調整可能な超低雑音サファイア共振器発振器の開発について述べる。この作業中に、多モード共振器キャビティと発振器の設計と理解に関して、興味深く非常に有用な現象を説明している。この発振器は、1.5トンのニオブ共振バー重力波検出器に取り付けられた超伝導パラメトリックトランスデューサシステムのポンプ発振器として機能するように設計された。パラメトリックトランスデューサと共振バーシステムによるポンプ発振器の内蔵効果を分析し,検出感度の予測を可能にした。この検出器は、史上初の超精密光機械システムであった。共鳴検出器への関心が再び高まる中、この論文は多モード音響システムに関する重要な研究であり、今日多くの研究分野に関連する高感度パラメトリックトランスデューサと組み合わせられている。 This thesis describes the development of an ultra-low noise sapphire resonator oscillator that is tunable over X-band. While undertaking this task the author has explained some interesting and very useful phenomena in regards to the design and understanding of multi-mode resonant cavities and oscillators. The oscillator was constructed to operate as the pump oscillator in the superconducting parametric transducer system, attached to a 1.5-tonne niobium resonant bar gravitational wave detector. The effects of incorporating the pump oscillator with the parametric transducer and resonant bar system are analyzed to enable prediction of the detector sensitivity. The detector was the first massive precision optomechanical system ever built. With the resurgence in interest in resonant detectors, this thesis has important work on multi-mode acoustic systems, coupled to a highly sensitive parametric transducer relevant for many fields of research today.	翻訳日:2023-11-29 20:38:15 公開日:2023-11-28
# 多様体保存誘導拡散 Manifold Preserving Guided Diffusion ( http://arxiv.org/abs/2311.16424v1 ) ライセンス: Link先を確認	Yutong He, Naoki Murata, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Dongjun Kim, Wei-Hsiang Liao, Yuki Mitsufuji, J. Zico Kolter, Ruslan Salakhutdinov, Stefano Ermon	(参考訳) 最近の進歩にもかかわらず、条件付き画像生成は依然としてコスト、一般化可能性、タスク固有のトレーニングの必要性といった課題に直面している。本稿では,事前学習された拡散モデルと既製のニューラルネットワークを活用し,幅広いタスクに対して最小限の追加推論コストで動作する学習不要の条件付き生成フレームワークであるManifold Preserving Guided Diffusion (MPGD)を提案する。具体的には,多様体仮説を利用して誘導拡散ステップを洗練し,その過程にショートカットアルゴリズムを導入する。次に,事前学習されたオートエンコーダを用いた多様体上での学習不要なガイダンスの2つの手法を提案し,潜在拡散モデルに適用した場合にこのショートカットが本質的に多様体を保存することを示した。実験の結果,MPGDは低計算環境における様々な条件付き生成アプリケーションに対して効率的かつ効果的であり,ベースラインと比べて高いサンプル品質を維持しながら,同じ拡散ステップ数で最大3.8倍の高速化を一貫して実現できることがわかった。 Despite the recent advancements, conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training. In this paper, we propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework that leverages pretrained diffusion models and off-the-shelf neural networks with minimal additional inference cost for a broad range of tasks. Specifically, we leverage the manifold hypothesis to refine the guided diffusion steps and introduce a shortcut algorithm in the process. We then propose two methods for on-manifold training-free guidance using pre-trained autoencoders and demonstrate that our shortcut inherently preserves the manifolds when applied to latent diffusion models. Our experiments show that MPGD is efficient and effective for solving a variety of conditional generation applications in low-compute settings, and can consistently offer up to 3.8x speed-ups with the same number of diffusion steps while maintaining high sample quality compared to the baselines.	翻訳日:2023-11-29 20:38:02 公開日:2023-11-28
# CDEval: 大規模言語モデルの文化的次元を測定するためのベンチマーク CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models ( http://arxiv.org/abs/2311.16421v1 ) ライセンス: Link先を確認	Yuhang Wang, Yanxu Zhu, Chao Kong, Shuyu Wei, Xiaoyuan Yi, Xing Xie and Jitao Sang	(参考訳) 大規模言語モデル(LLM)のスケーリングによって能力が劇的に向上するにつれ、責任ある倫理的な利用を確保するために、アライメントの問題に注目が集まっている。既存のアライメント努力は、HHH原則のような普遍的価値に主に集中しているが、本質的に多元的かつ多様である文化の側面には十分な注意が払われていない。この研究は、LLMの文化的次元を評価することを目的とした新しいベンチマークであるCDEvalを導入する。 CDEvalは、GPT-4の自動生成と人間の検証の両方を取り入れて構築され、7つの領域にわたる6つの文化的次元をカバーする。我々の包括的な実験は、主流のLLMの文化に関する興味深い洞察を与え、異なる次元とドメインにおける一貫性とバリエーションの両方を強調する。この知見は, LLM開発における文化的考慮事項の統合の重要性, 特に多様な文化的状況における応用での重要性を浮き彫りにした。 CDEvalを通じて、文化的次元を含めることでLLMアライメント研究の地平を広げ、LLMの将来の開発と評価のためのより包括的な枠組みを提供することを目指す。このベンチマークは、LLMにおける文化的研究の貴重なリソースとなり、より文化を意識し配慮したモデルへの道を開く。 As the scaling of Large Language Models (LLMs) has dramatically enhanced their capabilities, there has been a growing focus on the alignment problem to ensure their responsible and ethical use. While existing alignment efforts predominantly concentrate on universal values such as the HHH principle, the aspect of culture, which is inherently pluralistic and diverse, has not received adequate attention. This work introduces a new benchmark, CDEval, aimed at evaluating the cultural dimensions of LLMs. CDEval is constructed by incorporating both GPT-4's automated generation and human verification, covering six cultural dimensions across seven domains. Our comprehensive experiments provide intriguing insights into the culture of mainstream LLMs, highlighting both consistencies and variations across different dimensions and domains. The findings underscore the importance of integrating cultural considerations in LLM development, particularly for applications in diverse cultural settings. Through CDEval, we aim to broaden the horizon of LLM alignment research by including cultural dimensions, thus providing a more holistic framework for the future development and evaluation of LLMs. 
This benchmark serves as a valuable resource for cultural studies in LLMs, paving the way for more culturally aware and sensitive models.	翻訳日:2023-11-29 20:37:43 公開日:2023-11-28
# 分布外検出のためのモデルフリーテスト時間適応 Model-free Test Time Adaptation for Out-Of-Distribution Detection ( http://arxiv.org/abs/2311.16420v1 ) ライセンス: Link先を確認	YiFan Zhang, Xue Wang, Tian Zhou, Kun Yuan, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan	(参考訳) MLモデルの信頼性には、アウト・オブ・ディストリビューション(OOD)検出が不可欠である。 OOD検出の既存のほとんどの方法は、所定の分布内データセットから固定の決定基準を学習し、それを一律に適用してデータポイントがOODであるかどうかを判定する。最近の研究は、分布内データのみが与えられた場合、追加の仮定なしでOODデータを確実に検出することは不可能であることを示している。この理論的結果と近年のテスト時間適応法の研究に動機づけられ、OOD検出のための非パラメトリックなテスト時間適応フレームワーク AdaODD を提案する。従来の方法とは異なり、AdaODD はテスト中のモデル適応のためにオンラインのテストサンプルを使用し、データ分布の変化への適応性を高めている。このフレームワークは検出されたOODインスタンスを意思決定に組み込み、特にIDとOODの分布が大きく重なり合う場合に偽陽性率を減らす。我々は,複数のOOD検出ベンチマークでの総合的な実験を通じて AdaODD の有効性を実証した。具体的には、AdaODD は先進的な手法と比較して、偽陽性率 (FPR95) を CIFAR-10 ベンチマークで 23.23%、ImageNet-1k ベンチマークで 38% 減少させる。最後に, AdaODD の有効性を理論的に検証する。 Out-of-distribution (OOD) detection is essential for the reliability of ML models. Most existing methods for OOD detection learn a fixed decision criterion from a given in-distribution dataset and apply it universally to decide if a data point is OOD. Recent work \cite{fang2022is} shows that given only in-distribution data, it is impossible to reliably detect OOD data without extra assumptions. Motivated by the theoretical result and recent exploration of test-time adaptation methods, we propose a Non-Parametric Test Time Adaptation framework for Out-Of-Distribution Detection (AdaODD). Unlike conventional methods, AdaODD utilizes online test samples for model adaptation during testing, enhancing adaptability to changing data distributions. The framework incorporates detected OOD instances into decision-making, reducing false positive rates, particularly when ID and OOD distributions overlap significantly. 
We demonstrate the effectiveness of AdaODD through comprehensive experiments on multiple OOD detection benchmarks. Extensive empirical studies show that AdaODD significantly improves the performance of OOD detection over state-of-the-art methods. Specifically, AdaODD reduces the false positive rate (FPR95) by 23.23% on the CIFAR-10 benchmarks and 38% on the ImageNet-1k benchmarks compared to the advanced methods. Lastly, we theoretically verify the effectiveness of AdaODD.	翻訳日:2023-11-29 20:37:22 公開日:2023-11-28
# ロバストPCAに対する組合せ的アプローチ A Combinatorial Approach to Robust PCA ( http://arxiv.org/abs/2311.16416v1 ) ライセンス: Link先を確認	Weihao Kong, Mingda Qiao, Rajat Sen	(参考訳) 雑音が低ランクで,汚染が座標レベルで生じる場合に,敵対的汚染の下でガウスデータを復元する問題について検討する。具体的には、ガウスノイズは未知の $k$ 次元部分空間 $U \subseteq \mathbb{R}^d$ にあり、各データポイントからランダムに選ばれた $s$ 個の座標が敵の制御下に入ると仮定する。この設定は、高度にノイズの多いチャネルを介して送信される高次元で構造化されたデータから学習するシナリオをモデル化しており,データポイントが完全にクリーンである可能性は低い。我々の主な結果は、$ks^2 = O(d)$ のとき、すべてのデータポイントを、ほぼ最適な期待 $\ell_1$ 誤差 $\tilde O(ks/d)$ まで復元する効率的なアルゴリズムである。我々の証明の核心は、スパース信号の復元のためのよく知られた基底追跡法 (Basis Pursuit, BP) の新しい解析である。BP は、基礎となる部分空間 $U$ に対する追加の仮定(例えば、インコヒーレンスや制限等長性)の下で成功することが知られている。対照的に、我々は自然な組合せ問題の研究を通じた新しいアプローチを提示し、スパース信号の台のランダム性の下では、部分空間 $U$ が任意であっても高確率の誤差境界が得られることを示す。 We study the problem of recovering Gaussian data under adversarial corruptions when the noises are low-rank and the corruptions are on the coordinate level. Concretely, we assume that the Gaussian noises lie in an unknown $k$-dimensional subspace $U \subseteq \mathbb{R}^d$, and $s$ randomly chosen coordinates of each data point fall into the control of an adversary. This setting models the scenario of learning from high-dimensional yet structured data that are transmitted through a highly-noisy channel, so that the data points are unlikely to be entirely clean. Our main result is an efficient algorithm that, when $ks^2 = O(d)$, recovers every single data point up to a nearly-optimal $\ell_1$ error of $\tilde O(ks/d)$ in expectation. At the core of our proof is a new analysis of the well-known Basis Pursuit (BP) method for recovering a sparse signal, which is known to succeed under additional assumptions (e.g., incoherence or the restricted isometry property) on the underlying subspace $U$. In contrast, we present a novel approach via studying a natural combinatorial problem and show that, over the randomness in the support of the sparse signal, a high-probability error bound is possible even if the subspace $U$ is arbitrary.	翻訳日:2023-11-29 20:36:49 公開日:2023-11-28
# 拡張2体型リドバーグ遮断相互作用とオフ共振変調駆動 Extended two-body Rydberg blockade interaction with off-resonant modulated driving ( http://arxiv.org/abs/2311.16413v1 ) ライセンス: Link先を確認	Yuan Sun	(参考訳) 接続性はcold atom qubitプラットフォームにおいて必須かつ不可欠な役割を担っている。 2量子ビットのブロックゲートは最近、忠実度側で急速に進歩しているが、真のスケーラビリティを追求する接続性を改善することが、完全に接続されたコールドアトムキュービットアレイの究極の展望である。この方向への固いステップは、2量子ビットのrydbergブロックゲートを純粋に近距離の2体相互作用を超えて拡張するために余分なバッファ原子を導入することで得られる。ライドバーグ双極子-双極子相互作用を通じて、バッファ原子は互いに直接物理的に影響を与えない2つのクビット原子と結合する。オフ共振変調駆動の確立された方法は便利であるばかりでなく、この最新の開発の基礎となるものである。ここでの原子結合構造は, 従来の2体系に比べて非自明な合併症を呈するが, 適切に設計された変調波形による地表面-ライドベルク遷移後, 人口は満足して基底状態に戻ることができる。一般的には1光子と2光子地上リドバーグ遷移によってインスタンス化できる。さらに、バッファ原子リレーや同様の構造により、2つの遠方量子ビット原子間の2量子エンタングゲートを実現することができる。このようなソリューションが実現可能なコア問題に加えて、代表変調パターンも解析し、バッファ原子を介する2量子ゲートの汎用性を示す。より広い視点で見れば、これらの取り組みは、固体電子のワイヤとジャンクションの概念に近づいた冷原子量子ビットプラットフォームをもたらす。 Connectivity has an essential and indispensable role in the cold atom qubit platform. Whilst the two-qubit Rydberg blockade gate recently receives rapid progress on the fidelity side, a pressing challenge is to improve the connectivity in pursuit of genuine scalability, with the ultimate prospect of fully-connected cold atom qubit array. It turns out that a solid step along this direction can be made by introducing extra buffer atom to extend two-qubit Rydberg blockade gate beyond a purely nearest-neighbor two-body interaction. Through Rydberg dipole-dipole interactions, the buffer atom couples with the two qubit atoms which do not directly exert any physical influence on each other. The established method of off-resonant modulated driving is not only convenient but also lays down the groundwork for this latest development. Although the atomic linkage structure here exhibits nontrivial complications compared to previous cases of mere two-body system, the population can satisfyingly return to the ground state after the ground-Rydberg transition with properly designed modulation waveforms. 
It can be instantiated via the one-photon and two-photon ground-Rydberg transitions used in common practice. Furthermore, with a buffer-atom relay or similar structures, it is possible to realize a two-qubit entangling gate between two far-away qubit atoms. Besides the core issue that such solutions are attainable, the representative modulation patterns are also analyzed, demonstrating the versatility of the buffer-atom-mediated two-qubit gate. Put in a broader perspective, these efforts bring the cold atom qubit platform closer to the notions of wires and junctions in solid-state electronics.	翻訳日:2023-11-29 20:36:23 公開日:2023-11-28
# 臨界計測と量子計測の組み合わせ Combining critical and quantum metrology ( http://arxiv.org/abs/2311.16472v1 ) ライセンス: Link先を確認	Christoph Hotter, Helmut Ritsch, and Karol Gietka	(参考訳) 臨界計測(critical metrology)は、量子相関が非常に強くなる量子相転移点付近の基底状態に系を精密に準備することに依存する。これは典型的には、系のパラメータの変化に関する量子フィッシャー情報を増やし、Cramér-Rao境界で制限される最適な測定精度を改善する。したがって、臨界計測では未知のパラメータに関する情報が系の基底状態の変化に符号化される。逆にラムゼー干渉法のような従来の計測法では、系の固有状態は変化せず、未知のパラメータに関する情報は、励起された系の状態が時間発展の間に蓄積する相対位相に符号化される。本稿では,これら2つの手法を,閉じた系および駆動散逸系に適用可能な統一プロトコルに結合する手法を提案する。この場合の量子フィッシャー情報は、固有状態の変化と相対位相の変化の相互作用に由来する追加の干渉項を示す。このような設定における量子的および古典的フィッシャー情報の解析式を与え、またCramér-Rao境界の下で許容される最大精度をほぼ達成できる簡単な測定アプローチを明らかにする。我々は,DickeハミルトニアンとLipkin-Meshkov-Glickハミルトニアンの熱力学的極限を特徴付けるスクイージング・ハミルトニアンに注目して,これらの結果を示す。 Critical metrology relies on the precise preparation of a system in its ground state near a quantum phase transition point where quantum correlations get very strong. Typically this increases the quantum Fisher information with respect to changes in system parameters and thus improves the optimally possible measurement precision limited by the Cramér-Rao bound. Hence critical metrology involves encoding information about the unknown parameter in changes of the system's ground state. Conversely, in conventional metrology methods like Ramsey interferometry, the eigenstates of the system remain unchanged, and information about the unknown parameter is encoded in the relative phases that excited system states accumulate during their time evolution. Here we introduce an approach combining these two methodologies into a unified protocol applicable to closed and driven-dissipative systems. We show that the quantum Fisher information in this case exhibits an additional interference term originating from the interplay between eigenstate and relative phase changes. 
We provide analytical expressions for the quantum and classical Fisher information in such a setup, elucidating as well a straightforward measurement approach that nearly attains the maximum precision permissible under the Cramér-Rao bound. We showcase these results by focusing on the squeezing Hamiltonian, which characterizes the thermodynamic limit of Dicke and Lipkin-Meshkov-Glick Hamiltonians.	翻訳日:2023-11-29 20:26:43 公開日:2023-11-28
# マルチモーダル・マルチパート動作合成のための統一フレームワーク A Unified Framework for Multimodal, Multi-Part Human Motion Synthesis ( http://arxiv.org/abs/2311.16471v1 ) ライセンス: Link先を確認	Zixiang Zhou, Yu Wan, Baoyuan Wang	(参考訳) 様々なモダリティによって駆動される現実的な人間の動きの合成において、この分野は大きな進歩を遂げた。しかし、様々な制御信号に従って様々な身体部位をアニメーションする異なる方法の必要性は、現実的なシナリオにおいてこれらの手法のスケーラビリティを制限している。本稿では,マルチモーダル(テキスト,音楽,音声)とマルチパート(ハンド,トルソ)のヒューマンモーション生成を統合する,凝集的でスケーラブルなアプローチを提案する。私たちは、様々な身体部分の動きを、それぞれのドメインに合わせた別々のコードブックに定量化することから始めます。次に,事前学習モデルのロバスト性を利用して,マルチモーダル信号の共有潜在空間への変換を行う。次に、これらの信号を離散的な動きトークンに変換し、その後のトークンを反復的に予測して完全なシーケンスを形成する。最後に、このトークン化されたシーケンスから連続的な実際の動きを再構成する。本手法は,制御信号のモダリティに基づいて,専用コードブックから抽出したトークン予測タスクとして,マルチモーダルモーション生成課題をフレーム化する。このアプローチは本質的にスケーラブルであり、新しいモダリティを簡単に統合できる。広範な実験により,我々の設計の有効性を実証し,幅広い応用への可能性を強調した。 The field has made significant progress in synthesizing realistic human motion driven by various modalities. Yet, the need for different methods to animate various body parts according to different control signals limits the scalability of these techniques in practical scenarios. In this paper, we introduce a cohesive and scalable approach that consolidates multimodal (text, music, speech) and multi-part (hand, torso) human motion generation. Our methodology unfolds in several steps: We begin by quantizing the motions of diverse body parts into separate codebooks tailored to their respective domains. Next, we harness the robust capabilities of pre-trained models to transcode multimodal signals into a shared latent space. We then translate these signals into discrete motion tokens by iteratively predicting subsequent tokens to form a complete sequence. Finally, we reconstruct the continuous actual motion from this tokenized sequence. Our method frames the multimodal motion generation challenge as a token prediction task, drawing from specialized codebooks based on the modality of the control signal. This approach is inherently scalable, allowing for the easy integration of new modalities. 
Extensive experiments demonstrated the effectiveness of our design, emphasizing its potential for broad application.	翻訳日:2023-11-29 20:26:22 公開日:2023-11-28
# AvatarGPT: モーション理解,計画,生成,その他のためのオールインワンフレームワーク AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond ( http://arxiv.org/abs/2311.16468v1 ) ライセンス: Link先を確認	Zixiang Zhou, Yu Wan, Baoyuan Wang	(参考訳) 大規模言語モデル(LLM)は、ほぼすべてのNLPタスクを統一するうえで、顕著な創発的能力を示している。しかし、人間の動きに関連する領域では、研究者は依然として各タスクのためのサイロ化されたモデルを開発している。 InstructGPT と、Gato の背後にあるジェネラリストの概念に触発されて,動作理解,計画,生成,さらにはモーション補間(in-between)合成などのタスクのためのオールインワンフレームワークである AvatarGPT を紹介する。 AvatarGPTは、各タスクを共有LLM上で微調整された1種類の命令として扱う。すべてのタスクは言語をユニバーサルインターフェースとしてシームレスに相互接続され、フレームワーク内の閉ループを構成する。これを実現するために、人間の動きシーケンスはまず離散トークンとして符号化され、LLMの拡張語彙として機能する。次に,実環境(in-the-wild)の映像から人間の行動系列の自然言語記述を生成するための教師なしパイプラインを開発した。最後に、全てのタスクは共同で訓練される。広範な実験により、AvatarGPTは低レベルタスクでSOTAを達成し、高レベルタスクで有望な結果を達成することが示され、提案したオールインワンフレームワークの有効性が示された。さらに、AvatarGPTは、閉ループ内のタスクの反復的トラバーサルにより、長さ無制限の動作合成に対する原理的なアプローチを初めて可能にした。 Large Language Models (LLMs) have shown remarkable emergent abilities in unifying almost all (if not every) NLP tasks. In the human motion-related realm, however, researchers still develop siloed models for each task. Inspired by InstructGPT, and the generalist concept behind Gato, we introduce AvatarGPT, an All-in-One framework for motion understanding, planning, and generation, as well as other tasks such as motion in-between synthesis. AvatarGPT treats each task as one type of instruction fine-tuned on the shared LLM. All the tasks are seamlessly interconnected with language as the universal interface, constituting a closed-loop within the framework. To achieve this, human motion sequences are first encoded as discrete tokens, which serve as the extended vocabulary of LLM. Then, an unsupervised pipeline to generate natural language descriptions of human action sequences from in-the-wild videos is developed. Finally, all tasks are jointly trained. Extensive experiments show that AvatarGPT achieves SOTA on low-level tasks, and promising results on high-level tasks, demonstrating the effectiveness of our proposed All-in-One framework. 
Moreover, for the first time, AvatarGPT enables a principled approach by iterative traversal of the tasks within the closed-loop for unlimited long-motion synthesis.	翻訳日:2023-11-29 20:26:03 公開日:2023-11-28
# 大規模言語モデルによる人間の説得の促進 Enhancing Human Persuasion With Large Language Models ( http://arxiv.org/abs/2311.16466v1 ) ライセンス: Link先を確認	Minkyu Shin and Jin Kim	(参考訳) 大きな言語モデル(LLM)は、人間の生活の様々な側面を再構築していますが、その影響に対する現在の理解は、多少制約があります。本稿では,LLMが人的コミュニケーションに与える影響について,金融業界における消費者苦情の文脈で検討する。消費者金融保護局(CFPB)が収集した78万件以上の苦情に対してAI検出ツールを使用することで,ChatGPTのリリース直後から苦情の作成にLLMが使用されている証拠が発見された。分析の結果, LLMの使用は, 望ましい結果を得る可能性(すなわち, 金融機関からの救済提供)と正の相関があることが判明し, この正の相関は, LLMが改善した言語的特徴に部分的に起因している可能性が示唆された。この推測を事前登録済みの実験で検証したところ,観察研究と一致する結果が得られた。言語的品質を改善するためにChatGPTで書かれた消費者の苦情は、当初の消費者の苦情よりも仮想的な救済のオファーを受ける傾向があり、人間のコミュニケーションにおけるメッセージの説得力を高めるLLMの能力を示す。説得力を高めるためのLLMの使用に関する最も初期の経験的証拠の一つとして,我々の結果は人間のコミュニケーションにおけるLLMの変革の可能性を強調している。 Although large language models (LLMs) are reshaping various aspects of human life, our current understanding of their impacts remains somewhat constrained. Here we investigate the impact of LLMs on human communication, in the context of consumer complaints in the financial industry. Employing an AI detection tool on more than 780K complaints gathered by the Consumer Financial Protection Bureau (CFPB), we find evidence of LLM usage in the writing of complaints shortly after the release of ChatGPT. Our analyses reveal that LLM usage is positively correlated with the likelihood of obtaining desirable outcomes (i.e., offer of relief from financial firms) and suggest that this positive correlation may be partly due to the linguistic features improved by LLMs. We test this conjecture with a preregistered experiment, which reveals results consistent with those from observational studies: Consumer complaints written with ChatGPT for improved linguistic qualities were more likely to receive hypothetical relief offers than the original consumer complaints, demonstrating the LLM's ability to enhance message persuasiveness in human communication. Being some of the earliest empirical evidence on LLM usage for enhancing persuasion, our results highlight the transformative potential of LLMs in human communication.	翻訳日:2023-11-29 20:25:40 公開日:2023-11-28
# TextDiffuser-2: テキストレンダリングのための言語モデルのパワーを解放する TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering ( http://arxiv.org/abs/2311.16465v1 ) ライセンス: Link先を確認	Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei	(参考訳) 拡散モデルは近年、強力な生成モデルであることが証明されているが、ビジュアルテキストの生成には依然として課題である。明示的なテキストの位置とコンテンツを、どのテキストをレンダリングするかのガイダンスとして組み込むことで、この問題を緩和した。しかし、これらの手法には、柔軟性と自動化の制限、レイアウト予測の制限された機能、スタイルの多様性の制限など、いくつかの欠点がある。本稿では,テキストレンダリングのための言語モデルのパワーを解き放つことを目的としたTextDiffuser-2を提案する。まず,レイアウト計画のための大規模言語モデルを微調整する。大規模言語モデルはテキストレンダリング用のキーワードを自動的に生成し、チャットによるレイアウト変更もサポートする。次に,拡散モデル内の言語モデルを用いて,行レベルでの位置とテキストを符号化する。タイトな文字レベルのガイダンスを用いた従来の方法とは異なり、このアプローチはより多様なテキストイメージを生成する。我々は,より合理的なテキストレイアウトを実現するためのtextdiffuser-2のキャパシティを検証し,多様性を増すために,ヒトおよびgpt-4vを用いたユーザ研究を実施し,広範な実験を行った。コードとモデルは \url{https://aka.ms/textdiffuser-2} で入手できる。 The diffusion model has been proven a powerful generative model in recent years, yet remains a challenge in generating visual text. Several methods alleviated this issue by incorporating explicit text position and content as guidance on where and what text to render. However, these methods still suffer from several drawbacks, such as limited flexibility and automation, constrained capability of layout prediction, and restricted style diversity. In this paper, we present TextDiffuser-2, aiming to unleash the power of language models for text rendering. Firstly, we fine-tune a large language model for layout planning. The large language model is capable of automatically generating keywords for text rendering and also supports layout modification through chatting. Secondly, we utilize the language model within the diffusion model to encode the position and texts at the line level. Unlike previous methods that employed tight character-level guidance, this approach generates more diverse text images. 
We conduct extensive experiments and incorporate user studies involving human participants as well as GPT-4V, validating TextDiffuser-2's capacity to achieve a more rational text layout and generation with enhanced diversity. The code and model will be available at \url{https://aka.ms/textdiffuser-2}.	翻訳日:2023-11-29 20:25:20 公開日:2023-11-28
# Bridging the Gap: モーメント検索とハイライト検出のための統合ビデオ理解フレームワーク Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection ( http://arxiv.org/abs/2311.16464v1 ) ライセンス: Link先を確認	Yicheng Xiao, Zhuoyan Luo, Yong Liu, Yue Ma, Hengwei Bian, Yatai Ji, Yujiu Yang, Xiu Li	(参考訳) ビデオモーメント検索 (MR) とハイライト検出 (HD) は, ビデオ解析の需要が高まっているため, 注目されている。最近のアプローチでは、MRとHDをビデオグラウンド問題として扱い、トランスフォーマーベースのアーキテクチャでそれらに対処している。しかし, MRとHDの重み付けは, 局所的な関係の認識と, グローバルな文脈の理解を優先することとで異なる。したがって、タスク固有の設計の欠如は、必然的に2つのタスクの本質的な特殊性を関連付けることの制限につながる。本稿では,このギャップを埋め,MRとHDを効果的に解決するための統一ビデオ理解フレームワーク (UVCOM) を提案する。複数の粒度にわたってモダリティ内とモダリティ間のプログレッシブな統合を行うことで、uvcomはビデオの処理における包括的理解を達成する。さらに,局所的関係モデリングとグローバルな知識蓄積を適切に整合したマルチモーダル空間を通じて統合するために,マルチアスペクトコントラスト学習を提案する。 QVHighlights、Charades-STA、TACoS、YouTube Highlights、TVSumデータセットに関する大規模な実験は、UVCOMの有効性と合理性を示している。 Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted significant attention due to the growing demand for video analysis. Recent approaches treat MR and HD as similar video grounding problems and address them together with transformer-based architecture. However, we observe that the emphasis of MR and HD differs, with one necessitating the perception of local relationships and the other prioritizing the understanding of global contexts. Consequently, the lack of task-specific design will inevitably lead to limitations in associating the intrinsic specialty of two tasks. To tackle the issue, we propose a Unified Video COMprehension framework (UVCOM) to bridge the gap and jointly solve MR and HD effectively. By performing progressive integration on intra and inter-modality across multi-granularity, UVCOM achieves the comprehensive understanding in processing a video. Moreover, we present multi-aspect contrastive learning to consolidate the local relation modeling and global knowledge accumulation via well aligned multi-modal space. 
Extensive experiments on QVHighlights, Charades-STA, TACoS, YouTube Highlights and TVSum datasets demonstrate the effectiveness and rationality of UVCOM, which outperforms the state-of-the-art methods by a remarkable margin.	翻訳日:2023-11-29 20:24:59 公開日:2023-11-28
# ビデオサリエンシーと軌道情報の探索によるボリュームビデオストリーミングのビューポート予測 Viewport Prediction for Volumetric Video Streaming by Exploring Video Saliency and Trajectory Information ( http://arxiv.org/abs/2311.16462v1 ) ライセンス: Link先を確認	Jie Li, Zhixin Li, Zhi Liu, Pengyuan Zhou, Richang Hong, Qiyue Li, Han Hu	(参考訳) ボリュームビデオ(英: volumetric video)またはホログラムビデオ(英: hologram video)は、仮想現実(VR)、拡張現実(AR)、複合現実(MR)において自然コンテンツを描写する新しい媒体である。次世代のビデオ技術であり、5G以降の無線通信における主要なユースケースになると期待されている。各ユーザは通常、ビューポートと呼ばれるボリュームビデオの一部のみを視聴するので、最適なパフォーマンスのためには正確なビューポート予測が不可欠である。しかし、この話題の研究はまだ初期段階にある。そこで本稿では,ボリュームビデオストリーミングにおけるビューポート予測の精度向上を目的とした,Saliency and Trajectory Viewport Prediction (STVP) という新しい手法を提案する。 STVPはビデオサリエンシー情報とビューポート軌跡を広範囲に活用する。私たちの知る限り、これはボリュームビデオストリーミングにおけるビューポート予測に関する最初の包括的な研究である。特に,一様ランダムサンプリング(URS)という新しいサンプリング手法を導入し,ビデオの特徴を効率的に保存しながら,計算複雑性を低減した。次に,静的,動的幾何学的,カラーのサリエント領域を検出するために,空間情報と時間情報の両方を組み込んだサリエンシー検出手法を提案する。最後に、より正確なビューポート予測を実現するために、サリエンシーと軌道情報をインテリジェントに融合する。我々は,最先端のボリュームビデオシーケンスを用いて,提案するビューポート予測手法の有効性を評価するために,広範囲なシミュレーションを行った。実験の結果,提案手法が既存手法よりも優れていることがわかった。データセットとソースコードは、採択後に公開される予定である。 Volumetric video, also known as hologram video, is a novel medium that portrays natural content in Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR). It is expected to be the next-gen video technology and a prevalent use case for 5G and beyond wireless communication. Considering that each user typically only watches a section of the volumetric video, known as the viewport, it is essential to have precise viewport prediction for optimal performance. However, research on this topic is still in its infancy. To this end, this paper proposes a novel approach, named Saliency and Trajectory Viewport Prediction (STVP), which aims to improve the precision of viewport prediction in volumetric video streaming. The STVP extensively utilizes video saliency information and viewport trajectory. To our knowledge, this is the first comprehensive study of viewport prediction in volumetric video streaming. 
In particular, we introduce a novel sampling method, Uniform Random Sampling (URS), to reduce computational complexity while still preserving video features in an efficient manner. Then we present a saliency detection technique that incorporates both spatial and temporal information for detecting static, dynamic geometric, and color salient regions. Finally, we intelligently fuse saliency and trajectory information to achieve more accurate viewport prediction. We conduct extensive simulations to evaluate the effectiveness of our proposed viewport prediction methods using state-of-the-art volumetric video sequences. The experimental results show the superiority of the proposed method over existing schemes. The dataset and source code will be publicly accessible after acceptance.	翻訳日:2023-11-29 20:24:35 公開日:2023-11-28
# フェデレーション学習における離脱の影響とその防止策について On the Effect of Defections in Federated Learning and How to Prevent Them ( http://arxiv.org/abs/2311.16459v1 ) ライセンス: Link先を確認	Minbiao Han, Kumar Kshitij Patel, Han Shao, Lingxiao Wang	(参考訳) Federated Learningは、多数のエージェントが複数のラウンドにわたって協力し、単一のコンセンサスモデルを生成することができる機械学習プロトコルである。いくつかの連合学習アプリケーションでは、エージェントはそのラウンドの時点のモデルに満足すると、恒久的に離脱する(つまりコラボレーションから実質的に脱退する)ことを選択できる。この研究は、そのような離脱が最終モデルの堅牢性と一般化能力に悪影響を与えることを実証する。また、現在のフェデレーション最適化アルゴリズムは、これらの有害な離脱を抑止できないことを示す。我々は,参加するすべてのエージェントにとって有効な解への漸近収束を確保しつつ,離脱防止を理論的に保証する新しい最適化アルゴリズムを提案する。また,これらの知見を裏付け,アルゴリズムの有効性を示す数値実験も行った。 Federated learning is a machine learning protocol that enables a large population of agents to collaborate over multiple rounds to produce a single consensus model. There are several federated learning applications where agents may choose to defect permanently (essentially withdrawing from the collaboration) if they are content with their instantaneous model in that round. This work demonstrates the detrimental impact of such defections on the final model's robustness and ability to generalize. We also show that current federated optimization algorithms fail to disincentivize these harmful defections. We introduce a novel optimization algorithm with theoretical guarantees to prevent defections while ensuring asymptotic convergence to an effective solution for all participating agents. We also provide numerical experiments to corroborate our findings and demonstrate the effectiveness of our algorithm.	翻訳日:2023-11-29 20:24:11 公開日:2023-11-28
# 視覚トランスフォーマーのための動的時間ステップを用いたスパイクニューラルネットワーク Spiking Neural Networks with Dynamic Time Steps for Vision Transformers ( http://arxiv.org/abs/2311.16456v1 ) ライセンス: Link先を確認	Gourav Datta, Zeyu Liu, Anni Li, Peter A. Beerel	(参考訳) Spiking Neural Networks (SNN)は、複雑な視覚タスクのための時空間コンピューティングパラダイムとして人気がある。最近提案されたSNNトレーニングアルゴリズムは、レイテンシとエネルギー効率を改善するために時間ステップ数を(最小で1まで)大幅に削減したが、それらは畳み込みニューラルネットワーク(CNN)のみを対象としている。これらのアルゴリズムは、最近注目されているビジョントランスフォーマー(ViT)に適用された場合、多数の時間ステップを必要とするか、収束に失敗する。 ANN と SNN のアクティベーションマップのヒストグラム解析に基づいて,各 ViT ブロックが時間ステップ数に対して異なる感度を持つという仮説を立てた。本稿では、各タイムステップに割り当てられたトレーニング可能なスコアに応じて、各ViTモジュールに動的にタイムステップ数を割り当てる新しいトレーニングフレームワークを提案する。特に、リーキー積分発火(LIF)層の各ニューロンが発するスパイクをフィルタリングする、スカラーの二値時間ステップマスクを生成する。結果として得られるSNNは高い活性化スパース性を持ち、従来のViTで必要とされる高価な積和演算(MAC)とは対照的に、入力埋め込み層を除いて、累積演算(AC)のみを必要とする。これによりエネルギー効率が大幅に向上する。異なるViTアーキテクチャを用いて、CIFAR10、CIFAR100、ImageNetなどの画像認識タスクで、トレーニングフレームワークと得られたSNNを評価した。CIFAR10では、直接符号化を用いて4.97時間ステップで95.97%のテスト精度を得た。 Spiking Neural Networks (SNNs) have emerged as a popular spatio-temporal computing paradigm for complex vision tasks. Recently proposed SNN training algorithms have significantly reduced the number of time steps (down to 1) for improved latency and energy efficiency; however, they target only convolutional neural networks (CNNs). These algorithms, when applied to the recently spotlighted vision transformers (ViTs), either require a large number of time steps or fail to converge. Based on analysis of the histograms of the ANN and SNN activation maps, we hypothesize that each ViT block has a different sensitivity to the number of time steps. We propose a novel training framework that dynamically allocates the number of time steps to each ViT module depending on a trainable score assigned to each time step. In particular, we generate a scalar binary time step mask that filters spikes emitted by each neuron in a leaky-integrate-and-fire (LIF) layer. 
The resulting SNNs have high activation sparsity and require only accumulate operations (AC), except for the input embedding layer, in contrast to expensive multiply-and-accumulates (MAC) needed in traditional ViTs. This yields significant improvements in energy efficiency. We evaluate our training framework and resulting SNNs on image recognition tasks including CIFAR10, CIFAR100, and ImageNet with different ViT architectures. We obtain a test accuracy of 95.97% with 4.97 time steps with direct encoding on CIFAR10.	翻訳日:2023-11-29 20:23:59 公開日:2023-11-28
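The masking idea in the entry above (arXiv 2311.16456) can be illustrated with a toy example. What follows is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the leak factor, the threshold, the hard reset, and the hand-written mask are all hypothetical, and the abstract does not say how the mask is derived from the trainable scores. It shows a single LIF neuron whose spikes are filtered by a binary time step mask, and why binary spikes need only accumulate operations.

```python
# Toy sketch (assumptions labeled): one leaky-integrate-and-fire (LIF) neuron
# with a binary time step mask. All constants and names are illustrative.

def lif_with_time_mask(inputs, mask, leak=0.9, threshold=1.0):
    """Simulate one LIF neuron over len(inputs) time steps.

    inputs: input current at each time step
    mask:   0/1 value per time step; 0 suppresses any spike at that step
    Returns one 0/1 spike value per time step, after masking.
    """
    if len(inputs) != len(mask):
        raise ValueError("inputs and mask must have the same length")
    membrane = 0.0
    spikes = []
    for current, keep in zip(inputs, mask):
        membrane = leak * membrane + current        # leaky integration
        fired = 1 if membrane >= threshold else 0   # threshold crossing
        if fired:
            membrane = 0.0                          # hard reset (an assumption)
        spikes.append(fired * keep)                 # the mask filters the spike
    return spikes


def accumulate_only(weights, spikes):
    """Weighted sum over binary spikes: additions only, no multiplications."""
    return sum(w for w, s in zip(weights, spikes) if s)


inputs = [0.6, 0.6, 0.6, 0.6]
unmasked = lif_with_time_mask(inputs, [1, 1, 1, 1])
masked = lif_with_time_mask(inputs, [1, 1, 0, 0])   # keep only the first two steps
```

In this sketch a mask of `[1, 1, 0, 0]` means the downstream layer sees spikes from only two of the four time steps, which is one way to read "allocating a number of time steps" to a module.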
# ジェネリスト・ファンデーション・モデルは特殊目的チューニングに勝るか? 医学におけるケーススタディ Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine ( http://arxiv.org/abs/2311.16452v1 ) ライセンス: Link先を確認	Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, Eric Horvitz	(参考訳) GPT-4のような一般的な基礎モデルは、様々な領域やタスクにおいて驚くべき能力を示している。しかし、微調整モデルの専門的な能力にはマッチしないという仮定が一般的である。例えば、医療能力ベンチマークにおけるこれまでのほとんどの調査は、BioGPTやMed-PaLMの取り組みによって実証されたように、ドメイン固有のトレーニングを活用している。本研究は, GPT-4の医学的課題評価における能力について, 専門訓練の欠如による先行研究に基づくものである。モデルのアウトオブボックス機能を強調するために単純なプロンプトを使うのではなく、プロンプトエンジニアリングを体系的に調査する。イノベーションを促進することで、より深い専門的能力が解放され、gpt-4が医療ベンチマークの先行成果を上回ったことが分かります。調査するプロンプトメソッドは汎用的であり、専門分野の専門知識を特に使用せず、専門家によるコンテンツの必要性を排除しています。我々の実験設計は、迅速なエンジニアリングプロセスにおける過度な適合を慎重に制御する。我々は,いくつかのプロンプト戦略の構成に基づき,medpromptを紹介する。 Medpromptを使用すると、GPT-4はMultiMedQAスイートのベンチマークデータセットの9つすべてに対して最先端の結果を得る。この手法は、Med-PaLM 2のような主要なスペシャリストモデルよりも、桁違いに少ない精度で性能を向上する。 MedpromptによるGPT-4のステアリングは、MedQAデータセットの27%のエラー率を、これまでスペシャリストモデルで達成された最良のメソッドに対して達成し、初めて90%を超えた。医療問題以外にも,電気工学,機械学習,哲学,会計学,法学,看護学,臨床心理学における試験戦略の研究を通じて,medpromptが他の領域に一般化し,そのアプローチが広く適用可能であることを示す。 Generalist foundation models such as GPT-4 have displayed surprising capabilities in a wide variety of domains and tasks. Yet, there is a prevalent assumption that they cannot match specialist capabilities of fine-tuned models. For example, most explorations to date on medical competency benchmarks have leveraged domain-specific training, as exemplified by efforts on BioGPT and Med-PaLM. We build on a prior study of GPT-4's capabilities on medical challenge benchmarks in the absence of special training. Rather than using simple prompting to highlight the model's out-of-the-box capabilities, we perform a systematic exploration of prompt engineering. 
We find that prompting innovation can unlock deeper specialist capabilities and show that GPT-4 easily tops prior leading results for medical benchmarks. The prompting methods we explore are general purpose, and make no specific use of domain expertise, removing the need for expert-curated content. Our experimental design carefully controls for overfitting during the prompt engineering process. We introduce Medprompt, based on a composition of several prompting strategies. With Medprompt, GPT-4 achieves state-of-the-art results on all nine of the benchmark datasets in the MultiMedQA suite. The method outperforms leading specialist models such as Med-PaLM 2 by a significant margin with an order of magnitude fewer calls to the model. Steering GPT-4 with Medprompt achieves a 27% reduction in error rate on the MedQA dataset over the best methods to date achieved with specialist models and surpasses a score of 90% for the first time. Beyond medical problems, we show the power of Medprompt to generalize to other domains and provide evidence for the broad applicability of the approach via studies of the strategy on exams in electrical engineering, machine learning, philosophy, accounting, law, nursing, and clinical psychology.	翻訳日:2023-11-29 20:23:35 公開日:2023-11-28
# 部分共有u-netを用いたジョイントデータインフィルメントを用いた効率的なマルチモーダル拡散モデル Efficient Multimodal Diffusion Models Using Joint Data Infilling with Partially Shared U-Net ( http://arxiv.org/abs/2311.16488v1 ) ライセンス: Link先を確認	Zizhao Hu, Shaochong Jia, Mohammad Rostami	(参考訳) 近年,クロスモーダルデータ変換やマルチモーダルデータ生成のための分散に適合する拡散モデルが提案されている。しかし、これらの手法は広範なスケーリングに依存しており、非効率性やモダリティ間の干渉を見越している。我々は,テキストと画像の入力が専用レイヤを通過することを可能にする効率的なマルチモーダル拡散モデルである部分共有型u-net (ps-u-net) アーキテクチャを開発した。画像インパインティングに触発されて,単純なジョイント分布のみを学習しながら条件付き生成のための新しいシナリオを導入する,効率的なマルチモーダルサンプリング手法を提案する。我々のMS-COCOデータセットを実験的に調べたところ、本手法は既存のマルチモーダル拡散モデルと比較して高い品質でマルチモーダルテキストと画像データを生成する一方で、より高速なトレーニング、高速なマルチモーダルサンプリング、より柔軟な生成を行う。 Recently, diffusion models have been used successfully to fit distributions for cross-modal data translation and multimodal data generation. However, these methods rely on extensive scaling, overlooking the inefficiency and interference between modalities. We develop Partially Shared U-Net (PS-U-Net) architecture which is an efficient multimodal diffusion model that allows text and image inputs to pass through dedicated layers and skip-connections for preserving modality-specific fine-grained details. Inspired by image inpainting, we also propose a new efficient multimodal sampling method that introduces new scenarios for conditional generation while only requiring a simple joint distribution to be learned. Our empirical exploration of the MS-COCO dataset demonstrates that our method generates multimodal text and image data with higher quality compared to existing multimodal diffusion models while having a comparable size, faster training, faster multimodal sampling, and more flexible generation.	翻訳日:2023-11-29 20:14:14 公開日:2023-11-28
# 意思決定型学習のロバスト性について On the Robustness of Decision-Focused Learning ( http://arxiv.org/abs/2311.16487v1 ) ライセンス: Link先を確認	Yehya Farhat	(参考訳) 決定焦点学習(Decision-Focused Learning, DFL)は、機械学習(ML)モデルを訓練し、不完全な最適化問題の欠落パラメータを予測するための新興学習パラダイムである。 DFLは、予測と最適化タスクを統合することで、エンドツーエンドシステムでMLモデルをトレーニングし、トレーニングとテストの目的の整合性を向上させる。 DFLは多くの約束を示し、多くの現実世界のアプリケーションで意思決定に革命をもたらす能力を持っている。しかし、これらのモデルの敵攻撃時の性能についてはほとんど分かっていない。我々は,10種類のDFL手法を採用し,その性能を予測列最適化問題に適応した2つの明確な攻撃条件下でベンチマークする。本研究は,モデルのロバスト性が,接地ラベルから逸脱することなく最適な決定につながる予測を見つける能力と高い相関関係にあるという仮説を提案する。さらに、この条件に違反するモデルをターゲットにする方法を考察し、トレーニングサイクルの最後に達成された最適性に応じてこれらのモデルがどのように反応するかを示す。 Decision-Focused Learning (DFL) is an emerging learning paradigm that tackles the task of training a machine learning (ML) model to predict missing parameters of an incomplete optimization problem, where the missing parameters are predicted. DFL trains an ML model in an end-to-end system, by integrating the prediction and optimization tasks, providing better alignment of the training and testing objectives. DFL has shown a lot of promise and holds the capacity to revolutionize decision-making in many real-world applications. However, very little is known about the performance of these models under adversarial attacks. We adopt ten unique DFL methods and benchmark their performance under two distinctly focused attacks adapted towards the Predict-then-Optimize problem setting. Our study proposes the hypothesis that the robustness of a model is highly correlated with its ability to find predictions that lead to optimal decisions without deviating from the ground-truth label. Furthermore, we provide insight into how to target the models that violate this condition and show how these models respond differently depending on the achieved optimality at the end of their training cycles.	翻訳日:2023-11-29 20:13:54 公開日:2023-11-28
# StyleCap: 音声と言語による自己教師型学習モデルに基づく音声の自動キャプション StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models ( http://arxiv.org/abs/2311.16509v1 ) ライセンス: Link先を確認	Kazuki Yamauchi, Yusuke Ijima, Yuki Saito	(参考訳) 音声に現れる話し方の自然言語記述を生成する方法であるStyleCapを提案する。従来のパラ言語/非言語情報認識技術のほとんどは、分類分類や事前定義されたラベルの強度推定に重点を置いているが、認識結果を解釈可能な方法で推論することはできない。 stylecapは、音声から発話スタイルプロンプトを生成するエンドツーエンドの方法、すなわち自動発話スタイルのキャプションを生成するための第一歩として、音声と自然言語記述のペアデータを使用して、音声表現ベクトルから大言語モデル(llm)ベースのテキストデコーダに供給されるプレフィックスベクトルを予測するニューラルネットワークを訓練する。本稿では,この課題に適したテキストデコーダと音声特徴表現について検討する。実験結果から,よりリッチなLLMをテキストデコーダ,音声自己教師学習(SSL)機能に活用したStyleCapは,音声文の精度と多様性を向上することが示された。 StyleCapが生成した話し方キャプションのサンプルが公開されている。 We propose StyleCap, a method to generate natural language descriptions of speaking styles appearing in speech. Although most of conventional techniques for para-/non-linguistic information recognition focus on the category classification or the intensity estimation of pre-defined labels, they cannot provide the reasoning of the recognition result in an interpretable manner. As a first step towards an end-to-end method for generating speaking-style prompts from speech, i.e., automatic speaking-style captioning, StyleCap uses paired data of speech and natural language descriptions to train neural networks that predict prefix vectors fed into a large language model (LLM)-based text decoder from a speech representation vector. We explore an appropriate text decoder and speech feature representation suitable for this new task. The experimental results demonstrate that our StyleCap leveraging richer LLMs for the text decoder, speech self-supervised learning (SSL) features, and sentence rephrasing augmentation improves the accuracy and diversity of generated speaking-style captions. Samples of speaking-style captions generated by our StyleCap are publicly available.	翻訳日:2023-11-29 20:02:58 公開日:2023-11-28
# 拡散誘導を伴う流れマッチングのストレートな軌跡の探索 Exploring Straighter Trajectories of Flow Matching with Diffusion Guidance ( http://arxiv.org/abs/2311.16507v1 ) ライセンス: Link先を確認	Siyu Xing, Jie Cao, Huaibo Huang, Xiao-Yu Zhang, Ran He	(参考訳) 生成モデルのパラダイムとしてのフローマッチングは、さまざまなドメインで注目すべき成功を収めます。しかし、既存の方法はマルチラウンドトレーニングまたはミニバッチ内の知識を使用しており、ストレートトラジェクタの適切な結合戦略を見つける上での課題となっている。この問題に対処するため,我々はフローマッチングのストレートトラジェクタ(straightfm)という新しい手法を提案する。分布レベル全体から拡散モデルによって導かれる結合戦略により軌道を直線化する。まず,トラジェクタをストレート化するための結合戦略を提案し,拡散モデル指導下で画像と雑音サンプルのカップリングを作成する。第二に、straightfmは実際のデータを統合してトレーニングを強化し、ニューラルネットワークを使って画像からノイズサンプルへの別の結合プロセスをパラメータ化する。ストレートFMは、相互に相補的な2つの方向からの結合で共同最適化され、より直線的な軌道となり、ワンステップと数ステップの両方を生成できる。広範囲な実験により、StraightFMはより少ないステップで高品質なサンプルを生成することが示された。 StraightFMは、拡散法と従来のフローマッチング法の間でFIDが低い視覚的に魅力的な画像を生成する。潜時空間(すなわち潜時拡散)において、straightfmは、celeba-hq 256データセットの既存の方法に比べて10以下のサンプリングステップで低いkid値を達成する。 Flow matching as a paradigm of generative model achieves notable success across various domains. However, existing methods use either multi-round training or knowledge within minibatches, posing challenges in finding a favorable coupling strategy for straight trajectories. To address this issue, we propose a novel approach, Straighter trajectories of Flow Matching (StraightFM). It straightens trajectories with the coupling strategy guided by diffusion model from entire distribution level. First, we propose a coupling strategy to straighten trajectories, creating couplings between image and noise samples under diffusion model guidance. Second, StraightFM also integrates real data to enhance training, employing a neural network to parameterize another coupling process from images to noise samples. StraightFM is jointly optimized with couplings from above two mutually complementary directions, resulting in straighter trajectories and enabling both one-step and few-step generation. Extensive experiments demonstrate that StraightFM yields high quality samples with fewer step. 
StraightFM generates visually appealing images with a lower FID among diffusion and traditional flow matching methods within 5 sampling steps when trained on pixel space. In the latent space (i.e., Latent Diffusion), StraightFM achieves a lower KID value compared to existing methods on the CelebA-HQ 256 dataset in fewer than 10 sampling steps.	翻訳日:2023-11-29 20:02:11 公開日:2023-11-28
# 古典機械における量子計算のシミュレーション:サーベイ Simulating Quantum Computations on Classical Machines: A Survey ( http://arxiv.org/abs/2311.16505v1 ) ライセンス: Link先を確認	Kieran Young, Marcus Scese, Ali Ebnenasir	(参考訳) 本稿では,古典計算機における量子シミュレーション手法と量子シミュレータの総合的研究について述べる。まず、150以上のシミュレーターと量子ライブラリーを網羅的に研究する。そして、アクティブに維持されているシミュレータをショートリスト化し、10キュービット以上の量子アルゴリズムのシミュレーションを可能にする。その結果,2010年以降,最も効率的でアクティブなシミュレータが開発されていることがわかった。また,シュロディンガー法,ファインマン経路積分法,ハイゼンベルク法,ハイブリッド法など,最も重要なシミュレーション手法の分類法を提案する。ほとんどのシミュレータは、シュロディンガーに基づくアプローチのカテゴリに該当する。しかし、他のカテゴリに属する効率的なシミュレータはいくつか存在する。また、量子フレームワークが独自のソフトウェアツールのクラスを形成し、シミュレーター/シミュレーションメソッドを選択することで、アルゴリズム設計者にさらなる柔軟性を提供することに留意する。この研究のもうひとつの貢献は、様々なシミュレータで使用される最適化手法の使用と分類である。現状のシミュレータの中には、ソフトウェアとハードウェアの最適化技術を組み合わせて量子回路のシミュレーションをスケールアップするものもある。本研究は,教育と研究における量子シミュレータの利用をさらに促進するための今後の研究のロードマップを提供することで要約する。 We present a comprehensive study of quantum simulation methods and quantum simulators for classical computers. We first study an exhaustive set of 150+ simulators and quantum libraries. Then, we short-list the simulators that are actively maintained and enable simulation of quantum algorithms for more than 10 qubits. As a result, we realize that most efficient and actively maintained simulators have been developed after 2010. We also provide a taxonomy of the most important simulation methods, namely Schrodinger-based, Feynman path integrals, Heisenberg-based, and hybrid methods. We observe that most simulators fall in the category of Schrodinger-based approaches. However, there are a few efficient simulators belonging to other categories. We also make note that quantum frameworks form their own class of software tools that provide more flexibility for algorithm designers with a choice of simulators/simulation method. Another contribution of this study includes the use and classification of optimization methods used in a variety of simulators. 
We observe that some state-of-the-art simulators utilize a combination of software and hardware optimization techniques to scale up the simulation of quantum circuits. We summarize this study by providing a roadmap for future research that can further enhance the use of quantum simulators in education and research.	翻訳日:2023-11-29 20:01:50 公開日:2023-11-28
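For readers unfamiliar with the Schrodinger-based category named in the survey entry above (arXiv 2311.16505), the following is a minimal sketch of the general technique: the simulator stores all 2^n amplitudes of the state vector and updates them gate by gate. It is not taken from any simulator covered by the survey; the function names and the bit-ordering convention (qubit 0 is the least significant bit) are assumptions made for illustration.

```python
import math

# Toy Schrodinger-style (full state vector) simulation. Names and the
# qubit-ordering convention are illustrative assumptions.

def apply_single_qubit_gate(state, gate, target, n_qubits):
    """Apply a 2x2 gate to qubit `target` (0 = least significant bit)."""
    new_state = list(state)
    bit = 1 << target
    for i in range(1 << n_qubits):
        if (i & bit) == 0:                 # visit each amplitude pair once
            j = i | bit
            a0, a1 = state[i], state[j]
            new_state[i] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[j] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


def apply_cnot(state, control, target, n_qubits):
    """Swap the amplitudes that differ in `target` wherever `control` is 1."""
    new_state = list(state)
    cbit, tbit = 1 << control, 1 << target
    for i in range(1 << n_qubits):
        if (i & cbit) and not (i & tbit):
            j = i | tbit
            new_state[i], new_state[j] = state[j], state[i]
    return new_state


s = 1.0 / math.sqrt(2.0)
HADAMARD = [[s, s], [s, -s]]

state = [1.0, 0.0, 0.0, 0.0]               # |00>
state = apply_single_qubit_gate(state, HADAMARD, target=0, n_qubits=2)
state = apply_cnot(state, control=0, target=1, n_qubits=2)
probabilities = [abs(a) ** 2 for a in state]   # Bell state: mass on |00> and |11>
```

Because the state list has 2^n entries, memory grows exponentially with the number of qubits, which is why a threshold such as "more than 10 qubits" is a meaningful filter for simulators.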
# 神経放射場における方向積分の再考 Rethinking Directional Integration in Neural Radiance Fields ( http://arxiv.org/abs/2311.16504v1 ) ライセンス: Link先を確認	Congyue Deng, Jiawei Yang, Leonidas Guibas, Yue Wang	(参考訳) 近年の研究では、Neural Radiance Field (NeRF) を用いて多視点3D再構成を行い、フォトリアリスティックシーンのレンダリングにおいて大きな飛躍をもたらした。しかし、その有効性にもかかわらず、NeRFは光場レンダリングや画像ベースビュー合成と比較して、視野依存効果の学習能力に限界がある。そこで我々は,NeRF のレンダリングの精度を大幅に向上させつつ,NeRF の任意の変化に対して数行のコード変更が簡単な NeRF レンダリング方程式の修正を導入する。積分演算子と方向デコーダネットワークを交換することにより、線に沿った位置特徴のみを統合し、方向項を積分から外し、ビュー依存コンポーネントと独立コンポーネントをアンタングル化する。修正方程式は、ディラック密度を持つ物体表面上の理想の場合の古典的な体積レンダリングと等価である。さらに,ネットワーク近似と数値積分による誤差により,従来のNeRFと比較して誤差蓄積の少ない収束特性が向上することが証明された。また,修正方程式は学習レイ埋め込みによる光場レンダリングと解釈できることを示した。異なるnerf変動に関する実験では、単純な修正でビュー依存効果の品質が一貫して改善されている。 Recent works use the Neural radiance field (NeRF) to perform multi-view 3D reconstruction, providing a significant leap in rendering photorealistic scenes. However, despite its efficacy, NeRF exhibits limited capability of learning view-dependent effects compared to light field rendering or image-based view synthesis. To that end, we introduce a modification to the NeRF rendering equation which is as simple as a few lines of code change for any NeRF variations, while greatly improving the rendering quality of view-dependent effects. By swapping the integration operator and the direction decoder network, we only integrate the positional features along the ray and move the directional terms out of the integration, resulting in a disentanglement of the view-dependent and independent components. The modified equation is equivalent to the classical volumetric rendering in ideal cases on object surfaces with Dirac densities. Furthermore, we prove that with the errors caused by network approximation and numerical integration, our rendering equation exhibits better convergence properties with lower error accumulations compared to the classical NeRF. We also show that the modified equation can be interpreted as light field rendering with learned ray embeddings. 
Experiments on different NeRF variations show consistent improvements in the quality of view-dependent effects with our simple modification.	翻訳日:2023-11-29 20:01:31 公開日:2023-11-28
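The swap described in the entry above (arXiv 2311.16504) can be sketched numerically. This is a hedged toy, not the authors' code: `decoder` is a hypothetical stand-in for the direction decoder network, and the features are made-up two-dimensional vectors. The classical form decodes a color at each sample and integrates, C = sum_i w_i c(f_i, d), while the modified form integrates the positional features first and decodes once, C = c(sum_i w_i f_i, d).

```python
import math

# Toy comparison of the two rendering orders. The decoder and all numbers
# are illustrative assumptions.

def volume_weights(densities, deltas):
    """Volume rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights = []
    transmittance = 1.0
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def decoder(feature, direction):
    """Hypothetical nonlinear direction decoder (stand-in for a network)."""
    return math.tanh(sum(f * d for f, d in zip(feature, direction)))


def render_classical(weights, features, direction):
    # Decode every sample, then integrate the decoded values along the ray.
    return sum(w * decoder(f, direction) for w, f in zip(weights, features))


def render_swapped(weights, features, direction):
    # Integrate the positional features along the ray, then decode once.
    dim = len(features[0])
    pooled = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    return decoder(pooled, direction)


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
direction = [0.5, 2.0]
deltas = [0.1, 0.1, 0.1]

# Surface-like case: (numerically) all weight falls on the second sample.
surface_w = volume_weights([0.0, 1e4, 0.5], deltas)
surface_gap = abs(render_classical(surface_w, features, direction)
                  - render_swapped(surface_w, features, direction))

# Diffuse case: weight is spread over several samples.
diffuse_w = volume_weights([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
diffuse_gap = abs(render_classical(diffuse_w, features, direction)
                  - render_swapped(diffuse_w, features, direction))
```

In the surface-like case the weights collapse onto one sample and the two orders agree, matching the abstract's statement about Dirac densities; with spread-out weights and a nonlinear decoder they generally differ.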
# FisheyeViTと拡散ベースのモーションリファインメントを用いたエゴセントリック全身モーションキャプチャ Egocentric Whole-Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement ( http://arxiv.org/abs/2311.16495v1 ) ライセンス: Link先を確認	Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar, Danhang Tang, Thabo Beeler, Christian Theobalt	(参考訳) 本研究では,人体と手の動きを同時に推定する、単一の魚眼カメラを用いたエゴセントリックな全身モーションキャプチャを探索する。このタスクは、高品質なデータセットの欠如、魚眼カメラの歪み、人体の自己遮蔽という3つの要因によって重大な課題を提起する。これらの課題に対処するために,FisheyeViTを用いて魚眼画像の特徴を抽出し,その特徴を3次元人体ポーズ予測のためのピクセル整列した3次元ヒートマップ表現に変換する新しい手法を提案する。ハンドトラッキングには, 3次元ハンドポーズの回帰のための専用のハンド検出ネットワークとハンドポーズ推定ネットワークを組み込んでいる。最後に, 拡散に基づく全身運動の事前分布モデルを開発し, 関節の不確かさを考慮しつつ, 推定した全身運動を洗練する。これらのネットワークをトレーニングするために、我々は、さまざまな全身動作シーケンスでキャプチャされた84万枚の高品質なエゴセントリック画像からなる、EgoWholeBodyという大規模な合成データセットを収集した。定量的,定性的な評価は,単一のエゴセントリックカメラから高品質な全身運動を推定する本手法の有効性を示す。 In this work, we explore egocentric whole-body motion capture using a single fisheye camera, which simultaneously estimates human body and hand motion. This task presents significant challenges due to three factors: the lack of high-quality datasets, fisheye camera distortion, and human body self-occlusion. To address these challenges, we propose a novel approach that leverages FisheyeViT to extract fisheye image features, which are subsequently converted into pixel-aligned 3D heatmap representations for 3D human body pose prediction. For hand tracking, we incorporate dedicated hand detection and hand pose estimation networks for regressing 3D hand poses. Finally, we develop a diffusion-based whole-body motion prior model to refine the estimated whole-body motion while accounting for joint uncertainties. To train these networks, we collect a large synthetic dataset, EgoWholeBody, comprising 840,000 high-quality egocentric images captured across a diverse range of whole-body motion sequences. Quantitative and qualitative evaluations demonstrate the effectiveness of our method in producing high-quality whole-body motion estimates from a single egocentric camera.	翻訳日:2023-11-29 19:58:41 公開日:2023-11-28
# Graph Prompt Learning: 総合的な調査とその先 Graph Prompt Learning: A Comprehensive Survey and Beyond ( http://arxiv.org/abs/2311.16534v1 ) ライセンス: Link先を確認	Xiangguo Sun, Jiawen Zhang, Xixi Wu, Hong Cheng, Yun Xiong, Jia Li	(参考訳) 汎用人工知能(Artificial General Intelligence, AGI)は多くの分野に革命をもたらしたが、相互接続された世界の基礎であるグラフデータとの統合は、まだ初期段階にある。本稿では、AGIにおけるグラフプロンプトという新興領域に関する先駆的な調査を行い、AGIアプリケーションにおいてグラフデータを活用する上での重要な課題と機会に対処する。自然言語処理とコンピュータビジョンにおけるAGIの大幅な進歩にもかかわらず、グラフデータへの応用は比較的研究が進んでいない。この調査は、グラフデータを扱うAGIの現在の状況を批判的に評価し、グラフ固有のクロスモダリティ、クロスドメイン、クロスタスクアプリケーションにおける特有の課題を強調する。私たちの研究は、グラフプロンプト学習を理解するための統一的なフレームワークを初めて提案し、グラフ領域におけるプロンプトトークン、トークン構造、挿入パターンを明確にする。グラフプロンプトの固有の特性を調べ、その柔軟性、表現力、既存のグラフモデルとの相互作用を探求する。包括的な分類体系により、この分野の100以上の研究を分類し、ノードレベル、エッジレベル、グラフレベルの目標にまたがる事前学習タスクと対応付ける。さらに、グラフプロンプトの研究を支援し前進させるために、PythonライブラリであるProGと、それに付随するWebサイトを紹介する。この調査は、現在の課題と今後の方向性について議論し、AGIにおけるグラフプロンプト研究のロードマップを提供する。この包括的分析を通じて、グラフデータにおけるAGIのさらなる探索と実践的応用を促進し、AGI分野およびその先を再構築する可能性を明確にする。 WebサイトとProGは、それぞれ https://github.com/WxxShirley/Awesome-Graph-Prompt と https://github.com/sheldonresearch/ProG でアクセスできる。 Artificial General Intelligence (AGI) has revolutionized numerous fields, yet its integration with graph data, a cornerstone in our interconnected world, remains nascent. This paper presents a pioneering survey on the emerging domain of graph prompts in AGI, addressing key challenges and opportunities in harnessing graph data for AGI applications. Despite substantial advancements in AGI across natural language processing and computer vision, the application to graph data is relatively underexplored. This survey critically evaluates the current landscape of AGI in handling graph data, highlighting the distinct challenges in cross-modality, cross-domain, and cross-task applications specific to graphs. Our work is the first to propose a unified framework for understanding graph prompt learning, offering clarity on prompt tokens, token structures, and insertion patterns in the graph domain. 
We delve into the intrinsic properties of graph prompts, exploring their flexibility, expressiveness, and interplay with existing graph models. A comprehensive taxonomy categorizes over 100 works in this field, aligning them with pre-training tasks across node-level, edge-level, and graph-level objectives. Additionally, we present ProG, a Python library, and an accompanying website to support and advance research in graph prompting. The survey culminates in a discussion of current challenges and future directions, offering a roadmap for research in graph prompting within AGI. Through this comprehensive analysis, we aim to catalyze further exploration and practical applications of AGI in graph data, underlining its potential to reshape AGI fields and beyond. The website and ProG can be accessed at https://github.com/WxxShirley/Awesome-Graph-Prompt and https://github.com/sheldonresearch/ProG, respectively.	翻訳日:2023-11-29 19:50:59 公開日:2023-11-28
# ナノダイヤモンドエマルションによる量子センシングとクリック化学共役 Nanodiamond emulsions for enhanced quantum sensing and click-chemistry conjugation ( http://arxiv.org/abs/2311.16530v1 ) ライセンス: Link先を確認	Henry J. Shulevitz, Ahmad Amirshaghaghi, Mathieu Ouellet, Caroline Brustoloni, Shengsong Yang, Jonah J. Ng, Tzu-Yung Huang, Davit Jishkariani, Christopher B. Murray, Andrew Tsourkas, Cherie R. Kagan, and Lee C. Bassett	(参考訳) 窒素空孔(NV)中心を含むナノダイヤモンドは、生物学的および化学的環境における局所場のコロイド量子センサーとして機能する。しかし、ナノダイヤモンドの表面はコロイドの安定性やnv中心の光学的およびスピン的性質を損なうことなく修正することが困難である。本稿では,ナノダイアモンドを薄いエマルション層で被覆し,その量子特性を保ち,コロイド安定性を高め,後続の架橋およびクリック化学共役反応のための官能基を提供する方法について報告する。この技術を実証するために, 2つの異なる戦略を用いて共役を可能にするカルボキシル基とアジド基を有する両親媒性化合物の組み合わせでナノダイヤモンドをデコレートする。我々は,エマルション層がNV中心のスピン寿命に及ぼす影響について検討し,ナノダイアモンドの常磁性イオンに対する化学感度を$T_1$リラクソメトリーを用いて定量化する。このナノダイヤモンド表面機能化への一般的なアプローチは、量子ナノメディシンと生物学的センシングの進歩を可能にする。 Nanodiamonds containing nitrogen-vacancy (NV) centers can serve as colloidal quantum sensors of local fields in biological and chemical environments. However, nanodiamond surfaces are challenging to modify without degrading their colloidal stability or the NV center's optical and spin properties. Here, we report a simple and general method to coat nanodiamonds with a thin emulsion layer that preserves their quantum features, enhances their colloidal stability, and provides functional groups for subsequent crosslinking and click-chemistry conjugation reactions. To demonstrate this technique, we decorate the nanodiamonds with combinations of carboxyl- and azide-terminated amphiphiles that enable conjugation using two different strategies. We study the effect of the emulsion layer on the NV center's spin lifetime, and we quantify the nanodiamonds' chemical sensitivity to paramagnetic ions using $T_1$ relaxometry. This general approach to nanodiamond surface functionalization will enable advances in quantum nanomedicine and biological sensing.	翻訳日:2023-11-29 19:49:46 公開日:2023-11-28
# 需要学習によるコンテキスト動的価格の実用性 Utility Fairness in Contextual Dynamic Pricing with Demand Learning ( http://arxiv.org/abs/2311.16528v1 ) ライセンス: Link先を確認	Xi Chen, David Simchi-Levi, Yining Wang	(参考訳) 本稿では,需要の不確実性を考慮したシナリオにおいて,実用フェアネス制約下でのパーソナライズされた価格設定のための新しいコンテキストバンディットアルゴリズムを提案する。動的価格設定と需要学習を取り入れた当社のアプローチは,価格戦略における公正性の重要課題に対処する。まず、静的な全情報設定を精査し、最適な価格設定を制約付き最適化問題として定式化する。本稿では,理想ポリシーを効率的に,ほぼ計算するための近似アルゴリズムを提案する。また,より詳細な研究と拡張の基礎を規定する簡略な方針を導出し,公平性制約に基づく最適文脈価格政策の構造を特徴付けるために,数理解析と計算研究を用いる。さらに,本研究を需要学習を伴う動的価格問題に拡張し,公平性制約が付加する複雑性を強調する非標準後悔下限を確立した。本研究は,公正のコストとその効用と収益の最大化のバランスへの影響を包括的に分析するものである。この研究は、データ駆動動的価格設定のアルゴリズム効率への倫理的配慮を統合するための一歩である。 This paper introduces a novel contextual bandit algorithm for personalized pricing under utility fairness constraints in scenarios with uncertain demand, achieving an optimal regret upper bound. Our approach, which incorporates dynamic pricing and demand learning, addresses the critical challenge of fairness in pricing strategies. We first delve into the static full-information setting to formulate an optimal pricing policy as a constrained optimization problem. Here, we propose an approximation algorithm for efficiently and approximately computing the ideal policy. We also use mathematical analysis and computational studies to characterize the structures of optimal contextual pricing policies subject to fairness constraints, deriving simplified policies which lays the foundations of more in-depth research and extensions. Further, we extend our study to dynamic pricing problems with demand learning, establishing a non-standard regret lower bound that highlights the complexity added by fairness constraints. Our research offers a comprehensive analysis of the cost of fairness and its impact on the balance between utility and revenue maximization. This work represents a step towards integrating ethical considerations into algorithmic efficiency in data-driven dynamic pricing.	翻訳日:2023-11-29 19:49:23 公開日:2023-11-28
# ロバストオーバーフィッティングについて: 敵対的訓練が誘導する分布が重要である On robust overfitting: adversarial training induced distribution matters ( http://arxiv.org/abs/2311.16526v1 ) ライセンス: Link先を確認	Runzhi Tian, Yongyi Mao	(参考訳) 敵対的訓練は、修正された損失関数を持つ標準訓練と見なすことができる。しかし、その一般化誤差は標準損失下での標準訓練よりもはるかに大きいように見える。この現象はロバストオーバーフィッティングとして知られ、大きな研究の注目を集めているが、ほとんど謎のままである。本稿ではまず,ロバストオーバーフィッティングが,敵対的訓練(特にPGDに基づく敵対的訓練)の軌跡に沿った摂動誘起分布の一般化困難度の増加と相関することを実験的に示す。次に,摂動誘起分布に関する一般化誤差に対する新しい上限を与える。この上限では,「局所分散(local dispersion)」と呼ばれる摂動作用素の概念が重要な役割を果たす。 Adversarial training may be regarded as standard training with a modified loss function. But its generalization error appears much larger than that of standard training under standard loss. This phenomenon, known as robust overfitting, has attracted significant research attention and remains largely a mystery. In this paper, we first show empirically that robust overfitting correlates with the increasing generalization difficulty of the perturbation-induced distributions along the trajectory of adversarial training (specifically PGD-based adversarial training). We then provide a novel upper bound for generalization error with respect to the perturbation-induced distributions, in which a notion of the perturbation operator, referred to as "local dispersion", plays an important role.	翻訳日:2023-11-29 19:48:58 公開日:2023-11-28
# ニューラル陰関数を用いたパノラマX線写真からの3次元歯の再構築 3D Teeth Reconstruction from Panoramic Radiographs using Neural Implicit Functions ( http://arxiv.org/abs/2311.16524v1 ) ライセンス: Link先を確認	Sihwa Park, Seongjun Kim, In-Seok Song, Seung Jun Baek	(参考訳) パノラマX線撮影は歯科診療や研究で広く用いられている画像モダリティである。しかし、平坦化された2次元画像しか得られないため、歯の構造の詳細な評価が制限される。本稿では,ニューラル陰関数を用いたパノラマX線写真からの3次元歯の再構築のための枠組みであるOccudentを提案する。我々の知る限り、これはそのような最初の研究である。 3次元空間の所定の点について、陰関数は、その点が歯によって占有されているかどうかを推定し、3次元の歯形の境界を暗黙的に決定する。まず、Occudentは入力パノラマX線写真にマルチラベルセグメンテーションを適用する。次に、セグメンテーション出力から歯形埋め込みおよび歯のクラス埋め込みを生成し、再構成ネットワークに供給する。形状とクラスを組み合わせた埋め込みを陰関数に効果的に組み込むために、Conditional eXcitation (CX)と呼ばれる新しいモジュールを提案する。 Occudentの性能は,定量的および定性的な指標の両方を用いて評価する。重要なことは、Occudentは、合成画像を使用した最近の研究と異なり、実際のパノラマX線写真を入力として訓練され、検証されていることである。実験により、Occudentが最先端の手法よりも優れていることを示す。 Panoramic radiography is a widely used imaging modality in dental practice and research. However, it only provides flattened 2D images, which limits the detailed assessment of dental structures. In this paper, we propose Occudent, a framework for 3D teeth reconstruction from panoramic radiographs using neural implicit functions, which, to the best of our knowledge, is the first work to do so. For a given point in 3D space, the implicit function estimates whether the point is occupied by a tooth, and thus implicitly determines the boundaries of 3D tooth shapes. Firstly, Occudent applies multi-label segmentation to the input panoramic radiograph. Next, tooth shape embeddings as well as tooth class embeddings are generated from the segmentation outputs, which are fed to the reconstruction network. A novel module called Conditional eXcitation (CX) is proposed in order to effectively incorporate the combined shape and class embeddings into the implicit function. The performance of Occudent is evaluated using both quantitative and qualitative measures. Importantly, Occudent is trained and validated with actual panoramic radiographs as input, distinct from recent works which used synthesized images. 
Experiments demonstrate the superiority of Occudent over state-of-the-art methods.	翻訳日:2023-11-29 19:48:48 公開日:2023-11-28
# gnnに基づく電力網の動的特性の評価とナレッジグラフへの適用 Evaluation of dynamic characteristics of power grid based on GNN and application on knowledge graph ( http://arxiv.org/abs/2311.16522v1 ) ライセンス: Link先を確認	Hao Pei, Si Lin, Chuanfu Li, Che Wang, Haoming Chen, Sizhe Li	(参考訳) グラフニューラルネットワーク (GNN) を用いた電力網の故障検出手法が開発され, ネットワーク運用と保守におけるインテリジェントな故障診断の高度化を目的としている。このGNNベースのアプローチは、知識グラフと結合した特殊な電気的特徴抽出モデルにより、電力網内の故障ノードを特定する。時間的データを組み込んだ手法では、前および後続のノードの状態を利用して、現在の故障検出を支援する。ノード特徴抽出におけるこのGNNの有効性を検証するため,ニューラルネットワーク層内の各ノードからの出力特徴の相関解析を行った。実験結果から, シミュレーションシナリオの故障ノードを99.53%の精度で正確に検出できることがわかった。さらに、グラフニューラルネットワークの機能モデリングは、ノード間の障害の拡散方法の質的な検証を可能にし、障害ノードを分析する上で貴重な洞察を提供する。 A novel method for detecting faults in power grids using a graph neural network (GNN) has been developed, aimed at enhancing intelligent fault diagnosis in network operation and maintenance. This GNN-based approach identifies faulty nodes within the power grid through a specialized electrical feature extraction model coupled with a knowledge graph. Incorporating temporal data, the method leverages the status of nodes from preceding and subsequent time periods to aid in current fault detection. To validate the effectiveness of this GNN in extracting node features, a correlation analysis of the output features from each node within the neural network layer was conducted. The results from experiments show that this method can accurately locate fault nodes in simulated scenarios with a remarkable 99.53% accuracy. Additionally, the graph neural network's feature modeling allows for a qualitative examination of how faults spread across nodes, providing valuable insights for analyzing fault nodes.	翻訳日:2023-11-29 19:48:28 公開日:2023-11-28
# MobileDiffusion: モバイルデバイス上での1秒未満のテキスト・画像生成 MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices ( http://arxiv.org/abs/2311.16567v1 ) ライセンス: Link先を確認	Yang Zhao, Yanwu Xu, Zhisheng Xiao, Tingbo Hou	(参考訳) 大規模テキストから画像への拡散モデルのモバイルデバイスへの展開は、その大きなモデルサイズと遅い推論速度によって妨げられている。本稿では,アーキテクチャとサンプリング技術の両方における広範囲な最適化により得られた,高効率なテキストから画像への拡散モデルであるMobileDiffusionを提案する。画像生成品質を維持しつつ,冗長性を低減し,計算効率を高め,モデルのパラメータ数を最小化するために,モデルアーキテクチャ設計を包括的に検討する。さらに, 蒸留法と拡散GAN微調整法を用いて, それぞれ8段階, 1段階の推論を実現する。定量的および定性的に実施した実証研究は,提案手法の有効性を実証するものである。 MobileDiffusionは、モバイルデバイス上で$512\times512$の画像を生成する際に1秒未満(sub-second)という驚くべき推論速度を達成し、新たな最先端を確立している。 The deployment of large-scale text-to-image diffusion models on mobile devices is impeded by their substantial model size and slow inference speed. In this paper, we propose MobileDiffusion, a highly efficient text-to-image diffusion model obtained through extensive optimizations in both architecture and sampling techniques. We conduct a comprehensive examination of model architecture design to reduce redundancy, enhance computational efficiency, and minimize the model's parameter count, while preserving image generation quality. Additionally, we employ distillation and diffusion-GAN finetuning techniques on MobileDiffusion to achieve 8-step and 1-step inference respectively. Empirical studies, conducted both quantitatively and qualitatively, demonstrate the effectiveness of our proposed techniques. MobileDiffusion achieves a remarkable sub-second inference speed for generating a $512\times512$ image on mobile devices, establishing a new state of the art.	翻訳日:2023-11-29 19:38:01 公開日:2023-11-28
# DiffusionTalker:音声駆動型3次元顔ディフューザのパーソナライズとアクセラレーション DiffusionTalker: Personalization and Acceleration for Speech-Driven 3D Face Diffuser ( http://arxiv.org/abs/2311.16565v1 ) ライセンス: Link先を確認	Peng Chen, Xiaobao Wei, Ming Lu, Yitong Zhu, Naiming Yao, Xingyu Xiao, Hui Chen	(参考訳) スピーチ駆動の3D顔アニメーションは、学術と産業の両方において魅力的なタスクだ。伝統的な手法は主に、音声からアニメーションへの決定論的マッピングの学習に焦点を当てている。最近のアプローチでは、音声駆動3d顔アニメーションの非決定論的事実を検討し、そのタスクに拡散モデルを採用する。しかし、既存の拡散法では、顔アニメーションのパーソナライズとアニメーション生成の加速が大きな制限となっている。そこで本研究では, コントラスト学習を用いて3次元顔アニメーションと知識蒸留をパーソナライズし, 3次元アニメーション生成を高速化する拡散ベースの手法である diffusiontalker を提案する。具体的には,パーソナライゼーションを実現するために,学習可能な発話idを導入し,知識を音声列に集約する。提案したアイデンティティ埋め込みは、異なる人物間で異なる学習方法でカスタマイズされた顔の手がかりを抽出する。推論中、ユーザーは特定の話し方を反映して入力音声に基づくパーソナライズされた顔のアニメーションを得ることができる。何百ステップものステップを持つトレーニングされた拡散モデルでは、アクセラレーションのために8ステップの軽量モデルにそれを蒸留します。本手法が最先端手法よりも優れていることを示すために,広範な実験を行った。コードはリリースされます。 Speech-driven 3D facial animation has been an attractive task in both academia and industry. Traditional methods mostly focus on learning a deterministic mapping from speech to animation. Recent approaches start to consider the non-deterministic fact of speech-driven 3D face animation and employ the diffusion model for the task. However, personalizing facial animation and accelerating animation generation are still two major limitations of existing diffusion-based methods. To address the above limitations, we propose DiffusionTalker, a diffusion-based method that utilizes contrastive learning to personalize 3D facial animation and knowledge distillation to accelerate 3D animation generation. Specifically, to enable personalization, we introduce a learnable talking identity to aggregate knowledge in audio sequences. The proposed identity embeddings extract customized facial cues across different people in a contrastive learning manner. During inference, users can obtain personalized facial animation based on input audio, reflecting a specific talking style. 
With a trained diffusion model with hundreds of steps, we distill it into a lightweight model with 8 steps for acceleration. Extensive experiments are conducted to demonstrate that our method outperforms state-of-the-art methods. The code will be released.	翻訳日:2023-11-29 19:37:46 公開日:2023-11-28
# 反復量子振幅推定におけるバイアスについて On the bias in iterative quantum amplitude estimation ( http://arxiv.org/abs/2311.16560v1 ) ライセンス: Link先を確認	Koichi Miyamoto	(参考訳) 量子振幅推定(QAE)は、量子状態 $|\Phi\rangle$ における目標基底状態の2乗振幅 $a$ を推定する中心的な量子アルゴリズムである。元の量子位相推定に基づくQAEに対して、リソース削減のための様々な改良が提案されている。このような改良版の一つが反復量子振幅推定(IQAE)であり、$G^k|\Phi\rangle$ のような量子状態に対する測定を反復的に行うことで $a$ の推定値 $\hat{a}$ を出力し、Grover演算子 $G$ の作用回数 $k$(Grover数)とショット数を適応的に決定する。本稿ではIQAEのバイアスについて検討する。 IQAEをシミュレートする数値実験により、IQAEの推定値にはバイアスがあり、$a$ の特定の値に対してバイアスが増大することが明らかになった。 $\hat{a}$ の推定精度が閾値を下回るというIQAEの終了基準がバイアスの一因であることが分かる。さらに、最終ラウンドにおけるGrover数である $k_\mathrm{fin}$ と、最終ラウンドにおける測定結果の確率分布に影響を与える量である $f_\mathrm{fin}$ がバイアスを決定する重要な要素であり、$a$ の特定の値に対するバイアスの増大は、$(k_\mathrm{fin},f_\mathrm{fin})$ の偏った分布に起因する。また, Grover数とショット数を固定して最終ラウンドを再実行するだけのバイアス緩和法も提案する。 Quantum amplitude estimation (QAE) is a pivotal quantum algorithm to estimate the squared amplitude $a$ of the target basis state in a quantum state $|\Phi\rangle$. Various improvements on the original quantum phase estimation-based QAE have been proposed for resource reduction. One such improved version is iterative quantum amplitude estimation (IQAE), which outputs an estimate $\hat{a}$ of $a$ through iterated rounds of measurements on quantum states like $G^k|\Phi\rangle$, with the number $k$ of operations of the Grover operator $G$ (the Grover number) and the shot number determined adaptively. This paper investigates the bias in IQAE. Through numerical experiments that simulate IQAE, we reveal that the estimate by IQAE is biased and the bias is enhanced for some specific values of $a$. We see that the termination criterion in IQAE that the estimated accuracy of $\hat{a}$ falls below the threshold is a source of the bias. 
Besides, we observe that $k_\mathrm{fin}$, the Grover number in the final round, and $f_\mathrm{fin}$, a quantity affecting the probability distribution of measurement outcomes in the final round, are the key factors to determine the bias, and the bias enhancement for specific values of $a$ is due to the skewed distribution of $(k_\mathrm{fin},f_\mathrm{fin})$. We also present a bias mitigation method: just re-executing the final round with the Grover number and the shot number fixed.	翻訳日:2023-11-29 19:37:26 公開日:2023-11-28
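As a rough illustration of what each IQAE round measures, the sketch below uses the standard amplitude-amplification identity: writing $a=\sin^2\theta$, the probability of observing the target state after $k$ applications of $G$ is $\sin^2((2k+1)\theta)$. The function name `good_outcome_probability` is an assumption made for this example and is not taken from the paper; the sketch does not reproduce the adaptive schedule, the termination criterion, or the bias analysis.

```python
import math

def good_outcome_probability(a, k):
    """Probability of measuring the target state after k Grover iterations.

    Uses a = sin^2(theta), so the amplified probability is sin^2((2k + 1) * theta).
    Hypothetical helper for illustration only.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    theta = math.asin(math.sqrt(a))
    return math.sin((2 * k + 1) * theta) ** 2

# With a = 0.25 (theta = pi/6), one Grover iteration rotates the state onto the target.
for k in range(3):
    print(k, good_outcome_probability(0.25, k))
```

Because the map from the measured frequency back to $a$ is nonlinear and oscillates faster as $k$ grows, the round in which the procedure stops can plausibly affect the estimate, which is the kind of dependence on $k_\mathrm{fin}$ the abstract describes.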
# 富士通デジタルアニーラによるグラフ分割 Graph Partitioning with Fujitsu Digital Annealer ( http://arxiv.org/abs/2311.16559v1 ) ライセンス: Link先を確認	Yu-Ting Kao, Hsiu-Chuan Hsu	(参考訳) グラフ分割、またはコミュニティ検出は、ロジスティクス、輸送、スマートパワーグリッドなど、多くの分野の基盤となっている。コミュニティの効率的な計算と効果的な評価は、特に商業的・工業的な環境では不可欠である。しかし、グラフ分割の解空間は頂点と部分群の数によって劇的に増加する。大規模なグラフ分割などの最適化問題を短時間で解くことを目指して、改良されたアルゴリズムも備えた特殊なCMOSハードウェアであるDigital Annealer(DA)が富士通によって考案された。本研究は,富士通DAの性能と実行時間を評価する。モジュラリティは、目的関数と解の評価指標の両方として用いた。グラフ分割問題は、DAに適切にインポートできるように、二次制約なし二値最適化(QUBO)の形式に定式化された。 DAは、Karate Club、Les Miserables、American Football、Dolphin の各ネットワークの分割において、他の研究と比べて最も高いモジュラリティを得た。さらにDAは、60,930個のバイナリ変数を必要とする Case 1354pegase 電力網ネットワークの45のサブグループへの分割を行い、約80秒の求解時間内に最適なモジュラリティ結果を提供することができた。その結果,富士通DAはグラフ分割の高速かつ効率的な最適化に適用できることが示唆された。 Graph partitioning, or community detection, is the cornerstone of many fields, such as logistics, transportation and smart power grids. Efficient computation and efficacious evaluation of communities are both essential, especially in commercial and industrial settings. However, the solution space of graph partitioning increases drastically with the number of vertices and subgroups. With an eye to solving large scale graph partitioning and other optimization problems within a short period of time, the Digital Annealer (DA), a specialized CMOS hardware also featuring improved algorithms, has been devised by Fujitsu Ltd. This study gauges Fujitsu DA's performance and running times. The modularity was implemented as both the objective function and metric for the solutions. The graph partitioning problems were formatted into Quadratic Unconstrained Binary Optimization (QUBO) structures so that they could be adequately imported into the DA. The DA yielded the highest modularity among other studies when partitioning Karate Club, Les Miserables, American Football, and Dolphin. 
Moreover, the DA was able to partition the Case 1354pegase power grid network into 45 subgroups, calling for 60,930 binary variables, whilst delivering optimal modularity results within a solving time of roughly 80 seconds. Our results suggest that the Fujitsu DA can be applied for rapid and efficient optimization for graph partitioning.	翻訳日:2023-11-29 19:36:52 公開日:2023-11-28
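To make the modularity-as-QUBO idea concrete, here is a minimal sketch under stated assumptions: it uses a one-hot encoding (one binary variable per vertex and subgroup) with a quadratic penalty that enforces exactly one subgroup per vertex. The encoding, the penalty weight, and all function names are assumptions for illustration; the abstract does not specify the authors' exact formulation, and nothing here touches the Digital Annealer itself.

```python
from itertools import combinations

def modularity_matrix(n, edges):
    """Return B with B[i][j] = A[i][j] - k_i * k_j / (2m), and the value 2m."""
    adj = [[0.0] * n for _ in range(n)]
    deg = [0] * n
    for u, v in edges:
        adj[u][v] += 1.0
        adj[v][u] += 1.0
        deg[u] += 1
        deg[v] += 1
    two_m = 2 * len(edges)
    B = [[adj[i][j] - deg[i] * deg[j] / two_m for j in range(n)] for i in range(n)]
    return B, two_m

def modularity(labels, B, two_m):
    """Newman modularity of a labelling: sum of B over same-community ordered pairs / 2m."""
    n = len(labels)
    return sum(B[i][j] for i in range(n) for j in range(n)
               if labels[i] == labels[j]) / two_m

def build_qubo(B, two_m, k, penalty):
    """QUBO (upper-triangular dict plus constant offset) whose minimum maximizes modularity."""
    n = len(B)
    qubo = {}

    def add(a, b, w):
        key = (min(a, b), max(a, b))
        qubo[key] = qubo.get(key, 0.0) + w

    for c in range(k):
        for i in range(n):
            for j in range(n):
                add(i * k + c, j * k + c, -B[i][j] / two_m)
    # Penalty P * (1 - sum_c x_ic)^2, expanded using x^2 = x for binary variables.
    for i in range(n):
        for c in range(k):
            add(i * k + c, i * k + c, -penalty)
        for c1, c2 in combinations(range(k), 2):
            add(i * k + c1, i * k + c2, 2.0 * penalty)
    return qubo, penalty * n

def energy(x, qubo, offset):
    """Evaluate the QUBO objective for a 0/1 assignment x."""
    return offset + sum(w * x[a] * x[b] for (a, b), w in qubo.items())

def one_hot(labels, k):
    """Encode a labelling as n * k binary variables."""
    x = [0] * (len(labels) * k)
    for i, c in enumerate(labels):
        x[i * k + c] = 1
    return x

# Two triangles joined by a single edge; the natural split is {0,1,2} and {3,4,5}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
B, two_m = modularity_matrix(6, edges)
labels = [0, 0, 0, 1, 1, 1]
qubo, offset = build_qubo(B, two_m, k=2, penalty=2.0)
print(modularity(labels, B, two_m), energy(one_hot(labels, 2), qubo, offset))
```

For any valid one-hot assignment the penalty vanishes and the energy equals the negative modularity. Under this encoding the variable count is vertices times subgroups, and 1354 × 45 = 60,930 is consistent with the count quoted in the abstract, although that does not confirm the authors used this exact encoding.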
# マルチラベル分類のためのスケーラブルなラベル分布学習 Scalable Label Distribution Learning for Multi-Label Classification ( http://arxiv.org/abs/2311.16556v1 ) ライセンス: Link先を確認	Xingyu Zhao, Yuexuan An, Lei Qi, Xin Geng	(参考訳) マルチラベル分類(MLC、Multi-label classification)とは、あるインスタンスに関連ラベルのセットをタグ付けする問題を指す。既存のMLC法の多くは、各ラベルペア内の2つのラベルの相関関係が対称であるという仮定に基づいている。さらに、既存のほとんどの手法はラベル数に関連する学習プロセスを設計し、大規模な出力空間にスケールアップする際の計算複雑性をボトルネックにする。これらの課題に対処するために,ラベルの相関関係が非対称で次元がラベル数に依存しない潜在空間における分布として,異なるラベルを記述可能な多ラベル分類のためのスケーラブルラベル分布学習(SLDL)を提案する。特に、sldlはラベルをまず低次元の潜在空間内の連続分布に変換し、非対称計量を利用して異なるラベル間の相関を確立する。そして、特徴空間から潜在空間へのマッピングを学習し、結果として計算の複雑さはラベルの数とは無関係になる。最後に、SLDLは隣り合う戦略を利用して、潜在表現をデコードし、最終的な予測を得る。 SLDLは計算量が少なく,非常に競争力のある分類性能が得られることを示す。 Multi-label classification (MLC) refers to the problem of tagging a given instance with a set of relevant labels. Most existing MLC methods are based on the assumption that the correlation of two labels in each label pair is symmetric, which is violated in many real-world scenarios. Moreover, most existing methods design learning processes associated with the number of labels, which makes their computational complexity a bottleneck when scaling up to large-scale output space. To tackle these issues, we propose a novel MLC learning method named Scalable Label Distribution Learning (SLDL) for multi-label classification which can describe different labels as distributions in a latent space, where the label correlation is asymmetric and the dimension is independent of the number of labels. Specifically, SLDL first converts labels into continuous distributions within a low-dimensional latent space and leverages the asymmetric metric to establish the correlation between different labels. Then, it learns the mapping from the feature space to the latent space, resulting in the computational complexity is no longer related to the number of labels. Finally, SLDL leverages a nearest-neighbor-based strategy to decode the latent representations and obtain the final predictions. 
Our extensive experiments illustrate that SLDL can achieve very competitive classification performances with little computational consumption.	翻訳日:2023-11-29 19:36:29 公開日:2023-11-28
# 拡散モデルを用いたリアルテキスト画像合成によるシーンテキスト検出装置の強化 Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models ( http://arxiv.org/abs/2311.16555v1 ) ライセンス: Link先を確認	Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai	(参考訳) シーンテキスト検出技術は広範に応用されているため注目されている。しかし、既存の手法はトレーニングデータに対する高い需要があり、正確な人間のアノテーションを得ることは労働集約的で時間がかかります。解決策として、研究者は事前学習中に合成テキスト画像が実際のテキスト画像の補完的リソースとして広く採用されている。しかし、シーンテキスト検出器の性能を高めるための合成データセットは依然として存在する。既存の生成方法の1つの主な制限は、前景テキストの背景への統合が不十分であることである。そこで本研究では,この拡散モデルを用いてテキスト領域と背景の特徴をシームレスに融合する,拡散モデルに基づくテキスト生成器(difftext)を提案する。さらに,スペルエラーが少ない視覚的コヒーレントテキストを生成するための2つの手法を提案する。テキストインスタンスが少なくなると、生成したテキストイメージはテキスト検出を支援する他の合成データを一貫して上回ります。水平, 回転, 湾曲, 線状テキストの検出実験により, リアルテキスト画像の生成におけるDiffTextの有効性が示された。 Scene text detection techniques have garnered significant attention due to their wide-ranging applications. However, existing methods have a high demand for training data, and obtaining accurate human annotations is labor-intensive and time-consuming. As a solution, researchers have widely adopted synthetic text images as a complementary resource to real text images during pre-training. Yet there is still room for synthetic datasets to enhance the performance of scene text detectors. We contend that one main limitation of existing generation methods is the insufficient integration of foreground text with the background. To alleviate this problem, we present the Diffusion Model based Text Generator (DiffText), a pipeline that utilizes the diffusion model to seamlessly blend foreground text regions with the background's intrinsic features. Additionally, we propose two strategies to generate visually coherent text with fewer spelling errors. With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors. Extensive experiments on detecting horizontal, rotated, curved, and line-level texts demonstrate the effectiveness of DiffText in producing realistic text images.	
翻訳日:2023-11-29 19:36:06 公開日:2023-11-28
# HandyPriors: 利き手と利き手との相互作用の物理的に一貫性のある知覚 HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors ( http://arxiv.org/abs/2311.16552v1 ) ライセンス: Link先を確認	Shutong Zhang, Yi-Ling Qiao, Guanglei Zhu, Eric Heiden, Dylan Turpin, Jingzhou Liu, Ming Lin, Miles Macklin, Animesh Garg	(参考訳) ハンドオブジェクトの相互作用をモデル化するための様々なヒューリスティックな目的が過去の研究で提案されている。しかしながら、結束的な枠組みが欠如しているため、これらの目的はしばしば適用範囲が狭く、その効率や精度によって制限される。本稿では,近年の微分物理学とレンダリングの進歩を活用して,人間と物体の相互作用シーンにおけるポーズ推定のための統一的で汎用的なパイプラインであるHandyPriorsを提案する。提案手法では,入力画像やセグメンテーションマスクとレンダリングプリエントと物理プリエントを併用することで,フレーム間の透過性や相対スライディングを緩和する。さらに,手と物体のポーズ推定のための2つの代替案を提案する。最適化に基づくポーズ推定は精度が向上する一方、微分可能前処理をダイナミクスモデルやオブザーバモデルとして利用するフィルタリングベーストラッキングはより高速に実行される。我々は,HandyPriorsがポーズ推定タスクにおいて同等あるいは優れた結果が得られることを実証し,識別可能な物理モジュールがポーズ修正のための接触情報を予測できることを実証した。また,本手法はロボットハンド操作や野生の人間-対象ポーズ推定を含む知覚タスクに一般化することを示した。 Various heuristic objectives for modeling hand-object interaction have been proposed in past work. However, due to the lack of a cohesive framework, these objectives often possess a narrow scope of applicability and are limited by their efficiency or accuracy. In this paper, we propose HandyPriors, a unified and general pipeline for pose estimation in human-object interaction scenes by leveraging recent advances in differentiable physics and rendering. Our approach employs rendering priors to align with input images and segmentation masks along with physics priors to mitigate penetration and relative-sliding across frames. Furthermore, we present two alternatives for hand and object pose estimation. The optimization-based pose estimation achieves higher accuracy, while the filtering-based tracking, which utilizes the differentiable priors as dynamics and observation models, executes faster. We demonstrate that HandyPriors attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement. 
We also show that our approach generalizes to perception tasks, including robotic hand manipulation and human-object pose estimation in the wild.	翻訳日:2023-11-29 19:35:51 公開日:2023-11-28
# 量子シミュレーションのための平行格子ゲージ理論リンク上の量子ビット変数の実空間ブロック Real-space blocking of qubit variables on parallel lattice gauge theory links for quantum simulation ( http://arxiv.org/abs/2311.16549v1 ) ライセンス: Link先を確認	Judy Shir and Erez Zohar	(参考訳) 非摂動ゲージ理論の物理学を研究するために近年提案されている方法の1つは、格子ゲージ理論を実験室で構築できる量子デバイスや量子コンピュータにマッピングする量子シミュレーションである。これらの手法は非常に有望であり、すでにいくつかの実験結果も示されているが、技術的能力とシミュレーション対象モデルの要求との間のインターフェースに関連するいくつかの課題に依然として直面している。特に、コンパクトなリーゲージ群の場合にリンク上のゲージ場を記述する無限次元の局所ヒルベルト空間をシミュレートする必要があり、一般の場合には完全には理解も制御もされていない切断や近似が必要となる。この研究は、ほとんどの量子シミュレーションプラットフォームで利用可能なコンポーネントからなる単純で低次元の量子ビット系の粗粒化を用いて、任意の大きさの局所ヒルベルト空間を得る方法を提案し、新しいタイプの格子ゲージ理論量子シミュレーションの道を開く。 One of the methods proposed in the last years for studying non-perturbative gauge theory physics is quantum simulation, where lattice gauge theories are mapped onto quantum devices which can be built in the laboratory, or quantum computers. While being very promising and already showing some experimental results, these methods still face several challenges related to the interface between the technological capabilities and the demands of the simulated models; in particular, one such challenge is the need to simulate infinitely dimensional local Hilbert spaces, describing the gauge fields on the links in the case of compact Lie gauge groups, requiring some truncations and approximations which are not completely understood or controllable in the general case. 
This work proposes a way to obtain arbitrarily large such local Hilbert spaces by using coarse graining of simple, low dimensional qubit systems, made of components available on most quantum simulation platforms, and thus opening the way of new types of lattice gauge theory quantum simulations.	翻訳日:2023-11-29 19:35:31 公開日:2023-11-28
# 3端子量子ドット熱電対のグラフ理論的解析:オンサーガー関係とスピン熱電効果 Graph Theoretic Analysis of Three-Terminal Quantum Dot Thermocouples: Onsager Relations and Spin-Thermoelectric Effects ( http://arxiv.org/abs/2311.16548v1 ) ライセンス: Link先を確認	Nikhil Gupt, Shuvadip Ghosh and Arnab Ghosh	(参考訳) 2つの強結合量子ドットからなる3端子量子熱電対の簡易モデルを提案する。スピン依存ゼーベック効果とペルティエ効果を解明するために、微視的ハミルトニアンを用い、リンドブラッドマスター方程式を量子遷移ネットワークにマッピングし、両方の相反効果の主要な動作原理を捉える。解析の結果,クーロン相互作用とスピン反転過程の両方を含む量子熱力学ネットワークが,スピン熱電効果の発現をもたらすことが明らかになった。代数グラフ理論を用いて,サイクル流束とサイクル力で表されるエントロピー生成率の確率的バージョンから,不可逆熱力学の現象論的法則を回復する。驚くべきことに、輸送係数に対するオンサーガー相反性とケルビン関係は、量子遷移ネットワーク内のサイクル流束軌道の性質にその根拠を持つ。これは、局所平衡の仮定に依存する不可逆熱力学の古典法則とは根本的に異なる基礎を持つにもかかわらず、熱力学原理が古典領域と量子領域にわたって普遍的に成り立つことを裏付けている。 We introduce a simplified model for a three-terminal quantum thermocouple consisting of two strongly-coupled quantum dots. To elucidate spin-dependent Seebeck and Peltier effects, we employ a microscopic Hamiltonian and map the Lindblad master equation onto a quantum transition network, capturing the key working principles for both reciprocal effects. Our analysis reveals that quantum thermodynamic networks encompassing both Coulomb interaction and spin-flipping processes lead to the emergence of spin-thermoelectric effects. Using algebraic graph theory, we recover the phenomenological law of irreversible thermodynamics from the stochastic version of the entropy production rate expressed in terms of cycle flux and cycle forces. Remarkably, Onsager reciprocity and Kelvin relation for transport coefficients find their premises in the properties of cycle flux trajectories within the quantum transition network. This underscores the universal generality of thermodynamic principles across classical and quantum realms, despite their fundamentally different basis from classical laws of irreversible thermodynamics relying on local equilibrium assumptions.	翻訳日:2023-11-29 19:35:18 公開日:2023-11-28
# ロバスト回転平均化のための多既約スペクトル同期 Multi-Irreducible Spectral Synchronization for Robust Rotation Averaging ( http://arxiv.org/abs/2311.16544v1 ) ライセンス: Link先を確認	Owen Howell, Haoen Huang, and David Rosen	(参考訳) 回転平均化(RA)はロボット工学とコンピュータビジョンの基本的な問題である。 RAの目標は、対ごとの相対回転の一部に対するノイズを含む測定値 $R_{ij} \sim R^{-1}_{i} R_{j}$ が与えられたとき、$N$ 個の未知の向き $R_{1}, ..., R_{N} \in SO(3)$ を推定することである。この問題は非凸かつNP困難であり、一般の場合には解くことが難しい。コンパクト群上の調和解析を適用して、RA目的関数に現れる個々の項の打ち切りフーリエ分解から構成した(凸)スペクトル緩和を導出し、この緩和の極端固有対を数個計算し、合意問題を(近似的に)解くことでRA解の推定を復元する。この手法は従来のRA法に比べていくつかの利点がある: \emph{任意の} 滑らかな損失関数(ロバストなM推定量を含むが、これに限らない)と併用することができ、初期化を必要とせず、単純で(かつ高度にスケーラブルな)線形代数計算と、個々の回転状態の帯域制限関数上の並列化可能な最適化のみを用いて実装される。さらに、乗法的ランジュバン測定ノイズという(物理的に十分動機づけられた)仮定の下で、基礎となる測定ネットワークのグラフ理論的な量でパラメータ化された、スペクトル推定器の明示的な性能保証(推定誤差に対する確率的な裾の上界の形)を導出する。また, 推定器の性能と基礎となる測定グラフの特性を具体的に結びつけることで, 精度の高い推定が \emph{保証される} 測定ネットワークの設計方法を示し, センサ配置, ネットワーク圧縮, アクティブセンシングなどの下流タスクを可能にする。 Rotation averaging (RA) is a fundamental problem in robotics and computer vision. In RA, the goal is to estimate a set of $N$ unknown orientations $R_{1}, ..., R_{N} \in SO(3)$, given noisy measurements $R_{ij} \sim R^{-1}_{i} R_{j}$ of a subset of their pairwise relative rotations. This problem is both nonconvex and NP-hard, and thus difficult to solve in the general case. We apply harmonic analysis on compact groups to derive a (convex) spectral relaxation constructed from truncated Fourier decompositions of the individual summands appearing in the RA objective; we then recover an estimate of the RA solution by computing a few extremal eigenpairs of this relaxation, and (approximately) solving a consensus problem. 
Our approach affords several notable advantages versus prior RA methods: it can be used in conjunction with \emph{any} smooth loss function (including, but not limited to, robust M-estimators), does not require any initialization, and is implemented using only simple (and highly scalable) linear-algebraic computations and parallelizable optimizations over band-limited functions of individual rotational states. Moreover, under the (physically well-motivated) assumption of multiplicative Langevin measurement noise, we derive explicit performance guarantees for our spectral estimator (in the form of probabilistic tail bounds on the estimation error) that are parameterized in terms of graph-theoretic quantities of the underlying measurement network. By concretely linking estimator performance with properties of the underlying measurement graph, our results also indicate how to devise measurement networks that are \emph{guaranteed} to achieve accurate estimation, enabling such downstream tasks as sensor placement, network compression, and active sensing.	翻訳日:2023-11-29 19:34:58 公開日:2023-11-28
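The paper's multi-irreducible relaxation over $SO(3)$ is not reproduced here. As a much simpler stand-in, the sketch below shows classical spectral synchronization for planar rotations in $SO(2)$: relative angle measurements are placed in a Hermitian matrix whose leading eigenvector carries the absolute angles up to a global rotation. The function name `synchronize_so2`, the power-iteration solver, and the noise-free toy data are all assumptions chosen for illustration.

```python
import cmath
import math

def synchronize_so2(n, measurements, iterations=200):
    """Estimate n planar angles, relative to node 0, from pairwise measurements.

    measurements maps (i, j) to the measured angle of r_i^{-1} r_j,
    that is theta_j - theta_i. Hypothetical helper for illustration only.
    """
    # Hermitian matrix with H[i][j] = exp(1j * (theta_i - theta_j)) on measured pairs.
    H = [[0j] * n for _ in range(n)]
    for (i, j), angle in measurements.items():
        H[i][j] = cmath.exp(-1j * angle)
        H[j][i] = cmath.exp(1j * angle)
    # Power iteration on H + I; the identity shift avoids oscillation on bipartite graphs.
    v = [1 + 0j] * n
    for _ in range(iterations):
        w = [v[i] + sum(H[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    # Remove the global rotation by fixing node 0 at angle zero.
    return [cmath.phase(x / v[0]) for x in v]

# Noise-free toy problem on a 5-cycle with one chord (a connected measurement graph).
true_angles = [0.3, 1.1, -0.7, 2.0, -2.5]
pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
measurements = {(i, j): true_angles[j] - true_angles[i] for i, j in pairs}
print(synchronize_so2(5, measurements))
```

With noisy measurements the leading eigenvector is only an approximation and the recovered phases would normally be refined further; the sketch stops at the eigenvector step and makes no claim about the error bounds derived in the paper.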
# エージェントがOKRに会う:階層的自己コラボレーションと自己評価を備えたオブジェクトとキー結果駆動エージェントシステム Agents meet OKR: An Object and Key Results Driven Agent System with Hierarchical Self-Collaboration and Self-Evaluation ( http://arxiv.org/abs/2311.16542v1 ) ライセンス: Link先を確認	Yi Zheng, Chongyang Ma, Kanle Shi, Haibin Huang	(参考訳) 本研究では,タスク解決におけるLarge Language Models (LLM) の機能向上を目的としたOKR-Agentの概念を紹介する。本手法は,階層的エージェントによって促進される自己協調と自己修正機構を併用し,タスク解決における本質的な複雑さに対処する。第一に、ドメイン知識の深い部分に対する効果的なタスク解決要求と、個々のサブタスクに特別なエージェントを配置することで、llmパフォーマンスを著しく向上させる複雑な推論です。第二に、タスク解決は本質的に階層的な実行構造に固執し、高いレベルの戦略的計画と詳細なタスク実行の両方を含む。この目的に向けて、我々のOKR-Agentパラダイムは、この階層構造と密接に一致し、様々なシナリオにおける有効性と適応性を保証します。具体的には、階層的オブジェクトとキー結果の生成とマルチレベル評価という、2つの新しいモジュールが含まれます。実際に階層的なOKR生成は、オブジェクトを複数のサブオブジェクトに分解し、主要な結果とエージェントの責任に基づいて新しいエージェントを割り当てます。これらのエージェントはその後、指定されたタスクを精巧にし、必要に応じてさらに分解することができる。このような生成は再帰的かつ階層的に動作し、詳細な解の包括的な集合で終わる。 OKR-Agentのマルチレベル評価モジュールは、関連するすべてのエージェントからのフィードバックを活用し、プロセスの各ステップを最適化することで、ソリューションを洗練します。これにより、ソリューションは正確で実用的で、複雑なタスク要求に効果的に対応でき、その結果の全体的な信頼性と品質が向上します。実験の結果,提案手法はいくつかのタスクにおいて従来の手法よりも優れていた。コードとデモはhttps://okr-agent.github.io/で入手できる。 In this study, we introduce the concept of OKR-Agent designed to enhance the capabilities of Large Language Models (LLMs) in task-solving. Our approach utilizes both self-collaboration and self-correction mechanism, facilitated by hierarchical agents, to address the inherent complexities in task-solving. Our key observations are two-fold: first, effective task-solving demands in-depth domain knowledge and intricate reasoning, for which deploying specialized agents for individual sub-tasks can markedly enhance LLM performance. Second, task-solving intrinsically adheres to a hierarchical execution structure, comprising both high-level strategic planning and detailed task execution. Towards this end, our OKR-Agent paradigm aligns closely with this hierarchical structure, promising enhanced efficacy and adaptability across a range of scenarios. 
Specifically, our framework includes two novel modules: hierarchical Objects and Key Results generation and multi-level evaluation, each contributing to more efficient and robust task-solving. In practice, hierarchical OKR generation decomposes Objects into multiple sub-Objects and assigns new agents based on key results and agent responsibilities. These agents subsequently elaborate on their designated tasks and may further decompose them as necessary. Such generation operates recursively and hierarchically, culminating in a comprehensive set of detailed solutions. The multi-level evaluation module of OKR-Agent refines the solution by leveraging feedback from all associated agents, optimizing each step of the process. This ensures that the solution is accurate and practical and effectively addresses intricate task requirements, enhancing the overall reliability and quality of the outcome. Experimental results also show that our method outperforms previous methods on several tasks. Code and demo are available at https://okr-agent.github.io/	翻訳日:2023-11-29 19:34:22 公開日:2023-11-28
# フィードバック誘起皮膚効果による動的相転移 Dynamical Phase Transition due to Feedback-induced Skin Effect ( http://arxiv.org/abs/2311.16541v1 ) ライセンス: Link先を確認	Ze-Chuan Liu, Kai Li, and Yong Xu	(参考訳) 非エルミート的皮膚効果(non-hermitian skin effect)とは、非エルミート的ハミルトニアンの固有状態が主に境界に存在する現象を指す。近年の研究では,ポストセレクションを必要とせずに継続的に監視するシステムにおいて,条件フィードバックによって同様の皮膚効果が引き起こされることが示されている。残念なことに、エンタングルメント相転移は後期定常状態において欠如している。本稿では,開境界条件下での条件フィードバックを伴う連続監視自由フェルミオン系における多体ダイナミクスについて検討する。時間の経過とともに、絡み合いエントロピーの対数的スケーリングから領域-法則スケーリングへの新たな動的位相遷移が予想される。この遷移は、従来の動的相転移と明らかに異なるが、バルク力学と境界皮膚効果の競合から生じる。さらに, 準不規則あるいは乱れは開境界条件下で定常状態の相転移を駆動することはできないが, 周期的境界条件下でのダイナミクスの定常状態に対する絡み合い相転移とよく一致する時間発展中の最大絡み合いエントロピーについては遷移が発生することを見出した。 The non-Hermitian skin effect refers to a phenomenon that eigenstates of a non-Hermitian Hamiltonian mainly reside at a boundary. Remarkably, recent work shows that a similar skin effect can be induced by conditional feedback in a continuously monitored system without the requirement of postselection. Unfortunately, the entanglement phase transition is found to be absent for the late-time steady state. Here, we study the many-body dynamics in a continuously monitored free fermion system with conditional feedback under open boundary conditions. We surprisingly find a novel dynamical phase transition from a logarithmic scaling of the entanglement entropy to an area-law scaling as time evolves. The transition, which is noticeably different from the conventional dynamical phase transition, arises from the competition between the bulk dynamics and boundary skin effects. In addition, we find that while quasidisorder or disorder cannot drive a phase transition for the steady state under open boundary conditions, the transition occurs for the maximum entanglement entropy during the time evolution, which agrees well with the entanglement phase transition for the steady state of the dynamics under periodic boundary conditions.	翻訳日:2023-11-29 19:33:53 公開日:2023-11-28
# プライバシーに敏感な視覚課題のための拡散モデルによるフェデレーション学習 Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks ( http://arxiv.org/abs/2311.16538v1 ) ライセンス: Link先を確認	Ye Lin Tun, Chu Myaet Thwal, Ji Su Yoon, Sun Moo Kang, Chaoning Zhang, Choong Seon Hong	(参考訳) 拡散モデルは視覚関連タスク、特に画像生成に大きな可能性を示している。しかしながら、彼らのトレーニングは通常、公開されているソースから収集したデータに依存する中央集権的な方法で行われる。このアプローチは、データ収集に関するプライバシの懸念を含む医療分野など、多くの領域で実現可能あるいは実用的なものではありません。プライバシーに敏感なデータに関わる課題にもかかわらず、そのようなドメインは拡散モデルが提供する貴重なビジョンサービスから恩恵を受けることができる。フェデレーション学習(fl)は,データのプライバシを損なうことなく分散モデルトレーニングを可能にする上で重要な役割を担っている。データを収集する代わりに、FLシステムはモデルパラメータを収集し、関係するパーティのプライベートデータを効果的に保護する。これによりflシステムは、特にプライバシに敏感なデータがクライアントのネットワークに分散しているシナリオにおいて、分散学習タスクの管理に不可欠となる。それでもFLは、分散した性質とプライバシ保護プロパティのために、独自の課題を提示している。そこで本研究では, 拡散モデルを訓練するためのFL戦略を考察し, 連成拡散モデルの開発への道を開く。我々は様々なFLシナリオの実験を行い、フェデレーション拡散モデルが、プライバシーに敏感なドメインにビジョンサービスを提供する大きな可能性を実証した。 Diffusion models have shown great potential for vision-related tasks, particularly for image generation. However, their training is typically conducted in a centralized manner, relying on data collected from publicly available sources. This approach may not be feasible or practical in many domains, such as the medical field, which involves privacy concerns over data collection. Despite the challenges associated with privacy-sensitive data, such domains could still benefit from valuable vision services provided by diffusion models. Federated learning (FL) plays a crucial role in enabling decentralized model training without compromising data privacy. Instead of collecting data, an FL system gathers model parameters, effectively safeguarding the private data of different parties involved. This makes FL systems vital for managing decentralized learning tasks, especially in scenarios where privacy-sensitive data is distributed across a network of clients. Nonetheless, FL presents its own set of challenges due to its distributed nature and privacy-preserving properties. 
Therefore, in this study, we explore the FL strategy to train diffusion models, paving the way for the development of federated diffusion models. We conduct experiments on various FL scenarios, and our findings demonstrate that federated diffusion models have great potential to deliver vision services to privacy-sensitive domains.	翻訳日:2023-11-29 19:33:34 公開日:2023-11-28
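The abstract notes that an FL system gathers model parameters rather than data. One common way to combine them is weighted parameter averaging in the style of FedAvg, sketched below. The choice of FedAvg, the flat-list parameter representation, and the name `federated_average` are assumptions for illustration; the abstract does not state which aggregation rule the authors use.

```python
def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    client_params: list of equal-length lists of floats, one per client.
    client_sizes:  number of local training samples held by each client.
    Hypothetical helper for illustration only.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[d] * size for params, size in zip(client_params, client_sizes)) / total
        for d in range(dim)
    ]

# Two clients; the second holds three times as much data and so dominates the average.
print(federated_average([[1.0, 2.0], [3.0, 6.0]], [1, 3]))
```

Only the parameter vectors cross the network in this scheme; the raw images stay on each client, which is the privacy property the abstract relies on.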
# ゲージ原理と、ランダウ固有状態を特徴づけるいくつかの量子数の非可観測性について More on the gauge principle and nonobservability of some quantum numbers characterizing the Landau eigen-states ( http://arxiv.org/abs/2311.16537v1 ) ライセンス: Link先を確認	Masashi Wakamatsu	(参考訳) 対称ゲージにおけるランダウハミルトニアンの固有状態は、2つの整数 $n$ と $m$ によって特徴づけられる。ここで、$n$ はよく知られたランダウ量子数を表し、$m$ は正準軌道角運動量 (OAM) 演算子 $\hat{L}^{can}_z$ の固有値を表す。一方、第1ランダウゲージの固有状態は、2つの整数 $n$ と $k_x$ で特徴づけられ、ここで $n$ はランダウ量子数であり、$k_x$ は正準運動量演算子 $\hat{p}^{can}_x$ の固有値である。正準運動量と正準OAMは共にゲージ依存量であるため、その固有値 $k_x$ と $m$ は可観測量に対応しないと一般に信じられている。しかし、この広く受け入れられている見方は、ランダウ問題のゲージポテンシャル独立な定式化の論理的発展に基づいて、2つの保存運動量 $\hat{p}^{cons}_x$ と $\hat{p}^{cons}_y$ および1つの保存OAM $\hat{L}^{cons}_z$ が存在することを予測した最近の論文で疑問視された。これらはランダウハミルトニアンのネーター電荷と見なされ、その保存は {\it ゲージポテンシャル} の選択から独立に保証される。特に、これらの保存演算子の新しい共変ゲージ変換特性に基づいて、その固有値は量子数 $k_x$, $k_y$, $m$ によって特徴づけられ、これらの量子数は少なくとも原理上は可観測量に対応すると主張された。本研究の目的は、ランダウ問題の2つの理論的定式化、すなわち従来の定式化とゲージポテンシャル独立な定式化の違いによらず、この主張が正当化されないことを示すことである。 The eigen-states of the Landau Hamiltonian in the symmetric gauge are characterized by two integers $n$ and $m$. Here, $n$ denotes the familiar Landau quantum number, while $m$ represents the eigen-value of the canonical orbital angular momentum (OAM) operator $\hat{L}^{can}_z$. On the other hand, the eigen-states in the 1st Landau gauge are characterized by two integers $n$ and $k_x$, where $n$ is the Landau quantum number, while $k_x$ is the eigen-value of the canonical momentum operator $\hat{p}^{can}_x$. Since the canonical momentum and the canonical OAM are both gauge-variant quantities, their eigenvalues $k_x$ and $m$ are standardly believed not to correspond to observables. However, this wide-spread view was questioned in a recent paper based on the logical development of the gauge-potential-independent formulation of the Landau problem, which predicts the existence of two conserved momenta $\hat{p}^{cons}_x$ and $\hat{p}^{cons}_y$ and one conserved OAM $\hat{L}^{cons}_z$. 
They are regarded as Noether charges of the Landau Hamiltonian, the conservation of which is guaranteed {\it independently} of the choice of the {\it gauge potential}. In particular, on the basis of novel covariant gauge transformation properties of these conserved operators, the eigen-values of which are characterized by the quantum numbers $k_x$, $k_y$, and $m$, it was claimed that these quantum numbers correspond to observables at least in principle. The purpose of the present paper is to show that this claim is not justified, regardless of the differences in the two theoretical formulations of the Landau problem, i.e. the traditional formulation and the gauge-potential-independent formulation.	翻訳日:2023-11-29 19:33:14 公開日:2023-11-28
# コントラストエンコーダによる異種データのためのクラスタ化フェデレーション学習 Contrastive encoder pre-training-based clustered federated learning for heterogeneous data ( http://arxiv.org/abs/2311.16535v1 ) ライセンス: Link先を確認	Ye Lin Tun, Minh N.H. Nguyen, Chu Myaet Thwal, Jinwoo Choi, Choong Seon Hong	(参考訳) フェデレーション学習(federated learning, fl)は、分散クライアントがデータプライバシを維持しながらグローバルなモデルを協調的にトレーニングできる、有望なアプローチである。しかし、FLはデータの不均一性の問題に悩まされ、その性能に大きな影響を及ぼすことがある。これに対処するために、異なるクライアントクラスタ向けにパーソナライズされたモデルを構築するために、clustered federated learning (cfl)が提案されている。効果的なクライアントクラスタリング戦略の1つは、クライアントがパフォーマンスに基づいてモデルプールから独自のローカルモデルを選択できるようにすることである。しかしながら、事前トレーニングされたモデルパラメータがなければ、このような戦略は、すべてのクライアントが同じモデルを選択するクラスタ障害を起こしやすい。残念ながら、事前トレーニングのために大量のラベル付きデータを収集することは、分散環境ではコストがかかり実用的ではない。この課題を克服するために、自己教師付きコントラスト学習を活用し、FLシステムの事前学習にラベルのないデータを活用する。自己教師付き事前トレーニングとクライアントクラスタリングは、flのデータの不均一性問題に取り組む上で重要なコンポーネントとなる。そこで本研究では,これら2つの重要な戦略を活かし,コントラスト型事前学習に基づくクラスタ型フェデレート学習(cp-cfl)を提案する。本研究では、異種FL設定における広範な実験を通してCP-CFLの有効性を実証し、様々な興味深い観察結果を示す。 Federated learning (FL) is a promising approach that enables distributed clients to collaboratively train a global model while preserving their data privacy. However, FL often suffers from data heterogeneity problems, which can significantly affect its performance. To address this, clustered federated learning (CFL) has been proposed to construct personalized models for different client clusters. One effective client clustering strategy is to allow clients to choose their own local models from a model pool based on their performance. However, without pre-trained model parameters, such a strategy is prone to clustering failure, in which all clients choose the same model. Unfortunately, collecting a large amount of labeled data for pre-training can be costly and impractical in distributed environments. To overcome this challenge, we leverage self-supervised contrastive learning to exploit unlabeled data for the pre-training of FL systems. 
Together, self-supervised pre-training and client clustering can be crucial components for tackling the data heterogeneity issues of FL. Leveraging these two crucial strategies, we propose contrastive pre-training-based clustered federated learning (CP-CFL) to improve the model convergence and overall performance of FL systems. In this work, we demonstrate the effectiveness of CP-CFL through extensive experiments in heterogeneous FL settings, and present various interesting observations.	翻訳日:2023-11-29 19:32:34 公開日:2023-11-28
# LasTGL: 大規模時間グラフ学習のための産業フレームワーク LasTGL: An Industrial Framework for Large-Scale Temporal Graph Learning ( http://arxiv.org/abs/2311.16605v1 ) ライセンス: Link先を確認	Jintang Li, Jiawang Dan, Ruofan Wu, Jing Zhou, Sheng Tian, Yunfei Liu, Baokun Wang, Changhua Meng, Weiqiang Wang, Yuchang Zhu, Liang Chen, Zibin Zheng	(参考訳) ここ数年、グラフニューラルネットワーク(GNN)は、(静的)グラフ構造データを学ぶための強力で実用的なツールになっています。しかし、ソーシャルネットワークやeコマースのような現実世界のアプリケーションの多くは、ノードとエッジが動的に進化している時間グラフを含んでいる。時相グラフニューラルネットワーク(TGNN)は、時間進化グラフに対処するGNNの拡張として徐々に現れ、学術と産業の両方において、徐々にトレンドとなっている。このような分野における研究の促進には、TGNNモデルを構築し、時間グラフを扱う際の異なるスキームを統合するための新しいツールが必要である。時間グラフ学習の研究と応用を容易にするために,様々な高度なタスクに対して共通時間グラフ学習アルゴリズムの統一的かつ拡張可能な実装を統合する産業フレームワークであるlastglを紹介する。 LasTGLの目的は、PyTorchがベースとするユーザフレンドリ性の原則とクイックプロトタイピングに重点を置いて、時間グラフ学習タスクを解決するための重要なビルディングブロックを提供することである。特にLasTGLは、包括的な時間グラフデータセット、TGNNモデル、ユーティリティ、ドキュメント化されたチュートリアルを提供しており、絶対的な初心者と専門的なディープラーニング実践者の両方に適している。 Over the past few years, graph neural networks (GNNs) have become powerful and practical tools for learning on (static) graph-structure data. However, many real-world applications, such as social networks and e-commerce, involve temporal graphs where nodes and edges are dynamically evolving. Temporal graph neural networks (TGNNs) have progressively emerged as an extension of GNNs to address time-evolving graphs and have gradually become a trending research topic in both academics and industry. Advancing research in such an emerging field requires new tools to compose TGNN models and unify their different schemes in dealing with temporal graphs. To facilitate research and application in temporal graph learning, we introduce LasTGL, an industrial framework that integrates unified and extensible implementations of common temporal graph learning algorithms for various advanced tasks. 
The purpose of LasTGL is to provide the essential building blocks for solving temporal graph learning tasks, focusing on the guiding principles of user-friendliness and quick prototyping on which PyTorch is based. In particular, LasTGL provides comprehensive temporal graph datasets, TGNN models and utilities along with well-documented tutorials, making it suitable for both absolute beginners and expert deep learning practitioners alike.	翻訳日:2023-11-29 19:25:59 公開日:2023-11-28
# LC4SV: 未知の話者検証モデルを補償することを学習する雑音除去フレームワーク LC4SV: A Denoising Framework Learning to Compensate for Unseen Speaker Verification Models ( http://arxiv.org/abs/2311.16604v1 ) ライセンス: Link先を確認	Chi-Chang Lee, Hong-Wei Chen, Chu-Song Chen, Hsin-Min Wang, Tsung-Te Liu, Yu Tsao	(参考訳) 話者検証(SV)モデルの性能はノイズの多い環境で劇的に低下する可能性がある。音声強調(SE)モジュールは、フロントエンド戦略として使用できる。しかし、既存のSE手法は、SEモデルの予測信号に含まれるアーティファクトのため、下流のSVシステムに性能改善をもたらさない場合がある。アーティファクトを補償するために,様々な未知の下流SVモデルのプリプロセッサとして機能するLC4SVという汎用的な雑音除去フレームワークを提案する。 LC4SVでは,強調信号とその雑音入力の間の適切な係数を自動的に生成し,ノイズの多い環境でのSV性能を向上させるために,学習ベースの補間エージェントを用いる。実験の結果,LC4SVは様々な未知のSVシステムの性能を一貫して改善することがわかった。我々の知る限り、本研究はノイズの多い環境下でのSV性能向上を目的とした学習ベース補間スキームの最初の試みである。 The performance of speaker verification (SV) models may drop dramatically in noisy environments. A speech enhancement (SE) module can be used as a front-end strategy. However, existing SE methods may fail to bring performance improvements to downstream SV systems due to artifacts in the predicted signals of SE models. To compensate for artifacts, we propose a generic denoising framework named LC4SV, which can serve as a pre-processor for various unknown downstream SV models. In LC4SV, we employ a learning-based interpolation agent to automatically generate the appropriate coefficients between the enhanced signal and its noisy input to improve SV performance in noisy environments. Our experimental results demonstrate that LC4SV consistently improves the performance of various unseen SV systems. To the best of our knowledge, this work is the first attempt to develop a learning-based interpolation scheme aiming at improving SV performance in noisy environments.	翻訳日:2023-11-29 19:25:22 公開日:2023-11-28
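The interpolation idea in this abstract can be sketched as a blend of the enhanced signal with its noisy input. The version below assumes a single scalar coefficient `alpha` and a convex combination; both are simplifying assumptions, since in LC4SV the coefficients are produced by a learned agent whose exact form is not given in the abstract.

```python
def interpolate(enhanced, noisy, alpha):
    """Blend an enhanced signal with its noisy input, sample by sample.

    alpha = 1.0 returns the enhanced signal, alpha = 0.0 the noisy input.
    A fixed scalar alpha is an assumption made for illustration only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(enhanced) != len(noisy):
        raise ValueError("signals must have the same length")
    return [alpha * e + (1.0 - alpha) * n for e, n in zip(enhanced, noisy)]


# Hypothetical three-sample signals.
noisy = [1.0, -1.0, 0.5]
enhanced = [0.5, -0.5, 0.0]
blended = interpolate(enhanced, noisy, alpha=0.5)
print(blended)
```

Keeping part of the noisy input is one plausible way to soften enhancement artifacts before the signal reaches a downstream SV model.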
# GSP-KalmanNet:ニューラルアシストカルマンフィルタによるグラフ信号追跡 GSP-KalmanNet: Tracking Graph Signals via Neural-Aided Kalman Filtering ( http://arxiv.org/abs/2311.16602v1 ) ライセンス: Link先を確認	Itay Buchnik, Guy Sagi, Nimrod Leinwand, Yuval Loya, Nir Shlezinger, and Tirza Routtenberg	(参考訳) グラフ信号の動的システムは、ソーシャルネットワーク、電力グリッド、輸送など、様々なアプリケーションで遭遇する。このようなシステムはしばしば状態空間(SS)モデルとして記述されるが、カルマンフィルタ(KF)とその変種に基づく従来のツールによるグラフ信号の追跡は困難である。これは、非線形性、高次元性、領域の不規則性、グラフ信号の実世界の動的システムに関連する複雑なモデリングのためである。本研究では,ハイブリッドモデルベース/データ駆動手法を用いて,グラフ信号の追跡について検討する。グラフ信号処理(GSP)ツールと深層学習(DL)技術を併用することにより,グラフ上の観測から隠れたグラフ状態を追跡するGSP-KalmanNetを開発した。 GSP-KalmanNetの導出は、KFを拡張してグラフ周波数領域フィルタリングにより固有のグラフ構造を利用することに基づいており、これにより高次元信号の処理に伴う計算複雑性が大幅に軽減され、小さなトポロジー変化に対するロバスト性が向上する。次に,最近提案されたKalmanNetフレームワークに従い,データを用いてカルマンゲインを学習する。これにより,ノイズ統計に特定のモデルを課すことなく,部分的および近似的なモデリングに対処できる。実験の結果,提案したGSP-KalmanNetは,モデルベースベンチマークとデータ駆動ベンチマークの両方と比較して,精度と実行時間性能が向上し,モデルの誤指定に対するロバスト性も改善されることが示された。 Dynamic systems of graph signals are encountered in various applications, including social networks, power grids, and transportation. While such systems can often be described as state space (SS) models, tracking graph signals via conventional tools based on the Kalman filter (KF) and its variants is typically challenging. This is due to the nonlinearity, high dimensionality, irregularity of the domain, and complex modeling associated with real-world dynamic systems of graph signals. In this work, we study the tracking of graph signals using a hybrid model-based/data-driven approach. We develop the GSP-KalmanNet, which tracks the hidden graphical states from the graphical measurements by jointly leveraging graph signal processing (GSP) tools and deep learning (DL) techniques. The derivations of the GSP-KalmanNet are based on extending the KF to exploit the inherent graph structure via graph frequency domain filtering, which considerably simplifies the computational complexity entailed in processing high-dimensional signals and increases the robustness to small topology changes. 
Then, we use data to learn the Kalman gain following the recently proposed KalmanNet framework, which copes with partial and approximated modeling, without forcing a specific model over the noise statistics. Our empirical results demonstrate that the proposed GSP-KalmanNet achieves enhanced accuracy and run time performance as well as improved robustness to model misspecifications compared with both model-based and data-driven benchmarks.	翻訳日:2023-11-29 19:24:57 公開日:2023-11-28
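For readers unfamiliar with the Kalman gain mentioned in this abstract, the following is a minimal sketch of one predict/update step of the classical scalar Kalman filter. It is not GSP-KalmanNet: the graph frequency domain filtering is omitted, and the analytic gain computed on the marked line is the quantity that, according to the abstract, is learned from data instead. The parameter names (`a`, `h`, `q`, `r`) are the usual textbook symbols and are assumptions here.

```python
def kalman_step(x, p, z, a=1.0, h=1.0, q=0.01, r=0.1):
    """One step of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    z    : new measurement
    a, h : state-transition and observation coefficients
    q, r : process and measurement noise variances
    """
    # Predict the next state and its variance.
    x_pred = a * x
    p_pred = a * p * a + q
    # Analytic Kalman gain (the part a learned model would replace).
    k = p_pred * h / (h * p_pred * h + r)
    # Correct the prediction with the measurement residual.
    x_new = x_pred + k * (z - h * x_pred)
    p_new = (1.0 - k * h) * p_pred
    return x_new, p_new, k


x_new, p_new, gain = kalman_step(x=0.0, p=1.0, z=2.0, q=0.0, r=1.0)
print(x_new, p_new, gain)
```

With equal prior and measurement variance, the gain is 0.5 and the estimate lands halfway between the prediction and the measurement.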
# カスタマイズされた組込み要素を利用する -- HTML5 Webコンポーネントによるコンポーネントベースのソフトウェアエンジニアリングとデザインシステム Harnessing customized built-in elements -- Empowering Component-Based Software Engineering and Design Systems with HTML5 Web Components ( http://arxiv.org/abs/2311.16601v1 ) ライセンス: Link先を確認	Hardik Shah	(参考訳) HTML5のカスタマイズされた組み込み要素は、Web開発を大きく変える。これらの要素により、開発者は特定の設計と目的に合わせて独自のHTMLコンポーネントを作成できる。カスタマイズされた組み込み要素により、開発者はWebアプリケーションのユニークなニーズに素早く対処でき、多様なデジタルプラットフォームで一貫したユーザーインターフェイスとエクスペリエンスをサポートできる。本研究では,コンポーネントベースソフトウェア工学(CBSE)とデザインシステムにおけるこれらの機能の役割を考察し,Web開発におけるコードのモジュール化,再利用性,スケーラビリティのメリットを強調する。また、ブラウザの互換性、パフォーマンスの最適化、アクセシビリティ、セキュリティ、スタイリング、相互運用性など、カスタマイズされた組み込み要素を作成する際に対処しなければならない困難と懸念についても論じる。これらの機能の可能性を完全に実現するために、標準化、開発者ツール、コミュニティの相互作用の重要性を強調する。今後、カスタマイズされた組み込み要素は、IoT(Internet of Things)、eコマース、教育技術など、さまざまなアプリケーションで活用できる可能性がある。プログレッシブWebアプリ(PWA)への組み入れは、Webエクスペリエンスをさらに改善することが期待されている。障害は残るが、本稿はHTML5のカスタマイズされた組み込み要素がWeb開発革新の原動力であり、常に変化するデジタルコンテキストにおいて、効率的で適応的でユーザ中心のWebアプリケーションを生産できると結論付けている。 Customized built-in elements in HTML5 significantly transform web development. These elements enable developers to create unique HTML components tailored to a specific design and purpose. Customized built-in elements enable developers to address the unique needs of web applications more quickly, supporting consistent user interfaces and experiences across diverse digital platforms. This study investigates the role of these features in Component-Based Software Engineering (CBSE) and Design Systems, emphasizing the benefits of code modularity, reusability, and scalability in web development. 
The paper also discusses the difficulties and concerns that must be addressed when creating customized built-in elements, such as browser compatibility, performance optimization, accessibility, security, styling, and interoperability. It emphasizes the importance of standardization, developer tooling, and community interaction in order to fully realize the potential of these features. Looking ahead, customized built-in elements have potential in a variety of applications, including the Internet of Things (IoT), e-commerce, and educational technologies. Their incorporation into Progressive Web Apps (PWAs) is expected to further improve web experiences. While obstacles remain, the article concludes that HTML5 customized built-in elements are a driver for web development innovation, allowing the production of efficient, adaptive, and user-centric web applications in an ever-changing digital context.	翻訳日:2023-11-29 19:24:14 公開日:2023-11-28
# 光キャビティ、2準位原子およびJaynes-Cummingsエミッタで散乱された数十光子状態の統計 Statistics of tens-of-photon states scattered by optical cavity, two-level atom and Jaynes-Cummings emitter ( http://arxiv.org/abs/2311.16599v1 ) ライセンス: Link先を確認	Jia-Nan Wu, Bingsuo Zou and Yongyou Zhang	(参考訳) 光子状態を操作することは様々な光学デバイスの主要な要件であり、量子情報技術との関連性が高い。それにもかかわらず、数十光子状態に対する基本的な理論的枠組みは確立されていない。本研究では,導波路QED系において光共振器(OC),2準位原子(TLA)およびJaynes-Cummingsエミッタ(JCE)によって散乱された数十光子状態の統計を調べるために,行列積状態理論の確立に成功した。入射10光子状態を例として,少数光子の場合とは異なる新しい物理的結果を明らかにする。我々は、OCが光子数によらず入射光子状態の統計を変化させないことを検証する。しかし、TLAやJCEでは、光子数が光子のバンチングおよびアンチバンチングの挙動に強く影響する。光子数が増加するにつれて、JCEによって誘起される光子-光子相関には最大値が存在する。特に、TLA(またはJCE)による散乱波は、10光子の場合と2光子の場合とで極めて異なる統計的挙動を示す。数十光子状態に対するこれらの特徴的な結論と、開発された行列積状態理論は、多光子操作への道を開く。 Manipulating photon states serves as a primary requirement for various optical devices and is of high relevance for quantum information technology. Nevertheless, the fundamental theoretical framework for tens-of-photon states has not been established. This study successfully establishes the matrix-product-state theory to explore the statistics of the tens-of-photon states scattered by optical cavities (OCs), two-level atoms (TLAs), and Jaynes-Cummings emitters (JCEs) in waveguide-QED systems. Taking the incident 10-photon states as an example, we reveal some novel physical results that differ from those for few-photon cases. We verify that OCs do not change the statistics of the incident photon states, being independent of the photon number. However, for the TLAs and JCEs, the photon number strongly impacts the photon bunching and anti-bunching behaviors. As the photon number increases, there exists a maximum value for the photon-photon correlation induced by the JCE. In particular, the waves scattered by the TLA (or JCE) exhibit extremely different statistical behaviors for the 10-photon cases from those for the bi-photon case. 
These distinctive conclusions for the tens-of-photon states and the developed matrix-product-state theory pave the way for multi-photon manipulation.	翻訳日:2023-11-29 19:23:18 公開日:2023-11-28
# ディープニューラルネットワーク加速器における故障位置推定のためのモニタ配置 Monitor Placement for Fault Localization in Deep Neural Network Accelerators ( http://arxiv.org/abs/2311.16594v1 ) ライセンス: Link先を確認	Wei-Kai Liu, Benjamin Tan, Krishnendu Chakrabarty	(参考訳) シストリックアレイは、並列性と効率的なデータ再利用を提供するため、ディープニューラルネットワーク(DNN)アクセラレータにとって有力な選択肢である。ハードウェア障害はDNN推論の精度を低下させる可能性があるため、DNNアクセラレータの信頼性の向上が不可欠である。シストリックアレイは並列処理に多数の処理要素(PE)を用いるが、1つのPEが故障すると、エラーが伝播し、下流PEの結果に影響を与える。 PEの数が多いため、すべてのPEにハードウェアベースのランタイム監視を実装することはコスト面で現実的ではない。本稿では,シストリックアレイ内のハードウェアモニタ配置を最適化するソリューションを提案する。まず、単一の故障PEを特定するには $2N-1$ 個のモニタが必要であることを証明し、そのモニタ配置を導出する。次に、与えられたモニタ数に対して故障PE候補の集合を最小化する第2の配置最適化問題がNP困難であることを示す。そこで,モニタ数が限られている場合にDNNアクセラレータの信頼性とハードウェアリソース利用のバランスをとるためのヒューリスティックな手法を提案する。実験により、$256\times 256$ のシストリックアレイにおいて、単一の故障PEを特定するための面積オーバーヘッドはわずか0.33%であることがわかった。 Systolic arrays are a prominent choice for deep neural network (DNN) accelerators because they offer parallelism and efficient data reuse. Improving the reliability of DNN accelerators is crucial as hardware faults can degrade the accuracy of DNN inferencing. Systolic arrays make use of a large number of processing elements (PEs) for parallel processing, but when one PE is faulty, the error propagates and affects the outcomes of downstream PEs. Due to the large number of PEs, implementing hardware-based runtime monitoring of every single PE is prohibitively costly. We present a solution to optimize the placement of hardware monitors within systolic arrays. We first prove that $2N-1$ monitors are needed to localize a single faulty PE and we also derive the monitor placement. We show that a second placement optimization problem, which minimizes the set of candidate faulty PEs for a given number of monitors, is NP-hard. Therefore, we propose a heuristic approach to balance the reliability and hardware resource utilization in DNN accelerators when the number of monitors is limited. 
Experimental evaluation shows that to localize a single faulty PE, an area overhead of only 0.33% is incurred for a $256\times 256$ systolic array.	翻訳日:2023-11-29 19:22:50 公開日:2023-11-28
# 新型コロナウイルス検出の強化 - ネットワーク深層学習アーキテクチャの微調整によるパフォーマンスの最適化 Empowering COVID-19 Detection: Optimizing Performance Through Fine-Tuned EfficientNet Deep Learning Architecture ( http://arxiv.org/abs/2311.16593v1 ) ライセンス: Link先を確認	Md. Alamin Talukder, Md. Abu Layek, Mohsin Kazi, Md Ashraf Uddin, Sunil Aryal	(参考訳) 世界的な新型コロナウイルス(covid-19)パンデミックは、世界中の個人の健康と日常体験に大きな影響を与えている。急速感染を抑制するために早期かつ正確な検出を必要とする非常に伝染性の呼吸器疾患である。最初の検査方法は、主にウイルスの遺伝的構成を特定し、比較的低い検出率を示し、時間を要する手順を必要とする。この課題に対処するため、専門家は診断プロトコルの中で貴重なアプローチとして放射線画像、特に胸部x線を使うことを提案した。本研究では,深層学習アルゴリズムを用いたx線撮影(x線)による新型コロナウイルスの迅速かつ正確な診断の可能性について検討する。提案手法は,様々な確立したトランスファー学習モデルにおいて,適切な層を微調整することで検出精度を高める。実験は、2000枚の画像を含むcovid-19 x線データセットで行われた。 EfficientNetB4モデルでは100%の精度で達成された。微調整された efficientnetb4 は優れた精度スコアを達成し、ロバストな新型コロナウイルス検出モデルとしての可能性を示した。さらに4350枚の画像を含む胸部x線データを用いて肺疾患の同定に優れ、99.17%の精度、99.13%の精度、99.16%のリコール、99.14%のf1-scoreで優れた性能を得た。これらの結果は、医用画像、特にX線画像による効率的な肺検出のための微調整転写学習の可能性を浮き彫りにした。この研究は、放射線科医に、迅速かつ正確な新型コロナウイルスの診断を支援する効果的な手段を提供し、感染した患者を正確に識別する医療専門家に貴重な支援を提供する。 The worldwide COVID-19 pandemic has profoundly influenced the health and everyday experiences of individuals across the planet. It is a highly contagious respiratory disease requiring early and accurate detection to curb its rapid transmission. Initial testing methods primarily revolved around identifying the genetic composition of the coronavirus, exhibiting a relatively low detection rate and requiring a time-intensive procedure. To address this challenge, experts have suggested using radiological imagery, particularly chest X-rays, as a valuable approach within the diagnostic protocol. This study investigates the potential of leveraging radiographic imaging (X-rays) with deep learning algorithms to swiftly and precisely identify COVID-19 patients. The proposed approach elevates the detection accuracy by fine-tuning with appropriate layers on various established transfer learning models. The experimentation was conducted on a COVID-19 X-ray dataset containing 2000 images. 
The fine-tuned EfficientNetB4 model achieved an accuracy of 100% on this dataset, showcasing its potential as a robust COVID-19 detection model. Furthermore, EfficientNetB4 excelled in identifying lung disease on a chest X-ray dataset containing 4,350 images, achieving an accuracy of 99.17%, precision of 99.13%, recall of 99.16%, and F1-score of 99.14%. These results highlight the promise of fine-tuned transfer learning for efficient lung disease detection through medical imaging, especially with X-ray images. This research offers radiologists an effective means of aiding rapid and precise COVID-19 diagnosis and contributes valuable assistance for healthcare professionals in accurately identifying affected patients.	翻訳日:2023-11-29 19:22:18 公開日:2023-11-28
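The four metrics quoted in this abstract (accuracy, precision, recall, F1-score) are related as in the sketch below, which computes them from binary confusion-matrix counts. The counts are invented for illustration and do not come from the paper, and the paper's lung disease task may be multi-class, in which case some averaging scheme would be applied on top of this.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumes at least one predicted positive and one actual positive,
    so that no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 90 true positives, 10 false positives,
# 30 false negatives, 70 true negatives.
metrics = binary_metrics(tp=90, fp=10, fn=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, as it does in the figures reported above.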
# レーン検出の一般化:多様性向上のためのhdマップを用いた新しい枠組み Improving Lane Detection Generalization: A Novel Framework using HD Maps for Boosting Diversity ( http://arxiv.org/abs/2311.16589v1 ) ライセンス: Link先を確認	Daeun Lee, Minhyeok Heo and Jiwon Kim	(参考訳) レーン検出は、車両が道路上の位置をナビゲートし、ローカライズするための重要なタスクである。信頼性の高い結果を得るためには、レーン検出アルゴリズムは様々な道路環境において堅牢な一般化性能を有する必要がある。しかし,深層学習に基づく車線検出アルゴリズムの大幅な性能向上にもかかわらず,道路環境の変化に対する一般化性能は期待に届かなかった。本稿では,レーン検出におけるssdg(single-source domain generalization)の新たな枠組みを提案する。データをレーン構造や周囲に分解することで,ハイディフィニション(HD)マップと生成モデルを用いて多様性を向上させる。データボリュームを拡大するのではなく、データのコアサブセットを戦略的に選択し、多様性を最大化し、パフォーマンスを最適化します。広範な実験により,提案手法はレーン検出の一般化性能を向上し,ドメイン適応型手法に匹敵することを示した。 Lane detection is a vital task for vehicles to navigate and localize their position on the road. To ensure reliable results, lane detection algorithms must have robust generalization performance in various road environments. However, despite the significant performance improvement of deep learning-based lane detection algorithms, their generalization performance in response to changes in road environments still falls short of expectations. In this paper, we present a novel framework for single-source domain generalization (SSDG) in lane detection. By decomposing data into lane structures and surroundings, we enhance diversity using High-Definition (HD) maps and generative models. Rather than expanding data volume, we strategically select a core subset of data, maximizing diversity and optimizing performance. Our extensive experiments demonstrate that our framework enhances the generalization performance of lane detection, comparable to the domain adaptation-based method.	翻訳日:2023-11-29 19:21:50 公開日:2023-11-28
# MedGen: 医学テキスト処理のためのPython自然言語処理ツールキット MedGen: A Python Natural Language Processing Toolkit for Medical Text Processing ( http://arxiv.org/abs/2311.16588v1 ) ライセンス: Link先を確認	Rui Yang, Qingcheng Zeng, Keen You, Yujie Qiao, Lucas Huang, Chia-Chun Hsieh, Benjamin Rosand, Jeremy Goldwasser, Amisha D Dave, Tiarnan D.L. Keenan, Emily Y Chew, Dragomir Radev, Zhiyong Lu, Hua Xu, Qingyu Chen, Irene Li	(参考訳) 本研究は、医学テキスト処理用に設計された包括的な自然言語処理(NLP)ツールキットであるMedGenを紹介する。 MedGenは、最小限のプログラミング知識で利用できる、使いやすいオールインワンのソリューションとして、バイオメディカル研究者や医療専門家向けに設計されている。MedGenには次の機能が含まれる。(1)生成機能: MedGenは初めて、質問応答、テキスト要約、テキスト単純化、機械翻訳という4つの高度な生成機能を備える。(2)基本NLP機能: MedGenは、単語のトークン化や文のセグメンテーションなど12の必須NLP機能を統合する。(3)クエリおよび検索機能: MedGenは、テキストコーパスに対するユーザフレンドリなクエリ機能と検索機能を提供する。我々は32のドメイン固有言語モデルを微調整し、24の確立されたベンチマークで徹底的に評価し、臨床医による手動レビューを行った。さらに,クエリ機能や検索機能を導入してツールキットを拡張し,サードパーティライブラリの機能を標準化して統合した。ツールキット、そのモデル、および関連するデータはhttps://github.com/Yale-LILY/MedGenから公開されている。 This study introduces MedGen, a comprehensive natural language processing (NLP) toolkit designed for medical text processing. MedGen is tailored for biomedical researchers and healthcare professionals with an easy-to-use, all-in-one solution that requires minimal programming expertise. It includes (1) Generative Functions: For the first time, MedGen includes four advanced generative functions: question answering, text summarization, text simplification, and machine translation; (2) Basic NLP Functions: MedGen integrates 12 essential NLP functions such as word tokenization and sentence segmentation; and (3) Query and Search Capabilities: MedGen provides user-friendly query and search functions on text corpora. We fine-tuned 32 domain-specific language models, evaluated them thoroughly on 24 established benchmarks and conducted manual reviews with clinicians. Additionally, we expanded our toolkit by introducing query and search functions, while also standardizing and integrating functions from third-party libraries. 
The toolkit, its models, and associated data are publicly available via https://github.com/Yale-LILY/MedGen.	翻訳日:2023-11-29 19:21:34 公開日:2023-11-28
# ニューヨーク市のゴミ問題を整理する Sorting Out New York City's Trash Problem ( http://arxiv.org/abs/2311.16585v1 ) ライセンス: Link先を確認	Steven DiSilvio, Anthony Ozerov, Leon Zhou	(参考訳) ニューヨーク市における廃棄物の削減と公衆衛生の改善のためには、市独自の都市景観に合わせた革新的な政策が必要である。最初に提案するプログラムはDumpster and Compost Accessibility Programである。このプログラムは安価で、消火栓の近くに設置した大型ごみ容器(ダンプスター)を利用することで、駐車スペースを減らすことなく路上からごみをなくす。また、法的な変更と、単一世帯・二世帯住宅へのコンポスト容器の提供も含まれており、これらを合わせることでコンポスト率が向上する。第2のプログラムはPay-As-You-Throw Programである。これは、単一世帯・二世帯住宅に住むニューヨーク市民に対し、市が収集するごみ袋1つごとにステッカーの購入を義務付けるもので、コンポスト可能な廃棄物とリサイクル可能なごみを分別するインセンティブを与える。市の優先順位に基づいて最適なステッカー価格を決定するために,重み付き多目的最適化を行う。このプログラムは、価格にほぼ比例して転換率(diversion rate)を高め、ニューヨーク市衛生局の純コストを下げる。この2つのプログラムを組み合わせることで、ニューヨーク市の転換率を改善し、街路からゴミ袋をなくし、市の費用を節約できる可能性がある。 To reduce waste and improve public health and sanitation in New York City, innovative policies tailored to the city's unique urban landscape are necessary. The first program we propose is the Dumpster and Compost Accessibility Program. This program is affordable and utilizes dumpsters placed near fire hydrants to keep waste off the street without eliminating parking spaces. It also includes legal changes and the provision of compost bins to single/two-family households, which together will increase composting rates. The second program is the Pay-As-You-Throw Program. This requires New Yorkers living in single/two-family households to purchase stickers for each refuse bag they have collected by the city, incentivizing them to sort out compostable waste and recyclables. We conduct a weighted multi-objective optimization to determine the optimal sticker price based on the City's priorities. Roughly in proportion to the price, this program will increase diversion rates and decrease the net costs to New York City's Department of Sanitation. In conjunction, these two programs will improve NYC's diversion rates, eliminate garbage bags from the streets, and potentially save New York City money.	翻訳日:2023-11-29 19:21:12 公開日:2023-11-28
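One common way to carry out a weighted multi-objective optimization like the one mentioned in this abstract is weighted-sum scalarization: score each candidate price by a weighted sum of objective values and keep the best. The sketch below assumes that approach. The two objective functions, the candidate prices and the weights are toy placeholders invented for illustration; the paper's actual objectives and model are not described in the abstract.

```python
def best_price(prices, objectives, weights):
    """Return the candidate price with the highest weighted objective score.

    objectives : functions mapping a price to a score in [0, 1], higher is better
    weights    : one non-negative weight per objective, reflecting priorities
    """
    def score(price):
        return sum(w * f(price) for f, w in zip(objectives, weights))
    return max(prices, key=score)


# Toy objectives (purely hypothetical shapes).
def diversion(price):
    # Diversion benefit grows with price and saturates at 4 dollars.
    return min(1.0, price / 4.0)

def affordability(price):
    # Affordability falls linearly and reaches zero at 5 dollars.
    return 1.0 - price / 5.0

candidate_prices = [0, 1, 2, 3, 4, 5]
diversion_first = best_price(candidate_prices, [diversion, affordability], (0.7, 0.3))
affordability_first = best_price(candidate_prices, [diversion, affordability], (0.2, 0.8))
print(diversion_first, affordability_first)
```

Changing the weights moves the chosen price, which is how a city's priorities would enter such a model.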
# fedal: 逆学習によるブラックボックスフェデレート知識蒸留 FedAL: Black-Box Federated Knowledge Distillation Enabled by Adversarial Learning ( http://arxiv.org/abs/2311.16584v1 ) ライセンス: Link先を確認	Pengchao Han, Xingyan Shi, Jianwei Huang	(参考訳) 知識蒸留(KD)は、異なるモデルアーキテクチャを持ち、ローカルデータやモデルパラメータを他と共有しない分散クライアント間の協調学習を可能にする。各クライアントは、すべてのクライアントモデルの平均モデル出力/フィーチャをターゲットとしてローカルモデルを更新する。しかし、クライアントのローカルモデルが不均一なローカルデータセットでトレーニングされている場合、既存のフェデレーションKDメソッドはよく機能しないことが多い。本稿では,クライアント間のデータ不均一性に対応するために,Adversarial Learning (FedAL) によって実現されたフェデレーション知識蒸留を提案する。まず、データの不均一性に起因するクライアント間の局所モデル出力のばらつきを軽減するため、サーバはクライアント間のコンセンサスモデル出力をクライアントと差別者間のmin-maxゲームを介してクライアント間のコンセンサスモデル出力を達成するための判別器として機能する。さらに、クライアントのローカルトレーニングや、クライアントの異種ローカルデータによるグローバルナレッジ転送中に、壊滅的な忘れが起きる可能性がある。この課題に向けて,我々は,ローカルトレーニングとグローバル知識伝達の両方において,忘れられない正規化をデザインし,クライアントが他者に知識を伝達/誘導する能力を保証する。実験により,FedALとその変異体は,他の連合KDベースラインよりも高い精度が得られることが示された。 Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data and model parameters with others. Each client updates its local model using the average model output/feature of all client models as the target, known as federated KD. However, existing federated KD methods often do not perform well when clients' local models are trained with heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address the data heterogeneity among clients. First, to alleviate the local model output divergence across clients caused by data heterogeneity, the server acts as a discriminator to guide clients' local model training to achieve consensus model outputs among clients through a min-max game between clients and the discriminator. Moreover, catastrophic forgetting may happen during the clients' local training and global knowledge transfer due to clients' heterogeneous local data. 
To address this challenge, we design a less-forgetting regularization for both local training and global knowledge transfer to guarantee clients' ability to transfer knowledge to, and learn knowledge from, others. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.	翻訳日:2023-11-29 19:20:51 公開日:2023-11-28
# GeoScaler: 3Dメッシュテクスチャの幾何学とレンダリングを考慮したダウンサンプリング GeoScaler: Geometry and Rendering-Aware Downsampling of 3D Mesh Textures ( http://arxiv.org/abs/2311.16581v1 ) ライセンス: Link先を確認	Sai Karthikey Pentapati, Anshul Rai, Arkady Ten, Chaitanya Atluru, Alan Bovik	(参考訳) 高解像度テクスチャマップは、3Dメッシュで現実世界のオブジェクトを正確に表現するために必要である。テクスチャのサイズが大きいと、計算予算が低くメモリが限られているデバイス上で、高品質な仮想3Dシーンのリアルタイムレンダリングのボトルネックになりうる。テクスチャマップのダウンサンプリングは、視覚的忠実さを犠牲にするものの、この問題に直接対処する。伝統的に、テクスチャマップのダウンサンプリングは、バイキュービック補間やLanczosアルゴリズムのような手法を用いて行われる。これらの方法はメッシュの幾何学的レイアウトやそのUVパラメータ化を無視し、ユーザが目にする最終的な可視化を得るために使われるレンダリングプロセスも考慮しない。これらのギャップを埋めるために,幾何学的手がかりを取り入れ、テクスチャ付きメッシュのレンダリングビューの視覚的忠実度を最大化することで、3Dメッシュのテクスチャマップをダウンサンプリングする手法GeoScalerを導入する。GeoScalerが生成するテクスチャは, 従来のダウンサンプリング法で生成されたものと比較して, レンダリング画像の品質を大幅に向上させることを示す。 High-resolution texture maps are necessary for representing real-world objects accurately with 3D meshes. The large sizes of textures can bottleneck the real-time rendering of high-quality virtual 3D scenes on devices having low computational budgets and limited memory. Downsampling the texture maps directly addresses the issue, albeit at the cost of visual fidelity. Traditionally, downsampling of texture maps is performed using methods like bicubic interpolation and the Lanczos algorithm. These methods ignore the geometric layout of the mesh and its UV parametrization and also do not account for the rendering process used to obtain the final visualization that the users will experience. To fill these gaps, we introduce GeoScaler, a method for downsampling texture maps of 3D meshes that incorporates geometric cues and maximizes the visual fidelity of the rendered views of the textured meshes. We show that the textures generated by GeoScaler deliver significantly better quality rendered images compared to those generated by traditional downsampling methods.	翻訳日:2023-11-29 19:20:27 公開日:2023-11-28
# ノイズラベルを用いた医用画像セグメンテーションのためのクリーンラベルアンテング Clean Label Disentangling for Medical Image Segmentation with Noisy Labels ( http://arxiv.org/abs/2311.16580v1 ) ライセンス: Link先を確認	Zicheng Wang, Zhen Zhao, Erjian Guo and Luping Zhou	(参考訳) 医療画像のセグメンテーションに焦点を絞る現在の手法は、ノイズラベル問題として知られる不正確なアノテーションに苦しむ。ノイズラベルを用いた医用画像セグメンテーションのほとんどはノイズ遷移行列、ノイズロバスト損失関数、擬似ラベル法のいずれかを使用しているが、現在の研究はクリーンラベルの絡み合いに焦点を当てていない。主な理由は、厳格なクラス不均衡が選択された ``clean''' ラベルの不正確さを招き、ノイズに対するモデルの堅牢性に影響を与えるためである。そこで本研究では,新たに提案するクリーンラベル分離フレームワークにより,与えられたラベル集合からクリーンラベルを選択的に選択し,適切なアノテーションからモデルに学習を促すための,単純かつ効率的なクラスバランスのサンプリング戦略を考案する。しかし、そのようなメソッドは有用な情報を含むかもしれないアノテーションを多すぎるとフィルタリングする。そこで我々は,クリーンラベル・ディエンタングフレームワークをさらに拡張し,新しいノイズの多い機能支援クリーンラベル・ディエンタングフレームワークを構築した。広範な実験により,本手法の有効性が検証され,新たな最先端性能が得られた。私たちのコードはhttps://github.com/xiaoyao3302/2bdenoiseで利用可能です。 Current methods focusing on medical image segmentation suffer from incorrect annotations, which is known as the noisy label issue. Most medical image segmentation with noisy labels methods utilize either noise transition matrix, noise-robust loss functions or pseudo-labeling methods, while none of the current research focuses on clean label disentanglement. We argue that the main reason is that the severe class-imbalanced issue will lead to the inaccuracy of the selected ``clean'' labels, thus influencing the robustness of the model against the noises. In this work, we come up with a simple but efficient class-balanced sampling strategy to tackle the class-imbalanced problem, which enables our newly proposed clean label disentangling framework to successfully select clean labels from the given label sets and encourages the model to learn from the correct annotations. However, such a method will filter out too many annotations which may also contain useful information. Therefore, we further extend our clean label disentangling framework to a new noisy feature-aided clean label disentangling framework, which takes the full annotations into utilization to learn more semantics. 
Extensive experiments have validated the effectiveness of our methods, where our methods achieve new state-of-the-art performance. Our code is available at https://github.com/xiaoyao3302/2BDenoise.	翻訳日:2023-11-29 19:20:08 公開日:2023-11-28
# 感情とその状態に関する条件付き因果関係の認識 Recognizing Conditional Causal Relationships about Emotions and Their Corresponding Conditions ( http://arxiv.org/abs/2311.16579v1 ) ライセンス: Link先を確認	Xinhong Chen, Zongxi Li, Yaowei Wang, Haoran Xie, Jianping Wang, Qing Li	(参考訳) テキストにおける感情と原因の因果関係の研究は近年注目されている。ほとんどの作品は文書から因果関係の節を抽出することに焦点を当てている。しかしながら、これらの研究は、抽出された感情と原因節間の因果関係が特定の文脈節でのみ有効であると考えるものはない。このような特別な因果関係における文脈を強調するために、感情と原因の入力対が異なる文脈下で有効な因果関係を持つか否かを判定し、因果関係に関与する特定の文脈節を抽出するための新しいタスクを提案する。このタスクは、既存のデータセットが利用できない新しいタスクであるため、ベンチマークデータセット上で手動アノテーションを実行して、タスクのラベルと、他のアプリケーションでも使用できる各コンテキスト節の型のアノテーションを取得します。文書数と因果関係のバランスをとるために,最終的なデータセットを構築するために負のサンプリングを採用する。構築したデータセットに基づいてエンドツーエンドのマルチタスクフレームワークを提案し、タスクの2つの目標を処理するために、2つの新しいモジュールと一般的なモジュールを設計する。具体的には,因果関係に関与する文脈節を抽出するためのコンテキストマスキングモジュールを提案する。入力感情や原因が特定の文脈節に依存するかどうかに応じて予測結果を微調整する予測集約モジュールを提案する。比較実験およびアブレーション実験の結果,提案手法の有効性と汎用性が示された。 The study of causal relationships between emotions and causes in texts has recently received much attention. Most works focus on extracting causally related clauses from documents. However, none of these works has considered that the causal relationships among the extracted emotion and cause clauses can only be valid under some specific context clauses. To highlight the context in such special causal relationships, we propose a new task to determine whether or not an input pair of emotion and cause has a valid causal relationship under different contexts and extract the specific context clauses that participate in the causal relationship. Since the task is new for which no existing dataset is available, we conduct manual annotation on a benchmark dataset to obtain the labels for our tasks and the annotations of each context clause's type that can also be used in some other applications. We adopt negative sampling to construct the final dataset to balance the number of documents with and without causal relationships. 
Based on the constructed dataset, we propose an end-to-end multi-task framework, where we design two novel and general modules to handle the two goals of our task. Specifically, we propose a context masking module to extract the context clauses participating in the causal relationships. We propose a prediction aggregation module to fine-tune the prediction results according to whether the input emotion and causes depend on specific context clauses. Results of extensive comparative experiments and ablation studies demonstrate the effectiveness and generality of our proposed framework.	翻訳日:2023-11-29 19:19:42 公開日:2023-11-28
# 事前学習モデルを用いた画像ネットの効率的なキーベース対向防御 Efficient Key-Based Adversarial Defense for ImageNet by Using Pre-trained Model ( http://arxiv.org/abs/2311.16577v1 ) ライセンス: Link先を確認	AprilPyone MaungMaung, Isao Echizen, Hitoshi Kiya	(参考訳) 本稿では,事前学習モデルの活用と,ImageNet-1k分類における最近の高速微調整技術を活用したキーベースディフェンスモデル拡散手法を提案する。まず、キーベースのモデルをエッジデバイスにデプロイすることは、apple coremlのような最新のモデル展開の進歩によって実現可能であることを強調するが、メインストリームのエンタープライズエッジ人工知能(edge ai)はクラウドにフォーカスしている。その結果,(1)スクラッチから多くの分類器を訓練することは不可能であり,(2)ImageNetのような大規模データセット上でも,キーベースのディフェンスを徹底的にテストする必要がある。そこで本研究では,事前学習したモデルを活用して,限られた計算資源でも鍵ベースのモデルを増殖させる効率的な微調整手法を提案する。適応攻撃と非適応攻撃を用いてimagenet-1kデータセット上で実験を行った。以上の結果から,提案手法は従来のキーベースモデルと比較して,クリーンかつ逆例の分類において優れた分類精度(10%以上増加)が得られることが示された。 In this paper, we propose key-based defense model proliferation by leveraging pre-trained models and utilizing recent efficient fine-tuning techniques on ImageNet-1k classification. First, we stress that deploying key-based models on edge devices is feasible with the latest model deployment advancements, such as Apple CoreML, although the mainstream enterprise edge artificial intelligence (Edge AI) has been focused on the Cloud. Then, we point out that the previous key-based defense on on-device image classification is impractical for two reasons: (1) training many classifiers from scratch is not feasible, and (2) key-based defenses still need to be thoroughly tested on large datasets like ImageNet. To this end, we propose to leverage pre-trained models and utilize efficient fine-tuning techniques to proliferate key-based models even on limited computing resources. Experiments were carried out on the ImageNet-1k dataset using adaptive and non-adaptive attacks. The results show that our proposed fine-tuned key-based models achieve a superior classification accuracy (more than 10% increase) compared to the previous key-based models on classifying clean and adversarial examples.	翻訳日:2023-11-29 19:19:20 公開日:2023-11-28
# エピポーラ変位場を用いたParallax-Tolerant Image Stitching Parallax-Tolerant Image Stitching with Epipolar Displacement Field ( http://arxiv.org/abs/2311.16637v1 ) ライセンス: Link先を確認	Jian Yu, Yi Yu, Feipeng Da	(参考訳) 視差の大きい画像のスティッチングは難しい作業である。既存の手法は、アライメントアーティファクトとワーピングによる歪みを抑えつつ、画像の局所的構造と大域的構造の両方を維持することに苦労することが多い。本稿では, エピポーラ変位場に基づくワーピング技術を確立するために, エピポーラ幾何を利用した新しいアプローチを提案する。まず、エピポーラ幾何における画素のワーピング規則を無限ホモグラフィーによって確立する。次に、局所的な弾性変形の原理に基づいて、ワープされた画素のエピポーラ線に沿った滑り距離を表すエピポーラ変位場を薄板スプラインで定式化する。スティッチング結果は、エピポーラ変位場に応じて画素を逆ワーピングすることで得られる。この方法では、エピポーラ制約をワーピング規則に取り入れ、高品質なアライメントを保証し、パノラマの射影性を維持する。定性的,定量的な比較実験により、視差の大きい画像のスティッチングにおける提案手法の競争力を示す。 Large parallax image stitching is a challenging task. Existing methods often struggle to maintain both the local and global structures of the image while reducing alignment artifacts and warping distortions. In this paper, we propose a novel approach that utilizes epipolar geometry to establish a warping technique based on the epipolar displacement field. Initially, the warping rule for pixels in the epipolar geometry is established through the infinite homography. Subsequently, the epipolar displacement field, which represents the sliding distance of the warped pixel along the epipolar line, is formulated by thin plate splines based on the principle of local elastic deformation. The stitching result can be generated by inversely warping the pixels according to the epipolar displacement field. This method incorporates the epipolar constraints in the warping rule, which ensures high-quality alignment and maintains the projectivity of the panorama. Qualitative and quantitative comparative experiments demonstrate the competitiveness of the proposed method in stitching images with large parallax.	翻訳日:2023-11-29 19:11:49 公開日:2023-11-28
# MotionZero:ゼロショットのテキストからのビデオ生成における動き事前知識の活用 MotionZero:Exploiting Motion Priors for Zero-shot Text-to-Video Generation ( http://arxiv.org/abs/2311.16635v1 ) ライセンス: Link先を確認	Sitong Su, Litao Guo, Lianli Gao, Hengtao Shen and Jingkuan Song	(参考訳) ゼロショットのテキストからのビデオ合成は、ビデオを一切用いずにプロンプトに基づいてビデオを生成する。ビデオからの動き情報がないため、プロンプトに暗示される動き事前知識が重要なガイダンスとなる。例えば、プロンプト "airplane landing on the runway" は、"airplane" は下方へ動き、"runway" は静止したままであるという動き事前知識を示す。しかし、従来のアプローチでは動き事前知識が十分に活用されておらず、2つの非自明な問題が生じる。 1) 動き事前知識が無視されるため、動きの変動パターンは変化せず、プロンプトに依存しない。 2) 異なる物体の独立した動き事前知識が考慮されないため、異なる物体の動き制御は不正確で、互いに絡み合う。この2つの問題に対処するために,MotionZeroと呼ぶプロンプト適応型かつ分離型の動き制御戦略を提案する。これは大規模言語モデルにより異なる物体のプロンプトから動き事前知識を導出し、それに応じて各物体の動き制御を対応する領域に分離して適用する。さらに,動きの振幅の度合いが異なるビデオに対応するために,動きの振幅に応じてフレーム間の注意を調整するMotion-Aware Attention方式を提案する。広範な実験により,提案手法が異なる物体の動きを正しく制御し,ゼロショットビデオ編集を含む多用途なアプリケーションをサポートすることを実証した。 Zero-shot Text-to-Video synthesis generates videos based on prompts without any videos. Without motion information from videos, motion priors implied in prompts are vital guidance. For example, the prompt "airplane landing on the runway" indicates motion priors that the "airplane" moves downwards while the "runway" stays static. However, motion priors are not fully exploited in previous approaches, which leads to two nontrivial issues: 1) the motion variation pattern remains unaltered and prompt-agnostic because motion priors are disregarded; 2) the motion control of different objects is inaccurate and entangled because the independent motion priors of different objects are not considered. To tackle the two issues, we propose a prompt-adaptive and disentangled motion control strategy coined MotionZero, which derives motion priors from prompts of different objects using large language models and accordingly applies motion control of different objects to the corresponding regions in a disentangled manner. Furthermore, to facilitate videos with varying degrees of motion amplitude, we propose a Motion-Aware Attention scheme which adjusts attention among frames by motion amplitude. 
Extensive experiments demonstrate that our strategy can correctly control the motion of different objects and support versatile applications, including zero-shot video editing.	翻訳日:2023-11-29 19:11:30 公開日:2023-11-28
# ブラックボックスのオープン:ビルディング物理洞察を用いた本質的に解釈可能なエネルギーデータインプテーションモデルに向けて Opening the Black Box: Towards inherently interpretable energy data imputation models using building physics insight ( http://arxiv.org/abs/2311.16632v1 ) ライセンス: Link先を確認	Antonio Liguori, Matias Quintana, Chun Fu, Clayton Miller, J\'er\^ome Frisch, Christoph van Treeck	(参考訳) 失われたデータは、建築エネルギーモデリングコミュニティの実践者や研究者によってしばしば観察される。この点において、ディープラーニング手法のような先進的なデータ駆動ソリューションは、一般的にこれらの異常の非線形挙動を反映するために必要である。ディープラーニングに関する継続的な研究課題として、ネットワークに事前知識を導入することで、限られたデータ設定へのモデルの適用性を検討することができる。この戦略は、より解釈可能な予測につながる可能性があるため、アプローチのフィールド適用が容易になる。本研究の目的は, 物理インフォームド・デノイング・オートエンコーダ (PI-DAE) を用いて, 商業ビルにおけるデータ計算の欠如について検討することである。特に,提案手法では,物理に着想を得たソフト制約をデノナイジングオートエンコーダ(DAE)の損失関数に適用する。物理成分の利点を定量化するために、異なるDAE構成間のアブレーション研究を行った。まず、3つの単変量DAEを室内の気温、暖房、冷却データに別々に最適化する。次に、以前の構成から2つの多変量DAEを導出する。最終的に、建築熱収支方程式を最終多変量構成に結合してPI-DAEを得る。さらに、この結果をサポートするために2つの一般的なベンチマークが使用される。多変量デノナイジングオートエンコーダにおける物理知識の導入は、最適化された物理ベースの係数を通して、固有モデルの解釈可能性を高めることができることを示す。提案したPI-DAEの復元誤差に関して有意な改善は見られていないが, 欠落率の変動に対する堅牢性の向上と, 物理に基づく係数から得られた貴重な洞察は, 建築システムや建築環境における幅広い応用の機会を生み出している。 Missing data are frequently observed by practitioners and researchers in the building energy modeling community. In this regard, advanced data-driven solutions, such as Deep Learning methods, are typically required to reflect the non-linear behavior of these anomalies. As an ongoing research question related to Deep Learning, a model's applicability to limited data settings can be explored by introducing prior knowledge in the network. This same strategy can also lead to more interpretable predictions, hence facilitating the field application of the approach. For that purpose, the aim of this paper is to propose the use of Physics-informed Denoising Autoencoders (PI-DAE) for missing data imputation in commercial buildings. In particular, the presented method enforces physics-inspired soft constraints to the loss function of a Denoising Autoencoder (DAE). 
In order to quantify the benefits of the physical component, an ablation study between different DAE configurations is conducted. First, three univariate DAEs are optimized separately on indoor air temperature, heating, and cooling data. Then, two multivariate DAEs are derived from the previous configurations. Eventually, a building thermal balance equation is coupled to the last multivariate configuration to obtain PI-DAE. Additionally, two commonly used benchmarks are employed to support the findings. It is shown how introducing physical knowledge in a multivariate Denoising Autoencoder can enhance the inherent model interpretability through the optimized physics-based coefficients. While no significant improvement is observed in terms of reconstruction error with the proposed PI-DAE, its enhanced robustness to varying rates of missing data and the valuable insights derived from the physics-based coefficients create opportunities for wider applications within building systems and the built environment.	翻訳日:2023-11-29 19:11:07 公開日:2023-11-28
# 条件付き集合変換による衣装完成 Outfit Completion via Conditional Set Transformation ( http://arxiv.org/abs/2311.16630v1 ) ライセンス: Link先を確認	Takuma Nakamura, Yuki Saito, Ryosuke Goto	(参考訳) 本稿では,衣装完成問題を集合検索タスクとして定式化し,この問題を解決するための新しい枠組みを提案する。この提案は、ディープニューラルネットワークを用いた条件付き集合変換アーキテクチャと、互換性に基づく正則化手法を含む。提案手法は,入力集合に対して置換不変,条件集合に対して置換同変な写像を用いる。これにより、条件集合の特性を反映しながら、入力集合と互換性のある集合を検索することができる。さらに、この構造は単一の推論で出力集合の要素を出力するので、出力集合の濃度に関してスケーラブルな推論速度を達成することができる。実データを用いた実験結果から,提案手法は,衣装完成タスクの精度,条件の充足度,完成結果の互換性において,既存の手法よりも優れていることがわかった。 In this paper, we formulate the outfit completion problem as a set retrieval task and propose a novel framework for solving this problem. The proposal includes a conditional set transformation architecture with deep neural networks and a compatibility-based regularization method. The proposed method utilizes a map that is permutation-invariant with respect to the input set and permutation-equivariant with respect to the condition set. This allows retrieving a set that is compatible with the input set while reflecting the properties of the condition set. In addition, since this structure outputs the elements of the output set in a single inference, it can achieve a scalable inference speed with respect to the cardinality of the output set. Experimental results on real data reveal that the proposed method outperforms existing approaches in terms of accuracy of the outfit completion task, condition satisfaction, and compatibility of completion results.	翻訳日:2023-11-29 19:10:40 公開日:2023-11-28
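The two symmetry properties named in the abstract above (invariance to the order of the input set, equivariance to the order of the condition set) can be illustrated with a minimal sketch. This is not the authors' architecture: the function `conditional_set_map`, the mean pooling, the `tanh` nonlinearity and the weight matrices `w_in` and `w_cond` are all hypothetical choices made only to show the properties.

```python
import numpy as np

def conditional_set_map(input_set, condition_set, w_in, w_cond):
    """Toy map: permutation-invariant in input_set, permutation-equivariant in condition_set.

    input_set:     array of shape (n, d_in), one row per input item
    condition_set: array of shape (m, d_cond), one row per condition item
    Returns an array of shape (m, h): one output row per condition item.
    """
    x = np.asarray(input_set, dtype=float)
    y = np.asarray(condition_set, dtype=float)
    # Mean pooling over the rows discards the order of the input items (invariance).
    pooled = np.tanh(x @ w_in).mean(axis=0)
    # Each condition item is processed row-wise with the same pooled summary,
    # so reordering condition items reorders the output rows identically (equivariance).
    return np.tanh(y @ w_cond + pooled)
```

Shuffling the rows of `input_set` leaves the result unchanged, while shuffling the rows of `condition_set` shuffles the output rows in the same way.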
# 対称性正則化ニューラル常微分方程式 Symmetry-regularized neural ordinary differential equations ( http://arxiv.org/abs/2311.16628v1 ) ライセンス: Link先を確認	Wenbo Hao	(参考訳) ニューラル常微分方程式(Neural Ordinary Differential Equations、Neural ODEs)は、ニューラルネットワークの隠れ状態ダイナミクスを常微分方程式として解釈する深層ニューラルネットワークモデルの一種であり、連続時間の枠組みでシステムのダイナミクスを捉えることができる。本研究では,対称性の正則化をニューラルODEに統合する。特に、モデルに関連するODEとPDEの連続リー対称性を用いて保存法則を導出し、それらを損失関数に加えることで、損失関数を物理情報に基づくものにする。損失関数に固有の構造特性を組み込むことで、トレーニング中のモデルの堅牢性と安定性を著しく向上できる可能性がある。本手法を説明するために,隠れ状態の変化率が余弦関数である玩具モデルを用い,リー対称性の同定,保存法則の導出,新たな損失関数の構築の過程を示す。 Neural Ordinary Differential Equations (Neural ODEs) are a class of deep neural network models that interpret the hidden state dynamics of neural networks as an ordinary differential equation, and are thereby capable of capturing system dynamics in a continuous time framework. In this work, I integrate symmetry regularization into Neural ODEs. In particular, I use continuous Lie symmetry of ODEs and PDEs associated with the model to derive conservation laws and add them to the loss function, making it physics-informed. This incorporation of inherent structural properties into the loss function could significantly improve robustness and stability of the model during training. To illustrate this method, I employ a toy model that utilizes a cosine rate of change in the hidden state, showcasing the process of identifying Lie symmetries, deriving conservation laws, and constructing a new loss function.	翻訳日:2023-11-29 19:10:28 公開日:2023-11-28
# アト秒トンネル顕微鏡の強磁場理論 Strong-field theory of attosecond tunneling microscopy ( http://arxiv.org/abs/2311.16626v1 ) ライセンス: Link先を確認	Boyang Ma and Michael Krüger	(参考訳) 分子およびナノ構造中のコヒーレントな電子ダイナミクスのアト秒観測は、従来の走査型トンネル顕微鏡(STM)と超短フェムト秒レーザーパルスを組み合わせることで達成できる。サブサイクル領域における実験的研究が進行中であるが、堅牢な強磁場理論による記述はいまだ得られていない。ここでは強磁場近似に基づくモデルを考案する。このモデルはすべての領域において有効であり、STMの標準モデルとの驚くべき類似性を与える。また,直観的なアト秒科学の3段階モデルがこのモデルから直接現れることを示し,アト秒STM実験の最適条件を記述する。 Attosecond observations of coherent electron dynamics in molecules and nanostructures can be achieved by combining conventional scanning tunneling microscopy (STM) with ultrashort femtosecond laser pulses. While experimental studies in the sub-cycle regime are underway, a robust strong-field theory description has remained elusive. Here we devise a model based on the strong-field approximation. Valid in all regimes, it provides a surprising analogy to the standard model of STM. We also show that the intuitive three-step model of attosecond science directly emerges from our model and describe the optimal conditions for attosecond STM experiments.	翻訳日:2023-11-29 19:10:02 公開日:2023-11-28
# カンパラにおける大気質モニタリングのためのガウス過程 Gaussian Processes for Monitoring Air-Quality in Kampala ( http://arxiv.org/abs/2311.16625v1 ) ライセンス: Link先を確認	Clara Stoddart, Lauren Shrack, Richard Sserunjogi, Usman Abdul-Ganiy, Engineer Bainomugisha, Deo Okure, Ruth Misener, Jose Pablo Folch, Ruby Sedgwick	(参考訳) 大気汚染のモニタリングは、人口全体の健康にとって極めて重要である。残念ながら、大気質を測定する装置は高価な場合があり、低・中所得国の多くの都市は、まばらに配置されたセンサーに頼らざるを得ない。本稿では,センサが存在しない場所での現在の大気汚染の推定(ナウキャスティング)と,センサ位置における将来の大気汚染の予測の両方にガウス過程を用いることについて検討する。特に、AirQoのセンサーネットワークのデータを用いて、ウガンダのカンパラ市に焦点を当てる。外れ値を除去する利点を実証し、異なるカーネル関数と追加入力を比較する。また、データセット内の大量の時系列データを扱えるようにするため、2つのスパース近似を比較する。 Monitoring air pollution is of vital importance to the overall health of the population. Unfortunately, devices that measure air quality can be expensive, and many cities in low- and middle-income countries have to rely on a sparse allocation of them. In this paper, we investigate the use of Gaussian Processes for both nowcasting the current air pollution in places where there are no sensors and forecasting the air pollution in the future at the sensor locations. In particular, we focus on the city of Kampala in Uganda, using data from AirQo's network of sensors. We demonstrate the advantage of removing outliers, and compare different kernel functions and additional inputs. We also compare two sparse approximations to allow for the large amounts of temporal data in the dataset.	翻訳日:2023-11-29 19:09:50 公開日:2023-11-28
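As a rough illustration of the nowcasting idea in the abstract above (predicting pollution at a location that has no sensor), here is a minimal exact Gaussian process regression sketch. It assumes a zero-mean prior and a squared-exponential (RBF) kernel; the abstract does not say which kernels, inputs or sparse approximations were used, so `rbf_kernel`, `gp_predict` and all parameter values are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def gp_predict(x_train, y_train, x_test, noise=1e-2, lengthscale=1.0, variance=1.0):
    """Posterior mean and variance of a zero-mean GP at x_test (noise-free latent function)."""
    k = rbf_kernel(x_train, x_train, lengthscale, variance)
    k = k + noise * np.eye(len(k))            # observation noise on the diagonal
    k_s = rbf_kernel(x_test, x_train, lengthscale, variance)
    k_ss = rbf_kernel(x_test, x_test, lengthscale, variance)
    chol = np.linalg.cholesky(k)              # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, np.asarray(y_train, dtype=float)))
    mean = k_s @ alpha
    v = np.linalg.solve(chol, k_s.T)
    var = np.diag(k_ss) - (v ** 2).sum(axis=0)
    return mean, var
```

Here `x_train` would hold sensor coordinates (and possibly time), `y_train` the measured pollutant levels, and `x_test` the unsensed locations. Because the prior mean is zero, predictions far from every sensor fall back to zero with the full prior variance, so in practice the readings would be centred first. The exact solve costs cubic time in the number of observations, which is the motivation for sparse approximations.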
# 実ロボットを用いた視覚意味ナビゲーション Visual Semantic Navigation with Real Robots ( http://arxiv.org/abs/2311.16623v1 ) ライセンス: Link先を確認	Carlos Guti\'errez-\'Alvarez, Pablo R\'ios-Navarro, Rafael Flor-Rodr\'iguez, Francisco Javier Acevedo-Rodr\'iguez, Roberto J. L\'opez-Sastre	(参考訳) ビジュアルセマンティックナビゲーション(VSN)は、ロボットが見えない環境でナビゲートするための視覚意味情報を学ぶ能力である。これらのVSNモデルは、トレーニング対象の仮想環境において、主に強化学習に基づくアプローチを使用してテストされる。したがって、これらのモデルが現実世界でどのように振る舞うかに関する詳細な分析はまだありません。そこで本研究では,VSNモデルを実ロボットに組み込む新しい手法を提案する。また,VSN 用の新しい ROS ベースのフレームワーク ROS4VSN をリリースし,任意の VSN モデルを ROS 互換ロボットに容易にデプロイし,実環境でテストできるようにした。実環境とシミュレーション環境において,2つのVSNエージェントを組み込んだ2つの異なるロボットを用いた実験により,これらのVSNソリューションに顕著な性能差があることが確認された。我々は,本研究が,実世界の現実シナリオにおけるエンボディエージェントの性能と効率の向上を究極的に目的とし,この課題に対処するための基盤を提供することを期待している。すべての実験を再現するコードは、https://github.com/gramuah/ros4vsnで確認できます。 Visual Semantic Navigation (VSN) is the ability of a robot to learn visual semantic information for navigating in unseen environments. These VSN models are typically tested in those virtual environments where they are trained, mainly using reinforcement learning based approaches. Therefore, we do not yet have an in-depth analysis of how these models would behave in the real world. In this work, we propose a new solution to integrate VSN models into real robots, so that we have true embodied agents. We also release a novel ROS-based framework for VSN, ROS4VSN, so that any VSN-model can be easily deployed in any ROS-compatible robot and tested in a real setting. Our experiments with two different robots, where we have embedded two state-of-the-art VSN agents, confirm that there is a noticeable performance difference of these VSN solutions when tested in real-world and simulation environments. We hope that this research will endeavor to provide a foundation for addressing this consequential issue, with the ultimate aim of advancing the performance and efficiency of embodied agents within authentic real-world scenarios. 
Code to reproduce all our experiments can be found at https://github.com/gramuah/ros4vsn.	翻訳日:2023-11-29 19:09:39 公開日:2023-11-28
# 圧縮光を弱値増幅に導入した空間計測の精度向上 Precision Enhancement in Spatial Measurement by Introducing Squeezed Light into Weak Value Amplification ( http://arxiv.org/abs/2311.16622v1 ) ライセンス: Link先を確認	Chaoxia Zhang, Yongchao Chen, Gang Chen, Hengxin Sun, Jing Zhang, Kui Liu, Rongguo Yang, Jiangrui Gao	(参考訳) TEM10圧縮真空ビームを注入することにより、弱値増幅(WVA)システムとスプリットライクな検出に基づく光学空間測定で精度向上を実証する。 wva技術とスクイズドビーム注入を組み合わせることで、標準量子限界を超える高精度の光学空間計測を実験的に実現するのは初めてである。その結果、マッハ・ツェンダー干渉計の真空入力ポートに圧縮ビームを加えることにより、500kHzで1.3倍の精度向上が達成される。最小測定可能な変位は1.08pmから0.85pmに減少し、対応する最小測定可能な傾きは0.86pradから0.67pradに減少する。また、低周波帯における空間測定も実施し、SNRを4kHzで2dB改善する。我々の研究は、重力波干渉計の校正や超高分解能量子イメージングなどに応用できる光空間計測の高精度化に有効な方法を提供する。 The precision enhancement is demonstrated in an optical spatial measurement based on weak value amplification (WVA) system and split-like detection, by injecting a TEM10 squeezed vacuum beam. It is the first time combining the WVA technique and squeezed beam injection to experimentally realize high-precision optical spatial measurement beyond the standard quantum limit. As a result, the precision enhancement of 1.3 times can be achieved at 500kHz by adding a squeezed beam in the vacuum input port of the Mach-Zehnder interferometer. The minimum measurable displacement is reduced from 1.08pm to 0.85pm and the corresponding minimum measurable tilt is reduced from 0.86prad to 0.67prad. Moreover, the spatial measurement at low-frequency band is also implemented and the SNR is improved 2dB at 4kHz. Our work provides an effective method to accomplish higher precision in optical spatial measurement, which has potential applications in gravitational wave interferometer calibration, super-resolution quantum imaging, etc.	翻訳日:2023-11-29 19:09:19 公開日:2023-11-28
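A quick consistency check of the figures quoted in the abstract above, assuming that "precision enhancement" denotes the ratio of the minimum measurable quantity without squeezing to that with squeezing (the abstract does not define it explicitly):

```latex
\[
\frac{1.08\,\mathrm{pm}}{0.85\,\mathrm{pm}} \approx 1.27,
\qquad
\frac{0.86\,\mathrm{prad}}{0.67\,\mathrm{prad}} \approx 1.28
\]
```

Both ratios round to the reported factor of 1.3.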
# Beyond Labels: 距離分布(EDD)のエントロピーによるクラスタ分析の促進 Beyond Labels: Advancing Cluster Analysis with the Entropy of Distance Distribution (EDD) ( http://arxiv.org/abs/2311.16621v1 ) ライセンス: Link先を確認	Claus Metzner, Achim Schilling and Patrick Krauss	(参考訳) 進化するデータサイエンスのランドスケープにおいて、高次元データセットにおけるクラスタリングの正確な定量化は、特に予め定義されたラベルがない場合において、重要な課題である。本稿では,ラベルフリークラスタリング解析におけるパラダイムシフトを表す新しい手法であるEntropy of Distance Distribution (EDD)を紹介する。離散ラベルに依存した従来の手法は、ラベルのないデータの複雑なクラスタパターンの識別に苦慮することが多い。しかし、eddは、データラベリングに依存しないクラスタリング傾向を識別するために、対方向のポイントツーポイント距離の特性的差異を利用する。本手法はシャノン情報エントロピーを用いて,データセット内の距離分布の「ピーク性」または「平坦性」を定量化する。このエントロピー測度は、その最大値に対して正規化され、(距離分布の発音ピークによって示される)強クラスターデータと、より均質な非クラスタデータセットとを効果的に区別する。このラベルのない量子化は、大域的なデータポイントの変換や置換に対して弾力性があり、追加の次元のz-スコーリングにより、データセットのスケーリングに不変となる。ガウスクラスタセンターを用いた2次元データ空間に関する一連の実験を通して,EDDの有効性を示す。以上の結果から,クラスタ幅の拡大に伴ってedd値が単調に上昇することが明らかとなった。この動作は、クラスタリングのさまざまな程度を検出する際の感度と精度を強調する。 eddのポテンシャルは、従来のクラスタリング分析を超えて拡張され、事前に割り当てられたラベルに依存することなく複雑なデータ構造を解き放つための堅牢でスケーラブルなツールを提供する。 In the evolving landscape of data science, the accurate quantification of clustering in high-dimensional data sets remains a significant challenge, especially in the absence of predefined labels. This paper introduces a novel approach, the Entropy of Distance Distribution (EDD), which represents a paradigm shift in label-free clustering analysis. Traditional methods, reliant on discrete labels, often struggle to discern intricate cluster patterns in unlabeled data. EDD, however, leverages the characteristic differences in pairwise point-to-point distances to discern clustering tendencies, independent of data labeling. Our method employs the Shannon information entropy to quantify the 'peakedness' or 'flatness' of distance distributions in a data set. This entropy measure, normalized against its maximum value, effectively distinguishes between strongly clustered data (indicated by pronounced peaks in distance distribution) and more homogeneous, non-clustered data sets. 
This label-free quantification is resilient against global translations and permutations of data points, and with an additional dimension-wise z-scoring, it becomes invariant to data set scaling. We demonstrate the efficacy of EDD through a series of experiments involving two-dimensional data spaces with Gaussian cluster centers. Our findings reveal a monotonic increase in the EDD value with the widening of cluster widths, moving from well-separated to overlapping clusters. This behavior underscores the method's sensitivity and accuracy in detecting varying degrees of clustering. EDD's potential extends beyond conventional clustering analysis, offering a robust, scalable tool for unraveling complex data structures without reliance on pre-assigned labels.	翻訳日:2023-11-29 19:09:01 公開日:2023-11-28
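The EDD recipe described in the abstract above (pairwise distances, a distance distribution, Shannon entropy normalized by its maximum, optional dimension-wise z-scoring) can be sketched as follows. The histogram estimator and the bin count `n_bins` are assumptions; the abstract does not specify how the distance distribution is estimated, and `edd` is a hypothetical name.

```python
import numpy as np

def edd(points, n_bins=50, zscore=True):
    """Normalized Shannon entropy of the pairwise-distance histogram, in [0, 1]."""
    x = np.asarray(points, dtype=float)
    if zscore:
        # Dimension-wise z-scoring, so that rescaling the data does not change the result.
        std = x.std(axis=0)
        std[std == 0] = 1.0
        x = (x - x.mean(axis=0)) / std
    # Euclidean distance for every unordered pair of points (upper triangle only).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    d = dist[np.triu_indices(len(x), k=1)]
    counts, _ = np.histogram(d, bins=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]                       # empty bins contribute nothing to the entropy
    entropy = -(p * np.log(p)).sum()
    return entropy / np.log(n_bins)    # log(n_bins) is the entropy of a flat histogram
```

With this estimator, tight and well-separated clusters concentrate the distances in a few bins and give a low value, while broad overlapping clusters spread them out and give a higher value, in line with the monotonic trend reported in the abstract. The dense distance matrix needs memory quadratic in the number of points, so larger data sets would need subsampling.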
# 変圧器の長距離能力について On the Long Range Abilities of Transformers ( http://arxiv.org/abs/2311.16620v1 ) ライセンス: Link先を確認	Itamar Zimerman, Lior Wolf	(参考訳) 現代のDLや、特にNLPドメインにおいて支配的であるにもかかわらず、トランスフォーマーアーキテクチャは、この目的のために特別に設計された最近のレイヤと比較して、長距離タスクに準最適性能を示す。本稿では,状態空間層,線形rnn層,大域畳み込み層といった長距離層の主要な特性から着想を得て,トランスフォーマーアーキテクチャの最小限の変更が,long range arena (lra)ベンチマークの性能を著しく向上させ,これらの特殊な層とのギャップを狭めることを実証する。長距離タスクの2つの重要な原則は (i)滑らかさに対する帰納的バイアスを取り入れ、 (二)地域性。私たちが示すように、これらのアイデアをアテンションメカニズムに統合することで、追加の計算量と追加のトレーニング可能なパラメータなしで結果が向上する。我々の理論と実験は、長距離タスクにおけるトランスフォーマーの性能が劣る理由にも光を当て、長距離依存関係の取得に不可欠な重要な特性を特定した。 Despite their dominance in modern DL and, especially, NLP domains, transformer architectures exhibit sub-optimal performance on long-range tasks compared to recent layers that are specifically designed for this purpose. In this work, drawing inspiration from key attributes of long-range layers, such as state-space layers, linear RNN layers, and global convolution layers, we demonstrate that minimal modifications to the transformer architecture can significantly enhance performance on the Long Range Arena (LRA) benchmark, thus narrowing the gap with these specialized layers. We identify that two key principles for long-range tasks are (i) incorporating an inductive bias towards smoothness, and (ii) locality. As we show, integrating these ideas into the attention mechanism improves results with a negligible amount of additional computation and without any additional trainable parameters. Our theory and experiments also shed light on the reasons for the inferior performance of transformers on long-range tasks and identify critical properties that are essential for successfully capturing long-range dependencies.	翻訳日:2023-11-29 19:08:33 公開日:2023-11-28
# camouflaged object detectionのためのオーバーラップウィンドウによるクロスレベル注意 Cross-level Attention with Overlapped Windows for Camouflaged Object Detection ( http://arxiv.org/abs/2311.16618v1 ) ライセンス: Link先を確認	Jiepan Li, Fangxiao Lu, Nan Xue, Zhuohong Li, Hongyan Zhang, Wei He	(参考訳) カモフラージュした物体は環境に適応的に色やテクスチャを合わせ、周囲と区別できない。現在の手法では、高レベルのセマンティック機能は、カモフラージュされたオブジェクトと背景の違いを強調することができる。その結果、精度の高いcamouflaged object detection(cod)のために、ハイレベルなセマンティック機能と低レベルの詳細な機能を統合する。従来のマルチレベル特徴融合の設計とは異なり、低レベル特徴の強化はCODにとってより差し迫っていると述べる。本稿では,高次特徴によって導かれる低次特徴強調を実現するために,重なり合うウィンドウクロスレベル注意(OWinCA)を提案する。最高レベルと低レベルの両方の機能マップ上でアライメントされたウィンドウペアをスライドすることで、ハイレベルセマンティクスは、クロスレベルアテンションによって、明示的に低レベルの詳細に統合される。さらに、重なり合うウィンドウ分割戦略を用いて、ウィンドウ間の不整合を緩和し、グローバル情報の損失を防止する。これらの導入により、提案したOWinCAは、カモフラージュされたオブジェクトの分離性を促進することにより、低レベルの特徴を高めることができる。関連するOWinCANetは、これらの拡張されたマルチレベル特徴を単純な畳み込み演算によって融合し、最終的なCODを実現する。 3つの大規模CODデータセットで行った実験は、OWinCANetが現在の最先端COD法を大幅に上回っていることを示している。 Camouflaged objects adaptively fit their color and texture with the environment, which makes them indistinguishable from the surroundings. Current methods revealed that high-level semantic features can highlight the differences between camouflaged objects and the backgrounds. Consequently, they integrate high-level semantic features with low-level detailed features for accurate camouflaged object detection (COD). Unlike previous designs for multi-level feature fusion, we state that enhancing low-level features is more impending for COD. In this paper, we propose an overlapped window cross-level attention (OWinCA) to achieve the low-level feature enhancement guided by the highest-level features. By sliding an aligned window pair on both the highest- and low-level feature maps, the high-level semantics are explicitly integrated into the low-level details via cross-level attention. Additionally, it employs an overlapped window partition strategy to alleviate the incoherence among windows, which prevents the loss of global information. 
These adoptions enable the proposed OWinCA to enhance low-level features by promoting the separability of camouflaged objects. The associated proposed OWinCANet fuses these enhanced multi-level features by simple convolution operation to achieve the final COD. Experiments conducted on three large-scale COD datasets demonstrate that our OWinCANet significantly surpasses the current state-of-the-art COD methods.	翻訳日:2023-11-29 19:08:12 公開日:2023-11-28
# 反事実推論のための逆分布バランス Adversarial Distribution Balancing for Counterfactual Reasoning ( http://arxiv.org/abs/2311.16616v1 ) ライセンス: Link先を確認	Stefan Schrod, Fabian Sinz, Michael Altenbuchinger	(参考訳) 因果予測モデルの開発は、結果が適用された(実質的な)介入に対してのみ観察可能であり、その代替品(いわゆる反事実)に留まらず、医学においては、投与された薬物に対する患者の生存を知っており、他の治療の選択肢に留まらずに済むという事実によって挑戦される。反実的推論のための機械学習アプローチは、非ランダムな治療管理による未観測結果と分布差の両方に対処する必要がある。監視されていないドメイン適応(UDA)も同様の問題に対処する。対象ドメインのラベルである観測されていない結果と、ソースとターゲットドメインの分散的な差異を扱う必要がある。本稿では, 因果関係の素因関係を除去するために, 逆因果関係の潜在的結果推定を直接的に利用する, 逆因果推論のための逆分布バランス(Adversarial Distribution Balancing for Counterfactual Reasoning, ADBCR)を提案する。 ADBCRは3つのベンチマークデータセット上で最先端の手法よりも優れており、未ラベル検証データがトレーニング手順に含まれる場合、ADBCRの性能がさらに向上し、モデルの検証領域への適応性が向上することを示す。 The development of causal prediction models is challenged by the fact that the outcome is only observable for the applied (factual) intervention and not for its alternatives (the so-called counterfactuals); in medicine we only know patients' survival for the administered drug and not for other therapeutic options. Machine learning approaches for counterfactual reasoning have to deal with both unobserved outcomes and distributional differences due to non-random treatment administration. Unsupervised domain adaptation (UDA) addresses similar issues; one has to deal with unobserved outcomes -- the labels of the target domain -- and distributional differences between source and target domain. We propose Adversarial Distribution Balancing for Counterfactual Reasoning (ADBCR), which directly uses potential outcome estimates of the counterfactuals to remove spurious causal relations. We show that ADBCR outcompetes state-of-the-art methods on three benchmark datasets, and demonstrate that ADBCR's performance can be further improved if unlabeled validation data are included in the training procedure to better adapt the model to the validation domain.	翻訳日:2023-11-29 19:07:49 公開日:2023-11-28
# ランダム射影上のマハラノビス距離のディップ統計量を用いた多変量単峰性検定 A Multivariate Unimodality Test Harnessing the Dip Statistic of Mahalanobis Distances Over Random Projections ( http://arxiv.org/abs/2311.16614v1 ) ライセンス: Link先を確認	Prodromos Kolyvakis, Aristidis Likas	(参考訳) 統計解析において中心的な役割を果たす単峰性は、データセット構造に関する洞察を与え、洗練された分析手順を駆動する。単峰性の確認は、シルバーマンのアプローチやハーティガンのディップ統計量のような手法を用いれば一次元データでは簡単であるが、高次元への一般化は依然として困難である。本手法は、線形ランダム射影を用いて一次元の単峰性の原理を多次元空間へ外挿し、点対点距離を活用する。$\alpha$-単峰性の仮定に基づき、mud-podと名付けた新しい多変量単峰性検定を提示する。理論的および実証的研究により,多次元データセットの単峰性評価およびクラスタ数推定における本手法の有効性を確認した。 Unimodality, pivotal in statistical analysis, offers insights into dataset structures and drives sophisticated analytical procedures. While confirming unimodality is straightforward for one-dimensional data using methods like Silverman's approach and Hartigans' dip statistic, its generalization to higher dimensions remains challenging. By extrapolating one-dimensional unimodality principles to multi-dimensional spaces through linear random projections and leveraging point-to-point distances, our method, rooted in $\alpha$-unimodality assumptions, presents a novel multivariate unimodality test named mud-pod. Both theoretical and empirical studies confirm the efficacy of our method in unimodality assessment of multidimensional datasets as well as in estimating the number of clusters.	翻訳日:2023-11-29 19:07:26 公開日:2023-11-28
# 幾何学的媒介条件を用いた軽量顔検出器のフィルタ処理 Filter-Pruning of Lightweight Face Detectors Using a Geometric Median Criterion ( http://arxiv.org/abs/2311.16613v1 ) ライセンス: Link先を確認	Konstantinos Gkrispanis, Nikolaos Gkalelis, Vasileios Mezaris	(参考訳) 顔検出装置は、処理能力とメモリの制限のあるエッジデバイス上で実行されることが多い監視を含む多くのアプリケーションにおいて、重要なコンポーネントになりつつある。そのため、リソース制約のあるデバイス間で効率的に機能するコンパクトな顔検出モデルが求められている。近年、ネットワークプルーニング技術は研究者から多くの注目を集めている。これらの手法は、顔検出装置の普及にもかかわらず、十分に検討されていない。本稿では,EXTD (Extremely Tiny Face Detector) とEResFD (Efficient ResNet Face Detector) という,すでに小型でコンパクトな2つの顔検出器にフィルタプルーニングを実装した。主なプルーニングアルゴリズムは,幾何中央値 (fpgm) によるフィルタプルーニングと,soft filter pruning (sfp) 反復処理を組み合わせたフィルタプルーニングである。また,L1ノルムプルーニングをベースラインとして,提案手法と比較する。 WIDER FACEデータセットの実験的評価は,提案手法が既に軽量な顔検出器のモデルサイズをさらに削減し,精度の低下を抑えたり,低プルーニングレートの精度向上を図ったりできる可能性を示唆している。 Face detectors are becoming a crucial component of many applications, including surveillance, that often have to run on edge devices with limited processing power and memory. Therefore, there's a pressing demand for compact face detection models that can function efficiently across resource-constrained devices. Over recent years, network pruning techniques have attracted a lot of attention from researchers. These methods haven't been well examined in the context of face detectors, despite their expanding popularity. In this paper, we implement filter pruning on two already small and compact face detectors, named EXTD (Extremely Tiny Face Detector) and EResFD (Efficient ResNet Face Detector). The main pruning algorithm that we utilize is Filter Pruning via Geometric Median (FPGM), combined with the Soft Filter Pruning (SFP) iterative procedure. We also apply L1 Norm pruning, as a baseline to compare with the proposed approach. The experimental evaluation on the WIDER FACE dataset indicates that the proposed approach has the potential to further reduce the model size of already lightweight face detectors, with limited accuracy loss, or even with small accuracy gain for low pruning rates.	
翻訳日:2023-11-29 19:07:08 公開日:2023-11-28
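The two selection criteria compared in the abstract above can be sketched for a single convolutional layer. This is a simplified illustration rather than the paper's pipeline: in the geometric-median criterion (FPGM), as commonly described, the filters with the smallest summed distance to all other filters are treated as the most replaceable, while the L1 baseline removes the filters with the smallest L1 norm. The function names and the single zeroing step standing in for Soft Filter Pruning are hypothetical.

```python
import numpy as np

def fpgm_prune_indices(weights, prune_ratio):
    """Indices of filters closest to the geometric median of the layer's filters.

    weights: array of shape (out_channels, in_channels, k_h, k_w).
    """
    w = np.asarray(weights, dtype=float)
    flat = w.reshape(w.shape[0], -1)
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    total = dist.sum(axis=1)                 # summed distance to every other filter
    n_prune = int(w.shape[0] * prune_ratio)
    return np.argsort(total)[:n_prune]

def l1_prune_indices(weights, prune_ratio):
    """Baseline: indices of the filters with the smallest L1 norm."""
    w = np.asarray(weights, dtype=float)
    norms = np.abs(w.reshape(w.shape[0], -1)).sum(axis=1)
    n_prune = int(w.shape[0] * prune_ratio)
    return np.argsort(norms)[:n_prune]

def soft_prune(weights, prune_ratio):
    """One soft-pruning step: zero the selected filters in a copy of the weights.

    In an iterative soft-pruning scheme the zeroed filters would stay trainable
    and the selection would be repeated after further training.
    """
    w = np.array(weights, dtype=float, copy=True)
    idx = fpgm_prune_indices(w, prune_ratio)
    w[idx] = 0.0
    return w, idx
```

The two criteria can disagree: the geometric-median criterion may select a filter with a sizeable norm if it sits in the middle of the others, whereas the L1 baseline always selects the smallest filters.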
# LiveNVS:ライブRGB-Dストリームによるニューラルビュー合成 LiveNVS: Neural View Synthesis on Live RGB-D Streams ( http://arxiv.org/abs/2311.16668v1 ) ライセンス: Link先を確認	Laura Fink, Darius R\"uckert, Linus Franke, Joachim Keinert, Marc Stamminger	(参考訳) Kinect Fusionのような既存のリアルタイムRGB-D再構成アプローチには、リアルタイムのフォトリアリスティックな視覚化が欠けている。これは、不完全な深度地図とカメラのポーズから融合したノイズ、過剰な形状、不完全なテクスチャ、ぼやけたテクスチャが原因である。最近のニューラルレンダリング手法は、これらのアーティファクトの多くを克服することができるが、主にオフライン使用に最適化されており、ライブリビルドパイプラインへの統合を妨げる。本稿では,低レイテンシでリアルタイムなレンダリングが可能なライブRGB-D入力ストリーム上で,ニューラルノベルビューの合成を可能にするLiveNVSを提案する。 RGB-D入力ストリームに基づいて、高密度に融合した深度マップを介してニューラルネットワーク機能をターゲットビューに投影し、画像空間の特徴をターゲット特徴マップに集約することにより、新しいビューを描画する。一般化可能なニューラルネットワークは、ターゲットのフィーチャーマップを高品質なRGBイメージに変換する。 LiveNVSは、キャプチャ中に未知のシーンの最先端のニューラルネットワークレンダリング品質を実現し、ユーザーはシーンを仮想的に探索し、リアルタイムで再構築品質を評価することができる。 Existing real-time RGB-D reconstruction approaches, like Kinect Fusion, lack real-time photo-realistic visualization. This is due to noisy, oversmoothed or incomplete geometry and blurry textures which are fused from imperfect depth maps and camera poses. Recent neural rendering methods can overcome many of such artifacts but are mostly optimized for offline usage, hindering the integration into a live reconstruction pipeline. In this paper, we present LiveNVS, a system that allows for neural novel view synthesis on a live RGB-D input stream with very low latency and real-time rendering. Based on the RGB-D input stream, novel views are rendered by projecting neural features into the target view via a densely fused depth map and aggregating the features in image-space to a target feature map. A generalizable neural network then translates the target feature map into a high-quality RGB image. LiveNVS achieves state-of-the-art neural rendering quality of unknown scenes during capturing, allowing users to virtually explore the scene and assess reconstruction quality in real-time.	翻訳日:2023-11-29 19:00:37 公開日:2023-11-28
# 分子特性予測のためのマルチモーダルラーニング:画像とグラフ構造に基づく枠組み MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures ( http://arxiv.org/abs/2311.16666v1 ) ライセンス: Link先を確認	Zhuoyuan Wang, Jiacong Mi, Shan Lu, Jieyue He	(参考訳) 薬物分子特性の正確な予測の探求は、AIDD(Artificial Intelligence Drug Discovery)の領域における根本的な課題となっている。薬物分子の効果的な表現は、この追求において重要な要素として現れる。現代の先進的な研究は、主に、大規模でラベル付けされていない分子データから有意義な構造的表現を抽出するために、自己教師付き学習(SSL)技術を利用する。しかしながら、これらの研究の固有の欠点は、分子画像やSMILES表現のような分子情報の1つのモダリティに依存することにあるため、様々な分子のモダリティの潜在的な相補性を無視している。そこで,本研究では,画像およびグラフ構造に基づく分子特性予測のためのマルチモーダル分子事前学習フレームワークmoligを提案する。 MolIGモデルは、分子グラフと分子画像のコヒーレンスと相関を利用して、自己教師付きタスクを実行し、両方の分子表現形式の強みを効果的に融合させる。この包括的アプローチにより、重要な分子構造特性と高レベルの意味情報を取得することができる。事前トレーニングが完了すると、下流タスクの予測にグラフニューラルネットワーク(GNN)エンコーダが使用される。高度なベースラインモデルと比較して、MoleculeNet Benchmark GroupやADMET Benchmark Groupといったベンチマークグループ内の分子特性予測に関連する下流タスクのパフォーマンスが向上している。 The quest for accurate prediction of drug molecule properties poses a fundamental challenge in the realm of Artificial Intelligence Drug Discovery (AIDD). An effective representation of drug molecules emerges as a pivotal component in this pursuit. Contemporary leading-edge research predominantly resorts to self-supervised learning (SSL) techniques to extract meaningful structural representations from large-scale, unlabeled molecular data, subsequently fine-tuning these representations for an array of downstream tasks. However, an inherent shortcoming of these studies lies in their singular reliance on one modality of molecular information, such as molecule image or SMILES representations, thus neglecting the potential complementarity of various molecular modalities. In response to this limitation, we propose MolIG, a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures. 
The MolIG model leverages the coherence and correlation between the molecule graph and the molecule image to execute self-supervised tasks, combining the strengths of both molecular representation forms. This holistic approach captures pivotal molecular structural characteristics and high-level semantic information. Upon completion of pre-training, a Graph Neural Network (GNN) encoder is used for the prediction of downstream tasks. In comparison to advanced baseline models, MolIG exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups such as the MoleculeNet Benchmark Group and the ADMET Benchmark Group.	翻訳日:2023-11-29 19:00:17 公開日:2023-11-28
# DGNR:大型運転シーンの密度誘導型ニューラルポイントレンダリング DGNR: Density-Guided Neural Point Rendering of Large Driving Scenes ( http://arxiv.org/abs/2311.16664v1 ) ライセンス: Link先を確認	Zhuopeng Li, Chenming Wu, Liangjun Zhang, Jianke Zhu	(参考訳) 近年のNeural Radiance Field (NeRF)の成功にもかかわらず、特に高いレンダリング品質と効率が要求される場合、長い軌道で大規模な運転シーンをレンダリングすることは依然として困難である。このようなシーンの既存の方法は、通常、空間的ワーピング、ゼロショット正規または深さ推定からの幾何学的監督、またはシーン分割戦略に関係しており、合成されたビューはしばしばぼやけているか、効率的なレンダリングの要件を満たしていない。以上の課題に対処するために,DGNR(Density-Guided Neural Rendering)と呼ばれる点ベースレンダラーの構築を支援するために,シーンから密度空間を学習する新しいフレームワークを提案する。 DGNRでは、幾何学的先行はもはや不要であり、体積レンダリングによって密度空間から本質的に学習することができる。具体的には、微分可能なレンダラを用いて、学習した密度空間から得られた神経密度特徴から画像を合成する。密度空間を最適化するために密度ベース融合モジュールと幾何正規化を提案する。広範に使用される自動運転データセットで実験を行うことで、フォトリアリスティックな運転シーンの合成とリアルタイムなレンダリングにおけるdgnrの有効性を検証する。 Despite the recent success of Neural Radiance Field (NeRF), it is still challenging to render large-scale driving scenes with long trajectories, particularly when the rendering quality and efficiency are in high demand. Existing methods for such scenes usually involve with spatial warping, geometric supervision from zero-shot normal or depth estimation, or scene division strategies, where the synthesized views are often blurry or fail to meet the requirement of efficient rendering. To address the above challenges, this paper presents a novel framework that learns a density space from the scenes to guide the construction of a point-based renderer, dubbed as DGNR (Density-Guided Neural Rendering). In DGNR, geometric priors are no longer needed, which can be intrinsically learned from the density space through volumetric rendering. Specifically, we make use of a differentiable renderer to synthesize images from the neural density features obtained from the learned density space. A density-based fusion module and geometric regularization are proposed to optimize the density space. 
By conducting experiments on a widely used autonomous driving dataset, we have validated the effectiveness of DGNR in synthesizing photorealistic driving scenes and achieving real-time capable rendering.	翻訳日:2023-11-29 18:59:52 公開日:2023-11-28
# プレーンモデルにおける複製不可能暗号 Unclonable Cryptography in the Plain Model ( http://arxiv.org/abs/2311.16663v1 ) ライセンス: Link先を確認	Céline Chevalier and Paul Hermouet and Quoc-Huy Vu	(参考訳) 量子力学の複製不可能(no-cloning)原理を利用することで、複製不可能暗号(unclonable cryptography)は、古典的には不可能な新しい暗号プロトコルを実現する。その最も有名な例は、量子コピー保護と複製不可能暗号化(unclonable encryption)の2つである。近年多くの注目を集めているにもかかわらず、2つの重要な未解決問題が残っている。プレーンモデルにおける点関数のコピー保護(通常、実現可能性の実証と見なされる)と、プレーンモデルにおける複製不可能な識別不能性(unclonable indistinguishability)安全性を持つ複製不可能暗号化である。本研究では、Coladangelo, Liu, Liu, Zhandry (Crypto'21) と Culf and Vidick (Quantum'22) の先行研究に基づき、部分空間コセット状態に対する新しいエンタングルメントのモノガミー性を確立し、以下の新しい結果を得る。 - プレーンモデルにおいて、点関数のコピー保護が、異なるチャレンジ分布(おそらく最も自然なものを含む)の下で存在することを示す。 - 複製不可能な識別不能性安全性を持つ複製不可能暗号化がプレーンモデルに存在することを、初めて示す。 By leveraging the no-cloning principle of quantum mechanics, unclonable cryptography enables us to achieve novel cryptographic protocols that are otherwise impossible classically. The two most notable examples of unclonable cryptography are quantum copy-protection and unclonable encryption. Despite receiving a lot of attention in recent years, two important open questions still remain: copy-protection for point functions in the plain model, which is usually considered a feasibility demonstration, and unclonable encryption with unclonable indistinguishability security in the plain model. In this work, by relying on previous works of Coladangelo, Liu, Liu, and Zhandry (Crypto'21) and Culf and Vidick (Quantum'22), we establish a new monogamy-of-entanglement property for subspace coset states, which allows us to obtain the following new results: - We show that copy-protection of point functions exists in the plain model, with different challenge distributions (including arguably the most natural ones). - We show, for the first time, that unclonable encryption with unclonable indistinguishability security exists in the plain model.	翻訳日:2023-11-29 18:59:30 公開日:2023-11-28
# 連続可変レーザチャネルを有する2モード状態の量子ステアリング Quantum steering for two-mode states with Continuous-variable in laser channel ( http://arxiv.org/abs/2311.16658v1 ) ライセンス: Link先を確認	Kaimin Zheng, Jifeng Sun, Liyun Hu, Lijian Zhang	(参考訳) Einstein-Podolsky-Rosenステアリングは、一方のデバイス独立量子情報処理において重要なリソースである。このステアリング特性は、いくつかの実用的な応用のために量子システムと環境の相互作用中に破壊される。本稿では,特性関数の表現を確率として用いて,利得係数と損失効果の両方を考慮した連続可変2モード状態の量子ステアリングについて検討する。まず, 1モードおよび2モードのレーザチャネル下での2モード圧縮真空状態のステアリング時間を解析した。ゲインプロセスは2モードの圧縮真空状態に対して追加ノイズを発生させ、制御可能な時間を短縮することを見いだす。第二に、量子アインシュタイン-ポドルスキー-ローゼンステアリングの量子化により、2つの側損失は1つの側損失よりも小さいステアビリティを示すが、2つの方向ステアブルタイムを共有している。さらに、獲得した政党が他国を操ることができ、一方、他の政党は獲得した政党を一定の閾値で操ることができない。この意味では、一方の当事者における利得効果は、他方の当事者における損失効果と等価であると考えられる。アインシュタイン-ポドルスキー-ローゼン・ステアリングの蒸留と実用的な量子チャネルにおける量子情報処理について検討した。 The Einstein-Podolsky-Rosen steering is an important resource for one-sided device independent quantum information processing. This steering property will be destroyed during the interaction between quantum system and environment for some practical applications. In this paper, we use the representation of characteristic function for probability to examine the quantum steering of two-mode states with continuous-variable in laser channel, where both the gain factor and the loss effect are considered. Firstly, we analyse the steering time of two-mode squeezed vacuum state under one-mode and two-mode laser channel respectively. We find the gain process will introduce additional noise to the two-mode squeezed vacuum state such that the steerable time is reduced. Secondly, by quantising quantum Einstein-Podolsky-Rosen steering, it shows that two-side loss presents a smaller steerability than one-side loss although they share the same two-way steerable time. In addition, we find the more gained party can steer the others state, while the other party cannot steer the gained party in a certain threshold value. In this sense, it seems that the gain effect in one party is equivalent to the loss effect in the others party. 
Our results pave the way for the distillation of Einstein-Podolsky-Rosen steering and quantum information processing in practical quantum channels.	翻訳日:2023-11-29 18:59:06 公開日:2023-11-28
# SCALAR-NeRF: シーン再構成のためのスケール可能なラージスケールニューラルレージアンスフィールド SCALAR-NeRF: SCAlable LARge-scale Neural Radiance Fields for Scene Reconstruction ( http://arxiv.org/abs/2311.16657v1 ) ライセンス: Link先を確認	Yu Chen, Gim Hee Lee	(参考訳) 本研究では,スケーラブルな大規模ニューラルシーン再構築に適した新しいフレームワークであるSCALAR-NeRFを紹介する。そこで、エンコーダは3d点座標を処理してエンコードされた特徴を生成し、デコーダは符号付き距離と色の体積密度を含む幾何学的な値を生成する。われわれのアプローチはまず、画像データセット全体の粗いグローバルモデルをトレーニングする。その後、KMeansを使って画像を小さなブロックに分割し、各ブロックは専用のローカルモデルでモデル化する。各局所ブロックの境界ボックスをスケールアップすることにより,各ブロックにまたがる重なり合う領域を拡大する。特に、グローバルモデルからのデコーダは異なるブロック間で共有されるため、ローカルエンコーダの特徴空間におけるアライメントが促進される。そこで本研究では,これらの局所モデルからの出力を融合して最終的な再構成を実現する効率的かつ効率的な手法を提案する。この改良された粗大な戦略を用いることで,提案手法は最先端のNeRF法より優れ,大規模シーン再構築のスケーラビリティを示す。コードはプロジェクトのページ(https://aibluefisher.github.io/SCALAR-NeRF/)で公開されます。 In this work, we introduce SCALAR-NeRF, a novel framework tailored for scalable large-scale neural scene reconstruction. We structure the neural representation as an encoder-decoder architecture, where the encoder processes 3D point coordinates to produce encoded features, and the decoder generates geometric values that include volume densities of signed distances and colors. Our approach first trains a coarse global model on the entire image dataset. Subsequently, we partition the images into smaller blocks using KMeans with each block being modeled by a dedicated local model. We enhance the overlapping regions across different blocks by scaling up the bounding boxes of each local block. Notably, the decoder from the global model is shared across distinct blocks and therefore promoting alignment in the feature space of local encoders. We propose an effective and efficient methodology to fuse the outputs from these local models to attain the final reconstruction. Employing this refined coarse-to-fine strategy, our method outperforms state-of-the-art NeRF methods and demonstrates scalability for large-scale scene reconstruction. 
The code will be available on our project page at https://aibluefisher.github.io/SCALAR-NeRF/	翻訳日:2023-11-29 18:58:31 公開日:2023-11-28
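The block partitioning described in the SCALAR-NeRF abstract (KMeans clustering followed by enlarged bounding boxes) can be sketched roughly as below. This is an illustrative toy and not the authors' code: it assumes the clustering runs on 2-D camera positions, and the names `kmeans`, `scaled_bbox`, `overlapping_blocks`, the evenly spaced initialisation, and the `scale` parameter are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]


def kmeans(points: List[Point], k: int, n_iters: int = 20) -> List[List[Point]]:
    """Plain Lloyd's k-means on 2-D points with a deterministic initialisation."""
    # Hypothetical init: take k evenly spaced points from the input order.
    centers = [points[i * len(points) // k] for i in range(k)]
    clusters: List[List[Point]] = []
    for _ in range(n_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep the old center if the cluster is empty.
        centers = [
            (sum(q[0] for q in cl) / len(cl), sum(q[1] for q in cl) / len(cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return clusters


def scaled_bbox(cluster: List[Point], scale: float) -> Box:
    """Axis-aligned bounding box of a cluster, enlarged about its center by `scale`."""
    xs = [p[0] for p in cluster]
    ys = [p[1] for p in cluster]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    hx, hy = scale * (max(xs) - min(xs)) / 2, scale * (max(ys) - min(ys)) / 2
    return (cx - hx, cy - hy, cx + hx, cy + hy)


def overlapping_blocks(points: List[Point], k: int, scale: float = 1.5) -> List[List[Point]]:
    """Assign every point that falls inside a cluster's enlarged box to that block."""
    blocks = []
    for cluster in kmeans(points, k):
        if not cluster:
            continue  # an empty cluster has no bounding box
        x0, y0, x1, y1 = scaled_bbox(cluster, scale)
        blocks.append([p for p in points if x0 <= p[0] <= x1 and y0 <= p[1] <= y1])
    return blocks
```

With `scale=1.0` the blocks are disjoint clusters. A larger `scale` lets neighbouring blocks share cameras near their borders, which is the overlap the abstract refers to.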
# Pseudo-likelihood推論 Pseudo-Likelihood Inference ( http://arxiv.org/abs/2311.16656v1 ) ライセンス: Link先を確認	Theo Gruner, Boris Belousov, Fabio Muratore, Daniel Palenicek, Jan Peters	(参考訳) シミュレーションベース推論(SBI: Simulation-Based Inference)は、尤度が扱いにくい(intractable)場合にモデルパラメータを推論する、新しいアプローチ群の総称である。既存のSBI手法は、近似ベイズ計算(ABC)のように尤度を近似するか、逐次ニューラル事後推定(SNPE)のように事後分布を直接モデル化するかのいずれかである。 ABCは低次元問題では効率的であるが、高次元タスクでは、関数近似を利用するSNPEに一般に劣る。本稿では、ABCにニューラル近似を導入し、困難なベイズ的システム同定タスクでも競争力を持たせる新しい手法、Pseudo-Likelihood Inference (PLI)を提案する。積分確率計量(integral probability metrics)を利用することで、情報理論的な信頼領域に基づいて更新される適応的なバンド幅を持つ、滑らかな尤度カーネルを導入する。この定式化により、提案手法は (i)勾配降下によるニューラル事後分布の最適化を可能にし、 (ii)要約統計量に依存せず、 (iii)複数の観測を入力として扱える。 SNPEと比較して、より多くのデータが利用可能な場合に性能が向上する。 PLIの有効性は、4つの古典的なSBIベンチマークタスクと動的性の高い物理システムで評価され、確率的シミュレーションと多峰性の事後分布において特に利点を示した。 Simulation-Based Inference (SBI) is a common name for an emerging family of approaches that infer the model parameters when the likelihood is intractable. Existing SBI methods either approximate the likelihood, as in Approximate Bayesian Computation (ABC), or directly model the posterior, as in Sequential Neural Posterior Estimation (SNPE). While ABC is efficient on low-dimensional problems, on higher-dimensional tasks, it is generally outperformed by SNPE, which leverages function approximation. In this paper, we propose Pseudo-Likelihood Inference (PLI), a new method that brings neural approximation into ABC, making it competitive on challenging Bayesian system identification tasks. By utilizing integral probability metrics, we introduce a smooth likelihood kernel with an adaptive bandwidth that is updated based on information-theoretic trust regions. Thanks to this formulation, our method (i) allows for optimizing neural posteriors via gradient descent, (ii) does not rely on summary statistics, and (iii) enables multiple observations as input. In comparison to SNPE, it leads to improved performance when more data is available. 
The effectiveness of PLI is evaluated on four classical SBI benchmark tasks and on a highly dynamic physical system, showing particular advantages on stochastic simulations and multi-modal posterior landscapes.	翻訳日:2023-11-29 18:57:58 公開日:2023-11-28
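For readers unfamiliar with ABC, the classical baseline that the PLI abstract builds on can be sketched as plain rejection sampling. This is not PLI itself: there is no neural posterior, no adaptive bandwidth, and no trust region. It only illustrates comparing raw samples with an integral probability metric (here the 1-D Wasserstein-1 distance) instead of summary statistics. The function names, the uniform prior, the Gaussian simulator, and the tolerance `tol` are all assumptions made for illustration.

```python
import random
from typing import Callable, List


def wasserstein1(xs: List[float], ys: List[float]) -> float:
    """1-D Wasserstein-1 distance between two equal-size samples (sorted matching)."""
    assert len(xs) == len(ys), "this simple form needs equal sample sizes"
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def abc_rejection(
    observed: List[float],
    simulate: Callable[[float, random.Random], List[float]],
    prior_sample: Callable[[random.Random], float],
    tol: float,
    n_draws: int,
    rng: random.Random,
) -> List[float]:
    """Keep prior draws whose simulated data lie within `tol` of the observations."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # propose a parameter from the prior
        simulated = simulate(theta, rng)   # run the (likelihood-free) simulator
        if wasserstein1(simulated, observed) <= tol:
            accepted.append(theta)         # accepted draws approximate the posterior
    return accepted


def gaussian_simulator(theta: float, rng: random.Random, n: int = 50) -> List[float]:
    """Hypothetical simulator: n samples from a unit-variance Gaussian with mean theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def uniform_prior(rng: random.Random) -> float:
    """Hypothetical prior over the unknown mean."""
    return rng.uniform(-5.0, 5.0)
```

A hard accept/reject threshold like `tol` is the kind of non-smooth kernel that, according to the abstract, PLI replaces with a smooth likelihood kernel so that gradients can flow.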
# EMRを用いた予測モデルの記述における不一致の解明 Elucidating Discrepancy in Explanations of Predictive Models Developed using EMR ( http://arxiv.org/abs/2311.16654v1 ) ライセンス: Link先を確認	Aida Brankovic, Wenjie Huang, David Cook, Sankalp Khanna, Konstanty Bialkowski	(参考訳) 透明性と説明性の欠如は、機械学習(ml)アルゴリズムの臨床的採用を妨げる。説明可能な人工知能(xai)手法が提案されているが、これらの手法と専門的な臨床知識の一致に焦点をあてた研究はほとんどない。本研究は,電子カルテ(EMR)データを用いて,これらの因子の一致を解析し,臨床・技術的見地から識別された相違点の原因を考察するために,現状の技術的説明可能性手法を臨床決定支援アルゴリズムに適用する。臨床診断支援のための信頼性の高いXAIソリューションを実現するための重要な要因についても論じる。 The lack of transparency and explainability hinders the clinical adoption of Machine learning (ML) algorithms. While explainable artificial intelligence (XAI) methods have been proposed, little research has focused on the agreement between these methods and expert clinical knowledge. This study applies current state-of-the-art explainability methods to clinical decision support algorithms developed for Electronic Medical Records (EMR) data to analyse the concordance between these factors and discusses causes for identified discrepancies from a clinical and technical perspective. Important factors for achieving trustworthy XAI solutions for clinical decision support are also discussed.	翻訳日:2023-11-29 18:57:24 公開日:2023-11-28
# 自己教師型機械学習によるX線単一粒子画像再構成 Augmenting x-ray single particle imaging reconstruction with self-supervised machine learning ( http://arxiv.org/abs/2311.16652v1 ) ライセンス: Link先を確認	Zhantao Chen, Cong Wang, Mingye Gao, Chun Hong Yoon, Jana B. Thayer, Joshua J. Turner	(参考訳) x線自由電子レーザー(xfels)の開発は、様々な材料の原子構造と超高速ダイナミクスを調べる多くの機会を開いた。 XFELを用いた単一粒子イメージング(SPI)は、低温条件や結晶化の必要性を回避しつつ、時間分解能のない自然生理状態における生物学的粒子の研究を可能にする。しかし、逆空間x線回折データからの実空間構造を再構成することは位相情報や方位情報がないため非常に困難であり、弱い散乱信号やパルス当たりの光子数のかなりの変動によりさらに複雑である。本研究では, 粒子配向を回復し, 回折画像のみから相互空間強度を推定するための, エンドツーエンドの自己教師付き機械学習手法を提案する。提案手法は, 従来のアルゴリズムと比べ, 再構成能力が大幅に向上した実験条件下では, 高い堅牢性を示し, XFELで実施されているSPIのパラダイムシフトを示す。 The development of X-ray Free Electron Lasers (XFELs) has opened numerous opportunities to probe atomic structure and ultrafast dynamics of various materials. Single Particle Imaging (SPI) with XFELs enables the investigation of biological particles in their natural physiological states with unparalleled temporal resolution, while circumventing the need for cryogenic conditions or crystallization. However, reconstructing real-space structures from reciprocal-space x-ray diffraction data is highly challenging due to the absence of phase and orientation information, which is further complicated by weak scattering signals and considerable fluctuations in the number of photons per pulse. In this work, we present an end-to-end, self-supervised machine learning approach to recover particle orientations and estimate reciprocal space intensities from diffraction images only. Our method demonstrates great robustness under demanding experimental conditions with significantly enhanced reconstruction capabilities compared with conventional algorithms, and signifies a paradigm shift in SPI as currently practiced at XFELs.	翻訳日:2023-11-29 18:57:07 公開日:2023-11-28
# トーリック符号とゲージヒッグスモデルにおけるバルク測定誘起境界相転移 Bulk-Measurement-Induced Boundary Phase Transition in Toric Code and Gauge-Higgs Model ( http://arxiv.org/abs/2311.16651v1 ) ライセンス: Link先を確認	Yoshihito Kuno, Takahiro Orito, Ikuo Ichinose	(参考訳) 円筒形状下のトーリック符号におけるバルク射影測定による境界位相遷移の研究を報告する。バルク量子ビットの局所的な測定頻度が増加すると、境界上のスピングラス型長距離秩序が出現し、z_2$対称性の自発的対称性破れ(ssb)を示す。格子ゲージ理論の観点から、このSSBは対称性が保護された位相秩序を持つヒッグス相への遷移の信号である。位相遷移の性質、特に臨界度を数値的に解明し、非局所ゲージ不変対称性作用素を用いて物理像を与える。バルク中の相転移についても検討し, 境界遷移との関係について考察した。 Study of boundary phase transition in toric code under cylinder geometry via bulk projective measurements is reported. As the frequency of the local measurements for bulk qubits is increased, spin-glass type long-range order on the boundaries emerges indicating spontaneous symmetry breaking (SSB) of $Z_2$ symmetry. From the lattice-gauge-theory viewpoint, this SSB is a signal of a transition to Higgs phase with symmetry protected topological order. We numerically elucidate properties of the phase transition in detail, especially its criticality and give a physical picture using a non-local gauge invariant symmetry operator. Phase transition in the bulk is also studied and its relationship to the boundary transition is discussed.	翻訳日:2023-11-29 18:56:18 公開日:2023-11-28
# Text2Tree:不均衡医療分類のためのラベルツリー階層へのテキスト表現の調整 Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification ( http://arxiv.org/abs/2311.16650v1 ) ライセンス: Link先を確認	Jiahuan Yan, Haojun Gao, Zhang Kai, Weize Liu, Danny Chen, Jian Wu, Jintai Chen	(参考訳) ディープラーニングアプローチは、様々なテキストタスクで有望なパフォーマンスを示す。しかし、サンプルは極めて不均衡で不足していることが多いため、医学的テキスト分類に苦戦している。外部医療情報を用いた補足的セマンティクスに焦点を当てた既存の主流アプローチとは違って,本研究では,医学テキストにおけるデータ課題を再考し,深層学習モデルのトレーニングにおいて,内部ラベル階層のみを利用するText2Treeと呼ばれる新しいフレームワーク非依存アルゴリズムを提案する。我々は階層認識ラベル表現を学習するためのカスケードアテンションモジュールにラベルのicdコードツリー構造を埋め込む。 2つの新しい学習スキームであるSimisity Surrogate Learning (SSL) と Dissimilarity Mixup Learning (DML) は、ラベル表現階層に従って他のラベルのサンプルを再利用・識別することで、テキスト分類を強化するために考案された。権威のある公開データセットと実世界の医療記録の実験により、我々のアプローチは古典的および高度な不均衡な分類方法よりも安定して優れた性能を発揮することが示された。 Deep learning approaches exhibit promising performances on various text tasks. However, they are still struggling on medical text classification since samples are often extremely imbalanced and scarce. Different from existing mainstream approaches that focus on supplementary semantics with external medical information, this paper aims to rethink the data challenges in medical texts and present a novel framework-agnostic algorithm called Text2Tree that only utilizes internal label hierarchy in training deep learning models. We embed the ICD code tree structure of labels into cascade attention modules for learning hierarchy-aware label representations. Two new learning schemes, Similarity Surrogate Learning (SSL) and Dissimilarity Mixup Learning (DML), are devised to boost text classification by reusing and distinguishing samples of other labels following the label representation hierarchy, respectively. Experiments on authoritative public datasets and real-world medical records show that our approach stably achieves superior performances over classical and advanced imbalanced classification methods.	翻訳日:2023-11-29 18:56:02 公開日:2023-11-28
# データセット蒸留におけるバックドア攻撃の再考:カーネル手法の展望 Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective ( http://arxiv.org/abs/2311.16646v1 ) ライセンス: Link先を確認	Ming-Yu Chung, Sheng-Yen Chou, Chia-Mu Yu, Pin-Yu Chen, Sy-Yen Kuo, Tsung-Yi Ho	(参考訳) データセットの蒸留は、ディープラーニングにおけるデータ効率を高める潜在的な手段を提供する。最近の研究では、元のトレーニングサンプルに存在するバックドアのリスクに対処できることが示されている。本研究では,カーネル法に基づくバックドア攻撃とデータセット蒸留の理論的な側面を考察する。本稿では, データセット蒸留に特化した2つの新しい理論駆動トリガパターン生成手法を提案する。総合的な分析と実験の結果,我々の最適化に基づくトリガー設計フレームワークが,データセット蒸留に対する効果的なバックドア攻撃を知らせることを示した。特に,我々の設計したトリガーによって汚染されたデータセットは,従来のバックドアアタック検出や緩和手法に耐性があることが証明された。実験の結果,我々のアプローチで開発したトリガーは弾力性のあるバックドアアタックの実行に熟練していることが確認された。 Dataset distillation offers a potential means to enhance data efficiency in deep learning. Recent studies have shown its ability to counteract backdoor risks present in original training samples. In this study, we delve into the theoretical aspects of backdoor attacks and dataset distillation based on kernel methods. We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation. Following a comprehensive set of analyses and experiments, we show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation. Notably, datasets poisoned by our designed trigger prove resilient against conventional backdoor attack detection and mitigation methods. Our empirical results validate that the triggers developed using our approaches are proficient at executing resilient backdoor attacks.	翻訳日:2023-11-29 18:55:23 公開日:2023-11-28
# ゲームバグの詳細な分類の導出と評価 Deriving and Evaluating a Detailed Taxonomy of Game Bugs ( http://arxiv.org/abs/2311.16645v1 ) ライセンス: Link先を確認	Nigar Azhar Butt, Salman Sherin, Muhammad Uzair Khan, Atif Aftab Jilani, and Muhammad Zohaib Iqbal	(参考訳) ゲーム開発は、非常に競争の激しい数十億ドル産業となっている。多くのゲームは、ゲームプレイを破壊し、プレイヤーの体験を台無しにするゲーム破壊バグのために、長年の開発努力の後でも失敗する。この研究の目的は、ゲーム開発者がバグ耐性ゲームを開発するのに役立つバグ分類を提供すること、ゲームテスタがフォールトフィニングテストケースの設計と実行、そしてゲームテストアプローチを評価する研究者に提供することである。そこで我々は,ゲーム開発業界で発生したバグを報告した189人 (78人, 灰色111人) の資料の中から, 436 の資料を分析し,MLR (Multivocal Literature Review) を行った。異なるゲーム業界の実践者を対象とした調査を行い,提案分類を検証した。 MLRは,ゲームバランス,実装応答,ネットワーク,音,テンポラル,予期せぬクラッシュ,ナビゲーション,非テンポラルの8つのカテゴリを含む,エンドユーザーの視点から63のゲームバグカテゴリの詳細な分類を確定しました。ゲームテストへの手動アプローチはまだ広く使われている。最近の文献では、ゲームバランシングや機械学習をゲームテストに組み込む方法が流行っているのに対して、サウンドバグをターゲットとするアプローチは1つだけである。ほとんどのゲームテスト技術は特定のプラットフォームに依存している。 Game development has become an extremely competitive multi-billion-dollar industry. Many games fail even after years of development efforts because of game-breaking bugs that disrupt the game-play and ruin the player experience. The goal of this work is to provide a bug taxonomy for games that will help game developers in developing bug-resistant games, game testers in designing and executing fault-finding test cases, and researchers in evaluating game testing approaches. For this purpose, we performed a Multivocal Literature Review (MLR) by analyzing 436 sources, out of which 189 (78 academic and 111 grey) sources reporting bugs encountered in the game development industry were selected for analysis. We validate the proposed taxonomy by conducting a survey involving different game industry practitioners. The MLR allowed us to finalize a detailed taxonomy of 63 game bug categories in end-user perspective including eight first-tier categories: Gaming Balance, Implementation Response, Network, Sound, Temporal, Unexpected Crash, Navigational, and Non-Temporal faults. We observed that manual approaches towards game testing are still widely used. 
Only one of the approaches targets sound bugs, whereas game balancing and the incorporation of machine learning into game testing are trending in the recent literature. Most of the game testing techniques are specialized and dependent on specific platforms.	翻訳日:2023-11-29 18:55:11 公開日:2023-11-28
# フィンランドの5年生と6年生の人工知能に関する誤解 Finnish 5th and 6th graders' misconceptions about Artificial Intelligence ( http://arxiv.org/abs/2311.16644v1 ) ライセンス: Link先を確認	Pekka Mertala and Janne Fagerlund	(参考訳) 子どもの初期概念であるAIの研究は、建設主義の観点から、教育学的に健全なAIリテラシーカリキュラム、方法、材料の開発に挑戦する新興国家にある。本論文では,このニーズの解決に寄与するため,第5学年と第6学年のフィンランドにおけるAIの本質について,どのような誤解があるのか,という3つの研究課題に対して,195人の子どもの質的調査データを帰納的に分析した。 ; 2)これらの誤解は一般的な誤解とどのように関係しているのか? ;そして 3) この誤解はどの程度深刻か? その結果,3つの誤解カテゴリーが同定された。 1)AIが人々の認知過程(事実的誤解)として概念化された非技術AI。 2)人間のような存在としてAIが概念化された擬人化AI(垂直・非科学的・概念的誤解) 3) インテリジェンスや知識(実際の誤解)を事前にインストールした機械としてのAI。子どもの大多数はai知識の低さを評価しており、誤解は深みよりも表面的であることを示している。その結果, 文脈固有の言語的特徴は, 学生のAI誤解に寄与することが示唆された。今後の研究とAIリテラシー教育の意義について論じる。 Research on children's initial conceptions of AI is in an emerging state, which, from a constructivist viewpoint, challenges the development of pedagogically sound AI-literacy curricula, methods, and materials. To contribute to resolving this need in the present paper, qualitative survey data from 195 children were analyzed abductively to answer the following three research questions: What kind of misconceptions do Finnish 5th and 6th graders' have about the essence AI?; 2) How do these misconceptions relate to common misconception types?; and 3) How profound are these misconceptions? As a result, three misconception categories were identified: 1) Non-technological AI, in which AI was conceptualized as peoples' cognitive processes (factual misconception); 2) Anthropomorphic AI, in which AI was conceptualized as a human-like entity (vernacular, non-scientific, and conceptual misconception); and 3) AI as a machine with a pre-installed intelligence or knowledge (factual misconception). Majority of the children evaluated their AI-knowledge low, which implies that the misconceptions are more superficial than profound. The findings suggest that context-specific linguistic features can contribute to students' AI misconceptions. Implications for future research and AI literacy education are discussed.	
翻訳日:2023-11-29 18:54:47 公開日:2023-11-28
# ChatGPTによる政治テキストのスケーリング Scaling Political Texts with ChatGPT ( http://arxiv.org/abs/2311.16639v1 ) ライセンス: Link先を確認	Gaël Le Mens and Aina Gallego	(参考訳) GPT-4を用いて、連続空間における政治テキストの位置推定を行う。英国の政党マニフェストを経済・社会・移民政策の各次元に、米国議会議員のツイートを左右のイデオロギースペクトル上に位置づけることで、新しいアプローチを開発し検証する。政党マニフェストについては、GPT-4による位置と専門家による位置との相関は93%以上であり、クラウドソーシングによる位置推定と同等かそれ以上の性能である。個々のツイートについては、GPT-4で得られた位置はクラウドソーシングによる位置推定と91%の相関を示す。第117米国議会の上院議員については、GPT-4で得られた位置は、点呼投票に基づく推定と97%、選挙資金に基づく推定と96%の相関を示す。 相関は政党内でも大きく、GPT-4による位置推定が上院議員間の党内差を捉えていることを示している。全体として、イデオロギースケーリングにGPT-4を用いることは、高速で、コスト効率が高く、信頼性が高い。このアプローチは、専門家による評価とクラウドソーシングの両方によるスケーリングに対する有効な代替手段を提供する。 We use GPT-4 to obtain position estimates of political texts in continuous spaces. We develop and validate a new approach by positioning British party manifestos on the economic, social, and immigration policy dimensions and tweets by members of the US Congress on the left-right ideological spectrum. For the party manifestos, the correlation between the positions produced by GPT-4 and experts is 93% or higher, a performance similar to or better than that obtained with crowdsourced position estimates. For individual tweets, the positions obtained with GPT-4 achieve a correlation of 91% with crowdsourced position estimates. For senators of the 117th US Congress, the positions obtained with GPT-4 achieve a correlation of 97% with estimates based on roll call votes and of 96% with those based on campaign funding. Correlations are also substantial within party, indicating that position estimates produced with GPT-4 capture within-party differences between senators. Overall, using GPT-4 for ideological scaling is fast, cost-efficient, and reliable. This approach provides a viable alternative to scaling by both expert raters and crowdsourcing.	翻訳日:2023-11-29 18:54:31 公開日:2023-11-28
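The validation reported in this abstract comes down to correlating two sets of position estimates for the same texts. Assuming the reported figures are Pearson correlations (the abstract does not name the coefficient), the computation could look like the sketch below. The function name `pearson` and the example scores are hypothetical and are not data from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical left-right scores for five texts: model estimates vs. expert ratings.
model_scores = [1.0, 3.5, 5.0, 7.5, 9.0]
expert_scores = [1.5, 3.0, 5.5, 7.0, 9.5]
agreement = pearson(model_scores, expert_scores)
```

A value of `agreement` close to 1 would mean the two sources order and space the texts almost identically, which is how the 91% to 97% figures in the abstract are meant to be read.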
# LEDITS++: テキスト・ツー・イメージモデルを用いた制限なし画像編集 LEDITS++: Limitless Image Editing using Text-to-Image Models ( http://arxiv.org/abs/2311.16711v1 ) ライセンス: Link先を確認	Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, Apolin\'ario Passos	(参考訳) テキストから画像への拡散モデルは最近、テキスト入力のみから高精細な画像を生成するという驚くべき能力への関心が高まっている。その後の研究は、実際の画像編集にその能力を活用、応用することを目的としている。しかし、既存の画像から画像への方法はしばしば非効率であり、不正確であり、汎用性が限られている。それらは、時間を要する微調整、不要に入力画像から切り離すこと、および/または複数同時編集のサポートの欠如を必要とする。この問題に対処するため、より効率的で汎用的で正確なテキスト画像操作技術であるledits++を紹介する。 LEDITS++の新たな反転アプローチはチューニングや最適化を必要とせず、いくつかの拡散ステップで高忠実度な結果を生成する。第二に、我々の方法論は複数の同時編集をサポートし、アーキテクチャに依存しない。第3に、関連する画像領域への変更を制限する、新しい暗黙のマスキング技術を用いる。本稿では,TEdBench++ベンチマークを提案する。本結果は,LEDITS++ の機能と,従来の方法よりも改善されていることを示す。プロジェクトページはhttps://leditsplusplus-project.static.hf.spaceにある。 Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from solely text inputs. Subsequent research efforts aim to exploit and apply their capabilities to real image editing. However, existing image-to-image methods are often inefficient, imprecise, and of limited versatility. They either require time-consuming fine-tuning, deviate unnecessarily strongly from the input image, and/or lack support for multiple, simultaneous edits. To address these issues, we introduce LEDITS++, an efficient yet versatile and precise textual image manipulation technique. LEDITS++'s novel inversion approach requires no tuning nor optimization and produces high-fidelity results with a few diffusion steps. Second, our methodology supports multiple simultaneous edits and is architecture-agnostic. Third, we use a novel implicit masking technique that limits changes to relevant image regions. We propose the novel TEdBench++ benchmark as part of our exhaustive evaluation. Our results demonstrate the capabilities of LEDITS++ and its improvements over previous methods. 
The project page is available at https://leditsplusplus-project.static.hf.space .	翻訳日:2023-11-29 18:46:56 公開日:2023-11-28
# フルレゾリューションMLPによる医用線量予測 Full-resolution MLPs Empower Medical Dense Prediction ( http://arxiv.org/abs/2311.16707v1 ) ライセンス: Link先を確認	Mingyuan Meng, Yuxin Xue, Dagan Feng, Lei Bi, and Jinman Kim	(参考訳) デンス予測は、医用画像の復元、登録、セグメンテーションなどの多くの医療ビジョンタスクの基本的な要件である。最も一般的なビジョンモデルである畳み込みニューラルネットワーク(CNN)は、畳み込み操作の固有の局所性のためにボトルネックに達している。近年,変圧器は長距離視覚依存を捉える能力の高密度な予測に広く採用されている。しかし、計算複雑性が高く、自己アテンション操作のメモリ消費が大きいため、トランスフォーマーは通常、ダウンサンプリングされた特徴解像度で使用される。このような使用は、全解像度でのみ利用可能な組織レベルのテクスチャ情報を効果的に活用できない。このテクスチャ情報は、医用画像の微妙な人間の解剖を区別できるため、医用密度予測に不可欠である。本研究では,MLPがフル画像解像度での長距離依存を可能にするため,組織レベルの細部がパフォーマンスを左右する医療密度予測において,MLPはトランスフォーマーの代替として優れていると仮定する。本仮説を検証するために,全画像解像度から MLP を利用する完全解像度階層型 MLP フレームワークを開発した。この枠組みは, 修復, 登録, セグメンテーションを含む, 広範囲の医療用高密度予測タスクにおいて, 様々なMLPブロックを用いて評価する。 6つの公開ベンチマークデータセットに対する大規模な実験により、MLPをフル解像度で使用するだけで、我々のフレームワークはCNNやトランスフォーマーよりも優れており、様々な医療密集予測タスクにおける最先端のパフォーマンスを実現していることがわかった。 Dense prediction is a fundamental requirement for many medical vision tasks such as medical image restoration, registration, and segmentation. The most popular vision model, Convolutional Neural Networks (CNNs), has reached bottlenecks due to the intrinsic locality of convolution operations. Recently, transformers have been widely adopted for dense prediction for their capability to capture long-range visual dependence. However, due to the high computational complexity and large memory consumption of self-attention operations, transformers are usually used at downsampled feature resolutions. Such usage cannot effectively leverage the tissue-level textural information available only at the full image resolution. This textural information is crucial for medical dense prediction as it can differentiate the subtle human anatomy in medical images. In this study, we hypothesize that Multi-layer Perceptrons (MLPs) are superior alternatives to transformers in medical dense prediction where tissue-level details dominate the performance, as MLPs enable long-range dependence at the full image resolution. 
To validate our hypothesis, we develop a full-resolution hierarchical MLP framework that uses MLPs beginning from the full image resolution. We evaluate this framework with various MLP blocks on a wide range of medical dense prediction tasks including restoration, registration, and segmentation. Extensive experiments on six public well-benchmarked datasets show that, by simply using MLPs at full resolution, our framework outperforms its CNN and transformer counterparts and achieves state-of-the-art performance on various medical dense prediction tasks.	翻訳日:2023-11-29 18:46:41 公開日:2023-11-28
# Sinkhorn Flow: Sinkhornアルゴリズムの理解と一般化のための継続的時間フレームワーク Sinkhorn Flow: A Continuous-Time Framework for Understanding and Generalizing the Sinkhorn Algorithm ( http://arxiv.org/abs/2311.16706v1 ) ライセンス: Link先を確認	Mohammad Reza Karimi, Ya-Ping Hsieh, Andreas Krause	(参考訳) 機械学習における多くの問題は、確率測度の空間におけるエントロピー正規化最適輸送の解法として定式化することができる。正準的アプローチは、その豊かな数学的性質で有名なシンクホーンイテレートを含む。近年、シンクホーンアルゴリズムはミラー降下フレームワーク内で再キャストされ、古典的な最適化理論の洞察の恩恵を受けている。そこで,この結果を基に,シンクホーンアルゴリズムの連続時間類似性を導入する。この観点から、ノイズやバイアスに頑健なシンクホーンスキームの新しい変種を導出することができる。さらに、我々の連続時間ダイナミクスは一般化するだけでなく、機械学習や数学で最近発見されたいくつかの力学、例えば(deb et al. 2023)の「wasserstein mirror flow」や(claisse et al. 2023)の「mean-field schr\"odinger equation」に対する統一的な視点を提供する。 Many problems in machine learning can be formulated as solving entropy-regularized optimal transport on the space of probability measures. The canonical approach involves the Sinkhorn iterates, renowned for their rich mathematical properties. Recently, the Sinkhorn algorithm has been recast within the mirror descent framework, thus benefiting from classical optimization theory insights. Here, we build upon this result by introducing a continuous-time analogue of the Sinkhorn algorithm. This perspective allows us to derive novel variants of Sinkhorn schemes that are robust to noise and bias. Moreover, our continuous-time dynamics not only generalize but also offer a unified perspective on several recently discovered dynamics in machine learning and mathematics, such as the "Wasserstein mirror flow" of (Deb et al. 2023) or the "mean-field Schr\"odinger equation" of (Claisse et al. 2023).	翻訳日:2023-11-29 18:46:15 公開日:2023-11-28
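As background for the Sinkhorn Flow abstract, the classical discrete-time Sinkhorn iterates that it generalizes can be sketched as follows. This is the standard alternating-scaling algorithm for entropy-regularized optimal transport between two discrete distributions, not the continuous-time flow proposed in the paper. The function name, the regularization strength `eps`, and the iteration count are illustrative choices.

```python
import math
from typing import List


def sinkhorn(
    cost: List[List[float]],
    a: List[float],
    b: List[float],
    eps: float = 0.5,
    n_iters: int = 500,
) -> List[List[float]]:
    """Return an approximate entropy-regularized transport plan between a and b."""
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps).
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the plan's row sums match a.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the plan's column sums match b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Plan P_ij = u_i * K_ij * v_j.
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
```

Each pass alternates between matching the row marginals and matching the column marginals. The abstract's continuous-time view treats this alternation as a discretization of an underlying flow.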
# CADTalk:CADプログラムのセマンティックコメントのためのアルゴリズムとベンチマーク CADTalk: An Algorithm and Benchmark for Semantic Commenting of CAD Programs ( http://arxiv.org/abs/2311.16703v1 ) ライセンス: Link先を確認	Haocheng Yuan, Jing Xu, Hao Pan, Adrien Bousseau, Niloy Mitra, Changjian Li	(参考訳) cadプログラムは、パラメトリックな修正が容易な操作のシーケンスとして形状をコンパクトにエンコードする一般的な方法である。しかし、十分なセマンティックなコメントや構造がなければ、このようなプログラムは理解するのが難しくなる。本稿では,意味的に意味のある形状部分に対応するコードブロックに入力プログラムを分割し,各ブロックに意味ラベルを割り当てることを目的とする,意味的コメントcadプログラムの問題を紹介する。基礎言語と視覚モデルの最近の進歩を活かし,プログラム解析と視覚分析を組み合わせることで,この問題を解決した。具体的には、入力プログラムを実行することで、条件付きフォトリアリスティックな画像を生成するために、そのような画像にセマンティックアノテータを使用する形状を生成する。その後、画像にまたがって情報を蒸留し、元のプログラムにリンクして意味的にコメントします。さらに,5,280個の機械製プログラムと45個の人為的プログラムからなるベンチマークデータセットCADTalkを収集,注釈し,今後の研究を促進する。我々はGPTベースのベースラインアプローチやオープンセットの形状分割ベースラインであるPartSLIPと比較して、我々のアプローチを広範囲に評価し、新しいCADTalkデータセットに対して83.24%の精度を報告した。プロジェクトページ: https://enigma-li.github.io/CADTalk/。 CAD programs are a popular way to compactly encode shapes as a sequence of operations that are easy to parametrically modify. However, without sufficient semantic comments and structure, such programs can be challenging to understand, let alone modify. We introduce the problem of semantic commenting CAD programs, wherein the goal is to segment the input program into code blocks corresponding to semantically meaningful shape parts and assign a semantic label to each block. We solve the problem by combining program parsing with visual-semantic analysis afforded by recent advances in foundational language and vision models. Specifically, by executing the input programs, we create shapes, which we use to generate conditional photorealistic images to make use of semantic annotators for such images. We then distill the information across the images and link back to the original programs to semantically comment on them. 
Additionally, we collected and annotated a benchmark dataset, CADTalk, consisting of 5,280 machine-made programs and 45 human-made programs with ground truth semantic comments to foster future research. We extensively evaluated our approach against a GPT-based baseline and an open-set shape segmentation baseline, PartSLIP, and report 83.24% accuracy on the new CADTalk dataset. Project page: https://enigma-li.github.io/CADTalk/.	翻訳日:2023-11-29 18:45:55 公開日:2023-11-28
# 腎臓・肝腫瘍セグメンテーションの知識蒸留における中間層設計の再考 Rethinking Intermediate Layers design in Knowledge Distillation for Kidney and Liver Tumor Segmentation ( http://arxiv.org/abs/2311.16700v1 ) ライセンス: Link先を確認	Vandan Gorade, Sparsh Mittal, Debesh Jha, Ulas Bagci	(参考訳) 知識蒸留(KD)は様々な領域で顕著な成功を収めてきたが、腎臓や肝腫瘍のセグメンテーションなどの医用画像タスクへの応用は困難に直面している。既存のKD手法の多くは、これらのタスクに特化していない。さらに、一般的なKD手法は、教師から生徒へ何をどこから蒸留するかについて、注意深い考察を欠くことが多い。この見落としは、より浅い学生層内のトレーニングバイアスの蓄積などの問題を引き起こし、KDの有効性を損なう可能性がある。これらの課題に対処するため,階層型層選択型フィードバック蒸留(HLFD)を提案する。 HLFDは、中間層の組み合わせから前段の層へ知識を戦略的に蒸留し、最終層の知識を特徴レベルと画素レベルの両方で中間層に伝達する。この設計により、モデルは前段の層から高品質な表現を学ぶことができ、堅牢でコンパクトな学生モデルが得られる。大規模な定量的評価により、HLFDは既存の手法を有意なマージンで上回ることが明らかとなった。例えば、腎臓セグメンテーションタスクでは、HLFDは学生モデル(KDなし)を10pp以上上回り、腫瘍特異的な特徴への焦点を大幅に改善する。定性的な観点から、HLFDを用いて訓練された学生モデルは、無関係な情報の抑制に優れ、腫瘍特異的な詳細に鋭く焦点を合わせることができ、より効率的で正確な診断ツールへの新たな道を開く。 Knowledge distillation (KD) has demonstrated remarkable success across various domains, but its application to medical imaging tasks, such as kidney and liver tumor segmentation, has encountered challenges. Many existing KD methods are not specifically tailored for these tasks. Moreover, prevalent KD methods often lack careful consideration of what to distill from the teacher to the student, and from where. This oversight may lead to issues like the accumulation of training bias within shallower student layers, potentially compromising the effectiveness of KD. To address these challenges, we propose Hierarchical Layer-selective Feedback Distillation (HLFD). HLFD strategically distills knowledge from a combination of middle layers to earlier layers and transfers final layer knowledge to intermediate layers at both the feature and pixel levels. This design allows the model to learn higher-quality representations from earlier layers, resulting in a robust and compact student model. Extensive quantitative evaluations reveal that HLFD outperforms existing methods by a significant margin. 
For example, in the kidney segmentation task, HLFD surpasses the student model (without KD) by over 10pp, significantly improving its focus on tumor-specific features. From a qualitative standpoint, the student model trained using HLFD excels at suppressing irrelevant information and can focus sharply on tumor-specific details, which opens a new pathway for more efficient and accurate diagnostic tools.	翻訳日:2023-11-29 18:45:31 公開日:2023-11-28
# 強絡み空洞BEC系におけるエキゾチック量子ゆらぎの増大 Enhancing exotic quantum fluctuations in a strongly entangled cavity BEC system ( http://arxiv.org/abs/2311.16687v1 ) ライセンス: Link先を確認	Leon Mixa, Hans Keßler, Andreas Hemmerich, and Michael Thorwart	(参考訳) 量子光場と相関量子物質との強い結合は、物質セクターにおけるエキゾチックな量子ゆらぎを引き起こすことを示した。そのスペクトル特性を決定し,原子s波散乱の影響を明らかにする。特に、虚時間経路積分を用いて、微視的ハミルトニアンから散逸的なランダウ過程とベリャーエフ過程を導出する。これにより、その強いサブオーミックな性質が解析的に明らかにされる。減衰チャネルと反減衰チャネルの競合が明らかになった。物理的観測量に対するそれらの複雑な影響を解析的に定量化し、臨界点のストークスシフトを決定する。これは、強い光と物質の結合を利用することで量子物質のゆらぎが可変であることを示している。 We show that the strong coupling of a quantum light field and correlated quantum matter induces exotic quantum fluctuations in the matter sector. We determine their spectral characteristics and reveal the impact of the atomic s-wave scattering. In particular, we derive the dissipative Landau and Beliaev processes from the microscopic Hamiltonian using imaginary time path integrals. This reveals analytically their strongly sub-Ohmic nature. A competition between damping and antidamping channels is uncovered. Their intricate influence on physical observables is quantified analytically and the Stokes shift of the critical point is determined. This illustrates the tunability of the quantum matter fluctuations by exploiting strong light-matter coupling.	翻訳日:2023-11-29 18:45:06 公開日:2023-11-28
# 次POI推薦のためのハイパーリレーショナル知識グラフニューラルネットワーク Hyper-Relational Knowledge Graph Neural Network for Next POI ( http://arxiv.org/abs/2311.16683v1 ) ライセンス: Link先を確認	Jixiao Zhang, Yongkang Li, Ruotong Zou, Jingyuan Zhang, Zipei Fan, Xuan Song	(参考訳) モバイル技術の発展に伴い、位置情報ベースのソーシャルネットワーク(LBSN)におけるPOI(Point of Interest)レコメンデーションシステムは、ユーザと企業双方に多くのメリットをもたらしている。既存の多くの研究では、LBSNのデータスパース性の問題を軽減するためにKG(Knowledge Graph)を使用している。これらのアプローチは、主にLBSNにおけるペアワイズ関係のモデリングに焦点を当て、セマンティクスを充実させ、データスパース性の問題を軽減する。しかし、既存のアプローチでは、移動関係(3項関係:ユーザ-POI-time)のようなLBSNのハイパーリレーションはめったに考慮されない。これによりモデルは、セマンティクスを正確に活用することが難しくなる。さらに,従来の研究は,高次関係から成り,データスパース性の影響をさらに緩和しうるKGの豊富な構造情報を見落としている。そこで本研究では,ハイパーリレーショナル知識グラフニューラルネットワーク(HKGNN)モデルを提案する。 HKGNNでは、LBSNデータをモデル化するハイパーリレーショナル知識グラフ(HKG)が構築され、ハイパーリレーションのリッチなセマンティクスを維持し、活用している。そこで我々は,HKGの構造情報を凝集的に利用するハイパーグラフニューラルネットワークを提案する。さらに、シーケンシャル情報を活用し、パーソナライズドレコメンデーションを行うために、セルフアテンションネットワークが使用される。また,従来の手法では,POIの背景知識を提供することによってデータスパース性を低減させるのに不可欠なサイド情報を十分に利用していない。これを踏まえて、現在のデータセットを利用可能なサイド情報で拡張し、データスパース性の影響をさらに軽減しました。 4つの実世界のLBSNデータセットにおける実験の結果は、既存の最先端手法と比較して、このアプローチの有効性を示している。 With the advancement of mobile technology, Point of Interest (POI) recommendation systems in Location-based Social Networks (LBSN) have brought numerous benefits to both users and companies. Many existing works employ Knowledge Graph (KG) to alleviate the data sparsity issue in LBSN. These approaches primarily focus on modeling the pair-wise relations in LBSN to enrich the semantics and thereby relieve the data sparsity issue. However, existing approaches seldom consider the hyper-relations in LBSN, such as the mobility relation (a 3-ary relation: user-POI-time). This makes it hard for the model to exploit the semantics accurately. In addition, prior works overlook the rich structural information inherent in KG, which consists of higher-order relations and can further alleviate the impact of data sparsity. To this end, we propose a Hyper-Relational Knowledge Graph Neural Network (HKGNN) model. 
In HKGNN, a Hyper-Relational Knowledge Graph (HKG) that models the LBSN data is constructed to maintain and exploit the rich semantics of hyper-relations. We then propose a Hypergraph Neural Network to utilize the structural information of HKG in a cohesive way. In addition, a self-attention network is used to leverage sequential information and make personalized recommendations. Furthermore, side information, essential in reducing data sparsity by providing background knowledge of POIs, is not fully utilized in current methods. In light of this, we extend the current dataset with available side information to further lessen the impact of data sparsity. Results of experiments on four real-world LBSN datasets demonstrate the effectiveness of our approach compared to existing state-of-the-art methods.	翻訳日:2023-11-29 18:44:56 公開日:2023-11-28
# contextseg: 注意を向けたコンテキストクエリによる意味セグメンテーションのスケッチ ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention ( http://arxiv.org/abs/2311.16682v1 ) ライセンス: Link先を確認	Jiawei Wang, Changjian Li	(参考訳) スケッチセマンティックセグメンテーション(Sketch semantic segmentation)は、コンピュータビジョンにおいて、予め定義された部分ラベルを個々のストロークに割り当てることを含む、よく研究され重要な問題である。本稿では,この問題を2段階で解くための,単純かつ効果的なアプローチであるcontextsegを提案する。第1段階では、ストロークの形状と位置情報をより良くエンコードするために、オートエンコーダネットワークにおける超密接な距離場を予測し、構造的情報学習を強化する。第2段階では、全ストロークを単一のエンティティとして扱い、デフォルトのアテンション機構を備えた自動回帰変換器を用いて、同じ意味部分内でストロークのグループをラベル付けする。グループに基づくラベリングにより,ストロークの残りのグループについて決定を行う際に,コンテキスト情報を十分に活用することができる。提案手法は,2つの代表的なデータセットに対する最先端手法と比較して高いセグメンテーション精度を達成し,その性能を広く評価してきた。さらに、トレーニングデータにおける部分不均衡の解決に関する洞察と、この分野での今後の研究を刺激するクロスカテゴリトレーニングに関する予備実験を提供する。 Sketch semantic segmentation is a well-explored and pivotal problem in computer vision involving the assignment of pre-defined part labels to individual strokes. This paper presents ContextSeg - a simple yet highly effective approach to tackling this problem with two stages. In the first stage, to better encode the shape and positional information of strokes, we propose to predict an extra dense distance field in an autoencoder network to reinforce structural information learning. In the second stage, we treat an entire stroke as a single entity and label a group of strokes within the same semantic part using an auto-regressive Transformer with the default attention mechanism. By group-based labeling, our method can fully leverage the context information when making decisions for the remaining groups of strokes. Our method achieves the best segmentation accuracy compared with state-of-the-art approaches on two representative datasets and has been extensively evaluated demonstrating its superior performance. Additionally, we offer insights into solving part imbalance in training data and the preliminary experiment on cross-category training, which can inspire future research in this field.	翻訳日:2023-11-29 18:44:28 公開日:2023-11-28
# (Extra-)Ordinaryを理解する: プロトタイプ概念に基づく説明による深層モデル決定の検証 Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations ( http://arxiv.org/abs/2311.16681v1 ) ライセンス: Link先を確認	Maximilian Dreyer, Reduan Achtibat, Wojciech Samek, Sebastian Lapuschkin	(参考訳) 深層ニューラルネットワーク(DNN)を医療などのリスクの高いアプリケーションにデプロイする場合、透明性と安全性の両立が不可欠である。説明可能なAI(XAI)の分野では、不透明なDNNの意思決定プロセスを理解するための様々な方法が提案されている。しかしながら、労働集約的かつ偏りうる人間の評価の繰り返しに強く依存するため、実際に安全を確保するのに適したXAI手法はごくわずかである。そこで本研究では,インスタンス単位(ローカル)に加えてクラス単位(グローバル)の意思決定戦略もプロトタイプを通じて伝達する,新しいポストホックなコンセプトベースXAIフレームワークを提案する。我々のアプローチを区別しているのは、ローカル戦略とグローバル戦略の組み合わせであり、期待される(プロトタイプ的な)概念使用と比べたモデル決定の(非)類似性をより明確に理解でき、最終的に人間の長期的な評価への依存を減らすことができる。プロトタイプ的挙動からの逸脱を定量化することで、予測を特定のモデルサブストラテジーに関連付けるだけでなく、外れ値の挙動を検出することもできる。このように、我々のアプローチは、モデル検証のための直感的で説明可能なツールを構成する。本稿では,3つのデータセット(ImageNet, CUB-200, CIFAR-10)において,VGG,ResNet,EfficientNetアーキテクチャを用いて,分布外サンプル,スプリアスなモデル挙動,データ品質問題の特定における本アプローチの有効性を示す。コードはhttps://github.com/maxdreyer/pcxで入手できる。 Ensuring both transparency and safety is critical when deploying Deep Neural Networks (DNNs) in high-risk applications, such as medicine. The field of explainable AI (XAI) has proposed various methods to comprehend the decision-making processes of opaque DNNs. However, only a few XAI methods are suitable for ensuring safety in practice, as they heavily rely on repeated labor-intensive and possibly biased human assessment. In this work, we present a novel post-hoc concept-based XAI framework that conveys class-wise (global) decision-making strategies in addition to instance-wise (local) ones via prototypes. What sets our approach apart is the combination of local and global strategies, enabling a clearer understanding of the (dis-)similarities in model decisions compared to the expected (prototypical) concept use, ultimately reducing the dependence on human long-term assessment. Quantifying the deviation from prototypical behavior not only allows us to associate predictions with specific model sub-strategies but also to detect outlier behavior. 
As such, our approach constitutes an intuitive and explainable tool for model validation. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets (ImageNet, CUB-200, and CIFAR-10) utilizing VGG, ResNet, and EfficientNet architectures. Code is available on https://github.com/maxdreyer/pcx.	翻訳日:2023-11-29 18:44:07 公開日:2023-11-28
# ROSO:合成観測によるロボットポリシー推論の改善 ROSO: Improving Robotic Policy Inference via Synthetic Observations ( http://arxiv.org/abs/2311.16680v1 ) ライセンス: Link先を確認	Yusuke Miyashita, Dimitris Gahtidis, Colin La, Jeremy Rabinowicz, Juxi Leitner	(参考訳) 本稿では,生成型人工知能(AI)を用いて,推論中に観測値を変更することにより,事前学習したポリシーのゼロショット性能を向上させることを提案する。先進的なニューラルネットワークを利用した現代のロボットシステムは、事前訓練されたタスクに顕著な能力を示した。しかし、新しいオブジェクトや環境への一般化と適応は困難であり、ビジュオモータポリシーの微調整は時間がかかる。これらの課題を克服するために, 合成観測によるロボットポリシー推論(ROSO)を提案する。 ROSOはStable Diffusionを利用して、推論時に新しい物体に対するロボットの観測を前処理し、事前訓練されたポリシーの観測分布に適合させる。このパラダイムにより、既知のタスクから学習した知識を未知のシナリオに移行し、長い微調整を必要とせず、ロボットの適応性を高めることができる。我々の実験は、生成AIをロボット推論に組み込むことで成功率が大幅に向上し、事前訓練されたポリシーだけでは失敗していたタスクの最大57%を完了できることを示した。 In this paper, we propose the use of generative artificial intelligence (AI) to improve zero-shot performance of a pre-trained policy by altering observations during inference. Modern robotic systems, powered by advanced neural networks, have demonstrated remarkable capabilities on pre-trained tasks. However, generalizing and adapting to new objects and environments is challenging, and fine-tuning visuomotor policies is time-consuming. To overcome these issues, we propose Robotic Policy Inference via Synthetic Observations (ROSO). ROSO uses stable diffusion to pre-process a robot's observation of novel objects at inference time so that it fits within the distribution of observations of the pre-trained policies. This novel paradigm allows us to transfer learned knowledge from known tasks to previously unseen scenarios, enhancing the robot's adaptability without requiring lengthy fine-tuning. Our experiments show that incorporating generative AI into robotic inference significantly improves successful outcomes, finishing up to 57% of tasks otherwise unsuccessful with the pre-trained policy.	翻訳日:2023-11-29 18:43:38 公開日:2023-11-28
# きめ細かい感情分析のためのエンティティ・アスペクト・オピニオン・センチメント四つ組抽出 Entity-Aspect-Opinion-Sentiment Quadruple Extraction for Fine-grained Sentiment Analysis ( http://arxiv.org/abs/2311.16678v1 ) ライセンス: Link先を確認	Dan Ma, Jun Xu, Zongyu Wang, Xuezhi Cao, Yunsen Xian	(参考訳) 製品レビューには、しばしば多くの暗黙的なアスペクトとオブジェクト属性の共存ケースが含まれている。残念ながら、Aspect-Based Sentiment Analysis (ABSA) の既存の研究の多くがこの問題を見落としており、総合的かつ公平に意見を抽出するのが困難である。本稿では,ABSAタスクにおける情報損失や非排他的アノテーション,意見の誤解を避けるために,アスペクト項をエンティティとアスペクトに階層的に分解することを目的とした,Entity-Aspect-Opinion-Sentiment Quadruple Extraction(EASQE)と呼ばれる新しいタスクを提案する。このタスクの研究を容易にするために、SemEval RestaurantとLaptopのデータセットに基づいた4つのデータセット(Res14-EASQE、Res15-EASQE、Res16-EASQE、Lap14-EASQE)を構築した。 EASQEタスクのベースラインとして、2段階のシーケンスタグ付けに基づく新しいTrigger-Opinionフレームワークも提案している。実験的な評価から,我々のTrigger-Opinionフレームワークは満足なEASQE結果を生成することができ,他のABSAタスクにも適用可能で,最先端手法を大きく上回ることが示された。我々は4つのデータセットとTrigger-Opinionのソースコードを公開し、この分野のさらなる研究を促進する。 Product reviews often contain a large number of implicit aspects and object-attribute co-existence cases. Unfortunately, many existing studies in Aspect-Based Sentiment Analysis (ABSA) have overlooked this issue, which can make it difficult to extract opinions comprehensively and fairly. In this paper, we propose a new task called Entity-Aspect-Opinion-Sentiment Quadruple Extraction (EASQE), which aims to hierarchically decompose aspect terms into entities and aspects to avoid information loss, non-exclusive annotations, and opinion misunderstandings in ABSA tasks. To facilitate research in this new task, we have constructed four datasets (Res14-EASQE, Res15-EASQE, Res16-EASQE, and Lap14-EASQE) based on the SemEval Restaurant and Laptop datasets. We have also proposed a novel two-stage sequence-tagging based Trigger-Opinion framework as the baseline for the EASQE task. Empirical evaluations show that our Trigger-Opinion framework can generate satisfactory EASQE results and can also be applied to other ABSA tasks, significantly outperforming state-of-the-art methods. 
We have made the four datasets and source code of Trigger-Opinion publicly available to facilitate further research in this area.	翻訳日:2023-11-29 18:43:20 公開日:2023-11-28
# 文の類似性決定のための分布に基づく閾値 A Distribution-Based Threshold for Determining Sentence Similarity ( http://arxiv.org/abs/2311.16675v1 ) ライセンス: Link先を確認	Gioele Cadamuro and Marco Gruppo	(参考訳) そこで本研究では,意味的テキスト類似性(STS)問題に対する解決法を提案する。その解決法は,唯一の識別因子として,高度に特定の情報(名前,住所,識別符号など)を含む2つの文を一致させる必要があり,それらが類似している場合に定義を導出する必要がある。このソリューションは、シアムアーキテクチャに基づくニューラルネットワークの使用を中心に展開され、類似した文と異なる文のペア間の距離の分布を生成する。これらの分布の目標は、新しい予測や後の分析において、類似の対のベクトル距離から類似の対のベクトル距離を区別するために用いられる、明確に定義された量を表す「閾値」と呼ばれる判別因子を見つけることである。さらに,分布の特徴と距離関数の働きの双方の属性を組み合わせることで,予測値のスコア付け方法を開発した。最後に、sts問題に対するよく知られた広く使われているベンチマークデータセットに適用することにより、より広い範囲のドメインに転送できることを示す結果を一般化する。 We hereby present a solution to a semantic textual similarity (STS) problem in which it is necessary to match two sentences containing, as the only distinguishing factor, highly specific information (such as names, addresses, identification codes), and from which we need to derive a definition for when they are similar and when they are not. The solution revolves around the use of a neural network, based on the siamese architecture, to create the distributions of the distances between similar and dissimilar pairs of sentences. The goal of these distributions is to find a discriminating factor, that we call "threshold", which represents a well-defined quantity that can be used to distinguish vector distances of similar pairs from vector distances of dissimilar pairs in new predictions and later analyses. In addition, we developed a way to score the predictions by combining attributes from both the distributions' features and the way the distance function works. Finally, we generalize the results showing that they can be transferred to a wider range of domains by applying the system discussed to a well-known and widely used benchmark dataset for STS problems.	翻訳日:2023-11-29 18:42:54 公開日:2023-11-28
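The abstract above describes deriving a "threshold" from the distance distributions of similar and dissimilar sentence pairs but does not spell out the procedure. As a rough illustration only, one simple way to pick such a cut-off is to scan candidate distances and keep the one that misclassifies the fewest pairs. The function name `best_threshold` and the toy distances are hypothetical and are not taken from the paper.

```python
def best_threshold(similar, dissimilar):
    """Return (threshold, errors) minimizing misclassified pairs.

    A pair is predicted "similar" when its distance is <= threshold.
    This is an illustrative heuristic, not the authors' method.
    """
    candidates = sorted(set(similar) | set(dissimilar))
    best_t, best_err = None, None
    for t in candidates:
        # Similar pairs above t and dissimilar pairs at or below t are errors
        errors = sum(d > t for d in similar) + sum(d <= t for d in dissimilar)
        if best_err is None or errors < best_err:
            best_t, best_err = t, errors
    return best_t, best_err


# Hypothetical embedding distances for labeled pairs
similar = [0.10, 0.15, 0.22, 0.30, 0.41]
dissimilar = [0.38, 0.55, 0.62, 0.70, 0.90]
threshold, errors = best_threshold(similar, dissimilar)
```

With overlapping distributions, as in the toy data, no threshold separates the two groups perfectly, which is why a scoring step on top of the raw cut-off (as the abstract mentions) can be useful.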
# コンピュータビジョンに対応した大規模言語モデル:簡単な調査 Large Language Models Meet Computer Vision: A Brief Survey ( http://arxiv.org/abs/2311.16673v1 ) ライセンス: Link先を確認	Raby Hamadi	(参考訳) 近年,Large Language Models (LLMs) とComputer Vision (CV) の交差点が研究の重要な領域として現れ,人工知能 (AI) の分野で大きな進歩を遂げている。トランスフォーマーは自然言語処理(NLP)とCVの両方において多くの最先端モデルのバックボーンとなっているため、その進化と潜在的な拡張を理解することが重要である。この調査論文は、トランスフォーマーとその後継者の領域における最新の進歩を考察し、ビジョントランスフォーマー(ViT)とLCMを革命させる可能性を強調した。また、この調査では、いくつかの有償およびオープンソースのLCMのパフォーマンス指標について比較分析を行い、その強みと改善の領域に光を当て、また、LCMが視覚関連タスクにどのように使われているかの文献レビューを行っている。さらに、調査では、LLMのトレーニングに使用されるデータセットの包括的なコレクションを示し、LLMのさまざまなトレーニング前および下流タスクで高いパフォーマンスを達成するために利用可能な多様なデータに関する洞察を提供する。調査は、この分野のオープンな方向性を強調し、将来の研究開発の場を示唆することで締めくくられる。この調査は、cvにおけるllmの深い交差点の核心となることを目的としており、統合的で先進的なaiモデルの新しい時代へと繋がる。 Recently, the intersection of Large Language Models (LLMs) and Computer Vision (CV) has emerged as a pivotal area of research, driving significant advancements in the field of Artificial Intelligence (AI). As transformers have become the backbone of many state-of-the-art models in both Natural Language Processing (NLP) and CV, understanding their evolution and potential enhancements is crucial. This survey paper delves into the latest progressions in the domain of transformers and their subsequent successors, emphasizing their potential to revolutionize Vision Transformers (ViTs) and LLMs. This survey also presents a comparative analysis, juxtaposing the performance metrics of several leading paid and open-source LLMs, shedding light on their strengths and areas of improvement as well as a literature review on how LLMs are being used to tackle vision related tasks. Furthermore, the survey presents a comprehensive collection of datasets employed to train LLMs, offering insights into the diverse data available to achieve high performance in various pre-training and downstream tasks of LLMs. The survey is concluded by highlighting open directions in the field, suggesting potential venues for future research and development. 
This survey aims to underscore the profound intersection of LLMs and CV, leading to a new era of integrated and advanced AI models.	翻訳日:2023-11-29 18:42:35 公開日:2023-11-28
# SplitNeRF: 形状、照明、材質の同時推定のための分割和近似ニューラルフィールド SplitNeRF: Split Sum Approximation Neural Field for Joint Geometry, Illumination, and Material Estimation ( http://arxiv.org/abs/2311.16671v1 ) ライセンス: Link先を確認	Jesus Zarzar, Bernard Ghanem	(参考訳) 本稿では,固定照明下で撮影されたポーズ付き画像群から実世界の物体の形状,材料特性,環境照明を推定し,実世界の物体をデジタル化する新しい手法を提案する。提案手法は,リアルタイムの物理ベースレンダリングにおいて画像ベース照明とともに用いられる分割和近似を,ニューラル放射場(NeRF)パイプラインに組み込む。任意の解像度で事前積分された画像ベース照明を表現する,シーン固有の単一のMLPを用いて,シーンの照明をモデル化する。効率的なモンテカルロサンプリングに基づく新しい正則化器を活用し,事前積分照明の正確なモデリングを実現する。さらに,モンテカルロサンプリングに基づく同様の正則化器を用いて,自己遮蔽予測を教師する新しい手法を提案する。実験の結果, シーンの形状, 材質特性, 照明を推定する手法の効率性と有効性が示された。当社の手法では,単一のNVIDIA A100 GPU上で約1時間トレーニングするだけで,最先端のリライティング品質を実現することができる。 We present a novel approach for digitizing real-world objects by estimating their geometry, material properties, and environmental lighting from a set of posed images with fixed lighting. Our method incorporates into Neural Radiance Field (NeRF) pipelines the split sum approximation used with image-based lighting for real-time physically based rendering. We propose modeling the scene's lighting with a single scene-specific MLP representing pre-integrated image-based lighting at arbitrary resolutions. We achieve accurate modeling of pre-integrated lighting by exploiting a novel regularizer based on efficient Monte Carlo sampling. Additionally, we propose a new method of supervising self-occlusion predictions by exploiting a similar regularizer based on Monte Carlo sampling. Experimental results demonstrate the efficiency and effectiveness of our approach in estimating scene geometry, material properties, and lighting. Our method is capable of attaining state-of-the-art relighting quality after only ~1 hour of training on a single NVIDIA A100 GPU.	翻訳日:2023-11-29 18:42:08 公開日:2023-11-28
# PyTorch Geometric High Order:高次グラフニューラルネットワークのための統一ライブラリ PyTorch Geometric High Order: A Unified Library for High Order Graph Neural Network ( http://arxiv.org/abs/2311.16670v1 ) ライセンス: Link先を確認	Xiyuan Wang, Muhan Zhang	(参考訳) PyTorch Geometric High Order (PyGHO)は、PyTorch Geometric (PyG)を拡張した高次グラフニューラルネットワーク(HOGNN)のためのライブラリである。ノード間でメッセージを交換する通常のメッセージパッシングニューラルネットワーク(MPNN)とは異なり、サブグラフGNNやk-WL GNNを含むHOGNNはノードタプルをエンコードする。この手法にはこれまで標準化されたフレームワークがなく、複雑なコーディングを必要とすることが多かった。 PyGHOの主な目的は、様々なHOGNNに対して統一的でユーザフレンドリーなインターフェースを提供することである。これはノードタプルのための合理化されたデータ構造、包括的なデータ処理ユーティリティ、高次GNN方法論のための柔軟な演算子スイートによって実現される。本稿では,PyGHOの詳細について述べるとともに,PyGHOで実装されたHOGNNと公式実装とを実世界のタスクで比較する。 PyGHOは最大50%の高速化を実現し、実装に必要なコードを桁違いに削減する。私たちのライブラリは https://github.com/GraphPKU/PygHO で利用可能です。 We introduce PyTorch Geometric High Order (PyGHO), a library for High Order Graph Neural Networks (HOGNNs) that extends PyTorch Geometric (PyG). Unlike ordinary Message Passing Neural Networks (MPNNs) that exchange messages between nodes, HOGNNs, encompassing subgraph GNNs and k-WL GNNs, encode node tuples, a method previously lacking a standardized framework and often requiring complex coding. PyGHO's main objective is to provide a unified and user-friendly interface for various HOGNNs. It accomplishes this through streamlined data structures for node tuples, comprehensive data processing utilities, and a flexible suite of operators for high-order GNN methodologies. In this work, we present an in-depth description of PyGHO and compare HOGNNs implemented with PyGHO against their official implementations on real-world tasks. PyGHO achieves up to 50% acceleration and reduces the code needed for implementation by an order of magnitude. Our library is available at https://github.com/GraphPKU/PygHO.	翻訳日:2023-11-29 18:41:53 公開日:2023-11-28
# アクティブ推論による連続計算における平衡 Equilibrium in the Computing Continuum through Active Inference ( http://arxiv.org/abs/2311.16769v1 ) ライセンス: Link先を確認	Boris Sedlak, Victor Casamayor Pujol, Praveen Kumar Donta, Schahram Dustdar	(参考訳) 計算継続性(CC)システムは、各計算階層の複雑な要求を保証するために課題となる。システムの規模を考えると、これらの要件として表現されるサービスレベルオブジェクト(SLO)は、分散化可能な小さな部分に分割される必要があります。我々は,1)SLOの実施方法の因果的理解を深め,(2)異種デバイスの搭載を高速化するための知識の伝達を可能にする,協調的エッジインテリジェンスのための枠組みを提案する。コラボレーションを通じて、(3)SLO充足の範囲を拡大する。このフレームワークを実装し,ビデオストリーミングにおいて,CCシステムが品質・オブ・サービス(QoS)と品質・オブ・エクスペリエンス(QoE)の確保に寄与するユースケースを評価した。以上の結果から, エッジデバイスは4つのSLOを確保するために10回の訓練ラウンドしか必要とせず, さらに根底にある因果構造も合理的に説明可能であることがわかった。新しいタイプのデバイスの追加は後方で行うことができ、このフレームワークは、デバイスタイプが不明であったにもかかわらず、既存のモデルの再利用を可能にした。最後に、デバイスクラスタ内の負荷の再バランスにより、個々のエッジデバイスは、ネットワーク障害後のSLOコンプライアンスを22%から89%に回復することができた。 Computing Continuum (CC) systems are challenged to ensure the intricate requirements of each computational tier. Given the system's scale, the Service Level Objectives (SLOs) which are expressed as these requirements, must be broken down into smaller parts that can be decentralized. We present our framework for collaborative edge intelligence enabling individual edge devices to (1) develop a causal understanding of how to enforce their SLOs, and (2) transfer knowledge to speed up the onboarding of heterogeneous devices. Through collaboration, they (3) increase the scope of SLO fulfillment. We implemented the framework and evaluated a use case in which a CC system is responsible for ensuring Quality of Service (QoS) and Quality of Experience (QoE) during video streaming. Our results showed that edge devices required only ten training rounds to ensure four SLOs; furthermore, the underlying causal structures were also rationally explainable. The addition of new types of devices can be done a posteriori, the framework allowed them to reuse existing models, even though the device type had been unknown. 
Finally, rebalancing the load within a device cluster allowed individual edge devices to recover their SLO compliance after a network failure from 22% to 89%.	翻訳日:2023-11-29 18:34:53 公開日:2023-11-28
# 領域シフト医用画像の自動診断におけるリファラル失敗の救済 Rescuing referral failures during automated diagnosis of domain-shifted medical images ( http://arxiv.org/abs/2311.16766v1 ) ライセンス: Link先を確認	Anuj Srivastava, Karm Patel, Pradeep Shenoy, Devarajan Sridharan	(参考訳) 現実世界にデプロイされたディープラーニングモデルの成功は、さまざまなデータドメインにまたがって適切に一般化できる能力に大きく依存する。ここでは、ドメインシフト医療画像の自動診断における選択分類の根本的な課題に対処する。このシナリオでは、特にトレーニングセットから遠く離れたサンプルでテストされた場合(共変量シフト)、ラベルの信頼性が低い場合の予測を避けるためにモデルが学習しなければならない。このような不確実な症例は一般的に、さらなる分析と評価のために臨床医に紹介(リファラル)される。しかし,最先端のドメイン一般化アプローチでさえ,異なる人口集団から取得した医用画像や異なる技術で取得した医用画像でテストすると,リファラル時に深刻な失敗を示すことが判明した。強い共変量シフトを示す2つのベンチマーク診断医用画像データセットについて検討した。 i)網膜眼底画像による糖尿病網膜症予測,ii)胸部X線画像によるマルチラベル疾患予測。予測の不確実性推定は共変量シフトの下では十分に一般化せず,非単調なリファラル曲線と,高いリファラル率(>70%)における深刻な性能低下(最大50%)につながることを示す。我々は,これらの失敗を救済し,ベースライン法に対して通常10%を超える大幅な性能向上を実現する,ロバストな一般化とポストホックなリファラル手法の新たな組み合わせを評価する。本研究は,領域シフト医療画像におけるリファラルの重要な課題を特定し,信頼性の高い自動疾患診断における重要な応用を見出す。 The success of deep learning models deployed in the real world depends critically on their ability to generalize well across diverse data domains. Here, we address a fundamental challenge with selective classification during automated diagnosis with domain-shifted medical images. In this scenario, models must learn to avoid making predictions when label confidence is low, especially when tested with samples far removed from the training set (covariate shift). Such uncertain cases are typically referred to the clinician for further analysis and evaluation. Yet, we show that even state-of-the-art domain generalization approaches fail severely during referral when tested on medical images acquired from a different demographic or using a different technology. We examine two benchmark diagnostic medical imaging datasets exhibiting strong covariate shifts: i) diabetic retinopathy prediction with retinal fundus images and ii) multilabel disease prediction with chest X-ray images. 
We show that predictive uncertainty estimates do not generalize well under covariate shifts, leading to non-monotonic referral curves and severe drops in performance (up to 50%) at high referral rates (>70%). We evaluate novel combinations of robust generalization and post hoc referral approaches that rescue these failures and achieve significant performance improvements, typically >10%, over baseline methods. Our study identifies a critical challenge with referral in domain-shifted medical images and finds key applications in reliable, automated disease diagnosis.	翻訳日:2023-11-29 18:34:31 公開日:2023-11-28
# レポート生成のための放射線学を考慮したモデルベース評価指標 Radiology-Aware Model-Based Evaluation Metric for Report Generation ( http://arxiv.org/abs/2311.16764v1 ) ライセンス: Link先を確認	Amos Calamida, Farhad Nooralahzadeh, Morteza Rohanian, Koji Fujimoto, Mizuho Nishio, Michael Krauthammer	(参考訳) 放射線学領域に適応させたCOMETアーキテクチャを用いて,機械生成された放射線学レポートのための新しい自動評価指標を提案する。放射線学知識グラフであるRadGraphでトレーニングされたものを含む,4つの医療指向のモデルチェックポイントを学習し,公開する。その結果,我々の指標は,BERTscore,BLEU,CheXbertスコアなどの確立した指標と中程度から高い相関を示すことがわかった。さらに,チェックポイントの1つが人間の判断と高い相関を示すことを,200件のレポートに対する6人の認定放射線科医による公開アノテーションを用いて示した。我々はまた,100件のレポートについて2人の放射線科医からアノテーションを収集し,独自の分析を行った。その結果, 放射線学に特化した評価指標としての本手法の有効性が示唆された。調査結果を再現するためのコード、データ、モデルチェックポイントが公開される予定だ。 We propose a new automated evaluation metric for machine-generated radiology reports using the successful COMET architecture adapted for the radiology domain. We train and publish four medically-oriented model checkpoints, including one trained on RadGraph, a radiology knowledge graph. Our results show that our metric shows moderate to high correlation with established metrics such as BERTscore, BLEU, and CheXbert scores. Furthermore, we demonstrate that one of our checkpoints exhibits a high correlation with human judgment, as assessed using the publicly available annotations of six board-certified radiologists, using a set of 200 reports. We also performed our own analysis gathering annotations with two radiologists on a collection of 100 reports. The results indicate the potential effectiveness of our method as a radiology-specific evaluation metric. The code, data, and model checkpoints to reproduce our findings will be publicly available.	翻訳日:2023-11-29 18:34:08 公開日:2023-11-28
# 対象植物ノードの知覚改善のための勾配に基づく局所Next-Best-View計画 Gradient-based Local Next-best-view Planning for Improved Perception of Targeted Plant Nodes ( http://arxiv.org/abs/2311.16759v1 ) ライセンス: Link先を確認	Akshay K. Burusa, Eldert J. van Henten, Gert Kootstra	(参考訳) トマトの温室では、選択的収穫や摘葉といった労働集約的な作業を自動化するロボットが増えている。これらのタスクを実行するには、ロボットは、他の植物部分による高いレベルのオクルージョンがあるにもかかわらず、切断が必要な植物ノードを正確かつ効率的に知覚できなければならない。この問題を,ロボットがオクルージョンを克服し知覚の質を向上させるために,効率的なカメラ視点のセットを計画しなければならない局所的なNext-Best-View(NBV)計画タスクとして定式化する。提案する定式化では,単一の対象ノードの知覚精度を迅速に向上し,切断できる可能性を最大化することに注力する。従来のNBV計画の手法は、主に大域的な視点計画に重点を置き、探索のために候補視点のランダムサンプリングを用いていたため、高い計算コスト、質の悪い候補による非効率的な視点選択、非効率的なサンプリングによる滑らかでない軌道といった問題が生じうる。我々は,微分可能なレイサンプリングを用いた勾配ベースのNBVプランナーを提案し,視点計画のための局所勾配方向を直接推定することで,オクルージョンを克服し知覚を改善する。シミュレーション実験により,本プランナーはオクルージョンを処理し,サンプリングベースのNBVプランナーと同等にノードの3次元再構成と位置推定を改善できる一方,計算量は10分の1で済み,28%効率的な軌道を生成できることを示した。 Robots are increasingly used in tomato greenhouses to automate labour-intensive tasks such as selective harvesting and de-leafing. To perform these tasks, robots must be able to accurately and efficiently perceive the plant nodes that need to be cut, despite the high levels of occlusion from other plant parts. We formulate this problem as a local next-best-view (NBV) planning task where the robot has to plan an efficient set of camera viewpoints to overcome occlusion and improve the quality of perception. Our formulation focuses on quickly improving the perception accuracy of a single target node to maximise its chances of being cut. Previous methods of NBV planning mostly focused on global view planning and used random sampling of candidate viewpoints for exploration, which could suffer from high computational costs, ineffective view selection due to poor candidates, or non-smooth trajectories due to inefficient sampling. We propose a gradient-based NBV planner using differential ray sampling, which directly estimates the local gradient direction for viewpoint planning to overcome occlusion and improve perception. 
Through simulation experiments, we showed that our planner can handle occlusions and improve the 3D reconstruction and position estimation of nodes as well as a sampling-based NBV planner, while requiring ten times less computation and generating 28% more efficient trajectories.	翻訳日:2023-11-29 18:33:56 公開日:2023-11-28
# Floquet-Rydberg 量子シミュレータによる$\mathbb{Z}_2$ゲージ理論の閉じ込め A Floquet-Rydberg quantum simulator for confinement in $\mathbb{Z}_2$ gauge theories ( http://arxiv.org/abs/2311.16758v1 ) ライセンス: Link先を確認	Enrico C. Domanti, Dario Zappalà, Alejandro Bermudez, Luigi Amico	(参考訳) 量子技術分野の最近の進歩は、クォークの閉じ込めの根底にある非摂動的メカニズムの理解を深めることなどを目的とした、格子ゲージ理論(LGT)の小規模量子シミュレータの実現に向けた道を開いた。本研究では, 光ピンセットによるラダー配置における周期的に駆動されるリュードベリ原子の配列を考慮し, $\mathbb{Z}_2$ LGTにおける実時間ダイナミクスの量子シミュレーションのためのスケーラブルなフロケスキームを考案する。外部磁場を用いてリュードベリ双極子相互作用の角度依存性を調整し、駆動パラメータを適切に調整することで、主要なゲージ対称性を破る項を抑えることができ、フロケ・リュードベリ系におけるゲージ不変な閉じ込めダイナミクスの観測が現在の実験技術の到達範囲内にあることを示す。格子の大きさに応じて厳密対角化法または行列積状態法を用い,周期的に変調された実時間ダイナミクスに対して,このスキームの有効性を徹底的に数値的に検証する。 Recent advances in the field of quantum technologies have opened up the road for the realization of small-scale quantum simulators of lattice gauge theories (LGTs) which, among other goals, aim at improving our understanding of the non-perturbative mechanisms underlying the confinement of quarks. In this work, considering periodically-driven arrays of Rydberg atoms in a tweezer ladder geometry, we devise a scalable Floquet scheme for the quantum simulation of the real-time dynamics in a $\mathbb{Z}_2$ LGT. Resorting to an external magnetic field to tune the angular dependence of the Rydberg dipolar interactions, and by a suitable tuning of the driving parameters, we manage to suppress the main gauge-violating terms, and show that an observation of gauge-invariant confinement dynamics in the Floquet-Rydberg setup is within reach of current experimental techniques. Depending on the lattice size, we present a thorough numerical test of the validity of this scheme using either exact diagonalization or matrix-product-state algorithms for the periodically-modulated real-time dynamics.	翻訳日:2023-11-29 18:33:30 公開日:2023-11-28
# コネクテッド・自動運転のためのマルチエージェント協調型鳥瞰図セグメンテーションのフルシーン領域一般化に向けて Towards Full-scene Domain Generalization in Multi-agent Collaborative Bird's Eye View Segmentation for Connected and Autonomous Driving ( http://arxiv.org/abs/2311.16754v1 ) ライセンス: Link先を確認	Senkang Hu, Zhengru Fang, Xianhao Chen, Yuguang Fang, Sam Kwong	(参考訳) 協調認識は、最近自動運転において大きな注目を集め、車両間の追加情報交換を可能にし、知覚品質の向上に寄与している。しかし、協調認識システムの展開は、様々な環境条件とコネクテッド・自動運転車(CAV)間のデータの不均一性によるドメインシフトにつながる可能性がある。これらの課題に対処するために,協調認識の訓練段階と推論段階の両方に適用可能な統一ドメイン一般化フレームワークを提案する。訓練段階では、低周波の画像変動を拡張する振幅拡張法(AmpAug)を導入し、様々な領域で学習するモデルの能力を広げる。また、ドメインシフトをシミュレートするためにメタ一貫性トレーニングスキームを採用し、注意深く設計された一貫性損失でモデルを最適化し、ドメイン不変表現を促進する。推論フェーズでは,システム内ドメインアライメント機構を導入し,推論に先立ってCAV間のドメイン不一致を低減または除去する。包括的実験により,既存の最先端手法と比較して本手法の有効性が実証された。コードはhttps://github.com/DG-CAVs/DG-CoPerception.gitでリリースされる。 Collaborative perception has recently gained significant attention in autonomous driving, improving perception quality by enabling the exchange of additional information among vehicles. However, deploying collaborative perception systems can lead to domain shifts due to diverse environmental conditions and data heterogeneity among connected and autonomous vehicles (CAVs). To address these challenges, we propose a unified domain generalization framework applicable in both training and inference stages of collaborative perception. In the training phase, we introduce an Amplitude Augmentation (AmpAug) method to augment low-frequency image variations, broadening the model's ability to learn across various domains. We also employ a meta-consistency training scheme to simulate domain shifts, optimizing the model with a carefully designed consistency loss to encourage domain-invariant representations. In the inference phase, we introduce an intra-system domain alignment mechanism to reduce or potentially eliminate the domain discrepancy among CAVs prior to inference. Comprehensive experiments substantiate the effectiveness of our method in comparison with the existing state-of-the-art works. Code will be released at https://github.com/DG-CAVs/DG-CoPerception.git.	翻訳日:2023-11-29 18:33:08 公開日:2023-11-28
# 電子量子光学における熱パルス Heat Pulses in Electron Quantum Optics ( http://arxiv.org/abs/2311.16748v1 ) ライセンス: Link先を確認	Pedro Portugal, Fredrik Brange, Christian Flindt	(参考訳) 電子量子光学は、電子導体における電荷パルスに光子の役割を担わせることで、光の量子理論のアイデアを実現することを目的としている。実験では、電荷パルスは時間依存電圧によって励起されるが、電極を加熱・冷却することで熱パルスを生成することもできる。ここでは、メソスコピック導体中の熱パルスのFloquet散乱理論を定式化することで、この興味深いアイデアを探求する。熱パルスの断熱放出は、線形応答において熱伝導量子によって与えられる熱電流につながる。しかし, 高周波成分も見出され、これにより、妥当性が議論されてきた熱電流に対する揺らぎ散逸定理が満たされることが保証される。熱パルスは無電荷であり、量子点接触の出力における分配ノイズを評価して電子・正孔の含有量を調べる。また、Hong-Ou-Mandelセットアップを使用して、パルスがバンチするのかアンチバンチするのかを調べる。最後に、電流を生成するために、電子-正孔対称性を破り、熱電効果を可能にするMach-Zehnder干渉計を用いる。我々の研究はメソスコピック導体における熱パルスの体系的な研究の道を開き、将来の実験を刺激する可能性がある。 Electron quantum optics aims to realize ideas from the quantum theory of light with the role of photons being played by charge pulses in electronic conductors. Experimentally, the charge pulses are excited by time-dependent voltages; however, one could also generate heat pulses by heating and cooling an electrode. Here, we explore this intriguing idea by formulating a Floquet scattering theory of heat pulses in mesoscopic conductors. The adiabatic emission of heat pulses leads to a heat current that in linear response is given by the thermal conductance quantum. However, we also find a high-frequency component, which ensures that the fluctuation-dissipation theorem for heat currents, whose validity has been debated, is fulfilled. The heat pulses are uncharged, and we probe their electron-hole content by evaluating the partition noise in the outputs of a quantum point contact. We also employ a Hong-Ou-Mandel setup to examine if the pulses bunch or antibunch. Finally, to generate an electric current, we use a Mach-Zehnder interferometer that breaks the electron-hole symmetry and thereby enables a thermoelectric effect. Our work paves the way for systematic investigations of heat pulses in mesoscopic conductors, and it may stimulate future experiments.	翻訳日:2023-11-29 18:32:47 公開日:2023-11-28
# シリコンチップ上のGreenberger-Horne-Zeilingerエンタングルメントにおける量子非局所性の観測 Observation of quantum nonlocality in Greenberger-Horne-Zeilinger entanglement on a silicon chip ( http://arxiv.org/abs/2311.16745v1 ) ライセンス: Link先を確認	Leizhen Chen, Bochi Wu, Liangliang Lu, Kai Wang, Yanqing Lu, Shining Zhu, Xiao-Song Ma	(参考訳) 非局所性は量子エンタングルメントの定義的特徴である。複数の粒子を持つ絡み合った状態は、量子物理学の基本的な実験や多くの量子情報処理において非常に重要である。Greenberger-Horne-Zeilinger(GHZ)状態は、典型的な多体量子状態の1つであり、量子物理学と局所実在論の著しい対立を、いわゆる全対無(all-versus-nothing)の方法で観察することができる。これは統計予測に依存する2つの粒子に対するベルの定理とは大きく異なる。ここでは,4光子GHZ状態の生成と操作が可能な集積フォトニックチップを実演する。量子状態トモグラフィを用いて4光子GHZ状態の完全な特徴付けを行い、0.729(6)の状態忠実度を得る。 GHZエンタングルメントの量子非局所性を検証するために、全対無検定とMermin不等式を用いる。我々の研究は、複雑な集積量子デバイスで量子物理学の基礎的なテストを実行する道を開く。 Nonlocality is the defining feature of quantum entanglement. Entangled states with multiple particles are of crucial importance in fundamental tests of quantum physics as well as in many quantum information tasks. One of the archetypal multipartite quantum states, the Greenberger-Horne-Zeilinger (GHZ) state, allows one to observe the striking conflict between quantum physics and local realism in the so-called all-versus-nothing way. This is profoundly different from Bell's theorem for two particles, which relies on statistical predictions. Here, we demonstrate an integrated photonic chip capable of generating and manipulating the four-photon GHZ state. We perform a complete characterization of the four-photon GHZ state using quantum state tomography and obtain a state fidelity of 0.729(6). We further use the all-versus-nothing test and the Mermin inequalities to witness the quantum nonlocality of GHZ entanglement. Our work paves the way for fundamental tests of quantum physics with complex integrated quantum devices.	翻訳日:2023-11-29 18:32:28 公開日:2023-11-28
# As-Plausible-As-Possible: 2次元拡散事前分布を用いた妥当性を考慮したメッシュ変形 As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors ( http://arxiv.org/abs/2311.16739v1 ) ライセンス: Link先を確認	Seungwoo Yoo, Kunho Kim, Vladimir G. Kim, Minhyuk Sung	(参考訳) 本稿では,ユーザ制御による変形の下でメッシュの妥当性を保つために2次元拡散事前分布を活用するAs-Plausible-As-Possible (APAP) メッシュ変形法を提案する。我々のフレームワークは、メッシュ変形を表すために面ごとのヤコビアンを用いており、メッシュ頂点座標は微分可能なポアソン解法によって計算される。変形メッシュを描画し、得られた2D画像をスコア蒸留サンプリング(SDS)プロセスで使用することにより、事前訓練された2D拡散モデルから有意義な妥当性の事前分布を抽出することができる。編集メッシュのアイデンティティをよりよく保存するために、私たちはLoRAで2次元拡散モデルを微調整します。 SDSによって抽出された勾配とユーザが規定するハンドル変位は面ごとのヤコビアンに逆伝播され、ユーザー編集と出力の妥当性のバランスをとる最終的な変形を反復的な勾配降下法で計算する。提案手法を2次元および3次元メッシュを用いて評価し,従来手法で用いられた幾何保存や歪み最小化の事前分布と比べて,妥当性の事前分布を用いることで定性的かつ定量的な改善が得られることを示す。 We present the As-Plausible-As-Possible (APAP) mesh deformation technique that leverages 2D diffusion priors to preserve the plausibility of a mesh under user-controlled deformation. Our framework uses per-face Jacobians to represent mesh deformations, where mesh vertex coordinates are computed via a differentiable Poisson Solve. The deformed mesh is rendered, and the resulting 2D image is used in the Score Distillation Sampling (SDS) process, which enables extracting meaningful plausibility priors from a pretrained 2D diffusion model. To better preserve the identity of the edited mesh, we fine-tune our 2D diffusion model with LoRA. Gradients extracted by SDS and a user-prescribed handle displacement are then backpropagated to the per-face Jacobians, and we use iterative gradient descent to compute the final deformation that balances between the user edit and the output plausibility. We evaluate our method with 2D and 3D meshes and demonstrate qualitative and quantitative improvements when using plausibility priors over geometry-preservation or distortion-minimization priors used by previous techniques.	翻訳日:2023-11-29 18:32:08 公開日:2023-11-28
# SPDネットワークにおけるリーマン自己注意機構 Riemannian Self-Attention Mechanism for SPD Networks ( http://arxiv.org/abs/2311.16738v1 ) ライセンス: Link先を確認	Rui Wang, Xiao-Jun Wu, Hui Li, Josef Kittler	(参考訳) 対称正定値(SPD)行列は、曲がったリーマン多様体、すなわちSPD多様体上でデータの時空間統計を適切に符号化できるため、多くの科学領域において効果的な特徴記述子であることが示されている。 SPD行列非線形学習のためのネットワークアーキテクチャを設計するには多くの異なる方法があるが、異なる層における特徴の幾何学的依存関係を明示的に抽出するソリューションはほとんどない。長距離関係の捕捉における自己注意機構の大きな成功に動機づけられ,本稿では,主にリーマン計量, リーマン平均, リーマン最適化といった多様体値幾何演算を用いたSPD多様体自己注意機構 (SMSA) を提案する。そして、生成された深層構造表現の識別性を改善するため、SMSAベースの幾何学習モジュール(SMSA-GLM)を設計する。 3つのベンチマークデータセットで得られた広範囲な実験結果は、ベースラインネットワークに対する修正が情報劣化問題をさらに軽減し、精度の向上につながることを示している。 The symmetric positive definite (SPD) matrix has been demonstrated to be an effective feature descriptor in many scientific areas, as it can encode spatiotemporal statistics of the data adequately on a curved Riemannian manifold, i.e., SPD manifold. Although there are many different ways to design network architectures for SPD matrix nonlinear learning, very few solutions explicitly mine the geometrical dependencies of features at different layers. Motivated by the great success of self-attention mechanism in capturing long-range relationships, an SPD manifold self-attention mechanism (SMSA) is proposed in this paper using some manifold-valued geometric operations, mainly the Riemannian metric, Riemannian mean, and Riemannian optimization. Then, an SMSA-based geometric learning module (SMSA-GLM) is designed for the sake of improving the discrimination of the generated deep structured representations. Extensive experimental results achieved on three benchmarking datasets show that our modification of the baseline network further alleviates the information degradation problem and leads to improved accuracy.	翻訳日:2023-11-29 18:31:32 公開日:2023-11-28
# Point'n Move: Gaussian Splatting Radiance Fieldにおけるインタラクティブなシーンオブジェクト操作 Point'n Move: Interactive Scene Object Manipulation on Gaussian Splatting Radiance Fields ( http://arxiv.org/abs/2311.16737v1 ) ライセンス: Link先を確認	Jiajun Huang, Hongchuan Yu	(参考訳) 我々は、露出領域のインペインティングを伴うインタラクティブなシーンオブジェクト操作を実現するPoint'n Moveを提案する。ここでの対話性は、直感的なオブジェクト選択とリアルタイム編集からもたらされる。これを実現するために,我々はGaussian Splatting Radiance Fieldをシーン表現として採用し,その明示的な性質と速度優位性を十分に活用する。その明示的な表現の定式化により、2Dプロンプト点から3Dマスクを得る2段階の自己プロンプトセグメンテーションアルゴリズムを考案し、マスクの精細化とマージを行い、変化を最小限に抑え、シーンのインペインティングに良い初期化を与え、編集ごとの学習なしにリアルタイムで編集を行うことができ、これらすべてが優れた品質と性能につながる。本手法は,前向きシーンと360度シーンの両方で編集を行うことでテストする。また,提案手法を既存のシーンオブジェクト除去法と比較し,より高機能で速度面の優位性を持ちながら優れた品質を示す。 We propose Point'n Move, a method that achieves interactive scene object manipulation with exposed region inpainting. Interactivity here further comes from intuitive object selection and real-time editing. To achieve this, we adopt Gaussian Splatting Radiance Field as the scene representation and fully leverage its explicit nature and speed advantage. Its explicit representation formulation allows us to devise a dual-stage self-prompting segmentation algorithm that maps 2D prompt points to a 3D mask, perform mask refinement and merging, minimize change, provide a good initialization for scene inpainting, and perform editing in real time without per-edit training, all of which lead to superior quality and performance. We test our method by performing editing on both forward-facing and 360 scenes. We also compare our method against existing scene object removal methods, showing superior quality despite being more capable and having a speed advantage.	翻訳日:2023-11-29 18:31:02 公開日:2023-11-28
# LLMs for Science: コード生成とデータ分析のための利用 LLMs for Science: Usage for Code Generation and Data Analysis ( http://arxiv.org/abs/2311.16733v1 ) ライセンス: Link先を確認	Mohamed Nejjar, Luca Zacharias, Fabian Stiehle and Ingo Weber	(参考訳) 大規模言語モデル (LLMs) は、今日の仕事の多くの領域で生産性の向上を可能にすると喧伝されている。仕事の一領域としての科学研究も例外ではなく、科学者の日々の作業を支援するLLMベースのツールの可能性は、分野を超えて盛んに議論されている。しかし、私たちはこの研究の始まりに過ぎません。 LLMのポテンシャルが研究実践においてどのように実現するかは、まだ不明である。本研究では,研究プロセスにおけるLLMの使用に関する最初の実証的証拠を示す。我々は,科学研究におけるLLMベースのツールの一連のユースケースを調査し,現在のツールがどの程度役に立つかを評価するための最初の研究を行った。本稿では,アプリケーションコード生成やデータ解析用のスクリプトの開発など,ソフトウェア工学に関連するユースケースを具体的に報告する。一見単純なユースケースを検討したが、ツール間での結果は大きく異なる。以上の結果は,LLMベースのツール全般の有望さを示しているが,特にこれらのツールが提供するアウトプットの完全性に関して,さまざまな問題も観察している。 Large language models (LLMs) have been touted to enable increased productivity in many areas of today's work life. Scientific research as an area of work is no exception: the potential of LLM-based tools to assist in the daily work of scientists has become a highly discussed topic across disciplines. However, we are only at the very onset of this subject of study. It is still unclear how the potential of LLMs will materialise in research practice. With this study, we give first empirical evidence on the use of LLMs in the research process. We have investigated a set of use cases for LLM-based tools in scientific research, and conducted a first study to assess to what degree current tools are helpful. In this paper we report specifically on use cases related to software engineering, such as generating application code and developing scripts for data analytics. While we studied seemingly simple use cases, results across tools differ significantly. Our results highlight the promise of LLM-based tools in general, yet we also observe various issues, particularly regarding the integrity of the output these tools provide.	翻訳日:2023-11-29 18:30:33 公開日:2023-11-28

# サイバー犯罪Bitcoinの収益推定:方法論とカバレッジの影響を定量化

Cybercrime Bitcoin Revenue Estimations: Quantifying the Impact of Methodology and Coverage ( http://arxiv.org/abs/2309.03592v2 )

ライセンス: Link先を確認

Gibran Gomez, Kevin van Liebergen, Juan Caballero,

(参考訳) 複数の研究が公開Bitcoin台帳を利用して、サイバー犯罪者が被害者から得た収益を推定している。同じターゲットにフォーカスする推定は、異なる方法論、シードアドレス、および期間を使用するため、しばしば一致しない。これらの要因は、方法論的差異の影響を理解することを困難にしている。さらに、ターゲットの支払いアドレスに対するカバレッジ(の不足)のために収益を過小評価しているが、この影響がどれほど大きいかは不明だ。本研究は,サイバー犯罪によるビットコイン収益の推定に関する最初の体系的分析を行う。異なる推定手法を再現できるツールを実装した。ツールを使うことで、異なる方法論のステップの影響を、制御された設定で定量化できます。広く信じられているのとは対照的に、収益は常に過小評価されているわけではない。膨大な過大評価をもたらす方法論もある。 30,424件の支払いアドレスを収集し,6種類のサイバー犯罪(ランサムウェア,クリッパー,セクストーション,ポンジスキーム,ギブアウェイ詐欺,取引所詐欺)と141のサイバー犯罪グループの金銭的影響を比較した。一般的なマルチインプットクラスタリングは、40%のグループでアドレスを発見できない。私たちは、初めて、カバレッジ(の不足)が推定に与える影響を定量化します。そのために,DeadBoltサーバランサムウェアに対して,高い(おそらくほぼ完全な)カバレッジを実現する2つの手法を提案する。カバレッジを広げることで、DeadBoltの収益を247万ドルと推定できる。これは2つの一般的なインターネットスキャンエンジンを用いた推定の39倍である。

Multiple works have leveraged the public Bitcoin ledger to estimate the revenue cybercriminals obtain from their victims. Estimations focusing on the same target often do not agree, due to the use of different methodologies, seed addresses, and time periods. These factors make it challenging to understand the impact of their methodological differences. Furthermore, they underestimate the revenue due to the (lack of) coverage on the target's payment addresses, but how large this impact is remains unknown. In this work, we perform the first systematic analysis of the estimation of cybercrime bitcoin revenue. We implement a tool that can replicate the different estimation methodologies. Using our tool we can quantify, in a controlled setting, the impact of the different methodology steps. In contrast to what is widely believed, we show that the revenue is not always underestimated. There exist methodologies that can introduce huge overestimation. We collect 30,424 payment addresses and use them to compare the financial impact of 6 cybercrimes (ransomware, clippers, sextortion, Ponzi schemes, giveaway scams, exchange scams) and of 141 cybercriminal groups. We observe that the popular multi-input clustering fails to discover addresses for 40% of groups. We quantify, for the first time, the impact of the (lack of) coverage on the estimation. For this, we propose two techniques to achieve high coverage, possibly nearly complete, on the DeadBolt server ransomware. Our expanded coverage enables estimating DeadBolt's revenue at $2.47M, 39 times higher than the estimation using two popular Internet scan engines.

翻訳日:2024-03-25 22:59:44 公開日:2023-11-28
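
The multi-input clustering mentioned in the abstract above is commonly understood as a heuristic that treats all addresses spent together as inputs of one transaction as belonging to the same owner. The following is a minimal sketch of that general idea using union-find, not the authors' tool: the function name `multi_input_clusters`, the input format (one list of input addresses per transaction), and the toy addresses are assumptions made for illustration.

```python
from collections import defaultdict


def multi_input_clusters(transactions):
    """Group addresses that are co-spent as inputs of the same transaction.

    `transactions` is a hypothetical, simplified format: an iterable of
    lists, each holding the input addresses of one transaction.
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own root, then walk to the root.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        find(first)  # make sure single-input transactions are registered
        for addr in inputs[1:]:
            union(first, addr)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    # Sorted output keeps the result deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Toy data: A/B/C are linked through co-spending, D never co-spends.
toy_transactions = [["A", "B"], ["B", "C"], ["D"], ["E", "F"]]
result = multi_input_clusters(toy_transactions)
```

In this toy input, address `D` is never co-spent with another address, so its cluster stays a singleton. That is one plausible way for the heuristic to find no additional addresses from a seed, which is the kind of coverage gap the abstract reports.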

# 「ユーザは本物の敵対的フィッシングに騙されるのか?」 : 検知を回避するウェブページに対する人間の反応を調査する

"Do Users fall for Real Adversarial Phishing?" Investigating the Human response to Evasive Webpages ( http://arxiv.org/abs/2311.16383v1 )

ライセンス: Link先を確認

Ajka Draganovic, Savino Dambra, Javier Aldana Iuit, Kevin Roundy, Giovanni Apruzzese,

(参考訳) フィッシングサイトは至る所にあり、静的ブロックリストに基づく対策はそのような脅威に対処できない。この問題に対処するため、最先端のソリューションでは、機械学習(ML)を使用して、有名ブランドのWebページと視覚的に類似しているかどうかをチェックしてフィッシングサイトを検出する。これらの技術は研究において有望な成果を上げており、いくつかのセキュリティ会社がフィッシング検知システム(PDS)にも導入し始めた。しかし、MLメソッドは完璧ではなく、いくつかのサンプルはプロダクショングレードのPDSでさえ回避してしまう。本稿では、「商用MLベースのPDS」を回避する「本物のフィッシングサイト」が「現実に」問題であるかどうかを精査する。フィッシングのウェブページにたどり着きたい人はいないが、ユーザー(つまりフィッシングの実際の標的)が「何かが怪しい」と認識できるなら、偽陰性は重大な結果をもたらさないかもしれない。実際のPDSを回避した「敵対的」フィッシングWebページによって、疑いを持たないユーザ(多様な背景を持つ)が騙されるかどうかを評価する最初のユーザスタディ(N=126)を実施する。巧妙に作られた一部の敵対的Webページは、ほとんどの参加者(ITの専門家でさえ)を騙しうる一方で、他のページはほとんどのユーザに容易に見破られることがわかった。我々の研究は、(i)機械と(ii)人間、すなわち意図された標的の両方を同時に欺くフィッシングページの優先順位付けを可能にするため、実践者にとって重要である。

Phishing websites are everywhere, and countermeasures based on static blocklists cannot cope with such a threat. To address this problem, state-of-the-art solutions entail the application of machine learning (ML) to detect phishing websites by checking if they visually resemble webpages of well-known brands. These techniques have achieved promising results in research and, consequently, some security companies began to deploy them also in their phishing detection systems (PDS). However, ML methods are not perfect and some samples are bound to bypass even production-grade PDS. In this paper, we scrutinize whether 'genuine phishing websites' that evade 'commercial ML-based PDS' represent a problem "in reality". Although nobody likes landing on a phishing webpage, a false negative may not lead to serious consequences if the users (i.e., the actual target of phishing) can recognize that "something is phishy". Practically, we carry out the first user-study (N=126) wherein we assess whether unsuspecting users (having diverse backgrounds) are deceived by 'adversarial' phishing webpages that evaded a real PDS. We found that some well-crafted adversarial webpages can trick most participants (even IT experts), albeit others are easily recognized by most users. Our study is relevant for practitioners, since it allows prioritizing phishing webpages that simultaneously fool (i) machines and (ii) humans -- i.e., their intended targets.

翻訳日:2024-03-18 15:42:08 公開日:2023-11-28

# サイバーセキュリティにおけるデータラベリングのプロセスの理解

Understanding the Process of Data Labeling in Cybersecurity ( http://arxiv.org/abs/2311.16388v1 )

ライセンス: Link先を確認

Tobias Braun, Irdin Pekaric, Giovanni Apruzzese,

(参考訳) 多くのドメインが機械学習(ML)の利点を活用しており、いくつかのデータをトレーニングすることで、複雑なタスクを自律的に解決できるソリューションを約束している。残念ながら、サイバー脅威検出では、高品質なデータを得るのは難しい。さらに、MLの特定の用途では、そのようなデータは人間のオペレータによってラベル付けされなければならない。多くの研究は、サイバー脅威検出においてラベリングは困難/挑戦的/高コストであると「仮定」し、そのようなハードルに対処する解決策を提案している。しかし、「MLセキュリティ実践者の観点から」ラベル付けのプロセスに特に取り組んだ研究は見つからなかった。これは問題である。今日に至るまで、ラベリングが実際にどのように行われているのかはほとんど分かっておらず、現実の世界で「何が必要なのか」を特定できない。本稿では,データラベリングの文脈において,学術研究とセキュリティ実践の橋渡しを行うための第一歩を踏み出す。まず5人の専門家(subject matter experts)に連絡し、オープンインタビューを行い、ラベル付けルーチンの問題点を特定する。そして,この知見を足場として,大手セキュリティ企業の実践者13人とユーザスタディを行い,アクティブラーニングやラベル付けのコスト,ラベルの改訂といった課題について詳細な質問を行った。最後に,研究で見落とされがちなサイバー脅威検出におけるラベリングに関連する側面に対処する概念実証実験を行った。全体として、私たちのコントリビューションとレコメンデーションは、ML駆動のセキュリティシステムの品質と堅牢性の向上を目的とした、将来の取り組みの足掛かりとして役立ちます。リソースは公開する。

Many domains now leverage the benefits of Machine Learning (ML), which promises solutions that can autonomously learn to solve complex tasks by training over some data. Unfortunately, in cyberthreat detection, high-quality data is hard to come by. Moreover, for some specific applications of ML, such data must be labeled by human operators. Many works "assume" that labeling is tough/challenging/costly in cyberthreat detection, thereby proposing solutions to address such a hurdle. Yet, we found no work that specifically addresses the process of labeling 'from the viewpoint of ML security practitioners'. This is a problem: to this date, it is still mostly unknown how labeling is done in practice -- thereby preventing one from pinpointing "what is needed" in the real world. In this paper, we take the first step to build a bridge between academic research and security practice in the context of data labeling. First, we reach out to five subject matter experts and carry out open interviews to identify pain points in their labeling routines. Then, by using our findings as a scaffold, we conduct a user study with 13 practitioners from large security companies, and ask detailed questions on subjects such as active learning, costs of labeling, and revision of labels. Finally, we perform proof-of-concept experiments addressing labeling-related aspects in cyberthreat detection that are sometimes overlooked in research. Altogether, our contributions and recommendations serve as a stepping stone to future endeavors aimed at improving the quality and robustness of ML-driven security systems. We release our resources.

翻訳日:2024-03-18 15:42:08 公開日:2023-11-28

# Threshold Breaker: カウンタベースのRowHammer防止メカニズムは本当にDRAMを保護できるか?

Threshold Breaker: Can Counter-Based RowHammer Prevention Mechanisms Truly Safeguard DRAM? ( http://arxiv.org/abs/2311.16460v1 )

ライセンス: Link先を確認

Ranyang Zhou, Jacqueline Liu, Sabbir Ahmed, Nakul Kochar, Adnan Siraj Rakin, Shaahin Angizi,

(参考訳) 本稿では,Threshold Breakerと呼ばれる新しいマルチサイド障害注入攻撃手法を実験的に実証することにより,被害者行に着目した既存のカウンタベースRowHammer検出機構に挑戦する。この手法は、ターゲット行から物理的により離れた行をソフトアタックすることにより、最も先進的なカウンタベースの防御機構を効果的に回避することができる。このような攻撃の効果を実証した先行研究はないが、我々の研究は、128個の実際の商用DDR4 DRAM製品を体系的にテストすることでこのギャップを埋め、Threshold Breakerが主要DRAMメーカーの様々なチップに影響を与えることを明らかにした。ケーススタディとして、現代のディープニューラルネットワーク(DNN)に対して敵対的重み攻撃を行うことにより、我々の手法とよく知られたダブルサイド攻撃の性能効率を比較した。その結果、Threshold Breakerは、DRAMが完全に保護されている状態でも、ターゲットとするDNNシステムの知能を意図的に損なうことができることを示した。

This paper challenges the existing victim-focused counter-based RowHammer detection mechanisms by experimentally demonstrating a novel multi-sided fault injection attack technique called Threshold Breaker. This mechanism can effectively bypass the most advanced counter-based defense mechanisms by soft-attacking the rows at a farther physical distance from the target rows. While no prior work has demonstrated the effect of such an attack, our work closes this gap by systematically testing 128 real commercial DDR4 DRAM products and reveals that the Threshold Breaker affects various chips from major DRAM manufacturers. As a case study, we compare the performance efficiency between our mechanism and a well-known double-sided attack by performing adversarial weight attacks on a modern Deep Neural Network (DNN). The results demonstrate that the Threshold Breaker can deliberately deplete the intelligence of the targeted DNN system while DRAM is fully protected.